Computer Science Seminar

Date: Monday, 14 October 2024
Time: 12:00 - 1:00 PM
Venue: Seminar Hall

Outlier Robust Multivariate Polynomial Regression
Vipul Arora, National University of Singapore

Abstract

We study the problem of robust multivariate polynomial regression: let p : \mathbb{R}^n \to \mathbb{R} be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (x_i, y_i) \in [-1,1]^n \times \mathbb{R} that are noisy versions of (x_i, p(x_i)). More precisely, each x_i is sampled independently from some distribution \chi on [-1,1]^n, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most \rho < 1/2, and otherwise satisfies |y_i - p(x_i)| \leq \sigma for some 'small' \sigma > 0. The goal is to output a polynomial \hat{p}, of degree at most d in each variable, within an \ell_{\infty}-distance of at most O(\sigma) from p.

Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, giving an algorithm that achieves a sample complexity of O_n(d^n \log d), where the hidden constant depends on n, when \chi is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n} \log d) when the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(\sigma), and the run-time depends on \log(1/\sigma). In the setting where each x_i and y_i is known up to N bits of precision, the run-time's dependence on N is linear. We also show that our sample complexities are optimal in terms of d^n. Furthermore, we show that the run-time can be made independent of 1/\sigma, at the cost of a higher sample complexity.

This is joint work with Arnab Bhattacharyya, Mathews Boban, Venkatesan Guruswami, and Esty Kelman, and appeared at ESA'24.
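For concreteness, the following is a minimal Python sketch of the sampling model described above, not of the authors' algorithm. The particular polynomial p, the sample count m, and the outlier range are arbitrary illustrative choices, and the Chebyshev marginal is simulated via cos(U) for U uniform on [0, pi].

    # Illustrative sketch of the noise model only; all concrete values are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 2, 3              # number of variables, max degree per variable
    rho, sigma = 0.1, 0.01   # outlier probability, inlier noise bound
    m = 200                  # number of samples (illustrative choice)

    # A fixed "unknown" polynomial p : R^2 -> R with degree <= 3 in each variable.
    def p(x):
        return 0.5 * x[0]**3 - x[0] * x[1]**2 + 2.0 * x[1] - 0.3

    # Chebyshev (arcsine) marginal on [-1, 1] per coordinate:
    # if U ~ Uniform[0, pi], then cos(U) has density 1 / (pi * sqrt(1 - t^2)).
    x = np.cos(rng.uniform(0.0, np.pi, size=(m, n)))

    y = np.empty(m)
    for i in range(m):
        if rng.random() < rho:
            y[i] = rng.uniform(-10.0, 10.0)              # outlier: arbitrary value
        else:
            y[i] = p(x[i]) + rng.uniform(-sigma, sigma)  # inlier: |y_i - p(x_i)| <= sigma

Given such samples (x_i, y_i), the goal stated in the abstract is to recover a polynomial \hat{p} that is within \ell_{\infty}-distance O(\sigma) of p on [-1,1]^n.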