Title: Torus Actions and Maximum Likelihood Estimation

Abstract: We describe connections between invariant theory and maximum likelihood estimation, in the context of log-linear models. Finding a maximum likelihood estimate (MLE) is an optimisation problem over a statistical model, to obtain the point that best fits observed data. We show that this is equivalent to a capacity problem - finding the point of minimal norm in an orbit under a corresponding torus action. The existence of the MLE can then be characterized by stability under the action. Moreover, algorithms from statistics can be used in invariant theory, and vice versa. Based on joint work with Carlos Améndola, Kathlén Kohn and Philipp Reichenbach.

Title: Matrix Normal Models and Invariant Theory

Abstract: We describe connections between invariant theory and maximum likelihood estimation (ML estimation), in the context of matrix normal models. Namely, we link ML estimation in that case to the left right action of SLxSL on tuples of matrices. This enables us to characterize ML estimation by stability under that group action. To illuminate the theory the talk puts emphasis on several examples. Based on joint work with Améndola Carlos, Kathlén Kohn and Anna Seigal.

Click here for a recording of the lecture, starts at 00:35:40

Title: Gaussian Group models-I.

Abstract: The task of fitting data to a model is fundamental in statistics. For this, a widespread approach is maximum likelihood estimation (ML estimation), where one maximizes the likelihood of observing the data as we range over the model. In this talk we introduce Gaussian group models, which are multivariate Gaussian models parametrized by a representation of a group. In this general setting, we can characterize ML estimation by stability notions from invariant theory. Matrix normal models and Gaussian graphical models defined by transitive directed acyclic graphs (TDAGs) fit into this framework. For TDAGs, we provide exact conditions for the existence of the ML estimate in terms of linear independence of the rows of the sample matrix. This talk is based on joint work with Anna Seigal and Philipp Reichenbach.

Title: Gaussian Group models-II.

Abstract: The task of fitting data to a model is fundamental in statistics. For this, a widespread approach is maximum likelihood estimation (ML estimation), where one maximizes the likelihood of observing the data as we range over the model. In this talk we introduce Gaussian group models, which are multivariate Gaussian models parametrized by a representation of a group. In this general setting, we can characterize ML estimation by stability notions from invariant theory. Matrix normal models and Gaussian graphical models defined by transitive directed acyclic graphs (TDAGs) fit into this framework. For TDAGs, we provide exact conditions for the existence of the ML estimate in terms of linear independence of the rows of the sample matrix. This talk is based on joint work with Anna Seigal and Philipp Reichenbach.

Click here for a recording of the lecture, starts at 00:28:00.

Title: Implementing geometric complexity theory: On the separation of orbit closures via symmetries.

Abstract: Understanding the difference between group orbits and their closures is a key difficulty in geometric complexity theory (GCT): While the GCT program is set up to separate certain orbit closures, many beautiful mathematical properties are only known for the group orbits, in particular close relations with symmetry groups and invariant spaces, while the orbit closures seem much more difficult to understand. However, in order to prove lower bounds in algebraic complexity theory, considering group orbits is not enough. In this paper we tighten the relationship between the orbit of the power sum polynomial and its closure, so that we can separate this orbit closure from the orbit closure of the product of variables by just considering the symmetry groups of both polynomials and their representation theoretic decomposition coefficients. In a natural way our construction yields a multiplicity obstruction that is neither an occurrence obstruction, nor a so-called vanishing ideal occurrence obstruction. All multiplicity obstructions so far have been of one of these two types. Our paper is the first implementation of the ambitious approach that was originally suggested in the first papers on geometric complexity theory by Mulmuley and Sohoni (SIAM J Comput 2001, 2008): Before our paper, all existence proofs of obstructions only took into account the symmetry group of one of the two polynomials (or tensors) that were to be separated. In our paper the multiplicity obstruction is obtained by comparing the representation theoretic decomposition coefficients of both symmetry groups. Our proof uses a semi-explicit description of the coordinate ring of the orbit closure of the power sum polynomial in terms of Young tableaux, which enables its comparison to the coordinate ring of the orbit. This is joint work with Umangathan Kandasamy.

Title: Rigorous Guarantees for Tyler's M-estimator via quantum expansion.

Abstract: Estimating the shape of an elliptical distribution is a fundamental problem in statistics. One estimator for the shape matrix, Tyler's M-estimator, has been shown to have many appealing asymptotic properties. It performs well in numerical experiments and can be quickly computed in practice by a simple iterative procedure. Despite the many years the estimator has been studied in the statistics community, there was neither a non-asymptotic bound on the rate of the estimator nor a proof that the iterative procedure converges in polynomially many steps. Here we observe a surprising connection between Tyler's M-estimator and operator scaling, which has been intensively studied in recent years in part because of its connections to the Brascamp-Lieb inequality in analysis. We use this connection, together with novel results on quantum expanders, to show that Tyler's M-estimator has the optimal rate up to factors logarithmic in the dimension, and that in the generative model the iterative procedure has a linear convergence rate even without regularization.

Title: Information geometry of operator scaling

Abstract: Matrix scaling is a classical problem with a wide range of applications. It is known that the Sinkhorn algorithm for matrix scaling is interpreted as alternating e-projections from the viewpoint of classical information geometry. Recently, a generalization of matrix scaling to completely positive maps called operator scaling has been found to appear in various fields of mathematics and computer science, and the Sinkhorn algorithm has been extended to operator scaling. In this study, the operator Sinkhorn algorithm is studied from the viewpoint of quantum information geometry through the Choi representation of completely positive maps. The operator Sinkhorn algorithm is shown to coincide with alternating e-projections with respect to the symmetric logarithmic derivative metric, which is a Riemannian metric on the space of quantum states relevant to quantum estimation theory. This work is joint work with Takeru Matsuda.

Jun 11, Thursday, 8:00 PM

Title: Towards a theory of non-commutative optimization: geodesic first and second order methods for moment maps and polytopes

Abstract: This paper initiates a systematic development of a theory of non-commutative optimization. It aims to unify and generalize a growing body of work from the past few years which developed and analyzed algorithms for natural geodesically convex optimization problems on Riemannian manifolds that arise from the symmetries of non-commutative groups. These algorithms minimize the moment map (a non-commutative notion of the usual gradient) and test membership in moment polytopes (a vast class of polytopes, typically of exponential vertex and facet complexity, which arise from this a-priori non-convex, non-linear setting). This setting captures a diverse set of problems in different areas of computer science, mathematics, and physics. Several of them were solved efficiently for the first time using non-commutative methods; the corresponding algorithms also lead to solutions of purely structural problems and to many new connections between disparate fields. In the spirit of standard convex optimization, we develop two general methods in the geodesic setting, a first order and a second order method, which respectively receive first and second order information on the "derivatives" of the function to be optimized. These in particular subsume all past results. The main technical work, again unifying and extending much of the previous literature, goes into identifying the key parameters of the underlying group actions which control convergence to the optimum in each of these methods. These non-commutative analogues of "smoothness" are far more complex and require significant algebraic and analytic machinery. Despite this complexity, the way in which these parameters control convergence in both methods is quite simple and elegant. We show how bound these parameters in several general cases. Our work points to intriguing open problems and suggests further research directions.

Title: Geometric rank of tensors.

Abstract: Motivated by problems in algebraic complexity theory (the complexity of matrix multiplication), quantum information theory (the resource theory of quantum entanglement) and extremal combinatorics (the cap set problem and sunflower problem), we introduce a new tool in the study of tensors and hypergraphs, called the geometric rank. This talk is an introduction to geometric rank, its connections to other notions of rank, and its applications. This is based on joint work with Swastik Kopparty and Guy Moshkovitz.

Title: The complexity of matrix multiplication: history and recent progress

Abstract:In 1968 Strassen discovered the usual way we multiply matrices is not the most efficient possible and over the next 20 years there was steady improvement in complexity upper bounds that led to the astounding conjecture that as n goes to infinity, it becomes almost as easy to multiply nxn matrices as it is to add them. I will give an overview of this history, focusing primarily on lower bounds and conclude with a discussion of recent advances with A. Conner and A. Harper, which uses new results of Buczynska-Buczynski.

Title: A Lower Bound for Symmetric Circuits for the Permanent

Abstract: In a recent paper with Gregory Wilsenach, we establish a nearly exponential lower bound on the size of symmetric arithmetic circuits for computing the permanent. Here, a circuit is symmetric if its shape is unchanged by row and column permutations on its input matrix. In this talk I will give an exposition of the proof of this result, including relevant background on the structure of symmetric Boolean circuits.

Title: Maximum Likelihood estimation for matrix normal models via quiver representations

Abstract: A statistical model is a collection of probability distributions. One often wants to find the distribution that best fits some empirical data. A widely used method is to search for a maximum likelihood estimate (MLE) that maximizes the likelihood of observing the empirical data. A basic question is to understand the minimum number of samples required to have (almost surely) 1. a bounded likelihood function 2. the existence of an MLE and 3. the existence of a unique MLE. Work of Amendola, Kohn, Reichenbach and Seigal draws connections between MLE for a certain class of statistical models called Gaussian group models and stability notions in invariant theory. One such important Gaussian group model is called the matrix normal model which corresponds to the invariant theory of the Kronecker quiver. Using results of Kac, King and Schofield on canonical decomposition and stability, we obtain precise formulas for sample size thresholds for bounded likelihood function as well as existence and uniqueness of MLEs. This is joint work with Harm Derksen. At the end, we will briefly mention forthcoming joint work with Harm Derksen and Michael Walter which generalizes the aforementioned results to tensor normal models.