Moment Varieties for Mixtures of Products

The setting of this article is nonparametric algebraic statistics. We study moment varieties of conditionally independent mixture distributions on $\mathbb{R}^n$. These are the secant varieties of toric varieties that express independence in terms of univariate moments. Our results revolve around the dimensions and defining polynomials of these varieties.


INTRODUCTION
Consider $n$ independent random variables $X_1, X_2, \ldots, X_n$ on the line $\mathbb{R}$. We make no assumptions about the $X_k$ other than that their moments $\mu_{ki} = \mathbb{E}(X_k^i)$ exist. Then, by [4, Theorem 30.1], the random variable $X_k$ is uniquely characterized by its sequence of moments $\mu_{k1}, \mu_{k2}, \mu_{k3}, \ldots$. These moments satisfy the Hamburger moment condition, which states that the Hankel matrix
$$ \bigl( \mu_{k,i+j} \bigr)_{0 \leq i,j \leq s} \quad \text{is positive semi-definite for all } s. \qquad (1) $$
For us, the $\mu_{ki}$ are unknowns. The only equations we require are $\mu_{k0} = 1$ for $k = 1, 2, \ldots, n$.
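As a small illustration of condition (1), the following Python sketch tests positive semi-definiteness of a Hankel matrix of moments in exact arithmetic. The function names `det`, `hankel` and `is_psd` are hypothetical helpers introduced here, and the test uses the elementary fact that a symmetric matrix is positive semi-definite if and only if all of its principal minors are nonnegative; this is only practical for small matrices.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def hankel(moments, s):
    """Hankel matrix (mu_{i+j}) for 0 <= i, j <= s; needs moments up to order 2s."""
    return [[moments[i + j] for j in range(s + 1)] for i in range(s + 1)]


def is_psd(matrix):
    """Check that every principal minor is nonnegative (exponential, small sizes only)."""
    size = len(matrix)
    for k in range(1, size + 1):
        for idx in combinations(range(size), k):
            sub = [[matrix[i][j] for j in idx] for i in idx]
            if det(sub) < 0:
                return False
    return True


gaussian = [1, 0, 1, 0, 3, 0, 15]  # moments of orders 0..6 of the standard normal
not_moments = [1, 1, 0]            # would force variance 0 - 1^2 < 0

print(is_psd(hankel(gaussian, 3)), is_psd(hankel(not_moments, 1)))
```

The second sequence fails because its 2 x 2 Hankel determinant is negative, so it cannot be the moment sequence of any distribution on the line.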
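In this article we study mixtures of $r$ independent distributions. The associated moment variety $\sigma_r(M_{n,d})$ is the $r$th secant variety of the toric variety $M_{n,d}$. It is parametrized by
$$ m_{i_1 i_2 \cdots i_n} \;=\; \sum_{j=1}^{r} \pi_j \, \mu^{(j)}_{1 i_1} \mu^{(j)}_{2 i_2} \cdots \mu^{(j)}_{n i_n}, \qquad (3) $$
where $\pi_1, \pi_2, \ldots, \pi_r \geq 0$ and $\pi_1 + \pi_2 + \cdots + \pi_r = 1$. These are the moment varieties in our title. Mixture weights can be omitted in (3) since we work in projective geometry.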
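To make the mixture parametrization concrete, here is a minimal Python sketch that evaluates the degree-$d$ moments of a mixture of $r$ product distributions from the univariate moments of the components. The names `exponent_vectors` and `mixture_moments` and the nested-list layout `mu[j][k][i]` are assumptions made for this illustration only.

```python
from itertools import product
from math import comb, prod


def exponent_vectors(n, d):
    """All (i_1, ..., i_n) of nonnegative integers with i_1 + ... + i_n = d."""
    return [v for v in product(range(d + 1), repeat=n) if sum(v) == d]


def mixture_moments(mu, d, weights=None):
    """Moments of a mixture of products.

    mu[j][k][i] is the i-th moment of coordinate k in component j, with
    mu[j][k][0] == 1. If weights is None, the mixture weights are omitted,
    which is harmless in projective space.
    """
    r, n = len(mu), len(mu[0])
    if weights is None:
        weights = [1] * r
    return {
        v: sum(weights[j] * prod(mu[j][k][v[k]] for k in range(n)) for j in range(r))
        for v in exponent_vectors(n, d)
    }


# Two components (r = 2) in two coordinates (n = 2), moments up to order d = 2.
mu = [
    [[1, 1, 2], [1, 3, 10]],
    [[1, 0, 1], [1, 2, 5]],
]
moments = mixture_moments(mu, 2)
print(moments)
print(len(exponent_vectors(5, 3)), comb(5 + 3 - 1, 3))
```

The last line counts the coordinates for $n = 5$ and $d = 3$: there are 35 of them, so the ambient space is $\mathbb{P}^{34}$.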
We study these and their images under certain coordinate projections
$$ \mathbb{P}^{\binom{n+d-1}{d}-1} \dashrightarrow \mathbb{P}^{|m_\lambda|-1}. $$
Here $\lambda$ is any partition of $d$, and $m_\lambda$ is the set of moments $m_{i_1 i_2 \cdots i_n}$ where $\{i_1, i_2, \ldots, i_n\} \setminus \{0\}$ equals $\lambda$ as a multiset. The images in $\mathbb{P}^{|m_\lambda|-1}$ of the restricted parametrizations (2) and (3) are denoted by $M_{n,\lambda}$ and $\sigma_r(M_{n,\lambda})$. The restricted varieties make sense for statistics because they refer to subclasses of moments that are natural when inferring parameters. We note that $M_{n,\lambda}$ is also a toric variety and $\sigma_r(M_{n,\lambda})$ is its $r$th secant variety. Combining these parametrizations yields the 14-dimensional moment variety $M_{5,3} \subset \mathbb{P}^{34}$. We will discuss the ideals of these toric varieties and their secant varieties later on.
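The coordinate projections can be explored with a short Python sketch that sorts the moment coordinates by partition type. The helper names `partition_type` and `coordinates_by_partition` are hypothetical; the sketch only counts coordinates and does not compute any variety.

```python
from collections import Counter
from itertools import product


def partition_type(v):
    """Nonzero entries of an exponent vector, sorted decreasingly."""
    return tuple(sorted((i for i in v if i > 0), reverse=True))


def coordinates_by_partition(n, d):
    """Number of degree-d moment coordinates in n variables for each partition of d."""
    vectors = (v for v in product(range(d + 1), repeat=n) if sum(v) == d)
    return Counter(partition_type(v) for v in vectors)


sizes = coordinates_by_partition(5, 3)
print(dict(sizes))
```

For $n = 5$ and $d = 3$ the three partitions $(3)$, $(21)$ and $(111)$ account for $5 + 20 + 10 = 35$ coordinates, in agreement with the ambient space $\mathbb{P}^{34}$ of $M_{5,3}$.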
Our study is a sequel to the work of Zhang and Kileel in [21]. That article takes an applied data science perspective, and it offers numerical algorithms for learning the parameters $\mu^{(j)}_{ki}$ from empirical moments $m_{i_1 i_2 \cdots i_n}$. The primary focus in [21] lies on numerical tensor methods for this recovery task.
A key ingredient for their approach is identifiability, which means that the dimension of the moment variety $\sigma_r(M_{n,\bullet})$ matches the number of free parameters.
The present paper lies at the interface of computer algebra and nonparametric statistics. Our set-up is nonparametric in the sense that no model assumptions are made on the constituent random variables on $\mathbb{R}$. Conditional independence arises by passing via (3) to multivariate distributions on $\mathbb{R}^n$. This imposes semialgebraic constraints on the moments $m_{i_1 i_2 \cdots i_n}$. We disregard the inequalities in (1) and focus on polynomial equations. This leads us to projective varieties, as is customary in algebraic statistics [19]. Their defining equations provide test statistics for mixtures of products [14, Section 3].
Our presentation is organized as follows. In Section 2 we demonstrate the wide range of interesting models that are featured here. Our scope includes themes from the early days of algebraic statistics: factor analysis [14] and permutation data [9, Section 6]. For the special partition $\lambda = (1, 1, \ldots, 1)$ we obtain the toric ideals associated with hypersimplices.
In Section 3 we show that our varieties exhibit finiteness up to symmetry, in the sense of Draisma and collaborators [6, 11, 12, 13]. Namely, if $r$, $d$, $\lambda$ are fixed and $n$ is unbounded, then finitely many $S_n$-orbits of polynomials suffice to cut out the varieties $\sigma_r(M_{n,\bullet})$. Therefore, computer algebra can be useful for high-dimensional data analysis in the setting of [21].
Section 4 is a detailed study of the toric varieties $M_{n,\bullet}$. We determine their dimensions, and we investigate their polytopes and toric ideals. The ideal for $M_{n,\lambda}$ is generated by quadrics and cubics, but the ideal for $M_{n,d}$ is more complicated. In Section 5 we turn to identifiability of the secant varieties $\sigma_r(M_{n,\bullet})$ for $r \geq 2$. We present what we know about their dimensions. Our main results are Theorems 25 and 30. These rest on integer programming and tropical geometry.
The parametrization (3) represents a challenging implicitization problem. In Section 6 we report on some computational results, featuring both symbolic and numerical methods. For most examples in this paper we used symbolic computations.

FAMILIAR VARIETIES
The study of highly structured projective varieties is a main theme in algebraic statistics. This includes varieties of discrete probability distributions as well as moment varieties of continuous distributions; see e.g. [1, 15]. Note that Veronese varieties fall into both categories. In this section we match some of our moment varieties $\sigma_r(M_{n,\bullet})$ with the existing literature.
We begin with $d = 2$, so each $(i_1, i_2, \ldots, i_n)$ has at most two non-zero entries. We change notation so that the second moments are the entries of the $n \times n$ covariance matrix.
The star denotes the join of projective varieties, which arises from the Minkowski sum of the corresponding affine cones.
The prime ideal of the model $\sigma_2(M_{5,2})$ is found by eliminating the diagonal entries from the ideal of $3 \times 3$ minors of the symmetric $5 \times 5$ covariance matrix. The elimination ideal is principal, and its generator is the polynomial (4). This quintic is known as the pentad, and it plays an important role in factor analysis [14]. In conclusion, the 8-dimensional moment variety $\sigma_2(M_{5,(11)})$ is already familiar to statisticians.
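The dimension 8 can be checked independently of the pentad by a Jacobian rank computation. The Python sketch below does this in exact rational arithmetic at random integer points; it is an illustration and not the code used for the computations in this paper. It assumes that the mixture weights are omitted, so that the off-diagonal second moments are $\sum_j u^{(j)}_k u^{(j)}_l$ for $k < l$, where $u^{(j)}_k$ (a name chosen here) stands for the first moment of coordinate $k$ in component $j$. The helper names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    if not a:
        return 0
    rows, cols = len(a), len(a[0])
    rk = 0
    for col in range(cols):
        pivot = next((r for r in range(rk, rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rk], a[pivot] = a[pivot], a[rk]
        for r in range(rk + 1, rows):
            if a[r][col] != 0:
                factor = a[r][col] / a[rk][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def jacobian_offdiagonal(points):
    """Jacobian of (k, l) -> sum_j u[j][k] * u[j][l] for k < l, evaluated at u = points."""
    r, n = len(points), len(points[0])
    jac = []
    for k, l in combinations(range(n), 2):
        row = [0] * (r * n)
        for j in range(r):
            row[j * n + k] = points[j][l]  # derivative with respect to u[j][k]
            row[j * n + l] = points[j][k]  # derivative with respect to u[j][l]
        jac.append(row)
    return jac


def secant_dimension(n, r, trials=5, seed=0):
    """Projective dimension estimated as (largest Jacobian rank found) - 1."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        points = [[rng.randint(1, 10**6) for _ in range(n)] for _ in range(r)]
        best = max(best, rank(jacobian_offdiagonal(points)))
    return best - 1


print(secant_dimension(5, 2), secant_dimension(5, 1))
```

The rank at a random point is a lower bound that equals the generic rank with high probability, so several points are tried and the maximum is kept. For $n = 5$ the sketch returns 8 for $r = 2$ and 4 for $r = 1$.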
Remark 4. The moment variety $M_{n,(1^d)}$ is the toric variety associated with the hypersimplex
$$ \Delta(n,d) \;=\; \mathrm{conv}\bigl\{ e_{i_1} + e_{i_2} + \cdots + e_{i_d} \,:\, 1 \leq i_1 < i_2 < \cdots < i_d \leq n \bigr\}. $$
The variety $M_{n,(1^d)}$ has dimension $n - 1$, it lives in $\mathbb{P}^{\binom{n}{d}-1}$, and its degree is the Eulerian number $A(n-1, d-1)$. This is the number of permutations of $[n-1] = \{1, 2, \ldots, n-1\}$ which have exactly $d - 1$ descents. Indeed, the degree of any projective toric variety equals the normalized volume of the associated polytope [18, Theorem 4.16], and the formula $\mathrm{Vol}(\Delta(n,d)) = A(n-1, d-1)$ is well known in algebraic combinatorics. In [16, Theorem 2.2] it is attributed to Laplace.
In the case of the second hypersimplex, one can compute the prime ideal of $\sigma_r(M_{n,(11)})$ by eliminating the diagonal entries from the ideal of $(r+1) \times (r+1)$ minors of the covariance matrix. This elimination problem is tough. For some instances see [14, Table 1].
Example 5. The moment variety $\sigma_5(M_{9,(11)})$ is a hypersurface of degree 54 in $\mathbb{P}^{35}$. The representation of its equation by means of resultants is explained in [14, Example 24].
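For small cases the Eulerian numbers can be checked by brute force. The following Python sketch counts permutations by their number of descents; the function name `eulerian` is a hypothetical helper, and the convention used is the one in which the count is over permutations of $\{1, \ldots, m\}$ with exactly $k$ descents.

```python
from itertools import permutations
from math import comb


def eulerian(m, k):
    """Number of permutations of {1, ..., m} with exactly k descents (brute force)."""
    return sum(
        1
        for p in permutations(range(1, m + 1))
        if sum(p[i] > p[i + 1] for i in range(m - 1)) == k
    )


# Hypersimplex Delta(6, 3): number of vertices and normalized volume.
print(comb(6, 3), eulerian(5, 2))
# Hypersimplex Delta(4, 2), the octahedron.
print(comb(4, 2), eulerian(3, 1))
```

With this convention the hypersimplex $\Delta(6,3)$ has 20 vertices and normalized volume 66, and the octahedron $\Delta(4,2)$ has 6 vertices and normalized volume 4.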
We now turn to a scenario that played a pivotal role in launching algebraic statistics in the 1990s, namely the spectral analysis of permutation data, as described in [9, Section 6]. This is based on the toric ideal associated with the Birkhoff polytope, whose vertices are the $n!$ permutation matrices of size $n \times n$. For an algebraic discussion see [18, Section 14.B].
Proposition 6. The moment variety $M_{n,\lambda}$ for the partition $\lambda = (n-1, n-2, \ldots, 2, 1)$ is the toric variety of the Birkhoff polytope, which lives in $\mathbb{P}^{n!-1}$ and has dimension $(n-1)^2$.
Proof. The moment coordinates $m_{i_1 i_2 \cdots i_n}$ for $M_{n,\lambda}$ are indexed by the $n!$ permutations of $\{0, 1, 2, \ldots, n-1\}$. The monomials on the right hand side of (2) have degree $n - 1$ in $n(n-1)$ distinct parameters $\mu_{ki}$. The exponent vectors of these monomials can be identified with the permutation matrices from which the first row has been removed. This removal is an affine isomorphism, so it preserves the Birkhoff polytope, which has dimension $(n-1)^2$. □
Yamaguchi, Ogawa and Takemura [20] showed that the toric ideal for the variety in Proposition 6 is always generated in degree two and three. Theorem 21 generalizes this result.
The degrees reported in Examples 7 and 8 are the volume of the Birkhoff polytope. This volume is known up to $n = 10$ [2, 3].
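The dimension count for the Birkhoff polytope, and the fact that deleting the first row of each permutation matrix does not change it, can be confirmed for small $n$ with the following Python sketch. The names `rank` and `birkhoff_dimension` are hypothetical helpers; the dimension is computed as the rank of the differences between the vertices and one fixed vertex.

```python
from fractions import Fraction
from itertools import permutations


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    if not a:
        return 0
    rows, cols = len(a), len(a[0])
    rk = 0
    for col in range(cols):
        pivot = next((r for r in range(rk, rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rk], a[pivot] = a[pivot], a[rk]
        for r in range(rk + 1, rows):
            if a[r][col] != 0:
                factor = a[r][col] / a[rk][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def birkhoff_dimension(n, drop_first_row=False):
    """Dimension of the affine span of the n! permutation matrices of size n x n.

    With drop_first_row=True the first row of every matrix is deleted first.
    """
    first = 1 if drop_first_row else 0
    vertices = [
        [1 if perm[row] == col else 0 for row in range(first, n) for col in range(n)]
        for perm in permutations(range(n))
    ]
    base = vertices[0]
    differences = [[x - y for x, y in zip(v, base)] for v in vertices[1:]]
    return rank(differences)


for n in (2, 3, 4):
    print(n, birkhoff_dimension(n), birkhoff_dimension(n, drop_first_row=True))
```

For $n = 2, 3, 4$ both computations give $(n-1)^2 = 1, 4, 9$.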

FINITENESS
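We consider the projective variety $\sigma_r(M_{n,d}) \subset \mathbb{P}^{\binom{n+d-1}{d}-1}$ defined by (3). This parametrization can be understood as follows without any reference to probability or statistics. Namely, the coordinates $m_{i_1 i_2 \cdots i_n}$ are the coefficients of the terms of degree $d$ in a sum of $r$ products of unknown univariate polynomials, one polynomial for each of the $n$ variables in each product. For any partition $\lambda$ of $d$, let $m_\lambda$ be the subset of coefficients $m_{i_1 i_2 \cdots i_n}$ where $\{i_1, i_2, \ldots, i_n\} \setminus \{0\}$ equals $\lambda$ as a multiset. The variety $\sigma_r(M_{n,\lambda})$ is the closure of the image of $\sigma_r(M_{n,d})$ under the map
$$ \mathbb{P}^{\binom{n+d-1}{d}-1} \dashrightarrow \mathbb{P}^{|m_\lambda|-1}. $$
The polynomial in (5) is the truncated moment generating function. Taking $r = 1$ we obtain the toric varieties $M_{n,\bullet}$.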
The equations for these varieties satisfy finiteness up to symmetry when $r$, $d$, $\lambda$ are fixed and $n$ grows. Here symmetry refers to the action of the symmetric group $S_n$ on our varieties, their parametrizations (3), and their prime ideals. These ideals satisfy natural inclusions
$$ I(\sigma_r(M_{n,\bullet})) \;\subset\; I(\sigma_r(M_{n+1,\bullet})), $$
where $\bullet \in \{d, \lambda\}$, by appending a zero to the indices of every coordinate. In symbols,
$$ m_{i_1 i_2 \cdots i_n} \;\mapsto\; m_{i_1 i_2 \cdots i_n 0}. $$
If we iterate these inclusions and let the big symmetric group act, then we obtain inclusions
$$ \bigl\langle S_n \, I(\sigma_r(M_{n_0,\bullet})) \bigr\rangle \;\subseteq\; I(\sigma_r(M_{n,\bullet})) \quad \text{for } n > n_0. \qquad (6) $$
Ideal-theoretic finiteness means that there exists $n_0$ such that equality holds for all $n > n_0$. The weaker notion of set-theoretic finiteness means that equality holds in (6) after the left ideal is enlarged to its radical. The smallest possible $n_0$, if it exists, is a function of $r$, $d$, $\lambda$. In recent years, there has been considerable progress on commutative algebra in infinite polynomial rings with an action of the infinite symmetric group, or of rings over the category FI of finite sets with injections. The following result reflects the state of the art on that topic.
Theorem 10. Given any partition $\lambda \vdash d$ and integer $r \geq 1$, set-theoretic finiteness holds for the varieties $\sigma_r(M_{n,d})$ and $\sigma_r(M_{n,\lambda})$. Ideal-theoretic finiteness holds in the toric case $r = 1$.
Whenever ideal-theoretic finiteness holds, one can try to use equivariant Gröbner bases [6] for computing the desired finite generating set. An implementation for the toric case is described in [13], but we found this to be quite slow. The case $\lambda = (11)$ is covered by Example 9.
Example 12 (Cycles in bipartite graphs). If $r = 1$ and $\lambda = (21)$ then ideal-theoretic finiteness holds with $n_0 = 4$. Namely, the toric ideal of $M_{n,(21)}$
We close with a corollary that generalizes the previous two examples. Its proof rests on a forward reference to the next section, where we derive various results for our toric ideals.
Corollary 14. Fix a partition $\lambda$, fix $r = 1$, and suppose that $n$ increases. The toric varieties $M_{n,\lambda}$ satisfy ideal-theoretic finiteness for some $n_0$ that is at most three times the length of $\lambda$, i.e. three times its number of nonzero parts.
Proof. Theorem 21 says that the ideal of $M_{n,\lambda}$ is generated by binomials of degree at most 3. Each of the two monomials in such a binomial is a product of two or three variables $m_{i_1 i_2 \cdots i_n}$. The two monomials have the same $A$-degree, where $A$ is the matrix representing (2). This implies that the slots $\ell \in \{1, 2, \ldots, n\}$ where a nonzero index $i_\ell$ occurs are the same in both monomials. Each variable has as many nonzero indices as $\lambda$ has nonzero parts, so the total number of such slots is at most three times the length of $\lambda$. This yields the stated bound on $n_0$. □

TORIC COMBINATORICS
In this section we study $M_{n,d}$ and $M_{n,\lambda}$ for some partition $\lambda$ of $d$. With each such toric variety we associate a 0-1 matrix $A$ as in [18] whose columns correspond to the monomials in (2). The rank of $A$ is one more than the dimension of the projective toric variety. We first show that $M_{n,d}$ has the expected dimension, namely the number of parameters minus one.
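The relation between the rank of the 0-1 matrix and the dimension can be tested on small instances. The Python sketch below builds the matrix column by column, with one row for each parameter (the $i$-th moment of the $k$-th coordinate, for $i \geq 1$), and computes its rank exactly. The function names and the row ordering are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    if not a:
        return 0
    rows, cols = len(a), len(a[0])
    rk = 0
    for col in range(cols):
        pivot = next((r for r in range(rk, rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rk], a[pivot] = a[pivot], a[rk]
        for r in range(rk + 1, rows):
            if a[r][col] != 0:
                factor = a[r][col] / a[rk][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def toric_columns(n, d, partition=None):
    """Columns of the 0-1 matrix, one per exponent vector of degree d.

    If a partition is given, only exponent vectors of that type are used.
    Entry k * d + (i - 1) of a column is 1 when the vector has i_k = i > 0.
    """
    columns = []
    for v in product(range(d + 1), repeat=n):
        if sum(v) != d:
            continue
        if partition is not None and sorted(i for i in v if i > 0) != sorted(partition):
            continue
        column = [0] * (n * d)
        for k, i in enumerate(v):
            if i > 0:
                column[k * d + (i - 1)] = 1
        columns.append(column)
    return columns


def toric_dimension(n, d, partition=None):
    """Projective dimension = rank - 1 (the rank is the same for the transpose)."""
    return rank(toric_columns(n, d, partition)) - 1


print(toric_dimension(5, 3))             # all degree-3 moments in 5 variables
print(toric_dimension(6, 3, (1, 1, 1)))  # hypersimplex Delta(6, 3)
print(toric_dimension(3, 3, (2, 1)))     # Birkhoff polytope for n = 3
```

These three calls return 14, 5 and 4, matching the dimensions stated for $M_{5,3}$, $M_{6,(111)}$ and the Birkhoff case $n = 3$.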
Each of these partitions induces (by permutation) at least  columns in the -matrix.For each (, 1, . . ., 1) ⊢ , pick  of these columns such that  appears in each of the  spots.The principal submatrix of  induced by all these columns is an  ×  matrix of the form , where the  row blocks are labeled ( 1 :  ∈ []), . . ., (  :  ∈ []) and the  column blocks are (7).The matrix  gives a column basis for the -matrix of the hypersimplex variety M ,(1,1,...,1) , so it is invertible.We conclude det  = det  ≠ 0, and so rank() = .Now suppose  ≤ .Index the columns of the -matrix by permutations of ( 1 , . . .,   ) with  1 + • • • +   =  ordered reverse-lexicographically. Index the rows by  11 ,  12 , . . .,   .The principal submatrix on the first 2 + 1 rows and columns is invertible, so the first 2 + 1 columns of  are linearly independent.From the remaining columns, we pick  ( − 2) − 1 of them such that for every  = 2 + 1, . . .,  exactly one has 1 in the th coordinate.In this way we obtain  linearly independent columns of .Therefore,  has full rank.□ Given a partition  ⊢  padded by zeroes to have length , we define a partition , called the reduction of .Let  0 ≥ • • • ≥   be the multiplicities of the distinct parts in .Then  = , . . ., We write  for the largest part of , so  + 1 is the number of distinct parts of .For example, the partitions (8, 5, 5, 4) and Proof.We must show that the -matrix of M , has rank ( − 1) + 1.We proceed by induction on , the base case  = 1 being the hypersimplex.We partition the rows of  into  blocks ( 1 :  ∈ []), . . ., (  :  ∈ []).The rows of  in the th block sum to the constant vector (  ,   , . . .  ).Hence, the rank of  is bounded above by ( − 1) + 1.We will show that this is also a lower bound by displaying an invertible submatrix of this size.
The toric ideals for individual partitions are very nice. Our next result builds upon [20].
Theorem 21. For any partition $\lambda$, the ideal of $M_{n,\lambda}$ is generated by quadrics and cubics.
Since  and  are toric, we may verify  ( ) =  on binomials.To show  ( ) ⊆  , fix a binomial in  , say of degree , written as Encode this by two  ×  matrices  = (    ) and  = (  ′   ).Membership in  means that substituting the parametrization (2) into b gives the result 0, and this is equivalent to the multiset of entries in corresponding columns of  and  being equal.This property is preserved after replacing  by  1 , replacing  − 1 by  2 etc. throughout  and .So  (b) ∈  as desired.
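The membership criterion used in this step, namely that a binomial vanishes under (2) exactly when corresponding columns of the two index matrices have equal multisets of entries, is easy to test by machine. In the Python sketch below each monomial is encoded by the list of index vectors of its variables. The names `same_column_multisets` and `evaluate_monomial` and the sample parameter values are hypothetical.

```python
from collections import Counter
from math import prod


def same_column_multisets(rows_a, rows_b):
    """True if corresponding columns of the two index matrices agree as multisets."""
    if len(rows_a) != len(rows_b):
        return False
    return all(
        Counter(col_a) == Counter(col_b)
        for col_a, col_b in zip(zip(*rows_a), zip(*rows_b))
    )


def evaluate_monomial(rows, mu):
    """Substitute the product parametrization: each index vector (i_1, ..., i_n)
    contributes the product over k of mu[k][i_k], where mu[k][0] == 1."""
    return prod(mu[k][i] for row in rows for k, i in enumerate(row))


# A cubic binomial for the partition (2, 1) with n = 3.
first = [(2, 1, 0), (0, 2, 1), (1, 0, 2)]
second = [(1, 2, 0), (0, 1, 2), (2, 0, 1)]
mu = [[1, 2, 3], [1, 5, 7], [1, 11, 13]]  # arbitrary sample values

print(same_column_multisets(first, second))
print(evaluate_monomial(first, mu) == evaluate_monomial(second, mu))
```

Both checks succeed for this cubic, whereas the pair `[(2, 1, 0), (0, 2, 1)]` and `[(1, 2, 0), (2, 0, 1)]` fails the column test and therefore does not give a binomial in the ideal.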
To prove  ⊆  ( ), be a binomial in  encoded by matrices  = (   ) and  = ( ′   ).We will construct a binomial d ∈  such that  (d) = c.In terms of matrices  and , in each of their rows we must choose one element that equals  1 and replace it by , then choose another element that equals  2 and replace it by  − 1, and so forth until the set of nonzero elements in each row has been replaced by [𝑒], in such a way so that the multiset of entries in corresponding columns of the transformed matrices  and  are equal.To achieve this it suffices to consider distinct values in  one at a time.
Without loss of generality, assume  = (1  ).Now  and  have  ones and  −  zeros per each row.To choose the elements to replace by , we consider a bipartite multigraph between the rows of  and the rows of , where an edge is drawn between a row in  and a row in  for every column in which there is a 1 in both rows.A perfect matching would give a valid choice of elements to replace by .Such a matching exists by Hall's Marriage Theorem.Indeed, for any subset of rows in  their neighborhood must contain at least | | rows in .Otherwise, there exists a column in  with more ones than the corresponding column in , since each row contains the same number of ones.But this contradicts c ∈  .Similarly, we carry out the subsequent replacements.Thus a suitable binomial d exists.It follows  ⊆  ( ).Combining with the preceding paragraph, we conclude  ( ) =  .□ By contrast, the ideals for M , appear to be more complicated.We conjecture that there does not exist a uniform degree bound for their generators that is independent of , .
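The matching argument in the proof above can be made algorithmic. The following Python sketch uses augmenting paths (Kuhn's algorithm, which is a standard technique and not taken from this paper) to pair each row of one 0-1 matrix with a distinct row of the other so that paired rows share a column containing a 1 in both. The function name `perfect_matching` and the return format are hypothetical.

```python
def perfect_matching(rows_b, rows_c):
    """Return {row of B: (row of C, shared column)} or None if no perfect matching exists."""
    size = len(rows_b)

    def shared_columns(s, t):
        return [c for c, (x, y) in enumerate(zip(rows_b[s], rows_c[t])) if x == 1 and y == 1]

    neighbors = [[t for t in range(size) if shared_columns(s, t)] for s in range(size)]
    match_of_c = [None] * size  # match_of_c[t] = row of B currently matched to row t of C

    def augment(s, seen):
        """Try to match row s of B, re-routing earlier matches if necessary."""
        for t in neighbors[s]:
            if t in seen:
                continue
            seen.add(t)
            if match_of_c[t] is None or augment(match_of_c[t], seen):
                match_of_c[t] = s
                return True
        return False

    for s in range(size):
        if not augment(s, set()):
            return None
    return {s: (t, shared_columns(s, t)[0]) for t, s in enumerate(match_of_c)}


# The quadratic binomial m_1100 m_0011 - m_1010 m_0101 of the hypersimplex Delta(4, 2).
rows_b = [(1, 1, 0, 0), (0, 0, 1, 1)]
rows_c = [(1, 0, 1, 0), (0, 1, 0, 1)]
matching = perfect_matching(rows_b, rows_c)
print(matching)
```

Each matched pair comes with a column in which both rows have a 1; these are the entries that would be relabelled simultaneously in the two matrices.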

SECANT VARIETIES
Theorems 16 and 19 gave the dimensions of our moment varieties for $r = 1$. We next focus on $r \geq 2$, where $\sigma_r(M_{n,d})$ and $\sigma_r(M_{n,\lambda})$ are no longer toric. We begin with an example.
Our first result explains the drop in dimension seen in the example above.
Proposition 24.The dimension of the moment variety satisfies the upper bound Proof.The given toric variety is a cone over the projective space M ,( ) = P −1 .In symbols, M , = P −1 ★ M , , where M , is the toric variety given by all + −1  − moments that involve more than one coordinate.By counting parameters, we find dim( M , ) ≤ ( − 1) − 1.We obtain the secant variety of the big toric variety as the join of the apex with the reduced toric variety:   (M , ) = P −1 ★  ( M , ).The dimension of the right-hand side is bounded above by  +  dim( M , ) +  − 1 ≤  +  • ( − 1) − 1 +  − 1.This yields (9).□ We found the inequality ( 9) to be strict when  ≥ .The following sharper bound holds.(To see it is sharper, consider  = [] and  = { } in (10).)Theorem 25.The dimension of the secant variety   (M , ) is bounded above by the optimal value of the following integer linear programming problem: and The last sum ranges over partitions  ⊢  of length ≤  having nonempty intersection with .
Proof.The secant variety   (M , ) is parameterized by the polynomial map (3).Therefore its dimension is one less than the maximal rank assumed by the differential of (3).This Jacobian matrix has size + −1  × , where the rows are labeled by  for  = 1, . . .,  and  = 1, . . .,  .We view this as a block matrix, where the rows are grouped according to the partition  given by ( 1 , . . .,   ) and the columns are grouped according to the degree .Notice that the matrix is sparse, in that a block labeled by (, ) is nonzero only if  ∈ .
Let C be a set of linearly independent columns in the Jacobian matrix, with   columns labeled by .The integers   satisfy 0 ≤   ≤  for  = 1, . . ., .Let  ⊆ [] and C ′ the subset of columns in C that are labeled by elements of .Since C ′ is linearly independent, the number of rows which are nonzero in C exceeds |C ′ |.By the aforementioned sparsity, ∑︁ We conclude that |C| − 1 is bounded above by the maximum value in (10), as desired.□ Solving an integer linear program is expensive in general.However, the integer linear program in (10) has a special structure which allows for a greedy solution that is optimal.Theorem 26.We construct a feasible solution for (10) greedily, starting with c (0) = 0 in Z  .For  = 1, . . .,  , choose c ( ) ∈ Z  such that and if is optimal for the integer linear program (10).
Proof.We claim that c ( ) is optimal for the linear program (10), with integrality constraints dropped.The dual linear program has variables   for ∅ ≠  ⊆ [] and   for  ∈ [].This dual linear program equals: It suffices to find a dual feasible point at which the dual objective equals the primal objective evaluated at c ( ) .We call a set  ⊆ [] saturated if equality holds in (11) for c ( ) .We define Hence  ∪  is saturated by primal feasibility.Thus there is a unique maximal saturated subset of [].It follows that the dual objective evaluated at (y ( ) , z ( ) ) equals the primal objective evaluated at c ( ) .This completes the proof.□ We conjecture that the integer linear program (10) computes the correct dimension: Conjecture 27.If  ≥ 3 then the bound for dim(  (M , )) in Theorem 25 is tight.
Informally, the conjecture says that the secant variety has the maximal dimension possible given the sparsity pattern of its parameterization (3).This has been verified in many cases.The question of finding the dimension is equally intriguing if we replace the parameter  by one specific partition  ⊢ .Of particular interest is the partition  = (1, 1, 1, . . ., 1) = (1  ).This toric variety has dimension  − 1, and hence we have the trivial upper bound dim   (M ,(1  ) ) ≤  ( − 1) Based on extensive computations, we conjecture that equality holds outside the matrix case: Conjecture 29.Secant varieties of hypersimplices, other than the second hypersimplex, have the expected dimension.In symbols, if 3 Theorem 5.1 in [21] implies   (M ,(1  ) ) is strongly identifiable for  ≲  ⌊ ( −1)/2⌋ .In particular, the secant variety has the expected dimension.Our next result is that the secant variety also has the expected dimension if  ≲   −2 .The proof relies on tropical geometry [10].
Theorem 30.The secant variety of the hypersimplex has the expected dimension if Proof.Assume ( 13 Note that the parenthesized sum on the left-hand side of ( 13) equals the size of each set   , while the right-hand side gives the size of .Thus, (13) guarantees that we choose at least  points in .Furthermore, these points differ pairwise in at least 3 coordinates by construction.So, the th Voronoi cell contains all elements of  that differ from   in at most one coordinate.That is, it contains   and all vertices in the hypersimplex adjacent to   .Hence Vor  () has the same affine span as Δ(, ).□

IMPLICITIZATION
We verified the dimensions in Section 5 with numerical methods for fairly large instances, by computing the rank of the Jacobian matrix of the parametrization (2). For this we employed Maple, Julia, and the numerical Macaulay2 package in [8]. We found it much more difficult to solve the implicitization problem, that is, to compute the defining polynomials of our moment varieties. The pentad (4) suggests that such polynomials can be quite interesting. In this section we offer more examples of equations, along with the degrees for our varieties. The code used in our computations is available on MathRepo.
Remark 31. It is preferable to work with birational parameterizations when numerically computing the degree of a variety [7]. However the map (2) is -to-
Example 32 ($n = 6$, $d = 3$). The 5-dimensional toric variety $M_{6,(111)}$ lives in $\mathbb{P}^{19}$, and it has degree $A(5, 2) = 66$ by Remark 4. Its toric ideal is minimally generated by 69 binomial quadrics. These quadrics are the $2 \times 2$ minors that are visible (i.e. do not involve any stars) in a masked Hankel matrix of size $6 \times 15$. This matrix has twenty $3 \times 3$ minors without $\star$, and these vanish on $\sigma_2(M_{6,(111)})$. In addition to these cubics, the ideal contains 12 pentads (4), one for each facet $\Delta(5, 2)$ of the hypersimplex $\Delta(6, 3)$. Our ideal for $r = 2$ is generated by these 20 cubics and 12 quintics. Numerical degree computations using Remark 31 with HomotopyContinuation.jl [5]
Note the beautiful combinatorics in this polynomial: the role of the 5-cycle for the pentad is now played by the quadrilateral set, i.e. the six intersection points of four lines in the plane.
We conclude this article with the smallest non-trivial secant varieties. Here "non-trivial" means $r \geq 2$, the variety does not fill its ambient projective space, and the ambient dimension is as small as possible. The next two results feature all cases where $\binom{n+d-1}{d} \leq 50$. The list consists of $(r, n, d) = (2, 5, 3)$ from Example 23 and $(r, n, d) = (2, 4, 4)$ from Example 20. We state these as propositions because they represent case studies that are of independent interest for experimental mathematics, especially in the ubiquitous setting of tensor decompositions.
Section 1.1].In the formulation of [12, Corollary 1.1.2],the parametrization takes the  × N matrix whose entries are   to the N × • • • × N ( times) tensor whose entries are   1 •••  viewed as degree- moments in  dimensions.The closure of the image is topologically Sym(N)-Noetherian, which yields set-theoretic finiteness for   (M , ).The case of   (M , ) is similar.□ Remark 11.It is conjectured in [12, Conjecture 1.1.3]that the main result in [18,ple 13 (Hypersimplex).As seen in Section 2, when  = 1 and  = (1  ), the moment variety M , is the toric variety associated to the hypersimplex Δ(, ).Its ideal is generated by quadrics[18, Section 14A].The indices occurring in each quadratic binomial are 1 in at most 2 of the  coordinates.Therefore, ideal-theoretic finiteness holds with  0 = 2.