HOME

TheInfoList



OR:

Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: Given a ratings matrix in which each entry (i,j) represents the rating of movie j by customer i, if customer i has watched movie j and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Another example is the
document-term matrix A document-term matrix is a mathematical Matrix (mathematics), matrix that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspo ...
: The frequencies of words used in a collection of documents can be represented as a matrix, where each entry corresponds to the number of times the associated term appears in the indicated document. Without any restrictions on the number of
degrees of freedom In many scientific fields, the degrees of freedom of a system is the number of parameters of the system that may vary independently. For example, a point in the plane has two degrees of freedom for translation: its two coordinates; a non-infinite ...
in the completed matrix, this problem is underdetermined since the hidden entries could be assigned arbitrary values. Thus, we require some assumption on the matrix to create a well-posed problem, such as assuming it has maximal determinant, is positive definite, or is low-rank. For example, one may assume the matrix has low-rank structure, and then seek to find the lowest
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
matrix or, if the rank of the completed matrix is known, a matrix of
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
r that matches the known entries. The illustration shows that a partially revealed rank-1 matrix (on the left) can be completed with zero-error (on the right) since all the rows with missing entries should be the same as the third row. In the case of the Netflix problem the ratings matrix is expected to be low-rank since user preferences can often be described by a few factors, such as the movie genre and time of release. Other applications include computer vision, where missing pixels in images need to be reconstructed, detecting the global positioning of sensors in a network from partial distance information, and multiclass learning. The matrix completion problem is in general
NP-hard In computational complexity theory, a computational problem ''H'' is called NP-hard if, for every problem ''L'' which can be solved in non-deterministic polynomial-time, there is a polynomial-time reduction from ''L'' to ''H''. That is, assumi ...
, but under additional assumptions there are efficient algorithms that achieve exact reconstruction with high probability. In statistical learning point of view, the matrix completion problem is an application of matrix regularization which is a generalization of vector
regularization Regularization may refer to: * Regularization (linguistics) * Regularization (mathematics) * Regularization (physics) * Regularization (solid modeling) * Regularization Law, an Israeli law intended to retroactively legalize settlements See also ...
. For example, in the low-rank matrix completion problem one may apply the regularization penalty taking the form of a nuclear norm R(X) = \lambda\, X\, _*


Low rank matrix completion

One of the variants of the matrix completion problem is to find the lowest
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
matrix X which matches the matrix M, which we wish to recover, for all entries in the set E of observed entries. The mathematical formulation of this problem is as follows: :\begin & \underset & \text (X) \\ & \text & X_ = M_ & \;\; \forall i,j \in E\\ \end Candès and Recht proved that with assumptions on the sampling of the observed entries and sufficiently many sampled entries this problem has a unique solution with high probability. An equivalent formulation, given that the matrix M to be recovered is known to be of
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
r, is to solve for X where X_ = M_ \;\; \forall i,j \in E


Assumptions

A number of assumptions on the sampling of the observed entries and the number of sampled entries are frequently made to simplify the analysis and to ensure the problem is not underdetermined.


Uniform sampling of observed entries

To make the analysis tractable, it is often assumed that the set E of observed entries and fixed
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
is sampled uniformly at random from the collection of all subsets of entries of cardinality , E, . To further simplify the analysis, it is instead assumed that E is constructed by
Bernoulli sampling In the theory of finite population sampling, Bernoulli sampling is a sampling process where each element of the statistical population, population is subjected to an statistical independence, independent Bernoulli trial which determines whether the ...
, i.e. that each entry is observed with probability p . If p is set to \frac where N is the desired expected
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
of E, and m,\;n are the dimensions of the matrix (let m < n without loss of generality), , E, is within O(n \log n) of N with high probability, thus
Bernoulli sampling In the theory of finite population sampling, Bernoulli sampling is a sampling process where each element of the statistical population, population is subjected to an statistical independence, independent Bernoulli trial which determines whether the ...
is a good approximation for uniform sampling. Another simplification is to assume that entries are sampled independently and with replacement.


Lower bound on number of observed entries

Suppose the m by n matrix M (with m < n) we are trying to recover has
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
r. There is an information theoretic lower bound on how many entries must be observed before M can be uniquely reconstructed. The set of m by n matrices with rank less than or equal to r is an algebraic variety in ^with dimension (n+m)r - r^2. Using this result, one can show that at least 4nr - 4r^2 entries must be observed for matrix completion in ^ to have a unique solution when r \leq n/2 . Secondly, there must be at least one observed entry per row and column of M. The
singular value decomposition In linear algebra, the singular value decomposition (SVD) is a Matrix decomposition, factorization of a real number, real or complex number, complex matrix (mathematics), matrix into a rotation, followed by a rescaling followed by another rota ...
of M is given by U\Sigma V^\dagger. If row i is unobserved, it is easy to see the i^ right singular vector of M, v_i, can be changed to some arbitrary value and still yield a matrix matching M over the set of observed entries. Similarly, if column j is unobserved, the j^ left singular vector of M, u_i can be arbitrary. If we assume Bernoulli sampling of the set of observed entries, the Coupon collector effect implies that entries on the order of O(n\log n) must be observed to ensure that there is an observation from each row and column with high probability. Combining the necessary conditions and assuming that r \ll m, n (a valid assumption for many practical applications), the lower bound on the number of observed entries required to prevent the problem of matrix completion from being underdetermined is on the order of nr\log n .


Incoherence

The concept of incoherence arose in
compressed sensing Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a Signal (electronics), signal by finding solutions to Underdetermined s ...
. It is introduced in the context of matrix completion to ensure the singular vectors of M are not too "sparse" in the sense that all coordinates of each singular vector are of comparable magnitude instead of just a few coordinates having significantly larger magnitudes. The standard basis vectors are then undesirable as singular vectors, and the vector \frac \begin 1 \\ 1 \\ \vdots \\ 1 \end in \mathbb^n is desirable. As an example of what could go wrong if the singular vectors are sufficiently "sparse", consider the m by n matrix \begin 1 & 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & 0 & 0 & 0 \end with
singular value decomposition In linear algebra, the singular value decomposition (SVD) is a Matrix decomposition, factorization of a real number, real or complex number, complex matrix (mathematics), matrix into a rotation, followed by a rescaling followed by another rota ...
I_m \begin 1 & 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & 0 & 0 & 0 \end I_n. Almost all the entries of M must be sampled before it can be reconstructed. Candès and Recht define the coherence of a matrix U with
column space In linear algebra, the column space (also called the range or image) of a matrix ''A'' is the span (set of all possible linear combinations) of its column vectors. The column space of a matrix is the image or range of the corresponding matr ...
an r-dimensional subspace of \mathbb^n as \mu (U) = \frac \max_ \, P_U e_i\, ^2 , where P_U is the orthogonal
projection Projection or projections may refer to: Physics * Projection (physics), the action/process of light, heat, or sound reflecting from a surface to another in a different direction * The display of images by a projector Optics, graphics, and carto ...
onto U . Incoherence then asserts that given the
singular value decomposition In linear algebra, the singular value decomposition (SVD) is a Matrix decomposition, factorization of a real number, real or complex number, complex matrix (mathematics), matrix into a rotation, followed by a rescaling followed by another rota ...
U\Sigma V^\dagger of the m by n matrix M, # \mu (U), \; \mu (V) \leq \mu_0 # The entries of \sum_k u_k v_k ^\dagger have magnitudes upper bounded by \mu_1 \sqrt for some \mu_0, \; \mu_1.


Low rank matrix completion with noise

In real world application, one often observe only a few entries corrupted at least by a small amount of noise. For example, in the Netflix problem, the ratings are uncertain. Candès and Plan showed that it is possible to fill in the many missing entries of large low-rank matrices from just a few noisy samples by nuclear norm minimization. The noisy model assumes that we observe : Y_ = M_ + Z_, (i,j) \in \Omega, where is a noise term. Note that the noise can be either stochastic or deterministic. Alternatively the model can be expressed as : P_\Omega(Y) = P_\Omega(M) + P_\Omega(Z), where Z is an n \times n matrix with entries Z_ for (i,j) \in \Omega assuming that \, P_\Omega(Z)\, _F\leq\delta for some \delta > 0 .To recover the incomplete matrix, we try to solve the following optimization problem: : \begin & \underset & \, X\, _* \\ & \text & \, P_\Omega(X-Y)\, _F \leq \delta\\ \end Among all matrices consistent with the data, find the one with minimum nuclear norm. Candès and Plan have shown that this reconstruction is accurate. They have proved that when perfect noiseless recovery occurs, then matrix completion is stable vis a vis perturbations. The error is proportional to the noise level \delta. Therefore, when the noise level is small, the error is small. Here the matrix completion problem does not obey the restricted isometry property (RIP). For matrices, the RIP would assume that the sampling operator obeys : (1-\delta)\, X\, ^2_F \leq \frac\, P_\Omega(X)\, ^2_F \leq (1+\delta)\, X\, ^2_F for all matrices X with sufficiently small rank and \delta<1 sufficiently small. The methods are also applicable to sparse signal recovery problems in which the RIP does not hold.


High-rank matrix completion

The high-rank matrix completion in general is
NP-hard In computational complexity theory, a computational problem ''H'' is called NP-hard if, for every problem ''L'' which can be solved in non-deterministic polynomial-time, there is a polynomial-time reduction from ''L'' to ''H''. That is, assumi ...
. However, with certain assumptions, some incomplete high rank matrix or even full rank matrix can be completed. Eriksson, Balzano and Nowak have considered the problem of completing a matrix with the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. Since the columns belong to a union of subspaces, the problem may be viewed as a missing-data version of the subspace clustering problem. Let X be an n \times N matrix whose (complete) columns lie in a union of at most k subspaces, each of \operatorname \leq r < n, and assume N \gg kn. Eriksson, Balzano and Nowak showed that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version so long as at least CrN\log^2(n) entries of X are observed uniformly at random, with C>1 a constant depending on the usual incoherence conditions, the geometrical arrangement of subspaces, and the distribution of columns over the subspaces. The algorithm involves several steps: (1) local neighborhoods; (2) local subspaces; (3) subspace refinement; (4) full matrix completion. This method can be applied to Internet distance matrix completion and topology identification.


Algorithms for low-rank matrix completion

Various matrix completion algorithms have been proposed. These include convex relaxation-based algorithm, gradient-based algorithm, alternating minimization-based algorithm, Gauss-Newton algorithm, and discrete-aware based algorithm.


Convex relaxation

The rank minimization problem is
NP-hard In computational complexity theory, a computational problem ''H'' is called NP-hard if, for every problem ''L'' which can be solved in non-deterministic polynomial-time, there is a polynomial-time reduction from ''L'' to ''H''. That is, assumi ...
. One approach, proposed by Candès and Recht, is to form a
convex Convex or convexity may refer to: Science and technology * Convex lens, in optics Mathematics * Convex set, containing the whole line segment that joins points ** Convex polygon, a polygon which encloses a convex set of points ** Convex polytop ...
relaxation of the problem and minimize the nuclear
norm Norm, the Norm or NORM may refer to: In academic disciplines * Normativity, phenomenon of designating things as good or bad * Norm (geology), an estimate of the idealised mineral content of a rock * Norm (philosophy), a standard in normative e ...
\, M\, _* (which gives the sum of the singular values of M) instead of \text(M) (which counts the number of non zero singular values of M). This is analogous to minimizing the L1-
norm Norm, the Norm or NORM may refer to: In academic disciplines * Normativity, phenomenon of designating things as good or bad * Norm (geology), an estimate of the idealised mineral content of a rock * Norm (philosophy), a standard in normative e ...
rather than the L0-
norm Norm, the Norm or NORM may refer to: In academic disciplines * Normativity, phenomenon of designating things as good or bad * Norm (geology), an estimate of the idealised mineral content of a rock * Norm (philosophy), a standard in normative e ...
for vectors. The
convex Convex or convexity may refer to: Science and technology * Convex lens, in optics Mathematics * Convex set, containing the whole line segment that joins points ** Convex polygon, a polygon which encloses a convex set of points ** Convex polytop ...
relaxation can be solved using
semidefinite programming Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of po ...
(SDP) by noticing that the optimization problem is equivalent to : \begin & \min\limits_ & & \operatorname (W_1) + \operatorname (W_2) \\ & \text & & X_ = M_ \;\; \forall i,j \in E\\ & & & \begin W_1 & X \\ X^T & W_2 \end \succeq 0 \end The complexity of using SDP to solve the convex relaxation is O(\text(m,n)^4). State of the art solvers like SDPT3 can only handle matrices of size up to 100 by 100. An alternative first order method that approximately solves the convex relaxation is the Singular Value Thresholding Algorithm introduced by Cai, Candès and Shen. Candès and Recht show, using the study of random variables on
Banach space In mathematics, more specifically in functional analysis, a Banach space (, ) is a complete normed vector space. Thus, a Banach space is a vector space with a metric that allows the computation of vector length and distance between vectors and ...
s, that if the number of observed entries is on the order of \maxnr \log n (assume without loss of generality m < n), the rank minimization problem has a unique solution which also happens to be the solution of its convex relaxation with probability 1-\frac for some constant c. If the rank of M is small ( r \leq \frac), the size of the set of observations reduces to the order of \mu_0 n^ r \log n. These results are near optimal, since the minimum number of entries that must be observed for the matrix completion problem to not be underdetermined is on the order of nr \log n. This result has been improved by Candès and Tao. They achieve bounds that differ from the optimal bounds only by
polylogarithm In mathematics, the polylogarithm (also known as Jonquière's function, for Alfred Jonquière) is a special function of order and argument . Only for special values of does the polylogarithm reduce to an elementary function such as the natur ...
ic factors by strengthening the assumptions. Instead of the incoherence property, they assume the strong incoherence property with parameter \mu_3. This property states that: # , \langle e_a, P_U e_ \rangle - \frac 1_ , \leq \mu_3 \frac for a, a' \leq m and , \langle e_b, P_U e_ \rangle - \frac 1_ , \leq \mu_3 \frac for b, b' \leq n # The entries of \sum_i u_i v_i^\dagger are bounded in magnitude by \mu_3 \sqrt Intuitively, strong incoherence of a matrix U asserts that the orthogonal projections of standard basis vectors to U has magnitudes that have high likelihood if the singular vectors were distributed randomly. Candès and Tao find that when r is O(1) and the number of observed entries is on the order of \mu_3^4 n(\log n)^2, the rank minimization problem has a unique solution which also happens to be the solution of its convex relaxation with probability 1-\frac for some constant c. For arbitrary r, the number of observed entries sufficient for this assertion hold true is on the order of \mu_3^2 nr (\log n)^6 Another convex relaxation approach is to minimize the Frobenius squared norm under a rank constraint. This is equivalent to solving : \begin & \min\limits_ & & \Vert X \Vert_F^2 \\ & \text & & X_ = M_ \;\; \forall i,j \in E\\ & & & \operatorname(X) \leq k. \end By introducing an orthogonal projection matrix Y (meaning Y^2=Y, Y=Y') to model the rank of X via X=YX, \text(Y)\leq k and taking this problem's convex relaxation, we obtain the following semidefinite program : \begin & \min\limits_ & & \text(\theta) \\ & \text & & X_ = M_ \;\; \forall i,j \in E\\ & & & \operatorname(Y) \leq k, 0 \preceq Y \preceq I\\ & & & \begin Y & X \\ X^\top & \theta \end\succeq 0. \end If ''Y'' is a projection matrix (i.e., has binary eigenvalues) in this relaxation, then the relaxation is tight. Otherwise, it gives a valid lower bound on the overall objective. Moreover, it can be converted into a feasible solution with a (slightly) larger objective by rounding the eigenvalues of ''Y'' greedily. Remarkably, this convex relaxation can be solved by alternating minimization on ''X'' and ''Y'' without solving any SDPs, and thus it scales beyond the typical numerical limits of state-of-the-art SDP solvers like SDPT3 or Mosek. This approach is a special case of a more general reformulation technique, which can be applied to obtain a valid lower bound on any low-rank problem with a trace-matrix-convex objective.


Gradient descent

Keshavan, Montanari and Oh consider a variant of matrix completion where the
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
of the m by n matrix M, which is to be recovered, is known to be r. They assume
Bernoulli sampling In the theory of finite population sampling, Bernoulli sampling is a sampling process where each element of the statistical population, population is subjected to an statistical independence, independent Bernoulli trial which determines whether the ...
of entries, constant aspect ratio \frac, bounded magnitude of entries of M (let the upper bound be M_), and constant
condition number In numerical analysis, the condition number of a function measures how much the output value of the function can change for a small change in the input argument. This is used to measure how sensitive a function is to changes or errors in the inpu ...
\frac (where \sigma_1 and \sigma_r are the largest and smallest singular values of M respectively). Further, they assume the two incoherence conditions are satisfied with \mu_0 and \mu_1 \frac where \mu_0 and \mu_1 are constants. Let M^E be a matrix that matches M on the set E of observed entries and is 0 elsewhere. They then propose the following algorithm: # Trim M^E by removing all observations from columns with degree larger than \frac by setting the entries in the columns to 0. Similarly remove all observations from rows with degree larger than \frac. # Project M^E onto its first r principal components. Call the resulting matrix \text(M^E). # Solve \min_ \min_ \frac \sum_ (M_ - (XSY^\dagger)_)^2 + \rho G(X,Y) where G(X,Y) is some
regularization Regularization may refer to: * Regularization (linguistics) * Regularization (mathematics) * Regularization (physics) * Regularization (solid modeling) * Regularization Law, an Israeli law intended to retroactively legalize settlements See also ...
function by
gradient descent Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction of the gradi ...
with
line search In optimization, line search is a basic iterative approach to find a local minimum \mathbf^* of an objective function f:\mathbb R^n\to\mathbb R. It first finds a descent direction along which the objective function f will be reduced, and then co ...
. Initialize X,\;Y at X_0,\;Y_0 where \text(M_E) = X_0 S_0 Y_0^\dagger. Set G(X,Y) as some function forcing X, \; Y to remain incoherent throughout gradient descent if X_0 and Y_0 are incoherent. # Return the matrix XSY^\dagger. Steps 1 and 2 of the algorithm yield a matrix \text(M^E) very close to the true matrix M (as measured by the root mean square error (RMSE)) with high probability. In particular, with probability 1-\frac, \frac \, M - \text(M^E) \, _F^2 \leq C \frac \sqrt for some constant C. \, \cdot \, _F denotes the Frobenius
norm Norm, the Norm or NORM may refer to: In academic disciplines * Normativity, phenomenon of designating things as good or bad * Norm (geology), an estimate of the idealised mineral content of a rock * Norm (philosophy), a standard in normative e ...
. Note that the full suite of assumptions is not needed for this result to hold. The incoherence condition, for example, only comes into play in exact reconstruction. Finally, although trimming may seem counter intuitive as it involves throwing out information, it ensures projecting M^E onto its first r principal components gives more information about the underlying matrix M than about the observed entries. In Step 3, the space of candidate matrices X,\;Y can be reduced by noticing that the inner minimization problem has the same solution for (X,Y) as for (XQ,YR) where Q and R are
orthonormal In linear algebra, two vectors in an inner product space are orthonormal if they are orthogonal unit vectors. A unit vector means that the vector has a length of 1, which is also known as normalized. Orthogonal means that the vectors are all perpe ...
r by r matrices. Then
gradient descent Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction of the gradi ...
can be performed over the
cross product In mathematics, the cross product or vector product (occasionally directed area product, to emphasize its geometric significance) is a binary operation on two vectors in a three-dimensional oriented Euclidean vector space (named here E), and ...
of two Grassman manifolds. If r \ll m,\;n and the observed entry set is in the order of nr\log n, the matrix returned by Step 3 is exactly M. Then the algorithm is order optimal, since we know that for the matrix completion problem to not be underdetermined the number of entries must be in the order of nr\log n.


Alternating least squares minimization

Alternating minimization represents a widely applicable and empirically successful approach for finding low-rank matrices that best fit the given data. For example, for the problem of low-rank matrix completion, this method is believed to be one of the most accurate and efficient, and formed a major component of the winning entry in the Netflix problem. In the alternating minimization approach, the low-rank target matrix is written in a
bilinear form In mathematics, a bilinear form is a bilinear map on a vector space (the elements of which are called '' vectors'') over a field ''K'' (the elements of which are called '' scalars''). In other words, a bilinear form is a function that is linea ...
: X= UV^T; the algorithm then alternates between finding the best U and the best V. While the overall problem is non-convex, each sub-problem is typically convex and can be solved efficiently. Jain, Netrapalli and Sanghavi have given one of the first guarantees for performance of alternating minimization for both matrix completion and matrix sensing. The alternating minimization algorithm can be viewed as an approximate way to solve the following non-convex problem: \begin & \underset & \, P_\Omega(UV^T)-P_\Omega(M)\, ^2_F \\ \end The AltMinComplete Algorithm proposed by Jain, Netrapalli and Sanghavi is listed here: # Input: observed set \Omega, values P_\Omega(M) # Partition \Omega into 2T+1 subsets \Omega_0,\cdots,\Omega_ with each element of \Omega belonging to one of the \Omega_t with equal probability (sampling with replacement) # \hat^0 = SVD(\fracP_(M), k) i.e., top-k left singular vectors of \fracP_(M) # Clipping: Set all elements of \hat^0 that have magnitude greater than \frac to zero and orthonormalize the columns of \hat^0 # for t = 0, \cdots, T-1 do # \quad \hat^\leftarrow \text_\, P_(\hatV^T-M)\, ^2_F # \quad \hat^\leftarrow \text_\, P_(U(\hat^)^T-M)\, ^2_F # end for # Return X= \hat^T(\hat^T)^T They showed that by observing , \Omega, = O((\frac)^6k^7\log n \log (k \, M\, _F/\epsilon)) random entries of an incoherent matrix M, AltMinComplete algorithm can recover M in O(\log(1/\epsilon)) steps. In terms of sample complexity (, \Omega, ), theoretically, Alternating Minimization may require a bigger \Omega than Convex Relaxation. However empirically it seems not the case which implies that the sample complexity bounds can be further tightened. In terms of time complexity, they showed that AltMinComplete needs time O(, \Omega, k^2\log(1/\epsilon)). It is worth noting that, although convex relaxation based methods have rigorous analysis, alternating minimization based algorithms are more successful in practice.


Gauss-Newton

A simple addition to factorization-based algorithms is Gauss–Newton Matrix Recovery (GNMR). Similar to alternating minimization, GNMR addresses the factorized low-rank matrix completion objective: \begin & \underset & \, P_\Omega(UV^T)-P_\Omega(M)\, ^2_F \\ \end Inspired by the classical Gauss-Newton approach, GNMR linearizes the objective. This results in the following linear
least-squares The method of least squares is a mathematical optimization technique that aims to determine the best fit function by minimizing the sum of the squares of the differences between the observed values and the predicted values of the model. The me ...
subproblem: \begin & \underset & \, P_\Omega(U_0 V_0^T + U_0 \Delta V^T + \Delta U V_0^T)-P_\Omega(M)\, ^2_F \\ \end Starting from an initialization (U_0, V_0), GNMR iteratively solves the linear least squares subproblem and updates U_ \leftarrow U_t + \Delta U, V_ \leftarrow V_t + \Delta V until convergence. Since the least squares subproblem is rank deficient, GNMR selects the minimal norm solution, thereby preserving balance between U and V without explicit regularization. This algorithm was shown to enjoy strong theoretical guarantees. In addition, despite its simplicity, empirical results indicate that GNMR outperforms several popular algorithms, particularly when observations are sparse or the matrix is ill-conditioned.


Discrete-aware matrix completion

In applications such as recommender systems, where matrix entries are discrete (e.g., integer ratings from 1 to 5), incorporating this discreteness into the matrix completion problem can improve performance. Discrete-aware matrix completion approaches introduce a regularizer that encourages the completed matrix entries to align with a finite discrete alphabet. An early method in this domain utilized the \ell_1-norm as a convex relaxation of the \ell_0-norm to enforce discreteness, enabling efficient optimization using proximal gradient methods. Building upon this, Führling et al. (2023) replaces the \ell_1-norm with a continuous and differentiable approximation of the \ell_0-norm, making the problem more tractable and improving the performance. The discrete-aware matrix completion problem can be formulated as: :\underset \, f(\boldsymbol) + \lambda g(\boldsymbol) + \zeta r(\boldsymbol \mid 0), where: * (\boldsymbol) = \frac \left\, P_(\boldsymbol - \boldsymbol) \right\, _F^2 ensures fidelity to the observed entries, with P_ as the projection onto the observed set \Omega and \boldsymbol as the observed matrix. * g(\boldsymbol) = \, \boldsymbol\, _* is the nuclear norm to enforce low-rank structure. * r(\boldsymbol \mid 0) = \sum_^ \left\, \operatorname_(\boldsymbol) - a_k \mathbf \right\, _0 is the discrete-space regularizer, with \mathcal being the discrete alphabet (e.g., ) and \overline the set of unobserved entries. To solve this non-convex problem, the \ell_0-norm is approximated by a continuous function. This approximation is convexized using fractional programming, transforming the problem into a series of convex subproblems. The algorithm iteratively updates the matrix estimate by applying proximal operations to the discrete-space regularizer and singular value thresholding to enforce the low-rank constraint. Initializing the process with the solution from the \ell_1-norm-based method can accelerate convergence. Simulation results, tested on datasets like MovieLens-100k, demonstrate that this method outperforms both its \ell_1-norm-based predecessor and other state-of-the-art techniques, particularly when the ratio of observed entries is low (e.g., 20% to 60%).


Applications

Several applications of matrix completion are summarized by Candès and Plan as follows:


Collaborative filtering

Collaborative filtering Collaborative filtering (CF) is, besides content-based filtering, one of two major techniques used by recommender systems.Francesco Ricci and Lior Rokach and Bracha ShapiraIntroduction to Recommender Systems Handbook, Recommender Systems Handbo ...
is the task of making automatic predictions about the interests of a user by collecting taste information from many users. Companies like Apple, Amazon, Barnes and Noble, and Netflix are trying to predict their user preferences from partial knowledge. In these kind of matrix completion problem, the unknown full matrix is often considered low rank because only a few factors typically contribute to an individual's tastes or preference.


System identification

In control, one would like to fit a discrete-time linear time-invariant state-space model : \begin x(t+1)&=Ax(t)+Bu(t)\\ y(t)&=Cx(t)+Du(t) \end to a sequence of inputs u(t) \in \mathbb^m and outputs y(t) \in \mathbb^p, t = 0, \ldots, N. The vector x(t) \in \mathbb^n is the state of the system at time t and n is the order of the system model. From the input/output pair, one would like to recover the matrices A,B,C,D and the initial state x(0). This problem can also be viewed as a low-rank matrix completion problem.


Internet of things (IoT) localization

The localization (or global positioning) problem emerges naturally in IoT sensor networks. The problem is to recover the sensor map in
Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ...
from a local or partial set of pairwise distances. Thus it is a matrix completion problem with rank two if the sensors are located in a 2-D plane and three if they are in a 3-D space.


Social networks recovery

Most of the real-world social networks have low-rank distance matrices. When we are not able to measure the complete network, which can be due to reasons such as private nodes, limited storage or compute resources, we only have a fraction of distance entries known. Criminal networks are a good example of such networks. Low-rank Matrix Completion can be used to recover these unobserved distances.


See also

* Matrix regularization *
Netflix Prize The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified ...
*
Collaborative filtering Collaborative filtering (CF) is, besides content-based filtering, one of two major techniques used by recommender systems.Francesco Ricci and Lior Rokach and Bracha ShapiraIntroduction to Recommender Systems Handbook, Recommender Systems Handbo ...
*
System identification The field of system identification uses statistical methods to build mathematical models of dynamical systems from measured data. System identification also includes the optimal design#System identification and stochastic approximation, optimal de ...
*
Convex optimization Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems ...
* Imputation


References

{{Reflist, refs= {{cite book , last1=Johnson , first1=Charles R. , title=Matrix Theory and Applications , chapter=Matrix completion problems: A survey , series=Proceedings of Symposia in Applied Mathematics , year=1990 , volume=40 , pages=171–198 , doi=10.1090/psapm/040/1059486 , isbn=9780821801543 {{cite book , last1=Laurent , first1=Monique , title=Encyclopedia of Optimization , chapter=Matrix Completion Problems , year=2008 , volume=3 , pages=221–229 , doi=10.1007/978-0-387-74759-0_355 , isbn=978-0-387-74758-3 {{cite journal , last1=Bertsimas , first1=Dimitris , last2=Cory-Wright , first2=Ryan , last3=Pauphilet , first3=Jean , year=2021 , title=Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank Constraints , journal=
Operations Research Operations research () (U.S. Air Force Specialty Code: Operations Analysis), often shortened to the initialism OR, is a branch of applied mathematics that deals with the development and application of analytical methods to improve management and ...
, volume=70 , issue=6 , pages=3321–3344 , arxiv=2009.10395 , doi=10.1287/opre.2021.2182 , s2cid=221836263
{{cite journal , last1=Bertsimas , first1=Dimitris , last2=Cory-Wright , first2=Ryan , last3=Pauphilet , first3=Jean , year=2021 , title=A New Perspective on Low-Rank Optimization , journal= Optimization Online , arxiv=2105.05947 {{cite journal , last1=Candès , first1=E. J. , last2=Recht , first2=B. , year=2009 , title=Exact Matrix Completion via Convex Optimization , journal= Foundations of Computational Mathematics , volume=9 , issue=6 , pages=717–772 , arxiv=0805.4471 , doi=10.1007/s10208-009-9045-5 , doi-access=free {{cite journal , last1=Candès , first1=E. J. , last2=Tao , first2=T. , year=2010 , title=The Power of Convex Relaxation: Near-Optimal Matrix Completion , journal=
IEEE Transactions on Information Theory ''IEEE Transactions on Information Theory'' is a monthly peer-reviewed scientific journal published by the IEEE Information Theory Society. It covers information theory and the mathematics of communications. It was established in 1953 as ''IRE Tran ...
, volume=56 , issue=5 , pages=2053–2080 , arxiv=0903.1476 , doi=10.1109/TIT.2010.2044061 , s2cid=1255437
{{cite journal , last1=Keshavan , first1=R. H. , last2=Montanari , first2=A. , last3=Oh , first3=S. , year=2010 , title=Matrix Completion from a Few Entries , journal=
IEEE Transactions on Information Theory ''IEEE Transactions on Information Theory'' is a monthly peer-reviewed scientific journal published by the IEEE Information Theory Society. It covers information theory and the mathematics of communications. It was established in 1953 as ''IRE Tran ...
, volume=56 , issue=6 , pages=2980–2998 , arxiv=0901.3150 , doi=10.1109/TIT.2010.2046205 , s2cid=53504
{{cite journal , last1=Recht , first1=B. , year=2009 , title=A Simpler Approach to Matrix Completion , url=http://www.jmlr.org/papers/volume12/recht11a/recht11a.pdf , journal= Journal of Machine Learning Research , volume=12 , pages=3413–3430 , arxiv=0910.0651 , bibcode=2009arXiv0910.0651R {{cite journal , last1=Candès , first1=E. J. , last2=Plan , first2=Y. , year=2010 , title=Matrix Completion with Noise , journal=
Proceedings of the IEEE The ''Proceedings of the IEEE'' is a monthly peer-reviewed scientific journal published by the Institute of Electrical and Electronics Engineers (IEEE). The journal focuses on electrical engineering and computer science. According to the ''Journa ...
, volume=98 , issue=6 , pages=925–936 , arxiv=0903.3131 , doi=10.1109/JPROC.2009.2035722 , s2cid=109721
{{cite arXiv , last1=Eriksson , first1=B. , last2=Balzano , first2=L. , last3=Nowak , first3=R. , year=2011 , title=High-Rank Matrix Completion and Subspace Clustering with Missing Data , eprint=1112.5629 , class=cs.IT {{cite journal , last1=Nguyen , first1=L.T. , last2=Kim , first2=J. , last3=Shim , first3=B. , date=10 July 2019 , title=Low-Rank Matrix Completion: A Contemporary Survey , journal=
IEEE Access IEEE Access is a peer-reviewed open-access scientific journal published by the Institute of Electrical and Electronics Engineers (IEEE). It was established in 2013 and covers all IEEE fields of interest. The founding editor-in-chief was Michael P ...
, volume=7 , issue=1 , pages=94215–94237 , doi=10.1109/ACCESS.2019.2928130 , bibcode=2019arXiv190711705N , arxiv=1907.11705 , s2cid=198930899
{{cite web , last=Tao , first=T. , date=10 March 2009 , title=The power of convex relaxation: near-optimal matrix completion , url=http://terrytao.wordpress.com/2009/03/10/the-power-of-convex-relaxation-near-optimal-matrix-completion/ , website=What's new {{cite journal , last1=Cai , first1=J.-F. , last2=Candès , first2=E. J. , last3=Shen , first3=Z. , year=2010 , title=A Singular Value Thresholding Algorithm for Matrix Completion , journal= SIAM Journal on Optimization , volume=20 , issue=4 , pages=1956–1982 , arxiv=0810.3286 , doi=10.1137/080738970 , s2cid=1254778 {{cite book , last1=Jain , first1=P. , last2=Netrapalli , first2=P. , last3=Sanghavi , first3=S. , year=2013 , chapter=Low-rank Matrix Completion using Alternating Minimization , title=Proceedings of the 45th annual ACM symposium on Symposium on theory of computing , pages=665–674 , publisher=ACM , arxiv=1212.0467 , doi=10.1145/2488608.2488693 , isbn=978-1-4503-2029-0 , s2cid=447011 {{cite journal , last1=Xu , first1=Zhiqiang , year=2018 , title=The minimal measurement number for low-rank matrix recovery , journal= Applied and Computational Harmonic Analysis , volume=44 , issue=2 , pages=497–508 , arxiv=1505.07204 , doi=10.1016/j.acha.2017.01.005 , s2cid=11990443 {{cite journal , last1=Nguyen , first1=L.T. , last2=Kim , first2=J. , last3=Kim , first3=S. , last4=Shim , first4=B. , year=2019 , title=Localization of IoT Networks Via Low-Rank Matrix Completion , journal= IEEE Transactions on Communications , volume=67 , issue=8 , pages=5833–5847 , doi=10.1109/TCOMM.2019.2915226 , s2cid=164605437 {{cite book , last1=Mahindre , first1=G. , last2=Jayasumana , first2=A.P. , last3=Gajamannage , first3=K. , last4=Paffenroth , first4=R. , title=2019 IEEE 44th Conference on Local Computer Networks (LCN) , chapter=On Sampling and Recovery of Topology of Directed Social Networks – A Low-Rank Matrix Completion Based Approach , year=2019 , publisher=IEEE , pages=324–331 , doi=10.1109/LCN44214.2019.8990707 , isbn=978-1-7281-1028-8 , s2cid=211206354 Matrix theory