History
In chemometrics non-negative matrix factorization has a long history under the name "self modeling curve resolution". In this framework the vectors in the right matrix are continuous curves rather than discrete vectors. Early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the 1990s under the name ''positive matrix factorization''. It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations.

Background
Let matrix V be the product of the matrices W and H,

V = WH.

Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W using coefficients supplied by columns of H. That is, each column of V can be computed as

v_i = W h_i,

where v_i is the i-th column vector of the product matrix V and h_i is the i-th column vector of the matrix H. When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix, and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if V is an m × n matrix, W is an m × p matrix, and H is a p × n matrix, then p can be significantly less than both m and n.

Here is an example based on a text-mining application:
* Let the input matrix (the matrix to be factored) be V with 10000 rows and 500 columns, where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector v in V represents a document.
* Assume we ask the algorithm to find 10 features in order to generate a ''features matrix'' W with 10000 rows and 10 columns and a ''coefficients matrix'' H with 10 rows and 500 columns.
* The product of W and H is a matrix with 10000 rows and 500 columns, the same shape as the input matrix V, and, if the factorization worked, it is a reasonable approximation to the input matrix V.
* From the treatment of matrix multiplication above it follows that each column in the product matrix WH is a linear combination of the 10 column vectors in the features matrix W with coefficients supplied by the coefficients matrix H.

This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features. It is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: the higher a word's cell value, the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document with a cell value defining the document's rank for a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in W), where each feature is weighted by the feature's cell value from the document's column in H. A sketch of these shapes in code is given below.
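The dimension bookkeeping in this example can be checked with a few lines of NumPy. This is only an illustration of the shapes involved: the matrices here are random placeholders rather than an actual factorization, and the document index is an arbitrary choice.

```python
import numpy as np

# Shapes from the text-mining example: 10000 words, 500 documents, 10 features.
m, n, p = 10000, 500, 10

rng = np.random.default_rng(0)
W = rng.random((m, p))   # features matrix (word archetypes in columns)
H = rng.random((p, n))   # coefficients matrix (feature weights per document)

V = W @ H                # same shape as the input matrix: 10000 x 500

# Each document (column of V) is a linear combination of the 10 feature
# columns of W, weighted by that document's column in H:
j = 42                   # arbitrary document index
assert np.allclose(V[:, j], W @ H[:, j])
```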
Clustering property

NMF has an inherent clustering property, i.e., it automatically clusters the columns of the input data V. More specifically, the approximation of V by WH is achieved by finding W and H that minimize the error function (using the Frobenius norm)

‖V − WH‖_F, subject to W ≥ 0, H ≥ 0.

A minimal sketch of reading cluster memberships off H is given below.
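Given factors from any NMF routine, a hard clustering of the data columns can be obtained by assigning each column to its dominant feature. A minimal sketch, assuming H is the k × n coefficients matrix:

```python
import numpy as np

def cluster_labels(H: np.ndarray) -> np.ndarray:
    """Assign column j of the input to the feature with the largest
    coefficient, i.e. the argmax over the rows of H."""
    return np.argmax(H, axis=0)
```

Column j of the input matrix is then treated as belonging to cluster cluster_labels(H)[j].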
Types

Approximate non-negative matrix factorization
Usually the number of columns of W and the number of rows of H in NMF are selected so that the product WH will become an approximation to V. The full decomposition of V then amounts to the two non-negative matrices W and H as well as a residual U, such that V = WH + U. The elements of the residual matrix can either be negative or positive. When W and H are smaller than V they become easier to store and manipulate. Another reason for factorizing V into smaller matrices W and H is that if one's goal is to approximately represent the elements of V by significantly less data, then one has to infer some latent structure in the data. A toy illustration of the decomposition follows below.
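A toy illustration of the full decomposition and the storage savings; W and H below are random placeholders standing in for the output of an NMF run.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((100, 40))
W, H = rng.random((100, 5)), rng.random((5, 40))  # placeholder factors, p = 5

U = V - W @ H                   # residual; its entries may have either sign
print(U.min(), U.max())         # unlike W and H, U is not sign-constrained
print(V.size, W.size + H.size)  # 4000 entries in V vs. 700 in the factors
```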
Convex non-negative matrix factorization

In standard NMF, the matrix factor W is only constrained to be non-negative, i.e., W can be anything in the non-negative orthant. Convex NMF (C. Ding, T. Li, M. I. Jordan, "Convex and semi-nonnegative matrix factorizations", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45–55, 2010) restricts the columns of W to convex combinations of the input data vectors (v_1, …, v_n). This greatly improves the quality of data representation of W. Furthermore, the resulting matrix factor H becomes more sparse and orthogonal. The constraint is sketched in code below.
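A sketch of the convex-NMF parameterization only (W written as convex combinations of the data columns), not of the full Ding–Li–Jordan optimization:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((50, 30))            # input data; columns are data vectors
G = rng.random((30, 5))
G /= G.sum(axis=0, keepdims=True)   # normalize columns onto the simplex

# Each column of W is now a convex combination of the columns of V:
W = V @ G
```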
Nonnegative rank factorization

In case the nonnegative rank of V is equal to its actual rank, V = WH is called a nonnegative rank factorization (NRF). The problem of finding the NRF of V, if it exists, is known to be NP-hard.

Different cost functions and regularizations
There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring the divergence between V and WH and possibly by regularization of the W and/or H matrices. Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm) and an extension of the Kullback–Leibler divergence to positive matrices; both are sketched in code below.
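The two Lee–Seung costs written out in NumPy; the small eps guard is an implementation convenience, not part of the definitions.

```python
import numpy as np

def frobenius_cost(V, W, H):
    """Squared error ||V - WH||_F^2."""
    return np.sum((V - W @ H) ** 2)

def kl_cost(V, W, H, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH),
    extended element-wise to non-negative matrices."""
    WH = W @ H
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```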
Online NMF

Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in a streaming fashion.

Convolutional NMF
If the columns of V represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by convolutional NMF. In this case, W is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of V, representing convolution kernels. By spatio-temporal pooling of H and repeatedly using the resulting representation as input to convolutional NMF, deep feature hierarchies can be learned.

Algorithms
There are several ways in which the W and H may be found: Lee and Seung's multiplicative update rule has been a popular method due to the simplicity of implementation. This algorithm is:

Initialize W and H with non-negative values. Then update the values in W and H by computing the following, with n as an index of the iteration:

H_[i,j]^(n+1) ← H_[i,j]^n · ((W^n)^T V)_[i,j] / ((W^n)^T W^n H^n)_[i,j]

and

W_[i,j]^(n+1) ← W_[i,j]^n · (V (H^(n+1))^T)_[i,j] / (W^n H^(n+1) (H^(n+1))^T)_[i,j]

until W and H are stable.

Note that the updates are done on an element-by-element basis, not by matrix multiplication. We note that the multiplicative factors for W and H, i.e. the (W^T V) / (W^T W H) and (V H^T) / (W H H^T) terms, are matrices of ones when V = WH. A runnable sketch of this rule is given at the end of this section.

More recently other algorithms have been developed. Some approaches are based on alternating non-negative least squares: in each step of such an algorithm, first H is fixed and W is found by a non-negative least squares solver, then W is fixed and H is found analogously. The procedures used to solve for W and H may be the same or different, as some NMF variants regularize one of W and H. Specific approaches include the projected gradient descent methods, the active set method, the optimal gradient method, and the block principal pivoting method, among several others.

Current algorithms are sub-optimal in that they only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem, which is known to be NP-complete. However, as in many other data mining applications, a local minimum may still prove to be useful.

In addition to the optimization step, initialization has a significant effect on NMF. The initial values chosen for W and H may affect not only the rate of convergence, but also the overall error at convergence. Some options for initialization include complete randomization, SVD, k-means clustering, and more advanced strategies based on these and other paradigms.
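A compact NumPy sketch of the multiplicative update rule for the squared-error objective. The eps guard against division by zero, the fixed iteration count (instead of a stability test), and the random initialization are implementation choices, not part of the rule itself.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2
    with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # element-wise update of H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # element-wise update of W
    return W, H
```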
Sequential NMF

The sequential construction of NMF components (W and H) was first used to relate NMF with principal component analysis (PCA) in astronomy. The contributions of the PCA components are ranked by the magnitude of their corresponding eigenvalues; the NMF components can be ranked empirically when they are constructed one by one (sequentially), i.e., learning the (n+1)-th component with the first n components already constructed. The contribution of the sequential NMF components can be compared with the Karhunen–Loève theorem, an application of PCA, using the plot of eigenvalues. A typical choice of the number of components with PCA is based on the "elbow" point; the existence of a flat plateau indicates that PCA is not capturing the data efficiently, and a final sudden drop reflects the capture of random noise and falling into the regime of overfitting. For sequential NMF, the plot of eigenvalues is approximated by the plot of the fractional residual variance curves, where the curves decrease continuously and converge to a higher level than PCA, which is an indication of less over-fitting of sequential NMF.

Exact NMF
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for the matrix V. A polynomial time algorithm for solving nonnegative rank factorization when V contains a monomial submatrix of rank equal to its rank was given by Campbell and Poole in 1981. Kalofolias and Gallopoulos (2012) solved the symmetric counterpart of this problem, where V is symmetric and contains a diagonal principal submatrix of rank r; their algorithm runs in O(rm²) time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies a separability condition.

Relation to other techniques
In ''Learning the parts of objects by non-negative matrix factorization'' Lee and Seung proposed NMF mainly for parts-based decomposition of images. The paper compares NMF to vector quantization and principal component analysis, and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results. It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA". When NMF is obtained by minimizing the Kullback–Leibler divergence, it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis, trained by maximum likelihood estimation. That method is commonly used for analyzing and clustering textual data and is also related to the latent class model. NMF with the least-squares objective is equivalent to a relaxed form of k-means clustering: the matrix factor W contains cluster centroids and H contains cluster membership indicators (C. Ding, X. He, H. D. Simon, 2005).

Uniqueness
The factorization is not unique: a matrix B and its inverse can be used to transform the two factorization matrices by, e.g.,

WH = (WB)(B⁻¹H).

If the two new matrices W̃ = WB and H̃ = B⁻¹H are non-negative, they form another parametrization of the factorization. The non-negativity of W̃ and H̃ applies at least if B is a non-negative monomial matrix. In this simple case it will just correspond to a scaling and a permutation, as the sketch below illustrates.
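A numerical check of this non-uniqueness, taking B to be a scaled permutation (a non-negative monomial matrix); the sizes and scaling values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W, H = rng.random((6, 3)), rng.random((3, 8))

P = np.eye(3)[[2, 0, 1]]            # permutation matrix
D = np.diag([2.0, 0.5, 3.0])        # positive scaling
B = P @ D                           # non-negative monomial matrix

W2, H2 = W @ B, np.linalg.inv(B) @ H
print(np.allclose(W @ H, W2 @ H2))           # True: same product
print((W2 >= 0).all() and (H2 >= 0).all())   # True: both factors still non-negative
```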
Applications

Astronomy
In astronomy, NMF is a promising method for dimension reduction in the sense that astrophysical signals are non-negative. NMF has been applied to spectroscopic observations and direct imaging observations as a method to study the common properties of astronomical objects and to post-process the astronomical observations. The advance in spectroscopic observations by Blanton & Roweis (2007) takes into account the uncertainties of astronomical observations, which was later improved by Zhu (2016), where missing data are also considered and parallel computation is enabled.

Data imputation
To impute missing data in statistics, NMF can take missing data into account while minimizing its cost function, rather than treating the missing data as zeros. This makes it a mathematically proven method for data imputation in statistics. By first proving that the missing data are ignored in the cost function, then proving that the impact from missing data can be as small as a second order effect, Ren et al. (2020) studied and applied such an approach for the field of astronomy. Their work focuses on two-dimensional matrices; specifically, it includes mathematical derivation, simulated data imputation, and application to on-sky data. The data imputation procedure with NMF can be composed of two steps. First, when the NMF components are known, Ren et al. (2020) proved that the impact from missing data during data imputation ("target modeling" in their study) is a second order effect. Second, when the NMF components are unknown, the authors proved that the impact from missing data during component construction is a first-to-second order effect. Depending on the way that the NMF components are obtained, the former step above can be either independent of or dependent on the latter. In addition, the imputation quality can be increased when more NMF components are used; see Figure 4 of Ren et al. (2020) for their illustration. A generic sketch of missing-data-aware updates is given below.
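One common way to make the Frobenius cost ignore missing entries is to weight it with a 0/1 mask, giving masked multiplicative updates. This is a generic masked-NMF sketch, not necessarily the exact procedure of Ren et al. (2020).

```python
import numpy as np

def nmf_with_mask(V, M, k, n_iter=500, eps=1e-10, seed=0):
    """Weighted multiplicative updates that ignore missing entries.
    M is a 0/1 mask (1 = observed, 0 = missing)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    Vm = M * V                       # missing entries contribute nothing
    for _ in range(n_iter):
        WH = M * (W @ H)
        H *= (W.T @ Vm) / (W.T @ WH + eps)
        WH = M * (W @ H)
        W *= (Vm @ H.T) / (WH @ H.T + eps)
    return W, H                      # W @ H imputes the missing entries
```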
Text mining

NMF can be used for text mining applications. In this process, a ''document-term'' matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a ''term-feature'' and a ''feature-document'' matrix. The features are derived from the contents of the documents, and the feature-document matrix describes data clusters of related documents. One specific application used hierarchical NMF on a small subset of scientific abstracts from PubMed. A minimal sketch of the basic (non-hierarchical) pipeline follows below.
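A minimal sketch using scikit-learn; the toy corpus, the number of features, and the tf-idf weighting are illustrative choices, not part of any particular study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = ["the cat sat on the mat",
        "dogs and cats",
        "stock markets fell"]                 # toy corpus

X = TfidfVectorizer().fit_transform(docs)     # document-term matrix (non-negative)

model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)                    # document-feature weights
H = model.components_                         # feature-term weights
```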
Spectral data analysis

NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.

Scalable Internet distance prediction
NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with N hosts, with the help of NMF, the distances of all the N² end-to-end links can be predicted after conducting only O(N) measurements. This kind of method was first introduced in the Internet Distance Estimation Service (IDES). Afterwards, as a fully decentralized approach, the Phoenix network coordinate system was proposed. It achieves better overall prediction accuracy by introducing the concept of weight.

Non-stationary speech denoising
Speech denoising has been a long-lasting problem in audio signal processing. There are many algorithms for denoising if the noise is stationary. For example, the Wiener filter is suitable for additive Gaussian noise.

Population genetics
Sparse NMF is used in population genetics, for example to estimate individual admixture coefficients and to detect genetic clusters of individuals in a population sample.

Bioinformatics
NMF has been successfully applied in bioinformatics, for example for clustering gene expression and DNA methylation data and finding the genes most representative of the clusters.

Nuclear imaging
NMF, also referred to in this field as factor analysis, has been used since the 1980s to analyze sequences of images in SPECT and PET dynamic medical imaging. The non-uniqueness of NMF was addressed using sparsity constraints.

Current research
Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to:
# Algorithmic: searching for global minima of the factors and factor initialization.
# Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF), Scalable Nonnegative Matrix Factorization (ScalableNMF), and Distributed Stochastic Singular Value Decomposition.
# Online: how to update the factorization when new data come in without recomputing from scratch, e.g., see online CNSC.
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. multi-view clustering; see CoNMF and MultiNMF.
# Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively.

See also
* Multilinear algebra

Sources and external links
Notes
Others
* Andrzej Cichocki, Morten Mørup, et al.: "Advances in Nonnegative Matrix and Tensor Factorization", Hindawi Publishing Corporation, (2008).
* Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan and Shun-ichi Amari: "Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation", Wiley, (2009).
* Andri Mirzal: "Nonnegative Matrix Factorizations for Clustering and LSI: Theory and Programming", LAP LAMBERT Academic Publishing, (2011).
* Yong Xiang: "Blind Source Separation: Dependent Component Analysis", Springer, (2014).
* Ganesh R. Naik (Ed.): "Non-negative Matrix Factorization Techniques: Advances in Theory and Applications", Springer, (2016).
* Julian Becker: "Nonnegative Matrix Factorization with Adaptive Elements for Monaural Audio Source Separation", Shaker Verlag GmbH, Germany, (2016).
* Jen-Tzung Chien: "Source Separation and Machine Learning", Academic Press, (2018).
* Shoji Makino (Ed.): "Audio Source Separation", Springer, (2019).
* Nicolas Gillis: "Nonnegative Matrix Factorization", SIAM, (2020).