A Julia package for non-negative matrix factorization (NMF).
Note: Non-negative matrix factorization is an area of active research, and new algorithms are proposed every year. Contributions are very welcome.
- Lee & Seung's Multiplicative Update (for both MSE & Divergence objectives)
- (Naive) Projected Alternating Least Squares
- ALS Projected Gradient Methods
- Coordinate Descent Methods
- Random Initialization
- NNDSVD Initialization
- Sparse NMF
- Separable NMF
- Probabilistic NMF
See also NMFMerge, which can augment any least-squares NMF algorithm.
Non-negative Matrix Factorization (NMF) generally refers to techniques for factorizing a non-negative matrix X into the product of two lower-rank matrices W and H, such that WH optimally approximates X in some sense. Such techniques are widely used in text mining, image analysis, and recommendation systems.
This package provides two sets of tools, respectively for initialization and optimization. A typical NMF procedure consists of two steps: (1) use an initialization function to initialize W and H; and (2) use an optimization algorithm to pursue the optimal solution.
Most types and functions in this package (except the high-level function nnmf) are not exported. Users are encouraged to use them with the prefix NMF. (e.g. NMF.solve!). This allows us to use shorter names within the package and makes the code more explicit and clear on the user side.
The package provides a high-level function nnmf that runs the entire procedure (initialization + optimization):
nnmf(X, k, ...)
This function factorizes the input matrix X into the product of two non-negative matrices W and H.

In general, it returns a result instance of type NMF.Result, which is defined as
struct Result
W::Matrix{Float64} # W matrix
H::Matrix{Float64} # H matrix
niters::Int # number of elapsed iterations
converged::Bool # whether the optimization procedure converges
objvalue::Float64 # objective value of the last step
end
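For example, after running the high-level function one can check the convergence status and the final objective value through these fields (a minimal sketch; X is assumed to be a non-negative matrix already in scope):

using NMF

r = nnmf(X, 5)                 # rank-5 factorization with default settings
if r.converged
    println("converged after ", r.niters, " iterations; objective = ", r.objvalue)
end
W, H = r.W, r.H                # extract the factor matrices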
The function supports the following keyword arguments:
- init: A symbol that indicates the initialization method (default = :nndsvdar). This argument accepts the following values:

  - :random: matrices filled with uniformly random values
  - :nndsvd: standard version of NNDSVD
  - :nndsvda: NNDSVDa variant
  - :nndsvdar: NNDSVDar variant
  - :spa: Successive Projection Algorithm
  - :custom: use custom matrices W0 and H0

  With the nndsvd variants, you can optionally supply the SVD result via initdata, e.g. initdata=svd(X). If not supplied, the factorization is computed using a randomized SVD (rsvd from RandomizedLinAlg).

- alg: A symbol that indicates the factorization algorithm (default = :greedycd). This argument accepts the following values:

  - :multmse: Multiplicative update (using MSE as objective)
  - :multdiv: Multiplicative update (using divergence as objective)
  - :projals: (Naive) Projected Alternating Least Squares
  - :alspgrad: Alternating Least Squares using Projected Gradient Descent
  - :cd: Coordinate Descent solver that uses Fast Hierarchical Alternating Least Squares (implementation similar to scikit-learn)
  - :greedycd: Greedy Coordinate Descent
  - :spa: Successive Projection Algorithm

- maxiter: Maximum number of iterations (default = 100).

- tol: Tolerance of changes upon convergence (default = 1.0e-6).

- replicates: Number of times to perform the factorization (default = 1).

- W0: Option for custom initialization (default = nothing).

- H0: Option for custom initialization (default = nothing).

  Note: W0 and H0 may be overwritten. To avoid this, pass in copies instead (see the sketch after this list).

- update_H: Whether to update H (default = true).

- verbose: Whether to show procedural information (default = false).
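A minimal sketch of a call with custom initialization (the matrix sizes below are illustrative assumptions; any non-negative matrix X works):

using NMF

X  = rand(100, 80)    # hypothetical non-negative data matrix (p = 100, n = 80)
W0 = rand(100, 5)     # custom initial W of size (p, k)
H0 = rand(5, 80)      # custom initial H of size (k, n)

# pass copies, since W0 and H0 may be overwritten
r = nnmf(X, 5; init=:custom, W0=copy(W0), H0=copy(H0), maxiter=200, tol=1.0e-5)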
The package provides the following initialization functions:

- NMF.randinit(X, k[; zeroh=false, normalize=false])

  Initialize W and H given the input matrix X and the rank k. This function returns a pair (W, H).

  Suppose the size of X is (p, n); then the sizes of W and H are respectively (p, k) and (k, n).

  Usage:

  W, H = NMF.randinit(X, 3)

  For some algorithms (e.g. ALS), only W needs to be initialized. For such cases, one may set the keyword argument zeroh to true; the output H will then simply be a zero matrix of size (k, n).

  Another keyword argument is normalize. If normalize is set to true, the columns of W are normalized such that each column sums to one (see the sketch below).
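  For instance, a minimal sketch combining both keyword arguments (X is assumed to be a non-negative matrix already in scope):

  W, H = NMF.randinit(X, 3; zeroh=true, normalize=true)   # H is all zeros; each column of W sums to one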
- NMF.nndsvd(X, k[; zeroh=false, variant=:std])

  Use the Non-Negative Double Singular Value Decomposition (NNDSVD) algorithm to initialize W and H.

  Reference: C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 2007.

  Usage:

  W, H = NMF.nndsvd(X, k)

  This function has two keyword arguments:

  - zeroh: if set to true, only W is initialized and H is returned as a zero matrix of size (k, n); if false (the default), both W and H are initialized (see the sketch below).
  - variant: the variant of the algorithm. The default is :std, the standard version, which generates a rather sparse W. The other values are :a and :ar, corresponding to the NNDSVDa and NNDSVDar variants respectively. In particular, :ar is recommended for dense NMF.
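  For example, a minimal sketch that initializes only W via NNDSVDar and then runs an ALS-type algorithm that does not require H to be initialized (X is assumed to be in scope):

  W, H = NMF.nndsvd(X, 5; zeroh=true, variant=:ar)            # H comes back as a zero matrix
  NMF.solve!(NMF.ProjectedALS{Float64}(maxiter=50), X, W, H)  # only W needs to be initialized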
- NMF.spa(X, k)

  Use the Successive Projection Algorithm (SPA) to initialize W and H.

  Reference: N. Gillis and S. A. Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698-714, 2013.

  Usage:

  W, H = NMF.spa(X, k)
This package provides multiple factorization algorithms. Each algorithm corresponds to a type. One can create an algorithm instance by choosing a type and specifying its options, and run the algorithm using NMF.solve!:
NMF.solve!(alg, X, W, H)
Use the algorithm alg to factorize X into W and H.
Here, W and H must be pre-allocated matrices (of size (p, k) and (k, n), respectively) and must be appropriately initialized before this function is invoked. For some algorithms, both W and H must be initialized (e.g. multiplicative updating); for others, only W needs to be initialized (e.g. ALS). The matrices W and H are updated in place, as in the sketch below.
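Putting these pieces together, a minimal sketch of the initialize-then-solve pattern (X is assumed to be a non-negative p-by-n matrix already in scope):

import NMF

k = 5
W, H = NMF.randinit(X, k)                  # W is p-by-k, H is k-by-n
alg = NMF.GreedyCD{Float64}(maxiter=200)   # pick an algorithm type and its options
NMF.solve!(alg, X, W, H)                   # W and H are updated in place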
- Multiplicative Updating

  Reference: Daniel D. Lee and H. Sebastian Seung. Algorithms for Non-negative Matrix Factorization. Advances in NIPS, 2001.

  This algorithm has two different kinds of objectives: minimizing mean squared error (:mse) and minimizing divergence (:div). Both W and H need to be initialized.

  MultUpdate(obj::Symbol=:mse,      # objective, either :mse or :div
             maxiter::Integer=100,  # maximum number of iterations
             verbose::Bool=false,   # whether to show procedural information
             tol::Real=1.0e-6,      # tolerance of changes on W and H upon convergence
             update_H::Bool=true,   # whether to update H
             lambda_w::Real=0.0,    # L1 regularization coefficient for W
             lambda_h::Real=0.0)    # L1 regularization coefficient for H

  Note: the values above are the default values of the keyword arguments. One can override some or all of them, as in the sketch below.
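  For instance, a minimal sketch selecting the divergence objective (W and H are assumed to have been initialized, e.g. with NMF.randinit):

  NMF.solve!(NMF.MultUpdate{Float64}(obj=:div, maxiter=200), X, W, H)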
- (Naive) Projected Alternating Least Squares

  This algorithm alternately updates W and H while holding the other fixed. Each update step solves for W or H without enforcing the non-negativity constraint, and then forces all negative entries to zero. Only W needs to be initialized.

  ProjectedALS(maxiter::Integer=100,    # maximum number of iterations
               verbose::Bool=false,     # whether to show procedural information
               tol::Real=1.0e-6,        # tolerance of changes on W and H upon convergence
               update_H::Bool=true,     # whether to update H
               lambda_w::Real=1.0e-6,   # L2 regularization coefficient for W
               lambda_h::Real=1.0e-6)   # L2 regularization coefficient for H
- Alternating Least Squares Using Projected Gradient Descent

  Reference: Chih-Jen Lin. Projected Gradient Methods for Non-negative Matrix Factorization. Neural Computation, 19 (2007).

  This algorithm adopts the alternating least squares strategy. An efficient projected gradient descent method is used to solve each sub-problem. Both W and H need to be initialized.

  ALSPGrad(maxiter::Integer=100,     # maximum number of iterations (in the main procedure)
           maxsubiter::Integer=200,  # maximum number of iterations in solving each sub-problem
           tol::Real=1.0e-6,         # tolerance of changes on W and H upon convergence
           tolg::Real=1.0e-4,        # tolerable gradient norm in sub-problem (first-order optimality)
           update_H::Bool=true,      # whether to update H
           verbose::Bool=false)      # whether to show procedural information
- Coordinate Descent solver with Fast Hierarchical Alternating Least Squares

  Reference: Andrzej Cichocki and Anh-Huy Phan. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 92.3:708-721 (2009).

  Sequential constrained minimization on a set of squared Euclidean distances over the W and H matrices, using L1 and L2 penalties to enforce sparsity.

  CoordinateDescent(maxiter::Integer=100,     # maximum number of iterations (in the main procedure)
                    verbose::Bool=false,      # whether to show procedural information
                    tol::Real=1.0e-6,         # tolerance of changes on W and H upon convergence
                    update_H::Bool=true,      # whether to update H
                    α::Real=0.0,              # constant that multiplies the regularization terms
                    regularization=:both,     # which factors the regularization affects: the components (H), the transformation (W), both, or none (:components, :transformation, :both, :none)
                    l₁ratio::Real=0.0,        # L1/L2 regularization mixing parameter (in [0, 1])
                    shuffle::Bool=false)      # if true, randomize the order of coordinates in the CD solver
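  For instance, a minimal sketch that applies the regularization only to the components matrix H (W and H are assumed to have been initialized, e.g. with NMF.nndsvd):

  alg = NMF.CoordinateDescent{Float64}(maxiter=100, α=0.5, l₁ratio=0.5, regularization=:components)
  NMF.solve!(alg, X, W, H)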
- Greedy Coordinate Descent

  Reference: Cho-Jui Hsieh and Inderjit S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1064-1072 (2011).

  This algorithm is a fast coordinate descent method with variable selection. Both W and H need to be initialized.

  GreedyCD(maxiter::Integer=100,   # maximum number of iterations (in the main procedure)
           verbose::Bool=false,    # whether to show procedural information
           tol::Real=1.0e-6,       # tolerance of changes on W and H upon convergence
           update_H::Bool=true,    # whether to update H
           lambda_w::Real=0.0,     # L1 regularization coefficient for W
           lambda_h::Real=0.0)     # L1 regularization coefficient for H
- Successive Projection Algorithm for Separable NMF

  Reference: N. Gillis and S. A. Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698-714, 2013.

  A separable matrix X can be written as X = WH = W[I V]P, where W has rank k, I is the identity matrix, the sum of the entries of each column of V is at most one, and P is a permutation matrix that arranges the columns of [I V] in an arbitrary order. Separable NMF aims to decompose such a matrix X into two non-negative factor matrices W and H so that WH equals X. This algorithm is used for separable NMF. Both W and H need to be initialized with init=:spa (i.e. using NMF.spa).

  SPA(obj::Symbol=:mse)   # objective, either :mse or :div
Here are examples that demonstrate how to use this package to factorize a non-negative dense matrix.
using NMF

... # prepare input matrix X
r = nnmf(X, k; alg=:multmse, maxiter=30, tol=1.0e-4)
W = r.W
H = r.H
import NMF
# initialize
W, H = NMF.randinit(X, 5)
# optimize
NMF.solve!(NMF.MultUpdate{Float64}(obj=:mse, maxiter=100), X, W, H)
import NMF
# initialize
W, H = NMF.randinit(X, 5)
# optimize
NMF.solve!(NMF.ProjectedALS{Float64}(maxiter=50), X, W, H)
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.ALSPGrad{Float64}(maxiter=50, tolg=1.0e-6), X, W, H)
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.CoordinateDescent{Float64}(maxiter=50, α=0.5, l₁ratio=0.5), X, W, H)
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.GreedyCD{Float64}(maxiter=50), X, W, H)
import NMF
# initialize
W, H = NMF.spa(X, 5)
# optimize
NMF.solve!(NMF.SPA{Float64}(obj=:mse), X, W, H)