'''Minimum relevant variables in linear system''' (Min-RVLS) is a problem in mathematical optimization. Given a linear program, it is required to find a feasible solution in which the number of non-zero variables is as small as possible.
The problem is known to be NP-hard and even hard to approximate.
Definition
A Min-RVLS problem is defined by:
* A binary relation ''R'', which is one of {=, ≥, >, ≠};
* An ''m''-by-''n'' matrix ''A'' (where ''m'' is the number of constraints and ''n'' the number of variables);
* An ''m''-by-1 vector ''b''.
The linear system is given by: ''A x'' ''R'' ''b''. It is assumed to be feasible (i.e., satisfied by at least one ''x''). Depending on ''R'', there are four different variants of this system: ''A x = b'', ''A x ≥ b'', ''A x > b'', ''A x ≠ b''; the corresponding problems are denoted Min-RVLS[=], Min-RVLS[≥], Min-RVLS[>] and Min-RVLS[≠] below.
The goal is to find an ''n''-by-1 vector ''x'' that satisfies the system ''A x'' ''R'' ''b'', and subject to that, contains as few as possible nonzero elements.
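As a concrete (non-normative) illustration, the following brute-force sketch solves a tiny Min-RVLS[=] instance by trying candidate supports in order of increasing size; the function name, instance data and tolerance are illustrative choices, not taken from the literature.
<syntaxhighlight lang="python">
import itertools
import numpy as np

def min_rvls_eq(A, b, tol=1e-9):
    """Brute-force Min-RVLS[=]: find x with A x = b and as few nonzero
    entries as possible.  Exponential in n -- for illustration only."""
    m, n = A.shape
    for k in range(n + 1):                          # supports of size 0, 1, ..., n
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            if cols:
                # Least-squares solve restricted to the chosen columns.
                x_sub, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
                residual = A[:, cols] @ x_sub - b
            else:
                x_sub, residual = np.zeros(0), -b   # empty support means x = 0
            if np.max(np.abs(residual)) <= tol:
                x = np.zeros(n)
                x[cols] = x_sub
                return x                            # first hit has minimum support
    return None                                     # unreachable if the system is feasible

# Tiny feasible instance whose sparsest solution is x = (0, 0, 1).
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 1.0])
print(min_rvls_eq(A, b))
</syntaxhighlight>
The sketch enumerates all supports and is therefore exponential in ''n''; the hardness results below indicate that, under standard complexity assumptions, even approximating the optimum efficiently is not possible.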
Special case
The problem Min-RVLS[=] was presented by Garey and Johnson, who called it "minimum weight solution to linear equations". They proved it was NP-hard, but did not consider approximations.
Applications
The Min-RVLS problem is important in machine learning and linear discriminant analysis. Given a set of positive and negative examples, it is required to minimize the number of features that are needed to classify them correctly. The problem is known as the minimum feature set problem. An algorithm that approximates Min-RVLS within a factor of o(log ''m'') could substantially reduce the number of training samples required to attain a given accuracy level.
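One common way to connect the two settings (a sketch assuming a homogeneous linear threshold classifier with no bias term; the data and helper name are hypothetical) is to encode correct classification as a homogeneous system of strict inequalities, so that minimizing the number of features used becomes Min-RVLS[>]:
<syntaxhighlight lang="python">
import numpy as np

def to_min_rvls_gt(X, y):
    """Cast sparse linear classification as Min-RVLS[>].

    For examples x_i with labels y_i in {+1, -1}, a homogeneous linear
    classifier sign(w . x) is correct on every example exactly when
    y_i * (w . x_i) > 0 for all i, i.e. when A w > b with row i of A
    equal to y_i * x_i and b = 0.  Minimizing the nonzero entries of w
    is then Min-RVLS[>], and its optimum is the smallest number of
    features such a classifier needs on this data."""
    A = y[:, None] * X          # row i is y_i * x_i
    b = np.zeros(X.shape[0])    # homogeneous right-hand side
    return A, b

# Hypothetical toy data in which the third feature alone separates the classes.
X = np.array([[1.0, 0.5,  2.0],
              [0.3, 1.0,  1.5],
              [0.8, 0.2, -1.0],
              [0.1, 0.9, -2.0]])
y = np.array([+1, +1, -1, -1])
A, b = to_min_rvls_gt(X, y)
# w = (0, 0, 1) satisfies A w > b, so a single feature suffices here.
</syntaxhighlight>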
The shortest codeword problem in coding theory is the same problem as Min-RVLS[=] when the coefficients are in GF(2).
Related problems
In minimum unsatisfied linear relations (Min-ULR), we are given a binary relation ''R'' and a linear system ''A x'' ''R'' ''b'', which is now assumed to be ''infeasible''. The goal is to find a vector ''x'' that violates as few relations as possible, while satisfying all the others.
Min-ULR[≠] is trivially solvable, since any system with real variables and a finite number of ≠ constraints is feasible (finitely many hyperplanes cannot cover all of '''R'''<sup>''n''</sup>). As for the other three variants:
* Min-ULR[=, >, ≥] are NP-hard even with homogeneous systems and bipolar coefficients (coefficients in {1, −1}).
* The NP-complete problem minimum feedback arc set reduces to Min-ULR[≥], with exactly one 1 and one −1 in each constraint, and all right-hand sides equal to 1.
* Min-ULR[=, >, ≥] are polynomial if the number of variables ''n'' is constant: they can be solved using an algorithm of Greer, in time ''O''(''n'' ''m''<sup>''n''</sup> / 2<sup>''n''−1</sup>).
* Min-ULR[=, >, ≥] are linear if the number of constraints ''m'' is constant, since all 2<sup>''m''</sup> subsystems can be checked in time ''O''(''n'') (a brute-force sketch of this enumeration appears after this list).
* Min-ULR[≥] is polynomial in some special cases.
* Min-ULR[=, >, ≥] can be approximated within ''n'' + 1 in polynomial time.
* Min-ULR[>, ≥] are minimum-dominating-set-hard, even with homogeneous systems and ternary coefficients (in {−1, 0, 1}).
* Min-ULR[=] cannot be approximated within a factor of <math>2^{\log^{1-\varepsilon} n}</math>, for any <math>\varepsilon > 0</math>, unless NP is contained in DTIME(<math>n^{\mathrm{polylog}(n)}</math>).
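The subsystem-enumeration idea behind the constant-''m'' observations above can be sketched as follows. This is an illustrative brute force for the ≥ variant only; it checks feasibility of each subsystem with an off-the-shelf LP solver (not the ''O''(''n'') test referred to above), and the function name and instance are hypothetical.
<syntaxhighlight lang="python">
import itertools
import numpy as np
from scipy.optimize import linprog

def min_ulr_geq(A, b):
    """Brute-force Min-ULR[>=]: given a system A x >= b (assumed infeasible),
    return the minimum number of relations that must be violated.
    Enumerates all subsystems, so it is only reasonable for small m."""
    m, n = A.shape
    for keep in range(m, 0, -1):                     # try to satisfy as many relations as possible
        for subset in itertools.combinations(range(m), keep):
            rows = list(subset)
            # Feasibility check of A[rows] x >= b[rows]: an LP with a zero objective.
            res = linprog(c=np.zeros(n),
                          A_ub=-A[rows], b_ub=-b[rows],
                          bounds=[(None, None)] * n)
            if res.status == 0:                      # a feasible point was found
                return m - keep                      # relations left unsatisfied
    return m                                         # even single relations are unsatisfiable

# Hypothetical infeasible instance over one variable:  x >= 1,  -x >= 0,  x >= -5.
A = np.array([[1.0], [-1.0], [1.0]])
b = np.array([1.0, 0.0, -5.0])
print(min_ulr_geq(A, b))                             # prints 1: one of the first two relations must be violated
</syntaxhighlight>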
In the complementary problem maximum feasible linear subsystem (Max-FLS), the goal is to find a maximum subset of the constraints that can be satisfied simultaneously.
* Max-FLS[≠] can be solved in polynomial time.
* Max-FLS[=] is NP-hard even with homogeneous systems and bipolar coefficients. With integer coefficients, it is hard to approximate within <math>m^{\varepsilon}</math>. With coefficients over GF(''q''), it is ''q''-approximable.
* Max-FLS[>] and Max-FLS[≥] are NP-hard even with homogeneous systems and bipolar coefficients. They are 2-approximable, but they cannot be approximated within any smaller constant factor.
Hardness of approximation
All four variants of Min-RVLS are hard to approximate. In particular, all four variants cannot be approximated within a factor of <math>2^{\log^{1-\varepsilon} n}</math>, for any <math>\varepsilon > 0</math>, unless NP is contained in DTIME(<math>n^{\mathrm{polylog}(n)}</math>).
The hardness is proved by reductions:
* There is a reduction from Min-ULR[=] to Min-RVLS[=]. It also applies to Min-RVLS[>] and Min-RVLS[≥], since each equation can be replaced by two complementary inequalities.
* There is a reduction from minimum-dominating-set to Min-RVLS[≠].
On the other hand, there is a reduction from Min-RVLS[=] to Min-ULR[=]. It also applies to Min-ULR[>] and Min-ULR[≥], since each equation can be replaced by two complementary inequalities.
Therefore, when ''R'' is in {>, ≥}, Min-ULR and Min-RVLS are equivalent in terms of approximation hardness.
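The replacement invoked in these reductions is, for the non-strict case, the standard rewriting of an equality as two opposite inequalities:
: ''a'' · ''x'' = ''β''   if and only if   ''a'' · ''x'' ≥ ''β'' and −''a'' · ''x'' ≥ −''β''.
This doubles the number of constraints ''m'' but leaves ''n'' and the set of satisfying vectors unchanged, so the number of nonzero variables in any solution is preserved; the strict variant requires a slightly more careful replacement.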
References
{{Reflist}}
Linear programming
NP-hard problems
Approximation algorithms
Combinatorial optimization