HOME

TheInfoList



OR:

In
mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the ''multiplicity'' of that element in the multiset. As a consequence, an infinite number of multisets exist that contain only elements and , but vary in the multiplicities of their elements: * The set contains only elements and , each having multiplicity 1 when is seen as a multiset. * In the multiset , the element has multiplicity 2, and has multiplicity 1. * In the multiset , and both have multiplicity 3. These objects are all different when viewed as multisets, although they are the same set, since they all consist of the same elements. As with sets, and in contrast to ''
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
s'', the order in which elements are listed does not matter in discriminating multisets, so and denote the same multiset. To distinguish between sets and multisets, a notation that incorporates square brackets is sometimes used: the multiset can be denoted by . The
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
or "size" of a multiset is the sum of the multiplicities of all its elements. For example, in the multiset the multiplicities of the members , , and are respectively 2, 3, and 1, and therefore the cardinality of this multiset is 6. Nicolaas Govert de Bruijn coined the word ''multiset'' in the 1970s, according to Donald Knuth. However, the concept of multisets predates the coinage of the word ''multiset'' by many centuries. Knuth himself attributes the first study of multisets to the Indian mathematician Bhāskarāchārya, who described permutations of multisets around 1150. Other names have been proposed or used for this concept, including ''list'', ''bunch'', ''bag'', ''heap'', ''sample'', ''weighted set'', ''collection'', and ''suite''.


History

Wayne Blizard traced multisets back to the very origin of numbers, arguing that "in ancient times, the number ''n'' was often represented by a collection of ''n'' strokes, tally marks, or units." These and similar collections of objects can be regarded as multisets, because strokes, tally marks, or units are considered indistinguishable. This shows that people implicitly used multisets even before mathematics emerged. Practical needs for this structure have caused multisets to be rediscovered several times, appearing in literature under different names. For instance, they were important in early AI languages, such as QA4, where they were referred to as ''bags,'' a term attributed to Peter Deutsch. A multiset has been also called an aggregate, heap, bunch, sample, weighted set, occurrence set, and fireset (finitely repeated element set). Although multisets were used implicitly from ancient times, their explicit exploration happened much later. The first known study of multisets is attributed to the Indian mathematician Bhāskarāchārya circa 1150, who described permutations of multisets. The work of Marius Nizolius (1498–1576) contains another early reference to the concept of multisets. Athanasius Kircher found the number of multiset permutations when one element can be repeated. Jean Prestet published a general rule for multiset permutations in 1675.
John Wallis John Wallis (; ; ) was an English clergyman and mathematician, who is given partial credit for the development of infinitesimal calculus. Between 1643 and 1689 Wallis served as chief cryptographer for Parliament and, later, the royal court. ...
explained this rule in more detail in 1685. Multisets appeared explicitly in the work of Richard Dedekind. Other mathematicians formalized multisets and began to study them as precise mathematical structures in the 20th century. For example, Hassler Whitney (1933) described ''generalized sets'' ("sets" whose characteristic functions may take any
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
value: positive, negative or zero). Monro (1987) investigated the category Mul of multisets and their morphisms, defining a ''multiset'' as a set with an equivalence relation between elements "of the same ''sort''", and a ''morphism'' between multisets as a function that respects ''sorts''. He also introduced a ''multinumber'': a function ''f''(''x'') from a multiset to the
natural number In mathematics, the natural numbers are the numbers 0, 1, 2, 3, and so on, possibly excluding 0. Some start counting with 0, defining the natural numbers as the non-negative integers , while others start with 1, defining them as the positive in ...
s, giving the ''multiplicity'' of element ''x'' in the multiset. Monro argued that the concepts of multiset and multinumber are often mixed indiscriminately, though both are useful.


Examples

One of the simplest and most natural examples is the multiset of prime factors of a natural number . Here the underlying set of elements is the set of prime factors of . For example, the number 120 has the
prime factorization In mathematics, integer factorization is the decomposition of a positive integer into a product of integers. Every positive integer greater than 1 is either the product of two or more integer factors greater than 1, in which case it is a comp ...
120 = 2^3 3^1 5^1, which gives the multiset . A related example is the multiset of solutions of an
algebraic equation In mathematics, an algebraic equation or polynomial equation is an equation of the form P = 0, where ''P'' is a polynomial with coefficients in some field, often the field of the rational numbers. For example, x^5-3x+1=0 is an algebraic equati ...
. A
quadratic equation In mathematics, a quadratic equation () is an equation that can be rearranged in standard form as ax^2 + bx + c = 0\,, where the variable (mathematics), variable represents an unknown number, and , , and represent known numbers, where . (If and ...
, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be , or it could be . In the latter case it has a solution of multiplicity 2. More generally, the fundamental theorem of algebra asserts that the complex solutions of a polynomial equation of degree always form a multiset of cardinality . A special case of the above are the eigenvalues of a matrix, whose multiplicity is usually defined as their multiplicity as
roots A root is the part of a plant, generally underground, that anchors the plant body, and absorbs and stores water and nutrients. Root or roots may also refer to: Art, entertainment, and media * ''The Root'' (magazine), an online magazine focusin ...
of the characteristic polynomial. However two other multiplicities are naturally defined for eigenvalues, their multiplicities as roots of the minimal polynomial, and the geometric multiplicity, which is defined as the
dimension In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coo ...
of the kernel of (where is an eigenvalue of the matrix ). These three multiplicities define three multisets of eigenvalues, which may be all different: Let be a matrix in Jordan normal form that has a single eigenvalue. Its multiplicity is , its multiplicity as a root of the minimal polynomial is the size of the largest Jordan block, and its geometric multiplicity is the number of Jordan blocks.


Definition

A multiset may be formally defined as an ordered pair where is a set called a ''universe'' or the ''underlying set'', and m\colon U \to \mathbb_ is a function from to the nonnegative integers. The value for an element is called the ''multiplicity'' of in the multiset and intepreted as the number of occurences of in the multiset. The ''support'' of a multiset is the subset of formed by the elements such that . A ''finite multiset'' is a multiset with a finite support. Most authors define ''multisets'' as finite multisets. This is the case in this article, where, unless otherwise stated, all multisets are finite multisets. Some authors define multisets with the additional constraint that for every , or, equivalently, the support equals the underlying set. Multisets with infinite multiplicities have also been studied; they are not considered in this article. Some authors define a multiset in terms of a finite index set and a function where the multiplicity of an element is , the number of elements of that get mapped to by . Multisets may be represented as sets, with some elements repeated. For example, the multiset with support and multiplicity function such that can be represented as . A more compact notation, in case of high multiplicities is for the same multiset. If A=\, a multiset with support included in is often represented as a_1^ \cdots a_n^, to which the computation rules of indeterminates can be applied; that is, exponents 1 and factors with exponent 0 can be removed, and the multiset does not depend on the order of the factors. This allows extending the notation to infinite underlying sets as \prod_ a^. An advantage of notation is that it allows using the notation without knowing the exact support. For example, the prime factors of a
natural number In mathematics, the natural numbers are the numbers 0, 1, 2, 3, and so on, possibly excluding 0. Some start counting with 0, defining the natural numbers as the non-negative integers , while others start with 1, defining them as the positive in ...
form a multiset such that n=\prod_ p^= 2^ 3^ 5^\cdots. The finite subsets of a set are exactly the multisets with underlying set , such that for every .


Basic properties and operations

Elements of a multiset are generally taken in a fixed set , sometimes called a ''universe'', which is often the set of
natural number In mathematics, the natural numbers are the numbers 0, 1, 2, 3, and so on, possibly excluding 0. Some start counting with 0, defining the natural numbers as the non-negative integers , while others start with 1, defining them as the positive in ...
s. An element of that does not belong to a given multiset is said to have a multiplicity 0 in this multiset. This extends the multiplicity function of the multiset to a function from to the set \N of non-negative integers. This defines a one-to-one correspondence between these functions and the multisets that have their elements in . This extended multiplicity function is commonly called simply the multiplicity function, and suffices for defining multisets when the universe containing the elements has been fixed. This multiplicity function is a generalization of the
indicator function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , then the indicator functio ...
of a
subset In mathematics, a Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they a ...
, and shares some properties with it. The support of a multiset A in a universe is the underlying set of the multiset. Using the multiplicity function m, it is characterized as \operatorname(A) := \. A multiset is ''finite'' if its support is finite, or, equivalently, if its cardinality , A, = \sum_ m_A(x) = \sum_ m_A(x) is finite. The ''empty multiset'' is the unique multiset with an empty support (underlying set), and thus a cardinality 0. The usual operations of sets may be extended to multisets by using the multiplicity function, in a similar way to using the indicator function for subsets. In the following, and are multisets in a given universe , with multiplicity functions m_A and m_B. * Inclusion: is ''included'' in , denoted , if m_A(x) \le m_B(x)\quad\forall x\in U. * Union: the ''union'' (called, in some contexts, the ''maximum'' or ''lowest common multiple'') of and is the multiset with multiplicity function m_C(x) = \max(m_A(x),m_B(x))\quad\forall x\in U. * Intersection: the ''intersection'' (called, in some contexts, the ''infimum'' or ''greatest common divisor'') of and is the multiset with multiplicity function m_C(x) = \min(m_A(x),m_B(x))\quad\forall x\in U. * Sum: the ''sum'' of and is the multiset with multiplicity function m_C(x) = m_A(x) + m_B(x)\quad\forall x\in U. It may be viewed as a generalization of the disjoint union of sets. It defines a commutative monoid structure on the finite multisets in a given universe. This monoid is a free commutative monoid, with the universe as a basis. * Difference: the ''difference'' of and is the multiset with multiplicity function m_C(x) = \max(m_A(x) - m_B(x), 0)\quad\forall x\in U. Two multisets are ''disjoint'' if their supports are disjoint sets. This is equivalent to saying that their intersection is the empty multiset or that their sum equals their union. There is an inclusion–exclusion principle for finite multisets (similar to the one for sets), stating that a finite union of finite multisets is the difference of two sums of multisets: in the first sum we consider all possible intersections of an odd number of the given multisets, while in the second sum we consider all possible intersections of an even number of the given multisets.


Counting multisets

The number of multisets of cardinality , with elements taken from a finite set of cardinality , is sometimes called the multiset coefficient or multiset number. This number is written by some authors as \textstyle\left(\!\!\!\!\right), a notation that is meant to resemble that of
binomial coefficient In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers and is written \tbinom. It is the coefficient of the t ...
s; it is used for instance in (Stanley, 1997), and could be pronounced " multichoose " to resemble " choose " for \tbinom n k. Like the
binomial distribution In probability theory and statistics, the binomial distribution with parameters and is the discrete probability distribution of the number of successes in a sequence of statistical independence, independent experiment (probability theory) ...
that involves binomial coefficients, there is a negative binomial distribution in which the multiset coefficients occur. Multiset coefficients should not be confused with the multinomial coefficients that occur in the
multinomial theorem In mathematics, the multinomial theorem describes how to expand a power of a sum in terms of powers of the terms in that sum. It is the generalization of the binomial theorem from binomials to multinomials. Theorem For any positive integer ...
. The value of multiset coefficients can be given explicitly as \left(\!\!\!\!\right) = = \frac = , where the second expression is as a binomial coefficient; many authors in fact avoid separate notation and just write binomial coefficients. So, the number of such multisets is the same as the number of subsets of cardinality of a set of cardinality . The analogy with binomial coefficients can be stressed by writing the numerator in the above expression as a rising factorial power \left(\!\!\!\!\right) = , to match the expression of binomial coefficients using a falling factorial power: = . For example, there are 4 multisets of cardinality 3 with elements taken from the set of cardinality 2 (, ), namely , , , . There are also 4 ''subsets'' of cardinality 3 in the set of cardinality 4 (), namely , , , . One simple way to prove the equality of multiset coefficients and binomial coefficients given above involves representing multisets in the following way. First, consider the notation for multisets that would represent (6 s, 2 s, 3 s, 7 s) in this form: : This is a multiset of cardinality made of elements of a set of cardinality . The number of characters including both dots and vertical lines used in this notation is . The number of vertical lines is 4 − 1. The number of multisets of cardinality 18 is then the number of ways to arrange the vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of cardinality 4 − 1 of a set of cardinality . Equivalently, it is the number of ways to arrange the 18 dots among the characters, which is the number of subsets of cardinality 18 of a set of cardinality . This is = = 1330, thus is the value of the multiset coefficient and its equivalencies: \begin \left(\!\!\!\!\right) &

\frac=, \\ ex&=\frac, \\ ex&=\frac , \\ ex&=\frac. \end
From the relation between binomial coefficients and multiset coefficients, it follows that the number of multisets of cardinality in a set of cardinality can be written \left(\!\!\!\!\right)=(-1)^k. Additionally, \left(\!\!\!\!\right)=\left(\!\!\!\!\right).


Recurrence relation

A recurrence relation for multiset coefficients may be given as \left(\!\!\!\!\right) = \left(\!\!\!\!\right) + \left(\!\!\!\!\right) \quad \mbox n,k>0 with \left(\!\!\!\!\right) = 1,\quad n\in\N, \quad\mbox\quad \left(\!\!\!\!\right) = 0,\quad k>0. The above recurrence may be interpreted as follows. Let := \ be the source set. There is always exactly one (empty) multiset of size 0, and if there are no larger multisets, which gives the initial conditions. Now, consider the case in which . A multiset of cardinality with elements from might or might not contain any instance of the final element . If it does appear, then by removing once, one is left with a multiset of cardinality of elements from , and every such multiset can arise, which gives a total of \left(\!\!\!\!\right) possibilities. If does not appear, then our original multiset is equal to a multiset of cardinality with elements from , of which there are \left(\!\!\!\!\right). Thus, \left(\!\!\!\!\right) = \left(\!\!\!\!\right) + \left(\!\!\!\!\right).


Generating series

The generating function of the multiset coefficients is very simple, being \sum_^\infty \left(\!\!\!\!\right)t^d = \frac. As multisets are in one-to-one correspondence with monomials, \left(\!\!\!\!\right) is also the number of monomials of degree in indeterminates. Thus, the above series is also the Hilbert series of the
polynomial ring In mathematics, especially in the field of algebra, a polynomial ring or polynomial algebra is a ring formed from the set of polynomials in one or more indeterminates (traditionally also called variables) with coefficients in another ring, ...
k _1, \ldots, x_n As \left(\!\!\!\!\right) is a polynomial in , it and the generating function are well defined for any complex value of .


Generalization and connection to the negative binomial series

The multiplicative formula allows the definition of multiset coefficients to be extended by replacing by an arbitrary number (negative, real, or complex): \left(\!\!\!\!\right) = \frac = \frac \quad\text k\in\N \text \alpha. With this definition one has a generalization of the negative binomial formula (with one of the variables set to 1), which justifies calling the \left(\!\!\!\!\right) negative binomial coefficients: (1-X)^ = \sum_^\infty \left(\!\!\!\!\right) X^k. This Taylor series formula is valid for all complex numbers ''α'' and ''X'' with . It can also be interpreted as an identity of formal power series in ''X'', where it actually can serve as definition of arbitrary powers of series with constant coefficient equal to 1; the point is that with this definition all identities hold that one expects for
exponentiation In mathematics, exponentiation, denoted , is an operation (mathematics), operation involving two numbers: the ''base'', , and the ''exponent'' or ''power'', . When is a positive integer, exponentiation corresponds to repeated multiplication ...
, notably (1-X)^(1-X)^=(1-X)^ \quad\text\quad ((1-X)^)^=(1-X)^, and formulas such as these can be used to prove identities for the multiset coefficients. If is a nonpositive integer , then all terms with are zero, and the infinite series becomes a finite sum. However, for other values of , including positive integers and
rational number In mathematics, a rational number is a number that can be expressed as the quotient or fraction of two integers, a numerator and a non-zero denominator . For example, is a rational number, as is every integer (for example, The set of all ...
s, the series is infinite.


Applications

Multisets have various applications. They are becoming fundamental in
combinatorics Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
. Multisets have become an important tool in the theory of
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s, which often uses the synonym ''bag''. For instance, multisets are often used to implement relations in database systems. In particular, a table (without a primary key) works as a multiset, because it can have multiple identical records. Similarly, SQL operates on multisets and returns identical records. For instance, consider "SELECT name from Student". In the case that there are multiple records with name "Sara" in the student table, all of them are shown. That means the result of an SQL query is a multiset; if the result were instead a set, the repetitive records in the result set would have been eliminated. Another application of multisets is in modeling multigraphs. In multigraphs there can be multiple edges between any two given vertices. As such, the entity that specifies the edges is a multiset, and not a set. There are also other applications. For instance, Richard Rado used multisets as a device to investigate the properties of families of sets. He wrote, "The notion of a set takes no account of multiple occurrence of any one of its members, and yet it is just this kind of information that is frequently of importance. We need only think of the set of roots of a polynomial ''f''(''x'') or the
spectrum A spectrum (: spectra or spectrums) is a set of related ideas, objects, or properties whose features overlap such that they blend to form a continuum. The word ''spectrum'' was first used scientifically in optics to describe the rainbow of co ...
of a
linear operator In mathematics, and more specifically in linear algebra, a linear map (also called a linear mapping, linear transformation, vector space homomorphism, or in some contexts linear function) is a mapping V \to W between two vector spaces that pr ...
."


Generalizations

Different generalizations of multisets have been introduced, studied and applied to solving problems. * Signed multisets (in which multiplicity of an element can be any
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
) * Real-valued multisets (in which multiplicity of an element can be any
real number In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
) * Fuzzy multisets * Rough multisets * Hybrid sets * Multisets whose multiplicity is any real-valued step function * Soft multisets * Soft fuzzy multisets * Named sets (unification of all generalizations of sets)


See also

* Frequency (statistics) as multiplicity analog * Quasi-sets *
Set theory Set theory is the branch of mathematical logic that studies Set (mathematics), sets, which can be informally described as collections of objects. Although objects of any kind can be collected into a set, set theory – as a branch of mathema ...
* Bag-of-words model *


Notes

{{Notelist


References

Basic concepts in set theory Factorial and binomial topics