Ugly Duckling Theorem

The ugly duckling theorem is an argument showing that classification is not possible without some sort of bias. More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties. The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other. It was derived by Satosi Watanabe in 1969.


Mathematical formula

Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects. There are 2^n such ways, the size of the power set of n objects. One might try to use that to measure the similarity between two objects, counting how many classes they have in common. However, this fails: if any possible class may be formed, then any two objects have exactly the same number of classes in common, namely 2^(n-1) (half the total number of classes there are).

To see that this is so, imagine each class represented by an n-bit string (or binary encoded integer), with a zero for each element not in the class and a one for each element in the class. There are 2^n such strings. Since all possible choices of zeros and ones occur, any two bit-positions will agree exactly half the time. Pick two elements, reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2^(n-1) numbers will have bit #1 set to zero, and the second 2^(n-1) will have it set to one. Within each of those blocks, the top 2^(n-2) will have bit #2 set to zero and the other 2^(n-2) will have it set to one, so the two bits agree on two blocks of 2^(n-2), that is, on half of all the cases, no matter which two elements one picks.

So if we have no preconceived bias about which categories are better, everything is equally similar (or equally dissimilar): the number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs. Thus, some kind of inductive bias is needed to prefer certain categories over others.
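The counting argument above can be checked by brute force. The following sketch (a minimal illustration, assuming a small universe of n = 4 objects) enumerates every subset as an n-bit mask and counts, for each pair of objects, the classes on which the pair agrees (both members or both non-members):

```python
from itertools import combinations

n = 4                      # a small universe of n objects, labelled 0..n-1
num_classes = 2 ** n       # every subset of the universe is a candidate class

def agreements(a, b):
    """Count classes (subsets) on which objects a and b agree:
    either both are members or both are non-members."""
    count = 0
    for s in range(num_classes):   # each subset encoded as an n-bit mask
        in_a = (s >> a) & 1
        in_b = (s >> b) & 1
        if in_a == in_b:
            count += 1
    return count

# Every pair of distinct objects agrees on exactly half of all classes.
for a, b in combinations(range(n), 2):
    assert agreements(a, b) == 2 ** (n - 1)
print("every pair agrees on", 2 ** (n - 1), "of", num_classes, "classes")
```

Running this prints `every pair agrees on 8 of 16 classes`, matching the 2^(n-1) count: with no bias over the 2^n possible classes, no pair of objects is more similar than any other.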


Boolean functions

Let x_1, x_2, \dots, x_n be a set of vectors of k booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance.

However, the choice of boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of booleans in the vector can be extended with new features computed as boolean functions of the k original features. The only canonical way to do this is to extend it with ''all'' possible Boolean functions. The resulting completed vectors have 2^(2^k) features, one for each Boolean function on k variables. The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features.

Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same, because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i where the i-th coordinate of x differs from the i-th coordinate of y. Now the completed features contain every Boolean function on k Boolean variables, each exactly once. Viewing these Boolean functions as polynomials in k variables over GF(2), segregate the functions into pairs (f, g) where f contains the i-th coordinate as a linear term and g is f without that linear term. Then for every such pair (f, g), x and y will agree on exactly one of the two functions: if they agree on one, they must disagree on the other, and vice versa. (This proof is believed to be due to Watanabe.)
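The completion argument can also be verified exhaustively for small k. The sketch below (an illustration, assuming k = 3 original features) encodes each Boolean function by its truth table, builds the completed vector of a feature vector x, and confirms that distinct vectors agree on exactly half of all 2^(2^k) functions:

```python
from itertools import product

k = 3                               # number of original boolean features
num_inputs = 2 ** k                 # possible k-bit feature vectors
num_functions = 2 ** num_inputs     # all boolean functions of k variables: 2^(2^k)

def completed(x):
    """Completed feature vector of x: the value f(x) for every Boolean
    function f of k variables, f encoded by its truth table t."""
    index = sum(bit << i for i, bit in enumerate(x))  # position of x in the truth table
    return [(t >> index) & 1 for t in range(num_functions)]

for x in product((0, 1), repeat=k):
    for y in product((0, 1), repeat=k):
        agree = sum(a == b for a, b in zip(completed(x), completed(y)))
        if x == y:
            assert agree == num_functions          # identical vectors agree everywhere
        else:
            assert agree == num_functions // 2     # exactly half: 2^(2^k - 1)
print("distinct vectors agree on exactly", num_functions // 2,
      "of", num_functions, "functions")
```

This prints `distinct vectors agree on exactly 128 of 256 functions`: once all derivable features are included, every pair of distinct vectors is at the same Hamming distance, so no vector can be singled out as the ugly duckling.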


Discussion

A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured, by limiting the properties involved in classification, say, between A and B. However, Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem, since the respects in which A is similar to B "vary with the stimulus context and task, so that there is no unique answer to the question of how similar is one object to another". For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature ''striped'' had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet fixing the weight of a property such as "striped" is itself arbitrary, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous".

Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense that they are useful. Unless some properties are considered more salient, or 'weighted' more important than others, everything will appear equally similar; hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar". In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putatively classified things, plums and lawnmowers.

According to Woodward (2009, p. 874), the ugly duckling theorem is related to Schaffer's ''Conservation Law for Generalization Performance'', which states that all algorithms for learning boolean functions from input/output examples have the same overall generalization performance as random guessing. The latter result is generalized by Woodward to functions on countably infinite domains (Woodward 2009, p. 875).


See also

* No free lunch in search and optimization
* No free lunch theorem
* Identity of indiscernibles – Classification (discernibility) is possible (with or without a bias), but there cannot be separate objects or entities that have all their properties in common.
* New riddle of induction

