Overview
Algorithmic probability is the main ingredient of Solomonoff's theory of inductive inference, the theory of prediction based on observations. It was invented with the goal of using it for machine learning: given a sequence of symbols, which one will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable.

Four principal inspirations for Solomonoff's algorithmic probability were: Occam's razor, Epicurus' principle of multiple explanations, modern computing theory (e.g. the use of a universal Turing machine) and Bayes' rule for prediction. Occam's razor and Epicurus' principle are essentially two different non-mathematical approximations of the universal prior.

* Occam's razor: ''among the theories that are consistent with the observed phenomena, one should select the simplest theory''.
* Epicurus' principle of multiple explanations: ''if more than one theory is consistent with the observations, keep all such theories''.

At the heart of the universal prior is an abstract model of a computer, such as a universal Turing machine. Any abstract computer will do, as long as it is Turing-complete, i.e. every computable function can be computed by some program on it.
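To make the idea concrete, the following is a minimal sketch of Solomonoff-style prediction under strong simplifying assumptions: the hypothesis class is restricted to periodic binary patterns rather than all computable programs, and a pattern's period stands in for its Kolmogorov complexity. All names here are illustrative, not from any standard library.

```python
# A toy Bayesian mixture: simpler (shorter-period) hypotheses get more
# prior weight, echoing the universal prior's bias toward simplicity.
from itertools import product

def hypotheses(max_period=8):
    """Enumerate toy hypotheses: infinite repetitions of a finite pattern."""
    for period in range(1, max_period + 1):
        for pattern in product("01", repeat=period):
            yield "".join(pattern)

def predict_next(observed, max_period=8):
    """Weight each hypothesis consistent with the observations by 2^-period
    (a crude stand-in for 2^-K(h)) and return the mixture probability that
    the next symbol is '1' -- Bayes' rule with a simplicity prior."""
    weight_1 = weight_total = 0.0
    for pattern in hypotheses(max_period):
        if all(observed[i] == pattern[i % len(pattern)]
               for i in range(len(observed))):
            w = 2.0 ** -len(pattern)
            weight_total += w
            if pattern[len(observed) % len(pattern)] == "1":
                weight_1 += w
    return weight_1 / weight_total

print(predict_next("010101"))  # small: the simplest consistent pattern "01" predicts '0'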
Fundamental Theorems

I. Kolmogorov's Invariance Theorem
Kolmogorov's Invariance theorem clarifies that the Kolmogorov Complexity, or ''Minimal Description Length'', of a dataset is invariant to the choice of Turing-Complete language used to simulate a Universal Turing Machine:

$$\forall x \in \{0,1\}^*, \quad |K_U(x) - K_{U'}(x)| \leq \mathcal{O}(1)$$

where $K_U(x) = \min_p \{|p| : U(p) = x\}$.

Interpretation
The minimal description $p$ such that $U(p) = x$ serves as a natural representation of the string $x$ relative to the Turing-Complete language $U$. Moreover, as $p$ can't be compressed further, it is an incompressible and hence uncomputable string. This corresponds to a scientist's notion of randomness and clarifies the reason why Kolmogorov Complexity is not computable. It follows that any piece of data has a necessary and sufficient representation in terms of a random string.
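As an illustration, the sketch below brute-forces the minimal description length on an assumed toy program language (the opcode scheme is invented for this example). The search terminates only because every toy program halts by construction; on a genuine universal machine some programs run forever, which is the root of the uncomputability just noted.

```python
# Toy machine: opcodes are bit pairs ("00" emit 0, "01" emit 1,
# "10" repeat the last emitted bit, "11" halt).
from itertools import product

def run(program):
    """Interpret a bitstring; return its output, or None if it is malformed
    (no final halt, early halt, or 'repeat' before any output)."""
    out, i = [], 0
    while i + 2 <= len(program):
        op, i = program[i:i + 2], i + 2
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10":
            if not out:
                return None
            out.append(out[-1])
        else:  # "11": halt, valid only as the very last opcode
            return "".join(out) if i == len(program) else None
    return None

def K(x, max_len=20):
    """Length of the shortest toy program that prints x, by brute force."""
    for n in range(2, max_len + 1, 2):
        for bits in product("01", repeat=n):
            if run("".join(bits)) == x:
                return n
    return None

print(K("0000"))  # 10: "00" then three repeats "10 10 10", then halt "11"
```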
Proof

From the theory of compilers, it is known that for any two Turing-Complete languages $U_1$ and $U_2$, there exists a compiler $\Lambda_1$, expressed in $U_1$, that translates programs expressed in $U_2$ into functionally-equivalent programs expressed in $U_1$. It follows that if we let $p$ be the shortest program that prints a given string $x$, then:

$$K_{U_1}(x) \leq |\Lambda_1| + |p| \leq K_{U_2}(x) + \mathcal{O}(1)$$

where $|\Lambda_1| = \mathcal{O}(1)$, and by symmetry we obtain the opposite inequality.

II. Levin's Universal Distribution
Given that any uniquely-decodable code satisfies the Kraft-McMillan inequality, prefix-free Kolmogorov Complexity allows us to derive the Universal Distribution:

$$P(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|} \leq 1$$

where the fact that $U$ may simulate a prefix-free UTM implies that for two distinct descriptions $p$ and $p'$, $p$ isn't a substring of $p'$ and $p'$ isn't a substring of $p$.
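The following sketch approximates this distribution on the same assumed toy machine as in the previous sketch (the interpreter is repeated so the snippet stands alone). Valid toy programs end with the halt opcode and never halt early, so the program set is prefix-free and the weights $2^{-|p|}$ obey the Kraft-McMillan bound.

```python
# Enumerate all valid toy programs up to a length cutoff and accumulate
# P(x) = sum of 2^-|p| over programs p that print x.
from itertools import product
from collections import defaultdict

def run(program):
    out, i = [], 0
    while i + 2 <= len(program):
        op, i = program[i:i + 2], i + 2
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10":
            if not out:
                return None
            out.append(out[-1])
        else:  # halt, valid only at the very end
            return "".join(out) if i == len(program) else None
    return None

def universal_distribution(max_len=16):
    """Approximate P(x) by brute-force enumeration of toy programs."""
    P = defaultdict(float)
    for n in range(2, max_len + 1, 2):
        for bits in product("01", repeat=n):
            x = run("".join(bits))
            if x is not None:
                P[x] += 2.0 ** -n
    return P

P = universal_distribution()
print(sum(P.values()) <= 1.0)  # True: the Kraft-McMillan bound holds
print(P["0"] > P["0110"])      # True: simpler strings receive more weight
```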
Interpretation

In a Computable Universe, given a phenomenon with encoding $x \in \{0,1\}^*$ generated by a physical process, the probability of that phenomenon is well-defined and equal to the sum over the probabilities of distinct and independent causes. The prefix-free criterion is precisely what guarantees causal independence.

Proof
This is an immediate consequence of the Kraft-McMillan inequality. Kraft's inequality states that given a sequence of strings $\{x_i\}_{i=1}^n$ there exists a prefix code with codewords $\{\sigma_i\}_{i=1}^n$, where $\forall i, |\sigma_i| = k_i$, if and only if:

$$\sum_{i=1}^n s^{-k_i} \leq 1$$

where $s$ is the size of the alphabet $S$.

Without loss of generality, let's suppose we may order the $k_i$ such that:

$$k_1 \leq k_2 \leq \dots \leq k_n$$

Now, there exists a prefix code if and only if at each step $j$ there is at least one codeword to choose that does not contain any of the previous codewords as a prefix. Due to the existence of a codeword at a previous step $i < j$, exactly $s^{k_j - k_i}$ of the $s^{k_j}$ candidate codewords of length $k_j$ are forbidden, as they contain $\sigma_i$ as a prefix. A valid choice at step $j$ therefore exists if and only if $\sum_{i=1}^{j-1} s^{k_j - k_i} < s^{k_j}$, i.e. $\sum_{i=1}^{j-1} s^{-k_i} < 1$, which is exactly what the Kraft-McMillan inequality guarantees.
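A few lines of Python can check both directions of the statement on a concrete example; the codeword set below is an illustrative choice, not one from the text.

```python
# Verify the Kraft inequality for a binary alphabet (s = 2).
def kraft_sum(lengths, s=2):
    """Sum of s^-k_i over the codeword lengths k_i."""
    return sum(s ** -k for k in lengths)

def is_prefix_free(codewords):
    """Directly verify that no codeword is a prefix of another."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

codewords = ["0", "10", "110", "111"]        # a complete binary prefix code
print(is_prefix_free(codewords))             # True
print(kraft_sum(len(c) for c in codewords))  # 1.0: equality, code is complete

print(kraft_sum([1, 1, 2]) <= 1)  # False: lengths 1,1,2 admit no prefix code
```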
History

Solomonoff invented the concept of algorithmic probability with its associated invariance theorem around 1960, publishing a report on it: "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in 1964 with "A Formal Theory of Inductive Inference," Part I and Part II.

In terms of practical implications and applications, the study of bias in empirical data related to Algorithmic Probability emerged in the early 2010s. The bias found led to methods that combined algorithmic probability with perturbation analysis in the context of causal analysis and non-differentiable machine learning.

Sequential Decisions Based on Algorithmic Probability
Sequential Decisions Based on Algorithmic Probability is a theoretical framework proposed by Marcus Hutter to unify algorithmic probability with decision theory. The framework provides a foundation for creating universally intelligent agents capable of optimal performance in any computable environment. It builds on Solomonoff’s theory of induction and incorporates elements of reinforcement learning, optimization, and sequential decision-making (Hutter, M. (2005). ''Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability''. Springer. ISBN 3-540-22139-5).

Background
Inductive reasoning, the process of predicting future events based on past observations, is central to intelligent behavior. Hutter formalized this process using Occam’s razor and algorithmic probability. The framework is rooted in Kolmogorov complexity, which measures the simplicity of data by the length of its shortest descriptive program. This concept underpins the universal distribution $M$, as introduced by Ray Solomonoff, which assigns higher probabilities to simpler hypotheses. Hutter extended the universal distribution to include actions, creating a framework capable of addressing problems such as prediction, optimization, and reinforcement learning in environments with unknown structures.

The AIXI Model
The AIXI model is the centerpiece of Hutter’s theory. It describes a universal artificial agent designed to maximize expected rewards in an unknown environment. AIXI operates under the assumption that the environment can be represented by a computable probability distribution. It uses past observations to infer the most likely environmental model, leveraging algorithmic probability.

Mathematically, AIXI evaluates all possible future sequences of actions and observations. It computes their algorithmic probabilities and expected utilities, selecting the sequence of actions that maximizes cumulative rewards. This approach transforms sequential decision-making into an optimization problem. However, the general formulation of AIXI is incomputable, making it impractical for direct implementation.
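The sketch below caricatures AIXI's decision rule under heavy simplifying assumptions: a two-element class of deterministic toy environments replaces the space of all computable environments, hand-assigned description lengths replace Kolmogorov complexity, and a max over fixed action plans stands in for the full expectimax recursion. All names are illustrative.

```python
# A toy agent choosing the first action of the plan that maximizes the
# 2^-length-weighted return across a mixture of candidate environments.
from itertools import product

def env_copy(history, action):  # rewards repeating the last observation
    last_obs = history[-1][1] if history else 0
    return action, 1 if action == last_obs else 0

def env_flip(history, action):  # rewards negating the last observation
    last_obs = history[-1][1] if history else 0
    return action, 1 if action == 1 - last_obs else 0

HYPOTHESES = [(env_copy, 2), (env_flip, 3)]  # (environment, description length)

def rollout(env, history, plan):
    """Total reward from following a fixed action plan in one environment."""
    total, h = 0, list(history)
    for a in plan:
        obs, r = env(h, a)
        total += r
        h.append((a, obs))
    return total

def choose_action(history, horizon=3, actions=(0, 1)):
    """Pick the first action of the highest-value plan in the mixture."""
    def value(plan):
        return sum(2.0 ** -l * rollout(env, history, plan)
                   for env, l in HYPOTHESES)
    return max(product(actions, repeat=horizon), key=value)[0]

print(choose_action([(0, 1)]))  # 1: the simpler "copy" world dominates the mixture
```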
Optimality and Limitations

AIXI is universally optimal in the sense that it performs as well as or better than any other agent in all computable environments. This universality makes it a theoretical benchmark for intelligence. However, its reliance on algorithmic probability renders it computationally infeasible, requiring exponential time to evaluate all possibilities. To address this limitation, Hutter proposed time-bounded approximations, such as AIXItl, which reduce computational demands while retaining many theoretical properties of the original model. These approximations provide a more practical balance between computational feasibility and optimality.

Applications and Implications
The AIXI framework has significant implications for artificial intelligence and related fields. It provides a formal benchmark for measuring intelligence and a theoretical foundation for solving various problems, including prediction, reinforcement learning, and optimization. Despite its strengths, the framework has limitations. AIXI assumes that the environment is computable, excluding chaotic or non-computable systems. Additionally, its high computational requirements make real-world applications challenging.

Philosophical Considerations
Hutter’s theory raises philosophical questions about the nature of intelligence and computation. The reliance on algorithmic probability ties intelligence to the ability to compute and predict, which may exclude certain natural or chaotic phenomena. Nonetheless, the AIXI model offers insights into the theoretical upper bounds of intelligent behavior and serves as a stepping stone toward more practical AI systems.

Key people
* Ray Solomonoff
* Andrey Kolmogorov
* Leonid Levin

See also
* Solomonoff's theory of inductive inference
* Algorithmic information theory
* Bayesian inference
Sources
* Li, M. and Vitanyi, P., ''An Introduction to Kolmogorov Complexity and Its Applications'', 3rd Edition, Springer Science and Business Media, N.Y., 2008

Further reading
* Rathmanner, S. and Hutter, M., "A Philosophical Treatise of Universal Induction", ''Entropy'', 13(6):1076–1136, 2011