A foundation model is a large artificial intelligence model trained on a vast quantity of unlabeled data (usually by self-supervised learning), resulting in a model that can be adapted to a wide range of downstream tasks. Foundation models have helped bring about a major transformation in how AI systems are built since their introduction in 2018. Early examples of foundation models were large pre-trained language models, including BERT and GPT-3. Using the same ideas, domain-specific models trained on sequences of other kinds of tokens, such as medical codes, have been built as well. Subsequently, several multimodal foundation models have been produced, including DALL-E, Flamingo, Florence and NOOR. The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) popularized the term.

Definitions

The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) coined the term foundation model to refer to "any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks". This is not a new technique in itself, as it is based on deep neural networks and self-supervised learning, but the Stanford group argues that the scale at which it has been developed in recent years, and the potential for one model to be used for many different purposes, warrant a new term. A foundation model is a "paradigm for building AI systems" in which a model trained on a large amount of unlabeled data can be adapted to many applications. Foundation models are "designed to be adapted (e.g., finetuned) to various downstream cognitive tasks by pre-training on broad data at scale".

Key characteristics of foundation models are ''emergence'' and ''homogenization''. Because the training data is not labelled by humans, the model's behavior emerges from the data rather than being explicitly encoded, and properties that were not anticipated can appear. For example, a model trained on a large language dataset might learn to generate stories of its own, or to do arithmetic, without being explicitly programmed to do so. Homogenization means that the same method is used in many domains, which allows for powerful advances but also creates the possibility of "single points of failure".
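The pattern described in this section, self-supervised pre-training followed by reuse for several downstream tasks, can be illustrated with a deliberately tiny Python sketch. It is an illustration under stated assumptions, not how any real foundation model is built: a bigram counter stands in for a deep neural network, the three-sentence corpus is made up, and every function name (`make_self_supervised_pairs`, `pretrain`, `predict_next`, `count_known_bigrams`) is hypothetical. What it does show is that the training targets are derived from the raw text itself, with no human labels, and that one pre-trained artifact serves two different uses.

```python
from collections import Counter, defaultdict


def make_self_supervised_pairs(tokens):
    """Derive (input, target) pairs from raw tokens.

    No human labelling is involved: the target for each token is simply
    the token that follows it in the text.
    """
    return list(zip(tokens[:-1], tokens[1:]))


def pretrain(corpus):
    """'Pre-train' a toy bigram model by counting which token follows which.

    Assumption: this counter is a stand-in for a large neural network.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in make_self_supervised_pairs(sentence.split()):
            counts[prev][nxt] += 1
    return counts


def predict_next(model, token):
    """Downstream use 1: most frequent continuation of a token (None if unseen)."""
    followers = model.get(token)
    return followers.most_common(1)[0][0] if followers else None


def count_known_bigrams(model, sentence):
    """Downstream use 2: count adjacent word pairs that were seen in pre-training."""
    pairs = make_self_supervised_pairs(sentence.split())
    return sum(1 for prev, nxt in pairs if model.get(prev, {}).get(nxt, 0) > 0)


# Made-up, unlabeled "broad data" (hypothetical corpus for illustration only).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = pretrain(corpus)

print(predict_next(model, "the"))                            # cat
print(count_known_bigrams(model, "the dog sat on the mat"))  # 5
```

Both downstream functions read the same `model` object, which also hints at the "single point of failure" concern: any flaw in the shared pre-trained counts would carry over to every use built on top of them.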

Opportunities and risks

A 2021 arXiv report surveyed foundation models along four lines: their capabilities with regard to "language, vision, robotics, reasoning, and human interaction"; their technical principles, such as "model architectures, training procedures, data, systems, security, evaluation, and theory"; their applications, for example in law, healthcare, and education; and their potential impact on society, including "inequity, misuse, economic and environmental impact, legal and ethical considerations". An article about foundation models in ''The Economist'' notes that "some worry that the technology’s heedless spread will further concentrate economic and political power".
