List Of Large Language Models

	List Of Large Language Models A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. This page lists notable large language models. List: For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. Only the cost of the largest model is given. See also: List of chatbots; List of language model benchmarks.
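The unit conversion stated above can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the 100 petaFLOP-day figure are hypothetical and do not come from the list itself.

```python
# 1 petaFLOP/sec sustained for one day, expressed in raw FLOP.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
PETA = 10**15                           # SI prefix "peta"
FLOP_PER_PETAFLOP_DAY = PETA * SECONDS_PER_DAY


def petaflop_days_to_flop(pf_days):
    """Convert a training cost in petaFLOP-days to total FLOP."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


def flop_to_petaflop_days(flop):
    """Convert a total FLOP count to petaFLOP-days."""
    return flop / FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # Matches the figure quoted in the text: 8.64E19 FLOP.
    print(f"{FLOP_PER_PETAFLOP_DAY:.2e}")          # 8.64e+19
    # Hypothetical cost of 100 petaFLOP-days, for illustration only.
    print(f"{petaflop_days_to_flop(100):.2e}")     # 8.64e+21
```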
	Large Language Model A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pretrained transformers (GPTs), which are widely used in generative chatbots such as ChatGPT or Gemini. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding the syntax, semantics, and ontologies inherent in human language corpora, but they also inherit the inaccuracies and biases present in the data they are trained on. History Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time. In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. A sm ...
	GPT-2 Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019. GPT-2 was created as a "direct scale-up" of GPT-1, with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner, and its ability to perform various tasks was a consequence of its general ability to accurately predict the next item in a sequence. This enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text, and generate text output on a level sometimes indistinguishable from that of humans; however, it ...
	DeepMind DeepMind Technologies Limited, trading as Google DeepMind or simply DeepMind, is a British–American artificial intelligence research laboratory which serves as a subsidiary of Alphabet Inc. Founded in the UK in 2010, it was acquired by Google in 2014 and merged with Google AI's Google Brain division to become Google DeepMind in April 2023. The company is headquartered in London, with research centres in the United States, Canada, France, Germany, and Switzerland. DeepMind introduced neural Turing machines (neural networks that can access external memory like a conventional Turing machine), resulting in a computer that loosely resembles short-term memory in the human brain. DeepMind has created neural network models to play video games and board games. It made headlines in 2016 after its AlphaGo program beat Lee Sedol, a professional Go player and world champion, in a five-game match, which was the subject of a documentary film. A more general program, AlphaZer ...
	Mixture Of Experts Mixture of experts (MoE) is a machine learning technique where multiple expert networks (learners) are used to divide a problem space into homogeneous regions. MoE is a form of ensemble learning; such models were also called committee machines. Basic theory MoE always has the following components, but they are implemented and combined differently according to the problem being solved: * Experts f_1, ..., f_n, each taking the same input x and producing outputs f_1(x), ..., f_n(x). * A weighting function (also known as a gating function) w, which takes input x and produces a vector of outputs (w(x)_1, ..., w(x)_n). This may or may not be a probability distribution, but in either case its entries are non-negative. * \theta = (\theta_0, \theta_1, ..., \theta_n) is the set of parameters. The parameter \theta_0 is for the weighting function. The parameters \theta_1, ..., \theta_n are for the experts. * Given an input x, the mixture of experts produces a single output by combinin ...
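The components listed above can be sketched in plain Python. The snippet is truncated before it says how the expert outputs are combined, so this sketch makes two labeled assumptions: the gating function w is a softmax over linear scores (which yields non-negative weights that happen to sum to 1), and the outputs are combined as a weighted sum. The helper names (`softmax`, `make_linear_expert`, `mixture_of_experts`) are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_expert(weights, bias):
    """Build one expert f_i(x) = weights . x + bias (parameters theta_i)."""
    def expert(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return expert


def mixture_of_experts(x, experts, gate_params):
    """Return (output, gate_weights) for input x.

    gate_params plays the role of theta_0: one score vector per expert.
    ASSUMPTION: softmax gating and weighted-sum combination; the source
    text does not specify either choice.
    """
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate_params]
    gate_weights = softmax(scores)          # (w(x)_1, ..., w(x)_n)
    outputs = [f(x) for f in experts]       # f_1(x), ..., f_n(x)
    combined = sum(w * y for w, y in zip(gate_weights, outputs))
    return combined, gate_weights


if __name__ == "__main__":
    # Two toy experts: one reads the first coordinate, one the second.
    experts = [make_linear_expert([1.0, 0.0], 0.0),
               make_linear_expert([0.0, 1.0], 0.0)]
    # An all-zero gate gives every expert equal weight.
    gate_params = [[0.0, 0.0], [0.0, 0.0]]
    output, weights = mixture_of_experts([2.0, 4.0], experts, gate_params)
    print(output, weights)                  # 3.0 [0.5, 0.5]
```

With a learned, non-zero gate the weights shift toward whichever expert scores highest on the given input, which is how the model assigns different regions of the input space to different experts.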
	Anthropic Anthropic PBC is an American artificial intelligence (AI) startup company founded in 2021. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI's ChatGPT and Google's Gemini. According to the company, it researches and develops AI to "study their safety properties at the technological frontier" and uses this research to deploy safe models for the public. Anthropic was founded by former members of OpenAI, including siblings Daniela Amodei and Dario Amodei. In September 2023, Amazon announced an investment of up to $4 billion, and Google committed $2 billion the following month. History Founding and early development (2021–2022) Anthropic was founded in 2021 by seven former employees of OpenAI, including siblings Daniela Amodei and Dario Amodei, the latter of whom served as OpenAI's Vice President of Research. In April 2022, Anthropic announced it had received $580 million in funding, including a $500 ...