Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023.
The latest version is Llama 3.3, released in December 2024.
Llama models are trained at different parameter sizes, ranging from 1B to 405B.
Originally, Llama was available only as a foundation model.
Starting with Llama 2, Meta AI started releasing instruction fine-tuned versions alongside foundation models.
Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis.
Unauthorized copies of the first model were shared via
BitTorrent.
Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use.
Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and launched a standalone website. Both services use a Llama 3 model.
Background
After the release of large language models such as GPT-3, a focus of research was scaling up models, which in some instances showed major increases in emergent capabilities. The release of ChatGPT and its surprise success increased attention on large language models.
In contrast with other responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.
An empirical investigation of the Llama series concerned scaling laws: the Llama 3 models showed that when a model is trained on more data than the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens.
Initial release
LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance. The inference code used to run the model was publicly released under the open-source GPLv3 license. Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".
Llama was trained only on publicly available information, and was trained at various model sizes with the intention of making it more accessible to different hardware. The model was exclusively a foundation model, although the paper contained examples of instruction fine-tuned versions.
Meta AI reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.
Leak
On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities. That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation. On March 4, a pull request was opened to add links to HuggingFace repositories containing the model. On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.
Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this would promote the flourishing of additional research developments. Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models which preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.
Llama 2
On July 18, 2023, in partnership with Microsoft, Meta announced Llama 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters. The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models. The accompanying preprint also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.
Llama 2 includes foundation models and models fine-tuned for chat. In a further departure from the original version of Llama, all models are released with weights and may be used for many commercial use cases. However, because Llama's license enforces an acceptable use policy that prohibits Llama from being used for some purposes, Meta's use of the term ''open source'' to describe Llama has been disputed by the Open Source Initiative (which maintains ''The Open Source Definition'') and others.
Code Llama is a fine-tune of Llama 2 with code-specific datasets. 7B, 13B, and 34B versions were released on August 24, 2023, with the 70B version released on January 29, 2024. Starting with the foundation models from Llama 2, Meta AI trained on an additional 500B tokens of code datasets, followed by an additional 20B tokens of long-context data, creating the Code Llama foundation models. These foundation models were further trained on 5B tokens of instruction-following data to create the instruct fine-tune. Another foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.
Llama 3
On April 18, 2024, Meta released Llama 3 with two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing in April 2024 showed that Llama 3 70B was beating Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window.
During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Zuckerberg stated that, compared with previous models, the team was surprised that the 70B model was still learning even at the end of its 15T-token training, and that the decision was made to end training in order to focus GPU power elsewhere.
Llama-3.1 was released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.
Comparison of models
For the training cost column, only the largest model's cost is written. For example, "21,000" is the training cost of Llama 2 70B in units of petaFLOP-day, where 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. "T" means "trillion" and "B" means "billion".
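As a quick check on the unit conversion (a minimal sketch using only the figures stated above):

```python
# Sanity-checking the petaFLOP-day conversion stated above.
petaflop_day = 1e15 * 86_400          # 1 petaFLOP/s sustained for one day
print(petaflop_day)                   # 8.64e+19 FLOP

# Total compute implied by Llama 2 70B's quoted training cost:
print(21_000 * petaflop_day)          # ~1.8e+24 FLOP
```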
Architecture and training
Architecture
Like GPT-3, the Llama series of models are decoder-only transformers, but there are some minor differences (two of which are sketched in the code below):
* SwiGLU activation function instead of GeLU;
* rotary positional embeddings (RoPE) instead of absolute positional embedding;
* RMSNorm instead of layer normalization.
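The following is a minimal PyTorch sketch of two of these components, RMSNorm and a SwiGLU feed-forward block; the names and dimensions are illustrative, not Meta's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the
    activations with a learned gain, but, unlike LayerNorm, subtracts
    no mean and adds no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit in place of
    the single GeLU activation of a GPT-3-style MLP."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```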
Training datasets
LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost for LLMs comes from doing inference on the trained model rather than from the computational cost of the training process.
LLaMA 1 foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:
* Webpages scraped by CommonCrawl
* Open source repositories of source code from GitHub
* Wikipedia in 20 languages
* Public domain books from Project Gutenberg
* The Books3 books dataset
* The LaTeX source code for scientific papers uploaded to ArXiv
* Questions and answers from Stack Exchange websites
On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open source version of the LLaMA dataset. The dataset has approximately 1.2 trillion tokens and is publicly available for download.
Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data of people. It also upsamples sources considered trustworthy. Llama 2 – Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize and StackExchange.
Llama 3 consists mainly of English data, with over 5% of the data in more than 30 other languages. Its dataset was filtered by a text-quality classifier, which was itself trained on text synthesized by Llama 2.
Fine-tuning
Llama 1 models are only available as foundational models with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from foundational Llama 2 models. Unlike GPT-4, which increased context length during fine-tuning, Llama 2 and Code Llama – Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function with the token loss on user prompts zeroed out. The batch size was 64.
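A minimal sketch of such a prompt-masked autoregressive loss (illustrative PyTorch, not Meta's training code; the names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, tokens: torch.Tensor,
             prompt_lens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab); tokens: (batch, seq);
    prompt_lens: (batch,) -- number of prompt tokens per example."""
    # Standard next-token targets: predict token i+1 from position i.
    targets = tokens[:, 1:].clone()
    logits = logits[:, :-1, :]
    # Zero out the loss for targets that fall inside the user prompt.
    positions = torch.arange(targets.size(1), device=tokens.device)
    in_prompt = positions.unsqueeze(0) < (prompt_lens.unsqueeze(1) - 1)
    targets[in_prompt] = -100            # ignore_index for cross_entropy
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), ignore_index=-100)
```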
For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences for safety and helpfulness using reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of proximal policy optimization (PPO) for RLHF – a new technique based on rejection sampling was used, followed by PPO.
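In outline, one round of rejection-sampling fine-tuning works as follows (a hedged sketch; `generate`, `score`, and the surrounding names are hypothetical stand-ins, not a real API):

```python
def rejection_sampling_round(model, reward_model, prompts, k=4):
    """Sample k candidate answers per prompt, keep the one the reward
    model scores highest, and return the winners as new supervised
    fine-tuning data; in Llama 2 this step was followed by PPO."""
    winners = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(k)]
        best = max(candidates, key=lambda c: reward_model.score(prompt, c))
        winners.append((prompt, best))
    return winners
```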
Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
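The data-construction side of this might be sketched as follows (a strong simplification of the procedure in the Llama 2 paper; all names here are illustrative):

```python
def ghost_attention_sample(instruction: str, turns: list) -> list:
    """turns: list of (user_msg, assistant_msg) pairs. Prepend the
    system instruction to every user turn so sampled answers respect
    it; at training time, the loss on tokens of the earlier parts of
    the dialog is zeroed out, as described above."""
    return [(instruction + "\n" + user_msg, assistant_msg)
            for user_msg, assistant_msg in turns]
```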
Applications
The Stanford University Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost. The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.
Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine. It shows increased performance on medical-related benchmarks such as MedQA and MedMCQA.
Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide helpful presentation tips, and assist with message responses. This AI Companion is powered by multiple models, including Meta Llama 2.
Reuters reported in 2024 that many Chinese foundation models relied on Llama models for their training.
llama.cpp
Software developer Georgi Gerganov released llama.cpp as open-source on March 10, 2023. It is a re-implementation of LLaMA inference in C++, allowing systems without a powerful GPU to run the model locally. The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata. The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.
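The memory/precision trade-off can be illustrated with naive symmetric 8-bit quantization (a toy sketch; GGUF's actual block-wise quantization schemes are more elaborate):

```python
import numpy as np

weights = np.random.randn(4096).astype(np.float32)

# One scale for the whole tensor; real formats use one per small block.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)   # 1 byte per weight
dequantized = quantized.astype(np.float32) * scale

print(weights.nbytes, "->", quantized.nbytes, "bytes")  # 4x smaller
print("max abs error:", np.abs(weights - dequantized).max())
```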
llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for FP16 and 8-bit quantized data types.
Military
In 2024, researchers from the People's Liberation Army Academy of Military Sciences (the top military academy of China) were reported to have developed a military tool using Llama, which Meta Platforms stated was unauthorized due to Llama's license prohibiting the use of the model for military purposes. Meta granted the US government and US military contractors permission to use Llama in November 2024, but continued to prohibit military use by non-US entities.
Reception
''Wired'' describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.
The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.
According to the Q4 2023 earnings transcript, Meta adopted the strategy of open weights to improve model safety and iteration speed, to increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.
The release of Llama models has sparked significant debate on the benefits and misuse risks of open-weight models. Such models can be fine-tuned, notably by cybercriminals, to remove safeguards until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including safety research.
Open Source Initiative head Stefano Maffulli criticized Meta for describing Llama as open source, saying that it was causing confusion among users and "polluting" the term.
See also
* GPT-4o
* IBM Granite, an open-source LLM made by IBM
* Mistral AI, a French open-source AI company