Llama (language model)

Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 4, released in April 2025. Llama models come in different sizes, ranging from 1 billion to 2 trillion parameters. The first version was released only as a foundation model; starting with Llama 2, Meta AI has released instruction fine-tuned versions alongside the foundation models.

Model weights for the first version of Llama were made available only to researchers, on a case-by-case basis and under a non-commercial license. Unauthorized copies of the model were shared via BitTorrent. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, as well as a standalone website; both services use a Llama 3 model.


Background

After the release of large language models such as GPT-3, a focus of research was scaling up models, which in some instances showed major increases in emergent capabilities. The release of ChatGPT and its surprise success increased attention to large language models. Compared with other responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.

The Llama series also served as an empirical investigation of scaling laws: the Llama 3 models showed that when a model is trained on far more data than the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset size for Llama 3 8B is about 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens.


Initial release

LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance. The inference code used to run the model was publicly released under the open-source GPLv3 license. Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world". LLaMA was trained only on publicly available information, and was trained at various model sizes with the intention of making it more accessible to different hardware. The model was released exclusively as a foundation model, although the paper contained examples of instruction fine-tuned versions. Meta AI reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.


Leak

On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities. That same day, a pull request was opened on the main LLaMA repository, requesting to add the magnet link to the official documentation. On March 4, a pull request was opened to add links to Hugging Face repositories containing the model. On March 6, Meta filed takedown requests against the Hugging Face repositories linked in the pull request, characterizing them as "unauthorized distribution" of the model; Hugging Face complied. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions can be run relatively cheaply, suggesting this would promote additional research developments. Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.


Llama 2

On July 18, 2023, in partnership with Microsoft, Meta announced Llama 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters. The model architecture remained largely unchanged from that of the LLaMA-1 models, but 40% more data was used to train the foundation models. The accompanying preprint also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets. Llama 2 includes both foundation models and models fine-tuned for chat. In a further departure from the original version of LLaMA, all models were released with weights and may be used for many commercial use cases. However, because Llama's license enforces an acceptable use policy that prohibits Llama from being used for some purposes, Meta's use of the term ''open source'' to describe Llama has been disputed by the Open Source Initiative (which maintains ''The Open Source Definition'') and others.

Code Llama is a fine-tune of Llama 2 using code-specific datasets. 7B, 13B, and 34B versions were released on August 24, 2023, with the 70B version following on January 29, 2024. Starting from the Llama 2 foundation models, Meta AI trained on an additional 500B tokens of code data, followed by 20B tokens of long-context data, to create the Code Llama foundation models. These were further trained on 5B tokens of instruction-following data to create the instruct fine-tunes. Another foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.


Llama 3

On April 18, 2024, Meta released Llama 3 in two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing showed in April 2024 that Llama 3 70B was beating Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window. During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2, and that the team was surprised the 70B model was still learning even at the end of its 15T-token training; the decision was made to end training in order to focus GPU power elsewhere. Llama 3.1 was released on July 23, 2024, in three sizes: 8B, 70B, and 405B parameters.


Llama 4

The Llama 4 series was released in 2025. The architecture was changed to a mixture of experts (MoE; a minimal routing sketch appears at the end of this section). The models are multimodal (text and image input, text output) and multilingual (12 languages). Specifically, on April 5, 2025, the following were released, each as both a base and an instruction-tuned version:
* Scout: 17 billion active parameters with 16 experts, a context window of 10M tokens, and 109B total parameters.
* Maverick: 17 billion active parameters with 128 experts, a context window of 1M tokens, and 400B total parameters.
Also announced was Behemoth (not yet released): 288 billion active parameters with 16 experts and around 2T total parameters; it was still in training at the time. Scout was trained from scratch, whereas Maverick was "codistilled" from Behemoth; Scout was also trained for longer and has a longer context length than Maverick. The training data included publicly available data, licensed data, and Meta-proprietary data such as publicly shared posts from Instagram and Facebook and people's interactions with Meta AI. The data cutoff was August 2024.

Meta claimed in its release announcement that Llama 4 bested GPT-4o's score on the LMArena AI benchmark. The company also stated that Llama 4's benchmark score was achieved using an unreleased "experimental chat version" of the model that was "optimized for conversationality", which differed from the version released to the public. LMArena indicated that it would change its policies to prevent this incident from recurring, and responded, "Meta's interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that 'Llama-4-Maverick-03-26-Experimental' was a customized model to optimize for human preference." Some users criticized Meta on social media for its use of a separate model version tailored for benchmarking, and some additionally accused Meta of training Llama 4 on test sets to further boost its benchmark scores, which Meta denied.
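To illustrate the mixture-of-experts routing mentioned at the start of this section, here is a minimal sketch; the layer sizes, expert count, and top-1 routing are illustrative only, and production MoE layers (including Llama 4's) add shared experts, load-balancing losses, and capacity constraints:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-1 routing."""
    def __init__(self, d_model=64, d_ff=256, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weight, idx = gates.max(dim=-1)                # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                            # tokens routed to expert e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

# Only one expert's parameters run per token, which is why a model's total
# parameter count (all experts) far exceeds its "active" parameter count.
layer = MoELayer()
print(layer(torch.randn(8, 64)).shape)   # torch.Size([8, 64])
```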


Comparison of models

For the training cost column, only the largest model's cost is written by default; for example, "21,000" is the training cost of Llama 2 70B in units of petaFLOP-days, where 1 petaFLOP-day = 1 petaFLOP/s × 1 day = 8.64×10^19 FLOP. "T" means "trillion" and "B" means "billion". The main model versions of Llama, with the release dates and sizes given elsewhere in this article, are summarized below:

Version      Release date    Model sizes                                    Pre-training data
LLaMA        Feb 24, 2023    7B, 13B, 33B, 65B                              1.4T tokens
Llama 2      Jul 18, 2023    7B, 13B, 70B                                   2T tokens
Code Llama   Aug 24, 2023    7B, 13B, 34B (70B on Jan 29, 2024)             Llama 2 plus ~520B code and long-context tokens
Llama 3      Apr 18, 2024    8B, 70B                                        ~15T tokens
Llama 3.1    Jul 23, 2024    8B, 70B, 405B                                  ~15T tokens
Llama 3.2    Sep 25, 2024    1B, 3B; 11B, 90B (vision)                      ~15T tokens
Llama 3.3    Dec 6, 2024     70B                                            ~15T tokens
Llama 4      Apr 5, 2025     109B (Scout), 400B (Maverick); ~2T (Behemoth, unreleased)   Mixture of experts; data cutoff Aug 2024


Architecture and training


Architecture

Like GPT-3, the Llama series of models are autoregressive, decoder-only transformers, but there are some minor differences:
* the SwiGLU activation function instead of GeLU;
* rotary positional embeddings (RoPE) instead of absolute positional embeddings;
* RMSNorm instead of layer normalization.
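A minimal PyTorch sketch of two of these components follows; the dimensions and the hidden-size choice are illustrative, and Meta's reference implementation differs in detail:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the activations
    with a learned gain, but (unlike LayerNorm) no mean subtraction or bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: x -> W2(SiLU(W1 x) * W3 x)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)   # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)   # output projection

    def forward(self, x):
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))

x = torch.randn(4, 512)
# hidden size here is roughly 8/3 x dim, mirroring Llama's convention
print(SwiGLU(512, 1408)(RMSNorm(512)(x)).shape)   # torch.Size([4, 512])
```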


Training datasets

LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost for LLMs comes from inference on the trained model rather than from the computational cost of training. LLaMA 1 foundation models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:
* webpages scraped by CommonCrawl;
* open-source repositories of source code from GitHub;
* Wikipedia in 20 languages;
* public domain books from Project Gutenberg;
* the Books3 books dataset;
* the LaTeX source code of scientific papers uploaded to arXiv;
* questions and answers from Stack Exchange websites.

On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open-source version of the LLaMA dataset. The dataset has approximately 1.2 trillion tokens and is publicly available for download.

Llama 2 foundation models were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data, and it upsamples sources considered trustworthy. Llama 2 - Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize and StackExchange.

Llama 3 consists of mainly English data, with over 5% in more than 30 other languages. Its dataset was filtered by a text-quality classifier, which was itself trained on text synthesized by Llama 2 (a toy illustration of such filtering appears at the end of this subsection).

In a lawsuit brought by Richard Kadrey and others against Meta Platforms, CEO Mark Zuckerberg was alleged to have authorized the use of copyrighted content from Library Genesis to train Llama models and to have concealed this by removing copyright markers from the data.
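As a toy illustration of classifier-based quality filtering, here is a minimal sketch; everything in it is hypothetical, the scoring function is a stand-in for a trained classifier, and Meta has not published its pipeline at this level of detail:

```python
# Hypothetical quality filter: keep documents whose predicted quality score
# exceeds a threshold. quality_model stands in for a classifier trained on
# text labeled (or synthesized) by a stronger LLM, as described above.
def filter_corpus(documents, quality_model, threshold=0.5):
    return [doc for doc in documents if quality_model(doc) >= threshold]

# Toy stand-in classifier: favors longer documents with diverse vocabulary.
def toy_quality_model(doc):
    words = doc.split()
    if not words:
        return 0.0
    diversity = len(set(words)) / len(words)      # fraction of unique words
    length_bonus = min(len(words) / 100, 1.0)     # saturating length reward
    return 0.5 * diversity + 0.5 * length_bonus

docs = ["spam spam spam spam", "A short but varied passage about transformers."]
print(filter_corpus(docs, toy_quality_model, threshold=0.4))
```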


Fine-tuning

Llama 1 models are available only as foundation models, trained with self-supervised learning and without fine-tuning. Llama 2 - Chat models were derived from the foundation Llama 2 models. Unlike GPT-4, which increased context length during fine-tuning, Llama 2 and Code Llama - Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function, with the token loss on user prompts zeroed out (see the sketch at the end of this subsection) and a batch size of 64.

For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences for safety and helpfulness using reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of proximal policy optimization (PPO) for RLHF: a new technique based on rejection sampling was used, followed by PPO.

Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected throughout the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
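A minimal sketch of the prompt-loss masking described above; the token ids and prompt length are illustrative, and -100 is simply PyTorch's standard `ignore_index` convention for cross-entropy:

```python
import torch
import torch.nn.functional as F

# Toy batch: token ids for "<prompt tokens><response tokens>".
prompt_len = 4
input_ids = torch.tensor([[5, 12, 7, 9, 30, 31, 32, 2]])   # last 4 = response

# Mask out positions whose target token belongs to the prompt, so only
# response tokens contribute to the autoregressive loss.
labels = input_ids.clone()
labels[:, :prompt_len] = -100        # -100 = ignore_index for cross_entropy

vocab = 50
logits = torch.randn(1, input_ids.size(1), vocab)   # stand-in for model output

loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),   # predictions for positions 1..n-1
    labels[:, 1:].reshape(-1),           # shifted targets; prompt tokens ignored
    ignore_index=-100,
)
print(loss)
```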


Applications

The Stanford University Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3-series text-davinci-003 model at modest cost. The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.

Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine, and it shows increased performance on medical benchmarks such as MedQA and MedMCQA.

Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide presentation tips, and assist with message responses; the AI Companion is powered by multiple models, including Meta Llama 2.

Reuters reported in 2024 that many Chinese foundation models relied on Llama models for their training.


llama.cpp

Software developer Georgi Gerganov released llama.cpp as open source on March 10, 2023. It is a re-implementation of LLaMA inference in C++, allowing systems without a powerful GPU to run the model locally. The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata. The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.

llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with a model into a single executable file. Tunney et al. introduced new optimized matrix-multiplication kernels for x86 and ARM CPUs, improving prompt-evaluation performance for FP16 and 8-bit quantized data types.
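A minimal sketch of running a quantized GGUF model locally through the llama-cpp-python bindings to llama.cpp; the model file name is a placeholder for a locally downloaded model, and keyword arguments can vary across library versions:

```python
# pip install llama-cpp-python   (Python bindings around llama.cpp)
from llama_cpp import Llama

# Load a quantized GGUF file; n_ctx sets the context window in tokens.
# "llama-2-7b-chat.Q4_K_M.gguf" is a placeholder path, not a bundled file.
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])
```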


Military

In 2024, researchers from the People's Liberation Army Academy of Military Sciences (the top military academy of China) were reported to have developed a military tool using Llama, which Meta Platforms stated was unauthorized because Llama's license prohibits use of the model for military purposes. Meta granted the US government and US military contractors permission to use Llama in November 2024, but continued to prohibit military use by non-US entities.


Reception

''Wired'' describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size. The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child. According to its Q4 2023 earnings transcript, Meta adopted a strategy of open weights to improve model safety and iteration speed, to increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.

The release of Llama models has sparked significant debate about the benefits and misuse risks of open-weight models. Such models can be fine-tuned to remove safeguards, notably by cybercriminals, until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including safety research. Open Source Initiative head Stefano Maffulli criticized Meta for describing Llama as ''open source'', saying that it was causing confusion among users and "polluting" the term.


See also

* GPT-4o
* IBM Granite, an open-source LLM made by IBM
* Mistral AI, a French open-source AI company

