Mistral AI, headquartered in Paris, France, specializes in artificial intelligence (AI) products and focuses on open-weight large language models (LLMs). Founded in April 2023 by former engineers from Google DeepMind and Meta Platforms, the company has gained prominence as an alternative to proprietary AI systems. Named after the mistral, a powerful, cold wind in southern France, the company emphasizes openness and innovation in the AI field and positions itself as an alternative to proprietary models.
In December 2023, Mistral AI raised €385 million; by then it was valued at over $2 billion.
In June 2024, Mistral AI secured a €600 million ($645 million) funding round, elevating its valuation to €5.8 billion ($6.2 billion). The round was led by venture capital firm General Catalyst and included additional contributions from existing investors. The funds are intended to support the company's expansion.
Mistral AI has published three open-source models available as weights. Additionally, three more models (Small, Medium, and Large) are available via API only.
Based on valuation, the company is in fourth place in the global AI race and in first place outside the San Francisco Bay Area, ahead of several of its peers, such as Cohere, Hugging Face, Inflection, Perplexity and Together. Mistral AI aims to "democratize" AI by focusing on open-source innovation.
History
Mistral AI was established in April 2023 by three French AI researchers: Arthur Mensch, Guillaume Lample and Timothée Lacroix. Mensch, a former researcher at Google DeepMind, brought expertise in advanced AI systems, while Lample and Lacroix contributed their experience from Meta Platforms, where they specialized in developing large-scale AI models. The trio initially met during their studies at École Polytechnique, a public university in France.
In June 2023, the start-up raised €105 million ($117 million) in its first funding round, with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The Financial Times then estimated the company's valuation at €240 million ($267 million).
On 27 September 2023, the company made its language model “Mistral 7B” available under the free Apache 2.0 license. This model has 7 billion parameters, a small size compared to its competitors.
On 10 December 2023, Mistral AI announced that it had raised €385 million ($428 million) in its second funding round. This round of financing involved the Californian fund Andreessen Horowitz, BNP Paribas and the software publisher Salesforce.
On 11 December 2023, the company released the Mixtral 8x7B model, which has 46.7 billion parameters but uses only 12.9 billion per token thanks to its mixture of experts architecture. The model handles 5 languages (French, Spanish, Italian, English and German) and, according to its developers' tests, outperforms the "Llama 2 70B" model from Meta. A version trained to follow instructions, called “Mixtral 8x7B Instruct”, is also offered.
On 26 February 2024, Microsoft announced a new partnership with the company to expand its presence in the artificial intelligence industry. Under the agreement, Mistral's language models will be available on Microsoft's Azure cloud, while the multilingual conversational assistant ''Le Chat'' will be launched in the style of ChatGPT.
On 10 April 2024, the company released the mixture of experts model Mixtral 8x22B, which offers high performance on various benchmarks compared to other open models.
On 16 April 2024, reporting revealed that Mistral was in talks to raise €500 million, a deal that would more than double its current valuation to at least €5 billion.
On 19 November 2024, the company announced updates for ''Le Chat''. It added the ability to create images, in partnership with Black Forest Labs, using the Flux Pro model. It also introduced web search, so that answers can draw on up-to-date information, and launched the Canvas system, a collaborative interface where the AI generates code and the user can modify it. The company also introduced a new model, Pixtral Large, which is an improvement over Pixtral 12B and couples a 1-billion-parameter visual encoder with Mistral Large 2. The latter has also been enhanced, particularly for long contexts and function calls.
The company had over 100 employees by late fall 2024.
Models
Open-weight models
Mistral 7B
Mistral 7B is a 7.3B-parameter language model using the transformer architecture. It was officially released on September 27, 2023, via a BitTorrent magnet link and on Hugging Face, under the Apache 2.0 license. The release blog post claimed that the model outperforms LLaMA 2 13B on all benchmarks tested and is on par with LLaMA 34B on many of them.
Mistral 7B employs grouped-query attention (GQA), a variant of standard multi-head attention in which the query heads are divided into groups and each group shares a single key head and value head. Sharing key/value heads shrinks the key-value cache and the memory traffic needed while generating tokens, which improves inference efficiency and scalability.
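As a rough illustration of grouped-query attention, the following pure-Python sketch lets several query heads share one key/value head. The function name, the tiny list-based tensors and the causal masking are assumptions chosen for readability; it is not Mistral's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Causal attention where groups of query heads share one K/V head.

    q:    [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim], with n_q_heads % n_kv_heads == 0
    Returns a list shaped like q.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    head_dim = len(q[0][0])

    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # every query head in a group reads the same K/V head
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal: position t only attends to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q_t, k[kv][s])) / math.sqrt(head_dim)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append([
                sum(weights[s] * v[kv][s][j] for s in range(t + 1))
                for j in range(head_dim)
            ])
        out.append(head_out)
    return out
```

With 4 query heads and 2 key/value heads, heads 0 and 1 read the same keys and values, so only half as many key/value tensors need to be cached as in standard multi-head attention.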
Both a base model and "instruct" model were released with the latter receiving additional tuning to follow chat-style prompts. The fine-tuned model is only intended for demonstration purposes, and does not have guardrails or moderation built-in.
Mixtral 8x7B
Much like Mistral's first model, Mixtral 8x7B was released via a BitTorrent link posted on Twitter on December 9, 2023, with a Hugging Face release and a blog post following two days later.
Unlike the previous Mistral model, Mixtral 8x7B uses a sparse mixture of experts architecture. The model has 8 distinct groups of "experts", giving it a total of 46.7B usable parameters. Each token uses only 12.9B of those parameters, so inference has roughly the speed and cost of a 12.9B-parameter model.
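The sparse routing idea can be sketched in a few lines of Python. The sketch assumes that each token is sent to the 2 highest-scoring of the 8 experts (the text above does not state the number of experts used per token) and that the gate weights are a softmax over the selected scores. The function names and the toy experts are hypothetical, and `split_params` only back-solves the two totals quoted above under that top-2 assumption.

```python
import math


def top2_gate(router_logits):
    """Pick the two highest-scoring experts and normalise their weights."""
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    m = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - m) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]


def moe_layer(x, router_logits, experts):
    """Weighted sum of the selected experts' outputs; the others never run."""
    out = [0.0] * len(x)
    for idx, weight in top2_gate(router_logits):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def split_params(total_b, active_b, n_experts=8, k=2):
    """Solve total = shared + n_experts*e and active = shared + k*e.

    Returns (shared, per_expert) in billions of parameters. This is an
    estimate implied by the assumed top-k routing, not a published figure.
    """
    per_expert = (total_b - active_b) / (n_experts - k)
    shared = active_b - k * per_expert
    return shared, per_expert


# Toy experts: expert i simply scales its input by i (assumption for the demo).
toy_experts = [lambda x, s=i: [s * xi for xi in x] for i in range(8)]
logits = [0.0, 2.0, -1.0, 1.0, 0.5, -2.0, 0.3, 0.2]
mixed = moe_layer([1.0, 2.0], logits, toy_experts)
shared_b, expert_b = split_params(46.7, 12.9)
```

Under these assumptions the quoted totals would imply roughly 5.6B parameters per expert and about 1.6B shared parameters, which is why the per-token cost tracks the 12.9B figure rather than the 46.7B one.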
Mistral AI's testing shows the model beats both LLaMA 2 70B and GPT-3.5 in most benchmarks.
In March 2024, research conducted by Patronus AI compared the performance of LLMs on a 100-question test with prompts to generate text from books protected under U.S. copyright law. It found that OpenAI's GPT-4, Mixtral, Meta AI's LLaMA-2, and Anthropic's Claude 2 generated copyrighted text verbatim in 44%, 22%, 10%, and 8% of responses respectively.
Mixtral 8x22B
Similar to Mistral's previous open models, Mixtral 8x22B was released via a BitTorrent link on Twitter on April 10, 2024, with a release on Hugging Face soon after.
The model uses an architecture similar to that of Mixtral 8x7B, but with each expert having 22 billion parameters instead of 7. In total, the model contains 141 billion parameters rather than 8 × 22 = 176 billion, as some parameters are shared among the experts.
Mistral Large 2
Mistral Large 2 was announced on July 24, 2024, and released on Hugging Face. Unlike the previous Mistral Large, this version was released with open weights. It is available for free under the Mistral Research License, and under a commercial license for commercial purposes. Mistral AI claims that it is fluent in dozens of languages, including many programming languages. The model has 123 billion parameters and a context length of 128,000 tokens. Its performance in benchmarks is competitive with Llama 3.1 405B, particularly in programming-related tasks.
Codestral 22B
Codestral is Mistral's first code-focused open-weight model. It was launched on 29 May 2024 as a lightweight model specifically built for code generation tasks. As of its release date, it surpassed Meta's Llama 3 70B and DeepSeek Coder 33B (78.2% - 91.6%), another code-focused model, on the HumanEval FIM benchmark. Mistral claims Codestral is fluent in more than 80 programming languages. Codestral has its own license, which forbids its use for commercial purposes.
Mathstral 7B
Mathstral 7B is a model with 7 billion parameters released by Mistral AI on July 16, 2024. It focuses on STEM subjects, achieving a score of 56.6% on the MATH benchmark and 63.47% on the MMLU benchmark.
The model was produced in collaboration with Project Numina and was released under the Apache 2.0 License. It has a context length of 32k tokens.
Codestral Mamba 7B
Codestral Mamba is based on the Mamba 2 architecture, which allows it to generate responses even with longer input.
Unlike Codestral, it was released under the Apache 2.0 license. While previous releases often included both the base model and the instruct version, only the instruct version of Codestral Mamba was released.
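To give a feel for why a state-space design such as Mamba 2 copes with long inputs, the toy recurrence below carries a single fixed-size state from token to token instead of attending over the whole history. It is a deliberately simplified scalar sketch: the function name and the constants `a`, `b` and `c` are assumptions, and the real architecture uses learned, input-dependent parameters.

```python
def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Run h_t = a*h_{t-1} + b*x_t and y_t = c*h_t over a list of floats."""
    h = 0.0  # the only memory of the past; its size does not grow with len(xs)
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the running state
        ys.append(c * h)   # read the output from the state
    return ys
```

Each step costs the same amount of work and memory regardless of how many tokens came before, so the total cost grows linearly with input length.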
API-only models
Unlike Mistral 7B, Mixtral 8x7B and Mixtral 8x22B, the following models are closed-source and only available through the Mistral API.
Mistral Large
Mistral Large was launched on February 26, 2024, and Mistral claims it is second in the world only to OpenAI's GPT-4.
It is fluent in English, French, Spanish, German, and Italian, with Mistral claiming understanding of both grammar and cultural context, and provides coding capabilities. As of early 2024, it is Mistral's flagship AI. It is also available on Microsoft Azure.
In July 2024, Mistral Large 2 was released, replacing the original Mistral Large. Unlike the original model, it was released with open weights.
Mistral Medium
Mistral Medium is trained on various languages, including English, French, Italian, German and Spanish, as well as code, and scores 8.6 on MT-Bench. It is ranked in performance above Claude and below GPT-4 on the LMSys Elo Arena benchmark.
The number of parameters and the architecture of Mistral Medium are not known, as Mistral has not published information about them.
Mistral Small
Like the Large model, Mistral Small was launched on February 26, 2024.