Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name.


History


Development

Google announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor to PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages. Gemini was said to differ from other LLMs in that it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, including text, images, audio, video, and computer code. It had been developed as a collaboration between DeepMind and Google Brain, two branches of Google that had been merged as Google DeepMind the previous month. In an interview with ''Wired'', DeepMind CEO Demis Hassabis touted Gemini's advanced capabilities, which he believed would allow the algorithm to trump OpenAI's ChatGPT, which runs on GPT-4 and whose growing popularity had been aggressively challenged by Google with LaMDA and Bard. Hassabis highlighted the strengths of DeepMind's AlphaGo program, which gained worldwide attention in 2016 when it defeated Go champion Lee Sedol, saying that Gemini would combine the power of AlphaGo and other Google–DeepMind LLMs.

In August 2023, ''The Information'' published a report outlining Google's roadmap for Gemini, revealing that the company was targeting a launch date of late 2023. According to the report, Google hoped to surpass OpenAI and other competitors by combining conversational text capabilities present in most LLMs with artificial intelligence–powered image generation, allowing it to create contextual images and be adapted for a wider range of use cases. As with Bard, Google co-founder Sergey Brin was summoned out of retirement to assist in the development of Gemini, along with hundreds of other engineers from Google Brain and DeepMind; he was later credited as a "core contributor" to Gemini. Because Gemini was being trained on transcripts of YouTube videos, lawyers were brought in to filter out any potentially copyrighted materials.

With news of Gemini's impending launch, OpenAI hastened its work on integrating GPT-4 with multimodal features similar to those of Gemini. ''The Information'' reported in September that several companies had been granted early access to "an early version" of the LLM, which Google intended to make available to clients through Google Cloud's Vertex AI service. The publication also stated that Google was arming Gemini to compete with both GPT-4 and Microsoft's GitHub Copilot.


Launch

On December 6, 2023, Pichai and Hassabis announced "Gemini 1.0" at a virtual press conference. It comprised three models: Gemini Ultra, designed for "highly complex tasks"; Gemini Pro, designed for "a wide range of tasks"; and Gemini Nano, designed for "on-device tasks". At launch, Gemini Pro and Nano were integrated into Bard and the Pixel 8 Pro smartphone, respectively, while Gemini Ultra was set to power "Bard Advanced" and become available to software developers in early 2024. Other products that Google intended to incorporate Gemini into included Search, Ads, Chrome, Duet AI on Google Workspace, and AlphaCode 2. At launch, Gemini was available only in English. Gemini was touted as Google's "largest and most capable AI model" and designed to emulate human behavior; the company stated that it would not be made widely available until the following year due to the need for "extensive safety testing". Gemini was trained on and powered by Google's Tensor Processing Units (TPUs), and the name is in reference to the DeepMind–Google Brain merger as well as NASA's Project Gemini.

Gemini Ultra was said to have outperformed GPT-4, Anthropic's Claude 2, Inflection AI's Inflection-2, Meta's LLaMA 2, and xAI's Grok 1 on a variety of industry benchmarks, while Gemini Pro was said to have outperformed GPT-3.5. Gemini Ultra was also the first language model to outperform human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%. Gemini Pro was made available to Google Cloud customers on AI Studio and Vertex AI on December 13, while Gemini Nano was to be made available to Android developers as well. Hassabis further revealed that DeepMind was exploring how Gemini could be "combined with robotics to physically interact with the world".

In accordance with an executive order signed by U.S. President Joe Biden in October, Google stated that it would share testing results of Gemini Ultra with the federal government of the United States. Similarly, the company was engaged in discussions with the government of the United Kingdom to comply with the principles laid out at the AI Safety Summit at Bletchley Park in November.


Updates

Google partnered with Samsung to integrate Gemini Nano and Gemini Pro into its Galaxy S24 smartphone lineup in January 2024. The following month, Bard and Duet AI were unified under the Gemini brand, with "Gemini Advanced with Ultra 1.0" debuting via a new "AI Premium" tier of the Google One subscription service. Gemini Pro also received a global launch.

In February, Google launched "Gemini 1.5" in a limited capacity, positioned as a more powerful and capable model than 1.0 Ultra. This "step change" was achieved through various technical advancements, including a new architecture, a mixture-of-experts approach, and a larger one-million-token context window, which equates to roughly an hour of silent video, 11 hours of audio, 30,000 lines of code, or 700,000 words. The same month, Google debuted Gemma, a family of free and open-source LLMs that serve as a lightweight version of Gemini. They come in two sizes, with two billion and seven billion parameters, respectively. Multiple publications viewed this as a response to Meta and others open-sourcing their AI models, and a stark reversal from Google's longstanding practice of keeping its AI proprietary.
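The equivalences quoted for the one-million-token context window imply rough average token rates per unit of content. The sketch below only divides the figures stated above; the resulting rates are back-of-the-envelope averages, not published tokenizer statistics, and the names `CONTEXT_TOKENS`, `EQUIVALENTS`, and `tokens_per_unit` are illustrative.

```python
# Back-of-the-envelope rates implied by the quoted context-window equivalences.
# All inputs come from the figures in the text; the outputs are rough averages only.

CONTEXT_TOKENS = 1_000_000  # one-million-token context window

# Amount of each content type said to fit in the window.
EQUIVALENTS = {
    "words": 700_000,
    "lines_of_code": 30_000,
    "audio_seconds": 11 * 3600,  # 11 hours of audio
    "video_seconds": 1 * 3600,   # roughly one hour of silent video
}


def tokens_per_unit(context_tokens: int, units: int) -> float:
    """Average number of tokens consumed by one unit of content."""
    return context_tokens / units


rates = {name: tokens_per_unit(CONTEXT_TOKENS, n) for name, n in EQUIVALENTS.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.1f} tokens each")
```

On these figures, a word averages a little under one and a half tokens, while a second of video consumes roughly ten times as many tokens as a second of audio.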


Technical specifications

The first generation of Gemini ("Gemini 1") has three models, with the same software architecture. They are decoder-only transformers, with modifications to allow efficient training and inference on TPUs. They have a context length of 32,768 tokens, with multi-query attention. Two versions of Gemini Nano, Nano-1 (1.8 billion parameters) and Nano-2 (3.25 billion parameters), are distilled from larger Gemini models, designed for use by edge devices such as smartphones.

As Gemini is multimodal, each context window can contain multiple forms of input. The different modes can be interleaved and do not have to be presented in a fixed order, allowing for a multimodal conversation. For example, the user might open the conversation with a mix of text, picture, video, and audio, presented in any order, and Gemini might reply with the same free ordering. Input images may be of different resolutions, while video is inputted as a sequence of images. Audio is sampled at 16 kHz and then converted into a sequence of tokens by the Universal Speech Model.

Gemini's dataset is multimodal and multilingual, consisting of "web documents, books, and code, and includ[ing] image, audio, and video data". Demis Hassabis claimed that training Gemini 1 used "roughly the same amount of compute, maybe slightly more than what was rumored for GPT-4".

The second generation of Gemini ("Gemini 1.5") has one model published so far: Gemini 1.5 Pro. It is a multimodal sparse mixture-of-experts model, with a context length of "multiple millions" of tokens.
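Multi-query attention, mentioned above, can be illustrated with a minimal pure-Python sketch. This is a generic illustration of the technique with the causal masking used by decoder-only models, not Gemini's implementation, which has not been published in this detail; the function names, the toy dimensions, and the sample values are all assumptions made for the example. The defining feature is that every query head attends over one shared set of keys and values, so only a single key/value set per position needs to be cached during inference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_query_attention(queries, keys, values):
    """Causal multi-query attention.

    queries: [num_heads][seq_len][d], one set of queries per head
    keys, values: [seq_len][d], a single set shared by all heads
    returns: [num_heads][seq_len][d]
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for head_queries in queries:
        head_out = []
        for t, q in enumerate(head_queries):
            # Causal mask: position t may only attend to positions 0..t.
            scores = [dot(q, keys[s]) * scale for s in range(t + 1)]
            weights = softmax(scores)
            mixed = [
                sum(w * values[s][j] for s, w in enumerate(weights))
                for j in range(len(values[0]))
            ]
            head_out.append(mixed)
        outputs.append(head_out)
    return outputs


# Toy example (hypothetical values): 2 query heads, 3 positions, dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],  # head 0
    [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],  # head 1
]
out = multi_query_attention(queries, keys, values)
```

In standard multi-head attention each head would have its own `keys` and `values`; sharing them across heads, as here, shrinks the key/value cache by a factor equal to the number of heads, which matters more as the context length grows.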


Reception

Gemini's launch was preceded by months of intense speculation and anticipation, which ''MIT Technology Review'' described as "peak AI hype". In August 2023, Dylan Patel and Daniel Nishball of research firm SemiAnalysis penned a blog post declaring that the release of Gemini would "eat the world" and outclass GPT-4, prompting OpenAI CEO Sam Altman to ridicule the duo on X (formerly Twitter). Business magnate Elon Musk, who co-founded OpenAI, weighed in, asking, "Are the numbers wrong?" Hugh Langley of ''Business Insider'' remarked that Gemini would be a make-or-break moment for Google, writing: "If Gemini dazzles, it will help Google change the narrative that it was blindsided by Microsoft and OpenAI. If it disappoints, it will embolden critics who say Google has fallen behind."

Reacting to its unveiling in December 2023, University of Washington professor emeritus Oren Etzioni predicted a "tit-for-tat arms race" between Google and OpenAI. Professor Alexei Efros of the University of California, Berkeley praised the potential of Gemini's multimodal approach, while scientist Melanie Mitchell of the Santa Fe Institute called Gemini "very sophisticated". Professor Chirag Shah of the University of Washington was less impressed, likening Gemini's launch to the routineness of Apple's annual introduction of a new iPhone. Similarly, Stanford University's Percy Liang, the University of Washington's Emily Bender, and the University of Galway's Michael Madden cautioned that it was difficult to interpret benchmark scores without insight into the training data used. Writing for ''Fast Company'', Mark Sullivan opined that Google had the opportunity to challenge the iPhone's dominant market share, believing that Apple was unlikely to have the capacity to develop functionality similar to Gemini with its Siri virtual assistant. Google shares spiked by 5.3 percent the day after Gemini's launch. Google faced criticism for a demonstration video of Gemini that was not conducted in real time.


See also

* Gato, a multimodal neural network developed by DeepMind


External links

* Press release via ''The Keyword''
* Announcement and demo on YouTube
* White paper for 1.0 and 1.5