HLE (Humanity's Last Exam)
   HOME





HLE (Humanity's Last Exam)
Humanity's Last Exam (HLE) is a language model benchmark consisting of 2,500 questions across a broad range of subjects. It was created jointly by the Center for AI Safety and Scale AI. Creation Stanford HAI's AI Index 2025 Annual Report cites Humanity's Last Exam as one of the "more challenging benchmarks" developed in response to the popular AI benchmarks having reached "saturation". The test has been described as the brainchild of Dan Hendrycks, a machine learning researcher and the director of the Center for AI Safety, who stated that he was inspired to create the test after a conversation with Elon Musk, who thought the existing language model benchmarks, such as the MMLU, were too easy. Hendrycks worked with Scale AI to compile the questions. The questions were crowdsourced from subject matter experts from various institutions across the world. The questions were first filtered by the leading AI models; if the models failed to answer the question or did worse than random ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Language Model Benchmark
Language model benchmarks are standardized tests designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like question answering, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. Overview Types Benchmarks may be described by the following adjectives, not mutually exclusive: * Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as w ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Anthropic
Anthropic PBC is an American artificial intelligence (AI) startup company founded in 2021. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI's ChatGPT and Google's Gemini. According to the company, it researches and develops AI to "study their safety properties at the technological frontier" and use this research to deploy safe models for the public. Anthropic was founded by former members of OpenAI, including siblings Daniela Amodei and Dario Amodei. In September 2023, Amazon announced an investment of up to $4 billion, followed by a $2 billion commitment from Google in the following month. History Founding and early development (2021–2022) Anthropic was founded in 2021 by seven former employees of OpenAI, including siblings Daniela Amodei and Dario Amodei, the latter of whom served as OpenAI's Vice President of Research. In April 2022, Anthropic announced it had received $580 million in funding, including a $500 ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Large Language Models
A large language model (LLM) is a language model trained with Self-supervised learning, self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially Natural language generation, language generation. The largest and most capable LLMs are Generative pre-trained transformer, generative pretrained transformers (GPTs), which are largely used in Generative artificial intelligence, generative Chatbot, chatbots such as ChatGPT or Gemini (chatbot), Gemini. LLMs can be Fine-tuning (deep learning), fine-tuned for specific tasks or guided by prompt engineering. These models acquire Predictive learning, predictive power regarding syntax, semantics, and Ontology (information science), ontologies inherent in human Text corpus, language corpora, but they also inherit inaccuracies and Algorithmic bias, biases present in the Training, validation, and test data sets, data they are trained in. History Before the emergence of transformer-bas ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Qwen
''Qwen'' (also called ''Tongyi Qianwen'', zh, s=通义千问) is a family of large language models developed by Alibaba Cloud. In July 2024, it was ranked as the top Chinese language model in some benchmarks and third globally behind the top models of Anthropic and OpenAI. Models Alibaba first launched a beta of Qwen in April 2023 under the name ''Tongyi Qianwen''. The model's architecture was based on the Llama architecture developed by Meta AI. It was publicly released in September 2023 after receiving approval from the Chinese government. In December 2023 it released its 72B and 1.8B models as open source, while Qwen 7B was open sourced in August. In June 2024 Alibaba launched Qwen 2 and in September it released some of its models as open source, while keeping its most advanced models proprietary. Qwen 2 contains both dense and sparse models. In November 2024, QwQ-32B-Preview, a model focusing on reasoning similar to OpenAI's o1, was released under the Apache 2.0 Licens ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Alibaba Cloud
Alibaba Cloud, also known as Aliyun ( zh, p=Ālǐyún, s=阿里云, l=Ali Cloud), is a cloud computing company, a subsidiary of Alibaba Group. Alibaba Cloud provides cloud computing services to online businesses and Alibaba's own e-commerce ecosystem. Its international operations are registered and headquartered in Singapore. Alibaba Cloud offers cloud services that are available on a pay-as-you-go basis, and include elastic compute, data storage, relational databases, big-data processing, DDoS protection and content delivery networks (CDN). It is the largest cloud computing company in China, and in Asia Pacific according to Gartner. Alibaba Cloud operates data centers in 29 regions and 87 availability zones around the globe. As of June 2017, Alibaba Cloud is placed in the Visionaries' quadrant of Gartner's Magic Quadrant for cloud infrastructure as a service, worldwide. History Alibaba Cloud was founded in September 2009, and R&D centers and operation centers were opened ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

DeepSeek-R1
Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, Deepseek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the CEO for both companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025. Released under the MIT License, DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023—and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1. DeepS ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

DeepSeek
Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., Trade name, doing business as DeepSeek, is a Chinese artificial intelligence company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, Deepseek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the Chief executive officer, CEO for both companies. The company launched DeepSeek (chatbot), an eponymous chatbot alongside its DeepSeek-R1 model in January 2025. Released under the MIT License, DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and OpenAI o1, o1. Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023—and using approximately one-tenth the comput ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Amazon Web Services
Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon.com, Amazon that provides Software as a service, on-demand cloud computing computing platform, platforms and Application programming interface, APIs to individuals, companies, and governments, on a metered, pay-as-you-go basis. Clients will often use this in combination with elasticity (system resource), autoscaling (a process that allows a client to use more computing in times of high application usage, and then scale down to reduce costs when there is less traffic). These cloud computing web services provide various services related to networking, compute, storage, middleware, Internet of things, IoT and other processing capacity, as well as software tools via AWS server farms. This frees clients from managing, scaling, and patching hardware and operating systems. One of the foundational services is Amazon Elastic Compute Cloud (EC2), which allows users to have at their disposal a Virtualization, virtual Computer clus ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Mistral AI
Mistral AI SAS () is a French artificial intelligence (AI) startup, headquartered in Paris. Founded in 2023, it specializes in open-weight large language models (LLMs), with both open-source and proprietary AI models. Namesake The company is named after the mistral, a powerful, cold wind in southern France. History Mistral AI was established in April 2023 by three French AI researchers, Arthur Mensch, Guillaume Lample and Timothée Lacroix. Mensch, an expert in advanced AI systems, is a former employee of Google DeepMind; Lample and Lacroix, meanwhile, are large-scale AI models specialists who had worked for Meta Platforms. The trio originally met during their studies at École Polytechnique. Company operation Funding In June 2023, the start-up carried out a first fundraising of €105 million ($117 million) with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The valuation is then estimated by the Fi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Llama (language Model)
Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 4, released in April 2025. Llama models come in different sizes, ranging from 1 billion to 2 trillion parameters. Initially only a foundation model, starting with Llama 2, Meta AI released instruction fine-tuned versions alongside foundation models. Model weights for the first version of Llama were only available to researchers on a case-by-case basis, under a non-commercial license. Unauthorized copies of the first model were shared via BitTorrent. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and a standalone website. Both services use a Llama 3 model. Background After the release of large la ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Meta AI
Meta AI is a research division of Meta (formerly Facebook) that develops artificial intelligence and augmented reality technologies. History The foundation of laboratory was announced in 2013, under the name Facebook Artificial Intelligence Research (FAIR). FAIR has workspaces in Menlo Park, California, London, United Kingdom, and Manhattan. FAIR was first directed by New York University's Yann LeCun, a deep learning professor and Turing Award winner. Working with NYU's Center for Data Science, FAIR's initial goal was to research data science, machine learning, and artificial intelligence. Vladimir Vapnik, a pioneer in statistical learning, joined FAIR in 2014. FAIR opened a research center in Paris, France in 2015, and subsequently launched smaller satellite research labs in Seattle, Pittsburgh, Tel Aviv, Montreal and London. In 2016, FAIR partnered with Google, Amazon, IBM, and Microsoft in creating the Partnership on Artificial Intelligence to Benefit People and Society ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Claude (language Model)
Claude is a family of large language models developed by Anthropic. The first model was released in March 2023. The Claude 3 family, released in March 2024, consists of three models: Haiku, optimized for speed; Sonnet, which balances capability and performance; and Opus, designed for complex reasoning tasks. These models can process both text and images, with Claude 3 Opus demonstrating enhanced capabilities in areas like mathematics, programming, and logical reasoning Logical reasoning is a mind, mental Action (philosophy), activity that aims to arrive at a Logical consequence, conclusion in a Rigour, rigorous way. It happens in the form of inferences or arguments by starting from a set of premises and reason ... compared to previous versions. Claude 4, which includes Opus and Sonnet, was released in May 2025. Training Claude models are generative pre-trained transformers. They have been pre-trained to predict the next word in large amounts of text. Then, they have ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]