Semantic Scholar is an artificial intelligence–powered research tool for scientific literature developed at the Allen Institute for AI and publicly released in November 2015.
It uses advances in natural language processing to provide summaries for scholarly papers.
The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human-computer interaction, and information retrieval.
Semantic Scholar began as a database covering the topics of computer science, geoscience, and neuroscience.
However, in 2017 the system began including biomedical literature in its corpus.
As of September 2022, it includes more than 200 million publications from all fields of science.
Technology
Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.
It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.
Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique.
The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.
In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. The AI technology is designed to identify hidden connections and links between research topics. Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus.
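The "traditional methods of citation analysis" mentioned above can be illustrated with a minimal sketch that counts how often each paper in a small citation graph is cited. This is only an illustration of plain citation counting, not Semantic Scholar's own method, which adds semantic analysis on top. The graph, the paper identifiers, and the function names are all hypothetical.

```python
from collections import Counter

# Toy citation graph: each key is a paper, each value lists the papers it cites.
# All identifiers are hypothetical placeholders, not real corpus entries.
citations = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}


def citation_counts(graph):
    """Count how many times each paper is cited (its in-degree in the graph)."""
    # Start every known paper at zero so uncited papers still appear in the result.
    counts = Counter({paper: 0 for paper in graph})
    for cited_papers in graph.values():
        counts.update(cited_papers)
    return dict(counts)


def most_cited(graph, top_n=1):
    """Return the top_n (paper, count) pairs, most cited first, ties broken by name."""
    counts = citation_counts(graph)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

Calling `most_cited(citations, 2)` ranks the two most frequently cited papers in the toy graph. A real knowledge graph would carry far richer node and edge attributes than this adjacency dictionary.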
Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:
::
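As a rough illustration of how such an identifier might be handled in practice, the sketch below pulls an S2CID out of a citation string and builds a lookup link from it. It makes several assumptions: the identifier is taken to appear as the label `S2CID` followed by digits, the URL pattern is an assumption that should be checked against current Semantic Scholar documentation, and the citation text and the ID `123456789` are invented placeholders rather than a real entry.

```python
import re

# Assumed citation format: the literal label "S2CID" followed by a run of digits.
S2CID_PATTERN = re.compile(r"\bS2CID[:\s]*(\d+)\b")

# Assumed lookup URL pattern (not confirmed by the article text).
S2CID_URL_TEMPLATE = "https://api.semanticscholar.org/CorpusID:{corpus_id}"


def extract_s2cid(citation):
    """Return the S2CID found in a citation string as an int, or None if absent."""
    match = S2CID_PATTERN.search(citation)
    return int(match.group(1)) if match else None


def s2cid_url(corpus_id):
    """Build a lookup link for a corpus ID using the assumed URL pattern."""
    return S2CID_URL_TEMPLATE.format(corpus_id=corpus_id)


# Placeholder citation: the authors, title, venue, and ID are all made up.
example = 'Doe, J. (2020). "A placeholder title". Example Journal. S2CID 123456789.'
```

Here `extract_s2cid(example)` yields the numeric ID, which `s2cid_url` then turns into a link. Storing the ID as an integer keeps it easy to use as a dictionary key or database column.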
Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall.
One study compared the search abilities of Semantic Scholar through a systematic approach, and found the search engine to be 98.88% accurate when attempting to uncover the data.
The same study examined other Semantic Scholar functions, including tools to survey metadata as well as several citation tools.
Number of users and publications
As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. As of August 2019, the number of paper metadata records included (not the actual PDFs) had grown to more than 173 million after the addition of the Microsoft Academic Graph records. In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus. At the end of 2020, Semantic Scholar had indexed 190 million papers.
In 2020, Semantic Scholar reached seven million users a month.
See also
* List of academic databases and search engines