Semantic Scholar is an artificial intelligence–powered research tool for scientific literature developed at the Allen Institute for AI and publicly released in November 2015.
It uses advances in natural language processing to provide summaries for scholarly papers.
The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human-computer interaction, and information retrieval.
Semantic Scholar began as a database covering computer science, geoscience, and neuroscience.
In 2017, however, the system began including biomedical literature in its corpus.
As of September 2022, the corpus includes over 200 million publications from all fields of science.
Technology
Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.
It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.
Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique.
The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.
In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. The AI technology is designed to identify hidden connections and links between research topics. Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus.
Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:
::
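As a rough illustration of how an S2CID might be handled in practice, the sketch below checks that a corpus ID is numeric and turns it into a link. The helper name `s2cid_url`, the URL prefix, and the ID `123456789` are assumptions made for this example and are not taken from the article; the prefix is believed to resolve corpus IDs to paper pages, but this should be checked against the current Semantic Scholar documentation.

```python
import re

# Assumption: this prefix resolves a corpus ID to its paper page.
# Verify against the Semantic Scholar documentation before relying on it.
S2CID_URL_PREFIX = "https://api.semanticscholar.org/CorpusID:"


def s2cid_url(corpus_id):
    """Return a link for a Semantic Scholar Corpus ID (int or digit string).

    Hypothetical helper: it only checks that the ID consists of ASCII
    digits and does not confirm that the paper exists.
    """
    text = str(corpus_id).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not a numeric corpus ID: {corpus_id!r}")
    return S2CID_URL_PREFIX + text


# 123456789 is a made-up ID used only to show the call.
print(s2cid_url(123456789))
```

The check is deliberately strict (digits only) so that a malformed identifier fails early instead of producing a broken link.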
Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall.
One study compared the search abilities of Semantic Scholar through a systematic approach, and found the search engine to be 98.88% accurate when attempting to uncover the data.
The same study examined other Semantic Scholar functions, including tools to survey metadata as well as several citation tools.
Number of users and publications
As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. As of August 2019, the number of papers with included metadata (not the actual PDFs) had grown to more than 173 million after the addition of the Microsoft Academic Graph records. In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus. At the end of 2020, Semantic Scholar had indexed 190 million papers.
In 2020, users of Semantic Scholar reached seven million a month.
See also
* List of academic databases and search engines