
Semantic Scholar is a research tool for scientific literature. It is developed at the Allen Institute for AI and was publicly released in November 2015. Semantic Scholar uses modern techniques in natural language processing to support the research process, for example by providing automatically generated summaries of scholarly papers. The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human–computer interaction, and information retrieval.

Semantic Scholar began as a database for the topics of computer science, geoscience, and neuroscience. In 2017, the system began including biomedical literature in its corpus. The corpus has since grown to include over 200 million publications from all fields of science.


Technology

Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices. It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read. Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique. The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.

Another AI-powered feature is Research Feeds, an adaptive research recommender that learns which papers users care about reading and recommends the latest research to help scholars stay up to date. It uses a paper embedding model trained using contrastive learning to find papers similar to those in each Library folder.

Semantic Scholar also offers Semantic Reader, an augmented reader intended to make scientific reading more accessible and richly contextual. Semantic Reader provides in-line citation cards, which let users see citations together with automatically generated short summaries called TLDRs (short for Too Long, Didn't Read) as they read, and skimming highlights, which capture the key points of a paper so that users can digest it faster.

In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. The AI technology is designed to identify hidden connections and links between research topics. Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus (originally a corpus of 45 million papers in computer science, neuroscience and biomedicine).
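
The Research Feeds recommender described above looks for papers whose embeddings are similar to those already saved in a Library folder. The following Python sketch illustrates that general idea only. It is not Semantic Scholar's implementation: the three-dimensional vectors are invented for illustration, and averaging a folder into a single centroid and ranking by cosine similarity is an assumed, simplified stand-in for whatever the production system does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def rank_candidates(folder_vectors, candidates):
    """Return candidate names ordered from most to least similar to the folder."""
    target = centroid(folder_vectors)
    scored = [
        (cosine_similarity(target, vector), name)
        for name, vector in candidates.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy embeddings; real paper embeddings are much higher-dimensional.
folder = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "paper_a": [0.85, 0.15, 0.05],
    "paper_b": [0.0, 0.1, 0.9],
    "paper_c": [0.5, 0.5, 0.2],
}

ranking = rank_candidates(folder, candidates)
print(ranking)
```

Cosine similarity is used here because it compares the direction of two vectors rather than their length, which is a common choice when comparing learned embeddings.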


Article identifier

Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID).
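
A Corpus ID is a plain integer, so it is easy to handle in code. The Python sketch below parses an identifier written in the common `S2CID <number>` form and builds two URLs from it. The URL patterns are assumptions based on the publicly documented resolver and Academic Graph API and should be checked against the current API documentation. The helper names are hypothetical, and the ID `12345` is a placeholder rather than a reference to any particular paper. The sketch makes no network requests.

```python
import re

# Assumed URL patterns; verify against the current Semantic Scholar API docs.
RESOLVER_URL = "https://api.semanticscholar.org/CorpusID:{corpus_id}"
GRAPH_API_URL = "https://api.semanticscholar.org/graph/v1/paper/CorpusId:{corpus_id}"


def parse_s2cid(text):
    """Extract the numeric Corpus ID from text like 'S2CID 12345' or 'CorpusID:12345'."""
    match = re.search(r"(?:S2CID|CorpusID)\s*:?\s*(\d+)", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no Corpus ID found in {text!r}")
    return int(match.group(1))


def resolver_url(corpus_id):
    """Build a human-facing link that resolves to the paper page (assumed pattern)."""
    return RESOLVER_URL.format(corpus_id=corpus_id)


def graph_api_url(corpus_id, fields=("title", "year")):
    """Build a metadata lookup URL requesting the given fields (assumed pattern)."""
    return GRAPH_API_URL.format(corpus_id=corpus_id) + "?fields=" + ",".join(fields)


corpus_id = parse_s2cid("S2CID 12345")  # placeholder ID, not a real citation
print(resolver_url(corpus_id))
print(graph_api_url(corpus_id))
```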


Indexing

Semantic Scholar is free to use and, unlike similar search engines such as Google Scholar, does not search for material that is behind a paywall. One study compared the index scope of Semantic Scholar with that of Google Scholar and found that, for the papers cited by secondary studies in computer science, the two indices had comparable coverage, each missing only a handful of the papers.


Number of users and publications

As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. The number of paper metadata records (not the actual PDFs) subsequently grew to more than 173 million after the addition of the Microsoft Academic Graph records. In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus. At the end of 2020, Semantic Scholar had indexed 190 million papers. That year it also reached seven million users per month.


See also

* List of academic databases and search engines

