General Index (academia)

The General Index is a free-to-use database that occupies 8.5 terabytes when compressed. It was created by technologist Carl Malamud and his nonprofit foundation Public Resource. It contains words and phrases from more than 107 million academic papers. It consists of a table of n-grams (contiguous sequences of n items) derived from the full text of the articles, along with tables of associated keywords and metadata. It is intended to ease computerized analysis of the scientific literature, which has been hindered by widespread copyright restrictions limiting researchers' access to the full text. The initial version, comprising the raw database tables without any search engine front-end, was released by the Internet Archive on October 7, 2021.
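
To illustrate what a table of n-grams looks like, the following minimal Python sketch counts every contiguous sequence of one to three words in a short string. It is not the General Index's actual extraction pipeline, which is not described here; the function names `ngrams` and `ngram_table`, the lowercase alphanumeric tokenization, and the `max_n` limit are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_table(text, max_n=3):
    """Count all 1-grams through max_n-grams in a piece of text.

    Assumption: tokens are lowercase runs of letters and digits.
    A real index would likely use a more careful tokenizer.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


table = ngram_table("open access to open data")
for gram, count in table.most_common(3):
    print(" ".join(gram), count)
```

Because such a table stores only short word sequences and their counts rather than the running text, it can support statistical analysis without reproducing the source articles themselves.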


See also

* Machine learning
* Open access

