An inauthentic text is a computer-generated expository document meant to appear genuine but which is actually meaningless. Such texts are frequently created to be intermixed with genuine documents and thus manipulate the results of search engines, as with spam blogs. They are also carried along in email in order to fool spam filters by giving the spam the superficial characteristics of legitimate text.
Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated Press or Flarf poetry. They have also been used to challenge the veracity of a publication: MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. This led the students to claim that the bar for submissions was too low.
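Dissociated Press is known for generating its output with a Markov chain built from an existing text. The following is a minimal sketch of that general technique, not the program's actual implementation; the function names `build_chain` and `generate` and the `order`, `length`, and `seed` parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain at random, emitting at most `length` words."""
    rng = random.Random(seed)
    # Start from a randomly chosen state (sorted for reproducibility).
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only seen at the end of the text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)
```

Every pair of adjacent words in the output also occurs somewhere in the source text, which is why the result can look locally plausible while being meaningless as a whole.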
With the amount of computer-generated text outpacing the ability of humans to curate it, some means of distinguishing authentic from inauthentic text is needed. Yet automated approaches to determining absolutely whether a text is authentic or not face intrinsic challenges of semantics. Noam Chomsky coined the phrase "Colorless green ideas sleep furiously" as an example of a grammatically correct but semantically incoherent sentence; some will point out that in certain contexts one could give this sentence (or any phrase) meaning.
The first group to use the expression in this regard was at Indiana University (linked below). Their work explains in detail an attempt to detect inauthentic texts and identify pernicious problems of inauthentic texts in cyberspace. The site has a means of submitting text that assesses, based on supervised learning, whether a corpus is inauthentic or not. Many users have submitted incorrect types of data and have correspondingly commented on the scores. This application is meant for a specific kind of data; therefore, submitting, say, an email will not return a meaningful score.
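The features and model used by the Indiana University tool are not described here. Purely to illustrate what supervised assessment of text can look like, the sketch below trains a toy naive Bayes classifier on labeled examples. The class name, the labels `"authentic"` and `"inauthentic"`, and the choice of naive Bayes over word counts are all assumptions, not the tool's method.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Toy multinomial naive Bayes over lowercase word tokens (hypothetical)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, labeled_docs):
        """Learn word frequencies from (text, label) pairs."""
        for text, label in labeled_docs:
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability for `text`."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihoods with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A classifier of this kind only gives meaningful results on input resembling its training data, which is consistent with the caveat above about submitting the wrong kind of text.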
See also
* Scraper site
* Spamdexing
Indiana University