Writeprint is a method in forensic linguistics for identifying the author of text published over the internet, likened to a digital fingerprint. Identity is established by comparing distinguishing stylometric characteristics of an unknown written text with known samples from the suspected author (writer invariants). Even without a suspect, writeprint can indicate potential background characteristics of the author, such as nationality and education.
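The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published Writeprint technique: the letter-frequency profile, the cosine similarity measure, and the names `char_profile`, `cosine_similarity`, and `most_similar_author` are all chosen here for brevity.

```python
import math
from collections import Counter


def char_profile(text):
    """Relative frequency of each letter, case-folded (an assumed, very crude style profile)."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return {c: n / total for c, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def most_similar_author(unknown_text, known_samples):
    """Return (author, score) for the known sample closest to the unknown text.

    known_samples maps a hypothetical author label to a sample of that author's writing.
    """
    unknown = char_profile(unknown_text)
    scores = {
        author: cosine_similarity(unknown, char_profile(sample))
        for author, sample in known_samples.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A realistic system would use a much richer feature set and a trained classifier; the sketch only shows the shape of the comparison between an unknown text and known samples.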
There are five broad aspects to author identification in writeprint:
*Lexical features - the analysis of the lexicon, the author's choice of vocabulary, using characters and words to identify preferences of an individual;
** use of uppercase and lowercase letters, frequency of certain letters, average word length, and mean length of the utterance itself
*Syntactic features - the analysis of the author's writing style and sentence structure, such as punctuation and hyphenation, use of passive voice, and sentence complexity;
*Structural features - the analysis of the author's organization and structural arrangement of the work, including paragraph length, spacing, and indentation;
** this encompasses the arrangement of sentences within paragraphs and, in an email setting for example, the use of greetings, farewells, and signatures
*Content-specific features - the analysis of the language that is contextually significant to the subject of the written work, including the use of slang or acronyms. More specifically, these features help determine the author's interests by pinpointing the keywords they use;
*Idiosyncratic features - the analysis of errors and other ungrammatical elements that may be unique to the author, such as incorrect spelling, misuse of words, and inaccurate verb forms. Because such habits are hard for an author to control, this category has achieved high accuracy in author identification when combined with other features.
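Several of the features listed above are simple enough to compute directly. The sketch below is an illustration only: the function names, the particular features selected, and the regular expressions are assumptions made here, not part of any published Writeprint feature set.

```python
import re


def lexical_features(text):
    """Character- and word-level measures of the kind listed under lexical features."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    n_letters = len(letters)
    return {
        "uppercase_ratio": sum(c.isupper() for c in letters) / n_letters if n_letters else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "word_count": len(words),
    }


def syntactic_features(text):
    """Punctuation and hyphenation counts, a rough proxy for syntactic style."""
    return {
        "commas": text.count(","),
        "semicolons": text.count(";"),
        "hyphenated_words": len(re.findall(r"\b\w+-\w+\b", text)),
    }


def structural_features(text):
    """Paragraph-level layout; paragraphs are assumed to be separated by blank lines."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    lengths = [len(p.split()) for p in paragraphs]
    return {
        "paragraph_count": len(paragraphs),
        "avg_paragraph_words": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```

Content-specific and idiosyncratic features would need extra resources, such as topic keyword lists or a spelling dictionary, so they are left out of this sketch.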
While the five feature categories above are the traditional basis of author identification, some features are unique to online text. Features such as choice of font, the use of emojis, and links to other websites all provide a path to identification that is absent in traditional text analysis.
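Two of these online-specific features, links and emojis, can be counted from plain text. The following is a rough sketch under stated assumptions: the name `online_features` is hypothetical, the URL pattern only catches `http`/`https` links, and the emoji pattern covers an approximate set of Unicode ranges rather than the full emoji specification. Font choice is not recoverable from plain text and is omitted.

```python
import re

# Matches http:// and https:// links up to the next whitespace (assumption: no bare domains).
URL_PATTERN = re.compile(r"https?://\S+")

# Approximate emoji ranges only (assumption): U+1F300..U+1FAFF plus U+2600..U+27BF.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def online_features(text):
    """Count links and emoji characters in a piece of online text."""
    return {
        "url_count": len(URL_PATTERN.findall(text)),
        "emoji_count": len(EMOJI_PATTERN.findall(text)),
    }
```

Counts like these would normally be normalized by message length before being compared across authors.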
See also
*Author profiling
*Stylometry
*Forensic linguistics