In morphology and lexicography, a lemma (pl.: lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, ''break'', ''breaks'', ''broke'', ''broken'' and ''breaking'' are forms of the same lexeme, with ''break'' as the lemma by which they are indexed. ''Lexeme'', in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and ''lemma'' refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the ''lemma'' for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
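As a rough illustration of lemmatisation, the following Python sketch maps the forms of ''break'' listed above to their lemma with a hand-written lookup table. The names `LEMMA_OF` and `lemmatise` are hypothetical and chosen only for this example; practical lemmatisers generally rely on much larger lexicons or on morphological rules.

```python
# Hypothetical lookup table: every known word form points to its lemma.
LEMMA_OF = {
    "break": "break",
    "breaks": "break",
    "broke": "break",
    "broken": "break",
    "breaking": "break",
}


def lemmatise(form: str) -> str:
    """Return the lemma of a known form; unknown forms are returned unchanged."""
    return LEMMA_OF.get(form.lower(), form)


for form in ["breaks", "Broke", "breaking"]:
    print(form, "->", lemmatise(form))
```

A plain table is used here because the choice of lemma is conventional, so it cannot be derived from the forms alone: nothing in the spelling of ''broke'' indicates that ''break'' is the form chosen to represent the lexeme.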


Morphology

The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions, such as the use of the infinitive for verbs in some languages.

For English, the citation form of a noun is the singular (and non-possessive) form: ''mouse'' rather than ''mice''. For multiword lexemes that contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun ''one'': ''do one's best'', ''perjure oneself''. In European languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language also has cases, the citation form is often the masculine singular nominative.

For many languages, the citation form of a verb is the infinitive, as in French, German, Hindustani, and Spanish. English verbs usually have an infinitive, whose bare form (without the particle ''to'') is the least marked (for example, ''break'' is chosen over ''to break'', ''breaks'', ''broke'', ''breaking'', and ''broken''); for defective verbs with no infinitive, the present tense is used (for example, ''must'' has only one form and ''shall'' has no infinitive, so both lemmas are their lexemes' present-tense forms). For Latin, Ancient Greek, Modern Greek, and Bulgarian, the first-person singular present tense is traditionally used, but some modern dictionaries use the infinitive instead (except for Bulgarian, which lacks infinitives). For contracted verbs in Ancient Greek, an uncontracted first-person singular present tense is used to reveal the contract vowel: ''philéō'' for ''philō'' "I love" (implying affection); ''agapáō'' for ''agapō'' "I love" (implying regard). Finnish dictionaries list verbs not under their root, but under the first infinitive, marked with ''-(t)a'', ''-(t)ä''. For Japanese, the non-past (present and future) tense is used.

For Arabic, the third-person singular masculine of the past/perfect tense is the least marked form and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used, the triliteral root of the word, either a verb or a noun, is used. This is similar to Hebrew, which also uses the third-person singular masculine perfect form, e.g. ברא ''bara' '' "create", כפר ''kaphar'' "deny". Georgian uses the verbal noun. For Korean, ''-da'' is attached to the stem. In Tamil, an agglutinative language, the verb stem (which is also the imperative form, the least marked one) is often cited, e.g., ''இரு''.

In Irish, words are highly inflected by case (genitive, nominative, dative and vocative) and by their place within a sentence because of initial mutations. The noun ''cainteoir'', the lemma for the noun meaning "speaker", has a variety of forms: ''chainteoir'', ''gcainteoir'', ''cainteora'', ''chainteora'', ''cainteoirí'', ''chainteoirí'' and ''gcainteoirí''.

Some phrases are cited in a sort of lemma: ''Carthago delenda est'' (literally, "Carthage must be destroyed") is a common way of citing Cato, but what he said was nearer to ''censeo Carthaginem esse delendam'' ("I hold Carthage to be in need of destruction").


Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of this simplification is that a declined or conjugated form of the word cannot be looked up directly, although some dictionaries, like Webster's Dictionary, do list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ''ging'' (< ''gehen''), but the Cassell does.

Lemmas or word stems are often used in corpus linguistics for determining word frequency. In that usage, the specific definition of "lemma" is flexible and depends on the task for which it is being used.
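Counting word frequency by lemma, as described above, can be sketched in Python as follows. The mapping `FORM_TO_LEMMA`, the function `lemma_frequencies` and the sample sentence are assumptions made up for this illustration; they cover only the forms of "go" mentioned in this section.

```python
from collections import Counter

# Hypothetical mapping covering only the forms of "go" named in the text.
FORM_TO_LEMMA = {
    "go": "go",
    "goes": "go",
    "going": "go",
    "went": "go",
    "gone": "go",
}


def lemma_frequencies(tokens):
    """Count tokens by lemma; tokens without a known lemma count as themselves."""
    return Counter(FORM_TO_LEMMA.get(t.lower(), t.lower()) for t in tokens)


tokens = "She went home and he goes there while they go on going".split()
freq = lemma_frequencies(tokens)
print(freq["go"])  # "went", "goes", "go" and "going" are pooled under "go"
```

Whether forms such as "went" should be pooled with "go" or counted separately depends on the task, which is one reason the working definition of "lemma" varies between corpus studies.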


Pronunciation

A word may have different pronunciations, depending on its phonetic environment (the neighbouring sounds) or on the degree of stress in a sentence. An example of the latter is the weak and strong forms of certain English function words like ''some'' and ''but'' (pronounced /sʌm/, /bʌt/ when stressed but /səm/, /bət/ when unstressed). Dictionaries usually give the pronunciation used when the word is pronounced alone (its isolation form) and with stress, but they may also note common weak forms of pronunciation.


Difference between stem and lemma

The stem is the part of the word that never changes even when morphologically inflected; a lemma is the least marked form of the word. In linguistic analysis, the stem is defined more generally as a form without any of its possible inflectional morphemes (though it includes derivational morphemes and may contain multiple roots). When phonology is taken into account, the definition of the stem as the unchangeable part of the word is not useful, as can be seen in the phonological forms of two related words: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/. Some lexemes have several stems but one lemma. For instance, the verb "to go" has the stems "go" and "went" due to suppletion: the past tense was co-opted from a different verb, "to wend".
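The point that a lexeme can have several stems but only one lemma can be made concrete with a small Python sketch. The `Lexeme` class and its field names are hypothetical, and the assignment of each form of "go" to a stem is a simplifying assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Lexeme:
    lemma: str       # the single citation form
    stems: tuple     # one or more stems
    stem_of: dict    # maps each word form to the stem it is built on


# Assumed analysis for illustration: "went" stands on its own suppletive stem.
GO = Lexeme(
    lemma="go",
    stems=("go", "went"),
    stem_of={"go": "go", "goes": "go", "going": "go", "gone": "go", "went": "went"},
)


def has_multiple_stems(lexeme: Lexeme) -> bool:
    """True when the lexeme's forms are built on more than one stem."""
    return len(lexeme.stems) > 1


print(GO.lemma, GO.stems, has_multiple_stems(GO))
```

Keeping the lemma as a single string and the stems as a collection mirrors the asymmetry described above: the lemma is one conventional label, while the stems reflect how the forms are actually built.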


Headword

A headword or catchword is the lemma under which a set of related dictionary or encyclopaedia entries appears. The headword is used to locate the entry, and dictates its alphabetical position. Depending on the size and nature of the dictionary or encyclopaedia, the entry may include alternative meanings of the word, its etymology, pronunciation and inflections, related lemmas such as compound words or phrases that contain the headword, and encyclopaedic information about the concepts represented by the word. For example, the headword ''bread'' may contain the following (simplified) definitions:

:Bread
:''(noun)''
:* A common food made from the combination of flour, water and yeast
:* Money ''(slang)''
:''(verb)''
:* To coat in breadcrumbs
:''to know which side your bread is buttered'': to know how to act in your own best interests.

The ''Academic Dictionary of Lithuanian'' contains around 500,000 headwords. The ''Oxford English Dictionary'' (OED) has around 273,000 headwords along with 220,000 other lemmas, while ''Webster's Third New International Dictionary'' has about 470,000. The ''Deutsches Wörterbuch'' (DWB), the largest lexicon of the German language, has around 330,000 headwords (source: the Deutsches Wörterbuch at the BBAW, retrieved 22 June 2012). These values are cited by the dictionary makers and may not use exactly the same definition of a headword. In addition, headword counts may not accurately reflect a dictionary's physical size. The ''OED'' and the ''DWB'', for instance, include exhaustive historical reviews and exact citations from source documents not usually found in standard dictionaries.

The term ''lemma'' comes from the practice in Greco-Roman antiquity of using the word to refer to the headwords of marginal glosses in scholia; for this reason, the Ancient Greek plural form is sometimes used, namely ''lemmata'' (Greek λῆμμα, pl. λήμματα).
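The structure of the simplified ''bread'' entry above, and the role of the headword in fixing an entry's alphabetical position, can be modelled roughly as follows. The classes `Sense` and `Entry` and all of their fields are hypothetical names introduced for this sketch; they do not follow any particular dictionary's data format.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    part_of_speech: str
    definition: str
    label: str = ""  # optional usage label such as "slang"


@dataclass
class Entry:
    headword: str
    senses: list = field(default_factory=list)
    phrases: dict = field(default_factory=dict)  # phrase -> explanation

    def sort_key(self) -> str:
        """The headword alone dictates the entry's alphabetical position."""
        return self.headword.casefold()


bread = Entry(
    headword="bread",
    senses=[
        Sense("noun", "A common food made from the combination of flour, water and yeast"),
        Sense("noun", "Money", label="slang"),
        Sense("verb", "To coat in breadcrumbs"),
    ],
    phrases={
        "to know which side your bread is buttered":
            "to know how to act in your own best interests",
    },
)

# Entries are located and ordered by headword, not by their contents.
ordered = sorted([Entry("lemma"), bread, Entry("headword")], key=Entry.sort_key)
print([entry.headword for entry in ordered])
```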


Conventions

Many dictionaries list all forms of a term combined as one entry under a single headword. The form chosen for the headword is then governed by some common conventions.


Nouns

For languages with grammatical case, the headword takes the form of the nominative case, used when the noun serves as the subject of a sentence. Unless it concerns a ''plurale tantum'', the singular is used. For example, the Latin word for "rose" will traditionally be listed under the entry ''rosa'', together with its inflected forms (''rosae'', ''rosam'', ''rosarum'', ''rosis''), if these are given at all. Some languages have separate forms for a male and female sense of a noun, as in French ''chanteur'' (for a male singer) and ''chanteuse'' (for a female singer). The female form may then be listed under the male form, which is used as the headword.


Adjectives

As with nouns, adjectives are listed in the nominative singular (for languages that inflect for grammatical case or number). If adjectives are inflected for gender, the masculine form is traditionally used for the headword. This headword may also serve as the headword for the comparative and superlative, even when these are irregular, as in ''good'' – ''better'' – ''best''.


Verbs

For most languages, the traditional headword of a verb is its infinitive form. Notable exceptions are Latin and Ancient Greek; for these, the traditional choice is the first-person singular. So a traditional Latin dictionary has an entry ''dico'' (meaning "I say"), and not ''dicere'' ("to say"). Likewise, for Ancient Greek the traditional headword is the first-person singular (''legō''), and not the infinitive (''legein''). Modern Greek has no infinitives; again, the first-person singular is used. The same holds for Bulgarian, while for Macedonian the third-person singular is used.

Infinitives and other verb forms may be marked for tense, aspect and voice; the headword of choice is usually as unmarked as possible, which for many languages may correspond to present tense, imperfective aspect and active voice. In languages with deponent verbs, which have no active forms, the middle or passive voice is used for such verbs. For example, the Latin verb for "follow" will be found under ''sequor'' ("I follow").
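The verb conventions described in this section can be summarised as a small lookup, sketched here in Python. The table `VERB_HEADWORD_FORM`, the label strings and the function `verb_headword` are assumptions made for illustration; the paradigms contain only the forms quoted in the text.

```python
# Languages named in the text whose traditional verb headword is not the infinitive.
VERB_HEADWORD_FORM = {
    "Latin": "first-person singular",
    "Ancient Greek": "first-person singular",
    "Modern Greek": "first-person singular",
    "Bulgarian": "first-person singular",
    "Macedonian": "third-person singular",
}
DEFAULT_FORM = "infinitive"  # the traditional choice for most other languages


def verb_headword(language: str, paradigm: dict) -> str:
    """Pick the form of a verb that traditionally serves as its headword."""
    wanted = VERB_HEADWORD_FORM.get(language, DEFAULT_FORM)
    return paradigm[wanted]


# Partial, hypothetical paradigms built from the forms quoted above.
latin_say = {"infinitive": "dicere", "first-person singular": "dico"}
greek_say = {"infinitive": "legein", "first-person singular": "legō"}
english_break = {"infinitive": "break"}

print(verb_headword("Latin", latin_say))
print(verb_headword("Ancient Greek", greek_say))
print(verb_headword("English", english_break))
```

The sketch ignores refinements mentioned above, such as deponent verbs (where the listed first-person form is middle or passive, as in ''sequor'') and dictionaries that depart from the traditional choice.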


See also

* Lexeme
* Lexical Markup Framework
* Null morpheme
* Principal parts
* Root (linguistics)
* Uninflected word

