Automatic Acquisition Of Lexicon
Automatic acquisition of lexicon is a computerized process used to develop a complex morphological lexicon of a language. Such a lexicon is essential for natural language processing (NLP) and is a prerequisite for any wide-coverage parser (Sagot, Benoît. ''Automatic acquisition of a Slovak Lexicon from a Raw Corpus''). The two main requirements are a raw text corpus and a morphological description of the language. The aim is to provide lemmas that explain all the words that occur within the corpus. To achieve a quality lexicon, the generated lemmas must be validated manually and the whole process iterated several times. The process is focused on the ope ...
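The idea of proposing lemmas that explain the forms seen in a raw corpus can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited paper: the `INFLECTION_CLASSES` table is a hypothetical toy morphological description for English-like suffixes, and `candidate_lemmas` is an invented helper name. Each (lemma, class) hypothesis is ranked by how many distinct corpus forms it accounts for, and the ranked list would then go to manual validation.

```python
from collections import defaultdict

# Hypothetical toy morphological description (an assumption for this sketch):
# each inflection class lists the suffixes its forms may take, and the first
# suffix in each list is the one carried by the lemma form.
INFLECTION_CLASSES = {
    "verb-regular": ["", "s", "ed", "ing"],
    "noun-regular": ["", "s"],
}


def candidate_lemmas(corpus_tokens):
    """Rank (lemma, class) hypotheses by the number of corpus forms they explain."""
    forms = set(corpus_tokens)
    support = defaultdict(set)
    for form in forms:
        for cls, suffixes in INFLECTION_CLASSES.items():
            for suffix in suffixes:
                if suffix and not form.endswith(suffix):
                    continue
                # Strip the suffix to recover the stem this hypothesis implies.
                stem = form[: len(form) - len(suffix)] if suffix else form
                if not stem:
                    continue
                lemma = stem + suffixes[0]
                support[(lemma, cls)].add(form)
    # Best-supported hypotheses first; ties broken alphabetically for stability.
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


tokens = ["walk", "walks", "walked", "walking", "cat", "cats"]
ranked = candidate_lemmas(tokens)
best_hypothesis, explained_forms = ranked[0]
```

In this toy corpus the best-supported hypothesis is the lemma ''walk'' in the regular verb class, because it explains four attested forms. Weaker hypotheses such as ''walk'' as a noun remain in the list, which is why a manual validation pass is still needed.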


Natural Language Processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Major tasks in natural language processing are speech recognition, text classification, natural language understanding, and natural language generation.
History
Natural language processing has its roots in the 1950s. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language ...


Text Corpus
In linguistics and natural language processing, a corpus (plural: corpora) or text corpus is a dataset consisting of natively digital and older, digitized language resources, either annotated or unannotated. Annotated corpora have been used in corpus linguistics for statistical hypothesis testing, checking occurrences, or validating linguistic rules within a specific language territory.
Overview
A corpus may contain texts in a single language (''monolingual corpus'') or text data in multiple languages (''multilingual corpus''). To make corpora more useful for linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of ''tags''. Another example is indicating the lemma (base) form of each word ...
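As a rough illustration of what POS annotation adds to a corpus, the sketch below stores each token together with a tag. The `Token` class, the tag names (`DET`, `NOUN`, `VERB`) and the sample sentence are assumptions made for this example and do not come from any particular tagset or corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str  # the word as it appears in the text
    pos: str   # part-of-speech tag added by annotation (illustrative labels)


# A hand-annotated toy sentence; a real corpus would hold many such sentences.
annotated_sentence = [
    Token("The", "DET"),
    Token("dogs", "NOUN"),
    Token("barked", "VERB"),
]


def forms_with_tag(tokens, pos):
    """Return the word forms carrying the given part-of-speech tag."""
    return [t.form for t in tokens if t.pos == pos]


tag_counts = Counter(t.pos for t in annotated_sentence)
nouns = forms_with_tag(annotated_sentence, "NOUN")
```

Once tags are attached, questions such as "which forms occur as nouns" become simple filters over the annotated data, which is what makes annotated corpora convenient for checking occurrences.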


Lemma (morphology)
In morphology and lexicography, a lemma (plural: lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, ''break'', ''breaks'', ''broke'', ''broken'' and ''breaking'' are forms of the same lexeme, with ''break'' as the lemma by which they are indexed. ''Lexeme'', in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and ''lemma'' refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the ''lemma'' for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
Morphology
The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions such as the use ...
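The relationship between a lexeme and its lemma can be pictured as a lookup table, using the ''break'' example from the text. This is only a sketch: the `LEXEMES` table and the `lemmatise` helper are hypothetical names for illustration, and practical lemmatisers rely on much larger lexicons or on morphological rules.

```python
# Each lemma indexes the set of word forms that make up its lexeme.
LEXEMES = {
    "break": {"break", "breaks", "broke", "broken", "breaking"},
}

# Invert the table so that every inflected form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEXEMES.items() for form in forms
}


def lemmatise(form):
    """Return the lemma for a known word form, or None if the form is unknown."""
    return FORM_TO_LEMMA.get(form.lower())


lemma_of_broke = lemmatise("Broke")
```

Irregular forms such as ''broke'' show why a plain suffix rule is not always enough and why the forms of a lexeme sometimes have to be listed explicitly.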


Open Word Classes
In grammar, a part of speech or part-of-speech (abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are assigned to the same part of speech generally display similar syntactic behavior (they play similar roles within the grammatical structure of sentences), sometimes similar morphological behavior in that they undergo inflection for similar properties and even similar semantic behavior. Commonly listed English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, numeral, article, and determiner. Other terms than ''part of speech''—particularly in modern linguistic classifications, which often make more precise distinctions than the traditional scheme does—include word class, lexical class, and lexical category. Some authors restrict the term ''lexical category'' to refer only to a particul ...


Slovak Language
Slovak (endonym: ''slovenčina'' or ''slovenský jazyk'') is a West Slavic language of the Czech–Slovak group, written in Latin script and formerly in Cyrillic script. It is part of the Indo-European language family and is one of the Slavic languages, which are part of the larger Balto-Slavic branch. Spoken by approximately 5 million people as a native language, primarily ethnic Slovaks, it serves as the official language of Slovakia and one of the 24 official languages of the European Union. Slovak is closely related to Czech, to the point of very high mutual intelligibility, as well as to Polish. Like other Slavic languages, Slovak is a fusional language with a complex system of morphology and relatively flexible word order. Its vocabulary has been extensively influenced by Latin and German, as well as other Slavic languages.
History
The Czech–Slovak gr ...