
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. These tasks are usually required to build more advanced text processing services (Apache OpenNLP Proposal).
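Two of the tasks named in the introduction, sentence segmentation and tokenization, can be illustrated with a minimal sketch. It does not use the OpenNLP API or its trained models. As a stand-in it assumes the rule-based `java.text.BreakIterator` from the Java standard library, and the class name `TextTasksSketch` and its methods are hypothetical names chosen for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a rule-based JDK stand-in, not the OpenNLP API or its models.
final class TextTasksSketch {

    // Sentence segmentation: split text at the sentence boundaries BreakIterator reports.
    static List<String> sentences(String text) {
        return segments(BreakIterator.getSentenceInstance(Locale.ENGLISH), text, false);
    }

    // Tokenization: split at word boundaries, keeping only segments that
    // start with a letter or digit (this drops whitespace and punctuation).
    static List<String> tokens(String sentence) {
        return segments(BreakIterator.getWordInstance(Locale.ENGLISH), sentence, true);
    }

    private static List<String> segments(BreakIterator it, String text, boolean wordsOnly) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        // Walk consecutive boundary pairs [start, end) until the iterator is exhausted.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (piece.isEmpty()) {
                continue;
            }
            if (wordsOnly && !Character.isLetterOrDigit(piece.codePointAt(0))) {
                continue;
            }
            out.add(piece);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "OpenNLP is a toolkit. It processes natural language text.";
        for (String s : sentences(text)) {
            System.out.println(s + " -> " + tokens(s));
        }
    }
}
```

A machine learning based toolkit differs from this sketch in that its boundaries come from models trained on annotated text rather than from fixed rules, which is what lets the same pipeline extend to tasks such as part-of-speech tagging and named entity extraction.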


See also

* Unstructured Information Management Architecture (UIMA)
* General Architecture for Text Engineering (GATE)
* cTAKES


References


External links


Apache OpenNLP Website