The Apache OpenNLP library is a machine-learning-based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
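As a minimal sketch of what calling the toolkit from Java can look like, the example below tokenizes a sentence with OpenNLP's rule-based `SimpleTokenizer`. Unlike the machine-learning components (for example `TokenizerME` or `SentenceDetectorME`), it does not need a trained model file. The sketch assumes the `opennlp-tools` library is on the classpath, and the class name `OpenNlpTokenizeExample` is hypothetical.

```java
import opennlp.tools.tokenize.SimpleTokenizer;

import java.util.Arrays;

// Hypothetical example class; assumes opennlp-tools is on the classpath.
public class OpenNlpTokenizeExample {

    // Splits text into tokens with OpenNLP's rule-based SimpleTokenizer,
    // which groups characters into runs of letters, digits, and other symbols
    // and needs no trained model file.
    public static String[] tokenize(String text) {
        return SimpleTokenizer.INSTANCE.tokenize(text);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("OpenNLP splits text into tokens, too!");
        System.out.println(Arrays.toString(tokens));
    }
}
```

For higher-quality results, the statistical components are normally constructed from a pre-trained model (for instance a `TokenizerModel` passed to `TokenizerME`) rather than used as a fixed singleton.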
See also
* Unstructured Information Management Architecture (UIMA)
* General Architecture for Text Engineering (GATE)
* cTAKES

References
* Apache OpenNLP Proposal

External links
Apache OpenNLP Website
Natural language processing
Statistical natural language processing
Natural language processing toolkits
Java (programming language) libraries
Cross-platform software
2004 software