The Ubiquitous Knowledge Processing Lab (also UKP Lab) is a research lab at the Department of Computer Science at the Technische Universität Darmstadt. It was founded in 2006 by Iryna Gurevych.


Research Activities

UKP Lab develops natural language processing techniques for automatically understanding written text and applies them to information management tasks such as information retrieval, question answering, and structuring information in wikis. The Ubiquitous Knowledge Processing Lab is among the leading research institutes in the field of utilizing Web 2.0 content as a source of lexical semantic information for natural language processing (NLP). Wikipedia and Wiktionary are employed as collaboratively constructed lexical semantic resources and used to improve expert-built resources like WordNet. These resources are used to develop semantically enhanced algorithms for information retrieval and question answering. An example is semantic search: If a user enters the query "pie-fruit" into a search engine, a standard search engine will retrieve pages containing the word "pie" but not the word "fruit", providing plenty of pages on "apple pie". An intelligent search engine will "understand" that the user is interested in pie recipes that do not use any type of fruit and retrieve appropriate documents. (Example from Impulse für die Wissenschaft 2010, Volkswagenstiftung.)
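
The difference between the two search behaviours in the pie example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the lab's algorithm: the query is written with a space as `pie -fruit` (assuming the common exclusion-operator syntax), and the `HYPONYMS` table and `DOCUMENTS` collection are made-up stand-ins for a lexical semantic resource such as WordNet and for a web index.

```python
# Toy hyponym table standing in for a lexical semantic resource.
# The entries are illustrative assumptions, not data from a real resource.
HYPONYMS = {
    "fruit": {"apple", "cherry", "peach"},
}

# Hypothetical mini document collection.
DOCUMENTS = {
    "doc1": "classic apple pie with cinnamon",
    "doc2": "fruit pie with mixed berries",
    "doc3": "chocolate cream pie",
}


def parse_query(query):
    """Split a query into required terms and terms excluded with a leading '-'."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            required.add(token)
    return required, excluded


def search(query, documents, hyponyms=None):
    """Return the sorted ids of documents matching the query.

    Without `hyponyms`, only the literal excluded words are filtered out.
    With `hyponyms`, each excluded word is expanded to its more specific
    terms first, so excluding "fruit" also excludes "apple".
    """
    required, excluded = parse_query(query)
    if hyponyms is not None:
        for term in list(excluded):
            excluded |= hyponyms.get(term, set())
    hits = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if required <= words and not (excluded & words):
            hits.append(doc_id)
    return sorted(hits)


keyword_hits = search("pie -fruit", DOCUMENTS)
semantic_hits = search("pie -fruit", DOCUMENTS, HYPONYMS)
print(keyword_hits)   # the apple pie page is still returned
print(semantic_hits)  # only the fruit-free pie remains
```

The keyword variant keeps the apple pie document because the word "fruit" never appears in it; the expanded variant drops it because "apple" is listed as a kind of fruit.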

Further research activities at UKP Lab are automatic quality assessment of text, sentiment analysis and opinion mining. Research activities are organized into the following research areas:
* Educational natural language processing
* Multilingual semantic information management
* Natural language processing for Wikis

A strong focus at UKP Lab is on utilizing novel natural language processing algorithms in real-life applications. UKP Lab collaborates with partners from academia and industry to improve various application scenarios, such as customer relationship management, digital humanities, educational applications, or public security.


Software

Part of the research efforts at UKP Lab is the development of natural language processing (NLP) software. The following software packages are freely available for research purposes:


DKPro

The Darmstadt Knowledge Processing Software Repository (DKPro) is an open source community of software projects aimed at natural language processing. It offers robust, ready-to-use NLP components which are built on top of IBM's Unstructured Information Management Architecture (UIMA) as a common and open framework. DKPro contains basic natural language processing components like part-of-speech tagging and lemmatization. Additionally, the package offers components that support the processing of user-generated discourse. User-generated content contains spelling errors, abbreviations and emoticons, which prevent the direct application of standard NLP components. DKPro provides the required preprocessing tools.
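
The kind of preprocessing described above can be illustrated with a minimal Python sketch. It does not use or reproduce DKPro's components or API; the abbreviation table, the emoticon pattern, and the `normalize` function are hypothetical names chosen for this illustration, and real systems would rely on far larger resources.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger lexicon.
ABBREVIATIONS = {"u": "you", "gr8": "great", "thx": "thanks"}

# Matches a few simple Western-style emoticons such as :-) ;) :D :P
EMOTICON_PATTERN = re.compile(r"[:;]-?[)(DP]")

# Matches a word character repeated three or more times in a row.
ELONGATION_PATTERN = re.compile(r"(\w)\1{2,}")


def normalize(text):
    """Clean user-generated text before passing it to standard NLP components."""
    # Drop emoticons, which a standard tokenizer would split into punctuation.
    text = EMOTICON_PATTERN.sub(" ", text)
    # Shorten elongated spellings to two repeats ("sooooo" becomes "soo").
    text = ELONGATION_PATTERN.sub(r"\1\1", text)
    # Expand known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(token.lower(), token) for token in text.split()]
    return " ".join(tokens)


print(normalize("thx u r gr8 :-)"))
print(normalize("sooooo good :)"))
```

Shortening elongated words to two repeats rather than one is a deliberate choice in this sketch: it keeps legitimate double letters intact, at the cost of leaving forms like "soo" for a later spelling-correction step.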


Wikipedia API

The Java Wikipedia Library (JWPL) was also developed at UKP Lab (reference publication: Zesch, Müller, Gurevych: Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary, Proceedings of LREC 2008). It is a Java-based application programming interface for Wikipedia and allows programmatic access to all information contained in Wikipedia.


Wiktionary API

Parallel to JWPL, the Java Wiktionary Library (JWKTL) offers programmatic access to information contained in the English and the German versions of Wiktionary.


External links


* Website Ubiquitous Knowledge Processing Lab
* Website Iryna Gurevych
* DKPro
* Wikipedia API
* Wiktionary API