NetOwl is a suite of multilingual text and identity analytics products that analyze big data in the form of text data (reports, web, social media, etc.) as well as structured entity data about people, organizations, places, and things. NetOwl utilizes artificial intelligence (AI)-based approaches, including natural language processing (NLP), machine learning (ML), and computational linguistics, to extract entities, relationships, and events; to perform sentiment analysis; to assign latitude/longitude to geographical references in text; to translate names written in foreign languages; and to perform name matching and identity resolution.
["SRA International." Washington Post. Retrieved 2013-07-02.]
[Zelenko, Dmitry, and Chinatsu Aone. "Discriminative Methods for Transliteration." In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (2006). Retrieved 2013-05-20.]
[Maybury, Mark (2012). Multimedia Information Extraction. Hoboken, New Jersey: John Wiley & Sons, Inc., p. 18. Retrieved 2013-07-02.]
NetOwl's uses include semantic search and discovery, geospatial analysis [Smith, Susan. "Notes from the GEOINT 2007 Symposium." GISCafe (2007-10-29). Retrieved 2013-07-02.], intelligence analysis, content enrichment [Guess, Angela (2012-01-19). "LexisNexis Releases New Version of Lexis Advance". semanticweb.com. Retrieved 2013-07-28.], compliance monitoring [Aone, Chinatsu, et al. "Assentor®: an NLP-based Solution to E-mail Monitoring." In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence (2000), pp. 945-950. Retrieved 2013-05-20.], cyber threat monitoring, risk management, and bioinformatics.
History
The first NetOwl product was NetOwl Extractor, which was initially released in 1996. Since then, Extractor has added many new capabilities, including relationship and event extraction, categorization, name translation, geotagging, and sentiment analysis, as well as entity extraction in other languages. Other products were added later to the NetOwl suite, namely TextMiner, NameMatcher, and EntityMatcher.
NetOwl has participated in several third-party-sponsored text and entity analytics software benchmarking events. NetOwl Extractor was the top-scoring named entity extraction system at the DARPA-sponsored Message Understanding Conference MUC-6 and the top-scoring link and event extraction system at MUC-7. It was also the top-scoring system at several of the NIST-sponsored Automatic Content Extraction (ACE) evaluation tasks. [The ACE 2005 (ACE'05) Evaluation Plan. Retrieved 2013-05-20.] NetOwl NameMatcher was the top-scoring system at the MITRE Challenge for Multicultural Person Name Matching.
Products
The NetOwl suite includes, among others, the following text and entity analytics products:
Text Analytics
NetOwl Extractor performs entity extraction from unstructured texts using natural language processing (NLP), machine learning (ML), and computational linguistics. Extractor also performs semantic relationship and event extraction as well as geotagging of text. It is used for a variety of data sources including both traditional sources (e.g., news, reports, web pages, email) and social media (e.g., Twitter, Facebook, chats, blogs). It runs on a variety of big data analytics platforms, including Apache Hadoop and LexisNexis's High-Performance Computing Cluster (HPCC) technology. It has been integrated with a number of third-party analytical tools such as Esri ArcGIS and Google Earth/Maps.
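As a rough illustration of what geotagging of text involves, the following minimal Python sketch looks up place names in a small dictionary (a gazetteer) and attaches latitude/longitude to each mention. It is not NetOwl's method or API: the `GAZETTEER` table, its coordinates, and the `geotag` function are hypothetical names chosen for this example, and a production system would also have to disambiguate place names, which this sketch does not attempt.

```python
import re

# Hypothetical miniature gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Paris": (48.8566, 2.3522),
    "New York": (40.7128, -74.0060),
    "Tokyo": (35.6762, 139.6503),
}


def geotag(text):
    """Return (place, start_offset, (lat, lon)) for each gazetteer hit in text."""
    hits = []
    for place, coords in GAZETTEER.items():
        # Match the place name only on whole-word boundaries.
        pattern = r"\b" + re.escape(place) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((place, match.start(), coords))
    # Report mentions in the order they appear in the text.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for place, offset, (lat, lon) in geotag("She flew from New York to Paris."):
        print(f"{place} at offset {offset}: lat={lat}, lon={lon}")
```

A plain dictionary lookup keeps the example short; it only shows the final step of assigning coordinates to a recognized geographical reference.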
Identity Analytics
NetOwl NameMatcher and EntityMatcher perform name matching and identity resolution for large multicultural and multilingual entity databases using machine learning (ML) and computational linguistics approaches. They are used for applications such as anti-money laundering (AML), watch lists, regulatory compliance, and fraud detection.
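To give a sense of what name matching involves, the sketch below scores how similar two person names are after simple normalization (removing accents, punctuation, and token order) and filters a watch list by a threshold. It is an assumption-laden toy, not NetOwl's algorithm: it substitutes the Python standard library's `difflib.SequenceMatcher` for the machine learning models described above, and the function names and the 0.8 threshold are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    # Sorting tokens makes "Garcia, Jose" and "Jose Garcia" compare equal.
    return " ".join(sorted(cleaned.split()))


def name_similarity(a, b):
    """Return a similarity score in [0, 1] between two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(query, watch_list, threshold=0.8):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = [(name_similarity(query, candidate), candidate) for candidate in watch_list]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    print(best_matches("Jose Garcia", ["García, José", "John Smith"]))
```

Character-level similarity handles spelling variants within one script; matching names across scripts or transliteration conventions, as described above, requires considerably more than this sketch provides.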
See also
* Knowledge extraction
* Text mining
* Data mining
* Computational linguistics
* Named entity recognition
* Unstructured data
* Document classification
External links
NetOwl website