Knowledge extraction

Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data. The W3C RDB2RDF group standardized languages such as R2RML for the extraction of Resource Description Framework (RDF) data from relational databases. Another popular example of knowledge extraction is the transformation of Wikipedia into structured data and its mapping to existing knowledge (see DBpedia and Freebase).


Overview

After the standardization of knowledge representation languages such as RDF and OWL, much research has been conducted in the area, especially regarding the transformation of relational databases into RDF, identity resolution, knowledge discovery and ontology learning. The general process uses traditional methods from information extraction and extract, transform, load (ETL), which transform the data from the sources into structured formats. Several criteria can be used to categorize approaches in this topic; some of them apply only to extraction from relational databases.


Examples


Entity linking

# DBpedia Spotlight, OpenCalais, Dandelion dataTXT, the Zemanta API and PoolParty Extractor analyze free text via named-entity recognition, then disambiguate candidates via name resolution and link the found entities to the DBpedia knowledge repository (see the Dandelion dataTXT demo, the DBpedia Spotlight web demo or the PoolParty Extractor demo).
:''President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance.''
:As ''President Obama'' is linked to a DBpedia LinkedData resource, further information can be retrieved automatically and a semantic reasoner can, for example, infer that the mentioned entity is of the type Person (using FOAF) and of the type Presidents of the United States (using YAGO). Counter-examples: methods that only recognize entities or link to Wikipedia articles and other targets that do not provide further retrieval of structured data and formal knowledge.
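As an illustration of calling such an entity-linking service, here is a minimal Python sketch against the public DBpedia Spotlight REST endpoint. The endpoint URL, parameters and response fields reflect the publicly documented Spotlight API but may change over time, so treat this as a hedged example rather than a definitive client.

    import requests  # third-party HTTP library

    # Public DBpedia Spotlight annotation endpoint (as publicly documented).
    SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

    def link_entities(text, confidence=0.5):
        """Return (surface form, DBpedia URI) pairs found in the text."""
        response = requests.get(
            SPOTLIGHT_URL,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        resources = response.json().get("Resources", [])
        return [(r["@surfaceForm"], r["@URI"]) for r in resources]

    if __name__ == "__main__":
        sentence = ("President Obama called Wednesday on Congress to extend "
                    "a tax break for students.")
        for surface, uri in link_entities(sentence):
            # e.g. Obama -> http://dbpedia.org/resource/Barack_Obama
            print(surface, "->", uri)

Because the returned identifiers are DBpedia URIs rather than plain Wikipedia links, further structured data can be retrieved from them, which is exactly what distinguishes entity linking in the knowledge-extraction sense from mere entity recognition.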


Relational databases to RDF

# Triplify, D2R Server, Ultrawrap and Virtuoso RDF Views are tools that transform relational databases to RDF. During this process they allow the reuse of existing vocabularies and ontologies. When transforming a typical relational table named ''users'', one column (e.g. ''name'') or an aggregation of columns (e.g. ''first_name'' and ''last_name'') has to provide the URI of the created entity. Normally the primary key is used. Every other column can be extracted as a relation with this entity. Then properties with formally defined semantics are used (and reused) to interpret the information. For example, a column in a user table called ''marriedTo'' can be defined as a symmetric relation and a column ''homepage'' can be converted to a property from the FOAF vocabulary called foaf:homepage, thus qualifying it as an inverse functional property. Then each entry of the ''users'' table can be made an instance of the class foaf:Person (ontology population). Additionally, domain knowledge (in the form of an ontology) could be created from the ''status_id'', either by manually created rules (if ''status_id'' is 2, the entry belongs to class Teacher) or by (semi-)automated methods (ontology learning). Here is an example transformation:

    :Peter :marriedTo :Mary .
    :marriedTo a owl:SymmetricProperty .
    :Peter foaf:homepage <http://example.org/Peters_page> .
    :Peter a foaf:Person .
    :Peter a :Student .
    :Claus a :Teacher .
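The same triples can also be produced programmatically. The following is a minimal sketch using the Python rdflib library; the :Peter/:Mary identifiers and the example.org namespace are illustrative placeholders, as in the Turtle example above.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import FOAF, OWL, RDF

    EX = Namespace("http://example.org/")  # illustrative namespace

    g = Graph()
    g.bind("foaf", FOAF)
    g.bind("owl", OWL)
    g.bind("", EX)

    # Declare :marriedTo as symmetric, so a reasoner can infer
    # :Mary :marriedTo :Peter from the single asserted triple below.
    g.add((EX.marriedTo, RDF.type, OWL.SymmetricProperty))
    g.add((EX.Peter, EX.marriedTo, EX.Mary))

    # foaf:homepage is an inverse functional property in FOAF, so the
    # homepage URL also serves to identify :Peter.
    g.add((EX.Peter, FOAF.homepage, URIRef("http://example.org/Peters_page")))

    # Ontology population: rows of the users table become foaf:Person instances.
    g.add((EX.Peter, RDF.type, FOAF.Person))
    g.add((EX.Peter, RDF.type, EX.Student))
    g.add((EX.Claus, RDF.type, EX.Teacher))

    print(g.serialize(format="turtle"))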


Extraction from structured sources to RDF


1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values

When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table typically defines a particular class of entity, each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:
* Each column in the table is an attribute (i.e., predicate)
* Each column value is an attribute value (i.e., object)
* Each row key represents an entity ID (i.e., subject)
* Each row represents an entity instance
* Each row (entity instance) is represented in RDF by a collection of triples with a common subject (entity ID)
So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows (a code sketch follows the list):
# create an RDFS class for each table
# convert all primary keys and foreign keys into IRIs
# assign a predicate IRI to each column
# assign an rdf:type predicate for each row, linking it to an RDFS class IRI corresponding to the table
# for each column that is neither part of a primary nor a foreign key, construct a triple containing the primary key IRI as the subject, the column IRI as the predicate and the column's value as the object
An early mention of this basic or direct mapping can be found in Tim Berners-Lee's comparison of the ER model to the RDF model.
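A minimal sketch of this direct mapping, assuming a SQLite table ''users'' with primary key ''id''; the base IRI and the table layout are illustrative placeholders, and foreign-key handling is omitted for brevity:

    import sqlite3

    BASE = "http://example.org/"  # illustrative base IRI

    def direct_mapping(conn, table):
        """Yield (subject, predicate, object) triples for one table."""
        cur = conn.execute(f"SELECT * FROM {table}")  # table name from trusted schema
        columns = [d[0] for d in cur.description]
        class_iri = f"<{BASE}{table}>"
        yield (class_iri, "a", "rdfs:Class")              # step 1: class per table
        for values in cur:
            row = dict(zip(columns, values))
            subject = f"<{BASE}{table}/{row['id']}>"      # step 2: key -> IRI
            yield (subject, "a", class_iri)               # step 4: rdf:type per row
            for col, value in row.items():
                if col == "id":
                    continue
                predicate = f"<{BASE}{table}#{col}>"      # step 3: predicate per column
                yield (subject, predicate, f'"{value}"')  # step 5: literal triple

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, homepage TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Peter', 'http://example.org/Peters_page')")
    for s, p, o in direct_mapping(conn, "users"):
        print(s, p, o, ".")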


Complex mappings of relational databases to RDF

The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way; additional refinements can be employed to improve the usefulness of the RDF output with respect to the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in object-relational impedance mismatch) and has to be reverse engineered. From a conceptual view, approaches for extraction can come from two directions. The first direction tries to extract or learn an OWL schema from the given database schema. Early approaches used a fixed amount of manually created mapping rules to refine the 1:1 mapping. More elaborate methods employ heuristics or learning algorithms to induce schematic information (methods overlap with ontology learning). While some approaches try to extract the information from the structure inherent in the SQL schema (analysing e.g. foreign keys), others analyse the content and the values in the tables to create conceptual hierarchies (e.g. columns with few values are candidates for becoming categories). The second direction tries to map the schema and its contents to a pre-existing domain ontology (see also: ontology alignment). Often, however, a suitable domain ontology does not exist and has to be created first.
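As a sketch of one such content-based heuristic, the following hypothetical helper flags low-cardinality columns as candidate categories. The threshold of 10 distinct values is an arbitrary illustrative choice, not an established constant.

    import sqlite3

    def category_candidates(conn, table, max_distinct=10):
        """Return columns whose few distinct values suggest they encode categories."""
        cur = conn.execute(f"SELECT * FROM {table} LIMIT 1")
        columns = [d[0] for d in cur.description]
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        candidates = []
        for col in columns:
            distinct = conn.execute(
                f'SELECT COUNT(DISTINCT "{col}") FROM {table}').fetchone()[0]
            # A column with far fewer distinct values than rows (such as a
            # status_id) is a candidate for becoming a class hierarchy.
            if 1 < distinct <= max_distinct and distinct < total:
                candidates.append(col)
        return candidates

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(i, f"user{i}", i % 3) for i in range(30)])
    print(category_candidates(conn, "users"))  # -> ['status_id']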


XML

As XML is structured as a tree, any data can be easily represented in RDF, which is structured as a graph. XML2RDF is one example of an approach that uses RDF blank nodes and transforms XML elements and attributes to RDF properties. The topic, however, is more complex than in the case of relational databases. In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples. An XML element, however, can be transformed, depending on the context, into the subject, the predicate or the object of a triple. XSLT can be used as a standard transformation language to manually convert XML to RDF.
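A minimal sketch of the blank-node idea using Python's standard library XML parser together with rdflib: each XML element becomes a blank node, and its attributes and children become RDF properties. The ex: namespace is an illustrative placeholder.

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Namespace, BNode, Literal

    EX = Namespace("http://example.org/")  # illustrative namespace

    def xml_to_rdf(element, graph):
        """Map an XML element to a blank node; attributes and children become properties."""
        node = BNode()
        for name, value in element.attrib.items():
            graph.add((node, EX[name], Literal(value)))
        for child in element:
            graph.add((node, EX[child.tag], xml_to_rdf(child, graph)))
        if element.text and element.text.strip():
            graph.add((node, EX.value, Literal(element.text.strip())))
        return node

    g = Graph()
    root = ET.fromstring('<user id="1"><name>Peter</name></user>')
    xml_to_rdf(root, g)
    print(g.serialize(format="turtle"))

Note how the same element kind ends up in different triple positions: the ''name'' element becomes a predicate (ex:name) while its text content becomes an object, mirroring the context dependence described above.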


Survey of methods / tools


Extraction from natural language sources

The largest portion of information contained in business documents (about 80%) is encoded in natural language and is therefore unstructured. Because unstructured data is rather a challenge for knowledge extraction, more sophisticated methods are required, and these generally tend to supply worse results than those obtained from structured data. The potential for a massive acquisition of extracted knowledge, however, should compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information where the data is given in an unstructured fashion as plain text. If the given text is additionally embedded in a markup document (e.g. an HTML document), the mentioned systems normally remove the markup elements automatically.


Linguistic annotation / natural language processing (NLP)

As a preprocessing step to knowledge extraction, it can be necessary to perform linguistic annotation with one or more NLP tools. Individual modules in an NLP workflow normally build on tool-specific formats for input and output, but in the context of knowledge extraction, structured formats for representing linguistic annotations have been applied. Typical NLP tasks relevant to knowledge extraction include:
* part-of-speech (POS) tagging
* lemmatization (LEMMA) or stemming (STEM)
* word sense disambiguation (WSD, related to semantic annotation below)
* named entity recognition (NER, also see IE below)
* syntactic parsing, often adopting syntactic dependencies (DEP)
* shallow syntactic parsing (CHUNK): if performance is an issue, chunking yields a fast extraction of nominal and other phrases
* anaphora resolution (see coreference resolution in IE below, but seen here as the task of creating links between textual mentions rather than between the mention of an entity and an abstract representation of the entity)
* semantic role labelling (SRL, related to relation extraction; not to be confused with semantic annotation as described below)
* discourse parsing (relations between different sentences, rarely used in real-world applications)
In NLP, such data is typically represented in TSV formats (CSV formats with TAB as separator), often referred to as CoNLL formats; a minimal reader for this layout is sketched below. For knowledge extraction workflows, RDF views on such data have been created in accordance with the following community standards:
* NLP Interchange Format (NIF, for many frequent types of annotation)
* Web Annotation (WA, often used for entity linking)
* CoNLL-RDF (for annotations originally represented in TSV formats)
Other, platform-specific formats include:
* LAPPS Interchange Format (LIF, used in the LAPPS Grid)
* NLP Annotation Format (NAF, used in the NewsReader workflow management system)
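A minimal sketch of reading such CoNLL-style TSV annotations, assuming one token per line with tab-separated columns and blank lines separating sentences; the four-column layout (ID, FORM, LEMMA, POS) is an illustrative choice, as real CoNLL dialects vary.

    import csv
    import io

    CONLL_SAMPLE = """\
    1\tPresident\tpresident\tNN
    2\tObama\tObama\tNNP

    1\tCongress\tCongress\tNNP
    """

    def read_conll(stream, columns=("ID", "FORM", "LEMMA", "POS")):
        """Yield sentences as lists of {column: value} token dicts."""
        sentence = []
        for row in csv.reader(stream, delimiter="\t"):
            if not row:          # a blank line ends the current sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            sentence.append(dict(zip(columns, (cell.strip() for cell in row))))
        if sentence:
            yield sentence

    for sent in read_conll(io.StringIO(CONLL_SAMPLE)):
        print([tok["FORM"] for tok in sent])  # ['President', 'Obama'] / ['Congress']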


Traditional information extraction (IE)

Traditional information extraction is a technology of natural language processing, which extracts information from typically natural language texts and structures it in a suitable manner. The kinds of information to be identified must be specified in a model before beginning the process, which is why the whole process of traditional information extraction is domain dependent. IE is split into the following five subtasks:
* Named entity recognition (NER)
* Coreference resolution (CO)
* Template element construction (TE)
* Template relation construction (TR)
* Template scenario production (ST)
The task of named entity recognition is to recognize and to categorize all named entities contained in a text (assignment of a named entity to a predefined category). This works by the application of grammar-based methods or statistical models; a minimal NER sketch follows below. Coreference resolution identifies equivalent entities, recognized by NER, within a text. There are two relevant kinds of equivalence relationship. The first one relates to the relationship between two differently represented entities (e.g. IBM Europe and IBM) and the second one to the relationship between an entity and its anaphoric references (e.g. it and IBM). Both kinds can be recognized by coreference resolution. During template element construction the IE system identifies descriptive properties of entities recognized by NER and CO. These properties correspond to ordinary qualities like red or big. Template relation construction identifies relations which exist between the template elements. These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities. In template scenario production, events described in the text are identified and structured with respect to the entities recognized by NER and CO and the relations identified by TR.
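A minimal NER sketch using the spaCy library, assuming the small English model has been installed (e.g. via python -m spacy download en_core_web_sm):

    import spacy

    # Load a small English pipeline that includes a statistical NER component.
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("IBM Europe opened a new office in Paris last year.")
    for ent in doc.ents:
        # Each entity carries a surface form and a predefined category label.
        print(ent.text, "->", ent.label_)  # e.g. IBM Europe -> ORG, Paris -> GPE

Note that this covers only the NER subtask; coreference resolution and the template subtasks require additional components on top of such a pipeline.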


Ontology-based information extraction (OBIE)

Ontology-based information extraction is a subfield of information extraction in which at least one ontology is used to guide the process of information extraction from natural language text. The OBIE system uses methods of traditional information extraction to identify concepts, instances and relations of the used ontologies in the text, which will be structured into an ontology after the process. Thus, the input ontologies constitute the model of the information to be extracted.
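As a sketch of the simplest ontology-guided strategy, the following hypothetical helper matches the labels of ontology classes against the text; this is a plain dictionary (gazetteer) lookup, not a full OBIE system, and the lexicon entries are illustrative placeholders.

    import re

    # Illustrative label -> class IRI lexicon, as it might be derived from an ontology.
    ONTOLOGY_LEXICON = {
        "tax break": "http://example.org/ontology#FiscalMeasure",
        "Congress": "http://example.org/ontology#LegislativeBody",
    }

    def obie_match(text, lexicon=ONTOLOGY_LEXICON):
        """Return (surface form, class IRI, character offset) matches of ontology labels."""
        hits = []
        for label, iri in lexicon.items():
            for m in re.finditer(re.escape(label), text):
                hits.append((m.group(), iri, m.start()))
        return sorted(hits, key=lambda hit: hit[2])

    sentence = "Congress debated whether to extend the tax break."
    for surface, iri, offset in obie_match(sentence):
        print(offset, surface, "->", iri)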


Ontology learning (OL)

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms from natural language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.


Semantic annotation (SA)

During semantic annotation, natural language text is augmented with metadata (often represented in RDFa), which should make the semantics of the contained terms machine-understandable. In this process, which is generally semi-automatic, knowledge is extracted in the sense that a link between lexical terms and, for example, concepts from ontologies is established. Thus, knowledge is gained about which meaning of a term in the processed context was intended, and therefore the meaning of the text is grounded in machine-readable data with the ability to draw inferences. Semantic annotation is typically split into the following two subtasks:
# Terminology extraction
# Entity linking
At the terminology extraction level, lexical terms are extracted from the text. For this purpose a tokenizer determines at first the word boundaries and resolves abbreviations. Afterwards, terms from the text which correspond to a concept are extracted with the help of a domain-specific lexicon, to be linked during entity linking. In entity linking a link between the extracted lexical terms from the source text and the concepts from an ontology or knowledge base such as DBpedia is established. For this, candidate concepts are detected for the several meanings of a term with the help of a lexicon. Finally, the context of the terms is analyzed to determine the most appropriate disambiguation and to assign the term to the correct concept; a minimal sketch of this lexicon-based linking follows below. Note that "semantic annotation" in the context of knowledge extraction is not to be confused with semantic parsing as understood in natural language processing (also referred to as "semantic annotation"): semantic parsing aims at a complete, machine-readable representation of natural language, whereas semantic annotation in the sense of knowledge extraction tackles only a very elementary aspect of it.
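A minimal, hedged sketch of such lexicon-based entity linking: candidate concepts per term come from a hand-made lexicon (the DBpedia IRIs and cue words are illustrative), and disambiguation simply counts overlaps between each candidate's cue words and the words of the surrounding context.

    # Illustrative candidate lexicon: term -> {concept IRI: context cue words}.
    LEXICON = {
        "Washington": {
            "http://dbpedia.org/resource/Washington,_D.C.":
                {"congress", "capital", "president"},
            "http://dbpedia.org/resource/George_Washington":
                {"general", "1789", "revolution"},
        },
    }

    def link_term(term, context_words, lexicon=LEXICON):
        """Pick the candidate concept whose cue words overlap the context most."""
        candidates = lexicon.get(term, {})
        if not candidates:
            return None
        return max(candidates, key=lambda iri: len(candidates[iri] & context_words))

    sentence = "The president returned to Washington to address Congress."
    context = {w.strip(".,").lower() for w in sentence.split()}
    print(link_term("Washington", context))
    # -> http://dbpedia.org/resource/Washington,_D.C.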


Tools

Various criteria can be used to categorize tools that extract knowledge from natural language text.


Knowledge discovery

Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge ''about'' the data. It is often described as ''deriving'' knowledge from the input data. Knowledge discovery developed out of the data mining domain and is closely related to it both in terms of methodology and terminology. The most well-known branch of data mining is knowledge discovery, also known as knowledge discovery in databases (KDD). Just as many other forms of knowledge discovery, it creates abstractions of the input data. The ''knowledge'' obtained through the process may become additional ''data'' that can be used for further discovery. Often the outcomes from knowledge discovery are not actionable; actionable knowledge discovery, also known as domain-driven data mining, aims to discover and deliver actionable knowledge and insights.

Another promising application of knowledge discovery is in the area of software modernization, weakness discovery and compliance, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. An entity relationship is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery in existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts contain enormous value for risk management and business value, key for the evaluation and evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as process flows (e.g. data flows, control flows and call maps), architecture, database schemas, and business rules/terms/processes.
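As a toy illustration of the pattern discovery described at the start of this section, the following self-contained sketch counts frequently co-occurring items in transaction data; the transactions and the support threshold are invented for the example, and the output pairs are the kind of abstraction a KDD pipeline derives from raw input.

    from collections import Counter
    from itertools import combinations

    # Toy transaction data; each row is one observed "basket" of items.
    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "milk"},
        {"butter", "milk"},
    ]

    def frequent_pairs(transactions, min_support=0.5):
        """Return item pairs occurring in at least min_support of the transactions."""
        counts = Counter()
        for basket in transactions:
            for pair in combinations(sorted(basket), 2):
                counts[pair] += 1
        threshold = min_support * len(transactions)
        return {pair: n for pair, n in counts.items() if n >= threshold}

    print(frequent_pairs(transactions))
    # -> {('bread', 'butter'): 2, ('bread', 'milk'): 2, ('butter', 'milk'): 2}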


Input data

* Databases
** Relational data
** Database
** Document warehouse
** Data warehouse
* Software
** Source code
** Configuration files
** Build scripts
* Text
** Concept mining
* Graphs
** Molecule mining
* Sequences
** Data stream mining
** Learning from time-varying data streams under concept drift
* Web


Output formats

* Data model
* Metadata
* Metamodels
* Ontology
* Knowledge representation
* Knowledge tags
* Business rule
* Knowledge Discovery Metamodel (KDM)
* Business Process Model and Notation (BPMN)
* Intermediate representation
* Resource Description Framework (RDF)
* Software metrics


See also

* Cluster analysis
* Data archaeology

