In natural language processing (NLP), a text graph is a graph representation of a text item (document, passage or sentence). It is typically created as a preprocessing step to support NLP tasks such as text condensation, term disambiguation, (topic-based) text summarization, relation extraction and textual entailment.
Representation
The semantics of what a text graph's nodes and edges represent can vary widely. Nodes, for example, can simply correspond to tokenized words, to domain-specific terms, or to entities mentioned in the text. Edges, in turn, can connect these text-based nodes to one another, or they can link them to a knowledge base.
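The following Python code is a minimal sketch of one common construction. It is assumed here for illustration and is not the only option. Nodes are tokenized words, and edges join words whose positions are less than `window` apart, weighted by how many such position pairs occur. The code also shows the second kind of edge mentioned above, which links word nodes to a knowledge base. `TOY_KB` and its identifiers are hypothetical stand-ins for a real knowledge base and entity linker. The function names are illustrative.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical toy "knowledge base": maps surface words to made-up entity IDs.
# A real system would use an entity linker against an actual knowledge base.
TOY_KB = {"paris": "KB:Paris", "france": "KB:France"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters (deliberately simplistic)."""
    return re.findall(r"[a-z]+", text.lower())


def build_text_graph(text: str, window: int = 2, kb: Optional[dict[str, str]] = None):
    """Build a weighted, undirected word co-occurrence graph.

    Nodes are the distinct tokens. Two tokens are joined by an edge when their
    positions are less than `window` apart; the edge weight counts how many
    such position pairs there are. If `kb` is given, every token found in it
    also gets an edge to the corresponding knowledge-base node.
    """
    tokens = tokenize(text)
    nodes = set(tokens)

    # Undirected edges are keyed by frozenset so (a, b) and (b, a) coincide.
    edges: Counter = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:  # skip self-loops
                edges[frozenset((left, right))] += 1

    # Word-to-KB edges, stored as (token, entity_id) pairs.
    kb_edges = {(tok, kb[tok]) for tok in nodes if kb and tok in kb}
    return nodes, edges, kb_edges


nodes, edges, kb_edges = build_text_graph(
    "Paris is the capital of France. Paris is large.", window=2, kb=TOY_KB
)
print(len(nodes), "word nodes,", len(edges), "co-occurrence edges")
print(edges[frozenset(("paris", "is"))])  # "paris is" occurs twice
print(sorted(kb_edges))
```

A window of 2 links only adjacent words. Larger windows produce denser graphs that capture looser associations. Keying edges by `frozenset` keeps the graph undirected without storing each edge twice.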
TextGraphs Workshop series
The TextGraphs Workshop series [Textgraphs, http://www.textgraphs.org/, accessed 6 March 2017] is a series of regular academic workshops intended to encourage synergy between the fields of natural language processing (NLP) and graph theory. The combination of the two started small, with graph-theoretical frameworks providing efficient and elegant solutions for NLP applications focused on single documents, such as part-of-speech tagging, word-sense disambiguation and semantic role labelling. It then grew progressively larger with ontology learning and information extraction from large text collections.
The 11th edition of the workshop (TextGraphs-11) was co-located with the Annual Meeting of the Association for Computational Linguistics (ACL 2017) in Vancouver, BC, Canada.
Areas of interest
* Graph-based methods for providing reasoning and interpretation of deep learning methods
** Graph-based methods for reasoning and interpreting deep processing by neural networks
** Explorations of the capabilities and limits of graph-based methods applied to neural networks in general
** Investigation of which aspects of neural networks are not susceptible to graph-based methods
* Graph-based methods for Information Retrieval, Information Extraction, and Text Mining
** Graph-based methods for word sense disambiguation
** Graph-based representations for ontology learning
** Graph-based strategies for semantic relation identification
** Encoding semantic distances in graphs
** Graph-based techniques for text summarization, simplification, and paraphrasing
** Graph-based techniques for document navigation and visualization
** Reranking with graphs
** Applications of label propagation algorithms, etc.
* New graph-based methods for NLP applications
** Random walk methods in graphs
** Spectral graph clustering
** Semi-supervised graph-based methods
** Methods and analyses for statistical networks
** Small world graphs
** Dynamic graph representations
** Topological and pretopological analysis of graphs
** Graph kernels, etc.
* Graph-based methods for applications on social networks
** Rumor proliferation
** E-reputation
** Multiple identity detection
** Language dynamics studies
** Surveillance systems, etc.
* Graph-based methods for NLP and Semantic Web
** Representation learning methods for knowledge graphs (i.e., knowledge graph embedding)
** Using graph-based methods to populate ontologies from textual data
** Inducing knowledge of ontologies into NLP applications using graphs
** Merging ontologies with graph-based methods using NLP techniques
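The topics "random walk methods in graphs" and "graph-based techniques for text summarization" above can be illustrated with a small sketch. It runs a PageRank-style random walk over a word co-occurrence graph to rank words as keyword candidates, in the spirit of TextRank. This is an illustrative assumption, not a method attributed to the workshop series. The function names, the window size, the damping factor of 0.85 and the iteration limits are choices made for this example. A practical system would typically also filter tokens, for example by part of speech or with a stopword list.

```python
import re
from collections import defaultdict


def cooccurrence_graph(text: str, window: int = 2) -> dict[str, dict[str, float]]:
    """Weighted adjacency map of an undirected word co-occurrence graph.

    Tokens whose positions are less than `window` apart are linked; words that
    never co-occur with a different word do not appear as nodes.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    adj: defaultdict = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                adj[left][right] += 1.0
                adj[right][left] += 1.0
    # Convert to plain dicts so later lookups cannot silently add nodes.
    return {u: dict(nbrs) for u, nbrs in adj.items()}


def pagerank(adj: dict[str, dict[str, float]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-9) -> dict[str, float]:
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}
    score = {u: 1.0 / n for u in nodes}
    # Total edge weight at each node; always > 0 because every node has a neighbour.
    strength = {u: sum(adj[u].values()) for u in nodes}
    for _ in range(max_iter):
        # The walker moves from u to neighbour v with probability w(u, v) / strength[u]
        # and teleports to a uniformly random node with probability 1 - damping.
        # adj is symmetric, so adj[v][u] is also the weight of the edge u -> v.
        new = {
            v: (1.0 - damping) / n
            + damping * sum(score[u] * w / strength[u] for u, w in adj[v].items())
            for v in nodes
        }
        delta = sum(abs(new[u] - score[u]) for u in nodes)
        score = new
        if delta < tol:
            break
    return score


def top_keywords(text: str, k: int = 5, window: int = 2) -> list[str]:
    """Return the k highest-ranked words of `text` as keyword candidates."""
    scores = pagerank(cooccurrence_graph(text, window))
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(top_keywords("hub a hub b hub c hub d", k=1))
```

In this toy text, "hub" co-occurs with every other word, so the random walk visits it most often and it ranks first. Because every node has at least one neighbour, the scores remain a probability distribution that sums to 1 at every iteration.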
See also
* Bag-of-words model
* Document classification
* Document-term matrix
* Hyperlinking
* Graph database
* Wiki
References
External links
Gabor Melli's page on text graphs: description of text graphs from a semantic processing perspective.