Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Overview

A question answering implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, question answering systems can pull answers from an unstructured collection of natural language documents.

Some examples of natural language document collections used for question answering systems include:
* a local collection of reference texts
* internal organization documents and web pages
* compiled newswire reports
* a set of Wikipedia pages
* a subset of World Wide Web pages

Types of question answering

Question answering research attempts to deal with a wide range of question types, including fact, list, definition, ''How'', ''Why'', hypothetical, semantically constrained, and cross-lingual questions.
* Answering questions related to an article in order to evaluate reading comprehension. This is a simpler form of question answering, since the given article is relatively short compared to the larger domains of information that other types of question answering problems deal with. An example of such a question is "What did Albert Einstein win the Nobel Prize for?" when an article about this subject is given to the system.
* ''Closed-book'' question answering: a system has memorized some facts during training and can answer questions without being explicitly given a context. This is similar to humans taking closed-book exams.
* ''Closed-domain'' question answering deals with questions under a specific domain (for example, medicine or automotive maintenance), and can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, ''closed-domain'' might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information. Question answering systems in the context of machine reading applications have also been constructed in the medical domain, for instance related to Alzheimer's disease.
* ''Open-domain'' question answering deals with questions about nearly anything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer. An example of an open-domain question is "What did Albert Einstein win the Nobel Prize for?" when no article about this subject is given to the system.

Another way to categorize question answering systems is by the technical approach used. There are a number of different types of QA systems, including:
* rule-based systems
* statistical systems
* hybrid systems

Rule-based systems use a set of rules to determine the correct answer to a question. Statistical systems use statistical methods to find the most likely answer to a question. Hybrid systems use a combination of rule-based and statistical methods.

History

Two early question answering systems were BASEBALL and LUNAR. BASEBALL answered questions about Major League Baseball over a period of one year. LUNAR, in turn, answered questions about the geological analysis of rocks returned by the Apollo moon missions. Both question answering systems were very effective in their chosen domains. In fact, LUNAR was demonstrated at a lunar science convention in 1971 and it was able to answer 90% of the questions in its domain posed by people untrained on the system. Further restricted-domain question answering systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts of the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR, the first chatterbot programs.

SHRDLU was a highly successful question-answering program developed by Terry Winograd in the late 1960s and early 1970s. It simulated the operation of a robot in a toy world (the "blocks world"), and it offered the possibility of asking the robot questions about the state of the world. Again, the strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in a computer program.

In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The question answering systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern question answering systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern question answering systems rely on statistical processing of a large, unstructured, natural language text corpus.

The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example of such a system was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to the Unix operating system. It had a comprehensive hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped the development of theories on computational linguistics and reasoning.

Specialized natural language question answering systems have been developed, such as EAGLi for health and life scientists.

Applications

QA systems are used in a variety of applications, including:
* fact-checking, by posing a question such as "Is fact X true or false?"
* customer service
* technical support
* market research
* generating reports or conducting research

Architecture

As of 2001, question answering systems typically included a ''question classifier'' module that determines the type of question and the type of answer.

Different types of question answering systems employ different architectures. For example, modern open-domain question answering systems may use a retriever-reader architecture. The retriever is aimed at retrieving relevant documents related to a given question, while the reader is used for inferring the answer from the retrieved documents. More recent systems, such as GPT-3, T5, and BART, use an end-to-end architecture in which a transformer-based model stores large-scale textual data in its underlying parameters. These models can then be used directly to answer questions without accessing any external knowledge sources.
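The retriever-reader split can be illustrated with a deliberately small Python sketch. It is not how any particular system is implemented: the three-sentence corpus, the word-overlap retriever, and the sentence-picking "reader" are hypothetical stand-ins for a real search index and a trained neural reader.

```python
import re

# Hypothetical toy corpus; a real system would index many documents.
CORPUS = [
    "Albert Einstein won the Nobel Prize in Physics for the photoelectric effect.",
    "The Unix operating system was developed at Bell Labs.",
    "Watson competed on Jeopardy! in 2011.",
]

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "at",
             "did", "what", "who", "was", "is"}


def tokenize(text):
    """Lowercase a string and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def retrieve(question, corpus, k=1):
    """Retriever: rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def read(question, documents):
    """Reader stand-in: pick the sentence sharing the most words with the question."""
    q = tokenize(question)
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    return max(sentences, key=lambda s: len(q & tokenize(s)))


def answer(question, corpus=CORPUS):
    """Run the two stages in sequence: retrieve documents, then read them."""
    return read(question, retrieve(question, corpus))


result = answer("What did Albert Einstein win the Nobel Prize for?")
```

Keeping the two stages as separate functions mirrors the architecture: either one could be swapped for a stronger component without touching the other.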

Question answering methods

Question answering is very dependent on a good search corpus, for without documents containing the answer, there is little any question answering system can do. It thus makes sense that larger collection sizes generally lend themselves well to better question answering performance, unless the question domain is orthogonal to the collection. The notion of data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents, leading to two benefits:
# By having the right information appear in many forms, the burden on the question answering system to perform complex NLP techniques to understand the text is lessened.
# Correct answers can be filtered from false positives by relying on the correct answer to appear more times in the documents than instances of incorrect ones.

Some question answering systems rely heavily on automated reasoning.
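The redundancy argument, in which the correct answer tends to appear more often than incorrect ones, can be sketched as a simple vote over candidate answers. This is a minimal illustration that assumes candidate extraction has already happened; the candidate strings and the `normalize` helper are made up for the example.

```python
from collections import Counter


def normalize(answer):
    """Collapse case and extra whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def vote(candidates):
    """Return the most frequent normalized candidate and its share of the votes."""
    counts = Counter(normalize(c) for c in candidates)
    best, n = counts.most_common(1)[0]
    return best, n / len(candidates)


# Hypothetical candidates extracted from five different documents.
candidates = ["1 October", "1 october", "October 1", "1 October ", "10 October"]
best, share = vote(candidates)
```

In this sketch "October 1" is counted separately from "1 October", which shows why real systems need stronger normalization than case folding before counting.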

Open domain question answering

In information retrieval, an open domain question answering system aims at returning an answer in response to the user's question. The returned answer is in the form of short texts rather than a list of relevant documents. The system uses a combination of techniques from computational linguistics, information retrieval and knowledge representation for finding answers.

The system takes a natural language question as input rather than a set of keywords, for example, "When is the national day of China?" The sentence is then transformed into a query through its logical form. Having the input in the form of a natural language question makes the system more user-friendly, but harder to implement, as there are various question types and the system will have to identify the correct one in order to give a sensible answer. Assigning a question type to the question is a crucial task: the entire answer extraction process relies on finding the correct question type and hence the correct answer type.

Keyword extraction is the first step for identifying the input question type. In some cases, there are clear words that indicate the question type directly, such as "Who", "Where" or "How many". These words tell the system that the answers should be of type "Person", "Location", or "Number", respectively. In the example above, the word "When" indicates that the answer should be of type "Date". POS (part-of-speech) tagging and syntactic parsing techniques can also be used to determine the answer type. In this case, the subject is "Chinese National Day", the predicate is "is" and the adverbial modifier is "when", so the answer type is "Date". Unfortunately, some interrogative words like "Which", "What" or "How" do not give clear answer types. Each of these words can represent more than one type. In situations like this, other words in the question need to be considered. The first step is to find the words that can indicate the meaning of the question. A lexical dictionary such as WordNet can then be used for understanding the context.

Once the question type has been identified, an information retrieval system is used to find a set of documents containing the correct keywords. A tagger and NP/Verb Group chunker can be used to verify whether the correct entities and relations are mentioned in the found documents. For questions such as "Who" or "Where", a named-entity recogniser is used to find relevant "Person" and "Location" names from the retrieved documents. Only the relevant paragraphs are selected for ranking.

A vector space model can be used as a strategy for classifying the candidate answers. Each candidate is checked against the answer type determined in the question type analysis stage. An inference technique can also be used to validate the candidate answers. A score is then given to each of these candidates according to the number of question words it contains and how close these words are to the candidate: the more and the closer, the better. The answer is then translated into a compact and meaningful representation by parsing. In the previous example, the expected output answer is "1st Oct."
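Two of the steps in this section, mapping interrogative words to an expected answer type and scoring candidates by the number and proximity of question words, can be sketched in Python. The cue table and the `1 / (1 + distance)` weighting are assumptions chosen for illustration; the text does not specify an exact scoring formula.

```python
import re

# Hypothetical mapping from interrogative cues to expected answer types.
ANSWER_TYPES = [
    ("how many", "Number"),
    ("who", "Person"),
    ("where", "Location"),
    ("when", "Date"),
]


def expected_answer_type(question):
    """Return the answer type signalled by a question word, or None if unclear."""
    q = question.lower()
    for cue, answer_type in ANSWER_TYPES:
        if re.search(r"\b" + re.escape(cue) + r"\b", q):
            return answer_type
    return None  # "Which", "What" and "How" need further analysis


def score_candidate(question_words, passage_tokens, candidate_index):
    """Score a candidate position: each question word found in the passage adds
    a contribution that shrinks with its distance from the candidate."""
    score = 0.0
    for word in question_words:
        positions = [i for i, tok in enumerate(passage_tokens) if tok == word]
        if positions:
            distance = min(abs(i - candidate_index) for i in positions)
            score += 1.0 / (1 + distance)  # assumed weighting, not from the text
    return score


tokens = "the national day of china is celebrated on 1 october".split()
keywords = ["national", "day", "china"]
near = score_candidate(keywords, tokens, 5)  # position close to the keywords
far = score_candidate(keywords, tokens, 8)   # position further away
```

A candidate that passes the answer-type check and has the highest score would then be returned; a production system would combine this with named-entity recognition rather than raw token positions.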

Mathematical question answering

An open source math-aware question answering system based on Ask Platypus and Wikidata was published in 2018. The system takes an English or Hindi natural language question as input and returns a mathematical formula retrieved from Wikidata as a succinct answer. The resulting formula is translated into a computable form, allowing the user to insert values for the variables. Names and values of variables and common constants are retrieved from Wikidata if available. It is claimed that the system outperforms a commercial computational mathematical knowledge engine on a test set. MathQA is hosted by Wikimedia at https://mathqa.wmflabs.org/. In 2022, it was extended to answer 15 math question types.

MathQA methods need to combine natural and formula language. One possible approach is to perform supervised annotation via Entity Linking. The "ARQMath Task" at CLEF 2020 was launched to address the problem of linking newly posted questions from the platform Math Stack Exchange (MSE) to existing ones that were already answered by the community. The lab was motivated by the finding of Mansouri et al. that 20% of the mathematical queries in general-purpose search engines are expressed as well-formed questions. It contained two separate sub-tasks: Task 1, "Answer retrieval", matching old post answers to newly posed questions, and Task 2, "Formula retrieval", matching old post formulae to new questions. Starting with the domain of mathematics, which involves formula language, the goal is to later extend the task to other domains (e.g., STEM disciplines, such as chemistry and biology), which employ other types of special notation (e.g., chemical formulae).

The inverse process of mathematical question answering, mathematical question generation, has also been researched. The PhysWikiQuiz physics question generation and test engine retrieves mathematical formulae from Wikidata together with semantic information about their constituent identifiers (names and values of variables). The formulae are then rearranged to generate a set of formula variants. Subsequently, the variables are substituted with random values to generate a large number of different questions suitable for individual student tests. PhysWikiQuiz is hosted by Wikimedia at https://physwikiquiz.wmflabs.org/.
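The question generation idea (rearrange a formula, then substitute random values) can be sketched as follows. This is only an illustration under stated assumptions: the formula record is hard-coded rather than retrieved from Wikidata, the rearrangements of F = m * a are written by hand instead of being derived by a computer algebra system, and `generate_question` is a hypothetical helper, not part of PhysWikiQuiz.

```python
import random

# Hypothetical formula record; the real system retrieves such data from Wikidata.
FORMULA = {
    "names": {"F": "force", "m": "mass", "a": "acceleration"},
    # Hand-written rearrangements of F = m * a, one per unknown variable.
    "variants": {
        "F": lambda v: v["m"] * v["a"],
        "m": lambda v: v["F"] / v["a"],
        "a": lambda v: v["F"] / v["m"],
    },
}


def generate_question(formula, unknown, rng):
    """Build one question by assigning random values to the known variables."""
    known = [s for s in formula["names"] if s != unknown]
    values = {s: rng.randint(1, 20) for s in known}
    givens = ", ".join(f"{formula['names'][s]} {s} = {values[s]}" for s in known)
    text = f"Given {givens}, what is the {formula['names'][unknown]} {unknown}?"
    solution = formula["variants"][unknown](values)
    return text, values, solution


rng = random.Random(0)  # fixed seed so repeated runs give the same questions
questions = [generate_question(FORMULA, u, rng) for u in FORMULA["variants"]]
```

Each call yields the question text, the values that were drawn, and the expected solution, so a test engine could check a student's answer against `solution`.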

Progress

Question answering systems have been extended in recent years to encompass additional domains of knowledge. For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about the content of audio, images, and video. Current question answering research topics include:
* interactivity (clarification of questions or answers)
* answer reuse or caching
* semantic parsing
* answer presentation
* knowledge representation and semantic entailment
* social media analysis with question answering systems
* sentiment analysis
* utilization of thematic roles
* image captioning for visual question answering (Anderson, Peter, et al. "Bottom-up and top-down attention for image captioning and visual question answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.)
* embodied question answering

In 2011, Watson, a question answering computer system developed by IBM, competed in two exhibition matches of ''Jeopardy!'' against Brad Rutter and Ken Jennings, winning by a significant margin. Facebook Research has made their DrQA system available under an open source license. This system has been used for open domain question answering using Wikipedia as knowledge source. The open source framework Haystack by deepset allows combining open domain question answering with generative question answering and supports the domain adaptation of the underlying language models for industry use cases.

External links

* Question Answering Evaluation at NTCIR
* Question Answering Evaluation at CLEF
* Quiz Question Answers
* Online Question Answering System