ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations. ROUGE metrics range between 0 and 1, with higher scores indicating higher similarity between the automatically produced summary and the reference.
Metrics
The following five evaluation metrics are available.
*ROUGE-N: Overlap of n-grams<ref>Lin, Chin-Yew and E.H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In Proceedings of 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May 27 - June 1, 2003.</ref> between the system and reference summaries.
**ROUGE-1 refers to the overlap of ''unigrams'' ''(each word)'' between the system and reference summaries.
**ROUGE-2 refers to the overlap of ''bigrams'' between the system and reference summaries.
*ROUGE-L: Longest Common Subsequence (LCS)<ref>Lin, Chin-Yew and Franz Josef Och. 2004. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, July 21 - 26, 2004.</ref> based statistics. The longest common subsequence naturally takes sentence-level structure similarity into account and automatically identifies the longest in-sequence co-occurring n-grams.
*ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes.
*ROUGE-S: Skip-bigram based co-occurrence statistics. A skip-bigram is any pair of words in their sentence order, with arbitrary gaps allowed between them.
*ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
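
The overlap counts behind ROUGE-N, ROUGE-L and ROUGE-S can be illustrated with a short Python sketch. This is not the ROUGE software package itself: the function names (`rouge_n`, `rouge_l`, `rouge_s` and the helpers) are hypothetical, and the sketch assumes lowercase whitespace tokenization, a single reference, unlimited skip distance, and a balanced F-measure. ROUGE-W and ROUGE-SU are left out.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def prf(overlap, candidate_total, reference_total):
    """Turn an overlap count into (precision, recall, balanced F-measure)."""
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """N-gram overlap between a candidate and one reference."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """LCS-based score between a candidate and one reference."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))


def skip_bigrams(tokens):
    """Count every pair of tokens in sentence order, with any gap allowed."""
    return Counter(combinations(tokens, 2))


def rouge_s(candidate, reference):
    """Skip-bigram overlap between a candidate and one reference."""
    cand = skip_bigrams(tokenize(candidate))
    ref = skip_bigrams(tokenize(reference))
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))
```

Each function returns a `(precision, recall, F-measure)` tuple. For the candidate "police kill the gunman" scored against the reference "police killed the gunman", this sketch gives a recall of 0.75 for ROUGE-1 and ROUGE-L, 1/3 for ROUGE-2, and 0.5 for ROUGE-S.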
See also
* BLEU
* F-Measure
* METEOR
* NIST (metric)
* Noun-phrase chunking
* Word error rate (WER)
References
{{Reflist}}
External links
* ROUGE Usage Tutorial
* Java Implementation of ROUGE