ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a human-produced reference summary or translation, or against a set of such references. ROUGE metrics range between 0 and 1, with higher scores indicating higher similarity between the automatically produced summary and the reference.

Metrics

The following five evaluation metrics are available.
* ROUGE-N: Overlap of n-grams<ref>Lin, Chin-Yew and E.H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In Proceedings of 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May 27 - June 1, 2003.</ref> between the system and reference summaries.
** ROUGE-1 refers to the overlap of ''unigrams'' (single words) between the system and reference summaries.
** ROUGE-2 refers to the overlap of ''bigrams'' between the system and reference summaries.
* ROUGE-L: Longest common subsequence (LCS)<ref>Lin, Chin-Yew and Franz Josef Och. 2004. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, July 21 - 26, 2004.</ref> based statistics. The longest common subsequence naturally takes sentence-level structure similarity into account and automatically identifies the longest in-sequence co-occurring n-grams.
* ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes.
* ROUGE-S: Skip-bigram based co-occurrence statistics. A skip-bigram is any pair of words in their sentence order.
* ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
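
The overlap counts behind ROUGE-N, ROUGE-L and ROUGE-S can be illustrated with a short Python sketch. This is not the ROUGE package itself: the function names are hypothetical, and the sketch assumes whitespace tokenization, a single reference, no stemming or stop-word removal, sentence-level LCS, skip-bigrams with no gap limit, and a balanced F1 score. ROUGE-W and ROUGE-SU are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def skip_bigrams(tokens):
    """Return a multiset of all in-order word pairs (assumption: any gap allowed)."""
    return Counter(
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens))
    )


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def prf(matches, candidate_total, reference_total):
    """Turn a match count into (precision, recall, F1), each in [0, 1]."""
    precision = matches / candidate_total if candidate_total else 0.0
    recall = matches / reference_total if reference_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def overlap_score(cand_units, ref_units):
    """Score two Counters by clipped overlap (multiset intersection)."""
    matches = sum((cand_units & ref_units).values())
    return prf(matches, sum(cand_units.values()), sum(ref_units.values()))


def rouge_n(candidate, reference, n):
    """N-gram overlap between a candidate and one reference (assumed setup)."""
    return overlap_score(ngrams(candidate.split(), n), ngrams(reference.split(), n))


def rouge_l(candidate, reference):
    """LCS-based score, treating each text as a single sentence."""
    c, r = candidate.split(), reference.split()
    return prf(lcs_length(c, r), len(c), len(r))


def rouge_s(candidate, reference):
    """Skip-bigram overlap between a candidate and one reference."""
    return overlap_score(skip_bigrams(candidate.split()), skip_bigrams(reference.split()))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat is on the mat"
    for name, score in [
        ("ROUGE-1", rouge_n(cand, ref, 1)),
        ("ROUGE-2", rouge_n(cand, ref, 2)),
        ("ROUGE-L", rouge_l(cand, ref)),
        ("ROUGE-S", rouge_s(cand, ref)),
    ]:
        print(name, [round(v, 3) for v in score])
```

For the example pair, five of the six reference unigrams are matched (recall 5/6), three of the five reference bigrams are matched (recall 3/5), the longest common subsequence is "the cat on the mat" (recall 5/6), and ten of the fifteen reference skip-bigrams are matched (recall 10/15). Because both texts have the same length, precision equals recall in each case.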

References

{{Reflist}}

External links

* ROUGE Usage Tutorial
* Java Implementation of ROUGE
