Reading List - Evaluation

Here are a couple of references to read on evaluation.

The two EAGLES reports are about NLP evaluation in general. Read about the evaluation framework and requirements analysis in either of them. In my opinion, the '96 report is more coherent, but the '98 report contains some new material and minor changes. Take a look at the case studies that are relevant for machine translation, and read the section "Toward Finely Differentiated Evaluation Metrics for Machine Translation" in the '98 report. It introduces the ISLE MT Evaluation Taxonomy, which you should also have a look at.

Alshawi et al. 1998 describes evaluation with an edit distance measure using sclite of SCTK.
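
To make the idea of an edit distance measure concrete, here is a minimal sketch of a word-level Levenshtein distance and the word error rate derived from it. It assumes whitespace tokenization and unit costs for insertions, deletions and substitutions; the function names are made up for illustration, and this is not how sclite aligns or scores, so expect its numbers to differ.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (unit costs, an assumption)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] holds the distance between the hypothesis words seen so far
    # (before the current one) and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # i deletions turn i hypothesis words into nothing
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop h from the hypothesis
                            curr[j - 1] + 1,      # add the missing word r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def word_error_rate(hypothesis, reference):
    """Edit distance normalized by the reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / ref_len
```

For example, word_edit_distance("the cat sat on the mat", "the cat sat on a mat") counts one substitution.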

Papineni et al. 2001 describes evaluation with an n-gram occurrence measure using mteval of MTEVAL-KIT.
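
As a rough illustration of an n-gram occurrence measure, the sketch below computes a clipped n-gram precision for one hypothesis against one or more references. It assumes whitespace tokenization, and the function names are hypothetical. It covers only a single n-gram order; it does not combine orders or penalize short output, and it does not reproduce the mteval script.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_ngram_precision(hypothesis, references, n):
    """Fraction of hypothesis n-grams that also occur in a reference.

    Each n-gram is credited at most as often as it appears in any
    single reference (the clipping step).
    """
    hyp_counts = ngrams(hypothesis.split(), n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0  # hypothesis shorter than n words

    # Largest count of each n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)

    matched = sum(min(count, max_ref[gram])
                  for gram, count in hyp_counts.items())
    return matched / total
```

Clipping matters for degenerate output: a hypothesis that repeats "the" seven times gets credit for only two of them if no reference contains "the" more than twice.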

Nießen et al. 2000 describes semi-automatic evaluation with humans in the loop using EvalTrans.

The other references are optional, but they may be useful starting points for your assignment and course project.

  • ISLE/EWG (International Standards for Language Engineering, MT Evaluation Working Group). 2002. Taxonomy for MT Evaluation
  • Other References