Reading List -- Evaluation
Here are a couple of references to read on evaluation.
The two EAGLES reports are about NLP evaluation in general. Read about the evaluation framework and requirements analysis in either of them. In my opinion, the '96 report is more coherent, but the '98 report contains some new material and minor changes. Take a look at the case studies that are relevant for machine translation, and read the section "Toward Finely Differentiated Evaluation Metrics for Machine Translation" in the '98 report. It gives an introduction to the ISLE MT Evaluation Taxonomy, which you should also have a look at.
Alshawi et al. 1998 describes evaluation with an edit distance measure using
sclite of SCTK.
Papineni et al. 2001 describes evaluation with an n-gram occurrence measure using
mteval of MTEVAL-KIT.
Nießen et al. 2000 describes semi-automatic evaluation with humans in the loop using
The slides from the lecture.
The other references are optional, but they might be useful as reference starting points for your assignment and course project.
- The EAGLES MT Evaluation Working Group. 1996. EAGLES Evaluation of Natural Language Processing Systems. Final Report. EAGLES Document EAG-EWG-PR.2, ISBN 87-90708-00-8. Center for Sprogteknologi, Copenhagen. (ps.gz or pdf.gz)
- The EAGLES MT Evaluation Working Group. 1998. EAGLES Evaluation of Natural Language Processing Systems. Draft Final Report. EAGLES Document. Center for Sprogteknologi, Copenhagen. (ps.gz)
- ISLE/EWG (International Standards for Language Engineering, MT Evaluation Working Group). 2002. Taxonomy for MT Evaluation
- Alshawi et al. 1998. Automatic Acquisition of Hierarchical Transduction Models for Machine Translation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL'98), pp. 41-47, Montreal, Canada (pdf)
- Papineni et al. 2001. BLEU: a Method for Automatic Evaluation of Machine Translation. Technical Report RC22176 (W0109-022). IBM Research Division, T. J. Watson Research Center (pdf)
- Nießen et al. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC'00), pp. 39-45, Athens, Greece (ps)
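
To get a feel for the two automatic measures mentioned above before reading the papers, here is a minimal Python sketch. It is not the implementation found in sclite or mteval, and it is not the exact measure defined in either paper. It assumes whitespace tokenization, a single reference translation, and unit costs for insertions, deletions and substitutions. The function names word_edit_distance and clipped_ngram_precision are made up for this illustration.

```python
from collections import Counter


def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens.

    Assumption: insertions, deletions and substitutions all cost 1.
    """
    h, r = hyp.split(), ref.split()
    # prev[j] = distance between the first i-1 hypothesis words
    # and the first j reference words.
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def clipped_ngram_precision(hyp, ref, n):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it occurs in the
    reference (clipping). Assumption: one reference only.
    """
    h, r = hyp.split(), ref.split()
    hyp_counts = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
    ref_counts = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    hyp = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print(word_edit_distance(hyp, ref))
    print(clipped_ngram_precision(hyp, ref, 1))
    print(clipped_ngram_precision(hyp, ref, 2))
```

Dividing the edit distance by the reference length gives a word error rate. BLEU goes further than the precision shown here: it combines clipped precisions for several n-gram orders and adds a brevity penalty, so treat this only as a starting point.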