Title:

Measuring User Acceptability of Machine Translation to Diagnose System Errors: An Experience Report

Abstract:

Conventional approaches to measuring machine translation quality compare the accuracy of system output without clearly specifying what "accuracy" entails. Many current evaluation methods also demand a substantial time commitment from expert human evaluators. Moreover, these methods give no direct feedback on the user acceptability of a system, nor do they point researchers and developers toward areas that need attention. In this work, we explore an output inspection method that measures user acceptance and probes system errors, so that developers and researchers walk away knowing what users found acceptable and what to improve. We describe the evaluation framework for machine translation and present experimental results for two systems; the results are encouraging. We also discuss how to identify the translation quality factors that matter most to users, report a pilot study applying this evaluation to text summarization, and outline ideas for using the gathered data to build user profiles.

Keywords:

comparative evaluation, user acceptance, machine translation