Machine Translation Challenges
Khalil Simaan, University of Amsterdam
In this talk I will briefly review the current state of affairs in statistical machine translation and outline some of the outstanding research themes. In contrast to the early compositional approaches to translation, the current models embody the idea that idiomaticity is the rule rather than the exception. These models employ probabilistic dictionaries consisting of blocks of words and their translations, acquired from large word-aligned parallel corpora. I will review the aspects these models share and pinpoint some of the major challenges. In particular, I will sketch the recent surge of syntax-augmented approaches to language modeling, as well as to handling the divergence between languages in word order and morphology. In conclusion, I propose that, in order to break through the performance barrier, the challenge will inevitably be a reformulation of an old one: how does idiomatic translation gracefully degenerate into a statistical interpretation of the compositional approach?
