Title

Query Translation for CLIR: EWC vs. Google Translate

Author(s)

KLYUEV Vitaly (2), HARALAMBOUS Yannis (1)

Document type

Conference paper with proceedings

Source

ICIST 2012: IEEE International Conference on Information Science and Technology, IEEE, 23-25 March 2012, Wuhan, China, 2012, pp. 707-711, ISBN 978-16184-0344-7

Year

2012

Abstract

A new approach to finding accurate translations of search
engine queries from Japanese into English for the cross-language
information retrieval (CLIR) task is proposed. The MeCab system
and the online dictionary SPACEALC are used to segment Japanese
queries and to obtain all possible English senses for every term
detected. To disambiguate terms, the shortest path on a directed
graph is computed: nodes of this graph represent word senses, and
edges connect nodes representing neighboring Japanese terms.
The EWC semantic relatedness measure is used to select the
most related meanings for the translation results. This measure
combines the Wikipedia-based Explicit Semantic Analysis
measure, the WordNet path measure and the mixed collocation
index. The proposed technique is tested on the NTCIR data
collection. Queries generated by Google Translate were used as
a point of comparison to evaluate the quality of translation.
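
The shortest-path disambiguation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each Japanese term contributes one layer of candidate English senses, that edges link senses of neighboring terms only, and that the cost of an edge is 1 minus a relatedness score in [0, 1]. The names disambiguate, toy_relatedness and TOY_SCORES are hypothetical, and the toy scores stand in for the EWC measure, whose actual formula is not given here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def disambiguate(
    candidates: Sequence[Sequence[str]],
    relatedness: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one sense per term by finding the cheapest path through a layered graph.

    candidates[i] holds the candidate English senses of the i-th query term.
    Edge cost between senses of neighboring terms is assumed to be
    1 - relatedness(a, b), so highly related senses are cheap to connect.
    """
    if not candidates:
        return [], 0.0
    if any(len(senses) == 0 for senses in candidates):
        raise ValueError("every term needs at least one candidate sense")

    # best[s] = (cost, path) of the cheapest path that ends in sense s
    # of the layer processed so far.
    best: Dict[str, Tuple[float, List[str]]] = {s: (0.0, [s]) for s in candidates[0]}

    for senses in candidates[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for s in senses:
            # Extend every path from the previous layer and keep the cheapest.
            cost, path = min(
                ((c + 1.0 - relatedness(p, s), pth) for p, (c, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[s] = (cost, path + [s])
        best = new_best

    cost, path = min(best.values(), key=lambda item: item[0])
    return path, cost


# Hypothetical relatedness scores used only for this illustration.
TOY_SCORES = {
    frozenset(("bank", "interest")): 0.9,
    frozenset(("interest", "rate")): 0.8,
    frozenset(("shore", "curiosity")): 0.2,
    frozenset(("curiosity", "speed")): 0.1,
}


def toy_relatedness(a: str, b: str) -> float:
    """Symmetric lookup; unknown pairs are treated as unrelated (score 0)."""
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


senses_per_term = [["bank", "shore"], ["interest", "curiosity"], ["rate", "speed"]]
path, cost = disambiguate(senses_per_term, toy_relatedness)
print(path, round(cost, 3))
```

Because edges only join neighboring layers, the graph is acyclic and a single left-to-right pass is enough to find the shortest path; a general algorithm such as Dijkstra's would return the same result on this graph.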

Labs

1: INFO - Computer Science Dept. (Institut Mines-Télécom-Télécom Bretagne-UEB)
2: University of Aizu

Reference

12078
