Dissertation Defense of Felipe Soares

Event Details


Student: Felipe Soares
Advisor: Prof. Dr. Karin Becker

Title: Machine Translation for the biomedical domain – corpora acquisition and translation experiments
Research Area: Data Mining, Integration, and Analysis

Date: June 12, 2019
Time: 9:00 a.m.
Location: Building 43412, Room 218, Instituto de Informática, UFRGS

Examining Committee:
Prof. Dr. Helena de Medeiros Caseli (UFSCAR, by videoconference)
Prof. Dr. Joel Luis Carbonera (UFRGS)
Prof. Dr. Leandro Krug Wives (UFRGS)

Committee Chair: Prof. Dr. Karin Becker

Abstract: The availability of biomedical documents in more than one language (i.e., not just in English) can broaden access to information and help patients and practitioners keep up to date with recent advances in biomedicine. In this work, we are interested in using machine translation to translate Spanish and Portuguese biomedical scientific texts into English, and vice versa. We also present the development of three parallel corpora of scientific texts in the biomedical domain in English, Portuguese, and Spanish. The corpora we developed are larger than those previously available for this domain and these languages. For the translation experiments, we created our training data by concatenating several parallel corpora, from both in-domain and out-of-domain sources, along with terminological resources from UMLS. We validated our approaches by participating in the biomedical translation track of the WMT conference. Our systems are based on statistical machine translation and neural machine translation, using the Moses and OpenNMT toolkits, respectively. We participated in four translation directions for the English/Spanish and English/Portuguese language pairs. Our systems achieved the best BLEU scores according to the official shared task evaluation.

Keywords: Scientific Texts, Biomedical Domain, Corpora Acquisition, Statistical Machine Translation, Neural Machine Translation