Semantic textual similarity based on deep learning: Can it improve matching and retrieval for Translation Memory tools?

T. Ranasinghe, R. Mitkov, Constantin Orăsan, Rocío Quintana

    Research output: Chapter in Book/Published conference output › Chapter

    Abstract

    This study proposes an original methodology to underpin the operation of new-generation Translation Memory (TM) systems, in which the translations to be retrieved from the TM database are matched not on the basis of Levenshtein (edit) distance but by employing innovative Natural Language Processing (NLP) and Deep Learning (DL) techniques. We experimented with three DL sentence encoders to retrieve TM matches for English-Spanish sentence pairs from the DGT TM dataset. Each sentence encoder was compared with Okapi, which uses edit distance to retrieve the best match. The automatic evaluation shows the benefit of DL technology for TM matching and holds promise for the implementation of the TM tool itself, which is our next project.
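    The abstract contrasts edit-distance matching with sentence-encoder matching. The sketch below is a minimal illustration under stated assumptions: it ranks TM source segments against a query either by a token-level Levenshtein fuzzy score (a common formulation, not necessarily Okapi's exact formula) or by cosine similarity between vectors produced by a pluggable encoder. The `bag_of_words` encoder is a hypothetical placeholder marking where a DL sentence encoder would plug in; it is not one of the encoders evaluated in the chapter, and the two-entry TM is invented toy data, not DGT TM content.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A TM is modelled here as a list of (source segment, target segment) pairs.
TM = List[Tuple[str, str]]
Vector = Dict[str, float]


def levenshtein(a: List[str], b: List[str]) -> int:
    """Token-level edit distance; insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(query: str, source: str) -> float:
    """Edit-distance similarity in [0, 1]: 1 - distance / length of the longer segment."""
    q, s = query.lower().split(), source.lower().split()
    longest = max(len(q), len(s)) or 1
    return 1.0 - levenshtein(q, s) / longest


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def bag_of_words(text: str) -> Vector:
    """Hypothetical stand-in encoder; a real system would call a DL sentence encoder here."""
    return {tok: float(n) for tok, n in Counter(text.lower().split()).items()}


def semantic_score(encode: Callable[[str], Vector]) -> Callable[[str, str], float]:
    """Build a scorer that compares encoder outputs by cosine similarity."""
    return lambda query, source: cosine(encode(query), encode(source))


def retrieve(query: str, tm: TM, score: Callable[[str, str], float]) -> Tuple[str, str, float]:
    """Return the best-scoring (source, target, score) triple from the TM."""
    best_src, best_tgt = max(tm, key=lambda pair: score(query, pair[0]))
    return best_src, best_tgt, score(query, best_src)


# Invented toy TM and query, for illustration only.
tm: TM = [
    ("The Commission shall adopt the measures.", "La Comisión adoptará las medidas."),
    ("The Council shall publish the report.", "El Consejo publicará el informe."),
]
query = "The Commission shall adopt these measures."

for name, scorer in [("edit distance", fuzzy_score),
                     ("encoder cosine", semantic_score(bag_of_words))]:
    src, tgt, s = retrieve(query, tm, scorer)
    print(f"{name}: {src} -> {tgt} ({s:.2f})")
```

    In this sketch, moving from edit-distance retrieval to embedding-based retrieval only means passing a different scorer to `retrieve`. With a genuine sentence encoder, the cosine score can reward paraphrases that share few surface tokens, which a token-level edit distance penalises.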
    Original language: English
    Title of host publication: Corpora in Translation and Contrastive Research in the Digital Age: Recent advances and explorations
    Editors: Julia Lavid-Lopez, Carmen Maiz-Arevalo, Juan Rafael Zamorano-Mansilla
    Publisher: John Benjamins
    Chapter: 4
    Pages: 101-124
    Number of pages: 23
    ISBN (Electronic): 9789027259684
    ISBN (Print): 9789027209184
    DOIs
    Publication status: Published - 8 Dec 2021

    Bibliographical note

    Copyright © John Benjamins

    Keywords

    • machine translation
    • translation memory
    • deep learning
    • Okapi
    • textual similarity
    • semantic similarity
