
Showing 1–12 of 12 results for author: Vilar, D

Searching in archive cs.
  1. arXiv:2503.24013  [pdf, other]

    cs.CL

    You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation

    Authors: Gergely Flamich, David Vilar, Jan-Thorsten Peter, Markus Freitag

    Abstract: The goal of translation, be it by human or by machine, is, given some text in a source language, to produce text in a target language that simultaneously 1) preserves the meaning of the source text and 2) achieves natural expression in the target language. However, researchers in the machine translation community usually assess translations using a single score intended to capture semantic accurac…

    Submitted 1 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: Corrected a typo in Eq (3)

  2. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  3. arXiv:2408.06537  [pdf, other]

    cs.CL

    Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data

    Authors: Mara Finkelstein, David Vilar, Markus Freitag

    Abstract: Recent research in neural machine translation (NMT) has shown that training on high-quality machine-generated data can outperform training on human-generated data. This work accompanies the first-ever release of a LLM-generated, MBR-decoded and QE-reranked dataset with both sentence-level and multi-sentence examples. We perform extensive experiments to demonstrate the quality of our dataset in ter…

    Submitted 22 November, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  4. arXiv:2406.02832  [pdf, other]

    cs.CL cs.LG

    Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms

    Authors: Firas Trabelsi, David Vilar, Mara Finkelstein, Markus Freitag

    Abstract: Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks, but its quadratic computational complexity limits its practical application. This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on the task of machine translation. We formulate MBR decoding as a matrix completion problem, where the u…

    Submitted 4 June, 2024; originally announced June 2024.
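
    The paper's starting point is standard sample-based MBR decoding, whose cost is quadratic in the number of candidates because every pair of hypotheses must be scored against each other. Below is a minimal sketch of that exact baseline, with a toy unigram-F1 utility standing in for a learned metric; the matrix-completion approximation proposed in the paper would instead fill in only a subset of the utility matrix's entries.

```python
# Minimal sketch of exact sample-based MBR decoding (the quadratic baseline
# the paper speeds up). The utility function is a toy unigram F1; a real
# setup would use a neural metric and only compute part of the n x n
# utility matrix, recovering the rest via matrix completion.
def toy_utility(hyp: str, ref: str) -> float:
    h, r = hyp.split(), ref.split()
    common = len(set(h) & set(r))
    if common == 0:
        return 0.0
    prec, rec = common / len(h), common / len(r)
    return 2 * prec * rec / (prec + rec)

def mbr_decode(candidates: list[str]) -> str:
    # Pick the candidate with the highest average utility against all other
    # candidates, which act as pseudo-references sampled from the model.
    n = len(candidates)
    expected = [sum(toy_utility(c, other) for other in candidates) / n
                for c in candidates]
    return candidates[max(range(n), key=expected.__getitem__)]

print(mbr_decode(["the cat sat down", "a cat sat down", "the dog ran away"]))
```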

  5. arXiv:2311.05350  [pdf, other]

    cs.CL

    There's no Data Like Better Data: Using QE Metrics for MT Data Filtering

    Authors: Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Markus Freitag

    Abstract: Quality Estimation (QE), the evaluation of machine translation output without the need of explicit references, has seen big improvements in the last years with the use of neural metrics. In this paper we analyze the viability of using QE metrics for filtering out bad quality sentence pairs in the training data of neural machine translation systems (NMT). While most corpus filtering methods are foc…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: to be published at WMT23
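
    As a rough illustration of the filtering recipe described in the abstract, the sketch below keeps only sentence pairs whose reference-free quality score clears a threshold. The `qe_score` placeholder and the threshold value are assumptions for demonstration; the paper uses neural QE metrics rather than the length-ratio heuristic shown here.

```python
# Hypothetical corpus-filtering loop: score each (source, translation) pair
# with a reference-free QE function and keep only pairs above a threshold.
# qe_score is a placeholder heuristic; the paper relies on neural QE metrics.
def qe_score(source: str, translation: str) -> float:
    ratio = len(translation.split()) / max(len(source.split()), 1)
    return max(0.0, 1.0 - abs(1.0 - ratio))        # penalize length mismatch

def filter_corpus(pairs, threshold=0.6):
    return [(s, t) for s, t in pairs if qe_score(s, t) >= threshold]

corpus = [
    ("ein kleiner Test", "a small test"),
    ("Hallo Welt", "this line is clearly not a translation of the source"),
]
print(filter_corpus(corpus))
```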

  6. arXiv:2310.06707  [pdf, other]

    cs.CL cs.AI

    Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model

    Authors: Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Xavier Garcia, Daniel Cremers

    Abstract: Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by deco…

    Submitted 11 July, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)
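
    To make the contrast with MAP decoding concrete, here is a hedged toy comparison: the MAP choice maximizes model log-probability, while a quality-aware choice ranks candidates by a quality estimate (in the paper, produced by the same translation model rather than an external QE system). The scores below are invented for illustration.

```python
# Toy contrast between MAP selection and quality-aware selection over an
# n-best list. All numbers are made up; in the paper the quality estimate
# comes from the translation model itself.
candidates = [
    {"text": "He kicked the bucket.", "logprob": -3.1, "quality": 0.55},
    {"text": "He passed away.",       "logprob": -3.4, "quality": 0.92},
]

map_choice = max(candidates, key=lambda c: c["logprob"])
quality_aware_choice = max(candidates, key=lambda c: c["quality"])
print(map_choice["text"], "vs", quality_aware_choice["text"])
```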

  7. arXiv:2211.09102  [pdf, other]

    cs.CL

    Prompting PaLM for Translation: Assessing Strategies and Performance

    Authors: David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster

    Abstract: Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translat…

    Submitted 25 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: ACL 2023
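
    A minimal sketch of the kind of few-shot prompting studied in the paper is shown below. The template and the fixed example pairs are illustrative assumptions; the paper's point is precisely that the choice of examples and format matters for translation quality.

```python
# Illustrative few-shot translation prompt for a text-completion LLM.
# Template and example pairs are assumptions made for this sketch.
examples = [
    ("Guten Morgen.", "Good morning."),
    ("Wie geht es dir?", "How are you?"),
]

def build_prompt(source: str, src_lang="German", tgt_lang="English") -> str:
    shots = "\n".join(f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in examples)
    return f"{shots}\n{src_lang}: {source}\n{tgt_lang}:"

print(build_prompt("Das Wetter ist heute schön."))
```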

  8. arXiv:2112.03052  [pdf, other]

    cs.LG cs.CL cs.CV

    Scaling Up Influence Functions

    Authors: Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov

    Abstract: We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transfor…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Published at AAAI-22
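
    The abstract's key idea, replacing an expensive inverse-Hessian computation with a Krylov-subspace approximation obtained by Arnoldi iteration, can be sketched on a toy dense matrix as below. The Hessian-vector-product oracle, subspace size, and influence formula are standard but stated here as assumptions; the paper applies the same idea at full Transformer scale.

```python
# Sketch of Arnoldi-based inverse-Hessian-vector products for influence
# functions, on a toy dense "Hessian". Only the dominant eigenpairs found in
# a small Krylov subspace are kept, which is the source of the speed-up.
import numpy as np

def arnoldi(hvp, dim, k, seed=0):
    """k-step Arnoldi iteration with full re-orthogonalization.
    Returns an orthonormal basis Q (dim x k) and the projection T = Q^T H Q."""
    rng = np.random.default_rng(seed)
    Q, T = np.zeros((dim, k)), np.zeros((k, k))
    q0 = rng.normal(size=dim)
    Q[:, 0] = q0 / np.linalg.norm(q0)
    for j in range(k):
        w = hvp(Q[:, j])
        for i in range(j + 1):
            T[i, j] = Q[:, i] @ w
            w = w - T[i, j] * Q[:, i]
        if j + 1 < k:
            T[j + 1, j] = np.linalg.norm(w)
            Q[:, j + 1] = w / T[j + 1, j]
    return Q, T

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 200))
H = A @ A.T / 200 + np.eye(200)                    # toy positive-definite Hessian
Q, T = arnoldi(lambda v: H @ v, dim=200, k=30)

eigvals, eigvecs = np.linalg.eigh((T + T.T) / 2)   # Ritz values / vectors
U = Q @ eigvecs                                    # approximate eigenvectors of H

def inverse_hvp(v):
    """Approximate H^{-1} v within the retained eigenbasis."""
    return U @ ((U.T @ v) / eigvals)

g_test, g_train = rng.normal(size=200), rng.normal(size=200)
print(-g_test @ inverse_hvp(g_train))              # influence of one example on another
```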

  9. arXiv:2110.06997  [pdf, other]

    cs.CL cs.AI

    Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits

    Authors: Julia Kreutzer, David Vilar, Artem Sokolov

    Abstract: Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing contents from multiple domains or different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. In this work, we propose to optimize this balance jointly…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: EMNLP Findings 2021
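
    The balancing idea in the abstract can be pictured with a small EXP3-style bandit that decides which data facet the next batch is drawn from, updating its preferences from a per-step reward. The facet names, reward simulation, and hyperparameters below are assumptions for illustration only.

```python
# Hypothetical EXP3-style sampler over data "facets" (e.g. domains or
# quality buckets). Reward is simulated here; in training it would reflect
# how useful a batch from that facet was for the MT model.
import math, random

facets = ["news", "web-crawl", "paraphrase"]   # illustrative facet names
K, gamma = len(facets), 0.1
weights = [1.0] * K

def facet_probs():
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / K for w in weights]

true_usefulness = [0.8, 0.3, 0.5]              # hidden, to be discovered
random.seed(0)
for _ in range(500):
    probs = facet_probs()
    arm = random.choices(range(K), weights=probs)[0]
    reward = 1.0 if random.random() < true_usefulness[arm] else 0.0
    weights[arm] *= math.exp(gamma * (reward / probs[arm]) / K)   # importance-weighted update

print({f: round(p, 2) for f, p in zip(facets, facet_probs())})
```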

  10. arXiv:2008.04885  [pdf, ps, other]

    cs.CL

    The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020

    Authors: Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield

    Abstract: We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet's Gluon API, a focus on state of the art model architectures, distributed mixed precision training, and efficient CPU decoding with 8-bit quantization. These improvements result in faster training and inference, hig…

    Submitted 11 August, 2020; originally announced August 2020.

  11. arXiv:1804.06609  [pdf, other]

    cs.CL

    Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

    Authors: Matt Post, David Vilar

    Abstract: The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, e…

    Submitted 9 November, 2018; v1 submitted 18 April, 2018; originally announced April 2018.

    Comments: 11 pages, 9 figures, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
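
    A highly simplified picture of the dynamic beam allocation idea is given below: at each step, candidate hypotheses are grouped into banks by how many constraint tokens they have already produced, and the fixed beam is split across those banks so constraint-satisfying hypotheses are not crowded out by higher-scoring unconstrained ones. The candidate scores and the even-split policy shown are illustrative simplifications of the paper's algorithm.

```python
# Simplified sketch of the "banked" beam behind dynamic beam allocation.
# Candidates are (model_score, tokens, constraints_met); the beam budget is
# divided across banks grouped by constraints_met. Scores are made up.
from collections import defaultdict

def allocate_by_bank(candidates, beam_size):
    banks = defaultdict(list)
    for cand in candidates:
        banks[cand[2]].append(cand)                   # group by constraints met
    per_bank = max(1, beam_size // len(banks))        # even split across banks
    beam = []
    for met in sorted(banks, reverse=True):           # favor more-satisfied banks
        take = min(per_bank, beam_size - len(beam))
        beam.extend(sorted(banks[met], reverse=True)[:take])
    return beam

cands = [
    (-0.8, ("a", "dog"), 0),
    (-0.9, ("a", "cat"), 1),
    (-1.2, ("the", "cat"), 1),
    (-1.5, ("the", "black", "cat"), 2),
]
print(allocate_by_bank(cands, beam_size=2))
```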

  12. arXiv:1712.05690  [pdf, other]

    cs.CL cs.LG stat.ML

    Sockeye: A Toolkit for Neural Machine Translation

    Authors: Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

    Abstract: We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attenti…

    Submitted 1 June, 2018; v1 submitted 15 December, 2017; originally announced December 2017.