-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu
et al. (3284 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its strong coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding, and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal, and reasoning capabilities unlocks new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs. cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 22 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
ONERA's CRM WBPN database for machine learning activities, related regression challenge and first results
Authors:
Jacques Peter,
Quentin Bennehard,
Sébastien Heib,
Jean-Luc Hantrais-Gervois,
Frédéric Moëns
Abstract:
This paper presents a new Computational Fluid Dynamics database, developed at ONERA, to support the advancement of machine learning techniques for aerodynamic field prediction. It contains 468 Reynolds-Averaged Navier-Stokes simulations using the Spalart-Allmaras turbulence model, performed on the NASA/Boeing Common Research Model wing-body-pylon-nacelle configuration. The database spans a wide range of flow conditions, varying Mach number (including transonic regimes), angle of attack (capturing flow separation), and Reynolds number (based on three stagnation pressures, with one setting matching wind tunnel experiments). The quality of the database is assessed by checking the convergence level of each computation.
Based on these data, a regression challenge is defined. It consists of predicting the wall distributions of pressure and friction coefficients for unseen aerodynamic conditions. The 468 simulations are split into training and testing sets, with the training data made available publicly on the Codabench platform. The paper further evaluates several classical machine learning regressors on this task. Tested pointwise methods include Multi-Layer Perceptrons, $λ$-DNNs, and Decision Trees, while global methods include Multi-Layer Perceptrons, k-Nearest Neighbors, Proper Orthogonal Decomposition, and IsoMap. Initial performance results, using $R^2$ scores and worst relative mean absolute error metrics, are presented, offering insights into the capabilities of these techniques for the challenge and reference results for future work.
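As a rough illustration of the pointwise setup, the sketch below fits a small MLP to predict a wall coefficient from condition-and-location features and reports an $R^2$ score. The feature layout and placeholder data are assumptions for illustration, not the challenge's actual format.

```python
# Minimal sketch of a pointwise regressor for the challenge. X would hold
# (flow condition, wall location) features such as (Mach, angle of attack,
# stagnation pressure, x/c, y/c) and y the corresponding wall coefficient
# (e.g. Cp). Random placeholders stand in for the real database.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))   # placeholder features
y = rng.normal(size=5000)        # placeholder Cp values

# The real challenge splits by aerodynamic condition; a plain split here.
split = 4000
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
print("R^2:", r2_score(y[split:], pred))
```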
Submitted 5 May, 2025;
originally announced May 2025.
-
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation
Authors:
Gergely Flamich,
David Vilar,
Jan-Thorsten Peter,
Markus Freitag
Abstract:
The goal of translation, be it by human or by machine, is, given some text in a source language, to produce text in a target language that simultaneously 1) preserves the meaning of the source text and 2) achieves natural expression in the target language. However, researchers in the machine translation community usually assess translations using a single score intended to capture semantic accuracy and the naturalness of the output simultaneously. In this paper, we build on recent advances in information theory to mathematically prove and empirically demonstrate that such single-score summaries do not and cannot give the complete picture of a system's true performance. Concretely, we prove that a tradeoff exists between accuracy and naturalness and demonstrate it by evaluating the submissions to the WMT24 shared task. Our findings help explain well-known empirical phenomena, such as the observation that optimizing translation systems for a specific accuracy metric (like BLEU) initially improves the system's naturalness, while "overfitting" the system to the metric can significantly degrade its naturalness. Thus, we advocate for a change in how translations are evaluated: rather than comparing systems using a single number, they should be compared on an accuracy-naturalness plane.
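The sketch below illustrates the advocated evaluation: placing systems on an accuracy-naturalness plane rather than ranking them by one number. The system names and scores are made up, and the choice of metric for each axis is an assumption, not the paper's protocol.

```python
# Compare MT systems on an accuracy-naturalness plane instead of one score.
# Accuracy might come from a semantic-fidelity metric and naturalness from,
# e.g., a target-language fluency score; both choices are assumptions here.
import matplotlib.pyplot as plt

systems = {"sys-A": (0.82, 0.61), "sys-B": (0.78, 0.74), "sys-C": (0.70, 0.80)}

for name, (accuracy, naturalness) in systems.items():
    plt.scatter(accuracy, naturalness)
    plt.annotate(name, (accuracy, naturalness))

plt.xlabel("accuracy (semantic fidelity)")
plt.ylabel("naturalness (target-language fluency)")
plt.title("Systems on the accuracy-naturalness plane")
plt.show()
```

No single system need dominate on both axes at once, which is exactly the tradeoff the paper proves.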
Submitted 1 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin
et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages, and longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
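A back-of-the-envelope sketch of the KV-cache argument: local attention layers cache keys/values only within a short sliding window, while global layers cache the full context, so raising the local:global ratio shrinks the cache roughly in proportion. The layer count, window size, and interleaving pattern below are illustrative assumptions, not Gemma 3's actual configuration.

```python
# Count cached (key, value) positions per head across all layers for two
# hypothetical stacks: all-global attention vs. mostly-local attention.

def kv_cache_entries(num_layers, local_per_global, context_len, window):
    """Total cached positions; local layers cache at most `window` tokens."""
    total = 0
    for layer in range(num_layers):
        # e.g. local_per_global = 5 -> pattern L L L L L G L L L L L G ...
        is_global = (layer + 1) % (local_per_global + 1) == 0
        total += context_len if is_global else min(window, context_len)
    return total

full_global = kv_cache_entries(32, 0, 128_000, 1024)   # every layer global
mostly_local = kv_cache_entries(32, 5, 128_000, 1024)  # 5 local : 1 global
print(f"relative KV cache size: {mostly_local / full_global:.2%}")
```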
Submitted 25 March, 2025;
originally announced March 2025.
-
Persistent Homology-induced Graph Ensembles for Time Series Regressions
Authors:
Viet The Nguyen,
Duy Anh Pham,
An Thai Le,
Jans Peter,
Gunther Gust
Abstract:
The effectiveness of Spatio-temporal Graph Neural Networks (STGNNs) in time-series applications is often limited by their dependence on fixed, hand-crafted input graph structures. Motivated by insights from the Topological Data Analysis (TDA) paradigm, namely that real-world data exhibits multi-scale patterns, we construct several graphs using Persistent Homology Filtration -- a mathematical framework describing the multiscale structural properties of data points. Then, we use the constructed graphs as inputs to an ensemble of Graph Neural Networks. The ensemble aggregates the signals from the individual learners via an attention-based routing mechanism, thus systematically encoding the inherent multiscale structures of the data. Four different real-world experiments on seismic activity prediction and traffic forecasting (PEMS-BAY, METR-LA) demonstrate that our approach consistently outperforms single-graph baselines while providing interpretable insights.
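A conceptual sketch of the multi-scale recipe follows, with a distance-threshold sweep standing in for the persistent homology filtration, a stubbed one-step propagation in place of a trained GNN, and random placeholders for the learned attention routing weights.

```python
# Build one graph per filtration scale, run a stub learner on each, and
# combine the outputs with softmax attention weights. Everything here is an
# assumption about the general recipe, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))   # e.g. sensor locations
dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

# One graph per scale: connect nodes closer than epsilon.
scales = [0.5, 1.0, 2.0]
graphs = [(dists < eps).astype(float) for eps in scales]

def gnn_forecast(adj, signal):
    # Stub for an ensemble member: one step of mean-aggregation propagation.
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    return (adj @ signal) / deg

signal = rng.normal(size=(20, 1))
outputs = np.stack([gnn_forecast(a, signal) for a in graphs])

# Attention-based routing: softmax over per-graph scores (random stand-ins).
scores = rng.normal(size=len(graphs))
weights = np.exp(scores) / np.exp(scores).sum()
ensemble = np.tensordot(weights, outputs, axes=1)
print(ensemble.shape)   # (20, 1): one aggregated prediction per node
```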
Submitted 19 March, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Smart Feeding Station: Non-Invasive, Automated IoT Monitoring of Goodman's Mouse Lemurs in a Semi-Natural Rainforest Habitat
Authors:
Jonas Peter,
Victor Luder,
Leyla Rivero Davis,
Lukas Schulthess,
Michele Magno
Abstract:
In recent years, zoological institutions have made significant strides to reimagine ex situ animal habitats, moving away from traditional single-species enclosures towards expansive multi-species environments that more closely resemble semi-natural ecosystems. This paradigm shift, driven by a commitment to animal welfare, encourages a broader range of natural behaviors through abiotic and biotic interactions. This laudable progression nonetheless introduces challenges for population monitoring, adapting daily animal care, and automating data collection for long-term research studies. This paper presents an IoT-enabled wireless smart feeding station tailored to Goodman's mouse lemurs (Microcebus lehilahytsara). The system design integrates a precise Radio Frequency Identification (RFID) reader to identify each animal's implanted RFID chip while simultaneously recording body weight and visit duration. Leveraging sophisticated electronic controls, the station can selectively activate a trapping mechanism for individuals with specific tags when needed. Collected data and events such as a successful capture are forwarded over the Long Range Wide Area Network (LoRaWAN) to a web server and provided to the animal caretakers. To validate functionality and reliability under the harsh conditions of a tropical climate, the feeding station was tested in the semi-natural Masoala rainforest biome at Zoo Zurich over two months. The station detected an animal's RFID chip on visits to the box with 98.68% reliability, achieved a LoRaWAN transmission reliability of 97.99%, and kept the deviation in weighing accuracy below 0.41 g. Beyond its immediate application, this system addresses the challenges of automated population monitoring, advancing minimally intrusive animal care and research on species behavior and ecology.
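A hedged sketch of the station's control flow as described above: detect a tag, record weight and visit duration, trigger the trap only for selected individuals, and uplink events over LoRaWAN. All function names and the trap list are hypothetical placeholders; the actual embedded firmware is not shown in the abstract.

```python
# Hypothetical event loop for the feeding station; hardware calls are stubs.
import time

TRAP_LIST = {"A1B2C3"}  # tags selected for capture (hypothetical)

def read_rfid(): ...          # would return a tag ID string or None
def read_weight(): ...        # would return grams from the scale
def close_trap(): ...         # would actuate the trapping mechanism
def lorawan_send(event): ...  # would uplink an event dict to the web server

def feeding_station_loop():
    while True:
        tag = read_rfid()
        if tag is None:
            time.sleep(0.1)
            continue
        visit_start = time.time()
        weight = read_weight()
        if tag in TRAP_LIST:
            close_trap()
            lorawan_send({"event": "capture", "tag": tag})
        lorawan_send({"event": "visit", "tag": tag, "weight_g": weight,
                      "duration_s": time.time() - visit_start})
```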
Submitted 12 March, 2025;
originally announced March 2025.
-
There's no Data Like Better Data: Using QE Metrics for MT Data Filtering
Authors:
Jan-Thorsten Peter,
David Vilar,
Daniel Deutsch,
Mara Finkelstein,
Juraj Juraska,
Markus Freitag
Abstract:
Quality Estimation (QE), the evaluation of machine translation output without the need for explicit references, has seen major improvements in recent years with the use of neural metrics. In this paper we analyze the viability of using QE metrics for filtering out bad-quality sentence pairs in the training data of neural machine translation (NMT) systems. While most corpus filtering methods are focused on detecting noisy examples in collections of texts, usually huge amounts of web-crawled data, QE models are trained to discriminate more fine-grained quality differences. We show that by selecting the highest-quality sentence pairs in the training data, we can improve translation quality while reducing the training size by half. We also provide a detailed analysis of the filtering results, which highlights the differences between the two approaches.
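In outline, the procedure reduces to scoring every training pair with a reference-free QE metric and keeping the top-scoring fraction (half, in the paper's main result). The sketch below uses hard-coded toy scores as a stand-in for a neural QE model; the scoring function is a hypothetical placeholder, not one of the paper's metrics.

```python
# QE-based corpus filtering: score (source, target) pairs, keep the best half.

def qe_score(source: str, target: str) -> float:
    # Placeholder: a real system would run a reference-free neural QE model.
    toy_scores = {("Hallo Welt", "Hello world"): 0.95,
                  ("Guten Morgen", "Good morning"): 0.90,
                  ("Wie geht's?", "How are you?"): 0.88,
                  ("Ein Satz", "asdf qwerty"): 0.10}
    return toy_scores.get((source, target), 0.0)

def filter_by_qe(pairs, keep_ratio=0.5):
    """Keep the highest-scoring fraction of sentence pairs."""
    scored = sorted(pairs, key=lambda p: qe_score(*p), reverse=True)
    return scored[: int(len(scored) * keep_ratio)]

corpus = [("Hallo Welt", "Hello world"), ("Guten Morgen", "Good morning"),
          ("Wie geht's?", "How are you?"), ("Ein Satz", "asdf qwerty")]
print(filter_by_qe(corpus))   # the noisy pair is dropped
```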
Submitted 9 November, 2023;
originally announced November 2023.
-
Learned-SBL: A Deep Learning Architecture for Sparse Signal Recovery
Authors:
Rubin Jose Peter,
Chandra R. Murthy
Abstract:
In this paper, we present a computationally efficient sparse signal recovery scheme using Deep Neural Networks (DNN). The architecture of the introduced neural network is inspired by sparse Bayesian learning (SBL) and is named Learned-SBL (L-SBL). We design a common architecture to recover sparse as well as block-sparse vectors from a single measurement vector (SMV) or multiple measurement vectors (MMV), depending on the nature of the training data. In the MMV model, the L-SBL network can be trained to learn any underlying sparsity pattern among the vectors, including joint sparsity, block sparsity, etc. In particular, for block-sparse recovery, Learned-SBL does not require any prior knowledge of block boundaries. In each layer of the L-SBL, an estimate of the signal covariance matrix is obtained as the output of a neural network. Then a maximum a posteriori (MAP) estimator of the unknown sparse vector is implemented with non-trainable parameters. In many applications, the measurement matrix may be time-varying. Existing DNN-based sparse signal recovery schemes demand retraining of the neural network with the current measurement matrix. The architecture of L-SBL allows it to accept the measurement matrix as an input to the network, thereby avoiding the need for retraining. We also evaluate the performance of Learned-SBL in the detection of an extended target using a multiple-input multiple-output (MIMO) radar. Simulation results illustrate that the proposed approach offers superior sparse recovery performance compared to the state-of-the-art methods.
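A minimal sketch of the per-layer computation as the abstract describes it: given a covariance estimate $Γ$ (which the trainable subnetwork would produce), the sparse vector follows from the closed-form Gaussian MAP step $\hat{x} = Γ A^T (σ^2 I + A Γ A^T)^{-1} y$. The fixed diagonal covariance and problem sizes below are assumptions for illustration.

```python
# One L-SBL-style layer: non-trainable MAP step given a covariance estimate.
import numpy as np

def lsbl_layer(A, y, gamma_diag, noise_var=0.01):
    """MAP estimate x_hat = Gamma A^T (sigma^2 I + A Gamma A^T)^{-1} y."""
    m = A.shape[0]
    Gamma = np.diag(gamma_diag)   # would be produced by a neural network
    S = noise_var * np.eye(m) + A @ Gamma @ A.T
    return Gamma @ A.T @ np.linalg.solve(S, y)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))      # measurement matrix, passed in as input
x_true = np.zeros(50)
x_true[[3, 17, 41]] = 1.0          # sparse ground truth
y = A @ x_true + 0.01 * rng.normal(size=20)

x_hat = lsbl_layer(A, y, gamma_diag=np.ones(50))
print(np.argsort(-np.abs(x_hat))[:3])   # largest-magnitude entries: candidate support
```

Because the measurement matrix A enters the layer as data rather than as baked-in weights, a new A does not require retraining, which is the point the abstract emphasizes.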
Submitted 17 September, 2019;
originally announced September 2019.
-
Local System Voting Feature for Machine Translation System Combination
Authors:
Markus Freitag,
Jan-Thorsten Peter,
Stephan Peitz,
Minwei Feng,
Hermann Ney
Abstract:
In this paper, we enhance the traditional confusion network system combination approach with an additional model trained by a neural network. This work is motivated by the fact that the commonly used binary system voting models only assign each input system a global weight, which is responsible for the global impact of each input system on all translations. This prevents individual systems with low system weights from having influence on the system combination output, although in some situations this could be helpful. Further, words which have only been seen by one or a few systems rarely have a chance of being present in the combined output. We train a local system voting model, realized as a neural network, that is based on the words themselves and the combinatorial occurrences of the different system outputs. This gives system combination the option to prefer other systems at different word positions, even for the same sentence.
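A toy sketch of the local voting intuition: at a given confusion-network slot, each candidate word is scored from binary indicators of which systems produced it, so a system with a low global weight can still win at individual positions. The weights and features below are made-up stand-ins for the trained neural model.

```python
# Score candidate words at one confusion-network slot from per-system features.
import numpy as np

# Candidate word -> which of 3 systems proposed it (binary features).
slot = {"house": np.array([1.0, 1.0, 0.0]),
        "home":  np.array([0.0, 0.0, 1.0])}

w = np.array([0.2, 0.3, 0.9])   # stand-in for learned per-system reliability

best = max(slot, key=lambda word: float(w @ slot[word]))
print(best)   # "home": a single reliable system can win this position locally
```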
Submitted 9 February, 2017;
originally announced February 2017.
-
Guided Alignment Training for Topic-Aware Neural Machine Translation
Authors:
Wenhu Chen,
Evgeny Matusov,
Shahram Khadivi,
Jan-Thorsten Peter
Abstract:
In this paper, we propose an effective way to bias the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.
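One natural way to realize guided alignment training is an auxiliary loss that pulls the decoder's attention distribution towards the statistical word alignment; the cross-entropy form sketched below is an assumption, and the paper's exact loss may differ.

```python
# Auxiliary loss: cross-entropy between attention weights and a reference
# alignment (row-normalized), averaged over target positions.
import numpy as np

def guided_alignment_loss(attention, alignment, eps=1e-9):
    """attention, alignment: (target_len, source_len); attention rows sum to 1."""
    ref = alignment / alignment.sum(axis=1, keepdims=True).clip(min=eps)
    return float(-(ref * np.log(attention + eps)).sum() / attention.shape[0])

attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])
align = np.array([[1.0, 0.0, 0.0],    # e.g. from a statistical word aligner
                  [0.0, 1.0, 0.0]])
print(guided_alignment_loss(attn, align))
# Training would combine this term with the usual NMT cross-entropy loss.
```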
Submitted 6 July, 2016;
originally announced July 2016.
-
A Novel Algorithm for Informative Meta Similarity Clusters Using Minimum Spanning Tree
Authors:
S. John Peter,
S. P. Victor
Abstract:
The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two minimum spanning tree-based clustering algorithms. The first algorithm produces k clusters with centers and guaranteed intra-cluster similarity. The radius and diameter of the k clusters are computed to find their tightness. The variance of the k clusters is also computed to find their compactness. The second algorithm creates a dendrogram using the k clusters as objects with guaranteed inter-cluster similarity. The algorithm also finds the central cluster among the k clusters. The first algorithm uses a divisive approach, whereas the second uses an agglomerative approach. In this paper we use both approaches to find Informative Meta similarity clusters.
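A compact sketch of the classic MST step underlying the divisive algorithm: build a minimum spanning tree over the points and cut the k-1 heaviest edges, so the k remaining components become the clusters on which measures such as radius, diameter, and variance can then be computed. The synthetic data and use of SciPy are illustrative choices, not the paper's setup.

```python
# MST clustering: cut the k-1 heaviest tree edges to obtain k clusters.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (10, 2)),    # two well-separated blobs
                    rng.normal(3, 0.3, (10, 2))])

mst = minimum_spanning_tree(squareform(pdist(points))).toarray()

k = 2
for _ in range(k - 1):
    # Remove the heaviest remaining MST edge (zero weight = no edge).
    i, j = np.unravel_index(np.argmax(mst), mst.shape)
    mst[i, j] = 0.0

n_comp, labels = connected_components(mst, directed=False)
print(n_comp, labels)   # 2 components; labels assign each point to a cluster
```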
Submitted 6 May, 2010;
originally announced May 2010.