-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu,
et al. (3284 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving state-of-the-art performance on frontier coding and reasoning benchmarks. Beyond its strong coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and can now process up to 3 hours of video content. Its combination of long-context, multimodal, and reasoning capabilities unlocks new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency cost, and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability versus cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 22 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
MedGemma Technical Report
Authors:
Andrew Sellergren,
Sahar Kazemzadeh,
Tiam Jaroensri,
Atilla Kiraly,
Madeleine Traverse,
Timo Kohlberger,
Shawn Xu,
Fayaz Jamil,
Cían Hughes,
Charles Lau,
Justin Chen,
Fereshteh Mahvar,
Liron Yatziv,
Tiffany Chen,
Bram Sterling,
Stefanie Anna Baby,
Susanna Maria Baby,
Jeremy Lai,
Samuel Schmidgall,
Lu Yang,
Kejia Chen,
Per Bjornsson,
Shashir Reddy,
Ryan Brush,
Kenneth Philbrick,
et al. (56 additional authors not shown)
Abstract:
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical for accelerating the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. On out-of-distribution tasks, MedGemma improves over the base models by 2.6-10% on medical multimodal question answering, 15.5-18.1% on chest X-ray finding classification, and 10.8% on agentic evaluations. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching performance comparable to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as a standalone encoder, achieves performance comparable to or better than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with the potential to significantly accelerate medical research and the development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
Submitted 12 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
Spark Transformer: Reactivating Sparsity in FFN and Attention
Authors:
Chong You,
Kan Wu,
Zhipeng Jia,
Lin Chen,
Srinadh Bhojanapalli,
Jiaxian Guo,
Utku Evci,
Jan Wassenberg,
Praneeth Netrapalli,
Jeremiah J. Willcock,
Suvinay Subramanian,
Felix Chern,
Alek Andreev,
Shreya Pathak,
Felix Yu,
Prateek Jain,
David E. Culler,
Henry M. Levy,
Sanjiv Kumar
Abstract:
The discovery of the lazy neuron phenomenon in trained Transformers, where the vast majority of neurons in their feed-forward networks (FFN) are inactive for each token, has spurred tremendous interest in activation sparsity for enhancing large model efficiency. While notable progress has been made in translating such sparsity into wall-time benefits, modern Transformers have moved away from the ReLU activation function that is crucial to this phenomenon. Existing efforts to re-introduce activation sparsity often degrade model quality, increase the parameter count, or complicate and slow down training. Sparse attention, the application of sparse activation to the attention mechanism, often faces similar challenges.
This paper introduces the Spark Transformer, a novel architecture that achieves a high level of activation sparsity in both the FFN and the attention mechanism while maintaining model quality, parameter count, and standard training procedures. Our method realizes sparsity via top-k masking for explicit control over the sparsity level. Crucially, we introduce statistical top-k, a hardware-accelerator-friendly, linear-time approximate algorithm that avoids costly sorting and mitigates the significant training slowdown of standard top-k operators. Furthermore, the Spark Transformer reallocates existing FFN parameters and attention key embeddings to form a low-cost predictor that identifies activated entries. This design not only mitigates the quality loss from enforced sparsity, but also enhances the wall-time benefit. Pretrained with the Gemma-2 recipe, the Spark Transformer demonstrates competitive performance on standard benchmarks while exhibiting significant sparsity: only 8% of FFN neurons are activated, and each token attends to at most 256 tokens. This sparsity translates into a 2.5x reduction in FLOPs, leading to decoding wall-time speedups of up to 1.79x on CPU and 1.40x on GPU.
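The abstract does not spell out how statistical top-k works; the following is a minimal sketch of one linear-time, sort-free approximation consistent with the description, assuming per-token activations are roughly Gaussian. The function name, the Gaussian fit, and all constants are illustrative, not the paper's actual algorithm.

```python
import numpy as np
from scipy.stats import norm

def statistical_topk_mask(x, k):
    """Sort-free approximate top-k masking (hypothetical sketch).

    Estimates the top-k threshold from each row's mean and standard
    deviation under a Gaussian assumption, in linear time, and zeroes
    entries below that threshold.
    """
    d = x.shape[-1]
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    z = norm.ppf(1.0 - k / d)  # quantile exceeded by a fraction k/d of entries
    return np.where(x >= mu + z * sigma, x, 0.0)

acts = np.random.randn(4, 1024)             # fake FFN pre-activations
sparse = statistical_topk_mask(acts, k=82)  # keep roughly 8% of 1024 entries
print((sparse != 0).mean(axis=-1))          # ~0.08 active fraction per row
```

Because the threshold comes from summary statistics rather than a sort, the number of surviving entries is only approximately k, which is the usual trade-off of such estimators.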
Submitted 6 June, 2025;
originally announced June 2025.
-
The Riemannian Means Field Classifier for EEG-Based BCI Data
Authors:
Anton Andreev,
Grégoire Cattan,
Marco Congedo
Abstract:
A substantial amount of research has demonstrated the robustness and accuracy of the Riemannian minimum distance to mean (MDM) classifier for all kinds of EEG-based brain-computer interfaces (BCIs). This classifier is simple, fully deterministic, robust to noise, computationally efficient, and well suited to transfer learning. Its training is very simple, requiring only the computation of a geometric mean of symmetric positive-definite (SPD) matrices per class. We propose an improvement of the MDM that uses several power means of SPD matrices instead of the geometric mean alone. Through the analysis of 20 public databases, 10 for the motor-imagery BCI paradigm and 10 for the P300 BCI paradigm, comprising 587 individuals in total, we show that the proposed classifier clearly outperforms the MDM, approaching the state of the art in performance while retaining the MDM's simplicity and deterministic behavior. To promote reproducible research, our code will be released as open source.
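For context, the MDM decision rule and one member of the power-mean family fit in a few lines. Note that the paper builds on fixed-point matrix power means (in the sense of Lim and Pálfia); the one-step power mean (1/N Σ C_i^p)^(1/p) below is a simpler stand-in used only for illustration.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def riemannian_distance(A, B):
    """Affine-invariant distance ||log(A^(-1/2) B A^(-1/2))||_F."""
    A_isqrt = fractional_matrix_power(A, -0.5)
    return np.linalg.norm(logm(A_isqrt @ B @ A_isqrt), 'fro')

def one_step_power_mean(covs, p):
    """(1/N sum C_i^p)^(1/p): illustrative stand-in, not the paper's
    fixed-point power means."""
    Mp = np.mean([fractional_matrix_power(C, p) for C in covs], axis=0)
    return fractional_matrix_power(Mp, 1.0 / p)

def mdm_predict(trial_cov, class_means):
    """Assign the class whose mean is closest to the trial covariance."""
    dists = {c: riemannian_distance(M, trial_cov)
             for c, M in class_means.items()}
    return min(dists, key=dists.get)

def random_spd(n, rng):
    X = rng.normal(size=(n, n))
    return X @ X.T + n * np.eye(n)  # well-conditioned SPD matrix

rng = np.random.default_rng(0)
covs = {c: [random_spd(4, rng) for _ in range(5)] for c in ('left', 'right')}
means = {c: one_step_power_mean(Cs, p=0.5) for c, Cs in covs.items()}
print(mdm_predict(random_spd(4, rng), means))
```

The proposed classifier would compute such means per class for several values of p and combine the distances to them (the exact combination rule is described in the paper), rather than relying on the geometric mean alone.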
Submitted 24 April, 2025;
originally announced April 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin,
et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider coverage of languages, and longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
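A rough sense of why raising the local-to-global layer ratio shrinks the KV cache: sliding-window (local) layers cache at most a window's worth of tokens, while global layers cache the full sequence. The accounting below is a hedged back-of-the-envelope sketch; the layer count, head dimensions, and 5:1 ratio are hypothetical, not Gemma 3's published configuration.

```python
def kv_cache_bytes(n_layers, local_per_global, seq_len, window,
                   n_kv_heads, head_dim, bytes_per_value=2):
    """Rough KV-cache size for a mix of local and global attention layers."""
    n_local = round(n_layers * local_per_global / (local_per_global + 1))
    n_global = n_layers - n_local
    per_token = 2 * n_kv_heads * head_dim * bytes_per_value  # K and V
    return (n_local * min(seq_len, window) + n_global * seq_len) * per_token

# Hypothetical numbers: 48 layers, 5 local per global, 1024-token window,
# 128K context, 8 KV heads of dimension 256, 2-byte (bf16) values.
print(kv_cache_bytes(48, 5, 131_072, 1024, 8, 256) / 2**30, "GiB")
```

With these made-up numbers the few global layers dominate the cache, which is why pushing the ratio toward local layers (and keeping the window short) pays off at long context.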
Submitted 25 March, 2025;
originally announced March 2025.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman,
et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attention (Beltagy et al., 2020a) and grouped-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next-token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times larger. We release all our models to the community.
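The report does not give the exact distillation objective; a standard token-level formulation, shown here as a hedged sketch, replaces the hard next-token target with the teacher's output distribution.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level knowledge distillation: KL divergence from the
    teacher's (softened) next-token distribution to the student's.
    A common formulation, not necessarily Gemma 2's exact recipe."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction='batchmean')
    return kl * temperature ** 2  # standard scaling for soft targets

# Fake logits over a 32K vocabulary for a batch of 4 token positions.
student = torch.randn(4, 32_000)
teacher = torch.randn(4, 32_000)
print(distillation_loss(student, teacher).item())
```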
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Authors:
Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
et al. (37 additional authors not shown)
Abstract:
We introduce RecurrentGemma, a family of open language models that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two model sizes, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
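The fixed-size-state property is easiest to see in a toy gated linear recurrence: the carried state never grows with sequence length, unlike a Transformer's KV cache. This is an illustrative element-wise sketch, not Griffin's actual recurrent block.

```python
import numpy as np

def linear_recurrence(x, a):
    """Element-wise linear recurrence h_t = a_t * h_{t-1} + x_t.

    The state h is a single fixed-size vector regardless of sequence
    length, which is what makes long-sequence inference cheap.
    """
    h = np.zeros(x.shape[-1])
    outputs = []
    for x_t, a_t in zip(x, a):
        h = a_t * h + x_t
        outputs.append(h)
    return np.stack(outputs)

T, d = 16, 8
x = np.random.randn(T, d)
a = np.full((T, d), 0.9)              # decay gates in (0, 1); fixed for brevity
print(linear_recurrence(x, a).shape)  # (16, 8), while the state stays (8,)
```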
Submitted 28 August, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Gemma: Open Models Based on Gemini Research and Technology
Authors:
Gemma Team,
Thomas Mesnard,
Cassidy Hardin,
Robert Dadashi,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
Léonard Hussenot,
Pier Giuseppe Sessa,
Aakanksha Chowdhery,
Adam Roberts,
Aditya Barua,
Alex Botev,
Alex Castro-Ros,
Ambrose Slone,
Amélie Héliou,
Andrea Tacchetti,
Anna Bulanova,
Antonia Paterson,
Beth Tsai,
Bobak Shahriari,
et al. (83 additional authors not shown)
Abstract:
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters) and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of the safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
Submitted 16 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; and (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA, and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device, memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most capable model, Gemini Ultra, advances the state of the art on 30 of these 32 benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Towards an architectural framework for intelligent virtual agents using probabilistic programming
Authors:
Anton Andreev,
Grégoire Cattan
Abstract:
We present a new framework called KorraAI for conceiving and building embodied conversational agents (ECAs). Our framework models ECAs' behavior taking into account contextual information (for example, about the environment and interaction time) and uncertain information provided by the human interaction partner. Moreover, agents built with KorraAI can show proactive behavior, as they can initiate interactions with human partners. For these purposes, KorraAI exploits probabilistic programming. Probabilistic models in KorraAI are used to model the agent's behavior and interactions with the user. They enable adaptation to the user's preferences and a certain degree of indeterminism in the ECAs to achieve more natural behavior. Human-like internal states, such as moods, preferences, and emotions (e.g., surprise), can be modeled in KorraAI with distributions and Bayesian networks. These models can evolve over time, even without interaction with the user. ECA models are implemented as plugins and share a common interface. This enables ECA designers to focus more on the character they are modeling and less on the technical details, as well as to store and exchange ECA models. Several applications of KorraAI ECAs are possible, such as virtual sales agents, customer service agents, virtual companions, entertainers, or tutors.
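As a hedged illustration of the idea only (names and numbers are hypothetical, and far simpler than KorraAI's Bayesian-network models), an internal state such as mood can be a distribution that drifts over time and modulates the probability of proactive behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mood as a bounded random walk in [0, 1]; the chance that the agent
# proactively initiates an interaction grows with its mood.
mood = 0.5
for step in range(10):
    mood = float(np.clip(mood + rng.normal(0.0, 0.05), 0.0, 1.0))
    p_initiate = 0.1 + 0.4 * mood
    if rng.random() < p_initiate:
        print(f"step {step}: agent initiates interaction (mood={mood:.2f})")
```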
Submitted 20 July, 2023;
originally announced July 2023.
-
First steps towards quantum machine learning applied to the classification of event-related potentials
Authors:
Grégoire Cattan,
Alexandre Quemy,
Anton Andreev
Abstract:
A low information transfer rate is a major bottleneck for brain-computer interfaces based on non-invasive electroencephalography (EEG) for clinical applications. This has led to the development of more robust and accurate classifiers. In this study, we investigate the performance of a quantum-enhanced support vector classifier (QSVC). The balanced accuracy of the QSVC was 83.17% in training and 50.25% in prediction. This result shows that the classifier was able to learn from EEG data, but that more research is required to obtain higher prediction accuracy. This could be achieved through a better configuration of the classifier, such as increasing the number of shots.
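For readers unfamiliar with the setup: a QSVC is an ordinary support vector machine whose Gram matrix comes from a quantum kernel, typically k(x, z) = |<phi(x)|phi(z)>|^2 estimated from repeated circuit executions (shots). In the hedged sketch below, a classical RBF kernel stands in for the quantum kernel purely to show the plumbing; it is not the study's classifier.

```python
import numpy as np
from sklearn.svm import SVC

def kernel_matrix(X, Z, gamma=0.5):
    """Classical RBF kernel standing in for a quantum fidelity kernel."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 4))     # fake EEG feature vectors
y_train = rng.integers(0, 2, size=40)  # fake target/non-target labels
X_test = rng.normal(size=(10, 4))

clf = SVC(kernel='precomputed')        # kernel supplied as a Gram matrix
clf.fit(kernel_matrix(X_train, X_train), y_train)
print(clf.predict(kernel_matrix(X_test, X_train)))
```

Increasing the number of shots, as the abstract suggests, reduces the statistical noise in each estimated kernel entry.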
Submitted 6 February, 2023;
originally announced February 2023.
-
A comparison of mobile VR display running on an ordinary smartphone with standard PC display for P300-BCI stimulus presentation
Authors:
Grégoire Cattan,
Anton Andreev,
Cesar Mendoza,
Marco Congedo
Abstract:
A brain-computer interface (BCI) based on electroencephalography (EEG) is a promising technology for enhancing virtual reality (VR) applications, in particular for gaming. We focus on the so-called P300-BCI, a stable and accurate BCI paradigm relying on the recognition of a positive event-related potential (ERP) occurring in the EEG about 300 ms post-stimulation. We implemented a basic version of such a BCI displayed on an ordinary and affordable smartphone-based head-mounted VR device: that is, a mobile and passive VR system (with no electronic components beyond the smartphone). The mobile phone performed the stimulus presentation, EEG synchronization (tagging), and feedback display. We compared the ERPs and the accuracy of the BCI on the VR device with a traditional BCI running on a personal computer (PC). We also evaluated the impact of subjective factors on accuracy. The study was within-subjects, with 21 participants and one session in each modality. No significant difference in BCI accuracy was found between the PC and VR systems, although the P200 ERP was significantly wider and larger in the VR system compared to the PC system.
Submitted 6 February, 2020;
originally announced February 2020.
-
Engineering study on the use of Head-Mounted Display for Brain-Computer Interface
Authors:
Anton Andreev,
Grégoire Cattan,
M Congedo
Abstract:
In this article, we explore the availability of head-mounted display (HMD) devices that can be seamlessly coupled with P300-based brain-computer interfaces (BCI) using electroencephalography (EEG). The P300 is an event-related potential appearing about 300 ms after the onset of a stimulation. Recognizing this potential in the ongoing EEG requires knowing the exact onset of the stimuli. In other words, the stimulations presented in the HMD must be perfectly synced with the acquisition of the EEG signal. This is done through a process called tagging. The tagging must be performed in a reliable and robust way so as to guarantee the recognition of the P300 and thus the performance of the BCI. An HMD device should also be able to render images fast enough to allow accurate perception of the stimulations, and must not perturb the acquisition of the EEG signal. In addition, an affordable HMD device is needed for both research and entertainment purposes. In this study, we selected and tested two HMD configurations.
Submitted 28 June, 2019;
originally announced June 2019.
-
Building Brain Invaders: EEG data of an experimental validation
Authors:
Gijsbrecht Van Veen,
Alexandre Barachant,
Anton Andreev,
Grégoire Cattan,
Pedro Coelho Rodrigues,
Marco Congedo
Abstract:
We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.2649006 in mat and csv formats. This dataset contains electroencephalographic (EEG) recordings of 25 subjects testing the Brain Invaders (Congedo, 2011), a visual P300 Brain-Computer Interface inspired by the famous vintage video game Space Invaders (Taito, Tokyo, Japan). The visual P300 is an event-related potential elicited by visual stimulation, peaking 240-600 ms after stimulus onset. EEG data were recorded from 16 electrodes in an experiment that took place at the GIPSA-lab, Grenoble, France, in 2012 (Van Veen, 2013 and Congedo, 2013). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.BI.EEG.2012-GIPSA. The ID of this dataset is BI.EEG.2012-GIPSA.
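A minimal sketch of inspecting one recording from the mat export; the filename is a placeholder and the variable names will differ, as documented in the repository linked above.

```python
from scipy.io import loadmat

# 'subject_01.mat' is a hypothetical filename; see the Zenodo record
# and the py.BI.EEG.2012-GIPSA repository for the actual layout.
data = loadmat('subject_01.mat')
print(sorted(k for k in data if not k.startswith('__')))  # variable names
```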
Submitted 13 May, 2019;
originally announced May 2019.
-
Brain Invaders Adaptive versus Non-Adaptive P300 Brain-Computer Interface dataset
Authors:
Erwan Vaineau,
Alexandre Barachant,
Anton Andreev,
Pedro C. Rodrigues,
Grégoire Cattan,
Marco Congedo
Abstract:
We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.1494163 in mat and csv formats. This dataset contains electroencephalographic (EEG) recordings of 24 subjects doing a visual P300 Brain-Computer Interface experiment on a PC. The visual P300 is an event-related potential elicited by visual stimulation, peaking 240-600 ms after stimulus onset. The experiment was designed to compare the use of a P300-based brain-computer interface on a PC with and without adaptive calibration using Riemannian geometry. The brain-computer interface is based on electroencephalography (EEG). EEG data were recorded with 16 electrodes during an experiment that took place at the GIPSA-lab, Grenoble, France, in 2013 (Congedo, 2013). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.BI.EEG.2013-GIPSA. The ID of this dataset is BI.EEG.2013-GIPSA.
Submitted 19 April, 2019;
originally announced April 2019.
-
Dataset of an EEG-based BCI experiment in Virtual Reality and on a Personal Computer
Authors:
Grégoire Cattan,
A. Andreev,
P. Rodrigues,
M. Congedo
Abstract:
We describe the experimental procedures for a dataset that we have made publicly available at https://doi.org/10.5281/zenodo.2605204 in mat (Mathworks, Natick, USA) and csv formats. This dataset contains electroencephalographic (EEG) recordings of 21 subjects doing a visual P300 experiment on a PC (personal computer) and in VR (virtual reality). The visual P300 is an event-related potential elicited by visual stimulation, peaking 240-600 ms after stimulus onset. The experiment was designed to compare the use of a P300-based brain-computer interface on a PC and with a virtual reality headset, in terms of physiological, subjective, and performance aspects. The brain-computer interface is based on electroencephalography (EEG). EEG was recorded with 16 electrodes. The virtual reality headset consisted of a passive head-mounted display, that is, a head-mounted display which does not include any electronics with the exception of a smartphone. This experiment was carried out at the GIPSA-lab (University of Grenoble Alpes, CNRS, Grenoble-INP) in 2018, and promoted by the IHMTEK Company (Interaction Homme-Machine Technologie). The study was approved by the Ethical Committee of the University of Grenoble Alpes (Comité d'Éthique pour la Recherche Non-Interventionnelle). Python code for manipulating the data is available at https://github.com/plcrodrigues/py.VR.EEG.2018-GIPSA. The ID of this dataset is VR.EEG.2018-GIPSA.
Submitted 27 March, 2019;
originally announced March 2019.
-
Analysis of tagging latency when comparing event-related potentials
Authors:
Grégoire Cattan,
Anton Andreev,
Bastien Maureille,
Marco Congedo
Abstract:
Event-related potentials (ERPs) are very small voltages produced by the brain in response to external stimulation. In order to detect and evaluate an ERP in an ongoing electroencephalogram (EEG), it is necessary to tag the EEG with the exact onset time of the stimulus. We define the latency as the delay between the time the tagging command is sent and the detection of the stimulus on the screen. Failing to control sequencing in the tagging pipeline causes problems when interpreting latency, in particular when comparing ERPs generated from stimuli displayed by different systems. In this work, we present a number of technical aspects that can influence latency, such as the refresh rate of the screen or the display of a stimulus at different screen locations. We suggest several methods to estimate and correct this latency.
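The latency definition reduces to a subtraction, but the screen refresh rate quantizes what can be observed; a small hedged sketch with illustrative numbers (a photodiode timestamp stands in for "detection of the stimulus on the screen"):

```python
REFRESH_HZ = 60
FRAME_MS = 1000 / REFRESH_HZ  # ~16.7 ms display granularity

def tagging_latency_ms(t_command_ms, t_onscreen_ms):
    """Latency = stimulus detection time minus tagging-command time."""
    return t_onscreen_ms - t_command_ms

# A stimulus issued mid-frame cannot appear before the next refresh,
# so observed latencies cluster near multiples of the frame period.
lat = tagging_latency_ms(1000.0, 1033.4)
print(f"{lat:.1f} ms ~ {round(lat / FRAME_MS)} frames at {REFRESH_HZ} Hz")
```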
Submitted 7 December, 2018;
originally announced December 2018.
-
fMRI: preprocessing, classification and pattern recognition
Authors:
Maxim Sharaev,
Alexander Andreev,
Alexey Artemov,
Alexander Bernstein,
Evgeny Burnaev,
Ekaterina Kondratyeva,
Svetlana Sushchinskaya,
Renat Akzhigitov
Abstract:
As machine learning continues to gain momentum in the neuroscience community, we witness the emergence of novel applications such as diagnostics, characterization, and treatment outcome prediction for psychiatric and neurological disorders, for instance, epilepsy and depression. Systematic research into these mental disorders increasingly involves drawing clinical conclusions on the basis of data-driven approaches; to this end, structural and functional neuroimaging serve as key source modalities. Identification of informative neuroimaging markers requires establishing a comprehensive preparation pipeline for data that may be severely corrupted by artifactual signal fluctuations. In this work, we review a large body of literature to provide ample evidence for the advantages of pattern recognition approaches in clinical applications, overview advanced graph-based pattern recognition approaches, and propose a noise-aware neuroimaging data processing pipeline. To demonstrate the effectiveness of our approach, we provide results from a pilot study, which show a significant improvement in classification accuracy, indicating a promising research direction.
Submitted 26 April, 2018;
originally announced April 2018.
-
Machine Learning pipeline for discovering neuroimaging-based biomarkers in neurology and psychiatry
Authors:
Alexander Bernstein,
Evgeny Burnaev,
Ekaterina Kondratyeva,
Svetlana Sushchinskaya,
Maxim Sharaev,
Alexander Andreev,
Alexey Artemov,
Renat Akzhigitov
Abstract:
We consider the problem of diagnostic pattern recognition/classification from neuroimaging data. We propose a common data analysis pipeline for neuroimaging-based diagnostic classification problems using various ML algorithms and processing toolboxes for brain imaging. We illustrate the application of the pipeline by discovering new biomarkers for the diagnostics of epilepsy and depression based on clinical and MRI/fMRI data from patients and healthy volunteers.
Submitted 26 April, 2018;
originally announced April 2018.
-
A New Generation of Brain-Computer Interface Based on Riemannian Geometry
Authors:
Marco Congedo,
Alexandre Barachant,
Anton Andreev
Abstract:
Based on the accumulated experience over the past 25 years in the field of Brain-Computer Interface (BCI), we can now envision a new generation of BCI. Such BCIs will not require training; instead, they will be smartly initialized using massive remote databases and will adapt to the user quickly and effectively within the first minute of use. They will be reliable, robust, and will maintain good performance within and across sessions. We present a general classification framework based on recent advances in Riemannian geometry that possesses these characteristics. It applies equally well to BCIs based on event-related potentials (ERPs), sensorimotor (mu) rhythms, and steady-state evoked potentials (SSEPs). The framework is very simple, both algorithmically and computationally. Due to its simplicity, its ability to learn rapidly (with little training data), and its good across-subject and across-session generalization, this strategy is a very good candidate for building a new generation of BCIs, and we hereby propose it as a benchmark method for the field.
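For reference, the two standard geometric notions such a framework rests on (textbook definitions, not specific to this paper) are the affine-invariant distance between SPD matrices and the induced geometric (Karcher) mean:

```latex
% Affine-invariant Riemannian distance between SPD matrices C_1, C_2,
% and the geometric (Karcher) mean of a set {C_i}.
\[
  \delta(C_1, C_2) = \bigl\lVert \log\bigl(C_1^{-1/2} C_2\, C_1^{-1/2}\bigr) \bigr\rVert_F,
  \qquad
  G = \arg\min_{C \succ 0} \sum_{i=1}^{N} \delta^{2}(C, C_i).
\]
```

An MDM-style classifier then assigns a trial covariance matrix to the class whose mean G minimizes the distance δ.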
Submitted 30 October, 2013;
originally announced October 2013.