
Showing 1–20 of 20 results for author: Mentzer, F

Searching in archive cs.
  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.02116  [pdf, other]

    cs.CV

    GIVT: Generative Infinite-Vocabulary Transformers

    Authors: Michael Tschannen, Cian Eastwood, Fabian Mentzer

    Abstract: We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate vector sequences with real-valued entries, instead of discrete tokens from a finite vocabulary. To this end, we propose two surprisingly simple modifications to decoder-only transformers: 1) at the input, we replace the finite-vocabulary lookup table with a linear projection of the input vectors; and 2) at the output, w…

    Submitted 17 July, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: v2: add related NLP work, loss details. v3: Improved GMM formulation, added adapter module, larger models, better image generation results. v4: ECCV 2024 camera ready version (minor changes). Code and model checkpoints are available at: https://github.com/google-research/big_vision
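    The input-side modification described in the abstract can be illustrated with a minimal NumPy sketch (illustrative only; the shapes, the projection matrix `W_in`, and the random inputs are assumptions, not the paper's implementation; the output-side modification is truncated in the abstract and not shown):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_model = 16

    # Standard decoder-only transformer input: a finite-vocabulary
    # embedding lookup table indexed by discrete token ids.
    vocab_embed = rng.normal(size=(1024, d_model))
    token_ids = np.array([3, 17, 255])
    x_discrete = vocab_embed[token_ids]        # (3, d_model)

    # GIVT-style input (modification 1 from the abstract): real-valued
    # vectors enter the model through a learned linear projection
    # instead of a table lookup, so there is no finite vocabulary.
    d_in = 4
    W_in = rng.normal(size=(d_in, d_model))    # hypothetical projection
    z = rng.normal(size=(3, d_in))             # continuous "tokens"
    x_continuous = z @ W_in                    # (3, d_model)
    ```

    Both paths produce sequences of `d_model`-dimensional vectors, so the rest of the transformer is unchanged.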

  4. arXiv:2309.15505  [pdf, other]

    cs.CV cs.LG

    Finite Scalar Quantization: VQ-VAE Made Simple

    Authors: Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen

    Abstract: We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the…

    Submitted 12 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Code: https://github.com/google-research/google-research/tree/master/fsq
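    A minimal NumPy sketch of the FSQ scheme as described in the abstract (illustrative; the `tanh` bounding, the helper name, and the level counts are assumptions, not the paper's exact formulation):

    ```python
    import numpy as np

    def fsq_quantize(z, levels):
        """Bound each of the few latent dimensions (here via tanh, an
        assumption), then snap dimension i to one of levels[i] fixed
        values. The implicit codebook is the product of the grids."""
        z = np.tanh(z)                           # bound entries to (-1, 1)
        half = (np.asarray(levels, float) - 1) / 2
        return np.round(z * half) / half         # nearest fixed value per dim

    levels = [8, 8, 5]                           # illustrative level counts
    z = np.array([0.37, -0.92, 0.05])
    zq = fsq_quantize(z, levels)
    codebook_size = int(np.prod(levels))         # 8 * 8 * 5 = 320 implicit codes
    ```

    There is no learned codebook to maintain: the grid of fixed values per dimension fully determines the implicit codebook.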

  5. arXiv:2305.18231  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    High-Fidelity Image Compression with Score-based Generative Models

    Authors: Emiel Hoogeboom, Eirikur Agustsson, Fabian Mentzer, Luca Versari, George Toderici, Lucas Theis

    Abstract: Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved using a simpl…

    Submitted 7 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

  6. arXiv:2304.07313  [pdf, other]

    eess.IV cs.LG

    M2T: Masking Transformers Twice for Faster Decoding

    Authors: Fabian Mentzer, Eirikur Agustsson, Michael Tschannen

    Abstract: We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results. Such models were previously used for image generation by progressively sampling groups of masked tokens according to uncertainty-adaptive schedules. Unlike these works, we demonstrate that predefined, deterministic schedules perform as well or be…

    Submitted 14 April, 2023; originally announced April 2023.

  7. arXiv:2212.13824  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Realism Image Compression with a Conditional Generator

    Authors: Eirikur Agustsson, David Minnen, George Toderici, Fabian Mentzer

    Abstract: By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a mi…

    Submitted 30 March, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: CVPR'23 Camera Ready

  8. arXiv:2206.08889  [pdf, other]

    stat.ML cs.IT cs.LG

    Lossy Compression with Gaussian Diffusion

    Authors: Lucas Theis, Tim Salimans, Matthew D. Hoffman, Fabian Mentzer

    Abstract: We consider a novel lossy compression approach based on unconditional diffusion generative models, which we call DiffC. Unlike modern compression schemes which rely on transform coding and quantization to restrict the transmitted information, DiffC relies on the efficient communication of pixels corrupted by Gaussian noise. We implement a proof of concept and find that it works surprisingly well d…

    Submitted 31 December, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

  9. arXiv:2206.07307  [pdf, other]

    cs.CV cs.LG eess.IV

    VCT: A Video Compression Transformer

    Authors: Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

    Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have relied on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distri…

    Submitted 12 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS'22 Camera Ready Version. Code: https://goo.gle/vct-paper

  10. arXiv:2107.12038  [pdf, other]

    eess.IV cs.CV

    Neural Video Compression using GANs for Detail Synthesis and Propagation

    Authors: Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

    Abstract: We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective:…

    Submitted 12 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: First two authors contributed equally. ECCV Camera ready version

  11. Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model

    Authors: Ren Yang, Fabian Mentzer, Luc Van Gool, Radu Timofte

    Abstract: The past few years have witnessed increasing interest in applying deep learning to video compression. However, the existing approaches compress a video frame using only a few reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with…

    Submitted 6 December, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted for publication in IEEE Journal of Selected Topics in Signal Processing (J-STSP)

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, 2021

  12. arXiv:2006.09965  [pdf, other]

    eess.IV cs.CV cs.LG

    High-Fidelity Generative Image Compression

    Authors: Fabian Mentzer, George Toderici, Michael Tschannen, Eirikur Agustsson

    Abstract: We extensively study how to combine Generative Adversarial Networks and learned compression to obtain a state-of-the-art generative lossy compression system. In particular, we investigate normalization layers, generator and discriminator architectures, training strategies, as well as perceptual losses. In contrast to previous work, i) we obtain visually pleasing reconstructions that are perceptual…

    Submitted 23 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: This is the Camera Ready version for NeurIPS 2020. Project page: https://hific.github.io

  13. arXiv:2003.10184  [pdf, other]

    cs.CV cs.LG eess.IV

    Learning Better Lossless Compression Using Lossy Compression

    Authors: Fabian Mentzer, Luc Van Gool, Michael Tschannen

    Abstract: We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system. Specifically, the original image is first decomposed into the lossy reconstruction obtained after compressing it with BPG and the corresponding residual. We then model the distribution of the residual with a convolutional neural network-based probabilistic model that is conditioned on the B…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: CVPR'20 camera-ready version
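    The decomposition described in the abstract can be sketched in a few lines of NumPy (illustrative; the `decompose` helper and the toy quantizing codec are assumptions standing in for BPG, not the paper's pipeline):

    ```python
    import numpy as np

    def decompose(image, lossy_codec):
        """Split an image into a lossy reconstruction and the residual
        that a learned probabilistic model would then entropy-code."""
        recon = lossy_codec(image)
        residual = image.astype(np.int16) - recon.astype(np.int16)
        return recon, residual

    # Toy "codec": coarse quantization to multiples of 8 (not BPG).
    toy_codec = lambda im: (im // 8) * 8
    im = np.arange(16, dtype=np.uint8).reshape(4, 4)
    recon, res = decompose(im, toy_codec)
    # Lossless by construction: reconstruction + residual recovers the image.
    ```

    Storing the lossy bitstream plus the entropy-coded residual is lossless, since the two components sum back to the original pixels exactly.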

  14. arXiv:2003.01966  [pdf, other]

    eess.IV cs.CV

    Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement

    Authors: Ren Yang, Fabian Mentzer, Luc Van Gool, Radu Timofte

    Abstract: In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. The frames in the first layer are compressed by an image compression method with the highest quality. Using these frames as references, we propose the Bi-Directional Deep Compression (BDDC) network to compress the second layer with relatively…

    Submitted 3 August, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: Published in CVPR 2020; corrected a minor typo in the footnote of Table 1; corrected Figure 11

  15. arXiv:1811.12817  [pdf, other]

    eess.IV cs.CV cs.LG

    Practical Full Resolution Learned Lossless Image Compression

    Authors: Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool

    Abstract: We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models…

    Submitted 6 March, 2020; v1 submitted 30 November, 2018; originally announced November 2018.

    Comments: Updated preprocessing and Table 1, see A.1 in supplementary. Code and models: https://github.com/fab-jul/L3C-PyTorch

  16. arXiv:1804.02958  [pdf, other]

    cs.CV cs.LG

    Generative Adversarial Networks for Extreme Learned Image Compression

    Authors: Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, Luc Van Gool

    Abstract: We present a learned image compression system based on GANs, operating at extremely low bitrates. Our proposed framework combines an encoder, decoder/generator and a multi-scale discriminator, which we train jointly for a generative learned compression objective. The model synthesizes details it cannot afford to store, obtaining visually pleasing results at bitrates where previous methods fail and…

    Submitted 18 August, 2019; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: E. Agustsson, M. Tschannen, and F. Mentzer contributed equally to this work. ICCV 2019 camera ready version

  17. arXiv:1803.06131  [pdf, other]

    cs.CV

    Towards Image Understanding from Deep Compression without Decoding

    Authors: Robert Torfason, Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool

    Abstract: Motivated by recent work on deep neural network (DNN)-based image compression methods showing potential improvements in image quality, savings in storage, and bandwidth reduction, we propose to perform image understanding tasks such as classification and segmentation directly on the compressed representations produced by these compression methods. Since the encoders and decoders in DNN-based compr…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: ICLR 2018 conference paper

  18. arXiv:1801.04260  [pdf, other]

    cs.CV cs.LG

    Conditional Probability Models for Deep Image Compression

    Authors: Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool

    Abstract: Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state-of-the-art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the latt…

    Submitted 4 June, 2019; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: CVPR 2018. Code available at https://github.com/fab-jul/imgcomp-cvpr . The first two authors contributed equally. Minor revision: fixed Fig. 2, added page numbers
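    The rate-distortion trade-off mentioned in the abstract is conventionally written as distortion plus a Lagrange multiplier times the rate; a minimal sketch under that standard formulation (the MSE distortion, the bits-based rate estimate, and all names are assumptions, not the paper's exact objective):

    ```python
    import numpy as np

    def rd_loss(x, x_hat, probs, lam):
        """Distortion (here MSE) plus lam times an entropy estimate of
        the latent representation: -log2 p(symbol), summed over symbols
        under the learned probability model."""
        distortion = np.mean((x - x_hat) ** 2)
        rate_bits = -np.sum(np.log2(probs))
        return distortion + lam * rate_bits

    x = np.array([0.1, 0.5, 0.9])
    x_hat = np.array([0.1, 0.4, 1.0])
    probs = np.array([0.5, 0.25, 0.125])   # model probabilities of latent symbols
    loss = rd_loss(x, x_hat, probs, lam=0.01)  # rate term is 1 + 2 + 3 = 6 bits
    ```

    Sweeping `lam` traces out the rate-distortion curve: larger values push the model toward fewer bits at the cost of reconstruction error.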

  19. arXiv:1704.00648  [pdf, other]

    cs.LG cs.CV

    Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

    Authors: Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool

    Abstract: We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks…

    Submitted 8 June, 2017; v1 submitted 3 April, 2017; originally announced April 2017.
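    The soft-to-hard annealing idea from the abstract can be sketched as a softmax over distances to codebook centers, sharpened over training (a minimal illustration; the scalar case, the `sigma` temperature, and the center values are assumptions, not the paper's exact relaxation):

    ```python
    import numpy as np

    def soft_quantize(z, centers, sigma):
        """Soft relaxation of quantization: each value becomes a
        softmax-weighted average of the centers; as sigma grows, the
        assignment anneals toward hard nearest-center quantization."""
        d = -(z[..., None] - centers) ** 2       # negative squared distances
        w = np.exp(sigma * d)
        w /= w.sum(axis=-1, keepdims=True)       # softmax over centers
        return (w * centers).sum(axis=-1)

    centers = np.array([-1.0, 0.0, 1.0])
    z = np.array([0.4, -0.9])
    soft = soft_quantize(z, centers, sigma=1.0)     # differentiable, blurry
    hard = soft_quantize(z, centers, sigma=100.0)   # ~nearest-center snap
    ```

    The soft version keeps gradients flowing through the quantizer during training, while the annealed version approaches the discrete codes used at test time.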

  20. arXiv:1609.07916  [pdf, other]

    cs.CV cs.LG

    Deep Structured Features for Semantic Segmentation

    Authors: Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini

    Abstract: We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms. Specifically, our architecture combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stag…

    Submitted 16 June, 2017; v1 submitted 26 September, 2016; originally announced September 2016.

    Comments: EUSIPCO 2017, 5 pages, 2 figures