
Showing 1–50 of 1,470 results for author: Dmitry

Searching in archive cs. Search in all archives.
  1. arXiv:2507.17513  [pdf, ps, other]

    cs.LG cs.AI

    HOTA: Hamiltonian framework for Optimal Transport Advection

    Authors: Nazar Buzun, Daniil Shlenskii, Maxim Bobrin, Dmitry V. Dylov

    Abstract: Optimal transport (OT) has become a natural framework for guiding the probability flows. Yet, the majority of recent generative models assume trivial geometry (e.g., Euclidean) and rely on strong density-estimation assumptions, yielding trajectories that do not respect the true principles of optimality in the underlying manifold. We present Hamiltonian Optimal Transport Advection (HOTA), a Hamilto…

    Submitted 23 July, 2025; originally announced July 2025.

  2. arXiv:2507.16105  [pdf, ps, other]

    cs.CC math.CO

    Monotone Circuit Complexity of Matching

    Authors: Bruno Cavalar, Mika Göös, Artur Riazanov, Anastasia Sofronova, Dmitry Sokolov

    Abstract: We show that the perfect matching function on $n$-vertex graphs requires monotone circuits of size $\smash{2^{n^{\Omega(1)}}}$. This improves on the $n^{\Omega(\log n)}$ lower bound of Razborov (1985). Our proof uses the standard approximation method together with a new sunflower lemma for matchings.

    Submitted 21 July, 2025; originally announced July 2025.

  3. arXiv:2507.16008  [pdf, ps, other]

    cs.LG math.OC

    Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation

    Authors: Dmitry Bylinkin, Mikhail Aleksandrov, Savelii Chezhegov, Aleksandr Beznosikov

    Abstract: Physics-informed neural networks (PINNs) have gained prominence in recent years and are now effectively used in a number of applications. However, their performance remains unstable due to the complex landscape of the loss function. To address this issue, we reformulate PINN training as a nonconvex-strongly concave saddle-point problem. After establishing the theoretical foundation for this approa…

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 34 pages, 4 tables, 3 figures, 4 theorems; code available at https://anonymous.4open.science/r/pinns-bgda-00D6/README.md

  4. Model Simplification through refinement

    Authors: Dmitry Brodsky, Benjamin Watson

    Abstract: As modeling and visualization applications proliferate, there arises a need to simplify large polygonal models at interactive rates. Unfortunately existing polygon mesh simplification algorithms are not well suited for this task because they are either too slow (requiring the simplified model to be pre-computed) or produce models that are too poor in quality. These shortcomings become particularly…

    Submitted 20 July, 2025; originally announced July 2025.

    Journal ref: Proceedings of Graphics Interface 2000: Montréal, Québec, Canada, 15 - 17 May 2000, 221-228

  5. arXiv:2507.14258  [pdf, ps, other]

    cs.CY

    Dispute Resolution in Peer Review with Abstract Argumentation and OWL DL

    Authors: Ildar Baimuratov, Elena Lisanyuk, Dmitry Prokudin

    Abstract: The peer review process for scientific publications faces significant challenges due to the increasing volume of submissions and inherent reviewer biases. While artificial intelligence offers the potential to facilitate the process, it also risks perpetuating biases present in training data. This research addresses these challenges by applying formal methods from argumentation theory to support tr…

    Submitted 18 July, 2025; originally announced July 2025.

  6. arXiv:2507.13413  [pdf, ps, other]

    cs.LG

    LightAutoDS-Tab: Multi-AutoML Agentic System for Tabular Data

    Authors: Aleksey Lapin, Igor Hromov, Stanislav Chumakov, Mile Mitrovic, Dmitry Simakov, Nikolay O. Nikitin, Andrey V. Savchenko

    Abstract: AutoML has advanced in handling complex tasks using the integration of LLMs, yet its efficiency remains limited by dependence on specific underlying tools. In this paper, we introduce LightAutoDS-Tab, a multi-AutoML agentic system for tasks with tabular data, which combines an LLM-based code generation with several AutoML tools. Our approach improves the flexibility and robustness of pipeline desi…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 11 pages, 2 figures

  7. arXiv:2507.12202  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control

    Authors: Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev

    Abstract: Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their internals can help understand, influence, and control their behavior, which is very important in a variety of real-world applications. Recently sparse autoencoders (S…

    Submitted 16 July, 2025; originally announced July 2025.

  8. arXiv:2507.12124  [pdf, ps, other]

    cs.CC

    Searching for Falsified Clause in Random (log n)-CNFs is Hard for Randomized Communication

    Authors: Artur Riazanov, Anastasia Sofronova, Dmitry Sokolov, Weiqiang Yuan

    Abstract: We show that for a randomly sampled unsatisfiable $O(\log n)$-CNF over $n$ variables the randomized two-party communication cost of finding a clause falsified by the given variable assignment is linear in $n$.

    Submitted 16 July, 2025; originally announced July 2025.

  9. arXiv:2507.11059  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks

    Authors: Pavel Adamenko, Mikhail Ivanov, Aidar Valeev, Rodion Levichev, Pavel Zadorozhny, Ivan Lopatin, Dmitry Babayev, Alena Fenogenova, Valentin Malykh

    Abstract: The rapid advancement of Large Language Models (LLMs) in software engineering has revealed critical limitations in existing benchmarks, particularly the widely used SWE-bench dataset. Recent studies have uncovered severe data contamination issues, e.g. SWE-bench reports 32.67% of successful patches involve direct solution leakage and 31.08% pass due to inadequate test cases. We introduce SWE-MERA,…

    Submitted 17 July, 2025; v1 submitted 15 July, 2025; originally announced July 2025.

  10. arXiv:2507.10349  [pdf, ps, other]

    cs.LG cs.AI

    TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting

    Authors: Zhiyuan Zhao, Sitan Yang, Kin G. Olivares, Boris N. Oreshkin, Stan Vitebsky, Michael W. Mahoney, B. Aditya Prakash, Dmitry Efimov

    Abstract: Multi-horizon time series forecasting has many practical applications such as demand forecasting. Accurate demand prediction is critical to help make buying and inventory decisions for supply chain management of e-commerce and physical retailers, and such predictions are typically required for future horizons extending tens of weeks. This is especially challenging during high-stake sales events wh…

    Submitted 14 July, 2025; originally announced July 2025.

    Comments: 9 pages, 4 figures, 7 tables, published at KDD 2025 workshop on AI for Supply Chain: Today and Future

  11. arXiv:2507.10164  [pdf, ps, other]

    cs.RO

    Robust RL Control for Bipedal Locomotion with Closed Kinematic Chains

    Authors: Egor Maslennikov, Eduard Zaliaev, Nikita Dudorov, Oleg Shamanin, Karanov Dmitry, Gleb Afanasev, Alexey Burkov, Egor Lygin, Simeon Nedelchev, Evgeny Ponomarev

    Abstract: Developing robust locomotion controllers for bipedal robots with closed kinematic chains presents unique challenges, particularly since most reinforcement learning (RL) approaches simplify these parallel mechanisms into serial models during training. We demonstrate that this simplification significantly impairs sim-to-real transfer by failing to capture essential aspects such as joint coupling, fr…

    Submitted 14 July, 2025; originally announced July 2025.

  12. arXiv:2507.09823  [pdf, ps, other]

    math.OC cs.LG

    Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization

    Authors: Ekaterina Borodich, Dmitry Kovalev

    Abstract: In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function $\min_x f(x)$. Recently, several adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. These methods estimate the local curvature of the objective function to compute stepsizes, attain the standard convergence rate $\mathcal{O}(1/k)$ of fixed-stepsize gradient de…

    Submitted 13 July, 2025; originally announced July 2025.

  13. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  14. arXiv:2507.06211  [pdf, ps, other]

    cs.LG

    Modern Methods in Associative Memory

    Authors: Dmitry Krotov, Benjamin Hoover, Parikshit Ram, Bao Pham

    Abstract: Associative Memories like the famous Hopfield Networks are elegant models for describing fully recurrent neural networks whose fundamental job is to store and retrieve information. In the past few years they experienced a surge of interest due to novel theoretical results pertaining to their information storage capabilities, and their relationship with SOTA AI architectures, such as Transformers a…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Tutorial at ICML 2025

  15. arXiv:2507.05867  [pdf, ps, other]

    eess.SY cs.RO math.OC

    Assessing Linear Control Strategies for Zero-Speed Fin Roll Damping

    Authors: Nikita Savin, Elena Ambrosovskaya, Dmitry Romaev, Anton Proskurnikov

    Abstract: Roll stabilization is a critical aspect of ship motion control, particularly for vessels operating in low-speed or zero-speed conditions, where traditional hydrodynamic fins lose their effectiveness. In this paper, we consider a roll damping system, developed by Navis JSC, based on two actively controlled zero-speed fins. Unlike conventional fin stabilizers, zero-speed fins employ a drag-based mec…

    Submitted 8 July, 2025; originally announced July 2025.

  16. arXiv:2507.05201  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    MedGemma Technical Report

    Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, et al. (56 additional authors not shown)

    Abstract: Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce Me…

    Submitted 12 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

  17. arXiv:2507.03482  [pdf, ps, other]

    cs.SD eess.AS

    OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction

    Authors: Pablo Alonso-Jiménez, Pedro Ramoneda, R. Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov

    Abstract: Developing open-source foundation models is essential for advancing research in music audio understanding and ensuring access to powerful, multipurpose representations for music information retrieval. We present OMAR-RQ, a model trained with self-supervision via masked token classification methodologies using a large-scale dataset with over 330,000 hours of music audio. We experiment with differen…

    Submitted 4 July, 2025; originally announced July 2025.

  18. arXiv:2507.02205  [pdf, ps, other]

    cs.CV

    Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach

    Authors: Elena Ryumina, Maxim Markitantov, Alexandr Axyonov, Dmitry Ryumin, Mikhail Dolgushin, Alexey Karpov

    Abstract: Compound Expression Recognition (CER), a subfield of affective computing, aims to detect complex emotional states formed by combinations of basic emotions. In this work, we present a novel zero-shot multimodal approach for CER that combines six heterogeneous modalities into a single pipeline: static and dynamic facial expressions, scene and label matching, scene context, audio, and text. Unlike pr…

    Submitted 4 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: 7

  19. arXiv:2507.01633  [pdf, ps, other]

    cs.CL cs.IR

    Confidence and Stability of Global and Pairwise Scores in NLP Evaluation

    Authors: Georgii Levtsov, Dmitry Ustalov

    Abstract: With the advent of highly capable instruction-tuned neural language models, benchmarking in natural language processing (NLP) is increasingly shifting towards pairwise comparison leaderboards, such as LMSYS Arena, from traditional global pointwise scores (e.g., GLUE, BIG-bench, SWE-bench). This paper empirically investigates the strengths and weaknesses of both global scores and pairwise compariso…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 8 pages, accepted at ACL SRW 2025

    MSC Class: 62-04 ACM Class: D.2.3

  20. arXiv:2507.01117  [pdf, ps, other]

    cs.LG

    A Neural Operator based on Dynamic Mode Decomposition

    Authors: Nikita Sakovich, Dmitry Aksenov, Ekaterina Pleshakova, Sergey Gataullin

    Abstract: The scientific computation methods development in conjunction with artificial intelligence technologies remains a hot research topic. Finding a balance between lightweight and accurate computations is a solid foundation for this direction. The study presents a neural operator based on the dynamic mode decomposition algorithm (DMD), mapping functional spaces, which combines DMD and deep learning (D…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 30 pages, 10 figures

    MSC Class: 68T07; 35A99

  21. Real-Time In-Network Machine Learning on P4-Programmable FPGA SmartNICs with Fixed-Point Arithmetic and Taylor

    Authors: Mohammad Firas Sada, John J. Graham, Mahidhar Tatineni, Dmitry Mishin, Thomas A. DeFanti, Frank Würthwein

    Abstract: As machine learning (ML) applications become integral to modern network operations, there is an increasing demand for network programmability that enables low-latency ML inference for tasks such as Quality of Service (QoS) prediction and anomaly detection in cybersecurity. ML models provide adaptability through dynamic weight adjustments, making Programming Protocol-independent Packet Processors (…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: To appear in Proceedings of the Practice and Experience in Advanced Research Computing (PEARC25)

    Journal ref: Proceedings of the Practice and Experience in Advanced Research Computing PEARC '25, July 20-24, 2025, Columbus, OH, USA

  22. Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs

    Authors: Mohammad Firas Sada, John J. Graham, Elham E Khoda, Mahidhar Tatineni, Dmitry Mishin, Rajesh K. Gupta, Rick Wagner, Larry Smarr, Thomas A. DeFanti, Frank Würthwein

    Abstract: This study presents a benchmarking analysis of the Qualcomm Cloud AI 100 Ultra (QAic) accelerator for large language model (LLM) inference, evaluating its energy efficiency (throughput per watt) and performance against leading NVIDIA (A100, H200) and AMD (MI300A) GPUs within the National Research Platform (NRP) ecosystem. A total of 15 open-source LLMs, ranging from 117 million to 90 billion param…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: To appear in Proceedings of the Practice and Experience in Advanced Research Computing (PEARC '25)

    Journal ref: Proceedings of the Practice and Experience in Advanced Research Computing PEARC25 2025

  23. arXiv:2506.23803  [pdf, ps, other]

    cs.LG math.OC

    SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration

    Authors: Dmitry Kovalev

    Abstract: In this paper, we revisit stochastic gradient descent (SGD) with AdaGrad-type preconditioning. Our contributions are twofold. First, we develop a unified convergence analysis of SGD with adaptive preconditioning under anisotropic or matrix smoothness and noise assumptions. This allows us to recover state-of-the-art convergence results for several popular adaptive gradient methods, including AdaGra…

    Submitted 30 June, 2025; originally announced June 2025.

  24. arXiv:2506.22661  [pdf, ps, other]

    cs.SD eess.AS

    Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification

    Authors: R. Oguz Araz, Guillem Cortès-Sebastià, Emilio Molina, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov

    Abstract: Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often employ metric learning, where representation quality is influenced by the nature of the supervision and the utilized loss function. However, recent work unrealis…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ISMIR2025

  25. arXiv:2506.22631  [pdf, ps, other]

    cs.LG stat.ML

    A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS

    Authors: Dmitry B. Rokhlin

    Abstract: We study the problem of online regression with the unconstrained quadratic loss against a time-varying sequence of functions from a Reproducing Kernel Hilbert Space (RKHS). Recently, Jacobsen and Cutkosky (2024) introduced a discounted Vovk-Azoury-Warmuth (DVAW) forecaster that achieves optimal dynamic regret in the finite-dimensional case. In this work, we lift their approach to the non-parametri…

    Submitted 27 June, 2025; originally announced June 2025.

    MSC Class: 68Q32; 68W27; 68W20

  26. arXiv:2506.21782  [pdf, ps, other]

    cs.LG cs.RO

    M3PO: Massively Multi-Task Model-Based Policy Optimization

    Authors: Aditya Narendra, Dmitry Makarov, Aleksandr Panov

    Abstract: We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task domains. Existing model-based approaches like DreamerV3 rely on pixel-level generative models that neglect control-centric representations, while model-free meth…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 6 pages, 4 figures. Accepted at IEEE/RSJ IROS 2025. Full version, including appendix and implementation details

  27. arXiv:2506.21520  [pdf, ps, other]

    cs.CV

    MADrive: Memory-Augmented Driving Scene Modeling

    Authors: Polina Karpikova, Daniil Selikhanovych, Kirill Struminsky, Ruslan Musaev, Maria Golitsyna, Dmitry Baranchuk

    Abstract: Recent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconst…

    Submitted 26 June, 2025; originally announced June 2025.

  28. arXiv:2506.21170  [pdf, ps, other]

    cs.CL

    Compressed and Smooth Latent Space for Text Diffusion Modeling

    Authors: Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin, Aleksandr Abramov, Dmitry Vetrov

    Abstract: Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising alternative by enabling parallel generation and flexible control; however, their application to text generation is hindered by the high dimensionality of token-level…

    Submitted 26 June, 2025; originally announced June 2025.

  29. arXiv:2506.20890  [pdf, ps, other]

    math.NA cs.CE physics.comp-ph

    Multicontinuum Homogenization for Poroelasticity Model

    Authors: Dmitry Ammosov, Mohammed Al-Kobaisi, Yalchin Efendiev

    Abstract: In this paper, we derive multicontinuum poroelasticity models using the multicontinuum homogenization method. Poroelasticity models are widely used in many areas of science and engineering to describe coupled flow and mechanics processes in porous media. However, in many applications, the properties of poroelastic media possess high contrast, presenting serious computational challenges. It is well…

    Submitted 25 June, 2025; originally announced June 2025.

  30. arXiv:2506.20657  [pdf, ps, other]

    cs.DC hep-ex physics.ins-det

    SuperSONIC: Cloud-Native Infrastructure for ML Inferencing

    Authors: Dmitry Kondratyev, Benedikt Riedel, Yuan-Tang Chou, Miles Cochran-Branson, Noah Paladino, David Schultz, Mia Liu, Javier Duarte, Philip Harris, Shih-Chieh Hsu

    Abstract: The increasing computational demand from growing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments has driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) approach. SONIC accelerates ML inference by offloading it to local or remote coprocessors to optimize resource utilization. Leveraging its portability to differe…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Submission to PEARC25 Conference

  31. arXiv:2506.17001  [pdf, ps, other]

    cs.CL cs.IR

    PersonalAI: Towards digital twins in the graph form

    Authors: Mikhail Menschikov, Dmitry Evseev, Ruslan Kostoev, Ilya Perepechkin, Ilnaz Salimov, Victoria Dochkina, Petr Anokhin, Evgeny Burnaev, Nikita Semenov

    Abstract: The challenge of personalizing language models, specifically the ability to account for a user's history during interactions, is of significant interest. Despite recent advancements in large language models (LLMs) and Retrieval Augmented Generation that have enhanced the factual base of LLMs, the task of retaining extensive personal information and using it to generate personalized responses remai…

    Submitted 20 June, 2025; originally announced June 2025.

  32. arXiv:2506.15849  [pdf, ps, other]

    cs.RO cs.CV

    PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps

    Authors: Kirill Muravyev, Vasily Yuryev, Oleg Bulichev, Dmitry Yudin, Konstantin Yakovlev

    Abstract: Localization in the environment is one of the crucial tasks of navigation of a mobile robot or a self-driving vehicle. For long-range routes, performing localization within a dense global lidar map in real time may be difficult, and the creation of such a map may require much memory. To this end, leveraging topological maps may be useful. In this work, we propose PRISM-Loc -- a topological map-bas…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: This version was submitted and rejected from IROS 2025 conference

  33. arXiv:2506.15313  [pdf, ps, other]

    cs.CV cs.AI

    MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning

    Authors: Leonid Ivanov, Vasily Yuryev, Dmitry Yudin

    Abstract: In autonomous driving, high-definition (HD) maps and semantic maps in bird's-eye view (BEV) are essential for accurate localization, planning, and decision-making. This paper introduces an enhanced End-to-End model named MapFM for online vectorized HD map generation. We show significantly boost feature representation quality by incorporating powerful foundation model for encoding camera images. To…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Preprint. Submitted. 12 pages, 4 figures

  34. arXiv:2506.12774  [pdf, ps, other]

    math.CO cs.CG cs.DM

    On the Vertices of Delta-modular Polyhedra

    Authors: Bludov Mikhail, Gribanov Dmitry, Klimenko Maxim, Kupavskii Andrey, Lángi Zsolt, Rogozin Alexander, Voronov Vsevolod

    Abstract: Let $P$ be a polytope defined by the system $A x \leq b$, where $A \in R^{m \times n}$, $b \in R^m$, and $\text{rank}(A) = n$. We give a short geometric proof of the following tight upper bound on the number of vertices of $P$:…

    Submitted 15 June, 2025; originally announced June 2025.

  35. arXiv:2506.12770  [pdf, ps, other]

    quant-ph cs.AI physics.atom-ph

    Solving tricky quantum optics problems with assistance from (artificial) intelligence

    Authors: Manas Pandey, Bharath Hebbe Madhusudhana, Saikat Ghosh, Dmitry Budker

    Abstract: The capabilities of modern artificial intelligence (AI) as a "scientific collaborator" are explored by engaging it with three nuanced problems in quantum optics: state populations in optical pumping, resonant transitions between decaying states (the Burshtein effect), and degenerate mirrorless lasing. Through iterative dialogue, the authors observe that AI models--when prompted and corrected--ca…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 9 pages, 3 figures

  36. arXiv:2506.12022  [pdf, ps, other]

    cs.CC

    Sign-Rank of $k$-Hamming Distance is Constant

    Authors: Mika Göös, Nathaniel Harms, Valentin Imbach, Dmitry Sokolov

    Abstract: We prove that the sign-rank of the $k$-Hamming Distance matrix on $n$ bits is $2^{O(k)}$, independent of the number of bits $n$. This strongly refutes the conjecture of Hatami, Hatami, Pires, Tao, and Zhao (RANDOM 2022), and Hatami, Hosseini, and Meng (STOC 2023), repeated in several other papers, that the sign-rank should depend on $n$. This conjecture would have qualitatively separated margin fr…

    Submitted 1 May, 2025; originally announced June 2025.

    Comments: 19 pages, 6 figures

    MSC Class: 68Q15 ACM Class: F.1.3

  37. arXiv:2506.10801  [pdf, ps, other]

    cs.LG

    Dense Associative Memory with Epanechnikov Energy

    Authors: Benjamin Hoover, Zhaoyang Shi, Krishnakumar Balasubramanian, Dmitry Krotov, Parikshit Ram

    Abstract: We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant addit…

    Submitted 12 June, 2025; originally announced June 2025.

  38. arXiv:2506.10632  [pdf, ps, other]

    cs.LG cond-mat.stat-mech cs.CV math.DG math.ST

    Hessian Geometry of Latent Space in Generative Models

    Authors: Alexander Lobashev, Dmitry Guskov, Maria Larchenko, Mikhail Tamm

    Abstract: This paper presents a novel method for analyzing the latent space geometry of generative models, including statistical physics models and diffusion models, by reconstructing the Fisher information metric. The method approximates the posterior distribution of latent variables given generated samples and uses this to learn the log-partition function, which defines the Fisher metric for exponential f…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  39. arXiv:2506.09625  [pdf, ps, other]

    cs.LG

    GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras

    Authors: Ekaterina Filimoshina, Dmitry Shirokov

    Abstract: We propose, implement, and compare with competitors a new architecture of equivariant neural networks based on geometric (Clifford) algebras: Generalized Lipschitz Group Equivariant Neural Networks (GLGENN). These networks are equivariant to all pseudo-orthogonal transformations, including rotations and reflections, of a vector space with any non-degenerate or degenerate symmetric bilinear form. W…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025

    MSC Class: 68T07; 15A66

  40. arXiv:2506.09440  [pdf, ps, other]

    cs.CL cs.AI

    GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture

    Authors: GigaChat team, Mamedov Valentin, Evgenii Kosarev, Gregory Leleytner, Ilya Shchuckin, Valeriy Berezovskiy, Daniil Smirnov, Dmitry Kozlov, Sergei Averkiev, Lukyanenko Ivan, Aleksandr Proshunin, Ainur Israfilova, Ivan Baskov, Artem Chervyakov, Emil Shakirov, Mikhail Kolesov, Daria Khomich, Darya Latortseva, Sergei Porkhun, Yury Fedorov, Oleg Kutuzov, Polina Kudriavtseva, Sofiia Soldatova, Kolodin Egor, Stanislav Pyatkin, et al. (9 additional authors not shown)

    Abstract: Generative large language models (LLMs) have become crucial for modern NLP research and applications across various languages. However, the development of foundational models specifically tailored to the Russian language has been limited, primarily due to the significant computational resources required. This paper introduces the GigaChat family of Russian LLMs, available in various sizes, includi…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: ACL-2025 System Demo

  41. arXiv:2506.07216  [pdf, ps, other]

    cs.CV

    AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance?

    Authors: Nada Aboudeshish, Dmitry Ignatov, Radu Timofte

    Abstract: Data augmentation is a crucial technique in deep learning, particularly for tasks with limited dataset diversity, such as skeleton-based datasets. This paper proposes a comprehensive data augmentation framework that integrates geometric transformations, random cropping, rotation, zooming and intensity-based transformations, brightness and contrast adjustments to simulate real-world variations. Ran…

    Submitted 8 June, 2025; originally announced June 2025.

  42. arXiv:2506.05396  [pdf, ps, other]

    cs.CV eess.IV

    Talk2SAM: Text-Guided Semantic Enhancement for Complex-Shaped Object Segmentation

    Authors: Luka Vetoshkin, Dmitry Yudin

    Abstract: Segmenting objects with complex shapes, such as wires, bicycles, or structural grids, remains a significant challenge for current segmentation models, including the Segment Anything Model (SAM) and its high-quality variant SAM-HQ. These models often struggle with thin structures and fine boundaries, leading to poor segmentation quality. We propose Talk2SAM, a novel approach that integrates textual…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 14 pages, 7 figures, Submitted to the conference

  43. arXiv:2506.04839  [pdf, other]

    cs.IT

    Iterative Neural Rollback Chase-Pyndiah Decoding

    Authors: Dmitry Artemasov, Oleg Nesterenkov, Kirill Andreev, Pavel Rybin, Alexey Frolov

    Abstract: Iterative decoding is essential in modern communication systems, especially optical communications, where error-correcting codes such as turbo product codes (TPC) and staircase codes are widely employed. A key factor in achieving high error correction performance is the use of soft-decision decoding for component codes. However, implementing optimal maximum a posteriori (MAP) probability decoding…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  44. arXiv:2506.04505  [pdf, ps, other]

    cs.RO cs.LG

    SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning

    Authors: Nikita Oskolkov, Huzhenyu Zhang, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov

    Abstract: The 3D scene graph models spatial relationships between objects, enabling the agent to efficiently navigate in a partially observable environment and predict the location of the target object. This paper proposes an original framework named SGN-CIRL (3D Scene Graph-Based Reinforcement Learning Navigation) for mapless reinforcement learning-based robot navigation with learnable representation of ope…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 7 pages, 11 figures

  45. arXiv:2506.04359  [pdf, ps, other]

    cs.RO cs.AI cs.SE

    cuVSLAM: CUDA accelerated visual odometry and mapping

    Authors: Alexander Korovko, Dmitry Slepichev, Alexander Efitorov, Aigul Dzhumamuratova, Viktor Kuznetsov, Hesam Rabeti, Joydeep Biswas, Soha Pouya

    Abstract: Accurate and robust pose estimation is a key requirement for any autonomous robot. We present cuVSLAM, a state-of-the-art solution for visual simultaneous localization and mapping, which can operate with a variety of visual-inertial sensor suites, including multiple RGB and depth cameras, and inertial measurement units. cuVSLAM supports operation with as few as one RGB camera to as many as 32 came…

    Submitted 8 July, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  46. arXiv:2506.03073  [pdf, ps, other]

    cs.CV

    LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM

    Authors: Roman Titkov, Egor Zubkov, Dmitry Yudin, Jaafar Mahmoud, Malik Mohrat, Gennady Sidorov

    Abstract: Modern Gaussian Splatting methods have proven highly effective for real-time photorealistic rendering of 3D scenes. However, integrating semantic information into this representation remains a significant challenge, especially in maintaining real-time performance for SLAM (Simultaneous Localization and Mapping) applications. In this work, we introduce LEG-SLAM -- a novel approach that fuses an opt…

    Submitted 3 June, 2025; originally announced June 2025.

  47. arXiv:2506.02276  [pdf, ps, other]

    cs.LG stat.ML

    Latent Stochastic Interpolants

    Authors: Saurabh Singh, Dmitry Lagun

    Abstract: Stochastic Interpolants (SI) are a powerful framework for generative modeling, capable of flexibly transforming between two probability distributions. However, their use in jointly optimized latent variable models remains unexplored as they require direct access to the samples from the two distributions. This work presents Latent Stochastic Interpolants (LSI) enabling joint learning in a latent sp…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Under Review

  48. arXiv:2506.01541  [pdf, ps, other]

    cs.LG stat.ML

    Adaptive Destruction Processes for Diffusion Samplers

    Authors: Timofei Gritsaev, Nikita Morozov, Kirill Tamogashev, Daniil Tiapkin, Sergey Samsonov, Alexey Naumov, Dmitry Vetrov, Nikolay Malkin

    Abstract: This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers -- diffusion-based generative models trained to sample an unnormalised density without access to data samples. Contrary to the majority of work that views diffusion samplers as approximations to an underlying continuous-time model, we view diffusion models as discrete-time policies trained to p…

    Submitted 2 June, 2025; originally announced June 2025.

  49. arXiv:2505.23489  [pdf, ps, other]

    cs.LG

    SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training

    Authors: Ildus Sadrtdinov, Ivan Klimov, Ekaterina Lobacheva, Dmitry Vetrov

    Abstract: We present a thermodynamic interpretation of the stationary behavior of stochastic gradient descent (SGD) under fixed learning rates (LRs) in neural network training. We show that SGD implicitly minimizes a free energy function $F=U-TS$, balancing training loss $U$ and the entropy of the weights distribution $S$, with temperature $T$ determined by the LR. This perspective offers a new lens on why…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: First two authors contributed equally

  50. arXiv:2505.23299  [pdf, ps, other]

    cs.CL

    Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs

    Authors: Julia Belikova, Konstantin Polev, Rauf Parchiev, Dmitry Simakov

    Abstract: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications, yet their reliability remains hampered by challenges in detecting hallucinations. While supervised state-of-the-art (SOTA) methods that leverage LLM hidden states -- such as activation tracing and representation analysis -- show promise, their dependence on extensively…

    Submitted 29 May, 2025; originally announced May 2025.