Showing 1–50 of 645 results for author: Das, A

Searching in archive cs.
  1. arXiv:2507.12189  [pdf, ps, other]

    quant-ph cs.AI cs.LG cs.PF

    BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search

    Authors: Azhar Ikhtiarudin, Aditi Das, Param Thakkar, Akash Kundu

    Abstract: We introduce BenchRL-QAS, a unified benchmarking framework for systematically evaluating reinforcement learning (RL) algorithms in quantum architecture search (QAS) across diverse variational quantum algorithm tasks and system sizes ranging from 2- to 8-qubit. Our study benchmarks nine RL agents including both value-based and policy-gradient methods on representative quantum problems such as varia…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Comprehensive RL agent benchmark for QAS. Contributions are welcomed here: https://github.com/azhar-ikhtiarudin/bench-rlqas

  2. Secure and Efficient Quantum Signature Scheme Based on the Controlled Unitary Operations Encryption

    Authors: Debnath Ghosh, Soumit Roy, Prithwi Bagchi, Indranil Chakrabarty, Ashok Kumar Das

    Abstract: Quantum digital signatures ensure unforgeable message authenticity and integrity using quantum principles, offering unconditional security against both classical and quantum attacks. They are crucial for secure communication in high-stakes environments, ensuring trust and long-term protection in the quantum era. Nowadays, the majority of arbitrated quantum signature (AQS) protocols encrypt data qu…

    Submitted 14 July, 2025; originally announced July 2025.

    Comments: 22 pages, 3 figures. Accepted in Quantum Information Processing

    Report number: SN-1573-1332

    Journal ref: Quantum Inf Process 24, 227 (2025)

  3. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  4. arXiv:2507.02949  [pdf, ps, other]

    cs.CL

    RADIANT: Retrieval AugmenteD entIty-context AligNmenT -- Introducing RAG-ability and Entity-Context Divergence

    Authors: Vipula Rawte, Rajarshi Roy, Gurpreet Singh, Danush Khanna, Yaswanth Narsupalli, Basab Ghosh, Abhay Gupta, Argha Kamal Samanta, Aditya Shingote, Aadi Krishna Vikram, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

    Abstract: As Large Language Models (LLMs) continue to advance, Retrieval-Augmented Generation (RAG) has emerged as a vital technique to enhance factual accuracy by integrating external knowledge into the generation process. However, LLMs often fail to faithfully integrate retrieved evidence into their generated responses, leading to factual inconsistencies. To quantify this gap, we introduce Entity-Context…

    Submitted 28 June, 2025; originally announced July 2025.

  5. arXiv:2507.02900  [pdf, ps, other]

    cs.CV cs.AI cs.GR cs.HC cs.MM

    Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions

    Authors: Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta

    Abstract: Talking Head Generation (THG) has emerged as a transformative technology in computer vision, enabling the synthesis of realistic human faces synchronized with image, audio, text, or video inputs. This paper provides a comprehensive review of methodologies and frameworks for talking head generation, categorizing approaches into 2D--based, 3D--based, Neural Radiance Fields (NeRF)--based, diffusion--…

    Submitted 23 June, 2025; originally announced July 2025.

  6. PosDiffAE: Position-aware Diffusion Auto-encoder For High-Resolution Brain Tissue Classification Incorporating Artifact Restoration

    Authors: Ayantika Das, Moitreya Chaudhuri, Koushik Bhat, Keerthi Ram, Mihail Bota, Mohanasankar Sivaprakasam

    Abstract: Denoising diffusion models produce high-fidelity image samples by capturing the image distribution in a progressive manner while initializing with a simple distribution and compounding the distribution complexity. Although these models have unlocked new applicabilities, the sampling mechanism of diffusion does not offer means to extract image-specific semantic representation, which is inherently p…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Published in IEEE Journal of Biomedical and Health Informatics (Early Access Available) https://ieeexplore.ieee.org/document/10989734

  7. arXiv:2506.23971  [pdf, ps, other]

    cs.LG

    UMA: A Family of Universal Models for Atoms

    Authors: Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick

    Abstract: The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalizat…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 29 pages, 5 figures

  8. arXiv:2506.22960  [pdf, ps, other]

    cs.CV

    Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images

    Authors: Shreyas Dixit, Ashhar Aziz, Shashwat Bajpai, Vasu Sharma, Aman Chadha, Vinija Jain, Amitava Das

    Abstract: A report by the European Union Law Enforcement Agency predicts that by 2026, up to 90 percent of online content could be synthetically generated, raising concerns among policymakers, who cautioned that "Generative AI could act as a force multiplier for political disinformation. The combined effect of generative text, images, videos, and audio may surpass the influence of any single modality." In r…

    Submitted 28 June, 2025; originally announced June 2025.

  9. arXiv:2506.22396  [pdf, ps, other]

    cs.CL cs.AI

    QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization

    Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh

    Abstract: Inference accounts for the majority of latency and energy consumption in large language model (LLM) deployments, often exceeding 90% of total cost. While training-time efficiency has seen extensive progress, runtime optimization remains a key bottleneck, particularly under autoregressive decoding. Existing approaches -- such as pruning, quantization, early exits, and speculative decoding -- often…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Preprint. Under submission

    ACM Class: I.2.0; I.2.7

  10. arXiv:2506.19487  [pdf, ps, other]

    cs.ET

    TRMAC: A Time-Reversal-based MAC Protocol for Wireless Networks within Computing Packages

    Authors: Ama Bandara, Abhijit Das, Fatima Rodriguez-Galan, Eduard Alarcon, Sergi Abadal

    Abstract: As chiplet-based integration and many-core architectures become the norm in high-performance computing, on-chip wireless communication has emerged as a compelling alternative to traditional interconnects. However, scalable Medium Access Control (MAC) remains a fundamental challenge, particularly under dense traffic and limited spectral resources. This paper presents TRMAC, a novel cross-layer MAC…

    Submitted 24 June, 2025; originally announced June 2025.

  11. arXiv:2506.17721  [pdf, ps, other]

    cs.DC cs.MA

    Distributed Butterfly Analysis using Mobile Agents

    Authors: Prabhat Kumar Chand, Apurba Das, Anisur Rahaman Molla

    Abstract: Butterflies, or 4-cycles in bipartite graphs, are crucial for identifying cohesive structures and dense subgraphs. While agent-based data mining is gaining prominence, its application to bipartite networks remains relatively unexplored. We propose distributed, agent-based algorithms for \emph{Butterfly Counting} in a bipartite graph $G((A,B),E)$. Agents first determine their respective partitions…

    Submitted 21 June, 2025; originally announced June 2025.

  12. arXiv:2506.15853  [pdf]

    eess.IV cs.AI cs.CV

    Cross-Modality Learning for Predicting IHC Biomarkers from H&E-Stained Whole-Slide Images

    Authors: Amit Das, Naofumi Tomita, Kyle J. Syme, Weijie Ma, Paige O'Connor, Kristin N. Corbett, Bing Ren, Xiaoying Liu, Saeed Hassanpour

    Abstract: Hematoxylin and Eosin (H&E) staining is a cornerstone of pathological analysis, offering reliable visualization of cellular morphology and tissue architecture for cancer diagnosis, subtyping, and grading. Immunohistochemistry (IHC) staining provides molecular insights by detecting specific proteins within tissues, enhancing diagnostic accuracy, and improving treatment planning. However, IHC staini…

    Submitted 18 June, 2025; originally announced June 2025.

  13. arXiv:2506.15488  [pdf, ps, other]

    cs.DC

    Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

    Abstract: In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a $3$-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a $3$-dimensional symmetric tensor and in gradient-based methods for computing a symmetric CP decomposition. We establish communication lower bounds that determine…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 19 pages, 1 figure

  14. arXiv:2506.14903  [pdf, ps, other]

    cs.CV

    DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization

    Authors: Renjith Prasad, Abhilekh Borah, Hasnat Md Abdullah, Chathurangi Shyalika, Gurpreet Singh, Ritvik Garimella, Rajarshi Roy, Harshul Surana, Nasrin Imanpour, Suranjana Trivedy, Amit Sheth, Amitava Das

    Abstract: Alignment is crucial for text-to-image (T2I) models to ensure that generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO), prominent in large language models (LLMs), is extending its influence to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension enhancing alignment across three dimensions: (i) Hybri…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 59 pages, 10 figures

  15. arXiv:2506.14204  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios

    Authors: Aswin Shanmugam Subramanian, Amit Das, Naoyuki Kanda, Jinyu Li, Xiaofei Wang, Yifan Gong

    Abstract: We extend the frameworks of Serialized Output Training (SOT) to address practical needs of both streaming and offline automatic speech recognition (ASR) applications. Our approach focuses on balancing latency and accuracy, catering to real-time captioning and summarization requirements. We propose several key improvements: (1) Leveraging Continuous Speech Separation (CSS) single-channel front-end…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  16. arXiv:2506.13901  [pdf, ps, other]

    cs.CL cs.AI

    Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations

    Authors: Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das

    Abstract: Alignment is no longer a luxury, it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spo…

    Submitted 16 June, 2025; originally announced June 2025.

  17. arXiv:2506.12073  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis

    Authors: Zongli Ye, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Haodong Li, Shuhe Li, Chenxu Guo, Anaisha Das, Peter Park, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Accurate alignment of dysfluent speech with intended text is crucial for automating the diagnosis of neurodegenerative speech disorders. Traditional methods often fail to model phoneme similarities effectively, limiting their performance. In this work, we propose Neural LCS, a novel approach for dysfluent text-text and speech-text alignment. Neural LCS addresses key challenges, including partial a…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted for Interspeech2025

  18. arXiv:2506.11286  [pdf, ps, other]

    cs.NE

    Mapping and Scheduling Spiking Neural Networks On Segmented Ladder Bus Architectures

    Authors: Phu Khanh Huynh, Francky Catthoor, Anup Das

    Abstract: Large-scale neuromorphic architectures consist of computing tiles that communicate spikes using a shared interconnect. The communication patterns in these systems are inherently sparse, asynchronous, and localized, as neural activity is characterized by temporal sparsity with occasional bursts of high traffic. These characteristics require optimized interconnects to handle high-activity bursts whi…

    Submitted 12 June, 2025; originally announced June 2025.

  19. arXiv:2506.08991  [pdf, ps, other]

    cs.CV cs.CR

    Do Concept Replacement Techniques Really Erase Unacceptable Concepts?

    Authors: Anudeep Das, Gurjot Singh, Prach Chantasantitam, N. Asokan

    Abstract: Generative models, particularly diffusion-based text-to-image (T2I) models, have demonstrated astounding success. However, aligning them to avoid generating content with unacceptable concepts (e.g., offensive or copyrighted content, or celebrity likenesses) remains a significant challenge. Concept replacement techniques (CRTs) aim to address this challenge, often by trying to "erase" unacceptable…

    Submitted 10 June, 2025; originally announced June 2025.

  20. arXiv:2506.08885  [pdf, ps, other]

    cs.CL cs.LG

    AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)

    Authors: Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das

    Abstract: Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent thereby evading surface level defenses like Direct Preference Optimization (DPO), which remain blind to the latent ge…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  21. arXiv:2506.03392  [pdf, ps, other]

    cs.LG cs.NE eess.SY

    Improving Performance of Spike-based Deep Q-Learning using Ternary Neurons

    Authors: Aref Ghoreishee, Abhishek Mishra, John Walsh, Anup Das, Nagarajan Kandasamy

    Abstract: We propose a new ternary spiking neuron model to improve the representation capacity of binary spiking neurons in deep Q-learning. Although a ternary neuron model has recently been introduced to overcome the limited representation capacity offered by the binary spiking neurons, we show that its performance is worse than that of binary models in deep Q-learning tasks. We hypothesize gradient estima…

    Submitted 3 June, 2025; originally announced June 2025.

  22. arXiv:2506.00653  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models

    Authors: Femi Bello, Anubrata Das, Fanzhi Zeng, Fangcong Yin, Liu Leqi

    Abstract: It has been hypothesized that neural networks with similar architectures trained on similar data learn shared representations relevant to the learning task. We build on this idea by extending the conceptual framework where representations learned across models trained on the same data can be expressed as linear combinations of a \emph{universal} set of basis features. These basis features underlie…

    Submitted 4 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

  23. arXiv:2505.23503  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Can Large Language Models Challenge CNNs in Medical Image Analysis?

    Authors: Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

    Abstract: This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environment…

    Submitted 3 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  24. arXiv:2505.21597  [pdf, other]

    eess.IV cs.CV cs.LG

    Optimizing Deep Learning for Skin Cancer Classification: A Computationally Efficient CNN with Minimal Accuracy Trade-Off

    Authors: Abdullah Al Mamun, Pollob Chandra Ray, Md Rahat Ul Nasib, Akash Das, Jia Uddin, Md Nurul Absur

    Abstract: The rapid advancement of deep learning in medical image analysis has greatly enhanced the accuracy of skin cancer classification. However, current state-of-the-art models, especially those based on transfer learning like ResNet50, come with significant computational overhead, rendering them impractical for deployment in resource-constrained environments. This study proposes a custom CNN model that…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 6 pages, & 7 Images

  25. arXiv:2505.17584  [pdf, ps, other]

    eess.AS cs.SD

    Private kNN-VC: Interpretable Anonymization of Converted Speech

    Authors: Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller

    Abstract: Speaker anonymization seeks to conceal a speaker's identity while preserving the utility of their speech. The achieved privacy is commonly evaluated with a speaker recognition model trained on anonymized speech. Although this represents a strong attack, it is unclear which aspects of speech are exploited to identify the speakers. Our research sets out to unveil these aspects. It starts with kNN-VC…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  26. arXiv:2505.17332  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.MA

    SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

    Authors: Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae

    Abstract: Enterprise customers are increasingly adopting Large Language Models (LLMs) for critical communication tasks, such as drafting emails, crafting sales pitches, and composing casual messages. Deploying such models across different regions requires them to understand diverse cultural and linguistic contexts and generate safe and respectful responses. For enterprise applications, it is crucial to miti…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Published in the Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Industry Track, pages 558-582

    ACM Class: I.2.7; I.2.6

  27. arXiv:2505.16986  [pdf, ps, other]

    cs.CL cs.AI

    T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning

    Authors: Amartya Chakraborty, Paresh Dashore, Nadia Bathaee, Anmol Jain, Anirban Das, Shi-Xiong Zhang, Sambit Sahu, Milind Naphade, Genta Indra Winata

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities as intelligent agents capable of solving complex problems. However, effective planning in scenarios involving dependencies between API or tool calls-particularly in multi-turn conversations-remains a significant challenge. To address this, we introduce T1, a tool-augmented, multi-domain, multi-turn conversational dataset specif…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Preprint

  28. arXiv:2505.16351  [pdf, other]

    eess.AS cs.AI

    Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

    Authors: Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Hwi Joo Park, Anaisha Das, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-sh…

    Submitted 24 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted for Interspeech2025

  29. arXiv:2505.12217  [pdf, ps, other]

    cs.CV

    Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models

    Authors: Aryan Das, Tanishq Rachamalla, Pravendra Singh, Koushik Biswas, Vinay Kumar Verma, Swalpa Kumar Roy

    Abstract: We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery.…

    Submitted 17 May, 2025; originally announced May 2025.

  30. arXiv:2505.10303  [pdf, ps, other]

    math.LO cs.FL cs.LO

    An algebraic theory of ω-regular languages, via μν-expressions

    Authors: Anupam Das, Abhishek De

    Abstract: Alternating parity automata (APAs) provide a robust formalism for modelling infinite behaviours and play a central role in formal verification. Despite their widespread use, the algebraic theory underlying APAs has remained largely unexplored. In recent work, a notation for non-deterministic finite automata (NFAs) was introduced, along with a sound and complete axiomatisation of their equational t…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Preprint

  31. arXiv:2505.09000  [pdf, ps, other]

    cs.LO cs.FL math.LO

    Cyclic system for an algebraic theory of alternating parity automata

    Authors: Anupam Das, Abhishek De

    Abstract: $ω…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 26 pages, 3 figures

  32. arXiv:2505.08910  [pdf, ps, other]

    cs.CV cs.CL

    Behind Maya: Building a Multilingual Vision Language Model

    Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

    Abstract: In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pre…

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted at VLMs4ALL CVPR 2025 Workshop; corrected workshop name spelling

  33. arXiv:2505.04181  [pdf]

    cs.CR

    Privacy Challenges In Image Processing Applications

    Authors: Maneesha, Bharat Gupta, Rishabh Sethi, Charvi Adita Das

    Abstract: As image processing systems proliferate, privacy concerns intensify given the sensitive personal information contained in images. This paper examines privacy challenges in image processing and surveys emerging privacy-preserving techniques including differential privacy, secure multiparty computation, homomorphic encryption, and anonymization. Key applications with heightened privacy risks include…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 19 pages, 3 figures

  34. arXiv:2505.03983  [pdf, other]

    cs.LG cs.AI

    Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation

    Authors: Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property.…

    Submitted 6 May, 2025; originally announced May 2025.

  35. arXiv:2504.20976  [pdf, other]

    cs.HC

    Real-Time Wayfinding Assistant for Blind and Low-Vision Users

    Authors: Dabbrata Das, Argho Deb Das, Farhan Sadaf

    Abstract: Navigating unfamiliar places continues to be one of the most persistent and essential everyday obstacles for those who are blind or have limited vision (BLV). Existing assistive technologies, such as GPS-based navigation systems, AI-powered smart glasses, and sonar-equipped canes, often face limitations in real-time obstacle avoidance, precise localization, and adaptability to dynamic surroundings…

    Submitted 29 April, 2025; originally announced April 2025.

  36. arXiv:2504.19967  [pdf]

    cs.ET cs.AI cs.LG stat.AP

    Enhancing short-term traffic prediction by integrating trends and fluctuations with attention mechanism

    Authors: Adway Das, Agnimitra Sengupta, S. Ilgin Guler

    Abstract: Traffic flow prediction is a critical component of intelligent transportation systems, yet accurately forecasting traffic remains challenging due to the interaction between long-term trends and short-term fluctuations. Standard deep learning models often struggle with these challenges because their architectures inherently smooth over fine-grained fluctuations while focusing on general trends. Thi…

    Submitted 28 April, 2025; originally announced April 2025.

  37. arXiv:2504.19061  [pdf, other]

    cs.CL cs.AI cs.HC

    Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

    Authors: Anindya Bijoy Das, Shibbir Ahmed, Shahnewaz Karim Sakib

    Abstract: Clinical summarization is crucial in healthcare as it distills complex medical data into digestible information, enhancing patient understanding and care management. Large language models (LLMs) have shown significant potential in automating and improving the accuracy of such summarizations due to their advanced natural language understanding capabilities. These models are particularly applicable…

    Submitted 26 April, 2025; originally announced April 2025.

  38. arXiv:2504.18887  [pdf, ps, other]

    cs.IT eess.SP

    Closed-Form Expressions for I/O Relation in Zak-OTFS with Different Delay-Doppler Filters

    Authors: Arpan Das, Fathima Jesbin, Ananthanarayanan Chockalingam

    Abstract: The transceiver operations in the delay-Doppler (DD) domain in Zak-OTFS modulation, including DD domain filtering at the transmitter and receiver, involve twisted convolution operation. The twisted convolution operations give rise to multiple integrals in the end-to-end DD domain input-output (I/O) relation. The I/O relation plays a crucial role in performance evaluation and algorithm development…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: IEEE TVT. Copyright IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  39. arXiv:2504.09249  [pdf, other]

    cs.CV cs.IR cs.LG cs.MM

    NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding

    Authors: Aniket Pal, Sanket Biswas, Alloy Das, Ayush Lodh, Priyanka Banerjee, Soumitri Chattopadhyay, Dimosthenis Karatzas, Josep Llados, C. V. Jawahar

    Abstract: Understanding and reasoning over academic handwritten notes remains a challenge in document AI, particularly for mathematical equations, diagrams, and scientific notations. Existing visual question answering (VQA) benchmarks focus on printed or structured handwritten text, limiting generalization to real-world note-taking. To address this, we introduce NoTeS-Bank, an evaluation benchmark for Neura…

    Submitted 12 April, 2025; originally announced April 2025.

  40. arXiv:2504.03603  [pdf, other]

    cs.AI cs.LG

    Towards deployment-centric multimodal AI beyond vision and language

    Authors: Xianyuan Liu, Jiayang Zhang, Shuo Zhou, Thijs L. van der Plas, Avish Vijayaraghavan, Anastasiia Grishina, Mengdie Zhuang, Daniel Schofield, Christopher Tomlinson, Yuhan Wang, Ruizhe Li, Louisa van Zeeland, Sina Tabakhi, Cyndie Demeocq, Xiang Li, Arunav Das, Orlando Timmerman, Thomas Baldwin-McDonald, Jinge Wu, Peizhen Bai, Zahraa Al Sahili, Omnia Alwazzan, Thao N. Do, Mohammod N. I. Suvon, Angeline Wang, et al. (23 additional authors not shown)

    Abstract: Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that in…

    Submitted 4 April, 2025; originally announced April 2025.

  41. arXiv:2504.02671  [pdf, other]

    cs.CL

    LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems

    Authors: Zishuo Liu, Carlos Rabat Villarreal, Mostafa Rahgouy, Amit Das, Zheng Zhang, Chang Ren, Dongji Feng

    Abstract: Fermi Problems (FPs) are mathematical reasoning tasks that require human-like logic and numerical reasoning. Unlike other reasoning questions, FPs often involve real-world impracticalities or ambiguous concepts, making them challenging even for humans to solve. Despite advancements in AI, particularly with large language models (LLMs) in various reasoning tasks, FPs remain relatively under-explore…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 7 pages, 7 tables, 5 figures

  42. arXiv:2504.01281  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.IR

    Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding

    Authors: Sakhinana Sagar Srinivas, Akash Das, Shivam Gupta, Venkataramana Runkana

    Abstract: We present a comprehensive framework for enhancing Retrieval-Augmented Generation (RAG) systems through dynamic retrieval strategies and reinforcement fine-tuning. This approach significantly improves large language models on knowledge-intensive tasks, including open-domain question answering and complex reasoning. Our framework integrates two complementary techniques: Policy-Optimized RetrievalAug…

    Submitted 20 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  43. arXiv:2504.00338  [pdf, other]

    cs.LG cs.AI cs.MA cs.SI

    Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework

    Authors: Sakhinana Sagar Srinivas, Akash Das, Shivam Gupta, Venkataramana Runkana

    Abstract: The growing use of foundation models (FMs) in real-world applications demands adaptive, reliable, and efficient strategies for dynamic markets. In the chemical industry, AI-discovered materials drive innovation, but commercial success hinges on market adoption, requiring FM-driven advertising frameworks that operate in-the-wild. We present a multilingual, multimodal AI framework for autonomous, hy…

    Submitted 31 March, 2025; originally announced April 2025.

  44. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  45. arXiv:2503.14053  [pdf, other]

    cs.LG cs.AI eess.SY

    ON-Traffic: An Operator Learning Framework for Online Traffic Flow Estimation and Uncertainty Quantification from Lagrangian Sensors

    Authors: Jake Rap, Amritam Das

    Abstract: Accurate traffic flow estimation and prediction are critical for the efficient management of transportation systems, particularly under increasing urbanization. Traditional methods relying on static sensors often suffer from limited spatial coverage, while probe vehicles provide richer, albeit sparse and irregular data. This work introduces ON-Traffic, a novel deep operator Network and a receding…

    Submitted 18 March, 2025; originally announced March 2025.

  46. arXiv:2503.12996  [pdf, ps, other]

    cs.CC cs.DS

    Semi-Streaming Algorithms for Graph Property Certification

    Authors: Avinandan Das, Pierre Fraigniaud, Ami Paz, Adi Rosen

    Abstract: We introduce the \emph{certification} of solutions to graph problems when access to the input is restricted. This topic has received a lot of attention in the distributed computing setting, and we introduce it here in the context of \emph{streaming} algorithms, where the input is too large to be stored in memory. Given a graph property $\mbox{P}$, a \emph{streaming certification scheme} for…

    Submitted 17 March, 2025; originally announced March 2025.

  47. arXiv:2503.10852  [pdf, ps, other]

    math.CO cs.DM

    New Vertex Ordering Characterizations of Circular-Arc Bigraphs

    Authors: Indrajit Paul, Ashok Kumar Das

    Abstract: In this article, we present two new characterizations of circular-arc bigraphs based on their vertex ordering. Also, we provide a characterization of circular-arc bigraphs in terms of forbidden patterns with respect to a particular ordering of their vertices.

    Submitted 8 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    MSC Class: 05C90; 68R10; 05C10

  48. arXiv:2503.10690  [pdf, other]

    cs.CL cs.CR

    Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models

    Authors: Shahnewaz Karim Sakib, Anindya Bijoy Das, Shibbir Ahmed

    Abstract: Adversarial factuality refers to the deliberate insertion of misinformation into input prompts by an adversary, characterized by varying levels of expressed confidence. In this study, we systematically evaluate the performance of several open-source large language models (LLMs) when exposed to such adversarial inputs. Three tiers of adversarial confidence are considered: strongly confident, modera…

    Submitted 11 March, 2025; originally announced March 2025.

  49. arXiv:2503.09894  [pdf, other]

    cs.CL

    What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models

    Authors: Abhipsha Das, Nicholas Lourie, Siavash Golkar, Mariel Pettee

    Abstract: The scientific literature's exponential growth makes it increasingly challenging to navigate and synthesize knowledge across disciplines. Large language models (LLMs) are powerful tools for understanding scientific text, but they fail to capture detailed relationships across large bodies of work. Unstructured approaches, like retrieval augmented generation, can sift through such corpora to recall…

    Submitted 28 May, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 9 pages, 5 pdf figures

  50. arXiv:2503.09184  [pdf, other]

    cs.CR cs.DC cs.LG cs.PF

    Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs

    Authors: Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano

    Abstract: The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). This paper explores unstructured sparsity in FHE matrix multiplication schemes as a means of reducing this burden while maintaining model accuracy requirements. We demonstrate that sparsity can be exploited in arbitrary matrix multiplicati…

    Submitted 3 April, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted to 5th Workshop on Machine Learning and Systems (EuroMLSys) co-located with EuroSys '25