
Showing 1–50 of 2,927 results for author: Lee, S

Searching in archive cs. Search in all archives.
  1. arXiv:2511.15276  [pdf, ps, other]

    cs.LG

    SNAP: Low-Latency Test-Time Adaptation with Sparse Updates

    Authors: Hyeongheon Cha, Dong Min Kim, Hye Won Chung, Taesik Gong, Sung-Ju Lee

    Abstract: Test-Time Adaptation (TTA) adjusts models using unlabeled test data to handle dynamic distribution shifts. However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. SNAP maint… ▽ More

    Submitted 19 November, 2025; originally announced November 2025.

    Journal ref: Advances in Neural Information Processing Systems 39 (NeurIPS 2025)

  2. Personalized targeted memory reactivation enhances consolidation of challenging memories via slow wave and spindle dynamics

    Authors: Gi-Hwan Shin, Young-Seok Kweon, Seungwon Oh, Seong-Whan Lee

    Abstract: Sleep is crucial for memory consolidation, underpinning effective learning. Targeted memory reactivation (TMR) can strengthen neural representations by re-engaging learning circuits during sleep. However, TMR protocols overlook individual differences in learning capacity and memory trace strength, limiting efficacy for difficult-to-recall memories. Here, we present a personalized TMR protocol that… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

    Journal ref: npj Science of Learning 10 (1), 47 (2025)

  3. arXiv:2511.14282  [pdf, ps, other]

    cs.LG cs.AI

    Weight Variance Amplifier Improves Accuracy in High-Sparsity One-Shot Pruning

    Authors: Vincent-Daniel Yun, Junhyuk Jo, Sunwoo Lee

    Abstract: Deep neural networks achieve outstanding performance in visual recognition tasks, yet their large number of parameters makes them less practical for real-world applications. Recently, one-shot pruning has emerged as an effective strategy for reducing model size without additional training. However, models trained with standard objective functions often suffer a significant drop in accuracy after a… ▽ More

    Submitted 18 November, 2025; originally announced November 2025.

  4. Dynamic Black-box Backdoor Attacks on IoT Sensory Data

    Authors: Ajesh Koyatan Chathoth, Stephen Lee

    Abstract: Sensor data-based recognition systems are widely used in various applications, such as gait-based authentication and human activity recognition (HAR). Modern wearable and smart devices feature various built-in Inertial Measurement Unit (IMU) sensors, and such sensor-based measurements can be fed to a machine learning-based model to train and classify human activities. While deep learning-based mod… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Journal ref: 2024, pp. 182-191

  5. ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space

    Authors: Jun-Hyoung Park, Ho-Jun Song, Seong-Whan Lee

    Abstract: Deep learning-based molecular generation models have shown great potential in efficiently exploring vast chemical spaces by generating potential drug candidates with desired properties. However, these models often produce chemically invalid molecules, which limits the usable scope of the learned chemical space and poses significant challenges for practical applications. To address this issue, we p… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: This is the author's preprint version of the article accepted to IEEE JBHI. Final published version: https://doi.org/10.1109/JBHI.2025.3593825. High-quality PDF (publisher version): https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11106678. Note: Some figures may appear distorted due to arXiv's TeXLive rendering

    Journal ref: ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space, IEEE Journal of Biomedical and Health Informatics, Early Access, 2025

  6. arXiv:2511.13739  [pdf, ps, other]

    q-bio.NC cs.AI cs.SD

    Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration

    Authors: Byung-Kwan Ko, Soowon Kim, Seo-Hyun Lee

    Abstract: Achieving robust generalization across individuals remains a major challenge in electroencephalogram based imagined speech decoding due to substantial variability in neural activity patterns. This study examined how training dynamics and lightweight subject specific adaptation influence cross subject performance in a neural decoding framework. A cyclic inter subject training approach, involving sh… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: 4 pages, 2 figures, Name of Conference: International Conference on Brain-Computer Interface

  7. arXiv:2511.13725  [pdf, ps, other]

    cs.CR cs.AI

    AI Kill Switch for malicious web-based LLM agent

    Authors: Sechan Lee, Sangdon Park

    Abstract: Recently, web-based Large Language Model (LLM) agents autonomously perform increasingly complex tasks, thereby bringing significant convenience. However, they also amplify the risks of malicious misuse cases such as unauthorized collection of personally identifiable information (PII), generation of socially divisive content, and even automated web hacking. To address these threats, we propose an A… ▽ More

    Submitted 25 September, 2025; originally announced November 2025.

  8. arXiv:2511.13283  [pdf, ps, other]

    cs.CV

    TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing

    Authors: Jongha Kim, Minseong Bae, Sanghyeok Lee, Jinsung Yoon, Hyunwoo J. Kim

    Abstract: Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions. Existing Multimodal Large Language Model (MLLM) approaches often overlook these characteristics, resulting in uninformative and redundant visual representations. To address these issues, we aim to generate visual features tha… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026 (Main Technical Track)

  9. arXiv:2511.13195  [pdf, ps, other]

    cs.CV

    Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection

    Authors: Soyul Lee, Seungmin Baek, Dongbo Min

    Abstract: Monocular 3D object detection is a cost-effective solution for applications like autonomous driving and robotics, but remains fundamentally ill-posed due to inherently ambiguous depth cues. Recent DETR-based methods attempt to mitigate this through global attention and auxiliary depth prediction, yet they still struggle with inaccurate depth estimates. Moreover, these methods often overlook instan… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026 accepted

  10. arXiv:2511.13105  [pdf, ps, other]

    cs.CV

    PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking

    Authors: Seungjae Kim, SeungJoon Lee, MyeongAh Cho

    Abstract: Multi-object tracking (MOT) predominantly follows the tracking-by-detection paradigm, where Kalman filters serve as the standard motion predictor due to computational efficiency but inherently fail on non-linear motion patterns. Conversely, recent data-driven motion predictors capture complex non-linear dynamics but suffer from limited domain generalization and computational overhead. Through exte… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026. Code: https://github.com/VisualScienceLab-KHU/PlugTrack

  11. arXiv:2511.13078  [pdf, ps, other]

    cs.LG eess.AS eess.IV

    A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning

    Authors: Liuyi Jin, Pasan Gunawardena, Amran Haroon, Runzhi Wang, Sangwoo Lee, Radu Stoleru, Michael Middleton, Zepeng Huo, Jeeeun Kim, Jason Moats

    Abstract: Emergency Medical Technicians (EMTs) operate in high-pressure environments, making rapid, life-critical decisions under heavy cognitive and operational loads. We present EMSGlass, a smart-glasses system powered by EMSNet, the first multimodal multitask model for Emergency Medical Services (EMS), and EMSServe, a low-latency multimodal serving framework tailored to EMS scenarios. EMSNet integrates t… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

  12. arXiv:2511.12992  [pdf, ps, other]

    cs.CV

    Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection

    Authors: Lintong Zhang, Kang Yin, Seong-Whan Lee

    Abstract: In the domain of non-generative visual counterfactual explanations (CE), traditional techniques frequently involve the substitution of sections within a query image with corresponding sections from distractor images. Such methods have historically overlooked the semantic relevance of the replacement regions to the target object, thereby impairing the model's interpretability and hindering the edit… ▽ More

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 31 pages, 7 figures

    MSC Class: 68T45 ACM Class: I.4.6; I.2.10

  13. arXiv:2511.12573  [pdf, ps, other]

    cs.CL cs.AI

    Mitigating Length Bias in RLHF through a Causal Lens

    Authors: Hyeonji Kim, Sujeong Oh, Sanghack Lee

    Abstract: Reinforcement learning from human feedback (RLHF) is widely used to align large language models (LLMs) with human preferences. However, RLHF-trained reward models often exhibit length bias -- a systematic tendency to favor longer responses by conflating verbosity with quality. We propose a causal framework for analyzing and mitigating length bias in RLHF reward modeling. Central to our approach is… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  14. arXiv:2511.11574  [pdf, ps, other]

    cs.LG

    LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora

    Authors: Viviana Luccioli, Rithika Iyengar, Ryan Panley, Flora Haberkorn, Xiaoyu Ge, Leland Crane, Nitish Sinha, Seung Jung Lee

    Abstract: Large Language Models (LLMs) are highly accurate in classification tasks; however, substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), where an LLM "teacher" trains a smaller and more efficient "student" model, offers a promising solution to this problem. However, the distillation process itself often remains costly… ▽ More

    Submitted 17 September, 2025; originally announced November 2025.

  15. arXiv:2511.11253  [pdf, ps, other]

    cs.CV

    CountSteer: Steering Attention for Object Counting in Diffusion Models

    Authors: Hyemin Boo, Hyoryung Kim, Myungjin Lee, Seunghyeon Lee, Jiyoung Lee, Jang-Hwan Choi, Hyunsoo Cho

    Abstract: Text-to-image diffusion models generate realistic and coherent images but often fail to follow numerical instructions in text, revealing a gap between language and visual representation. Interestingly, we found that these models are not entirely blind to numbers-they are implicitly aware of their own counting accuracy, as their internal signals shift in consistent ways depending on whether the out… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026 Workshop on Shaping Responsible Synthetic Data in the Era of Foundation Models (RSD)

  16. arXiv:2511.11079  [pdf, ps, other]

    cs.AI

    ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

    Authors: Sejin Kim, Hayan Choi, Seokki Lee, Sundong Kim

    Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input--output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording tempo… ▽ More

    Submitted 16 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

    ACM Class: I.2.6; I.2.0

  17. arXiv:2511.10958  [pdf, ps, other]

    cs.CV cs.AI

    Text-guided Weakly Supervised Framework for Dynamic Facial Expression Recognition

    Authors: Gunho Jung, Heejo Kong, Seong-Whan Lee

    Abstract: Dynamic facial expression recognition (DFER) aims to identify emotional states by modeling the temporal changes in facial movements across video sequences. A key challenge in DFER is the many-to-one labeling problem, where a video composed of numerous frames is assigned a single emotion label. A common strategy to mitigate this issue is to formulate DFER as a Multiple Instance Learning (MIL) probl… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

  18. arXiv:2511.10866  [pdf, ps, other]

    cs.CV cs.AI

    Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling

    Authors: Seoik Jung, Taekyung Song, Yangro Lee, Sungjun Lee

    Abstract: This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip fully utilizes all frames to preserve temporal continuity,… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: 5 pages, 2 figures. Accepted paper for the IEIE (Institute of Electronics and Information Engineers) Fall Conference 2025. Presentation on Nov 27, 2025

    MSC Class: 68T45; 68T07 ACM Class: I.2.10; I.4.8; I.2.6

  19. arXiv:2511.10834  [pdf, ps, other]

    cs.LG cs.DC

    EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence

    Authors: Ansel Kaplan Erol, Seungjun Lee, Divya Mahajan

    Abstract: Low-latency delivery of satellite imagery is essential for time-critical applications such as disaster response, intelligence, and infrastructure monitoring. However, traditional pipelines rely on downlinking all captured images before analysis, introducing delays of hours to days due to restricted communication bandwidth. To address these bottlenecks, emerging systems perform onboard machine lear… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

  20. arXiv:2511.10300  [pdf, ps, other]

    cs.CV cs.CY

    Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts

    Authors: Sumin Lee, Sungwon Park, Jeasurk Yang, Jihee Kim, Meeyoung Cha

    Abstract: Satellite-based slum segmentation holds significant promise in generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major challenge, hindering the ability of models trained on specific regions to generalize effectively to unseen locations. To address this, we introduce a large-scale high-resolution dataset and propose GRAM (Gener… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026

  21. arXiv:2511.10289  [pdf, ps, other]

    eess.AS cs.CL

    Music Flamingo: Scaling Music Understanding in Audio Language Models

    Authors: Sreyan Ghosh, Arushi Goel, Lasha Koroshinadze, Sang-gil Lee, Zhifeng Kong, Joao Felipe Santos, Ramani Duraiswami, Dinesh Manocha, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progressed rapidly, music remains challenging due to its dynamic, layered, and information-dense nature. Progress has been further limited by the difficulty of scaling open audio understanding models, primarily beca… ▽ More

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: Project Page: https://research.nvidia.com/labs/adlr/MF/

  22. arXiv:2511.10045  [pdf, ps, other]

    cs.CL

    Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism

    Authors: Jinhong Jeong, Sunghyun Lee, Jaeyoung Lee, Seonah Han, Youngjae Yu

    Abstract: Sound symbolism is a linguistic concept that refers to non-arbitrary associations between phonetic forms and their meanings. We suggest that this can be a compelling probe into how Multimodal Large Language Models (MLLMs) interpret auditory information in human languages. We investigate MLLMs' performance on phonetic iconicity across textual (orthographic and IPA) and auditory forms of inputs with… ▽ More

    Submitted 15 November, 2025; v1 submitted 13 November, 2025; originally announced November 2025.

    Comments: 33 pages, 27 tables, 10 figures

  23. arXiv:2511.09266  [pdf, ps, other]

    cs.CR

    SecTracer: A Framework for Uncovering the Root Causes of Network Intrusions via Security Provenance

    Authors: Seunghyeon Lee, Hyunmin Seo, Hwanjo Heo, Anduo Wang, Seungwon Shin, Jinwoo Kim

    Abstract: Modern enterprise networks comprise diverse and heterogeneous systems that support a wide range of services, making it challenging for administrators to track and analyze sophisticated attacks such as advanced persistent threats (APTs), which often exploit multiple vectors. To address this challenge, we introduce the concept of network-level security provenance, which enables the systematic establ… ▽ More

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: 19 pages, 15 figures, Accepted for publication in Computers & Security

  24. arXiv:2511.08835  [pdf, ps, other]

    cs.CL cs.AI

    Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents

    Authors: Yejin Yoon, Yuri Son, Namyoung So, Minseo Kim, Minsoo Cho, Chanhee Park, Seungshin Lee, Taeuk Kim

    Abstract: Conversational agents have traditionally been developed for either task-oriented dialogue (TOD) or open-ended chitchat, with limited progress in unifying the two. Yet, real-world conversations naturally involve fluid transitions between these modes. To address this gap, we introduce TACT (TOD-And-Chitchat Transition), a dataset designed for transition-aware dialogue modeling that incorporates stru… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted to EMNLP 2025

  25. arXiv:2511.07974  [pdf, ps, other]

    cs.AI

    Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition

    Authors: Lintong Zhang, Kang Yin, Seong-Whan Lee

    Abstract: Attribution-based explanation techniques capture key patterns to enhance visual interpretability; however, these patterns often lack the granularity needed for insight in fine-grained tasks, particularly in cases of model misclassification, where explanations may be insufficiently detailed. To address this limitation, we propose a fine-grained counterfactual explanation framework that generates bo… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

  26. arXiv:2511.07936  [pdf, ps, other]

    cs.AI

    Toward Practical BCI: A Real-time Wireless Imagined Speech EEG Decoding System

    Authors: Ji-Ha Park, Heon-Gyu Kwak, Gi-Hwan Shin, Yoo-In Jeon, Sun-Min Park, Ji-Yeon Hwang, Seong-Whan Lee

    Abstract: Brain-computer interface (BCI) research, while promising, has largely been confined to static and fixed environments, limiting real-world applicability. To move towards practical BCI, we introduce a real-time wireless imagined speech electroencephalogram (EEG) decoding system designed for flexibility and everyday use. Our framework focuses on practicality, demonstrating extensibility beyond wired… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: 4 pages, 2 figures, 1 table, Name of Conference: International Conference on Brain-Computer Interface

  27. arXiv:2511.07912  [pdf, ps, other]

    cs.AI

    Neurophysiological Characteristics of Adaptive Reasoning for Creative Problem-Solving Strategy

    Authors: Jun-Young Kim, Young-Seok Kweon, Gi-Hwan Shin, Seong-Whan Lee

    Abstract: Adaptive reasoning enables humans to flexibly adjust inference strategies when environmental rules or contexts change, yet its underlying neural dynamics remain unclear. This study investigated the neurophysiological mechanisms of adaptive reasoning using a card-sorting paradigm combined with electroencephalography and compared human performance with that of a multimodal large language model. Stim… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: 4 pages, 4 figures, 1 table

  28. arXiv:2511.07890  [pdf, ps, other]

    cs.AI

    Confidence-Aware Neural Decoding of Overt Speech from EEG: Toward Robust Brain-Computer Interfaces

    Authors: Soowon Kim, Byung-Kwan Ko, Seo-Hyun Lee

    Abstract: Non-invasive brain-computer interfaces that decode spoken commands from electroencephalogram must be both accurate and trustworthy. We present a confidence-aware decoding framework that couples deep ensembles of compact, speech-oriented convolutional networks with post-hoc calibration and selective classification. Uncertainty is quantified using ensemble-based predictive entropy, top-two margin, a… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

  29. arXiv:2511.07884  [pdf, ps, other]

    cs.LG cs.AI

    Meta-cognitive Multi-scale Hierarchical Reasoning for Motor Imagery Decoding

    Authors: Si-Hyun Kim, Heon-Gyu Kwak, Byoung-Hee Kwon, Seong-Whan Lee

    Abstract: Brain-computer interface (BCI) aims to decode motor intent from noninvasive neural signals to enable control of external devices, but practical deployment remains limited by noise and variability in motor imagery (MI)-based electroencephalogram (EEG) signals. This work investigates a hierarchical and meta-cognitive decoding framework for four-class MI classification. We introduce a multi-scale hie… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: 4 pages, 1 figure, 1 table, Name of Conference: International Winter Conference on Brain-Computer Interface

  30. arXiv:2511.07862  [pdf, ps, other]

    cs.CV

    MonoCLUE : Object-Aware Clustering Enhances Monocular 3D Object Detection

    Authors: Sunghun Yang, Minhyeok Lee, Jungho Lee, Sangyoun Lee

    Abstract: Monocular 3D object detection offers a cost-effective solution for autonomous driving but suffers from ill-posed depth and limited field of view. These constraints cause a lack of geometric cues and reduced accuracy in occluded or truncated scenes. While recent approaches incorporate additional depth information to address geometric ambiguity, they overlook the visual cues crucial for robust recog… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026

  31. arXiv:2511.07464  [pdf, ps, other]

    cs.CL cs.AI

    Motif 2 12.7B technical report

    Authors: Junghwan Lim, Sungmin Lee, Dongseok Kim, Taehyun Kim, Eunhwan Park, Jeesoo Lee, Jeongdoo Lee, Junhyeok Lee, Wai Ting Cheung, Dahye Choi, Jaeheui Her, Jaeyeon Huh, Hanbin Jung, Changjin Kang, Beomgyu Kim, Minjae Kim, Taewhan Kim, Youngrok Kim, Hyukjin Kweon, Haesol Lee, Kungyu Lee, Dongpin Oh, Yeongjae Park, Bokki Ryu, Dongjoo Weon

    Abstract: We introduce Motif-2-12.7B, a new open-weight foundation model that pushes the efficiency frontier of large language models by combining architectural innovation with system-level optimization. Designed for scalable language understanding and robust instruction generalization under constrained compute budgets, Motif-2-12.7B builds upon Motif-2.6B with the integration of Grouped Differential Attent… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

  32. arXiv:2511.07129  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

    Authors: Seungeon Lee, Soumi Das, Manish Gupta, Krishna P. Gummadi

    Abstract: Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient approach for fine-tuning large language models. However, conventional LoRA adapters are typically trained for a single task, limiting their applicability in real-world settings where inputs may span diverse and unpredictable domains. At inference time, existing approaches combine multiple LoRAs for improving performance on diverse task… ▽ More

    Submitted 10 November, 2025; originally announced November 2025.

  33. arXiv:2511.06433  [pdf, ps, other]

    cs.CV

    Diagnose Like A REAL Pathologist: An Uncertainty-Focused Approach for Trustworthy Multi-Resolution Multiple Instance Learning

    Authors: Sungrae Hong, Sol Lee, Jisu Shin, Mun Yong Yi

    Abstract: With the increasing demand for histopathological specimen examination and diagnostic reporting, Multiple Instance Learning (MIL) has received heightened research focus as a viable solution for AI-centric diagnostic aid. Recently, to improve its performance and make it work more like a pathologist, several MIL approaches based on the use of multiple-resolution images have been proposed, delivering… ▽ More

    Submitted 9 November, 2025; originally announced November 2025.

    Comments: Accepted by IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

  34. arXiv:2511.06411  [pdf, ps, other]

    cs.AI cs.LG

    SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

    Authors: Zhi Zheng, Wee Sun Lee

    Abstract: The soft-thinking paradigm for Large Language Model (LLM) reasoning can outperform the conventional discrete-token Chain-of-Thought (CoT) reasoning in some scenarios, underscoring its research and application value. However, while the discrete-token CoT reasoning pattern can be reinforced through policy optimization algorithms such as group relative policy optimization (GRPO), extending the soft-t… ▽ More

    Submitted 9 November, 2025; originally announced November 2025.

  35. arXiv:2511.06190  [pdf, ps, other]

    cs.CL

    Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning

    Authors: Sangmook Lee, Dohyung Kim, Hyukhun Koh, Nakyeong Yang, Kyomin Jung

    Abstract: Recent advances in Large Language Models (LLMs) - particularly model scaling and test-time techniques - have greatly enhanced the reasoning capabilities of language models at the expense of higher inference costs. To lower inference costs, prior works train router models or deferral mechanisms that allocate easy queries to a small, efficient model, while forwarding harder queries to larger, more e… ▽ More

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: 7 pages, 5 figures

  36. arXiv:2511.05563  [pdf, ps, other]

    cs.LG cs.AI

    Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

    Authors: Sanghyun Lee, Seungryong Kim, Jongho Park, Dongmin Park

    Abstract: Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference time order of unmasking. Prevailing heuristics, such as confidence based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which add… ▽ More

    Submitted 3 November, 2025; originally announced November 2025.

  37. arXiv:2511.05562  [pdf, ps, other]

    cs.LG cs.AI

    Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement

    Authors: Sanghyun Lee, Sunwoo Kim, Seungryong Kim, Jongho Park, Dongmin Park

    Abstract: Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models despite its potential as a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediat… ▽ More

    Submitted 3 November, 2025; originally announced November 2025.

  38. arXiv:2511.04998  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM

    BiPETE: A Bi-Positional Embedding Transformer Encoder for Risk Assessment of Alcohol and Substance Use Disorder with Electronic Health Records

    Authors: Daniel S. Lee, Mayra S. Haedo-Cruz, Chen Jiang, Oshin Miranda, LiRong Wang

    Abstract: Transformer-based deep learning models have shown promise for disease risk prediction using electronic health records (EHRs), but modeling temporal dependencies remains a key challenge due to irregular visit intervals and lack of uniform structure. We propose a Bi-Positional Embedding Transformer Encoder or BiPETE for single-disease prediction, which integrates rotary positional embeddings to encod… ▽ More

    Submitted 7 November, 2025; originally announced November 2025.

    Comments: 20 pages, 2 figures, 6 tables, 2 supplementary figures, 4 supplementary tables, submitted to Journal of Biomedical Informatics on 6 Nov, 2025

  39. arXiv:2511.04681  [pdf, ps, other]

    astro-ph.CO cs.LG

    Dark Energy Survey Year 3 results: Simulation-based $w$CDM inference from weak lensing and galaxy clustering maps with deep learning. I. Analysis design

    Authors: A. Thomsen, J. Bucko, T. Kacprzak, V. Ajani, J. Fluri, A. Refregier, D. Anbajagane, F. J. Castander, A. Ferté, M. Gatti, N. Jeffrey, A. Alarcon, A. Amon, K. Bechtol, M. R. Becker, G. M. Bernstein, A. Campos, A. Carnero Rosell, C. Chang, R. Chen, A. Choi, M. Crocce, C. Davis, J. DeRose, S. Dodelson , et al. (76 additional authors not shown)

    Abstract: Data-driven approaches using deep learning are emerging as powerful techniques to extract non-Gaussian information from cosmological large-scale structure. This work presents the first simulation-based inference (SBI) pipeline that combines weak lensing and galaxy clustering maps in a realistic Dark Energy Survey Year 3 (DES Y3) configuration and serves as preparation for a forthcoming analysis of… ▽ More

    Submitted 6 November, 2025; originally announced November 2025.

    Comments: 38 pages, 14 figures, submitted

  40. arXiv:2511.03774  [pdf, ps, other]

    cs.LG

    Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

    Authors: Jaden Park, Mu Cai, Feng Yao, Jingbo Shang, Soochahn Lee, Yong Jae Lee

    Abstract: Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior works have proposed mitigation strategies such as decontamination of pretraining data… ▽ More

    Submitted 5 November, 2025; originally announced November 2025.

  41. arXiv:2511.03367  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models

    Authors: Gahyeon Kim, Sohee Kim, Seokju Lee

    Abstract: Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learni… ▽ More

    Submitted 5 November, 2025; originally announced November 2025.

    Comments: Accepted in Pattern Recognition

  42. arXiv:2511.03289  [pdf, ps, other]

    cs.DS

    Optimal Stopping with a Predicted Prior

    Authors: Tian Bai, Zhiyi Huang, Chui Shan Lee, Dongchen Li

    Abstract: There are two major models of value uncertainty in the optimal stopping literature: the secretary model, which assumes no prior knowledge, and the prophet inequality model, which assumes full information about value distributions. In practice, decision makers often rely on machine-learned priors that may be erroneous. Motivated by this gap, we formulate the model of optimal stopping with a predict…

    Submitted 5 November, 2025; originally announced November 2025.

  43. arXiv:2511.02853  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    Consciousness-ECG Transformer for Conscious State Estimation System with Real-Time Monitoring

    Authors: Young-Seok Kweon, Gi-Hwan Shin, Ji-Yong Kim, Bokyeong Ryu, Seong-Whan Lee

    Abstract: Conscious state estimation is important in various medical settings, including sleep staging and anesthesia management, to ensure patient safety and optimize health outcomes. Traditional methods predominantly utilize electroencephalography (EEG), which faces challenges such as high sensitivity to noise and the requirement for controlled environments. In this study, we propose the consciousness-ECG…

    Submitted 31 October, 2025; originally announced November 2025.

    Comments: 30 pages, 8 figures

    Journal ref: Expert Systems with Applications 299 (2026) 130091

  44. arXiv:2511.02358  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.MM

    Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

    Authors: Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park

    Abstract: Query augmentation makes queries more meaningful by appending further information to the queries to find relevant documents. Current studies have proposed Large Language Model (LLM)-based embedders, which learn representation for embedding and generation for query augmentation in a multi-task manner by leveraging the generative capabilities of LLM. During inference, these jointly trained embedders…

    Submitted 4 November, 2025; originally announced November 2025.

    Comments: Accepted to MMGenSR Workshop (CIKM 2025)

  45. arXiv:2511.01942  [pdf, ps, other]

    cs.DB cond-mat.mtrl-sci cs.DL

    Towards Defect Phase Diagrams: From Research Data Management to Automated Workflows

    Authors: Khalil Rejiba, Sang-Hyeok Lee, Christina Gasper, Martina Freund, Sandra Korte-Kerzel, Ulrich Kerzel

    Abstract: Defect phase diagrams provide a unified description of crystal defect states for materials design and are central to the scientific objectives of the Collaborative Research Centre (CRC) 1394. Their construction requires the systematic integration of heterogeneous experimental and simulation data across research groups and locations. In this setting, research data management (RDM) is a key enabler…

    Submitted 3 November, 2025; originally announced November 2025.

  46. arXiv:2511.01399  [pdf]

    cs.CV

    Semantic BIM enrichment for firefighting assets: Fire-ART dataset and panoramic image-based 3D reconstruction

    Authors: Ya Wen, Yutong Qiao, Chi Chiu Lam, Ioannis Brilakis, Sanghoon Lee, Mun On Wong

    Abstract: Inventory management of firefighting assets is crucial for emergency preparedness, risk assessment, and on-site fire response. However, conventional methods are inefficient due to limited capabilities in automated asset recognition and reconstruction. To address the challenge, this research introduces the Fire-ART dataset and develops a panoramic image-based reconstruction approach for semantic en…

    Submitted 3 November, 2025; originally announced November 2025.

  47. arXiv:2511.01348  [pdf, ps, other]

    cs.SE cs.AI

    The Future of Generative AI in Software Engineering: A Vision from Industry and Academia in the European GENIUS Project

    Authors: Robin Gröpler, Steffen Klepke, Jack Johns, Andreas Dreschinski, Klaus Schmid, Benedikt Dornauer, Eray Tüzün, Joost Noppen, Mohammad Reza Mousavi, Yongjian Tang, Johannes Viehmann, Selin Şirin Aslangül, Beum Seuk Lee, Adam Ziolkowski, Eric Zie

    Abstract: Generative AI (GenAI) has recently emerged as a groundbreaking force in Software Engineering, capable of generating code, identifying bugs, recommending fixes, and supporting quality assurance. While its use in coding tasks shows considerable promise, applying GenAI across the entire Software Development Life Cycle (SDLC) has not yet been fully explored. Critical uncertainties in areas such as rel…

    Submitted 6 November, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

    Comments: Accepted for the 2nd IEEE/ACM International Conference on AI-powered Software (AIware 2025)

  48. arXiv:2511.00826  [pdf, ps, other]

    cs.DB

    Efficient Query Repair for Aggregate Constraints

    Authors: Shatha Algarni, Boris Glavic, Seokki Lee, Adriane Chapman

    Abstract: In many real-world scenarios, query results must satisfy domain-specific constraints. For instance, a minimum percentage of interview candidates selected based on their qualifications should be female. These requirements can be expressed as constraints over an arithmetic combination of aggregates evaluated on the result of the query. In this work, we study how to repair a query to fulfill such con…

    Submitted 2 November, 2025; originally announced November 2025.

    Comments: 19 pages, 63 figures

  49. arXiv:2511.00040  [pdf, ps, other]

    cs.LG cs.AI

    Semi-Supervised Preference Optimization with Limited Feedback

    Authors: Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song

    Abstract: The field of preference optimization has made outstanding contributions to the alignment of language models with human preferences. Despite these advancements, recent methods still rely heavily on substantial paired (labeled) feedback data, leading to substantial resource expenditures. To address these challenges, we study the problem of Semi-Supervised Preference Optimization (SSPO) in which the…

    Submitted 27 October, 2025; originally announced November 2025.

  50. arXiv:2510.27255  [pdf, ps, other]

    cs.CV

    Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes

    Authors: Yehna Kim, Young-Eun Kim, Seong-Whan Lee

    Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities in zero-shot action recognition by learning to associate video embeddings with class embeddings. However, a significant challenge arises when relying solely on action classes to provide semantic context, particularly due to the presence of multi-semantic words, which can introduce ambiguity in understanding the intended concep…

    Submitted 3 November, 2025; v1 submitted 31 October, 2025; originally announced October 2025.