Search | arXiv e-print repository

InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling

Authors: Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin, Ya-Qin Zhang, David Forsyth, Hao Zhao, Shenlong Wang

Abstract: We present InvRGB+L, a novel inverse rendering model that reconstructs large, relightable, and dynamic scenes from a single RGB+LiDAR sequence. Conventional inverse graphics methods rely primarily on RGB observations and use LiDAR mainly for geometric information, often resulting in suboptimal material estimates due to visible light interference. We find that LiDAR's intensity values-captured with… ▽ More We present InvRGB+L, a novel inverse rendering model that reconstructs large, relightable, and dynamic scenes from a single RGB+LiDAR sequence. Conventional inverse graphics methods rely primarily on RGB observations and use LiDAR mainly for geometric information, often resulting in suboptimal material estimates due to visible light interference. We find that LiDAR's intensity values-captured with active illumination in a different spectral range-offer complementary cues for robust material estimation under variable lighting. Inspired by this, InvRGB+L leverages LiDAR intensity cues to overcome challenges inherent in RGB-centric inverse graphics through two key innovations: (1) a novel physics-based LiDAR shading model and (2) RGB-LiDAR material consistency losses. The model produces novel-view RGB and LiDAR renderings of urban and indoor scenes and supports relighting, night simulations, and dynamic object insertions, achieving results that surpass current state-of-the-art methods in both scene-level urban inverse rendering and LiDAR simulation. △ Less

Submitted 23 July, 2025; originally announced July 2025.

Comments: Accepted to ICCV 2025

arXiv:2507.17448 [pdf, ps, other]

Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu

Abstract: Retrosynthesis planning, essential in organic synthesis and drug discovery, has greatly benefited from recent AI-driven advancements. Nevertheless, existing methods frequently face limitations in both applicability and explainability. Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor… ▽ More Retrosynthesis planning, essential in organic synthesis and drug discovery, has greatly benefited from recent AI-driven advancements. Nevertheless, existing methods frequently face limitations in both applicability and explainability. Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor easily explainable. To address these challenges, we introduce RetroDFM-R, a reasoning-based large language model (LLM) designed specifically for chemical retrosynthesis. Leveraging large-scale reinforcement learning guided by chemically verifiable rewards, RetroDFM-R significantly enhances prediction accuracy and explainability. Comprehensive evaluations demonstrate that RetroDFM-R significantly outperforms state-of-the-art methods, achieving a top-1 accuracy of 65.0% on the USPTO-50K benchmark. Double-blind human assessments further validate the chemical plausibility and practical utility of RetroDFM-R's predictions. RetroDFM-R also accurately predicts multistep retrosynthetic routes reported in the literature for both real-world drug molecules and perovskite materials. Crucially, the model's explicit reasoning process provides human-interpretable insights, thereby enhancing trust and practical value in real-world retrosynthesis applications. △ Less

Submitted 23 July, 2025; originally announced July 2025.

Comments: Preprint

arXiv:2507.17265 [pdf, ps, other]

Visualization-Driven Illumination for Density Plots

Authors: Xin Chen, Yunhai Wang, Huaiwei Bao, Kecheng Lu, Jaemin Jo, Chi-Wing Fu, Jean-Daniel Fekete

Abstract: We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer… ▽ More We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples. △ Less

Submitted 23 July, 2025; originally announced July 2025.

arXiv:2507.17242 [pdf]

High-Density EEG Enables the Fastest Visual Brain-Computer Interfaces

Authors: Gege Ming, Weihua Pei, Sen Tian, Xiaogang Chen, Xiaorong Gao, Yijun Wang

Abstract: Brain-computer interface (BCI) technology establishes a direct communication pathway between the brain and external devices. Current visual BCI systems suffer from insufficient information transfer rates (ITRs) for practical use. Spatial information, a critical component of visual perception, remains underexploited in existing systems because the limited spatial resolution of recording methods hin… ▽ More Brain-computer interface (BCI) technology establishes a direct communication pathway between the brain and external devices. Current visual BCI systems suffer from insufficient information transfer rates (ITRs) for practical use. Spatial information, a critical component of visual perception, remains underexploited in existing systems because the limited spatial resolution of recording methods hinders the capture of the rich spatiotemporal dynamics of brain signals. This study proposed a frequency-phase-space fusion encoding method, integrated with 256-channel high-density electroencephalogram (EEG) recordings, to develop high-speed BCI systems. In the classical frequency-phase encoding 40-target BCI paradigm, the 256-66, 128-32, and 64-21 electrode configurations brought theoretical ITR increases of 83.66%, 79.99%, and 55.50% over the traditional 64-9 setup. In the proposed frequency-phase-space encoding 200-target BCI paradigm, these increases climbed to 195.56%, 153.08%, and 103.07%. The online BCI system achieved an average actual ITR of 472.7 bpm. This study demonstrates the essential role and immense potential of high-density EEG in decoding the spatiotemporal information of visual stimuli. △ Less

Submitted 23 July, 2025; originally announced July 2025.

arXiv:2507.17220 [pdf, ps, other]

PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models

Authors: Jiansong Wan, Chengming Zhou, Jinkua Liu, Xiangge Huang, Xiaoyu Chen, Xiaohan Yi, Qisen Yang, Baiting Zhu, Xin-Qiang Cai, Lixing Liu, Rushuai Yang, Chuheng Zhang, Sherif Abdelfattah, Hayong Shin, Pushi Zhang, Li Zhao, Jiang Bian

Abstract: Recent studies have explored pretrained (foundation) models for vision-based robotic navigation, aiming to achieve generalizable navigation and positive transfer across diverse environments while enhancing zero-shot performance in unseen settings. In this work, we introduce PIG-Nav (Pretrained Image-Goal Navigation), a new approach that further investigates pretraining strategies for vision-based… ▽ More Recent studies have explored pretrained (foundation) models for vision-based robotic navigation, aiming to achieve generalizable navigation and positive transfer across diverse environments while enhancing zero-shot performance in unseen settings. In this work, we introduce PIG-Nav (Pretrained Image-Goal Navigation), a new approach that further investigates pretraining strategies for vision-based navigation models and contributes in two key areas. Model-wise, we identify two critical design choices that consistently improve the performance of pretrained navigation models: (1) integrating an early-fusion network structure to combine visual observations and goal images via appropriately pretrained Vision Transformer (ViT) image encoder, and (2) introducing suitable auxiliary tasks to enhance global navigation representation learning, thus further improving navigation performance. Dataset-wise, we propose a novel data preprocessing pipeline for efficiently labeling large-scale game video datasets for navigation model training. We demonstrate that augmenting existing open navigation datasets with diverse gameplay videos improves model performance. Our model achieves an average improvement of 22.6% in zero-shot settings and a 37.5% improvement in fine-tuning settings over existing visual navigation foundation models in two complex simulated environments and one real-world environment. These results advance the state-of-the-art in pretrained image-goal navigation models. Notably, our model maintains competitive performance while requiring significantly less fine-tuning data, highlighting its potential for real-world deployment with minimal labeled supervision. △ Less

Submitted 23 July, 2025; originally announced July 2025.

arXiv:2507.17147 [pdf, ps, other]

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards

Authors: Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li

Abstract: Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). Existing approaches typically rely on prompt engineering or supervised fine-tuning to enable models to imitate character behaviors in specific scenarios, but often neglect the underlying \emph{cognitive} mechanisms driving these behaviors. Inspired by cognitive psychology, we… ▽ More Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). Existing approaches typically rely on prompt engineering or supervised fine-tuning to enable models to imitate character behaviors in specific scenarios, but often neglect the underlying \emph{cognitive} mechanisms driving these behaviors. Inspired by cognitive psychology, we introduce \textbf{CogDual}, a novel RPLA adopting a \textit{cognize-then-respond } reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment. To further optimize the performance, we employ reinforcement learning with two general-purpose reward schemes designed for open-domain text generation. Extensive experiments on the CoSER benchmark, as well as Cross-MR and LifeChoice, demonstrate that CogDual consistently outperforms existing baselines and generalizes effectively across diverse role-playing tasks. △ Less