DWCTOD/cv-arxiv-daily

Updated on 2025.07.26

Video_Classification

Publish Date Title Authors PDF Code
2025-07-23 Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility Melih Barsbey et.al. 2507.17748v1 null
2025-07-23 Yume: An Interactive World Generation Model Xiaofeng Mao et.al. 2507.17744v1 null
2025-07-23 BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems Malsha Ashani Mahawatta Dona et.al. 2507.17722v1 null
2025-07-23 Towards Effective Open-set Graph Class-incremental Learning Jiazhen Chen et.al. 2507.17687v1 null
2025-07-23 Audio-Vision Contrastive Learning for Phonological Class Recognition Daiqi Liu et.al. 2507.17682v1 null
2025-07-23 MCM: Mamba-based Cardiac Motion Tracking using Sequential Images in MRI Jiahui Yin et.al. 2507.17678v1 null
2025-07-23 Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography Farnoush Bayatmakou et.al. 2507.17662v1 null
2025-07-23 The Early Bird Identifies the Worm: You Can't Beat a Head Start in Long-Term Body Re-ID (ECHO-BID) Thomas M. Metz et.al. 2507.17640v1 null
2025-07-23 Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries Victor Hartman et.al. 2507.17636v1 null
2025-07-23 Gauge Symmetries, Exact Symmetries and Conserved Charges in Minimal Massive Gravity Kang Liu et.al. 2507.17635v1 null
2025-07-22 MultiTaskDeltaNet: Change Detection-based Image Segmentation for Operando ETEM with Application to Carbon Gasification Kinetics Yushuo Niu et.al. 2507.16803v1 null
2025-07-22 Improving U-Net Confidence on TEM Image Data with L2-Regularization, Transfer Learning, and Deep Fine-Tuning Aiden Ochoa et.al. 2507.16779v1 null
2025-07-22 Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks Marcel Kleinmann et.al. 2507.16761v1 null
2025-07-22 Improving Model Classification by Optimizing the Training Dataset Morad Tukan et.al. 2507.16729v1 null
2025-07-22 SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing Jinbo Hu et.al. 2507.16724v1 null
2025-07-22 Temporally-Constrained Video Reasoning Segmentation and Automated Benchmark Construction Yiqing Shen et.al. 2507.16718v1 null
2025-07-22 A Tutorial on MRI Reconstruction: From Modern Methods to Clinical Implications Tolga Çukur et.al. 2507.16715v1 null
2025-07-22 Ring-based ML calibration with in situ pileup correction for real-time jet triggers Benjamin T. Carlson et.al. 2507.16686v1 null
2025-07-22 VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models Duong Nguyen et.al. 2507.16685v1 null
2025-07-22 Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis Yonghan Zhang et.al. 2507.16682v1 null
2025-07-21 Simulating the LOcal Web (SLOW) V. Thermodynamic Properties and Evolution of Local Galaxy Clusters Elena Hernández-Martínez et.al. 2507.15858v1 null
2025-07-21 Optimized Fabrication Procedure for High-Quality Graphene-based Moiré Superlattice Devices Shuwen Sun et.al. 2507.15853v1 null
2025-07-22 SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Zhixiong Zhang et.al. 2507.15852v2 null
2025-07-22 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding Fei Tang et.al. 2507.15846v2 null
2025-07-21 Quantum computational sensing using quantum signal processing, quantum neural networks, and Hamiltonian engineering Saeed A. Khan et.al. 2507.15845v1 null
2025-07-21 Optimizing Canaries for Privacy Auditing with Metagradient Descent Matteo Boglioni et.al. 2507.15836v1 null
2025-07-21 Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models Enes Sanli et.al. 2507.15824v1 null
2025-07-21 Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets Zihang Ma et.al. 2507.15784v1 null
2025-07-21 Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization Feng-Qi Cui et.al. 2507.15765v1 null
2025-07-21 TokensGen: Harnessing Condensed Tokens for Long Video Generation Wenqi Ouyang et.al. 2507.15728v1 null
2025-07-18 NGC 663 as a laboratory for massive star evolution Amparo Marco et.al. 2507.14125v1 null
2025-07-18 Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective Pankaj Yadav et.al. 2507.14121v1 null
2025-07-18 Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification Daniëlle Schuman et.al. 2507.14116v1 null
2025-07-18 Maximal translation surfaces in Lorentz-Minkowski space Rafael López et.al. 2507.14103v1 null
2025-07-18 UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography Shravan Venkatraman et.al. 2507.14102v1 null
2025-07-18 Generative AI-Driven High-Fidelity Human Motion Simulation Hari Iyer et.al. 2507.14097v1 null
2025-07-18 Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment Šimon Kubov et.al. 2507.14093v1 null
2025-07-18 Unmasking Performance Gaps: A Comparative Study of Human Anonymization and Its Effects on Video Anomaly Detection Sara Abdulaziz et.al. 2507.14083v1 null
2025-07-18 Semi-supervised classification of Stars, Galaxies and Quasars using K-means and Random Forest Vahid Asadi et.al. 2507.14072v1 null
2025-07-18 Predicting interface and spin states in armchair graphene nanoribbon junctions Sofia Sanz et.al. 2507.14065v1 null
2025-07-17 VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding Shihao Wang et.al. 2507.13353v1 null
2025-07-17 $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning Yifan Wang et.al. 2507.13347v1 null
2025-07-17 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Yudong Jin et.al. 2507.13344v1 null
2025-07-17 Taming Diffusion Transformer for Real-Time Mobile Video Generation Yushu Wu et.al. 2507.13343v1 null
2025-07-17 SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution Ritik Shah et.al. 2507.13339v1 null
2025-07-17 FocusView: Understanding and Customizing Informational Video Watching Experiences for Viewers with ADHD Hanxiu 'Hazel' Zhu et.al. 2507.13309v1 null
2025-07-17 Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy Yiting Yang et.al. 2507.13260v1 null
2025-07-17 Signal Temporal Logic Compliant Co-design of Planning and Control Manas Sashank Juvvi et.al. 2507.13225v1 null
2025-07-17 Leveraging Pre-Trained Visual Models for AI-Generated Video Detection Keerthi Veeramachaneni et.al. 2507.13224v1 null
2025-07-17 Degrees of points with rational $j$-invariant on $X_{0}(n)$ and $X_{1}(n)$ Kenji Terao et.al. 2507.13199v1 null
2025-07-16 CytoSAE: Interpretable Cell Embeddings for Hematology Muhammed Furkan Dasdelen et.al. 2507.12464v1 null
2025-07-16 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding Renjie Li et.al. 2507.12463v1 null
2025-07-16 SpatialTrackerV2: 3D Point Tracking Made Easy Yuxi Xiao et.al. 2507.12462v1 null
2025-07-16 Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios Van-Hoang-Anh Phan et.al. 2507.12449v1 null
2025-07-16 Minmax Exclusivity Classes for Power-Type Loss Functions Stanisław M. S. Halkiewicz et.al. 2507.12447v1 null
2025-07-17 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Ruihan Yang et.al. 2507.12440v2 null
2025-07-16 Heisenberg limited multiple eigenvalue estimation via off-the-grid compressed sensing Davide Castaldo et.al. 2507.12438v1 null
2025-07-16 Energy-based models for inverse imaging problems Andreas Habring et.al. 2507.12432v1 null
2025-07-16 Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation Ashkan Shakarami et.al. 2507.12427v1 null
2025-07-16 DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition Hayat Ullah et.al. 2507.12426v1 null
2025-07-15 Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation Zhen Xu et.al. 2507.11540v1 null
2025-07-15 Streaming 4D Visual Geometry Transformer Dong Zhuo et.al. 2507.11539v1 null
2025-07-15 Understanding Quantum Information and Computation John Watrous et.al. 2507.11536v1 null
2025-07-15 LLM-based ambiguity detection in natural language instructions for collaborative surgical robots Ana Davila et.al. 2507.11525v1 null
2025-07-15 Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection Buddhi Wijenayake et.al. 2507.11523v1 null
2025-07-15 CATVis: Context-Aware Thought Visualization Tariq Mehmood et.al. 2507.11522v1 null
2025-07-15 On the Complexity of the Optimal Correlated Equilibria in Extensive-Form Games Vincent Cheval et.al. 2507.11509v1 null
2025-07-16 Multipass Linear Sketches for Geometric LP-Type Problems N. Efe Çekirge et.al. 2507.11484v2 null
2025-07-15 JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks Ioannis Panitsas et.al. 2507.11483v1 null
2025-07-15 C-FBI: A Combinatorial method using Convolutions for Circle Fitting in Blurry Images Esteban Román Catafau et.al. 2507.11476v1 null
2025-07-14 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Mingxian Lin et.al. 2507.10548v1 null
2025-07-14 Disentangling Neural Disjunctive Normal Form Models Kexin Gu Baugh et.al. 2507.10546v1 null
2025-07-14 ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions Shivangi Aneja et.al. 2507.10542v1 null
2025-07-14 A Classification of Transversal Clifford Gates for Qubit Stabilizer Codes Shival Dasu et.al. 2507.10519v1 null
2025-07-14 Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Jiangkai Wu et.al. 2507.10510v1 null
2025-07-14 Topological phases and Edge states in an exactly solvable Gamma matrix model Akhil Pravin Furtado et.al. 2507.10509v1 null
2025-07-14 Colorful Minors Evangelos Protopapas et.al. 2507.10467v1 null
2025-07-14 AudioMAE++: learning better masked audio representations with SwiGLU FFNs Sarthak Yadav et.al. 2507.10464v1 null
2025-07-14 RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening Tao Tang et.al. 2507.10461v1 null
2025-07-14 4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos Shanshan Zhong et.al. 2507.10437v1 null
2025-07-11 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Hangjie Yuan et.al. 2507.08801v1 null
2025-07-11 Mining the Alerts: A Preliminary Catalog of Compact Binaries from the Fourth Observing Run Aleyna Akyüz et.al. 2507.08778v1 null
2025-07-11 A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification Ahmed Farooq et.al. 2507.08766v1 null
2025-07-11 ML-Based Automata Simplification for Symbolic Accelerators Tiffany Yu et.al. 2507.08751v1 null
2025-07-11 HieraRS: A Hierarchical Segmentation Paradigm for Remote Sensing Enabling Multi-Granularity Interpretation and Cross-Domain Transfer Tianlong Ai et.al. 2507.08741v1 null
2025-07-11 Statistical Analysis of Early Spectra in Type II and IIb Supernovae Maider González-Bañuelos et.al. 2507.08731v1 null
2025-07-11 RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking Yuqiang Lin et.al. 2507.08729v1 null
2025-07-11 Distinct neurodynamics of functional brain networks in Alzheimer's disease and frontotemporal dementia as revealed by EEG Sungwoo Ahn et.al. 2507.08728v1 null
2025-07-11 Free phases of Majorana fermions: Tenfold ways compared Luuk Stehouwer et.al. 2507.08694v1 null
2025-07-11 Functional equations of axiomatic multiple Dirichlet series, Weyl groupoids, and quantum algebra Will Sawin et.al. 2507.08662v1 null
2025-07-10 Multigranular Evaluation for Brain Visual Decoding Weihao Xia et.al. 2507.07993v1 null
2025-07-10 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Jeongseok Hyun et.al. 2507.07990v1 null
2025-07-10 CLIP Won't Learn Object-Attribute Binding from Natural Data and Here is Why Bijay Gurung et.al. 2507.07985v1 null
2025-07-10 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Haoyu Wu et.al. 2507.07982v1 null
2025-07-10 Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions Longfei Li et.al. 2507.07978v1 null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966v1 null
2025-07-10 Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency Abolfazl Zarghani et.al. 2507.07938v1 null
2025-07-10 Working with AI: Measuring the Occupational Implications of Generative AI Kiran Tomlinson et.al. 2507.07935v1 null
2025-07-10 Measuring Hypothesis Testing Errors in the Evaluation of Retrieval Systems Jack McKechnie et.al. 2507.07924v1 null
2025-07-10 ArteryX: Advancing Brain Artery Feature Extraction with Vessel-Fused Networks and a Robust Validation Framework Abrar Faiyaz et.al. 2507.07920v1 null
2025-07-10 DTECT: Dynamic Topic Explorer & Context Tracker Suman Adhya et.al. 2507.07910v1 null
2025-07-09 4KAgent: Agentic Any Image to 4K Super-Resolution Yushen Zuo et.al. 2507.07105v1 null
2025-07-09 Exploring Public Perceptions of Generative AI in Libraries: A Social Media Analysis of X Discussions Yuan Li et.al. 2507.07047v1 null
2025-07-09 Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics Mehrdad Morsali et.al. 2507.07044v1 null
2025-07-09 Tilings of the sphere by congruent pentagons V: Edge combination $a^{4}b$ with rational angles Jinjin Liang et.al. 2507.07038v1 null
2025-07-09 Classifying integral Grothendieck rings up to rank 5 and beyond Max A. Alekseyev et.al. 2507.07023v1 null
2025-07-09 Quantum Spectral Clustering: Comparing Parameterized and Neuromorphic Quantum Kernels Donovan Slabbert et.al. 2507.07018v1 null
2025-07-09 Deep Brain Net: An Optimized Deep Learning Model for Brain tumor Detection in MRI Images Using EfficientNetB0 and ResNet50 with Transfer Learning Daniel Onah et.al. 2507.07011v1 null
2025-07-09 GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning S M Taslim Uddin Raju et.al. 2507.07006v1 null
2025-07-09 BarkBeetle: Stealing Decision Tree Models with Fault Injection Qifan Wang et.al. 2507.06986v1 null
2025-07-09 Anti-Interference Diffractive Deep Neural Networks for Multi-Object Recognition Zhiqi Huang et.al. 2507.06978v1 null
2025-07-08 Learning to Track Any Points from Human Motion Inès Hyeonsu Kim et.al. 2507.06233v1 null
2025-07-08 seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees Felix Gnettner et.al. 2507.06227v1 null
2025-07-08 EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Yixiang Chen et.al. 2507.06224v1 null
2025-07-08 Topological Holography for Mixed-State Phases and Phase Transitions Ran Luo et.al. 2507.06218v1 null
2025-07-08 What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23 Maria V. Pruzhinskaya et.al. 2507.06217v1 null
2025-07-08 DS@GT at CheckThat! 2025: Ensemble Methods for Detection of Scientific Discourse on Social Media Ayush Parikh et.al. 2507.06205v1 null
2025-07-08 DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification Maximilian Heil et.al. 2507.06195v1 null
2025-07-08 DS@GT at CheckThat! 2025: Detecting Subjectivity via Transfer-Learning and Corrective Data Augmentation Maximilian Heil et.al. 2507.06189v1 null
2025-07-08 SoftReMish: A Novel Activation Function for Enhanced Convolutional Neural Networks for Visual Recognition Performance Mustafa Bayram Gücen et.al. 2507.06148v1 null
2025-07-08 LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models Zhihao Chen et.al. 2507.06140v1 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258v1 null
2025-07-07 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Meng Wei et.al. 2507.05240v1 null
2025-07-07 Bridging Expressivity and Scalability with Adaptive Unitary SSMs Arjun Karuvally et.al. 2507.05238v1 null
2025-07-07 Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage Markiyan Kostiv et.al. 2507.05229v1 null
2025-07-08 MedGemma Technical Report Andrew Sellergren et.al. 2507.05201v2 null
2025-07-07 EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling Boyuan Wang et.al. 2507.05198v1 null
2025-07-07 Light-cone vector superspace and continuous-spin field in AdS R. R. Metsaev et.al. 2507.05194v1 null
2025-07-07 RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis Songxiao Yang et.al. 2507.05193v1 null
2025-07-07 QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks Hoang-Quan Nguyen et.al. 2507.05190v1 null
2025-07-07 Satellite-based Rabi rice paddy field mapping in India: a case study on Telangana state Prashanth Reddy Putta et.al. 2507.05189v1 null
2025-07-03 MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real Renhao Wang et.al. 2507.02864v1 null
2025-07-03 RefTok: Reference-Based Tokenization for Video Generation Xiang Fan et.al. 2507.02862v1 null
2025-07-03 LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans Zhening Huang et.al. 2507.02861v1 null
2025-07-03 Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching Xin Zhou et.al. 2507.02860v1 null
2025-07-03 AnyI2V: Animating Any Conditional Image with Motion Control Ziye Li et.al. 2507.02857v1 null
2025-07-03 Classification and Reduction of Homogeneous Star Products Marvin Dippell et.al. 2507.02820v1 null
2025-07-03 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion Fangfu Liu et.al. 2507.02813v1 null
2025-07-03 HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars Gent Serifi et.al. 2507.02803v1 null
2025-07-03 From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding Xiangfeng Wang et.al. 2507.02790v1 null
2025-07-03 From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images Danrong Zhang et.al. 2507.02781v1 null
2025-07-02 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Rahul Ramachandran et.al. 2507.01955v1 null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949v1 null
2025-07-02 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Nan Chen et.al. 2507.01945v1 null
2025-07-02 CI-VID: A Coherent Interleaved Text-Video Dataset Yiming Ju et.al. 2507.01938v1 null
2025-07-02 evMLP: An Efficient Event-Driven MLP Architecture for Vision Zhentan Zheng et.al. 2507.01927v1 null
2025-07-02 Advancing Magnetic Materials Discovery -- A structure-based machine learning approach for magnetic ordering and magnetic moment prediction Apoorv Verma et.al. 2507.01913v1 null
2025-07-02 Future Slot Prediction for Unsupervised Object Discovery in Surgical Video Guiqiu Liao et.al. 2507.01882v1 null
2025-07-02 A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs Niccolò McConnell et.al. 2507.01881v1 null
2025-07-02 Locally Rotationally Symmetric Spacetimes in Einstein-Cartan Theory and Their Classification Ujjwal Agarwal et.al. 2507.01840v1 null
2025-07-02 mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling Tristan Torchet et.al. 2507.01829v1 null
2025-06-30 How to Design and Train Your Implicit Neural Representation for Video Compression Matthew Gwilliam et.al. 2506.24127v1 null
2025-06-30 TextMesh4D: High-Quality Text-to-4D Mesh Generation Sisi Dai et.al. 2506.24121v1 null
2025-06-30 Nonlinear Symmetry-Fragmentation of Nonabelian Anyons In Symmetry-Enriched Topological Phases: A String-Net Model Realization Nianrui Fu et.al. 2506.24115v1 null
2025-06-30 Epona: Autoregressive Diffusion World Model for Autonomous Driving Kaiwen Zhang et.al. 2506.24113v1 null
2025-06-30 MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction Antoine Guédon et.al. 2506.24096v1 null
2025-06-30 SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks Rahul Kumar et.al. 2506.24081v1 null
2025-06-30 C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism Mayank V. Golhar et.al. 2506.24074v1 null
2025-06-30 Spectroscopy of drive-induced unwanted state transitions in superconducting circuits W. Dai et.al. 2506.24070v1 null
2025-06-30 Evolution models with time-dependent coefficients in friction and viscoelastic damping terms Halit Sevki Aslan et.al. 2506.24058v1 null
2025-06-30 Ella: Embodied Social Agents with Lifelong Memory Hongxin Zhang et.al. 2506.24019v1 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432v1 null
2025-06-27 Single-shot HDR using conventional image sensor shutter functions and optical randomization Xiang Dai et.al. 2506.22426v1 null
2025-06-30 Dehazing Light Microscopy Images with Guided Conditional Flow Matching: finding a sweet spot between fidelity and realism Anirban Ray et.al. 2506.22397v2 null
2025-06-27 Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment Yue Zhang et.al. 2506.22385v1 null
2025-06-27 Topological Defect Propagation to Classify Knitted Fabrics Daisuke S. Shimamoto et.al. 2506.22369v1 null
2025-06-27 From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications Nouf Almesafri et.al. 2506.22360v1 null
2025-06-27 OutDreamer: Video Outpainting with a Diffusion Transformer Linhao Zhong et.al. 2506.22298v1 null
2025-06-27 DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model Yuliang Huang et.al. 2506.22280v1 null
2025-06-27 Almost abelian pseudo-Kähler Lie algebras Diego Conti et.al. 2506.22278v1 null
2025-06-27 Boosting Classification with Quantum-Inspired Augmentations Matthias Tschöpe et.al. 2506.22241v1 null
2025-06-26 Whole-Body Conditioned Egocentric Video Prediction Yutong Bai et.al. 2506.21552v1 null
2025-06-26 SAM4D: Segment Anything in Camera and LiDAR Streams Jianyun Xu et.al. 2506.21547v1 null
2025-06-26 ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers Nicholas S. DiBrita et.al. 2506.21537v1 null
2025-06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Mohammed Baharoon et.al. 2506.21535v1 null
2025-06-26 The spectrum of global representations for families of bounded rank and VI-modules Miguel Barrero et.al. 2506.21525v1 null
2025-06-26 MADrive: Memory-Augmented Driving Scene Modeling Polina Karpikova et.al. 2506.21520v1 null
2025-06-26 G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation Mohammed Rakib et.al. 2506.21514v1 null
2025-06-26 GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation Wentao Hu et.al. 2506.21513v1 null
2025-06-26 Devising a solution to the problems of Cancer awareness in Telangana Priyanka Avhad et.al. 2506.21500v1 null
2025-06-26 Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising Hojat Asgariandehkordi et.al. 2506.21499v1 null
2025-06-25 Artificial Symmetry Breaking by Self-Interaction Error Lin Hou et.al. 2506.20662v1 null
2025-06-25 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View Roi Bar-On et.al. 2506.20652v1 null
2025-06-25 Disentangled representations of microscopy images Jacopo Dapueto et.al. 2506.20649v1 null
2025-06-25 rd-spiral: An open-source Python library for learning 2D reaction-diffusion dynamics through pseudo-spectral method Sandy H. S. Herho et.al. 2506.20633v1 link
2025-06-25 Weighted Mean Frequencies: a handcraft Fourier feature for 4D Flow MRI segmentation Simon Perrin et.al. 2506.20614v1 null
2025-06-25 Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings Ankit Shah et.al. 2506.20609v1 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601v1 null
2025-06-25 CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video Wengxi Li et.al. 2506.20600v1 null
2025-06-25 WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration Chaojun Ni et.al. 2506.20590v1 null
2025-06-25 TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness Pritam Mishra et.al. 2506.20588v1 null
2025-06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li et.al. 2506.19852v1 null
2025-06-24 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Zehuan Huang et.al. 2506.19851v1 null
2025-06-24 Unified Vision-Language-Action Model Yuqi Wang et.al. 2506.19850v1 null
2025-06-24 GenHSI: Controllable Generation of Human-Scene Interaction Videos Zekun Li et.al. 2506.19840v1 null
2025-06-24 Improving Progressive Generation with Decomposable Flow Matching Moayed Haji-Ali et.al. 2506.19839v1 null
2025-06-24 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Liangbin Xie et.al. 2506.19838v1 null
2025-06-24 MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration Yucheng Zhou et.al. 2506.19835v1 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833v1 null
2025-06-24 How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text? Abdullah Khondoker et.al. 2506.19831v1 null
2025-06-25 One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification Yitao Peng et.al. 2506.19808v2 null
2025-06-23 TC-Light: Temporally Consistent Relighting for Dynamic Long Videos Yang Liu et.al. 2506.18904v1 null
2025-06-23 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Runjia Li et.al. 2506.18903v1 null
2025-06-23 From Virtual Games to Real-World Play Wenqiang Sun et.al. 2506.18901v1 null
2025-06-23 FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation Kaiyi Huang et.al. 2506.18899v1 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897v1 null
2025-06-23 Steering Conceptual Bias via Transformer Latent-Subspace Activation Vansh Sharma et.al. 2506.18887v1 null
2025-06-23 Universal Video Temporal Grounding with Generative Multi-modal Large Language Models Zeqian Li et.al. 2506.18883v1 null
2025-06-23 Let Your Video Listen to Your Music! Xinyu Zhang et.al. 2506.18881v1 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866v1 null
2025-06-23 Pointwise-relatively-compact subgroups and trivial-weight-free representations Alexandru Chirvasitu et.al. 2506.18861v1 null
2025-06-20 VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning Zhangyang Qi et.al. 2506.17221v1 null
2025-06-23 Emergent Temporal Correspondences from Video Diffusion Transformers Jisu Nam et.al. 2506.17220v2 link
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201v1 null
2025-06-20 YASMOT: Yet another stereo image multi-object tracker Ketil Malde et.al. 2506.17186v1 null
2025-06-20 High-accuracy inference using HfO$_x$S$_y$/HfS$_2$ Memristors Aferdita Xhameni et.al. 2506.17174v1 null
2025-06-20 Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network Mahin Montasir Afif et.al. 2506.17165v1 null
2025-06-20 Affine semigroups without consecutive small elements J. C. Rosales et.al. 2506.17152v1 null
2025-06-20 Do We Need Large VLMs for Spotting Soccer Actions? Ritabrata Chakraborty et.al. 2506.17144v1 null
2025-06-20 MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification David Jacob Drexlin et.al. 2506.17140v1 null
2025-06-20 Robust Training with Data Augmentation for Medical Imaging Classification Josué Martínez-Martínez et.al. 2506.17133v1 null
2025-06-18 Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos Kaifeng Zhang et.al. 2506.15680v1 null
2025-06-20 Sekai: A Video Dataset towards World Exploration Zhen Li et.al. 2506.15675v2 null
2025-06-18 UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting Kai He et.al. 2506.15673v1 null
2025-06-18 PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection Wenhao Li et.al. 2506.15656v1 null
2025-06-18 Oldies but Goldies: The Potential of Character N-grams for Romanian Texts Dana Lupsa et.al. 2506.15650v1 null
2025-06-18 FindingDory: A Benchmark to Evaluate Memory in Embodied Agents Karmesh Yadav et.al. 2506.15635v1 null
2025-06-18 GFLC: Graph-based Fairness-aware Label Correction for Fair Classification Modar Sulaiman et.al. 2506.15620v1 null
2025-06-18 The Compositional Architecture of Regret in Large Language Models Xiangxiang Cui et.al. 2506.15617v1 null
2025-06-18 TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data Kentaro Seki et.al. 2506.15614v1 null
2025-06-18 BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion Yuqing Lan et.al. 2506.15610v1 null
2025-06-17 GMT: General Motion Tracking for Humanoid Whole-Body Control Zixuan Chen et.al. 2506.14770v1 null
2025-06-17 On the Hardness of Bandit Learning Nataly Brukhim et.al. 2506.14746v1 null
2025-06-17 SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting Ziqiao Peng et.al. 2506.14742v1 null
2025-06-17 Repulsive particle interactions enable selective information processing at cellular interfaces Jenna Elliott et.al. 2506.14739v1 null
2025-06-17 Plug-and-Play with 2.5D Artifact Reduction Prior for Fast and Accurate Industrial Computed Tomography Reconstruction Haley Duba-Sullivan et.al. 2506.14719v1 null
2025-06-17 Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models Ling Li et.al. 2506.14674v1 null
2025-06-17 Quantifying Diagnostic Signal Decay in Dementia: A National Study of Medicare Hospitalization Data Federica Spoto et.al. 2506.14669v1 null
2025-06-17 DDS-NAS: Dynamic Data Selection within Neural Architecture Search via On-line Hard Example Mining applied to Image Classification Matt Poyser et.al. 2506.14667v1 null
2025-06-18 AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation Leah von der Heyde et.al. 2506.14634v2 null
2025-06-17 Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits Taisei Kato et.al. 2506.14624v1 null
2025-06-16 PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images Lingteng Qiu et.al. 2506.13766v1 null
2025-06-16 Touch begins where vision ends: Generalizable policies for contact-rich manipulation Zifan Zhao et.al. 2506.13762v1 null
2025-06-17 VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models Edward Li et.al. 2506.13754v2 null
2025-06-16 Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability Shova Kuikel et.al. 2506.13746v1 null
2025-06-16 Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks Haoqing Li et.al. 2506.13733v1 null
2025-06-16 Probabilistic patient risk profiling with pair-copula constructions Özge Şahin et.al. 2506.13731v1 null
2025-06-16 Contrastive Self-Supervised Learning As Neural Manifold Packing Guanming Zhang et.al. 2506.13717v1 null
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705v1 null
2025-06-16 Eight-dimensional non completely reducible symplectic Lie algebras T. Aït Aissa et.al. 2506.13699v1 null
2025-06-16 Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry Junyoung Seo et.al. 2506.13697v1 null
2025-06-13 crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 Navodini Wijethilake et.al. 2506.12006v1 null
2025-06-13 Visual Pre-Training on Unlabeled Images using Reinforcement Learning Dibya Ghosh et.al. 2506.11967v1 null
2025-06-13 Technical Evaluation of a Disruptive Approach in Homomorphic AI Eric Filiol et.al. 2506.11954v1 null
2025-06-13 Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study Greta Damo et.al. 2506.11919v1 null
2025-06-13 GeistBERT: Breathing Life into German NLP Raphael Scheible-Schmitt et.al. 2506.11903v1 null
2025-06-13 A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification Lu Zhang et.al. 2506.11901v1 null
2025-06-13 Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices Lu Zhang et.al. 2506.11892v1 null
2025-06-13 Methods for evaluating the resolution of 3D data derived from satellite images Christina Selby et.al. 2506.11876v1 null
2025-06-13 MindGrab for BrainChop: Fast and Accurate Skull Stripping for Command Line and Browser Armina Fani et.al. 2506.11860v1 null
2025-06-13 3D Skin Segmentation Methods in Medical Imaging: A Comparison Martina Paccini et.al. 2506.11852v1 null
2025-06-12 InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model Junqi You et.al. 2506.10980v1 null
2025-06-12 GenWorld: Towards Detecting AI-generated Real-world Simulation Videos Weiliang Chen et.al. 2506.10975v1 null
2025-06-12 Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Justin Kerr et.al. 2506.10968v1 null
2025-06-12 Bias-Switchable Row-Column Array Imaging using Fast Orthogonal Row-Column Electronic Scanning (FORCES) Compared with Conventional Row-Column Array Imaging Randy Palamar et.al. 2506.10958v1 null
2025-06-12 Coupled reaction and diffusion governing interface evolution in solid-state batteries Jingxuan Ding et.al. 2506.10944v1 null
2025-06-12 VINCIE: Unlocking In-context Image Editing from Video Leigang Qu et.al. 2506.10941v1 null
2025-06-12 Video-Mediated Emotion Disclosure: A Study of Mental Health Vlogging by People with Schizophrenia on YouTube Jiaying Lizzy Liu et.al. 2506.10932v1 null
2025-06-12 On feature selection in double-imbalanced data settings: a Random Forest approach Fabio Demaria et.al. 2506.10929v1 null
2025-06-12 Semi-Automated Quality Assurance in Digital Pathology: Tile Classification Approach Meredith VandeHaar et.al. 2506.10916v1 null
2025-06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation Jiancheng Huang et.al. 2506.10915v1 null
2025-06-11 DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos Chieh Hubert Lin et.al. 2506.09997v1 null
2025-06-11 PlayerOne: Egocentric World Simulator Yuanpeng Tu et.al. 2506.09995v1 null
2025-06-11 Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages Amel Muminovic et.al. 2506.09992v1 null
2025-06-11 Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes Yiming Dou et.al. 2506.09989v1 null
2025-06-11 A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs Benno Krojer et.al. 2506.09987v1 null
2025-06-11 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Mido Assran et.al. 2506.09985v1 null
2025-06-11 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Zhenzhi Wang et.al. 2506.09984v1 null
2025-06-11 ReSim: Reliable World Simulation for Autonomous Driving Jiazhi Yang et.al. 2506.09981v1 null
2025-06-11 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Junfei Wu et.al. 2506.09965v1 null
2025-06-11 Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos Benjamin Reichman et.al. 2506.09953v1 null
2025-06-10 MagCache: Fast Video Generation with Magnitude-Aware Cache Zehong Ma et.al. 2506.09045v1 null
2025-06-10 The Decoupled Risk Landscape in Performative Prediction Javier Sanguino et.al. 2506.09044v1 null
2025-06-10 Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Xuanchi Ren et.al. 2506.09042v1 null
2025-06-10 Princeton365: A Diverse Dataset with Accurate Camera Pose Karhan Kayan et.al. 2506.09035v1 null
2025-06-10 DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging Felix Wagner et.al. 2506.09024v1 null
2025-06-10 Employing self-supervised learning models for cross-linguistic child speech maturity classification Theo Zhang et.al. 2506.08999v1 null
2025-06-10 Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Chenyu Lian et.al. 2506.08990v1 null
2025-06-10 Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder Yuejiao Wang et.al. 2506.08986v1 null
2025-06-10 Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition Igor Kvasić et.al. 2506.08974v1 null
2025-06-10 Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System Yuan Guo et.al. 2506.08972v1 null
2025-06-09 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Zhen Xu et.al. 2506.08015v1 null
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009v1 null
2025-06-09 Dreamland: Controllable World Creation with Simulator and Generative Models Sicheng Mo et.al. 2506.08006v1 null
2025-06-09 Dynamic View Synthesis as an Inverse Problem Hidir Yesiltepe et.al. 2506.08004v1 null
2025-06-09 Audio-Sync Video Generation with Multi-Stream Temporal Control Shuchen Weng et.al. 2506.08003v1 null
2025-06-09 Generative Modeling of Weights: Generalization or Memorization? Boya Zeng et.al. 2506.07998v1 null
2025-06-09 UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References Ming-Feng Li et.al. 2506.07996v1 null
2025-06-09 CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray Mingquan Lin et.al. 2506.07984v1 null
2025-06-09 Scalable Machine Learning Models for Predicting Quantum Transport in Disordered 2D Hexagonal Materials Seyed Mahdi Mastoor et.al. 2506.07983v1 null
2025-06-09 CyberV: Cybernetics for Test-time Scaling in Video Understanding Jiahao Meng et.al. 2506.07971v1 null
2025-06-06 TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation Muhammad Sohail Danish et.al. 2506.06281v1 null
2025-06-06 Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias Yuanzhe Hu et.al. 2506.06280v1 null
2025-06-06 ExAct: A Video-Language Benchmark for Expert Action Analysis Han Yi et.al. 2506.06277v1 null
2025-06-06 Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding Emmanouil Zaranis et.al. 2506.06275v1 null
2025-06-06 BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading Jonathan Schmidt et.al. 2506.06271v1 null
2025-06-06 Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection Zofia Rudnicka et.al. 2506.06265v1 null
2025-06-06 Tuning of altermagnetism by strain M. Khodas et.al. 2506.06257v1 null
2025-06-06 Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision Yuping He et.al. 2506.06253v1 null
2025-06-06 Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection Sahrish Khan et.al. 2506.06238v1 null
2025-06-06 Towards an Explainable Comparison and Alignment of Feature Embeddings Mohammad Jalali et.al. 2506.06231v1 null
2025-06-05 VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Hanoona Rasheed et.al. 2506.05349v1 null
2025-06-05 Neural Inverse Rendering from Propagating Light Anagh Malik et.al. 2506.05347v1 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343v1 null
2025-06-05 VideoMolmo: Spatio-Temporal Grounding Meets Pointing Ghazi Shazan Ahmad et.al. 2506.05336v1 null
2025-06-05 Unleashing Hour-Scale Video Training for Long Video-Language Understanding Jingyang Lin et.al. 2506.05332v1 null
2025-06-05 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Lidong Lu et.al. 2506.05328v1 null
2025-06-05 LSM-2: Learning from Incomplete Wearable Sensor Data Maxwell A. Xu et.al. 2506.05321v1 null
2025-06-05 ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation Daniel Rho et.al. 2506.05317v1 null
2025-06-05 Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos Weifeng Lin et.al. 2506.05302v1 null
2025-06-05 SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Jianyi Wang et.al. 2506.05301v1 null
2025-06-04 LayerFlow: A Unified Model for Layer-aware Video Generation Sihui Ji et.al. 2506.04228v1 null
2025-06-04 Object-centric 3D Motion Field for Robot Learning from Human Videos Zhao-Heng Yin et.al. 2506.04227v1 null
2025-06-04 Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation Tianyu Huang et.al. 2506.04225v1 null
2025-06-04 Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset Zirui Wang et.al. 2506.04224v1 null
2025-06-04 Topological Mixed States: Axiomatic Approaches and Phases of Matter Tai-Hsuan Yang et.al. 2506.04221v1 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216v1 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213v2 null
2025-06-04 A Few Moments Please: Scalable Graphon Learning via Moment Matching Reza Ramezanpour et.al. 2506.04206v1 null
2025-06-04 Synthetic multi-inversion time magnetic resonance images for visualization of subcortical structures Savannah P. Hays et.al. 2506.04173v1 null
2025-06-04 Does Prompt Design Impact Quality of Data Imputation by LLMs? Shreenidhi Srinivasan et.al. 2506.04172v1 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150v1 null
2025-06-03 Topology meets symmetry breaking: Hidden order, intrinsically gapless topological states and finite-temperature topological transitions Reja H. Wilke et.al. 2506.03146v1 null
2025-06-03 Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval Jiwen Yu et.al. 2506.03141v1 null
2025-06-03 CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Yawen Luo et.al. 2506.03140v1 null
2025-06-03 The perfect entangler spectrum as a tool to analyze crosstalk Matthias G. Krauss et.al. 2506.03137v1 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126v1 null
2025-06-03 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Zhengyao Lv et.al. 2506.03123v1 null
2025-06-03 Controllable Human-centric Keyframe Interpolation with Generative Prior Zujin Guo et.al. 2506.03119v1 null
2025-06-03 HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers Zhiyuan Yu et.al. 2506.03118v1 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099v1 null
2025-05-30 Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks Tajamul Ashraf et.al. 2505.24876v1 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873v1 null
2025-05-30 SiLVR: A Simple Language-based Video Reasoning Framework Ce Zhang et.al. 2505.24869v1 null
2025-05-30 Time Blindness: Why Video-Language Models Can't See What Humans Can? Ujjwal Upadhyay et.al. 2505.24867v1 null
2025-05-30 TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection Xinqi Xiong et.al. 2505.24866v1 null
2025-05-30 DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation Zhao Mandi et.al. 2505.24853v1 null
2025-05-30 Reading Recognition in the Wild Charig Yang et.al. 2505.24848v1 null
2025-05-30 VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software Brandon Man et.al. 2505.24838v1 null
2025-06-02 Beyond Pretty Pictures: Combined Single- and Multi-Image Super-resolution for Sentinel-2 Images Aditya Retnanto et.al. 2505.24799v2 null
2025-05-30 Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification Xinliu Zhong et.al. 2505.24792v1 null
2025-05-29 Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models Haohan Chi et.al. 2505.23757v1 link
2025-05-29 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Diankun Wu et.al. 2505.23747v1 null
2025-05-29 Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need Qiang Wang et.al. 2505.23744v1 null
2025-05-29 DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP Amber Yijia Zheng et.al. 2505.23743v1 null
2025-05-29 MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng et.al. 2505.23742v1 link
2025-05-29 How Animals Dance (When You're Not Looking) Xiaojuan Wang et.al. 2505.23738v1 null
2025-05-30 ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS Weijie Wang et.al. 2505.23734v2 link
2025-05-29 The ambiguous AT2022rze: Changing-look AGN mimicking a supernova in a merging galaxy system P. J. Pessi et.al. 2505.23731v1 null
2025-05-29 Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning Dionysis Christopoulos et.al. 2505.23709v1 null
2025-05-29 Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats Utku Demir et.al. 2505.23706v1 null
2025-05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Danny Driess et.al. 2505.23705v1 null
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647v1 null
2025-05-28 PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization Yezhi Shen et.al. 2505.22616v1 null
2025-05-28 Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method Alanna Hazlett et.al. 2505.22609v1 null
2025-05-28 Transformers for Secure Hardware Systems: Applications, Challenges, and Outlook Banafsheh Saber Latibari et.al. 2505.22605v1 null
2025-05-28 Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans Yiheng Li et.al. 2505.22592v1 null
2025-05-28 Tell me Habibi, is it Real or Fake? Kartik Kuckreja et.al. 2505.22581v1 null
2025-05-28 Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels Aravind R. Krishnan et.al. 2505.22568v1 null
2025-05-28 Universal Visuo-Tactile Video Understanding for Embodied Interaction Yifan Xie et.al. 2505.22566v1 null
2025-05-28 PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion Jaehyun Choi et.al. 2505.22564v1 null
2025-05-28 Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs Changhao Song et.al. 2505.22548v1 null
2025-05-27 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation Boyang Wang et.al. 2505.21491v1 null
2025-05-27 Tissue-specific predictive performance: A unified estimation and inference framework for multi-category screening tests A. Gregory DiRienzo et.al. 2505.21482v1 null
2025-05-27 M3S-UPD: Efficient Multi-Stage Self-Supervised Learning for Fine-Grained Encrypted Traffic Classification with Unknown Pattern Discovery Yali Yuan et.al. 2505.21462v1 null
2025-05-27 LazyVLM: Neuro-Symbolic Approach to Video Analytics Xiangru Jian et.al. 2505.21459v1 null
2025-05-27 OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers Ziqiao Peng et.al. 2505.21448v1 null
2025-05-27 Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization Vit Fojtik et.al. 2505.21423v1 null
2025-05-27 A Structured Unplugged Approach for Foundational AI Literacy in Primary Education Maria Cristina Carrisi et.al. 2505.21398v1 null
2025-05-27 Dynamic Vision from EEG Brain Recordings: How much does EEG know? Prajwal Singh et.al. 2505.21385v1 null
2025-05-27 ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding Linshuang Diao et.al. 2505.21381v1 null
2025-05-27 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Junhao Cheng et.al. 2505.21374v1 null
2025-05-26 Unleashing 5G Seamless Integration with TSN for Industry 5.0: Frame Forwarding and QoS Treatment Oscar Adamuz-Hinojosa et.al. 2505.20239v1 null
2025-05-26 Research on feature fusion and multimodal patent text based on graph attention network Zhenzhen Song et.al. 2505.20188v1 null
2025-05-26 Exposing Go's Hidden Bugs: A Novel Concolic Framework Karolina Gorna et.al. 2505.20183v1 null
2025-05-26 Long-Context State-Space Video World Models Ryan Po et.al. 2505.20171v1 null
2025-05-26 DeepInverse: A Python package for solving imaging inverse problems with deep learning Julián Tachella et.al. 2505.20160v1 null
2025-05-26 HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters Yi Chen et.al. 2505.20156v1 null
2025-05-26 UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models Xueyan Zhang et.al. 2505.20154v1 null
2025-05-26 Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases Cheng-Yu Tai et.al. 2505.20149v1 null
2025-05-26 FairTalk: Facilitating Balanced Participation in Video Conferencing by Implicit Visualization of Predicted Turn-Grabbing Intention Ryo Iijima et.al. 2505.20138v1 null
2025-05-26 TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos Fanheng Kong et.al. 2505.20124v1 link
2025-05-23 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions Zizhang Li et.al. 2505.18151v1 null
2025-05-23 TokBench: Evaluating Your Visual Tokenizer before Visual Generation Junfeng Wu et.al. 2505.18142v1 null
2025-05-23 VideoGameBench: Can Vision-Language Models complete popular video games? Alex L. Zhang et.al. 2505.18134v1 null
2025-05-23 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations Alan Arazi et.al. 2505.18125v1 null
2025-05-23 Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM Zinuo Li et.al. 2505.18110v1 null
2025-05-23 Accelerating Learned Image Compression Through Modeling Neural Training Dynamics Yichi Zhang et.al. 2505.18107v1 null
2025-05-23 F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles Varun Ajith et.al. 2505.18106v1 null
2025-05-23 Structural Dynamics of Harmful Content Dissemination on WhatsApp Yuxin Liu et.al. 2505.18099v1 null
2025-05-23 DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations Ziqiao Peng et.al. 2505.18096v1 null
2025-05-23 Early-Exit Graph Neural Networks Andrea Giuseppe Di Francesco et.al. 2505.18088v1 null
2025-05-22 CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms Shilin Yan et.al. 2505.17020v1 link
2025-05-22 Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space Yan Li et.al. 2505.17011v1 null
2025-05-22 Topological Phases, Criticality, and Mixed State Order in a Hubbard Quantum Simulator Lin Su et.al. 2505.17009v1 null
2025-05-22 Deep mineralogical segmentation of thin section images based on QEMSCAN maps Jean Pablo Vieira de Mello et.al. 2505.17008v1 link
2025-05-22 CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning Jiange Yang et.al. 2505.17006v1 null
2025-05-22 Seeing through Satellite Images at Street Views Ming Qian et.al. 2505.17001v1 null
2025-05-22 Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction Dong Li et.al. 2505.16980v1 null
2025-05-22 MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning Suhao Yu et.al. 2505.16964v1 null
2025-05-22 On Multilingual Encoder Language Model Compression for Low-Resource Languages Daniil Gurgurov et.al. 2505.16956v1 null
2025-05-22 On a certain class of para-Hermite Einstein spaces Adam Chudecki et.al. 2505.16945v1 null
2025-05-21 Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization Satoshi Kosugi et.al. 2505.15812v1 link
2025-05-21 Adaptive Estimation and Learning under Temporal Distribution Shift Dheeraj Baby et.al. 2505.15803v1 null
2025-05-21 Interspatial Attention for Efficient 4D Human Video Generation Ruizhi Shao et.al. 2505.15800v1 null
2025-05-21 Large Language Models as Computable Approximations to Solomonoff Induction Jun Wan et.al. 2505.15784v1 null
2025-05-21 Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention Huanxuan Liao et.al. 2505.15774v1 null
2025-05-21 MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling Cheng Yifan et.al. 2505.15772v1 null
2025-05-21 Neuro-Argumentative Learning with Case-Based Reasoning Adam Gould et.al. 2505.15742v1 null
2025-05-21 iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification Sarfraz Ahmad et.al. 2505.15730v1 null
2025-05-21 Privacy-Preserving Conformal Prediction Under Local Differential Privacy Coby Penso et.al. 2505.15721v1 null
2025-05-21 MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation Maike Behrendt et.al. 2505.15696v1 null
2025-05-20 Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers Sucheng Ren et.al. 2505.14687v1 link
2025-05-20 Emerging Properties in Unified Multimodal Pretraining Chaorui Deng et.al. 2505.14683v1 null
2025-05-20 ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions Bufang Yang et.al. 2505.14668v1 null
2025-05-20 EmoGist: Efficient In-Context Learning for Visual Emotion Understanding Ronald Seoh et.al. 2505.14660v1 null
2025-05-20 Beyond Words: Multimodal LLM Knows When to Speak Zikai Liao et.al. 2505.14654v1 null
2025-05-20 VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Wentao Ma et.al. 2505.14640v1 null
2025-05-20 A General Framework for Group Sparsity in Hyperspectral Unmixing Using Endmember Bundles Gokul Bhusal et.al. 2505.14634v1 null
2025-05-20 Parabolic quantum affine algebras Kudret Bostanci et.al. 2505.14624v1 null
2025-05-20 Assessing Projected Quantum Kernels for the Classification of IoT Data Francesco D'Amore et.al. 2505.14593v1 null
2025-05-20 Automated Fetal Biometry Assessment with Deep Ensembles using Sparse-Sampling of 2D Intrapartum Ultrasound Images Jayroop Ramesh et.al. 2505.14572v1 null
2025-05-19 Unlocking Non-Invasive Brain-to-Text Dulhan Jayalath et.al. 2505.13446v1 null
2025-05-19 GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation Abhay Deshpande et.al. 2505.13441v1 null
2025-05-19 Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos Ruoyu Wang et.al. 2505.13440v1 link
2025-05-19 FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao et.al. 2505.13437v1 null
2025-05-19 Synthetic-Powered Predictive Inference Meshi Bashari et.al. 2505.13432v1 null
2025-05-19 Understanding Complexity in VideoQA via Visual Program Generation Cristobal Eyzaguirre et.al. 2505.13429v1 null
2025-05-19 GuidedMorph: Two-Stage Deformable Registration for Breast MRI Yaqian Chen et.al. 2505.13414v1 null
2025-05-19 Faster Video Diffusion with Trainable Sparse Attention Peiyuan Zhang et.al. 2505.13389v1 null
2025-05-19 RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers Ahmet Berke Gokmen et.al. 2505.13344v1 null
2025-05-19 Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems Babak Badnava et.al. 2505.13337v1 null
2025-05-16 QVGen: Pushing the Limit of Quantized Video Generative Models Yushi Huang et.al. 2505.11497v1 null
2025-05-16 SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics Lizhi Yang et.al. 2505.11494v1 null
2025-05-16 EMU/GAMA: A new approach to characterising radio luminosity functions J. Prathap et.al. 2505.11453v1 null
2025-05-16 GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI Marc-Antoine Fortin et.al. 2505.11445v1 link
2025-05-16 GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art Chenkai Zhang et.al. 2505.11436v1 null
2025-05-16 Neuromorphic Imaging Flow Cytometry combined with Adaptive Recurrent Spiking Neural Networks Georgios Moustakas et.al. 2505.11433v1 null
2025-05-16 Face Consistency Benchmark for GenAI Video Michal Podstawski et.al. 2505.11425v1 null
2025-05-16 Energy efficiency analysis of Spiking Neural Networks for space applications Paolo Lunghi et.al. 2505.11418v1 null
2025-05-16 Uncertainty quantification with approximate variational learning for wearable photoplethysmography prediction tasks Ciaran Bench et.al. 2505.11412v1 null
2025-05-16 Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner Wenchuan Zhang et.al. 2505.11404v1 null
2025-05-15 3D-Fixup: Advancing Photo Editing with 3D Priors Yen-Chi Cheng et.al. 2505.10566v1 null
2025-05-15 Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data Yiwen Liu et.al. 2505.10551v1 link
2025-05-15 Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning Milan Ganai et.al. 2505.10547v1 null
2025-05-15 AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect Tyler Wilson et.al. 2505.10542v1 null
2025-05-15 LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps Filippo Olimpieri et.al. 2505.10537v1 null
2025-05-15 Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming Mohammad Ghalavand et.al. 2505.10536v1 null
2025-05-15 Sobolev and quasiconformal distortion of intermediate dimension with applications to conformal dimension Jonathan M. Fraser et.al. 2505.10525v1 null
2025-05-15 The Devil Is in the Word Alignment Details: On Translation-Based Cross-Lingual Transfer for Token Classification Tasks Benedikt Ebing et.al. 2505.10507v1 null
2025-05-16 WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer Yifan Gao et.al. 2505.10502v2 null
2025-05-15 Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio Tu Duyen Nguyen et.al. 2505.10500v1 null
2025-05-14 UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing Yung-Hsuan Lai et.al. 2505.09615v1 null
2025-05-14 Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Justin Yu et.al. 2505.09601v1 null
2025-05-14 Rhomboid Tiling for Geometric Graph Deep Learning Yipeng Zhang et.al. 2505.09586v1 null
2025-05-14 VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation Chaofan Zhang et.al. 2505.09577v1 null
2025-05-14 Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations Maik Dannecker et.al. 2505.09565v1 null
2025-05-14 Learning Long-Context Diffusion Policies via Past-Token Prediction Marcel Torne et.al. 2505.09561v1 null
2025-05-14 Phase domain walls in coherently driven Bose-Einstein condensates S. S. Gavrilov et.al. 2505.09553v1 null
2025-05-14 Learned Free-Energy Functionals from Pair-Correlation Matching for Dynamical Density Functional Theory Karnik Ram et.al. 2505.09543v1 null
2025-05-14 Multimodal transformers with elemental priors for phase classification of X-ray diffraction spectra Kangyu Ji et.al. 2505.09536v1 null
2025-05-14 Contactless Cardiac Pulse Monitoring Using Event Cameras Mohamed Moustafa et.al. 2505.09529v1 link
2025-05-13 UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations Hanjung Kim et.al. 2505.08787v1 null
2025-05-13 PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework Abhineet Agarwal et.al. 2505.08784v1 null
2025-05-13 Implet: A Post-hoc Subsequence Explainer for Time Series Models Fanyu Meng et.al. 2505.08748v1 link
2025-05-13 Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion Huiyan Qi et.al. 2505.08747v1 null
2025-05-13 TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series Xiaolei Qin et.al. 2505.08723v1 link
2025-05-13 Contrastive Normalizing Flows for Uncertainty-Aware Parameter Estimation Ibrahim Elsharkawy et.al. 2505.08709v1 null
2025-05-13 Big Data and the Computational Social Science of Entrepreneurship and Innovation Ningzi Li et.al. 2505.08706v1 null
2025-05-13 LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs K M Sajjadul Islam et.al. 2505.08704v1 null
2025-05-14 Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities George Saon et.al. 2505.08699v2 null
2025-05-13 VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation Badhan Kumar Das et.al. 2505.08693v1 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818v1 null
2025-05-12 Pixel Motion as Universal Representation for Robot Control Kanchana Ranasinghe et.al. 2505.07817v1 null
2025-05-12 DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies Tony Tao et.al. 2505.07813v1 null
2025-05-12 Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets Weiyu Li et.al. 2505.07747v1 null
2025-05-12 BodyGPS: Anatomical Positioning System Halid Ziya Yerebakan et.al. 2505.07744v1 null
2025-05-13 VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections Eason Chen et.al. 2505.07736v2 null
2025-05-12 Spoken Language Understanding on Unseen Tasks With In-Context Learning Neeraj Agrawal et.al. 2505.07731v1 null
2025-05-12 Gameplay Highlights Generation Vignesh Edithal et.al. 2505.07721v1 null
2025-05-12 PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes Daniel Ogenrwot et.al. 2505.07700v1 null
2025-05-12 ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation Feng Yuan et.al. 2505.07687v1 link
2025-05-09 Adapting a Segmentation Foundation Model for Medical Image Classification Pengfei Gu et.al. 2505.06217v1 null
2025-05-09 Topo-VM-UNetV2: Encoding Topology into Vision Mamba UNet for Polyp Segmentation Diego Adame et.al. 2505.06210v1 null
2025-05-09 Leveraging Multi-Task Learning for Multi-Label Power System Security Assessment Muhy Eddin Za'ter et.al. 2505.06207v1 null
2025-05-09 Auto Tensor Singular Value Thresholding: A Non-Iterative and Rank-Free Framework for Tensor Denoising Hiroki Hasegawa et.al. 2505.06203v1 null
2025-05-09 Neuro-Symbolic Concepts Jiayuan Mao et.al. 2505.06191v1 null
2025-05-09 Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet Kodai Hirata et.al. 2505.06185v1 null
2025-05-09 Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach Tim Schneider et.al. 2505.06182v1 null
2025-05-09 New Advances in Phonons: From Band Topology to Quasiparticle Chirality Tiantian Zhang et.al. 2505.06179v1 null
2025-05-09 MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks Wenqi Zeng et.al. 2505.06152v1 link
2025-05-09 Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework Alice Rueda et.al. 2505.06151v1 null
2025-05-08 SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation Yonwoo Choi et.al. 2505.05475v1 link
2025-05-08 3D Scene Generation: A Survey Beichen Wen et.al. 2505.05474v1 link
2025-05-08 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Haibo Wang et.al. 2505.05467v1 null
2025-05-08 SITE: towards Spatial Intelligence Thorough Evaluation Wenqi Wang et.al. 2505.05456v1 null
2025-05-08 Robustly optimal dynamics for active matter reservoir computing Mario U. Gaimann et.al. 2505.05420v1 null
2025-05-08 DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing Nilesh Prasad Pandey et.al. 2505.05413v1 null
2025-05-08 Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It Marvin F. da Silva et.al. 2505.05409v1 null
2025-05-08 CART-ELC: Oblique Decision Tree Induction via Exhaustive Search Andrew D. Laack et.al. 2505.05402v1 link
2025-05-08 OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation Naveenkumar G Venkataswamy et.al. 2505.05374v1 null
2025-05-08 BMS representations for generic supermomentum Xavier Bekaert et.al. 2505.05368v1 null
2025-05-07 Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait Feng Liu et.al. 2505.04616v1 null
2025-05-07 Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems Mohammad Merati et.al. 2505.04596v1 null
2025-05-07 Relative benefits of different active learning methods to conceptual physics learning Meagan Sundstrom et.al. 2505.04577v1 null
2025-05-07 Multitask LSTM for Arboviral Outbreak Prediction Using Public Health Data Lucas R. C. Farias et.al. 2505.04566v1 null
2025-05-07 Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration Asma Baobaid et.al. 2505.04524v1 null
2025-05-07 Complementary legs and symplectic rational balls John B. Etnyre et.al. 2505.04513v1 null
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512v2 null
2025-05-07 Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition Asma Baobaid et.al. 2505.04502v1 null
2025-05-08 FA-KPConv: Introducing Euclidean Symmetries to KPConv via Frame Averaging Ali Alawieh et.al. 2505.04485v2 null
2025-05-07 Securing Immersive 360 Video Streams through Attribute-Based Selective Encryption Mohammad Waquas Usmani et.al. 2505.04466v1 null
2025-05-06 Multi-Agent System for Comprehensive Soccer Understanding Jiayuan Rao et.al. 2505.03735v1 null
2025-05-06 FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios Shiyi Zhang et.al. 2505.03730v1 null
2025-05-07 Visual Imitation Enables Contextual Humanoid Control Arthur Allshire et.al. 2505.03729v2 null
2025-05-06 DISARM++: Beyond scanner-free harmonization Luca Caldera et.al. 2505.03715v1 null
2025-05-06 NBF at SemEval-2025 Task 5: Light-Burst Attention Enhanced System for Multilingual Subject Recommendation Baharul Islam et.al. 2505.03711v1 null
2025-05-06 Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning François Role et.al. 2505.03703v1 null
2025-05-06 Neural Integral Operators for Inverse problems in Spectroscopy Emanuele Zappala et.al. 2505.03677v1 null
2025-05-06 Vector valued optimal transport: from dynamic to static formulations Katy Craig et.al. 2505.03670v1 null
2025-05-06 m-accretive extensions of Friedrichs operators Krešimir Burazin et.al. 2505.03657v1 null
2025-05-06 ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders Chethan Krishnamurthy Ramanaik et.al. 2505.03646v1 null
2025-05-06 Towards Application-Specific Evaluation of Vision Models: Case Studies in Ecology and Biology Alex Hoi Hang Chan et.al. 2505.02825v2 null
2025-05-05 Towards Quantifying the Hessian Structure of Neural Networks Zhaorui Dong et.al. 2505.02809v1 null
2025-05-05 Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow Jai Prakash Veerla et.al. 2505.02780v1 null
2025-05-05 Teaching the social media generation: rethinking learning without sacrificing quality Sepinoud Azimi et.al. 2505.02770v1 null
2025-05-05 The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD Aggeliki Sideraki et.al. 2505.02747v1 null
2025-05-05 The Spectrum of Stable Infinity Categories with Actions Hisato Matsukawa et.al. 2505.02724v1 null
2025-05-05 A Rate-Quality Model for Learned Video Coding Sang NguyenQuang et.al. 2505.02720v1 null
2025-05-05 Searching for supermassive black holes binaries within SRG/eROSITA-De I: Properties of the X-ray selected candidates D. Tubín-Arenas et.al. 2505.02708v1 null
2025-05-05 Multi-View Learning with Context-Guided Receptance for Image Denoising Binghong Chen et.al. 2505.02705v1 null
2025-05-05 A Survey on Progress in LLM Alignment from the Perspective of Reward Design Miaomiao Ji et.al. 2505.02666v1 null
2025-05-02 GENMO: A GENeralist Model for Human MOtion Jiefeng Li et.al. 2505.01425v1 null
2025-05-02 VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models Mohammadreza Teymoorianfard et.al. 2505.01406v1 null
2025-05-02 Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes Wallace Peaslee et.al. 2505.01388v1 null
2025-05-02 Emerging Media Use and Acceptance of Digital Immortality: A Cluster Analysis among Chinese Young Generations Yi Mou et.al. 2505.01355v1 null
2025-05-02 How to Learn a Star: Binary Classification with Starshaped Polyhedral Sets Marie-Charlotte Brandenburg et.al. 2505.01346v1 null
2025-05-02 Classifying Radio-Loud and Radio-Quiet Quasars With Novel PCA Based Regression Classifier Ramkrishna Joshi et.al. 2505.01335v1 null
2025-05-02 DebtStreamness: An Ecological Approach to Credit Flows in Inter-Firm Networks Anahí Rodríguez-Martínez et.al. 2505.01326v1 null
2025-05-02 Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System Sheikh Samit Muhaimin et.al. 2505.01315v1 null
2025-05-02 Contactless pulse rate assessment: Results and insights for application in driving simulator Đorđe D. Nešković et.al. 2505.01299v1 null
2025-05-02 ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow Changhe Chen et.al. 2505.01288v1 null
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704v1 null
2025-05-01 GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution Aditya Arora et.al. 2505.00687v1 null
2025-05-01 MINERVA: Evaluating Complex Video Reasoning Arsha Nagrani et.al. 2505.00681v1 null
2025-05-01 Rational points on $X_0(N)^*$ when $N$ is non-squarefree Sachi Hashimoto et.al. 2505.00680v1 null
2025-05-01 Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI Merve Gülle et.al. 2505.00643v1 null
2025-05-01 Bayes-Optimal Fair Classification with Multiple Sensitive Features Yi Yang et.al. 2505.00631v1 null
2025-05-01 Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis Zhongying Deng et.al. 2505.00627v1 null
2025-05-01 Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction Simon Giebenhain et.al. 2505.00615v1 null
2025-05-01 Dietary Intake Estimation via Continuous 3D Reconstruction of Food Wallace Lee et.al. 2505.00606v1 null
2025-05-01 Visual Trajectory Prediction of Vessels for Inland Navigation Alexander Puzicha et.al. 2505.00599v1 null
2025-04-30 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Qihao Liu et.al. 2504.21855v1 null
2025-04-30 A Survey of Interactive Generative Video Jiwen Yu et.al. 2504.21853v1 null
2025-04-30 Active Light Modulation to Counter Manipulation of Speech Visual Content Hadleigh Schwartz et.al. 2504.21846v1 null
2025-04-30 Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic Mikihisa Yuasa et.al. 2504.21841v1 null
2025-04-30 Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat Clark Mingxuan Ju et.al. 2504.21838v1 null
2025-04-30 Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization Anas Anwarul Haq Khan et.al. 2504.21831v1 null
2025-04-30 Discrete series for the graded Hecke algebra of type $H_{4}$ Kei Yuen Chan et.al. 2504.21790v1 null
2025-04-30 LoC-LIC: Low Complexity Learned Image Coding Using Hierarchical Feature Transforms Ayman A. Ameen et.al. 2504.21778v1 null
2025-04-30 Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline Minwoo Oh et.al. 2504.21772v1 null
2025-04-30 Ends of the strata of differentials Benjamin Dozier et.al. 2504.21756v1 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Haoyu Zhen et.al. 2504.20995v1 null
2025-04-29 Photonic Quantum Convolutional Neural Networks with Adaptive State Injection Léo Monbroussou et.al. 2504.20989v1 null
2025-04-29 SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features Mete Erdogan et.al. 2504.20970v1 null
2025-04-29 Soft-X-ray momentum microscopy of nonlinear magnon interactions below 100-nm wavelength Steffen Wittrock et.al. 2504.20958v1 null
2025-04-30 DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition Yanghui Song et.al. 2504.20948v2 link
2025-04-29 Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity Taisuke Kobayashi et.al. 2504.20932v1 null
2025-04-29 Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers Quentin Guimard et.al. 2504.20902v1 null
2025-04-29 CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models Hasan Md Tusfiqur Alam et.al. 2504.20898v1 null
2025-04-29 Imaging on the Edge: Mapping Object Corners and Edges with Stereo X-ray Tomography Zhenduo Shang et.al. 2504.20892v1 null
2025-04-30 Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks Junyuan Fang et.al. 2504.20869v2 null
2025-04-28 Learning Streaming Video Representation via Multitask Training Yibin Yan et.al. 2504.20041v1 null
2025-04-28 Pan-genome Analysis of Angiosperm Plastomes using PGR-TK Manoj P. Samanta et.al. 2504.20034v1 null
2025-04-28 Towards AI-Driven Policing: Interdisciplinary Knowledge Discovery from Police Body-Worn Camera Footage Anita Srbinovska et.al. 2504.20007v1 null
2025-04-28 Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose Narges Rashvand et.al. 2504.19970v1 null
2025-04-28 Enhancing Quality for VVC Compressed Videos with Omniscient Quality Enhancement Model Xiem HoangVan et.al. 2504.19935v1 null
2025-04-28 Accelerated 3D-3D rigid registration of echocardiographic images obtained from apical window using particle filter Thanuja Uruththirakodeeswaran et.al. 2504.19930v1 null
2025-04-28 Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI Hugo Georgenthum et.al. 2504.19918v1 null
2025-04-28 Breast Cancer Detection from Multi-View Screening Mammograms with Visual Prompt Tuning Han Chen et.al. 2504.19900v1 null
2025-04-28 GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets Mingqian He et.al. 2504.19898v1 null
2025-04-28 CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung et.al. 2504.19894v1 null
2025-04-25 RSFR: A Coarse-to-Fine Reconstruction Framework for Diffusion Tensor Cardiac MRI with Semantic-Aware Refinement Jiahao Huang et.al. 2504.18520v1 null
2025-04-25 Co-Change Graph Entropy: A New Process Metric for Defect Prediction Ethari Hrishikesh et.al. 2504.18511v1 null
2025-04-25 Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models Patrick Müller et.al. 2504.18510v1 null
2025-04-25 SymTFT, Protected Gaplessness, and Spontaneous Breaking of Non-invertible Symmetries Michele Del Zotto et.al. 2504.18501v1 null
2025-04-25 Quasi-Einstein structures and Hitchin's equations Alex Colling et.al. 2504.18475v1 null
2025-04-25 A Novel Taxonomy and Classification Scheme for Code Smell Interactions Ruchin Gupta et.al. 2504.18469v1 null
2025-04-25 A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression Muzaffar Qureshi et.al. 2504.18463v1 null
2025-04-25 Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training Hiroki Naganuma et.al. 2504.18454v1 null
2025-04-25 NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration Haotian Dong et.al. 2504.18448v1 null
2025-04-25 Iterative Event-based Motion Segmentation by Variational Contrast Maximization Ryo Yamaki et.al. 2504.18447v1 null
2025-04-24 Dynamic Camera Poses and Where to Find Them Chris Rockwell et.al. 2504.17788v1 null
2025-04-24 Silenzio: Secure Non-Interactive Outsourced MLP Training Jonas Sander et.al. 2504.17785v1 null
2025-04-24 Disaggregated Deep Learning via In-Physics Computing at Radio Frequency Zhihui Gao et.al. 2504.17752v1 null
2025-04-24 MSGCN: Multiplex Spatial Graph Convolution Network for Interlayer Link Weight Prediction Steven E. Wilson et.al. 2504.17749v1 null
2025-04-24 Interpretable Early Detection of Parkinson's Disease through Speech Analysis Lorenzo Simone et.al. 2504.17739v1 null
2025-04-24 CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos Shucheng Gong et.al. 2504.17728v1 null
2025-04-24 Unsupervised EEG-based decoding of absolute auditory attention with canonical correlation analysis Nicolas Heintz et.al. 2504.17724v1 null
2025-04-24 Evaluating Uncertainty in Deep Gaussian Processes Matthijs van der Lende et.al. 2504.17719v1 null
2025-04-24 Early Detection of Multidrug Resistance Using Multivariate Time Series Analysis and Interpretable Patient-Similarity Representations Óscar Escudero-Arnanz et.al. 2504.17717v1 null
2025-04-24 Self-Supervised Noise Adaptive MRI Denoising via Repetition to Repetition (Rep2Rep) Learning Nikola Janjušević et.al. 2504.17698v1 null
2025-04-23 I-Con: A Unifying Framework for Representation Learning Shaden Alshammari et.al. 2504.16929v1 null
2025-04-23 Year six photometric measurements of known Trans-Neptunian Objects and Centaurs by the Dark Energy Survey Feliphe S. Ferreira et.al. 2504.16927v1 null
2025-04-23 Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving Jacob Levy et.al. 2504.16923v1 null
2025-04-23 Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text Shifali Agrahari et.al. 2504.16913v1 null
2025-04-23 BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation Ruotong Wang et.al. 2504.16907v1 null
2025-04-23 A new approach to the classification of almost contact metric manifolds via intrinsic endomorphisms Ilka Agricola et.al. 2504.16900v1 null
2025-04-23 Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification Alexander Shvets et.al. 2504.16856v1 null
2025-04-23 Energetics of the nucleation and glide of disconnection modes in symmetric tilt grain boundaries Himanshu Joshi et.al. 2504.16854v1 null
2025-04-23 A Low-Cost Photogrammetry System for 3D Plant Modeling and Phenotyping Joe Hrzich et.al. 2504.16840v1 null
2025-04-23 Symbiotic stars in the era of modern ground- and space-based surveys Jaroslav Merc et.al. 2504.16825v1 null
2025-04-22 MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention Yucheng Li et.al. 2504.16083v1 null
2025-04-22 MR. Video: "MapReduce" is the Principle for Long Video Understanding Ziqi Pang et.al. 2504.16082v1 null
2025-04-22 Survey of Video Diffusion Models: Foundations, Implementations, and Applications Yimu Wang et.al. 2504.16081v1 null
2025-04-22 Describe Anything: Detailed Localized Image and Video Captioning Long Lian et.al. 2504.16072v1 null
2025-04-22 Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis Frank Li et.al. 2504.16047v1 null
2025-04-22 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Joya Chen et.al. 2504.16030v1 null
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016v1 null
2025-04-22 MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment Yachun Mi et.al. 2504.16003v1 null
2025-04-22 Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems Lukas Gehrke et.al. 2504.15984v1 null
2025-04-22 Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling Sophie C. Pope et.al. 2504.15972v1 null
2025-04-22 DRAWER: Digital Reconstruction and Articulation With Environment Realism Hongchi Xia et.al. 2504.15278v2 null
2025-04-21 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Guo Chen et.al. 2504.15271v1 null
2025-04-21 An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes Ji Qi et.al. 2504.15270v1 null
2025-04-21 Diffusion Bridge Models for 3D Medical Image Translation Shaorong Zhang et.al. 2504.15267v1 null
2025-04-21 SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam Tue Vo et.al. 2504.15252v1 null
2025-04-21 On Walker and para-Hermite Einstein spaces Adam Chudecki et.al. 2504.15221v1 null
2025-04-22 Histogram-based Parameter-efficient Tuning for Passive Sonar Classification Amirmohammad Mohammadi et.al. 2504.15214v2 null
2025-04-21 Automated Measurement of Eczema Severity with Self-Supervised Learning Neelesh Kumar et.al. 2504.15193v1 null
2025-04-21 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Xianpan Zhou et.al. 2504.15182v1 null
2025-04-21 FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image Fei Yin et.al. 2504.15179v1 null
2025-04-18 Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models Junjie Yang et.al. 2504.13825v1 null
2025-04-18 CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Yang Yue et.al. 2504.13820v1 link
2025-04-18 The Binary and Ternary Quantization Can Improve Feature Discrimination Weizhi Lu et.al. 2504.13792v1 null
2025-04-18 Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization Aman Agarwal et.al. 2504.13776v1 null
2025-04-18 Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? Motunrayo Ibiyo et.al. 2504.13769v1 null
2025-04-18 Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback Peyman Jahanbin et.al. 2504.13765v1 null
2025-04-18 Fragile Watermarking for Image Certification Using Deep Steganographic Embedding Davide Ghiani et.al. 2504.13759v1 null
2025-04-18 Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis Zhu Zhu et.al. 2504.13754v1 null
2025-04-18 LimitNet: Progressive, Content-Aware Image Offloading for Extremely Weak Devices & Networks Ali Hojjat et.al. 2504.13736v1 null
2025-04-18 The relativity of color perception Michel Berthier et.al. 2504.13720v1 null
2025-04-17 Perception Encoder: The best visual embeddings are not at the output of the network Daniel Bolya et.al. 2504.13181v1 null
2025-04-17 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Jang Hyun Cho et.al. 2504.13180v1 null
2025-04-18 ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos Zetong Zhang et.al. 2504.13167v2 null
2025-04-17 Digital Twin Generation from Visual Data: A Survey Andrew Melnik et.al. 2504.13159v1 null
2025-04-17 St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World Haiwen Feng et.al. 2504.13152v1 null
2025-04-17 Readable Twins of Unreadable Models Krzysztof Pancerz et.al. 2504.13150v1 null
2025-04-17 Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps Matt Schmittle et.al. 2504.13149v1 null
2025-04-17 PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition Jongseo Lee et.al. 2504.13140v1 null
2025-04-17 NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results Xin Li et.al. 2504.13131v1 link
2025-04-17 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Haojian Huang et.al. 2504.13122v1 link
2025-04-16 Adapting a World Model for Trajectory Following in a 3D Game Marko Tot et.al. 2504.12299v1 null
2025-04-16 SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians Liam Schoneveld et.al. 2504.12292v1 null
2025-04-16 Beyond Reconstruction: A Physics Based Neural Deferred Shader for Photo-realistic Rendering Zhuo He et.al. 2504.12273v1 null
2025-04-16 Correlation Ratio for Unsupervised Learning of Multi-modal Deformable Registration Xiaojian Chen et.al. 2504.12265v1 null
2025-04-16 VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate Zhihang Yuan et.al. 2504.12259v1 null
2025-04-16 FLIP Reasoning Challenge Andreas Plesner et.al. 2504.12256v1 null
2025-04-16 Human Aligned Compression for Robust Models Samuel Räber et.al. 2504.12255v1 null
2025-04-16 Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography Zhijin He et.al. 2504.12249v1 null
2025-04-16 SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction Xia Wang et.al. 2504.12245v1 null
2025-04-16 Coding-Prior Guided Diffusion Network for Video Deblurring Yike Liu et.al. 2504.12222v1 null
2025-04-15 Mamba-Based Ensemble learning for White Blood Cell Classification Lewis Clifton et.al. 2504.11438v1 null
2025-04-15 Enhancing Out-of-Distribution Detection with Extended Logit Normalization Yifan Ding et.al. 2504.11434v1 null
2025-04-15 Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models Maria Teleki et.al. 2504.11431v1 null
2025-04-15 NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Yanrui Bin et.al. 2504.11427v1 null
2025-04-15 Deep Learning-based Bathymetry Retrieval without In-situ Depths using Remote Sensing Imagery and SfM-MVS DSMs with Data Gaps Panagiotis Agrafiotis et.al. 2504.11416v1 null
2025-04-15 Statistical few-shot learning for large-scale classification via parameter pooling Andrew Simpson et.al. 2504.11404v1 null
2025-04-15 VideoPanda: Video Panoramic Diffusion with Multi-view Attention Kevin Xie et.al. 2504.11389v1 null
2025-04-15 Trajectory Encoding Temporal Graph Networks Jiafeng Xiong et.al. 2504.11386v1 null
2025-04-15 Ring Artifacts Correction Based on Global-Local Features Interaction Guidance in the Projection Domain Yunze Liu et.al. 2504.11375v1 null
2025-04-15 A two-phase quenching-type problem for the p-Laplacian Julio C. Correa et.al. 2504.11370v1 null
2025-04-14 DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting Zeren Jiang et.al. 2504.10486v1 null
2025-04-14 Quantum Barcodes: Persistent Homology for Quantum Phase Transitions Khyathi Komalan et.al. 2504.10468v1 null
2025-04-14 Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis Ramin Mousa et.al. 2504.10452v1 null
2025-04-14 Multimodal Long Video Modeling Based on Temporal Dynamic Context Haoran Hao et.al. 2504.10443v1 null
2025-04-14 Framing Perception: Exploring Camera Induced Objectification in Cinema Parth Maradia et.al. 2504.10404v1 null
2025-04-14 PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems Maud Biquard et.al. 2504.10375v1 null
2025-04-14 Proteinoid spikes: from protocognitive to universal approximating agents Saksham Sharma et.al. 2504.10362v1 null
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358v1 null
2025-04-14 Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication Jeremiah Giordani et.al. 2504.10353v1 null
2025-04-14 Domain-Adversarial Neural Network and Explainable AI for Reducing Tissue-of-Origin Signal in Pan-cancer Mortality Classification Cristian Padron-Manrique et.al. 2504.10343v1 null
2025-04-11 ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning Sahil Sethi et.al. 2504.08713v1 null
2025-04-11 Hypergraph Vision Transformers: Images are More than Nodes, More than Edges Joshua Fixelle et.al. 2504.08710v1 null
2025-04-11 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Team Seawead et.al. 2504.08685v1 null
2025-04-11 BowelRCNN: Region-based Convolutional Neural Network System for Bowel Sound Auscultation Igor Matynia et.al. 2504.08659v1 null
2025-04-11 The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation Masashi Hatano et.al. 2504.08654v1 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641v1 null
2025-04-11 Transformer Learns Optimal Variable Selection in Group-Sparse Classification Chenyang Zhang et.al. 2504.08638v1 null
2025-04-11 Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition Lei Kang et.al. 2504.08616v1 null
2025-04-11 Enhancing knowledge retention for continual learning with domain-specific adapters and features gating Mohamed Abbas Hedjazi et.al. 2504.08613v1 null
2025-04-11 A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English Julian Bäumler et.al. 2504.08609v1 null
2025-04-10 GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Lang Lin et.al. 2504.07962v1 null
2025-04-10 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction Zeren Jiang et.al. 2504.07961v1 null
2025-04-10 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Yukun Qi et.al. 2504.07956v1 null
2025-04-10 BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation Yuanhong Yu et.al. 2504.07955v1 null
2025-04-10 InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians Kefan Chen et.al. 2504.07949v1 null
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940v1 null
2025-04-10 Zero-Shot Low-dose CT Denoising via Sinogram Flicking Yongyi Shi et.al. 2504.07927v1 null
2025-04-10 SKK groups of manifolds and non-unitary invertible TQFTs Renee S. Hoekzema et.al. 2504.07917v1 null
2025-04-10 Semantically Encoding Activity Labels for Context-Aware Human Activity Recognition Wen Ge et.al. 2504.07916v1 link
2025-04-10 The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound Blake VanBerlo et.al. 2504.07904v1 null
2025-04-09 Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning Nikhil Shivakumar Nayak et.al. 2504.07097v1 null
2025-04-09 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution Gene Chou et.al. 2504.07093v1 null
2025-04-09 Are We Done with Object-Centric Learning? Alexander Rubinstein et.al. 2504.07092v1 null
2025-04-10 GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Mengchen Zhang et.al. 2504.07083v2 null
2025-04-09 Detecting AI-generated Artwork Meien Li et.al. 2504.07078v1 null
2025-04-09 Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling Riselda Kodra et.al. 2504.07065v1 null
2025-04-09 $Π$-NeSy: A Possibilistic Neuro-Symbolic Approach Ismaïl Baaj et.al. 2504.07055v1 null
2025-04-09 Classification results for totally real surfaces of nearly Kähler $\mathbb{C}P^3$ Michaël Liefsoens et.al. 2504.07035v1 null
2025-04-09 Weak Signals and Heavy Tails: Machine-learning meets Extreme Value Theory Stephan Clémençon et.al. 2504.06984v1 null
2025-04-10 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Xinhao Li et.al. 2504.06958v2 null
2025-04-08 PainNet: Statistical Relation Network with Episode-Based Training for Pain Estimation Mina Bishay et.al. 2504.06257v1 null
2025-04-08 Monitoring Viewer Attention During Online Ads Mina Bishay et.al. 2504.06237v1 null
2025-04-08 From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models Chejian Xu et.al. 2504.06214v1 null
2025-04-08 HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation Yiming Liang et.al. 2504.06210v1 null
2025-04-08 An experimental survey and Perspective View on Meta-Learning for Automated Algorithms Selection and Parametrization Moncef Garouani et.al. 2504.06207v1 null
2025-04-08 HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling Qing Xu et.al. 2504.06205v1 link
2025-04-08 Positive 3-braids, Khovanov homology and Garside theory Álvaro Del Valle Vílchez et.al. 2504.06194v1 null
2025-04-08 Rethinking the Nested U-Net Approach: Enhancing Biomarker Segmentation with Attention Mechanisms and Multiscale Feature Fusion Saad Wazir et.al. 2504.06158v1 link
2025-04-08 A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning Akash Kumar et.al. 2504.06153v1 null
2025-04-08 Optimal classification with outcome performativity Elizabeth Maggie Penn et.al. 2504.06127v1 null
2025-04-07 SmolVLM: Redefining small and efficient multimodal models Andrés Marafioti et.al. 2504.05299v1 null
2025-04-07 One-Minute Video Generation with Test-Time Training Karan Dalal et.al. 2504.05298v1 null
2025-04-07 Hopf tori and standard tori Leonardo A. Cano García et.al. 2504.05285v1 null
2025-04-07 AnomalousNet: A Hybrid Approach with Attention U-Nets and Change Point Detection for Accurate Characterization of Anomalous Diffusion in Video Data Yusef Ahsini et.al. 2504.05271v1 null
2025-04-07 Explaining Low Perception Model Competency with High-Competency Counterfactuals Sara Pohland et.al. 2504.05254v1 null
2025-04-07 Federated Learning for Medical Image Classification: A Comprehensive Benchmark Zhekai Zhou et.al. 2504.05238v1 null
2025-04-07 Mapping biodiversity at very-high resolution in Europe César Leblanc et.al. 2504.05231v1 null
2025-04-07 Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation Jiaming Chen et.al. 2504.05225v1 null
2025-04-07 An ensemble deep learning approach to detect tumors on Mohs micrographic surgery slides Abdurrahim Yilmaz et.al. 2504.05219v1 null
2025-04-07 LLM-Alignment Live-Streaming Recommendation Yueyang Liu et.al. 2504.05217v1 null
2025-04-04 Bonsai: Interpretable Tree-Adaptive Grounded Reasoning Kate Sanders et.al. 2504.03640v1 null
2025-04-04 MedSAM2: Segment Anything in 3D Medical Images and Videos Jun Ma et.al. 2504.03600v1 null
2025-04-04 Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin for Real-World Robot Policy Evaluation Jad Abou-Chakra et.al. 2504.03597v1 null
2025-04-04 AdaViT: Adaptive Vision Transformer for Flexible Pretrain and Finetune with Variable 3D Medical Image Modalities Badhan Kumar Das et.al. 2504.03589v1 null
2025-04-04 AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing Niu Lian et.al. 2504.03587v1 link
2025-04-04 Dense Neural Network Based Arrhythmia Classification on Low-cost and Low-compute Micro-controller Md Abu Obaida Zishan et.al. 2504.03531v1 null
2025-04-04 LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders Ilan Naiman et.al. 2504.03501v1 null
2025-04-04 Physics-informed 4D X-ray image reconstruction from ultra-sparse spatiotemporal data Zisheng Yao et.al. 2504.03469v1 null
2025-04-04 Conditioning Diffusions Using Malliavin Calculus Jakiw Pidstrigach et.al. 2504.03461v1 null
2025-04-04 Early detection of diabetes through transfer learning-based eye (vision) screening and improvement of machine learning model performance and advanced parameter setting algorithms Mohammad Reza Yousefi et.al. 2504.03439v1 null
2025-04-03 STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection Divya Velayudhan et.al. 2504.02823v1 null
2025-04-03 GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings Yuexi Du et.al. 2504.02819v1 null
2025-04-03 BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation Van Nguyen Nguyen et.al. 2504.02812v1 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792v1 null
2025-04-03 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782v1 null
2025-04-03 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang et.al. 2504.02764v1 null
2025-04-03 A Complete Classification of Fourier Summation Formulas on the real line Felipe Gonçalves et.al. 2504.02741v1 null
2025-04-03 HQViT: Hybrid Quantum Vision Transformer for Image Classification Hui Zhang et.al. 2504.02730v1 null
2025-04-03 Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation Xingguang Zhang et.al. 2504.02697v1 null
2025-04-03 Two-Stage nnU-Net for Automatic Multi-class Bi-Atrial Segmentation from LGE-MRIs Y. On et.al. 2504.02668v1 null
2025-04-02 Learning from Streaming Video with Orthogonal Gradients Tengda Han et.al. 2504.01961v1 null
2025-04-02 Slot-Level Robotic Placement via Visual Imitation from Single Human Video Dandan Shan et.al. 2504.01959v1 null
2025-04-03 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang et.al. 2504.01956v2 null
2025-04-02 A thorough benchmark of automatic text classification: From traditional approaches to large language models Washington Cunha et.al. 2504.01930v1 null
2025-04-02 Gen-C: Populating Virtual Worlds with Generative Crowds Andreas Panayiotou et.al. 2504.01924v1 null
2025-04-02 Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness Haochen Wang et.al. 2504.01901v1 null
2025-04-02 Is Temporal Prompting All We Need For Limited Labeled Action Recognition? Shreyank N Gowda et.al. 2504.01890v1 null
2025-04-02 CO-DEFEND: Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection Diego Cajaraville-Aboy et.al. 2504.01882v1 null
2025-04-02 Architect Your Landscape Approach (AYLA) for Optimizations in Deep Learning Ben Keslaki et.al. 2504.01875v1 null
2025-04-02 Buggin: Automatic intrinsic bugs classification model using NLP and ML Pragya Bhandari et.al. 2504.01869v1 null
2025-03-31 Easi3R: Estimating Disentangled Motion from DUSt3R Without Training Xingyu Chen et.al. 2503.24391v1 link
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379v1 null
2025-03-31 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Yi Chen et.al. 2503.24376v1 link
2025-04-02 Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation Abhiram Maddukuri et.al. 2503.24361v2 null
2025-03-31 Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning Chao Luan et.al. 2503.24356v1 null
2025-03-31 PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks Fang Yan et.al. 2503.24345v1 null
2025-03-31 On gradient $ρ$-Einstein solitons with Bach tensor radially nonnegative Maria Andrade et.al. 2503.24337v1 null
2025-03-31 NoProp: Training Neural Networks without Back-propagation or Forward-propagation Qinyu Li et.al. 2503.24322v1 null
2025-03-31 A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG Arshia Kermani et.al. 2503.24307v1 null
2025-03-31 Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions Thinesh Thiyakesan Ponbagavathi et.al. 2503.24298v1 null
2025-03-28 Understanding Co-speech Gestures in-the-wild Sindhu B Hegde et.al. 2503.22668v1 null
2025-03-28 Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure Frank J. Brooks et.al. 2503.22658v1 null
2025-03-28 Deep learning-enabled prediction of surgical errors during cataract surgery: from simulation to real-world application Maxime Faure et.al. 2503.22647v1 null
2025-03-28 Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning Stefano Grassi et.al. 2503.22629v1 null
2025-03-28 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Jangho Park et.al. 2503.22622v1 null
2025-03-28 Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou et.al. 2503.22610v1 null
2025-03-28 Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis Shuai Shen et.al. 2503.22605v1 null
2025-03-28 Zero-homogeneous and $O(2)$-equivariant critical points of the Oseen-Frank energy with multiple Frank constants Luc Nguyen et.al. 2503.22599v1 null
2025-03-28 KEVS: Enhancing Segmentation of Visceral Adipose Tissue in Pre-Cystectomy CT with Gaussian Kernel Density Estimation Thomas Boucher et.al. 2503.22592v1 null
2025-03-28 Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012 Adam Breuer et.al. 2503.22589v1 link
2025-03-27 Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model Abdelrahman Shaker et.al. 2503.21782v1 link
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781v1 null
2025-03-27 Video-R1: Reinforcing Video Reasoning in MLLMs Kaituo Feng et.al. 2503.21776v1 link
2025-03-27 StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion Ziyu Guo et.al. 2503.21775v1 null
2025-03-27 Exploring the Evolution of Physics Cognition in Video Generation: A Survey Minghui Lin et.al. 2503.21765v1 link
2025-03-28 Phases with non-invertible symmetries in 1+1D – symmetry protected topological orders as duality automorphisms Ömer M. Aksoy et.al. 2503.21764v2 null
2025-03-27 Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video David Yifan Yao et.al. 2503.21761v1 null
2025-03-27 Large Scale Structure and the Cosmic Web Rita Tojeiro et.al. 2503.21759v1 null
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755v1 link
2025-03-27 MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX Liuyue Xie et.al. 2503.21699v1 null
2025-03-26 Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Tianqi Liu et.al. 2503.20785v1 null
2025-03-26 Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising Yan-Bo Lin et.al. 2503.20782v1 null
2025-03-26 BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation Yulu Pan et.al. 2503.20781v1 null
2025-03-26 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Shijie Zhou et.al. 2503.20776v1 null
2025-03-26 Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data Masoumeh Sharafi et.al. 2503.20771v1 null
2025-03-27 An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy Haotian Yang et.al. 2503.20768v2 null
2025-03-26 PhysGen3D: Crafting a Miniature Interactive World from a Single Image Boyuan Chen et.al. 2503.20746v1 null
2025-03-26 MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams Yanpeng Sun et.al. 2503.20745v1 null
2025-03-26 RecTable: Fast Modeling Tabular Data with Rectified Flow Masane Fuchi et.al. 2503.20731v1 null
2025-03-26 MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion Saron Samuel et.al. 2503.20698v1 null
2025-03-25 PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model Mingju Gao et.al. 2503.19913v1 null
2025-03-25 FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Xuan Ju et.al. 2503.19907v1 null
2025-03-25 Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better Zihang Lai et.al. 2503.19904v1 null
2025-03-25 Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi et.al. 2503.19881v1 null
2025-03-25 Extensions of regret-minimization algorithm for optimal design Youguang Chen et.al. 2503.19874v1 null
2025-03-25 Unpaired Translation of Chest X-ray Images for Lung Opacity Diagnosis via Adaptive Activation Masks and Cross-Domain Alignment Junzhi Ning et.al. 2503.19860v1 null
2025-03-25 Towards Online Multi-Modal Social Interaction Understanding Xinpeng Li et.al. 2503.19851v1 null
2025-03-25 FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs Carlos Plou et.al. 2503.19850v1 null
2025-03-26 Attention IoU: Examining Biases in CelebA using Attention Maps Aaron Serianni et.al. 2503.19846v2 link
2025-03-25 Multi-view Learning for the Identification of Risky Users in Dynamic Social Networks Francesco Benedetti et.al. 2503.19831v1 null
2025-03-24 Target-Aware Video Diffusion Models Taeksoo Kim et.al. 2503.18950v1 null
2025-03-24 Aether: Geometric-Aware Unified World Modeling Aether Team et.al. 2503.18945v1 null
2025-03-24 SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding Mingze Xu et.al. 2503.18943v1 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942v1 null
2025-03-24 Training-free Diffusion Acceleration with Bottleneck Sampling Ye Tian et.al. 2503.18940v1 null
2025-03-24 AdaWorld: Learning Adaptable World Models with Latent Actions Shenyuan Gao et.al. 2503.18938v1 null
2025-03-24 SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction Enrico Pallotta et.al. 2503.18933v1 null
2025-03-24 CoMP: Continual Multimodal Pre-training for Vision Foundation Models Yitong Chen et.al. 2503.18931v1 link
2025-03-24 Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models Meng Cao et.al. 2503.18923v1 null
2025-03-24 Online 3D Scene Reconstruction Using Neural Object Priors Thomas Chabal et.al. 2503.18897v1 null
2025-03-21 Position: Interactive Generative Video as Next-Generation Game Engine Jiwen Yu et.al. 2503.17359v1 null
2025-03-21 Time-Series U-Net with Recurrence for Noise-Robust Imaging Photoplethysmography Vineet R. Shenoy et.al. 2503.17351v1 null
2025-03-21 Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer Qingyu Shi et.al. 2503.17350v1 null
2025-03-21 Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs Reem Gody et.al. 2503.17336v1 null
2025-03-21 Lattice Materials with Topological States Optimized On-Demand Pegah Azizi et.al. 2503.17320v1 null
2025-03-21 Quasiconformal Maps between Bowditch Boundaries of Relatively Hyperbolic Groups Rana Sardar et.al. 2503.17312v1 null
2025-03-21 LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language Kun Chu et.al. 2503.17309v1 null
2025-03-21 Exploring the Temporal Dynamics of Facial Mimicry in Emotion Processing Using Action Units Meisam Jamshidi Seikavandi et.al. 2503.17306v1 null
2025-03-21 HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks Maria Pilligua et.al. 2503.17276v1 null
2025-03-21 Vision Transformer Based Semantic Communications for Next Generation Wireless Networks Muhammad Ahmed Mohsin et.al. 2503.17275v1 null
2025-03-20 XAttention: Block Sparse Attention with Antidiagonal Scoring Ruyi Xu et.al. 2503.16428v1 null
2025-03-20 MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Quanhao Li et.al. 2503.16421v1 null
2025-03-20 M3: 3D-Spatial MultiModal Memory Xueyan Zou et.al. 2503.16413v1 null
2025-03-20 ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos Haolin Yang et.al. 2503.16400v1 null
2025-03-21 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation Chun-Han Yao et.al. 2503.16396v2 null
2025-03-20 Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation Kristin Qi et.al. 2503.16389v1 null
2025-03-20 Probabilistic Quantum SVM Training on Ising Machine Haoqi He et.al. 2503.16363v1 null
2025-03-20 Enhancing variational quantum algorithms by balancing training on classical and quantum hardware Rahul Bhowmick et.al. 2503.16361v1 null
2025-03-20 UniSync: A Unified Framework for Audio-Visual Synchronization Tao Feng et.al. 2503.16357v1 null
2025-03-20 Principal Actions on Topological Quivers and Associated Operator Dynamics Matthew Gillespie et.al. 2503.16352v1 null
2025-03-19 Fast Two-photon Microscopy by Neuroimaging with Oblong Random Acquisition (NORA) Esther Whang et.al. 2503.15487v1 null
2025-03-19 TULIP: Towards Unified Language-Image Pretraining Zineng Tang et.al. 2503.15485v1 null
2025-03-19 Learning to Play Piano in the Real World Yves-Simon Zeulner et.al. 2503.15481v1 null
2025-03-19 Cube: A Roblox View of 3D Intelligence Foundation AI Team et.al. 2503.15475v1 null
2025-03-19 EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining Boshen Xu et.al. 2503.15470v1 null
2025-03-20 Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification ZhengLin Lai et.al. 2503.15469v2 link
2025-03-19 LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding Amirhossein Kazerouni et.al. 2503.15420v1 null
2025-03-19 Temporal Regularization Makes Your Video Generator Stronger Harold Haodong Chen et.al. 2503.15417v1 null
2025-03-19 Automated Processing of eXplainable Artificial Intelligence Outputs in Deep Learning Models for Fault Diagnostics of Large Infrastructures Giovanni Floreale et.al. 2503.15415v1 null
2025-03-19 Federated Continual 3D Segmentation With Single-round Communication Can Peng et.al. 2503.15414v1 null
2025-03-18 MusicInfuser: Making Video Diffusion Listen and Dance Susung Hong et.al. 2503.14505v1 null
2025-03-18 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu et.al. 2503.14504v1 null
2025-03-18 Utilization of Neighbor Information for Image Classification with Different Levels of Supervision Gihan Jayatilaka et.al. 2503.14500v1 null
2025-03-18 Tracking Meets Large Multimodal Models for Driving Scenario Understanding Ayesha Ishaq et.al. 2503.14498v1 null
2025-03-18 Stable Virtual Camera: Generative View Synthesis with Diffusion Models Jensen et.al. 2503.14489v1 null
2025-03-18 Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset Yiqun Mei et.al. 2503.14485v1 null
2025-03-18 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model Yucheng Mao et.al. 2503.14463v1 null
2025-03-18 Functional classification of metabolic networks Jorge Reyes et.al. 2503.14437v1 null
2025-03-18 LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers Nikhil Abhyankar et.al. 2503.14434v1 null
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428v1 null
2025-03-17 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Ye Liu et.al. 2503.13444v1 null
2025-03-17 Can Yang-Baxter imply Lie algebra? Dmitry Khudoteplov et.al. 2503.13437v1 null
2025-03-17 WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes Ling Yang et.al. 2503.13435v1 null
2025-03-17 Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes Nhi Pham et.al. 2503.13429v1 null
2025-03-17 FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation Shijie Fang et.al. 2503.13418v1 null
2025-03-17 U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord Qi Zhang et.al. 2503.13400v1 null
2025-03-17 TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Ye Wang et.al. 2503.13377v1 null
2025-03-17 Multivariate Sparse Functional Linear Discriminant Analysis: An Application to Inflammatory Bowel Disease Classification Limeng Liu et.al. 2503.13372v1 null
2025-03-17 SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization Xulin Fan et.al. 2503.13371v1 null
2025-03-17 Agents Play Thousands of 3D Video Games Zhongwen Xu et.al. 2503.13356v1 null
2025-03-14 Scalable Video Conferencing Using SDN Principles Oliver Michel et.al. 2503.11649v1 null
2025-03-14 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Jianhong Bai et.al. 2503.11647v1 null
2025-03-14 Pathology Image Compression with Pre-trained Autoencoders Srikar Yellapragada et.al. 2503.11591v1 null
2025-03-14 Generalization performance of neural mapping schemes for the space-time interpolation of satellite-derived ocean colour datasets Thi Thuy Nga Nguyen et.al. 2503.11588v1 null
2025-03-14 Image Reconstruction from an Elastically Distorted Scan Adrian Lopez et.al. 2503.11584v1 null
2025-03-14 Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Weiming Ren et.al. 2503.11579v1 null
2025-03-14 RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing Tianrui Pan et.al. 2503.11571v1 null
2025-03-14 Observation-only learning of neural mapping schemes for gappy satellite-derived ocean colour parameters Clément Dorffer et.al. 2503.11532v1 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513v1 null
2025-03-14 Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models Siva Manohar Reddy Kesu et.al. 2503.11511v1 null
2025-03-13 V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes Yanming Zhang et.al. 2503.10634v1 null
2025-03-13 NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models Mert Albaba et.al. 2503.10626v1 null
2025-03-13 LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds Lingteng Qiu et.al. 2503.10625v1 null
2025-03-13 OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer Jinyang Li et.al. 2503.10616v1 null
2025-03-13 MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction Yingshuang Zou et.al. 2503.10604v1 null
2025-03-13 CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models Hao He et.al. 2503.10592v1 null
2025-03-13 Long Context Tuning for Video Generation Yuwei Guo et.al. 2503.10589v1 null
2025-03-13 Learning Interpretable Logic Rules from Deep Vision Models Chuqin Geng et.al. 2503.10547v1 null
2025-03-13 From Linear to Spline-Based Classification: Developing and Enhancing SMPA for Noisy Non-Linear Datasets Vatsal Srivastava et.al. 2503.10545v1 null
2025-03-13 Lightweight Models for Emotional Analysis in Video Quoc-Tien Nguyen et.al. 2503.10530v1 null
2025-03-12 PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Chenyu Li et.al. 2503.09595v1 null
2025-03-12 BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam et.al. 2503.09590v1 null
2025-03-12 Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching Nannan Wu et.al. 2503.09587v1 null
2025-03-12 Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot Andrew Crossman et.al. 2503.09586v1 null
2025-03-12 Manify: A Python Library for Learning Non-Euclidean Representations Philippe Chlenski et.al. 2503.09576v1 null
2025-03-12 TPDiff: Temporal Pyramid Video Diffusion Model Lingmin Ran et.al. 2503.09566v1 null
2025-03-12 FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model Jiahao Xia et.al. 2503.09560v1 null
2025-03-13 The R2D2 Deep Neural Network Series for Scalable Non-Cartesian Magnetic Resonance Imaging Yiwei Chen et.al. 2503.09559v2 null
2025-03-12 CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games Peng Chen et.al. 2503.09527v1 null
2025-03-12 Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework Bakary Badjie et.al. 2503.09504v1 null
2025-03-11 QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension Yongdong Luo et.al. 2503.08689v1 null
2025-03-11 REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder Yitian Zhang et.al. 2503.08665v1 null
2025-03-11 MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention Yuhan Wang et.al. 2503.08664v1 null
2025-03-11 Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems Yufeng Diao et.al. 2503.08661v1 null
2025-03-11 How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks? Gal Alon et.al. 2503.08633v1 null
2025-03-11 Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer Apan Dastider et.al. 2503.08622v1 null
2025-03-11 Vision Transformer for Intracranial Hemorrhage Classification in CT Scans Using an Entropy-Aware Fuzzy Integral Strategy for Adaptive Scan-Level Decision Fusion Mehdi Hosseini Chagahi et.al. 2503.08609v1 null
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605v1 null
2025-03-11 Towards species' classification of the Anastrepha pseudoparallela group Gabriel R. Palma et.al. 2503.08598v1 null
2025-03-11 Proc4Gem: Foundation models for physical agency through procedural generation Yixin Lin et.al. 2503.08593v1 null
2025-03-10 Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru Dunant Cusipuma et.al. 2503.07587v1 null
2025-03-10 Efficient Distributed Learning over Decentralized Networks with Convoluted Support Vector Machine Canyi Chen et.al. 2503.07563v1 null
2025-03-10 CPAny: Couple With Any Encoder to Refer Multi-Object Tracking Weize Li et.al. 2503.07516v1 null
2025-03-10 ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning Soumya Banerjee et.al. 2503.07506v1 null
2025-03-10 Blind-Wayfarer: A Minimalist, Probing-Driven Framework for Resilient Navigation in Perception-Degraded Environments Yanran Xu et.al. 2503.07492v1 null
2025-03-10 NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface Chengrui Zhu et.al. 2503.07491v1 null
2025-03-10 VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models Jiacheng Ruan et.al. 2503.07478v1 null
2025-03-10 A Review on Geometry and Surface Inspection in 3D Concrete Printing K. Mawas et.al. 2503.07472v1 null
2025-03-10 Simultaneous Energy Harvesting and Bearing Fault Detection using Piezoelectric Cantilevers P. Peralta-Braz et.al. 2503.07462v1 null
2025-03-10 Open-Set Gait Recognition from Sparse mmWave Radar Point Clouds Riccardo Mazzieri et.al. 2503.07435v1 null
2025-03-10 Analysis of 3D Urticaceae Pollen Classification Using Deep Learning Models Tijs Konijn et.al. 2503.07419v1 null
2025-03-10 AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun et.al. 2503.07418v1 null
2025-03-10 TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision Shaobin Zhuang et.al. 2503.07416v1 null
2025-03-10 Keeping Representation Similarity in Finetuning for Medical Image Analysis Wenqiang Zu et.al. 2503.07399v1 null
2025-03-10 Brain Inspired Adaptive Memory Dual-Net for Few-Shot Image Classification Kexin Di et.al. 2503.07396v1 null
2025-03-10 Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs Gonzalo Mancera et.al. 2503.07384v1 null
2025-03-07 Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation Zhenxuan Zhang et.al. 2503.05682v1 null
2025-03-07 A comparison of the Alkire-Foster method and a Markov random field approach in the analysis of multidimensional poverty Joseph Lam et.al. 2503.05676v1 null
2025-03-07 Kinodynamic Model Predictive Control for Energy Efficient Locomotion of Legged Robots with Parallel Elasticity Yulun Zhuang et.al. 2503.05666v1 null
2025-03-07 A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval Yu Zhang et.al. 2503.05659v1 null
2025-03-07 On a classification problem for a quiver of type $\widetilde{A}_{3}$ Ivon Dorado et.al. 2503.05643v1 null
2025-03-07 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Yuxuan Bian et.al. 2503.05639v1 null
2025-03-07 TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models Mark YU et.al. 2503.05638v1 null
2025-03-07 Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study Ali Samimi Fard et.al. 2503.05629v1 null
2025-03-07 Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings Xuanqing Liu et.al. 2503.05620v1 null
2025-03-07 CACTUS: An Open Dataset and Framework for Automated Cardiac Assessment and Classification of Ultrasound Images Using Deep Transfer Learning Hanae Elmekki et.al. 2503.05604v1 null
2025-03-06 FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao et.al. 2503.04720v1 null
2025-03-06 Iris Style Transfer: Enhancing Iris Recognition with Style Features and Privacy Preservation through Neural Style Transfer Mengdi Wang et.al. 2503.04707v1 null
2025-03-07 Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size Alireza Behtash et.al. 2503.04704v2 null
2025-03-06 Coarse graining and reduced order models for plume ejection dynamics Ike Griss Salas et.al. 2503.04690v1 null
2025-03-06 Mixed Near-field and Far-field Target Localization for Low-altitude Economy Cong Zhou et.al. 2503.04681v1 null
2025-03-06 An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding Dou Hu et.al. 2503.04667v1 null
2025-03-06 What Are You Doing? A Closer Look at Controllable Human Video Generation Emanuele Bugliarello et.al. 2503.04666v1 null
2025-03-06 Implicit Neural Representation for Video and Image Super-Resolution Mary Aiyetigbo et.al. 2503.04665v1 null
2025-03-06 RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Tengfei Zhang et.al. 2503.04653v1 null
2025-03-06 Adaptive Prototype Learning for Multimodal Cancer Survival Analysis Hong Liu et.al. 2503.04643v1 null
2025-03-05 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren et.al. 2503.03751v1 link
2025-03-05 PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning Ryozo Masukawa et.al. 2503.03747v1 null
2025-03-05 OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Huang Huang et.al. 2503.03734v1 null
2025-03-05 Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements Carlo Dindorf et.al. 2503.03717v1 null
2025-03-05 Handling Uncertainty in Health Data using Generative Algorithms Mahdi Arab Loodaricheh et.al. 2503.03715v1 null
2025-03-05 Rethinking Video Tokenization: A Conditioned Diffusion-based Approach Nianzu Yang et.al. 2503.03708v1 null
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689v1 null
2025-03-05 Empowering Multi-class Classification for Complex Functional Data with Simultaneous Feature Selection Shuoyang Wang et.al. 2503.03679v1 null
2025-03-05 LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant Wei Li et.al. 2503.03663v1 null
2025-03-05 Limits of nonlinear and dispersive fiber propagation for photonic extreme learning Andrei V. Ermolaev et.al. 2503.03649v1 null
2025-03-04 Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation Han Xue et.al. 2503.02881v1 null
2025-03-04 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models Dmitry Nechaev et.al. 2503.02876v1 null
2025-03-04 Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees Emma Ceccherini et.al. 2503.02859v1 null
2025-03-04 Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024 Nuria Alina Chandra et.al. 2503.02857v1 null
2025-03-04 Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data Amin Honarmandi Shandiz et.al. 2503.02849v1 null
2025-03-04 In-Depth Analysis of Automated Acne Disease Recognition and Classification Afsana Ahsan Jeny et.al. 2503.02835v1 null
2025-03-04 A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness Nathan Drenkow et.al. 2503.02797v1 null
2025-03-04 Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution Ru Ito et.al. 2503.02767v1 null
2025-03-04 Seeded Poisson Factorization: Leveraging domain knowledge to fit topic models Bernd Prostmaier et.al. 2503.02741v1 null
2025-03-04 UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression Jia Wang et.al. 2503.02733v1 null
2025-02-28 TomoSelfDEQ: Self-Supervised Deep Equilibrium Learning for Sparse-Angle CT Reconstruction Tatiana A. Bubba et.al. 2502.21320v1 null
2025-02-28 Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos Zhiyu Tan et.al. 2502.21314v1 null
2025-02-28 AutoComb: Automated Comb Sign Detector for 3D CTE Scans Shashwat Gupta et.al. 2502.21311v1 null
2025-02-28 Bilevel Optimized Implicit Neural Representation for Scan-Specific Accelerated MRI Reconstruction Hongze Yu et.al. 2502.21292v1 null
2025-02-28 Utilizing Quantum Fingerprints in Plant Cells to Evaluate Plant productivity Umadini Ranasinghe et.al. 2502.21275v1 null
2025-02-28 Adaptive Keyframe Sampling for Long Video Understanding Xi Tang et.al. 2502.21271v1 null
2025-02-28 PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts Boxiao Yu et.al. 2502.21260v1 null
2025-02-28 RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Yuheng Ji et.al. 2502.21257v1 null
2025-02-28 ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG Aleksandr Kovalev et.al. 2502.21256v1 null
2025-02-28 Short-Rate Derivatives in a Higher-for-Longer Environment Aram Karakhanyan et.al. 2502.21252v1 null
2025-02-27 Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models Susmit Agrawal et.al. 2502.20393v1 null
2025-02-27 Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation Siddhant Haldar et.al. 2502.20391v1 null
2025-02-27 Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation Sucheng Ren et.al. 2502.20388v1 null
2025-02-27 InsTaG: Learning Personalized 3D Talking Head from Few-Second Video Jiahe Li et.al. 2502.20387v1 null
2025-02-27 ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting Dexter Ong et.al. 2502.20386v1 null
2025-02-27 Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling Hanyang Kong et.al. 2502.20378v1 null
2025-02-27 When does a predictor know its own loss? Aravind Gollakota et.al. 2502.20375v1 null
2025-02-27 OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection Shuming Liu et.al. 2502.20361v1 link
2025-02-27 KNOWM Memristors in a Bridge Synapse delay-based Reservoir Computing system for detection of epileptic seizures Dawid Przyczyna et.al. 2502.20351v1 null
2025-02-27 T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration Tamir Shor et.al. 2502.20333v1 null
2025-02-26 TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Max Ku et.al. 2502.19400v1 null
2025-02-26 Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis Minjoo Lim et.al. 2502.19390v1 null
2025-02-26 Surface-Based Manipulation Ziqiao Wang et.al. 2502.19389v1 null
2025-02-26 Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis Hamdan Al Ahbabi et.al. 2502.19387v1 null
2025-02-26 Efficient 4D fMRI ASD Classification using Spatial-Temporal-Omics-based Learning Framework Ziqiao Weng et.al. 2502.19386v1 null
2025-02-26 Deep Learning For Time Series Analysis With Application On Human Motion Ali Ismail-Fawaz et.al. 2502.19364v1 null
2025-02-26 Deep Learning-Based Transfer Learning for Classification of Cassava Disease Ademir G. Costa Junior et.al. 2502.19351v1 null
2025-02-26 Unveiling Wireless Users' Locations via Modulation Classification-based Passive Attack Ali Hanif et.al. 2502.19341v1 null
2025-02-26 I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning Stephan Rabanser et.al. 2502.19335v1 null
2025-02-26 Deep learning and classical computer vision techniques in medical image analysis: Case studies on brain MRI tissue segmentation, lung CT COPD registration, and skin lesion classification Anyimadu Daniel Tweneboah et.al. 2502.19258v1 null
2025-02-25 Ion counting and temperature determination of Coulomb-crystallized laser-cooled ions in traps using convolutional neural networks Yanning Yin et.al. 2502.18442v1 null
2025-02-25 Is OpenAlex Suitable for Research Quality Evaluation and Which Citation Indicator is Best? Mike Thelwall et.al. 2502.18427v1 null
2025-02-25 Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand Fengshuo Bai et.al. 2502.18423v1 null
2025-02-25 MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification Zhuoqin Yang et.al. 2502.18416v1 null
2025-02-25 Enhancing DNA Foundation Models to Address Masking Inefficiencies Monireh Safari et.al. 2502.18405v1 null
2025-02-25 Learning sparse generalized linear models with binary outcomes via iterative hard thresholding Namiko Matsumoto et.al. 2502.18393v1 null
2025-02-25 EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity Dominik Hollidt et.al. 2502.18373v1 null
2025-02-25 MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning Sepehr Asgarian et.al. 2502.18371v1 null
2025-02-25 Exploring proteomic signatures in sepsis and non-infectious systemic inflammatory response syndrome Adolfo Ruiz-Sanmartín et.al. 2502.18305v1 null
2025-02-25 Quantization of the Momentum Map via $\frak{g}$-adapted Formalities Chiara Esposito et.al. 2502.18295v1 null
2025-02-24 FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning Jason Jingzhou Liu et.al. 2502.17432v1 null
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414v1 null
2025-02-24 Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects Liuchuan Yu et.al. 2502.17399v1 null
2025-02-24 Robust Confinement State Classification with Uncertainty Quantification through Ensembled Data-Driven Methods Yoeri Poels et.al. 2502.17397v1 null
2025-02-24 RELICT: A Replica Detection Framework for Medical Image Generation Orhun Utku Aydin et.al. 2502.17360v1 null
2025-02-24 Travel Time Reliability in Stochastic Kinematic Flow Models Alexander Hammerl et.al. 2502.17359v1 null
2025-02-24 Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training Karan Samel et.al. 2502.17352v1 null
2025-02-24 +Tour: Recommending personalized itineraries for smart tourism João Paulo Esper et.al. 2502.17345v1 link
2025-02-24 City riots fed by transnational and trans-topic web-of-influence Akshay Verma et.al. 2502.17331v1 null
2025-02-24 AnyTop: Character Animation Diffusion with Any Topology Inbar Gat et.al. 2502.17327v1 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672v1 link
2025-02-21 Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions Gerard Ben Arous et.al. 2502.15655v1 null
2025-02-21 Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification Vasilii Feofanov et.al. 2502.15637v1 null
2025-02-21 Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach Xiangtong Yao et.al. 2502.15613v1 null
2025-02-21 PDeepPP: A Deep learning framework with Pretrained Protein language for peptide classification Jixiu Zhai et.al. 2502.15610v1 null
2025-02-21 On the Robustness of Transformers against Context Hijacking for Linear Classification Tianle Li et.al. 2502.15609v1 null
2025-02-21 Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models Zahra Mansour et.al. 2502.15607v1 null
2025-02-21 Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification Peiyu Duan et.al. 2502.15595v1 null
2025-02-21 Estimating Vehicle Speed on Roadways Using RNNs and Transformers: A Video-based Approach Sai Krishna Reddy Mareddy et.al. 2502.15545v1 null
2025-02-21 Implications of Photon Mass: Vortextrap Magnetization of Black Holes Gia Dvali et.al. 2502.15510v1 null
2025-02-20 Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts Sara Ghaboura et.al. 2502.14865v1 null
2025-02-20 Dynamic Concepts Personalization from Single Videos Rameen Abdal et.al. 2502.14844v1 null
2025-02-20 Improving the Diffusability of Autoencoders Ivan Skorokhodov et.al. 2502.14831v1 null
2025-02-20 Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning Oren Yuval et.al. 2502.14808v1 null
2025-02-20 FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis Fadillah Maani et.al. 2502.14807v1 null
2025-02-20 AVD2: Accident Video Diffusion for Accident Video Description Cheng Li et.al. 2502.14801v1 null
2025-02-20 Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration Pengxiang Ding et.al. 2502.14795v1 null
2025-02-20 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Michael Tschannen et.al. 2502.14786v1 link
2025-02-20 Sparse Activations as Conformal Predictors Margarida M. Campos et.al. 2502.14773v1 null
2025-02-20 MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders Maya Varma et.al. 2502.14753v1 null
2025-02-19 Qwen2.5-VL Technical Report Shuai Bai et.al. 2502.13923v1 null
2025-02-19 Audio-Based Classification of Insect Species Using Machine Learning Models: Cicada, Beetle, Termite, and Cricket Manas V Shetty et.al. 2502.13893v1 null
2025-02-19 Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition Idris Hamoud et.al. 2502.13883v1 null
2025-02-19 Ribbon blocks for centraliser algebras of symmetric groups Matthew Fayers et.al. 2502.13867v1 null
2025-02-19 MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection Shuyong Gao et.al. 2502.13859v1 null
2025-02-19 Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model Hang Yin et.al. 2502.13838v1 null
2025-02-19 MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation Yucheng Zeng et.al. 2502.13808v1 null
2025-02-19 Classifying thick subcategories over a Koszul complex via the curved BGG correspondence Jian Liu et.al. 2502.13806v1 null
2025-02-19 Binary VPN Traffic Detection Using Wavelet Features and Machine Learning Yasameen Sajid Razooqi et.al. 2502.13804v1 null
2025-02-19 From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education Yi-Fan Zhang et.al. 2502.13789v1 null
2025-02-18 Pre-training Auto-regressive Robotic Models with 4D Representations Dantong Niu et.al. 2502.13142v1 null
2025-02-18 Magma: A Foundation Model for Multimodal AI Agents Jianwei Yang et.al. 2502.13130v1 null
2025-02-18 BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification Bich-Chung Phan et.al. 2502.13080v1 null
2025-02-18 L4P: Low-Level 4D Vision Perception Unified Abhishek Badki et.al. 2502.13078v1 null
2025-02-18 Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection Jingbiao Mei et.al. 2502.13061v1 null
2025-02-18 Benchmarking MedMNIST dataset on real quantum hardware Gurinder Singh et.al. 2502.13056v1 null
2025-02-18 LAMD: Context-driven Android Malware Detection and Classification with LLMs Xingzhi Qian et.al. 2502.13055v1 null
2025-02-18 QZO: A Catalog of 5 Million Quasars from the Zwicky Transient Facility S. J. Nakoneczny et.al. 2502.13054v1 null
2025-02-18 Development of systematic uncertainty-aware neural network trainings for binned-likelihood analyses at the LHC CMS Collaboration et.al. 2502.13047v1 null
2025-02-18 How far are two symmetric matrices from commuting? With an application to object characterisation and identification in metal detection P. D. Ledger et.al. 2502.13038v1 null
2025-02-17 VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution Chendong Wang et.al. 2502.12151v1 null
2025-02-17 Idiosyncrasies in Large Language Models Mingjie Sun et.al. 2502.12150v1 null
2025-02-17 LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities Florian Sestak et.al. 2502.12128v1 null
2025-02-17 Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy Roman Malashin et.al. 2502.12125v1 null
2025-02-17 Crime in Proportions: Applying Compositional Data Analysis to European Crime Trends for 2022 Onur Batın Doğan et.al. 2502.12099v1 null
2025-02-17 Descriminative-Generative Custom Tokens for Vision-Language Models Pramuditha Perera et.al. 2502.12095v1 null
2025-02-17 Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems Yue Sun et.al. 2502.12086v1 null
2025-02-17 AdaSplash: Adaptive Sparse Flash Attention Nuno Gonçalves et.al. 2502.12082v1 null
2025-02-17 Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu et.al. 2502.12081v1 null
2025-02-17 Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning Jiayang Zhang et.al. 2502.12049v1 null
2025-02-14 Simplifying DINO via Coding Rate Regularization Ziyang Wu et.al. 2502.10385v1 null
2025-02-14 Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data Corinna Cortes et.al. 2502.10381v1 null
2025-02-14 Quasi-isometry classification of certain graph $2$-braid groups and its applications Byung Hee An et.al. 2502.10366v1 null
2025-02-14 Proper Learnability and the Role of Unlabeled Data Julian Asilis et.al. 2502.10359v1 null
2025-02-14 Diameter bounds for $SL(2,\mathbb{Z})$-orbits of origamis in $\mathcal{H}(2)$ and the Prym loci in $\mathcal{H}(4)$ and $\mathcal{H}(6)$ Luke Jeffreys et.al. 2502.10358v1 null
2025-02-14 OptimOTU: Taxonomically aware OTU clustering with optimized thresholds and a bioinformatics workflow for metabarcoding data Brendan Furneaux et.al. 2502.10350v1 null
2025-02-14 Ocular Disease Classification Using CNN with Deep Convolutional Generative Adversarial Network Arun Kunwar et.al. 2502.10334v1 null
2025-02-14 SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement Yuhao Zhang et.al. 2502.10296v1 null
2025-02-14 Probing Perceptual Constancy in Large Vision Language Models Haoran Sun et.al. 2502.10273v1 null
2025-02-14 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248v1 null
2025-02-13 Embed Any NeRF: Graph Meta-Networks for Neural Tasks on Arbitrary NeRF Architectures Francesco Ballerini et.al. 2502.09623v1 null
2025-02-13 Exploring the Potential of Encoder-free Architectures in 3D LMMs Yiwen Tang et.al. 2502.09620v1 null
2025-02-13 Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Jonathan Kahana et.al. 2502.09619v1 null
2025-02-13 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References Xueyi Liu et.al. 2502.09614v1 null
2025-02-13 Morphological Classification of Galaxies Karen Masters et.al. 2502.09610v1 null
2025-02-13 GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Angelos Zavras et.al. 2502.09598v1 null
2025-02-13 Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs Siyan Zhao et.al. 2502.09597v1 null
2025-02-13 Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering Mark Beliaev et.al. 2502.09573v1 null
2025-02-13 Diffusing DeBias: a Recipe for Turning a Bug into a Feature Massimiliano Ciranni et.al. 2502.09564v1 null
2025-02-13 Learned Correction Methods for Ultrasound Computed Tomography Imaging Using Simplified Physics Models Luke Lozenski et.al. 2502.09546v1 null
2025-02-12 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Qinghe Wang et.al. 2502.08639v1 null
2025-02-12 Rapid Whole Brain Mesoscale In-vivo MR Imaging using Multi-scale Implicit Neural Representation Jun Lyu et.al. 2502.08634v1 null
2025-02-12 Ensemble based approach to quantifying uncertainty of LLM based classifications Srijith Rajamohan et.al. 2502.08631v1 null
2025-02-12 Robot Data Curation with Mutual Information Estimators Joey Hejna et.al. 2502.08623v1 null
2025-02-12 Forecasting Drought Using Machine Learning in California Nan K. Li et.al. 2502.08622v1 null
2025-02-12 SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment Tica Lin et.al. 2502.08621v1 null
2025-02-12 Learning Selection Cuts With Gradients Mike Hance et.al. 2502.08615v1 null
2025-02-12 Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model Saurabh Kataria et.al. 2502.08612v1 null
2025-02-12 CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection Karish Grover et.al. 2502.08605v1 null
2025-02-12 Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Yujie Zhou et.al. 2502.08590v1 link
2025-02-11 Pippo: High-Resolution Multi-View Humans from a Single Image Yash Kant et.al. 2502.07785v1 null
2025-02-11 Statistical Reevaluation of the USP Classification Boundary: Smaller Planets Within 1 Day, Larger Period Ratios Below 2 Days Armaan V. Goyal et.al. 2502.07773v1 null
2025-02-11 A forbidden subgraph study for cut problems on graphs permitting loops and multiedges Tala Eagling-Vose et.al. 2502.07769v1 null
2025-02-11 An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating Mohammad Ali Labbaf Khaniki et.al. 2502.07755v1 null
2025-02-11 HiPoNet: A Topology-Preserving Multi-View Neural Network For High Dimensional Point Cloud and Single-Cell Data Siddharth Viswanath et.al. 2502.07746v1 null
2025-02-11 Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling Shuhuai Ren et.al. 2502.07737v1 null
2025-02-11 PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization Bing Fan et.al. 2502.07707v1 null
2025-02-11 Magic 1-For-1: Generating One Minute Video Clips within One Minute Hongwei Yi et.al. 2502.07701v1 null
2025-02-11 SoK: A Classification for AI-driven Personalized Privacy Assistants Victor Morel et.al. 2502.07693v1 null
2025-02-11 Auto-Drafting Police Reports from Noisy ASR Outputs: A Trust-Centered LLM Approach Param Kulkarni et.al. 2502.07677v1 null
2025-02-10 Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Dongyang Liu et.al. 2502.06782v1 null
2025-02-10 KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification Yue Zhu et.al. 2502.06779v1 null
2025-02-10 ALMACAL XIII. Evolution of the CO luminosity function and the molecular gas mass density out to $z$ ~ 6 Victoria Bollo et.al. 2502.06778v1 null
2025-02-10 Enhancing Performance of Explainable AI Models with Constrained Concept Refinement Geyu Liang et.al. 2502.06775v1 null
2025-02-10 History-Guided Video Diffusion Kiwhan Song et.al. 2502.06764v1 null
2025-02-10 Equations over Finite Monoids with Infinite Promises Alberto Larrauri et.al. 2502.06762v1 null
2025-02-10 Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and Uncertainty Valia Efthymiou et.al. 2502.06749v1 null
2025-02-10 Wandering around: A bioinspired approach to visual attention through object motion sensitivity Giulia D'Angelo et.al. 2502.06747v1 null
2025-02-10 Persistent spin grids with spin-orbit coupled 2D electron gas A. V. Poshakinskiy et.al. 2502.06745v1 null
2025-02-10 Enhancing Pneumonia Diagnosis and Severity Assessment through Deep Learning: A Comprehensive Approach Integrating CNN Classification and Infection Segmentation S Kumar Reddy Mallidi et.al. 2502.06735v1 null
2025-02-07 FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Shilong Zhang et.al. 2502.05179v1 null
2025-02-07 Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy Yunhang Shen et.al. 2502.05177v1 null
2025-02-07 AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting Chung-Ho Wu et.al. 2502.05176v1 null
2025-02-07 VideoRoPE: What Makes for Good Video Rotary Position Embedding? Xilin Wei et.al. 2502.05173v1 null
2025-02-07 Torsion pairs and 3-fold flops Parth Shimpi et.al. 2502.05146v1 null
2025-02-07 Chest X-ray Foundation Model with Global and Local Representations Integration Zefan Yang et.al. 2502.05142v1 null
2025-02-07 Counting Fish with Temporal Representations of Sonar Video Kai Van Brunt et.al. 2502.05129v1 null
2025-02-07 Multiphoton, multimode state classification for nonlinear optical circuits Denis A. Kopylov et.al. 2502.05123v1 null
2025-02-07 Investigating the impact of kernel harmonization and deformable registration on inspiratory and expiratory chest CT images for people with COPD Aravind R. Krishnan et.al. 2502.05119v1 null
2025-02-07 GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application Volker Emmrich et.al. 2502.05113v1 null
2025-02-06 Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Zuyan Liu et.al. 2502.04328v1 null
2025-02-06 WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs Jack Hong et.al. 2502.04326v1 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299v1 null
2025-02-06 Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression Lirui Wang et.al. 2502.04296v1 null
2025-02-06 Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning Thorben Prein et.al. 2502.04289v1 null
2025-02-06 How does a Multilingual LM Handle Multiple Languages? Santhosh Kakarla et.al. 2502.04269v1 null
2025-02-06 Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion Marco Mistretta et.al. 2502.04263v1 null
2025-02-06 Work in Progress: AI-Powered Engineering-Bridging Theory and Practice Oz Levy et.al. 2502.04256v1 null
2025-02-06 An object detection approach for lane change and overtake detection from motion profiles Andrea Benericetti et.al. 2502.04244v1 null
2025-02-06 Saflo: eBPF-Based MPTCP Scheduler for Mitigating Traffic Analysis Attacks in Cellular Networks Sangwoo Lee et.al. 2502.04236v1 null
2025-02-05 Seeing World Dynamics in a Nutshell Qiuhong Shen et.al. 2502.03465v1 null
2025-02-05 SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living Arkaprava Sinha et.al. 2502.03459v1 null
2025-02-05 Kineto-Dynamical Planning and Accurate Execution of Minimum-Time Maneuvers on Three-Dimensional Circuits Mattia Piccinini et.al. 2502.03454v1 null
2025-02-05 Linearized Optimal Transport pyLOT Library: A Toolkit for Machine Learning on Point Clouds Jun Linwu et.al. 2502.03439v1 null
2025-02-05 A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation Carlo Biffi et.al. 2502.03430v1 null
2025-02-05 Concept Based Explanations and Class Contrasting Rudolf Herdt et.al. 2502.03422v1 null
2025-02-05 A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models Junliang Du et.al. 2502.03386v1 null
2025-02-05 Deep Learning-Based Approach for Identification of Potato Leaf Diseases Using Wrapper Feature Selection and Feature Concatenation Muhammad Ahtsam Naeem et.al. 2502.03370v1 null
2025-02-05 Learning from Active Human Involvement through Proxy Value Propagation Zhenghao Peng et.al. 2502.03369v1 null
2025-02-05 Rethinking Approximate Gaussian Inference in Classification Bálint Mucsányi et.al. 2502.03366v1 null
2025-02-04 Fairness in Survival Analysis: A Novel Conditional Mutual Information Augmentation Approach Tianyang Xie et.al. 2502.02567v1 null
2025-02-04 Learning the RoPEs: Better 2D and 3D Position Encodings with STRING Connor Schenck et.al. 2502.02562v1 null
2025-02-04 Particle Trajectory Representation Learning with Masked Point Modeling Sam Young et.al. 2502.02558v1 null
2025-02-04 AAD-DCE: An Aggregated Multimodal Attention Mechanism for Early and Late Dynamic Contrast Enhanced Prostate MRI Synthesis Divya Bharti et.al. 2502.02555v1 null
2025-02-04 Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis Haonan Zhu et.al. 2502.02552v1 null
2025-02-04 2D Surface Brightness Modelling of Large 2MASS Galaxies II: The Role of Classical Bulges and Pseudobulges on Galaxy Scaling Relations and its implication for Supermassive Black Hole Formation Emmanuel Ríos-López et.al. 2502.02546v1 null
2025-02-04 TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems Si-Yang Liu et.al. 2502.02527v1 null
2025-02-04 Hybrid Fingerprint-based Positioning in Cell-Free Massive MIMO Systems Manish Kumar et.al. 2502.02512v1 null
2025-02-04 The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison Łukasz Miętkiewicz et.al. 2502.02500v1 null
2025-02-04 VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Hila Chefer et.al. 2502.02492v1 null
2025-01-31 Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach Yingdan Shi et.al. 2501.19403v1 null
2025-01-31 Perceptive Mixed-Integer Footstep Control for Underactuated Bipedal Walking on Rough Terrain Brian Acosta et.al. 2501.19391v1 null
2025-01-31 Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions Sören Christensen et.al. 2501.19373v1 null
2025-01-31 Benchmark of the Full and Reduced Effective Resistance Kernel for Molecular Classification Adam Wesołowski et.al. 2501.19352v1 null
2025-01-31 An All-digital 65-nm Tsetlin Machine Image Classification Accelerator with 8.6 nJ per MNIST Frame at 60.3k Frames per Second Svein Anders Tunheim et.al. 2501.19347v1 null
2025-01-31 Pathological MRI Segmentation by Synthetic Pathological Data Generation in Fetuses and Neonates Misha P. T Kaandorp et.al. 2501.19338v1 null
2025-01-31 Consistent Video Colorization via Palette Guidance Han Wang et.al. 2501.19331v1 null
2025-01-31 Ultra-fast Real-time Target Recognition Using a Shift, Scale, and Rotation Invariant Hybrid Opto-electronic Joint Transform Correlator Xi Shen et.al. 2501.19299v1 null
2025-01-31 Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs James Flemings et.al. 2501.19287v1 null
2025-01-31 Application of Generative Adversarial Network (GAN) for Synthetic Training Data Creation to improve performance of ANN Classifier for extracting Built-Up pixels from Landsat Satellite Imagery Amritendu Mukherjee et.al. 2501.19283v1 null
2025-01-30 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Ruofan Liang et.al. 2501.18590v1 null
2025-01-30 Node Classification and Search on the Rubik's Cube Graph with GNNs Alessandro Barro et.al. 2501.18580v1 null
2025-01-30 BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos Lehao Lin et.al. 2501.18565v1 null
2025-01-30 Finite subgroups of maximal order of the Cremona group over the rationals Ahmed Abouelsaad et.al. 2501.18551v1 null
2025-01-30 UDC-VIT: A Real-World Video Dataset for Under-Display Cameras Kyusu Ahn et.al. 2501.18545v1 link
2025-01-30 Loss Functions and Operators Generated by f-Divergences Vincent Roulet et.al. 2501.18537v1 null
2025-01-30 Sample Classification using Machine Learning-Assisted Entangled Two-Photon Absorption Áulide Martínez-Tapia et.al. 2501.18534v1 null
2025-01-30 Joint Learning of Energy-based Models and their Partition Function Michael E. Sander et.al. 2501.18528v1 null
2025-01-30 Character factorisations, $z$-asymmetric partitions and plethysm Seamus Albion et.al. 2501.18520v1 null
2025-01-30 Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection Faegheh Sardari et.al. 2501.18509v1 null
2025-01-29 acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices Aude Vuilliomenet et.al. 2501.17841v1 link
2025-01-29 IRONMAP: Iron Network Mapping and Analysis Protocol for Detecting Over-Time Brain Iron Abnormalities in Neurological Disease Jack A. Reeves et.al. 2501.17838v1 null
2025-01-29 TikTok's recommendations skewed towards Republican content during the 2024 U.S. presidential race Hazem Ibrahim et.al. 2501.17831v1 null
2025-01-29 Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology Sobhan Hemati et.al. 2501.17822v1 null
2025-01-29 eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction through AI-Mediated Content Consumption Jocelyn Shen et.al. 2501.17819v1 null
2025-01-29 CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering Xiaohan Sun et.al. 2501.17792v1 null
2025-01-29 Glioma Multimodal MRI Analysis System for Tumor Layered Diagnosis via Multi-task Semi-supervised Learning Yihao Liu et.al. 2501.17758v1 null
2025-01-29 PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion Ahmed Sharshar et.al. 2501.17699v1 link
2025-01-29 NutMaat: A Python package for stellar spectral classification on the MK system R. I. El-Kholy et.al. 2501.17698v1 null
2025-01-29 Tonguescape: Exploring Language Models Understanding of Vowel Articulation Haruki Sakajo et.al. 2501.17643v1 null
2025-01-28 A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images Suresh Babu Nettur et.al. 2501.17160v1 null
2025-01-28 Sensitivity of Quantitative Susceptibility Mapping in Clinical Brain Research Fahad Salman et.al. 2501.17158v1 null
2025-01-28 Three-Dimensional Diffusion-Weighted Multi-Slab MRI With Slice Profile Compensation Using Deep Energy Model Reza Ghorbani et.al. 2501.17152v1 null
2025-01-28 FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Deren Lei et.al. 2501.17144v1 link
2025-01-28 DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications Muhammad Shahbaz et.al. 2501.17076v1 null
2025-01-28 Symmetries of 3-webs around a point Jean Paul Dufour et.al. 2501.17066v1 null
2025-01-28 Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding Akash Kumar et.al. 2501.17053v1 null
2025-01-28 Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection Farida Farsian et.al. 2501.17041v1 null
2025-01-28 Approach Towards Semi-Automated Certification for Low Criticality ML-Enabled Airborne Applications Chandrasekar Sridhar et.al. 2501.17028v1 null
2025-01-28 MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction Shreyam Gupta et.al. 2501.16997v1 null
2025-01-27 RelightVid: Temporal-Consistent Diffusion Model for Video Relighting Ye Fang et.al. 2501.16330v1 null
2025-01-27 sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging Jingyuan Chen et.al. 2501.16329v1 null
2025-01-27 Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture Yikun Hou et.al. 2501.16322v1 null
2025-01-27 TiDES: The 4MOST Time Domain Extragalactic Survey C. Frohmaier et.al. 2501.16311v1 null
2025-01-27 RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval Long Nguyen et.al. 2501.16303v1 null
2025-01-27 Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models Jing Zhang et.al. 2501.16282v1 null
2025-01-27 Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images Suresh Babu Nettur et.al. 2501.16249v1 null
2025-01-27 Zero-Shot Decision Tree Construction via Large Language Models Lucas Carrasco et.al. 2501.16247v1 null
2025-01-27 CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation Xiaochuan Ma et.al. 2501.16246v1 null
2025-01-27 Echoes of Discord: Forecasting Hater Reactions to Counterspeech Xiaoying Song et.al. 2501.16235v1 null
2025-01-24 Estimation-theoretic analysis of lensless imaging Leyla A. Kabuli et.al. 2501.14727v1 null
2025-01-24 Gland Segmentation Using SAM With Cancer Grade as a Prompt Yijie Zhu et.al. 2501.14718v1 null
2025-01-24 Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders Zaheer Ahmad et.al. 2501.14709v1 null
2025-01-24 Stroke classification using Virtual Hybrid Edge Detection from in silico electrical impedance tomography data Juan Pablo Agnelli et.al. 2501.14704v1 null
2025-01-24 Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST Fuping Wu et.al. 2501.14685v1 null
2025-01-24 Artificial Intelligence Could Have Predicted All Space Weather Events Associated with the May 2024 Superstorm Sabrina Guastavino et.al. 2501.14684v1 null
2025-01-24 An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations Shabnam Hassani et.al. 2501.14683v1 null
2025-01-24 MatAnyone: Stable Video Matting with Consistent Memory Propagation Peiqing Yang et.al. 2501.14677v1 null
2025-01-24 Automation of finding strong gravitational lenses in the Kilo Degree Survey with U-DenseLens (DenseLens + Segmentation) Bharath Chowdhary Nagam et.al. 2501.14650v1 null
2025-01-24 ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations Tianming Liang et.al. 2501.14607v1 null
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920v1 null
2025-01-23 Temporal Preference Optimization for Long-Form Video Understanding Rui Li et.al. 2501.13919v1 null
2025-01-23 Improving Video Generation with Human Feedback Jie Liu et.al. 2501.13918v1 null
2025-01-23 Exploring Finetuned Audio-LLM on Heart Murmur Features Adrian Florea et.al. 2501.13884v1 null
2025-01-23 Disclinations, dislocations, and emanant flux at Dirac criticality Maissam Barkeshli et.al. 2501.13866v1 null
2025-01-23 Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning Shiyu Zhang et.al. 2501.13859v1 null
2025-01-23 First Lessons Learned of an Artificial Intelligence Robotic System for Autonomous Coarse Waste Recycling Using Multispectral Imaging-Based Methods Timo Lange et.al. 2501.13855v1 null
2025-01-23 Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes Shiling Deng et.al. 2501.13851v1 link
2025-01-23 Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Kairui Hu et.al. 2501.13826v1 null
2025-01-23 Hallucinations Can Improve Large Language Models in Drug Discovery Shuzhou Yuan et.al. 2501.13824v1 null
2025-01-22 VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Boqiang Zhang et.al. 2501.13106v1 link
2025-01-22 Robust Representation Consistency Model via Contrastive Denoising Jiachen Lei et.al. 2501.13094v1 link
2025-01-22 CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization José Rodríguez-Ortega et.al. 2501.13073v1 null
2025-01-22 Robust Body Composition Analysis by Generating 3D CT Volumes from Limited 2D Slices Lianrui Zuo et.al. 2501.13071v1 null
2025-01-22 Beyond the Lungs: Extending the Field of View in Chest CT with Latent Diffusion Models Lianrui Zuo et.al. 2501.13068v1 null
2025-01-22 SMART-Vision: Survey of Modern Action Recognition Techniques in Vision Ali K. AlShami et.al. 2501.13066v1 null
2025-01-22 Real-time Terahertz Compressive Optical-Digital Neural Network Imaging Shao-Hsuan Wu et.al. 2501.13065v1 null
2025-01-22 One-Class Domain Adaptation via Meta-Learning Stephanie Holly et.al. 2501.13052v1 null
2025-01-22 Characterizing Collective Efforts in Content Sharing and Quality Control for ADHD-relevant Content on Video-sharing Platforms Hanxiu 'Hazel' Zhu et.al. 2501.13020v1 null
2025-01-22 Discrete Lagrangian multiforms for ABS equations I: quad equations Jacob J. Richardson et.al. 2501.13012v1 null
2025-01-21 Learning segmentation from point trajectories Laurynas Karazija et.al. 2501.12392v1 null
2025-01-21 Taming Teacher Forcing for Masked Autoregressive Video Generation Deyu Zhou et.al. 2501.12389v1 null
2025-01-21 Continuous 3D Perception Model with Persistent State Qianqian Wang et.al. 2501.12387v1 null
2025-01-21 InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling Yi Wang et.al. 2501.12386v1 link
2025-01-21 CCESAR: Coastline Classification-Extraction From SAR Images Using CNN-U-Net Combination Vidhu Arora et.al. 2501.12384v1 null
2025-01-21 Parallel Sequence Modeling via Generalized Spatial Propagation Network Hongjun Wang et.al. 2501.12381v1 null
2025-01-21 MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Yilun Zhao et.al. 2501.12380v1 link
2025-01-21 Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Sili Chen et.al. 2501.12375v1 null
2025-01-21 InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Yuhang Zang et.al. 2501.12368v1 link
2025-01-21 Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration Thomas Walshe et.al. 2501.12332v1 null
2025-01-17 Zero-Shot Monocular Scene Flow Estimation in the Wild Yiqing Liang et.al. 2501.10357v1 null
2025-01-17 DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation Claire Chen et.al. 2501.10356v1 null
2025-01-17 Hybrid Deep Learning Model for epileptic seizure classification by using 1D-CNN with multi-head attention mechanism Mohammed Guhdar et.al. 2501.10342v1 null
2025-01-17 Natural Language Processing of Privacy Policies: A Survey Andrick Adhikari et.al. 2501.10319v1 null
2025-01-17 Using Technology in Digital Humanities for Learning and Knowledge Dissemination Armanda Rodrigues et.al. 2501.10275v1 null
2025-01-17 Over-the-Air Multi-Sensor Inference with Neural Networks Using Memristor-Based Analog Computing Busra Tegin et.al. 2501.10245v1 null
2025-01-17 Amortized Bayesian Mixture Models Šimon Kucharský et.al. 2501.10229v1 null
2025-01-17 Adaptive Clustering for Efficient Phenotype Segmentation of UAV Hyperspectral Data Ciem Cornelissen et.al. 2501.10199v1 null
2025-01-17 Secure Semantic Communication With Homomorphic Encryption Rui Meng et.al. 2501.10182v1 null
2025-01-17 A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features Enes Karanfil et.al. 2501.10144v1 null
2025-01-16 Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Philippe Hansen-Estruch et.al. 2501.09755v1 null
2025-01-16 Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues Youngjoon Jang et.al. 2501.09754v1 null
2025-01-16 SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification Yuexi Du et.al. 2501.09753v1 link
2025-01-16 Improvement of Data Analytics Techniques in Reflection High Energy Electron Diffraction to Enable Machine Learning Patrick T. Gemperline et.al. 2501.09743v1 link
2025-01-16 ComplexVAD: Detecting Interaction Anomalies in Video Furkan Mumcu et.al. 2501.09733v1 null
2025-01-16 Practical Continual Forgetting for Pre-trained Vision Models Hongbo Zhao et.al. 2501.09705v1 link
2025-01-16 Cueless EEG imagined speech for subject identification: dataset and benchmarks Ali Derakhshesh et.al. 2501.09700v1 link
2025-01-16 Active particle in a very thin interfacial droplet Airi N. Kato et.al. 2501.09652v1 null
2025-01-16 Electronic Health Records: Towards Digital Twins in Healthcare Muhammet Alkan et.al. 2501.09640v1 null
2025-01-16 Unified Face Matching and Physical-Digital Spoofing Attack Detection Arun Kunwar et.al. 2501.09635v1 null
2025-01-15 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Jingyuan Chen et.al. 2501.09019v1 null
2025-01-15 Vision Foundation Models for Computed Tomography Suraj Pai et.al. 2501.09001v1 null
2025-01-15 RepVideo: Rethinking Cross-Layer Representation for Video Generation Chenyang Si et.al. 2501.08994v1 null
2025-01-15 Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models Karukriti Kaushik Ghosh et.al. 2501.08974v1 null
2025-01-15 An analysis of data variation and bias in image-based dermatological datasets for machine learning classification Francisco Mauro et.al. 2501.08962v1 null
2025-01-15 Neuromorphic Retina: An FPGA-based Emulator Prince Phillip et.al. 2501.08943v1 null
2025-01-15 Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos Javier Rodriguez-Juan et.al. 2501.08931v1 link
2025-01-15 Learning Joint Denoising, Demosaicing, and Compression from the Raw Natural Image Noise Dataset Benoit Brummer et.al. 2501.08924v1 null
2025-01-15 Multi-View Transformers for Airway-To-Lung Ratio Inference on Cardiac CT Scans: The C4R Study Sneha N. Naik et.al. 2501.08902v1 null
2025-01-15 An investigation of the relationship between morphology and chemistry of the D-type spherules from the recovery expedition of the CNEOS 2014-01-08 bolide: Implications for origins Eugenia Hyung et.al. 2501.08890v1 null
2025-01-14 DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models Hyeonwoo Kim et.al. 2501.08333v1 null
2025-01-14 Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Ryan Burgert et.al. 2501.08331v1 link
2025-01-14 Gradient Equilibrium in Online Learning: Theory and Applications Anastasios N. Angelopoulos et.al. 2501.08330v1 link
2025-01-14 Predicting 4D Hand Trajectory from Monocular Videos Yufei Ye et.al. 2501.08329v1 null
2025-01-14 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Miran Heo et.al. 2501.08326v1 null
2025-01-14 GameFactory: Creating New Games with Generative Interactive Videos Jiwen Yu et.al. 2501.08325v1 null
2025-01-14 ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations Ziyuan Huang et.al. 2501.08324v1 null
2025-01-14 Exploring Robustness of Multilingual LLMs on Real-World Noisy Data Amirhossein Aliakbarzadeh et.al. 2501.08322v1 link
2025-01-14 Diffusion Adversarial Post-Training for One-Step Video Generation Shanchuan Lin et.al. 2501.08316v1 null
2025-01-14 Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification Wennuo Yang et.al. 2501.08305v1 link
2025-01-13 UnCommon Objects in 3D Xingchen Liu et.al. 2501.07574v1 link
2025-01-13 Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks Hiroki Waida et.al. 2501.07571v1 null
2025-01-13 A reference framework for extremely metal-poor OB star studies: calibrations for stellar parameters and intrinsic colours Marta Lorenzo et.al. 2501.07569v1 null
2025-01-13 Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss Xinyu Zhang et.al. 2501.07563v1 null
2025-01-13 SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing Varun Biyyala et.al. 2501.07554v1 link
2025-01-13 IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion Tharun Anand et.al. 2501.07530v1 null
2025-01-13 Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization Aditya Devarakonda et.al. 2501.07526v1 null
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525v1 link
2025-01-13 Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers Daniel Fejgin et.al. 2501.07524v1 null
2025-01-13 Inductive Learning of Robot Task Knowledge from Raw Data and Online Expert Feedback Daniele Meli et.al. 2501.07507v1 link
2025-01-10 Multi-subject Open-set Personalization in Video Generation Tsai-Shien Chen et.al. 2501.06187v1 null
2025-01-10 VideoAuteur: Towards Long Narrative Video Generation Junfei Xiao et.al. 2501.06173v1 null
2025-01-10 PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit Yuechen Yang et.al. 2501.06151v1 null
2025-01-10 MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection Arkaprava Sinha et.al. 2501.06138v1 null
2025-01-10 Benchmarking Different Application Types across Heterogeneous Cloud Compute Services Nivedhitha Duggi et.al. 2501.06128v1 null
2025-01-10 Merging Feed-Forward Sublayers for Compressed Transformers Neha Verma et.al. 2501.06126v1 link
2025-01-10 Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding Fabian David Schmidt et.al. 2501.06117v1 null
2025-01-10 ELFATT: Efficient Linear Fast Attention for Vision Transformers Chong Wu et.al. 2501.06098v1 null
2025-01-10 Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems Steffen Dereich et.al. 2501.06081v1 link
2025-01-10 Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations Pablo Barceló et.al. 2501.06078v1 null
2025-01-09 An Empirical Study of Autoregressive Pre-training from Videos Jathushan Rajasegaran et.al. 2501.05453v1 null
2025-01-09 Fortuity in the D1-D5 system Chi-Ming Chang et.al. 2501.05448v1 null
2025-01-09 Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces Aniruddha Mahapatra et.al. 2501.05442v1 null
2025-01-09 From Images to Insights: Transforming Brain Cancer Diagnosis with Explainable AI Md. Arafat Alam Khandaker et.al. 2501.05426v1 null
2025-01-09 Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation Darius Petermann et.al. 2501.05413v1 null
2025-01-09 Innovative Designs and Insights into Quantum Thermal Machines Aline D. Lucio et.al. 2501.05406v1 null
2025-01-09 Mechanistic understanding and validation of large AI models with SemanticLens Maximilian Dreyer et.al. 2501.05398v1 null
2025-01-09 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On Shuliang Ning et.al. 2501.05369v1 null
2025-01-09 Video-Conferencing Beyond Screen-Sharing and Thumbnail Webcam Videos: Gesture-Aware Augmented Reality Video for Data-Rich Remote Presentations Matthew Brehmer et.al. 2501.05345v1 null
2025-01-09 Stability and List-Replicability for Agnostic Learners Ari Blonda et.al. 2501.05333v1 null
2025-01-09 Probing Speaker-specific Features in Speaker Representations Aemon Yat Fei Chiu et.al. 2501.05310v1 null
2025-01-08 Planarian Neural Networks: Evolutionary Patterns from Basic Bilateria Shaping Modern Artificial Neural Network Architectures Ziyuan Huang et.al. 2501.04700v1 null
2025-01-08 ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Yuzhou Huang et.al. 2501.04698v1 null
2025-01-08 Non-Markovian dynamics of BIC generation via single-photon scattering Giuseppe Magnifico et.al. 2501.04691v1 null
2025-01-08 Learning by Confusion: The Phase Diagram of the Holstein Model George Issa et.al. 2501.04681v1 null
2025-01-08 RadGPT: Constructing 3D Image-Text Tumor Datasets Pedro R. A. S. Bassi et.al. 2501.04678v1 link
2025-01-08 Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Yikang Zhou et.al. 2501.04670v1 link
2025-01-08 HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion Chia-Ming Lee et.al. 2501.04665v1 null
2025-01-08 Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification Zhiqiang Gao et.al. 2501.04643v1 null
2025-01-08 A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI Kazusato Oko et.al. 2501.04641v1 link
2025-01-08 Framework for Integrating Machine Learning Methods for Path-Aware Source Routing Anees Al-Najjar et.al. 2501.04624v1 null
2025-01-07 Extraction Of Cumulative Blobs From Dynamic Gestures Rishabh Naulakha et.al. 2501.04002v1 null
2025-01-07 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Haobo Yuan et.al. 2501.04001v1 null
2025-01-07 WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings Haochen Song et.al. 2501.03999v1 null
2025-01-07 Supervised and unsupervised learning the many-body critical phase, phase transitions and critical exponents in disordered quantum systems Aamna Ahmed et.al. 2501.03981v1 null
2025-01-07 Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification Satchel French et.al. 2501.03967v1 link
2025-01-07 Learning to Relax Nonconvex Quadratically Constrained Quadratic Programs Buket Ozen et.al. 2501.03954v1 null
2025-01-07 Reducing Proxy Discrimination Frank Fagan et.al. 2501.03946v1 null
2025-01-07 Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers Yuechen Zhang et.al. 2501.03931v1 link
2025-01-07 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback Jiakang Yuan et.al. 2501.03916v1 null
2025-01-07 The Cable to the Moon: Veritasium's Light Bulb Experiment in Low-Cost Miniature Form Michael Lenz et.al. 2501.03896v1 null
2025-01-06 RW-Net: Enhancing Few-Shot Point Cloud Classification with a Wavelet Transform Projection-based Network Haosheng Zhang et.al. 2501.03221v1 null
2025-01-06 ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking Tingyang Zhang et.al. 2501.03220v1 null
2025-01-06 Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Rui Qian et.al. 2501.03218v1 link
2025-01-06 Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text Ayat Najjar et.al. 2501.03212v1 null
2025-01-06 Multimodal Machine Learning Can Predict Videoconference Fluidity and Enjoyment Andrew Chang et.al. 2501.03190v1 null
2025-01-06 GLiREL -- Generalist Model for Zero-Shot Relation Extraction Jack Boylan et.al. 2501.03172v1 null
2025-01-06 Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning Muyun Li et.al. 2501.03162v1 null
2025-01-06 Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy Risha Goel et.al. 2501.03153v1 null
2025-01-06 MVP: Multimodal Emotion Recognition based on Video and Physiological Signals Valeriya Strizhkova et.al. 2501.03103v1 null
2025-01-06 Trust Modeling in Counseling Conversations: A Benchmark Study Aseem Srivastava et.al. 2501.03064v1 null
2025-01-03 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Chaoyou Fu et.al. 2501.01957v1 link
2025-01-03 VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment Wenyan Cong et.al. 2501.01949v1 null
2025-01-03 Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models Manh Duong Nguyen et.al. 2501.01932v1 null
2025-01-03 GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction Yuwei Miao et.al. 2501.01930v1 null
2025-01-03 Transformer-Driven Inverse Problem Transform for Fast Blind Hyperspectral Image Dehazing Po-Wei Tang et.al. 2501.01924v1 null
2025-01-03 Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification Jarin Ritu et.al. 2501.01921v1 null
2025-01-03 Exoplanet Detection via Differentiable Rendering Brandon Y. Feng et.al. 2501.01912v1 null
2025-01-03 EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Siyuan Huang et.al. 2501.01895v1 null
2025-01-03 ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation Luca Collorone et.al. 2501.01877v1 null
2025-01-03 Extensions of finite irreducible modules over rank two Lie conformal algebra Lipeng Luo et.al. 2501.01870v1 null
2025-01-02 GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Zhangyang Qi et.al. 2501.01428v1 null
2025-01-02 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427v1 null
2025-01-02 Unifying Specialized Visual Encoders for Video Language Models Jihoon Chung et.al. 2501.01426v1 null
2025-01-02 Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions Xincheng Shuai et.al. 2501.01425v1 null
2025-01-02 Multi-Modal Video Feature Extraction for Popularity Prediction Haixu Liu et.al. 2501.01422v1 null
2025-01-02 A Multi-task Supervised Compression Model for Split Computing Yoshitomo Matsubara et.al. 2501.01420v1 null
2025-01-02 On Unifying Video Generation and Camera Pose Estimation Chun-Hao Paul Huang et.al. 2501.01409v1 null
2025-01-02 nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation Haixu Liu et.al. 2501.01406v1 null
2025-01-02 VoiceVector: Multimodal Enrolment Vectors for Speaker Separation Akam Rahimi et.al. 2501.01401v1 null
2025-01-02 ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer Xuyin Qi et.al. 2501.01392v1 null
2024-12-30 PERSE: Personalized 3D Generative Avatars from A Single Portrait Hyunsoo Cha et.al. 2412.21206v1 null
2024-12-30 Action-Agnostic Point-Level Supervision for Temporal Action Detection Shuhei M. Yoshida et.al. 2412.21205v1 link
2024-12-30 A Large-Scale Study on Video Action Dataset Condensation Yang Chen et.al. 2412.21197v1 null
2024-12-30 Classification of del Pezzo surfaces of rank one. I. Height 1 and 2. II. Descendants with elliptic boundaries Karol Palka et.al. 2412.21174v1 null
2024-12-30 Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning Yalin E. Sagduyu et.al. 2412.21164v1 null
2024-12-30 Open RAN-Enabled Deep Learning-Assisted Mobility Management for Connected Vehicles Maria Barbosa et.al. 2412.21161v1 null
2024-12-30 Unified dimensionality reduction techniques in chronic liver disease detection Anand Karna et.al. 2412.21156v1 null
2024-12-30 Irreducible representations of welded braid group Inna Sysoeva et.al. 2412.21133v1 null
2024-12-30 Galaxy Spectra Networks (GaSNet). III. Generative pre-trained network for spectrum reconstruction, redshift estimate and anomaly detection Fucheng Zhong et.al. 2412.21130v1 link
2024-12-30 All toric Kahler surfaces with twistor 2-forms Sergei G. Ovchinnikov et.al. 2412.21114v1 null
2024-12-27 Streamlined Krylov construction and classification of ergodic Floquet systems Nikita Kolganov et.al. 2412.19797v1 null
2024-12-27 MVTamperBench: Evaluating Robustness of Vision-Language Models Amit Agarwal et.al. 2412.19794v1 null
2024-12-27 Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago Cassandra Daniels et.al. 2412.19781v1 null
2024-12-27 Classification of Minimal Abelian Coulomb Branches Antoine Bourget et.al. 2412.19766v1 null
2024-12-27 Can one hear the shape of a random walk? Michael J. Larsen et.al. 2412.19762v1 null
2024-12-27 Generative Video Propagation Shaoteng Liu et.al. 2412.19761v1 null
2024-12-27 Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition Damien Bouchabou et.al. 2412.19732v1 null
2024-12-27 EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs Daniil A. Berdyshev et.al. 2412.19725v1 link
2024-12-27 Quantum correlations in a gravitational collapse simulation with SpheriCo.jl Benjamin Berczi et.al. 2412.19722v1 null
2024-12-27 ProKAN: Progressive Stacking of Kolmogorov-Arnold Networks for Efficient Liver Segmentation Bhavesh Gyanchandani et.al. 2412.19713v1 null
2024-12-24 Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models Jinhui Yi et.al. 2412.18609v1 link
2024-12-24 DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers Yuntao Chen et.al. 2412.18607v1 null
2024-12-24 ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation Hongjie Li et.al. 2412.18600v1 null
2024-12-24 DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Minghong Cai et.al. 2412.18597v1 link
2024-12-24 ClassifyViStA: WCE Classification with Visual understanding through Segmentation and Attention S. Balasubramanian et.al. 2412.18591v1 link
2024-12-24 Text-Driven Tumor Synthesis Xinran Li et.al. 2412.18589v1 null
2024-12-24 Resolution-Robust 3D MRI Reconstruction with 2D Diffusion Priors: Diverse-Resolution Training Outperforms Interpolation Anselm Krainovic et.al. 2412.18584v1 null
2024-12-24 New method of image processing via statistical analysis for application in intelligent systems Monalisa Cavalcante et.al. 2412.18575v1 null
2024-12-24 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement Yihang Luo et.al. 2412.18565v1 null
2024-12-24 Distilling Fine-grained Sentiment Understanding from Large Language Models Yice Zhang et.al. 2412.18552v1 link
2024-12-23 FaceLift: Single Image to 3D Head with View Generation and GS-LRM Weijie Lyu et.al. 2412.17812v1 null
2024-12-23 Large Motion Video Autoencoding with Cross-modal Video VAE Yazhou Xing et.al. 2412.17805v1 null
2024-12-23 GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator Yidi Shao et.al. 2412.17804v1 null
2024-12-23 Classification of exchange relation planar algebras through sieving forest fusion graphs Fan Lu et.al. 2412.17790v1 null
2024-12-23 Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Priyaranjan Pattnayak et.al. 2412.17759v1 null
2024-12-23 Induced subgraphs and tree decompositions XVIII. Obstructions to bounded pathwidth Maria Chudnovsky et.al. 2412.17756v1 null
2024-12-23 LASE: Learned Adjacency Spectral Embeddings Sofía Pérez Casulo et.al. 2412.17734v1 null
2024-12-23 VidTwin: Video VAE with Decoupled Structure and Dynamics Yuchi Wang et.al. 2412.17726v1 link
2024-12-23 MRANet: A Modified Residual Attention Networks for Lung and Colon Cancer Classification Diponkor Bala et.al. 2412.17700v1 null
2024-12-23 An efficient volume-preserving MBO scheme for data clustering and classification Fabius Krämer et.al. 2412.17694v1 null
2024-12-20 Can Generative Video Models Help Pose Estimation? Ruojin Cai et.al. 2412.16155v1 null
2024-12-20 MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang et.al. 2412.16153v1 null
2024-12-20 Shape Shifters: Does Body Shape Change the Perception of Small-Scale Crowd Motions? Bharat Vyas et.al. 2412.16151v1 null
2024-12-20 SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild Jannik Elsäßer et.al. 2412.16147v1 null
2024-12-20 Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks Enis Baty et.al. 2412.16146v1 null
2024-12-20 FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks Siddharth Ambekar et.al. 2412.16144v1 null
2024-12-20 Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts Muhammad Abdullah Sohail et.al. 2412.16119v1 link
2024-12-20 PruneVid: Visual Token Pruning for Efficient Video Large Language Models Xiaohu Huang et.al. 2412.16117v1 link
2024-12-20 Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG Hasan Md Tusfiqur Alam et.al. 2412.16086v1 link
2024-12-20 Efficient MedSAMs: Segment Anything in Medical Images on Laptop Jun Ma et.al. 2412.16085v1 link
2024-12-19 LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Hanlin Wang et.al. 2412.15214v1 null
2024-12-19 Scaling 4D Representations João Carreira et.al. 2412.15212v1 null
2024-12-19 AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation Moayed Haji-Ali et.al. 2412.15191v1 null
2024-12-19 EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues Sagar Soni et.al. 2412.15190v1 null
2024-12-19 Surface-Based Authentication System for Integrated Circuit Chips Runze Liu et.al. 2412.15186v1 null
2024-12-19 Tiled Diffusion Or Madar et.al. 2412.15185v1 null
2024-12-19 SqueezeMe: Efficient Gaussian Avatars for VR Shunsuke Saito et.al. 2412.15171v1 null
2024-12-19 OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization Jiacheng Zhang et.al. 2412.15159v1 null
2024-12-19 Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Yatai Ji et.al. 2412.15156v1 link
2024-12-19 Cruise Control: Dynamic Model Selection for ML-Based Network Traffic Analysis Johann Hugon et.al. 2412.15146v1 null
2024-12-18 AniDoc: Animation Creation Made Easier Yihao Meng et.al. 2412.14173v1 null
2024-12-18 Learning from Massive Human Videos for Universal Humanoid Pose Control Jiageng Mao et.al. 2412.14172v1 null
2024-12-18 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang et.al. 2412.14171v1 link
2024-12-18 Autoregressive Video Generation without Vector Quantization Haoge Deng et.al. 2412.14169v1 link
2024-12-18 VideoDPO: Omni-Preference Alignment for Video Diffusion Generation Runtao Liu et.al. 2412.14167v1 null
2024-12-18 AKiRa: Augmentation Kit on Rays for optical video generation Xi Wang et.al. 2412.14158v1 null
2024-12-18 AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Guillaume Astruc et.al. 2412.14123v1 link
2024-12-18 GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images Ziyang Xu et.al. 2412.14118v1 link
2024-12-18 Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset Bijay Adhikari et.al. 2412.14100v1 null
2024-12-18 Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts Jihye Choi et.al. 2412.14097v1 null
2024-12-17 MotionBridge: Dynamic Video Inbetweening with Flexible Controls Maham Tanveer et.al. 2412.13190v1 null
2024-12-17 StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models Yunzhi Yan et.al. 2412.13188v1 null
2024-12-17 HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction Chen Bao et.al. 2412.13187v1 null
2024-12-17 Move-in-2D: 2D-Conditioned Human Motion Generation Hsin-Ping Huang et.al. 2412.13185v1 null
2024-12-17 Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures Guoxing Sun et.al. 2412.13183v1 null
2024-12-17 NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment Andrea Dunn Beltran et.al. 2412.13176v1 null
2024-12-17 Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions Juan Del Aguila Ferrandis et.al. 2412.13157v1 null
2024-12-17 Continuous Patient Monitoring with AI: Real-Time Analysis of Video in Hospital Care Settings Paolo Gabriel et.al. 2412.13152v1 null
2024-12-17 Label Errors in the Tobacco3482 Dataset Gordon Lim et.al. 2412.13140v1 link
2024-12-17 Unlocking the Potential of Digital Pathology: Novel Baselines for Compression Maximilian Fischer et.al. 2412.13137v1 null
2024-12-16 Wonderland: Navigating 3D Scenes from a Single Image Hanwen Liang et.al. 2412.12091v1 null
2024-12-16 Instruction-based Image Manipulation by Watching How Things Move Mingdeng Cao et.al. 2412.12087v1 null
2024-12-16 CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology Yuxuan Sun et.al. 2412.12077v1 null
2024-12-16 CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Guo Chen et.al. 2412.12075v1 null
2024-12-16 Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation Hongwei Niu et.al. 2412.12050v1 link
2024-12-16 Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation Tim Sziburis et.al. 2412.12016v1 null
2024-12-16 Cost-Effective Label-free Node Classification with LLMs Taiyan Zhang et.al. 2412.11983v1 null
2024-12-16 On the Nielsen-Thomsen sequence Laurent Cantier et.al. 2412.11975v1 null
2024-12-16 On vertex-transitive distance-regular covers of complete graphs with an extremal smallest eigenvalue Ludmila Yu. Tsiovkina et.al. 2412.11962v1 null
2024-12-16 Gramian Multimodal Representation Learning and Alignment Giordano Cicchetti et.al. 2412.11959v1 null
2024-12-13 UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities Muhammad Uzair Khattak et.al. 2412.10372v1 link
2024-12-13 Apollo: An Exploration of Video Understanding in Large Multimodal Models Orr Zohar et.al. 2412.10360v1 null
2024-12-13 Robust image classification with multi-modal large language models Francesco Villani et.al. 2412.10353v1 null
2024-12-13 BrushEdit: All-In-One Image Inpainting and Editing Yaowei Li et.al. 2412.10316v1 null
2024-12-13 Performance evaluation of predictive AI models to support medical decisions: Overview and guidance Ben Van Calster et.al. 2412.10288v1 null
2024-12-13 TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation Xingrui Wang et.al. 2412.10275v1 null
2024-12-13 Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media Jiaqing Yuan et.al. 2412.10266v1 null
2024-12-13 Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication Alireza Furutanpey et.al. 2412.10265v1 null
2024-12-13 MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization Shuaiting Li et.al. 2412.10261v1 null
2024-12-13 Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset Hao-Chiang Shao et.al. 2412.10258v1 null
2024-12-12 Doe-1: Closed-Loop Autonomous Driving with Large World Model Wenzhao Zheng et.al. 2412.09627v1 link
2024-12-12 FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Haonan Qiu et.al. 2412.09626v1 null
2024-12-12 GenEx: Generating an Explorable World Taiming Lu et.al. 2412.09624v1 null
2024-12-12 OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation Weiqi Li et.al. 2412.09623v1 null
2024-12-12 Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos Linyi Jin et.al. 2412.09621v1 null
2024-12-12 Learning Camera Movement Control from Real-World Drone Videos Yunzhong Hou et.al. 2412.09620v1 null
2024-12-12 NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors Hung-Jui Huang et.al. 2412.09617v1 link
2024-12-12 V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding Junqi Ge et.al. 2412.09616v1 link
2024-12-12 PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Chenyu Yang et.al. 2412.09613v1 null
2024-12-12 Olympus: A Universal Task Router for Computer Vision Tasks Yuanze Lin et.al. 2412.09612v1 link
2024-12-11 StreamChat: Chatting with Streaming Video Jihao Liu et.al. 2412.08646v1 null
2024-12-11 Generative Semantic Communication: Architectures, Technologies, and Applications Jinke Ren et.al. 2412.08642v1 null
2024-12-11 Multimodal Latent Language Modeling with Next-Token Diffusion Yutao Sun et.al. 2412.08635v1 null
2024-12-11 MNIST-Fraction: Enhancing Math Education with AI-Driven Fraction Detection and Analysis Pegah Ahadian et.al. 2412.08633v1 null
2024-12-11 Image Retrieval Methods in the Dissimilarity Space Madhu Kiran et.al. 2412.08618v1 null
2024-12-11 CCSNscore: A multi-input deep learning tool for classification of core-collapse supernovae using SED-Machine spectra Yashvi Sharma et.al. 2412.08601v1 null
2024-12-11 RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation Mingfei Han et.al. 2412.08591v1 null
2024-12-11 SPACE-SUIT: An Artificial Intelligence based chromospheric feature extractor and classifier for SUIT Pranava Seth et.al. 2412.08589v1 null
2024-12-11 Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning Hang Zhao et.al. 2412.08587v1 null
2024-12-11 Utilizing Multi-step Loss for Single Image Reflection Removal Abdelrahman Elnenaey et.al. 2412.08582v1 link
2024-12-10 Video Motion Transfer with Diffusion Transformers Alexander Pondaven et.al. 2412.07776v1 link
2024-12-10 UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Xi Chen et.al. 2412.07774v1 null
2024-12-10 From Slow Bidirectional to Fast Causal Video Generators Tianwei Yin et.al. 2412.07772v1 null
2024-12-10 From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos Matthew Wallingford et.al. 2412.07770v1 null
2024-12-10 Learning Visual Generative Priors without Text Shuailei Ma et.al. 2412.07767v1 null
2024-12-10 Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation Jingxi Chen et.al. 2412.07761v1 null
2024-12-10 SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Jianhong Bai et.al. 2412.07760v1 link
2024-12-10 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Xiao Fu et.al. 2412.07759v1 null
2024-12-10 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation Fatemeh Nazarieh et.al. 2412.07754v1 null
2024-12-10 On Motion Blur and Deblurring in Visual Place Recognition Timur Ismagilov et.al. 2412.07751v1 null
2024-12-09 [MASK] is All You Need Vincent Tao Hu et.al. 2412.06787v1 link
2024-12-09 P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies Mara Levy et.al. 2412.06784v1 null
2024-12-09 Convolution goes higher-order: a biologically inspired mechanism empowers image classification Simone Azeglio et.al. 2412.06740v1 null
2024-12-09 JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM Takuro Fujii et.al. 2412.06738v1 null
2024-12-09 Demystifying shock breakout spectra Christopher M. Irwin et.al. 2412.06734v1 null
2024-12-09 Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection Aqib Nazir Mir et.al. 2412.06709v1 null
2024-12-09 You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale Baorui Ma et.al. 2412.06699v1 null
2024-12-09 FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis Ciro Benito Raggio et.al. 2412.06690v1 null
2024-12-09 Impact of Privacy Parameters on Deep Learning Models for Image Classification Basanta Chaulagain et.al. 2412.06689v1 null
2024-12-09 Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset Shanshan Wang et.al. 2412.06666v1 null
2024-12-06 Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model Lening Wang et.al. 2412.05280v1 link
2024-12-06 Sparse autoencoders reveal selective remapping of visual concepts during adaptation Hyesu Lim et.al. 2412.05276v1 link
2024-12-06 MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models Tuna Han Salih Meral et.al. 2412.05275v1 null
2024-12-06 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Zhe Chen et.al. 2412.05271v1 null
2024-12-06 Mind the Time: Temporally-Controlled Multi-Event Video Generation Ziyi Wu et.al. 2412.05263v1 null
2024-12-06 TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft Qian Long et.al. 2412.05255v1 link
2024-12-06 Uncertainty Quantification for Transformer Models for Dark-Pattern Detection Javier Muñoz et.al. 2412.05251v1 null
2024-12-06 ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding Ayushman Singh et.al. 2412.05216v1 null
2024-12-06 LinVT: Empower Your Image-level Large Language Model to Understand Videos Lishuai Gao et.al. 2412.05185v1 link
2024-12-06 DreamColour: Controllable Video Colour Editing without Training Chaitat Utintu et.al. 2412.05180v1 null
2024-12-05 PaintScene4D: Consistent 4D Scene Generation from Text Prompts Vinayak Gupta et.al. 2412.04471v1 null
2024-12-05 QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos Sharath Girish et.al. 2412.04469v1 null
2024-12-05 NVILA: Efficient Frontier Visual Language Models Zhijian Liu et.al. 2412.04468v1 null
2024-12-05 VisionZip: Longer is Better but Not Necessary in Vision Language Models Senqiao Yang et.al. 2412.04467v1 link
2024-12-05 MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos Zhengqi Li et.al. 2412.04463v1 null
2024-12-05 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion Chaoyang Wang et.al. 2412.04462v1 null
2024-12-05 Four-Plane Factorized Video Autoencoders Mohammed Suhail et.al. 2412.04452v1 null
2024-12-05 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng et.al. 2412.04448v1 null
2024-12-05 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu et.al. 2412.04447v1 null
2024-12-05 DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models Yizhuo Li et.al. 2412.04446v1 null
2024-12-04 Navigation World Models Amir Bar et.al. 2412.03572v1 null
2024-12-04 The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control Ruili Feng et.al. 2412.03568v1 null
2024-12-04 Streaming Detection of Queried Event Start Cristobal Eyzaguirre et.al. 2412.03567v1 null
2024-12-04 Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Wujian Peng et.al. 2412.03565v1 null
2024-12-04 From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents Xinyi Mou et.al. 2412.03563v1 null
2024-12-04 Imagine360: Immersive 360 Video Generation from Perspective Anchor Jing Tan et.al. 2412.03552v1 null
2024-12-04 Kibble-Zurek Dynamics & Statistics of Topological Defects in Chiral Superfluid $^3$He Films Noble Gluscevich et.al. 2412.03544v1 null
2024-12-04 Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos Hanxue Liang et.al. 2412.03526v1 null
2024-12-04 Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention Hannan Lu et.al. 2412.03520v1 null
2024-12-04 Distillation of Diffusion Features for Semantic Correspondence Frank Fundel et.al. 2412.03512v1 null
2024-12-03 Motion Prompting: Controlling Video Generation with Motion Trajectories Daniel Geng et.al. 2412.02700v1 null
2024-12-03 An ADHD Diagnostic Interface Based on EEG Spectrograms and Deep Learning Techniques Medha Pappula et.al. 2412.02695v1 null
2024-12-03 FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation Kefan Chen et.al. 2412.02690v1 null
2024-12-03 AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction Lingteng Qiu et.al. 2412.02684v1 null
2024-12-03 On Third-Order Evolution Systems Describing Pseudo-Spherical or Spherical Surfaces Filipe Kelmer et.al. 2412.02657v1 null
2024-12-03 Robust soybean seed yield estimation using high-throughput ground robot videos Jiale Feng et.al. 2412.02642v1 null
2024-12-03 QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing Ramesh Manuvinakurike et.al. 2412.02638v1 null
2024-12-03 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback Hiroki Furuta et.al. 2412.02617v1 null
2024-12-03 Interpretable Company Similarity with Sparse Autoencoders Marco Molinari et.al. 2412.02605v1 null
2024-12-03 Efficient Algorithms for Low Tubal Rank Tensor Approximation with Applications to Image Compression, Super-Resolution and Deep Learning Salman Ahmadi-Asl et.al. 2412.02598v1 null
2024-12-02 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin et.al. 2411.19951v2 link
2024-11-29 AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos Yuze He et.al. 2411.19950v1 null
2024-11-29 Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark Joseph Heyward et.al. 2411.19941v1 null
2024-11-29 SIMS: Simulating Human-Scene Interactions with Real World Script Planning Wenjia Wang et.al. 2411.19921v1 null
2024-11-29 Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy Araceli Guzmán-Tristán et.al. 2411.19902v1 null
2024-11-29 To the Problem of Cosmic Expansion in Massive Gravity Lavinia Heisenberg et.al. 2411.19873v1 null
2024-11-29 AIDetx: a compression-based method for identification of machine-learning generated text Leonardo Almeida et.al. 2411.19869v1 link
2024-11-29 Towards Class-wise Robustness Analysis Tejaswini Medi et.al. 2411.19853v1 null
2024-11-29 Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation Dimosthenis Antypas et.al. 2411.19832v1 null
2024-11-29 A new definition of outsplitting on $k$-graphs preserving Morita equivalence Mackenzie Amann et.al. 2411.19816v1 null
2024-11-27 GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data Wentao Wang et.al. 2411.18624v1 null
2024-11-27 Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data Aoran Shen et.al. 2411.18622v1 null
2024-11-27 CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Rundi Wu et.al. 2411.18613v1 null
2024-11-27 Novel Class Discovery for Open Set Raga Classification Parampreet Singh et.al. 2411.18611v1 null
2024-11-27 Variability of hot sub-luminous stars and binaries: Machine learning analysis of Gaia DR3 multi-epoch photometry P. Ranaivomanana et.al. 2411.18609v1 null
2024-11-27 Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis Eva Prakash et.al. 2411.18602v1 null
2024-11-27 Periodic symplectic and Hamiltonian diffeomorphisms on irrational ruled surfaces Nicholas Lindsay et.al. 2411.18580v1 null
2024-11-27 Pruning Deep Convolutional Neural Network Using Conditional Mutual Information Tien Vu-Van et.al. 2411.18578v1 null
2024-11-27 Exploring Depth Information for Detecting Manipulated Face Videos Haoyue Wang et.al. 2411.18572v1 null
2024-11-27 Perturbation Ontology based Graph Attention Networks Yichen Wang et.al. 2411.18520v1 null
2024-11-26 Video-Guided Foley Sound Generation with Multimodal Controls Ziyang Chen et.al. 2411.17698v1 null
2024-11-26 StableAnimator: High-Quality Identity-Preserving Human Image Animation Shuyuan Tu et.al. 2411.17697v1 link
2024-11-26 Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis Akshita Gupta et.al. 2411.17690v1 null
2024-11-26 BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings Abhay Shanbhag et.al. 2411.17661v1 null
2024-11-26 DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting Christian Homeyer et.al. 2411.17660v1 link
2024-11-26 SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation Claudia Cuttano et.al. 2411.17646v1 link
2024-11-26 A robust image encryption scheme based on new 4-D hyperchaotic system and elliptic curve Yehia Lalili et.al. 2411.17643v1 null
2024-11-26 On Limitations of LLM as Annotator for Low Resource Languages Suramya Jadhav et.al. 2411.17637v1 null
2024-11-26 An Ensemble Approach for Brain Tumor Segmentation and Synthesis Juampablo E. Heras Rivera et.al. 2411.17617v1 null
2024-11-26 Accelerating Vision Diffusion Transformers with Skip Branches Guanjie Chen et.al. 2411.17616v1 link
2024-11-25 Generative Omnimatte: Learning to Decompose Video into Layers Yao-Chih Lee et.al. 2411.16683v1 null
2024-11-25 Quark: Real-time, High-resolution, and General Neural View Synthesis John Flynn et.al. 2411.16680v1 null
2024-11-25 A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports Gabriel Okasa et.al. 2411.16662v1 null
2024-11-25 Fast training of large kernel models with delayed projections Amirhesam Abedsoltan et.al. 2411.16658v1 null
2024-11-25 DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Zun Wang et.al. 2411.16657v1 null
2024-11-25 Automated Registration of 3D Neurovascular Territory Atlas to 2D DSA for Targeted Quantitative Angiography Analysis George Dimopoulos et.al. 2411.16637v1 null
2024-11-25 LegoPET: Hierarchical Feature Guided Conditional Diffusion for PET Image Reconstruction Yiran Sun et.al. 2411.16629v1 null
2024-11-25 Inference-Time Policy Steering through Human Interactions Yanwei Wang et.al. 2411.16627v1 null
2024-11-25 Imperceptible Adversarial Examples in the Physical World Weilin Xu et.al. 2411.16622v1 null
2024-11-25 Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric Zhichao Zhang et.al. 2411.16619v1 null
2024-11-22 Health AI Developer Foundations Atilla P. Kiraly et.al. 2411.15128v1 null
2024-11-22 PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision Arnav M. Das et.al. 2411.15127v1 null
2024-11-22 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Daeun Lee et.al. 2411.15115v1 null
2024-11-22 About Time: Advances, Challenges, and Outlooks of Action Understanding Alexandros Stergiou et.al. 2411.15106v1 null
2024-11-22 Efficient Radar Modulation Recognition via a Noise-Aware Ensemble Neural Network Do-Hyun Park et.al. 2411.15104v1 null
2024-11-22 RED: Effective Trajectory Representation Learning with Comprehensive Information Silin Zhou et.al. 2411.15096v1 null
2024-11-22 Dimension-independent rates for structured neural density estimation Robert A. Vandermeulen et.al. 2411.15095v1 null
2024-11-22 Quantum-enhanced unsupervised image segmentation for medical images analysis Laia Domingo et.al. 2411.15086v1 null
2024-11-22 Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation Lakshmikar R. Polamreddy et.al. 2411.15084v1 link
2024-11-22 RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency Wentao Huang et.al. 2411.15076v1 null
2024-11-21 Revisiting the Integration of Convolution and Attention for Vision Backbone Lei Zhu et.al. 2411.14429v1 link
2024-11-21 Quantum States Imaging of Magnetic Field Contours based on Autler-Townes Effect in Yb Atoms Tanaporn Na Narong et.al. 2411.14426v1 null
2024-11-21 Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation Zhuoman Liu et.al. 2411.14423v1 null
2024-11-21 Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field Lan Jiang et.al. 2411.14418v1 null
2024-11-21 Multimodal Autoregressive Pre-training of Large Vision Encoders Enrico Fini et.al. 2411.14402v1 link
2024-11-21 Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding Yiming Zhang et.al. 2411.14401v1 null
2024-11-21 POS-tagging to highlight the skeletal structure of sentences Grigorii Churakov et.al. 2411.14393v1 link
2024-11-21 Persistent Homology for Structural Characterization in Disordered Systems An Wang et.al. 2411.14390v1 link
2024-11-21 Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach Xian-Xian Liu et.al. 2411.14385v1 null
2024-11-21 Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation Yuanhao Cai et.al. 2411.14384v1 null
2024-11-20 REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents Rui Tian et.al. 2411.13552v1 link
2024-11-20 Generating 3D-Consistent Videos from Unposed Internet Photos Gene Chou et.al. 2411.13549v1 null
2024-11-20 Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix Subhasish Das et.al. 2411.13535v1 null
2024-11-20 Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse S. Chapagain et.al. 2411.13534v1 null
2024-11-20 Geometric Algebra Planes: Convex Implicit Neural Volumes Irmak Sivgin et.al. 2411.13525v1 null
2024-11-20 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Ziqi Huang et.al. 2411.13503v1 link
2024-11-20 Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations Yasmine Mustafa et.al. 2411.13490v1 null
2024-11-20 Benchmarking Quantum Convolutional Neural Networks for Classification and Data Compression Tasks Jun Yong Khoo et.al. 2411.13468v1 null
2024-11-20 Heuristically Adaptive Diffusion-Model Evolutionary Strategy Benedikt Hartl et.al. 2411.13420v1 null
2024-11-20 Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese Dat Van-Thanh Nguyen et.al. 2411.13407v1 null
2024-11-19 Soft Robotic Dynamic In-Hand Pen Spinning Yunchao Yao et.al. 2411.12734v1 null
2024-11-19 Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs Ahmed Akib Jawad Karim et.al. 2411.12712v1 null
2024-11-19 UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments Chunru Lin et.al. 2411.12711v1 null
2024-11-19 Attribute Inference Attacks for Federated Regression Tasks Francesco Diana et.al. 2411.12697v1 null
2024-11-19 IMUVIE: Pickup Timeline Action Localization via Motion Movies John Clapham et.al. 2411.12689v1 null
2024-11-19 AI Guided Early Screening of Cervical Cancer Dharanidharan S I et.al. 2411.12681v1 null
2024-11-19 Yang--Mills topology on four-dimensional triangulations Giuseppe Clemente et.al. 2411.12668v1 null
2024-11-19 Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis Kazi Hasibul Kabir et.al. 2411.12667v1 null
2024-11-19 PoM: Efficient Image and Video Generation with the Polynomial Mixer David Picard et.al. 2411.12663v1 link
2024-11-19 AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction Yuanbin Man et.al. 2411.12593v1 null
2024-11-18 Partially Hyperbolic Dynamics with Quasi-isometric Center Ziqiang Feng et.al. 2411.11836v1 null
2024-11-18 Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals Maryam Cheema et.al. 2411.11835v1 null
2024-11-18 Absorbing state dynamics of stochastic gradient descent Guanming Zhang et.al. 2411.11834v1 null
2024-11-18 Equivariant spatio-hemispherical networks for diffusion MRI deconvolution Axel Elaldi et.al. 2411.11819v1 link
2024-11-18 Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion Meng Zhou et.al. 2411.11799v1 link
2024-11-18 Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods Egor Kovalev et.al. 2411.11795v1 null
2024-11-18 Energy shifts and broadening of excitonic resonances in electrostatically-doped semiconductors Hanan Dery et.al. 2411.11790v1 null
2024-11-18 High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles Shiyue Zhao et.al. 2411.11762v1 null
2024-11-18 Additional Tests for TV 3.0 Eduardo Peixoto et.al. 2411.11755v1 null
2024-11-18 Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking German Gritsai et.al. 2411.11736v1 null
2024-11-15 The Spatial Complexity of Optical Computing and How to Reduce It Yandong Li et.al. 2411.10435v1 null
2024-11-15 Private Counterfactual Retrieval With Immutable Features Shreya Meel et.al. 2411.10429v1 null
2024-11-15 Back to Supervision: Boosting Word Boundary Detection through Frame Classification Simone Carnemolla et.al. 2411.10423v1 null
2024-11-15 Multiscale Dubuc: A New Similarity Measure for Time Series Mahsa Khazaei et.al. 2411.10418v1 null
2024-11-15 Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations Jianfeng Chi et.al. 2411.10414v1 null
2024-11-15 Experimental demonstration of Tessellation Structured Illumination Microscopy Doron Shterman et.al. 2411.10405v1 null
2024-11-15 On the Foundation Model for Cardiac MRI Reconstruction Chi Zhang et.al. 2411.10403v1 null
2024-11-15 Tropical combinatorics of max-linear Bayesian networks Carlos Améndola et.al. 2411.10394v1 null
2024-11-15 Mechanisms of Generative Image-to-Image Translation Networks Guangzong Chen et.al. 2411.10368v1 null
2024-11-15 On the Cost of Model-Serving Frameworks: An Experimental Evaluation Pasquale De Rosa et.al. 2411.10337v1 null
2024-11-14 Towards a Classification of Open-Source ML Models and Datasets for Software Engineering Alexandra González et.al. 2411.09683v1 null
2024-11-14 Commensurability Among Deligne-Mostow Monodromy Groups Chenglong Yu et.al. 2411.09682v1 null
2024-11-14 Modular Fault Diagnosis Framework for Complex Autonomous Driving Systems Stefan Orf et.al. 2411.09643v1 null
2024-11-14 The Moral Foundations Weibo Corpus Renjie Cao et.al. 2411.09612v1 null
2024-11-14 Effect of viewing angle in Gamma-ray Burst properties Sreelakshmi P Chakyar et.al. 2411.09609v1 null
2024-11-14 Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration Yifan Shao et.al. 2411.09604v1 link
2024-11-14 Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images Bipasha Kundu et.al. 2411.09598v1 null
2024-11-14 SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms Soumick Chatterjee et.al. 2411.09593v1 null
2024-11-14 SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas Yu-Kai Hung et.al. 2411.09577v1 null
2024-11-14 Mutual Influence of Photon Sphere and Non-Commutative Parameter in Various Non-Commutative Black Holes: Part I- Towards evidence for WGC Mohammad Ali S. Afshar et.al. 2411.09557v1 null
2024-11-13 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization Mijeong Kim et.al. 2411.08879v1 null
2024-11-13 A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos Debidatta Dwibedi et.al. 2411.08878v1 link
2024-11-13 Quantum cryptography beyond key distribution: theory and experiment Mathieu Bozzio et.al. 2411.08877v1 null
2024-11-13 Large Wireless Model (LWM): A Foundation Model for Wireless Channels Sadjad Alikhani et.al. 2411.08872v1 null
2024-11-13 AstroM$^3$: A self-supervised multimodal model for astronomy Mariia Rizhko et.al. 2411.08842v1 null
2024-11-13 Multimodal Instruction Tuning with Hybrid State Space Models Jianing Zhou et.al. 2411.08840v1 null
2024-11-13 Model agnostic local variable importance for locally dependent relationships Kelvyn K. Bladen et.al. 2411.08821v1 null
2024-11-13 Identifying Spicules in Mg II: Statistics and Comparisons with Hα Vicki L. Herde et.al. 2411.08801v1 null
2024-11-13 Algorithms in 4-manifold topology Stefan Bastl et.al. 2411.08775v1 null
2024-11-13 Sharingan: Extract User Action Sequence from Desktop Recordings Yanting Chen et.al. 2411.08768v1 null
2024-11-12 Leonardo vindicated: Pythagorean trees for minimal reconstruction of the natural branching structures Dymitr Ruta et.al. 2411.08024v1 null
2024-11-12 Artistic Neural Style Transfer Algorithms with Activation Smoothing Xiangtian Li et.al. 2411.08014v1 null
2024-11-12 A computer-vision aided Compton-imaging system for radioactive waste characterization and decommissioning of nuclear power plants Victor Babiano-Suarez et.al. 2411.07996v1 null
2024-11-12 DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring Mahmut S. Gokmen et.al. 2411.07976v1 null
2024-11-12 Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects Laura Dominé et.al. 2411.07956v1 null
2024-11-12 SimBase: A Simple Baseline for Temporal Video Grounding Peijun Bao et.al. 2411.07945v1 null
2024-11-12 DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks Zhaoxi Zhang et.al. 2411.07941v1 null
2024-11-12 Prediction of Acoustic Communication Performance for AUVs using Gaussian Process Classification Yifei Gao et.al. 2411.07933v1 null
2024-11-12 CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising Linxuan Li et.al. 2411.07930v1 null
2024-11-12 CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts Aniket Deroy et.al. 2411.07917v1 null
2024-11-11 Grounding Video Models to Actions through Goal Conditioned Exploration Yunhao Luo et.al. 2411.07223v1 null
2024-11-11 NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics David Robinson et.al. 2411.07186v1 null
2024-11-11 Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network Raúl de la Fuente et.al. 2411.07168v1 null
2024-11-11 Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation Kaijian Zou et.al. 2411.07130v1 link
2024-11-11 StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification Yichen He et.al. 2411.07076v1 link
2024-11-11 Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification Albert Belenguer-Llorens et.al. 2411.07043v1 null
2024-11-11 The Inherent Adversarial Robustness of Analog In-Memory Computing Corey Lammie et.al. 2411.07023v1 null
2024-11-11 HeteroSample: Meta-path Guided Sampling for Heterogeneous Graph Representation Learning Ao Liu et.al. 2411.07022v1 null
2024-11-11 Token2Wave Xin Zhang et.al. 2411.06989v1 null
2024-11-11 A Hyperspectral Imaging Dataset and Methodology for Intraoperative Pixel-Wise Classification of Metastatic Colon Cancer in the Liver Ivica Kopriva et.al. 2411.06969v1 null
2024-11-08 Gender Inequalities in Content Collaborations: Asymmetric Creator Synergy and Symmetric Audience Biases Mingyue Zha et.al. 2411.05782v1 null
2024-11-08 Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems Guixian Xu et.al. 2411.05771v1 null
2024-11-08 FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information Shreen Gul et.al. 2411.05752v1 link
2024-11-08 Accurate Unsupervised Photon Counting from Transition Edge Sensor Signals Nicolas Dalbec-Constant et.al. 2411.05737v1 null
2024-11-08 Poze: Sports Technique Feedback under Data Constraints Agamdeep Singh et.al. 2411.05734v1 null
2024-11-08 Differential Privacy Under Class Imbalance: Methods and Empirical Insights Lucas Rosenblatt et.al. 2411.05733v1 null
2024-11-08 On-chip rewritable phase-change metasurface for programmable diffractive deep neural networks Sanaz Zarei et.al. 2411.05723v1 null
2024-11-08 Classification of ($ρ,τ,σ$)-derivations of two-dimensional left-symmetric dialgebras Basdouri Imed et.al. 2411.05716v1 null
2024-11-08 STARS: Sensor-agnostic Transformer Architecture for Remote Sensing Ethan King et.al. 2411.05714v1 null
2024-11-08 Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream Abdulkadir Gokce et.al. 2411.05712v1 link
2024-11-07 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning David Junhao Zhang et.al. 2411.05003v1 null
2024-11-07 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation Peiqi Liu et.al. 2411.04999v1 null
2024-11-07 HourVideo: 1-Hour Video-Language Understanding Keshigeyan Chandrasegaran et.al. 2411.04998v1 null
2024-11-07 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Koichi Namekata et.al. 2411.04989v1 null
2024-11-07 Efficient Preparation of Solvable Anyons with Adaptive Quantum Circuits Yuanjie Ren et.al. 2411.04985v1 null
2024-11-07 Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries Dylan Manuel et.al. 2411.04981v1 null
2024-11-07 Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification Mischa Dombrowski et.al. 2411.04956v1 null
2024-11-07 Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach Gideon Yoffe et.al. 2411.04950v1 null
2024-11-07 Proof of the absence of local conserved quantities in the spin-1 bilinear-biquadratic chain and its anisotropic extensions Akihiro Hokkyo et.al. 2411.04945v1 null
2024-11-07 A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model Panwen Hu et.al. 2411.04942v1 null
2024-11-06 RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models Maya Varma et.al. 2411.04097v1 link
2024-11-06 Local unitary equivalence of absolutely maximally entangled states constructed from orthogonal arrays N Ramadas et.al. 2411.04096v1 null
2024-11-06 A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement Guillermo Villate-Castillo et.al. 2411.04090v1 link
2024-11-06 Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning Ping Li et.al. 2411.04059v1 link
2024-11-06 Distinguishing Coupled Dark Energy Models with Neural Networks L. W. K. Goh et.al. 2411.04058v1 link
2024-11-06 Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging Yuan Bi et.al. 2411.04004v1 null
2024-11-06 Learning Aggregate Queries Defined by First-Order Logic with Counting Steffen van Bergerem et.al. 2411.04003v1 null
2024-11-06 ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks Ziji Shi et.al. 2411.03999v1 null
2024-11-06 Fine-tuning -- a Transfer Learning approach Joseph Arul Raj et.al. 2411.03941v1 null
2024-11-06 Inter-Frame Coding for Dynamic Meshes via Coarse-to-Fine Anchor Mesh Generation He Huang et.al. 2411.03921v1 null
2024-11-05 Classification Done Right for Vision-Language Pre-Training Huang Zilong et.al. 2411.03313v1 link
2024-11-05 Automatic solid form classification in pharmaceutical drug development Julius Lange et.al. 2411.03308v1 null
2024-11-05 Data-Driven Sampling Based Stochastic MPC for Skid-Steer Mobile Robot Navigation Ananya Trivedi et.al. 2411.03289v1 link
2024-11-05 Graph-Based Semi-Supervised Segregated Lipschitz Learning Farid Bozorgnia et.al. 2411.03273v1 null
2024-11-05 Tuning into spatial frequency space: Satellite and space debris detection in the ZTF alert stream J. P. Carvajal et.al. 2411.03258v1 null
2024-11-05 Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization Zakariae Belmekki et.al. 2411.03226v1 null
2024-11-05 Beyond Grid Data: Exploring Graph Neural Networks for Earth Observation Shan Zhao et.al. 2411.03223v1 null
2024-11-05 Statistical Analysis to Support CSI-Based Sensing Methods Elena Tonini et.al. 2411.03203v1 null
2024-11-05 Navigating Extremes: Dynamic Sparsity in Large Output Space Nasib Ullah et.al. 2411.03171v1 null
2024-11-05 Pre-trained Visual Dynamics Representations for Efficient Policy Learning Hao Luo et.al. 2411.03169v1 null
2024-11-04 Adaptive Caching for Faster Video Generation with Diffusion Transformers Kumara Kahatapitiya et.al. 2411.02397v1 null
2024-11-04 AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Hao-Yu Hsu et.al. 2411.02394v1 null
2024-11-04 How Far is Video Generation from World Model: A Physical Law Perspective Bingyi Kang et.al. 2411.02385v1 null
2024-11-04 Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas Qingwen Pu et.al. 2411.02349v1 null
2024-11-04 SplatOverflow: Asynchronous Hardware Troubleshooting Amritansh Kwatra et.al. 2411.02332v1 null
2024-11-04 PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Ruyang Liu et.al. 2411.02327v1 link
2024-11-04 GenXD: Generating Any 3D and 4D Scenes Yuyang Zhao et.al. 2411.02319v1 null
2024-11-04 Information plane and compression-gnostic feedback in quantum machine learning Nathan Haboury et.al. 2411.02313v1 null
2024-11-04 Grouped Discrete Representation for Object-Centric Learning Rongzhen Zhao et.al. 2411.02299v1 null
2024-11-04 Conformal-in-the-Loop for Learning with Imbalanced Noisy Data John Brandon Graham-Knight et.al. 2411.02281v1 null
2024-10-31 EgoMimic: Scaling Imitation Learning via Egocentric Video Simar Kareer et.al. 2410.24221v1 link
2024-10-31 Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning Penghui Ruan et.al. 2410.24219v1 link
2024-10-31 Learning Video Representations without Natural Videos Xueyang Yu et.al. 2410.24213v1 null
2024-11-01 DELTA: Dense Efficient Long-range 3D Tracking for any video Tuan Duc Ngo et.al. 2410.24211v2 null
2024-10-31 DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion Weicai Ye et.al. 2410.24203v1 link
2024-10-31 DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning Zhenyu Jiang et.al. 2410.24185v1 null
2024-10-31 Extended Object Tracking and Classification based on Linear Splines Matteo Tesori et.al. 2410.24183v1 null
2024-10-31 $π_0$: A Vision-Language-Action Flow Model for General Robot Control Kevin Black et.al. 2410.24164v1 null
2024-10-31 Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age Nouar AlDahoul et.al. 2410.24148v1 null
2024-10-31 HoloChrome: Polychromatic Illumination for Speckle Reduction in Holographic Near-Eye Displays Florian Schiffers et.al. 2410.24144v1 null
2024-10-30 Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards Irmak Guzey et.al. 2410.23289v1 null
2024-10-30 Computing the bridge length: the key ingredient in a continuous isometry classification of periodic point sets Jonathan McManus et.al. 2410.23288v1 null
2024-10-30 ReferEverything: Towards Segmenting Everything We Can Speak of in Videos Anurag Bagchi et.al. 2410.23287v1 null
2024-10-30 DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration Ola Shorinwa et.al. 2410.23283v1 null
2024-10-30 A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization Bin Wu et.al. 2410.23279v1 null
2024-10-30 SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Yining Hong et.al. 2410.23277v1 null
2024-10-30 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Ziyao Shangguan et.al. 2410.23266v1 link
2024-10-30 bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction Yehe Liu et.al. 2410.23247v1 null
2024-10-30 PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching Chen Ziwen et.al. 2410.23245v1 null
2024-10-31 Aligning Audio-Visual Joint Representations with an Agentic Workflow Shentong Mo et.al. 2410.23230v2 null
2024-10-29 Local Policies Enable Zero-shot Long-horizon Manipulation Murtaza Dalal et.al. 2410.22332v1 null
2024-10-30 Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets Guangqi Jiang et.al. 2410.22325v2 null
2024-10-29 Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models Seetharam Killivalavan et.al. 2410.22323v1 null
2024-10-29 Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier Kai Wang et.al. 2410.22317v1 link
2024-10-29 Convex Formulations for Training Two-Layer ReLU Neural Networks Karthik Prakhya et.al. 2410.22311v1 link
2024-10-29 Emotion-Guided Image to Music Generation Souraja Kundu et.al. 2410.22299v1 null
2024-10-29 Motion Graph Unleashed: A Novel Approach to Video Prediction Yiqi Zhong et.al. 2410.22288v1 link
2024-10-29 Non-LTE Synthetic Observables of a Multidimensional Model of Type Ia Supernovae Samuel J. Boos et.al. 2410.22276v1 null
2024-10-29 Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation Davide Berghi et.al. 2410.22271v1 null
2024-10-29 LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers Patricia Pauli et.al. 2410.22258v1 link
2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang et.al. 2410.21264v1 null
2024-10-28 Multi-modal AI for comprehensive breast cancer prognostication Jan Witowski et.al. 2410.21256v1 null
2024-10-28 Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies Xiwen Li et.al. 2410.21170v1 null
2024-10-28 KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation Zhihao Zhao et.al. 2410.21160v1 null
2024-10-28 Synthetica: Large Scale Synthetic Data for Robot Perception Ritvik Singh et.al. 2410.21153v1 null
2024-10-28 The tau function for ABS equations James Atkinson et.al. 2410.21148v1 null
2024-10-28 Enhancing Learned Image Compression via Cross Window-based Attention Priyanka Mudgal et.al. 2410.21144v1 null
2024-10-28 uOttawa at LegalLens-2024: Transformer-based Classification Experiments Nima Meghdadi et.al. 2410.21139v1 link
2024-10-28 Do LLMs generate test oracles that capture the actual or the expected program behaviour? Michael Konstantinou et.al. 2410.21136v1 null
2024-10-28 Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences Zhihao Zhao et.al. 2410.21130v1 null
2024-10-25 Sparse Decomposition of Graph Neural Networks Yaochen Hu et.al. 2410.19723v1 null
2024-10-25 Arabic Music Classification and Generation using Deep Learning Mohamed Elshaarawy et.al. 2410.19719v1 null
2024-10-25 Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning Vegard Berge et.al. 2410.19717v1 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702v1 null
2024-10-25 MILES: Making Imitation Learning Easy with Self-Supervision Georgios Papagiannis et.al. 2410.19693v1 null
2024-10-25 Deep Learning for Classification of Inflammatory Bowel Disease Activity in Whole Slide Images of Colonic Histopathology Amit Das et.al. 2410.19690v1 null
2024-10-25 Optimizing Hearthstone Agents using an Evolutionary Algorithm Pablo García-Sánchez et.al. 2410.19681v1 null
2024-10-25 Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective Ethan Harvey et.al. 2410.19675v1 null
2024-10-25 MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services Hongjia Wu et.al. 2410.19665v1 null
2024-10-25 VARS: Vision-based Assessment of Risk in Security Systems Pranav Gupta et.al. 2410.19642v1 null
2024-10-24 Framer: Interactive Frame Interpolation Wen Wang et.al. 2410.18978v1 null
2024-10-24 CAMEL-Bench: A Comprehensive Arabic LMM Benchmark Sara Ghaboura et.al. 2410.18976v1 link
2024-10-24 Unbounded: A Generative Infinite Game of Character Life Simulation Jialu Li et.al. 2410.18975v1 null
2024-10-24 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling Mingtong Zhang et.al. 2410.18912v1 null
2024-10-24 SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment Caelan Garrett et.al. 2410.18907v1 null
2024-10-24 A Survey of Multimodal Sarcasm Detection Shafkat Farabi et.al. 2410.18882v1 null
2024-10-24 Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning Arnav Samal et.al. 2410.18879v1 link
2024-10-24 Exploring the Universe with SNAD: Anomaly Detection in Astronomy Alina A. Volnova et.al. 2410.18875v1 null
2024-10-24 Exploring a Geometric Conjecture, Some Properties of Blaschke Products, and the Geometry of Curves Formed by Them Mehmet Celik et.al. 2410.18863v1 null
2024-10-24 Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging Aya Ghoul et.al. 2410.18834v1 link
2024-10-23 FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution Yang-Che Sun et.al. 2410.18083v1 null
2024-10-23 WorldSimBench: Towards Video Generation Models as World Simulators Yiran Qin et.al. 2410.18072v1 null
2024-10-23 Eigenvalue crossings in equivariant families of matrices Jonathan Rawlinson et.al. 2410.18068v1 null
2024-10-23 The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies Raman Ebrahimi et.al. 2410.18066v1 null
2024-10-23 Real time anomalies detection on video Fabien Poirier et.al. 2410.18051v1 null
2024-10-23 Boundary topological insulators and superconductors of Altland-Zirnbauer tenfold classes Xun-Jiang Luo et.al. 2410.18015v1 null
2024-10-24 Effective Finite Time Stability Control for Human-Machine Shared Vehicle Following System Zihan Wang et.al. 2410.18007v2 null
2024-10-23 Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation Suho Kang et.al. 2410.18001v1 link
2024-10-23 Optical Generative Models Shiqi Chen et.al. 2410.17970v1 null
2024-10-23 A Wavelet Diffusion GAN for Image Super-Resolution Lorenzo Aloisi et.al. 2410.17966v1 null
2024-10-22 Altogether: Image Captioning via Re-aligning Alt-text Hu Xu et.al. 2410.17251v1 null
2024-10-22 Classifying rational polygons with small denominator and few interior lattice points Martin Bohnert et.al. 2410.17244v1 null
2024-10-22 Frontiers in Intelligent Colonoscopy Ge-Peng Ji et.al. 2410.17241v1 link
2024-10-22 Automated Spinal MRI Labelling from Reports Using a Large Language Model Robin Y. Park et.al. 2410.17235v1 link
2024-10-22 Few-shot In-Context Preference Learning Using Large Language Models Chao Yu et.al. 2410.17233v1 null
2024-10-22 Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods Tsachi Blau et.al. 2410.17222v1 null
2024-10-22 The Decision Problem for Regular First-Order Theories Umang Mathur et.al. 2410.17185v1 null
2024-10-22 Technical Report: Toward Applying Quantum Computing to Network Verification Kahlil Dozier et.al. 2410.17184v1 null
2024-10-22 KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements Md Meftahul Ferdaus et.al. 2410.17172v1 link
2024-10-22 Are Visual-Language Models Effective in Action Recognition? A Comparative Study Mahmoud Ali et.al. 2410.17149v1 null
2024-10-21 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Shuangrui Ding et.al. 2410.16268v1 link
2024-10-21 xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Michael S. Ryoo et.al. 2410.16267v1 null
2024-10-21 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors Xi Liu et.al. 2410.16266v1 null
2024-10-21 Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos Gengshan Yang et.al. 2410.16259v1 null
2024-10-21 Serendipitous detection of an intense X-ray flare in the weak-line T Tauri star KM Ori with SRG/eROSITA Savithri H. Ezhikode et.al. 2410.16241v1 null
2024-10-21 MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report Samrajya Thapa et.al. 2410.16239v1 link
2024-10-21 Deep Radiomics Detection of Clinically Significant Prostate Cancer on Multicenter MRI: Initial Comparison to PI-RADS Assessment G. A. Nketiah et.al. 2410.16238v1 null
2024-10-22 Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models Giannis Daras et.al. 2410.16152v2 null
2024-10-21 An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection Chandravardhan Singh Raghaw et.al. 2410.16143v1 null
2024-10-21 Modeling dynamic neural activity by combining naturalistic video stimuli and stimulus-independent latent factors Finn Schmidt et.al. 2410.16136v1 null
2024-10-18 Real-time Fake News from Adversarial Feedback Sanxing Chen et.al. 2410.14651v1 null
2024-10-18 GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings Raghuveer Thirukovalluru et.al. 2410.14635v1 null
2024-10-18 You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools Daniel Baumartz et.al. 2410.14626v1 null
2024-10-18 Learning to Control the Smoothness of Graph Convolutional Network Features Shih-Hsin Wang et.al. 2410.14604v1 null
2024-10-18 Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection Aaron Alvarado Kristanto Julistiono et.al. 2410.14581v1 null
2024-10-18 A Hybrid Feature Fusion Deep Learning Framework for Leukemia Cancer Detection in Microscopic Blood Sample Using Gated Recurrent Unit and Uncertainty Quantification Maksuda Akter et.al. 2410.14536v1 null
2024-10-18 Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance Daniel Wolf et.al. 2410.14524v1 link
2024-10-18 Influence of anisotropy on the study of critical behavior of spin models by machine learning methods Diana Sukhoverkhova et.al. 2410.14523v1 null
2024-10-18 A character approach to the ISR property Artem Dudko et.al. 2410.14517v1 null
2024-10-18 Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media Owen Cook et.al. 2410.14515v1 link
2024-10-17 DepthSplat: Connecting Gaussian Splatting and Depth Haofei Xu et.al. 2410.13862v1 link
2024-10-17 Adaptive Subsampling and Learned Model Improve Spatiotemporal Resolution of Tactile Skin Ariel Slepyan et.al. 2410.13847v1 null
2024-10-17 VidPanos: Generative Panoramic Videos from Casual Panning Videos Jingwei Ma et.al. 2410.13832v1 null
2024-10-17 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Yujie Wei et.al. 2410.13830v1 null
2024-10-17 Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks Clément Playout et.al. 2410.13822v1 link
2024-10-17 Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance Mitsuhiko Nakamoto et.al. 2410.13816v1 null
2024-10-17 A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities Gianluca Apriceno et.al. 2410.13803v1 link
2024-10-17 MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations Liang Xu et.al. 2410.13790v1 link
2024-10-17 Strong-to-weak spontaneous symmetry breaking meets average symmetry-protected topological order Yuchen Guo et.al. 2410.13734v1 null
2024-10-17 Representing Model Weights with Language using Tree Experts Eliahu Horwitz et.al. 2410.13569v1 null
2024-10-16 Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception Jihao Zhao et.al. 2410.12788v1 null
2024-10-16 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Sicong Leng et.al. 2410.12787v1 null
2024-10-16 Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions Zhenyu Jiang et.al. 2410.12773v1 null
2024-10-16 Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks Hunmin Lee et.al. 2410.12772v1 null
2024-10-16 Phase retrieval via media diversity Yan Cheng et.al. 2410.12767v1 null
2024-10-16 SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation Jaehong Yoon et.al. 2410.12761v1 null
2024-10-16 Unitary Multi-Margin BERT for Robust Natural Language Processing Hao-Yuan Chang et.al. 2410.12759v1 null
2024-10-16 PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network Asish Bera et.al. 2410.12742v1 null
2024-10-16 How much time do we have before catastrophic disclosure occurs? Matthew Szydagis et.al. 2410.12738v1 null
2024-10-16 Machine Learning-Augmented Ontology-Based Data Access for Renewable Energy Data Marco Calautti et.al. 2410.12734v1 null
2024-10-15 High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion Junhwa Hur et.al. 2410.11838v1 null
2024-10-15 Contrastive Touch-to-Touch Pretraining Samanta Rodriguez et.al. 2410.11834v1 null
2024-10-15 CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos Nikita Karaev et.al. 2410.11831v1 null
2024-10-15 Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos Zhouxia Wang et.al. 2410.11828v1 null
2024-10-15 On representations of Arthur type and unitary dual for classical groups Alexander Hazeltine et.al. 2410.11806v1 null
2024-10-16 Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Zhiyuan Ma et.al. 2410.11795v2 null
2024-10-15 OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation Jinhan Li et.al. 2410.11792v1 null
2024-10-15 Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability Tsz Ting Chung et.al. 2410.11786v1 null
2024-10-15 On the Training Convergence of Transformers for In-Context Classification Wei Shen et.al. 2410.11778v1 null
2024-10-15 Temporal resolution enhancement in Structured Illumination Microscopy using cascaded reconstruction Doron Shterman et.al. 2410.11770v1 null
2024-10-14 Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models Jingzhi Bao et.al. 2410.10821v1 null
2024-10-14 TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Mu Cai et.al. 2410.10818v1 null
2024-10-14 LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Tianwei Xiong et.al. 2410.10816v1 link
2024-10-14 Depth Any Video with Scalable Synthetic Data Honghui Yang et.al. 2410.10815v1 null
2024-10-14 Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies Yanjie Ze et.al. 2410.10803v1 link
2024-10-14 Boosting Camera Motion Control for Video Diffusion Transformers Soon Yau Cheong et.al. 2410.10802v1 null
2024-10-14 Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization Johan Hatleskog et.al. 2410.10784v1 null
2024-10-14 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications Eduardo R. Corral-Soto et.al. 2410.10782v1 null
2024-10-14 ControlMM: Controllable Masked Motion Generation Ekkasit Pinyoanuntapong et.al. 2410.10780v1 null
2024-10-14 Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Dejia Xu et.al. 2410.10774v1 null
2024-10-11 Optimal Downsampling for Imbalanced Classification with Generalized Linear Models Yan Chen et.al. 2410.08994v1 null
2024-10-11 Realizing Linear Synaptic Plasticity in Electric Double Layer-Gated Transistors for Improved Predictive Accuracy and Efficiency in Neuromorphic Computing Nithil Harris Manimaran et.al. 2410.08978v1 null
2024-10-11 ALVIN: Active Learning Via INterpolation Michalis Korakakis et.al. 2410.08972v1 null
2024-10-11 Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data Arthur Mendonça Sasse et.al. 2410.08961v1 null
2024-10-11 Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison Daniel Salnikov et.al. 2410.08958v1 null
2024-10-11 Rapid Grassmannian Averaging with Chebyshev Polynomials Brighton Ancelin et.al. 2410.08956v1 null
2024-10-11 Local moduli in the special 2-flags of length 5 Piotr Mormul et.al. 2410.08951v1 null
2024-10-11 On the Adversarial Transferability of Generalized "Skip Connections" Yisen Wang et.al. 2410.08950v1 null
2024-10-11 Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing Clayton Leite et.al. 2410.08931v1 null
2024-10-11 Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images Virmarie Maquiling et.al. 2410.08926v1 null
2024-10-10 LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts Anh-Quan Cao et.al. 2410.08211v1 null
2024-10-10 Scaling Laws For Diffusion Transformers Zhengyang Liang et.al. 2410.08184v1 null
2024-10-10 RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image Xiaoxue Chen et.al. 2410.08181v1 null
2024-10-10 A note on the symplectic classification of almost-toric systems Xiudi Tang et.al. 2410.08175v1 null
2024-10-10 Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Qingni Wang et.al. 2410.08174v1 null
2024-10-10 Progressive Autoregressive Video Diffusion Models Desai Xie et.al. 2410.08151v1 link
2024-10-10 Robust AI-Generated Text Detection by Restricted Embeddings Kristian Kuznetsov et.al. 2410.08113v1 null
2024-10-10 Color-Guided Flying Pixel Correction in Depth Images Ekamresh Vasudevan et.al. 2410.08084v1 null
2024-10-10 Dynamic Object Catching with Quadruped Robot Front Legs André Schakkal et.al. 2410.08065v1 null
2024-10-10 A Target-Aware Analysis of Data Augmentation for Hate Speech Detection Camilla Casula et.al. 2410.08053v1 null
2024-10-09 MM-Ego: Towards Building Egocentric Multimodal LLMs Hanrong Ye et.al. 2410.07177v1 null
2024-10-09 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Fabian Paischer et.al. 2410.07170v1 null
2024-10-09 Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Bohan Zeng et.al. 2410.07155v1 link
2024-10-09 Mental Disorders Detection in the Era of Large Language Models Gleb Kuzmin et.al. 2410.07129v1 null
2024-10-09 Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication Erzhen Hu et.al. 2410.07119v1 null
2024-10-09 JPEG Inspired Deep Learning Ahmed H. Salamah et.al. 2410.07081v1 null
2024-10-09 Retrieval-Augmented Decision Transformer: External Memory for In-context RL Thomas Schmied et.al. 2410.07071v1 null
2024-10-09 TinyEmo: Scaling down Emotional Reasoning via Metric Projection Cristian Gutierrez et.al. 2410.07062v1 link
2024-10-09 Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes Fisseha A. Ferede et.al. 2410.07043v1 link
2024-10-09 Optimizing Estimators of Squared Calibration Errors in Classification Sebastian G. Gruber et.al. 2410.07014v1 null
2024-10-07 Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia Mohammad Fahes et.al. 2410.05270v1 link
2024-10-07 Grounding Partially-Defined Events in Multimodal Data Kate Sanders et.al. 2410.05267v1 null
2024-10-07 DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control Kaifeng Zhao et.al. 2410.05260v1 null
2024-10-07 SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Daoan Zhang et.al. 2410.05255v1 link
2024-10-07 Causal Micro-Narratives Mourad Heddaya et.al. 2410.05252v1 null
2024-10-07 LoTLIP: Improving Language-Image Pre-training for Long Text Understanding Wei Wu et.al. 2410.05249v1 null
2024-10-07 The Dawn of Video Generation: Preliminary Explorations with SORA-like Models Ailing Zeng et.al. 2410.05227v1 null
2024-10-07 Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality Ge Ya et.al. 2410.05203v1 link
2024-10-07 Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge Senorita Deb et.al. 2410.05189v1 null
2024-10-07 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Ziyan Jiang et.al. 2410.05160v1 null
2024-10-04 Spatial Hyperspheric Models for Compositional Data Michael R. Schwob et.al. 2410.03648v1 null
2024-10-04 HyperCMR: Enhanced Multi-Contrast CMR Reconstruction with Eagle Loss Ruru Xu et.al. 2410.03624v1 null
2024-10-04 Crystallography, Group Cohomology, and Lieb-Schultz-Mattis Constraints Chunxiao Liu et.al. 2410.03607v1 null
2024-10-04 LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos Noriaki Hirose et.al. 2410.03603v1 null
2024-10-04 Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification Kelsey Lieberman et.al. 2410.03588v1 null
2024-10-04 A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development Jesper Knapp et.al. 2410.03580v1 null
2024-10-04 Re-examining Sexism and Misogyny Classification with Annotator Attitudes Aiqi Jiang et.al. 2410.03543v1 null
2024-10-04 Classification-Denoising Networks Louis Thiry et.al. 2410.03505v1 null
2024-10-04 MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation Hongcheng Wang et.al. 2410.03488v1 null
2024-10-04 A Multimodal Framework for Deepfake Detection Kashish Gandhi et.al. 2410.03487v1 null
2024-10-03 Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats Mingyang Xie et.al. 2410.02764v1 null
2024-10-03 Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos Jianrui Zhang et.al. 2410.02763v1 null
2024-10-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757v1 null
2024-10-03 An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel Xinpeng Li et.al. 2410.02750v1 null
2024-10-03 OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable? Liangze Jiang et.al. 2410.02735v1 null
2024-10-03 Liouville's theorem in calibrated geometries Toni Ikonen et.al. 2410.02722v1 null
2024-10-03 Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud Mengxi Wu et.al. 2410.02720v1 link
2024-10-03 AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease Romoke Grace Akindele et.al. 2410.02714v1 null
2024-10-04 Video Instruction Tuning With Synthetic Data Yuanhan Zhang et.al. 2410.02713v2 null
2024-10-03 Impact of a reclassification on Web of Science articles on bibliometric indicators Agénor Lahatte et.al. 2410.02701v1 null
2024-10-02 Loki: An Open-Source Tool for Fact Verification Haonan Li et.al. 2410.01794v1 null
2024-10-03 Application of convolutional neural networks for extensive air shower separation in the SPHERE-3 experiment E. L. Entina et.al. 2410.01781v2 null
2024-10-03 TopER: Topological Embeddings in Graph Representation Learning Astrit Tola et.al. 2410.01778v2 null
2024-10-02 Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context Spencer Frei et.al. 2410.01774v1 null
2024-10-02 SegHeD: Segmentation of Heterogeneous Data for Multiple Sclerosis Lesions with Anatomical Constraints Berke Doga Basaran et.al. 2410.01766v1 null
2024-10-02 LightSC: The Making of a Usable Security Classification Tool for DevSecOps Manish Shrestha et.al. 2410.01762v1 null
2024-10-02 Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes Hossein Sholehrasa et.al. 2410.01755v1 null
2024-10-02 Unitary Representations of the Isometry Groups of Urysohn Spaces Rémi Barritault et.al. 2410.01725v1 null
2024-10-02 COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation Mingzhen Sun et.al. 2410.01718v1 null
2024-10-02 Rabi oscillations at three-photon laser excitation of a single rubidium Rydberg atom in an optical dipole trap I. I. Beterov et.al. 2410.01703v1 null
2024-09-30 Continuously Improving Mobile Manipulation with Autonomous Real-World RL Russell Mendonca et.al. 2409.20568v1 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566v1 null
2024-09-30 DressRecon: Freeform 4D Human Reconstruction from Monocular Video Jeff Tan et.al. 2409.20563v1 null
2024-09-30 LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner Xiaopan Zhang et.al. 2409.20560v1 null
2024-09-30 Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos Md Mohaiminul Islam et.al. 2409.20557v1 null
2024-09-30 Inverse Painting: Reconstructing The Painting Process Bowei Chen et.al. 2409.20556v1 null
2024-09-30 UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Qiaojun Yu et.al. 2409.20551v1 null
2024-09-30 Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries L. W. IJspeert et.al. 2409.20540v1 null
2024-09-30 Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers Lirui Wang et.al. 2409.20537v1 link
2024-09-30 Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images Bahri Batuhan Bilecen et.al. 2409.20530v1 null
2024-09-27 PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Shaowei Liu et.al. 2409.18964v1 link
2024-09-27 LML: Language Model Learning a Dataset for Data-Augmented Prediction Praneeth Vadlapati et.al. 2409.18957v1 link
2024-09-27 Unconditional stability of a recurrent neural circuit implementing divisive normalization Shivang Rawat et.al. 2409.18946v1 null
2024-09-27 From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding Heqing Zou et.al. 2409.18938v1 null
2024-09-27 Subspace Preserving Quantum Convolutional Neural Network Architectures Léo Monbroussou et.al. 2409.18918v1 null
2024-09-27 Improving Visual Object Tracking through Visual Prompting Shih-Fang Chen et.al. 2409.18901v1 link
2024-09-27 Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors Yunlong Lin et.al. 2409.18899v1 null
2024-09-27 Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models Zehan Li et.al. 2409.18878v1 null
2024-09-27 Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks Richard Osuala et.al. 2409.18872v1 null
2024-09-27 Fusion Systems and Simple Groups With Class Two Sylow $p$-subgroups Martin van Beek et.al. 2409.18870v1 null
2024-09-26 EgoLM: Multi-Modal Language Model of Egocentric Motions Fangzhou Hong et.al. 2409.18127v1 null
2024-09-26 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Chenming Zhu et.al. 2409.18125v1 null
2024-09-26 RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration Yuezhan Tao et.al. 2409.18122v1 null
2024-09-26 Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction Justin Kerr et.al. 2409.18121v1 null
2024-09-26 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Ye Liu et.al. 2409.18111v1 link
2024-09-26 MALPOLON: A Framework for Deep Species Distribution Modeling Theo Larcher et.al. 2409.18102v1 null
2024-09-26 Incorporating sparse labels into biologging studies using hidden Markov models with weighted likelihoods Evan Sidrow et.al. 2409.18091v1 null
2024-09-26 Stable Video Portraits Mirela Ostrek et.al. 2409.18083v1 null
2024-09-26 Graded contractions on the orthogonal Lie algebras of dimensions 7 and 8 Cristina Draper et.al. 2409.18069v1 null
2024-09-26 LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field Huan Wang et.al. 2409.18057v1 link
2024-09-25 DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Yukun Huang et.al. 2409.17145v1 null
2024-09-25 Streaming Neural Images Marcos V. Conde et.al. 2409.17134v1 null
2024-09-25 Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation Mukaffi Bin Moin et.al. 2409.17130v1 null
2024-09-25 Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba Amin Malekmohammadi et.al. 2409.17122v1 link
2024-09-25 Counting Triangles in Triangles Jim Propp et.al. 2409.17117v1 null
2024-09-25 BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices Yongqi Xu et.al. 2409.17093v1 link
2024-09-25 Accumulator-Aware Post-Training Quantization Ian Colbert et.al. 2409.17092v1 null
2024-09-25 Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification Xinrui Zhou et.al. 2409.17091v1 null
2024-09-25 SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking Luigi Russo et.al. 2409.17087v1 null
2024-09-25 The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification Tashi Namgyal et.al. 2409.17069v1 null
2024-09-24 Self-Supervised Any-Point Tracking by Contrastive Random Walks Ayush Shrivastava et.al. 2409.16288v1 link
2024-09-24 Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking Xi Wang et.al. 2409.16287v1 null
2024-09-24 Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Homanga Bharadhwaj et.al. 2409.16283v1 null
2024-09-24 Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation Yong Xien Chng et.al. 2409.16278v1 null
2024-09-24 Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results Marcos V. Conde et.al. 2409.16277v1 null
2024-09-24 CDChat: A Large Multimodal Model for Remote Sensing Change Description Mubashir Noman et.al. 2409.16261v1 link
2024-09-24 Empirically Exploring the Space of Monostationarity in Dual Phosphorylation May Cai et.al. 2409.16234v1 null
2024-09-24 VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection Sunghyun Ahn et.al. 2409.16225v1 link
2024-09-24 Upper-body free-breathing Magnetic Resonance Fingerprinting applied to the quantification of water T1 and fat fraction Constantin Slioussarenko et.al. 2409.16200v1 null
2024-09-24 Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking Jun Bai et.al. 2409.16198v1 null
2024-09-20 Gender Representation and Bias in Indian Civil Service Mock Interviews Somonnoy Banerjee et.al. 2409.12194v3 null
2024-09-18 DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control Zichen Jeff Cui et.al. 2409.12192v1 null
2024-09-18 Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Peng Wang et.al. 2409.12191v1 link
2024-09-18 multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information Hongjun Zhu et.al. 2409.12167v1 link
2024-09-18 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Sai Tanmay Reddy Chakkera et.al. 2409.12156v1 null
2024-09-18 Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT Hamza Kalisch et.al. 2409.12155v1 link
2024-09-18 MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion Kalakonda Sai Shashank et.al. 2409.12140v1 null
2024-09-18 Mirages in the Energy Landscape of Soft Sphere Packings Praharsh Suryadevara et.al. 2409.12113v1 null
2024-09-18 SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba Xiangning Zhang et.al. 2409.12108v1 null
2024-09-18 Unveiling the Secrets of New Physics Through Top Quark Tagging Rameswar Sahu et.al. 2409.12085v1 null
2024-09-17 Systematic analysis of Parity-Violating modes Hong-Ming Zhu et.al. 2409.11400v1 null
2024-09-17 Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements Jipeng Yan et.al. 2409.11391v1 null
2024-09-17 Normalization in Proportional Feature Spaces Alexandre Benatti et.al. 2409.11389v1 null
2024-09-17 Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification Fatema-E-Jannat et.al. 2409.11375v1 null
2024-09-17 Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks Edgar Heinert et.al. 2409.11373v1 null
2024-09-17 Compact Implicit Neural Representations for Plane Wave Images Mathilde Monvoisin et.al. 2409.11370v1 null
2024-09-17 OSV: One Step is Enough for High-Quality Image to Video Generation Xiaofeng Mao et.al. 2409.11367v1 null
2024-09-17 THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models Mengfei Liang et.al. 2409.11353v1 null
2024-09-17 CLIP Adaptation by Intra-modal Overlap Reduction Alexey Kravets et.al. 2409.11338v1 null
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323v1 null
2024-09-16 Enhancing Video Transmission with Machine Learning based Routing in Software-Defined Networks Anıl Dursun İpek et.al. 2409.10512v1 null
2024-09-16 Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance Simone Maurizio La Cava et.al. 2409.10481v1 null
2024-09-16 Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control Juan Alvarez-Padilla et.al. 2409.10469v1 null
2024-09-16 Assortativity in sympatric speciation and species classification Joao U. F. Lizarraga et.al. 2409.10466v1 null
2024-09-16 Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons Farhad Pourkamali-Anaraki et.al. 2409.10463v1 null
2024-09-16 Deep-Wide Learning Assistance for Insect Pest Classification Toan Nguyen et.al. 2409.10445v1 link
2024-09-16 A point process approach for the classification of noisy calcium imaging data Arianna Burzacchi et.al. 2409.10409v1 null
2024-09-16 MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning Hwihun Jeong et.al. 2409.10394v1 link
2024-09-16 Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning Amin Karimi Monsefi et.al. 2409.10362v1 null
2024-09-16 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? Téo Guichoux et.al. 2409.10357v1 null
2024-09-13 An Efficient and Streaming Audio Visual Active Speaker Detection System Arnav Kundu et.al. 2409.09018v1 null
2024-09-13 Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation Qingwen Bu et.al. 2409.09016v1 link
2024-09-13 Model-independent variable selection via the rule-based variable priority Min Lu et.al. 2409.09003v1 null
2024-09-13 Biomimetic Frontend for Differentiable Audio Processing Ruolan Leslie Famularo et.al. 2409.08997v1 link
2024-09-13 Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems Yan-Martin Tamm et.al. 2409.08987v1 link
2024-09-13 Fast DCT+: A Family of Fast Transforms Based on Rank-One Updates of the Path Graph Samuel Fernández-Menduiña et.al. 2409.08970v1 null
2024-09-13 Pushing the boundaries of event subsampling in event-based video classification using CNNs Hesam Araghi et.al. 2409.08953v1 link
2024-09-13 Pushing Joint Image Denoising and Classification to the Edge Thomas C Markhorst et.al. 2409.08943v1 null
2024-09-13 LLM-based Weak Supervision Framework for Query Intent Classification in Video Search Farnoosh Javadi et.al. 2409.08931v1 null
2024-09-13 Classification of electronic structures and state preparation for quantum computation of reaction chemistry Maximilian Mörchen et.al. 2409.08910v1 null
2024-09-12 Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor Andrea Conti et.al. 2409.08277v1 null
2024-09-12 Hand-Object Interaction Pretraining from Videos Himanshu Gaurav Singh et.al. 2409.08273v1 null
2024-09-12 DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer Runjia Li et.al. 2409.08271v1 null
2024-09-12 OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering Jiahao Nick Li et.al. 2409.08250v1 null
2024-09-12 A review of compact geodesic orbit manifolds and the g.o. condition for $\SU(5)/\s(\U(2)\times \U(2))$ Andreas Arvanitoyeorgos et.al. 2409.08247v1 null
2024-09-12 Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging Daniel Capellán-Martín et.al. 2409.08232v1 null
2024-09-12 CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs Davide Buffelli et.al. 2409.08217v1 null
2024-09-12 LT3SD: Latent Trees for 3D Scene Diffusion Quan Meng et.al. 2409.08215v1 null
2024-09-12 Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video Boxiang Rong et.al. 2409.08189v1 null
2024-09-13 Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification Soufiyan Bahadi et.al. 2409.08188v2 null
2024-09-11 Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Haibo Yang et.al. 2409.07452v1 link
2024-09-11 VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos Yan-Bo Lin et.al. 2409.07450v1 null
2024-09-11 Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning Rodrigo Salas et.al. 2409.07449v1 null
2024-09-11 StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos Sijie Zhao et.al. 2409.07447v1 null
2024-09-11 Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability A. E. M Ridwan et.al. 2409.07426v1 null
2024-09-11 Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy Somayeh Pakdelmoez et.al. 2409.07422v1 null
2024-09-11 Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging Yunzhen Wang et.al. 2409.07417v1 null
2024-09-11 NVRC: Neural Video Representation Compression Ho Man Kwan et.al. 2409.07414v1 null
2024-09-12 Robust Robot Walker: Learning Agile Locomotion over Tiny Traps Shaoting Zhu et.al. 2409.07409v2 null
2024-09-11 Revisiting Static Feature-Based Android Malware Detection Md Tanvirul Alam et.al. 2409.07397v1 null
2024-09-10 A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection Md Taimur Ahad et.al. 2409.06699v1 null
2024-09-10 DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images Taslim Murad et.al. 2409.06694v1 null
2024-09-10 Benchmarking Sub-Genre Classification For Mainstage Dance Music Hongzhi Shu et.al. 2409.06690v1 null
2024-09-10 A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network Md Taimur Ahad et.al. 2409.06689v1 null
2024-09-10 A study on deep feature extraction to detect and classify Acute Lymphoblastic Leukemia (ALL) Sabit Ahamed Preanto et.al. 2409.06687v1 null
2024-09-10 Constructing an Interpretable Deep Denoiser by Unrolling Graph Laplacian Regularizer Seyed Alireza Hosseini et.al. 2409.06676v1 null
2024-09-10 Bulk and atmospheric metallicities as direct probes of sequentially varying accretion mechanisms of gas and solids onto planets Yasuhiro Hasegawa et.al. 2409.06670v1 null
2024-09-10 Data Collection-free Masked Video Modeling Yuchi Ishikawa et.al. 2409.06665v1 null
2024-09-10 World-Grounded Human Motion Recovery via Gravity-View Coordinates Zehong Shen et.al. 2409.06662v1 null
2024-09-10 Classifying Functions via growth rates of repeated iterations Titus Hilberdink et.al. 2409.06661v1 null
2024-09-09 Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Haritheja Etukuru et.al. 2409.05865v1 null
2024-09-09 Neural MP: A Generalist Neural Motion Planner Murtaza Dalal et.al. 2409.05864v1 null
2024-09-09 LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation Henghui Ding et.al. 2409.05847v1 null
2024-09-10 Finite-size topological phases from semimetals Adipta Pal et.al. 2409.05842v2 null
2024-09-09 Fast Generation of Custom Floating-Point Spatial Filters on FPGAs Nelson Campos et.al. 2409.05837v1 null
2024-09-09 Limits on the computational expressivity of non-equilibrium biophysical processes Carlos Floyd et.al. 2409.05827v1 null
2024-09-09 A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation Qi Jiang et.al. 2409.05809v1 null
2024-09-09 A CLIP-based siamese approach for meme classification Javier Huertas-Tato et.al. 2409.05772v1 null
2024-09-09 Consensus-based Distributed Quantum Kernel Learning for Speech Recognition Kuan-Cheng Chen et.al. 2409.05770v1 null
2024-09-09 A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR Giovanni Morrone et.al. 2409.05750v1 null
2024-09-06 Synergy and Synchrony in Couple Dances Vongani Maluleke et.al. 2409.04440v1 null
2024-09-06 VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Yecheng Wu et.al. 2409.04429v1 null
2024-09-06 Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques Davide Clode da Silva et.al. 2409.04424v1 null
2024-09-06 Virtual Reality-Based Preoperative Planning for Optimized Trocar Placement in Thoracic Surgery: A Preliminary Study Arash Harirpoush et.al. 2409.04414v1 null
2024-09-06 Quantum Kernel Methods under Scrutiny: A Benchmarking Study Jan Schnabel et.al. 2409.04406v1 null
2024-09-09 Question-Answering Dense Video Events Hangyu Qin et.al. 2409.04388v2 null
2024-09-06 Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior Charlesquin Kemajou Mbakam et.al. 2409.04384v1 null
2024-09-06 Enhancing Skin Lesion Diagnosis with Ensemble Learning Xiaoyi Liu et.al. 2409.04381v1 null
2024-09-06 Tykhyy's Conjecture on finite mapping class group orbits Samuel Bronstein et.al. 2409.04379v1 null
2024-09-06 The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study Gregory Szumel et.al. 2409.04368v1 null
2024-09-05 Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Yunze Man et.al. 2409.03757v1 link
2024-09-05 Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron Christian Schmid et.al. 2409.03749v1 null
2024-09-05 Orbital Support and Evolution of CX/OX Structures in Boxy/Peanut Bars Behzad Tahmasebzadeh et.al. 2409.03746v1 null
2024-09-05 Libra: Architectural Support For Principled, Secure And Efficient Balanced Execution On High-End Processors (Extended Version) Hans Winderix et.al. 2409.03743v1 null
2024-09-05 Classification and Prediction of Heart Diseases using Machine Learning Algorithms Akua Sekyiwaa Osei-Nkwantabisa et.al. 2409.03697v1 null
2024-09-05 View-Invariant Policy Learning via Zero-Shot Novel View Synthesis Stephen Tian et.al. 2409.03685v1 null
2024-09-05 Threat Classification on Deployed Optical Networks Using MIMO Digital Fiber Sensing, Wavelets, and Machine Learning Khouloud Abdelli et.al. 2409.03667v1 null
2024-09-05 Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG Manshan Guo et.al. 2409.03646v1 null
2024-09-05 Variance reduction in Texas hold'em and in video poker Stewart N. Ethier et.al. 2409.03607v1 null
2024-09-05 SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing Lingyu Xiong et.al. 2409.03605v1 null
2024-09-04 SITAR: Semi-supervised Image Transformer for Action Recognition Owais Iqbal et.al. 2409.02910v1 null
2024-09-04 GraphTrials: Visual Proofs of Graph Properties Henry Förster et.al. 2409.02907v1 null
2024-09-04 Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice Ming-Hao Li et.al. 2409.02898v1 null
2024-09-04 LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Xidong Wang et.al. 2409.02889v1 link
2024-09-04 CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently Jonathan Zalach et.al. 2409.02885v1 null
2024-09-04 Look Into the LITE in Deep Learning for Time Series Classification Ali Ismail-Fawaz et.al. 2409.02869v1 null
2024-09-04 Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models Zhibin Liu et.al. 2409.02851v1 null
2024-09-04 iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation Hayeon Jo et.al. 2409.02838v1 null
2024-09-04 Evolution of radiation profiles in a strongly baffled divertor on MAST Upgrade Fabio Federici et.al. 2409.02837v1 null
2024-09-04 Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models Moein Shahiki Tash et.al. 2409.02836v1 null
2024-08-30 Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding Gueter Josmy Faure et.al. 2408.17443v1 link
2024-08-30 SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists Raoyuan Zhao et.al. 2408.17437v1 link
2024-08-30 CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion Yiran Chen et.al. 2408.17424v1 null
2024-09-03 Open-vocabulary Temporal Action Localization using VLMs Naoki Wake et.al. 2408.17422v2 null
2024-08-30 Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes Li Zhang et.al. 2408.17421v1 link
2024-08-30 End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework Chang Cai et.al. 2408.17397v1 null
2024-08-30 Equivariant isomorphism of Quantum Lens Spaces of low dimension Søren Eilers et.al. 2408.17386v1 null
2024-08-30 LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification Fadi Alharbi et.al. 2408.17384v1 null
2024-08-30 Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain Francesca Grasso et.al. 2408.17362v1 link
2024-08-30 Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method Yuji Lin et.al. 2408.17339v1 null
2024-08-29 SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Ziyu Guo et.al. 2408.16768v1 link
2024-08-29 ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Fangfu Liu et.al. 2408.16767v1 null
2024-08-29 OmniRe: Omni Urban Scene Reconstruction Ziyu Chen et.al. 2408.16760v1 null
2024-08-29 Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge Beidi Dong et.al. 2408.16749v1 null
2024-08-29 Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech Cong Zhang et.al. 2408.16732v1 null
2024-08-29 VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation Shiwei Wu et.al. 2408.16730v1 null
2024-08-29 Prediction-Feedback DETR for Temporal Action Detection Jihwan Kim et.al. 2408.16729v1 null
2024-08-29 A GREAT Architecture for Edge-Based Graph Problems Like TSP Attila Lischka et.al. 2408.16717v1 null
2024-08-29 One-Shot Learning Meets Depth Diffusion in Multi-Object Videos Anisha Jain et.al. 2408.16704v1 null
2024-08-29 RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio Kian Behzad et.al. 2408.16703v1 null
2024-08-29 Spatio-Temporal Context Prompting for Zero-Shot Action Detection Wei-Jhe Huang et.al. 2408.15996v2 null
2024-08-28 TEDRA: Text-based Editing of Dynamic and Photoreal Actors Basavaraj Sunagad et.al. 2408.15995v1 null
2024-08-28 Minimizing movements solutions for a monotone model of droplet motion Carson Collins et.al. 2408.15984v1 null
2024-08-28 VLT/MUSE detection of accretion-ejection associated with the close stellar companion in the HT Lup system Sebastián Jorquera et.al. 2408.15976v1 null
2024-08-28 1+1d SPT phases with fusion category symmetry: interface modes and non-abelian Thouless pump Kansei Inamura et.al. 2408.15960v1 null
2024-08-28 Generating Binary Species Range Maps Filip Dorm et.al. 2408.15956v1 null
2024-08-28 Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games Nicholas R. Waytowich et.al. 2408.15950v1 null
2024-08-28 Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping Yikang Liu et.al. 2408.15947v1 null
2024-08-28 A latticed total K-theory Qingnan An et.al. 2408.15941v1 null
2024-08-28 Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning Bingchen Yan et.al. 2408.15924v1 null
2024-08-27 GenRec: Unifying Video Generation and Recognition with Diffusion Models Zejia Weng et.al. 2408.15241v1 null
2024-08-27 Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Xiaojuan Wang et.al. 2408.15239v1 null
2024-08-27 DCT-CryptoNets: Scaling Private Inference in the Frequency Domain Arjun Roy et.al. 2408.15231v1 null
2024-08-27 SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images Zafer Yildiz et.al. 2408.15224v1 link
2024-08-27 Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment Xuan Xu et.al. 2408.15218v1 null
2024-08-27 Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance Weiyi Zhang et.al. 2408.15217v1 null
2024-08-27 Classifying populist language in American presidential and governor speeches using automatic text analysis Olaf van der Veen et.al. 2408.15213v1 null
2024-08-27 Sec2Sec Co-attention for Video-Based Apparent Affective Prediction Mingwei Sun et.al. 2408.15209v1 link
2024-08-27 Automatic 8-tissue Segmentation for 6-month Infant Brains Yilan Dong et.al. 2408.15198v1 null
2024-08-27 Infusing Acoustic Pause Context into Text-Based Dementia Assessment Franziska Braun et.al. 2408.15188v1 null
2024-08-26 Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos Qirui Chen et.al. 2408.14469v1 null
2024-08-26 K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Zhikai Li et.al. 2408.14468v1 null
2024-08-26 Reconstructing physiological signals from fMRI across the adult lifespan Shiyu Wang et.al. 2408.14453v1 null
2024-08-26 Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition Axel Klawonn et.al. 2408.14442v1 null
2024-08-26 Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification Mahrukh Awan et.al. 2408.14441v1 null
2024-08-26 Radiance Cascades: A Novel High-Resolution Formal Solution for Multidimensional Non-LTE Radiative Transfer Christopher M. J. Osborne et.al. 2408.14425v1 null
2024-08-26 Learning Tree-Structured Composition of Data Augmentation Dongyue Li et.al. 2408.14381v1 link
2024-08-26 Probing Causality Manipulation of Large Language Models Chenyang Zhang et.al. 2408.14380v1 link
2024-08-26 GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy Peiyan Li et.al. 2408.14368v1 null
2024-08-26 An Embedding is Worth a Thousand Noisy Labels Francesco Di Salvo et.al. 2408.14358v1 null
2024-08-23 Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder Marie Huynh et.al. 2408.13255v1 null
2024-08-23 Domain-specific long text classification from sparse relevant information Célia D'Cruz et.al. 2408.13253v1 null
2024-08-23 CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Tao Wu et.al. 2408.13239v1 null
2024-08-23 D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching Jingyu Liu et.al. 2408.13226v1 null
2024-08-23 ResSR: A Residual Approach to Super-Resolving Multispectral Images Haley Duba-Sullivan et.al. 2408.13225v1 null
2024-08-23 EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods Hongcheng Ding et.al. 2408.13214v1 null
2024-08-23 Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews Dineth Jayakody et.al. 2408.13202v1 null
2024-08-23 EAViT: External Attention Vision Transformer for Audio Classification Aquib Iqbal et.al. 2408.13201v1 null
2024-08-23 Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention Xiaoyi Liu et.al. 2408.13180v1 null
2024-08-23 Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations Fabrizio Maturo et.al. 2408.13179v1 null
2024-08-22 Automating Deformable Gasket Assembly Simeon Adebola et.al. 2408.12593v1 null
2024-08-22 xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Can Qin et.al. 2408.12590v1 null
2024-08-22 Real-Time Video Generation with Pyramid Attention Broadcast Xuanlei Zhao et.al. 2408.12588v1 link
2024-08-22 Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers Antonyo Musabini et.al. 2408.12575v1 null
2024-08-22 MuMA-ToM: Multi-modal Multi-Agent Theory of Mind Haojun Shi et.al. 2408.12574v1 null
2024-08-22 Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers Sayed Mohammad Vakilzadeh Hatefi et.al. 2408.12568v1 null
2024-08-22 ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation Lujia Zhong et.al. 2408.12561v1 link
2024-08-22 Exploring the Role of Audio in Multimodal Misinformation Detection Moyang Liu et.al. 2408.12558v1 null
2024-08-22 Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge Jun Ma et.al. 2408.12534v1 null
2024-08-22 UMAD: University of Macau Anomaly Detection Benchmark Dataset Dong Li et.al. 2408.12527v1 link
2024-08-21 Great Memory, Shallow Reasoning: Limits of $k$NN-LMs Shangyi Geng et.al. 2408.11815v1 link
2024-08-21 EmbodiedSAM: Online Segment Any 3D Thing in Real Time Xiuwei Xu et.al. 2408.11811v1 null
2024-08-21 Approaching Deep Learning through the Spectral Dynamics of Weights David Yunis et.al. 2408.11804v1 link
2024-08-21 Practical token pruning for foundation models in few-shot conversational virtual assistant systems Haode Qi et.al. 2408.11799v1 null
2024-08-21 Critique-out-Loud Reward Models Zachary Ankner et.al. 2408.11791v1 link
2024-08-21 DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework Zhifei Xie et.al. 2408.11788v1 null
2024-08-21 NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation Zhenye Lou et.al. 2408.11787v1 link
2024-08-21 Timeline and Boundary Guided Diffusion Network for Video Shadow Detection Haipeng Zhou et.al. 2408.11785v1 link
2024-08-21 SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance Zhiqiang Wu et.al. 2408.11760v1 null
2024-08-21 Improving the Scan-rescan Precision of AI-based CMR Biomarker Estimation Dewmini Hasara Wickremasinghe et.al. 2408.11754v1 null
2024-08-20 Discriminant Analysis in stationary time series based on robust cepstral coefficients Jonathan de Souza Matias et.al. 2408.11012v1 null
2024-08-20 Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos Dennis Fedorishin et.al. 2408.10998v1 null
2024-08-20 Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models Hojat Asgariandehkordi et.al. 2408.10987v1 null
2024-08-20 ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data Ezequiel de la Rosa et.al. 2408.10966v1 null
2024-08-20 Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter Farhanul Haque et.al. 2408.10955v1 null
2024-08-20 Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting Dona Arabi et.al. 2408.10951v1 link
2024-08-20 Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience Yoonseo Choi et.al. 2408.10937v1 null
2024-08-20 SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement Linlin Hu et.al. 2408.10934v1 null
2024-08-20 ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining Qi Ma et.al. 2408.10906v1 null
2024-08-20 ViLReF: A Chinese Vision-Language Retinal Foundation Model Shengzhu Yang et.al. 2408.10894v1 link
2024-08-19 Some model theory of quadratic geometries Charlotte Kestner et.al. 2408.10196v1 null
2024-08-19 Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification Jing Li et.al. 2408.10193v1 null
2024-08-20 LongVILA: Scaling Long-Context Visual Language Models for Long Videos Fuzhao Xue et.al. 2408.10188v2 link
2024-08-19 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang et.al. 2408.10174v1 link
2024-08-19 Galaxy Zoo: Morphologies based on UKIDSS NIR Imaging for 71,052 Galaxies Karen L. Masters et.al. 2408.10160v1 null
2024-08-19 Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video Shuxian Wang et.al. 2408.10153v1 null
2024-08-19 Biharmonic conformal immersions into a 3-dimensional conformally flat space Ze-Ping Wang et.al. 2408.10144v1 null
2024-08-19 Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images Wei Zhou et.al. 2408.10134v1 null
2024-08-19 UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track Hao Fang et.al. 2408.10129v1 null
2024-08-19 Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track Feiyu Pan et.al. 2408.10125v1 null
2024-08-16 Quantum Annealing for Enhanced Feature Selection in Single-Cell RNA Sequencing Data Analysis Selim Romero et.al. 2408.08867v1 null
2024-08-16 DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models Eman Ali et.al. 2408.08855v1 null
2024-08-16 ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis Yubao Zhao et.al. 2408.08849v1 null
2024-08-16 HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis Zhi-Bo Liu et.al. 2408.08847v1 link
2024-08-16 LEVIS: Large Exact Verifiable Input Spaces for Neural Networks Mohamad Fares El Hajj Chehade et.al. 2408.08824v1 null
2024-08-16 Optimal Symmetries in Binary Classification Vishal S. Ngairangbam et.al. 2408.08823v1 null
2024-08-16 Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification Abdullah Al Imran et.al. 2408.08803v1 null
2024-08-16 Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers Zihang Song et.al. 2408.08794v1 null
2024-08-16 Assessing Generalization Capabilities of Malaria Diagnostic Models from Thin Blood Smears Louise Guillon et.al. 2408.08792v1 null
2024-08-16 A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks Boa Jang et.al. 2408.08790v1 link
2024-08-15 HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning Hongyu Li et.al. 2408.08312v1 null
2024-08-15 Gauge-invariant optical selection rules for excitons Tharindu Fernando et.al. 2408.08311v1 null
2024-08-15 Accelerated Image-Aware Generative Diffusion Modeling Tanmay Asthana et.al. 2408.08306v1 null
2024-08-15 SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training Gengwei Zhang et.al. 2408.08295v1 link
2024-08-15 Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks Ni Ou et.al. 2408.08276v1 null
2024-08-15 Snuffy: Efficient Whole Slide Image Classifier Hossein Jafarinia et.al. 2408.08258v1 link
2024-08-15 Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective Zixuan Pan et.al. 2408.08228v1 link
2024-08-15 RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science David Farr et.al. 2408.08217v1 null
2024-08-15 Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices Tess Watt et.al. 2408.08215v1 null
2024-08-15 Learned Multimodal Compression for Autonomous Driving Hadi Hadizadeh et.al. 2408.08211v1 null
2024-08-14 End-to-end Semantic-centric Video-based Multimodal Affective Computing Ronghao Lin et.al. 2408.07694v1 null
2024-08-15 A Spitting Image: Modular Superpixel Tokenization in Vision Transformers Marius Aasan et.al. 2408.07680v2 link
2024-08-14 G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing Jingyi Yang et.al. 2408.07675v1 null
2024-08-14 Graph Triple Attention Network: A Decoupled Perspective Xiaotang Wang et.al. 2408.07654v1 link
2024-08-14 Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving Yuqing Wen et.al. 2408.07605v1 null
2024-08-14 Disentangle and denoise: Tackling context misalignment for video moment retrieval Kaijing Ma et.al. 2408.07600v1 null
2024-08-14 Theoretical and Practical Progress in Hyperspectral Pixel Unmixing with Large Spectral Libraries from a Sparse Perspective Jade Preston et.al. 2408.07580v1 null
2024-08-14 TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases Thibault Simonetto et.al. 2408.07579v1 link
2024-08-14 DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model Erez Yosef et.al. 2408.07541v1 null
2024-08-14 Improved 3D Whole Heart Geometry from Sparse CMR Slices Yiyang Xu et.al. 2408.07532v1 link
2024-08-13 On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders Hamza Coban et.al. 2408.07064v1 null
2024-08-13 Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality Yu-Chih Chen et.al. 2408.07041v1 null
2024-08-13 PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology Xiaomin Wu et.al. 2408.07037v1 null
2024-08-13 Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines Samuel Fernández Menduiña et.al. 2408.07028v1 null
2024-08-13 Event-Stream Super Resolution using Sigma-Delta Neural Network Waseem Shariff et.al. 2408.06968v1 null
2024-08-13 DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs Dongyuan Li et.al. 2408.06966v1 null
2024-08-13 OpenResearcher: Unleashing AI for Accelerated Scientific Research Yuxiang Zheng et.al. 2408.06941v1 link
2024-08-13 Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification Bauke Arends et.al. 2408.06930v1 null
2024-08-13 Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries Qi Song et.al. 2408.06901v1 null
2024-08-13 Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media Pranav Venkatesh et.al. 2408.06900v1 null
2024-08-12 Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks Lucas Félix et.al. 2408.06341v1 null
2024-08-12 Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques Navid Ghassemi et.al. 2408.06336v1 null
2024-08-12 LOLgorithm: Integrating Semantic,Syntactic and Contextual Elements for Humor Classification Tanisha Khurana et.al. 2408.06335v1 null
2024-08-12 From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model Athulya Sundaresan Geetha et.al. 2408.06305v1 null
2024-08-12 Sparsity Based Multi-Source Robust 3D Localization Using a Moving Receiver Amir Mansourian et.al. 2408.06274v1 null
2024-08-12 Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance Manuel Milling et.al. 2408.06264v1 null
2024-08-12 Deep Learning System Boundary Testing through Latent Space Style Mixing Amr Abdellatif et.al. 2408.06258v1 null
2024-08-12 Rethinking Video with a Universal Event-Based Representation Andrew Freeman et.al. 2408.06248v1 null
2024-08-12 A Comprehensive Case Study on the Performance of Machine Learning Methods on the Classification of Solar Panel Electroluminescence Images Xinyi Song et.al. 2408.06229v1 link
2024-08-12 ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents Carolin Schindler et.al. 2408.06222v1 null
2024-08-09 VITA: Towards Open-Source Interactive Omni Multimodal LLM Chaoyou Fu et.al. 2408.05211v1 null
2024-08-09 Kalman-Inspired Feature Propagation for Video Face Super-Resolution Ruicheng Feng et.al. 2408.05205v1 null
2024-08-09 HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling Piotr Keller et.al. 2408.05195v1 link
2024-08-09 Cross-Domain Learning for Video Anomaly Detection with Limited Supervision Yashika Jain et.al. 2408.05191v1 null
2024-08-09 Holomorphic vector fields with real integral manifolds Martin Kolář et.al. 2408.05186v1 null
2024-08-09 MADE-WIC: Multiple Annotated Datasets for Exploring Weaknesses In Code Moritz Mock et.al. 2408.05163v1 null
2024-08-09 Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification Xiaoyang Hao et.al. 2408.05151v1 null
2024-08-09 Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video Chunggi Lee et.al. 2408.05123v1 null
2024-08-09 Cautious Calibration in Binary Classification Mari-Liis Allikivi et.al. 2408.05120v1 null
2024-08-09 Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images Shouyue Liu et.al. 2408.05117v1 null
2024-08-08 Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Ruining Li et.al. 2408.04631v1 null
2024-08-08 LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP Danlu Chen et.al. 2408.04628v1 null
2024-08-08 Transformer Explainer: Interactive Learning of Text-Generative Models Aeree Cho et.al. 2408.04619v1 null
2024-08-08 Quantifying the Impact of Population Shift Across Age and Sex for Abdominal Organ Segmentation Kate Čevora et.al. 2408.04610v1 null
2024-08-08 Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes Bhushan Atote et.al. 2408.04606v1 null
2024-08-08 SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation Jieming Yu et.al. 2408.04593v1 null
2024-08-08 Learn To Learn More Precisely Runxi Cheng et.al. 2408.04590v1 null
2024-08-08 SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals Haoran Zheng et.al. 2408.04575v1 null
2024-08-08 Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches Yongzhi Xu et.al. 2408.04567v1 null
2024-08-08 MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification Md Rafiul Biswas et.al. 2408.04540v1 null
2024-08-07 How Well Can Vision Language Models See Image Details? Chenhui Gou et.al. 2408.03940v1 null
2024-08-07 Fast Sprite Decomposition from Animated Graphics Tomoyuki Suzuki et.al. 2408.03923v1 null
2024-08-07 FMiFood: Multi-modal Contrastive Learning for Food Image Classification Xinyue Pan et.al. 2408.03922v1 null
2024-08-07 Holomorphic foliations tangent to Rolle-pfaffian hypersurfaces Arturo Fernández-Pérez et.al. 2408.03914v1 null
2024-08-07 AdapMTL: Adaptive Pruning Framework for Multitask Learning Model Mingcan Xiang et.al. 2408.03913v1 null
2024-08-07 Achieving Human Level Competitive Robot Table Tennis David B. D'Ambrosio et.al. 2408.03906v1 null
2024-08-07 Lightweight Video Denoising Using a Classic Bayesian Backbone Clément Bled et.al. 2408.03904v1 null
2024-08-07 Retrieval Augmentation via User Interest Clustering Hanjia Lyu et.al. 2408.03886v1 null
2024-08-07 Global-Local Progressive Integration Network for Blind Image Quality Assessment Xiaoqi Wang et.al. 2408.03885v1 null
2024-08-07 Knowledge Probing for Graph Representation Learning Mingyu Zhao et.al. 2408.03877v1 null
2024-08-06 LLaVA-OneVision: Easy Visual Task Transfer Bo Li et.al. 2408.03326v1 null
2024-08-06 ClassiFIM: An Unsupervised Method To Detect Phase Transitions Victor Kasatkin et.al. 2408.03323v1 null
2024-08-06 Segment Anything in Medical Images and Videos: Benchmark and Deployment Jun Ma et.al. 2408.03322v1 null
2024-08-06 MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation Xiaofeng Mao et.al. 2408.03312v1 null
2024-08-06 Left of Fab: Securing Design and Collaboration in the Semiconductor Value Chain John C. Hoag et.al. 2408.03295v1 null
2024-08-06 Biomedical SAM 2: Segment Anything in Biomedical Images and Videos Zhiling Yan et.al. 2408.03286v1 null
2024-08-06 ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Jiazhi Guan et.al. 2408.03284v1 null
2024-08-06 Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments Angie Boggust et.al. 2408.03274v1 null
2024-08-07 BVI-AOM: A New Training Dataset for Deep Video Compression Optimization Jakub Nawała et.al. 2408.03265v2 null
2024-08-06 Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom W. S. Leite et.al. 2408.03236v1 null
2024-08-05 Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics Shishira R Maiya et.al. 2408.02672v1 null
2024-08-05 Interactive 3D Medical Image Segmentation with SAM 2 Chuyun Shen et.al. 2408.02635v1 null
2024-08-05 VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Zhiyu Tan et.al. 2408.02629v1 null
2024-08-05 DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions Siying Hu et.al. 2408.02574v1 null
2024-08-05 Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification Paweł Zyblewski et.al. 2408.02568v1 null
2024-08-05 HQOD: Harmonious Quantization for Object Detection Long Huang et.al. 2408.02561v1 null
2024-08-05 The effect of dynamical states on galaxy clusters populations. I. Classification of dynamical states S. Véliz Astudillo et.al. 2408.02519v1 null
2024-08-05 Automatic rating of incomplete hippocampal inversions evaluated across multiple cohorts Lisa Hemforth et.al. 2408.02496v1 null
2024-08-05 HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions Chiranjeev Chiranjeev et.al. 2408.02494v1 null
2024-08-05 Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection Ting Lei et.al. 2408.02484v1 null
2024-08-02 Conditional LoRA Parameter Generation Xiaolong Jin et.al. 2408.01415v1 null
2024-08-02 Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence Yen-Che Hsiao et.al. 2408.01408v1 null
2024-08-02 NOLO: Navigate Only Look Once Bohan Zhou et.al. 2408.01384v1 null
2024-08-02 Explaining a probabilistic prediction on the simplex with Shapley compositions Paul-Gauthier Noé et.al. 2408.01382v1 null
2024-08-02 Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification Muhammad Ahmad et.al. 2408.01372v1 null
2024-08-02 Classification of marked elliptic root systems with non-reduced quotient A. Fialowski et.al. 2408.01358v1 null
2024-08-02 Harmonized connectome resampling for variance in voxel sizes Elyssa M. McMaster et.al. 2408.01351v1 null
2024-08-02 Human foraging strategies flexibly adapt to resource distribution and time constraints Valeria Simonelli et.al. 2408.01350v1 null
2024-08-02 PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval Yue Duan et.al. 2408.01349v1 null
2024-08-02 Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks Anders Giovanni Møller et.al. 2408.01346v1 null
2024-08-01 Text-Guided Video Masked Autoencoder David Fan et.al. 2408.00759v1 null
2024-08-01 Segment anything model 2: an application to 2D and 3D medical images Haoyu Dong et.al. 2408.00756v1 null
2024-08-01 Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Benlin Liu et.al. 2408.00754v1 null
2024-08-01 CERT-ED: Certifiably Robust Text Classification for Edit Distance Zhuoqun Huang et.al. 2408.00728v1 null
2024-08-01 SAM 2: Segment Anything in Images and Videos Nikhila Ravi et.al. 2408.00714v1 null
2024-08-01 Investigating Brain Connectivity and Regional Statistics from EEG for early stage Parkinson's Classification Amarpal Sahota et.al. 2408.00711v1 null
2024-08-01 Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM Xiaofeng Liu et.al. 2408.00706v1 null
2024-08-01 Granular-Balls based Fuzzy Twin Support Vector Machine for Classification Lixi Zhao et.al. 2408.00699v1 null
2024-08-01 ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh et.al. 2408.00672v1 null
2024-08-01 AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models Daqin Luo et.al. 2408.00665v1 null
2024-07-31 The Llama 3 Herd of Models Abhimanyu Dubey et.al. 2407.21783v1 null
2024-07-31 RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining Hongtao Wu et.al. 2407.21773v1 null
2024-07-31 ReplanVLM: Replanning Robotic Tasks with Visual Language Models Aoran Mei et.al. 2407.21762v1 null
2024-07-31 Learning Video Context as Interleaved Multimodal Sequences Kevin Qinghong Lin et.al. 2407.21757v1 null
2024-08-01 Topological Woodward-Hoffmann classification for cycloadditions in polycyclic aromatic azomethine ylides Juan Li et.al. 2407.21756v2 null
2024-07-31 A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation Mothilal Asokan et.al. 2407.21739v1 null
2024-07-31 Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos Joseph Geo Benjamin et.al. 2407.21738v1 null
2024-07-31 Artificial Intelligence Approaches for Energy Efficiency: A Review Alberto Pasqualetto et.al. 2407.21726v1 null
2024-07-31 Open-Vocabulary Audio-Visual Semantic Segmentation Ruohao Guo et.al. 2407.21721v1 null
2024-07-31 Tora: Trajectory-oriented Diffusion Transformer for Video Generation Zhenghao Zhang et.al. 2407.21705v1 null
2024-07-30 Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation Marcelo Matheus Gauy et.al. 2407.20989v1 null
2024-07-30 Transfer Learning for Multi-material Classification of Transition Metal Dichalcogenides with Atomic Force Microscopy Isaiah A. Moses et.al. 2407.20975v1 null
2024-07-30 MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions Xiaowei Chi et.al. 2407.20962v1 link
2024-07-30 EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images Lixing Tan et.al. 2407.20937v1 null
2024-07-30 Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering Yanpeng Zhao et.al. 2407.20908v1 link
2024-07-30 Simultaneous Multi-Slice Diffusion Imaging using Navigator-free Multishot Spiral Acquisition Yuancheng Jiang et.al. 2407.20904v1 null
2024-07-30 Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach Adam Wojciechowski et.al. 2407.20899v1 null
2024-07-30 MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network Yinlong Xu et.al. 2407.20893v1 null
2024-07-30 Shift operators and their classification Maria Carvalho et.al. 2407.20890v1 null
2024-07-30 Effective Black Box Testing of Sentiment Analysis Classification Networks Parsa Karbasizadeh et.al. 2407.20884v1 null
2024-07-29 SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction Çağhan Köksal et.al. 2407.20214v1 null
2024-07-30 SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking Jian Wang et.al. 2407.20198v2 null
2024-07-29 Radiance Fields for Robotic Teleoperation Maximum Wilder-Smith et.al. 2407.20194v1 null
2024-07-29 Theia: Distilling Diverse Vision Foundation Models for Robot Learning Jinghuan Shang et.al. 2407.20179v1 link
2024-07-29 LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework Zhenqi He et.al. 2407.20172v1 link
2024-07-29 Diffusion Feedback Helps CLIP See Better Wenxuan Wang et.al. 2407.20171v1 null
2024-07-29 Language-Conditioned Offline RL for Multi-Robot Navigation Steven Morad et.al. 2407.20164v1 null
2024-07-29 Quantum Machine Learning Architecture Search via Deep Reinforcement Learning Xin Dai et.al. 2407.20147v1 null
2024-07-30 AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics Xiangxiang Dai et.al. 2407.20124v2 link
2024-07-29 Integrable and superintegrable quantum mechanical systems with position dependent masses invariant with respect to one parametric Lie groups. 2. Systems with dilatation and shift symmetries A. G. Nikitin et.al. 2407.20112v1 null
2024-07-26 HRP: Human Affordances for Robotic Pre-Training Mohan Kumar Srirama et.al. 2407.18911v1 null
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908v1 null
2024-07-26 A Scalable Quantum Non-local Neural Network for Image Classification Sparsh Gupta et.al. 2407.18906v1 link
2024-07-26 Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment Yuze Zheng et.al. 2407.18854v1 null
2024-07-26 The Role of Temporal Hierarchy in Spiking Neural Networks Filippo Moro et.al. 2407.18838v1 null
2024-07-26 Learning the Chaotic and Regular Nature of Trajectories in Hamiltonian Systems with Lagrangian descriptors Javier Jiménez López et.al. 2407.18831v1 null
2024-07-26 Binary orbit and disks properties of the RW Aur system using ALMA observations N. T. Kurtovic et.al. 2407.18828v1 null
2024-07-26 Three-dimensional ultrasound-based online system for automated ovarian follicle measurement Pedro Royo et.al. 2407.18818v1 null
2024-07-26 Automatic Detection of Moral Values in Music Lyrics Vjosa Preniqi et.al. 2407.18787v1 null
2024-07-26 Deep learning interpretable analysis for carbon star identification in Gaia DR3 Shuo Ye et.al. 2407.18754v1 null
2024-07-25 Review of Degenerate Higher Order Scalar Tensor Theories in Cosmology Andrei Lazanu et.al. 2407.18234v1 null
2024-07-25 One-point Statistics in various cosmic environments in the presence of massive neutrinos Mohadese Khoshtinat et.al. 2407.18233v1 null
2024-07-26 Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation Razieh Azizi et.al. 2407.18195v2 null
2024-07-25 PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations Cheng Qian et.al. 2407.18178v1 null
2024-07-26 On-chip near-infrared spectroscopic sensing with over 520nm bandwidth Chunhui Yao et.al. 2407.18172v2 null
2024-07-25 IRIS: Wireless Ring for Vision-based Smart Home Interaction Maruchi Kim et.al. 2407.18141v1 null
2024-07-25 XS-VID: An Extremely Small Video Object Detection Dataset Jiahao Guo et.al. 2407.18137v1 null
2024-07-25 Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking Daniele Rege Cambrin et.al. 2407.18128v1 null
2024-07-25 Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images Roberto Di Via et.al. 2407.18125v1 null
2024-07-25 Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping Jack Breen et.al. 2407.18105v1 link
2024-07-24 SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Yiming Xie et.al. 2407.17470v1 null
2024-07-24 SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning Jianpeng Yao et.al. 2407.17460v1 null
2024-07-24 EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification Joana Reuss et.al. 2407.17458v1 null
2024-07-24 HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Zhenzhi Wang et.al. 2407.17438v1 link
2024-07-24 Systematic study of High $E_J/E_C$ transmon qudits up to $d = 12$ Z. Wang et.al. 2407.17407v1 null
2024-07-24 Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising Sébastien Herbreteau et.al. 2407.17399v1 null
2024-07-24 Sampling-Based Hierarchical Trajectory Planning for Formation Flight Qingzhao Liu et.al. 2407.17392v1 null
2024-07-24 2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks Salil B Patel et.al. 2407.17380v1 null
2024-07-24 Entropy Reweighted Conformal Classification Rui Luo et.al. 2407.17377v1 null
2024-07-24 MuST: Multi-Scale Transformers for Surgical Phase Recognition Alejandra Pérez et.al. 2407.17361v1 link
2024-07-23 Explanation Regularisation through the Lens of Attributions Pedro Ferreira et.al. 2407.16693v1 null
2024-07-23 On the local cohomology of secant varieties Sebastian Olano et.al. 2407.16688v1 null
2024-07-23 AutoRG-Brain: Grounded Report Generation for Brain MRI Jiayu Lei et.al. 2407.16684v1 null
2024-07-24 Goedel logics: Prenex fragments Matthias Baaz et.al. 2407.16683v2 null
2024-07-24 A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Adrian Remonda et.al. 2407.16680v2 link
2024-07-23 From Imitation to Refinement -- Residual RL for Precise Visual Assembly Lars Ankile et.al. 2407.16677v1 null
2024-07-23 FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process Yuyan Bu et.al. 2407.16670v1 null
2024-07-23 EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval Thomas Hummel et.al. 2407.16658v1 link
2024-07-23 Fluorescence Diffraction Tomography using Explicit Neural Fields Renzhi He et.al. 2407.16657v1 null
2024-07-23 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Canyu Zhao et.al. 2407.16655v1 null
2024-07-22 AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description Junyu Xie et.al. 2407.15850v1 link
2024-07-22 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Mingze Xu et.al. 2407.15841v1 null
2024-07-23 QueST: Self-Supervised Skill Abstractions for Learning Continuous Control Atharva Mete et.al. 2407.15840v2 null
2024-07-22 Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator Florian Robert et.al. 2407.15817v1 null
2024-07-22 Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning Zhecheng Yuan et.al. 2407.15815v1 null
2024-07-22 The Evaporating Massive Embedded Stellar Cluster IRS 13 Close to Sgr A*. II. Kinematic structure Florian Peißker et.al. 2407.15800v1 null
2024-07-22 Adaptive Extensions of Unbiased Risk Estimators for Unsupervised Magnetic Resonance Image Denoising Reeshad Khan et.al. 2407.15799v1 null
2024-07-23 Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video Guiqiu Liao et.al. 2407.15794v2 null
2024-07-22 LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Haoning Wu et.al. 2407.15754v1 link
2024-07-22 SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection Dimitrios Kollias et.al. 2407.15728v1 null
2024-07-19 DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks Sarah Jabbour et.al. 2407.14509v1 null
2024-07-19 T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Kaiyue Sun et.al. 2407.14505v1 null
2024-07-19 Nonlinear Schrödinger Network Yiming Zhou et.al. 2407.14504v1 null
2024-07-19 Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery Sukrut Rao et.al. 2407.14499v1 link
2024-07-19 Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation Dongyang Wu et.al. 2407.14498v1 null
2024-07-19 Evaluating the Reliability of Self-Explanations in Large Language Models Korbinian Randl et.al. 2407.14487v1 link
2024-07-19 Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model Seonghui Min et.al. 2407.14434v1 null
2024-07-19 Dataset Distillation in Medical Imaging: A Feasibility Study Muyang Li et.al. 2407.14429v1 null
2024-07-19 Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models Hyun-Jic Oh et.al. 2407.14426v1 null
2024-07-19 Improving classification of road surface conditions via road area extraction and contrastive learning Linh Trinh et.al. 2407.14418v1 null
2024-07-18 GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model Abdelrahman Shaker et.al. 2407.13772v1 null
2024-07-18 Addressing Imbalance for Class Incremental Learning in Medical Image Classification Xuze Hao et.al. 2407.13768v1 null
2024-07-18 Shape of Motion: 4D Reconstruction from a Single Video Qianqian Wang et.al. 2407.13764v1 null
2024-07-18 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Boyang Deng et.al. 2407.13759v1 null
2024-07-18 Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units Aditya Parikh et.al. 2407.13753v1 null
2024-07-18 Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management Yoontae Hwang et.al. 2407.13751v1 null
2024-07-18 Pose-guided multi-task video transformer for driver action recognition Ricardo Pizarro et.al. 2407.13750v1 null
2024-07-18 Multi-Label Learning with Stronger Consistency Guarantees Anqi Mao et.al. 2407.13746v1 null
2024-07-18 Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer Anqi Mao et.al. 2407.13732v1 null
2024-07-18 Enhanced $H$-Consistency Bounds Anqi Mao et.al. 2407.13722v1 null
2024-07-17 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Sherwin Bahmani et.al. 2407.12781v1 null
2024-07-17 Hallucination Index: An Image Quality Metric for Generative Reconstruction Models Matthew Tivnan et.al. 2407.12780v1 null
2024-07-17 LookupViT: Compressing visual information to a limited number of tokens Rajat Koner et.al. 2407.12753v1 null
2024-07-17 4Dynamic: Text-to-4D Generation with Hybrid Priors Yu-Jie Yuan et.al. 2407.12684v1 null
2024-07-17 Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Kirolos Ataallah et.al. 2407.12679v1 null
2024-07-17 Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs Yiqing Shen et.al. 2407.12678v1 null
2024-07-17 CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems Jiankun Zhao et.al. 2407.12676v1 link
2024-07-17 Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs Lorenzo Lamberti et.al. 2407.12675v1 null
2024-07-17 Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data Richard Osuala et.al. 2407.12669v1 null
2024-07-17 Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge Andrea Albanese et.al. 2407.12663v1 null
2024-07-16 Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling Jaehyeok Kim et.al. 2407.11962v1 null
2024-07-16 A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets Ahmad Abdellatif et.al. 2407.11955v1 null
2024-07-16 Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation Olga Zatsarynna et.al. 2407.11954v1 null
2024-07-16 Temporally Consistent Stereo Matching Jiaxi Zeng et.al. 2407.11950v1 link
2024-07-17 Hierarchical Separable Video Transformer for Snapshot Compressive Imaging Ping Wang et.al. 2407.11946v2 link
2024-07-16 Tackling Oversmoothing in GNN via Graph Sparsification: A Truss-based Approach Tanvir Hossain et.al. 2407.11928v1 null
2024-07-16 The Strength of Bisymmetric Modes in SDSS-IV/MaNGA Barred Galaxy Kinematics Brian DiGiorgio Zanger et.al. 2407.11908v1 null
2024-07-16 GraphFM: A Scalable Framework for Multi-Graph Pretraining Divyansha Lachi et.al. 2407.11907v1 null
2024-07-16 SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge Hao Ding et.al. 2407.11906v1 null
2024-07-16 Automated production of batched unclonable micro-patterns anti-counterfeiting labels with strong robustness and rapid recognition speed Yuzheng He et.al. 2407.11886v1 null
2024-07-15 No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations Walter Simoncini et.al. 2407.10964v1 link
2024-07-15 InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models Nirat Saini et.al. 2407.10958v1 null
2024-07-15 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models Chengguang Gan et.al. 2407.10953v1 null
2024-07-15 IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation Yuanhao Zhai et.al. 2407.10937v1 link
2024-07-15 Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together Dilara Soylu et.al. 2407.10930v1 null
2024-07-15 In-Loop Filtering via Trained Look-Up Tables Zhuoyuan Li et.al. 2407.10926v1 null
2024-07-15 A Dual-Attention Aware Deep Convolutional Neural Network for Early Alzheimer's Detection Pandiyaraju V et.al. 2407.10921v1 null
2024-07-16 DataDream: Few-shot Guided Dataset Generation Jae Myung Kim et.al. 2407.10910v2 link
2024-07-15 Interpreting Hand gestures using Object Detection and Digits Classification Sangeetha K et.al. 2407.10902v1 null
2024-07-15 Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs Leonardo Crespi et.al. 2407.10888v1 null
2024-07-12 Non-Hermitian Origin of Wannier Localizability and Detachable Topological Boundary States Daichi Nakamura et.al. 2407.09458v1 null
2024-07-12 Let Me DeCode You: Decoder Conditioning with Tabular Data Tomasz Szczepański et.al. 2407.09437v1 link
2024-07-12 Rethinking temporal self-similarity for repetitive action counting Yanan Luo et.al. 2407.09431v1 null
2024-07-12 TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models Hang Zou et.al. 2407.09424v1 null
2024-07-12 A grid of self-consistent MSG (MARCS-StaticWeather-GGchem) cool stellar, sub-stellar, and exoplanetary model atmospheres Uffe G. Jørgensen et.al. 2407.09397v1 null
2024-07-12 Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution Fajwel Fogel et.al. 2407.09392v1 link
2024-07-12 Radiance Fields from Photons Sacha Jungerman et.al. 2407.09386v1 null
2024-07-12 Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation Zhilin Zhu et.al. 2407.09367v1 link
2024-07-12 Novel clustered federated learning based on local loss Endong Gu et.al. 2407.09360v1 link
2024-07-12 Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems Ziyuan Luo et.al. 2407.09352v1 null
2024-07-11 Video Diffusion Alignment via Reward Gradients Mihir Prabhudesai et.al. 2407.08737v1 link
2024-07-11 Real-Time Anomaly Detection and Reactive Planning with Large Language Models Rohan Sinha et.al. 2407.08735v1 null
2024-07-11 WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics Abdollah Zakeri et.al. 2407.08717v1 null
2024-07-11 Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices Dina Hussein et.al. 2407.08715v1 null
2024-07-11 Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware James Seekings et.al. 2407.08704v1 null
2024-07-11 Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Zhening Xing et.al. 2407.08701v1 null
2024-07-11 ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions Jiu Feng et.al. 2407.08691v1 link
2024-07-11 Generalizable Implicit Motion Modeling for Video Frame Interpolation Zujin Guo et.al. 2407.08680v1 null
2024-07-11 Still-Moving: Customized Video Generation without Customized Video Data Hila Chefer et.al. 2407.08674v1 null
2024-07-11 NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning Yi Zhang et.al. 2407.08672v1 null
2024-07-10 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Feng Li et.al. 2407.07895v1 link
2024-07-10 Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation Tao Chen et.al. 2407.07884v1 null
2024-07-10 Controlling Space and Time with Diffusion Models Daniel Watson et.al. 2407.07860v1 null
2024-07-11 Functional Assessment of Cerebral Capillaries using Single Capillary Reporters in Ultrasound Localization Microscopy Stephen A Lee et.al. 2407.07857v2 null
2024-07-10 Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification Mei Qiu et.al. 2407.07842v1 null
2024-07-10 Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective Shengjia Chen et.al. 2407.07841v1 link
2024-07-10 Probe and Prejudice: Classification of compact objects and model comparison using EOS knowledge Hauke Koehn et.al. 2407.07837v1 null
2024-07-10 RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement Honglie Chen et.al. 2407.07825v1 null
2024-07-10 New Gravitational Wave Discoveries Enabled by Machine Learning Alexandra E. Koloniari et.al. 2407.07820v1 null
2024-07-10 The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others Daniel Sikar et.al. 2407.07818v1 null
2024-07-09 V-VIPE: Variational View Invariant Pose Embedding Mara Levy et.al. 2407.07092v1 null
2024-07-09 Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic Ruochen Jin et.al. 2407.07089v1 link
2024-07-09 MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images Ziyang Xu et.al. 2407.07078v1 link
2024-07-09 MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder Md Rakibul Hasan et.al. 2407.07076v1 null
2024-07-10 CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement Wei Wang et.al. 2407.07056v2 null
2024-07-09 Latent Space Imaging Matheus Souza et.al. 2407.07052v1 null
2024-07-09 Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs Christian Riefolo et.al. 2407.07045v1 null
2024-07-09 Free Fermionic Constructions of Heterotic Strings Ioannis Florakis et.al. 2407.07034v1 null
2024-07-09 Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition Daiqing Wu et.al. 2407.07026v1 null
2024-07-09 Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization Jeongseok Hyun et.al. 2407.07024v1 link
2024-07-08 Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Orr Zohar et.al. 2407.06189v1 link
2024-07-08 Classification of Cellular Automata based on the Hamming distance Gaspar Alfaro et.al. 2407.06175v1 null
2024-07-08 The Tug-of-War Between Deepfake Generation and Detection Hannah Lee et.al. 2407.06174v1 null
2024-07-08 PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes Mohammad Reza Karimi Dastjerdi et.al. 2407.06150v1 null
2024-07-08 Physics-informed machine learning approaches to reactor antineutrino detection Sophia Farrell et.al. 2407.06139v1 null
2024-07-08 Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities Avinash Anand et.al. 2407.06125v1 null
2024-07-08 Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation Xinyu Bai et.al. 2407.06095v1 null
2024-07-08 ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions Micol Spitale et.al. 2407.06094v1 null
2024-07-08 Artificial Intuition: Efficient Classification of Scientific Abstracts Harsh Sakhrani et.al. 2407.06093v1 null
2024-07-08 Assessing Cardiomegaly in Dogs Using a Simple CNN Model Nikhil Deekonda et.al. 2407.06092v1 null
2024-07-05 VCoME: Verbal Video Composition with Multimodal Editing Effects Weibo Gong et.al. 2407.04697v1 null
2024-07-05 Enhancing Vehicle Re-identification and Matching for Weaving Analysis Mei Qiu et.al. 2407.04688v1 null
2024-07-05 Embracing Massive Medical Data Yu-Cheng Chou et.al. 2407.04687v1 link
2024-07-05 Is plantar thermography a valid digital biomarker for characterising diabetic foot ulceration risk? Akshay Jagadeesh et.al. 2407.04676v1 null
2024-07-05 AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation Yuhan Zhu et.al. 2407.04603v1 null
2024-07-05 Multimodal Classification via Modal-Aware Interactive Enhancement Qing-Yuan Jiang et.al. 2407.04587v1 null
2024-07-05 A Degree Bound for Planar Functions Christof Beierle et.al. 2407.04570v1 null
2024-07-05 Pencils of plane cubics with one base point Riccardo Moschetti et.al. 2407.04569v1 null
2024-07-05 Anticipating Solar Flares Hugh S. Hudson et.al. 2407.04567v1 null
2024-07-05 Real Time Emotion Analysis Using Deep Learning for Education, Entertainment, and Beyond Abhilash Khuntia et.al. 2407.04560v1 null
2024-07-03 InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Pan Zhang et.al. 2407.03320v1 link
2024-07-03 Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations Trevor Ablett et.al. 2407.03311v1 link
2024-07-03 Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method Sijie Xu et.al. 2407.03308v1 link
2024-07-03 HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization Yucheng Tang et.al. 2407.03307v1 null
2024-07-03 VCHAR: Variance-Driven Complex Human Activity Recognition framework with Generative Representation Yuan Sun et.al. 2407.03291v1 null
2024-07-03 Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model Jingyuan Hong et.al. 2407.03274v1 null
2024-07-03 Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later Han-Jia Ye et.al. 2407.03257v1 link
2024-07-03 STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data Kheir Eddine Daouadi et.al. 2407.03253v1 null
2024-07-03 ACTRESS: Active Retraining for Semi-supervised Visual Grounding Weitai Kang et.al. 2407.03251v1 null
2024-07-04 TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach Weikun Peng et.al. 2407.03245v2 null
2024-07-02 Characterizing the Interpretability of Attention Maps in Digital Pathology Tomé Albuquerque et.al. 2407.02484v1 null
2024-07-02 Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets Kheir Eddine Daouadi et.al. 2407.02448v1 null
2024-07-02 PLeaS -- Merging Models with Permutations and Least Squares Anshul Nasery et.al. 2407.02447v1 null
2024-07-02 Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates Dorothea MacPhail et.al. 2407.02432v1 null
2024-07-02 AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans Gabriele Lozupone et.al. 2407.02418v1 link
2024-07-03 Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs Jinmin Li et.al. 2407.02411v2 null
2024-07-02 Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones Lorenzo Lamberti et.al. 2407.02405v1 null
2024-07-03 A neural networks method to search for long transient gravitational waves Francesca Attadio et.al. 2407.02391v2 null
2024-07-02 Real HSI-MSI-PAN image dataset for the hyperspectral/multi-spectral/panchromatic image fusion and super-resolution fields Shuangliang Li et.al. 2407.02387v1 link
2024-07-02 OpenSlot: Mixed Open-set Recognition with Object-centric Learning Xu Yin et.al. 2407.02386v1 null
2024-06-28 Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs Sukmin Yun et.al. 2406.20098v1 link
2024-06-28 LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression Jieneng Chen et.al. 2406.20092v1 link
2024-06-28 Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints Arnab Auddy et.al. 2406.20088v1 null
2024-06-28 Extreme horizon equation Wojciech Kamiński et.al. 2406.20068v1 null
2024-06-28 Modeling and LQR Control of Insect Sized Flapping Wing Robot Daksh Dhingra et.al. 2406.20061v1 null
2024-06-28 Pairwise Difference Learning for Classification Mohamed Karim Belaid et.al. 2406.20031v1 link
2024-06-28 On the Trade-off between Flatness and Optimization in Distributed Learning Ying Cao et.al. 2406.20006v1 null
2024-06-28 Malaria Cell Detection Using Deep Neural Networks Saurabh Sawant et.al. 2406.20005v1 null
2024-06-28 Impact of Initialization on Intra-subject Pediatric Brain MR Image Registration: A Comparative Analysis between SyN ANTs and Deep Learning-Based Approaches Andjela Dimitrijevic et.al. 2406.19943v1 link
2024-07-01 GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection Chih-Chung Hsu et.al. 2406.19941v2 link
2024-06-27 ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos Jr-Jen Chen et.al. 2406.19392v1 link
2024-06-27 Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads Ali Khaleghi Rahimian et.al. 2406.19391v1 link
2024-06-27 OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Tao Zhang et.al. 2406.19389v1 null
2024-06-27 Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model Haobo Yuan et.al. 2406.19369v1 null
2024-06-27 IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language Lucky Susanto et.al. 2406.19349v1 null
2024-06-27 Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation Yushun Tang et.al. 2406.19341v1 null
2024-06-28 LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans Kaushalya Sivayogaraj et.al. 2406.19336v2 null
2024-06-27 PNeRV: A Polynomial Neural Representation for Videos Sonam Gupta et.al. 2406.19299v1 null
2024-06-27 Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers Jinsong Chen et.al. 2406.19258v1 null
2024-06-27 Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment Hao Fei et.al. 2406.19255v1 null
2024-06-26 Towards Compositionality in Concept Learning Adam Stein et.al. 2406.18534v1 link
2024-06-26 MatchTime: Towards Automatic Soccer Game Commentary Generation Jiayuan Rao et.al. 2406.18530v1 null
2024-06-26 MultiDiff: Consistent Novel View Synthesis from a Single Image Norman Müller et.al. 2406.18524v1 null
2024-06-26 ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Shenghai Yuan et.al. 2406.18522v1 null
2024-06-27 Distinguishing mechanisms of social contagion from local network view Elsa Andres et.al. 2406.18519v2 null
2024-06-26 Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population Sangeon Ryu et.al. 2406.18508v1 null
2024-06-26 Robust Surgical Phase Recognition From Annotation Efficient Supervision Or Rubin et.al. 2406.18481v1 null
2024-06-26 Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method Sascha Caron et.al. 2406.18469v1 null
2024-06-26 An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors Xingfu Wu et.al. 2406.18445v1 null
2024-06-26 Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling Abril Corona-Figueroa et.al. 2406.18422v1 null
2024-06-25 Text-Animator: Controllable Visual Text Video Generation Lin Liu et.al. 2406.17777v1 null
2024-06-25 MotionBooth: Motion-Aware Customized Text-to-Video Generation Jianzong Wu et.al. 2406.17758v1 null
2024-06-25 Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation Tushar Prasanna Swaminathan et.al. 2406.17749v1 null
2024-06-25 Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning Arijit Sehanobish et.al. 2406.17740v1 null
2024-06-25 Mask-Guided Attention U-Net for Enhanced Neonatal Brain Extraction and Image Preprocessing Bahram Jafrasteh et.al. 2406.17709v1 link
2024-06-25 SurgeMOD: Translating image-space tissue motions into vision-based surgical forces Mikel De Iturrate Reyzabal et.al. 2406.17707v1 link
2024-06-25 Dualities for universal (co)acting Hopf monoids Ana Agore et.al. 2406.17684v1 null
2024-06-25 Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation Xuming Zhang et.al. 2406.17679v1 null
2024-06-25 Lifting of locally initial objects and universal (co)acting Hopf algebras Ana Agore et.al. 2406.17677v1 null
2024-06-25 Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration Mohammad Ali Labbaf Khaniki et.al. 2406.17670v1 null
2024-06-24 StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal Chongjie Ye et.al. 2406.16864v1 null
2024-06-24 FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models Haonan Qiu et.al. 2406.16863v1 link
2024-06-24 Dreamitate: Real-World Visuomotor Policy Learning via Video Generation Junbang Liang et.al. 2406.16862v1 null
2024-06-24 Long Context Transfer from Language to Vision Peiyuan Zhang et.al. 2406.16852v1 link
2024-06-24 Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation Jingru Fu et.al. 2406.16848v1 null
2024-06-24 Exploring Factual Entailment with NLI: A News Media Study Guy Mor-Lan et.al. 2406.16842v1 null
2024-06-24 A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking Lorenzo Shaikewitz et.al. 2406.16837v1 null
2024-06-24 USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations Mounika Marreddy et.al. 2406.16833v1 null
2024-06-24 The classification of simple complex Lie superalgebras of polynomial vector fields and their deformations Dimitry Leites et.al. 2406.16760v1 null
2024-06-24 The MRI Scanner as a Diagnostic: Image-less Active Sampling Yuning Du et.al. 2406.16754v1 null
2024-06-21 Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness Hannes Ulrich et.al. 2406.15340v1 null
2024-06-21 Image Conductor: Precision Control for Interactive Video Synthesis Yaowei Li et.al. 2406.15339v1 null
2024-06-21 An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT Sondos Aabed et.al. 2406.15329v1 null
2024-06-21 Fine-grained Attention in Hierarchical Transformers for Tabular Time-series Raphael Azorin et.al. 2406.15327v1 link
2024-06-21 NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing Tim Schopf et.al. 2406.15294v1 link
2024-06-21 Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics Weijia Zhang et.al. 2406.15264v1 null
2024-06-24 VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Xuan He et.al. 2406.15252v2 null
2024-06-21 Retrieval Augmented Zero-Shot Text Classification Tassallah Abdullahi et.al. 2406.15241v1 null
2024-06-21 Model Equivalences Michael Benedikt et.al. 2406.15235v1 null
2024-06-21 Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study Xinze Lyu et.al. 2406.15217v1 null
2024-06-20 A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models Xincheng Shuai et.al. 2406.14555v1 link
2024-06-21 Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation Eyal Michaeli et.al. 2406.14551v2 link
2024-06-20 IRASim: Learning Interactive Real-Robot Action Simulators Fangqi Zhu et.al. 2406.14540v1 null
2024-06-20 Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration Long Lei et.al. 2406.14534v1 link
2024-06-20 Local symmetries in partially ordered sets Christoph Minz et.al. 2406.14533v1 null
2024-06-20 Fantastic Copyrighted Beasts and How (Not) to Generate Them Luxi He et.al. 2406.14526v1 null
2024-06-20 MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Xinyu Fang et.al. 2406.14515v1 link
2024-06-20 V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data Rotem Shalev-Arkushin et.al. 2406.14510v1 null
2024-06-20 LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors Sheikh Asif Imran et.al. 2406.14498v1 link
2024-06-20 African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification Gregor Geigle et.al. 2406.14496v1 null
2024-06-18 DrVideo: Document Retrieval Based Long Video Understanding Ziyu Ma et.al. 2406.12846v1 null
2024-06-18 LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging Jinuk Kim et.al. 2406.12837v1 link
2024-06-18 GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation Ci-Siang Lin et.al. 2406.12834v1 null
2024-06-18 VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing Jing Gu et.al. 2406.12831v1 null
2024-06-18 Neural Approximate Mirror Maps for Constrained Diffusion Models Berthy T. Feng et.al. 2406.12816v1 null
2024-06-18 Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation Nikolas Koutsoubis et.al. 2406.12815v1 link
2024-06-18 Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs Joshua Durso-Finley et.al. 2406.12807v1 null
2024-06-18 Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition Xingming Liao et.al. 2406.12779v1 null
2024-06-18 Medvedev degrees of subshifts on groups Sebastián Barbieri et.al. 2406.12777v1 null
2024-06-18 Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video Xiangming Zhu et.al. 2406.12769v1 null
2024-06-17 Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Lei Zhu et.al. 2406.11837v1 link
2024-06-17 Spectral Introspection Identifies Group Training Dynamics in Deep Neural Networks for Neuroimaging Bradley T. Baker et.al. 2406.11825v1 null
2024-06-17 Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation Alexander Raistrick et.al. 2406.11824v1 null
2024-06-17 VideoLLM-online: Online Video Large Language Model for Streaming Video Joya Chen et.al. 2406.11816v1 null
2024-06-17 Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain Pooja Prajod et.al. 2406.11808v1 null
2024-06-17 Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation Song Wang et.al. 2406.11799v1 null
2024-06-17 CELL your Model: Contrastive Explanation Methods for Large Language Models Ronny Luss et.al. 2406.11785v1 null
2024-06-17 Task Me Anything Jieyu Zhang et.al. 2406.11775v1 link
2024-06-17 Domain Generalization for In-Orbit 6D Pose Estimation Antoine Legrand et.al. 2406.11743v1 null
2024-06-17 Lightweight Model Pre-training via Language Guided Knowledge Distillation Mingsheng Li et.al. 2406.11689v1 link
2024-06-14 VideoGUI: A Benchmark for GUI Automation from Instructional Videos Kevin Qinghong Lin et.al. 2406.10227v1 null
2024-06-14 Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding Ridouane Ghermi et.al. 2406.10221v1 null
2024-06-14 SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation Ziang Xu et.al. 2406.10200v1 null
2024-06-14 CarLLaVA: Vision language models for camera-only closed-loop driving Katrin Renz et.al. 2406.10165v1 null
2024-06-14 Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition Guinan Li et.al. 2406.10152v1 null
2024-06-14 Training-free Camera Control for Video Generation Chen Hou et.al. 2406.10126v1 null
2024-06-14 Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression Haresh Rengaraj Rajamohan et.al. 2406.10119v1 null
2024-06-14 ECGMamba: Towards Efficient ECG Classification with BiSSM Yupeng Qiang et.al. 2406.10098v1 null
2024-06-14 Biomarker based Cancer Classification using an Ensemble with Pre-trained Models Chongmin Lee et.al. 2406.10087v1 null
2024-06-14 On the Evaluation of Speech Foundation Models for Spoken Language Understanding Siddhant Arora et.al. 2406.10083v1 null
2024-06-13 VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding Muhammad Maaz et.al. 2406.09418v1 link
2024-06-13 An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Duy-Kien Nguyen et.al. 2406.09415v1 null
2024-06-13 CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras Sachin Shah et.al. 2406.09409v1 null
2024-06-13 Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion Linzhan Mou et.al. 2406.09402v1 null
2024-06-13 OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation Junke Wang et.al. 2406.09399v1 link
2024-06-13 Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA Jongwoo Park et.al. 2406.09396v1 null
2024-06-13 LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living Rajatsubhra Chakraborty et.al. 2406.09390v1 null
2024-06-13 Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior Baiang Li et.al. 2406.09389v1 null
2024-06-13 Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition Youngtaek Oh et.al. 2406.09388v1 link
2024-06-13 SimGen: Simulator-conditioned Driving Scene Generation Yunsong Zhou et.al. 2406.09386v1 null
2024-06-12 On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models Hashmat Shadab Malik et.al. 2406.08486v1 link
2024-06-12 RMem: Restricted Memory Banks Improve Video Object Segmentation Junbao Zhou et.al. 2406.08476v1 null
2024-06-12 AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind Wei Ding et.al. 2406.08455v1 null
2024-06-12 Transformation-Dependent Adversarial Attacks Yaoteng Tan et.al. 2406.08443v1 null
2024-06-12 A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil Philipe Melo et.al. 2406.08429v1 null
2024-06-12 Improving Noise Robustness through Abstractions and its Impact on Machine Learning Alfredo Ibias et.al. 2406.08428v1 null
2024-06-12 OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Qingyun Li et.al. 2406.08418v1 link
2024-06-13 MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Xuehai He et.al. 2406.08407v2 link
2024-06-12 Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze Michele Mazzamuto et.al. 2406.08379v1 null
2024-06-12 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction Tianqi Chen et.al. 2406.08374v1 null
2024-06-11 Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring Huicong Zhang et.al. 2406.07551v1 link
2024-06-11 Image and Video Tokenization with Binary Spherical Quantization Yue Zhao et.al. 2406.07548v1 link
2024-06-11 Zero-shot Image Editing with Reference Imitation Xi Chen et.al. 2406.07547v1 null
2024-06-11 Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance Kuan Heng Lin et.al. 2406.07540v1 null
2024-06-11 BAKU: An Efficient Transformer for Multi-Task Policy Learning Siddhant Haldar et.al. 2406.07539v1 null
2024-06-11 Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection J. Schueler et.al. 2406.07538v1 null
2024-06-11 Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection Wenxiao Wang et.al. 2406.07536v1 null
2024-06-11 Dynamics of the non-radial energy-critical inhomogeneous NLS Carlos M. Guzmán et.al. 2406.07535v1 null
2024-06-11 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement Yunzhen Feng et.al. 2406.07515v1 null
2024-06-11 Understanding Visual Concepts Across Models Brandon Trabucco et.al. 2406.07506v1 link
2024-06-10 NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Ting-Hsuan Chen et.al. 2406.06523v1 null
2024-06-10 Data Augmentation for Multivariate Time Series Classification: An Experimental Study Romain Ilbert et.al. 2406.06518v1 null
2024-06-10 Merlin: A Vision Language Foundation Model for 3D Computed Tomography Louis Blankemeier et.al. 2406.06512v1 null
2024-06-10 Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer Sigal Raab et.al. 2406.06508v1 link
2024-06-10 Equivariant Neural Tangent Kernels Philipp Misof et.al. 2406.06504v1 null
2024-06-10 Viscous shock fluctuations in KPZ Alexander Dunlap et.al. 2406.06502v1 null
2024-06-10 NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative Asmar Nadeem et.al. 2406.06499v1 null
2024-06-10 Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace Chenxu Wang et.al. 2406.06498v1 null
2024-06-10 Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data Nicole Hayes et.al. 2406.06479v1 null
2024-06-10 DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents Olivia Figueira et.al. 2406.06473v1 null
2024-06-07 DVOS: Self-Supervised Dense-Pattern Video Object Segmentation Keyhan Najafian et.al. 2406.05131v1 null
2024-06-07 Compositional Curvature Bounds for Deep Neural Networks Taha Entesari et.al. 2406.05119v1 null
2024-06-07 Large Generative Graph Models Yu Wang et.al. 2406.05109v1 null
2024-06-07 A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification Christian Giannetti et.al. 2406.05096v1 null
2024-06-10 Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 with JWST J. D. R. Pierel et.al. 2406.05089v2 null
2024-06-07 CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion Xingrui Wang et.al. 2406.05082v1 null
2024-06-10 Discovery of a Relativistic Stripped Envelope Type Ic-BL Supernova at z = 2.83 with JWST M. R. Siebert et.al. 2406.05076v2 null
2024-06-07 Diving Deep into the Motion Representation of Video-Text Models Chinmaya Devaraj et.al. 2406.05075v1 null
2024-06-07 Hibou: A Family of Foundational Vision Transformers for Pathology Dmitry Nechaev et.al. 2406.05074v1 null
2024-06-07 Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations Benjamin Fresz et.al. 2406.05068v1 link
2024-06-06 Verbalized Machine Learning: Revisiting Machine Learning with Language Models Tim Z. Xiao et.al. 2406.04344v1 null
2024-06-07 Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Fangfu Liu et.al. 2406.04338v2 null
2024-06-06 Parameter-Inverted Image Pyramid Networks Xizhou Zhu et.al. 2406.04330v1 link
2024-06-06 ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Lin Chen et.al. 2406.04325v1 null
2024-06-06 SF-V: Single Forward Video Generation Model Zhixing Zhang et.al. 2406.04324v1 null
2024-06-06 ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories Qianlan Yang et.al. 2406.04323v1 null
2024-06-06 VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling Zeyue Tian et.al. 2406.04321v1 link
2024-06-06 Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models Ali Behrouz et.al. 2406.04320v1 null
2024-06-06 Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction Chen-Yu Yen et.al. 2406.04318v1 null
2024-06-06 Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks Tristan Cinquin et.al. 2406.04317v1 null
2024-06-05 Grokking Modular Polynomials Darshil Doshi et.al. 2406.03495v1 null
2024-06-05 The Logarithmic Memristor-Based Bayesian Machine Clément Turck et.al. 2406.03492v1 null
2024-06-05 Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review Sonia Bbouzidi et.al. 2406.03478v1 null
2024-06-05 Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach Haoyu Han et.al. 2406.03464v1 null
2024-06-05 Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts Dominik Scheuble et.al. 2406.03461v1 null
2024-06-05 FILS: Self-Supervised Video Feature Prediction In Semantic Language Space Mona Ahmadian et.al. 2406.03447v1 null
2024-06-05 Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input Joachim Ott et.al. 2406.03439v1 null
2024-06-05 Stabilizing massless fields with fluxes in Landau-Ginzburg models Katrin Becker et.al. 2406.03435v1 null
2024-06-05 Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis Moein Heidari et.al. 2406.03430v1 link
2024-06-05 Post-hoc Part-prototype Networks Andong Tan et.al. 2406.03421v1 null
2024-06-05 Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting Inkyu Shin et.al. 2406.02541v2 null
2024-06-04 ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation Tianchen Zhao et.al. 2406.02540v1 null
2024-06-04 Enhancing predictive imaging biomarker discovery through treatment effect analysis Shuhan Xiao et.al. 2406.02534v1 null
2024-06-04 ReLUs Are Sufficient for Learning Implicit Neural Representations Joseph Shenouda et.al. 2406.02529v1 link
2024-06-04 RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Soroush Nasiriany et.al. 2406.02523v1 null
2024-06-04 DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering Zhongpai Gao et.al. 2406.02518v1 null
2024-06-04 V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Cong Wang et.al. 2406.02511v1 null
2024-06-04 CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation Dejia Xu et.al. 2406.02509v1 null
2024-06-04 Endomorphisms of Artin groups of type $\tilde A_n$ Luis Paris et.al. 2406.02484v1 null
2024-06-04 Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion Colin Hansen et.al. 2406.02477v1 null
2024-05-31 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Chaoyou Fu et.al. 2405.21075v1 null
2024-05-31 Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights Xin Wen et.al. 2405.21070v1 link
2024-05-31 You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet Zhen Qin et.al. 2405.21022v1 null
2024-05-31 Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging Niloufar Zakariaei et.al. 2405.21021v1 null
2024-05-31 The classification of dp-minimal integral domains Christian d'Elbée et.al. 2405.21014v1 null
2024-05-31 Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging Muhammad Muneeb Saad et.al. 2405.20987v1 null
2024-05-31 PUAL: A Classifier on Trifurcate Positive-Unlabeled Data Xiaoke Wang et.al. 2405.20970v1 null
2024-05-31 Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score Nathan Tsoi et.al. 2405.20954v1 null
2024-05-31 Standard model of electromagnetism and chirality in crystals R. Winkler et.al. 2405.20940v1 null
2024-05-31 MALT: Multi-scale Action Learning Transformer for Online Action Detection Zhipeng Yang et.al. 2405.20892v1 null
2024-05-30 MotionLLM: Understanding Human Behaviors from Human Motions and Videos Ling-Hao Chen et.al. 2405.20340v1 null
2024-05-30 OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving Lening Wang et.al. 2405.20337v1 link
2024-05-30 VividDream: Generating 3D Scene with Ambient Dynamics Yao-Chih Lee et.al. 2405.20334v1 null
2024-05-30 SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos Chinedu Innocent Nwoye et.al. 2405.20333v1 null
2024-05-31 4DHands: Reconstructing Interactive Hands in 4D with Transformers Dixuan Lin et.al. 2405.20330v2 null
2024-05-30 MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion Shuyuan Tu et.al. 2405.20325v1 null
2024-05-30 Vision-based Manipulation from Single Human Video with Open-World Object Graphs Yifeng Zhu et.al. 2405.20321v1 null
2024-05-30 Improving the Training of Rectified Flows Sangyun Lee et.al. 2405.20320v1 link
2024-05-30 CausalQuest: Collecting Natural Causal Questions for AI Agents Roberto Ceraolo et.al. 2405.20318v1 link
2024-05-30 Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models Himangi Mittal et.al. 2405.20305v1 null
2024-05-29 X-VILA: Cross-Modality Alignment for Large Language Model Hanrong Ye et.al. 2405.19335v1 null
2024-05-29 LLMs Meet Multimodal Generation and Editing: A Survey Yingqing He et.al. 2405.19334v1 link
2024-05-29 Multi-Modal Generative Embedding Model Feipeng Ma et.al. 2405.19333v1 null
2024-05-29 NPGA: Neural Parametric Gaussian Avatars Simon Giebenhain et.al. 2405.19331v1 null
2024-05-29 Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation Atrisha Sarkar et.al. 2405.19328v1 null
2024-05-29 DGD: Dynamic 3D Gaussians Distillation Isaac Labe et.al. 2405.19321v1 null
2024-05-29 Real-Time Environment Condition Classification for Autonomous Vehicles Marco Introvigne et.al. 2405.19305v1 null
2024-05-29 Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare Hanwei Zhu et.al. 2405.19298v1 null
2024-05-29 Archetype-Based Redshift Estimation for the Dark Energy Spectroscopic Instrument Survey Abhijeet Anand et.al. 2405.19288v1 null
2024-05-29 A study on the adequacy of common IQA measures for medical images Anna Breger et.al. 2405.19224v1 null
2024-05-28 Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets Khen Cohen et.al. 2405.18427v1 null
2024-05-28 GFlow: Recovering 4D World from Monocular Video Shizun Wang et.al. 2405.18426v1 null
2024-05-28 Hierarchical World Models as Visual Whole-Body Humanoid Controllers Nicklas Hansen et.al. 2405.18418v1 null
2024-05-28 3D StreetUnveiler with Semantic-Aware 2DGS Jingwei Xu et.al. 2405.18416v1 null
2024-05-28 Why are Visually-Grounded Language Models Bad at Image Classification? Yuhui Zhang et.al. 2405.18415v1 link
2024-05-28 Towards a Sampling Theory for Implicit Neural Representations Mahrokh Najaf et.al. 2405.18410v1 null
2024-05-28 Phased Consistency Model Fu-Yun Wang et.al. 2405.18407v1 null
2024-05-28 RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives Jaehong Yoon et.al. 2405.18406v1 null
2024-05-28 MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning Somnath Kumar et.al. 2405.18358v1 null
2024-05-28 Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography Jie Liu et.al. 2405.18356v1 link
2024-05-27 Matryoshka Multimodal Models Mu Cai et.al. 2405.17430v1 null
2024-05-27 NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models Chankyu Lee et.al. 2405.17428v1 null
2024-05-27 MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds Jiahui Lei et.al. 2405.17421v1 null
2024-05-27 Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Zhengfei Kuang et.al. 2405.17414v1 null
2024-05-27 Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization Navin Kamuni et.al. 2405.17413v1 null
2024-05-27 The Peripatetic Hater: Predicting Movement Among Hate Subreddits Daniel Hickey et.al. 2405.17410v1 null
2024-05-27 Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Ruizhi Shao et.al. 2405.17405v1 null
2024-05-27 Spectral Greedy Coresets for Graph Neural Networks Mucong Ding et.al. 2405.17404v1 null
2024-05-27 Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability Shenyuan Gao et.al. 2405.17398v1 link
2024-05-27 Non-Unitary Quantum Machine Learning Jamie Heredge et.al. 2405.17388v1 null
2024-05-24 Canonical Variates in Wasserstein Metric Space Jia Li et.al. 2405.15768v1 null
2024-05-24 Scaling Laws for Discriminative Classification in Large Language Models Dean Wyatte et.al. 2405.15765v1 null
2024-05-24 InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Yuchi Wang et.al. 2405.15758v1 link
2024-05-24 Looking Backward: Streaming Video-to-Video Translation with Feature Banks Feng Liang et.al. 2405.15757v1 link
2024-05-24 Characterizing Discourse Group Roles in Inquiry-based University Science Labs Tong Wan et.al. 2405.15746v1 null
2024-05-24 Hierarchical Uncertainty Exploration via Feedforward Posterior Trees Elias Nehme et.al. 2405.15719v1 null
2024-05-24 EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences Jocelyn Shen et.al. 2405.15708v1 null
2024-05-24 Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates Jinbo Peng et.al. 2405.15705v1 null
2024-05-24 realSEUDO for real-time calcium imaging analysis Iuliia Dmitrieva et.al. 2405.15701v1 null
2024-05-24 UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes Ted Lentsch et.al. 2405.15688v1 null
2024-05-23 PuzzleAvatar: Assembling 3D Avatars from Personal Albums Yuliang Xiu et.al. 2405.14869v1 null
2024-05-23 Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis Basile Van Hoorick et.al. 2405.14868v1 null
2024-05-23 Video Diffusion Models are Training-free Motion Interpreter and Controller Zeqi Xiao et.al. 2405.14864v1 null
2024-05-23 Synergistic Global-space Camera and Human Reconstruction from Videos Yizhou Zhao et.al. 2405.14855v1 null
2024-05-23 Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron Can Cui et.al. 2405.14851v1 null
2024-05-23 Learning to Detect and Segment Mobile Objects from Unlabeled Videos Yihong Sun et.al. 2405.14841v1 null
2024-05-23 Designing A Sustainable Marine Debris Clean-up Framework without Human Labels Raymond Wang et.al. 2405.14815v1 null
2024-05-23 As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making Shomik Jain et.al. 2405.14812v1 null
2024-05-23 Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics Jonas Spinner et.al. 2405.14806v1 null
2024-05-24 Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation Hongxu Jiang et.al. 2405.14802v2 link
2024-05-21 Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma Ahmed Gomaa et.al. 2405.12963v1 null
2024-05-21 **Online Learning of Halfspaces with Massart N
