GitHub - DWCTOD/cv-arxiv-daily

Updated on 2025.07.26

Video_Classification

Publish Date	Title	Authors	PDF (arXiv ID)	Code
2025-07-23	Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility	Melih Barsbey et.al.	2507.17748v1	null
2025-07-23	Yume: An Interactive World Generation Model	Xiaofeng Mao et.al.	2507.17744v1	null
2025-07-23	BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems	Malsha Ashani Mahawatta Dona et.al.	2507.17722v1	null
2025-07-23	Towards Effective Open-set Graph Class-incremental Learning	Jiazhen Chen et.al.	2507.17687v1	null
2025-07-23	Audio-Vision Contrastive Learning for Phonological Class Recognition	Daiqi Liu et.al.	2507.17682v1	null
2025-07-23	MCM: Mamba-based Cardiac Motion Tracking using Sequential Images in MRI	Jiahui Yin et.al.	2507.17678v1	null
2025-07-23	Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography	Farnoush Bayatmakou et.al.	2507.17662v1	null
2025-07-23	The Early Bird Identifies the Worm: You Can't Beat a Head Start in Long-Term Body Re-ID (ECHO-BID)	Thomas M. Metz et.al.	2507.17640v1	null
2025-07-23	Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries	Victor Hartman et.al.	2507.17636v1	null
2025-07-23	Gauge Symmetries, Exact Symmetries and Conserved Charges in Minimal Massive Gravity	Kang Liu et.al.	2507.17635v1	null
2025-07-22	MultiTaskDeltaNet: Change Detection-based Image Segmentation for Operando ETEM with Application to Carbon Gasification Kinetics	Yushuo Niu et.al.	2507.16803v1	null
2025-07-22	Improving U-Net Confidence on TEM Image Data with L2-Regularization, Transfer Learning, and Deep Fine-Tuning	Aiden Ochoa et.al.	2507.16779v1	null
2025-07-22	Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks	Marcel Kleinmann et.al.	2507.16761v1	null
2025-07-22	Improving Model Classification by Optimizing the Training Dataset	Morad Tukan et.al.	2507.16729v1	null
2025-07-22	SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing	Jinbo Hu et.al.	2507.16724v1	null
2025-07-22	Temporally-Constrained Video Reasoning Segmentation and Automated Benchmark Construction	Yiqing Shen et.al.	2507.16718v1	null
2025-07-22	A Tutorial on MRI Reconstruction: From Modern Methods to Clinical Implications	Tolga Çukur et.al.	2507.16715v1	null
2025-07-22	Ring-based ML calibration with in situ pileup correction for real-time jet triggers	Benjamin T. Carlson et.al.	2507.16686v1	null
2025-07-22	VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models	Duong Nguyen et.al.	2507.16685v1	null
2025-07-22	Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis	Yonghan Zhang et.al.	2507.16682v1	null
2025-07-21	Simulating the LOcal Web (SLOW) V. Thermodynamic Properties and Evolution of Local Galaxy Clusters	Elena Hernández-Martínez et.al.	2507.15858v1	null
2025-07-21	Optimized Fabrication Procedure for High-Quality Graphene-based Moiré Superlattice Devices	Shuwen Sun et.al.	2507.15853v1	null
2025-07-22	SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction	Zhixiong Zhang et.al.	2507.15852v2	null
2025-07-22	GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding	Fei Tang et.al.	2507.15846v2	null
2025-07-21	Quantum computational sensing using quantum signal processing, quantum neural networks, and Hamiltonian engineering	Saeed A. Khan et.al.	2507.15845v1	null
2025-07-21	Optimizing Canaries for Privacy Auditing with Metagradient Descent	Matteo Boglioni et.al.	2507.15836v1	null
2025-07-21	Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models	Enes Sanli et.al.	2507.15824v1	null
2025-07-21	Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets	Zihang Ma et.al.	2507.15784v1	null
2025-07-21	Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization	Feng-Qi Cui et.al.	2507.15765v1	null
2025-07-21	TokensGen: Harnessing Condensed Tokens for Long Video Generation	Wenqi Ouyang et.al.	2507.15728v1	null
2025-07-18	NGC 663 as a laboratory for massive star evolution	Amparo Marco et.al.	2507.14125v1	null
2025-07-18	Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective	Pankaj Yadav et.al.	2507.14121v1	null
2025-07-18	Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification	Daniëlle Schuman et.al.	2507.14116v1	null
2025-07-18	Maximal translation surfaces in Lorentz-Minkowski space	Rafael López et.al.	2507.14103v1	null
2025-07-18	UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography	Shravan Venkatraman et.al.	2507.14102v1	null
2025-07-18	Generative AI-Driven High-Fidelity Human Motion Simulation	Hari Iyer et.al.	2507.14097v1	null
2025-07-18	Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment	Šimon Kubov et.al.	2507.14093v1	null
2025-07-18	Unmasking Performance Gaps: A Comparative Study of Human Anonymization and Its Effects on Video Anomaly Detection	Sara Abdulaziz et.al.	2507.14083v1	null
2025-07-18	Semi-supervised classification of Stars, Galaxies and Quasars using K-means and Random Forest	Vahid Asadi et.al.	2507.14072v1	null
2025-07-18	Predicting interface and spin states in armchair graphene nanoribbon junctions	Sofia Sanz et.al.	2507.14065v1	null
2025-07-17	VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Shihao Wang et.al.	2507.13353v1	null
2025-07-17	$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning	Yifan Wang et.al.	2507.13347v1	null
2025-07-17	Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models	Yudong Jin et.al.	2507.13344v1	null
2025-07-17	Taming Diffusion Transformer for Real-Time Mobile Video Generation	Yushu Wu et.al.	2507.13343v1	null
2025-07-17	SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution	Ritik Shah et.al.	2507.13339v1	null
2025-07-17	FocusView: Understanding and Customizing Informational Video Watching Experiences for Viewers with ADHD	Hanxiu 'Hazel' Zhu et.al.	2507.13309v1	null
2025-07-17	Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy	Yiting Yang et.al.	2507.13260v1	null
2025-07-17	Signal Temporal Logic Compliant Co-design of Planning and Control	Manas Sashank Juvvi et.al.	2507.13225v1	null
2025-07-17	Leveraging Pre-Trained Visual Models for AI-Generated Video Detection	Keerthi Veeramachaneni et.al.	2507.13224v1	null
2025-07-17	Degrees of points with rational $j$-invariant on $X_{0}(n)$ and $X_{1}(n)$	Kenji Terao et.al.	2507.13199v1	null
2025-07-16	CytoSAE: Interpretable Cell Embeddings for Hematology	Muhammed Furkan Dasdelen et.al.	2507.12464v1	null
2025-07-16	MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding	Renjie Li et.al.	2507.12463v1	null
2025-07-16	SpatialTrackerV2: 3D Point Tracking Made Easy	Yuxi Xiao et.al.	2507.12462v1	null
2025-07-16	Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios	Van-Hoang-Anh Phan et.al.	2507.12449v1	null
2025-07-16	Minmax Exclusivity Classes for Power-Type Loss Functions	Stanisław M. S. Halkiewicz et.al.	2507.12447v1	null
2025-07-17	EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos	Ruihan Yang et.al.	2507.12440v2	null
2025-07-16	Heisenberg limited multiple eigenvalue estimation via off-the-grid compressed sensing	Davide Castaldo et.al.	2507.12438v1	null
2025-07-16	Energy-based models for inverse imaging problems	Andreas Habring et.al.	2507.12432v1	null
2025-07-16	Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation	Ashkan Shakarami et.al.	2507.12427v1	null
2025-07-16	DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition	Hayat Ullah et.al.	2507.12426v1	null
2025-07-15	Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation	Zhen Xu et.al.	2507.11540v1	null
2025-07-15	Streaming 4D Visual Geometry Transformer	Dong Zhuo et.al.	2507.11539v1	null
2025-07-15	Understanding Quantum Information and Computation	John Watrous et.al.	2507.11536v1	null
2025-07-15	LLM-based ambiguity detection in natural language instructions for collaborative surgical robots	Ana Davila et.al.	2507.11525v1	null
2025-07-15	Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection	Buddhi Wijenayake et.al.	2507.11523v1	null
2025-07-15	CATVis: Context-Aware Thought Visualization	Tariq Mehmood et.al.	2507.11522v1	null
2025-07-15	On the Complexity of the Optimal Correlated Equilibria in Extensive-Form Games	Vincent Cheval et.al.	2507.11509v1	null
2025-07-16	Multipass Linear Sketches for Geometric LP-Type Problems	N. Efe Çekirge et.al.	2507.11484v2	null
2025-07-15	JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks	Ioannis Panitsas et.al.	2507.11483v1	null
2025-07-15	C-FBI: A Combinatorial method using Convolutions for Circle Fitting in Blurry Images	Esteban Román Catafau et.al.	2507.11476v1	null
2025-07-14	EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Mingxian Lin et.al.	2507.10548v1	null
2025-07-14	Disentangling Neural Disjunctive Normal Form Models	Kexin Gu Baugh et.al.	2507.10546v1	null
2025-07-14	ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions	Shivangi Aneja et.al.	2507.10542v1	null
2025-07-14	A Classification of Transversal Clifford Gates for Qubit Stabilizer Codes	Shival Dasu et.al.	2507.10519v1	null
2025-07-14	Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jiangkai Wu et.al.	2507.10510v1	null
2025-07-14	Topological phases and Edge states in an exactly solvable Gamma matrix model	Akhil Pravin Furtado et.al.	2507.10509v1	null
2025-07-14	Colorful Minors	Evangelos Protopapas et.al.	2507.10467v1	null
2025-07-14	AudioMAE++: learning better masked audio representations with SwiGLU FFNs	Sarthak Yadav et.al.	2507.10464v1	null
2025-07-14	RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening	Tao Tang et.al.	2507.10461v1	null
2025-07-14	4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos	Shanshan Zhong et.al.	2507.10437v1	null
2025-07-11	Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective	Hangjie Yuan et.al.	2507.08801v1	null
2025-07-11	Mining the Alerts: A Preliminary Catalog of Compact Binaries from the Fourth Observing Run	Aleyna Akyüz et.al.	2507.08778v1	null
2025-07-11	A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification	Ahmed Farooq et.al.	2507.08766v1	null
2025-07-11	ML-Based Automata Simplification for Symbolic Accelerators	Tiffany Yu et.al.	2507.08751v1	null
2025-07-11	HieraRS: A Hierarchical Segmentation Paradigm for Remote Sensing Enabling Multi-Granularity Interpretation and Cross-Domain Transfer	Tianlong Ai et.al.	2507.08741v1	null
2025-07-11	Statistical Analysis of Early Spectra in Type II and IIb Supernovae	Maider González-Bañuelos et.al.	2507.08731v1	null
2025-07-11	RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking	Yuqiang Lin et.al.	2507.08729v1	null
2025-07-11	Distinct neurodynamics of functional brain networks in Alzheimer's disease and frontotemporal dementia as revealed by EEG	Sungwoo Ahn et.al.	2507.08728v1	null
2025-07-11	Free phases of Majorana fermions: Tenfold ways compared	Luuk Stehouwer et.al.	2507.08694v1	null
2025-07-11	Functional equations of axiomatic multiple Dirichlet series, Weyl groupoids, and quantum algebra	Will Sawin et.al.	2507.08662v1	null
2025-07-10	Multigranular Evaluation for Brain Visual Decoding	Weihao Xia et.al.	2507.07993v1	null
2025-07-10	Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs	Jeongseok Hyun et.al.	2507.07990v1	null
2025-07-10	CLIP Won't Learn Object-Attribute Binding from Natural Data and Here is Why	Bijay Gurung et.al.	2507.07985v1	null
2025-07-10	Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling	Haoyu Wu et.al.	2507.07982v1	null
2025-07-10	Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions	Longfei Li et.al.	2507.07978v1	null
2025-07-10	Scaling RL to Long Videos	Yukang Chen et.al.	2507.07966v1	null
2025-07-10	Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency	Abolfazl Zarghani et.al.	2507.07938v1	null
2025-07-10	Working with AI: Measuring the Occupational Implications of Generative AI	Kiran Tomlinson et.al.	2507.07935v1	null
2025-07-10	Measuring Hypothesis Testing Errors in the Evaluation of Retrieval Systems	Jack McKechnie et.al.	2507.07924v1	null
2025-07-10	ArteryX: Advancing Brain Artery Feature Extraction with Vessel-Fused Networks and a Robust Validation Framework	Abrar Faiyaz et.al.	2507.07920v1	null
2025-07-10	DTECT: Dynamic Topic Explorer & Context Tracker	Suman Adhya et.al.	2507.07910v1	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105v1	null
2025-07-09	Exploring Public Perceptions of Generative AI in Libraries: A Social Media Analysis of X Discussions	Yuan Li et.al.	2507.07047v1	null
2025-07-09	Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics	Mehrdad Morsali et.al.	2507.07044v1	null
2025-07-09	Tilings of the sphere by congruent pentagons V: Edge combination $a^{4}b$ with rational angles	Jinjin Liang et.al.	2507.07038v1	null
2025-07-09	Classifying integral Grothendieck rings up to rank 5 and beyond	Max A. Alekseyev et.al.	2507.07023v1	null
2025-07-09	Quantum Spectral Clustering: Comparing Parameterized and Neuromorphic Quantum Kernels	Donovan Slabbert et.al.	2507.07018v1	null
2025-07-09	Deep Brain Net: An Optimized Deep Learning Model for Brain tumor Detection in MRI Images Using EfficientNetB0 and ResNet50 with Transfer Learning	Daniel Onah et.al.	2507.07011v1	null
2025-07-09	GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning	S M Taslim Uddin Raju et.al.	2507.07006v1	null
2025-07-09	BarkBeetle: Stealing Decision Tree Models with Fault Injection	Qifan Wang et.al.	2507.06986v1	null
2025-07-09	Anti-Interference Diffractive Deep Neural Networks for Multi-Object Recognition	Zhiqi Huang et.al.	2507.06978v1	null
2025-07-08	Learning to Track Any Points from Human Motion	Inès Hyeonsu Kim et.al.	2507.06233v1	null
2025-07-08	seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees	Felix Gnettner et.al.	2507.06227v1	null
2025-07-08	EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow	Yixiang Chen et.al.	2507.06224v1	null
2025-07-08	Topological Holography for Mixed-State Phases and Phase Transitions	Ran Luo et.al.	2507.06218v1	null
2025-07-08	What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23	Maria V. Pruzhinskaya et.al.	2507.06217v1	null
2025-07-08	DS@GT at CheckThat! 2025: Ensemble Methods for Detection of Scientific Discourse on Social Media	Ayush Parikh et.al.	2507.06205v1	null
2025-07-08	DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification	Maximilian Heil et.al.	2507.06195v1	null
2025-07-08	DS@GT at CheckThat! 2025: Detecting Subjectivity via Transfer-Learning and Corrective Data Augmentation	Maximilian Heil et.al.	2507.06189v1	null
2025-07-08	SoftReMish: A Novel Activation Function for Enhanced Convolutional Neural Networks for Visual Recognition Performance	Mustafa Bayram Gücen et.al.	2507.06148v1	null
2025-07-08	LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models	Zhihao Chen et.al.	2507.06140v1	null
2025-07-07	Spatio-Temporal LLM: Reasoning about Environments and Actions	Haozhen Zheng et.al.	2507.05258v1	null
2025-07-07	StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling	Meng Wei et.al.	2507.05240v1	null
2025-07-07	Bridging Expressivity and Scalability with Adaptive Unitary SSMs	Arjun Karuvally et.al.	2507.05238v1	null
2025-07-07	Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage	Markiyan Kostiv et.al.	2507.05229v1	null
2025-07-08	MedGemma Technical Report	Andrew Sellergren et.al.	2507.05201v2	null
2025-07-07	EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling	Boyuan Wang et.al.	2507.05198v1	null
2025-07-07	Light-cone vector superspace and continuous-spin field in AdS	R. R. Metsaev et.al.	2507.05194v1	null
2025-07-07	RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis	Songxiao Yang et.al.	2507.05193v1	null
2025-07-07	QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks	Hoang-Quan Nguyen et.al.	2507.05190v1	null
2025-07-07	Satellite-based Rabi rice paddy field mapping in India: a case study on Telangana state	Prashanth Reddy Putta et.al.	2507.05189v1	null
2025-07-03	MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real	Renhao Wang et.al.	2507.02864v1	null
2025-07-03	RefTok: Reference-Based Tokenization for Video Generation	Xiang Fan et.al.	2507.02862v1	null
2025-07-03	LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans	Zhening Huang et.al.	2507.02861v1	null
2025-07-03	Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching	Xin Zhou et.al.	2507.02860v1	null
2025-07-03	AnyI2V: Animating Any Conditional Image with Motion Control	Ziye Li et.al.	2507.02857v1	null
2025-07-03	Classification and Reduction of Homogeneous Star Products	Marvin Dippell et.al.	2507.02820v1	null
2025-07-03	LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion	Fangfu Liu et.al.	2507.02813v1	null
2025-07-03	HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars	Gent Serifi et.al.	2507.02803v1	null
2025-07-03	From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding	Xiangfeng Wang et.al.	2507.02790v1	null
2025-07-03	From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images	Danrong Zhang et.al.	2507.02781v1	null
2025-07-02	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks	Rahul Ramachandran et.al.	2507.01955v1	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949v1	null
2025-07-02	LongAnimation: Long Animation Generation with Dynamic Global-Local Memory	Nan Chen et.al.	2507.01945v1	null
2025-07-02	CI-VID: A Coherent Interleaved Text-Video Dataset	Yiming Ju et.al.	2507.01938v1	null
2025-07-02	evMLP: An Efficient Event-Driven MLP Architecture for Vision	Zhentan Zheng et.al.	2507.01927v1	null
2025-07-02	Advancing Magnetic Materials Discovery -- A structure-based machine learning approach for magnetic ordering and magnetic moment prediction	Apoorv Verma et.al.	2507.01913v1	null
2025-07-02	Future Slot Prediction for Unsupervised Object Discovery in Surgical Video	Guiqiu Liao et.al.	2507.01882v1	null
2025-07-02	A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs	Niccolò McConnell et.al.	2507.01881v1	null
2025-07-02	Locally Rotationally Symmetric Spacetimes in Einstein-Cartan Theory and Their Classification	Ujjwal Agarwal et.al.	2507.01840v1	null
2025-07-02	mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling	Tristan Torchet et.al.	2507.01829v1	null
2025-06-30	How to Design and Train Your Implicit Neural Representation for Video Compression	Matthew Gwilliam et.al.	2506.24127v1	null
2025-06-30	TextMesh4D: High-Quality Text-to-4D Mesh Generation	Sisi Dai et.al.	2506.24121v1	null
2025-06-30	Nonlinear Symmetry-Fragmentation of Nonabelian Anyons In Symmetry-Enriched Topological Phases: A String-Net Model Realization	Nianrui Fu et.al.	2506.24115v1	null
2025-06-30	Epona: Autoregressive Diffusion World Model for Autonomous Driving	Kaiwen Zhang et.al.	2506.24113v1	null
2025-06-30	MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction	Antoine Guédon et.al.	2506.24096v1	null
2025-06-30	SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks	Rahul Kumar et.al.	2506.24081v1	null
2025-06-30	C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism	Mayank V. Golhar et.al.	2506.24074v1	null
2025-06-30	Spectroscopy of drive-induced unwanted state transitions in superconducting circuits	W. Dai et.al.	2506.24070v1	null
2025-06-30	Evolution models with time-dependent coefficients in friction and viscoelastic damping terms	Halit Sevki Aslan et.al.	2506.24058v1	null
2025-06-30	Ella: Embodied Social Agents with Lifelong Memory	Hongxin Zhang et.al.	2506.24019v1	null
2025-06-27	Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy	Yuhao Liu et.al.	2506.22432v1	null
2025-06-27	Single-shot HDR using conventional image sensor shutter functions and optical randomization	Xiang Dai et.al.	2506.22426v1	null
2025-06-30	Dehazing Light Microscopy Images with Guided Conditional Flow Matching: finding a sweet spot between fidelity and realism	Anirban Ray et.al.	2506.22397v2	null
2025-06-27	Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment	Yue Zhang et.al.	2506.22385v1	null
2025-06-27	Topological Defect Propagation to Classify Knitted Fabrics	Daisuke S. Shimamoto et.al.	2506.22369v1	null
2025-06-27	From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications	Nouf Almesafri et.al.	2506.22360v1	null
2025-06-27	OutDreamer: Video Outpainting with a Diffusion Transformer	Linhao Zhong et.al.	2506.22298v1	null
2025-06-27	DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model	Yuliang Huang et.al.	2506.22280v1	null
2025-06-27	Almost abelian pseudo-Kähler Lie algebras	Diego Conti et.al.	2506.22278v1	null
2025-06-27	Boosting Classification with Quantum-Inspired Augmentations	Matthias Tschöpe et.al.	2506.22241v1	null
2025-06-26	Whole-Body Conditioned Egocentric Video Prediction	Yutong Bai et.al.	2506.21552v1	null
2025-06-26	SAM4D: Segment Anything in Camera and LiDAR Streams	Jianyun Xu et.al.	2506.21547v1	null
2025-06-26	ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers	Nicholas S. DiBrita et.al.	2506.21537v1	null
2025-06-26	Exploring the Design Space of 3D MLLMs for CT Report Generation	Mohammed Baharoon et.al.	2506.21535v1	null
2025-06-26	The spectrum of global representations for families of bounded rank and VI-modules	Miguel Barrero et.al.	2506.21525v1	null
2025-06-26	MADrive: Memory-Augmented Driving Scene Modeling	Polina Karpikova et.al.	2506.21520v1	null
2025-06-26	G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation	Mohammed Rakib et.al.	2506.21514v1	null
2025-06-26	GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation	Wentao Hu et.al.	2506.21513v1	null
2025-06-26	Devising a solution to the problems of Cancer awareness in Telangana	Priyanka Avhad et.al.	2506.21500v1	null
2025-06-26	Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising	Hojat Asgariandehkordi et.al.	2506.21499v1	null
2025-06-25	Artificial Symmetry Breaking by Self-Interaction Error	Lin Hou et.al.	2506.20662v1	null
2025-06-25	EditP23: 3D Editing via Propagation of Image Prompts to Multi-View	Roi Bar-On et.al.	2506.20652v1	null
2025-06-25	Disentangled representations of microscopy images	Jacopo Dapueto et.al.	2506.20649v1	null
2025-06-25	rd-spiral: An open-source Python library for learning 2D reaction-diffusion dynamics through pseudo-spectral method	Sandy H. S. Herho et.al.	2506.20633v1	link
2025-06-25	Weighted Mean Frequencies: a handcraft Fourier feature for 4D Flow MRI segmentation	Simon Perrin et.al.	2506.20614v1	null
2025-06-25	Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings	Ankit Shah et.al.	2506.20609v1	null
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601v1	null
2025-06-25	CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video	Wengxi Li et.al.	2506.20600v1	null
2025-06-25	WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration	Chaojun Ni et.al.	2506.20590v1	null
2025-06-25	TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness	Pritam Mishra et.al.	2506.20588v1	null
2025-06-24	Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation	Xingyang Li et.al.	2506.19852v1	null
2025-06-24	AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models	Zehuan Huang et.al.	2506.19851v1	null
2025-06-24	Unified Vision-Language-Action Model	Yuqi Wang et.al.	2506.19850v1	null
2025-06-24	GenHSI: Controllable Generation of Human-Scene Interaction Videos	Zekun Li et.al.	2506.19840v1	null
2025-06-24	Improving Progressive Generation with Decomposable Flow Matching	Moayed Haji-Ali et.al.	2506.19839v1	null
2025-06-24	SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution	Liangbin Xie et.al.	2506.19838v1	null
2025-06-24	MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration	Yucheng Zhou et.al.	2506.19835v1	null
2025-06-24	Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router	Yubo Huang et.al.	2506.19833v1	null
2025-06-24	How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text?	Abdullah Khondoker et.al.	2506.19831v1	null
2025-06-25	One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification	Yitao Peng et.al.	2506.19808v2	null
2025-06-23	TC-Light: Temporally Consistent Relighting for Dynamic Long Videos	Yang Liu et.al.	2506.18904v1	null
2025-06-23	VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory	Runjia Li et.al.	2506.18903v1	null
2025-06-23	From Virtual Games to Real-World Play	Wenqiang Sun et.al.	2506.18901v1	null
2025-06-23	FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation	Kaiyi Huang et.al.	2506.18899v1	null
2025-06-23	MinD: Unified Visual Imagination and Control via Hierarchical World Models	Xiaowei Chi et.al.	2506.18897v1	null
2025-06-23	Steering Conceptual Bias via Transformer Latent-Subspace Activation	Vansh Sharma et.al.	2506.18887v1	null
2025-06-23	Universal Video Temporal Grounding with Generative Multi-modal Large Language Models	Zeqian Li et.al.	2506.18883v1	null
2025-06-23	Let Your Video Listen to Your Music!	Xinyu Zhang et.al.	2506.18881v1	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866v1	null
2025-06-23	Pointwise-relatively-compact subgroups and trivial-weight-free representations	Alexandru Chirvasitu et.al.	2506.18861v1	null
2025-06-20	VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning	Zhangyang Qi et.al.	2506.17221v1	null
2025-06-23	Emergent Temporal Correspondences from Video Diffusion Transformers	Jisu Nam et.al.	2506.17220v2	link
2025-06-20	Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition	Jiaqi Li et.al.	2506.17201v1	null
2025-06-20	YASMOT: Yet another stereo image multi-object tracker	Ketil Malde et.al.	2506.17186v1	null
2025-06-20	High-accuracy inference using HfO$_x$S$_y$/HfS$_2$ Memristors	Aferdita Xhameni et.al.	2506.17174v1	null
2025-06-20	Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network	Mahin Montasir Afif et.al.	2506.17165v1	null
2025-06-20	Affine semigroups without consecutive small elements	J. C. Rosales et.al.	2506.17152v1	null
2025-06-20	Do We Need Large VLMs for Spotting Soccer Actions?	Ritabrata Chakraborty et.al.	2506.17144v1	null
2025-06-20	MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification	David Jacob Drexlin et.al.	2506.17140v1	null
2025-06-20	Robust Training with Data Augmentation for Medical Imaging Classification	Josué Martínez-Martínez et.al.	2506.17133v1	null
2025-06-18	Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos	Kaifeng Zhang et.al.	2506.15680v1	null
2025-06-20	Sekai: A Video Dataset towards World Exploration	Zhen Li et.al.	2506.15675v2	null
2025-06-18	UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting	Kai He et.al.	2506.15673v1	null
2025-06-18	PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection	Wenhao Li et.al.	2506.15656v1	null
2025-06-18	Oldies but Goldies: The Potential of Character N-grams for Romanian Texts	Dana Lupsa et.al.	2506.15650v1	null
2025-06-18	FindingDory: A Benchmark to Evaluate Memory in Embodied Agents	Karmesh Yadav et.al.	2506.15635v1	null
2025-06-18	GFLC: Graph-based Fairness-aware Label Correction for Fair Classification	Modar Sulaiman et.al.	2506.15620v1	null
2025-06-18	The Compositional Architecture of Regret in Large Language Models	Xiangxiang Cui et.al.	2506.15617v1	null
2025-06-18	TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data	Kentaro Seki et.al.	2506.15614v1	null
2025-06-18	BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion	Yuqing Lan et.al.	2506.15610v1	null
2025-06-17	GMT: General Motion Tracking for Humanoid Whole-Body Control	Zixuan Chen et.al.	2506.14770v1	null
2025-06-17	On the Hardness of Bandit Learning	Nataly Brukhim et.al.	2506.14746v1	null
2025-06-17	SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting	Ziqiao Peng et.al.	2506.14742v1	null
2025-06-17	Repulsive particle interactions enable selective information processing at cellular interfaces	Jenna Elliott et.al.	2506.14739v1	null
2025-06-17	Plug-and-Play with 2.5D Artifact Reduction Prior for Fast and Accurate Industrial Computed Tomography Reconstruction	Haley Duba-Sullivan et.al.	2506.14719v1	null
2025-06-17	Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models	Ling Li et.al.	2506.14674v1	null
2025-06-17	Quantifying Diagnostic Signal Decay in Dementia: A National Study of Medicare Hospitalization Data	Federica Spoto et.al.	2506.14669v1	null
2025-06-17	DDS-NAS: Dynamic Data Selection within Neural Architecture Search via On-line Hard Example Mining applied to Image Classification	Matt Poyser et.al.	2506.14667v1	null
2025-06-18	AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation	Leah von der Heyde et.al.	2506.14634v2	null
2025-06-17	Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits	Taisei Kato et.al.	2506.14624v1	null
2025-06-16	PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images	Lingteng Qiu et.al.	2506.13766v1	null
2025-06-16	Touch begins where vision ends: Generalizable policies for contact-rich manipulation	Zifan Zhao et.al.	2506.13762v1	null
2025-06-17	VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models	Edward Li et.al.	2506.13754v2	null
2025-06-16	Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability	Shova Kuikel et.al.	2506.13746v1	null
2025-06-16	Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks	Haoqing Li et.al.	2506.13733v1	null
2025-06-16	Probabilistic patient risk profiling with pair-copula constructions	Özge Şahin et.al.	2506.13731v1	null
2025-06-16	Contrastive Self-Supervised Learning As Neural Manifold Packing	Guanming Zhang et.al.	2506.13717v1	null
2025-06-16	TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Junru Zhang et.al.	2506.13705v1	null
2025-06-16	Eight-dimensional non completely reducible symplectic Lie algebras	T. Aït Aissa et.al.	2506.13699v1	null
2025-06-16	Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry	Junyoung Seo et.al.	2506.13697v1	null
2025-06-13	crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023	Navodini Wijethilake et.al.	2506.12006v1	null
2025-06-13	Visual Pre-Training on Unlabeled Images using Reinforcement Learning	Dibya Ghosh et.al.	2506.11967v1	null
2025-06-13	Technical Evaluation of a Disruptive Approach in Homomorphic AI	Eric Filiol et.al.	2506.11954v1	null
2025-06-13	Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study	Greta Damo et.al.	2506.11919v1	null
2025-06-13	GeistBERT: Breathing Life into German NLP	Raphael Scheible-Schmitt et.al.	2506.11903v1	null
2025-06-13	A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification	Lu Zhang et.al.	2506.11901v1	null
2025-06-13	Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices	Lu Zhang et.al.	2506.11892v1	null
2025-06-13	Methods for evaluating the resolution of 3D data derived from satellite images	Christina Selby et.al.	2506.11876v1	null
2025-06-13	MindGrab for BrainChop: Fast and Accurate Skull Stripping for Command Line and Browser	Armina Fani et.al.	2506.11860v1	null
2025-06-13	3D Skin Segmentation Methods in Medical Imaging: A Comparison	Martina Paccini et.al.	2506.11852v1	null
2025-06-12	InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model	Junqi You et.al.	2506.10980v1	null
2025-06-12	GenWorld: Towards Detecting AI-generated Real-world Simulation Videos	Weiliang Chen et.al.	2506.10975v1	null
2025-06-12	Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop	Justin Kerr et.al.	2506.10968v1	null
2025-06-12	Bias-Switchable Row-Column Array Imaging using Fast Orthogonal Row-Column Electronic Scanning (FORCES) Compared with Conventional Row-Column Array Imaging	Randy Palamar et.al.	2506.10958v1	null
2025-06-12	Coupled reaction and diffusion governing interface evolution in solid-state batteries	Jingxuan Ding et.al.	2506.10944v1	null
2025-06-12	VINCIE: Unlocking In-context Image Editing from Video	Leigang Qu et.al.	2506.10941v1	null
2025-06-12	Video-Mediated Emotion Disclosure: A Study of Mental Health Vlogging by People with Schizophrenia on YouTube	Jiaying Lizzy Liu et.al.	2506.10932v1	null
2025-06-12	On feature selection in double-imbalanced data settings: a Random Forest approach	Fabio Demaria et.al.	2506.10929v1	null
2025-06-12	Semi-Automated Quality Assurance in Digital Pathology: Tile Classification Approach	Meredith VandeHaar et.al.	2506.10916v1	null
2025-06-12	M4V: Multi-Modal Mamba for Text-to-Video Generation	Jiancheng Huang et.al.	2506.10915v1	null
2025-06-11	DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos	Chieh Hubert Lin et.al.	2506.09997v1	null
2025-06-11	PlayerOne: Egocentric World Simulator	Yuanpeng Tu et.al.	2506.09995v1	null
2025-06-11	Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages	Amel Muminovic et.al.	2506.09992v1	null
2025-06-11	Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes	Yiming Dou et.al.	2506.09989v1	null
2025-06-11	A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs	Benno Krojer et.al.	2506.09987v1	null
2025-06-11	V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Mido Assran et.al.	2506.09985v1	null
2025-06-11	InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions	Zhenzhi Wang et.al.	2506.09984v1	null
2025-06-11	ReSim: Reliable World Simulation for Autonomous Driving	Jiazhi Yang et.al.	2506.09981v1	null
2025-06-11	Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing	Junfei Wu et.al.	2506.09965v1	null
2025-06-11	Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos	Benjamin Reichman et.al.	2506.09953v1	null
2025-06-10	MagCache: Fast Video Generation with Magnitude-Aware Cache	Zehong Ma et.al.	2506.09045v1	null
2025-06-10	The Decoupled Risk Landscape in Performative Prediction	Javier Sanguino et.al.	2506.09044v1	null
2025-06-10	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models	Xuanchi Ren et.al.	2506.09042v1	null
2025-06-10	Princeton365: A Diverse Dataset with Accurate Camera Pose	Karhan Kayan et.al.	2506.09035v1	null
2025-06-10	DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging	Felix Wagner et.al.	2506.09024v1	null
2025-06-10	Employing self-supervised learning models for cross-linguistic child speech maturity classification	Theo Zhang et.al.	2506.08999v1	null
2025-06-10	Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models	Chenyu Lian et.al.	2506.08990v1	null
2025-06-10	Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder	Yuejiao Wang et.al.	2506.08986v1	null
2025-06-10	Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition	Igor Kvasić et.al.	2506.08974v1	null
2025-06-10	Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System	Yuan Guo et.al.	2506.08972v1	null
2025-06-09	4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos	Zhen Xu et.al.	2506.08015v1	null
2025-06-09	Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion	Xun Huang et.al.	2506.08009v1	null
2025-06-09	Dreamland: Controllable World Creation with Simulator and Generative Models	Sicheng Mo et.al.	2506.08006v1	null
2025-06-09	Dynamic View Synthesis as an Inverse Problem	Hidir Yesiltepe et.al.	2506.08004v1	null
2025-06-09	Audio-Sync Video Generation with Multi-Stream Temporal Control	Shuchen Weng et.al.	2506.08003v1	null
2025-06-09	Generative Modeling of Weights: Generalization or Memorization?	Boya Zeng et.al.	2506.07998v1	null
2025-06-09	UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References	Ming-Feng Li et.al.	2506.07996v1	null
2025-06-09	CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray	Mingquan Lin et.al.	2506.07984v1	null
2025-06-09	Scalable Machine Learning Models for Predicting Quantum Transport in Disordered 2D Hexagonal Materials	Seyed Mahdi Mastoor et.al.	2506.07983v1	null
2025-06-09	CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jiahao Meng et.al.	2506.07971v1	null
2025-06-06	TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation	Muhammad Sohail Danish et.al.	2506.06281v1	null
2025-06-06	Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias	Yuanzhe Hu et.al.	2506.06280v1	null
2025-06-06	ExAct: A Video-Language Benchmark for Expert Action Analysis	Han Yi et.al.	2506.06277v1	null
2025-06-06	Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding	Emmanouil Zaranis et.al.	2506.06275v1	null
2025-06-06	BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading	Jonathan Schmidt et.al.	2506.06271v1	null
2025-06-06	Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection	Zofia Rudnicka et.al.	2506.06265v1	null
2025-06-06	Tuning of altermagnetism by strain	M. Khodas et.al.	2506.06257v1	null
2025-06-06	Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Yuping He et.al.	2506.06253v1	null
2025-06-06	Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection	Sahrish Khan et.al.	2506.06238v1	null
2025-06-06	Towards an Explainable Comparison and Alignment of Feature Embeddings	Mohammad Jalali et.al.	2506.06231v1	null
2025-06-05	VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos	Hanoona Rasheed et.al.	2506.05349v1	null
2025-06-05	Neural Inverse Rendering from Propagating Light	Anagh Malik et.al.	2506.05347v1	null
2025-06-05	ContentV: Efficient Training of Video Generation Models with Limited Compute	Wenfeng Lin et.al.	2506.05343v1	null
2025-06-05	VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Ghazi Shazan Ahmad et.al.	2506.05336v1	null
2025-06-05	Unleashing Hour-Scale Video Training for Long Video-Language Understanding	Jingyang Lin et.al.	2506.05332v1	null
2025-06-05	AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Lidong Lu et.al.	2506.05328v1	null
2025-06-05	LSM-2: Learning from Incomplete Wearable Sensor Data	Maxwell A. Xu et.al.	2506.05321v1	null
2025-06-05	ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation	Daniel Rho et.al.	2506.05317v1	null
2025-06-05	Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos	Weifeng Lin et.al.	2506.05302v1	null
2025-06-05	SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training	Jianyi Wang et.al.	2506.05301v1	null
2025-06-04	LayerFlow: A Unified Model for Layer-aware Video Generation	Sihui Ji et.al.	2506.04228v1	null
2025-06-04	Object-centric 3D Motion Field for Robot Learning from Human Videos	Zhao-Heng Yin et.al.	2506.04227v1	null
2025-06-04	Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation	Tianyu Huang et.al.	2506.04225v1	null
2025-06-04	Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset	Zirui Wang et.al.	2506.04224v1	null
2025-06-04	Topological Mixed States: Axiomatic Approaches and Phases of Matter	Tai-Hsuan Yang et.al.	2506.04221v1	null
2025-06-04	UNIC: Unified In-Context Video Editing	Zixuan Ye et.al.	2506.04216v1	null
2025-06-05	FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers	Xuanhua He et.al.	2506.04213v2	null
2025-06-04	A Few Moments Please: Scalable Graphon Learning via Moment Matching	Reza Ramezanpour et.al.	2506.04206v1	null
2025-06-04	Synthetic multi-inversion time magnetic resonance images for visualization of subcortical structures	Savannah P. Hays et.al.	2506.04173v1	null
2025-06-04	Does Prompt Design Impact Quality of Data Imputation by LLMs?	Shreenidhi Srinivasan et.al.	2506.04172v1	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150v1	null
2025-06-03	Topology meets symmetry breaking: Hidden order, intrinsically gapless topological states and finite-temperature topological transitions	Reja H. Wilke et.al.	2506.03146v1	null
2025-06-03	Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval	Jiwen Yu et.al.	2506.03141v1	null
2025-06-03	CamCloneMaster: Enabling Reference-based Camera Control for Video Generation	Yawen Luo et.al.	2506.03140v1	null
2025-06-03	The perfect entangler spectrum as a tool to analyze crosstalk	Matthias G. Krauss et.al.	2506.03137v1	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126v1	null
2025-06-03	DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation	Zhengyao Lv et.al.	2506.03123v1	null
2025-06-03	Controllable Human-centric Keyframe Interpolation with Generative Prior	Zujin Guo et.al.	2506.03119v1	null
2025-06-03	HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers	Zhiyuan Yu et.al.	2506.03118v1	null
2025-06-03	TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models	Chetwin Low et.al.	2506.03099v1	null
2025-05-30	Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks	Tajamul Ashraf et.al.	2505.24876v1	null
2025-05-30	MiniMax-Remover: Taming Bad Noise Helps Video Object Removal	Bojia Zi et.al.	2505.24873v1	null
2025-05-30	SiLVR: A Simple Language-based Video Reasoning Framework	Ce Zhang et.al.	2505.24869v1	null
2025-05-30	Time Blindness: Why Video-Language Models Can't See What Humans Can?	Ujjwal Upadhyay et.al.	2505.24867v1	null
2025-05-30	TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection	Xinqi Xiong et.al.	2505.24866v1	null
2025-05-30	DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation	Zhao Mandi et.al.	2505.24853v1	null
2025-05-30	Reading Recognition in the Wild	Charig Yang et.al.	2505.24848v1	null
2025-05-30	VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software	Brandon Man et.al.	2505.24838v1	null
2025-06-02	Beyond Pretty Pictures: Combined Single- and Multi-Image Super-resolution for Sentinel-2 Images	Aditya Retnanto et.al.	2505.24799v2	null
2025-05-30	Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification	Xinliu Zhong et.al.	2505.24792v1	null
2025-05-29	Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models	Haohan Chi et.al.	2505.23757v1	link
2025-05-29	Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence	Diankun Wu et.al.	2505.23747v1	null
2025-05-29	Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need	Qiang Wang et.al.	2505.23744v1	null
2025-05-29	DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP	Amber Yijia Zheng et.al.	2505.23743v1	null
2025-05-29	MAGREF: Masked Guidance for Any-Reference Video Generation	Yufan Deng et.al.	2505.23742v1	link
2025-05-29	How Animals Dance (When You're Not Looking)	Xiaojuan Wang et.al.	2505.23738v1	null
2025-05-30	ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS	Weijie Wang et.al.	2505.23734v2	link
2025-05-29	The ambiguous AT2022rze: Changing-look AGN mimicking a supernova in a merging galaxy system	P. J. Pessi et.al.	2505.23731v1	null
2025-05-29	Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning	Dionysis Christopoulos et.al.	2505.23709v1	null
2025-05-29	Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats	Utku Demir et.al.	2505.23706v1	null
2025-05-29	Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better	Danny Driess et.al.	2505.23705v1	null
2025-05-28	Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	Zhe Kong et.al.	2505.22647v1	null
2025-05-28	PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization	Yezhi Shen et.al.	2505.22616v1	null
2025-05-28	Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method	Alanna Hazlett et.al.	2505.22609v1	null
2025-05-28	Transformers for Secure Hardware Systems: Applications, Challenges, and Outlook	Banafsheh Saber Latibari et.al.	2505.22605v1	null
2025-05-28	Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans	Yiheng Li et.al.	2505.22592v1	null
2025-05-28	Tell me Habibi, is it Real or Fake?	Kartik Kuckreja et.al.	2505.22581v1	null
2025-05-28	Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels	Aravind R. Krishnan et.al.	2505.22568v1	null
2025-05-28	Universal Visuo-Tactile Video Understanding for Embodied Interaction	Yifan Xie et.al.	2505.22566v1	null
2025-05-28	PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion	Jaehyun Choi et.al.	2505.22564v1	null
2025-05-28	Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs	Changhao Song et.al.	2505.22548v1	null
2025-05-27	Frame In-N-Out: Unbounded Controllable Image-to-Video Generation	Boyang Wang et.al.	2505.21491v1	null
2025-05-27	Tissue-specific predictive performance: A unified estimation and inference framework for multi-category screening tests	A. Gregory DiRienzo et.al.	2505.21482v1	null
2025-05-27	M3S-UPD: Efficient Multi-Stage Self-Supervised Learning for Fine-Grained Encrypted Traffic Classification with Unknown Pattern Discovery	Yali Yuan et.al.	2505.21462v1	null
2025-05-27	LazyVLM: Neuro-Symbolic Approach to Video Analytics	Xiangru Jian et.al.	2505.21459v1	null
2025-05-27	OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers	Ziqiao Peng et.al.	2505.21448v1	null
2025-05-27	Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization	Vit Fojtik et.al.	2505.21423v1	null
2025-05-27	A Structured Unplugged Approach for Foundational AI Literacy in Primary Education	Maria Cristina Carrisi et.al.	2505.21398v1	null
2025-05-27	Dynamic Vision from EEG Brain Recordings: How much does EEG know?	Prajwal Singh et.al.	2505.21385v1	null
2025-05-27	ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding	Linshuang Diao et.al.	2505.21381v1	null
2025-05-27	Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?	Junhao Cheng et.al.	2505.21374v1	null
2025-05-26	Unleashing 5G Seamless Integration with TSN for Industry 5.0: Frame Forwarding and QoS Treatment	Oscar Adamuz-Hinojosa et.al.	2505.20239v1	null
2025-05-26	Research on feature fusion and multimodal patent text based on graph attention network	Zhenzhen Song et.al.	2505.20188v1	null
2025-05-26	Exposing Go's Hidden Bugs: A Novel Concolic Framework	Karolina Gorna et.al.	2505.20183v1	null
2025-05-26	Long-Context State-Space Video World Models	Ryan Po et.al.	2505.20171v1	null
2025-05-26	DeepInverse: A Python package for solving imaging inverse problems with deep learning	Julián Tachella et.al.	2505.20160v1	null
2025-05-26	HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters	Yi Chen et.al.	2505.20156v1	null
2025-05-26	UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models	Xueyan Zhang et.al.	2505.20154v1	null
2025-05-26	Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases	Cheng-Yu Tai et.al.	2505.20149v1	null
2025-05-26	FairTalk: Facilitating Balanced Participation in Video Conferencing by Implicit Visualization of Predicted Turn-Grabbing Intention	Ryo Iijima et.al.	2505.20138v1	null
2025-05-26	TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	Fanheng Kong et.al.	2505.20124v1	link
2025-05-23	WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions	Zizhang Li et.al.	2505.18151v1	null
2025-05-23	TokBench: Evaluating Your Visual Tokenizer before Visual Generation	Junfeng Wu et.al.	2505.18142v1	null
2025-05-23	VideoGameBench: Can Vision-Language Models complete popular video games?	Alex L. Zhang et.al.	2505.18134v1	null
2025-05-23	TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations	Alan Arazi et.al.	2505.18125v1	null
2025-05-23	Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM	Zinuo Li et.al.	2505.18110v1	null
2025-05-23	Accelerating Learned Image Compression Through Modeling Neural Training Dynamics	Yichi Zhang et.al.	2505.18107v1	null
2025-05-23	F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles	Varun Ajith et.al.	2505.18106v1	null
2025-05-23	Structural Dynamics of Harmful Content Dissemination on WhatsApp	Yuxin Liu et.al.	2505.18099v1	null
2025-05-23	DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations	Ziqiao Peng et.al.	2505.18096v1	null
2025-05-23	Early-Exit Graph Neural Networks	Andrea Giuseppe Di Francesco et.al.	2505.18088v1	null
2025-05-22	CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms	Shilin Yan et.al.	2505.17020v1	link
2025-05-22	Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space	Yan Li et.al.	2505.17011v1	null
2025-05-22	Topological Phases, Criticality, and Mixed State Order in a Hubbard Quantum Simulator	Lin Su et.al.	2505.17009v1	null
2025-05-22	Deep mineralogical segmentation of thin section images based on QEMSCAN maps	Jean Pablo Vieira de Mello et.al.	2505.17008v1	link
2025-05-22	CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning	Jiange Yang et.al.	2505.17006v1	null
2025-05-22	Seeing through Satellite Images at Street Views	Ming Qian et.al.	2505.17001v1	null
2025-05-22	Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction	Dong Li et.al.	2505.16980v1	null
2025-05-22	MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning	Suhao Yu et.al.	2505.16964v1	null
2025-05-22	On Multilingual Encoder Language Model Compression for Low-Resource Languages	Daniil Gurgurov et.al.	2505.16956v1	null
2025-05-22	On a certain class of para-Hermite Einstein spaces	Adam Chudecki et.al.	2505.16945v1	null
2025-05-21	Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization	Satoshi Kosugi et.al.	2505.15812v1	link
2025-05-21	Adaptive Estimation and Learning under Temporal Distribution Shift	Dheeraj Baby et.al.	2505.15803v1	null
2025-05-21	Interspatial Attention for Efficient 4D Human Video Generation	Ruizhi Shao et.al.	2505.15800v1	null
2025-05-21	Large Language Models as Computable Approximations to Solomonoff Induction	Jun Wan et.al.	2505.15784v1	null
2025-05-21	Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention	Huanxuan Liao et.al.	2505.15774v1	null
2025-05-21	MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	Cheng Yifan et.al.	2505.15772v1	null
2025-05-21	Neuro-Argumentative Learning with Case-Based Reasoning	Adam Gould et.al.	2505.15742v1	null
2025-05-21	iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification	Sarfraz Ahmad et.al.	2505.15730v1	null
2025-05-21	Privacy-Preserving Conformal Prediction Under Local Differential Privacy	Coby Penso et.al.	2505.15721v1	null
2025-05-21	MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation	Maike Behrendt et.al.	2505.15696v1	null
2025-05-20	Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers	Sucheng Ren et.al.	2505.14687v1	link
2025-05-20	Emerging Properties in Unified Multimodal Pretraining	Chaorui Deng et.al.	2505.14683v1	null
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668v1	null
2025-05-20	EmoGist: Efficient In-Context Learning for Visual Emotion Understanding	Ronald Seoh et.al.	2505.14660v1	null
2025-05-20	Beyond Words: Multimodal LLM Knows When to Speak	Zikai Liao et.al.	2505.14654v1	null
2025-05-20	VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	Wentao Ma et.al.	2505.14640v1	null
2025-05-20	A General Framework for Group Sparsity in Hyperspectral Unmixing Using Endmember Bundles	Gokul Bhusal et.al.	2505.14634v1	null
2025-05-20	Parabolic quantum affine algebras	Kudret Bostanci et.al.	2505.14624v1	null
2025-05-20	Assessing Projected Quantum Kernels for the Classification of IoT Data	Francesco D'Amore et.al.	2505.14593v1	null
2025-05-20	Automated Fetal Biometry Assessment with Deep Ensembles using Sparse-Sampling of 2D Intrapartum Ultrasound Images	Jayroop Ramesh et.al.	2505.14572v1	null
2025-05-19	Unlocking Non-Invasive Brain-to-Text	Dulhan Jayalath et.al.	2505.13446v1	null
2025-05-19	GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation	Abhay Deshpande et.al.	2505.13441v1	null
2025-05-19	Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos	Ruoyu Wang et.al.	2505.13440v1	link
2025-05-19	FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance	Dian Shao et.al.	2505.13437v1	null
2025-05-19	Synthetic-Powered Predictive Inference	Meshi Bashari et.al.	2505.13432v1	null
2025-05-19	Understanding Complexity in VideoQA via Visual Program Generation	Cristobal Eyzaguirre et.al.	2505.13429v1	null
2025-05-19	GuidedMorph: Two-Stage Deformable Registration for Breast MRI	Yaqian Chen et.al.	2505.13414v1	null
2025-05-19	Faster Video Diffusion with Trainable Sparse Attention	Peiyuan Zhang et.al.	2505.13389v1	null
2025-05-19	RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers	Ahmet Berke Gokmen et.al.	2505.13344v1	null
2025-05-19	Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems	Babak Badnava et.al.	2505.13337v1	null
2025-05-16	QVGen: Pushing the Limit of Quantized Video Generative Models	Yushi Huang et.al.	2505.11497v1	null
2025-05-16	SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics	Lizhi Yang et.al.	2505.11494v1	null
2025-05-16	EMU/GAMA: A new approach to characterising radio luminosity functions	J. Prathap et.al.	2505.11453v1	null
2025-05-16	GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI	Marc-Antoine Fortin et.al.	2505.11445v1	link
2025-05-16	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang et.al.	2505.11436v1	null
2025-05-16	Neuromorphic Imaging Flow Cytometry combined with Adaptive Recurrent Spiking Neural Networks	Georgios Moustakas et.al.	2505.11433v1	null
2025-05-16	Face Consistency Benchmark for GenAI Video	Michal Podstawski et.al.	2505.11425v1	null
2025-05-16	Energy efficiency analysis of Spiking Neural Networks for space applications	Paolo Lunghi et.al.	2505.11418v1	null
2025-05-16	Uncertainty quantification with approximate variational learning for wearable photoplethysmography prediction tasks	Ciaran Bench et.al.	2505.11412v1	null
2025-05-16	Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner	Wenchuan Zhang et.al.	2505.11404v1	null
2025-05-15	3D-Fixup: Advancing Photo Editing with 3D Priors	Yen-Chi Cheng et.al.	2505.10566v1	null
2025-05-15	Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data	Yiwen Liu et.al.	2505.10551v1	link
2025-05-15	Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning	Milan Ganai et.al.	2505.10547v1	null
2025-05-15	AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect	Tyler Wilson et.al.	2505.10542v1	null
2025-05-15	LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps	Filippo Olimpieri et.al.	2505.10537v1	null
2025-05-15	Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming	Mohammad Ghalavand et.al.	2505.10536v1	null
2025-05-15	Sobolev and quasiconformal distortion of intermediate dimension with applications to conformal dimension	Jonathan M. Fraser et.al.	2505.10525v1	null
2025-05-15	The Devil Is in the Word Alignment Details: On Translation-Based Cross-Lingual Transfer for Token Classification Tasks	Benedikt Ebing et.al.	2505.10507v1	null
2025-05-16	WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer	Yifan Gao et.al.	2505.10502v2	null
2025-05-15	Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio	Tu Duyen Nguyen et.al.	2505.10500v1	null
2025-05-14	UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing	Yung-Hsuan Lai et.al.	2505.09615v1	null
2025-05-14	Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware	Justin Yu et.al.	2505.09601v1	null
2025-05-14	Rhomboid Tiling for Geometric Graph Deep Learning	Yipeng Zhang et.al.	2505.09586v1	null
2025-05-14	VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation	Chaofan Zhang et.al.	2505.09577v1	null
2025-05-14	Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations	Maik Dannecker et.al.	2505.09565v1	null
2025-05-14	Learning Long-Context Diffusion Policies via Past-Token Prediction	Marcel Torne et.al.	2505.09561v1	null
2025-05-14	Phase domain walls in coherently driven Bose-Einstein condensates	S. S. Gavrilov et.al.	2505.09553v1	null
2025-05-14	Learned Free-Energy Functionals from Pair-Correlation Matching for Dynamical Density Functional Theory	Karnik Ram et.al.	2505.09543v1	null
2025-05-14	Multimodal transformers with elemental priors for phase classification of X-ray diffraction spectra	Kangyu Ji et.al.	2505.09536v1	null
2025-05-14	Contactless Cardiac Pulse Monitoring Using Event Cameras	Mohamed Moustafa et.al.	2505.09529v1	link
2025-05-13	UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations	Hanjung Kim et.al.	2505.08787v1	null
2025-05-13	PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework	Abhineet Agarwal et.al.	2505.08784v1	null
2025-05-13	Implet: A Post-hoc Subsequence Explainer for Time Series Models	Fanyu Meng et.al.	2505.08748v1	link
2025-05-13	Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion	Huiyan Qi et.al.	2505.08747v1	null
2025-05-13	TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series	Xiaolei Qin et.al.	2505.08723v1	link
2025-05-13	Contrastive Normalizing Flows for Uncertainty-Aware Parameter Estimation	Ibrahim Elsharkawy et.al.	2505.08709v1	null
2025-05-13	Big Data and the Computational Social Science of Entrepreneurship and Innovation	Ningzi Li et.al.	2505.08706v1	null
2025-05-13	LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs	K M Sajjadul Islam et.al.	2505.08704v1	null
2025-05-14	Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	George Saon et.al.	2505.08699v2	null
2025-05-13	VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation	Badhan Kumar Das et.al.	2505.08693v1	null
2025-05-12	DanceGRPO: Unleashing GRPO on Visual Generation	Zeyue Xue et.al.	2505.07818v1	null
2025-05-12	Pixel Motion as Universal Representation for Robot Control	Kanchana Ranasinghe et.al.	2505.07817v1	null
2025-05-12	DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies	Tony Tao et.al.	2505.07813v1	null
2025-05-12	Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets	Weiyu Li et.al.	2505.07747v1	null
2025-05-12	BodyGPS: Anatomical Positioning System	Halid Ziya Yerebakan et.al.	2505.07744v1	null
2025-05-13	VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections	Eason Chen et.al.	2505.07736v2	null
2025-05-12	Spoken Language Understanding on Unseen Tasks With In-Context Learning	Neeraj Agrawal et.al.	2505.07731v1	null
2025-05-12	Gameplay Highlights Generation	Vignesh Edithal et.al.	2505.07721v1	null
2025-05-12	PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes	Daniel Ogenrwot et.al.	2505.07700v1	null
2025-05-12	ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation	Feng Yuan et.al.	2505.07687v1	link
2025-05-09	Adapting a Segmentation Foundation Model for Medical Image Classification	Pengfei Gu et.al.	2505.06217v1	null
2025-05-09	Topo-VM-UNetV2: Encoding Topology into Vision Mamba UNet for Polyp Segmentation	Diego Adame et.al.	2505.06210v1	null
2025-05-09	Leveraging Multi-Task Learning for Multi-Label Power System Security Assessment	Muhy Eddin Za'ter et.al.	2505.06207v1	null
2025-05-09	Auto Tensor Singular Value Thresholding: A Non-Iterative and Rank-Free Framework for Tensor Denoising	Hiroki Hasegawa et.al.	2505.06203v1	null
2025-05-09	Neuro-Symbolic Concepts	Jiayuan Mao et.al.	2505.06191v1	null
2025-05-09	Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet	Kodai Hirata et.al.	2505.06185v1	null
2025-05-09	Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach	Tim Schneider et.al.	2505.06182v1	null
2025-05-09	New Advances in Phonons: From Band Topology to Quasiparticle Chirality	Tiantian Zhang et.al.	2505.06179v1	null
2025-05-09	MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks	Wenqi Zeng et.al.	2505.06152v1	link
2025-05-09	Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework	Alice Rueda et.al.	2505.06151v1	null
2025-05-08	SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	Yonwoo Choi et.al.	2505.05475v1	link
2025-05-08	3D Scene Generation: A Survey	Beichen Wen et.al.	2505.05474v1	link
2025-05-08	StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	Haibo Wang et.al.	2505.05467v1	null
2025-05-08	SITE: towards Spatial Intelligence Thorough Evaluation	Wenqi Wang et.al.	2505.05456v1	null
2025-05-08	Robustly optimal dynamics for active matter reservoir computing	Mario U. Gaimann et.al.	2505.05420v1	null
2025-05-08	DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing	Nilesh Prasad Pandey et.al.	2505.05413v1	null
2025-05-08	Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It	Marvin F. da Silva et.al.	2505.05409v1	null
2025-05-08	CART-ELC: Oblique Decision Tree Induction via Exhaustive Search	Andrew D. Laack et.al.	2505.05402v1	link
2025-05-08	OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation	Naveenkumar G Venkataswamy et.al.	2505.05374v1	null
2025-05-08	BMS representations for generic supermomentum	Xavier Bekaert et.al.	2505.05368v1	null
2025-05-07	Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait	Feng Liu et.al.	2505.04616v1	null
2025-05-07	Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems	Mohammad Merati et.al.	2505.04596v1	null
2025-05-07	Relative benefits of different active learning methods to conceptual physics learning	Meagan Sundstrom et.al.	2505.04577v1	null
2025-05-07	Multitask LSTM for Arboviral Outbreak Prediction Using Public Health Data	Lucas R. C. Farias et.al.	2505.04566v1	null
2025-05-07	Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration	Asma Baobaid et.al.	2505.04524v1	null
2025-05-07	Complementary legs and symplectic rational balls	John B. Etnyre et.al.	2505.04513v1	null
2025-05-08	HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	Teng Hu et.al.	2505.04512v2	null
2025-05-07	Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition	Asma Baobaid et.al.	2505.04502v1	null
2025-05-08	FA-KPConv: Introducing Euclidean Symmetries to KPConv via Frame Averaging	Ali Alawieh et.al.	2505.04485v2	null
2025-05-07	Securing Immersive 360 Video Streams through Attribute-Based Selective Encryption	Mohammad Waquas Usmani et.al.	2505.04466v1	null
2025-05-06	Multi-Agent System for Comprehensive Soccer Understanding	Jiayuan Rao et.al.	2505.03735v1	null
2025-05-06	FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios	Shiyi Zhang et.al.	2505.03730v1	null
2025-05-07	Visual Imitation Enables Contextual Humanoid Control	Arthur Allshire et.al.	2505.03729v2	null
2025-05-06	DISARM++: Beyond scanner-free harmonization	Luca Caldera et.al.	2505.03715v1	null
2025-05-06	NBF at SemEval-2025 Task 5: Light-Burst Attention Enhanced System for Multilingual Subject Recommendation	Baharul Islam et.al.	2505.03711v1	null
2025-05-06	Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning	François Role et.al.	2505.03703v1	null
2025-05-06	Neural Integral Operators for Inverse problems in Spectroscopy	Emanuele Zappala et.al.	2505.03677v1	null
2025-05-06	Vector valued optimal transport: from dynamic to static formulations	Katy Craig et.al.	2505.03670v1	null
2025-05-06	m-accretive extensions of Friedrichs operators	Krešimir Burazin et.al.	2505.03657v1	null
2025-05-06	ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders	Chethan Krishnamurthy Ramanaik et.al.	2505.03646v1	null
2025-05-06	Towards Application-Specific Evaluation of Vision Models: Case Studies in Ecology and Biology	Alex Hoi Hang Chan et.al.	2505.02825v2	null
2025-05-05	Towards Quantifying the Hessian Structure of Neural Networks	Zhaorui Dong et.al.	2505.02809v1	null
2025-05-05	Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow	Jai Prakash Veerla et.al.	2505.02780v1	null
2025-05-05	Teaching the social media generation: rethinking learning without sacrificing quality	Sepinoud Azimi et.al.	2505.02770v1	null
2025-05-05	The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD	Aggeliki Sideraki et.al.	2505.02747v1	null
2025-05-05	The Spectrum of Stable Infinity Categories with Actions	Hisato Matsukawa et.al.	2505.02724v1	null
2025-05-05	A Rate-Quality Model for Learned Video Coding	Sang NguyenQuang et.al.	2505.02720v1	null
2025-05-05	Searching for supermassive black holes binaries within SRG/eROSITA-De I: Properties of the X-ray selected candidates	D. Tubín-Arenas et.al.	2505.02708v1	null
2025-05-05	Multi-View Learning with Context-Guided Receptance for Image Denoising	Binghong Chen et.al.	2505.02705v1	null
2025-05-05	A Survey on Progress in LLM Alignment from the Perspective of Reward Design	Miaomiao Ji et.al.	2505.02666v1	null
2025-05-02	GENMO: A GENeralist Model for Human MOtion	Jiefeng Li et.al.	2505.01425v1	null
2025-05-02	VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models	Mohammadreza Teymoorianfard et.al.	2505.01406v1	null
2025-05-02	Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes	Wallace Peaslee et.al.	2505.01388v1	null
2025-05-02	Emerging Media Use and Acceptance of Digital Immortality: A Cluster Analysis among Chinese Young Generations	Yi Mou et.al.	2505.01355v1	null
2025-05-02	How to Learn a Star: Binary Classification with Starshaped Polyhedral Sets	Marie-Charlotte Brandenburg et.al.	2505.01346v1	null
2025-05-02	Classifying Radio-Loud and Radio-Quiet Quasars With Novel PCA Based Regression Classifier	Ramkrishna Joshi et.al.	2505.01335v1	null
2025-05-02	DebtStreamness: An Ecological Approach to Credit Flows in Inter-Firm Networks	Anahí Rodríguez-Martínez et.al.	2505.01326v1	null
2025-05-02	Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System	Sheikh Samit Muhaimin et.al.	2505.01315v1	null
2025-05-02	Contactless pulse rate assessment: Results and insights for application in driving simulator	Đorđe D. Nešković et.al.	2505.01299v1	null
2025-05-02	ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow	Changhe Chen et.al.	2505.01288v1	null
2025-05-01	Controllable Weather Synthesis and Removal with Video Diffusion Models	Chih-Hao Lin et.al.	2505.00704v1	null
2025-05-01	GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution	Aditya Arora et.al.	2505.00687v1	null
2025-05-01	MINERVA: Evaluating Complex Video Reasoning	Arsha Nagrani et.al.	2505.00681v1	null
2025-05-01	Rational points on $X_0(N)^*$ when $N$ is non-squarefree	Sachi Hashimoto et.al.	2505.00680v1	null
2025-05-01	Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI	Merve Gülle et.al.	2505.00643v1	null
2025-05-01	Bayes-Optimal Fair Classification with Multiple Sensitive Features	Yi Yang et.al.	2505.00631v1	null
2025-05-01	Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis	Zhongying Deng et.al.	2505.00627v1	null
2025-05-01	Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction	Simon Giebenhain et.al.	2505.00615v1	null
2025-05-01	Dietary Intake Estimation via Continuous 3D Reconstruction of Food	Wallace Lee et.al.	2505.00606v1	null
2025-05-01	Visual Trajectory Prediction of Vessels for Inland Navigation	Alexander Puzicha et.al.	2505.00599v1	null
2025-04-30	ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction	Qihao Liu et.al.	2504.21855v1	null
2025-04-30	A Survey of Interactive Generative Video	Jiwen Yu et.al.	2504.21853v1	null
2025-04-30	Active Light Modulation to Counter Manipulation of Speech Visual Content	Hadleigh Schwartz et.al.	2504.21846v1	null
2025-04-30	Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic	Mikihisa Yuasa et.al.	2504.21841v1	null
2025-04-30	Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat	Clark Mingxuan Ju et.al.	2504.21838v1	null
2025-04-30	Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization	Anas Anwarul Haq Khan et.al.	2504.21831v1	null
2025-04-30	Discrete series for the graded Hecke algebra of type $H_{4}$	Kei Yuen Chan et.al.	2504.21790v1	null
2025-04-30	LoC-LIC: Low Complexity Learned Image Coding Using Hierarchical Feature Transforms	Ayman A. Ameen et.al.	2504.21778v1	null
2025-04-30	Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline	Minwoo Oh et.al.	2504.21772v1	null
2025-04-30	Ends of the strata of differentials	Benjamin Dozier et.al.	2504.21756v1	null
2025-04-29	TesserAct: Learning 4D Embodied World Models	Haoyu Zhen et.al.	2504.20995v1	null
2025-04-29	Photonic Quantum Convolutional Neural Networks with Adaptive State Injection	Léo Monbroussou et.al.	2504.20989v1	null
2025-04-29	SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features	Mete Erdogan et.al.	2504.20970v1	null
2025-04-29	Soft-X-ray momentum microscopy of nonlinear magnon interactions below 100-nm wavelength	Steffen Wittrock et.al.	2504.20958v1	null
2025-04-30	DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition	Yanghui Song et.al.	2504.20948v2	link
2025-04-29	Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity	Taisuke Kobayashi et.al.	2504.20932v1	null
2025-04-29	Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers	Quentin Guimard et.al.	2504.20902v1	null
2025-04-29	CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models	Hasan Md Tusfiqur Alam et.al.	2504.20898v1	null
2025-04-29	Imaging on the Edge: Mapping Object Corners and Edges with Stereo X-ray Tomography	Zhenduo Shang et.al.	2504.20892v1	null
2025-04-30	Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks	Junyuan Fang et.al.	2504.20869v2	null
2025-04-28	Learning Streaming Video Representation via Multitask Training	Yibin Yan et.al.	2504.20041v1	null
2025-04-28	Pan-genome Analysis of Angiosperm Plastomes using PGR-TK	Manoj P. Samanta et.al.	2504.20034v1	null
2025-04-28	Towards AI-Driven Policing: Interdisciplinary Knowledge Discovery from Police Body-Worn Camera Footage	Anita Srbinovska et.al.	2504.20007v1	null
2025-04-28	Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose	Narges Rashvand et.al.	2504.19970v1	null
2025-04-28	Enhancing Quality for VVC Compressed Videos with Omniscient Quality Enhancement Model	Xiem HoangVan et.al.	2504.19935v1	null
2025-04-28	Accelerated 3D-3D rigid registration of echocardiographic images obtained from apical window using particle filter	Thanuja Uruththirakodeeswaran et.al.	2504.19930v1	null
2025-04-28	Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI	Hugo Georgenthum et.al.	2504.19918v1	null
2025-04-28	Breast Cancer Detection from Multi-View Screening Mammograms with Visual Prompt Tuning	Han Chen et.al.	2504.19900v1	null
2025-04-28	GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets	Mingqian He et.al.	2504.19898v1	null
2025-04-28	CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition	Quynh Phung et.al.	2504.19894v1	null
2025-04-25	RSFR: A Coarse-to-Fine Reconstruction Framework for Diffusion Tensor Cardiac MRI with Semantic-Aware Refinement	Jiahao Huang et.al.	2504.18520v1	null
2025-04-25	Co-Change Graph Entropy: A New Process Metric for Defect Prediction	Ethari Hrishikesh et.al.	2504.18511v1	null
2025-04-25	Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models	Patrick Müller et.al.	2504.18510v1	null
2025-04-25	SymTFT, Protected Gaplessness, and Spontaneous Breaking of Non-invertible Symmetries	Michele Del Zotto et.al.	2504.18501v1	null
2025-04-25	Quasi-Einstein structures and Hitchin's equations	Alex Colling et.al.	2504.18475v1	null
2025-04-25	A Novel Taxonomy and Classification Scheme for Code Smell Interactions	Ruchin Gupta et.al.	2504.18469v1	null
2025-04-25	A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression	Muzaffar Qureshi et.al.	2504.18463v1	null
2025-04-25	Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training	Hiroki Naganuma et.al.	2504.18454v1	null
2025-04-25	NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration	Haotian Dong et.al.	2504.18448v1	null
2025-04-25	Iterative Event-based Motion Segmentation by Variational Contrast Maximization	Ryo Yamaki et.al.	2504.18447v1	null
2025-04-24	Dynamic Camera Poses and Where to Find Them	Chris Rockwell et.al.	2504.17788v1	null
2025-04-24	Silenzio: Secure Non-Interactive Outsourced MLP Training	Jonas Sander et.al.	2504.17785v1	null
2025-04-24	Disaggregated Deep Learning via In-Physics Computing at Radio Frequency	Zhihui Gao et.al.	2504.17752v1	null
2025-04-24	MSGCN: Multiplex Spatial Graph Convolution Network for Interlayer Link Weight Prediction	Steven E. Wilson et.al.	2504.17749v1	null
2025-04-24	Interpretable Early Detection of Parkinson's Disease through Speech Analysis	Lorenzo Simone et.al.	2504.17739v1	null
2025-04-24	CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos	Shucheng Gong et.al.	2504.17728v1	null
2025-04-24	Unsupervised EEG-based decoding of absolute auditory attention with canonical correlation analysis	Nicolas Heintz et.al.	2504.17724v1	null
2025-04-24	Evaluating Uncertainty in Deep Gaussian Processes	Matthijs van der Lende et.al.	2504.17719v1	null
2025-04-24	Early Detection of Multidrug Resistance Using Multivariate Time Series Analysis and Interpretable Patient-Similarity Representations	Óscar Escudero-Arnanz et.al.	2504.17717v1	null
2025-04-24	Self-Supervised Noise Adaptive MRI Denoising via Repetition to Repetition (Rep2Rep) Learning	Nikola Janjušević et.al.	2504.17698v1	null
2025-04-23	I-Con: A Unifying Framework for Representation Learning	Shaden Alshammari et.al.	2504.16929v1	null
2025-04-23	Year six photometric measurements of known Trans-Neptunian Objects and Centaurs by the Dark Energy Survey	Feliphe S. Ferreira et.al.	2504.16927v1	null
2025-04-23	Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving	Jacob Levy et.al.	2504.16923v1	null
2025-04-23	Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text	Shifali Agrahari et.al.	2504.16913v1	null
2025-04-23	BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation	Ruotong Wang et.al.	2504.16907v1	null
2025-04-23	A new approach to the classification of almost contact metric manifolds via intrinsic endomorphisms	Ilka Agricola et.al.	2504.16900v1	null
2025-04-23	Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification	Alexander Shvets et.al.	2504.16856v1	null
2025-04-23	Energetics of the nucleation and glide of disconnection modes in symmetric tilt grain boundaries	Himanshu Joshi et.al.	2504.16854v1	null
2025-04-23	A Low-Cost Photogrammetry System for 3D Plant Modeling and Phenotyping	Joe Hrzich et.al.	2504.16840v1	null
2025-04-23	Symbiotic stars in the era of modern ground- and space-based surveys	Jaroslav Merc et.al.	2504.16825v1	null
2025-04-22	MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention	Yucheng Li et.al.	2504.16083v1	null
2025-04-22	MR. Video: "MapReduce" is the Principle for Long Video Understanding	Ziqi Pang et.al.	2504.16082v1	null
2025-04-22	Survey of Video Diffusion Models: Foundations, Implementations, and Applications	Yimu Wang et.al.	2504.16081v1	null
2025-04-22	Describe Anything: Detailed Localized Image and Video Captioning	Long Lian et.al.	2504.16072v1	null
2025-04-22	Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis	Frank Li et.al.	2504.16047v1	null
2025-04-22	LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale	Joya Chen et.al.	2504.16030v1	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016v1	null
2025-04-22	MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment	Yachun Mi et.al.	2504.16003v1	null
2025-04-22	Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems	Lukas Gehrke et.al.	2504.15984v1	null
2025-04-22	Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling	Sophie C. Pope et.al.	2504.15972v1	null
2025-04-22	DRAWER: Digital Reconstruction and Articulation With Environment Realism	Hongchi Xia et.al.	2504.15278v2	null
2025-04-21	Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Guo Chen et.al.	2504.15271v1	null
2025-04-21	An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Ji Qi et.al.	2504.15270v1	null
2025-04-21	Diffusion Bridge Models for 3D Medical Image Translation	Shaorong Zhang et.al.	2504.15267v1	null
2025-04-21	SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam	Tue Vo et.al.	2504.15252v1	null
2025-04-21	On Walker and para-Hermite Einstein spaces	Adam Chudecki et.al.	2504.15221v1	null
2025-04-22	Histogram-based Parameter-efficient Tuning for Passive Sonar Classification	Amirmohammad Mohammadi et.al.	2504.15214v2	null
2025-04-21	Automated Measurement of Eczema Severity with Self-Supervised Learning	Neelesh Kumar et.al.	2504.15193v1	null
2025-04-21	Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform	Xianpan Zhou et.al.	2504.15182v1	null
2025-04-21	FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image	Fei Yin et.al.	2504.15179v1	null
2025-04-18	Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models	Junjie Yang et.al.	2504.13825v1	null
2025-04-18	CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning	Yang Yue et.al.	2504.13820v1	link
2025-04-18	The Binary and Ternary Quantization Can Improve Feature Discrimination	Weizhi Lu et.al.	2504.13792v1	null
2025-04-18	Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization	Aman Agarwal et.al.	2504.13776v1	null
2025-04-18	Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy?	Motunrayo Ibiyo et.al.	2504.13769v1	null
2025-04-18	Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback	Peyman Jahanbin et.al.	2504.13765v1	null
2025-04-18	Fragile Watermarking for Image Certification Using Deep Steganographic Embedding	Davide Ghiani et.al.	2504.13759v1	null
2025-04-18	Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis	Zhu Zhu et.al.	2504.13754v1	null
2025-04-18	LimitNet: Progressive, Content-Aware Image Offloading for Extremely Weak Devices & Networks	Ali Hojjat et.al.	2504.13736v1	null
2025-04-18	The relativity of color perception	Michel Berthier et.al.	2504.13720v1	null
2025-04-17	Perception Encoder: The best visual embeddings are not at the output of the network	Daniel Bolya et.al.	2504.13181v1	null
2025-04-17	PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding	Jang Hyun Cho et.al.	2504.13180v1	null
2025-04-18	ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos	Zetong Zhang et.al.	2504.13167v2	null
2025-04-17	Digital Twin Generation from Visual Data: A Survey	Andrew Melnik et.al.	2504.13159v1	null
2025-04-17	St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World	Haiwen Feng et.al.	2504.13152v1	null
2025-04-17	Readable Twins of Unreadable Models	Krzysztof Pancerz et.al.	2504.13150v1	null
2025-04-17	Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps	Matt Schmittle et.al.	2504.13149v1	null
2025-04-17	PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition	Jongseo Lee et.al.	2504.13140v1	null
2025-04-17	NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results	Xin Li et.al.	2504.13131v1	link
2025-04-17	VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Haojian Huang et.al.	2504.13122v1	link
2025-04-16	Adapting a World Model for Trajectory Following in a 3D Game	Marko Tot et.al.	2504.12299v1	null
2025-04-16	SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians	Liam Schoneveld et.al.	2504.12292v1	null
2025-04-16	Beyond Reconstruction: A Physics Based Neural Deferred Shader for Photo-realistic Rendering	Zhuo He et.al.	2504.12273v1	null
2025-04-16	Correlation Ratio for Unsupervised Learning of Multi-modal Deformable Registration	Xiaojian Chen et.al.	2504.12265v1	null
2025-04-16	VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate	Zhihang Yuan et.al.	2504.12259v1	null
2025-04-16	FLIP Reasoning Challenge	Andreas Plesner et.al.	2504.12256v1	null
2025-04-16	Human Aligned Compression for Robust Models	Samuel Räber et.al.	2504.12255v1	null
2025-04-16	Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography	Zhijin He et.al.	2504.12249v1	null
2025-04-16	SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction	Xia Wang et.al.	2504.12245v1	null
2025-04-16	Coding-Prior Guided Diffusion Network for Video Deblurring	Yike Liu et.al.	2504.12222v1	null
2025-04-15	Mamba-Based Ensemble learning for White Blood Cell Classification	Lewis Clifton et.al.	2504.11438v1	null
2025-04-15	Enhancing Out-of-Distribution Detection with Extended Logit Normalization	Yifan Ding et.al.	2504.11434v1	null
2025-04-15	Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models	Maria Teleki et.al.	2504.11431v1	null
2025-04-15	NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors	Yanrui Bin et.al.	2504.11427v1	null
2025-04-15	Deep Learning-based Bathymetry Retrieval without In-situ Depths using Remote Sensing Imagery and SfM-MVS DSMs with Data Gaps	Panagiotis Agrafiotis et.al.	2504.11416v1	null
2025-04-15	Statistical few-shot learning for large-scale classification via parameter pooling	Andrew Simpson et.al.	2504.11404v1	null
2025-04-15	VideoPanda: Video Panoramic Diffusion with Multi-view Attention	Kevin Xie et.al.	2504.11389v1	null
2025-04-15	Trajectory Encoding Temporal Graph Networks	Jiafeng Xiong et.al.	2504.11386v1	null
2025-04-15	Ring Artifacts Correction Based on Global-Local Features Interaction Guidance in the Projection Domain	Yunze Liu et.al.	2504.11375v1	null
2025-04-15	A two-phase quenching-type problem for the p-Laplacian	Julio C. Correa et.al.	2504.11370v1	null
2025-04-14	DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting	Zeren Jiang et.al.	2504.10486v1	null
2025-04-14	Quantum Barcodes: Persistent Homology for Quantum Phase Transitions	Khyathi Komalan et.al.	2504.10468v1	null
2025-04-14	Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis	Ramin Mousa et.al.	2504.10452v1	null
2025-04-14	Multimodal Long Video Modeling Based on Temporal Dynamic Context	Haoran Hao et.al.	2504.10443v1	null
2025-04-14	Framing Perception: Exploring Camera Induced Objectification in Cinema	Parth Maradia et.al.	2504.10404v1	null
2025-04-14	PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems	Maud Biquard et.al.	2504.10375v1	null
2025-04-14	Proteinoid spikes: from protocognitive to universal approximating agents	Saksham Sharma et.al.	2504.10362v1	null
2025-04-14	FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos	Rui Chen et.al.	2504.10358v1	null
2025-04-14	Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication	Jeremiah Giordani et.al.	2504.10353v1	null
2025-04-14	Domain-Adversarial Neural Network and Explainable AI for Reducing Tissue-of-Origin Signal in Pan-cancer Mortality Classification	Cristian Padron-Manrique et.al.	2504.10343v1	null
2025-04-11	ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning	Sahil Sethi et.al.	2504.08713v1	null
2025-04-11	Hypergraph Vision Transformers: Images are More than Nodes, More than Edges	Joshua Fixelle et.al.	2504.08710v1	null
2025-04-11	Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model	Team Seawead et.al.	2504.08685v1	null
2025-04-11	BowelRCNN: Region-based Convolutional Neural Network System for Bowel Sound Auscultation	Igor Matynia et.al.	2504.08659v1	null
2025-04-11	The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation	Masashi Hatano et.al.	2504.08654v1	null
2025-04-11	Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization	Jialu Li et.al.	2504.08641v1	null
2025-04-11	Transformer Learns Optimal Variable Selection in Group-Sparse Classification	Chenyang Zhang et.al.	2504.08638v1	null
2025-04-11	Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition	Lei Kang et.al.	2504.08616v1	null
2025-04-11	Enhancing knowledge retention for continual learning with domain-specific adapters and features gating	Mohamed Abbas Hedjazi et.al.	2504.08613v1	null
2025-04-11	A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English	Julian Bäumler et.al.	2504.08609v1	null
2025-04-10	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin et.al.	2504.07962v1	null
2025-04-10	Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction	Zeren Jiang et.al.	2504.07961v1	null
2025-04-10	VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning	Yukun Qi et.al.	2504.07956v1	null
2025-04-10	BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation	Yuanhong Yu et.al.	2504.07955v1	null
2025-04-10	InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians	Kefan Chen et.al.	2504.07949v1	null
2025-04-10	Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos	Rundong Luo et.al.	2504.07940v1	null
2025-04-10	Zero-Shot Low-dose CT Denoising via Sinogram Flicking	Yongyi Shi et.al.	2504.07927v1	null
2025-04-10	SKK groups of manifolds and non-unitary invertible TQFTs	Renee S. Hoekzema et.al.	2504.07917v1	null
2025-04-10	Semantically Encoding Activity Labels for Context-Aware Human Activity Recognition	Wen Ge et.al.	2504.07916v1	link
2025-04-10	The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound	Blake VanBerlo et.al.	2504.07904v1	null
2025-04-09	Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning	Nikhil Shivakumar Nayak et.al.	2504.07097v1	null
2025-04-09	FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	Gene Chou et.al.	2504.07093v1	null
2025-04-09	Are We Done with Object-Centric Learning?	Alexander Rubinstein et.al.	2504.07092v1	null
2025-04-10	GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography	Mengchen Zhang et.al.	2504.07083v2	null
2025-04-09	Detecting AI-generated Artwork	Meien Li et.al.	2504.07078v1	null
2025-04-09	Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling	Riselda Kodra et.al.	2504.07065v1	null
2025-04-09	$Π$-NeSy: A Possibilistic Neuro-Symbolic Approach	Ismaïl Baaj et.al.	2504.07055v1	null
2025-04-09	Classification results for totally real surfaces of nearly Kähler $\mathbb{C}P^3$	Michaël Liefsoens et.al.	2504.07035v1	null
2025-04-09	Weak Signals and Heavy Tails: Machine-learning meets Extreme Value Theory	Stephan Clémençon et.al.	2504.06984v1	null
2025-04-10	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958v2	null
2025-04-08	PainNet: Statistical Relation Network with Episode-Based Training for Pain Estimation	Mina Bishay et.al.	2504.06257v1	null
2025-04-08	Monitoring Viewer Attention During Online Ads	Mina Bishay et.al.	2504.06237v1	null
2025-04-08	From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Chejian Xu et.al.	2504.06214v1	null
2025-04-08	HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation	Yiming Liang et.al.	2504.06210v1	null
2025-04-08	An experimental survey and Perspective View on Meta-Learning for Automated Algorithms Selection and Parametrization	Moncef Garouani et.al.	2504.06207v1	null
2025-04-08	HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling	Qing Xu et.al.	2504.06205v1	link
2025-04-08	Positive 3-braids, Khovanov homology and Garside theory	Álvaro Del Valle Vílchez et.al.	2504.06194v1	null
2025-04-08	Rethinking the Nested U-Net Approach: Enhancing Biomarker Segmentation with Attention Mechanisms and Multiscale Feature Fusion	Saad Wazir et.al.	2504.06158v1	link
2025-04-08	A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning	Akash Kumar et.al.	2504.06153v1	null
2025-04-08	Optimal classification with outcome performativity	Elizabeth Maggie Penn et.al.	2504.06127v1	null
2025-04-07	SmolVLM: Redefining small and efficient multimodal models	Andrés Marafioti et.al.	2504.05299v1	null
2025-04-07	One-Minute Video Generation with Test-Time Training	Karan Dalal et.al.	2504.05298v1	null
2025-04-07	Hopf tori and standard tori	Leonardo A. Cano García et.al.	2504.05285v1	null
2025-04-07	AnomalousNet: A Hybrid Approach with Attention U-Nets and Change Point Detection for Accurate Characterization of Anomalous Diffusion in Video Data	Yusef Ahsini et.al.	2504.05271v1	null
2025-04-07	Explaining Low Perception Model Competency with High-Competency Counterfactuals	Sara Pohland et.al.	2504.05254v1	null
2025-04-07	Federated Learning for Medical Image Classification: A Comprehensive Benchmark	Zhekai Zhou et.al.	2504.05238v1	null
2025-04-07	Mapping biodiversity at very-high resolution in Europe	César Leblanc et.al.	2504.05231v1	null
2025-04-07	Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation	Jiaming Chen et.al.	2504.05225v1	null
2025-04-07	An ensemble deep learning approach to detect tumors on Mohs micrographic surgery slides	Abdurrahim Yilmaz et.al.	2504.05219v1	null
2025-04-07	LLM-Alignment Live-Streaming Recommendation	Yueyang Liu et.al.	2504.05217v1	null
2025-04-04	Bonsai: Interpretable Tree-Adaptive Grounded Reasoning	Kate Sanders et.al.	2504.03640v1	null
2025-04-04	MedSAM2: Segment Anything in 3D Medical Images and Videos	Jun Ma et.al.	2504.03600v1	null
2025-04-04	Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin for Real-World Robot Policy Evaluation	Jad Abou-Chakra et.al.	2504.03597v1	null
2025-04-04	AdaViT: Adaptive Vision Transformer for Flexible Pretrain and Finetune with Variable 3D Medical Image Modalities	Badhan Kumar Das et.al.	2504.03589v1	null
2025-04-04	AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing	Niu Lian et.al.	2504.03587v1	link
2025-04-04	Dense Neural Network Based Arrhythmia Classification on Low-cost and Low-compute Micro-controller	Md Abu Obaida Zishan et.al.	2504.03531v1	null
2025-04-04	LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders	Ilan Naiman et.al.	2504.03501v1	null
2025-04-04	Physics-informed 4D X-ray image reconstruction from ultra-sparse spatiotemporal data	Zisheng Yao et.al.	2504.03469v1	null
2025-04-04	Conditioning Diffusions Using Malliavin Calculus	Jakiw Pidstrigach et.al.	2504.03461v1	null
2025-04-04	Early detection of diabetes through transfer learning-based eye (vision) screening and improvement of machine learning model performance and advanced parameter setting algorithms	Mohammad Reza Yousefi et.al.	2504.03439v1	null
2025-04-03	STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection	Divya Velayudhan et.al.	2504.02823v1	null
2025-04-03	GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings	Yuexi Du et.al.	2504.02819v1	null
2025-04-03	BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation	Van Nguyen Nguyen et.al.	2504.02812v1	null
2025-04-03	Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets	Chuning Zhu et.al.	2504.02792v1	null
2025-04-03	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782v1	null
2025-04-03	Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model	Shengjun Zhang et.al.	2504.02764v1	null
2025-04-03	A Complete Classification of Fourier Summation Formulas on the real line	Felipe Gonçalves et.al.	2504.02741v1	null
2025-04-03	HQViT: Hybrid Quantum Vision Transformer for Image Classification	Hui Zhang et.al.	2504.02730v1	null
2025-04-03	Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation	Xingguang Zhang et.al.	2504.02697v1	null
2025-04-03	Two-Stage nnU-Net for Automatic Multi-class Bi-Atrial Segmentation from LGE-MRIs	Y. On et.al.	2504.02668v1	null
2025-04-02	Learning from Streaming Video with Orthogonal Gradients	Tengda Han et.al.	2504.01961v1	null
2025-04-02	Slot-Level Robotic Placement via Visual Imitation from Single Human Video	Dandan Shan et.al.	2504.01959v1	null
2025-04-03	VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step	Hanyang Wang et.al.	2504.01956v2	null
2025-04-02	A thorough benchmark of automatic text classification: From traditional approaches to large language models	Washington Cunha et.al.	2504.01930v1	null
2025-04-02	Gen-C: Populating Virtual Worlds with Generative Crowds	Andreas Panayiotou et.al.	2504.01924v1	null
2025-04-02	Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	Haochen Wang et.al.	2504.01901v1	null
2025-04-02	Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Shreyank N Gowda et.al.	2504.01890v1	null
2025-04-02	CO-DEFEND: Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection	Diego Cajaraville-Aboy et.al.	2504.01882v1	null
2025-04-02	Architect Your Landscape Approach (AYLA) for Optimizations in Deep Learning	Ben Keslaki et.al.	2504.01875v1	null
2025-04-02	Buggin: Automatic intrinsic bugs classification model using NLP and ML	Pragya Bhandari et.al.	2504.01869v1	null
2025-03-31	Easi3R: Estimating Disentangled Motion from DUSt3R Without Training	Xingyu Chen et.al.	2503.24391v1	link
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379v1	null
2025-03-31	Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1	Yi Chen et.al.	2503.24376v1	link
2025-04-02	Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation	Abhiram Maddukuri et.al.	2503.24361v2	null
2025-03-31	Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning	Chao Luan et.al.	2503.24356v1	null
2025-03-31	PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks	Fang Yan et.al.	2503.24345v1	null
2025-03-31	On gradient $ρ$-Einstein solitons with Bach tensor radially nonnegative	Maria Andrade et.al.	2503.24337v1	null
2025-03-31	NoProp: Training Neural Networks without Back-propagation or Forward-propagation	Qinyu Li et.al.	2503.24322v1	null
2025-03-31	A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG	Arshia Kermani et.al.	2503.24307v1	null
2025-03-31	Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions	Thinesh Thiyakesan Ponbagavathi et.al.	2503.24298v1	null
2025-03-28	Understanding Co-speech Gestures in-the-wild	Sindhu B Hegde et.al.	2503.22668v1	null
2025-03-28	Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure	Frank J. Brooks et.al.	2503.22658v1	null
2025-03-28	Deep learning-enabled prediction of surgical errors during cataract surgery: from simulation to real-world application	Maxime Faure et.al.	2503.22647v1	null
2025-03-28	Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning	Stefano Grassi et.al.	2503.22629v1	null
2025-03-28	Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model	Jangho Park et.al.	2503.22622v1	null
2025-03-28	Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Antonia Karamolegkou et.al.	2503.22610v1	null
2025-03-28	Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis	Shuai Shen et.al.	2503.22605v1	null
2025-03-28	Zero-homogeneous and $O(2)$-equivariant critical points of the Oseen-Frank energy with multiple Frank constants	Luc Nguyen et.al.	2503.22599v1	null
2025-03-28	KEVS: Enhancing Segmentation of Visceral Adipose Tissue in Pre-Cystectomy CT with Gaussian Kernel Density Estimation	Thomas Boucher et.al.	2503.22592v1	null
2025-03-28	Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012	Adam Breuer et.al.	2503.22589v1	link
2025-03-27	Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Abdelrahman Shaker et.al.	2503.21782v1	link
2025-03-27	VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models	Chi-Pin Huang et.al.	2503.21781v1	null
2025-03-27	Video-R1: Reinforcing Video Reasoning in MLLMs	Kaituo Feng et.al.	2503.21776v1	link
2025-03-27	StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion	Ziyu Guo et.al.	2503.21775v1	null
2025-03-27	Exploring the Evolution of Physics Cognition in Video Generation: A Survey	Minghui Lin et.al.	2503.21765v1	link
2025-03-28	Phases with non-invertible symmetries in 1+1D – symmetry protected topological orders as duality automorphisms	Ömer M. Aksoy et.al.	2503.21764v2	null
2025-03-27	Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video	David Yifan Yao et.al.	2503.21761v1	null
2025-03-27	Large Scale Structure and the Cosmic Web	Rita Tojeiro et.al.	2503.21759v1	null
2025-03-27	VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness	Dian Zheng et.al.	2503.21755v1	link
2025-03-27	MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX	Liuyue Xie et.al.	2503.21699v1	null
2025-03-26	Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency	Tianqi Liu et.al.	2503.20785v1	null
2025-03-26	Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising	Yan-Bo Lin et.al.	2503.20782v1	null
2025-03-26	BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation	Yulu Pan et.al.	2503.20781v1	null
2025-03-26	Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields	Shijie Zhou et.al.	2503.20776v1	null
2025-03-26	Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data	Masoumeh Sharafi et.al.	2503.20771v1	null
2025-03-27	An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy	Haotian Yang et.al.	2503.20768v2	null
2025-03-26	PhysGen3D: Crafting a Miniature Interactive World from a Single Image	Boyuan Chen et.al.	2503.20746v1	null
2025-03-26	MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams	Yanpeng Sun et.al.	2503.20745v1	null
2025-03-26	RecTable: Fast Modeling Tabular Data with Rectified Flow	Masane Fuchi et.al.	2503.20731v1	null
2025-03-26	MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion	Saron Samuel et.al.	2503.20698v1	null
2025-03-25	PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model	Mingju Gao et.al.	2503.19913v1	null
2025-03-25	FullDiT: Multi-Task Video Generative Foundation Model with Full Attention	Xuan Ju et.al.	2503.19907v1	null
2025-03-25	Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better	Zihang Lai et.al.	2503.19904v1	null
2025-03-25	Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation	Tianhao Qi et.al.	2503.19881v1	null
2025-03-25	Extensions of regret-minimization algorithm for optimal design	Youguang Chen et.al.	2503.19874v1	null
2025-03-25	Unpaired Translation of Chest X-ray Images for Lung Opacity Diagnosis via Adaptive Activation Masks and Cross-Domain Alignment	Junzhi Ning et.al.	2503.19860v1	null
2025-03-25	Towards Online Multi-Modal Social Interaction Understanding	Xinpeng Li et.al.	2503.19851v1	null
2025-03-25	FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs	Carlos Plou et.al.	2503.19850v1	null
2025-03-26	Attention IoU: Examining Biases in CelebA using Attention Maps	Aaron Serianni et.al.	2503.19846v2	link
2025-03-25	Multi-view Learning for the Identification of Risky Users in Dynamic Social Networks	Francesco Benedetti et.al.	2503.19831v1	null
2025-03-24	Target-Aware Video Diffusion Models	Taeksoo Kim et.al.	2503.18950v1	null
2025-03-24	Aether: Geometric-Aware Unified World Modeling	Aether Team et.al.	2503.18945v1	null
2025-03-24	SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding	Mingze Xu et.al.	2503.18943v1	null
2025-03-24	Video-T1: Test-Time Scaling for Video Generation	Fangfu Liu et.al.	2503.18942v1	null
2025-03-24	Training-free Diffusion Acceleration with Bottleneck Sampling	Ye Tian et.al.	2503.18940v1	null
2025-03-24	AdaWorld: Learning Adaptable World Models with Latent Actions	Shenyuan Gao et.al.	2503.18938v1	null
2025-03-24	SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction	Enrico Pallotta et.al.	2503.18933v1	null
2025-03-24	CoMP: Continual Multimodal Pre-training for Vision Foundation Models	Yitong Chen et.al.	2503.18931v1	link
2025-03-24	Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models	Meng Cao et.al.	2503.18923v1	null
2025-03-24	Online 3D Scene Reconstruction Using Neural Object Priors	Thomas Chabal et.al.	2503.18897v1	null
2025-03-21	Position: Interactive Generative Video as Next-Generation Game Engine	Jiwen Yu et.al.	2503.17359v1	null
2025-03-21	Time-Series U-Net with Recurrence for Noise-Robust Imaging Photoplethysmography	Vineet R. Shenoy et.al.	2503.17351v1	null
2025-03-21	Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer	Qingyu Shi et.al.	2503.17350v1	null
2025-03-21	Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs	Reem Gody et.al.	2503.17336v1	null
2025-03-21	Lattice Materials with Topological States Optimized On-Demand	Pegah Azizi et.al.	2503.17320v1	null
2025-03-21	Quasiconformal Maps between Bowditch Boundaries of Relatively Hyperbolic Groups	Rana Sardar et.al.	2503.17312v1	null
2025-03-21	LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language	Kun Chu et.al.	2503.17309v1	null
2025-03-21	Exploring the Temporal Dynamics of Facial Mimicry in Emotion Processing Using Action Units	Meisam Jamshidi Seikavandi et.al.	2503.17306v1	null
2025-03-21	HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks	Maria Pilligua et.al.	2503.17276v1	null
2025-03-21	Vision Transformer Based Semantic Communications for Next Generation Wireless Networks	Muhammad Ahmed Mohsin et.al.	2503.17275v1	null
2025-03-20	XAttention: Block Sparse Attention with Antidiagonal Scoring	Ruyi Xu et.al.	2503.16428v1	null
2025-03-20	MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance	Quanhao Li et.al.	2503.16421v1	null
2025-03-20	M3: 3D-Spatial MultiModal Memory	Xueyan Zou et.al.	2503.16413v1	null
2025-03-20	ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos	Haolin Yang et.al.	2503.16400v1	null
2025-03-21	SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation	Chun-Han Yao et.al.	2503.16396v2	null
2025-03-20	Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation	Kristin Qi et.al.	2503.16389v1	null
2025-03-20	Probabilistic Quantum SVM Training on Ising Machine	Haoqi He et.al.	2503.16363v1	null
2025-03-20	Enhancing variational quantum algorithms by balancing training on classical and quantum hardware	Rahul Bhowmick et.al.	2503.16361v1	null
2025-03-20	UniSync: A Unified Framework for Audio-Visual Synchronization	Tao Feng et.al.	2503.16357v1	null
2025-03-20	Principal Actions on Topological Quivers and Associated Operator Dynamics	Matthew Gillespie et.al.	2503.16352v1	null
2025-03-19	Fast Two-photon Microscopy by Neuroimaging with Oblong Random Acquisition (NORA)	Esther Whang et.al.	2503.15487v1	null
2025-03-19	TULIP: Towards Unified Language-Image Pretraining	Zineng Tang et.al.	2503.15485v1	null
2025-03-19	Learning to Play Piano in the Real World	Yves-Simon Zeulner et.al.	2503.15481v1	null
2025-03-19	Cube: A Roblox View of 3D Intelligence	Foundation AI Team et.al.	2503.15475v1	null
2025-03-19	EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining	Boshen Xu et.al.	2503.15470v1	null
2025-03-20	Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification	ZhengLin Lai et.al.	2503.15469v2	link
2025-03-19	LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding	Amirhossein Kazerouni et.al.	2503.15420v1	null
2025-03-19	Temporal Regularization Makes Your Video Generator Stronger	Harold Haodong Chen et.al.	2503.15417v1	null
2025-03-19	Automated Processing of eXplainable Artificial Intelligence Outputs in Deep Learning Models for Fault Diagnostics of Large Infrastructures	Giovanni Floreale et.al.	2503.15415v1	null
2025-03-19	Federated Continual 3D Segmentation With Single-round Communication	Can Peng et.al.	2503.15414v1	null
2025-03-18	MusicInfuser: Making Video Diffusion Listen and Dance	Susung Hong et.al.	2503.14505v1	null
2025-03-18	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu et.al.	2503.14504v1	null
2025-03-18	Utilization of Neighbor Information for Image Classification with Different Levels of Supervision	Gihan Jayatilaka et.al.	2503.14500v1	null
2025-03-18	Tracking Meets Large Multimodal Models for Driving Scenario Understanding	Ayesha Ishaq et.al.	2503.14498v1	null
2025-03-18	Stable Virtual Camera: Generative View Synthesis with Diffusion Models	Jensen et.al.	2503.14489v1	null
2025-03-18	Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset	Yiqun Mei et.al.	2503.14485v1	null
2025-03-18	SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model	Yucheng Mao et.al.	2503.14463v1	null
2025-03-18	Functional classification of metabolic networks	Jorge Reyes et.al.	2503.14437v1	null
2025-03-18	LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers	Nikhil Abhyankar et.al.	2503.14434v1	null
2025-03-18	MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation	Hongyu Zhang et.al.	2503.14428v1	null
2025-03-17	VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning	Ye Liu et.al.	2503.13444v1	null
2025-03-17	Can Yang-Baxter imply Lie algebra?	Dmitry Khudoteplov et.al.	2503.13437v1	null
2025-03-17	WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes	Ling Yang et.al.	2503.13435v1	null
2025-03-17	Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes	Nhi Pham et.al.	2503.13429v1	null
2025-03-17	FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation	Shijie Fang et.al.	2503.13418v1	null
2025-03-17	U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord	Qi Zhang et.al.	2503.13400v1	null
2025-03-17	TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM	Ye Wang et.al.	2503.13377v1	null
2025-03-17	Multivariate Sparse Functional Linear Discriminant Analysis: An Application to Inflammatory Bowel Disease Classification	Limeng Liu et.al.	2503.13372v1	null
2025-03-17	SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization	Xulin Fan et.al.	2503.13371v1	null
2025-03-17	Agents Play Thousands of 3D Video Games	Zhongwen Xu et.al.	2503.13356v1	null
2025-03-14	Scalable Video Conferencing Using SDN Principles	Oliver Michel et.al.	2503.11649v1	null
2025-03-14	ReCamMaster: Camera-Controlled Generative Rendering from A Single Video	Jianhong Bai et.al.	2503.11647v1	null
2025-03-14	Pathology Image Compression with Pre-trained Autoencoders	Srikar Yellapragada et.al.	2503.11591v1	null
2025-03-14	Generalization performance of neural mapping schemes for the space-time interpolation of satellite-derived ocean colour datasets	Thi Thuy Nga Nguyen et.al.	2503.11588v1	null
2025-03-14	Image Reconstruction from an Elastically Distorted Scan	Adrian Lopez et.al.	2503.11584v1	null
2025-03-14	Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers	Weiming Ren et.al.	2503.11579v1	null
2025-03-14	RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing	Tianrui Pan et.al.	2503.11571v1	null
2025-03-14	Observation-only learning of neural mapping schemes for gappy satellite-derived ocean colour parameters	Clément Dorffer et.al.	2503.11532v1	null
2025-03-14	HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models	Ziqin Zhou et.al.	2503.11513v1	null
2025-03-14	Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models	Siva Manohar Reddy Kesu et.al.	2503.11511v1	null
2025-03-13	V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes	Yanming Zhang et.al.	2503.10634v1	null
2025-03-13	NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models	Mert Albaba et.al.	2503.10626v1	null
2025-03-13	LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds	Lingteng Qiu et.al.	2503.10625v1	null
2025-03-13	OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer	Jinyang Li et.al.	2503.10616v1	null
2025-03-13	MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction	Yingshuang Zou et.al.	2503.10604v1	null
2025-03-13	CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models	Hao He et.al.	2503.10592v1	null
2025-03-13	Long Context Tuning for Video Generation	Yuwei Guo et.al.	2503.10589v1	null
2025-03-13	Learning Interpretable Logic Rules from Deep Vision Models	Chuqin Geng et.al.	2503.10547v1	null
2025-03-13	From Linear to Spline-Based Classification: Developing and Enhancing SMPA for Noisy Non-Linear Datasets	Vatsal Srivastava et.al.	2503.10545v1	null
2025-03-13	Lightweight Models for Emotional Analysis in Video	Quoc-Tien Nguyen et.al.	2503.10530v1	null
2025-03-12	PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop	Chenyu Li et.al.	2503.09595v1	null
2025-03-12	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590v1	null
2025-03-12	Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching	Nannan Wu et.al.	2503.09587v1	null
2025-03-12	Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot	Andrew Crossman et.al.	2503.09586v1	null
2025-03-12	Manify: A Python Library for Learning Non-Euclidean Representations	Philippe Chlenski et.al.	2503.09576v1	null
2025-03-12	TPDiff: Temporal Pyramid Video Diffusion Model	Lingmin Ran et.al.	2503.09566v1	null
2025-03-12	FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model	Jiahao Xia et.al.	2503.09560v1	null
2025-03-13	The R2D2 Deep Neural Network Series for Scalable Non-Cartesian Magnetic Resonance Imaging	Yiwei Chen et.al.	2503.09559v2	null
2025-03-12	CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games	Peng Chen et.al.	2503.09527v1	null
2025-03-12	Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework	Bakary Badjie et.al.	2503.09504v1	null
2025-03-11	QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension	Yongdong Luo et.al.	2503.08689v1	null
2025-03-11	REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder	Yitian Zhang et.al.	2503.08665v1	null
2025-03-11	MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention	Yuhan Wang et.al.	2503.08664v1	null
2025-03-11	Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems	Yufeng Diao et.al.	2503.08661v1	null
2025-03-11	How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?	Gal Alon et.al.	2503.08633v1	null
2025-03-11	Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer	Apan Dastider et.al.	2503.08622v1	null
2025-03-11	Vision Transformer for Intracranial Hemorrhage Classification in CT Scans Using an Entropy-Aware Fuzzy Integral Strategy for Adaptive Scan-Level Decision Fusion	Mehdi Hosseini Chagahi et.al.	2503.08609v1	null
2025-03-11	Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling	Subin Kim et.al.	2503.08605v1	null
2025-03-11	Towards species' classification of the Anastrepha pseudoparallela group	Gabriel R. Palma et.al.	2503.08598v1	null
2025-03-11	Proc4Gem: Foundation models for physical agency through procedural generation	Yixin Lin et.al.	2503.08593v1	null
2025-03-10	Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru	Dunant Cusipuma et.al.	2503.07587v1	null
2025-03-10	Efficient Distributed Learning over Decentralized Networks with Convoluted Support Vector Machine	Canyi Chen et.al.	2503.07563v1	null
2025-03-10	CPAny: Couple With Any Encoder to Refer Multi-Object Tracking	Weize Li et.al.	2503.07516v1	null
2025-03-10	ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning	Soumya Banerjee et.al.	2503.07506v1	null
2025-03-10	Blind-Wayfarer: A Minimalist, Probing-Driven Framework for Resilient Navigation in Perception-Degraded Environments	Yanran Xu et.al.	2503.07492v1	null
2025-03-10	NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface	Chengrui Zhu et.al.	2503.07491v1	null
2025-03-10	VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models	Jiacheng Ruan et.al.	2503.07478v1	null
2025-03-10	A Review on Geometry and Surface Inspection in 3D Concrete Printing	K. Mawas et.al.	2503.07472v1	null
2025-03-10	Simultaneous Energy Harvesting and Bearing Fault Detection using Piezoelectric Cantilevers	P. Peralta-Braz et.al.	2503.07462v1	null
2025-03-10	Open-Set Gait Recognition from Sparse mmWave Radar Point Clouds	Riccardo Mazzieri et.al.	2503.07435v1	null
2025-03-10	Analysis of 3D Urticaceae Pollen Classification Using Deep Learning Models	Tijs Konijn et.al.	2503.07419v1	null
2025-03-10	AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion	Mingzhen Sun et.al.	2503.07418v1	null
2025-03-10	TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision	Shaobin Zhuang et.al.	2503.07416v1	null
2025-03-10	Keeping Representation Similarity in Finetuning for Medical Image Analysis	Wenqiang Zu et.al.	2503.07399v1	null
2025-03-10	Brain Inspired Adaptive Memory Dual-Net for Few-Shot Image Classification	Kexin Di et.al.	2503.07396v1	null
2025-03-10	Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs	Gonzalo Mancera et.al.	2503.07384v1	null
2025-03-07	Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation	Zhenxuan Zhang et.al.	2503.05682v1	null
2025-03-07	A comparison of the Alkire-Foster method and a Markov random field approach in the analysis of multidimensional poverty	Joseph Lam et.al.	2503.05676v1	null
2025-03-07	Kinodynamic Model Predictive Control for Energy Efficient Locomotion of Legged Robots with Parallel Elasticity	Yulun Zhuang et.al.	2503.05666v1	null
2025-03-07	A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval	Yu Zhang et.al.	2503.05659v1	null
2025-03-07	On a classification problem for a quiver of type $\widetilde{A}_{3}$	Ivon Dorado et.al.	2503.05643v1	null
2025-03-07	VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control	Yuxuan Bian et.al.	2503.05639v1	null
2025-03-07	TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models	Mark YU et.al.	2503.05638v1	null
2025-03-07	Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study	Ali Samimi Fard et.al.	2503.05629v1	null
2025-03-07	Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings	Xuanqing Liu et.al.	2503.05620v1	null
2025-03-07	CACTUS: An Open Dataset and Framework for Automated Cardiac Assessment and Classification of Ultrasound Images Using Deep Transfer Learning	Hanae Elmekki et.al.	2503.05604v1	null
2025-03-06	FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video	Yue Gao et.al.	2503.04720v1	null
2025-03-06	Iris Style Transfer: Enhancing Iris Recognition with Style Features and Privacy Preservation through Neural Style Transfer	Mengdi Wang et.al.	2503.04707v1	null
2025-03-07	Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size	Alireza Behtash et.al.	2503.04704v2	null
2025-03-06	Coarse graining and reduced order models for plume ejection dynamics	Ike Griss Salas et.al.	2503.04690v1	null
2025-03-06	Mixed Near-field and Far-field Target Localization for Low-altitude Economy	Cong Zhou et.al.	2503.04681v1	null
2025-03-06	An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding	Dou Hu et.al.	2503.04667v1	null
2025-03-06	What Are You Doing? A Closer Look at Controllable Human Video Generation	Emanuele Bugliarello et.al.	2503.04666v1	null
2025-03-06	Implicit Neural Representation for Video and Image Super-Resolution	Mary Aiyetigbo et.al.	2503.04665v1	null
2025-03-06	RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining	Tengfei Zhang et.al.	2503.04653v1	null
2025-03-06	Adaptive Prototype Learning for Multimodal Cancer Survival Analysis	Hong Liu et.al.	2503.04643v1	null
2025-03-05	GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control	Xuanchi Ren et.al.	2503.03751v1	link
2025-03-05	PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning	Ryozo Masukawa et.al.	2503.03747v1	null
2025-03-05	OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction	Huang Huang et.al.	2503.03734v1	null
2025-03-05	Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements	Carlo Dindorf et.al.	2503.03717v1	null
2025-03-05	Handling Uncertainty in Health Data using Generative Algorithms	Mahdi Arab Loodaricheh et.al.	2503.03715v1	null
2025-03-05	Rethinking Video Tokenization: A Conditioned Diffusion-based Approach	Nianzu Yang et.al.	2503.03708v1	null
2025-03-05	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	Zhao Yang et.al.	2503.03689v1	null
2025-03-05	Empowering Multi-class Classification for Complex Functional Data with Simultaneous Feature Selection	Shuoyang Wang et.al.	2503.03679v1	null
2025-03-05	LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant	Wei Li et.al.	2503.03663v1	null
2025-03-05	Limits of nonlinear and dispersive fiber propagation for photonic extreme learning	Andrei V. Ermolaev et.al.	2503.03649v1	null
2025-03-04	Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation	Han Xue et.al.	2503.02881v1	null
2025-03-04	SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models	Dmitry Nechaev et.al.	2503.02876v1	null
2025-03-04	Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees	Emma Ceccherini et.al.	2503.02859v1	null
2025-03-04	Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024	Nuria Alina Chandra et.al.	2503.02857v1	null
2025-03-04	Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data	Amin Honarmandi Shandiz et.al.	2503.02849v1	null
2025-03-04	In-Depth Analysis of Automated Acne Disease Recognition and Classification	Afsana Ahsan Jeny et.al.	2503.02835v1	null
2025-03-04	A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness	Nathan Drenkow et.al.	2503.02797v1	null
2025-03-04	Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution	Ru Ito et.al.	2503.02767v1	null
2025-03-04	Seeded Poisson Factorization: Leveraging domain knowledge to fit topic models	Bernd Prostmaier et.al.	2503.02741v1	null
2025-03-04	UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression	Jia Wang et.al.	2503.02733v1	null
2025-02-28	TomoSelfDEQ: Self-Supervised Deep Equilibrium Learning for Sparse-Angle CT Reconstruction	Tatiana A. Bubba et.al.	2502.21320v1	null
2025-02-28	Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos	Zhiyu Tan et.al.	2502.21314v1	null
2025-02-28	AutoComb: Automated Comb Sign Detector for 3D CTE Scans	Shashwat Gupta et.al.	2502.21311v1	null
2025-02-28	Bilevel Optimized Implicit Neural Representation for Scan-Specific Accelerated MRI Reconstruction	Hongze Yu et.al.	2502.21292v1	null
2025-02-28	Utilizing Quantum Fingerprints in Plant Cells to Evaluate Plant productivity	Umadini Ranasinghe et.al.	2502.21275v1	null
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271v1	null
2025-02-28	PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts	Boxiao Yu et.al.	2502.21260v1	null
2025-02-28	RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete	Yuheng Ji et.al.	2502.21257v1	null
2025-02-28	ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG	Aleksandr Kovalev et.al.	2502.21256v1	null
2025-02-28	Short-Rate Derivatives in a Higher-for-Longer Environment	Aram Karakhanyan et.al.	2502.21252v1	null
2025-02-27	Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models	Susmit Agrawal et.al.	2502.20393v1	null
2025-02-27	Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation	Siddhant Haldar et.al.	2502.20391v1	null
2025-02-27	Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation	Sucheng Ren et.al.	2502.20388v1	null
2025-02-27	InsTaG: Learning Personalized 3D Talking Head from Few-Second Video	Jiahe Li et.al.	2502.20387v1	null
2025-02-27	ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting	Dexter Ong et.al.	2502.20386v1	null
2025-02-27	Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling	Hanyang Kong et.al.	2502.20378v1	null
2025-02-27	When does a predictor know its own loss?	Aravind Gollakota et.al.	2502.20375v1	null
2025-02-27	OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection	Shuming Liu et.al.	2502.20361v1	link
2025-02-27	KNOWM Memristors in a Bridge Synapse delay-based Reservoir Computing system for detection of epileptic seizures	Dawid Przyczyna et.al.	2502.20351v1	null
2025-02-27	T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration	Tamir Shor et.al.	2502.20333v1	null
2025-02-26	TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding	Max Ku et.al.	2502.19400v1	null
2025-02-26	Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis	Minjoo Lim et.al.	2502.19390v1	null
2025-02-26	Surface-Based Manipulation	Ziqiao Wang et.al.	2502.19389v1	null
2025-02-26	Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis	Hamdan Al Ahbabi et.al.	2502.19387v1	null
2025-02-26	Efficient 4D fMRI ASD Classification using Spatial-Temporal-Omics-based Learning Framework	Ziqiao Weng et.al.	2502.19386v1	null
2025-02-26	Deep Learning For Time Series Analysis With Application On Human Motion	Ali Ismail-Fawaz et.al.	2502.19364v1	null
2025-02-26	Deep Learning-Based Transfer Learning for Classification of Cassava Disease	Ademir G. Costa Junior et.al.	2502.19351v1	null
2025-02-26	Unveiling Wireless Users' Locations via Modulation Classification-based Passive Attack	Ali Hanif et.al.	2502.19341v1	null
2025-02-26	I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning	Stephan Rabanser et.al.	2502.19335v1	null
2025-02-26	Deep learning and classical computer vision techniques in medical image analysis: Case studies on brain MRI tissue segmentation, lung CT COPD registration, and skin lesion classification	Anyimadu Daniel Tweneboah et.al.	2502.19258v1	null
2025-02-25	Ion counting and temperature determination of Coulomb-crystallized laser-cooled ions in traps using convolutional neural networks	Yanning Yin et.al.	2502.18442v1	null
2025-02-25	Is OpenAlex Suitable for Research Quality Evaluation and Which Citation Indicator is Best?	Mike Thelwall et.al.	2502.18427v1	null
2025-02-25	Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand	Fengshuo Bai et.al.	2502.18423v1	null
2025-02-25	MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification	Zhuoqin Yang et.al.	2502.18416v1	null
2025-02-25	Enhancing DNA Foundation Models to Address Masking Inefficiencies	Monireh Safari et.al.	2502.18405v1	null
2025-02-25	Learning sparse generalized linear models with binary outcomes via iterative hard thresholding	Namiko Matsumoto et.al.	2502.18393v1	null
2025-02-25	EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity	Dominik Hollidt et.al.	2502.18373v1	null
2025-02-25	MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning	Sepehr Asgarian et.al.	2502.18371v1	null
2025-02-25	Exploring proteomic signatures in sepsis and non-infectious systemic inflammatory response syndrome	Adolfo Ruiz-Sanmartín et.al.	2502.18305v1	null
2025-02-25	Quantization of the Momentum Map via $\frak{g}$-adapted Formalities	Chiara Esposito et.al.	2502.18295v1	null
2025-02-24	FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning	Jason Jingzhou Liu et.al.	2502.17432v1	null
2025-02-24	X-Dancer: Expressive Music to Human Dance Video Generation	Zeyuan Chen et.al.	2502.17414v1	null
2025-02-24	Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects	Liuchuan Yu et.al.	2502.17399v1	null
2025-02-24	Robust Confinement State Classification with Uncertainty Quantification through Ensembled Data-Driven Methods	Yoeri Poels et.al.	2502.17397v1	null
2025-02-24	RELICT: A Replica Detection Framework for Medical Image Generation	Orhun Utku Aydin et.al.	2502.17360v1	null
2025-02-24	Travel Time Reliability in Stochastic Kinematic Flow Models	Alexander Hammerl et.al.	2502.17359v1	null
2025-02-24	Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training	Karan Samel et.al.	2502.17352v1	null
2025-02-24	+Tour: Recommending personalized itineraries for smart tourism	João Paulo Esper et.al.	2502.17345v1	link
2025-02-24	City riots fed by transnational and trans-topic web-of-influence	Akshay Verma et.al.	2502.17331v1	null
2025-02-24	AnyTop: Character Animation Diffusion with Any Topology	Inbar Gat et.al.	2502.17327v1	null
2025-02-21	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	Florent Bartoccioni et.al.	2502.15672v1	link
2025-02-21	Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions	Gerard Ben Arous et.al.	2502.15655v1	null
2025-02-21	Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification	Vasilii Feofanov et.al.	2502.15637v1	null
2025-02-21	Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach	Xiangtong Yao et.al.	2502.15613v1	null
2025-02-21	PDeepPP:A Deep learning framework with Pretrained Protein language for peptide classification	Jixiu Zhai et.al.	2502.15610v1	null
2025-02-21	On the Robustness of Transformers against Context Hijacking for Linear Classification	Tianle Li et.al.	2502.15609v1	null
2025-02-21	Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models	Zahra Mansour et.al.	2502.15607v1	null
2025-02-21	Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification	Peiyu Duan et.al.	2502.15595v1	null
2025-02-21	Estimating Vehicle Speed on Roadways Using RNNs and Transformers: A Video-based Approach	Sai Krishna Reddy Mareddy et.al.	2502.15545v1	null
2025-02-21	Implications of Photon Mass: Vortextrap Magnetization of Black Holes	Gia Dvali et.al.	2502.15510v1	null
2025-02-20	Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts	Sara Ghaboura et.al.	2502.14865v1	null
2025-02-20	Dynamic Concepts Personalization from Single Videos	Rameen Abdal et.al.	2502.14844v1	null
2025-02-20	Improving the Diffusability of Autoencoders	Ivan Skorokhodov et.al.	2502.14831v1	null
2025-02-20	Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning	Oren Yuval et.al.	2502.14808v1	null
2025-02-20	FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis	Fadillah Maani et.al.	2502.14807v1	null
2025-02-20	AVD2: Accident Video Diffusion for Accident Video Description	Cheng Li et.al.	2502.14801v1	null
2025-02-20	Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration	Pengxiang Ding et.al.	2502.14795v1	null
2025-02-20	SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features	Michael Tschannen et.al.	2502.14786v1	link
2025-02-20	Sparse Activations as Conformal Predictors	Margarida M. Campos et.al.	2502.14773v1	null
2025-02-20	MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders	Maya Varma et.al.	2502.14753v1	null
2025-02-19	Qwen2.5-VL Technical Report	Shuai Bai et.al.	2502.13923v1	null
2025-02-19	Audio-Based Classification of Insect Species Using Machine Learning Models: Cicada, Beetle, Termite, and Cricket	Manas V Shetty et.al.	2502.13893v1	null
2025-02-19	Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition	Idris Hamoud et.al.	2502.13883v1	null
2025-02-19	Ribbon blocks for centraliser algebras of symmetric groups	Matthew Fayers et.al.	2502.13867v1	null
2025-02-19	MSVCOD:A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection	Shuyong Gao et.al.	2502.13859v1	null
2025-02-19	Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model	Hang Yin et.al.	2502.13838v1	null
2025-02-19	MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation	Yucheng Zeng et.al.	2502.13808v1	null
2025-02-19	Classifying thick subcategories over a Koszul complex via the curved BGG correspondence	Jian Liu et.al.	2502.13806v1	null
2025-02-19	Binary VPN Traffic Detection Using Wavelet Features and Machine Learning	Yasameen Sajid Razooqi et.al.	2502.13804v1	null
2025-02-19	From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education	Yi-Fan Zhang et.al.	2502.13789v1	null
2025-02-18	Pre-training Auto-regressive Robotic Models with 4D Representations	Dantong Niu et.al.	2502.13142v1	null
2025-02-18	Magma: A Foundation Model for Multimodal AI Agents	Jianwei Yang et.al.	2502.13130v1	null
2025-02-18	BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification	Bich-Chung Phan et.al.	2502.13080v1	null
2025-02-18	L4P: Low-Level 4D Vision Perception Unified	Abhishek Badki et.al.	2502.13078v1	null
2025-02-18	Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection	Jingbiao Mei et.al.	2502.13061v1	null
2025-02-18	Benchmarking MedMNIST dataset on real quantum hardware	Gurinder Singh et.al.	2502.13056v1	null
2025-02-18	LAMD: Context-driven Android Malware Detection and Classification with LLMs	Xingzhi Qian et.al.	2502.13055v1	null
2025-02-18	QZO: A Catalog of 5 Million Quasars from the Zwicky Transient Facility	S. J. Nakoneczny et.al.	2502.13054v1	null
2025-02-18	Development of systematic uncertainty-aware neural network trainings for binned-likelihood analyses at the LHC	CMS Collaboration et.al.	2502.13047v1	null
2025-02-18	How far are two symmetric matrices from commuting? With an application to object characterisation and identification in metal detection	P. D. Ledger et.al.	2502.13038v1	null
2025-02-17	VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution	Chendong Wang et.al.	2502.12151v1	null
2025-02-17	Idiosyncrasies in Large Language Models	Mingjie Sun et.al.	2502.12150v1	null
2025-02-17	LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities	Florian Sestak et.al.	2502.12128v1	null
2025-02-17	Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy	Roman Malashin et.al.	2502.12125v1	null
2025-02-17	Crime in Proportions: Applying Compositional Data Analysis to European Crime Trends for 2022	Onur Batın Doğan et.al.	2502.12099v1	null
2025-02-17	Descriminative-Generative Custom Tokens for Vision-Language Models	Pramuditha Perera et.al.	2502.12095v1	null
2025-02-17	Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems	Yue Sun et.al.	2502.12086v1	null
2025-02-17	AdaSplash: Adaptive Sparse Flash Attention	Nuno Gonçalves et.al.	2502.12082v1	null
2025-02-17	Unhackable Temporal Rewarding for Scalable Video MLLMs	En Yu et.al.	2502.12081v1	null
2025-02-17	Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning	Jiayang Zhang et.al.	2502.12049v1	null
2025-02-14	Simplifying DINO via Coding Rate Regularization	Ziyang Wu et.al.	2502.10385v1	null
2025-02-14	Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data	Corinna Cortes et.al.	2502.10381v1	null
2025-02-14	Quasi-isometry classification of certain graph $2$-braid groups and its applications	Byung Hee An et.al.	2502.10366v1	null
2025-02-14	Proper Learnability and the Role of Unlabeled Data	Julian Asilis et.al.	2502.10359v1	null
2025-02-14	Diameter bounds for $SL(2,\mathbb{Z})$-orbits of origamis in $\mathcal{H}(2)$ and the Prym loci in $\mathcal{H}(4)$ and $\mathcal{H}(6)$	Luke Jeffreys et.al.	2502.10358v1	null
2025-02-14	OptimOTU: Taxonomically aware OTU clustering with optimized thresholds and a bioinformatics workflow for metabarcoding data	Brendan Furneaux et.al.	2502.10350v1	null
2025-02-14	Ocular Disease Classification Using CNN with Deep Convolutional Generative Adversarial Network	Arun Kunwar et.al.	2502.10334v1	null
2025-02-14	SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement	Yuhao Zhang et.al.	2502.10296v1	null
2025-02-14	Probing Perceptual Constancy in Large Vision Language Models	Haoran Sun et.al.	2502.10273v1	null
2025-02-14	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248v1	null
2025-02-13	Embed Any NeRF: Graph Meta-Networks for Neural Tasks on Arbitrary NeRF Architectures	Francesco Ballerini et.al.	2502.09623v1	null
2025-02-13	Exploring the Potential of Encoder-free Architectures in 3D LMMs	Yiwen Tang et.al.	2502.09620v1	null
2025-02-13	Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights	Jonathan Kahana et.al.	2502.09619v1	null
2025-02-13	DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References	Xueyi Liu et.al.	2502.09614v1	null
2025-02-13	Morphological Classification of Galaxies	Karen Masters et.al.	2502.09610v1	null
2025-02-13	GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis	Angelos Zavras et.al.	2502.09598v1	null
2025-02-13	Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Siyan Zhao et.al.	2502.09597v1	null
2025-02-13	Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Mark Beliaev et.al.	2502.09573v1	null
2025-02-13	Diffusing DeBias: a Recipe for Turning a Bug into a Feature	Massimiliano Ciranni et.al.	2502.09564v1	null
2025-02-13	Learned Correction Methods for Ultrasound Computed Tomography Imaging Using Simplified Physics Models	Luke Lozenski et.al.	2502.09546v1	null
2025-02-12	CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation	Qinghe Wang et.al.	2502.08639v1	null
2025-02-12	Rapid Whole Brain Mesoscale In-vivo MR Imaging using Multi-scale Implicit Neural Representation	Jun Lyu et.al.	2502.08634v1	null
2025-02-12	Ensemble based approach to quantifying uncertainty of LLM based classifications	Srijith Rajamohan et.al.	2502.08631v1	null
2025-02-12	Robot Data Curation with Mutual Information Estimators	Joey Hejna et.al.	2502.08623v1	null
2025-02-12	Forecasting Drought Using Machine Learning in California	Nan K. Li et.al.	2502.08622v1	null
2025-02-12	SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment	Tica Lin et.al.	2502.08621v1	null
2025-02-12	Learning Selection Cuts With Gradients	Mike Hance et.al.	2502.08615v1	null
2025-02-12	Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model	Saurabh Kataria et.al.	2502.08612v1	null
2025-02-12	CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection	Karish Grover et.al.	2502.08605v1	null
2025-02-12	Light-A-Video: Training-free Video Relighting via Progressive Light Fusion	Yujie Zhou et.al.	2502.08590v1	link
2025-02-11	Pippo: High-Resolution Multi-View Humans from a Single Image	Yash Kant et.al.	2502.07785v1	null
2025-02-11	Statistical Reevaluation of the USP Classification Boundary: Smaller Planets Within 1 Day, Larger Period Ratios Below 2 Days	Armaan V. Goyal et.al.	2502.07773v1	null
2025-02-11	A forbidden subgraph study for cut problems on graphs permitting loops and multiedges	Tala Eagling-Vose et.al.	2502.07769v1	null
2025-02-11	An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating	Mohammad Ali Labbaf Khaniki et.al.	2502.07755v1	null
2025-02-11	HiPoNet: A Topology-Preserving Multi-View Neural Network For High Dimensional Point Cloud and Single-Cell Data	Siddharth Viswanath et.al.	2502.07746v1	null
2025-02-11	Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling	Shuhuai Ren et.al.	2502.07737v1	null
2025-02-11	PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization	Bing Fan et.al.	2502.07707v1	null
2025-02-11	Magic 1-For-1: Generating One Minute Video Clips within One Minute	Hongwei Yi et.al.	2502.07701v1	null
2025-02-11	SoK: A Classification for AI-driven Personalized Privacy Assistants	Victor Morel et.al.	2502.07693v1	null
2025-02-11	Auto-Drafting Police Reports from Noisy ASR Outputs: A Trust-Centered LLM Approach	Param Kulkarni et.al.	2502.07677v1	null
2025-02-10	Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT	Dongyang Liu et.al.	2502.06782v1	null
2025-02-10	KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification	Yue Zhu et.al.	2502.06779v1	null
2025-02-10	ALMACAL XIII. Evolution of the CO luminosity function and the molecular gas mass density out to $z$ ~ 6	Victoria Bollo et.al.	2502.06778v1	null
2025-02-10	Enhancing Performance of Explainable AI Models with Constrained Concept Refinement	Geyu Liang et.al.	2502.06775v1	null
2025-02-10	History-Guided Video Diffusion	Kiwhan Song et.al.	2502.06764v1	null
2025-02-10	Equations over Finite Monoids with Infinite Promises	Alberto Larrauri et.al.	2502.06762v1	null
2025-02-10	Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and Uncertainty	Valia Efthymiou et.al.	2502.06749v1	null
2025-02-10	Wandering around: A bioinspired approach to visual attention through object motion sensitivity	Giulia D'Angelo et.al.	2502.06747v1	null
2025-02-10	Persistent spin grids with spin-orbit coupled 2D electron gas	A. V. Poshakinskiy et.al.	2502.06745v1	null
2025-02-10	Enhancing Pneumonia Diagnosis and Severity Assessment through Deep Learning: A Comprehensive Approach Integrating CNN Classification and Infection Segmentation	S Kumar Reddy Mallidi et.al.	2502.06735v1	null
2025-02-07	FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation	Shilong Zhang et.al.	2502.05179v1	null
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177v1	null
2025-02-07	AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting	Chung-Ho Wu et.al.	2502.05176v1	null
2025-02-07	VideoRoPE: What Makes for Good Video Rotary Position Embedding?	Xilin Wei et.al.	2502.05173v1	null
2025-02-07	Torsion pairs and 3-fold flops	Parth Shimpi et.al.	2502.05146v1	null
2025-02-07	Chest X-ray Foundation Model with Global and Local Representations Integration	Zefan Yang et.al.	2502.05142v1	null
2025-02-07	Counting Fish with Temporal Representations of Sonar Video	Kai Van Brunt et.al.	2502.05129v1	null
2025-02-07	Multiphoton, multimode state classification for nonlinear optical circuits	Denis A. Kopylov et.al.	2502.05123v1	null
2025-02-07	Investigating the impact of kernel harmonization and deformable registration on inspiratory and expiratory chest CT images for people with COPD	Aravind R. Krishnan et.al.	2502.05119v1	null
2025-02-07	GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application	Volker Emmrich et.al.	2502.05113v1	null
2025-02-06	Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment	Zuyan Liu et.al.	2502.04328v1	null
2025-02-06	WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Jack Hong et.al.	2502.04326v1	null
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299v1	null
2025-02-06	Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression	Lirui Wang et.al.	2502.04296v1	null
2025-02-06	Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning	Thorben Prein et.al.	2502.04289v1	null
2025-02-06	How does a Multilingual LM Handle Multiple Languages?	Santhosh Kakarla et.al.	2502.04269v1	null
2025-02-06	Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Marco Mistretta et.al.	2502.04263v1	null
2025-02-06	Work in Progress: AI-Powered Engineering-Bridging Theory and Practice	Oz Levy et.al.	2502.04256v1	null
2025-02-06	An object detection approach for lane change and overtake detection from motion profiles	Andrea Benericetti et.al.	2502.04244v1	null
2025-02-06	Saflo: eBPF-Based MPTCP Scheduler for Mitigating Traffic Analysis Attacks in Cellular Networks	Sangwoo Lee et.al.	2502.04236v1	null
2025-02-05	Seeing World Dynamics in a Nutshell	Qiuhong Shen et.al.	2502.03465v1	null
2025-02-05	SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living	Arkaprava Sinha et.al.	2502.03459v1	null
2025-02-05	Kineto-Dynamical Planning and Accurate Execution of Minimum-Time Maneuvers on Three-Dimensional Circuits	Mattia Piccinini et.al.	2502.03454v1	null
2025-02-05	Linearized Optimal Transport pyLOT Library: A Toolkit for Machine Learning on Point Clouds	Jun Linwu et.al.	2502.03439v1	null
2025-02-05	A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation	Carlo Biffi et.al.	2502.03430v1	null
2025-02-05	Concept Based Explanations and Class Contrasting	Rudolf Herdt et.al.	2502.03422v1	null
2025-02-05	A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models	Junliang Du et.al.	2502.03386v1	null
2025-02-05	Deep Learning-Based Approach for Identification of Potato Leaf Diseases Using Wrapper Feature Selection and Feature Concatenation	Muhammad Ahtsam Naeem et.al.	2502.03370v1	null
2025-02-05	Learning from Active Human Involvement through Proxy Value Propagation	Zhenghao Peng et.al.	2502.03369v1	null
2025-02-05	Rethinking Approximate Gaussian Inference in Classification	Bálint Mucsányi et.al.	2502.03366v1	null
2025-02-04	Fairness in Survival Analysis: A Novel Conditional Mutual Information Augmentation Approach	Tianyang Xie et.al.	2502.02567v1	null
2025-02-04	Learning the RoPEs: Better 2D and 3D Position Encodings with STRING	Connor Schenck et.al.	2502.02562v1	null
2025-02-04	Particle Trajectory Representation Learning with Masked Point Modeling	Sam Young et.al.	2502.02558v1	null
2025-02-04	AAD-DCE: An Aggregated Multimodal Attention Mechanism for Early and Late Dynamic Contrast Enhanced Prostate MRI Synthesis	Divya Bharti et.al.	2502.02555v1	null
2025-02-04	Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis	Haonan Zhu et.al.	2502.02552v1	null
2025-02-04	2D Surface Brightness Modelling of Large 2MASS Galaxies II: The Role of Classical Bulges and Pseudobulges on Galaxy Scaling Relations and its implication for Supermassive Black Hole Formation	Emmanuel Ríos-López et.al.	2502.02546v1	null
2025-02-04	TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems	Si-Yang Liu et.al.	2502.02527v1	null
2025-02-04	Hybrid Fingerprint-based Positioning in Cell-Free Massive MIMO Systems	Manish Kumar et.al.	2502.02512v1	null
2025-02-04	The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison	Łukasz Miętkiewicz et.al.	2502.02500v1	null
2025-02-04	VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models	Hila Chefer et.al.	2502.02492v1	null
2025-01-31	Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach	Yingdan Shi et.al.	2501.19403v1	null
2025-01-31	Perceptive Mixed-Integer Footstep Control for Underactuated Bipedal Walking on Rough Terrain	Brian Acosta et.al.	2501.19391v1	null
2025-01-31	Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions	Sören Christensen et.al.	2501.19373v1	null
2025-01-31	Benchmark of the Full and Reduced Effective Resistance Kernel for Molecular Classification	Adam Wesołowski et.al.	2501.19352v1	null
2025-01-31	An All-digital 65-nm Tsetlin Machine Image Classification Accelerator with 8.6 nJ per MNIST Frame at 60.3k Frames per Second	Svein Anders Tunheim et.al.	2501.19347v1	null
2025-01-31	Pathological MRI Segmentation by Synthetic Pathological Data Generation in Fetuses and Neonates	Misha P. T Kaandorp et.al.	2501.19338v1	null
2025-01-31	Consistent Video Colorization via Palette Guidance	Han Wang et.al.	2501.19331v1	null
2025-01-31	Ultra-fast Real-time Target Recognition Using a Shift, Scale, and Rotation Invariant Hybrid Opto-electronic Joint Transform Correlator	Xi Shen et.al.	2501.19299v1	null
2025-01-31	Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs	James Flemings et.al.	2501.19287v1	null
2025-01-31	Application of Generative Adversarial Network (GAN) for Synthetic Training Data Creation to improve performance of ANN Classifier for extracting Built-Up pixels from Landsat Satellite Imagery	Amritendu Mukherjee et.al.	2501.19283v1	null
2025-01-30	DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models	Ruofan Liang et.al.	2501.18590v1	null
2025-01-30	Node Classification and Search on the Rubik's Cube Graph with GNNs	Alessandro Barro et.al.	2501.18580v1	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565v1	null
2025-01-30	Finite subgroups of maximal order of the Cremona group over the rationals	Ahmed Abouelsaad et.al.	2501.18551v1	null
2025-01-30	UDC-VIT: A Real-World Video Dataset for Under-Display Cameras	Kyusu Ahn et.al.	2501.18545v1	link
2025-01-30	Loss Functions and Operators Generated by f-Divergences	Vincent Roulet et.al.	2501.18537v1	null
2025-01-30	Sample Classification using Machine Learning-Assisted Entangled Two-Photon Absorption	Áulide Martínez-Tapia et.al.	2501.18534v1	null
2025-01-30	Joint Learning of Energy-based Models and their Partition Function	Michael E. Sander et.al.	2501.18528v1	null
2025-01-30	Character factorisations, $z$-asymmetric partitions and plethysm	Seamus Albion et.al.	2501.18520v1	null
2025-01-30	Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection	Faegheh Sardari et.al.	2501.18509v1	null
2025-01-29	acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices	Aude Vuilliomenet et.al.	2501.17841v1	link
2025-01-29	IRONMAP: Iron Network Mapping and Analysis Protocol for Detecting Over-Time Brain Iron Abnormalities in Neurological Disease	Jack A. Reeves et.al.	2501.17838v1	null
2025-01-29	TikTok's recommendations skewed towards Republican content during the 2024 U.S. presidential race	Hazem Ibrahim et.al.	2501.17831v1	null
2025-01-29	Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology	Sobhan Hemati et.al.	2501.17822v1	null
2025-01-29	eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction through AI-Mediated Content Consumption	Jocelyn Shen et.al.	2501.17819v1	null
2025-01-29	CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering	Xiaohan Sun et.al.	2501.17792v1	null
2025-01-29	Glioma Multimodal MRI Analysis System for Tumor Layered Diagnosis via Multi-task Semi-supervised Learning	Yihao Liu et.al.	2501.17758v1	null
2025-01-29	PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion	Ahmed Sharshar et.al.	2501.17699v1	link
2025-01-29	NutMaat: A Python package for stellar spectral classification on the MK system	R. I. El-Kholy et.al.	2501.17698v1	null
2025-01-29	Tonguescape: Exploring Language Models Understanding of Vowel Articulation	Haruki Sakajo et.al.	2501.17643v1	null
2025-01-28	A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images	Suresh Babu Nettur et.al.	2501.17160v1	null
2025-01-28	Sensitivity of Quantitative Susceptibility Mapping in Clinical Brain Research	Fahad Salman et.al.	2501.17158v1	null
2025-01-28	Three-Dimensional Diffusion-Weighted Multi-Slab MRI With Slice Profile Compensation Using Deep Energy Model	Reza Ghorbani et.al.	2501.17152v1	null
2025-01-28	FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data	Deren Lei et.al.	2501.17144v1	link
2025-01-28	DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications	Muhammad Shahbaz et.al.	2501.17076v1	null
2025-01-28	Symmetries of 3-webs around a point	Jean Paul Dufour et.al.	2501.17066v1	null
2025-01-28	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Akash Kumar et.al.	2501.17053v1	null
2025-01-28	Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection	Farida Farsian et.al.	2501.17041v1	null
2025-01-28	Approach Towards Semi-Automated Certification for Low Criticality ML-Enabled Airborne Applications	Chandrasekar Sridhar et.al.	2501.17028v1	null
2025-01-28	MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction	Shreyam Gupta et.al.	2501.16997v1	null
2025-01-27	RelightVid: Temporal-Consistent Diffusion Model for Video Relighting	Ye Fang et.al.	2501.16330v1	null
2025-01-27	sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging	Jingyuan Chen et.al.	2501.16329v1	null
2025-01-27	Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture	Yikun Hou et.al.	2501.16322v1	null
2025-01-27	TiDES: The 4MOST Time Domain Extragalactic Survey	C. Frohmaier et.al.	2501.16311v1	null
2025-01-27	RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval	Long Nguyen et.al.	2501.16303v1	null
2025-01-27	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	Jing Zhang et.al.	2501.16282v1	null
2025-01-27	Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images	Suresh Babu Nettur et.al.	2501.16249v1	null
2025-01-27	Zero-Shot Decision Tree Construction via Large Language Models	Lucas Carrasco et.al.	2501.16247v1	null
2025-01-27	CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation	Xiaochuan Ma et.al.	2501.16246v1	null
2025-01-27	Echoes of Discord: Forecasting Hater Reactions to Counterspeech	Xiaoying Song et.al.	2501.16235v1	null
2025-01-24	Estimation-theoretic analysis of lensless imaging	Leyla A. Kabuli et.al.	2501.14727v1	null
2025-01-24	Gland Segmentation Using SAM With Cancer Grade as a Prompt	Yijie Zhu et.al.	2501.14718v1	null
2025-01-24	Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders	Zaheer Ahmad et.al.	2501.14709v1	null
2025-01-24	Stroke classification using Virtual Hybrid Edge Detection from in silico electrical impedance tomography data	Juan Pablo Agnelli et.al.	2501.14704v1	null
2025-01-24	Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST	Fuping Wu et.al.	2501.14685v1	null
2025-01-24	Artificial Intelligence Could Have Predicted All Space Weather Events Associated with the May 2024 Superstorm	Sabrina Guastavino et.al.	2501.14684v1	null
2025-01-24	An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations	Shabnam Hassani et.al.	2501.14683v1	null
2025-01-24	MatAnyone: Stable Video Matting with Consistent Memory Propagation	Peiqing Yang et.al.	2501.14677v1	null
2025-01-24	Automation of finding strong gravitational lenses in the Kilo Degree Survey with U-DenseLens (DenseLens + Segmentation)	Bharath Chowdhary Nagam et.al.	2501.14650v1	null
2025-01-24	ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations	Tianming Liang et.al.	2501.14607v1	null
2025-01-23	IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models	Jiayi Lei et.al.	2501.13920v1	null
2025-01-23	Temporal Preference Optimization for Long-Form Video Understanding	Rui Li et.al.	2501.13919v1	null
2025-01-23	Improving Video Generation with Human Feedback	Jie Liu et.al.	2501.13918v1	null
2025-01-23	Exploring Finetuned Audio-LLM on Heart Murmur Features	Adrian Florea et.al.	2501.13884v1	null
2025-01-23	Disclinations, dislocations, and emanant flux at Dirac criticality	Maissam Barkeshli et.al.	2501.13866v1	null
2025-01-23	Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning	Shiyu Zhang et.al.	2501.13859v1	null
2025-01-23	First Lessons Learned of an Artificial Intelligence Robotic System for Autonomous Coarse Waste Recycling Using Multispectral Imaging-Based Methods	Timo Lange et.al.	2501.13855v1	null
2025-01-23	Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes	Shiling Deng et.al.	2501.13851v1	link
2025-01-23	Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos	Kairui Hu et.al.	2501.13826v1	null
2025-01-23	Hallucinations Can Improve Large Language Models in Drug Discovery	Shuzhou Yuan et.al.	2501.13824v1	null
2025-01-22	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	Boqiang Zhang et.al.	2501.13106v1	link
2025-01-22	Robust Representation Consistency Model via Contrastive Denoising	Jiachen Lei et.al.	2501.13094v1	link
2025-01-22	CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization	José Rodríguez-Ortega et.al.	2501.13073v1	null
2025-01-22	Robust Body Composition Analysis by Generating 3D CT Volumes from Limited 2D Slices	Lianrui Zuo et.al.	2501.13071v1	null
2025-01-22	Beyond the Lungs: Extending the Field of View in Chest CT with Latent Diffusion Models	Lianrui Zuo et.al.	2501.13068v1	null
2025-01-22	SMART-Vision: Survey of Modern Action Recognition Techniques in Vision	Ali K. AlShami et.al.	2501.13066v1	null
2025-01-22	Real-time Terahertz Compressive Optical-Digital Neural Network Imaging	Shao-Hsuan Wu et.al.	2501.13065v1	null
2025-01-22	One-Class Domain Adaptation via Meta-Learning	Stephanie Holly et.al.	2501.13052v1	null
2025-01-22	Characterizing Collective Efforts in Content Sharing and Quality Control for ADHD-relevant Content on Video-sharing Platforms	Hanxiu 'Hazel' Zhu et.al.	2501.13020v1	null
2025-01-22	Discrete Lagrangian multiforms for ABS equations I: quad equations	Jacob J. Richardson et.al.	2501.13012v1	null
2025-01-21	Learning segmentation from point trajectories	Laurynas Karazija et.al.	2501.12392v1	null
2025-01-21	Taming Teacher Forcing for Masked Autoregressive Video Generation	Deyu Zhou et.al.	2501.12389v1	null
2025-01-21	Continuous 3D Perception Model with Persistent State	Qianqian Wang et.al.	2501.12387v1	null
2025-01-21	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386v1	link
2025-01-21	CCESAR: Coastline Classification-Extraction From SAR Images Using CNN-U-Net Combination	Vidhu Arora et.al.	2501.12384v1	null
2025-01-21	Parallel Sequence Modeling via Generalized Spatial Propagation Network	Hongjun Wang et.al.	2501.12381v1	null
2025-01-21	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	Yilun Zhao et.al.	2501.12380v1	link
2025-01-21	Video Depth Anything: Consistent Depth Estimation for Super-Long Videos	Sili Chen et.al.	2501.12375v1	null
2025-01-21	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Yuhang Zang et.al.	2501.12368v1	link
2025-01-21	Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration	Thomas Walshe et.al.	2501.12332v1	null
2025-01-17	Zero-Shot Monocular Scene Flow Estimation in the Wild	Yiqing Liang et.al.	2501.10357v1	null
2025-01-17	DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation	Claire Chen et.al.	2501.10356v1	null
2025-01-17	Hybrid Deep Learning Model for epileptic seizure classification by using 1D-CNN with multi-head attention mechanism	Mohammed Guhdar et.al.	2501.10342v1	null
2025-01-17	Natural Language Processing of Privacy Policies: A Survey	Andrick Adhikari et.al.	2501.10319v1	null
2025-01-17	Using Technology in Digital Humanities for Learning and Knowledge Dissemination	Armanda Rodrigues et.al.	2501.10275v1	null
2025-01-17	Over-the-Air Multi-Sensor Inference with Neural Networks Using Memristor-Based Analog Computing	Busra Tegin et.al.	2501.10245v1	null
2025-01-17	Amortized Bayesian Mixture Models	Šimon Kucharský et.al.	2501.10229v1	null
2025-01-17	Adaptive Clustering for Efficient Phenotype Segmentation of UAV Hyperspectral Data	Ciem Cornelissen et.al.	2501.10199v1	null
2025-01-17	Secure Semantic Communication With Homomorphic Encryption	Rui Meng et.al.	2501.10182v1	null
2025-01-17	A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features	Enes Karanfil et.al.	2501.10144v1	null
2025-01-16	Learnings from Scaling Visual Tokenizers for Reconstruction and Generation	Philippe Hansen-Estruch et.al.	2501.09755v1	null
2025-01-16	Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues	Youngjoon Jang et.al.	2501.09754v1	null
2025-01-16	SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification	Yuexi Du et.al.	2501.09753v1	link
2025-01-16	Improvement of Data Analytics Techniques in Reflection High Energy Electron Diffraction to Enable Machine Learning	Patrick T. Gemperline et.al.	2501.09743v1	link
2025-01-16	ComplexVAD: Detecting Interaction Anomalies in Video	Furkan Mumcu et.al.	2501.09733v1	null
2025-01-16	Practical Continual Forgetting for Pre-trained Vision Models	Hongbo Zhao et.al.	2501.09705v1	link
2025-01-16	Cueless EEG imagined speech for subject identification: dataset and benchmarks	Ali Derakhshesh et.al.	2501.09700v1	link
2025-01-16	Active particle in a very thin interfacial droplet	Airi N. Kato et.al.	2501.09652v1	null
2025-01-16	Electronic Health Records: Towards Digital Twins in Healthcare	Muhammet Alkan et.al.	2501.09640v1	null
2025-01-16	Unified Face Matching and Physical-Digital Spoofing Attack Detection	Arun Kunwar et.al.	2501.09635v1	null
2025-01-15	Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion	Jingyuan Chen et.al.	2501.09019v1	null
2025-01-15	Vision Foundation Models for Computed Tomography	Suraj Pai et.al.	2501.09001v1	null
2025-01-15	RepVideo: Rethinking Cross-Layer Representation for Video Generation	Chenyang Si et.al.	2501.08994v1	null
2025-01-15	Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models	Karukriti Kaushik Ghosh et.al.	2501.08974v1	null
2025-01-15	An analysis of data variation and bias in image-based dermatological datasets for machine learning classification	Francisco Mauro et.al.	2501.08962v1	null
2025-01-15	Neuromorphic Retina: An FPGA-based Emulator	Prince Phillip et.al.	2501.08943v1	null
2025-01-15	Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos	Javier Rodriguez-Juan et.al.	2501.08931v1	link
2025-01-15	Learning Joint Denoising, Demosaicing, and Compression from the Raw Natural Image Noise Dataset	Benoit Brummer et.al.	2501.08924v1	null
2025-01-15	Multi-View Transformers for Airway-To-Lung Ratio Inference on Cardiac CT Scans: The C4R Study	Sneha N. Naik et.al.	2501.08902v1	null
2025-01-15	An investigation of the relationship between morphology and chemistry of the D-type spherules from the recovery expedition of the CNEOS 2014-01-08 bolide: Implications for origins	Eugenia Hyung et.al.	2501.08890v1	null
2025-01-14	DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models	Hyeonwoo Kim et.al.	2501.08333v1	null
2025-01-14	Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise	Ryan Burgert et.al.	2501.08331v1	link
2025-01-14	Gradient Equilibrium in Online Learning: Theory and Applications	Anastasios N. Angelopoulos et.al.	2501.08330v1	link
2025-01-14	Predicting 4D Hand Trajectory from Monocular Videos	Yufei Ye et.al.	2501.08329v1	null
2025-01-14	Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Miran Heo et.al.	2501.08326v1	null
2025-01-14	GameFactory: Creating New Games with Generative Interactive Videos	Jiwen Yu et.al.	2501.08325v1	null
2025-01-14	ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations	Ziyuan Huang et.al.	2501.08324v1	null
2025-01-14	Exploring Robustness of Multilingual LLMs on Real-World Noisy Data	Amirhossein Aliakbarzadeh et.al.	2501.08322v1	link
2025-01-14	Diffusion Adversarial Post-Training for One-Step Video Generation	Shanchuan Lin et.al.	2501.08316v1	null
2025-01-14	Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Wennuo Yang et.al.	2501.08305v1	link
2025-01-13	UnCommon Objects in 3D	Xingchen Liu et.al.	2501.07574v1	link
2025-01-13	Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks	Hiroki Waida et.al.	2501.07571v1	null
2025-01-13	A reference framework for extremely metal-poor OB star studies: calibrations for stellar parameters and intrinsic colours	Marta Lorenzo et.al.	2501.07569v1	null
2025-01-13	Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss	Xinyu Zhang et.al.	2501.07563v1	null
2025-01-13	SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing	Varun Biyyala et.al.	2501.07554v1	link
2025-01-13	IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion	Tharun Anand et.al.	2501.07530v1	null
2025-01-13	Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization	Aditya Devarakonda et.al.	2501.07526v1	null
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525v1	link
2025-01-13	Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers	Daniel Fejgin et.al.	2501.07524v1	null
2025-01-13	Inductive Learning of Robot Task Knowledge from Raw Data and Online Expert Feedback	Daniele Meli et.al.	2501.07507v1	link
2025-01-10	Multi-subject Open-set Personalization in Video Generation	Tsai-Shien Chen et.al.	2501.06187v1	null
2025-01-10	VideoAuteur: Towards Long Narrative Video Generation	Junfei Xiao et.al.	2501.06173v1	null
2025-01-10	PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit	Yuechen Yang et.al.	2501.06151v1	null
2025-01-10	MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection	Arkaprava Sinha et.al.	2501.06138v1	null
2025-01-10	Benchmarking Different Application Types across Heterogeneous Cloud Compute Services	Nivedhitha Duggi et.al.	2501.06128v1	null
2025-01-10	Merging Feed-Forward Sublayers for Compressed Transformers	Neha Verma et.al.	2501.06126v1	link
2025-01-10	Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding	Fabian David Schmidt et.al.	2501.06117v1	null
2025-01-10	ELFATT: Efficient Linear Fast Attention for Vision Transformers	Chong Wu et.al.	2501.06098v1	null
2025-01-10	Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems	Steffen Dereich et.al.	2501.06081v1	link
2025-01-10	Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations	Pablo Barceló et.al.	2501.06078v1	null
2025-01-09	An Empirical Study of Autoregressive Pre-training from Videos	Jathushan Rajasegaran et.al.	2501.05453v1	null
2025-01-09	Fortuity in the D1-D5 system	Chi-Ming Chang et.al.	2501.05448v1	null
2025-01-09	Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces	Aniruddha Mahapatra et.al.	2501.05442v1	null
2025-01-09	From Images to Insights: Transforming Brain Cancer Diagnosis with Explainable AI	Md. Arafat Alam Khandaker et.al.	2501.05426v1	null
2025-01-09	Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation	Darius Petermann et.al.	2501.05413v1	null
2025-01-09	Innovative Designs and Insights into Quantum Thermal Machines	Aline D. Lucio et.al.	2501.05406v1	null
2025-01-09	Mechanistic understanding and validation of large AI models with SemanticLens	Maximilian Dreyer et.al.	2501.05398v1	null
2025-01-09	1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On	Shuliang Ning et.al.	2501.05369v1	null
2025-01-09	Video-Conferencing Beyond Screen-Sharing and Thumbnail Webcam Videos: Gesture-Aware Augmented Reality Video for Data-Rich Remote Presentations	Matthew Brehmer et.al.	2501.05345v1	null
2025-01-09	Stability and List-Replicability for Agnostic Learners	Ari Blondal et.al.	2501.05333v1	null
2025-01-09	Probing Speaker-specific Features in Speaker Representations	Aemon Yat Fei Chiu et.al.	2501.05310v1	null
2025-01-08	Planarian Neural Networks: Evolutionary Patterns from Basic Bilateria Shaping Modern Artificial Neural Network Architectures	Ziyuan Huang et.al.	2501.04700v1	null
2025-01-08	ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning	Yuzhou Huang et.al.	2501.04698v1	null
2025-01-08	Non-Markovian dynamics of BIC generation via single-photon scattering	Giuseppe Magnifico et.al.	2501.04691v1	null
2025-01-08	Learning by Confusion: The Phase Diagram of the Holstein Model	George Issa et.al.	2501.04681v1	null
2025-01-08	RadGPT: Constructing 3D Image-Text Tumor Datasets	Pedro R. A. S. Bassi et.al.	2501.04678v1	link
2025-01-08	Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs	Yikang Zhou et.al.	2501.04670v1	link
2025-01-08	HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion	Chia-Ming Lee et.al.	2501.04665v1	null
2025-01-08	Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification	Zhiqiang Gao et.al.	2501.04643v1	null
2025-01-08	A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI	Kazusato Oko et.al.	2501.04641v1	link
2025-01-08	Framework for Integrating Machine Learning Methods for Path-Aware Source Routing	Anees Al-Najjar et.al.	2501.04624v1	null
2025-01-07	Extraction Of Cumulative Blobs From Dynamic Gestures	Rishabh Naulakha et.al.	2501.04002v1	null
2025-01-07	Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Haobo Yuan et.al.	2501.04001v1	null
2025-01-07	WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings	Haochen Song et.al.	2501.03999v1	null
2025-01-07	Supervised and unsupervised learning the many-body critical phase, phase transitions and critical exponents in disordered quantum systems	Aamna Ahmed et.al.	2501.03981v1	null
2025-01-07	Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification	Satchel French et.al.	2501.03967v1	link
2025-01-07	Learning to Relax Nonconvex Quadratically Constrained Quadratic Programs	Buket Ozen et.al.	2501.03954v1	null
2025-01-07	Reducing Proxy Discrimination	Frank Fagan et.al.	2501.03946v1	null
2025-01-07	Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers	Yuechen Zhang et.al.	2501.03931v1	link
2025-01-07	Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback	Jiakang Yuan et.al.	2501.03916v1	null
2025-01-07	The Cable to the Moon: Veritasium's Light Bulb Experiment in Low-Cost Miniature Form	Michael Lenz et.al.	2501.03896v1	null
2025-01-06	RW-Net: Enhancing Few-Shot Point Cloud Classification with a Wavelet Transform Projection-based Network	Haosheng Zhang et.al.	2501.03221v1	null
2025-01-06	ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking	Tingyang Zhang et.al.	2501.03220v1	null
2025-01-06	Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction	Rui Qian et.al.	2501.03218v1	link
2025-01-06	Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text	Ayat Najjar et.al.	2501.03212v1	null
2025-01-06	Multimodal Machine Learning Can Predict Videoconference Fluidity and Enjoyment	Andrew Chang et.al.	2501.03190v1	null
2025-01-06	GLiREL -- Generalist Model for Zero-Shot Relation Extraction	Jack Boylan et.al.	2501.03172v1	null
2025-01-06	Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning	Muyun Li et.al.	2501.03162v1	null
2025-01-06	Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy	Risha Goel et.al.	2501.03153v1	null
2025-01-06	MVP: Multimodal Emotion Recognition based on Video and Physiological Signals	Valeriya Strizhkova et.al.	2501.03103v1	null
2025-01-06	Trust Modeling in Counseling Conversations: A Benchmark Study	Aseem Srivastava et.al.	2501.03064v1	null
2025-01-03	VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction	Chaoyou Fu et.al.	2501.01957v1	link
2025-01-03	VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment	Wenyan Cong et.al.	2501.01949v1	null
2025-01-03	Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models	Manh Duong Nguyen et.al.	2501.01932v1	null
2025-01-03	GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction	Yuwei Miao et.al.	2501.01930v1	null
2025-01-03	Transformer-Driven Inverse Problem Transform for Fast Blind Hyperspectral Image Dehazing	Po-Wei Tang et.al.	2501.01924v1	null
2025-01-03	Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification	Jarin Ritu et.al.	2501.01921v1	null
2025-01-03	Exoplanet Detection via Differentiable Rendering	Brandon Y. Feng et.al.	2501.01912v1	null
2025-01-03	EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation	Siyuan Huang et.al.	2501.01895v1	null
2025-01-03	ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation	Luca Collorone et.al.	2501.01877v1	null
2025-01-03	Extensions of finite irreducible modules over rank two Lie conformal algebra	Lipeng Luo et.al.	2501.01870v1	null
2025-01-02	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et.al.	2501.01428v1	null
2025-01-02	VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control	Yuanpeng Tu et.al.	2501.01427v1	null
2025-01-02	Unifying Specialized Visual Encoders for Video Language Models	Jihoon Chung et.al.	2501.01426v1	null
2025-01-02	Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions	Xincheng Shuai et.al.	2501.01425v1	null
2025-01-02	Multi-Modal Video Feature Extraction for Popularity Prediction	Haixu Liu et.al.	2501.01422v1	null
2025-01-02	A Multi-task Supervised Compression Model for Split Computing	Yoshitomo Matsubara et.al.	2501.01420v1	null
2025-01-02	On Unifying Video Generation and Camera Pose Estimation	Chun-Hao Paul Huang et.al.	2501.01409v1	null
2025-01-02	nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation	Haixu Liu et.al.	2501.01406v1	null
2025-01-02	VoiceVector: Multimodal Enrolment Vectors for Speaker Separation	Akam Rahimi et.al.	2501.01401v1	null
2025-01-02	ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer	Xuyin Qi et.al.	2501.01392v1	null
2024-12-30	PERSE: Personalized 3D Generative Avatars from A Single Portrait	Hyunsoo Cha et.al.	2412.21206v1	null
2024-12-30	Action-Agnostic Point-Level Supervision for Temporal Action Detection	Shuhei M. Yoshida et.al.	2412.21205v1	link
2024-12-30	A Large-Scale Study on Video Action Dataset Condensation	Yang Chen et.al.	2412.21197v1	null
2024-12-30	Classification of del Pezzo surfaces of rank one. I. Height 1 and 2. II. Descendants with elliptic boundaries	Karol Palka et.al.	2412.21174v1	null
2024-12-30	Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning	Yalin E. Sagduyu et.al.	2412.21164v1	null
2024-12-30	Open RAN-Enabled Deep Learning-Assisted Mobility Management for Connected Vehicles	Maria Barbosa et.al.	2412.21161v1	null
2024-12-30	Unified dimensionality reduction techniques in chronic liver disease detection	Anand Karna et.al.	2412.21156v1	null
2024-12-30	Irreducible representations of welded braid group	Inna Sysoeva et.al.	2412.21133v1	null
2024-12-30	Galaxy Spectra Networks (GaSNet). III. Generative pre-trained network for spectrum reconstruction, redshift estimate and anomaly detection	Fucheng Zhong et.al.	2412.21130v1	link
2024-12-30	All toric Kahler surfaces with twistor 2-forms	Sergei G. Ovchinnikov et.al.	2412.21114v1	null
2024-12-27	Streamlined Krylov construction and classification of ergodic Floquet systems	Nikita Kolganov et.al.	2412.19797v1	null
2024-12-27	MVTamperBench: Evaluating Robustness of Vision-Language Models	Amit Agarwal et.al.	2412.19794v1	null
2024-12-27	Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago	Cassandra Daniels et.al.	2412.19781v1	null
2024-12-27	Classification of Minimal Abelian Coulomb Branches	Antoine Bourget et.al.	2412.19766v1	null
2024-12-27	Can one hear the shape of a random walk?	Michael J. Larsen et.al.	2412.19762v1	null
2024-12-27	Generative Video Propagation	Shaoteng Liu et.al.	2412.19761v1	null
2024-12-27	Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition	Damien Bouchabou et.al.	2412.19732v1	null
2024-12-27	EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs	Daniil A. Berdyshev et.al.	2412.19725v1	link
2024-12-27	Quantum correlations in a gravitational collapse simulation with SpheriCo.jl	Benjamin Berczi et.al.	2412.19722v1	null
2024-12-27	ProKAN: Progressive Stacking of Kolmogorov-Arnold Networks for Efficient Liver Segmentation	Bhavesh Gyanchandani et.al.	2412.19713v1	null
2024-12-24	Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models	Jinhui Yi et.al.	2412.18609v1	link
2024-12-24	DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers	Yuntao Chen et.al.	2412.18607v1	null
2024-12-24	ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation	Hongjie Li et.al.	2412.18600v1	null
2024-12-24	DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation	Minghong Cai et.al.	2412.18597v1	link
2024-12-24	ClassifyViStA: WCE Classification with Visual understanding through Segmentation and Attention	S. Balasubramanian et.al.	2412.18591v1	link
2024-12-24	Text-Driven Tumor Synthesis	Xinran Li et.al.	2412.18589v1	null
2024-12-24	Resolution-Robust 3D MRI Reconstruction with 2D Diffusion Priors: Diverse-Resolution Training Outperforms Interpolation	Anselm Krainovic et.al.	2412.18584v1	null
2024-12-24	New method of image processing via statistical analysis for application in intelligent systems	Monalisa Cavalcante et.al.	2412.18575v1	null
2024-12-24	3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement	Yihang Luo et.al.	2412.18565v1	null
2024-12-24	Distilling Fine-grained Sentiment Understanding from Large Language Models	Yice Zhang et.al.	2412.18552v1	link
2024-12-23	FaceLift: Single Image to 3D Head with View Generation and GS-LRM	Weijie Lyu et.al.	2412.17812v1	null
2024-12-23	Large Motion Video Autoencoding with Cross-modal Video VAE	Yazhou Xing et.al.	2412.17805v1	null
2024-12-23	GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator	Yidi Shao et.al.	2412.17804v1	null
2024-12-23	Classification of exchange relation planar algebras through sieving forest fusion graphs	Fan Lu et.al.	2412.17790v1	null
2024-12-23	Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy	Priyaranjan Pattnayak et.al.	2412.17759v1	null
2024-12-23	Induced subgraphs and tree decompositions XVIII. Obstructions to bounded pathwidth	Maria Chudnovsky et.al.	2412.17756v1	null
2024-12-23	LASE: Learned Adjacency Spectral Embeddings	Sofía Pérez Casulo et.al.	2412.17734v1	null
2024-12-23	VidTwin: Video VAE with Decoupled Structure and Dynamics	Yuchi Wang et.al.	2412.17726v1	link
2024-12-23	MRANet: A Modified Residual Attention Networks for Lung and Colon Cancer Classification	Diponkor Bala et.al.	2412.17700v1	null
2024-12-23	An efficient volume-preserving MBO scheme for data clustering and classification	Fabius Krämer et.al.	2412.17694v1	null
2024-12-20	Can Generative Video Models Help Pose Estimation?	Ruojin Cai et.al.	2412.16155v1	null
2024-12-20	MotiF: Making Text Count in Image Animation with Motion Focal Loss	Shijie Wang et.al.	2412.16153v1	null
2024-12-20	Shape Shifters: Does Body Shape Change the Perception of Small-Scale Crowd Motions?	Bharat Vyas et.al.	2412.16151v1	null
2024-12-20	SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild	Jannik Elsäßer et.al.	2412.16147v1	null
2024-12-20	Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks	Enis Baty et.al.	2412.16146v1	null
2024-12-20	FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks	Siddharth Ambekar et.al.	2412.16144v1	null
2024-12-20	Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts	Muhammad Abdullah Sohail et.al.	2412.16119v1	link
2024-12-20	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang et.al.	2412.16117v1	link
2024-12-20	Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG	Hasan Md Tusfiqur Alam et.al.	2412.16086v1	link
2024-12-20	Efficient MedSAMs: Segment Anything in Medical Images on Laptop	Jun Ma et.al.	2412.16085v1	link
2024-12-19	LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis	Hanlin Wang et.al.	2412.15214v1	null
2024-12-19	Scaling 4D Representations	João Carreira et.al.	2412.15212v1	null
2024-12-19	AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation	Moayed Haji-Ali et.al.	2412.15191v1	null
2024-12-19	EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues	Sagar Soni et.al.	2412.15190v1	null
2024-12-19	Surface-Based Authentication System for Integrated Circuit Chips	Runze Liu et.al.	2412.15186v1	null
2024-12-19	Tiled Diffusion	Or Madar et.al.	2412.15185v1	null
2024-12-19	SqueezeMe: Efficient Gaussian Avatars for VR	Shunsuke Saito et.al.	2412.15171v1	null
2024-12-19	OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization	Jiacheng Zhang et.al.	2412.15159v1	null
2024-12-19	Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM	Yatai Ji et.al.	2412.15156v1	link
2024-12-19	Cruise Control: Dynamic Model Selection for ML-Based Network Traffic Analysis	Johann Hugon et.al.	2412.15146v1	null
2024-12-18	AniDoc: Animation Creation Made Easier	Yihao Meng et.al.	2412.14173v1	null
2024-12-18	Learning from Massive Human Videos for Universal Humanoid Pose Control	Jiageng Mao et.al.	2412.14172v1	null
2024-12-18	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	Jihan Yang et.al.	2412.14171v1	link
2024-12-18	Autoregressive Video Generation without Vector Quantization	Haoge Deng et.al.	2412.14169v1	link
2024-12-18	VideoDPO: Omni-Preference Alignment for Video Diffusion Generation	Runtao Liu et.al.	2412.14167v1	null
2024-12-18	AKiRa: Augmentation Kit on Rays for optical video generation	Xi Wang et.al.	2412.14158v1	null
2024-12-18	AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities	Guillaume Astruc et.al.	2412.14123v1	link
2024-12-18	GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images	Ziyang Xu et.al.	2412.14118v1	link
2024-12-18	Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset	Bijay Adhikari et.al.	2412.14100v1	null
2024-12-18	Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts	Jihye Choi et.al.	2412.14097v1	null
2024-12-17	MotionBridge: Dynamic Video Inbetweening with Flexible Controls	Maham Tanveer et.al.	2412.13190v1	null
2024-12-17	StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models	Yunzhi Yan et.al.	2412.13188v1	null
2024-12-17	HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction	Chen Bao et.al.	2412.13187v1	null
2024-12-17	Move-in-2D: 2D-Conditioned Human Motion Generation	Hsin-Ping Huang et.al.	2412.13185v1	null
2024-12-17	Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures	Guoxing Sun et.al.	2412.13183v1	null
2024-12-17	NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment	Andrea Dunn Beltran et.al.	2412.13176v1	null
2024-12-17	Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions	Juan Del Aguila Ferrandis et.al.	2412.13157v1	null
2024-12-17	Continuous Patient Monitoring with AI: Real-Time Analysis of Video in Hospital Care Settings	Paolo Gabriel et.al.	2412.13152v1	null
2024-12-17	Label Errors in the Tobacco3482 Dataset	Gordon Lim et.al.	2412.13140v1	link
2024-12-17	Unlocking the Potential of Digital Pathology: Novel Baselines for Compression	Maximilian Fischer et.al.	2412.13137v1	null
2024-12-16	Wonderland: Navigating 3D Scenes from a Single Image	Hanwen Liang et.al.	2412.12091v1	null
2024-12-16	Instruction-based Image Manipulation by Watching How Things Move	Mingdeng Cao et.al.	2412.12087v1	null
2024-12-16	CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology	Yuxuan Sun et.al.	2412.12077v1	null
2024-12-16	CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Guo Chen et.al.	2412.12075v1	null
2024-12-16	Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation	Hongwei Niu et.al.	2412.12050v1	link
2024-12-16	Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation	Tim Sziburis et.al.	2412.12016v1	null
2024-12-16	Cost-Effective Label-free Node Classification with LLMs	Taiyan Zhang et.al.	2412.11983v1	null
2024-12-16	On the Nielsen-Thomsen sequence	Laurent Cantier et.al.	2412.11975v1	null
2024-12-16	On vertex-transitive distance-regular covers of complete graphs with an extremal smallest eigenvalue	Ludmila Yu. Tsiovkina et.al.	2412.11962v1	null
2024-12-16	Gramian Multimodal Representation Learning and Alignment	Giordano Cicchetti et.al.	2412.11959v1	null
2024-12-13	UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities	Muhammad Uzair Khattak et.al.	2412.10372v1	link
2024-12-13	Apollo: An Exploration of Video Understanding in Large Multimodal Models	Orr Zohar et.al.	2412.10360v1	null
2024-12-13	Robust image classification with multi-modal large language models	Francesco Villani et.al.	2412.10353v1	null
2024-12-13	BrushEdit: All-In-One Image Inpainting and Editing	Yaowei Li et.al.	2412.10316v1	null
2024-12-13	Performance evaluation of predictive AI models to support medical decisions: Overview and guidance	Ben Van Calster et.al.	2412.10288v1	null
2024-12-13	TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation	Xingrui Wang et.al.	2412.10275v1	null
2024-12-13	Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media	Jiaqing Yuan et.al.	2412.10266v1	null
2024-12-13	Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication	Alireza Furutanpey et.al.	2412.10265v1	null
2024-12-13	MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization	Shuaiting Li et.al.	2412.10261v1	null
2024-12-13	Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset	Hao-Chiang Shao et.al.	2412.10258v1	null
2024-12-12	Doe-1: Closed-Loop Autonomous Driving with Large World Model	Wenzhao Zheng et.al.	2412.09627v1	link
2024-12-12	FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion	Haonan Qiu et.al.	2412.09626v1	null
2024-12-12	GenEx: Generating an Explorable World	Taiming Lu et.al.	2412.09624v1	null
2024-12-12	OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation	Weiqi Li et.al.	2412.09623v1	null
2024-12-12	Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos	Linyi Jin et.al.	2412.09621v1	null
2024-12-12	Learning Camera Movement Control from Real-World Drone Videos	Yunzhong Hou et.al.	2412.09620v1	null
2024-12-12	NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors	Hung-Jui Huang et.al.	2412.09617v1	link
2024-12-12	V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding	Junqi Ge et.al.	2412.09616v1	link
2024-12-12	PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models	Chenyu Yang et.al.	2412.09613v1	null
2024-12-12	Olympus: A Universal Task Router for Computer Vision Tasks	Yuanze Lin et.al.	2412.09612v1	link
2024-12-11	StreamChat: Chatting with Streaming Video	Jihao Liu et.al.	2412.08646v1	null
2024-12-11	Generative Semantic Communication: Architectures, Technologies, and Applications	Jinke Ren et.al.	2412.08642v1	null
2024-12-11	Multimodal Latent Language Modeling with Next-Token Diffusion	Yutao Sun et.al.	2412.08635v1	null
2024-12-11	MNIST-Fraction: Enhancing Math Education with AI-Driven Fraction Detection and Analysis	Pegah Ahadian et.al.	2412.08633v1	null
2024-12-11	Image Retrieval Methods in the Dissimilarity Space	Madhu Kiran et.al.	2412.08618v1	null
2024-12-11	CCSNscore: A multi-input deep learning tool for classification of core-collapse supernovae using SED-Machine spectra	Yashvi Sharma et.al.	2412.08601v1	null
2024-12-11	RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation	Mingfei Han et.al.	2412.08591v1	null
2024-12-11	SPACE-SUIT: An Artificial Intelligence based chromospheric feature extractor and classifier for SUIT	Pranava Seth et.al.	2412.08589v1	null
2024-12-11	Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning	Hang Zhao et.al.	2412.08587v1	null
2024-12-11	Utilizing Multi-step Loss for Single Image Reflection Removal	Abdelrahman Elnenaey et.al.	2412.08582v1	link
2024-12-10	Video Motion Transfer with Diffusion Transformers	Alexander Pondaven et.al.	2412.07776v1	link
2024-12-10	UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics	Xi Chen et.al.	2412.07774v1	null
2024-12-10	From Slow Bidirectional to Fast Causal Video Generators	Tianwei Yin et.al.	2412.07772v1	null
2024-12-10	From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos	Matthew Wallingford et.al.	2412.07770v1	null
2024-12-10	Learning Visual Generative Priors without Text	Shuailei Ma et.al.	2412.07767v1	null
2024-12-10	Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation	Jingxi Chen et.al.	2412.07761v1	null
2024-12-10	SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints	Jianhong Bai et.al.	2412.07760v1	link
2024-12-10	3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation	Xiao Fu et.al.	2412.07759v1	null
2024-12-10	PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation	Fatemeh Nazarieh et.al.	2412.07754v1	null
2024-12-10	On Motion Blur and Deblurring in Visual Place Recognition	Timur Ismagilov et.al.	2412.07751v1	null
2024-12-09	[MASK] is All You Need	Vincent Tao Hu et.al.	2412.06787v1	link
2024-12-09	P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies	Mara Levy et.al.	2412.06784v1	null
2024-12-09	Convolution goes higher-order: a biologically inspired mechanism empowers image classification	Simone Azeglio et.al.	2412.06740v1	null
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738v1	null
2024-12-09	Demystifying shock breakout spectra	Christopher M. Irwin et.al.	2412.06734v1	null
2024-12-09	Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection	Aqib Nazir Mir et.al.	2412.06709v1	null
2024-12-09	You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale	Baorui Ma et.al.	2412.06699v1	null
2024-12-09	FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis	Ciro Benito Raggio et.al.	2412.06690v1	null
2024-12-09	Impact of Privacy Parameters on Deep Learning Models for Image Classification	Basanta Chaulagain et.al.	2412.06689v1	null
2024-12-09	Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset	Shanshan Wang et.al.	2412.06666v1	null
2024-12-06	Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model	Lening Wang et.al.	2412.05280v1	link
2024-12-06	Sparse autoencoders reveal selective remapping of visual concepts during adaptation	Hyesu Lim et.al.	2412.05276v1	link
2024-12-06	MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models	Tuna Han Salih Meral et.al.	2412.05275v1	null
2024-12-06	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Zhe Chen et.al.	2412.05271v1	null
2024-12-06	Mind the Time: Temporally-Controlled Multi-Event Video Generation	Ziyi Wu et.al.	2412.05263v1	null
2024-12-06	TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft	Qian Long et.al.	2412.05255v1	link
2024-12-06	Uncertainty Quantification for Transformer Models for Dark-Pattern Detection	Javier Muñoz et.al.	2412.05251v1	null
2024-12-06	ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding	Ayushman Singh et.al.	2412.05216v1	null
2024-12-06	LinVT: Empower Your Image-level Large Language Model to Understand Videos	Lishuai Gao et.al.	2412.05185v1	link
2024-12-06	DreamColour: Controllable Video Colour Editing without Training	Chaitat Utintu et.al.	2412.05180v1	null
2024-12-05	PaintScene4D: Consistent 4D Scene Generation from Text Prompts	Vinayak Gupta et.al.	2412.04471v1	null
2024-12-05	QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos	Sharath Girish et.al.	2412.04469v1	null
2024-12-05	NVILA: Efficient Frontier Visual Language Models	Zhijian Liu et.al.	2412.04468v1	null
2024-12-05	VisionZip: Longer is Better but Not Necessary in Vision Language Models	Senqiao Yang et.al.	2412.04467v1	link
2024-12-05	MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos	Zhengqi Li et.al.	2412.04463v1	null
2024-12-05	4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion	Chaoyang Wang et.al.	2412.04462v1	null
2024-12-05	Four-Plane Factorized Video Autoencoders	Mohammed Suhail et.al.	2412.04452v1	null
2024-12-05	MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation	Longtao Zheng et.al.	2412.04448v1	null
2024-12-05	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu et.al.	2412.04447v1	null
2024-12-05	DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models	Yizhuo Li et.al.	2412.04446v1	null
2024-12-04	Navigation World Models	Amir Bar et.al.	2412.03572v1	null
2024-12-04	The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control	Ruili Feng et.al.	2412.03568v1	null
2024-12-04	Streaming Detection of Queried Event Start	Cristobal Eyzaguirre et.al.	2412.03567v1	null
2024-12-04	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Wujian Peng et.al.	2412.03565v1	null
2024-12-04	From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents	Xinyi Mou et.al.	2412.03563v1	null
2024-12-04	Imagine360: Immersive 360 Video Generation from Perspective Anchor	Jing Tan et.al.	2412.03552v1	null
2024-12-04	Kibble-Zurek Dynamics & Statistics of Topological Defects in Chiral Superfluid $^3$He Films	Noble Gluscevich et.al.	2412.03544v1	null
2024-12-04	Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos	Hanxue Liang et.al.	2412.03526v1	null
2024-12-04	Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention	Hannan Lu et.al.	2412.03520v1	null
2024-12-04	Distillation of Diffusion Features for Semantic Correspondence	Frank Fundel et.al.	2412.03512v1	null
2024-12-03	Motion Prompting: Controlling Video Generation with Motion Trajectories	Daniel Geng et.al.	2412.02700v1	null
2024-12-03	An ADHD Diagnostic Interface Based on EEG Spectrograms and Deep Learning Techniques	Medha Pappula et.al.	2412.02695v1	null
2024-12-03	FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation	Kefan Chen et.al.	2412.02690v1	null
2024-12-03	AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction	Lingteng Qiu et.al.	2412.02684v1	null
2024-12-03	On Third-Order Evolution Systems Describing Pseudo-Spherical or Spherical Surfaces	Filipe Kelmer et.al.	2412.02657v1	null
2024-12-03	Robust soybean seed yield estimation using high-throughput ground robot videos	Jiale Feng et.al.	2412.02642v1	null
2024-12-03	QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing	Ramesh Manuvinakurike et.al.	2412.02638v1	null
2024-12-03	Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback	Hiroki Furuta et.al.	2412.02617v1	null
2024-12-03	Interpretable Company Similarity with Sparse Autoencoders	Marco Molinari et.al.	2412.02605v1	null
2024-12-03	Efficient Algorithms for Low Tubal Rank Tensor Approximation with Applications to Image Compression, Super-Resolution and Deep Learning	Salman Ahmadi-Asl et.al.	2412.02598v1	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951v2	link
2024-11-29	AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos	Yuze He et.al.	2411.19950v1	null
2024-11-29	Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Joseph Heyward et.al.	2411.19941v1	null
2024-11-29	SIMS: Simulating Human-Scene Interactions with Real World Script Planning	Wenjia Wang et.al.	2411.19921v1	null
2024-11-29	Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy	Araceli Guzmán-Tristán et.al.	2411.19902v1	null
2024-11-29	To the Problem of Cosmic Expansion in Massive Gravity	Lavinia Heisenberg et.al.	2411.19873v1	null
2024-11-29	AIDetx: a compression-based method for identification of machine-learning generated text	Leonardo Almeida et.al.	2411.19869v1	link
2024-11-29	Towards Class-wise Robustness Analysis	Tejaswini Medi et.al.	2411.19853v1	null
2024-11-29	Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation	Dimosthenis Antypas et.al.	2411.19832v1	null
2024-11-29	A new definition of outsplitting on $k$-graphs preserving Morita equivalence	Mackenzie Amann et.al.	2411.19816v1	null
2024-11-27	GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data	Wentao Wang et.al.	2411.18624v1	null
2024-11-27	Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data	Aoran Shen et.al.	2411.18622v1	null
2024-11-27	CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models	Rundi Wu et.al.	2411.18613v1	null
2024-11-27	Novel Class Discovery for Open Set Raga Classification	Parampreet Singh et.al.	2411.18611v1	null
2024-11-27	Variability of hot sub-luminous stars and binaries: Machine learning analysis of Gaia DR3 multi-epoch photometry	P. Ranaivomanana et.al.	2411.18609v1	null
2024-11-27	Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis	Eva Prakash et.al.	2411.18602v1	null
2024-11-27	Periodic symplectic and Hamiltonian diffeomorphisms on irrational ruled surfaces	Nicholas Lindsay et.al.	2411.18580v1	null
2024-11-27	Pruning Deep Convolutional Neural Network Using Conditional Mutual Information	Tien Vu-Van et.al.	2411.18578v1	null
2024-11-27	Exploring Depth Information for Detecting Manipulated Face Videos	Haoyue Wang et.al.	2411.18572v1	null
2024-11-27	Perturbation Ontology based Graph Attention Networks	Yichen Wang et.al.	2411.18520v1	null
2024-11-26	Video-Guided Foley Sound Generation with Multimodal Controls	Ziyang Chen et.al.	2411.17698v1	null
2024-11-26	StableAnimator: High-Quality Identity-Preserving Human Image Animation	Shuyuan Tu et.al.	2411.17697v1	link
2024-11-26	Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis	Akshita Gupta et.al.	2411.17690v1	null
2024-11-26	BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings	Abhay Shanbhag et.al.	2411.17661v1	null
2024-11-26	DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting	Christian Homeyer et.al.	2411.17660v1	link
2024-11-26	SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation	Claudia Cuttano et.al.	2411.17646v1	link
2024-11-26	A robust image encryption scheme based on new 4-D hyperchaotic system and elliptic curve	Yehia Lalili et.al.	2411.17643v1	null
2024-11-26	On Limitations of LLM as Annotator for Low Resource Languages	Suramya Jadhav et.al.	2411.17637v1	null
2024-11-26	An Ensemble Approach for Brain Tumor Segmentation and Synthesis	Juampablo E. Heras Rivera et.al.	2411.17617v1	null
2024-11-26	Accelerating Vision Diffusion Transformers with Skip Branches	Guanjie Chen et.al.	2411.17616v1	link
2024-11-25	Generative Omnimatte: Learning to Decompose Video into Layers	Yao-Chih Lee et.al.	2411.16683v1	null
2024-11-25	Quark: Real-time, High-resolution, and General Neural View Synthesis	John Flynn et.al.	2411.16680v1	null
2024-11-25	A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports	Gabriel Okasa et.al.	2411.16662v1	null
2024-11-25	Fast training of large kernel models with delayed projections	Amirhesam Abedsoltan et.al.	2411.16658v1	null
2024-11-25	DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation	Zun Wang et.al.	2411.16657v1	null
2024-11-25	Automated Registration of 3D Neurovascular Territory Atlas to 2D DSA for Targeted Quantitative Angiography Analysis	George Dimopoulos et.al.	2411.16637v1	null
2024-11-25	LegoPET: Hierarchical Feature Guided Conditional Diffusion for PET Image Reconstruction	Yiran Sun et.al.	2411.16629v1	null
2024-11-25	Inference-Time Policy Steering through Human Interactions	Yanwei Wang et.al.	2411.16627v1	null
2024-11-25	Imperceptible Adversarial Examples in the Physical World	Weilin Xu et.al.	2411.16622v1	null
2024-11-25	Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric	Zhichao Zhang et.al.	2411.16619v1	null
2024-11-22	Health AI Developer Foundations	Atilla P. Kiraly et.al.	2411.15128v1	null
2024-11-22	PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision	Arnav M. Das et.al.	2411.15127v1	null
2024-11-22	VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement	Daeun Lee et.al.	2411.15115v1	null
2024-11-22	About Time: Advances, Challenges, and Outlooks of Action Understanding	Alexandros Stergiou et.al.	2411.15106v1	null
2024-11-22	Efficient Radar Modulation Recognition via a Noise-Aware Ensemble Neural Network	Do-Hyun Park et.al.	2411.15104v1	null
2024-11-22	RED: Effective Trajectory Representation Learning with Comprehensive Information	Silin Zhou et.al.	2411.15096v1	null
2024-11-22	Dimension-independent rates for structured neural density estimation	Robert A. Vandermeulen et.al.	2411.15095v1	null
2024-11-22	Quantum-enhanced unsupervised image segmentation for medical images analysis	Laia Domingo et.al.	2411.15086v1	null
2024-11-22	Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation	Lakshmikar R. Polamreddy et.al.	2411.15084v1	link
2024-11-22	RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency	Wentao Huang et.al.	2411.15076v1	null
2024-11-21	Revisiting the Integration of Convolution and Attention for Vision Backbone	Lei Zhu et.al.	2411.14429v1	link
2024-11-21	Quantum States Imaging of Magnetic Field Contours based on Autler-Townes Effect in Yb Atoms	Tanaporn Na Narong et.al.	2411.14426v1	null
2024-11-21	Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation	Zhuoman Liu et.al.	2411.14423v1	null
2024-11-21	Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field	Lan Jiang et.al.	2411.14418v1	null
2024-11-21	Multimodal Autoregressive Pre-training of Large Vision Encoders	Enrico Fini et.al.	2411.14402v1	link
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401v1	null
2024-11-21	POS-tagging to highlight the skeletal structure of sentences	Grigorii Churakov et.al.	2411.14393v1	link
2024-11-21	Persistent Homology for Structural Characterization in Disordered Systems	An Wang et.al.	2411.14390v1	link
2024-11-21	Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach	Xian-Xian Liu et.al.	2411.14385v1	null
2024-11-21	Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation	Yuanhao Cai et.al.	2411.14384v1	null
2024-11-20	REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents	Rui Tian et.al.	2411.13552v1	link
2024-11-20	Generating 3D-Consistent Videos from Unposed Internet Photos	Gene Chou et.al.	2411.13549v1	null
2024-11-20	Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix	Subhasish Das et.al.	2411.13535v1	null
2024-11-20	Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse	S. Chapagain et.al.	2411.13534v1	null
2024-11-20	Geometric Algebra Planes: Convex Implicit Neural Volumes	Irmak Sivgin et.al.	2411.13525v1	null
2024-11-20	VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	Ziqi Huang et.al.	2411.13503v1	link
2024-11-20	Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations	Yasmine Mustafa et.al.	2411.13490v1	null
2024-11-20	Benchmarking Quantum Convolutional Neural Networks for Classification and Data Compression Tasks	Jun Yong Khoo et.al.	2411.13468v1	null
2024-11-20	Heuristically Adaptive Diffusion-Model Evolutionary Strategy	Benedikt Hartl et.al.	2411.13420v1	null
2024-11-20	Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese	Dat Van-Thanh Nguyen et.al.	2411.13407v1	null
2024-11-19	Soft Robotic Dynamic In-Hand Pen Spinning	Yunchao Yao et.al.	2411.12734v1	null
2024-11-19	Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs	Ahmed Akib Jawad Karim et.al.	2411.12712v1	null
2024-11-19	UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments	Chunru Lin et.al.	2411.12711v1	null
2024-11-19	Attribute Inference Attacks for Federated Regression Tasks	Francesco Diana et.al.	2411.12697v1	null
2024-11-19	IMUVIE: Pickup Timeline Action Localization via Motion Movies	John Clapham et.al.	2411.12689v1	null
2024-11-19	AI Guided Early Screening of Cervical Cancer	Dharanidharan S I et.al.	2411.12681v1	null
2024-11-19	Yang--Mills topology on four-dimensional triangulations	Giuseppe Clemente et.al.	2411.12668v1	null
2024-11-19	Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis	Kazi Hasibul Kabir et.al.	2411.12667v1	null
2024-11-19	PoM: Efficient Image and Video Generation with the Polynomial Mixer	David Picard et.al.	2411.12663v1	link
2024-11-19	AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Yuanbin Man et.al.	2411.12593v1	null
2024-11-18	Partially Hyperbolic Dynamics with Quasi-isometric Center	Ziqiang Feng et.al.	2411.11836v1	null
2024-11-18	Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals	Maryam Cheema et.al.	2411.11835v1	null
2024-11-18	Absorbing state dynamics of stochastic gradient descent	Guanming Zhang et.al.	2411.11834v1	null
2024-11-18	Equivariant spatio-hemispherical networks for diffusion MRI deconvolution	Axel Elaldi et.al.	2411.11819v1	link
2024-11-18	Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion	Meng Zhou et.al.	2411.11799v1	link
2024-11-18	Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods	Egor Kovalev et.al.	2411.11795v1	null
2024-11-18	Energy shifts and broadening of excitonic resonances in electrostatically-doped semiconductors	Hanan Dery et.al.	2411.11790v1	null
2024-11-18	High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles	Shiyue Zhao et.al.	2411.11762v1	null
2024-11-18	Additional Tests for TV 3.0	Eduardo Peixoto et.al.	2411.11755v1	null
2024-11-18	Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking	German Gritsai et.al.	2411.11736v1	null
2024-11-15	The Spatial Complexity of Optical Computing and How to Reduce It	Yandong Li et.al.	2411.10435v1	null
2024-11-15	Private Counterfactual Retrieval With Immutable Features	Shreya Meel et.al.	2411.10429v1	null
2024-11-15	Back to Supervision: Boosting Word Boundary Detection through Frame Classification	Simone Carnemolla et.al.	2411.10423v1	null
2024-11-15	Multiscale Dubuc: A New Similarity Measure for Time Series	Mahsa Khazaei et.al.	2411.10418v1	null
2024-11-15	Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations	Jianfeng Chi et.al.	2411.10414v1	null
2024-11-15	Experimental demonstration of Tessellation Structured Illumination Microscopy	Doron Shterman et.al.	2411.10405v1	null
2024-11-15	On the Foundation Model for Cardiac MRI Reconstruction	Chi Zhang et.al.	2411.10403v1	null
2024-11-15	Tropical combinatorics of max-linear Bayesian networks	Carlos Améndola et.al.	2411.10394v1	null
2024-11-15	Mechanisms of Generative Image-to-Image Translation Networks	Guangzong Chen et.al.	2411.10368v1	null
2024-11-15	On the Cost of Model-Serving Frameworks: An Experimental Evaluation	Pasquale De Rosa et.al.	2411.10337v1	null
2024-11-14	Towards a Classification of Open-Source ML Models and Datasets for Software Engineering	Alexandra González et.al.	2411.09683v1	null
2024-11-14	Commensurability Among Deligne-Mostow Monodromy Groups	Chenglong Yu et.al.	2411.09682v1	null
2024-11-14	Modular Fault Diagnosis Framework for Complex Autonomous Driving Systems	Stefan Orf et.al.	2411.09643v1	null
2024-11-14	The Moral Foundations Weibo Corpus	Renjie Cao et.al.	2411.09612v1	null
2024-11-14	Effect of viewing angle in Gamma-ray Burst properties	Sreelakshmi P Chakyar et.al.	2411.09609v1	null
2024-11-14	Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration	Yifan Shao et.al.	2411.09604v1	link
2024-11-14	Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images	Bipasha Kundu et.al.	2411.09598v1	null
2024-11-14	SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms	Soumick Chatterjee et.al.	2411.09593v1	null
2024-11-14	SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas	Yu-Kai Hung et.al.	2411.09577v1	null
2024-11-14	Mutual Influence of Photon Sphere and Non-Commutative Parameter in Various Non-Commutative Black Holes: Part I- Towards evidence for WGC	Mohammad Ali S. Afshar et.al.	2411.09557v1	null
2024-11-13	4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization	Mijeong Kim et.al.	2411.08879v1	null
2024-11-13	A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos	Debidatta Dwibedi et.al.	2411.08878v1	link
2024-11-13	Quantum cryptography beyond key distribution: theory and experiment	Mathieu Bozzio et.al.	2411.08877v1	null
2024-11-13	Large Wireless Model (LWM): A Foundation Model for Wireless Channels	Sadjad Alikhani et.al.	2411.08872v1	null
2024-11-13	AstroM$^3$: A self-supervised multimodal model for astronomy	Mariia Rizhko et.al.	2411.08842v1	null
2024-11-13	Multimodal Instruction Tuning with Hybrid State Space Models	Jianing Zhou et.al.	2411.08840v1	null
2024-11-13	Model agnostic local variable importance for locally dependent relationships	Kelvyn K. Bladen et.al.	2411.08821v1	null
2024-11-13	Identifying Spicules in Mg II: Statistics and Comparisons with Hα	Vicki L. Herde et.al.	2411.08801v1	null
2024-11-13	Algorithms in 4-manifold topology	Stefan Bastl et.al.	2411.08775v1	null
2024-11-13	Sharingan: Extract User Action Sequence from Desktop Recordings	Yanting Chen et.al.	2411.08768v1	null
2024-11-12	Leonardo vindicated: Pythagorean trees for minimal reconstruction of the natural branching structures	Dymitr Ruta et.al.	2411.08024v1	null
2024-11-12	Artistic Neural Style Transfer Algorithms with Activation Smoothing	Xiangtian Li et.al.	2411.08014v1	null
2024-11-12	A computer-vision aided Compton-imaging system for radioactive waste characterization and decommissioning of nuclear power plants	Victor Babiano-Suarez et.al.	2411.07996v1	null
2024-11-12	DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring	Mahmut S. Gokmen et.al.	2411.07976v1	null
2024-11-12	Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects	Laura Dominé et.al.	2411.07956v1	null
2024-11-12	SimBase: A Simple Baseline for Temporal Video Grounding	Peijun Bao et.al.	2411.07945v1	null
2024-11-12	DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks	Zhaoxi Zhang et.al.	2411.07941v1	null
2024-11-12	Prediction of Acoustic Communication Performance for AUVs using Gaussian Process Classification	Yifei Gao et.al.	2411.07933v1	null
2024-11-12	CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising	Linxuan Li et.al.	2411.07930v1	null
2024-11-12	CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts	Aniket Deroy et.al.	2411.07917v1	null
2024-11-11	Grounding Video Models to Actions through Goal Conditioned Exploration	Yunhao Luo et.al.	2411.07223v1	null
2024-11-11	NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics	David Robinson et.al.	2411.07186v1	null
2024-11-11	Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network	Raúl de la Fuente et.al.	2411.07168v1	null
2024-11-11	Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation	Kaijian Zou et.al.	2411.07130v1	link
2024-11-11	StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Yichen He et.al.	2411.07076v1	link
2024-11-11	Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification	Albert Belenguer-Llorens et.al.	2411.07043v1	null
2024-11-11	The Inherent Adversarial Robustness of Analog In-Memory Computing	Corey Lammie et.al.	2411.07023v1	null
2024-11-11	HeteroSample: Meta-path Guided Sampling for Heterogeneous Graph Representation Learning	Ao Liu et.al.	2411.07022v1	null
2024-11-11	Token2Wave	Xin Zhang et.al.	2411.06989v1	null
2024-11-11	A Hyperspectral Imaging Dataset and Methodology for Intraoperative Pixel-Wise Classification of Metastatic Colon Cancer in the Liver	Ivica Kopriva et.al.	2411.06969v1	null
2024-11-08	Gender Inequalities in Content Collaborations: Asymmetric Creator Synergy and Symmetric Audience Biases	Mingyue Zha et.al.	2411.05782v1	null
2024-11-08	Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems	Guixian Xu et.al.	2411.05771v1	null
2024-11-08	FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information	Shreen Gul et.al.	2411.05752v1	link
2024-11-08	Accurate Unsupervised Photon Counting from Transition Edge Sensor Signals	Nicolas Dalbec-Constant et.al.	2411.05737v1	null
2024-11-08	Poze: Sports Technique Feedback under Data Constraints	Agamdeep Singh et.al.	2411.05734v1	null
2024-11-08	Differential Privacy Under Class Imbalance: Methods and Empirical Insights	Lucas Rosenblatt et.al.	2411.05733v1	null
2024-11-08	On-chip rewritable phase-change metasurface for programmable diffractive deep neural networks	Sanaz Zarei et.al.	2411.05723v1	null
2024-11-08	Classification of ($ρ,τ,σ$)-derivations of two-dimensional left-symmetric dialgebras	Basdouri Imed et.al.	2411.05716v1	null
2024-11-08	STARS: Sensor-agnostic Transformer Architecture for Remote Sensing	Ethan King et.al.	2411.05714v1	null
2024-11-08	Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream	Abdulkadir Gokce et.al.	2411.05712v1	link
2024-11-07	ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning	David Junhao Zhang et.al.	2411.05003v1	null
2024-11-07	DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation	Peiqi Liu et.al.	2411.04999v1	null
2024-11-07	HourVideo: 1-Hour Video-Language Understanding	Keshigeyan Chandrasegaran et.al.	2411.04998v1	null
2024-11-07	SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation	Koichi Namekata et.al.	2411.04989v1	null
2024-11-07	Efficient Preparation of Solvable Anyons with Adaptive Quantum Circuits	Yuanjie Ren et.al.	2411.04985v1	null
2024-11-07	Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries	Dylan Manuel et.al.	2411.04981v1	null
2024-11-07	Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification	Mischa Dombrowski et.al.	2411.04956v1	null
2024-11-07	Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach	Gideon Yoffe et.al.	2411.04950v1	null
2024-11-07	Proof of the absence of local conserved quantities in the spin-1 bilinear-biquadratic chain and its anisotropic extensions	Akihiro Hokkyo et.al.	2411.04945v1	null
2024-11-07	A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model	Panwen Hu et.al.	2411.04942v1	null
2024-11-06	RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models	Maya Varma et.al.	2411.04097v1	link
2024-11-06	Local unitary equivalence of absolutely maximally entangled states constructed from orthogonal arrays	N Ramadas et.al.	2411.04096v1	null
2024-11-06	A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement	Guillermo Villate-Castillo et.al.	2411.04090v1	link
2024-11-06	Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning	Ping Li et.al.	2411.04059v1	link
2024-11-06	Distinguishing Coupled Dark Energy Models with Neural Networks	L. W. K. Goh et.al.	2411.04058v1	link
2024-11-06	Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging	Yuan Bi et.al.	2411.04004v1	null
2024-11-06	Learning Aggregate Queries Defined by First-Order Logic with Counting	Steffen van Bergerem et.al.	2411.04003v1	null
2024-11-06	ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks	Ziji Shi et.al.	2411.03999v1	null
2024-11-06	Fine-tuning -- a Transfer Learning approach	Joseph Arul Raj et.al.	2411.03941v1	null
2024-11-06	Inter-Frame Coding for Dynamic Meshes via Coarse-to-Fine Anchor Mesh Generation	He Huang et.al.	2411.03921v1	null
2024-11-05	Classification Done Right for Vision-Language Pre-Training	Huang Zilong et.al.	2411.03313v1	link
2024-11-05	Automatic solid form classification in pharmaceutical drug development	Julius Lange et.al.	2411.03308v1	null
2024-11-05	Data-Driven Sampling Based Stochastic MPC for Skid-Steer Mobile Robot Navigation	Ananya Trivedi et.al.	2411.03289v1	link
2024-11-05	Graph-Based Semi-Supervised Segregated Lipschitz Learning	Farid Bozorgnia et.al.	2411.03273v1	null
2024-11-05	Tuning into spatial frequency space: Satellite and space debris detection in the ZTF alert stream	J. P. Carvajal et.al.	2411.03258v1	null
2024-11-05	Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization	Zakariae Belmekki et.al.	2411.03226v1	null
2024-11-05	Beyond Grid Data: Exploring Graph Neural Networks for Earth Observation	Shan Zhao et.al.	2411.03223v1	null
2024-11-05	Statistical Analysis to Support CSI-Based Sensing Methods	Elena Tonini et.al.	2411.03203v1	null
2024-11-05	Navigating Extremes: Dynamic Sparsity in Large Output Space	Nasib Ullah et.al.	2411.03171v1	null
2024-11-05	Pre-trained Visual Dynamics Representations for Efficient Policy Learning	Hao Luo et.al.	2411.03169v1	null
2024-11-04	Adaptive Caching for Faster Video Generation with Diffusion Transformers	Kumara Kahatapitiya et.al.	2411.02397v1	null
2024-11-04	AutoVFX: Physically Realistic Video Editing from Natural Language Instructions	Hao-Yu Hsu et.al.	2411.02394v1	null
2024-11-04	How Far is Video Generation from World Model: A Physical Law Perspective	Bingyi Kang et.al.	2411.02385v1	null
2024-11-04	Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas	Qingwen Pu et.al.	2411.02349v1	null
2024-11-04	SplatOverflow: Asynchronous Hardware Troubleshooting	Amritansh Kwatra et.al.	2411.02332v1	null
2024-11-04	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Ruyang Liu et.al.	2411.02327v1	link
2024-11-04	GenXD: Generating Any 3D and 4D Scenes	Yuyang Zhao et.al.	2411.02319v1	null
2024-11-04	Information plane and compression-gnostic feedback in quantum machine learning	Nathan Haboury et.al.	2411.02313v1	null
2024-11-04	Grouped Discrete Representation for Object-Centric Learning	Rongzhen Zhao et.al.	2411.02299v1	null
2024-11-04	Conformal-in-the-Loop for Learning with Imbalanced Noisy Data	John Brandon Graham-Knight et.al.	2411.02281v1	null
2024-10-31	EgoMimic: Scaling Imitation Learning via Egocentric Video	Simar Kareer et.al.	2410.24221v1	link
2024-10-31	Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning	Penghui Ruan et.al.	2410.24219v1	link
2024-10-31	Learning Video Representations without Natural Videos	Xueyang Yu et.al.	2410.24213v1	null
2024-11-01	DELTA: Dense Efficient Long-range 3D Tracking for any video	Tuan Duc Ngo et.al.	2410.24211v2	null
2024-10-31	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Weicai Ye et.al.	2410.24203v1	link
2024-10-31	DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning	Zhenyu Jiang et.al.	2410.24185v1	null
2024-10-31	Extended Object Tracking and Classification based on Linear Splines	Matteo Tesori et.al.	2410.24183v1	null
2024-10-31	$π_0$: A Vision-Language-Action Flow Model for General Robot Control	Kevin Black et.al.	2410.24164v1	null
2024-10-31	Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age	Nouar AlDahoul et.al.	2410.24148v1	null
2024-10-31	HoloChrome: Polychromatic Illumination for Speckle Reduction in Holographic Near-Eye Displays	Florian Schiffers et.al.	2410.24144v1	null
2024-10-30	Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards	Irmak Guzey et.al.	2410.23289v1	null
2024-10-30	Computing the bridge length: the key ingredient in a continuous isometry classification of periodic point sets	Jonathan McManus et.al.	2410.23288v1	null
2024-10-30	ReferEverything: Towards Segmenting Everything We Can Speak of in Videos	Anurag Bagchi et.al.	2410.23287v1	null
2024-10-30	DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration	Ola Shorinwa et.al.	2410.23283v1	null
2024-10-30	A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization	Bin Wu et.al.	2410.23279v1	null
2024-10-30	SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation	Yining Hong et.al.	2410.23277v1	null
2024-10-30	TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models	Ziyao Shangguan et.al.	2410.23266v1	link
2024-10-30	bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction	Yehe Liu et.al.	2410.23247v1	null
2024-10-30	PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching	Chen Ziwen et.al.	2410.23245v1	null
2024-10-31	Aligning Audio-Visual Joint Representations with an Agentic Workflow	Shentong Mo et.al.	2410.23230v2	null
2024-10-29	Local Policies Enable Zero-shot Long-horizon Manipulation	Murtaza Dalal et.al.	2410.22332v1	null
2024-10-30	Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets	Guangqi Jiang et.al.	2410.22325v2	null
2024-10-29	Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models	Seetharam Killivalavan et.al.	2410.22323v1	null
2024-10-29	Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier	Kai Wang et.al.	2410.22317v1	link
2024-10-29	Convex Formulations for Training Two-Layer ReLU Neural Networks	Karthik Prakhya et.al.	2410.22311v1	link
2024-10-29	Emotion-Guided Image to Music Generation	Souraja Kundu et.al.	2410.22299v1	null
2024-10-29	Motion Graph Unleashed: A Novel Approach to Video Prediction	Yiqi Zhong et.al.	2410.22288v1	link
2024-10-29	Non-LTE Synthetic Observables of a Multidimensional Model of Type Ia Supernovae	Samuel J. Boos et.al.	2410.22276v1	null
2024-10-29	Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation	Davide Berghi et.al.	2410.22271v1	null
2024-10-29	LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers	Patricia Pauli et.al.	2410.22258v1	link
2024-10-28	LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	Hanyu Wang et.al.	2410.21264v1	null
2024-10-28	Multi-modal AI for comprehensive breast cancer prognostication	Jan Witowski et.al.	2410.21256v1	null
2024-10-28	Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies	Xiwen Li et.al.	2410.21170v1	null
2024-10-28	KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation	Zhihao Zhao et.al.	2410.21160v1	null
2024-10-28	Synthetica: Large Scale Synthetic Data for Robot Perception	Ritvik Singh et.al.	2410.21153v1	null
2024-10-28	The tau function for ABS equations	James Atkinson et.al.	2410.21148v1	null
2024-10-28	Enhancing Learned Image Compression via Cross Window-based Attention	Priyanka Mudgal et.al.	2410.21144v1	null
2024-10-28	uOttawa at LegalLens-2024: Transformer-based Classification Experiments	Nima Meghdadi et.al.	2410.21139v1	link
2024-10-28	Do LLMs generate test oracles that capture the actual or the expected program behaviour?	Michael Konstantinou et.al.	2410.21136v1	null
2024-10-28	Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences	Zhihao Zhao et.al.	2410.21130v1	null
2024-10-25	Sparse Decomposition of Graph Neural Networks	Yaochen Hu et.al.	2410.19723v1	null
2024-10-25	Arabic Music Classification and Generation using Deep Learning	Mohamed Elshaarawy et.al.	2410.19719v1	null
2024-10-25	Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning	Vegard Berge et.al.	2410.19717v1	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702v1	null
2024-10-25	MILES: Making Imitation Learning Easy with Self-Supervision	Georgios Papagiannis et.al.	2410.19693v1	null
2024-10-25	Deep Learning for Classification of Inflammatory Bowel Disease Activity in Whole Slide Images of Colonic Histopathology	Amit Das et.al.	2410.19690v1	null
2024-10-25	Optimizing Hearthstone Agents using an Evolutionary Algorithm	Pablo García-Sánchez et.al.	2410.19681v1	null
2024-10-25	Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective	Ethan Harvey et.al.	2410.19675v1	null
2024-10-25	MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services	Hongjia Wu et.al.	2410.19665v1	null
2024-10-25	VARS: Vision-based Assessment of Risk in Security Systems	Pranav Gupta et.al.	2410.19642v1	null
2024-10-24	Framer: Interactive Frame Interpolation	Wen Wang et.al.	2410.18978v1	null
2024-10-24	CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Sara Ghaboura et.al.	2410.18976v1	link
2024-10-24	Unbounded: A Generative Infinite Game of Character Life Simulation	Jialu Li et.al.	2410.18975v1	null
2024-10-24	Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling	Mingtong Zhang et.al.	2410.18912v1	null
2024-10-24	SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment	Caelan Garrett et.al.	2410.18907v1	null
2024-10-24	A Survey of Multimodal Sarcasm Detection	Shafkat Farabi et.al.	2410.18882v1	null
2024-10-24	Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning	Arnav Samal et.al.	2410.18879v1	link
2024-10-24	Exploring the Universe with SNAD: Anomaly Detection in Astronomy	Alina A. Volnova et.al.	2410.18875v1	null
2024-10-24	Exploring a Geometric Conjecture, Some Properties of Blaschke Products, and the Geometry of Curves Formed by Them	Mehmet Celik et.al.	2410.18863v1	null
2024-10-24	Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging	Aya Ghoul et.al.	2410.18834v1	link
2024-10-23	FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution	Yang-Che Sun et.al.	2410.18083v1	null
2024-10-23	WorldSimBench: Towards Video Generation Models as World Simulators	Yiran Qin et.al.	2410.18072v1	null
2024-10-23	Eigenvalue crossings in equivariant families of matrices	Jonathan Rawlinson et.al.	2410.18068v1	null
2024-10-23	The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies	Raman Ebrahimi et.al.	2410.18066v1	null
2024-10-23	Real time anomalies detection on video	Fabien Poirier et.al.	2410.18051v1	null
2024-10-23	Boundary topological insulators and superconductors of Altland-Zirnbauer tenfold classes	Xun-Jiang Luo et.al.	2410.18015v1	null
2024-10-24	Effective Finite Time Stability Control for Human-Machine Shared Vehicle Following System	Zihan Wang et.al.	2410.18007v2	null
2024-10-23	Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation	Suho Kang et.al.	2410.18001v1	link
2024-10-23	Optical Generative Models	Shiqi Chen et.al.	2410.17970v1	null
2024-10-23	A Wavelet Diffusion GAN for Image Super-Resolution	Lorenzo Aloisi et.al.	2410.17966v1	null
2024-10-22	Altogether: Image Captioning via Re-aligning Alt-text	Hu Xu et.al.	2410.17251v1	null
2024-10-22	Classifying rational polygons with small denominator and few interior lattice points	Martin Bohnert et.al.	2410.17244v1	null
2024-10-22	Frontiers in Intelligent Colonoscopy	Ge-Peng Ji et.al.	2410.17241v1	link
2024-10-22	Automated Spinal MRI Labelling from Reports Using a Large Language Model	Robin Y. Park et.al.	2410.17235v1	link
2024-10-22	Few-shot In-Context Preference Learning Using Large Language Models	Chao Yu et.al.	2410.17233v1	null
2024-10-22	Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods	Tsachi Blau et.al.	2410.17222v1	null
2024-10-22	The Decision Problem for Regular First-Order Theories	Umang Mathur et.al.	2410.17185v1	null
2024-10-22	Technical Report: Toward Applying Quantum Computing to Network Verification	Kahlil Dozier et.al.	2410.17184v1	null
2024-10-22	KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements	Md Meftahul Ferdaus et.al.	2410.17172v1	link
2024-10-22	Are Visual-Language Models Effective in Action Recognition? A Comparative Study	Mahmoud Ali et.al.	2410.17149v1	null
2024-10-21	SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree	Shuangrui Ding et.al.	2410.16268v1	link
2024-10-21	xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs	Michael S. Ryoo et.al.	2410.16267v1	null
2024-10-21	3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors	Xi Liu et.al.	2410.16266v1	null
2024-10-21	Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos	Gengshan Yang et.al.	2410.16259v1	null
2024-10-21	Serendipitous detection of an intense X-ray flare in the weak-line T Tauri star KM Ori with SRG/eROSITA	Savithri H. Ezhikode et.al.	2410.16241v1	null
2024-10-21	MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report	Samrajya Thapa et.al.	2410.16239v1	link
2024-10-21	Deep Radiomics Detection of Clinically Significant Prostate Cancer on Multicenter MRI: Initial Comparison to PI-RADS Assessment	G. A. Nketiah et.al.	2410.16238v1	null
2024-10-22	Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models	Giannis Daras et.al.	2410.16152v2	null
2024-10-21	An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection	Chandravardhan Singh Raghaw et.al.	2410.16143v1	null
2024-10-21	Modeling dynamic neural activity by combining naturalistic video stimuli and stimulus-independent latent factors	Finn Schmidt et.al.	2410.16136v1	null
2024-10-18	Real-time Fake News from Adversarial Feedback	Sanxing Chen et.al.	2410.14651v1	null
2024-10-18	GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings	Raghuveer Thirukovalluru et.al.	2410.14635v1	null
2024-10-18	You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools	Daniel Baumartz et.al.	2410.14626v1	null
2024-10-18	Learning to Control the Smoothness of Graph Convolutional Network Features	Shih-Hsin Wang et.al.	2410.14604v1	null
2024-10-18	Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection	Aaron Alvarado Kristanto Julistiono et.al.	2410.14581v1	null
2024-10-18	A Hybrid Feature Fusion Deep Learning Framework for Leukemia Cancer Detection in Microscopic Blood Sample Using Gated Recurrent Unit and Uncertainty Quantification	Maksuda Akter et.al.	2410.14536v1	null
2024-10-18	Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance	Daniel Wolf et.al.	2410.14524v1	link
2024-10-18	Influence of anisotropy on the study of critical behavior of spin models by machine learning methods	Diana Sukhoverkhova et.al.	2410.14523v1	null
2024-10-18	A character approach to the ISR property	Artem Dudko et.al.	2410.14517v1	null
2024-10-18	Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media	Owen Cook et.al.	2410.14515v1	link
2024-10-17	DepthSplat: Connecting Gaussian Splatting and Depth	Haofei Xu et.al.	2410.13862v1	link
2024-10-17	Adaptive Subsampling and Learned Model Improve Spatiotemporal Resolution of Tactile Skin	Ariel Slepyan et.al.	2410.13847v1	null
2024-10-17	VidPanos: Generative Panoramic Videos from Casual Panning Videos	Jingwei Ma et.al.	2410.13832v1	null
2024-10-17	DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control	Yujie Wei et.al.	2410.13830v1	null
2024-10-17	Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks	Clément Playout et.al.	2410.13822v1	link
2024-10-17	Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance	Mitsuhiko Nakamoto et.al.	2410.13816v1	null
2024-10-17	A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities	Gianluca Apriceno et.al.	2410.13803v1	link
2024-10-17	MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations	Liang Xu et.al.	2410.13790v1	link
2024-10-17	Strong-to-weak spontaneous symmetry breaking meets average symmetry-protected topological order	Yuchen Guo et.al.	2410.13734v1	null
2024-10-17	Representing Model Weights with Language using Tree Experts	Eliahu Horwitz et.al.	2410.13569v1	null
2024-10-16	Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception	Jihao Zhao et.al.	2410.12788v1	null
2024-10-16	The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio	Sicong Leng et.al.	2410.12787v1	null
2024-10-16	Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions	Zhenyu Jiang et.al.	2410.12773v1	null
2024-10-16	Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks	Hunmin Lee et.al.	2410.12772v1	null
2024-10-16	Phase retrieval via media diversity	Yan Cheng et.al.	2410.12767v1	null
2024-10-16	SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation	Jaehong Yoon et.al.	2410.12761v1	null
2024-10-16	Unitary Multi-Margin BERT for Robust Natural Language Processing	Hao-Yuan Chang et.al.	2410.12759v1	null
2024-10-16	PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network	Asish Bera et.al.	2410.12742v1	null
2024-10-16	How much time do we have before catastrophic disclosure occurs?	Matthew Szydagis et.al.	2410.12738v1	null
2024-10-16	Machine Learning-Augmented Ontology-Based Data Access for Renewable Energy Data	Marco Calautti et.al.	2410.12734v1	null
2024-10-15	High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion	Junhwa Hur et.al.	2410.11838v1	null
2024-10-15	Contrastive Touch-to-Touch Pretraining	Samanta Rodriguez et.al.	2410.11834v1	null
2024-10-15	CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos	Nikita Karaev et.al.	2410.11831v1	null
2024-10-15	Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos	Zhouxia Wang et.al.	2410.11828v1	null
2024-10-15	On representations of Arthur type and unitary dual for classical groups	Alexander Hazeltine et.al.	2410.11806v1	null
2024-10-16	Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices	Zhiyuan Ma et.al.	2410.11795v2	null
2024-10-15	OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation	Jinhan Li et.al.	2410.11792v1	null
2024-10-15	Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability	Tsz Ting Chung et.al.	2410.11786v1	null
2024-10-15	On the Training Convergence of Transformers for In-Context Classification	Wei Shen et.al.	2410.11778v1	null
2024-10-15	Temporal resolution enhancement in Structured Illumination Microscopy using cascaded reconstruction	Doron Shterman et.al.	2410.11770v1	null
2024-10-14	Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models	Jingzhi Bao et.al.	2410.10821v1	null
2024-10-14	TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Mu Cai et.al.	2410.10818v1	null
2024-10-14	LVD-2M: A Long-take Video Dataset with Temporally Dense Captions	Tianwei Xiong et.al.	2410.10816v1	link
2024-10-14	Depth Any Video with Scalable Synthetic Data	Honghui Yang et.al.	2410.10815v1	null
2024-10-14	Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies	Yanjie Ze et.al.	2410.10803v1	link
2024-10-14	Boosting Camera Motion Control for Video Diffusion Transformers	Soon Yau Cheong et.al.	2410.10802v1	null
2024-10-14	Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization	Johan Hatleskog et.al.	2410.10784v1	null
2024-10-14	3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications	Eduardo R. Corral-Soto et.al.	2410.10782v1	null
2024-10-14	ControlMM: Controllable Masked Motion Generation	Ekkasit Pinyoanuntapong et.al.	2410.10780v1	null
2024-10-14	Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention	Dejia Xu et.al.	2410.10774v1	null
2024-10-11	Optimal Downsampling for Imbalanced Classification with Generalized Linear Models	Yan Chen et.al.	2410.08994v1	null
2024-10-11	Realizing Linear Synaptic Plasticity in Electric Double Layer-Gated Transistors for Improved Predictive Accuracy and Efficiency in Neuromorphic Computing	Nithil Harris Manimaran et.al.	2410.08978v1	null
2024-10-11	ALVIN: Active Learning Via INterpolation	Michalis Korakakis et.al.	2410.08972v1	null
2024-10-11	Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data	Arthur Mendonça Sasse et.al.	2410.08961v1	null
2024-10-11	Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison	Daniel Salnikov et.al.	2410.08958v1	null
2024-10-11	Rapid Grassmannian Averaging with Chebyshev Polynomials	Brighton Ancelin et.al.	2410.08956v1	null
2024-10-11	Local moduli in the special 2-flags of length 5	Piotr Mormul et.al.	2410.08951v1	null
2024-10-11	On the Adversarial Transferability of Generalized "Skip Connections"	Yisen Wang et.al.	2410.08950v1	null
2024-10-11	Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing	Clayton Leite et.al.	2410.08931v1	null
2024-10-11	Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images	Virmarie Maquiling et.al.	2410.08926v1	null
2024-10-10	LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts	Anh-Quan Cao et.al.	2410.08211v1	null
2024-10-10	Scaling Laws For Diffusion Transformers	Zhengyang Liang et.al.	2410.08184v1	null
2024-10-10	RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image	Xiaoxue Chen et.al.	2410.08181v1	null
2024-10-10	A note on the symplectic classification of almost-toric systems	Xiudi Tang et.al.	2410.08175v1	null
2024-10-10	Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Qingni Wang et.al.	2410.08174v1	null
2024-10-10	Progressive Autoregressive Video Diffusion Models	Desai Xie et.al.	2410.08151v1	link
2024-10-10	Robust AI-Generated Text Detection by Restricted Embeddings	Kristian Kuznetsov et.al.	2410.08113v1	null
2024-10-10	Color-Guided Flying Pixel Correction in Depth Images	Ekamresh Vasudevan et.al.	2410.08084v1	null
2024-10-10	Dynamic Object Catching with Quadruped Robot Front Legs	André Schakkal et.al.	2410.08065v1	null
2024-10-10	A Target-Aware Analysis of Data Augmentation for Hate Speech Detection	Camilla Casula et.al.	2410.08053v1	null
2024-10-09	MM-Ego: Towards Building Egocentric Multimodal LLMs	Hanrong Ye et.al.	2410.07177v1	null
2024-10-09	One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation	Fabian Paischer et.al.	2410.07170v1	null
2024-10-09	Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis	Bohan Zeng et.al.	2410.07155v1	link
2024-10-09	Mental Disorders Detection in the Era of Large Language Models	Gleb Kuzmin et.al.	2410.07129v1	null
2024-10-09	Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication	Erzhen Hu et.al.	2410.07119v1	null
2024-10-09	JPEG Inspired Deep Learning	Ahmed H. Salamah et.al.	2410.07081v1	null
2024-10-09	Retrieval-Augmented Decision Transformer: External Memory for In-context RL	Thomas Schmied et.al.	2410.07071v1	null
2024-10-09	TinyEmo: Scaling down Emotional Reasoning via Metric Projection	Cristian Gutierrez et.al.	2410.07062v1	link
2024-10-09	Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes	Fisseha A. Ferede et.al.	2410.07043v1	link
2024-10-09	Optimizing Estimators of Squared Calibration Errors in Classification	Sebastian G. Gruber et.al.	2410.07014v1	null
2024-10-07	Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia	Mohammad Fahes et.al.	2410.05270v1	link
2024-10-07	Grounding Partially-Defined Events in Multimodal Data	Kate Sanders et.al.	2410.05267v1	null
2024-10-07	DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control	Kaifeng Zhao et.al.	2410.05260v1	null
2024-10-07	SePPO: Semi-Policy Preference Optimization for Diffusion Alignment	Daoan Zhang et.al.	2410.05255v1	link
2024-10-07	Causal Micro-Narratives	Mourad Heddaya et.al.	2410.05252v1	null
2024-10-07	LoTLIP: Improving Language-Image Pre-training for Long Text Understanding	Wei Wu et.al.	2410.05249v1	null
2024-10-07	The Dawn of Video Generation: Preliminary Explorations with SORA-like Models	Ailing Zeng et.al.	2410.05227v1	null
2024-10-07	Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality	Ge Ya et.al.	2410.05203v1	link
2024-10-07	Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge	Senorita Deb et.al.	2410.05189v1	null
2024-10-07	VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks	Ziyan Jiang et.al.	2410.05160v1	null
2024-10-04	Spatial Hyperspheric Models for Compositional Data	Michael R. Schwob et.al.	2410.03648v1	null
2024-10-04	HyperCMR: Enhanced Multi-Contrast CMR Reconstruction with Eagle Loss	Ruru Xu et.al.	2410.03624v1	null
2024-10-04	Crystallography, Group Cohomology, and Lieb-Schultz-Mattis Constraints	Chunxiao Liu et.al.	2410.03607v1	null
2024-10-04	LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos	Noriaki Hirose et.al.	2410.03603v1	null
2024-10-04	Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification	Kelsey Lieberman et.al.	2410.03588v1	null
2024-10-04	A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development	Jesper Knapp et.al.	2410.03580v1	null
2024-10-04	Re-examining Sexism and Misogyny Classification with Annotator Attitudes	Aiqi Jiang et.al.	2410.03543v1	null
2024-10-04	Classification-Denoising Networks	Louis Thiry et.al.	2410.03505v1	null
2024-10-04	MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation	Hongcheng Wang et.al.	2410.03488v1	null
2024-10-04	A Multimodal Framework for Deepfake Detection	Kashish Gandhi et.al.	2410.03487v1	null
2024-10-03	Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats	Mingyang Xie et.al.	2410.02764v1	null
2024-10-03	Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos	Jianrui Zhang et.al.	2410.02763v1	null
2024-10-03	Loong: Generating Minute-level Long Videos with Autoregressive Language Models	Yuqing Wang et.al.	2410.02757v1	null
2024-10-03	An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel	Xinpeng Li et.al.	2410.02750v1	null
2024-10-03	OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?	Liangze Jiang et.al.	2410.02735v1	null
2024-10-03	Liouville's theorem in calibrated geometries	Toni Ikonen et.al.	2410.02722v1	null
2024-10-03	Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud	Mengxi Wu et.al.	2410.02720v1	link
2024-10-03	AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease	Romoke Grace Akindele et.al.	2410.02714v1	null
2024-10-04	Video Instruction Tuning With Synthetic Data	Yuanhan Zhang et.al.	2410.02713v2	null
2024-10-03	Impact of a reclassification on Web of Science articles on bibliometric indicators	Agénor Lahatte et.al.	2410.02701v1	null
2024-10-02	Loki: An Open-Source Tool for Fact Verification	Haonan Li et.al.	2410.01794v1	null
2024-10-03	Application of convolutional neural networks for extensive air shower separation in the SPHERE-3 experiment	E. L. Entina et.al.	2410.01781v2	null
2024-10-03	TopER: Topological Embeddings in Graph Representation Learning	Astrit Tola et.al.	2410.01778v2	null
2024-10-02	Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context	Spencer Frei et.al.	2410.01774v1	null
2024-10-02	SegHeD: Segmentation of Heterogeneous Data for Multiple Sclerosis Lesions with Anatomical Constraints	Berke Doga Basaran et.al.	2410.01766v1	null
2024-10-02	LightSC: The Making of a Usable Security Classification Tool for DevSecOps	Manish Shrestha et.al.	2410.01762v1	null
2024-10-02	Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes	Hossein Sholehrasa et.al.	2410.01755v1	null
2024-10-02	Unitary Representations of the Isometry Groups of Urysohn Spaces	Rémi Barritault et.al.	2410.01725v1	null
2024-10-02	COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation	Mingzhen Sun et.al.	2410.01718v1	null
2024-10-02	Rabi oscillations at three-photon laser excitation of a single rubidium Rydberg atom in an optical dipole trap	I. I. Beterov et.al.	2410.01703v1	null
2024-09-30	Continuously Improving Mobile Manipulation with Autonomous Real-World RL	Russell Mendonca et.al.	2409.20568v1	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566v1	null
2024-09-30	DressRecon: Freeform 4D Human Reconstruction from Monocular Video	Jeff Tan et.al.	2409.20563v1	null
2024-09-30	LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner	Xiaopan Zhang et.al.	2409.20560v1	null
2024-09-30	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos	Md Mohaiminul Islam et.al.	2409.20557v1	null
2024-09-30	Inverse Painting: Reconstructing The Painting Process	Bowei Chen et.al.	2409.20556v1	null
2024-09-30	UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models	Qiaojun Yu et.al.	2409.20551v1	null
2024-09-30	Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries	L. W. IJspeert et.al.	2409.20540v1	null
2024-09-30	Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers	Lirui Wang et.al.	2409.20537v1	link
2024-09-30	Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images	Bahri Batuhan Bilecen et.al.	2409.20530v1	null
2024-09-27	PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation	Shaowei Liu et.al.	2409.18964v1	link
2024-09-27	LML: Language Model Learning a Dataset for Data-Augmented Prediction	Praneeth Vadlapati et.al.	2409.18957v1	link
2024-09-27	Unconditional stability of a recurrent neural circuit implementing divisive normalization	Shivang Rawat et.al.	2409.18946v1	null
2024-09-27	From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding	Heqing Zou et.al.	2409.18938v1	null
2024-09-27	Subspace Preserving Quantum Convolutional Neural Network Architectures	Léo Monbroussou et.al.	2409.18918v1	null
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901v1	link
2024-09-27	Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors	Yunlong Lin et.al.	2409.18899v1	null
2024-09-27	Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models	Zehan Li et.al.	2409.18878v1	null
2024-09-27	Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks	Richard Osuala et.al.	2409.18872v1	null
2024-09-27	Fusion Systems and Simple Groups With Class Two Sylow $p$-subgroups	Martin van Beek et.al.	2409.18870v1	null
2024-09-26	EgoLM: Multi-Modal Language Model of Egocentric Motions	Fangzhou Hong et.al.	2409.18127v1	null
2024-09-26	LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness	Chenming Zhu et.al.	2409.18125v1	null
2024-09-26	RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration	Yuezhan Tao et.al.	2409.18122v1	null
2024-09-26	Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction	Justin Kerr et.al.	2409.18121v1	null
2024-09-26	E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding	Ye Liu et.al.	2409.18111v1	link
2024-09-26	MALPOLON: A Framework for Deep Species Distribution Modeling	Theo Larcher et.al.	2409.18102v1	null
2024-09-26	Incorporating sparse labels into biologging studies using hidden Markov models with weighted likelihoods	Evan Sidrow et.al.	2409.18091v1	null
2024-09-26	Stable Video Portraits	Mirela Ostrek et.al.	2409.18083v1	null
2024-09-26	Graded contractions on the orthogonal Lie algebras of dimensions 7 and 8	Cristina Draper et.al.	2409.18069v1	null
2024-09-26	LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field	Huan Wang et.al.	2409.18057v1	link
2024-09-25	DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion	Yukun Huang et.al.	2409.17145v1	null
2024-09-25	Streaming Neural Images	Marcos V. Conde et.al.	2409.17134v1	null
2024-09-25	Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation	Mukaffi Bin Moin et.al.	2409.17130v1	null
2024-09-25	Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba	Amin Malekmohammadi et.al.	2409.17122v1	link
2024-09-25	Counting Triangles in Triangles	Jim Propp et.al.	2409.17117v1	null
2024-09-25	BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices	Yongqi Xu et.al.	2409.17093v1	link
2024-09-25	Accumulator-Aware Post-Training Quantization	Ian Colbert et.al.	2409.17092v1	null
2024-09-25	Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification	Xinrui Zhou et.al.	2409.17091v1	null
2024-09-25	SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking	Luigi Russo et.al.	2409.17087v1	null
2024-09-25	The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification	Tashi Namgyal et.al.	2409.17069v1	null
2024-09-24	Self-Supervised Any-Point Tracking by Contrastive Random Walks	Ayush Shrivastava et.al.	2409.16288v1	link
2024-09-24	Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking	Xi Wang et.al.	2409.16287v1	null
2024-09-24	Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation	Homanga Bharadhwaj et.al.	2409.16283v1	null
2024-09-24	Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation	Yong Xien Chng et.al.	2409.16278v1	null
2024-09-24	Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results	Marcos V. Conde et.al.	2409.16277v1	null
2024-09-24	CDChat: A Large Multimodal Model for Remote Sensing Change Description	Mubashir Noman et.al.	2409.16261v1	link
2024-09-24	Empirically Exploring the Space of Monostationarity in Dual Phosphorylation	May Cai et.al.	2409.16234v1	null
2024-09-24	VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection	Sunghyun Ahn et.al.	2409.16225v1	link
2024-09-24	Upper-body free-breathing Magnetic Resonance Fingerprinting applied to the quantification of water T1 and fat fraction	Constantin Slioussarenko et.al.	2409.16200v1	null
2024-09-24	Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking	Jun Bai et.al.	2409.16198v1	null
2024-09-20	Gender Representation and Bias in Indian Civil Service Mock Interviews	Somonnoy Banerjee et.al.	2409.12194v3	null
2024-09-18	DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control	Zichen Jeff Cui et.al.	2409.12192v1	null
2024-09-18	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Peng Wang et.al.	2409.12191v1	link
2024-09-18	multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information	Hongjun Zhu et.al.	2409.12167v1	link
2024-09-18	JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation	Sai Tanmay Reddy Chakkera et.al.	2409.12156v1	null
2024-09-18	Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT	Hamza Kalisch et.al.	2409.12155v1	link
2024-09-18	MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion	Kalakonda Sai Shashank et.al.	2409.12140v1	null
2024-09-18	Mirages in the Energy Landscape of Soft Sphere Packings	Praharsh Suryadevara et.al.	2409.12113v1	null
2024-09-18	SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba	Xiangning Zhang et.al.	2409.12108v1	null
2024-09-18	Unveiling the Secrets of New Physics Through Top Quark Tagging	Rameswar Sahu et.al.	2409.12085v1	null
2024-09-17	Systematic analysis of Parity-Violating modes	Hong-Ming Zhu et.al.	2409.11400v1	null
2024-09-17	Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements	Jipeng Yan et.al.	2409.11391v1	null
2024-09-17	Normalization in Proportional Feature Spaces	Alexandre Benatti et.al.	2409.11389v1	null
2024-09-17	Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification	Fatema-E- Jannat et.al.	2409.11375v1	null
2024-09-17	Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks	Edgar Heinert et.al.	2409.11373v1	null
2024-09-17	Compact Implicit Neural Representations for Plane Wave Images	Mathilde Monvoisin et.al.	2409.11370v1	null
2024-09-17	OSV: One Step is Enough for High-Quality Image to Video Generation	Xiaofeng Mao et.al.	2409.11367v1	null
2024-09-17	THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Mengfei Liang et.al.	2409.11353v1	null
2024-09-17	CLIP Adaptation by Intra-modal Overlap Reduction	Alexey Kravets et.al.	2409.11338v1	null
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323v1	null
2024-09-16	Enhancing Video Transmission with Machine Learning based Routing in Software-Defined Networks	Anıl Dursun İpek et.al.	2409.10512v1	null
2024-09-16	Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance	Simone Maurizio La Cava et.al.	2409.10481v1	null
2024-09-16	Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control	Juan Alvarez-Padilla et.al.	2409.10469v1	null
2024-09-16	Assortativity in sympatric speciation and species classification	Joao U. F. Lizarraga et.al.	2409.10466v1	null
2024-09-16	Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons	Farhad Pourkamali-Anaraki et.al.	2409.10463v1	null
2024-09-16	Deep-Wide Learning Assistance for Insect Pest Classification	Toan Nguyen et.al.	2409.10445v1	link
2024-09-16	A point process approach for the classification of noisy calcium imaging data	Arianna Burzacchi et.al.	2409.10409v1	null
2024-09-16	MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning	Hwihun Jeong et.al.	2409.10394v1	link
2024-09-16	Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning	Amin Karimi Monsefi et.al.	2409.10362v1	null
2024-09-16	2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?	Téo Guichoux et.al.	2409.10357v1	null
2024-09-13	An Efficient and Streaming Audio Visual Active Speaker Detection System	Arnav Kundu et.al.	2409.09018v1	null
2024-09-13	Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation	Qingwen Bu et.al.	2409.09016v1	link
2024-09-13	Model-independent variable selection via the rule-based variable priority	Min Lu et.al.	2409.09003v1	null
2024-09-13	Biomimetic Frontend for Differentiable Audio Processing	Ruolan Leslie Famularo et.al.	2409.08997v1	link
2024-09-13	Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems	Yan-Martin Tamm et.al.	2409.08987v1	link
2024-09-13	Fast DCT+: A Family of Fast Transforms Based on Rank-One Updates of the Path Graph	Samuel Fernández-Menduiña et.al.	2409.08970v1	null
2024-09-13	Pushing the boundaries of event subsampling in event-based video classification using CNNs	Hesam Araghi et.al.	2409.08953v1	link
2024-09-13	Pushing Joint Image Denoising and Classification to the Edge	Thomas C Markhorst et.al.	2409.08943v1	null
2024-09-13	LLM-based Weak Supervision Framework for Query Intent Classification in Video Search	Farnoosh Javadi et.al.	2409.08931v1	null
2024-09-13	Classification of electronic structures and state preparation for quantum computation of reaction chemistry	Maximilian Mörchen et.al.	2409.08910v1	null
2024-09-12	Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor	Andrea Conti et.al.	2409.08277v1	null
2024-09-12	Hand-Object Interaction Pretraining from Videos	Himanshu Gaurav Singh et.al.	2409.08273v1	null
2024-09-12	DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer	Runjia Li et.al.	2409.08271v1	null
2024-09-12	OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering	Jiahao Nick Li et.al.	2409.08250v1	null
2024-09-12	A review of compact geodesic orbit manifolds and the g.o. condition for $\SU(5)/\s(\U(2)\times \U(2))$	Andreas Arvanitoyeorgos et.al.	2409.08247v1	null
2024-09-12	Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging	Daniel Capellán-Martín et.al.	2409.08232v1	null
2024-09-12	CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs	Davide Buffelli et.al.	2409.08217v1	null
2024-09-12	LT3SD: Latent Trees for 3D Scene Diffusion	Quan Meng et.al.	2409.08215v1	null
2024-09-12	Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video	Boxiang Rong et.al.	2409.08189v1	null
2024-09-13	Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification	Soufiyan Bahadi et.al.	2409.08188v2	null
2024-09-11	Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models	Haibo Yang et.al.	2409.07452v1	link
2024-09-11	VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos	Yan-Bo Lin et.al.	2409.07450v1	null
2024-09-11	Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning	Rodrigo Salas et.al.	2409.07449v1	null
2024-09-11	StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos	Sijie Zhao et.al.	2409.07447v1	null
2024-09-11	Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability	A. E. M Ridwan et.al.	2409.07426v1	null
2024-09-11	Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy	Somayeh Pakdelmoez et.al.	2409.07422v1	null
2024-09-11	Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging	Yunzhen Wang et.al.	2409.07417v1	null
2024-09-11	NVRC: Neural Video Representation Compression	Ho Man Kwan et.al.	2409.07414v1	null
2024-09-12	Robust Robot Walker: Learning Agile Locomotion over Tiny Traps	Shaoting Zhu et.al.	2409.07409v2	null
2024-09-11	Revisiting Static Feature-Based Android Malware Detection	Md Tanvirul Alam et.al.	2409.07397v1	null
2024-09-10	A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection	Md Taimur Ahad et.al.	2409.06699v1	null
2024-09-10	DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images	Taslim Murad et.al.	2409.06694v1	null
2024-09-10	Benchmarking Sub-Genre Classification For Mainstage Dance Music	Hongzhi Shu et.al.	2409.06690v1	null
2024-09-10	A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network	Md Taimur Ahad et.al.	2409.06689v1	null
2024-09-10	A study on deep feature extraction to detect and classify Acute Lymphoblastic Leukemia (ALL)	Sabit Ahamed Preanto et.al.	2409.06687v1	null
2024-09-10	Constructing an Interpretable Deep Denoiser by Unrolling Graph Laplacian Regularizer	Seyed Alireza Hosseini et.al.	2409.06676v1	null
2024-09-10	Bulk and atmospheric metallicities as direct probes of sequentially varying accretion mechanisms of gas and solids onto planets	Yasuhiro Hasegawa et.al.	2409.06670v1	null
2024-09-10	Data Collection-free Masked Video Modeling	Yuchi Ishikawa et.al.	2409.06665v1	null
2024-09-10	World-Grounded Human Motion Recovery via Gravity-View Coordinates	Zehong Shen et.al.	2409.06662v1	null
2024-09-10	Classifying Functions via growth rates of repeated iterations	Titus Hilberdink et.al.	2409.06661v1	null
2024-09-09	Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments	Haritheja Etukuru et.al.	2409.05865v1	null
2024-09-09	Neural MP: A Generalist Neural Motion Planner	Murtaza Dalal et.al.	2409.05864v1	null
2024-09-09	LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation	Henghui Ding et.al.	2409.05847v1	null
2024-09-10	Finite-size topological phases from semimetals	Adipta Pal et.al.	2409.05842v2	null
2024-09-09	Fast Generation of Custom Floating-Point Spatial Filters on FPGAs	Nelson Campos et.al.	2409.05837v1	null
2024-09-09	Limits on the computational expressivity of non-equilibrium biophysical processes	Carlos Floyd et.al.	2409.05827v1	null
2024-09-09	A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation	Qi Jiang et.al.	2409.05809v1	null
2024-09-09	A CLIP-based siamese approach for meme classification	Javier Huertas-Tato et.al.	2409.05772v1	null
2024-09-09	Consensus-based Distributed Quantum Kernel Learning for Speech Recognition	Kuan-Cheng Chen et.al.	2409.05770v1	null
2024-09-09	A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR	Giovanni Morrone et.al.	2409.05750v1	null
2024-09-06	Synergy and Synchrony in Couple Dances	Vongani Maluleke et.al.	2409.04440v1	null
2024-09-06	VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation	Yecheng Wu et.al.	2409.04429v1	null
2024-09-06	Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques	Davide Clode da Silva et.al.	2409.04424v1	null
2024-09-06	Virtual Reality-Based Preoperative Planning for Optimized Trocar Placement in Thoracic Surgery: A Preliminary Study	Arash Harirpoush et.al.	2409.04414v1	null
2024-09-06	Quantum Kernel Methods under Scrutiny: A Benchmarking Study	Jan Schnabel et.al.	2409.04406v1	null
2024-09-09	Question-Answering Dense Video Events	Hangyu Qin et.al.	2409.04388v2	null
2024-09-06	Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior	Charlesquin Kemajou Mbakam et.al.	2409.04384v1	null
2024-09-06	Enhancing Skin Lesion Diagnosis with Ensemble Learning	Xiaoyi Liu et.al.	2409.04381v1	null
2024-09-06	Tykhyy's Conjecture on finite mapping class group orbits	Samuel Bronstein et.al.	2409.04379v1	null
2024-09-06	The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study	Gregory Szumel et.al.	2409.04368v1	null
2024-09-05	Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding	Yunze Man et.al.	2409.03757v1	link
2024-09-05	Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron	Christian Schmid et.al.	2409.03749v1	null
2024-09-05	Orbital Support and Evolution of CX/OX Structures in Boxy/Peanut Bars	Behzad Tahmasebzadeh et.al.	2409.03746v1	null
2024-09-05	Libra: Architectural Support For Principled, Secure And Efficient Balanced Execution On High-End Processors (Extended Version)	Hans Winderix et.al.	2409.03743v1	null
2024-09-05	Classification and Prediction of Heart Diseases using Machine Learning Algorithms	Akua Sekyiwaa Osei-Nkwantabisa et.al.	2409.03697v1	null
2024-09-05	View-Invariant Policy Learning via Zero-Shot Novel View Synthesis	Stephen Tian et.al.	2409.03685v1	null
2024-09-05	Threat Classification on Deployed Optical Networks Using MIMO Digital Fiber Sensing, Wavelets, and Machine Learning	Khouloud Abdelli et.al.	2409.03667v1	null
2024-09-05	Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG	Manshan Guo et.al.	2409.03646v1	null
2024-09-05	Variance reduction in Texas hold'em and in video poker	Stewart N. Ethier et.al.	2409.03607v1	null
2024-09-05	SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing	Lingyu Xiong et.al.	2409.03605v1	null
2024-09-04	SITAR: Semi-supervised Image Transformer for Action Recognition	Owais Iqbal et.al.	2409.02910v1	null
2024-09-04	GraphTrials: Visual Proofs of Graph Properties	Henry Förster et.al.	2409.02907v1	null
2024-09-04	Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice	Ming-Hao Li et.al.	2409.02898v1	null
2024-09-04	LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture	Xidong Wang et.al.	2409.02889v1	link
2024-09-04	CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently	Jonathan Zalach et.al.	2409.02885v1	null
2024-09-04	Look Into the LITE in Deep Learning for Time Series Classification	Ali Ismail-Fawaz et.al.	2409.02869v1	null
2024-09-04	Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models	Zhibin Liu et.al.	2409.02851v1	null
2024-09-04	iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation	Hayeon Jo et.al.	2409.02838v1	null
2024-09-04	Evolution of radiation profiles in a strongly baffled divertor on MAST Upgrade	Fabio Federici et.al.	2409.02837v1	null
2024-09-04	Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models	Moein Shahiki Tash et.al.	2409.02836v1	null
2024-08-30	Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding	Gueter Josmy Faure et.al.	2408.17443v1	link
2024-08-30	SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists	Raoyuan Zhao et.al.	2408.17437v1	link
2024-08-30	CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion	Yiran Chen et.al.	2408.17424v1	null
2024-09-03	Open-vocabulary Temporal Action Localization using VLMs	Naoki Wake et.al.	2408.17422v2	null
2024-08-30	Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes	Li Zhang et.al.	2408.17421v1	link
2024-08-30	End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework	Chang Cai et.al.	2408.17397v1	null
2024-08-30	Equivariant isomorphism of Quantum Lens Spaces of low dimension	Søren Eilers et.al.	2408.17386v1	null
2024-08-30	LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification	Fadi Alharbi et.al.	2408.17384v1	null
2024-08-30	Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain	Francesca Grasso et.al.	2408.17362v1	link
2024-08-30	Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method	Yuji Lin et.al.	2408.17339v1	null
2024-08-29	SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners	Ziyu Guo et.al.	2408.16768v1	link
2024-08-29	ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model	Fangfu Liu et.al.	2408.16767v1	null
2024-08-29	OmniRe: Omni Urban Scene Reconstruction	Ziyu Chen et.al.	2408.16760v1	null
2024-08-29	Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge	Beidi Dong et.al.	2408.16749v1	null
2024-08-29	Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech	Cong Zhang et.al.	2408.16732v1	null
2024-08-29	VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation	Shiwei Wu et.al.	2408.16730v1	null
2024-08-29	Prediction-Feedback DETR for Temporal Action Detection	Jihwan Kim et.al.	2408.16729v1	null
2024-08-29	A GREAT Architecture for Edge-Based Graph Problems Like TSP	Attila Lischka et.al.	2408.16717v1	null
2024-08-29	One-Shot Learning Meets Depth Diffusion in Multi-Object Videos	Anisha Jain et.al.	2408.16704v1	null
2024-08-29	RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio	Kian Behzad et.al.	2408.16703v1	null
2024-08-29	Spatio-Temporal Context Prompting for Zero-Shot Action Detection	Wei-Jhe Huang et.al.	2408.15996v2	null
2024-08-28	TEDRA: Text-based Editing of Dynamic and Photoreal Actors	Basavaraj Sunagad et.al.	2408.15995v1	null
2024-08-28	Minimizing movements solutions for a monotone model of droplet motion	Carson Collins et.al.	2408.15984v1	null
2024-08-28	VLT/MUSE detection of accretion-ejection associated with the close stellar companion in the HT Lup system	Sebastián Jorquera et.al.	2408.15976v1	null
2024-08-28	1+1d SPT phases with fusion category symmetry: interface modes and non-abelian Thouless pump	Kansei Inamura et.al.	2408.15960v1	null
2024-08-28	Generating Binary Species Range Maps	Filip Dorm et.al.	2408.15956v1	null
2024-08-28	Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games	Nicholas R. Waytowich et.al.	2408.15950v1	null
2024-08-28	Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping	Yikang Liu et.al.	2408.15947v1	null
2024-08-28	A latticed total K-theory	Qingnan An et.al.	2408.15941v1	null
2024-08-28	Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning	Bingchen Yan et.al.	2408.15924v1	null
2024-08-27	GenRec: Unifying Video Generation and Recognition with Diffusion Models	Zejia Weng et.al.	2408.15241v1	null
2024-08-27	Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation	Xiaojuan Wang et.al.	2408.15239v1	null
2024-08-27	DCT-CryptoNets: Scaling Private Inference in the Frequency Domain	Arjun Roy et.al.	2408.15231v1	null
2024-08-27	SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images	Zafer Yildiz et.al.	2408.15224v1	link
2024-08-27	Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment	Xuan Xu et.al.	2408.15218v1	null
2024-08-27	Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance	Weiyi Zhang et.al.	2408.15217v1	null
2024-08-27	Classifying populist language in American presidential and governor speeches using automatic text analysis	Olaf van der Veen et.al.	2408.15213v1	null
2024-08-27	Sec2Sec Co-attention for Video-Based Apparent Affective Prediction	Mingwei Sun et.al.	2408.15209v1	link
2024-08-27	Automatic 8-tissue Segmentation for 6-month Infant Brains	Yilan Dong et.al.	2408.15198v1	null
2024-08-27	Infusing Acoustic Pause Context into Text-Based Dementia Assessment	Franziska Braun et.al.	2408.15188v1	null
2024-08-26	Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos	Qirui Chen et.al.	2408.14469v1	null
2024-08-26	K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences	Zhikai Li et.al.	2408.14468v1	null
2024-08-26	Reconstructing physiological signals from fMRI across the adult lifespan	Shiyu Wang et.al.	2408.14453v1	null
2024-08-26	Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition	Axel Klawonn et.al.	2408.14442v1	null
2024-08-26	Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification	Mahrukh Awan et.al.	2408.14441v1	null
2024-08-26	Radiance Cascades: A Novel High-Resolution Formal Solution for Multidimensional Non-LTE Radiative Transfer	Christopher M. J. Osborne et.al.	2408.14425v1	null
2024-08-26	Learning Tree-Structured Composition of Data Augmentation	Dongyue Li et.al.	2408.14381v1	link
2024-08-26	Probing Causality Manipulation of Large Language Models	Chenyang Zhang et.al.	2408.14380v1	link
2024-08-26	GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy	Peiyan Li et.al.	2408.14368v1	null
2024-08-26	An Embedding is Worth a Thousand Noisy Labels	Francesco Di Salvo et.al.	2408.14358v1	null
2024-08-23	Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder	Marie Huynh et.al.	2408.13255v1	null
2024-08-23	Domain-specific long text classification from sparse relevant information	Célia D'Cruz et.al.	2408.13253v1	null
2024-08-23	CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities	Tao Wu et.al.	2408.13239v1	null
2024-08-23	D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching	Jingyu Liu et.al.	2408.13226v1	null
2024-08-23	ResSR: A Residual Approach to Super-Resolving Multispectral Images	Haley Duba-Sullivan et.al.	2408.13225v1	null
2024-08-23	EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods	Hongcheng Ding et.al.	2408.13214v1	null
2024-08-23	Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews	Dineth Jayakody et.al.	2408.13202v1	null
2024-08-23	EAViT: External Attention Vision Transformer for Audio Classification	Aquib Iqbal et.al.	2408.13201v1	null
2024-08-23	Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention	Xiaoyi Liu et.al.	2408.13180v1	null
2024-08-23	Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations	Fabrizio Maturo et.al.	2408.13179v1	null
2024-08-22	Automating Deformable Gasket Assembly	Simeon Adebola et.al.	2408.12593v1	null
2024-08-22	xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations	Can Qin et.al.	2408.12590v1	null
2024-08-22	Real-Time Video Generation with Pyramid Attention Broadcast	Xuanlei Zhao et.al.	2408.12588v1	link
2024-08-22	Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers	Antonyo Musabini et.al.	2408.12575v1	null
2024-08-22	MuMA-ToM: Multi-modal Multi-Agent Theory of Mind	Haojun Shi et.al.	2408.12574v1	null
2024-08-22	Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers	Sayed Mohammad Vakilzadeh Hatefi et.al.	2408.12568v1	null
2024-08-22	ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation	Lujia Zhong et.al.	2408.12561v1	link
2024-08-22	Exploring the Role of Audio in Multimodal Misinformation Detection	Moyang Liu et.al.	2408.12558v1	null
2024-08-22	Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge	Jun Ma et.al.	2408.12534v1	null
2024-08-22	UMAD: University of Macau Anomaly Detection Benchmark Dataset	Dong Li et.al.	2408.12527v1	link
2024-08-21	Great Memory, Shallow Reasoning: Limits of $k$NN-LMs	Shangyi Geng et.al.	2408.11815v1	link
2024-08-21	EmbodiedSAM: Online Segment Any 3D Thing in Real Time	Xiuwei Xu et.al.	2408.11811v1	null
2024-08-21	Approaching Deep Learning through the Spectral Dynamics of Weights	David Yunis et.al.	2408.11804v1	link
2024-08-21	Practical token pruning for foundation models in few-shot conversational virtual assistant systems	Haode Qi et.al.	2408.11799v1	null
2024-08-21	Critique-out-Loud Reward Models	Zachary Ankner et.al.	2408.11791v1	link
2024-08-21	DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework	Zhifei Xie et.al.	2408.11788v1	null
2024-08-21	NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation	Zhenye Lou et.al.	2408.11787v1	link
2024-08-21	Timeline and Boundary Guided Diffusion Network for Video Shadow Detection	Haipeng Zhou et.al.	2408.11785v1	link
2024-08-21	SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance	Zhiqiang Wu et.al.	2408.11760v1	null
2024-08-21	Improving the Scan-rescan Precision of AI-based CMR Biomarker Estimation	Dewmini Hasara Wickremasinghe et.al.	2408.11754v1	null
2024-08-20	Discriminant Analysis in stationary time series based on robust cepstral coefficients	Jonathan de Souza Matias et.al.	2408.11012v1	null
2024-08-20	Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos	Dennis Fedorishin et.al.	2408.10998v1	null
2024-08-20	Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models	Hojat Asgariandehkordi et.al.	2408.10987v1	null
2024-08-20	ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data	Ezequiel de la Rosa et.al.	2408.10966v1	null
2024-08-20	Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter	Farhanul Haque et.al.	2408.10955v1	null
2024-08-20	Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting	Dona Arabi et.al.	2408.10951v1	link
2024-08-20	Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience	Yoonseo Choi et.al.	2408.10937v1	null
2024-08-20	SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement	Linlin Hu et.al.	2408.10934v1	null
2024-08-20	ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining	Qi Ma et.al.	2408.10906v1	null
2024-08-20	ViLReF: A Chinese Vision-Language Retinal Foundation Model	Shengzhu Yang et.al.	2408.10894v1	link
2024-08-19	Some model theory of quadratic geometries	Charlotte Kestner et.al.	2408.10196v1	null
2024-08-19	Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification	Jing Li et.al.	2408.10193v1	null
2024-08-20	LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Fuzhao Xue et.al.	2408.10188v2	link
2024-08-19	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174v1	link
2024-08-19	Galaxy Zoo: Morphologies based on UKIDSS NIR Imaging for 71,052 Galaxies	Karen L. Masters et.al.	2408.10160v1	null
2024-08-19	Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video	Shuxian Wang et.al.	2408.10153v1	null
2024-08-19	Biharmonic conformal immersions into a 3-dimensional conformally flat space	Ze-Ping Wang et.al.	2408.10144v1	null
2024-08-19	Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images	Wei Zhou et.al.	2408.10134v1	null
2024-08-19	UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track	Hao Fang et.al.	2408.10129v1	null
2024-08-19	Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track	Feiyu Pan et.al.	2408.10125v1	null
2024-08-16	Quantum Annealing for Enhanced Feature Selection in Single-Cell RNA Sequencing Data Analysis	Selim Romero et.al.	2408.08867v1	null
2024-08-16	DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models	Eman Ali et.al.	2408.08855v1	null
2024-08-16	ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis	Yubao Zhao et.al.	2408.08849v1	null
2024-08-16	HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis	Zhi-Bo Liu et.al.	2408.08847v1	link
2024-08-16	LEVIS: Large Exact Verifiable Input Spaces for Neural Networks	Mohamad Fares El Hajj Chehade et.al.	2408.08824v1	null
2024-08-16	Optimal Symmetries in Binary Classification	Vishal S. Ngairangbam et.al.	2408.08823v1	null
2024-08-16	Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification	Abdullah Al Imran et.al.	2408.08803v1	null
2024-08-16	Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers	Zihang Song et.al.	2408.08794v1	null
2024-08-16	Assessing Generalization Capabilities of Malaria Diagnostic Models from Thin Blood Smears	Louise Guillon et.al.	2408.08792v1	null
2024-08-16	A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks	Boa Jang et.al.	2408.08790v1	link
2024-08-15	HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning	Hongyu Li et.al.	2408.08312v1	null
2024-08-15	Gauge-invariant optical selection rules for excitons	Tharindu Fernando et.al.	2408.08311v1	null
2024-08-15	Accelerated Image-Aware Generative Diffusion Modeling	Tanmay Asthana et.al.	2408.08306v1	null
2024-08-15	SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training	Gengwei Zhang et.al.	2408.08295v1	link
2024-08-15	Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks	Ni Ou et.al.	2408.08276v1	null
2024-08-15	Snuffy: Efficient Whole Slide Image Classifier	Hossein Jafarinia et.al.	2408.08258v1	link
2024-08-15	Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective	Zixuan Pan et.al.	2408.08228v1	link
2024-08-15	RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science	David Farr et.al.	2408.08217v1	null
2024-08-15	Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices	Tess Watt et.al.	2408.08215v1	null
2024-08-15	Learned Multimodal Compression for Autonomous Driving	Hadi Hadizadeh et.al.	2408.08211v1	null
2024-08-14	End-to-end Semantic-centric Video-based Multimodal Affective Computing	Ronghao Lin et.al.	2408.07694v1	null
2024-08-15	A Spitting Image: Modular Superpixel Tokenization in Vision Transformers	Marius Aasan et.al.	2408.07680v2	link
2024-08-14	G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing	Jingyi Yang et.al.	2408.07675v1	null
2024-08-14	Graph Triple Attention Network: A Decoupled Perspective	Xiaotang Wang et.al.	2408.07654v1	link
2024-08-14	Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	Yuqing Wen et.al.	2408.07605v1	null
2024-08-14	Disentangle and denoise: Tackling context misalignment for video moment retrieval	Kaijing Ma et.al.	2408.07600v1	null
2024-08-14	Theoretical and Practical Progress in Hyperspectral Pixel Unmixing with Large Spectral Libraries from a Sparse Perspective	Jade Preston et.al.	2408.07580v1	null
2024-08-14	TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases	Thibault Simonetto et.al.	2408.07579v1	link
2024-08-14	DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model	Erez Yosef et.al.	2408.07541v1	null
2024-08-14	Improved 3D Whole Heart Geometry from Sparse CMR Slices	Yiyang Xu et.al.	2408.07532v1	link
2024-08-13	On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders	Hamza Coban et.al.	2408.07064v1	null
2024-08-13	Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality	Yu-Chih Chen et.al.	2408.07041v1	null
2024-08-13	PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology	Xiaomin Wu et.al.	2408.07037v1	null
2024-08-13	Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines	Samuel Fernández Menduiña et.al.	2408.07028v1	null
2024-08-13	Event-Stream Super Resolution using Sigma-Delta Neural Network	Waseem Shariff et.al.	2408.06968v1	null
2024-08-13	DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs	Dongyuan Li et.al.	2408.06966v1	null
2024-08-13	OpenResearcher: Unleashing AI for Accelerated Scientific Research	Yuxiang Zheng et.al.	2408.06941v1	link
2024-08-13	Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification	Bauke Arends et.al.	2408.06930v1	null
2024-08-13	Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries	Qi Song et.al.	2408.06901v1	null
2024-08-13	Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media	Pranav Venkatesh et.al.	2408.06900v1	null
2024-08-12	Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks	Lucas Félix et.al.	2408.06341v1	null
2024-08-12	Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques	Navid Ghassemi et.al.	2408.06336v1	null
2024-08-12	LOLgorithm: Integrating Semantic,Syntactic and Contextual Elements for Humor Classification	Tanisha Khurana et.al.	2408.06335v1	null
2024-08-12	From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model	Athulya Sundaresan Geetha et.al.	2408.06305v1	null
2024-08-12	Sparsity Based Multi-Source Robust 3D Localization Using a Moving Receiver	Amir Mansourian et.al.	2408.06274v1	null
2024-08-12	Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance	Manuel Milling et.al.	2408.06264v1	null
2024-08-12	Deep Learning System Boundary Testing through Latent Space Style Mixing	Amr Abdellatif et.al.	2408.06258v1	null
2024-08-12	Rethinking Video with a Universal Event-Based Representation	Andrew Freeman et.al.	2408.06248v1	null
2024-08-12	A Comprehensive Case Study on the Performance of Machine Learning Methods on the Classification of Solar Panel Electroluminescence Images	Xinyi Song et.al.	2408.06229v1	link
2024-08-12	ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents	Carolin Schindler et.al.	2408.06222v1	null
2024-08-09	VITA: Towards Open-Source Interactive Omni Multimodal LLM	Chaoyou Fu et.al.	2408.05211v1	null
2024-08-09	Kalman-Inspired Feature Propagation for Video Face Super-Resolution	Ruicheng Feng et.al.	2408.05205v1	null
2024-08-09	HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling	Piotr Keller et.al.	2408.05195v1	link
2024-08-09	Cross-Domain Learning for Video Anomaly Detection with Limited Supervision	Yashika Jain et.al.	2408.05191v1	null
2024-08-09	Holomorphic vector fields with real integral manifolds	Martin Kolář et.al.	2408.05186v1	null
2024-08-09	MADE-WIC: Multiple Annotated Datasets for Exploring Weaknesses In Code	Moritz Mock et.al.	2408.05163v1	null
2024-08-09	Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification	Xiaoyang Hao et.al.	2408.05151v1	null
2024-08-09	Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video	Chunggi Lee et.al.	2408.05123v1	null
2024-08-09	Cautious Calibration in Binary Classification	Mari-Liis Allikivi et.al.	2408.05120v1	null
2024-08-09	Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images	Shouyue Liu et.al.	2408.05117v1	null
2024-08-08	Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics	Ruining Li et.al.	2408.04631v1	null
2024-08-08	LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP	Danlu Chen et.al.	2408.04628v1	null
2024-08-08	Transformer Explainer: Interactive Learning of Text-Generative Models	Aeree Cho et.al.	2408.04619v1	null
2024-08-08	Quantifying the Impact of Population Shift Across Age and Sex for Abdominal Organ Segmentation	Kate Čevora et.al.	2408.04610v1	null
2024-08-08	Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes	Bhushan Atote et.al.	2408.04606v1	null
2024-08-08	SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation	Jieming Yu et.al.	2408.04593v1	null
2024-08-08	Learn To Learn More Precisely	Runxi Cheng et.al.	2408.04590v1	null
2024-08-08	SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals	Haoran Zheng et.al.	2408.04575v1	null
2024-08-08	Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches	Yongzhi Xu et.al.	2408.04567v1	null
2024-08-08	MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification	Md Rafiul Biswas et.al.	2408.04540v1	null
2024-08-07	How Well Can Vision Language Models See Image Details?	Chenhui Gou et.al.	2408.03940v1	null
2024-08-07	Fast Sprite Decomposition from Animated Graphics	Tomoyuki Suzuki et.al.	2408.03923v1	null
2024-08-07	FMiFood: Multi-modal Contrastive Learning for Food Image Classification	Xinyue Pan et.al.	2408.03922v1	null
2024-08-07	Holomorphic foliations tangent to Rolle-pfaffian hypersurfaces	Arturo Fernández-Pérez et.al.	2408.03914v1	null
2024-08-07	AdapMTL: Adaptive Pruning Framework for Multitask Learning Model	Mingcan Xiang et.al.	2408.03913v1	null
2024-08-07	Achieving Human Level Competitive Robot Table Tennis	David B. D'Ambrosio et.al.	2408.03906v1	null
2024-08-07	Lightweight Video Denoising Using a Classic Bayesian Backbone	Clément Bled et.al.	2408.03904v1	null
2024-08-07	Retrieval Augmentation via User Interest Clustering	Hanjia Lyu et.al.	2408.03886v1	null
2024-08-07	Global-Local Progressive Integration Network for Blind Image Quality Assessment	Xiaoqi Wang et.al.	2408.03885v1	null
2024-08-07	Knowledge Probing for Graph Representation Learning	Mingyu Zhao et.al.	2408.03877v1	null
2024-08-06	LLaVA-OneVision: Easy Visual Task Transfer	Bo Li et.al.	2408.03326v1	null
2024-08-06	ClassiFIM: An Unsupervised Method To Detect Phase Transitions	Victor Kasatkin et.al.	2408.03323v1	null
2024-08-06	Segment Anything in Medical Images and Videos: Benchmark and Deployment	Jun Ma et.al.	2408.03322v1	null
2024-08-06	MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation	Xiaofeng Mao et.al.	2408.03312v1	null
2024-08-06	Left of Fab: Securing Design and Collaboration in the Semiconductor Value Chain	John C. Hoag et.al.	2408.03295v1	null
2024-08-06	Biomedical SAM 2: Segment Anything in Biomedical Images and Videos	Zhiling Yan et.al.	2408.03286v1	null
2024-08-06	ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer	Jiazhi Guan et.al.	2408.03284v1	null
2024-08-06	Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments	Angie Boggust et.al.	2408.03274v1	null
2024-08-07	BVI-AOM: A New Training Dataset for Deep Video Compression Optimization	Jakub Nawała et.al.	2408.03265v2	null
2024-08-06	Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom	W. S. Leite et.al.	2408.03236v1	null
2024-08-05	Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics	Shishira R Maiya et.al.	2408.02672v1	null
2024-08-05	Interactive 3D Medical Image Segmentation with SAM 2	Chuyun Shen et.al.	2408.02635v1	null
2024-08-05	VidGen-1M: A Large-Scale Dataset for Text-to-video Generation	Zhiyu Tan et.al.	2408.02629v1	null
2024-08-05	DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions	Siying Hu et.al.	2408.02574v1	null
2024-08-05	Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification	Paweł Zyblewski et.al.	2408.02568v1	null
2024-08-05	HQOD: Harmonious Quantization for Object Detection	Long Huang et.al.	2408.02561v1	null
2024-08-05	The effect of dynamical states on galaxy clusters populations. I. Classification of dynamical states	S. Véliz Astudillo et.al.	2408.02519v1	null
2024-08-05	Automatic rating of incomplete hippocampal inversions evaluated across multiple cohorts	Lisa Hemforth et.al.	2408.02496v1	null
2024-08-05	HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions	Chiranjeev Chiranjeev et.al.	2408.02494v1	null
2024-08-05	Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection	Ting Lei et.al.	2408.02484v1	null
2024-08-02	Conditional LoRA Parameter Generation	Xiaolong Jin et.al.	2408.01415v1	null
2024-08-02	Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence	Yen-Che Hsiao et.al.	2408.01408v1	null
2024-08-02	NOLO: Navigate Only Look Once	Bohan Zhou et.al.	2408.01384v1	null
2024-08-02	Explaining a probabilistic prediction on the simplex with Shapley compositions	Paul-Gauthier Noé et.al.	2408.01382v1	null
2024-08-02	Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification	Muhammad Ahmad et.al.	2408.01372v1	null
2024-08-02	Classification of marked elliptic root systems with non-reduced quotient	A. Fialowski et.al.	2408.01358v1	null
2024-08-02	Harmonized connectome resampling for variance in voxel sizes	Elyssa M. McMaster et.al.	2408.01351v1	null
2024-08-02	Human foraging strategies flexibly adapt to resource distribution and time constraints	Valeria Simonelli et.al.	2408.01350v1	null
2024-08-02	PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval	Yue Duan et.al.	2408.01349v1	null
2024-08-02	Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks	Anders Giovanni Møller et.al.	2408.01346v1	null
2024-08-01	Text-Guided Video Masked Autoencoder	David Fan et.al.	2408.00759v1	null
2024-08-01	Segment anything model 2: an application to 2D and 3D medical images	Haoyu Dong et.al.	2408.00756v1	null
2024-08-01	Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model	Benlin Liu et.al.	2408.00754v1	null
2024-08-01	CERT-ED: Certifiably Robust Text Classification for Edit Distance	Zhuoqun Huang et.al.	2408.00728v1	null
2024-08-01	SAM 2: Segment Anything in Images and Videos	Nikhila Ravi et.al.	2408.00714v1	null
2024-08-01	Investigating Brain Connectivity and Regional Statistics from EEG for early stage Parkinson's Classification	Amarpal Sahota et.al.	2408.00711v1	null
2024-08-01	Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM	Xiaofeng Liu et.al.	2408.00706v1	null
2024-08-01	Granular-Balls based Fuzzy Twin Support Vector Machine for Classification	Lixi Zhao et.al.	2408.00699v1	null
2024-08-01	ExpertAF: Expert Actionable Feedback from Video	Kumar Ashutosh et.al.	2408.00672v1	null
2024-08-01	AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models	Daqin Luo et.al.	2408.00665v1	null
2024-07-31	The Llama 3 Herd of Models	Abhimanyu Dubey et.al.	2407.21783v1	null
2024-07-31	RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining	Hongtao Wu et.al.	2407.21773v1	null
2024-07-31	ReplanVLM: Replanning Robotic Tasks with Visual Language Models	Aoran Mei et.al.	2407.21762v1	null
2024-07-31	Learning Video Context as Interleaved Multimodal Sequences	Kevin Qinghong Lin et.al.	2407.21757v1	null
2024-08-01	Topological Woodward-Hoffmann classification for cycloadditions in polycyclic aromatic azomethine ylides	Juan Li et.al.	2407.21756v2	null
2024-07-31	A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation	Mothilal Asokan et.al.	2407.21739v1	null
2024-07-31	Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos	Joseph Geo Benjamin et.al.	2407.21738v1	null
2024-07-31	Artificial Intelligence Approaches for Energy Efficiency: A Review	Alberto Pasqualetto et.al.	2407.21726v1	null
2024-07-31	Open-Vocabulary Audio-Visual Semantic Segmentation	Ruohao Guo et.al.	2407.21721v1	null
2024-07-31	Tora: Trajectory-oriented Diffusion Transformer for Video Generation	Zhenghao Zhang et.al.	2407.21705v1	null
2024-07-30	Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation	Marcelo Matheus Gauy et.al.	2407.20989v1	null
2024-07-30	Transfer Learning for Multi-material Classification of Transition Metal Dichalcogenides with Atomic Force Microscopy	Isaiah A. Moses et.al.	2407.20975v1	null
2024-07-30	MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions	Xiaowei Chi et.al.	2407.20962v1	link
2024-07-30	EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images	Lixing Tan et.al.	2407.20937v1	null
2024-07-30	Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering	Yanpeng Zhao et.al.	2407.20908v1	link
2024-07-30	Simultaneous Multi-Slice Diffusion Imaging using Navigator-free Multishot Spiral Acquisition	Yuancheng Jiang et.al.	2407.20904v1	null
2024-07-30	Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach	Adam Wojciechowski et.al.	2407.20899v1	null
2024-07-30	MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network	Yinlong Xu et.al.	2407.20893v1	null
2024-07-30	Shift operators and their classification	Maria Carvalho et.al.	2407.20890v1	null
2024-07-30	Effective Black Box Testing of Sentiment Analysis Classification Networks	Parsa Karbasizadeh et.al.	2407.20884v1	null
2024-07-29	SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction	Çağhan Köksal et.al.	2407.20214v1	null
2024-07-30	SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking	Jian Wang et.al.	2407.20198v2	null
2024-07-29	Radiance Fields for Robotic Teleoperation	Maximum Wilder-Smith et.al.	2407.20194v1	null
2024-07-29	Theia: Distilling Diverse Vision Foundation Models for Robot Learning	Jinghuan Shang et.al.	2407.20179v1	link
2024-07-29	LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework	Zhenqi He et.al.	2407.20172v1	link
2024-07-29	Diffusion Feedback Helps CLIP See Better	Wenxuan Wang et.al.	2407.20171v1	null
2024-07-29	Language-Conditioned Offline RL for Multi-Robot Navigation	Steven Morad et.al.	2407.20164v1	null
2024-07-29	Quantum Machine Learning Architecture Search via Deep Reinforcement Learning	Xin Dai et.al.	2407.20147v1	null
2024-07-30	AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics	Xiangxiang Dai et.al.	2407.20124v2	link
2024-07-29	Integrable and superintegrable quantum mechanical systems with position dependent masses invariant with respect to one parametric Lie groups. 2. Systems with dilatation and shift symmetries	A. G. Nikitin et.al.	2407.20112v1	null
2024-07-26	HRP: Human Affordances for Robotic Pre-Training	Mohan Kumar Srirama et.al.	2407.18911v1	null
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908v1	null
2024-07-26	A Scalable Quantum Non-local Neural Network for Image Classification	Sparsh Gupta et.al.	2407.18906v1	link
2024-07-26	Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment	Yuze Zheng et.al.	2407.18854v1	null
2024-07-26	The Role of Temporal Hierarchy in Spiking Neural Networks	Filippo Moro et.al.	2407.18838v1	null
2024-07-26	Learning the Chaotic and Regular Nature of Trajectories in Hamiltonian Systems with Lagrangian descriptors	Javier Jiménez López et.al.	2407.18831v1	null
2024-07-26	Binary orbit and disks properties of the RW Aur system using ALMA observations	N. T. Kurtovic et.al.	2407.18828v1	null
2024-07-26	Three-dimensional ultrasound-based online system for automated ovarian follicle measurement	Pedro Royo et.al.	2407.18818v1	null
2024-07-26	Automatic Detection of Moral Values in Music Lyrics	Vjosa Preniqi et.al.	2407.18787v1	null
2024-07-26	Deep learning interpretable analysis for carbon star identification in Gaia DR3	Shuo Ye et.al.	2407.18754v1	null
2024-07-25	Review of Degenerate Higher Order Scalar Tensor Theories in Cosmology	Andrei Lazanu et.al.	2407.18234v1	null
2024-07-25	One-point Statistics in various cosmic environments in the presence of massive neutrinos	Mohadese Khoshtinat et.al.	2407.18233v1	null
2024-07-26	Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation	Razieh Azizi et.al.	2407.18195v2	null
2024-07-25	PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations	Cheng Qian et.al.	2407.18178v1	null
2024-07-26	On-chip near-infrared spectroscopic sensing with over 520nm bandwidth	Chunhui Yao et.al.	2407.18172v2	null
2024-07-25	IRIS: Wireless Ring for Vision-based Smart Home Interaction	Maruchi Kim et.al.	2407.18141v1	null
2024-07-25	XS-VID: An Extremely Small Video Object Detection Dataset	Jiahao Guo et.al.	2407.18137v1	null
2024-07-25	Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking	Daniele Rege Cambrin et.al.	2407.18128v1	null
2024-07-25	Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images	Roberto Di Via et.al.	2407.18125v1	null
2024-07-25	Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping	Jack Breen et.al.	2407.18105v1	link
2024-07-24	SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency	Yiming Xie et.al.	2407.17470v1	null
2024-07-24	SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning	Jianpeng Yao et.al.	2407.17460v1	null
2024-07-24	EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification	Joana Reuss et.al.	2407.17458v1	null
2024-07-24	HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation	Zhenzhi Wang et.al.	2407.17438v1	link
2024-07-24	Systematic study of High $E_J/E_C$ transmon qudits up to $d = 12$	Z. Wang et.al.	2407.17407v1	null
2024-07-24	Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising	Sébastien Herbreteau et.al.	2407.17399v1	null
2024-07-24	Sampling-Based Hierarchical Trajectory Planning for Formation Flight	Qingzhao Liu et.al.	2407.17392v1	null
2024-07-24	2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks	Salil B Patel et.al.	2407.17380v1	null
2024-07-24	Entropy Reweighted Conformal Classification	Rui Luo et.al.	2407.17377v1	null
2024-07-24	MuST: Multi-Scale Transformers for Surgical Phase Recognition	Alejandra Pérez et.al.	2407.17361v1	link
2024-07-23	Explanation Regularisation through the Lens of Attributions	Pedro Ferreira et.al.	2407.16693v1	null
2024-07-23	On the local cohomology of secant varieties	Sebastian Olano et.al.	2407.16688v1	null
2024-07-23	AutoRG-Brain: Grounded Report Generation for Brain MRI	Jiayu Lei et.al.	2407.16684v1	null
2024-07-24	Goedel logics: Prenex fragments	Matthias Baaz et.al.	2407.16683v2	null
2024-07-24	A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data	Adrian Remonda et.al.	2407.16680v2	link
2024-07-23	From Imitation to Refinement -- Residual RL for Precise Visual Assembly	Lars Ankile et.al.	2407.16677v1	null
2024-07-23	FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process	Yuyan Bu et.al.	2407.16670v1	null
2024-07-23	EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval	Thomas Hummel et.al.	2407.16658v1	link
2024-07-23	Fluorescence Diffraction Tomography using Explicit Neural Fields	Renzhi He et.al.	2407.16657v1	null
2024-07-23	MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence	Canyu Zhao et.al.	2407.16655v1	null
2024-07-22	AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description	Junyu Xie et.al.	2407.15850v1	link
2024-07-22	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models	Mingze Xu et.al.	2407.15841v1	null
2024-07-23	QueST: Self-Supervised Skill Abstractions for Learning Continuous Control	Atharva Mete et.al.	2407.15840v2	null
2024-07-22	Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator	Florian Robert et.al.	2407.15817v1	null
2024-07-22	Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning	Zhecheng Yuan et.al.	2407.15815v1	null
2024-07-22	The Evaporating Massive Embedded Stellar Cluster IRS 13 Close to Sgr A*. II. Kinematic structure	Florian Peißker et.al.	2407.15800v1	null
2024-07-22	Adaptive Extensions of Unbiased Risk Estimators for Unsupervised Magnetic Resonance Image Denoising	Reeshad Khan et.al.	2407.15799v1	null
2024-07-23	Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video	Guiqiu Liao et.al.	2407.15794v2	null
2024-07-22	LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding	Haoning Wu et.al.	2407.15754v1	link
2024-07-22	SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection	Dimitrios Kollias et.al.	2407.15728v1	null
2024-07-19	DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks	Sarah Jabbour et.al.	2407.14509v1	null
2024-07-19	T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Kaiyue Sun et.al.	2407.14505v1	null
2024-07-19	Nonlinear Schrödinger Network	Yiming Zhou et.al.	2407.14504v1	null
2024-07-19	Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery	Sukrut Rao et.al.	2407.14499v1	link
2024-07-19	Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation	Dongyang Wu et.al.	2407.14498v1	null
2024-07-19	Evaluating the Reliability of Self-Explanations in Large Language Models	Korbinian Randl et.al.	2407.14487v1	link
2024-07-19	Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model	Seonghui Min et.al.	2407.14434v1	null
2024-07-19	Dataset Distillation in Medical Imaging: A Feasibility Study	Muyang Li et.al.	2407.14429v1	null
2024-07-19	Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models	Hyun-Jic Oh et.al.	2407.14426v1	null
2024-07-19	Improving classification of road surface conditions via road area extraction and contrastive learning	Linh Trinh et.al.	2407.14418v1	null
2024-07-18	GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model	Abdelrahman Shaker et.al.	2407.13772v1	null
2024-07-18	Addressing Imbalance for Class Incremental Learning in Medical Image Classification	Xuze Hao et.al.	2407.13768v1	null
2024-07-18	Shape of Motion: 4D Reconstruction from a Single Video	Qianqian Wang et.al.	2407.13764v1	null
2024-07-18	Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion	Boyang Deng et.al.	2407.13759v1	null
2024-07-18	Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units	Aditya Parikh et.al.	2407.13753v1	null
2024-07-18	Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management	Yoontae Hwang et.al.	2407.13751v1	null
2024-07-18	Pose-guided multi-task video transformer for driver action recognition	Ricardo Pizarro et.al.	2407.13750v1	null
2024-07-18	Multi-Label Learning with Stronger Consistency Guarantees	Anqi Mao et.al.	2407.13746v1	null
2024-07-18	Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer	Anqi Mao et.al.	2407.13732v1	null
2024-07-18	Enhanced $H$-Consistency Bounds	Anqi Mao et.al.	2407.13722v1	null
2024-07-17	VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control	Sherwin Bahmani et.al.	2407.12781v1	null
2024-07-17	Hallucination Index: An Image Quality Metric for Generative Reconstruction Models	Matthew Tivnan et.al.	2407.12780v1	null
2024-07-17	LookupViT: Compressing visual information to a limited number of tokens	Rajat Koner et.al.	2407.12753v1	null
2024-07-17	4Dynamic: Text-to-4D Generation with Hybrid Priors	Yu-Jie Yuan et.al.	2407.12684v1	null
2024-07-17	Goldfish: Vision-Language Understanding of Arbitrarily Long Videos	Kirolos Ataallah et.al.	2407.12679v1	null
2024-07-17	Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs	Yiqing Shen et.al.	2407.12678v1	null
2024-07-17	CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems	Jiankun Zhao et.al.	2407.12676v1	link
2024-07-17	Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs	Lorenzo Lamberti et.al.	2407.12675v1	null
2024-07-17	Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data	Richard Osuala et.al.	2407.12669v1	null
2024-07-17	Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge	Andrea Albanese et.al.	2407.12663v1	null
2024-07-16	Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling	Jaehyeok Kim et.al.	2407.11962v1	null
2024-07-16	A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets	Ahmad Abdellatif et.al.	2407.11955v1	null
2024-07-16	Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation	Olga Zatsarynna et.al.	2407.11954v1	null
2024-07-16	Temporally Consistent Stereo Matching	Jiaxi Zeng et.al.	2407.11950v1	link
2024-07-17	Hierarchical Separable Video Transformer for Snapshot Compressive Imaging	Ping Wang et.al.	2407.11946v2	link
2024-07-16	Tackling Oversmoothing in GNN via Graph Sparsification: A Truss-based Approach	Tanvir Hossain et.al.	2407.11928v1	null
2024-07-16	The Strength of Bisymmetric Modes in SDSS-IV/MaNGA Barred Galaxy Kinematics	Brian DiGiorgio Zanger et.al.	2407.11908v1	null
2024-07-16	GraphFM: A Scalable Framework for Multi-Graph Pretraining	Divyansha Lachi et.al.	2407.11907v1	null
2024-07-16	SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge	Hao Ding et.al.	2407.11906v1	null
2024-07-16	Automated production of batched unclonable micro-patterns anti-counterfeiting labels with strong robustness and rapid recognition speed	Yuzheng He et.al.	2407.11886v1	null
2024-07-15	No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations	Walter Simoncini et.al.	2407.10964v1	link
2024-07-15	InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models	Nirat Saini et.al.	2407.10958v1	null
2024-07-15	MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models	Chengguang Gan et.al.	2407.10953v1	null
2024-07-15	IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation	Yuanhao Zhai et.al.	2407.10937v1	link
2024-07-15	Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together	Dilara Soylu et.al.	2407.10930v1	null
2024-07-15	In-Loop Filtering via Trained Look-Up Tables	Zhuoyuan Li et.al.	2407.10926v1	null
2024-07-15	A Dual-Attention Aware Deep Convolutional Neural Network for Early Alzheimer's Detection	Pandiyaraju V et.al.	2407.10921v1	null
2024-07-16	DataDream: Few-shot Guided Dataset Generation	Jae Myung Kim et.al.	2407.10910v2	link
2024-07-15	Interpreting Hand gestures using Object Detection and Digits Classification	Sangeetha K et.al.	2407.10902v1	null
2024-07-15	Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs	Leonardo Crespi et.al.	2407.10888v1	null
2024-07-12	Non-Hermitian Origin of Wannier Localizability and Detachable Topological Boundary States	Daichi Nakamura et.al.	2407.09458v1	null
2024-07-12	Let Me DeCode You: Decoder Conditioning with Tabular Data	Tomasz Szczepański et.al.	2407.09437v1	link
2024-07-12	Rethinking temporal self-similarity for repetitive action counting	Yanan Luo et.al.	2407.09431v1	null
2024-07-12	TelecomGPT: A Framework to Build Telecom-Specific Large Language Models	Hang Zou et.al.	2407.09424v1	null
2024-07-12	A grid of self-consistent MSG (MARCS-StaticWeather-GGchem) cool stellar, sub-stellar, and exoplanetary model atmospheres	Uffe G. Jørgensen et.al.	2407.09397v1	null
2024-07-12	Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution	Fajwel Fogel et.al.	2407.09392v1	link
2024-07-12	Radiance Fields from Photons	Sacha Jungerman et.al.	2407.09386v1	null
2024-07-12	Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation	Zhilin Zhu et.al.	2407.09367v1	link
2024-07-12	Novel clustered federated learning based on local loss	Endong Gu et.al.	2407.09360v1	link
2024-07-12	Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems	Ziyuan Luo et.al.	2407.09352v1	null
2024-07-11	Video Diffusion Alignment via Reward Gradients	Mihir Prabhudesai et.al.	2407.08737v1	link
2024-07-11	Real-Time Anomaly Detection and Reactive Planning with Large Language Models	Rohan Sinha et.al.	2407.08735v1	null
2024-07-11	WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics	Abdollah Zakeri et.al.	2407.08717v1	null
2024-07-11	Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices	Dina Hussein et.al.	2407.08715v1	null
2024-07-11	Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware	James Seekings et.al.	2407.08704v1	null
2024-07-11	Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models	Zhening Xing et.al.	2407.08701v1	null
2024-07-11	ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions	Jiu Feng et.al.	2407.08691v1	link
2024-07-11	Generalizable Implicit Motion Modeling for Video Frame Interpolation	Zujin Guo et.al.	2407.08680v1	null
2024-07-11	Still-Moving: Customized Video Generation without Customized Video Data	Hila Chefer et.al.	2407.08674v1	null
2024-07-11	NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning	Yi Zhang et.al.	2407.08672v1	null
2024-07-10	LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models	Feng Li et.al.	2407.07895v1	link
2024-07-10	Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation	Tao Chen et.al.	2407.07884v1	null
2024-07-10	Controlling Space and Time with Diffusion Models	Daniel Watson et.al.	2407.07860v1	null
2024-07-11	Functional Assessment of Cerebral Capillaries using Single Capillary Reporters in Ultrasound Localization Microscopy	Stephen A Lee et.al.	2407.07857v2	null
2024-07-10	Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification	Mei Qiu et.al.	2407.07842v1	null
2024-07-10	Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Shengjia Chen et.al.	2407.07841v1	link
2024-07-10	Probe and Prejudice: Classification of compact objects and model comparison using EOS knowledge	Hauke Koehn et.al.	2407.07837v1	null
2024-07-10	RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement	Honglie Chen et.al.	2407.07825v1	null
2024-07-10	New Gravitational Wave Discoveries Enabled by Machine Learning	Alexandra E. Koloniari et.al.	2407.07820v1	null
2024-07-10	The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others	Daniel Sikar et.al.	2407.07818v1	null
2024-07-09	V-VIPE: Variational View Invariant Pose Embedding	Mara Levy et.al.	2407.07092v1	null
2024-07-09	Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic	Ruochen Jin et.al.	2407.07089v1	link
2024-07-09	MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images	Ziyang Xu et.al.	2407.07078v1	link
2024-07-09	MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder	Md Rakibul Hasan et.al.	2407.07076v1	null
2024-07-10	CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement	Wei Wang et.al.	2407.07056v2	null
2024-07-09	Latent Space Imaging	Matheus Souza et.al.	2407.07052v1	null
2024-07-09	Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs	Christian Riefolo et.al.	2407.07045v1	null
2024-07-09	Free Fermionic Constructions of Heterotic Strings	Ioannis Florakis et.al.	2407.07034v1	null
2024-07-09	Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition	Daiqing Wu et.al.	2407.07026v1	null
2024-07-09	Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization	Jeongseok Hyun et.al.	2407.07024v1	link
2024-07-08	Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision	Orr Zohar et.al.	2407.06189v1	link
2024-07-08	Classification of Cellular Automata based on the Hamming distance	Gaspar Alfaro et.al.	2407.06175v1	null
2024-07-08	The Tug-of-War Between Deepfake Generation and Detection	Hannah Lee et.al.	2407.06174v1	null
2024-07-08	PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes	Mohammad Reza Karimi Dastjerdi et.al.	2407.06150v1	null
2024-07-08	Physics-informed machine learning approaches to reactor antineutrino detection	Sophia Farrell et.al.	2407.06139v1	null
2024-07-08	Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities	Avinash Anand et.al.	2407.06125v1	null
2024-07-08	Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation	Xinyu Bai et.al.	2407.06095v1	null
2024-07-08	ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions	Micol Spitale et.al.	2407.06094v1	null
2024-07-08	Artificial Intuition: Efficient Classification of Scientific Abstracts	Harsh Sakhrani et.al.	2407.06093v1	null
2024-07-08	Assessing Cardiomegaly in Dogs Using a Simple CNN Model	Nikhil Deekonda et.al.	2407.06092v1	null
2024-07-05	VCoME: Verbal Video Composition with Multimodal Editing Effects	Weibo Gong et.al.	2407.04697v1	null
2024-07-05	Enhancing Vehicle Re-identification and Matching for Weaving Analysis	Mei Qiu et.al.	2407.04688v1	null
2024-07-05	Embracing Massive Medical Data	Yu-Cheng Chou et.al.	2407.04687v1	link
2024-07-05	Is plantar thermography a valid digital biomarker for characterising diabetic foot ulceration risk?	Akshay Jagadeesh et.al.	2407.04676v1	null
2024-07-05	AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation	Yuhan Zhu et.al.	2407.04603v1	null
2024-07-05	Multimodal Classification via Modal-Aware Interactive Enhancement	Qing-Yuan Jiang et.al.	2407.04587v1	null
2024-07-05	A Degree Bound for Planar Functions	Christof Beierle et.al.	2407.04570v1	null
2024-07-05	Pencils of plane cubics with one base point	Riccardo Moschetti et.al.	2407.04569v1	null
2024-07-05	Anticipating Solar Flares	Hugh S. Hudson et.al.	2407.04567v1	null
2024-07-05	Real Time Emotion Analysis Using Deep Learning for Education, Entertainment, and Beyond	Abhilash Khuntia et.al.	2407.04560v1	null
2024-07-03	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Pan Zhang et.al.	2407.03320v1	link
2024-07-03	Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations	Trevor Ablett et.al.	2407.03311v1	link
2024-07-03	Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method	Sijie Xu et.al.	2407.03308v1	link
2024-07-03	HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization	Yucheng Tang et.al.	2407.03307v1	null
2024-07-03	VCHAR: Variance-Driven Complex Human Activity Recognition framework with Generative Representation	Yuan Sun et.al.	2407.03291v1	null
2024-07-03	Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model	Jingyuan Hong et.al.	2407.03274v1	null
2024-07-03	Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later	Han-Jia Ye et.al.	2407.03257v1	link
2024-07-03	STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data	Kheir Eddine Daouadi et.al.	2407.03253v1	null
2024-07-03	ACTRESS: Active Retraining for Semi-supervised Visual Grounding	Weitai Kang et.al.	2407.03251v1	null
2024-07-04	TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach	Weikun Peng et.al.	2407.03245v2	null
2024-07-02	Characterizing the Interpretability of Attention Maps in Digital Pathology	Tomé Albuquerque et.al.	2407.02484v1	null
2024-07-02	Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets	Kheir Eddine Daouadi et.al.	2407.02448v1	null
2024-07-02	PLeaS -- Merging Models with Permutations and Least Squares	Anshul Nasery et.al.	2407.02447v1	null
2024-07-02	Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates	Dorothea MacPhail et.al.	2407.02432v1	null
2024-07-02	AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans	Gabriele Lozupone et.al.	2407.02418v1	link
2024-07-03	Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jinmin Li et.al.	2407.02411v2	null
2024-07-02	Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones	Lorenzo Lamberti et.al.	2407.02405v1	null
2024-07-03	A neural networks method to search for long transient gravitational waves	Francesca Attadio et.al.	2407.02391v2	null
2024-07-02	Real HSI-MSI-PAN image dataset for the hyperspectral/multi-spectral/panchromatic image fusion and super-resolution fields	Shuangliang Li et.al.	2407.02387v1	link
2024-07-02	OpenSlot: Mixed Open-set Recognition with Object-centric Learning	Xu Yin et.al.	2407.02386v1	null
2024-06-28	Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs	Sukmin Yun et.al.	2406.20098v1	link
2024-06-28	LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression	Jieneng Chen et.al.	2406.20092v1	link
2024-06-28	Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints	Arnab Auddy et.al.	2406.20088v1	null
2024-06-28	Extreme horizon equation	Wojciech Kamiński et.al.	2406.20068v1	null
2024-06-28	Modeling and LQR Control of Insect Sized Flapping Wing Robot	Daksh Dhingra et.al.	2406.20061v1	null
2024-06-28	Pairwise Difference Learning for Classification	Mohamed Karim Belaid et.al.	2406.20031v1	link
2024-06-28	On the Trade-off between Flatness and Optimization in Distributed Learning	Ying Cao et.al.	2406.20006v1	null
2024-06-28	Malaria Cell Detection Using Deep Neural Networks	Saurabh Sawant et.al.	2406.20005v1	null
2024-06-28	Impact of Initialization on Intra-subject Pediatric Brain MR Image Registration: A Comparative Analysis between SyN ANTs and Deep Learning-Based Approaches	Andjela Dimitrijevic et.al.	2406.19943v1	link
2024-07-01	GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection	Chih-Chung Hsu et.al.	2406.19941v2	link
2024-06-27	ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos	Jr-Jen Chen et.al.	2406.19392v1	link
2024-06-27	Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads	Ali Khaleghi Rahimian et.al.	2406.19391v1	link
2024-06-27	OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding	Tao Zhang et.al.	2406.19389v1	null
2024-06-27	Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model	Haobo Yuan et.al.	2406.19369v1	null
2024-06-27	IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language	Lucky Susanto et.al.	2406.19349v1	null
2024-06-27	Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation	Yushun Tang et.al.	2406.19341v1	null
2024-06-28	LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans	Kaushalya Sivayogaraj et.al.	2406.19336v2	null
2024-06-27	PNeRV: A Polynomial Neural Representation for Videos	Sonam Gupta et.al.	2406.19299v1	null
2024-06-27	Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers	Jinsong Chen et.al.	2406.19258v1	null
2024-06-27	Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment	Hao Fei et.al.	2406.19255v1	null
2024-06-26	Towards Compositionality in Concept Learning	Adam Stein et.al.	2406.18534v1	link
2024-06-26	MatchTime: Towards Automatic Soccer Game Commentary Generation	Jiayuan Rao et.al.	2406.18530v1	null
2024-06-26	MultiDiff: Consistent Novel View Synthesis from a Single Image	Norman Müller et.al.	2406.18524v1	null
2024-06-26	ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Shenghai Yuan et.al.	2406.18522v1	null
2024-06-27	Distinguishing mechanisms of social contagion from local network view	Elsa Andres et.al.	2406.18519v2	null
2024-06-26	Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population	Sangeon Ryu et.al.	2406.18508v1	null
2024-06-26	Robust Surgical Phase Recognition From Annotation Efficient Supervision	Or Rubin et.al.	2406.18481v1	null
2024-06-26	Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method	Sascha Caron et.al.	2406.18469v1	null
2024-06-26	An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors	Xingfu Wu et.al.	2406.18445v1	null
2024-06-26	Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling	Abril Corona-Figueroa et.al.	2406.18422v1	null
2024-06-25	Text-Animator: Controllable Visual Text Video Generation	Lin Liu et.al.	2406.17777v1	null
2024-06-25	MotionBooth: Motion-Aware Customized Text-to-Video Generation	Jianzong Wu et.al.	2406.17758v1	null
2024-06-25	Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation	Tushar Prasanna Swaminathan et.al.	2406.17749v1	null
2024-06-25	Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning	Arijit Sehanobish et.al.	2406.17740v1	null
2024-06-25	Mask-Guided Attention U-Net for Enhanced Neonatal Brain Extraction and Image Preprocessing	Bahram Jafrasteh et.al.	2406.17709v1	link
2024-06-25	SurgeMOD: Translating image-space tissue motions into vision-based surgical forces	Mikel De Iturrate Reyzabal et.al.	2406.17707v1	link
2024-06-25	Dualities for universal (co)acting Hopf monoids	Ana Agore et.al.	2406.17684v1	null
2024-06-25	Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation	Xuming Zhang et.al.	2406.17679v1	null
2024-06-25	Lifting of locally initial objects and universal (co)acting Hopf algebras	Ana Agore et.al.	2406.17677v1	null
2024-06-25	Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration	Mohammad Ali Labbaf Khaniki et.al.	2406.17670v1	null
2024-06-24	StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal	Chongjie Ye et.al.	2406.16864v1	null
2024-06-24	FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models	Haonan Qiu et.al.	2406.16863v1	link
2024-06-24	Dreamitate: Real-World Visuomotor Policy Learning via Video Generation	Junbang Liang et.al.	2406.16862v1	null
2024-06-24	Long Context Transfer from Language to Vision	Peiyuan Zhang et.al.	2406.16852v1	link
2024-06-24	Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation	Jingru Fu et.al.	2406.16848v1	null
2024-06-24	Exploring Factual Entailment with NLI: A News Media Study	Guy Mor-Lan et.al.	2406.16842v1	null
2024-06-24	A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking	Lorenzo Shaikewitz et.al.	2406.16837v1	null
2024-06-24	USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations	Mounika Marreddy et.al.	2406.16833v1	null
2024-06-24	The classification of simple complex Lie superalgebras of polynomial vector fields and their deformations	Dimitry Leites et.al.	2406.16760v1	null
2024-06-24	The MRI Scanner as a Diagnostic: Image-less Active Sampling	Yuning Du et.al.	2406.16754v1	null
2024-06-21	Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness	Hannes Ulrich et.al.	2406.15340v1	null
2024-06-21	Image Conductor: Precision Control for Interactive Video Synthesis	Yaowei Li et.al.	2406.15339v1	null
2024-06-21	An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT	Sondos Aabed et.al.	2406.15329v1	null
2024-06-21	Fine-grained Attention in Hierarchical Transformers for Tabular Time-series	Raphael Azorin et.al.	2406.15327v1	link
2024-06-21	NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing	Tim Schopf et.al.	2406.15294v1	link
2024-06-21	Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics	Weijia Zhang et.al.	2406.15264v1	null
2024-06-24	VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation	Xuan He et.al.	2406.15252v2	null
2024-06-21	Retrieval Augmented Zero-Shot Text Classification	Tassallah Abdullahi et.al.	2406.15241v1	null
2024-06-21	Model Equivalences	Michael Benedikt et.al.	2406.15235v1	null
2024-06-21	Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study	Xinze Lyu et.al.	2406.15217v1	null
2024-06-20	A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models	Xincheng Shuai et.al.	2406.14555v1	link
2024-06-21	Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation	Eyal Michaeli et.al.	2406.14551v2	link
2024-06-20	IRASim: Learning Interactive Real-Robot Action Simulators	Fangqi Zhu et.al.	2406.14540v1	null
2024-06-20	Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration	Long Lei et.al.	2406.14534v1	link
2024-06-20	Local symmetries in partially ordered sets	Christoph Minz et.al.	2406.14533v1	null
2024-06-20	Fantastic Copyrighted Beasts and How (Not) to Generate Them	Luxi He et.al.	2406.14526v1	null
2024-06-20	MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Xinyu Fang et.al.	2406.14515v1	link
2024-06-20	V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data	Rotem Shalev-Arkushin et.al.	2406.14510v1	null
2024-06-20	LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors	Sheikh Asif Imran et.al.	2406.14498v1	link
2024-06-20	African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Gregor Geigle et.al.	2406.14496v1	null
2024-06-18	DrVideo: Document Retrieval Based Long Video Understanding	Ziyu Ma et.al.	2406.12846v1	null
2024-06-18	LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging	Jinuk Kim et.al.	2406.12837v1	link
2024-06-18	GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation	Ci-Siang Lin et.al.	2406.12834v1	null
2024-06-18	VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing	Jing Gu et.al.	2406.12831v1	null
2024-06-18	Neural Approximate Mirror Maps for Constrained Diffusion Models	Berthy T. Feng et.al.	2406.12816v1	null
2024-06-18	Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation	Nikolas Koutsoubis et.al.	2406.12815v1	link
2024-06-18	Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs	Joshua Durso-Finley et.al.	2406.12807v1	null
2024-06-18	Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition	Xingming Liao et.al.	2406.12779v1	null
2024-06-18	Medvedev degrees of subshifts on groups	Sebastián Barbieri et.al.	2406.12777v1	null
2024-06-18	Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video	Xiangming Zhu et.al.	2406.12769v1	null
2024-06-17	Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%	Lei Zhu et.al.	2406.11837v1	link
2024-06-17	Spectral Introspection Identifies Group Training Dynamics in Deep Neural Networks for Neuroimaging	Bradley T. Baker et.al.	2406.11825v1	null
2024-06-17	Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation	Alexander Raistrick et.al.	2406.11824v1	null
2024-06-17	VideoLLM-online: Online Video Large Language Model for Streaming Video	Joya Chen et.al.	2406.11816v1	null
2024-06-17	Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain	Pooja Prajod et.al.	2406.11808v1	null
2024-06-17	Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation	Song Wang et.al.	2406.11799v1	null
2024-06-17	CELL your Model: Contrastive Explanation Methods for Large Language Models	Ronny Luss et.al.	2406.11785v1	null
2024-06-17	Task Me Anything	Jieyu Zhang et.al.	2406.11775v1	link
2024-06-17	Domain Generalization for In-Orbit 6D Pose Estimation	Antoine Legrand et.al.	2406.11743v1	null
2024-06-17	Lightweight Model Pre-training via Language Guided Knowledge Distillation	Mingsheng Li et.al.	2406.11689v1	link
2024-06-14	VideoGUI: A Benchmark for GUI Automation from Instructional Videos	Kevin Qinghong Lin et.al.	2406.10227v1	null
2024-06-14	Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding	Ridouane Ghermi et.al.	2406.10221v1	null
2024-06-14	SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation	Ziang Xu et.al.	2406.10200v1	null
2024-06-14	CarLLaVA: Vision language models for camera-only closed-loop driving	Katrin Renz et.al.	2406.10165v1	null
2024-06-14	Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition	Guinan Li et.al.	2406.10152v1	null
2024-06-14	Training-free Camera Control for Video Generation	Chen Hou et.al.	2406.10126v1	null
2024-06-14	Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression	Haresh Rengaraj Rajamohan et.al.	2406.10119v1	null
2024-06-14	ECGMamba: Towards Efficient ECG Classification with BiSSM	Yupeng Qiang et.al.	2406.10098v1	null
2024-06-14	Biomarker based Cancer Classification using an Ensemble with Pre-trained Models	Chongmin Lee et.al.	2406.10087v1	null
2024-06-14	On the Evaluation of Speech Foundation Models for Spoken Language Understanding	Siddhant Arora et.al.	2406.10083v1	null
2024-06-13	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Muhammad Maaz et.al.	2406.09418v1	link
2024-06-13	An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels	Duy-Kien Nguyen et.al.	2406.09415v1	null
2024-06-13	CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras	Sachin Shah et.al.	2406.09409v1	null
2024-06-13	Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion	Linzhan Mou et.al.	2406.09402v1	null
2024-06-13	OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation	Junke Wang et.al.	2406.09399v1	link
2024-06-13	Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA	Jongwoo Park et.al.	2406.09396v1	null
2024-06-13	LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living	Rajatsubhra Chakraborty et.al.	2406.09390v1	null
2024-06-13	Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior	Baiang Li et.al.	2406.09389v1	null
2024-06-13	Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition	Youngtaek Oh et.al.	2406.09388v1	link
2024-06-13	SimGen: Simulator-conditioned Driving Scene Generation	Yunsong Zhou et.al.	2406.09386v1	null
2024-06-12	On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models	Hashmat Shadab Malik et.al.	2406.08486v1	link
2024-06-12	RMem: Restricted Memory Banks Improve Video Object Segmentation	Junbao Zhou et.al.	2406.08476v1	null
2024-06-12	AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind	Wei Ding et.al.	2406.08455v1	null
2024-06-12	Transformation-Dependent Adversarial Attacks	Yaoteng Tan et.al.	2406.08443v1	null
2024-06-12	A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil	Philipe Melo et.al.	2406.08429v1	null
2024-06-12	Improving Noise Robustness through Abstractions and its Impact on Machine Learning	Alfredo Ibias et.al.	2406.08428v1	null
2024-06-12	OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text	Qingyun Li et.al.	2406.08418v1	link
2024-06-13	MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Xuehai He et.al.	2406.08407v2	link
2024-06-12	Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze	Michele Mazzamuto et.al.	2406.08379v1	null
2024-06-12	2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction	Tianqi Chen et.al.	2406.08374v1	null
2024-06-11	Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring	Huicong Zhang et.al.	2406.07551v1	link
2024-06-11	Image and Video Tokenization with Binary Spherical Quantization	Yue Zhao et.al.	2406.07548v1	link
2024-06-11	Zero-shot Image Editing with Reference Imitation	Xi Chen et.al.	2406.07547v1	null
2024-06-11	Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance	Kuan Heng Lin et.al.	2406.07540v1	null
2024-06-11	BAKU: An Efficient Transformer for Multi-Task Policy Learning	Siddhant Haldar et.al.	2406.07539v1	null
2024-06-11	Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection	J. Schueler et.al.	2406.07538v1	null
2024-06-11	Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection	Wenxiao Wang et.al.	2406.07536v1	null
2024-06-11	Dynamics of the non-radial energy-critical inhomogeneous NLS	Carlos M. Guzmán et.al.	2406.07535v1	null
2024-06-11	Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement	Yunzhen Feng et.al.	2406.07515v1	null
2024-06-11	Understanding Visual Concepts Across Models	Brandon Trabucco et.al.	2406.07506v1	link
2024-06-10	NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing	Ting-Hsuan Chen et.al.	2406.06523v1	null
2024-06-10	Data Augmentation for Multivariate Time Series Classification: An Experimental Study	Romain Ilbert et.al.	2406.06518v1	null
2024-06-10	Merlin: A Vision Language Foundation Model for 3D Computed Tomography	Louis Blankemeier et.al.	2406.06512v1	null
2024-06-10	Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer	Sigal Raab et.al.	2406.06508v1	link
2024-06-10	Equivariant Neural Tangent Kernels	Philipp Misof et.al.	2406.06504v1	null
2024-06-10	Viscous shock fluctuations in KPZ	Alexander Dunlap et.al.	2406.06502v1	null
2024-06-10	NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative	Asmar Nadeem et.al.	2406.06499v1	null
2024-06-10	Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace	Chenxu Wang et.al.	2406.06498v1	null
2024-06-10	Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data	Nicole Hayes et.al.	2406.06479v1	null
2024-06-10	DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents	Olivia Figueira et.al.	2406.06473v1	null
2024-06-07	DVOS: Self-Supervised Dense-Pattern Video Object Segmentation	Keyhan Najafian et.al.	2406.05131v1	null
2024-06-07	Compositional Curvature Bounds for Deep Neural Networks	Taha Entesari et.al.	2406.05119v1	null
2024-06-07	Large Generative Graph Models	Yu Wang et.al.	2406.05109v1	null
2024-06-07	A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification	Christian Giannetti et.al.	2406.05096v1	null
2024-06-10	Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 with JWST	J. D. R. Pierel et.al.	2406.05089v2	null
2024-06-07	CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion	Xingrui Wang et.al.	2406.05082v1	null
2024-06-10	Discovery of a Relativistic Stripped Envelope Type Ic-BL Supernova at z = 2.83 with JWST	M. R. Siebert et.al.	2406.05076v2	null
2024-06-07	Diving Deep into the Motion Representation of Video-Text Models	Chinmaya Devaraj et.al.	2406.05075v1	null
2024-06-07	Hibou: A Family of Foundational Vision Transformers for Pathology	Dmitry Nechaev et.al.	2406.05074v1	null
2024-06-07	Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations	Benjamin Fresz et.al.	2406.05068v1	link
2024-06-06	Verbalized Machine Learning: Revisiting Machine Learning with Language Models	Tim Z. Xiao et.al.	2406.04344v1	null
2024-06-07	Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion	Fangfu Liu et.al.	2406.04338v2	null
2024-06-06	Parameter-Inverted Image Pyramid Networks	Xizhou Zhu et.al.	2406.04330v1	link
2024-06-06	ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Lin Chen et.al.	2406.04325v1	null
2024-06-06	SF-V: Single Forward Video Generation Model	Zhixing Zhang et.al.	2406.04324v1	null
2024-06-06	ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories	Qianlan Yang et.al.	2406.04323v1	null
2024-06-06	VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling	Zeyue Tian et.al.	2406.04321v1	link
2024-06-06	Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models	Ali Behrouz et.al.	2406.04320v1	null
2024-06-06	Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction	Chen-Yu Yen et.al.	2406.04318v1	null
2024-06-06	Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks	Tristan Cinquin et.al.	2406.04317v1	null
2024-06-05	Grokking Modular Polynomials	Darshil Doshi et.al.	2406.03495v1	null
2024-06-05	The Logarithmic Memristor-Based Bayesian Machine	Clément Turck et.al.	2406.03492v1	null
2024-06-05	Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review	Sonia Bbouzidi et.al.	2406.03478v1	null
2024-06-05	Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach	Haoyu Han et.al.	2406.03464v1	null
2024-06-05	Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts	Dominik Scheuble et.al.	2406.03461v1	null
2024-06-05	FILS: Self-Supervised Video Feature Prediction In Semantic Language Space	Mona Ahmadian et.al.	2406.03447v1	null
2024-06-05	Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input	Joachim Ott et.al.	2406.03439v1	null
2024-06-05	Stabilizing massless fields with fluxes in Landau-Ginzburg models	Katrin Becker et.al.	2406.03435v1	null
2024-06-05	Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis	Moein Heidari et.al.	2406.03430v1	link
2024-06-05	Post-hoc Part-prototype Networks	Andong Tan et.al.	2406.03421v1	null
2024-06-05	Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting	Inkyu Shin et.al.	2406.02541v2	null
2024-06-04	ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation	Tianchen Zhao et.al.	2406.02540v1	null
2024-06-04	Enhancing predictive imaging biomarker discovery through treatment effect analysis	Shuhan Xiao et.al.	2406.02534v1	null
2024-06-04	ReLUs Are Sufficient for Learning Implicit Neural Representations	Joseph Shenouda et.al.	2406.02529v1	link
2024-06-04	RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots	Soroush Nasiriany et.al.	2406.02523v1	null
2024-06-04	DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering	Zhongpai Gao et.al.	2406.02518v1	null
2024-06-04	V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation	Cong Wang et.al.	2406.02511v1	null
2024-06-04	CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation	Dejia Xu et.al.	2406.02509v1	null
2024-06-04	Endomorphisms of Artin groups of type $\tilde A_n$	Luis Paris et.al.	2406.02484v1	null
2024-06-04	Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion	Colin Hansen et.al.	2406.02477v1	null
2024-05-31	Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	Chaoyou Fu et.al.	2405.21075v1	null
2024-05-31	Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights	Xin Wen et.al.	2405.21070v1	link
2024-05-31	You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet	Zhen Qin et.al.	2405.21022v1	null
2024-05-31	Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging	Niloufar Zakariaei et.al.	2405.21021v1	null
2024-05-31	The classification of dp-minimal integral domains	Christian d'Elbée et.al.	2405.21014v1	null
2024-05-31	Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging	Muhammad Muneeb Saad et.al.	2405.20987v1	null
2024-05-31	PUAL: A Classifier on Trifurcate Positive-Unlabeled Data	Xiaoke Wang et.al.	2405.20970v1	null
2024-05-31	Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score	Nathan Tsoi et.al.	2405.20954v1	null
2024-05-31	Standard model of electromagnetism and chirality in crystals	R. Winkler et.al.	2405.20940v1	null
2024-05-31	MALT: Multi-scale Action Learning Transformer for Online Action Detection	Zhipeng Yang et.al.	2405.20892v1	null
2024-05-30	MotionLLM: Understanding Human Behaviors from Human Motions and Videos	Ling-Hao Chen et.al.	2405.20340v1	null
2024-05-30	OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	Lening Wang et.al.	2405.20337v1	link
2024-05-30	VividDream: Generating 3D Scene with Ambient Dynamics	Yao-Chih Lee et.al.	2405.20334v1	null
2024-05-30	SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos	Chinedu Innocent Nwoye et.al.	2405.20333v1	null
2024-05-31	4DHands: Reconstructing Interactive Hands in 4D with Transformers	Dixuan Lin et.al.	2405.20330v2	null
2024-05-30	MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion	Shuyuan Tu et.al.	2405.20325v1	null
2024-05-30	Vision-based Manipulation from Single Human Video with Open-World Object Graphs	Yifeng Zhu et.al.	2405.20321v1	null
2024-05-30	Improving the Training of Rectified Flows	Sangyun Lee et.al.	2405.20320v1	link
2024-05-30	CausalQuest: Collecting Natural Causal Questions for AI Agents	Roberto Ceraolo et.al.	2405.20318v1	link
2024-05-30	Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models	Himangi Mittal et.al.	2405.20305v1	null
2024-05-29	X-VILA: Cross-Modality Alignment for Large Language Model	Hanrong Ye et.al.	2405.19335v1	null
2024-05-29	LLMs Meet Multimodal Generation and Editing: A Survey	Yingqing He et.al.	2405.19334v1	link
2024-05-29	Multi-Modal Generative Embedding Model	Feipeng Ma et.al.	2405.19333v1	null
2024-05-29	NPGA: Neural Parametric Gaussian Avatars	Simon Giebenhain et.al.	2405.19331v1	null
2024-05-29	Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation	Atrisha Sarkar et.al.	2405.19328v1	null
2024-05-29	DGD: Dynamic 3D Gaussians Distillation	Isaac Labe et.al.	2405.19321v1	null
2024-05-29	Real-Time Environment Condition Classification for Autonomous Vehicles	Marco Introvigne et.al.	2405.19305v1	null
2024-05-29	Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare	Hanwei Zhu et.al.	2405.19298v1	null
2024-05-29	Archetype-Based Redshift Estimation for the Dark Energy Spectroscopic Instrument Survey	Abhijeet Anand et.al.	2405.19288v1	null
2024-05-29	A study on the adequacy of common IQA measures for medical images	Anna Breger et.al.	2405.19224v1	null
2024-05-28	Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets	Khen Cohen et.al.	2405.18427v1	null
2024-05-28	GFlow: Recovering 4D World from Monocular Video	Shizun Wang et.al.	2405.18426v1	null
2024-05-28	Hierarchical World Models as Visual Whole-Body Humanoid Controllers	Nicklas Hansen et.al.	2405.18418v1	null
2024-05-28	3D StreetUnveiler with Semantic-Aware 2DGS	Jingwei Xu et.al.	2405.18416v1	null
2024-05-28	Why are Visually-Grounded Language Models Bad at Image Classification?	Yuhui Zhang et.al.	2405.18415v1	link
2024-05-28	Towards a Sampling Theory for Implicit Neural Representations	Mahrokh Najaf et.al.	2405.18410v1	null
2024-05-28	Phased Consistency Model	Fu-Yun Wang et.al.	2405.18407v1	null
2024-05-28	RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives	Jaehong Yoon et.al.	2405.18406v1	null
2024-05-28	MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	Somnath Kumar et.al.	2405.18358v1	null
2024-05-28	Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography	Jie Liu et.al.	2405.18356v1	link
2024-05-27	Matryoshka Multimodal Models	Mu Cai et.al.	2405.17430v1	null
2024-05-27	NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models	Chankyu Lee et.al.	2405.17428v1	null
2024-05-27	MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds	Jiahui Lei et.al.	2405.17421v1	null
2024-05-27	Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control	Zhengfei Kuang et.al.	2405.17414v1	null
2024-05-27	Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization	Navin Kamuni et.al.	2405.17413v1	null
2024-05-27	The Peripatetic Hater: Predicting Movement Among Hate Subreddits	Daniel Hickey et.al.	2405.17410v1	null
2024-05-27	Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer	Ruizhi Shao et.al.	2405.17405v1	null
2024-05-27	Spectral Greedy Coresets for Graph Neural Networks	Mucong Ding et.al.	2405.17404v1	null
2024-05-27	Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	Shenyuan Gao et.al.	2405.17398v1	link
2024-05-27	Non-Unitary Quantum Machine Learning	Jamie Heredge et.al.	2405.17388v1	null
2024-05-24	Canonical Variates in Wasserstein Metric Space	Jia Li et.al.	2405.15768v1	null
2024-05-24	Scaling Laws for Discriminative Classification in Large Language Models	Dean Wyatte et.al.	2405.15765v1	null
2024-05-24	InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation	Yuchi Wang et.al.	2405.15758v1	link
2024-05-24	Looking Backward: Streaming Video-to-Video Translation with Feature Banks	Feng Liang et.al.	2405.15757v1	link
2024-05-24	Characterizing Discourse Group Roles in Inquiry-based University Science Labs	Tong Wan et.al.	2405.15746v1	null
2024-05-24	Hierarchical Uncertainty Exploration via Feedforward Posterior Trees	Elias Nehme et.al.	2405.15719v1	null
2024-05-24	EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences	Jocelyn Shen et.al.	2405.15708v1	null
2024-05-24	Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates	Jinbo Peng et.al.	2405.15705v1	null
2024-05-24	realSEUDO for real-time calcium imaging analysis	Iuliia Dmitrieva et.al.	2405.15701v1	null
2024-05-24	UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes	Ted Lentsch et.al.	2405.15688v1	null
2024-05-23	PuzzleAvatar: Assembling 3D Avatars from Personal Albums	Yuliang Xiu et.al.	2405.14869v1	null
2024-05-23	Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis	Basile Van Hoorick et.al.	2405.14868v1	null
2024-05-23	Video Diffusion Models are Training-free Motion Interpreter and Controller	Zeqi Xiao et.al.	2405.14864v1	null
2024-05-23	Synergistic Global-space Camera and Human Reconstruction from Videos	Yizhou Zhao et.al.	2405.14855v1	null
2024-05-23	Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron	Can Cui et.al.	2405.14851v1	null
2024-05-23	Learning to Detect and Segment Mobile Objects from Unlabeled Videos	Yihong Sun et.al.	2405.14841v1	null
2024-05-23	Designing A Sustainable Marine Debris Clean-up Framework without Human Labels	Raymond Wang et.al.	2405.14815v1	null
2024-05-23	As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making	Shomik Jain et.al.	2405.14812v1	null
2024-05-23	Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics	Jonas Spinner et.al.	2405.14806v1	null
2024-05-24	Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation	Hongxu Jiang et.al.	2405.14802v2	link
2024-05-21	Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma	Ahmed Gomaa et.al.	2405.12963v1	null
2024-05-21	**Online Learning of Halfspaces with Massart N
