Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-07-23 | Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility | Melih Barsbey et.al. | 2507.17748v1 | null |
2025-07-23 | Yume: An Interactive World Generation Model | Xiaofeng Mao et.al. | 2507.17744v1 | null |
2025-07-23 | BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems | Malsha Ashani Mahawatta Dona et.al. | 2507.17722v1 | null |
2025-07-23 | Towards Effective Open-set Graph Class-incremental Learning | Jiazhen Chen et.al. | 2507.17687v1 | null |
2025-07-23 | Audio-Vision Contrastive Learning for Phonological Class Recognition | Daiqi Liu et.al. | 2507.17682v1 | null |
2025-07-23 | MCM: Mamba-based Cardiac Motion Tracking using Sequential Images in MRI | Jiahui Yin et.al. | 2507.17678v1 | null |
2025-07-23 | Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography | Farnoush Bayatmakou et.al. | 2507.17662v1 | null |
2025-07-23 | The Early Bird Identifies the Worm: You Can't Beat a Head Start in Long-Term Body Re-ID (ECHO-BID) | Thomas M. Metz et.al. | 2507.17640v1 | null |
2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et.al. | 2507.17636v1 | null |
2025-07-23 | Gauge Symmetries, Exact Symmetries and Conserved Charges in Minimal Massive Gravity | Kang Liu et.al. | 2507.17635v1 | null |
2025-07-22 | MultiTaskDeltaNet: Change Detection-based Image Segmentation for Operando ETEM with Application to Carbon Gasification Kinetics | Yushuo Niu et.al. | 2507.16803v1 | null |
2025-07-22 | Improving U-Net Confidence on TEM Image Data with L2-Regularization, Transfer Learning, and Deep Fine-Tuning | Aiden Ochoa et.al. | 2507.16779v1 | null |
2025-07-22 | Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks | Marcel Kleinmann et.al. | 2507.16761v1 | null |
2025-07-22 | Improving Model Classification by Optimizing the Training Dataset | Morad Tukan et.al. | 2507.16729v1 | null |
2025-07-22 | SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing | Jinbo Hu et.al. | 2507.16724v1 | null |
2025-07-22 | Temporally-Constrained Video Reasoning Segmentation and Automated Benchmark Construction | Yiqing Shen et.al. | 2507.16718v1 | null |
2025-07-22 | A Tutorial on MRI Reconstruction: From Modern Methods to Clinical Implications | Tolga Çukur et.al. | 2507.16715v1 | null |
2025-07-22 | Ring-based ML calibration with in situ pileup correction for real-time jet triggers | Benjamin T. Carlson et.al. | 2507.16686v1 | null |
2025-07-22 | VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models | Duong Nguyen et.al. | 2507.16685v1 | null |
2025-07-22 | Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis | Yonghan Zhang et.al. | 2507.16682v1 | null |
2025-07-21 | Simulating the LOcal Web (SLOW) V. Thermodynamic Properties and Evolution of Local Galaxy Clusters | Elena Hernández-Martínez et.al. | 2507.15858v1 | null |
2025-07-21 | Optimized Fabrication Procedure for High-Quality Graphene-based Moiré Superlattice Devices | Shuwen Sun et.al. | 2507.15853v1 | null |
2025-07-22 | SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction | Zhixiong Zhang et.al. | 2507.15852v2 | null |
2025-07-22 | GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding | Fei Tang et.al. | 2507.15846v2 | null |
2025-07-21 | Quantum computational sensing using quantum signal processing, quantum neural networks, and Hamiltonian engineering | Saeed A. Khan et.al. | 2507.15845v1 | null |
2025-07-21 | Optimizing Canaries for Privacy Auditing with Metagradient Descent | Matteo Boglioni et.al. | 2507.15836v1 | null |
2025-07-21 | Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models | Enes Sanli et.al. | 2507.15824v1 | null |
2025-07-21 | Graph Attention Specialized Expert Fusion Model for Node Classification: Based on Cora and Pubmed Datasets | Zihang Ma et.al. | 2507.15784v1 | null |
2025-07-21 | Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization | Feng-Qi Cui et.al. | 2507.15765v1 | null |
2025-07-21 | TokensGen: Harnessing Condensed Tokens for Long Video Generation | Wenqi Ouyang et.al. | 2507.15728v1 | null |
2025-07-18 | NGC 663 as a laboratory for massive star evolution | Amparo Marco et.al. | 2507.14125v1 | null |
2025-07-18 | Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective | Pankaj Yadav et.al. | 2507.14121v1 | null |
2025-07-18 | Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification | Daniëlle Schuman et.al. | 2507.14116v1 | null |
2025-07-18 | Maximal translation surfaces in Lorentz-Minkowski space | Rafael López et.al. | 2507.14103v1 | null |
2025-07-18 | UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography | Shravan Venkatraman et.al. | 2507.14102v1 | null |
2025-07-18 | Generative AI-Driven High-Fidelity Human Motion Simulation | Hari Iyer et.al. | 2507.14097v1 | null |
2025-07-18 | Multi-Centre Validation of a Deep Learning Model for Scoliosis Assessment | Šimon Kubov et.al. | 2507.14093v1 | null |
2025-07-18 | Unmasking Performance Gaps: A Comparative Study of Human Anonymization and Its Effects on Video Anomaly Detection | Sara Abdulaziz et.al. | 2507.14083v1 | null |
2025-07-18 | Semi-supervised classification of Stars, Galaxies and Quasars using K-means and Random Forest | Vahid Asadi et.al. | 2507.14072v1 | null |
2025-07-18 | Predicting interface and spin states in armchair graphene nanoribbon junctions | Sofia Sanz et.al. | 2507.14065v1 | null |
2025-07-17 | VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | Shihao Wang et.al. | 2507.13353v1 | null |
2025-07-17 | | Yifan Wang et.al. | 2507.13347v1 | null |
2025-07-17 | Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models | Yudong Jin et.al. | 2507.13344v1 | null |
2025-07-17 | Taming Diffusion Transformer for Real-Time Mobile Video Generation | Yushu Wu et.al. | 2507.13343v1 | null |
2025-07-17 | SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution | Ritik Shah et.al. | 2507.13339v1 | null |
2025-07-17 | FocusView: Understanding and Customizing Informational Video Watching Experiences for Viewers with ADHD | Hanxiu 'Hazel' Zhu et.al. | 2507.13309v1 | null |
2025-07-17 | Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy | Yiting Yang et.al. | 2507.13260v1 | null |
2025-07-17 | Signal Temporal Logic Compliant Co-design of Planning and Control | Manas Sashank Juvvi et.al. | 2507.13225v1 | null |
2025-07-17 | Leveraging Pre-Trained Visual Models for AI-Generated Video Detection | Keerthi Veeramachaneni et.al. | 2507.13224v1 | null |
2025-07-17 | Degrees of points with rational | Kenji Terao et.al. | 2507.13199v1 | null |
2025-07-16 | CytoSAE: Interpretable Cell Embeddings for Hematology | Muhammed Furkan Dasdelen et.al. | 2507.12464v1 | null |
2025-07-16 | MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding | Renjie Li et.al. | 2507.12463v1 | null |
2025-07-16 | SpatialTrackerV2: 3D Point Tracking Made Easy | Yuxi Xiao et.al. | 2507.12462v1 | null |
2025-07-16 | Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios | Van-Hoang-Anh Phan et.al. | 2507.12449v1 | null |
2025-07-16 | Minmax Exclusivity Classes for Power-Type Loss Functions | Stanisław M. S. Halkiewicz et.al. | 2507.12447v1 | null |
2025-07-17 | EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | Ruihan Yang et.al. | 2507.12440v2 | null |
2025-07-16 | Heisenberg limited multiple eigenvalue estimation via off-the-grid compressed sensing | Davide Castaldo et.al. | 2507.12438v1 | null |
2025-07-16 | Energy-based models for inverse imaging problems | Andreas Habring et.al. | 2507.12432v1 | null |
2025-07-16 | Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation | Ashkan Shakarami et.al. | 2507.12427v1 | null |
2025-07-16 | DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Hayat Ullah et.al. | 2507.12426v1 | null |
2025-07-15 | Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation | Zhen Xu et.al. | 2507.11540v1 | null |
2025-07-15 | Streaming 4D Visual Geometry Transformer | Dong Zhuo et.al. | 2507.11539v1 | null |
2025-07-15 | Understanding Quantum Information and Computation | John Watrous et.al. | 2507.11536v1 | null |
2025-07-15 | LLM-based ambiguity detection in natural language instructions for collaborative surgical robots | Ana Davila et.al. | 2507.11525v1 | null |
2025-07-15 | Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection | Buddhi Wijenayake et.al. | 2507.11523v1 | null |
2025-07-15 | CATVis: Context-Aware Thought Visualization | Tariq Mehmood et.al. | 2507.11522v1 | null |
2025-07-15 | On the Complexity of the Optimal Correlated Equilibria in Extensive-Form Games | Vincent Cheval et.al. | 2507.11509v1 | null |
2025-07-16 | Multipass Linear Sketches for Geometric LP-Type Problems | N. Efe Çekirge et.al. | 2507.11484v2 | null |
2025-07-15 | JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks | Ioannis Panitsas et.al. | 2507.11483v1 | null |
2025-07-15 | C-FBI: A Combinatorial method using Convolutions for Circle Fitting in Blurry Images | Esteban Román Catafau et.al. | 2507.11476v1 | null |
2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et.al. | 2507.10548v1 | null |
2025-07-14 | Disentangling Neural Disjunctive Normal Form Models | Kexin Gu Baugh et.al. | 2507.10546v1 | null |
2025-07-14 | ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions | Shivangi Aneja et.al. | 2507.10542v1 | null |
2025-07-14 | A Classification of Transversal Clifford Gates for Qubit Stabilizer Codes | Shival Dasu et.al. | 2507.10519v1 | null |
2025-07-14 | Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | Jiangkai Wu et.al. | 2507.10510v1 | null |
2025-07-14 | Topological phases and Edge states in an exactly solvable Gamma matrix model | Akhil Pravin Furtado et.al. | 2507.10509v1 | null |
2025-07-14 | Colorful Minors | Evangelos Protopapas et.al. | 2507.10467v1 | null |
2025-07-14 | AudioMAE++: learning better masked audio representations with SwiGLU FFNs | Sarthak Yadav et.al. | 2507.10464v1 | null |
2025-07-14 | RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening | Tao Tang et.al. | 2507.10461v1 | null |
2025-07-14 | 4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos | Shanshan Zhong et.al. | 2507.10437v1 | null |
2025-07-11 | Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Hangjie Yuan et.al. | 2507.08801v1 | null |
2025-07-11 | Mining the Alerts: A Preliminary Catalog of Compact Binaries from the Fourth Observing Run | Aleyna Akyüz et.al. | 2507.08778v1 | null |
2025-07-11 | A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification | Ahmed Farooq et.al. | 2507.08766v1 | null |
2025-07-11 | ML-Based Automata Simplification for Symbolic Accelerators | Tiffany Yu et.al. | 2507.08751v1 | null |
2025-07-11 | HieraRS: A Hierarchical Segmentation Paradigm for Remote Sensing Enabling Multi-Granularity Interpretation and Cross-Domain Transfer | Tianlong Ai et.al. | 2507.08741v1 | null |
2025-07-11 | Statistical Analysis of Early Spectra in Type II and IIb Supernovae | Maider González-Bañuelos et.al. | 2507.08731v1 | null |
2025-07-11 | RoundaboutHD: High-Resolution Real-World Urban Environment Benchmark for Multi-Camera Vehicle Tracking | Yuqiang Lin et.al. | 2507.08729v1 | null |
2025-07-11 | Distinct neurodynamics of functional brain networks in Alzheimer's disease and frontotemporal dementia as revealed by EEG | Sungwoo Ahn et.al. | 2507.08728v1 | null |
2025-07-11 | Free phases of Majorana fermions: Tenfold ways compared | Luuk Stehouwer et.al. | 2507.08694v1 | null |
2025-07-11 | Functional equations of axiomatic multiple Dirichlet series, Weyl groupoids, and quantum algebra | Will Sawin et.al. | 2507.08662v1 | null |
2025-07-10 | Multigranular Evaluation for Brain Visual Decoding | Weihao Xia et.al. | 2507.07993v1 | null |
2025-07-10 | Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs | Jeongseok Hyun et.al. | 2507.07990v1 | null |
2025-07-10 | CLIP Won't Learn Object-Attribute Binding from Natural Data and Here is Why | Bijay Gurung et.al. | 2507.07985v1 | null |
2025-07-10 | Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling | Haoyu Wu et.al. | 2507.07982v1 | null |
2025-07-10 | Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions | Longfei Li et.al. | 2507.07978v1 | null |
2025-07-10 | Scaling RL to Long Videos | Yukang Chen et.al. | 2507.07966v1 | null |
2025-07-10 | Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency | Abolfazl Zarghani et.al. | 2507.07938v1 | null |
2025-07-10 | Working with AI: Measuring the Occupational Implications of Generative AI | Kiran Tomlinson et.al. | 2507.07935v1 | null |
2025-07-10 | Measuring Hypothesis Testing Errors in the Evaluation of Retrieval Systems | Jack McKechnie et.al. | 2507.07924v1 | null |
2025-07-10 | ArteryX: Advancing Brain Artery Feature Extraction with Vessel-Fused Networks and a Robust Validation Framework | Abrar Faiyaz et.al. | 2507.07920v1 | null |
2025-07-10 | DTECT: Dynamic Topic Explorer & Context Tracker | Suman Adhya et.al. | 2507.07910v1 | null |
2025-07-09 | 4KAgent: Agentic Any Image to 4K Super-Resolution | Yushen Zuo et.al. | 2507.07105v1 | null |
2025-07-09 | Exploring Public Perceptions of Generative AI in Libraries: A Social Media Analysis of X Discussions | Yuan Li et.al. | 2507.07047v1 | null |
2025-07-09 | Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics | Mehrdad Morsali et.al. | 2507.07044v1 | null |
2025-07-09 | Tilings of the sphere by congruent pentagons V: Edge combination | Jinjin Liang et.al. | 2507.07038v1 | null |
2025-07-09 | Classifying integral Grothendieck rings up to rank 5 and beyond | Max A. Alekseyev et.al. | 2507.07023v1 | null |
2025-07-09 | Quantum Spectral Clustering: Comparing Parameterized and Neuromorphic Quantum Kernels | Donovan Slabbert et.al. | 2507.07018v1 | null |
2025-07-09 | Deep Brain Net: An Optimized Deep Learning Model for Brain tumor Detection in MRI Images Using EfficientNetB0 and ResNet50 with Transfer Learning | Daniel Onah et.al. | 2507.07011v1 | null |
2025-07-09 | GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | S M Taslim Uddin Raju et.al. | 2507.07006v1 | null |
2025-07-09 | BarkBeetle: Stealing Decision Tree Models with Fault Injection | Qifan Wang et.al. | 2507.06986v1 | null |
2025-07-09 | Anti-Interference Diffractive Deep Neural Networks for Multi-Object Recognition | Zhiqi Huang et.al. | 2507.06978v1 | null |
2025-07-08 | Learning to Track Any Points from Human Motion | Inès Hyeonsu Kim et.al. | 2507.06233v1 | null |
2025-07-08 | seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees | Felix Gnettner et.al. | 2507.06227v1 | null |
2025-07-08 | EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow | Yixiang Chen et.al. | 2507.06224v1 | null |
2025-07-08 | Topological Holography for Mixed-State Phases and Phase Transitions | Ran Luo et.al. | 2507.06218v1 | null |
2025-07-08 | What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23 | Maria V. Pruzhinskaya et.al. | 2507.06217v1 | null |
2025-07-08 | DS@GT at CheckThat! 2025: Ensemble Methods for Detection of Scientific Discourse on Social Media | Ayush Parikh et.al. | 2507.06205v1 | null |
2025-07-08 | DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification | Maximilian Heil et.al. | 2507.06195v1 | null |
2025-07-08 | DS@GT at CheckThat! 2025: Detecting Subjectivity via Transfer-Learning and Corrective Data Augmentation | Maximilian Heil et.al. | 2507.06189v1 | null |
2025-07-08 | SoftReMish: A Novel Activation Function for Enhanced Convolutional Neural Networks for Visual Recognition Performance | Mustafa Bayram Gücen et.al. | 2507.06148v1 | null |
2025-07-08 | LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models | Zhihao Chen et.al. | 2507.06140v1 | null |
2025-07-07 | Spatio-Temporal LLM: Reasoning about Environments and Actions | Haozhen Zheng et.al. | 2507.05258v1 | null |
2025-07-07 | StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling | Meng Wei et.al. | 2507.05240v1 | null |
2025-07-07 | Bridging Expressivity and Scalability with Adaptive Unitary SSMs | Arjun Karuvally et.al. | 2507.05238v1 | null |
2025-07-07 | Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage | Markiyan Kostiv et.al. | 2507.05229v1 | null |
2025-07-08 | MedGemma Technical Report | Andrew Sellergren et.al. | 2507.05201v2 | null |
2025-07-07 | EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling | Boyuan Wang et.al. | 2507.05198v1 | null |
2025-07-07 | Light-cone vector superspace and continuous-spin field in AdS | R. R. Metsaev et.al. | 2507.05194v1 | null |
2025-07-07 | RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis | Songxiao Yang et.al. | 2507.05193v1 | null |
2025-07-07 | QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks | Hoang-Quan Nguyen et.al. | 2507.05190v1 | null |
2025-07-07 | Satellite-based Rabi rice paddy field mapping in India: a case study on Telangana state | Prashanth Reddy Putta et.al. | 2507.05189v1 | null |
2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et.al. | 2507.02864v1 | null |
2025-07-03 | RefTok: Reference-Based Tokenization for Video Generation | Xiang Fan et.al. | 2507.02862v1 | null |
2025-07-03 | LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans | Zhening Huang et.al. | 2507.02861v1 | null |
2025-07-03 | Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching | Xin Zhou et.al. | 2507.02860v1 | null |
2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et.al. | 2507.02857v1 | null |
2025-07-03 | Classification and Reduction of Homogeneous Star Products | Marvin Dippell et.al. | 2507.02820v1 | null |
2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et.al. | 2507.02813v1 | null |
2025-07-03 | HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars | Gent Serifi et.al. | 2507.02803v1 | null |
2025-07-03 | From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding | Xiangfeng Wang et.al. | 2507.02790v1 | null |
2025-07-03 | From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images | Danrong Zhang et.al. | 2507.02781v1 | null |
2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955v1 | null |
2025-07-02 | Kwai Keye-VL Technical Report | Kwai Keye Team et.al. | 2507.01949v1 | null |
2025-07-02 | LongAnimation: Long Animation Generation with Dynamic Global-Local Memory | Nan Chen et.al. | 2507.01945v1 | null |
2025-07-02 | CI-VID: A Coherent Interleaved Text-Video Dataset | Yiming Ju et.al. | 2507.01938v1 | null |
2025-07-02 | evMLP: An Efficient Event-Driven MLP Architecture for Vision | Zhentan Zheng et.al. | 2507.01927v1 | null |
2025-07-02 | Advancing Magnetic Materials Discovery -- A structure-based machine learning approach for magnetic ordering and magnetic moment prediction | Apoorv Verma et.al. | 2507.01913v1 | null |
2025-07-02 | Future Slot Prediction for Unsupervised Object Discovery in Surgical Video | Guiqiu Liao et.al. | 2507.01882v1 | null |
2025-07-02 | A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs | Niccolò McConnell et.al. | 2507.01881v1 | null |
2025-07-02 | Locally Rotationally Symmetric Spacetimes in Einstein-Cartan Theory and Their Classification | Ujjwal Agarwal et.al. | 2507.01840v1 | null |
2025-07-02 | mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling | Tristan Torchet et.al. | 2507.01829v1 | null |
2025-06-30 | How to Design and Train Your Implicit Neural Representation for Video Compression | Matthew Gwilliam et.al. | 2506.24127v1 | null |
2025-06-30 | TextMesh4D: High-Quality Text-to-4D Mesh Generation | Sisi Dai et.al. | 2506.24121v1 | null |
2025-06-30 | Nonlinear Symmetry-Fragmentation of Nonabelian Anyons In Symmetry-Enriched Topological Phases: A String-Net Model Realization | Nianrui Fu et.al. | 2506.24115v1 | null |
2025-06-30 | Epona: Autoregressive Diffusion World Model for Autonomous Driving | Kaiwen Zhang et.al. | 2506.24113v1 | null |
2025-06-30 | MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction | Antoine Guédon et.al. | 2506.24096v1 | null |
2025-06-30 | SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks | Rahul Kumar et.al. | 2506.24081v1 | null |
2025-06-30 | C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism | Mayank V. Golhar et.al. | 2506.24074v1 | null |
2025-06-30 | Spectroscopy of drive-induced unwanted state transitions in superconducting circuits | W. Dai et.al. | 2506.24070v1 | null |
2025-06-30 | Evolution models with time-dependent coefficients in friction and viscoelastic damping terms | Halit Sevki Aslan et.al. | 2506.24058v1 | null |
2025-06-30 | Ella: Embodied Social Agents with Lifelong Memory | Hongxin Zhang et.al. | 2506.24019v1 | null |
2025-06-27 | Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy | Yuhao Liu et.al. | 2506.22432v1 | null |
2025-06-27 | Single-shot HDR using conventional image sensor shutter functions and optical randomization | Xiang Dai et.al. | 2506.22426v1 | null |
2025-06-30 | Dehazing Light Microscopy Images with Guided Conditional Flow Matching: finding a sweet spot between fidelity and realism | Anirban Ray et.al. | 2506.22397v2 | null |
2025-06-27 | Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment | Yue Zhang et.al. | 2506.22385v1 | null |
2025-06-27 | Topological Defect Propagation to Classify Knitted Fabrics | Daisuke S. Shimamoto et.al. | 2506.22369v1 | null |
2025-06-27 | From Ground to Air: Noise Robustness in Vision Transformers and CNNs for Event-Based Vehicle Classification with Potential UAV Applications | Nouf Almesafri et.al. | 2506.22360v1 | null |
2025-06-27 | OutDreamer: Video Outpainting with a Diffusion Transformer | Linhao Zhong et.al. | 2506.22298v1 | null |
2025-06-27 | DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model | Yuliang Huang et.al. | 2506.22280v1 | null |
2025-06-27 | Almost abelian pseudo-Kähler Lie algebras | Diego Conti et.al. | 2506.22278v1 | null |
2025-06-27 | Boosting Classification with Quantum-Inspired Augmentations | Matthias Tschöpe et.al. | 2506.22241v1 | null |
2025-06-26 | Whole-Body Conditioned Egocentric Video Prediction | Yutong Bai et.al. | 2506.21552v1 | null |
2025-06-26 | SAM4D: Segment Anything in Camera and LiDAR Streams | Jianyun Xu et.al. | 2506.21547v1 | null |
2025-06-26 | ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers | Nicholas S. DiBrita et.al. | 2506.21537v1 | null |
2025-06-26 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Mohammed Baharoon et.al. | 2506.21535v1 | null |
2025-06-26 | The spectrum of global representations for families of bounded rank and VI-modules | Miguel Barrero et.al. | 2506.21525v1 | null |
2025-06-26 | MADrive: Memory-Augmented Driving Scene Modeling | Polina Karpikova et.al. | 2506.21520v1 | null |
2025-06-26 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Mohammed Rakib et.al. | 2506.21514v1 | null |
2025-06-26 | GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation | Wentao Hu et.al. | 2506.21513v1 | null |
2025-06-26 | Devising a solution to the problems of Cancer awareness in Telangana | Priyanka Avhad et.al. | 2506.21500v1 | null |
2025-06-26 | Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising | Hojat Asgariandehkordi et.al. | 2506.21499v1 | null |
2025-06-25 | Artificial Symmetry Breaking by Self-Interaction Error | Lin Hou et.al. | 2506.20662v1 | null |
2025-06-25 | EditP23: 3D Editing via Propagation of Image Prompts to Multi-View | Roi Bar-On et.al. | 2506.20652v1 | null |
2025-06-25 | Disentangled representations of microscopy images | Jacopo Dapueto et.al. | 2506.20649v1 | null |
2025-06-25 | rd-spiral: An open-source Python library for learning 2D reaction-diffusion dynamics through pseudo-spectral method | Sandy H. S. Herho et.al. | 2506.20633v1 | link |
2025-06-25 | Weighted Mean Frequencies: a handcraft Fourier feature for 4D Flow MRI segmentation | Simon Perrin et.al. | 2506.20614v1 | null |
2025-06-25 | Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings | Ankit Shah et.al. | 2506.20609v1 | null |
2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601v1 | null |
2025-06-25 | CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video | Wengxi Li et.al. | 2506.20600v1 | null |
2025-06-25 | WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration | Chaojun Ni et.al. | 2506.20590v1 | null |
2025-06-25 | TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness | Pritam Mishra et.al. | 2506.20588v1 | null |
2025-06-24 | Radial Attention: | Xingyang Li et.al. | 2506.19852v1 | null |
2025-06-24 | AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models | Zehuan Huang et.al. | 2506.19851v1 | null |
2025-06-24 | Unified Vision-Language-Action Model | Yuqi Wang et.al. | 2506.19850v1 | null |
2025-06-24 | GenHSI: Controllable Generation of Human-Scene Interaction Videos | Zekun Li et.al. | 2506.19840v1 | null |
2025-06-24 | Improving Progressive Generation with Decomposable Flow Matching | Moayed Haji-Ali et.al. | 2506.19839v1 | null |
2025-06-24 | SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution | Liangbin Xie et.al. | 2506.19838v1 | null |
2025-06-24 | MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | Yucheng Zhou et.al. | 2506.19835v1 | null |
2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833v1 | null |
2025-06-24 | How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text? | Abdullah Khondoker et.al. | 2506.19831v1 | null |
2025-06-25 | One Prototype Is Enough: Single-Prototype Activation for Interpretable Image Classification | Yitao Peng et.al. | 2506.19808v2 | null |
2025-06-23 | TC-Light: Temporally Consistent Relighting for Dynamic Long Videos | Yang Liu et.al. | 2506.18904v1 | null |
2025-06-23 | VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory | Runjia Li et.al. | 2506.18903v1 | null |
2025-06-23 | From Virtual Games to Real-World Play | Wenqiang Sun et.al. | 2506.18901v1 | null |
2025-06-23 | FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation | Kaiyi Huang et.al. | 2506.18899v1 | null |
2025-06-23 | MinD: Unified Visual Imagination and Control via Hierarchical World Models | Xiaowei Chi et.al. | 2506.18897v1 | null |
2025-06-23 | Steering Conceptual Bias via Transformer Latent-Subspace Activation | Vansh Sharma et.al. | 2506.18887v1 | null |
2025-06-23 | Universal Video Temporal Grounding with Generative Multi-modal Large Language Models | Zeqian Li et.al. | 2506.18883v1 | null |
2025-06-23 | Let Your Video Listen to Your Music! | Xinyu Zhang et.al. | 2506.18881v1 | null |
2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866v1 | null |
2025-06-23 | Pointwise-relatively-compact subgroups and trivial-weight-free representations | Alexandru Chirvasitu et.al. | 2506.18861v1 | null |
2025-06-20 | VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning | Zhangyang Qi et.al. | 2506.17221v1 | null |
2025-06-23 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et.al. | 2506.17220v2 | link |
2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et.al. | 2506.17201v1 | null |
2025-06-20 | YASMOT: Yet another stereo image multi-object tracker | Ketil Malde et.al. | 2506.17186v1 | null |
2025-06-20 | High-accuracy inference using HfO$_x$S$_y$/HfS$_2$ Memristors | Aferdita Xhameni et.al. | 2506.17174v1 | null |
2025-06-20 | Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network | Mahin Montasir Afif et.al. | 2506.17165v1 | null |
2025-06-20 | Affine semigroups without consecutive small elements | J. C. Rosales et.al. | 2506.17152v1 | null |
2025-06-20 | Do We Need Large VLMs for Spotting Soccer Actions? | Ritabrata Chakraborty et.al. | 2506.17144v1 | null |
2025-06-20 | MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification | David Jacob Drexlin et.al. | 2506.17140v1 | null |
2025-06-20 | Robust Training with Data Augmentation for Medical Imaging Classification | Josué Martínez-Martínez et.al. | 2506.17133v1 | null |
2025-06-18 | Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos | Kaifeng Zhang et.al. | 2506.15680v1 | null |
2025-06-20 | Sekai: A Video Dataset towards World Exploration | Zhen Li et.al. | 2506.15675v2 | null |
2025-06-18 | UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting | Kai He et.al. | 2506.15673v1 | null |
2025-06-18 | PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | Wenhao Li et.al. | 2506.15656v1 | null |
2025-06-18 | Oldies but Goldies: The Potential of Character N-grams for Romanian Texts | Dana Lupsa et.al. | 2506.15650v1 | null |
2025-06-18 | FindingDory: A Benchmark to Evaluate Memory in Embodied Agents | Karmesh Yadav et.al. | 2506.15635v1 | null |
2025-06-18 | GFLC: Graph-based Fairness-aware Label Correction for Fair Classification | Modar Sulaiman et.al. | 2506.15620v1 | null |
2025-06-18 | The Compositional Architecture of Regret in Large Language Models | Xiangxiang Cui et.al. | 2506.15617v1 | null |
2025-06-18 | TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data | Kentaro Seki et.al. | 2506.15614v1 | null |
2025-06-18 | BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion | Yuqing Lan et.al. | 2506.15610v1 | null |
2025-06-17 | GMT: General Motion Tracking for Humanoid Whole-Body Control | Zixuan Chen et.al. | 2506.14770v1 | null |
2025-06-17 | On the Hardness of Bandit Learning | Nataly Brukhim et.al. | 2506.14746v1 | null |
2025-06-17 | SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting | Ziqiao Peng et.al. | 2506.14742v1 | null |
2025-06-17 | Repulsive particle interactions enable selective information processing at cellular interfaces | Jenna Elliott et.al. | 2506.14739v1 | null |
2025-06-17 | Plug-and-Play with 2.5D Artifact Reduction Prior for Fast and Accurate Industrial Computed Tomography Reconstruction | Haley Duba-Sullivan et.al. | 2506.14719v1 | null |
2025-06-17 | Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models | Ling Li et.al. | 2506.14674v1 | null |
2025-06-17 | Quantifying Diagnostic Signal Decay in Dementia: A National Study of Medicare Hospitalization Data | Federica Spoto et.al. | 2506.14669v1 | null |
2025-06-17 | DDS-NAS: Dynamic Data Selection within Neural Architecture Search via On-line Hard Example Mining applied to Image Classification | Matt Poyser et.al. | 2506.14667v1 | null |
2025-06-18 | AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation | Leah von der Heyde et.al. | 2506.14634v2 | null |
2025-06-17 | Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits | Taisei Kato et.al. | 2506.14624v1 | null |
2025-06-16 | PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images | Lingteng Qiu et.al. | 2506.13766v1 | null |
2025-06-16 | Touch begins where vision ends: Generalizable policies for contact-rich manipulation | Zifan Zhao et.al. | 2506.13762v1 | null |
2025-06-17 | VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models | Edward Li et.al. | 2506.13754v2 | null |
2025-06-16 | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | Shova Kuikel et.al. | 2506.13746v1 | null |
2025-06-16 | Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks | Haoqing Li et.al. | 2506.13733v1 | null |
2025-06-16 | Probabilistic patient risk profiling with pair-copula constructions | Özge Şahin et.al. | 2506.13731v1 | null |
2025-06-16 | Contrastive Self-Supervised Learning As Neural Manifold Packing | Guanming Zhang et.al. | 2506.13717v1 | null |
2025-06-16 | TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Junru Zhang et.al. | 2506.13705v1 | null |
2025-06-16 | Eight-dimensional non completely reducible symplectic Lie algebras | T. Aït Aissa et.al. | 2506.13699v1 | null |
2025-06-16 | Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry | Junyoung Seo et.al. | 2506.13697v1 | null |
2025-06-13 | crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 | Navodini Wijethilake et.al. | 2506.12006v1 | null |
2025-06-13 | Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Dibya Ghosh et.al. | 2506.11967v1 | null |
2025-06-13 | Technical Evaluation of a Disruptive Approach in Homomorphic AI | Eric Filiol et.al. | 2506.11954v1 | null |
2025-06-13 | Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study | Greta Damo et.al. | 2506.11919v1 | null |
2025-06-13 | GeistBERT: Breathing Life into German NLP | Raphael Scheible-Schmitt et.al. | 2506.11903v1 | null |
2025-06-13 | A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification | Lu Zhang et.al. | 2506.11901v1 | null |
2025-06-13 | Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices | Lu Zhang et.al. | 2506.11892v1 | null |
2025-06-13 | Methods for evaluating the resolution of 3D data derived from satellite images | Christina Selby et.al. | 2506.11876v1 | null |
2025-06-13 | MindGrab for BrainChop: Fast and Accurate Skull Stripping for Command Line and Browser | Armina Fani et.al. | 2506.11860v1 | null |
2025-06-13 | 3D Skin Segmentation Methods in Medical Imaging: A Comparison | Martina Paccini et.al. | 2506.11852v1 | null |
2025-06-12 | InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model | Junqi You et.al. | 2506.10980v1 | null |
2025-06-12 | GenWorld: Towards Detecting AI-generated Real-world Simulation Videos | Weiliang Chen et.al. | 2506.10975v1 | null |
2025-06-12 | Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop | Justin Kerr et.al. | 2506.10968v1 | null |
2025-06-12 | Bias-Switchable Row-Column Array Imaging using Fast Orthogonal Row-Column Electronic Scanning (FORCES) Compared with Conventional Row-Column Array Imaging | Randy Palamar et.al. | 2506.10958v1 | null |
2025-06-12 | Coupled reaction and diffusion governing interface evolution in solid-state batteries | Jingxuan Ding et.al. | 2506.10944v1 | null |
2025-06-12 | VINCIE: Unlocking In-context Image Editing from Video | Leigang Qu et.al. | 2506.10941v1 | null |
2025-06-12 | Video-Mediated Emotion Disclosure: A Study of Mental Health Vlogging by People with Schizophrenia on YouTube | Jiaying Lizzy Liu et.al. | 2506.10932v1 | null |
2025-06-12 | On feature selection in double-imbalanced data settings: a Random Forest approach | Fabio Demaria et.al. | 2506.10929v1 | null |
2025-06-12 | Semi-Automated Quality Assurance in Digital Pathology: Tile Classification Approach | Meredith VandeHaar et.al. | 2506.10916v1 | null |
2025-06-12 | M4V: Multi-Modal Mamba for Text-to-Video Generation | Jiancheng Huang et.al. | 2506.10915v1 | null |
2025-06-11 | DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos | Chieh Hubert Lin et.al. | 2506.09997v1 | null |
2025-06-11 | PlayerOne: Egocentric World Simulator | Yuanpeng Tu et.al. | 2506.09995v1 | null |
2025-06-11 | Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages | Amel Muminovic et.al. | 2506.09992v1 | null |
2025-06-11 | Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes | Yiming Dou et.al. | 2506.09989v1 | null |
2025-06-11 | A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | Benno Krojer et.al. | 2506.09987v1 | null |
2025-06-11 | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Mido Assran et.al. | 2506.09985v1 | null |
2025-06-11 | InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions | Zhenzhi Wang et.al. | 2506.09984v1 | null |
2025-06-11 | ReSim: Reliable World Simulation for Autonomous Driving | Jiazhi Yang et.al. | 2506.09981v1 | null |
2025-06-11 | Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Junfei Wu et.al. | 2506.09965v1 | null |
2025-06-11 | Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Benjamin Reichman et.al. | 2506.09953v1 | null |
2025-06-10 | MagCache: Fast Video Generation with Magnitude-Aware Cache | Zehong Ma et.al. | 2506.09045v1 | null |
2025-06-10 | The Decoupled Risk Landscape in Performative Prediction | Javier Sanguino et.al. | 2506.09044v1 | null |
2025-06-10 | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Xuanchi Ren et.al. | 2506.09042v1 | null |
2025-06-10 | Princeton365: A Diverse Dataset with Accurate Camera Pose | Karhan Kayan et.al. | 2506.09035v1 | null |
2025-06-10 | DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging | Felix Wagner et.al. | 2506.09024v1 | null |
2025-06-10 | Employing self-supervised learning models for cross-linguistic child speech maturity classification | Theo Zhang et.al. | 2506.08999v1 | null |
2025-06-10 | Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Chenyu Lian et.al. | 2506.08990v1 | null |
2025-06-10 | Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder | Yuejiao Wang et.al. | 2506.08986v1 | null |
2025-06-10 | Diver-Robot Communication Dataset for Underwater Hand Gesture Recognition | Igor Kvasić et.al. | 2506.08974v1 | null |
2025-06-10 | Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System | Yuan Guo et.al. | 2506.08972v1 | null |
2025-06-09 | 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos | Zhen Xu et.al. | 2506.08015v1 | null |
2025-06-09 | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Xun Huang et.al. | 2506.08009v1 | null |
2025-06-09 | Dreamland: Controllable World Creation with Simulator and Generative Models | Sicheng Mo et.al. | 2506.08006v1 | null |
2025-06-09 | Dynamic View Synthesis as an Inverse Problem | Hidir Yesiltepe et.al. | 2506.08004v1 | null |
2025-06-09 | Audio-Sync Video Generation with Multi-Stream Temporal Control | Shuchen Weng et.al. | 2506.08003v1 | null |
2025-06-09 | Generative Modeling of Weights: Generalization or Memorization? | Boya Zeng et.al. | 2506.07998v1 | null |
2025-06-09 | UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References | Ming-Feng Li et.al. | 2506.07996v1 | null |
2025-06-09 | CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray | Mingquan Lin et.al. | 2506.07984v1 | null |
2025-06-09 | Scalable Machine Learning Models for Predicting Quantum Transport in Disordered 2D Hexagonal Materials | Seyed Mahdi Mastoor et.al. | 2506.07983v1 | null |
2025-06-09 | CyberV: Cybernetics for Test-time Scaling in Video Understanding | Jiahao Meng et.al. | 2506.07971v1 | null |
2025-06-06 | TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation | Muhammad Sohail Danish et.al. | 2506.06281v1 | null |
2025-06-06 | Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias | Yuanzhe Hu et.al. | 2506.06280v1 | null |
2025-06-06 | ExAct: A Video-Language Benchmark for Expert Action Analysis | Han Yi et.al. | 2506.06277v1 | null |
2025-06-06 | Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding | Emmanouil Zaranis et.al. | 2506.06275v1 | null |
2025-06-06 | BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading | Jonathan Schmidt et.al. | 2506.06271v1 | null |
2025-06-06 | Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection | Zofia Rudnicka et.al. | 2506.06265v1 | null |
2025-06-06 | Tuning of altermagnetism by strain | M. Khodas et.al. | 2506.06257v1 | null |
2025-06-06 | Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision | Yuping He et.al. | 2506.06253v1 | null |
2025-06-06 | Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection | Sahrish Khan et.al. | 2506.06238v1 | null |
2025-06-06 | Towards an Explainable Comparison and Alignment of Feature Embeddings | Mohammad Jalali et.al. | 2506.06231v1 | null |
2025-06-05 | VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | Hanoona Rasheed et.al. | 2506.05349v1 | null |
2025-06-05 | Neural Inverse Rendering from Propagating Light | Anagh Malik et.al. | 2506.05347v1 | null |
2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343v1 | null |
2025-06-05 | VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Ghazi Shazan Ahmad et.al. | 2506.05336v1 | null |
2025-06-05 | Unleashing Hour-Scale Video Training for Long Video-Language Understanding | Jingyang Lin et.al. | 2506.05332v1 | null |
2025-06-05 | AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | Lidong Lu et.al. | 2506.05328v1 | null |
2025-06-05 | LSM-2: Learning from Incomplete Wearable Sensor Data | Maxwell A. Xu et.al. | 2506.05321v1 | null |
2025-06-05 | ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation | Daniel Rho et.al. | 2506.05317v1 | null |
2025-06-05 | Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Weifeng Lin et.al. | 2506.05302v1 | null |
2025-06-05 | SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training | Jianyi Wang et.al. | 2506.05301v1 | null |
2025-06-04 | LayerFlow: A Unified Model for Layer-aware Video Generation | Sihui Ji et.al. | 2506.04228v1 | null |
2025-06-04 | Object-centric 3D Motion Field for Robot Learning from Human Videos | Zhao-Heng Yin et.al. | 2506.04227v1 | null |
2025-06-04 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Tianyu Huang et.al. | 2506.04225v1 | null |
2025-06-04 | Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset | Zirui Wang et.al. | 2506.04224v1 | null |
2025-06-04 | Topological Mixed States: Axiomatic Approaches and Phases of Matter | Tai-Hsuan Yang et.al. | 2506.04221v1 | null |
2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216v1 | null |
2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213v2 | null |
2025-06-04 | A Few Moments Please: Scalable Graphon Learning via Moment Matching | Reza Ramezanpour et.al. | 2506.04206v1 | null |
2025-06-04 | Synthetic multi-inversion time magnetic resonance images for visualization of subcortical structures | Savannah P. Hays et.al. | 2506.04173v1 | null |
2025-06-04 | Does Prompt Design Impact Quality of Data Imputation by LLMs? | Shreenidhi Srinivasan et.al. | 2506.04172v1 | null |
2025-06-03 | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Yuanze Lin et.al. | 2506.03150v1 | null |
2025-06-03 | Topology meets symmetry breaking: Hidden order, intrinsically gapless topological states and finite-temperature topological transitions | Reja H. Wilke et.al. | 2506.03146v1 | null |
2025-06-03 | Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval | Jiwen Yu et.al. | 2506.03141v1 | null |
2025-06-03 | CamCloneMaster: Enabling Reference-based Camera Control for Video Generation | Yawen Luo et.al. | 2506.03140v1 | null |
2025-06-03 | The perfect entangler spectrum as a tool to analyze crosstalk | Matthias G. Krauss et.al. | 2506.03137v1 | null |
2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126v1 | null |
2025-06-03 | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation | Zhengyao Lv et.al. | 2506.03123v1 | null |
2025-06-03 | Controllable Human-centric Keyframe Interpolation with Generative Prior | Zujin Guo et.al. | 2506.03119v1 | null |
2025-06-03 | HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers | Zhiyuan Yu et.al. | 2506.03118v1 | null |
2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099v1 | null |
2025-05-30 | Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Tajamul Ashraf et.al. | 2505.24876v1 | null |
2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873v1 | null |
2025-05-30 | SiLVR: A Simple Language-based Video Reasoning Framework | Ce Zhang et.al. | 2505.24869v1 | null |
2025-05-30 | Time Blindness: Why Video-Language Models Can't See What Humans Can? | Ujjwal Upadhyay et.al. | 2505.24867v1 | null |
2025-05-30 | TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection | Xinqi Xiong et.al. | 2505.24866v1 | null |
2025-05-30 | DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation | Zhao Mandi et.al. | 2505.24853v1 | null |
2025-05-30 | Reading Recognition in the Wild | Charig Yang et.al. | 2505.24848v1 | null |
2025-05-30 | VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Brandon Man et.al. | 2505.24838v1 | null |
2025-06-02 | Beyond Pretty Pictures: Combined Single- and Multi-Image Super-resolution for Sentinel-2 Images | Aditya Retnanto et.al. | 2505.24799v2 | null |
2025-05-30 | Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification | Xinliu Zhong et.al. | 2505.24792v1 | null |
2025-05-29 | Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | Haohan Chi et.al. | 2505.23757v1 | link |
2025-05-29 | Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | Diankun Wu et.al. | 2505.23747v1 | null |
2025-05-29 | Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need | Qiang Wang et.al. | 2505.23744v1 | null |
2025-05-29 | DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP | Amber Yijia Zheng et.al. | 2505.23743v1 | null |
2025-05-29 | MAGREF: Masked Guidance for Any-Reference Video Generation | Yufan Deng et.al. | 2505.23742v1 | link |
2025-05-29 | How Animals Dance (When You're Not Looking) | Xiaojuan Wang et.al. | 2505.23738v1 | null |
2025-05-30 | ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | Weijie Wang et.al. | 2505.23734v2 | link |
2025-05-29 | The ambiguous AT2022rze: Changing-look AGN mimicking a supernova in a merging galaxy system | P. J. Pessi et.al. | 2505.23731v1 | null |
2025-05-29 | Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning | Dionysis Christopoulos et.al. | 2505.23709v1 | null |
2025-05-29 | Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats | Utku Demir et.al. | 2505.23706v1 | null |
2025-05-29 | Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | Danny Driess et.al. | 2505.23705v1 | null |
2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647v1 | null |
2025-05-28 | PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization | Yezhi Shen et.al. | 2505.22616v1 | null |
2025-05-28 | Chest Disease Detection In X-Ray Images Using Deep Learning Classification Method | Alanna Hazlett et.al. | 2505.22609v1 | null |
2025-05-28 | Transformers for Secure Hardware Systems: Applications, Challenges, and Outlook | Banafsheh Saber Latibari et.al. | 2505.22605v1 | null |
2025-05-28 | Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans | Yiheng Li et.al. | 2505.22592v1 | null |
2025-05-28 | Tell me Habibi, is it Real or Fake? | Kartik Kuckreja et.al. | 2505.22581v1 | null |
2025-05-28 | Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels | Aravind R. Krishnan et.al. | 2505.22568v1 | null |
2025-05-28 | Universal Visuo-Tactile Video Understanding for Embodied Interaction | Yifan Xie et.al. | 2505.22566v1 | null |
2025-05-28 | PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion | Jaehyun Choi et.al. | 2505.22564v1 | null |
2025-05-28 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Changhao Song et.al. | 2505.22548v1 | null |
2025-05-27 | Frame In-N-Out: Unbounded Controllable Image-to-Video Generation | Boyang Wang et.al. | 2505.21491v1 | null |
2025-05-27 | Tissue-specific predictive performance: A unified estimation and inference framework for multi-category screening tests | A. Gregory DiRienzo et.al. | 2505.21482v1 | null |
2025-05-27 | M3S-UPD: Efficient Multi-Stage Self-Supervised Learning for Fine-Grained Encrypted Traffic Classification with Unknown Pattern Discovery | Yali Yuan et.al. | 2505.21462v1 | null |
2025-05-27 | LazyVLM: Neuro-Symbolic Approach to Video Analytics | Xiangru Jian et.al. | 2505.21459v1 | null |
2025-05-27 | OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers | Ziqiao Peng et.al. | 2505.21448v1 | null |
2025-05-27 | Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization | Vit Fojtik et.al. | 2505.21423v1 | null |
2025-05-27 | A Structured Unplugged Approach for Foundational AI Literacy in Primary Education | Maria Cristina Carrisi et.al. | 2505.21398v1 | null |
2025-05-27 | Dynamic Vision from EEG Brain Recordings: How much does EEG know? | Prajwal Singh et.al. | 2505.21385v1 | null |
2025-05-27 | ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding | Linshuang Diao et.al. | 2505.21381v1 | null |
2025-05-27 | Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? | Junhao Cheng et.al. | 2505.21374v1 | null |
2025-05-26 | Unleashing 5G Seamless Integration with TSN for Industry 5.0: Frame Forwarding and QoS Treatment | Oscar Adamuz-Hinojosa et.al. | 2505.20239v1 | null |
2025-05-26 | Research on feature fusion and multimodal patent text based on graph attention network | Zhenzhen Song et.al. | 2505.20188v1 | null |
2025-05-26 | Exposing Go's Hidden Bugs: A Novel Concolic Framework | Karolina Gorna et.al. | 2505.20183v1 | null |
2025-05-26 | Long-Context State-Space Video World Models | Ryan Po et.al. | 2505.20171v1 | null |
2025-05-26 | DeepInverse: A Python package for solving imaging inverse problems with deep learning | Julián Tachella et.al. | 2505.20160v1 | null |
2025-05-26 | HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters | Yi Chen et.al. | 2505.20156v1 | null |
2025-05-26 | UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models | Xueyan Zhang et.al. | 2505.20154v1 | null |
2025-05-26 | Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases | Cheng-Yu Tai et.al. | 2505.20149v1 | null |
2025-05-26 | FairTalk: Facilitating Balanced Participation in Video Conferencing by Implicit Visualization of Predicted Turn-Grabbing Intention | Ryo Iijima et.al. | 2505.20138v1 | null |
2025-05-26 | TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | Fanheng Kong et.al. | 2505.20124v1 | link |
2025-05-23 | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | Zizhang Li et.al. | 2505.18151v1 | null |
2025-05-23 | TokBench: Evaluating Your Visual Tokenizer before Visual Generation | Junfeng Wu et.al. | 2505.18142v1 | null |
2025-05-23 | VideoGameBench: Can Vision-Language Models complete popular video games? | Alex L. Zhang et.al. | 2505.18134v1 | null |
2025-05-23 | TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations | Alan Arazi et.al. | 2505.18125v1 | null |
2025-05-23 | Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM | Zinuo Li et.al. | 2505.18110v1 | null |
2025-05-23 | Accelerating Learned Image Compression Through Modeling Neural Training Dynamics | Yichi Zhang et.al. | 2505.18107v1 | null |
2025-05-23 | F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles | Varun Ajith et.al. | 2505.18106v1 | null |
2025-05-23 | Structural Dynamics of Harmful Content Dissemination on WhatsApp | Yuxin Liu et.al. | 2505.18099v1 | null |
2025-05-23 | DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations | Ziqiao Peng et.al. | 2505.18096v1 | null |
2025-05-23 | Early-Exit Graph Neural Networks | Andrea Giuseppe Di Francesco et.al. | 2505.18088v1 | null |
2025-05-22 | CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Shilin Yan et.al. | 2505.17020v1 | link |
2025-05-22 | Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space | Yan Li et.al. | 2505.17011v1 | null |
2025-05-22 | Topological Phases, Criticality, and Mixed State Order in a Hubbard Quantum Simulator | Lin Su et.al. | 2505.17009v1 | null |
2025-05-22 | Deep mineralogical segmentation of thin section images based on QEMSCAN maps | Jean Pablo Vieira de Mello et.al. | 2505.17008v1 | link |
2025-05-22 | CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning | Jiange Yang et.al. | 2505.17006v1 | null |
2025-05-22 | Seeing through Satellite Images at Street Views | Ming Qian et.al. | 2505.17001v1 | null |
2025-05-22 | Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction | Dong Li et.al. | 2505.16980v1 | null |
2025-05-22 | MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning | Suhao Yu et.al. | 2505.16964v1 | null |
2025-05-22 | On Multilingual Encoder Language Model Compression for Low-Resource Languages | Daniil Gurgurov et.al. | 2505.16956v1 | null |
2025-05-22 | On a certain class of para-Hermite Einstein spaces | Adam Chudecki et.al. | 2505.16945v1 | null |
2025-05-21 | Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization | Satoshi Kosugi et.al. | 2505.15812v1 | link |
2025-05-21 | Adaptive Estimation and Learning under Temporal Distribution Shift | Dheeraj Baby et.al. | 2505.15803v1 | null |
2025-05-21 | Interspatial Attention for Efficient 4D Human Video Generation | Ruizhi Shao et.al. | 2505.15800v1 | null |
2025-05-21 | Large Language Models as Computable Approximations to Solomonoff Induction | Jun Wan et.al. | 2505.15784v1 | null |
2025-05-21 | Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention | Huanxuan Liao et.al. | 2505.15774v1 | null |
2025-05-21 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | Cheng Yifan et.al. | 2505.15772v1 | null |
2025-05-21 | Neuro-Argumentative Learning with Case-Based Reasoning | Adam Gould et.al. | 2505.15742v1 | null |
2025-05-21 | iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification | Sarfraz Ahmad et.al. | 2505.15730v1 | null |
2025-05-21 | Privacy-Preserving Conformal Prediction Under Local Differential Privacy | Coby Penso et.al. | 2505.15721v1 | null |
2025-05-21 | MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation | Maike Behrendt et.al. | 2505.15696v1 | null |
2025-05-20 | Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Sucheng Ren et.al. | 2505.14687v1 | link |
2025-05-20 | Emerging Properties in Unified Multimodal Pretraining | Chaorui Deng et.al. | 2505.14683v1 | null |
2025-05-20 | ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | Bufang Yang et.al. | 2505.14668v1 | null |
2025-05-20 | EmoGist: Efficient In-Context Learning for Visual Emotion Understanding | Ronald Seoh et.al. | 2505.14660v1 | null |
2025-05-20 | Beyond Words: Multimodal LLM Knows When to Speak | Zikai Liao et.al. | 2505.14654v1 | null |
2025-05-20 | VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | Wentao Ma et.al. | 2505.14640v1 | null |
2025-05-20 | A General Framework for Group Sparsity in Hyperspectral Unmixing Using Endmember Bundles | Gokul Bhusal et.al. | 2505.14634v1 | null |
2025-05-20 | Parabolic quantum affine algebras | Kudret Bostanci et.al. | 2505.14624v1 | null |
2025-05-20 | Assessing Projected Quantum Kernels for the Classification of IoT Data | Francesco D'Amore et.al. | 2505.14593v1 | null |
2025-05-20 | Automated Fetal Biometry Assessment with Deep Ensembles using Sparse-Sampling of 2D Intrapartum Ultrasound Images | Jayroop Ramesh et.al. | 2505.14572v1 | null |
2025-05-19 | Unlocking Non-Invasive Brain-to-Text | Dulhan Jayalath et.al. | 2505.13446v1 | null |
2025-05-19 | GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation | Abhay Deshpande et.al. | 2505.13441v1 | null |
2025-05-19 | Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Ruoyu Wang et.al. | 2505.13440v1 | link |
2025-05-19 | FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | Dian Shao et.al. | 2505.13437v1 | null |
2025-05-19 | Synthetic-Powered Predictive Inference | Meshi Bashari et.al. | 2505.13432v1 | null |
2025-05-19 | Understanding Complexity in VideoQA via Visual Program Generation | Cristobal Eyzaguirre et.al. | 2505.13429v1 | null |
2025-05-19 | GuidedMorph: Two-Stage Deformable Registration for Breast MRI | Yaqian Chen et.al. | 2505.13414v1 | null |
2025-05-19 | Faster Video Diffusion with Trainable Sparse Attention | Peiyuan Zhang et.al. | 2505.13389v1 | null |
2025-05-19 | RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers | Ahmet Berke Gokmen et.al. | 2505.13344v1 | null |
2025-05-19 | Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems | Babak Badnava et.al. | 2505.13337v1 | null |
2025-05-16 | QVGen: Pushing the Limit of Quantized Video Generative Models | Yushi Huang et.al. | 2505.11497v1 | null |
2025-05-16 | SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics | Lizhi Yang et.al. | 2505.11494v1 | null |
2025-05-16 | EMU/GAMA: A new approach to characterising radio luminosity functions | J. Prathap et.al. | 2505.11453v1 | null |
2025-05-16 | GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI | Marc-Antoine Fortin et.al. | 2505.11445v1 | link |
2025-05-16 | GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art | Chenkai Zhang et.al. | 2505.11436v1 | null |
2025-05-16 | Neuromorphic Imaging Flow Cytometry combined with Adaptive Recurrent Spiking Neural Networks | Georgios Moustakas et.al. | 2505.11433v1 | null |
2025-05-16 | Face Consistency Benchmark for GenAI Video | Michal Podstawski et.al. | 2505.11425v1 | null |
2025-05-16 | Energy efficiency analysis of Spiking Neural Networks for space applications | Paolo Lunghi et.al. | 2505.11418v1 | null |
2025-05-16 | Uncertainty quantification with approximate variational learning for wearable photoplethysmography prediction tasks | Ciaran Bench et.al. | 2505.11412v1 | null |
2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404v1 | null |
2025-05-15 | 3D-Fixup: Advancing Photo Editing with 3D Priors | Yen-Chi Cheng et.al. | 2505.10566v1 | null |
2025-05-15 | Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data | Yiwen Liu et.al. | 2505.10551v1 | link |
2025-05-15 | Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning | Milan Ganai et.al. | 2505.10547v1 | null |
2025-05-15 | AORRTC: Almost-Surely Asymptotically Optimal Planning with RRT-Connect | Tyler Wilson et.al. | 2505.10542v1 | null |
2025-05-15 | LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps | Filippo Olimpieri et.al. | 2505.10537v1 | null |
2025-05-15 | Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming | Mohammad Ghalavand et.al. | 2505.10536v1 | null |
2025-05-15 | Sobolev and quasiconformal distortion of intermediate dimension with applications to conformal dimension | Jonathan M. Fraser et.al. | 2505.10525v1 | null |
2025-05-15 | The Devil Is in the Word Alignment Details: On Translation-Based Cross-Lingual Transfer for Token Classification Tasks | Benedikt Ebing et.al. | 2505.10507v1 | null |
2025-05-16 | WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer | Yifan Gao et.al. | 2505.10502v2 | null |
2025-05-15 | Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio | Tu Duyen Nguyen et.al. | 2505.10500v1 | null |
2025-05-14 | UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing | Yung-Hsuan Lai et.al. | 2505.09615v1 | null |
2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et.al. | 2505.09601v1 | null |
2025-05-14 | Rhomboid Tiling for Geometric Graph Deep Learning | Yipeng Zhang et.al. | 2505.09586v1 | null |
2025-05-14 | VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation | Chaofan Zhang et.al. | 2505.09577v1 | null |
2025-05-14 | Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations | Maik Dannecker et.al. | 2505.09565v1 | null |
2025-05-14 | Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne et.al. | 2505.09561v1 | null |
2025-05-14 | Phase domain walls in coherently driven Bose-Einstein condensates | S. S. Gavrilov et.al. | 2505.09553v1 | null |
2025-05-14 | Learned Free-Energy Functionals from Pair-Correlation Matching for Dynamical Density Functional Theory | Karnik Ram et.al. | 2505.09543v1 | null |
2025-05-14 | Multimodal transformers with elemental priors for phase classification of X-ray diffraction spectra | Kangyu Ji et.al. | 2505.09536v1 | null |
2025-05-14 | Contactless Cardiac Pulse Monitoring Using Event Cameras | Mohamed Moustafa et.al. | 2505.09529v1 | link |
2025-05-13 | UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations | Hanjung Kim et.al. | 2505.08787v1 | null |
2025-05-13 | PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework | Abhineet Agarwal et.al. | 2505.08784v1 | null |
2025-05-13 | Implet: A Post-hoc Subsequence Explainer for Time Series Models | Fanyu Meng et.al. | 2505.08748v1 | link |
2025-05-13 | Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion | Huiyan Qi et.al. | 2505.08747v1 | null |
2025-05-13 | TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | Xiaolei Qin et.al. | 2505.08723v1 | link |
2025-05-13 | Contrastive Normalizing Flows for Uncertainty-Aware Parameter Estimation | Ibrahim Elsharkawy et.al. | 2505.08709v1 | null |
2025-05-13 | Big Data and the Computational Social Science of Entrepreneurship and Innovation | Ningzi Li et.al. | 2505.08706v1 | null |
2025-05-13 | LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs | K M Sajjadul Islam et.al. | 2505.08704v1 | null |
2025-05-14 | Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities | George Saon et.al. | 2505.08699v2 | null |
2025-05-13 | VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation | Badhan Kumar Das et.al. | 2505.08693v1 | null |
2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818v1 | null |
2025-05-12 | Pixel Motion as Universal Representation for Robot Control | Kanchana Ranasinghe et.al. | 2505.07817v1 | null |
2025-05-12 | DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies | Tony Tao et.al. | 2505.07813v1 | null |
2025-05-12 | Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets | Weiyu Li et.al. | 2505.07747v1 | null |
2025-05-12 | BodyGPS: Anatomical Positioning System | Halid Ziya Yerebakan et.al. | 2505.07744v1 | null |
2025-05-13 | VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections | Eason Chen et.al. | 2505.07736v2 | null |
2025-05-12 | Spoken Language Understanding on Unseen Tasks With In-Context Learning | Neeraj Agrawal et.al. | 2505.07731v1 | null |
2025-05-12 | Gameplay Highlights Generation | Vignesh Edithal et.al. | 2505.07721v1 | null |
2025-05-12 | PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes | Daniel Ogenrwot et.al. | 2505.07700v1 | null |
2025-05-12 | ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation | Feng Yuan et.al. | 2505.07687v1 | link |
2025-05-09 | Adapting a Segmentation Foundation Model for Medical Image Classification | Pengfei Gu et.al. | 2505.06217v1 | null |
2025-05-09 | Topo-VM-UNetV2: Encoding Topology into Vision Mamba UNet for Polyp Segmentation | Diego Adame et.al. | 2505.06210v1 | null |
2025-05-09 | Leveraging Multi-Task Learning for Multi-Label Power System Security Assessment | Muhy Eddin Za'ter et.al. | 2505.06207v1 | null |
2025-05-09 | Auto Tensor Singular Value Thresholding: A Non-Iterative and Rank-Free Framework for Tensor Denoising | Hiroki Hasegawa et.al. | 2505.06203v1 | null |
2025-05-09 | Neuro-Symbolic Concepts | Jiayuan Mao et.al. | 2505.06191v1 | null |
2025-05-09 | Brain Hematoma Marker Recognition Using Multitask Learning: SwinTransformer and Swin-Unet | Kodai Hirata et.al. | 2505.06185v1 | null |
2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182v1 | null |
2025-05-09 | New Advances in Phonons: From Band Topology to Quasiparticle Chirality | Tiantian Zhang et.al. | 2505.06179v1 | null |
2025-05-09 | MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | Wenqi Zeng et.al. | 2505.06152v1 | link |
2025-05-09 | Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework | Alice Rueda et.al. | 2505.06151v1 | null |
2025-05-08 | SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | Yonwoo Choi et.al. | 2505.05475v1 | link |
2025-05-08 | 3D Scene Generation: A Survey | Beichen Wen et.al. | 2505.05474v1 | link |
2025-05-08 | StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | Haibo Wang et.al. | 2505.05467v1 | null |
2025-05-08 | SITE: towards Spatial Intelligence Thorough Evaluation | Wenqi Wang et.al. | 2505.05456v1 | null |
2025-05-08 | Robustly optimal dynamics for active matter reservoir computing | Mario U. Gaimann et.al. | 2505.05420v1 | null |
2025-05-08 | DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing | Nilesh Prasad Pandey et.al. | 2505.05413v1 | null |
2025-05-08 | Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It | Marvin F. da Silva et.al. | 2505.05409v1 | null |
2025-05-08 | CART-ELC: Oblique Decision Tree Induction via Exhaustive Search | Andrew D. Laack et.al. | 2505.05402v1 | link |
2025-05-08 | OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation | Naveenkumar G Venkataswamy et.al. | 2505.05374v1 | null |
2025-05-08 | BMS representations for generic supermomentum | Xavier Bekaert et.al. | 2505.05368v1 | null |
2025-05-07 | Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait | Feng Liu et.al. | 2505.04616v1 | null |
2025-05-07 | Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems | Mohammad Merati et.al. | 2505.04596v1 | null |
2025-05-07 | Relative benefits of different active learning methods to conceptual physics learning | Meagan Sundstrom et.al. | 2505.04577v1 | null |
2025-05-07 | Multitask LSTM for Arboviral Outbreak Prediction Using Public Health Data | Lucas R. C. Farias et.al. | 2505.04566v1 | null |
2025-05-07 | Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration | Asma Baobaid et.al. | 2505.04524v1 | null |
2025-05-07 | Complementary legs and symplectic rational balls | John B. Etnyre et.al. | 2505.04513v1 | null |
2025-05-08 | HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Teng Hu et.al. | 2505.04512v2 | null |
2025-05-07 | Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition | Asma Baobaid et.al. | 2505.04502v1 | null |
2025-05-08 | FA-KPConv: Introducing Euclidean Symmetries to KPConv via Frame Averaging | Ali Alawieh et.al. | 2505.04485v2 | null |
2025-05-07 | Securing Immersive 360 Video Streams through Attribute-Based Selective Encryption | Mohammad Waquas Usmani et.al. | 2505.04466v1 | null |
2025-05-06 | Multi-Agent System for Comprehensive Soccer Understanding | Jiayuan Rao et.al. | 2505.03735v1 | null |
2025-05-06 | FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios | Shiyi Zhang et.al. | 2505.03730v1 | null |
2025-05-07 | Visual Imitation Enables Contextual Humanoid Control | Arthur Allshire et.al. | 2505.03729v2 | null |
2025-05-06 | DISARM++: Beyond scanner-free harmonization | Luca Caldera et.al. | 2505.03715v1 | null |
2025-05-06 | NBF at SemEval-2025 Task 5: Light-Burst Attention Enhanced System for Multilingual Subject Recommendation | Baharul Islam et.al. | 2505.03711v1 | null |
2025-05-06 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | François Role et.al. | 2505.03703v1 | null |
2025-05-06 | Neural Integral Operators for Inverse problems in Spectroscopy | Emanuele Zappala et.al. | 2505.03677v1 | null |
2025-05-06 | Vector valued optimal transport: from dynamic to static formulations | Katy Craig et.al. | 2505.03670v1 | null |
2025-05-06 | m-accretive extensions of Friedrichs operators | Krešimir Burazin et.al. | 2505.03657v1 | null |
2025-05-06 | ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders | Chethan Krishnamurthy Ramanaik et.al. | 2505.03646v1 | null |
2025-05-06 | Towards Application-Specific Evaluation of Vision Models: Case Studies in Ecology and Biology | Alex Hoi Hang Chan et.al. | 2505.02825v2 | null |
2025-05-05 | Towards Quantifying the Hessian Structure of Neural Networks | Zhaorui Dong et.al. | 2505.02809v1 | null |
2025-05-05 | Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow | Jai Prakash Veerla et.al. | 2505.02780v1 | null |
2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et.al. | 2505.02770v1 | null |
2025-05-05 | The use of Artificial Intelligence for Intervention and Assessment in Individuals with ASD | Aggeliki Sideraki et.al. | 2505.02747v1 | null |
2025-05-05 | The Spectrum of Stable Infinity Categories with Actions | Hisato Matsukawa et.al. | 2505.02724v1 | null |
2025-05-05 | A Rate-Quality Model for Learned Video Coding | Sang NguyenQuang et.al. | 2505.02720v1 | null |
2025-05-05 | Searching for supermassive black holes binaries within SRG/eROSITA-De I: Properties of the X-ray selected candidates | D. Tubín-Arenas et.al. | 2505.02708v1 | null |
2025-05-05 | Multi-View Learning with Context-Guided Receptance for Image Denoising | Binghong Chen et.al. | 2505.02705v1 | null |
2025-05-05 | A Survey on Progress in LLM Alignment from the Perspective of Reward Design | Miaomiao Ji et.al. | 2505.02666v1 | null |
2025-05-02 | GENMO: A GENeralist Model for Human MOtion | Jiefeng Li et.al. | 2505.01425v1 | null |
2025-05-02 | VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models | Mohammadreza Teymoorianfard et.al. | 2505.01406v1 | null |
2025-05-02 | Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes | Wallace Peaslee et.al. | 2505.01388v1 | null |
2025-05-02 | Emerging Media Use and Acceptance of Digital Immortality: A Cluster Analysis among Chinese Young Generations | Yi Mou et.al. | 2505.01355v1 | null |
2025-05-02 | How to Learn a Star: Binary Classification with Starshaped Polyhedral Sets | Marie-Charlotte Brandenburg et.al. | 2505.01346v1 | null |
2025-05-02 | Classifying Radio-Loud and Radio-Quiet Quasars With Novel PCA Based Regression Classifier | Ramkrishna Joshi et.al. | 2505.01335v1 | null |
2025-05-02 | DebtStreamness: An Ecological Approach to Credit Flows in Inter-Firm Networks | Anahí Rodríguez-Martínez et.al. | 2505.01326v1 | null |
2025-05-02 | Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System | Sheikh Samit Muhaimin et.al. | 2505.01315v1 | null |
2025-05-02 | Contactless pulse rate assessment: Results and insights for application in driving simulator | Đorđe D. Nešković et.al. | 2505.01299v1 | null |
2025-05-02 | ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow | Changhe Chen et.al. | 2505.01288v1 | null |
2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704v1 | null |
2025-05-01 | GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution | Aditya Arora et.al. | 2505.00687v1 | null |
2025-05-01 | MINERVA: Evaluating Complex Video Reasoning | Arsha Nagrani et.al. | 2505.00681v1 | null |
2025-05-01 | Rational points on | Sachi Hashimoto et.al. | 2505.00680v1 | null |
2025-05-01 | Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI | Merve Gülle et.al. | 2505.00643v1 | null |
2025-05-01 | Bayes-Optimal Fair Classification with Multiple Sensitive Features | Yi Yang et.al. | 2505.00631v1 | null |
2025-05-01 | Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis | Zhongying Deng et.al. | 2505.00627v1 | null |
2025-05-01 | Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction | Simon Giebenhain et.al. | 2505.00615v1 | null |
2025-05-01 | Dietary Intake Estimation via Continuous 3D Reconstruction of Food | Wallace Lee et.al. | 2505.00606v1 | null |
2025-05-01 | Visual Trajectory Prediction of Vessels for Inland Navigation | Alexander Puzicha et.al. | 2505.00599v1 | null |
2025-04-30 | ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Qihao Liu et.al. | 2504.21855v1 | null |
2025-04-30 | A Survey of Interactive Generative Video | Jiwen Yu et.al. | 2504.21853v1 | null |
2025-04-30 | Active Light Modulation to Counter Manipulation of Speech Visual Content | Hadleigh Schwartz et.al. | 2504.21846v1 | null |
2025-04-30 | Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic | Mikihisa Yuasa et.al. | 2504.21841v1 | null |
2025-04-30 | Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat | Clark Mingxuan Ju et.al. | 2504.21838v1 | null |
2025-04-30 | Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization | Anas Anwarul Haq Khan et.al. | 2504.21831v1 | null |
2025-04-30 | Discrete series for the graded Hecke algebra of type | Kei Yuen Chan et.al. | 2504.21790v1 | null |
2025-04-30 | LoC-LIC: Low Complexity Learned Image Coding Using Hierarchical Feature Transforms | Ayman A. Ameen et.al. | 2504.21778v1 | null |
2025-04-30 | Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline | Minwoo Oh et.al. | 2504.21772v1 | null |
2025-04-30 | Ends of the strata of differentials | Benjamin Dozier et.al. | 2504.21756v1 | null |
2025-04-29 | TesserAct: Learning 4D Embodied World Models | Haoyu Zhen et.al. | 2504.20995v1 | null |
2025-04-29 | Photonic Quantum Convolutional Neural Networks with Adaptive State Injection | Léo Monbroussou et.al. | 2504.20989v1 | null |
2025-04-29 | SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features | Mete Erdogan et.al. | 2504.20970v1 | null |
2025-04-29 | Soft-X-ray momentum microscopy of nonlinear magnon interactions below 100-nm wavelength | Steffen Wittrock et.al. | 2504.20958v1 | null |
2025-04-30 | DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition | Yanghui Song et.al. | 2504.20948v2 | link |
2025-04-29 | Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity | Taisuke Kobayashi et.al. | 2504.20932v1 | null |
2025-04-29 | Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers | Quentin Guimard et.al. | 2504.20902v1 | null |
2025-04-29 | CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models | Hasan Md Tusfiqur Alam et.al. | 2504.20898v1 | null |
2025-04-29 | Imaging on the Edge: Mapping Object Corners and Edges with Stereo X-ray Tomography | Zhenduo Shang et.al. | 2504.20892v1 | null |
2025-04-30 | Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks | Junyuan Fang et.al. | 2504.20869v2 | null |
2025-04-28 | Learning Streaming Video Representation via Multitask Training | Yibin Yan et.al. | 2504.20041v1 | null |
2025-04-28 | Pan-genome Analysis of Angiosperm Plastomes using PGR-TK | Manoj P. Samanta et.al. | 2504.20034v1 | null |
2025-04-28 | Towards AI-Driven Policing: Interdisciplinary Knowledge Discovery from Police Body-Worn Camera Footage | Anita Srbinovska et.al. | 2504.20007v1 | null |
2025-04-28 | Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose | Narges Rashvand et.al. | 2504.19970v1 | null |
2025-04-28 | Enhancing Quality for VVC Compressed Videos with Omniscient Quality Enhancement Model | Xiem HoangVan et.al. | 2504.19935v1 | null |
2025-04-28 | Accelerated 3D-3D rigid registration of echocardiographic images obtained from apical window using particle filter | Thanuja Uruththirakodeeswaran et.al. | 2504.19930v1 | null |
2025-04-28 | Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI | Hugo Georgenthum et.al. | 2504.19918v1 | null |
2025-04-28 | Breast Cancer Detection from Multi-View Screening Mammograms with Visual Prompt Tuning | Han Chen et.al. | 2504.19900v1 | null |
2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898v1 | null |
2025-04-28 | CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition | Quynh Phung et.al. | 2504.19894v1 | null |
2025-04-25 | RSFR: A Coarse-to-Fine Reconstruction Framework for Diffusion Tensor Cardiac MRI with Semantic-Aware Refinement | Jiahao Huang et.al. | 2504.18520v1 | null |
2025-04-25 | Co-Change Graph Entropy: A New Process Metric for Defect Prediction | Ethari Hrishikesh et.al. | 2504.18511v1 | null |
2025-04-25 | Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models | Patrick Müller et.al. | 2504.18510v1 | null |
2025-04-25 | SymTFT, Protected Gaplessness, and Spontaneous Breaking of Non-invertible Symmetries | Michele Del Zotto et.al. | 2504.18501v1 | null |
2025-04-25 | Quasi-Einstein structures and Hitchin's equations | Alex Colling et.al. | 2504.18475v1 | null |
2025-04-25 | A Novel Taxonomy and Classification Scheme for Code Smell Interactions | Ruchin Gupta et.al. | 2504.18469v1 | null |
2025-04-25 | A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression | Muzaffar Qureshi et.al. | 2504.18463v1 | null |
2025-04-25 | Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training | Hiroki Naganuma et.al. | 2504.18454v1 | null |
2025-04-25 | NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration | Haotian Dong et.al. | 2504.18448v1 | null |
2025-04-25 | Iterative Event-based Motion Segmentation by Variational Contrast Maximization | Ryo Yamaki et.al. | 2504.18447v1 | null |
2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788v1 | null |
2025-04-24 | Silenzio: Secure Non-Interactive Outsourced MLP Training | Jonas Sander et.al. | 2504.17785v1 | null |
2025-04-24 | Disaggregated Deep Learning via In-Physics Computing at Radio Frequency | Zhihui Gao et.al. | 2504.17752v1 | null |
2025-04-24 | MSGCN: Multiplex Spatial Graph Convolution Network for Interlayer Link Weight Prediction | Steven E. Wilson et.al. | 2504.17749v1 | null |
2025-04-24 | Interpretable Early Detection of Parkinson's Disease through Speech Analysis | Lorenzo Simone et.al. | 2504.17739v1 | null |
2025-04-24 | CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos | Shucheng Gong et.al. | 2504.17728v1 | null |
2025-04-24 | Unsupervised EEG-based decoding of absolute auditory attention with canonical correlation analysis | Nicolas Heintz et.al. | 2504.17724v1 | null |
2025-04-24 | Evaluating Uncertainty in Deep Gaussian Processes | Matthijs van der Lende et.al. | 2504.17719v1 | null |
2025-04-24 | Early Detection of Multidrug Resistance Using Multivariate Time Series Analysis and Interpretable Patient-Similarity Representations | Óscar Escudero-Arnanz et.al. | 2504.17717v1 | null |
2025-04-24 | Self-Supervised Noise Adaptive MRI Denoising via Repetition to Repetition (Rep2Rep) Learning | Nikola Janjušević et.al. | 2504.17698v1 | null |
2025-04-23 | I-Con: A Unifying Framework for Representation Learning | Shaden Alshammari et.al. | 2504.16929v1 | null |
2025-04-23 | Year six photometric measurements of known Trans-Neptunian Objects and Centaurs by the Dark Energy Survey | Feliphe S. Ferreira et.al. | 2504.16927v1 | null |
2025-04-23 | Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving | Jacob Levy et.al. | 2504.16923v1 | null |
2025-04-23 | Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text | Shifali Agrahari et.al. | 2504.16913v1 | null |
2025-04-23 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | Ruotong Wang et.al. | 2504.16907v1 | null |
2025-04-23 | A new approach to the classification of almost contact metric manifolds via intrinsic endomorphisms | Ilka Agricola et.al. | 2504.16900v1 | null |
2025-04-23 | Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Alexander Shvets et.al. | 2504.16856v1 | null |
2025-04-23 | Energetics of the nucleation and glide of disconnection modes in symmetric tilt grain boundaries | Himanshu Joshi et.al. | 2504.16854v1 | null |
2025-04-23 | A Low-Cost Photogrammetry System for 3D Plant Modeling and Phenotyping | Joe Hrzich et.al. | 2504.16840v1 | null |
2025-04-23 | Symbiotic stars in the era of modern ground- and space-based surveys | Jaroslav Merc et.al. | 2504.16825v1 | null |
2025-04-22 | MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Yucheng Li et.al. | 2504.16083v1 | null |
2025-04-22 | MR. Video: "MapReduce" is the Principle for Long Video Understanding | Ziqi Pang et.al. | 2504.16082v1 | null |
2025-04-22 | Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Yimu Wang et.al. | 2504.16081v1 | null |
2025-04-22 | Describe Anything: Detailed Localized Image and Video Captioning | Long Lian et.al. | 2504.16072v1 | null |
2025-04-22 | Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis | Frank Li et.al. | 2504.16047v1 | null |
2025-04-22 | LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | Joya Chen et.al. | 2504.16030v1 | null |
2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016v1 | null |
2025-04-22 | MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment | Yachun Mi et.al. | 2504.16003v1 | null |
2025-04-22 | Neuroadaptive Haptics: Comparing Reinforcement Learning from Explicit Ratings and Neural Signals for Adaptive XR Systems | Lukas Gehrke et.al. | 2504.15984v1 | null |
2025-04-22 | Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling | Sophie C. Pope et.al. | 2504.15972v1 | null |
2025-04-22 | DRAWER: Digital Reconstruction and Articulation With Environment Realism | Hongchi Xia et.al. | 2504.15278v2 | null |
2025-04-21 | Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Guo Chen et.al. | 2504.15271v1 | null |
2025-04-21 | An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes | Ji Qi et.al. | 2504.15270v1 | null |
2025-04-21 | Diffusion Bridge Models for 3D Medical Image Translation | Shaorong Zhang et.al. | 2504.15267v1 | null |
2025-04-21 | SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam | Tue Vo et.al. | 2504.15252v1 | null |
2025-04-21 | On Walker and para-Hermite Einstein spaces | Adam Chudecki et.al. | 2504.15221v1 | null |
2025-04-22 | Histogram-based Parameter-efficient Tuning for Passive Sonar Classification | Amirmohammad Mohammadi et.al. | 2504.15214v2 | null |
2025-04-21 | Automated Measurement of Eczema Severity with Self-Supervised Learning | Neelesh Kumar et.al. | 2504.15193v1 | null |
2025-04-21 | Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Xianpan Zhou et.al. | 2504.15182v1 | null |
2025-04-21 | FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image | Fei Yin et.al. | 2504.15179v1 | null |
2025-04-18 | Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models | Junjie Yang et.al. | 2504.13825v1 | null |
2025-04-18 | CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning | Yang Yue et.al. | 2504.13820v1 | link |
2025-04-18 | The Binary and Ternary Quantization Can Improve Feature Discrimination | Weizhi Lu et.al. | 2504.13792v1 | null |
2025-04-18 | Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization | Aman Agarwal et.al. | 2504.13776v1 | null |
2025-04-18 | Detecting Malicious Source Code in PyPI Packages with LLMs: Does RAG Come in Handy? | Motunrayo Ibiyo et.al. | 2504.13769v1 | null |
2025-04-18 | Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback | Peyman Jahanbin et.al. | 2504.13765v1 | null |
2025-04-18 | Fragile Watermarking for Image Certification Using Deep Steganographic Embedding | Davide Ghiani et.al. | 2504.13759v1 | null |
2025-04-18 | Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis | Zhu Zhu et.al. | 2504.13754v1 | null |
2025-04-18 | LimitNet: Progressive, Content-Aware Image Offloading for Extremely Weak Devices & Networks | Ali Hojjat et.al. | 2504.13736v1 | null |
2025-04-18 | The relativity of color perception | Michel Berthier et.al. | 2504.13720v1 | null |
2025-04-17 | Perception Encoder: The best visual embeddings are not at the output of the network | Daniel Bolya et.al. | 2504.13181v1 | null |
2025-04-17 | PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | Jang Hyun Cho et.al. | 2504.13180v1 | null |
2025-04-18 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Zetong Zhang et.al. | 2504.13167v2 | null |
2025-04-17 | Digital Twin Generation from Visual Data: A Survey | Andrew Melnik et.al. | 2504.13159v1 | null |
2025-04-17 | St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World | Haiwen Feng et.al. | 2504.13152v1 | null |
2025-04-17 | Readable Twins of Unreadable Models | Krzysztof Pancerz et.al. | 2504.13150v1 | null |
2025-04-17 | Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps | Matt Schmittle et.al. | 2504.13149v1 | null |
2025-04-17 | PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Jongseo Lee et.al. | 2504.13140v1 | null |
2025-04-17 | NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results | Xin Li et.al. | 2504.13131v1 | link |
2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et.al. | 2504.13122v1 | link |
2025-04-16 | Adapting a World Model for Trajectory Following in a 3D Game | Marko Tot et.al. | 2504.12299v1 | null |
2025-04-16 | SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians | Liam Schoneveld et.al. | 2504.12292v1 | null |
2025-04-16 | Beyond Reconstruction: A Physics Based Neural Deferred Shader for Photo-realistic Rendering | Zhuo He et.al. | 2504.12273v1 | null |
2025-04-16 | Correlation Ratio for Unsupervised Learning of Multi-modal Deformable Registration | Xiaojian Chen et.al. | 2504.12265v1 | null |
2025-04-16 | VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Zhihang Yuan et.al. | 2504.12259v1 | null |
2025-04-16 | FLIP Reasoning Challenge | Andreas Plesner et.al. | 2504.12256v1 | null |
2025-04-16 | Human Aligned Compression for Robust Models | Samuel Räber et.al. | 2504.12255v1 | null |
2025-04-16 | Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography | Zhijin He et.al. | 2504.12249v1 | null |
2025-04-16 | SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction | Xia Wang et.al. | 2504.12245v1 | null |
2025-04-16 | Coding-Prior Guided Diffusion Network for Video Deblurring | Yike Liu et.al. | 2504.12222v1 | null |
2025-04-15 | Mamba-Based Ensemble learning for White Blood Cell Classification | Lewis Clifton et.al. | 2504.11438v1 | null |
2025-04-15 | Enhancing Out-of-Distribution Detection with Extended Logit Normalization | Yifan Ding et.al. | 2504.11434v1 | null |
2025-04-15 | Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models | Maria Teleki et.al. | 2504.11431v1 | null |
2025-04-15 | NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors | Yanrui Bin et.al. | 2504.11427v1 | null |
2025-04-15 | Deep Learning-based Bathymetry Retrieval without In-situ Depths using Remote Sensing Imagery and SfM-MVS DSMs with Data Gaps | Panagiotis Agrafiotis et.al. | 2504.11416v1 | null |
2025-04-15 | Statistical few-shot learning for large-scale classification via parameter pooling | Andrew Simpson et.al. | 2504.11404v1 | null |
2025-04-15 | VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Kevin Xie et.al. | 2504.11389v1 | null |
2025-04-15 | Trajectory Encoding Temporal Graph Networks | Jiafeng Xiong et.al. | 2504.11386v1 | null |
2025-04-15 | Ring Artifacts Correction Based on Global-Local Features Interaction Guidance in the Projection Domain | Yunze Liu et.al. | 2504.11375v1 | null |
2025-04-15 | A two-phase quenching-type problem for the p-Laplacian | Julio C. Correa et.al. | 2504.11370v1 | null |
2025-04-14 | DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting | Zeren Jiang et.al. | 2504.10486v1 | null |
2025-04-14 | Quantum Barcodes: Persistent Homology for Quantum Phase Transitions | Khyathi Komalan et.al. | 2504.10468v1 | null |
2025-04-14 | Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis | Ramin Mousa et.al. | 2504.10452v1 | null |
2025-04-14 | Multimodal Long Video Modeling Based on Temporal Dynamic Context | Haoran Hao et.al. | 2504.10443v1 | null |
2025-04-14 | Framing Perception: Exploring Camera Induced Objectification in Cinema | Parth Maradia et.al. | 2504.10404v1 | null |
2025-04-14 | PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems | Maud Biquard et.al. | 2504.10375v1 | null |
2025-04-14 | Proteinoid spikes: from protocognitive to universal approximating agents | Saksham Sharma et.al. | 2504.10362v1 | null |
2025-04-14 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Rui Chen et.al. | 2504.10358v1 | null |
2025-04-14 | Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication | Jeremiah Giordani et.al. | 2504.10353v1 | null |
2025-04-14 | Domain-Adversarial Neural Network and Explainable AI for Reducing Tissue-of-Origin Signal in Pan-cancer Mortality Classification | Cristian Padron-Manrique et.al. | 2504.10343v1 | null |
2025-04-11 | ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning | Sahil Sethi et.al. | 2504.08713v1 | null |
2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710v1 | null |
2025-04-11 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Team Seawead et.al. | 2504.08685v1 | null |
2025-04-11 | BowelRCNN: Region-based Convolutional Neural Network System for Bowel Sound Auscultation | Igor Matynia et.al. | 2504.08659v1 | null |
2025-04-11 | The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation | Masashi Hatano et.al. | 2504.08654v1 | null |
2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641v1 | null |
2025-04-11 | Transformer Learns Optimal Variable Selection in Group-Sparse Classification | Chenyang Zhang et.al. | 2504.08638v1 | null |
2025-04-11 | Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition | Lei Kang et.al. | 2504.08616v1 | null |
2025-04-11 | Enhancing knowledge retention for continual learning with domain-specific adapters and features gating | Mohamed Abbas Hedjazi et.al. | 2504.08613v1 | null |
2025-04-11 | A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English | Julian Bäumler et.al. | 2504.08609v1 | null |
2025-04-10 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin et.al. | 2504.07962v1 | null |
2025-04-10 | Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Zeren Jiang et.al. | 2504.07961v1 | null |
2025-04-10 | VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | Yukun Qi et.al. | 2504.07956v1 | null |
2025-04-10 | BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Yuanhong Yu et.al. | 2504.07955v1 | null |
2025-04-10 | InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians | Kefan Chen et.al. | 2504.07949v1 | null |
2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940v1 | null |
2025-04-10 | Zero-Shot Low-dose CT Denoising via Sinogram Flicking | Yongyi Shi et.al. | 2504.07927v1 | null |
2025-04-10 | SKK groups of manifolds and non-unitary invertible TQFTs | Renee S. Hoekzema et.al. | 2504.07917v1 | null |
2025-04-10 | Semantically Encoding Activity Labels for Context-Aware Human Activity Recognition | Wen Ge et.al. | 2504.07916v1 | link |
2025-04-10 | The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound | Blake VanBerlo et.al. | 2504.07904v1 | null |
2025-04-09 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning | Nikhil Shivakumar Nayak et.al. | 2504.07097v1 | null |
2025-04-09 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Gene Chou et.al. | 2504.07093v1 | null |
2025-04-09 | Are We Done with Object-Centric Learning? | Alexander Rubinstein et.al. | 2504.07092v1 | null |
2025-04-10 | GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography | Mengchen Zhang et.al. | 2504.07083v2 | null |
2025-04-09 | Detecting AI-generated Artwork | Meien Li et.al. | 2504.07078v1 | null |
2025-04-09 | Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling | Riselda Kodra et.al. | 2504.07065v1 | null |
2025-04-09 | | Ismaïl Baaj et.al. | 2504.07055v1 | null |
2025-04-09 | Classification results for totally real surfaces of nearly Kähler | Michaël Liefsoens et.al. | 2504.07035v1 | null |
2025-04-09 | Weak Signals and Heavy Tails: Machine-learning meets Extreme Value Theory | Stephan Clémençon et.al. | 2504.06984v1 | null |
2025-04-10 | VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Xinhao Li et.al. | 2504.06958v2 | null |
2025-04-08 | PainNet: Statistical Relation Network with Episode-Based Training for Pain Estimation | Mina Bishay et.al. | 2504.06257v1 | null |
2025-04-08 | Monitoring Viewer Attention During Online Ads | Mina Bishay et.al. | 2504.06237v1 | null |
2025-04-08 | From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Chejian Xu et.al. | 2504.06214v1 | null |
2025-04-08 | HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation | Yiming Liang et.al. | 2504.06210v1 | null |
2025-04-08 | An experimental survey and Perspective View on Meta-Learning for Automated Algorithms Selection and Parametrization | Moncef Garouani et.al. | 2504.06207v1 | null |
2025-04-08 | HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling | Qing Xu et.al. | 2504.06205v1 | link |
2025-04-08 | Positive 3-braids, Khovanov homology and Garside theory | Álvaro Del Valle Vílchez et.al. | 2504.06194v1 | null |
2025-04-08 | Rethinking the Nested U-Net Approach: Enhancing Biomarker Segmentation with Attention Mechanisms and Multiscale Feature Fusion | Saad Wazir et.al. | 2504.06158v1 | link |
2025-04-08 | A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning | Akash Kumar et.al. | 2504.06153v1 | null |
2025-04-08 | Optimal classification with outcome performativity | Elizabeth Maggie Penn et.al. | 2504.06127v1 | null |
2025-04-07 | SmolVLM: Redefining small and efficient multimodal models | Andrés Marafioti et.al. | 2504.05299v1 | null |
2025-04-07 | One-Minute Video Generation with Test-Time Training | Karan Dalal et.al. | 2504.05298v1 | null |
2025-04-07 | Hopf tori and standard tori | Leonardo A. Cano García et.al. | 2504.05285v1 | null |
2025-04-07 | AnomalousNet: A Hybrid Approach with Attention U-Nets and Change Point Detection for Accurate Characterization of Anomalous Diffusion in Video Data | Yusef Ahsini et.al. | 2504.05271v1 | null |
2025-04-07 | Explaining Low Perception Model Competency with High-Competency Counterfactuals | Sara Pohland et.al. | 2504.05254v1 | null |
2025-04-07 | Federated Learning for Medical Image Classification: A Comprehensive Benchmark | Zhekai Zhou et.al. | 2504.05238v1 | null |
2025-04-07 | Mapping biodiversity at very-high resolution in Europe | César Leblanc et.al. | 2504.05231v1 | null |
2025-04-07 | Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation | Jiaming Chen et.al. | 2504.05225v1 | null |
2025-04-07 | An ensemble deep learning approach to detect tumors on Mohs micrographic surgery slides | Abdurrahim Yilmaz et.al. | 2504.05219v1 | null |
2025-04-07 | LLM-Alignment Live-Streaming Recommendation | Yueyang Liu et.al. | 2504.05217v1 | null |
2025-04-04 | Bonsai: Interpretable Tree-Adaptive Grounded Reasoning | Kate Sanders et.al. | 2504.03640v1 | null |
2025-04-04 | MedSAM2: Segment Anything in 3D Medical Images and Videos | Jun Ma et.al. | 2504.03600v1 | null |
2025-04-04 | Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin for Real-World Robot Policy Evaluation | Jad Abou-Chakra et.al. | 2504.03597v1 | null |
2025-04-04 | AdaViT: Adaptive Vision Transformer for Flexible Pretrain and Finetune with Variable 3D Medical Image Modalities | Badhan Kumar Das et.al. | 2504.03589v1 | null |
2025-04-04 | AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing | Niu Lian et.al. | 2504.03587v1 | link |
2025-04-04 | Dense Neural Network Based Arrhythmia Classification on Low-cost and Low-compute Micro-controller | Md Abu Obaida Zishan et.al. | 2504.03531v1 | null |
2025-04-04 | LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | Ilan Naiman et.al. | 2504.03501v1 | null |
2025-04-04 | Physics-informed 4D X-ray image reconstruction from ultra-sparse spatiotemporal data | Zisheng Yao et.al. | 2504.03469v1 | null |
2025-04-04 | Conditioning Diffusions Using Malliavin Calculus | Jakiw Pidstrigach et.al. | 2504.03461v1 | null |
2025-04-04 | Early detection of diabetes through transfer learning-based eye (vision) screening and improvement of machine learning model performance and advanced parameter setting algorithms | Mohammad Reza Yousefi et.al. | 2504.03439v1 | null |
2025-04-03 | STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection | Divya Velayudhan et.al. | 2504.02823v1 | null |
2025-04-03 | GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings | Yuexi Du et.al. | 2504.02819v1 | null |
2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812v1 | null |
2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792v1 | null |
2025-04-03 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Zhiyuan Yan et.al. | 2504.02782v1 | null |
2025-04-03 | Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | Shengjun Zhang et.al. | 2504.02764v1 | null |
2025-04-03 | A Complete Classification of Fourier Summation Formulas on the real line | Felipe Gonçalves et.al. | 2504.02741v1 | null |
2025-04-03 | HQViT: Hybrid Quantum Vision Transformer for Image Classification | Hui Zhang et.al. | 2504.02730v1 | null |
2025-04-03 | Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation | Xingguang Zhang et.al. | 2504.02697v1 | null |
2025-04-03 | Two-Stage nnU-Net for Automatic Multi-class Bi-Atrial Segmentation from LGE-MRIs | Y. On et.al. | 2504.02668v1 | null |
2025-04-02 | Learning from Streaming Video with Orthogonal Gradients | Tengda Han et.al. | 2504.01961v1 | null |
2025-04-02 | Slot-Level Robotic Placement via Visual Imitation from Single Human Video | Dandan Shan et.al. | 2504.01959v1 | null |
2025-04-03 | VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step | Hanyang Wang et.al. | 2504.01956v2 | null |
2025-04-02 | A thorough benchmark of automatic text classification: From traditional approaches to large language models | Washington Cunha et.al. | 2504.01930v1 | null |
2025-04-02 | Gen-C: Populating Virtual Worlds with Generative Crowds | Andreas Panayiotou et.al. | 2504.01924v1 | null |
2025-04-02 | Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | Haochen Wang et.al. | 2504.01901v1 | null |
2025-04-02 | Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | Shreyank N Gowda et.al. | 2504.01890v1 | null |
2025-04-02 | CO-DEFEND: Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection | Diego Cajaraville-Aboy et.al. | 2504.01882v1 | null |
2025-04-02 | Architect Your Landscape Approach (AYLA) for Optimizations in Deep Learning | Ben Keslaki et.al. | 2504.01875v1 | null |
2025-04-02 | Buggin: Automatic intrinsic bugs classification model using NLP and ML | Pragya Bhandari et.al. | 2504.01869v1 | null |
2025-03-31 | Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Xingyu Chen et.al. | 2503.24391v1 | link |
2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379v1 | null |
2025-03-31 | Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Yi Chen et.al. | 2503.24376v1 | link |
2025-04-02 | Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation | Abhiram Maddukuri et.al. | 2503.24361v2 | null |
2025-03-31 | Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning | Chao Luan et.al. | 2503.24356v1 | null |
2025-03-31 | PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks | Fang Yan et.al. | 2503.24345v1 | null |
2025-03-31 | On gradient | Maria Andrade et.al. | 2503.24337v1 | null |
2025-03-31 | NoProp: Training Neural Networks without Back-propagation or Forward-propagation | Qinyu Li et.al. | 2503.24322v1 | null |
2025-03-31 | A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG | Arshia Kermani et.al. | 2503.24307v1 | null |
2025-03-31 | Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions | Thinesh Thiyakesan Ponbagavathi et.al. | 2503.24298v1 | null |
2025-03-28 | Understanding Co-speech Gestures in-the-wild | Sindhu B Hegde et.al. | 2503.22668v1 | null |
2025-03-28 | Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure | Frank J. Brooks et.al. | 2503.22658v1 | null |
2025-03-28 | Deep learning-enabled prediction of surgical errors during cataract surgery: from simulation to real-world application | Maxime Faure et.al. | 2503.22647v1 | null |
2025-03-28 | Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning | Stefano Grassi et.al. | 2503.22629v1 | null |
2025-03-28 | Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Jangho Park et.al. | 2503.22622v1 | null |
2025-03-28 | Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users | Antonia Karamolegkou et.al. | 2503.22610v1 | null |
2025-03-28 | Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis | Shuai Shen et.al. | 2503.22605v1 | null |
2025-03-28 | Zero-homogeneous and | Luc Nguyen et.al. | 2503.22599v1 | null |
2025-03-28 | KEVS: Enhancing Segmentation of Visceral Adipose Tissue in Pre-Cystectomy CT with Gaussian Kernel Density Estimation | Thomas Boucher et.al. | 2503.22592v1 | null |
2025-03-28 | Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012 | Adam Breuer et.al. | 2503.22589v1 | link |
2025-03-27 | Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Abdelrahman Shaker et.al. | 2503.21782v1 | link |
2025-03-27 | VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Chi-Pin Huang et.al. | 2503.21781v1 | null |
2025-03-27 | Video-R1: Reinforcing Video Reasoning in MLLMs | Kaituo Feng et.al. | 2503.21776v1 | link |
2025-03-27 | StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion | Ziyu Guo et.al. | 2503.21775v1 | null |
2025-03-27 | Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Minghui Lin et.al. | 2503.21765v1 | link |
2025-03-28 | Phases with non-invertible symmetries in 1+1D | Ömer M. Aksoy et.al. | 2503.21764v2 | null |
2025-03-27 | Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | David Yifan Yao et.al. | 2503.21761v1 | null |
2025-03-27 | Large Scale Structure and the Cosmic Web | Rita Tojeiro et.al. | 2503.21759v1 | null |
2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755v1 | link |
2025-03-27 | MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX | Liuyue Xie et.al. | 2503.21699v1 | null |
2025-03-26 | Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency | Tianqi Liu et.al. | 2503.20785v1 | null |
2025-03-26 | Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising | Yan-Bo Lin et.al. | 2503.20782v1 | null |
2025-03-26 | BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Yulu Pan et.al. | 2503.20781v1 | null |
2025-03-26 | Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Shijie Zhou et.al. | 2503.20776v1 | null |
2025-03-26 | Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data | Masoumeh Sharafi et.al. | 2503.20771v1 | null |
2025-03-27 | An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy | Haotian Yang et.al. | 2503.20768v2 | null |
2025-03-26 | PhysGen3D: Crafting a Miniature Interactive World from a Single Image | Boyuan Chen et.al. | 2503.20746v1 | null |
2025-03-26 | MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams | Yanpeng Sun et.al. | 2503.20745v1 | null |
2025-03-26 | RecTable: Fast Modeling Tabular Data with Rectified Flow | Masane Fuchi et.al. | 2503.20731v1 | null |
2025-03-26 | MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion | Saron Samuel et.al. | 2503.20698v1 | null |
2025-03-25 | PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model | Mingju Gao et.al. | 2503.19913v1 | null |
2025-03-25 | FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Xuan Ju et.al. | 2503.19907v1 | null |
2025-03-25 | Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better | Zihang Lai et.al. | 2503.19904v1 | null |
2025-03-25 | Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Tianhao Qi et.al. | 2503.19881v1 | null |
2025-03-25 | Extensions of regret-minimization algorithm for optimal design | Youguang Chen et.al. | 2503.19874v1 | null |
2025-03-25 | Unpaired Translation of Chest X-ray Images for Lung Opacity Diagnosis via Adaptive Activation Masks and Cross-Domain Alignment | Junzhi Ning et.al. | 2503.19860v1 | null |
2025-03-25 | Towards Online Multi-Modal Social Interaction Understanding | Xinpeng Li et.al. | 2503.19851v1 | null |
2025-03-25 | FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs | Carlos Plou et.al. | 2503.19850v1 | null |
2025-03-26 | Attention IoU: Examining Biases in CelebA using Attention Maps | Aaron Serianni et.al. | 2503.19846v2 | link |
2025-03-25 | Multi-view Learning for the Identification of Risky Users in Dynamic Social Networks | Francesco Benedetti et.al. | 2503.19831v1 | null |
2025-03-24 | Target-Aware Video Diffusion Models | Taeksoo Kim et.al. | 2503.18950v1 | null |
2025-03-24 | Aether: Geometric-Aware Unified World Modeling | Aether Team et.al. | 2503.18945v1 | null |
2025-03-24 | SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding | Mingze Xu et.al. | 2503.18943v1 | null |
2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942v1 | null |
2025-03-24 | Training-free Diffusion Acceleration with Bottleneck Sampling | Ye Tian et.al. | 2503.18940v1 | null |
2025-03-24 | AdaWorld: Learning Adaptable World Models with Latent Actions | Shenyuan Gao et.al. | 2503.18938v1 | null |
2025-03-24 | SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction | Enrico Pallotta et.al. | 2503.18933v1 | null |
2025-03-24 | CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Yitong Chen et.al. | 2503.18931v1 | link |
2025-03-24 | Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models | Meng Cao et.al. | 2503.18923v1 | null |
2025-03-24 | Online 3D Scene Reconstruction Using Neural Object Priors | Thomas Chabal et.al. | 2503.18897v1 | null |
2025-03-21 | Position: Interactive Generative Video as Next-Generation Game Engine | Jiwen Yu et.al. | 2503.17359v1 | null |
2025-03-21 | Time-Series U-Net with Recurrence for Noise-Robust Imaging Photoplethysmography | Vineet R. Shenoy et.al. | 2503.17351v1 | null |
2025-03-21 | Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Qingyu Shi et.al. | 2503.17350v1 | null |
2025-03-21 | Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs | Reem Gody et.al. | 2503.17336v1 | null |
2025-03-21 | Lattice Materials with Topological States Optimized On-Demand | Pegah Azizi et.al. | 2503.17320v1 | null |
2025-03-21 | Quasiconformal Maps between Bowditch Boundaries of Relatively Hyperbolic Groups | Rana Sardar et.al. | 2503.17312v1 | null |
2025-03-21 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu et.al. | 2503.17309v1 | null |
2025-03-21 | Exploring the Temporal Dynamics of Facial Mimicry in Emotion Processing Using Action Units | Meisam Jamshidi Seikavandi et.al. | 2503.17306v1 | null |
2025-03-21 | HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks | Maria Pilligua et.al. | 2503.17276v1 | null |
2025-03-21 | Vision Transformer Based Semantic Communications for Next Generation Wireless Networks | Muhammad Ahmed Mohsin et.al. | 2503.17275v1 | null |
2025-03-20 | XAttention: Block Sparse Attention with Antidiagonal Scoring | Ruyi Xu et.al. | 2503.16428v1 | null |
2025-03-20 | MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Quanhao Li et.al. | 2503.16421v1 | null |
2025-03-20 | M3: 3D-Spatial MultiModal Memory | Xueyan Zou et.al. | 2503.16413v1 | null |
2025-03-20 | ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Haolin Yang et.al. | 2503.16400v1 | null |
2025-03-21 | SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation | Chun-Han Yao et.al. | 2503.16396v2 | null |
2025-03-20 | Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation | Kristin Qi et.al. | 2503.16389v1 | null |
2025-03-20 | Probabilistic Quantum SVM Training on Ising Machine | Haoqi He et.al. | 2503.16363v1 | null |
2025-03-20 | Enhancing variational quantum algorithms by balancing training on classical and quantum hardware | Rahul Bhowmick et.al. | 2503.16361v1 | null |
2025-03-20 | UniSync: A Unified Framework for Audio-Visual Synchronization | Tao Feng et.al. | 2503.16357v1 | null |
2025-03-20 | Principal Actions on Topological Quivers and Associated Operator Dynamics | Matthew Gillespie et.al. | 2503.16352v1 | null |
2025-03-19 | Fast Two-photon Microscopy by Neuroimaging with Oblong Random Acquisition (NORA) | Esther Whang et.al. | 2503.15487v1 | null |
2025-03-19 | TULIP: Towards Unified Language-Image Pretraining | Zineng Tang et.al. | 2503.15485v1 | null |
2025-03-19 | Learning to Play Piano in the Real World | Yves-Simon Zeulner et.al. | 2503.15481v1 | null |
2025-03-19 | Cube: A Roblox View of 3D Intelligence | Foundation AI Team et.al. | 2503.15475v1 | null |
2025-03-19 | EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining | Boshen Xu et.al. | 2503.15470v1 | null |
2025-03-20 | Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification | ZhengLin Lai et.al. | 2503.15469v2 | link |
2025-03-19 | LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding | Amirhossein Kazerouni et.al. | 2503.15420v1 | null |
2025-03-19 | Temporal Regularization Makes Your Video Generator Stronger | Harold Haodong Chen et.al. | 2503.15417v1 | null |
2025-03-19 | Automated Processing of eXplainable Artificial Intelligence Outputs in Deep Learning Models for Fault Diagnostics of Large Infrastructures | Giovanni Floreale et.al. | 2503.15415v1 | null |
2025-03-19 | Federated Continual 3D Segmentation With Single-round Communication | Can Peng et.al. | 2503.15414v1 | null |
2025-03-18 | MusicInfuser: Making Video Diffusion Listen and Dance | Susung Hong et.al. | 2503.14505v1 | null |
2025-03-18 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu et.al. | 2503.14504v1 | null |
2025-03-18 | Utilization of Neighbor Information for Image Classification with Different Levels of Supervision | Gihan Jayatilaka et.al. | 2503.14500v1 | null |
2025-03-18 | Tracking Meets Large Multimodal Models for Driving Scenario Understanding | Ayesha Ishaq et.al. | 2503.14498v1 | null |
2025-03-18 | Stable Virtual Camera: Generative View Synthesis with Diffusion Models | Jensen et.al. | 2503.14489v1 | null |
2025-03-18 | Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset | Yiqun Mei et.al. | 2503.14485v1 | null |
2025-03-18 | SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model | Yucheng Mao et.al. | 2503.14463v1 | null |
2025-03-18 | Functional classification of metabolic networks | Jorge Reyes et.al. | 2503.14437v1 | null |
2025-03-18 | LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | Nikhil Abhyankar et.al. | 2503.14434v1 | null |
2025-03-18 | MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Hongyu Zhang et.al. | 2503.14428v1 | null |
2025-03-17 | VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Ye Liu et.al. | 2503.13444v1 | null |
2025-03-17 | Can Yang-Baxter imply Lie algebra? | Dmitry Khudoteplov et.al. | 2503.13437v1 | null |
2025-03-17 | WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes | Ling Yang et.al. | 2503.13435v1 | null |
2025-03-17 | Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes | Nhi Pham et.al. | 2503.13429v1 | null |
2025-03-17 | FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation | Shijie Fang et.al. | 2503.13418v1 | null |
2025-03-17 | U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord | Qi Zhang et.al. | 2503.13400v1 | null |
2025-03-17 | TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM | Ye Wang et.al. | 2503.13377v1 | null |
2025-03-17 | Multivariate Sparse Functional Linear Discriminant Analysis: An Application to Inflammatory Bowel Disease Classification | Limeng Liu et.al. | 2503.13372v1 | null |
2025-03-17 | SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization | Xulin Fan et.al. | 2503.13371v1 | null |
2025-03-17 | Agents Play Thousands of 3D Video Games | Zhongwen Xu et.al. | 2503.13356v1 | null |
2025-03-14 | Scalable Video Conferencing Using SDN Principles | Oliver Michel et.al. | 2503.11649v1 | null |
2025-03-14 | ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Jianhong Bai et.al. | 2503.11647v1 | null |
2025-03-14 | Pathology Image Compression with Pre-trained Autoencoders | Srikar Yellapragada et.al. | 2503.11591v1 | null |
2025-03-14 | Generalization performance of neural mapping schemes for the space-time interpolation of satellite-derived ocean colour datasets | Thi Thuy Nga Nguyen et.al. | 2503.11588v1 | null |
2025-03-14 | Image Reconstruction from an Elastically Distorted Scan | Adrian Lopez et.al. | 2503.11584v1 | null |
2025-03-14 | Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Weiming Ren et.al. | 2503.11579v1 | null |
2025-03-14 | RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing | Tianrui Pan et.al. | 2503.11571v1 | null |
2025-03-14 | Observation-only learning of neural mapping schemes for gappy satellite-derived ocean colour parameters | Clément Dorffer et.al. | 2503.11532v1 | null |
2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513v1 | null |
2025-03-14 | Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models | Siva Manohar Reddy Kesu et.al. | 2503.11511v1 | null |
2025-03-13 | V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes | Yanming Zhang et.al. | 2503.10634v1 | null |
2025-03-13 | NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models | Mert Albaba et.al. | 2503.10626v1 | null |
2025-03-13 | LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds | Lingteng Qiu et.al. | 2503.10625v1 | null |
2025-03-13 | OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Jinyang Li et.al. | 2503.10616v1 | null |
2025-03-13 | MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction | Yingshuang Zou et.al. | 2503.10604v1 | null |
2025-03-13 | CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models | Hao He et.al. | 2503.10592v1 | null |
2025-03-13 | Long Context Tuning for Video Generation | Yuwei Guo et.al. | 2503.10589v1 | null |
2025-03-13 | Learning Interpretable Logic Rules from Deep Vision Models | Chuqin Geng et.al. | 2503.10547v1 | null |
2025-03-13 | From Linear to Spline-Based Classification: Developing and Enhancing SMPA for Noisy Non-Linear Datasets | Vatsal Srivastava et.al. | 2503.10545v1 | null |
2025-03-13 | Lightweight Models for Emotional Analysis in Video | Quoc-Tien Nguyen et.al. | 2503.10530v1 | null |
2025-03-12 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Chenyu Li et.al. | 2503.09595v1 | null |
2025-03-12 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam et.al. | 2503.09590v1 | null |
2025-03-12 | Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching | Nannan Wu et.al. | 2503.09587v1 | null |
2025-03-12 | Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot | Andrew Crossman et.al. | 2503.09586v1 | null |
2025-03-12 | Manify: A Python Library for Learning Non-Euclidean Representations | Philippe Chlenski et.al. | 2503.09576v1 | null |
2025-03-12 | TPDiff: Temporal Pyramid Video Diffusion Model | Lingmin Ran et.al. | 2503.09566v1 | null |
2025-03-12 | FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model | Jiahao Xia et.al. | 2503.09560v1 | null |
2025-03-13 | The R2D2 Deep Neural Network Series for Scalable Non-Cartesian Magnetic Resonance Imaging | Yiwei Chen et.al. | 2503.09559v2 | null |
2025-03-12 | CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Peng Chen et.al. | 2503.09527v1 | null |
2025-03-12 | Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework | Bakary Badjie et.al. | 2503.09504v1 | null |
2025-03-11 | QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Yongdong Luo et.al. | 2503.08689v1 | null |
2025-03-11 | REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder | Yitian Zhang et.al. | 2503.08665v1 | null |
2025-03-11 | MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention | Yuhan Wang et.al. | 2503.08664v1 | null |
2025-03-11 | Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems | Yufeng Diao et.al. | 2503.08661v1 | null |
2025-03-11 | How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks? | Gal Alon et.al. | 2503.08633v1 | null |
2025-03-11 | Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer | Apan Dastider et.al. | 2503.08622v1 | null |
2025-03-11 | Vision Transformer for Intracranial Hemorrhage Classification in CT Scans Using an Entropy-Aware Fuzzy Integral Strategy for Adaptive Scan-Level Decision Fusion | Mehdi Hosseini Chagahi et.al. | 2503.08609v1 | null |
2025-03-11 | Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling | Subin Kim et.al. | 2503.08605v1 | null |
2025-03-11 | Towards species' classification of the *Anastrepha pseudoparallela* group | Gabriel R. Palma et.al. | 2503.08598v1 | null |
2025-03-11 | Proc4Gem: Foundation models for physical agency through procedural generation | Yixin Lin et.al. | 2503.08593v1 | null |
2025-03-10 | Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Dunant Cusipuma et.al. | 2503.07587v1 | null |
2025-03-10 | Efficient Distributed Learning over Decentralized Networks with Convoluted Support Vector Machine | Canyi Chen et.al. | 2503.07563v1 | null |
2025-03-10 | CPAny: Couple With Any Encoder to Refer Multi-Object Tracking | Weize Li et.al. | 2503.07516v1 | null |
2025-03-10 | ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning | Soumya Banerjee et.al. | 2503.07506v1 | null |
2025-03-10 | Blind-Wayfarer: A Minimalist, Probing-Driven Framework for Resilient Navigation in Perception-Degraded Environments | Yanran Xu et.al. | 2503.07492v1 | null |
2025-03-10 | NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface | Chengrui Zhu et.al. | 2503.07491v1 | null |
2025-03-10 | VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models | Jiacheng Ruan et.al. | 2503.07478v1 | null |
2025-03-10 | A Review on Geometry and Surface Inspection in 3D Concrete Printing | K. Mawas et.al. | 2503.07472v1 | null |
2025-03-10 | Simultaneous Energy Harvesting and Bearing Fault Detection using Piezoelectric Cantilevers | P. Peralta-Braz et.al. | 2503.07462v1 | null |
2025-03-10 | Open-Set Gait Recognition from Sparse mmWave Radar Point Clouds | Riccardo Mazzieri et.al. | 2503.07435v1 | null |
2025-03-10 | Analysis of 3D Urticaceae Pollen Classification Using Deep Learning Models | Tijs Konijn et.al. | 2503.07419v1 | null |
2025-03-10 | AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mingzhen Sun et.al. | 2503.07418v1 | null |
2025-03-10 | TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision | Shaobin Zhuang et.al. | 2503.07416v1 | null |
2025-03-10 | Keeping Representation Similarity in Finetuning for Medical Image Analysis | Wenqiang Zu et.al. | 2503.07399v1 | null |
2025-03-10 | Brain Inspired Adaptive Memory Dual-Net for Few-Shot Image Classification | Kexin Di et.al. | 2503.07396v1 | null |
2025-03-10 | Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs | Gonzalo Mancera et.al. | 2503.07384v1 | null |
2025-03-07 | Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation | Zhenxuan Zhang et.al. | 2503.05682v1 | null |
2025-03-07 | A comparison of the Alkire-Foster method and a Markov random field approach in the analysis of multidimensional poverty | Joseph Lam et.al. | 2503.05676v1 | null |
2025-03-07 | Kinodynamic Model Predictive Control for Energy Efficient Locomotion of Legged Robots with Parallel Elasticity | Yulun Zhuang et.al. | 2503.05666v1 | null |
2025-03-07 | A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Yu Zhang et.al. | 2503.05659v1 | null |
2025-03-07 | On a classification problem for a quiver of type | Ivon Dorado et.al. | 2503.05643v1 | null |
2025-03-07 | VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Yuxuan Bian et.al. | 2503.05639v1 | null |
2025-03-07 | TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models | Mark YU et.al. | 2503.05638v1 | null |
2025-03-07 | Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study | Ali Samimi Fard et.al. | 2503.05629v1 | null |
2025-03-07 | Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings | Xuanqing Liu et.al. | 2503.05620v1 | null |
2025-03-07 | CACTUS: An Open Dataset and Framework for Automated Cardiac Assessment and Classification of Ultrasound Images Using Deep Transfer Learning | Hanae Elmekki et.al. | 2503.05604v1 | null |
2025-03-06 | FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Yue Gao et.al. | 2503.04720v1 | null |
2025-03-06 | Iris Style Transfer: Enhancing Iris Recognition with Style Features and Privacy Preservation through Neural Style Transfer | Mengdi Wang et.al. | 2503.04707v1 | null |
2025-03-07 | Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size | Alireza Behtash et.al. | 2503.04704v2 | null |
2025-03-06 | Coarse graining and reduced order models for plume ejection dynamics | Ike Griss Salas et.al. | 2503.04690v1 | null |
2025-03-06 | Mixed Near-field and Far-field Target Localization for Low-altitude Economy | Cong Zhou et.al. | 2503.04681v1 | null |
2025-03-06 | An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding | Dou Hu et.al. | 2503.04667v1 | null |
2025-03-06 | What Are You Doing? A Closer Look at Controllable Human Video Generation | Emanuele Bugliarello et.al. | 2503.04666v1 | null |
2025-03-06 | Implicit Neural Representation for Video and Image Super-Resolution | Mary Aiyetigbo et.al. | 2503.04665v1 | null |
2025-03-06 | RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining | Tengfei Zhang et.al. | 2503.04653v1 | null |
2025-03-06 | Adaptive Prototype Learning for Multimodal Cancer Survival Analysis | Hong Liu et.al. | 2503.04643v1 | null |
2025-03-05 | GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Xuanchi Ren et.al. | 2503.03751v1 | link |
2025-03-05 | PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning | Ryozo Masukawa et.al. | 2503.03747v1 | null |
2025-03-05 | OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | Huang Huang et.al. | 2503.03734v1 | null |
2025-03-05 | Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements | Carlo Dindorf et.al. | 2503.03717v1 | null |
2025-03-05 | Handling Uncertainty in Health Data using Generative Algorithms | Mahdi Arab Loodaricheh et.al. | 2503.03715v1 | null |
2025-03-05 | Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Nianzu Yang et.al. | 2503.03708v1 | null |
2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689v1 | null |
2025-03-05 | Empowering Multi-class Classification for Complex Functional Data with Simultaneous Feature Selection | Shuoyang Wang et.al. | 2503.03679v1 | null |
2025-03-05 | LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant | Wei Li et.al. | 2503.03663v1 | null |
2025-03-05 | Limits of nonlinear and dispersive fiber propagation for photonic extreme learning | Andrei V. Ermolaev et.al. | 2503.03649v1 | null |
2025-03-04 | Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation | Han Xue et.al. | 2503.02881v1 | null |
2025-03-04 | SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | Dmitry Nechaev et.al. | 2503.02876v1 | null |
2025-03-04 | Unsupervised Attributed Dynamic Network Embedding with Stability Guarantees | Emma Ceccherini et.al. | 2503.02859v1 | null |
2025-03-04 | Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024 | Nuria Alina Chandra et.al. | 2503.02857v1 | null |
2025-03-04 | Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data | Amin Honarmandi Shandiz et.al. | 2503.02849v1 | null |
2025-03-04 | In-Depth Analysis of Automated Acne Disease Recognition and Classification | Afsana Ahsan Jeny et.al. | 2503.02835v1 | null |
2025-03-04 | A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness | Nathan Drenkow et.al. | 2503.02797v1 | null |
2025-03-04 | Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution | Ru Ito et.al. | 2503.02767v1 | null |
2025-03-04 | Seeded Poisson Factorization: Leveraging domain knowledge to fit topic models | Bernd Prostmaier et.al. | 2503.02741v1 | null |
2025-03-04 | UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression | Jia Wang et.al. | 2503.02733v1 | null |
2025-02-28 | TomoSelfDEQ: Self-Supervised Deep Equilibrium Learning for Sparse-Angle CT Reconstruction | Tatiana A. Bubba et.al. | 2502.21320v1 | null |
2025-02-28 | Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos | Zhiyu Tan et.al. | 2502.21314v1 | null |
2025-02-28 | AutoComb: Automated Comb Sign Detector for 3D CTE Scans | Shashwat Gupta et.al. | 2502.21311v1 | null |
2025-02-28 | Bilevel Optimized Implicit Neural Representation for Scan-Specific Accelerated MRI Reconstruction | Hongze Yu et.al. | 2502.21292v1 | null |
2025-02-28 | Utilizing Quantum Fingerprints in Plant Cells to Evaluate Plant productivity | Umadini Ranasinghe et.al. | 2502.21275v1 | null |
2025-02-28 | Adaptive Keyframe Sampling for Long Video Understanding | Xi Tang et.al. | 2502.21271v1 | null |
2025-02-28 | PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts | Boxiao Yu et.al. | 2502.21260v1 | null |
2025-02-28 | RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete | Yuheng Ji et.al. | 2502.21257v1 | null |
2025-02-28 | ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG | Aleksandr Kovalev et.al. | 2502.21256v1 | null |
2025-02-28 | Short-Rate Derivatives in a Higher-for-Longer Environment | Aram Karakhanyan et.al. | 2502.21252v1 | null |
2025-02-27 | Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models | Susmit Agrawal et.al. | 2502.20393v1 | null |
2025-02-27 | Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation | Siddhant Haldar et.al. | 2502.20391v1 | null |
2025-02-27 | Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | Sucheng Ren et.al. | 2502.20388v1 | null |
2025-02-27 | InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Jiahe Li et.al. | 2502.20387v1 | null |
2025-02-27 | ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting | Dexter Ong et.al. | 2502.20386v1 | null |
2025-02-27 | Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling | Hanyang Kong et.al. | 2502.20378v1 | null |
2025-02-27 | When does a predictor know its own loss? | Aravind Gollakota et.al. | 2502.20375v1 | null |
2025-02-27 | OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection | Shuming Liu et.al. | 2502.20361v1 | link |
2025-02-27 | KNOWM Memristors in a Bridge Synapse delay-based Reservoir Computing system for detection of epileptic seizures | Dawid Przyczyna et.al. | 2502.20351v1 | null |
2025-02-27 | T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration | Tamir Shor et.al. | 2502.20333v1 | null |
2025-02-26 | TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | Max Ku et.al. | 2502.19400v1 | null |
2025-02-26 | Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis | Minjoo Lim et.al. | 2502.19390v1 | null |
2025-02-26 | Surface-Based Manipulation | Ziqiao Wang et.al. | 2502.19389v1 | null |
2025-02-26 | Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis | Hamdan Al Ahbabi et.al. | 2502.19387v1 | null |
2025-02-26 | Efficient 4D fMRI ASD Classification using Spatial-Temporal-Omics-based Learning Framework | Ziqiao Weng et.al. | 2502.19386v1 | null |
2025-02-26 | Deep Learning For Time Series Analysis With Application On Human Motion | Ali Ismail-Fawaz et.al. | 2502.19364v1 | null |
2025-02-26 | Deep Learning-Based Transfer Learning for Classification of Cassava Disease | Ademir G. Costa Junior et.al. | 2502.19351v1 | null |
2025-02-26 | Unveiling Wireless Users' Locations via Modulation Classification-based Passive Attack | Ali Hanif et.al. | 2502.19341v1 | null |
2025-02-26 | I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning | Stephan Rabanser et.al. | 2502.19335v1 | null |
2025-02-26 | Deep learning and classical computer vision techniques in medical image analysis: Case studies on brain MRI tissue segmentation, lung CT COPD registration, and skin lesion classification | Anyimadu Daniel Tweneboah et.al. | 2502.19258v1 | null |
2025-02-25 | Ion counting and temperature determination of Coulomb-crystallized laser-cooled ions in traps using convolutional neural networks | Yanning Yin et.al. | 2502.18442v1 | null |
2025-02-25 | Is OpenAlex Suitable for Research Quality Evaluation and Which Citation Indicator is Best? | Mike Thelwall et.al. | 2502.18427v1 | null |
2025-02-25 | Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand | Fengshuo Bai et.al. | 2502.18423v1 | null |
2025-02-25 | MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification | Zhuoqin Yang et.al. | 2502.18416v1 | null |
2025-02-25 | Enhancing DNA Foundation Models to Address Masking Inefficiencies | Monireh Safari et.al. | 2502.18405v1 | null |
2025-02-25 | Learning sparse generalized linear models with binary outcomes via iterative hard thresholding | Namiko Matsumoto et.al. | 2502.18393v1 | null |
2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373v1 | null |
2025-02-25 | MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning | Sepehr Asgarian et.al. | 2502.18371v1 | null |
2025-02-25 | Exploring proteomic signatures in sepsis and non-infectious systemic inflammatory response syndrome | Adolfo Ruiz-Sanmartín et.al. | 2502.18305v1 | null |
2025-02-25 | Quantization of the Momentum Map via | Chiara Esposito et.al. | 2502.18295v1 | null |
2025-02-24 | FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning | Jason Jingzhou Liu et.al. | 2502.17432v1 | null |
2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414v1 | null |
2025-02-24 | Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects | Liuchuan Yu et.al. | 2502.17399v1 | null |
2025-02-24 | Robust Confinement State Classification with Uncertainty Quantification through Ensembled Data-Driven Methods | Yoeri Poels et.al. | 2502.17397v1 | null |
2025-02-24 | RELICT: A Replica Detection Framework for Medical Image Generation | Orhun Utku Aydin et.al. | 2502.17360v1 | null |
2025-02-24 | Travel Time Reliability in Stochastic Kinematic Flow Models | Alexander Hammerl et.al. | 2502.17359v1 | null |
2025-02-24 | Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training | Karan Samel et.al. | 2502.17352v1 | null |
2025-02-24 | +Tour: Recommending personalized itineraries for smart tourism | João Paulo Esper et.al. | 2502.17345v1 | link |
2025-02-24 | City riots fed by transnational and trans-topic web-of-influence | Akshay Verma et.al. | 2502.17331v1 | null |
2025-02-24 | AnyTop: Character Animation Diffusion with Any Topology | Inbar Gat et.al. | 2502.17327v1 | null |
2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672v1 | link |
2025-02-21 | Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions | Gerard Ben Arous et.al. | 2502.15655v1 | null |
2025-02-21 | Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification | Vasilii Feofanov et.al. | 2502.15637v1 | null |
2025-02-21 | Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach | Xiangtong Yao et.al. | 2502.15613v1 | null |
2025-02-21 | PDeepPP: A Deep learning framework with Pretrained Protein language for peptide classification | Jixiu Zhai et.al. | 2502.15610v1 | null |
2025-02-21 | On the Robustness of Transformers against Context Hijacking for Linear Classification | Tianle Li et.al. | 2502.15609v1 | null |
2025-02-21 | Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models | Zahra Mansour et.al. | 2502.15607v1 | null |
2025-02-21 | Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification | Peiyu Duan et.al. | 2502.15595v1 | null |
2025-02-21 | Estimating Vehicle Speed on Roadways Using RNNs and Transformers: A Video-based Approach | Sai Krishna Reddy Mareddy et.al. | 2502.15545v1 | null |
2025-02-21 | Implications of Photon Mass: Vortextrap Magnetization of Black Holes | Gia Dvali et.al. | 2502.15510v1 | null |
2025-02-20 | Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts | Sara Ghaboura et.al. | 2502.14865v1 | null |
2025-02-20 | Dynamic Concepts Personalization from Single Videos | Rameen Abdal et.al. | 2502.14844v1 | null |
2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et.al. | 2502.14831v1 | null |
2025-02-20 | Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning | Oren Yuval et.al. | 2502.14808v1 | null |
2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et.al. | 2502.14807v1 | null |
2025-02-20 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et.al. | 2502.14801v1 | null |
2025-02-20 | Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration | Pengxiang Ding et.al. | 2502.14795v1 | null |
2025-02-20 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Michael Tschannen et.al. | 2502.14786v1 | link |
2025-02-20 | Sparse Activations as Conformal Predictors | Margarida M. Campos et.al. | 2502.14773v1 | null |
2025-02-20 | MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders | Maya Varma et.al. | 2502.14753v1 | null |
2025-02-19 | Qwen2.5-VL Technical Report | Shuai Bai et.al. | 2502.13923v1 | null |
2025-02-19 | Audio-Based Classification of Insect Species Using Machine Learning Models: Cicada, Beetle, Termite, and Cricket | Manas V Shetty et.al. | 2502.13893v1 | null |
2025-02-19 | Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition | Idris Hamoud et.al. | 2502.13883v1 | null |
2025-02-19 | Ribbon blocks for centraliser algebras of symmetric groups | Matthew Fayers et.al. | 2502.13867v1 | null |
2025-02-19 | MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection | Shuyong Gao et.al. | 2502.13859v1 | null |
2025-02-19 | Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model | Hang Yin et.al. | 2502.13838v1 | null |
2025-02-19 | MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation | Yucheng Zeng et.al. | 2502.13808v1 | null |
2025-02-19 | Classifying thick subcategories over a Koszul complex via the curved BGG correspondence | Jian Liu et.al. | 2502.13806v1 | null |
2025-02-19 | Binary VPN Traffic Detection Using Wavelet Features and Machine Learning | Yasameen Sajid Razooqi et.al. | 2502.13804v1 | null |
2025-02-19 | From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Yi-Fan Zhang et.al. | 2502.13789v1 | null |
2025-02-18 | Pre-training Auto-regressive Robotic Models with 4D Representations | Dantong Niu et.al. | 2502.13142v1 | null |
2025-02-18 | Magma: A Foundation Model for Multimodal AI Agents | Jianwei Yang et.al. | 2502.13130v1 | null |
2025-02-18 | BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification | Bich-Chung Phan et.al. | 2502.13080v1 | null |
2025-02-18 | L4P: Low-Level 4D Vision Perception Unified | Abhishek Badki et.al. | 2502.13078v1 | null |
2025-02-18 | Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection | Jingbiao Mei et.al. | 2502.13061v1 | null |
2025-02-18 | Benchmarking MedMNIST dataset on real quantum hardware | Gurinder Singh et.al. | 2502.13056v1 | null |
2025-02-18 | LAMD: Context-driven Android Malware Detection and Classification with LLMs | Xingzhi Qian et.al. | 2502.13055v1 | null |
2025-02-18 | QZO: A Catalog of 5 Million Quasars from the Zwicky Transient Facility | S. J. Nakoneczny et.al. | 2502.13054v1 | null |
2025-02-18 | Development of systematic uncertainty-aware neural network trainings for binned-likelihood analyses at the LHC | CMS Collaboration et.al. | 2502.13047v1 | null |
2025-02-18 | How far are two symmetric matrices from commuting? With an application to object characterisation and identification in metal detection | P. D. Ledger et.al. | 2502.13038v1 | null |
2025-02-17 | VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution | Chendong Wang et.al. | 2502.12151v1 | null |
2025-02-17 | Idiosyncrasies in Large Language Models | Mingjie Sun et.al. | 2502.12150v1 | null |
2025-02-17 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Florian Sestak et.al. | 2502.12128v1 | null |
2025-02-17 | Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy | Roman Malashin et.al. | 2502.12125v1 | null |
2025-02-17 | Crime in Proportions: Applying Compositional Data Analysis to European Crime Trends for 2022 | Onur Batın Doğan et.al. | 2502.12099v1 | null |
2025-02-17 | Descriminative-Generative Custom Tokens for Vision-Language Models | Pramuditha Perera et.al. | 2502.12095v1 | null |
2025-02-17 | Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems | Yue Sun et.al. | 2502.12086v1 | null |
2025-02-17 | AdaSplash: Adaptive Sparse Flash Attention | Nuno Gonçalves et.al. | 2502.12082v1 | null |
2025-02-17 | Unhackable Temporal Rewarding for Scalable Video MLLMs | En Yu et.al. | 2502.12081v1 | null |
2025-02-17 | Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning | Jiayang Zhang et.al. | 2502.12049v1 | null |
2025-02-14 | Simplifying DINO via Coding Rate Regularization | Ziyang Wu et.al. | 2502.10385v1 | null |
2025-02-14 | Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data | Corinna Cortes et.al. | 2502.10381v1 | null |
2025-02-14 | Quasi-isometry classification of certain graph | Byung Hee An et.al. | 2502.10366v1 | null |
2025-02-14 | Proper Learnability and the Role of Unlabeled Data | Julian Asilis et.al. | 2502.10359v1 | null |
2025-02-14 | Diameter bounds for | Luke Jeffreys et.al. | 2502.10358v1 | null |
2025-02-14 | OptimOTU: Taxonomically aware OTU clustering with optimized thresholds and a bioinformatics workflow for metabarcoding data | Brendan Furneaux et.al. | 2502.10350v1 | null |
2025-02-14 | Ocular Disease Classification Using CNN with Deep Convolutional Generative Adversarial Network | Arun Kunwar et.al. | 2502.10334v1 | null |
2025-02-14 | SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement | Yuhao Zhang et.al. | 2502.10296v1 | null |
2025-02-14 | Probing Perceptual Constancy in Large Vision Language Models | Haoran Sun et.al. | 2502.10273v1 | null |
2025-02-14 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248v1 | null |
2025-02-13 | Embed Any NeRF: Graph Meta-Networks for Neural Tasks on Arbitrary NeRF Architectures | Francesco Ballerini et.al. | 2502.09623v1 | null |
2025-02-13 | Exploring the Potential of Encoder-free Architectures in 3D LMMs | Yiwen Tang et.al. | 2502.09620v1 | null |
2025-02-13 | Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights | Jonathan Kahana et.al. | 2502.09619v1 | null |
2025-02-13 | DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References | Xueyi Liu et.al. | 2502.09614v1 | null |
2025-02-13 | Morphological Classification of Galaxies | Karen Masters et.al. | 2502.09610v1 | null |
2025-02-13 | GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Angelos Zavras et.al. | 2502.09598v1 | null |
2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597v1 | null |
2025-02-13 | Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Mark Beliaev et.al. | 2502.09573v1 | null |
2025-02-13 | Diffusing DeBias: a Recipe for Turning a Bug into a Feature | Massimiliano Ciranni et.al. | 2502.09564v1 | null |
2025-02-13 | Learned Correction Methods for Ultrasound Computed Tomography Imaging Using Simplified Physics Models | Luke Lozenski et.al. | 2502.09546v1 | null |
2025-02-12 | CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Qinghe Wang et.al. | 2502.08639v1 | null |
2025-02-12 | Rapid Whole Brain Mesoscale In-vivo MR Imaging using Multi-scale Implicit Neural Representation | Jun Lyu et.al. | 2502.08634v1 | null |
2025-02-12 | Ensemble based approach to quantifying uncertainty of LLM based classifications | Srijith Rajamohan et.al. | 2502.08631v1 | null |
2025-02-12 | Robot Data Curation with Mutual Information Estimators | Joey Hejna et.al. | 2502.08623v1 | null |
2025-02-12 | Forecasting Drought Using Machine Learning in California | Nan K. Li et.al. | 2502.08622v1 | null |
2025-02-12 | SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment | Tica Lin et.al. | 2502.08621v1 | null |
2025-02-12 | Learning Selection Cuts With Gradients | Mike Hance et.al. | 2502.08615v1 | null |
2025-02-12 | Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model | Saurabh Kataria et.al. | 2502.08612v1 | null |
2025-02-12 | CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection | Karish Grover et.al. | 2502.08605v1 | null |
2025-02-12 | Light-A-Video: Training-free Video Relighting via Progressive Light Fusion | Yujie Zhou et.al. | 2502.08590v1 | link |
2025-02-11 | Pippo: High-Resolution Multi-View Humans from a Single Image | Yash Kant et.al. | 2502.07785v1 | null |
2025-02-11 | Statistical Reevaluation of the USP Classification Boundary: Smaller Planets Within 1 Day, Larger Period Ratios Below 2 Days | Armaan V. Goyal et.al. | 2502.07773v1 | null |
2025-02-11 | A forbidden subgraph study for cut problems on graphs permitting loops and multiedges | Tala Eagling-Vose et.al. | 2502.07769v1 | null |
2025-02-11 | An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating | Mohammad Ali Labbaf Khaniki et.al. | 2502.07755v1 | null |
2025-02-11 | HiPoNet: A Topology-Preserving Multi-View Neural Network For High Dimensional Point Cloud and Single-Cell Data | Siddharth Viswanath et.al. | 2502.07746v1 | null |
2025-02-11 | Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling | Shuhuai Ren et.al. | 2502.07737v1 | null |
2025-02-11 | PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization | Bing Fan et.al. | 2502.07707v1 | null |
2025-02-11 | Magic 1-For-1: Generating One Minute Video Clips within One Minute | Hongwei Yi et.al. | 2502.07701v1 | null |
2025-02-11 | SoK: A Classification for AI-driven Personalized Privacy Assistants | Victor Morel et.al. | 2502.07693v1 | null |
2025-02-11 | Auto-Drafting Police Reports from Noisy ASR Outputs: A Trust-Centered LLM Approach | Param Kulkarni et.al. | 2502.07677v1 | null |
2025-02-10 | Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT | Dongyang Liu et.al. | 2502.06782v1 | null |
2025-02-10 | KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification | Yue Zhu et.al. | 2502.06779v1 | null |
2025-02-10 | ALMACAL XIII. Evolution of the CO luminosity function and the molecular gas mass density out to | Victoria Bollo et.al. | 2502.06778v1 | null |
2025-02-10 | Enhancing Performance of Explainable AI Models with Constrained Concept Refinement | Geyu Liang et.al. | 2502.06775v1 | null |
2025-02-10 | History-Guided Video Diffusion | Kiwhan Song et.al. | 2502.06764v1 | null |
2025-02-10 | Equations over Finite Monoids with Infinite Promises | Alberto Larrauri et.al. | 2502.06762v1 | null |
2025-02-10 | Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and Uncertainty | Valia Efthymiou et.al. | 2502.06749v1 | null |
2025-02-10 | Wandering around: A bioinspired approach to visual attention through object motion sensitivity | Giulia D'Angelo et.al. | 2502.06747v1 | null |
2025-02-10 | Persistent spin grids with spin-orbit coupled 2D electron gas | A. V. Poshakinskiy et.al. | 2502.06745v1 | null |
2025-02-10 | Enhancing Pneumonia Diagnosis and Severity Assessment through Deep Learning: A Comprehensive Approach Integrating CNN Classification and Infection Segmentation | S Kumar Reddy Mallidi et.al. | 2502.06735v1 | null |
2025-02-07 | FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Shilong Zhang et.al. | 2502.05179v1 | null |
2025-02-07 | Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Yunhang Shen et.al. | 2502.05177v1 | null |
2025-02-07 | AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting | Chung-Ho Wu et.al. | 2502.05176v1 | null |
2025-02-07 | VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Xilin Wei et.al. | 2502.05173v1 | null |
2025-02-07 | Torsion pairs and 3-fold flops | Parth Shimpi et.al. | 2502.05146v1 | null |
2025-02-07 | Chest X-ray Foundation Model with Global and Local Representations Integration | Zefan Yang et.al. | 2502.05142v1 | null |
2025-02-07 | Counting Fish with Temporal Representations of Sonar Video | Kai Van Brunt et.al. | 2502.05129v1 | null |
2025-02-07 | Multiphoton, multimode state classification for nonlinear optical circuits | Denis A. Kopylov et.al. | 2502.05123v1 | null |
2025-02-07 | Investigating the impact of kernel harmonization and deformable registration on inspiratory and expiratory chest CT images for people with COPD | Aravind R. Krishnan et.al. | 2502.05119v1 | null |
2025-02-07 | GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application | Volker Emmrich et.al. | 2502.05113v1 | null |
2025-02-06 | Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment | Zuyan Liu et.al. | 2502.04328v1 | null |
2025-02-06 | WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs | Jack Hong et.al. | 2502.04326v1 | null |
2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299v1 | null |
2025-02-06 | Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression | Lirui Wang et.al. | 2502.04296v1 | null |
2025-02-06 | Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning | Thorben Prein et.al. | 2502.04289v1 | null |
2025-02-06 | How does a Multilingual LM Handle Multiple Languages? | Santhosh Kakarla et.al. | 2502.04269v1 | null |
2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263v1 | null |
2025-02-06 | Work in Progress: AI-Powered Engineering-Bridging Theory and Practice | Oz Levy et.al. | 2502.04256v1 | null |
2025-02-06 | An object detection approach for lane change and overtake detection from motion profiles | Andrea Benericetti et.al. | 2502.04244v1 | null |
2025-02-06 | Saflo: eBPF-Based MPTCP Scheduler for Mitigating Traffic Analysis Attacks in Cellular Networks | Sangwoo Lee et.al. | 2502.04236v1 | null |
2025-02-05 | Seeing World Dynamics in a Nutshell | Qiuhong Shen et.al. | 2502.03465v1 | null |
2025-02-05 | SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living | Arkaprava Sinha et.al. | 2502.03459v1 | null |
2025-02-05 | Kineto-Dynamical Planning and Accurate Execution of Minimum-Time Maneuvers on Three-Dimensional Circuits | Mattia Piccinini et.al. | 2502.03454v1 | null |
2025-02-05 | Linearized Optimal Transport pyLOT Library: A Toolkit for Machine Learning on Point Clouds | Jun Linwu et.al. | 2502.03439v1 | null |
2025-02-05 | A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation | Carlo Biffi et.al. | 2502.03430v1 | null |
2025-02-05 | Concept Based Explanations and Class Contrasting | Rudolf Herdt et.al. | 2502.03422v1 | null |
2025-02-05 | A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models | Junliang Du et.al. | 2502.03386v1 | null |
2025-02-05 | Deep Learning-Based Approach for Identification of Potato Leaf Diseases Using Wrapper Feature Selection and Feature Concatenation | Muhammad Ahtsam Naeem et.al. | 2502.03370v1 | null |
2025-02-05 | Learning from Active Human Involvement through Proxy Value Propagation | Zhenghao Peng et.al. | 2502.03369v1 | null |
2025-02-05 | Rethinking Approximate Gaussian Inference in Classification | Bálint Mucsányi et.al. | 2502.03366v1 | null |
2025-02-04 | Fairness in Survival Analysis: A Novel Conditional Mutual Information Augmentation Approach | Tianyang Xie et.al. | 2502.02567v1 | null |
2025-02-04 | Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Connor Schenck et.al. | 2502.02562v1 | null |
2025-02-04 | Particle Trajectory Representation Learning with Masked Point Modeling | Sam Young et.al. | 2502.02558v1 | null |
2025-02-04 | AAD-DCE: An Aggregated Multimodal Attention Mechanism for Early and Late Dynamic Contrast Enhanced Prostate MRI Synthesis | Divya Bharti et.al. | 2502.02555v1 | null |
2025-02-04 | Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis | Haonan Zhu et.al. | 2502.02552v1 | null |
2025-02-04 | 2D Surface Brightness Modelling of Large 2MASS Galaxies II: The Role of Classical Bulges and Pseudobulges on Galaxy Scaling Relations and its implication for Supermassive Black Hole Formation | Emmanuel Ríos-López et.al. | 2502.02546v1 | null |
2025-02-04 | TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems | Si-Yang Liu et.al. | 2502.02527v1 | null |
2025-02-04 | Hybrid Fingerprint-based Positioning in Cell-Free Massive MIMO Systems | Manish Kumar et.al. | 2502.02512v1 | null |
2025-02-04 | The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison | Łukasz Miętkiewicz et.al. | 2502.02500v1 | null |
2025-02-04 | VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Hila Chefer et.al. | 2502.02492v1 | null |
2025-01-31 | Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach | Yingdan Shi et.al. | 2501.19403v1 | null |
2025-01-31 | Perceptive Mixed-Integer Footstep Control for Underactuated Bipedal Walking on Rough Terrain | Brian Acosta et.al. | 2501.19391v1 | null |
2025-01-31 | Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions | Sören Christensen et.al. | 2501.19373v1 | null |
2025-01-31 | Benchmark of the Full and Reduced Effective Resistance Kernel for Molecular Classification | Adam Wesołowski et.al. | 2501.19352v1 | null |
2025-01-31 | An All-digital 65-nm Tsetlin Machine Image Classification Accelerator with 8.6 nJ per MNIST Frame at 60.3k Frames per Second | Svein Anders Tunheim et.al. | 2501.19347v1 | null |
2025-01-31 | Pathological MRI Segmentation by Synthetic Pathological Data Generation in Fetuses and Neonates | Misha P. T. Kaandorp et.al. | 2501.19338v1 | null |
2025-01-31 | Consistent Video Colorization via Palette Guidance | Han Wang et.al. | 2501.19331v1 | null |
2025-01-31 | Ultra-fast Real-time Target Recognition Using a Shift, Scale, and Rotation Invariant Hybrid Opto-electronic Joint Transform Correlator | Xi Shen et.al. | 2501.19299v1 | null |
2025-01-31 | Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs | James Flemings et.al. | 2501.19287v1 | null |
2025-01-31 | Application of Generative Adversarial Network (GAN) for Synthetic Training Data Creation to improve performance of ANN Classifier for extracting Built-Up pixels from Landsat Satellite Imagery | Amritendu Mukherjee et.al. | 2501.19283v1 | null |
2025-01-30 | DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models | Ruofan Liang et.al. | 2501.18590v1 | null |
2025-01-30 | Node Classification and Search on the Rubik's Cube Graph with GNNs | Alessandro Barro et.al. | 2501.18580v1 | null |
2025-01-30 | BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos | Lehao Lin et.al. | 2501.18565v1 | null |
2025-01-30 | Finite subgroups of maximal order of the Cremona group over the rationals | Ahmed Abouelsaad et.al. | 2501.18551v1 | null |
2025-01-30 | UDC-VIT: A Real-World Video Dataset for Under-Display Cameras | Kyusu Ahn et.al. | 2501.18545v1 | link |
2025-01-30 | Loss Functions and Operators Generated by f-Divergences | Vincent Roulet et.al. | 2501.18537v1 | null |
2025-01-30 | Sample Classification using Machine Learning-Assisted Entangled Two-Photon Absorption | Áulide Martínez-Tapia et.al. | 2501.18534v1 | null |
2025-01-30 | Joint Learning of Energy-based Models and their Partition Function | Michael E. Sander et.al. | 2501.18528v1 | null |
2025-01-30 | Character factorisations, | Seamus Albion et.al. | 2501.18520v1 | null |
2025-01-30 | Deconstruct Complexity (DeComplex): A Novel Perspective on Tackling Dense Action Detection | Faegheh Sardari et.al. | 2501.18509v1 | null |
2025-01-29 | acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices | Aude Vuilliomenet et.al. | 2501.17841v1 | link |
2025-01-29 | IRONMAP: Iron Network Mapping and Analysis Protocol for Detecting Over-Time Brain Iron Abnormalities in Neurological Disease | Jack A. Reeves et.al. | 2501.17838v1 | null |
2025-01-29 | TikTok's recommendations skewed towards Republican content during the 2024 U.S. presidential race | Hazem Ibrahim et.al. | 2501.17831v1 | null |
2025-01-29 | Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology | Sobhan Hemati et.al. | 2501.17822v1 | null |
2025-01-29 | eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction through AI-Mediated Content Consumption | Jocelyn Shen et.al. | 2501.17819v1 | null |
2025-01-29 | CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering | Xiaohan Sun et.al. | 2501.17792v1 | null |
2025-01-29 | Glioma Multimodal MRI Analysis System for Tumor Layered Diagnosis via Multi-task Semi-supervised Learning | Yihao Liu et.al. | 2501.17758v1 | null |
2025-01-29 | PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion | Ahmed Sharshar et.al. | 2501.17699v1 | link |
2025-01-29 | NutMaat: A Python package for stellar spectral classification on the MK system | R. I. El-Kholy et.al. | 2501.17698v1 | null |
2025-01-29 | Tonguescape: Exploring Language Models Understanding of Vowel Articulation | Haruki Sakajo et.al. | 2501.17643v1 | null |
2025-01-28 | A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images | Suresh Babu Nettur et.al. | 2501.17160v1 | null |
2025-01-28 | Sensitivity of Quantitative Susceptibility Mapping in Clinical Brain Research | Fahad Salman et.al. | 2501.17158v1 | null |
2025-01-28 | Three-Dimensional Diffusion-Weighted Multi-Slab MRI With Slice Profile Compensation Using Deep Energy Model | Reza Ghorbani et.al. | 2501.17152v1 | null |
2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144v1 | link |
2025-01-28 | DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications | Muhammad Shahbaz et.al. | 2501.17076v1 | null |
2025-01-28 | Symmetries of 3-webs around a point | Jean Paul Dufour et.al. | 2501.17066v1 | null |
2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053v1 | null |
2025-01-28 | Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | Farida Farsian et.al. | 2501.17041v1 | null |
2025-01-28 | Approach Towards Semi-Automated Certification for Low Criticality ML-Enabled Airborne Applications | Chandrasekar Sridhar et.al. | 2501.17028v1 | null |
2025-01-28 | MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction | Shreyam Gupta et.al. | 2501.16997v1 | null |
2025-01-27 | RelightVid: Temporal-Consistent Diffusion Model for Video Relighting | Ye Fang et.al. | 2501.16330v1 | null |
2025-01-27 | sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging | Jingyuan Chen et.al. | 2501.16329v1 | null |
2025-01-27 | Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture | Yikun Hou et.al. | 2501.16322v1 | null |
2025-01-27 | TiDES: The 4MOST Time Domain Extragalactic Survey | C. Frohmaier et.al. | 2501.16311v1 | null |
2025-01-27 | RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval | Long Nguyen et.al. | 2501.16303v1 | null |
2025-01-27 | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Jing Zhang et.al. | 2501.16282v1 | null |
2025-01-27 | Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images | Suresh Babu Nettur et.al. | 2501.16249v1 | null |
2025-01-27 | Zero-Shot Decision Tree Construction via Large Language Models | Lucas Carrasco et.al. | 2501.16247v1 | null |
2025-01-27 | CLISC: Bridging CLIP and SAM by Enhanced CAM for Unsupervised Brain Tumor Segmentation | Xiaochuan Ma et.al. | 2501.16246v1 | null |
2025-01-27 | Echoes of Discord: Forecasting Hater Reactions to Counterspeech | Xiaoying Song et.al. | 2501.16235v1 | null |
2025-01-24 | Estimation-theoretic analysis of lensless imaging | Leyla A. Kabuli et.al. | 2501.14727v1 | null |
2025-01-24 | Gland Segmentation Using SAM With Cancer Grade as a Prompt | Yijie Zhu et.al. | 2501.14718v1 | null |
2025-01-24 | Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders | Zaheer Ahmad et.al. | 2501.14709v1 | null |
2025-01-24 | Stroke classification using Virtual Hybrid Edge Detection from in silico electrical impedance tomography data | Juan Pablo Agnelli et.al. | 2501.14704v1 | null |
2025-01-24 | Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST | Fuping Wu et.al. | 2501.14685v1 | null |
2025-01-24 | Artificial Intelligence Could Have Predicted All Space Weather Events Associated with the May 2024 Superstorm | Sabrina Guastavino et.al. | 2501.14684v1 | null |
2025-01-24 | An Empirical Study on LLM-based Classification of Requirements-related Provisions in Food-safety Regulations | Shabnam Hassani et.al. | 2501.14683v1 | null |
2025-01-24 | MatAnyone: Stable Video Matting with Consistent Memory Propagation | Peiqing Yang et.al. | 2501.14677v1 | null |
2025-01-24 | Automation of finding strong gravitational lenses in the Kilo Degree Survey with U-DenseLens (DenseLens + Segmentation) | Bharath Chowdhary Nagam et.al. | 2501.14650v1 | null |
2025-01-24 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Tianming Liang et.al. | 2501.14607v1 | null |
2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920v1 | null |
2025-01-23 | Temporal Preference Optimization for Long-Form Video Understanding | Rui Li et.al. | 2501.13919v1 | null |
2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918v1 | null |
2025-01-23 | Exploring Finetuned Audio-LLM on Heart Murmur Features | Adrian Florea et.al. | 2501.13884v1 | null |
2025-01-23 | Disclinations, dislocations, and emanant flux at Dirac criticality | Maissam Barkeshli et.al. | 2501.13866v1 | null |
2025-01-23 | Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning | Shiyu Zhang et.al. | 2501.13859v1 | null |
2025-01-23 | First Lessons Learned of an Artificial Intelligence Robotic System for Autonomous Coarse Waste Recycling Using Multispectral Imaging-Based Methods | Timo Lange et.al. | 2501.13855v1 | null |
2025-01-23 | Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Shiling Deng et.al. | 2501.13851v1 | link |
2025-01-23 | Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos | Kairui Hu et.al. | 2501.13826v1 | null |
2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824v1 | null |
2025-01-22 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Boqiang Zhang et.al. | 2501.13106v1 | link |
2025-01-22 | Robust Representation Consistency Model via Contrastive Denoising | Jiachen Lei et.al. | 2501.13094v1 | link |
2025-01-22 | CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | José Rodríguez-Ortega et.al. | 2501.13073v1 | null |
2025-01-22 | Robust Body Composition Analysis by Generating 3D CT Volumes from Limited 2D Slices | Lianrui Zuo et.al. | 2501.13071v1 | null |
2025-01-22 | Beyond the Lungs: Extending the Field of View in Chest CT with Latent Diffusion Models | Lianrui Zuo et.al. | 2501.13068v1 | null |
2025-01-22 | SMART-Vision: Survey of Modern Action Recognition Techniques in Vision | Ali K. AlShami et.al. | 2501.13066v1 | null |
2025-01-22 | Real-time Terahertz Compressive Optical-Digital Neural Network Imaging | Shao-Hsuan Wu et.al. | 2501.13065v1 | null |
2025-01-22 | One-Class Domain Adaptation via Meta-Learning | Stephanie Holly et.al. | 2501.13052v1 | null |
2025-01-22 | Characterizing Collective Efforts in Content Sharing and Quality Control for ADHD-relevant Content on Video-sharing Platforms | Hanxiu 'Hazel' Zhu et.al. | 2501.13020v1 | null |
2025-01-22 | Discrete Lagrangian multiforms for ABS equations I: quad equations | Jacob J. Richardson et.al. | 2501.13012v1 | null |
2025-01-21 | Learning segmentation from point trajectories | Laurynas Karazija et.al. | 2501.12392v1 | null |
2025-01-21 | Taming Teacher Forcing for Masked Autoregressive Video Generation | Deyu Zhou et.al. | 2501.12389v1 | null |
2025-01-21 | Continuous 3D Perception Model with Persistent State | Qianqian Wang et.al. | 2501.12387v1 | null |
2025-01-21 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | Yi Wang et.al. | 2501.12386v1 | link |
2025-01-21 | CCESAR: Coastline Classification-Extraction From SAR Images Using CNN-U-Net Combination | Vidhu Arora et.al. | 2501.12384v1 | null |
2025-01-21 | Parallel Sequence Modeling via Generalized Spatial Propagation Network | Hongjun Wang et.al. | 2501.12381v1 | null |
2025-01-21 | MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Yilun Zhao et.al. | 2501.12380v1 | link |
2025-01-21 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Sili Chen et.al. | 2501.12375v1 | null |
2025-01-21 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Yuhang Zang et.al. | 2501.12368v1 | link |
2025-01-21 | Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration | Thomas Walshe et.al. | 2501.12332v1 | null |
2025-01-17 | Zero-Shot Monocular Scene Flow Estimation in the Wild | Yiqing Liang et.al. | 2501.10357v1 | null |
2025-01-17 | DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation | Claire Chen et.al. | 2501.10356v1 | null |
2025-01-17 | Hybrid Deep Learning Model for epileptic seizure classification by using 1D-CNN with multi-head attention mechanism | Mohammed Guhdar et.al. | 2501.10342v1 | null |
2025-01-17 | Natural Language Processing of Privacy Policies: A Survey | Andrick Adhikari et.al. | 2501.10319v1 | null |
2025-01-17 | Using Technology in Digital Humanities for Learning and Knowledge Dissemination | Armanda Rodrigues et.al. | 2501.10275v1 | null |
2025-01-17 | Over-the-Air Multi-Sensor Inference with Neural Networks Using Memristor-Based Analog Computing | Busra Tegin et.al. | 2501.10245v1 | null |
2025-01-17 | Amortized Bayesian Mixture Models | Šimon Kucharský et.al. | 2501.10229v1 | null |
2025-01-17 | Adaptive Clustering for Efficient Phenotype Segmentation of UAV Hyperspectral Data | Ciem Cornelissen et.al. | 2501.10199v1 | null |
2025-01-17 | Secure Semantic Communication With Homomorphic Encryption | Rui Meng et.al. | 2501.10182v1 | null |
2025-01-17 | A Vision-Language Framework for Multispectral Scene Representation Using Language-Grounded Features | Enes Karanfil et.al. | 2501.10144v1 | null |
2025-01-16 | Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Philippe Hansen-Estruch et.al. | 2501.09755v1 | null |
2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et.al. | 2501.09754v1 | null |
2025-01-16 | SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification | Yuexi Du et.al. | 2501.09753v1 | link |
2025-01-16 | Improvement of Data Analytics Techniques in Reflection High Energy Electron Diffraction to Enable Machine Learning | Patrick T. Gemperline et.al. | 2501.09743v1 | link |
2025-01-16 | ComplexVAD: Detecting Interaction Anomalies in Video | Furkan Mumcu et.al. | 2501.09733v1 | null |
2025-01-16 | Practical Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2501.09705v1 | link |
2025-01-16 | Cueless EEG imagined speech for subject identification: dataset and benchmarks | Ali Derakhshesh et.al. | 2501.09700v1 | link |
2025-01-16 | Active particle in a very thin interfacial droplet | Airi N. Kato et.al. | 2501.09652v1 | null |
2025-01-16 | Electronic Health Records: Towards Digital Twins in Healthcare | Muhammet Alkan et.al. | 2501.09640v1 | null |
2025-01-16 | Unified Face Matching and Physical-Digital Spoofing Attack Detection | Arun Kunwar et.al. | 2501.09635v1 | null |
2025-01-15 | Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | Jingyuan Chen et.al. | 2501.09019v1 | null |
2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001v1 | null |
2025-01-15 | RepVideo: Rethinking Cross-Layer Representation for Video Generation | Chenyang Si et.al. | 2501.08994v1 | null |
2025-01-15 | Learning to Extract Cross-Domain Aspects and Understanding Sentiments Using Large Language Models | Karukriti Kaushik Ghosh et.al. | 2501.08974v1 | null |
2025-01-15 | An analysis of data variation and bias in image-based dermatological datasets for machine learning classification | Francisco Mauro et.al. | 2501.08962v1 | null |
2025-01-15 | Neuromorphic Retina: An FPGA-based Emulator | Prince Phillip et.al. | 2501.08943v1 | null |
2025-01-15 | Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos | Javier Rodriguez-Juan et.al. | 2501.08931v1 | link |
2025-01-15 | Learning Joint Denoising, Demosaicing, and Compression from the Raw Natural Image Noise Dataset | Benoit Brummer et.al. | 2501.08924v1 | null |
2025-01-15 | Multi-View Transformers for Airway-To-Lung Ratio Inference on Cardiac CT Scans: The C4R Study | Sneha N. Naik et.al. | 2501.08902v1 | null |
2025-01-15 | An investigation of the relationship between morphology and chemistry of the D-type spherules from the recovery expedition of the CNEOS 2014-01-08 bolide: Implications for origins | Eugenia Hyung et.al. | 2501.08890v1 | null |
2025-01-14 | DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models | Hyeonwoo Kim et.al. | 2501.08333v1 | null |
2025-01-14 | Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise | Ryan Burgert et.al. | 2501.08331v1 | link |
2025-01-14 | Gradient Equilibrium in Online Learning: Theory and Applications | Anastasios N. Angelopoulos et.al. | 2501.08330v1 | link |
2025-01-14 | Predicting 4D Hand Trajectory from Monocular Videos | Yufei Ye et.al. | 2501.08329v1 | null |
2025-01-14 | Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Miran Heo et.al. | 2501.08326v1 | null |
2025-01-14 | GameFactory: Creating New Games with Generative Interactive Videos | Jiwen Yu et.al. | 2501.08325v1 | null |
2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324v1 | null |
2025-01-14 | Exploring Robustness of Multilingual LLMs on Real-World Noisy Data | Amirhossein Aliakbarzadeh et.al. | 2501.08322v1 | link |
2025-01-14 | Diffusion Adversarial Post-Training for One-Step Video Generation | Shanchuan Lin et.al. | 2501.08316v1 | null |
2025-01-14 | Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Wennuo Yang et.al. | 2501.08305v1 | link |
2025-01-13 | UnCommon Objects in 3D | Xingchen Liu et.al. | 2501.07574v1 | link |
2025-01-13 | Statistical learnability of smooth boundaries via pairwise binary classification with deep ReLU networks | Hiroki Waida et.al. | 2501.07571v1 | null |
2025-01-13 | A reference framework for extremely metal-poor OB star studies: calibrations for stellar parameters and intrinsic colours | Marta Lorenzo et.al. | 2501.07569v1 | null |
2025-01-13 | Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Xinyu Zhang et.al. | 2501.07563v1 | null |
2025-01-13 | SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing | Varun Biyyala et.al. | 2501.07554v1 | link |
2025-01-13 | IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion | Tharun Anand et.al. | 2501.07530v1 | null |
2025-01-13 | Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization | Aditya Devarakonda et.al. | 2501.07526v1 | null |
2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525v1 | link |
2025-01-13 | Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers | Daniel Fejgin et.al. | 2501.07524v1 | null |
2025-01-13 | Inductive Learning of Robot Task Knowledge from Raw Data and Online Expert Feedback | Daniele Meli et.al. | 2501.07507v1 | link |
2025-01-10 | Multi-subject Open-set Personalization in Video Generation | Tsai-Shien Chen et.al. | 2501.06187v1 | null |
2025-01-10 | VideoAuteur: Towards Long Narrative Video Generation | Junfei Xiao et.al. | 2501.06173v1 | null |
2025-01-10 | PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit | Yuechen Yang et.al. | 2501.06151v1 | null |
2025-01-10 | MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection | Arkaprava Sinha et.al. | 2501.06138v1 | null |
2025-01-10 | Benchmarking Different Application Types across Heterogeneous Cloud Compute Services | Nivedhitha Duggi et.al. | 2501.06128v1 | null |
2025-01-10 | Merging Feed-Forward Sublayers for Compressed Transformers | Neha Verma et.al. | 2501.06126v1 | link |
2025-01-10 | Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Fabian David Schmidt et.al. | 2501.06117v1 | null |
2025-01-10 | ELFATT: Efficient Linear Fast Attention for Vision Transformers | Chong Wu et.al. | 2501.06098v1 | null |
2025-01-10 | Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems | Steffen Dereich et.al. | 2501.06081v1 | link |
2025-01-10 | Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations | Pablo Barceló et.al. | 2501.06078v1 | null |
2025-01-09 | An Empirical Study of Autoregressive Pre-training from Videos | Jathushan Rajasegaran et.al. | 2501.05453v1 | null |
2025-01-09 | Fortuity in the D1-D5 system | Chi-Ming Chang et.al. | 2501.05448v1 | null |
2025-01-09 | Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces | Aniruddha Mahapatra et.al. | 2501.05442v1 | null |
2025-01-09 | From Images to Insights: Transforming Brain Cancer Diagnosis with Explainable AI | Md. Arafat Alam Khandaker et.al. | 2501.05426v1 | null |
2025-01-09 | Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation | Darius Petermann et.al. | 2501.05413v1 | null |
2025-01-09 | Innovative Designs and Insights into Quantum Thermal Machines | Aline D. Lucio et.al. | 2501.05406v1 | null |
2025-01-09 | Mechanistic understanding and validation of large AI models with SemanticLens | Maximilian Dreyer et.al. | 2501.05398v1 | null |
2025-01-09 | 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On | Shuliang Ning et.al. | 2501.05369v1 | null |
2025-01-09 | Video-Conferencing Beyond Screen-Sharing and Thumbnail Webcam Videos: Gesture-Aware Augmented Reality Video for Data-Rich Remote Presentations | Matthew Brehmer et.al. | 2501.05345v1 | null |
2025-01-09 | Stability and List-Replicability for Agnostic Learners | Ari Blonda et.al. | 2501.05333v1 | null |
2025-01-09 | Probing Speaker-specific Features in Speaker Representations | Aemon Yat Fei Chiu et.al. | 2501.05310v1 | null |
2025-01-08 | Planarian Neural Networks: Evolutionary Patterns from Basic Bilateria Shaping Modern Artificial Neural Network Architectures | Ziyuan Huang et.al. | 2501.04700v1 | null |
2025-01-08 | ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning | Yuzhou Huang et.al. | 2501.04698v1 | null |
2025-01-08 | Non-Markovian dynamics of BIC generation via single-photon scattering | Giuseppe Magnifico et.al. | 2501.04691v1 | null |
2025-01-08 | Learning by Confusion: The Phase Diagram of the Holstein Model | George Issa et.al. | 2501.04681v1 | null |
2025-01-08 | RadGPT: Constructing 3D Image-Text Tumor Datasets | Pedro R. A. S. Bassi et.al. | 2501.04678v1 | link |
2025-01-08 | Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs | Yikang Zhou et.al. | 2501.04670v1 | link |
2025-01-08 | HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion | Chia-Ming Lee et.al. | 2501.04665v1 | null |
2025-01-08 | Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification | Zhiqiang Gao et.al. | 2501.04643v1 | null |
2025-01-08 | A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Kazusato Oko et.al. | 2501.04641v1 | link |
2025-01-08 | Framework for Integrating Machine Learning Methods for Path-Aware Source Routing | Anees Al-Najjar et.al. | 2501.04624v1 | null |
2025-01-07 | Extraction Of Cumulative Blobs From Dynamic Gestures | Rishabh Naulakha et.al. | 2501.04002v1 | null |
2025-01-07 | Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Haobo Yuan et.al. | 2501.04001v1 | null |
2025-01-07 | WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | Haochen Song et.al. | 2501.03999v1 | null |
2025-01-07 | Supervised and unsupervised learning the many-body critical phase, phase transitions and critical exponents in disordered quantum systems | Aamna Ahmed et.al. | 2501.03981v1 | null |
2025-01-07 | Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification | Satchel French et.al. | 2501.03967v1 | link |
2025-01-07 | Learning to Relax Nonconvex Quadratically Constrained Quadratic Programs | Buket Ozen et.al. | 2501.03954v1 | null |
2025-01-07 | Reducing Proxy Discrimination | Frank Fagan et.al. | 2501.03946v1 | null |
2025-01-07 | Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Yuechen Zhang et.al. | 2501.03931v1 | link |
2025-01-07 | Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback | Jiakang Yuan et.al. | 2501.03916v1 | null |
2025-01-07 | The Cable to the Moon: Veritasium's Light Bulb Experiment in Low-Cost Miniature Form | Michael Lenz et.al. | 2501.03896v1 | null |
2025-01-06 | RW-Net: Enhancing Few-Shot Point Cloud Classification with a Wavelet Transform Projection-based Network | Haosheng Zhang et.al. | 2501.03221v1 | null |
2025-01-06 | ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking | Tingyang Zhang et.al. | 2501.03220v1 | null |
2025-01-06 | Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction | Rui Qian et.al. | 2501.03218v1 | link |
2025-01-06 | Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text | Ayat Najjar et.al. | 2501.03212v1 | null |
2025-01-06 | Multimodal Machine Learning Can Predict Videoconference Fluidity and Enjoyment | Andrew Chang et.al. | 2501.03190v1 | null |
2025-01-06 | GLiREL -- Generalist Model for Zero-Shot Relation Extraction | Jack Boylan et.al. | 2501.03172v1 | null |
2025-01-06 | Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning | Muyun Li et.al. | 2501.03162v1 | null |
2025-01-06 | Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy | Risha Goel et.al. | 2501.03153v1 | null |
2025-01-06 | MVP: Multimodal Emotion Recognition based on Video and Physiological Signals | Valeriya Strizhkova et.al. | 2501.03103v1 | null |
2025-01-06 | Trust Modeling in Counseling Conversations: A Benchmark Study | Aseem Srivastava et.al. | 2501.03064v1 | null |
2025-01-03 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Chaoyou Fu et.al. | 2501.01957v1 | link |
2025-01-03 | VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Wenyan Cong et.al. | 2501.01949v1 | null |
2025-01-03 | Bridging Classification and Segmentation in Osteosarcoma Assessment via Foundation and Discrete Diffusion Models | Manh Duong Nguyen et.al. | 2501.01932v1 | null |
2025-01-03 | GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction | Yuwei Miao et.al. | 2501.01930v1 | null |
2025-01-03 | Transformer-Driven Inverse Problem Transform for Fast Blind Hyperspectral Image Dehazing | Po-Wei Tang et.al. | 2501.01924v1 | null |
2025-01-03 | Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification | Jarin Ritu et.al. | 2501.01921v1 | null |
2025-01-03 | Exoplanet Detection via Differentiable Rendering | Brandon Y. Feng et.al. | 2501.01912v1 | null |
2025-01-03 | EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation | Siyuan Huang et.al. | 2501.01895v1 | null |
2025-01-03 | ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Luca Collorone et.al. | 2501.01877v1 | null |
2025-01-03 | Extensions of finite irreducible modules over rank two Lie conformal algebra | Lipeng Luo et.al. | 2501.01870v1 | null |
2025-01-02 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428v1 | null |
2025-01-02 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427v1 | null |
2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426v1 | null |
2025-01-02 | Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Xincheng Shuai et.al. | 2501.01425v1 | null |
2025-01-02 | Multi-Modal Video Feature Extraction for Popularity Prediction | Haixu Liu et.al. | 2501.01422v1 | null |
2025-01-02 | A Multi-task Supervised Compression Model for Split Computing | Yoshitomo Matsubara et.al. | 2501.01420v1 | null |
2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409v1 | null |
2025-01-02 | nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation | Haixu Liu et.al. | 2501.01406v1 | null |
2025-01-02 | VoiceVector: Multimodal Enrolment Vectors for Speaker Separation | Akam Rahimi et.al. | 2501.01401v1 | null |
2025-01-02 | ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer | Xuyin Qi et.al. | 2501.01392v1 | null |
2024-12-30 | PERSE: Personalized 3D Generative Avatars from A Single Portrait | Hyunsoo Cha et.al. | 2412.21206v1 | null |
2024-12-30 | Action-Agnostic Point-Level Supervision for Temporal Action Detection | Shuhei M. Yoshida et.al. | 2412.21205v1 | link |
2024-12-30 | A Large-Scale Study on Video Action Dataset Condensation | Yang Chen et.al. | 2412.21197v1 | null |
2024-12-30 | Classification of del Pezzo surfaces of rank one. I. Height 1 and 2. II. Descendants with elliptic boundaries | Karol Palka et.al. | 2412.21174v1 | null |
2024-12-30 | Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning | Yalin E. Sagduyu et.al. | 2412.21164v1 | null |
2024-12-30 | Open RAN-Enabled Deep Learning-Assisted Mobility Management for Connected Vehicles | Maria Barbosa et.al. | 2412.21161v1 | null |
2024-12-30 | Unified dimensionality reduction techniques in chronic liver disease detection | Anand Karna et.al. | 2412.21156v1 | null |
2024-12-30 | Irreducible representations of welded braid group | Inna Sysoeva et.al. | 2412.21133v1 | null |
2024-12-30 | Galaxy Spectra Networks (GaSNet). III. Generative pre-trained network for spectrum reconstruction, redshift estimate and anomaly detection | Fucheng Zhong et.al. | 2412.21130v1 | link |
2024-12-30 | All toric Kahler surfaces with twistor 2-forms | Sergei G. Ovchinnikov et.al. | 2412.21114v1 | null |
2024-12-27 | Streamlined Krylov construction and classification of ergodic Floquet systems | Nikita Kolganov et.al. | 2412.19797v1 | null |
2024-12-27 | MVTamperBench: Evaluating Robustness of Vision-Language Models | Amit Agarwal et.al. | 2412.19794v1 | null |
2024-12-27 | Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago | Cassandra Daniels et.al. | 2412.19781v1 | null |
2024-12-27 | Classification of Minimal Abelian Coulomb Branches | Antoine Bourget et.al. | 2412.19766v1 | null |
2024-12-27 | Can one hear the shape of a random walk? | Michael J. Larsen et.al. | 2412.19762v1 | null |
2024-12-27 | Generative Video Propagation | Shaoteng Liu et.al. | 2412.19761v1 | null |
2024-12-27 | Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition | Damien Bouchabou et.al. | 2412.19732v1 | null |
2024-12-27 | EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs | Daniil A. Berdyshev et.al. | 2412.19725v1 | link |
2024-12-27 | Quantum correlations in a gravitational collapse simulation with SpheriCo.jl | Benjamin Berczi et.al. | 2412.19722v1 | null |
2024-12-27 | ProKAN: Progressive Stacking of Kolmogorov-Arnold Networks for Efficient Liver Segmentation | Bhavesh Gyanchandani et.al. | 2412.19713v1 | null |
2024-12-24 | Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models | Jinhui Yi et.al. | 2412.18609v1 | link |
2024-12-24 | DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers | Yuntao Chen et.al. | 2412.18607v1 | null |
2024-12-24 | ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Hongjie Li et.al. | 2412.18600v1 | null |
2024-12-24 | DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Minghong Cai et.al. | 2412.18597v1 | link |
2024-12-24 | ClassifyViStA:WCE Classification with Visual understanding through Segmentation and Attention | S. Balasubramanian et.al. | 2412.18591v1 | link |
2024-12-24 | Text-Driven Tumor Synthesis | Xinran Li et.al. | 2412.18589v1 | null |
2024-12-24 | Resolution-Robust 3D MRI Reconstruction with 2D Diffusion Priors: Diverse-Resolution Training Outperforms Interpolation | Anselm Krainovic et.al. | 2412.18584v1 | null |
2024-12-24 | New method of image processing via statistical analysis for application in intelligent systems | Monalisa Cavalcante et.al. | 2412.18575v1 | null |
2024-12-24 | 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement | Yihang Luo et.al. | 2412.18565v1 | null |
2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552v1 | link |
2024-12-23 | FaceLift: Single Image to 3D Head with View Generation and GS-LRM | Weijie Lyu et.al. | 2412.17812v1 | null |
2024-12-23 | Large Motion Video Autoencoding with Cross-modal Video VAE | Yazhou Xing et.al. | 2412.17805v1 | null |
2024-12-23 | GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator | Yidi Shao et.al. | 2412.17804v1 | null |
2024-12-23 | Classification of exchange relation planar algebras through sieving forest fusion graphs | Fan Lu et.al. | 2412.17790v1 | null |
2024-12-23 | Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Priyaranjan Pattnayak et.al. | 2412.17759v1 | null |
2024-12-23 | Induced subgraphs and tree decompositions XVIII. Obstructions to bounded pathwidth | Maria Chudnovsky et.al. | 2412.17756v1 | null |
2024-12-23 | LASE: Learned Adjacency Spectral Embeddings | Sofía Pérez Casulo et.al. | 2412.17734v1 | null |
2024-12-23 | VidTwin: Video VAE with Decoupled Structure and Dynamics | Yuchi Wang et.al. | 2412.17726v1 | link |
2024-12-23 | MRANet: A Modified Residual Attention Networks for Lung and Colon Cancer Classification | Diponkor Bala et.al. | 2412.17700v1 | null |
2024-12-23 | An efficient volume-preserving MBO scheme for data clustering and classification | Fabius Krämer et.al. | 2412.17694v1 | null |
2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155v1 | null |
2024-12-20 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | Shijie Wang et.al. | 2412.16153v1 | null |
2024-12-20 | Shape Shifters: Does Body Shape Change the Perception of Small-Scale Crowd Motions? | Bharat Vyas et.al. | 2412.16151v1 | null |
2024-12-20 | SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild | Jannik Elsäßer et.al. | 2412.16147v1 | null |
2024-12-20 | Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks | Enis Baty et.al. | 2412.16146v1 | null |
2024-12-20 | FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks | Siddharth Ambekar et.al. | 2412.16144v1 | null |
2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119v1 | link |
2024-12-20 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang et.al. | 2412.16117v1 | link |
2024-12-20 | Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | Hasan Md Tusfiqur Alam et.al. | 2412.16086v1 | link |
2024-12-20 | Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Jun Ma et.al. | 2412.16085v1 | link |
2024-12-19 | LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Hanlin Wang et.al. | 2412.15214v1 | null |
2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212v1 | null |
2024-12-19 | AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation | Moayed Haji-Ali et.al. | 2412.15191v1 | null |
2024-12-19 | EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | Sagar Soni et.al. | 2412.15190v1 | null |
2024-12-19 | Surface-Based Authentication System for Integrated Circuit Chips | Runze Liu et.al. | 2412.15186v1 | null |
2024-12-19 | Tiled Diffusion | Or Madar et.al. | 2412.15185v1 | null |
2024-12-19 | SqueezeMe: Efficient Gaussian Avatars for VR | Shunsuke Saito et.al. | 2412.15171v1 | null |
2024-12-19 | OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization | Jiacheng Zhang et.al. | 2412.15159v1 | null |
2024-12-19 | Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Yatai Ji et.al. | 2412.15156v1 | link |
2024-12-19 | Cruise Control: Dynamic Model Selection for ML-Based Network Traffic Analysis | Johann Hugon et.al. | 2412.15146v1 | null |
2024-12-18 | AniDoc: Animation Creation Made Easier | Yihao Meng et.al. | 2412.14173v1 | null |
2024-12-18 | Learning from Massive Human Videos for Universal Humanoid Pose Control | Jiageng Mao et.al. | 2412.14172v1 | null |
2024-12-18 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang et.al. | 2412.14171v1 | link |
2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169v1 | link |
2024-12-18 | VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Runtao Liu et.al. | 2412.14167v1 | null |
2024-12-18 | AKiRa: Augmentation Kit on Rays for optical video generation | Xi Wang et.al. | 2412.14158v1 | null |
2024-12-18 | AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities | Guillaume Astruc et.al. | 2412.14123v1 | link |
2024-12-18 | GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images | Ziyang Xu et.al. | 2412.14118v1 | link |
2024-12-18 | Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset | Bijay Adhikari et.al. | 2412.14100v1 | null |
2024-12-18 | Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts | Jihye Choi et.al. | 2412.14097v1 | null |
2024-12-17 | MotionBridge: Dynamic Video Inbetweening with Flexible Controls | Maham Tanveer et.al. | 2412.13190v1 | null |
2024-12-17 | StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models | Yunzhi Yan et.al. | 2412.13188v1 | null |
2024-12-17 | HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction | Chen Bao et.al. | 2412.13187v1 | null |
2024-12-17 | Move-in-2D: 2D-Conditioned Human Motion Generation | Hsin-Ping Huang et.al. | 2412.13185v1 | null |
2024-12-17 | Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures | Guoxing Sun et.al. | 2412.13183v1 | null |
2024-12-17 | NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment | Andrea Dunn Beltran et.al. | 2412.13176v1 | null |
2024-12-17 | Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions | Juan Del Aguila Ferrandis et.al. | 2412.13157v1 | null |
2024-12-17 | Continuous Patient Monitoring with AI: Real-Time Analysis of Video in Hospital Care Settings | Paolo Gabriel et.al. | 2412.13152v1 | null |
2024-12-17 | Label Errors in the Tobacco3482 Dataset | Gordon Lim et.al. | 2412.13140v1 | link |
2024-12-17 | Unlocking the Potential of Digital Pathology: Novel Baselines for Compression | Maximilian Fischer et.al. | 2412.13137v1 | null |
2024-12-16 | Wonderland: Navigating 3D Scenes from a Single Image | Hanwen Liang et.al. | 2412.12091v1 | null |
2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087v1 | null |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077v1 | null |
2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075v1 | null |
2024-12-16 | Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation | Hongwei Niu et.al. | 2412.12050v1 | link |
2024-12-16 | Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation | Tim Sziburis et.al. | 2412.12016v1 | null |
2024-12-16 | Cost-Effective Label-free Node Classification with LLMs | Taiyan Zhang et.al. | 2412.11983v1 | null |
2024-12-16 | On the Nielsen-Thomsen sequence | Laurent Cantier et.al. | 2412.11975v1 | null |
2024-12-16 | On vertex-transitive distance-regular covers of complete graphs with an extremal smallest eigenvalue | Ludmila Yu. Tsiovkina et.al. | 2412.11962v1 | null |
2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et.al. | 2412.11959v1 | null |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372v1 | link |
2024-12-13 | Apollo: An Exploration of Video Understanding in Large Multimodal Models | Orr Zohar et.al. | 2412.10360v1 | null |
2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353v1 | null |
2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316v1 | null |
2024-12-13 | Performance evaluation of predictive AI models to support medical decisions: Overview and guidance | Ben Van Calster et.al. | 2412.10288v1 | null |
2024-12-13 | TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation | Xingrui Wang et.al. | 2412.10275v1 | null |
2024-12-13 | Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media | Jiaqing Yuan et.al. | 2412.10266v1 | null |
2024-12-13 | Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication | Alireza Furutanpey et.al. | 2412.10265v1 | null |
2024-12-13 | MVQ:Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization | Shuaiting Li et.al. | 2412.10261v1 | null |
2024-12-13 | Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset | Hao-Chiang Shao et.al. | 2412.10258v1 | null |
2024-12-12 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | Wenzhao Zheng et.al. | 2412.09627v1 | link |
2024-12-12 | FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion | Haonan Qiu et.al. | 2412.09626v1 | null |
2024-12-12 | GenEx: Generating an Explorable World | Taiming Lu et.al. | 2412.09624v1 | null |
2024-12-12 | OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Weiqi Li et.al. | 2412.09623v1 | null |
2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621v1 | null |
2024-12-12 | Learning Camera Movement Control from Real-World Drone Videos | Yunzhong Hou et.al. | 2412.09620v1 | null |
2024-12-12 | NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors | Hung-Jui Huang et.al. | 2412.09617v1 | link |
2024-12-12 | V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Junqi Ge et.al. | 2412.09616v1 | link |
2024-12-12 | PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models | Chenyu Yang et.al. | 2412.09613v1 | null |
2024-12-12 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612v1 | link |
2024-12-11 | StreamChat: Chatting with Streaming Video | Jihao Liu et.al. | 2412.08646v1 | null |
2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642v1 | null |
2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635v1 | null |
2024-12-11 | MNIST-Fraction: Enhancing Math Education with AI-Driven Fraction Detection and Analysis | Pegah Ahadian et.al. | 2412.08633v1 | null |
2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618v1 | null |
2024-12-11 | CCSNscore: A multi-input deep learning tool for classification of core-collapse supernovae using SED-Machine spectra | Yashvi Sharma et.al. | 2412.08601v1 | null |
2024-12-11 | RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation | Mingfei Han et.al. | 2412.08591v1 | null |
2024-12-11 | SPACE-SUIT: An Artificial Intelligence based chromospheric feature extractor and classifier for SUIT | Pranava Seth et.al. | 2412.08589v1 | null |
2024-12-11 | Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning | Hang Zhao et.al. | 2412.08587v1 | null |
2024-12-11 | Utilizing Multi-step Loss for Single Image Reflection Removal | Abdelrahman Elnenaey et.al. | 2412.08582v1 | link |
2024-12-10 | Video Motion Transfer with Diffusion Transformers | Alexander Pondaven et.al. | 2412.07776v1 | link |
2024-12-10 | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Xi Chen et.al. | 2412.07774v1 | null |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772v1 | null |
2024-12-10 | From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos | Matthew Wallingford et.al. | 2412.07770v1 | null |
2024-12-10 | Learning Visual Generative Priors without Text | Shuailei Ma et.al. | 2412.07767v1 | null |
2024-12-10 | Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation | Jingxi Chen et.al. | 2412.07761v1 | null |
2024-12-10 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Jianhong Bai et.al. | 2412.07760v1 | link |
2024-12-10 | 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | Xiao Fu et.al. | 2412.07759v1 | null |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754v1 | null |
2024-12-10 | On Motion Blur and Deblurring in Visual Place Recognition | Timur Ismagilov et.al. | 2412.07751v1 | null |
2024-12-09 | [MASK] is All You Need | Vincent Tao Hu et.al. | 2412.06787v1 | link |
2024-12-09 | P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies | Mara Levy et.al. | 2412.06784v1 | null |
2024-12-09 | Convolution goes higher-order: a biologically inspired mechanism empowers image classification | Simone Azeglio et.al. | 2412.06740v1 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738v1 | null |
2024-12-09 | Demystifying shock breakout spectra | Christopher M. Irwin et.al. | 2412.06734v1 | null |
2024-12-09 | Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection | Aqib Nazir Mir et.al. | 2412.06709v1 | null |
2024-12-09 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699v1 | null |
2024-12-09 | FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis | Ciro Benito Raggio et.al. | 2412.06690v1 | null |
2024-12-09 | Impact of Privacy Parameters on Deep Learning Models for Image Classification | Basanta Chaulagain et.al. | 2412.06689v1 | null |
2024-12-09 | Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | Shanshan Wang et.al. | 2412.06666v1 | null |
2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280v1 | link |
2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276v1 | link |
2024-12-06 | MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models | Tuna Han Salih Meral et.al. | 2412.05275v1 | null |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271v1 | null |
2024-12-06 | Mind the Time: Temporally-Controlled Multi-Event Video Generation | Ziyi Wu et.al. | 2412.05263v1 | null |
2024-12-06 | TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | Qian Long et.al. | 2412.05255v1 | link |
2024-12-06 | Uncertainty Quantification for Transformer Models for Dark-Pattern Detection | Javier Muñoz et.al. | 2412.05251v1 | null |
2024-12-06 | ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding | Ayushman Singh et.al. | 2412.05216v1 | null |
2024-12-06 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | Lishuai Gao et.al. | 2412.05185v1 | link |
2024-12-06 | DreamColour: Controllable Video Colour Editing without Training | Chaitat Utintu et.al. | 2412.05180v1 | null |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471v1 | null |
2024-12-05 | QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Sharath Girish et.al. | 2412.04469v1 | null |
2024-12-05 | NVILA: Efficient Frontier Visual Language Models | Zhijian Liu et.al. | 2412.04468v1 | null |
2024-12-05 | VisionZip: Longer is Better but Not Necessary in Vision Language Models | Senqiao Yang et.al. | 2412.04467v1 | link |
2024-12-05 | MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | Zhengqi Li et.al. | 2412.04463v1 | null |
2024-12-05 | 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion | Chaoyang Wang et.al. | 2412.04462v1 | null |
2024-12-05 | Four-Plane Factorized Video Autoencoders | Mohammed Suhail et.al. | 2412.04452v1 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448v1 | null |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447v1 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446v1 | null |
2024-12-04 | Navigation World Models | Amir Bar et.al. | 2412.03572v1 | null |
2024-12-04 | The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Ruili Feng et.al. | 2412.03568v1 | null |
2024-12-04 | Streaming Detection of Queried Event Start | Cristobal Eyzaguirre et.al. | 2412.03567v1 | null |
2024-12-04 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Wujian Peng et.al. | 2412.03565v1 | null |
2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563v1 | null |
2024-12-04 | Imagine360: Immersive 360 Video Generation from Perspective Anchor | Jing Tan et.al. | 2412.03552v1 | null |
2024-12-04 | Kibble-Zurek Dynamics & Statistics of Topological Defects in Chiral Superfluid $^3$He Films | Noble Gluscevich et.al. | 2412.03544v1 | null |
2024-12-04 | Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos | Hanxue Liang et.al. | 2412.03526v1 | null |
2024-12-04 | Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | Hannan Lu et.al. | 2412.03520v1 | null |
2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512v1 | null |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | Daniel Geng et.al. | 2412.02700v1 | null |
2024-12-03 | An ADHD Diagnostic Interface Based on EEG Spectrograms and Deep Learning Techniques | Medha Pappula et.al. | 2412.02695v1 | null |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690v1 | null |
2024-12-03 | AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | Lingteng Qiu et.al. | 2412.02684v1 | null |
2024-12-03 | On Third-Order Evolution Systems Describing Pseudo-Spherical or Spherical Surfaces | Filipe Kelmer et.al. | 2412.02657v1 | null |
2024-12-03 | Robust soybean seed yield estimation using high-throughput ground robot videos | Jiale Feng et.al. | 2412.02642v1 | null |
2024-12-03 | QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing | Ramesh Manuvinakurike et.al. | 2412.02638v1 | null |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617v1 | null |
2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605v1 | null |
2024-12-03 | Efficient Algorithms for Low Tubal Rank Tensor Approximation with Applications to Image Compression, Super-Resolution and Deep Learning | Salman Ahmadi-Asl et.al. | 2412.02598v1 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951v2 | link |
2024-11-29 | AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos | Yuze He et.al. | 2411.19950v1 | null |
2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et.al. | 2411.19941v1 | null |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921v1 | null |
2024-11-29 | Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy | Araceli Guzmán-Tristán et.al. | 2411.19902v1 | null |
2024-11-29 | To the Problem of Cosmic Expansion in Massive Gravity | Lavinia Heisenberg et.al. | 2411.19873v1 | null |
2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869v1 | link |
2024-11-29 | Towards Class-wise Robustness Analysis | Tejaswini Medi et.al. | 2411.19853v1 | null |
2024-11-29 | Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation | Dimosthenis Antypas et.al. | 2411.19832v1 | null |
2024-11-29 | A new definition of outsplitting on | Mackenzie Amann et.al. | 2411.19816v1 | null |
2024-11-27 | GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data | Wentao Wang et.al. | 2411.18624v1 | null |
2024-11-27 | Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data | Aoran Shen et.al. | 2411.18622v1 | null |
2024-11-27 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | Rundi Wu et.al. | 2411.18613v1 | null |
2024-11-27 | Novel Class Discovery for Open Set Raga Classification | Parampreet Singh et.al. | 2411.18611v1 | null |
2024-11-27 | Variability of hot sub-luminous stars and binaries: Machine learning analysis of Gaia DR3 multi-epoch photometry | P. Ranaivomanana et.al. | 2411.18609v1 | null |
2024-11-27 | Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis | Eva Prakash et.al. | 2411.18602v1 | null |
2024-11-27 | Periodic symplectic and Hamiltonian diffeomorphisms on irrational ruled surfaces | Nicholas Lindsay et.al. | 2411.18580v1 | null |
2024-11-27 | Pruning Deep Convolutional Neural Network Using Conditional Mutual Information | Tien Vu-Van et.al. | 2411.18578v1 | null |
2024-11-27 | Exploring Depth Information for Detecting Manipulated Face Videos | Haoyue Wang et.al. | 2411.18572v1 | null |
2024-11-27 | Perturbation Ontology based Graph Attention Networks | Yichen Wang et.al. | 2411.18520v1 | null |
2024-11-26 | Video-Guided Foley Sound Generation with Multimodal Controls | Ziyang Chen et.al. | 2411.17698v1 | null |
2024-11-26 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697v1 | link |
2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et.al. | 2411.17690v1 | null |
2024-11-26 | BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings | Abhay Shanbhag et.al. | 2411.17661v1 | null |
2024-11-26 | DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | Christian Homeyer et.al. | 2411.17660v1 | link |
2024-11-26 | SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation | Claudia Cuttano et.al. | 2411.17646v1 | link |
2024-11-26 | A robust image encryption scheme based on new 4-D hyperchaotic system and elliptic curve | Yehia Lalili et.al. | 2411.17643v1 | null |
2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637v1 | null |
2024-11-26 | An Ensemble Approach for Brain Tumor Segmentation and Synthesis | Juampablo E. Heras Rivera et.al. | 2411.17617v1 | null |
2024-11-26 | Accelerating Vision Diffusion Transformers with Skip Branches | Guanjie Chen et.al. | 2411.17616v1 | link |
2024-11-25 | Generative Omnimatte: Learning to Decompose Video into Layers | Yao-Chih Lee et.al. | 2411.16683v1 | null |
2024-11-25 | Quark: Real-time, High-resolution, and General Neural View Synthesis | John Flynn et.al. | 2411.16680v1 | null |
2024-11-25 | A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports | Gabriel Okasa et.al. | 2411.16662v1 | null |
2024-11-25 | Fast training of large kernel models with delayed projections | Amirhesam Abedsoltan et.al. | 2411.16658v1 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657v1 | null |
2024-11-25 | Automated Registration of 3D Neurovascular Territory Atlas to 2D DSA for Targeted Quantitative Angiography Analysis | George Dimopoulos et.al. | 2411.16637v1 | null |
2024-11-25 | LegoPET: Hierarchical Feature Guided Conditional Diffusion for PET Image Reconstruction | Yiran Sun et.al. | 2411.16629v1 | null |
2024-11-25 | Inference-Time Policy Steering through Human Interactions | Yanwei Wang et.al. | 2411.16627v1 | null |
2024-11-25 | Imperceptible Adversarial Examples in the Physical World | Weilin Xu et.al. | 2411.16622v1 | null |
2024-11-25 | Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric | Zhichao Zhang et.al. | 2411.16619v1 | null |
2024-11-22 | Health AI Developer Foundations | Atilla P. Kiraly et.al. | 2411.15128v1 | null |
2024-11-22 | PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Arnav M. Das et.al. | 2411.15127v1 | null |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115v1 | null |
2024-11-22 | About Time: Advances, Challenges, and Outlooks of Action Understanding | Alexandros Stergiou et.al. | 2411.15106v1 | null |
2024-11-22 | Efficient Radar Modulation Recognition via a Noise-Aware Ensemble Neural Network | Do-Hyun Park et.al. | 2411.15104v1 | null |
2024-11-22 | RED: Effective Trajectory Representation Learning with Comprehensive Information | Silin Zhou et.al. | 2411.15096v1 | null |
2024-11-22 | Dimension-independent rates for structured neural density estimation | Robert A. Vandermeulen et.al. | 2411.15095v1 | null |
2024-11-22 | Quantum-enhanced unsupervised image segmentation for medical images analysis | Laia Domingo et.al. | 2411.15086v1 | null |
2024-11-22 | Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation | Lakshmikar R. Polamreddy et.al. | 2411.15084v1 | link |
2024-11-22 | RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency | Wentao Huang et.al. | 2411.15076v1 | null |
2024-11-21 | Revisiting the Integration of Convolution and Attention for Vision Backbone | Lei Zhu et.al. | 2411.14429v1 | link |
2024-11-21 | Quantum States Imaging of Magnetic Field Contours based on Autler-Townes Effect in Yb Atoms | Tanaporn Na Narong et.al. | 2411.14426v1 | null |
2024-11-21 | Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation | Zhuoman Liu et.al. | 2411.14423v1 | null |
2024-11-21 | Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field | Lan Jiang et.al. | 2411.14418v1 | null |
2024-11-21 | Multimodal Autoregressive Pre-training of Large Vision Encoders | Enrico Fini et.al. | 2411.14402v1 | link |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401v1 | null |
2024-11-21 | POS-tagging to highlight the skeletal structure of sentences | Grigorii Churakov et.al. | 2411.14393v1 | link |
2024-11-21 | Persistent Homology for Structural Characterization in Disordered Systems | An Wang et.al. | 2411.14390v1 | link |
2024-11-21 | Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach | Xian-Xian Liu et.al. | 2411.14385v1 | null |
2024-11-21 | Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation | Yuanhao Cai et.al. | 2411.14384v1 | null |
2024-11-20 | REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Rui Tian et.al. | 2411.13552v1 | link |
2024-11-20 | Generating 3D-Consistent Videos from Unposed Internet Photos | Gene Chou et.al. | 2411.13549v1 | null |
2024-11-20 | Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix | Subhasish Das et.al. | 2411.13535v1 | null |
2024-11-20 | Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse | S. Chapagain et.al. | 2411.13534v1 | null |
2024-11-20 | Geometric Algebra Planes: Convex Implicit Neural Volumes | Irmak Sivgin et.al. | 2411.13525v1 | null |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Ziqi Huang et.al. | 2411.13503v1 | link |
2024-11-20 | Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations | Yasmine Mustafa et.al. | 2411.13490v1 | null |
2024-11-20 | Benchmarking Quantum Convolutional Neural Networks for Classification and Data Compression Tasks | Jun Yong Khoo et.al. | 2411.13468v1 | null |
2024-11-20 | Heuristically Adaptive Diffusion-Model Evolutionary Strategy | Benedikt Hartl et.al. | 2411.13420v1 | null |
2024-11-20 | Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese | Dat Van-Thanh Nguyen et.al. | 2411.13407v1 | null |
2024-11-19 | Soft Robotic Dynamic In-Hand Pen Spinning | Yunchao Yao et.al. | 2411.12734v1 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712v1 | null |
2024-11-19 | UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments | Chunru Lin et.al. | 2411.12711v1 | null |
2024-11-19 | Attribute Inference Attacks for Federated Regression Tasks | Francesco Diana et.al. | 2411.12697v1 | null |
2024-11-19 | IMUVIE: Pickup Timeline Action Localization via Motion Movies | John Clapham et.al. | 2411.12689v1 | null |
2024-11-19 | AI Guided Early Screening of Cervical Cancer | Dharanidharan S I et.al. | 2411.12681v1 | null |
2024-11-19 | Yang-Mills topology on four-dimensional triangulations | Giuseppe Clemente et.al. | 2411.12668v1 | null |
2024-11-19 | Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis | Kazi Hasibul Kabir et.al. | 2411.12667v1 | null |
2024-11-19 | PoM: Efficient Image and Video Generation with the Polynomial Mixer | David Picard et.al. | 2411.12663v1 | link |
2024-11-19 | AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593v1 | null |
2024-11-18 | Partially Hyperbolic Dynamics with Quasi-isometric Center | Ziqiang Feng et.al. | 2411.11836v1 | null |
2024-11-18 | Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals | Maryam Cheema et.al. | 2411.11835v1 | null |
2024-11-18 | Absorbing state dynamics of stochastic gradient descent | Guanming Zhang et.al. | 2411.11834v1 | null |
2024-11-18 | Equivariant spatio-hemispherical networks for diffusion MRI deconvolution | Axel Elaldi et.al. | 2411.11819v1 | link |
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | Meng Zhou et.al. | 2411.11799v1 | link |
2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795v1 | null |
2024-11-18 | Energy shifts and broadening of excitonic resonances in electrostatically-doped semiconductors | Hanan Dery et.al. | 2411.11790v1 | null |
2024-11-18 | High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles | Shiyue Zhao et.al. | 2411.11762v1 | null |
2024-11-18 | Additional Tests for TV 3.0 | Eduardo Peixoto et.al. | 2411.11755v1 | null |
2024-11-18 | Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking | German Gritsai et.al. | 2411.11736v1 | null |
2024-11-15 | The Spatial Complexity of Optical Computing and How to Reduce It | Yandong Li et.al. | 2411.10435v1 | null |
2024-11-15 | Private Counterfactual Retrieval With Immutable Features | Shreya Meel et.al. | 2411.10429v1 | null |
2024-11-15 | Back to Supervision: Boosting Word Boundary Detection through Frame Classification | Simone Carnemolla et.al. | 2411.10423v1 | null |
2024-11-15 | Multiscale Dubuc: A New Similarity Measure for Time Series | Mahsa Khazaei et.al. | 2411.10418v1 | null |
2024-11-15 | Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations | Jianfeng Chi et.al. | 2411.10414v1 | null |
2024-11-15 | Experimental demonstration of Tessellation Structured Illumination Microscopy | Doron Shterman et.al. | 2411.10405v1 | null |
2024-11-15 | On the Foundation Model for Cardiac MRI Reconstruction | Chi Zhang et.al. | 2411.10403v1 | null |
2024-11-15 | Tropical combinatorics of max-linear Bayesian networks | Carlos Améndola et.al. | 2411.10394v1 | null |
2024-11-15 | Mechanisms of Generative Image-to-Image Translation Networks | Guangzong Chen et.al. | 2411.10368v1 | null |
2024-11-15 | On the Cost of Model-Serving Frameworks: An Experimental Evaluation | Pasquale De Rosa et.al. | 2411.10337v1 | null |
2024-11-14 | Towards a Classification of Open-Source ML Models and Datasets for Software Engineering | Alexandra González et.al. | 2411.09683v1 | null |
2024-11-14 | Commensurability Among Deligne-Mostow Monodromy Groups | Chenglong Yu et.al. | 2411.09682v1 | null |
2024-11-14 | Modular Fault Diagnosis Framework for Complex Autonomous Driving Systems | Stefan Orf et.al. | 2411.09643v1 | null |
2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612v1 | null |
2024-11-14 | Effect of viewing angle in Gamma-ray Burst properties | Sreelakshmi P Chakyar et.al. | 2411.09609v1 | null |
2024-11-14 | Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Yifan Shao et.al. | 2411.09604v1 | link |
2024-11-14 | Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images | Bipasha Kundu et.al. | 2411.09598v1 | null |
2024-11-14 | SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms | Soumick Chatterjee et.al. | 2411.09593v1 | null |
2024-11-14 | SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas | Yu-Kai Hung et.al. | 2411.09577v1 | null |
2024-11-14 | Mutual Influence of Photon Sphere and Non-Commutative Parameter in Various Non-Commutative Black Holes: Part I- Towards evidence for WGC | Mohammad Ali S. Afshar et.al. | 2411.09557v1 | null |
2024-11-13 | 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization | Mijeong Kim et.al. | 2411.08879v1 | null |
2024-11-13 | A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos | Debidatta Dwibedi et.al. | 2411.08878v1 | link |
2024-11-13 | Quantum cryptography beyond key distribution: theory and experiment | Mathieu Bozzio et.al. | 2411.08877v1 | null |
2024-11-13 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | Sadjad Alikhani et.al. | 2411.08872v1 | null |
2024-11-13 | AstroM$^3$: A self-supervised multimodal model for astronomy | Mariia Rizhko et.al. | 2411.08842v1 | null |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840v1 | null |
2024-11-13 | Model agnostic local variable importance for locally dependent relationships | Kelvyn K. Bladen et.al. | 2411.08821v1 | null |
2024-11-13 | Identifying Spicules in Mg II: Statistics and Comparisons with Hα | Vicki L. Herde et.al. | 2411.08801v1 | null |
2024-11-13 | Algorithms in 4-manifold topology | Stefan Bastl et.al. | 2411.08775v1 | null |
2024-11-13 | Sharingan: Extract User Action Sequence from Desktop Recordings | Yanting Chen et.al. | 2411.08768v1 | null |
2024-11-12 | Leonardo vindicated: Pythagorean trees for minimal reconstruction of the natural branching structures | Dymitr Ruta et.al. | 2411.08024v1 | null |
2024-11-12 | Artistic Neural Style Transfer Algorithms with Activation Smoothing | Xiangtian Li et.al. | 2411.08014v1 | null |
2024-11-12 | A computer-vision aided Compton-imaging system for radioactive waste characterization and decommissioning of nuclear power plants | Victor Babiano-Suarez et.al. | 2411.07996v1 | null |
2024-11-12 | DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring | Mahmut S. Gokmen et.al. | 2411.07976v1 | null |
2024-11-12 | Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects | Laura Dominé et.al. | 2411.07956v1 | null |
2024-11-12 | SimBase: A Simple Baseline for Temporal Video Grounding | Peijun Bao et.al. | 2411.07945v1 | null |
2024-11-12 | DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks | Zhaoxi Zhang et.al. | 2411.07941v1 | null |
2024-11-12 | Prediction of Acoustic Communication Performance for AUVs using Gaussian Process Classification | Yifei Gao et.al. | 2411.07933v1 | null |
2024-11-12 | CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising | Linxuan Li et.al. | 2411.07930v1 | null |
2024-11-12 | CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts | Aniket Deroy et.al. | 2411.07917v1 | null |
2024-11-11 | Grounding Video Models to Actions through Goal Conditioned Exploration | Yunhao Luo et.al. | 2411.07223v1 | null |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186v1 | null |
2024-11-11 | Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network | Raúl de la Fuente et.al. | 2411.07168v1 | null |
2024-11-11 | Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Kaijian Zou et.al. | 2411.07130v1 | link |
2024-11-11 | StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Yichen He et.al. | 2411.07076v1 | link |
2024-11-11 | Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification | Albert Belenguer-Llorens et.al. | 2411.07043v1 | null |
2024-11-11 | The Inherent Adversarial Robustness of Analog In-Memory Computing | Corey Lammie et.al. | 2411.07023v1 | null |
2024-11-11 | HeteroSample: Meta-path Guided Sampling for Heterogeneous Graph Representation Learning | Ao Liu et.al. | 2411.07022v1 | null |
2024-11-11 | Token2Wave | Xin Zhang et.al. | 2411.06989v1 | null |
2024-11-11 | A Hyperspectral Imaging Dataset and Methodology for Intraoperative Pixel-Wise Classification of Metastatic Colon Cancer in the Liver | Ivica Kopriva et.al. | 2411.06969v1 | null |
2024-11-08 | Gender Inequalities in Content Collaborations: Asymmetric Creator Synergy and Symmetric Audience Biases | Mingyue Zha et.al. | 2411.05782v1 | null |
2024-11-08 | Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems | Guixian Xu et.al. | 2411.05771v1 | null |
2024-11-08 | FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information | Shreen Gul et.al. | 2411.05752v1 | link |
2024-11-08 | Accurate Unsupervised Photon Counting from Transition Edge Sensor Signals | Nicolas Dalbec-Constant et.al. | 2411.05737v1 | null |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734v1 | null |
2024-11-08 | Differential Privacy Under Class Imbalance: Methods and Empirical Insights | Lucas Rosenblatt et.al. | 2411.05733v1 | null |
2024-11-08 | On-chip rewritable phase-change metasurface for programmable diffractive deep neural networks | Sanaz Zarei et.al. | 2411.05723v1 | null |
2024-11-08 | Classification of ( | Basdouri Imed et.al. | 2411.05716v1 | null |
2024-11-08 | STARS: Sensor-agnostic Transformer Architecture for Remote Sensing | Ethan King et.al. | 2411.05714v1 | null |
2024-11-08 | Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream | Abdulkadir Gokce et.al. | 2411.05712v1 | link |
2024-11-07 | ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning | David Junhao Zhang et.al. | 2411.05003v1 | null |
2024-11-07 | DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Peiqi Liu et.al. | 2411.04999v1 | null |
2024-11-07 | HourVideo: 1-Hour Video-Language Understanding | Keshigeyan Chandrasegaran et.al. | 2411.04998v1 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989v1 | null |
2024-11-07 | Efficient Preparation of Solvable Anyons with Adaptive Quantum Circuits | Yuanjie Ren et.al. | 2411.04985v1 | null |
2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981v1 | null |
2024-11-07 | Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification | Mischa Dombrowski et.al. | 2411.04956v1 | null |
2024-11-07 | Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach | Gideon Yoffe et.al. | 2411.04950v1 | null |
2024-11-07 | Proof of the absence of local conserved quantities in the spin-1 bilinear-biquadratic chain and its anisotropic extensions | Akihiro Hokkyo et.al. | 2411.04945v1 | null |
2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942v1 | null |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097v1 | link |
2024-11-06 | Local unitary equivalence of absolutely maximally entangled states constructed from orthogonal arrays | N Ramadas et.al. | 2411.04096v1 | null |
2024-11-06 | A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement | Guillermo Villate-Castillo et.al. | 2411.04090v1 | link |
2024-11-06 | Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Ping Li et.al. | 2411.04059v1 | link |
2024-11-06 | Distinguishing Coupled Dark Energy Models with Neural Networks | L. W. K. Goh et.al. | 2411.04058v1 | link |
2024-11-06 | Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging | Yuan Bi et.al. | 2411.04004v1 | null |
2024-11-06 | Learning Aggregate Queries Defined by First-Order Logic with Counting | Steffen van Bergerem et.al. | 2411.04003v1 | null |
2024-11-06 | ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks | Ziji Shi et.al. | 2411.03999v1 | null |
2024-11-06 | Fine-tuning -- a Transfer Learning approach | Joseph Arul Raj et.al. | 2411.03941v1 | null |
2024-11-06 | Inter-Frame Coding for Dynamic Meshes via Coarse-to-Fine Anchor Mesh Generation | He Huang et.al. | 2411.03921v1 | null |
2024-11-05 | Classification Done Right for Vision-Language Pre-Training | Huang Zilong et.al. | 2411.03313v1 | link |
2024-11-05 | Automatic solid form classification in pharmaceutical drug development | Julius Lange et.al. | 2411.03308v1 | null |
2024-11-05 | Data-Driven Sampling Based Stochastic MPC for Skid-Steer Mobile Robot Navigation | Ananya Trivedi et.al. | 2411.03289v1 | link |
2024-11-05 | Graph-Based Semi-Supervised Segregated Lipschitz Learning | Farid Bozorgnia et.al. | 2411.03273v1 | null |
2024-11-05 | Tuning into spatial frequency space: Satellite and space debris detection in the ZTF alert stream | J. P. Carvajal et.al. | 2411.03258v1 | null |
2024-11-05 | Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization | Zakariae Belmekki et.al. | 2411.03226v1 | null |
2024-11-05 | Beyond Grid Data: Exploring Graph Neural Networks for Earth Observation | Shan Zhao et.al. | 2411.03223v1 | null |
2024-11-05 | Statistical Analysis to Support CSI-Based Sensing Methods | Elena Tonini et.al. | 2411.03203v1 | null |
2024-11-05 | Navigating Extremes: Dynamic Sparsity in Large Output Space | Nasib Ullah et.al. | 2411.03171v1 | null |
2024-11-05 | Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Hao Luo et.al. | 2411.03169v1 | null |
2024-11-04 | Adaptive Caching for Faster Video Generation with Diffusion Transformers | Kumara Kahatapitiya et.al. | 2411.02397v1 | null |
2024-11-04 | AutoVFX: Physically Realistic Video Editing from Natural Language Instructions | Hao-Yu Hsu et.al. | 2411.02394v1 | null |
2024-11-04 | How Far is Video Generation from World Model: A Physical Law Perspective | Bingyi Kang et.al. | 2411.02385v1 | null |
2024-11-04 | Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas | Qingwen Pu et.al. | 2411.02349v1 | null |
2024-11-04 | SplatOverflow: Asynchronous Hardware Troubleshooting | Amritansh Kwatra et.al. | 2411.02332v1 | null |
2024-11-04 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327v1 | link |
2024-11-04 | GenXD: Generating Any 3D and 4D Scenes | Yuyang Zhao et.al. | 2411.02319v1 | null |
2024-11-04 | Information plane and compression-gnostic feedback in quantum machine learning | Nathan Haboury et.al. | 2411.02313v1 | null |
2024-11-04 | Grouped Discrete Representation for Object-Centric Learning | Rongzhen Zhao et.al. | 2411.02299v1 | null |
2024-11-04 | Conformal-in-the-Loop for Learning with Imbalanced Noisy Data | John Brandon Graham-Knight et.al. | 2411.02281v1 | null |
2024-10-31 | EgoMimic: Scaling Imitation Learning via Egocentric Video | Simar Kareer et.al. | 2410.24221v1 | link |
2024-10-31 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Penghui Ruan et.al. | 2410.24219v1 | link |
2024-10-31 | Learning Video Representations without Natural Videos | Xueyang Yu et.al. | 2410.24213v1 | null |
2024-11-01 | DELTA: Dense Efficient Long-range 3D Tracking for any video | Tuan Duc Ngo et.al. | 2410.24211v2 | null |
2024-10-31 | DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion | Weicai Ye et.al. | 2410.24203v1 | link |
2024-10-31 | DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning | Zhenyu Jiang et.al. | 2410.24185v1 | null |
2024-10-31 | Extended Object Tracking and Classification based on Linear Splines | Matteo Tesori et.al. | 2410.24183v1 | null |
2024-10-31 | | Kevin Black et.al. | 2410.24164v1 | null |
2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148v1 | null |
2024-10-31 | HoloChrome: Polychromatic Illumination for Speckle Reduction in Holographic Near-Eye Displays | Florian Schiffers et.al. | 2410.24144v1 | null |
2024-10-30 | Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards | Irmak Guzey et.al. | 2410.23289v1 | null |
2024-10-30 | Computing the bridge length: the key ingredient in a continuous isometry classification of periodic point sets | Jonathan McManus et.al. | 2410.23288v1 | null |
2024-10-30 | ReferEverything: Towards Segmenting Everything We Can Speak of in Videos | Anurag Bagchi et.al. | 2410.23287v1 | null |
2024-10-30 | DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration | Ola Shorinwa et.al. | 2410.23283v1 | null |
2024-10-30 | A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization | Bin Wu et.al. | 2410.23279v1 | null |
2024-10-30 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Yining Hong et.al. | 2410.23277v1 | null |
2024-10-30 | TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Ziyao Shangguan et.al. | 2410.23266v1 | link |
2024-10-30 | bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction | Yehe Liu et.al. | 2410.23247v1 | null |
2024-10-30 | PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching | Chen Ziwen et.al. | 2410.23245v1 | null |
2024-10-31 | Aligning Audio-Visual Joint Representations with an Agentic Workflow | Shentong Mo et.al. | 2410.23230v2 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332v1 | null |
2024-10-30 | Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets | Guangqi Jiang et.al. | 2410.22325v2 | null |
2024-10-29 | Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323v1 | null |
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | Kai Wang et.al. | 2410.22317v1 | link |
2024-10-29 | Convex Formulations for Training Two-Layer ReLU Neural Networks | Karthik Prakhya et.al. | 2410.22311v1 | link |
2024-10-29 | Emotion-Guided Image to Music Generation | Souraja Kundu et.al. | 2410.22299v1 | null |
2024-10-29 | Motion Graph Unleashed: A Novel Approach to Video Prediction | Yiqi Zhong et.al. | 2410.22288v1 | link |
2024-10-29 | Non-LTE Synthetic Observables of a Multidimensional Model of Type Ia Supernovae | Samuel J. Boos et.al. | 2410.22276v1 | null |
2024-10-29 | Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation | Davide Berghi et.al. | 2410.22271v1 | null |
2024-10-29 | LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers | Patricia Pauli et.al. | 2410.22258v1 | link |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264v1 | null |
2024-10-28 | Multi-modal AI for comprehensive breast cancer prognostication | Jan Witowski et.al. | 2410.21256v1 | null |
2024-10-28 | Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies | Xiwen Li et.al. | 2410.21170v1 | null |
2024-10-28 | KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation | Zhihao Zhao et.al. | 2410.21160v1 | null |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153v1 | null |
2024-10-28 | The tau function for ABS equations | James Atkinson et.al. | 2410.21148v1 | null |
2024-10-28 | Enhancing Learned Image Compression via Cross Window-based Attention | Priyanka Mudgal et.al. | 2410.21144v1 | null |
2024-10-28 | uOttawa at LegalLens-2024: Transformer-based Classification Experiments | Nima Meghdadi et.al. | 2410.21139v1 | link |
2024-10-28 | Do LLMs generate test oracles that capture the actual or the expected program behaviour? | Michael Konstantinou et.al. | 2410.21136v1 | null |
2024-10-28 | Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences | Zhihao Zhao et.al. | 2410.21130v1 | null |
2024-10-25 | Sparse Decomposition of Graph Neural Networks | Yaochen Hu et.al. | 2410.19723v1 | null |
2024-10-25 | Arabic Music Classification and Generation using Deep Learning | Mohamed Elshaarawy et.al. | 2410.19719v1 | null |
2024-10-25 | Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning | Vegard Berge et.al. | 2410.19717v1 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702v1 | null |
2024-10-25 | MILES: Making Imitation Learning Easy with Self-Supervision | Georgios Papagiannis et.al. | 2410.19693v1 | null |
2024-10-25 | Deep Learning for Classification of Inflammatory Bowel Disease Activity in Whole Slide Images of Colonic Histopathology | Amit Das et.al. | 2410.19690v1 | null |
2024-10-25 | Optimizing Hearthstone Agents using an Evolutionary Algorithm | Pablo García-Sánchez et.al. | 2410.19681v1 | null |
2024-10-25 | Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective | Ethan Harvey et.al. | 2410.19675v1 | null |
2024-10-25 | MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services | Hongjia Wu et.al. | 2410.19665v1 | null |
2024-10-25 | VARS: Vision-based Assessment of Risk in Security Systems | Pranav Gupta et.al. | 2410.19642v1 | null |
2024-10-24 | Framer: Interactive Frame Interpolation | Wen Wang et.al. | 2410.18978v1 | null |
2024-10-24 | CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Sara Ghaboura et.al. | 2410.18976v1 | link |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975v1 | null |
2024-10-24 | Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling | Mingtong Zhang et.al. | 2410.18912v1 | null |
2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907v1 | null |
2024-10-24 | A Survey of Multimodal Sarcasm Detection | Shafkat Farabi et.al. | 2410.18882v1 | null |
2024-10-24 | Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning | Arnav Samal et.al. | 2410.18879v1 | link |
2024-10-24 | Exploring the Universe with SNAD: Anomaly Detection in Astronomy | Alina A. Volnova et.al. | 2410.18875v1 | null |
2024-10-24 | Exploring a Geometric Conjecture, Some Properties of Blaschke Products, and the Geometry of Curves Formed by Them | Mehmet Celik et.al. | 2410.18863v1 | null |
2024-10-24 | Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging | Aya Ghoul et.al. | 2410.18834v1 | link |
2024-10-23 | FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution | Yang-Che Sun et.al. | 2410.18083v1 | null |
2024-10-23 | WorldSimBench: Towards Video Generation Models as World Simulators | Yiran Qin et.al. | 2410.18072v1 | null |
2024-10-23 | Eigenvalue crossings in equivariant families of matrices | Jonathan Rawlinson et.al. | 2410.18068v1 | null |
2024-10-23 | The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies | Raman Ebrahimi et.al. | 2410.18066v1 | null |
2024-10-23 | Real time anomalies detection on video | Fabien Poirier et.al. | 2410.18051v1 | null |
2024-10-23 | Boundary topological insulators and superconductors of Altland-Zirnbauer tenfold classes | Xun-Jiang Luo et.al. | 2410.18015v1 | null |
2024-10-24 | Effective Finite Time Stability Control for Human-Machine Shared Vehicle Following System | Zihan Wang et.al. | 2410.18007v2 | null |
2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001v1 | link |
2024-10-23 | Optical Generative Models | Shiqi Chen et.al. | 2410.17970v1 | null |
2024-10-23 | A Wavelet Diffusion GAN for Image Super-Resolution | Lorenzo Aloisi et.al. | 2410.17966v1 | null |
2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251v1 | null |
2024-10-22 | Classifying rational polygons with small denominator and few interior lattice points | Martin Bohnert et.al. | 2410.17244v1 | null |
2024-10-22 | Frontiers in Intelligent Colonoscopy | Ge-Peng Ji et.al. | 2410.17241v1 | link |
2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235v1 | link |
2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233v1 | null |
2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222v1 | null |
2024-10-22 | The Decision Problem for Regular First-Order Theories | Umang Mathur et.al. | 2410.17185v1 | null |
2024-10-22 | Technical Report: Toward Applying Quantum Computing to Network Verification | Kahlil Dozier et.al. | 2410.17184v1 | null |
2024-10-22 | KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements | Md Meftahul Ferdaus et.al. | 2410.17172v1 | link |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149v1 | null |
2024-10-21 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Shuangrui Ding et.al. | 2410.16268v1 | link |
2024-10-21 | xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | Michael S. Ryoo et.al. | 2410.16267v1 | null |
2024-10-21 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Xi Liu et.al. | 2410.16266v1 | null |
2024-10-21 | Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos | Gengshan Yang et.al. | 2410.16259v1 | null |
2024-10-21 | Serendipitous detection of an intense X-ray flare in the weak-line T Tauri star KM Ori with SRG/eROSITA | Savithri H. Ezhikode et.al. | 2410.16241v1 | null |
2024-10-21 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239v1 | link |
2024-10-21 | Deep Radiomics Detection of Clinically Significant Prostate Cancer on Multicenter MRI: Initial Comparison to PI-RADS Assessment | G. A. Nketiah et.al. | 2410.16238v1 | null |
2024-10-22 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models | Giannis Daras et.al. | 2410.16152v2 | null |
2024-10-21 | An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection | Chandravardhan Singh Raghaw et.al. | 2410.16143v1 | null |
2024-10-21 | Modeling dynamic neural activity by combining naturalistic video stimuli and stimulus-independent latent factors | Finn Schmidt et.al. | 2410.16136v1 | null |
2024-10-18 | Real-time Fake News from Adversarial Feedback | Sanxing Chen et.al. | 2410.14651v1 | null |
2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635v1 | null |
2024-10-18 | You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools | Daniel Baumartz et.al. | 2410.14626v1 | null |
2024-10-18 | Learning to Control the Smoothness of Graph Convolutional Network Features | Shih-Hsin Wang et.al. | 2410.14604v1 | null |
2024-10-18 | Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection | Aaron Alvarado Kristanto Julistiono et.al. | 2410.14581v1 | null |
2024-10-18 | A Hybrid Feature Fusion Deep Learning Framework for Leukemia Cancer Detection in Microscopic Blood Sample Using Gated Recurrent Unit and Uncertainty Quantification | Maksuda Akter et.al. | 2410.14536v1 | null |
2024-10-18 | Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance | Daniel Wolf et.al. | 2410.14524v1 | link |
2024-10-18 | Influence of anisotropy on the study of critical behavior of spin models by machine learning methods | Diana Sukhoverkhova et.al. | 2410.14523v1 | null |
2024-10-18 | A character approach to the ISR property | Artem Dudko et.al. | 2410.14517v1 | null |
2024-10-18 | Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media | Owen Cook et.al. | 2410.14515v1 | link |
2024-10-17 | DepthSplat: Connecting Gaussian Splatting and Depth | Haofei Xu et.al. | 2410.13862v1 | link |
2024-10-17 | Adaptive Subsampling and Learned Model Improve Spatiotemporal Resolution of Tactile Skin | Ariel Slepyan et.al. | 2410.13847v1 | null |
2024-10-17 | VidPanos: Generative Panoramic Videos from Casual Panning Videos | Jingwei Ma et.al. | 2410.13832v1 | null |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830v1 | null |
2024-10-17 | Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks | Clément Playout et.al. | 2410.13822v1 | link |
2024-10-17 | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Mitsuhiko Nakamoto et.al. | 2410.13816v1 | null |
2024-10-17 | A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities | Gianluca Apriceno et.al. | 2410.13803v1 | link |
2024-10-17 | MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations | Liang Xu et.al. | 2410.13790v1 | link |
2024-10-17 | Strong-to-weak spontaneous symmetry breaking meets average symmetry-protected topological order | Yuchen Guo et.al. | 2410.13734v1 | null |
2024-10-17 | Representing Model Weights with Language using Tree Experts | Eliahu Horwitz et.al. | 2410.13569v1 | null |
2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788v1 | null |
2024-10-16 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Sicong Leng et.al. | 2410.12787v1 | null |
2024-10-16 | Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions | Zhenyu Jiang et.al. | 2410.12773v1 | null |
2024-10-16 | Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks | Hunmin Lee et.al. | 2410.12772v1 | null |
2024-10-16 | Phase retrieval via media diversity | Yan Cheng et.al. | 2410.12767v1 | null |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Jaehong Yoon et.al. | 2410.12761v1 | null |
2024-10-16 | Unitary Multi-Margin BERT for Robust Natural Language Processing | Hao-Yuan Chang et.al. | 2410.12759v1 | null |
2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742v1 | null |
2024-10-16 | How much time do we have before catastrophic disclosure occurs? | Matthew Szydagis et.al. | 2410.12738v1 | null |
2024-10-16 | Machine Learning-Augmented Ontology-Based Data Access for Renewable Energy Data | Marco Calautti et.al. | 2410.12734v1 | null |
2024-10-15 | High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion | Junhwa Hur et.al. | 2410.11838v1 | null |
2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834v1 | null |
2024-10-15 | CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos | Nikita Karaev et.al. | 2410.11831v1 | null |
2024-10-15 | Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | Zhouxia Wang et.al. | 2410.11828v1 | null |
2024-10-15 | On representations of Arthur type and unitary dual for classical groups | Alexander Hazeltine et.al. | 2410.11806v1 | null |
2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Zhiyuan Ma et.al. | 2410.11795v2 | null |
2024-10-15 | OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation | Jinhan Li et.al. | 2410.11792v1 | null |
2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786v1 | null |
2024-10-15 | On the Training Convergence of Transformers for In-Context Classification | Wei Shen et.al. | 2410.11778v1 | null |
2024-10-15 | Temporal resolution enhancement in Structured Illumination Microscopy using cascaded reconstruction | Doron Shterman et.al. | 2410.11770v1 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821v1 | null |
2024-10-14 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Mu Cai et.al. | 2410.10818v1 | null |
2024-10-14 | LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Tianwei Xiong et.al. | 2410.10816v1 | link |
2024-10-14 | Depth Any Video with Scalable Synthetic Data | Honghui Yang et.al. | 2410.10815v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | Yanjie Ze et.al. | 2410.10803v1 | link |
2024-10-14 | Boosting Camera Motion Control for Video Diffusion Transformers | Soon Yau Cheong et.al. | 2410.10802v1 | null |
2024-10-14 | Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization | Johan Hatleskog et.al. | 2410.10784v1 | null |
2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | Eduardo R. Corral-Soto et.al. | 2410.10782v1 | null |
2024-10-14 | ControlMM: Controllable Masked Motion Generation | Ekkasit Pinyoanuntapong et.al. | 2410.10780v1 | null |
2024-10-14 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Dejia Xu et.al. | 2410.10774v1 | null |
2024-10-11 | Optimal Downsampling for Imbalanced Classification with Generalized Linear Models | Yan Chen et.al. | 2410.08994v1 | null |
2024-10-11 | Realizing Linear Synaptic Plasticity in Electric Double Layer-Gated Transistors for Improved Predictive Accuracy and Efficiency in Neuromorphic Computing | Nithil Harris Manimaran et.al. | 2410.08978v1 | null |
2024-10-11 | ALVIN: Active Learning Via INterpolation | Michalis Korakakis et.al. | 2410.08972v1 | null |
2024-10-11 | Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data | Arthur Mendonça Sasse et.al. | 2410.08961v1 | null |
2024-10-11 | Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison | Daniel Salnikov et.al. | 2410.08958v1 | null |
2024-10-11 | Rapid Grassmannian Averaging with Chebyshev Polynomials | Brighton Ancelin et.al. | 2410.08956v1 | null |
2024-10-11 | Local moduli in the special 2-flags of length 5 | Piotr Mormul et.al. | 2410.08951v1 | null |
2024-10-11 | On the Adversarial Transferability of Generalized "Skip Connections" | Yisen Wang et.al. | 2410.08950v1 | null |
2024-10-11 | Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing | Clayton Leite et.al. | 2410.08931v1 | null |
2024-10-11 | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images | Virmarie Maquiling et.al. | 2410.08926v1 | null |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211v1 | null |
2024-10-10 | Scaling Laws For Diffusion Transformers | Zhengyang Liang et.al. | 2410.08184v1 | null |
2024-10-10 | RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image | Xiaoxue Chen et.al. | 2410.08181v1 | null |
2024-10-10 | A note on the symplectic classification of almost-toric systems | Xiudi Tang et.al. | 2410.08175v1 | null |
2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174v1 | null |
2024-10-10 | Progressive Autoregressive Video Diffusion Models | Desai Xie et.al. | 2410.08151v1 | link |
2024-10-10 | Robust AI-Generated Text Detection by Restricted Embeddings | Kristian Kuznetsov et.al. | 2410.08113v1 | null |
2024-10-10 | Color-Guided Flying Pixel Correction in Depth Images | Ekamresh Vasudevan et.al. | 2410.08084v1 | null |
2024-10-10 | Dynamic Object Catching with Quadruped Robot Front Legs | André Schakkal et.al. | 2410.08065v1 | null |
2024-10-10 | A Target-Aware Analysis of Data Augmentation for Hate Speech Detection | Camilla Casula et.al. | 2410.08053v1 | null |
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | Hanrong Ye et.al. | 2410.07177v1 | null |
2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170v1 | null |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155v1 | link |
2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129v1 | null |
2024-10-09 | Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication | Erzhen Hu et.al. | 2410.07119v1 | null |
2024-10-09 | JPEG Inspired Deep Learning | Ahmed H. Salamah et.al. | 2410.07081v1 | null |
2024-10-09 | Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Thomas Schmied et.al. | 2410.07071v1 | null |
2024-10-09 | TinyEmo: Scaling down Emotional Reasoning via Metric Projection | Cristian Gutierrez et.al. | 2410.07062v1 | link |
2024-10-09 | Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes | Fisseha A. Ferede et.al. | 2410.07043v1 | link |
2024-10-09 | Optimizing Estimators of Squared Calibration Errors in Classification | Sebastian G. Gruber et.al. | 2410.07014v1 | null |
2024-10-07 | Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia | Mohammad Fahes et.al. | 2410.05270v1 | link |
2024-10-07 | Grounding Partially-Defined Events in Multimodal Data | Kate Sanders et.al. | 2410.05267v1 | null |
2024-10-07 | DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | Kaifeng Zhao et.al. | 2410.05260v1 | null |
2024-10-07 | SePPO: Semi-Policy Preference Optimization for Diffusion Alignment | Daoan Zhang et.al. | 2410.05255v1 | link |
2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252v1 | null |
2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249v1 | null |
2024-10-07 | The Dawn of Video Generation: Preliminary Explorations with SORA-like Models | Ailing Zeng et.al. | 2410.05227v1 | null |
2024-10-07 | Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality | Ge Ya et.al. | 2410.05203v1 | link |
2024-10-07 | Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge | Senorita Deb et.al. | 2410.05189v1 | null |
2024-10-07 | VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks | Ziyan Jiang et.al. | 2410.05160v1 | null |
2024-10-04 | Spatial Hyperspheric Models for Compositional Data | Michael R. Schwob et.al. | 2410.03648v1 | null |
2024-10-04 | HyperCMR: Enhanced Multi-Contrast CMR Reconstruction with Eagle Loss | Ruru Xu et.al. | 2410.03624v1 | null |
2024-10-04 | Crystallography, Group Cohomology, and Lieb-Schultz-Mattis Constraints | Chunxiao Liu et.al. | 2410.03607v1 | null |
2024-10-04 | LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos | Noriaki Hirose et.al. | 2410.03603v1 | null |
2024-10-04 | Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification | Kelsey Lieberman et.al. | 2410.03588v1 | null |
2024-10-04 | A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development | Jesper Knapp et.al. | 2410.03580v1 | null |
2024-10-04 | Re-examining Sexism and Misogyny Classification with Annotator Attitudes | Aiqi Jiang et.al. | 2410.03543v1 | null |
2024-10-04 | Classification-Denoising Networks | Louis Thiry et.al. | 2410.03505v1 | null |
2024-10-04 | MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation | Hongcheng Wang et.al. | 2410.03488v1 | null |
2024-10-04 | A Multimodal Framework for Deepfake Detection | Kashish Gandhi et.al. | 2410.03487v1 | null |
2024-10-03 | Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats | Mingyang Xie et.al. | 2410.02764v1 | null |
2024-10-03 | Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos | Jianrui Zhang et.al. | 2410.02763v1 | null |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757v1 | null |
2024-10-03 | An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel | Xinpeng Li et.al. | 2410.02750v1 | null |
2024-10-03 | OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable? | Liangze Jiang et.al. | 2410.02735v1 | null |
2024-10-03 | Liouville's theorem in calibrated geometries | Toni Ikonen et.al. | 2410.02722v1 | null |
2024-10-03 | Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud | Mengxi Wu et.al. | 2410.02720v1 | link |
2024-10-03 | AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease | Romoke Grace Akindele et.al. | 2410.02714v1 | null |
2024-10-04 | Video Instruction Tuning With Synthetic Data | Yuanhan Zhang et.al. | 2410.02713v2 | null |
2024-10-03 | Impact of a reclassification on Web of Science articles on bibliometric indicators | Agénor Lahatte et.al. | 2410.02701v1 | null |
2024-10-02 | Loki: An Open-Source Tool for Fact Verification | Haonan Li et.al. | 2410.01794v1 | null |
2024-10-03 | Application of convolutional neural networks for extensive air shower separation in the SPHERE-3 experiment | E. L. Entina et.al. | 2410.01781v2 | null |
2024-10-03 | TopER: Topological Embeddings in Graph Representation Learning | Astrit Tola et.al. | 2410.01778v2 | null |
2024-10-02 | Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context | Spencer Frei et.al. | 2410.01774v1 | null |
2024-10-02 | SegHeD: Segmentation of Heterogeneous Data for Multiple Sclerosis Lesions with Anatomical Constraints | Berke Doga Basaran et.al. | 2410.01766v1 | null |
2024-10-02 | LightSC: The Making of a Usable Security Classification Tool for DevSecOps | Manish Shrestha et.al. | 2410.01762v1 | null |
2024-10-02 | Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes | Hossein Sholehrasa et.al. | 2410.01755v1 | null |
2024-10-02 | Unitary Representations of the Isometry Groups of Urysohn Spaces | Rémi Barritault et.al. | 2410.01725v1 | null |
2024-10-02 | COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Mingzhen Sun et.al. | 2410.01718v1 | null |
2024-10-02 | Rabi oscillations at three-photon laser excitation of a single rubidium Rydberg atom in an optical dipole trap | I. I. Beterov et.al. | 2410.01703v1 | null |
2024-09-30 | Continuously Improving Mobile Manipulation with Autonomous Real-World RL | Russell Mendonca et.al. | 2409.20568v1 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566v1 | null |
2024-09-30 | DressRecon: Freeform 4D Human Reconstruction from Monocular Video | Jeff Tan et.al. | 2409.20563v1 | null |
2024-09-30 | LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner | Xiaopan Zhang et.al. | 2409.20560v1 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557v1 | null |
2024-09-30 | Inverse Painting: Reconstructing The Painting Process | Bowei Chen et.al. | 2409.20556v1 | null |
2024-09-30 | UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | Qiaojun Yu et.al. | 2409.20551v1 | null |
2024-09-30 | Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries | L. W. IJspeert et.al. | 2409.20540v1 | null |
2024-09-30 | Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Lirui Wang et.al. | 2409.20537v1 | link |
2024-09-30 | Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images | Bahri Batuhan Bilecen et.al. | 2409.20530v1 | null |
2024-09-27 | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Shaowei Liu et.al. | 2409.18964v1 | link |
2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957v1 | link |
2024-09-27 | Unconditional stability of a recurrent neural circuit implementing divisive normalization | Shivang Rawat et.al. | 2409.18946v1 | null |
2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938v1 | null |
2024-09-27 | Subspace Preserving Quantum Convolutional Neural Network Architectures | Léo Monbroussou et.al. | 2409.18918v1 | null |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901v1 | link |
2024-09-27 | Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors | Yunlong Lin et.al. | 2409.18899v1 | null |
2024-09-27 | Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models | Zehan Li et.al. | 2409.18878v1 | null |
2024-09-27 | Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks | Richard Osuala et.al. | 2409.18872v1 | null |
2024-09-27 | Fusion Systems and Simple Groups With Class Two Sylow | Martin van Beek et.al. | 2409.18870v1 | null |
2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127v1 | null |
2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et.al. | 2409.18125v1 | null |
2024-09-26 | RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration | Yuezhan Tao et.al. | 2409.18122v1 | null |
2024-09-26 | Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction | Justin Kerr et.al. | 2409.18121v1 | null |
2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111v1 | link |
2024-09-26 | MALPOLON: A Framework for Deep Species Distribution Modeling | Theo Larcher et.al. | 2409.18102v1 | null |
2024-09-26 | Incorporating sparse labels into biologging studies using hidden Markov models with weighted likelihoods | Evan Sidrow et.al. | 2409.18091v1 | null |
2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083v1 | null |
2024-09-26 | Graded contractions on the orthogonal Lie algebras of dimensions 7 and 8 | Cristina Draper et.al. | 2409.18069v1 | null |
2024-09-26 | LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field | Huan Wang et.al. | 2409.18057v1 | link |
2024-09-25 | DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion | Yukun Huang et.al. | 2409.17145v1 | null |
2024-09-25 | Streaming Neural Images | Marcos V. Conde et.al. | 2409.17134v1 | null |
2024-09-25 | Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation | Mukaffi Bin Moin et.al. | 2409.17130v1 | null |
2024-09-25 | Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba | Amin Malekmohammadi et.al. | 2409.17122v1 | link |
2024-09-25 | Counting Triangles in Triangles | Jim Propp et.al. | 2409.17117v1 | null |
2024-09-25 | BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices | Yongqi Xu et.al. | 2409.17093v1 | link |
2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092v1 | null |
2024-09-25 | Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification | Xinrui Zhou et.al. | 2409.17091v1 | null |
2024-09-25 | SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking | Luigi Russo et.al. | 2409.17087v1 | null |
2024-09-25 | The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification | Tashi Namgyal et.al. | 2409.17069v1 | null |
2024-09-24 | Self-Supervised Any-Point Tracking by Contrastive Random Walks | Ayush Shrivastava et.al. | 2409.16288v1 | link |
2024-09-24 | Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking | Xi Wang et.al. | 2409.16287v1 | null |
2024-09-24 | Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Homanga Bharadhwaj et.al. | 2409.16283v1 | null |
2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278v1 | null |
2024-09-24 | Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results | Marcos V. Conde et.al. | 2409.16277v1 | null |
2024-09-24 | CDChat: A Large Multimodal Model for Remote Sensing Change Description | Mubashir Noman et.al. | 2409.16261v1 | link |
2024-09-24 | Empirically Exploring the Space of Monostationarity in Dual Phosphorylation | May Cai et.al. | 2409.16234v1 | null |
2024-09-24 | VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection | Sunghyun Ahn et.al. | 2409.16225v1 | link |
2024-09-24 | Upper-body free-breathing Magnetic Resonance Fingerprinting applied to the quantification of water T1 and fat fraction | Constantin Slioussarenko et.al. | 2409.16200v1 | null |
2024-09-24 | Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking | Jun Bai et.al. | 2409.16198v1 | null |
2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194v3 | null |
2024-09-18 | DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control | Zichen Jeff Cui et.al. | 2409.12192v1 | null |
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191v1 | link |
2024-09-18 | multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information | Hongjun Zhu et.al. | 2409.12167v1 | link |
2024-09-18 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Sai Tanmay Reddy Chakkera et.al. | 2409.12156v1 | null |
2024-09-18 | Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT | Hamza Kalisch et.al. | 2409.12155v1 | link |
2024-09-18 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140v1 | null |
2024-09-18 | Mirages in the Energy Landscape of Soft Sphere Packings | Praharsh Suryadevara et.al. | 2409.12113v1 | null |
2024-09-18 | SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba | Xiangning Zhang et.al. | 2409.12108v1 | null |
2024-09-18 | Unveiling the Secrets of New Physics Through Top Quark Tagging | Rameswar Sahu et.al. | 2409.12085v1 | null |
2024-09-17 | Systematic analysis of Parity-Violating modes | Hong-Ming Zhu et.al. | 2409.11400v1 | null |
2024-09-17 | Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements | Jipeng Yan et.al. | 2409.11391v1 | null |
2024-09-17 | Normalization in Proportional Feature Spaces | Alexandre Benatti et.al. | 2409.11389v1 | null |
2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E- Jannat et.al. | 2409.11375v1 | null |
2024-09-17 | Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks | Edgar Heinert et.al. | 2409.11373v1 | null |
2024-09-17 | Compact Implicit Neural Representations for Plane Wave Images | Mathilde Monvoisin et.al. | 2409.11370v1 | null |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367v1 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353v1 | null |
2024-09-17 | CLIP Adaptation by Intra-modal Overlap Reduction | Alexey Kravets et.al. | 2409.11338v1 | null |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323v1 | null |
2024-09-16 | Enhancing Video Transmission with Machine Learning based Routing in Software-Defined Networks | Anıl Dursun İpek et.al. | 2409.10512v1 | null |
2024-09-16 | Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance | Simone Maurizio La Cava et.al. | 2409.10481v1 | null |
2024-09-16 | Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control | Juan Alvarez-Padilla et.al. | 2409.10469v1 | null |
2024-09-16 | Assortativity in sympatric speciation and species classification | Joao U. F. Lizarraga et.al. | 2409.10466v1 | null |
2024-09-16 | Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons | Farhad Pourkamali-Anaraki et.al. | 2409.10463v1 | null |
2024-09-16 | Deep-Wide Learning Assistance for Insect Pest Classification | Toan Nguyen et.al. | 2409.10445v1 | link |
2024-09-16 | A point process approach for the classification of noisy calcium imaging data | Arianna Burzacchi et.al. | 2409.10409v1 | null |
2024-09-16 | MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning | Hwihun Jeong et.al. | 2409.10394v1 | link |
2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et.al. | 2409.10362v1 | null |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357v1 | null |
2024-09-13 | An Efficient and Streaming Audio Visual Active Speaker Detection System | Arnav Kundu et.al. | 2409.09018v1 | null |
2024-09-13 | Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | Qingwen Bu et.al. | 2409.09016v1 | link |
2024-09-13 | Model-independent variable selection via the rule-based variable priority | Min Lu et.al. | 2409.09003v1 | null |
2024-09-13 | Biomimetic Frontend for Differentiable Audio Processing | Ruolan Leslie Famularo et.al. | 2409.08997v1 | link |
2024-09-13 | Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems | Yan-Martin Tamm et.al. | 2409.08987v1 | link |
2024-09-13 | Fast DCT+: A Family of Fast Transforms Based on Rank-One Updates of the Path Graph | Samuel Fernández-Menduiña et.al. | 2409.08970v1 | null |
2024-09-13 | Pushing the boundaries of event subsampling in event-based video classification using CNNs | Hesam Araghi et.al. | 2409.08953v1 | link |
2024-09-13 | Pushing Joint Image Denoising and Classification to the Edge | Thomas C Markhorst et.al. | 2409.08943v1 | null |
2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931v1 | null |
2024-09-13 | Classification of electronic structures and state preparation for quantum computation of reaction chemistry | Maximilian Mörchen et.al. | 2409.08910v1 | null |
2024-09-12 | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor | Andrea Conti et.al. | 2409.08277v1 | null |
2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273v1 | null |
2024-09-12 | DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer | Runjia Li et.al. | 2409.08271v1 | null |
2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250v1 | null |
2024-09-12 | A review of compact geodesic orbit manifolds and the g.o. condition for | Andreas Arvanitoyeorgos et.al. | 2409.08247v1 | null |
2024-09-12 | Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging | Daniel Capellán-Martín et.al. | 2409.08232v1 | null |
2024-09-12 | CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs | Davide Buffelli et.al. | 2409.08217v1 | null |
2024-09-12 | LT3SD: Latent Trees for 3D Scene Diffusion | Quan Meng et.al. | 2409.08215v1 | null |
2024-09-12 | Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video | Boxiang Rong et.al. | 2409.08189v1 | null |
2024-09-13 | Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | Soufiyan Bahadi et.al. | 2409.08188v2 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452v1 | link |
2024-09-11 | VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos | Yan-Bo Lin et.al. | 2409.07450v1 | null |
2024-09-11 | Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning | Rodrigo Salas et.al. | 2409.07449v1 | null |
2024-09-11 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | Sijie Zhao et.al. | 2409.07447v1 | null |
2024-09-11 | Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability | A. E. M Ridwan et.al. | 2409.07426v1 | null |
2024-09-11 | Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy | Somayeh Pakdelmoez et.al. | 2409.07422v1 | null |
2024-09-11 | Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging | Yunzhen Wang et.al. | 2409.07417v1 | null |
2024-09-11 | NVRC: Neural Video Representation Compression | Ho Man Kwan et.al. | 2409.07414v1 | null |
2024-09-12 | Robust Robot Walker: Learning Agile Locomotion over Tiny Traps | Shaoting Zhu et.al. | 2409.07409v2 | null |
2024-09-11 | Revisiting Static Feature-Based Android Malware Detection | Md Tanvirul Alam et.al. | 2409.07397v1 | null |
2024-09-10 | A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection | Md Taimur Ahad et.al. | 2409.06699v1 | null |
2024-09-10 | DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images | Taslim Murad et.al. | 2409.06694v1 | null |
2024-09-10 | Benchmarking Sub-Genre Classification For Mainstage Dance Music | Hongzhi Shu et.al. | 2409.06690v1 | null |
2024-09-10 | A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network | Md Taimur Ahad et.al. | 2409.06689v1 | null |
2024-09-10 | A study on deep feature extraction to detect and classify Acute Lymphoblastic Leukemia (ALL) | Sabit Ahamed Preanto et.al. | 2409.06687v1 | null |
2024-09-10 | Constructing an Interpretable Deep Denoiser by Unrolling Graph Laplacian Regularizer | Seyed Alireza Hosseini et.al. | 2409.06676v1 | null |
2024-09-10 | Bulk and atmospheric metallicities as direct probes of sequentially varying accretion mechanisms of gas and solids onto planets | Yasuhiro Hasegawa et.al. | 2409.06670v1 | null |
2024-09-10 | Data Collection-free Masked Video Modeling | Yuchi Ishikawa et.al. | 2409.06665v1 | null |
2024-09-10 | World-Grounded Human Motion Recovery via Gravity-View Coordinates | Zehong Shen et.al. | 2409.06662v1 | null |
2024-09-10 | Classifying Functions via growth rates of repeated iterations | Titus Hilberdink et.al. | 2409.06661v1 | null |
2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865v1 | null |
2024-09-09 | Neural MP: A Generalist Neural Motion Planner | Murtaza Dalal et.al. | 2409.05864v1 | null |
2024-09-09 | LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation | Henghui Ding et.al. | 2409.05847v1 | null |
2024-09-10 | Finite-size topological phases from semimetals | Adipta Pal et.al. | 2409.05842v2 | null |
2024-09-09 | Fast Generation of Custom Floating-Point Spatial Filters on FPGAs | Nelson Campos et.al. | 2409.05837v1 | null |
2024-09-09 | Limits on the computational expressivity of non-equilibrium biophysical processes | Carlos Floyd et.al. | 2409.05827v1 | null |
2024-09-09 | A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation | Qi Jiang et.al. | 2409.05809v1 | null |
2024-09-09 | A CLIP-based siamese approach for meme classification | Javier Huertas-Tato et.al. | 2409.05772v1 | null |
2024-09-09 | Consensus-based Distributed Quantum Kernel Learning for Speech Recognition | Kuan-Cheng Chen et.al. | 2409.05770v1 | null |
2024-09-09 | A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR | Giovanni Morrone et.al. | 2409.05750v1 | null |
2024-09-06 | Synergy and Synchrony in Couple Dances | Vongani Maluleke et.al. | 2409.04440v1 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429v1 | null |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424v1 | null |
2024-09-06 | Virtual Reality-Based Preoperative Planning for Optimized Trocar Placement in Thoracic Surgery: A Preliminary Study | Arash Harirpoush et.al. | 2409.04414v1 | null |
2024-09-06 | Quantum Kernel Methods under Scrutiny: A Benchmarking Study | Jan Schnabel et.al. | 2409.04406v1 | null |
2024-09-09 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388v2 | null |
2024-09-06 | Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior | Charlesquin Kemajou Mbakam et.al. | 2409.04384v1 | null |
2024-09-06 | Enhancing Skin Lesion Diagnosis with Ensemble Learning | Xiaoyi Liu et.al. | 2409.04381v1 | null |
2024-09-06 | Tykhyy's Conjecture on finite mapping class group orbits | Samuel Bronstein et.al. | 2409.04379v1 | null |
2024-09-06 | The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study | Gregory Szumel et.al. | 2409.04368v1 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757v1 | link |
2024-09-05 | Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron | Christian Schmid et.al. | 2409.03749v1 | null |
2024-09-05 | Orbital Support and Evolution of CX/OX Structures in Boxy/Peanut Bars | Behzad Tahmasebzadeh et.al. | 2409.03746v1 | null |
2024-09-05 | Libra: Architectural Support For Principled, Secure And Efficient Balanced Execution On High-End Processors (Extended Version) | Hans Winderix et.al. | 2409.03743v1 | null |
2024-09-05 | Classification and Prediction of Heart Diseases using Machine Learning Algorithms | Akua Sekyiwaa Osei-Nkwantabisa et.al. | 2409.03697v1 | null |
2024-09-05 | View-Invariant Policy Learning via Zero-Shot Novel View Synthesis | Stephen Tian et.al. | 2409.03685v1 | null |
2024-09-05 | Threat Classification on Deployed Optical Networks Using MIMO Digital Fiber Sensing, Wavelets, and Machine Learning | Khouloud Abdelli et.al. | 2409.03667v1 | null |
2024-09-05 | Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG | Manshan Guo et.al. | 2409.03646v1 | null |
2024-09-05 | Variance reduction in Texas hold'em and in video poker | Stewart N. Ethier et.al. | 2409.03607v1 | null |
2024-09-05 | SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing | Lingyu Xiong et.al. | 2409.03605v1 | null |
2024-09-04 | SITAR: Semi-supervised Image Transformer for Action Recognition | Owais Iqbal et.al. | 2409.02910v1 | null |
2024-09-04 | GraphTrials: Visual Proofs of Graph Properties | Henry Förster et.al. | 2409.02907v1 | null |
2024-09-04 | Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice | Ming-Hao Li et.al. | 2409.02898v1 | null |
2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889v1 | link |
2024-09-04 | CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently | Jonathan Zalach et.al. | 2409.02885v1 | null |
2024-09-04 | Look Into the LITE in Deep Learning for Time Series Classification | Ali Ismail-Fawaz et.al. | 2409.02869v1 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851v1 | null |
2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838v1 | null |
2024-09-04 | Evolution of radiation profiles in a strongly baffled divertor on MAST Upgrade | Fabio Federici et.al. | 2409.02837v1 | null |
2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836v1 | null |
2024-08-30 | Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding | Gueter Josmy Faure et.al. | 2408.17443v1 | link |
2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437v1 | link |
2024-08-30 | CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion | Yiran Chen et.al. | 2408.17424v1 | null |
2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422v2 | null |
2024-08-30 | Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes | Li Zhang et.al. | 2408.17421v1 | link |
2024-08-30 | End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework | Chang Cai et.al. | 2408.17397v1 | null |
2024-08-30 | Equivariant isomorphism of Quantum Lens Spaces of low dimension | Søren Eilers et.al. | 2408.17386v1 | null |
2024-08-30 | LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification | Fadi Alharbi et.al. | 2408.17384v1 | null |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362v1 | link |
2024-08-30 | Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method | Yuji Lin et.al. | 2408.17339v1 | null |
2024-08-29 | SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Ziyu Guo et.al. | 2408.16768v1 | link |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767v1 | null |
2024-08-29 | OmniRe: Omni Urban Scene Reconstruction | Ziyu Chen et.al. | 2408.16760v1 | null |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749v1 | null |
2024-08-29 | Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech | Cong Zhang et.al. | 2408.16732v1 | null |
2024-08-29 | VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation | Shiwei Wu et.al. | 2408.16730v1 | null |
2024-08-29 | Prediction-Feedback DETR for Temporal Action Detection | Jihwan Kim et.al. | 2408.16729v1 | null |
2024-08-29 | A GREAT Architecture for Edge-Based Graph Problems Like TSP | Attila Lischka et.al. | 2408.16717v1 | null |
2024-08-29 | One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Anisha Jain et.al. | 2408.16704v1 | null |
2024-08-29 | RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio | Kian Behzad et.al. | 2408.16703v1 | null |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996v2 | null |
2024-08-28 | TEDRA: Text-based Editing of Dynamic and Photoreal Actors | Basavaraj Sunagad et.al. | 2408.15995v1 | null |
2024-08-28 | Minimizing movements solutions for a monotone model of droplet motion | Carson Collins et.al. | 2408.15984v1 | null |
2024-08-28 | VLT/MUSE detection of accretion-ejection associated with the close stellar companion in the HT Lup system | Sebastián Jorquera et.al. | 2408.15976v1 | null |
2024-08-28 | 1+1d SPT phases with fusion category symmetry: interface modes and non-abelian Thouless pump | Kansei Inamura et.al. | 2408.15960v1 | null |
2024-08-28 | Generating Binary Species Range Maps | Filip Dorm et.al. | 2408.15956v1 | null |
2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950v1 | null |
2024-08-28 | Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping | Yikang Liu et.al. | 2408.15947v1 | null |
2024-08-28 | A latticed total K-theory | Qingnan An et.al. | 2408.15941v1 | null |
2024-08-28 | Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning | Bingchen Yan et.al. | 2408.15924v1 | null |
2024-08-27 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | Zejia Weng et.al. | 2408.15241v1 | null |
2024-08-27 | Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation | Xiaojuan Wang et.al. | 2408.15239v1 | null |
2024-08-27 | DCT-CryptoNets: Scaling Private Inference in the Frequency Domain | Arjun Roy et.al. | 2408.15231v1 | null |
2024-08-27 | SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images | Zafer Yildiz et.al. | 2408.15224v1 | link |
2024-08-27 | Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment | Xuan Xu et.al. | 2408.15218v1 | null |
2024-08-27 | Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance | Weiyi Zhang et.al. | 2408.15217v1 | null |
2024-08-27 | Classifying populist language in American presidential and governor speeches using automatic text analysis | Olaf van der Veen et.al. | 2408.15213v1 | null |
2024-08-27 | Sec2Sec Co-attention for Video-Based Apparent Affective Prediction | Mingwei Sun et.al. | 2408.15209v1 | link |
2024-08-27 | Automatic 8-tissue Segmentation for 6-month Infant Brains | Yilan Dong et.al. | 2408.15198v1 | null |
2024-08-27 | Infusing Acoustic Pause Context into Text-Based Dementia Assessment | Franziska Braun et.al. | 2408.15188v1 | null |
2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469v1 | null |
2024-08-26 | K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Zhikai Li et.al. | 2408.14468v1 | null |
2024-08-26 | Reconstructing physiological signals from fMRI across the adult lifespan | Shiyu Wang et.al. | 2408.14453v1 | null |
2024-08-26 | Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition | Axel Klawonn et.al. | 2408.14442v1 | null |
2024-08-26 | Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification | Mahrukh Awan et.al. | 2408.14441v1 | null |
2024-08-26 | Radiance Cascades: A Novel High-Resolution Formal Solution for Multidimensional Non-LTE Radiative Transfer | Christopher M. J. Osborne et.al. | 2408.14425v1 | null |
2024-08-26 | Learning Tree-Structured Composition of Data Augmentation | Dongyue Li et.al. | 2408.14381v1 | link |
2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380v1 | link |
2024-08-26 | GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy | Peiyan Li et.al. | 2408.14368v1 | null |
2024-08-26 | An Embedding is Worth a Thousand Noisy Labels | Francesco Di Salvo et.al. | 2408.14358v1 | null |
2024-08-23 | Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder | Marie Huynh et.al. | 2408.13255v1 | null |
2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D'Cruz et.al. | 2408.13253v1 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239v1 | null |
2024-08-23 | D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching | Jingyu Liu et.al. | 2408.13226v1 | null |
2024-08-23 | ResSR: A Residual Approach to Super-Resolving Multispectral Images | Haley Duba-Sullivan et.al. | 2408.13225v1 | null |
2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214v1 | null |
2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202v1 | null |
2024-08-23 | EAViT: External Attention Vision Transformer for Audio Classification | Aquib Iqbal et.al. | 2408.13201v1 | null |
2024-08-23 | Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention | Xiaoyi Liu et.al. | 2408.13180v1 | null |
2024-08-23 | Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations | Fabrizio Maturo et.al. | 2408.13179v1 | null |
2024-08-22 | Automating Deformable Gasket Assembly | Simeon Adebola et.al. | 2408.12593v1 | null |
2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590v1 | null |
2024-08-22 | Real-Time Video Generation with Pyramid Attention Broadcast | Xuanlei Zhao et.al. | 2408.12588v1 | link |
2024-08-22 | Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers | Antonyo Musabini et.al. | 2408.12575v1 | null |
2024-08-22 | MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | Haojun Shi et.al. | 2408.12574v1 | null |
2024-08-22 | Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers | Sayed Mohammad Vakilzadeh Hatefi et.al. | 2408.12568v1 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561v1 | link |
2024-08-22 | Exploring the Role of Audio in Multimodal Misinformation Detection | Moyang Liu et.al. | 2408.12558v1 | null |
2024-08-22 | Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge | Jun Ma et.al. | 2408.12534v1 | null |
2024-08-22 | UMAD: University of Macau Anomaly Detection Benchmark Dataset | Dong Li et.al. | 2408.12527v1 | link |
2024-08-21 | Great Memory, Shallow Reasoning: Limits of $k$NN-LMs | Shangyi Geng et.al. | 2408.11815v1 | link |
2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811v1 | null |
2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804v1 | link |
2024-08-21 | Practical token pruning for foundation models in few-shot conversational virtual assistant systems | Haode Qi et.al. | 2408.11799v1 | null |
2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791v1 | link |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788v1 | null |
2024-08-21 | NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation | Zhenye Lou et.al. | 2408.11787v1 | link |
2024-08-21 | Timeline and Boundary Guided Diffusion Network for Video Shadow Detection | Haipeng Zhou et.al. | 2408.11785v1 | link |
2024-08-21 | SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance | Zhiqiang Wu et.al. | 2408.11760v1 | null |
2024-08-21 | Improving the Scan-rescan Precision of AI-based CMR Biomarker Estimation | Dewmini Hasara Wickremasinghe et.al. | 2408.11754v1 | null |
2024-08-20 | Discriminant Analysis in stationary time series based on robust cepstral coefficients | Jonathan de Souza Matias et.al. | 2408.11012v1 | null |
2024-08-20 | Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos | Dennis Fedorishin et.al. | 2408.10998v1 | null |
2024-08-20 | Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models | Hojat Asgariandehkordi et.al. | 2408.10987v1 | null |
2024-08-20 | ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | Ezequiel de la Rosa et.al. | 2408.10966v1 | null |
2024-08-20 | Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter | Farhanul Haque et.al. | 2408.10955v1 | null |
2024-08-20 | Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting | Dona Arabi et.al. | 2408.10951v1 | link |
2024-08-20 | Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience | Yoonseo Choi et.al. | 2408.10937v1 | null |
2024-08-20 | SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement | Linlin Hu et.al. | 2408.10934v1 | null |
2024-08-20 | ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining | Qi Ma et.al. | 2408.10906v1 | null |
2024-08-20 | ViLReF: A Chinese Vision-Language Retinal Foundation Model | Shengzhu Yang et.al. | 2408.10894v1 | link |
2024-08-19 | Some model theory of quadratic geometries | Charlotte Kestner et.al. | 2408.10196v1 | null |
2024-08-19 | Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification | Jing Li et.al. | 2408.10193v1 | null |
2024-08-20 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Fuzhao Xue et.al. | 2408.10188v2 | link |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174v1 | link |
2024-08-19 | Galaxy Zoo: Morphologies based on UKIDSS NIR Imaging for 71,052 Galaxies | Karen L. Masters et.al. | 2408.10160v1 | null |
2024-08-19 | Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video | Shuxian Wang et.al. | 2408.10153v1 | null |
2024-08-19 | Biharmonic conformal immersions into a 3-dimensional conformally flat space | Ze-Ping Wang et.al. | 2408.10144v1 | null |
2024-08-19 | Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images | Wei Zhou et.al. | 2408.10134v1 | null |
2024-08-19 | UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track | Hao Fang et.al. | 2408.10129v1 | null |
2024-08-19 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Feiyu Pan et.al. | 2408.10125v1 | null |
2024-08-16 | Quantum Annealing for Enhanced Feature Selection in Single-Cell RNA Sequencing Data Analysis | Selim Romero et.al. | 2408.08867v1 | null |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855v1 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849v1 | null |
2024-08-16 | HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | Zhi-Bo Liu et.al. | 2408.08847v1 | link |
2024-08-16 | LEVIS: Large Exact Verifiable Input Spaces for Neural Networks | Mohamad Fares El Hajj Chehade et.al. | 2408.08824v1 | null |
2024-08-16 | Optimal Symmetries in Binary Classification | Vishal S. Ngairangbam et.al. | 2408.08823v1 | null |
2024-08-16 | Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification | Abdullah Al Imran et.al. | 2408.08803v1 | null |
2024-08-16 | Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers | Zihang Song et.al. | 2408.08794v1 | null |
2024-08-16 | Assessing Generalization Capabilities of Malaria Diagnostic Models from Thin Blood Smears | Louise Guillon et.al. | 2408.08792v1 | null |
2024-08-16 | A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks | Boa Jang et.al. | 2408.08790v1 | link |
2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312v1 | null |
2024-08-15 | Gauge-invariant optical selection rules for excitons | Tharindu Fernando et.al. | 2408.08311v1 | null |
2024-08-15 | Accelerated Image-Aware Generative Diffusion Modeling | Tanmay Asthana et.al. | 2408.08306v1 | null |
2024-08-15 | SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training | Gengwei Zhang et.al. | 2408.08295v1 | link |
2024-08-15 | Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks | Ni Ou et.al. | 2408.08276v1 | null |
2024-08-15 | Snuffy: Efficient Whole Slide Image Classifier | Hossein Jafarinia et.al. | 2408.08258v1 | link |
2024-08-15 | Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective | Zixuan Pan et.al. | 2408.08228v1 | link |
2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217v1 | null |
2024-08-15 | Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices | Tess Watt et.al. | 2408.08215v1 | null |
2024-08-15 | Learned Multimodal Compression for Autonomous Driving | Hadi Hadizadeh et.al. | 2408.08211v1 | null |
2024-08-14 | End-to-end Semantic-centric Video-based Multimodal Affective Computing | Ronghao Lin et.al. | 2408.07694v1 | null |
2024-08-15 | A Spitting Image: Modular Superpixel Tokenization in Vision Transformers | Marius Aasan et.al. | 2408.07680v2 | link |
2024-08-14 | G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing | Jingyi Yang et.al. | 2408.07675v1 | null |
2024-08-14 | Graph Triple Attention Network: A Decoupled Perspective | Xiaotang Wang et.al. | 2408.07654v1 | link |
2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Yuqing Wen et.al. | 2408.07605v1 | null |
2024-08-14 | Disentangle and denoise: Tackling context misalignment for video moment retrieval | Kaijing Ma et.al. | 2408.07600v1 | null |
2024-08-14 | Theoretical and Practical Progress in Hyperspectral Pixel Unmixing with Large Spectral Libraries from a Sparse Perspective | Jade Preston et.al. | 2408.07580v1 | null |
2024-08-14 | TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases | Thibault Simonetto et.al. | 2408.07579v1 | link |
2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | Erez Yosef et.al. | 2408.07541v1 | null |
2024-08-14 | Improved 3D Whole Heart Geometry from Sparse CMR Slices | Yiyang Xu et.al. | 2408.07532v1 | link |
2024-08-13 | On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders | Hamza Coban et.al. | 2408.07064v1 | null |
2024-08-13 | Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality | Yu-Chih Chen et.al. | 2408.07041v1 | null |
2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037v1 | null |
2024-08-13 | Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines | Samuel Fernández Menduiña et.al. | 2408.07028v1 | null |
2024-08-13 | Event-Stream Super Resolution using Sigma-Delta Neural Network | Waseem Shariff et.al. | 2408.06968v1 | null |
2024-08-13 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs | Dongyuan Li et.al. | 2408.06966v1 | null |
2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941v1 | link |
2024-08-13 | Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification | Bauke Arends et.al. | 2408.06930v1 | null |
2024-08-13 | Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries | Qi Song et.al. | 2408.06901v1 | null |
2024-08-13 | Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media | Pranav Venkatesh et.al. | 2408.06900v1 | null |
2024-08-12 | Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks | Lucas Félix et.al. | 2408.06341v1 | null |
2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336v1 | null |
2024-08-12 | LOLgorithm: Integrating Semantic,Syntactic and Contextual Elements for Humor Classification | Tanisha Khurana et.al. | 2408.06335v1 | null |
2024-08-12 | From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model | Athulya Sundaresan Geetha et.al. | 2408.06305v1 | null |
2024-08-12 | Sparsity Based Multi-Source Robust 3D Localization Using a Moving Receiver | Amir Mansourian et.al. | 2408.06274v1 | null |
2024-08-12 | Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance | Manuel Milling et.al. | 2408.06264v1 | null |
2024-08-12 | Deep Learning System Boundary Testing through Latent Space Style Mixing | Amr Abdellatif et.al. | 2408.06258v1 | null |
2024-08-12 | Rethinking Video with a Universal Event-Based Representation | Andrew Freeman et.al. | 2408.06248v1 | null |
2024-08-12 | A Comprehensive Case Study on the Performance of Machine Learning Methods on the Classification of Solar Panel Electroluminescence Images | Xinyi Song et.al. | 2408.06229v1 | link |
2024-08-12 | ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents | Carolin Schindler et.al. | 2408.06222v1 | null |
2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211v1 | null |
2024-08-09 | Kalman-Inspired Feature Propagation for Video Face Super-Resolution | Ruicheng Feng et.al. | 2408.05205v1 | null |
2024-08-09 | HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling | Piotr Keller et.al. | 2408.05195v1 | link |
2024-08-09 | Cross-Domain Learning for Video Anomaly Detection with Limited Supervision | Yashika Jain et.al. | 2408.05191v1 | null |
2024-08-09 | Holomorphic vector fields with real integral manifolds | Martin Kolář et.al. | 2408.05186v1 | null |
2024-08-09 | MADE-WIC: Multiple Annotated Datasets for Exploring Weaknesses In Code | Moritz Mock et.al. | 2408.05163v1 | null |
2024-08-09 | Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification | Xiaoyang Hao et.al. | 2408.05151v1 | null |
2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123v1 | null |
2024-08-09 | Cautious Calibration in Binary Classification | Mari-Liis Allikivi et.al. | 2408.05120v1 | null |
2024-08-09 | Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images | Shouyue Liu et.al. | 2408.05117v1 | null |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631v1 | null |
2024-08-08 | LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP | Danlu Chen et.al. | 2408.04628v1 | null |
2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619v1 | null |
2024-08-08 | Quantifying the Impact of Population Shift Across Age and Sex for Abdominal Organ Segmentation | Kate Čevora et.al. | 2408.04610v1 | null |
2024-08-08 | Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes | Bhushan Atote et.al. | 2408.04606v1 | null |
2024-08-08 | SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation | Jieming Yu et.al. | 2408.04593v1 | null |
2024-08-08 | Learn To Learn More Precisely | Runxi Cheng et.al. | 2408.04590v1 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575v1 | null |
2024-08-08 | Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches | Yongzhi Xu et.al. | 2408.04567v1 | null |
2024-08-08 | MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification | Md Rafiul Biswas et.al. | 2408.04540v1 | null |
2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940v1 | null |
2024-08-07 | Fast Sprite Decomposition from Animated Graphics | Tomoyuki Suzuki et.al. | 2408.03923v1 | null |
2024-08-07 | FMiFood: Multi-modal Contrastive Learning for Food Image Classification | Xinyue Pan et.al. | 2408.03922v1 | null |
2024-08-07 | Holomorphic foliations tangent to Rolle-pfaffian hypersurfaces | Arturo Fernández-Pérez et.al. | 2408.03914v1 | null |
2024-08-07 | AdapMTL: Adaptive Pruning Framework for Multitask Learning Model | Mingcan Xiang et.al. | 2408.03913v1 | null |
2024-08-07 | Achieving Human Level Competitive Robot Table Tennis | David B. D'Ambrosio et.al. | 2408.03906v1 | null |
2024-08-07 | Lightweight Video Denoising Using a Classic Bayesian Backbone | Clément Bled et.al. | 2408.03904v1 | null |
2024-08-07 | Retrieval Augmentation via User Interest Clustering | Hanjia Lyu et.al. | 2408.03886v1 | null |
2024-08-07 | Global-Local Progressive Integration Network for Blind Image Quality Assessment | Xiaoqi Wang et.al. | 2408.03885v1 | null |
2024-08-07 | Knowledge Probing for Graph Representation Learning | Mingyu Zhao et.al. | 2408.03877v1 | null |
2024-08-06 | LLaVA-OneVision: Easy Visual Task Transfer | Bo Li et.al. | 2408.03326v1 | null |
2024-08-06 | ClassiFIM: An Unsupervised Method To Detect Phase Transitions | Victor Kasatkin et.al. | 2408.03323v1 | null |
2024-08-06 | Segment Anything in Medical Images and Videos: Benchmark and Deployment | Jun Ma et.al. | 2408.03322v1 | null |
2024-08-06 | MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation | Xiaofeng Mao et.al. | 2408.03312v1 | null |
2024-08-06 | Left of Fab: Securing Design and Collaboration in the Semiconductor Value Chain | John C. Hoag et.al. | 2408.03295v1 | null |
2024-08-06 | Biomedical SAM 2: Segment Anything in Biomedical Images and Videos | Zhiling Yan et.al. | 2408.03286v1 | null |
2024-08-06 | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer | Jiazhi Guan et.al. | 2408.03284v1 | null |
2024-08-06 | Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | Angie Boggust et.al. | 2408.03274v1 | null |
2024-08-07 | BVI-AOM: A New Training Dataset for Deep Video Compression Optimization | Jakub Nawała et.al. | 2408.03265v2 | null |
2024-08-06 | Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom | W. S. Leite et.al. | 2408.03236v1 | null |
2024-08-05 | Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | Shishira R Maiya et.al. | 2408.02672v1 | null |
2024-08-05 | Interactive 3D Medical Image Segmentation with SAM 2 | Chuyun Shen et.al. | 2408.02635v1 | null |
2024-08-05 | VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Zhiyu Tan et.al. | 2408.02629v1 | null |
2024-08-05 | DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions | Siying Hu et.al. | 2408.02574v1 | null |
2024-08-05 | Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification | Paweł Zyblewski et.al. | 2408.02568v1 | null |
2024-08-05 | HQOD: Harmonious Quantization for Object Detection | Long Huang et.al. | 2408.02561v1 | null |
2024-08-05 | The effect of dynamical states on galaxy clusters populations. I. Classification of dynamical states | S. Véliz Astudillo et.al. | 2408.02519v1 | null |
2024-08-05 | Automatic rating of incomplete hippocampal inversions evaluated across multiple cohorts | Lisa Hemforth et.al. | 2408.02496v1 | null |
2024-08-05 | HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions | Chiranjeev Chiranjeev et.al. | 2408.02494v1 | null |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484v1 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415v1 | null |
2024-08-02 | Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence | Yen-Che Hsiao et.al. | 2408.01408v1 | null |
2024-08-02 | NOLO: Navigate Only Look Once | Bohan Zhou et.al. | 2408.01384v1 | null |
2024-08-02 | Explaining a probabilistic prediction on the simplex with Shapley compositions | Paul-Gauthier Noé et.al. | 2408.01382v1 | null |
2024-08-02 | Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2408.01372v1 | null |
2024-08-02 | Classification of marked elliptic root systems with non-reduced quotient | A. Fialowski et.al. | 2408.01358v1 | null |
2024-08-02 | Harmonized connectome resampling for variance in voxel sizes | Elyssa M. McMaster et.al. | 2408.01351v1 | null |
2024-08-02 | Human foraging strategies flexibly adapt to resource distribution and time constraints | Valeria Simonelli et.al. | 2408.01350v1 | null |
2024-08-02 | PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval | Yue Duan et.al. | 2408.01349v1 | null |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346v1 | null |
2024-08-01 | Text-Guided Video Masked Autoencoder | David Fan et.al. | 2408.00759v1 | null |
2024-08-01 | Segment anything model 2: an application to 2D and 3D medical images | Haoyu Dong et.al. | 2408.00756v1 | null |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754v1 | null |
2024-08-01 | CERT-ED: Certifiably Robust Text Classification for Edit Distance | Zhuoqun Huang et.al. | 2408.00728v1 | null |
2024-08-01 | SAM 2: Segment Anything in Images and Videos | Nikhila Ravi et.al. | 2408.00714v1 | null |
2024-08-01 | Investigating Brain Connectivity and Regional Statistics from EEG for early stage Parkinson's Classification | Amarpal Sahota et.al. | 2408.00711v1 | null |
2024-08-01 | Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM | Xiaofeng Liu et.al. | 2408.00706v1 | null |
2024-08-01 | Granular-Balls based Fuzzy Twin Support Vector Machine for Classification | Lixi Zhao et.al. | 2408.00699v1 | null |
2024-08-01 | ExpertAF: Expert Actionable Feedback from Video | Kumar Ashutosh et.al. | 2408.00672v1 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665v1 | null |
2024-07-31 | The Llama 3 Herd of Models | Abhimanyu Dubey et.al. | 2407.21783v1 | null |
2024-07-31 | RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | Hongtao Wu et.al. | 2407.21773v1 | null |
2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762v1 | null |
2024-07-31 | Learning Video Context as Interleaved Multimodal Sequences | Kevin Qinghong Lin et.al. | 2407.21757v1 | null |
2024-08-01 | Topological Woodward-Hoffmann classification for cycloadditions in polycyclic aromatic azomethine ylides | Juan Li et.al. | 2407.21756v2 | null |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739v1 | null |
2024-07-31 | Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos | Joseph Geo Benjamin et.al. | 2407.21738v1 | null |
2024-07-31 | Artificial Intelligence Approaches for Energy Efficiency: A Review | Alberto Pasqualetto et.al. | 2407.21726v1 | null |
2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721v1 | null |
2024-07-31 | Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Zhenghao Zhang et.al. | 2407.21705v1 | null |
2024-07-30 | Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation | Marcelo Matheus Gauy et.al. | 2407.20989v1 | null |
2024-07-30 | Transfer Learning for Multi-material Classification of Transition Metal Dichalcogenides with Atomic Force Microscopy | Isaiah A. Moses et.al. | 2407.20975v1 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962v1 | link |
2024-07-30 | EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images | Lixing Tan et.al. | 2407.20937v1 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908v1 | link |
2024-07-30 | Simultaneous Multi-Slice Diffusion Imaging using Navigator-free Multishot Spiral Acquisition | Yuancheng Jiang et.al. | 2407.20904v1 | null |
2024-07-30 | Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Adam Wojciechowski et.al. | 2407.20899v1 | null |
2024-07-30 | MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network | Yinlong Xu et.al. | 2407.20893v1 | null |
2024-07-30 | Shift operators and their classification | Maria Carvalho et.al. | 2407.20890v1 | null |
2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884v1 | null |
2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214v1 | null |
2024-07-30 | SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking | Jian Wang et.al. | 2407.20198v2 | null |
2024-07-29 | Radiance Fields for Robotic Teleoperation | Maximum Wilder-Smith et.al. | 2407.20194v1 | null |
2024-07-29 | Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jinghuan Shang et.al. | 2407.20179v1 | link |
2024-07-29 | LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework | Zhenqi He et.al. | 2407.20172v1 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171v1 | null |
2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164v1 | null |
2024-07-29 | Quantum Machine Learning Architecture Search via Deep Reinforcement Learning | Xin Dai et.al. | 2407.20147v1 | null |
2024-07-30 | AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics | Xiangxiang Dai et.al. | 2407.20124v2 | link |
2024-07-29 | Integrable and superintegrable quantum mechanical systems with position dependent masses invariant with respect to one parametric Lie groups. 2. Systems with dilatation and shift symmetries | A. G. Nikitin et.al. | 2407.20112v1 | null |
2024-07-26 | HRP: Human Affordances for Robotic Pre-Training | Mohan Kumar Srirama et.al. | 2407.18911v1 | null |
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li et.al. | 2407.18908v1 | null |
2024-07-26 | A Scalable Quantum Non-local Neural Network for Image Classification | Sparsh Gupta et.al. | 2407.18906v1 | link |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854v1 | null |
2024-07-26 | The Role of Temporal Hierarchy in Spiking Neural Networks | Filippo Moro et.al. | 2407.18838v1 | null |
2024-07-26 | Learning the Chaotic and Regular Nature of Trajectories in Hamiltonian Systems with Lagrangian descriptors | Javier Jiménez López et.al. | 2407.18831v1 | null |
2024-07-26 | Binary orbit and disks properties of the RW Aur system using ALMA observations | N. T. Kurtovic et.al. | 2407.18828v1 | null |
2024-07-26 | Three-dimensional ultrasound-based online system for automated ovarian follicle measurement | Pedro Royo et.al. | 2407.18818v1 | null |
2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787v1 | null |
2024-07-26 | Deep learning interpretable analysis for carbon star identification in Gaia DR3 | Shuo Ye et.al. | 2407.18754v1 | null |
2024-07-25 | Review of Degenerate Higher Order Scalar Tensor Theories in Cosmology | Andrei Lazanu et.al. | 2407.18234v1 | null |
2024-07-25 | One-point Statistics in various cosmic environments in the presence of massive neutrinos | Mohadese Khoshtinat et.al. | 2407.18233v1 | null |
2024-07-26 | Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation | Razieh Azizi et.al. | 2407.18195v2 | null |
2024-07-25 | PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations | Cheng Qian et.al. | 2407.18178v1 | null |
2024-07-26 | On-chip near-infrared spectroscopic sensing with over 520nm bandwidth | Chunhui Yao et.al. | 2407.18172v2 | null |
2024-07-25 | IRIS: Wireless Ring for Vision-based Smart Home Interaction | Maruchi Kim et.al. | 2407.18141v1 | null |
2024-07-25 | XS-VID: An Extremely Small Video Object Detection Dataset | Jiahao Guo et.al. | 2407.18137v1 | null |
2024-07-25 | Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking | Daniele Rege Cambrin et.al. | 2407.18128v1 | null |
2024-07-25 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | Roberto Di Via et.al. | 2407.18125v1 | null |
2024-07-25 | Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping | Jack Breen et.al. | 2407.18105v1 | link |
2024-07-24 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Yiming Xie et.al. | 2407.17470v1 | null |
2024-07-24 | SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning | Jianpeng Yao et.al. | 2407.17460v1 | null |
2024-07-24 | EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification | Joana Reuss et.al. | 2407.17458v1 | null |
2024-07-24 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438v1 | link |
2024-07-24 | Systematic study of High | Z. Wang et.al. | 2407.17407v1 | null |
2024-07-24 | Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising | Sébastien Herbreteau et.al. | 2407.17399v1 | null |
2024-07-24 | Sampling-Based Hierarchical Trajectory Planning for Formation Flight | Qingzhao Liu et.al. | 2407.17392v1 | null |
2024-07-24 | 2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks | Salil B Patel et.al. | 2407.17380v1 | null |
2024-07-24 | Entropy Reweighted Conformal Classification | Rui Luo et.al. | 2407.17377v1 | null |
2024-07-24 | MuST: Multi-Scale Transformers for Surgical Phase Recognition | Alejandra Pérez et.al. | 2407.17361v1 | link |
2024-07-23 | Explanation Regularisation through the Lens of Attributions | Pedro Ferreira et.al. | 2407.16693v1 | null |
2024-07-23 | On the local cohomology of secant varieties | Sebastian Olano et.al. | 2407.16688v1 | null |
2024-07-23 | AutoRG-Brain: Grounded Report Generation for Brain MRI | Jiayu Lei et.al. | 2407.16684v1 | null |
2024-07-24 | Goedel logics: Prenex fragments | Matthias Baaz et.al. | 2407.16683v2 | null |
2024-07-24 | A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Adrian Remonda et.al. | 2407.16680v2 | link |
2024-07-23 | From Imitation to Refinement -- Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677v1 | null |
2024-07-23 | FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process | Yuyan Bu et.al. | 2407.16670v1 | null |
2024-07-23 | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Thomas Hummel et.al. | 2407.16658v1 | link |
2024-07-23 | Fluorescence Diffraction Tomography using Explicit Neural Fields | Renzhi He et.al. | 2407.16657v1 | null |
2024-07-23 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Canyu Zhao et.al. | 2407.16655v1 | null |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850v1 | link |
2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841v1 | null |
2024-07-23 | QueST: Self-Supervised Skill Abstractions for Learning Continuous Control | Atharva Mete et.al. | 2407.15840v2 | null |
2024-07-22 | Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator | Florian Robert et.al. | 2407.15817v1 | null |
2024-07-22 | Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning | Zhecheng Yuan et.al. | 2407.15815v1 | null |
2024-07-22 | The Evaporating Massive Embedded Stellar Cluster IRS 13 Close to Sgr A*. II. Kinematic structure | Florian Peißker et.al. | 2407.15800v1 | null |
2024-07-22 | Adaptive Extensions of Unbiased Risk Estimators for Unsupervised Magnetic Resonance Image Denoising | Reeshad Khan et.al. | 2407.15799v1 | null |
2024-07-23 | Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video | Guiqiu Liao et.al. | 2407.15794v2 | null |
2024-07-22 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Haoning Wu et.al. | 2407.15754v1 | link |
2024-07-22 | SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection | Dimitrios Kollias et.al. | 2407.15728v1 | null |
2024-07-19 | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks | Sarah Jabbour et.al. | 2407.14509v1 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505v1 | null |
2024-07-19 | Nonlinear Schrödinger Network | Yiming Zhou et.al. | 2407.14504v1 | null |
2024-07-19 | Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery | Sukrut Rao et.al. | 2407.14499v1 | link |
2024-07-19 | Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation | Dongyang Wu et.al. | 2407.14498v1 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487v1 | link |
2024-07-19 | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model | Seonghui Min et.al. | 2407.14434v1 | null |
2024-07-19 | Dataset Distillation in Medical Imaging: A Feasibility Study | Muyang Li et.al. | 2407.14429v1 | null |
2024-07-19 | Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models | Hyun-Jic Oh et.al. | 2407.14426v1 | null |
2024-07-19 | Improving classification of road surface conditions via road area extraction and contrastive learning | Linh Trinh et.al. | 2407.14418v1 | null |
2024-07-18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | Abdelrahman Shaker et.al. | 2407.13772v1 | null |
2024-07-18 | Addressing Imbalance for Class Incremental Learning in Medical Image Classification | Xuze Hao et.al. | 2407.13768v1 | null |
2024-07-18 | Shape of Motion: 4D Reconstruction from a Single Video | Qianqian Wang et.al. | 2407.13764v1 | null |
2024-07-18 | Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion | Boyang Deng et.al. | 2407.13759v1 | null |
2024-07-18 | Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units | Aditya Parikh et.al. | 2407.13753v1 | null |
2024-07-18 | Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management | Yoontae Hwang et.al. | 2407.13751v1 | null |
2024-07-18 | Pose-guided multi-task video transformer for driver action recognition | Ricardo Pizarro et.al. | 2407.13750v1 | null |
2024-07-18 | Multi-Label Learning with Stronger Consistency Guarantees | Anqi Mao et.al. | 2407.13746v1 | null |
2024-07-18 | Realizable | Anqi Mao et.al. | 2407.13732v1 | null |
2024-07-18 | Enhanced | Anqi Mao et.al. | 2407.13722v1 | null |
2024-07-17 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Sherwin Bahmani et.al. | 2407.12781v1 | null |
2024-07-17 | Hallucination Index: An Image Quality Metric for Generative Reconstruction Models | Matthew Tivnan et.al. | 2407.12780v1 | null |
2024-07-17 | LookupViT: Compressing visual information to a limited number of tokens | Rajat Koner et.al. | 2407.12753v1 | null |
2024-07-17 | 4Dynamic: Text-to-4D Generation with Hybrid Priors | Yu-Jie Yuan et.al. | 2407.12684v1 | null |
2024-07-17 | Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Kirolos Ataallah et.al. | 2407.12679v1 | null |
2024-07-17 | Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs | Yiqing Shen et.al. | 2407.12678v1 | null |
2024-07-17 | CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems | Jiankun Zhao et.al. | 2407.12676v1 | link |
2024-07-17 | Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs | Lorenzo Lamberti et.al. | 2407.12675v1 | null |
2024-07-17 | Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data | Richard Osuala et.al. | 2407.12669v1 | null |
2024-07-17 | Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge | Andrea Albanese et.al. | 2407.12663v1 | null |
2024-07-16 | Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling | Jaehyeok Kim et.al. | 2407.11962v1 | null |
2024-07-16 | A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets | Ahmad Abdellatif et.al. | 2407.11955v1 | null |
2024-07-16 | Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation | Olga Zatsarynna et.al. | 2407.11954v1 | null |
2024-07-16 | Temporally Consistent Stereo Matching | Jiaxi Zeng et.al. | 2407.11950v1 | link |
2024-07-17 | Hierarchical Separable Video Transformer for Snapshot Compressive Imaging | Ping Wang et.al. | 2407.11946v2 | link |
2024-07-16 | Tackling Oversmoothing in GNN via Graph Sparsification: A Truss-based Approach | Tanvir Hossain et.al. | 2407.11928v1 | null |
2024-07-16 | The Strength of Bisymmetric Modes in SDSS-IV/MaNGA Barred Galaxy Kinematics | Brian DiGiorgio Zanger et.al. | 2407.11908v1 | null |
2024-07-16 | GraphFM: A Scalable Framework for Multi-Graph Pretraining | Divyansha Lachi et.al. | 2407.11907v1 | null |
2024-07-16 | SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge | Hao Ding et.al. | 2407.11906v1 | null |
2024-07-16 | Automated production of batched unclonable micro-patterns anti-counterfeiting labels with strong robustness and rapid recognition speed | Yuzheng He et.al. | 2407.11886v1 | null |
2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964v1 | link |
2024-07-15 | InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Nirat Saini et.al. | 2407.10958v1 | null |
2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953v1 | null |
2024-07-15 | IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Yuanhao Zhai et.al. | 2407.10937v1 | link |
2024-07-15 | Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Dilara Soylu et.al. | 2407.10930v1 | null |
2024-07-15 | In-Loop Filtering via Trained Look-Up Tables | Zhuoyuan Li et.al. | 2407.10926v1 | null |
2024-07-15 | A Dual-Attention Aware Deep Convolutional Neural Network for Early Alzheimer's Detection | Pandiyaraju V et.al. | 2407.10921v1 | null |
2024-07-16 | DataDream: Few-shot Guided Dataset Generation | Jae Myung Kim et.al. | 2407.10910v2 | link |
2024-07-15 | Interpreting Hand gestures using Object Detection and Digits Classification | Sangeetha K et.al. | 2407.10902v1 | null |
2024-07-15 | Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs | Leonardo Crespi et.al. | 2407.10888v1 | null |
2024-07-12 | Non-Hermitian Origin of Wannier Localizability and Detachable Topological Boundary States | Daichi Nakamura et.al. | 2407.09458v1 | null |
2024-07-12 | Let Me DeCode You: Decoder Conditioning with Tabular Data | Tomasz Szczepański et.al. | 2407.09437v1 | link |
2024-07-12 | Rethinking temporal self-similarity for repetitive action counting | Yanan Luo et.al. | 2407.09431v1 | null |
2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Hang Zou et.al. | 2407.09424v1 | null |
2024-07-12 | A grid of self-consistent MSG (MARCS-StaticWeather-GGchem) cool stellar, sub-stellar, and exoplanetary model atmospheres | Uffe G. Jørgensen et.al. | 2407.09397v1 | null |
2024-07-12 | Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution | Fajwel Fogel et.al. | 2407.09392v1 | link |
2024-07-12 | Radiance Fields from Photons | Sacha Jungerman et.al. | 2407.09386v1 | null |
2024-07-12 | Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation | Zhilin Zhu et.al. | 2407.09367v1 | link |
2024-07-12 | Novel clustered federated learning based on local loss | Endong Gu et.al. | 2407.09360v1 | link |
2024-07-12 | Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems | Ziyuan Luo et.al. | 2407.09352v1 | null |
2024-07-11 | Video Diffusion Alignment via Reward Gradients | Mihir Prabhudesai et.al. | 2407.08737v1 | link |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735v1 | null |
2024-07-11 | WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics | Abdollah Zakeri et.al. | 2407.08717v1 | null |
2024-07-11 | Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices | Dina Hussein et.al. | 2407.08715v1 | null |
2024-07-11 | Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware | James Seekings et.al. | 2407.08704v1 | null |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701v1 | null |
2024-07-11 | ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions | Jiu Feng et.al. | 2407.08691v1 | link |
2024-07-11 | Generalizable Implicit Motion Modeling for Video Frame Interpolation | Zujin Guo et.al. | 2407.08680v1 | null |
2024-07-11 | Still-Moving: Customized Video Generation without Customized Video Data | Hila Chefer et.al. | 2407.08674v1 | null |
2024-07-11 | NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Yi Zhang et.al. | 2407.08672v1 | null |
2024-07-10 | LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Feng Li et.al. | 2407.07895v1 | link |
2024-07-10 | Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation | Tao Chen et.al. | 2407.07884v1 | null |
2024-07-10 | Controlling Space and Time with Diffusion Models | Daniel Watson et.al. | 2407.07860v1 | null |
2024-07-11 | Functional Assessment of Cerebral Capillaries using Single Capillary Reporters in Ultrasound Localization Microscopy | Stephen A Lee et.al. | 2407.07857v2 | null |
2024-07-10 | Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification | Mei Qiu et.al. | 2407.07842v1 | null |
2024-07-10 | Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Shengjia Chen et.al. | 2407.07841v1 | link |
2024-07-10 | Probe and Prejudice: Classification of compact objects and model comparison using EOS knowledge | Hauke Koehn et.al. | 2407.07837v1 | null |
2024-07-10 | RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement | Honglie Chen et.al. | 2407.07825v1 | null |
2024-07-10 | New Gravitational Wave Discoveries Enabled by Machine Learning | Alexandra E. Koloniari et.al. | 2407.07820v1 | null |
2024-07-10 | The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others | Daniel Sikar et.al. | 2407.07818v1 | null |
2024-07-09 | V-VIPE: Variational View Invariant Pose Embedding | Mara Levy et.al. | 2407.07092v1 | null |
2024-07-09 | Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | Ruochen Jin et.al. | 2407.07089v1 | link |
2024-07-09 | MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images | Ziyang Xu et.al. | 2407.07078v1 | link |
2024-07-09 | MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder | Md Rakibul Hasan et.al. | 2407.07076v1 | null |
2024-07-10 | CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement | Wei Wang et.al. | 2407.07056v2 | null |
2024-07-09 | Latent Space Imaging | Matheus Souza et.al. | 2407.07052v1 | null |
2024-07-09 | Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs | Christian Riefolo et.al. | 2407.07045v1 | null |
2024-07-09 | Free Fermionic Constructions of Heterotic Strings | Ioannis Florakis et.al. | 2407.07034v1 | null |
2024-07-09 | Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition | Daiqing Wu et.al. | 2407.07026v1 | null |
2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024v1 | link |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189v1 | link |
2024-07-08 | Classification of Cellular Automata based on the Hamming distance | Gaspar Alfaro et.al. | 2407.06175v1 | null |
2024-07-08 | The Tug-of-War Between Deepfake Generation and Detection | Hannah Lee et.al. | 2407.06174v1 | null |
2024-07-08 | PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes | Mohammad Reza Karimi Dastjerdi et.al. | 2407.06150v1 | null |
2024-07-08 | Physics-informed machine learning approaches to reactor antineutrino detection | Sophia Farrell et.al. | 2407.06139v1 | null |
2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125v1 | null |
2024-07-08 | Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation | Xinyu Bai et.al. | 2407.06095v1 | null |
2024-07-08 | ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions | Micol Spitale et.al. | 2407.06094v1 | null |
2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093v1 | null |
2024-07-08 | Assessing Cardiomegaly in Dogs Using a Simple CNN Model | Nikhil Deekonda et.al. | 2407.06092v1 | null |
2024-07-05 | VCoME: Verbal Video Composition with Multimodal Editing Effects | Weibo Gong et.al. | 2407.04697v1 | null |
2024-07-05 | Enhancing Vehicle Re-identification and Matching for Weaving Analysis | Mei Qiu et.al. | 2407.04688v1 | null |
2024-07-05 | Embracing Massive Medical Data | Yu-Cheng Chou et.al. | 2407.04687v1 | link |
2024-07-05 | Is plantar thermography a valid digital biomarker for characterising diabetic foot ulceration risk? | Akshay Jagadeesh et.al. | 2407.04676v1 | null |
2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603v1 | null |
2024-07-05 | Multimodal Classification via Modal-Aware Interactive Enhancement | Qing-Yuan Jiang et.al. | 2407.04587v1 | null |
2024-07-05 | A Degree Bound for Planar Functions | Christof Beierle et.al. | 2407.04570v1 | null |
2024-07-05 | Pencils of plane cubics with one base point | Riccardo Moschetti et.al. | 2407.04569v1 | null |
2024-07-05 | Anticipating Solar Flares | Hugh S. Hudson et.al. | 2407.04567v1 | null |
2024-07-05 | Real Time Emotion Analysis Using Deep Learning for Education, Entertainment, and Beyond | Abhilash Khuntia et.al. | 2407.04560v1 | null |
2024-07-03 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Pan Zhang et.al. | 2407.03320v1 | link |
2024-07-03 | Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations | Trevor Ablett et.al. | 2407.03311v1 | link |
2024-07-03 | Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method | Sijie Xu et.al. | 2407.03308v1 | link |
2024-07-03 | HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization | Yucheng Tang et.al. | 2407.03307v1 | null |
2024-07-03 | VCHAR: Variance-Driven Complex Human Activity Recognition framework with Generative Representation | Yuan Sun et.al. | 2407.03291v1 | null |
2024-07-03 | Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model | Jingyuan Hong et.al. | 2407.03274v1 | null |
2024-07-03 | Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later | Han-Jia Ye et.al. | 2407.03257v1 | link |
2024-07-03 | STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data | Kheir Eddine Daouadi et.al. | 2407.03253v1 | null |
2024-07-03 | ACTRESS: Active Retraining for Semi-supervised Visual Grounding | Weitai Kang et.al. | 2407.03251v1 | null |
2024-07-04 | TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach | Weikun Peng et.al. | 2407.03245v2 | null |
2024-07-02 | Characterizing the Interpretability of Attention Maps in Digital Pathology | Tomé Albuquerque et.al. | 2407.02484v1 | null |
2024-07-02 | Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets | Kheir Eddine Daouadi et.al. | 2407.02448v1 | null |
2024-07-02 | PLeaS -- Merging Models with Permutations and Least Squares | Anshul Nasery et.al. | 2407.02447v1 | null |
2024-07-02 | Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates | Dorothea MacPhail et.al. | 2407.02432v1 | null |
2024-07-02 | AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans | Gabriele Lozupone et.al. | 2407.02418v1 | link |
2024-07-03 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411v2 | null |
2024-07-02 | Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones | Lorenzo Lamberti et.al. | 2407.02405v1 | null |
2024-07-03 | A neural networks method to search for long transient gravitational waves | Francesca Attadio et.al. | 2407.02391v2 | null |
2024-07-02 | Real HSI-MSI-PAN image dataset for the hyperspectral/multi-spectral/panchromatic image fusion and super-resolution fields | Shuangliang Li et.al. | 2407.02387v1 | link |
2024-07-02 | OpenSlot: Mixed Open-set Recognition with Object-centric Learning | Xu Yin et.al. | 2407.02386v1 | null |
2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098v1 | link |
2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092v1 | link |
2024-06-28 | Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints | Arnab Auddy et.al. | 2406.20088v1 | null |
2024-06-28 | Extreme horizon equation | Wojciech Kamiński et.al. | 2406.20068v1 | null |
2024-06-28 | Modeling and LQR Control of Insect Sized Flapping Wing Robot | Daksh Dhingra et.al. | 2406.20061v1 | null |
2024-06-28 | Pairwise Difference Learning for Classification | Mohamed Karim Belaid et.al. | 2406.20031v1 | link |
2024-06-28 | On the Trade-off between Flatness and Optimization in Distributed Learning | Ying Cao et.al. | 2406.20006v1 | null |
2024-06-28 | Malaria Cell Detection Using Deep Neural Networks | Saurabh Sawant et.al. | 2406.20005v1 | null |
2024-06-28 | Impact of Initialization on Intra-subject Pediatric Brain MR Image Registration: A Comparative Analysis between SyN ANTs and Deep Learning-Based Approaches | Andjela Dimitrijevic et.al. | 2406.19943v1 | link |
2024-07-01 | GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection | Chih-Chung Hsu et.al. | 2406.19941v2 | link |
2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392v1 | link |
2024-06-27 | Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Ali Khaleghi Rahimian et.al. | 2406.19391v1 | link |
2024-06-27 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Tao Zhang et.al. | 2406.19389v1 | null |
2024-06-27 | Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Haobo Yuan et.al. | 2406.19369v1 | null |
2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349v1 | null |
2024-06-27 | Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation | Yushun Tang et.al. | 2406.19341v1 | null |
2024-06-28 | LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans | Kaushalya Sivayogaraj et.al. | 2406.19336v2 | null |
2024-06-27 | PNeRV: A Polynomial Neural Representation for Videos | Sonam Gupta et.al. | 2406.19299v1 | null |
2024-06-27 | Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers | Jinsong Chen et.al. | 2406.19258v1 | null |
2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255v1 | null |
2024-06-26 | Towards Compositionality in Concept Learning | Adam Stein et.al. | 2406.18534v1 | link |
2024-06-26 | MatchTime: Towards Automatic Soccer Game Commentary Generation | Jiayuan Rao et.al. | 2406.18530v1 | null |
2024-06-26 | MultiDiff: Consistent Novel View Synthesis from a Single Image | Norman Müller et.al. | 2406.18524v1 | null |
2024-06-26 | ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Shenghai Yuan et.al. | 2406.18522v1 | null |
2024-06-27 | Distinguishing mechanisms of social contagion from local network view | Elsa Andres et.al. | 2406.18519v2 | null |
2024-06-26 | Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population | Sangeon Ryu et.al. | 2406.18508v1 | null |
2024-06-26 | Robust Surgical Phase Recognition From Annotation Efficient Supervision | Or Rubin et.al. | 2406.18481v1 | null |
2024-06-26 | Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method | Sascha Caron et.al. | 2406.18469v1 | null |
2024-06-26 | An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors | Xingfu Wu et.al. | 2406.18445v1 | null |
2024-06-26 | Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling | Abril Corona-Figueroa et.al. | 2406.18422v1 | null |
2024-06-25 | Text-Animator: Controllable Visual Text Video Generation | Lin Liu et.al. | 2406.17777v1 | null |
2024-06-25 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jianzong Wu et.al. | 2406.17758v1 | null |
2024-06-25 | Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Tushar Prasanna Swaminathan et.al. | 2406.17749v1 | null |
2024-06-25 | Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning | Arijit Sehanobish et.al. | 2406.17740v1 | null |
2024-06-25 | Mask-Guided Attention U-Net for Enhanced Neonatal Brain Extraction and Image Preprocessing | Bahram Jafrasteh et.al. | 2406.17709v1 | link |
2024-06-25 | SurgeMOD: Translating image-space tissue motions into vision-based surgical forces | Mikel De Iturrate Reyzabal et.al. | 2406.17707v1 | link |
2024-06-25 | Dualities for universal (co)acting Hopf monoids | Ana Agore et.al. | 2406.17684v1 | null |
2024-06-25 | Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation | Xuming Zhang et.al. | 2406.17679v1 | null |
2024-06-25 | Lifting of locally initial objects and universal (co)acting Hopf algebras | Ana Agore et.al. | 2406.17677v1 | null |
2024-06-25 | Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration | Mohammad Ali Labbaf Khaniki et.al. | 2406.17670v1 | null |
2024-06-24 | StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal | Chongjie Ye et.al. | 2406.16864v1 | null |
2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Haonan Qiu et.al. | 2406.16863v1 | link |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862v1 | null |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852v1 | link |
2024-06-24 | Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation | Jingru Fu et.al. | 2406.16848v1 | null |
2024-06-24 | Exploring Factual Entailment with NLI: A News Media Study | Guy Mor-Lan et.al. | 2406.16842v1 | null |
2024-06-24 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837v1 | null |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | Mounika Marreddy et.al. | 2406.16833v1 | null |
2024-06-24 | The classification of simple complex Lie superalgebras of polynomial vector fields and their deformations | Dimitry Leites et.al. | 2406.16760v1 | null |
2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | Yuning Du et.al. | 2406.16754v1 | null |
2024-06-21 | Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness | Hannes Ulrich et.al. | 2406.15340v1 | null |
2024-06-21 | Image Conductor: Precision Control for Interactive Video Synthesis | Yaowei Li et.al. | 2406.15339v1 | null |
2024-06-21 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Sondos Aabed et.al. | 2406.15329v1 | null |
2024-06-21 | Fine-grained Attention in Hierarchical Transformers for Tabular Time-series | Raphael Azorin et.al. | 2406.15327v1 | link |
2024-06-21 | NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing | Tim Schopf et.al. | 2406.15294v1 | link |
2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264v1 | null |
2024-06-24 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252v2 | null |
2024-06-21 | Retrieval Augmented Zero-Shot Text Classification | Tassallah Abdullahi et.al. | 2406.15241v1 | null |
2024-06-21 | Model Equivalences | Michael Benedikt et.al. | 2406.15235v1 | null |
2024-06-21 | Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study | Xinze Lyu et.al. | 2406.15217v1 | null |
2024-06-20 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Xincheng Shuai et.al. | 2406.14555v1 | link |
2024-06-21 | Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation | Eyal Michaeli et.al. | 2406.14551v2 | link |
2024-06-20 | IRASim: Learning Interactive Real-Robot Action Simulators | Fangqi Zhu et.al. | 2406.14540v1 | null |
2024-06-20 | Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration | Long Lei et.al. | 2406.14534v1 | link |
2024-06-20 | Local symmetries in partially ordered sets | Christoph Minz et.al. | 2406.14533v1 | null |
2024-06-20 | Fantastic Copyrighted Beasts and How (Not) to Generate Them | Luxi He et.al. | 2406.14526v1 | null |
2024-06-20 | MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | Xinyu Fang et.al. | 2406.14515v1 | link |
2024-06-20 | V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Rotem Shalev-Arkushin et.al. | 2406.14510v1 | null |
2024-06-20 | LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors | Sheikh Asif Imran et.al. | 2406.14498v1 | link |
2024-06-20 | African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Gregor Geigle et.al. | 2406.14496v1 | null |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846v1 | null |
2024-06-18 | LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging | Jinuk Kim et.al. | 2406.12837v1 | link |
2024-06-18 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | Ci-Siang Lin et.al. | 2406.12834v1 | null |
2024-06-18 | VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | Jing Gu et.al. | 2406.12831v1 | null |
2024-06-18 | Neural Approximate Mirror Maps for Constrained Diffusion Models | Berthy T. Feng et.al. | 2406.12816v1 | null |
2024-06-18 | Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation | Nikolas Koutsoubis et.al. | 2406.12815v1 | link |
2024-06-18 | Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs | Joshua Durso-Finley et.al. | 2406.12807v1 | null |
2024-06-18 | Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition | Xingming Liao et.al. | 2406.12779v1 | null |
2024-06-18 | Medvedev degrees of subshifts on groups | Sebastián Barbieri et.al. | 2406.12777v1 | null |
2024-06-18 | Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video | Xiangming Zhu et.al. | 2406.12769v1 | null |
2024-06-17 | Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Lei Zhu et.al. | 2406.11837v1 | link |
2024-06-17 | Spectral Introspection Identifies Group Training Dynamics in Deep Neural Networks for Neuroimaging | Bradley T. Baker et.al. | 2406.11825v1 | null |
2024-06-17 | Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Alexander Raistrick et.al. | 2406.11824v1 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816v1 | null |
2024-06-17 | Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain | Pooja Prajod et.al. | 2406.11808v1 | null |
2024-06-17 | Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation | Song Wang et.al. | 2406.11799v1 | null |
2024-06-17 | CELL your Model: Contrastive Explanation Methods for Large Language Models | Ronny Luss et.al. | 2406.11785v1 | null |
2024-06-17 | Task Me Anything | Jieyu Zhang et.al. | 2406.11775v1 | link |
2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743v1 | null |
2024-06-17 | Lightweight Model Pre-training via Language Guided Knowledge Distillation | Mingsheng Li et.al. | 2406.11689v1 | link |
2024-06-14 | VideoGUI: A Benchmark for GUI Automation from Instructional Videos | Kevin Qinghong Lin et.al. | 2406.10227v1 | null |
2024-06-14 | Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding | Ridouane Ghermi et.al. | 2406.10221v1 | null |
2024-06-14 | SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation | Ziang Xu et.al. | 2406.10200v1 | null |
2024-06-14 | CarLLaVA: Vision language models for camera-only closed-loop driving | Katrin Renz et.al. | 2406.10165v1 | null |
2024-06-14 | Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition | Guinan Li et.al. | 2406.10152v1 | null |
2024-06-14 | Training-free Camera Control for Video Generation | Chen Hou et.al. | 2406.10126v1 | null |
2024-06-14 | Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression | Haresh Rengaraj Rajamohan et.al. | 2406.10119v1 | null |
2024-06-14 | ECGMamba: Towards Efficient ECG Classification with BiSSM | Yupeng Qiang et.al. | 2406.10098v1 | null |
2024-06-14 | Biomarker based Cancer Classification using an Ensemble with Pre-trained Models | Chongmin Lee et.al. | 2406.10087v1 | null |
2024-06-14 | On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Siddhant Arora et.al. | 2406.10083v1 | null |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418v1 | link |
2024-06-13 | An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels | Duy-Kien Nguyen et.al. | 2406.09415v1 | null |
2024-06-13 | CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras | Sachin Shah et.al. | 2406.09409v1 | null |
2024-06-13 | Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion | Linzhan Mou et.al. | 2406.09402v1 | null |
2024-06-13 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Junke Wang et.al. | 2406.09399v1 | link |
2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396v1 | null |
2024-06-13 | LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living | Rajatsubhra Chakraborty et.al. | 2406.09390v1 | null |
2024-06-13 | Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior | Baiang Li et.al. | 2406.09389v1 | null |
2024-06-13 | Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Youngtaek Oh et.al. | 2406.09388v1 | link |
2024-06-13 | SimGen: Simulator-conditioned Driving Scene Generation | Yunsong Zhou et.al. | 2406.09386v1 | null |
2024-06-12 | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models | Hashmat Shadab Malik et.al. | 2406.08486v1 | link |
2024-06-12 | RMem: Restricted Memory Banks Improve Video Object Segmentation | Junbao Zhou et.al. | 2406.08476v1 | null |
2024-06-12 | AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind | Wei Ding et.al. | 2406.08455v1 | null |
2024-06-12 | Transformation-Dependent Adversarial Attacks | Yaoteng Tan et.al. | 2406.08443v1 | null |
2024-06-12 | A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil | Philipe Melo et.al. | 2406.08429v1 | null |
2024-06-12 | Improving Noise Robustness through Abstractions and its Impact on Machine Learning | Alfredo Ibias et.al. | 2406.08428v1 | null |
2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418v1 | link |
2024-06-13 | MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Xuehai He et.al. | 2406.08407v2 | link |
2024-06-12 | Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze | Michele Mazzamuto et.al. | 2406.08379v1 | null |
2024-06-12 | 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction | Tianqi Chen et.al. | 2406.08374v1 | null |
2024-06-11 | Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring | Huicong Zhang et.al. | 2406.07551v1 | link |
2024-06-11 | Image and Video Tokenization with Binary Spherical Quantization | Yue Zhao et.al. | 2406.07548v1 | link |
2024-06-11 | Zero-shot Image Editing with Reference Imitation | Xi Chen et.al. | 2406.07547v1 | null |
2024-06-11 | Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance | Kuan Heng Lin et.al. | 2406.07540v1 | null |
2024-06-11 | BAKU: An Efficient Transformer for Multi-Task Policy Learning | Siddhant Haldar et.al. | 2406.07539v1 | null |
2024-06-11 | Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection | J. Schueler et.al. | 2406.07538v1 | null |
2024-06-11 | Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection | Wenxiao Wang et.al. | 2406.07536v1 | null |
2024-06-11 | Dynamics of the non-radial energy-critical inhomogeneous NLS | Carlos M. Guzmán et.al. | 2406.07535v1 | null |
2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515v1 | null |
2024-06-11 | Understanding Visual Concepts Across Models | Brandon Trabucco et.al. | 2406.07506v1 | link |
2024-06-10 | NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Ting-Hsuan Chen et.al. | 2406.06523v1 | null |
2024-06-10 | Data Augmentation for Multivariate Time Series Classification: An Experimental Study | Romain Ilbert et.al. | 2406.06518v1 | null |
2024-06-10 | Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Louis Blankemeier et.al. | 2406.06512v1 | null |
2024-06-10 | Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer | Sigal Raab et.al. | 2406.06508v1 | link |
2024-06-10 | Equivariant Neural Tangent Kernels | Philipp Misof et.al. | 2406.06504v1 | null |
2024-06-10 | Viscous shock fluctuations in KPZ | Alexander Dunlap et.al. | 2406.06502v1 | null |
2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499v1 | null |
2024-06-10 | Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace | Chenxu Wang et.al. | 2406.06498v1 | null |
2024-06-10 | Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data | Nicole Hayes et.al. | 2406.06479v1 | null |
2024-06-10 | DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents | Olivia Figueira et.al. | 2406.06473v1 | null |
2024-06-07 | DVOS: Self-Supervised Dense-Pattern Video Object Segmentation | Keyhan Najafian et.al. | 2406.05131v1 | null |
2024-06-07 | Compositional Curvature Bounds for Deep Neural Networks | Taha Entesari et.al. | 2406.05119v1 | null |
2024-06-07 | Large Generative Graph Models | Yu Wang et.al. | 2406.05109v1 | null |
2024-06-07 | A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification | Christian Giannetti et.al. | 2406.05096v1 | null |
2024-06-10 | Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 with JWST | J. D. R. Pierel et.al. | 2406.05089v2 | null |
2024-06-07 | CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion | Xingrui Wang et.al. | 2406.05082v1 | null |
2024-06-10 | Discovery of a Relativistic Stripped Envelope Type Ic-BL Supernova at z = 2.83 with JWST | M. R. Siebert et.al. | 2406.05076v2 | null |
2024-06-07 | Diving Deep into the Motion Representation of Video-Text Models | Chinmaya Devaraj et.al. | 2406.05075v1 | null |
2024-06-07 | Hibou: A Family of Foundational Vision Transformers for Pathology | Dmitry Nechaev et.al. | 2406.05074v1 | null |
2024-06-07 | Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations | Benjamin Fresz et.al. | 2406.05068v1 | link |
2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344v1 | null |
2024-06-07 | Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Fangfu Liu et.al. | 2406.04338v2 | null |
2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330v1 | link |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325v1 | null |
2024-06-06 | SF-V: Single Forward Video Generation Model | Zhixing Zhang et.al. | 2406.04324v1 | null |
2024-06-06 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | Qianlan Yang et.al. | 2406.04323v1 | null |
2024-06-06 | VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Zeyue Tian et.al. | 2406.04321v1 | link |
2024-06-06 | Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models | Ali Behrouz et.al. | 2406.04320v1 | null |
2024-06-06 | Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction | Chen-Yu Yen et.al. | 2406.04318v1 | null |
2024-06-06 | Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks | Tristan Cinquin et.al. | 2406.04317v1 | null |
2024-06-05 | Grokking Modular Polynomials | Darshil Doshi et.al. | 2406.03495v1 | null |
2024-06-05 | The Logarithmic Memristor-Based Bayesian Machine | Clément Turck et.al. | 2406.03492v1 | null |
2024-06-05 | Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review | Sonia Bbouzidi et.al. | 2406.03478v1 | null |
2024-06-05 | Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach | Haoyu Han et.al. | 2406.03464v1 | null |
2024-06-05 | Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts | Dominik Scheuble et.al. | 2406.03461v1 | null |
2024-06-05 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Mona Ahmadian et.al. | 2406.03447v1 | null |
2024-06-05 | Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input | Joachim Ott et.al. | 2406.03439v1 | null |
2024-06-05 | Stabilizing massless fields with fluxes in Landau-Ginzburg models | Katrin Becker et.al. | 2406.03435v1 | null |
2024-06-05 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Moein Heidari et.al. | 2406.03430v1 | link |
2024-06-05 | Post-hoc Part-prototype Networks | Andong Tan et.al. | 2406.03421v1 | null |
2024-06-05 | Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting | Inkyu Shin et.al. | 2406.02541v2 | null |
2024-06-04 | ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation | Tianchen Zhao et.al. | 2406.02540v1 | null |
2024-06-04 | Enhancing predictive imaging biomarker discovery through treatment effect analysis | Shuhan Xiao et.al. | 2406.02534v1 | null |
2024-06-04 | ReLUs Are Sufficient for Learning Implicit Neural Representations | Joseph Shenouda et.al. | 2406.02529v1 | link |
2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523v1 | null |
2024-06-04 | DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering | Zhongpai Gao et.al. | 2406.02518v1 | null |
2024-06-04 | V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation | Cong Wang et.al. | 2406.02511v1 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509v1 | null |
2024-06-04 | Endomorphisms of Artin groups of type | Luis Paris et.al. | 2406.02484v1 | null |
2024-06-04 | Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion | Colin Hansen et.al. | 2406.02477v1 | null |
2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075v1 | null |
2024-05-31 | Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights | Xin Wen et.al. | 2405.21070v1 | link |
2024-05-31 | You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet | Zhen Qin et.al. | 2405.21022v1 | null |
2024-05-31 | Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging | Niloufar Zakariaei et.al. | 2405.21021v1 | null |
2024-05-31 | The classification of dp-minimal integral domains | Christian d'Elbée et.al. | 2405.21014v1 | null |
2024-05-31 | Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging | Muhammad Muneeb Saad et.al. | 2405.20987v1 | null |
2024-05-31 | PUAL: A Classifier on Trifurcate Positive-Unlabeled Data | Xiaoke Wang et.al. | 2405.20970v1 | null |
2024-05-31 | Aligning Multiclass Neural Network Classifier Criterion with Task Performance via | Nathan Tsoi et.al. | 2405.20954v1 | null |
2024-05-31 | Standard model of electromagnetism and chirality in crystals | R. Winkler et.al. | 2405.20940v1 | null |
2024-05-31 | MALT: Multi-scale Action Learning Transformer for Online Action Detection | Zhipeng Yang et.al. | 2405.20892v1 | null |
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340v1 | null |
2024-05-30 | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | Lening Wang et.al. | 2405.20337v1 | link |
2024-05-30 | VividDream: Generating 3D Scene with Ambient Dynamics | Yao-Chih Lee et.al. | 2405.20334v1 | null |
2024-05-30 | SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos | Chinedu Innocent Nwoye et.al. | 2405.20333v1 | null |
2024-05-31 | 4DHands: Reconstructing Interactive Hands in 4D with Transformers | Dixuan Lin et.al. | 2405.20330v2 | null |
2024-05-30 | MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Shuyuan Tu et.al. | 2405.20325v1 | null |
2024-05-30 | Vision-based Manipulation from Single Human Video with Open-World Object Graphs | Yifeng Zhu et.al. | 2405.20321v1 | null |
2024-05-30 | Improving the Training of Rectified Flows | Sangyun Lee et.al. | 2405.20320v1 | link |
2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318v1 | link |
2024-05-30 | Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | Himangi Mittal et.al. | 2405.20305v1 | null |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335v1 | null |
2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334v1 | link |
2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333v1 | null |
2024-05-29 | NPGA: Neural Parametric Gaussian Avatars | Simon Giebenhain et.al. | 2405.19331v1 | null |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328v1 | null |
2024-05-29 | DGD: Dynamic 3D Gaussians Distillation | Isaac Labe et.al. | 2405.19321v1 | null |
2024-05-29 | Real-Time Environment Condition Classification for Autonomous Vehicles | Marco Introvigne et.al. | 2405.19305v1 | null |
2024-05-29 | Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare | Hanwei Zhu et.al. | 2405.19298v1 | null |
2024-05-29 | Archetype-Based Redshift Estimation for the Dark Energy Spectroscopic Instrument Survey | Abhijeet Anand et.al. | 2405.19288v1 | null |
2024-05-29 | A study on the adequacy of common IQA measures for medical images | Anna Breger et.al. | 2405.19224v1 | null |
2024-05-28 | Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets | Khen Cohen et.al. | 2405.18427v1 | null |
2024-05-28 | GFlow: Recovering 4D World from Monocular Video | Shizun Wang et.al. | 2405.18426v1 | null |
2024-05-28 | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Nicklas Hansen et.al. | 2405.18418v1 | null |
2024-05-28 | 3D StreetUnveiler with Semantic-Aware 2DGS | Jingwei Xu et.al. | 2405.18416v1 | null |
2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415v1 | link |
2024-05-28 | Towards a Sampling Theory for Implicit Neural Representations | Mahrokh Najaf et.al. | 2405.18410v1 | null |
2024-05-28 | Phased Consistency Model | Fu-Yun Wang et.al. | 2405.18407v1 | null |
2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406v1 | null |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358v1 | null |
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356v1 | link |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430v1 | null |
2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428v1 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421v1 | null |
2024-05-27 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control | Zhengfei Kuang et.al. | 2405.17414v1 | null |
2024-05-27 | Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization | Navin Kamuni et.al. | 2405.17413v1 | null |
2024-05-27 | The Peripatetic Hater: Predicting Movement Among Hate Subreddits | Daniel Hickey et.al. | 2405.17410v1 | null |
2024-05-27 | Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer | Ruizhi Shao et.al. | 2405.17405v1 | null |
2024-05-27 | Spectral Greedy Coresets for Graph Neural Networks | Mucong Ding et.al. | 2405.17404v1 | null |
2024-05-27 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Shenyuan Gao et.al. | 2405.17398v1 | link |
2024-05-27 | Non-Unitary Quantum Machine Learning | Jamie Heredge et.al. | 2405.17388v1 | null |
2024-05-24 | Canonical Variates in Wasserstein Metric Space | Jia Li et.al. | 2405.15768v1 | null |
2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765v1 | null |
2024-05-24 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation | Yuchi Wang et.al. | 2405.15758v1 | link |
2024-05-24 | Looking Backward: Streaming Video-to-Video Translation with Feature Banks | Feng Liang et.al. | 2405.15757v1 | link |
2024-05-24 | Characterizing Discourse Group Roles in Inquiry-based University Science Labs | Tong Wan et.al. | 2405.15746v1 | null |
2024-05-24 | Hierarchical Uncertainty Exploration via Feedforward Posterior Trees | Elias Nehme et.al. | 2405.15719v1 | null |
2024-05-24 | EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences | Jocelyn Shen et.al. | 2405.15708v1 | null |
2024-05-24 | Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates | Jinbo Peng et.al. | 2405.15705v1 | null |
2024-05-24 | realSEUDO for real-time calcium imaging analysis | Iuliia Dmitrieva et.al. | 2405.15701v1 | null |
2024-05-24 | UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes | Ted Lentsch et.al. | 2405.15688v1 | null |
2024-05-23 | PuzzleAvatar: Assembling 3D Avatars from Personal Albums | Yuliang Xiu et.al. | 2405.14869v1 | null |
2024-05-23 | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | Basile Van Hoorick et.al. | 2405.14868v1 | null |
2024-05-23 | Video Diffusion Models are Training-free Motion Interpreter and Controller | Zeqi Xiao et.al. | 2405.14864v1 | null |
2024-05-23 | Synergistic Global-space Camera and Human Reconstruction from Videos | Yizhou Zhao et.al. | 2405.14855v1 | null |
2024-05-23 | Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron | Can Cui et.al. | 2405.14851v1 | null |
2024-05-23 | Learning to Detect and Segment Mobile Objects from Unlabeled Videos | Yihong Sun et.al. | 2405.14841v1 | null |
2024-05-23 | Designing A Sustainable Marine Debris Clean-up Framework without Human Labels | Raymond Wang et.al. | 2405.14815v1 | null |
2024-05-23 | As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making | Shomik Jain et.al. | 2405.14812v1 | null |
2024-05-23 | Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics | Jonas Spinner et.al. | 2405.14806v1 | null |
2024-05-24 | Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation | Hongxu Jiang et.al. | 2405.14802v2 | link |
2024-05-21 | Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma | Ahmed Gomaa et.al. | 2405.12963v1 | null |
2024-05-21 | **Online Learning of Halfspaces with Massart N |