
Showing 1–50 of 168 results for author: Vu, M

Searching in archive cs.
  1. arXiv:2507.11770  [pdf, ps, other]

    cs.RO

    Generating Actionable Robot Knowledge Bases by Combining 3D Scene Graphs with Robot Ontologies

    Authors: Giang Nguyen, Mihai Pomarlan, Sascha Jongebloed, Nils Leusmann, Minh Nhat Vu, Michael Beetz

    Abstract: In robotics, the effective integration of environmental data into actionable knowledge remains a significant challenge due to the variety and incompatibility of data formats commonly used in scene descriptions, such as MJCF, URDF, and SDF. This paper presents a novel approach that addresses these challenges by developing a unified scene graph model that standardizes these varied formats into the U…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: 8 pages, 7 figures, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2025)

  2. arXiv:2507.01705  [pdf, ps, other]

    cs.RO

    Efficient Collision Detection for Long and Slender Robotic Links in Euclidean Distance Fields: Application to a Forestry Crane

    Authors: Marc-Philip Ecker, Bernhard Bischof, Minh Nhat Vu, Christoph Fröhlich, Tobias Glück, Wolfgang Kemmetmüller

    Abstract: Collision-free motion planning in complex outdoor environments relies heavily on perceiving the surroundings through exteroceptive sensors. A widely used approach represents the environment as a voxelized Euclidean distance field, where robots are typically approximated by spheres. However, for large-scale manipulators such as forestry cranes, which feature long and slender links, this conventiona…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted at IROS 2025

  3. arXiv:2506.20314  [pdf, ps, other]

    cs.RO

    Near Time-Optimal Hybrid Motion Planning for Timber Cranes

    Authors: Marc-Philip Ecker, Bernhard Bischof, Minh Nhat Vu, Christoph Fröhlich, Tobias Glück, Wolfgang Kemmetmüller

    Abstract: Efficient, collision-free motion planning is essential for automating large-scale manipulators like timber cranes. They come with unique challenges such as hydraulic actuation constraints and passive joints, factors that are seldom addressed by current motion planning methods. This paper introduces a novel approach for time-optimal, collision-free hybrid motion planning for a hydraulically actuated…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted at ICRA 2025

  4. arXiv:2506.18448  [pdf, ps, other]

    cs.RO

    GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System

    Authors: Quang Nguyen, Tri Le, Huy Nguyen, Thieu Vo, Tung D. Ta, Baoru Huang, Minh N. Vu, Anh Nguyen

    Abstract: Language-driven grasp detection has the potential to revolutionize human-robot interaction by allowing robots to understand and execute grasping tasks based on natural language commands. However, existing approaches face two key challenges. First, they often struggle to interpret complex text instructions or operate ineffectively in densely cluttered environments. Second, most methods require a tr…

    Submitted 19 July, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted to IROS 2025. Webpage: https://zquang2202.github.io/GraspMAS/

  5. arXiv:2506.17320  [pdf]

    cs.CY cs.CV cs.LG

    MAARTA: Multi-Agentic Adaptive Radiology Teaching Assistant

    Authors: Akash Awasthi, Brandon V. Chang, Anh M. Vu, Ngan Le, Rishi Agrawal, Zhigang Deng, Carol Wu, Hien Van Nguyen

    Abstract: Radiology students often struggle to develop perceptual expertise due to limited expert mentorship time, leading to errors in visual search and diagnostic interpretation. These perceptual errors, such as missed fixations, short dwell times, or misinterpretations, are not adequately addressed by current AI systems, which focus on diagnostic accuracy but fail to explain how and why errors occur. To…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Accepted to MICCAI 2025 (Main Conference)

  6. arXiv:2506.17292  [pdf, ps, other]

    cs.CR cs.AI

    Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models

    Authors: Quan Nguyen, Minh N. Vu, Truc Nguyen, My T. Thai

    Abstract: Federated Learning enables collaborative learning among clients via a coordinating server while avoiding direct data sharing, offering a perceived solution to preserve privacy. However, recent studies on Membership Inference Attacks (MIAs) have challenged this notion, showing high success rates against unprotected training data. While local differential privacy (LDP) is widely regarded as a gold s…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025

  7. arXiv:2506.13478  [pdf, ps, other]

    cs.RO

    Learning Swing-up Maneuvers for a Suspended Aerial Manipulation Platform in a Hierarchical Control Framework

    Authors: Hemjyoti Das, Minh Nhat Vu, Christian Ott

    Abstract: In this work, we present a novel approach to augment a model-based control method with a reinforcement learning (RL) agent and demonstrate a swing-up maneuver with a suspended aerial manipulation platform. These platforms are targeted towards a wide range of applications on construction sites involving cranes, with swing-up maneuvers allowing it to perch at a given location, inaccessible with pure…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 6 pages, 10 figures

  8. arXiv:2506.12095  [pdf, ps, other]

    cs.RO

    DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion

    Authors: Khang Nguyen, An T. Le, Jan Peters, Minh Nhat Vu

    Abstract: Achieving robust robot learning for humanoid locomotion is a fundamental challenge in model-based reinforcement learning (MBRL), where environmental stochasticity and randomness can hinder efficient exploration and learning stability. The environmental, so-called aleatoric, uncertainty can be amplified in high-dimensional action spaces with complex contact dynamics, and further entangled with epis…

    Submitted 12 June, 2025; originally announced June 2025.

  9. arXiv:2505.19080  [pdf, ps, other]

    cs.RO

    ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning

    Authors: Tuan Van Vo, Tan Quang Nguyen, Khang Minh Nguyen, Duy Ho Minh Nguyen, Minh Nhat Vu

    Abstract: Vision-Language-Action (VLA) models have gained much attention from the research community thanks to their strength in translating multimodal observations with linguistic instructions into robotic actions. Despite their recent advancements, VLAs often overlook the explicit reasoning and only learn the functional input-action mappings, omitting these crucial logical steps for interpretability and g…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 10 pages

  10. arXiv:2505.13549  [pdf, ps, other]

    cs.RO

    TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion

    Authors: Khang Nguyen, Khai Nguyen, An T. Le, Jan Peters, Manfred Huber, Ngo Anh Vien, Minh Nhat Vu

    Abstract: Robot learning in high-dimensional control settings, such as humanoid locomotion, presents persistent challenges for reinforcement learning (RL) algorithms due to unstable dynamics, complex contact interactions, and sensitivity to distributional shifts during training. Model-based methods, e.g., Temporal-Difference Model Predictive Control (TD-MPC), have demonstrated promising results by…

    Submitted 19 May, 2025; originally announced May 2025.

  11. arXiv:2505.11741  [pdf, ps, other]

    cs.AI cs.CR

    Diverging Towards Hallucination: Detection of Failures in Vision-Language Models via Multi-token Aggregation

    Authors: Geigh Zollicoffer, Minh Vu, Manish Bhattarai

    Abstract: Vision-language models (VLMs) now rival human performance on many multimodal tasks, yet they still hallucinate objects or generate unsafe text. Current hallucination detectors, e.g., single-token linear probing (SLP) and P(True), typically analyze only the logit of the first generated token or just its highest scoring component overlooking richer signals embedded within earlier token distributions…

    Submitted 16 May, 2025; originally announced May 2025.

  12. arXiv:2505.01059  [pdf, ps, other]

    cs.RO cs.AI cs.LG eess.SY

    Model Tensor Planning

    Authors: An T. Le, Khai Nguyen, Minh Nhat Vu, João Carvalho, Jan Peters

    Abstract: Sampling-based model predictive control (MPC) offers strong performance in nonlinear and contact-rich robotic tasks, yet often suffers from poor exploration due to locally greedy sampling schemes. We propose Model Tensor Planning (MTP), a novel sampling-based MPC framework that introduces high-entropy control trajectory generation through structured tensor sampling. By sampling over randomi…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 22 pages, 9 figures

  13. arXiv:2504.21190  [pdf, other]

    cs.LG cs.AI

    TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts

    Authors: Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai

    Abstract: We propose Tensor-Trained Low-Rank Adaptation Mixture of Experts (TT-LoRA MoE), a novel computational framework integrating Parameter-Efficient Fine-Tuning (PEFT) with sparse MoE routing to address scalability challenges in large model deployments. Unlike traditional MoE approaches, which face substantial computational overhead as expert counts grow, TT-LoRA MoE decomposes training into two distin…

    Submitted 29 April, 2025; originally announced April 2025.

  14. arXiv:2504.16010  [pdf, other]

    cs.MA cs.LG econ.GN

    The Formation of Production Networks: How Supply Chains Arise from Simple Learning with Minimal Information

    Authors: Tuong Manh Vu, Ernesto Carrella, Robert Axtell, Omar A. Guerrero

    Abstract: We develop a model where firms determine the price at which they sell their differentiable goods, the volume that they produce, and the inputs (types and amounts) that they purchase from other firms. A steady-state production network emerges endogenously without resorting to assumptions such as equilibrium or perfect knowledge about production technologies. Through a simple version of reinforcemen…

    Submitted 22 April, 2025; originally announced April 2025.

  15. arXiv:2504.00339  [pdf, other]

    cs.CL cs.AI

    VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation

    Authors: Hoang Hai Phan, Nguyen Duc Minh Vu, Nam Dang Phuong

    Abstract: Neural Machine Translation (NMT) driven by Transformer architectures has advanced significantly, yet faces challenges with low-resource language pairs like Vietnamese-Japanese (Vi-Ja). Issues include sparse parallel data and handling linguistic/cultural nuances. Recent progress in Large Language Models (LLMs) with strong reasoning, often refined via Reinforcement Learning (RL), enables high-qualit…

    Submitted 31 March, 2025; originally announced April 2025.

  16. arXiv:2503.14160  [pdf, other]

    cs.RO

    GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments

    Authors: Minh Nhat Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck

    Abstract: Autonomous large-scale machine operations require fast, efficient, and collision-free motion planning while addressing unique challenges such as hydraulic actuation limits and underactuated joint dynamics. This paper presents a novel two-step motion planning framework designed for an underactuated forestry crane. The first step employs GPU-accelerated stochastic optimization to rapidly compute a g…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 7 pages

  17. arXiv:2503.06796  [pdf, other]

    cs.RO

    RoboDesign1M: A Large-scale Dataset for Robot Design Understanding

    Authors: Tri Le, Toan Nguyen, Quang Tran, Quang Nguyen, Baoru Huang, Hoan Nguyen, Minh Nhat Vu, Tung D. Ta, Anh Nguyen

    Abstract: Robot design is a complex and time-consuming process that requires specialized expertise. Gaining a deeper understanding of robot design data can enable various applications, including automated design generation, retrieving example designs from text, and developing AI-powered design assistants. While recent advancements in foundation models present promising approaches to addressing these challen…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 8 pages

  18. arXiv:2503.06135  [pdf, other]

    cs.RO

    FlowMP: Learning Motion Fields for Robot Planning with Conditional Flow Matching

    Authors: Khang Nguyen, An T. Le, Tien Pham, Manfred Huber, Jan Peters, Minh Nhat Vu

    Abstract: Prior flow matching methods in robotics have primarily learned velocity fields to morph one distribution of trajectories into another. In this work, we extend flow matching to capture second-order trajectory dynamics, incorporating acceleration effects either explicitly in the model or implicitly through the learning objective. Unlike diffusion models, which rely on a noisy forward process and ite…

    Submitted 8 March, 2025; originally announced March 2025.

  19. arXiv:2503.01206  [pdf, other]

    cs.RO

    Action Tokenizer Matters in In-Context Imitation Learning

    Authors: An Dinh Vuong, Minh Nhat Vu, Dong An, Ian Reid

    Abstract: In-context imitation learning (ICIL) is a new paradigm that enables robots to generalize from demonstrations to unseen tasks without retraining. A well-structured action representation is the key to capturing demonstration information effectively, yet action tokenizer (the process of discretizing and encoding actions) remains largely unexplored in ICIL. In this work, we first systematically evalua…

    Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 7 pages, 6 figures

  20. arXiv:2503.00962  [pdf, other]

    cs.CV cs.AI cs.LG

    Using Synthetic Images to Augment Small Medical Image Datasets

    Authors: Minh H. Vu, Lorenzo Tronchin, Tufve Nyholm, Tommy Löfstedt

    Abstract: Recent years have witnessed a growing academic and industrial interest in deep learning (DL) for medical imaging. To perform well, DL models require very large labeled datasets. However, most medical imaging datasets are small, with a limited number of annotated samples. The reason they are small is usually because delineating medical images is time-consuming and demanding for oncologists. There a…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 14 pages

  21. arXiv:2502.17909  [pdf, other]

    cs.HC cs.AI

    FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation

    Authors: Minh Duc Vu, Jieshan Chen, Zhenchang Xing, Qinghua Lu, Xiwei Xu, Qian Fu

    Abstract: With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 11 pages, 6 figures

    ACM Class: I.2; H.4

  22. arXiv:2502.01304  [pdf, other]

    cs.RO

    Towards Autonomous Wood-Log Grasping with a Forestry Crane: Simulator and Benchmarking

    Authors: Minh Nhat Vu, Alexander Wachter, Gerald Ebmer, Marc-Philip Ecker, Tobias Glück, Anh Nguyen, Wolfgang Kemmetmueller, Andreas Kugi

    Abstract: Forestry machines operated in forest production environments face challenges when performing manipulation tasks, especially regarding the complicated dynamics of underactuated crane systems and the heavy weight of logs to be grasped. This study investigates the feasibility of using reinforcement learning for forestry crane manipulators in grasping and lifting heavy wood logs autonomously. We first…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 7 pages. Accepted to ICRA 2025

  23. arXiv:2501.18006  [pdf, other]

    cs.LG cs.AI cs.CR

    Topological Signatures of Adversaries in Multimodal Alignments

    Authors: Minh Vu, Geigh Zollicoffer, Huy Mai, Ben Nebgen, Boian Alexandrov, Manish Bhattarai

    Abstract: Multimodal Machine Learning systems, particularly those aligning text and image data like CLIP/BLIP models, have become increasingly prevalent, yet remain susceptible to adversarial attacks. While substantial research has addressed adversarial robustness in unimodal contexts, defense strategies for multimodal systems are underexplored. This work investigates the topological signatures that arise b…

    Submitted 29 January, 2025; originally announced January 2025.

  24. arXiv:2501.17968  [pdf, other]

    cs.RO

    Online Trajectory Replanner for Dynamically Grasping Irregular Objects

    Authors: Minh Nhat Vu, Florian Grander, Anh Nguyen

    Abstract: This paper presents a new trajectory replanner for grasping irregular objects. Unlike conventional grasping tasks where the object's geometry is assumed simple, we aim to achieve a "dynamic grasp" of the irregular objects, which requires continuous adjustment during the grasping process. To effectively handle irregular objects, we propose a trajectory optimization framework that comprises two phas…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: 7 pages. Accepted to ICRA 2025

  25. arXiv:2501.16992  [pdf, other]

    cs.CV

    FedEFM: Federated Endovascular Foundation Model with Unseen Data

    Authors: Tuong Do, Nghia Vu, Tudor Jianu, Baoru Huang, Minh Vu, Jionglong Su, Erman Tjiputra, Quang D. Tran, Te-Chuan Chiu, Anh Nguyen

    Abstract: In endovascular surgery, the precise identification of catheters and guidewires in X-ray images is essential for reducing intervention risks. However, accurately segmenting catheter and guidewire structures is challenging due to the limited availability of labeled data. Foundation models offer a promising solution by enabling the collection of similar domain data to train models whose weights can…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 8 pages. Accepted to ICRA 2025

  26. arXiv:2501.16306  [pdf, other]

    eess.SP cs.LG cs.NI

    Graph Neural Network Based Hybrid Beamforming Design in Wideband Terahertz MIMO-OFDM Systems

    Authors: Beier Li, Mai Vu

    Abstract: 6G wireless technology is projected to adopt higher and wider frequency bands, enabled by highly directional beamforming. However, the vast bandwidths available also make the impact of beam squint in massive multiple input and multiple output (MIMO) systems non-negligible. Traditional approaches such as adding a true-time-delay line (TTD) on each antenna are costly due to the massive antenna array…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 6 pages, 7 figures. This conference paper was published in the 2024 IEEE International Symposium on Phased Array Systems and Technology

  27. arXiv:2501.00004  [pdf, other]

    cs.IR cs.AI cs.CL

    NewsHomepages: Homepage Layouts Capture Information Prioritization Decisions

    Authors: Ben Welsh, Naitian Zhou, Arda Kaz, Michael Vu, Alexander Spangher

    Abstract: Information prioritization plays an important role in how humans perceive and understand the world. Homepage layouts serve as a tangible proxy for this prioritization. In this work, we present NewsHomepages, a large dataset of over 3,000 new website homepages (including local, national and topic-specific outlets) captured twice daily over a three-year period. We develop models to perform pairwise…

    Submitted 20 November, 2024; originally announced January 2025.

  28. arXiv:2412.19835  [pdf, other]

    eess.SP cs.LG cs.MA cs.NI

    Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks

    Authors: Alireza Alizadeh, Byungju Lim, Mai Vu

    Abstract: As next generation cellular networks become denser, associating users with the optimal base stations at each time while ensuring no base station is overloaded becomes critical for achieving stable and high network performance. We propose multi-agent online Q-learning (QL) algorithms for performing real-time load balancing user association and handover in dense cellular networks. The load balancing…

    Submitted 22 December, 2024; originally announced December 2024.

  29. arXiv:2412.05159  [pdf, other]

    cs.AI cs.SE

    Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation

    Authors: Manish Bhattarai, Minh Vu, Javier E. Santos, Ismael Boureima, Daniel O'Malley

    Abstract: We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment into a Retrieval-Augmented Generation (RAG) framework. Unlike conventional retrieval approaches that utilize generic embeddings agnostic to the downstream task, our strategy aligns the retrieval model directly with the objective of maximizing translation quali…

    Submitted 6 December, 2024; originally announced December 2024.

  30. arXiv:2412.04661  [pdf, other]

    cs.IR cs.AI

    HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

    Authors: Manish Bhattarai, Ryan Barron, Maksim Eren, Minh Vu, Vesselin Grantcharov, Ismael Boureima, Valentin Stanev, Cynthia Matuszek, Vladimir Valtchinov, Kim Rasmussen, Boian Alexandrov

    Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to sp…

    Submitted 5 December, 2024; originally announced December 2024.

  31. arXiv:2412.02886  [pdf, other]

    cs.CV

    Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty

    Authors: Roman Colman, Minh Vu, Manish Bhattarai, Martin Ma, Hari Viswanathan, Daniel O'Malley, Javier E. Santos

    Abstract: For decades, corporations and governments have relied on scanned documents to record vast amounts of information. However, extracting this information is a slow and tedious process due to the sheer volume and complexity of these records. The rise of Vision Language Models (VLMs) presents a way to efficiently and accurately extract the information out of these documents. The current automated workf…

    Submitted 13 December, 2024; v1 submitted 3 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

    ACM Class: F.2.2; I.2.7

  32. arXiv:2411.18578  [pdf, other]

    cs.LG

    Pruning Deep Convolutional Neural Network Using Conditional Mutual Information

    Authors: Tien Vu-Van, Dat Du Thanh, Nguyen Ho, Mai Vu

    Abstract: Convolutional Neural Networks (CNNs) achieve high performance in image classification tasks but are challenging to deploy on resource-limited hardware due to their large model sizes. To address this issue, we leverage Mutual Information, a metric that provides valuable insights into how deep learning models retain and process information through measuring the shared information between input featu…

    Submitted 27 November, 2024; originally announced November 2024.

  33. arXiv:2409.18351  [pdf, other]

    cs.SE cs.AI cs.CR cs.IR

    Tracking Software Security Topics

    Authors: Phong Minh Vu, Tung Thanh Nguyen

    Abstract: Software security incidents occur everyday and thousands of software security reports are announced each month. Thus, it is difficult for software security researchers, engineers, and other stakeholders to follow software security topics of their interests in real-time. In this paper, we propose, SOSK, a novel tool for this problem. SOSK allows a user to import a collection of software security re…

    Submitted 26 September, 2024; originally announced September 2024.

  34. arXiv:2409.17727  [pdf, other]

    cs.RO cs.CV

    Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications

    Authors: Nghia Nguyen, Minh Nhat Vu, Tung D. Ta, Baoru Huang, Thieu Vo, Ngan Le, Anh Nguyen

    Abstract: Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding. However, CLIP was trained solely on static images paired with text prompts and has not yet been fully adapted for robotic tasks involv…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 7 pages

  35. arXiv:2409.14403  [pdf, other]

    cs.RO cs.CV

    GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning

    Authors: Huy Hoang Nguyen, An Vuong, Anh Nguyen, Ian Reid, Minh Nhat Vu

    Abstract: Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these chall…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: 8 pages. Project page: https://airvlab.github.io/grasp-anything/

  36. arXiv:2409.13524  [pdf, other]

    cs.CR cs.AI

    Contextualized AI for Cyber Defense: An Automated Survey using LLMs

    Authors: Christoforus Yoga Haryanto, Anne Maria Elvira, Trung Duc Nguyen, Minh Hieu Vu, Yoshiano Hartanto, Emily Lomempow, Arathi Arakala

    Abstract: This paper surveys the potential of contextualized AI in enhancing cyber defense capabilities, revealing significant research growth from 2015 to 2024. We identify a focus on robustness, reliability, and integration methods, while noting gaps in organizational trust and governance frameworks. Our study employs two LLM-assisted literature survey methodologies: (A) ChatGPT 4 for exploration, and (B)…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 8 pages, 2 figures, 4 tables, accepted into 17th International Conference on Security of Information and Networks (SINCONF 2024)

  37. arXiv:2409.08885  [pdf, other]

    cs.CV

    Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing

    Authors: Minh-Duc Vu, Zuheng Ming, Fangchen Feng, Bissmella Bahaduri, Anissa Mokraoui

    Abstract: Object detection in remote sensing imagery plays a vital role in various Earth observation applications. However, unlike object detection in natural scene images, this task is particularly challenging due to the abundance of small, often barely visible objects across diverse terrains. To address these challenges, multimodal learning can be used to integrate features from different data modalities,…

    Submitted 13 September, 2024; originally announced September 2024.

  38. arXiv:2409.08255  [pdf, other]

    cs.LG cs.AI cs.CR

    LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

    Authors: Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai

    Abstract: This work presents an information-theoretic examination of diffusion-based purification methods, the state-of-the-art adversarial defenses that utilize diffusion models to remove malicious perturbations in adversarial examples. By theoretically characterizing the inherent purification errors associated with the Markov-based diffusion purifications, we introduce LoRID, a novel Low-Rank Iterative Di…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: LA-UR-24-28834

  39. arXiv:2409.04374  [pdf, other]

    cs.LG

    Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization

    Authors: Minh Vu, Konstantinos Slavakis

    Abstract: This paper establishes a novel role for Gaussian-mixture models (GMMs) as functional approximators of Q-function losses in reinforcement learning (RL). Unlike the existing RL literature, where GMMs play their typical role as estimates of probability density functions, GMMs approximate here Q-function losses. The new Q-function approximators, coined GMM-QFs, are incorporated in Bellman residuals to…

    Submitted 10 September, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

  40. arXiv:2408.13126  [pdf, other]

    cs.CV

    CathAction: A Benchmark for Endovascular Intervention Understanding

    Authors: Baoru Huang, Tuan Vo, Chayun Kongtongvattana, Giulio Dagnino, Dennis Kundrat, Wenqiang Chi, Mohamed Abdelaziz, Trevor Kwok, Tudor Jianu, Tuong Do, Hieu Le, Minh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Jianyang Xie, Yanda Meng, Binod Bhattarai, Zhaorui Tan, Hongbin Liu, Hong Seng Gan, Wei Wang, Xi Yang, Qiufeng Wang, Jionglong Su, et al. (13 additional authors not shown)

    Abstract: Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale datase…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages. Webpage: https://airvlab.github.io/cathaction/

  41. arXiv:2408.03909  [pdf, other]

    cs.LG cs.AI cs.CR

    LaFA: Latent Feature Attacks on Non-negative Matrix Factorization

    Authors: Minh Vu, Ben Nebgen, Erik Skau, Geigh Zollicoffer, Juan Castorena, Kim Rasmussen, Boian Alexandrov, Manish Bhattarai

    Abstract: As Machine Learning (ML) applications rapidly grow, concerns about adversarial attacks compromising their reliability have gained significant attention. One unsupervised ML method known for its resilience to such attacks is Non-negative Matrix Factorization (NMF), an algorithm that decomposes input data into lower-dimensional latent features. However, the introduction of powerful computational too…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: LA-UR-24-26951

  42. arXiv:2407.19877  [pdf, other]

    cs.RO cs.CV

    Language-driven Grasp Detection with Mask-guided Attention

    Authors: Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains a challenging task and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS 2024

  43. arXiv:2407.17967  [pdf, other]

    cs.RO cs.CV

    Lightweight Language-driven Grasp Detection using Conditional Consistency Model

    Authors: Nghia Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode v…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS 2024

  44. arXiv:2407.16821  [pdf, other]

    cs.RO

    Fin ray-inspired, Origami, Small Scale Actuator for Fin Manipulation in Aquatic Bioinspired Robots

    Authors: Minh Vu, Revathy Ravuri, Angus Muir, Charles Mackie, Andrew Weightman, Simon Watson, Tim J. Echtermeyer

    Abstract: Fish locomotion is enabled by fin rays, actively deformable bony rods that manipulate the fin to facilitate complex interaction with surrounding water and enable propulsion. Replicating the performance and kinematics of the biological fin ray from an engineering perspective is a challenging task and has not been realised thus far. This work introduces a prototype of a fin ray-inspired origami el…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 33 pages, 8 figures

  45. arXiv:2407.13842  [pdf, other]

    cs.RO cs.CV

    Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

    Authors: Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: 6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection i…

    Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  46. arXiv:2407.01110  [pdf]

    cs.CR cs.AI cs.CY cs.LG

    SecGenAI: Enhancing Security of Cloud-based Generative AI Applications within Australian Critical Technologies of National Interest

    Authors: Christoforus Yoga Haryanto, Minh Hieu Vu, Trung Duc Nguyen, Emily Lomempow, Yulia Nurliana, Sona Taheri

    Abstract: The rapid advancement of Generative AI (GenAI) technologies offers transformative opportunities within Australia's critical technologies of national interest while introducing unique security challenges. This paper presents SecGenAI, a comprehensive security framework for cloud-based GenAI applications, with a focus on Retrieval-Augmented Generation (RAG) systems. SecGenAI addresses functional, in…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures, 9 tables, submitted to the 2024 11th International Conference on Soft Computing & Machine Intelligence (ISCMI 2024)

  47. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks range from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks o…

    Submitted 1 April, 2025; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR 2025 (Oral), built with love by the BigCode community :)

  48. arXiv:2406.09489  [pdf, other]

    cs.CV

    Language-driven Grasp Detection

    Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samp…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 19 pages. Accepted to CVPR24

  49. arXiv:2406.09039  [pdf, other]

    cs.RO

    Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning

    Authors: Huy Hoang Nguyen, Minh Nhat Vu, Florian Beck, Gerald Ebmer, Anh Nguyen, Andreas Kugi

    Abstract: Integrating a vision module into a closed-loop control system to achieve seamless robot movement in a manipulation task is challenging due to the inconsistent update rates of the modules involved. This task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects…

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  50. arXiv:2406.08572  [pdf, other]

    cs.CV

    LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions

    Authors: Nhat Hoang-Xuan, Minh Vu, My T. Thai

    Abstract: Providing textual concept-based explanations for neurons in deep neural networks (DNNs) is of importance in understanding how a DNN model works. Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts, thus limiting possible explanations to what the user expects, especially in discovering new concepts. Furthermore, defining the set of concep…

    Submitted 12 June, 2024; originally announced June 2024.