
Showing 1–46 of 46 results for author: Hui, T

Searching in archive cs.
  1. arXiv:2508.09818

    cs.CV

    ViMoNet: A Multimodal Vision-Language Framework for Human Behavior Understanding from Motion and Video

    Authors: Rajan Das Gupta, Md Yeasin Rahat, Nafiz Fahad, Abir Ahmed, Liew Tze Hui

    Abstract: This study investigates how large language models (LLMs) can be used to understand human behavior using motion and video data. We think that mixing both types is essential to completely capture the nuanced movements and meanings of human actions, in contrast to recent models that simply concentrate on motion data or films. To address this, we provide ViMoNet, a straightforward yet effective framew…

    Submitted 16 November, 2025; v1 submitted 13 August, 2025; originally announced August 2025.

    Comments: This is the preprint version of the manuscript. It is currently being prepared for submission to an academic conference

  2. arXiv:2506.23623

    cs.CV

    Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

    Authors: Shaofei Huang, Rui Ling, Tianrui Hui, Hongyu Li, Xu Zhou, Shifeng Zhang, Si Liu, Richang Hong, Meng Wang

    Abstract: Audio-Visual Segmentation (AVS) aims to segment sound-producing objects in video frames based on the associated audio signal. Prevailing AVS methods typically adopt an audio-centric Transformer architecture, where object queries are derived from audio features. However, audio-centric Transformers suffer from two limitations: perception ambiguity caused by the mixed nature of audio, and weakened de…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by CVPR 2025; Code: https://github.com/spyflying/VCT_AVS; Models: https://huggingface.co/nowherespyfly/VCT_AVS

  3. arXiv:2505.13990

    cs.CL

    DecIF: Improving Instruction-Following through Meta-Decomposition

    Authors: Tingfeng Hui, Pengyu Zhu, Bowen Ping, Ling Tang, Guanting Dong, Yaqi Zhang, Sen Su

    Abstract: Instruction-following has emerged as a crucial capability for large language models (LLMs). However, existing approaches often rely on pre-existing documents or external resources to synthesize instruction-following data, which limits their flexibility and generalizability. In this paper, we introduce DecIF, a fully autonomous, meta-decomposition guided framework that generates diverse and high-qu…

    Submitted 10 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: We release the source code and SFT data in this version

  4. arXiv:2501.08282

    cs.CV

    LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

    Authors: Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have shown promising results, yet existing approaches struggle to effectively handle both temporal and spatial localization simultaneously. This challenge stems from two key issues: first, incorporating spatial-temporal localization introduces a vast number of coordinate combinations, complicating the alignment of linguistic and visua…

    Submitted 1 June, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: Accepted by CVPR2025

  5. arXiv:2412.11231

    cs.CL

    Smaller Language Models Are Better Instruction Evolvers

    Authors: Tingfeng Hui, Lulu Zhao, Guanting Dong, Yaqi Zhang, Hua Zhou, Sen Su

    Abstract: Instruction tuning has been widely used to unleash the complete potential of large language models. Notably, complex and diverse instructions are of significant importance as they can effectively align models with various downstream tasks. However, current approaches to constructing large-scale instructions predominantly favour powerful models such as GPT-4 or those with over 70 billion parameters…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: Work in progress

  6. arXiv:2410.22816

    cs.RO

    Advancing Manipulation Capabilities of a UAV Featuring Dynamic Center-of-Mass Displacement

    Authors: Tong Hui, Matteo Fumagalli

    Abstract: As aerial robots gain traction in industrial applications, there is growing interest in enhancing their physical interaction capabilities. Pushing tasks performed by aerial manipulators have been successfully demonstrated in contact-based inspections. However, more complex industrial applications require these systems to support higher-DoF (Degree of Freedom) manipulators and generate larger force…

    Submitted 11 April, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.01110, accepted to the 2025 International Conference on Unmanned Aircraft Systems (ICUAS)

  7. arXiv:2410.01610

    cs.CL cs.AI cs.LG

    Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging

    Authors: Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Hua Wu, Sen Su

    Abstract: Mixture-of-Experts (MoE) shines brightly in large language models (LLMs) and demonstrates outstanding performance in plentiful natural language processing tasks. However, existing methods transforming LLMs from dense to MoE face significant data requirements and typically rely on large-scale post-training. In this paper, we propose Upcycling Instruction Tuning (UpIT), a data-efficient approach for…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: work in progress

  8. arXiv:2409.08251

    cs.CV

    Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

    Authors: Hongyu Li, Tianrui Hui, Zihan Ding, Jing Zhang, Bin Ma, Xiaoming Wei, Jizhong Han, Si Liu

    Abstract: Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment, requires a panoptic segmentation of referred objects given a narrative caption. Previous discriminative methods achieve only weak or coarse-grained alignment by panoptic segmentation pretraining or CLIP model adaptation. Given the recent progress of text-to-image Diffusion models, several works have shown t…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM MM 2024

  9. arXiv:2408.15876

    cs.CV

    Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

    Authors: Shaofei Huang, Rui Ling, Hongyu Li, Tianrui Hui, Zongheng Tang, Xiaoming Wei, Jizhong Han, Si Liu

    Abstract: In this paper, we propose an Audio-Language-Referenced SAM 2 (AL-Ref-SAM 2) pipeline to explore the training-free paradigm for audio and language-referenced video object segmentation, namely AVS and RVOS tasks. The intuitive solution leverages GroundingDINO to identify the target object from a single frame and SAM 2 to segment the identified object throughout the video, which is less robust to spa…

    Submitted 23 December, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by AAAI 2025

  10. arXiv:2408.15008

    cs.RO

    AEROBULL: A Center-of-Mass Displacing Aerial Vehicle Enabling Efficient High-Force Interaction

    Authors: Tong Hui, Esteban Zamora, Simone D'Angelo, Stefan Rucareanu, Matteo Fumagalli

    Abstract: In various industrial sectors, inspection and maintenance tasks using UAV (Unmanned Aerial Vehicle) require substantial force application to ensure effective adherence and stable contact, posing significant challenges to existing solutions. This paper addresses these industrial needs by introducing a novel lightweight aerial platform (3.12kg) designed to exert high pushing forces on non-horizontal…

    Submitted 27 August, 2024; originally announced August 2024.

  11. arXiv:2407.16129

    cs.CV cs.AI

    FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

    Authors: Weiying Xie, Yusi Zhang, Tianlin Hui, Jiaqing Zhang, Jie Lei, Yunsong Li

    Abstract: Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Lo…

    Submitted 22 July, 2024; originally announced July 2024.

  12. arXiv:2405.17844

    cs.RO eess.SY

    Enhancing Sliding Performance with Aerial Robots: Analysis and Solutions for Non-Actuated Multi-Wheel Configurations

    Authors: Tong Hui, Jefferson Ghielmini, Dimitrios Papageorgiou, Marco Tognon, Roland Siegwart, Matteo Fumagalli

    Abstract: Sliding tasks performed by aerial robots are valuable for inspection and simple maintenance tasks at height, such as non-destructive testing and painting. Although various end-effector designs have been used for such tasks, non-actuated wheel configurations are more frequently applied thanks to their rolling capability for sliding motion, mechanical simplicity, and lightweight design. Moreover, a…

    Submitted 10 September, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  13. arXiv:2404.18466

    cs.CL

    HFT: Half Fine-Tuning for Large Language Models

    Authors: Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu

    Abstract: Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities, enabling LLMs to follow natural language instructions or align with human preferences. However, it carries the risk of catastrophic forgetting during sequential training, the parametric knowledge or the ability learned in previous stages may be overwhelmed by incoming train…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Work in progress

  14. arXiv:2404.01110

    cs.RO eess.SY

    Dynamic Center-of-Mass Displacement in Aerial Manipulation: An Innovative Platform Design

    Authors: Tong Hui, Stefan Rucareanu, Esteban Zamora, Simone D'Angelo, Haotian Liu, Matteo Fumagalli

    Abstract: Aerial manipulators are increasingly used in contact-based industrial applications, where tasks like drilling and pushing require platforms to exert significant forces in multiple directions. To enhance force generation capabilities, various approaches, such as thrust vectoring and perching, have been explored. In this article, we introduce a novel approach by investigating the impact of varied Co…

    Submitted 13 September, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  15. arXiv:2402.17434

    cs.RO eess.SY

    Passive Aligning Physical Interaction of Fully-Actuated Aerial Vehicles for Pushing Tasks

    Authors: Tong Hui, Eugenio Cuniato, Michael Pantic, Marco Tognon, Matteo Fumagalli, Roland Siegwart

    Abstract: Recently, the utilization of aerial manipulators for performing pushing tasks in non-destructive testing (NDT) applications has seen significant growth. Such operations entail physical interactions between the aerial robotic system and the environment. End-effectors with multiple contact points are often used for placing NDT sensors in contact with a surface to be inspected. Aligning the NDT senso…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

  16. arXiv:2402.15243

    cs.RO eess.SY

    Safety-Conscious Pushing on Diverse Oriented Surfaces with Underactuated Aerial Vehicles

    Authors: Tong Hui, Manuel J. Fernandez Gonzalez, Matteo Fumagalli

    Abstract: Pushing tasks performed by aerial manipulators can be used for contact-based industrial inspections. Underactuated aerial vehicles are widely employed in aerial manipulation due to their widespread availability and relatively low cost. Industrial infrastructures often consist of diverse oriented work surfaces. When interacting with such surfaces, the coupled gravity compensation and interaction fo…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

  17. arXiv:2402.14494

    cs.CL

    Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task

    Authors: Jinxu Zhao, Guanting Dong, Yueyan Qiu, Tingfeng Hui, Xiaoshuai Song, Daichi Guo, Weiran Xu

    Abstract: In a realistic dialogue system, the input information from users is often subject to various types of input perturbations, which affects the slot-filling task. Although rule-based data augmentation methods have achieved satisfactory results, they fail to exhibit the desired generalization when faced with unknown noise disturbances. In this study, we address the challenges posed by input perturbati…

    Submitted 6 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024

  18. arXiv:2312.01663

    cs.CV cs.AI

    Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

    Authors: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

    Abstract: In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining desired editing results conformed with the editing prompt is nontrivial since there exist two significant challenges, including accurate editing of only foreground regions and multi-view consistency…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 14 pages, 13 figures, project website: https://customnerf.github.io/

  19. Versatile Airborne Ultrasonic NDT Technologies via Active Omni-Sliding with Over-Actuated Aerial Vehicles

    Authors: Tong Hui, Florian Braun, Nicolas Scheidt, Marius Fehr, Matteo Fumagalli

    Abstract: This paper presents the utilization of advanced methodologies in aerial manipulation to address meaningful industrial applications and develop versatile ultrasonic Non-Destructive Testing (NDT) technologies with aerial robots. The primary objectives of this work are to enable multi-point measurements through sliding without re-approaching the work surface, and facilitate the representation of mate…

    Submitted 8 November, 2023; originally announced November 2023.

    Journal ref: Experimental Robotics. ISER 2023. Springer Proceedings in Advanced Robotics, vol 30

  20. arXiv:2311.01091

    cs.CV

    Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

    Authors: Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

    Abstract: Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmen…

    Submitted 10 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted by IJCAI 2023. Since the PNG benchmark adopts a different data partition manner from ours, we update the experimental results on the things/stuff/singulars/plurals subsets based on the PNG's code

  21. arXiv:2310.10169

    cs.CL cs.AI cs.IR cs.LG

    DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task

    Authors: Guanting Dong, Tingfeng Hui, Zhuoma GongQue, Jinxu Zhao, Daichi Guo, Gang Zhao, Keqing He, Weiran Xu

    Abstract: Recently, prompt-based generative frameworks have shown impressive capabilities in sequence labeling tasks. However, in practical dialogue scenarios, relying solely on simplistic templates and traditional corpora presents a challenge for these methods in generalizing to unknown input perturbations. To address this gap, we propose a multi-task demonstration based generative framework for noisy slot…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023 (Short Paper)

  22. arXiv:2310.06504

    cs.CL cs.AI cs.LG

    Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

    Authors: Guanting Dong, Jinxu Zhao, Tingfeng Hui, Daichi Guo, Wenlong Wan, Boqi Feng, Yueyan Qiu, Zhuoma Gongque, Keqing He, Zechen Wang, Weiran Xu

    Abstract: With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly-used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, w…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at NLPCC 2023 (Oral Presentation)

  23. Information Leakage from Data Updates in Machine Learning Models

    Authors: Tian Hui, Farhad Farokhi, Olga Ohrimenko

    Abstract: In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning mode…

    Submitted 19 September, 2023; originally announced September 2023.

    Journal ref: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec '23), November 30, 2023, Copenhagen, Denmark

  24. arXiv:2308.14533

    cs.CL

    A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NER

    Authors: Guanting Dong, Zechen Wang, Jinxu Zhao, Gang Zhao, Daichi Guo, Dayuan Fu, Tingfeng Hui, Chen Zeng, Keqing He, Xuefeng Li, Liwen Wang, Xinyue Cui, Weiran Xu

    Abstract: The objective of few-shot named entity recognition is to identify named entities with limited labeled instances. Previous works have primarily focused on optimizing the traditional token-wise classification framework, while neglecting the exploration of information based on NER data characteristics. To address this issue, we propose a Multi-Task Semantic Decomposition Framework via Joint Task-spec…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by CIKM 2023 (Oral Presentation)

  25. Static-Equilibrium Oriented Interaction Force Modeling and Control of Aerial Manipulation with Uni-Directional Thrust Multirotors

    Authors: Tong Hui, Matteo Fumagalli

    Abstract: This paper presents a static-equilibrium oriented interaction force modeling and control approach of aerial manipulation employing uni-directional thrust (UDT) multirotors interacting with variously defined environments. First, a simplified system model for a quadrotor-based aerial manipulator is introduced considering parameterized work surfaces under assumptions, and then a range of meaningful m…

    Submitted 21 June, 2023; originally announced June 2023.

    Journal ref: 2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)

  26. arXiv:2303.04456

    cs.CV

    RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes

    Authors: Tak-Wai Hui

    Abstract: Unsupervised methods have showed promising results on monocular depth estimation. However, the training data must be captured in scenes without moving objects. To push the envelope of accuracy, recent methods tend to increase their model parameters. In this paper, an unsupervised learning framework is proposed to jointly predict monocular depth and complete 3D motion including the motions of movin…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2022 (paper is updated)

  27. arXiv:2302.13610

    cs.CL

    A Prototypical Semantic Decoupling Method via Joint Contrastive Learning for Few-Shot Name Entity Recognition

    Authors: Guanting Dong, Zechen Wang, Liwen Wang, Daichi Guo, Dayuan Fu, Yuxiang Wu, Chen Zeng, Xuefeng Li, Tingfeng Hui, Keqing He, Xinyue Cui, Qixiang Gao, Weiran Xu

    Abstract: Few-shot named entity recognition (NER) aims at identifying named entities based on only few labeled instances. Most existing prototype-based sequence labeling models tend to memorize entity mentions which would be easily confused by close prototypes. In this paper, we proposed a Prototypical Semantic Decoupling method via joint Contrastive learning (PSDC) for few-shot NER. Specifically, we decoup…

    Submitted 12 April, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: we want to revise our paper and upload this article in few days

  28. arXiv:2302.13584

    cs.CL

    Revisit Out-Of-Vocabulary Problem for Slot Filling: A Unified Contrastive Frameword with Multi-level Data Augmentations

    Authors: Daichi Guo, Guanting Dong, Dayuan Fu, Yuxiang Wu, Chen Zeng, Tingfeng Hui, Liwen Wang, Xuefeng Li, Zechen Wang, Keqing He, Xinyue Cui, Weiran Xu

    Abstract: In real dialogue scenarios, the existing slot filling model, which tends to memorize entity patterns, has a significantly reduced generalization facing Out-of-Vocabulary (OOV) problems. To address this issue, we propose an OOV robust slot filling model based on multi-level data augmentations to solve the OOV problem from both word and slot perspectives. We present a unified contrastive learning fr…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: 5 pages, 3 figures, published to ICASSP 2023

  29. arXiv:2210.02991

    cs.CV cs.AI

    Cross-Modality Domain Adaptation for Freespace Detection: A Simple yet Effective Baseline

    Authors: Yuanbin Wang, Leyan Zhu, Shaofei Huang, Tianrui Hui, Xiaojie Li, Fei Wang, Si Liu

    Abstract: As one of the fundamental functions of autonomous driving system, freespace detection aims at classifying each pixel of the image captured by the camera as drivable or non-drivable. Current works of freespace detection heavily rely on large amount of densely labeled training data for accuracy and robustness, which is time-consuming and laborious to collect and annotate. To the best of our knowledg…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: ACM Multimedia 2022

  30. arXiv:2208.05647

    cs.CV cs.MM

    PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

    Authors: Zihan Ding, Zi-han Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu

    Abstract: Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image. The previous two-stage approach first extracts segmentation region proposals by an off-the-shelf panoptic segmentation model, then conducts coarse region-phrase matching to ground the candidate regions for each noun ph…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted by ACM MM 2022

  31. arXiv:2206.03789

    cs.CV cs.MM

    Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

    Authors: Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu

    Abstract: Referring video object segmentation aims to predict foreground labels for objects referred by natural language expressions in videos. Previous methods either depend on 3D ConvNets or incorporate additional 2D ConvNets as encoders to extract mixed spatial-temporal features. However, these methods suffer from spatial misalignment or false distractors due to delayed and implicit spatial-temporal inte…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted by CVPR 2022

  32. arXiv:2205.08301

    cs.RO

    Centroidal Aerodynamic Modeling and Control of Flying Multibody Robots

    Authors: Tong Hui, Antonello Paolino, Gabriele Nava, Giuseppe L'Erario, Fabio Di Natale, Fabio Bergonti, Francesco Braghin, Daniele Pucci

    Abstract: This paper presents a modeling and control framework for multibody flying robots subject to non-negligible aerodynamic forces acting on the centroidal dynamics. First, aerodynamic forces are calculated during robot flight in different operating conditions by means of Computational Fluid Dynamics (CFD) analysis. Then, analytical models of the aerodynamics coefficients are generated from the dataset…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 7 pages, 6 figures, to be published in IEEE ICRA 2022. Presentation video: https://youtu.be/WDb-OVlh5XA

  33. arXiv:2204.07335

    cs.CV

    A Keypoint-based Global Association Network for Lane Detection

    Authors: Jinsheng Wang, Yinchao Ma, Shaofei Huang, Tianrui Hui, Fei Wang, Chen Qian, Tianzhu Zhang

    Abstract: Lane detection is a challenging task that requires predicting complex topology shapes of lane lines and distinguishing different types of lanes simultaneously. Earlier works follow a top-down roadmap to regress predefined anchors into various shapes of lane lines, which lacks enough flexibility to fit complex shapes of lanes due to the fixed anchor shapes. Lately, some works propose to formulate l…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: Accepted by CVPR2022

  34. TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding

    Authors: Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu

    Abstract: Recently proposed fine-grained 3D visual grounding is an essential and challenging task, whose goal is to identify the 3D object referred by a natural language sentence from other distractive objects of the same category. Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making the model difficult to distinguish the referred object from dis…

    Submitted 11 August, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

    Comments: ACM MM2021

  35. arXiv:2105.07175

    cs.CV cs.MM

    Cross-Modal Progressive Comprehension for Referring Segmentation

    Authors: Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

    Abstract: Given a natural language expression and an image/video, the goal of referring segmentation is to produce the pixel-level masks of the entities described by the subject of the expression. Previous approaches tackle this problem by implicit feature interaction and fusion between visual and linguistic modalities in a one-stage manner. However, human tends to solve the referring problem in a progressi…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: Accepted by TPAMI 2021

  36. arXiv:2105.06818

    cs.CV cs.MM

    Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

    Authors: Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

    Abstract: Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames. Existing methods adopt 3D CNNs over the video clip as a general encoder to extract a mixed spatio-temporal feature for the target frame. Though 3D convolutions are amenable to recognizing which actor is performing the que…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted by CVPR 2021

  37. ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

    Authors: Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan

    Abstract: Learning to capture dependencies between spatial positions is essential to many visual tasks, especially the dense labeling problems like scene parsing. Existing methods can effectively capture long-range dependencies with self-attention mechanism while short ones by local convolution. However, there is still much gap between long-range and short-range dependencies, which largely reduces the model…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: Published at TIP

    Journal ref: IEEE Transactions on Image Processing, 2020, 29: 8251-8263

  38. arXiv:2010.00515

    cs.CV cs.CL

    Linguistic Structure Guided Context Modeling for Referring Image Segmentation

    Authors: Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han

    Abstract: Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either insufficiently or redundantly model the multimodal context. To tackle this problem, we propose a "gather-propagate-distribute" scheme to model multimodal context…

    Submitted 5 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted by ECCV 2020. Code is available at https://github.com/spyflying/LSCM-Refseg

  39. arXiv:2010.00514

    cs.CV cs.CL

    Referring Image Segmentation via Cross-Modal Progressive Comprehension

    Authors: Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

    Abstract: Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression. Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities, but usually fail to explore informative words of the expression to well align features from the two modalitie…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted by CVPR 2020. Code is available at https://github.com/spyflying/CMPC-Refseg

  40. arXiv:2007.09319

    cs.CV

    LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation

    Authors: Tak-Wai Hui, Chen Change Loy

    Abstract: Deep learning approaches have achieved great success in addressing the problem of optical flow estimation. The keys to success lie in the use of cost volume and coarse-to-fine flow inference. However, the matching problem becomes ill-posed when partially occluded or homogeneous regions exist in images. This causes a cost volume to contain outliers and affects the flow decoding from it. Besides, th…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted to ECCV 2020. Trained models and code package are available at https://github.com/twhui/LiteFlowNet3

  41. arXiv:2004.05304  [pdf, other]

    cs.CV

    Inter-Region Affinity Distillation for Road Marking Segmentation

    Authors: Yuenan Hou, Zheng Ma, Chunxiao Liu, Tak-Wai Hui, Chen Change Loy

    Abstract: We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network for the task of road marking segmentation. In this work, we explore a novel knowledge distillation (KD) approach that can transfer 'knowledge' on scene structure more effectively from a teacher to a student model. Our method is known as Inter-Region Affinity KD (IntRA-KD). It decomposes…

    Submitted 11 April, 2020; originally announced April 2020.

    Comments: 10 pages, 10 figures; accepted by CVPR 2020; code is available at https://github.com/cardwing/Codes-for-IntRA-KD

  42. arXiv:1911.07472  [pdf, other]

    cs.CV

    Learning to Synthesize Fashion Textures

    Authors: Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy

    Abstract: Existing unconditional generative models mainly focus on modeling general objects, such as faces and indoor scenes. Fashion textures, another important type of visual elements around us, have not been extensively studied. In this work, we propose an effective generative model for fashion textures and also comprehensively investigate the key components involved: internal representation, latent spac…

    Submitted 18 November, 2019; originally announced November 2019.

  43. arXiv:1903.07414  [pdf, other]

    cs.CV

    A Lightweight Optical Flow CNN - Revisiting Data Fidelity and Regularization

    Authors: Tak-Wai Hui, Xiaoou Tang, Chen Change Loy

    Abstract: Over four decades, the majority addresses the problem of optical flow estimation using variational methods. With the advance of machine learning, some recent works have attempted to address the problem using convolutional neural network (CNN) and have showed promising results. FlowNet2, the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow estimation. Our LiteFlowNet2 ou…

    Submitted 14 March, 2020; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: Accepted to TPAMI 2020. arXiv admin note: substantial text overlap with arXiv:1805.07036

  44. arXiv:1805.07036  [pdf, other]

    cs.CV

    LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation

    Authors: Tak-Wai Hui, Xiaoou Tang, Chen Change Loy

    Abstract: FlowNet2, the state-of-the-art convolutional neural network (CNN) for optical flow estimation, requires over 160M parameters to achieve accurate flow estimation. In this paper we present an alternative network that outperforms FlowNet2 on the challenging Sintel final pass and KITTI benchmarks, while being 30 times smaller in the model size and 1.36 times faster in the running speed. This is made p…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: Accepted to CVPR 2018 (spotlight). Project page: http://mmlab.ie.cuhk.edu.hk/projects/LiteFlowNet/

  45. Effect of Receive Spatial Diversity on the Degrees of Freedom Region in Multi-Cell Random Beamforming

    Authors: Hieu Duy Nguyen, Rui Zhang, Hon Tat Hui

    Abstract: The random beamforming (RBF) scheme, jointly applied with multi-user diversity based scheduling, is able to achieve virtually interference-free downlink transmissions with only partial channel state information (CSI) available at the transmitter. However, the impact of receive spatial diversity on the rate performance of RBF is not fully characterized yet even in a single-cell setup. In this paper…

    Submitted 19 June, 2013; v1 submitted 24 March, 2013; originally announced March 2013.

    Comments: 33 pages, 7 figures, a longer version of the paper submitted to IEEE Transactions on Wireless Communications. This work was presented in part at IEEE Wireless Communications and Networking Conference (WCNC), Shanghai, China, April 07-10, 2013. The authors are with the Department of Electrical and Computer Engineering, National University of Singapore (emails: {hieudn, elezhang, elehht}@nus.edu.sg)

  46. Multi-Cell Random Beamforming: Achievable Rate and Degrees of Freedom Region

    Authors: Hieu Duy Nguyen, Rui Zhang, Hon Tat Hui

    Abstract: Random beamforming (RBF) is a practically favourable transmission scheme for multiuser multi-antenna downlink systems since it requires only partial channel state information (CSI) at the transmitter. Under the conventional single-cell setup, RBF is known to achieve the optimal sum-capacity scaling law as the number of users goes to infinity, thanks to the multiuser diversity enabled transmission…

    Submitted 8 May, 2013; v1 submitted 25 May, 2012; originally announced May 2012.

    Comments: 28 pages, 6 figures, to appear in IEEE Transactions on Signal Processing. This work was presented in part at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March 25-30, 2012. The authors are with the Department of Electrical and Computer Engineering, National University of Singapore (emails: {hieudn, elezhang, elehht}@nus.edu.sg)