HiM2SAM

Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking


Ruixiang Chen¹, Guolei Sun², Yawei Li³, Jie Qin⁴, Luca Benini³

¹ KTH Royal Institute of Technology, Stockholm, Sweden
² CVL, ETH Zurich, Zurich, Switzerland
³ IIS, ETH Zurich, Zurich, Switzerland
⁴ Nanjing University of Aeronautics and Astronautics, Nanjing, China

📃 arXiv   |   📊 Raw Results

🌟 Highlights

Our method enhances the SAM2 framework for video object tracking with trainless, low-overhead improvements that significantly boost long-term tracking performance.

Hierarchical Motion Estimation: Combines lightweight linear prediction with selective non-linear refinement for accurate tracking without extra training.

Optimized Memory Bank: Distinguishes short-term and long-term memory with motion-aware filtering to improve robustness under occlusion and appearance changes. (A toy sketch of both ideas follows below.)
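
For intuition, here is a minimal, self-contained Python sketch of both ideas. It is illustrative only, not the paper's exact formulation: the constant-velocity model, the refine_fn hook (standing in for a point-tracker refinement such as CoTracker), and all thresholds are hypothetical choices.

import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

class HierarchicalMotion:
    # Cheap linear prediction every frame; costly refinement only on demand.
    def __init__(self, refine_iou=0.5):
        self.box = None               # last box (x, y, w, h)
        self.vel = np.zeros(2)        # center velocity in pixels/frame
        self.refine_iou = refine_iou

    def step(self, observed_box, refine_fn=None):
        observed_box = np.asarray(observed_box, dtype=float)
        if self.box is None:          # first frame: just store the box
            self.box = observed_box
            return self.box
        pred = self.box + np.array([self.vel[0], self.vel[1], 0.0, 0.0])
        # Non-linear stage: refine only when the linear guess is unreliable.
        if refine_fn is not None and iou(pred, observed_box) < self.refine_iou:
            observed_box = np.asarray(refine_fn(observed_box), dtype=float)
        self.vel = 0.5 * self.vel + 0.5 * (observed_box[:2] - self.box[:2])
        self.box = observed_box
        return self.box

class MemoryBank:
    # Short-term ring buffer; long-term store admits only stable frames.
    def __init__(self, short_cap=7, long_cap=16):
        self.short, self.long = [], []
        self.short_cap, self.long_cap = short_cap, long_cap

    def add(self, feat, confidence, motion_px):
        self.short.append(feat)
        if len(self.short) > self.short_cap:
            self.short.pop(0)
        # Motion-aware filtering: keep confident, low-motion frames only.
        if confidence > 0.8 and motion_px < 5.0 and len(self.long) < self.long_cap:
            self.long.append(feat)

    def read(self):
        return self.long + self.short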

Framework Overview

⚖️ Comparisons

We compare the visualization results of HiM2SAM with SAMURAI and DAM4SAM on long video sequences.

HiM2SAM produces more stable and accurate tracking in long-term, challenging scenarios, showing improved robustness over the baselines.

[Comparison figures: legend, plus paired examples of Motion & Appearance Variation, Occlusion, and Reappearance & Background Clutter]

📚 Table of Contents

  • 🛠️ Installation
  • 📁 Data Preparation
  • 🏃 Running Inference and Visualization
  • 📊 Running Evaluation
  • 🎯 VOT Challenges
  • 🧩 Demo on Custom Video
  • 📝 Citation and Acknowledgment

🛠️ Installation

Requirements

python>=3.10, as well as torch>=2.3.1 and torchvision>=0.18.1

Our environment is tested on both RTX 3090 and A100 GPUs.

  1. Install SAM 2
    It is recommended to follow the official SAM 2 project to install the PyTorch and TorchVision dependencies. To install the HiM2SAM version of SAM 2 on a GPU machine, run:
cd sam2
pip install -e .
pip install -e ".[notebooks]"

Download the SAM2.1 checkpoints:

cd checkpoints
./download_ckpts.sh
cd ..
  2. Install CoTracker 3
    HiM2SAM uses the offline version of CoTracker 3 for pixel-level motion estimation. For more details about the model, please refer to CoTracker 3.

    The model can be easily loaded via torch.hub and will be automatically downloaded upon first use, requiring no additional setup; a one-line example is shown below.
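
For example, following the torch.hub instructions in the CoTracker 3 repository, the offline model can be loaded in two lines (weights download automatically on first use):

import torch
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline")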

  3. Other Packages

pip install scipy jpeg4py lmdb

📁 Data Preparation

Prepare the dataset directories as shown below. LaSOT and LaSOText are supported. Download the official data from here:

data
  ├── LaSOT_extension_subset
  │   ├── atv/
  │   │   ├── atv-1/
  │   │   │   ├── full_occlusion.txt
  │   │   │   ├── groundtruth.txt
  │   │   │   ├── img
  │   │   │   ├── nlp.txt
  │   │   │   └── out_of_view.txt
  │   │   ├── atv-2/
  │   │   ├── atv-3/
  │   │   ├── ...
  │   ├── badminton
  │   ├── cosplay
  │   ...
  │   └── testing_set.txt
  └── LaSOT
      ├── airplane/
      │   ├── airplane-1/
      │   ├── airplane-2/
      │   ├── airplane-3/
      │   ├── ...
      ├── basketball
      ├── bear
      ...
      ├── training_set.txt  
      └── testing_set.txt  
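
As a quick sanity check of the layout above, here is a small illustrative script (the paths and the comma-separated x,y,w,h groundtruth format follow LaSOT's convention; adjust as needed):

import os

root = "data/LaSOT_extension_subset"
for seq in sorted(os.listdir(os.path.join(root, "atv"))):
    gt = os.path.join(root, "atv", seq, "groundtruth.txt")
    if os.path.isfile(gt):
        with open(gt) as f:
            x, y, w, h = map(float, f.readline().split(","))
        print(seq, "first box:", x, y, w, h)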

🏃 Running Inference and Visualization

Run inference on LaSOT:

python scripts/main_inference.py 

Run inference on LaSOText:

python scripts/main_inference_ext.py 

By default, the code runs inference with the large model. Evaluating the whole dataset takes some time; you can skip to the next step and download our results for quick evaluation.

Numerical results are saved under the ./result/ directory, and visualization outputs are stored in ./visualisation/. The scripts can be adapted to other box-based VOT datasets with minimal modifications: place the data under the ./dataset/ directory in the same format and update the scripts accordingly.

📊 Running Evaluation

Our evaluation methodology for the AUC, precision, and normalized precision metrics reported in the paper aligns with that used in SAMURAI and the VOT Toolkit.
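
For reference, these standard metrics can be sketched as follows. This is a self-contained illustration, not the repository's evaluation code: AUC averages the success rate over IoU thresholds in [0, 1], precision thresholds the center error at 20 pixels, and normalized precision first divides the center error by the ground-truth box size.

import numpy as np

def iou(a, b):
    # IoU of two (x, y, w, h) boxes.
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def centers(boxes):
    # Box centers for an (N, 4) array of (x, y, w, h) boxes.
    b = np.asarray(boxes, dtype=float)
    return b[:, :2] + b[:, 2:] / 2.0

def success_auc(gt, pred):
    # Mean success rate over 21 IoU thresholds from 0 to 1.
    ious = np.array([iou(g, p) for g, p in zip(gt, pred)])
    return float(np.mean([(ious > t).mean() for t in np.linspace(0, 1, 21)]))

def precision_at(gt, pred, thresh=20.0):
    # Fraction of frames with center error below `thresh` pixels.
    err = np.linalg.norm(centers(gt) - centers(pred), axis=1)
    return float((err < thresh).mean())

def norm_precision_at(gt, pred, thresh=0.2):
    # Center error normalized by the ground-truth box width and height.
    g = np.asarray(gt, dtype=float)
    err = np.linalg.norm((centers(gt) - centers(pred)) / g[:, 2:], axis=1)
    return float((err < thresh).mean())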

Please ensure that the tracking results are saved under the ./result/ directory. Our results can be downloaded from here. You may add your own results and register your tracker in scripts.py for further comparison.

Run evaluation on LaSOT:

python lib/test/analysis/scripts.py > res_lasot.log

Run evaluation on LaSOText:

python lib/test/analysis/scripts_ext.py > res_lasot_ext.log

The evaluation results will be saved in the corresponding log files.

🎯 VOT Challenges

We provide wrapper scripts for evaluating HiM2SAM on the VOT challenges. For more information about the benchmarks, please refer to the official VOT Toolkit.
Example configuration files are provided under the ./vot_utils/ directory for quick setup.

🧩 Demo on Custom Video

To run the demo with your custom video or frame directory, use the following examples:

Note: The .txt file contains a single line with the bounding box of the first frame in x,y,w,h format.
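
For example, a valid bbox file contains exactly one line (hypothetical values):

120,80,64,48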

Input: Video File

python scripts/demo.py --video_path <your_video.mp4> --txt_path <path_to_first_frame_bbox.txt>

Input: Frame Folder

# Only JPG images are supported
python scripts/demo.py --video_path <your_frame_directory> --txt_path <path_to_first_frame_bbox.txt>

📝 Citation and Acknowledgment

We kindly ask you to cite our paper along with SAM 2 if you find this work valuable.

@misc{chen2025him2sam,
      title={HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking}, 
      author={Ruixiang Chen and Guolei Sun and Yawei Li and Jie Qin and Luca Benini},
      year={2025},
      eprint={2507.07603},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.07603}, 
}

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}

This repository is developed by Ruixiang Chen, and built on top of SAM 2, SAMURAI, DAM4SAM, and CoTracker3. The VOT evaluation code is modified from the VOT Toolkit.

Many thanks to the authors of these excellent projects for making their work publicly available.

About

This is the official implementation of HiM2SAM.
