Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking
Ruixiang Chen¹, Guolei Sun²✉, Yawei Li³, Jie Qin⁴, Luca Benini³
¹ KTH Royal Institute of Technology, Stockholm, Sweden
² CVL, ETH Zurich, Zurich, Switzerland
³ IIS, ETH Zurich, Zurich, Switzerland
⁴ Nanjing University of Aeronautics and Astronautics, Nanjing, China
Our method enhances the SAM2 framework for video object tracking with trainless, low-overhead improvements that significantly boost long-term tracking performance.
• Hierarchical Motion Estimation: Combines lightweight linear prediction with selective non-linear refinement for accurate tracking without extra training (see the sketch after this list).
• Optimized Memory Bank: Distinguishes short-term and long-term memory with motion-aware filtering to improve robustness under occlusion and appearance changes.
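A minimal sketch of the two-level idea behind the hierarchical motion estimation (the function names, the IoU gate, and the 0.5 threshold below are illustrative assumptions for exposition, not the repository's actual logic): a cheap linear prediction runs at every frame, and the costlier non-linear refinement is invoked only when that prediction disagrees with the segmentation output.

```python
IOU_THRESHOLD = 0.5  # illustrative value, not the paper's setting


def linear_predict(prev_boxes):
    """Constant-velocity prediction from the last two boxes, each (x, y, w, h)."""
    (x0, y0, w0, h0), (x1, y1, w1, h1) = prev_boxes[-2], prev_boxes[-1]
    return (2 * x1 - x0, 2 * y1 - y0, w1, h1)


def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def estimate_box(prev_boxes, sam2_box, refine_fn):
    """Cheap linear prediction every frame; fall back to the expensive
    non-linear refinement (e.g. pixel-level point tracking) only when the
    prediction disagrees with SAM2's own output."""
    pred = linear_predict(prev_boxes)
    if iou(pred, sam2_box) >= IOU_THRESHOLD:
        return pred            # fast path: linear prediction is consistent
    return refine_fn(pred)     # slow path: selective non-linear refinement
```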
We compare the visualization results of HiM2SAM with SAMURAI and DAM4SAM on long video sequences.
HiM2SAM produces more stable and accurate tracking in long-term, challenging scenarios, showing improved robustness over the baselines.
- Installation
- Data Preparation
- Running Inference and Visualization
- Running Evaluation
- Demo on Custom Video
- Citation and Acknowledgment
Requirements
python>=3.10, torch>=2.3.1, and torchvision>=0.18.1.
Our environment is tested on both RTX 3090 and A100 GPUs.
- Install SAM 2
It is recommended to follow the official SAM 2 project here to install both PyTorch and TorchVision dependencies. To install the HiM2SAM version of SAM 2 on a GPU machine, run:
cd sam2
pip install -e .
pip install -e ".[notebooks]"
Download the SAM2.1 checkpoints:
cd checkpoints
./download_ckpts.sh
cd ..
- Install CoTracker 3
HiM2SAM uses the offline version of CoTracker 3 for pixel-level motion estimation. For more details about the model, please refer to CoTracker 3. The model can be loaded via torch.hub and is downloaded automatically on first use, so no additional setup is required.
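For reference, a minimal sketch of loading and running the offline model through torch.hub, following the CoTracker 3 documentation (the video tensor and grid size here are illustrative):

```python
import torch

# Load the offline CoTracker 3 model from torch.hub
# (weights are downloaded automatically on first use).
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline")

# Illustrative input: a (B, T, C, H, W) float video tensor.
video = torch.zeros(1, 16, 3, 384, 512)

# Track a regular grid of points across the clip.
pred_tracks, pred_visibility = cotracker(video, grid_size=10)
```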
- Other Packages
pip install scipy jpeg4py lmdb
Prepare the dataset directories as shown below. LaSOT and LaSOText are supported. Download the official data from here:
data
├── LaSOT_extension_subset
│ ├── atv/
│ │ ├── atv-1/
│ │ │ ├── full_occlusion.txt
│ │ │ ├── groundtruth.txt
│ │ │ ├── img
│ │ │ ├── nlp.txt
│ │ │ └── out_of_view.txt
│ │ ├── atv-2/
│ │ ├── atv-3/
│ │ ├── ...
│ ├── badminton
│ ├── cosplay
│ ...
│ └── testing_set.txt
└── LaSOT
├── airplane/
│ ├── airplane-1/
│ ├── airplane-2/
│ ├── airplane-3/
│ ├── ...
├── basketball
├── bear
...
├── training_set.txt
└── testing_set.txt
Run inference on LaSOT:
python scripts/main_inference.py
Run inference on LaSOText:
python scripts/main_inference_ext.py
By default, the code runs inference using the large model. Evaluating on the whole dataset takes some time; you can skip to the next step and download our results for quick evaluation.
Numerical results are saved under the ./result/ directory, and visualization outputs are stored in ./visualisation/. The scripts can be easily adapted to other box-based VOT datasets with minimal modifications: place the data under the ./dataset/ directory in the same format and update the scripts accordingly.
To reproduce the AUC, precision, and normalized precision metrics reported in the paper, we follow the evaluation methodology used in SAMURAI and the VOT Toolkit.
Please ensure that the tracking results are saved under the ./result/ directory. Our results can be downloaded from here. You may add your own results and register your tracker in scripts.py for further comparison (a hypothetical example is sketched below).
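As a hypothetical illustration only (assuming the analysis code follows the common pytracking-style layout; check lib/test/analysis/scripts.py for the actual registration code), adding another tracker for comparison might look like:

```python
from lib.test.evaluation import get_dataset, trackerlist
from lib.test.analysis.plot_results import print_results

trackers = []
# Results for each registered tracker are read from the ./result/ directory.
trackers.extend(trackerlist(name='HiM2SAM', parameter_name='baseline',
                            dataset_name='lasot', display_name='HiM2SAM'))
trackers.extend(trackerlist(name='my_tracker', parameter_name='default',
                            dataset_name='lasot', display_name='MyTracker'))

dataset = get_dataset('lasot')
print_results(trackers, dataset, 'lasot', merge_results=True,
              plot_types=('success', 'norm_prec', 'prec'))
```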
Run evaluation on LaSOT:
python lib/test/analysis/scripts.py > res_lasot.log
Run evaluation on LaSOText:
python lib/test/analysis/scripts_ext.py > res_lasot_ext.log
The evaluation results will be saved in the corresponding log files.
We provide wrapper scripts for evaluating HiM2SAM on the VOT challenges. For more information about the benchmarks, please refer to the official VOT Toolkit.
Example configuration files are provided under the ./vot_utils/ directory for quick setup.
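For illustration, a VOT Toolkit trackers.ini entry might look like the following (the module name and paths are placeholders; adapt them to the wrapper scripts and example files in ./vot_utils/):

```ini
[HiM2SAM]
label = HiM2SAM
protocol = traxpython
; placeholder module name exposing the VOT wrapper entry point
command = him2sam_vot
; adjust to the repository location on your machine
paths = <path-to-repo>/vot_utils
```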
To run the demo with your custom video or frame directory, use the following examples:
Note: The .txt file contains a single line with the bounding box of the first frame in x,y,w,h format.
python scripts/demo.py --video_path <your_video.mp4> --txt_path <path_to_first_frame_bbox.txt>
# Only JPG images are supported
python scripts/demo.py --video_path <your_frame_directory> --txt_path <path_to_first_frame_bbox.txt>
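For example, if the object's first-frame box starts at (150, 80) with width 64 and height 48, the file would contain the single line below (values are illustrative):

```
150,80,64,48
```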
We kindly ask you to cite our paper along with SAM 2 if you find this work valuable.
@misc{chen2025him2sam,
title={HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking},
author={Ruixiang Chen and Guolei Sun and Yawei Li and Jie Qin and Luca Benini},
year={2025},
eprint={2507.07603},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.07603},
}
@article{ravi2024sam2,
title={SAM 2: Segment Anything in Images and Videos},
author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
journal={arXiv preprint arXiv:2408.00714},
url={https://arxiv.org/abs/2408.00714},
year={2024}
}
This repository is developed by Ruixiang Chen, and built on top of SAM 2, SAMURAI, DAM4SAM, and CoTracker3. The VOT evaluation code is modified from the VOT Toolkit.
Many thanks to the authors of these excellent projects for making their work publicly available.