This project aims to provide a low-cost yet robust skeleton tracking system for art installations in bright environments with relatively stable light conditions.
Infrared cameras support only short distances (usually less than 4.5m), and stereo cameras such as the Zed or Intel ones cost a lot of computation power (presumably the CNN models used for depth estimation are not that GPU-friendly); both are also quite pricey. This project instead relies solely on deep-learning-based approaches with cheap hardware and provides acceptable tracking results.
This C++ macOS project uses the built-in macOS Vision 2D skeleton tracking and DepthAnythingV2 for depth estimation, and streams skeleton data out as OSC packets. It provides a spatial tracking solution that runs stably on Mac devices such as the Mac Mini - ideal for installations.
This tool has already been used in exhibitions, and it has proven that with stable light conditions (i.e. indoors), DepthAnythingV2, although it only produces relative depth, gives stable results. So even without fine-tuning the model to get metric results for a specific site, a simple linear regression can map the relative depth to absolute values.
Tested with 2 cameras (resolution 1270x800) on:
- M1 Pro MacBook Pro, 32GB - 12fps
- M4 Mac mini, 16GB - 17fps
Tested with 4 cameras (resolution 960x540) on:
- M1 Max Mac Studio, 32GB - 12fps
For debugging purposes, the skeletons are rendered in the app by default. By ticking the No Render option in the UI, a few more frames per second can be gained. You can also modify the code to remove the overhead entirely: currently, when rendering, the skeleton data shared between the main thread and the machine learning thread is locked.
A prebuilt app can be downloaded. The app requires macOS 14.0+ because of the use of the DepthAnything model.
1. app structure
The app uses a settings.json file to define the camera settings, a detection boundary, and the OSC destination / port. It also uses a cameradata.json file that defines the coefficients / intrinsic camera data used for reprojection; if this file is not found, default coefficients will be generated.
Because the mobile OpenCV2 framework included in this project does not include the calibration lib, in-app camera calibration is not supported - if a prebuilt Arm OpenCV lib becomes available I will consider integrating it into this app. You can calibrate your cameras outside this app and copy the parameters over to the JSON file, as in the sketch below.
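A minimal sketch of such an out-of-app calibration using a full OpenCV build and a standard chessboard workflow; the image file names, board size, and square size are placeholders, and the exact keys expected in cameradata.json are not shown here - copy the printed values over by hand in whatever layout your cameradata.json uses.

```cpp
// Sketch: calibrate a camera with full OpenCV, then copy the intrinsics /
// distortion coefficients into cameradata.json manually.
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>
#include <vector>

int main() {
    const cv::Size boardSize(9, 6);   // inner corners of the chessboard (placeholder)
    const float squareSize = 0.025f;  // square edge length in meters (placeholder)

    // 3D reference points for the flat chessboard pattern.
    std::vector<cv::Point3f> boardPoints;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardPoints.emplace_back(x * squareSize, y * squareSize, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    // Collect corner detections from a set of calibration photos.
    for (int i = 0; i < 20; ++i) {
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".jpg", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
            objectPoints.push_back(boardPoints);
            imagePoints.push_back(corners);
        }
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);

    // Copy these values into cameradata.json by hand.
    std::cout << "RMS reprojection error: " << rms << "\n"
              << "camera matrix:\n" << cameraMatrix << "\n"
              << "distortion coefficients:\n" << distCoeffs << std::endl;
    return 0;
}
```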
2. adding new cameras
After opening the app, when no camera is found, available camera IDs will be shown in the UI:
The camera ID contains 2 parts - the actual camera name, and a unique ID bound to the USB port. If you use different camera models, the camera name alone is enough; otherwise you will need the full ID. Add it to settings.json and create a new block under the motion/cameras section:
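A sketch of what such a block might look like, assuming the keys match the fields described below; the camera ID and all values here are placeholders:

```json
{
  "motion": {
    "cameras": {
      "USB Camera#0x11000000": {
        "cutoff": 10.0,
        "depthshift": 0.0,
        "euler": [0.0, 0.0, 0.0],
        "flip": false,
        "fov": 70.0,
        "limit": 4,
        "resolution": [960, 540],
        "translate": [0.0, 2.0, 0.0]
      }
    }
  }
}
```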
Use the camera ID you have as the key.
In this block, cutoff, depthshift, and flip can be edited in the app. But you will need to manually set euler for the rotation, fov for the camera's horizontal field of view, limit for how many skeletons are allowed per camera, resolution for the desired feed resolution, and translate for the position of the camera.
If the camera is successfully picked up, you should see a frustum being rendered:
3. calibrate depth mapping
Change the Render Mode to Depth and you will see the depth textures of all cameras, drawn horizontally:
Right-click on a depth feed to enter depth edit mode: the clicked point on the feed is marked and its depth value is shown in the UI. Measure its absolute distance (meters recommended) and enter it into the Raw field in the UI to form a pair. To get an accurate result, you probably need more than 4 points. Please note that there is currently a bug where the camera rotation is flipped, so you will need to input negative values of the absolute depth to get correct results - i.e. if the point in the depth feed is 5m away from the camera, enter -5 in the Raw field.
After enough points are paired, click the Calculate button and the mapping parameters - Depth Shift and Depth Scalar - are computed. Then click Reset to clear the points and resume depth feed processing. You can switch back to the 3D view to test the skeleton positions.
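Under the hood this is a simple linear regression. A minimal sketch, assuming the app models the mapping as absolute ≈ Depth Scalar * relative + Depth Shift (the actual formulation in the code may differ):

```cpp
// Sketch: ordinary least squares over (relative depth, measured absolute depth) pairs.
#include <cstdio>
#include <utility>
#include <vector>

struct DepthMapping { double scalar = 1.0, shift = 0.0; };

DepthMapping fitDepthMapping(const std::vector<std::pair<double, double>>& pairs) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    const double n = static_cast<double>(pairs.size());
    for (const auto& [rel, measured] : pairs) {
        sx += rel; sy += measured; sxx += rel * rel; sxy += rel * measured;
    }
    DepthMapping m;
    m.scalar = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    m.shift  = (sy - m.scalar * sx) / n;
    return m;
}

int main() {
    // Placeholder measurements; the negative signs reflect the rotation-flip
    // workaround described above.
    std::vector<std::pair<double, double>> pairs = {
        {0.81, -2.0}, {0.62, -3.5}, {0.45, -5.0}, {0.30, -7.0}};
    DepthMapping m = fitDepthMapping(pairs);
    std::printf("Depth Scalar: %f, Depth Shift: %f\n", m.scalar, m.shift);
}
```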
4. boundary clip
Only skeletons within the bound area are considered valid and streamed. You can edit its range in the Active Bound section of the UI.
5. save changes
Press Cmd + S to save the changes to settings.json.
6. OSC streaming
In settings.json, you can set the remote IP and OSC port in the osc block. The localport needs to be unoccupied too. The data is streamed via OSC over UDP using a blob format, laid out as follows (a decoding sketch comes after the list):
For each detected skeleton there is a float array of 4 elements:
a. body index
b. normalized position within the bound - x
c. normalized position within the bound - y
d. empty value (0)
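A minimal sketch of unpacking such a blob on the receiving end, assuming the blob is a packed array of 32-bit floats in host byte order; how you obtain the blob bytes depends on your OSC library of choice.

```cpp
// Sketch: decode a skeleton blob of 4 floats per tracked body.
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

struct TrackedBody {
    float index;  // body index
    float x;      // normalized position within the bound, x
    float y;      // normalized position within the bound, y
    float unused; // reserved, always 0
};

std::vector<TrackedBody> unpackSkeletonBlob(const void* data, std::size_t sizeBytes) {
    std::vector<TrackedBody> bodies(sizeBytes / sizeof(TrackedBody));
    std::memcpy(bodies.data(), data, bodies.size() * sizeof(TrackedBody));
    return bodies;
}

int main() {
    // Example blob with two skeletons (placeholder values).
    const float raw[] = {0.f, 0.25f, 0.80f, 0.f,
                         1.f, 0.60f, 0.40f, 0.f};
    for (const TrackedBody& b : unpackSkeletonBlob(raw, sizeof(raw)))
        std::printf("body %d at (%.2f, %.2f)\n", static_cast<int>(b.index), b.x, b.y);
}
```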
If you need the full skeleton data, you can edit the update function of the MotionTrackerApp class; bodyData contains the full 19 joint positions in world space (an enum mirroring these indices is sketched after the list):
0 - left ankle
1 - left hip
2 - left knee
3 - right ankle
4 - right hip
5 - right knee
6 - waist
7 - neck
8 - nose
9 - left eye
10 - left ear
11 - right eye
12 - right ear
13 - left shoulder
14 - left elbow
15 - left wrist
16 - right shoulder
17 - right elbow
18 - right wrist
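If it helps when consuming bodyData, the indices above map to an enum like the following; this enum is not part of the app's source, it just mirrors the documented order:

```cpp
// Joint indices as documented above.
enum JointIndex {
    LeftAnkle = 0, LeftHip, LeftKnee,
    RightAnkle, RightHip, RightKnee,
    Waist, Neck, Nose,
    LeftEye, LeftEar, RightEye, RightEar,
    LeftShoulder, LeftElbow, LeftWrist,
    RightShoulder, RightElbow, RightWrist // 18
};
```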
You do not need the following dependencies if you are using the build provided above.
Cinder
This is a custom Cinder fork of mine with a few improvements to the camera capture process on Mac.
Download that repository, put it in the same root folder as this repository, open the proj/xcode/cinder.xcodeproj file, and build the macOS target. If you have issues with the build process, please refer to Cinder's own repo or its website.
Please note that zlib (used by Cinder) seems to have a compatibility issue with Xcode 16.3; you will need Xcode 16.2 or an earlier version, which can be downloaded through your Apple developer account portal.
The project also uses nlohmann json v3.5.0 and the mobile OpenCV2 framework, both of which are included in the source code.
MIT
If you want to improve this project, feel free to fork it and make your own changes.
Cinder is used to create the app instance, provide an OpenGL rendering environment, and supply some file I/O utility functions. If needed, it should be easy to replace with other C++ or Objective-C environments.