https://github.com/NVIDIA/cutlass is a header-only template library, so I'm curious what bindgen output for it would look like. It would be a great addition to the Rust CUDA ecosystem.