- Hardware-Accelerated Video Decoding
When ffmpeg is built with hardware acceleration support, the DataLoader's decoder should prioritize hardware-accelerated backends (e.g., nvdec/cuvid for NVIDIA GPUs, qsv for Intel GPUs).
As an example, consider using rsmedia, which provides hardware-accelerated decoding and encoding.
This is a modification I made to DataLoader to support hardware acceleration; it only supports CUDA for now. Here is the commit:
usls Dataloader support Decoder Hardware acceleration commit
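As a sketch of what "prioritize hardware-accelerated backends" could look like, here is a hypothetical selection helper. The `pick_decoder` name and the availability flags are illustrative, not part of usls or rsmedia; the decoder names follow ffmpeg's convention of suffixing the codec (e.g., `h264_cuvid`, `h264_qsv`):

```rust
// Hypothetical backend selection: prefer hardware decoders when available,
// falling back to the software decoder otherwise.
fn pick_decoder(codec: &str, has_cuda: bool, has_qsv: bool) -> String {
    if has_cuda {
        format!("{codec}_cuvid") // e.g. "h264_cuvid" for NVIDIA nvdec/cuvid
    } else if has_qsv {
        format!("{codec}_qsv") // e.g. "h264_qsv" for Intel Quick Sync
    } else {
        codec.to_string() // software fallback
    }
}

fn main() {
    println!("{}", pick_decoder("h264", true, false));
    println!("{}", pick_decoder("hevc", false, true));
    println!("{}", pick_decoder("h264", false, false));
}
```

The real decision would query the ffmpeg build for compiled-in hwaccels rather than take boolean flags, but the priority order is the same.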
- Optimizing model.forward Speed with an Underutilized GPU
I’ve noticed that GPU resources are significantly underutilized, but the inference speed remains very slow.
What optimization strategies can I apply?
This is my code example:
```rust
fn main() -> Result<()> {
    let options = args::build_options()?;

    // build model
    let mut model = YOLO::try_from(options.commit()?)?;

    // build dataloader
    let dl = DataLoader::new(&args::input_source())?
        .with_batch(model.batch() as _)
        .with_device(Device::Cuda(0))
        .build()?;

    // build annotator
    let annotator = Annotator::default()
        .with_skeletons(&usls::COCO_SKELETONS_16)
        .without_masks(true)
        .with_bboxes_thickness(3)
        .with_saveout(model.spec());

    let mut position = Time::zero();
    let duration: Time = Time::from_nth_of_a_second(30);

    // build hardware-accelerated encoder
    let mut encoder = EncoderBuilder::new(std::path::Path::new(&args::output()), 1920, 1080)
        .with_format("flv")
        .with_codec_name("h264_nvenc".to_string())
        .with_hardware_device(HWDeviceType::CUDA)
        .with_options(&Options::preset_h264_nvenc())
        .build()?;

    // run & annotate
    for (xs, _paths) in dl {
        let ys = model.forward(&xs)?;

        // extract bboxes
        for y in ys.iter() {
            if let Some(bboxes) = y.bboxes() {
                println!("[Bboxes]: Found {} objects", bboxes.len());
                for (i, bbox) in bboxes.iter().enumerate() {
                    println!("{}: {:?}", i, bbox);
                }
            }
        }

        // plot
        let frames = annotator.plot(&xs, &ys, false)?;

        // encode
        for (i, img) in frames.iter().enumerate() {
            // save image if needed
            img.save(format!("/tmp/images/{}_{}.png", string_now("-"), i))?;
            // image -> AVFrame
            let raw_frame = RawFrame::try_from_cv(&img.to_rgb8())?;
            // realtime streaming encoding
            encoder.encode_raw(&raw_frame)?;
            // advance the current position by the inter-frame duration
            position = position.aligned_with(duration).add();
        }
    }

    model.summary();
    encoder.finish().expect("failed to finish encoder");
    Ok(())
}
```
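One common cause of an underutilized GPU with slow end-to-end throughput is that decoding, inference, annotation, and encoding all run serially in one loop, so the GPU idles while the CPU works. A minimal sketch of the usual fix is to overlap the stages with a bounded channel; this uses only std threads, and the `run_pipeline` helper and byte-vector batches are stand-ins for the DataLoader and `model.forward`, not usls APIs:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Overlap "decoding" (producer thread) with "inference" (consumer loop) via a
// small bounded queue, so the consumer never waits for the next batch to be
// produced. The bound provides backpressure if the consumer falls behind.
fn run_pipeline(batches: usize, batch_len: usize) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(4); // at most 4 queued batches
    let producer = thread::spawn(move || {
        for i in 0..batches {
            // stand-in for DataLoader decoding a batch on the CPU
            tx.send(vec![i as u8; batch_len]).unwrap();
        }
        // dropping `tx` closes the channel and ends the consumer loop
    });
    let mut processed = 0;
    for batch in rx {
        // stand-in for model.forward(&xs) on the GPU
        processed += batch.len();
    }
    producer.join().unwrap();
    processed
}

fn main() {
    let n = run_pipeline(8, 4);
    println!("processed {n} elements");
}
```

In the example above, the per-frame `img.save` to PNG inside the encode loop is also a likely bottleneck worth moving off the hot path or disabling when measuring inference speed.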
- End-to-End Pipeline with YOLO Detection + Hardware-Accelerated Encoding
Workflow: YOLO model detection → bounding box rendering → real-time streaming encoding (e.g., via NVIDIA NVENC).
Consideration should be given to resource efficiency and to keeping the real-time stream smooth and stable with clear picture quality.
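For a smooth, stable stream, the encoder should be fed frames at a steady rate rather than in bursts. A minimal pacing sketch using only the standard library (the `pace` helper is illustrative, not part of rsmedia or usls):

```rust
use std::time::{Duration, Instant};

// Sleep until the given deadline; if the frame is already late, return
// immediately instead of drifting further behind.
fn pace(deadline: Instant) {
    if let Some(remaining) = deadline.checked_duration_since(Instant::now()) {
        std::thread::sleep(remaining);
    }
}

fn main() {
    let fps = 30u64;
    let frame_interval = Duration::from_nanos(1_000_000_000 / fps);
    let start = Instant::now();
    for i in 0..5u32 {
        // ... decode, run YOLO, annotate, and encode frame `i` here ...
        pace(start + frame_interval * (i + 1)); // steady 1/fps cadence
    }
    println!("streamed 5 frames in {:?}", start.elapsed());
}
```

When the pipeline cannot keep up, dropping frames to stay on the cadence usually looks better to viewers than letting latency grow.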