
Discussion: Model Inference Optimization Techniques for Real-Time Streaming Pipeline​ #68

@phial3

Description

  1. Hardware-Accelerated Video Decoding

When ffmpeg is built with hardware-acceleration features enabled, the DataLoader's decoder should prioritize hardware-accelerated backends (e.g., nvdec/cuvid for NVIDIA GPUs, qsv for Intel GPUs).

As an example, consider using rsmedia, which provides hardware-accelerated decoding and encoding.

This is a modification I made to the DataLoader to support hardware acceleration (CUDA only for now). Here is the commit:

usls Dataloader support Decoder Hardware acceleration commit
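As a minimal sketch of the backend-selection idea (this is not rsmedia's or usls's actual API; `pick_decoder` is a hypothetical helper, though the decoder names are standard FFmpeg identifiers), the DataLoader could probe the available decoders and prefer hardware ones:

```rust
/// Pick the best available H.264 decoder, preferring hardware backends.
/// `available` would come from probing the local FFmpeg build (hypothetical).
fn pick_decoder(available: &[&str]) -> &'static str {
    // Preference order: NVIDIA nvdec/cuvid, then Intel Quick Sync, then software.
    const PREFERRED: [&str; 3] = ["h264_cuvid", "h264_qsv", "h264"];
    for name in PREFERRED {
        if available.contains(&name) {
            return name;
        }
    }
    "h264" // software fallback
}

fn main() {
    // On a machine whose FFmpeg build has Quick Sync but no NVIDIA support:
    let available = ["h264", "h264_qsv"];
    println!("selected decoder: {}", pick_decoder(&available));
}
```

The same preference-list pattern extends naturally to other codecs (`hevc_cuvid`, `hevc_qsv`, ...).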

  2. Optimizing model.forward Speed with an Underutilized GPU

I’ve noticed that GPU resources are significantly underutilized, but the inference speed remains very slow.
What optimization strategies can I apply?

This is my code example:

fn main() -> Result<()> {
    let options = args::build_options()?;

    // build model
    let mut model = YOLO::try_from(options.commit()?)?;

    // build dataloader
    let dl = DataLoader::new(&args::input_source())?
        .with_batch(model.batch() as _)
        .with_device(Device::Cuda(0))
        .build()?;

    // build annotator
    let annotator = Annotator::default()
        .with_skeletons(&usls::COCO_SKELETONS_16)
        .without_masks(true)
        .with_bboxes_thickness(3)
        .with_saveout(model.spec());

    let mut position = Time::zero();
    let duration: Time = Time::from_nth_of_a_second(30);

    let mut encoder = EncoderBuilder::new(std::path::Path::new(&args::output()), 1920, 1080)
        .with_format("flv")
        .with_codec_name("h264_nvenc".to_string())
        .with_hardware_device(HWDeviceType::CUDA)
        .with_options(&Options::preset_h264_nvenc())
        .build()?;

    // run & annotate
    for (xs, _paths) in dl {

        let ys = model.forward(&xs)?;

        // extract bboxes
        for y in ys.iter() {
            if let Some(bboxes) = y.bboxes() {
                println!("[Bboxes]: Found {} objects", bboxes.len());
                for (i, bbox) in bboxes.iter().enumerate() {
                    println!("{}: {:?}", i, bbox);
                }
            }
        }

        // plot
        let frames = annotator.plot(&xs, &ys, false)?;

        // encode
        for (i, img) in frames.iter().enumerate() {
            // save image if needed
            img.save(format!("/tmp/images/{}_{}.png", string_now("-"), i))?;

            // image -> AVFrame
            let raw_frame = RawFrame::try_from_cv(&img.to_rgb8())?;
        
            // realtime streaming encoding
            encoder.encode_raw(&raw_frame)?;

            // Update the current position and add the inter-frame duration to it.
            position = position.aligned_with(duration).add();
        }
    }

    model.summary();

    encoder.finish().expect("failed to finish encoder");

    Ok(())
}
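One common cause of low GPU utilization in a loop like the one above is that decode, forward, annotate, and encode run strictly sequentially, so the GPU idles while the CPU decodes and saves images. A bounded producer/consumer pipeline lets decoding run ahead of inference. Below is a minimal sketch using only standard-library threads and channels; `decode_frame` and `infer` are hypothetical stand-ins for the real DataLoader and `model.forward` calls:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Stand-ins for the real decode and model.forward work (hypothetical).
fn decode_frame(i: usize) -> Vec<u8> { vec![i as u8; 4] }
fn infer(batch: &[Vec<u8>]) -> usize { batch.len() }

/// Decode on one thread while inference consumes batches on another.
fn run_pipeline(n_frames: usize, batch_size: usize) -> usize {
    // A small channel bound keeps memory flat while letting decode run ahead.
    let (tx, rx) = sync_channel::<Vec<u8>>(8);

    let producer = thread::spawn(move || {
        for i in 0..n_frames {
            tx.send(decode_frame(i)).unwrap();
        }
        // Dropping `tx` closes the channel and ends the consumer loop below.
    });

    let mut processed = 0;
    let mut batch = Vec::with_capacity(batch_size);
    for frame in rx {
        batch.push(frame);
        if batch.len() == batch_size {
            processed += infer(&batch); // model.forward would run here
            batch.clear();
        }
    }
    if !batch.is_empty() {
        processed += infer(&batch); // flush the final partial batch
    }
    producer.join().unwrap();
    processed
}

fn main() {
    println!("processed {} frames", run_pipeline(32, 4));
}
```

Beyond pipelining, the batch size passed to `with_batch` is worth increasing until the GPU is saturated, and the per-frame `img.save(...)` PNG write in the hot loop is a likely bottleneck in its own right and could be made optional or moved to a separate thread.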
  3. End-to-End Pipeline with YOLO Detection + Hardware-Accelerated Encoding

Workflow: YOLO model detection → bounding-box rendering → real-time streaming via hardware-accelerated encoding (e.g., NVIDIA nvenc).

Consideration should be given to achieving resource efficiency and real-time streaming that delivers smooth, stable, and clear picture quality.
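For the real-time constraint, one simple policy is deadline-based frame dropping: if the pipeline falls behind the 30 fps schedule, skip encoding the late frame rather than letting end-to-end latency grow without bound. A minimal sketch of that policy (the pacing logic is an assumption for illustration, not part of usls or rsmedia; `work` stands in for the per-frame decode + forward + annotate cost):

```rust
use std::time::{Duration, Instant};

/// Process `n_frames` at `fps`, dropping any frame whose encode deadline
/// has already passed. Returns (encoded, dropped) counts.
fn pace_frames(n_frames: u32, fps: u64, work: Duration) -> (u32, u32) {
    let frame_interval = Duration::from_millis(1000 / fps); // ~33 ms at 30 fps
    let start = Instant::now();
    let mut next_deadline = frame_interval;
    let (mut encoded, mut dropped) = (0u32, 0u32);

    for _ in 0..n_frames {
        std::thread::sleep(work); // stand-in for decode + forward + annotate
        if start.elapsed() > next_deadline {
            dropped += 1; // behind schedule: skip this frame to protect latency
        } else {
            encoded += 1; // on time: hand the frame to the encoder
        }
        next_deadline += frame_interval;
    }
    (encoded, dropped)
}

fn main() {
    let (encoded, dropped) = pace_frames(30, 30, Duration::from_millis(1));
    println!("encoded={encoded} dropped={dropped}");
}
```

In a real deployment the dropped-frame counter doubles as a load signal: a sustained nonzero drop rate suggests lowering the input resolution, raising the batch size, or moving more stages onto the GPU.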

Labels: enhancement (New feature or request)