
Discussion: Model Inference Optimization Techniques for Real-Time Streaming Pipeline​ #68

@phial3

Description

  1. Hardware-Accelerated Video Decoding

When ffmpeg is built with hardware-acceleration features enabled, the DataLoader's decoder should prioritize hardware-accelerated backends (e.g., nvdec/cuvid for NVIDIA GPUs, qsv for Intel GPUs).

As an example, consider using rsmedia, which provides hardware-accelerated decoding and encoding.

This is a modification I made to the DataLoader to support hardware acceleration (CUDA only for now). Here is the commit:

usls Dataloader support Decoder Hardware acceleration commit
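As a minimal sketch of the backend-selection idea (this is not rsmedia's or usls's actual API; `pick_decoder` is a hypothetical helper, though the decoder names are standard FFmpeg identifiers), the DataLoader could probe the available decoders and prefer hardware ones:

```rust
/// Pick the best available H.264 decoder, preferring hardware backends.
/// `available` would come from probing the local FFmpeg build (hypothetical).
fn pick_decoder(available: &[&str]) -> &'static str {
    // Preference order: NVIDIA nvdec/cuvid, then Intel Quick Sync, then software.
    const PREFERRED: [&str; 3] = ["h264_cuvid", "h264_qsv", "h264"];
    for name in PREFERRED {
        if available.contains(&name) {
            return name;
        }
    }
    "h264" // software fallback
}

fn main() {
    // On a machine whose FFmpeg build has Quick Sync but no NVIDIA support:
    let available = ["h264", "h264_qsv"];
    println!("selected decoder: {}", pick_decoder(&available));
}
```

The same preference-list pattern extends naturally to other codecs (`hevc_cuvid`, `hevc_qsv`, ...).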

  2. Optimizing model.forward Speed with an Underutilized GPU

I’ve noticed that GPU resources are significantly underutilized, but the inference speed remains very slow.
What optimization strategies can I apply?

This is my code example:

fn main() -> Result<()> {
    let options = args::build_options()?;

    // build model
    let mut model = YOLO::try_from(options.commit()?)?;

    // build dataloader
    let dl = DataLoader::new(&args::input_source())?
        .with_batch(model.batch() as _)
        .with_device(Device::Cuda(0))
        .build()?;

    // build annotator
    let annotator = Annotator::default()
        .with_skeletons(&usls::COCO_SKELETONS_16)
        .without_masks(true)
        .with_bboxes_thickness(3)
        .with_saveout(model.spec());

    let mut position = Time::zero();
    let duration: Time = Time::from_nth_of_a_second(30);

    let mut encoder = EncoderBuilder::new(std::path::Path::new(&args::output()), 1920, 1080)
        .with_format("flv")
        .with_codec_name("h264_nvenc".to_string())
        .with_hardware_device(HWDeviceType::CUDA)
        .with_options(&Options::preset_h264_nvenc())
        .build()?;

    // run & annotate
    for (xs, _paths) in dl {

        let ys = model.forward(&xs)?;

        // extract bboxes
        for y in ys.iter() {
            if let Some(bboxes) = y.bboxes() {
                println!("[Bboxes]: Found {} objects", bboxes.len());
                for (i, bbox) in bboxes.iter().enumerate() {
                    println!("{}: {:?}", i, bbox);
                }
            }
        }

        // plot
        let frames = annotator.plot(&xs, &ys, false)?;

        // encode
        for (i, img) in frames.iter().enumerate() {
            // save image if needed
            img.save(format!("/tmp/images/{}_{}.png", string_now("-"), i))?;

            // image -> AVFrame
            let raw_frame = RawFrame::try_from_cv(&img.to_rgb8())?;
        
            // realtime streaming encoding
            encoder.encode_raw(&raw_frame)?;

            // Update the current position and add the inter-frame duration to it.
            position = position.aligned_with(duration).add();
        }
    }

    model.summary();

    encoder.finish().expect("failed to finish encoder");

    Ok(())
}
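One common cause of low GPU utilization in a loop like the one above is that decode, forward, annotate, and encode run strictly sequentially, so the GPU idles while the CPU decodes and saves images. A bounded producer/consumer pipeline lets decoding run ahead of inference. Below is a minimal sketch using only standard-library threads and channels; `decode_frame` and `infer` are hypothetical stand-ins for the real DataLoader and `model.forward` calls:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Stand-ins for the real decode and model.forward work (hypothetical).
fn decode_frame(i: usize) -> Vec<u8> { vec![i as u8; 4] }
fn infer(batch: &[Vec<u8>]) -> usize { batch.len() }

/// Decode on one thread while inference consumes batches on another.
fn run_pipeline(n_frames: usize, batch_size: usize) -> usize {
    // A small channel bound keeps memory flat while letting decode run ahead.
    let (tx, rx) = sync_channel::<Vec<u8>>(8);

    let producer = thread::spawn(move || {
        for i in 0..n_frames {
            tx.send(decode_frame(i)).unwrap();
        }
        // Dropping `tx` closes the channel and ends the consumer loop below.
    });

    let mut processed = 0;
    let mut batch = Vec::with_capacity(batch_size);
    for frame in rx {
        batch.push(frame);
        if batch.len() == batch_size {
            processed += infer(&batch); // model.forward would run here
            batch.clear();
        }
    }
    if !batch.is_empty() {
        processed += infer(&batch); // flush the final partial batch
    }
    producer.join().unwrap();
    processed
}

fn main() {
    println!("processed {} frames", run_pipeline(32, 4));
}
```

Beyond pipelining, the batch size passed to `with_batch` is worth increasing until the GPU is saturated, and the per-frame `img.save(...)` PNG write in the hot loop is a likely bottleneck in its own right and could be made optional or moved to a separate thread.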
  3. End-to-End Pipeline with YOLO Detection + Hardware-Accelerated Encoding

Workflow: YOLO model detection → bounding-box rendering → real-time streaming via hardware-accelerated encoding (e.g., NVIDIA nvenc).

Consideration should be given to achieving resource efficiency and real-time streaming that delivers smooth, stable, and clear picture quality.
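For the real-time constraint, one simple policy is deadline-based frame dropping: if the pipeline falls behind the 30 fps schedule, skip encoding the late frame rather than letting end-to-end latency grow without bound. A minimal sketch of that policy (the pacing logic is an assumption for illustration, not part of usls or rsmedia; `work` stands in for the per-frame decode + forward + annotate cost):

```rust
use std::time::{Duration, Instant};

/// Process `n_frames` at `fps`, dropping any frame whose encode deadline
/// has already passed. Returns (encoded, dropped) counts.
fn pace_frames(n_frames: u32, fps: u64, work: Duration) -> (u32, u32) {
    let frame_interval = Duration::from_millis(1000 / fps); // ~33 ms at 30 fps
    let start = Instant::now();
    let mut next_deadline = frame_interval;
    let (mut encoded, mut dropped) = (0u32, 0u32);

    for _ in 0..n_frames {
        std::thread::sleep(work); // stand-in for decode + forward + annotate
        if start.elapsed() > next_deadline {
            dropped += 1; // behind schedule: skip this frame to protect latency
        } else {
            encoded += 1; // on time: hand the frame to the encoder
        }
        next_deadline += frame_interval;
    }
    (encoded, dropped)
}

fn main() {
    let (encoded, dropped) = pace_frames(30, 30, Duration::from_millis(1));
    println!("encoded={encoded} dropped={dropped}");
}
```

In a real deployment the dropped-frame counter doubles as a load signal: a sustained nonzero drop rate suggests lowering the input resolution, raising the batch size, or moving more stages onto the GPU.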

Labels: enhancement (New feature or request)