# Background
At VOIAR (Vision Only Intelligent Autonomous Robot), we were developing a forest-ready robot (codenamed “5Earl”) with a person-following feature. The system split duties between:
- A Raspberry Pi for low-level control and sensor aggregation.
- A Jetson Nano for vision inference and communications.
Initially, the robot could be steered via a mobile web app (replacing the previous laptop + PS4 controller workflow). The next milestone: let it follow a person autonomously. Imagine a three-wheeled helper trailing you through a forest, carrying a load of up to 120 kg.
# The Challenge
YOLO (“You Only Look Once”) was our go-to for real-time object detection. However:
- Jetson Nano's JetPack 4.6 supports only CUDA 10.2, while PyTorch's official JetPack images target newer releases.
- No existing Docker image combined:
  - JetPack 4.x support
  - PyTorch 1.11
  - TensorRT 8.2.0.6
  - an ONNX → TensorRT export workflow
Goal: Build a Docker image that runs YOLO with maximum GPU utilization on Jetson Nano.
# Solution
## 1. Custom Docker Base Image
I started from NVIDIA's `l4t-pytorch:r35.2.1-pth2.0-py3` base image (PyTorch 2.0 + CUDA 11.7), then:
- Downgraded to PyTorch 1.11 (compatible with CUDA 10.2).
- Installed TensorRT 8.2.0.6 and related dependencies.
- Exposed the correct `/usr/lib/aarch64-linux-gnu/` paths for all CUDA, cuDNN, and TensorRT libraries.
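In Dockerfile form, the recipe has roughly this shape. This is a minimal sketch, not the exact build: the PyTorch wheel path and the package list are placeholders that depend on your JetPack setup.

```dockerfile
# Sketch only: the wheel path and package list below are placeholders.
FROM nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3

# Swap the bundled PyTorch 2.0 for a 1.11 build compatible with CUDA 10.2
# (on Jetson this is an NVIDIA-provided aarch64 wheel; the path is a placeholder).
RUN pip3 uninstall -y torch torchvision && \
    pip3 install /tmp/torch-1.11.0-cp38-cp38-linux_aarch64.whl

# ONNX export tooling and the YOLO runtime.
RUN pip3 install onnx ultralytics

# Make the CUDA, cuDNN, and TensorRT 8.2.0.6 shared libraries visible at runtime.
ENV LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:${LD_LIBRARY_PATH}
```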
## 2. Model Export Workflow
To squeeze out every bit of performance:
- Convert the `.pt` model to ONNX.
- Warm up the ONNX graph with a sample image (on the target Jetson Nano).
- Build a `.engine` file using TensorRT's Python API on-device.
- Load the `.engine` at runtime for inference.
```python
import json  # Using json instead of yaml
import os
import shutil
import time
from datetime import datetime

from ultralytics import YOLO

# Get today's date in the required format
today_date = datetime.now().strftime("[%Y-%m-%dT%H:%M]")
# Define the base directory
base_dir = f"/home/ftpuser/{today_date}"


class ModelTester:
    def __init__(self, model_name, img_size=(640, 480)):
        self.img_size = img_size
        self.model_name = model_name
        self.best_model_path = os.path.join(base_dir, f"{model_name}.pt")
        self.settings_path = os.path.join(base_dir, "settings.json")  # Changed to .json

    def create_model(self):
        # Load the pretrained .pt weights
        self.pt_model = YOLO(f"./assets/{self.model_name}.pt")
        # Warm up with a sample prediction
        self.pt_model.predict("https://ultralytics.com/images/bus.jpg", imgsz=(self.img_size[1], self.img_size[0]))

    def export_model(self):
        # Export the model to a TensorRT engine with FP16 (half) precision
        self.pt_model.export(format="engine", device="cuda", imgsz=(self.img_size[1], self.img_size[0]), half=True)

    def move_the_model(self):
        os.makedirs(base_dir, exist_ok=True)
        # (Training checkpoints land in runs/detect/train/weights by default)
        # shutil.copy(f"runs/detect/train/weights/{self.model_name}.pt", self.best_model_path)
        # The export writes the .onnx and .engine files next to the source weights in ./assets
        shutil.move(f"./assets/{self.model_name}.engine", os.path.join(base_dir, f"{self.model_name}.engine"))
        shutil.move(f"./assets/{self.model_name}.onnx", os.path.join(base_dir, f"{self.model_name}.onnx"))
        print(f"Moved the files to {base_dir} 🤝🔥🔥")
        self.save_settings()

    def save_settings(self):
        settings = {
            "model_name": self.model_name,
            "img_size": self.img_size,
            "training_data": "coco.yaml",
            "epochs": 1,
            "classes": [0],
            "device": "cuda",
            "dynamic": True,
            "batch": -1,
        }
        with open(self.settings_path, "w") as file:
            json.dump(settings, file)
        print(f"Saved settings to {self.settings_path}")


if __name__ == "__main__":
    tester = ModelTester("yolov8_20240717_coco(imgsz480x640)")
    tester.create_model()
    tester.export_model()
    tester.move_the_model()
    time.sleep(60000)  # Arbitrary delay to simulate extended operation
```
This ensures the final inference runs entirely in TensorRT, bypassing slower PyTorch C++ ops.
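Loading the exported engine at runtime goes through the same Ultralytics API; here is a minimal sketch (the engine path is hypothetical):

```python
from ultralytics import YOLO

# Passing a .engine file makes Ultralytics run inference through TensorRT
# rather than PyTorch. The path below is a placeholder.
model = YOLO("./assets/yolov8_coco.engine")
results = model.predict("https://ultralytics.com/images/bus.jpg")
```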
# Results
| Metric | Improvement |
|---|---|
| Total inference time (300 images) | –41.97% |
| Avg. inference time per image | –43.33% |
| Frames per second (FPS) | +72.29% |
| Images per minute | +72.31% |
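Numbers like these can be reproduced with a simple timing loop over a fixed image set. A minimal sketch follows; the engine path and image directory are placeholders:

```python
import time

from ultralytics import YOLO

# Placeholders: point these at your exported engine and a local test set.
model = YOLO("./assets/yolov8_coco.engine")
images = [f"./test_images/{i:03d}.jpg" for i in range(300)]

start = time.perf_counter()
for img in images:
    model.predict(img, verbose=False)
elapsed = time.perf_counter() - start

print(f"total: {elapsed:.2f} s | "
      f"avg: {elapsed / len(images) * 1000:.1f} ms/img | "
      f"FPS: {len(images) / elapsed:.1f}")
```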
# Conclusion
By crafting a tailored Docker image and integrating a robust ONNX → TensorRT export pipeline, I delivered substantial performance gains on legacy Jetson devices. This patch has been merged into the Ultralytics main repo and is actively used by over 30,000 developers and projects.
- Repo & utilities: MWLCDev/Yolo-Export
- Training data: Freely available on RoboFlow
- Pull request discussion: ultralytics/ultralytics#13100
Feel free to dive into the code, reproduce the benchmarks, or adapt this image for your own Jetson-based edge deployments!