
Creating a custom docker image for YOLO on Jetson

A minimal Docker image (JetPack 4.x) that uses CUDA 10.2, PyTorch 1.11, and TensorRT 8.2 to cut inference latency by ~43% and boost FPS by ~72%. The image was later merged into the main Ultralytics codebase and is used by over 30K users.


#Background

At VOIAR (Vision Only Intelligent Autonomous Robot), we were developing a forest-ready robot (codenamed “5Earl”) with a person-following feature. The system split duties between:

  • A Raspberry Pi for low-level control and sensor aggregation.
  • A Jetson Nano for vision inference and communications.

Initially, the robot could be steered via a mobile web app (replacing the previous laptop + PS4 controller workflow). The next milestone: let it follow a person autonomously. Imagine a three-wheeled helper trailing you through a forest, carrying a load of up to 120 kg.

#The Challenge

YOLO (“You Only Look Once”) was our go-to for real-time object detection. However:

  • Jetson Nano's JetPack 4.6 supports only CUDA 10.2, while the official l4t-pytorch container images targeted newer JetPack releases.
  • No existing Docker image combined:
    • JetPack 4.x support
    • PyTorch 1.11
    • TensorRT 8.2.0.6
    • ONNX→TensorRT export workflow

Goal: Build a Docker image that runs YOLO with maximum GPU utilization on Jetson Nano.

#Solution

#1. Custom Docker Base Image

I started from NVIDIA's l4t-pytorch:r35.2.1-pth2.0-py3 (PyTorch 2.0 + CUDA 11.7), then:

  • Downgraded to PyTorch 1.11 (compatible with CUDA 10.2).
  • Installed TensorRT 8.2.0.6 and related dependencies.
  • Exposed the correct /usr/lib/aarch64-linux-gnu/ paths for all CUDA, cuDNN, and TensorRT libraries.
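
To confirm the resulting image actually exposes this stack, a short sanity check can be run inside the container. This is a minimal sketch; the exact version strings depend on your build, and the versioned library names assume TensorRT 8.x / cuDNN 8.x as installed above.

```python
# Sanity check inside the container: verify the CUDA / PyTorch / TensorRT stack
import ctypes

import tensorrt as trt
import torch

print("PyTorch  :", torch.__version__)          # expected 1.11.x
print("CUDA     :", torch.version.cuda)         # expected 10.2
print("GPU OK   :", torch.cuda.is_available())  # should be True on the Nano
print("TensorRT :", trt.__version__)            # expected 8.2.0.6

# Confirm the shared libraries resolve from /usr/lib/aarch64-linux-gnu/
ctypes.CDLL("libnvinfer.so.8")  # TensorRT runtime
ctypes.CDLL("libcudnn.so.8")    # cuDNN
```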

#2. Model Export Workflow

To squeeze out every bit of performance:

  1. Convert the .pt model to ONNX.
  2. Warm up the ONNX graph with a sample image (on the target Jetson Nano).
  3. Build a .engine file using TensorRT's Python API on-device.
  4. Load the .engine at runtime for inference.
The export script we ran on the Nano handles the first three steps:

```python
import json  # Using json instead of yaml
import os
import shutil
import time
from datetime import datetime

from ultralytics import YOLO

# Get today's date in the required format
today_date = datetime.now().strftime("[%Y-%m-%dT%H:%M]")

# Define the base directory
base_dir = f"/home/ftpuser/{today_date}"


class ModelTester:
    def __init__(self, model_name, img_size=(640, 480)):
        self.img_size = img_size
        self.model_name = model_name
        self.best_model_path = os.path.join(base_dir, f"{model_name}.pt")
        self.settings_path = os.path.join(base_dir, "settings.json")  # Changed to .json

    def create_model(self):
        self.pt_model = YOLO(f"./assets/{self.model_name}.pt")  # Load the trained .pt weights
        # Warm up with a sample prediction before exporting
        self.pt_model.predict(
            "https://ultralytics.com/images/bus.jpg",
            imgsz=(self.img_size[1], self.img_size[0]),
        )

    def export_model(self):
        # Export the model to TensorRT format with FP16 (half) precision;
        # this produces both the .onnx and .engine files
        self.pt_model.export(
            format="engine",
            device="cuda",
            imgsz=(self.img_size[1], self.img_size[0]),
            half=True,
        )

    def move_the_model(self):
        os.makedirs(base_dir, exist_ok=True)
        # Trained models are saved in `runs/detect/train/weights` by default
        # shutil.copy(f"runs/detect/train/weights/{self.model_name}.pt", self.best_model_path)
        shutil.move(f"./assets/{self.model_name}.engine", os.path.join(base_dir, f"{self.model_name}.engine"))
        shutil.move(f"./assets/{self.model_name}.onnx", os.path.join(base_dir, f"{self.model_name}.onnx"))
        print(f"Moved the files to {base_dir} 🤝🔥🔥")
        self.save_settings()

    def save_settings(self):
        settings = {
            "model_name": self.model_name,
            "img_size": self.img_size,
            "training_data": "coco.yaml",
            "epochs": 1,
            "classes": [0],
            "device": "cuda",
            "dynamic": True,
            "batch": -1,
        }
        with open(self.settings_path, "w") as file:
            json.dump(settings, file)
        print(f"Saved settings to {self.settings_path}")


if __name__ == "__main__":
    tester = ModelTester("yolov8_20240717_coco(imgsz480x640)")
    tester.create_model()
    tester.export_model()
    tester.move_the_model()
    time.sleep(60000)  # Arbitrary delay to simulate extended operation
```

This ensures the final inference runs entirely through the TensorRT engine, bypassing the slower PyTorch execution path.
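
At runtime, the generated .engine file is loaded directly with the Ultralytics API. The sketch below is illustrative: the weights path is a placeholder, the image size matches the export settings above, and class 0 (person) mirrors the person-following use case.

```python
from ultralytics import YOLO

# Load the exported TensorRT engine (path is a placeholder)
model = YOLO("/home/ftpuser/model.engine", task="detect")

# Run inference on a frame; classes=[0] keeps only "person" detections
results = model.predict(
    "https://ultralytics.com/images/bus.jpg",
    imgsz=(480, 640),
    classes=[0],
)

for box in results[0].boxes:
    print(box.xyxy, box.conf)  # bounding box coordinates and confidence
```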

#Results

Figure: inference-speed comparison between the exported TensorRT engine and the PyTorch .pt model, running in the same optimized Docker image.

Figure: FPS and images-per-minute benchmarks inside the optimized container.

| Metric                             | Improvement (%) |
| ---------------------------------- | --------------- |
| Total inference time (300 images)  | -41.97%         |
| Avg. inference time per image      | -43.33%         |
| Frames Per Second (FPS)            | +72.29%         |
| Images Per Minute                  | +72.31%         |
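
The numbers above come from timing both model files on the same image set inside the container. A rough sketch of how such a comparison can be reproduced (the weights paths and image folder are placeholders, not the exact benchmark script):

```python
import time
from pathlib import Path

from ultralytics import YOLO

IMAGES = sorted(Path("./benchmark_images").glob("*.jpg"))  # placeholder image set

def time_model(weights, task=None):
    model = YOLO(weights, task=task)
    model.predict(IMAGES[0], imgsz=(480, 640), verbose=False)  # warm-up, excluded from timing
    start = time.perf_counter()
    for img in IMAGES:
        model.predict(img, imgsz=(480, 640), verbose=False)
    total = time.perf_counter() - start
    return total, len(IMAGES) / total  # total seconds, FPS

pt_total, pt_fps = time_model("./assets/model.pt")
trt_total, trt_fps = time_model("./assets/model.engine", task="detect")
print(f"PyTorch : {pt_total:.1f}s total, {pt_fps:.2f} FPS")
print(f"TensorRT: {trt_total:.1f}s total, {trt_fps:.2f} FPS")
```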

The robot autonomously tracking me on a walk in the park, maintaining a safe distance, adapting to turns, and handling small obstacles.

#Conclusion

By crafting a tailored Docker image and integrating a robust ONNX → TensorRT export pipeline, I delivered substantial performance gains on legacy Jetson devices. This patch has been merged into the main Ultralytics repository and is actively used by over 30,000 developers and projects.

Feel free to dive into the code, reproduce the benchmarks, or adapt this image for your own Jetson-based edge deployments!