commit e474ab5f9f by lee, 2025-06-18 14:35:43 +08:00
529 changed files with 80523 additions and 0 deletions

examples/README.md (new file, 36 lines)
@@ -0,0 +1,36 @@
## Ultralytics YOLOv8 Example Applications
This repository features a collection of real-world applications and walkthroughs, provided as either Python files or notebooks. Explore the examples below to see how YOLOv8 can be integrated into various applications.
### Ultralytics YOLO Example Applications
| Title | Format | Contributor |
| ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------- |
| [YOLO ONNX Detection Inference with C++](./YOLOv8-CPP-Inference) | C++/ONNX | [Justas Bartnykas](https://github.com/JustasBart) |
| [YOLO OpenCV ONNX Detection Python](./YOLOv8-OpenCV-ONNX-Python) | OpenCV/Python/ONNX | [Farid Inawan](https://github.com/frdteknikelektro) |
| [YOLOv8 .NET ONNX ImageSharp](https://github.com/dme-compunet/YOLOv8) | C#/ONNX/ImageSharp | [Compunet](https://github.com/dme-compunet) |
| [YOLO .Net ONNX Detection C#](https://www.nuget.org/packages/Yolov8.Net) | C# .Net | [Samuel Stainback](https://github.com/sstainba) |
| [YOLOv8 on NVIDIA Jetson(TensorRT and DeepStream)](https://wiki.seeedstudio.com/YOLOv8-DeepStream-TRT-Jetson/) | Python | [Lakshantha](https://github.com/lakshanthad) |
| [YOLOv8 ONNXRuntime Python](./YOLOv8-ONNXRuntime) | Python/ONNXRuntime | [Semih Demirel](https://github.com/semihhdemirel) |
| [YOLOv8 ONNXRuntime CPP](./YOLOv8-ONNXRuntime-CPP) | C++/ONNXRuntime | [DennisJcy](https://github.com/DennisJcy), [Onuralp Sezer](https://github.com/onuralpszr) |
| [RTDETR ONNXRuntime C#](https://github.com/Kayzwer/yolo-cs/blob/master/RTDETR.cs) | C#/ONNX | [Kayzwer](https://github.com/Kayzwer) |
| [YOLOv8 SAHI Video Inference](https://github.com/RizwanMunawar/ultralytics/blob/main/examples/YOLOv8-SAHI-Inference-Video/yolov8_sahi.py) | Python | [Muhammad Rizwan Munawar](https://github.com/RizwanMunawar) |
| [YOLOv8 Region Counter](https://github.com/RizwanMunawar/ultralytics/blob/main/examples/YOLOv8-Region-Counter/yolov8_region_counter.py) | Python | [Muhammad Rizwan Munawar](https://github.com/RizwanMunawar) |
| [YOLOv8 Segmentation ONNXRuntime Python](./YOLOv8-Segmentation-ONNXRuntime-Python) | Python/ONNXRuntime | [jamjamjon](https://github.com/jamjamjon) |
| [YOLOv8 LibTorch CPP](./YOLOv8-LibTorch-CPP-Inference) | C++/LibTorch | [Myyura](https://github.com/Myyura) |
| [YOLOv8 OpenCV INT8 TFLite Python](./YOLOv8-OpenCV-int8-tflite-Python) | Python | [Wamiq Raza](https://github.com/wamiqraza) |
### How to Contribute
We greatly appreciate contributions from the community, including examples, applications, and guides. If you'd like to contribute, please follow these guidelines:
1. Create a pull request (PR) with the title prefix `[Example]`, adding your new example folder to the `examples/` directory within the repository.
2. Make sure your project adheres to the following standards:
- Makes use of the `ultralytics` package.
- Includes a `README.md` with clear instructions for setting up and running the example.
- Refrains from adding large files or dependencies unless they are absolutely necessary for the example.
- Contributors should be willing to provide support for their examples and address related issues.
For more detailed information and guidance on contributing, please visit our [contribution documentation](https://docs.ultralytics.com/help/contributing).
If you encounter any questions or concerns regarding these guidelines, feel free to open a PR or an issue in the repository, and we will assist you in the contribution process.

examples/YOLOv8-CPP-Inference/CMakeLists.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
cmake_minimum_required(VERSION 3.5)
project(Yolov8CPPInference VERSION 0.1)
set(CMAKE_INCLUDE_CURRENT_DIR ON)
# CUDA
set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda")
find_package(CUDA 11 REQUIRED)
set(CMAKE_CUDA_STANDARD 11)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
# !CUDA
# OpenCV
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# !OpenCV
set(PROJECT_SOURCES
main.cpp
inference.h
inference.cpp
)
add_executable(Yolov8CPPInference ${PROJECT_SOURCES})
target_link_libraries(Yolov8CPPInference ${OpenCV_LIBS})

examples/YOLOv8-CPP-Inference/README.md (new file, 50 lines)
@@ -0,0 +1,50 @@
# YOLOv8/YOLOv5 Inference C++
This example demonstrates how to perform inference using YOLOv8 and YOLOv5 models in C++ with OpenCV's DNN API.
## Usage
```bash
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install .
cd examples/YOLOv8-CPP-Inference
# Add a yolov8_.onnx and/or yolov5_.onnx model to the ultralytics folder.
# Edit main.cpp and change projectBasePath to match your environment.
# Note that by default the CMake file will try to import the CUDA library for use with OpenCV's DNN (cuDNN) GPU inference.
# If your OpenCV build does not use CUDA/cuDNN you can remove that import call and run the example on CPU.
mkdir build
cd build
cmake ..
make
./Yolov8CPPInference
```
## Exporting YOLOv8 and YOLOv5 Models
To export YOLOv8 models:
```commandline
yolo export model=yolov8s.pt imgsz=480,640 format=onnx opset=12
```
To export YOLOv5 models:
```commandline
python3 export.py --weights yolov5s.pt --img 480 640 --include onnx --opset 12
```
yolov8s.onnx:
![image](https://user-images.githubusercontent.com/40023722/217356132-a4cecf2e-2729-4acb-b80a-6559022d7707.png)
yolov5s.onnx:
![image](https://user-images.githubusercontent.com/40023722/217357005-07464492-d1da-42e3-98a7-fc753f87d5e6.png)
This repository utilizes OpenCV's DNN API to run ONNX exported models of YOLOv5 and YOLOv8. In theory, it should work for YOLOv6 and YOLOv7 as well, but they have not been tested. Note that the example networks are exported with rectangular (640x480) resolutions, but any exported resolution will work. You may want to use the letterbox approach for square images, depending on your use case.
The **main** branch version uses Qt as a GUI wrapper. The primary focus here is the **Inference** class file, which demonstrates how to transpose YOLOv8 models to work as YOLOv5 models.
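For orientation, the `Inference` class is typically driven like this (a minimal sketch condensed from the `main.cpp` shipped with this example; the model and image paths are placeholders):

```cpp
#include <opencv2/opencv.hpp>

#include "inference.h"

int main()
{
    // runWithCuda = true requires an OpenCV build with CUDA/cuDNN; set it to false for CPU-only builds.
    // "classes.txt" is a placeholder here because the classes are hard-coded in this example.
    Inference inf("yolov8s.onnx", cv::Size(640, 480), "classes.txt", true);

    cv::Mat frame = cv::imread("bus.jpg"); // placeholder image path
    std::vector<Detection> detections = inf.runInference(frame);

    for (const Detection &d : detections)
    {
        cv::rectangle(frame, d.box, d.color, 2);
        cv::putText(frame, d.className, cv::Point(d.box.x, d.box.y - 5),
                    cv::FONT_HERSHEY_DUPLEX, 1, cv::Scalar(0, 0, 0), 2);
    }

    cv::imshow("Inference", frame);
    cv::waitKey(0);
    return 0;
}
```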

examples/YOLOv8-CPP-Inference/inference.cpp (new file, 185 lines)
@@ -0,0 +1,185 @@
#include "inference.h"
Inference::Inference(const std::string &onnxModelPath, const cv::Size &modelInputShape, const std::string &classesTxtFile, const bool &runWithCuda)
{
modelPath = onnxModelPath;
modelShape = modelInputShape;
classesPath = classesTxtFile;
cudaEnabled = runWithCuda;
loadOnnxNetwork();
// loadClassesFromFile(); The classes are hard-coded for this example
}
std::vector<Detection> Inference::runInference(const cv::Mat &input)
{
cv::Mat modelInput = input;
if (letterBoxForSquare && modelShape.width == modelShape.height)
modelInput = formatToSquare(modelInput);
cv::Mat blob;
cv::dnn::blobFromImage(modelInput, blob, 1.0/255.0, modelShape, cv::Scalar(), true, false);
net.setInput(blob);
std::vector<cv::Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());
int rows = outputs[0].size[1];
int dimensions = outputs[0].size[2];
bool yolov8 = false;
// yolov5 has an output of shape (batchSize, 25200, 85) (Num classes + box[x,y,w,h] + confidence[c])
// yolov8 has an output of shape (batchSize, 84, 8400) (Num classes + box[x,y,w,h])
if (dimensions > rows) // Check if the shape[2] is more than shape[1] (yolov8)
{
yolov8 = true;
rows = outputs[0].size[2];
dimensions = outputs[0].size[1];
outputs[0] = outputs[0].reshape(1, dimensions);
cv::transpose(outputs[0], outputs[0]);
}
float *data = (float *)outputs[0].data;
float x_factor = modelInput.cols / modelShape.width;
float y_factor = modelInput.rows / modelShape.height;
std::vector<int> class_ids;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
for (int i = 0; i < rows; ++i)
{
if (yolov8)
{
float *classes_scores = data+4;
cv::Mat scores(1, classes.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double maxClassScore;
minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
if (maxClassScore > modelScoreThreshold)
{
confidences.push_back(maxClassScore);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * x_factor);
int top = int((y - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
boxes.push_back(cv::Rect(left, top, width, height));
}
}
else // yolov5
{
float confidence = data[4];
if (confidence >= modelConfidenceThreshold)
{
float *classes_scores = data+5;
cv::Mat scores(1, classes.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double max_class_score;
minMaxLoc(scores, 0, &max_class_score, 0, &class_id);
if (max_class_score > modelScoreThreshold)
{
confidences.push_back(confidence);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * x_factor);
int top = int((y - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
boxes.push_back(cv::Rect(left, top, width, height));
}
}
}
data += dimensions;
}
std::vector<int> nms_result;
cv::dnn::NMSBoxes(boxes, confidences, modelScoreThreshold, modelNMSThreshold, nms_result);
std::vector<Detection> detections{};
for (unsigned long i = 0; i < nms_result.size(); ++i)
{
int idx = nms_result[i];
Detection result;
result.class_id = class_ids[idx];
result.confidence = confidences[idx];
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<int> dis(100, 255);
result.color = cv::Scalar(dis(gen),
dis(gen),
dis(gen));
result.className = classes[result.class_id];
result.box = boxes[idx];
detections.push_back(result);
}
return detections;
}
void Inference::loadClassesFromFile()
{
std::ifstream inputFile(classesPath);
if (inputFile.is_open())
{
std::string classLine;
while (std::getline(inputFile, classLine))
classes.push_back(classLine);
inputFile.close();
}
}
void Inference::loadOnnxNetwork()
{
net = cv::dnn::readNetFromONNX(modelPath);
if (cudaEnabled)
{
std::cout << "\nRunning on CUDA" << std::endl;
net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
}
else
{
std::cout << "\nRunning on CPU" << std::endl;
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}
}
cv::Mat Inference::formatToSquare(const cv::Mat &source)
{
int col = source.cols;
int row = source.rows;
int _max = MAX(col, row);
cv::Mat result = cv::Mat::zeros(_max, _max, CV_8UC3);
source.copyTo(result(cv::Rect(0, 0, col, row)));
return result;
}

examples/YOLOv8-CPP-Inference/inference.h (new file, 52 lines)
@@ -0,0 +1,52 @@
#ifndef INFERENCE_H
#define INFERENCE_H
// Cpp native
#include <fstream>
#include <vector>
#include <string>
#include <random>
// OpenCV / DNN / Inference
#include <opencv2/imgproc.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
struct Detection
{
int class_id{0};
std::string className{};
float confidence{0.0};
cv::Scalar color{};
cv::Rect box{};
};
class Inference
{
public:
Inference(const std::string &onnxModelPath, const cv::Size &modelInputShape = {640, 640}, const std::string &classesTxtFile = "", const bool &runWithCuda = true);
std::vector<Detection> runInference(const cv::Mat &input);
private:
void loadClassesFromFile();
void loadOnnxNetwork();
cv::Mat formatToSquare(const cv::Mat &source);
std::string modelPath{};
std::string classesPath{};
bool cudaEnabled{};
std::vector<std::string> classes{"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};
cv::Size2f modelShape{};
float modelConfidenceThreshold {0.25};
float modelScoreThreshold {0.45};
float modelNMSThreshold {0.50};
bool letterBoxForSquare = true;
cv::dnn::Net net;
};
#endif // INFERENCE_H

examples/YOLOv8-CPP-Inference/main.cpp (new file, 70 lines)
@@ -0,0 +1,70 @@
#include <iostream>
#include <vector>
#include <getopt.h>
#include <opencv2/opencv.hpp>
#include "inference.h"
using namespace std;
using namespace cv;
int main(int argc, char **argv)
{
std::string projectBasePath = "/home/user/ultralytics"; // Set your ultralytics base path
bool runOnGPU = true;
//
// Pass in either:
//
// "yolov8s.onnx" or "yolov5s.onnx"
//
// To run Inference with yolov8/yolov5 (ONNX)
//
// Note that in this example the classes are hard-coded and 'classes.txt' is a placeholder.
Inference inf(projectBasePath + "/yolov8s.onnx", cv::Size(640, 480), "classes.txt", runOnGPU);
std::vector<std::string> imageNames;
imageNames.push_back(projectBasePath + "/ultralytics/assets/bus.jpg");
imageNames.push_back(projectBasePath + "/ultralytics/assets/zidane.jpg");
for (int i = 0; i < imageNames.size(); ++i)
{
cv::Mat frame = cv::imread(imageNames[i]);
// Inference starts here...
std::vector<Detection> output = inf.runInference(frame);
int detections = output.size();
std::cout << "Number of detections:" << detections << std::endl;
for (int i = 0; i < detections; ++i)
{
Detection detection = output[i];
cv::Rect box = detection.box;
cv::Scalar color = detection.color;
// Detection box
cv::rectangle(frame, box, color, 2);
// Detection box text
std::string classString = detection.className + ' ' + std::to_string(detection.confidence).substr(0, 4);
cv::Size textSize = cv::getTextSize(classString, cv::FONT_HERSHEY_DUPLEX, 1, 2, 0);
cv::Rect textBox(box.x, box.y - 40, textSize.width + 10, textSize.height + 20);
cv::rectangle(frame, textBox, color, cv::FILLED);
cv::putText(frame, classString, cv::Point(box.x + 5, box.y - 10), cv::FONT_HERSHEY_DUPLEX, 1, cv::Scalar(0, 0, 0), 2, 0);
}
// Inference ends here...
// This is only for preview purposes
float scale = 0.8;
cv::resize(frame, frame, cv::Size(frame.cols*scale, frame.rows*scale));
cv::imshow("Inference", frame);
cv::waitKey(-1);
}
}

examples/YOLOv8-LibTorch-CPP-Inference/CMakeLists.txt (new file, 47 lines)
@@ -0,0 +1,47 @@
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(yolov8_libtorch_example)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# -------------- OpenCV --------------
set(OpenCV_DIR "/path/to/opencv/lib/cmake/opencv4")
find_package(OpenCV REQUIRED)
message(STATUS "OpenCV library status:")
message(STATUS " config: ${OpenCV_DIR}")
message(STATUS " version: ${OpenCV_VERSION}")
message(STATUS " libraries: ${OpenCV_LIBS}")
message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
include_directories(${OpenCV_INCLUDE_DIRS})
# -------------- libtorch --------------
list(APPEND CMAKE_PREFIX_PATH "/path/to/libtorch")
set(Torch_DIR "/path/to/libtorch/share/cmake/Torch")
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
message("${TORCH_LIBRARIES}")
message("${TORCH_INCLUDE_DIRS}")
# The following code block is suggested to be used on Windows.
# According to https://github.com/pytorch/pytorch/issues/25457,
# the DLLs need to be copied to avoid memory errors.
# if (MSVC)
# file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
# add_custom_command(TARGET yolov8_libtorch_example
# POST_BUILD
# COMMAND ${CMAKE_COMMAND} -E copy_if_different
# ${TORCH_DLLS}
# $<TARGET_FILE_DIR:yolov8_libtorch_example>)
# endif (MSVC)
include_directories(${TORCH_INCLUDE_DIRS})
add_executable(yolov8_libtorch_inference "${CMAKE_CURRENT_SOURCE_DIR}/main.cc")
target_link_libraries(yolov8_libtorch_inference ${TORCH_LIBRARIES} ${OpenCV_LIBS})
set_property(TARGET yolov8_libtorch_inference PROPERTY CXX_STANDARD 17)

examples/YOLOv8-LibTorch-CPP-Inference/README.md (new file, 35 lines)
@@ -0,0 +1,35 @@
# YOLOv8 LibTorch Inference C++
This example demonstrates how to perform inference using YOLOv8 models in C++ with the LibTorch API.
## Dependencies
| Dependency | Version |
| ------------ | -------- |
| OpenCV | >=4.0.0 |
| C++ Standard | >=17 |
| CMake | >=3.18 |
| LibTorch | >=1.12.1 |
## Usage
```bash
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install .
cd examples/YOLOv8-LibTorch-CPP-Inference
mkdir build
cd build
cmake ..
make
./yolov8_libtorch_inference
```
## Exporting YOLOv8
To export YOLOv8 models:
```commandline
yolo export model=yolov8s.pt imgsz=640 format=torchscript
```
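Once exported, the TorchScript model can be loaded and run from C++ roughly as follows (a minimal sketch based on the `main.cc` below; the model path is a placeholder, and the full example letterboxes a real image instead of using a random tensor):

```cpp
#include <iostream>

#include <torch/script.h>
#include <torch/torch.h>

int main()
{
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);

    // Load the exported TorchScript model (placeholder path)
    torch::jit::script::Module model = torch::jit::load("yolov8s.torchscript");
    model.eval();
    model.to(device, torch::kFloat32);

    // Dummy 1x3x640x640 input; the real example letterboxes an image to this size first
    torch::Tensor input = torch::rand({1, 3, 640, 640}, torch::TensorOptions().device(device));
    torch::Tensor output = model.forward({input}).toTensor().cpu();

    std::cout << output.sizes() << std::endl; // e.g. [1, 84, 8400] for a detection model
    return 0;
}
```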

examples/YOLOv8-LibTorch-CPP-Inference/main.cc (new file, 259 lines)
@@ -0,0 +1,259 @@
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <torch/torch.h>
#include <torch/script.h>
using torch::indexing::Slice;
using torch::indexing::None;
float generate_scale(cv::Mat& image, const std::vector<int>& target_size) {
int origin_w = image.cols;
int origin_h = image.rows;
int target_h = target_size[0];
int target_w = target_size[1];
float ratio_h = static_cast<float>(target_h) / static_cast<float>(origin_h);
float ratio_w = static_cast<float>(target_w) / static_cast<float>(origin_w);
float resize_scale = std::min(ratio_h, ratio_w);
return resize_scale;
}
float letterbox(cv::Mat &input_image, cv::Mat &output_image, const std::vector<int> &target_size) {
if (input_image.cols == target_size[1] && input_image.rows == target_size[0]) {
if (input_image.data == output_image.data) {
return 1.;
} else {
output_image = input_image.clone();
return 1.;
}
}
float resize_scale = generate_scale(input_image, target_size);
int new_shape_w = std::round(input_image.cols * resize_scale);
int new_shape_h = std::round(input_image.rows * resize_scale);
float padw = (target_size[1] - new_shape_w) / 2.;
float padh = (target_size[0] - new_shape_h) / 2.;
int top = std::round(padh - 0.1);
int bottom = std::round(padh + 0.1);
int left = std::round(padw - 0.1);
int right = std::round(padw + 0.1);
cv::resize(input_image, output_image,
cv::Size(new_shape_w, new_shape_h),
0, 0, cv::INTER_AREA);
cv::copyMakeBorder(output_image, output_image, top, bottom, left, right,
cv::BORDER_CONSTANT, cv::Scalar(114.));
return resize_scale;
}
torch::Tensor xyxy2xywh(const torch::Tensor& x) {
auto y = torch::empty_like(x);
y.index_put_({"...", 0}, (x.index({"...", 0}) + x.index({"...", 2})).div(2));
y.index_put_({"...", 1}, (x.index({"...", 1}) + x.index({"...", 3})).div(2));
y.index_put_({"...", 2}, x.index({"...", 2}) - x.index({"...", 0}));
y.index_put_({"...", 3}, x.index({"...", 3}) - x.index({"...", 1}));
return y;
}
torch::Tensor xywh2xyxy(const torch::Tensor& x) {
auto y = torch::empty_like(x);
auto dw = x.index({"...", 2}).div(2);
auto dh = x.index({"...", 3}).div(2);
y.index_put_({"...", 0}, x.index({"...", 0}) - dw);
y.index_put_({"...", 1}, x.index({"...", 1}) - dh);
y.index_put_({"...", 2}, x.index({"...", 0}) + dw);
y.index_put_({"...", 3}, x.index({"...", 1}) + dh);
return y;
}
// Reference: https://github.com/pytorch/vision/blob/main/torchvision/csrc/ops/cpu/nms_kernel.cpp
torch::Tensor nms(const torch::Tensor& bboxes, const torch::Tensor& scores, float iou_threshold) {
if (bboxes.numel() == 0)
return torch::empty({0}, bboxes.options().dtype(torch::kLong));
auto x1_t = bboxes.select(1, 0).contiguous();
auto y1_t = bboxes.select(1, 1).contiguous();
auto x2_t = bboxes.select(1, 2).contiguous();
auto y2_t = bboxes.select(1, 3).contiguous();
torch::Tensor areas_t = (x2_t - x1_t) * (y2_t - y1_t);
auto order_t = std::get<1>(
scores.sort(/*stable=*/true, /*dim=*/0, /* descending=*/true));
auto ndets = bboxes.size(0);
torch::Tensor suppressed_t = torch::zeros({ndets}, bboxes.options().dtype(torch::kByte));
torch::Tensor keep_t = torch::zeros({ndets}, bboxes.options().dtype(torch::kLong));
auto suppressed = suppressed_t.data_ptr<uint8_t>();
auto keep = keep_t.data_ptr<int64_t>();
auto order = order_t.data_ptr<int64_t>();
auto x1 = x1_t.data_ptr<float>();
auto y1 = y1_t.data_ptr<float>();
auto x2 = x2_t.data_ptr<float>();
auto y2 = y2_t.data_ptr<float>();
auto areas = areas_t.data_ptr<float>();
int64_t num_to_keep = 0;
for (int64_t _i = 0; _i < ndets; _i++) {
auto i = order[_i];
if (suppressed[i] == 1)
continue;
keep[num_to_keep++] = i;
auto ix1 = x1[i];
auto iy1 = y1[i];
auto ix2 = x2[i];
auto iy2 = y2[i];
auto iarea = areas[i];
for (int64_t _j = _i + 1; _j < ndets; _j++) {
auto j = order[_j];
if (suppressed[j] == 1)
continue;
auto xx1 = std::max(ix1, x1[j]);
auto yy1 = std::max(iy1, y1[j]);
auto xx2 = std::min(ix2, x2[j]);
auto yy2 = std::min(iy2, y2[j]);
auto w = std::max(static_cast<float>(0), xx2 - xx1);
auto h = std::max(static_cast<float>(0), yy2 - yy1);
auto inter = w * h;
auto ovr = inter / (iarea + areas[j] - inter);
if (ovr > iou_threshold)
suppressed[j] = 1;
}
}
return keep_t.narrow(0, 0, num_to_keep);
}
torch::Tensor non_max_suppression(torch::Tensor& prediction, float conf_thres = 0.25, float iou_thres = 0.45, int max_det = 300) {
auto bs = prediction.size(0);
auto nc = prediction.size(1) - 4;
auto nm = prediction.size(1) - nc - 4;
auto mi = 4 + nc;
auto xc = prediction.index({Slice(), Slice(4, mi)}).amax(1) > conf_thres;
prediction = prediction.transpose(-1, -2);
prediction.index_put_({"...", Slice(None, 4)}, xywh2xyxy(prediction.index({"...", Slice(None, 4)})));
std::vector<torch::Tensor> output;
for (int i = 0; i < bs; i++) {
output.push_back(torch::zeros({0, 6 + nm}, prediction.device()));
}
for (int xi = 0; xi < prediction.size(0); xi++) {
auto x = prediction[xi];
x = x.index({xc[xi]});
auto x_split = x.split({4, nc, nm}, 1);
auto box = x_split[0], cls = x_split[1], mask = x_split[2];
auto [conf, j] = cls.max(1, true);
x = torch::cat({box, conf, j.toType(torch::kFloat), mask}, 1);
x = x.index({conf.view(-1) > conf_thres});
int n = x.size(0);
if (!n) { continue; }
// NMS
auto c = x.index({Slice(), Slice{5, 6}}) * 7680;
auto boxes = x.index({Slice(), Slice(None, 4)}) + c;
auto scores = x.index({Slice(), 4});
auto i = nms(boxes, scores, iou_thres);
i = i.index({Slice(None, max_det)});
output[xi] = x.index({i});
}
return torch::stack(output);
}
torch::Tensor clip_boxes(torch::Tensor& boxes, const std::vector<int>& shape) {
boxes.index_put_({"...", 0}, boxes.index({"...", 0}).clamp(0, shape[1]));
boxes.index_put_({"...", 1}, boxes.index({"...", 1}).clamp(0, shape[0]));
boxes.index_put_({"...", 2}, boxes.index({"...", 2}).clamp(0, shape[1]));
boxes.index_put_({"...", 3}, boxes.index({"...", 3}).clamp(0, shape[0]));
return boxes;
}
torch::Tensor scale_boxes(const std::vector<int>& img1_shape, torch::Tensor& boxes, const std::vector<int>& img0_shape) {
auto gain = (std::min)((float)img1_shape[0] / img0_shape[0], (float)img1_shape[1] / img0_shape[1]);
auto pad0 = std::round((float)(img1_shape[1] - img0_shape[1] * gain) / 2. - 0.1);
auto pad1 = std::round((float)(img1_shape[0] - img0_shape[0] * gain) / 2. - 0.1);
boxes.index_put_({"...", 0}, boxes.index({"...", 0}) - pad0);
boxes.index_put_({"...", 2}, boxes.index({"...", 2}) - pad0);
boxes.index_put_({"...", 1}, boxes.index({"...", 1}) - pad1);
boxes.index_put_({"...", 3}, boxes.index({"...", 3}) - pad1);
boxes.index_put_({"...", Slice(None, 4)}, boxes.index({"...", Slice(None, 4)}).div(gain));
return boxes;
}
int main() {
// Device
torch::Device device(torch::cuda::is_available() ? torch::kCUDA :torch::kCPU);
// Note that in this example the classes are hard-coded
std::vector<std::string> classes {"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant",
"stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
"giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
"spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
"couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};
try {
// Load the model (e.g. yolov8s.torchscript)
std::string model_path = "/path/to/yolov8s.torchscript";
torch::jit::script::Module yolo_model;
yolo_model = torch::jit::load(model_path);
yolo_model.eval();
yolo_model.to(device, torch::kFloat32);
// Load image and preprocess
cv::Mat image = cv::imread("/path/to/bus.jpg");
cv::Mat input_image;
letterbox(image, input_image, {640, 640});
torch::Tensor image_tensor = torch::from_blob(input_image.data, {input_image.rows, input_image.cols, 3}, torch::kByte).to(device);
image_tensor = image_tensor.toType(torch::kFloat32).div(255);
image_tensor = image_tensor.permute({2, 0, 1});
image_tensor = image_tensor.unsqueeze(0);
std::vector<torch::jit::IValue> inputs {image_tensor};
// Inference
torch::Tensor output = yolo_model.forward(inputs).toTensor().cpu();
// NMS
auto keep = non_max_suppression(output)[0];
auto boxes = keep.index({Slice(), Slice(None, 4)});
keep.index_put_({Slice(), Slice(None, 4)}, scale_boxes({input_image.rows, input_image.cols}, boxes, {image.rows, image.cols}));
// Show the results
for (int i = 0; i < keep.size(0); i++) {
int x1 = keep[i][0].item().toFloat();
int y1 = keep[i][1].item().toFloat();
int x2 = keep[i][2].item().toFloat();
int y2 = keep[i][3].item().toFloat();
float conf = keep[i][4].item().toFloat();
int cls = keep[i][5].item().toInt();
std::cout << "Rect: [" << x1 << "," << y1 << "," << x2 << "," << y2 << "] Conf: " << conf << " Class: " << classes[cls] << std::endl;
}
} catch (const c10::Error& e) {
std::cout << e.msg() << std::endl;
}
return 0;
}

examples/YOLOv8-ONNXRuntime-CPP/CMakeLists.txt (new file, 96 lines)
@@ -0,0 +1,96 @@
cmake_minimum_required(VERSION 3.5)
set(PROJECT_NAME Yolov8OnnxRuntimeCPPInference)
project(${PROJECT_NAME} VERSION 0.0.1 LANGUAGES CXX)
# -------------- Support C++17 for using filesystem ------------------#
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS ON)
set(CMAKE_INCLUDE_CURRENT_DIR ON)
# -------------- OpenCV ------------------#
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# -------------- Compile CUDA for FP16 inference if needed ------------------#
option(USE_CUDA "Enable CUDA support" ON)
if (NOT APPLE AND USE_CUDA)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
add_definitions(-DUSE_CUDA)
else ()
set(USE_CUDA OFF)
endif ()
# -------------- ONNXRUNTIME ------------------#
# Set ONNXRUNTIME_VERSION
set(ONNXRUNTIME_VERSION 1.15.1)
if (WIN32)
if (USE_CUDA)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-win-x64-gpu-${ONNXRUNTIME_VERSION}")
else ()
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-win-x64-${ONNXRUNTIME_VERSION}")
endif ()
elseif (LINUX)
if (USE_CUDA)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-linux-x64-gpu-${ONNXRUNTIME_VERSION}")
else ()
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}")
endif ()
elseif (APPLE)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-arm64-${ONNXRUNTIME_VERSION}")
# Apple X64 binary
# set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-x64-${ONNXRUNTIME_VERSION}")
# Apple Universal binary
# set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-universal2-${ONNXRUNTIME_VERSION}")
endif ()
include_directories(${PROJECT_NAME} ${ONNXRUNTIME_ROOT}/include)
set(PROJECT_SOURCES
main.cpp
inference.h
inference.cpp
)
add_executable(${PROJECT_NAME} ${PROJECT_SOURCES})
if (WIN32)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/onnxruntime.lib)
if (USE_CUDA)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES})
endif ()
elseif (LINUX)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/libonnxruntime.so)
if (USE_CUDA)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES})
endif ()
elseif (APPLE)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/libonnxruntime.dylib)
endif ()
# For windows system, copy onnxruntime.dll to the same folder of the executable file
if (WIN32)
add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different
"${ONNXRUNTIME_ROOT}/lib/onnxruntime.dll"
$<TARGET_FILE_DIR:${PROJECT_NAME}>)
endif ()
# Download https://raw.githubusercontent.com/ultralytics/ultralytics/main/ultralytics/cfg/datasets/coco.yaml
# and put it in the same folder of the executable file
configure_file(coco.yaml ${CMAKE_CURRENT_BINARY_DIR}/coco.yaml COPYONLY)
# Copy yolov8n.onnx file to the same folder of the executable file
configure_file(yolov8n.onnx ${CMAKE_CURRENT_BINARY_DIR}/yolov8n.onnx COPYONLY)
# Create folder name images in the same folder of the executable file
add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/images
)

examples/YOLOv8-ONNXRuntime-CPP/README.md (new file, 107 lines)
@@ -0,0 +1,107 @@
# YOLOv8 OnnxRuntime C++
<img alt="C++" src="https://img.shields.io/badge/C++-17-blue.svg?style=flat&logo=c%2B%2B"> <img alt="Onnx-runtime" src="https://img.shields.io/badge/OnnxRuntime-717272.svg?logo=Onnx&logoColor=white">
This example demonstrates how to perform inference using YOLOv8 in C++ with ONNX Runtime and OpenCV's API.
## Benefits ✨
- Friendly for deployment in the industrial sector.
- Faster than OpenCV's DNN inference on both CPU and GPU.
- Supports FP32 and FP16 CUDA acceleration.
## Note ☕
1. Thanks to Ultralytics' latest release, a `Transpose` op has been added to the YOLOv8 model, which gives YOLOv8 the same output shape as YOLOv5. As a result, you can run inference on YOLOv5/v7/v8 models via this project.
## Exporting YOLOv8 Models 📦
To export YOLOv8 models, use the following Python script:
```python
from ultralytics import YOLO
# Load a YOLOv8 model
model = YOLO("yolov8n.pt")
# Export the model
model.export(format="onnx", opset=12, simplify=True, dynamic=False, imgsz=640)
```
Alternatively, you can export the model with the following terminal command:
```bash
yolo export model=yolov8n.pt opset=12 simplify=True dynamic=False format=onnx imgsz=640,640
```
## Exporting YOLOv8 FP16 Models 📦
```python
import onnx
from onnxconverter_common import float16
model = onnx.load(R'YOUR_ONNX_PATH')
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, R'YOUR_FP16_ONNX_PATH')
```
## Download COCO.yaml file 📂
In order to run the example, you also need to download `coco.yaml`. You can download the file manually from [here](https://raw.githubusercontent.com/ultralytics/ultralytics/main/ultralytics/cfg/datasets/coco.yaml).
## Dependencies ⚙️
| Dependency | Version |
| -------------------------------- | -------------- |
| ONNX Runtime (Linux, Windows, macOS) | >=1.14.1 |
| OpenCV | >=4.0.0 |
| C++ Standard | >=17 |
| CMake | >=3.5 |
| CUDA (optional) | >=11.4 \<12.0 |
| cuDNN (CUDA required) | =8 |
Note: The dependency on C++17 is due to the usage of the C++17 filesystem feature.
Note (2): Due to ONNX Runtime, we need to use CUDA 11 and cuDNN 8. Keep in mind that this requirement might change in the future.
## Build 🛠️
1. Clone the repository to your local machine.
2. Navigate to the root directory of the repository.
3. Create a build directory and navigate to it:
```console
mkdir build && cd build
```
4. Run CMake to generate the build files:
```console
cmake ..
```
5. Build the project:
```console
make
```
6. The built executable should now be located in the `build` directory.
## Usage 🚀
```c++
// Change the parameters as you like
// Pay attention to your device and the ONNX model type (FP32 or FP16)
DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
params.cudaEnable = true;
params.modelType = YOLO_DETECT_V8;
yoloDetector->CreateSession(params);
Detector(yoloDetector);
```
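For context, `yoloDetector` above is created and the class list is loaded before the session is built, roughly as in the `DetectTest()` function of this example's `main.cpp` (a sketch, not a drop-in snippet):

```c++
YOLO_V8* yoloDetector = new YOLO_V8;
ReadCocoYaml(yoloDetector);        // fills yoloDetector->classes from coco.yaml

DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
params.modelType = YOLO_DETECT_V8; // FP32; use YOLO_DETECT_V8_HALF with an FP16 model
params.cudaEnable = true;          // appends the CUDA EP; main.cpp guards this with USE_CUDA

yoloDetector->CreateSession(params);
Detector(yoloDetector);            // runs detection on every image in the ./images folder
```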

examples/YOLOv8-ONNXRuntime-CPP/inference.cpp (new file, 363 lines)
@@ -0,0 +1,363 @@
#include "inference.h"
#include <regex>
#define benchmark
#define min(a,b) (((a) < (b)) ? (a) : (b))
YOLO_V8::YOLO_V8() {
}
YOLO_V8::~YOLO_V8() {
delete session;
}
#ifdef USE_CUDA
namespace Ort
{
template<>
struct TypeToTensorType<half> { static constexpr ONNXTensorElementDataType type = ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16; };
}
#endif
template<typename T>
char* BlobFromImage(cv::Mat& iImg, T& iBlob) {
int channels = iImg.channels();
int imgHeight = iImg.rows;
int imgWidth = iImg.cols;
for (int c = 0; c < channels; c++)
{
for (int h = 0; h < imgHeight; h++)
{
for (int w = 0; w < imgWidth; w++)
{
iBlob[c * imgWidth * imgHeight + h * imgWidth + w] = typename std::remove_pointer<T>::type(
(iImg.at<cv::Vec3b>(h, w)[c]) / 255.0f);
}
}
}
return RET_OK;
}
char* YOLO_V8::PreProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg)
{
if (iImg.channels() == 3)
{
oImg = iImg.clone();
cv::cvtColor(oImg, oImg, cv::COLOR_BGR2RGB);
}
else
{
cv::cvtColor(iImg, oImg, cv::COLOR_GRAY2RGB);
}
switch (modelType)
{
case YOLO_DETECT_V8:
case YOLO_POSE:
case YOLO_DETECT_V8_HALF:
case YOLO_POSE_V8_HALF://LetterBox
{
if (iImg.cols >= iImg.rows)
{
resizeScales = iImg.cols / (float)iImgSize.at(0);
cv::resize(oImg, oImg, cv::Size(iImgSize.at(0), int(iImg.rows / resizeScales)));
}
else
{
resizeScales = iImg.rows / (float)iImgSize.at(0);
cv::resize(oImg, oImg, cv::Size(int(iImg.cols / resizeScales), iImgSize.at(1)));
}
cv::Mat tempImg = cv::Mat::zeros(iImgSize.at(0), iImgSize.at(1), CV_8UC3);
oImg.copyTo(tempImg(cv::Rect(0, 0, oImg.cols, oImg.rows)));
oImg = tempImg;
break;
}
case YOLO_CLS://CenterCrop
{
int h = iImg.rows;
int w = iImg.cols;
int m = min(h, w);
int top = (h - m) / 2;
int left = (w - m) / 2;
cv::resize(oImg(cv::Rect(left, top, m, m)), oImg, cv::Size(iImgSize.at(0), iImgSize.at(1)));
break;
}
}
return RET_OK;
}
char* YOLO_V8::CreateSession(DL_INIT_PARAM& iParams) {
char* Ret = RET_OK;
std::regex pattern("[\u4e00-\u9fa5]");
bool result = std::regex_search(iParams.modelPath, pattern);
if (result)
{
Ret = "[YOLO_V8]:Your model path is error.Change your model path without chinese characters.";
std::cout << Ret << std::endl;
return Ret;
}
try
{
rectConfidenceThreshold = iParams.rectConfidenceThreshold;
iouThreshold = iParams.iouThreshold;
imgSize = iParams.imgSize;
modelType = iParams.modelType;
env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Yolo");
Ort::SessionOptions sessionOption;
if (iParams.cudaEnable)
{
cudaEnable = iParams.cudaEnable;
OrtCUDAProviderOptions cudaOption;
cudaOption.device_id = 0;
sessionOption.AppendExecutionProvider_CUDA(cudaOption);
}
sessionOption.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
sessionOption.SetIntraOpNumThreads(iParams.intraOpNumThreads);
sessionOption.SetLogSeverityLevel(iParams.logSeverityLevel);
#ifdef _WIN32
int ModelPathSize = MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), nullptr, 0);
wchar_t* wide_cstr = new wchar_t[ModelPathSize + 1];
MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), wide_cstr, ModelPathSize);
wide_cstr[ModelPathSize] = L'\0';
const wchar_t* modelPath = wide_cstr;
#else
const char* modelPath = iParams.modelPath.c_str();
#endif // _WIN32
session = new Ort::Session(env, modelPath, sessionOption);
Ort::AllocatorWithDefaultOptions allocator;
size_t inputNodesNum = session->GetInputCount();
for (size_t i = 0; i < inputNodesNum; i++)
{
Ort::AllocatedStringPtr input_node_name = session->GetInputNameAllocated(i, allocator);
char* temp_buf = new char[strlen(input_node_name.get()) + 1];
strcpy(temp_buf, input_node_name.get());
inputNodeNames.push_back(temp_buf);
}
size_t OutputNodesNum = session->GetOutputCount();
for (size_t i = 0; i < OutputNodesNum; i++)
{
Ort::AllocatedStringPtr output_node_name = session->GetOutputNameAllocated(i, allocator);
char* temp_buf = new char[strlen(output_node_name.get()) + 1];
strcpy(temp_buf, output_node_name.get());
outputNodeNames.push_back(temp_buf);
}
options = Ort::RunOptions{ nullptr };
WarmUpSession();
return RET_OK;
}
catch (const std::exception& e)
{
const char* str1 = "[YOLO_V8]:";
const char* str2 = e.what();
std::string result = std::string(str1) + std::string(str2);
char* merged = new char[result.length() + 1];
std::strcpy(merged, result.c_str());
std::cout << merged << std::endl;
delete[] merged;
return "[YOLO_V8]:Create session failed.";
}
}
char* YOLO_V8::RunSession(cv::Mat& iImg, std::vector<DL_RESULT>& oResult) {
#ifdef benchmark
clock_t starttime_1 = clock();
#endif // benchmark
char* Ret = RET_OK;
cv::Mat processedImg;
PreProcess(iImg, imgSize, processedImg);
if (modelType < 4)
{
float* blob = new float[processedImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> inputNodeDims = { 1, 3, imgSize.at(0), imgSize.at(1) };
TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
}
else
{
#ifdef USE_CUDA
half* blob = new half[processedImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> inputNodeDims = { 1,3,imgSize.at(0),imgSize.at(1) };
TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
#endif
}
return Ret;
}
template<typename N>
char* YOLO_V8::TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<DL_RESULT>& oResult) {
Ort::Value inputTensor = Ort::Value::CreateTensor<typename std::remove_pointer<N>::type>(
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
inputNodeDims.data(), inputNodeDims.size());
#ifdef benchmark
clock_t starttime_2 = clock();
#endif // benchmark
auto outputTensor = session->Run(options, inputNodeNames.data(), &inputTensor, 1, outputNodeNames.data(),
outputNodeNames.size());
#ifdef benchmark
clock_t starttime_3 = clock();
#endif // benchmark
Ort::TypeInfo typeInfo = outputTensor.front().GetTypeInfo();
auto tensor_info = typeInfo.GetTensorTypeAndShapeInfo();
std::vector<int64_t> outputNodeDims = tensor_info.GetShape();
auto output = outputTensor.front().GetTensorMutableData<typename std::remove_pointer<N>::type>();
delete[] blob;
switch (modelType)
{
case YOLO_DETECT_V8:
case YOLO_DETECT_V8_HALF:
{
int strideNum = outputNodeDims[1];//8400
int signalResultNum = outputNodeDims[2];//84
std::vector<int> class_ids;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
cv::Mat rawData;
if (modelType == YOLO_DETECT_V8)
{
// FP32
rawData = cv::Mat(strideNum, signalResultNum, CV_32F, output);
}
else
{
// FP16
rawData = cv::Mat(strideNum, signalResultNum, CV_16F, output);
rawData.convertTo(rawData, CV_32F);
}
// Note:
// Ultralytics adds a transpose operator to the output of the YOLOv8 model, which gives YOLOv8/v5/v7 the same output shape.
// https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt
// rawData = rawData.t();
float* data = (float*)rawData.data;
for (int i = 0; i < strideNum; ++i)
{
float* classesScores = data + 4;
cv::Mat scores(1, this->classes.size(), CV_32FC1, classesScores);
cv::Point class_id;
double maxClassScore;
cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
if (maxClassScore > rectConfidenceThreshold)
{
confidences.push_back(maxClassScore);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * resizeScales);
int top = int((y - 0.5 * h) * resizeScales);
int width = int(w * resizeScales);
int height = int(h * resizeScales);
boxes.push_back(cv::Rect(left, top, width, height));
}
data += signalResultNum;
}
std::vector<int> nmsResult;
cv::dnn::NMSBoxes(boxes, confidences, rectConfidenceThreshold, iouThreshold, nmsResult);
for (int i = 0; i < nmsResult.size(); ++i)
{
int idx = nmsResult[i];
DL_RESULT result;
result.classId = class_ids[idx];
result.confidence = confidences[idx];
result.box = boxes[idx];
oResult.push_back(result);
}
#ifdef benchmark
clock_t starttime_4 = clock();
double pre_process_time = (double)(starttime_2 - starttime_1) / CLOCKS_PER_SEC * 1000;
double process_time = (double)(starttime_3 - starttime_2) / CLOCKS_PER_SEC * 1000;
double post_process_time = (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
}
else
{
std::cout << "[YOLO_V8(CPU)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
}
#endif // benchmark
break;
}
case YOLO_CLS:
{
DL_RESULT result;
for (int i = 0; i < this->classes.size(); i++)
{
result.classId = i;
result.confidence = output[i];
oResult.push_back(result);
}
break;
}
default:
std::cout << "[YOLO_V8]: " << "Not support model type." << std::endl;
}
return RET_OK;
}
char* YOLO_V8::WarmUpSession() {
clock_t starttime_1 = clock();
cv::Mat iImg = cv::Mat(cv::Size(imgSize.at(0), imgSize.at(1)), CV_8UC3);
cv::Mat processedImg;
PreProcess(iImg, imgSize, processedImg);
if (modelType < 4)
{
float* blob = new float[iImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> YOLO_input_node_dims = { 1, 3, imgSize.at(0), imgSize.at(1) };
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(),
outputNodeNames.size());
delete[] blob;
clock_t starttime_4 = clock();
double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
}
}
else
{
#ifdef USE_CUDA
half* blob = new half[iImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> YOLO_input_node_dims = { 1,3,imgSize.at(0),imgSize.at(1) };
Ort::Value input_tensor = Ort::Value::CreateTensor<half>(Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(), outputNodeNames.size());
delete[] blob;
clock_t starttime_4 = clock();
double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
}
#endif
}
return RET_OK;
}

examples/YOLOv8-ONNXRuntime-CPP/inference.h (new file, 93 lines)
@@ -0,0 +1,93 @@
#pragma once
#define RET_OK nullptr
#ifdef _WIN32
#include <Windows.h>
#include <direct.h>
#include <io.h>
#endif
#include <string>
#include <vector>
#include <cstdio>
#include <opencv2/opencv.hpp>
#include "onnxruntime_cxx_api.h"
#ifdef USE_CUDA
#include <cuda_fp16.h>
#endif
enum MODEL_TYPE
{
//FLOAT32 MODEL
YOLO_DETECT_V8 = 1,
YOLO_POSE = 2,
YOLO_CLS = 3,
//FLOAT16 MODEL
YOLO_DETECT_V8_HALF = 4,
YOLO_POSE_V8_HALF = 5,
};
typedef struct _DL_INIT_PARAM
{
std::string modelPath;
MODEL_TYPE modelType = YOLO_DETECT_V8;
std::vector<int> imgSize = { 640, 640 };
float rectConfidenceThreshold = 0.6;
float iouThreshold = 0.5;
int keyPointsNum = 2;//Note:kpt number for pose
bool cudaEnable = false;
int logSeverityLevel = 3;
int intraOpNumThreads = 1;
} DL_INIT_PARAM;
typedef struct _DL_RESULT
{
int classId;
float confidence;
cv::Rect box;
std::vector<cv::Point2f> keyPoints;
} DL_RESULT;
class YOLO_V8
{
public:
YOLO_V8();
~YOLO_V8();
public:
char* CreateSession(DL_INIT_PARAM& iParams);
char* RunSession(cv::Mat& iImg, std::vector<DL_RESULT>& oResult);
char* WarmUpSession();
template<typename N>
char* TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<DL_RESULT>& oResult);
char* PreProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg);
std::vector<std::string> classes{};
private:
Ort::Env env;
Ort::Session* session;
bool cudaEnable;
Ort::RunOptions options;
std::vector<const char*> inputNodeNames;
std::vector<const char*> outputNodeNames;
MODEL_TYPE modelType;
std::vector<int> imgSize;
float rectConfidenceThreshold;
float iouThreshold;
float resizeScales;//letterbox scale
};

examples/YOLOv8-ONNXRuntime-CPP/main.cpp (new file, 193 lines)
@@ -0,0 +1,193 @@
#include <iostream>
#include <iomanip>
#include "inference.h"
#include <filesystem>
#include <fstream>
#include <random>
void Detector(YOLO_V8*& p) {
std::filesystem::path current_path = std::filesystem::current_path();
std::filesystem::path imgs_path = current_path / "images";
for (auto& i : std::filesystem::directory_iterator(imgs_path))
{
if (i.path().extension() == ".jpg" || i.path().extension() == ".png" || i.path().extension() == ".jpeg")
{
std::string img_path = i.path().string();
cv::Mat img = cv::imread(img_path);
std::vector<DL_RESULT> res;
p->RunSession(img, res);
for (auto& re : res)
{
cv::RNG rng(cv::getTickCount());
cv::Scalar color(rng.uniform(0, 256), rng.uniform(0, 256), rng.uniform(0, 256));
cv::rectangle(img, re.box, color, 3);
float confidence = floor(100 * re.confidence) / 100;
std::cout << std::fixed << std::setprecision(2);
std::string label = p->classes[re.classId] + " " +
std::to_string(confidence).substr(0, std::to_string(confidence).size() - 4);
cv::rectangle(
img,
cv::Point(re.box.x, re.box.y - 25),
cv::Point(re.box.x + label.length() * 15, re.box.y),
color,
cv::FILLED
);
cv::putText(
img,
label,
cv::Point(re.box.x, re.box.y - 5),
cv::FONT_HERSHEY_SIMPLEX,
0.75,
cv::Scalar(0, 0, 0),
2
);
}
std::cout << "Press any key to exit" << std::endl;
cv::imshow("Result of Detection", img);
cv::waitKey(0);
cv::destroyAllWindows();
}
}
}
void Classifier(YOLO_V8*& p)
{
std::filesystem::path current_path = std::filesystem::current_path();
std::filesystem::path imgs_path = current_path;// / "images"
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<int> dis(0, 255);
for (auto& i : std::filesystem::directory_iterator(imgs_path))
{
if (i.path().extension() == ".jpg" || i.path().extension() == ".png")
{
std::string img_path = i.path().string();
//std::cout << img_path << std::endl;
cv::Mat img = cv::imread(img_path);
std::vector<DL_RESULT> res;
char* ret = p->RunSession(img, res);
float positionY = 50;
for (int i = 0; i < res.size(); i++)
{
int r = dis(gen);
int g = dis(gen);
int b = dis(gen);
cv::putText(img, std::to_string(i) + ":", cv::Point(10, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
cv::putText(img, std::to_string(res.at(i).confidence), cv::Point(70, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
positionY += 50;
}
cv::imshow("TEST_CLS", img);
cv::waitKey(0);
cv::destroyAllWindows();
//cv::imwrite("E:\\output\\" + std::to_string(k) + ".png", img);
}
}
}
int ReadCocoYaml(YOLO_V8*& p) {
// Open the YAML file
std::ifstream file("coco.yaml");
if (!file.is_open())
{
std::cerr << "Failed to open file" << std::endl;
return 1;
}
// Read the file line by line
std::string line;
std::vector<std::string> lines;
while (std::getline(file, line))
{
lines.push_back(line);
}
// Find the start and end of the names section
std::size_t start = 0;
std::size_t end = 0;
for (std::size_t i = 0; i < lines.size(); i++)
{
if (lines[i].find("names:") != std::string::npos)
{
start = i + 1;
}
else if (start > 0 && lines[i].find(':') == std::string::npos)
{
end = i;
break;
}
}
// Extract the names
std::vector<std::string> names;
for (std::size_t i = start; i < end; i++)
{
std::stringstream ss(lines[i]);
std::string name;
std::getline(ss, name, ':'); // Extract the number before the delimiter
std::getline(ss, name); // Extract the string after the delimiter
names.push_back(name);
}
p->classes = names;
return 0;
}
void DetectTest()
{
YOLO_V8* yoloDetector = new YOLO_V8;
ReadCocoYaml(yoloDetector);
DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
#ifdef USE_CUDA
params.cudaEnable = true;
// GPU FP32 inference
params.modelType = YOLO_DETECT_V8;
// GPU FP16 inference
//Note: change fp16 onnx model
//params.modelType = YOLO_DETECT_V8_HALF;
#else
// CPU inference
params.modelType = YOLO_DETECT_V8;
params.cudaEnable = false;
#endif
yoloDetector->CreateSession(params);
Detector(yoloDetector);
}
void ClsTest()
{
YOLO_V8* yoloDetector = new YOLO_V8;
std::string model_path = "cls.onnx";
ReadCocoYaml(yoloDetector);
DL_INIT_PARAM params{ model_path, YOLO_CLS, {224, 224} };
yoloDetector->CreateSession(params);
Classifier(yoloDetector);
}
int main()
{
//DetectTest();
ClsTest();
}

examples/YOLOv8-ONNXRuntime-Rust/Cargo.toml (new file, 21 lines)
@@ -0,0 +1,21 @@
[package]
name = "yolov8-rs"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
clap = { version = "4.2.4", features = ["derive"] }
image = { version = "0.24.7", default-features = false, features = ["jpeg", "png", "webp-encoder"] }
imageproc = { version = "0.23.0", default-features = false }
ndarray = { version = "0.15.6" }
ort = {version = "1.16.3", default-features = false, features = ["load-dynamic", "copy-dylibs", "half"]}
rusttype = { version = "0.9", default-features = false }
anyhow = { version = "1.0.75"}
regex = { version = "1.5.4" }
rand = { version ="0.8.5" }
chrono = { version = "0.4.30" }
half = { version = "2.3.1" }
dirs = { version = "5.0.1" }
ureq = { version = "2.9.1" }

examples/YOLOv8-ONNXRuntime-Rust/README.md (new file, 221 lines)
@@ -0,0 +1,221 @@
# YOLOv8-ONNXRuntime-Rust for All the Key YOLO Tasks
This repository provides a Rust demo for performing YOLOv8 tasks like `Classification`, `Segmentation`, `Detection` and `Pose Detection` using ONNXRuntime.
## Features
- Supports `Classification`, `Segmentation`, `Detection` and `Pose(Keypoints)-Detection` tasks.
- Supports `FP16` & `FP32` ONNX models.
- Supports `CPU`, `CUDA` and `TensorRT` execution providers to accelerate computation.
- Supports dynamic input shapes (`batch`, `width`, `height`).
## Installation
### 1. Install Rust
Please follow the official Rust installation guide (https://www.rust-lang.org/tools/install).
### 2. Install ONNXRuntime
This repository uses the `ort` crate, which is an ONNXRuntime wrapper for Rust. (https://docs.rs/ort/latest/ort/)
You can follow the instructions in the `ort` docs or simply do the following:
- Step 1: Download ONNXRuntime (https://github.com/microsoft/onnxruntime/releases)
- Step 2: Set the `LD_LIBRARY_PATH` environment variable for linking.
On Ubuntu, you can do it like this:
```
vim ~/.bashrc
# Add the path of the ONNXRuntime lib
export LD_LIBRARY_PATH=/home/qweasd/Documents/onnxruntime-linux-x64-gpu-1.16.3/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
```
### 3. \[Optional\] Install CUDA & CuDNN & TensorRT
- CUDA execution provider requires CUDA v11.6+.
- TensorRT execution provider requires CUDA v11.4+ and TensorRT v8.4+.
## Get Started
### 1. Export the YOLOv8 ONNX Models
```bash
pip install -U ultralytics
# export onnx model with dynamic shapes
yolo export model=yolov8m.pt format=onnx simplify dynamic
yolo export model=yolov8m-cls.pt format=onnx simplify dynamic
yolo export model=yolov8m-pose.pt format=onnx simplify dynamic
yolo export model=yolov8m-seg.pt format=onnx simplify dynamic
# export onnx model with constant shapes
yolo export model=yolov8m.pt format=onnx simplify
yolo export model=yolov8m-cls.pt format=onnx simplify
yolo export model=yolov8m-pose.pt format=onnx simplify
yolo export model=yolov8m-seg.pt format=onnx simplify
```
### 2. Run Inference
It will perform inference with the ONNX model on the source image.
```
cargo run --release -- --model <MODEL> --source <SOURCE>
```
Set `--cuda` to use CUDA execution provider to speed up inference.
```
cargo run --release -- --cuda --model <MODEL> --source <SOURCE>
```
Set `--trt` to use TensorRT execution provider, and you can set `--fp16` at the same time to use TensorRT FP16 engine.
```
cargo run --release -- --trt --fp16 --model <MODEL> --source <SOURCE>
```
Set `--device_id` to select which device to run on. If you have only one GPU and set `device_id` to 1, the program will not panic; `ort` will automatically fall back to the `CPU` EP.
```
cargo run --release -- --cuda --device_id 0 --model <MODEL> --source <SOURCE>
```
Set `--batch` to do multi-batch-size inference.
If you're using `--trt`, you can also set `--batch-min` and `--batch-max` to explicitly specify the min/max/opt batch sizes for dynamic batch input (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#explicit-shape-range-for-dynamic-shape-input). (Note that the ONNX model should be exported with dynamic shapes.)
```
cargo run --release -- --cuda --batch 2 --model <MODEL> --source <SOURCE>
```
Set `--height` and `--width` to do dynamic image size inference. (Note that the ONNX model should be exported with dynamic shapes.)
```
cargo run --release -- --cuda --width 480 --height 640 --model <MODEL> --source <SOURCE>
```
Set `--profile` to check the time consumed in each stage. (Note that the model usually needs 1-3 dry runs to warm up. Make sure to run enough times to evaluate the result.)
```
cargo run --release -- --trt --fp16 --profile --model <MODEL> --source <SOURCE>
```
Results: (yolov8m.onnx, batch=1, 3 times, trt, fp16, RTX 3060Ti)
```
==> 0
[Model Preprocess]: 12.75788ms
[ORT H2D]: 237.118µs
[ORT Inference]: 507.895469ms
[ORT D2H]: 191.655µs
[Model Inference]: 508.34589ms
[Model Postprocess]: 1.061122ms
==> 1
[Model Preprocess]: 13.658655ms
[ORT H2D]: 209.975µs
[ORT Inference]: 5.12372ms
[ORT D2H]: 182.389µs
[Model Inference]: 5.530022ms
[Model Postprocess]: 1.04851ms
==> 2
[Model Preprocess]: 12.475332ms
[ORT H2D]: 246.127µs
[ORT Inference]: 5.048432ms
[ORT D2H]: 187.117µs
[Model Inference]: 5.493119ms
[Model Postprocess]: 1.040906ms
```
Additional options:
`--conf`: confidence threshold \[default: 0.3\]
`--iou`: iou threshold in NMS \[default: 0.45\]
`--kconf`: confidence threshold of keypoint \[default: 0.55\]
`--plot`: plot inference result with random RGB color and save
You can check out all CLI arguments by running:
```
git clone https://github.com/ultralytics/ultralytics
cd ultralytics/examples/YOLOv8-ONNXRuntime-Rust
cargo run --release -- --help
```
## Examples
### Classification
Running a dynamic-shape ONNX model on `CPU` with image size `--height 224 --width 224`, saving the plotted image to the `runs` directory.
```
cargo run --release -- --model ../assets/weights/yolov8m-cls-dyn.onnx --source ../assets/images/dog.jpg --height 224 --width 224 --plot --profile
```
You will see a result like:
```
Summary:
> Task: Classify (Ultralytics 8.0.217)
> EP: Cpu
> Dtype: Float32
> Batch: 1 (Dynamic), Height: 224 (Dynamic), Width: 224 (Dynamic)
> nc: 1000 nk: 0, nm: 0, conf: 0.3, kconf: 0.55, iou: 0.45
[Model Preprocess]: 16.363477ms
[ORT H2D]: 50.722µs
[ORT Inference]: 16.295808ms
[ORT D2H]: 8.37µs
[Model Inference]: 16.367046ms
[Model Postprocess]: 3.527µs
[
YOLOResult {
Probs(top5): Some([(208, 0.6950566), (209, 0.13823675), (178, 0.04849795), (215, 0.019029364), (212, 0.016506357)]),
Bboxes: None,
Keypoints: None,
Masks: None,
},
]
```
![2023-11-25-22-02-02-156623351](https://github.com/jamjamjon/ultralytics/assets/51357717/ef75c2ae-c5ab-44cc-9d9e-e60b51e39662)
### Object Detection
Using the `CUDA` EP and dynamic image size `--height 640 --width 480`
```
cargo run --release -- --cuda --model ../assets/weights/yolov8m-dynamic.onnx --source ../assets/images/bus.jpg --plot --height 640 --width 480
```
![det](https://github.com/jamjamjon/ultralytics/assets/51357717/5d89a19d-0c96-4a59-875c-defab6887a2c)
### Pose Detection
Using the `TensorRT` EP
```
cargo run --release -- --trt --model ../assets/weights/yolov8m-pose.onnx --source ../assets/images/bus.jpg --plot
```
![2023-11-25-22-31-45-127054025](https://github.com/jamjamjon/ultralytics/assets/51357717/157b5ba7-bfcf-47cf-bee7-68b62e0de1c4)
### Instance Segmentation
Using the `TensorRT` EP and an FP16 model (`--fp16`)
```
cargo run --release -- --trt --fp16 --model ../assets/weights/yolov8m-seg.onnx --source ../assets/images/0172.jpg --plot
```
![seg](https://github.com/jamjamjon/ultralytics/assets/51357717/cf046f4f-9533-478a-adc7-4de22443a641)

examples/YOLOv8-ONNXRuntime-Rust/src/cli.rs (new file, 87 lines)
@@ -0,0 +1,87 @@
use clap::Parser;
use crate::YOLOTask;
#[derive(Parser, Clone)]
#[command(author, version, about, long_about = None)]
pub struct Args {
/// ONNX model path
#[arg(long, required = true)]
pub model: String,
/// input path
#[arg(long, required = true)]
pub source: String,
/// device id
#[arg(long, default_value_t = 0)]
pub device_id: u32,
/// using TensorRT EP
#[arg(long)]
pub trt: bool,
/// using CUDA EP
#[arg(long)]
pub cuda: bool,
/// input batch size
#[arg(long, default_value_t = 1)]
pub batch: u32,
/// trt input min_batch size
#[arg(long, default_value_t = 1)]
pub batch_min: u32,
/// trt input max_batch size
#[arg(long, default_value_t = 32)]
pub batch_max: u32,
/// using TensorRT --fp16
#[arg(long)]
pub fp16: bool,
/// specify YOLO task
#[arg(long, value_enum)]
pub task: Option<YOLOTask>,
/// num_classes
#[arg(long)]
pub nc: Option<u32>,
/// num_keypoints
#[arg(long)]
pub nk: Option<u32>,
/// num_masks
#[arg(long)]
pub nm: Option<u32>,
/// input image width
#[arg(long)]
pub width: Option<u32>,
/// input image height
#[arg(long)]
pub height: Option<u32>,
/// confidence threshold
#[arg(long, required = false, default_value_t = 0.3)]
pub conf: f32,
/// iou threshold in NMS
#[arg(long, required = false, default_value_t = 0.45)]
pub iou: f32,
/// confidence threshold of keypoint
#[arg(long, required = false, default_value_t = 0.55)]
pub kconf: f32,
/// plot inference result and save
#[arg(long)]
pub plot: bool,
/// check time consumed in each stage
#[arg(long)]
pub profile: bool,
}

View File

@ -0,0 +1,119 @@
#![allow(clippy::type_complexity)]
use std::io::{Read, Write};
pub mod cli;
pub mod model;
pub mod ort_backend;
pub mod yolo_result;
pub use crate::cli::Args;
pub use crate::model::YOLOv8;
pub use crate::ort_backend::{Batch, OrtBackend, OrtConfig, OrtEP, YOLOTask};
pub use crate::yolo_result::{Bbox, Embedding, Point2, YOLOResult};
pub fn non_max_suppression(
xs: &mut Vec<(Bbox, Option<Vec<Point2>>, Option<Vec<f32>>)>,
iou_threshold: f32,
) {
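// Greedy NMS: sort candidates by confidence (descending), then keep a candidate only if its
// IoU with every box already kept does not exceed `iou_threshold`.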
xs.sort_by(|b1, b2| b2.0.confidence().partial_cmp(&b1.0.confidence()).unwrap());
let mut current_index = 0;
for index in 0..xs.len() {
let mut drop = false;
for prev_index in 0..current_index {
let iou = xs[prev_index].0.iou(&xs[index].0);
if iou > iou_threshold {
drop = true;
break;
}
}
if !drop {
xs.swap(current_index, index);
current_index += 1;
}
}
xs.truncate(current_index);
}
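/// Build a timestamp string in Beijing time (UTC+8) used to name saved result images,
/// e.g. `2023-11-25-22-02-02-156623351` when the delimiter is `-`.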
pub fn gen_time_string(delimiter: &str) -> String {
let offset = chrono::FixedOffset::east_opt(8 * 60 * 60).unwrap(); // Beijing
let t_now = chrono::Utc::now().with_timezone(&offset);
let fmt = format!(
"%Y{}%m{}%d{}%H{}%M{}%S{}%f",
delimiter, delimiter, delimiter, delimiter, delimiter, delimiter
);
t_now.format(&fmt).to_string()
}
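/// Keypoint index pairs that are connected when drawing the pose skeleton (17-keypoint layout).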
pub const SKELETON: [(usize, usize); 16] = [
(0, 1),
(0, 2),
(1, 3),
(2, 4),
(5, 6),
(5, 11),
(6, 12),
(11, 12),
(5, 7),
(6, 8),
(7, 9),
(8, 10),
(11, 13),
(12, 14),
(13, 15),
(14, 16),
];
pub fn check_font(font: &str) -> rusttype::Font<'static> {
// check then load font
// ultralytics font path
let font_path_config = match dirs::config_dir() {
Some(mut d) => {
d.push("Ultralytics");
d.push(font);
d
}
None => panic!("Unsupported operating system. Now support Linux, MacOS, Windows."),
};
// current font path
let font_path_current = std::path::PathBuf::from(font);
// check font
let font_path = if font_path_config.exists() {
font_path_config
} else if font_path_current.exists() {
font_path_current
} else {
println!("Downloading font...");
let source_url = "https://ultralytics.com/assets/Arial.ttf";
let resp = ureq::get(source_url)
.timeout(std::time::Duration::from_secs(500))
.call()
.unwrap_or_else(|err| panic!("> Failed to download font: {source_url}: {err:?}"));
// read to buffer
let mut buffer = vec![];
let total_size = resp
.header("Content-Length")
.and_then(|s| s.parse::<u64>().ok())
.unwrap();
let _reader = resp
.into_reader()
.take(total_size)
.read_to_end(&mut buffer)
.unwrap();
// save
let _path = std::fs::File::create(font).unwrap();
let mut writer = std::io::BufWriter::new(_path);
writer.write_all(&buffer).unwrap();
println!("Font saved at: {:?}", font_path_current.display());
font_path_current
};
// load font
let buffer = std::fs::read(font_path).unwrap();
rusttype::Font::try_from_vec(buffer).unwrap()
}

View File

@ -0,0 +1,28 @@
use clap::Parser;
use yolov8_rs::{Args, YOLOv8};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let args = Args::parse();
// 1. load image
let x = image::io::Reader::open(&args.source)?
.with_guessed_format()?
.decode()?;
// 2. the model supports dynamic batch inference, so the input is a Vec of images
let xs = vec![x];
// You can test `--batch 2` with this
// let xs = vec![x.clone(), x];
// 3. build yolov8 model
let mut model = YOLOv8::new(args)?;
model.summary(); // model info
// 4. run
let ys = model.run(&xs)?;
println!("{:?}", ys);
Ok(())
}

View File

@ -0,0 +1,642 @@
#![allow(clippy::type_complexity)]
use anyhow::Result;
use image::{DynamicImage, GenericImageView, ImageBuffer};
use ndarray::{s, Array, Axis, IxDyn};
use rand::{thread_rng, Rng};
use std::path::PathBuf;
use crate::{
check_font, gen_time_string, non_max_suppression, Args, Batch, Bbox, Embedding, OrtBackend,
OrtConfig, OrtEP, Point2, YOLOResult, YOLOTask, SKELETON,
};
pub struct YOLOv8 {
// YOLOv8 model for all yolo-tasks
engine: OrtBackend,
nc: u32,
nk: u32,
nm: u32,
height: u32,
width: u32,
batch: u32,
task: YOLOTask,
conf: f32,
kconf: f32,
iou: f32,
names: Vec<String>,
color_palette: Vec<(u8, u8, u8)>,
profile: bool,
plot: bool,
}
impl YOLOv8 {
pub fn new(config: Args) -> Result<Self> {
// execution provider
let ep = if config.trt {
OrtEP::Trt(config.device_id)
} else if config.cuda {
OrtEP::Cuda(config.device_id)
} else {
OrtEP::Cpu
};
// batch
let batch = Batch {
opt: config.batch,
min: config.batch_min,
max: config.batch_max,
};
// build ort engine
let ort_args = OrtConfig {
ep,
batch,
f: config.model,
task: config.task,
trt_fp16: config.fp16,
image_size: (config.height, config.width),
};
let engine = OrtBackend::build(ort_args)?;
// get batch, height, width, tasks, nc, nk, nm
let (batch, height, width, task) = (
engine.batch(),
engine.height(),
engine.width(),
engine.task(),
);
let nc = engine.nc().or(config.nc).unwrap_or_else(|| {
panic!("Failed to get num_classes, make it explicit with `--nc`");
});
let (nk, nm) = match task {
YOLOTask::Pose => {
let nk = engine.nk().or(config.nk).unwrap_or_else(|| {
panic!("Failed to get num_keypoints, make it explicit with `--nk`");
});
(nk, 0)
}
YOLOTask::Segment => {
let nm = engine.nm().or(config.nm).unwrap_or_else(|| {
panic!("Failed to get num_masks, make it explicit with `--nm`");
});
(0, nm)
}
_ => (0, 0),
};
// class names
let names = engine.names().unwrap_or(vec!["Unknown".to_string()]);
// color palette
let mut rng = thread_rng();
let color_palette: Vec<_> = names
.iter()
.map(|_| {
(
rng.gen_range(0..=255),
rng.gen_range(0..=255),
rng.gen_range(0..=255),
)
})
.collect();
Ok(Self {
engine,
names,
conf: config.conf,
kconf: config.kconf,
iou: config.iou,
color_palette,
profile: config.profile,
plot: config.plot,
nc,
nk,
nm,
height,
width,
batch,
task,
})
}
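/// Return the aspect-ratio-preserving scale factor `r = min(w1 / w0, h1 / h0)` together with
/// the rounded scaled width and height.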
pub fn scale_wh(&self, w0: f32, h0: f32, w1: f32, h1: f32) -> (f32, f32, f32) {
let r = (w1 / w0).min(h1 / h0);
(r, (w0 * r).round(), (h0 * r).round())
}
pub fn preprocess(&mut self, xs: &Vec<DynamicImage>) -> Result<Array<f32, IxDyn>> {
let mut ys =
Array::ones((xs.len(), 3, self.height() as usize, self.width() as usize)).into_dyn();
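// Pre-fill the whole tensor with a constant gray (144/255) so any area not covered by the resized image acts as padding.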
ys.fill(144.0 / 255.0);
for (idx, x) in xs.iter().enumerate() {
let img = match self.task() {
YOLOTask::Classify => x.resize_exact(
self.width(),
self.height(),
image::imageops::FilterType::Triangle,
),
_ => {
let (w0, h0) = x.dimensions();
let w0 = w0 as f32;
let h0 = h0 as f32;
let (_, w_new, h_new) =
self.scale_wh(w0, h0, self.width() as f32, self.height() as f32); // f32 round
x.resize_exact(
w_new as u32,
h_new as u32,
if let YOLOTask::Segment = self.task() {
image::imageops::FilterType::CatmullRom
} else {
image::imageops::FilterType::Triangle
},
)
}
};
for (x, y, rgb) in img.pixels() {
let x = x as usize;
let y = y as usize;
let [r, g, b, _] = rgb.0;
ys[[idx, 0, y, x]] = (r as f32) / 255.0;
ys[[idx, 1, y, x]] = (g as f32) / 255.0;
ys[[idx, 2, y, x]] = (b as f32) / 255.0;
}
}
Ok(ys)
}
pub fn run(&mut self, xs: &Vec<DynamicImage>) -> Result<Vec<YOLOResult>> {
// pre-process
let t_pre = std::time::Instant::now();
let xs_ = self.preprocess(xs)?;
if self.profile {
println!("[Model Preprocess]: {:?}", t_pre.elapsed());
}
// run
let t_run = std::time::Instant::now();
let ys = self.engine.run(xs_, self.profile)?;
if self.profile {
println!("[Model Inference]: {:?}", t_run.elapsed());
}
// post-process
let t_post = std::time::Instant::now();
let ys = self.postprocess(ys, xs)?;
if self.profile {
println!("[Model Postprocess]: {:?}", t_post.elapsed());
}
// plot and save
if self.plot {
self.plot_and_save(&ys, xs, Some(&SKELETON));
}
Ok(ys)
}
pub fn postprocess(
&self,
xs: Vec<Array<f32, IxDyn>>,
xs0: &[DynamicImage],
) -> Result<Vec<YOLOResult>> {
if let YOLOTask::Classify = self.task() {
let mut ys = Vec::new();
let preds = &xs[0];
for batch in preds.axis_iter(Axis(0)) {
ys.push(YOLOResult::new(
Some(Embedding::new(batch.into_owned())),
None,
None,
None,
));
}
Ok(ys)
} else {
const CXYWH_OFFSET: usize = 4; // cxcywh
const KPT_STEP: usize = 3; // xyconf
let preds = &xs[0];
let protos = {
if xs.len() > 1 {
Some(&xs[1])
} else {
None
}
};
let mut ys = Vec::new();
for (idx, anchor) in preds.axis_iter(Axis(0)).enumerate() {
// [bs, 4 + nc + nm, anchors]
// input image
let width_original = xs0[idx].width() as f32;
let height_original = xs0[idx].height() as f32;
let ratio = (self.width() as f32 / width_original)
.min(self.height() as f32 / height_original);
// save each result
let mut data: Vec<(Bbox, Option<Vec<Point2>>, Option<Vec<f32>>)> = Vec::new();
for pred in anchor.axis_iter(Axis(1)) {
// split preds for different tasks
let bbox = pred.slice(s![0..CXYWH_OFFSET]);
let clss = pred.slice(s![CXYWH_OFFSET..CXYWH_OFFSET + self.nc() as usize]);
let kpts = {
if let YOLOTask::Pose = self.task() {
Some(pred.slice(s![pred.len() - KPT_STEP * self.nk() as usize..]))
} else {
None
}
};
let coefs = {
if let YOLOTask::Segment = self.task() {
Some(pred.slice(s![pred.len() - self.nm() as usize..]).to_vec())
} else {
None
}
};
// confidence and id
let (id, &confidence) = clss
.into_iter()
.enumerate()
.reduce(|max, x| if x.1 > max.1 { x } else { max })
.unwrap(); // definitely will not panic!
// confidence filter
if confidence < self.conf {
continue;
}
// bbox re-scale
let cx = bbox[0] / ratio;
let cy = bbox[1] / ratio;
let w = bbox[2] / ratio;
let h = bbox[3] / ratio;
let x = cx - w / 2.;
let y = cy - h / 2.;
let y_bbox = Bbox::new(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
w,
h,
id,
confidence,
);
// kpts
let y_kpts = {
if let Some(kpts) = kpts {
let mut kpts_ = Vec::new();
// rescale
for i in 0..self.nk() as usize {
let kx = kpts[KPT_STEP * i] / ratio;
let ky = kpts[KPT_STEP * i + 1] / ratio;
let kconf = kpts[KPT_STEP * i + 2];
if kconf < self.kconf {
kpts_.push(Point2::default());
} else {
kpts_.push(Point2::new_with_conf(
kx.max(0.0f32).min(width_original),
ky.max(0.0f32).min(height_original),
kconf,
));
}
}
Some(kpts_)
} else {
None
}
};
// data merged
data.push((y_bbox, y_kpts, coefs));
}
// nms
non_max_suppression(&mut data, self.iou);
// decode
let mut y_bboxes: Vec<Bbox> = Vec::new();
let mut y_kpts: Vec<Vec<Point2>> = Vec::new();
let mut y_masks: Vec<Vec<u8>> = Vec::new();
for elem in data.into_iter() {
if let Some(kpts) = elem.1 {
y_kpts.push(kpts)
}
// decode masks
if let Some(coefs) = elem.2 {
let proto = protos.unwrap().slice(s![idx, .., .., ..]);
let (nm, nh, nw) = proto.dim();
// coefs * proto -> mask
let coefs = Array::from_shape_vec((1, nm), coefs)?; // (n, nm)
let proto = proto.to_owned().into_shape((nm, nh * nw))?; // (nm, nh*nw)
let mask = coefs.dot(&proto).into_shape((nh, nw, 1))?; // (nh, nw, n)
// build image from ndarray
let mask_im: ImageBuffer<image::Luma<_>, Vec<f32>> =
match ImageBuffer::from_raw(nw as u32, nh as u32, mask.into_raw_vec()) {
Some(image) => image,
None => panic!("can not create image from ndarray"),
};
let mut mask_im = image::DynamicImage::from(mask_im); // -> dyn
// rescale masks
let (_, w_mask, h_mask) =
self.scale_wh(width_original, height_original, nw as f32, nh as f32);
let mask_cropped = mask_im.crop(0, 0, w_mask as u32, h_mask as u32);
let mask_original = mask_cropped.resize_exact(
// resize_to_fill
width_original as u32,
height_original as u32,
match self.task() {
YOLOTask::Segment => image::imageops::FilterType::CatmullRom,
_ => image::imageops::FilterType::Triangle,
},
);
// crop-mask with bbox
let mut mask_original_cropped = mask_original.into_luma8();
for y in 0..height_original as usize {
for x in 0..width_original as usize {
if x < elem.0.xmin() as usize
|| x > elem.0.xmax() as usize
|| y < elem.0.ymin() as usize
|| y > elem.0.ymax() as usize
{
mask_original_cropped.put_pixel(
x as u32,
y as u32,
image::Luma([0u8]),
);
}
}
}
y_masks.push(mask_original_cropped.into_raw());
}
y_bboxes.push(elem.0);
}
// save each result
let y = YOLOResult {
probs: None,
bboxes: if !y_bboxes.is_empty() {
Some(y_bboxes)
} else {
None
},
keypoints: if !y_kpts.is_empty() {
Some(y_kpts)
} else {
None
},
masks: if !y_masks.is_empty() {
Some(y_masks)
} else {
None
},
};
ys.push(y);
}
Ok(ys)
}
}
pub fn plot_and_save(
&self,
ys: &[YOLOResult],
xs0: &[DynamicImage],
skeletons: Option<&[(usize, usize)]>,
) {
// check font then load
let font = check_font("Arial.ttf");
for (_idb, (img0, y)) in xs0.iter().zip(ys.iter()).enumerate() {
let mut img = img0.to_rgb8();
// draw for classifier
if let Some(probs) = y.probs() {
for (i, k) in probs.topk(5).iter().enumerate() {
let legend = format!("{} {:.2}%", self.names[k.0], k.1);
let scale = 32;
let legend_size = img.width().max(img.height()) / scale;
let x = img.width() / 20;
let y = img.height() / 20 + i as u32 * legend_size;
imageproc::drawing::draw_text_mut(
&mut img,
image::Rgb([0, 255, 0]),
x as i32,
y as i32,
rusttype::Scale::uniform(legend_size as f32 - 1.),
&font,
&legend,
);
}
}
// draw bboxes & keypoints
if let Some(bboxes) = y.bboxes() {
for (_idx, bbox) in bboxes.iter().enumerate() {
// rect
imageproc::drawing::draw_hollow_rect_mut(
&mut img,
imageproc::rect::Rect::at(bbox.xmin() as i32, bbox.ymin() as i32)
.of_size(bbox.width() as u32, bbox.height() as u32),
image::Rgb(self.color_palette[bbox.id()].into()),
);
// text
let legend = format!("{} {:.2}%", self.names[bbox.id()], bbox.confidence());
let scale = 40;
let legend_size = img.width().max(img.height()) / scale;
imageproc::drawing::draw_text_mut(
&mut img,
image::Rgb(self.color_palette[bbox.id()].into()),
bbox.xmin() as i32,
(bbox.ymin() - legend_size as f32) as i32,
rusttype::Scale::uniform(legend_size as f32 - 1.),
&font,
&legend,
);
}
}
// draw kpts
if let Some(keypoints) = y.keypoints() {
for kpts in keypoints.iter() {
for kpt in kpts.iter() {
// filter
if kpt.confidence() < self.kconf {
continue;
}
// draw point
imageproc::drawing::draw_filled_circle_mut(
&mut img,
(kpt.x() as i32, kpt.y() as i32),
2,
image::Rgb([0, 255, 0]),
);
}
// draw skeleton if has
if let Some(skeletons) = skeletons {
for &(idx1, idx2) in skeletons.iter() {
let kpt1 = &kpts[idx1];
let kpt2 = &kpts[idx2];
if kpt1.confidence() < self.kconf || kpt2.confidence() < self.kconf {
continue;
}
imageproc::drawing::draw_line_segment_mut(
&mut img,
(kpt1.x(), kpt1.y()),
(kpt2.x(), kpt2.y()),
image::Rgb([233, 14, 57]),
);
}
}
}
}
// draw mask
if let Some(masks) = y.masks() {
for (mask, _bbox) in masks.iter().zip(y.bboxes().unwrap().iter()) {
let mask_nd: ImageBuffer<image::Luma<_>, Vec<u8>> =
match ImageBuffer::from_vec(img.width(), img.height(), mask.to_vec()) {
Some(image) => image,
None => panic!("can not crate image from ndarray"),
};
for _x in 0..img.width() {
for _y in 0..img.height() {
let mask_p = imageproc::drawing::Canvas::get_pixel(&mask_nd, _x, _y);
if mask_p.0[0] > 0 {
let mut img_p = imageproc::drawing::Canvas::get_pixel(&img, _x, _y);
// img_p.0[2] = self.color_palette[bbox.id()].2 / 2;
// img_p.0[1] = self.color_palette[bbox.id()].1 / 2;
// img_p.0[0] = self.color_palette[bbox.id()].0 / 2;
img_p.0[2] /= 2;
img_p.0[1] = 255 - (255 - img_p.0[2]) / 2;
img_p.0[0] /= 2;
imageproc::drawing::Canvas::draw_pixel(&mut img, _x, _y, img_p)
}
}
}
}
}
// mkdir and save
let mut runs = PathBuf::from("runs");
if !runs.exists() {
std::fs::create_dir_all(&runs).unwrap();
}
runs.push(gen_time_string("-"));
let saveout = format!("{}.jpg", runs.to_str().unwrap());
let _ = img.save(saveout);
}
}
pub fn summary(&self) {
println!(
"\nSummary:\n\
> Task: {:?}{}\n\
> EP: {:?} {}\n\
> Dtype: {:?}\n\
> Batch: {} ({}), Height: {} ({}), Width: {} ({})\n\
> nc: {} nk: {}, nm: {}, conf: {}, kconf: {}, iou: {}\n\
",
self.task(),
match self.engine.author().zip(self.engine.version()) {
Some((author, ver)) => format!(" ({} {})", author, ver),
None => String::from(""),
},
self.engine.ep(),
if let OrtEP::Cpu = self.engine.ep() {
""
} else {
"(May still fall back to CPU)"
},
self.engine.dtype(),
self.batch(),
if self.engine.is_batch_dynamic() {
"Dynamic"
} else {
"Const"
},
self.height(),
if self.engine.is_height_dynamic() {
"Dynamic"
} else {
"Const"
},
self.width(),
if self.engine.is_width_dynamic() {
"Dynamic"
} else {
"Const"
},
self.nc(),
self.nk(),
self.nm(),
self.conf,
self.kconf,
self.iou,
);
}
pub fn engine(&self) -> &OrtBackend {
&self.engine
}
pub fn conf(&self) -> f32 {
self.conf
}
pub fn set_conf(&mut self, val: f32) {
self.conf = val;
}
pub fn conf_mut(&mut self) -> &mut f32 {
&mut self.conf
}
pub fn kconf(&self) -> f32 {
self.kconf
}
pub fn iou(&self) -> f32 {
self.iou
}
pub fn task(&self) -> &YOLOTask {
&self.task
}
pub fn batch(&self) -> u32 {
self.batch
}
pub fn width(&self) -> u32 {
self.width
}
pub fn height(&self) -> u32 {
self.height
}
pub fn nc(&self) -> u32 {
self.nc
}
pub fn nk(&self) -> u32 {
self.nk
}
pub fn nm(&self) -> u32 {
self.nm
}
pub fn names(&self) -> &Vec<String> {
&self.names
}
}

View File

@ -0,0 +1,534 @@
use anyhow::Result;
use clap::ValueEnum;
use half::f16;
use ndarray::{Array, CowArray, IxDyn};
use ort::execution_providers::{CUDAExecutionProviderOptions, TensorRTExecutionProviderOptions};
use ort::tensor::TensorElementDataType;
use ort::{Environment, ExecutionProvider, Session, SessionBuilder, Value};
use regex::Regex;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum)]
pub enum YOLOTask {
// YOLO tasks
Classify,
Detect,
Pose,
Segment,
}
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum OrtEP {
// ONNXRuntime execution provider
Cpu,
Cuda(u32),
Trt(u32),
}
#[derive(Debug)]
pub struct Batch {
pub opt: u32,
pub min: u32,
pub max: u32,
}
impl Default for Batch {
fn default() -> Self {
Self {
opt: 1,
min: 1,
max: 1,
}
}
}
#[derive(Debug, Default)]
pub struct OrtInputs {
// ONNX model inputs attrs
pub shapes: Vec<Vec<i32>>,
pub dtypes: Vec<TensorElementDataType>,
pub names: Vec<String>,
pub sizes: Vec<Vec<u32>>,
}
impl OrtInputs {
pub fn new(session: &Session) -> Self {
let mut shapes = Vec::new();
let mut dtypes = Vec::new();
let mut names = Vec::new();
for i in session.inputs.iter() {
let shape: Vec<i32> = i
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
dtypes.push(i.input_type);
names.push(i.name.clone());
}
Self {
shapes,
dtypes,
names,
..Default::default()
}
}
}
#[derive(Debug)]
pub struct OrtConfig {
// ORT config
pub f: String,
pub task: Option<YOLOTask>,
pub ep: OrtEP,
pub trt_fp16: bool,
pub batch: Batch,
pub image_size: (Option<u32>, Option<u32>),
}
#[derive(Debug)]
pub struct OrtBackend {
// ORT engine
session: Session,
task: YOLOTask,
ep: OrtEP,
batch: Batch,
inputs: OrtInputs,
}
impl OrtBackend {
pub fn build(args: OrtConfig) -> Result<Self> {
// build env & session
let env = Environment::builder()
.with_name("YOLOv8")
.with_log_level(ort::LoggingLevel::Verbose)
.build()?
.into_arc();
let session = SessionBuilder::new(&env)?.with_model_from_file(&args.f)?;
// get inputs
let mut inputs = OrtInputs::new(&session);
// batch size
let mut batch = args.batch;
let batch = if inputs.shapes[0][0] == -1 {
batch
} else {
assert_eq!(
inputs.shapes[0][0] as u32, batch.opt,
"Expected batch size: {}, got {}. Try using `--batch {}`.",
inputs.shapes[0][0] as u32, batch.opt, inputs.shapes[0][0] as u32
);
batch.opt = inputs.shapes[0][0] as u32;
batch
};
// input size: height and width
let height = if inputs.shapes[0][2] == -1 {
match args.image_size.0 {
Some(height) => height,
None => panic!("Failed to get model height. Make it explicit with `--height`"),
}
} else {
inputs.shapes[0][2] as u32
};
let width = if inputs.shapes[0][3] == -1 {
match args.image_size.1 {
Some(width) => width,
None => panic!("Failed to get model width. Make it explicit with `--width`"),
}
} else {
inputs.shapes[0][3] as u32
};
inputs.sizes.push(vec![height, width]);
// build provider
let (ep, provider) = match args.ep {
OrtEP::Cuda(device_id) => Self::set_ep_cuda(device_id),
OrtEP::Trt(device_id) => Self::set_ep_trt(device_id, args.trt_fp16, &batch, &inputs),
_ => (OrtEP::Cpu, ExecutionProvider::CPU(Default::default())),
};
// build session again with the new provider
let session = SessionBuilder::new(&env)?
// .with_optimization_level(ort::GraphOptimizationLevel::Level3)?
.with_execution_providers([provider])?
.with_model_from_file(args.f)?;
// task: using given one or guessing
let task = match args.task {
Some(task) => task,
None => match session.metadata() {
Err(_) => panic!("No metadata found. Try making it explicit by `--task`"),
Ok(metadata) => match metadata.custom("task") {
Err(_) => panic!("Can not get custom value. Try making it explicit by `--task`"),
Ok(value) => match value {
None => panic!("No correspoing value of `task` found in metadata. Make it explicit by `--task`"),
Some(task) => match task.as_str() {
"classify" => YOLOTask::Classify,
"detect" => YOLOTask::Detect,
"pose" => YOLOTask::Pose,
"segment" => YOLOTask::Segment,
x => todo!("{:?} is not supported for now!", x),
},
},
},
},
};
Ok(Self {
session,
task,
ep,
batch,
inputs,
})
}
pub fn fetch_inputs_from_session(
session: &Session,
) -> (Vec<Vec<i32>>, Vec<TensorElementDataType>, Vec<String>) {
// get inputs attrs from ONNX model
let mut shapes = Vec::new();
let mut dtypes = Vec::new();
let mut names = Vec::new();
for i in session.inputs.iter() {
let shape: Vec<i32> = i
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
dtypes.push(i.input_type);
names.push(i.name.clone());
}
(shapes, dtypes, names)
}
pub fn set_ep_cuda(device_id: u32) -> (OrtEP, ExecutionProvider) {
// set CUDA
if ExecutionProvider::CUDA(Default::default()).is_available() {
(
OrtEP::Cuda(device_id),
ExecutionProvider::CUDA(CUDAExecutionProviderOptions {
device_id,
..Default::default()
}),
)
} else {
println!("> CUDA is not available! Using CPU.");
(OrtEP::Cpu, ExecutionProvider::CPU(Default::default()))
}
}
pub fn set_ep_trt(
device_id: u32,
fp16: bool,
batch: &Batch,
inputs: &OrtInputs,
) -> (OrtEP, ExecutionProvider) {
// set TensorRT
if ExecutionProvider::TensorRT(Default::default()).is_available() {
let (height, width) = (inputs.sizes[0][0], inputs.sizes[0][1]);
// dtype match checking
if inputs.dtypes[0] == TensorElementDataType::Float16 && !fp16 {
panic!(
"Dtype mismatch! Expected: Float32, got: {:?}. You should use `--fp16`",
inputs.dtypes[0]
);
}
// dynamic shape: input_tensor_1:dim_1xdim_2x...,input_tensor_2:dim_3xdim_4x...,...
let mut opt_string = String::new();
let mut min_string = String::new();
let mut max_string = String::new();
for name in inputs.names.iter() {
let s_opt = format!("{}:{}x3x{}x{},", name, batch.opt, height, width);
let s_min = format!("{}:{}x3x{}x{},", name, batch.min, height, width);
let s_max = format!("{}:{}x3x{}x{},", name, batch.max, height, width);
opt_string.push_str(s_opt.as_str());
min_string.push_str(s_min.as_str());
max_string.push_str(s_max.as_str());
}
let _ = opt_string.pop();
let _ = min_string.pop();
let _ = max_string.pop();
(
OrtEP::Trt(device_id),
ExecutionProvider::TensorRT(TensorRTExecutionProviderOptions {
device_id,
fp16_enable: fp16,
timing_cache_enable: true,
profile_min_shapes: min_string,
profile_max_shapes: max_string,
profile_opt_shapes: opt_string,
..Default::default()
}),
)
} else {
println!("> TensorRT is not available! Try using CUDA...");
Self::set_ep_cuda(device_id)
}
}
pub fn fetch_from_metadata(&self, key: &str) -> Option<String> {
// fetch value from onnx model file by key
match self.session.metadata() {
Err(_) => None,
Ok(metadata) => match metadata.custom(key) {
Err(_) => None,
Ok(value) => value,
},
}
}
pub fn run(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// ORT inference
match self.dtype() {
TensorElementDataType::Float16 => self.run_fp16(xs, profile),
TensorElementDataType::Float32 => self.run_fp32(xs, profile),
_ => todo!(),
}
}
pub fn run_fp16(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// f32->f16
let t = std::time::Instant::now();
let xs = xs.mapv(f16::from_f32);
if profile {
println!("[ORT f32->f16]: {:?}", t.elapsed());
}
// h2d
let t = std::time::Instant::now();
let xs = CowArray::from(xs);
let xs = vec![Value::from_array(self.session.allocator(), &xs)?];
if profile {
println!("[ORT H2D]: {:?}", t.elapsed());
}
// run
let t = std::time::Instant::now();
let ys = self.session.run(xs)?;
if profile {
println!("[ORT Inference]: {:?}", t.elapsed());
}
// d2h
Ok(ys
.iter()
.map(|x| {
// d2h
let t = std::time::Instant::now();
let x = x.try_extract::<_>().unwrap().view().clone().into_owned();
if profile {
println!("[ORT D2H]: {:?}", t.elapsed());
}
// f16->f32
let t_ = std::time::Instant::now();
let x = x.mapv(f16::to_f32);
if profile {
println!("[ORT f16->f32]: {:?}", t_.elapsed());
}
x
})
.collect::<Vec<Array<_, _>>>())
}
pub fn run_fp32(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// h2d
let t = std::time::Instant::now();
let xs = CowArray::from(xs);
let xs = vec![Value::from_array(self.session.allocator(), &xs)?];
if profile {
println!("[ORT H2D]: {:?}", t.elapsed());
}
// run
let t = std::time::Instant::now();
let ys = self.session.run(xs)?;
if profile {
println!("[ORT Inference]: {:?}", t.elapsed());
}
// d2h
Ok(ys
.iter()
.map(|x| {
let t = std::time::Instant::now();
let x = x.try_extract::<_>().unwrap().view().clone().into_owned();
if profile {
println!("[ORT D2H]: {:?}", t.elapsed());
}
x
})
.collect::<Vec<Array<_, _>>>())
}
pub fn output_shapes(&self) -> Vec<Vec<i32>> {
let mut shapes = Vec::new();
for o in &self.session.outputs {
let shape: Vec<_> = o
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
}
shapes
}
pub fn output_dtypes(&self) -> Vec<TensorElementDataType> {
let mut dtypes = Vec::new();
self.session
.outputs
.iter()
.for_each(|x| dtypes.push(x.output_type));
dtypes
}
pub fn input_shapes(&self) -> &Vec<Vec<i32>> {
&self.inputs.shapes
}
pub fn input_names(&self) -> &Vec<String> {
&self.inputs.names
}
pub fn input_dtypes(&self) -> &Vec<TensorElementDataType> {
&self.inputs.dtypes
}
pub fn dtype(&self) -> TensorElementDataType {
self.input_dtypes()[0]
}
pub fn height(&self) -> u32 {
self.inputs.sizes[0][0]
}
pub fn width(&self) -> u32 {
self.inputs.sizes[0][1]
}
pub fn is_height_dynamic(&self) -> bool {
self.input_shapes()[0][2] == -1
}
pub fn is_width_dynamic(&self) -> bool {
self.input_shapes()[0][3] == -1
}
pub fn batch(&self) -> u32 {
self.batch.opt
}
pub fn is_batch_dynamic(&self) -> bool {
self.input_shapes()[0][0] == -1
}
pub fn ep(&self) -> &OrtEP {
&self.ep
}
pub fn task(&self) -> YOLOTask {
self.task.clone()
}
pub fn names(&self) -> Option<Vec<String>> {
// class names, metadata parsing
// String format: `{0: 'person', 1: 'bicycle', 2: 'sports ball', ..., 27: "yellow_lady's_slipper"}`
match self.fetch_from_metadata("names") {
Some(names) => {
let re = Regex::new(r#"(['"])([-()\w '"]+)(['"])"#).unwrap();
let mut names_ = vec![];
for (_, [_, name, _]) in re.captures_iter(&names).map(|x| x.extract()) {
names_.push(name.to_string());
}
Some(names_)
}
None => None,
}
}
pub fn nk(&self) -> Option<u32> {
// num_keypoints, parsed from the `kpt_shape` metadata string, e.g. `[17, 3]`
match self.fetch_from_metadata("kpt_shape") {
None => None,
Some(kpt_string) => {
let re = Regex::new(r"([0-9]+), ([0-9]+)").unwrap();
let caps = re.captures(&kpt_string).unwrap();
Some(caps.get(1).unwrap().as_str().parse::<u32>().unwrap())
}
}
}
pub fn nc(&self) -> Option<u32> {
// num_classes
match self.names() {
// by names
Some(names) => Some(names.len() as u32),
None => match self.task() {
// by task calculation
YOLOTask::Classify => Some(self.output_shapes()[0][1] as u32),
YOLOTask::Detect => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc
Some(self.output_shapes()[0][1] as u32 - 4)
}
}
YOLOTask::Pose => {
match self.nk() {
None => None,
Some(nk) => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc + 3 * nk
Some(self.output_shapes()[0][1] as u32 - 4 - 3 * nk)
}
}
}
}
YOLOTask::Segment => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc + nm
Some((self.output_shapes()[0][1] - self.output_shapes()[1][1]) as u32 - 4)
}
}
},
}
}
pub fn nm(&self) -> Option<u32> {
// num_masks
match self.task() {
YOLOTask::Segment => Some(self.output_shapes()[1][1] as u32),
_ => None,
}
}
pub fn na(&self) -> Option<u32> {
// num_anchors
match self.task() {
YOLOTask::Segment | YOLOTask::Detect | YOLOTask::Pose => {
if self.output_shapes()[0][2] == -1 {
None
} else {
Some(self.output_shapes()[0][2] as u32)
}
}
_ => None,
}
}
pub fn author(&self) -> Option<String> {
self.fetch_from_metadata("author")
}
pub fn version(&self) -> Option<String> {
self.fetch_from_metadata("version")
}
}

View File

@ -0,0 +1,235 @@
use ndarray::{Array, Axis, IxDyn};
#[derive(Clone, PartialEq, Default)]
pub struct YOLOResult {
// YOLO tasks results of an image
pub probs: Option<Embedding>,
pub bboxes: Option<Vec<Bbox>>,
pub keypoints: Option<Vec<Vec<Point2>>>,
pub masks: Option<Vec<Vec<u8>>>,
}
impl std::fmt::Debug for YOLOResult {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("YOLOResult")
.field(
"Probs(top5)",
&format_args!("{:?}", self.probs().map(|probs| probs.topk(5))),
)
.field("Bboxes", &self.bboxes)
.field("Keypoints", &self.keypoints)
.field(
"Masks",
&format_args!("{:?}", self.masks().map(|masks| masks.len())),
)
.finish()
}
}
impl YOLOResult {
pub fn new(
probs: Option<Embedding>,
bboxes: Option<Vec<Bbox>>,
keypoints: Option<Vec<Vec<Point2>>>,
masks: Option<Vec<Vec<u8>>>,
) -> Self {
Self {
probs,
bboxes,
keypoints,
masks,
}
}
pub fn probs(&self) -> Option<&Embedding> {
self.probs.as_ref()
}
pub fn keypoints(&self) -> Option<&Vec<Vec<Point2>>> {
self.keypoints.as_ref()
}
pub fn masks(&self) -> Option<&Vec<Vec<u8>>> {
self.masks.as_ref()
}
pub fn bboxes(&self) -> Option<&Vec<Bbox>> {
self.bboxes.as_ref()
}
pub fn bboxes_mut(&mut self) -> Option<&mut Vec<Bbox>> {
self.bboxes.as_mut()
}
}
#[derive(Debug, PartialEq, Clone, Default)]
pub struct Point2 {
// A point2d with x, y, conf
x: f32,
y: f32,
confidence: f32,
}
impl Point2 {
pub fn new_with_conf(x: f32, y: f32, confidence: f32) -> Self {
Self { x, y, confidence }
}
pub fn new(x: f32, y: f32) -> Self {
Self {
x,
y,
..Default::default()
}
}
pub fn x(&self) -> f32 {
self.x
}
pub fn y(&self) -> f32 {
self.y
}
pub fn confidence(&self) -> f32 {
self.confidence
}
}
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Embedding {
// A float32 n-dimensional tensor
data: Array<f32, IxDyn>,
}
impl Embedding {
pub fn new(data: Array<f32, IxDyn>) -> Self {
Self { data }
}
pub fn data(&self) -> &Array<f32, IxDyn> {
&self.data
}
pub fn topk(&self, k: usize) -> Vec<(usize, f32)> {
let mut probs = self
.data
.iter()
.enumerate()
.map(|(a, b)| (a, *b))
.collect::<Vec<_>>();
probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
let mut topk = Vec::new();
for &(id, confidence) in probs.iter().take(k) {
topk.push((id, confidence));
}
topk
}
pub fn norm(&self) -> Array<f32, IxDyn> {
let std_ = self.data.mapv(|x| x * x).sum_axis(Axis(0)).mapv(f32::sqrt);
self.data.clone() / std_
}
pub fn top1(&self) -> (usize, f32) {
self.topk(1)[0]
}
}
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Bbox {
// a bounding box around an object
xmin: f32,
ymin: f32,
width: f32,
height: f32,
id: usize,
confidence: f32,
}
impl Bbox {
pub fn new_from_xywh(xmin: f32, ymin: f32, width: f32, height: f32) -> Self {
Self {
xmin,
ymin,
width,
height,
..Default::default()
}
}
pub fn new(xmin: f32, ymin: f32, width: f32, height: f32, id: usize, confidence: f32) -> Self {
Self {
xmin,
ymin,
width,
height,
id,
confidence,
}
}
pub fn width(&self) -> f32 {
self.width
}
pub fn height(&self) -> f32 {
self.height
}
pub fn xmin(&self) -> f32 {
self.xmin
}
pub fn ymin(&self) -> f32 {
self.ymin
}
pub fn xmax(&self) -> f32 {
self.xmin + self.width
}
pub fn ymax(&self) -> f32 {
self.ymin + self.height
}
pub fn tl(&self) -> Point2 {
Point2::new(self.xmin, self.ymin)
}
pub fn br(&self) -> Point2 {
Point2::new(self.xmax(), self.ymax())
}
pub fn cxcy(&self) -> Point2 {
Point2::new(self.xmin + self.width / 2., self.ymin + self.height / 2.)
}
pub fn id(&self) -> usize {
self.id
}
pub fn confidence(&self) -> f32 {
self.confidence
}
pub fn area(&self) -> f32 {
self.width * self.height
}
pub fn intersection_area(&self, another: &Bbox) -> f32 {
let l = self.xmin.max(another.xmin);
let r = (self.xmin + self.width).min(another.xmin + another.width);
let t = self.ymin.max(another.ymin);
let b = (self.ymin + self.height).min(another.ymin + another.height);
(r - l + 1.).max(0.) * (b - t + 1.).max(0.)
}
pub fn union(&self, another: &Bbox) -> f32 {
self.area() + another.area() - self.intersection_area(another)
}
pub fn iou(&self, another: &Bbox) -> f32 {
self.intersection_area(another) / self.union(another)
}
}

View File

@ -0,0 +1,43 @@
# YOLOv8 - ONNX Runtime
This project implements YOLOv8 using ONNX Runtime.
## Installation
To run this project, you need to install the required dependencies. The following instructions will guide you through the installation process.
### Installing Required Dependencies
You can install the required dependencies by running the following command:
```bash
pip install -r requirements.txt
```
### Installing `onnxruntime-gpu`
If you have an NVIDIA GPU and want to leverage GPU acceleration, you can install the onnxruntime-gpu package using the following command:
```bash
pip install onnxruntime-gpu
```
Note: Make sure you have the appropriate GPU drivers installed on your system.
### Installing `onnxruntime` (CPU version)
If you don't have an NVIDIA GPU or prefer to use the CPU version of onnxruntime, you can install the onnxruntime package using the following command:
```bash
pip install onnxruntime
```
### Usage
After successfully installing the required packages, you can run the YOLOv8 implementation using the following command:
```bash
python main.py --model yolov8n.onnx --img image.jpg --conf-thres 0.5 --iou-thres 0.5
```
Make sure to replace `yolov8n.onnx` with the path to your YOLOv8 ONNX model file and `image.jpg` with the path to your input image, and adjust the confidence threshold (`--conf-thres`) and IoU threshold (`--iou-thres`) values as needed.
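If you don't already have an ONNX model, one way to produce it is to export a PyTorch checkpoint with the `ultralytics` Python API (a minimal sketch; the checkpoint name and image size below are only examples):
```python
from ultralytics import YOLO

# Export a YOLOv8 detection checkpoint to ONNX; the .onnx file is written next to the .pt file
model = YOLO("yolov8n.pt")
model.export(format="onnx", imgsz=640)
```
The resulting `yolov8n.onnx` can then be passed to `main.py` via `--model`.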

View File

@ -0,0 +1,231 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
import onnxruntime as ort
import torch
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_requirements, check_yaml
class YOLOv8:
"""YOLOv8 object detection model class for handling inference and visualization."""
def __init__(self, onnx_model, input_image, confidence_thres, iou_thres):
"""
Initializes an instance of the YOLOv8 class.
Args:
onnx_model: Path to the ONNX model.
input_image: Path to the input image.
confidence_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.onnx_model = onnx_model
self.input_image = input_image
self.confidence_thres = confidence_thres
self.iou_thres = iou_thres
# Load the class names from the COCO dataset
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Generate a color palette for the classes
self.color_palette = np.random.uniform(0, 255, size=(len(self.classes), 3))
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f"{self.classes[class_id]}: {score:.2f}"
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(
img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED
)
# Draw the label text on the image
cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(self):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
self.img = cv2.imread(self.input_image)
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
# Convert the image color space from BGR to RGB
img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)
# Resize the image to match the input shape
img = cv2.resize(img, (self.input_width, self.input_height))
# Normalize the image data by dividing it by 255.0
image_data = np.array(img) / 255.0
# Transpose the image to have the channel dimension as the first dimension
image_data = np.transpose(image_data, (2, 0, 1)) # Channel first
# Expand the dimensions of the image data to match the expected input shape
image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
# Return the preprocessed image data
return image_data
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
# Transpose and squeeze the output to match the expected shape
outputs = np.transpose(np.squeeze(output[0]))
# Get the number of rows in the outputs array
rows = outputs.shape[0]
# Lists to store the bounding boxes, scores, and class IDs of the detections
boxes = []
scores = []
class_ids = []
# Calculate the scaling factors for the bounding box coordinates
x_factor = self.img_width / self.input_width
y_factor = self.img_height / self.input_height
# Iterate over each row in the outputs array
for i in range(rows):
# Extract the class scores from the current row
classes_scores = outputs[i][4:]
# Find the maximum score among the class scores
max_score = np.amax(classes_scores)
# If the maximum score is above the confidence threshold
if max_score >= self.confidence_thres:
# Get the class ID with the highest score
class_id = np.argmax(classes_scores)
# Extract the bounding box coordinates from the current row
x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]
# Calculate the scaled coordinates of the bounding box
left = int((x - w / 2) * x_factor)
top = int((y - h / 2) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
# Add the class ID, score, and box coordinates to the respective lists
class_ids.append(class_id)
scores.append(max_score)
boxes.append([left, top, width, height])
# Apply non-maximum suppression to filter out overlapping bounding boxes
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confidence_thres, self.iou_thres)
# Iterate over the selected indices after non-maximum suppression
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
score = scores[i]
class_id = class_ids[i]
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
# Return the modified input image
return input_image
def main(self):
"""
Performs inference using an ONNX model and returns the output image with drawn detections.
Returns:
output_img: The output image with drawn detections.
"""
# Create an inference session using the ONNX model and specify execution providers
session = ort.InferenceSession(self.onnx_model, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# Get the model inputs
model_inputs = session.get_inputs()
# Store the shape of the input for later use
input_shape = model_inputs[0].shape
self.input_width = input_shape[2]
self.input_height = input_shape[3]
# Preprocess the image data
img_data = self.preprocess()
# Run inference using the preprocessed image data
outputs = session.run(None, {model_inputs[0].name: img_data})
# Perform post-processing on the outputs to obtain output image.
return self.postprocess(self.img, outputs) # output image
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, default="yolov8n.onnx", help="Input your ONNX model.")
parser.add_argument("--img", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image.")
parser.add_argument("--conf-thres", type=float, default=0.5, help="Confidence threshold")
parser.add_argument("--iou-thres", type=float, default=0.5, help="NMS IoU threshold")
args = parser.parse_args()
# Check the requirements and select the appropriate backend (CPU or GPU)
check_requirements("onnxruntime-gpu" if torch.cuda.is_available() else "onnxruntime")
# Create an instance of the YOLOv8 class with the specified arguments
detection = YOLOv8(args.model, args.img, args.conf_thres, args.iou_thres)
# Perform object detection and obtain the output image
output_image = detection.main()
# Display the output image in a window
cv2.namedWindow("Output", cv2.WINDOW_NORMAL)
cv2.imshow("Output", output_image)
# Wait for a key press to exit
cv2.waitKey(0)

View File

@ -0,0 +1,19 @@
# YOLOv8 - OpenCV
An implementation of YOLOv8 inference with OpenCV using the ONNX format.
Simply clone and run:
```bash
pip install -r requirements.txt
python main.py --model yolov8n.onnx --img image.jpg
```
If you start from scratch:
```bash
pip install ultralytics
yolo export model=yolov8n.pt imgsz=640 format=onnx opset=12
```
_\*Make sure to include `opset=12`_

View File

@ -0,0 +1,130 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2.dnn
import numpy as np
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
CLASSES = yaml_load(check_yaml("coco128.yaml"))["names"]
colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))
def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
"""
Draws bounding boxes on the input image based on the provided arguments.
Args:
img (numpy.ndarray): The input image to draw the bounding box on.
class_id (int): Class ID of the detected object.
confidence (float): Confidence score of the detected object.
x (int): X-coordinate of the top-left corner of the bounding box.
y (int): Y-coordinate of the top-left corner of the bounding box.
x_plus_w (int): X-coordinate of the bottom-right corner of the bounding box.
y_plus_h (int): Y-coordinate of the bottom-right corner of the bounding box.
"""
label = f"{CLASSES[class_id]} ({confidence:.2f})"
color = colors[class_id]
cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
cv2.putText(img, label, (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
def main(onnx_model, input_image):
"""
Main function to load ONNX model, perform inference, draw bounding boxes, and display the output image.
Args:
onnx_model (str): Path to the ONNX model.
input_image (str): Path to the input image.
Returns:
list: List of dictionaries containing detection information such as class_id, class_name, confidence, etc.
"""
# Load the ONNX model
model: cv2.dnn.Net = cv2.dnn.readNetFromONNX(onnx_model)
# Read the input image
original_image: np.ndarray = cv2.imread(input_image)
[height, width, _] = original_image.shape
# Prepare a square image for inference
length = max((height, width))
image = np.zeros((length, length, 3), np.uint8)
image[0:height, 0:width] = original_image
# Calculate scale factor
scale = length / 640
# Preprocess the image and prepare blob for model
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255, size=(640, 640), swapRB=True)
model.setInput(blob)
# Perform inference
outputs = model.forward()
# Prepare output array
outputs = np.array([cv2.transpose(outputs[0])])
rows = outputs.shape[1]
boxes = []
scores = []
class_ids = []
# Iterate through output to collect bounding boxes, confidence scores, and class IDs
for i in range(rows):
classes_scores = outputs[0][i][4:]
(minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
if maxScore >= 0.25:
box = [
outputs[0][i][0] - (0.5 * outputs[0][i][2]),
outputs[0][i][1] - (0.5 * outputs[0][i][3]),
outputs[0][i][2],
outputs[0][i][3],
]
boxes.append(box)
scores.append(maxScore)
class_ids.append(maxClassIndex)
# Apply NMS (Non-maximum suppression)
result_boxes = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45, 0.5)
detections = []
# Iterate through NMS results to draw bounding boxes and labels
for i in range(len(result_boxes)):
index = result_boxes[i]
box = boxes[index]
detection = {
"class_id": class_ids[index],
"class_name": CLASSES[class_ids[index]],
"confidence": scores[index],
"box": box,
"scale": scale,
}
detections.append(detection)
draw_bounding_box(
original_image,
class_ids[index],
scores[index],
round(box[0] * scale),
round(box[1] * scale),
round((box[0] + box[2]) * scale),
round((box[1] + box[3]) * scale),
)
# Display the image with bounding boxes
cv2.imshow("image", original_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
return detections
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="yolov8n.onnx", help="Input your ONNX model.")
parser.add_argument("--img", default=str(ASSETS / "bus.jpg"), help="Path to input image.")
args = parser.parse_args()
main(args.model, args.img)

View File

@ -0,0 +1,65 @@
# YOLOv8 - Int8-TFLite Runtime
Welcome to the YOLOv8 Int8 TFLite Runtime project for efficient and optimized object detection. This README provides comprehensive instructions for installing and using our YOLOv8 implementation.
## Installation
Ensure a smooth setup by following these steps to install necessary dependencies.
### Installing Required Dependencies
Install all required dependencies with this simple command:
```bash
pip install -r requirements.txt
```
### Installing `tflite-runtime`
To load TFLite models, install the `tflite-runtime` package using:
```bash
pip install tflite-runtime
```
### Installing `tensorflow-gpu` (For NVIDIA GPU Users)
Leverage GPU acceleration with NVIDIA GPUs by installing `tensorflow-gpu`:
```bash
pip install tensorflow-gpu
```
**Note:** Ensure you have compatible GPU drivers installed on your system.
### Installing `tensorflow` (CPU Version)
For CPU usage or non-NVIDIA GPUs, install TensorFlow with:
```bash
pip install tensorflow
```
## Usage
Follow these instructions to run YOLOv8 after successful installation.
Convert the YOLOv8 model to Int8 TFLite format:
```bash
yolo export model=yolov8n.pt imgsz=640 format=tflite int8
```
Locate the Int8 TFLite model in the `yolov8n_saved_model` directory and choose the `*_full_integer_quant.tflite` file (you can verify the quantization at [Netron](https://netron.app/)). Then, execute the following in your terminal:
```bash
python main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
```
Replace `yolov8n_full_integer_quant.tflite` with the path to your model file, `image.jpg` with your input image, and adjust the confidence (`--conf-thres`) and IoU (`--iou-thres`) thresholds as necessary.
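If you would rather check the quantization parameters programmatically than open the model in Netron, here is a minimal sketch (it assumes `tflite-runtime` is installed and uses the model path from the command above):
```python
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="yolov8n_full_integer_quant.tflite")
interpreter.allocate_tensors()

# Each tensor detail carries a (scale, zero_point) pair used to (de)quantize int8 data
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    scale, zero_point = detail["quantization"]
    print(detail["name"], detail["dtype"], "scale:", scale, "zero_point:", zero_point)
```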
### Output
The output is displayed as annotated images, showcasing the model's detection capabilities:
![image](https://github.com/wamiqraza/Attribute-recognition-and-reidentification-Market1501-dataset/blob/main/img/bus.jpg)

View File

@ -0,0 +1,299 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
from tflite_runtime import interpreter as tflite
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
# Declared as global variables; can be updated based on the trained model's input image size
img_width = 640
img_height = 640
class LetterBox:
def __init__(
self, new_shape=(img_width, img_height), auto=False, scaleFill=False, scaleup=True, center=True, stride=32
):
self.new_shape = new_shape
self.auto = auto
self.scaleFill = scaleFill
self.scaleup = scaleup
self.stride = stride
self.center = center # Put the image in the middle or top-left
def __call__(self, labels=None, image=None):
"""Return updated labels and image with added border."""
if labels is None:
labels = {}
img = labels.get("img") if image is None else image
shape = img.shape[:2] # current shape [height, width]
new_shape = labels.pop("rect_shape", self.new_shape)
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not self.scaleup: # only scale down, do not scale up (for better val mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if self.auto: # minimum rectangle
dw, dh = np.mod(dw, self.stride), np.mod(dh, self.stride) # wh padding
elif self.scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
if self.center:
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)) if self.center else 0, int(round(dh + 0.1))
left, right = int(round(dw - 0.1)) if self.center else 0, int(round(dw + 0.1))
img = cv2.copyMakeBorder(
img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114)
) # add border
if labels.get("ratio_pad"):
labels["ratio_pad"] = (labels["ratio_pad"], (left, top)) # for evaluation
if len(labels):
labels = self._update_labels(labels, ratio, dw, dh)
labels["img"] = img
labels["resized_shape"] = new_shape
return labels
else:
return img
def _update_labels(self, labels, ratio, padw, padh):
"""Update labels."""
labels["instances"].convert_bbox(format="xyxy")
labels["instances"].denormalize(*labels["img"].shape[:2][::-1])
labels["instances"].scale(*ratio)
labels["instances"].add_padding(padw, padh)
return labels
class Yolov8TFLite:
def __init__(self, tflite_model, input_image, confidence_thres, iou_thres):
"""
Initializes an instance of the Yolov8TFLite class.
Args:
tflite_model: Path to the TFLite model.
input_image: Path to the input image.
confidence_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.tflite_model = tflite_model
self.input_image = input_image
self.confidence_thres = confidence_thres
self.iou_thres = iou_thres
# Load the class names from the COCO dataset
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Generate a color palette for the classes
self.color_palette = np.random.uniform(0, 255, size=(len(self.classes), 3))
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f"{self.classes[class_id]}: {score:.2f}"
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(
img,
(int(label_x), int(label_y - label_height)),
(int(label_x + label_width), int(label_y + label_height)),
color,
cv2.FILLED,
)
# Draw the label text on the image
cv2.putText(img, label, (int(label_x), int(label_y)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(self):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
self.img = cv2.imread(self.input_image)
print("image before", self.img)
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
letterbox = LetterBox(new_shape=[img_width, img_height], auto=False, stride=32)
image = letterbox(image=self.img)
image = [image]
image = np.stack(image)
image = image[..., ::-1].transpose((0, 3, 1, 2))
img = np.ascontiguousarray(image)
# n, h, w, c
image = img.astype(np.float32)
return image / 255
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
boxes = []
scores = []
class_ids = []
for pred in output:
pred = np.transpose(pred)
for box in pred:
x, y, w, h = box[:4]
x1 = x - w / 2
y1 = y - h / 2
boxes.append([x1, y1, w, h])
idx = np.argmax(box[4:])
scores.append(box[idx + 4])
class_ids.append(idx)
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confidence_thres, self.iou_thres)
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
gain = min(img_width / self.img_width, img_height / self.img_height)
pad = (
round((img_width - self.img_width * gain) / 2 - 0.1),
round((img_height - self.img_height * gain) / 2 - 0.1),
)
box[0] = (box[0] - pad[0]) / gain
box[1] = (box[1] - pad[1]) / gain
box[2] = box[2] / gain
box[3] = box[3] / gain
score = scores[i]
class_id = class_ids[i]
if score > 0.25:
print(box, score, class_id)
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
return input_image
def main(self):
"""
Performs inference using a TFLite model and returns the output image with drawn detections.
Returns:
output_img: The output image with drawn detections.
"""
# Create an interpreter for the TFLite model
interpreter = tflite.Interpreter(model_path=self.tflite_model)
self.model = interpreter
interpreter.allocate_tensors()
# Get the model inputs
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Store the shape of the input for later use
input_shape = input_details[0]["shape"]
self.input_width = input_shape[1]
self.input_height = input_shape[2]
# Preprocess the image data
img_data = self.preprocess()
# Set the input tensor to the interpreter
print(input_details[0]["index"])
print(img_data.shape)
img_data = img_data.transpose((0, 2, 3, 1))
scale, zero_point = input_details[0]["quantization"]
interpreter.set_tensor(input_details[0]["index"], img_data)
# Run inference
interpreter.invoke()
# Get the output tensor from the interpreter
output = interpreter.get_tensor(output_details[0]["index"])
scale, zero_point = output_details[0]["quantization"]
output = (output.astype(np.float32) - zero_point) * scale
output[:, [0, 2]] *= img_width
output[:, [1, 3]] *= img_height
print(output)
# Perform post-processing on the outputs to obtain output image.
return self.postprocess(self.img, output)
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument(
"--model", type=str, default="yolov8n_full_integer_quant.tflite", help="Input your TFLite model."
)
parser.add_argument("--img", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image.")
parser.add_argument("--conf-thres", type=float, default=0.5, help="Confidence threshold")
parser.add_argument("--iou-thres", type=float, default=0.5, help="NMS IoU threshold")
args = parser.parse_args()
# Create an instance of the Yolov8TFLite class with the specified arguments
detection = Yolov8TFLite(args.model, args.img, args.conf_thres, args.iou_thres)
# Perform object detection and obtain the output image
output_image = detection.main()
# Display the output image in a window
cv2.imshow("Output", output_image)
# Wait for a key press to exit
cv2.waitKey(0)

View File

@ -0,0 +1,123 @@
# Regions Counting Using YOLOv8 (Inference on Video)
- Region counting is a method employed to tally the objects within a specified area, allowing for more sophisticated analyses when multiple regions are considered. These regions can be adjusted interactively using a Left Mouse Click, and the counting process occurs in real time.
- Regions can be adjusted to suit the user's preferences and requirements.
<div>
<p align="center">
<img src="https://github.com/RizwanMunawar/ultralytics/assets/62513924/5ab3bbd7-fd12-4849-928e-5f294d6c3fcf" width="45%" alt="YOLOv8 region counting visual 1">
<img src="https://github.com/RizwanMunawar/ultralytics/assets/62513924/e7c1aea7-474d-4d78-8d48-b50854ffe1ca" width="45%" alt="YOLOv8 region counting visual 2">
</p>
</div>
## Table of Contents
- [Step 1: Install the Required Libraries](#step-1-install-the-required-libraries)
- [Step 2: Run the Region Counting Using Ultralytics YOLOv8](#step-2-run-the-region-counting-using-ultralytics-yolov8)
- [Usage Options](#usage-options)
- [FAQ](#faq)
## Step 1: Install the Required Libraries
Clone the repository, install the dependencies, and `cd` into this local directory to run the commands in Step 2.
```bash
# Clone ultralytics repo
git clone https://github.com/ultralytics/ultralytics
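# Install dependencies (the script uses the ultralytics and shapely packages)
pip install ultralytics shapely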
# cd to local directory
cd ultralytics/examples/YOLOv8-Region-Counter
```
## Step 2: Run the Region Counting Using Ultralytics YOLOv8
Here are the basic commands for running the inference:
### Note
After the video begins playing, you can freely move the region anywhere within the video by simply clicking and dragging using the left mouse button.
```bash
# If you want to save results
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --view-img
# If you want to run model on CPU
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --view-img --device cpu
# If you want to change model file
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --weights "path/to/model.pt"
# If you want to detect specific class (first class and third class)
python yolov8_region_counter.py --source "path/to/video.mp4" --classes 0 2 --weights "path/to/model.pt"
# If you don't want to save results
python yolov8_region_counter.py --source "path/to/video.mp4" --view-img
```
## Usage Options
- `--source`: Specifies the path to the video file you want to run inference on.
- `--device`: Specifies the processing device: `cpu` or a GPU index such as `0`.
- `--save-img`: Flag to save the detection results as images.
- `--weights`: Specifies a different YOLOv8 model file (e.g., `yolov8n.pt`, `yolov8s.pt`, `yolov8m.pt`, `yolov8l.pt`, `yolov8x.pt`).
- `--classes`: Specifies the classes to detect and track (e.g., `--classes 0 2`).
- `--line-thickness`: Specifies the bounding box thickness.
- `--region-thickness`: Specifies the region boundary thickness.
- `--track-thickness`: Specifies the tracking line thickness.
## FAQ
**1. What Does Region Counting Involve?**
Region counting is a computational method utilized to ascertain the quantity of objects within a specific area in recorded video or real-time streams. This technique finds frequent application in image processing, computer vision, and pattern recognition, facilitating the analysis and segmentation of objects or features based on their spatial relationships.
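In practice, an object is counted when a reference point, typically the bounding-box center, falls inside the region polygon. The sketch below illustrates that check with the same `shapely` primitives used by the script in this folder; the region coordinates and detections are illustrative only.
```python
from shapely.geometry import Point, Polygon

# Example region (the same pentagon used in the script's default configuration)
region = Polygon([(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)])

# Hypothetical detections in xyxy format (x1, y1, x2, y2)
detections = [(120, 150, 200, 260), (500, 400, 560, 480)]

count = 0
for x1, y1, x2, y2 in detections:
    center = Point((x1 + x2) / 2, (y1 + y2) / 2)  # bounding-box center
    if region.contains(center):
        count += 1

print(f"Objects inside region: {count}")
```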
**2. Is Friendly Region Plotting Supported by the Region Counter?**
The Region Counter offers the capability to create regions in various formats, such as polygons and rectangles. You have the flexibility to modify region attributes, including coordinates, colors, and other details, as demonstrated in the following code:
```python
from shapely.geometry import Polygon
counting_regions = [
{
"name": "YOLOv8 Polygon Region",
"polygon": Polygon(
[(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)]
), # Polygon with five points (Pentagon)
"counts": 0,
"dragging": False,
"region_color": (255, 42, 4), # BGR Value
"text_color": (255, 255, 255), # Region Text Color
},
{
"name": "YOLOv8 Rectangle Region",
"polygon": Polygon(
[(200, 250), (440, 250), (440, 550), (200, 550)]
), # Rectangle with four points
"counts": 0,
"dragging": False,
"region_color": (37, 255, 225), # BGR Value
"text_color": (0, 0, 0), # Region Text Color
},
]
```
**3. Why Combine Region Counting with YOLOv8?**
YOLOv8 specializes in the detection and tracking of objects in video streams. Region counting complements this by enabling object counting within designated areas, making it a valuable application of YOLOv8.
**4. How Can I Troubleshoot Issues?**
To gain more insight during inference, you can include the `--view-img` flag in your command to visualize the detections and regions in real time:
```bash
python yolov8_region_counter.py --source "path/to/video.mp4" --view-img
```
**5. Can I Employ Other YOLO Versions?**
Certainly, you have the flexibility to specify different YOLO model weights using the `--weights` option.
**6. Where Can I Access Additional Information?**
For a comprehensive guide on using YOLOv8 with Object Tracking, please refer to [Multi-Object Tracking with Ultralytics YOLO](https://docs.ultralytics.com/modes/track/).

View File

@ -0,0 +1,251 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
from collections import defaultdict
from pathlib import Path
import cv2
import numpy as np
from shapely.geometry import Polygon
from shapely.geometry.point import Point
from ultralytics import YOLO
from ultralytics.utils.files import increment_path
from ultralytics.utils.plotting import Annotator, colors
track_history = defaultdict(list)
current_region = None
counting_regions = [
{
"name": "YOLOv8 Polygon Region",
"polygon": Polygon([(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)]), # Polygon points
"counts": 0,
"dragging": False,
"region_color": (255, 42, 4), # BGR Value
"text_color": (255, 255, 255), # Region Text Color
},
{
"name": "YOLOv8 Rectangle Region",
"polygon": Polygon([(200, 250), (440, 250), (440, 550), (200, 550)]), # Polygon points
"counts": 0,
"dragging": False,
"region_color": (37, 255, 225), # BGR Value
"text_color": (0, 0, 0), # Region Text Color
},
]
def mouse_callback(event, x, y, flags, param):
"""
Handles mouse events for region manipulation.
Parameters:
event (int): The mouse event type (e.g., cv2.EVENT_LBUTTONDOWN).
x (int): The x-coordinate of the mouse pointer.
y (int): The y-coordinate of the mouse pointer.
flags (int): Additional flags passed by OpenCV.
param: Additional parameters passed to the callback (not used in this function).
Global Variables:
current_region (dict): A dictionary representing the current selected region.
Mouse Events:
- LBUTTONDOWN: Initiates dragging for the region containing the clicked point.
- MOUSEMOVE: Moves the selected region if dragging is active.
- LBUTTONUP: Ends dragging for the selected region.
Notes:
- This function is intended to be used as a callback for OpenCV mouse events.
- Requires the existence of the 'counting_regions' list and the 'Polygon' class.
Example:
>>> cv2.setMouseCallback(window_name, mouse_callback)
"""
global current_region
# Mouse left button down event
if event == cv2.EVENT_LBUTTONDOWN:
for region in counting_regions:
if region["polygon"].contains(Point((x, y))):
current_region = region
current_region["dragging"] = True
current_region["offset_x"] = x
current_region["offset_y"] = y
# Mouse move event
elif event == cv2.EVENT_MOUSEMOVE:
if current_region is not None and current_region["dragging"]:
dx = x - current_region["offset_x"]
dy = y - current_region["offset_y"]
current_region["polygon"] = Polygon(
[(p[0] + dx, p[1] + dy) for p in current_region["polygon"].exterior.coords]
)
current_region["offset_x"] = x
current_region["offset_y"] = y
# Mouse left button up event
elif event == cv2.EVENT_LBUTTONUP:
if current_region is not None and current_region["dragging"]:
current_region["dragging"] = False
def run(
weights="yolov8n.pt",
source=None,
device="cpu",
view_img=False,
save_img=False,
exist_ok=False,
classes=None,
line_thickness=2,
track_thickness=2,
region_thickness=2,
):
"""
Run Region counting on a video using YOLOv8 and ByteTrack.
    Supports a movable region for real-time counting inside a specific area.
    Supports counting in multiple regions simultaneously.
    Regions can be polygons or rectangles.
Args:
        weights (str): Model weights path.
        source (str): Video file path.
        device (str): Processing device (cpu, 0, 1, ...).
        view_img (bool): Show results.
        save_img (bool): Save results.
        exist_ok (bool): Overwrite existing files.
        classes (list): Classes to detect and track.
        line_thickness (int): Bounding box thickness.
        track_thickness (int): Tracking line thickness.
        region_thickness (int): Region thickness.
"""
vid_frame_count = 0
# Check source path
if not Path(source).exists():
raise FileNotFoundError(f"Source path '{source}' does not exist.")
# Setup Model
model = YOLO(f"{weights}")
model.to("cuda") if device == "0" else model.to("cpu")
# Extract classes names
names = model.model.names
# Video setup
videocapture = cv2.VideoCapture(source)
frame_width, frame_height = int(videocapture.get(3)), int(videocapture.get(4))
fps, fourcc = int(videocapture.get(5)), cv2.VideoWriter_fourcc(*"mp4v")
# Output setup
save_dir = increment_path(Path("ultralytics_rc_output") / "exp", exist_ok)
save_dir.mkdir(parents=True, exist_ok=True)
video_writer = cv2.VideoWriter(str(save_dir / f"{Path(source).stem}.mp4"), fourcc, fps, (frame_width, frame_height))
# Iterate over video frames
while videocapture.isOpened():
success, frame = videocapture.read()
if not success:
break
vid_frame_count += 1
# Extract the results
results = model.track(frame, persist=True, classes=classes)
if results[0].boxes.id is not None:
boxes = results[0].boxes.xyxy.cpu()
track_ids = results[0].boxes.id.int().cpu().tolist()
clss = results[0].boxes.cls.cpu().tolist()
annotator = Annotator(frame, line_width=line_thickness, example=str(names))
for box, track_id, cls in zip(boxes, track_ids, clss):
annotator.box_label(box, str(names[cls]), color=colors(cls, True))
bbox_center = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2 # Bbox center
track = track_history[track_id] # Tracking Lines plot
track.append((float(bbox_center[0]), float(bbox_center[1])))
if len(track) > 30:
track.pop(0)
points = np.hstack(track).astype(np.int32).reshape((-1, 1, 2))
cv2.polylines(frame, [points], isClosed=False, color=colors(cls, True), thickness=track_thickness)
# Check if detection inside region
for region in counting_regions:
if region["polygon"].contains(Point((bbox_center[0], bbox_center[1]))):
region["counts"] += 1
# Draw regions (Polygons/Rectangles)
for region in counting_regions:
region_label = str(region["counts"])
region_color = region["region_color"]
region_text_color = region["text_color"]
polygon_coords = np.array(region["polygon"].exterior.coords, dtype=np.int32)
centroid_x, centroid_y = int(region["polygon"].centroid.x), int(region["polygon"].centroid.y)
text_size, _ = cv2.getTextSize(
region_label, cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, thickness=line_thickness
)
text_x = centroid_x - text_size[0] // 2
text_y = centroid_y + text_size[1] // 2
cv2.rectangle(
frame,
(text_x - 5, text_y - text_size[1] - 5),
(text_x + text_size[0] + 5, text_y + 5),
region_color,
-1,
)
cv2.putText(
frame, region_label, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, region_text_color, line_thickness
)
cv2.polylines(frame, [polygon_coords], isClosed=True, color=region_color, thickness=region_thickness)
if view_img:
if vid_frame_count == 1:
cv2.namedWindow("Ultralytics YOLOv8 Region Counter Movable")
cv2.setMouseCallback("Ultralytics YOLOv8 Region Counter Movable", mouse_callback)
cv2.imshow("Ultralytics YOLOv8 Region Counter Movable", frame)
if save_img:
video_writer.write(frame)
for region in counting_regions: # Reinitialize count for each region
region["counts"] = 0
if cv2.waitKey(1) & 0xFF == ord("q"):
break
del vid_frame_count
video_writer.release()
videocapture.release()
cv2.destroyAllWindows()
def parse_opt():
"""Parse command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--weights", type=str, default="yolov8n.pt", help="initial weights path")
parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
parser.add_argument("--source", type=str, required=True, help="video file path")
parser.add_argument("--view-img", action="store_true", help="show results")
parser.add_argument("--save-img", action="store_true", help="save results")
parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
parser.add_argument("--line-thickness", type=int, default=2, help="bounding box thickness")
parser.add_argument("--track-thickness", type=int, default=2, help="Tracking line thickness")
parser.add_argument("--region-thickness", type=int, default=4, help="Region thickness")
return parser.parse_args()
def main(opt):
"""Main function."""
run(**vars(opt))
if __name__ == "__main__":
opt = parse_opt()
main(opt)

View File

@ -0,0 +1,69 @@
# YOLOv8 with SAHI (Inference on Video)
[SAHI](https://docs.ultralytics.com/guides/sahi-tiled-inference/) is designed to optimize object detection algorithms for large-scale and high-resolution imagery. It partitions images into manageable slices, performs object detection on each slice, and then stitches the results back together. This tutorial will guide you through the process of running YOLOv8 inference on video files with the aid of SAHI.
## Table of Contents
- [Step 1: Install the Required Libraries](#step-1-install-the-required-libraries)
- [Step 2: Run the Inference with SAHI using Ultralytics YOLOv8](#step-2-run-the-inference-with-sahi-using-ultralytics-yolov8)
- [Usage Options](#usage-options)
- [FAQ](#faq)
## Step 1: Install the Required Libraries
Clone the repository, install the dependencies, and `cd` into this local directory to run the commands in Step 2.
```bash
# Clone ultralytics repo
git clone https://github.com/ultralytics/ultralytics
# Install dependencies
pip install sahi ultralytics
# cd to local directory
cd ultralytics/examples/YOLOv8-SAHI-Inference-Video
```
## Step 2: Run the Inference with SAHI using Ultralytics YOLOv8
Here are the basic commands for running the inference:
```bash
# If you want to save results
python yolov8_sahi.py --source "path/to/video.mp4" --save-img
# If you want to change the model file
python yolov8_sahi.py --source "path/to/video.mp4" --save-img --weights "yolov8n.pt"
```
## Usage Options
- `--source`: Specifies the path to the video file you want to run inference on.
- `--save-img`: Flag to save the detection results as images.
- `--weights`: Specifies a different YOLOv8 model file (e.g., `yolov8n.pt`, `yolov8s.pt`, `yolov8m.pt`, `yolov8l.pt`, `yolov8x.pt`).
## FAQ
**1. What is SAHI?**
SAHI (Slicing Aided Hyper Inference) is a library designed to optimize object detection algorithms for large-scale and high-resolution images. The library source code is available on [GitHub](https://github.com/obss/sahi).
**2. Why use SAHI with YOLOv8?**
SAHI can handle large-scale images by slicing them into smaller, more manageable sizes without compromising the detection quality. This makes it a great companion to YOLOv8, especially when working with high-resolution videos.
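Conceptually, the workflow mirrors the script in this folder: load a detector through SAHI's `AutoDetectionModel`, then call `get_sliced_prediction`, which tiles the input, runs detection on each tile, and merges the results. Below is a minimal single-image sketch; the model path, image path, and slice sizes are placeholders.
```python
import cv2
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLOv8 model with SAHI (CPU inference, 0.3 confidence threshold)
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8", model_path="yolov8n.pt", confidence_threshold=0.3, device="cpu"
)

image = cv2.imread("path/to/image.jpg")

# Slice into 512x512 tiles with 20% overlap, detect on each tile, then merge predictions
results = get_sliced_prediction(
    image, detection_model, slice_height=512, slice_width=512, overlap_height_ratio=0.2, overlap_width_ratio=0.2
)

for pred in results.object_prediction_list:
    print(pred.category.name, pred.bbox.minx, pred.bbox.miny, pred.bbox.maxx, pred.bbox.maxy)
```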
**3. How do I debug issues?**
You can add the `--view-img` flag to your command to visualize the results during inference and verify that slicing and detection behave as expected:
```bash
python yolov8_sahi.py --source "path/to/video.mp4" --view-img
```
**4. Can I use other YOLO versions?**
Yes, you can specify different YOLO model weights using the `--weights` option.
**5. Where can I find more information?**
For a full guide to YOLOv8 with SAHI see [https://docs.ultralytics.com/guides/sahi-tiled-inference](https://docs.ultralytics.com/guides/sahi-tiled-inference/).

View File

@ -0,0 +1,111 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
from pathlib import Path
import cv2
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction
from sahi.utils.yolov8 import download_yolov8s_model
from ultralytics.utils.files import increment_path
def run(weights="yolov8n.pt", source="test.mp4", view_img=False, save_img=False, exist_ok=False):
"""
Run object detection on a video using YOLOv8 and SAHI.
Args:
weights (str): Model weights path.
source (str): Video file path.
view_img (bool): Show results.
save_img (bool): Save results.
exist_ok (bool): Overwrite existing files.
"""
# Check source path
if not Path(source).exists():
raise FileNotFoundError(f"Source path '{source}' does not exist.")
yolov8_model_path = f"models/{weights}"
download_yolov8s_model(yolov8_model_path)
detection_model = AutoDetectionModel.from_pretrained(
model_type="yolov8", model_path=yolov8_model_path, confidence_threshold=0.3, device="cpu"
)
# Video setup
videocapture = cv2.VideoCapture(source)
frame_width, frame_height = int(videocapture.get(3)), int(videocapture.get(4))
fps, fourcc = int(videocapture.get(5)), cv2.VideoWriter_fourcc(*"mp4v")
# Output setup
save_dir = increment_path(Path("ultralytics_results_with_sahi") / "exp", exist_ok)
save_dir.mkdir(parents=True, exist_ok=True)
video_writer = cv2.VideoWriter(str(save_dir / f"{Path(source).stem}.mp4"), fourcc, fps, (frame_width, frame_height))
while videocapture.isOpened():
success, frame = videocapture.read()
if not success:
break
results = get_sliced_prediction(
frame, detection_model, slice_height=512, slice_width=512, overlap_height_ratio=0.2, overlap_width_ratio=0.2
)
object_prediction_list = results.object_prediction_list
boxes_list = []
clss_list = []
for ind, _ in enumerate(object_prediction_list):
boxes = (
object_prediction_list[ind].bbox.minx,
object_prediction_list[ind].bbox.miny,
object_prediction_list[ind].bbox.maxx,
object_prediction_list[ind].bbox.maxy,
)
clss = object_prediction_list[ind].category.name
boxes_list.append(boxes)
clss_list.append(clss)
for box, cls in zip(boxes_list, clss_list):
x1, y1, x2, y2 = box
cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (56, 56, 255), 2)
label = str(cls)
t_size = cv2.getTextSize(label, 0, fontScale=0.6, thickness=1)[0]
cv2.rectangle(
frame, (int(x1), int(y1) - t_size[1] - 3), (int(x1) + t_size[0], int(y1) + 3), (56, 56, 255), -1
)
cv2.putText(
frame, label, (int(x1), int(y1) - 2), 0, 0.6, [255, 255, 255], thickness=1, lineType=cv2.LINE_AA
)
if view_img:
cv2.imshow(Path(source).stem, frame)
if save_img:
video_writer.write(frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
video_writer.release()
videocapture.release()
cv2.destroyAllWindows()
def parse_opt():
"""Parse command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--weights", type=str, default="yolov8n.pt", help="initial weights path")
parser.add_argument("--source", type=str, required=True, help="video file path")
parser.add_argument("--view-img", action="store_true", help="show results")
parser.add_argument("--save-img", action="store_true", help="save results")
parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
return parser.parse_args()
def main(opt):
"""Main function."""
run(**vars(opt))
if __name__ == "__main__":
opt = parse_opt()
main(opt)

View File

@ -0,0 +1,63 @@
# YOLOv8-Segmentation-ONNXRuntime-Python Demo
This repository provides a Python demo for performing segmentation with YOLOv8 using ONNX Runtime, highlighting the interoperability of YOLOv8 models without the need for the full PyTorch stack.
## Features
- **Framework Agnostic**: Runs segmentation inference purely on ONNX Runtime without importing PyTorch.
- **Efficient Inference**: Supports both FP32 and FP16 precision for ONNX models, catering to different computational needs.
- **Ease of Use**: Utilizes simple command-line arguments for model execution.
- **Broad Compatibility**: Leverages Numpy and OpenCV for image processing, ensuring broad compatibility with various environments.
## Installation
Install the required packages using pip. You will need `ultralytics` for exporting YOLOv8-seg ONNX model and using some utility functions, `onnxruntime-gpu` for GPU-accelerated inference, and `opencv-python` for image processing.
```bash
pip install ultralytics
pip install onnxruntime-gpu # For GPU support
# pip install onnxruntime # Use this instead if you don't have an NVIDIA GPU
pip install numpy
pip install opencv-python
```
## Getting Started
### 1. Export the YOLOv8 ONNX Model
Export the YOLOv8 segmentation model to ONNX format using the provided `ultralytics` package.
```bash
yolo export model=yolov8s-seg.pt imgsz=640 format=onnx opset=12 simplify
```
### 2. Run Inference
Perform inference with the exported ONNX model on your images.
```bash
python main.py --model <MODEL_PATH> --source <IMAGE_PATH>
```
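If you prefer to call the model from Python instead of the CLI, the sketch below uses the `YOLOv8Seg` class defined in `main.py`; it assumes you run it from this example directory, and the model and image paths are placeholders to replace with your own.
```python
import cv2

from main import YOLOv8Seg  # assumes this snippet sits next to main.py

# Build the ONNX Runtime session wrapper
model = YOLOv8Seg("yolov8s-seg.onnx")

# Read an image with OpenCV
img = cv2.imread("path/to/image.jpg")

# Full pipeline: preprocess -> ONNX Runtime inference -> postprocess
boxes, segments, masks = model(img, conf_threshold=0.25, iou_threshold=0.45)

# Draw the results and save them to demo.jpg
if len(boxes) > 0:
    model.draw_and_visualize(img, boxes, segments, vis=False, save=True)
```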
### Example Output
After running the command, you should see segmentation results similar to this:
<img src="https://user-images.githubusercontent.com/51357717/279988626-eb74823f-1563-4d58-a8e4-0494025b7c9a.jpg" alt="Segmentation Demo" width="800">
## Advanced Usage
For more advanced usage, such as adjusting the confidence (`--conf`) and NMS IoU (`--iou`) thresholds, please refer to the `main.py` script's command-line arguments.
## Contributing
We welcome contributions to improve this demo! Please submit issues and pull requests for bug reports, feature requests, or new algorithm enhancements.
## License
This project is licensed under the AGPL-3.0 License - see the [LICENSE](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) file for details.
## Acknowledgments
- The YOLOv8-Segmentation-ONNXRuntime-Python demo is contributed by GitHub user [jamjamjon](https://github.com/jamjamjon).
- Thanks to the ONNX Runtime community for providing a robust and efficient inference engine.

View File

@ -0,0 +1,342 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
import onnxruntime as ort
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
from ultralytics.utils.plotting import Colors
class YOLOv8Seg:
"""YOLOv8 segmentation model."""
def __init__(self, onnx_model):
"""
Initialization.
Args:
onnx_model (str): Path to the ONNX model.
"""
# Build Ort session
self.session = ort.InferenceSession(
onnx_model,
providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
if ort.get_device() == "GPU"
else ["CPUExecutionProvider"],
)
# Numpy dtype: support both FP32 and FP16 onnx model
self.ndtype = np.half if self.session.get_inputs()[0].type == "tensor(float16)" else np.single
# Get model width and height(YOLOv8-seg only has one input)
self.model_height, self.model_width = [x.shape for x in self.session.get_inputs()][0][-2:]
# Load COCO class names
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Create color palette
self.color_palette = Colors()
def __call__(self, im0, conf_threshold=0.4, iou_threshold=0.45, nm=32):
"""
The whole pipeline: pre-process -> inference -> post-process.
Args:
im0 (Numpy.ndarray): original input image.
conf_threshold (float): confidence threshold for filtering predictions.
iou_threshold (float): iou threshold for NMS.
nm (int): the number of masks.
Returns:
boxes (List): list of bounding boxes.
segments (List): list of segments.
masks (np.ndarray): [N, H, W], output masks.
"""
# Pre-process
im, ratio, (pad_w, pad_h) = self.preprocess(im0)
# Ort inference
preds = self.session.run(None, {self.session.get_inputs()[0].name: im})
# Post-process
boxes, segments, masks = self.postprocess(
preds,
im0=im0,
ratio=ratio,
pad_w=pad_w,
pad_h=pad_h,
conf_threshold=conf_threshold,
iou_threshold=iou_threshold,
nm=nm,
)
return boxes, segments, masks
def preprocess(self, img):
"""
Pre-processes the input image.
Args:
img (Numpy.ndarray): image about to be processed.
Returns:
img_process (Numpy.ndarray): image preprocessed for inference.
ratio (tuple): width, height ratios in letterbox.
pad_w (float): width padding in letterbox.
pad_h (float): height padding in letterbox.
"""
# Resize and pad input image using letterbox() (Borrowed from Ultralytics)
shape = img.shape[:2] # original image shape
new_shape = (self.model_height, self.model_width)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
ratio = r, r
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
pad_w, pad_h = (new_shape[1] - new_unpad[0]) / 2, (new_shape[0] - new_unpad[1]) / 2 # wh padding
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
# Transforms: HWC to CHW -> BGR to RGB -> div(255) -> contiguous -> add axis(optional)
img = np.ascontiguousarray(np.einsum("HWC->CHW", img)[::-1], dtype=self.ndtype) / 255.0
img_process = img[None] if len(img.shape) == 3 else img
return img_process, ratio, (pad_w, pad_h)
def postprocess(self, preds, im0, ratio, pad_w, pad_h, conf_threshold, iou_threshold, nm=32):
"""
Post-process the prediction.
Args:
preds (Numpy.ndarray): predictions come from ort.session.run().
im0 (Numpy.ndarray): [h, w, c] original input image.
ratio (tuple): width, height ratios in letterbox.
pad_w (float): width padding in letterbox.
pad_h (float): height padding in letterbox.
conf_threshold (float): conf threshold.
iou_threshold (float): iou threshold.
nm (int): the number of masks.
Returns:
boxes (List): list of bounding boxes.
segments (List): list of segments.
masks (np.ndarray): [N, H, W], output masks.
"""
x, protos = preds[0], preds[1] # Two outputs: predictions and protos
# Transpose the first output: (Batch_size, xywh_conf_cls_nm, Num_anchors) -> (Batch_size, Num_anchors, xywh_conf_cls_nm)
x = np.einsum("bcn->bnc", x)
# Predictions filtering by conf-threshold
x = x[np.amax(x[..., 4:-nm], axis=-1) > conf_threshold]
# Create a new matrix which merge these(box, score, cls, nm) into one
# For more details about `numpy.c_()`: https://numpy.org/doc/1.26/reference/generated/numpy.c_.html
x = np.c_[x[..., :4], np.amax(x[..., 4:-nm], axis=-1), np.argmax(x[..., 4:-nm], axis=-1), x[..., -nm:]]
# NMS filtering
x = x[cv2.dnn.NMSBoxes(x[:, :4], x[:, 4], conf_threshold, iou_threshold)]
# Decode and return
if len(x) > 0:
# Bounding boxes format change: cxcywh -> xyxy
x[..., [0, 1]] -= x[..., [2, 3]] / 2
x[..., [2, 3]] += x[..., [0, 1]]
# Rescales bounding boxes from model shape(model_height, model_width) to the shape of original image
x[..., :4] -= [pad_w, pad_h, pad_w, pad_h]
x[..., :4] /= min(ratio)
# Bounding boxes boundary clamp
x[..., [0, 2]] = x[:, [0, 2]].clip(0, im0.shape[1])
x[..., [1, 3]] = x[:, [1, 3]].clip(0, im0.shape[0])
# Process masks
masks = self.process_mask(protos[0], x[:, 6:], x[:, :4], im0.shape)
# Masks -> Segments(contours)
segments = self.masks2segments(masks)
return x[..., :6], segments, masks # boxes, segments, masks
else:
return [], [], []
@staticmethod
def masks2segments(masks):
"""
It takes a list of masks(n,h,w) and returns a list of segments(n,xy) (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L750)
Args:
            masks (numpy.ndarray): the output masks of shape (n, h, w).
Returns:
segments (List): list of segment masks.
"""
segments = []
for x in masks.astype("uint8"):
c = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0] # CHAIN_APPROX_SIMPLE
if c:
c = np.array(c[np.array([len(x) for x in c]).argmax()]).reshape(-1, 2)
else:
c = np.zeros((0, 2)) # no segments found
segments.append(c.astype("float32"))
return segments
@staticmethod
def crop_mask(masks, boxes):
"""
It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box. (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L599)
Args:
masks (Numpy.ndarray): [n, h, w] tensor of masks.
boxes (Numpy.ndarray): [n, 4] tensor of bbox coordinates in relative point form.
Returns:
(Numpy.ndarray): The masks are being cropped to the bounding box.
"""
n, h, w = masks.shape
x1, y1, x2, y2 = np.split(boxes[:, :, None], 4, 1)
r = np.arange(w, dtype=x1.dtype)[None, None, :]
c = np.arange(h, dtype=x1.dtype)[None, :, None]
return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))
def process_mask(self, protos, masks_in, bboxes, im0_shape):
"""
Takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher quality
but is slower. (Borrowed from https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L618)
Args:
protos (numpy.ndarray): [mask_dim, mask_h, mask_w].
masks_in (numpy.ndarray): [n, mask_dim], n is number of masks after nms.
bboxes (numpy.ndarray): bboxes re-scaled to original image shape.
im0_shape (tuple): the size of the input image (h,w,c).
Returns:
(numpy.ndarray): The upsampled masks.
"""
c, mh, mw = protos.shape
masks = np.matmul(masks_in, protos.reshape((c, -1))).reshape((-1, mh, mw)).transpose(1, 2, 0) # HWN
masks = np.ascontiguousarray(masks)
masks = self.scale_mask(masks, im0_shape) # re-scale mask from P3 shape to original input image shape
masks = np.einsum("HWN -> NHW", masks) # HWN -> NHW
masks = self.crop_mask(masks, bboxes)
return np.greater(masks, 0.5)
@staticmethod
def scale_mask(masks, im0_shape, ratio_pad=None):
"""
Takes a mask, and resizes it to the original image size. (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L305)
Args:
masks (np.ndarray): resized and padded masks/images, [h, w, num]/[h, w, 3].
im0_shape (tuple): the original image shape.
ratio_pad (tuple): the ratio of the padding to the original image.
Returns:
masks (np.ndarray): The masks that are being returned.
"""
im1_shape = masks.shape[:2]
if ratio_pad is None: # calculate from im0_shape
gain = min(im1_shape[0] / im0_shape[0], im1_shape[1] / im0_shape[1]) # gain = old / new
pad = (im1_shape[1] - im0_shape[1] * gain) / 2, (im1_shape[0] - im0_shape[0] * gain) / 2 # wh padding
else:
pad = ratio_pad[1]
# Calculate tlbr of mask
top, left = int(round(pad[1] - 0.1)), int(round(pad[0] - 0.1)) # y, x
bottom, right = int(round(im1_shape[0] - pad[1] + 0.1)), int(round(im1_shape[1] - pad[0] + 0.1))
if len(masks.shape) < 2:
raise ValueError(f'"len of masks shape" should be 2 or 3, but got {len(masks.shape)}')
masks = masks[top:bottom, left:right]
masks = cv2.resize(
masks, (im0_shape[1], im0_shape[0]), interpolation=cv2.INTER_LINEAR
) # INTER_CUBIC would be better
if len(masks.shape) == 2:
masks = masks[:, :, None]
return masks
def draw_and_visualize(self, im, bboxes, segments, vis=False, save=True):
"""
Draw and visualize results.
Args:
im (np.ndarray): original image, shape [h, w, c].
bboxes (numpy.ndarray): [n, 4], n is number of bboxes.
segments (List): list of segment masks.
vis (bool): imshow using OpenCV.
save (bool): save image annotated.
Returns:
None
"""
# Draw rectangles and polygons
im_canvas = im.copy()
for (*box, conf, cls_), segment in zip(bboxes, segments):
# draw contour and fill mask
cv2.polylines(im, np.int32([segment]), True, (255, 255, 255), 2) # white borderline
cv2.fillPoly(im_canvas, np.int32([segment]), self.color_palette(int(cls_), bgr=True))
# draw bbox rectangle
cv2.rectangle(
im,
(int(box[0]), int(box[1])),
(int(box[2]), int(box[3])),
self.color_palette(int(cls_), bgr=True),
1,
cv2.LINE_AA,
)
cv2.putText(
im,
f"{self.classes[cls_]}: {conf:.3f}",
(int(box[0]), int(box[1] - 9)),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
self.color_palette(int(cls_), bgr=True),
2,
cv2.LINE_AA,
)
# Mix image
im = cv2.addWeighted(im_canvas, 0.3, im, 0.7, 0)
# Show image
if vis:
cv2.imshow("demo", im)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Save image
if save:
cv2.imwrite("demo.jpg", im)
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, required=True, help="Path to ONNX model")
parser.add_argument("--source", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image")
parser.add_argument("--conf", type=float, default=0.25, help="Confidence threshold")
parser.add_argument("--iou", type=float, default=0.45, help="NMS IoU threshold")
args = parser.parse_args()
# Build model
model = YOLOv8Seg(args.model)
# Read image by OpenCV
img = cv2.imread(args.source)
# Inference
boxes, segments, _ = model(img, conf_threshold=args.conf, iou_threshold=args.iou)
# Draw bboxes and polygons
if len(boxes) > 0:
model.draw_and_visualize(img, boxes, segments, vis=False, save=True)

145
examples/heatmaps.ipynb Normal file
View File

@ -0,0 +1,145 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/heatmaps.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/guides/heatmaps/\">heatmaps</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/guides/heatmaps/\">Ultralytics Heatmaps</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Heatmaps\n",
"\n",
"Heatmap is color-coded matrix, generated by Ultralytics YOLOv8, simplifies intricate data by using vibrant colors. This visual representation employs warmer hues for higher intensities and cooler tones for lower values. Heatmaps are effective in illustrating complex data patterns, correlations, and anomalies, providing a user-friendly and engaging way to interpret data across various domains."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"from ultralytics.solutions import heatmap\n",
"import cv2\n",
"\n",
"model = YOLO(\"yolov8n.pt\")\n",
"cap = cv2.VideoCapture(\"path/to/video/file.mp4\")\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"# Video writer\n",
"video_writer = cv2.VideoWriter(\"heatmap_output.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"# Init heatmap\n",
"heatmap_obj = heatmap.Heatmap()\n",
"heatmap_obj.set_args(colormap=cv2.COLORMAP_PARULA,\n",
" imw=w,\n",
" imh=h,\n",
" view_img=True,\n",
" shape=\"circle\")\n",
"\n",
"while cap.isOpened():\n",
" success, im0 = cap.read()\n",
" if not success:\n",
" print(\"Video frame is empty or video processing has been successfully completed.\")\n",
" break\n",
" tracks = model.track(im0, persist=True, show=False)\n",
"\n",
" im0 = heatmap_obj.generate_heatmap(im0, tracks)\n",
" video_writer.write(im0)\n",
"\n",
"cap.release()\n",
"video_writer.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/guides/heatmaps/#heatmap-colormaps\">Ultralytics Heatmaps Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

106
examples/hub.ipynb Normal file
View File

@ -0,0 +1,106 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Ultralytics HUB",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FIzICjaph_Wy"
},
"source": [
"<a align=\"center\" href=\"https://hub.ultralytics.com\" target=\"_blank\">\n",
"<img width=\"1024\", src=\"https://github.com/ultralytics/assets/raw/main/im/ultralytics-hub.png\"></a>\n",
"\n",
"<div align=\"center\">\n",
"\n",
"[中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://github.com/ultralytics/hub/actions/workflows/ci.yaml\">\n",
" <img src=\"https://github.com/ultralytics/hub/actions/workflows/ci.yaml/badge.svg\" alt=\"CI CPU\"></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the [Ultralytics](https://ultralytics.com/) HUB notebook!\n",
"\n",
"This notebook allows you to train [YOLOv5](https://github.com/ultralytics/yolov5) and [YOLOv8](https://github.com/ultralytics/ultralytics) 🚀 models using [HUB](https://hub.ultralytics.com/). Please browse the HUB <a href=\"https://docs.ultralytics.com/hub/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/hub/issues/new/choose\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eRQ2ow94MiOv"
},
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"metadata": {
"id": "FyDnXd-n4c7Y",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "01e34b44-a26f-4dbc-a5a1-6e29bca01a1b"
},
"source": [
"%pip install ultralytics # install\n",
"from ultralytics import YOLO, checks, hub\n",
"checks() # checks"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Ultralytics YOLOv8.0.210 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)\n",
"Setup complete ✅ (2 CPUs, 12.7 GB RAM, 24.4/78.2 GB disk)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cQ9BwaAqxAm4"
},
"source": [
"# Start\n",
"\n",
"Login with your [API key](https://hub.ultralytics.com/settings?tab=api+keys), select your YOLO 🚀 model and start training!"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XSlZaJ9Iw_iZ"
},
"source": [
"hub.login('API_KEY') # use your API key\n",
"\n",
"model = YOLO('https://hub.ultralytics.com/MODEL_ID') # use your model URL\n",
"results = model.train() # train model"
],
"execution_count": null,
"outputs": []
}
]
}

View File

@ -0,0 +1,147 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/object_counting.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Object Counting</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Ultralytics Object Counting</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Object Counting\n",
"\n",
"Counting objects using Ultralytics YOLOv8 entails the precise detection and enumeration of specific objects within videos and camera streams. YOLOv8 demonstrates exceptional performance in real-time applications, delivering efficient and accurate object counting across diverse scenarios such as crowd analysis and surveillance. This is attributed to its advanced algorithms and deep learning capabilities."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"from ultralytics.solutions import object_counter\n",
"import cv2\n",
"\n",
"model = YOLO(\"yolov8n.pt\")\n",
"cap = cv2.VideoCapture(\"path/to/video/file.mp4\")\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"# Define region points\n",
"region_points = [(20, 400), (1080, 404), (1080, 360), (20, 360)]\n",
"\n",
"# Video writer\n",
"video_writer = cv2.VideoWriter(\"object_counting_output.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"# Init Object Counter\n",
"counter = object_counter.ObjectCounter()\n",
"counter.set_args(view_img=True,\n",
" reg_pts=region_points,\n",
" classes_names=model.names,\n",
" draw_tracks=True)\n",
"\n",
"while cap.isOpened():\n",
" success, im0 = cap.read()\n",
" if not success:\n",
" print(\"Video frame is empty or video processing has been successfully completed.\")\n",
" break\n",
" tracks = model.track(im0, persist=True, show=False)\n",
"\n",
" im0 = counter.start_counting(im0, tracks)\n",
" video_writer.write(im0)\n",
"\n",
"cap.release()\n",
"video_writer.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Ultralytics Object Counting Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

View File

@ -0,0 +1,203 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/object_tracking.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/modes/track/\">Object Tracking</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/modes/track/\">Ultralytics Object Tracking</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Object Tracking\n",
"\n",
"Within the domain of video analytics, object tracking stands out as a crucial undertaking. It goes beyond merely identifying the location and class of objects within the frame; it also involves assigning a unique ID to each detected object as the video unfolds. The applications of this technology are vast, spanning from surveillance and security to real-time sports analytics."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "markdown",
"source": [
"## CLI"
],
"metadata": {
"id": "-ZF9DM6e6gz0"
}
},
{
"cell_type": "code",
"source": [
"!yolo track source=\"/path/to/video/file.mp4\" save=True"
],
"metadata": {
"id": "-XJqhOwo6iqT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Python\n",
"\n",
"- Draw Object tracking trails"
],
"metadata": {
"id": "XRcw0vIE6oNb"
}
},
{
"cell_type": "code",
"source": [
"import cv2\n",
"import numpy as np\n",
"from ultralytics import YOLO\n",
"\n",
"from ultralytics.utils.checks import check_imshow\n",
"from ultralytics.utils.plotting import Annotator, colors\n",
"\n",
"from collections import defaultdict\n",
"\n",
"track_history = defaultdict(lambda: [])\n",
"model = YOLO(\"yolov8n.pt\")\n",
"names = model.model.names\n",
"\n",
"video_path = \"/path/to/video/file.mp4\"\n",
"cap = cv2.VideoCapture(video_path)\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"result = cv2.VideoWriter(\"object_tracking.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"while cap.isOpened():\n",
" success, frame = cap.read()\n",
" if success:\n",
" results = model.track(frame, persist=True, verbose=False)\n",
" boxes = results[0].boxes.xyxy.cpu()\n",
"\n",
" if results[0].boxes.id is not None:\n",
"\n",
" # Extract prediction results\n",
" clss = results[0].boxes.cls.cpu().tolist()\n",
" track_ids = results[0].boxes.id.int().cpu().tolist()\n",
" confs = results[0].boxes.conf.float().cpu().tolist()\n",
"\n",
" # Annotator Init\n",
" annotator = Annotator(frame, line_width=2)\n",
"\n",
" for box, cls, track_id in zip(boxes, clss, track_ids):\n",
" annotator.box_label(box, color=colors(int(cls), True), label=names[int(cls)])\n",
"\n",
" # Store tracking history\n",
" track = track_history[track_id]\n",
" track.append((int((box[0] + box[2]) / 2), int((box[1] + box[3]) / 2)))\n",
" if len(track) > 30:\n",
" track.pop(0)\n",
"\n",
" # Plot tracks\n",
" points = np.array(track, dtype=np.int32).reshape((-1, 1, 2))\n",
" cv2.circle(frame, (track[-1]), 7, colors(int(cls), True), -1)\n",
" cv2.polylines(frame, [points], isClosed=False, color=colors(int(cls), True), thickness=2)\n",
"\n",
" result.write(frame)\n",
" if cv2.waitKey(1) & 0xFF == ord(\"q\"):\n",
" break\n",
" else:\n",
" break\n",
"\n",
"result.release()\n",
"cap.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/modes/track/\">Ultralytics Object Tracking Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

649
examples/tutorial.ipynb Normal file
View File

@ -0,0 +1,649 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "YOLOv8 Tutorial",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "t6MPjfT5NrKQ"
},
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://console.paperspace.com/github/ultralytics/ultralytics\"><img src=\"https://assets.paperspace.io/img/gradient-badge.svg\" alt=\"Run on Gradient\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
" <a href=\"https://www.kaggle.com/ultralytics/yolov8\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the various resources available to help you get started with YOLOv8 and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of YOLOv8. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7mGmQbAO5pQb"
},
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"metadata": {
"id": "wbvMlHd_QwMG",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "51d15672-e688-4fb8-d9d0-00d1916d3532"
},
"source": [
"%pip install ultralytics\n",
"import ultralytics\n",
"ultralytics.checks()"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"Setup complete ✅ (2 CPUs, 12.7 GB RAM, 26.3/78.2 GB disk)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4JnkELT0cIJg"
},
"source": [
"# 1. Predict\n",
"\n",
"YOLOv8 may be used directly in the Command Line Interface (CLI) with a `yolo` command for a variety of tasks and modes and accepts additional arguments, i.e. `imgsz=640`. See a full list of available `yolo` [arguments](https://docs.ultralytics.com/usage/cfg/) and other details in the [YOLOv8 Predict Docs](https://docs.ultralytics.com/modes/train/).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zR9ZbuQCH7FX",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "37738db7-4284-47de-b3ed-b82f2431ed23"
},
"source": [
"# Run inference on an image with YOLOv8n\n",
"!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/zidane.jpg'"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt to 'yolov8n.pt'...\n",
"100% 6.23M/6.23M [00:00<00:00, 72.6MB/s]\n",
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"Downloading https://ultralytics.com/images/zidane.jpg to 'zidane.jpg'...\n",
"100% 165k/165k [00:00<00:00, 7.05MB/s]\n",
"image 1/1 /content/zidane.jpg: 384x640 2 persons, 1 tie, 162.0ms\n",
"Speed: 13.9ms preprocess, 162.0ms inference, 1259.5ms postprocess per image at shape (1, 3, 384, 640)\n",
"Results saved to \u001b[1mruns/detect/predict\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/predict\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hkAzDWJ7cWTr"
},
"source": [
"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
"<img align=\"left\" src=\"https://user-images.githubusercontent.com/26833433/212889447-69e5bdf1-5800-4e29-835e-2ed2336dede2.jpg\" width=\"600\">"
]
},
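{
"cell_type": "markdown",
"source": [
"As an illustrative sketch, the cell below re-runs prediction while passing extra `key=value` arguments (`imgsz`, `conf`, `save`) from the [arguments list](https://docs.ultralytics.com/usage/cfg/). The specific values are arbitrary examples, not recommendations."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative sketch: pass additional key=value arguments to the yolo CLI\n",
"# imgsz, conf and save are standard yolo arguments (https://docs.ultralytics.com/usage/cfg/); the values here are arbitrary\n",
"!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg' imgsz=640 conf=0.25 save=True"
],
"metadata": {},
"execution_count": null,
"outputs": []
},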
{
"cell_type": "markdown",
"metadata": {
"id": "0eq1SMWl6Sfn"
},
"source": [
"# 2. Val\n",
"Validate a model's accuracy on the [COCO](https://docs.ultralytics.com/datasets/detect/coco/) dataset's `val` or `test` splits. The latest YOLOv8 [models](https://github.com/ultralytics/ultralytics#models) are downloaded automatically the first time they are used. See [YOLOv8 Val Docs](https://docs.ultralytics.com/modes/val/) for more information."
]
},
{
"cell_type": "code",
"metadata": {
"id": "WQPtK1QYVaD_"
},
"source": [
"# Download COCO val\n",
"import torch\n",
"torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip') # download (780M - 5000 images)\n",
"!unzip -q tmp.zip -d datasets && rm tmp.zip # unzip"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X58w8JLpMnjH",
"outputId": "61001937-ccd2-4157-a373-156a57495231",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# Validate YOLOv8n on COCO8 val\n",
"!yolo val model=yolov8n.pt data=coco8.yaml"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"Dataset 'coco8.yaml' images not found ⚠️, missing path '/content/datasets/coco8/images/val'\n",
"Downloading https://ultralytics.com/assets/coco8.zip to '/content/datasets/coco8.zip'...\n",
"100% 433k/433k [00:00<00:00, 12.5MB/s]\n",
"Unzipping /content/datasets/coco8.zip to /content/datasets/coco8...: 100% 25/25 [00:00<00:00, 4546.38file/s]\n",
"Dataset download success ✅ (0.9s), saved to \u001b[1m/content/datasets\u001b[0m\n",
"\n",
"Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...\n",
"100% 755k/755k [00:00<00:00, 17.8MB/s]\n",
"\u001b[34m\u001b[1mval: \u001b[0mScanning /content/datasets/coco8/labels/val... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<00:00, 275.94it/s]\n",
"\u001b[34m\u001b[1mval: \u001b[0mNew cache created: /content/datasets/coco8/labels/val.cache\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:02<00:00, 2.23s/it]\n",
" all 4 17 0.621 0.833 0.888 0.63\n",
" person 4 10 0.721 0.5 0.519 0.269\n",
" dog 4 1 0.37 1 0.995 0.597\n",
" horse 4 2 0.751 1 0.995 0.631\n",
" elephant 4 2 0.505 0.5 0.828 0.394\n",
" umbrella 4 1 0.564 1 0.995 0.995\n",
" potted plant 4 1 0.814 1 0.995 0.895\n",
"Speed: 0.3ms preprocess, 56.9ms inference, 0.0ms loss, 222.8ms postprocess per image\n",
"Results saved to \u001b[1mruns/detect/val\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/val\n"
]
}
]
},
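{
"cell_type": "markdown",
"source": [
"A minimal Python sketch of the same validation is shown below. It assumes the metrics object returned by `model.val()` exposes `box.map` and `box.map50` attributes as described in the Ultralytics Python docs."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: validate via the Python API and read summary metrics\n",
"# Assumes the returned metrics object exposes box.map (mAP50-95) and box.map50 (mAP50)\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n.pt')\n",
"metrics = model.val(data='coco8.yaml')  # validate on COCO8\n",
"print(metrics.box.map)    # mAP50-95\n",
"print(metrics.box.map50)  # mAP50"
],
"metadata": {},
"execution_count": null,
"outputs": []
},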
{
"cell_type": "markdown",
"metadata": {
"id": "ZY2VXXXu74w5"
},
"source": [
"# 3. Train\n",
"\n",
"<p align=\"\"><a href=\"https://bit.ly/ultralytics_hub\"><img width=\"1000\" src=\"https://github.com/ultralytics/assets/raw/main/yolov8/banner-integrations.png\"/></a></p>\n",
"\n",
"Train YOLOv8 on [Detect](https://docs.ultralytics.com/tasks/detect/), [Segment](https://docs.ultralytics.com/tasks/segment/), [Classify](https://docs.ultralytics.com/tasks/classify/) and [Pose](https://docs.ultralytics.com/tasks/pose/) datasets. See [YOLOv8 Train Docs](https://docs.ultralytics.com/modes/train/) for more information."
]
},
{
"cell_type": "code",
"source": [
"#@title Select YOLOv8 🚀 logger {run: 'auto'}\n",
"logger = 'Comet' #@param ['Comet', 'TensorBoard']\n",
"\n",
"if logger == 'Comet':\n",
" %pip install -q comet_ml\n",
" import comet_ml; comet_ml.init()\n",
"elif logger == 'TensorBoard':\n",
" %load_ext tensorboard\n",
" %tensorboard --logdir ."
],
"metadata": {
"id": "ktegpM42AooT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1NcFxRcFdJ_O",
"outputId": "1ec62d53-41eb-444f-e2f7-cef5c18b9a27",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# Train YOLOv8n on COCO8 for 3 epochs\n",
"!yolo train model=yolov8n.pt data=coco8.yaml epochs=3 imgsz=640"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"\u001b[34m\u001b[1mengine/trainer: \u001b[0mtask=detect, mode=train, model=yolov8n.pt, data=coco8.yaml, epochs=3, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train\n",
"\n",
" from n params module arguments \n",
" 0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2] \n",
" 1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2] \n",
" 2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True] \n",
" 3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2] \n",
" 4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True] \n",
" 5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2] \n",
" 6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True] \n",
" 7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2] \n",
" 8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True] \n",
" 9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5] \n",
" 10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n",
" 11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1] \n",
" 13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n",
" 14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1] \n",
" 16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2] \n",
" 17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1] \n",
" 19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2] \n",
" 20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1] \n",
" 22 [15, 18, 21] 1 897664 ultralytics.nn.modules.head.Detect [80, [64, 128, 256]] \n",
"Model summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs\n",
"\n",
"Transferred 355/355 items from pretrained weights\n",
"\u001b[34m\u001b[1mTensorBoard: \u001b[0mStart with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/\n",
"Freezing layer 'model.22.dfl.conv.weight'\n",
"\u001b[34m\u001b[1mAMP: \u001b[0mrunning Automatic Mixed Precision (AMP) checks with YOLOv8n...\n",
"\u001b[34m\u001b[1mAMP: \u001b[0mchecks passed ✅\n",
"\u001b[34m\u001b[1mtrain: \u001b[0mScanning /content/datasets/coco8/labels/train... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<00:00, 43351.98it/s]\n",
"\u001b[34m\u001b[1mtrain: \u001b[0mNew cache created: /content/datasets/coco8/labels/train.cache\n",
"\u001b[34m\u001b[1malbumentations: \u001b[0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))\n",
"\u001b[34m\u001b[1mval: \u001b[0mScanning /content/datasets/coco8/labels/val.cache... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<?, ?it/s]\n",
"Plotting labels to runs/detect/train/labels.jpg... \n",
"\u001b[34m\u001b[1moptimizer:\u001b[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... \n",
"\u001b[34m\u001b[1moptimizer:\u001b[0m AdamW(lr=0.000119, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)\n",
"\u001b[34m\u001b[1mTensorBoard: \u001b[0mmodel graph visualization added ✅\n",
"Image sizes 640 train, 640 val\n",
"Using 2 dataloader workers\n",
"Logging results to \u001b[1mruns/detect/train\u001b[0m\n",
"Starting training for 3 epochs...\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 1/3 0.77G 0.9308 3.155 1.291 32 640: 100% 1/1 [00:01<00:00, 1.70s/it]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 1.90it/s]\n",
" all 4 17 0.858 0.54 0.726 0.51\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 2/3 0.78G 1.162 3.127 1.518 33 640: 100% 1/1 [00:00<00:00, 8.18it/s]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 3.71it/s]\n",
" all 4 17 0.904 0.526 0.742 0.5\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 3/3 0.759G 0.925 2.507 1.254 17 640: 100% 1/1 [00:00<00:00, 7.53it/s]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 6.80it/s]\n",
" all 4 17 0.906 0.532 0.741 0.513\n",
"\n",
"3 epochs completed in 0.002 hours.\n",
"Optimizer stripped from runs/detect/train/weights/last.pt, 6.5MB\n",
"Optimizer stripped from runs/detect/train/weights/best.pt, 6.5MB\n",
"\n",
"Validating runs/detect/train/weights/best.pt...\n",
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"Model summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 16.31it/s]\n",
" all 4 17 0.906 0.533 0.755 0.515\n",
" person 4 10 0.942 0.3 0.519 0.233\n",
" dog 4 1 1 0 0.332 0.162\n",
" horse 4 2 1 0.9 0.995 0.698\n",
" elephant 4 2 1 0 0.695 0.206\n",
" umbrella 4 1 0.755 1 0.995 0.895\n",
" potted plant 4 1 0.739 1 0.995 0.895\n",
"Speed: 0.3ms preprocess, 6.1ms inference, 0.0ms loss, 2.5ms postprocess per image\n",
"Results saved to \u001b[1mruns/detect/train\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/train\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# 4. Export\n",
"\n",
"Export a YOLOv8 model to any supported format below with the `format` argument, i.e. `format=onnx`. See [YOLOv8 Export Docs](https://docs.ultralytics.com/modes/export/) for more information.\n",
"\n",
"- 💡 ProTip: Export to [ONNX](https://docs.ultralytics.com/integrations/onnx/) or [OpenVINO](https://docs.ultralytics.com/integrations/openvino/) for up to 3x CPU speedup. \n",
"- 💡 ProTip: Export to [TensorRT](https://docs.ultralytics.com/integrations/tensorrt/) for up to 5x GPU speedup.\n",
"\n",
"| Format | `format` Argument | Model | Metadata | Arguments |\n",
"|--------------------------------------------------------------------|-------------------|---------------------------|----------|-----------------------------------------------------|\n",
"| [PyTorch](https://pytorch.org/) | - | `yolov8n.pt` | ✅ | - |\n",
"| [TorchScript](https://pytorch.org/docs/stable/jit.html) | `torchscript` | `yolov8n.torchscript` | ✅ | `imgsz`, `optimize` |\n",
"| [ONNX](https://onnx.ai/) | `onnx` | `yolov8n.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset` |\n",
"| [OpenVINO](https://docs.openvino.ai/) | `openvino` | `yolov8n_openvino_model/` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [TensorRT](https://developer.nvidia.com/tensorrt) | `engine` | `yolov8n.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace` |\n",
"| [CoreML](https://github.com/apple/coremltools) | `coreml` | `yolov8n.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms` |\n",
"| [TF SavedModel](https://www.tensorflow.org/guide/saved_model) | `saved_model` | `yolov8n_saved_model/` | ✅ | `imgsz`, `keras`, `int8` |\n",
"| [TF GraphDef](https://www.tensorflow.org/api_docs/python/tf/Graph) | `pb` | `yolov8n.pb` | ❌ | `imgsz` |\n",
"| [TF Lite](https://www.tensorflow.org/lite) | `tflite` | `yolov8n.tflite` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |\n",
"| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |\n",
"| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |\n"
],
"metadata": {
"id": "nPZZeNrLCQG6"
}
},
{
"cell_type": "code",
"source": [
"!yolo export model=yolov8n.pt format=torchscript"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CYIjW4igCjqD",
"outputId": "f6d45666-07b4-4214-86c0-4e83e70ac096"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CPU (Intel Xeon 2.30GHz)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"\u001b[34m\u001b[1mPyTorch:\u001b[0m starting from 'yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)\n",
"\n",
"\u001b[34m\u001b[1mTorchScript:\u001b[0m starting export with torch 2.1.0+cu121...\n",
"\u001b[34m\u001b[1mTorchScript:\u001b[0m export success ✅ 2.4s, saved as 'yolov8n.torchscript' (12.4 MB)\n",
"\n",
"Export complete (4.5s)\n",
"Results saved to \u001b[1m/content\u001b[0m\n",
"Predict: yolo predict task=detect model=yolov8n.torchscript imgsz=640 \n",
"Validate: yolo val task=detect model=yolov8n.torchscript imgsz=640 data=coco.yaml \n",
"Visualize: https://netron.app\n",
"💡 Learn more at https://docs.ultralytics.com/modes/export\n"
]
}
]
},
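{
"cell_type": "markdown",
"source": [
"As a sketch of the format table above, the cell below exports to ONNX with a few of the listed arguments (`imgsz`, `dynamic`, `simplify`). The chosen values are illustrative only."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative ONNX export using arguments listed in the format table above (values are examples)\n",
"!yolo export model=yolov8n.pt format=onnx imgsz=640 dynamic=True simplify=True"
],
"metadata": {},
"execution_count": null,
"outputs": []
},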
{
"cell_type": "markdown",
"source": [
"# 5. Python Usage\n",
"\n",
"YOLOv8 was reimagined using Python-first principles for the most seamless Python YOLO experience yet. YOLOv8 models can be loaded from a trained checkpoint or created from scratch. Then methods are used to train, val, predict, and export the model. See detailed Python usage examples in the [YOLOv8 Python Docs](https://docs.ultralytics.com/usage/python/)."
],
"metadata": {
"id": "kUMOQ0OeDBJG"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"\n",
"# Load a model\n",
"model = YOLO('yolov8n.yaml') # build a new model from scratch\n",
"model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training)\n",
"\n",
"# Use the model\n",
"results = model.train(data='coco128.yaml', epochs=3) # train the model\n",
"results = model.val() # evaluate model performance on the validation set\n",
"results = model('https://ultralytics.com/images/bus.jpg') # predict on an image\n",
"results = model.export(format='onnx') # export the model to ONNX format"
],
"metadata": {
"id": "bpF9-vS_DAaf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# 6. Tasks\n",
"\n",
"YOLOv8 can train, val, predict and export models for the most common tasks in vision AI: [Detect](https://docs.ultralytics.com/tasks/detect/), [Segment](https://docs.ultralytics.com/tasks/segment/), [Classify](https://docs.ultralytics.com/tasks/classify/) and [Pose](https://docs.ultralytics.com/tasks/pose/). See [YOLOv8 Tasks Docs](https://docs.ultralytics.com/tasks/) for more information.\n",
"\n",
"<br><img width=\"1024\" src=\"https://raw.githubusercontent.com/ultralytics/assets/main/im/banner-tasks.png\">\n"
],
"metadata": {
"id": "Phm9ccmOKye5"
}
},
{
"cell_type": "markdown",
"source": [
"## 1. Detection\n",
"\n",
"YOLOv8 _detection_ models have no suffix and are the default YOLOv8 models, i.e. `yolov8n.pt` and are pretrained on COCO. See [Detection Docs](https://docs.ultralytics.com/tasks/detect/) for full details.\n"
],
"metadata": {
"id": "yq26lwpYK1lq"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n, train it on COCO128 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n.pt') # load a pretrained YOLOv8n detection model\n",
"model.train(data='coco128.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "8Go5qqS9LbC5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 2. Segmentation\n",
"\n",
"YOLOv8 _segmentation_ models use the `-seg` suffix, i.e. `yolov8n-seg.pt` and are pretrained on COCO. See [Segmentation Docs](https://docs.ultralytics.com/tasks/segment/) for full details.\n"
],
"metadata": {
"id": "7ZW58jUzK66B"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-seg, train it on COCO128-seg for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-seg.pt') # load a pretrained YOLOv8n segmentation model\n",
"model.train(data='coco128-seg.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "WFPJIQl_L5HT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3. Classification\n",
"\n",
"YOLOv8 _classification_ models use the `-cls` suffix, i.e. `yolov8n-cls.pt` and are pretrained on ImageNet. See [Classification Docs](https://docs.ultralytics.com/tasks/classify/) for full details.\n"
],
"metadata": {
"id": "ax3p94VNK9zR"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-cls, train it on mnist160 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-cls.pt') # load a pretrained YOLOv8n classification model\n",
"model.train(data='mnist160', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "5q9Zu6zlL5rS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4. Pose\n",
"\n",
"YOLOv8 _pose_ models use the `-pose` suffix, i.e. `yolov8n-pose.pt` and are pretrained on COCO Keypoints. See [Pose Docs](https://docs.ultralytics.com/tasks/pose/) for full details."
],
"metadata": {
"id": "SpIaFLiO11TG"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-pose, train it on COCO8-pose for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-pose.pt') # load a pretrained YOLOv8n pose model\n",
"model.train(data='coco8-pose.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "si4aKFNg19vX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4. Oriented Bounding Boxes (OBB)\n",
"\n",
"YOLOv8 _OBB_ models use the `-obb` suffix, i.e. `yolov8n-obb.pt` and are pretrained on the DOTA dataset. See [OBB Docs](https://docs.ultralytics.com/tasks/obb/) for full details."
],
"metadata": {
"id": "cf5j_T9-B5F0"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-obb, train it on DOTA8 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-obb.pt') # load a pretrained YOLOv8n OBB model\n",
"model.train(data='coco8-dota.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "IJNKClOOB5YS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IEijrePND_2I"
},
"source": [
"# Appendix\n",
"\n",
"Additional content below."
]
},
{
"cell_type": "code",
"source": [
"# Pip install from source\n",
"!pip install git+https://github.com/ultralytics/ultralytics@main"
],
"metadata": {
"id": "pIdE6i8C3LYp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Git clone and run tests on updates branch\n",
"!git clone https://github.com/ultralytics/ultralytics -b main\n",
"%pip install -qe ultralytics"
],
"metadata": {
"id": "uRKlwxSJdhd1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Run tests (Git clone only)\n",
"!pytest ultralytics/tests"
],
"metadata": {
"id": "GtPlh7mcCGZX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Validate multiple models\n",
"for x in 'nsmlx':\n",
" !yolo val model=yolov8{x}.pt data=coco.yaml"
],
"metadata": {
"id": "Wdc6t_bfzDDk"
},
"execution_count": null,
"outputs": []
}
]
}