commit e474ab5f9f by lee, 2025-06-18 14:35:43 +08:00
529 changed files with 80523 additions and 0 deletions

examples/README.md (new file, 36 lines)
@@ -0,0 +1,36 @@
## Ultralytics YOLOv8 Example Applications
This repository features a collection of real-world applications and walkthroughs, provided as either Python files or notebooks. Explore the examples below to see how YOLOv8 can be integrated into various applications.
### Ultralytics YOLO Example Applications
| Title | Format | Contributor |
| ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------- |
| [YOLO ONNX Detection Inference with C++](./YOLOv8-CPP-Inference) | C++/ONNX | [Justas Bartnykas](https://github.com/JustasBart) |
| [YOLO OpenCV ONNX Detection Python](./YOLOv8-OpenCV-ONNX-Python) | OpenCV/Python/ONNX | [Farid Inawan](https://github.com/frdteknikelektro) |
| [YOLOv8 .NET ONNX ImageSharp](https://github.com/dme-compunet/YOLOv8) | C#/ONNX/ImageSharp | [Compunet](https://github.com/dme-compunet) |
| [YOLO .Net ONNX Detection C#](https://www.nuget.org/packages/Yolov8.Net) | C# .Net | [Samuel Stainback](https://github.com/sstainba) |
| [YOLOv8 on NVIDIA Jetson(TensorRT and DeepStream)](https://wiki.seeedstudio.com/YOLOv8-DeepStream-TRT-Jetson/) | Python | [Lakshantha](https://github.com/lakshanthad) |
| [YOLOv8 ONNXRuntime Python](./YOLOv8-ONNXRuntime) | Python/ONNXRuntime | [Semih Demirel](https://github.com/semihhdemirel) |
| [YOLOv8 ONNXRuntime CPP](./YOLOv8-ONNXRuntime-CPP) | C++/ONNXRuntime | [DennisJcy](https://github.com/DennisJcy), [Onuralp Sezer](https://github.com/onuralpszr) |
| [RTDETR ONNXRuntime C#](https://github.com/Kayzwer/yolo-cs/blob/master/RTDETR.cs) | C#/ONNX | [Kayzwer](https://github.com/Kayzwer) |
| [YOLOv8 SAHI Video Inference](https://github.com/RizwanMunawar/ultralytics/blob/main/examples/YOLOv8-SAHI-Inference-Video/yolov8_sahi.py) | Python | [Muhammad Rizwan Munawar](https://github.com/RizwanMunawar) |
| [YOLOv8 Region Counter](https://github.com/RizwanMunawar/ultralytics/blob/main/examples/YOLOv8-Region-Counter/yolov8_region_counter.py) | Python | [Muhammad Rizwan Munawar](https://github.com/RizwanMunawar) |
| [YOLOv8 Segmentation ONNXRuntime Python](./YOLOv8-Segmentation-ONNXRuntime-Python) | Python/ONNXRuntime | [jamjamjon](https://github.com/jamjamjon) |
| [YOLOv8 LibTorch CPP](./YOLOv8-LibTorch-CPP-Inference) | C++/LibTorch | [Myyura](https://github.com/Myyura) |
| [YOLOv8 OpenCV INT8 TFLite Python](./YOLOv8-OpenCV-int8-tflite-Python) | Python | [Wamiq Raza](https://github.com/wamiqraza) |
### How to Contribute
We greatly appreciate contributions from the community, including examples, applications, and guides. If you'd like to contribute, please follow these guidelines:
1. Create a pull request (PR) with the title prefix `[Example]`, adding your new example folder to the `examples/` directory within the repository.
2. Make sure your project adheres to the following standards:
- Makes use of the `ultralytics` package.
- Includes a `README.md` with clear instructions for setting up and running the example.
- Refrains from adding large files or dependencies unless they are absolutely necessary for the example.
- Contributors should be willing to provide support for their examples and address related issues.
For more detailed information and guidance on contributing, please visit our [contribution documentation](https://docs.ultralytics.com/help/contributing).
If you encounter any questions or concerns regarding these guidelines, feel free to open a PR or an issue in the repository, and we will assist you in the contribution process.

examples/YOLOv8-CPP-Inference/CMakeLists.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
cmake_minimum_required(VERSION 3.5)
project(Yolov8CPPInference VERSION 0.1)
set(CMAKE_INCLUDE_CURRENT_DIR ON)
# CUDA
set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda")
find_package(CUDA 11 REQUIRED)
set(CMAKE_CUDA_STANDARD 11)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
# !CUDA
# OpenCV
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# !OpenCV
set(PROJECT_SOURCES
main.cpp
inference.h
inference.cpp
)
add_executable(Yolov8CPPInference ${PROJECT_SOURCES})
target_link_libraries(Yolov8CPPInference ${OpenCV_LIBS})

examples/YOLOv8-CPP-Inference/README.md (new file, 50 lines)
@@ -0,0 +1,50 @@
# YOLOv8/YOLOv5 Inference C++
This example demonstrates how to perform inference using YOLOv8 and YOLOv5 models in C++ with OpenCV's DNN API.
## Usage
```bash
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install .
cd examples/YOLOv8-CPP-Inference
# Add a yolov8_.onnx and/or yolov5_.onnx model to the ultralytics folder.
# Edit main.cpp and change projectBasePath to match your environment.
# Note that by default the CMake file will try to import the CUDA library for use with OpenCV's DNN (cuDNN) GPU inference.
# If your OpenCV build does not use CUDA/cuDNN you can remove that import call and run the example on CPU.
mkdir build
cd build
cmake ..
make
./Yolov8CPPInference
```
## Exporting YOLOv8 and YOLOv5 Models
To export YOLOv8 models:
```commandline
yolo export model=yolov8s.pt imgsz=480,640 format=onnx opset=12
```
To export YOLOv5 models:
```commandline
python3 export.py --weights yolov5s.pt --img 480 640 --include onnx --opset 12
```
yolov8s.onnx:
![image](https://user-images.githubusercontent.com/40023722/217356132-a4cecf2e-2729-4acb-b80a-6559022d7707.png)
yolov5s.onnx:
![image](https://user-images.githubusercontent.com/40023722/217357005-07464492-d1da-42e3-98a7-fc753f87d5e6.png)
This repository utilizes OpenCV's DNN API to run ONNX exported models of YOLOv5 and YOLOv8. In theory, it should work for YOLOv6 and YOLOv7 as well, but they have not been tested. Note that the example networks are exported with rectangular (640x480) resolutions, but any exported resolution will work. You may want to use the letterbox approach for square images, depending on your use case.
The **main** branch version uses Qt as a GUI wrapper. The primary focus here is the **Inference** class file, which demonstrates how to transpose YOLOv8 models to work as YOLOv5 models.
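For orientation, the `Inference` class is typically driven like this (a minimal sketch condensed from the `main.cpp` shipped with this example; the model and image paths are placeholders):

```cpp
#include <opencv2/opencv.hpp>

#include "inference.h"

int main()
{
    // runWithCuda = true requires an OpenCV build with CUDA/cuDNN; set it to false for CPU-only builds.
    // "classes.txt" is a placeholder here because the classes are hard-coded in this example.
    Inference inf("yolov8s.onnx", cv::Size(640, 480), "classes.txt", true);

    cv::Mat frame = cv::imread("bus.jpg"); // placeholder image path
    std::vector<Detection> detections = inf.runInference(frame);

    for (const Detection &d : detections)
    {
        cv::rectangle(frame, d.box, d.color, 2);
        cv::putText(frame, d.className, cv::Point(d.box.x, d.box.y - 5),
                    cv::FONT_HERSHEY_DUPLEX, 1, cv::Scalar(0, 0, 0), 2);
    }

    cv::imshow("Inference", frame);
    cv::waitKey(0);
    return 0;
}
```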

examples/YOLOv8-CPP-Inference/inference.cpp (new file, 185 lines)
@@ -0,0 +1,185 @@
#include "inference.h"
Inference::Inference(const std::string &onnxModelPath, const cv::Size &modelInputShape, const std::string &classesTxtFile, const bool &runWithCuda)
{
modelPath = onnxModelPath;
modelShape = modelInputShape;
classesPath = classesTxtFile;
cudaEnabled = runWithCuda;
loadOnnxNetwork();
// loadClassesFromFile(); The classes are hard-coded for this example
}
std::vector<Detection> Inference::runInference(const cv::Mat &input)
{
cv::Mat modelInput = input;
if (letterBoxForSquare && modelShape.width == modelShape.height)
modelInput = formatToSquare(modelInput);
cv::Mat blob;
cv::dnn::blobFromImage(modelInput, blob, 1.0/255.0, modelShape, cv::Scalar(), true, false);
net.setInput(blob);
std::vector<cv::Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());
int rows = outputs[0].size[1];
int dimensions = outputs[0].size[2];
bool yolov8 = false;
// yolov5 has an output of shape (batchSize, 25200, 85) (Num classes + box[x,y,w,h] + confidence[c])
// yolov8 has an output of shape (batchSize, 84, 8400) (Num classes + box[x,y,w,h])
if (dimensions > rows) // Check if the shape[2] is more than shape[1] (yolov8)
{
yolov8 = true;
rows = outputs[0].size[2];
dimensions = outputs[0].size[1];
outputs[0] = outputs[0].reshape(1, dimensions);
cv::transpose(outputs[0], outputs[0]);
}
float *data = (float *)outputs[0].data;
float x_factor = modelInput.cols / modelShape.width;
float y_factor = modelInput.rows / modelShape.height;
std::vector<int> class_ids;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
for (int i = 0; i < rows; ++i)
{
if (yolov8)
{
float *classes_scores = data+4;
cv::Mat scores(1, classes.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double maxClassScore;
minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
if (maxClassScore > modelScoreThreshold)
{
confidences.push_back(maxClassScore);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * x_factor);
int top = int((y - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
boxes.push_back(cv::Rect(left, top, width, height));
}
}
else // yolov5
{
float confidence = data[4];
if (confidence >= modelConfidenceThreshold)
{
float *classes_scores = data+5;
cv::Mat scores(1, classes.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double max_class_score;
minMaxLoc(scores, 0, &max_class_score, 0, &class_id);
if (max_class_score > modelScoreThreshold)
{
confidences.push_back(confidence);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * x_factor);
int top = int((y - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
boxes.push_back(cv::Rect(left, top, width, height));
}
}
}
data += dimensions;
}
std::vector<int> nms_result;
cv::dnn::NMSBoxes(boxes, confidences, modelScoreThreshold, modelNMSThreshold, nms_result);
std::vector<Detection> detections{};
for (unsigned long i = 0; i < nms_result.size(); ++i)
{
int idx = nms_result[i];
Detection result;
result.class_id = class_ids[idx];
result.confidence = confidences[idx];
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<int> dis(100, 255);
result.color = cv::Scalar(dis(gen),
dis(gen),
dis(gen));
result.className = classes[result.class_id];
result.box = boxes[idx];
detections.push_back(result);
}
return detections;
}
void Inference::loadClassesFromFile()
{
std::ifstream inputFile(classesPath);
if (inputFile.is_open())
{
std::string classLine;
while (std::getline(inputFile, classLine))
classes.push_back(classLine);
inputFile.close();
}
}
void Inference::loadOnnxNetwork()
{
net = cv::dnn::readNetFromONNX(modelPath);
if (cudaEnabled)
{
std::cout << "\nRunning on CUDA" << std::endl;
net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
}
else
{
std::cout << "\nRunning on CPU" << std::endl;
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}
}
cv::Mat Inference::formatToSquare(const cv::Mat &source)
{
int col = source.cols;
int row = source.rows;
int _max = MAX(col, row);
cv::Mat result = cv::Mat::zeros(_max, _max, CV_8UC3);
source.copyTo(result(cv::Rect(0, 0, col, row)));
return result;
}

examples/YOLOv8-CPP-Inference/inference.h (new file, 52 lines)
@@ -0,0 +1,52 @@
#ifndef INFERENCE_H
#define INFERENCE_H
// Cpp native
#include <fstream>
#include <vector>
#include <string>
#include <random>
// OpenCV / DNN / Inference
#include <opencv2/imgproc.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
struct Detection
{
int class_id{0};
std::string className{};
float confidence{0.0};
cv::Scalar color{};
cv::Rect box{};
};
class Inference
{
public:
Inference(const std::string &onnxModelPath, const cv::Size &modelInputShape = {640, 640}, const std::string &classesTxtFile = "", const bool &runWithCuda = true);
std::vector<Detection> runInference(const cv::Mat &input);
private:
void loadClassesFromFile();
void loadOnnxNetwork();
cv::Mat formatToSquare(const cv::Mat &source);
std::string modelPath{};
std::string classesPath{};
bool cudaEnabled{};
std::vector<std::string> classes{"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};
cv::Size2f modelShape{};
float modelConfidenceThreshold {0.25};
float modelScoreThreshold {0.45};
float modelNMSThreshold {0.50};
bool letterBoxForSquare = true;
cv::dnn::Net net;
};
#endif // INFERENCE_H

examples/YOLOv8-CPP-Inference/main.cpp (new file, 70 lines)
@@ -0,0 +1,70 @@
#include <iostream>
#include <vector>
#include <getopt.h>
#include <opencv2/opencv.hpp>
#include "inference.h"
using namespace std;
using namespace cv;
int main(int argc, char **argv)
{
std::string projectBasePath = "/home/user/ultralytics"; // Set your ultralytics base path
bool runOnGPU = true;
//
// Pass in either:
//
// "yolov8s.onnx" or "yolov5s.onnx"
//
// To run Inference with yolov8/yolov5 (ONNX)
//
// Note that in this example the classes are hard-coded and 'classes.txt' is a placeholder.
Inference inf(projectBasePath + "/yolov8s.onnx", cv::Size(640, 480), "classes.txt", runOnGPU);
std::vector<std::string> imageNames;
imageNames.push_back(projectBasePath + "/ultralytics/assets/bus.jpg");
imageNames.push_back(projectBasePath + "/ultralytics/assets/zidane.jpg");
for (int i = 0; i < imageNames.size(); ++i)
{
cv::Mat frame = cv::imread(imageNames[i]);
// Inference starts here...
std::vector<Detection> output = inf.runInference(frame);
int detections = output.size();
std::cout << "Number of detections:" << detections << std::endl;
for (int i = 0; i < detections; ++i)
{
Detection detection = output[i];
cv::Rect box = detection.box;
cv::Scalar color = detection.color;
// Detection box
cv::rectangle(frame, box, color, 2);
// Detection box text
std::string classString = detection.className + ' ' + std::to_string(detection.confidence).substr(0, 4);
cv::Size textSize = cv::getTextSize(classString, cv::FONT_HERSHEY_DUPLEX, 1, 2, 0);
cv::Rect textBox(box.x, box.y - 40, textSize.width + 10, textSize.height + 20);
cv::rectangle(frame, textBox, color, cv::FILLED);
cv::putText(frame, classString, cv::Point(box.x + 5, box.y - 10), cv::FONT_HERSHEY_DUPLEX, 1, cv::Scalar(0, 0, 0), 2, 0);
}
// Inference ends here...
// This is only for preview purposes
float scale = 0.8;
cv::resize(frame, frame, cv::Size(frame.cols*scale, frame.rows*scale));
cv::imshow("Inference", frame);
cv::waitKey(-1);
}
}

examples/YOLOv8-LibTorch-CPP-Inference/CMakeLists.txt (new file, 47 lines)
@@ -0,0 +1,47 @@
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(yolov8_libtorch_example)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# -------------- OpenCV --------------
set(OpenCV_DIR "/path/to/opencv/lib/cmake/opencv4")
find_package(OpenCV REQUIRED)
message(STATUS "OpenCV library status:")
message(STATUS " config: ${OpenCV_DIR}")
message(STATUS " version: ${OpenCV_VERSION}")
message(STATUS " libraries: ${OpenCV_LIBS}")
message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
include_directories(${OpenCV_INCLUDE_DIRS})
# -------------- libtorch --------------
list(APPEND CMAKE_PREFIX_PATH "/path/to/libtorch")
set(Torch_DIR "/path/to/libtorch/share/cmake/Torch")
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
message("${TORCH_LIBRARIES}")
message("${TORCH_INCLUDE_DIRS}")
# The following code block is suggested to be used on Windows.
# According to https://github.com/pytorch/pytorch/issues/25457,
# the DLLs need to be copied to avoid memory errors.
# if (MSVC)
# file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
# add_custom_command(TARGET yolov8_libtorch_example
# POST_BUILD
# COMMAND ${CMAKE_COMMAND} -E copy_if_different
# ${TORCH_DLLS}
# $<TARGET_FILE_DIR:yolov8_libtorch_example>)
# endif (MSVC)
include_directories(${TORCH_INCLUDE_DIRS})
add_executable(yolov8_libtorch_inference "${CMAKE_CURRENT_SOURCE_DIR}/main.cc")
target_link_libraries(yolov8_libtorch_inference ${TORCH_LIBRARIES} ${OpenCV_LIBS})
set_property(TARGET yolov8_libtorch_inference PROPERTY CXX_STANDARD 17)

examples/YOLOv8-LibTorch-CPP-Inference/README.md (new file, 35 lines)
@@ -0,0 +1,35 @@
# YOLOv8 LibTorch Inference C++
This example demonstrates how to perform inference using YOLOv8 models in C++ with the LibTorch API.
## Dependencies
| Dependency | Version |
| ------------ | -------- |
| OpenCV | >=4.0.0 |
| C++ Standard | >=17 |
| CMake | >=3.18 |
| LibTorch | >=1.12.1 |
## Usage
```bash
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install .
cd examples/YOLOv8-LibTorch-CPP-Inference
mkdir build
cd build
cmake ..
make
./yolov8_libtorch_inference
```
## Exporting YOLOv8
To export YOLOv8 models:
```commandline
yolo export model=yolov8s.pt imgsz=640 format=torchscript
```
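Once exported, the TorchScript model can be loaded and run from C++ roughly as follows (a minimal sketch based on the `main.cc` below; the model path is a placeholder, and the full example letterboxes a real image instead of using a random tensor):

```cpp
#include <iostream>

#include <torch/script.h>
#include <torch/torch.h>

int main()
{
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);

    // Load the exported TorchScript model (placeholder path)
    torch::jit::script::Module model = torch::jit::load("yolov8s.torchscript");
    model.eval();
    model.to(device, torch::kFloat32);

    // Dummy 1x3x640x640 input; the real example letterboxes an image to this size first
    torch::Tensor input = torch::rand({1, 3, 640, 640}, torch::TensorOptions().device(device));
    torch::Tensor output = model.forward({input}).toTensor().cpu();

    std::cout << output.sizes() << std::endl; // e.g. [1, 84, 8400] for a detection model
    return 0;
}
```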

examples/YOLOv8-LibTorch-CPP-Inference/main.cc (new file, 259 lines)
@@ -0,0 +1,259 @@
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <torch/torch.h>
#include <torch/script.h>
using torch::indexing::Slice;
using torch::indexing::None;
float generate_scale(cv::Mat& image, const std::vector<int>& target_size) {
int origin_w = image.cols;
int origin_h = image.rows;
int target_h = target_size[0];
int target_w = target_size[1];
float ratio_h = static_cast<float>(target_h) / static_cast<float>(origin_h);
float ratio_w = static_cast<float>(target_w) / static_cast<float>(origin_w);
float resize_scale = std::min(ratio_h, ratio_w);
return resize_scale;
}
float letterbox(cv::Mat &input_image, cv::Mat &output_image, const std::vector<int> &target_size) {
if (input_image.cols == target_size[1] && input_image.rows == target_size[0]) {
if (input_image.data == output_image.data) {
return 1.;
} else {
output_image = input_image.clone();
return 1.;
}
}
float resize_scale = generate_scale(input_image, target_size);
int new_shape_w = std::round(input_image.cols * resize_scale);
int new_shape_h = std::round(input_image.rows * resize_scale);
float padw = (target_size[1] - new_shape_w) / 2.;
float padh = (target_size[0] - new_shape_h) / 2.;
int top = std::round(padh - 0.1);
int bottom = std::round(padh + 0.1);
int left = std::round(padw - 0.1);
int right = std::round(padw + 0.1);
cv::resize(input_image, output_image,
cv::Size(new_shape_w, new_shape_h),
0, 0, cv::INTER_AREA);
cv::copyMakeBorder(output_image, output_image, top, bottom, left, right,
cv::BORDER_CONSTANT, cv::Scalar(114.));
return resize_scale;
}
torch::Tensor xyxy2xywh(const torch::Tensor& x) {
auto y = torch::empty_like(x);
y.index_put_({"...", 0}, (x.index({"...", 0}) + x.index({"...", 2})).div(2));
y.index_put_({"...", 1}, (x.index({"...", 1}) + x.index({"...", 3})).div(2));
y.index_put_({"...", 2}, x.index({"...", 2}) - x.index({"...", 0}));
y.index_put_({"...", 3}, x.index({"...", 3}) - x.index({"...", 1}));
return y;
}
torch::Tensor xywh2xyxy(const torch::Tensor& x) {
auto y = torch::empty_like(x);
auto dw = x.index({"...", 2}).div(2);
auto dh = x.index({"...", 3}).div(2);
y.index_put_({"...", 0}, x.index({"...", 0}) - dw);
y.index_put_({"...", 1}, x.index({"...", 1}) - dh);
y.index_put_({"...", 2}, x.index({"...", 0}) + dw);
y.index_put_({"...", 3}, x.index({"...", 1}) + dh);
return y;
}
// Reference: https://github.com/pytorch/vision/blob/main/torchvision/csrc/ops/cpu/nms_kernel.cpp
torch::Tensor nms(const torch::Tensor& bboxes, const torch::Tensor& scores, float iou_threshold) {
if (bboxes.numel() == 0)
return torch::empty({0}, bboxes.options().dtype(torch::kLong));
auto x1_t = bboxes.select(1, 0).contiguous();
auto y1_t = bboxes.select(1, 1).contiguous();
auto x2_t = bboxes.select(1, 2).contiguous();
auto y2_t = bboxes.select(1, 3).contiguous();
torch::Tensor areas_t = (x2_t - x1_t) * (y2_t - y1_t);
auto order_t = std::get<1>(
scores.sort(/*stable=*/true, /*dim=*/0, /* descending=*/true));
auto ndets = bboxes.size(0);
torch::Tensor suppressed_t = torch::zeros({ndets}, bboxes.options().dtype(torch::kByte));
torch::Tensor keep_t = torch::zeros({ndets}, bboxes.options().dtype(torch::kLong));
auto suppressed = suppressed_t.data_ptr<uint8_t>();
auto keep = keep_t.data_ptr<int64_t>();
auto order = order_t.data_ptr<int64_t>();
auto x1 = x1_t.data_ptr<float>();
auto y1 = y1_t.data_ptr<float>();
auto x2 = x2_t.data_ptr<float>();
auto y2 = y2_t.data_ptr<float>();
auto areas = areas_t.data_ptr<float>();
int64_t num_to_keep = 0;
for (int64_t _i = 0; _i < ndets; _i++) {
auto i = order[_i];
if (suppressed[i] == 1)
continue;
keep[num_to_keep++] = i;
auto ix1 = x1[i];
auto iy1 = y1[i];
auto ix2 = x2[i];
auto iy2 = y2[i];
auto iarea = areas[i];
for (int64_t _j = _i + 1; _j < ndets; _j++) {
auto j = order[_j];
if (suppressed[j] == 1)
continue;
auto xx1 = std::max(ix1, x1[j]);
auto yy1 = std::max(iy1, y1[j]);
auto xx2 = std::min(ix2, x2[j]);
auto yy2 = std::min(iy2, y2[j]);
auto w = std::max(static_cast<float>(0), xx2 - xx1);
auto h = std::max(static_cast<float>(0), yy2 - yy1);
auto inter = w * h;
auto ovr = inter / (iarea + areas[j] - inter);
if (ovr > iou_threshold)
suppressed[j] = 1;
}
}
return keep_t.narrow(0, 0, num_to_keep);
}
torch::Tensor non_max_suppression(torch::Tensor& prediction, float conf_thres = 0.25, float iou_thres = 0.45, int max_det = 300) {
auto bs = prediction.size(0);
auto nc = prediction.size(1) - 4;
auto nm = prediction.size(1) - nc - 4;
auto mi = 4 + nc;
auto xc = prediction.index({Slice(), Slice(4, mi)}).amax(1) > conf_thres;
prediction = prediction.transpose(-1, -2);
prediction.index_put_({"...", Slice(None, 4)}, xywh2xyxy(prediction.index({"...", Slice(None, 4)})));
std::vector<torch::Tensor> output;
for (int i = 0; i < bs; i++) {
output.push_back(torch::zeros({0, 6 + nm}, prediction.device()));
}
for (int xi = 0; xi < prediction.size(0); xi++) {
auto x = prediction[xi];
x = x.index({xc[xi]});
auto x_split = x.split({4, nc, nm}, 1);
auto box = x_split[0], cls = x_split[1], mask = x_split[2];
auto [conf, j] = cls.max(1, true);
x = torch::cat({box, conf, j.toType(torch::kFloat), mask}, 1);
x = x.index({conf.view(-1) > conf_thres});
int n = x.size(0);
if (!n) { continue; }
// NMS
auto c = x.index({Slice(), Slice{5, 6}}) * 7680;
auto boxes = x.index({Slice(), Slice(None, 4)}) + c;
auto scores = x.index({Slice(), 4});
auto i = nms(boxes, scores, iou_thres);
i = i.index({Slice(None, max_det)});
output[xi] = x.index({i});
}
return torch::stack(output);
}
torch::Tensor clip_boxes(torch::Tensor& boxes, const std::vector<int>& shape) {
boxes.index_put_({"...", 0}, boxes.index({"...", 0}).clamp(0, shape[1]));
boxes.index_put_({"...", 1}, boxes.index({"...", 1}).clamp(0, shape[0]));
boxes.index_put_({"...", 2}, boxes.index({"...", 2}).clamp(0, shape[1]));
boxes.index_put_({"...", 3}, boxes.index({"...", 3}).clamp(0, shape[0]));
return boxes;
}
torch::Tensor scale_boxes(const std::vector<int>& img1_shape, torch::Tensor& boxes, const std::vector<int>& img0_shape) {
auto gain = (std::min)((float)img1_shape[0] / img0_shape[0], (float)img1_shape[1] / img0_shape[1]);
auto pad0 = std::round((float)(img1_shape[1] - img0_shape[1] * gain) / 2. - 0.1);
auto pad1 = std::round((float)(img1_shape[0] - img0_shape[0] * gain) / 2. - 0.1);
boxes.index_put_({"...", 0}, boxes.index({"...", 0}) - pad0);
boxes.index_put_({"...", 2}, boxes.index({"...", 2}) - pad0);
boxes.index_put_({"...", 1}, boxes.index({"...", 1}) - pad1);
boxes.index_put_({"...", 3}, boxes.index({"...", 3}) - pad1);
boxes.index_put_({"...", Slice(None, 4)}, boxes.index({"...", Slice(None, 4)}).div(gain));
return boxes;
}
int main() {
// Device
torch::Device device(torch::cuda::is_available() ? torch::kCUDA :torch::kCPU);
// Note that in this example the classes are hard-coded
std::vector<std::string> classes {"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant",
"stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
"giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
"spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
"couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"};
try {
// Load the model (e.g. yolov8s.torchscript)
std::string model_path = "/path/to/yolov8s.torchscript";
torch::jit::script::Module yolo_model;
yolo_model = torch::jit::load(model_path);
yolo_model.eval();
yolo_model.to(device, torch::kFloat32);
// Load image and preprocess
cv::Mat image = cv::imread("/path/to/bus.jpg");
cv::Mat input_image;
letterbox(image, input_image, {640, 640});
torch::Tensor image_tensor = torch::from_blob(input_image.data, {input_image.rows, input_image.cols, 3}, torch::kByte).to(device);
image_tensor = image_tensor.toType(torch::kFloat32).div(255);
image_tensor = image_tensor.permute({2, 0, 1});
image_tensor = image_tensor.unsqueeze(0);
std::vector<torch::jit::IValue> inputs {image_tensor};
// Inference
torch::Tensor output = yolo_model.forward(inputs).toTensor().cpu();
// NMS
auto keep = non_max_suppression(output)[0];
auto boxes = keep.index({Slice(), Slice(None, 4)});
keep.index_put_({Slice(), Slice(None, 4)}, scale_boxes({input_image.rows, input_image.cols}, boxes, {image.rows, image.cols}));
// Show the results
for (int i = 0; i < keep.size(0); i++) {
int x1 = keep[i][0].item().toFloat();
int y1 = keep[i][1].item().toFloat();
int x2 = keep[i][2].item().toFloat();
int y2 = keep[i][3].item().toFloat();
float conf = keep[i][4].item().toFloat();
int cls = keep[i][5].item().toInt();
std::cout << "Rect: [" << x1 << "," << y1 << "," << x2 << "," << y2 << "] Conf: " << conf << " Class: " << classes[cls] << std::endl;
}
} catch (const c10::Error& e) {
std::cout << e.msg() << std::endl;
}
return 0;
}

examples/YOLOv8-ONNXRuntime-CPP/CMakeLists.txt (new file, 96 lines)
@@ -0,0 +1,96 @@
cmake_minimum_required(VERSION 3.5)
set(PROJECT_NAME Yolov8OnnxRuntimeCPPInference)
project(${PROJECT_NAME} VERSION 0.0.1 LANGUAGES CXX)
# -------------- Support C++17 for using filesystem ------------------#
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS ON)
set(CMAKE_INCLUDE_CURRENT_DIR ON)
# -------------- OpenCV ------------------#
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# -------------- Compile CUDA for FP16 inference if needed ------------------#
option(USE_CUDA "Enable CUDA support" ON)
if (NOT APPLE AND USE_CUDA)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
add_definitions(-DUSE_CUDA)
else ()
set(USE_CUDA OFF)
endif ()
# -------------- ONNXRUNTIME ------------------#
# Set ONNXRUNTIME_VERSION
set(ONNXRUNTIME_VERSION 1.15.1)
if (WIN32)
if (USE_CUDA)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-win-x64-gpu-${ONNXRUNTIME_VERSION}")
else ()
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-win-x64-${ONNXRUNTIME_VERSION}")
endif ()
elseif (LINUX)
if (USE_CUDA)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-linux-x64-gpu-${ONNXRUNTIME_VERSION}")
else ()
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}")
endif ()
elseif (APPLE)
set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-arm64-${ONNXRUNTIME_VERSION}")
# Apple X64 binary
# set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-x64-${ONNXRUNTIME_VERSION}")
# Apple Universal binary
# set(ONNXRUNTIME_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/onnxruntime-osx-universal2-${ONNXRUNTIME_VERSION}")
endif ()
include_directories(${PROJECT_NAME} ${ONNXRUNTIME_ROOT}/include)
set(PROJECT_SOURCES
main.cpp
inference.h
inference.cpp
)
add_executable(${PROJECT_NAME} ${PROJECT_SOURCES})
if (WIN32)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/onnxruntime.lib)
if (USE_CUDA)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES})
endif ()
elseif (LINUX)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/libonnxruntime.so)
if (USE_CUDA)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES})
endif ()
elseif (APPLE)
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${ONNXRUNTIME_ROOT}/lib/libonnxruntime.dylib)
endif ()
# For windows system, copy onnxruntime.dll to the same folder of the executable file
if (WIN32)
add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different
"${ONNXRUNTIME_ROOT}/lib/onnxruntime.dll"
$<TARGET_FILE_DIR:${PROJECT_NAME}>)
endif ()
# Download https://raw.githubusercontent.com/ultralytics/ultralytics/main/ultralytics/cfg/datasets/coco.yaml
# and put it in the same folder of the executable file
configure_file(coco.yaml ${CMAKE_CURRENT_BINARY_DIR}/coco.yaml COPYONLY)
# Copy yolov8n.onnx file to the same folder of the executable file
configure_file(yolov8n.onnx ${CMAKE_CURRENT_BINARY_DIR}/yolov8n.onnx COPYONLY)
# Create folder name images in the same folder of the executable file
add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/images
)

examples/YOLOv8-ONNXRuntime-CPP/README.md (new file, 107 lines)
@@ -0,0 +1,107 @@
# YOLOv8 OnnxRuntime C++
<img alt="C++" src="https://img.shields.io/badge/C++-17-blue.svg?style=flat&logo=c%2B%2B"> <img alt="Onnx-runtime" src="https://img.shields.io/badge/OnnxRuntime-717272.svg?logo=Onnx&logoColor=white">
This example demonstrates how to perform inference using YOLOv8 in C++ with ONNX Runtime and OpenCV's API.
## Benefits ✨
- Friendly for deployment in the industrial sector.
- Faster than OpenCV's DNN inference on both CPU and GPU.
- Supports FP32 and FP16 CUDA acceleration.
## Note ☕
1. Thanks to Ultralytics' latest release, a `Transpose` op has been added to the YOLOv8 model, which gives YOLOv8 the same output shape as YOLOv5. As a result, you can run inference on YOLOv5/v7/v8 models via this project.
## Exporting YOLOv8 Models 📦
To export YOLOv8 models, use the following Python script:
```python
from ultralytics import YOLO
# Load a YOLOv8 model
model = YOLO("yolov8n.pt")
# Export the model
model.export(format="onnx", opset=12, simplify=True, dynamic=False, imgsz=640)
```
Alternatively, you can export the model with the following terminal command:
```bash
yolo export model=yolov8n.pt opset=12 simplify=True dynamic=False format=onnx imgsz=640,640
```
## Exporting YOLOv8 FP16 Models 📦
```python
import onnx
from onnxconverter_common import float16
model = onnx.load(R'YOUR_ONNX_PATH')
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, R'YOUR_FP16_ONNX_PATH')
```
## Download COCO.yaml file 📂
In order to run the example, you also need to download `coco.yaml`. You can download the file manually from [here](https://raw.githubusercontent.com/ultralytics/ultralytics/main/ultralytics/cfg/datasets/coco.yaml).
## Dependencies ⚙️
| Dependency | Version |
| -------------------------------- | -------------- |
| ONNX Runtime (Linux, Windows, macOS) | >=1.14.1 |
| OpenCV | >=4.0.0 |
| C++ Standard | >=17 |
| CMake | >=3.5 |
| CUDA (optional) | >=11.4 \<12.0 |
| cuDNN (CUDA required) | =8 |
Note: The dependency on C++17 is due to the usage of the C++17 filesystem feature.
Note (2): Due to ONNX Runtime, we need to use CUDA 11 and cuDNN 8. Keep in mind that this requirement might change in the future.
## Build 🛠️
1. Clone the repository to your local machine.
2. Navigate to the root directory of the repository.
3. Create a build directory and navigate to it:
```console
mkdir build && cd build
```
4. Run CMake to generate the build files:
```console
cmake ..
```
5. Build the project:
```console
make
```
6. The built executable should now be located in the `build` directory.
## Usage 🚀
```c++
// Change the parameters as you like
// Pay attention to your device and the ONNX model type (FP32 or FP16)
DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
params.cudaEnable = true;
params.modelType = YOLO_DETECT_V8;
yoloDetector->CreateSession(params);
Detector(yoloDetector);
```
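For context, `yoloDetector` above is created and the class list is loaded before the session is built, roughly as in the `DetectTest()` function of this example's `main.cpp` (a sketch, not a drop-in snippet):

```c++
YOLO_V8* yoloDetector = new YOLO_V8;
ReadCocoYaml(yoloDetector);        // fills yoloDetector->classes from coco.yaml

DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
params.modelType = YOLO_DETECT_V8; // FP32; use YOLO_DETECT_V8_HALF with an FP16 model
params.cudaEnable = true;          // appends the CUDA EP; main.cpp guards this with USE_CUDA

yoloDetector->CreateSession(params);
Detector(yoloDetector);            // runs detection on every image in the ./images folder
```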

examples/YOLOv8-ONNXRuntime-CPP/inference.cpp (new file, 363 lines)
@@ -0,0 +1,363 @@
#include "inference.h"
#include <regex>
#define benchmark
#define min(a,b) (((a) < (b)) ? (a) : (b))
YOLO_V8::YOLO_V8() {
}
YOLO_V8::~YOLO_V8() {
delete session;
}
#ifdef USE_CUDA
namespace Ort
{
template<>
struct TypeToTensorType<half> { static constexpr ONNXTensorElementDataType type = ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16; };
}
#endif
template<typename T>
char* BlobFromImage(cv::Mat& iImg, T& iBlob) {
int channels = iImg.channels();
int imgHeight = iImg.rows;
int imgWidth = iImg.cols;
for (int c = 0; c < channels; c++)
{
for (int h = 0; h < imgHeight; h++)
{
for (int w = 0; w < imgWidth; w++)
{
iBlob[c * imgWidth * imgHeight + h * imgWidth + w] = typename std::remove_pointer<T>::type(
(iImg.at<cv::Vec3b>(h, w)[c]) / 255.0f);
}
}
}
return RET_OK;
}
char* YOLO_V8::PreProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg)
{
if (iImg.channels() == 3)
{
oImg = iImg.clone();
cv::cvtColor(oImg, oImg, cv::COLOR_BGR2RGB);
}
else
{
cv::cvtColor(iImg, oImg, cv::COLOR_GRAY2RGB);
}
switch (modelType)
{
case YOLO_DETECT_V8:
case YOLO_POSE:
case YOLO_DETECT_V8_HALF:
case YOLO_POSE_V8_HALF://LetterBox
{
if (iImg.cols >= iImg.rows)
{
resizeScales = iImg.cols / (float)iImgSize.at(0);
cv::resize(oImg, oImg, cv::Size(iImgSize.at(0), int(iImg.rows / resizeScales)));
}
else
{
resizeScales = iImg.rows / (float)iImgSize.at(0);
cv::resize(oImg, oImg, cv::Size(int(iImg.cols / resizeScales), iImgSize.at(1)));
}
cv::Mat tempImg = cv::Mat::zeros(iImgSize.at(0), iImgSize.at(1), CV_8UC3);
oImg.copyTo(tempImg(cv::Rect(0, 0, oImg.cols, oImg.rows)));
oImg = tempImg;
break;
}
case YOLO_CLS://CenterCrop
{
int h = iImg.rows;
int w = iImg.cols;
int m = min(h, w);
int top = (h - m) / 2;
int left = (w - m) / 2;
cv::resize(oImg(cv::Rect(left, top, m, m)), oImg, cv::Size(iImgSize.at(0), iImgSize.at(1)));
break;
}
}
return RET_OK;
}
char* YOLO_V8::CreateSession(DL_INIT_PARAM& iParams) {
char* Ret = RET_OK;
std::regex pattern("[\u4e00-\u9fa5]");
bool result = std::regex_search(iParams.modelPath, pattern);
if (result)
{
Ret = "[YOLO_V8]:Your model path is error.Change your model path without chinese characters.";
std::cout << Ret << std::endl;
return Ret;
}
try
{
rectConfidenceThreshold = iParams.rectConfidenceThreshold;
iouThreshold = iParams.iouThreshold;
imgSize = iParams.imgSize;
modelType = iParams.modelType;
env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Yolo");
Ort::SessionOptions sessionOption;
if (iParams.cudaEnable)
{
cudaEnable = iParams.cudaEnable;
OrtCUDAProviderOptions cudaOption;
cudaOption.device_id = 0;
sessionOption.AppendExecutionProvider_CUDA(cudaOption);
}
sessionOption.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
sessionOption.SetIntraOpNumThreads(iParams.intraOpNumThreads);
sessionOption.SetLogSeverityLevel(iParams.logSeverityLevel);
#ifdef _WIN32
int ModelPathSize = MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), nullptr, 0);
wchar_t* wide_cstr = new wchar_t[ModelPathSize + 1];
MultiByteToWideChar(CP_UTF8, 0, iParams.modelPath.c_str(), static_cast<int>(iParams.modelPath.length()), wide_cstr, ModelPathSize);
wide_cstr[ModelPathSize] = L'\0';
const wchar_t* modelPath = wide_cstr;
#else
const char* modelPath = iParams.modelPath.c_str();
#endif // _WIN32
session = new Ort::Session(env, modelPath, sessionOption);
Ort::AllocatorWithDefaultOptions allocator;
size_t inputNodesNum = session->GetInputCount();
for (size_t i = 0; i < inputNodesNum; i++)
{
Ort::AllocatedStringPtr input_node_name = session->GetInputNameAllocated(i, allocator);
char* temp_buf = new char[strlen(input_node_name.get()) + 1];
strcpy(temp_buf, input_node_name.get());
inputNodeNames.push_back(temp_buf);
}
size_t OutputNodesNum = session->GetOutputCount();
for (size_t i = 0; i < OutputNodesNum; i++)
{
Ort::AllocatedStringPtr output_node_name = session->GetOutputNameAllocated(i, allocator);
char* temp_buf = new char[strlen(output_node_name.get()) + 1];
strcpy(temp_buf, output_node_name.get());
outputNodeNames.push_back(temp_buf);
}
options = Ort::RunOptions{ nullptr };
WarmUpSession();
return RET_OK;
}
catch (const std::exception& e)
{
const char* str1 = "[YOLO_V8]:";
const char* str2 = e.what();
std::string result = std::string(str1) + std::string(str2);
char* merged = new char[result.length() + 1];
std::strcpy(merged, result.c_str());
std::cout << merged << std::endl;
delete[] merged;
return "[YOLO_V8]:Create session failed.";
}
}
char* YOLO_V8::RunSession(cv::Mat& iImg, std::vector<DL_RESULT>& oResult) {
#ifdef benchmark
clock_t starttime_1 = clock();
#endif // benchmark
char* Ret = RET_OK;
cv::Mat processedImg;
PreProcess(iImg, imgSize, processedImg);
if (modelType < 4)
{
float* blob = new float[processedImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> inputNodeDims = { 1, 3, imgSize.at(0), imgSize.at(1) };
TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
}
else
{
#ifdef USE_CUDA
half* blob = new half[processedImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> inputNodeDims = { 1,3,imgSize.at(0),imgSize.at(1) };
TensorProcess(starttime_1, iImg, blob, inputNodeDims, oResult);
#endif
}
return Ret;
}
template<typename N>
char* YOLO_V8::TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<DL_RESULT>& oResult) {
Ort::Value inputTensor = Ort::Value::CreateTensor<typename std::remove_pointer<N>::type>(
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
inputNodeDims.data(), inputNodeDims.size());
#ifdef benchmark
clock_t starttime_2 = clock();
#endif // benchmark
auto outputTensor = session->Run(options, inputNodeNames.data(), &inputTensor, 1, outputNodeNames.data(),
outputNodeNames.size());
#ifdef benchmark
clock_t starttime_3 = clock();
#endif // benchmark
Ort::TypeInfo typeInfo = outputTensor.front().GetTypeInfo();
auto tensor_info = typeInfo.GetTensorTypeAndShapeInfo();
std::vector<int64_t> outputNodeDims = tensor_info.GetShape();
auto output = outputTensor.front().GetTensorMutableData<typename std::remove_pointer<N>::type>();
delete[] blob;
switch (modelType)
{
case YOLO_DETECT_V8:
case YOLO_DETECT_V8_HALF:
{
int strideNum = outputNodeDims[1];//8400
int signalResultNum = outputNodeDims[2];//84
std::vector<int> class_ids;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
cv::Mat rawData;
if (modelType == YOLO_DETECT_V8)
{
// FP32
rawData = cv::Mat(strideNum, signalResultNum, CV_32F, output);
}
else
{
// FP16
rawData = cv::Mat(strideNum, signalResultNum, CV_16F, output);
rawData.convertTo(rawData, CV_32F);
}
// Note:
// Ultralytics adds a transpose operator to the output of the YOLOv8 model, which gives YOLOv8/v5/v7 the same output shape.
// https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt
// rawData = rawData.t();
float* data = (float*)rawData.data;
for (int i = 0; i < strideNum; ++i)
{
float* classesScores = data + 4;
cv::Mat scores(1, this->classes.size(), CV_32FC1, classesScores);
cv::Point class_id;
double maxClassScore;
cv::minMaxLoc(scores, 0, &maxClassScore, 0, &class_id);
if (maxClassScore > rectConfidenceThreshold)
{
confidences.push_back(maxClassScore);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * resizeScales);
int top = int((y - 0.5 * h) * resizeScales);
int width = int(w * resizeScales);
int height = int(h * resizeScales);
boxes.push_back(cv::Rect(left, top, width, height));
}
data += signalResultNum;
}
std::vector<int> nmsResult;
cv::dnn::NMSBoxes(boxes, confidences, rectConfidenceThreshold, iouThreshold, nmsResult);
for (int i = 0; i < nmsResult.size(); ++i)
{
int idx = nmsResult[i];
DL_RESULT result;
result.classId = class_ids[idx];
result.confidence = confidences[idx];
result.box = boxes[idx];
oResult.push_back(result);
}
#ifdef benchmark
clock_t starttime_4 = clock();
double pre_process_time = (double)(starttime_2 - starttime_1) / CLOCKS_PER_SEC * 1000;
double process_time = (double)(starttime_3 - starttime_2) / CLOCKS_PER_SEC * 1000;
double post_process_time = (double)(starttime_4 - starttime_3) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
}
else
{
std::cout << "[YOLO_V8(CPU)]: " << pre_process_time << "ms pre-process, " << process_time << "ms inference, " << post_process_time << "ms post-process." << std::endl;
}
#endif // benchmark
break;
}
case YOLO_CLS:
{
DL_RESULT result;
for (int i = 0; i < this->classes.size(); i++)
{
result.classId = i;
result.confidence = output[i];
oResult.push_back(result);
}
break;
}
default:
std::cout << "[YOLO_V8]: " << "Not support model type." << std::endl;
}
return RET_OK;
}
char* YOLO_V8::WarmUpSession() {
clock_t starttime_1 = clock();
cv::Mat iImg = cv::Mat(cv::Size(imgSize.at(0), imgSize.at(1)), CV_8UC3);
cv::Mat processedImg;
PreProcess(iImg, imgSize, processedImg);
if (modelType < 4)
{
float* blob = new float[iImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> YOLO_input_node_dims = { 1, 3, imgSize.at(0), imgSize.at(1) };
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1),
YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(),
outputNodeNames.size());
delete[] blob;
clock_t starttime_4 = clock();
double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
}
}
else
{
#ifdef USE_CUDA
half* blob = new half[iImg.total() * 3];
BlobFromImage(processedImg, blob);
std::vector<int64_t> YOLO_input_node_dims = { 1,3,imgSize.at(0),imgSize.at(1) };
Ort::Value input_tensor = Ort::Value::CreateTensor<half>(Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU), blob, 3 * imgSize.at(0) * imgSize.at(1), YOLO_input_node_dims.data(), YOLO_input_node_dims.size());
auto output_tensors = session->Run(options, inputNodeNames.data(), &input_tensor, 1, outputNodeNames.data(), outputNodeNames.size());
delete[] blob;
clock_t starttime_4 = clock();
double post_process_time = (double)(starttime_4 - starttime_1) / CLOCKS_PER_SEC * 1000;
if (cudaEnable)
{
std::cout << "[YOLO_V8(CUDA)]: " << "Cuda warm-up cost " << post_process_time << " ms. " << std::endl;
}
#endif
}
return RET_OK;
}

examples/YOLOv8-ONNXRuntime-CPP/inference.h (new file, 93 lines)
@@ -0,0 +1,93 @@
#pragma once
#define RET_OK nullptr
#ifdef _WIN32
#include <Windows.h>
#include <direct.h>
#include <io.h>
#endif
#include <string>
#include <vector>
#include <cstdio>
#include <opencv2/opencv.hpp>
#include "onnxruntime_cxx_api.h"
#ifdef USE_CUDA
#include <cuda_fp16.h>
#endif
enum MODEL_TYPE
{
//FLOAT32 MODEL
YOLO_DETECT_V8 = 1,
YOLO_POSE = 2,
YOLO_CLS = 3,
//FLOAT16 MODEL
YOLO_DETECT_V8_HALF = 4,
YOLO_POSE_V8_HALF = 5,
};
typedef struct _DL_INIT_PARAM
{
std::string modelPath;
MODEL_TYPE modelType = YOLO_DETECT_V8;
std::vector<int> imgSize = { 640, 640 };
float rectConfidenceThreshold = 0.6;
float iouThreshold = 0.5;
int keyPointsNum = 2;//Note:kpt number for pose
bool cudaEnable = false;
int logSeverityLevel = 3;
int intraOpNumThreads = 1;
} DL_INIT_PARAM;
typedef struct _DL_RESULT
{
int classId;
float confidence;
cv::Rect box;
std::vector<cv::Point2f> keyPoints;
} DL_RESULT;
class YOLO_V8
{
public:
YOLO_V8();
~YOLO_V8();
public:
char* CreateSession(DL_INIT_PARAM& iParams);
char* RunSession(cv::Mat& iImg, std::vector<DL_RESULT>& oResult);
char* WarmUpSession();
template<typename N>
char* TensorProcess(clock_t& starttime_1, cv::Mat& iImg, N& blob, std::vector<int64_t>& inputNodeDims,
std::vector<DL_RESULT>& oResult);
char* PreProcess(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg);
std::vector<std::string> classes{};
private:
Ort::Env env;
Ort::Session* session;
bool cudaEnable;
Ort::RunOptions options;
std::vector<const char*> inputNodeNames;
std::vector<const char*> outputNodeNames;
MODEL_TYPE modelType;
std::vector<int> imgSize;
float rectConfidenceThreshold;
float iouThreshold;
float resizeScales;//letterbox scale
};

examples/YOLOv8-ONNXRuntime-CPP/main.cpp (new file, 193 lines)
@@ -0,0 +1,193 @@
#include <iostream>
#include <iomanip>
#include "inference.h"
#include <filesystem>
#include <fstream>
#include <random>
void Detector(YOLO_V8*& p) {
std::filesystem::path current_path = std::filesystem::current_path();
std::filesystem::path imgs_path = current_path / "images";
for (auto& i : std::filesystem::directory_iterator(imgs_path))
{
if (i.path().extension() == ".jpg" || i.path().extension() == ".png" || i.path().extension() == ".jpeg")
{
std::string img_path = i.path().string();
cv::Mat img = cv::imread(img_path);
std::vector<DL_RESULT> res;
p->RunSession(img, res);
for (auto& re : res)
{
cv::RNG rng(cv::getTickCount());
cv::Scalar color(rng.uniform(0, 256), rng.uniform(0, 256), rng.uniform(0, 256));
cv::rectangle(img, re.box, color, 3);
float confidence = floor(100 * re.confidence) / 100;
std::cout << std::fixed << std::setprecision(2);
std::string label = p->classes[re.classId] + " " +
std::to_string(confidence).substr(0, std::to_string(confidence).size() - 4);
cv::rectangle(
img,
cv::Point(re.box.x, re.box.y - 25),
cv::Point(re.box.x + label.length() * 15, re.box.y),
color,
cv::FILLED
);
cv::putText(
img,
label,
cv::Point(re.box.x, re.box.y - 5),
cv::FONT_HERSHEY_SIMPLEX,
0.75,
cv::Scalar(0, 0, 0),
2
);
}
std::cout << "Press any key to exit" << std::endl;
cv::imshow("Result of Detection", img);
cv::waitKey(0);
cv::destroyAllWindows();
}
}
}
void Classifier(YOLO_V8*& p)
{
std::filesystem::path current_path = std::filesystem::current_path();
std::filesystem::path imgs_path = current_path;// / "images"
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<int> dis(0, 255);
for (auto& i : std::filesystem::directory_iterator(imgs_path))
{
if (i.path().extension() == ".jpg" || i.path().extension() == ".png")
{
std::string img_path = i.path().string();
//std::cout << img_path << std::endl;
cv::Mat img = cv::imread(img_path);
std::vector<DL_RESULT> res;
char* ret = p->RunSession(img, res);
float positionY = 50;
for (int i = 0; i < res.size(); i++)
{
int r = dis(gen);
int g = dis(gen);
int b = dis(gen);
cv::putText(img, std::to_string(i) + ":", cv::Point(10, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
cv::putText(img, std::to_string(res.at(i).confidence), cv::Point(70, positionY), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(b, g, r), 2);
positionY += 50;
}
cv::imshow("TEST_CLS", img);
cv::waitKey(0);
cv::destroyAllWindows();
//cv::imwrite("E:\\output\\" + std::to_string(k) + ".png", img);
}
}
}
int ReadCocoYaml(YOLO_V8*& p) {
// Open the YAML file
std::ifstream file("coco.yaml");
if (!file.is_open())
{
std::cerr << "Failed to open file" << std::endl;
return 1;
}
// Read the file line by line
std::string line;
std::vector<std::string> lines;
while (std::getline(file, line))
{
lines.push_back(line);
}
// Find the start and end of the names section
std::size_t start = 0;
std::size_t end = 0;
for (std::size_t i = 0; i < lines.size(); i++)
{
if (lines[i].find("names:") != std::string::npos)
{
start = i + 1;
}
else if (start > 0 && lines[i].find(':') == std::string::npos)
{
end = i;
break;
}
}
// Extract the names
std::vector<std::string> names;
for (std::size_t i = start; i < end; i++)
{
std::stringstream ss(lines[i]);
std::string name;
std::getline(ss, name, ':'); // Extract the number before the delimiter
std::getline(ss, name); // Extract the string after the delimiter
names.push_back(name);
}
p->classes = names;
return 0;
}
void DetectTest()
{
YOLO_V8* yoloDetector = new YOLO_V8;
ReadCocoYaml(yoloDetector);
DL_INIT_PARAM params;
params.rectConfidenceThreshold = 0.1;
params.iouThreshold = 0.5;
params.modelPath = "yolov8n.onnx";
params.imgSize = { 640, 640 };
#ifdef USE_CUDA
params.cudaEnable = true;
// GPU FP32 inference
params.modelType = YOLO_DETECT_V8;
// GPU FP16 inference
//Note: change fp16 onnx model
//params.modelType = YOLO_DETECT_V8_HALF;
#else
// CPU inference
params.modelType = YOLO_DETECT_V8;
params.cudaEnable = false;
#endif
yoloDetector->CreateSession(params);
Detector(yoloDetector);
}
void ClsTest()
{
YOLO_V8* yoloDetector = new YOLO_V8;
std::string model_path = "cls.onnx";
ReadCocoYaml(yoloDetector);
DL_INIT_PARAM params{ model_path, YOLO_CLS, {224, 224} };
yoloDetector->CreateSession(params);
Classifier(yoloDetector);
}
int main()
{
//DetectTest();
ClsTest();
}

examples/YOLOv8-ONNXRuntime-Rust/Cargo.toml (new file, 21 lines)
@@ -0,0 +1,21 @@
[package]
name = "yolov8-rs"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
clap = { version = "4.2.4", features = ["derive"] }
image = { version = "0.24.7", default-features = false, features = ["jpeg", "png", "webp-encoder"] }
imageproc = { version = "0.23.0", default-features = false }
ndarray = { version = "0.15.6" }
ort = {version = "1.16.3", default-features = false, features = ["load-dynamic", "copy-dylibs", "half"]}
rusttype = { version = "0.9", default-features = false }
anyhow = { version = "1.0.75"}
regex = { version = "1.5.4" }
rand = { version ="0.8.5" }
chrono = { version = "0.4.30" }
half = { version = "2.3.1" }
dirs = { version = "5.0.1" }
ureq = { version = "2.9.1" }

examples/YOLOv8-ONNXRuntime-Rust/README.md (new file, 221 lines)
@@ -0,0 +1,221 @@
# YOLOv8-ONNXRuntime-Rust for All the Key YOLO Tasks
This repository provides a Rust demo for performing YOLOv8 tasks like `Classification`, `Segmentation`, `Detection` and `Pose Detection` using ONNXRuntime.
## Features
- Supports `Classification`, `Segmentation`, `Detection` and `Pose(Keypoints)-Detection` tasks.
- Supports `FP16` & `FP32` ONNX models.
- Supports `CPU`, `CUDA` and `TensorRT` execution providers to accelerate computation.
- Supports dynamic input shapes (`batch`, `width`, `height`).
## Installation
### 1. Install Rust
Please follow the official Rust installation guide (https://www.rust-lang.org/tools/install).
### 2. Install ONNXRuntime
This repository uses the `ort` crate, which is an ONNXRuntime wrapper for Rust. (https://docs.rs/ort/latest/ort/)
You can follow the instructions in the `ort` docs or simply do the following:
- Step 1: Download ONNXRuntime (https://github.com/microsoft/onnxruntime/releases)
- Step 2: Set the `LD_LIBRARY_PATH` environment variable for linking.
On Ubuntu, you can do it like this:
```
vim ~/.bashrc
# Add the path of the ONNXRuntime lib
export LD_LIBRARY_PATH=/home/qweasd/Documents/onnxruntime-linux-x64-gpu-1.16.3/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
```
### 3. \[Optional\] Install CUDA & CuDNN & TensorRT
- CUDA execution provider requires CUDA v11.6+.
- TensorRT execution provider requires CUDA v11.4+ and TensorRT v8.4+.
## Get Started
### 1. Export the YOLOv8 ONNX Models
```bash
pip install -U ultralytics
# export onnx model with dynamic shapes
yolo export model=yolov8m.pt format=onnx simplify dynamic
yolo export model=yolov8m-cls.pt format=onnx simplify dynamic
yolo export model=yolov8m-pose.pt format=onnx simplify dynamic
yolo export model=yolov8m-seg.pt format=onnx simplify dynamic
# export onnx model with constant shapes
yolo export model=yolov8m.pt format=onnx simplify
yolo export model=yolov8m-cls.pt format=onnx simplify
yolo export model=yolov8m-pose.pt format=onnx simplify
yolo export model=yolov8m-seg.pt format=onnx simplify
```
### 2. Run Inference
It will perform inference with the ONNX model on the source image.
```
cargo run --release -- --model <MODEL> --source <SOURCE>
```
Set `--cuda` to use CUDA execution provider to speed up inference.
```
cargo run --release -- --cuda --model <MODEL> --source <SOURCE>
```
Set `--trt` to use TensorRT execution provider, and you can set `--fp16` at the same time to use TensorRT FP16 engine.
```
cargo run --release -- --trt --fp16 --model <MODEL> --source <SOURCE>
```
Set `--device_id` to select which device to run on. If you have only one GPU and set `device_id` to 1, the program will not panic; `ort` will automatically fall back to the `CPU` EP.
```
cargo run --release -- --cuda --device_id 0 --model <MODEL> --source <SOURCE>
```
Set `--batch` to do multi-batch-size inference.
If you're using `--trt`, you can also set `--batch-min` and `--batch-max` to explicitly specify the min/max/opt batch sizes for dynamic batch input (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#explicit-shape-range-for-dynamic-shape-input). (Note that the ONNX model should be exported with dynamic shapes.)
```
cargo run --release -- --cuda --batch 2 --model <MODEL> --source <SOURCE>
```
Set `--height` and `--width` to do dynamic image size inference. (Note that the ONNX model should be exported with dynamic shapes.)
```
cargo run --release -- --cuda --width 480 --height 640 --model <MODEL> --source <SOURCE>
```
Set `--profile` to check the time consumed in each stage. (Note that the model usually needs 1-3 dry runs to warm up. Make sure to run enough times to evaluate the result.)
```
cargo run --release -- --trt --fp16 --profile --model <MODEL> --source <SOURCE>
```
Results: (yolov8m.onnx, batch=1, 3 times, trt, fp16, RTX 3060Ti)
```
==> 0
[Model Preprocess]: 12.75788ms
[ORT H2D]: 237.118µs
[ORT Inference]: 507.895469ms
[ORT D2H]: 191.655µs
[Model Inference]: 508.34589ms
[Model Postprocess]: 1.061122ms
==> 1
[Model Preprocess]: 13.658655ms
[ORT H2D]: 209.975µs
[ORT Inference]: 5.12372ms
[ORT D2H]: 182.389µs
[Model Inference]: 5.530022ms
[Model Postprocess]: 1.04851ms
==> 2
[Model Preprocess]: 12.475332ms
[ORT H2D]: 246.127µs
[ORT Inference]: 5.048432ms
[ORT D2H]: 187.117µs
[Model Inference]: 5.493119ms
[Model Postprocess]: 1.040906ms
```
Additional options:
`--conf`: confidence threshold \[default: 0.3\]
`--iou`: iou threshold in NMS \[default: 0.45\]
`--kconf`: confidence threshold of keypoint \[default: 0.55\]
`--plot`: plot inference result with random RGB color and save
You can check out all CLI arguments by running:
```
git clone https://github.com/ultralytics/ultralytics
cd ultralytics/examples/YOLOv8-ONNXRuntime-Rust
cargo run --release -- --help
```
## Examples
### Classification
Running a dynamic-shape ONNX model on `CPU` with image size `--height 224 --width 224`, saving the plotted image to the `runs` directory.
```
cargo run --release -- --model ../assets/weights/yolov8m-cls-dyn.onnx --source ../assets/images/dog.jpg --height 224 --width 224 --plot --profile
```
You will see a result like:
```
Summary:
> Task: Classify (Ultralytics 8.0.217)
> EP: Cpu
> Dtype: Float32
> Batch: 1 (Dynamic), Height: 224 (Dynamic), Width: 224 (Dynamic)
> nc: 1000 nk: 0, nm: 0, conf: 0.3, kconf: 0.55, iou: 0.45
[Model Preprocess]: 16.363477ms
[ORT H2D]: 50.722µs
[ORT Inference]: 16.295808ms
[ORT D2H]: 8.37µs
[Model Inference]: 16.367046ms
[Model Postprocess]: 3.527µs
[
YOLOResult {
Probs(top5): Some([(208, 0.6950566), (209, 0.13823675), (178, 0.04849795), (215, 0.019029364), (212, 0.016506357)]),
Bboxes: None,
Keypoints: None,
Masks: None,
},
]
```
![2023-11-25-22-02-02-156623351](https://github.com/jamjamjon/ultralytics/assets/51357717/ef75c2ae-c5ab-44cc-9d9e-e60b51e39662)
### Object Detection
Using the `CUDA` EP and dynamic image size `--height 640 --width 480`
```
cargo run --release -- --cuda --model ../assets/weights/yolov8m-dynamic.onnx --source ../assets/images/bus.jpg --plot --height 640 --width 480
```
![det](https://github.com/jamjamjon/ultralytics/assets/51357717/5d89a19d-0c96-4a59-875c-defab6887a2c)
### Pose Detection
Using the `TensorRT` EP
```
cargo run --release -- --trt --model ../assets/weights/yolov8m-pose.onnx --source ../assets/images/bus.jpg --plot
```
![2023-11-25-22-31-45-127054025](https://github.com/jamjamjon/ultralytics/assets/51357717/157b5ba7-bfcf-47cf-bee7-68b62e0de1c4)
### Instance Segmentation
Using the `TensorRT` EP and an FP16 model (`--fp16`)
```
cargo run --release -- --trt --fp16 --model ../assets/weights/yolov8m-seg.onnx --source ../assets/images/0172.jpg --plot
```
![seg](https://github.com/jamjamjon/ultralytics/assets/51357717/cf046f4f-9533-478a-adc7-4de22443a641)

examples/YOLOv8-ONNXRuntime-Rust/src/cli.rs (new file, 87 lines)
@@ -0,0 +1,87 @@
use clap::Parser;
use crate::YOLOTask;
#[derive(Parser, Clone)]
#[command(author, version, about, long_about = None)]
pub struct Args {
/// ONNX model path
#[arg(long, required = true)]
pub model: String,
/// input path
#[arg(long, required = true)]
pub source: String,
/// device id
#[arg(long, default_value_t = 0)]
pub device_id: u32,
/// using TensorRT EP
#[arg(long)]
pub trt: bool,
/// using CUDA EP
#[arg(long)]
pub cuda: bool,
/// input batch size
#[arg(long, default_value_t = 1)]
pub batch: u32,
/// trt input min_batch size
#[arg(long, default_value_t = 1)]
pub batch_min: u32,
/// trt input max_batch size
#[arg(long, default_value_t = 32)]
pub batch_max: u32,
/// using TensorRT --fp16
#[arg(long)]
pub fp16: bool,
/// specify YOLO task
#[arg(long, value_enum)]
pub task: Option<YOLOTask>,
/// num_classes
#[arg(long)]
pub nc: Option<u32>,
/// num_keypoints
#[arg(long)]
pub nk: Option<u32>,
/// num_masks
#[arg(long)]
pub nm: Option<u32>,
/// input image width
#[arg(long)]
pub width: Option<u32>,
/// input image height
#[arg(long)]
pub height: Option<u32>,
/// confidence threshold
#[arg(long, required = false, default_value_t = 0.3)]
pub conf: f32,
/// iou threshold in NMS
#[arg(long, required = false, default_value_t = 0.45)]
pub iou: f32,
/// confidence threshold of keypoint
#[arg(long, required = false, default_value_t = 0.55)]
pub kconf: f32,
/// plot inference result and save
#[arg(long)]
pub plot: bool,
/// check time consumed in each stage
#[arg(long)]
pub profile: bool,
}

View File

@ -0,0 +1,119 @@
#![allow(clippy::type_complexity)]
use std::io::{Read, Write};
pub mod cli;
pub mod model;
pub mod ort_backend;
pub mod yolo_result;
pub use crate::cli::Args;
pub use crate::model::YOLOv8;
pub use crate::ort_backend::{Batch, OrtBackend, OrtConfig, OrtEP, YOLOTask};
pub use crate::yolo_result::{Bbox, Embedding, Point2, YOLOResult};
pub fn non_max_suppression(
xs: &mut Vec<(Bbox, Option<Vec<Point2>>, Option<Vec<f32>>)>,
iou_threshold: f32,
) {
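// Greedy NMS: sort candidates by confidence (descending), then keep a candidate only if its
// IoU with every box already kept does not exceed `iou_threshold`.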
xs.sort_by(|b1, b2| b2.0.confidence().partial_cmp(&b1.0.confidence()).unwrap());
let mut current_index = 0;
for index in 0..xs.len() {
let mut drop = false;
for prev_index in 0..current_index {
let iou = xs[prev_index].0.iou(&xs[index].0);
if iou > iou_threshold {
drop = true;
break;
}
}
if !drop {
xs.swap(current_index, index);
current_index += 1;
}
}
xs.truncate(current_index);
}
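/// Build a timestamp string in Beijing time (UTC+8) used to name saved result images,
/// e.g. `2023-11-25-22-02-02-156623351` when the delimiter is `-`.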
pub fn gen_time_string(delimiter: &str) -> String {
let offset = chrono::FixedOffset::east_opt(8 * 60 * 60).unwrap(); // Beijing
let t_now = chrono::Utc::now().with_timezone(&offset);
let fmt = format!(
"%Y{}%m{}%d{}%H{}%M{}%S{}%f",
delimiter, delimiter, delimiter, delimiter, delimiter, delimiter
);
t_now.format(&fmt).to_string()
}
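/// Keypoint index pairs that are connected when drawing the pose skeleton (17-keypoint layout).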
pub const SKELETON: [(usize, usize); 16] = [
(0, 1),
(0, 2),
(1, 3),
(2, 4),
(5, 6),
(5, 11),
(6, 12),
(11, 12),
(5, 7),
(6, 8),
(7, 9),
(8, 10),
(11, 13),
(12, 14),
(13, 15),
(14, 16),
];
pub fn check_font(font: &str) -> rusttype::Font<'static> {
// check then load font
// ultralytics font path
let font_path_config = match dirs::config_dir() {
Some(mut d) => {
d.push("Ultralytics");
d.push(font);
d
}
None => panic!("Unsupported operating system. Now support Linux, MacOS, Windows."),
};
// current font path
let font_path_current = std::path::PathBuf::from(font);
// check font
let font_path = if font_path_config.exists() {
font_path_config
} else if font_path_current.exists() {
font_path_current
} else {
println!("Downloading font...");
let source_url = "https://ultralytics.com/assets/Arial.ttf";
let resp = ureq::get(source_url)
.timeout(std::time::Duration::from_secs(500))
.call()
.unwrap_or_else(|err| panic!("> Failed to download font: {source_url}: {err:?}"));
// read to buffer
let mut buffer = vec![];
let total_size = resp
.header("Content-Length")
.and_then(|s| s.parse::<u64>().ok())
.unwrap();
let _reader = resp
.into_reader()
.take(total_size)
.read_to_end(&mut buffer)
.unwrap();
// save
let _path = std::fs::File::create(font).unwrap();
let mut writer = std::io::BufWriter::new(_path);
writer.write_all(&buffer).unwrap();
println!("Font saved at: {:?}", font_path_current.display());
font_path_current
};
// load font
let buffer = std::fs::read(font_path).unwrap();
rusttype::Font::try_from_vec(buffer).unwrap()
}

View File

@ -0,0 +1,28 @@
use clap::Parser;
use yolov8_rs::{Args, YOLOv8};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let args = Args::parse();
// 1. load image
let x = image::io::Reader::open(&args.source)?
.with_guessed_format()?
.decode()?;
// 2. the model supports dynamic batch inference, so the input is a Vec of images
let xs = vec![x];
// You can test `--batch 2` with this
// let xs = vec![x.clone(), x];
// 3. build yolov8 model
let mut model = YOLOv8::new(args)?;
model.summary(); // model info
// 4. run
let ys = model.run(&xs)?;
println!("{:?}", ys);
Ok(())
}

View File

@ -0,0 +1,642 @@
#![allow(clippy::type_complexity)]
use anyhow::Result;
use image::{DynamicImage, GenericImageView, ImageBuffer};
use ndarray::{s, Array, Axis, IxDyn};
use rand::{thread_rng, Rng};
use std::path::PathBuf;
use crate::{
check_font, gen_time_string, non_max_suppression, Args, Batch, Bbox, Embedding, OrtBackend,
OrtConfig, OrtEP, Point2, YOLOResult, YOLOTask, SKELETON,
};
pub struct YOLOv8 {
// YOLOv8 model for all yolo-tasks
engine: OrtBackend,
nc: u32,
nk: u32,
nm: u32,
height: u32,
width: u32,
batch: u32,
task: YOLOTask,
conf: f32,
kconf: f32,
iou: f32,
names: Vec<String>,
color_palette: Vec<(u8, u8, u8)>,
profile: bool,
plot: bool,
}
impl YOLOv8 {
pub fn new(config: Args) -> Result<Self> {
// execution provider
let ep = if config.trt {
OrtEP::Trt(config.device_id)
} else if config.cuda {
OrtEP::Cuda(config.device_id)
} else {
OrtEP::Cpu
};
// batch
let batch = Batch {
opt: config.batch,
min: config.batch_min,
max: config.batch_max,
};
// build ort engine
let ort_args = OrtConfig {
ep,
batch,
f: config.model,
task: config.task,
trt_fp16: config.fp16,
image_size: (config.height, config.width),
};
let engine = OrtBackend::build(ort_args)?;
// get batch, height, width, tasks, nc, nk, nm
let (batch, height, width, task) = (
engine.batch(),
engine.height(),
engine.width(),
engine.task(),
);
let nc = engine.nc().or(config.nc).unwrap_or_else(|| {
panic!("Failed to get num_classes, make it explicit with `--nc`");
});
let (nk, nm) = match task {
YOLOTask::Pose => {
let nk = engine.nk().or(config.nk).unwrap_or_else(|| {
panic!("Failed to get num_keypoints, make it explicit with `--nk`");
});
(nk, 0)
}
YOLOTask::Segment => {
let nm = engine.nm().or(config.nm).unwrap_or_else(|| {
panic!("Failed to get num_masks, make it explicit with `--nm`");
});
(0, nm)
}
_ => (0, 0),
};
// class names
let names = engine.names().unwrap_or(vec!["Unknown".to_string()]);
// color palette
let mut rng = thread_rng();
let color_palette: Vec<_> = names
.iter()
.map(|_| {
(
rng.gen_range(0..=255),
rng.gen_range(0..=255),
rng.gen_range(0..=255),
)
})
.collect();
Ok(Self {
engine,
names,
conf: config.conf,
kconf: config.kconf,
iou: config.iou,
color_palette,
profile: config.profile,
plot: config.plot,
nc,
nk,
nm,
height,
width,
batch,
task,
})
}
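/// Return the aspect-ratio-preserving scale factor `r = min(w1 / w0, h1 / h0)` together with
/// the rounded scaled width and height.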
pub fn scale_wh(&self, w0: f32, h0: f32, w1: f32, h1: f32) -> (f32, f32, f32) {
let r = (w1 / w0).min(h1 / h0);
(r, (w0 * r).round(), (h0 * r).round())
}
pub fn preprocess(&mut self, xs: &Vec<DynamicImage>) -> Result<Array<f32, IxDyn>> {
let mut ys =
Array::ones((xs.len(), 3, self.height() as usize, self.width() as usize)).into_dyn();
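// Pre-fill the whole tensor with a constant gray (144/255) so any area not covered by the resized image acts as padding.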
ys.fill(144.0 / 255.0);
for (idx, x) in xs.iter().enumerate() {
let img = match self.task() {
YOLOTask::Classify => x.resize_exact(
self.width(),
self.height(),
image::imageops::FilterType::Triangle,
),
_ => {
let (w0, h0) = x.dimensions();
let w0 = w0 as f32;
let h0 = h0 as f32;
let (_, w_new, h_new) =
self.scale_wh(w0, h0, self.width() as f32, self.height() as f32); // f32 round
x.resize_exact(
w_new as u32,
h_new as u32,
if let YOLOTask::Segment = self.task() {
image::imageops::FilterType::CatmullRom
} else {
image::imageops::FilterType::Triangle
},
)
}
};
for (x, y, rgb) in img.pixels() {
let x = x as usize;
let y = y as usize;
let [r, g, b, _] = rgb.0;
ys[[idx, 0, y, x]] = (r as f32) / 255.0;
ys[[idx, 1, y, x]] = (g as f32) / 255.0;
ys[[idx, 2, y, x]] = (b as f32) / 255.0;
}
}
Ok(ys)
}
pub fn run(&mut self, xs: &Vec<DynamicImage>) -> Result<Vec<YOLOResult>> {
// pre-process
let t_pre = std::time::Instant::now();
let xs_ = self.preprocess(xs)?;
if self.profile {
println!("[Model Preprocess]: {:?}", t_pre.elapsed());
}
// run
let t_run = std::time::Instant::now();
let ys = self.engine.run(xs_, self.profile)?;
if self.profile {
println!("[Model Inference]: {:?}", t_run.elapsed());
}
// post-process
let t_post = std::time::Instant::now();
let ys = self.postprocess(ys, xs)?;
if self.profile {
println!("[Model Postprocess]: {:?}", t_post.elapsed());
}
// plot and save
if self.plot {
self.plot_and_save(&ys, xs, Some(&SKELETON));
}
Ok(ys)
}
pub fn postprocess(
&self,
xs: Vec<Array<f32, IxDyn>>,
xs0: &[DynamicImage],
) -> Result<Vec<YOLOResult>> {
if let YOLOTask::Classify = self.task() {
let mut ys = Vec::new();
let preds = &xs[0];
for batch in preds.axis_iter(Axis(0)) {
ys.push(YOLOResult::new(
Some(Embedding::new(batch.into_owned())),
None,
None,
None,
));
}
Ok(ys)
} else {
const CXYWH_OFFSET: usize = 4; // cxcywh
const KPT_STEP: usize = 3; // xyconf
let preds = &xs[0];
let protos = {
if xs.len() > 1 {
Some(&xs[1])
} else {
None
}
};
let mut ys = Vec::new();
for (idx, anchor) in preds.axis_iter(Axis(0)).enumerate() {
// [bs, 4 + nc + nm, anchors]
// input image
let width_original = xs0[idx].width() as f32;
let height_original = xs0[idx].height() as f32;
let ratio = (self.width() as f32 / width_original)
.min(self.height() as f32 / height_original);
// save each result
let mut data: Vec<(Bbox, Option<Vec<Point2>>, Option<Vec<f32>>)> = Vec::new();
for pred in anchor.axis_iter(Axis(1)) {
// split preds for different tasks
let bbox = pred.slice(s![0..CXYWH_OFFSET]);
let clss = pred.slice(s![CXYWH_OFFSET..CXYWH_OFFSET + self.nc() as usize]);
let kpts = {
if let YOLOTask::Pose = self.task() {
Some(pred.slice(s![pred.len() - KPT_STEP * self.nk() as usize..]))
} else {
None
}
};
let coefs = {
if let YOLOTask::Segment = self.task() {
Some(pred.slice(s![pred.len() - self.nm() as usize..]).to_vec())
} else {
None
}
};
// confidence and id
let (id, &confidence) = clss
.into_iter()
.enumerate()
.reduce(|max, x| if x.1 > max.1 { x } else { max })
.unwrap(); // definitely will not panic!
// confidence filter
if confidence < self.conf {
continue;
}
// bbox re-scale
let cx = bbox[0] / ratio;
let cy = bbox[1] / ratio;
let w = bbox[2] / ratio;
let h = bbox[3] / ratio;
let x = cx - w / 2.;
let y = cy - h / 2.;
let y_bbox = Bbox::new(
x.max(0.0f32).min(width_original),
y.max(0.0f32).min(height_original),
w,
h,
id,
confidence,
);
// kpts
let y_kpts = {
if let Some(kpts) = kpts {
let mut kpts_ = Vec::new();
// rescale
for i in 0..self.nk() as usize {
let kx = kpts[KPT_STEP * i] / ratio;
let ky = kpts[KPT_STEP * i + 1] / ratio;
let kconf = kpts[KPT_STEP * i + 2];
if kconf < self.kconf {
kpts_.push(Point2::default());
} else {
kpts_.push(Point2::new_with_conf(
kx.max(0.0f32).min(width_original),
ky.max(0.0f32).min(height_original),
kconf,
));
}
}
Some(kpts_)
} else {
None
}
};
// data merged
data.push((y_bbox, y_kpts, coefs));
}
// nms
non_max_suppression(&mut data, self.iou);
// decode
let mut y_bboxes: Vec<Bbox> = Vec::new();
let mut y_kpts: Vec<Vec<Point2>> = Vec::new();
let mut y_masks: Vec<Vec<u8>> = Vec::new();
for elem in data.into_iter() {
if let Some(kpts) = elem.1 {
y_kpts.push(kpts)
}
// decode masks
if let Some(coefs) = elem.2 {
let proto = protos.unwrap().slice(s![idx, .., .., ..]);
let (nm, nh, nw) = proto.dim();
// coefs * proto -> mask
let coefs = Array::from_shape_vec((1, nm), coefs)?; // (n, nm)
let proto = proto.to_owned().into_shape((nm, nh * nw))?; // (nm, nh*nw)
let mask = coefs.dot(&proto).into_shape((nh, nw, 1))?; // (nh, nw, n)
// build image from ndarray
let mask_im: ImageBuffer<image::Luma<_>, Vec<f32>> =
match ImageBuffer::from_raw(nw as u32, nh as u32, mask.into_raw_vec()) {
Some(image) => image,
None => panic!("can not create image from ndarray"),
};
let mut mask_im = image::DynamicImage::from(mask_im); // -> dyn
// rescale masks
let (_, w_mask, h_mask) =
self.scale_wh(width_original, height_original, nw as f32, nh as f32);
let mask_cropped = mask_im.crop(0, 0, w_mask as u32, h_mask as u32);
let mask_original = mask_cropped.resize_exact(
// resize_to_fill
width_original as u32,
height_original as u32,
match self.task() {
YOLOTask::Segment => image::imageops::FilterType::CatmullRom,
_ => image::imageops::FilterType::Triangle,
},
);
// crop-mask with bbox
let mut mask_original_cropped = mask_original.into_luma8();
for y in 0..height_original as usize {
for x in 0..width_original as usize {
if x < elem.0.xmin() as usize
|| x > elem.0.xmax() as usize
|| y < elem.0.ymin() as usize
|| y > elem.0.ymax() as usize
{
mask_original_cropped.put_pixel(
x as u32,
y as u32,
image::Luma([0u8]),
);
}
}
}
y_masks.push(mask_original_cropped.into_raw());
}
y_bboxes.push(elem.0);
}
// save each result
let y = YOLOResult {
probs: None,
bboxes: if !y_bboxes.is_empty() {
Some(y_bboxes)
} else {
None
},
keypoints: if !y_kpts.is_empty() {
Some(y_kpts)
} else {
None
},
masks: if !y_masks.is_empty() {
Some(y_masks)
} else {
None
},
};
ys.push(y);
}
Ok(ys)
}
}
pub fn plot_and_save(
&self,
ys: &[YOLOResult],
xs0: &[DynamicImage],
skeletons: Option<&[(usize, usize)]>,
) {
// check font then load
let font = check_font("Arial.ttf");
for (_idb, (img0, y)) in xs0.iter().zip(ys.iter()).enumerate() {
let mut img = img0.to_rgb8();
// draw for classifier
if let Some(probs) = y.probs() {
for (i, k) in probs.topk(5).iter().enumerate() {
let legend = format!("{} {:.2}%", self.names[k.0], k.1);
let scale = 32;
let legend_size = img.width().max(img.height()) / scale;
let x = img.width() / 20;
let y = img.height() / 20 + i as u32 * legend_size;
imageproc::drawing::draw_text_mut(
&mut img,
image::Rgb([0, 255, 0]),
x as i32,
y as i32,
rusttype::Scale::uniform(legend_size as f32 - 1.),
&font,
&legend,
);
}
}
// draw bboxes & keypoints
if let Some(bboxes) = y.bboxes() {
for (_idx, bbox) in bboxes.iter().enumerate() {
// rect
imageproc::drawing::draw_hollow_rect_mut(
&mut img,
imageproc::rect::Rect::at(bbox.xmin() as i32, bbox.ymin() as i32)
.of_size(bbox.width() as u32, bbox.height() as u32),
image::Rgb(self.color_palette[bbox.id()].into()),
);
// text
let legend = format!("{} {:.2}%", self.names[bbox.id()], bbox.confidence());
let scale = 40;
let legend_size = img.width().max(img.height()) / scale;
imageproc::drawing::draw_text_mut(
&mut img,
image::Rgb(self.color_palette[bbox.id()].into()),
bbox.xmin() as i32,
(bbox.ymin() - legend_size as f32) as i32,
rusttype::Scale::uniform(legend_size as f32 - 1.),
&font,
&legend,
);
}
}
// draw kpts
if let Some(keypoints) = y.keypoints() {
for kpts in keypoints.iter() {
for kpt in kpts.iter() {
// filter
if kpt.confidence() < self.kconf {
continue;
}
// draw point
imageproc::drawing::draw_filled_circle_mut(
&mut img,
(kpt.x() as i32, kpt.y() as i32),
2,
image::Rgb([0, 255, 0]),
);
}
// draw skeleton if has
if let Some(skeletons) = skeletons {
for &(idx1, idx2) in skeletons.iter() {
let kpt1 = &kpts[idx1];
let kpt2 = &kpts[idx2];
if kpt1.confidence() < self.kconf || kpt2.confidence() < self.kconf {
continue;
}
imageproc::drawing::draw_line_segment_mut(
&mut img,
(kpt1.x(), kpt1.y()),
(kpt2.x(), kpt2.y()),
image::Rgb([233, 14, 57]),
);
}
}
}
}
// draw mask
if let Some(masks) = y.masks() {
for (mask, _bbox) in masks.iter().zip(y.bboxes().unwrap().iter()) {
let mask_nd: ImageBuffer<image::Luma<_>, Vec<u8>> =
match ImageBuffer::from_vec(img.width(), img.height(), mask.to_vec()) {
Some(image) => image,
None => panic!("can not crate image from ndarray"),
};
for _x in 0..img.width() {
for _y in 0..img.height() {
let mask_p = imageproc::drawing::Canvas::get_pixel(&mask_nd, _x, _y);
if mask_p.0[0] > 0 {
let mut img_p = imageproc::drawing::Canvas::get_pixel(&img, _x, _y);
// img_p.0[2] = self.color_palette[bbox.id()].2 / 2;
// img_p.0[1] = self.color_palette[bbox.id()].1 / 2;
// img_p.0[0] = self.color_palette[bbox.id()].0 / 2;
img_p.0[2] /= 2;
img_p.0[1] = 255 - (255 - img_p.0[2]) / 2;
img_p.0[0] /= 2;
imageproc::drawing::Canvas::draw_pixel(&mut img, _x, _y, img_p)
}
}
}
}
}
// mkdir and save
let mut runs = PathBuf::from("runs");
if !runs.exists() {
std::fs::create_dir_all(&runs).unwrap();
}
runs.push(gen_time_string("-"));
let saveout = format!("{}.jpg", runs.to_str().unwrap());
let _ = img.save(saveout);
}
}
pub fn summary(&self) {
println!(
"\nSummary:\n\
> Task: {:?}{}\n\
> EP: {:?} {}\n\
> Dtype: {:?}\n\
> Batch: {} ({}), Height: {} ({}), Width: {} ({})\n\
> nc: {} nk: {}, nm: {}, conf: {}, kconf: {}, iou: {}\n\
",
self.task(),
match self.engine.author().zip(self.engine.version()) {
Some((author, ver)) => format!(" ({} {})", author, ver),
None => String::from(""),
},
self.engine.ep(),
if let OrtEP::Cpu = self.engine.ep() {
""
} else {
"(May still fall back to CPU)"
},
self.engine.dtype(),
self.batch(),
if self.engine.is_batch_dynamic() {
"Dynamic"
} else {
"Const"
},
self.height(),
if self.engine.is_height_dynamic() {
"Dynamic"
} else {
"Const"
},
self.width(),
if self.engine.is_width_dynamic() {
"Dynamic"
} else {
"Const"
},
self.nc(),
self.nk(),
self.nm(),
self.conf,
self.kconf,
self.iou,
);
}
pub fn engine(&self) -> &OrtBackend {
&self.engine
}
pub fn conf(&self) -> f32 {
self.conf
}
pub fn set_conf(&mut self, val: f32) {
self.conf = val;
}
pub fn conf_mut(&mut self) -> &mut f32 {
&mut self.conf
}
pub fn kconf(&self) -> f32 {
self.kconf
}
pub fn iou(&self) -> f32 {
self.iou
}
pub fn task(&self) -> &YOLOTask {
&self.task
}
pub fn batch(&self) -> u32 {
self.batch
}
pub fn width(&self) -> u32 {
self.width
}
pub fn height(&self) -> u32 {
self.height
}
pub fn nc(&self) -> u32 {
self.nc
}
pub fn nk(&self) -> u32 {
self.nk
}
pub fn nm(&self) -> u32 {
self.nm
}
pub fn names(&self) -> &Vec<String> {
&self.names
}
}

View File

@ -0,0 +1,534 @@
use anyhow::Result;
use clap::ValueEnum;
use half::f16;
use ndarray::{Array, CowArray, IxDyn};
use ort::execution_providers::{CUDAExecutionProviderOptions, TensorRTExecutionProviderOptions};
use ort::tensor::TensorElementDataType;
use ort::{Environment, ExecutionProvider, Session, SessionBuilder, Value};
use regex::Regex;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum)]
pub enum YOLOTask {
// YOLO tasks
Classify,
Detect,
Pose,
Segment,
}
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum OrtEP {
// ONNXRuntime execution provider
Cpu,
Cuda(u32),
Trt(u32),
}
#[derive(Debug)]
pub struct Batch {
pub opt: u32,
pub min: u32,
pub max: u32,
}
impl Default for Batch {
fn default() -> Self {
Self {
opt: 1,
min: 1,
max: 1,
}
}
}
#[derive(Debug, Default)]
pub struct OrtInputs {
// ONNX model inputs attrs
pub shapes: Vec<Vec<i32>>,
pub dtypes: Vec<TensorElementDataType>,
pub names: Vec<String>,
pub sizes: Vec<Vec<u32>>,
}
impl OrtInputs {
pub fn new(session: &Session) -> Self {
let mut shapes = Vec::new();
let mut dtypes = Vec::new();
let mut names = Vec::new();
for i in session.inputs.iter() {
let shape: Vec<i32> = i
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
dtypes.push(i.input_type);
names.push(i.name.clone());
}
Self {
shapes,
dtypes,
names,
..Default::default()
}
}
}
#[derive(Debug)]
pub struct OrtConfig {
// ORT config
pub f: String,
pub task: Option<YOLOTask>,
pub ep: OrtEP,
pub trt_fp16: bool,
pub batch: Batch,
pub image_size: (Option<u32>, Option<u32>),
}
#[derive(Debug)]
pub struct OrtBackend {
// ORT engine
session: Session,
task: YOLOTask,
ep: OrtEP,
batch: Batch,
inputs: OrtInputs,
}
impl OrtBackend {
pub fn build(args: OrtConfig) -> Result<Self> {
// build env & session
let env = Environment::builder()
.with_name("YOLOv8")
.with_log_level(ort::LoggingLevel::Verbose)
.build()?
.into_arc();
let session = SessionBuilder::new(&env)?.with_model_from_file(&args.f)?;
// get inputs
let mut inputs = OrtInputs::new(&session);
// batch size
let mut batch = args.batch;
let batch = if inputs.shapes[0][0] == -1 {
batch
} else {
assert_eq!(
inputs.shapes[0][0] as u32, batch.opt,
"Expected batch size: {}, got {}. Try using `--batch {}`.",
inputs.shapes[0][0] as u32, batch.opt, inputs.shapes[0][0] as u32
);
batch.opt = inputs.shapes[0][0] as u32;
batch
};
// input size: height and width
let height = if inputs.shapes[0][2] == -1 {
match args.image_size.0 {
Some(height) => height,
None => panic!("Failed to get model height. Make it explicit with `--height`"),
}
} else {
inputs.shapes[0][2] as u32
};
let width = if inputs.shapes[0][3] == -1 {
match args.image_size.1 {
Some(width) => width,
None => panic!("Failed to get model width. Make it explicit with `--width`"),
}
} else {
inputs.shapes[0][3] as u32
};
inputs.sizes.push(vec![height, width]);
// build provider
let (ep, provider) = match args.ep {
OrtEP::Cuda(device_id) => Self::set_ep_cuda(device_id),
OrtEP::Trt(device_id) => Self::set_ep_trt(device_id, args.trt_fp16, &batch, &inputs),
_ => (OrtEP::Cpu, ExecutionProvider::CPU(Default::default())),
};
// build session again with the new provider
let session = SessionBuilder::new(&env)?
// .with_optimization_level(ort::GraphOptimizationLevel::Level3)?
.with_execution_providers([provider])?
.with_model_from_file(args.f)?;
// task: using given one or guessing
let task = match args.task {
Some(task) => task,
None => match session.metadata() {
Err(_) => panic!("No metadata found. Try making it explicit by `--task`"),
Ok(metadata) => match metadata.custom("task") {
Err(_) => panic!("Can not get custom value. Try making it explicit by `--task`"),
Ok(value) => match value {
None => panic!("No correspoing value of `task` found in metadata. Make it explicit by `--task`"),
Some(task) => match task.as_str() {
"classify" => YOLOTask::Classify,
"detect" => YOLOTask::Detect,
"pose" => YOLOTask::Pose,
"segment" => YOLOTask::Segment,
x => todo!("{:?} is not supported for now!", x),
},
},
},
},
};
Ok(Self {
session,
task,
ep,
batch,
inputs,
})
}
pub fn fetch_inputs_from_session(
session: &Session,
) -> (Vec<Vec<i32>>, Vec<TensorElementDataType>, Vec<String>) {
// get inputs attrs from ONNX model
let mut shapes = Vec::new();
let mut dtypes = Vec::new();
let mut names = Vec::new();
for i in session.inputs.iter() {
let shape: Vec<i32> = i
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
dtypes.push(i.input_type);
names.push(i.name.clone());
}
(shapes, dtypes, names)
}
pub fn set_ep_cuda(device_id: u32) -> (OrtEP, ExecutionProvider) {
// set CUDA
if ExecutionProvider::CUDA(Default::default()).is_available() {
(
OrtEP::Cuda(device_id),
ExecutionProvider::CUDA(CUDAExecutionProviderOptions {
device_id,
..Default::default()
}),
)
} else {
println!("> CUDA is not available! Using CPU.");
(OrtEP::Cpu, ExecutionProvider::CPU(Default::default()))
}
}
pub fn set_ep_trt(
device_id: u32,
fp16: bool,
batch: &Batch,
inputs: &OrtInputs,
) -> (OrtEP, ExecutionProvider) {
// set TensorRT
if ExecutionProvider::TensorRT(Default::default()).is_available() {
let (height, width) = (inputs.sizes[0][0], inputs.sizes[0][1]);
// dtype match checking
if inputs.dtypes[0] == TensorElementDataType::Float16 && !fp16 {
panic!(
"Dtype mismatch! Expected: Float32, got: {:?}. You should use `--fp16`",
inputs.dtypes[0]
);
}
// dynamic shape: input_tensor_1:dim_1xdim_2x...,input_tensor_2:dim_3xdim_4x...,...
let mut opt_string = String::new();
let mut min_string = String::new();
let mut max_string = String::new();
for name in inputs.names.iter() {
let s_opt = format!("{}:{}x3x{}x{},", name, batch.opt, height, width);
let s_min = format!("{}:{}x3x{}x{},", name, batch.min, height, width);
let s_max = format!("{}:{}x3x{}x{},", name, batch.max, height, width);
opt_string.push_str(s_opt.as_str());
min_string.push_str(s_min.as_str());
max_string.push_str(s_max.as_str());
}
let _ = opt_string.pop();
let _ = min_string.pop();
let _ = max_string.pop();
(
OrtEP::Trt(device_id),
ExecutionProvider::TensorRT(TensorRTExecutionProviderOptions {
device_id,
fp16_enable: fp16,
timing_cache_enable: true,
profile_min_shapes: min_string,
profile_max_shapes: max_string,
profile_opt_shapes: opt_string,
..Default::default()
}),
)
} else {
println!("> TensorRT is not available! Try using CUDA...");
Self::set_ep_cuda(device_id)
}
}
pub fn fetch_from_metadata(&self, key: &str) -> Option<String> {
// fetch value from onnx model file by key
match self.session.metadata() {
Err(_) => None,
Ok(metadata) => match metadata.custom(key) {
Err(_) => None,
Ok(value) => value,
},
}
}
pub fn run(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// ORT inference
match self.dtype() {
TensorElementDataType::Float16 => self.run_fp16(xs, profile),
TensorElementDataType::Float32 => self.run_fp32(xs, profile),
_ => todo!(),
}
}
pub fn run_fp16(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// f32->f16
let t = std::time::Instant::now();
let xs = xs.mapv(f16::from_f32);
if profile {
println!("[ORT f32->f16]: {:?}", t.elapsed());
}
// h2d
let t = std::time::Instant::now();
let xs = CowArray::from(xs);
let xs = vec![Value::from_array(self.session.allocator(), &xs)?];
if profile {
println!("[ORT H2D]: {:?}", t.elapsed());
}
// run
let t = std::time::Instant::now();
let ys = self.session.run(xs)?;
if profile {
println!("[ORT Inference]: {:?}", t.elapsed());
}
// d2h
Ok(ys
.iter()
.map(|x| {
// d2h
let t = std::time::Instant::now();
let x = x.try_extract::<_>().unwrap().view().clone().into_owned();
if profile {
println!("[ORT D2H]: {:?}", t.elapsed());
}
// f16->f32
let t_ = std::time::Instant::now();
let x = x.mapv(f16::to_f32);
if profile {
println!("[ORT f16->f32]: {:?}", t_.elapsed());
}
x
})
.collect::<Vec<Array<_, _>>>())
}
pub fn run_fp32(&self, xs: Array<f32, IxDyn>, profile: bool) -> Result<Vec<Array<f32, IxDyn>>> {
// h2d
let t = std::time::Instant::now();
let xs = CowArray::from(xs);
let xs = vec![Value::from_array(self.session.allocator(), &xs)?];
if profile {
println!("[ORT H2D]: {:?}", t.elapsed());
}
// run
let t = std::time::Instant::now();
let ys = self.session.run(xs)?;
if profile {
println!("[ORT Inference]: {:?}", t.elapsed());
}
// d2h
Ok(ys
.iter()
.map(|x| {
let t = std::time::Instant::now();
let x = x.try_extract::<_>().unwrap().view().clone().into_owned();
if profile {
println!("[ORT D2H]: {:?}", t.elapsed());
}
x
})
.collect::<Vec<Array<_, _>>>())
}
pub fn output_shapes(&self) -> Vec<Vec<i32>> {
let mut shapes = Vec::new();
for o in &self.session.outputs {
let shape: Vec<_> = o
.dimensions()
.map(|x| if let Some(x) = x { x as i32 } else { -1i32 })
.collect();
shapes.push(shape);
}
shapes
}
pub fn output_dtypes(&self) -> Vec<TensorElementDataType> {
let mut dtypes = Vec::new();
self.session
.outputs
.iter()
.for_each(|x| dtypes.push(x.output_type));
dtypes
}
pub fn input_shapes(&self) -> &Vec<Vec<i32>> {
&self.inputs.shapes
}
pub fn input_names(&self) -> &Vec<String> {
&self.inputs.names
}
pub fn input_dtypes(&self) -> &Vec<TensorElementDataType> {
&self.inputs.dtypes
}
pub fn dtype(&self) -> TensorElementDataType {
self.input_dtypes()[0]
}
pub fn height(&self) -> u32 {
self.inputs.sizes[0][0]
}
pub fn width(&self) -> u32 {
self.inputs.sizes[0][1]
}
pub fn is_height_dynamic(&self) -> bool {
self.input_shapes()[0][2] == -1
}
pub fn is_width_dynamic(&self) -> bool {
self.input_shapes()[0][3] == -1
}
pub fn batch(&self) -> u32 {
self.batch.opt
}
pub fn is_batch_dynamic(&self) -> bool {
self.input_shapes()[0][0] == -1
}
pub fn ep(&self) -> &OrtEP {
&self.ep
}
pub fn task(&self) -> YOLOTask {
self.task.clone()
}
pub fn names(&self) -> Option<Vec<String>> {
// class names, metadata parsing
// String format: `{0: 'person', 1: 'bicycle', 2: 'sports ball', ..., 27: "yellow_lady's_slipper"}`
match self.fetch_from_metadata("names") {
Some(names) => {
let re = Regex::new(r#"(['"])([-()\w '"]+)(['"])"#).unwrap();
let mut names_ = vec![];
for (_, [_, name, _]) in re.captures_iter(&names).map(|x| x.extract()) {
names_.push(name.to_string());
}
Some(names_)
}
None => None,
}
}
pub fn nk(&self) -> Option<u32> {
// num_keypoints, parsed from the `kpt_shape` metadata string, e.g. `[17, 3]`
match self.fetch_from_metadata("kpt_shape") {
None => None,
Some(kpt_string) => {
let re = Regex::new(r"([0-9]+), ([0-9]+)").unwrap();
let caps = re.captures(&kpt_string).unwrap();
Some(caps.get(1).unwrap().as_str().parse::<u32>().unwrap())
}
}
}
pub fn nc(&self) -> Option<u32> {
// num_classes
match self.names() {
// by names
Some(names) => Some(names.len() as u32),
None => match self.task() {
// by task calculation
YOLOTask::Classify => Some(self.output_shapes()[0][1] as u32),
YOLOTask::Detect => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc
Some(self.output_shapes()[0][1] as u32 - 4)
}
}
YOLOTask::Pose => {
match self.nk() {
None => None,
Some(nk) => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc + 3 * nk
Some(self.output_shapes()[0][1] as u32 - 4 - 3 * nk)
}
}
}
}
YOLOTask::Segment => {
if self.output_shapes()[0][1] == -1 {
None
} else {
// output dim 1 = 4 (cxcywh) + nc + nm
Some((self.output_shapes()[0][1] - self.output_shapes()[1][1]) as u32 - 4)
}
}
},
}
}
pub fn nm(&self) -> Option<u32> {
// num_masks
match self.task() {
YOLOTask::Segment => Some(self.output_shapes()[1][1] as u32),
_ => None,
}
}
pub fn na(&self) -> Option<u32> {
// num_anchors
match self.task() {
YOLOTask::Segment | YOLOTask::Detect | YOLOTask::Pose => {
if self.output_shapes()[0][2] == -1 {
None
} else {
Some(self.output_shapes()[0][2] as u32)
}
}
_ => None,
}
}
pub fn author(&self) -> Option<String> {
self.fetch_from_metadata("author")
}
pub fn version(&self) -> Option<String> {
self.fetch_from_metadata("version")
}
}

View File

@ -0,0 +1,235 @@
use ndarray::{Array, Axis, IxDyn};
#[derive(Clone, PartialEq, Default)]
pub struct YOLOResult {
// YOLO tasks results of an image
pub probs: Option<Embedding>,
pub bboxes: Option<Vec<Bbox>>,
pub keypoints: Option<Vec<Vec<Point2>>>,
pub masks: Option<Vec<Vec<u8>>>,
}
impl std::fmt::Debug for YOLOResult {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("YOLOResult")
.field(
"Probs(top5)",
&format_args!("{:?}", self.probs().map(|probs| probs.topk(5))),
)
.field("Bboxes", &self.bboxes)
.field("Keypoints", &self.keypoints)
.field(
"Masks",
&format_args!("{:?}", self.masks().map(|masks| masks.len())),
)
.finish()
}
}
impl YOLOResult {
pub fn new(
probs: Option<Embedding>,
bboxes: Option<Vec<Bbox>>,
keypoints: Option<Vec<Vec<Point2>>>,
masks: Option<Vec<Vec<u8>>>,
) -> Self {
Self {
probs,
bboxes,
keypoints,
masks,
}
}
pub fn probs(&self) -> Option<&Embedding> {
self.probs.as_ref()
}
pub fn keypoints(&self) -> Option<&Vec<Vec<Point2>>> {
self.keypoints.as_ref()
}
pub fn masks(&self) -> Option<&Vec<Vec<u8>>> {
self.masks.as_ref()
}
pub fn bboxes(&self) -> Option<&Vec<Bbox>> {
self.bboxes.as_ref()
}
pub fn bboxes_mut(&mut self) -> Option<&mut Vec<Bbox>> {
self.bboxes.as_mut()
}
}
#[derive(Debug, PartialEq, Clone, Default)]
pub struct Point2 {
// A point2d with x, y, conf
x: f32,
y: f32,
confidence: f32,
}
impl Point2 {
pub fn new_with_conf(x: f32, y: f32, confidence: f32) -> Self {
Self { x, y, confidence }
}
pub fn new(x: f32, y: f32) -> Self {
Self {
x,
y,
..Default::default()
}
}
pub fn x(&self) -> f32 {
self.x
}
pub fn y(&self) -> f32 {
self.y
}
pub fn confidence(&self) -> f32 {
self.confidence
}
}
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Embedding {
// A float32 n-dimensional tensor
data: Array<f32, IxDyn>,
}
impl Embedding {
pub fn new(data: Array<f32, IxDyn>) -> Self {
Self { data }
}
pub fn data(&self) -> &Array<f32, IxDyn> {
&self.data
}
pub fn topk(&self, k: usize) -> Vec<(usize, f32)> {
let mut probs = self
.data
.iter()
.enumerate()
.map(|(a, b)| (a, *b))
.collect::<Vec<_>>();
probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
let mut topk = Vec::new();
for &(id, confidence) in probs.iter().take(k) {
topk.push((id, confidence));
}
topk
}
pub fn norm(&self) -> Array<f32, IxDyn> {
let std_ = self.data.mapv(|x| x * x).sum_axis(Axis(0)).mapv(f32::sqrt);
self.data.clone() / std_
}
pub fn top1(&self) -> (usize, f32) {
self.topk(1)[0]
}
}
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Bbox {
// a bounding box around an object
xmin: f32,
ymin: f32,
width: f32,
height: f32,
id: usize,
confidence: f32,
}
impl Bbox {
pub fn new_from_xywh(xmin: f32, ymin: f32, width: f32, height: f32) -> Self {
Self {
xmin,
ymin,
width,
height,
..Default::default()
}
}
pub fn new(xmin: f32, ymin: f32, width: f32, height: f32, id: usize, confidence: f32) -> Self {
Self {
xmin,
ymin,
width,
height,
id,
confidence,
}
}
pub fn width(&self) -> f32 {
self.width
}
pub fn height(&self) -> f32 {
self.height
}
pub fn xmin(&self) -> f32 {
self.xmin
}
pub fn ymin(&self) -> f32 {
self.ymin
}
pub fn xmax(&self) -> f32 {
self.xmin + self.width
}
pub fn ymax(&self) -> f32 {
self.ymin + self.height
}
pub fn tl(&self) -> Point2 {
Point2::new(self.xmin, self.ymin)
}
pub fn br(&self) -> Point2 {
Point2::new(self.xmax(), self.ymax())
}
pub fn cxcy(&self) -> Point2 {
Point2::new(self.xmin + self.width / 2., self.ymin + self.height / 2.)
}
pub fn id(&self) -> usize {
self.id
}
pub fn confidence(&self) -> f32 {
self.confidence
}
pub fn area(&self) -> f32 {
self.width * self.height
}
pub fn intersection_area(&self, another: &Bbox) -> f32 {
let l = self.xmin.max(another.xmin);
let r = (self.xmin + self.width).min(another.xmin + another.width);
let t = self.ymin.max(another.ymin);
let b = (self.ymin + self.height).min(another.ymin + another.height);
(r - l + 1.).max(0.) * (b - t + 1.).max(0.)
}
pub fn union(&self, another: &Bbox) -> f32 {
self.area() + another.area() - self.intersection_area(another)
}
pub fn iou(&self, another: &Bbox) -> f32 {
self.intersection_area(another) / self.union(another)
}
}

View File

@ -0,0 +1,43 @@
# YOLOv8 - ONNX Runtime
This project implements YOLOv8 using ONNX Runtime.
## Installation
To run this project, you need to install the required dependencies. The following instructions will guide you through the installation process.
### Installing Required Dependencies
You can install the required dependencies by running the following command:
```bash
pip install -r requirements.txt
```
### Installing `onnxruntime-gpu`
If you have an NVIDIA GPU and want to leverage GPU acceleration, you can install the onnxruntime-gpu package using the following command:
```bash
pip install onnxruntime-gpu
```
Note: Make sure you have the appropriate GPU drivers installed on your system.
### Installing `onnxruntime` (CPU version)
If you don't have an NVIDIA GPU or prefer to use the CPU version of onnxruntime, you can install the onnxruntime package using the following command:
```bash
pip install onnxruntime
```
### Usage
After successfully installing the required packages, you can run the YOLOv8 implementation using the following command:
```bash
python main.py --model yolov8n.onnx --img image.jpg --conf-thres 0.5 --iou-thres 0.5
```
Make sure to replace `yolov8n.onnx` with the path to your YOLOv8 ONNX model file and `image.jpg` with the path to your input image, and adjust the confidence threshold (`--conf-thres`) and IoU threshold (`--iou-thres`) values as needed.
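If you don't already have an ONNX model, one way to produce it is to export a PyTorch checkpoint with the `ultralytics` Python API (a minimal sketch; the checkpoint name and image size below are only examples):
```python
from ultralytics import YOLO

# Export a YOLOv8 detection checkpoint to ONNX; the .onnx file is written next to the .pt file
model = YOLO("yolov8n.pt")
model.export(format="onnx", imgsz=640)
```
The resulting `yolov8n.onnx` can then be passed to `main.py` via `--model`.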

View File

@ -0,0 +1,231 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
import onnxruntime as ort
import torch
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_requirements, check_yaml
class YOLOv8:
"""YOLOv8 object detection model class for handling inference and visualization."""
def __init__(self, onnx_model, input_image, confidence_thres, iou_thres):
"""
Initializes an instance of the YOLOv8 class.
Args:
onnx_model: Path to the ONNX model.
input_image: Path to the input image.
confidence_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.onnx_model = onnx_model
self.input_image = input_image
self.confidence_thres = confidence_thres
self.iou_thres = iou_thres
# Load the class names from the COCO dataset
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Generate a color palette for the classes
self.color_palette = np.random.uniform(0, 255, size=(len(self.classes), 3))
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f"{self.classes[class_id]}: {score:.2f}"
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(
img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED
)
# Draw the label text on the image
cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(self):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
self.img = cv2.imread(self.input_image)
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
# Convert the image color space from BGR to RGB
img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)
# Resize the image to match the input shape
img = cv2.resize(img, (self.input_width, self.input_height))
# Normalize the image data by dividing it by 255.0
image_data = np.array(img) / 255.0
# Transpose the image to have the channel dimension as the first dimension
image_data = np.transpose(image_data, (2, 0, 1)) # Channel first
# Expand the dimensions of the image data to match the expected input shape
image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
# Return the preprocessed image data
return image_data
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
# Transpose and squeeze the output to match the expected shape
outputs = np.transpose(np.squeeze(output[0]))
# Get the number of rows in the outputs array
rows = outputs.shape[0]
# Lists to store the bounding boxes, scores, and class IDs of the detections
boxes = []
scores = []
class_ids = []
# Calculate the scaling factors for the bounding box coordinates
x_factor = self.img_width / self.input_width
y_factor = self.img_height / self.input_height
# Iterate over each row in the outputs array
for i in range(rows):
# Extract the class scores from the current row
classes_scores = outputs[i][4:]
# Find the maximum score among the class scores
max_score = np.amax(classes_scores)
# If the maximum score is above the confidence threshold
if max_score >= self.confidence_thres:
# Get the class ID with the highest score
class_id = np.argmax(classes_scores)
# Extract the bounding box coordinates from the current row
x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]
# Calculate the scaled coordinates of the bounding box
left = int((x - w / 2) * x_factor)
top = int((y - h / 2) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
# Add the class ID, score, and box coordinates to the respective lists
class_ids.append(class_id)
scores.append(max_score)
boxes.append([left, top, width, height])
# Apply non-maximum suppression to filter out overlapping bounding boxes
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confidence_thres, self.iou_thres)
# Iterate over the selected indices after non-maximum suppression
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
score = scores[i]
class_id = class_ids[i]
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
# Return the modified input image
return input_image
def main(self):
"""
Performs inference using an ONNX model and returns the output image with drawn detections.
Returns:
output_img: The output image with drawn detections.
"""
# Create an inference session using the ONNX model and specify execution providers
session = ort.InferenceSession(self.onnx_model, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# Get the model inputs
model_inputs = session.get_inputs()
# Store the shape of the input for later use
input_shape = model_inputs[0].shape
self.input_width = input_shape[2]
self.input_height = input_shape[3]
# Preprocess the image data
img_data = self.preprocess()
# Run inference using the preprocessed image data
outputs = session.run(None, {model_inputs[0].name: img_data})
# Perform post-processing on the outputs to obtain output image.
return self.postprocess(self.img, outputs) # output image
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, default="yolov8n.onnx", help="Input your ONNX model.")
parser.add_argument("--img", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image.")
parser.add_argument("--conf-thres", type=float, default=0.5, help="Confidence threshold")
parser.add_argument("--iou-thres", type=float, default=0.5, help="NMS IoU threshold")
args = parser.parse_args()
# Check the requirements and select the appropriate backend (CPU or GPU)
check_requirements("onnxruntime-gpu" if torch.cuda.is_available() else "onnxruntime")
# Create an instance of the YOLOv8 class with the specified arguments
detection = YOLOv8(args.model, args.img, args.conf_thres, args.iou_thres)
# Perform object detection and obtain the output image
output_image = detection.main()
# Display the output image in a window
cv2.namedWindow("Output", cv2.WINDOW_NORMAL)
cv2.imshow("Output", output_image)
# Wait for a key press to exit
cv2.waitKey(0)

View File

@ -0,0 +1,19 @@
# YOLOv8 - OpenCV
An implementation of YOLOv8 inference with OpenCV using the ONNX format.
Simply clone and run:
```bash
pip install -r requirements.txt
python main.py --model yolov8n.onnx --img image.jpg
```
If you start from scratch:
```bash
pip install ultralytics
yolo export model=yolov8n.pt imgsz=640 format=onnx opset=12
```
_\*Make sure to include `opset=12`_

View File

@ -0,0 +1,130 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2.dnn
import numpy as np
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
CLASSES = yaml_load(check_yaml("coco128.yaml"))["names"]
colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))
def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
"""
Draws bounding boxes on the input image based on the provided arguments.
Args:
img (numpy.ndarray): The input image to draw the bounding box on.
class_id (int): Class ID of the detected object.
confidence (float): Confidence score of the detected object.
x (int): X-coordinate of the top-left corner of the bounding box.
y (int): Y-coordinate of the top-left corner of the bounding box.
x_plus_w (int): X-coordinate of the bottom-right corner of the bounding box.
y_plus_h (int): Y-coordinate of the bottom-right corner of the bounding box.
"""
label = f"{CLASSES[class_id]} ({confidence:.2f})"
color = colors[class_id]
cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
cv2.putText(img, label, (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
def main(onnx_model, input_image):
"""
Main function to load ONNX model, perform inference, draw bounding boxes, and display the output image.
Args:
onnx_model (str): Path to the ONNX model.
input_image (str): Path to the input image.
Returns:
list: List of dictionaries containing detection information such as class_id, class_name, confidence, etc.
"""
# Load the ONNX model
model: cv2.dnn.Net = cv2.dnn.readNetFromONNX(onnx_model)
# Read the input image
original_image: np.ndarray = cv2.imread(input_image)
[height, width, _] = original_image.shape
# Prepare a square image for inference
length = max((height, width))
image = np.zeros((length, length, 3), np.uint8)
image[0:height, 0:width] = original_image
# Calculate scale factor
scale = length / 640
# Preprocess the image and prepare blob for model
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255, size=(640, 640), swapRB=True)
model.setInput(blob)
# Perform inference
outputs = model.forward()
# Prepare output array
outputs = np.array([cv2.transpose(outputs[0])])
rows = outputs.shape[1]
boxes = []
scores = []
class_ids = []
# Iterate through output to collect bounding boxes, confidence scores, and class IDs
for i in range(rows):
classes_scores = outputs[0][i][4:]
(minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
if maxScore >= 0.25:
box = [
outputs[0][i][0] - (0.5 * outputs[0][i][2]),
outputs[0][i][1] - (0.5 * outputs[0][i][3]),
outputs[0][i][2],
outputs[0][i][3],
]
boxes.append(box)
scores.append(maxScore)
class_ids.append(maxClassIndex)
# Apply NMS (Non-maximum suppression)
result_boxes = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45, 0.5)
detections = []
# Iterate through NMS results to draw bounding boxes and labels
for i in range(len(result_boxes)):
index = result_boxes[i]
box = boxes[index]
detection = {
"class_id": class_ids[index],
"class_name": CLASSES[class_ids[index]],
"confidence": scores[index],
"box": box,
"scale": scale,
}
detections.append(detection)
draw_bounding_box(
original_image,
class_ids[index],
scores[index],
round(box[0] * scale),
round(box[1] * scale),
round((box[0] + box[2]) * scale),
round((box[1] + box[3]) * scale),
)
# Display the image with bounding boxes
cv2.imshow("image", original_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
return detections
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="yolov8n.onnx", help="Input your ONNX model.")
parser.add_argument("--img", default=str(ASSETS / "bus.jpg"), help="Path to input image.")
args = parser.parse_args()
main(args.model, args.img)

View File

@ -0,0 +1,65 @@
# YOLOv8 - Int8-TFLite Runtime
Welcome to the YOLOv8 Int8 TFLite Runtime project for efficient and optimized object detection. This README provides comprehensive instructions for installing and using our YOLOv8 implementation.
## Installation
Ensure a smooth setup by following these steps to install necessary dependencies.
### Installing Required Dependencies
Install all required dependencies with this simple command:
```bash
pip install -r requirements.txt
```
### Installing `tflite-runtime`
To load TFLite models, install the `tflite-runtime` package using:
```bash
pip install tflite-runtime
```
### Installing `tensorflow-gpu` (For NVIDIA GPU Users)
Leverage GPU acceleration with NVIDIA GPUs by installing `tensorflow-gpu`:
```bash
pip install tensorflow-gpu
```
**Note:** Ensure you have compatible GPU drivers installed on your system.
### Installing `tensorflow` (CPU Version)
For CPU usage or non-NVIDIA GPUs, install TensorFlow with:
```bash
pip install tensorflow
```
## Usage
Follow these instructions to run YOLOv8 after successful installation.
Convert the YOLOv8 model to Int8 TFLite format:
```bash
yolo export model=yolov8n.pt imgsz=640 format=tflite int8
```
Locate the Int8 TFLite model in the `yolov8n_saved_model` directory and choose the `*_full_integer_quant.tflite` file (you can verify the quantization at [Netron](https://netron.app/)). Then, execute the following in your terminal:
```bash
python main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
```
Replace `yolov8n_full_integer_quant.tflite` with the path to your model file, `image.jpg` with your input image, and adjust the confidence (`--conf-thres`) and IoU (`--iou-thres`) thresholds as necessary.
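If you would rather check the quantization parameters programmatically than open the model in Netron, here is a minimal sketch (it assumes `tflite-runtime` is installed and uses the model path from the command above):
```python
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="yolov8n_full_integer_quant.tflite")
interpreter.allocate_tensors()

# Each tensor detail carries a (scale, zero_point) pair used to (de)quantize int8 data
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    scale, zero_point = detail["quantization"]
    print(detail["name"], detail["dtype"], "scale:", scale, "zero_point:", zero_point)
```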
### Output
The output is displayed as annotated images, showcasing the model's detection capabilities:
![image](https://github.com/wamiqraza/Attribute-recognition-and-reidentification-Market1501-dataset/blob/main/img/bus.jpg)

View File

@ -0,0 +1,299 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
from tflite_runtime import interpreter as tflite
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
# Declared as global variables; can be updated based on the trained model's input image size
img_width = 640
img_height = 640
class LetterBox:
def __init__(
self, new_shape=(img_width, img_height), auto=False, scaleFill=False, scaleup=True, center=True, stride=32
):
self.new_shape = new_shape
self.auto = auto
self.scaleFill = scaleFill
self.scaleup = scaleup
self.stride = stride
self.center = center # Put the image in the middle or top-left
def __call__(self, labels=None, image=None):
"""Return updated labels and image with added border."""
if labels is None:
labels = {}
img = labels.get("img") if image is None else image
shape = img.shape[:2] # current shape [height, width]
new_shape = labels.pop("rect_shape", self.new_shape)
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not self.scaleup: # only scale down, do not scale up (for better val mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if self.auto: # minimum rectangle
dw, dh = np.mod(dw, self.stride), np.mod(dh, self.stride) # wh padding
elif self.scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
if self.center:
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)) if self.center else 0, int(round(dh + 0.1))
left, right = int(round(dw - 0.1)) if self.center else 0, int(round(dw + 0.1))
img = cv2.copyMakeBorder(
img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114)
) # add border
if labels.get("ratio_pad"):
labels["ratio_pad"] = (labels["ratio_pad"], (left, top)) # for evaluation
if len(labels):
labels = self._update_labels(labels, ratio, dw, dh)
labels["img"] = img
labels["resized_shape"] = new_shape
return labels
else:
return img
def _update_labels(self, labels, ratio, padw, padh):
"""Update labels."""
labels["instances"].convert_bbox(format="xyxy")
labels["instances"].denormalize(*labels["img"].shape[:2][::-1])
labels["instances"].scale(*ratio)
labels["instances"].add_padding(padw, padh)
return labels
class Yolov8TFLite:
def __init__(self, tflite_model, input_image, confidence_thres, iou_thres):
"""
Initializes an instance of the Yolov8TFLite class.
Args:
tflite_model: Path to the TFLite model.
input_image: Path to the input image.
confidence_thres: Confidence threshold for filtering detections.
iou_thres: IoU (Intersection over Union) threshold for non-maximum suppression.
"""
self.tflite_model = tflite_model
self.input_image = input_image
self.confidence_thres = confidence_thres
self.iou_thres = iou_thres
# Load the class names from the COCO dataset
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Generate a color palette for the classes
self.color_palette = np.random.uniform(0, 255, size=(len(self.classes), 3))
def draw_detections(self, img, box, score, class_id):
"""
Draws bounding boxes and labels on the input image based on the detected objects.
Args:
img: The input image to draw detections on.
box: Detected bounding box.
score: Corresponding detection score.
class_id: Class ID for the detected object.
Returns:
None
"""
# Extract the coordinates of the bounding box
x1, y1, w, h = box
# Retrieve the color for the class ID
color = self.color_palette[class_id]
# Draw the bounding box on the image
cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
# Create the label text with class name and score
label = f"{self.classes[class_id]}: {score:.2f}"
# Calculate the dimensions of the label text
(label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
# Calculate the position of the label text
label_x = x1
label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
# Draw a filled rectangle as the background for the label text
cv2.rectangle(
img,
(int(label_x), int(label_y - label_height)),
(int(label_x + label_width), int(label_y + label_height)),
color,
cv2.FILLED,
)
# Draw the label text on the image
cv2.putText(img, label, (int(label_x), int(label_y)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(self):
"""
Preprocesses the input image before performing inference.
Returns:
image_data: Preprocessed image data ready for inference.
"""
# Read the input image using OpenCV
self.img = cv2.imread(self.input_image)
print("image before", self.img)
# Get the height and width of the input image
self.img_height, self.img_width = self.img.shape[:2]
letterbox = LetterBox(new_shape=[img_width, img_height], auto=False, stride=32)
image = letterbox(image=self.img)
image = [image]
image = np.stack(image)
image = image[..., ::-1].transpose((0, 3, 1, 2))
img = np.ascontiguousarray(image)
# n, h, w, c
image = img.astype(np.float32)
return image / 255
def postprocess(self, input_image, output):
"""
Performs post-processing on the model's output to extract bounding boxes, scores, and class IDs.
Args:
input_image (numpy.ndarray): The input image.
output (numpy.ndarray): The output of the model.
Returns:
numpy.ndarray: The input image with detections drawn on it.
"""
boxes = []
scores = []
class_ids = []
for pred in output:
pred = np.transpose(pred)
for box in pred:
x, y, w, h = box[:4]
x1 = x - w / 2
y1 = y - h / 2
boxes.append([x1, y1, w, h])
idx = np.argmax(box[4:])
scores.append(box[idx + 4])
class_ids.append(idx)
indices = cv2.dnn.NMSBoxes(boxes, scores, self.confidence_thres, self.iou_thres)
for i in indices:
# Get the box, score, and class ID corresponding to the index
box = boxes[i]
gain = min(img_width / self.img_width, img_height / self.img_height)
pad = (
round((img_width - self.img_width * gain) / 2 - 0.1),
round((img_height - self.img_height * gain) / 2 - 0.1),
)
box[0] = (box[0] - pad[0]) / gain
box[1] = (box[1] - pad[1]) / gain
box[2] = box[2] / gain
box[3] = box[3] / gain
score = scores[i]
class_id = class_ids[i]
if score > 0.25:
print(box, score, class_id)
# Draw the detection on the input image
self.draw_detections(input_image, box, score, class_id)
return input_image
def main(self):
"""
Performs inference using a TFLite model and returns the output image with drawn detections.
Returns:
output_img: The output image with drawn detections.
"""
# Create an interpreter for the TFLite model
interpreter = tflite.Interpreter(model_path=self.tflite_model)
self.model = interpreter
interpreter.allocate_tensors()
# Get the model inputs
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Store the shape of the input for later use
input_shape = input_details[0]["shape"]
self.input_width = input_shape[1]
self.input_height = input_shape[2]
# Preprocess the image data
img_data = self.preprocess()
# Set the input tensor to the interpreter
print(input_details[0]["index"])
print(img_data.shape)
img_data = img_data.transpose((0, 2, 3, 1))
scale, zero_point = input_details[0]["quantization"]
interpreter.set_tensor(input_details[0]["index"], img_data)
# Run inference
interpreter.invoke()
# Get the output tensor from the interpreter
output = interpreter.get_tensor(output_details[0]["index"])
scale, zero_point = output_details[0]["quantization"]
output = (output.astype(np.float32) - zero_point) * scale
output[:, [0, 2]] *= img_width
output[:, [1, 3]] *= img_height
print(output)
# Perform post-processing on the outputs to obtain output image.
return self.postprocess(self.img, output)
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument(
"--model", type=str, default="yolov8n_full_integer_quant.tflite", help="Input your TFLite model."
)
parser.add_argument("--img", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image.")
parser.add_argument("--conf-thres", type=float, default=0.5, help="Confidence threshold")
parser.add_argument("--iou-thres", type=float, default=0.5, help="NMS IoU threshold")
args = parser.parse_args()
# Create an instance of the Yolov8TFLite class with the specified arguments
detection = Yolov8TFLite(args.model, args.img, args.conf_thres, args.iou_thres)
# Perform object detection and obtain the output image
output_image = detection.main()
# Display the output image in a window
cv2.imshow("Output", output_image)
# Wait for a key press to exit
cv2.waitKey(0)

View File

@ -0,0 +1,123 @@
# Regions Counting Using YOLOv8 (Inference on Video)
- Region counting is a method employed to tally the objects within a specified area, allowing for more sophisticated analyses when multiple regions are considered. These regions can be adjusted interactively using a Left Mouse Click, and the counting process occurs in real time.
- Regions can be adjusted to suit the user's preferences and requirements.
<div>
<p align="center">
<img src="https://github.com/RizwanMunawar/ultralytics/assets/62513924/5ab3bbd7-fd12-4849-928e-5f294d6c3fcf" width="45%" alt="YOLOv8 region counting visual 1">
<img src="https://github.com/RizwanMunawar/ultralytics/assets/62513924/e7c1aea7-474d-4d78-8d48-b50854ffe1ca" width="45%" alt="YOLOv8 region counting visual 2">
</p>
</div>
## Table of Contents
- [Step 1: Install the Required Libraries](#step-1-install-the-required-libraries)
- [Step 2: Run the Region Counting Using Ultralytics YOLOv8](#step-2-run-the-region-counting-using-ultralytics-yolov8)
- [Usage Options](#usage-options)
- [FAQ](#faq)
## Step 1: Install the Required Libraries
Clone the repository, install the dependencies, and `cd` into this local directory to run the commands in Step 2.
```bash
# Clone ultralytics repo
git clone https://github.com/ultralytics/ultralytics
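# Install dependencies (the script uses the ultralytics and shapely packages)
pip install ultralytics shapely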
# cd to local directory
cd ultralytics/examples/YOLOv8-Region-Counter
```
## Step 2: Run the Region Counting Using Ultralytics YOLOv8
Here are the basic commands for running the inference:
### Note
After the video begins playing, you can freely move the region anywhere within the video by simply clicking and dragging using the left mouse button.
```bash
# If you want to save results
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --view-img
# If you want to run model on CPU
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --view-img --device cpu
# If you want to change model file
python yolov8_region_counter.py --source "path/to/video.mp4" --save-img --weights "path/to/model.pt"
# If you want to detect specific class (first class and third class)
python yolov8_region_counter.py --source "path/to/video.mp4" --classes 0 2 --weights "path/to/model.pt"
# If you don't want to save results
python yolov8_region_counter.py --source "path/to/video.mp4" --view-img
```
## Usage Options
- `--source`: Specifies the path to the video file you want to run inference on.
- `--device`: Specifies the processing device: `cpu` or a GPU index such as `0`.
- `--save-img`: Flag to save the detection results as images.
- `--weights`: Specifies a different YOLOv8 model file (e.g., `yolov8n.pt`, `yolov8s.pt`, `yolov8m.pt`, `yolov8l.pt`, `yolov8x.pt`).
- `--classes`: Specifies the classes to detect and track (e.g., `--classes 0 2`).
- `--line-thickness`: Specifies the bounding box thickness.
- `--region-thickness`: Specifies the region boundary thickness.
- `--track-thickness`: Specifies the tracking line thickness.
## FAQ
**1. What Does Region Counting Involve?**
Region counting is a computational method utilized to ascertain the quantity of objects within a specific area in recorded video or real-time streams. This technique finds frequent application in image processing, computer vision, and pattern recognition, facilitating the analysis and segmentation of objects or features based on their spatial relationships.
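In practice, an object is counted when a reference point, typically the bounding-box center, falls inside the region polygon. The sketch below illustrates that check with the same `shapely` primitives used by the script in this folder; the region coordinates and detections are illustrative only.
```python
from shapely.geometry import Point, Polygon

# Example region (the same pentagon used in the script's default configuration)
region = Polygon([(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)])

# Hypothetical detections in xyxy format (x1, y1, x2, y2)
detections = [(120, 150, 200, 260), (500, 400, 560, 480)]

count = 0
for x1, y1, x2, y2 in detections:
    center = Point((x1 + x2) / 2, (y1 + y2) / 2)  # bounding-box center
    if region.contains(center):
        count += 1

print(f"Objects inside region: {count}")
```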
**2. Is Friendly Region Plotting Supported by the Region Counter?**
The Region Counter offers the capability to create regions in various formats, such as polygons and rectangles. You have the flexibility to modify region attributes, including coordinates, colors, and other details, as demonstrated in the following code:
```python
from shapely.geometry import Polygon
counting_regions = [
{
"name": "YOLOv8 Polygon Region",
"polygon": Polygon(
[(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)]
), # Polygon with five points (Pentagon)
"counts": 0,
"dragging": False,
"region_color": (255, 42, 4), # BGR Value
"text_color": (255, 255, 255), # Region Text Color
},
{
"name": "YOLOv8 Rectangle Region",
"polygon": Polygon(
[(200, 250), (440, 250), (440, 550), (200, 550)]
), # Rectangle with four points
"counts": 0,
"dragging": False,
"region_color": (37, 255, 225), # BGR Value
"text_color": (0, 0, 0), # Region Text Color
},
]
```
**3. Why Combine Region Counting with YOLOv8?**
YOLOv8 specializes in the detection and tracking of objects in video streams. Region counting complements this by enabling object counting within designated areas, making it a valuable application of YOLOv8.
**4. How Can I Troubleshoot Issues?**
To gain more insight during inference, you can include the `--view-img` flag in your command to visualize the detections and regions in real time:
```bash
python yolov8_region_counter.py --source "path/to/video.mp4" --view-img
```
**5. Can I Employ Other YOLO Versions?**
Certainly, you have the flexibility to specify different YOLO model weights using the `--weights` option.
**6. Where Can I Access Additional Information?**
For a comprehensive guide on using YOLOv8 with Object Tracking, please refer to [Multi-Object Tracking with Ultralytics YOLO](https://docs.ultralytics.com/modes/track/).

View File

@ -0,0 +1,251 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
from collections import defaultdict
from pathlib import Path
import cv2
import numpy as np
from shapely.geometry import Polygon
from shapely.geometry.point import Point
from ultralytics import YOLO
from ultralytics.utils.files import increment_path
from ultralytics.utils.plotting import Annotator, colors
track_history = defaultdict(list)
current_region = None
counting_regions = [
{
"name": "YOLOv8 Polygon Region",
"polygon": Polygon([(50, 80), (250, 20), (450, 80), (400, 350), (100, 350)]), # Polygon points
"counts": 0,
"dragging": False,
"region_color": (255, 42, 4), # BGR Value
"text_color": (255, 255, 255), # Region Text Color
},
{
"name": "YOLOv8 Rectangle Region",
"polygon": Polygon([(200, 250), (440, 250), (440, 550), (200, 550)]), # Polygon points
"counts": 0,
"dragging": False,
"region_color": (37, 255, 225), # BGR Value
"text_color": (0, 0, 0), # Region Text Color
},
]
def mouse_callback(event, x, y, flags, param):
"""
Handles mouse events for region manipulation.
Parameters:
event (int): The mouse event type (e.g., cv2.EVENT_LBUTTONDOWN).
x (int): The x-coordinate of the mouse pointer.
y (int): The y-coordinate of the mouse pointer.
flags (int): Additional flags passed by OpenCV.
param: Additional parameters passed to the callback (not used in this function).
Global Variables:
current_region (dict): A dictionary representing the current selected region.
Mouse Events:
- LBUTTONDOWN: Initiates dragging for the region containing the clicked point.
- MOUSEMOVE: Moves the selected region if dragging is active.
- LBUTTONUP: Ends dragging for the selected region.
Notes:
- This function is intended to be used as a callback for OpenCV mouse events.
- Requires the existence of the 'counting_regions' list and the 'Polygon' class.
Example:
>>> cv2.setMouseCallback(window_name, mouse_callback)
"""
global current_region
# Mouse left button down event
if event == cv2.EVENT_LBUTTONDOWN:
for region in counting_regions:
if region["polygon"].contains(Point((x, y))):
current_region = region
current_region["dragging"] = True
current_region["offset_x"] = x
current_region["offset_y"] = y
# Mouse move event
elif event == cv2.EVENT_MOUSEMOVE:
if current_region is not None and current_region["dragging"]:
dx = x - current_region["offset_x"]
dy = y - current_region["offset_y"]
current_region["polygon"] = Polygon(
[(p[0] + dx, p[1] + dy) for p in current_region["polygon"].exterior.coords]
)
current_region["offset_x"] = x
current_region["offset_y"] = y
# Mouse left button up event
elif event == cv2.EVENT_LBUTTONUP:
if current_region is not None and current_region["dragging"]:
current_region["dragging"] = False
def run(
weights="yolov8n.pt",
source=None,
device="cpu",
view_img=False,
save_img=False,
exist_ok=False,
classes=None,
line_thickness=2,
track_thickness=2,
region_thickness=2,
):
"""
Run Region counting on a video using YOLOv8 and ByteTrack.
    Supports a movable region for real-time counting inside a specific area.
    Supports counting in multiple regions simultaneously.
    Regions can be polygons or rectangles.
Args:
        weights (str): Model weights path.
        source (str): Video file path.
        device (str): Processing device (cpu, 0, 1, ...).
        view_img (bool): Show results.
        save_img (bool): Save results.
        exist_ok (bool): Overwrite existing files.
        classes (list): Classes to detect and track.
        line_thickness (int): Bounding box thickness.
        track_thickness (int): Tracking line thickness.
        region_thickness (int): Region thickness.
"""
vid_frame_count = 0
# Check source path
if not Path(source).exists():
raise FileNotFoundError(f"Source path '{source}' does not exist.")
# Setup Model
model = YOLO(f"{weights}")
model.to("cuda") if device == "0" else model.to("cpu")
# Extract classes names
names = model.model.names
# Video setup
videocapture = cv2.VideoCapture(source)
frame_width, frame_height = int(videocapture.get(3)), int(videocapture.get(4))
fps, fourcc = int(videocapture.get(5)), cv2.VideoWriter_fourcc(*"mp4v")
# Output setup
save_dir = increment_path(Path("ultralytics_rc_output") / "exp", exist_ok)
save_dir.mkdir(parents=True, exist_ok=True)
video_writer = cv2.VideoWriter(str(save_dir / f"{Path(source).stem}.mp4"), fourcc, fps, (frame_width, frame_height))
# Iterate over video frames
while videocapture.isOpened():
success, frame = videocapture.read()
if not success:
break
vid_frame_count += 1
# Extract the results
results = model.track(frame, persist=True, classes=classes)
if results[0].boxes.id is not None:
boxes = results[0].boxes.xyxy.cpu()
track_ids = results[0].boxes.id.int().cpu().tolist()
clss = results[0].boxes.cls.cpu().tolist()
annotator = Annotator(frame, line_width=line_thickness, example=str(names))
for box, track_id, cls in zip(boxes, track_ids, clss):
annotator.box_label(box, str(names[cls]), color=colors(cls, True))
bbox_center = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2 # Bbox center
track = track_history[track_id] # Tracking Lines plot
track.append((float(bbox_center[0]), float(bbox_center[1])))
if len(track) > 30:
track.pop(0)
points = np.hstack(track).astype(np.int32).reshape((-1, 1, 2))
cv2.polylines(frame, [points], isClosed=False, color=colors(cls, True), thickness=track_thickness)
# Check if detection inside region
for region in counting_regions:
if region["polygon"].contains(Point((bbox_center[0], bbox_center[1]))):
region["counts"] += 1
# Draw regions (Polygons/Rectangles)
for region in counting_regions:
region_label = str(region["counts"])
region_color = region["region_color"]
region_text_color = region["text_color"]
polygon_coords = np.array(region["polygon"].exterior.coords, dtype=np.int32)
centroid_x, centroid_y = int(region["polygon"].centroid.x), int(region["polygon"].centroid.y)
text_size, _ = cv2.getTextSize(
region_label, cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, thickness=line_thickness
)
text_x = centroid_x - text_size[0] // 2
text_y = centroid_y + text_size[1] // 2
cv2.rectangle(
frame,
(text_x - 5, text_y - text_size[1] - 5),
(text_x + text_size[0] + 5, text_y + 5),
region_color,
-1,
)
cv2.putText(
frame, region_label, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, region_text_color, line_thickness
)
cv2.polylines(frame, [polygon_coords], isClosed=True, color=region_color, thickness=region_thickness)
if view_img:
if vid_frame_count == 1:
cv2.namedWindow("Ultralytics YOLOv8 Region Counter Movable")
cv2.setMouseCallback("Ultralytics YOLOv8 Region Counter Movable", mouse_callback)
cv2.imshow("Ultralytics YOLOv8 Region Counter Movable", frame)
if save_img:
video_writer.write(frame)
for region in counting_regions: # Reinitialize count for each region
region["counts"] = 0
if cv2.waitKey(1) & 0xFF == ord("q"):
break
del vid_frame_count
video_writer.release()
videocapture.release()
cv2.destroyAllWindows()
def parse_opt():
"""Parse command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--weights", type=str, default="yolov8n.pt", help="initial weights path")
parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
parser.add_argument("--source", type=str, required=True, help="video file path")
parser.add_argument("--view-img", action="store_true", help="show results")
parser.add_argument("--save-img", action="store_true", help="save results")
parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
parser.add_argument("--line-thickness", type=int, default=2, help="bounding box thickness")
parser.add_argument("--track-thickness", type=int, default=2, help="Tracking line thickness")
parser.add_argument("--region-thickness", type=int, default=4, help="Region thickness")
return parser.parse_args()
def main(opt):
"""Main function."""
run(**vars(opt))
if __name__ == "__main__":
opt = parse_opt()
main(opt)

View File

@ -0,0 +1,69 @@
# YOLOv8 with SAHI (Inference on Video)
[SAHI](https://docs.ultralytics.com/guides/sahi-tiled-inference/) is designed to optimize object detection algorithms for large-scale and high-resolution imagery. It partitions images into manageable slices, performs object detection on each slice, and then stitches the results back together. This tutorial will guide you through the process of running YOLOv8 inference on video files with the aid of SAHI.
## Table of Contents
- [Step 1: Install the Required Libraries](#step-1-install-the-required-libraries)
- [Step 2: Run the Inference with SAHI using Ultralytics YOLOv8](#step-2-run-the-inference-with-sahi-using-ultralytics-yolov8)
- [Usage Options](#usage-options)
- [FAQ](#faq)
## Step 1: Install the Required Libraries
Clone the repository, install the dependencies, and `cd` into this local directory to run the commands in Step 2.
```bash
# Clone ultralytics repo
git clone https://github.com/ultralytics/ultralytics
# Install dependencies
pip install sahi ultralytics
# cd to local directory
cd ultralytics/examples/YOLOv8-SAHI-Inference-Video
```
## Step 2: Run the Inference with SAHI using Ultralytics YOLOv8
Here are the basic commands for running the inference:
```bash
# If you want to save results
python yolov8_sahi.py --source "path/to/video.mp4" --save-img
# If you want to change the model file
python yolov8_sahi.py --source "path/to/video.mp4" --save-img --weights "yolov8n.pt"
```
## Usage Options
- `--source`: Specifies the path to the video file you want to run inference on.
- `--save-img`: Flag to save the detection results as images.
- `--weights`: Specifies a different YOLOv8 model file (e.g., `yolov8n.pt`, `yolov8s.pt`, `yolov8m.pt`, `yolov8l.pt`, `yolov8x.pt`).
## FAQ
**1. What is SAHI?**
SAHI (Slicing Aided Hyper Inference) is a library designed to optimize object detection algorithms for large-scale and high-resolution images. The library source code is available on [GitHub](https://github.com/obss/sahi).
**2. Why use SAHI with YOLOv8?**
SAHI can handle large-scale images by slicing them into smaller, more manageable sizes without compromising the detection quality. This makes it a great companion to YOLOv8, especially when working with high-resolution videos.
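Conceptually, the workflow mirrors the script in this folder: load a detector through SAHI's `AutoDetectionModel`, then call `get_sliced_prediction`, which tiles the input, runs detection on each tile, and merges the results. Below is a minimal single-image sketch; the model path, image path, and slice sizes are placeholders.
```python
import cv2
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLOv8 model with SAHI (CPU inference, 0.3 confidence threshold)
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8", model_path="yolov8n.pt", confidence_threshold=0.3, device="cpu"
)

image = cv2.imread("path/to/image.jpg")

# Slice into 512x512 tiles with 20% overlap, detect on each tile, then merge predictions
results = get_sliced_prediction(
    image, detection_model, slice_height=512, slice_width=512, overlap_height_ratio=0.2, overlap_width_ratio=0.2
)

for pred in results.object_prediction_list:
    print(pred.category.name, pred.bbox.minx, pred.bbox.miny, pred.bbox.maxx, pred.bbox.maxy)
```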
**3. How do I debug issues?**
You can add the `--view-img` flag to your command to visualize the results during inference and verify that slicing and detection behave as expected:
```bash
python yolov8_sahi.py --source "path/to/video.mp4" --view-img
```
**4. Can I use other YOLO versions?**
Yes, you can specify different YOLO model weights using the `--weights` option.
**5. Where can I find more information?**
For a full guide to YOLOv8 with SAHI see [https://docs.ultralytics.com/guides/sahi-tiled-inference](https://docs.ultralytics.com/guides/sahi-tiled-inference/).

View File

@ -0,0 +1,111 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
from pathlib import Path
import cv2
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction
from sahi.utils.yolov8 import download_yolov8s_model
from ultralytics.utils.files import increment_path
def run(weights="yolov8n.pt", source="test.mp4", view_img=False, save_img=False, exist_ok=False):
"""
Run object detection on a video using YOLOv8 and SAHI.
Args:
weights (str): Model weights path.
source (str): Video file path.
view_img (bool): Show results.
save_img (bool): Save results.
exist_ok (bool): Overwrite existing files.
"""
# Check source path
if not Path(source).exists():
raise FileNotFoundError(f"Source path '{source}' does not exist.")
yolov8_model_path = f"models/{weights}"
download_yolov8s_model(yolov8_model_path)
detection_model = AutoDetectionModel.from_pretrained(
model_type="yolov8", model_path=yolov8_model_path, confidence_threshold=0.3, device="cpu"
)
# Video setup
videocapture = cv2.VideoCapture(source)
frame_width, frame_height = int(videocapture.get(3)), int(videocapture.get(4))
fps, fourcc = int(videocapture.get(5)), cv2.VideoWriter_fourcc(*"mp4v")
# Output setup
save_dir = increment_path(Path("ultralytics_results_with_sahi") / "exp", exist_ok)
save_dir.mkdir(parents=True, exist_ok=True)
video_writer = cv2.VideoWriter(str(save_dir / f"{Path(source).stem}.mp4"), fourcc, fps, (frame_width, frame_height))
while videocapture.isOpened():
success, frame = videocapture.read()
if not success:
break
results = get_sliced_prediction(
frame, detection_model, slice_height=512, slice_width=512, overlap_height_ratio=0.2, overlap_width_ratio=0.2
)
object_prediction_list = results.object_prediction_list
boxes_list = []
clss_list = []
for ind, _ in enumerate(object_prediction_list):
boxes = (
object_prediction_list[ind].bbox.minx,
object_prediction_list[ind].bbox.miny,
object_prediction_list[ind].bbox.maxx,
object_prediction_list[ind].bbox.maxy,
)
clss = object_prediction_list[ind].category.name
boxes_list.append(boxes)
clss_list.append(clss)
for box, cls in zip(boxes_list, clss_list):
x1, y1, x2, y2 = box
cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (56, 56, 255), 2)
label = str(cls)
t_size = cv2.getTextSize(label, 0, fontScale=0.6, thickness=1)[0]
cv2.rectangle(
frame, (int(x1), int(y1) - t_size[1] - 3), (int(x1) + t_size[0], int(y1) + 3), (56, 56, 255), -1
)
cv2.putText(
frame, label, (int(x1), int(y1) - 2), 0, 0.6, [255, 255, 255], thickness=1, lineType=cv2.LINE_AA
)
if view_img:
cv2.imshow(Path(source).stem, frame)
if save_img:
video_writer.write(frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
video_writer.release()
videocapture.release()
cv2.destroyAllWindows()
def parse_opt():
"""Parse command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--weights", type=str, default="yolov8n.pt", help="initial weights path")
parser.add_argument("--source", type=str, required=True, help="video file path")
parser.add_argument("--view-img", action="store_true", help="show results")
parser.add_argument("--save-img", action="store_true", help="save results")
parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
return parser.parse_args()
def main(opt):
"""Main function."""
run(**vars(opt))
if __name__ == "__main__":
opt = parse_opt()
main(opt)

View File

@ -0,0 +1,63 @@
# YOLOv8-Segmentation-ONNXRuntime-Python Demo
This repository provides a Python demo for performing segmentation with YOLOv8 using ONNX Runtime, highlighting the interoperability of YOLOv8 models without the need for the full PyTorch stack.
## Features
- **Framework Agnostic**: Runs segmentation inference purely on ONNX Runtime without importing PyTorch.
- **Efficient Inference**: Supports both FP32 and FP16 precision for ONNX models, catering to different computational needs.
- **Ease of Use**: Utilizes simple command-line arguments for model execution.
- **Broad Compatibility**: Leverages Numpy and OpenCV for image processing, ensuring broad compatibility with various environments.
## Installation
Install the required packages using pip. You will need `ultralytics` for exporting YOLOv8-seg ONNX model and using some utility functions, `onnxruntime-gpu` for GPU-accelerated inference, and `opencv-python` for image processing.
```bash
pip install ultralytics
pip install onnxruntime-gpu # For GPU support
# pip install onnxruntime # Use this instead if you don't have an NVIDIA GPU
pip install numpy
pip install opencv-python
```
## Getting Started
### 1. Export the YOLOv8 ONNX Model
Export the YOLOv8 segmentation model to ONNX format using the provided `ultralytics` package.
```bash
yolo export model=yolov8s-seg.pt imgsz=640 format=onnx opset=12 simplify
```
### 2. Run Inference
Perform inference with the exported ONNX model on your images.
```bash
python main.py --model <MODEL_PATH> --source <IMAGE_PATH>
```
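If you prefer to call the model from Python instead of the CLI, the sketch below uses the `YOLOv8Seg` class defined in `main.py`; it assumes you run it from this example directory, and the model and image paths are placeholders to replace with your own.
```python
import cv2

from main import YOLOv8Seg  # assumes this snippet sits next to main.py

# Build the ONNX Runtime session wrapper
model = YOLOv8Seg("yolov8s-seg.onnx")

# Read an image with OpenCV
img = cv2.imread("path/to/image.jpg")

# Full pipeline: preprocess -> ONNX Runtime inference -> postprocess
boxes, segments, masks = model(img, conf_threshold=0.25, iou_threshold=0.45)

# Draw the results and save them to demo.jpg
if len(boxes) > 0:
    model.draw_and_visualize(img, boxes, segments, vis=False, save=True)
```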
### Example Output
After running the command, you should see segmentation results similar to this:
<img src="https://user-images.githubusercontent.com/51357717/279988626-eb74823f-1563-4d58-a8e4-0494025b7c9a.jpg" alt="Segmentation Demo" width="800">
## Advanced Usage
For more advanced usage, such as adjusting the confidence (`--conf`) and NMS IoU (`--iou`) thresholds, please refer to the `main.py` script's command-line arguments.
## Contributing
We welcome contributions to improve this demo! Please submit issues and pull requests for bug reports, feature requests, or new algorithm enhancements.
## License
This project is licensed under the AGPL-3.0 License - see the [LICENSE](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) file for details.
## Acknowledgments
- The YOLOv8-Segmentation-ONNXRuntime-Python demo is contributed by GitHub user [jamjamjon](https://github.com/jamjamjon).
- Thanks to the ONNX Runtime community for providing a robust and efficient inference engine.

View File

@ -0,0 +1,342 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
import argparse
import cv2
import numpy as np
import onnxruntime as ort
from ultralytics.utils import ASSETS, yaml_load
from ultralytics.utils.checks import check_yaml
from ultralytics.utils.plotting import Colors
class YOLOv8Seg:
"""YOLOv8 segmentation model."""
def __init__(self, onnx_model):
"""
Initialization.
Args:
onnx_model (str): Path to the ONNX model.
"""
# Build Ort session
self.session = ort.InferenceSession(
onnx_model,
providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
if ort.get_device() == "GPU"
else ["CPUExecutionProvider"],
)
# Numpy dtype: support both FP32 and FP16 onnx model
self.ndtype = np.half if self.session.get_inputs()[0].type == "tensor(float16)" else np.single
# Get model width and height(YOLOv8-seg only has one input)
self.model_height, self.model_width = [x.shape for x in self.session.get_inputs()][0][-2:]
# Load COCO class names
self.classes = yaml_load(check_yaml("coco128.yaml"))["names"]
# Create color palette
self.color_palette = Colors()
def __call__(self, im0, conf_threshold=0.4, iou_threshold=0.45, nm=32):
"""
The whole pipeline: pre-process -> inference -> post-process.
Args:
im0 (Numpy.ndarray): original input image.
conf_threshold (float): confidence threshold for filtering predictions.
iou_threshold (float): iou threshold for NMS.
nm (int): the number of masks.
Returns:
boxes (List): list of bounding boxes.
segments (List): list of segments.
masks (np.ndarray): [N, H, W], output masks.
"""
# Pre-process
im, ratio, (pad_w, pad_h) = self.preprocess(im0)
# Ort inference
preds = self.session.run(None, {self.session.get_inputs()[0].name: im})
# Post-process
boxes, segments, masks = self.postprocess(
preds,
im0=im0,
ratio=ratio,
pad_w=pad_w,
pad_h=pad_h,
conf_threshold=conf_threshold,
iou_threshold=iou_threshold,
nm=nm,
)
return boxes, segments, masks
def preprocess(self, img):
"""
Pre-processes the input image.
Args:
img (Numpy.ndarray): image about to be processed.
Returns:
img_process (Numpy.ndarray): image preprocessed for inference.
ratio (tuple): width, height ratios in letterbox.
pad_w (float): width padding in letterbox.
pad_h (float): height padding in letterbox.
"""
# Resize and pad input image using letterbox() (Borrowed from Ultralytics)
shape = img.shape[:2] # original image shape
new_shape = (self.model_height, self.model_width)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
ratio = r, r
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
pad_w, pad_h = (new_shape[1] - new_unpad[0]) / 2, (new_shape[0] - new_unpad[1]) / 2 # wh padding
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
# Transforms: HWC to CHW -> BGR to RGB -> div(255) -> contiguous -> add axis(optional)
img = np.ascontiguousarray(np.einsum("HWC->CHW", img)[::-1], dtype=self.ndtype) / 255.0
img_process = img[None] if len(img.shape) == 3 else img
return img_process, ratio, (pad_w, pad_h)
def postprocess(self, preds, im0, ratio, pad_w, pad_h, conf_threshold, iou_threshold, nm=32):
"""
Post-process the prediction.
Args:
preds (Numpy.ndarray): predictions come from ort.session.run().
im0 (Numpy.ndarray): [h, w, c] original input image.
ratio (tuple): width, height ratios in letterbox.
pad_w (float): width padding in letterbox.
pad_h (float): height padding in letterbox.
conf_threshold (float): conf threshold.
iou_threshold (float): iou threshold.
nm (int): the number of masks.
Returns:
boxes (List): list of bounding boxes.
segments (List): list of segments.
masks (np.ndarray): [N, H, W], output masks.
"""
x, protos = preds[0], preds[1] # Two outputs: predictions and protos
# Transpose the first output: (Batch_size, xywh_conf_cls_nm, Num_anchors) -> (Batch_size, Num_anchors, xywh_conf_cls_nm)
x = np.einsum("bcn->bnc", x)
# Predictions filtering by conf-threshold
x = x[np.amax(x[..., 4:-nm], axis=-1) > conf_threshold]
# Create a new matrix which merge these(box, score, cls, nm) into one
# For more details about `numpy.c_()`: https://numpy.org/doc/1.26/reference/generated/numpy.c_.html
x = np.c_[x[..., :4], np.amax(x[..., 4:-nm], axis=-1), np.argmax(x[..., 4:-nm], axis=-1), x[..., -nm:]]
# NMS filtering
x = x[cv2.dnn.NMSBoxes(x[:, :4], x[:, 4], conf_threshold, iou_threshold)]
# Decode and return
if len(x) > 0:
# Bounding boxes format change: cxcywh -> xyxy
x[..., [0, 1]] -= x[..., [2, 3]] / 2
x[..., [2, 3]] += x[..., [0, 1]]
# Rescales bounding boxes from model shape(model_height, model_width) to the shape of original image
x[..., :4] -= [pad_w, pad_h, pad_w, pad_h]
x[..., :4] /= min(ratio)
# Bounding boxes boundary clamp
x[..., [0, 2]] = x[:, [0, 2]].clip(0, im0.shape[1])
x[..., [1, 3]] = x[:, [1, 3]].clip(0, im0.shape[0])
# Process masks
masks = self.process_mask(protos[0], x[:, 6:], x[:, :4], im0.shape)
# Masks -> Segments(contours)
segments = self.masks2segments(masks)
return x[..., :6], segments, masks # boxes, segments, masks
else:
return [], [], []
@staticmethod
def masks2segments(masks):
"""
It takes a list of masks(n,h,w) and returns a list of segments(n,xy) (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L750)
Args:
            masks (numpy.ndarray): the output masks of shape (n, h, w).
Returns:
segments (List): list of segment masks.
"""
segments = []
for x in masks.astype("uint8"):
c = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0] # CHAIN_APPROX_SIMPLE
if c:
c = np.array(c[np.array([len(x) for x in c]).argmax()]).reshape(-1, 2)
else:
c = np.zeros((0, 2)) # no segments found
segments.append(c.astype("float32"))
return segments
@staticmethod
def crop_mask(masks, boxes):
"""
It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box. (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L599)
Args:
masks (Numpy.ndarray): [n, h, w] tensor of masks.
boxes (Numpy.ndarray): [n, 4] tensor of bbox coordinates in relative point form.
Returns:
(Numpy.ndarray): The masks are being cropped to the bounding box.
"""
n, h, w = masks.shape
x1, y1, x2, y2 = np.split(boxes[:, :, None], 4, 1)
r = np.arange(w, dtype=x1.dtype)[None, None, :]
c = np.arange(h, dtype=x1.dtype)[None, :, None]
return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))
def process_mask(self, protos, masks_in, bboxes, im0_shape):
"""
Takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher quality
but is slower. (Borrowed from https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L618)
Args:
protos (numpy.ndarray): [mask_dim, mask_h, mask_w].
masks_in (numpy.ndarray): [n, mask_dim], n is number of masks after nms.
bboxes (numpy.ndarray): bboxes re-scaled to original image shape.
im0_shape (tuple): the size of the input image (h,w,c).
Returns:
(numpy.ndarray): The upsampled masks.
"""
c, mh, mw = protos.shape
masks = np.matmul(masks_in, protos.reshape((c, -1))).reshape((-1, mh, mw)).transpose(1, 2, 0) # HWN
masks = np.ascontiguousarray(masks)
masks = self.scale_mask(masks, im0_shape) # re-scale mask from P3 shape to original input image shape
masks = np.einsum("HWN -> NHW", masks) # HWN -> NHW
masks = self.crop_mask(masks, bboxes)
return np.greater(masks, 0.5)
@staticmethod
def scale_mask(masks, im0_shape, ratio_pad=None):
"""
Takes a mask, and resizes it to the original image size. (Borrowed from
https://github.com/ultralytics/ultralytics/blob/465df3024f44fa97d4fad9986530d5a13cdabdca/ultralytics/utils/ops.py#L305)
Args:
masks (np.ndarray): resized and padded masks/images, [h, w, num]/[h, w, 3].
im0_shape (tuple): the original image shape.
ratio_pad (tuple): the ratio of the padding to the original image.
Returns:
masks (np.ndarray): The masks that are being returned.
"""
im1_shape = masks.shape[:2]
if ratio_pad is None: # calculate from im0_shape
gain = min(im1_shape[0] / im0_shape[0], im1_shape[1] / im0_shape[1]) # gain = old / new
pad = (im1_shape[1] - im0_shape[1] * gain) / 2, (im1_shape[0] - im0_shape[0] * gain) / 2 # wh padding
else:
pad = ratio_pad[1]
# Calculate tlbr of mask
top, left = int(round(pad[1] - 0.1)), int(round(pad[0] - 0.1)) # y, x
bottom, right = int(round(im1_shape[0] - pad[1] + 0.1)), int(round(im1_shape[1] - pad[0] + 0.1))
if len(masks.shape) < 2:
raise ValueError(f'"len of masks shape" should be 2 or 3, but got {len(masks.shape)}')
masks = masks[top:bottom, left:right]
masks = cv2.resize(
masks, (im0_shape[1], im0_shape[0]), interpolation=cv2.INTER_LINEAR
) # INTER_CUBIC would be better
if len(masks.shape) == 2:
masks = masks[:, :, None]
return masks
def draw_and_visualize(self, im, bboxes, segments, vis=False, save=True):
"""
Draw and visualize results.
Args:
im (np.ndarray): original image, shape [h, w, c].
bboxes (numpy.ndarray): [n, 4], n is number of bboxes.
segments (List): list of segment masks.
vis (bool): imshow using OpenCV.
save (bool): save image annotated.
Returns:
None
"""
# Draw rectangles and polygons
im_canvas = im.copy()
for (*box, conf, cls_), segment in zip(bboxes, segments):
# draw contour and fill mask
cv2.polylines(im, np.int32([segment]), True, (255, 255, 255), 2) # white borderline
cv2.fillPoly(im_canvas, np.int32([segment]), self.color_palette(int(cls_), bgr=True))
# draw bbox rectangle
cv2.rectangle(
im,
(int(box[0]), int(box[1])),
(int(box[2]), int(box[3])),
self.color_palette(int(cls_), bgr=True),
1,
cv2.LINE_AA,
)
cv2.putText(
im,
f"{self.classes[cls_]}: {conf:.3f}",
(int(box[0]), int(box[1] - 9)),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
self.color_palette(int(cls_), bgr=True),
2,
cv2.LINE_AA,
)
# Mix image
im = cv2.addWeighted(im_canvas, 0.3, im, 0.7, 0)
# Show image
if vis:
cv2.imshow("demo", im)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Save image
if save:
cv2.imwrite("demo.jpg", im)
if __name__ == "__main__":
# Create an argument parser to handle command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, required=True, help="Path to ONNX model")
parser.add_argument("--source", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image")
parser.add_argument("--conf", type=float, default=0.25, help="Confidence threshold")
parser.add_argument("--iou", type=float, default=0.45, help="NMS IoU threshold")
args = parser.parse_args()
# Build model
model = YOLOv8Seg(args.model)
# Read image by OpenCV
img = cv2.imread(args.source)
# Inference
boxes, segments, _ = model(img, conf_threshold=args.conf, iou_threshold=args.iou)
# Draw bboxes and polygons
if len(boxes) > 0:
model.draw_and_visualize(img, boxes, segments, vis=False, save=True)

145
examples/heatmaps.ipynb Normal file
View File

@ -0,0 +1,145 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/heatmaps.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/guides/heatmaps/\">heatmaps</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/guides/heatmaps/\">Ultralytics Heatmaps</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Heatmaps\n",
"\n",
"Heatmap is color-coded matrix, generated by Ultralytics YOLOv8, simplifies intricate data by using vibrant colors. This visual representation employs warmer hues for higher intensities and cooler tones for lower values. Heatmaps are effective in illustrating complex data patterns, correlations, and anomalies, providing a user-friendly and engaging way to interpret data across various domains."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"from ultralytics.solutions import heatmap\n",
"import cv2\n",
"\n",
"model = YOLO(\"yolov8n.pt\")\n",
"cap = cv2.VideoCapture(\"path/to/video/file.mp4\")\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"# Video writer\n",
"video_writer = cv2.VideoWriter(\"heatmap_output.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"# Init heatmap\n",
"heatmap_obj = heatmap.Heatmap()\n",
"heatmap_obj.set_args(colormap=cv2.COLORMAP_PARULA,\n",
" imw=w,\n",
" imh=h,\n",
" view_img=True,\n",
" shape=\"circle\")\n",
"\n",
"while cap.isOpened():\n",
" success, im0 = cap.read()\n",
" if not success:\n",
" print(\"Video frame is empty or video processing has been successfully completed.\")\n",
" break\n",
" tracks = model.track(im0, persist=True, show=False)\n",
"\n",
" im0 = heatmap_obj.generate_heatmap(im0, tracks)\n",
" video_writer.write(im0)\n",
"\n",
"cap.release()\n",
"video_writer.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/guides/heatmaps/#heatmap-colormaps\">Ultralytics Heatmaps Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

106
examples/hub.ipynb Normal file
View File

@ -0,0 +1,106 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Ultralytics HUB",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FIzICjaph_Wy"
},
"source": [
"<a align=\"center\" href=\"https://hub.ultralytics.com\" target=\"_blank\">\n",
"<img width=\"1024\", src=\"https://github.com/ultralytics/assets/raw/main/im/ultralytics-hub.png\"></a>\n",
"\n",
"<div align=\"center\">\n",
"\n",
"[中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://github.com/ultralytics/hub/actions/workflows/ci.yaml\">\n",
" <img src=\"https://github.com/ultralytics/hub/actions/workflows/ci.yaml/badge.svg\" alt=\"CI CPU\"></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/hub/blob/master/hub.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the [Ultralytics](https://ultralytics.com/) HUB notebook!\n",
"\n",
"This notebook allows you to train [YOLOv5](https://github.com/ultralytics/yolov5) and [YOLOv8](https://github.com/ultralytics/ultralytics) 🚀 models using [HUB](https://hub.ultralytics.com/). Please browse the HUB <a href=\"https://docs.ultralytics.com/hub/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/hub/issues/new/choose\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eRQ2ow94MiOv"
},
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"metadata": {
"id": "FyDnXd-n4c7Y",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "01e34b44-a26f-4dbc-a5a1-6e29bca01a1b"
},
"source": [
"%pip install ultralytics # install\n",
"from ultralytics import YOLO, checks, hub\n",
"checks() # checks"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Ultralytics YOLOv8.0.210 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)\n",
"Setup complete ✅ (2 CPUs, 12.7 GB RAM, 24.4/78.2 GB disk)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cQ9BwaAqxAm4"
},
"source": [
"# Start\n",
"\n",
"Login with your [API key](https://hub.ultralytics.com/settings?tab=api+keys), select your YOLO 🚀 model and start training!"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XSlZaJ9Iw_iZ"
},
"source": [
"hub.login('API_KEY') # use your API key\n",
"\n",
"model = YOLO('https://hub.ultralytics.com/MODEL_ID') # use your model URL\n",
"results = model.train() # train model"
],
"execution_count": null,
"outputs": []
}
]
}

View File

@ -0,0 +1,147 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/object_counting.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Object Counting</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Ultralytics Object Counting</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Object Counting\n",
"\n",
"Counting objects using Ultralytics YOLOv8 entails the precise detection and enumeration of specific objects within videos and camera streams. YOLOv8 demonstrates exceptional performance in real-time applications, delivering efficient and accurate object counting across diverse scenarios such as crowd analysis and surveillance. This is attributed to its advanced algorithms and deep learning capabilities."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"from ultralytics.solutions import object_counter\n",
"import cv2\n",
"\n",
"model = YOLO(\"yolov8n.pt\")\n",
"cap = cv2.VideoCapture(\"path/to/video/file.mp4\")\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"# Define region points\n",
"region_points = [(20, 400), (1080, 404), (1080, 360), (20, 360)]\n",
"\n",
"# Video writer\n",
"video_writer = cv2.VideoWriter(\"object_counting_output.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"# Init Object Counter\n",
"counter = object_counter.ObjectCounter()\n",
"counter.set_args(view_img=True,\n",
" reg_pts=region_points,\n",
" classes_names=model.names,\n",
" draw_tracks=True)\n",
"\n",
"while cap.isOpened():\n",
" success, im0 = cap.read()\n",
" if not success:\n",
" print(\"Video frame is empty or video processing has been successfully completed.\")\n",
" break\n",
" tracks = model.track(im0, persist=True, show=False)\n",
"\n",
" im0 = counter.start_counting(im0, tracks)\n",
" video_writer.write(im0)\n",
"\n",
"cap.release()\n",
"video_writer.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/guides/object-counting/\">Ultralytics Object Counting Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

View File

@ -0,0 +1,203 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/object_tracking.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the <a href=\"https://docs.ultralytics.com/modes/track/\">Object Tracking</a> and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of <a href=\"https://docs.ultralytics.com/modes/track/\">Ultralytics Object Tracking</a>. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
],
"metadata": {
"id": "PN1cAxdvd61e"
}
},
{
"cell_type": "markdown",
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
],
"metadata": {
"id": "o68Sg1oOeZm2"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dSwz_uOReMI"
},
"outputs": [],
"source": [
"!pip install ultralytics"
]
},
{
"cell_type": "markdown",
"source": [
"# Ultralytics Object Tracking\n",
"\n",
"Within the domain of video analytics, object tracking stands out as a crucial undertaking. It goes beyond merely identifying the location and class of objects within the frame; it also involves assigning a unique ID to each detected object as the video unfolds. The applications of this technology are vast, spanning from surveillance and security to real-time sports analytics."
],
"metadata": {
"id": "m7VkxQ2aeg7k"
}
},
{
"cell_type": "markdown",
"source": [
"## CLI"
],
"metadata": {
"id": "-ZF9DM6e6gz0"
}
},
{
"cell_type": "code",
"source": [
"!yolo track source=\"/path/to/video/file.mp4\" save=True"
],
"metadata": {
"id": "-XJqhOwo6iqT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Python\n",
"\n",
"- Draw Object tracking trails"
],
"metadata": {
"id": "XRcw0vIE6oNb"
}
},
{
"cell_type": "code",
"source": [
"import cv2\n",
"import numpy as np\n",
"from ultralytics import YOLO\n",
"\n",
"from ultralytics.utils.checks import check_imshow\n",
"from ultralytics.utils.plotting import Annotator, colors\n",
"\n",
"from collections import defaultdict\n",
"\n",
"track_history = defaultdict(lambda: [])\n",
"model = YOLO(\"yolov8n.pt\")\n",
"names = model.model.names\n",
"\n",
"video_path = \"/path/to/video/file.mp4\"\n",
"cap = cv2.VideoCapture(video_path)\n",
"assert cap.isOpened(), \"Error reading video file\"\n",
"\n",
"w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))\n",
"\n",
"result = cv2.VideoWriter(\"object_tracking.avi\",\n",
" cv2.VideoWriter_fourcc(*'mp4v'),\n",
" fps,\n",
" (w, h))\n",
"\n",
"while cap.isOpened():\n",
" success, frame = cap.read()\n",
" if success:\n",
" results = model.track(frame, persist=True, verbose=False)\n",
" boxes = results[0].boxes.xyxy.cpu()\n",
"\n",
" if results[0].boxes.id is not None:\n",
"\n",
" # Extract prediction results\n",
" clss = results[0].boxes.cls.cpu().tolist()\n",
" track_ids = results[0].boxes.id.int().cpu().tolist()\n",
" confs = results[0].boxes.conf.float().cpu().tolist()\n",
"\n",
" # Annotator Init\n",
" annotator = Annotator(frame, line_width=2)\n",
"\n",
" for box, cls, track_id in zip(boxes, clss, track_ids):\n",
" annotator.box_label(box, color=colors(int(cls), True), label=names[int(cls)])\n",
"\n",
" # Store tracking history\n",
" track = track_history[track_id]\n",
" track.append((int((box[0] + box[2]) / 2), int((box[1] + box[3]) / 2)))\n",
" if len(track) > 30:\n",
" track.pop(0)\n",
"\n",
" # Plot tracks\n",
" points = np.array(track, dtype=np.int32).reshape((-1, 1, 2))\n",
" cv2.circle(frame, (track[-1]), 7, colors(int(cls), True), -1)\n",
" cv2.polylines(frame, [points], isClosed=False, color=colors(int(cls), True), thickness=2)\n",
"\n",
" result.write(frame)\n",
" if cv2.waitKey(1) & 0xFF == ord(\"q\"):\n",
" break\n",
" else:\n",
" break\n",
"\n",
"result.release()\n",
"cap.release()\n",
"cv2.destroyAllWindows()"
],
"metadata": {
"id": "Cx-u59HQdu2o"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#Community Support\n",
"\n",
"For more information, you can explore <a href=\"https://docs.ultralytics.com/modes/track/\">Ultralytics Object Tracking Docs</a>\n",
"\n",
"Ultralytics ⚡ resources\n",
"- About Us https://ultralytics.com/about\n",
"- Join Our Team https://ultralytics.com/work\n",
"- Contact Us https://ultralytics.com/contact\n",
"- Discord https://ultralytics.com/discord\n",
"- Ultralytics License https://ultralytics.com/license\n",
"\n",
"YOLOv8 🚀 resources\n",
"- GitHub https://github.com/ultralytics/ultralytics\n",
"- Docs https://docs.ultralytics.com/"
],
"metadata": {
"id": "QrlKg-y3fEyD"
}
}
]
}

649
examples/tutorial.ipynb Normal file
View File

@ -0,0 +1,649 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "YOLOv8 Tutorial",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "t6MPjfT5NrKQ"
},
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://ultralytics.com/yolov8\" target=\"_blank\">\n",
" <img width=\"1024\", src=\"https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png\"></a>\n",
"\n",
" [中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [हिन्दी](https://docs.ultralytics.com/hi/) | [العربية](https://docs.ultralytics.com/ar/)\n",
"\n",
" <a href=\"https://console.paperspace.com/github/ultralytics/ultralytics\"><img src=\"https://assets.paperspace.io/img/gradient-badge.svg\" alt=\"Run on Gradient\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
" <a href=\"https://www.kaggle.com/ultralytics/yolov8\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"></a>\n",
"\n",
"Welcome to the Ultralytics YOLOv8 🚀 notebook! <a href=\"https://github.com/ultralytics/ultralytics\">YOLOv8</a> is the latest version of the YOLO (You Only Look Once) AI models developed by <a href=\"https://ultralytics.com\">Ultralytics</a>. This notebook serves as the starting point for exploring the various resources available to help you get started with YOLOv8 and understand its features and capabilities.\n",
"\n",
"YOLOv8 models are fast, accurate, and easy to use, making them ideal for various object detection and image segmentation tasks. They can be trained on large datasets and run on diverse hardware platforms, from CPUs to GPUs.\n",
"\n",
"We hope that the resources in this notebook will help you get the most out of YOLOv8. Please browse the YOLOv8 <a href=\"https://docs.ultralytics.com/\">Docs</a> for details, raise an issue on <a href=\"https://github.com/ultralytics/ultralytics\">GitHub</a> for support, and join our <a href=\"https://ultralytics.com/discord\">Discord</a> community for questions and discussions!\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7mGmQbAO5pQb"
},
"source": [
"# Setup\n",
"\n",
"Pip install `ultralytics` and [dependencies](https://github.com/ultralytics/ultralytics/blob/main/pyproject.toml) and check software and hardware."
]
},
{
"cell_type": "code",
"metadata": {
"id": "wbvMlHd_QwMG",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "51d15672-e688-4fb8-d9d0-00d1916d3532"
},
"source": [
"%pip install ultralytics\n",
"import ultralytics\n",
"ultralytics.checks()"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"Setup complete ✅ (2 CPUs, 12.7 GB RAM, 26.3/78.2 GB disk)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4JnkELT0cIJg"
},
"source": [
"# 1. Predict\n",
"\n",
"YOLOv8 may be used directly in the Command Line Interface (CLI) with a `yolo` command for a variety of tasks and modes and accepts additional arguments, i.e. `imgsz=640`. See a full list of available `yolo` [arguments](https://docs.ultralytics.com/usage/cfg/) and other details in the [YOLOv8 Predict Docs](https://docs.ultralytics.com/modes/train/).\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zR9ZbuQCH7FX",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "37738db7-4284-47de-b3ed-b82f2431ed23"
},
"source": [
"# Run inference on an image with YOLOv8n\n",
"!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/zidane.jpg'"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt to 'yolov8n.pt'...\n",
"100% 6.23M/6.23M [00:00<00:00, 72.6MB/s]\n",
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"Downloading https://ultralytics.com/images/zidane.jpg to 'zidane.jpg'...\n",
"100% 165k/165k [00:00<00:00, 7.05MB/s]\n",
"image 1/1 /content/zidane.jpg: 384x640 2 persons, 1 tie, 162.0ms\n",
"Speed: 13.9ms preprocess, 162.0ms inference, 1259.5ms postprocess per image at shape (1, 3, 384, 640)\n",
"Results saved to \u001b[1mruns/detect/predict\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/predict\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hkAzDWJ7cWTr"
},
"source": [
"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
"<img align=\"left\" src=\"https://user-images.githubusercontent.com/26833433/212889447-69e5bdf1-5800-4e29-835e-2ed2336dede2.jpg\" width=\"600\">"
]
},
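{
"cell_type": "markdown",
"source": [
"As an illustrative sketch, the cell below re-runs prediction while passing extra `key=value` arguments (`imgsz`, `conf`, `save`) from the [arguments list](https://docs.ultralytics.com/usage/cfg/). The specific values are arbitrary examples, not recommendations."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative sketch: pass additional key=value arguments to the yolo CLI\n",
"# imgsz, conf and save are standard yolo arguments (https://docs.ultralytics.com/usage/cfg/); the values here are arbitrary\n",
"!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg' imgsz=640 conf=0.25 save=True"
],
"metadata": {},
"execution_count": null,
"outputs": []
},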
{
"cell_type": "markdown",
"metadata": {
"id": "0eq1SMWl6Sfn"
},
"source": [
"# 2. Val\n",
"Validate a model's accuracy on the [COCO](https://docs.ultralytics.com/datasets/detect/coco/) dataset's `val` or `test` splits. The latest YOLOv8 [models](https://github.com/ultralytics/ultralytics#models) are downloaded automatically the first time they are used. See [YOLOv8 Val Docs](https://docs.ultralytics.com/modes/val/) for more information."
]
},
{
"cell_type": "code",
"metadata": {
"id": "WQPtK1QYVaD_"
},
"source": [
"# Download COCO val\n",
"import torch\n",
"torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip') # download (780M - 5000 images)\n",
"!unzip -q tmp.zip -d datasets && rm tmp.zip # unzip"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X58w8JLpMnjH",
"outputId": "61001937-ccd2-4157-a373-156a57495231",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# Validate YOLOv8n on COCO8 val\n",
"!yolo val model=yolov8n.pt data=coco8.yaml"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"Dataset 'coco8.yaml' images not found ⚠️, missing path '/content/datasets/coco8/images/val'\n",
"Downloading https://ultralytics.com/assets/coco8.zip to '/content/datasets/coco8.zip'...\n",
"100% 433k/433k [00:00<00:00, 12.5MB/s]\n",
"Unzipping /content/datasets/coco8.zip to /content/datasets/coco8...: 100% 25/25 [00:00<00:00, 4546.38file/s]\n",
"Dataset download success ✅ (0.9s), saved to \u001b[1m/content/datasets\u001b[0m\n",
"\n",
"Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...\n",
"100% 755k/755k [00:00<00:00, 17.8MB/s]\n",
"\u001b[34m\u001b[1mval: \u001b[0mScanning /content/datasets/coco8/labels/val... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<00:00, 275.94it/s]\n",
"\u001b[34m\u001b[1mval: \u001b[0mNew cache created: /content/datasets/coco8/labels/val.cache\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:02<00:00, 2.23s/it]\n",
" all 4 17 0.621 0.833 0.888 0.63\n",
" person 4 10 0.721 0.5 0.519 0.269\n",
" dog 4 1 0.37 1 0.995 0.597\n",
" horse 4 2 0.751 1 0.995 0.631\n",
" elephant 4 2 0.505 0.5 0.828 0.394\n",
" umbrella 4 1 0.564 1 0.995 0.995\n",
" potted plant 4 1 0.814 1 0.995 0.895\n",
"Speed: 0.3ms preprocess, 56.9ms inference, 0.0ms loss, 222.8ms postprocess per image\n",
"Results saved to \u001b[1mruns/detect/val\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/val\n"
]
}
]
},
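{
"cell_type": "markdown",
"source": [
"A minimal Python sketch of the same validation is shown below. It assumes the metrics object returned by `model.val()` exposes `box.map` and `box.map50` attributes as described in the Ultralytics Python docs."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: validate via the Python API and read summary metrics\n",
"# Assumes the returned metrics object exposes box.map (mAP50-95) and box.map50 (mAP50)\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n.pt')\n",
"metrics = model.val(data='coco8.yaml')  # validate on COCO8\n",
"print(metrics.box.map)    # mAP50-95\n",
"print(metrics.box.map50)  # mAP50"
],
"metadata": {},
"execution_count": null,
"outputs": []
},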
{
"cell_type": "markdown",
"metadata": {
"id": "ZY2VXXXu74w5"
},
"source": [
"# 3. Train\n",
"\n",
"<p align=\"\"><a href=\"https://bit.ly/ultralytics_hub\"><img width=\"1000\" src=\"https://github.com/ultralytics/assets/raw/main/yolov8/banner-integrations.png\"/></a></p>\n",
"\n",
"Train YOLOv8 on [Detect](https://docs.ultralytics.com/tasks/detect/), [Segment](https://docs.ultralytics.com/tasks/segment/), [Classify](https://docs.ultralytics.com/tasks/classify/) and [Pose](https://docs.ultralytics.com/tasks/pose/) datasets. See [YOLOv8 Train Docs](https://docs.ultralytics.com/modes/train/) for more information."
]
},
{
"cell_type": "code",
"source": [
"#@title Select YOLOv8 🚀 logger {run: 'auto'}\n",
"logger = 'Comet' #@param ['Comet', 'TensorBoard']\n",
"\n",
"if logger == 'Comet':\n",
" %pip install -q comet_ml\n",
" import comet_ml; comet_ml.init()\n",
"elif logger == 'TensorBoard':\n",
" %load_ext tensorboard\n",
" %tensorboard --logdir ."
],
"metadata": {
"id": "ktegpM42AooT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1NcFxRcFdJ_O",
"outputId": "1ec62d53-41eb-444f-e2f7-cef5c18b9a27",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# Train YOLOv8n on COCO8 for 3 epochs\n",
"!yolo train model=yolov8n.pt data=coco8.yaml epochs=3 imgsz=640"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"\u001b[34m\u001b[1mengine/trainer: \u001b[0mtask=detect, mode=train, model=yolov8n.pt, data=coco8.yaml, epochs=3, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train\n",
"\n",
" from n params module arguments \n",
" 0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2] \n",
" 1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2] \n",
" 2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True] \n",
" 3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2] \n",
" 4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True] \n",
" 5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2] \n",
" 6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True] \n",
" 7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2] \n",
" 8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True] \n",
" 9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5] \n",
" 10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n",
" 11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1] \n",
" 13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] \n",
" 14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1] \n",
" 16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2] \n",
" 17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1] \n",
" 19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2] \n",
" 20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1] \n",
" 21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1] \n",
" 22 [15, 18, 21] 1 897664 ultralytics.nn.modules.head.Detect [80, [64, 128, 256]] \n",
"Model summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs\n",
"\n",
"Transferred 355/355 items from pretrained weights\n",
"\u001b[34m\u001b[1mTensorBoard: \u001b[0mStart with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/\n",
"Freezing layer 'model.22.dfl.conv.weight'\n",
"\u001b[34m\u001b[1mAMP: \u001b[0mrunning Automatic Mixed Precision (AMP) checks with YOLOv8n...\n",
"\u001b[34m\u001b[1mAMP: \u001b[0mchecks passed ✅\n",
"\u001b[34m\u001b[1mtrain: \u001b[0mScanning /content/datasets/coco8/labels/train... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<00:00, 43351.98it/s]\n",
"\u001b[34m\u001b[1mtrain: \u001b[0mNew cache created: /content/datasets/coco8/labels/train.cache\n",
"\u001b[34m\u001b[1malbumentations: \u001b[0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))\n",
"\u001b[34m\u001b[1mval: \u001b[0mScanning /content/datasets/coco8/labels/val.cache... 4 images, 0 backgrounds, 0 corrupt: 100% 4/4 [00:00<?, ?it/s]\n",
"Plotting labels to runs/detect/train/labels.jpg... \n",
"\u001b[34m\u001b[1moptimizer:\u001b[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... \n",
"\u001b[34m\u001b[1moptimizer:\u001b[0m AdamW(lr=0.000119, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)\n",
"\u001b[34m\u001b[1mTensorBoard: \u001b[0mmodel graph visualization added ✅\n",
"Image sizes 640 train, 640 val\n",
"Using 2 dataloader workers\n",
"Logging results to \u001b[1mruns/detect/train\u001b[0m\n",
"Starting training for 3 epochs...\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 1/3 0.77G 0.9308 3.155 1.291 32 640: 100% 1/1 [00:01<00:00, 1.70s/it]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 1.90it/s]\n",
" all 4 17 0.858 0.54 0.726 0.51\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 2/3 0.78G 1.162 3.127 1.518 33 640: 100% 1/1 [00:00<00:00, 8.18it/s]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 3.71it/s]\n",
" all 4 17 0.904 0.526 0.742 0.5\n",
"\n",
" Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n",
" 3/3 0.759G 0.925 2.507 1.254 17 640: 100% 1/1 [00:00<00:00, 7.53it/s]\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 6.80it/s]\n",
" all 4 17 0.906 0.532 0.741 0.513\n",
"\n",
"3 epochs completed in 0.002 hours.\n",
"Optimizer stripped from runs/detect/train/weights/last.pt, 6.5MB\n",
"Optimizer stripped from runs/detect/train/weights/best.pt, 6.5MB\n",
"\n",
"Validating runs/detect/train/weights/best.pt...\n",
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (Tesla T4, 15102MiB)\n",
"Model summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
" Class Images Instances Box(P R mAP50 mAP50-95): 100% 1/1 [00:00<00:00, 16.31it/s]\n",
" all 4 17 0.906 0.533 0.755 0.515\n",
" person 4 10 0.942 0.3 0.519 0.233\n",
" dog 4 1 1 0 0.332 0.162\n",
" horse 4 2 1 0.9 0.995 0.698\n",
" elephant 4 2 1 0 0.695 0.206\n",
" umbrella 4 1 0.755 1 0.995 0.895\n",
" potted plant 4 1 0.739 1 0.995 0.895\n",
"Speed: 0.3ms preprocess, 6.1ms inference, 0.0ms loss, 2.5ms postprocess per image\n",
"Results saved to \u001b[1mruns/detect/train\u001b[0m\n",
"💡 Learn more at https://docs.ultralytics.com/modes/train\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# 4. Export\n",
"\n",
"Export a YOLOv8 model to any supported format below with the `format` argument, i.e. `format=onnx`. See [YOLOv8 Export Docs](https://docs.ultralytics.com/modes/export/) for more information.\n",
"\n",
"- 💡 ProTip: Export to [ONNX](https://docs.ultralytics.com/integrations/onnx/) or [OpenVINO](https://docs.ultralytics.com/integrations/openvino/) for up to 3x CPU speedup. \n",
"- 💡 ProTip: Export to [TensorRT](https://docs.ultralytics.com/integrations/tensorrt/) for up to 5x GPU speedup.\n",
"\n",
"| Format | `format` Argument | Model | Metadata | Arguments |\n",
"|--------------------------------------------------------------------|-------------------|---------------------------|----------|-----------------------------------------------------|\n",
"| [PyTorch](https://pytorch.org/) | - | `yolov8n.pt` | ✅ | - |\n",
"| [TorchScript](https://pytorch.org/docs/stable/jit.html) | `torchscript` | `yolov8n.torchscript` | ✅ | `imgsz`, `optimize` |\n",
"| [ONNX](https://onnx.ai/) | `onnx` | `yolov8n.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset` |\n",
"| [OpenVINO](https://docs.openvino.ai/) | `openvino` | `yolov8n_openvino_model/` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [TensorRT](https://developer.nvidia.com/tensorrt) | `engine` | `yolov8n.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace` |\n",
"| [CoreML](https://github.com/apple/coremltools) | `coreml` | `yolov8n.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms` |\n",
"| [TF SavedModel](https://www.tensorflow.org/guide/saved_model) | `saved_model` | `yolov8n_saved_model/` | ✅ | `imgsz`, `keras`, `int8` |\n",
"| [TF GraphDef](https://www.tensorflow.org/api_docs/python/tf/Graph) | `pb` | `yolov8n.pb` | ❌ | `imgsz` |\n",
"| [TF Lite](https://www.tensorflow.org/lite) | `tflite` | `yolov8n.tflite` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [TF Edge TPU](https://coral.ai/docs/edgetpu/models-intro/) | `edgetpu` | `yolov8n_edgetpu.tflite` | ✅ | `imgsz` |\n",
"| [TF.js](https://www.tensorflow.org/js) | `tfjs` | `yolov8n_web_model/` | ✅ | `imgsz`, `half`, `int8` |\n",
"| [PaddlePaddle](https://github.com/PaddlePaddle) | `paddle` | `yolov8n_paddle_model/` | ✅ | `imgsz` |\n",
"| [NCNN](https://github.com/Tencent/ncnn) | `ncnn` | `yolov8n_ncnn_model/` | ✅ | `imgsz`, `half` |\n"
],
"metadata": {
"id": "nPZZeNrLCQG6"
}
},
{
"cell_type": "code",
"source": [
"!yolo export model=yolov8n.pt format=torchscript"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CYIjW4igCjqD",
"outputId": "f6d45666-07b4-4214-86c0-4e83e70ac096"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ultralytics YOLOv8.1.23 🚀 Python-3.10.12 torch-2.1.0+cu121 CPU (Intel Xeon 2.30GHz)\n",
"YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs\n",
"\n",
"\u001b[34m\u001b[1mPyTorch:\u001b[0m starting from 'yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)\n",
"\n",
"\u001b[34m\u001b[1mTorchScript:\u001b[0m starting export with torch 2.1.0+cu121...\n",
"\u001b[34m\u001b[1mTorchScript:\u001b[0m export success ✅ 2.4s, saved as 'yolov8n.torchscript' (12.4 MB)\n",
"\n",
"Export complete (4.5s)\n",
"Results saved to \u001b[1m/content\u001b[0m\n",
"Predict: yolo predict task=detect model=yolov8n.torchscript imgsz=640 \n",
"Validate: yolo val task=detect model=yolov8n.torchscript imgsz=640 data=coco.yaml \n",
"Visualize: https://netron.app\n",
"💡 Learn more at https://docs.ultralytics.com/modes/export\n"
]
}
]
},
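{
"cell_type": "markdown",
"source": [
"As a sketch of the format table above, the cell below exports to ONNX with a few of the listed arguments (`imgsz`, `dynamic`, `simplify`). The chosen values are illustrative only."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative ONNX export using arguments listed in the format table above (values are examples)\n",
"!yolo export model=yolov8n.pt format=onnx imgsz=640 dynamic=True simplify=True"
],
"metadata": {},
"execution_count": null,
"outputs": []
},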
{
"cell_type": "markdown",
"source": [
"# 5. Python Usage\n",
"\n",
"YOLOv8 was reimagined using Python-first principles for the most seamless Python YOLO experience yet. YOLOv8 models can be loaded from a trained checkpoint or created from scratch. Then methods are used to train, val, predict, and export the model. See detailed Python usage examples in the [YOLOv8 Python Docs](https://docs.ultralytics.com/usage/python/)."
],
"metadata": {
"id": "kUMOQ0OeDBJG"
}
},
{
"cell_type": "code",
"source": [
"from ultralytics import YOLO\n",
"\n",
"# Load a model\n",
"model = YOLO('yolov8n.yaml') # build a new model from scratch\n",
"model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training)\n",
"\n",
"# Use the model\n",
"results = model.train(data='coco128.yaml', epochs=3) # train the model\n",
"results = model.val() # evaluate model performance on the validation set\n",
"results = model('https://ultralytics.com/images/bus.jpg') # predict on an image\n",
"results = model.export(format='onnx') # export the model to ONNX format"
],
"metadata": {
"id": "bpF9-vS_DAaf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# 6. Tasks\n",
"\n",
"YOLOv8 can train, val, predict and export models for the most common tasks in vision AI: [Detect](https://docs.ultralytics.com/tasks/detect/), [Segment](https://docs.ultralytics.com/tasks/segment/), [Classify](https://docs.ultralytics.com/tasks/classify/) and [Pose](https://docs.ultralytics.com/tasks/pose/). See [YOLOv8 Tasks Docs](https://docs.ultralytics.com/tasks/) for more information.\n",
"\n",
"<br><img width=\"1024\" src=\"https://raw.githubusercontent.com/ultralytics/assets/main/im/banner-tasks.png\">\n"
],
"metadata": {
"id": "Phm9ccmOKye5"
}
},
{
"cell_type": "markdown",
"source": [
"## 1. Detection\n",
"\n",
"YOLOv8 _detection_ models have no suffix and are the default YOLOv8 models, i.e. `yolov8n.pt` and are pretrained on COCO. See [Detection Docs](https://docs.ultralytics.com/tasks/detect/) for full details.\n"
],
"metadata": {
"id": "yq26lwpYK1lq"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n, train it on COCO128 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n.pt') # load a pretrained YOLOv8n detection model\n",
"model.train(data='coco128.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "8Go5qqS9LbC5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 2. Segmentation\n",
"\n",
"YOLOv8 _segmentation_ models use the `-seg` suffix, i.e. `yolov8n-seg.pt` and are pretrained on COCO. See [Segmentation Docs](https://docs.ultralytics.com/tasks/segment/) for full details.\n"
],
"metadata": {
"id": "7ZW58jUzK66B"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-seg, train it on COCO128-seg for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-seg.pt') # load a pretrained YOLOv8n segmentation model\n",
"model.train(data='coco128-seg.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "WFPJIQl_L5HT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3. Classification\n",
"\n",
"YOLOv8 _classification_ models use the `-cls` suffix, i.e. `yolov8n-cls.pt` and are pretrained on ImageNet. See [Classification Docs](https://docs.ultralytics.com/tasks/classify/) for full details.\n"
],
"metadata": {
"id": "ax3p94VNK9zR"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-cls, train it on mnist160 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-cls.pt') # load a pretrained YOLOv8n classification model\n",
"model.train(data='mnist160', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "5q9Zu6zlL5rS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4. Pose\n",
"\n",
"YOLOv8 _pose_ models use the `-pose` suffix, i.e. `yolov8n-pose.pt` and are pretrained on COCO Keypoints. See [Pose Docs](https://docs.ultralytics.com/tasks/pose/) for full details."
],
"metadata": {
"id": "SpIaFLiO11TG"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-pose, train it on COCO8-pose for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-pose.pt') # load a pretrained YOLOv8n pose model\n",
"model.train(data='coco8-pose.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "si4aKFNg19vX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4. Oriented Bounding Boxes (OBB)\n",
"\n",
"YOLOv8 _OBB_ models use the `-obb` suffix, i.e. `yolov8n-obb.pt` and are pretrained on the DOTA dataset. See [OBB Docs](https://docs.ultralytics.com/tasks/obb/) for full details."
],
"metadata": {
"id": "cf5j_T9-B5F0"
}
},
{
"cell_type": "code",
"source": [
"# Load YOLOv8n-obb, train it on DOTA8 for 3 epochs and predict an image with it\n",
"from ultralytics import YOLO\n",
"\n",
"model = YOLO('yolov8n-obb.pt') # load a pretrained YOLOv8n OBB model\n",
"model.train(data='coco8-dota.yaml', epochs=3) # train the model\n",
"model('https://ultralytics.com/images/bus.jpg') # predict on an image"
],
"metadata": {
"id": "IJNKClOOB5YS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IEijrePND_2I"
},
"source": [
"# Appendix\n",
"\n",
"Additional content below."
]
},
{
"cell_type": "code",
"source": [
"# Pip install from source\n",
"!pip install git+https://github.com/ultralytics/ultralytics@main"
],
"metadata": {
"id": "pIdE6i8C3LYp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Git clone and run tests on updates branch\n",
"!git clone https://github.com/ultralytics/ultralytics -b main\n",
"%pip install -qe ultralytics"
],
"metadata": {
"id": "uRKlwxSJdhd1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Run tests (Git clone only)\n",
"!pytest ultralytics/tests"
],
"metadata": {
"id": "GtPlh7mcCGZX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Validate multiple models\n",
"for x in 'nsmlx':\n",
" !yolo val model=yolov8{x}.pt data=coco.yaml"
],
"metadata": {
"id": "Wdc6t_bfzDDk"
},
"execution_count": null,
"outputs": []
}
]
}