mirror of
https://gitee.com/nanjing-yimao-information/ieemoo-ai-gift.git
synced 2025-08-23 07:30:25 +00:00
update
This commit is contained in:
203
docs/en/models/fast-sam.md
Normal file
203
docs/en/models/fast-sam.md
Normal file
@ -0,0 +1,203 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore FastSAM, a CNN-based solution for real-time object segmentation in images. Enhanced user interaction, computational efficiency and adaptable across vision tasks.
|
||||
keywords: FastSAM, machine learning, CNN-based solution, object segmentation, real-time solution, Ultralytics, vision tasks, image processing, industrial applications, user interaction
|
||||
---
|
||||
|
||||
# Fast Segment Anything Model (FastSAM)
|
||||
|
||||
The Fast Segment Anything Model (FastSAM) is a novel, real-time CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. FastSAM significantly reduces computational demands while maintaining competitive performance, making it a practical choice for a variety of vision tasks.
|
||||
|
||||

|
||||
|
||||
## Overview
|
||||
|
||||
FastSAM is designed to address the limitations of the [Segment Anything Model (SAM)](sam.md), a heavy Transformer model with substantial computational resource requirements. The FastSAM decouples the segment anything task into two sequential stages: all-instance segmentation and prompt-guided selection. The first stage uses [YOLOv8-seg](../tasks/segment.md) to produce the segmentation masks of all instances in the image. In the second stage, it outputs the region-of-interest corresponding to the prompt.
|
||||
|
||||
## Key Features
|
||||
|
||||
1. **Real-time Solution:** By leveraging the computational efficiency of CNNs, FastSAM provides a real-time solution for the segment anything task, making it valuable for industrial applications that require quick results.
|
||||
|
||||
2. **Efficiency and Performance:** FastSAM offers a significant reduction in computational and resource demands without compromising on performance quality. It achieves comparable performance to SAM but with drastically reduced computational resources, enabling real-time application.
|
||||
|
||||
3. **Prompt-guided Segmentation:** FastSAM can segment any object within an image guided by various possible user interaction prompts, providing flexibility and adaptability in different scenarios.
|
||||
|
||||
4. **Based on YOLOv8-seg:** FastSAM is based on [YOLOv8-seg](../tasks/segment.md), an object detector equipped with an instance segmentation branch. This allows it to effectively produce the segmentation masks of all instances in an image.
|
||||
|
||||
5. **Competitive Results on Benchmarks:** On the object proposal task on MS COCO, FastSAM achieves high scores at a significantly faster speed than [SAM](sam.md) on a single NVIDIA RTX 3090, demonstrating its efficiency and capability.
|
||||
|
||||
6. **Practical Applications:** The proposed approach provides a new, practical solution for a large number of vision tasks at a really high speed, tens or hundreds of times faster than current methods.
|
||||
|
||||
7. **Model Compression Feasibility:** FastSAM demonstrates the feasibility of a path that can significantly reduce the computational effort by introducing an artificial prior to the structure, thus opening new possibilities for large model architecture for general vision tasks.
|
||||
|
||||
## Available Models, Supported Tasks, and Operating Modes
|
||||
|
||||
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|---------------------------------------------------------------------------------------------|----------------------------------------------|-----------|------------|----------|--------|
|
||||
| FastSAM-s | [FastSAM-s.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/FastSAM-s.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ✅ |
|
||||
| FastSAM-x | [FastSAM-x.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/FastSAM-x.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ✅ |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
The FastSAM models are easy to integrate into your Python applications. Ultralytics provides user-friendly Python API and CLI commands to streamline development.
|
||||
|
||||
### Predict Usage
|
||||
|
||||
To perform object detection on an image, use the `predict` method as shown below:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import FastSAM
|
||||
from ultralytics.models.fastsam import FastSAMPrompt
|
||||
|
||||
# Define an inference source
|
||||
source = 'path/to/bus.jpg'
|
||||
|
||||
# Create a FastSAM model
|
||||
model = FastSAM('FastSAM-s.pt') # or FastSAM-x.pt
|
||||
|
||||
# Run inference on an image
|
||||
everything_results = model(source, device='cpu', retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
|
||||
|
||||
# Prepare a Prompt Process object
|
||||
prompt_process = FastSAMPrompt(source, everything_results, device='cpu')
|
||||
|
||||
# Everything prompt
|
||||
ann = prompt_process.everything_prompt()
|
||||
|
||||
# Bbox default shape [0,0,0,0] -> [x1,y1,x2,y2]
|
||||
ann = prompt_process.box_prompt(bbox=[200, 200, 300, 300])
|
||||
|
||||
# Text prompt
|
||||
ann = prompt_process.text_prompt(text='a photo of a dog')
|
||||
|
||||
# Point prompt
|
||||
# points default [[0,0]] [[x1,y1],[x2,y2]]
|
||||
# point_label default [0] [1,0] 0:background, 1:foreground
|
||||
ann = prompt_process.point_prompt(points=[[200, 200]], pointlabel=[1])
|
||||
prompt_process.plot(annotations=ann, output='./')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Load a FastSAM model and segment everything with it
|
||||
yolo segment predict model=FastSAM-s.pt source=path/to/bus.jpg imgsz=640
|
||||
```
|
||||
|
||||
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image.
|
||||
|
||||
### Val Usage
|
||||
|
||||
Validation of the model on a dataset can be done as follows:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import FastSAM
|
||||
|
||||
# Create a FastSAM model
|
||||
model = FastSAM('FastSAM-s.pt') # or FastSAM-x.pt
|
||||
|
||||
# Validate the model
|
||||
results = model.val(data='coco8-seg.yaml')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Load a FastSAM model and validate it on the COCO8 example dataset at image size 640
|
||||
yolo segment val model=FastSAM-s.pt data=coco8.yaml imgsz=640
|
||||
```
|
||||
|
||||
Please note that FastSAM only supports detection and segmentation of a single class of object. This means it will recognize and segment all objects as the same class. Therefore, when preparing the dataset, you need to convert all object category IDs to 0.
|
||||
|
||||
## FastSAM official Usage
|
||||
|
||||
FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) repository. Here is a brief overview of the typical steps you might take to use FastSAM:
|
||||
|
||||
### Installation
|
||||
|
||||
1. Clone the FastSAM repository:
|
||||
|
||||
```shell
|
||||
git clone https://github.com/CASIA-IVA-Lab/FastSAM.git
|
||||
```
|
||||
|
||||
2. Create and activate a Conda environment with Python 3.9:
|
||||
|
||||
```shell
|
||||
conda create -n FastSAM python=3.9
|
||||
conda activate FastSAM
|
||||
```
|
||||
|
||||
3. Navigate to the cloned repository and install the required packages:
|
||||
|
||||
```shell
|
||||
cd FastSAM
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
4. Install the CLIP model:
|
||||
```shell
|
||||
pip install git+https://github.com/openai/CLIP.git
|
||||
```
|
||||
|
||||
### Example Usage
|
||||
|
||||
1. Download a [model checkpoint](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view?usp=sharing).
|
||||
|
||||
2. Use FastSAM for inference. Example commands:
|
||||
|
||||
- Segment everything in an image:
|
||||
|
||||
```shell
|
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg
|
||||
```
|
||||
|
||||
- Segment specific objects using text prompt:
|
||||
|
||||
```shell
|
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --text_prompt "the yellow dog"
|
||||
```
|
||||
|
||||
- Segment objects within a bounding box (provide box coordinates in xywh format):
|
||||
|
||||
```shell
|
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[570,200,230,400]"
|
||||
```
|
||||
|
||||
- Segment objects near specific points:
|
||||
```shell
|
||||
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]"
|
||||
```
|
||||
|
||||
Additionally, you can try FastSAM through a [Colab demo](https://colab.research.google.com/drive/1oX14f6IneGGw612WgVlAiy91UHwFAvr9?usp=sharing) or on the [HuggingFace web demo](https://huggingface.co/spaces/An-619/FastSAM) for a visual experience.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We would like to acknowledge the FastSAM authors for their significant contributions in the field of real-time instance segmentation:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{zhao2023fast,
|
||||
title={Fast Segment Anything},
|
||||
author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
|
||||
year={2023},
|
||||
eprint={2306.12156},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CV}
|
||||
}
|
||||
```
|
||||
|
||||
The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12156). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/CASIA-IVA-Lab/FastSAM). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
|
96
docs/en/models/index.md
Normal file
96
docs/en/models/index.md
Normal file
@ -0,0 +1,96 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore the diverse range of YOLO family, SAM, MobileSAM, FastSAM, YOLO-NAS, YOLO-World and RT-DETR models supported by Ultralytics. Get started with examples for both CLI and Python usage.
|
||||
keywords: Ultralytics, documentation, YOLO, SAM, MobileSAM, FastSAM, YOLO-NAS, RT-DETR, YOLO-World, models, architectures, Python, CLI
|
||||
---
|
||||
|
||||
# Models Supported by Ultralytics
|
||||
|
||||
Welcome to Ultralytics' model documentation! We offer support for a wide range of models, each tailored to specific tasks like [object detection](../tasks/detect.md), [instance segmentation](../tasks/segment.md), [image classification](../tasks/classify.md), [pose estimation](../tasks/pose.md), and [multi-object tracking](../modes/track.md). If you're interested in contributing your model architecture to Ultralytics, check out our [Contributing Guide](../help/contributing.md).
|
||||
|
||||
## Featured Models
|
||||
|
||||
Here are some of the key models supported:
|
||||
|
||||
1. **[YOLOv3](yolov3.md)**: The third iteration of the YOLO model family, originally by Joseph Redmon, known for its efficient real-time object detection capabilities.
|
||||
2. **[YOLOv4](yolov4.md)**: A darknet-native update to YOLOv3, released by Alexey Bochkovskiy in 2020.
|
||||
3. **[YOLOv5](yolov5.md)**: An improved version of the YOLO architecture by Ultralytics, offering better performance and speed trade-offs compared to previous versions.
|
||||
4. **[YOLOv6](yolov6.md)**: Released by [Meituan](https://about.meituan.com/) in 2022, and in use in many of the company's autonomous delivery robots.
|
||||
5. **[YOLOv7](yolov7.md)**: Updated YOLO models released in 2022 by the authors of YOLOv4.
|
||||
6. **[YOLOv8](yolov8.md) NEW 🚀**: The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
|
||||
7. **[YOLOv9](yolov9.md)**: An experimental model trained on the Ultralytics [YOLOv5](yolov5.md) codebase implementing Programmable Gradient Information (PGI).
|
||||
8. **[Segment Anything Model (SAM)](sam.md)**: Meta's Segment Anything Model (SAM).
|
||||
9. **[Mobile Segment Anything Model (MobileSAM)](mobile-sam.md)**: MobileSAM for mobile applications, by Kyung Hee University.
|
||||
10. **[Fast Segment Anything Model (FastSAM)](fast-sam.md)**: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
|
||||
11. **[YOLO-NAS](yolo-nas.md)**: YOLO Neural Architecture Search (NAS) Models.
|
||||
12. **[Realtime Detection Transformers (RT-DETR)](rtdetr.md)**: Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
|
||||
13. **[YOLO-World](yolo-world.md)**: Real-time Open Vocabulary Object Detection models from Tencent AI Lab.
|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/MWq1UxqTClU?si=nHAW-lYDzrz68jR0"
|
||||
title="YouTube video player" frameborder="0"
|
||||
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
|
||||
allowfullscreen>
|
||||
</iframe>
|
||||
<br>
|
||||
<strong>Watch:</strong> Run Ultralytics YOLO models in just a few lines of code.
|
||||
</p>
|
||||
|
||||
## Getting Started: Usage Examples
|
||||
|
||||
This example provides simple YOLO training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
Note the below example is for YOLOv8 [Detect](../tasks/detect.md) models for object detection. For additional supported tasks see the [Segment](../tasks/segment.md), [Classify](../tasks/classify.md) and [Pose](../tasks/pose.md) docs.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in Python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Load a COCO-pretrained YOLOv8n model
|
||||
model = YOLO('yolov8n.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv8n model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained YOLOv8n model and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Load a COCO-pretrained YOLOv8n model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov8n.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Contributing New Models
|
||||
|
||||
Interested in contributing your model to Ultralytics? Great! We're always open to expanding our model portfolio.
|
||||
|
||||
1. **Fork the Repository**: Start by forking the [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics).
|
||||
|
||||
2. **Clone Your Fork**: Clone your fork to your local machine and create a new branch to work on.
|
||||
|
||||
3. **Implement Your Model**: Add your model following the coding standards and guidelines provided in our [Contributing Guide](../help/contributing.md).
|
||||
|
||||
4. **Test Thoroughly**: Make sure to test your model rigorously, both in isolation and as part of the pipeline.
|
||||
|
||||
5. **Create a Pull Request**: Once you're satisfied with your model, create a pull request to the main repository for review.
|
||||
|
||||
6. **Code Review & Merging**: After review, if your model meets our criteria, it will be merged into the main repository.
|
||||
|
||||
For detailed steps, consult our [Contributing Guide](../help/contributing.md).
|
119
docs/en/models/mobile-sam.md
Normal file
119
docs/en/models/mobile-sam.md
Normal file
@ -0,0 +1,119 @@
|
||||
---
|
||||
comments: true
|
||||
description: Learn more about MobileSAM, its implementation, comparison with the original SAM, and how to download and test it in the Ultralytics framework. Improve your mobile applications today.
|
||||
keywords: MobileSAM, Ultralytics, SAM, mobile applications, Arxiv, GPU, API, image encoder, mask decoder, model download, testing method
|
||||
---
|
||||
|
||||

|
||||
|
||||
# Mobile Segment Anything (MobileSAM)
|
||||
|
||||
The MobileSAM paper is now available on [arXiv](https://arxiv.org/pdf/2306.14289.pdf).
|
||||
|
||||
A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). The performance on a Mac i5 CPU takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs contribute to a slower response, but it continues to function effectively.
|
||||
|
||||
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [Segment Anything in 3D](https://github.com/Jumpat/SegmentAnythingin3D).
|
||||
|
||||
MobileSAM is trained on a single GPU with a 100k dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.
|
||||
|
||||
## Available Models, Supported Tasks, and Operating Modes
|
||||
|
||||
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|-----------------------------------------------------------------------------------------------|----------------------------------------------|-----------|------------|----------|--------|
|
||||
| MobileSAM | [mobile_sam.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/mobile_sam.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
|
||||
|
||||
## Adapting from SAM to MobileSAM
|
||||
|
||||
Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.
|
||||
|
||||
MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M) with a smaller Tiny-ViT (5M). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder.
|
||||
|
||||
The following table provides a comparison of ViT-based image encoders:
|
||||
|
||||
| Image Encoder | Original SAM | MobileSAM |
|
||||
|---------------|--------------|-----------|
|
||||
| Parameters | 611M | 5M |
|
||||
| Speed | 452ms | 8ms |
|
||||
|
||||
Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:
|
||||
|
||||
| Mask Decoder | Original SAM | MobileSAM |
|
||||
|--------------|--------------|-----------|
|
||||
| Parameters | 3.876M | 3.876M |
|
||||
| Speed | 4ms | 4ms |
|
||||
|
||||
Here is the comparison of the whole pipeline:
|
||||
|
||||
| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
|
||||
|--------------------------|--------------|-----------|
|
||||
| Parameters | 615M | 9.66M |
|
||||
| Speed | 456ms | 12ms |
|
||||
|
||||
The performance of MobileSAM and the original SAM are demonstrated using both a point and a box as prompts.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM).
|
||||
|
||||
## Testing MobileSAM in Ultralytics
|
||||
|
||||
Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.
|
||||
|
||||
### Model Download
|
||||
|
||||
You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt).
|
||||
|
||||
### Point Prompt
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import SAM
|
||||
|
||||
# Load the model
|
||||
model = SAM('mobile_sam.pt')
|
||||
|
||||
# Predict a segment based on a point prompt
|
||||
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
|
||||
```
|
||||
|
||||
### Box Prompt
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import SAM
|
||||
|
||||
# Load the model
|
||||
model = SAM('mobile_sam.pt')
|
||||
|
||||
# Predict a segment based on a box prompt
|
||||
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
|
||||
```
|
||||
|
||||
We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](sam.md).
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you find MobileSAM useful in your research or development work, please consider citing our paper:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@article{mobile_sam,
|
||||
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
|
||||
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
|
||||
journal={arXiv preprint arXiv:2306.14289},
|
||||
year={2023}
|
||||
}
|
||||
```
|
92
docs/en/models/rtdetr.md
Normal file
92
docs/en/models/rtdetr.md
Normal file
@ -0,0 +1,92 @@
|
||||
---
|
||||
comments: true
|
||||
description: Discover the features and benefits of RT-DETR, Baidu’s efficient and adaptable real-time object detector powered by Vision Transformers, including pre-trained models.
|
||||
keywords: RT-DETR, Baidu, Vision Transformers, object detection, real-time performance, CUDA, TensorRT, IoU-aware query selection, Ultralytics, Python API, PaddlePaddle
|
||||
---
|
||||
|
||||
# Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector
|
||||
|
||||
## Overview
|
||||
|
||||
Real-Time Detection Transformer (RT-DETR), developed by Baidu, is a cutting-edge end-to-end object detector that provides real-time performance while maintaining high accuracy. It leverages the power of Vision Transformers (ViT) to efficiently process multiscale features by decoupling intra-scale interaction and cross-scale fusion. RT-DETR is highly adaptable, supporting flexible adjustment of inference speed using different decoder layers without retraining. The model excels on accelerated backends like CUDA with TensorRT, outperforming many other real-time object detectors.
|
||||
|
||||
 **Overview of Baidu's RT-DETR.** The RT-DETR model architecture diagram shows the last three stages of the backbone {S3, S4, S5} as the input to the encoder. The efficient hybrid encoder transforms multiscale features into a sequence of image features through intrascale feature interaction (AIFI) and cross-scale feature-fusion module (CCFM). The IoU-aware query selection is employed to select a fixed number of image features to serve as initial object queries for the decoder. Finally, the decoder with auxiliary prediction heads iteratively optimizes object queries to generate boxes and confidence scores ([source](https://arxiv.org/pdf/2304.08069.pdf)).
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Efficient Hybrid Encoder:** Baidu's RT-DETR uses an efficient hybrid encoder that processes multiscale features by decoupling intra-scale interaction and cross-scale fusion. This unique Vision Transformers-based design reduces computational costs and allows for real-time object detection.
|
||||
- **IoU-aware Query Selection:** Baidu's RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene, enhancing the detection accuracy.
|
||||
- **Adaptable Inference Speed:** Baidu's RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability facilitates practical application in various real-time object detection scenarios.
|
||||
|
||||
## Pre-trained Models
|
||||
|
||||
The Ultralytics Python API provides pre-trained PaddlePaddle RT-DETR models with different scales:
|
||||
|
||||
- RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on T4 GPU
|
||||
- RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on T4 GPU
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple RT-DETR training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import RTDETR
|
||||
|
||||
# Load a COCO-pretrained RT-DETR-l model
|
||||
model = RTDETR('rtdetr-l.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the RT-DETR-l model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
This table presents the model types, the specific pre-trained weights, the tasks supported by each model, and the various modes ([Train](../modes/train.md) , [Val](../modes/val.md), [Predict](../modes/predict.md), [Export](../modes/export.md)) that are supported, indicated by ✅ emojis.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|---------------------|-------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| RT-DETR Large | [rtdetr-l.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/rtdetr-l.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| RT-DETR Extra-Large | [rtdetr-x.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/rtdetr-x.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you use Baidu's RT-DETR in your research or development work, please cite the [original paper](https://arxiv.org/abs/2304.08069):
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{lv2023detrs,
|
||||
title={DETRs Beat YOLOs on Real-time Object Detection},
|
||||
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
|
||||
year={2023},
|
||||
eprint={2304.08069},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CV}
|
||||
}
|
||||
```
|
||||
|
||||
We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/PaddlePaddle/PaddleDetection) team for creating and maintaining this valuable resource for the computer vision community. Their contribution to the field with the development of the Vision Transformers-based real-time object detector, RT-DETR, is greatly appreciated.
|
||||
|
||||
_Keywords: RT-DETR, Transformer, ViT, Vision Transformers, Baidu RT-DETR, PaddlePaddle, Paddle Paddle RT-DETR, real-time object detection, Vision Transformers-based object detection, pre-trained PaddlePaddle RT-DETR models, Baidu's RT-DETR usage, Ultralytics Python API_
|
227
docs/en/models/sam.md
Normal file
227
docs/en/models/sam.md
Normal file
@ -0,0 +1,227 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore the cutting-edge Segment Anything Model (SAM) from Ultralytics that allows real-time image segmentation. Learn about its promptable segmentation, zero-shot performance, and how to use it.
|
||||
keywords: Ultralytics, image segmentation, Segment Anything Model, SAM, SA-1B dataset, real-time performance, zero-shot transfer, object detection, image analysis, machine learning
|
||||
---
|
||||
|
||||
# Segment Anything Model (SAM)
|
||||
|
||||
Welcome to the frontier of image segmentation with the Segment Anything Model, or SAM. This revolutionary model has changed the game by introducing promptable image segmentation with real-time performance, setting new standards in the field.
|
||||
|
||||
## Introduction to SAM: The Segment Anything Model
|
||||
|
||||
The Segment Anything Model, or SAM, is a cutting-edge image segmentation model that allows for promptable segmentation, providing unparalleled versatility in image analysis tasks. SAM forms the heart of the Segment Anything initiative, a groundbreaking project that introduces a novel model, task, and dataset for image segmentation.
|
||||
|
||||
SAM's advanced design allows it to adapt to new image distributions and tasks without prior knowledge, a feature known as zero-shot transfer. Trained on the expansive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), which contains more than 1 billion masks spread over 11 million carefully curated images, SAM has displayed impressive zero-shot performance, surpassing previous fully supervised results in many cases.
|
||||
|
||||
 **SA-1B Example images.** Dataset images overlaid masks from the newly introduced SA-1B dataset. SA-1B contains 11M diverse, high-resolution, licensed, and privacy protecting images and 1.1B high-quality segmentation masks. These masks were annotated fully automatically by SAM, and as verified by human ratings and numerous experiments, are of high quality and diversity. Images are grouped by number of masks per image for visualization (there are ∼100 masks per image on average).
|
||||
|
||||
## Key Features of the Segment Anything Model (SAM)
|
||||
|
||||
- **Promptable Segmentation Task:** SAM was designed with a promptable segmentation task in mind, allowing it to generate valid segmentation masks from any given prompt, such as spatial or text clues identifying an object.
|
||||
- **Advanced Architecture:** The Segment Anything Model employs a powerful image encoder, a prompt encoder, and a lightweight mask decoder. This unique architecture enables flexible prompting, real-time mask computation, and ambiguity awareness in segmentation tasks.
|
||||
- **The SA-1B Dataset:** Introduced by the Segment Anything project, the SA-1B dataset features over 1 billion masks on 11 million images. As the largest segmentation dataset to date, it provides SAM with a diverse and large-scale training data source.
|
||||
- **Zero-Shot Performance:** SAM displays outstanding zero-shot performance across various segmentation tasks, making it a ready-to-use tool for diverse applications with minimal need for prompt engineering.
|
||||
|
||||
For an in-depth look at the Segment Anything Model and the SA-1B dataset, please visit the [Segment Anything website](https://segment-anything.com) and check out the research paper [Segment Anything](https://arxiv.org/abs/2304.02643).
|
||||
|
||||
## Available Models, Supported Tasks, and Operating Modes
|
||||
|
||||
This table presents the available models with their specific pre-trained weights, the tasks they support, and their compatibility with different operating modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), indicated by ✅ emojis for supported modes and ❌ emojis for unsupported modes.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|-------------------------------------------------------------------------------------|----------------------------------------------|-----------|------------|----------|--------|
|
||||
| SAM base | [sam_b.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/sam_b.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
|
||||
| SAM large | [sam_l.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/sam_l.pt) | [Instance Segmentation](../tasks/segment.md) | ✅ | ❌ | ❌ | ❌ |
|
||||
|
||||
## How to Use SAM: Versatility and Power in Image Segmentation
|
||||
|
||||
The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs.
|
||||
|
||||
### SAM prediction example
|
||||
|
||||
!!! Example "Segment with prompts"
|
||||
|
||||
Segment image with given prompts.
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import SAM
|
||||
|
||||
# Load a model
|
||||
model = SAM('sam_b.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Run inference with bboxes prompt
|
||||
model('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
|
||||
|
||||
# Run inference with points prompt
|
||||
model('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
|
||||
```
|
||||
|
||||
!!! Example "Segment everything"
|
||||
|
||||
Segment the whole image.
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import SAM
|
||||
|
||||
# Load a model
|
||||
model = SAM('sam_b.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Run inference
|
||||
model('path/to/image.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Run inference with a SAM model
|
||||
yolo predict model=sam_b.pt source=path/to/image.jpg
|
||||
```
|
||||
|
||||
- The logic here is to segment the whole image if you don't pass any prompts(bboxes/points/masks).
|
||||
|
||||
!!! Example "SAMPredictor example"
|
||||
|
||||
This way you can set image once and run prompts inference multiple times without running image encoder multiple times.
|
||||
|
||||
=== "Prompt inference"
|
||||
|
||||
```python
|
||||
from ultralytics.models.sam import Predictor as SAMPredictor
|
||||
|
||||
# Create SAMPredictor
|
||||
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
|
||||
predictor = SAMPredictor(overrides=overrides)
|
||||
|
||||
# Set image
|
||||
predictor.set_image("ultralytics/assets/zidane.jpg") # set with image file
|
||||
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray
|
||||
results = predictor(bboxes=[439, 437, 524, 709])
|
||||
results = predictor(points=[900, 370], labels=[1])
|
||||
|
||||
# Reset image
|
||||
predictor.reset_image()
|
||||
```
|
||||
|
||||
Segment everything with additional args.
|
||||
|
||||
=== "Segment everything"
|
||||
|
||||
```python
|
||||
from ultralytics.models.sam import Predictor as SAMPredictor
|
||||
|
||||
# Create SAMPredictor
|
||||
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
|
||||
predictor = SAMPredictor(overrides=overrides)
|
||||
|
||||
# Segment with additional args
|
||||
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)
|
||||
```
|
||||
|
||||
- More additional args for `Segment everything` see [`Predictor/generate` Reference](../reference/models/sam/predict.md).
|
||||
|
||||
## SAM comparison vs YOLOv8
|
||||
|
||||
Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics smallest segmentation model, [YOLOv8n-seg](../tasks/segment.md):
|
||||
|
||||
| Model | Size | Parameters | Speed (CPU) |
|
||||
|------------------------------------------------|----------------------------|------------------------|----------------------------|
|
||||
| Meta's SAM-b | 358 MB | 94.7 M | 51096 ms/im |
|
||||
| [MobileSAM](mobile-sam.md) | 40.7 MB | 10.1 M | 46122 ms/im |
|
||||
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone | 23.7 MB | 11.8 M | 115 ms/im |
|
||||
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |
|
||||
|
||||
This comparison shows the order-of-magnitude differences in the model sizes and speeds between models. Whereas SAM presents unique capabilities for automatic segmenting, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.
|
||||
|
||||
Tests run on a 2023 Apple M2 Macbook with 16GB of RAM. To reproduce this test:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import FastSAM, SAM, YOLO
|
||||
|
||||
# Profile SAM-b
|
||||
model = SAM('sam_b.pt')
|
||||
model.info()
|
||||
model('ultralytics/assets')
|
||||
|
||||
# Profile MobileSAM
|
||||
model = SAM('mobile_sam.pt')
|
||||
model.info()
|
||||
model('ultralytics/assets')
|
||||
|
||||
# Profile FastSAM-s
|
||||
model = FastSAM('FastSAM-s.pt')
|
||||
model.info()
|
||||
model('ultralytics/assets')
|
||||
|
||||
# Profile YOLOv8n-seg
|
||||
model = YOLO('yolov8n-seg.pt')
|
||||
model.info()
|
||||
model('ultralytics/assets')
|
||||
```
|
||||
|
||||
## Auto-Annotation: A Quick Path to Segmentation Datasets
|
||||
|
||||
Auto-annotation is a key feature of SAM, allowing users to generate a [segmentation dataset](https://docs.ultralytics.com/datasets/segment) using a pre-trained detection model. This feature enables rapid and accurate annotation of a large number of images, bypassing the need for time-consuming manual labeling.
|
||||
|
||||
### Generate Your Segmentation Dataset Using a Detection Model
|
||||
|
||||
To auto-annotate your dataset with the Ultralytics framework, use the `auto_annotate` function as shown below:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics.data.annotator import auto_annotate
|
||||
|
||||
auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model='sam_b.pt')
|
||||
```
|
||||
|
||||
| Argument | Type | Description | Default |
|
||||
|------------|---------------------|---------------------------------------------------------------------------------------------------------|--------------|
|
||||
| data | str | Path to a folder containing images to be annotated. | |
|
||||
| det_model | str, optional | Pre-trained YOLO detection model. Defaults to 'yolov8x.pt'. | 'yolov8x.pt' |
|
||||
| sam_model | str, optional | Pre-trained SAM segmentation model. Defaults to 'sam_b.pt'. | 'sam_b.pt' |
|
||||
| device | str, optional | Device to run the models on. Defaults to an empty string (CPU or GPU, if available). | |
|
||||
| output_dir | str, None, optional | Directory to save the annotated results. Defaults to a 'labels' folder in the same directory as 'data'. | None |
|
||||
|
||||
The `auto_annotate` function takes the path to your images, with optional arguments for specifying the pre-trained detection and SAM segmentation models, the device to run the models on, and the output directory for saving the annotated results.
|
||||
|
||||
Auto-annotation with pre-trained models can dramatically cut down the time and effort required for creating high-quality segmentation datasets. This feature is especially beneficial for researchers and developers dealing with large image collections, as it allows them to focus on model development and evaluation rather than manual annotation.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you find SAM useful in your research or development work, please consider citing our paper:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{kirillov2023segment,
|
||||
title={Segment Anything},
|
||||
author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
|
||||
year={2023},
|
||||
eprint={2304.02643},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CV}
|
||||
}
|
||||
```
|
||||
|
||||
We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.
|
||||
|
||||
_keywords: Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI._
|
120
docs/en/models/yolo-nas.md
Normal file
120
docs/en/models/yolo-nas.md
Normal file
@ -0,0 +1,120 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore detailed documentation of YOLO-NAS, a superior object detection model. Learn about its features, pre-trained models, usage with Ultralytics Python API, and more.
|
||||
keywords: YOLO-NAS, Deci AI, object detection, deep learning, neural architecture search, Ultralytics Python API, YOLO model, pre-trained models, quantization, optimization, COCO, Objects365, Roboflow 100
|
||||
---
|
||||
|
||||
# YOLO-NAS
|
||||
|
||||
## Overview
|
||||
|
||||
Developed by Deci AI, YOLO-NAS is a groundbreaking object detection foundational model. It is the product of advanced Neural Architecture Search technology, meticulously designed to address the limitations of previous YOLO models. With significant improvements in quantization support and accuracy-latency trade-offs, YOLO-NAS represents a major leap in object detection.
|
||||
|
||||
 **Overview of YOLO-NAS.** YOLO-NAS employs quantization-aware blocks and selective quantization for optimal performance. The model, when converted to its INT8 quantized version, experiences a minimal precision drop, a significant improvement over other models. These advancements culminate in a superior architecture with unprecedented object detection capabilities and outstanding performance.
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Quantization-Friendly Basic Block:** YOLO-NAS introduces a new basic block that is friendly to quantization, addressing one of the significant limitations of previous YOLO models.
|
||||
- **Sophisticated Training and Quantization:** YOLO-NAS leverages advanced training schemes and post-training quantization to enhance performance.
|
||||
- **AutoNAC Optimization and Pre-training:** YOLO-NAS utilizes AutoNAC optimization and is pre-trained on prominent datasets such as COCO, Objects365, and Roboflow 100. This pre-training makes it extremely suitable for downstream object detection tasks in production environments.
|
||||
|
||||
## Pre-trained Models
|
||||
|
||||
Experience the power of next-generation object detection with the pre-trained YOLO-NAS models provided by Ultralytics. These models are designed to deliver top-notch performance in terms of both speed and accuracy. Choose from a variety of options tailored to your specific needs:
|
||||
|
||||
| Model | mAP | Latency (ms) |
|
||||
|------------------|-------|--------------|
|
||||
| YOLO-NAS S | 47.5 | 3.21 |
|
||||
| YOLO-NAS M | 51.55 | 5.85 |
|
||||
| YOLO-NAS L | 52.22 | 7.87 |
|
||||
| YOLO-NAS S INT-8 | 47.03 | 2.36 |
|
||||
| YOLO-NAS M INT-8 | 51.0 | 3.78 |
|
||||
| YOLO-NAS L INT-8 | 52.1 | 4.78 |
|
||||
|
||||
Each model variant is designed to offer a balance between Mean Average Precision (mAP) and latency, helping you optimize your object detection tasks for both performance and speed.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
Ultralytics has made YOLO-NAS models easy to integrate into your Python applications via our `ultralytics` python package. The package provides a user-friendly Python API to streamline the process.
|
||||
|
||||
The following examples show how to use YOLO-NAS models with the `ultralytics` package for inference and validation:
|
||||
|
||||
### Inference and Validation Examples
|
||||
|
||||
In this example we validate YOLO-NAS-s on the COCO8 dataset.
|
||||
|
||||
!!! Example
|
||||
|
||||
This example provides simple inference and validation code for YOLO-NAS. For handling inference results see [Predict](../modes/predict.md) mode. For using YOLO-NAS with additional modes see [Val](../modes/val.md) and [Export](../modes/export.md). YOLO-NAS on the `ultralytics` package does not support training.
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models files can be passed to the `NAS()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import NAS
|
||||
|
||||
# Load a COCO-pretrained YOLO-NAS-s model
|
||||
model = NAS('yolo_nas_s.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Validate the model on the COCO8 example dataset
|
||||
results = model.val(data='coco8.yaml')
|
||||
|
||||
# Run inference with the YOLO-NAS-s model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained YOLO-NAS-s model and validate it's performance on the COCO8 example dataset
|
||||
yolo val model=yolo_nas_s.pt data=coco8.yaml
|
||||
|
||||
# Load a COCO-pretrained YOLO-NAS-s model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolo_nas_s.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
We offer three variants of the YOLO-NAS models: Small (s), Medium (m), and Large (l). Each variant is designed to cater to different computational and performance needs:
|
||||
|
||||
- **YOLO-NAS-s**: Optimized for environments where computational resources are limited but efficiency is key.
|
||||
- **YOLO-NAS-m**: Offers a balanced approach, suitable for general-purpose object detection with higher accuracy.
|
||||
- **YOLO-NAS-l**: Tailored for scenarios requiring the highest accuracy, where computational resources are less of a constraint.
|
||||
|
||||
Below is a detailed overview of each model, including links to their pre-trained weights, the tasks they support, and their compatibility with different operating modes.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|-----------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLO-NAS-s | [yolo_nas_s.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolo_nas_s.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
| YOLO-NAS-m | [yolo_nas_m.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolo_nas_m.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
| YOLO-NAS-l | [yolo_nas_l.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolo_nas_l.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you employ YOLO-NAS in your research or development work, please cite SuperGradients:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{supergradients,
|
||||
doi = {10.5281/ZENODO.7789328},
|
||||
url = {https://zenodo.org/record/7789328},
|
||||
author = {Aharon, Shay and {Louis-Dupont} and {Ofri Masad} and Yurkova, Kate and {Lotem Fridman} and {Lkdci} and Khvedchenya, Eugene and Rubin, Ran and Bagrov, Natan and Tymchenko, Borys and Keren, Tomer and Zhilko, Alexander and {Eran-Deci}},
|
||||
title = {Super-Gradients},
|
||||
publisher = {GitHub},
|
||||
journal = {GitHub repository},
|
||||
year = {2021},
|
||||
}
|
||||
```
|
||||
|
||||
We express our gratitude to Deci AI's [SuperGradients](https://github.com/Deci-AI/super-gradients/) team for their efforts in creating and maintaining this valuable resource for the computer vision community. We believe YOLO-NAS, with its innovative architecture and superior object detection capabilities, will become a critical tool for developers and researchers alike.
|
||||
|
||||
_Keywords: YOLO-NAS, Deci AI, object detection, deep learning, neural architecture search, Ultralytics Python API, YOLO model, SuperGradients, pre-trained models, quantization-friendly basic block, advanced training schemes, post-training quantization, AutoNAC optimization, COCO, Objects365, Roboflow 100_
|
216
docs/en/models/yolo-world.md
Normal file
216
docs/en/models/yolo-world.md
Normal file
@ -0,0 +1,216 @@
|
||||
---
|
||||
comments: true
|
||||
description: Discover YOLO-World, a YOLOv8-based framework for real-time open-vocabulary object detection in images. It enhances user interaction, boosts computational efficiency, and adapts across various vision tasks.
|
||||
keywords: YOLO-World, YOLOv8, machine learning, CNN-based framework, object detection, real-time detection, Ultralytics, vision tasks, image processing, industrial applications, user interaction
|
||||
---
|
||||
|
||||
# YOLO-World Model
|
||||
|
||||
The YOLO-World Model introduces an advanced, real-time [Ultralytics](https://ultralytics.com) [YOLOv8](yolov8.md)-based approach for Open-Vocabulary Detection tasks. This innovation enables the detection of any object within an image based on descriptive texts. By significantly lowering computational demands while preserving competitive performance, YOLO-World emerges as a versatile tool for numerous vision-based applications.
|
||||
|
||||

|
||||
|
||||
## Overview
|
||||
|
||||
YOLO-World tackles the challenges faced by traditional Open-Vocabulary detection models, which often rely on cumbersome Transformer models requiring extensive computational resources. These models' dependence on pre-defined object categories also restricts their utility in dynamic scenarios. YOLO-World revitalizes the YOLOv8 framework with open-vocabulary detection capabilities, employing vision-language modeling and pre-training on expansive datasets to excel at identifying a broad array of objects in zero-shot scenarios with unmatched efficiency.
|
||||
|
||||
## Key Features
|
||||
|
||||
1. **Real-time Solution:** Harnessing the computational speed of CNNs, YOLO-World delivers a swift open-vocabulary detection solution, catering to industries in need of immediate results.
|
||||
|
||||
2. **Efficiency and Performance:** YOLO-World slashes computational and resource requirements without sacrificing performance, offering a robust alternative to models like SAM but at a fraction of the computational cost, enabling real-time applications.
|
||||
|
||||
3. **Inference with Offline Vocabulary:** YOLO-World introduces a "prompt-then-detect" strategy, employing an offline vocabulary to enhance efficiency further. This approach enables the use of custom prompts computed apriori, including captions or categories, to be encoded and stored as offline vocabulary embeddings, streamlining the detection process.
|
||||
|
||||
4. **Powered by YOLOv8:** Built upon [Ultralytics YOLOv8](yolov8.md), YOLO-World leverages the latest advancements in real-time object detection to facilitate open-vocabulary detection with unparalleled accuracy and speed.
|
||||
|
||||
5. **Benchmark Excellence:** YOLO-World outperforms existing open-vocabulary detectors, including MDETR and GLIP series, in terms of speed and efficiency on standard benchmarks, showcasing YOLOv8's superior capability on a single NVIDIA V100 GPU.
|
||||
|
||||
6. **Versatile Applications:** YOLO-World's innovative approach unlocks new possibilities for a multitude of vision tasks, delivering speed improvements by orders of magnitude over existing methods.
|
||||
|
||||
## Available Models, Supported Tasks, and Operating Modes
|
||||
|
||||
This section details the models available with their specific pre-trained weights, the tasks they support, and their compatibility with various operating modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), denoted by ✅ for supported modes and ❌ for unsupported modes.
|
||||
|
||||
!!! Note
|
||||
|
||||
All the YOLOv8-World weights have been directly migrated from the official [YOLO-World](https://github.com/AILab-CVC/YOLO-World) repository, highlighting their excellent contributions.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|-----------------|-------------------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv8s-world | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
|
||||
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
| YOLOv8m-world | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
|
||||
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
| YOLOv8l-world | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
|
||||
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
| YOLOv8x-world | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-world.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ❌ |
|
||||
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ❌ | ✅ |
|
||||
|
||||
## Zero-shot Transfer on COCO Dataset
|
||||
|
||||
| Model Type | mAP | mAP50 | mAP75 |
|
||||
|-----------------|------|-------|-------|
|
||||
| yolov8s-world | 37.4 | 52.0 | 40.6 |
|
||||
| yolov8s-worldv2 | 37.7 | 52.2 | 41.0 |
|
||||
| yolov8m-world | 42.0 | 57.0 | 45.6 |
|
||||
| yolov8m-worldv2 | 43.0 | 58.4 | 46.8 |
|
||||
| yolov8l-world | 45.7 | 61.3 | 49.8 |
|
||||
| yolov8l-worldv2 | 45.8 | 61.3 | 49.8 |
|
||||
| yolov8x-world | 47.0 | 63.0 | 51.2 |
|
||||
| yolov8x-worldv2 | 47.1 | 62.8 | 51.4 |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
The YOLO-World models are easy to integrate into your Python applications. Ultralytics provides user-friendly Python API and CLI commands to streamline development.
|
||||
|
||||
### Predict Usage
|
||||
|
||||
Object detection is straightforward with the `predict` method, as illustrated below:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import YOLOWorld
|
||||
|
||||
# Initialize a YOLO-World model
|
||||
model = YOLOWorld('yolov8s-world.pt') # or select yolov8m/l-world.pt for different sizes
|
||||
|
||||
# Execute inference with the YOLOv8s-world model on the specified image
|
||||
results = model.predict('path/to/image.jpg')
|
||||
|
||||
# Show results
|
||||
results[0].show()
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Perform object detection using a YOLO-World model
|
||||
yolo predict model=yolov8s-world.pt source=path/to/image.jpg imgsz=640
|
||||
```
|
||||
|
||||
This snippet demonstrates the simplicity of loading a pre-trained model and running a prediction on an image.
|
||||
|
||||
### Val Usage
|
||||
|
||||
Model validation on a dataset is streamlined as follows:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Create a YOLO-World model
|
||||
model = YOLO('yolov8s-world.pt') # or select yolov8m/l-world.pt for different sizes
|
||||
|
||||
# Conduct model validation on the COCO8 example dataset
|
||||
metrics = model.val(data='coco8.yaml')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
```bash
|
||||
# Validate a YOLO-World model on the COCO8 dataset with a specified image size
|
||||
yolo val model=yolov8s-world.pt data=coco8.yaml imgsz=640
|
||||
```
|
||||
|
||||
!!! Note
|
||||
|
||||
The YOLO-World models provided by Ultralytics come pre-configured with [COCO dataset](../datasets/detect/coco.md) categories as part of their offline vocabulary, enhancing efficiency for immediate application. This integration allows the YOLOv8-World models to directly recognize and predict the 80 standard categories defined in the COCO dataset without requiring additional setup or customization.
|
||||
|
||||
### Set prompts
|
||||
|
||||

|
||||
|
||||
The YOLO-World framework allows for the dynamic specification of classes through custom prompts, empowering users to tailor the model to their specific needs **without retraining**. This feature is particularly useful for adapting the model to new domains or specific tasks that were not originally part of the training data. By setting custom prompts, users can essentially guide the model's focus towards objects of interest, enhancing the relevance and accuracy of the detection results.
|
||||
|
||||
For instance, if your application only requires detecting 'person' and 'bus' objects, you can specify these classes directly:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Custom Inference Prompts"
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Initialize a YOLO-World model
|
||||
model = YOLO('yolov8s-world.pt') # or choose yolov8m/l-world.pt
|
||||
|
||||
# Define custom classes
|
||||
model.set_classes(["person", "bus"])
|
||||
|
||||
# Execute prediction for specified categories on an image
|
||||
results = model.predict('path/to/image.jpg')
|
||||
|
||||
# Show results
|
||||
results[0].show()
|
||||
```
|
||||
|
||||
You can also save a model after setting custom classes. By doing this you create a version of the YOLO-World model that is specialized for your specific use case. This process embeds your custom class definitions directly into the model file, making the model ready to use with your specified classes without further adjustments. Follow these steps to save and load your custom YOLOv8 model:
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Persisting Models with Custom Vocabulary"
|
||||
|
||||
First load a YOLO-World model, set custom classes for it and save it:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Initialize a YOLO-World model
|
||||
model = YOLO('yolov8s-world.pt') # or select yolov8m/l-world.pt
|
||||
|
||||
# Define custom classes
|
||||
model.set_classes(["person", "bus"])
|
||||
|
||||
# Save the model with the defined offline vocabulary
|
||||
model.save("custom_yolov8s.pt")
|
||||
```
|
||||
|
||||
After saving, the custom_yolov8s.pt model behaves like any other pre-trained YOLOv8 model but with a key difference: it is now optimized to detect only the classes you have defined. This customization can significantly improve detection performance and efficiency for your specific application scenarios.
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Load your custom model
|
||||
model = YOLO('custom_yolov8s.pt')
|
||||
|
||||
# Run inference to detect your custom classes
|
||||
results = model.predict('path/to/image.jpg')
|
||||
|
||||
# Show results
|
||||
results[0].show()
|
||||
```
|
||||
|
||||
### Benefits of Saving with Custom Vocabulary
|
||||
|
||||
- **Efficiency**: Streamlines the detection process by focusing on relevant objects, reducing computational overhead and speeding up inference.
|
||||
- **Flexibility**: Allows for easy adaptation of the model to new or niche detection tasks without the need for extensive retraining or data collection.
|
||||
- **Simplicity**: Simplifies deployment by eliminating the need to repeatedly specify custom classes at runtime, making the model directly usable with its embedded vocabulary.
|
||||
- **Performance**: Enhances detection accuracy for specified classes by focusing the model's attention and resources on recognizing the defined objects.
|
||||
|
||||
This approach provides a powerful means of customizing state-of-the-art object detection models for specific tasks, making advanced AI more accessible and applicable to a broader range of practical applications.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We extend our gratitude to the [Tencent AILab Computer Vision Center](https://ai.tencent.com/) for their pioneering work in real-time open-vocabulary object detection with YOLO-World:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@article{cheng2024yolow,
|
||||
title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
|
||||
author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
|
||||
journal={arXiv preprint arXiv:2401.17270},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
For further reading, the original YOLO-World paper is available on [arXiv](https://arxiv.org/pdf/2401.17270v2.pdf). The project's source code and additional resources can be accessed via their [GitHub repository](https://github.com/AILab-CVC/YOLO-World). We appreciate their commitment to advancing the field and sharing their valuable insights with the community.
|
98
docs/en/models/yolov3.md
Normal file
98
docs/en/models/yolov3.md
Normal file
@ -0,0 +1,98 @@
|
||||
---
|
||||
comments: true
|
||||
description: Get an overview of YOLOv3, YOLOv3-Ultralytics and YOLOv3u. Learn about their key features, usage, and supported tasks for object detection.
|
||||
keywords: YOLOv3, YOLOv3-Ultralytics, YOLOv3u, Object Detection, Inference, Training, Ultralytics
|
||||
---
|
||||
|
||||
# YOLOv3, YOLOv3-Ultralytics, and YOLOv3u
|
||||
|
||||
## Overview
|
||||
|
||||
This document presents an overview of three closely related object detection models, namely [YOLOv3](https://pjreddie.com/darknet/yolo/), [YOLOv3-Ultralytics](https://github.com/ultralytics/yolov3), and [YOLOv3u](https://github.com/ultralytics/ultralytics).
|
||||
|
||||
1. **YOLOv3:** This is the third version of the You Only Look Once (YOLO) object detection algorithm. Originally developed by Joseph Redmon, YOLOv3 improved on its predecessors by introducing features such as multiscale predictions and three different sizes of detection kernels.
|
||||
|
||||
2. **YOLOv3-Ultralytics:** This is Ultralytics' implementation of the YOLOv3 model. It reproduces the original YOLOv3 architecture and offers additional functionalities, such as support for more pre-trained models and easier customization options.
|
||||
|
||||
3. **YOLOv3u:** This is an updated version of YOLOv3-Ultralytics that incorporates the anchor-free, objectness-free split head used in YOLOv8 models. YOLOv3u maintains the same backbone and neck architecture as YOLOv3 but with the updated detection head from YOLOv8.
|
||||
|
||||

|
||||
|
||||
## Key Features
|
||||
|
||||
- **YOLOv3:** Introduced the use of three different scales for detection, leveraging three different sizes of detection kernels: 13x13, 26x26, and 52x52. This significantly improved detection accuracy for objects of different sizes. Additionally, YOLOv3 added features such as multi-label predictions for each bounding box and a better feature extractor network.
|
||||
|
||||
- **YOLOv3-Ultralytics:** Ultralytics' implementation of YOLOv3 provides the same performance as the original model but comes with added support for more pre-trained models, additional training methods, and easier customization options. This makes it more versatile and user-friendly for practical applications.
|
||||
|
||||
- **YOLOv3u:** This updated model incorporates the anchor-free, objectness-free split head from YOLOv8. By eliminating the need for pre-defined anchor boxes and objectness scores, this detection head design can improve the model's ability to detect objects of varying sizes and shapes. This makes YOLOv3u more robust and accurate for object detection tasks.
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
The YOLOv3 series, including YOLOv3, YOLOv3-Ultralytics, and YOLOv3u, are designed specifically for object detection tasks. These models are renowned for their effectiveness in various real-world scenarios, balancing accuracy and speed. Each variant offers unique features and optimizations, making them suitable for a range of applications.
|
||||
|
||||
All three models support a comprehensive set of modes, ensuring versatility in various stages of model deployment and development. These modes include [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), providing users with a complete toolkit for effective object detection.
|
||||
|
||||
| Model Type | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|--------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv3 | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv3-Ultralytics | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv3u | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
This table provides an at-a-glance view of the capabilities of each YOLOv3 variant, highlighting their versatility and suitability for various tasks and operational modes in object detection workflows.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple YOLOv3 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Load a COCO-pretrained YOLOv3n model
|
||||
model = YOLO('yolov3n.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv3n model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained YOLOv3n model and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov3n.pt data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Load a COCO-pretrained YOLOv3n model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov3n.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you use YOLOv3 in your research, please cite the original YOLO papers and the Ultralytics YOLOv3 repository:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@article{redmon2018yolov3,
|
||||
title={YOLOv3: An Incremental Improvement},
|
||||
author={Redmon, Joseph and Farhadi, Ali},
|
||||
journal={arXiv preprint arXiv:1804.02767},
|
||||
year={2018}
|
||||
}
|
||||
```
|
||||
|
||||
Thank you to Joseph Redmon and Ali Farhadi for developing the original YOLOv3.
|
70
docs/en/models/yolov4.md
Normal file
70
docs/en/models/yolov4.md
Normal file
@ -0,0 +1,70 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore our detailed guide on YOLOv4, a state-of-the-art real-time object detector. Understand its architectural highlights, innovative features, and application examples.
|
||||
keywords: ultralytics, YOLOv4, object detection, neural network, real-time detection, object detector, machine learning
|
||||
---
|
||||
|
||||
# YOLOv4: High-Speed and Precise Object Detection
|
||||
|
||||
Welcome to the Ultralytics documentation page for YOLOv4, a state-of-the-art, real-time object detector launched in 2020 by Alexey Bochkovskiy at [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet). YOLOv4 is designed to provide the optimal balance between speed and accuracy, making it an excellent choice for many applications.
|
||||
|
||||
 **YOLOv4 architecture diagram**. Showcasing the intricate network design of YOLOv4, including the backbone, neck, and head components, and their interconnected layers for optimal real-time object detection.
|
||||
|
||||
## Introduction
|
||||
|
||||
YOLOv4 stands for You Only Look Once version 4. It is a real-time object detection model developed to address the limitations of previous YOLO versions like [YOLOv3](yolov3.md) and other object detection models. Unlike other convolutional neural network (CNN) based object detectors, YOLOv4 is not only applicable for recommendation systems but also for standalone process management and human input reduction. Its operation on conventional graphics processing units (GPUs) allows for mass usage at an affordable price, and it is designed to work in real-time on a conventional GPU while requiring only one such GPU for training.
|
||||
|
||||
## Architecture
|
||||
|
||||
YOLOv4 makes use of several innovative features that work together to optimize its performance. These include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT), Mish-activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss. These features are combined to achieve state-of-the-art results.
|
||||
|
||||
A typical object detector is composed of several parts including the input, the backbone, the neck, and the head. The backbone of YOLOv4 is pre-trained on ImageNet and is used to predict classes and bounding boxes of objects. The backbone could be from several models including VGG, ResNet, ResNeXt, or DenseNet. The neck part of the detector is used to collect feature maps from different stages and usually includes several bottom-up paths and several top-down paths. The head part is what is used to make the final object detections and classifications.
|
||||
|
||||
## Bag of Freebies
|
||||
|
||||
YOLOv4 also makes use of methods known as "bag of freebies," which are techniques that improve the accuracy of the model during training without increasing the cost of inference. Data augmentation is a common bag of freebies technique used in object detection, which increases the variability of the input images to improve the robustness of the model. Some examples of data augmentation include photometric distortions (adjusting the brightness, contrast, hue, saturation, and noise of an image) and geometric distortions (adding random scaling, cropping, flipping, and rotating). These techniques help the model to generalize better to different types of images.
|
||||
|
||||
## Features and Performance
|
||||
|
||||
YOLOv4 is designed for optimal speed and accuracy in object detection. The architecture of YOLOv4 includes CSPDarknet53 as the backbone, PANet as the neck, and YOLOv3 as the detection head. This design allows YOLOv4 to perform object detection at an impressive speed, making it suitable for real-time applications. YOLOv4 also excels in accuracy, achieving state-of-the-art results in object detection benchmarks.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
As of the time of writing, Ultralytics does not currently support YOLOv4 models. Therefore, any users interested in using YOLOv4 will need to refer directly to the YOLOv4 GitHub repository for installation and usage instructions.
|
||||
|
||||
Here is a brief overview of the typical steps you might take to use YOLOv4:
|
||||
|
||||
1. Visit the YOLOv4 GitHub repository: [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet).
|
||||
|
||||
2. Follow the instructions provided in the README file for installation. This typically involves cloning the repository, installing necessary dependencies, and setting up any necessary environment variables.
|
||||
|
||||
3. Once installation is complete, you can train and use the model as per the usage instructions provided in the repository. This usually involves preparing your dataset, configuring the model parameters, training the model, and then using the trained model to perform object detection.
|
||||
|
||||
Please note that the specific steps may vary depending on your specific use case and the current state of the YOLOv4 repository. Therefore, it is strongly recommended to refer directly to the instructions provided in the YOLOv4 GitHub repository.
|
||||
|
||||
We regret any inconvenience this may cause and will strive to update this document with usage examples for Ultralytics once support for YOLOv4 is implemented.
|
||||
|
||||
## Conclusion
|
||||
|
||||
YOLOv4 is a powerful and efficient object detection model that strikes a balance between speed and accuracy. Its use of unique features and bag of freebies techniques during training allows it to perform excellently in real-time object detection tasks. YOLOv4 can be trained and used by anyone with a conventional GPU, making it accessible and practical for a wide range of applications.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We would like to acknowledge the YOLOv4 authors for their significant contributions in the field of real-time object detection:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{bochkovskiy2020yolov4,
|
||||
title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
|
||||
author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
|
||||
year={2020},
|
||||
eprint={2004.10934},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CV}
|
||||
}
|
||||
```
|
||||
|
||||
The original YOLOv4 paper can be found on [arXiv](https://arxiv.org/abs/2004.10934). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/AlexeyAB/darknet). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
|
114
docs/en/models/yolov5.md
Normal file
114
docs/en/models/yolov5.md
Normal file
@ -0,0 +1,114 @@
|
||||
---
|
||||
comments: true
|
||||
description: Discover YOLOv5u, a boosted version of the YOLOv5 model featuring an improved accuracy-speed tradeoff and numerous pre-trained models for various object detection tasks.
|
||||
keywords: YOLOv5u, object detection, pre-trained models, Ultralytics, Inference, Validation, YOLOv5, YOLOv8, anchor-free, objectness-free, real-time applications, machine learning
|
||||
---
|
||||
|
||||
# YOLOv5
|
||||
|
||||
## Overview
|
||||
|
||||
YOLOv5u represents an advancement in object detection methodologies. Originating from the foundational architecture of the [YOLOv5](https://github.com/ultralytics/yolov5) model developed by Ultralytics, YOLOv5u integrates the anchor-free, objectness-free split head, a feature previously introduced in the [YOLOv8](yolov8.md) models. This adaptation refines the model's architecture, leading to an improved accuracy-speed tradeoff in object detection tasks. Given the empirical results and its derived features, YOLOv5u provides an efficient alternative for those seeking robust solutions in both research and practical applications.
|
||||
|
||||

|
||||
|
||||
## Key Features
|
||||
|
||||
- **Anchor-free Split Ultralytics Head:** Traditional object detection models rely on predefined anchor boxes to predict object locations. However, YOLOv5u modernizes this approach. By adopting an anchor-free split Ultralytics head, it ensures a more flexible and adaptive detection mechanism, consequently enhancing the performance in diverse scenarios.
|
||||
|
||||
- **Optimized Accuracy-Speed Tradeoff:** Speed and accuracy often pull in opposite directions. But YOLOv5u challenges this tradeoff. It offers a calibrated balance, ensuring real-time detections without compromising on accuracy. This feature is particularly invaluable for applications that demand swift responses, such as autonomous vehicles, robotics, and real-time video analytics.
|
||||
|
||||
- **Variety of Pre-trained Models:** Understanding that different tasks require different toolsets, YOLOv5u provides a plethora of pre-trained models. Whether you're focusing on Inference, Validation, or Training, there's a tailor-made model awaiting you. This variety ensures you're not just using a one-size-fits-all solution, but a model specifically fine-tuned for your unique challenge.
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
The YOLOv5u models, with various pre-trained weights, excel in [Object Detection](../tasks/detect.md) tasks. They support a comprehensive range of modes, making them suitable for diverse applications, from development to deployment.
|
||||
|
||||
| Model Type | Pre-trained Weights | Task | Inference | Validation | Training | Export |
|
||||
|------------|-----------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv5u | `yolov5nu`, `yolov5su`, `yolov5mu`, `yolov5lu`, `yolov5xu`, `yolov5n6u`, `yolov5s6u`, `yolov5m6u`, `yolov5l6u`, `yolov5x6u` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
This table provides a detailed overview of the YOLOv5u model variants, highlighting their applicability in object detection tasks and support for various operational modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). This comprehensive support ensures that users can fully leverage the capabilities of YOLOv5u models in a wide range of object detection scenarios.
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
!!! Performance
|
||||
|
||||
=== "Detection"
|
||||
|
||||
See [Detection Docs](https://docs.ultralytics.com/tasks/detect/) for usage examples with these models trained on [COCO](https://docs.ultralytics.com/datasets/detect/coco/), which include 80 pre-trained classes.
|
||||
|
||||
| Model | YAML | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
|---------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|-----------------------|----------------------|--------------------------------|-------------------------------------|--------------------|-------------------|
|
||||
| [yolov5nu.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5nu.pt) | [yolov5n.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5.yaml) | 640 | 34.3 | 73.6 | 1.06 | 2.6 | 7.7 |
|
||||
| [yolov5su.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5su.pt) | [yolov5s.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5.yaml) | 640 | 43.0 | 120.7 | 1.27 | 9.1 | 24.0 |
|
||||
| [yolov5mu.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5mu.pt) | [yolov5m.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5.yaml) | 640 | 49.0 | 233.9 | 1.86 | 25.1 | 64.2 |
|
||||
| [yolov5lu.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5lu.pt) | [yolov5l.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5.yaml) | 640 | 52.2 | 408.4 | 2.50 | 53.2 | 135.0 |
|
||||
| [yolov5xu.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5xu.pt) | [yolov5x.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5.yaml) | 640 | 53.2 | 763.2 | 3.81 | 97.2 | 246.4 |
|
||||
| | | | | | | | |
|
||||
| [yolov5n6u.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5n6u.pt) | [yolov5n6.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5-p6.yaml) | 1280 | 42.1 | 211.0 | 1.83 | 4.3 | 7.8 |
|
||||
| [yolov5s6u.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5s6u.pt) | [yolov5s6.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5-p6.yaml) | 1280 | 48.6 | 422.6 | 2.34 | 15.3 | 24.6 |
|
||||
| [yolov5m6u.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5m6u.pt) | [yolov5m6.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5-p6.yaml) | 1280 | 53.6 | 810.9 | 4.36 | 41.2 | 65.7 |
|
||||
| [yolov5l6u.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5l6u.pt) | [yolov5l6.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5-p6.yaml) | 1280 | 55.7 | 1470.9 | 5.47 | 86.1 | 137.4 |
|
||||
| [yolov5x6u.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov5x6u.pt) | [yolov5x6.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/v5/yolov5-p6.yaml) | 1280 | 56.8 | 2436.5 | 8.98 | 155.4 | 250.7 |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple YOLOv5 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Load a COCO-pretrained YOLOv5n model
|
||||
model = YOLO('yolov5n.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv5n model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained YOLOv5n model and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov5n.pt data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Load a COCO-pretrained YOLOv5n model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov5n.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you use YOLOv5 or YOLOv5u in your research, please cite the Ultralytics YOLOv5 repository as follows:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@software{yolov5,
|
||||
title = {Ultralytics YOLOv5},
|
||||
author = {Glenn Jocher},
|
||||
year = {2020},
|
||||
version = {7.0},
|
||||
license = {AGPL-3.0},
|
||||
url = {https://github.com/ultralytics/yolov5},
|
||||
doi = {10.5281/zenodo.3908559},
|
||||
orcid = {0000-0001-5950-6979}
|
||||
}
|
||||
```
|
||||
|
||||
Please note that YOLOv5 models are provided under [AGPL-3.0](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) and [Enterprise](https://ultralytics.com/license) licenses.
|
106
docs/en/models/yolov6.md
Normal file
106
docs/en/models/yolov6.md
Normal file
@ -0,0 +1,106 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore Meituan YOLOv6, a state-of-the-art object detection model striking a balance between speed and accuracy. Dive into features, pre-trained models, and Python usage.
|
||||
keywords: Meituan YOLOv6, object detection, Ultralytics, YOLOv6 docs, Bi-directional Concatenation, Anchor-Aided Training, pretrained models, real-time applications
|
||||
---
|
||||
|
||||
# Meituan YOLOv6
|
||||
|
||||
## Overview
|
||||
|
||||
[Meituan](https://about.meituan.com/) YOLOv6 is a cutting-edge object detector that offers remarkable balance between speed and accuracy, making it a popular choice for real-time applications. This model introduces several notable enhancements on its architecture and training scheme, including the implementation of a Bi-directional Concatenation (BiC) module, an anchor-aided training (AAT) strategy, and an improved backbone and neck design for state-of-the-art accuracy on the COCO dataset.
|
||||
|
||||

|
||||
 **Overview of YOLOv6.** Model architecture diagram showing the redesigned network components and training strategies that have led to significant performance improvements. (a) The neck of YOLOv6 (N and S are shown). Note for M/L, RepBlocks is replaced with CSPStackRep. (b) The structure of a BiC module. (c) A SimCSPSPPF block. ([source](https://arxiv.org/pdf/2301.05586.pdf)).
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Bidirectional Concatenation (BiC) Module:** YOLOv6 introduces a BiC module in the neck of the detector, enhancing localization signals and delivering performance gains with negligible speed degradation.
|
||||
- **Anchor-Aided Training (AAT) Strategy:** This model proposes AAT to enjoy the benefits of both anchor-based and anchor-free paradigms without compromising inference efficiency.
|
||||
- **Enhanced Backbone and Neck Design:** By deepening YOLOv6 to include another stage in the backbone and neck, this model achieves state-of-the-art performance on the COCO dataset at high-resolution input.
|
||||
- **Self-Distillation Strategy:** A new self-distillation strategy is implemented to boost the performance of smaller models of YOLOv6, enhancing the auxiliary regression branch during training and removing it at inference to avoid a marked speed decline.
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
YOLOv6 provides various pre-trained models with different scales:
|
||||
|
||||
- YOLOv6-N: 37.5% AP on COCO val2017 at 1187 FPS with NVIDIA Tesla T4 GPU.
|
||||
- YOLOv6-S: 45.0% AP at 484 FPS.
|
||||
- YOLOv6-M: 50.0% AP at 226 FPS.
|
||||
- YOLOv6-L: 52.8% AP at 116 FPS.
|
||||
- YOLOv6-L6: State-of-the-art accuracy in real-time.
|
||||
|
||||
YOLOv6 also provides quantized models for different precisions and models optimized for mobile platforms.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple YOLOv6 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Build a YOLOv6n model from scratch
|
||||
model = YOLO('yolov6n.yaml')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv6n model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Build a YOLOv6n model from scratch and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov6n.yaml data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Build a YOLOv6n model from scratch and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov6n.yaml source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
The YOLOv6 series offers a range of models, each optimized for high-performance [Object Detection](../tasks/detect.md). These models cater to varying computational needs and accuracy requirements, making them versatile for a wide array of applications.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|---------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv6-N | `yolov6-n.pt` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv6-S | `yolov6-s.pt` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv6-M | `yolov6-m.pt` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv6-L | `yolov6-l.pt` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv6-L6 | `yolov6-l6.pt` | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
This table provides a detailed overview of the YOLOv6 model variants, highlighting their capabilities in object detection tasks and their compatibility with various operational modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). This comprehensive support ensures that users can fully leverage the capabilities of YOLOv6 models in a broad range of object detection scenarios.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We would like to acknowledge the authors for their significant contributions in the field of real-time object detection:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@misc{li2023yolov6,
|
||||
title={YOLOv6 v3.0: A Full-Scale Reloading},
|
||||
author={Chuyi Li and Lulu Li and Yifei Geng and Hongliang Jiang and Meng Cheng and Bo Zhang and Zaidan Ke and Xiaoming Xu and Xiangxiang Chu},
|
||||
year={2023},
|
||||
eprint={2301.05586},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CV}
|
||||
}
|
||||
```
|
||||
|
||||
The original YOLOv6 paper can be found on [arXiv](https://arxiv.org/abs/2301.05586). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/meituan/YOLOv6). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
|
65
docs/en/models/yolov7.md
Normal file
65
docs/en/models/yolov7.md
Normal file
@ -0,0 +1,65 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore the YOLOv7, a real-time object detector. Understand its superior speed, impressive accuracy, and unique trainable bag-of-freebies optimization focus.
|
||||
keywords: YOLOv7, real-time object detector, state-of-the-art, Ultralytics, MS COCO dataset, model re-parameterization, dynamic label assignment, extended scaling, compound scaling
|
||||
---
|
||||
|
||||
# YOLOv7: Trainable Bag-of-Freebies
|
||||
|
||||
YOLOv7 is a state-of-the-art real-time object detector that surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS. It has the highest accuracy (56.8% AP) among all known real-time object detectors with 30 FPS or higher on GPU V100. Moreover, YOLOv7 outperforms other object detectors such as YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, and many others in speed and accuracy. The model is trained on the MS COCO dataset from scratch without using any other datasets or pre-trained weights. Source code for YOLOv7 is available on GitHub.
|
||||
|
||||

|
||||
**Comparison of state-of-the-art object detectors.** From the results in Table 2 we know that the proposed method has the best speed-accuracy trade-off comprehensively. If we compare YOLOv7-tiny-SiLU with YOLOv5-N (r6.1), our method is 127 fps faster and 10.7% more accurate on AP. In addition, YOLOv7 has 51.4% AP at frame rate of 161 fps, while PPYOLOE-L with the same AP has only 78 fps frame rate. In terms of parameter usage, YOLOv7 is 41% less than PPYOLOE-L. If we compare YOLOv7-X with 114 fps inference speed to YOLOv5-L (r6.1) with 99 fps inference speed, YOLOv7-X can improve AP by 3.9%. If YOLOv7-X is compared with YOLOv5-X (r6.1) of similar scale, the inference speed of YOLOv7-X is 31 fps faster. In addition, in terms the amount of parameters and computation, YOLOv7-X reduces 22% of parameters and 8% of computation compared to YOLOv5-X (r6.1), but improves AP by 2.2% ([Source](https://arxiv.org/pdf/2207.02696.pdf)).
|
||||
|
||||
## Overview
|
||||
|
||||
Real-time object detection is an important component in many computer vision systems, including multi-object tracking, autonomous driving, robotics, and medical image analysis. In recent years, real-time object detection development has focused on designing efficient architectures and improving the inference speed of various CPUs, GPUs, and neural processing units (NPUs). YOLOv7 supports both mobile GPU and GPU devices, from the edge to the cloud.
|
||||
|
||||
Unlike traditional real-time object detectors that focus on architecture optimization, YOLOv7 introduces a focus on the optimization of the training process. This includes modules and optimization methods designed to improve the accuracy of object detection without increasing the inference cost, a concept known as the "trainable bag-of-freebies".
|
||||
|
||||
## Key Features
|
||||
|
||||
YOLOv7 introduces several key features:
|
||||
|
||||
1. **Model Re-parameterization**: YOLOv7 proposes a planned re-parameterized model, which is a strategy applicable to layers in different networks with the concept of gradient propagation path.
|
||||
|
||||
2. **Dynamic Label Assignment**: The training of the model with multiple output layers presents a new issue: "How to assign dynamic targets for the outputs of different branches?" To solve this problem, YOLOv7 introduces a new label assignment method called coarse-to-fine lead guided label assignment.
|
||||
|
||||
3. **Extended and Compound Scaling**: YOLOv7 proposes "extend" and "compound scaling" methods for the real-time object detector that can effectively utilize parameters and computation.
|
||||
|
||||
4. **Efficiency**: The method proposed by YOLOv7 can effectively reduce about 40% parameters and 50% computation of state-of-the-art real-time object detector, and has faster inference speed and higher detection accuracy.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
As of the time of writing, Ultralytics does not currently support YOLOv7 models. Therefore, any users interested in using YOLOv7 will need to refer directly to the YOLOv7 GitHub repository for installation and usage instructions.
|
||||
|
||||
Here is a brief overview of the typical steps you might take to use YOLOv7:
|
||||
|
||||
1. Visit the YOLOv7 GitHub repository: [https://github.com/WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7).
|
||||
|
||||
2. Follow the instructions provided in the README file for installation. This typically involves cloning the repository, installing necessary dependencies, and setting up any necessary environment variables.
|
||||
|
||||
3. Once installation is complete, you can train and use the model as per the usage instructions provided in the repository. This usually involves preparing your dataset, configuring the model parameters, training the model, and then using the trained model to perform object detection.
|
||||
|
||||
Please note that the specific steps may vary depending on your specific use case and the current state of the YOLOv7 repository. Therefore, it is strongly recommended to refer directly to the instructions provided in the YOLOv7 GitHub repository.
|
||||
|
||||
We regret any inconvenience this may cause and will strive to update this document with usage examples for Ultralytics once support for YOLOv7 is implemented.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We would like to acknowledge the YOLOv7 authors for their significant contributions in the field of real-time object detection:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@article{wang2022yolov7,
|
||||
title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
|
||||
author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
|
||||
journal={arXiv preprint arXiv:2207.02696},
|
||||
year={2022}
|
||||
}
|
||||
```
|
||||
|
||||
The original YOLOv7 paper can be found on [arXiv](https://arxiv.org/pdf/2207.02696.pdf). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/WongKinYiu/yolov7). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
|
186
docs/en/models/yolov8.md
Normal file
186
docs/en/models/yolov8.md
Normal file
@ -0,0 +1,186 @@
|
||||
---
|
||||
comments: true
|
||||
description: Explore the thrilling features of YOLOv8, the latest version of our real-time object detector! Learn how advanced architectures, pre-trained models and optimal balance between accuracy & speed make YOLOv8 the perfect choice for your object detection tasks.
|
||||
keywords: YOLOv8, Ultralytics, real-time object detector, pre-trained models, documentation, object detection, YOLO series, advanced architectures, accuracy, speed
|
||||
---
|
||||
|
||||
# YOLOv8
|
||||
|
||||
## Overview
|
||||
|
||||
YOLOv8 is the latest iteration in the YOLO series of real-time object detectors, offering cutting-edge performance in terms of accuracy and speed. Building upon the advancements of previous YOLO versions, YOLOv8 introduces new features and optimizations that make it an ideal choice for various object detection tasks in a wide range of applications.
|
||||
|
||||

|
||||
|
||||
<p align="center">
|
||||
<br>
|
||||
<iframe loading="lazy" width="720" height="405" src="https://www.youtube.com/embed/Na0HvJ4hkk0"
|
||||
title="YouTube video player" frameborder="0"
|
||||
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
|
||||
allowfullscreen>
|
||||
</iframe>
|
||||
<br>
|
||||
<strong>Watch:</strong> Ultralytics YOLOv8 Model Overview
|
||||
</p>
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Advanced Backbone and Neck Architectures:** YOLOv8 employs state-of-the-art backbone and neck architectures, resulting in improved feature extraction and object detection performance.
|
||||
- **Anchor-free Split Ultralytics Head:** YOLOv8 adopts an anchor-free split Ultralytics head, which contributes to better accuracy and a more efficient detection process compared to anchor-based approaches.
|
||||
- **Optimized Accuracy-Speed Tradeoff:** With a focus on maintaining an optimal balance between accuracy and speed, YOLOv8 is suitable for real-time object detection tasks in diverse application areas.
|
||||
- **Variety of Pre-trained Models:** YOLOv8 offers a range of pre-trained models to cater to various tasks and performance requirements, making it easier to find the right model for your specific use case.
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
The YOLOv8 series offers a diverse range of models, each specialized for specific tasks in computer vision. These models are designed to cater to various requirements, from object detection to more complex tasks like instance segmentation, pose/keypoints detection, oriented object detection, and classification.
|
||||
|
||||
Each variant of the YOLOv8 series is optimized for its respective task, ensuring high performance and accuracy. Additionally, these models are compatible with various operational modes including [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md), facilitating their use in different stages of deployment and development.
|
||||
|
||||
| Model | Filenames | Task | Inference | Validation | Training | Export |
|
||||
|-------------|----------------------------------------------------------------------------------------------------------------|----------------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv8 | `yolov8n.pt` `yolov8s.pt` `yolov8m.pt` `yolov8l.pt` `yolov8x.pt` | [Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv8-seg | `yolov8n-seg.pt` `yolov8s-seg.pt` `yolov8m-seg.pt` `yolov8l-seg.pt` `yolov8x-seg.pt` | [Instance Segmentation](../tasks/segment.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv8-pose | `yolov8n-pose.pt` `yolov8s-pose.pt` `yolov8m-pose.pt` `yolov8l-pose.pt` `yolov8x-pose.pt` `yolov8x-pose-p6.pt` | [Pose/Keypoints](../tasks/pose.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv8-obb | `yolov8n-obb.pt` `yolov8s-obb.pt` `yolov8m-obb.pt` `yolov8l-obb.pt` `yolov8x-obb.pt` | [Oriented Detection](../tasks/obb.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv8-cls | `yolov8n-cls.pt` `yolov8s-cls.pt` `yolov8m-cls.pt` `yolov8l-cls.pt` `yolov8x-cls.pt` | [Classification](../tasks/classify.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
This table provides an overview of the YOLOv8 model variants, highlighting their applicability in specific tasks and their compatibility with various operational modes such as Inference, Validation, Training, and Export. It showcases the versatility and robustness of the YOLOv8 series, making them suitable for a variety of applications in computer vision.
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
!!! Performance
|
||||
|
||||
=== "Detection (COCO)"
|
||||
|
||||
See [Detection Docs](https://docs.ultralytics.com/tasks/detect/) for usage examples with these models trained on [COCO](https://docs.ultralytics.com/datasets/detect/coco/), which include 80 pre-trained classes.
|
||||
|
||||
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
| ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
|
||||
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt) | 640 | 37.3 | 80.4 | 0.99 | 3.2 | 8.7 |
|
||||
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s.pt) | 640 | 44.9 | 128.4 | 1.20 | 11.2 | 28.6 |
|
||||
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m.pt) | 640 | 50.2 | 234.7 | 1.83 | 25.9 | 78.9 |
|
||||
| [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l.pt) | 640 | 52.9 | 375.2 | 2.39 | 43.7 | 165.2 |
|
||||
| [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x.pt) | 640 | 53.9 | 479.1 | 3.53 | 68.2 | 257.8 |
|
||||
|
||||
=== "Detection (Open Images V7)"
|
||||
|
||||
See [Detection Docs](https://docs.ultralytics.com/tasks/detect/) for usage examples with these models trained on [Open Image V7](https://docs.ultralytics.com/datasets/detect/open-images-v7/), which include 600 pre-trained classes.
|
||||
|
||||
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
| ----------------------------------------------------------------------------------------- | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
|
||||
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n-oiv7.pt) | 640 | 18.4 | 142.4 | 1.21 | 3.5 | 10.5 |
|
||||
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-oiv7.pt) | 640 | 27.7 | 183.1 | 1.40 | 11.4 | 29.7 |
|
||||
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-oiv7.pt) | 640 | 33.6 | 408.5 | 2.26 | 26.2 | 80.6 |
|
||||
| [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-oiv7.pt) | 640 | 34.9 | 596.9 | 2.43 | 44.1 | 167.4 |
|
||||
| [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-oiv7.pt) | 640 | 36.3 | 860.6 | 3.56 | 68.7 | 260.6 |
|
||||
|
||||
=== "Segmentation (COCO)"
|
||||
|
||||
See [Segmentation Docs](https://docs.ultralytics.com/tasks/segment/) for usage examples with these models trained on [COCO](https://docs.ultralytics.com/datasets/segment/coco/), which include 80 pre-trained classes.
|
||||
|
||||
| Model | size<br><sup>(pixels) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
| -------------------------------------------------------------------------------------------- | --------------------- | -------------------- | --------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
|
||||
| [YOLOv8n-seg](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n-seg.pt) | 640 | 36.7 | 30.5 | 96.1 | 1.21 | 3.4 | 12.6 |
|
||||
| [YOLOv8s-seg](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-seg.pt) | 640 | 44.6 | 36.8 | 155.7 | 1.47 | 11.8 | 42.6 |
|
||||
| [YOLOv8m-seg](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-seg.pt) | 640 | 49.9 | 40.8 | 317.0 | 2.18 | 27.3 | 110.2 |
|
||||
| [YOLOv8l-seg](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-seg.pt) | 640 | 52.3 | 42.6 | 572.4 | 2.79 | 46.0 | 220.5 |
|
||||
| [YOLOv8x-seg](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-seg.pt) | 640 | 53.4 | 43.4 | 712.1 | 4.02 | 71.8 | 344.1 |
|
||||
|
||||
=== "Classification (ImageNet)"
|
||||
|
||||
See [Classification Docs](https://docs.ultralytics.com/tasks/classify/) for usage examples with these models trained on [ImageNet](https://docs.ultralytics.com/datasets/classify/imagenet/), which include 1000 pre-trained classes.
|
||||
|
||||
| Model | size<br><sup>(pixels) | acc<br><sup>top1 | acc<br><sup>top5 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) at 640 |
|
||||
| -------------------------------------------------------------------------------------------- | --------------------- | ---------------- | ---------------- | ------------------------------ | ----------------------------------- | ------------------ | ------------------------ |
|
||||
| [YOLOv8n-cls](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n-cls.pt) | 224 | 69.0 | 88.3 | 12.9 | 0.31 | 2.7 | 4.3 |
|
||||
| [YOLOv8s-cls](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-cls.pt) | 224 | 73.8 | 91.7 | 23.4 | 0.35 | 6.4 | 13.5 |
|
||||
| [YOLOv8m-cls](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-cls.pt) | 224 | 76.8 | 93.5 | 85.4 | 0.62 | 17.0 | 42.7 |
|
||||
| [YOLOv8l-cls](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-cls.pt) | 224 | 76.8 | 93.5 | 163.0 | 0.87 | 37.5 | 99.7 |
|
||||
| [YOLOv8x-cls](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-cls.pt) | 224 | 79.0 | 94.6 | 232.0 | 1.01 | 57.4 | 154.8 |
|
||||
|
||||
=== "Pose (COCO)"
|
||||
|
||||
See [Pose Estimation Docs](https://docs.ultralytics.com/tasks/pose/) for usage examples with these models trained on [COCO](https://docs.ultralytics.com/datasets/pose/coco/), which include 1 pre-trained class, 'person'.
|
||||
|
||||
| Model | size<br><sup>(pixels) | mAP<sup>pose<br>50-95 | mAP<sup>pose<br>50 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
| ---------------------------------------------------------------------------------------------------- | --------------------- | --------------------- | ------------------ | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
|
||||
| [YOLOv8n-pose](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n-pose.pt) | 640 | 50.4 | 80.1 | 131.8 | 1.18 | 3.3 | 9.2 |
|
||||
| [YOLOv8s-pose](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-pose.pt) | 640 | 60.0 | 86.2 | 233.2 | 1.42 | 11.6 | 30.2 |
|
||||
| [YOLOv8m-pose](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-pose.pt) | 640 | 65.0 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
|
||||
| [YOLOv8l-pose](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-pose.pt) | 640 | 67.6 | 90.0 | 784.5 | 2.59 | 44.4 | 168.6 |
|
||||
| [YOLOv8x-pose](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-pose.pt) | 640 | 69.2 | 90.2 | 1607.1 | 3.73 | 69.4 | 263.2 |
|
||||
| [YOLOv8x-pose-p6](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-pose-p6.pt) | 1280 | 71.6 | 91.2 | 4088.7 | 10.04 | 99.1 | 1066.4 |
|
||||
|
||||
=== "OBB (DOTAv1)"
|
||||
|
||||
See [Oriented Detection Docs](https://docs.ultralytics.com/tasks/obb/) for usage examples with these models trained on [DOTAv1](https://docs.ultralytics.com/datasets/obb/dota-v2/#dota-v10/), which include 15 pre-trained classes.
|
||||
|
||||
| Model | size<br><sup>(pixels) | mAP<sup>test<br>50 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
|----------------------------------------------------------------------------------------------|-----------------------| -------------------- | -------------------------------- | ------------------------------------- | -------------------- | ----------------- |
|
||||
| [YOLOv8n-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n-obb.pt) | 1024 | 78.0 | 204.77 | 3.57 | 3.1 | 23.3 |
|
||||
| [YOLOv8s-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-obb.pt) | 1024 | 79.5 | 424.88 | 4.07 | 11.4 | 76.3 |
|
||||
| [YOLOv8m-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m-obb.pt) | 1024 | 80.5 | 763.48 | 7.61 | 26.4 | 208.6 |
|
||||
| [YOLOv8l-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8l-obb.pt) | 1024 | 80.7 | 1278.42 | 11.83 | 44.5 | 433.8 |
|
||||
| [YOLOv8x-obb](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8x-obb.pt) | 1024 | 81.36 | 1759.10 | 13.23 | 69.5 | 676.7 |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple YOLOv8 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
Note the below example is for YOLOv8 [Detect](../tasks/detect.md) models for object detection. For additional supported tasks see the [Segment](../tasks/segment.md), [Classify](../tasks/classify.md), [OBB](../tasks/obb.md) docs and [Pose](../tasks/pose.md) docs.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Load a COCO-pretrained YOLOv8n model
|
||||
model = YOLO('yolov8n.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv8n model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Load a COCO-pretrained YOLOv8n model and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Load a COCO-pretrained YOLOv8n model and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov8n.pt source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
If you use the YOLOv8 model or any other software from this repository in your work, please cite it using the following format:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@software{yolov8_ultralytics,
|
||||
author = {Glenn Jocher and Ayush Chaurasia and Jing Qiu},
|
||||
title = {Ultralytics YOLOv8},
|
||||
version = {8.0.0},
|
||||
year = {2023},
|
||||
url = {https://github.com/ultralytics/ultralytics},
|
||||
orcid = {0000-0001-5950-6979, 0000-0002-7603-6750, 0000-0003-3783-7069},
|
||||
license = {AGPL-3.0}
|
||||
}
|
||||
```
|
||||
|
||||
Please note that the DOI is pending and will be added to the citation once it is available. YOLOv8 models are provided under [AGPL-3.0](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) and [Enterprise](https://ultralytics.com/license) licenses.
|
152
docs/en/models/yolov9.md
Normal file
152
docs/en/models/yolov9.md
Normal file
@ -0,0 +1,152 @@
|
||||
---
|
||||
comments: true
|
||||
description: Discover YOLOv9, the latest addition to the real-time object detection arsenal, leveraging Programmable Gradient Information and GELAN architecture for unparalleled performance.
|
||||
keywords: YOLOv9, real-time object detection, Programmable Gradient Information, GELAN architecture, Ultralytics, MS COCO dataset, open-source, lightweight model, computer vision, AI
|
||||
---
|
||||
|
||||
# YOLOv9: A Leap Forward in Object Detection Technology
|
||||
|
||||
YOLOv9 marks a significant advancement in real-time object detection, introducing groundbreaking techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). This model demonstrates remarkable improvements in efficiency, accuracy, and adaptability, setting new benchmarks on the MS COCO dataset. The YOLOv9 project, while developed by a separate open-source team, builds upon the robust codebase provided by [Ultralytics](https://ultralytics.com) [YOLOv5](yolov5.md), showcasing the collaborative spirit of the AI research community.
|
||||
|
||||

|
||||
|
||||
## Introduction to YOLOv9
|
||||
|
||||
In the quest for optimal real-time object detection, YOLOv9 stands out with its innovative approach to overcoming information loss challenges inherent in deep neural networks. By integrating PGI and the versatile GELAN architecture, YOLOv9 not only enhances the model's learning capacity but also ensures the retention of crucial information throughout the detection process, thereby achieving exceptional accuracy and performance.
|
||||
|
||||
## Core Innovations of YOLOv9
|
||||
|
||||
YOLOv9's advancements are deeply rooted in addressing the challenges posed by information loss in deep neural networks. The Information Bottleneck Principle and the innovative use of Reversible Functions are central to its design, ensuring YOLOv9 maintains high efficiency and accuracy.
|
||||
|
||||
### Information Bottleneck Principle
|
||||
|
||||
The Information Bottleneck Principle reveals a fundamental challenge in deep learning: as data passes through successive layers of a network, the potential for information loss increases. This phenomenon is mathematically represented as:
|
||||
|
||||
```python
|
||||
I(X, X) >= I(X, f_theta(X)) >= I(X, g_phi(f_theta(X)))
|
||||
```
|
||||
|
||||
where `I` denotes mutual information, and `f` and `g` represent transformation functions with parameters `theta` and `phi`, respectively. YOLOv9 counters this challenge by implementing Programmable Gradient Information (PGI), which aids in preserving essential data across the network's depth, ensuring more reliable gradient generation and, consequently, better model convergence and performance.
|
||||
|
||||
### Reversible Functions
|
||||
|
||||
The concept of Reversible Functions is another cornerstone of YOLOv9's design. A function is deemed reversible if it can be inverted without any loss of information, as expressed by:
|
||||
|
||||
```python
|
||||
X = v_zeta(r_psi(X))
|
||||
```
|
||||
|
||||
with `psi` and `zeta` as parameters for the reversible and its inverse function, respectively. This property is crucial for deep learning architectures, as it allows the network to retain a complete information flow, thereby enabling more accurate updates to the model's parameters. YOLOv9 incorporates reversible functions within its architecture to mitigate the risk of information degradation, especially in deeper layers, ensuring the preservation of critical data for object detection tasks.
|
||||
|
||||
### Impact on Lightweight Models
|
||||
|
||||
Addressing information loss is particularly vital for lightweight models, which are often under-parameterized and prone to losing significant information during the feedforward process. YOLOv9's architecture, through the use of PGI and reversible functions, ensures that even with a streamlined model, the essential information required for accurate object detection is retained and effectively utilized.
|
||||
|
||||
### Programmable Gradient Information (PGI)
|
||||
|
||||
PGI is a novel concept introduced in YOLOv9 to combat the information bottleneck problem, ensuring the preservation of essential data across deep network layers. This allows for the generation of reliable gradients, facilitating accurate model updates and improving the overall detection performance.
|
||||
|
||||
### Generalized Efficient Layer Aggregation Network (GELAN)
|
||||
|
||||
GELAN represents a strategic architectural advancement, enabling YOLOv9 to achieve superior parameter utilization and computational efficiency. Its design allows for flexible integration of various computational blocks, making YOLOv9 adaptable to a wide range of applications without sacrificing speed or accuracy.
|
||||
|
||||

|
||||
|
||||
## Performance on MS COCO Dataset
|
||||
|
||||
The performance of YOLOv9 on the [COCO dataset](../datasets/detect/coco.md) exemplifies its significant advancements in real-time object detection, setting new benchmarks across various model sizes. Table 1 presents a comprehensive comparison of state-of-the-art real-time object detectors, illustrating YOLOv9's superior efficiency and accuracy.
|
||||
|
||||
**Table 1. Comparison of State-of-the-Art Real-Time Object Detectors**
|
||||
|
||||
| Model | size<br><sup>(pixels) | AP<sup>val<br>50-95 | AP<sup>val<br>50 | AP<sup>val<br>75 | params<br><sup>(M) | FLOPs<br><sup>(B) |
|
||||
|---------------------------------------------------------------------------------------|-----------------------|---------------------|------------------|------------------|--------------------|-------------------|
|
||||
| YOLOv9-S | 640 | 46.8 | 63.4 | 50.7 | 7.2 | 26.7 |
|
||||
| YOLOv9-M | 640 | 51.4 | 68.1 | 56.1 | 20.1 | 76.8 |
|
||||
| [YOLOv9-C](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9c.pt) | 640 | 53.0 | 70.2 | 57.8 | 25.5 | 102.8 |
|
||||
| [YOLOv9-E](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9e.pt) | 640 | 55.6 | 72.8 | 60.6 | 58.1 | 192.5 |
|
||||
|
||||
YOLOv9's iterations, ranging from the smaller S variant to the extensive E model, demonstrate improvements not only in accuracy (AP metrics) but also in efficiency with a reduced number of parameters and computational needs (FLOPs). This table underscores YOLOv9's ability to deliver high precision while maintaining or reducing the computational overhead compared to prior versions and competing models.
|
||||
|
||||
Comparatively, YOLOv9 exhibits remarkable gains:
|
||||
|
||||
- **Lightweight Models**: YOLOv9-S surpasses the YOLO MS-S in parameter efficiency and computational load while achieving an improvement of 0.4∼0.6% in AP.
|
||||
- **Medium to Large Models**: YOLOv9-M and YOLOv9-E show notable advancements in balancing the trade-off between model complexity and detection performance, offering significant reductions in parameters and computations against the backdrop of improved accuracy.
|
||||
|
||||
The YOLOv9-C model, in particular, highlights the effectiveness of the architecture's optimizations. It operates with 42% fewer parameters and 21% less computational demand than YOLOv7 AF, yet it achieves comparable accuracy, demonstrating YOLOv9's significant efficiency improvements. Furthermore, the YOLOv9-E model sets a new standard for large models, with 15% fewer parameters and 25% less computational need than [YOLOv8x](yolov8.md), alongside a substantial 1.7% improvement in AP.
|
||||
|
||||
These results showcase YOLOv9's strategic advancements in model design, emphasizing its enhanced efficiency without compromising on the precision essential for real-time object detection tasks. The model not only pushes the boundaries of performance metrics but also emphasizes the importance of computational efficiency, making it a pivotal development in the field of computer vision.
|
||||
|
||||
## Conclusion
|
||||
|
||||
YOLOv9 represents a pivotal development in real-time object detection, offering significant improvements in terms of efficiency, accuracy, and adaptability. By addressing critical challenges through innovative solutions like PGI and GELAN, YOLOv9 sets a new precedent for future research and application in the field. As the AI community continues to evolve, YOLOv9 stands as a testament to the power of collaboration and innovation in driving technological progress.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
This example provides simple YOLOv9 training and inference examples. For full documentation on these and other [modes](../modes/index.md) see the [Predict](../modes/predict.md), [Train](../modes/train.md), [Val](../modes/val.md) and [Export](../modes/export.md) docs pages.
|
||||
|
||||
!!! Example
|
||||
|
||||
=== "Python"
|
||||
|
||||
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in python:
|
||||
|
||||
```python
|
||||
from ultralytics import YOLO
|
||||
|
||||
# Build a YOLOv9c model from scratch
|
||||
model = YOLO('yolov9c.yaml')
|
||||
|
||||
# Build a YOLOv9c model from pretrained weight
|
||||
model = YOLO('yolov9c.pt')
|
||||
|
||||
# Display model information (optional)
|
||||
model.info()
|
||||
|
||||
# Train the model on the COCO8 example dataset for 100 epochs
|
||||
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)
|
||||
|
||||
# Run inference with the YOLOv9c model on the 'bus.jpg' image
|
||||
results = model('path/to/bus.jpg')
|
||||
```
|
||||
|
||||
=== "CLI"
|
||||
|
||||
CLI commands are available to directly run the models:
|
||||
|
||||
```bash
|
||||
# Build a YOLOv9c model from scratch and train it on the COCO8 example dataset for 100 epochs
|
||||
yolo train model=yolov9c.yaml data=coco8.yaml epochs=100 imgsz=640
|
||||
|
||||
# Build a YOLOv9c model from scratch and run inference on the 'bus.jpg' image
|
||||
yolo predict model=yolov9c.yaml source=path/to/bus.jpg
|
||||
```
|
||||
|
||||
## Supported Tasks and Modes
|
||||
|
||||
The YOLOv9 series offers a range of models, each optimized for high-performance [Object Detection](../tasks/detect.md). These models cater to varying computational needs and accuracy requirements, making them versatile for a wide array of applications.
|
||||
|
||||
| Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
|
||||
|------------|-----------------------------------------------------------------------------------------|----------------------------------------|-----------|------------|----------|--------|
|
||||
| YOLOv9-C | [yolov9c.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9c.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
| YOLOv9-E | [yolov9e.pt](https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9e.pt) | [Object Detection](../tasks/detect.md) | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
This table provides a detailed overview of the YOLOv9 model variants, highlighting their capabilities in object detection tasks and their compatibility with various operational modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). This comprehensive support ensures that users can fully leverage the capabilities of YOLOv9 models in a broad range of object detection scenarios.
|
||||
|
||||
## Citations and Acknowledgements
|
||||
|
||||
We would like to acknowledge the YOLOv9 authors for their significant contributions in the field of real-time object detection:
|
||||
|
||||
!!! Quote ""
|
||||
|
||||
=== "BibTeX"
|
||||
|
||||
```bibtex
|
||||
@article{wang2024yolov9,
|
||||
title={{YOLOv9}: Learning What You Want to Learn Using Programmable Gradient Information},
|
||||
author={Wang, Chien-Yao and Liao, Hong-Yuan Mark},
|
||||
booktitle={arXiv preprint arXiv:2402.13616},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
The original YOLOv9 paper can be found on [arXiv](https://arxiv.org/pdf/2402.13616.pdf). The authors have made their work publicly available, and the codebase can be accessed on [GitHub](https://github.com/WongKinYiu/yolov9). We appreciate their efforts in advancing the field and making their work accessible to the broader community.
|
Reference in New Issue
Block a user