In my
previous blog, we saw how object detection with TensorFlow and YOLO is applied in an enterprise context in conjunction with SAP Leonardo Machine Learning Foundation. Now we will take a close look at how to implement custom object detection with YOLO for creating intelligent solutions: in particular, how to train a custom object detector on a custom dataset, and how to provision it as a RESTful API running on SAP Cloud Platform, Cloud Foundry, consumed by your intelligent solution through loosely coupled HTTP(S).
My blog series of Object Detection for Intelligent Enterprise:
Overview of YOLO Object Detection
You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. Have a look at
this inspiring TED talk by Joseph Redmon about how computers learn to recognize objects instantly, as well as
this introductory video about the YOLO algorithm by Andrew Ng.
For more detail about YOLO, you may refer to
its official website.
Train custom object detector with YOLO
1. Prepare Dataset
In order to train your own object detector, you need to prepare a dataset for training: the images containing the target objects, plus labels marking each object in the images. You can also use my ready-to-use
shoe dataset (including images and YOLO label files) for a quick start, in which case you can skip steps 1 and 2.
Step 1: Download Images with Target Objects
In my case, we need to detect shoes in the SMB Market Place Solution for an intelligent online shopping experience: finding a matching shoe from a photo through Facebook Messenger. So I need a dataset of shoe images, which I can easily find on Google by searching for "shoe" images.
To download the images in bulk, I used a Google Chrome extension named
Fatkun Batch Download Image.
- Unselect the images without shoes and the cartoon images (non-JPEG format) from the Google search result.
- Click the More Options button to rename the images with the format "shoes_{NO000}.JPEG" as in the attached screenshot, which will save the images as shoes_000.JPEG~shoes_999.JPEG.
Some tips:
1). Save the images in an appropriate format. In my case, the JPEG format is required by the annotation tool used afterwards.
2). The closer the images in the training dataset (angle, background, etc.) are to the real image input in your case, the more accurate the detection results.
3). You may need 300~600 images per class for a relatively good detection result. It may require more images from different angles and backgrounds to achieve nearly perfect detection. In my case I downloaded 600 images (540 for training, 60 for testing).
- Click the Save Image button and the images will be downloaded. Let's rename the image folder as "dataset".
Step 2: Label the Images with the Target Objects
Now you need to annotate all the downloaded images by marking the exact bounding boxes of the shoes with an annotation tool.
In my case, I use
LabelImg, which supports the Pascal VOC and YOLO formats, to label the shoe images in YOLO format. Simply save the YOLO output txt files in the same folder as the images (dataset).
As a result, a YOLO-format annotation is created for each image. An example of a YOLO annotation is shown below.
0 0.324444 0.371111 0.337778 0.484444
0 0.642222 0.640000 0.200000 0.506667
class_index box_center_x_ratio box_center_y_ratio box_width_ratio box_height_ratio
0 - The index of the object class; in my case, only one class - shoe.
0.324444 - box_center_x_ratio (box center x / image_width)
0.371111 - box_center_y_ratio (box center y / image_height)
...
Note that YOLO labels store the box center, not the top-left corner.
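To make the format concrete, here is how a pixel-space box would be converted into such a label line (a minimal sketch of my own; to_yolo_line is not part of the blog's scripts). The sample call reproduces the first annotation line above:

```python
def to_yolo_line(class_index, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel-space box (top-left corner + size) into a YOLO label line.

    YOLO labels store the box *center* and size, normalized by the image size.
    """
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return "%d %.6f %.6f %.6f %.6f" % (
        class_index, x_center, y_center, box_w / img_w, box_h / img_h)

# A 152x218-pixel shoe box with its top-left corner at (70, 58)
# in a 450x450 image:
print(to_yolo_line(0, 70, 58, 152, 218, 450, 450))
# -> 0 0.324444 0.371111 0.337778 0.484444
```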
Step 3: Generate the image list text files for training and testing.
1). Download the scripts. The sample source code is available here.
2). Structure the directory tree as below:
training
-dataset //(Please copy your dataset from step 2 to here)
-*.JPEG
-*.txt
3). Run this script:
$ python create_train_list.py
As a result, 10% of the dataset will be allocated to test_list.txt and the rest to train_list.txt. Now we are ready to train the custom object detector.
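If you are curious what the split looks like, its core can be sketched as below (my own simplified version, assuming the folder layout above; the real create_train_list.py may differ in details):

```python
import glob
import os

def create_train_list(dataset_dir="training/dataset", test_ratio=0.1):
    """Split the labelled images into train_list.txt and test_list.txt."""
    images = sorted(glob.glob(os.path.join(dataset_dir, "*.JPEG")))
    step = round(1 / test_ratio)       # every 10th image goes to the test set
    test = images[::step]
    train = [img for img in images if img not in test]
    with open("test_list.txt", "w") as f:
        f.write("\n".join(test))
    with open("train_list.txt", "w") as f:
        f.write("\n".join(train))
    return len(train), len(test)
```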
2. Training Custom Object Detection Model
Step 4: Follow this manual to install Darknet for YOLO
Simply run the following commands.
git clone https://github.com/pjreddie/darknet
cd darknet
make
[Optional] If you would like to try YOLO object detection with the pre-trained model on the MS COCO dataset, you can follow the steps in the manual to download
yolov3.weights and run the detector with the command
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
If you are running a Windows machine, you can refer to
this fork.
Step 5: Copy the training folder from step 3 into the darknet folder,
so that darknet/training contains all the training-related materials.
Step 6: Recalculate the anchor boxes with K-Means.
This step is important for successful training: the anchor boxes are recalculated from the training dataset.
For yolo v3:
Please run generate_anchors_yolo_v3.py in the training folder to recalculate the anchor boxes with K-Means. 10 anchors are required by the YOLO v3 configuration.
python generate_anchors_yolo_v3.py -filelist <path to train_list.txt generated in step3> -num_clusters <number of clusters>
For example:
python generate_anchors_yolo_v3.py -filelist train_list.txt -num_clusters 10
As a result, the anchors are generated in ./anchors/anchors10.txt, which will be used in Step 7.
For yolo v2:
If you are after YOLO v2, please use generate_anchors_yolo_v2.py instead:
python generate_anchors_yolo_v2.py -filelist train_list.txt -num_clusters 5
The default YOLO v2 configuration requires 5 anchors. As a result, the anchors are generated in ./anchors/anchors5.txt, which will be used in Step 7.
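Conceptually, the generate_anchors scripts cluster the (width, height) ratios of all labelled boxes. A simplified sketch of that idea (my own code, using plain Euclidean K-Means; the actual scripts may use an IoU-based distance):

```python
def kmeans_anchors(box_sizes, k, iters=100):
    """Cluster (width, height) ratio pairs into k anchor sizes with K-Means.

    box_sizes: list of (width_ratio, height_ratio) tuples taken from the
    YOLO label files (the last two columns of each annotation line).
    """
    boxes = sorted(box_sizes)
    # deterministic init: spread the initial centroids over the sorted boxes
    centroids = [boxes[i * len(boxes) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # assign each box to its nearest centroid (plain Euclidean
            # distance here; an IoU-based distance is common in practice)
            nearest = min(range(k), key=lambda c: (w - centroids[c][0]) ** 2
                                                + (h - centroids[c][1]) ** 2)
            clusters[nearest].append((w, h))
        # move each centroid to the mean of its cluster
        centroids = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)]
    return centroids
```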
Step 7: Prepare the yolo training configuration files.
Example:
- cfg/yolov3.cfg: The YOLO v3 configuration file for the MS COCO dataset, which is used for training and detection
- data/coco.names: The label name list of the MS COCO dataset
- data/coco.data: The training configuration for the MS COCO dataset.
We will need to create our own cfg, names and data files for custom object detection.
For yolo v3:
1). Prepare the YOLO configuration file (.cfg)
Copy yolov3-voc.cfg from the darknet/cfg folder and rename it as yolov3_shoe.cfg.
The cfg file defines the CNN for YOLO. The following options need to be updated.
line 2~7:
Comment out batch and subdivisions for testing, and uncomment them for training.
- batch: The number of images processed in each training step
- subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with loads of VRAM, this number can be decreased (or batch increased). If the training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
Line 605, 689
- filters: 3 * (5 + class number)
- 3 - YOLO v3 predicts 3 boxes per grid cell at each of its 3 scales
- 5 + class number - The output vector per box: (prob, x, y, width, height, c1, c2, ...). The fixed 5 values are prob, x, y, width and height; the variable part has one score per class: c1 for the first class, c2 for the second class, and so on.
- class number: The number of target object classes
In my case, there is only one object class (shoe), so class number = 1 and filters = 3 * (5 + 1) = 18.
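The filters arithmetic can be captured in a small helper (my own naming; the per-location box count is 3 for YOLO v3 and 5 for YOLO v2, as used later in this post):

```python
def yolo_filters(num_classes, boxes_per_location):
    """filters value for the conv layer right before each detection layer.

    Each predicted box carries 4 coordinates plus 1 confidence score,
    and one score per class.
    """
    return boxes_per_location * (5 + num_classes)

print(yolo_filters(1, 3))  # YOLO v3, one class (shoe) -> 18
print(yolo_filters(1, 5))  # YOLO v2, one class (shoe) -> 30
```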
Line 610~611, 778~779
- anchors: The anchor boxes computed with K-Means over the training dataset. Please replace the anchors with the result of step 6 in anchors10.txt
- classes: The number of classes. In my case, it is 1.
anchors = <Please replace with the result of anchors10.txt in step 6>
classes=1
2). Create a name list file of labels named custom.names in the training/cfg folder:
shoe
3). Create a training configuration file named shoe_training_config.data in the training/cfg folder:
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup
For yolo v2:
1). Prepare the YOLO configuration file (.cfg)
Copy yolov2-voc.cfg from the darknet/cfg folder and rename it as yolov2_shoe.cfg.
The cfg file defines the CNN for YOLO. The following configurations need to be updated.
line 2~7:
Comment out batch and subdivisions for testing, and uncomment them for training.
- batch: The number of images processed in each training step
- subdivisions: The batch is divided by subdivisions to decrease the GPU VRAM requirements. If you have a powerful GPU with loads of VRAM, this number can be decreased (or batch increased). If the training step throws a CUDA out-of-memory error, you can decrease the batch and increase the subdivisions accordingly.
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
Line 237
- filters: 5 * (5 + class number); YOLO v2 predicts 5 boxes per grid cell.
In my case, there is only one object class (shoe), so class number = 1 and filters = 5 * (5 + 1) = 30.
Line 242
- anchors: The anchor boxes computed with K-Means over the training dataset. Please replace the anchors with the result of anchors5.txt in step 6
anchors = <Please replace with the result of anchors5.txt in step 6>
Line 244:
- classes: The number of classes
classes=1
2). Create a name list file of labels named custom.names in the cfg folder:
shoe
3). Create a training configuration file named shoe_training_config.data in the cfg folder:
classes= 1
train = training/train_list.txt
valid = training/test_list.txt
names = cfg/custom.names
backup = backup
Step 8: Train the Custom Object Detection Model:
For yolo v3:
1). Download the pre-trained convolutional weights
For training, we use convolutional weights that are pre-trained on ImageNet. We use weights from the
darknet53 model. You can download just the weights for the convolutional layers
here (76 MB).
2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov3_shoe.cfg darknet53.conv.74
I stopped the training at a stable average loss of 0.2 after around 10,000 iterations.
For yolo v2:
1). Download the pre-trained convolutional weights
For training, we use convolutional weights that are pre-trained on ImageNet. We use weights from the
darknet19_448 model. You can download just the weights for the convolutional layers
here (76 MB).
2). Train the model with the command below:
./darknet detector train cfg/shoe_training_config.data cfg/yolov2_shoe.cfg darknet19_448.conv.23
I stopped the training at a stable average loss of 0.5 after around 6,000 iterations. As a result, the weights files produced by the training can be found in the backup folder.
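To decide when to stop, it helps to track the average loss that darknet prints every iteration. A small parser sketch (the console format is assumed from my runs; adjust the pattern if your build prints differently):

```python
import re

# Darknet prints one status line per training iteration, roughly:
#   "9990: 0.187650, 0.200000 avg, 0.001000 rate, 3.2 seconds, 639360 images"
LOSS_LINE = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg")

def average_losses(log_text):
    """Extract (iteration, average_loss) pairs from a darknet training log."""
    return [(int(m.group(1)), float(m.group(3)))
            for m in map(LOSS_LINE.match, log_text.splitlines()) if m]
```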
3. Testing Custom Object Detection Model
Run the command below:
#For yolo v3:
./darknet detector test cfg/shoe_training_config.data cfg/yolov3_shoe.cfg ./backup/yolov3_shoe.backup
#For yolo v2:
./darknet detector test cfg/shoe_training_config.data cfg/yolov2_shoe.cfg ./backup/yolov2_shoe.backup
Then specify the path of an image containing the target object, and you will see the result in the terminal.
mask_scale: Using default '1.000000'
Loading weights from ./backup/yolov3_shoe.backup...Done!
Enter Image Path: data/Shoe.jpg
data/Shoe.jpg: Predicted in 3.484773 seconds.
shoe: 95%
shoe: 90%
You can also check predictions.png in the darknet root folder. It looks like this.
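If you want to post-process the terminal output programmatically, the per-object confidence lines can be scraped with a few lines of Python (a convenience sketch of my own, assuming the output format shown above):

```python
import re

# matches lines such as "shoe: 95%" printed by `./darknet detector test ...`
DETECTION = re.compile(r"^(\w[\w ]*): (\d+)%$")

def parse_detections(terminal_output):
    """Collect (label, confidence) pairs from the detector's console output."""
    hits = []
    for line in terminal_output.splitlines():
        match = DETECTION.match(line.strip())
        if match:
            hits.append((match.group(1), int(match.group(2)) / 100))
    return hits
```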
Running the custom object detector as RESTful API on SAP Cloud Platform, Cloud Foundry
I have implemented a generic NodeJS RESTful API wrapper that turns YOLO object detection into a RESTful object detection API, which can be deployed on SAP Cloud Platform, Cloud Foundry, or in an on-premise environment.
The project can be reused to provision your own custom object detector as a web service with easy configuration. The source code is published under the MIT license and is available
here. Please follow its manual to download, configure and deploy your custom object detector as a RESTful API.
As a result, you will have your custom object detection provisioned as a RESTful API.
POST /Detect
Object detection with a given image URL and detection threshold.
Request sample:
{
"ImageUrl": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSE36LOJ6NzReh-W_o5QKkgTUH7qbFygG_J1A0PWoPBnaH9UW50",
"Threshold": 0.80
}
Response sample:
[
{
"box": {
"y": 97.06493854522705,
"x": 87.51638531684875,
"w": 122.85414934158325,
"h": 62.75526809692383
},
"name": "shoe",
"prob": 0.9999971389770508
}
]
Note: x - top-left x, y - top-left y, w - width, h - height
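From a Python client, calling the API could look like the sketch below (stdlib only; the host is a placeholder you must replace, and build_payload/detect are my own helper names, not part of the wrapper project):

```python
import json
from urllib import request

def build_payload(image_url, threshold=0.8):
    """Request body expected by the /Detect endpoint."""
    return {"ImageUrl": image_url, "Threshold": threshold}

def detect(host, image_url, threshold=0.8):
    """POST an image URL to /Detect and return the parsed detection boxes."""
    req = request.Request(
        host.rstrip("/") + "/Detect",
        data=json.dumps(build_payload(image_url, threshold)).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (replace the host with your own deployment):
# for d in detect("https://<YOUR_OBJECT_DETECTOR_HOST>", "https://example.com/shoe.jpg"):
#     print(d["name"], d["prob"], d["box"])
```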
Demo kit:
A web demo kit for YOLO object detection can be accessed after deployment:
http://<YOUR_OBJECT_DETECTOR_HOST>:<PORT>/web/Detector
References: