Object Classification and Tracking Using Scaled P8 YOLOv4 Lite Model

Shakil Shaikh; Jayant Chopade; Gajanan Kharate

doi:10.3311/PPee.20685

Authors

Shakil Shaikh

Affiliation

Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

Jayant Chopade

Affiliation

Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

Gajanan Kharate

Affiliation

Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

https://doi.org/10.3311/PPee.20685

Abstract

Object detection, which combines categorizing objects and locating them within a scene, is one of the most difficult tasks in computer vision. Deep Neural Networks have recently been demonstrated to outperform alternative approaches to object detection. The main drawback of deep neural networks is their complexity and heavy computational load, which makes it difficult to detect and track objects in high-resolution images in real time. We propose a scaled YOLOv4 lite model, a single-stage detector neural network for object detection and tracking, trained on the COCO 2017 dataset. Scaled YOLOv4 applies efficient network scaling strategies to create the YOLOv4-CSP-P5-P6-P7-P8 networks. An additional layer, P8, is added to the YOLOv4 lite model, which improves accuracy. The improved network design uses cross-stage-partial (CSP) connections and Mish activation, for example in the optimized backbone and the neck (PAN). YOLOv4, however, can only be trained once for all resolutions. The width and height activations have been changed, allowing faster network training. We use CSPDarkNet-53 as the backbone of the YOLOv4 lite model. The experimental results show that our YOLOv4 lite model can detect and track objects at up to 28 fps on video input and achieves an accuracy of 86.09% when tested on real-time video at a resolution of 1920 × 1080 (full HD). With the CSPDarkNet-53 backbone, the model achieves AP = 50.81%, AP@50 = 63.6%, and AP@75 = 52.5%.
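The Mish activation named in the abstract is commonly defined as mish(x) = x · tanh(ln(1 + e^x)). The following is a minimal scalar sketch of that standard definition in plain Python, not the authors' implementation. The function names `softplus` and `mish` are illustrative choices, and a real network would apply the activation element-wise to tensors in a deep learning framework.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x).

    Uses the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)),
    which avoids overflow for large positive x.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)).

    Illustrative scalar version; names are assumptions, not taken
    from the paper.
    """
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Print the activation at a few sample points.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"mish({value:+.1f}) = {mish(value):+.6f}")
```

Unlike ReLU, Mish is smooth and lets small negative values pass through, while behaving almost like the identity for large positive inputs.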

Keywords:

cross stage partial, object detection, computer vision, Deep Neural Network, backbone
