Object Classification and Tracking Using Scaled P8 YOLOv4 Lite Model

Authors

  • Shakil Shaikh
    Affiliation

    Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

  • Jayant Chopade
    Affiliation

    Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

  • Gajanan Kharate
    Affiliation

    Department of Electronics and Telecommunication, Matoshri College of Engineering & Research Centre, Eklahare, Nashik, Savitribai Phule Pune University, 422105 Maharashtra, P.O.B. 411007, India

https://doi.org/10.3311/PPee.20685

Abstract

Object detection, which combines object categorization and object localization within a scene, is one of the most difficult tasks in computer vision. Deep Neural Networks have recently been shown to outperform alternative approaches to object detection. The main drawbacks of deep neural networks are their complexity and heavy computational cost, which make it difficult to detect and track objects in high-resolution images in real time. We propose a scaled YOLOv4 lite model as a single-stage detector neural network for object detection and tracking, trained on the COCO 2017 dataset. Scaled YOLOv4 applies efficient network scaling strategies to create the YOLOv4-CSP-P5-P6-P7-P8 networks. An additional layer, P8, is added to the YOLOv4 lite model, which improves accuracy. The improved network design uses cross-stage-partial (CSP) connections and Mish activation, for example in the optimized backbone and the neck (PAN). The YOLOv4 model, however, is trained only once for all resolutions. The width and height activations have been changed, allowing for faster network training. The YOLOv4 lite model uses CSPDarkNet-53 as its backbone. The experimental results show that our YOLOv4 lite model can detect and track objects at up to 28 fps on video input and reaches an accuracy of 86.09% when tested on real-time video at a resolution of 1920 × 1080 (full HD). With the CSPDarkNet-53 backbone, the model achieves AP = 50.81%, AP@50 = 63.6%, and AP@75 = 52.5%.
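For readers unfamiliar with two terms used in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It shows the standard definition of the Mish activation, x · tanh(softplus(x)), and the intersection-over-union (IoU) test that underlies the AP@50 and AP@75 metrics (a detection counts as a match when its IoU with a ground-truth box is at least 0.50 or 0.75, respectively). The function names, the (x1, y1, x2, y2) box format, and the overflow cutoff of 20 are assumptions made for this example.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    # For large x, softplus(x) is approximately x and tanh(x) is approximately 1,
    # so return x directly to avoid overflow in exp(). The cutoff is an assumption.
    if x > 20.0:
        return x
    return x * math.tanh(math.log1p(math.exp(x)))


def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth by at least `threshold`."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    pred, gt = (2, 0, 12, 10), (0, 0, 10, 10)
    print("IoU:", iou(pred, gt))
    print("match at 0.50:", is_match(pred, gt, 0.50))
    print("match at 0.75:", is_match(pred, gt, 0.75))
```

A prediction that passes the 0.50 threshold can still fail the stricter 0.75 threshold, which is why AP@75 is typically lower than AP@50, as in the figures reported above.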

Keywords:

cross stage partial, object detection, computer vision, Deep Neural Network, backbone

Published Online

2023-01-23

How to Cite

Shaikh, S., Chopade, J., Kharate, G. “Object Classification and Tracking Using Scaled P8 YOLOv4 Lite Model”, Periodica Polytechnica Electrical Engineering and Computer Science, 67(1), pp. 102–111, 2023. https://doi.org/10.3311/PPee.20685
