Tuesday, August 28, 2018

Deep Learning with Raspberry Pi -- Real-time object detection with YOLO v3 Tiny! [updated on Dec 19 2018, detailed instruction included]

A quick note on Dec 18 2018:
Since I posted this article in late August, I have been asked many times for detailed instructions and for the Python wrapper. After several very busy months, I finally found some spare time to complete this post with detailed instructions! All the information can be found in my GitHub repo, which was forked from shizukachan/darknet-nnpack. I have modified the Makefile, added two nonblocking Python wrappers, and made some other minor modifications. It should "almost" work out of the box!

Here goes the updated article

I am a big fan of Yolo (You Only Look Once, Yolo website). Redmon & Farhadi's famous Yolo series has had a big impact on the deep learning community. BTW, their recent "paper" (YOLOv3: An Incremental Improvement) is an interesting read as well.

So, what is Yolo? Yolo is a cutting-edge object detection algorithm, i.e., it detects objects in images. Traditionally, people slid a window across an image and tried to recognize the contents of the window at every possible location. This method is of course very time-consuming because there are many different ways to place the window, and many computations have to be repeated. Yolo, standing for "You Only Look Once" (not You Only Live Once), smartly avoids those heavy computations by directly predicting object categories and their bounding boxes simultaneously.

YoloV3 is one of the latest updates of the Yolo algorithm. The biggest change is that YoloV3 now uses only convolutional layers and no fully-connected layers. Don't let the technical terms scare you away! What this implies is that YoloV3 no longer cares about the input image size! As long as the height and width are integer multiples of 32 (such as 224x224, 288x288, 608x288, etc.), YoloV3 will work fine! Another major improvement is that YoloV3 also makes predictions from intermediate layers. What this means is that YoloV3 now does a better job of detecting small objects than its previous versions!
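To make the "multiple of 32" rule a little more concrete, here is a small sketch of how the input size maps to the grids on which predictions are made. The stride values are my assumption based on the published configuration files (32 and 16 for the tiny model, 32, 16 and 8 for the full model), and the function name is purely illustrative, so treat this as a rough illustration rather than a description of Darknet's internals.

```python
def detection_grids(width, height, strides=(32, 16)):
    """Return the (columns, rows) of each prediction grid for a given input size.

    The default strides are an assumption for the tiny model; pass
    (32, 16, 8) to approximate the full model.
    """
    if width % 32 != 0 or height % 32 != 0:
        raise ValueError("width and height must both be multiples of 32")
    return [(width // s, height // s) for s in strides]


if __name__ == "__main__":
    # A 416x416 input gives a coarse grid and a finer one for small objects.
    print(detection_grids(416, 416))
    # A non-square input works as well, as long as both sides divide by 32.
    print(detection_grids(608, 288, strides=(32, 16, 8)))
```

The finer grids come from the intermediate layers mentioned above, which is why small objects are easier to pick up.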

I will skip the technical details here because the paper explains everything. The only thing you need to know is that Yolo is lightweight, fast, and decently accurate. It is so lightweight and fast that it can even run object detection in real time on a Raspberry Pi, a single-board computer with a smartphone-grade CPU, limited RAM, and no CUDA GPU! It is also convenient because the authors provide configuration files and weights trained on the COCO dataset, so there is no need to train your own model if you are only interested in detecting common objects.

Although Yolo is super efficient, it still requires quite a lot of computation. The original YoloV3, which was written with Darknet, a C framework by the same authors, will report a "segmentation fault" on the Raspberry Pi 3 Model B+ because the Pi simply cannot provide enough memory to load the weights. The YoloV3-tiny version, however, can run on the RPI 3, albeit very slowly.

Again, I wasn't able to run the full version of YoloV3 on the Pi 3, and I don't think it is possible given its large memory requirement. This article is all about implementing YoloV3-Tiny on the Raspberry Pi 3 Model B!

Quite a few steps are still needed to speed up yolov3-tiny on the Pi:
1. Install NNPACK, an acceleration library that lets neural networks run on multi-core CPUs
2. Add some special configuration to the Makefile to compile the Darknet Yolo source code for the Cortex CPU with NNPACK optimization
3. Either install OpenCV for C++ (a big pain on the Raspberry Pi) or write some Python code to wrap darknet. I believe Yolo comes with a Python wrapper, but I haven't had a chance to test it on the RPI.
4. Download Yolov3-tiny.cfg and Yolov3-tiny.weights. Run Darknet with the Yolo tiny version (not the full version)!

Sounds complicated? Luckily, digitalbrain79 (not me) had already figured it out (https://github.com/digitalbrain79/darknet-nnpack). I had more luck with shizukachan's fork. I made a few more changes to make it easier to follow:

Step 0: prepare Python and Pi Camera

Log in to the Raspberry Pi using SSH or directly in a terminal.
Make sure pip is installed (it should come with Debian):
sudo apt-get install python-pip
Install OpenCV. The simplest way on RPI is as follows (do not build from source!):
sudo apt-get install python-opencv
Enable pi camera
sudo raspi-config
Go to Interfacing Options, and enable P1/Camera
You will have to reboot the pi to be able to use the camera
A few additional words here. In the advanced options of raspi-config, you can adjust the memory split between the CPU and GPU. Although we would like to allocate more RAM to the CPU so that the Pi can load a larger model, you will want to allocate at least 64MB to the GPU because the camera module requires it.
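If you want to double-check the memory split from a script, a minimal sketch could look like the following. It assumes the `vcgencmd get_mem gpu` firmware tool that ships with Raspbian and its usual `gpu=128M` style of output; the function names and the constant are mine and are not part of any of the repos mentioned in this post.

```python
import re
import subprocess

MIN_GPU_MEM_MB = 64  # minimum suggested in the paragraph above for the camera


def parse_gpu_mem(output):
    """Parse text such as 'gpu=128M' and return the size in MB as an int."""
    match = re.search(r"gpu=(\d+)M", output)
    if match is None:
        raise ValueError("unexpected vcgencmd output: %r" % output)
    return int(match.group(1))


def read_gpu_mem():
    """Ask the firmware for the current GPU memory split (Raspberry Pi only)."""
    output = subprocess.check_output(["vcgencmd", "get_mem", "gpu"])
    return parse_gpu_mem(output.decode("ascii"))


def camera_has_enough_memory(gpu_mem_mb, minimum=MIN_GPU_MEM_MB):
    """Return True if the GPU share meets the minimum needed by the camera."""
    return gpu_mem_mb >= minimum
```

On the Pi itself, `camera_has_enough_memory(read_gpu_mem())` tells you whether the split needs to be raised in raspi-config before you try the camera.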

Step 1: Install NNPACK

NNPACK was used to optimize Darknet without using a GPU. It is useful for embedded devices using ARM CPUs.
Idein's qmkl is also used to accelerate the SGEMM using the GPU. This is slower than NNPACK on NEON-capable devices and primarily useful for ARM CPUs without NEON.
The NNPACK implementation in Darknet was improved to use transform-based convolution computation, allowing for 40%+ faster inference performance on non-initial frames. This is most useful for repeated inferences, i.e., video, or if Darknet is left open to continue processing input instead of being allowed to terminate after processing input.

Install Ninja (building tool)

Install PeachPy and confu
sudo pip install --upgrade git+https://github.com/Maratyszcza/PeachPy
sudo pip install --upgrade git+https://github.com/Maratyszcza/confu
Install Ninja
git clone https://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
export NINJA_PATH=$PWD
Install clang (I'm not sure why we need this, NNPACK doesn't use it unless you specifically target it).
sudo apt-get install clang

Install NNPACK

Install modified NNPACK
git clone https://github.com/shizukachan/NNPACK
cd NNPACK
confu setup
python ./configure.py --backend auto
If you are compiling for the Pi Zero, change the last line to python ./configure.py --backend scalar
You can skip the following several lines, which come from the original darknet-nnpack repo. I found them not really necessary (or maybe I missed something).
It's also recommended to examine and edit https://github.com/digitalbrain79/NNPACK-darknet/blob/master/src/init.c#L215 to match your CPU architecture if you're on ARM, as the cache size detection code only works on x86.
Since none of the ARM CPUs have a L3, it's recommended to set L3 = L2 and set inclusive=false. This should lead to the L2 size being set equal to the L3 size.
Ironically, after some trial and error, I've found that setting L3 to an arbitrary 2MB seems to work pretty well.
Build NNPACK with ninja (this might take *quite* a while, so be patient. In fact, my Pi crashed the first time. Just reboot and run it again):
$NINJA_PATH/ninja
Do an ls and you should be able to find the folders lib and include if all went well:
ls
Test if NNPACK is working:
bin/convolution-inference-smoketest
In my case, the test actually failed the first time. But I just ran the test again and all items passed. So if your test fails, don't panic; try one more time.
Copy the libraries and header files to the system environment:
sudo cp -a lib/* /usr/lib/
sudo cp include/nnpack.h /usr/include/
sudo cp deps/pthreadpool/include/pthreadpool.h /usr/include/
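As a quick sanity check after the copy commands, a few lines of Python can confirm that the two header files named above actually landed where the compiler will look for them. This is only a convenience sketch: the helper name is made up, and it checks just the headers mentioned in the commands above, not the library files (whose exact names I have not listed here).

```python
import os

# Header locations taken from the cp commands above.
EXPECTED_HEADERS = [
    "/usr/include/nnpack.h",
    "/usr/include/pthreadpool.h",
]


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_files(EXPECTED_HEADERS)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Both NNPACK headers are in place.")
```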
If the convolution-inference-smoketest fails, you've probably hit a compiler bug and will have to change to Clang or an older version of GCC.
You can skip the qmkl/qasm/qbin2hex steps if you aren't targeting the QPU.
Install qmkl
sudo apt-get install cmake
git clone https://github.com/Idein/qmkl.git
cd qmkl
cmake .
sudo make install
Install qasm2
sudo apt-get install flex
git clone https://github.com/Terminus-IMRC/qpu-assembler2
cd qpu-assembler2
make
sudo make install
Install qbin2hex
git clone https://github.com/Terminus-IMRC/qpu-bin-to-hex
cd qpu-bin-to-hex
sudo make install

Step 2. Install darknet-nnpack

We have finally finished configuring everything needed. Now simply clone this repository. Note that we are cloning the yolov3 branch. It comes with the Python wrappers I wrote, the correct Makefile, and the YoloV3 weights:
git clone -b yolov3 https://github.com/zxzhaixiang/darknet-nnpack
cd darknet-nnpack
git checkout yolov3
At this point, you can build darknet-nnpack using make. Be sure to edit the Makefile before compiling.

Step 3. Test with YoloV3-tiny

Despite all this preparation, the Raspberry Pi is not powerful enough to run the full YoloV3 version. The YoloV3-tiny version, however, can run at about 1 frame per second.
I wrote two nonblocking Python wrappers to run Yolo, rpi_video.py and rpi_record.py. Both scripts take pictures with the PiCamera Python library and spawn the darknet executable to run detection on each picture. Darknet saves its result to prediction.png, and the Python code then loads prediction.png and displays it on the screen via OpenCV. In other words, all the detection work is done by darknet, and Python simply handles input and output. rpi_video.py only displays the real-time detection results on the screen as an animation (about 1 frame every 1-1.5 seconds); rpi_record.py also saves each frame for your own records (for example, to make a GIF animation afterwards).
To test it, simply run
sudo python rpi_video.py
sudo python rpi_record.py
You can adjust the task type (detection/classification?), weights, configuration file, and threshold in these lines:
yolo_proc = Popen(["./darknet",
                   "-thresh", "0.1"],
                   stdin = PIPE, stdout = PIPE)
For more details/weights/configuration/different ways to call darknet, refer to the official YOLO homepage.
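For readers curious about what "nonblocking" means in practice, here is a minimal, self-contained sketch of the general idea: start a child process with pipes, switch its stdout to nonblocking mode, and poll it so the parent stays free to do other work. It is not the code from rpi_video.py. The helper names are mine, and a tiny stand-in Python child replaces ./darknet so that the example runs anywhere with the `fcntl` module (Linux, including Raspbian).

```python
import errno
import fcntl
import os
import subprocess
import sys
import time


def spawn_nonblocking(cmd):
    """Start cmd with pipes and switch its stdout to nonblocking mode."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return proc


def read_available(proc):
    """Return pending bytes, None if nothing is ready yet, b'' at end of stream."""
    try:
        return os.read(proc.stdout.fileno(), 4096)
    except OSError as err:
        if err.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise


def wait_for(proc, marker, timeout=10.0, poll=0.05):
    """Poll stdout until marker shows up, the stream ends, or time runs out."""
    buf = b""
    deadline = time.time() + timeout
    while time.time() < deadline:
        chunk = read_available(proc)
        if chunk is None:
            time.sleep(poll)  # a real wrapper could refresh the display here
        elif chunk == b"":
            break  # the child closed its stdout
        else:
            buf += chunk
            if marker in buf:
                break
    return buf


# Stand-in for ./darknet: waits for one line on stdin, then reports it.
FAKE_DETECTOR = [
    sys.executable, "-c",
    "import sys; name = sys.stdin.readline().strip(); "
    "sys.stdout.write('processed ' + name + '\\n'); sys.stdout.flush()",
]
```

A typical use is to call `spawn_nonblocking(FAKE_DETECTOR)`, write an image path followed by a newline to `proc.stdin`, flush it, and then call `wait_for(proc, b"\n")`. Because `read_available` returns immediately, the loop never freezes while the detector is busy.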
As I mentioned, YoloV3-tiny does not care about the size of the input image. So feel free to adjust the camera resolution as long as both height and width are integer multiples of 32.

#camera.resolution = (224, 224)
#camera.resolution = (608, 608)
camera.resolution = (544, 416)
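If you would rather start from an arbitrary size and let the script pick the closest valid one, a small helper like the one below can do the rounding. The function names are hypothetical and not part of rpi_video.py; it only implements the "multiple of 32" arithmetic described above.

```python
def snap_to_multiple(value, base=32):
    """Round a positive integer to the nearest multiple of base (at least base)."""
    snapped = (int(value) + base // 2) // base * base
    return max(base, snapped)


def snap_resolution(width, height, base=32):
    """Return a (width, height) pair in which both sides are multiples of base."""
    return (snap_to_multiple(width, base), snap_to_multiple(height, base))


if __name__ == "__main__":
    # Sizes that are already valid are returned unchanged.
    print(snap_resolution(544, 416))
    # Anything else is moved to the nearest valid size.
    print(snap_resolution(500, 300))
```

The result can be assigned directly, for example `camera.resolution = snap_resolution(500, 300)`.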

Here are my test results:

1. It worked. Yolov3-tiny on the Raspberry Pi 3 Model B+ runs at about 1 frame per second (FPS). rpi_video.py prints the time Yolov3-tiny needs to predict on each image, and I was able to get numbers from 0.9 to 1.1 seconds per frame. Not bad at all! Of course, you can't do any rigorous fast object tracking. But for a surveillance camera, a slow robot, or even a drone, 1 FPS is promising. NNPACK is critical here. As pointed out by shizukachan, without NNPACK the frame rate will be lower than 0.1 FPS!

2. Make sure the power supply you are using can truly provide 2.4A (which the RPI 3B needs). I have seen cases where the detection speed dropped to 1 frame per 1.7 seconds because the power supply did not provide sufficient power.

3. It worked, with limitations. Yolov3-tiny is not that accurate compared to the full Yolov3. But if you want to detect specific objects in a specific scene, you can probably train your own Yolo v3 model (it must be the tiny version) on a GPU desktop and transfer it to the RPI. Never try to train the model on the RPI. Don't even think about it. Starting from Yolov3-tiny pre-trained on the COCO dataset, transfer learning can be leveraged to speed up training.

4. I didn't modify the source code of Yolo. When performing a detection task, Yolo outputs an image with the bounding boxes, labels, and confidences overlaid on top. If you would like to get this information in digital form, you will have to dig into Yolo's source code and modify the output part. It should be relatively straightforward.
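Regarding the last point, part of the information may be recoverable without touching the C code. The sketch below assumes that the detector prints one line per detection in the form `label: NN%` on its standard output, which is what the stock darknet detector does as far as I know; I have not verified that this fork prints the same thing, so check its output first. Bounding box coordinates are not part of that text, so getting them would still require modifying the source. The function name and the sample text are illustrative only.

```python
import re

# Matches lines such as "dog: 99%" or "cell phone: 45%".
DETECTION_LINE = re.compile(
    r"^\s*(?P<label>[^:\n]+?):\s*(?P<percent>\d+)%\s*$", re.MULTILINE)


def parse_detections(text):
    """Return a list of (label, confidence_percent) tuples found in text."""
    return [(m.group("label"), int(m.group("percent")))
            for m in DETECTION_LINE.finditer(text)]


if __name__ == "__main__":
    # Made-up sample in the assumed format, not real output from the Pi.
    sample = "frame.jpg: Predicted in 1.02 seconds.\nperson: 87%\ncell phone: 45%\n"
    for label, percent in parse_detections(sample):
        print("%s (%d%%)" % (label, percent))
```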

Finally, the results. Note that I accelerated the video 5 times. The actual frame rate is about 1 frame per second.

Yolov3-tiny successfully detected a keyboard, a banana, a person (me), a cup, and sometimes a sofa, a car, etc. It mistook Curious George for a teddy bear all the time, probably because the COCO dataset does not have a category called "Curious George stuffed animal". It got confused by the old-fashioned calculator and sometimes recognized it as a laptop or a cell phone. But in general, I was very pleasantly surprised by the results and the frame rate!


  1. Hello Doctor again me :) You helped me more thank you so much! Now, I am trying to implement my distance equation inside the yolo code. Can you help me about? Thank you!

    1. hey. has been super busy recently. any luck on the distance calculation?

  2. hello, thanks for the beautiful post, I worked very well then suddenly the camera has stopped working, you can use a webCam usb? if possible what should I change in the "rpi_video.py"?

    1. you can use a USB camera. but you will have to modify the rpi_video.py file as it was written to get image from PiCam. In the past I didn't get too much luck with a USB camera. It was too slow compared to a Pi Cam

  3. Hi,
    thx for your post. Unfortunately it doesn't work for me. I'm using RasPi 3+ fresh Raspbian Stretch image and following strictly to your instructions. The ninja step throws: "warning: A compatible version of re2c (>= 0.11.3) was not found. changes to src/*.in.cc will not affect your build."

    And going further to NNPACK it stops always at the step "[53/140] CXX test/convolution-output/overfeat-fast.cc" (waited many hours in one run)

    Do you have any ideas how I could solve this issue or do some sort of workaround?

    best wishes

    1. The ninja step throws: "warning: A compatible version of re2c (>= 0.11.3) was not found. changes to src/*.in.cc will not affect your build."

      ==>>sudo apt-get install re2c

  4. hey man it is me again

    check the last comment of here, might interest you and everyone else


    i dont have time to try it tho

    ALSO , there is an optimized version of Opencv for ARM which you can install like this


    1. Thanks for sharing! Also OpenCV4 comes with obj detection. Haven't tried on RPI yet. big pain to build and optimize Opencv on RPI

  5. I have tried several times to install YoloV3 tiny following your instructions, but after a while the Raspberry Pi 3 freezes at the
    command: $NINJA_PATH/ninja.

    1. Don't give up. You should see the number of files reducing after each crash. Keep running it, the last few files install a lot quicker. Just reboot after every crash and re-run the command.

  6. Hi, thanks for such a great post. I followed your direction and it works well until I run the test script, where yolo_proc.stdout.read() output none. Any idea why? I ran the one single test on the capture image and it works

    1. I faced the same issue. I suppose you ran the command in Python 3 instead of Python 2.7. If that's the case, then the problem is the installation of the 'confu' module. Unfortunately, for some reason (my guess is that it is because Python 2.7 is going to be obsolete in 2020), it installs by default in Python 3. To restrict the pip installation to Python 2.7, use the command 'sudo python2 -m pip install --upgrade git+https://github.com/Maratyszcza/confu'. After that, go to the 'ninja' dir and run 'export NINJA_PATH=$PWD' again. Next, which is where the root problem is, go to the 'NNPACK' dir and run 'python ./configure.py --backend auto' as is, again. Now execute 'sudo python rpi_video.py' as is and it should work. I think this should be sufficient for it to work; if not, run the rest of the commands once again.

  7. Sir! Thanks for the great tutorial. I want to access IP camera with this yolo version. Is it necessary to install opencv for that purpose?
    Secondly Are there any data sets available for single objects? Like I just need person detection so I'm looking for a person detection pre-trained model.
    Thanks again!

  8. Hello, sir,
    Thank you for this amazing tutorial.
    But I have a little problem when running with my own model that I have train before on my laptop. I have an error like this
    *** `./darknet 'error: corrupted vs. size. prev_size: 0x00334c10 ***

    can you help me with errors like this? thank you once again

    1. seems to be an issue of darknet. i haven't seen it before. Have you looked this? https://github.com/pjreddie/darknet/issues/105

  9. Everything looks fantastic.Amazing i m really impress with your content and very useful it.hydroglobalreview

  10. This comment has been removed by the author.

  11. Sir,
    I got error at the end
    When I try to run
    sudo python rpi_video.py

    Gtk-WARNING **: cannot open display: :1.0

    Can you help me

    1. Did you connect through SSH+VNC? or you were running directly on rpi?

  12. On step Install NNPACK, I got the error,

    Traceback (most recent call last):
    File "./configure.py", line 4, in
    import confu
    ImportError: No module named confu

    and when I ran, $NINJA_PATH/ninja

    I got

    ninja: no work to do.

    1. did you run "sudo pip install --upgrade git+https://github.com/Maratyszcza/confu" and did it run successfully?

    2. An unknown user posted a possible solution:
      "I faced the same issue. I suppose you ran the command in Python 3 instead of Python 2.7. If that's the case, then the problem is the installation of the 'confu' module. Unfortunately, for some reason (my guess is that it is because Python 2.7 is going to be obsolete in 2020), it installs by default in Python 3. To restrict the pip installation to Python 2.7, use the command 'sudo python2 -m pip install --upgrade git+https://github.com/Maratyszcza/confu'. After that, go to the 'ninja' dir and run 'export NINJA_PATH=$PWD' again. Next, which is where the root problem is, go to the 'NNPACK' dir and run 'python ./configure.py --backend auto' as is, again. Now execute 'sudo python rpi_video.py' as is and it should work. I think this should be sufficient for it to work; if not, run the rest of the commands once again."

  13. thanks for share the tutorial.

    I already see your detection and that's amaze me.

    I want to ask something.
    how the way we can make the prediction more accurate and faster in detection?

    Thx in advance :)

    1. To make the prediction more accurate: train the neural network with more images, or use larger neural networks (YoloV3 is much more accurate than YoloV3-tiny, the one I am using here)
      To make the run faster: use more powerful devices than raspberry pi, or use a smaller neural network.

      I don't think there is a way to achieve both on raspberry pi without significant hardware improvement or model improvement