Leroy, my first AI project, has presented a number of new technical challenges. This multi-part series covers the lessons I have learned along the way.

Object detection versus classification

My first iteration of Project Leroy started with classification. For Leroy, that meant taking an entire frame from his camera’s video stream and trying to classify the whole frame as a type of bird. The results were very inconsistent.

Classification fail.

It makes sense that classification alone was not effective. When we're bird-watching, we don't take in everything our eyes see and try to understand it as one type of thing. Instead, we understand that what we're looking at is a composition of many little things. Among those things we look for birds in particular, and when we spot one, we try to work out what type of bird it is. This led me to update Leroy to achieve his goal in two steps: first object detection, then classification.

My daughter helped me test out object detection with her stuffed chicken Gary.

Once Leroy has detected an object, the results include a confidence score and bounding box coordinates. With this information, I can set a confidence threshold that a detection must meet before Leroy captures a photo. I’m constantly tweaking this, but for now 40% works pretty well. When a detection meets that threshold, Leroy uses the bounding box to save only that part of the frame, like a cutout of the original photo. That cutout is what I run through classification. The results have been much more accurate.
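
To make the two-step flow concrete, here is a minimal sketch in plain Python. It is not Leroy's actual code: the `Detection` class, `crop_to_box`, `detect_then_classify`, and the `classify` callback are hypothetical names made up for illustration, and the frame is assumed to be a simple list of pixel rows with the bounding box given in pixel coordinates. A real detector and classifier would be plugged in where the detections and the callback are passed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

CONFIDENCE_THRESHOLD = 0.40  # the 40% threshold mentioned above

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    """One object-detection result (hypothetical shape, for illustration)."""
    score: float  # confidence between 0 and 1
    box: Box


def _clamp(value: int, low: int, high: int) -> int:
    """Keep value inside the range [low, high]."""
    return max(low, min(high, value))


def crop_to_box(frame: Sequence[Sequence], box: Box) -> List[Sequence]:
    """Return the part of the frame inside the box, clamped to the frame edges."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = _clamp(x_min, 0, width), _clamp(x_max, 0, width)
    y_min, y_max = _clamp(y_min, 0, height), _clamp(y_max, 0, height)
    return [row[x_min:x_max] for row in frame[y_min:y_max]]


def detect_then_classify(
    frame: Sequence[Sequence],
    detections: Sequence[Detection],
    classify: Callable[[List[Sequence]], object],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> list:
    """Classify only the cutouts whose detection score meets the threshold."""
    results = []
    for detection in detections:
        if detection.score < threshold:
            continue  # not confident enough to bother capturing
        cutout = crop_to_box(frame, detection.box)
        if not cutout or not cutout[0]:
            continue  # the box fell entirely outside the frame
        results.append(classify(cutout))
    return results
```

If the frame were a NumPy array instead, the crop would usually be written as a single slice, `frame[y_min:y_max, x_min:x_max]`. The point of the sketch is the order of operations: filter detections by confidence first, cut out the bounding box, and only then hand the cutout to the classifier.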



The models I am using for object detection and classification respectively are:

  1. ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite
  2. mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite

Both are available from the Google Coral models page.