

Do you know Cozmo, the friendly robot from Anki? Here he is...


Cozmo is a programmable robot that has many features...and one of those is a camera...so you can have Cozmo take a picture of something...and then do something with that picture...

To code for Cozmo you need to use Python...actually...Python 3 😉

For this blog, we're going to need a couple of things...so let's install them...
pip3 install 'cozmo[camera]'

This will install the Cozmo SDK...and you will need to install the Cozmo app on your phone as well...

If you have the SDK installed already, you may want to upgrade it because if you don't have the latest version it might not work...
pip3 install --upgrade cozmo

Now, we need a couple of extra things...
sudo apt-get install python3-pygame
pip3 install pillow
pip3 install numpy

pygame is a framework for building games.

pillow is a fork of the PIL library and it's used to manage images.

numpy allows us to work with arrays and do numerical computing in Python.

That was the easy part...now we need to install OpenCV...which allows us to manipulate images and video...

This one is a little bit tricky, so if you get stuck...search on Google or just drop me a message...

First, make sure that OpenCV is not installed by removing it...unless you are sure it's working properly for you...
sudo apt-get remove 'libopencv*'

Then, install the following prerequisites...
sudo apt-get install build-essential cmake pkg-config yasm python-numpy

sudo apt-get install libjpeg-dev libjpeg8-dev libtiff5-dev libjasper-dev

sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libdc1394-22-dev

sudo apt-get install libxvidcore-dev libx264-dev libxine-dev libfaac-dev

sudo apt-get install libgtk-3-dev libtbb-dev libqt4-dev libmp3lame-dev

sudo apt-get install libatlas-base-dev gfortran

sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libxvidcore-dev x264 v4l-utils

If by any chance, something is not available on your system, simply remove it from the list and try again...unless you're like me and want to spend hours trying to get everything...

Now, we need to download the OpenCV source code so we can build it...from the source...
unzip //This should produce the folder opencv-3.4.0

Then, we need to download the contributions because there are some things not bundled in OpenCV by default...and you might need them for any other project...
//This should produce the folder opencv_contrib-3.4.0

As we have both folders, we can start compiling...
cd opencv-3.4.0
mkdir build
cd build
cmake -D CMAKE_CXX_COMPILER=/usr/bin/g++ \
-D OPENCV_EXTRA_MODULES_PATH=/YourPath/opencv_contrib-3.4.0/modules \
-D PYTHON_EXECUTABLE=/usr/bin/python3.6 ..

Pay extra attention to passing the correct path to your opencv_contrib folder...it's better to pass the full path to avoid making errors...

And yes...that's a pretty long command for a build...and it took me a long time to make it work...as you need to figure out all the parameters...

Once we're done, we need to run make...as cmake only prepares the recipe...
make -j2

If there's any mistake, simply do this...
make clean

Then, we can finally install OpenCV by doing this...
sudo make install
sudo ldconfig

To test that it's working properly...simply start Python 3 and do this...
python3
>>> import cv2

If you don't have any errors...then we're good to go -;)
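
Since quite a few pieces need to be in place before the script can run, a small check like this one can save some head scratching later. It's only a sketch...the helper name check_modules is made up for illustration, and the list assumes the usual import names for the packages installed above (cv2 for OpenCV, PIL for pillow)...

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, 'unknown' if it imports
    but exposes no __version__, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    wanted = ["cozmo", "cv2", "PIL", "numpy", "pygame"]
    for name, version in check_modules(wanted).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED is worth fixing before moving on...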

That was quite a lot of work...anyway...we need an extra tool to make sure our image gets nicely processed...

Download textcleaner and put it in the same folder as your Python script...

And...just in case you're wondering...yes...we're going to have Cozmo take a picture...we're going to process it...use SAP Leonardo's OCR API and then have Cozmo read it back to us...cool, huh?

SAP Leonardo's OCR API is still on version 2Alpha1...but regardless of that...it works amazingly well -;)

Keep in mind, though, that the result is not always accurate...that's because of the lighting, the position of the image, your handwriting and the fact that the OCR API is still in Alpha... Anyway...first things first...we need a whiteboard...

And my handwriting is far from being good... -:(

Now, let's jump into the source code...
import cozmo
from cozmo.util import degrees
import PIL
import cv2
import numpy as np
import os
import requests
import json
import re
import time
import pygame
import _thread

def input_thread(L):
    # Block until "Enter" is pressed, then flag it to the main loop
    input()
    L.append(None)

def process_image(image_name):
    image = cv2.imread(image_name)

    # Scale to a fixed size and convert to grayscale
    img = cv2.resize(image, (600, 600))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Blur, denoise, threshold to black and white, then smooth the edges
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    denoise = cv2.fastNlMeansDenoising(blur)
    thresh = cv2.adaptiveThreshold(denoise, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    blur1 = cv2.GaussianBlur(thresh, (5, 5), 0)
    dst = cv2.GaussianBlur(blur1, (5, 5), 0)

    cv2.imwrite('imggray.png', dst)

    # Let textcleaner do the final cleanup
    cmd = './textcleaner -g -e normalize -o 12 -t 5 -u imggray.png out.png'
    os.system(cmd)


def ocr():
    # The OCR API endpoint goes here
    url = ""

    img_path = "out.png"

    headers = {
        # Replace with your own API Key
        'APIKey': "APIKey",
        'Accept': "application/json",
    }

    with open(img_path, 'rb') as img_file:
        response = requests.post(url, files={'files': img_file}, headers=headers)

    json_response = json.loads(response.text)
    json_text = json_response['predictions'][0]
    # Fix characters that tend to get wrongly recognized
    json_text = re.sub('\n',' ',json_text)
    json_text = re.sub('3','z',json_text)
    json_text = re.sub('0|O','o',json_text)
    return json_text

def cozmo_program(robot: cozmo.robot.Robot):
    robot.camera.color_image_enabled = False
    L = []
    _thread.start_new_thread(input_thread, (L,))
    # Tilt the head up to face the whiteboard (adjust the angle to your setup)
    robot.set_head_angle(degrees(20.0)).wait_for_completed()
    while True:
        if L:
            filename = "Message" + ".png"
            pic_filename = filename
            latest_image = robot.world.latest_image
            latest_image.raw_image.convert('L').save(pic_filename)
            robot.say_text("Picture taken!").wait_for_completed()
            process_image(filename)
            message = ocr()
            robot.say_text(message, use_cozmo_voice=True, duration_scalar=0.5).wait_for_completed()
            break
        # Don't spin the CPU while waiting for "Enter"
        time.sleep(0.1)

cozmo.run_program(cozmo_program, use_viewer=True, force_viewer_on_top=True)

Let's analyze the code a little bit...


We're going to use threads, as we need to have one window where we can see what Cozmo is looking at and another with Pygame where we can press "Enter" as the command to have Cozmo take a picture.
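
If the shared list L used by input_thread feels a bit cryptic, the same "wait for Enter in the background" idea can be sketched with the standard threading module instead of _thread. This is just an alternative under my own assumptions, not what the script does...the names wait_for_enter and start_enter_listener are hypothetical, and read_line is only a parameter so the function can be tried without a keyboard...

```python
import threading


def wait_for_enter(pressed, read_line=input):
    """Block until a line is read, then signal the main loop."""
    read_line()
    pressed.set()


def start_enter_listener(read_line=input):
    """Start a background thread and return the Event it will set."""
    pressed = threading.Event()
    listener = threading.Thread(
        target=wait_for_enter, args=(pressed, read_line), daemon=True
    )
    listener.start()
    return pressed
```

In the main loop, checking pressed.is_set() would take the place of checking whether L has something in it...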

Basically, when we run the application, Cozmo will move his head and get into picture mode...then, if we press "Enter" (On the terminal screen) it will take a picture and then send it to our OpenCV processing function.

This function will simply grab the image, scale it, make it grayscale, do a GaussianBlur to blur the image and remove the noise and reduce detail. Then we're going to apply a denoising to get rid of dust and fireflies...apply a threshold to separate the white and black pixels, and apply a couple more blurs...
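
To get a feeling for why an adaptive threshold helps with uneven lighting, here's a tiny pure Python sketch. Keep in mind it's a simplified mean-based version (OpenCV's ADAPTIVE_THRESH_GAUSSIAN_C uses a Gaussian-weighted neighbourhood instead), the function name is made up, and the window is simply clipped at the borders...

```python
def adaptive_threshold_mean(pixels, block_size=3, c=2, max_value=255):
    """Simplified mean adaptive threshold on a 2D list of grayscale values.

    A pixel becomes max_value if it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise 0.
    """
    height, width = len(pixels), len(pixels[0])
    radius = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_threshold = sum(window) / len(window) - c
            row.append(max_value if pixels[y][x] > local_threshold else 0)
        result.append(row)
    return result


# A board with a dark stroke down the middle, lit unevenly:
# the right side is dimmer than the left.
board = [
    [200, 200, 60, 150, 150],
    [200, 200, 60, 150, 150],
    [200, 200, 60, 150, 150],
]
for row in adaptive_threshold_mean(board):
    print(row)  # each row prints [255, 255, 0, 255, 255]
```

Only the stroke turns black...a single global cut-off of, say, 175 would have turned the dimmer right side black as well.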

Finally we're going to call textcleaner to further remove noise and make the image cleaner...
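
As a side note, the textcleaner call can also be made with subprocess instead of os.system, which lets us notice when the script fails. A minimal sketch, using the same flags as the command in the script...the helper names are hypothetical, and it assumes textcleaner sits in the current folder and is executable...

```python
import subprocess


def build_textcleaner_cmd(src, dst, script="./textcleaner"):
    """Build the textcleaner call as an argument list (no shell involved)."""
    return [script, "-g", "-e", "normalize", "-o", "12", "-t", "5", "-u", src, dst]


def run_textcleaner(src, dst):
    """Run textcleaner; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_textcleaner_cmd(src, dst), check=True)
```

Calling run_textcleaner('imggray.png', 'out.png') would then stop the program right away if something goes wrong, instead of sending a missing out.png to the OCR API...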

So, here is the original picture taken by Cozmo...

This is the picture after our OpenCV post-processing...

And finally, this is our image after using textcleaner...

Finally, once we have the image the way we wanted, we can call the OCR API which is pretty straightforward...
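
One thing worth guarding against is a response that comes back without any text (a bad API Key, an unreadable image...). Here's a minimal sketch, assuming the response body is JSON with a 'predictions' list whose first item is the recognized text, which is what the script expects...extract_text is a hypothetical helper...

```python
import json


def extract_text(response_body):
    """Return the first prediction from an OCR response body, or '' if absent."""
    payload = json.loads(response_body)
    predictions = payload.get("predictions") or []
    if not predictions:
        return ""
    return str(predictions[0])


print(extract_text('{"predictions": ["Hello Cozmo"]}'))  # Hello Cozmo
print(repr(extract_text('{"error": "something went wrong"}')))  # ''
```

With an empty string coming back, we could have Cozmo say something like "I couldn't read that" instead of crashing...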

To get the API Key, simply go to and log in...

Once we have the response back from the API, we can do some Regular Expressions cleanup just to make sure some characters don't get wrongly recognized...
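
Since every substitution in the script swaps a single character, the same cleanup can be sketched in one pass with str.translate. The mapping below is taken from the regular expressions in the script...the names OCR_FIXES and clean_ocr_text are made up for this example...

```python
# Single-character fixes: newline -> space, 3 -> z, 0 and O -> o
OCR_FIXES = str.maketrans({"\n": " ", "3": "z", "0": "o", "O": "o"})


def clean_ocr_text(text):
    """Apply all the character substitutions in a single pass."""
    return text.translate(OCR_FIXES)


print(clean_ocr_text("Hell0\nC03mo"))  # Hello Cozmo
```

Of course, the price to pay is that a real 3 or 0 on the whiteboard will be turned into a letter too...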

Finally, we can have Cozmo read the message out loud -;) And just for demonstration purposes...

Here, I was lucky enough that the lighting and everything was perfect...so it was a pretty clean response...further tests were pretty bad -:( So it's important to have good lighting...

Of course...you want to see a video of the process in action, right? Well...funny thing...my first try was perfect! Even better than this one...but I didn't shoot the video -:( Further tries were pretty crappy until I could get something acceptable...and this is what you're going to watch now...the sun coming through the window didn't help me...but it's pretty good anyway...

Hope you liked this blog -:)