Human beings can be utterly perplexing sometimes.
After all, our species has this (not so) amazing talent for forming a totally uninformed opinion on just about everything, regardless of whether we know anything about the things we have become so determined to either a) like, or b) dislike.
# BEGIN: Deeply philosophical interlude.
What's more, once opinions have been assigned to the mysterious (and somewhat immutable) data structures defined deep in our (mostly cranial) cavities, they are near-impossible to alter. They simply become a part of us. Tragically, we know all too well that this starts to impact our ability to assess facts impartially, especially if they go against our deep-seated views. And inevitably, this prevents us from making rational, cool-headed decisions on critically important issues. To top it all off, we like to voice our opinions loudly. Over. And. Over. Again. To whoever listens.
# END: Deeply philosophical interlude. Return to normal garbage.
So based on our extremely scientific 15-second study of human psychology, we've finally decided to equip Rosie Patrol with the much hyped-about FI, or Fartificial Intelligence (commonly known in not-so renowned academic circles as farcical intelligence akin to neurological flatulence).
And why would a superhero robot like Rosie Patrol require this? It's because guardians of the
All in all, it is our hope that this much needed downgrade will allow Rosie Patrol to closely mimic the "best" of human behaviour. Such as:
- Instantly offering a completely unfounded opinion on something that we see, even though no-one asked (and no-one actually cares). Fighting wars based on these opinions, if we've got nothing better to be doing.
- Only changing those opinions if we absolutely have to. Actually, let's not. It's just easier to live with them. Forever. Than admit that we're wrong.
- Going on and on (and on) about the things that we just happen to have an opinion on - at the expense of infinitely more important matters. Bizarrely, some people might even like hearing us go on and on (and on) about them... thereby creating an infinite loop of never-ending stupidity.
Happy? Roll on Project FI!
All superheroes need:
- We're still operating 2 Raspberry Pi 3s here, both running Raspbian OS: rosie-01 and rosie-02. Connected to each other over Wi-Fi, and to the Internet. You'll notice that we're using the API endpoints we created previously using Flask to remotely control various parts of Rosie, like her motors, lights and eyes. Yes, it's completely unnecessary.
- No new gadgets are strictly required here. However, it doesn't mean we've disposed of any of the old ones. Nope, Rosie Patrol is still very much equipped to her eyeballs with random gadgetry, such as:
- Relay-controlled head torch... why not? It's so this season.
- Dot LED matrix displays... for Rosie Patrol's expressive eyes. After all, farcical opinions seem a lot more legitimate when they are accompanied by distracting lighting.
- Raspberry Pi Camera Module V2... quite important this one, actually.
- Speaker... you'll need this to hear Rosie Patrol's inner thoughts.
- Servo motors and controller... make Rosie Patrol's neck move (yes, it's completely over the top).
- DC motor controller board and wheels... would you really prefer to carry a robot around instead?
Already completed these missions?
- Lights in shining armour
- I'm just angling around
- Eye would like to have some I's
- Eh, P.I?
- Lights, camera, satisfaction
- Beam me up, Rosie!
- a, b, see, d
- Code: read
Your mission, should you accept it, is to:
- Modify our Python REST API request destined for Google Cloud Vision API to carry out label detection, instead of text detection (OCR). It will literally take less time to change this than it just took to read this sentence.
- Write some Python code to process the labels (objects) that are detected by, and returned from, Google Cloud Vision API.
- Forget all you know about intelligence, and implement something significantly subpar. Create Python code to form random opinions, irritatingly voice them (over and over again) and introduce suspiciously human-like bias.
- Pilot her around a space (no, not that one) filled with highly thought-provoking test objects, and listen to her sound like Marvin out of The Hitchhiker's Guide to the Galaxy.
The brief:
The basic principles behind this experiment were actually put together during our little (and surprisingly popular!) detour to get Rosie to play Countdown. In it, we discovered that we could:
- Take photos of the environment using Raspberry Pi Camera Module V2. Specifically, of the TV screen showing Countdown.
- Send photos to Google Cloud Vision API using Python's Requests module, to perform Optical Character Recognition (OCR)
- Do some Python stuff, using a pre-compiled dictionary, with the detected text to solve Countdown's letters round. This bit is totally redundant for this experiment.
- Use Python gTTS to produce an mp3 file of the robot saying the top scoring answer
- Play it back to humans, using omxplayer
To this end, we could train our very own machine learning Classification algorithm; that is, a program that attempts to categorise a bunch of similar data (in our case, camera images) into pre-defined labels (for example, names or descriptions of objects), according to certain attributes present in the data. We'd feed it lots of images to train our algorithm, and measure its accuracy using test images. The problem is, we'd probably spend months on end feeding our program pictures of things that we find around the house, and tuning the algorithm to ensure it is correctly detecting the objects in them. It also involves you actually having to know some pretty clever stuff (like maths), and possibly the use of some more powerful computers, to develop anything remotely usable. Unless you're planning to be an expert in Machine Learning*, it's probably not the best use of your precious time.
*Probably not a bad idea, since most jobs will soon be held by robots anyway (...apparently)
...That's why we'll be returning to using Google Cloud Vision API, specifically its label detection feature, to do all this clever stuff for us.
Here is a clearly scientific (and somewhat unintentionally encrypted) blueprint for this highly sophisticated experiment.
Accompanied by somewhat less cryptic text:
1. Take a photo of the surroundings, using picamera
2. Carry out label detection using Google Cloud Vision API and Python's Requests
3. Run a whole bunch of stuff in Python to form and track opinions about the objects detected
4. Use Python's gTTS to convert the answer into speech (mp3)
5. Play the audio back using omxplayer and a speaker connected to the Pi
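Stitched together, the five steps boil down to something like the sketch below. Every function here is a hypothetical stand-in for the real helpers described later in this post, with made-up sample data:

```python
# Hypothetical stand-ins for the helpers covered in the rest of this post
def _take_photo():
    # Step 1: in the real robot, picamera captures and saves an image
    return "image.jpg"

def _detect_labels(image):
    # Step 2: in the real robot, Google Cloud Vision label detection via Requests
    return [{"description": "flooring", "score": 0.91},
            {"description": "unicorn", "score": 0.62}]

def _form_random_opinions(description):
    # Step 3: form (and stubbornly keep) an opinion
    return "I've made up my mind. I don't like " + description

def run_once():
    image = _take_photo()
    labels = _detect_labels(image)
    best = max(labels, key=lambda l: l["score"])  # highest-confidence label wins
    # Steps 4-5: the real robot converts the result to mp3 with gTTS
    # and plays it back through omxplayer
    return _form_random_opinions(best["description"])

print(run_once())
```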
And oh yes, we'll be once again controlling Rosie Patrol's movements around a
Information overload:
We previously used Google Cloud Vision API to perform Optical Character Recognition (OCR). We'll now use it for recognising objects. Thankfully, as the documentation makes clear, this task is actually as trivial as changing the JSON type value to LABEL_DETECTION. Send this to Google Cloud Vision API in a REST API POST request, along with the base-64 encoded image taken by the Pi Camera, using the Python Requests module. And that's about it for the label detection phase of Project FI.
def _construct_google_vision_json(image=None):
    data = {
        "requests": [
            {
                "image": {
                    "content": ""
                },
                "features": [
                    {
                        "type": "LABEL_DETECTION"
                    }
                ]
            }
        ]
    }
    data["requests"][0]["image"]["content"] = _encode_image(image)
    return data
There's a slight difference in how we handle the response, however. Unlike before with text detection, the JSON response back from the Google mothership consists of multiple potential objects that have been detected in the image, along with a confidence score. This means we now need to look through multiple "labelAnnotations" records stored in the JSON response.
Something like this will allow us to store the multiple objects in a list.
def _find_objects_in_image(image=None, url=None, token=None):
    if not path.exists(image):
        print("File", image, "does not exist")
        sys.exit()
    r_request = _post_json_request(url+token, _construct_google_vision_json(image))
    if r_request.status_code == 200:
        if r_request.json()["responses"][0]:
            return r_request.json()["responses"][0]["labelAnnotations"]
    else:
        print("HTTP error encountered", r_request.status_code)
Clearly, none of this is meaningful unless we run it after taking a photo using the Pi Camera. Out comes the function we used before, now with the ability to archive the photos being taken (so that we can inspect them later, rather than having them overwritten).
def _detect_object(camera):
    camera.capture(SOURCE_IMAGE)
    discovered = _find_objects_in_image(
        SOURCE_IMAGE,
        GOOGLE_VISION_API,
        GOOGLE_VISION_TOKEN
    )
    copyfile(
        SOURCE_IMAGE,
        ARCHIVE_IMAGE
        + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
        + ".jpg"
    )
    return discovered
If everything is working correctly, we should be able to do something like this to look through the list of objects (described in dictionaries) that have been detected. This particular implementation will allow us to pick the object with the highest confidence score. And send it to our _form_random_opinions() function for processing by the core code underpinning the hyper-intelligence of Project FI. In our actual code (right at the bottom of the post), we actually implemented some bias here for Rosie Patrol to favour objects that she has already formed an opinion on (more on this in a minute).
discovered_objects = _detect_object(cam1)
if discovered_objects is not None:
    best_match = {}
    best_score = 0
    for discovered_object in discovered_objects:
        if discovered_object["score"] > best_score:
            best_match = discovered_object
            best_score = discovered_object["score"]
    speech = _form_random_opinions(best_match["description"])
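Incidentally, the "pick the highest-scoring label" part of that loop can be expressed in one line with Python's built-in max() and a key function. A self-contained sketch with made-up sample detections:

```python
# Sample detections (made up); each dict mirrors a labelAnnotations record
discovered_objects = [
    {"description": "unicorn", "score": 0.62},
    {"description": "flooring", "score": 0.91},
    {"description": "toy", "score": 0.78}
]

# max() with a key function returns the highest-confidence record
best_match = max(discovered_objects, key=lambda o: o["score"])
print(best_match["description"])  # flooring
```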
Now, we realise that we kept promising you the stupidification of robot intelligence on a grand scale. And everything so far seems boringly sensible. And worryingly, quite useful.
Here's a completely unrelated before photo of Rosie Patrol, prior to being subjected to human interference. She is quite clearly very sophisticated.
Back to our mission.
That's right: _form_random_opinions() is precisely where we are attempting to scientifically mimic human behaviour. No expense has been spared in developing this advanced military-grade algorithm. It puts the F back into FI. Yes, we're doing well. We're cooking on gas.
The function revolves around 2 lists:
rosie_likes = []
rosie_dislikes = []
Not unlike humans, we'll get Rosie Patrol to maintain running lists of things that she likes, and dislikes.
And _form_random_opinions() does the following as objects (new and old) are detected:
- If it's an object that's already in Rosie Patrol's rosie_likes list, we'll randomly construct a positive sentence based on some stock 'like' responses stored in a list of tuples: rosie_likes_preambles
- Similarly, if it's an object that's already in her rosie_dislikes list, we'll randomly construct a negative sentence based on some stock 'dislike' responses stored in a list of tuples: rosie_dislikes_preambles
- If it's an entirely new object, we'll randomly like or dislike it, and store the result in one of our 2 lists. Remember: Rosie Patrol will never change her mind (although stopping the Python application clears her memory, quite literally).
rosie_likes_preambles = [
    ("My computer brain goes all fuzzy when I see", "I just love it!"),
    ("Life is so much better when", "is there."),
    ("Beautiful. Just beautiful.", "is a work of art."),
    ("Yes! I couldn't possibly imagine life without", ""),
    ("Dear ", "I think I'm in love with you.")
]
...And this is _form_random_opinions():
def _form_random_opinions(discovered_object):
    global rosie_likes
    global rosie_dislikes
    if discovered_object in rosie_likes:
        selection = randint(0, len(rosie_likes_preambles)-1)
        speech = (
            rosie_likes_preambles[selection][0]
            + " " + discovered_object + " "
            + rosie_likes_preambles[selection][1]
        )
        _post_rosie_json_request(API_EYES_URL, "expression", "happy")
        _post_rosie_json_request(API_LIGHTS_URL, "light", 1)
    elif discovered_object in rosie_dislikes:
        selection = randint(0, len(rosie_dislikes_preambles)-1)
        speech = (
            rosie_dislikes_preambles[selection][0]
            + " " + discovered_object + " "
            + rosie_dislikes_preambles[selection][1]
        )
        _post_rosie_json_request(API_EYES_URL, "expression", "broken")
        _post_rosie_json_request(API_CONTROL_URL, "control", "stop")
        _post_rosie_json_request(API_LIGHTS_URL, "light", 3)
    else:
        like_or_dislike = randint(0, 1)
        if like_or_dislike == 0:
            rosie_likes.append(discovered_object)
            speech = "I've randomly decided that I like " + discovered_object
            _post_rosie_json_request(API_EYES_URL, "expression", "happy")
            _post_rosie_json_request(API_LIGHTS_URL, "light", 1)
        else:
            rosie_dislikes.append(discovered_object)
            speech = "I've made up my mind. I don't like " + discovered_object
            _post_rosie_json_request(API_EYES_URL, "expression", "broken")
            _post_rosie_json_request(API_LIGHTS_URL, "light", 3)
    return speech
You might have noticed a few API requests thrown in for added excitement. Clearly, we want Rosie Patrol to perform a few other things as she notifies us of her inner thoughts, like change her eyes, lights and even to stop her movement (if she's moving at the time).
The speech - constructed in mp3 format using gTTS - is played back through speakers using omxplayer, just like before.
Now, if you're still reading this, it's probably because you desperately wanted to ensure that we delivered on the promise of meaningless robot sound effects. Here it is, sandwiching our object detection routine. It's been put into a thread, so that sound effects play in the background while the detection is taking place. Furthermore, we won't continue with the rest of the program - enforced using .join() - until the sound effects have stopped playing, forcing you to sit through all of its majestic noise.
t_soundfx = Thread(target=_play_sound, args=(SFX_PROCESSING,))
t_soundfx.daemon = True
t_soundfx.start()
_post_rosie_json_request(API_LIGHTS_URL, "light", 4)
discovered_objects = _detect_object(cam1)
t_soundfx.join()
This particular jingle was obtained from ZapSplat, which appears to be a free, downloadable repository of sound effects.
The moment of (not so true) truth:
This is it. The grand unveiling of Project FI. You can now pilot Rosie Patrol around the chosen terrain - hopefully populated by lots of
...And you'll soon know just how strongly she feels about the things she's encountered before (flooring in particular is on the wrong end of a pretty vicious verbal tirade), when such things drift into her range over and over again...
And, wholly for our amusement (and unlike with our brains), her likes and dislikes are far from concealed: they're available for all of us to see in the rosie_likes and rosie_dislikes lists in real-time. Now, if only we could do that with real people... Oh, I forgot. That's what Twitter is for.
Below is a picture of our highly professional testing ground, equipped with the most technologically advanced test objects, modelled on items that autonomous robots are highly likely to encounter during their top secret missions to defeat world evil. Yes, that is a guitar, toy pushchair, baby (note: not real), and a unicorn (note: not real either) amongst several other entities to frivolously form an opinion on.
Don't forget to set your API key in the Linux environment variable.
export ROSIE_GOOGLE_API_TOKEN='your_secret_key...'
Also, don't forget to monitor your Google Cloud Vision API usage. Depending on how frequently you send API requests, you might approach or exceed your quota.
Let's fire up our application and see what Rosie Patrol learns to like and dislike over time.
python3 random_opinions.py
...And the results are (fairly) amusing.
Clearly, the outcome is highly dependent on the quality of the photos, and how accurately Google Cloud Vision API is able to label the objects detected in them. Also, the code applies no context whatsoever when interpreting the objects. For example, is there likely to be a real unicorn marauding through a
Interestingly, in this particular setting, flooring becomes a recurring theme in every single photo taken. And our little Python fix to make Rosie Patrol prioritise her attentions on the things that she has already seen before becomes prominent (and rather annoying). Despite unicorns and cute little puppies seeking her attention, she becomes dangerously obsessed by the evils of flooring, as she has already formed an opinion on it, and because it keeps making an appearance in the photos. She's quite clearly pandering to an audience... an audience that simply cannot tolerate flooring.
That's not all.
Rosie objects to bottles.
...And she does not like guitars.
Here are the lists of likes and dislikes compiled during one run.
And here's another.
At this point, we could put Rosie Patrol into 'auto' mode for several hours, and see what she learns to like and dislike over that time. That way, we really could prove if FI is the answer to all of the world's problems.
Then again, it most probably isn't. It stinks. And that's probably why thousands of very clever boffins around the world continue to work on another (arguably more respected and legitimate) field of science: Artificial Intelligence (AI). And for this reason, for now, we'll bottle the F in FI away, and let the world return to worrying about the state of intelligence, in general.
By the way, the entire code can be found here, if you too are thinking about ejecting one out:
The devil is in the detail:
Google Cloud Vision API for object labelling is documented here. There's not much to it. Really.
Python Requests docs, if you need a refresher:
Our completely meaningless sound effects were obtained from:
The motor controller board and libraries in use are from Monk Makes. Without movement, this isn't as much fun!