Tea minus 30

We're fast approaching Christmas time. And if robots were to make one simple observation about the human species during the Christmas festivities, it's that they watch a lot of TV. A LOT. Often, accompanied by an inappropriate amount of greenhouse gas-producing food. Stuff you don't normally eat during the remainder of the year - for good reason.

And most so-called shows on TV are boring to robots like Rosie. After all, why watch a minor subspecies of the human race - celebrities - stumble awkwardly around the dance floor, dressed like a faulty, sparking circuit board? Such branch of entertainment doesn't require robots to engage any of their proud circuitry. Their processors remain idle. Memory under-utilised.

But if robots are to be part of people's homes (and blend in), they need to look at least a little interested in some of this irrational nonsense. Nobody likes a party pooper. A killjoy. And this is where a certain subgenre of TV entertainment comes to the rescue of our autonomous friends. Game shows let robots participate in the host family's spontaneous (and short-lived) re-acquaintance with logic and intelligence during Christmas. And above all, keep their machinery semi-engaged. Of course, it's also a rare moment for computers to impress. To remind human beings that they are indeed clever enough to take over the world.

And for this little experiment, we have chosen a family favourite.

Countdown is a popular TV show, which has a number of different rounds. The letters round requires the (usually human) contestants to think of a word with the most number of letters, using only the 9 alphabetical letters that have been (sort of) randomly chosen. Best of all, it's a game that is totally robot-friendly. Probably a little too robot-friendly, in fact, to the point of it being easily resolvable using the simplest of code.

So while members of the family scratch their heads, and high-five each other over their pathetic attempts like 'bees', Rosie can smugly shout 'jukeboxes' and ~~annoy~~ impress the room. Well within a fraction of the allotted 30 seconds, clearly. After all, computers playing games against humans (like Go, Chess and Jeopardy) is apparently how uber-geeks and mega-corps demonstrate their (and their creations') intelligence these days.

It's about time Rosie got in on some of that man vs machine action. Anything - in fact - to avoid watching another minute of human glitter balls prancing around a room, wearing inexplicable grins. And, most importantly, it will help Rosie avoid those awkward moments when her algorithm fails to comprehend a distinctly human joke found inside a Christmas cracker. Apparently - we're told - those jokes about penguins are funny.

Let's get this over and done with...

Before we start any of this, we need a dataset. Specifically, we need a useful dataset that simply contains a list of English words as they might appear in a dictionary (although the definitions are not required here). As it happens, we found one here. The file, google-10000-english.txt, contains - guess what - 10,000 words. Sounds a lot? Apparently not, as there are over 170,000 words in the English language, so 10,000 represents a considerably smaller subset. But as it transpires, it's more than enough for us to test our crude invention.

Using a pre-compiled list also means that we don't fall foul of the dictionary corner - the slightly dystopian word-police of Countdown that takes pleasure in telling us that 'robotbutt' isn't a valid 9-letter word meaning to hit a robot with the top of your head.

Moreover, if we're serious about training our robots to be serial Countdown champions, we would need to go hunting for a more complete dataset. How else would they be able to win rounds with words like 'showbizzy'.

Time to take a peek inside our text file containing the 10,000 words.

But what's this? The list is in non-alphabetical order. We should have first read the instructions that came with it, because they do (quite clearly!) tell us that the words are in order of how frequently they appear in the English language. Not very relevant to us in this context, but probably is if you're doing other stuff with it... like working out how frequently some words appear in the English language.

Uncharacteristically for us, let's give this a bit more thought.

Is the entire list relevant to us once we know which 9 letters have been chosen? Actually, no. Our answer can only be a word that starts with one of the 9 chosen letters. So rather than trawl through the entire dataset of 10,000 words, we can focus our search at a reduced subset.

And to achieve this, we should neatly re-organise the list under the very first letter that they begin with. This will allow us to only look at the relevant portions of the list, using the letters in question.

Time for more keyboard bashing, and less chatting. Let's turn our attention to the Raspberry Pi, and the Raspbian operating system, to get our little show on the road.

First off, we'll create a sub-directory called '/dictionary' in which we'll store the file. For reasons discussed before, we may end up using other lists, from other sources, so it's best to keep these files in a designated location.

mkdir dictionary

...creates the directory in which to store our files containing words

That's it in Linux (Raspbian). Although we'll create a Python file later with the actual code (countdown.py), for now, we'll test this interactively using IPython.

ipython

...launches the IPython interactive shell, allowing us to code away

In our Python code, we'll open our text file using open(), and store the lines of words (strings) contained in it as a list data structure. Something like this will achieve just that by repeatedly appending to the list - we've called word_list - using append():

word_list = []
with open(input_file) as in_file:
    for line in in_file:
        word_list.append(line.strip())

Now that we have a list - word_list - we can quite easily sort the words stored in it in alphabetical order, like this:

word_list.sort()

At this point, if we, for example, look at the first 5 entries of word_list (which is possible using the list slicing operator [:5]) we can see that they appear to be in alphabetical order. Looking good so far.

print(word_list[:5])

We're not sure that 'a', 'aa', or 'aaa' will win our robot friends any Countdown games (nor do they actually look very legitimate!) but let's pretend that the dictionary corner is still busy checking if 'robotbutt' is actually the bit left of a robot when others have stopped smoking it.

Note that word_list is merely still a list of words, sorted in alphabetical order. Remember that we wanted to organise the words under the letters that they begin with? A dictionary data structure would be perfect for this: the starting letter could be the key, and the list of associated words could be the accompanying value for that letter.

But before we proceed, we need to know what letters are in the alphabet. No, seriously. Of course we know. But our Python program needs to too. For this, we'll create a simple tuple data structure containing the letters of the alphabet, in lowercase. Rather helpfully, this can be done in a single line based on string.ascii_lowercase.

import string
alphabet_letters = tuple(string.ascii_lowercase)

...And we use that tuple - alphabet_letters - to trawl through our list of words (word_list), to organise all the words under their respective first letters in a dictionary data structure (words_grouped).

x_letter = 0
words_grouped = {}
while x_letter < len(alphabet_letters):
    y_word = 0
    words_temp = []
    while y_word < len(word_list):
        if word_list[y_word][0] == alphabet_letters[x_letter]:
            words_temp.append(word_list[y_word])
        y_word += 1
    words_grouped[alphabet_letters[x_letter]] = words_temp
    x_letter += 1

At this point, it's worth noting that everything up until now doesn't necessarily have to be repeated with every game of Countdown. That's only needed, if you decide to choose a different text file as the input, such as in the unlikely event of suddenly deciding to play Countdown in Esperanto.

Printing out the list for a particular letter confirms that the words_grouped dictionary is now storing the words in a structure that is easier for us to search. For example, if we address the key for 'd', we can see that the first 5 entries of the accompanying list all begin with the letter 'd'.

print(words_grouped["d"][:5])

Now, let's generate some random letters solely for the purpose of testing. 9 in total are required, like in the actual show. And let's make sure we throw in some vowels, and set a minimum number of them. It's apparently hard to make many legitimate words without any vowels.

from random import randint

min_vowels = 2
vowels = ("a", "e", "i", "o", "u")
num_letters = 9

chosen_letters = []
x_letter = 0
vowels_found = 0
while x_letter < num_letters:
    if vowels_found < min_vowels:
        suggested = vowels[randint(0, len(vowels)-1)]
        vowels_found += 1
    else:
        suggested = alphabet_letters[randint(0, len(alphabet_letters)-1)]
    chosen_letters.append(suggested)
    x_letter += 1

Let's see what our resulting list (chosen_letters) contains:

print(chosen_letters)

Looks like we've achieved our aim. We have 9 randomly generated letters contained in our list - chosen_letters. There's some vowels thrown in there for good measure.

The letters round is now in full swing. Expect the Countdown clock to... well... count down for 30 seconds. Presumably this is where the game gets its name?

Lastly, the most important bit. We need a little algorithm to find the words from words_grouped that can be constructed solely from letters available in chosen_letters.

The first task is to work out how many letters we have of each in chosen_letters. This info is actually quite important, as we need to be able to take into account instances where more than one of the same letter appears. We store the counts for each letter in letter_occurrences.

letter_occurrences = {}
o_letter = 0
while o_letter < len(chosen_letters):
    letter_occurrences[chosen_letters[o_letter]] =\
        chosen_letters.count(chosen_letters[o_letter])
    o_letter += 1

The above routine creates a letter_occurrences dictionary that contains the letters, and the counts of how many times they appear in chosen_letters.

Then, we do the actual matching. We work through the relevant subsets of words_grouped, and check if the letters appear as per their counts specified in letter_occurrences. We store matching words in a list - results - along with the number of matching letters. Below is our version of this algorithm. We're sure there are many ways to significantly improve this (although if your program always completes under 30 seconds, there may not be a reason to).

Notice the cheeky sort() at the end. This is how we organise the list in order of word length (high to low), allowing us to see the best answers first.

results = []
while x_letter < len(letter_occurrences):
    y_word = 0
    while y_word < len(words_grouped[list(letter_occurrences.keys())[x_letter]]):
        z_letter = 0
        letter_count = {}
        while z_letter < len(chosen_letters):
            current_word = words_grouped[list(letter_occurrences.keys())[x_letter]][y_word]
            letter_count[chosen_letters[z_letter]] =\
                current_word.count(chosen_letters[z_letter])
            if (letter_count[chosen_letters[z_letter]] >
                    letter_occurrences[chosen_letters[z_letter]]):
                letter_count = {}
                break
            z_letter += 1
        if letter_count:
            match = sum(letter_count.values())
            if len(current_word) == match:
                results.append((current_word, match))
        y_word += 1
    x_letter += 1
results.sort(key=itemgetter(1), reverse=True)

Let's see what our results list contains. Here, we're showing the top 10 words.

print(results[:10])

...So it appears as though the algorithm has found 1 word with 5 letters ('modem') that can be made up of our earlier, randomised 9 letters: o-e-j-m-g-k-d-m-k. A quick human check confirms that the answer itself is correct. If the dictionary corner wasn't too busy working out if 'robotbutt' refers to a large barrel used to collect robots, they'd also confirm that 'modem' is indeed valid. They might also tell us that there is another possibility, 'joked', which does not appear in our google-10000-english file for some reason.

Evidently, we should organise this code better. Here's our Python application that incorporates all of the above in a single application. The code has been relocated under separate functions because, a) it's a lot more readable, and b) some of it can be re-used this way. For example, the solve() function is intended to be callable from outside this module (countdown.py). We'll explain why in a minute.

Run it a few times, and we'll see that 9 letters are randomly generated, and that the matching words with the most letters are outputted each time.

python3 countdown.py

This is our first (and probably last) attempt at creating the algorithm. Clearly, it could be tuned to be a lot more efficient. For example, we could add another layer to the search, and search first the words with the most letters, and stop when one is found. After all, we might just want Rosie to win the game (that is the point isn't it?). She doesn't get extra points for working out all the possible words

Also, Rosie is only as good as the dataset she's fed. Equipped with a 10,000 word list, there is every chance that she'll be beaten by an uncharacteristically clever human with knowledge of the 170,000. But that's easily fixed by replacing the text file in our /dictionary directory with a more comprehensive list of words.

So what's left to do?

We need to revisit the work we did before with Rosie on optical character recognition (OCR) and text-to-speech, to produce something that (semi-)works, together.

Here's a very amateurish drawing to brainstorm our idea. Let's call it a design as it makes it sound like we know what we're doing.

If you were able to read the handwriting, you'll now know that the program will:

1	Take a photo of the TV screen with the Countdown letters round letters shown, using picamera
2	Carry out OCR using Google Cloud Vision API and Python's Requests
3	Run relevant bits of our Countdown.py algorithm, using solve(), to 'solve' the letters round
4	Use Python's gTTS to convert the answer into speech (mp3)
5	Play the audio back using omxplayer and a speaker connected to the Pi
6	Definitively and scientifically prove that computers really are smarter than humans. By getting beaten by a titchy robot every time.

We've described how we do all of these other stuff previously, so we won't dwell on them for too long in this post. All we did here was develop another Python application (play.py) that does all of the good stuff mentioned above, and call the solve() function from the earlier module (countdown.py) when it has the text identified from the photo.

play.py looks like this:

And there it is. Run play.py when the TV screen is showing the letters from Countdown's letters round, and Rosie does all the things she needs to before gleefully shouting out the top answer back at us.

python3 play.py

Here's a view Rosie has of the first game she's decided to play.

And here's her moment of inspiration:

And here's another.

And what does Rosie have to say about this?

Of course, if you're feeling particularly ambitious, you could get your robot to do other things with the answer; like Tweet it, or send an email to your family members (extended relatives included) with all the possible answers. But that would perhaps be a little too annoying. And the robot is likely to accidentally (at least, according to them) end up in the skip.

There's a few little things to bear in mind if you're really going to take winning Countdown seriously. First of all, there are other rounds - specifically, the numbers round. You'll need a different algorithm for solving that, and arguably, it will be a little more complicated than working through a list of words. Secondly, unlike the super-machines that play Go, Chess or Jeopardy, Rosie isn't really playing against humans here. She's not reacting to any moves that other contestants make, or adapting herself to a particular situation she finds herself in. Therefore, we can't really claim to be employing any form of artificial intelligence. Her only advantage here is that machines can access a ridiculous amount of structured data quickly, and have the raw power to process through them.

...Which as it happens, is all you need to win the game.

So there it is. Completely contrary to our objective, we've managed to make Rosie unbearably antisocial during the Christmas holidays; she'll smugly shout out the answer during the Countdown letters round like a know-it-all, and leave us feeling distinctly inferior (and somewhat cheated).

But fear not. We'll simply console ourselves reciting jokes from the Christmas crackers. After all, there's no known algorithm that'll ever allow robots to find them funny.

Now we hear that dictionary corner has reached a conclusion about our 9-pointer; 'robotbutt' means...

This little project was featured in issue 66 of the official Raspberry Pi Magazine (MagPi). Thanks for the interest!

Rosie the Red Robot

Search This Blog

Tea minus 30

Let's get this over and done with...

Labels

Comments

Post a Comment

MOST VISITED (APPARENTLY)

LoRa-Wan Kenobi

Battle of BLEtain

Hard grapht