
Rage:MKR




Where are you? Why do you hide?
Where is that "perfect" ML model that leads to our demise?
Just like the Rage:MKR goes in search of his AI uncontrolled
I search for love, for an (unpredictable) algorithm to have and scold

Yep, it's 2020.  And there will be yet another James Bond film blessing us with its presence.  And if you know your spy films, you'd know that No Time to Pi will be the 25th instalment. If you also know your IANA TCP port allocations, you will recognise that this magic number is also the designated port for SMTP*.

So here's an unencrypted message transmission to you (via umpteen untrusted servers and several nosey intelligence agencies).  To celebrate this occasion, we're going to interrogate our AWS IoT Analytics data using QuickSight and Jupyter Notebooks, then poorly execute a bit of machine learning on our helpless IoT data with the help of our obedient minion: Scikit-learn.

Ladies and gentlemen... please sit back and enjoy this bewildering spectacle.  It's sure to be bankrolling some lucky entertainment industry mogul's retirement, somewhere.

*Random unnecessary fact of the day, brought to you by this film's main glamorous sponsor: The SMTP Appreciation Society.

Pi, Another Day (and Another, then Another)

This is that recurring scene that infiltrates every single James Bond film in which the frugally named Q introduces highly implausible yet impossibly confidential gadgets, often assembled out of items chosen at random from an Argos catalogue, that will end up coming to the protagonist's belated rescue in the most dramatic of fashions.  Reinforced titanium barbecue tongs that will be used to hang off an Alpine cliff-face after the inevitable action sequence involving a helicopter and a nitro-powered Fiat Cinquecento.  A hot-wired Dyson vacuum equipped with a concealed nuclear reactor that will be used to repel a small contingent of mysterious assassins with implausible (but invariably tragic) back stories.

But as we don't have the sizeable resources of Universal to throw at this production, nor a long-forgotten cache of Nectar points, our underwhelming arsenal will merely consist of unclassified, less imaginative technology, similar to that of a world power temporarily experiencing a severe fiscal deficit.  In fact, our kit is simply a rehash of the devices that were introduced back in the prequel that everyone saw, but can't quite remember: SD:S3 The Untold Love Story (like that Bond film starring George Lazenby).

Namely, this is all that the current departmental funding permitted:

  • An ESP32 development board running MicroPython makes a hasty return to the fray like Daniel Craig in search of additional funds for a retirement villa in Monaco.  Sturdy.  Dependable.  Amiable.  And that's just the venerable microprocessor, popular with IoT tinkerers.  The jury is still out on Dan the Man from Cheshire.
  • You probably missed M's slightly condescending speech at the start setting out the entire pretext for the ensuing 126 minutes of explosions, car chases and romantic liaisons without the modern conveniences of Tinder.  In which case you probably casually swiped left on the additional detail that was rushed into the screenplay when the original running time (minus distracting content) was estimated to be approximately that of an Octonauts episode.  Yes, there was an SSD1306-driven 128×32 OLED screen to display the end-of-the-film countdown to a disaster taking place (predictably avoided with just 1 second to spare).  Our tireless BME280 sensor to measure atmospheric temperature, humidity and pressure at the ruthless villain's surprisingly tropical and cheery island tax haven.  Lastly, an SD card module used to store what little remains of the final cut of Bond #87 (working title: AliExpress - From China with Love) once all product placement has been removed... which turns out to be not very much.
  • In the famous words of the Queen of Pop, I guess we'll Pi another day, except we've ignored her high-pitched mutterings, and have gone ahead and incorporated everyone's favourite single-board computer into the project.  Because our Raspberry GoldenPi continues to run Her Majesty's Secret Greengrass Core service, re-publishing data received from our ESP32 development board, using a Greengrass Group Subscription, to AWS IoT Core using MQTT.  We have kept our one Rule to flood AWS IoT Analytics with our sensor readings, since this is really where the focus will be in this latest #code-block-buster (see the sketch just after this list).
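
As promised, here's a rough MicroPython sketch of the ESP32's publishing leg.  Consider everything in it an assumption made for illustration: the broker address, topic and readings are fabricated, and the TLS certificate ceremony that Greengrass actually demands is omitted entirely.

import json
import time
from umqtt.simple import MQTTClient  # MQTT client bundled with many MicroPython builds

# Hypothetical sketch only: address, topic and values are made up; TLS setup omitted
client = MQTTClient("esp32-007", "192.168.1.64")  # Greengrass Core address (assumed)
client.connect()
reading = {
    "timestamp": time.time(),
    "temperature": 21.5,  # degrees C, from the BME280
    "humidity": 48.2,     # % relative humidity
    "pressure": 1013.1,   # hPa
}
client.publish(b"bme280/readings", json.dumps(reading).encode())
client.disconnect()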


To get the detailed back story to all this nefarious kit, you really will need to waste your Friday night scrolling through IMDb to read through the pretentious, opinionated critiques that are made in the User Review section of SD:S3 The Untold Love Story.  Yes, the SD card starring in it really was a subliminal metaphor for the trials and tribulations of modern life, and the overwhelming expectations placed on us by society.  Thank you for noticing.

0x00 0x00 0x07

We don't have 24 of them.  But there sure have been some earlier adventures involving our latest larger-than-LifePO hero - I-O-Mr-T.  If you aren't that life of the party who can name them all from memory in chronological order with a mouth full of Cheerios laced with Martini, here's a cheat-sheet to have concealed underneath your suave satin napkin:

  1. Gold Filing
  2. Castle Track-a-lot
  3. Chariots of Wire
  4. Athlete's Foot
  5. SD:S3 The Untold Love Story

Dr. NOSQL Please, We're British

We have assembled our glamorous cast.  We have randomly picked exotic locations around the world that are pivotal for tax efficiency storyline reasons.  The fuel tanks in our unaffordable super cars are full, and ready to be blown up.

What now?  How does this particular story go?

  • We have BME280 sensor data being pumped into AWS IoT Analytics via a Channel.  Which means - strictly in AWS IoT Analytics lingo - we have data populating its internal Data Store via an ordinary, not-so-controversial Pipeline.  And we can intermittently create point-in-time Data Sets of our data with which we can perform our analysis.  In short, we're free of all the various database / storage technologies we've experimented with to date... such as DynamoDB, S3, Elasticsearch Service, InfluxDB, Sqlite, council-provided wheelie-bin, etc.  We're relying wholly on the internal workings of AWS IoT Analytics so that we can instead spend our time skiing down a mountain while being chased by unreasonably angry men in badly made snowmobiles that ought to be recalled for being prone to catching fire.
  • Now let's pretend that we're evil geniuses suffering from a bout of midlife crisis in our heavily fortified, mountain lair while monitoring the world engulfed in dire panic.  There wasn't any budget leftover in our Swiss bank account to send our henchmen on a training course for a "groovy" open source visualisation tool (like Grafana or Kibana; Flat Eric in Hard Grapht was lucky, as Evil Corp still had a training budget before every money["penny"] was spent on the purchase of octillion-watt lasers).  So we're just going to have to order our minimum-wage minions to watch the world burn using AWS QuickSight.  Besides, it's a tool we haven't played with yet, and it sounds like other things that a crazy arch-baddie might find equally enjoyable, like dancing the Quickstep, a spending spree at Quicksilver, or throwing their adversaries into quicksand.  And yes, our flirtation with QuickSight will indeed be quick (and quite vague).
  • Finally, we'll launch a Jupyter Notebook from AWS IoT Analytics and perform a bit of data hokey-pokey using some of the popular Python data handling tools we encountered before in Athlete's Footnote, like Pandas and Matplotlib.  At this point we're also going to fabricate a completely nonsensical machine learning use case so that we can hoist in Scikit-learn - a lightweight Python machine learning tool - to train and test a linear regression model, and use it for predictions.  Like a good film, we reach the climactic point that implies that something grander could be at play - namely, the deployment of said Notebook or trained model in AWS's managed machine learning platform SageMaker, or in Greengrass, to operationalise it - but did we mention that we were out of budget?!  Yep, we stop right there, and instead, tease you with the infinite possibilities (and deliver you none).  And we hit you with another three-year wait for the next instalment, a period filled by Johnny English substitutes.


At the risk of sounding like Martin Lewis from the MoneySavingExpert, please a) research AWS charges for the services in question, b) keep an eye on your AWS usage and projected bills, and c) brush and floss your teeth every night unless you want to look like Jaws.  Remember your friends: trial periods, free tiers, and stopping / deleting services when you stop using them.  Your arch-enemies are: forgetting about services you've provisioned (and leaving them running), greedily selecting the largest sizes for everything (because you can), and if you are trying to use your smart phone outdoors in the UK at the height of winter, Coldfinger.

Yawn Connery

Like Roger Moore, AWS IoT Analytics has featured in our adventures many times before, starting with Quantitative Wheezing.

We have discovered it to be a managed platform which is designed to help mere minions ingest, manipulate and inspect IoT data en masse, and have it be stored in a purpose-built time-series datastore that we need to know practically nothing about.  That is pretty much all we can say about it; because of official secrets, for your eyes only, special clearance and the need to know, and all that.

Here's the official diagram we have dug out from MI7's secret intelligence archives located somewhere below a nondescript Matalan store built on a major flood plain.  As you can see, we have annotated it with red text to instil the impression that our entire intelligence apparatus was mobilised to make this meaningful assessment.


The report has identified the following constituents to this plot:

  • [Alpha, Bravo, Charlize (Theron)] These are James Bond villains' favourite methods of imparting torture - diggers, death-claws, and menacing-looking remote control devices (don't forget the countdown clock!) - which, apparently, are all smart devices these days, Internet-connected to AWS IoT Core, which in turn feeds our Channel.  The fact that AWS IoT Core itself is being fed data from the ESP32 development board, via a Greengrass Core device, is conveniently ignored in this plot-line, along with the general rules of physics and consideration for diplomatic relations.
  • [Delta] Should we choose to accept the brief, we can frivolously doctor incoming data using a Pipeline (presumably to create a pretext to send the nation to war)... using good old-fashioned mechanical cogs and flatulence of the vertical take off and landing (VTOL-F) variety judging by the graphic.
  • [(Amazon) Echo] ALEXA, will we need to have somewhere in which to store our incoming time-series data?  "Sure, lazy human. These are the pictures of time-series databases I've found online".  AWS IoT Analytics has a built-in Data Store.  But you knew that already.  So why ask Alexa?
  • [Foxtrot] This is where it gets kinda fancy, as we can create point-in-time Data Sets of our data, and query it to our heart's content... (a boto3 sketch of the whole chain follows just below this list).
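
And because Q Branch likes to show its workings: here's a rough boto3 sketch of how that whole Channel / Pipeline / Data Store / Data Set chain might be stood up programmatically.  All the names below are made up, and IAM permissions and retention settings are conveniently ignored.

import boto3

iota = boto3.client("iotanalytics")
# A Channel receives the incoming messages
iota.create_channel(channelName="bme280_channel")
# A Data Store is where processed messages end up
iota.create_datastore(datastoreName="bme280_datastore")
# A Pipeline connects the two (extra activities could doctor the data en route)
iota.create_pipeline(
    pipelineName="bme280_pipeline",
    pipelineActivities=[
        {"channel": {"name": "ingest", "channelName": "bme280_channel", "next": "store"}},
        {"datastore": {"name": "store", "datastoreName": "bme280_datastore"}},
    ],
)
# A SQL-defined Data Set takes a point-in-time snapshot of the Data Store
iota.create_dataset(
    datasetName="bme280_dataset",
    actions=[{"actionName": "sql_query", "queryAction": {"sqlQuery": "SELECT * FROM bme280_datastore"}}],
)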

We will be interrogating our data stored in our AWS IoT Analytics Data Store, via Data Sets we have created from it, using two distinct methods: QuickSight, and Jupyter Notebooks.

QuickSight is AWS's managed data visualisation and reporting tool.  It is accessed using the browser, and looks kinda like Grafana, and to a lesser extent Kibana.  Like Bond's dispensing of his adversaries, it is primarily a point-and-click affair, and since it will happily connect to AWS data sources such as an AWS IoT Analytics Data Set, not a lot of effort needs to be expended in creating people-friendly charts and presentations.


Jupyter Notebook, on the other hand, is an open-source web application that allows users to run and annotate Python code interactively.  It is popular in the data science community, since it can be used to explain sections of what can be quite complex code, alongside the results.  In the land of AWS, managed Jupyter Notebooks appear to be part of the wider machine learning SageMaker platform.  Parsley, Rosemary and Thyme - you won't have to wait long for your time in the AWS spotlight.

OK agents.  It's time to kill one bird with two stones.

Licence to kill -9

If we want to use QuickSight, we need to polish our boots and enlist ourselves at the local KwikSite recruitment office.  There's a limited 60-day trial that we can use (at the end of which we'll scramble to delete our subscription), and the no-frills Standard Edition will suffice for our purposes.

Oh look, this looks exactly like a sign-up page for some sort of streaming service.  We like green ticks, and repeated instances of the word "FREE".  We therefore have all the assurances we need to embark on this risky mission.


There are a couple of QuickSight account-specific parameters to provide on the sign-up page in the next screen: predictably, AWS region, account name, notification email address, and the AWS services we intend to use with it.

Romantic Venice and sunny Jamaica are not yet available as QuickSight-ing locations, so we'll settle for the green, green land of the leprechaun.


Oh no, amidst all the cinematic excitement, we completely forgot to select the other AWS services that we intend to use with QuickSight on the previous screen.  No worries.  We can change these at any time.

Of course, AWS IoT Analytics is the prime candidate here, as its Data Sets will form the source of all our timeless knowledge and wisdom.


One last check to make sure we're definitely on an active, free subscription, and we're ready to go creating some glorified line graphs.


Once a Data Set has been generated in AWS IoT Analytics, we can register it as a QuickSight Data Source.  (Remember, Data Sets themselves are defined and run in AWS IoT Analytics.)


Well, here it is.  Our IoT Analytics Data Set as a Data Source in QuickSight.


...And once that effortless plumbing has been confirmed to be successful, we can start to visualise the data that resides in that Data Set - which is sort of the entire point of this exercise.


Note that at any time, we can choose to refresh the data.  This is important to know, since Data Sets in AWS IoT Analytics are themselves likely to be re-run from time to time to reflect new IoT data.
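
Incidentally, that re-run doesn't have to be a console affair: triggering fresh Data Set content in AWS IoT Analytics is a one-liner with boto3.  A minimal sketch, assuming the Data Set is the rosie_dataset we use later on:

import boto3

# Trigger a re-run of the Data Set, so its content reflects the latest IoT data
client = boto3.client("iotanalytics")
client.create_dataset_content(datasetName="rosie_dataset")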


And with a bit of pointy-pointy-click-click action, in the next screen, we can draw some pretty graphs in our browser using our newly imported data.

Exciting?  Not so much.  Useful to henchmen that like to gain "actionable business insights" from a swamp of data regarding impending doom?  Probably.


Clearly, there is a multitude of graphing options and ways in which to organise and present the data tucked away in these dashboards.  And there is a little more to QuickSight than the shameless charting.  There are features that allow us to pull in data from multiple sources, and to collaborate on our creations with others... hushed in an ominous tone... securely.  But this is all starting to sound like boring Business Intelligence-type work of an office bod, not scintillating secret squirrel work of an intelligence agency case worker.

Well, let's then graduate to something a little more involved, and arguably exciting... Jupyter Notebooks!

For a while we've been wondering what the Notebooks link on the AWS IoT Analytics console menu does. Today is our lucky day, because we're going to find out.  By clicking a button beneath what looks like an interstellar Casio calculator that was no doubt banned from GCSE maths exam rooms.


First, we christen our very own Notebook, and link it to our AWS IoT Analytics Data Set.


At this point, if we don't have a Notebook Instance already, we will need to create one.

Notebook Instances in AWS appear to simply be fully managed servers running our very own Jupyter Notebook application... and are an important carriage in the wider machine learning SageMaker freight train.  Which is also why we want to be careful here and start with the smallest of the small Instance Types (and stop / start them accordingly).  You know, for budgetary reasons, and all that.
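
In fact, the console is just doing the SageMaker API legwork for us.  Purely as a hedged illustration - the instance name and IAM role ARN below are made up - the equivalent boto3 calls might look something like this:

import boto3

# Hypothetical sketch: name and role ARN are made up; the console normally does this for us
sm = boto3.client("sagemaker")
sm.create_notebook_instance(
    NotebookInstanceName="goldenpi-notebook",
    InstanceType="ml.t2.medium",  # one of the smallest types... for budgetary reasons, and all that
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
)
# And later, once it's in service and we're done with it: stop it (don't leave it running!)
sm.stop_notebook_instance(NotebookInstanceName="goldenpi-notebook")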


We should now be able to see our newly created Jupyter Notebook in the AWS IoT Analytics console (which incidentally we can also see in the SageMaker web GUI as well).  When done, we can stop (or delete) our Notebook Instance... which is not a bad idea.


And here it is in SageMaker if we so wish to steer this fast moving freight train to another location (and possibly into the side of a refinery).


Back to AWS IoT Analytics.  Let's launch the Notebook, and unleash our inner Blue Peter.

By default, the AWS IoT Analytics Notebook houses some basic instructions on how to retrieve data from our AWS IoT Analytics Data Set, using the omnipresent Python boto3 client.


But first, just to prove that this is actually just a prettified Python application being presented over the web, let's run a simple print command and see how the results get displayed in the Notebook.

print("I'm a bona fide data scientist!")

Run the cell containing the code, and it'll show the output directly under it in our Notebook, as if we were sitting in front of a less aesthetic Python console.  All very clever.  All very neat.


We can now start to get a little more creative with our Notebook.  You can add "Markdown" text.  Embed links to images of questionable value.  As well as - of course - code.  And run them interactively to display the results.  Everything is all in one place.  Annotated and shareable.  Like, ahem, a notebook, really.


And using a slightly modified recipe of the example given earlier, we can use the boto3 client to get the latest AWS IoT Analytics Data Set content as a URL to its CSV file, using get_dataset_content().

import boto3
import pandas as pd
import matplotlib.pyplot as plt

# Instantiate IoT Analytics Python client using Boto3
client = boto3.client("iotanalytics")
dataset = "rosie_dataset"

# Fetch the latest successfully generated content of our Data Set
response = client.get_dataset_content(datasetName=dataset, versionId="$LATEST_SUCCEEDED")
# This is actually the URL for a downloadable CSV of the data set
print(response["entries"][0]["dataURI"])

Now that we have a completely illegible URL from which to retrieve our CSV, we can directly load the data from the file into a Pandas dataframe, and rearrange it to be indexed, and in chronological order.

# Load content of CSV into a pandas dataframe
df = pd.read_csv(response["entries"][0]["dataURI"])
print(df)
# Rearrange in ascending timestamp, and set timestamp column as index
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values(by="timestamp")
df.set_index("timestamp", inplace=True)
print(df)


There they are.  All our BME280 readings.  Indexed by timestamp, and arranged from earliest to latest.  Clearly, we can now graph this using Matplotlib.

Again, the beauty of Jupyter Notebooks is showcased for all of us to see here.  The graph appears in our browser, neatly tucked in below the code snippet used to produce it.

# Plot temperature and humidity
ax = df.plot(y="temperature")
df.plot(y="humidity", ax=ax)
plt.show()
# Calculate simple moving average (SMA) for temperature and humidity, and plot
sma_window = 4
df["temperature_sma"] = df["temperature"].rolling(window=sma_window).mean()
df["humidity_sma"] = df["humidity"].rolling(window=sma_window).mean()
ax = df.plot(y="temperature_sma")
df.plot(y="humidity_sma", ax=ax)
plt.show()


This was kind of where we were going to leave this... but why not take this experiment further?  Not least because we had our ESP32 development board running for a few weeks, and had accumulated a lot of readings.

We're going to re-import our latest AWS IoT Analytics Data Set, which now contains data from a longer time frame.

# Data was collected for slightly longer duration, so let's re-run this with a larger dataset...
response = client.get_dataset_content(datasetName=dataset, versionId="$LATEST_SUCCEEDED")
df = pd.read_csv(response["entries"][0]["dataURI"])
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values(by="timestamp")
df.set_index("timestamp", inplace=True)
# Plot temperature and humidity
ax = df.plot(y="temperature")
df.plot(y="humidity", ax=ax)
plt.show()


...But we're now going to use Scikit-learn to train a Linear Regression machine learning model on the nonsensical pretence that we could predict a potentially missing humidity reading, based on incoming temperature - purely to demonstrate the mechanics of planting the seedlings of machine learning in AWS IoT Analytics data.

Still in the Notebook, we will import the required sklearn libraries and split our data from the Data Set into train and test sub-sets.  We'll use the training data set to train a Linear Regression algorithm, then use the test data set to see how accurate it is at *guessing* the humidity value, based on an arbitrary temperature reading.


Like much of the content of Bond films, this experiment has no scientific merit to it whatsoever, nor does it take into consideration the fact that we are dealing with time-series data which form patterns over time.


Here's a detailed and extremely helpful blog post that we came across on how to run a linear regression model using Scikit-learn, which acted as the inspiration for the content that follows.

Let's give this Scikit-learn malarkey a go.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Use temperature data (x) in an attempt to (badly) predict the humidity (y)
x = df["temperature"].values.reshape(-1, 1)
y = df["humidity"].values.reshape(-1, 1)
# Split the dataset into train and test records 80%:20%
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Use the training data to train the linear regression algorithm
regressor = LinearRegression()
regressor.fit(x_train, y_train)


So that's our trained linear regression algorithm, then. 

We can now use our algorithm to make some predictions using our test data set, and see how accurate those predictions are.

# Then use the test data to make predictions using the linear regressor
y_pred = regressor.predict(x_test)
# Calculate error metrics
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))  
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
# Show first 10 records of actual test data vs prediction
results = pd.DataFrame({"Actual": y_test.flatten(), "Predicted": y_pred.flatten()})
print(results[:10])


The predictions are truly abysmal.  Arguably, no better than just randomly generating a number within a predefined range using Python's randint().
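
Don't just take our word for it; compare against a dumb random baseline.  A minimal sketch, assuming humidity readings in our data roughly span 30-70%:

import random

# A random baseline: guess humidity uniformly within an assumed 30-70% range
y_rand = [[random.randint(30, 70)] for _ in range(len(y_test))]
print("Random baseline Mean Absolute Error:", mean_absolute_error(y_test, y_rand))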

Once we have an algorithm, we can use it to make predictions against new incoming values as well.
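
For instance, given a brand new temperature reading (the value below is plucked from thin air), coaxing a humidity prediction out of our regressor looks like this:

# Predict humidity for a new incoming temperature reading (21.5 degrees C, assumed)
new_temperature = [[21.5]]  # 2D shape (1 sample, 1 feature), as the regressor expects
print("Predicted humidity:", regressor.predict(new_temperature)[0][0])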


Once we have an actual working algorithm, tuned, trained and tested to perfection using petabytes of genuine IoT data from billions of them diggers, death-claws, and menacing-looking remote control devices, we are likely to want to deploy it in the wild, perhaps using SageMaker, or even on the edge using Greengrass.  And perhaps use it to make some spectacular predictions, gain some zen-like insights, or just flood Planet Earth with yet more illegible data.
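
As a parting teaser of those infinite possibilities: persisting the trained model is the usual first step before any such deployment.  A minimal sketch using joblib, the approach suggested by the scikit-learn documentation (file name made up):

import joblib

# Persist the trained model to a file that could be shipped to SageMaker or a Greengrass Core
joblib.dump(regressor, "humidity_model.joblib")

# ...and wherever it lands, load it back and predict as before
model = joblib.load("humidity_model.joblib")
print(model.predict([[19.0]]))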

Right, so it has finally dawned on us that this was more Austin Powers than the lovable British secret agent.  And that the only "insights" we did gain were on just how many puns can be formed using titles of James Bond films.

The world has been saved from catastrophe once again.  The producers have banked another few million.  And worldwide sales of the Aston Martin DB10 skyrocketed to an all-time high of 10.

Which is why we're calling it a day, and rolling the credits.

Read the Novel:

You really ought to read the original AWS IoT Analytics documentation (not just watch the latest movie) if you are planning on using it for real:
Same applies for AWS QuickSight... here's the official user guide:
Pandas can be both a widely adopted Python library for working with large volumes of data... and more than one instance of a less adopted bear native to South Central China.
Myrcury, Myrs, Vynus just don't look like they are spelt right.  Nor does Jupyter.  But here's its documentation anyway:
SciPy is an open source Python library that allows us to perform complex mathematical operations that are at the heart of science, engineering and other trades that will make our parents proud.  So complex, that we can't tell you much else about it, other than where you might find more information on it.
Do we want graphs? No?  Well, tough!  Here's Matplotlib:
There are many reasons why we might look at Scikit-learn for our simple classification, regression and clustering Python machine learning needs.  And even more reasons why we might need to look at its user guides.
And here's a detailed blog post on how to run a linear regression model using Scikit-learn which acted as the inspiration for the last section of this post.
