
Quantitative wheezing

Recent history of earth people is littered with epic examples of data manipulation.  Enron.  Lie-bor.  Bernard made-off with a couple of gazillion.  And the abhorrent scandal that is "shrinkflation".  Everyone is at it.  Shamelessly fiddling their numbers for their own needs and wants.

Well one thing is for sure... we don't like missing out on the action.  Any action.  Even those activities that involve making questionable adjustments in a certain spreadsheet-ing blockbuster from a mega corp based out near Seattle.  And where better to start our apprenticeship in learning to massage the numbers like Leeson in the mid-90s than with a litter tray full of sensor readings obtained by our beloved ESP32 microprocessor.

Nevertheless, after no more than an hour or two, maintaining the ruthless facade of a rogue trader of arbitrary IoT sensor data becomes rather quite tedious and cliché.  After all, there are perfectly excusable reasons for touching up the unglamorous stream of numbers generated by sensors and whatnot.  Not least because those readings themselves could be wrong.  Or simply need to be translated into actionable data points that humans or even other machines can work with.

Attention all wannabe IoT data meddlers... sorry, scientists.  We're about to over-zealously collect data from multiple sensors, and attempt to use AWS IoT Analytics to make sense of it all.


Big data sounds significantly better than the less publicised field of engineering: little data.  At least, the marketing departments of worldwide information technology companies seem to think so.  Therefore, do we want our very own lake / warehouse / reservoir / murky swamp full of trivial strings and numbers to kick-start our analysis?

Of course, we do!

And since our strict ethics prevent us from forging data in its entirety, we will try our best to attach a slew of bargain basement sensors to our lone ESP32 development board to get our metaphorical water-based geological feature populated with bits and bytes.  And... forge the rest.

Listen up gullible investors, here's our winning portfolio that is guaranteed to make you millions (of sensor readings).
  • First and foremost, we'll be maintaining our previous deployment from Green, Green Grass of /home, namely our ESP32 development board running MicroPython (minus the flame detector). Plus our Raspberry Pi running AWS IoT Greengrass Core (minus the piezo buzzer).
  • Then, we will welcome into our high-rise corporate headquarters in the City, an Oxbridge-educated team of fresh recruits:
    • DS18B20 - the Warren Buffett of temperature sensors - makes a predictable return.  A sturdy, reliable digital temperature sensor, it carries on accumulating temperature readings like a true financier.
    • If - like us - you thought Bosch only makes washing machines and power tools, well, you're almost right.  It turns out, they make sensors too.  Specifically, in our case, the BMP280 - which measures barometric pressure, and surprise, surprise... temperature (again).  Hergestellt in Deutschland (or possibly China).
    • DHT11 is a pretty noddy temperature and humidity sensor, unlike its slightly more intellectual, degree-earning cousin, the DHT22.  Oh well, it's cheap.  We first stumbled upon the DHTs back in PiAMI HEAT.
    • A photoresistor.  A what?  A PHOTORESISTOR.  A WHAT?  OK, so it's basically just a resistor which changes its resistance depending on how much light (photons) it absorbs. Why didn't we just say so, instead of shouting.

Effluent-market hypothesis

Our I-O-Tea series is now most definitely open for business.  And as you have noticed, we've been taking you on an unpredictable, random walk of IoT topics, specifically centred around the AWS IoT portfolio.

Where will this end?  No-one really knows.  We can only tell you where we've been.
  1. Frozen Pi
  2. Have-ocado
  3. Green, green grass of /home

Losing our Barings

If there is anything positive to be learnt from rogue trading, and believe us, we had to dig deep to unearth this gem, it is that you should throw everything at a self-enriching cause (including the rare, gold-plated kitchen sink hand-made by an artisan in Venice), and cover your tracks afterwards by tweaking the numbers.  Oh yes, and always avoid (not evade, you understand) tax.

Here's our trading strategy.
  • Attach ALL our sensors to the ESP32 development board.  Don't be shy.  Who dares wins.  Winner takes all.  <Insert another irrelevant fist-pump / high-five quote here>.  Because, like Pokémon, you've gotta attach 'em all.
  • Write snippets of Python code that will collect meaningful data from these sensors.  1-Wire, I2C and Analogue to Digital Conversion (ADC) are all in use, and each requires a different approach.
  • Configure AWS IoT Analytics to ingest, digest and regurgitate our savoury data.
  • Have a go at creating a Lambda function that works with AWS IoT Analytics' Pipeline feature to detect anomalies in our sensor readings... and... erm... hide them away in a "tax efficient" bank account in the Cayman Islands.  Nothing to see here, officers.  Those temperature readings never happened.

The wolf(ing) of Quality Street

Life is like a box of chocolates.  Except, these days, the chocolates are more likely to be smart confectionery decorated with sensors that track your GPS coordinates, amount of calories you are consuming, and through machine learning, classify whether you are in fact a cat or a dog.

We have already voiced our intentions to plant some sensors on our ESP32 development board in the interest of generating data.  And after our prior excursion in Green, Green Grass of /home, we have an ESP32 which is sending MQTT payloads back to AWS IoT Core using an intermediary Greengrass Core device.  Therefore, it won't surprise you to know that our new measurements too will be dispatched back across the Internet to the fluffy, cuddly Clouds.

The question, then, is what else will we be doing with this data?  Unlike stolen credit card details, or Facebook with their user accounts, monetising our highly suspect sensor readings on the dark web isn't an option for us.  OK, let's store them in a datastore instead, and query it like a good citizen.  Moreover, we'll see if we can apply some programmatic logic before the data is committed, to fake... ahem, enrich the numbers.

AWS IoT Analytics is a ready-made platform which is designed to help us ingest, manipulate and inspect IoT data en masse, before it is stored in a purpose-built datastore.  Once there, we can query and analyse them to our heart's content, like a concerned data scientist high on caffeine.

Yes, it's time for us to distil this solution diagrammatically down into a number of rectangles and arrows with associated labels.  All in the name of making this entire operation seem professional.

The big swimming short

We start this episode with our ESP32 development board, attached to our 4 chosen sensors.

We hate to hit you so early on with a mesmerising picture of a microprocessor attached to a number of sensors, but we simply couldn't avoid it.  Here's our entrant for the Pulitzer Prize.

First things first: the DS18B20 needs no introduction.  It returns to invade our peace time and time again, like a brand new series of Britain's Got Talent.  This rather familiar contestant is a mere digital temperature sensor, and its only talent is... measuring temperature.  And it does its job rather quite majestically, like Susan Boyle, but without the flair or finesse.

Still, it's worth reminding ourselves how to obtain temperature readings from a DS18B20 using the 1-Wire protocol.  Here we go.

import time
import machine
import onewire
import ds18x20
one_wire_pin = machine.Pin(32)
one_wire_bus = ds18x20.DS18X20(onewire.OneWire(one_wire_pin))
temp_sensors = one_wire_bus.scan()      # ROM codes of attached DS18B20s
one_wire_bus.convert_temp()             # kick off a conversion on all sensors
time.sleep_ms(750)                      # 12-bit conversion takes up to 750ms
for rom in temp_sensors:
    print(one_wire_bus.read_temp(rom))  # reading in °C

We've used our next sensor before also... the DHT11 temperature / humidity sensor.  This is more a Jedward.  You get two sensors in one - temperature and humidity - but their readings tend to jump around excitably and chaotically.  And it leaves us ultimately wondering about the contestant's quality, or more generally, purpose in life.  Is it a novelty act?  Consigned forever to be a warm-up act to a more established star?

Let's give it a whirl anyway since we have it on hand, using MicroPython's handy, built-in dht library.

from machine import Pin
import dht
d = dht.DHT11(Pin(26, Pin.IN, Pin.PULL_UP))
d.measure()                           # trigger a fresh reading
print(d.temperature(), d.humidity())  # °C and % relative humidity

No-nonsense (albeit suspicious) observations, we're sure you all agree.

Let's continue on our mission to overpower our ESP32 with countless more sensors and readings.

Here's something new.  We now turn our attention to a photoresistor, an artist that also tours the world with the catchy name, light dependent resistor (or LDR).  And compared to the rest, this performer is slightly indie.  A photoresistor module is essentially a voltage divider circuit created using a) a light dependent resistor and b) a second (fixed) resistor.  Through the wonders of Ohm's Law, the output is therefore a variable voltage.  This voltage reflects how much light is or isn't being detected by the photoresistor.

...Which is precisely why for this particular module, we need to rely on ESP32's built-in Analogue to Digital Converter (ADC) capabilities to translate this analogue voltage (0 - approx. 3.6V) to a digital value that we can work with in code.

Having said all this, like a BGT final, it really doesn't get that riveting.  We simply instantiate a machine.ADC class, and periodically read the voltage value detected on the pin. 

import machine
adc = machine.ADC(machine.Pin(33))
adc.atten(machine.ADC.ATTN_11DB)  # approx. 0 - 3.6V input range
value = adc.read()                # raw 12-bit reading, 0 - 4095

What does this value mean?  After all, it looks rather high.

We're using 11dB attenuation (ADC.ATTN_11DB) so we're working in the range of 0 - 3.6V. Moreover, we're sticking with the default ADC.WIDTH_12BIT configuration, which means the digital values can be 0 - 4095 (2¹² - 1).  As such, ADC reading × 3.6V / 4095 gives us the actual voltage value.
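That sum can be captured in a tiny helper.  A sketch in plain Python (the function and constant names are ours; the numbers mirror the 11dB / 12-bit configuration above):

```python
FULL_SCALE_V = 3.6   # approx. full-scale voltage with ADC.ATTN_11DB
MAX_READING = 4095   # 2**12 - 1, the default ADC.WIDTH_12BIT

def raw_to_volts(raw):
    """Convert a raw 12-bit ESP32 ADC reading into volts."""
    return raw * FULL_SCALE_V / MAX_READING

print(raw_to_volts(4095))  # full scale: 3.6V
print(raw_to_volts(2048))  # roughly mid-scale: ~1.8V
```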

With our module, the higher the voltage, the darker it is; the lower the voltage, the lighter.  That's about as scientific as we made it.  For more meaningful measurements that could be used for relative comparison, this value could be further converted into lux - the official unit of illuminance.  But we hear that this relationship isn't strictly linear, and varies by photoresistor, so we left it at that.

Did we say 4 sensors?  So we did.  Then, this must be the last.

It is arguably the most complex.  We are getting acquainted with the German industrial conglomerate Bosch's BMP280, which is grandly billed as an "absolute barometric pressure sensor".  According to its datasheet, it measures temperature, too.  And since there is a scientific relationship between atmospheric pressure, temperature, humidity and altitude, it's also (sort of) possible to call this an altimeter.

Let's just dabble with pressure and temperature readings for now. 

BMP280 is an I2C device. After we instantiate our I2C object, we will use an extremely helpful bmp280 library that's available on GitHub. We could manually read from / write to the memory map defined in the datasheet, but thankfully the authors of this library have already thought this through for us.

from machine import I2C, Pin
i2c = I2C(scl=Pin(5), sda=Pin(4), freq=400000)
print(i2c.scan())  # the BMP280 typically appears at address 0x76 or 0x77

Once the device is detected on the I2C bus, it is truly as easy as creating a bmp280 object while passing our i2c object, and calling its two methods - getTemp() and getPress().

import bmp280
b = bmp280.BMP280(i2c)
print(b.getTemp())   # temperature in °C
print(b.getPress())  # pressure in pascals

All done.

The rather large pressure figure is in pascals, the unit of pressure.  Apparently - at sea-level - atmospheric pressure is around 101 kPa, so there is no reason to place the numbers we're seeing here under suspicion.

Lastly, in an effort to stuff our MQTT payload with even more data, we will also dispatch a timestamp, device name, and IP address.  Additionally, we'll send the memory free value, which can be obtained using the gc library:

import gc
print(gc.mem_free())  # bytes of heap free, included in the MQTT payload

The risk when interfacing with a number of sensors using multiple protocols is that there is always a likelihood of something failing.  And what we don't want to discover is our MicroPython application stuck on the prompt, having encountered an exception.

Therefore, we will trap all known exceptions raised during our interactions with the sensors.  These errors will be assembled as a list, and sent in the MQTT payload also.  The result?  Our program continues to run and obtains whatever sensor readings it can, while the specific error conditions are hoovered up in a dedicated MQTT attribute.
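A minimal sketch of that trap-and-carry-on pattern (the sensor-reading functions here are hypothetical stand-ins for the real 1-Wire / I2C / ADC calls; the helper name is ours):

```python
def collect_readings(sensor_readers):
    """Call each reader, keep whatever succeeds, and record what doesn't.

    sensor_readers: dict mapping an attribute name to a zero-argument
    function that returns a reading (or raises on failure).
    """
    payload = {}
    errors = []
    for name, read in sensor_readers.items():
        try:
            payload[name] = read()
        except Exception as exc:  # trap, record, and keep going
            errors.append("%s: %s" % (name, exc))
    payload["errors"] = errors    # dedicated attribute in the MQTT payload
    return payload

def broken_sensor():
    # Stand-in for a sensor interaction that fails
    raise OSError("sensor not responding")

print(collect_readings({"ds18b20_temperature": lambda: 22.1,
                        "dht11_humidity": broken_sensor}))
```

The program keeps running, the good readings survive, and the failures are hoovered up into the errors attribute.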

All of our data is packaged up in JSON, and dispatched to our local Greengrass Core device on topic rosie/sensors using MQTT.  It is then forwarded on to AWS IoT Core using a straightforward 1:1 Subscription.  We won't be using Greengrass Core for any other purpose in this post, so there is nothing really stopping us from sending MQTT packets directly from the ESP32 to AWS IoT Core if we wanted to bypass Greengrass altogether.
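The payload itself is just a JSON-encoded dict.  Here is a sketch of roughly what ours looks like before it is published to rosie/sensors (attribute names other than bmp280_temperature / bmp280_pressure, and all the values, are illustrative; the publish itself would use a MicroPython MQTT client such as umqtt.simple):

```python
import json

# Illustrative payload shape; values are made up for demonstration
payload = {
    "device": "rosie-esp32",         # device name (illustrative)
    "ip": "192.168.1.42",            # IP address (illustrative)
    "timestamp": 1569888000,
    "memory_free": 98304,            # from gc.mem_free()
    "ds18b20_temperature": 22.1,
    "dht11_temperature": 22,
    "dht11_humidity": 48,
    "bmp280_temperature": 21.8,
    "bmp280_pressure": 101325,
    "photoresistor_voltage": 1.8,
    "errors": [],                    # trapped exceptions, if any
}

message = json.dumps(payload)
# e.g. client.publish(b"rosie/sensors", message)  # umqtt.simple MQTTClient
print(message)
```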

Right, is it all working?

Eavesdropping on the payloads in AWS IoT Core tells us that the messages are arriving.  Which is good news.

Once in AWS IoT Core, we could simply redirect this data into a DynamoDB table via the Rule Engine, like before.  We could then use whatever tools at our disposal to query this DynamoDB table, and if required, manipulate the data.

But is there a better way?  Let's use a different widget in the AWS IoT family to play with the data - AWS IoT Analytics.

From what we can gather, AWS IoT Analytics provides us with a ready-made database (Datastore), a means to populate that database from AWS IoT Core (Channel), and a mechanism to transform the data as it arrives (Pipeline).  We can thereafter query the data using pre-built queries (Data Set).  There are other bits and bobs too, like Notebooks for more custom / advanced data analysis and machine learning, but we'll park those capabilities for now.

Right, let's connect things up.

Data is funnelled into AWS IoT Analytics through an AWS IoT Core Rule, and from there into a Channel.  These could be manually created in the portal, or we could simply rely on the wizard.  Either way, we appear to end up with the mandatory combination of Channel, Pipeline, and Datastore.

Here it is, a Channel - rosie_channel. The plumbing inlet through which our MQTT payloads from the ESP32 will arrive into AWS IoT Analytics.

On the AWS IoT Core side, we can also see that the wizard has auto-created the Rule: Rosie_topicrule.

Lastly, back in AWS IoT Analytics, there will be a default Pipeline rosie_pipeline (that doesn't transform anything yet), a datastore - rosie_datastore - that stores all our data, and a "select everything on planet earth" type of Data Set, rosie_dataset.  Sounds like everything we need to get going.

Now, without really doing much else, navigating to the rosie_dataset Data Set, it is possible to refresh its contents using "Run now".  If everything has been integrated successfully, and we have permitted it enough time to collect MQTT data from our ESP32, we will be able to see our data as a Data Set, both as a "preview" table, and a downloadable CSV.

That's about it when it comes to setting up AWS IoT Analytics.  But what else can we do?

Let's refocus our minds on performing an Enron-style beautification of the truth.

As it so happens, a spot audit of the results reveals some anomalies that make our credibility in the IoT game highly questionable.  We've highlighted below a few of the errors that are obvious to us humans, no thanks to Microsoft's premier spreadsheeting application: EXtreme Cell Evaluation Lab

    Furthermore, because this section of our experiment is all about manipulating data, we will keep this highly unsatisfactory data as is.  Broken.

    ...So a lot has potentially gone wrong during our probation period.  Under normal circumstances, it should result in our dismissal.  Instead, we'll attempt to use the Pipeline feature of AWS IoT Analytics to embark on a little creative accounting.  After all, through the use of Pipelines, this incoming data *can be* modified before being saved in the Datastore.  Which ultimately means any prying eyes on the Data Set will be unknowingly observing the modified data.

    Below are some of the observations we made.

    • Observation: Known exceptions being raised when interacting with the sensors.  Caused by: hardware error (dodgy breadboard and Dupont cabling being the main culprits).  The plan: investigate the setup to make sure connections are all sound and stable.
    • Observation: Missing readings.  Caused by: exceptions are normally accompanied by missing readings, which sort of makes sense.  The plan: solving the hardware issue should make this problem go away.
    • Observation: Highly suspicious readings.  For example, the BMP280 sensor in one instance suggested a temperature of -0.4 °C, which simply cannot be true where we are, at this time of year.  Caused by: we aren't 100% sure, but the chief suspect (again) is an intermittent hardware fault, or a software bug.  The plan: this is where we propose to use AWS IoT Analytics to do a bit of anomaly detection, and purge the reading into oblivion.
    • Observation: A whole set of readings missing from a time period.  Caused by: MicroPython was stuck on an exception we forgot to handle.  The plan: fix the code to handle the exception.  Or better, fix the root cause.

    There are many different schools / places of worship / industrial complexes of thought on whether data should be manipulated, and if so, how.  At a push, we suspect there is common agreement; that if dodgy data is found, it is important to investigate the root cause (and not to simply airbrush it away like the models on the covers of Hello / Hola / Hallo / こんにちは Magazine).  After all, there is likely to be a fault or (shock! horror!) a bug that simply shouldn't be ignored.  Or, the data could actually be real and could be telling us something really, really interesting.

    Advanced mathematical topics such as anomaly detection are simply too mind boggling for our tired brains to soak up.  And there is a global army of academics and professionals tirelessly working on statistical models and machine learning algorithms to perfect the art of detecting outliers, and coming up with legitimate ways of pacifying them in order for a system to reliably function out there in the real world without throwing a wobbly.

    In other words, don't pretend that data is real and complete, if it isn't.  And where data has been fashioned up or modified, document what has been done, and why.

    So what exactly shall we address using AWS IoT Analytics' Pipeline?

    Take this line for example:

    Here, we appear to have a highly suspicious BMP280 temperature reading (-0.4 °C).  For the sake of simplicity, we would quite like to remove this measurement, since it's clearly not what we were expecting.  Rather unhelpfully, -0.4 °C is a plausible value (if it were winter) and well within the operating parameters of the BMP280.  To compound the matter, the DS18B20 appears to have encountered an issue, so doesn't have a measurement we can reference instead.  The DHT11, on the other hand, has a very sensible looking temperature... 22 °C.

    There are potentially a number of ways we can a) detect the BMP280 anomaly, and b) correct it.  But in the sole interest of demonstrating Pipelines, we'll use the crudest method imaginable.

    This is most definitely not the way to approach this in real life, but we are simply going to remove the BMP280 temperature reading if it is outside of a completely arbitrary 5-30 °C range.  And what's more, we'll remove the accompanying BMP280 pressure reading as well, since evidence points to both being erroneous. A better way would be to choose the min / max temperature threshold based on recent historical data, readings from other sensors, another microprocessor, or perhaps even an external weather information source.

    How do we configure Pipelines?  The logic we plan to use will end up dictating the Activity that needs to be associated with our Pipeline.  Simple mathematical filtering operations that result in attributes being added / amended / removed could be concocted by daisy-chaining together the default Activities provided by AWS IoT Analytics.  Anything more bespoke, and we are likely to have to resort to Lambda.

    This is our Lambda function - iot-analytics-bmp280 - that simply inspects the batch of MQTT payloads arriving into our Channel.  If the bmp280_temperature value is outside of our range, it is stripped, together with bmp280_pressure.  Since we're talking Lambda, this function could programmatically branch out to access a plethora of AWS and non-AWS services to perform functions which will be necessary for more complex transformations.
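As a sketch (not our production function verbatim), the handler could look like the following.  From what we can tell, a Lambda Pipeline Activity is handed a batch of messages as a list of dicts, and must return the (possibly modified) list:

```python
# Arbitrary "plausible" range from the post; anything outside is purged
TEMP_MIN_C = 5
TEMP_MAX_C = 30

def lambda_handler(event, context):
    # AWS IoT Analytics invokes the Lambda Activity with a batch (list)
    # of messages, and expects the transformed batch back
    for message in event:
        temp = message.get("bmp280_temperature")
        if temp is not None and not (TEMP_MIN_C <= temp <= TEMP_MAX_C):
            # Evidence points to both BMP280 readings being erroneous,
            # so strip the pressure along with the temperature
            message.pop("bmp280_temperature", None)
            message.pop("bmp280_pressure", None)
    return event

# Quick check with one suspicious and one sensible message
print(lambda_handler(
    [{"bmp280_temperature": -0.4, "bmp280_pressure": 101000},
     {"bmp280_temperature": 21.8, "bmp280_pressure": 101325}],
    None))
```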

    Back in the AWS IoT Analytics Pipeline page, we amend our Pipeline - rosie_pipeline.  It will continue to sit between our rosie_channel Channel and rosie_datastore Datastore, but it will be told to inspect - and, if required, transform - the messages using our iot-analytics-bmp280 Lambda function.

    Unlike before, the Pipeline now will inspect every message passing through it from the Channel, and invoke our iot-analytics-bmp280 Lambda function.  Results are stored in the rosie_datastore Datastore.  And because we know what our Lambda function does with "out of range" BMP280 measurements, we know that rosie_datastore will contain transformed values.

    On a somewhat related note, we found the AWS IoT Analytics web console experience slightly lacking.  For one, it's not really possible to see what Activities have actually been configured behind a Pipeline, and every time we try, it attempts to show us in the context of attributes that it has decided to infer from a randomly selected message.

    The only sure-fire way we found of obtaining the "truth" was to use the AWS CLI.  For example, by using aws iotanalytics describe-pipeline, we could tell that three specific actions were successfully daisy-chained together: "channel" to read messages from the Channel, "datastore" in which to store the results, and in-between, our "lambda" Activity to invoke our iot-analytics-bmp280 Lambda function.

    aws iotanalytics describe-pipeline --pipeline-name rosie_pipeline

    OK.  So our Pipeline has been reconfigured.  But our Datastore hasn't been refreshed.  It's time to reprocess our messages (via the console, or the aws iotanalytics start-pipeline-reprocessing CLI command) so that our Lambda transformation is applied retrospectively to all previous messages entering the Datastore.

    Once our Datastore has been refreshed, we want to re-run our Data Set also.  A quick peek at the result should verify that any BMP280 temperature readings outside of the 5-30 °C range have been well and truly obliterated, and what's more, the accompanying pressure reading pacified as well.

    This supposed detection of anomalies is so amateurish, it is likely to induce cold sweats in any self-respecting statistician (not us).  But clearly the AWS IoT Analytics Pipeline + Lambda tag team could be further developed to take into consideration a lot more parameters.  For example, it could invoke a REST API of an online weather provider to source the current temperature for the area, and use a threshold either side of it as our permitted range.  Or we could interrogate our DynamoDB table of sensor readings to see what the recent values have been over a time window, and use some serious maths to detect if our current value is statistically unlikely.

    Lastly, since we have an ever increasing number of sensors taking temperature measurements, it might even be possible to perform some sensor fusion, like a Kalman filter, to scientifically approximate a stellar reading.

    For now, here are the results from our high jinks on our Pipeline.  Rather unashamedly, the outliers were duly neutralised.

    As touched on earlier, we don't have to develop our Pipeline transformations in Lambda.  There are ready-made Activities that can be daisy-chained together, for simple operations such as manipulating message attributes, and performing basic maths.  This is likely to be a better option, if our intended transformations are simple in nature: such as converting Celsius to Fahrenheit, attempting to calculate a Lux value from the photoresistor voltage, or perhaps, having a go at calculating altitude from our sensor readings.

    Sigh... we are struggling to show you something else that is interesting.

    Stuff it! Here's the output from our photoresistor module over time that shows us the room getting lighter or darker during the day (sudden jumps are where the light in the room has been turned on / off).  Plotted alongside the DHT11 temperature reading, there appears to be an inverse relationship between the ambient lightness of the room and its temperature, which sort of makes logical sense.

    There we have it.  We started off by unnecessarily collecting a small reservoir full of sensor readings, and later, wondered what to do with them.  And in the end, our tour doubled up as a whistle-stop demonstration of another funky icon in the AWS IoT portfolio - IoT Analytics.

    As far as we're aware, the black Reliants outside our house with tinted windows do not belong to the FCA, so we're going to safely assume that we're not under any scrutiny for fiddling our numbers.  Nor will Scorsese be purchasing from us the film rights based on our rogue tampering of BMP280 temperature and pressure readings.

    Which is all extremely welcome, because we can now advance to write our next instalment in the I-O-Tea series from the comfort of our home, and not from a shared, malware-infested computer in a prison library.

    Graphic content

    We were never quite content with creating graphs manually using Excel.  Which is why, later in this series, we test-drove two popular open-source data visualisation tools - Grafana and Kibana.

    You can read all about it in Hard Grapht.

    Furthermore, in Rage:MKR, we connected up AWS QuickSight and Jupyter Notebooks to our AWS IoT Analytics Data Sets.

    Code of conduct

    Our final code looks a little like this:


    Like anything, one should always read the user guide.  We believe it may have been Yoda who imparted on us this wisdom.
    AWS IoT Analytics' Pipelines can get a little confusing, especially the Activities.  It did, at least for us.  Here is its specific page tucked away in the documentation:
    We really didn't get on well with the AWS IoT Analytics web console... specifically when it came to configuring Pipelines and their associated Activities.  We resorted to using the AWS CLI:
    Special thank you's go in the direction of this community, whose MicroPython library we used for interfacing with our BMP280.
    For everything else, we're using standard libraries provided by MicroPython:


