GPS Tracking with LoRa

April 27, 2019 - Reading time: 6 minutes

I recently bought a Dragino LoRa IoT development kit from the IoT store here in Perth. It's a pretty cheap way to experiment with LoRa, and comes with 2 Arduino Uno boards, one LoRa shield, and one LoRa + GPS shield. In addition, it comes with a LoRa 'gateway' device which allows you to bridge a LoRa network to an IP network (via WiFi or Ethernet). The device cannot be considered a true LoRa gateway since it only communicates on a single channel, whereas LoRaWAN devices hop across several channels and a full gateway listens on all of them at once. Still, it's good enough for some quick prototyping.

LoRa gateway setup and config

The Dragino gateway can be hooked up to the Things Network following the guide here. Below are the settings on my gateway - the key is the radio settings, and specifically the Tx and Rx frequency, as these will be needed when configuring a LoRa client to communicate with the gateway.

Image description

Within the Things Network, I add the gateway as a LoRaWAN gateway using the EUI of the device. The gateway is configured to use the legacy packet forwarder.

Image description

Once the gateway is hooked up and you can see it on the Things Network console, you'll need to create an application within the console to represent a data flow from your LoRa client, through the gateway and into the IP network. This is very simple to do, and once complete you can add a device to your application to represent a LoRa client that will transmit data. When creating the device, you'll need to select a device EUI - the Dragino shields do not seem to have EUIs, so for testing, generate your own. The device will have a Network Session Key and an App Session Key generated, as shown below.

Image description

Once the gateway, application, and device are configured in the Things Network, you can write some code on the Arduino board with the LoRa shield attached, to communicate with the gateway - ultimately the aim is to use the LoRa + GPS shield to periodically take GPS readings, and transmit these to the gateway, which will forward them to the Things Network.

Device code

Getting the LoRa shield to communicate with the gateway in the Australian LoRa frequency band (915-928 MHz) turned out to be a bit tricky until I found Thomas Laurenson and his superb post on how to do just that here combined with his fork of the Arduino LMIC library. Thanks Thomas!

Wiring up the hardware

Since the Arduino UNO board has a software serial, we need to wire up the shield as shown below, and described here.

Image description

In the code for the device board, available here, I make use of the Cayenne LPP library from TTN to make it super simple to send packets of GPS data.
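To give a feel for what the Cayenne LPP library puts on the air, here is a rough Python sketch of the GPS record layout as I understand it from the Cayenne LPP format: a channel byte, a type byte (0x88 for GPS), then latitude, longitude and altitude as 3-byte signed big-endian integers. The function names are my own and this is not the device code, which runs on the Arduino.

```python
import struct

LPP_GPS = 0x88  # Cayenne LPP data type for a GPS location (136)


def _int24(value: int) -> bytes:
    """Pack a signed integer into 3 big-endian bytes."""
    return struct.pack(">i", value)[1:]


def encode_gps(channel: int, lat: float, lon: float, alt_m: float) -> bytes:
    """Encode one GPS fix as an 11-byte Cayenne LPP record."""
    return (
        bytes([channel, LPP_GPS])
        + _int24(round(lat * 10000))   # 0.0001 degree per unit
        + _int24(round(lon * 10000))   # 0.0001 degree per unit
        + _int24(round(alt_m * 100))   # 0.01 metre per unit
    )


def decode_gps(record: bytes) -> dict:
    """Decode a record produced by encode_gps."""
    if len(record) != 11 or record[1] != LPP_GPS:
        raise ValueError("not a Cayenne LPP GPS record")
    lat, lon, alt = (
        int.from_bytes(record[i:i + 3], "big", signed=True) for i in (2, 5, 8)
    )
    return {"channel": record[0], "latitude": lat / 10000,
            "longitude": lon / 10000, "altitude": alt / 100}
```

Calling decode_gps(encode_gps(1, -31.9505, 115.8605, 12.5)) should hand back the same fix, which is a handy sanity check before looking at payloads in the console.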

Once the board is successfully transmitting messages and you can see them being received in the Things Network console, you can set up an integration within the console to do something with the received messages. I simply post the received message (which is decoded from LoRa / Cayenne within the Things Network without me needing to do anything) to an API Gateway backed Lambda in AWS, which persists the message in a DynamoDB table. The entire backend infrastructure to support this is defined in this serverless script.

The type of integration set up in TTN is an HTTP integration, which allows you to specify the API Gateway endpoint and API key. Each time a message is received over the air by your gateway, it is sent to TTN, decoded from Cayenne, and then POSTed to the specified HTTP endpoint.
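The Lambda behind the endpoint can be very small. The sketch below is not the code from the repository, just an illustration of the idea. It assumes the POSTed JSON carries dev_id, metadata.time and a payload_fields.gps_1 object (those field names are my assumption about the integration's message shape), and that the handler is given a table object exposing put_item, such as a boto3 DynamoDB Table.

```python
import json
from decimal import Decimal


def build_item(body: str) -> dict:
    """Turn the JSON body POSTed by the HTTP integration into a table item."""
    # parse_float=Decimal because the boto3 DynamoDB resource rejects floats
    msg = json.loads(body, parse_float=Decimal)
    gps = msg["payload_fields"]["gps_1"]  # assumed name of the decoded field
    return {
        "dev_id": msg["dev_id"],
        "time": msg["metadata"]["time"],
        "latitude": gps["latitude"],
        "longitude": gps["longitude"],
        "altitude": gps["altitude"],
    }


def make_handler(table):
    """Create a Lambda handler that writes to `table` (anything with put_item)."""
    def handler(event, context):
        # With an API Gateway proxy integration the POST body arrives as a string
        table.put_item(Item=build_item(event["body"]))
        return {"statusCode": 200, "body": json.dumps({"ok": True})}
    return handler
```

Passing the table in keeps the handler easy to exercise locally with a stub object before wiring it to a real table.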

Image description

Visualising the data

Once I had data flowing into Dynamo, I wrote a simple ReactJS page using Pigeon maps to visualise the co-ordinates of the points captured by the GPS receiver, and sent via LoRa and TTN into AWS.

Image description

All code for the Arduino device, the front end, and the back end is available on GitHub.


Vaadin and React

April 14, 2019 - Reading time: 3 minutes

I've always been a massive fan of Vaadin, right from the days I was first introduced to it as the original IT Mill Toolkit in 2008 by a former colleague, Long Vuong, in Melbourne. At the time we were working with Echo2, another server side Java framework, and I loved the concept of being able to create rich web applications in a language I was familiar with (Java), using a familiar Swing-like paradigm, and most importantly, not having to worry too much about styling or cross browser compatibility issues. Working with the original IT Mill Toolkit after Echo2 was exciting, and over the years I used the toolkit, which became Vaadin, as often as possible.

Working in large enterprises with lots of Java talent but little design capability, where much of the work involved creating rich web applications that lived behind the corporate firewall and served fewer than 10,000 users, the server side Java nature of the framework was perfectly suited to rapidly delivering value. With very good out of the box styling and a rich library of components, supported by an extensive add-on ecosystem, Vaadin made it possible to deliver rich, fully featured applications at a speed which never failed to impress customers. Whilst other teams got bogged down trying to re-invent the wheel by rolling their own data grids, dealing with obscure cross browser compatibility issues, and trying to get some half-decent CSS hacked together, we were able to use Vaadin to deliver at exceptional pace.

Fast forward a decade into the world of serverless applications, and server-side Java is a more difficult sell. Paying to host, secure, power, and maintain a physical, or even virtual, server for your web application is now being replaced by a completely serverless stack - in the AWS ecosystem this is typically HTML and JavaScript assets, hosted in an S3 bucket, calling out to backend API services implemented as Lambdas. In 2011 two technologies emerged that influenced the way modern web applications are created: ReactJS and Web Components. React has emerged as the de facto library for building web applications, whilst Web Components are a set of features providing a standard component model for the web.

Whilst React is a pleasure to develop in, I find myself missing the great components provided by Vaadin. Fortunately, Vaadin provides many of their awesome UX components as web components, and it turns out including these within your React app is easy. Based on Jens Janssen's excellent article here, I created a simple React + Vaadin app that uses the Star Wars API to grab the names of Star Wars characters and display them in a Vaadin data grid. The result is here and you can grab the code from GitHub here.

IoT with SigFox and AWS

February 17, 2019 - Reading time: 4 minutes

A little while back I was lucky enough to get an XKit development kit at one of the excellent IoT Perth meetups.

The kit consists of an Arduino UNO board with a Thinxtra shield which contains a few sensors (pressure, light, shock, 3D accelerometer) and an LPWAN transceiver. It runs on the SigFox low power WAN in the RCZ4 region (AsiaPac - 920.8 MHz uplink, 922.3 MHz downlink), which has pretty good coverage in the Perth metro region.

Image description

I was interested to see how easy it would be to feed sensor data from this board into AWS where it could be processed further. It turned out that with the support offered by the SigFox backend, this was actually super easy. The SigFox backend allows you to define a callback, triggered on the reception of a message from the board. There are a few choices on what to do with the callback (unfortunately all documentation is only available to users authenticated on SigFox backend), including sending an e-mail, or triggering an HTTP request.

Defining the HTTP callback in the SigFox backend is very simple - in addition to specifying the endpoint to call, you are able to specify the configuration of the custom message payload received from the transmitter, including the data type and endianness of the data.

temp::uint:16:little-endian pressure::uint:16:little-endian photo::uint:16:little-endian AccX::uint:16:little-endian AccY::uint:16:little-endian AccZ::uint:16:little-endian
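To make that payload configuration concrete, here is a small Python sketch of the same decoding done by hand: six unsigned 16-bit little-endian values packed into a 12-byte message. It assumes the payload arrives as a hex string, and it leaves the values raw, since any scaling applied on the device is not shown here.

```python
import struct

# Field order taken from the custom payload configuration
FIELDS = ("temp", "pressure", "photo", "AccX", "AccY", "AccZ")


def decode_payload(hex_data: str) -> dict:
    """Split a 12-byte hex payload into six little-endian uint16 values."""
    raw = bytes.fromhex(hex_data)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return dict(zip(FIELDS, struct.unpack("<6H", raw)))
```

With little-endian ordering the low byte comes first, so a payload starting with e803 decodes to a temp value of 0x03e8, i.e. 1000.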

In addition to the custom message variables, a set of standard variables (e.g. snr, time, rssi) is available to include in the payload of the message sent as part of the HTTP call.

Using a combination of standard and custom message data, the payload of the HTTP call was defined thus:

    {
        "device": "{device}",
        "time": {time},
        "station": "{station}",
        "snr": {snr},
        "rssi": {rssi},
        "data": "{data}",
        "temp": {customData#temp},
        "pressure": {customData#pressure},
        "photo": {customData#photo},
        "AccX": {customData#AccX},
        "AccY": {customData#AccY},
        "AccZ": {customData#AccZ}
    }

With that in mind, the high-level design looks like this

Image description

Device code

The code that runs on the Uno board hosting the Thinxtra shield is super simple, and based on an example available from Thinxtra here. I use the Arduino IDE to write, compile, and load the code to the board.

Cloud code

Since I wanted to get the data from the board into AWS (specifically S3, where it could be queried via Aurora, using the Glue data catalog), I created an API Gateway fronted Lambda that processed HTTP requests from the SigFox backend, and published the incoming data to an AWS IoT MQTT topic. An IoT action was then defined on the same topic that persisted the data to an S3 bucket. The architecture of how the data sent by the SigFox backend is processed is shown below.

Image description

The Lambda is a simple Python script, and the entire cloud infrastructure is built from a serverless script available on GitHub.

Efficiently fuzzy match strings with machine learning in PySpark

January 14, 2019 - Reading time: 11 minutes

Matching strings that are similar but not exactly the same is a fairly common problem - think of matching people's names that may be spelt slightly differently, or use abbreviated forms, e.g. William vs. Bill. Based on this SO post about matching strings using Apache Spark, I wanted to understand the approach in more depth through an example.

Approximately or fuzzily matching two strings means calculating how similar the 2 strings are. One approach is to calculate the edit distance between the strings - that is, the number of changes (insertion, deletion, substitution) that would need to be made to one string to make it equal to the other. A popular measure is the Levenshtein distance; however, computing it for every pair of strings is too computationally expensive (slow) for large datasets, since the number of pairs grows quadratically. Spark's machine learning capabilities offer a different approach to this problem.
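For reference, the edit distance itself is only a few lines of Python. This is a plain textbook dynamic programming sketch, not something taken from Spark; each comparison costs time proportional to the product of the two string lengths, which is what makes all-pairs matching painful at scale.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a                       # keep the inner row short
    previous = list(range(len(b) + 1))    # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, levenshtein("John Smith", "John Smyth") is 1, since a single substitution turns one name into the other.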

Feature transformation

When applying a machine learning approach to string matching, we first need to transform our strings into a numerical representation. An effective way to represent a string as a set is to calculate the set of substrings of length n (n-grams) that appear within it. For example, two similar names and their 1, 2 and 3-gram representations are shown in the table below.

| Id | String | 1-gram | 2-gram | 3-gram |
| S1 | John Smith | J,o,h,n, ,S,m,i,t,h | Jo,oh,hn,n , S,Sm,mi,it,th | Joh,ohn,hn ,n S, Sm,Smi,mit,ith |
| S2 | John Smyth | J,o,h,n, ,S,m,y,t,h | Jo,oh,hn,n , S,Sm,my,yt,th | Joh,ohn,hn ,n S, Sm,Smy,myt,yth |

Spark's NGram feature transformer converts input strings into arrays of n-grams.
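Outside of Spark, the same idea fits in a one-line Python function. This is only an illustration of what an n-gram is (the function name is mine), not how the NGram transformer is implemented.

```python
def char_ngrams(text: str, n: int) -> list:
    """Return the character n-grams of text, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

Calling char_ngrams("John Smith", 3) gives the eight 3-grams listed for S1 in the table, including the two that contain the space.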

Feature vectorization

After converting each string to its constituent n-grams, we can then convert each string into a fixed length feature vector utilizing the hashing trick, i.e. applying a hash function to each of the n-grams and using the hash value as an index directly into the array.

Spark's HashingTF class is a transformer that takes sets of terms (the n-grams in this example) and converts them into fixed length feature vectors utilising the MurmurHash3 hash function. The default feature dimension in HashingTF is 2^18 = 262,144.
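A minimal Python sketch of the hashing trick follows. Note the loud assumption: it uses CRC32 from the standard library as a stand-in hash, so the indices it produces will not match the MurmurHash3 based indices that Spark generates. The sparse vector is represented as a plain dict of index to count.

```python
import zlib

NUM_FEATURES = 2 ** 18  # 262,144, the HashingTF default mentioned above


def hash_terms(terms, num_features=NUM_FEATURES):
    """Map terms to a sparse term-frequency vector {index: count}."""
    vector = {}
    for term in terms:
        # CRC32 is a stand-in; Spark uses MurmurHash3, so indices will differ
        index = zlib.crc32(term.encode("utf-8")) % num_features
        vector[index] = vector.get(index, 0.0) + 1.0
    return dict(sorted(vector.items()))
```

The appeal of the approach is that no vocabulary has to be built or stored: any term, seen before or not, lands directly in one of the fixed number of slots, at the cost of occasional collisions.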

Taking the 2 example strings above, the following feature vectors are obtained from the 3-grams

| Id | String | 3-gram | Feature Vector |
| S1 | John Smith | Joh,ohn,hn ,n S, Sm,Smi,mit,ith | [89578,131746,138261,155335,203211,205508,236449,247475], [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0] |
| S2 | John Smyth | Joh,ohn,hn ,n S, Sm,Smy,myt,yth | [89578,98162,137482,138261,155335,203211,247475,249580], [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0] |

Jaccard similarity

The Jaccard index is a measure of the similarity of 2 sets. Taking the feature vectors of our 2 sample strings calculated above, we can calculate the Jaccard index as the number of elements that appear in both S1 and S2 (the intersection), divided by the number of distinct elements that appear in either S1 or S2 (the union), or 5/11 = 0.454545.
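The arithmetic is easy to check in Python using the vector indices from the table above as sets. The function names are mine; the Jaccard distance is included because it is simply one minus the index.

```python
# Non-zero indices of the two feature vectors from the table above
S1 = {89578, 131746, 138261, 155335, 203211, 205508, 236449, 247475}
S2 = {89578, 98162, 137482, 138261, 155335, 203211, 247475, 249580}


def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    """Dissimilarity: 0.0 for identical sets, 1.0 for disjoint sets."""
    return 1.0 - jaccard_index(a, b)
```

Here jaccard_index(S1, S2) is 5/11 (about 0.4545) and jaccard_distance(S1, S2) is 6/11 (about 0.5455).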


Min hashing

Once we have converted our strings to feature vectors our next goal is to replace potentially large sets (well not so large if matching short strings, but consider matching large blocks of text) with a small representation which has the property that we can compare this small representation for 2 strings and estimate the Jaccard similarity of the 2 strings.

We can represent the two strings in our example as a characteristic matrix as below

Element S1 S2
89578 1 1
98162 0 1
131746 1 0
137482 0 1
138261 1 1
155335 1 1
203211 1 1
205508 1 0
236449 1 0
247475 1 1
249580 0 1

Whilst the way in which min hashing works is beyond the scope of this post (for an excellent explanation see the MMDS book, Chapter 3, available here), the key concept is that the probability that a min hashing function for a random permutation of rows in the characteristic matrix produces the same value for two sets is equal to the Jaccard similarity of those two sets.
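That property can be checked empirically with a small simulation. The Python sketch below uses explicit random permutations of the rows, which is the textbook formulation rather than how a production library would do it (real implementations use cheap hash functions in place of permutations). The function names, the number of permutations and the seed are arbitrary choices of mine.

```python
import random


def minhash_signature(elements, permutations):
    """One min-hash value per permutation: the lowest rank of any element."""
    return [min(rank[e] for e in elements) for rank in permutations]


def estimate_jaccard(a: set, b: set, num_hashes: int = 2000, seed: int = 42) -> float:
    """Estimate the Jaccard index of two non-empty sets via min hashing."""
    rng = random.Random(seed)
    universe = sorted(a | b)          # the rows of the characteristic matrix
    permutations = []
    for _ in range(num_hashes):
        order = universe[:]
        rng.shuffle(order)            # one random permutation of the rows
        permutations.append({element: rank for rank, element in enumerate(order)})
    sig_a = minhash_signature(a, permutations)
    sig_b = minhash_signature(b, permutations)
    # The fraction of positions where the signatures agree is the estimate
    agree = sum(x == y for x, y in zip(sig_a, sig_b))
    return agree / num_hashes
```

Run on the two sets from the characteristic matrix, the estimate should land close to the exact value of 5/11, and it tightens as the number of permutations grows.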

Spark's MinHashLSH class is an estimator that takes a DataFrame and produces a Model which is a Transformer. In order to use MinHashLSH, we fit a model on the featurized data obtained from a HashingTF transformer. Running transform() on the resultant model provides us with the hash values.

Putting it all together

As described on SO, we can use Spark's machine learning pipeline to chain together the transformers and the estimator (together with an initial transformer to tokenize the input). This gives the method below, which takes in 2 DataFrames (containing names), fits the model to one of them, transforms both using the resultant model, and then does an approximate similarity join.

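from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, MinHashLSH, NGram, RegexTokenizer
from pyspark.sql.functions import col

def match_names(df_1, df_2):

    pipeline = Pipeline(stages=[
        # An empty pattern splits each name into individual characters
        RegexTokenizer(
            pattern="", inputCol="name", outputCol="tokens", minTokenLength=1
        ),
        NGram(n=3, inputCol="tokens", outputCol="ngrams"),
        HashingTF(inputCol="ngrams", outputCol="vectors"),
        MinHashLSH(inputCol="vectors", outputCol="lsh")
    ])

    # Fit on one DataFrame, then hash both with the same model
    model = pipeline.fit(df_1)

    stored_hashed = model.transform(df_1)
    landed_hashed = model.transform(df_2)

    # The last stage is the fitted MinHashLSH model. The join exposes its
    # inputs as datasetA and datasetB; the selected columns are reconstructed.
    matched_df = model.stages[-1].approxSimilarityJoin(
        stored_hashed, landed_hashed, 1.0, "confidence"
    ).select(col("datasetA.name"), col("datasetB.name"), col("confidence"))
    matched_df.show(20, False)
    return matched_df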

Example input

2 DataFrames, with 10 names in each

Id DF1 DF2
1 John Smyth Bob Jones
2 John Smith Ned Flanders
3 Jo Smith Lisa Short
4 Bob Jones Joe Tan
5 Tim Jones Jim Jones
6 Laura Tully John Smith
7 Sheena Easton John Smith
8 Hilary Jones Jon Smithers
9 Hannah Short Chris Smith
10 Greg Norman Norm Smith

Example output (the confidence column holds the Jaccard distance between the two names, so a lower value indicates a stronger match).

|name        |name      |confidence        |
|John Smith  |John Smyth|0.5454545454545454|
|Bob Jones   |Bob Jones |0.0               |
|Bob Jones   |Tim Jones |0.6               |
|Jim Jones   |Tim Jones |0.25              |
|John Smith  |John Smyth|0.5454545454545454|
|Norm Smith  |Jo Smith  |0.6               |
|John Smith  |John Smith|0.0               |
|Jon Smithers|Jo Smith  |0.6666666666666667|
|Chris Smith |Jo Smith  |0.6363636363636364|
|Jim Jones   |Bob Jones |0.6               |
|John Smith  |John Smith|0.0               |

The code can be found here (thanks to Vikas Kawadia and his blog providing an approach to unit testing PySpark code).

BHP Digital Tribes

August 3, 2017 - Reading time: 11 minutes

A couple of weeks back I took part in a hackathon event put together by Unearthed in collaboration with BHP here in Perth. The idea behind the event was to give participants some cheap commodity hardware and a bit of cash, and describe two challenges faced by BHP that could benefit from digital enhancement. So to this end participants were divided into teams and each participant was given a Raspberry Pi Zero W board and AUD 60 for additional hardware spend and sent away to come up with a novel solution to one of the two nominated challenges over 2 weekends. I was in a team with one other guy and like all other teams save one, elected to tackle the distributed safety device challenge.

The major points of this challenge were

  • Interactions between vehicles and personnel within both underground and opencut mining environments present a risk to equipment and life
  • Current approaches to safety (cones and barriers) rely on awareness and compliance from personnel and are not digitally enhanced
  • Mining environments lack traditional network communication infrastructure

The challenge was to try and create some form of digital safety controls (preferably using the supplied commodity hardware) that would enhance traditional approaches.


The approach considered 3 main areas - communication amongst components of the system in the absence of any fixed network infrastructure, sensors to gather data from the environment, and components that used the data to communicate higher level information and warnings.


Clearly the most significant challenge was to enable communication between components of the eventual system, whatever form that took. System components (vehicles, personnel) are highly mobile, changing position frequently, and there is no networking infrastructure across most of the environment. There was a requirement that communication remained within unlicensed spectrum, and given the Pi Zeros came with a 2.4 GHz WiFi chip on board, this naturally led to the use of 802.11 for wireless communication.

Since there is no fixed network infrastructure within the target environments, direct peer-to-peer communication, without reliance on an access point is necessary. 802.11 defines the Independent Basic Service Set (IBSS) whereby all 802.11 stations (within radio transmission range), communicate directly with each other. 

Enabling IBSS ad-hoc on the Pi is as simple as editing the content of the file


To the following (we select the SSID safetynet for the ad-hoc network, have no security, and have all devices on the network)

auto lo
iface lo inet loopback
iface eth0 inet dhcp
auto wlan0
iface wlan0 inet static
wireless-channel 1
wireless-essid safetynet
wireless-mode ad-hoc

Although Android does not support ad-hoc networking via IBSS (for a discussion on that issue see here), it is possible to enable it on supported Android devices using a custom ROM from CyanogenMod (we used an Asus Nexus 7 running CM as part of the prototype ad-hoc network). A really good discussion on ad-hoc networking support on Android is provided here. Windows generally supports ad-hoc networking (Windows 7 in particular), so there is a wide range of common hardware that can participate in an IBSS ad-hoc network.

Multi-hop ad-hoc

Although the IBSS configuration provides for direct peer to peer communication amongst devices within radio transmission range, it does not allow for packets to travel beyond radio transmission range. In the diagram below, device A can communicate with device B, and device B can communicate with device C, but since C is outside of A's transmission range, device A cannot communicate with device C without additional routing support. We utilised OLSR - a proactive ad-hoc routing protocol operating at Layer 3 - to provide multi-hop communication within the system. This significantly increases the range at which devices within the system can communicate with each other.


In the final system we had both Raspberry Pi devices and Windows devices using OLSR and communicating over multiple hops. On the Raspberry Pis we built the OLSR daemon from the Serval Project available here, and on Windows devices we used a build available here.

Gateway and backhaul

We connect the multi-hop, ad-hoc network to the wider internet via a gateway device - this is a Raspberry Pi model 2B with 2 network interfaces, one communicating on the ad-hoc network via a nano USB WiFi adapter, and the other connected to the Internet. In the PoC the gateway node was connected to the internet via a wired Ethernet connection, but this could easily be made more mobile through the use of a GSM interface.

In order to pass traffic between the ad-hoc network and the Internet extremely simply, we first of all enable IP forwarding, and we use the POSTROUTING chain combined with IP masquerading

sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
sudo iptables -t nat -A POSTROUTING -s -o eth0 -j MASQUERADE

We enable OLSR's HNA on the OLSR daemon running on the gateway device to advertise itself as a gateway to devices on the ad-hoc network.

Communication protocol and logic

For the purposes of the PoC we designed a simple communication protocol with a limited number of message types - all communication took place via UDP over IP.

A simple controller application was written in Python (which has runtimes for most platforms) that processed communications messages in the system. This controller application runs on each Safety Net device, be it a Raspberry Pi, a Windows laptop, or some other platform.
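The actual message types and wire format from the PoC are not reproduced here, so the following Python sketch is purely illustrative: it assumes JSON encoded datagrams with hypothetical type, id, ts and data keys, a hypothetical port number, and a UDP broadcast to reach neighbouring devices.

```python
import json
import socket
import time

PORT = 5005  # hypothetical port, not the one used in the PoC


def encode_message(msg_type: str, device_id: str, payload: dict) -> bytes:
    """Serialise one message as a compact JSON datagram."""
    return json.dumps(
        {"type": msg_type, "id": device_id, "ts": time.time(), "data": payload},
        separators=(",", ":"),
    ).encode("utf-8")


def decode_message(datagram: bytes) -> dict:
    """Parse a datagram, rejecting anything without the expected keys."""
    msg = json.loads(datagram.decode("utf-8"))
    if not isinstance(msg, dict) or not {"type", "id", "ts", "data"} <= msg.keys():
        raise ValueError("malformed message")
    return msg


def broadcast(datagram: bytes, address: str = "255.255.255.255") -> None:
    """Send a datagram to the broadcast address (direct neighbours only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(datagram, (address, PORT))
```

UDP suits this kind of traffic because a lost position update is quickly superseded by the next one, so there is little to gain from retransmission.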


For location sensing we used an inexpensive USB GPS receiver, a BU-353-S4. Rather than getting a GPS board and connecting it to the Pi via the GPIO pins, you simply connect the USB receiver to the Pi's USB port and NMEA data is delivered via a serial port.
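To show what that NMEA data looks like once it arrives, here is a small Python sketch that pulls latitude, longitude and altitude out of a GGA sentence. It is a simplified illustration with my own function names: it does not verify the checksum and ignores every other sentence type, which is part of the reason a proper tool is preferable for real use.

```python
def _to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA (d)ddmm.mmmm plus hemisphere to signed decimal degrees."""
    point = value.index(".")
    degrees = int(value[:point - 2])     # everything before the minutes
    minutes = float(value[point - 2:])   # mm.mmmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str) -> dict:
    """Extract position and altitude from a GGA sentence (no checksum check)."""
    fields = sentence.strip().split("*")[0].split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    if fields[6] in ("", "0"):           # fix quality 0 means no fix yet
        raise ValueError("no GPS fix")
    return {
        "latitude": _to_degrees(fields[2], fields[3]),
        "longitude": _to_degrees(fields[4], fields[5]),
        "altitude": float(fields[9]),    # metres above mean sea level
    }
```

The main trap is that NMEA encodes positions as degrees followed by decimal minutes, so 4807.038,N means 48 degrees and 7.038 minutes north, not 48.07038 degrees.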

We use gpsd and a Python library to access GPS information from the receiver. You can install this on the Pi using apt:

sudo apt-get install gpsd gpsd-clients

We also made use of cheap off the shelf ultrasonic sensors to detect objects within close proximity (~3 meters).

Below is an ultrasonic sensor soldered up to the appropriate resistors and connected to the GPIO pins on a Pi Zero W.

Information emitters

We wrote some very simple Java UI clients which presented the data gathered from devices in the Safety Net network, to an operator. Java was selected as it can be run on a range of platforms, including the Raspberry Pi, as well as Windows laptops. It would be trivial to extend to an Android environment too. The 2 major pieces of information that were displayed were

  1. The relative geographical position of other Safety Net devices in a radar type view. This intuitive graphical view gives an operator a quick means to identify assets within his environment, including those that may not be within line of sight. The view can easily be extended to display additional information about each device, including type, bearing, velocity etc.
  2. Safety alerts generated by the system when predefined safety rules are breached (for example, a proximity barrier has been breached).
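Both displays boil down to the same calculation: given two GPS fixes, how far away is the other device and in which direction. The UI clients were written in Java, so the Python below is only a sketch of the maths, using the haversine formula for distance and the standard initial bearing formula. The function names and the 50 metre barrier are hypothetical values for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (deg) from point 1 to 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def proximity_alert(own, other, barrier_m=50.0):
    """True when `other` is inside the (hypothetical) barrier around `own`."""
    distance, _ = distance_and_bearing(own[0], own[1], other[0], other[1])
    return distance < barrier_m
```

A radar style view can then plot each device at its distance and bearing relative to the operator, and a safety rule becomes a simple threshold on the distance.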


At the end of the hackathon, teams were required to pitch their work via a 7 minute presentation to a panel of judges including the general manager of the Whaleback mine.


Despite some strong competition and great submissions from other teams, the Safety Net system won both first prize at the event and the People's Choice Award (voted for by the community).


Mapping multiple Strava activities

April 4, 2015 - Reading time: 3 minutes

Strava is a great app for competitive analysis of physical activity data from biking, running, rowing etc., but is also a good place just to store your activity data and measure and analyse it for yourself. Features such as the Training Calendar provide a great summary of how you're tracking month on month and year on year and can provide powerful motivation to increase your training. For outdoor activities, visualising GPS data on a map is something I get a kick out of - especially when covering new trails over time. Strava does a great job of plotting individual activities on a map and for premium, paid users there is the heatmap feature which is very cool, but I haven't found a way to plot multiple activities on a single map view. It may be possible, I didn't look that hard, but wanted an excuse to use the excellent Java Strava API anyway, so figured that was a reasonable use case.

Using the API with the excellent Vaadin framework made it pretty simple to put together a simple web app that allows you to select multiple activities of your own or your Strava mates, and plot them on a single map view. You can even select all your Strava activities and plot them at once if you like.

Below is a plot of my trails around Bath/Bristol in the summer of 2015.

In addition to plotting multiple activities, you can download the geographical data associated with each activity in GeoJSON format, which you can then use to do something else cool with D3.js or something similar.
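For anyone unfamiliar with GeoJSON, a track is typically just a LineString feature. The Python sketch below shows the general shape under my own assumptions; the exact structure and properties the app exports may well differ, and the function name is hypothetical.

```python
import json


def to_geojson(name: str, points) -> dict:
    """Wrap a list of (lat, lon) pairs as a GeoJSON LineString feature."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude], in that order
            "coordinates": [[lon, lat] for lat, lon in points],
        },
    }


def dump_geojson(name: str, points) -> str:
    """Serialise the feature to a JSON string ready to save or serve."""
    return json.dumps(to_geojson(name, points))
```

The coordinate order is the usual gotcha: most GPS tools talk in latitude then longitude, whereas GeoJSON expects longitude first.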

The application is available here and the source code here if you'd like to run it yourself - all that is needed is to configure an application within your Strava account here for OAuth authentication, and make sure you have a Mapbox access token.