Hello world

Aug 22, 2023 15 mins read

"Hello World" is a ubiquitous and iconic phrase in the world of computer programming. It serves as the simplest and most basic introductory program that developers often write when learning a new programming language.


My name is Roland Jochems and I have been working in automation for almost 25 years, the last 20 of them at CIMSOLUTIONS. In that time I have regularly had to evaluate examples of how to use a technique or a service, and invariably you run into "hello world"-style examples. As friendly as "hello world" sounds, it is naive to decide on the basis of these examples whether a technique or service can also solve your own real-world problems.

For a long-standing CIMSOLUTIONS customer known for its ball bearings, I have been working for a while as a systems architect on a system for processing and analyzing data measured on the ball bearings of trains. The customer has chosen Amazon Web Services for all of its customer-facing applications.

AWS
Because of the customer's choice for AWS, it is up to me to process the data in AWS. An additional requirement from the customer was to use "serverless" managed services as much as possible, instead of virtual servers (EC2) with software running on them. One reason is to minimize the cost of managing virtual machines. Another is that using managed services should be cheaper. I specifically say "should be", because that is not always true and depends a lot on how you use the service.

To get an idea of what kind of application I'm talking about:

- Imagine a train where a sensor is mounted close to the bearing on every wheel; each sensor performs measurements completely independently and uploads its data when it suits the sensor best.
- Such a train has, for example, three trainsets, each with 8 wheels divided over 2 bogies. That is 24 sensors for that train.
- Furthermore, this customer has 16 trains, which amounts to a total of 384 sensors.
- Each day, before the timetable starts again, these sensors send the data measured the day before: on average 285 individually measured values per sensor per day, so 384 * 285 ≈ 109,000 records per day for this single customer. Last January, 2,586 sensors together sent an average of 1,065,866 measurement values per day.
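The fleet sizing above can be checked with a quick back-of-the-envelope calculation (the figures are the ones from the list: 3 trainsets of 8 wheels per train, 16 trains, 285 measurements per sensor per day):

```python
# Back-of-the-envelope sizing for the fleet described above.
TRAINSETS_PER_TRAIN = 3
WHEELS_PER_TRAINSET = 8
TRAINS = 16
MEASUREMENTS_PER_SENSOR_PER_DAY = 285

sensors_per_train = TRAINSETS_PER_TRAIN * WHEELS_PER_TRAINSET  # 24
total_sensors = sensors_per_train * TRAINS                      # 384
records_per_day = total_sensors * MEASUREMENTS_PER_SENSOR_PER_DAY
```

Note that the bulk of those ~109,000 records arrives in a short window each morning, which is exactly what causes the challenge below.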
Architecture
I dug into the AWS literature and went to the re:Invent conference in Las Vegas to learn more about AWS and how best to solve the use case above. Below is a standard architecture as you will find it in the AWS documentation; I have simplified the picture a bit.

[Figure: standard AWS architecture]

We therefore also had the above architecture in mind for processing our sensor data.

Challenge 1: All "example" and reference applications I have seen so far rely on sensors that have a mostly constant connection and send data the moment they measure it, or send it through an "edge" computing device. Even with 10k+ sensors this causes no problems, because the data is spread fairly evenly over the day. However, as described in the use case, in our case the bulk of the data is sent in a relatively short period of time.

Kinesis


Kinesis is a streaming service that regulates the amount of data by means of shards. Each shard can ingest 1,000 records or 1 MB per second. The number of shards you have available is configurable, and each shard incurs a cost.

Records are assigned to a shard via a partition key. This way you can guarantee that all data with the same key goes through the same shard and therefore arrives in the same order.
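To make the partition-key mechanism concrete: Kinesis takes the MD5 hash of the partition key and maps the resulting 128-bit integer onto the hash-key ranges of the shards. The sketch below mimics that mapping for evenly split shards (the `sensor-42` key name is just an illustration); the point is that the same key always lands on the same shard, which is what preserves per-sensor ordering:

```python
import hashlib

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Approximate the shard a partition key maps to, assuming the
    stream's shards split the 128-bit hash-key space evenly."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * shard_count // 2**128

# The same key is always routed to the same shard:
a = shard_for_key("sensor-42", shard_count=4)
b = shard_for_key("sensor-42", shard_count=4)
```

In a real producer you would pass the sensor ID as `PartitionKey` on each `PutRecord`/`PutRecords` call and let Kinesis do this routing for you.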

We quickly ran into the fact that a Kinesis shard can only process 1,000 records per second; if all sensors of one train send their data at the same time, we soon hit this limit.

Solution 1: Increase the number of shards to handle the peak load. The downside is that after the peak we pay for capacity we don't use.
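A rough cost sketch makes the downside of solution 1 tangible. The shard-hour price below is an assumption (in the neighbourhood of the published us-east-1 provisioned rate; check the current AWS pricing page), and the shard counts are illustrative, not from the project:

```python
# Hypothetical cost of provisioning for the peak around the clock.
SHARD_HOUR_USD = 0.015  # assumed price; verify against current AWS pricing

def monthly_shard_cost(shards: int, hours: int = 730) -> float:
    """Cost of keeping `shards` provisioned for a whole month (~730 h)."""
    return shards * hours * SHARD_HOUR_USD

# Provisioning 10 shards all month to absorb a short daily peak means
# paying for 10 shards even in the ~23 quiet hours of each day.
peak_provisioned = monthly_shard_cost(10)
```

The absolute amounts are small at this scale, but the pattern (paying 24/7 for a once-a-day peak) is what solution 2 avoids.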

Solution 2: Absorb the peak load by means of S3 and SQS.

S3

Instead of sending the messages to Kinesis all at once, they are first written to S3. As soon as the sensor is done uploading, a message is put on the SQS queue signalling that the sensor has finished sending and that its data can be processed. The data handler then reads all data for that particular sensor, bundles it to generate fewer messages per second, and sends the bundles to Kinesis.
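The bundling step of the data handler can be sketched as a small pure function. The bundle size of 50 measurements per record is an assumption for illustration; in practice you would tune it so each record stays under the 1 MB Kinesis limit:

```python
import json

def bundle_measurements(sensor_id, measurements, per_record=50):
    """Pack many small measurements into fewer Kinesis records.

    All records for one sensor share the same partition key, so they
    land on the same shard and keep their order.
    """
    for i in range(0, len(measurements), per_record):
        chunk = measurements[i:i + per_record]
        yield {
            "Data": json.dumps({"sensor": sensor_id, "values": chunk}),
            "PartitionKey": sensor_id,
        }
```

In the Lambda handler, the resulting records would be passed in batches to the Kinesis `PutRecords` API (which accepts up to 500 records per call), turning a burst of hundreds of per-measurement messages into a handful of calls.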

S3, SQS and Lambda are not free either, but this still costs less than increasing the number of shards to accommodate the peak load.

This project, which is still in development, has already cleared several of these hurdles, where a proposed technology didn't do exactly what you expected it to.

I will show more of these "hello world vs. real world" examples in a future blog post.

Roland Jochems
Systems Architect
 
