We design data pipelines capable of handling extreme data volumes. We connect field devices to high-throughput ingestion endpoints and integrate machine learning and analytics software for inline, real-time processing.
We combine proven open-source platforms such as Kafka and Spark with managed services such as AWS EMR, S3, and Redshift to deliver the most cost-effective solutions.
Our client collected location and telematics data from remote sensors every 5 minutes. With a new business case, the client increased the collection frequency to 10-second intervals and expanded the number of parameters per reading from 4 to 20. The existing infrastructure could not support the resulting increase in throughput: roughly 30 times the message rate and, with 5 times as many parameters per message, roughly 150 times the data volume.
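The scale-up implied by those numbers can be checked with a quick back-of-the-envelope calculation (the intervals and parameter counts come from the case above; the script itself is purely illustrative):

```python
# Back-of-the-envelope throughput scaling for the sensor fleet.
# Interval lengths and parameter counts are taken from the case study;
# the factors below are derived arithmetic, not measured figures.

old_interval_s = 5 * 60   # one reading per device every 5 minutes
new_interval_s = 10       # one reading per device every 10 seconds
old_params = 4            # parameters per reading, before
new_params = 20           # parameters per reading, after

rate_factor = old_interval_s / new_interval_s            # messages per second
volume_factor = rate_factor * (new_params / old_params)  # data points overall

print(f"Message rate grows {rate_factor:.0f}x")    # 30x
print(f"Data volume grows {volume_factor:.0f}x")   # 150x
```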
We assessed the throughput requirements, then projected three-year volume based on the company's growth numbers. We iterated through two prototypes with the client to ensure the solution met their architecture and budget constraints, then developed the solution in a series of eight agile sprints, each delivering incremental value.
The solution we implemented for the client ingests IoT data through a Kafka cluster into an EMR-based Spark analytics engine for real-time anomaly detection, with an S3/Redshift back end. The client can now handle the current volume of 100K+ transactions per second, with scalable capacity to accommodate years of growth.
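To give a flavor of the anomaly-detection stage, a rolling z-score check of the kind such a streaming job might apply per sensor is sketched below. This is a simplified, illustrative stand-in, not the client's production code: the class name, window size, and threshold are hypothetical, and the real logic ran as a Spark job on EMR rather than in plain Python.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flags readings more than `z_thresh` standard deviations away from
    the rolling mean of the last `window` readings.

    Illustrative sketch only; in production this kind of check would run
    per sensor key inside the Spark streaming job."""

    def __init__(self, window=30, z_thresh=3.0):
        self.window = deque(maxlen=window)  # recent readings for this sensor
        self.z_thresh = z_thresh

    def check(self, value):
        """Return True if `value` is anomalous relative to recent history."""
        anomalous = False
        if len(self.window) >= 2:
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.z_thresh:
                anomalous = True
        self.window.append(value)  # update history after the check
        return anomalous

# Example: a stream of stable readings with one spike at index 8.
detector = RollingAnomalyDetector(window=10, z_thresh=3.0)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 10.1, 50.0, 10.0]
flags = [detector.check(v) for v in readings]
print(flags.index(True))  # prints 8: only the spike is flagged
```

A per-sensor detector like this keeps state small and bounded (one fixed-size window per device), which is what makes the approach practical at 100K+ messages per second when sharded by sensor key across a cluster.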