Caeli: The Sky Is Not The Limit With Data Automation

Caeli is a startup dedicated to providing insight into air quality with a view from above. The satellites orbiting our planet provide end users with historical and (near) real-time information. Satellite imagery can be a cheaper and more readily available option than ground-based measurement as a tool for measuring the molecular composition of our atmosphere. Maps displaying gases such as nitrogen dioxide (NO2), ammonia (NH3), methane (CH4) and ozone (O3) can help the public and government understand how changes to the atmosphere may affect health or influence the climate.

Measuring by Satellite

Our first step was to design an architecture that could quickly process and store large influxes of data. Scalability was crucial, given the inevitable growth in the volume of data to process and store, and that made the cloud the obvious choice. In this case, the Amazon Web Services (AWS) cloud platform provided the best data storage options. We created a database within AWS and a data pipeline that collects the acquired data and writes it into the database, built for NO2, a gas that can form particulate nitrates.

Scalable Architecture Designed for Time and Place

When Caeli retrieves information from its own database, it often wants to filter by time and location: for example, data from Amsterdam during January 2021. Filtering by time is not a problem because the data is stored chronologically (ascending); the database system roughly ‘knows’ in which rows the January 2021 records are found.
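Why chronological storage makes the time filter cheap can be illustrated with a minimal Python sketch. It is not Caeli's actual database code: the record layout, the values and the `filter_by_time` helper are assumptions made purely for illustration. Because the timestamps are sorted, a binary search finds the start and end of January 2021 without scanning every row.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical records (timestamp, latitude, longitude, no2_value),
# stored in ascending timestamp order as described above.
records = [
    (datetime(2020, 12, 30, 12), 52.37, 4.90, 41.0),
    (datetime(2021, 1, 5, 12), 48.85, 2.35, 55.0),
    (datetime(2021, 1, 17, 12), 52.37, 4.90, 38.5),
    (datetime(2021, 2, 2, 12), 52.37, 4.90, 36.0),
]

# A sorted list of timestamps plays the role of the database's time index.
timestamps = [row[0] for row in records]


def filter_by_time(rows, index, start, end):
    """Return rows with start <= timestamp < end using binary search."""
    lo = bisect_left(index, start)  # first row at or after `start`
    hi = bisect_left(index, end)    # first row at or after `end`
    return rows[lo:hi]


january_2021 = filter_by_time(
    records, timestamps, datetime(2021, 1, 1), datetime(2021, 2, 1)
)
```

Note that the result still contains records from every location that was measured in January 2021, which is exactly the problem the location filter has to solve.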

However, it becomes more complicated when you also want to filter the data by location. The records are not stored in order of their X and Y coordinates, and only about one record in a million in the database actually matches coordinates in Amsterdam. Checking every one of these rows is very inefficient, so the challenge was to find an architecture that could filter efficiently across multiple dimensions.
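One common way to filter on location and time together is to sort records on a composite key: a coarse grid cell first, then the timestamp. The Python sketch below illustrates that general idea only. The text does not say which technique was actually chosen for Caeli, and the grid size, the `cell_id` and `query` helpers, and the sample values are all assumptions.

```python
import math
from bisect import bisect_left
from datetime import datetime

CELL_SIZE_DEG = 0.5  # assumed grid resolution, purely illustrative


def cell_id(lat, lon):
    """Map a coordinate to a coarse grid cell (row, column)."""
    return (math.floor(lat / CELL_SIZE_DEG), math.floor(lon / CELL_SIZE_DEG))


# Hypothetical measurements: (timestamp, latitude, longitude, no2_value).
measurements = [
    (datetime(2021, 1, 5, 12), 48.85, 2.35, 55.0),    # Paris
    (datetime(2021, 1, 17, 12), 52.37, 4.90, 38.5),   # Amsterdam
    (datetime(2020, 12, 30, 12), 52.37, 4.90, 41.0),  # Amsterdam
    (datetime(2021, 2, 2, 12), 52.37, 4.90, 36.0),    # Amsterdam
    (datetime(2021, 1, 20, 12), 52.40, 4.95, 39.2),   # Amsterdam
]

# Sort on the composite key (cell, timestamp) so that all records for one
# cell are adjacent and, within that cell, ordered by time.
rows = sorted(measurements, key=lambda m: (cell_id(m[1], m[2]), m[0]))
keys = [(cell_id(m[1], m[2]), m[0]) for m in rows]


def query(cell, start, end):
    """Return rows in `cell` with start <= timestamp < end via binary search."""
    lo = bisect_left(keys, (cell, start))
    hi = bisect_left(keys, (cell, end))
    return rows[lo:hi]


amsterdam_cell = cell_id(52.37, 4.90)
result = query(amsterdam_cell, datetime(2021, 1, 1), datetime(2021, 2, 1))
```

With this layout a query touches only the contiguous block of rows for one cell and one time range instead of checking every row. A real city would usually span several cells, so a complete query would repeat the lookup for each cell it covers.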

Take a look at Caeli’s data – simplified

Applying Knowledge and Skills

We made intensive use of many of the tools and techniques introduced in the Aurai traineeship. Both Amazon Web Services (AWS) and the Hadoop ecosystem, which together were the focus of this assignment, were practiced extensively during the traineeship.

The training prepared everyone to provide the best solution for Caeli’s data management. On the one hand, Caeli receives a working end product that meets the required precision and accessibility of information. Nitrogen dioxide (NO2) data is now automatically collected and stored in a scalable database where records can be efficiently filtered by both date-time and location. On the other hand, clear documentation of processes and transfer protocols allows Caeli to manage this product itself and to reuse it for other atmospheric compounds.