Caeli: The Sky Is Not The Limit With Data Automation
Measuring by Satellite
Our first step was to design an architecture that could process and store large influxes of data quickly. Scalability was crucial, because the volume of data to process and store will inevitably grow. The obvious choice for scalability was the cloud. In this case, the Amazon Web Services (AWS) cloud platform offered the best data storage options. We created a database within AWS, along with a data pipeline that collects the acquired data on nitrogen dioxide (NO2), a gas that can form particulate nitrates, and writes it into the database.
Scalable Architecture Designed for Time and Place
When Caeli retrieves information from its own database, it often wants to filter by time and location: for example, data from Amsterdam during January 2021. Filtering by time is not a problem because the data is stored chronologically (ascending); the database system roughly ‘knows’ in which rows the January 2021 records are found.
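To illustrate why chronological storage makes time filtering cheap, here is a minimal sketch using binary search on a sorted list of timestamps. The sample records and the function name `filter_by_time` are made up for this illustration; the actual database handles this internally and may work differently.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical sample data, stored in ascending time order.
timestamps = [
    datetime(2020, 12, 30),
    datetime(2021, 1, 5),
    datetime(2021, 1, 20),
    datetime(2021, 2, 3),
]
no2_values = [11.0, 14.2, 9.8, 12.5]  # made-up measurements


def filter_by_time(start, end):
    """Return the values with start <= timestamp < end.

    Because the timestamps are sorted, two binary searches locate the
    first and last matching rows without scanning the whole list.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return no2_values[lo:hi]


january_2021 = filter_by_time(datetime(2021, 1, 1), datetime(2021, 2, 1))
```

Only the rows between the two search positions are touched, which is the sense in which the system 'knows' where the January 2021 records are.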
However, it becomes more complicated when you also want to filter the data by location. The visual data is not stored in order of its X and Y coordinates, and only about one record in a million in the database actually matches coordinates in Amsterdam. Checking every one of these rows is very inefficient, so the challenge was to find an architecture that could filter efficiently across multiple dimensions.
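One common way to filter on location and time together is to bucket records into coarse grid cells and keep each cell's records in time order. The sketch below shows that general idea only; it is not necessarily the design built for Caeli, and the class `GridTimeIndex`, the 0.5 degree cell size, and the sample measurements are all assumptions made for illustration.

```python
import math
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime

CELL_SIZE_DEG = 0.5  # assumed cell size, chosen only for this example


def cell_of(lat, lon):
    """Map a coordinate to the (row, col) of its grid cell."""
    return (math.floor(lat / CELL_SIZE_DEG), math.floor(lon / CELL_SIZE_DEG))


class GridTimeIndex:
    """Hypothetical in-memory index: grid cell -> time-ordered records."""

    def __init__(self):
        self._cells = defaultdict(list)

    def add(self, ts, lat, lon, value):
        # Assumes records arrive in ascending time order, as in the article.
        self._cells[cell_of(lat, lon)].append((ts, lat, lon, value))

    def query(self, lat_min, lat_max, lon_min, lon_max, start, end):
        """Return records inside the bounding box with start <= ts < end."""
        row_lo, col_lo = cell_of(lat_min, lon_min)
        row_hi, col_hi = cell_of(lat_max, lon_max)
        hits = []
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                rows = self._cells.get((row, col), [])
                # Binary search on the timestamp, the first tuple element.
                lo = bisect_left(rows, (start,))
                hi = bisect_left(rows, (end,))
                for ts, lat, lon, value in rows[lo:hi]:
                    # Exact check, since a cell can extend beyond the box.
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        hits.append((ts, lat, lon, value))
        return hits


index = GridTimeIndex()
index.add(datetime(2021, 1, 10), 52.37, 4.90, 14.2)  # Amsterdam, January
index.add(datetime(2021, 1, 12), 51.92, 4.48, 16.0)  # Rotterdam, January
index.add(datetime(2021, 2, 3), 52.37, 4.90, 12.5)   # Amsterdam, February

amsterdam_january = index.query(
    52.30, 52.45, 4.75, 5.00,
    datetime(2021, 1, 1), datetime(2021, 2, 1),
)
```

The query visits only the cells that overlap the bounding box and, within each, only the rows in the requested time window, so the vast majority of records elsewhere in the world are never examined.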
Applying Knowledge and Skills
We made intensive use of many tools and techniques that the Aurai traineeship introduces. Both Amazon Web Services (AWS) and the Hadoop ecosystem, which together were the focus of this assignment, were extensively practiced during the traineeship.
The training prepared everyone to provide the best solution for Caeli’s data management. On the one hand, Caeli receives a working end product that meets its requirements for precision and accessibility of information. Nitrogen dioxide (NO2) data is now automatically collected and stored in a scalable database where records can be efficiently filtered by both date-time and location. On the other hand, clear documentation of processes and transfer protocols allows Caeli to manage this product itself and to reuse it for other atmospheric particulate matter.