What is the NDFF?
The NDFF (Nationale Databank Flora en Fauna) is a data bank containing more than 200 million validated observations of flora and fauna across the Netherlands, most of which are reported by volunteers. These observations are currently used by provinces, municipalities, the so-called waterschappen (water boards), universities, and even companies to inform policies affecting nature. This ranges from choosing suitable locations for housing projects to nature management, monitoring and conservation. It is therefore unsurprising that politicians have decided, as part of the new coalition agreement, that the data in the NDFF should be open for all Dutch citizens to access.
Open Data: a new chapter, new challenges
Opening up the data poses several challenges that shape what an effective solution looks like. Currently, the people and organisations with access to the NDFF data are registered. As a result, both the number of requests to the data bank and the specific data they target are predictable. This knowledge provides clear guidelines and requirements for designing an infrastructure that can handle these requests efficiently. However, since the NDFF will be open to anyone for the first time ever, we simply do not know how large the volume of requests will be, what data they will target, or how this will evolve over time.
Designing a data platform
In order to provide high-quality data with fast response times to the entire country despite these challenges, we at Aurai are helping BIJ12 design and implement a data platform that can handle this type of workload by treating data as a first-class citizen in the NDFF's suite of products.
A data platform that can deliver data under these circumstances should have certain properties. It should be highly scalable in order to handle the unknown volume of requests. All data transformation processes should guarantee consistent and correct results. Moreover, it should have a clear Data Governance policy, with the ability to audit data access in order to prevent unauthorized use. Finally, the platform should be able to function and deliver value for many years to come. It therefore needs to be built so that it is easy to expand and change. This requires not only a modular design and software engineering best practices, but also a platform that the various domain experts of BIJ12 can understand and work with during the entire data lifecycle.
Implementing a data platform
The data platform is implemented using performant and scalable technologies offered on Microsoft Azure. We use Databricks both to run pipelines that perform large-scale transformations and to apply a Data Governance strategy that ensures secure access. Some of these transformations involve geospatial data, for which we use the Mosaic package, which processes this type of data using the Spark framework. In order to ensure that the platform produces high-quality data and will continue to do so, we use dbt (data build tool) to develop, organize, and continuously test all transformations in a version-controlled and documented fashion. The data produced is then loaded into a scalable database hosted on Azure, where it is stored ready for consumption and optimized for low-latency access.
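To give a feel for the kind of geospatial transformation mentioned above, here is a minimal sketch in plain Python. It is not the platform's actual code: the real pipelines run on Spark with Mosaic, while this stand-in uses only the standard library. The `Observation` type, the `grid_cell` and `count_per_cell` helpers, the 1 km cell size and the sample coordinates are all assumptions made up for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """A single sighting (hypothetical structure, for illustration only)."""
    species: str
    x: float  # easting in metres, in some projected coordinate system
    y: float  # northing in metres


def grid_cell(x: float, y: float, cell_size: float = 1000.0) -> tuple[int, int]:
    """Return the integer index of the square grid cell that contains (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def count_per_cell(
    observations: Iterable[Observation], cell_size: float = 1000.0
) -> Counter:
    """Count observations per (species, grid cell) pair."""
    counts: Counter = Counter()
    for obs in observations:
        counts[(obs.species, grid_cell(obs.x, obs.y, cell_size))] += 1
    return counts


# Made-up sample data: two sightings in the same cell, one in a neighbouring cell.
observations = [
    Observation("Vulpes vulpes", 155_250.0, 463_100.0),
    Observation("Vulpes vulpes", 155_900.0, 463_999.0),
    Observation("Bufo bufo", 156_000.0, 463_500.0),
]
counts = count_per_cell(observations)
```

Aggregating to grid cells like this is one common way to make large numbers of point observations cheap to query. On the platform itself, the same idea would be expressed as a distributed transformation rather than a Python loop.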
We are super excited and still hard at work as part of the NDFF team, helping to realize this new chapter in its existence and drive value across the entire country.
Do you want to know more about Aurai, career opportunities, or opportunities to work together with us? Feel free to reach out to Jeroen or Casper (@aurai.com)!