Sharing research data in DANS

Cutting the cake with M. Koelen and T. Veldkamp, Research Support Coordinator and Dean of the ITC Faculty

Sometimes serendipity knocks at your door in a very unusual way. Here in the Netherlands, getting your PhD degree requires uploading your research data to a public repository once your research is concluded. I uploaded mine to the DANS repository (coordinated by KNAW and NWO), and this took an unexpected turn: my dataset was the 100,000th deposited in this repository, so the folks at DANS sent a (nerdy) cake to the ITC Faculty (UTwente) to celebrate the occasion! Thanks for your kindness and this sweet treat!

A fundamental part of advancing science is sharing results with your peers and making data available to the general public. In this way, we help develop new ideas and together we contribute to reducing the gap between science and society. Science is about reusing the knowledge and the building blocks created by researchers before us. Initiatives like DANS facilitate this continuous recycling and sharing of ideas, which in turn might speed up new research lines. Here you can check the persistent identifier associated with my PhD research: https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:131887

In case you are interested in this dataset, let me explain how it is organized. After clicking on the identifier above, you are presented with the metadata. In the files tab you can access the actual files I used during my PhD research. Note that this research uses data provided by third parties (RIVM and Wageningen University), so you need to log in or request permission from DANS to actually see and use the files. There are two folders here: materials and KNMI. The materials folder contains the code to run the models and several small datasets (text files, raster files), organized on a “per paper” basis. Keep in mind that subfolder “P02” contains the model predicting hazard (tick activity), whereas the rest of the subfolders contain code to estimate the risk component (tick bites). You can read more about my research and why calculating these components is important in this post. The subfolder linked to each publication contains a small text file briefly explaining how to run the models and what outputs they produce. The KNMI folder contains all the weather raster files used during this research, which are mainly aggregations of several weather variables at different time scales.
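If you have been granted access and have downloaded the files, a quick way to get your bearings is to list what each subfolder holds. The snippet below is only a minimal sketch under a few assumptions that do not come from DANS: the download was unpacked into a local folder (the path `dans_download` is hypothetical), the two top-level folders are named `materials` and `KNMI` as described above, and the short instruction files use a `.txt` extension. The function name `summarize_dataset` is made up for illustration.

```python
from pathlib import Path


def summarize_dataset(root):
    """Map each subfolder of `materials` to its text files and count the KNMI files."""
    root = Path(root)
    materials = root / "materials"
    knmi = root / "KNMI"

    per_paper = {}
    if materials.is_dir():
        for sub in sorted(p for p in materials.iterdir() if p.is_dir()):
            # Assumption: the short run instructions are stored as .txt files.
            per_paper[sub.name] = sorted(f.name for f in sub.glob("*.txt"))

    # Count every file below KNMI, whatever its raster format is.
    n_weather = 0
    if knmi.is_dir():
        n_weather = sum(1 for f in knmi.rglob("*") if f.is_file())

    return per_paper, n_weather


if __name__ == "__main__":
    # Hypothetical location of the unpacked download; adjust to your own path.
    papers, n_weather = summarize_dataset("dans_download")
    for name, notes in papers.items():
        print(name, notes)
    print("KNMI files:", n_weather)
```

Reading the text file listed for a subfolder first is probably the easiest way to find out how to run the corresponding model.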

When preparing this submission to DANS EASY, I basically extracted all these files and datasets from a larger workspace (that somehow survived all these years), organized them in this compact way, and ran the experiments one by one to make sure the code is functional. So, well (sweats nervously), em… (crosses fingers) this means that (hopefully) the experiments should run on your computer too!

You can read the official ITC press release here.

Irene Garcia-Marti
PhD Data Scientist

I have a keen interest in applying machine learning methods in the field of spatio-temporal analytics.