- The toolkit uses developer-friendly Jupyter notebooks to support in-depth analysis
- The toolkit cleans up COVID-19 data from authoritative sources and formats it for analysis with tools like Pandas and Scikit-Learn
IBM has released an open-source toolkit for developers and data scientists who want to spot trends in the COVID-19 pandemic. The toolkit uses developer-friendly Jupyter notebooks to support in-depth analysis.
The toolkit cleans up COVID-19 data from authoritative sources and formats it for analysis with tools like Pandas and Scikit-Learn. Then, it builds an initial set of example reports and graphs.
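To give a sense of what this kind of cleanup involves, here is a minimal sketch in Pandas. It is not taken from IBM's repository: the sample data, the column names (`FIPS`, `County`, `State`) and the helper `tidy_cases` are assumptions made for illustration. It reshapes a wide table of cumulative counts (one column per date) into one row per county and date, then derives daily new cases.

```python
import io

import pandas as pd

# Hypothetical sample in a wide layout: one row per county, one column per date.
# The column names are assumptions for illustration, not the toolkit's schema.
SAMPLE_CSV = """FIPS,County,State,1/22/20,1/23/20,1/24/20
36061,New York,New York,0,1,3
06037,Los Angeles,California,0,0,2
"""


def tidy_cases(wide, id_cols):
    """Reshape wide cumulative counts into one row per (county, date)."""
    long = wide.melt(id_vars=id_cols, var_name="Date", value_name="Confirmed")
    # Parse the date headers (month/day/two-digit year) into real timestamps.
    long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
    long = long.sort_values(id_cols + ["Date"]).reset_index(drop=True)
    # Daily new cases are the day-over-day difference within each county;
    # the first day has no predecessor, so it is filled with 0.
    long["New"] = long.groupby(id_cols)["Confirmed"].diff().fillna(0).astype(int)
    return long


# Read FIPS as text so leading zeros (as in "06037") are preserved.
wide = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"FIPS": str})
cases = tidy_cases(wide, id_cols=["FIPS", "County", "State"])
print(cases)
```

The long layout is the shape that grouping, plotting and model-fitting tools generally expect, which is why reshaping tends to come before any analysis.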
Rely on data from some key, authoritative sources
The company said in a blog post, “Taking care of these tasks frees developers and data scientists to focus on advanced analysis and modeling tasks instead of worrying about things like data formats and data cleaning. Our repository uses developer-friendly Jupyter notebooks to cover each of these initial data analysis steps.”
It added, “It’s important to note that the underlying data for COVID-19 changes on a daily basis. As you build your own analysis, you’ll want to update the results of your own notebooks frequently. But rerunning a collection of interconnected notebooks can be challenging. There are multiple stages of analysis, and the output of one step often feeds into multiple other steps. To simplify the process of updating your results with the latest data, we’ve created data processing pipelines using the Elyra Notebook Pipelines Visual Editor and KubeFlow Pipelines.”
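The rerun problem IBM describes is essentially one of dependency ordering. The sketch below illustrates the idea using only Python's standard library (`graphlib`, available from Python 3.9). It does not use the Elyra or Kubeflow Pipelines APIs, and the notebook names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook names. Each key maps to the set of notebooks
# whose output it reads, so one step can feed several later steps.
DEPENDS_ON = {
    "load_data.ipynb": set(),
    "clean_data.ipynb": {"load_data.ipynb"},
    "county_trends.ipynb": {"clean_data.ipynb"},
    "state_maps.ipynb": {"clean_data.ipynb"},
    "summary_report.ipynb": {"county_trends.ipynb", "state_maps.ipynb"},
}


def run_order(depends_on):
    """Return an order in which every notebook runs after its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
for step, notebook in enumerate(order, start=1):
    print(step, notebook)
```

A pipeline tool adds scheduling, caching and a visual editor on top of this, but the core guarantee is the same: no notebook runs before the notebooks it depends on.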
The COVID notebooks rely on data from a few key, authoritative sources. For county-level data in the US, the notebooks draw on the COVID-19 Data Repository run by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. They also use The New York Times Coronavirus (Covid-19) Data in the United States repository and New York newspaper THE CITY’s digest of the daily reports from the New York City Department of Health and Mental Hygiene. For other countries, they use the European Centre for Disease Prevention and Control’s data on the geographic distribution of COVID-19 cases globally.
Download all of these data sets as they run
It added, “The notebooks download all of these data sets as they run, for two reasons. First of all, these data sets change on a daily basis. Second, the license terms of the data sets do not allow commercial entities to redistribute the data. If you use our open source code in a commercial application, be sure to verify that you are staying within the bounds of the license terms for the underlying data.”
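The download-at-run-time pattern can be sketched as follows. This is an illustration rather than IBM's code: the function `fetch_rows`, the placeholder URL and the offline stand-in `fake_opener` are all assumptions. The point is that the code ships only a link and fetches the data on each run, so no copy of the data set is redistributed.

```python
import csv
import io
import urllib.request


def fetch_rows(url, opener=urllib.request.urlopen):
    """Download a CSV at run time and return its rows as dictionaries."""
    with opener(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def fake_opener(url):
    """Offline stand-in for urlopen so this sketch runs without a network."""
    return io.BytesIO(b"date,county,cases\n2020-04-01,Example,10\n")


# Placeholder URL: substitute the raw CSV link published by the data provider.
rows = fetch_rows("https://example.org/covid/cases.csv", opener=fake_opener)
print(rows)
```

In real use, the `opener` argument would be left at its default so that each run pulls the provider's latest file.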