Reproducible Data Science: Building a Code Pipeline from End to End


Drawing on my experience as a 2018 Data Science for Social Good (DSSG) Fellow, I will give an overview of our team’s workflow and pipeline construction, highlighting the importance of reproducibility and of making our code easy to implement. The talk is divided into four sections, one for each stage of our pipeline: 1) data processing and cleaning, 2) data staging, 3) machine learning modeling infrastructure, and 4) usability. Each stage is discussed in the context of our DSSG project, in which we built a precision medicine tool to predict an individual’s risk of developing Type 2 Diabetes within the next 3 years.

JHU Biostatistics Student Computing Club
Baltimore, MD