Reproducible Data Science: Building a Code Pipeline from End to End


Date
Location
Baltimore, Maryland

Drawing on my experience as a 2018 Data Science for Social Good (DSSG) Fellow, I give an overview of our team’s workflow and pipeline construction, highlighting the importance of reproducible code that is easy to implement. The talk is organized around the four stages of our pipeline: 1) data processing and cleaning, 2) data staging, 3) machine learning modeling infrastructure, and 4) usability. Each stage is discussed in the context of our DSSG project, in which we built a precision medicine tool to predict an individual’s risk of developing Type 2 diabetes within the next three years.

Benjamin Ackerman
PhD Candidate & Data Scientist

I’m a health researcher interested in using statistics and data science to improve public health and promote social good.