The Battle of Two Cultures: Statistics versus (?) Data Science
- Wednesday, November 20, 2019 3:30pm - 4:45pm
- Medical Center (= Billings) AMB W229
Bhramar Mukherjee, PhD, University of Michigan, Department of Biostatistics, Ann Arbor, MI
- Abstract: The title of my talk is inspired by Leo Breiman's seminal 2001 paper, "Statistical Modeling: The Two Cultures," in which Breiman describes the intellectual tension between a classical stochastic modeler and an algorithmic modeler. The shock that "data science" has delivered to the world of statistics is substantial. The next generation of students may well find the job title "data scientist" more exciting than that of a good old statistician. In this talk, I will try to share the "joy" (and the associated anxiety) of being a classically trained statistician at a time when our science and society are undergoing an unprecedented information and data revolution. I will discuss statistical challenges and opportunities in the joint analysis of electronic health records and genomic data through Phenome-Wide Association Studies (PheWAS). I will posit a modeling framework that helps us understand the effects of both selection bias and outcome misclassification on assessing genetic associations across the medical phenome. To illustrate the analytic framework, I will use data from the UK Biobank and the Michigan Genomics Initiative, a longitudinal biorepository at Michigan Medicine launched in 2012. The examples show that understanding sampling design and selection bias matters for big data and is at the heart of doing good science with data. This is joint work with Lauren Beesley and Lars Fritsche at the University of Michigan.
- Contact: Emma M. Collier, email@example.com, 2-2453