PhD - Biostatistics and Data Science
The Doctor of Philosophy (PhD) program in Biostatistics & Data Science prepares each graduate to lead cutting-edge research and act as a consummate resource in the design, analysis, and interpretation of a wide array of studies. Graduates will possess the technical and collaborative skills necessary to work with clinicians, epidemiologists, private companies, and population health organizations. This program bridges competencies in statistics, computer science, and epidemiology.
Choose from three areas of emphasis
After receiving two years of broad training, students complete a dissertation in one or more areas of emphasis:
- Biostatistics - Develop new statistical methods to accurately interpret biomedical and population health data.
- Bioinformatics & Genomics - Investigate the molecular and environmental basis of human health and disease using high-throughput data.
- Data Science - Turn vast amounts of data into actionable evidence using computer science, data mining, applied mathematics, predictive analytics, and data visualization.
Primary objective
Students must complete a dissertation expanding knowledge in one or more emphasis areas:
- Students selecting the Biostatistics track as their primary emphasis area will be expected to develop new statistical methods to accurately interpret biomedical and population health data.
- Students completing the Bioinformatics & Genomics track will be equipped to analyze a broad range of biological data (including genomics, transcriptomics, proteomics, metabolomics, and epigenomics) to investigate the molecular and environmental basis of human health traits and diseases.
- Students completing the Data Science track will be able to build systems that turn vast amounts of data into actionable evidence, which requires additional knowledge of computer science, data mining, applied mathematics, predictive analytics, and data visualization.
The doctoral course of study includes supervised consulting, internships, and the dissertation, giving students ample opportunity to work with high-quality data and established researchers from two epidemiologic studies supported by the National Institutes of Health. The Jackson Heart Study (JHS) is the largest single-site study ever conducted of cardiovascular disease and its causes in African Americans. The Atherosclerosis Risk in Communities (ARIC) study is designed to investigate the causes of atherosclerosis and its clinical outcomes, as well as variation in cardiovascular risk factors and disease by race, gender, and location.
Graduates of the program will be able to:
- Efficiently collect, clean, organize, and appropriately analyze biomedical, clinical, and population health data;
- Use standard statistical programming languages (R, SAS, and Stata) and general-purpose programming languages (Python) to explore and visualize data reproducibly, fit models, conduct inference, and translate analysis results;
- Conduct all facets of big data analysis, including the extraction, storage, manipulation, and analysis of massive genetic and bioinformatics datasets;
- Convert information contained in databases and data warehouses into actionable findings using machine learning and other data science techniques;
- Adhere to rigorous ethical and methodological standards when analyzing real-world data;
- Collaborate with non-statisticians and communicate findings to the scientific community and the general public to improve health care and prevent disease;
- Lead cutting-edge methodological, genetic epidemiological, or data science research;
- Act as a consummate resource in the design, analysis, and interpretation of a wide array of studies.