The University of Texas MD Anderson Cancer Center
Department of Biostatistics
I develop statistical methodology for analyzing large, complex data sets arising from scientific applications. These include 'omic data sets (genomic, gene expression, proteomic) from sequencing studies. Much of my current research focuses on causal inference with directed acyclic graphs. I have two current statistical methodology projects:
- Causal Mediation Analysis: Disease-causing agents (e.g., single nucleotide polymorphisms, smoking) affect clinical outcomes (e.g., cancer status or severity) through multiple pathways (e.g., changes in mRNA expression, DNA methylation). Using causal mediation analysis, one can decompose the causal effect into pathway-specific components, providing a more refined understanding of how risk factors influence disease. In my research I am developing nonlinear, high-dimensional causal mediation models for application to cancer data sets that include SNP and mRNA expression data.
- Causal Prediction: Gene knockout experiments provide an ideal laboratory to test the assumptions of causal models constructed using observational data. I am developing a prediction-based framework for validating causal models constructed from observational data when some experimental data are available.
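As a rough illustration of the effect decomposition described under Causal Mediation Analysis, the sketch below uses the classical single-mediator, linear product-of-coefficients approach on simulated data. This is a textbook simplification, not the nonlinear, high-dimensional models under development. The function name `linear_mediation`, the simulated effect sizes, and the variable names are all hypothetical, and the causal reading of the estimates assumes linear models with no exposure-mediator interaction and no unmeasured confounding.

```python
import numpy as np


def linear_mediation(x, m, y):
    """Single-mediator product-of-coefficients decomposition (linear models).

    Hypothetical helper for illustration only. Assumes no exposure-mediator
    interaction and no unmeasured confounding.
    """
    ones = np.ones(len(x))
    # Mediator model: m = a0 + a * x
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    # Outcome model: y = b0 + direct * x + b * m
    coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
    direct, b = coef[1], coef[2]
    # Total-effect model: y = c0 + total * x
    total = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)[0][1]
    return {"direct": direct, "indirect": a * b, "total": total}


# Simulated example (all values are made up for illustration):
# x = SNP genotype coded 0/1/2, m = mRNA expression, y = continuous outcome.
rng = np.random.default_rng(0)
n = 5000
x = rng.binomial(2, 0.3, size=n).astype(float)
m = 0.5 * x + rng.normal(size=n)            # true x -> m effect: 0.5
y = 0.3 * x + 0.8 * m + rng.normal(size=n)  # true direct 0.3, m -> y 0.8

effects = linear_mediation(x, m, y)
print(effects)
```

With ordinary least squares fit on the same sample, the estimated total effect equals the sum of the direct and indirect estimates, which is the decomposition the paragraph above refers to. In the simulation the true indirect effect is 0.5 × 0.8 = 0.4 and the true direct effect is 0.3.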
In addition to these methodological projects, I collaborate with clinicians and scientists at MD Anderson on the analysis of data from clinical trials and laboratory experiments. These projects vary considerably in size and complexity.
I publish research results in medical, biostatistics, and statistics journals. I code in R and Python and release code via GitHub and CRAN.
Education & Training
Ph.D., University of California-Berkeley, 2013