Paul Scheet
Professor
The University of Texas MD Anderson Cancer Center
Department of Epidemiology
My main research involves developing statistical methods for the analysis of population genetic data. Specifically, I work with models for linkage disequilibrium (LD) and develop applications that properly account for the dependence among nearby loci. These applications include haplotype inference, missing genotype imputation, disease mapping, genotype error detection and correction, the visualization of haplotype variation, and the integration of emerging data types such as next-generation DNA sequencing. For these applications, I maintain a widely used software package called fastPHASE.
A rotation in my lab would expose a student to several state-of-the-art haplotype-based software packages (such as PHASE, fastPHASE, Beagle, and Mach). In the process, the student could help evaluate different LD models across various applications and settings. Because these models rely on hidden Markov model (HMM) techniques, the student would also gain an introduction to HMMs and to related algorithms such as Baum-Welch, Markov chain Monte Carlo (MCMC), and Expectation-Maximization (EM). In addition, the student could perform hands-on data analyses and gain experience working with Perl and the statistical software R. A student with strong C programming skills could develop, test, and add a new analysis function to an existing software package.
The Department of Epidemiology has space for students and postdocs and provides a rich academic atmosphere, with a large number of faculty in the Section of Computational Genetic Epidemiology. Ample computational support is available, including computing resources that run genetic analysis programs as well as standard programming languages and database management software.
Education & Training
Ph.D. - University of Washington - 2006