Summary Statistics for Statistical Genetics


In the summer of 2019 I began methodological research on using pre-computed summary statistics (PCSS) from existing genome-wide association studies (GWASs) and PCSS repositories to perform new analyses. These "new" analyses could include analyzing functions of multiple previously analyzed phenotypes, adjusting analyses for additional covariates, running region-based tests, or any combination of these.

By common PCSS, we typically mean only the mean and variance of each variable of interest, together with the correlation between any two variables. (GWASs often report the slope coefficients from simple linear regression models, which can easily be transformed into the correlation between the independent and dependent variables.)
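The slope-to-correlation transformation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the reported slope comes from a simple linear regression of y on x and that both variances are available; the function name and the numbers are hypothetical and are not taken from any GWAS or from pcsstools.

```python
import math


def slope_to_correlation(slope, var_x, var_y):
    """Convert the slope from regressing y on x into Pearson's correlation.

    Uses the identity r = slope * sd(x) / sd(y), which holds for simple
    (single-predictor) linear regression.
    """
    return slope * math.sqrt(var_x / var_y)


# Hypothetical summary statistics, chosen only for illustration.
r = slope_to_correlation(slope=0.5, var_x=4.0, var_y=9.0)
```

The identity only applies to single-predictor models; a slope from a model that already adjusts for covariates would not convert this way.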

My first publication in this field demonstrated how simple PCSS could be used to adjust linear models for an arbitrary set of covariates and model the principal component scores of a set of responses with these covariates. We applied our method to data from the Framingham Heart Study and used only pre-computed summary statistics to model various principal components of Omega-3 and Omega-6 fatty acids with adjustments for age, sex, and other fatty acid levels, finding the same results as when these models were fit directly on individual patient data.
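To give a sense of why covariate adjustment is possible without individual-level data, here is a minimal sketch of ordinary least squares computed from means and covariances alone. It relies on the standard result that the slope vector solves cov(X, X) b = cov(X, y) and that the intercept is mean(y) minus the slope-weighted predictor means. The helper names and toy data are hypothetical, and this is not the pcsstools interface; the toy data are used only to produce the summary statistics.

```python
def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_from_summary(means_x, cov_xx, cov_xy, mean_y):
    """Return (intercept, slopes) using only means and covariances."""
    slopes = solve(cov_xx, cov_xy)
    intercept = mean_y - sum(b * mx for b, mx in zip(slopes, means_x))
    return intercept, slopes


# Toy data (hypothetical) with y = 1 + 2*x1 - 3*x2 exactly.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]

# Reduce the data to summary statistics, then fit from those alone.
xs = [x1, x2]
means_x = [mean(x) for x in xs]
cov_xx = [[cov(a, b) for b in xs] for a in xs]
cov_xy = [cov(x, y) for x in xs]
intercept, slopes = ols_from_summary(means_x, cov_xx, cov_xy, mean(y))
```

Standard errors would additionally require the sample size and the variance of the response, which is why those quantities are part of the PCSS described earlier.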

A second paper proposes methods to approximate single-marker tests for multiplicative and logical combinations of phenotypes. I also presented these findings in a platform presentation at the International Genetic Epidemiology Society’s 30th Annual Meeting. In addition, a forthcoming paper shows how to use PCSS to perform region-based tests (e.g., SKAT and burden tests, or F tests for nested linear models).

All of these methods are implemented in the open-source R package, pcsstools.

Jack M. Wolf
Biostatistician and Educator

I’m a biostatistics PhD student at the University of Minnesota interested in causal inference, clinical trial design, and statistics and data science education.