Research overview

Our overall goal is to understand the genetic variants that underlie human disease and how their effects vary across populations. We are a multidisciplinary lab that includes both computational and wet-lab biologists. We are particularly interested in repetitive DNA variants known as tandem repeats (TRs). Much of our work is collaborative, including with groups in the Departments of Pediatrics, Medicine, Biomedical Informatics, Computer Science & Engineering, Electrical and Computer Engineering, Chemistry, and Psychiatry at UCSD, as well as at other institutions. We are especially interested in building new collaborations with clinicians. We currently focus on the areas described below.

(1) Developing computational tools for analyzing complex variation in biobank-scale genomic datasets

Tandem repeats (TRs) are among the most polymorphic regions of the genome and make outsized contributions to disease, but they are technically challenging to analyze. Our lab develops methods to enable genome-wide analysis of TRs, including:

Many of these tools have been applied to large datasets to gain new insights into patterns of TR variation and the contribution of TRs to different traits. Most of these tools, including general utilities for filtering and quality control (QC) of TR genotype data, are packaged in our TRTools package.

TR genotypes generated in many of the studies we have been involved in are available on WebSTR, a site we built in collaboration with the Anisimova Lab at ZHAW.

We are also interested in using pangenomes to understand genetic variation at repeats and other complex regions of the genome. Our lab is a member of the Human Pangenome Reference Consortium. We are working on an interactive browser to visualize pangenome data.


(2) Identifying TRs contributing to human traits

We have developed methods to integrate TRs into association testing frameworks and have applied them to uncover widespread contributions of TRs to a range of traits, including:
  • Gene expression: we identified more than 1,000 high-confidence short tandem repeats (STRs) acting as expression quantitative trait loci (eQTLs) for nearby genes (Fotsing et al. 2019). We previously estimated that STRs contribute ~10–15% of the cis heritability of gene expression in humans (Gymrek et al. 2016).
  • Blood and serum traits: we identified 119 candidate causal STR-trait associations and estimate that STRs account for 5.2%–7.6% of the causal variants identifiable by GWAS of blood biomarkers.

We have also developed and applied methods to study de novo mutations at STRs, which we found contribute to risk for autism spectrum disorder (Mitra et al. 2021). We are continuing to apply our association testing and de novo analysis frameworks to study the contribution of TRs to other traits, including molecular and disease phenotypes. We also have multiple ongoing collaborative projects that use genome editing of predicted pathogenic TRs in human iPSCs and other cell types.


(3) Studying mutation and selection processes at TRs within and across species

Tandem repeats have several properties that distinguish them from other types of variation, including rapid mutation rates and large numbers of alleles per locus. Understanding the evolutionary forces, including mutation and selection, that drive patterns of variation at these loci is critical for predicting which TR mutations are likely to be pathogenic. Our contributions include:
  • Developing mathematical models of TR mutation that enable inference of mutation rates and other parameters at individual loci (Gymrek et al. 2017)
  • Developing methods (SISTR and SISTR2) to infer negative selection at TRs
  • Identifying Msh3 as a modifier of the rate of genome-wide TR expansions in inbred mice (Maksimov & Wu et al. 2023) (highlighted on the cover of Genome Research!).
We are currently working on expanding our models of selection to enable inference in non-European populations and modeling TR mutation/selection both within and across species.

(4) Understanding how the effects of genetic variants differ across human populations

Together with Drs. Kelly Frazer (UCSD) and Lucila Ohno-Machado (Yale), we lead the Center for Admixture Science and Technology (CAST), which focuses on improving the utility of genomics methods for admixed populations. Through CAST we work on multiple projects including:
  • Admixture methods and other techniques to understand the contribution of local ancestry to complex traits.
  • Methods for fine-mapping in multi-ancestry settings (e.g. PIPSORT) and for incorporating complex ancestry-specific variants and haplotypes.
  • Machine learning methods to model social determinants together with genetic risk factors for disease.

(5) Using high-throughput experimental techniques to study the impacts of genetic variants on molecular and cellular phenotypes

We have multiple collaborative projects to study the impact of genetic variants in human cells including:
  • Using massively parallel reporter assays (MPRAs) to understand the effects of TRs on gene regulation (in collaboration with the Goren Lab; see our preprint!)
  • Using base editing to study mutations in genes implicated in rare disease (in collaboration with the Goren and Komor Labs)