A reference haplotype panel for genome-wide imputation of short tandem repeats
1000 Genomes SNP-STR Haplotype Panel
Data Description:
- 2,504 samples from the 1000 Genomes Project Phase 3 SNP Haplotypes
- STRs imputed from 1,916 samples that were part of the Simons Simplex Collection.
- Total: 27,185,239 SNP and 445,725 STR markers
- All coordinates are based on the b37 human reference genome.
Availability: Amazon S3 bucket s3://snp-str-imputation/1000genomes
[SNP-STR Panel chr1] [chr1 index]
[SNP-STR Panel chr2] [chr2 index]
[SNP-STR Panel chr3] [chr3 index]
[SNP-STR Panel chr4] [chr4 index]
[SNP-STR Panel chr5] [chr5 index]
[SNP-STR Panel chr6] [chr6 index]
[SNP-STR Panel chr7] [chr7 index]
[SNP-STR Panel chr8] [chr8 index]
[SNP-STR Panel chr9] [chr9 index]
[SNP-STR Panel chr10] [chr10 index]
[SNP-STR Panel chr11] [chr11 index]
[SNP-STR Panel chr12] [chr12 index]
[SNP-STR Panel chr13] [chr13 index]
[SNP-STR Panel chr14] [chr14 index]
[SNP-STR Panel chr15] [chr15 index]
[SNP-STR Panel chr16] [chr16 index]
[SNP-STR Panel chr17] [chr17 index]
[SNP-STR Panel chr18] [chr18 index]
[SNP-STR Panel chr19] [chr19 index]
[SNP-STR Panel chr20] [chr20 index]
[SNP-STR Panel chr21] [chr21 index]
[SNP-STR Panel chr22] [chr22 index]
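The per-chromosome files linked above sit under the S3 prefix given in the Availability line. As a minimal sketch, the helper below builds an AWS CLI command that lists that prefix. It assumes the AWS CLI is installed and that the bucket permits unauthenticated reads (hence `--no-sign-request`); neither is stated on this page. The helper name `list_panel_command` is hypothetical.

```python
import shlex

# Prefix taken from the Availability line of this page.
PANEL_PREFIX = "s3://snp-str-imputation/1000genomes"


def list_panel_command(prefix=PANEL_PREFIX, anonymous=True):
    """Build an `aws s3 ls` command that lists the panel files."""
    cmd = ["aws", "s3", "ls", prefix.rstrip("/") + "/"]
    if anonymous:
        # Assumption: the bucket allows reads without AWS credentials.
        cmd.append("--no-sign-request")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(list_panel_command()))
```

Pass the returned list to `subprocess.run(cmd, check=True)` to execute it once the assumptions above are confirmed.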
Tredparse pathogenic loci
Spinocerebellar Ataxia 1 - index
Spinocerebellar Ataxia 17 - index
Dentatorubral-pallidoluysian Atrophy - index
Spinocerebellar Ataxia 2 - index
Spinocerebellar Ataxia 8 - index
Spinocerebellar Ataxia 3 - index
Spinocerebellar Ataxia 6 - index
Myotonic Dystrophy Type 1 - index
Supplementary Data
Supplementary Tables 2 and 3 give imputation summary statistics for each locus:
Saini et al. Supplementary Table 2
Saini et al. Supplementary Table 3
Usage:
Download the Beagle .jar file to impute STRs from our reference panel into your SNP genotype data. We suggest using the latest version, 4.1. If you are working with related samples and want to use pedigree information, use Beagle version 4.0.
Ensure that the alleles in the target SNP file match our reference panel. We suggest using conform-gt. Example:
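A minimal sketch of a conform-gt call, assembled in Python so that each argument is explicit. The file names and the helper name `conform_gt_command` are placeholders, not names from this page. Matching by position (`match=POS`) and chromosome names without a "chr" prefix are assumptions; check both against your own data and the panel files.

```python
import shlex


def conform_gt_command(target_vcf, panel_vcf, chrom, out_prefix,
                       jar="conform-gt.jar"):
    """Build a conform-gt call that aligns target SNP alleles to the panel."""
    return [
        "java", "-jar", jar,
        f"ref={panel_vcf}",   # SNP-STR reference panel VCF for this chromosome
        f"gt={target_vcf}",   # target SNP genotypes to be conformed
        f"chrom={chrom}",     # assumption: b37-style names such as "1", not "chr1"
        "match=POS",          # assumption: match markers by position, not by ID
        f"out={out_prefix}",  # prefix for the conformed output files
    ]


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    cmd = conform_gt_command("target.chr1.vcf.gz", "panel.chr1.vcf.gz",
                             "1", "target.chr1.conformed")
    print(shlex.join(cmd))
```

Run the command once per chromosome, pairing each target file with the matching panel file.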
Impute STRs into your SNP file:
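A minimal sketch of the imputation call, written as a Python helper that builds the Beagle command line. The file names, the heap size, and the helper name `beagle_impute_command` are placeholders chosen for illustration. The optional `ped=` argument applies only to the Beagle 4.0 pedigree workflow mentioned above.

```python
import shlex


def beagle_impute_command(target_vcf, panel_vcf, out_prefix,
                          jar="beagle.jar", max_heap="4g", ped_file=None):
    """Build a Beagle call that imputes panel STRs into target SNP data."""
    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", jar,
        f"gt={target_vcf}",   # conformed target SNP genotypes
        f"ref={panel_vcf}",   # SNP-STR reference panel for the same chromosome
        f"out={out_prefix}",  # prefix for the imputed output files
    ]
    if ped_file is not None:
        # Pedigree file; intended for use with a Beagle 4.0 jar only.
        cmd.append(f"ped={ped_file}")
    return cmd


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    cmd = beagle_impute_command("target.chr1.conformed.vcf.gz",
                                "panel.chr1.vcf.gz", "target.chr1.imputed")
    print(shlex.join(cmd))
```

The heap size is a guess; large cohorts may need considerably more memory.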
FAQ
How do I convert PLINK BED format files to VCF format? Use PLINK to convert from BED to VCF format, and ensure that the reference allele matches our panel.
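A minimal sketch of the BED-to-VCF conversion, assuming PLINK 1.9 is available on the PATH as `plink`. The file prefixes and the helper name `plink_bed_to_vcf_command` are placeholders. `--keep-allele-order` stops PLINK from swapping alleles by frequency, but it does not by itself guarantee that the reference allele matches the panel, so the allele check is still needed afterwards.

```python
import shlex


def plink_bed_to_vcf_command(bfile_prefix, out_prefix, plink="plink"):
    """Build a PLINK 1.9 call that converts a BED/BIM/FAM fileset to VCF."""
    return [
        plink,
        "--bfile", bfile_prefix,   # prefix shared by the .bed/.bim/.fam files
        "--recode", "vcf",         # write a VCF file
        "--keep-allele-order",     # do not reorder alleles by frequency
        "--out", out_prefix,       # prefix for the output .vcf file
    ]


if __name__ == "__main__":
    # Placeholder prefixes for illustration only.
    print(shlex.join(plink_bed_to_vcf_command("mydata", "mydata_vcf")))
```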
Do I need phased SNPs as input? No, Beagle will phase the input SNPs during the imputation process.
How do I measure the STR imputation accuracy? We measured the per-locus accuracy of imputing STRs from the Simons Simplex Collection into the 1000 Genomes data for three populations: EUR, EAS and AFR. These results are available in the Amazon S3 bucket mentioned earlier (Saini_etal_SuppTable2.xlsx). You can expect to impute STRs with similar accuracy if your SNP data is from one of these populations.
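If you have directly genotyped STRs for some of your samples, you can also check accuracy on your own data. The sketch below computes simple genotype concordance at one locus, that is, the fraction of samples whose imputed allele pair equals the observed pair. This is one common measure and is not necessarily the statistic reported in the supplementary table. The function name, the input layout, and the toy genotypes are assumptions made for illustration.

```python
def genotype_concordance(imputed, observed):
    """Fraction of shared samples whose imputed genotype equals the observed one.

    Both arguments map sample ID -> pair of STR allele lengths.
    Allele order within a pair is ignored.
    """
    shared = [sample for sample in imputed if sample in observed]
    if not shared:
        raise ValueError("no samples in common")
    matches = sum(
        sorted(imputed[sample]) == sorted(observed[sample]) for sample in shared
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Toy genotypes (allele lengths in repeat units), illustration only.
    imputed = {"s1": (10, 12), "s2": (11, 11), "s3": (9, 10)}
    observed = {"s1": (12, 10), "s2": (11, 12), "s3": (9, 10)}
    print(genotype_concordance(imputed, observed))
```

In the toy data, two of the three samples agree once allele order is ignored.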