Problem Set 4 - Next-generation sequencing
Overview
Data files and code templates for this problem set are available on Comet at:
/oasis/projects/nsf/csd524/mgymrek/data/ps4/
/oasis/projects/nsf/csd524/mgymrek/templates/ps4/
Create working directories for your code and results:
mkdir /oasis/projects/nsf/csd524/$USER/ps4
mkdir /oasis/projects/nsf/csd524/$USER/ps4/code
mkdir /oasis/projects/nsf/csd524/$USER/ps4/results
PS4 data
The data directory contains the following files you will use in the problem set:
NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram
NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram.crai
NIST_NA12878_hg38_chr22.tab
HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz
GRCh38_full_analysis_set_plus_decoy_hla.fa
- The CRAM alignment files were downloaded from the 1000 Genomes Project portal: http://www.internationalgenome.org/data-portal/sample/NA12878
- The NIST gold-standard SNP calls (in the VCF file) were downloaded from the Genome in a Bottle (GIAB) website.
- GRCh38_full_analysis_set_plus_decoy_hla.fa is build 38 of the human reference genome (GRCh38).
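Before starting, it may help to confirm that all of the files listed above are visible from your account. The following is a minimal sketch, not part of the provided templates: the function name find_missing_files and the constants PS4_DATA_FILES and DATA_DIR are hypothetical names chosen for illustration, and the script only checks that each file exists, not that its contents are intact.

```python
import os

# File names copied from the data file list above.
PS4_DATA_FILES = [
    "NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram",
    "NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram.crai",
    "NIST_NA12878_hg38_chr22.tab",
    "HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X"
    "_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz",
    "GRCh38_full_analysis_set_plus_decoy_hla.fa",
]

# Data directory given in the Overview.
DATA_DIR = "/oasis/projects/nsf/csd524/mgymrek/data/ps4/"


def find_missing_files(data_dir, filenames):
    """Return the names in `filenames` that are not regular files in `data_dir`."""
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    missing = find_missing_files(DATA_DIR, PS4_DATA_FILES)
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All PS4 data files found.")
```

Run on Comet, the script prints either a list of missing files or a confirmation that all five were found.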
PS4 templates
The templates directory contains:
ps4_snpcaller_template.py
run_ps4_snpcaller.sh
ps4_comparesnps_template.py
run_ps4_comparesnps.sh