Problem Set 2 - Ancestry
Overview
Data files and code templates for this problem set are available on Comet at:
/oasis/projects/nsf/csd524/mgymrek/data/ps2/
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/
Create a working directory for this problem set, with subdirectories for code and results:
mkdir /oasis/projects/nsf/csd524/$USER/ps2
mkdir /oasis/projects/nsf/csd524/$USER/ps2/code
mkdir /oasis/projects/nsf/csd524/$USER/ps2/results
Installing Python packages
Use the following command to install useful Python packages:
pip install --user scikit-learn pandas pyvcf
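To confirm the install worked, a quick check along the following lines may help. It is only a sketch: the helper name `missing_modules` is made up for illustration, and it assumes the packages are imported as `sklearn`, `pandas`, and `vcf` (the import name used by PyVCF).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # These are import names, not pip names: scikit-learn -> sklearn, PyVCF -> vcf.
    missing = missing_modules(["sklearn", "pandas", "vcf"])
    print("missing:", missing if missing else "none")
```

If any name is reported as missing, rerunning the pip command for that package is the first thing to try.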
PS2 data
The data directory contains the following files you will use in the problem set:
ps2_pca.samples.txt
ps2_pca.genotypes.tab
ps2_reference_labels.csv
ps2_ibd.lwk.bed
ps2_ibd.lwk.bim
ps2_impute.subset.gen.gz
ps2_impute.heldout.gen.gz
The preprocessing scripts are at:
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_pca.sh
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_ibd.sh
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_impute.sh
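Before starting, it can be worth checking that the data files listed above are visible from your account. The sketch below assumes the data directory path given in the Overview and the file names listed in this section. `missing_files` is a hypothetical helper and is not part of the course templates.

```python
import os

# Path and file names as given in this handout.
DATA_DIR = "/oasis/projects/nsf/csd524/mgymrek/data/ps2"
DATA_FILES = [
    "ps2_pca.samples.txt",
    "ps2_pca.genotypes.tab",
    "ps2_reference_labels.csv",
    "ps2_ibd.lwk.bed",
    "ps2_ibd.lwk.bim",
    "ps2_impute.subset.gen.gz",
    "ps2_impute.heldout.gen.gz",
]


def missing_files(directory, filenames):
    """Return the entries of `filenames` that are not regular files in `directory`."""
    return [f for f in filenames if not os.path.isfile(os.path.join(directory, f))]


if __name__ == "__main__":
    # Prints one line per file that could not be found; prints nothing if all are present.
    for name in missing_files(DATA_DIR, DATA_FILES):
        print("not found:", name)
```

Run off Comet, this would report every file as not found, since the path only exists on that system.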
PS2 templates
The templates directory contains:
pset2_pca_template.py
run_ps2_pca.sh
run_ps2_ibd.sh
run_ps2_impute.sh
Using 23andMe data
To include your own 23andMe results in the data used for the PCA problem, see:
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_23andme.sh