Problem Set 2 - Ancestry
Overview
Data files and code templates for this problem set are available on Comet at:
/oasis/projects/nsf/csd524/mgymrek/data/ps2/
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/
Create your own working directories for code and results:
mkdir /oasis/projects/nsf/csd524/$USER/ps2
mkdir /oasis/projects/nsf/csd524/$USER/ps2/code
mkdir /oasis/projects/nsf/csd524/$USER/ps2/results
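If you would rather set up the same layout from Python, the following is a minimal sketch. The function name make_ps2_dirs is hypothetical and not part of the course templates. Unlike plain mkdir, it also creates missing parent directories and does not fail if the directories already exist.

```python
from pathlib import Path


def make_ps2_dirs(base):
    """Create <base>/ps2/code and <base>/ps2/results; return the ps2 directory."""
    ps2 = Path(base) / "ps2"
    for sub in ("code", "results"):
        # parents=True also creates ps2 itself (and base, if missing);
        # exist_ok=True makes it safe to run this more than once.
        (ps2 / sub).mkdir(parents=True, exist_ok=True)
    return ps2


# Example call (replace <username> with your own login name):
# make_ps2_dirs("/oasis/projects/nsf/csd524/<username>")
```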
Installing Python packages
Use the following command to install the Python packages used in this problem set:
pip install --user scikit-learn pandas pyvcf
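To confirm that the install worked, you can check that each package is importable. This is a small sketch rather than part of the course templates, and the helper name missing_modules is hypothetical. Note that the import names are sklearn, pandas, and vcf (PyVCF is imported as vcf).

```python
import importlib.util

# Import names for the three packages installed above.
REQUIRED_MODULES = ("sklearn", "pandas", "vcf")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that Python cannot find on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as not importable, check that the pip you ran belongs to the same Python interpreter you use to run this check.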
PS2 data
The data directory contains the following files you will use in the problem set:
ps2_pca.samples.txt
ps2_pca.genotypes.tab
ps2_reference_labels.csv
ps2_ibd.lwk.bed
ps2_ibd.lwk.bim
ps2_impute.subset.gen.gz
ps2_impute.heldout.gen.gz
The corresponding preprocessing scripts are in the templates directory:
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_pca.sh
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_ibd.sh
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_impute.sh
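Before starting a problem it can help to look at the first few lines of each input file. The sketch below makes no assumptions about column layout; it only returns raw lines and opens .gz files with gzip. The helper name peek is hypothetical. It is intended for the text files only: ps2_ibd.lwk.bed is skipped here on the assumption that it is a binary PLINK file.

```python
import gzip
from itertools import islice
from pathlib import Path


def peek(path, n=5):
    """Return the first n lines of a text file, reading .gz files via gzip."""
    path = Path(path)
    # Pick the opener from the file extension; both are used in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example calls, using file names from the list above:
# data_dir = Path("/oasis/projects/nsf/csd524/mgymrek/data/ps2")
# print(peek(data_dir / "ps2_pca.samples.txt"))
# print(peek(data_dir / "ps2_impute.subset.gen.gz", n=2))
```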
PS2 templates
The templates directory contains:
pset2_pca_template.py
run_ps2_pca.sh
run_ps2_ibd.sh
run_ps2_impute.sh
Using 23andMe data
To include your own 23andMe results in the data used for the PCA problem, see:
/oasis/projects/nsf/csd524/mgymrek/templates/ps2/preprocess_23andme.sh