Thursday, September 8, 2016

Worldwide diversity based on 3.2 millions X chromosome markers

Genetic diversity tests are usually done using around 300-500 thousands markers.  It is however possible to use much more markers (SNPs) using already available data from the 1000 genomes project.  The downside is that we have only a few populatons and the upside is that we see the big picture accurately, without possible bad sampling.

I made this test using Chromopainter and Finestructure.  Unfortunately Chromopainter is a rather ineffective tool and incapable to use available computing resources (threads, memory).  Without this drawback I would have made this using 25 millions markers instead of only 3.2 millions.

The process:

1 Vcftools, parameters  -remove indels -chr 23
2 Haplytyping using HAPI-UR and all samples, run three times and driven in consensus
3 Made a manual selection for random samples, 10-20 of each population
4 Chromopainter,  without specifying donor haplotypes
5 Finestructure  with run parameters 30000/300000
6 MDS using Past.

Additionally I ran Vcftools using parameters -keep-only-indels and -chr 23.   The result was filtered and biallelic deletions (CN=0) were counted.  Male results were treated biallelic, so CN=0 should give us the number of effectine deletions in both cases, for females and males.


Finestructure












































MDS done by Past:























All previous pictures are downloadable with better resolution, here.

Deletions per 3.2 million markers (averages per sample):







































The British subgrouping is gathered from internet and can be unreliable.  The Finnish one represents those with highest Siberian admixture, the group being "most Finnish" / local, those closest ancient Corded Ware samples and the rest of all 99 samples.  The last Finnish group includes all outliers.