Monday, 9 January 2017

Going ahead with the new data, clustering

My new data makes it possible to cluster samples better by ethnicity. It is now possible to see at least the following groups:

South European
West European
East European
Finnish dwelling zone
Baltic dwelling zone

Unfortunately, none of the new sample sources gives a reasonable South European view, which makes it impossible to look inside the Mediterranean area.  With better sampling I could probably create at least Balkan, South Italian, Iberian and Basque clusters.  It is probably now also possible to classify project individuals by PCA.

Europe, clustered by Saami, Mongolian, South-Asian and Middle-Eastern samples

Zoomed in

Europe, plotted exclusively.  You can clearly see western and eastern clusters, as well as Balts and the Baltic-Finnic group splitting into Scandinavian and East Slavic relations.  With more proper samples we could also see a clearly distinct Scandinavian group.  Unfortunately the South European picture is fuzzy because of too few samples.  Due to the shortage of samples I narrowed each group down to four samples, except Tuscany, which I did not narrow down, in order to strengthen the southern cluster.  It is quite possible that with larger South European sampling the European west and east would diverge even more than we see on the plot below.


Tuesday, 20 December 2016

New data gives 11 million SNPs

Thanks to the latest updates of free gene banks and the hard work of several projects, I have now been able to increase the SNP count to 11 million per sample.  A larger number of SNPs means better accuracy, whether I use all SNPs (especially in drift analyses) or the set left after LD pruning.  In my new database I have combined samples from three sources: the 1000 Genomes Project, the Estonian Biocentre samples used by Pagani et al. 2016, and the Simons Genome Diversity Project (SGDP).  For the present the sample size is only 866.

Here, as a showcase of the new data, are two PCA plots and some comments.  Instead of making a central continental European picture I included four outgroups to see the effect.  Those outgroups are Armenians, Mongolians, Sardinians and Saamis.  For the present, individual names are picked straight from the original sources and can be somewhat ambiguous.

As we can see, there are several clusters, which makes it possible to evaluate the data.  For example, the Scandinavians of SGDP and Pagani et al. cluster with East Europeans.

Personally I don't pay much attention to PCA figures, because the result depends on the selected samples, their numbers, the ratios between population sizes, how mixed the individuals are, and so on.  My upcoming high-resolution tests will be much more interesting.

Added at 12:50

In case someone is interested in where Mordvas are located on this map: they are very similar to North Russians and RusKU (Pagani et al.) and move towards Mongolians.  Baltic Finns move towards Saamis.  Sorry, GIMP does something unwanted with the colors.


Wednesday, 16 November 2016

Ancient admixtures look shifty

Some ancestry results are hard to believe.  FamilyTreeDna's new Ancient Origins gives me the following results:

Metal Age Invader 12%
Farmer 30%
Hunter-Gatherer 54%
Non-European 4%

By Metal Age Invaders they refer to the Metal Age Yamnaya culture, by Farmers to the Neolithic Anatolian migration to Europe, and by Hunter-Gatherers to the ancient LaBrana, Loschbour and Motala samples.  For the non-European proportion they hint at looking at myOrigins, which is FamilyTreeDna's admixture analysis based on present-day populations.  My myOrigins gives me only one non-European group, Middle Easterners.  I doubt it; the non-European part in my Ancient Origins test is likely Asian.

Going further, I compared my Ancient Origins results to scientific papers, of which Haak et al. 2015 gives results that can be compared.  Haak et al. give the following results for Finns:

EN (Farmers) 31.5%
Nganasan (Asian) 10.2%
WHG (Hunter-Gatherer) 7.9%
Yamnaya (Metal Age Intruders) 50.4%

Norwegians, respectively, get in this study:
EN (Farmers) 48.2%
Nganasan (Asian) 4.2%
WHG (Hunter-Gatherer) 0%
Yamnaya (Metal Age Intruders) 47.5%

We can see a huge shift between Yamnaya/Metal Age Intruders and Hunter-Gatherers when going from Ancient Origins to Haak et al.  I know something about the method used by Haak et al., but I have no idea what FamilyTreeDna did.  However, if I try to guess, I would say that they could have used very drastic LD pruning.  I can get similar differences with heavily pruned data, and it makes sense.  The Metal Age invasion of Europe happened during the Bronze Age, thousands of years later than the arrival of hunter-gatherers.  So it is reasonable to assume that we still carry much more Bronze Age genetic drift than drift from hunter-gatherers, and thus removing LD removes more of the Metal Age Intruder ancestry.  Pruning present-day samples doesn't have the same effect, because their genetic composition is more similar.

I also ran some admixture tests.  Pruning LD makes a big change in the ancient admixtures.

My result without pruning

Anatolian_Neolithic 31.4
BA_East_European_Steppe 44.8
East_and_Southeast_Asian 10.8
Western_Hunter_Gatherer 13.0

and after pruning

Anatolian_Neolithic 27.5
BA_East_European_Steppe 25.9
East_and_Southeast_Asian 7.8
Western_Hunter_Gatherer 38.8

I am not saying that the difference between the results of FamilyTreeDna and Haak et al. is caused by pruning, because I don't know that.  I only state that pruning ancient samples is risky.
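To put numbers on the shift between the two runs above, here is a minimal Python sketch that subtracts the unpruned proportions from the pruned ones. The values are copied from the two lists above; the function name `shift` is only illustrative and not part of any admixture tool.

```python
# Proportions (percent) copied from the two result lists above.
unpruned = {
    "Anatolian_Neolithic": 31.4,
    "BA_East_European_Steppe": 44.8,
    "East_and_Southeast_Asian": 10.8,
    "Western_Hunter_Gatherer": 13.0,
}
pruned = {
    "Anatolian_Neolithic": 27.5,
    "BA_East_European_Steppe": 25.9,
    "East_and_Southeast_Asian": 7.8,
    "Western_Hunter_Gatherer": 38.8,
}


def shift(before, after):
    """Return the change in percentage points for every component."""
    return {name: round(after[name] - before[name], 1) for name in before}


if __name__ == "__main__":
    # Print components ordered from the largest loss to the largest gain.
    for name, delta in sorted(shift(unpruned, pruned).items(), key=lambda kv: kv[1]):
        print(f"{name:28s} {delta:+.1f}")
```

In this comparison the steppe component loses 18.9 percentage points and the hunter-gatherer component gains 25.8, while the other two components change by 3 to 4 points each.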

Wednesday, 9 November 2016

Project admix results, revised

My previous test lacked German reference samples.  Together with the fact that my Swedish reference samples seem to be somewhat off, this biased the results towards Balto-Slavs.  I have now added the German samples available from Pagani et al. 2016 and rerun all project samples, plus two new Finnish samples.  Additionally I tested three Finnish samples introduced by the aforementioned study.  Soon after downloading those samples I understood that they don't represent average Finns, so this point is covered after the project results.

I had difficulties editing the columns, and after some fruitless efforts I copy-pasted everything in plain text format.

A new grouping, Karelian-Finnic, indicates the sum of Karelian and Veps people.

Finland     57,0
AMBIG_Europe     25,0
Balto-Slavic     12,9
Baltic-Finnic     2,5

Finland     37,2
AMBIG_Europe     28,0
Balto-Slavic     14,8
NW-Atlantic-Europe     10,6
Saami     3,9


Finland     62,3
AMBIG_Europe     33,0
Baltic-Finnic     2,3

Finland     47,2
AMBIG_Europe     18,9
NW-Atlantic-Europe     18,1
Northeast-Europe     15,8

Finland     53,8
AMBIG_Europe     33,1
Baltic-Finnic     11,7

Finland     43,0
AMBIG_Europe     36,0
Baltic-Finnic     12,5
NW-Atlantic-Europe     7,9


Finland     78,7
AMBIG_Europe     17,4
TunNenets     3,4


Finland     56,5
Karelia     25,4
AMBIG_Europe     17,4

Finland     42,1
AMBIG_Europe     27,7
Karelia     24,5
Karelian-Finnic     5,0

Finland     43,1
Saami     21,5
AMBIG_Europe     10,9
Karelian-Finnic     10,2
AMBIGUOUS     10,0
AMBIG_Siberian     4,3

Finland     63,7
AMBIG_Europe     31,7
Baltic-Finnic     1,8

Finland     71,6
AMBIG_Europe     18,0
Central-Europe     10,2


Finland     69,8
Balto-Slavic     16,0
AMBIG_Europe     11,3
Baltic-Finnic     1,6


Finland     62,0
Karelian-Finnic     21,2
AMBIG_Europe     14,9


Finland     43,1
AMBIG_Europe     22,9
Estonia     21,8
Karelia     10,3

Finland     33,9
Central-Europe     24,0
Karelia     13,8
Baltic-Finnic     9,8
AMBIG_Europe     9,5
RU_Pinega     5,6
Karelian-Finnic     1,3


Finland     46,1
Karelian-Finnic     19,7
Balto-Slavic     14,5
AMBIG_Europe     8,8
Baltic-Finnic     6,5
Saami     3,7


Finland     62
AMBIG_Europe     20
Northeast-Europe     8
RU_Pinega     5
Saami     3

Finland     57,8
AMBIG_Europe     21,8
Balto-Slavic     10,9
Baltic-Finnic     4,3

Finland     53,1
Karelia     28,0
AMBIG_Europe     10,7
Northeast-Europe     4,8
Karelian-Finnic     1,2

NW-Atlantic-Europe     32,8
Central-Europe     32,5
Balto-Slavic     19,3
AMBIG_Europe     13,3


Baltic-Finnic     27,6
Central-Europe     21,2
AMBIG_Europe     19,3
Norway     17,5
NW-Atlantic-Europe     12,9


Norway     53,0
Central-Europe     18,3
Balto-Slavic     13,7
NW-Atlantic-Europe     8,1
AMBIG_Europe     6,5

AMBIG_Europe     28,9
NW-Atlantic-Europe     18,3
Central-Europe     18,3
Ireland     14,1
GermanyAustria     11,5
Northeast-Europe     7,9

Central-Europe     31,5
NW-Atlantic-Europe     24,7
AMBIG_Europe     16,5
Finland     14,5
Balto-Slavic     11,9

AMBIG_Europe     29,7
NW-Atlantic-Europe     26,1
Sweden     20,5
Orcadian     11,0
Central-Europe     10,7

Additionally some freely available genomes, only for checking the method.

Genomes Unzipped, VXP
North-Italy     24,9
Central-Europe     20,7
AMBIG_Europe     18,4
Norway     13,7
NW-Atlantic-Europe     12,0
South-Europe     6,6

Genomes Unzipped, JKP
Central-Europe     28,9
South-Europe     19,8
NW-Atlantic-Europe     19,1
Spain     12,5
AMBIG_Europe     11,3
AMBIG_SEURASIA     2,0                                      

Razib Khan, downloaded here.
Indian     35,6
Sindhi     22,3
Cambodian     12,8
AMBIGUOUS     10,6
Burusho     8,6
IndianJew     6,3
AMBIG_Southeast-Asian     2,4

Blaine Bettinger, downloaded here.         
He looks British, with a small portion of Native American.
Central-Europe     24,9
Kent     24,1
AMBIG_Europe     21,2
Welsh     9,3
Ireland     7,3
Atlantic-Europe     3,3
Native-American     1,9

Tests using Pagani et al. Finns as a Finnish reference   
Karelia    28,0
AMBIG_Europe    23,8
Central-Europe    17,8
Baltic-Finnic    12,6
Finland    12,1
Karelian-Finnic    3,4

Estonia    23,7
AMBIG_Europe    22,5
Karelia    18,6
Central-Europe    18,5
Finland    7,9
Karelian-Finnic    4,7

Karelia    46,3
AMBIG_Europe    16,1
Finland    10,4
Baltic-Finnic    8,7
Northeast-Europe    8,5
Saami    4,3
Karelian-Finnic    2,8

I tested three Finns, shown above: two of them are typical Western Finns without any obvious foreign admixture and one should be a typical Finn from East Finland.  The first row below shows the average result using the average Finnish reference picked from 1000 Genomes, and the second row shows the average result after changing the reference to the Finnish samples of Pagani et al.
FI12, FI14 and FI21, average Finnish result when using the average Finnish reference    64,8

FI12, FI14 and FI21, average Finnish result when using the Pagani Finnish samples as a reference    10,1

In this particular case, not only do the Pagani Finns almost fully mismatch with average Finns; using them also eliminates the Finnish admixture from Swedish results, where it is present in analyses based on the average Finnish reference, in some cases replacing the Finnish admixture with Karelian and Veps.  This is really odd.

A map giving an estimate of admixture regions in Europe

Monday, 31 October 2016

Project admixtures, fitted ancient proportions

Here are the ancient European proportions of project members and, for comparison, some academic present-day samples (not all fully covered by references, though), one random sample per population.  The results don't express the primary proportions of Anatolian Neolithic and the various hunter-gatherer populations, but add-ons over European LNBA samples.  The European LNBA itself was already a genetic mixture, including admixtures similar to the aforesaid West Eurasians and probably also from still unknown ancient populations.  Similarly, "BA East European Steppe" already included eastern hunter-gatherer admixture.  My aim was not to fix all admixtures on the same time level, but to get good coverage and make project samples comparable to each other.

XLS-sheet is available from here.

Saturday, 29 October 2016

Project admixture results

While preparing my ancient haplotyping analyses I decided to test project members using Dna.Land's Ancestry program.  Many thanks to the authors for distributing it.  All you need to do is compile it and start your analyses.

All results are "as is", straight from the analyses.  Some comments:

- Finns and Norwegians are easily identified.
- Swedes and Estonians (the latter don't belong to my project) can't be confidently identified with the academic reference I have used in this and in my previous analyses.
- Many Finns have minor Saami admixture.  This makes sense, and Saami ancestry is the most likely source of the Finnish Siberian admixture.  In most cases we can forget Nganasans and other distant and small Siberian populations.  The minor Saami admixture among Finns is pervasive, pointing not only to Siberian ancestry but also to the complex history of ancient Fennoscandia; otherwise we would see in these results the real Siberians that were also included in my tests (Nganasans, TunNenets, Nenets, Yakuts and numerous "semi-Siberians" from more southern North Asian regions).
- I didn't get the weird "Finnish-South European" admixtures seen on FamilyTreeDna and Dna.Land result pages.  This is because my Finnish reference is built from average Finns, not from Finnish minority groups.
- The ambiguous Balto-Slavic admixture among Finns is mostly from Latvia, Lithuania or Russian Tver.  Russians living north of the Tver region are classified as "Northeast Europe", except Karelians and Veps, who belong to the Baltic Finns together with Estonians and Finns.  Saamis form their own group.
- The ambiguous Northwest European admixture among Finns is mostly Swedish.
- The ambiguous European admixture is usually some combination of the two above-mentioned groups.
- "Ambiguous" means that the result of several individual bootstrap tests was ambiguous, meaning high dispersion of results.
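As an illustration of what "high dispersion" across bootstrap tests can mean, here is a minimal Python sketch. It is not the rule used by Dna.Land's program, which is not described here; the replicate values, the 5-point threshold `max_sd` and the function name `summarize` are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def summarize(bootstrap, max_sd=5.0):
    """Summarize bootstrap replicates per reference population.

    bootstrap: dict mapping a population label to a list of percentages,
    one value per bootstrap replicate (hypothetical input format).
    Returns label -> (mean, standard deviation, is_ambiguous).
    """
    summary = {}
    for label, values in bootstrap.items():
        sd = pstdev(values)
        # Assumed rule: a component is "ambiguous" if it varies too much.
        summary[label] = (round(mean(values), 1), round(sd, 1), sd > max_sd)
    return summary


if __name__ == "__main__":
    # Made-up replicate values, for illustration only.
    replicates = {
        "Finland": [62.0, 64.0, 63.0, 65.0],
        "Latvia": [0.0, 20.0, 5.0, 25.0],
    }
    for label, (avg, sd, ambiguous) in summarize(replicates).items():
        print(label, avg, sd, "ambiguous" if ambiguous else "stable")
```

A component that stays close to the same value in every replicate is reported under its own name, while one that jumps between replicates would be a candidate for an "Ambiguous" label under this assumed rule.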

Finland 63,9
Ambiguous Northeast-Europe 11,9
RU_Pinega 8,9
Ambiguous Balto-Slavic 6,9
Ambiguous Europe 4,6
Iran_Jew 2,9

Finland 42,5
Ambiguous Northwest-Europe 15,9
Karelia 9,7
Ambiguous Balto-Slavic 9,5
Ambiguous Europe 8,3
Ambiguous Northeast-Europe 7,2
Ambiguous 3,8
Saami 3,1

Finland 69,2
Latvia 13,0
Ambiguous Baltic-Finnic 8,2
Ambiguous Northwest-Europe 6,3
Saami 1,7
Ambiguous 1,4

Finland 51,8
Ambiguous Northwest-Europe 22,7
RU_Smolensk 9,8
Ambiguous Northeast-Europe 7,1
RU_Pinega 4,7
Ambiguous Europe 3,6

Finland 52,4
Estonia 17,4
Karelia 15,3
Ireland 11,0
Saami 2,0
Ambiguous Europe 1,1

Finland 43,8
Karelia 12,3
Ambiguous Northwest-Europe 11,7
Ambiguous Baltic-Finnic 10,2
Lithuania 9,5
Ambiguous Northeast-Europe 7,4
Ambiguous Europe 3,5
Ambiguous Balto-Slavic 1,0

Finland 44,2
Karelia 27,9
Latvia 12,4
Ambiguous Europe 10,4
Ambiguous Baltic-Finnic 3,4
Ambiguous 1,6

Finland 66,5
Karelia 22,5
Ambiguous Europe 8,3
Saami 2,3

Finland 63,3
Karelia 23,2
Ambiguous Europe 8,1
Ambiguous Baltic-Finnic 2,8
Ambiguous 2,6

Finland 54,7
Karelia 17,0
Ambiguous Baltic-Finnic 15,9
Ambiguous Balto-Slavic 5,8
Saami 3,5
Ambiguous Europe 3,1

Finland 84,3
Ambiguous Balto-Slavic 8,0
TunNenets 4,2
Ambiguous Baltic-Finnic 3,5

Finland 63,6
Karelia 24,9
Ambiguous Europe 10,6

Finland 48,7
Saami 22,0
Karelia 12,2
Ambiguous 6,0
Nenets 4,0
Latvia 3,2
Ambiguous Europe 2,8
Ambiguous Siberian 1,0

Finland 72,9
Ambiguous Balto-Slavic 16,0
Ambiguous Europe 6,6
Ambiguous Baltic-Finnic 3,3
Ambiguous 1,3

Finland 82,1
Ambiguous Europe 17,0

Finland 44,1
Estonia 26,5
Karelia 10,2
Ambiguous Europe 13,1
Ambiguous Baltic-Finnic 4,2
Ambiguous 1,9

Finland 32,7
Karelia 17,7
Estonia 15,2
Sweden 14,6
Tatar 7,0
Ambiguous Europe 6,5
RU_Pinega 5,5

Utah_CEU 18,4
Ambiguous Northwest-Europe 18,2
Sweden 17,6
Belarussia 10,8
Welsh 8,2
Ambiguous Baltic-Finnic 8,1
Latvia 5,9
GermanyAustria 5,8
Ambiguous Balto-Slavic 3,1
Ambiguous 2,9
Ambiguous Europe 1,1

Sweden 20,5
Ambiguous Northwest-Europe 19,7
Ambiguous Baltic-Finnic 19,3
GermanyAustria 13,1
Ireland 11,3
Latvia 5,1
Ambiguous Central-Europe 4,8
Ambiguous Europe 4,6
Ambiguous Balto-Slavic 1,5

Norway 20,0
Sweden 19,9
Veps 13,9
Kent 12,9
Orcadian 12,5
Ambiguous Europe 9,3
Ambiguous Central-Europe 7,0
Ambiguous Northwest-Europe 2,3
Ambiguous Baltic-Finnic 2,0

Norway 17,9
France 17,5
Estonia 16,7
Finland 14,2
Utah_CEU 14,0
Ambiguous Europe 7,2
Ambiguous Northwest-Europe 6,6
Scotland 5,6

Norway 53,0
Ambiguous Northwest-Europe 24,3
Ambiguous Central-Europe 11,2
Ambiguous Europe 5,5
Veps 5,2

Utah_CEU 35,5
Finland 17,5
Ambiguous Northwest-Europe 14,2
Ambiguous Balto-Slavic 9,5
Veps 8,7
GermanyAustria 7,7
Ambiguous Northeast-Europe 4,3
Ambiguous 1,6
Ambiguous Europe 1,0

Tuesday, 18 October 2016

European coarse population structure using 14.4 million markers

I had already made a Finestructure analysis before my previous Admixture-based work, but didn't publish it because it gave so little additional information.  I used the same data as with Admixture.  The workflow:

1 extracting chromosomes 1 and 6
2 phasing haplotypes (running HAPI-UR ten times and making a consensus)
3 running Chromopainter in linked mode, without defining donor haplotypes
4 running Finestructure with parameters burn-in 200000 and runtime 2000000
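Step 2 combines several phasing runs into one consensus. The sketch below shows one simple way such a consensus could be built for a single individual: a majority vote on the phase between consecutive heterozygous sites. It is an assumption for illustration, not necessarily the procedure used here or the one shipped with HAPI-UR; the input format (strings of '0'/'1' alleles) and the name `consensus_phase` are hypothetical.

```python
def consensus_phase(runs):
    """Majority-vote consensus phasing for one individual.

    runs: list of (hap0, hap1) pairs, each a string of '0'/'1' alleles.
    All runs describe the same genotypes but may be phased differently,
    and the order of the two haplotypes within a run is arbitrary.
    """
    first0, first1 = runs[0]
    # Heterozygous sites are the only ones where phase matters.
    het = [i for i in range(len(first0)) if first0[i] != first1[i]]
    cons0, cons1 = list(first0), list(first1)
    for prev, cur in zip(het, het[1:]):
        # A run votes "cis" if one haplotype carries the same allele at both
        # sites. This does not depend on which haplotype is listed first.
        cis_votes = sum(1 for h0, _ in runs if h0[prev] == h0[cur])
        if 2 * cis_votes > len(runs):
            cons0[cur] = cons0[prev]
        else:
            cons0[cur] = "1" if cons0[prev] == "0" else "0"
        cons1[cur] = "1" if cons0[cur] == "0" else "0"
    return "".join(cons0), "".join(cons1)


if __name__ == "__main__":
    # Three made-up runs; the third disagrees with the others at one junction.
    runs = [("0101", "1010"), ("1010", "0101"), ("0110", "1001")]
    print(consensus_phase(runs))
```

Voting on neighbouring pairs rather than on whole haplotypes means a single switch error in one run is outvoted locally instead of flipping everything downstream of it.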

As a result we see a very obvious grouping: each ethnic group clusters together.  Some cautions have to be made about the Chromopainter-Finestructure combination:

- First of all, Finestructure doesn't really use specific haplotypes, but the number of shared haplotypes and the haplotype lengths between individuals.  So there is no guarantee that in a three-sample case (individuals a, b and c) all three share common haplotypes, even when the Finestructure result shows haplotype sharing for all three samples.  This can lead to pseudo-ancestry between individuals and also to a wrong tree grouping.

- Using donor haplotypes can be methodologically unreliable.  We can assign donor haplotypes for people living in the Americas, but it is not equally reliable for people living in the Old World.  It is a chicken-and-egg question: if we really know the donors before testing, we know the result before we have the result.  I have seen methods that create donor types (selections of prepared haplotypes), but I can't see how that could work reliably.  Note also that speaking about donor populations (I have seen it) makes this even more problematic: to name donor populations we must already know the population grouping before the analysis, and we bind donor populations to something that exists today but did not necessarily exist thousands of years ago.
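The first caution above can be shown with a toy example in Python. The segment names are made up, and the "sharing count" here is just a set intersection, a much cruder measure than what Chromopainter computes; the point is only that pairwise sharing does not imply a segment common to all three.

```python
from itertools import combinations

# Hypothetical haplotype segments carried by individuals a, b and c.
segments = {
    "a": {"seg1", "seg3"},
    "b": {"seg1", "seg2"},
    "c": {"seg2", "seg3"},
}


def pairwise_sharing(carried):
    """Count shared segments for every pair of individuals."""
    return {
        (x, y): len(carried[x] & carried[y])
        for x, y in combinations(sorted(carried), 2)
    }


def shared_by_all(carried):
    """Return the segments that every individual carries."""
    return set.intersection(*carried.values())


if __name__ == "__main__":
    print(pairwise_sharing(segments))  # every pair shares one segment
    print(shared_by_all(segments))     # but no segment is common to all three
```

A method that only sees the pairwise counts would treat a, b and c as mutually related, although each pair is linked through a different segment.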

While checking the data I see one questionable sample group: Swedes.  They look more eastern than can reasonably be suggested.

In general, when looking at any results the first question is "does the result look obvious?".  If we have two different results based on any kind of supervised method (like using donor haplotypes/populations), it is only common sense to consider the more obvious result the better one.  Here we have a philosophical question: what "the obvious" means for you and for me.  It makes sense, but an idea such as "too obvious" leads us to tin-foil-hat theories.  Perfection is suspicious.  We don't want it, although it is possible in practice.  Another, much more sensible question regarding donor haplotypes is whether we could assign donor haplotypes of Bronze Age Europeans based on ancient samples.  It would make sense.

Download the Finestructure picture here.