Friday, November 21, 2014

Do-It-Yourself Dodecad test for Finns (including Baltic Finns in general)




Wondering about Finnish history and the migrations of the last 2000 years, I have put together the following Do-It-Yourself Dodecad test. My goal was a dedicated test for Finns, but it could also work for Estonians and other Baltic-Finnic peoples. Because of the regional selection of reference populations, the test does not work for other nationalities. What I have done differently from many other Dodecad Oracle tests is not only the choice of references but also a stricter qualification of the Finnish samples. It is also worth mentioning that in some tests the preprocessing of the genotype data has been biased. My data includes 290,000 SNPs and has not been preprocessed on the basis of differences between populations, so it is as it is, straight from the source.

Reference populations:

Finnish
Mari
Chuvash
Mordva
Estonian
Lithuanian
Belarussian
Polish
Swedish
Norwegian

You don't need to worry about the “calculator effect”, because all my data comes from public academic sources.

To perform the test, you first have to download the DiyDodecad scripts. You can do it here


Please note that DiyDodecad is authorized by Dienekes and included in his Dodecad Ancestry Project:


After you have uncompressed all the files into your own directory (for instance, Kaleva), download and uncompress the four Kaleva-specific files into the same directory.

 Files


Now everything is ready for the first analyses. To run them, read README.txt and follow Dienekes' instructions; the only difference is that you use KALEVA.par instead of the Dodecad dv3.par file.



Friday, November 14, 2014

Ancient British genomes from Hinxton reveal the eastern Iron Age frontier



It is time for ancient genomes. A month ago I read about new ancient samples from Hinxton, England, and found them interesting in terms of Finnish history. The samples are around 1500-2000 years old, which makes them rather suitable for estimating Finnish western connections. The history of Finns in Finland is rather short: in the best case, bloodlines go back around 2000 years, quite a short history compared to many southern Europeans.

Data

I am now using the same data as in my roll-off analyses. To remind you, I applied a very strict qualification to the Finnish samples to remove all recent admixture, meaning admixture since the beginning of the Swedish era in Finland. All public western Finnish samples were selected by comparing them with my own genealogically verified samples, and outliers were removed.

Software

I used Reich’s three population test (qp3Pop) with default settings.

Results
 
Before going further, some words of caution. After testing with larger data I realized that qp3Pop, too, effectively assumes that less diverse populations are the sources of more diverse admixed populations; in other words, that diverse populations are usually composed of several less diverse ones. This is not necessarily true and is a rather mechanical view. In genetics the process can also run in reverse: a more diverse population can turn into a less diverse one through genetic drift. This matters because drift is exactly what we are analyzing here.

Some general observations. The problem described above does not affect the ancient samples, because they lived long before us and cannot violate causality. However, a lack of diversity can overestimate the admixture.
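To illustrate how drift alone lowers diversity, here is a minimal sketch assuming an idealised Wright-Fisher population (constant effective size Ne, no migration, mutation or selection). Expected heterozygosity then shrinks by a factor (1 - 1/(2Ne)) every generation, so a small isolate loses diversity much faster than a large population. The starting heterozygosity, the population sizes and the 30 years per generation are hypothetical illustration values, not taken from my data.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after `generations` of pure genetic drift.

    Idealised Wright-Fisher model (an assumption): constant effective size `ne`,
    no migration, mutation or selection.  H_t = H_0 * (1 - 1/(2*Ne))**t
    """
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical numbers: the same starting diversity in a larger population
# and in a small isolate, over 67 generations (~2000 years at an assumed
# 30 years per generation).
for ne in (500, 50):
    h = expected_heterozygosity(0.30, ne, 67)
    print(f"Ne={ne:>3}: expected heterozygosity after 67 generations = {h:.3f}")
```

The smaller population ends up markedly less diverse even though nothing was mixed into or out of either one.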

I also have some results using present-day source populations, and those results can be problematic. Nevertheless, despite the fact that some Finnish samples come from young isolates, I assume that my East Finnish samples represent the historically least mixed Finnic-language speakers in my data, keeping in mind, however, that I have no Finnic (Baltic-Finnic) speakers from Russia. Additional samples from Russia could give information about possible admixture in East Finns.

Abbreviations: 
AS - Hinxton Anglo-Saxons
BR - Hinxton Iron Age Briton
EF - East Finns
WF - West Finns
PL - Poles
LT - Lithuanians
EE - Estonians
MA - Maris
CH - Chuvashes
NR - Norwegians
MR - Mordvas
BU - Belarussians

A negative f3 value suggests that the target population is admixed, with ancestry related to both source populations.
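For readers who want to see what the statistic measures, here is a minimal sketch of f3(C; A, B) computed from allele frequencies. It is not qp3Pop itself: qp3Pop applies its own normalization and weighting, so its numbers will not match exactly. Per SNP the sketch takes (c - a)(c - b), where c, a and b are allele frequencies in the target C and the sources A and B, subtracts a small-sample correction c(1 - c)/(n_c - 1) for the target, averages over SNPs, and gets a standard error from a simple delete-one-block jackknife over contiguous SNPs. The block size and the toy frequencies are assumptions for illustration; a Z-score below about -3 is the usual rule of thumb for a significant negative value.

```python
import math
import random


def f3_per_snp(c: float, a: float, b: float, n_c: int) -> float:
    """f3(C; A, B) at one SNP, corrected for finite sampling in the target C.

    c, a, b: sample allele frequencies; n_c: sampled chromosomes in C
    (twice the number of diploid individuals).
    """
    return (c - a) * (c - b) - c * (1.0 - c) / (n_c - 1)


def f3_statistic(c, a, b, n_c, block_size=100):
    """Genome-wide f3 with a simple delete-one-block jackknife.

    SNPs are assumed to be in genomic order, so contiguous blocks roughly
    stand in for blocks of linked markers.  Returns (f3, standard error, Z).
    """
    values = [f3_per_snp(ci, ai, bi, ni) for ci, ai, bi, ni in zip(c, a, b, n_c)]
    total, m = sum(values), len(values)
    estimate = total / m
    blocks = [values[i:i + block_size] for i in range(0, m, block_size)]
    if len(blocks) < 2:
        raise ValueError("need at least two blocks for the jackknife")
    # Mean over all SNPs except one block, for each block in turn
    loo = [(total - sum(blk)) / (m - len(blk)) for blk in blocks]
    g = len(loo)
    mean_loo = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - mean_loo) ** 2 for x in loo))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy example: target C is an exact 50/50 mix of sources A and B.
rng = random.Random(1)
a = [rng.uniform(0.05, 0.95) for _ in range(2000)]
b = [rng.uniform(0.05, 0.95) for _ in range(2000)]
c = [(ai + bi) / 2 for ai, bi in zip(a, b)]
n_c = [40] * 2000  # e.g. 20 diploid individuals in the target
f3, se, z = f3_statistic(c, a, b, n_c)
print(f"f3 = {f3:.4f}, SE = {se:.4f}, Z = {z:.1f}")
```

Because an admixed target sits between its sources, (c - a) and (c - b) tend to have opposite signs, which is what pulls f3 below zero. Strong drift in the target adds a positive term and can hide the signal.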




  


 


The West Finnish map shows high ancient admixture; the Anglo-Saxon - East Finnish admixture in particular is outstanding! Estonians show admixture with almost all their neighbors, which may point to continuous migration into Estonia throughout history.


Another way to look for possible admixture from the source populations is to use the least likely target population, in this case African Pygmies, as an outgroup. With this method the most Anglo-Saxon-like population is the Norwegians and the most Iron Age Briton-like is the Lithuanians. West Finns come third on the Anglo-Saxon axis. This probably means that West Finns have Anglo-Saxon-like ancestry, or that Anglo-Saxons shared a common Fennoscandian ancestry with West Finns. All populations with more Iron Age British-like ancestry than West Finns (NR, LT, EE, PL) likely have more ancient Celtic ancestry from Central Europe.
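The outgroup idea can be sketched in a few lines, assuming plain allele-frequency lists as input. With an outgroup O in the target position, f3(O; X, Y) is the average of (o - x)(o - y), and a larger value means that X and Y share more drift away from O. Ranking candidate populations X against a fixed reference Y (for example an ancient sample) gives the kind of ordering described above. The population names and frequencies below are hypothetical, and the small-sample correction is left out because it depends only on the outgroup and therefore does not change the ranking.

```python
def outgroup_f3(o, x, y):
    """f3(O; X, Y) = mean over SNPs of (o - x)(o - y).

    With O an outgroup, larger values mean more drift shared by X and Y.
    """
    return sum((oi - xi) * (oi - yi) for oi, xi, yi in zip(o, x, y)) / len(o)


def rank_by_shared_drift(outgroup, reference, candidates):
    """Rank candidate populations by outgroup f3 with a reference, highest first."""
    scores = {name: outgroup_f3(outgroup, freqs, reference)
              for name, freqs in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.5, 0.5, 0.5, 0.5]
reference = [0.1, 0.3, 0.8, 0.9]          # e.g. an ancient sample
candidates = {
    "close": [0.15, 0.3, 0.75, 0.9],      # drifted along with the reference
    "distant": [0.5, 0.45, 0.55, 0.5],    # little shared drift
}
ranking = rank_by_shared_drift(outgroup, reference, candidates)
for name, score in ranking:
    print(name, round(score, 4))
```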




 edit Su 16.11.2014

I thought it would be interesting to learn more about the western outlier group (WF2), the Finns who plot further west on PCA plots than the genealogically verified West Finnish group. This is done by comparing both western Finnish groups with East Finns; in other words, East Finns serve as a fixed landmark for the comparison. The results are mixed, with negative statistics f3(WF; EF, test) and f3(WF2; EF, test), where "test" includes Estonians, Swedes, Iron Age Britons and ancient Anglo-Saxons.

The results show that the western outliers have more Iron Age Briton, Estonian and Swedish ancestry than the genealogical western group, but slightly less ancient Anglo-Saxon ancestry. The results also show that both western Finnish groups have significant East Finnish-like ancestry.



The abbreviation "SE" stands for three Swedish samples that show very little Finnish ancestry in 23andMe's Ancestry Composition.

edit. Mon 17.11.2014

Another graphic shows the Swedish vs. ancient Anglo-Saxon ratio among Finnish individuals, with both admixture signals estimated with qp3Pop. The East Finnish group is used as a fixed landmark. The difference between AS and SE for each individual was used as the sort key, and the trend line is a linear fit. I would also have made comparisons with other populations, such as Estonians, but differences in the SNP sets made an individual-level comparison impossible.

Although the ancient Anglo-Saxon and Swedish admixture signals follow each other, my judgement is that the larger AS is compared to SE, the larger the ancient admixture, and vice versa.
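The sorting and trend-line step can be sketched as follows, assuming each individual already has two f3 values, one with ancient Anglo-Saxons and one with the Swedish samples as the second source (East Finns fixed as the first). Individuals are sorted by the difference AS - SE, and an ordinary least-squares line is fitted through the sorted differences against their rank. The individual IDs and values are hypothetical, and the exact way the original chart was produced is not known; this is only one plausible reading of it.

```python
def sort_by_difference(scores):
    """scores: {individual: (f3_with_AS, f3_with_SE)} -> rows sorted by AS - SE."""
    rows = [(ind, f_as, f_se, f_as - f_se) for ind, (f_as, f_se) in scores.items()]
    return sorted(rows, key=lambda row: row[3])


def linear_trend(values):
    """Ordinary least-squares line through (rank, value); returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points for a trend line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-individual f3 values (AS, SE).
rows = sort_by_difference({
    "ind1": (-0.010, -0.004),
    "ind2": (-0.006, -0.007),
    "ind3": (-0.008, -0.008),
})
slope, intercept = linear_trend([row[3] for row in rows])
for ind, f_as, f_se, diff in rows:
    print(ind, f_as, f_se, round(diff, 4))
print("trend slope per rank:", round(slope, 5))
```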










Friday, November 7, 2014

The long and dark shadow of history



Since the last post I have done a lot of testing, trying to find the limitations of the analysis tools and to improve my own understanding of what all the results mean. There is still much work to do, but I am moving forward piece by piece, trying to shed light on Finnish genetic history. For this purpose I started my shared-LD tests with present-day populations rather than ancient ones, although it would be more intriguing to resolve the big historical questions of our deep past.

Data

The data consists mainly of publicly available academic samples, which anyone can download from the internet. Additionally, I have a few genealogically classified Finnish samples. I use them to categorize the public Finnish samples, because the public data includes some Finns with foreign admixture.

Population    Samples  Source
Finns         96       1000genomes
Finns         7        my own collection
Norwegians    15       other sources
Poles         10       other sources
Belarussians  9        Est.BC
Chuvashes     16       Est.BC
Estonians     13       Est.BC
Lithuanians   10       Est.BC
Maris         15       Est.BC
Mordvas       14       Est.BC
Ukrainians    16       Est.BC
Swedes        3        my own collection

Preparing data

In my data the maximum SNP overlap is around 550,000 SNPs and the minimum around 290,000 SNPs. The number used in each test varies depending on the selected reference and target populations. I also found that reliable results require a SNP space of over 20 million SNPs. It is, however, likely that denser individual genotypes would give steadier LD sharing, smoother roll-off curves and smaller standard errors than a larger number of samples would. It would be better to have millions of SNPs per individual, but I did not see big quality differences when comparing the curves in this test with similar results obtained by other authors using the same programs, and I suggest that our data are quite similar in terms of producing reasonable results.
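The reason the SNP count changes with the chosen references and targets is that only SNPs genotyped in every selected dataset can be used. A minimal sketch of that intersection step, assuming each dataset is represented simply as a set of SNP IDs (the dataset names and rsIDs below are hypothetical placeholders):

```python
from itertools import combinations


def shared_snps(panels, selected):
    """SNP IDs genotyped in every selected dataset."""
    return set.intersection(*(panels[name] for name in selected))


def overlap_range(panels, k):
    """Smallest and largest shared-SNP count over all k-dataset combinations."""
    sizes = [len(shared_snps(panels, combo)) for combo in combinations(sorted(panels), k)]
    return min(sizes), max(sizes)


# Hypothetical SNP panels from three sources.
panels = {
    "A": {"rs1", "rs2", "rs3"},
    "B": {"rs2", "rs3", "rs4"},
    "C": {"rs3", "rs4", "rs5"},
}
print(shared_snps(panels, ["A", "B", "C"]))
print(overlap_range(panels, 2))
```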

Preparing the Finnish data

In the first step I ran a European-level PCA to look for possible foreign admixture and removed 13 Finnish samples that plotted west of my genealogical West Finns. Secondly, I ran a new PCA including only the Finnish samples and divided them into three groups: the 19 most eastern samples (excluding 11 outliers), the 17 most western samples (including my genealogical West Finns), and the rest forming an intermediate group. This arrangement gave distinct eastern and western groups and also a working Finnish reference (56 samples), on the assumption that the intermediate group probably consists of the purest present-day Finns.
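The grouping step can be sketched as follows. This is a toy, dependency-free illustration, not the PCA program actually used: it centers a 0/1/2 genotype matrix, finds the first principal component by power iteration on the sample-by-sample covariance matrix, orients the axis so that the genealogical West Finns (used as landmarks) score positive, and then takes the n lowest-scoring samples as "east", the n highest as "west" and the rest as intermediate. The sample IDs, genotypes and group sizes are hypothetical, and the outlier removal step is not shown.

```python
import math
import random


def pc1_scores(genotypes, iterations=200, seed=0):
    """First principal-component direction for samples (rows) of a 0/1/2 genotype matrix.

    Power iteration on the sample-by-sample covariance matrix; the returned
    unit vector ranks the samples along PC1 (its sign is arbitrary).
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    gram = [[sum(x * y for x, y in zip(ri, rj)) / m for rj in centered] for ri in centered]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no genetic variation at all
        v = [x / norm for x in w]
    return v


def split_east_west(sample_ids, scores, west_landmarks, n_east, n_west):
    """Split samples into (east, intermediate, west) groups along PC1."""
    # PCA signs are arbitrary, so orient PC1 so that the landmark samples score positive.
    if sum(s for sid, s in zip(sample_ids, scores) if sid in west_landmarks) < 0:
        scores = [-s for s in scores]
    ranked = [sid for _, sid in sorted(zip(scores, sample_ids))]  # lowest (east) first
    east = ranked[:n_east]
    west = ranked[len(ranked) - n_west:]
    middle = ranked[n_east:len(ranked) - n_west]
    return east, middle, west


# Hypothetical toy data: five samples, six SNPs.
ids = ["E1", "E2", "M1", "W1", "W2"]
geno = [
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [1, 1, 1, 1, 1, 1],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
]
east, middle, west = split_east_west(ids, pc1_scores(geno), {"W1"}, n_east=2, n_west=2)
print("east:", east, "intermediate:", middle, "west:", west)
```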

Software

My aim is to start with Reich's programs, beginning with Rolloff. Rolloff is a program that measures admixture-related linkage disequilibrium (LD) in a target population, weighted by allele-frequency differences between two reference populations. By changing the references you can search for different admixture routes into the target. It also gives an estimate of the admixture time. This dating assumes a single pulse of admixture into the target, so continuous gene flow gives erroneous admixture times while still showing real admixture.
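The dating idea behind roll-off curves can be sketched briefly. After a single admixture pulse n generations ago, admixture LD between SNPs at genetic distance d (in Morgans) decays roughly as A·exp(-n·d), so the decay rate of the weighted-LD curve gives n. The sketch below is a simplification: it fits that exponential by least squares on log(LD), skipping short distances and non-positive points. Rolloff and ALDER use their own fitting and jackknife procedures, so this is not their implementation. The cut-off `min_cm` and the synthetic curve are assumptions for illustration.

```python
import math


def fit_admixture_time(distances_cm, weighted_ld, min_cm=0.5):
    """Fit weighted LD = A * exp(-n * d) by least squares on log(LD).

    d is genetic distance in Morgans and n the number of generations since a
    single admixture pulse.  Returns (n, A).
    """
    xs, ys = [], []
    for d_cm, z in zip(distances_cm, weighted_ld):
        if d_cm >= min_cm and z > 0:   # log() needs positive values
            xs.append(d_cm / 100.0)     # centiMorgans -> Morgans
            ys.append(math.log(z))
    if len(xs) < 2:
        raise ValueError("need at least two positive points to fit")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope, math.exp(mean_y - slope * mean_x)


# Synthetic, noise-free curve for an admixture 20 generations ago.
distances = [0.5 * k for k in range(1, 61)]                 # 0.5 .. 30 cM
curve = [0.01 * math.exp(-20 * d / 100) for d in distances]
generations, amplitude = fit_admixture_time(distances, curve)
print(f"estimated admixture time: {generations:.1f} generations")
```

With continuous gene flow the curve is a mixture of exponentials, and a single-exponential fit like this one returns a date somewhere between the pulses, which is why such dates should be read as signals of admixture rather than exact times.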

Results

All analyses are run with Rolloff's defaults, except that the resolution (bin size) is 0.5 cM instead of 1 cM. I tested both values and did not notice the smaller value increasing the standard error; on the contrary, it reduced it a bit. I also noticed that Alder (another roll-off program) uses this smaller value. The smaller the value, the more finely the LD decay is sampled; too high a resolution, however, increases statistical noise.
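The trade-off behind the resolution setting can be illustrated with a simple binning sketch, assuming per-SNP-pair LD scores with their genetic distances are already available (this is not Rolloff's internal code, and the toy pairs are hypothetical). Halving the bin width roughly doubles the number of points on the curve, but each point then averages about half as many SNP pairs, which is where the extra noise comes from.

```python
from collections import defaultdict


def bin_ld_scores(pairs, bin_cm):
    """Average per-pair LD scores into distance bins of width bin_cm.

    pairs: iterable of (genetic distance in cM, score).
    Returns a list of (bin centre in cM, mean score, number of pairs).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for d_cm, score in pairs:
        b = int(d_cm // bin_cm)
        sums[b] += score
        counts[b] += 1
    return [((b + 0.5) * bin_cm, sums[b] / counts[b], counts[b]) for b in sorted(sums)]


# Hypothetical SNP pairs: (distance in cM, LD score).
pairs = [(0.2, 1.0), (0.7, 3.0), (1.2, 5.0), (1.6, 7.0)]
for width in (1.0, 0.5):
    print(width, bin_ld_scores(pairs, width))
```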
These results were surprising, but similar shared-LD tests have apparently never been done for Finns, so I had no expectations. I can only say that if someone finds these results unexpected, do not shoot the messenger; I am happy to repeat my tests, perhaps under tighter quality control, if you wish. I would be glad to see new results so that possible differences can be evaluated.

These results suggest that the Finnish genetic make-up is the outcome of several migration and admixture events, more than I could expect from PCA and formal admixture analyses based on conventionally LD-pruned data. The large genetic difference (in Fst distances) between East and West Finland might be due more to migration history than to genetic drift. East Finnish results show a rather young southeastern or eastern admixture history (Mordvas), while West Finnish results show older southern admixture (present-day Belarussians). Both groups also show northeastern admixture (Mari -> Saami?). It is possible that all three populations are proxies; this is most likely true of the Belarussians.

The shared history with present-day Scandinavians is smaller and older than expected, but this does not rule out possible ancient regional migrations from there to Finland. Unfortunately I do not have enough samples to check it, and regarding Scandinavian migrations to Finland before the Swedish era, my hopes are focused more on ancient genomes. It is worth noting that I removed all known foreign admixture, including obvious Finland-Swedish samples. This was possible thanks to my genealogical West Finnish data.

It looks like there has been no particular Estonian migration to Finland since the common language diverged, and southern migrations to Finland bypassed Estonia.

I am going to estimate admixture proportions in the following analyses.


Admixture times for Finnish people







Related roll-off graphics



Related PCA-plots

PCA dimensions 1 and 2

PCA dimensions 1 and 3









edit 9.11.14




Yesterday I got feedback suggesting that I could verify my results by checking the French admixture among Finns. My first thought was: oh no, I am not going to start validating software that has been used in several academic studies. In principle it is unfair to ask me to do such a thing. But then I reconsidered. Why not; but by using Spaniards instead, I could check whether the admixture time fits the Stone Age and the period when southern migration waves expanded into Finland. Here are the results.
 
Admixture time: 197.139 ± 55.497 generations = 5914 ± 1665 years (at 30 years per generation).
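The conversion from generations to years is a single multiplication. The quoted figures are consistent with 30 years per generation (197.139 × 30 ≈ 5914 and 55.497 × 30 ≈ 1665); that factor is inferred from the arithmetic and is the only assumption in this small sketch. Changing `years_per_generation` (for example to 25 or 29) shifts the date proportionally, which is worth keeping in mind when comparing with archaeological periods.

```python
YEARS_PER_GENERATION = 30  # assumption: reproduces the figures quoted above


def generations_to_years(generations, se_generations,
                         years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture date and its standard error from generations to years."""
    return generations * years_per_generation, se_generations * years_per_generation


years, se_years = generations_to_years(197.139, 55.497)
print(f"{years:.0f} +- {se_years:.0f} years")  # 5914 +- 1665 years
```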