Sunday 13 August 2017

Project admixture analyses, revised

This time I used more SNPs with the method coded by the Dna.Land authors.  It is now also possible to download all the necessary tools for DIY use.  The method works only on Linux and needs Python to be installed.  Here is a guide on how to install Python on Ubuntu.

Some comments to help in understanding the results:

- After a lot of testing I found that the Swedish sample set published with the study "No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews" (Behar et al.) doesn't fit well with my Swedish project samples: all of the project samples show more Northwest and Central European ancestry than that Swedish reference.  This happens even when their self-declarations presume some Finnish admixture.  Therefore I decided to label the reference samples as East_Scandinavians, which seemed to be correct.  I wonder where they are geographically from.

- The Saami reference samples cannot be considered a source of Siberian ancestry.  Unfortunately too few of them were available, which increases the statistical error.  Here they represent a much more diverse genetic history.  The small Siberian admixture usually seen in Finnic results is built into the Finnish results, because the present-day Siberian ancestry among Finnic people is old and distinct and doesn't match present-day Siberians when Finnic reference samples are used at the same time.

The summarizing tree:


Finnic 54.6
East_Scandinavian 25.5
Saami 8.0
Northeast_European 5.7
Slavic 2.2
Northwest_European 1.0
Central_European 1.3
AMBIG_European 1.7

Finnic 52.7
Northwest_European 31.3
East_Scandinavian 6.1
Saami 3.8
Northeast_European 2.3
Slavic 1.9
Central_European 1.4

Finnic 72.9
East_Scandinavian 16.3
Saami 3.6
Baltic 3.4
Northwest_European 1.6
Northeast_European 1.7

Finnic 49.1
Northwest_European 25.1
East_Scandinavian 12.9
Northeast_European 5.1
Slavic 2.4
Saami 2.9
Baltic 2.3

Finnic 80.0
East_Scandinavian 13.4
Saami 3.4
Central_European 2.5

Finnic 54.2
East_Scandinavian 27.2
Baltic 9.3
Northwest_European 2.3
Saami 1.8
Northeast_European 1.7
Mediterranean 1.1
Central_European 2.0

Finnic 97.8
AMBIG_European 1.7

Finnic 95.1
East_Scandinavian 2.1
Baltic 1.7
AMBIG_European 1.1

Finnic 85.7
East_Scandinavian 11.9
Baltic 2.1

Finnic 64.0
Saami 31.5
Siberian 2.5
Uralic 1.0

Finnic 92.7
East_Scandinavian 5.2
Saami 2.1

Finnic 83.6
East_Scandinavian 15.2
AMBIG_European 1.1

Finnic 77.7
Baltic 16.8
East_Scandinavian 2.9
AMBIG_European 2.1

Finnic 97.8
AMBIG_European 1.7

Finnic 73.6
East_Scandinavian 14.6
Northwest_European 5.1
Central_European 5.4
Saami 1.0

Finnic 67.8
East_Scandinavian 14.8
Central_European 7.1
Slavic 5.1
Saami 1.4
Mediterranean 1.6
AMBIG_East_European 1.1

Finnic 82.0
East_Scandinavian 12.9
Saami 2.9
Baltic 1.5

Finnic 73.6
East_Scandinavian 14.3
Saami 6.9
Northwest_European 3.8
Slavic 1.0

Finnic 75.8
East_Scandinavian 17.3
Saami 4.6
Slavic 1.8

Finnic 94.0
Saami 2.1
AMBIG_European 2.0
Baltic 1.0

Finnic 94.5
Saami 1.3
Baltic 1.9
AMBIG_European 1.2
AMBIG_East_European 1.1

Finnic 68.8
East_Scandinavian 20.0
Saami 3.6
Northwest_European 3.3
Slavic 2.4
Central_European 1.7

Northwest_European 40.3
East_Scandinavian 22.4
Central_European 15.9
Finnic 9.0
Slavic 4.2
Baltic 3.9
Saami 1.6
Mediterranean 1.7
AMBIG_European 1.0

Northwest_European 52.1
East_Scandinavian 20.5
Finnic 14.8
Slavic 4.1
Central_European 4.3
Baltic 3.9

Northwest_European 59.5
East_Scandinavian 27.3
Central_European 5.4
Baltic 5.8
Saami 1.8

Northwest_European 38.1
East_Scandinavian 32.9
Finnic 11.5
Baltic 9.9
Northeast_European 3.4
Uralic 1.6
Central_European 1.9

Northwest_European 40.8
Finnic 20.7
Northeast_European 13.1
Central_European 9.0
East_Scandinavian 8.4
Slavic 4.0
Saami 2.1
Baltic 1.9

Northwest_European 45.8
East_Scandinavian 31.9
Finnic 11.5
Mediterranean 6.2
Slavic 2.2
Northeast_European 2.1

Although my primary goal was to find out Finnic and Scandinavian admixtures, this obviously works fine for almost all Europeans, at least to some extent.

Other samples for verification purposes:
Irish sample
Northwest_European 90.0
East_Scandinavian 8.7
AMBIG_European 1.3

Western Polish sample
Slavic 49.8
Baltic 18.3
Central_European 14.4
Northwest_European 6.5
Northeast_European 3.5
East_Scandinavian 4.0
Mediterranean 2.3
Uralic 1.1

Sardinian sample
Mediterranean 93.2
Northwest_European 4.5
East_Scandinavian 1.3

Baltic sample
Baltic 70.6
East_Scandinavian 12.3
Slavic 7.9
Northeast_European 6.3
Central_European 2.6

Lithuanian/Yotvingian sample
Baltic 49.0
Slavic 37.5
Central_European 5.8
Mediterranean 4.1
Northeast_European 1.8
AMBIG_European 1.7

Estonian sample
Finnic 41.4
Baltic 19.6
Slavic 16.8
Central_European 9.7
East_Scandinavian 7.8
Saami 2.3
Northeast_European 2.3

Genomes Unzipped sample
Mediterranean 45.7
Northwest_European 19.2
Central_European 19.9
East_Scandinavian 12.3
Slavic 1.9

Genomes Unzipped sample
Mediterranean 37.4
Northwest_European 37.0
East_Scandinavian 15.5
Central_European 9.3

Admixture sums don't always reach a full 100% because all admixtures below 1% are ignored.
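As a small illustration of the 1% cutoff, the following Python sketch adds up the reported components of the Sardinian sample above and shows how much is left unreported. The function name is only illustrative and is not part of the downloadable tools.

```python
def unreported_remainder(components):
    """Return the share (in percent) not covered by the reported components.

    `components` maps a component label to its reported percentage.
    Components below 1% are not reported at all, so the remainder
    is what those ignored components add up to.
    """
    total = sum(components.values())
    return round(100.0 - total, 1)


# Values copied from the Sardinian verification sample above.
sardinian = {
    "Mediterranean": 93.2,
    "Northwest_European": 4.5,
    "East_Scandinavian": 1.3,
}

print(unreported_remainder(sardinian))
```

For this sample the reported components add up to 99.0, so about one percent is spread over components that each stayed below the cutoff.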

Program downloading and running

Download the programs here.  Unzip them and place all programs in the same directory.  To run the test, use the command line "bash ./ <sample-id>", where sample-id is the name of the file holding your genetic data in the 23andme format.  The sample file must be gzip-compressed and carry the gz file extension (sample-id.gz), but on the command line you give only the sample id, not the extension.  The test works fine with the following genome builds:  HG18, HG19, GRCh36, GRCh37.  If your genome file is in the FtDna format, you have to convert it to the 23andme style first.  On Linux this is easily done with four command line entries:

First unzip your genome file and then run:

cp <original filename> <sample-id>
sed -i 's/\"//g' <sample-id>
sed -i 's/,/\t/g' <sample-id>
gzip <sample-id> 

If your data is already in the 23andme format but is not gzip-compressed with the gz file extension, unzip it first and then run only the first and fourth commands above.
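Since Python is needed for the test anyway, the same conversion can also be sketched in Python. This is only a minimal sketch that mirrors the four shell commands above (copy, strip double quotes, replace commas with tabs, gzip). The function name and the file names in the usage note are hypothetical.

```python
import gzip


def ftdna_to_23andme_gz(src_path, sample_id):
    """Convert an unzipped FtDna CSV file into a gzip-compressed,
    tab-separated file named <sample_id>.gz.

    Mirrors the shell steps: remove every double quote, turn every
    comma into a tab, and compress the result with gzip.
    Unlike `gzip <sample-id>`, the source file is left untouched.
    """
    out_path = sample_id + ".gz"
    with open(src_path, "r", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.replace('"', "").replace(",", "\t"))
    return out_path
```

For example, ftdna_to_23andme_gz("my_ftdna_raw.csv", "mysample") would write mysample.gz, and the test would then be run with mysample as the sample id.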

edit date 14.8.17 time 17:30

Another Estonian result.  I can only say that it is plausible considering the history.

Baltic 37.2
Slavic 29.6
Finnic 22.8
East_Scandinavian 8.1
Saami 1.5

edit 15.8.17 time 17:45

A British result.  It looks like the Irish sample with more Mediterranean and a minor Central European admixture.

Northwest_European 81.6
Mediterranean 10.3
Baltic 3.4
Central_European 2.9
AMBIG_European 1.8

Tuesday 27 June 2017

Estonian Corded Ware enigma

The following simple dstat figure shows the mystery of the Estonian Corded Ware samples released this spring.  There can't be any population continuity from them to present-day Balts, including Estonians.  All hunter-gatherer samples, although thousands of years older, are overwhelmingly closer to present-day Balts.  The change in HG ancestry can be seen in Western and Central Europe, where we see a clear-cut decrease of HG ancestry, obviously caused by increasing real Corded Ware and Bell Beaker ancestries.  We have to compare pure Neolithic populations against the Estonian CW samples to reach parity in the Baltic area.  There is a tiny piece of evidence for such continuity:  Finns are closer to the German BA samples than Balts are, hinting that there could be some subtle continuity.

Saturday 24 June 2017

Yamnaya and Bell Beaker drift and ratio in present-day Europe

The following statistics give an insight into how the Bronze Age Steppe ancestry transforms into a modern Northwest European genetic model, and give an idea of the differences seen in Europe today.  I made three tests:

f3(Yamnaya Samara, X: Ju_hoan_North)
f3(Bell Beaker Germany, X: Ju_hoan_North)
dstat(Bell Beaker Germany, Yamnaya Samara: X, Ju_hoan_North)

All results are based on around 450000 SNPs.
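To make the three tests listed above more concrete, here is a minimal Python sketch of the textbook definitions of an outgroup f3 and a D-statistic computed from per-SNP allele frequencies. It is a simplification (no small-sample bias correction, no standard errors) and is not the software used for the results here. The function names and the toy frequencies are invented for illustration.

```python
def f3_outgroup(outgroup, pop_a, pop_x):
    """Outgroup f3: the mean over SNPs of (o - a) * (o - x).

    Each argument is a list of allele frequencies, one value per SNP.
    A larger value means pop_a and pop_x share more drift after
    their split from the outgroup.
    """
    products = [(o - a) * (o - x) for o, a, x in zip(outgroup, pop_a, pop_x)]
    return sum(products) / len(products)


def d_statistic(pop_w, pop_x, pop_y, pop_z):
    """D(W, X; Y, Z) from per-SNP allele frequencies.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    A positive value means W shares more alleles with Y than X does.
    """
    rows = list(zip(pop_w, pop_x, pop_y, pop_z))
    num = sum((w - x) * (y - z) for w, x, y, z in rows)
    den = sum((w + x - 2 * w * x) * (y + z - 2 * y * z) for w, x, y, z in rows)
    return num / den


# Toy allele frequencies for two SNPs (invented numbers, not real data).
bell_beaker = [1.0, 0.0]
yamnaya = [0.0, 1.0]
test_pop = [1.0, 0.0]
ju_hoan = [0.0, 0.0]

print(f3_outgroup(ju_hoan, bell_beaker, test_pop))
print(d_statistic(bell_beaker, yamnaya, test_pop, ju_hoan))
```

In the dstat test above, with Bell Beaker Germany in the first position and Yamnaya Samara in the second, a positive value under this definition would mean that population X shares more alleles with the Bell Beaker samples than with the Yamnaya samples.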

The results of the f3 statistics were standardized to a common value of 1, and the dstat results were also standardized separately to a value of 1.  The results show that a Yamnaya-type ancestry is still significant in East Europe and that the turning line from Yamnaya to Bell Beaker goes from Western Finland to Lithuania and Belarus.  European farmer or Middle Eastern ancestry becomes dominant in South Europe, leading to decreasing Bell Beaker ancestry in absolute terms.

Saturday 17 June 2017

Estonian Corded Ware was not Corded Ware

Despite the common chronology, the Corded Ware in Estonia was genetically a historical misstep, if we believe dstat statistics using samples of the German Corded Ware and Bell Beaker cultures.  All Northern Europeans are closer to the German than to the Estonian samples.

Thursday 15 June 2017

Shared drift with ancient Latvian, Estonian and British samples

Briefly: the shared drift of the Latvian samples from Jones et al.  I have rebuilt all samples using BAM files straight from the study and a new genotyping algorithm designed for ancient samples.

Friday 9 June 2017

British Viking Age samples placed on the genetic map

I recently got new samples from quite a new study, link here.  It looks more like a technical test than actual sampling for the purpose of studying history, but I sampled the data anyway.  So far I have eleven Viking Age samples from the UK available and have now tested them.  The data consists of around ten samples from each population, with the exception of Swedes.  Only two Swedish samples were available for my mega-SNP database, both from the study "Genomic analyses inform on migration events during the peopling of Eurasia" (Luca Pagani 2016).  The first one was from Nyköping, the second was without any place declaration.

The PCA lacks a few Viking Age samples because their quality was too poor and the outlier check removed them.  The British Viking Age samples look German, but I should point out that a PCA is based on selected components rather than on genetic similarity across the whole genome.  Let's see how those samples look in a formal analysis.  I have made several tests to give different views, because populations don't fall on a one- or two-dimensional axis in these tests.

We see that in formal tests Swedes are closest to the British Viking Age samples, followed by the Irish and Scottish samples.  One straightforward conclusion could be that those Viking Age people were mixed Scandinavians and Celts/Britons.  One bizarre remark:  on the PCA the Swedish samples are biased towards Finns, while the Norwegian samples show less of this kind of similarity.  Still the Swedish samples are closer to those Viking Age samples from the UK.  I have not tested this curiosity using formal analyses, but as far as I know it will hold in all tests.

edit 11.6.2017  12:50

German samples are from Leipzig.

Thursday 25 May 2017

PCA grouping of N1c1-haplogroup

Earlier I used a TMRCA (time to the most recent common ancestor) calculation for PCA analyses of Y-DNA clades; the analysis is here.  Now I use the same method for grouping haplogroup N1c1.  The data was gathered from FamilyTreeDna's open project.  TMRCA calculations give only estimates, but the result makes more sense because every cell in the TMRCA data is compared to every other cell.  I used 67 markers to get the largest possible data set.  Only a few Ftdna kits have fewer markers.
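The exact TMRCA calculation is not spelled out here, so the following Python sketch only illustrates the general idea of comparing every kit with every other kit. It assumes one common simplified estimator for Y-STR data, the average squared distance (ASD) between repeat counts divided by twice the mutation rate, which is not necessarily the method used for these pictures. The function names, the kit ids, the four-marker haplotypes and the mutation rate are all hypothetical.

```python
def average_squared_distance(hap_a, hap_b):
    """Mean squared difference in repeat counts over all markers."""
    return sum((a - b) ** 2 for a, b in zip(hap_a, hap_b)) / len(hap_a)


def tmrca_generations(hap_a, hap_b, mutation_rate):
    """Rough TMRCA in generations: ASD / (2 * mutation rate).

    Assumes single-step mutations at one shared rate per marker
    and generation, which is a strong simplification.
    """
    return average_squared_distance(hap_a, hap_b) / (2 * mutation_rate)


def pairwise_tmrca(haplotypes, mutation_rate):
    """Compare every kit with every other kit.

    `haplotypes` maps a kit id to its list of repeat counts.
    Returns a dict keyed by (kit_i, kit_j).
    """
    ids = sorted(haplotypes)
    return {
        (i, j): tmrca_generations(haplotypes[i], haplotypes[j], mutation_rate)
        for i in ids
        for j in ids
    }


# Hypothetical kits with only four markers each (real data had 67).
kits = {
    "kit_A": [12, 13, 14, 30],
    "kit_B": [12, 14, 14, 28],
}
ASSUMED_RATE = 0.0025  # hypothetical mutations per marker per generation

matrix = pairwise_tmrca(kits, ASSUMED_RATE)
print(matrix[("kit_A", "kit_B")])
```

A full pairwise matrix of this kind is what a PCA can then be run on, so that each kit is placed according to its estimated distance to all other kits.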

Download original picture here

This time I had only a few Altaic and Ugric samples.  More of those samples would make it possible to see the distance between the Altaic/Ugric and European groups.  The result indicates three European groups:  Baltic, Chuds and Finnish.  Actually the West Chuds are also Finnish, but as far as I know that group is prehistorically shared with Estonians.  The most distinct group is the Finnish one, implying a local origin, despite its random distribution in North Scandinavia and Russia.

Download original picture here

The next picture shows what happens after removing the Finnish clades (regardless of location).  West and East Chuds cluster together and North Balts come close on the y-axis.  West, East and Central Balts cluster again.  The root group includes all samples not belonging to any named clade, but doesn't indicate any specific branch.

Download original picture here

After also removing all Chuds the picture shows more details.   We see that North Balts and Rurikids cluster together (with one sample classified as Fennoscandinavian) and all Balts make another cluster.