
Thread 9014

Thread ID: 9014 | Posts: 2 | Started: 2003-08-14


friedrich braun [OP]

2003-08-14 01:32

Via Race Archives

[url=http://www.racearchives.com/]http://www.racearchives.com/[/url]

I hope that "Polako" will jump in and enlighten us on the accuracy of autosomal DNA tests.

PA German (aka PA Dutch) Ethnic Group BioGeographical Ancestry (BGA) DNA Assay Project

Is There an Ancient Genetic Connection Between Some PA Germans and Attila the Hun?

Volunteer Participants Wanted

View Initial Participants Test Results

Background and Introduction

In February 2003 I ordered a newly available DNAPrint™ Version 2.0 BioGeographical Ancestry (BGA) test, which examines one's genome to assay genetic content into four major population groups defined by this company. The test determines the dominant BioGeographical Ancestry group percentage of the four major population groups assayed by this test: (I) - Native American, (II) - Sub-Saharan African, (III) - Indo-European and (IV) - East Asian, and also determines the percentage admixture, if any, from the groups which were not the test subject's dominant group. I took the test as a curiosity since I am very interested in exploring the use of any new genetics tools to aid one in genealogical research. I had already been leading another Genealogy by Genetics Y-DNA surname project using the Y Chromosome test for the prior two years. I took the test in early February and received my results of this BGA test in early March, at which time I unexpectedly learned that my genome is 79% Indo-European and 21% East Asian, instead of the 100% Indo-European I expected. A second lab test in April 2003 using new cheek cell specimens from me produced exactly the same results. Since genealogy is my major hobby, I have been researching my genealogy for about 30 years, and I had my ancestry chart traced back 8-15 generations on the various branches leading back to German, Swiss, and French Huguenot immigrants with one English lady way back, i.e., a typical PA German, aka PA Dutch, ancestry chart, this 21% East Asian content was a surprising result.

...

[url=http://www.kerchner.com/pa-gerdna.htm]http://www.kerchner.com/pa-gerdna.htm[/url]

sun tzu

2003-08-14 03:35

[url=http://www.gnxp.com/MT2/archives/000870.html]http://www.gnxp.com/MT2/archives/000870.html[/url]

** **Godless comments: **

The problem with ABD is that it doesn't give a confidence estimate. Their algorithm is a linear classifier applied in SNP space:

Allele frequencies of 56 SNPs (most from pigmentation genes) were dramatically different between groups of unrelated individuals of Asian, African, and European descent, ... A linear classification method was developed for incorporating these SNPs into a classifier model..

Very briefly, the idea is to represent each person by a 56-dimensional vector, with one entry per typed SNP, and to come up with simple functions that separate the resulting 56-dimensional vector space into sets that correspond to racial groups. This is how it'd play out if you had a 2-dimensional vector with continuous values in each of the entries:

[img]http://www.cnel.ufl.edu/~deniz/academic/projects/iono/4.jpg[/img]
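For anyone who wants to play with the 2-dimensional picture, here is a rough sketch of one simple linear classifier (a nearest-centroid rule, whose decision boundary is a straight line). This is not the ABD algorithm, which the quoted abstract does not spell out; the points and the names `fit_linear_boundary` and `classify` are made up for illustration.

```python
# Toy 2-D linear classifier: nearest-centroid rule.
# The boundary between the two groups is the straight line w.x + b = 0.
# All points are made-up illustration data, not SNP measurements.

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_linear_boundary(group_a, group_b):
    """Return (w, b) such that w.x + b > 0 means x is closer to group A's centroid."""
    ma, mb = centroid(group_a), centroid(group_b)
    w = (ma[0] - mb[0], ma[1] - mb[1])
    b = -((ma[0] ** 2 + ma[1] ** 2) - (mb[0] ** 2 + mb[1] ** 2)) / 2.0
    return w, b

def classify(x, w, b):
    """Assign a 2-D point to group "A" or "B" by which side of the line it falls on."""
    return "A" if w[0] * x[0] + w[1] * x[1] + b > 0 else "B"

# Hypothetical training points for two groups.
group_a = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1)]
group_b = [(3.0, 3.1), (2.7, 3.4), (3.2, 2.8)]

w, b = fit_linear_boundary(group_a, group_b)
print(classify((1.0, 1.0), w, b))
print(classify((3.0, 3.0), w, b))
```

Note that the rule always returns one of the two labels, even for a point sitting almost on the line. Nothing in the output says how close the call was.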

The problem with this "21%" figure is that it includes none of the associated probabilistic data. Very few alleles are exclusive to a population, so there is the possibility that some fraction of people will have alleles more common in other populations just by random chance. The use of SNPs (even informative SNPs) rather than full haplotype blocks makes this confounding factor more likely.

As an example, take a look at ALFRED's list of population-related allele frequency variations in a serotonin receptor. The "G861C HincII" polymorphism in this sequence has two common possibilities for the base pair at that location: G and C. Among the Yoruba, G is found 81.6% of the time, while C is found 18.4% of the time. Among the Japanese, G is found 64.3% of the time, and C is found 35.7% of the time. [1]

If you ran a naive linear classifier on this data to classify genotypes into Japanese and Yorubans, your algorithm would end up assigning those with G's to the Yoruban group, and those with C's to the Japanese group. Needless to say, you'd get a lot of false classifications (you can work out the exact error rate). But that's the best you can do with such uninformative loci, as the frequencies don't sharply differ between the two populations.
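Working out that error rate takes only a few lines. The sketch below uses the frequencies quoted above and makes two simplifying assumptions that are not in the original: each person is scored on a single allele at this locus, and the two populations are equally represented in the sample being classified.

```python
# Allele frequencies quoted above for the G861C HincII polymorphism.
freq = {
    "Yoruba": {"G": 0.816, "C": 0.184},
    "Japanese": {"G": 0.643, "C": 0.357},
}

# Naive rule from the text: G -> Yoruba, C -> Japanese.
rule = {"G": "Yoruba", "C": "Japanese"}

def error_rate(freq, rule, prior):
    """Probability that the rule assigns an allele to the wrong population."""
    err = 0.0
    for pop, alleles in freq.items():
        for allele, p in alleles.items():
            if rule[allele] != pop:
                err += prior[pop] * p
    return err

# ASSUMPTION: equal numbers of Yoruba and Japanese in the sample.
equal_prior = {"Yoruba": 0.5, "Japanese": 0.5}

err = error_rate(freq, rule, equal_prior)
print(f"misclassification rate: {err:.4f}")
```

Under those assumptions the rule is wrong about 41% of the time (0.5 × 0.184 + 0.5 × 0.643 = 0.4135), which is not much better than flipping a coin.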

The ABD authors claim that they're using very informative loci, which is not impossible. Certain alleles are almost exclusively found in certain populations:

A. The Fyo allele of the Duffy blood group system occurs in ca. 100% of sub-Saharan Africans and is rare in other populations. ...

B. The Dia allele of the Diego blood group system is found only in Asians and Amerinds and supports the close genetic affinity of these populations.

Note that the Fyo allele is exactly the kind of allele we're looking for: very common within a population, and almost entirely absent outside that population. If the Dia allele is only found in Asians/Amerinds, but is infrequent within that group, it's not so useful for classification. [2] Duffy type alleles are the exception and not the rule, however, as you can learn for yourself in even a cursory browsing of ALFRED.

Ok. So, after all that foreplay, you can see where I'm going with this. It is quite likely that the guy in question simply had a chance combination of alleles at the measured SNPs that are more common in East Asian populations. I think that measurements of other SNPs (or, even better, full haplotype blocks) would put him squarely back into the European category. What is absent yet necessary is a probabilistic statement of how likely it is that his allele distribution was due to chance rather than actual East Asian ancestry. [3]
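To give a feel for what such a probabilistic statement could look like, here is a rough simulation sketch. Everything in it is hypothetical: the 56 allele frequencies are randomly generated rather than taken from the ABD panel, each person is reduced to one allele per locus, and the loci are assumed independent. It draws simulated Europeans and asks how often their alleles happen to fit the East Asian frequencies better than the European ones.

```python
import math
import random

N_SNPS = 56
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL frequencies of the "1" allele at each SNP in two reference
# populations. These are invented numbers, not real panel data.
freq_eur = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
freq_eas = [min(0.95, max(0.05, f + rng.uniform(-0.3, 0.3))) for f in freq_eur]

def log_lr(alleles):
    """log[ P(alleles | East Asian) / P(alleles | European) ], loci assumed independent."""
    total = 0.0
    for a, pe, pa in zip(alleles, freq_eur, freq_eas):
        total += math.log(pa if a == 1 else 1.0 - pa)
        total -= math.log(pe if a == 1 else 1.0 - pe)
    return total

def draw(freqs):
    """Draw one allele (0 or 1) per SNP for a simulated individual."""
    return [1 if rng.random() < f else 0 for f in freqs]

# Distribution of the log likelihood ratio among simulated Europeans.
null = [log_lr(draw(freq_eur)) for _ in range(2000)]

# Fraction of simulated Europeans whose alleles look more East Asian by chance.
frac = sum(1 for x in null if x > 0) / len(null)
mean_null = sum(null) / len(null)
print(f"mean log LR among simulated Europeans: {mean_null:.2f}")
print(f"fraction looking East Asian by chance: {frac:.3f}")
```

With real panel frequencies in place of the invented ones, the same kind of calculation would let the company report how unusual a given result is among people with no East Asian ancestry, instead of a bare percentage.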

[1] Assume these values to be exact for now. A more sophisticated treatment would include error bars to account for sample size effects.

[2] I don't know the Dia frequency.

[3] This is what is done in rape trials, for example, when DNA evidence is presented.

A forensic scientist told the court that semen at the scene was 700 billion times more likely to belong to Reekie than any other man.

One can critique the (frequent, unstated) assumption that the typed loci in this trial were truly independent, but the basic principle holds: it's important to give a probabilistic statement of the signal-to-noise ratio. **