Authors: Gaolin Zheng, Chung-Hao Chen, and Tom Milledge
Pages: [125] - [134]
Received Date: May 14, 2010
Motivation: SNPs have shown considerable promise in disease
association, personalized medicine, and population classification
studies. The completion of the International HapMap Project has
facilitated SNP-based ethno-classification. Because the human genome
contains a very large number of SNPs, it is desirable to find a small
set of informative SNPs for the classification task. Previous studies
searched for ethnically related SNPs across all chromosomes and the
mitochondrial genome, and usually treated genotype data as numeric.
Here, we focus on two small ethnically related genomic regions in order
to reduce noise. We apply a categorical statistical test to find
marker SNPs and compare its performance against two non-categorical
statistical methods.
Results: We ranked SNPs using three statistical tests and used the
top-ranked SNPs for ethno-classification with a support vector
machine. The best results were obtained with a chi-squared test
of independence, where using only the top two mitochondrial SNPs
resulted in a classification accuracy of 98.9%. The top 10
mitochondrial SNPs identified by each of the three statistical tests
were able to classify the three populations without error.
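
The ranking step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It treats each SNP as a categorical variable, builds a genotype-by-population contingency table, and computes the Pearson chi-squared statistic of independence; SNPs with larger statistics are ranked higher. The function names, SNP names, population labels, and toy data are all hypothetical, and the support vector machine step is omitted.

```python
from collections import Counter


def chi2_statistic(calls, labels):
    """Pearson chi-squared statistic for a genotype-by-population table.

    calls  : categorical genotype (or allele) call for each sample
    labels : population label for each sample, in the same order
    """
    n = len(calls)
    observed = Counter(zip(calls, labels))  # cell counts
    row_totals = Counter(calls)             # counts per genotype
    col_totals = Counter(labels)            # counts per population
    stat = 0.0
    for genotype, row_n in row_totals.items():
        for population, col_n in col_totals.items():
            # Expected count under independence of genotype and population
            expected = row_n * col_n / n
            stat += (observed[(genotype, population)] - expected) ** 2 / expected
    return stat


def rank_snps(snp_table, labels):
    """Return SNP names sorted from most to least population-associated."""
    scores = {name: chi2_statistic(calls, labels)
              for name, calls in snp_table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data invented for illustration only (not taken from the study).
labels = ["pop1", "pop1", "pop2", "pop2", "pop3", "pop3"]
snps = {
    "snp_informative":   ["AA", "AA", "AG", "AG", "GG", "GG"],
    "snp_uninformative": ["AA", "AG", "AA", "AG", "AA", "AG"],
}

ranking = rank_snps(snps, labels)
print(ranking)
```

In this toy example the first SNP separates the three groups perfectly and so ranks first, while the second has the same genotype distribution in every group and scores zero. In practice the top-ranked SNPs would then be passed to a classifier as features.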
Keywords: SNP selection, ethno-classification, chi-squared test, Kruskal-Wallis test, support vector machine.