Volume no: 3, Issue no: 2, June (2010)

EVALUATION OF STATISTICAL TESTS FOR ETHNO-SNP SELECTION

Authors: Gaolin Zheng, Chung-Hao Chen and Tom Milledge
Pages: [125] - [134]
Received Date: May 14, 2010

Abstract

Motivation: Single-nucleotide polymorphisms (SNPs) have shown considerable promise in disease association, personalized medicine, and population classification studies. The completion of the International HapMap Project has facilitated SNP-based ethno-classification. Because the human genome contains a very large number of SNPs, it is desirable to find a small set of informative SNPs for the classification task. Previous studies searched for ethnically related SNPs across all chromosomes and the mitochondrial genome, and usually treated genotype data as numeric. Here, we focus on two small, ethnically related genomic regions in order to reduce noise. We apply a categorical statistical test to find marker SNPs and compare its performance with that of two non-categorical statistical methods.
Results: We ranked SNPs using three statistical tests and used the top-ranked SNPs for ethno-classification with a support vector machine. The best results were obtained with the chi-squared test of independence: using only the top two mitochondrial SNPs gave a classification accuracy of 98.9%. The top 10 mitochondrial SNPs identified by all three statistical tests were able to classify the three populations completely.
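
The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the SNP names, population labels, and allele calls below are hypothetical toy data, and the helper functions `chi_squared_statistic` and `rank_snps` are introduced only for illustration. Each SNP is treated as a categorical variable, a contingency table of allele call by population is built, and SNPs are ordered by the Pearson chi-squared statistic. The classification step with a support vector machine is omitted.

```python
from collections import Counter


def chi_squared_statistic(calls, populations):
    """Pearson chi-squared statistic for a call-by-population contingency table.

    `calls` and `populations` are equal-length sequences of categorical labels,
    one entry per individual.
    """
    n = len(calls)
    observed = Counter(zip(calls, populations))  # cell counts of the table
    call_totals = Counter(calls)                 # row totals
    population_totals = Counter(populations)     # column totals

    statistic = 0.0
    for call, call_total in call_totals.items():
        for population, population_total in population_totals.items():
            # Expected count under independence of call and population.
            expected = call_total * population_total / n
            statistic += (observed[(call, population)] - expected) ** 2 / expected
    return statistic


def rank_snps(snp_calls, populations):
    """Return (snp, statistic) pairs sorted from most to least informative."""
    scores = {
        snp: chi_squared_statistic(calls, populations)
        for snp, calls in snp_calls.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: nine individuals from three populations.
populations = ["P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3", "P3"]
snp_calls = {
    # Call fully determined by population.
    "snp_informative": ["A", "A", "A", "G", "G", "G", "T", "T", "T"],
    # Call frequency shifts gradually across populations.
    "snp_partial": ["A", "A", "G", "A", "G", "G", "G", "G", "G"],
    # Same call distribution in every population.
    "snp_uninformative": ["C", "T", "G", "C", "T", "G", "C", "T", "G"],
}

ranking = rank_snps(snp_calls, populations)
for snp, statistic in ranking:
    print(f"{snp}: chi-squared = {statistic:.2f}")
```

Ranking by the raw statistic is a simplification. When SNPs differ in their number of observed alleles, the tables have different degrees of freedom, and comparing p-values would be the more careful choice.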

Keywords

SNP selection, ethno-classification, chi-squared test, Kruskal-Wallis test, support vector machine.