Files
Abstract
Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r2 in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.