Computational epigenomics is a growing field that has so far focused mainly on two topics: the identification of methylated CpG islands (CGIs) and allele-specific cytosine methylation. However, no algorithm has been available for predicting methylation patterns from sequence alone. To develop such a predictor, Zhang's group generated a training dataset from the enzymatic digestion of human brain DNA. They found three kinds of differences between the methylated (M) and unmethylated (U) fragment groups: (i) the number of sequences meeting the Takai-Jones criteria for CpG islands (larger in the U set); (ii) the distribution of Alu elements (M sequences were richer in AluY and AluS); and (iii) hexamer abundance. These measures were integrated into a support vector machine (SVM) classifier. The SVM predicted the methylation status of non-CGIs and CGIs with accuracies of 84% and ~97%, respectively, using an optimal sliding window of 800 bp. This is an important advance because, unlike previous datasets, this one came from normal human DNA. The implemented algorithm, HDMFINDER, is available from the authors.
Das R. et al. 2006. Proc Natl Acad Sci USA 103(28): 10713-16
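
To make the sequence-derived measures above more concrete, here is a minimal Python sketch of two of them: a Takai-Jones CpG island check and hexamer counting over 800 bp sliding windows. It is an illustration only, not the HDMFINDER implementation. The function names and the 100 bp step size are assumptions made for this example; the thresholds are the published Takai-Jones criteria (length of at least 500 bp, GC content of at least 55%, observed/expected CpG ratio of at least 0.65). The Alu annotation and SVM training steps are omitted.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def meets_takai_jones(seq):
    """Takai-Jones criteria: >= 500 bp, GC >= 55%, CpG obs/exp >= 0.65."""
    return (
        len(seq) >= 500
        and gc_content(seq) >= 0.55
        and cpg_obs_exp(seq) >= 0.65
    )


def hexamer_counts(seq):
    """Count every overlapping 6-mer in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def sliding_windows(seq, size=800, step=100):
    """Yield (start, window) pairs; the step size is an assumption.

    A sequence shorter than `size` yields a single, shorter window.
    """
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]


if __name__ == "__main__":
    # Toy input for demonstration; real input would be genomic fragments.
    fragment = "CG" * 300 + "AT" * 300
    for start, window in sliding_windows(fragment):
        print(start, meets_takai_jones(window), len(hexamer_counts(window)))
```

Per-window features of this kind could then be assembled into vectors and passed to an SVM library of the reader's choice.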



