CpGCpHhMM

hidden Markov model for detecting CpG-CpH differentially methylated regions







Introduction

Methylated non-CpGs (mCpHs) in mammalian cells yield weak enrichment signals and colocalize with methylated CpGs (mCpGs). The mCpHs are cell type-specific and associated with epigenetic regulation, although their dependency on mCpGs remains to be elucidated. We developed a hidden Markov model (HMM) to systematically detect genomic regions in which CpG and CpH are differentially methylated, providing an opportunity to infer the functional importance of non-CpG methylation.
An empirical HMM was designed to detect the differentially methylated regions (DMRs) of CpG and CpH (CpG-CpH DMRs) in each human sample. Specifically, the whole genome was segmented into 180 bp-long bins and the emission probability E for each state (P, N, or U) was calculated at each bin: P, positive correlation between CpH and CpG methylation; N, negative correlation between CpH and CpG methylation; U, uncorrelated. E was calculated only for bins in which the number of reads aligned at CpGs and CpHs was >10. In addition, to ensure the continuity of the Markov model, the genomic region was divided wherever a run of consecutive undetected bins spanned more than 100,000 bp, and the HMM was applied to each part separately. The state transition probabilities were estimated using an expectation-maximization (EM) algorithm that repeats the EM steps until the difference between the previous and current transition probabilities is <5e-4 for all state transitions. Then, the Viterbi algorithm, which finds an optimal path among the states, was applied, and each bin was reassigned to the P-, N-, or U-state. Bins were linked if they were in the same state and the distance between them was <3 bins (540 bp). Finally, the N-state regions were defined as CpG-CpH DMRs. More information is available in Lee et al. (2020).
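The Viterbi step described above can be illustrated with a minimal log-space sketch. It is not the CpGCpHhMM implementation: the function name `viterbi`, the argument layout, and the state order (0:P, 1:N, 2:U) are assumptions made for illustration only.

```python
import math

STATES = ("P", "N", "U")  # assumed order, for illustration only


def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state index per bin.

    log_emit:  one list of 3 log emission probabilities per bin
    log_trans: 3x3 matrix, log_trans[i][j] = log P(state j follows state i)
    log_init:  3 log initial probabilities
    """
    n = len(STATES)
    if not log_emit:
        return []
    # Best log score of any path ending in each state at the first bin.
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    back = []  # back[t-1][j] = best predecessor of state j at bin t
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With transition probabilities that favor staying in the same state, a single bin whose emission slightly favors P can still be decoded as N when its neighbors are strongly N, which is why the Viterbi state may differ from the state of top emission probability.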

Instruction

Input file
  • GZIP-compressed file with 5 tab-separated columns.
    CHR: chromosome number (1..19,X,Y)
    BP: position of the cytosine
    CXX: cytosine context. CG, CHH, or CHG (both CHH and CHG are recognized as CH)
    mC_read: methylated read count mapped at this position
    totalC_read: total read count mapped at this position
  • Example:
    CHR BP CXX mC_read totalC_read
    19 60001 CHH 0 10
    19 60006 CHG 0 17
    19 60120 CG 2 2
    ...
    [Download an example gz file]
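As a rough illustration of the input format, the following sketch reads such a file and collapses CHH and CHG into CH. The function name `read_counts` is hypothetical, and the sketch assumes that a header line starting with CHR may be present, as in the example above.

```python
import gzip


def read_counts(path):
    """Yield (chrom, pos, context, mc_read, total_read) tuples from a gzip file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()  # splits on tabs (and any other whitespace)
            if len(fields) != 5 or fields[0] == "CHR":
                continue  # skip the header, blank lines, and malformed lines
            chrom, pos, ctx, mc, total = fields
            if ctx == "CG":
                context = "CG"
            elif ctx in ("CHH", "CHG"):
                context = "CH"  # both non-CpG contexts are treated as CH
            else:
                continue  # unknown context
            yield chrom, int(pos), context, int(mc), int(total)
```

The chromosome is kept as a string because X and Y are valid values.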

Output
  • output_dir/gblock/
    pos: starting position of the 180 bp bin
    mCGn: sum of methylated read counts aligned at CG in the bin
    CGn: sum of total read counts aligned at CG in the bin
    mCHn: sum of methylated read counts aligned at CH in the bin
    CHn: sum of total read counts aligned at CH in the bin
    refCGn: number of CG in the bin in hg19
    refCHn: number of CH in the bin in hg19
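The per-bin sums in gblock can be sketched as follows for a single chromosome. This is an illustration only: the function name `bin_counts` is hypothetical, and the sketch assumes that bins start at multiples of 180 (0, 180, 360, ...), which the tool may define differently. The refCGn and refCHn columns require the hg19 reference sequence and are left out.

```python
from collections import defaultdict

BIN_SIZE = 180  # bin length in bp, as stated in the Introduction


def bin_counts(records, bin_size=BIN_SIZE):
    """Sum read counts per bin.

    records: iterable of (pos, context, mc_read, total_read) for one
             chromosome, where context is "CG" or "CH".
    Returns a dict mapping bin start -> {"mCGn", "CGn", "mCHn", "CHn"}.
    """
    bins = defaultdict(lambda: {"mCGn": 0, "CGn": 0, "mCHn": 0, "CHn": 0})
    for pos, context, mc, total in records:
        start = (pos // bin_size) * bin_size  # assumed bin boundary convention
        if context == "CG":
            bins[start]["mCGn"] += mc
            bins[start]["CGn"] += total
        elif context == "CH":
            bins[start]["mCHn"] += mc
            bins[start]["CHn"] += total
    return dict(sorted(bins.items()))
```

Under this convention, the three example input rows fall into two bins: the two CH positions share the bin starting at 59940, and the CG position falls into the bin starting at 60120.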

  • output_dir/emit/
    pos: starting position of the 180 bp bin
    mCGlv: average methylation level at CG
    refCGn: number of CG in the bin in hg19
    mCHlv: average methylation level at CH
    refCHn: number of CH in the bin in hg19
    e_p: probability that the bin belongs to the P-state
    e_n: probability that the bin belongs to the N-state
    e_i: probability that the bin belongs to the I-state
    max_stat: state with the highest probability (0:P, 1:N, 2:I)
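The max_stat column and the methylation levels can be illustrated with a short sketch. Both function names are hypothetical. Computing the level as the pooled ratio of methylated to total reads is an assumption; the tool may average per-site levels instead.

```python
def methylation_level(methylated, total):
    """Pooled methylation level (assumed definition); None if there are no reads."""
    if total == 0:
        return None
    return methylated / total


def max_stat(e_p, e_n, e_i):
    """Return the code of the state with the highest probability (0:P, 1:N, 2:I).

    Ties resolve to the first state in that order, which is an arbitrary choice.
    """
    probs = (e_p, e_n, e_i)
    return max(range(3), key=lambda s: probs[s])
```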

  • output_dir/initProb/
    Randomly set initial probabilities of the P-, N-, and I-states

  • output_dir/TRN/
    Log-scaled transition rates between states. Column order: P, N, I; row order: P, N, I
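The stopping rule for the EM iterations given in the Introduction (every transition probability changes by less than 5e-4) can be sketched as a simple check on two 3x3 matrices laid out as above. The function name `has_converged` is hypothetical, and the sketch assumes the comparison is made on probabilities rather than on log-scaled values.

```python
def has_converged(prev, curr, tol=5e-4):
    """True if every entry of two transition matrices differs by less than tol.

    prev, curr: 3x3 nested lists of probabilities (rows and columns: P, N, I).
    """
    return all(
        abs(c - p) < tol
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    )
```

An EM loop would call this after each update and stop once it returns True.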

  • output_dir/viterbi/
    pos: starting position of the 180 bp bin
    mCGlv: average methylation level at CG
    refCGn: number of CG in the bin in hg19
    mCHlv: average methylation level at CH
    refCHn: number of CH in the bin in hg19
    state: state designated by Viterbi decoding (0:P-, 1:N-, 2:I-state)
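The Viterbi states can be turned into regions by linking bins of the same state, following the rule in the Introduction (distance <3 bins, i.e. <540 bp). The sketch below is one possible reading of that rule, not the tool's own code: the function name `link_bins` is hypothetical, the distance is assumed to be measured between bin start positions, and bins are assumed to be sorted by position within one chromosome.

```python
def link_bins(positions, states, target=1, bin_size=180, max_gap_bins=3):
    """Merge bins in the target state into (start, end) regions.

    positions: sorted bin start positions
    states:    Viterbi state per bin (0:P, 1:N, 2:I)
    target:    state to extract (1 = N-state, i.e. CpG-CpH DMRs)
    Returned regions are half-open: end = start of last bin + bin_size.
    """
    regions = []  # each entry: [first bin start, last bin start]
    for pos, state in zip(positions, states):
        if state != target:
            continue
        if regions and pos - regions[-1][1] < max_gap_bins * bin_size:
            regions[-1][1] = pos  # close enough: extend the current region
        else:
            regions.append([pos, pos])  # too far: start a new region
    return [(first, last + bin_size) for first, last in regions]
```

Under this reading, two N-state bins separated by one bin in another state are linked, whereas two N-state bins separated by two such bins (540 bp between starts) are not.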

  • output_dir/statistics/
    Number of bins detected as P-, N-, and I-state by top emission probability and by Viterbi decoding



Reference
    1. Jong-Hun Lee, Yutaka Saito, Sung-Joon Park, Kenta Nakai, "Existence and possible roles of independent non-CpG methylation in the mammalian brain", DNA Research, dsaa020 (2020).