correlation-methylation-epiFeatures

This skill provides a complete pipeline for integrating CpG methylation data with chromatin features such as ATAC-seq signal, H3K27ac, H3K4me3, or other histone marks/TF signals.

$ インストール

git clone https://github.com/BIsnake2001/ChromSkills /tmp/ChromSkills && cp -r /tmp/ChromSkills/27.correlation-methylation-epiFeatures ~/.claude/skills/ChromSkills

// tip: Run this command in your terminal to install the skill


name: correlation-methylation-epiFeatures description: This skill provides a complete pipeline for integrating CpG methylation data with chromatin features such as ATAC-seq signal, H3K27ac, H3K4me3, or other histone marks/TF signals.

Integrative Analysis of DNA Methylation and Chromatin Features

1. Overview

Main steps include:

  • Refer to the Inputs & Outputs section to check required inputs and set up the output directory structure.
  • Always prompt user for genome assembly used.
  • Always prompt user for which columns in the methylation BED files are methylation fraction/percent and coverage and strand.
  • Load and preprocess CpG methylation data
  • Tile methylation into fixed-size windows (e.g., 1kb) or in target regions.
  • Import chromatin feature signal from bigWig files
  • Build a unified region-level integration table
  • Calculate correlations between every two features.
  • Visualization

2. When to Use This Skill

Use this pipeline when you want to explore how DNA methylation relates to chromatin state, accessibility, or histone modifications. Suitable scenarios include:

  • Assessing promoter/enhancer activation via methylation & ATAC/H3K27ac
  • Integrating multi-omics datasets (ChIP-seq, ATAC-seq, WGBS)
  • Evaluating epigenomic shifts across conditions, tissues, or celltypes

3. Inputs & Outputs

Inputs

<methylation_coverage>.bed <epi_feature_1>.bw <epi_feature_2>.bw <target_regions>.bed (optional) <genomic_annotation>.gtf (optional)

Outputs

corr_epi_methylation/
  stats/
    region_signal_table.tsv   # Unified table of methylation + chromatin signal
    correlation_table.tsv                 # Per-feature Spearman correlations
  plots/
    *.pdf                          # heatmap/scatterplot of the correlations
  temp/

4. Decision Tree

STEP 1: Prepare the sample methylation data

library(GenomicRanges)
library(methylKit)
meth_files <- list("sample1.cov", "sample2.cov")
sample_ids <- c("S1", "S2")

meth <- methRead(
  location = "sample.bed",
  sample.id = "S1",
  assembly = "hg38",  # provided by user
  treatment = 0,
  context = "CpG",
  pipeline = list(
    fraction = FALSE,  # percMeth is 0–100, fraction is 0-1, depend on inputs
    chr.col = 1,
    start.col = 2,
    end.col = 3,
    strand.col = 6,    # provided by user
    coverage.col = 10, # provided by user
    freqC.col = 11     # provided by user
  )
)

STEP 3: Tile methylation into 1kb bins or count methylation in target regions

Option 1: no BED for target regions provided, calculate correlation in fix bin size

library(rtracklayer)
meth_tile <- tileMethylCounts(meth, win.size = 1000)
d <- getData(meth_tile)
mean_methylation <- d$numCs / (d$numCs + d$numTs)
regions <- as(meth_tile, "GRanges")

Option 2: Target regions provided, calculate correlation in target bins

library(rtracklayer)
bed_file <- "targets.bed"
targets <- import(bed_file, format = "BED")
meth_region <- regionCounts(meth, regions = targets)
d <- getData(meth_region)
mean_methylation <- d$numCs / (d$numCs + d$numTs)
regions     <- as(meth_region, "GRanges")  # similar to 'targets'

Option 3: calculate correlation in target genomic regions (e.g. promoter)

library(TxDb.Hsapiens.UCSC.hg38.knownGene) # depend on the genomic assembly provide by user
library(rtracklayer)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
gene_gr <- genes(txdb)   # one GRanges per gene
regions <- promoters(gene_gr,  # prompt the user for the definition of promoter
                          upstream  = 2000,
                          downstream = 200)
regions <- keepStandardChromosomes(promoters_gr, pruning.mode = "coarse")

meth_region <- regionCounts(meth, regions = regions)
d <- getData(meth_region)
mean_methylation <- d$numCs / (d$numCs + d$numTs)
regions <- as(meth_region, "GRanges")  # similar to 'targets'

Step 4: Build integrated region table

bw_ATAC    <- "ATAC.bigWig"
bw_H3K27ac <- "H3K27ac.bigWig"
bw_H3K4me3 <- "H3K4me3.bigWig"
... # Other availabel genomic features

get_bw_mean <- function(bw_file, regions) {
  bw_list <- import(bw_file, which = regions, as = "NumericList")
  sapply(bw_list, function(x) mean(x, na.rm = TRUE))
}

ATAC_sig    <- get_bw_mean(bw_ATAC,    regions)
H3K27ac_sig <- get_bw_mean(bw_H3K27ac, regions)
H3K4me3_sig <- get_bw_mean(bw_H3K4me3, regions)

# Avoid adding the gene_id column when build the data frame here
df <- data.frame(
  seqnames = seqnames(regions),
  start = start(regions),
  end = end(regions),
  mean_methylation = mean_methylation,
  ATAC = ATAC_sig,
  H3K27ac = H3K27ac_sig,
  H3K4me3 = H3K4me3_sig
)

write.table(df, "region_signal_table.tsv", sep="\t",
            quote=FALSE, row.names=FALSE)

STEP 6: Calculate correlations

features_mat <- df[, c("mean_methylation", "ATAC", "H3K27ac", "H3K4me3")]
cor_mat <- cor(
  features_mat,
  use = "pairwise.complete.obs",
  method = "spearman"
)

write.table(
  cor_mat,
  "feature_correlation_tabel.tsv",
  sep = "\t",
  quote = FALSE,
  col.names = NA
)

STEP 7: Visualization

pdf("feature_correlation_heatmap.pdf", width = 4, height = 4)
pheatmap(
  cor_mat,
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  display_numbers = TRUE,
  number_format = "%.2f",
  main = "Feature correlation"
)
dev.off()

# Scatter plots
pdf(file.path(output_dir, "plots", "methylation_epi_scatterplots.pdf"), width = 10, height = 5)
par(mfrow = c(1, 2))

# Methylation vs ATAC
plot(df_clean$mean_methylation, df_clean$ATAC,
     xlab = "Mean Methylation (%)", ylab = "ATAC-seq Signal",
     main = paste("Methylation vs ATAC-seq\nrho =", round(cor_mat["mean_methylation", "ATAC"], 3)),
     pch = 16, cex = 0.5, col = rgb(0, 0, 1, 0.3))
... # other methylation vs. feature pairs
dev.off()