differential-region-analysis
The differential-region-analysis pipeline identifies genomic regions exhibiting significant differences in signal intensity between experimental conditions using a count-based framework and DESeq2. It supports detection of both differentially accessible regions (DARs) from open-chromatin assays (e.g., ATAC-seq, DNase-seq) and differential transcription factor (TF) binding regions from TF-centric assays (e.g., ChIP-seq, CUT&RUN, CUT&Tag). The pipeline can start from aligned BAM files or a precomputed count matrix and is suitable whenever genomic signal can be summarized as read counts per region.
$ 安裝
git clone https://github.com/BIsnake2001/ChromSkills /tmp/ChromSkills && cp -r /tmp/ChromSkills/8.differential-region-analysis ~/.claude/skills/ChromSkills// tip: Run this command in your terminal to install the skill
name: differential-region-analysis description: The differential-region-analysis pipeline identifies genomic regions exhibiting significant differences in signal intensity between experimental conditions using a count-based framework and DESeq2. It supports detection of both differentially accessible regions (DARs) from open-chromatin assays (e.g., ATAC-seq, DNase-seq) and differential transcription factor (TF) binding regions from TF-centric assays (e.g., ChIP-seq, CUT&RUN, CUT&Tag). The pipeline can start from aligned BAM files or a precomputed count matrix and is suitable whenever genomic signal can be summarized as read counts per region.
Differential Region Analysis with DESeq2
Overview
This skill performs differential region analysis between experimental conditions using DESeq2 in a count-based framework. Main steps include:
- Initialize the project directory.
- Refer to the Inputs & Outputs section to check inputs and build the output architecture. All the output file should located in
${proj_dir}in Step 0. - Always prompt user if required files are missing.
- Always prompt user for the threshold of
qvaluesandlog2foldchangeto define significant regions. - Merge peaks across replicates or samples to build a consensus peak set.
- Generate read count matrix over peaks using featureCounts or bedtools.
- Prepare sample metadata file describing conditions and replicates.
- Perform differential analysis using DESeq2.
- Visualize and interpret results (PCA, volcano plot).
- Output significantly up and down accessible regions.
When to use this skill
Use the differential-region-analysis pipeline when your goal is to identify genomic regions with condition-dependent changes in signal intensity, provided the signal can be represented as raw read counts per region.
Recommended scenarios include:
- Comparing treated vs. control samples to identify regulatory regions responsive to a drug, signaling molecule, or environmental change.
- Investigating cell differentiation or developmental trajectories to reveal dynamic chromatin remodeling.
- Analyzing disease vs. normal tissues to pinpoint dysregulated enhancer or promoter accessibility.
- Integrating with RNA-seq or ChIP-seq data to connect chromatin accessibility with transcriptional or epigenetic regulation.
The pipeline performs best with datasets containing biological replicates (≥2 per condition) and moderate to high sequencing depth (~20–50 million reads per sample).
Inputs & Outputs
Inputs (choose one)
- If starting from BAM files and BED peak files → Generate consensus peaks and count matrix.
- If starting from existing count matrix → Go directly to DESeq2 analysis.
- If multiple conditions or batches → Include batch/condition in design
Outputs
${sample}_DAR_analysis/ # or ${tf}_${sample}_DB_analysis in differential TF binding detection task
tables/
all_peaks.bed
consensus_peaks.bed # Unified peak set
atac_counts.txt # Count matrix of reads per peak
samples.csv # Sample metadata
DARs/
DAR_results.csv # DESeq2 results (log2FC, p-values)
DAR_sig.bed # Significantly diffential accessible regions
DAR_up.bed
DAR_down.bed
plots/ # visualization outputs
PCA.pdf
Volcano.pdf
logs/ # analysis logs
temp/ # other temp files
Decision Tree
Step 0: Initialize Project
- Make director for this project:
Call:
mcp__project-init-tools__project_init
with:
sample: sample name (e.g. c1_vs_c2)task: DAR_analysis
The tool will:
- Create
${sample}_DAR_analysis(or${tf}_${sample}_DB_analysis) directory. - Return the full path of the
${sample}_DAR_analysis(or${tf}_${sample}_DB_analysis) directory, which will be used as${proj_dir}.
Step 1: Generate Consensus Peaks
Combine peaks from replicates to define a shared feature space. Call:
- mcp__pydeseq2-tools__generate_consensus_peaks with:
bed_files: List of paths to peak BED files from replicates.output_bed: Output path for the merged consensus BED file.output_saf: Output path for the SAF file (needed for featureCounts)
Output: consensus_peaks.bed, consensus_peaks.saf
Step 2: Generate Count Matrix
Call:
- mcp__pydeseq2-tools__count_reads_featurecounts
with:
saf_file: SAF file output from Step 1.bam_files: List of paths to BAM files.output_counts: Path to output count matrix.is_paired_end: Whether the BAM file is pair end or not.threads
Output: atac_counts.txt
Step 3: Prepare Metadata
Prepare samples.csv describing condition and replicate information.
sample,condition,replicate
sample1.bam,c1,1
sample2.bam,c1,2
sample3.bam,c2,1
sample4.bam,c2,2
Step 4: Differential Accessibility with pyDESeq2
Call:
- mcp__pydeseq2-tools__run_pydeseq2_analysis
with:
- counts_file: Path to featureCounts from Step 2.
- metadata_file: Path to metadata CSV from Step 3.
- design_factors: Design formula columns (e.g. 'condition' or 'batch,condition').
- contrast_column: Column name for contrast (e.g. 'condition').
- contrast_control: Control group name (e.g. 'Control').
- contrast_treatment: Treatment group name (e.g. 'Treated').
- output_csv: Output path for results CSV.
Output: DAR_results.csv or ${tf}_DB_results.csv
Step 5: Visualization and QC
Call:
- mcp__pydeseq2-tools__visualize_results
with:
results_csv: Path to DESeq2 results CSV.counts_file: Path to original counts file (for PCA).metadata_file: Path to metadata (for PCA grouping).output_dir: Directory to save plots.condition_col: (e.g."condition")
Step 6: Output significantly up and down accessible regions
Call:
- mcp__pydeseq2-tools__filter_and_export_bed
with:
results_csv: Path to DESeq2 results CSV.output_prefix: Prefix for output BED files.padj_cutoff: Provided by userlog2fc_cutoff: Provided by user
Output: DAR_sig.bed DAR_up.bed DAR_down.bed or ${tf}_DB_sig.bed ${tf}_DB_up.bed ${tf}_DB_down.bed
Advanced Usage
- Batch effects:
design = ~ batch + condition - Multi-group comparison:
contrast=c("condition","A","B") - Time series:
DESeq(dds, test="LRT", reduced=~1) - Filter low counts:
dds[rowSums(counts(dds)) >= 20, ]
Notes & Troubleshooting
| Issue | Solution |
|---|---|
| Very low counts | Increase threshold (rowSums >= 20) |
| Batch effect | Add batch term to design |
| Non-converging model | Use fitType="local" or betaPrior=FALSE |
| Mismatched sample names | Ensure count column names match metadata rows |
Repository
