πŸ€– BCH709 AI Assistant: Ask questions about this class using NotebookLM

BCH709 Introduction to Bioinformatics: ChIP-Seq Tutorial

Paper Reading

Please read these papers before class:


Overview: ChIP-Sequencing

ChIP-Seq (Chromatin Immunoprecipitation followed by Sequencing) identifies genomic regions bound by a protein of interest (transcription factor, histone modification) across the entire genome.

How ChIP-Seq Works

  1. Crosslink β€” Fix protein-DNA interactions with formaldehyde
  2. Shear β€” Fragment chromatin by sonication or MNase digestion
  3. Immunoprecipitate β€” Pull down target protein with a specific antibody
  4. Reverse crosslinks β€” Release DNA from protein
  5. Sequence β€” Sequence the enriched DNA fragments

Types of ChIP-Seq Targets

Target Type Examples Peak Width Notes
Transcription factor (TF) CTCF, p53, MYC Narrow (100–500 bp) Sharp peaks
Active promoter H3K4me3, H3K9ac Narrow Near TSS
Active enhancer H3K4me1, H3K27ac Broad (1–10 kb) Distal regulatory
Repressive H3K27me3, H3K9me3 Very broad (10–100 kb) Heterochromatin
Elongating RNAPII H3K36me3 Gene body Transcribed regions

ChIP-Seq Controls

Control Type Description When to Use
Input Chromatin before IP Always recommended
IgG Non-specific antibody IP Alternative to input
Mock IP No antibody Less common

The ChIP-Seq Workflow

Raw FASTQ (ChIP + Input/Control)
    β”‚
    β”œβ”€β”€ FastQC β†’ MultiQC (QC)
    β”œβ”€β”€ fastp (trimming)
    β”‚
    β”œβ”€β”€ Minimap2 (alignment to reference genome)
    β”‚       β”‚
    β”‚       └── SAMtools sort + index
    β”‚
    β”œβ”€β”€ Picard MarkDuplicates (remove PCR duplicates)
    β”‚
    β”œβ”€β”€ deepTools bamCoverage (generate bigWig tracks)
    β”‚
    β”œβ”€β”€ MACS3 (peak calling)
    β”‚       β”‚
    β”‚       β”œβ”€β”€ Narrow peaks (TF, H3K4me3)
    β”‚       └── Broad peaks (H3K27me3, H3K36me3)
    β”‚
    β”œβ”€β”€ deepTools plotFingerprint (IP quality)
    β”œβ”€β”€ deepTools plotCorrelation (replicate concordance)
    β”œβ”€β”€ deepTools computeMatrix + plotHeatmap (signal profiles)
    β”‚
    β”œβ”€β”€ HOMER / MEME-ChIP (motif analysis)
    β”‚
    └── ChIPseeker / DiffBind (peak annotation, differential binding)

1. Environment Setup

Create Conda Environment

conda create -n chipseq -c bioconda -c conda-forge python=3.11 -y
conda activate chipseq

# Core alignment + QC tools (via conda)
conda install -c bioconda -c conda-forge fastqc fastp minimap2 samtools -y
conda install -c bioconda -c conda-forge openjdk=17 picard bedtools -y

# MACS3 and deepTools install cleanest via pip (conda has dependency conflicts)
pip install macs3 deeptools multiqc

# HOMER (optional, for motif analysis β€” large download)
conda install -c bioconda homer -y

# IDR (optional, for replicate reproducibility analysis)
# NOTE: the pip package named "idr" is unrelated; install from GitHub:
pip install git+https://github.com/nboley/idr.git

Note on conda errors: if you see TypeError: 'NoneType' object is not iterable from conda install, your conda solver is too old. Either upgrade conda (conda update -n base conda) or add --solver=libmamba to each install command.

Note: R packages (ChIPseeker, DiffBind) are installed separately within R (see Sections 13–14).

libcrypto fix (if samtools reports missing library):

# Check which libcrypto is present first
ls ${CONDA_PREFIX}/lib/libcrypto.so.*
# Symlink it (use the actual version you see, often .so.3 on modern installs)
ln -sf ${CONDA_PREFIX}/lib/libcrypto.so.3 ${CONDA_PREFIX}/lib/libcrypto.so.1.0.0

Verify Installations

fastp --version
minimap2 --version
samtools --version
macs3 --version
deeptools --version

Create Working Directory

mkdir -p ~/bch709/chipseq
cd ~/bch709/chipseq

2. Experimental Design Considerations

Replicates

Type Minimum Recommended
Biological replicates 2 3+
Technical replicates Not needed β€”

Replicate concordance targets (ENCODE):

Sequencing Depth

Target Minimum Reads Notes
TF ChIP-Seq 20 million Narrow peaks
Histone ChIP-Seq 40–50 million Broad marks
Input control Match ChIP depth Or >1.5x ChIP

Antibody Validation

Use only antibodies validated for ChIP:


3. Quality Control

Download Example Data

We use CTCF ChIP-Seq on human K562 cells (ENCODE experiment ENCSR000DWE) together with its matching input control (ENCSR000DWA). All three files are single-end 36 bp Illumina reads aligned to hg19.

cd ~/bch709/chipseq

# CTCF ChIP-Seq on human K562 (ENCODE ENCSR000DWE)
wget https://www.encodeproject.org/files/ENCFF001HTP/@@download/ENCFF001HTP.fastq.gz -O chip.fastq.gz

# Matching input control for K562 (ENCODE ENCSR000DWA)
wget https://www.encodeproject.org/files/ENCFF001HTT/@@download/ENCFF001HTT.fastq.gz -O input.fastq.gz

ls -lh *.fastq.gz   # expect ~910MB and ~690MB respectively

Verify the download: ENCODE’s S3 URLs are large (~1 GB each). If wget gets interrupted, the file will be truncated and later steps (fastp, minimap2) fail with unexpected end of file. Always validate:

gunzip -t chip.fastq.gz && echo "chip OK"   # should print "chip OK"
gunzip -t input.fastq.gz && echo "input OK"

If the test fails, resume with wget -c <url> (the -c flag continues from where it stopped).

For ENCODE data, browse experiments at encodeproject.org and download FASTQ files directly. Each experiment page lists its β€œControls” β€” always pair ChIP files with their matching input, not a random control.

Run FastQC

fastqc -t 4 chip_R1.fastq.gz input.fastq.gz
multiqc .

ChIP-Seq Specific QC Metrics

Metric Description Target
Mapping rate % reads aligned > 80%
Non-redundant fraction (NRF) Unique reads / total > 0.8
PCR Bottleneck Coefficient (PBC) M1/Md > 0.9
FRiP (Fraction of Reads in Peaks) Reads in peaks / total > 1% TF; > 0.3% histone

4. Read Trimming

fastp performs adapter detection, quality trimming, and QC reporting in a single pass.

Option Description
--in1 Input FASTQ (single-end) or forward reads (paired-end)
--out1 Output trimmed FASTQ
--detect_adapter_for_pe Auto-detect adapters (use for paired-end)
--qualified_quality_phred 20 Minimum base quality threshold (Q20)
--length_required 25 Discard reads shorter than 25 bp
--thread 4 Number of threads
mkdir -p trim

# Trim ChIP sample
fastp \
  --in1 chip.fastq.gz \
  --out1 trim/chip_trimmed.fq.gz \
  --qualified_quality_phred 20 \
  --length_required 25 \
  --thread 4 \
  --html trim/chip_fastp.html \
  --json trim/chip_fastp.json

# Trim Input control
fastp \
  --in1 input.fastq.gz \
  --out1 trim/input_trimmed.fq.gz \
  --qualified_quality_phred 20 \
  --length_required 25 \
  --thread 4 \
  --html trim/input_fastp.html \
  --json trim/input_fastp.json

multiqc trim/ -n trim_report

For paired-end data, add --in2, --out2, and --detect_adapter_for_pe options.


5. Reference Genome Preparation

Download Reference

Our example FASTQs (ENCFF001HTP / ENCFF001HTT) are from early ENCODE and were aligned to hg19. For full-genome mapping we download the full assembly; for a quick tutorial run, chr19 alone (~58 MB) is enough.

# Option A β€” full hg19 genome (~3 GB after decompression, recommended)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz
mv hg19.fa reference.fasta

# Option B β€” chr19 only, for a fast tutorial run (small disk / short time)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/chr19.fa.gz
gunzip chr19.fa.gz
mv chr19.fa reference.fasta

Using Option B (single chromosome)? Only reads that map to chr19 will align (~2–6% of total reads). This is fine for testing pipeline commands but won’t give biologically meaningful peak counts. For real analysis use Option A.

Create FASTA Index

samtools faidx reference.fasta

Index the Reference for Minimap2

Minimap2 can build the index on the fly, but pre-computing it speeds up repeated alignments:

# -d : save index to file
minimap2 -d reference.mmi reference.fasta

Why Minimap2 for ChIP-Seq? Minimap2 (-ax sr mode) handles short-read DNA alignment efficiently and is suitable for both short (< 100 bp) and longer Illumina reads. Unlike HISAT2/STAR, it is not splice-aware, making it appropriate for ChIP-Seq where reads come from genomic DNA.


6. Alignment with Minimap2

Minimap2 uses the -ax sr preset for short Illumina reads.

Option Description
-ax sr Short-read alignment preset
-t 4 Number of threads
reference.mmi Pre-built minimap2 index

Align ChIP Sample

set -o pipefail   # so any failure in the pipe is caught

minimap2 -ax sr -t 4 reference.mmi trim/chip_trimmed.fq.gz \
  2> chip.minimap2.log \
  | samtools sort -@ 4 -o chip.bam

samtools index chip.bam
samtools quickcheck chip.bam && echo "chip BAM OK"
samtools flagstat chip.bam

Align Input Control

minimap2 -ax sr -t 4 reference.mmi trim/input_trimmed.fq.gz \
  2> input.minimap2.log \
  | samtools sort -@ 4 -o input.bam

samtools index input.bam

Paired-end ChIP-Seq reads? Supply both FASTQ files:

minimap2 -ax sr -t 4 reference.mmi \
  trim/chip_R1_trimmed.fq.gz trim/chip_R2_trimmed.fq.gz \
  2> chip.minimap2.log | samtools sort -@ 4 -o chip.bam

Filter Low-Quality and Unmapped Reads

Option Description
-b Output BAM format
-q 30 Minimum mapping quality (MAPQ >= 30)
-F 4 Exclude unmapped reads
samtools view -b -q 30 -F 4 chip.bam > chip.filt.bam
samtools index chip.filt.bam

samtools view -b -q 30 -F 4 input.bam > input.filt.bam
samtools index input.filt.bam

7. Mark and Remove Duplicates

For ChIP-Seq, PCR duplicates must be removed (unlike variant calling where they are only marked).

Option Description
I= Input BAM
O= Output BAM
M= Duplication metrics file
REMOVE_DUPLICATES=true Remove (not just flag) duplicate reads
VALIDATION_STRINGENCY=SILENT Suppress warnings on BAM format
picard MarkDuplicates \
  I=chip.filt.bam \
  O=chip.dedup.bam \
  M=chip.dedup.metrics \
  REMOVE_DUPLICATES=true \
  VALIDATION_STRINGENCY=SILENT

samtools index chip.dedup.bam

picard MarkDuplicates \
  I=input.filt.bam \
  O=input.dedup.bam \
  M=input.dedup.metrics \
  REMOVE_DUPLICATES=true \
  VALIDATION_STRINGENCY=SILENT

samtools index input.dedup.bam

8. Generate BigWig Signal Tracks

BigWig files are used for visualizing ChIP-Seq signal in genome browsers.

Normalize by Sequencing Depth (RPKM)

Option Description
--bam Input BAM file
--outFileName Output BigWig file
--normalizeUsing RPKM Normalize by reads per kilobase per million
--binSize 10 Resolution in base pairs
--numberOfProcessors 4 Number of threads
bamCoverage \
  --bam chip.dedup.bam \
  --outFileName chip.bw \
  --normalizeUsing RPKM \
  --binSize 10 \
  --numberOfProcessors 4

bamCoverage \
  --bam input.dedup.bam \
  --outFileName input.bw \
  --normalizeUsing RPKM \
  --binSize 10 \
  --numberOfProcessors 4

ChIP vs. Input Ratio (log2 fold change)

bamCompare \
  --bamfile1 chip.dedup.bam \
  --bamfile2 input.dedup.bam \
  --outFileName chip_vs_input.bw \
  --scaleFactorsMethod None \
  --normalizeUsing RPKM \
  --operation log2 \
  --binSize 10 \
  --numberOfProcessors 4

Common Error β€” --normalizeUsing RPKM is only valid if you also use --scaleFactorsMethod None!

Cause: bamCompare applies scaling in two places by default; combining both with RPKM is a conflict. Fix: Always pass --scaleFactorsMethod None when using --normalizeUsing RPKM (shown above).

Visualize in IGV

Load chip.bw, input.bw, and chip_vs_input.bw into IGV for visual inspection.


9. IP Quality Assessment with deepTools

Fingerprint Plot

The fingerprint plot shows how well the ChIP enrichment worked. A perfect ChIP shows a steep curve; input should be diagonal.

plotFingerprint \
  --bamfiles chip.dedup.bam input.dedup.bam \
  --labels ChIP Input \
  --plotFile fingerprint.png \
  --outRawCounts fingerprint.tab \
  --numberOfProcessors 4

Interpreting the fingerprint plot: A successful ChIP enrichment shows a steep curve (reads concentrated at a few genomic loci). The input control should be near-diagonal (reads evenly distributed). Poor enrichment = ChIP curve resembles the input.

Replicate Correlation

# Bin the genome and count reads in each bin
multiBamSummary bins \
  --bamfiles chip_rep1.dedup.bam chip_rep2.dedup.bam input.dedup.bam \
  --labels Rep1 Rep2 Input \
  --outFileName multibam.npz \
  --numberOfProcessors 4

# Plot correlation heatmap
plotCorrelation \
  --corData multibam.npz \
  --corMethod pearson \
  --skipZeros \
  --whatToPlot heatmap \
  --colorMap RdYlBu \
  --plotFile correlation.png \
  --outFileCorMatrix correlation.txt

Common Error β€” error: the following arguments are required: --whatToPlot/-p

Older documentation often says --plotType but current plotCorrelation uses --whatToPlot (valid values: heatmap, scatterplot).

Target: Pearson correlation between biological replicates > 0.9


10. Peak Calling with MACS3

MACS3 (Model-based Analysis of ChIP-Seq) is the standard tool for identifying enriched genomic regions.

MACS3 Key Options

Option Description
-t Treatment (ChIP) BAM file
-c Control (Input) BAM file
--format BAM Input file format
--gsize hs Effective genome size (hs=human, mm=mouse, ce=C. elegans, dm=Drosophila)
--name Output file prefix
--outdir Output directory
--qvalue 0.05 FDR threshold for peak calling
--broad Call broad peaks (for histone marks)
--broad-cutoff 0.1 FDR threshold for broad peaks

Narrow Peak Calling (TF, H3K4me3, H3K9ac)

macs3 callpeak \
  -t chip.dedup.bam \
  -c input.dedup.bam \
  --format BAM \
  --gsize hs \
  --name chip_narrow \
  --outdir macs3_narrow \
  --qvalue 0.05 \
  2>&1 | tee macs3_narrow.log

Common Error β€” MACS3 needs at least 100 paired peaks at + and - strand to build the model, but can only find 0

Cause: MACS3 tries to build a fragment-size model from strand-paired peaks. With low-depth data (or a single chromosome / small reference), it can’t find enough peaks to model. Fix: Bypass the model and set a fixed fragment size (147 bp is a reasonable default for nucleosome-associated reads):

macs3 callpeak -t chip.dedup.bam -c input.dedup.bam --format BAM --gsize hs \
    --name chip_narrow --outdir macs3_narrow --qvalue 0.05 \
    --nomodel --extsize 147

Broad Peak Calling (H3K27me3, H3K36me3, H3K9me3)

macs3 callpeak \
  -t chip.dedup.bam \
  -c input.dedup.bam \
  --format BAM \
  --gsize hs \
  --name chip_broad \
  --outdir macs3_broad \
  --broad \
  --broad-cutoff 0.1 \
  2>&1 | tee macs3_broad.log

MACS3 Output Files

File Description
_peaks.narrowPeak Peak coordinates + statistics
_peaks.xls Detailed peak table
_summits.bed Peak summit positions
_model.r Fragment size model (run in R)
_control_lambda.bdg Control lambda values
_treat_pileup.bdg ChIP pileup signal

NarrowPeak Format

Column Description
1 Chromosome
2 Start (0-based)
3 End
4 Peak name
5 Score
6 Strand
7 Signal value (fold enrichment)
8 -log10(p-value)
9 -log10(q-value)
10 Summit offset from start

Count and Filter Peaks

# Count total peaks
wc -l macs3_narrow/chip_narrow_peaks.narrowPeak

# View top peaks by fold enrichment (column 7)
sort -k7,7rn macs3_narrow/chip_narrow_peaks.narrowPeak | head -20

# Filter by FDR q-value < 0.01 (column 9 = -log10 qvalue, so > 2)
awk '$9 > 2' macs3_narrow/chip_narrow_peaks.narrowPeak > chip_narrow_filtered.bed

11. Signal Profiles and Heatmaps

Visualize ChIP signal around features (e.g., TSS, peak centers).

Download Gene Annotation and Build TSS BED

# Download refGene table (UCSC format: tab-separated text)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
gunzip refGene.txt.gz

# Build a BED file of transcription start sites (TSS = txStart on +, txEnd on -)
awk 'BEGIN{OFS="\t"} {
    if ($4=="+") print $3, $5, $5+1, $2, "0", $4;
    else         print $3, $6-1, $6, $2, "0", $4
}' refGene.txt | sort -k1,1 -k2,2n | uniq > TSS.bed

head TSS.bed
wc -l TSS.bed

The resulting TSS.bed has one row per transcript, with coordinates pointing to the exact TSS base.

Compute Signal Matrix Around TSS

Option Description
--scoreFileName BigWig signal file
--regionsFileName BED file with genomic regions
--referencePoint TSS Anchor point for the plot
--upstream / --downstream Window size around reference point
--binSize 50 Resolution in base pairs
computeMatrix reference-point \
  --scoreFileName chip.bw \
  --regionsFileName TSS.bed \
  --referencePoint TSS \
  --upstream 3000 \
  --downstream 3000 \
  --binSize 50 \
  --numberOfProcessors 4 \
  -o tss_matrix.gz

plotHeatmap \
  --matrixFile tss_matrix.gz \
  --outFileName tss_heatmap.png \
  --colorMap Blues \
  --whatToShow 'heatmap and colorbar' \
  --zMin 0 --zMax 5

Profile Plot

plotProfile \
  --matrixFile tss_matrix.gz \
  --outFileName tss_profile.png \
  --plotTitle "ChIP Signal at TSS"

12. Motif Analysis

Motif analysis identifies DNA sequence motifs enriched in ChIP-Seq peaks β€” typically the binding motif of the immunoprecipitated protein.

HOMER Motif Analysis

# Install HOMER genome (run once)
configureHomer.pl -install hg38

findMotifsGenome.pl usage: findMotifsGenome.pl <peaks.bed> <genome> <output_dir/> [options]

Option Description
-size 200 Region size around peak center for motif search
-mask Mask repeat sequences
-p 4 Number of threads
# Prepare peak file (convert narrowPeak to BED6)
awk '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6}' \
  chip_narrow_filtered.bed > peaks_homer.bed

# Find motifs (de novo + known)
findMotifsGenome.pl peaks_homer.bed hg38 motif_output/ -size 200 -mask -p 4

HOMER Output

File Description
homerResults.html De novo discovered motifs
knownResults.html Matches to known motifs database
homerResults/motif1.motif PWM for top motif

13. Peak Annotation with ChIPseeker (R)

ChIPseeker annotates peaks relative to genomic features (promoter, exon, intron, etc.).

Install ChIPseeker

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ChIPseeker", "TxDb.Hsapiens.UCSC.hg38.knownGene", "clusterProfiler"))

Annotate Peaks

library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(clusterProfiler)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Read peaks
peaks <- readPeakFile("chip_narrow_filtered.bed")

# Annotate
peakAnno <- annotatePeak(peaks, tssRegion=c(-3000, 3000),
                          TxDb=txdb, annoDb="org.Hs.eg.db")

# Plot annotation pie chart
plotAnnoPie(peakAnno)

# Plot distance to TSS
plotDistToTSS(peakAnno, title="Distribution of ChIP peaks\nrelative to TSS")

# Gene ontology of nearby genes
genes <- seq2gene(peaks, tssRegion=c(-1000, 1000),
                   flankDistance=3000, TxDb=txdb)
pathway <- enrichPathway(genes)
dotplot(pathway)

14. Differential Binding Analysis with DiffBind (R)

Compare ChIP-Seq signal between two conditions (e.g., treated vs. untreated).

Install DiffBind

BiocManager::install("DiffBind")

Sample Sheet Format

The sample sheet CSV must contain these columns:

SampleID Condition Replicate bamReads Peaks PeakCaller
chip_cond1_rep1 Condition1 1 chip_cond1_rep1.dedup.bam peaks_cond1_rep1.narrowPeak macs
chip_cond1_rep2 Condition1 2 chip_cond1_rep2.dedup.bam peaks_cond1_rep2.narrowPeak macs
chip_cond2_rep1 Condition2 1 chip_cond2_rep1.dedup.bam peaks_cond2_rep1.narrowPeak macs
chip_cond2_rep2 Condition2 2 chip_cond2_rep2.dedup.bam peaks_cond2_rep2.narrowPeak macs

Run DiffBind

library(DiffBind)

# Load sample sheet
samples <- read.csv("samplesheet.csv")
dba_obj <- dba(sampleSheet=samples)

# Count reads in peaks
dba_obj <- dba.count(dba_obj, bUseSummarizeOverlaps=TRUE)

# Set contrast
dba_obj <- dba.contrast(dba_obj, categories=DBA_CONDITION)

# Run differential analysis (DESeq2 by default)
dba_obj <- dba.analyze(dba_obj)

# Get significant peaks
db_peaks <- dba.report(dba_obj, th=0.05)

15. IDR Analysis (Irreproducibility Discovery Rate)

IDR measures reproducibility of peak calls across replicates. ENCODE requires IDR < 0.05.

Option Description
--samples Two replicate peak files
--input-file-type narrowPeak Input file format
--rank p.value Column to rank peaks by
--output-file Output IDR results
--plot Generate IDR diagnostic plot
# Sort peaks by score (column 5)
sort -k5,5rn chip_rep1_peaks.narrowPeak > rep1_sorted.bed
sort -k5,5rn chip_rep2_peaks.narrowPeak > rep2_sorted.bed

# Run IDR
idr \
  --samples rep1_sorted.bed rep2_sorted.bed \
  --input-file-type narrowPeak \
  --rank p.value \
  --output-file idr_peaks.txt \
  --plot \
  --log-output-file idr.log

Common Error β€” AttributeError: module 'numpy' has no attribute 'int'

IDR 2.0.3 is not compatible with NumPy β‰₯ 1.20 (where np.int was removed). Fix: Downgrade NumPy in the chipseq env, or use the bioconda build (which patches this):

pip install "numpy<1.20"
# OR use the bioconda-packaged idr:
conda install -c bioconda idr

Common Error β€” idr: Image Data Repository access library (wrong package)

pip install idr pulls a completely unrelated package (for microscopy images). Always install from the Kundaje/Boley repo:

pip uninstall idr -y
pip install git+https://github.com/nboley/idr.git

16. Full Workflow Summary

Step Tool Input Output
QC FastQC + MultiQC FASTQ HTML report
Trim fastp FASTQ Trimmed FASTQ
Align Minimap2 FASTQ BAM
Sort/Index SAMtools BAM Sorted BAM
Filter SAMtools BAM Filtered BAM (MAPQ >= 30)
Dedup Picard BAM Deduplicated BAM
Signal tracks deepTools bamCoverage BAM BigWig
IP quality deepTools plotFingerprint BAM Plot
Replicate QC deepTools plotCorrelation BAM Correlation matrix
Peak calling MACS3 BAM BED/narrowPeak
Heatmaps deepTools computeMatrix/plotHeatmap BigWig + BED Heatmap PNG
Motif analysis HOMER BED Motif report HTML
Annotation ChIPseeker (R) BED Annotated peaks
Differential DiffBind (R) BAM + peaks Differential peaks
Reproducibility IDR Peak files IDR peaks

17. Troubleshooting β€” Quick Reference

Problem Section Quick Fix
TypeError: 'NoneType' object is not iterable from conda 1. Setup Old conda solver β€” conda update -n base conda or add --solver=libmamba
libcrypto.so.1.0.0: cannot open 1. Setup ln -sf ${CONDA_PREFIX}/lib/libcrypto.so.3 ${CONDA_PREFIX}/lib/libcrypto.so.1.0.0
fastp: igzip: unexpected eof 3. QC FASTQ download was truncated β€” check with gunzip -t, resume with wget -c
samtools quickcheck EOF missing 6. Align BAM write was interrupted β€” delete and re-run
MACS3 Total number of paired peaks: 0 10. MACS3 Low depth β€” add --nomodel --extsize 147
--normalizeUsing RPKM is only valid if you also use --scaleFactorsMethod None 8. bamCompare Add --scaleFactorsMethod None
plotCorrelation: error: --whatToPlot required 9. deepTools Use --whatToPlot heatmap (not --plotType)
IDR module 'numpy' has no attribute 'int' 15. IDR pip install "numpy<1.20" or use bioconda idr
IDR Image Data Repository access library installed 1. Setup Wrong package β€” install from git+https://github.com/nboley/idr.git
HOMER findMotifsGenome.pl: command not found 12. HOMER conda install -c bioconda homer and configureHomer.pl -install hg19

18. Cleanup

# Remove large intermediate files when done
rm -f chip.bam chip.bam.bai chip.filt.bam chip.filt.bam.bai
rm -f input.bam input.bam.bai input.filt.bam input.filt.bam.bai
rm -rf trim/

conda deactivate
# Optional: remove environment entirely
conda env remove --name chipseq

References

Resource Link
ENCODE ChIP-Seq standards Landt et al. 2012, Genome Research
MACS3 paper Zhang et al. 2008, Genome Biology
deepTools paper Ramirez et al. 2016, Nucleic Acids Research
fastp paper Chen et al. 2018, Bioinformatics
ChIPseeker paper Yu et al. 2015, Bioinformatics
HOMER documentation homer.ucsd.edu
IDR framework Li et al. 2011, Ann. Applied Statistics
DiffBind paper Ross-Innes et al. 2012, Nature