Paper Reading
Please read these papers before class:
Overview: ChIP-Sequencing
ChIP-Seq (Chromatin Immunoprecipitation followed by Sequencing) identifies genomic regions bound by a protein of interest (transcription factor, histone modification) across the entire genome.
How ChIP-Seq Works
- Crosslink β Fix protein-DNA interactions with formaldehyde
- Shear β Fragment chromatin by sonication or MNase digestion
- Immunoprecipitate β Pull down target protein with a specific antibody
- Reverse crosslinks β Release DNA from protein
- Sequence β Sequence the enriched DNA fragments
Types of ChIP-Seq Targets
| Target Type | Examples | Peak Width | Notes |
|---|---|---|---|
| Transcription factor (TF) | CTCF, p53, MYC | Narrow (100β500 bp) | Sharp peaks |
| Active promoter | H3K4me3, H3K9ac | Narrow | Near TSS |
| Active enhancer | H3K4me1, H3K27ac | Broad (1β10 kb) | Distal regulatory |
| Repressive | H3K27me3, H3K9me3 | Very broad (10β100 kb) | Heterochromatin |
| Elongating RNAPII | H3K36me3 | Gene body | Transcribed regions |
ChIP-Seq Controls
| Control Type | Description | When to Use |
|---|---|---|
| Input | Chromatin before IP | Always recommended |
| IgG | Non-specific antibody IP | Alternative to input |
| Mock IP | No antibody | Less common |
The ChIP-Seq Workflow
Raw FASTQ (ChIP + Input/Control)
β
βββ FastQC β MultiQC (QC)
βββ fastp (trimming)
β
βββ Minimap2 (alignment to reference genome)
β β
β βββ SAMtools sort + index
β
βββ Picard MarkDuplicates (remove PCR duplicates)
β
βββ deepTools bamCoverage (generate bigWig tracks)
β
βββ MACS3 (peak calling)
β β
β βββ Narrow peaks (TF, H3K4me3)
β βββ Broad peaks (H3K27me3, H3K36me3)
β
βββ deepTools plotFingerprint (IP quality)
βββ deepTools plotCorrelation (replicate concordance)
βββ deepTools computeMatrix + plotHeatmap (signal profiles)
β
βββ HOMER / MEME-ChIP (motif analysis)
β
βββ ChIPseeker / DiffBind (peak annotation, differential binding)
1. Environment Setup
Create Conda Environment
conda create -n chipseq -c bioconda -c conda-forge python=3.11 -y
conda activate chipseq
# Core alignment + QC tools (via conda)
conda install -c bioconda -c conda-forge fastqc fastp minimap2 samtools -y
conda install -c bioconda -c conda-forge openjdk=17 picard bedtools -y
# MACS3 and deepTools install cleanest via pip (conda has dependency conflicts)
pip install macs3 deeptools multiqc
# HOMER (optional, for motif analysis β large download)
conda install -c bioconda homer -y
# IDR (optional, for replicate reproducibility analysis)
# NOTE: the pip package named "idr" is unrelated; install from GitHub:
pip install git+https://github.com/nboley/idr.git
Note on conda errors: if you see
TypeError: 'NoneType' object is not iterablefromconda install, your conda solver is too old. Either upgrade conda (conda update -n base conda) or add--solver=libmambato each install command.
Note: R packages (
ChIPseeker,DiffBind) are installed separately within R (see Sections 13β14).
libcrypto fix (if samtools reports missing library):
# Check which libcrypto is present first ls ${CONDA_PREFIX}/lib/libcrypto.so.* # Symlink it (use the actual version you see, often .so.3 on modern installs) ln -sf ${CONDA_PREFIX}/lib/libcrypto.so.3 ${CONDA_PREFIX}/lib/libcrypto.so.1.0.0
Verify Installations
fastp --version
minimap2 --version
samtools --version
macs3 --version
deeptools --version
Create Working Directory
mkdir -p ~/bch709/chipseq
cd ~/bch709/chipseq
2. Experimental Design Considerations
Replicates
| Type | Minimum | Recommended |
|---|---|---|
| Biological replicates | 2 | 3+ |
| Technical replicates | Not needed | β |
Replicate concordance targets (ENCODE):
- IDR (Irreproducibility Discovery Rate) < 0.05
- Pearson correlation of read counts: > 0.9 (same antibody)
Sequencing Depth
| Target | Minimum Reads | Notes |
|---|---|---|
| TF ChIP-Seq | 20 million | Narrow peaks |
| Histone ChIP-Seq | 40β50 million | Broad marks |
| Input control | Match ChIP depth | Or >1.5x ChIP |
Antibody Validation
Use only antibodies validated for ChIP:
- Check manufacturerβs ChIP validation data
- Run western blot to confirm specificity
- Use ENCODE validated antibodies when possible
3. Quality Control
Download Example Data
We use CTCF ChIP-Seq on human K562 cells (ENCODE experiment ENCSR000DWE) together with its matching input control (ENCSR000DWA). All three files are single-end 36 bp Illumina reads aligned to hg19.
cd ~/bch709/chipseq
# CTCF ChIP-Seq on human K562 (ENCODE ENCSR000DWE)
wget https://www.encodeproject.org/files/ENCFF001HTP/@@download/ENCFF001HTP.fastq.gz -O chip.fastq.gz
# Matching input control for K562 (ENCODE ENCSR000DWA)
wget https://www.encodeproject.org/files/ENCFF001HTT/@@download/ENCFF001HTT.fastq.gz -O input.fastq.gz
ls -lh *.fastq.gz # expect ~910MB and ~690MB respectively
Verify the download: ENCODEβs S3 URLs are large (~1 GB each). If
wgetgets interrupted, the file will be truncated and later steps (fastp, minimap2) fail withunexpected end of file. Always validate:gunzip -t chip.fastq.gz && echo "chip OK" # should print "chip OK" gunzip -t input.fastq.gz && echo "input OK"If the test fails, resume with
wget -c <url>(the-cflag continues from where it stopped).
For ENCODE data, browse experiments at encodeproject.org and download FASTQ files directly. Each experiment page lists its βControlsβ β always pair ChIP files with their matching input, not a random control.
Run FastQC
fastqc -t 4 chip_R1.fastq.gz input.fastq.gz
multiqc .
ChIP-Seq Specific QC Metrics
| Metric | Description | Target |
|---|---|---|
| Mapping rate | % reads aligned | > 80% |
| Non-redundant fraction (NRF) | Unique reads / total | > 0.8 |
| PCR Bottleneck Coefficient (PBC) | M1/Md | > 0.9 |
| FRiP (Fraction of Reads in Peaks) | Reads in peaks / total | > 1% TF; > 0.3% histone |
4. Read Trimming
fastp performs adapter detection, quality trimming, and QC reporting in a single pass.
| Option | Description |
|---|---|
--in1 |
Input FASTQ (single-end) or forward reads (paired-end) |
--out1 |
Output trimmed FASTQ |
--detect_adapter_for_pe |
Auto-detect adapters (use for paired-end) |
--qualified_quality_phred 20 |
Minimum base quality threshold (Q20) |
--length_required 25 |
Discard reads shorter than 25 bp |
--thread 4 |
Number of threads |
mkdir -p trim
# Trim ChIP sample
fastp \
--in1 chip.fastq.gz \
--out1 trim/chip_trimmed.fq.gz \
--qualified_quality_phred 20 \
--length_required 25 \
--thread 4 \
--html trim/chip_fastp.html \
--json trim/chip_fastp.json
# Trim Input control
fastp \
--in1 input.fastq.gz \
--out1 trim/input_trimmed.fq.gz \
--qualified_quality_phred 20 \
--length_required 25 \
--thread 4 \
--html trim/input_fastp.html \
--json trim/input_fastp.json
multiqc trim/ -n trim_report
For paired-end data, add
--in2,--out2, and--detect_adapter_for_peoptions.
5. Reference Genome Preparation
Download Reference
Our example FASTQs (ENCFF001HTP / ENCFF001HTT) are from early ENCODE and were aligned to hg19. For full-genome mapping we download the full assembly; for a quick tutorial run, chr19 alone (~58 MB) is enough.
# Option A β full hg19 genome (~3 GB after decompression, recommended)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz
mv hg19.fa reference.fasta
# Option B β chr19 only, for a fast tutorial run (small disk / short time)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/chr19.fa.gz
gunzip chr19.fa.gz
mv chr19.fa reference.fasta
Using Option B (single chromosome)? Only reads that map to chr19 will align (~2β6% of total reads). This is fine for testing pipeline commands but wonβt give biologically meaningful peak counts. For real analysis use Option A.
Create FASTA Index
samtools faidx reference.fasta
Index the Reference for Minimap2
Minimap2 can build the index on the fly, but pre-computing it speeds up repeated alignments:
# -d : save index to file
minimap2 -d reference.mmi reference.fasta
Why Minimap2 for ChIP-Seq? Minimap2 (
-ax srmode) handles short-read DNA alignment efficiently and is suitable for both short (< 100 bp) and longer Illumina reads. Unlike HISAT2/STAR, it is not splice-aware, making it appropriate for ChIP-Seq where reads come from genomic DNA.
6. Alignment with Minimap2
Minimap2 uses the -ax sr preset for short Illumina reads.
| Option | Description |
|---|---|
-ax sr |
Short-read alignment preset |
-t 4 |
Number of threads |
reference.mmi |
Pre-built minimap2 index |
Align ChIP Sample
set -o pipefail # so any failure in the pipe is caught
minimap2 -ax sr -t 4 reference.mmi trim/chip_trimmed.fq.gz \
2> chip.minimap2.log \
| samtools sort -@ 4 -o chip.bam
samtools index chip.bam
samtools quickcheck chip.bam && echo "chip BAM OK"
samtools flagstat chip.bam
Align Input Control
minimap2 -ax sr -t 4 reference.mmi trim/input_trimmed.fq.gz \
2> input.minimap2.log \
| samtools sort -@ 4 -o input.bam
samtools index input.bam
Paired-end ChIP-Seq reads? Supply both FASTQ files:
minimap2 -ax sr -t 4 reference.mmi \ trim/chip_R1_trimmed.fq.gz trim/chip_R2_trimmed.fq.gz \ 2> chip.minimap2.log | samtools sort -@ 4 -o chip.bam
Filter Low-Quality and Unmapped Reads
| Option | Description |
|---|---|
-b |
Output BAM format |
-q 30 |
Minimum mapping quality (MAPQ >= 30) |
-F 4 |
Exclude unmapped reads |
samtools view -b -q 30 -F 4 chip.bam > chip.filt.bam
samtools index chip.filt.bam
samtools view -b -q 30 -F 4 input.bam > input.filt.bam
samtools index input.filt.bam
7. Mark and Remove Duplicates
For ChIP-Seq, PCR duplicates must be removed (unlike variant calling where they are only marked).
| Option | Description |
|---|---|
I= |
Input BAM |
O= |
Output BAM |
M= |
Duplication metrics file |
REMOVE_DUPLICATES=true |
Remove (not just flag) duplicate reads |
VALIDATION_STRINGENCY=SILENT |
Suppress warnings on BAM format |
picard MarkDuplicates \
I=chip.filt.bam \
O=chip.dedup.bam \
M=chip.dedup.metrics \
REMOVE_DUPLICATES=true \
VALIDATION_STRINGENCY=SILENT
samtools index chip.dedup.bam
picard MarkDuplicates \
I=input.filt.bam \
O=input.dedup.bam \
M=input.dedup.metrics \
REMOVE_DUPLICATES=true \
VALIDATION_STRINGENCY=SILENT
samtools index input.dedup.bam
8. Generate BigWig Signal Tracks
BigWig files are used for visualizing ChIP-Seq signal in genome browsers.
Normalize by Sequencing Depth (RPKM)
| Option | Description |
|---|---|
--bam |
Input BAM file |
--outFileName |
Output BigWig file |
--normalizeUsing RPKM |
Normalize by reads per kilobase per million |
--binSize 10 |
Resolution in base pairs |
--numberOfProcessors 4 |
Number of threads |
bamCoverage \
--bam chip.dedup.bam \
--outFileName chip.bw \
--normalizeUsing RPKM \
--binSize 10 \
--numberOfProcessors 4
bamCoverage \
--bam input.dedup.bam \
--outFileName input.bw \
--normalizeUsing RPKM \
--binSize 10 \
--numberOfProcessors 4
ChIP vs. Input Ratio (log2 fold change)
bamCompare \
--bamfile1 chip.dedup.bam \
--bamfile2 input.dedup.bam \
--outFileName chip_vs_input.bw \
--scaleFactorsMethod None \
--normalizeUsing RPKM \
--operation log2 \
--binSize 10 \
--numberOfProcessors 4
Common Error β
--normalizeUsing RPKMis only valid if you also use--scaleFactorsMethod None!Cause:
bamCompareapplies scaling in two places by default; combining both with RPKM is a conflict. Fix: Always pass--scaleFactorsMethod Nonewhen using--normalizeUsing RPKM(shown above).
Visualize in IGV
Load chip.bw, input.bw, and chip_vs_input.bw into IGV for visual inspection.
9. IP Quality Assessment with deepTools
Fingerprint Plot
The fingerprint plot shows how well the ChIP enrichment worked. A perfect ChIP shows a steep curve; input should be diagonal.
plotFingerprint \
--bamfiles chip.dedup.bam input.dedup.bam \
--labels ChIP Input \
--plotFile fingerprint.png \
--outRawCounts fingerprint.tab \
--numberOfProcessors 4
Interpreting the fingerprint plot: A successful ChIP enrichment shows a steep curve (reads concentrated at a few genomic loci). The input control should be near-diagonal (reads evenly distributed). Poor enrichment = ChIP curve resembles the input.
Replicate Correlation
# Bin the genome and count reads in each bin
multiBamSummary bins \
--bamfiles chip_rep1.dedup.bam chip_rep2.dedup.bam input.dedup.bam \
--labels Rep1 Rep2 Input \
--outFileName multibam.npz \
--numberOfProcessors 4
# Plot correlation heatmap
plotCorrelation \
--corData multibam.npz \
--corMethod pearson \
--skipZeros \
--whatToPlot heatmap \
--colorMap RdYlBu \
--plotFile correlation.png \
--outFileCorMatrix correlation.txt
Common Error β
error: the following arguments are required: --whatToPlot/-pOlder documentation often says
--plotTypebut currentplotCorrelationuses--whatToPlot(valid values:heatmap,scatterplot).
Target: Pearson correlation between biological replicates > 0.9
10. Peak Calling with MACS3
MACS3 (Model-based Analysis of ChIP-Seq) is the standard tool for identifying enriched genomic regions.
MACS3 Key Options
| Option | Description |
|---|---|
-t |
Treatment (ChIP) BAM file |
-c |
Control (Input) BAM file |
--format BAM |
Input file format |
--gsize hs |
Effective genome size (hs=human, mm=mouse, ce=C. elegans, dm=Drosophila) |
--name |
Output file prefix |
--outdir |
Output directory |
--qvalue 0.05 |
FDR threshold for peak calling |
--broad |
Call broad peaks (for histone marks) |
--broad-cutoff 0.1 |
FDR threshold for broad peaks |
Narrow Peak Calling (TF, H3K4me3, H3K9ac)
macs3 callpeak \
-t chip.dedup.bam \
-c input.dedup.bam \
--format BAM \
--gsize hs \
--name chip_narrow \
--outdir macs3_narrow \
--qvalue 0.05 \
2>&1 | tee macs3_narrow.log
Common Error β
MACS3 needs at least 100 paired peaks at + and - strand to build the model, but can only find 0Cause: MACS3 tries to build a fragment-size model from strand-paired peaks. With low-depth data (or a single chromosome / small reference), it canβt find enough peaks to model. Fix: Bypass the model and set a fixed fragment size (147 bp is a reasonable default for nucleosome-associated reads):
macs3 callpeak -t chip.dedup.bam -c input.dedup.bam --format BAM --gsize hs \ --name chip_narrow --outdir macs3_narrow --qvalue 0.05 \ --nomodel --extsize 147
Broad Peak Calling (H3K27me3, H3K36me3, H3K9me3)
macs3 callpeak \
-t chip.dedup.bam \
-c input.dedup.bam \
--format BAM \
--gsize hs \
--name chip_broad \
--outdir macs3_broad \
--broad \
--broad-cutoff 0.1 \
2>&1 | tee macs3_broad.log
MACS3 Output Files
| File | Description |
|---|---|
_peaks.narrowPeak |
Peak coordinates + statistics |
_peaks.xls |
Detailed peak table |
_summits.bed |
Peak summit positions |
_model.r |
Fragment size model (run in R) |
_control_lambda.bdg |
Control lambda values |
_treat_pileup.bdg |
ChIP pileup signal |
NarrowPeak Format
| Column | Description |
|---|---|
| 1 | Chromosome |
| 2 | Start (0-based) |
| 3 | End |
| 4 | Peak name |
| 5 | Score |
| 6 | Strand |
| 7 | Signal value (fold enrichment) |
| 8 | -log10(p-value) |
| 9 | -log10(q-value) |
| 10 | Summit offset from start |
Count and Filter Peaks
# Count total peaks
wc -l macs3_narrow/chip_narrow_peaks.narrowPeak
# View top peaks by fold enrichment (column 7)
sort -k7,7rn macs3_narrow/chip_narrow_peaks.narrowPeak | head -20
# Filter by FDR q-value < 0.01 (column 9 = -log10 qvalue, so > 2)
awk '$9 > 2' macs3_narrow/chip_narrow_peaks.narrowPeak > chip_narrow_filtered.bed
11. Signal Profiles and Heatmaps
Visualize ChIP signal around features (e.g., TSS, peak centers).
Download Gene Annotation and Build TSS BED
# Download refGene table (UCSC format: tab-separated text)
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
gunzip refGene.txt.gz
# Build a BED file of transcription start sites (TSS = txStart on +, txEnd on -)
awk 'BEGIN{OFS="\t"} {
if ($4=="+") print $3, $5, $5+1, $2, "0", $4;
else print $3, $6-1, $6, $2, "0", $4
}' refGene.txt | sort -k1,1 -k2,2n | uniq > TSS.bed
head TSS.bed
wc -l TSS.bed
The resulting TSS.bed has one row per transcript, with coordinates pointing to the exact TSS base.
Compute Signal Matrix Around TSS
| Option | Description |
|---|---|
--scoreFileName |
BigWig signal file |
--regionsFileName |
BED file with genomic regions |
--referencePoint TSS |
Anchor point for the plot |
--upstream / --downstream |
Window size around reference point |
--binSize 50 |
Resolution in base pairs |
computeMatrix reference-point \
--scoreFileName chip.bw \
--regionsFileName TSS.bed \
--referencePoint TSS \
--upstream 3000 \
--downstream 3000 \
--binSize 50 \
--numberOfProcessors 4 \
-o tss_matrix.gz
plotHeatmap \
--matrixFile tss_matrix.gz \
--outFileName tss_heatmap.png \
--colorMap Blues \
--whatToShow 'heatmap and colorbar' \
--zMin 0 --zMax 5
Profile Plot
plotProfile \
--matrixFile tss_matrix.gz \
--outFileName tss_profile.png \
--plotTitle "ChIP Signal at TSS"
12. Motif Analysis
Motif analysis identifies DNA sequence motifs enriched in ChIP-Seq peaks β typically the binding motif of the immunoprecipitated protein.
HOMER Motif Analysis
# Install HOMER genome (run once)
configureHomer.pl -install hg38
findMotifsGenome.pl usage: findMotifsGenome.pl <peaks.bed> <genome> <output_dir/> [options]
| Option | Description |
|---|---|
-size 200 |
Region size around peak center for motif search |
-mask |
Mask repeat sequences |
-p 4 |
Number of threads |
# Prepare peak file (convert narrowPeak to BED6)
awk '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6}' \
chip_narrow_filtered.bed > peaks_homer.bed
# Find motifs (de novo + known)
findMotifsGenome.pl peaks_homer.bed hg38 motif_output/ -size 200 -mask -p 4
HOMER Output
| File | Description |
|---|---|
homerResults.html |
De novo discovered motifs |
knownResults.html |
Matches to known motifs database |
homerResults/motif1.motif |
PWM for top motif |
13. Peak Annotation with ChIPseeker (R)
ChIPseeker annotates peaks relative to genomic features (promoter, exon, intron, etc.).
Install ChIPseeker
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("ChIPseeker", "TxDb.Hsapiens.UCSC.hg38.knownGene", "clusterProfiler"))
Annotate Peaks
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(clusterProfiler)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Read peaks
peaks <- readPeakFile("chip_narrow_filtered.bed")
# Annotate
peakAnno <- annotatePeak(peaks, tssRegion=c(-3000, 3000),
TxDb=txdb, annoDb="org.Hs.eg.db")
# Plot annotation pie chart
plotAnnoPie(peakAnno)
# Plot distance to TSS
plotDistToTSS(peakAnno, title="Distribution of ChIP peaks\nrelative to TSS")
# Gene ontology of nearby genes
genes <- seq2gene(peaks, tssRegion=c(-1000, 1000),
flankDistance=3000, TxDb=txdb)
pathway <- enrichPathway(genes)
dotplot(pathway)
14. Differential Binding Analysis with DiffBind (R)
Compare ChIP-Seq signal between two conditions (e.g., treated vs. untreated).
Install DiffBind
BiocManager::install("DiffBind")
Sample Sheet Format
The sample sheet CSV must contain these columns:
| SampleID | Condition | Replicate | bamReads | Peaks | PeakCaller |
|---|---|---|---|---|---|
| chip_cond1_rep1 | Condition1 | 1 | chip_cond1_rep1.dedup.bam | peaks_cond1_rep1.narrowPeak | macs |
| chip_cond1_rep2 | Condition1 | 2 | chip_cond1_rep2.dedup.bam | peaks_cond1_rep2.narrowPeak | macs |
| chip_cond2_rep1 | Condition2 | 1 | chip_cond2_rep1.dedup.bam | peaks_cond2_rep1.narrowPeak | macs |
| chip_cond2_rep2 | Condition2 | 2 | chip_cond2_rep2.dedup.bam | peaks_cond2_rep2.narrowPeak | macs |
Run DiffBind
library(DiffBind)
# Load sample sheet
samples <- read.csv("samplesheet.csv")
dba_obj <- dba(sampleSheet=samples)
# Count reads in peaks
dba_obj <- dba.count(dba_obj, bUseSummarizeOverlaps=TRUE)
# Set contrast
dba_obj <- dba.contrast(dba_obj, categories=DBA_CONDITION)
# Run differential analysis (DESeq2 by default)
dba_obj <- dba.analyze(dba_obj)
# Get significant peaks
db_peaks <- dba.report(dba_obj, th=0.05)
15. IDR Analysis (Irreproducibility Discovery Rate)
IDR measures reproducibility of peak calls across replicates. ENCODE requires IDR < 0.05.
| Option | Description |
|---|---|
--samples |
Two replicate peak files |
--input-file-type narrowPeak |
Input file format |
--rank p.value |
Column to rank peaks by |
--output-file |
Output IDR results |
--plot |
Generate IDR diagnostic plot |
# Sort peaks by score (column 5)
sort -k5,5rn chip_rep1_peaks.narrowPeak > rep1_sorted.bed
sort -k5,5rn chip_rep2_peaks.narrowPeak > rep2_sorted.bed
# Run IDR
idr \
--samples rep1_sorted.bed rep2_sorted.bed \
--input-file-type narrowPeak \
--rank p.value \
--output-file idr_peaks.txt \
--plot \
--log-output-file idr.log
Common Error β
AttributeError: module 'numpy' has no attribute 'int'IDR 2.0.3 is not compatible with NumPy β₯ 1.20 (where
np.intwas removed). Fix: Downgrade NumPy in the chipseq env, or use the bioconda build (which patches this):pip install "numpy<1.20" # OR use the bioconda-packaged idr: conda install -c bioconda idr
Common Error β
idr: Image Data Repository access library(wrong package)
pip install idrpulls a completely unrelated package (for microscopy images). Always install from the Kundaje/Boley repo:pip uninstall idr -y pip install git+https://github.com/nboley/idr.git
16. Full Workflow Summary
| Step | Tool | Input | Output |
|---|---|---|---|
| QC | FastQC + MultiQC | FASTQ | HTML report |
| Trim | fastp | FASTQ | Trimmed FASTQ |
| Align | Minimap2 | FASTQ | BAM |
| Sort/Index | SAMtools | BAM | Sorted BAM |
| Filter | SAMtools | BAM | Filtered BAM (MAPQ >= 30) |
| Dedup | Picard | BAM | Deduplicated BAM |
| Signal tracks | deepTools bamCoverage | BAM | BigWig |
| IP quality | deepTools plotFingerprint | BAM | Plot |
| Replicate QC | deepTools plotCorrelation | BAM | Correlation matrix |
| Peak calling | MACS3 | BAM | BED/narrowPeak |
| Heatmaps | deepTools computeMatrix/plotHeatmap | BigWig + BED | Heatmap PNG |
| Motif analysis | HOMER | BED | Motif report HTML |
| Annotation | ChIPseeker (R) | BED | Annotated peaks |
| Differential | DiffBind (R) | BAM + peaks | Differential peaks |
| Reproducibility | IDR | Peak files | IDR peaks |
17. Troubleshooting β Quick Reference
| Problem | Section | Quick Fix |
|---|---|---|
TypeError: 'NoneType' object is not iterable from conda |
1. Setup | Old conda solver β conda update -n base conda or add --solver=libmamba |
libcrypto.so.1.0.0: cannot open |
1. Setup | ln -sf ${CONDA_PREFIX}/lib/libcrypto.so.3 ${CONDA_PREFIX}/lib/libcrypto.so.1.0.0 |
fastp: igzip: unexpected eof |
3. QC | FASTQ download was truncated β check with gunzip -t, resume with wget -c |
samtools quickcheck EOF missing |
6. Align | BAM write was interrupted β delete and re-run |
MACS3 Total number of paired peaks: 0 |
10. MACS3 | Low depth β add --nomodel --extsize 147 |
--normalizeUsing RPKM is only valid if you also use --scaleFactorsMethod None |
8. bamCompare | Add --scaleFactorsMethod None |
plotCorrelation: error: --whatToPlot required |
9. deepTools | Use --whatToPlot heatmap (not --plotType) |
IDR module 'numpy' has no attribute 'int' |
15. IDR | pip install "numpy<1.20" or use bioconda idr |
IDR Image Data Repository access library installed |
1. Setup | Wrong package β install from git+https://github.com/nboley/idr.git |
HOMER findMotifsGenome.pl: command not found |
12. HOMER | conda install -c bioconda homer and configureHomer.pl -install hg19 |
18. Cleanup
# Remove large intermediate files when done
rm -f chip.bam chip.bam.bai chip.filt.bam chip.filt.bam.bai
rm -f input.bam input.bam.bai input.filt.bam input.filt.bam.bai
rm -rf trim/
conda deactivate
# Optional: remove environment entirely
conda env remove --name chipseq
References
| Resource | Link |
|---|---|
| ENCODE ChIP-Seq standards | Landt et al. 2012, Genome Research |
| MACS3 paper | Zhang et al. 2008, Genome Biology |
| deepTools paper | Ramirez et al. 2016, Nucleic Acids Research |
| fastp paper | Chen et al. 2018, Bioinformatics |
| ChIPseeker paper | Yu et al. 2015, Bioinformatics |
| HOMER documentation | homer.ucsd.edu |
| IDR framework | Li et al. 2011, Ann. Applied Statistics |
| DiffBind paper | Ross-Innes et al. 2012, Nature |