DNA double-strand breaks (DSBs) are among the most cytotoxic lesions a cell can face. They're also increasingly deliberate: CRISPR-Cas9 and meganuclease-based tools exploit controlled DSB induction to rewrite the genome, and even base editors and prime editors, which are designed to avoid DSBs, can still produce them as by-products. The challenge is that off-target breaks are rarely zero, and where those breaks fall, and how often, matters enormously for clinical safety.
The Amplification Problem
Most sequencing-based DSB detection methods insert a known adapter at the break site, then PCR-amplify. That amplification step is fast and cheap, but it distorts the signal: fragments do not all amplify with the same efficiency, so read counts drift away from the number of breaks that were originally tagged. Low-frequency breaks are under-represented; high-frequency breaks dominate read counts. The result is a biased view, useful for finding strong off-targets but blind to rare events that might still be clinically meaningful.
INDUCE-seq removes PCR entirely. After adapter ligation at DSB ends, libraries go directly to sequencing without amplification. Because each adapter-tagged break end is read at most once, read counts track molecular frequencies rather than amplification kinetics.
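To make the distortion concrete, here is a minimal arithmetic sketch. It assumes the textbook model in which each PCR cycle multiplies copy number by (1 + efficiency); the molecule counts, efficiencies, and cycle number are hypothetical values chosen for illustration, not measurements from any real library.

```python
def amplified_copies(molecules: int, efficiency: float, cycles: int) -> float:
    """Expected copy number after PCR, assuming every cycle multiplies
    the pool by (1 + efficiency)."""
    return molecules * (1.0 + efficiency) ** cycles


# Hypothetical inputs, for illustration only.
true_on, true_off = 1000, 10      # adapter-tagged molecules at each site
eff_on, eff_off = 0.95, 0.85      # assumed per-cycle amplification efficiencies
cycles = 15

# Ratio of tagged molecules before any amplification.
true_ratio = true_on / true_off

# Ratio of copies after amplification with unequal efficiencies.
pcr_ratio = (
    amplified_copies(true_on, eff_on, cycles)
    / amplified_copies(true_off, eff_off, cycles)
)

print(f"on/off ratio before PCR: {true_ratio:.0f}")
print(f"on/off ratio after {cycles} cycles: {pcr_ratio:.0f}")
```

Under these assumptions, an efficiency gap of ten percentage points more than doubles the apparent on/off ratio after 15 cycles. Skipping amplification leaves the ratio at its starting value.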
What That Changes
Without amplification bias you get:
- Quantitative DSB frequencies — the ratio of on-target to off-target reads directly reflects the ratio of breaks in the cell
- Sensitivity for rare events — low-frequency off-targets that drop below the noise floor of PCR-based methods are detectable
- Full breakome profiling — endogenous DSB hotspots from replication stress, transcription, and topoisomerase activity appear alongside nuclease-induced breaks, giving biological context
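With amplification out of the picture, comparing sites comes down to simple ratios of read counts. The sketch below expresses each site as a fraction of the on-target count; the site names and counts are hypothetical, and `relative_frequencies` is an illustrative helper rather than part of any published pipeline.

```python
def relative_frequencies(read_counts: dict[str, int], on_target: str) -> dict[str, float]:
    """Express each site's read count as a fraction of the on-target count."""
    on_reads = read_counts[on_target]
    if on_reads <= 0:
        raise ValueError("on-target site has no reads")
    return {site: n / on_reads for site, n in read_counts.items()}


# Hypothetical per-site read counts from a PCR-free library.
counts = {"chr2:on_target": 4000, "chr7:ot1": 40, "chr11:ot2": 2}

rel = relative_frequencies(counts, "chr2:on_target")

# Print sites from most to least frequent relative to the on-target site.
for site, frac in sorted(rel.items(), key=lambda kv: -kv[1]):
    print(f"{site}\t{frac:.4%}")
```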
Cell-Based vs. Cell-Free Assays
A recurring debate in the off-target field is whether cell-based or cell-free assays better predict clinical risk. Cell-free methods (CIRCLE-seq, Digenome-seq) use purified genomic DNA and excess nuclease — they maximise sensitivity but report biochemical potential rather than cellular reality.
INDUCE-seq is cell-based. The chromatin environment, DNA repair machinery, and cell-type-specific accessibility all influence whether a biochemically plausible off-target becomes an actual DSB in a living cell. For gene therapy applications, where the question that matters is what happens in this patient's cells, cell-based detection gives the more direct answer.
Getting Started with INDUCE-seq Data
If you're analysing INDUCE-seq outputs, the key files to orient around are:
- A BED file of break-site coordinates, scored by normalised read depth
- A BAM of the adapter-tagged reads aligned to your reference genome (GRCh38 preferred)
- A breakome background file capturing endogenous DSB signal from unedited controls
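As a starting point, the break-site BED and the background file can be compared directly. The sketch below assumes both are tab-separated BED files with the score in column 5, and that sites can be matched on exact coordinates; a real analysis would more likely match within a window. The `min_fold` and `pseudocount` parameters, the helper names, and the inline records are all hypothetical.

```python
import io
from typing import NamedTuple, TextIO


class BreakSite(NamedTuple):
    chrom: str
    start: int    # 0-based, as in BED
    end: int      # end-exclusive, as in BED
    name: str
    score: float  # assumed: normalised read depth in BED column 5


def read_bed(handle: TextIO) -> list[BreakSite]:
    """Parse a 5-column BED, skipping blank lines and header lines."""
    sites = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        sites.append(BreakSite(f[0], int(f[1]), int(f[2]), f[3], float(f[4])))
    return sites


def above_background(treated, control, min_fold=5.0, pseudocount=1.0):
    """Keep treated sites whose score exceeds the control score at the
    same coordinates by at least min_fold; return (site, fold) pairs
    sorted by decreasing fold."""
    background = {(s.chrom, s.start, s.end): s.score for s in control}
    kept = []
    for s in treated:
        bg = background.get((s.chrom, s.start, s.end), 0.0)
        fold = (s.score + pseudocount) / (bg + pseudocount)
        if fold >= min_fold:
            kept.append((s, fold))
    return sorted(kept, key=lambda pair: -pair[1])


# Hypothetical records standing in for real files.
TREATED_BED = (
    "chr2\t1000\t1001\ton_target\t500\n"
    "chr7\t2000\t2001\tot1\t20\n"
    "chr12\t3000\t3001\tendogenous\t30\n"
)
CONTROL_BED = "chr12\t3000\t3001\tendogenous\t28\n"

treated = read_bed(io.StringIO(TREATED_BED))
control = read_bed(io.StringIO(CONTROL_BED))
kept = above_background(treated, control)

for site, fold in kept:
    print(f"{site.chrom}:{site.start}\t{site.name}\tfold={fold:.1f}")
```

The site that appears at similar depth in the unedited control is dropped as endogenous signal, while the two sites absent from the control are kept. With real files, `read_bed(open(path))` would replace the `io.StringIO` wrappers.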
The on-target site should dominate the read-depth ranking. Off-target sites are ranked by distance from the on-target signature, sequence similarity to the guide, and overlap with functional genomic elements (promoters, exons, known oncogenes).
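Two of those ranking criteria, sequence similarity and functional overlap, are easy to sketch. The example below counts mismatches against the guide and checks interval overlap with annotated features. The sort order (functional overlap first, then fewer mismatches, then higher depth) is my assumption, not a standard, and the guide sequence, candidates, and features are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    chrom: str
    start: int
    end: int
    sequence: str  # genomic sequence at the site, same length as the guide
    depth: float   # normalised read depth


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int
    end: int
    label: str


def mismatches(guide: str, site: str) -> int:
    """Hamming distance between guide and site sequence."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def feature_hits(c: Candidate, features: list[Feature]) -> list[str]:
    """Labels of features overlapping the candidate (half-open intervals)."""
    return [
        f.label for f in features
        if f.chrom == c.chrom and c.start < f.end and f.start < c.end
    ]


def rank(cands: list[Candidate], guide: str, features: list[Feature]) -> list[Candidate]:
    """Assumed priority: functional overlap, then fewer mismatches, then depth."""
    def key(c: Candidate):
        return (not feature_hits(c, features), mismatches(guide, c.sequence), -c.depth)
    return sorted(cands, key=key)


# Hypothetical guide, candidate sites, and annotations.
GUIDE = "GACGTTACCGGATTCAGGCT"
candidates = [
    Candidate("ot_a", "chr7", 5000, 5020, "GACGTTACCGGATTCAGGAA", 12.0),
    Candidate("ot_b", "chr3", 100, 120, "GACGTTACCGGATTCAGGCA", 30.0),
    Candidate("ot_c", "chr11", 900, 920, "TACGTAACCGGATTCAGGAT", 4.0),
]
features = [
    Feature("chr7", 4900, 5100, "exon"),
    Feature("chr11", 800, 910, "promoter"),
]

ranked = rank(candidates, GUIDE, features)
for c in ranked:
    print(c.name, mismatches(GUIDE, c.sequence), feature_hits(c, features))
```

Putting functional overlap first means a three-mismatch site in a promoter outranks a one-mismatch intergenic site. Whether that is the right trade-off depends on the application, so treat the sort key as something to tune.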
In future posts I'll go into the pipeline architecture in more detail and share Nextflow patterns for processing INDUCE-seq libraries at scale.