Understanding intratumor heterogeneity is critical for studying tumorigenesis and designing personalized treatments. To decompose the mixed cell population in a tumor, subclones are inferred computationally based on variant allele frequency (VAF) from bulk sequencing data. In this study, we show that the sequencing depth, mean VAF, and variance of VAF of a subclone are confounded. Because current methods do not account for this effect, they require deep-sequencing data (>300× depth) to reliably infer subclones. Here, we present a novel algorithm that incorporates depth-variance and mean-variance dependencies in a clustering error model and successfully identifies subclones in tumors sequenced at depths as low as 30×. We implemented the algorithm as a model-based adaptive grouping of subclones (MAGOS) method. Analyses of computer-simulated data and empirical sequencing data showed that MAGOS outperformed existing methods in minimum sequencing depth, decomposition accuracy, and computational efficiency. The most prominent improvements were observed in analyzing tumors sequenced at depths between 30× and 200×, whereas the performance of MAGOS and existing methods was comparable on deeply sequenced tumors. MAGOS supports analysis of single-nucleotide variants and copy number variants from a single sample or multiple samples of a tumor. We applied MAGOS to whole-exome data of late-stage liver cancers and discovered that a high subclone count in a tumor was a significant risk factor for poor prognosis. Lastly, our analysis suggested that sequencing multiple samples of the same tumor at standard depth is more cost-effective and robust for subclone characterization than deep sequencing of a single sample. MAGOS is available on GitHub (https://github.com/liliulab/magos).
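The confounding of depth, mean VAF, and VAF variance can be illustrated with a small simulation. The sketch below is not the MAGOS error model. It assumes the simplest possible setting, in which the variant read count at each site is a binomial draw with a fixed depth and a single true VAF (no copy number changes, no sequencing error). Under that assumption the variance of the observed VAF is p(1 − p)/n, so it depends on both the mean VAF p and the depth n. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_vaf(depth, true_vaf, n_variants=2000, seed=0):
    """Simulate observed VAFs for variants that share one true VAF.

    Assumption: each read independently carries the variant allele with
    probability `true_vaf`, so the variant read count is binomial.
    """
    rng = random.Random(seed)
    vafs = []
    for _ in range(n_variants):
        alt_reads = sum(rng.random() < true_vaf for _ in range(depth))
        vafs.append(alt_reads / depth)
    return vafs


def binomial_vaf_variance(depth, true_vaf):
    """Theoretical variance of the observed VAF under binomial sampling."""
    return true_vaf * (1.0 - true_vaf) / depth


if __name__ == "__main__":
    # Compare empirical and theoretical variance at several depths and VAFs.
    for depth in (30, 100, 300):
        for true_vaf in (0.1, 0.3, 0.5):
            observed = simulate_vaf(depth, true_vaf)
            empirical = statistics.pvariance(observed)
            expected = binomial_vaf_variance(depth, true_vaf)
            print(f"depth={depth:>3} vaf={true_vaf:.1f} "
                  f"empirical={empirical:.5f} expected={expected:.5f}")
```

In this simplified setting, a subclone sequenced at 30× shows a much wider spread of observed VAFs than the same subclone at 300×, and the spread also changes with the mean VAF. This is the kind of dependency that a clustering error model with a single fixed variance would not capture.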
Keywords
- cancer evolution, genomics, statistical modeling
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology