Haplotype-resolved genome assemblies of BJ and IMR-90 human fibroblast cell lines reveal extensive structural variation and enable reanalysis of historical sequencing data
T Rhyker Ranallo-Benavidez, Yue Hao, Emilia Volpe, Maryam Jehangir, Noelle Fukushima, Ogechukwu Mbegbu, Rebecca Reiman, Jessica Molnar, Danyael Murphy, Dorothy Marie Paredes, Shukmei Wong, Kara Karaniuk, Stephanie Buchholtz, Jonathan Keats, Mitchell J Machiela, Mikhail Kolmogorov, Benedict Paten, Simona Giunta, Floris P Barthel
Nucleic Acids Res. 2026 Apr 29;54(8)

Abstract

We present chromosome-level, phased diploid genome assemblies of two widely used human fibroblast cell lines: BJ (46,XY) and IMR-90 (46,XX). Using Oxford Nanopore, PacBio HiFi, and Hi-C sequencing data, we generated assemblies spanning 5.9 and 6.0 Gbp with diploid quality values exceeding QV 60. To validate structural integrity, we developed KaryoScope, an alignment-free tool for generating computational karyograms from k-mer feature databases. We identify >50 000 structural variants relative to T2T-CHM13v2.0, the majority of which are heterozygous and cell-line-specific. Combining reference-based and de novo gene annotation, we uncover a previously unreported 1 Mbp homozygous duplication at the 16p11.2 locus in BJ, demonstrating that even karyotypically normal cell lines can harbor clinically relevant submicroscopic rearrangements. We show that mapping publicly available short-read, RNA-seq, and ChIP-seq data to sample-matched diploid assemblies substantially improves read alignment and enables haplotype phasing of 23%–28% of short reads. The BJ and IMR-90 assemblies and associated variant calls are publicly available as a resource for the research community.
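To put the reported quality values in concrete terms, the sketch below converts between a quality value and a per-base error rate. It assumes QV here is the usual Phred-scaled consensus quality, QV = -10 log10(error rate); the abstract does not state how the values were computed, and the function names `phred_qv` and `expected_errors` are hypothetical, introduced only for illustration.

```python
import math


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate (assumed definition)."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def expected_errors(qv: float, assembly_bp: float) -> float:
    """Expected number of erroneous bases in an assembly of the given size."""
    return assembly_bp * 10 ** (-qv / 10)


if __name__ == "__main__":
    # Assembly sizes taken from the abstract: 5.9 Gbp (BJ) and 6.0 Gbp (IMR-90).
    for name, size_bp in [("BJ", 5.9e9), ("IMR-90", 6.0e9)]:
        print(f"{name}: about {expected_errors(60, size_bp):.0f} expected errors at QV 60")
```

Under this assumption, QV 60 corresponds to one error per million bases, so values exceeding QV 60 imply fewer than roughly 5,900 to 6,000 erroneous bases across each diploid assembly.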