Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI_v1.0
Overview
Gap-free genome assembly of G. hirsutum cultivar ZM113
To generate a high-quality genome assembly for G. hirsutum cultivar ZM113, we used 344.83 Gb (150.7×) of ONT ultralong reads, 60.66 Gb (26.5×) of PacBio HiFi reads, 111.07 Gb (48.5×) of MGI PE150 short reads and 252.6 Gb (110.4×) of Hi-C data. Initial de novo assembly with the ONT and HiFi reads yielded a contig N50 of 89.27 Mb, and Hi-C scaffolding anchored 31 contigs to the 26 chromosomes. Of these chromosomes, 22 were each represented by a single large contig, whereas five gaps remained on chromosomes A01 (two gaps), A11, D03 and D11. The gaps were closed iteratively using ONT reads that aligned to the flanking regions or were poorly mapped, followed by local assembly. The gapless assembly was then polished with MGI short reads, its telomeres were corrected and completed using HiFi reads, and rDNA array sizes were corrected based on HiFi read coverage. The final T2T assembly (ZM113 CRI_v1.0) contains 26 complete chromosomes.
We assessed assembly quality using a k-mer-based approach (ref. 22), which gave a quality value (QV) of 42.9 (>99.99% base-call accuracy) and a k-mer completeness of 97.99%. The assembly errors or heterozygous sites detected totaled just 8,200 bp. High mapping rates (98.63% of MGI short reads, 99.92% of HiFi reads and 98.82% of ONT reads), with coverage exceeding 99.77%, further indicated completeness. Additional optical genome mapping data verified the gap closures, with minor discrepancies at Gap1 (619,862 bp on A01; ~50 kbp breakpoint in the optical maps) and Gap4 (37,278 bp on D03; ~700 kbp length difference). The completeness and integrity of genic regions were supported by a high BUSCO score of 99.6%.
Compared with prior G. hirsutum assemblies, ZM113 offers substantial improvements: contig N50 increased from 75.3 Mb to 89.27 Mb, and the number of gaps fell from between 13 and 2,564 to zero, resulting in an anchored genome size of 2,299.07 Mb.
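The sequencing depths, the QV-to-accuracy figure and the contig N50 quoted in the Overview all rest on simple arithmetic. The Python sketch below reproduces that arithmetic under stated assumptions: the genome size of about 2.288 Gb used as the coverage denominator is back-calculated from the reported depths and is not a value given on this page; QV is assumed to be Phred-scaled (QV = -10·log10(error rate)); and the contig lengths passed to `n50` are a toy example, not the ZM113 contigs.

```python
"""Back-of-envelope checks for the assembly statistics in the Overview.

Assumptions (not stated on this page): a genome size of ~2.288 Gb as the
coverage denominator, back-calculated from the reported depths, and the
standard Phred definition of QV.
"""
from typing import Iterable, List, Tuple

# Sequencing inputs as reported in the Overview (total Gb per data type).
READ_SETS: List[Tuple[str, float]] = [
    ("ONT ultralong", 344.83),
    ("PacBio HiFi", 60.66),
    ("MGI PE150", 111.07),
    ("Hi-C", 252.6),
]
ASSUMED_GENOME_GB = 2.288  # assumption, not a reported value


def depth_of_coverage(total_gb: float, genome_size_gb: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return total_gb / genome_size_gb


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy for a Phred-scaled QV: 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10)


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total of descending-sorted
    lengths first reaches half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


if __name__ == "__main__":
    for name, gb in READ_SETS:
        print(f"{name}: {depth_of_coverage(gb, ASSUMED_GENOME_GB):.1f}x")
    print(f"QV 42.9 -> {qv_to_accuracy(42.9):.5%} per-base accuracy")
    # Hypothetical toy contig lengths, not the ZM113 contigs.
    print("toy N50:", n50([100, 80, 50, 30, 20, 10]))
```

With the assumed 2.288 Gb denominator, all four depths round to the values reported above, and QV 42.9 corresponds to an error rate of roughly 5 in 100,000 bases, consistent with ">99.99%".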
Publication: Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet (2025). https://doi.org/10.1038/s41588-025-02130-4
Assembly
The chromosomes (pseudomolecules) of the G. hirsutum ZM113 genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.
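The gap-free, telomere-to-telomere status of the chromosome files can be spot-checked locally. The Python sketch below counts N-runs (gaps) in each chromosome and tests whether both ends are dominated by telomeric repeats. It assumes gaps are written as runs of N and that the telomeres carry the plant-type TTTAGGG repeat (CCCTAAA on the opposite strand); the 10 kb end window, the 50% repeat-density threshold and the script invocation are hypothetical choices, not values from this page.

```python
"""Sketch: check that each chromosome is gap-free and has telomeric repeats at
both ends.

Assumptions: gaps are encoded as runs of N; the telomere motif is the
plant-type TTTAGGG repeat (CCCTAAA on the opposite strand); the 10 kb end
window and the 50% repeat-density threshold are hypothetical choices.
"""
import re
import sys
from typing import Dict, Iterator, List, Optional, Tuple

GAP_RE = re.compile(r"[Nn]+")
TELOMERE_MOTIFS = ("TTTAGGG", "CCCTAAA")  # assumed plant-type repeat, both strands


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA file."""
    name: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_gaps(seq: str) -> int:
    """Number of N-runs; each run counts as one gap."""
    return len(GAP_RE.findall(seq))


def has_telomere(window: str, min_fraction: float = 0.5) -> bool:
    """True if non-overlapping copies of one telomere motif cover at least
    min_fraction of the window."""
    upper = window.upper()
    if not upper:
        return False
    covered = max(upper.count(m) * len(m) for m in TELOMERE_MOTIFS)
    return covered / len(upper) >= min_fraction


def check_chromosome(seq: str, window: int = 10_000) -> Dict[str, object]:
    """Summarize gap count and telomere presence at each end of one sequence."""
    return {
        "length": len(seq),
        "gaps": count_gaps(seq),
        "left_telomere": has_telomere(seq[:window]),
        "right_telomere": has_telomere(seq[-window:]),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_t2t.py chromosomes.fasta
    for chrom, sequence in read_fasta(sys.argv[1]):
        print(chrom, check_chromosome(sequence))
```

Checking both motifs in both end windows keeps the test independent of chromosome orientation; a dedicated telomere finder would be more robust than this density heuristic.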
Functional Analysis
Functional annotation files for the Gossypium hirsutum ZM113 Genome v1.0 are available for download below. The Gossypium hirsutum ZM113 Genome v1.0 proteins were analyzed with InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
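InterProScan and KAAS results are typically joined per protein before downstream analysis. The Python sketch below collects GO terms from InterProScan output and KO identifiers from a KAAS list. It assumes the standard InterProScan 5 TSV layout (GO terms in column 14, separated by '|', '-' when absent, sometimes with a source tag such as "(InterPro)") and a KAAS query-to-KO list with one "query<TAB>KO" line per protein and no KO when none was assigned; the files offered on this page may be formatted differently.

```python
"""Sketch: collect GO terms per protein from InterProScan TSV output and KO
identifiers from a KAAS query-to-KO list.

Assumptions: standard InterProScan 5 TSV layout (GO terms in column 14,
'|'-separated, '-' when absent); KAAS list with one 'query<TAB>KO' line per
protein and an empty KO field when nothing was assigned.
"""
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

GO_RE = re.compile(r"GO:\d{7}")  # also matches IDs followed by '(InterPro)' tags


def go_terms_by_protein(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Union of GO IDs over all InterProScan matches for each protein."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14 or cols[13] in ("", "-"):
            continue  # this match carries no GO annotation
        terms[cols[0]].update(GO_RE.findall(cols[13]))
    return dict(terms)


def ko_by_protein(lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to its KEGG Orthology (KO) identifier, if any."""
    kos: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[1].strip():
            kos[cols[0]] = cols[1].strip()
    return kos
```

Taking the union over all matches reflects that a single protein usually has several InterProScan matches (one per member database hit), each of which may contribute GO terms.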
Genes
The predicted gene models, their alignments, and proteins for the G. hirsutum 'ZM113' genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.
Homology
Homology of the Gossypium hirsutum ZM113 genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. Hits were retained only if their expectation value (E-value) was less than 1e-6 for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/Swiss-Prot (Release 2025-02), and UniProtKB/TrEMBL (Release 2025-02) databases. The best-hit reports are available for download in Excel format.
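A minimal sketch of best-hit selection from blastp tabular output (`-outfmt 6`) with the E-value cutoff described above. It assumes the default 12-column layout and treats the "best" hit as the one with the highest bit score, breaking ties by the lower E-value; this page does not say how the reported best hits were ranked, so that rule is an assumption.

```python
"""Sketch: best blastp hit per query from -outfmt 6 output, E-value < cutoff.

Assumptions: default 12-column tabular layout; 'best' = highest bit score,
ties broken by lower E-value (ranking rule not stated on this page).
"""
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-6) -> Dict[str, Hit]:
    """Return one Hit per query whose E-value is strictly below max_evalue."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a standard 12-column row
        hit = Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))
        if hit.evalue >= max_evalue:
            continue  # fails the E-value cutoff
        current = best.get(hit.query)
        if current is None or (hit.bitscore, -hit.evalue) > (current.bitscore, -current.evalue):
            best[hit.query] = hit
    return best
```

For example, passing the lines of a hypothetical `zm113_vs_swissprot.tsv` file would return at most one `Hit` per ZM113 protein; proteins whose hits all fail the cutoff are simply absent.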
Markers
Marker alignments were performed by the CottonGen team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum ZM113 v1.0 assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than two gaps. For dbSNPs and indels, gap size was restricted to 2 bp, with fewer than two gaps. The alignment files are available in GFF3 format. Markers available in CottonGen are linked to JBrowse.
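The marker thresholds above can be expressed as a filter over BLAT's PSL output. This Python sketch assumes standard 21-column PSL rows and uses one simple set of definitions: identity = (matches + repMatches) / (matches + misMatches + repMatches), coverage = aligned bases / qSize, and gaps counted as query plus target inserts (qNumInsert + tNumInsert). These definitions and the hypothetical `GAP_RULES` table are assumptions; the CottonGen pipeline may compute them differently.

```python
"""Sketch: apply identity, coverage and gap thresholds to BLAT PSL rows.

Assumptions: standard 21-column PSL; the identity, coverage and gap
definitions below are illustrative, not CottonGen's exact formulas.
"""
from typing import Iterable, Iterator, NamedTuple

# Hypothetical rule table: marker class -> (max gap bases, max number of gaps).
# "Fewer than two gaps" means at most one gap.
GAP_RULES = {
    "ssr": (1000, 1),
    "rflp": (1000, 1),
    "dbsnp": (2, 1),
    "indel": (2, 1),
}


class PslHit(NamedTuple):
    q_name: str
    t_name: str
    t_start: int
    t_end: int
    identity: float
    coverage: float
    gaps: int
    gap_bases: int


def parse_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    """Yield PslHit records, skipping psLayout header and blank lines."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue
        matches, mismatches, rep = int(f[0]), int(f[1]), int(f[2])
        aligned = matches + mismatches + rep
        q_size = int(f[10])
        yield PslHit(
            q_name=f[9], t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
            identity=(matches + rep) / aligned if aligned else 0.0,
            coverage=aligned / q_size if q_size else 0.0,
            gaps=int(f[4]) + int(f[6]),        # qNumInsert + tNumInsert
            gap_bases=int(f[5]) + int(f[7]),   # qBaseInsert + tBaseInsert
        )


def passes(hit: PslHit, marker_class: str,
           min_identity: float = 0.90, min_coverage: float = 0.97) -> bool:
    """True if the hit meets the identity, coverage and gap rules for its class."""
    max_gap_bases, max_gaps = GAP_RULES[marker_class]  # KeyError for unknown classes
    return (hit.identity >= min_identity and hit.coverage >= min_coverage
            and hit.gaps <= max_gaps and hit.gap_bases <= max_gap_bases)
```

The same alignment can pass as an SSR yet fail as a dbSNP: a single 500 bp target insert is within the 1,000 bp SSR limit but far above the 2 bp limit for SNP and indel markers.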
Transcript Alignments
Transcript alignments were performed by the CottonGen team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum ZM113 CRI genome v1.0 assembly. Alignments covering 97% of the transcript length with 90% identity were retained. The alignment files are available in GFF3 format.
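Both the marker and transcript alignment files are distributed as GFF3. A minimal, dependency-free Python reader like the one below can load them and, for example, count alignments per chromosome. The feature type used in the example (`match`) and the sample records are hypothetical; check the third column of the actual files for the type names used there.

```python
"""Sketch: minimal GFF3 reader plus a per-chromosome feature count.

Assumptions: tab-separated 9-column GFF3; feature type names in the real
CottonGen files are not known here, so 'match' in the test is hypothetical.
"""
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    ftype: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features; stops at an embedded ##FASTA section."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs: Dict[str, str] = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[unquote(key.strip())] = unquote(value)  # GFF3 percent-encoding
        yield Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[6], attrs)


def count_by_seqid(features: Iterable[Feature], feature_type: str) -> Dict[str, int]:
    """Number of features of the given type on each sequence (chromosome)."""
    return dict(Counter(f.seqid for f in features if f.ftype == feature_type))
```

A dedicated library would also validate parent/child relationships; this reader only extracts the columns and decodes the attribute field.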