Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI_v1.0

Overview
Analysis Name: Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI_v1.0
Method: Hifiasm v0.19.3-r572; Hi-C-Pro v2.8.1; LACHESIS v.; NextDenovo v2.3.1; Nextpolish v1.3.0
Source: (v1.0)
Date performed: 2025-03-26

Gap-free genome assembly of G. hirsutum cultivar ZM113

To generate a high-quality genome assembly for G. hirsutum cultivar ZM113, we used 344.83 Gb (150.7×) of ONT ultralong reads, 60.66 Gb (26.5×) of PacBio HiFi reads, 111.07 Gb (48.5×) of MGI PE150 short reads and 252.6 Gb (110.4×) of Hi-C data. Initial de novo assembly with ONT and HiFi reads yielded a contig N50 of 89.27 Mb, with 31 contigs anchored to the 26 chromosomes via Hi-C scaffolding. Of these chromosomes, 22 were each represented by a single large contig, whereas five gaps remained on the other four: A01 (two gaps), A11, D03 and D11 (one gap each). Gaps were closed iteratively using ONT reads that either aligned to the flanking regions or were poorly mapped, followed by local assembly. The resulting gapless assembly was then polished with MGI short reads, its telomeres were corrected and completed using HiFi reads, and rDNA array sizes were corrected based on HiFi read coverage. The final T2T assembly (ZM113 CRI_v1.0) contained 26 complete chromosomes.
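
The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes the usual relation depth = yield / genome size and that each unclosed gap splits a chromosome into one extra contig; the function names are illustrative and not part of the published pipeline. The implied genome size for every data type comes out close to the 2,299.07 Mb anchored size reported further down.

```python
# Sequencing yield (Gb) and reported depth (x) for each data type, as quoted above.
DATASETS = {
    "ONT ultralong": (344.83, 150.7),
    "PacBio HiFi": (60.66, 26.5),
    "MGI PE150": (111.07, 48.5),
    "Hi-C": (252.6, 110.4),
}


def implied_genome_size_gb(yield_gb: float, depth: float) -> float:
    """Genome size implied by depth = yield / genome size."""
    return yield_gb / depth


def contigs_after_scaffolding(chromosomes: int, gaps: int) -> int:
    """Each remaining gap adds one contig beyond one-contig-per-chromosome."""
    return chromosomes + gaps


if __name__ == "__main__":
    for name, (yield_gb, depth) in DATASETS.items():
        size = implied_genome_size_gb(yield_gb, depth)
        print(f"{name}: implied genome size {size:.3f} Gb")
    # Gaps per chromosome from the text: A01 = 2, A11 = 1, D03 = 1, D11 = 1.
    print("contigs:", contigs_after_scaffolding(26, 2 + 1 + 1 + 1))
```

With 26 chromosomes and five gaps the contig count is 31, matching the number of contigs anchored by Hi-C scaffolding.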

We assessed assembly quality using a k-mer-based approach, which gave a quality value (QV) of 42.9 (>99.99% base accuracy) and a k-mer completeness of 97.99%. Assembly errors or heterozygous sites detected totaled just 8,200 bp in length. High mapping rates (98.63% of MGI short reads, 99.92% of HiFi reads and 98.82% of ONT reads), with coverage exceeding 99.77%, further indicated completeness. Additional optical genome mapping data verified the gap closures, with minor discrepancies noted in Gap1 (619,862 bp on A01, ~50 kbp breakpoint in optical maps) and Gap4 (37,278 bp on D03, ~700 kbp length difference).
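
The link between the QV and the stated base accuracy follows from the Phred scale, QV = -10 · log10(error rate). The sketch below assumes the reported QV uses this standard definition; the text says only that a k-mer-based approach was used, so the helper names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def qv_to_accuracy_percent(qv: float) -> float:
    """Per-base accuracy, in percent, for a Phred-scaled quality value."""
    return 100 * (1 - qv_to_error_rate(qv))


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: Phred-scaled QV for a per-base error probability."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    qv = 42.9  # value reported for ZM113 CRI_v1.0
    print(f"error rate: {qv_to_error_rate(qv):.2e}")
    print(f"accuracy:   {qv_to_accuracy_percent(qv):.4f}%")
```

A QV of 40 corresponds to exactly 99.99% accuracy, so any QV above 40, including 42.9, is consistent with the ">99.99%" statement.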

The completeness and integrity of genic regions were supported by a high BUSCO score of 99.6%. Compared with prior G. hirsutum assemblies, ZM113 offers substantial improvements: contig N50 increased from 75.3 Mb to 89.27 Mb, and the number of gaps fell from between 13 and 2,564 to 0, resulting in an anchored genome size of 2,299.07 Mb.
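
For readers unfamiliar with the metric, contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that definition using toy lengths, not the ZM113 contig lengths, which are not listed on this page.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one contig length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    toy_contigs = [10, 8, 5, 4, 3]  # illustrative values only
    print(n50(toy_contigs))
```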

Publication: Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet (2025). https://doi.org/10.1038/s41588-025-02130-4

Assembly

The chromosomes (pseudomolecules) of the G. hirsutum 'ZM113' genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.

Chromosomes (FASTA format) G.hirsutum_CRI-ZM113_T2T_assembly_v1.fa.gz
Mitochondrion (FASTA format) G.hirsutum_CRI-ZM113_T2T_mitochondrion_v1.fa.gz
Chloroplast (FASTA format) G.hirsutum_CRI-ZM113_T2T_chloroplast_v1.fa.gz
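
After downloading the chromosome file, a quick sanity check is to list each sequence with its length and its number of N runs, which should be zero for a gap-free assembly. The sketch below uses only the Python standard library; it assumes gaps, if any, would be encoded as runs of N, and that the file sits in the working directory under the name listed above. The function names are illustrative.

```python
import gzip
import re
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record name, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    name = None
    chunks = []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first whitespace-delimited token of the header.
                name = (line[1:].split() or [""])[0]
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_fasta(path: str) -> Dict[str, Tuple[int, int]]:
    """Map each record name to (sequence length, number of N runs)."""
    return {
        name: (len(seq), len(re.findall(r"[Nn]+", seq)))
        for name, seq in read_fasta(path)
    }


if __name__ == "__main__":
    path = "G.hirsutum_CRI-ZM113_T2T_assembly_v1.fa.gz"
    for name, (length, n_runs) in summarize_fasta(path).items():
        print(f"{name}\t{length}\t{n_runs}")
```

Each chromosome is held in memory as one string while it is summarized, so expect a memory footprint on the order of the largest chromosome.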

Genes

The predicted gene models, their alignments, and proteins for the G. hirsutum 'ZM113' genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.hirsutum_CRI-ZM113.gff3.gz
CDS-coding sequences (FASTA format) G.hirsutum_CRI-ZM113_CDS.fa.gz
Protein sequences (FASTA format) G.hirsutum_CRI-ZM113_protein.fa.gz
Predicted mitochondrion models (GFF3 format) G.hirsutum_CRI-ZM113_mitochondrion.gff3.gz
Predicted chloroplast models (GFF3 format) G.hirsutum_CRI-ZM113_chloroplast.gff3.gz
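
The annotation files can be inspected without extra dependencies. The sketch below counts features of one type per sequence in a GFF3 file; it assumes the standard nine tab-separated GFF3 columns and that gene records use the feature type "gene", which has not been verified against this particular file. The function name is illustrative.

```python
import gzip
from collections import Counter


def count_features(path: str, feature_type: str = "gene") -> Counter:
    """Count features of the given type per sequence ID in a GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an optional embedded FASTA section follows; stop here
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed GFF3 feature line
            if fields[2] == feature_type:
                counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    per_chromosome = count_features("G.hirsutum_CRI-ZM113.gff3.gz")
    for seqid, n_genes in sorted(per_chromosome.items()):
        print(f"{seqid}\t{n_genes}")
    print("total\t", sum(per_chromosome.values()))
```

Passing a different feature_type, for example "mRNA" or "exon", counts those records instead, provided the file uses those type names.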