Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI_v1.0

Overview
Analysis Name Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI_v1.0
Method Hifiasm v0.19.3-r572; HiC-Pro v2.8.1; LACHESIS v.; NextDenovo v2.3.1; NextPolish v1.3.0
Source (v1.0)
Date performed 2025-03-26

Gap-free genome assembly of G. hirsutum cultivar ZM113

To generate a high-quality genome assembly for G. hirsutum cultivar ZM113, we used 344.83 Gb (150.7×) of ONT ultralong reads, 60.66 Gb (26.5×) of PacBio HiFi reads, 111.07 Gb (48.5×) of MGI PE150 short reads and 252.6 Gb (110.4×) of Hi-C data. Initial de novo assembly with ONT and HiFi reads yielded a contig N50 of 89.27 Mb, with 31 contigs anchored to the 26 chromosomes via Hi-C scaffolding. Of those chromosomes, 22 were each represented by a single large contig, whereas five gaps remained on the other four chromosomes: A01 (two gaps), A11, D03 and D11. The gaps were closed iteratively using ONT reads aligned to the flanking regions, or poorly mapped reads, followed by local assembly. The resulting gapless assembly was then polished with MGI short reads, its telomeres were corrected and completed using HiFi reads, and rDNA array sizes were corrected based on HiFi read coverage. The final T2T assembly (ZM113 CRI_v1.0) contained 26 complete chromosomes.
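
The sequencing depths quoted above follow from dividing the total bases sequenced by the genome size. As a rough consistency check, the sketch below back-calculates the genome size implied by each yield and depth pair. The names `datasets` and `implied_genome_size_gb` are illustrative, and the exact genome-size estimate used to derive the reported depths is not stated on this page.

```python
# Sequencing yield (Gb) and reported depth (x) for each data type, as quoted above.
datasets = {
    "ONT ultralong": (344.83, 150.7),
    "PacBio HiFi": (60.66, 26.5),
    "MGI PE150": (111.07, 48.5),
    "Hi-C": (252.6, 110.4),
}


def implied_genome_size_gb(yield_gb: float, depth: float) -> float:
    """Depth = yield / genome size, so genome size = yield / depth."""
    return yield_gb / depth


implied = {name: implied_genome_size_gb(y, d) for name, (y, d) in datasets.items()}

for name, size in implied.items():
    print(f"{name}: ~{size:.2f} Gb")
```

Each pair implies a genome size of roughly 2.29 Gb, close to the anchored genome size of 2,299.07 Mb reported for this assembly.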

We assessed assembly quality using a k-mer-based approach (ref. 22 of the publication), which gave a quality value (QV) of 42.9 (>99.99% base call accuracy) and a k-mer completeness of 97.99%. Detected assembly errors or heterozygous sites totaled only 8,200 bp. High mapping rates (98.63% of MGI short reads, 99.92% of HiFi reads and 98.82% of ONT reads), with coverage exceeding 99.77%, further indicated completeness. Additional optical genome mapping data verified the gap closures, with minor discrepancies noted in Gap1 (619,862 bp on A01; ~50 kbp breakpoint in the optical maps) and Gap4 (37,278 bp on D03; ~700 kbp length difference).
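
The link between the QV and the quoted base accuracy is the standard Phred scale, QV = -10 · log10(error rate). The sketch below shows the conversion; the function names are illustrative and not part of any tool named on this page.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy: 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10.0)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: per-base error rate to Phred-scaled quality value."""
    return -10.0 * math.log10(error_rate)


accuracy = qv_to_accuracy(42.9)
print(f"QV 42.9 -> {accuracy * 100:.4f}% accuracy")
```

A QV of 42.9 corresponds to roughly one error in every 19,500 bases, which is consistent with the ">99.99%" figure above.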

The completeness and integrity of genic regions were supported by a high BUSCO score of 99.6%. Compared to prior G. hirsutum assemblies, ZM113 offers substantial improvements: contig N50 increased from 75.3 Mb to 89.27 Mb, and the number of gaps fell from between 13 and 2,564 to 0, resulting in an anchored genome size of 2,299.07 Mb.
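
For readers unfamiliar with the contig N50 statistic used above, it is the length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal sketch follows; the `toy_contigs` lengths are made up for illustration and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths, not real data
toy_n50 = n50(toy_contigs)
print(toy_n50)  # 80: the two longest contigs (100 + 80) cover at least half of 300
```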

Publication: Hu, G., Wang, Z., Tian, Z. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet (2025). https://doi.org/10.1038/s41588-025-02130-4

Assembly

The chromosomes (pseudomolecules) of the G. hirsutum ZM113 genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.

Chromosomes (FASTA format) G.hirsutum_CRI-ZM113_T2T_assembly_v1.fa.gz
Mitochondrion (FASTA format) G.hirsutum_CRI-ZM113_T2T_mitochondrion_v1.fa.gz
Chloroplast (FASTA format) G.hirsutum_CRI-ZM113_T2T_chloroplast_v1.fa.gz
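
As a quick sanity check after download, the sequence names and lengths in a gzipped FASTA file can be tallied with the Python standard library alone. This is a minimal sketch: `sequence_lengths` and `lengths_from_gz` are illustrative helper names, and the demo records are made up.

```python
import gzip
import io


def sequence_lengths(handle):
    """Yield (name, length) for each record in a text-mode FASTA handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def lengths_from_gz(path):
    """Read a gzipped FASTA file and return {sequence name: length}."""
    with gzip.open(path, "rt") as handle:
        return dict(sequence_lengths(handle))


# Small in-memory demo with made-up records.
demo = io.StringIO(">chrA demo record\nACGT\nAC\n>chrB\nGGG\n")
demo_lengths = dict(sequence_lengths(demo))
print(demo_lengths)  # {'chrA': 6, 'chrB': 3}
```

With the chromosome file saved in the working directory, `lengths_from_gz("G.hirsutum_CRI-ZM113_T2T_assembly_v1.fa.gz")` should return one entry per sequence in the file, and the sum of the values can be compared with the genome size reported above.
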
Functional Analysis

Functional annotation files for the Gossypium hirsutum ZM113 Genome v1.0 are available for download below. The Gossypium hirsutum ZM113 Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD1_ZM113_CRI_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_ZM113_CRI_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_ZM113_CRI_v1_KEGG-orthologis.xlsx.gz
Genes

The predicted gene models, their alignments, and proteins for the G. hirsutum 'ZM113' genome. These files belong to the Gossypium hirsutum (AD1) 'ZM113' T2T genome CRI Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.hirsutum_CRI-ZM113.gff3.gz
CDS-coding sequences (FASTA format) G.hirsutum_CRI-ZM113_CDS.fa.gz
Protein sequences (FASTA format) G.hirsutum_CRI-ZM113_protein.fa.gz
Predicted mitochondrion models (GFF3 format) G.hirsutum_CRI-ZM113_mitochondrion.gff3.gz
Predicted chloroplast models (GFF3 format) G.hirsutum_CRI-ZM113_chloroplast.gff3.gz  
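
The GFF3 files listed above are tab-delimited text with nine columns, the third of which is the feature type. The sketch below tallies feature types as a first look at an annotation file; `count_feature_types` is an illustrative helper, and the demo lines are made up rather than taken from the ZM113 annotation.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and malformed rows."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:  # GFF3 records have exactly nine tab-separated columns
            continue
        counts[cols[2]] += 1
    return counts


# Made-up records for illustration only.
demo_gff = [
    "##gff-version 3\n",
    "A01\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "A01\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "A01\tdemo\texon\t100\t400\t.\t+\t.\tParent=mrna1\n",
    "A01\tdemo\texon\t600\t900\t.\t+\t.\tParent=mrna1\n",
]
feature_counts = count_feature_types(demo_gff)
print(dict(feature_counts))  # {'gene': 1, 'mRNA': 1, 'exon': 2}
```

For a downloaded file, the same function can be given a handle opened with `gzip.open("G.hirsutum_CRI-ZM113.gff3.gz", "rt")`, assuming the file is in the working directory.
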
Homology

Homology of the Gossypium hirsutum ZM113 genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2025-02), and UniProtKB/TrEMBL (Release 2025-02) databases. The best hit reports are available for download in Excel format.

Protein Homologs

G. hirsutum ZM113 Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) AD1_ZM113_CRI_v1.0_vs_tair.xlsx.gz
G. hirsutum ZM113 Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) AD1_ZM113_CRI_v1.0_vs_tair_hit.fasta.gz
G. hirsutum ZM113 Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) AD1_ZM113_CRI_v1.0_vs_tair_noHit.fasta.gz
G. hirsutum ZM113 Genome v1.0 proteins with SwissProt homologs (EXCEL file) AD1_ZM113_CRI_v1.0_vs_swissprot.xlsx.gz
G. hirsutum ZM113 Genome v1.0 proteins with SwissProt (FASTA file) AD1_ZM113_CRI_v1.0_vs_swissprot_hit.fasta.gz
G. hirsutum ZM113 Genome v1.0 proteins without SwissProt (FASTA file) AD1_ZM113_CRI_v1.0_vs_swissprot_noHit.fasta.gz
G. hirsutum ZM113 Genome v1.0 proteins with TrEMBL homologs (EXCEL file) AD1_ZM113_CRI_v1.0_vs_trembl.xlsx.gz
G. hirsutum ZM113 Genome v1.0 proteins with TrEMBL (FASTA file) AD1_ZM113_CRI_v1.0_vs_trembl_hit.fasta.gz
G. hirsutum ZM113 Genome v1.0 proteins without TrEMBL (FASTA file) AD1_ZM113_CRI_v1.0_vs_trembl_noHit.fasta.gz
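
The best-hit selection described in this section can be sketched as follows. This is not the CottonGen pipeline itself: it assumes blastp results in the standard 12-column tabular layout (BLAST+ `-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), and it assumes "best" means the highest bit score per query. The function name and demo rows are hypothetical.

```python
def best_hits(lines, max_evalue=1e-6):
    """Return {query: (subject, evalue, bitscore)} for the best hit below the cutoff."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue >= max_evalue:  # keep only hits with E-value < cutoff
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows: protA has two qualifying hits, protB has none.
demo_rows = [
    "protA\tAT1G01010.1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n",
    "protA\tAT2G02020.1\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150\n",
    "protB\tAT3G03030.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n",
]
hits = best_hits(demo_rows)
print(hits)
```

In this demo, `protA` keeps its higher-scoring hit and `protB` has no entry, which mirrors the split between the "hit" and "noHit" FASTA files listed above.
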
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum ZM113 v1.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome AD1_ZM113_CRI_v1_SNP
CottonGen RFLP markers mapped to genome AD1_ZM113_CRI_v1_RFLP
CottonGen SSR markers mapped to genome AD1_ZM113_CRI_v1_SSR
CottonGen InDel markers mapped to genome AD1_ZM113_CRI_v1_InDel
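
The marker filtering rules described above can be expressed as a small predicate. This is only a sketch under stated assumptions: the `Alignment` fields are hypothetical summaries of a BLAT hit rather than BLAT's own output columns, the 90% and 97% thresholds are treated as minimums, and the gap size limit is applied to the largest gap in the alignment.

```python
from dataclasses import dataclass

# Maximum allowed gap size (bp) per marker type, taken from the rules above.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class Alignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str
    marker_length: int   # length of the marker sequence
    aligned_length: int  # marker bases included in the alignment
    matches: int         # identical bases within the alignment
    gap_count: int
    largest_gap_bp: int


def passes_filter(aln: Alignment) -> bool:
    """Apply the identity, coverage, and gap rules to one alignment."""
    if aln.marker_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.marker_length
    identity = aln.matches / aln.aligned_length
    return (
        coverage >= 0.97
        and identity >= 0.90
        and aln.gap_count < 2
        and aln.largest_gap_bp <= MAX_GAP_BP[aln.marker_type]
    )


# Made-up alignments for illustration.
ssr_ok = Alignment("SSR", 200, 198, 190, 1, 500)
snp_gap_too_big = Alignment("SNP", 121, 121, 120, 1, 5)
print(passes_filter(ssr_ok), passes_filter(snp_gap_too_big))  # True False
```
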
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum ZM113 CRI genome v1.0 assembly. Alignments spanning 97% of the transcript length with 90% identity were retained. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 AD1_ZM113_CRI_v1_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD1_ZM113_CRI_v1_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD1_ZM113_CRI_v1_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD1_ZM113_CRI_v1_g.raimondii_cottongen_reftransV1