Gossypium hirsutum (AD1) 'TM-1' T2T genome ZJU_v1.0

Overview
Analysis Name: Gossypium hirsutum (AD1) 'TM-1' T2T genome ZJU_v1.0
Method: Oxford Nanopore PromethION; PacBio Sequel (HiFiasm v. 0.7)
Source (1.0)
Date performed: 2025-03-11

Upland cotton (Gossypium hirsutum) accounts for more than 90% of the world’s cotton production and, as an allotetraploid, is a model plant for polyploid crop domestication. In the present study, we report a complete telomere-to-telomere (T2T) genome assembly of Upland cotton accession Texas Marker-1 (T2T-TM-1), with a total size of 2,299.6 Mb and 79,642 annotated genes. Based on T2T-TM-1, interspecific centromere divergence was detected between the A- and D-subgenomes and their corresponding diploid progenitors. Centromere-associated repetitive sequences (CRCs) were found to be enriched for Gypsy-like retroelements. Centromere size expansion, repositioning and structural variations occurred post-polyploidization. Notably, CRC homologs were transferred from the diploid D-genome progenitor to the D-subgenome, invaded the A-subgenome and then underwent post-tetraploidization proliferation. This suggests an evolutionary advantage for the CRCs of the D-genome progenitor, presents a D-genome-adopted inheritance of centromere repeats after polyploidization and shapes the dynamic centromeric landscape during polyploidization in polyploid species.

Table 1. Assembly and annotation of the gap-free genome sequences

Genomic feature      TM-1-T2T
Genome size (Mb)     2299.6
No. of gaps          0
No. of centromeres   26
No. of telomeres     47
Percentage of TE     66.05
Number of genes      79,642
Hi-C rate (%)        99.71
BUSCOs (%)           99.5

Publication: Yan, H., Han, J., Jin, S. et al. Post-polyploidization centromere evolution in cotton. Nat Genet (2025). https://doi.org/10.1038/s41588-025-02115-3

Assembly

The chromosomes (pseudomolecules) and scaffolds for the G. hirsutum 'TM-1' T2T genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' T2T genome ZJU_v1.0.

Chromosomes & scaffolds (FASTA format) G.hirsutum_ZJU-TM1_T2T_v1.0.fasta.gz
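
After downloading the assembly, it can be useful to check the file against the figures in Table 1 (genome size and number of gaps). The following is a minimal sketch, not part of the CottonGen pipeline: it assumes the FASTA is plain nucleotide text and that gaps, if any, would be encoded as runs of N. The function name fasta_stats is hypothetical.

```python
import gzip
import re
from typing import Iterable

N_RUN = re.compile(r"[Nn]+")


def fasta_stats(lines: Iterable[str]) -> dict:
    """Count sequences, total bases, and runs of N (assumed gap encoding)."""
    n_seqs = 0
    total_bp = 0
    gaps = 0
    prev_ended_with_n = False  # tracks N runs that continue across line breaks

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1
            prev_ended_with_n = False  # a run never spans two records
            continue
        total_bp += len(line)
        runs = N_RUN.findall(line)
        gaps += len(runs)
        # Do not double-count a run that started on the previous line.
        if runs and prev_ended_with_n and line[0] in "Nn":
            gaps -= 1
        prev_ended_with_n = line[-1] in "Nn"

    return {"sequences": n_seqs, "total_bp": total_bp, "gaps": gaps}


# Example usage (requires the downloaded file in the working directory):
# with gzip.open("G.hirsutum_ZJU-TM1_T2T_v1.0.fasta.gz", "rt") as fh:
#     print(fasta_stats(fh))
```

Reading the file line by line keeps memory use low, which matters for an assembly of this size.
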
Functional Analysis

Functional annotation files for the Gossypium hirsutum TM1 T2T ZJU genome 1.0 are available for download below. The Gossypium hirsutum TM1 T2T ZJU genome 1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD1_TM1_T2T_ZJU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_TM1_T2T_ZJU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_TM1_T2T_ZJU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD1_TM1_T2T_ZJU_v1_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments and proteins for the G. hirsutum 'TM-1' T2T genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' T2T genome ZJU_v1.0.

Predicted gene models with exons (GFF3 format) G.hirsutum_ZJU-TM1_T2T.gff3.gz
Coding sequences, CDS (FASTA format) G.hirsutum_ZJU-TM1_T2T.cds.gz
Protein sequences (FASTA format) G.hirsutum_ZJU-TM1_T2T.pep.gz
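
To get a quick overview of the gene model file, one option is to tally the feature types in column 3 of the GFF3. This is a minimal sketch under the assumption that the file follows the standard nine-column, tab-separated GFF3 layout; which type labels it actually uses (for example "gene", "mRNA", "exon") has not been checked here, and the function name count_feature_types is hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally GFF3 feature types (column 3), skipping comments and odd lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        counts[cols[2]] += 1
    return counts


# Example usage (requires the downloaded file in the working directory):
# with gzip.open("G.hirsutum_ZJU-TM1_T2T.gff3.gz", "rt") as fh:
#     for feature_type, n in count_feature_types(fh).most_common():
#         print(feature_type, n)
```

If the file labels gene features as "gene", that count can be compared with the number of genes reported in Table 1.
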
Markers
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum TM-1 T2T ZJU v1.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome AD1_TM1_T2T_ZJU_v1_SNP
CottonGen RFLP markers mapped to genome AD1_TM1_T2T_ZJU_v1_RFLP
CottonGen SSR markers mapped to genome AD1_TM1_T2T_ZJU_v1_SSR
CottonGen InDel markers mapped to genome AD1_TM1_T2T_ZJU_v1_InDel
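
The marker filtering criteria described above can be illustrated with a short sketch. This is not the CottonGen team's script: the Hit record, its field names, and the exact way identity and coverage are computed are assumptions made for illustration. Only the thresholds (90% identity, 97% of marker length, fewer than 2 gaps, and the per-type gap size limits) come from the description.

```python
from dataclasses import dataclass

# Maximum allowed gap size in bp per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class Hit:
    """Hypothetical summary of one BLAT alignment of a marker sequence."""
    marker_type: str   # "SSR", "RFLP", "SNP" or "InDel"
    query_len: int     # full length of the marker sequence
    aligned_len: int   # number of marker bases included in the alignment
    matches: int       # number of identical aligned bases
    gap_count: int     # number of gaps in the alignment
    max_gap_bp: int    # size of the largest gap


def passes(hit: Hit, min_identity: float = 0.90, min_coverage: float = 0.97) -> bool:
    """Apply the identity, coverage and gap thresholds to one alignment."""
    if hit.query_len <= 0 or hit.aligned_len <= 0:
        return False
    identity = hit.matches / hit.aligned_len      # assumed definition
    coverage = hit.aligned_len / hit.query_len    # assumed definition
    return (
        identity >= min_identity
        and coverage >= min_coverage
        and hit.gap_count < 2
        and hit.max_gap_bp <= MAX_GAP_BP[hit.marker_type]
    )


# Example: the same alignment passes as an SSR but fails as a SNP,
# because its single 500 bp gap exceeds the 2 bp limit for SNPs.
ssr_ok = passes(Hit("SSR", 200, 198, 190, 1, 500))
snp_ok = passes(Hit("SNP", 200, 198, 190, 1, 500))
```

Keeping the per-type gap limits in one lookup table makes the difference between sequence-based markers (SSR, RFLP) and point markers (SNP, InDel) explicit.
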
Publication

Yan, H., Han, J., Jin, S. et al. Post-polyploidization centromere evolution in cotton. Nat Genet (2025). https://doi.org/10.1038/s41588-025-02115-3

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum TM1 T2T ZJU genome 1.0 assembly. Alignments covering 97% of the transcript length with 90% identity were retained. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 AD1_TM1_T2T_ZJU_v1_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD1_TM1_T2T_ZJU_v1_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD1_TM1_T2T_ZJU_v1_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD1_TM1_T2T_ZJU_v1_g.raimondii_cottongen_reftransV1