Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0

Overview
Analysis Name: Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0
Method: ONT and PacBio; Canu v2.1.1; Hi-C reads based on DpnII
Source (v2)
Date performed: 2024-11-08

About the Assembly

We assembled the TM-1 and 3-79 genomes with Canu (version 2.1.1), running its three stages (correction, trimming, and assembly) manually as separate steps. First, ONT and PacBio reads from both genomes were corrected and trimmed using Canu with default parameters (correctedErrorRate = 0.045 for PacBio reads; correctedErrorRate = 0.144 for Nanopore reads). The trimmed high-quality ONT (~40x) and PacBio (~40x) reads were then supplied together as mixed-format input to Canu for assembly with default parameters. To improve base quality, we aligned Illumina paired-end reads (~50x) to the contigs using BWA-MEM and polished the contigs with Pilon (version 1.23) (--fix bases --mindepth 10 --minmq 30). High-quality paired-end Hi-C reads based on DpnII for G. hirsutum TM-1 and G. barbadense 3-79 were mapped to the two contig-scale assemblies using Juicer (version 1.6). The original contigs were organized into chromosomes with the 3D-DNA pipeline (version 180419) (-r 2 -i 15000 --build-gapped-map). Finally, we used Juicebox Assembly Tools (v1.11.08) to manually correct and refine the connections.
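
The three manually run Canu stages can be sketched as a dry run that only builds the command lines and never executes them. This is a minimal illustration, not the authors' actual invocation: the prefix, output directory, genome size estimate, and read file name are all assumptions made up for the example. The stage flags (-correct, -trim, -assemble), the -corrected and -trimmed read modifiers, and the intermediate file names follow the staged usage shown in the Canu 2.x documentation. The sketch handles one read type at a time and does not show how the ONT and PacBio reads were combined for the final assembly.

```python
def canu_stage_commands(prefix, outdir, genome_size, tech, raw_reads,
                        corrected_error_rate):
    """Build (but do not run) the three staged Canu commands for one read type.

    All argument values are supplied by the caller; nothing here is taken
    from the authors' actual run.
    """
    if tech not in ("nanopore", "pacbio"):
        raise ValueError("tech must be 'nanopore' or 'pacbio'")

    def base(stage):
        # Options shared by every stage.
        return ["canu", stage, "-p", prefix, "-d", outdir,
                f"genomeSize={genome_size}"]

    # Intermediate read files that Canu writes after each stage.
    corrected = f"{outdir}/{prefix}.correctedReads.fasta.gz"
    trimmed = f"{outdir}/{prefix}.trimmedReads.fasta.gz"

    return [
        base("-correct") + [f"-{tech}", raw_reads],
        base("-trim") + ["-corrected", f"-{tech}", corrected],
        base("-assemble") + [f"correctedErrorRate={corrected_error_rate}",
                             "-trimmed", "-corrected", f"-{tech}", trimmed],
    ]


# Hypothetical inputs: "tm1", "canu_out", "2.3g", and "ont_reads.fastq.gz"
# are placeholders, not values from the source.
for command in canu_stage_commands("tm1", "canu_out", "2.3g", "nanopore",
                                   "ont_reads.fastq.gz", 0.144):
    print(" ".join(command))
```

Printing the commands rather than running them makes it easy to review each stage before committing compute time to it.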

Summary of assemblies of the G. hirsutum and G. barbadense genomes

Metric                        G. hirsutum      G. barbadense
Contig N90                    5,021,880        1,985,260
Contig N80                    9,197,101        4,198,208
Contig N70                    13,863,239       6,962,112
Contig N60                    18,410,734       12,139,909
Contig N50                    21,961,441       12,139,909
Longest contig length         84,707,317       49,101,280
Contig number                 1,418            2,064
Total size                    2,282,609,487    2,216,666,023
Scaffold N50                  108,550,191      93,110,895
Scaffold number               582              1,688
Genome size                   2,324,185,275    2,254,770,940
Pseudochromosomes size (bp)   2,302,180,022    2,169,217,327
Percentage of anchoring       99.05%           97.20%

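
For readers unfamiliar with the Nx statistics in the table, the sketch below shows the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches x percent of the assembly size. The toy contig lengths are made up for illustration. The anchoring helper divides pseudochromosome size by genome size, which is an assumption about how the percentage was derived; it is shown only with the G. hirsutum figures from the table.

```python
def nx(lengths, x):
    """Return the Nx value: the length L such that sequences of length >= L
    together cover at least x percent of the total assembly size."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return min(lengths)


def anchoring_percent(pseudochromosome_bp, genome_bp):
    """Percentage of the genome placed on pseudochromosomes (assumed formula)."""
    return 100.0 * pseudochromosome_bp / genome_bp


toy_contigs = [2, 3, 4, 5, 6]  # made-up lengths, total 20
print(nx(toy_contigs, 50))     # 5
print(nx(toy_contigs, 90))     # 3
print(round(anchoring_percent(2_302_180_022, 2_324_185_275), 2))  # 99.05
```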
Assembly

The chromosomes (pseudomolecules) and scaffolds for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0

Chromosomes & scaffolds (FASTA format) G.hirsutum_HAU-TM1_assembly_v2.0.fasta.gz
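
After downloading, a quick sanity check is to list the length of every sequence in the gzipped FASTA file. The following is a minimal sketch using only the Python standard library; it assumes a conventional FASTA layout in which the sequence name is the first whitespace-delimited token after ">".

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a gzip-compressed FASTA file."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the identifier, dropping any description text.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

Calling fasta_lengths("G.hirsutum_HAU-TM1_assembly_v2.0.fasta.gz") on the downloaded file returns one entry per chromosome or scaffold, and the resulting lengths can be compared against the summary statistics reported for the assembly.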
Functional Annotation

Functional annotation files for the Gossypium hirsutum TM-1 Genome v2.0 are available for download below. The Gossypium hirsutum TM-1 Genome v2.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD1_HAU_v2_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_HAU_v2_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_HAU_v2_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD1_HAU_v2_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments, and proteins for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0

Predicted gene models with exons (GFF3 format) G.hirsutum_HAU-AD1_v2.0_gene.gff3.gz
Coding sequences, CDS (FASTA format)  
Protein sequences (FASTA format)  
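
As a simple way to inspect the gene models, the sketch below counts features of a given type per sequence in GFF3 data. It relies only on the standard nine tab-separated GFF3 columns (sequence ID in column 1, feature type in column 3); the assumption that gene features in this file are typed "gene" should be checked against the file itself.

```python
from collections import Counter


def count_features(gff3_lines, feature_type="gene"):
    """Count features of one type per sequence ID in GFF3 lines."""
    counts = Counter()
    for line in gff3_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        if cols[2] == feature_type:
            counts[cols[0]] += 1
    return counts
```

The function accepts any iterable of lines, so the downloaded file can be passed in directly as an open handle, for example one returned by gzip.open("G.hirsutum_HAU-AD1_v2.0_gene.gff3.gz", "rt").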
Homology

Homology of the Gossypium hirsutum TM-1 genome v2.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2024-03), and UniProtKB/TrEMBL (Release 2024-03) databases. The best hit reports are available for download in Excel format.
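
The cutoff and best-hit selection described above can be illustrated with a short sketch. It assumes the hits are in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and that the best hit is the one with the highest bit score. Neither the output format nor the tie-breaking rule is stated on this page, so both are assumptions.

```python
def best_hits(lines, max_evalue=1e-6):
    """Return {query: (subject, evalue, bitscore)} keeping, for each query,
    the highest-scoring hit whose e-value is below max_evalue."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue >= max_evalue:
            continue  # fails the "less than 1e-6" cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Queries with no hit below the cutoff are simply absent from the result, which corresponds to the split between the "hit" and "noHit" FASTA files listed below.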

Protein Homologs

G. hirsutum TM-1 Genome v2.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) AD1_HAU_v2.0_vs_tair.xlsx.gz
G. hirsutum TM-1 Genome v2.0 proteins with Arabidopsis (Araport11) (FASTA file) AD1_HAU_v2.0_vs_tair_hit.fasta.gz
G. hirsutum TM-1 Genome v2.0 proteins without Arabidopsis (Araport11) (FASTA file) AD1_HAU_v2.0_vs_tair_noHit.fasta.gz
G. hirsutum TM-1 Genome v2.0 proteins with SwissProt homologs (EXCEL file) AD1_HAU_v2.0_vs_swissprot.xlsx.gz
G. hirsutum TM-1 Genome v2.0 proteins with SwissProt (FASTA file) AD1_HAU_v2.0_vs_swissprot_hit.fasta.gz
G. hirsutum TM-1 Genome v2.0 proteins without SwissProt (FASTA file) AD1_HAU_v2.0_vs_swissprot_noHit.fasta.gz
G. hirsutum TM-1 Genome v2.0 proteins with TrEMBL homologs (EXCEL file) AD1_HAU_v2.0_vs_trembl.xlsx.gz
G. hirsutum TM-1 Genome v2.0 proteins with TrEMBL (FASTA file) AD1_HAU_v2.0_vs_trembl_hit.fasta.gz
G. hirsutum TM-1 Genome v2.0 proteins without TrEMBL (FASTA file) AD1_HAU_v2.0_vs_trembl_noHit.fasta.gz
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum TM-1 v2.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome AD1_HAU_v2_SNP
CottonGen RFLP markers mapped to genome AD1_HAU_v2_RFLP
CottonGen SSR markers mapped to genome AD1_HAU_v2_SSR
CottonGen InDel markers mapped to genome AD1_HAU_v2_InDel
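
The marker filtering rules described in this section can be written down as a small predicate. This is a sketch under stated assumptions, not the CottonGen pipeline: the MarkerAlignment record and its field names are hypothetical, and the 90% and 97% thresholds are treated as inclusive minimums.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str   # "SSR", "RFLP", "SNP", or "InDel"
    identity: float    # percent identity of the aligned region
    coverage: float    # percent of the marker length that aligned
    gap_count: int     # number of gaps in the alignment
    max_gap_size: int  # size of the largest gap, in bp


# Largest permitted gap per marker type, as stated in the text.
MAX_GAP_SIZE = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_filter(aln):
    """Apply the identity, coverage, and gap rules to one alignment."""
    return (
        aln.identity >= 90.0
        and aln.coverage >= 97.0
        and aln.gap_count < 2
        and aln.max_gap_size <= MAX_GAP_SIZE[aln.marker_type]
    )


print(passes_filter(MarkerAlignment("SSR", 95.0, 98.0, 1, 800)))  # True
print(passes_filter(MarkerAlignment("SNP", 99.0, 100.0, 1, 5)))   # False
```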
Publication

Chang, Xing, Xin He, Jianying Li, Zhenping Liu, Ruizhen Pi, Xuanxuan Luo, Ruipeng Wang et al. "High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres." Plant Communications 5, no. 2 (2024). doi.org/10.1016/j.xplc.2023.100722

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum TM-1 HAU genome 2.0 assembly. Alignments with an alignment length of 97% and 90% identity were preserved. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 AD1_HAU_v2_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD1_HAU_v2_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD1_HAU_v2_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD1_HAU_v2_g.raimondii_cottongen_reftransV1
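
To make the 97% length and 90% identity criteria concrete, the sketch below derives both values from a single line of BLAT's default PSL output. The formulas are one common convention and are an assumption here, since the page does not say how the values were computed: identity is matching bases (including repeat matches) over all aligned bases, and coverage is the aligned query span over the query size.

```python
def psl_identity_coverage(psl_line):
    """Return (percent identity, percent query coverage) for one PSL line.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    10 qSize, 11 qStart, 12 qEnd.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])
    mismatches = int(fields[1])
    rep_matches = int(fields[2])
    q_size, q_start, q_end = int(fields[10]), int(fields[11]), int(fields[12])

    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        raise ValueError("empty alignment or zero-length query")

    identity = 100.0 * (matches + rep_matches) / aligned
    coverage = 100.0 * (q_end - q_start) / q_size
    return identity, coverage


def keep_alignment(psl_line, min_identity=90.0, min_coverage=97.0):
    """True if the alignment meets both thresholds (treated as inclusive)."""
    identity, coverage = psl_identity_coverage(psl_line)
    return identity >= min_identity and coverage >= min_coverage
```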