Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0
Overview
About the Assembly
We assembled the TM-1 and 3-79 genomes with Canu (version 2.1.1) in three steps: correction, trimming, and assembly. We ran each step manually. First, ONT reads and PacBio reads from both genomes were corrected and trimmed using Canu with default parameters (correctedErrorRate = 0.045 for PacBio reads; correctedErrorRate = 0.144 for Nanopore reads). The trimmed high-quality reads from ONT (~40x) and PacBio (~40x) were then provided together, in mixed formats, as input to Canu for assembly with default parameters. To improve base quality, we aligned Illumina paired-end reads (~50x) to the contigs using BWA-MEM and polished the contigs with Pilon (version 1.23) (--fix bases --mindepth 10 --minmq 30). High-quality paired-end Hi-C reads from DpnII-based libraries for G. hirsutum TM-1 and G. barbadense 3-79 were mapped to the two contig-scale assemblies using Juicer (version 1.6). The original contigs were organized into chromosomes with the 3D-DNA pipeline (version 180419) (-r 2 -i 15000 --build-gapped-map). Finally, we used Juicebox Assembly Tools (v1.11.08) to manually correct and refine the connections.
Summary of assemblies of the G. hirsutum and G. barbadense genomes
Assembly
The chromosomes (pseudomolecules) and scaffolds for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0
Functional Annotation
Functional annotation files for the Gossypium hirsutum TM-1 Genome v2.0 are available for download below. The Gossypium hirsutum TM-1 Genome v2.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
Downloads
Genes
The predicted gene models, their alignments, and proteins for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0
Homology
Homology of the Gossypium hirsutum TM-1 genome v2.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2024-03), and UniProtKB/TrEMBL (Release 2024-03) databases. The best hit reports are available for download in Excel format.
Protein Homologs
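The e-value cutoff and best-hit selection described in the Homology section can be illustrated with a minimal Python sketch. It assumes blastp results in the standard 12-column tabular layout (`-outfmt 6`) and assumes "best hit" means the highest bit score per query; the source states neither, and the function name and sequence IDs here are hypothetical.

```python
# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-6):
    """Return {query_id: hit} with one hit per query.

    Hits with an e-value at or above max_evalue are discarded.
    Assumption: the "best" hit is the one with the highest bit score.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        if float(hit["evalue"]) >= max_evalue:
            continue  # fails the "less than 1e-6" cutoff
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example rows (IDs and scores are made up for illustration).
example = "\n".join([
    "\t".join(["geneA", "AT1G01010.1", "80.0", "200", "40", "0",
               "1", "200", "1", "200", "1e-50", "300"]),
    "\t".join(["geneA", "AT2G02020.1", "60.0", "180", "72", "0",
               "1", "180", "1", "180", "1e-20", "150"]),
    "\t".join(["geneB", "AT3G03030.1", "30.0", "50", "35", "0",
               "1", "50", "1", "50", "0.5", "25"]),
])
hits = best_hits(example)
```

In this example, `geneA` keeps only its higher-scoring hit and `geneB` is dropped because its single hit does not pass the cutoff.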
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum TM-1 v2.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and Indels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
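The marker filtering rules above can be expressed as a small predicate. The following Python sketch is only an illustration: the `MarkerAlignment` record and its fields are hypothetical, identity is assumed to be matches divided by aligned bases, and the gap limits are read as "each gap no larger than the limit, and fewer than 2 gaps in total". The actual CottonGen pipeline may compute these values differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str          # "SSR", "RFLP", "dbSNP" or "Indel"
    marker_len: int           # full length of the marker sequence
    aligned_len: int          # marker bases included in the alignment
    matches: int              # identical bases within the aligned region
    gap_sizes: List[int] = field(default_factory=list)  # one entry per gap, in bp


# Maximum allowed size of a single gap, per marker type (from the text above).
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "dbSNP": 2, "Indel": 2}


def passes_filter(aln: MarkerAlignment) -> bool:
    """Apply the thresholds: >=90% identity over >=97% of the marker length,
    fewer than 2 gaps, and no gap larger than the per-type limit."""
    if aln.marker_len <= 0 or aln.aligned_len <= 0:
        return False
    coverage = aln.aligned_len / aln.marker_len
    identity = aln.matches / aln.aligned_len
    if coverage < 0.97 or identity < 0.90:
        return False
    if len(aln.gap_sizes) >= 2:
        return False
    return all(gap <= MAX_GAP_BP[aln.marker_type] for gap in aln.gap_sizes)


# Hypothetical examples.
ok_ssr = MarkerAlignment("SSR", 200, 198, 190, [500])
two_gaps = MarkerAlignment("SSR", 200, 198, 190, [500, 10])
big_snp_gap = MarkerAlignment("dbSNP", 100, 100, 99, [5])
```

Here `ok_ssr` passes, `two_gaps` fails on gap count, and `big_snp_gap` fails because a 5 bp gap exceeds the 2 bp limit for dbSNPs.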
Publication
Chang, Xing, Xin He, Jianying Li, Zhenping Liu, Ruizhen Pi, Xuanxuan Luo, Ruipeng Wang et al. "High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres." Plant Communications 5, no. 2 (2024). doi.org/10.1016/j.xplc.2023.100722
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum TM-1 HAU genome 2.0 assembly. Alignments covering 97% of the transcript length with 90% identity were preserved. The available files are in GFF3 format.
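As an illustration of the transcript filter, the sketch below parses one line of BLAT's PSL output (21 tab-separated columns) and applies the 97% length and 90% identity thresholds. The definitions of coverage and identity used here are simplifying assumptions rather than the CottonGen team's documented method, and the function names and example record are hypothetical.

```python
PSL_COLUMNS = 21  # number of tab-separated fields in a PSL alignment line


def parse_psl_line(line):
    """Extract the fields needed for filtering from one PSL alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != PSL_COLUMNS:
        raise ValueError(f"expected {PSL_COLUMNS} columns, got {len(f)}")
    return {
        "matches": int(f[0]),
        "mismatches": int(f[1]),
        "rep_matches": int(f[2]),
        "q_name": f[9],
        "q_size": int(f[10]),
        "q_start": int(f[11]),
        "q_end": int(f[12]),
        "t_name": f[13],
    }


def keep_alignment(rec, min_coverage=0.97, min_identity=0.90):
    """Assumed definitions:
    coverage = aligned query span / query size
    identity = (matches + repMatches) / (matches + misMatches + repMatches)
    """
    aligned = rec["matches"] + rec["mismatches"] + rec["rep_matches"]
    if aligned == 0 or rec["q_size"] == 0:
        return False
    coverage = (rec["q_end"] - rec["q_start"]) / rec["q_size"]
    identity = (rec["matches"] + rec["rep_matches"]) / aligned
    return coverage >= min_coverage and identity >= min_identity


# Hypothetical PSL record: a 1000 bp transcript aligned end to end on "A01".
example_line = "\t".join([
    "980", "20", "0", "0", "0", "0", "0", "0", "+",
    "tx1", "1000", "0", "1000",
    "A01", "100000", "5000", "6000",
    "1", "1000,", "0,", "5000,",
])
record = parse_psl_line(example_line)
kept = keep_alignment(record)
```

BLAT's own utilities compute percent identity with a more involved formula, so results near the thresholds may differ from this simplified version.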