Gossypium barbadense (AD2) '3-79' genome HAU_v3.0

Overview
Analysis Name: Gossypium barbadense (AD2) '3-79' genome HAU_v3.0
Method: ONT and PacBio; Canu v2.1.1; Hi-C reads based on DpnII
Source (v3)
Date performed: 2024-11-08

About the Assembly

We assembled the TM-1 and 3-79 genomes with Canu (version 2.1.1), running its three stages (correction, trimming, and assembly) manually as separate steps. First, ONT and PacBio reads from both genomes were corrected and trimmed using Canu with default parameters (correctedErrorRate = 0.045 for PacBio reads; correctedErrorRate = 0.144 for Nanopore reads). The trimmed high-quality ONT (~40x) and PacBio (~40x) reads were then supplied together to Canu as mixed-format input and assembled with default parameters. To improve base quality, we aligned Illumina paired-end reads (~50x) to the contigs using BWA-MEM and polished the contigs with Pilon (version 1.23) (--fix bases --mindepth 10 --minmq 30). High-quality paired-end Hi-C reads based on DpnII for G. hirsutum TM-1 and G. barbadense 3-79 were mapped to the two contig-level assemblies using Juicer (version 1.6). The contigs were organized into chromosomes with the 3D-DNA pipeline (version 180419) (-r 2 -i 15000 --build-gapped-map). Finally, we used Juicebox Assembly Tools (v1.11.08) to manually correct and refine the connections.

Summary of assemblies of the G. hirsutum and G. barbadense genomes

                              G. hirsutum      G. barbadense
Contig N90                    5,021,880        1,985,260
Contig N80                    9,197,101        4,198,208
Contig N70                    13,863,239       6,962,112
Contig N60                    18,410,734       12,139,909
Contig N50                    21,961,441       12,139,909
Longest contig length         84,707,317       49,101,280
Contig number                 1,418            2,064
Total size                    2,282,609,487    2,216,666,023
Scaffold N50                  108,550,191      93,110,895
Scaffold number               582              1,688
Genome size                   2,324,185,275    2,254,770,940
Pseudochromosomes size (bp)   2,302,180,022    2,169,217,327
Percentage of anchoring       99.05%           97.20%
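
The contig and scaffold Nx values in the table follow the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches x% of the assembly. The sketch below illustrates that definition on made-up lengths; the function name `nx` and the toy values are assumptions for illustration, not the script used to produce the table.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least x percent of the total length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Hypothetical contig lengths (total = 300), not taken from the assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

To apply this to a real assembly, the lengths would come from the sequences in the FASTA file rather than a hand-written list.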
Assembly

The chromosomes (pseudomolecules) and scaffolds for the G. barbadense '3-79' genome. These files belong to the Gossypium barbadense (AD2) '3-79' genome HAU_v3.0.

Chromosomes & scaffolds (FASTA format) G.barbadense_HAU-379_assembly_v3.0.fasta.gz
Functional Analysis

Functional annotation files for the Gossypium barbadense '3-79' HAU Genome v3.0 are available for download below. The G. barbadense '3-79' HAU Genome v3.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD2_3-79_HAU_v3_genes2GO.xlsx.gz
IPR assignments from InterProScan AD2_3-79_HAU_v3_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD2_3-79_HAU_v3_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD2_3-79_HAU_v3_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments, and proteins for the G. barbadense '3-79' genome. These files belong to the Gossypium barbadense (AD2) '3-79' genome HAU_v3.0.

Predicted gene models with exons (GFF3 format) G.barbadense-AD2_v3.0_gene.gff3.gz
Coding sequences, CDS (FASTA format)  
Protein sequences (FASTA format)  
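
As a quick sanity check on a GFF3 gene-model file, one can tally the feature types in its third column. The sketch below is a minimal illustration on made-up records; the function name `count_feature_types` and the example IDs are hypothetical, and it assumes standard nine-column, tab-separated GFF3 lines.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        # A valid GFF3 record has exactly nine tab-separated columns.
        if len(columns) == 9:
            counts[columns[2]] += 1
    return counts


# Hypothetical records for illustration only, not taken from the real file.
example = [
    "##gff-version 3\n",
    "chr1\tsketch\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "chr1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "chr1\tsketch\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1\n",
    "chr1\tsketch\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1\n",
]
feature_counts = count_feature_types(example)
print(dict(feature_counts))
```

For the gzipped download, the same function can be given a handle opened with `gzip.open(path, "rt")` from the Python standard library.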
Homology

Homology of the Gossypium barbadense 3-79 HAU Genome v3.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2024-03), and UniProtKB/TrEMBL (Release 2024-03) databases. The best hit reports are available for download in Excel format.
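
The e-value cutoff and best-hit selection described above can be sketched as follows. This is an illustration only: it assumes blastp results in the 12-column tabular layout (BLAST+ `-outfmt 6`), picks the best hit by highest bit score, and uses made-up query and subject names. The actual format and tie-breaking used to build the reports are not stated here.

```python
import csv
import io

# Column positions in BLAST tabular output (-outfmt 6); assumed layout.
QUERY, SUBJECT, EVALUE, BITSCORE = 0, 1, 10, 11


def best_hits(handle, max_evalue=1e-6):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit
    of each query among hits with an e-value below max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue
        evalue = float(row[EVALUE])
        bitscore = float(row[BITSCORE])
        if evalue >= max_evalue:
            continue  # fails the "less than 1e-6" cutoff
        current = best.get(row[QUERY])
        if current is None or bitscore > current[2]:
            best[row[QUERY]] = (row[SUBJECT], evalue, bitscore)
    return best


# Hypothetical hits: protA has two passing hits, protB has none.
example = io.StringIO(
    "protA\tsubj1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t310\n"
    "protA\tsubj2\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150\n"
    "protB\tsubj3\t35.0\t90\t58\t1\t10\t99\t3\t92\t0.002\t40\n"
)
hits = best_hits(example)
print(hits)
```

Queries absent from the returned dictionary correspond to the "noHit" category in the download list.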

Protein Homologs

G. barbadense 3-79 HAU Genome v3.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Gbarbadense_3-79_HAU_v3.0_vs_tair.xlsx.gz
G. barbadense 3-79 HAU Genome v3.0 proteins with Arabidopsis (Araport11) (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_tair_hit.fasta.gz
G. barbadense 3-79 HAU Genome v3.0 proteins without Arabidopsis (Araport11) (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_tair_noHit.fasta.gz
G. barbadense 3-79 HAU Genome v3.0 proteins with SwissProt homologs (EXCEL file) Gbarbadense_3-79_HAU_v3.0_vs_swissprot.xlsx.gz
G. barbadense 3-79 HAU Genome v3.0 proteins with SwissProt (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_swissprot_hit.fasta.gz
G. barbadense 3-79 HAU Genome v3.0 proteins without SwissProt (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_swissprot_noHit.fasta.gz
G. barbadense 3-79 HAU Genome v3.0 proteins with TrEMBL homologs (EXCEL file) Gbarbadense_3-79_HAU_v3.0_vs_trembl.xlsx.gz
G. barbadense 3-79 HAU Genome v3.0 proteins with TrEMBL (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_trembl_hit.fasta.gz
G. barbadense 3-79 HAU Genome v3.0 proteins without TrEMBL (FASTA file) Gbarbadense_3-79_HAU_v3.0_vs_trembl_noHit.fasta.gz
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium barbadense 3-79 v3.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome AD2_3-79_HAU_v3_SNP
CottonGen RFLP markers mapped to genome AD2_3-79_HAU_v3_RFLP
CottonGen SSR markers mapped to genome AD2_3-79_HAU_v3_SSR
CottonGen InDel markers mapped to genome AD2_3-79_HAU_v3_InDel
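
The marker filtering criteria above can be expressed as a small predicate. The sketch below is one possible reading, not the CottonGen pipeline itself: the `MarkerAlignment` fields are hypothetical, identity is taken as matches divided by aligned length, and "restricted to 2 bp" is read as a maximum gap of 2 bp.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str     # "SSR", "RFLP", "SNP", or "InDel"
    marker_length: int   # length of the marker sequence (bp)
    aligned_length: int  # marker bases covered by the alignment (bp)
    matches: int         # identical bases within the alignment
    gap_count: int       # number of gaps in the alignment
    largest_gap: int     # size of the largest gap (bp)


# Maximum gap size per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_filter(aln):
    """Apply the 97% length, 90% identity, and gap criteria."""
    covers_enough = aln.aligned_length * 100 >= 97 * aln.marker_length
    identical_enough = aln.matches * 100 >= 90 * aln.aligned_length
    gaps_ok = (aln.gap_count < 2
               and aln.largest_gap <= MAX_GAP_BP[aln.marker_type])
    return covers_enough and identical_enough and gaps_ok


# Made-up alignments: the SSR passes; the SNP fails on gap size.
ssr = MarkerAlignment("SSR", 200, 198, 190, 1, 450)
snp = MarkerAlignment("SNP", 121, 121, 120, 1, 30)
print(passes_filter(ssr), passes_filter(snp))
```

Keeping the per-type gap limits in one dictionary makes the difference between the SSR/RFLP and SNP/InDel rules explicit.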
Publication

Chang, Xing, Xin He, Jianying Li, Zhenping Liu, Ruizhen Pi, Xuanxuan Luo, Ruipeng Wang et al. "High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres." Plant Communications 5, no. 2 (2024). doi.org/10.1016/j.xplc.2023.100722

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. barbadense 3-79 HAU genome v3.0 assembly. Alignments with 97% alignment length and 90% identity were retained. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 AD2_3-79_HAU_v3_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD2_3-79_HAU_v3_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD2_3-79_HAU_v3_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD2_3-79_HAU_v3_g.raimondii_cottongen_reftransV1