Gossypium sturtianum (C1) genome HAU_v1

Overview
Analysis Name: Gossypium sturtianum (C1) genome HAU_v1
Method: Oxford Nanopore; Illumina (Canu v1.3)
Source
Date performed: 2023-09-08

About the assembly

New assemblies and annotations of seven diploid cotton genomes were reported in Wang et al., Nat. Genet., 2022 Dec. They are two G. herbaceum (A1) genomes (a wild form 'A1a' and an A1 cultivar 'ZhongCao1') and one each of G. anomalum (B1), G. sturtianum (C1), G. stocksii (E1), G. longicalyx (F1) and G. bickii (G1). The seven genomes were assembled by integrating Nanopore long reads (126–161×), Illumina short reads (52–79×) and high-throughput chromosome conformation capture (Hi-C) data. Table 1 summarizes the assembly statistics of the seven genomes.

Table 1. Summary of detailed information in 7 genome assemblies.

| Assembly Metrics (bp) | G. herbaceum (A1a) | G. herbaceum (A1) | G. anomalum (B1) | G. sturtianum (C1) | G. stocksii (E1) | G. longicalyx (F1) | G. bickii (G1) |
|---|---|---|---|---|---|---|---|
| Total length of all contigs | 1,518,491,120 | 1,621,008,062 | 1,202,727,438 | 1,903,530,088 | 1,442,088,789 | 1,198,534,575 | 1,606,432,167 |
| Number of contigs | 759 | 848 | 162 | 564 | 126 | 148 | 154 |
| Contig N50 | 10,515,852 | 11,199,225 | 19,802,926 | 7,671,138 | 28,801,805 | 20,138,000 | 22,888,721 |
| Contig N90 | 3,279,696 | 3,426,106 | 4,795,930 | 1,970,427 | 6,479,102 | 5,689,122 | 5,951,034 |
| Minimum contig length | 1,618 | 3,584 | 254,410 | 30,321 | 39,072 | 40,977 | 31,914 |
| Average contig length | 1,785,825 | 1,910,051 | 3,836,759 | 3,375,053 | 11,445,149 | 8,098,206 | 10,431,377 |
| Maximum contig length | 46,901,260 | 44,454,407 | 10,090,100 | 31,024,449 | 64,292,180 | 56,164,400 | 74,215,947 |
| Total length of scaffolds | 1,518,510,620 | 1,514,399,279 | 1,202,738,338 | 1,903,576,288 | 1,442,098,089 | 1,198,547,475 | 1,606,445,567 |
| Number of scaffolds | 600 | 651 | 53 | 122 | 33 | 19 | 20 |
| Scaffold N50 | 123,512,226 | 124,037,455 | 98,351,605 | 156,041,810 | 115,821,386 | 95,944,508 | 132,272,514 |
| Scaffold N90 | 94,341,082 | 94,910,960 | 73,514,615 | 110,824,594 | 88,888,218 | 76,618,481 | 97,521,415 |
| Minimum scaffold length | 3,584 | 1,618 | 30,275 | 30,321 | 39,072 | 66,792 | 31,914 |
| Average scaffold length | 2,530,851 | 2,326,266 | 22,693,176 | 15,603,084 | 43,699,942 | 63,081,446 | 80,322,278 |
| Maximum scaffold length | 134,223,852 | 137,970,533 | 107,526,953 | 173,404,516 | 129,798,129 | 110,469,152 | 152,440,763 |
| Anchored length | 1,496,892,602 | 1,490,728,467 | 1,197,579,856 | 1,891,333,840 | 1,438,429,494 | 1,197,329,593 | 1,605,310,299 |
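
For readers unfamiliar with the N50 and N90 rows in Table 1, the following is a minimal Python sketch of how such metrics are commonly computed. It assumes the conventional definition (the length L such that sequences of length L or longer cover at least x% of the total); the page does not state which tool or definition produced the published values. The function name `nx` and the toy lengths are illustrative only. The last lines use the G. sturtianum (C1) scaffold total and anchored length from Table 1 to derive the anchored fraction.

```python
def nx(lengths, x=50):
    """Return the Nx length under the conventional definition:
    walk the sequences from longest to shortest and report the length
    at which the running total first reaches x% of the assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length

# Toy contig lengths (hypothetical, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 50)   # 70: 80 + 70 reaches half of 300
n90 = nx(toy_lengths, 90)   # 30: 80 + 70 + 50 + 40 + 30 reaches 270

# Fraction of the C1 scaffold length anchored to chromosomes,
# using the two C1 values from Table 1.
c1_scaffold_total = 1_903_576_288
c1_anchored = 1_891_333_840
anchored_fraction = c1_anchored / c1_scaffold_total  # about 99.4%

print(n50, n90, f"{anchored_fraction:.2%}")
```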

 

Publication

Wang M et al., Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium. Nat Genet, 2022 Dec;54(12):1959-1971

 

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium sturtianum (C1) genome are available for download below. This file belongs to the HAU G. sturtianum C1 Assembly v1.0.

Chromosomes & scaffolds (FASTA format) G.sturtianum_C1_genome_HAU.fa.gz
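
As an illustration of working with the gzipped FASTA download, here is a minimal sketch that reports the length of each sequence using only the Python standard library. The function name `fasta_lengths` is hypothetical, and the sketch assumes a conventional FASTA layout (a `>` header line followed by one or more sequence lines); the sequence names inside the actual file are not described on this page.

```python
import gzip

def fasta_lengths(path):
    """Yield (sequence_id, length) for each record in a gzipped FASTA file.
    The sequence_id is the first whitespace-delimited token of the header."""
    name, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    yield name, length
                fields = line[1:].split()
                name, length = (fields[0] if fields else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length

# Example usage (assumes the file has been downloaded to the working directory):
# for seq_id, seq_len in fasta_lengths("G.sturtianum_C1_genome_HAU.fa.gz"):
#     print(seq_id, seq_len)
```

The resulting lengths could be used to recompute summary statistics such as the number of scaffolds or the total scaffold length.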
Functional Analysis

Functional annotation files for the Gossypium sturtianum HAU Genome v1.0 are available for download below. The Gossypium sturtianum HAU Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan C1_HAU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan C1_HAU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs C1_HAU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways C1_HAU_v1_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments and proteins for the Gossypium sturtianum (C1) genome are available for download below. These files belong to the HAU G. sturtianum C1 Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.sturtianum_C1_HAU.gff3.gz
Coding sequences, CDS (FASTA format) G.sturtianum_C1_HAU.cds.fa.gz
Protein sequences (FASTA format) G.sturtianum_C1_HAU.pep.fa.gz
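
To get a quick overview of the gene model file, one option is to tally the feature types it contains. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, with the feature type in column 3); which feature types actually appear in this particular file is not stated on the page, and the function name `count_feature_types` is hypothetical.

```python
import gzip
from collections import Counter

def count_feature_types(path):
    """Count GFF3 features by type (column 3), for plain or gzipped files."""
    counts = Counter()
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section ends the annotations
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # ignore malformed rows rather than guessing
            counts[cols[2]] += 1
    return counts

# Example usage (assumes the file has been downloaded to the working directory):
# for feature_type, n in count_feature_types("G.sturtianum_C1_HAU.gff3.gz").most_common():
#     print(feature_type, n)
```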
Homology

Homology of the Gossypium sturtianum HAU genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.

Protein Homologs

G.sturtianum HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) C1_HAU_v1_vs_tair.xlsx.gz
G.sturtianum HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) C1_HAU_v1_vs_tair_hit.fasta.gz
G.sturtianum HAU Genome v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) C1_HAU_v1_vs_tair_noHit.fasta.gz
G.sturtianum HAU Genome v1.0 proteins with SwissProt homologs (EXCEL file) C1_HAU_v1_vs_swissprot.xlsx.gz
G.sturtianum HAU Genome v1.0 proteins with SwissProt homologs (FASTA file) C1_HAU_v1_vs_swissprot_hit.fasta.gz
G.sturtianum HAU Genome v1.0 proteins without SwissProt homologs (FASTA file) C1_HAU_v1_vs_swissprot_noHit.fasta.gz
G.sturtianum HAU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) C1_HAU_v1_vs_trembl.xlsx.gz
G.sturtianum HAU Genome v1.0 proteins with TrEMBL homologs (FASTA file) C1_HAU_v1_vs_trembl_hit.fasta.gz
G.sturtianum HAU Genome v1.0 proteins without TrEMBL homologs (FASTA file) C1_HAU_v1_vs_trembl_noHit.fasta.gz
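
The homology step described above (an expectation value cutoff below 1e-6, then one best hit per protein) can be sketched as follows. This is not the CottonGen pipeline: the input layout of `(query, subject, evalue, bitscore)` tuples, the function name `best_hits`, the example identifiers, and the tie-breaking rule (lowest e-value, then highest bit score) are all assumptions made for illustration.

```python
def best_hits(rows, max_evalue=1e-6):
    """Keep, per query protein, the single best hit with evalue < max_evalue.

    rows: iterable of (query, subject, evalue, bitscore) tuples
          (a hypothetical layout, e.g. parsed from tabular blastp output).
    'Best' is assumed to mean lowest e-value, ties broken by higher bit score.
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # fails the expectation value cutoff
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical identifiers and scores, for illustration only.
rows = [
    ("geneA", "AT1G01010.1", 1e-50, 180.0),
    ("geneA", "AT2G02020.1", 1e-80, 260.0),
    ("geneB", "AT3G03030.1", 1e-3, 40.0),   # above the cutoff, dropped
]
hits = best_hits(rows)
print(hits)  # geneA keeps AT2G02020.1; geneB has no hit
```

Proteins absent from the returned dictionary would correspond to the "without homologs" (noHit) files.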
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium sturtianum HAU v1.0 assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.sturtianum_C1_HAU_SNP
CottonGen RFLP markers mapped to genome G.sturtianum_C1_HAU_RFLP
CottonGen SSR markers mapped to genome G.sturtianum_C1_HAU_SSR
CottonGen InDel markers mapped to genome G.sturtianum_C1_HAU_InDel
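
The marker filtering criteria above can be expressed as a small predicate. The sketch below is only an interpretation: the `Alignment` record, its field names, and the way identity and coverage are computed are hypothetical, since the page does not say how these values were derived from the BLAT output. The thresholds (90% identity, 97% of marker length, gap limits of 1000 bp or 2 bp, fewer than 2 gaps) come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    marker_type: str       # "SSR", "RFLP", "SNP" or "InDel"
    query_length: int      # full length of the marker sequence
    aligned_length: int    # number of marker bases included in the alignment
    matches: int           # number of aligned bases that match the genome
    gap_sizes: tuple = ()  # size in bp of each gap in the alignment

# Maximum allowed gap size per marker type, as described in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}

def passes_marker_filter(aln, min_identity=0.90, min_coverage=0.97):
    """Return True if the alignment meets the stated marker criteria."""
    if aln.query_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.query_length
    identity = aln.matches / aln.aligned_length
    if coverage < min_coverage or identity < min_identity:
        return False
    if len(aln.gap_sizes) >= 2:  # "fewer than 2 gaps"
        return False
    return all(gap <= MAX_GAP_BP[aln.marker_type] for gap in aln.gap_sizes)

# Hypothetical alignments for illustration.
ssr = Alignment("SSR", query_length=200, aligned_length=196, matches=190, gap_sizes=(500,))
snp = Alignment("SNP", query_length=200, aligned_length=196, matches=190, gap_sizes=(5,))
print(passes_marker_filter(ssr), passes_marker_filter(snp))  # True False
```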
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. sturtianum genome assembly. Alignments with an alignment length of 97% and 90% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.sturtianum_C1_HAU_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.sturtianum_C1_HAU_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.sturtianum_C1_HAU_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.sturtianum_C1_HAU_g.raimondii_cottongen_reftransV1