Gossypium raimondii (D5) 'Grai D502' genome HAU_v1

Overview
Analysis Name: Gossypium raimondii (D5) 'Grai D502' genome HAU_v1
Method: Illumina and PacBio
Source
Date performed: 2021-08-09

About the assembly

In this study, we applied Oxford Nanopore sequencing technology to assemble the G. rotundifolium (K2*) 'K201', G. arboreum (A2) 'SXY1' and G. raimondii (D5) 'D502' genomes. The G. arboreum and G. raimondii genomes were previously assembled de novo using Illumina and PacBio reads, but both assemblies contain a number of sequence gaps and need improved contiguity. We generated a total of 304 Gb, 212 Gb and 125 Gb of Nanopore sequencing data, corresponding to a genome coverage of 124×, 131× and 167× for K2*, A2 and D5, respectively.

We assembled 3,593, 1,173 and 366 contigs for G. rotundifolium, G. arboreum and G. raimondii, with a total contig length of 2.44 Gb, 1.62 Gb and 0.75 Gb, respectively (Table 1). These initial contigs were polished using Illumina paired-end reads with a genome coverage of 108×, 118× and 132× for K2*, A2 and D5. The contig N50 is 5.33 Mb, 11.69 Mb and 17.04 Mb for K2*, A2 and D5, respectively, and the longest contig is 32.72 Mb, 58.57 Mb and 43.74 Mb.

After polishing the contigs with Illumina reads, we used high-throughput chromosome conformation capture (Hi-C) data to order and orient them into pseudochromosomes for each species. In the Hi-C-assisted assembly, 2,559, 485 and 201 contigs were placed on the 13 chromosomes of the K2*, A2 and D5 genomes, accounting for over 99% of the genome length.

*Should be K12
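
The contig N50 and N90 values quoted above follow the usual definition: the length L such that sequences of length L or longer together cover at least 50% (or 90%) of the total assembly length. A minimal sketch of that calculation is given here; the function name `n_stat` and the toy lengths are illustrative assumptions, not part of the published pipeline.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * fraction
    running = 0
    # Walk from the longest sequence down until the running total reaches the threshold.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy example with made-up lengths (not taken from the assembly).
toy_contigs = [100, 80, 60, 40, 20]
print("N50:", n_stat(toy_contigs))
print("N90:", n_stat(toy_contigs, 0.9))
```

Applied to the contig lengths of a real assembly, the same function would be expected to reproduce the N50 and N90 rows of Table 1.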

 

Table 1. Summary of genome assemblies and annotations of G. rotundifolium, G. arboreum and G. raimondii.

Genomic feature                       G. rotundifolium 'Grot K201'  G. arboreum 'Shixiya1'  G. raimondii 'Grai D502'
Total length of contigs, bp           2,444,364,209                 1,621,008,062           750,197,587
Total length of scaffolds, bp         2,444,484,509                 1,621,030,562           750,205,487
Total length of gaps, bp              120,300                       22,500                  7,900
Percentage of anchoring               99.28%                        99.47%                  99.57%
Percentage of anchoring and ordering  93.16%                        98.84%                  99.01%
Number of contigs                     3,593                         1,173                   366
Number of scaffolds                   2,390                         948                     287
Contig N50, bp                        5,326,689                     11,691,474              17,043,680
Contig N90, bp                        621,066                       2,910,421               3,537,560
Scaffold N50, bp                      177,839,665                   129,592,444             57,716,579
Scaffold N90, bp                      115,394,628                   93,157,762              49,929,625
Maximum contig length, bp             32,728,186                    58,575,076              43,739,617
Maximum scaffold length, bp           205,722,655                   143,367,608             63,188,200
GC content                            36.38%                        35.16%                  33.23%
Percentage of repeat sequences        80.92%                        68.05%                  57.04%
Number of genes                       41,590                        41,778                  40,820
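
The length rows of Table 1 can be cross-checked against each other: the scaffold total minus the contig total should equal the gap total, and dividing that by the number of contig joins (contigs minus scaffolds) gives the average gap size. The sketch below runs this arithmetic on the values copied from the table. The dictionary layout and the function name `gap_summary` are illustrative, and reading the result as a fixed-size placeholder gap per join is an inference from the numbers, not something the source states.

```python
# Values copied from Table 1 (lengths in bp, counts as given).
TABLE_1 = {
    "G. rotundifolium": dict(contig_bp=2_444_364_209, scaffold_bp=2_444_484_509,
                             gap_bp=120_300, contigs=3_593, scaffolds=2_390),
    "G. arboreum": dict(contig_bp=1_621_008_062, scaffold_bp=1_621_030_562,
                        gap_bp=22_500, contigs=1_173, scaffolds=948),
    "G. raimondii": dict(contig_bp=750_197_587, scaffold_bp=750_205_487,
                         gap_bp=7_900, contigs=366, scaffolds=287),
}


def gap_summary(row):
    """Return (gap bp implied by the length totals, number of joins, bp per join)."""
    implied_gap_bp = row["scaffold_bp"] - row["contig_bp"]
    joins = row["contigs"] - row["scaffolds"]
    return implied_gap_bp, joins, implied_gap_bp / joins


for species, row in TABLE_1.items():
    implied, joins, per_join = gap_summary(row)
    matches = implied == row["gap_bp"]
    print(f"{species}: gaps match table = {matches}, joins = {joins}, bp per join = {per_join}")
```

For all three species the implied gap length matches the tabulated value and works out to 100 bp per join.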

 

Supplementary Table 5. Comparison of the D5 genome with previously published genome versions.

Genomic feature                       HAU_D5       NSF_D5       JGI_D5
Total assembled size, bp              750,197,587  734,884,094  761,406,121
Number of total scaffolds             287          -            1,033
Total length of gaps, bp              7,900        17,400       1,870,200
Contig N50, bp                        17,043,680   6,291,832    136,998
Scaffold N50, bp                      57,716,579   58,819,159   62,175,169
Scaffold N90, bp                      49,929,625   46,322,098   45,765,648
Percentage of anchoring and ordering  99.01%       -            98.4%
Number of genes                       40,820       41,030       37,505

 

Publication

Wang M, et al. Comparative genome analyses highlight transposon-mediated genome expansion and the evolutionary architecture of 3D genomic folding in cotton. Molecular Biology and Evolution. 2021 May 11.

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium raimondii (D5) genome. This file belongs to the HAU G. raimondii Assembly v1.0.

Chromosomes & scaffolds (FASTA format) G.raimondii_HAU.fa.gz
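
As a starting point for working with the assembly file, the sketch below reads a gzip-compressed FASTA file with the Python standard library and reports the length of each sequence. It assumes the file has been downloaded to the working directory under the name listed above; the function name `fasta_lengths` is illustrative.

```python
import gzip
import os


def fasta_lengths(path):
    """Yield (sequence_id, length) for each record in a gzip-compressed FASTA file."""
    name, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif line:
                length += len(line)
    if name is not None:
        yield name, length


# Assumed local path; the loop is skipped if the file has not been downloaded.
ASSEMBLY = "G.raimondii_HAU.fa.gz"
if os.path.exists(ASSEMBLY):
    for seq_id, seq_len in fasta_lengths(ASSEMBLY):
        print(seq_id, seq_len)
```
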
Functional Analysis

Functional annotation files for the Gossypium raimondii HAU Genome v1.0 are available for download below. The Gossypium raimondii HAU Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan D5_HAU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan D5_HAU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs D5_HAU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways D5_HAU_v1_KEGG-pathways.xlsx.gz

 

Genes

The predicted gene models, their alignments and proteins for the Gossypium raimondii (D5) genome. These files belong to the HAU G. raimondii Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.raimondii_HAU.gff3.gz
Coding sequences, CDS (FASTA format) G.raimondii_HAU.cds.fa.gz
Protein sequences (FASTA format) G.raimondii_HAU.pep.fa.gz
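
A quick sanity check on the annotation file is to count features by type. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3, an optional `##FASTA` directive before embedded sequences). The function name and the local path are assumptions, and the feature type names actually used in this file have not been checked here.

```python
import gzip
import os
from collections import Counter


def count_feature_types(path):
    """Count GFF3 features by type (column 3) in a gzip-compressed file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # anything after this directive is sequence, not features
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


# Assumed local path; the report is skipped if the file has not been downloaded.
ANNOTATION = "G.raimondii_HAU.gff3.gz"
if os.path.exists(ANNOTATION):
    for feature_type, n in count_feature_types(ANNOTATION).most_common():
        print(feature_type, n)
```

If the file uses the conventional `gene` feature type, its count would be expected to agree with the 40,820 genes reported for D5 in Table 1.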
Homology

Homology of the Gossypium raimondii HAU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2021-09), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09) and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
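
The per-database cutoffs described above can be written as a small lookup table. The sketch below applies them to tabular BLAST output (`-outfmt 6`, where column 11 is the E-value) and keeps one best hit per query. The database keys, the function name and the choice of lowest E-value as the "best hit" criterion are assumptions made for illustration; the source does not say how the best hit was selected.

```python
# E-value cutoffs from the text; the dictionary keys are illustrative labels.
EVALUE_CUTOFFS = {
    "nr": 1e-9,
    "araport11": 1e-6,
    "swissprot": 1e-6,
    "trembl": 1e-6,
}


def best_hits(rows, database):
    """Return {query: (subject, evalue)} for hits below the database cutoff.

    `rows` is an iterable of tab-separated BLAST outfmt 6 lines.
    """
    cutoff = EVALUE_CUTOFFS[database]
    best = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue  # the text specifies a cutoff of "less than" the threshold
        # Assumption: the best hit is the one with the lowest E-value.
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Toy rows with made-up identifiers and scores.
toy_rows = [
    "gene1\tsubjA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-20\t200",
    "gene1\tsubjB\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-30\t250",
    "gene2\tsubjC\t60.0\t80\t32\t0\t1\t80\t1\t80\t1e-7\t50",
]
print(best_hits(toy_rows, "nr"))
print(best_hits(toy_rows, "araport11"))
```

With the toy rows, the second query passes the 1e-6 cutoff but not the stricter 1e-9 cutoff used for nr.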

 

Protein Homologs

G.raimondii HAU Genome v1.0 proteins with NCBI nr homologs (EXCEL file) D5_HAU_v1_vs_nr.xlsx.gz
G.raimondii HAU Genome v1.0 proteins with NCBI nr (FASTA file) D5_HAU_v1_vs_nr_hit.fasta.gz
G.raimondii HAU Genome v1.0 proteins without NCBI nr (FASTA file) D5_HAU_v1_vs_nr_noHit.fasta.gz
G.raimondii HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) D5_HAU_v1_vs_tair.xlsx.gz
G.raimondii HAU Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) D5_HAU_v1_vs_tair_hit.fasta.gz
G.raimondii HAU Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) D5_HAU_v1_vs_tair_noHit.fasta.gz
G.raimondii HAU Genome v1.0 proteins with SwissProt homologs (EXCEL file) D5_HAU_v1_vs_swissprot.xlsx.gz
G.raimondii HAU Genome v1.0 proteins with SwissProt (FASTA file) D5_HAU_v1_vs_swissprot_hit.fasta.gz
G.raimondii HAU Genome v1.0 proteins without SwissProt (FASTA file) D5_HAU_v1_vs_swissprot_noHit.fasta.gz
G.raimondii HAU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) D5_HAU_v1_vs_trembl.xlsx.gz
G.raimondii HAU Genome v1.0 proteins with TrEMBL (FASTA file) D5_HAU_v1_vs_trembl_hit.fasta.gz
G.raimondii HAU Genome v1.0 proteins without TrEMBL (FASTA file) D5_HAU_v1_vs_trembl_noHit.fasta.gz

 

Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium raimondii HAU genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
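
The filtering rules above can be summarized as a single predicate. The sketch below is one possible reading of them, not the CottonGen implementation: the `MarkerAlignment` record, its field names, and the way identity and coverage are derived from match counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str      # "SSR", "RFLP", "SNP" or "InDel"
    marker_length: int    # length of the marker sequence, bp
    aligned_length: int   # marker bases covered by the alignment, bp
    matches: int          # identical bases within the aligned region
    gap_count: int        # number of gaps in the alignment
    max_gap_bp: int       # size of the largest gap, bp


# Maximum gap size per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_filter(aln):
    """Apply the 90% identity / 97% length / gap rules to one alignment."""
    if aln.marker_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.marker_length
    identity = aln.matches / aln.aligned_length
    return (
        coverage >= 0.97
        and identity >= 0.90
        and aln.gap_count < 2
        and aln.max_gap_bp <= MAX_GAP_BP[aln.marker_type]
    )


# Toy records with made-up numbers.
ssr = MarkerAlignment("SSR", 200, 198, 190, 1, 500)
snp = MarkerAlignment("SNP", 200, 198, 190, 1, 500)
print(passes_filter(ssr), passes_filter(snp))
```

The same 500 bp gap is acceptable for an SSR but not for a SNP, which is the only point at which the marker type changes the outcome.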
 
CottonGen SNP markers mapped to genome G.raimondii_HAU-D5_SNP
CottonGen RFLP markers mapped to genome G.raimondii_HAU-D5_RFLP
CottonGen SSR markers mapped to genome G.raimondii_HAU-D5_SSR
CottonGen InDel markers mapped to genome G.raimondii_HAU-D5_InDel

 

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. raimondii genome assembly. Alignments covering 97% of the transcript length with 90% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.raimondii_HAU-D5_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.raimondii_HAU-D5_G.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.raimondii_HAU-D5_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.raimondii_HAU-D5_g.raimondii_cottongen_reftransV1