Gossypium bickii (G1) genome JZU_v1

Overview
Analysis Name: Gossypium bickii (G1) genome JZU_v1
Method: PacBio, Illumina, Hi-C
Source (v1.0)
Date performed: 2022-08-04

About the assembly

This page describes the Gossypium bickii (G1) genome assembly. K-mer distribution analysis estimated the genome size at 1,706.79 Mb, with 0.18% heterozygosity and 74.19% repetitive sequences. For the initial assembly, a total of 248.64 Gb of PacBio long reads (approximately 142.08-fold genome coverage) was generated and assembled into a 1,765.96 Mb genome comprising 1,574 contigs with a contig N50 of 4.62 Mb. The assembled contigs were then polished by aligning the PacBio long reads to the initial assembly. To increase consensus accuracy, the assembly was further corrected using 299.22 Gb of high-quality Illumina reads (about 176.01-fold genome coverage). To construct the chromosome-level genome, 277.57 Gb of high-quality Hi-C data (915.85 million read pairs, about 163.27-fold genome coverage) were used to cluster and order the assembled contigs. The Hi-C heatmap shows 13 distinct chromosomal groups. Finally, approximately 1,704.25 Mb of the assembly (96.51% of the total assembled genome size) was anchored into 13 pseudochromosomes. The final assembly comprises 445 scaffolds with a scaffold N50 of 133.90 Mb and a total length of 1,766.07 Mb.
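The coverage and anchoring figures in the paragraph above are simple ratios. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and not part of the source. It assumes the anchoring percentage is taken relative to the total contig length (1,765.96 Mb), since that denominator reproduces the reported 96.51%, and it uses the k-mer estimate (1,706.79 Mb) as an assumed denominator for coverage, which may differ from the genome size the authors used.

```python
def fold_coverage(sequenced_gb: float, genome_mb: float) -> float:
    """Fold coverage = sequenced bases / genome size (Gb converted to Mb)."""
    return sequenced_gb * 1000.0 / genome_mb


def percent_anchored(anchored_mb: float, assembly_mb: float) -> float:
    """Share of the assembly placed on pseudochromosomes, in percent."""
    return 100.0 * anchored_mb / assembly_mb


# Values reported in the assembly description.
anchored = percent_anchored(1704.25, 1765.96)
print(f"anchored: {anchored:.2f}%")

# Coverage depends on the genome size chosen as the denominator; the
# k-mer estimate (1,706.79 Mb) is an assumption made for this sketch.
print(f"Illumina coverage: {fold_coverage(299.22, 1706.79):.1f}x")
```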

 Assembly Summary G. bickii 
 Total length of contigs (Mb) 1765.96
 Total length of scaffolds (Mb) 1766.07
 Proportion of anchored sequences (%) 96.5
 Number of contigs 1574
 Contig N50 (Mb) 4.62
 Number of scaffolds 445
 Scaffold N50 (Mb) 133.9
 Number of genes 43790
 TE repeats (%) 70.22
 GC content (%)  36.01
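
The contig and scaffold N50 values in the summary follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation, using made-up toy lengths rather than the actual assembly, might look like this:

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy example with made-up lengths (not taken from the assembly).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

Applied to the sequence lengths of the assembly FASTA file, the same function would be expected to give values comparable to those in the table, depending on how contigs and scaffolds are delimited.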

Publication - Sheng K, Sun Y, Liu M, Cao Y, Han Y, Li C, Muhammad U, Daud MK, Wang W, Li H, Samrana S, Hui Y, Zhu S*, Chen J*, Zhao T*. A reference-grade genome assembly for Gossypium bickii and insights into its genome evolution and formation of pigment glands and gossypol. Plant Communications, Volume 4, Issue 1, 9 January 2023, 100421.

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium bickii (G1) genome. This file belongs to the ZJU Assembly v1.0.

Chromosomes & scaffolds (FASTA format) G.bickii_G1_ZJU_v1.0.fa.gz
Downloads

All annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Functional Analysis

Functional annotation files for the Gossypium bickii ZJU Genome v1.0 are available for download below. The Gossypium bickii ZJU Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan G1_ZJU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan G1_ZJU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs G1_ZJU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways G1_ZJU_v1_KEGG-pathways.xlsx.gz

 

Genes

The predicted gene models, their alignments, and proteins for the Gossypium bickii (G1) genome. These files belong to the ZJU Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.bickii_G1_ZJU_v1.0.gff3.gz
Coding sequences, CDS (FASTA format) G.bickii_G1_ZJU_v1.0.cds.fa.gz
Protein sequences (FASTA format) G.bickii_G1_ZJU_v1.0.pep.fa.gz
Function (TXT format) G.bickii_G1_ZJU_v1.0.function.txt.gz
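
As a quick sanity check on a downloaded gene model file, features can be counted by type. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3); the function name is illustrative, and which feature types the file actually contains is an assumption to verify against the file itself.

```python
import gzip
from collections import Counter


def count_feature_types(gff3_path):
    """Count GFF3 features by type (column 3), skipping comments and blanks."""
    counts = Counter()
    # Accept both gzip-compressed and plain-text files.
    opener = gzip.open if gff3_path.endswith(".gz") else open
    with opener(gff3_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts
```

For example, `count_feature_types("G.bickii_G1_ZJU_v1.0.gff3.gz")` returns a counter keyed by feature type. If genes are recorded as `gene` features, that count could be compared with the number of genes reported in the assembly summary.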
Homology

Homology of the Gossypium bickii ZJU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2021-09), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
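
The best-hit selection described above can be illustrated with a short sketch. The distributed reports are Excel files; this example instead assumes BLAST tabular output with the 12 default columns (E-value in column 11, bit score in column 12) and picks the best hit by bit score. Both choices are assumptions, not the documented pipeline. The sequence IDs in the toy data are hypothetical.

```python
import csv


def best_hits(tabular_lines, evalue_cutoff):
    """Keep, per query, the highest-bitscore hit with E-value below the cutoff."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip comments and malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy records with hypothetical IDs: two hits pass 1e-6, one does not.
toy = [
    "protA\tsubj1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
    "protA\tsubj2\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-60\t250",
    "protB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t35",
]
hits = best_hits(toy, evalue_cutoff=1e-6)
print(hits)
```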

 

Protein Homologs

G.bickii ZJU Genome v1.0 proteins with NCBI nr homologs (EXCEL file) G1_ZJU_v1_vs_nr.xlsx.gz
G.bickii ZJU Genome v1.0 proteins with NCBI nr (FASTA file) G1_ZJU_v1_vs_nr_hit.fasta.gz
G.bickii ZJU Genome v1.0 proteins without NCBI nr (FASTA file) G1_ZJU_v1_vs_nr_noHit.fasta.gz
G.bickii ZJU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) G1_ZJU_v1_vs_tair.xlsx.gz
G.bickii ZJU Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) G1_ZJU_v1_vs_tair_hit.fasta.gz
G.bickii ZJU Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) G1_ZJU_v1_vs_tair_noHit.fasta.gz
G.bickii ZJU Genome v1.0 proteins with SwissProt homologs (EXCEL file) G1_ZJU_v1_vs_swissprot.xlsx.gz
G.bickii ZJU Genome v1.0 proteins with SwissProt (FASTA file) G1_ZJU_v1_vs_swissprot_hit.fasta.gz
G.bickii ZJU Genome v1.0 proteins without SwissProt (FASTA file) G1_ZJU_v1_vs_swissprot_noHit.fasta.gz
G.bickii ZJU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) G1_ZJU_v1_vs_trembl.xlsx.gz
G.bickii ZJU Genome v1.0 proteins with TrEMBL (FASTA file) G1_ZJU_v1_vs_trembl_hit.fasta.gz
G.bickii ZJU Genome v1.0 proteins without TrEMBL (FASTA file) G1_ZJU_v1_vs_trembl_noHit.fasta.gz

 

Markers
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium bickii ZJU genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
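
The marker filtering criteria above can be written as a small predicate. This is a sketch of one reading of those thresholds, not the CottonGen pipeline itself: the record type and field names are hypothetical, the 90% and 97% values are treated as minimums, and the 2 bp limit for SNPs and InDels is treated as "2 bp or less".

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker alignment (not a BLAT output format)."""
    marker_type: str             # "SSR", "RFLP", "SNP" or "InDel"
    identity_pct: float          # percent identity of the alignment
    aligned_length_pct: float    # percent of the marker length aligned
    gap_count: int               # number of gaps in the alignment
    max_gap_bp: int              # size of the largest gap, in bp


# Largest permitted gap per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_marker_filter(aln: MarkerAlignment) -> bool:
    """Apply the identity, length, and gap thresholds to one alignment."""
    return (
        aln.identity_pct >= 90
        and aln.aligned_length_pct >= 97
        and aln.gap_count < 2
        and aln.max_gap_bp <= MAX_GAP_BP[aln.marker_type]
    )


# An SSR with one 500 bp gap passes; a SNP with the same gap does not.
print(passes_marker_filter(MarkerAlignment("SSR", 95.0, 98.0, 1, 500)))
print(passes_marker_filter(MarkerAlignment("SNP", 95.0, 98.0, 1, 500)))
```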
 
CottonGen SNP markers mapped to genome G.bickii_G1_ZJU_SNP
CottonGen RFLP markers mapped to genome G.bickii_G1_ZJU_RFLP
CottonGen SSR markers mapped to genome G.bickii_G1_ZJU_SSR
CottonGen InDel markers mapped to genome G.bickii_G1_ZJU_InDel

 

Publication

Sheng K, Sun Y, Liu M, Cao Y, Han Y, Li C, Muhammad U, Daud MK, Wang W, Li H, Samrana S, Hui Y, Zhu S, Chen J, Zhao T. A reference-grade genome assembly for Gossypium bickii and insights into its genome evolution and formation of pigment glands and gossypol. Plant Communications. 2022 (online). doi: 10.1016/j.xplc.2022.100421

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. bickii genome assembly. Alignments with an alignment length of 97% and 90% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.bickii_G1_ZJU_g.arboreum_cottongen_refTransV1
G. hirsutum CottonGen RefTrans v1 G.bickii_G1_ZJU_g.hirsutum_cottongen_refTransV1
G. barbadense CottonGen RefTrans v1 G.bickii_G1_ZJU_g.barbadense_cottongen_refTransV1
G. raimondii CottonGen RefTrans v1 G.bickii_G1_ZJU_g.raimondii_cottongen_refTransV1