Overview
Analysis Name | Gossypium hirsutum (AD1) 'TM-1' genome NAU-NBI_v1.1 |
Method | SOAPdenovo |
Source | Illumina HiSeq 2000 reads from various insert size libraries (NAU-NBI) |
Date performed | 2015-04-20 |
About the assembly
An allohaploid plant was derived from the allotetraploid cotton (TM-1) and used for genome sequencing. 612 Gb (245× genome equivalent) of high-quality Illumina reads were produced and assembled using SOAPdenovo. The resulting contigs and scaffolds were integrated using 174,454 pairs of Sanger-sequenced BAC-end sequences comprising 116.5 Mb, and assembled into the TM-1 genome sequence (V1.0). To correct misassemblies, classify the homoeologous segments and order the scaffolds, an ultradense genetic map was developed using genotyping by sequencing of 59 F2 individuals derived from TM-1 and G. barbadense cv. Hai7124. The map consisted of 4,999,048 single-nucleotide polymorphism (SNP) loci and 4,049 recombination bins spanning 4,042 cM in 26 linkage groups. Using the map, 218 misassembled scaffolds (442.2 Mb, or 17.6%, of the genome sequence) were corrected in assembly V1.0; most of these misassemblies were caused by ambiguous homoeologous sequences. The final assembly (V1.1) comprised 265,279 contigs (N50 = 34.0 kb) and 40,407 scaffolds (N50 = 1.6 Mb). The total scaffold length (2.4 Gb) spanned ~96% of the estimated allotetraploid genome (2.5 Gb), of which 6,146 scaffolds (2.3 Gb) were aligned and organized into 26 pseudochromosomes, including 1.5 Gb (4,635 scaffolds) in the A subgenome and 0.8 Gb (1,511 scaffolds) in the D subgenome. Furthermore, 1.9 Gb (79.2%) was oriented based on linkage maps.
Summary | A subgenome | D subgenome | UN* | Total |
Scaffold number | 4,635 | 1,511 | 34,261 | 40,407 |
Scaffold length | 1,477.1 Mb | 831.0 Mb | 124.6 Mb | 2,432.7 Mb |
Scaffold N50 | 1.4 Mb | 2.5 Mb | 7,160 bp | 1.6 Mb |
Oriented scaffold number | 955 | 501 | NA | 1,456 |
Oriented scaffold size | 1,150.8 Mb | 769.5 Mb | NA | 1,920.4 Mb |
Contig number | 142,201 | 44,057 | 79,021 | 265,279 |
Contig length | 1,220.6 Mb | 746.8 Mb | 100.7 Mb | 2,068.1 Mb |
Contig N50 | 30.7 kb | 47.2 kb | 2,542 bp | 34.0 kb |
Gene number | 32,032 | 34,402 | 4,044 | 70,478 |
Total gene length | 107.2 Mb | 109.8 Mb | 3.1 Mb | 220 Mb |
Transposable elements | 843.5 Mb | 433 Mb | 62.5 Mb | 1,339 Mb |
GC content | 34.4% | 33.3% | 35.5% | 34.1% |
*Un-anchored scaffolds
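The contig and scaffold N50 values reported for this assembly follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the summed length. The following is a minimal sketch of that calculation, assuming only a list of sequence lengths; the function name `n50` and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the overall length defines the N50.
        if running >= half_total:
            return length


# Toy example (made-up lengths): total = 300, half = 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches the halfway point, so N50 = 70
```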
Publication
Zhang et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nature Biotechnology 33, 531–537 (2015).
Assembly
The chromosomes (pseudomolecules) and scaffolds for Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1
Assembly pseudomolecules (FASTA format) | NBI_Gossypium_hirsutum_v1.1.fa.gz |
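Once the pseudomolecule file has been downloaded, per-sequence lengths can be tallied with the Python standard library alone. The sketch below assumes the file is an ordinary gzip-compressed FASTA; the function name `fasta_lengths` and the local path in the usage comment are hypothetical.

```python
import gzip


def fasta_lengths(path):
    """Map each FASTA record ID to its sequence length (gzip-compressed input)."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the header text up to the first whitespace as the record ID.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


# Usage (assumes the file has been saved to the working directory):
# lengths = fasta_lengths("NBI_Gossypium_hirsutum_v1.1.fa.gz")
# print(len(lengths), sum(lengths.values()))
```

Streaming line by line keeps memory use low, which matters for a multi-gigabase assembly.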
Downloads
All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and download links.
Functional Annotation
Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by NBI)
Arabidopsis/Swissprot/TrEMBL/KEGG/Pfam/Interpro/GO Annotation | NBI_TM1-annotation.xlsx |
Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by the CottonGen Team of the Main Bioinformatics Lab at WSU)
Markers
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map marker sequences from CottonGen to the G. hirsutum genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and indels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
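The marker thresholds described above can be written as a simple filter. This is a minimal sketch under stated assumptions, not the CottonGen pipeline: the `Alignment` record, its field names, the marker-type labels, and the way identity and coverage are derived from match counts are all hypothetical, and the gap-count rule is read as "at most one gap".

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    marker_type: str   # "SSR", "RFLP", "dbSNP" or "Indel" (labels assumed)
    query_size: int    # marker sequence length in bp
    matches: int       # aligned bases identical to the genome
    mismatches: int    # aligned bases that differ from the genome
    gap_count: int     # number of gaps in the alignment
    max_gap_size: int  # size of the largest gap in bp (0 if there are none)


# Maximum allowed gap size per marker class, taken from the text above.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "dbSNP": 2, "Indel": 2}


def passes_marker_filter(aln):
    """Apply the 90% identity / 97% length / gap rules to one alignment."""
    if aln.marker_type not in MAX_GAP_BP:
        return False
    aligned = aln.matches + aln.mismatches
    if aligned == 0 or aln.query_size == 0:
        return False
    identity = aln.matches / aligned      # assumed definition of identity
    coverage = aligned / aln.query_size   # assumed definition of length coverage
    return (
        identity >= 0.90
        and coverage >= 0.97
        and aln.gap_count < 2
        and aln.max_gap_size <= MAX_GAP_BP[aln.marker_type]
    )
```

For instance, a hypothetical 200 bp SSR with 190 matches, 8 mismatches, and a single 500 bp gap passes, whereas the same alignment labeled as a dbSNP fails on gap size.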
Protein Homology
Homology of the Gossypium hirsutum NAU-NBI v1.1 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and a cutoff of less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
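A best-hit report of this kind can be approximated from tabular blastp output. The sketch below assumes the standard 12-column tabular layout (BLAST+ `-outfmt 6`), in which the e-value is column 11 and the bit score is column 12. The function name `best_hits` and the use of bit score as a tie-breaker are illustrative assumptions, not the documented CottonGen procedure.

```python
def best_hits(lines, evalue_cutoff):
    """Return {query: (subject, evalue, bitscore)} for hits below the cutoff."""
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Keep only hits strictly below the expectation value cutoff.
        if evalue >= evalue_cutoff:
            continue
        previous = best.get(query)
        # Prefer the lowest e-value; break ties with the higher bit score.
        if previous is None or (evalue, -bitscore) < (previous[1], -previous[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Usage with a hypothetical results file:
# with open("proteins_vs_nr.tsv") as handle:
#     hits = best_hits(handle, evalue_cutoff=1e-9)
```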
Publication
Authors | Tianzhen Zhang, Yan Hu, Wenkai Jiang, Lei Fang, Xueying Guan, Jiedan Chen, Jinbo Zhang, Christopher A Saski, Brian E Scheffler, David M Stelly, Amanda M Hulse-Kemp, Qun Wan, Bingliang Liu, Chunxiao Liu, Sen Wang, Mengqiao Pan, Yangkun Wang, Dawei Wang, Wenxue Ye, Lijing Chang, Wenpan Zhang, Qingxin Song, Ryan C Kirkbride, Xiaoya Chen, Elizabeth Dennis, Danny J Llewellyn, Daniel G Peterson, Peggy Thaxton, Don C Jones, Qiong Wang, Xiaoyang Xu, Hua Zhang, Huaitong Wu, Lei Zhou, Gaofu Mei, Shuqi Chen, Yue Tian, Dan Xiang, Xinghe Li, Jian Ding, Qiyang Zuo, Linna Tao, Yunchao Liu, Ji Li, Yu Lin, Yuanyuan Hui, Zhisheng Cao, Caiping Cai, Xiefei Zhu, Zhi Jiang, Baoliang Zhou, Wangzhen Guo, Ruiqiang Li & Z Jeffrey Chen |
Title | Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement |
Journal | Nature Biotechnology |
Volume | 33 |
Pages | 531–537 |
Year | 2015 |
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the G. hirsutum genome assembly. Alignments covering 97% of the transcript length with 98% identity were retained. The available files are in GFF3 format.
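Because the alignment files are distributed in GFF3, each feature line can be read as nine tab-separated columns, with the ninth holding `key=value` attributes. The following is a minimal sketch assuming well-formed GFF3 input; the function name `parse_gff3_line` and the sample line in the usage comment are made up for illustration and do not come from the CottonGen files.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for blanks and comments."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute text.
            attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Usage with a made-up feature line:
# rec = parse_gff3_line("A01\tBLAT\tmatch\t100\t250\t98.5\t+\t.\tID=aln1;Name=t1")
# print(rec["seqid"], rec["start"], rec["end"], rec["attributes"]["ID"])
```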