Comparative and functional analyses of the cotton A- and D-genomes

Working group session: 
Breeding and Applied Genomics
Presentation type: 
oral
Authors: 
Li, Qin; Zhu, Yu-Xian
Presenter: 
Zhu, Yu-Xian
Correspondent: 
Zhu, Yu-Xian
Abstract: 
Upland cotton (Gossypium hirsutum) is an allotetraploid species that possesses two distinct but closely related sub-genomes (AADD, 2n = 52). Here we present the first sequenced genome of a cultivated cotton, G. arboreum (AA, 2n = 26), also known as “tree cotton” or “Asian cotton”. The assembled A-genome is 1,694 Mb in size, and more than 90% of the assembly is anchored to 13 chromosomes. G. arboreum contains 41,330 protein-coding genes, and 68% of the genome consists of repetitive sequences. The A- and D-genomes share 95% of their gene families and exhibit high collinearity at the chromosome level, although the A-genome is more than twice the size of the D-genome of G. raimondii (1,694 Mb compared with 775 Mb). Evolutionary analysis showed that G. arboreum and G. raimondii diverged about 5 million years ago (MYA), after a shared cotton-specific whole-genome duplication event about 13–30 MYA. Phylogenetic and evolutionary analyses indicated that the A-genome has undergone multiple episodes of retrotransposition activity over the past 10 million years. Comparative transcriptomic analysis revealed that expression levels of ACO (1-aminocyclopropane-1-carboxylic acid oxidase), the key enzyme in the ethylene biosynthesis pathway, differed widely between the A- and D-genomes. Neither the extremely high ethylene level in the D-genome nor the extremely low level in the A-genome was beneficial to cotton fiber development. Further analysis showed that the disease-resistance gene (R-gene) families were greatly expanded in the D-genome but contracted in the A-genome. The expanded genes in G. raimondii responded positively to Verticillium infection at the transcriptional level, whereas in G. arboreum only one gene showed mild up-regulation.