Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing

Working group session: 
Comparative Genomics and Bioinformatics
Presentation type: 
oral
Authors: 
Ashrafi, Hamid; Hulse, Amanda M.; Hoegenauer, Kevin; Wang, Fei; Schmutz, Jeremy; Patterson, Andrew; Udall, Joshua A.; Stelly, David M.; Van Deynze, Allen
Presenter: 
Ashrafi, Hamid
Correspondent: 
Ashrafi, Hamid
Abstract: 
Next-generation sequencing (NGS) technologies are facilitating genome-wide SNP discovery in many organisms, including crop species. Several approaches have been proposed to identify SNPs in a high-throughput fashion. Common strategies include genome-wide transcriptome sequencing, genome reduction on restriction site conservation (GR-RSC) followed by NGS, and selection of gene-enriched regions using methylation-sensitive digestion of genomic DNA followed by NGS and bioinformatic analysis. In cotton (Gossypium spp.), we used normalized transcriptome sequences generated on the Roche 454 platform and legacy Sanger EST sequences from GenBank for a hybrid de novo transcriptome assembly of upland cotton (G. hirsutum cv. TM-1). In addition, transcriptome libraries of five G. hirsutum lines, including TM-1, and of five other cotton species were sequenced on the Illumina Genome Analyzer (IGA). The hybrid TM-1 assembly served as the reference to which Illumina reads were aligned to identify SNPs among the five G. hirsutum lines and between G. hirsutum (TM-1) and each of G. barbadense, G. longicalyx, G. armourianum, G. mustelinum, and G. tomentosum. Over 10,000 putative SNP markers were identified among the five upland cotton lines and relative to the other cultivated tetraploid species, G. barbadense; similar results were obtained for the other AD species. Many more SNPs were identified for the diploid species, e.g., ~70,000 for G. longicalyx. Using GR-RSC, a combined interspecific assembly of G. hirsutum (Acala Maxxa and TX2094) and G. barbadense (Pima-S6 and K101) was developed. Within this assembly, 11,834 and 1,679 SNPs were identified in 6,467 and 965 contigs, respectively. As the third method, to assess the Floragenex Restriction Site Associated DNA (RAD) platform for intraspecific SNP development, we compared TM-1 and Acala Maxxa.
Using sequences well represented in both Illumina HiSeq samples, we identified ~1,500 simple SNPs that were monomorphic within each cultivar and close to 2,000 others that were polymorphic in one parent but not the other (presumably "hemi-SNPs" or "genome-specific polymorphisms", GSPs). The distribution of putative SNPs identified by the three technologies on the cotton D genome (G. raimondii) will be discussed. A three-way comparison of the SNP discovery methods will be presented, along with the relative advantages and disadvantages of each.
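
The distinction drawn above between simple SNPs and hemi-SNPs in the two-cultivar comparison can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names, the per-site base-count input, and the depth and allele-fraction thresholds are all hypothetical choices made for illustration. A site is called a simple SNP when each cultivar shows a single, different allele, and a hemi-SNP when one cultivar shows a single allele and the other shows that allele plus at least one more.

```python
def observed_alleles(base_counts, min_depth=8, min_fraction=0.2):
    """Return the alleles supported at one site in one sample, or None.

    base_counts maps a base to its read count. The thresholds are
    illustrative assumptions, not values taken from the abstract.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # site is not well represented in this sample
    alleles = frozenset(
        base for base, n in base_counts.items() if n / depth >= min_fraction
    )
    return alleles or None


def classify_site(counts_a, counts_b, min_depth=8, min_fraction=0.2):
    """Classify one site from the read counts of two cultivars."""
    a = observed_alleles(counts_a, min_depth, min_fraction)
    b = observed_alleles(counts_b, min_depth, min_fraction)
    if a is None or b is None:
        return "low_coverage"
    if len(a) == 1 and len(b) == 1:
        # Each cultivar is monomorphic; the site is a SNP only if they differ.
        return "simple_snp" if a != b else "invariant"
    # One cultivar is monomorphic and its allele is also present in the
    # other, polymorphic cultivar (proper-subset test on the allele sets).
    if (len(a) == 1 and a < b) or (len(b) == 1 and b < a):
        return "hemi_snp"
    return "other"


if __name__ == "__main__":
    print(classify_site({"A": 20}, {"G": 18}))
    print(classify_site({"A": 20}, {"A": 10, "G": 9}))
```

In an allotetraploid such as G. hirsutum, reads from the two subgenomes can pile up at the same reference position, which is one reason a site may look polymorphic within a single inbred cultivar; the sketch only labels the pattern and does not attempt to assign alleles to a subgenome.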