Discovery and development of intra-specific single nucleotide polymorphism markers in Upland cotton (G. hirsutum L.)

Working group session: 
Functional Genomics
Presentation type: 
oral
Authors: 
Zhu, Qian-Hao
Presenter: 
Zhu, Qian-Hao
Correspondent: 
Zhu, Qian-Hao
Abstract: 
Qian-Hao Zhu, Andrew Spriggs, Jen Taylor, Danny Llewellyn, Iain Wilson
CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia

Single nucleotide polymorphisms (SNPs) are the most abundant type of molecular marker in plants. In tetraploid cotton, only varietal SNPs, which differ between varieties, are useful as markers in breeding; sub-genome SNPs, which merely distinguish the two sub-genomes (At and Dt), are not. Varietal SNPs have not yet been used in practical cotton breeding because they are difficult to discover, owing to low intra-specific polymorphism and very high sequence identity between homoeologous genes. Next-generation sequencing is now facilitating genome-wide SNP discovery in many crops, including cotton; however, identifying reliable varietal SNPs remains a challenge in polyploids. We used transcriptome sequencing (RNA-seq), restriction-site associated DNA (RAD) sequencing, and novel bioinformatic strategies to identify varietal SNPs among 18 commercial Upland cotton varieties. Using the RNA-seq data, we identified 37,413 varietal SNPs, based on the rationale that they can be called more confidently when flanked by genome-specific SNPs that assign reads to their respective sub-genomes. Of these SNPs, 22,121 had no additional varietal SNP within their 20-bp flanking regions and so can be used in most common SNP genotyping assays. Based on the gene annotations of G. raimondii, 40.52% and 25.39% of these SNPs had non-synonymous and synonymous effects, respectively. Approximately 2.49% affected translation start, stop or splice sites, and the remainder (~31.60%) were located in non-coding regions. From the RAD data, we identified an additional 3,090 varietal SNPs between two of the varieties.
Verification rates of 72.6-91.7% were achieved for subsets of these varietal SNPs using different genotyping platforms. Depending on the assay platform, however, many of the SNPs behave as dominant markers because of amplification from both homoeologous loci. The number of SNPs acting as co-dominant markers increases when one or more sub-genome-specific SNPs are incorporated in their assay primers, giving them greater utility for breeding applications. A G. hirsutum genetic map with 1,244 SNP markers, covering 5,557.42 cM, was constructed and used to map quantitative trait loci for leaf shape, leaf trichomes and pollen colour. Our collection of G. hirsutum varietal SNPs provides the cotton community with a valuable marker resource applicable to genetic analyses and breeding programs.
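
The abstract describes keeping only those varietal SNPs that have no other varietal SNP within their 20-bp flanking regions, so that assay primers can be designed around them. The Python sketch below illustrates one way such a filter could work. It is not the authors' pipeline: the function name `assay_ready_snps`, the representation of a SNP as a (contig, position) pair, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assay_ready_snps(snps, flank=20):
    """Keep SNPs that have no other SNP within `flank` bp on the same contig.

    `snps` is an iterable of (contig, position) pairs; this layout is an
    assumption made for the sketch, not a format taken from the abstract.
    """
    # Group positions by contig so that distances are only compared
    # between SNPs on the same sequence.
    by_contig = defaultdict(set)
    for contig, pos in snps:
        by_contig[contig].add(pos)

    kept = []
    for contig, positions in by_contig.items():
        ordered = sorted(positions)
        for i, pos in enumerate(ordered):
            # After sorting, only the immediate neighbours can be the
            # closest SNPs, so checking them is sufficient.
            left_clear = i == 0 or pos - ordered[i - 1] > flank
            right_clear = i == len(ordered) - 1 or ordered[i + 1] - pos > flank
            if left_clear and right_clear:
                kept.append((contig, pos))
    return sorted(kept)


# Hypothetical coordinates: the two SNPs 15 bp apart on contig_1 are
# dropped, while SNPs more than 20 bp from any neighbour are retained.
example = [
    ("contig_1", 100),
    ("contig_1", 115),
    ("contig_1", 200),
    ("contig_2", 50),
    ("contig_2", 71),
]
print(assay_ready_snps(example))
```

Sorting positions within each contig means each SNP is compared only with its two nearest neighbours, which keeps the filter close to linear in the number of SNPs after the sort.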