Identification of single nucleotide polymorphisms from the EST data of Gossypium hirsutum

Working group session: 
Comparative Genomics and Bioinformatics
Presentation type: 
oral
Authors: 
Saeed, Muhammad ; Liaquat, Anum
Presenter: 
Saeed, Muhammad
Correspondent: 
Saeed, Muhammad
Abstract: 
Single nucleotide polymorphisms (SNPs) are the markers of choice. SNPs can be identified experimentally by sequencing or in silico. Identification through sequencing is expensive and time-consuming, whereas in silico identification is a cost-effective and efficient approach. In the present study, an in silico approach was adopted to identify SNPs in Gossypium hirsutum L. using publicly available expressed sequence tag (EST) data. EST sequences of the cultivar Bikaneri narma, derived from fibre tissue, were downloaded from the Gossypium hirsutum EST database. Sequences similar to each EST were identified by BLASTN search, and about 250 similar sequences per EST were retrieved from GenBank. The online SNP identification tool HaploSNPer, which incorporates the CAP3 program for contig construction, was used to identify SNPs from 499 EST sequences of Gossypium hirsutum cv. Bikaneri narma. The maximum number of potential SNPs (1051) was identified for the EST sequence JG453752.1, and the maximum number of reliable SNPs (718) was identified for the EST sequence JG453733.1. Across the 499 EST sequences, a total of 15809 potential SNPs were discovered, and the total number of reliable SNPs was 9990. Among the potential SNPs, the numbers of transitions, transversions and InDels were 7272, 7857 and 2336, respectively. Of the 7272 transitions, 3457 were C/T and 3815 were A/G. Of the 7857 transversions, 2845 were A/T, 1875 were A/C, 1185 were C/G, and 1952 were T/G.
Among the 9990 reliable SNPs, the numbers of transitions, transversions and InDels were 4702, 3919 and 930, respectively. Of the 4702 transitions, 2425 were C/T and 2277 were A/G. Of the 3919 transversions, 1368 were A/T, 920 were A/C, 673 were C/G, and 958 were T/G. This analysis shows that transitions were the most frequent class among the reliable SNPs, followed by transversions, while InDels were markedly fewer. SNP markers reveal polymorphism at the single-nucleotide level. The identified SNPs will be a valuable resource for the genetic evaluation of existing cotton germplasm for fibre potential. These SNPs can be employed in molecular breeding efforts to develop elite cotton cultivars.
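
The grouping of SNPs into transitions, transversions and InDels reported above can be illustrated with a minimal Python sketch. It is not the HaploSNPer implementation and does not read HaploSNPer output. The function names (classify_snp, tally_snp_classes) and the use of "-" as a gap symbol are assumptions made for illustration only. A transition exchanges two purines (A/G) or two pyrimidines (C/T); any purine/pyrimidine exchange is a transversion.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP = "-"  # assumed gap symbol for an InDel; not taken from any tool's output


def classify_snp(allele_a: str, allele_b: str) -> str:
    """Classify a biallelic variant as 'transition', 'transversion' or 'indel'."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b:
        raise ValueError("alleles are identical, so there is no polymorphism")
    if GAP in (a, b):
        return "indel"
    bases = PURINES | PYRIMIDINES
    if a not in bases or b not in bases:
        raise ValueError(f"unexpected allele pair: {allele_a}/{allele_b}")
    # Both alleles in the same chemical class means a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally_snp_classes(allele_pairs):
    """Count variant classes over an iterable of (allele_a, allele_b) pairs."""
    return Counter(classify_snp(a, b) for a, b in allele_pairs)


if __name__ == "__main__":
    # Hypothetical allele pairs, for demonstration only.
    example_pairs = [("C", "T"), ("A", "G"), ("A", "T"), ("T", "G"), ("-", "A")]
    print(tally_snp_classes(example_pairs))
```

Under this definition, the C/T and A/G counts fall into the transition class, and the A/T, A/C, C/G and T/G counts fall into the transversion class.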