Abstract:
Transposable elements (TEs) are the most abundant DNA components in most characterized genomes of higher eukaryotes. Based on their structural features and transposition mechanisms, TEs are generally divided into two classes: retrotransposons and DNA transposons. In plants, retrotransposons are further divided into two distinct orders, long terminal repeat (LTR) retrotransposons (Ty1/Copia and Ty3/Gypsy) and non-LTR retrotransposons (LINE and SINE), whereas DNA transposons are traditionally separated into two main orders, terminal inverted repeat (TIR) (Tc1-Mariner, hAT, Mutator, PIF/Harbinger and CACTA) and Helitron (Helitron). Although TEs are often regarded as ‘junk DNA’ because of their continuous proliferation and their potential to disrupt host genes, accumulating evidence shows that they play important roles in altering gene structure, regulating gene expression, shaping genome evolution and creating new genes. Complete identification and characterization of TEs has therefore become a priority in genome sequencing projects: it contributes substantially to the accurate annotation of protein-coding genes and other genomic components, and it supports the investigation of potential interactions between TEs and functional genes.
Recently, the genomes of several diploid and tetraploid Gossypium species have been sequenced, and the availability of their draft genome sequences provides an unprecedented opportunity for the identification, structural and functional characterization, and evolutionary analysis of TEs in this economically important crop. Gossypium raimondii (DD; 2n = 26), one of the putative D-genome parents of tetraploid cotton species (such as G. hirsutum L. and G. barbadense L.), has a relatively small genome (∼737.8 Mb). We therefore characterized almost all TE families in the G. raimondii genome using a comprehensive set of methods and constructed a comprehensive, specific and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb). A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome. These elements were classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity: 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1 elements, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons. In addition, web-based sequence browsing, searching, downloading and BLAST tools were implemented to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb thus provides the first user-friendly web-based database of TEs in Gossypium species. It will facilitate the analysis of genome evolution within and across Gossypium species, the evaluation of the impact of TEs on their host genomes, and the investigation of potential interactions between TEs and protein-coding genes.
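
As a quick consistency check on the figures above, the reported superfamily counts can be tallied by class and order. The following is a minimal illustrative sketch, not part of GrTEdb: the names `TE_COUNTS` and `tally` are hypothetical, the counts are copied from the abstract, and placing L1 under the non-LTR (LINE) order is an assumption based on the classification described above.

```python
from collections import defaultdict

# Superfamily -> (class, order, count). Counts are those reported in the
# abstract; the class/order labels follow the classification it describes.
# Assumption: L1 is treated as a non-LTR (LINE) retrotransposon.
TE_COUNTS = {
    "Copia":         ("retrotransposon", "LTR",     2929),
    "Gypsy":         ("retrotransposon", "LTR",     10368),
    "L1":            ("retrotransposon", "non-LTR", 299),
    "Mutator":       ("DNA transposon",  "TIR",     12),
    "PIF-Harbinger": ("DNA transposon",  "TIR",     435),
    "CACTA":         ("DNA transposon",  "TIR",     275),
    "Helitron":      ("DNA transposon",  "Helitron", 14),
}


def tally(level):
    """Sum element counts per class (level=0) or per order (level=1)."""
    totals = defaultdict(int)
    for labels in TE_COUNTS.values():
        totals[labels[level]] += labels[2]
    return dict(totals)


total = sum(count for _, _, count in TE_COUNTS.values())
by_class = tally(0)
by_order = tally(1)

if __name__ == "__main__":
    print("total annotated TEs:", total)
    for name, n in by_class.items():
        print(f"{name}: {n} ({100 * n / total:.1f}% of annotated TEs)")
    for name, n in by_order.items():
        print(f"  order {name}: {n}")
```

The seven counts sum to the stated total of 14 332, and LTR retrotransposons (Copia-like plus Gypsy-like elements) account for the large majority of the annotated elements.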