191 research outputs found

    Phylogenomics and the dynamic genome evolution of the genus Streptococcus

    The genus Streptococcus comprises important pathogens that have a severe impact on human health and are responsible for substantial economic losses to agriculture. Here, we utilize 46 Streptococcus genome sequences (44 species), including eight species sequenced here, to provide the first genomic level insight into the evolutionary history and genetic basis underlying the functional diversity of all major groups of this genus. Gene gain/loss analysis revealed a dynamic pattern of genome evolution characterized by an initial period of gene gain followed by a period of loss, as the major groups within the genus diversified. This was followed by a period of genome expansion associated with the origins of the present extant species. The pattern is concordant with an emerging view that genomes evolve through a dynamic process of expansion and streamlining. A large proportion of the pan-genome has experienced lateral gene transfer (LGT) with causative factors, such as relatedness and shared environment, operating over different evolutionary scales. Multiple gene ontology terms were significantly enriched for each group, and mapping terms onto the phylogeny showed that those corresponding to genes born on branches leading to the major groups represented approximately one-fifth of those enriched. Furthermore, despite the extensive LGT, several biochemical characteristics have been retained since group formation, suggesting genomic cohesiveness through time, and that these characteristics may be fundamental to each group. For example, proteolysis: mitis group; urea metabolism: salivarius group; carbohydrate metabolism: pyogenic group; and transcription regulation: bovis group

    The receptor like kinase at Rhg1-a/Rfs2 caused pleiotropic resistance to sudden death syndrome and soybean cyst nematode as a transgene by altering signaling responses

    Background: Soybean (Glycine max (L.) Merr.) resistance to any population of Heterodera glycines (I.), or Fusarium virguliforme (Aoki, O’Donnell, Homma & Lattanzi) required a functional allele at Rhg1/Rfs2. H. glycines, the soybean cyst nematode (SCN), was an ancient, endemic pest of soybean, whereas F. virguliforme, the causal agent of sudden death syndrome (SDS), was a recent, regional pest. This study examined the role of a receptor-like kinase (RLK) GmRLK18-1 (gene model Glyma_18_02680 at 1,071 kbp on chromosome 18 of the genome sequence) within the Rhg1/Rfs2 locus in causing resistance to SCN and SDS. Results: A BAC (B73p06) encompassing the Rhg1/Rfs2 locus was sequenced from a resistant cultivar and compared to the sequences of two susceptible cultivars, among which 800 SNPs were found. Sequence alignments inferred that the resistance allele was an introgressed region of about 59 kbp, at the center of which GmRLK18-1 was the most polymorphic gene and encoded protein. Analyses were made of plants that were either heterozygous at, or transgenic (and so hemizygous at a new location) with, the resistance allele of GmRLK18-1. Those plants infested with either H. glycines or F. virguliforme showed that the allele for resistance was dominant. In the absence of Rhg4, GmRLK18-1 was sufficient to confer nearly complete resistance to both root and leaf symptoms of SDS caused by F. virguliforme and provided partial resistance to three different populations of nematodes (mature female cysts were reduced by 30–50%). In the presence of Rhg4, the plants with the transgene were nearly classed as fully resistant to SCN (females reduced to 11% of the susceptible control) as well as SDS. A reduction in the rate of early seedling root development was also shown to be caused by the resistance allele of GmRLK18-1. Field trials of transgenic plants showed an increase in foliar susceptibility to insect herbivory.
Conclusions: The inference that soybean has adapted part of an existing pathogen recognition and defense cascade (H. glycines; SCN and insect herbivory) to a new pathogen (F. virguliforme; SDS) has broad implications for crop improvement. Stable resistance to many pathogens might be achieved by manipulating the genes encoding a small number of pathogen recognition proteins

    Genomic Resources for Asparagales

    Enormous genomic resources have been developed for plants in the monocot order Poales; however, it is not known how useful these resources will be for other economically important monocots. Asparagales are a monophyletic order sister to class Commelinanae that carries Poales, and is the second most economically important monocot order. Development of genomic resources for and their application to Asparagales are challenging because of huge nuclear genomes and the relatively long generation times required to develop segregating families. We synthesized a normalized cDNA library of onion (Allium cepa) and produced 11,008 unique expressed sequence tags (ESTs) for comparative genomic analyses of Asparagales and Poales. Alignments of onion ESTs, Poales ESTs, and genomic sequences from rice were used to design oligonucleotide primers amplifying genomic regions from asparagus, garlic, and onion. Sequence analyses of these genomic regions revealed microsatellites, insertions/deletions, and single nucleotide polymorphisms for comparative mapping of rice and Asparagales vegetables. Initial mapping revealed no obvious synteny at the recombinational level between onion and rice, indicating that genomic resources developed for Poales may not be applicable to the monocots as a whole. Genomic analyses of Asparagales would greatly benefit from EST sequencing and deep-coverage, large-insert genomic libraries of representative small-genome model species within the higher and lower Asparagales, such as asparagus and orchid, respectively

    Population gene introgression and high genome plasticity for the zoonotic pathogen Streptococcus agalactiae

    The influence that bacterial adaptation (or niche partitioning) within species has on gene spillover and transmission among bacteria populations occupying different niches is not well understood. Streptococcus agalactiae is an important bacterial pathogen that has a taxonomically diverse host range making it an excellent model system to study these processes. Here we analyze a global set of 901 genome sequences from nine diverse host species to advance our understanding of these processes. Bayesian clustering analysis delineated twelve major populations that closely aligned with niches. Comparative genomics revealed extensive gene gain/loss among populations and a large pan-genome of 9,527 genes, which remained open and was strongly partitioned among niches. As a result, the biochemical characteristics of eleven populations were highly distinctive (significantly enriched). Positive selection was detected and biochemical characteristics of the dispensable genes under selection were enriched in ten populations. Despite the strong gene partitioning, phylogenomics detected gene spillover. In particular, tetracycline resistance (which likely evolved in the human-associated population) from humans to bovine, canines, seals, and fish, demonstrating how a gene selected in one host can ultimately be transmitted into another, and biased transmission from humans to bovines was confirmed with a Bayesian migration analysis. Our findings show high bacterial genome plasticity acting in balance with selection pressure from distinct functional requirements of niches that is associated with an extensive and highly partitioned dispensable genome, likely facilitating continued and expansive adaptation

    Experimental validation of novel genes predicted in the un-annotated regions of the Arabidopsis genome

    BACKGROUND: Several lines of evidence support the existence of novel genes and other transcribed units which have not yet been annotated in the Arabidopsis genome. Two gene prediction programs which make use of comparative genomic analysis, Twinscan and EuGene, have recently been deployed on the Arabidopsis genome. The ability of these programs to make use of sequence data from other species has allowed both Twinscan and EuGene to predict over 1000 genes that are intergenic with respect to the most recent annotation release. A high throughput RACE pipeline was utilized in an attempt to verify the structure and expression of these novel genes. RESULTS: 1,071 un-annotated loci were targeted by RACE, and full length sequence coverage was obtained for 35% of the targeted genes. We have verified the structure and expression of 378 genes that were not present within the most recent release of the Arabidopsis genome annotation. These 378 genes represent a structurally diverse set of transcripts and encode a functionally diverse set of proteins. CONCLUSION: We have investigated the accuracy of the Twinscan and EuGene gene prediction programs and found them to be reliable predictors of gene structure in Arabidopsis. Several hundred previously un-annotated genes were validated by this work. Based upon this information derived from these efforts it is likely that the Arabidopsis genome annotation continues to overlook several hundred protein coding genes

    Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana

    BACKGROUND: Recent genome sequencing enables mega-base scale comparisons between related genomes. Comparisons between animals, plants, fungi, and bacteria demonstrate extensive synteny tempered by rearrangements. Within the legume plant family, glimpses of synteny have also been observed. Characterizing syntenic relationships in legumes is important in transferring knowledge from model legumes to crops that are important sources of protein, fixed nitrogen, and health-promoting compounds. RESULTS: We have uncovered two large soybean regions exhibiting synteny with M. truncatula and with a network of segmentally duplicated regions in Arabidopsis. In all, syntenic regions comprise over 500 predicted genes spanning 3 Mb. Up to 75% of soybean genes are colinear with M. truncatula, including one region in which 33 of 35 soybean predicted genes with database support are colinear to M. truncatula. In some regions, 60% of soybean genes share colinearity with a network of A. thaliana duplications. One region is especially interesting because this 500 kbp segment of soybean is syntenic to two paralogous regions in M. truncatula on different chromosomes. Phylogenetic analysis of individual genes within these regions demonstrates that one is orthologous to the soybean region, with which it also shows substantially denser synteny and significantly lower levels of synonymous nucleotide substitutions. The other M. truncatula region is inferred to be paralogous, presumably resulting from a duplication event preceding speciation. CONCLUSION: The presence of well-defined M. truncatula segments showing orthologous and paralogous relationships with soybean allows us to explore the evolution of contiguous genomic regions in the context of ancient genome duplication and speciation events

    The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms

    BACKGROUND: Cotton (Gossypium hirsutum) is the most important fiber crop grown in 90 countries. In 2004–2005, US farmers planted 79% of the 5.7-million hectares of nuclear transgenic cotton. Unfortunately, genetically modified cotton has the potential to hybridize with other cultivated and wild relatives, resulting in geographical restrictions to cultivation. However, chloroplast genetic engineering offers the possibility of containment because of maternal inheritance of transgenes. The complete chloroplast genome of cotton provides essential information required for genetic engineering. In addition, the sequence data were used to assess phylogenetic relationships among the major clades of rosids using cotton and 25 other completely sequenced angiosperm chloroplast genomes. RESULTS: The complete cotton chloroplast genome is 160,301 bp in length, with 112 unique genes and 19 duplicated genes within the IR, containing a total of 131 genes. There are four ribosomal RNAs, 30 distinct tRNA genes and 17 intron-containing genes. The gene order in cotton is identical to that of tobacco but lacks rpl22 and infA. There are 30 direct and 24 inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Most of the direct repeats are within intergenic spacer regions, introns and a 72 bp-long direct repeat is within the psaA and psaB genes. Comparison of protein coding sequences with expressed sequence tags (ESTs) revealed nucleotide substitutions resulting in amino acid changes in ndhC, rpl23, rpl20, rps3 and clpP. Phylogenetic analysis of a data set including 61 protein-coding genes using both maximum likelihood and maximum parsimony were performed for 28 taxa, including cotton and five other angiosperm chloroplast genomes that were not included in any previous phylogenies. CONCLUSION: Cotton chloroplast genome lacks rpl22 and infA and contains a number of dispersed direct and inverted repeats. 
RNA editing resulted in amino acid changes with significant impact on their hydropathy. Phylogenetic analysis provides strong support for the position of cotton in the Malvales in the eurosids II clade sister to Arabidopsis in the Brassicales. Furthermore, there is strong support for the placement of the Myrtales sister to the eurosid I clade, although expanded taxon sampling is needed to further test this relationship

    Complete Plastid Genome Sequence of Daucus Carota: Implications for Biotechnology and Phylogeny of Angiosperms

    Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. 
Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements

    A newly-developed community microarray resource for transcriptome profiling in Brassica species enables the confirmation of Brassica-specific expressed sequences

    Background: The Brassica species include an important group of crops and provide opportunities for studying the evolutionary consequences of polyploidy. They are related to Arabidopsis thaliana, for which the first complete plant genome sequence was obtained and their genomes show extensive, although imperfect, conserved synteny with that of A. thaliana. A large number of EST sequences, derived from a range of different Brassica species, are available in the public database, but no public microarray resource has so far been developed for these species. Results: We assembled unigenes using ~800,000 EST sequences, mainly from three species: B. napus, B. rapa and B. oleracea. The assembly was conducted with the aim of co-assembling ESTs of orthologous genes (including homoeologous pairs of genes in B. napus from each of the A and C genomes), but resolving assemblies of paralogous, or paleo-homoeologous, genes (i.e. the genes related by the ancestral genome triplication observed in diploid Brassica species). 90,864 unique sequence assemblies were developed. These were incorporated into the BAC sequence annotation for the Brassica rapa Genome Sequencing Project, enabling the identification of cognate genomic sequences for a proportion of them. A 60-mer oligo microarray comprising 94,558 probes was developed using the unigene sequences. Gene expression was analysed in reciprocal resynthesised B. napus lines and the B. oleracea and B. rapa lines used to produce them. The analysis showed that significant expression could consistently be detected in leaf tissue for 35,386 unigenes. Expression was detected across all four genotypes for 27,355 unigenes, genome-specific expression patterns were observed for 7,851 unigenes and 180 unigenes displayed other classes of expression pattern.
Principal component analysis (PCA) clearly resolved the individual microarray datasets for B. rapa, B. oleracea and resynthesised B. napus. Quantitative differences in expression were observed between the resynthesised B. napus lines for 98 unigenes, most of which could be classified into non-additive expression patterns, including 17 that showed cytoplasm-specific patterns. We further characterized the unigenes for which A genome-specific expression was observed and cognate genomic sequences could be identified. Ten of these unigenes were found to be Brassica-specific sequences, including two that originate from complex loci comprising gene clusters. Conclusion: We succeeded in developing a Brassica community microarray resource. Although expression can be measured for the majority of unigenes across species, there were numerous probes that reported in a genome-specific manner. We anticipate that some proportion of these will represent species-specific transcripts and the remainder will be the consequence of variation of sequences within the regions represented by the array probes. Our studies demonstrated that the datasets obtained from the arrays can be used for typical analyses, including PCA and the analysis of differential expression. We have also demonstrated that Brassica-specific transcripts identified in silico in the sequence assembly of public EST database accessions are indeed reported by the array. These would not be detectable using arrays designed using A. thaliana sequences.