Search CORE

496 research outputs found

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
E Pasolli
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
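The "species-resolved" output described above can be pictured as gene-family abundances split into per-species strata, with reads found only by translated search placed in an "unclassified" stratum. The sketch below is a simplified illustration under stated assumptions, not HUMAnN2's implementation: each aligned read counts as one hit, abundance is expressed as reads per kilobase (RPK) of gene-family length, and the function name, input shapes and identifiers ("UniRef90_A", "s__X") are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def stratified_rpk(
    hits: Iterable[Tuple[str, Optional[str]]],
    gene_lengths_bp: Dict[str, int],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: aggregate read hits into RPK per gene family,
    stratified by contributing species.

    hits: one (gene_family, species) pair per aligned read; species is None
          when the read was assigned only by translated search (assumption).
    gene_lengths_bp: gene family -> length in base pairs (assumed input).
    Returns {gene_family: {stratum: rpk}}; the "total" stratum is the
    community-level value, i.e. the sum over all strata.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for family, species in hits:
        # Reads without a species assignment go to the "unclassified" stratum.
        stratum = species if species is not None else "unclassified"
        counts[family][stratum] += 1

    profile: Dict[str, Dict[str, float]] = {}
    for family, per_stratum in counts.items():
        kb = gene_lengths_bp[family] / 1000.0
        strata = {s: n / kb for s, n in per_stratum.items()}
        # Community total is computed before the "total" key is added.
        strata["total"] = sum(strata.values())
        profile[family] = strata
    return profile
```

For example, two reads mapped to a 500 bp family via species "s__X" plus one unclassified read give 4.0 RPK for "s__X", 2.0 RPK unclassified and 6.0 RPK in total. Real pipelines additionally weight alignments and normalise per sample; those steps are omitted here.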

Crossref

eScholarship - University of California

Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses

Author: Olivo Miotto
Tin Wee Tan
Vladimir Brusic
JC Obenauer
O Miotto
MY Galperin
M Garcia-Solaco
DA Benson
AM Khan
JR Swedlow
PD Karp
V Brusic
T Berners-Lee
K Wolstencroft
E Neumann
R Stevens
F Yergeau
JB Bard
DL McGuinness
E Ghedin
R Stevens
O Miotto
B McBride
I Horrocks
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation.
Results: Over 40,000 annotated influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, each property required up to six different structural rules to be combined. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.
Conclusion: To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled intuitive approaches, based on a combination of simple rules, can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with a potential to bring immediate benefits to this field. © 2006 Brahmachary et al; licensee BioMed Central Ltd
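The idea of combining several structural rules per property can be sketched as an ordered list of extractors, where the first rule that yields a value wins. This is a minimal illustration, not the authors' system: records are assumed to be plain dicts whose keys are modelled loosely on GenBank source qualifiers ("serotype", "strain", "organism"), and the rule set, regular expressions and helper names are hypothetical.

```python
import re
from typing import Callable, List, Optional

# A structural rule reads one record and returns a value, or None if it fails.
Rule = Callable[[dict], Optional[str]]


def regex_rule(field: str, pattern: str) -> Rule:
    """Build a rule that applies `pattern` (with one capture group)
    to a single, hypothetical record field."""
    compiled = re.compile(pattern)

    def rule(record: dict) -> Optional[str]:
        match = compiled.search(record.get(field, ""))
        return match.group(1) if match else None

    return rule


def extract(record: dict, rules: List[Rule]) -> Optional[str]:
    """Try the rules in priority order; the first non-empty result wins."""
    for rule in rules:
        value = rule(record)
        if value:
            return value
    return None


# Hypothetical rules for the "virus subtype" property (e.g. "H3N2"),
# ordered from the most to the least specific field.
SUBTYPE_RULES: List[Rule] = [
    regex_rule("serotype", r"\b(H\d+N\d+)\b"),
    regex_rule("strain", r"\((H\d+N\d+)\)"),
    regex_rule("organism", r"\((H\d+N\d+)\)"),
]
```

For a record such as {"organism": "Influenza A virus (A/Hong Kong/1/1968(H3N2))"}, the first two rules find nothing and the third returns "H3N2". Ordering rules by reliability is one plausible way to "combine" them; the paper's actual combination and reconciliation logic may differ.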

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

ScholarBank@NUS

UQ eSpace (University of Queensland)

Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes

BACKGROUND: Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. METHODOLOGY/PRINCIPAL FINDINGS: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. CONCLUSIONS: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine
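The notion of a score between zero and one obtained by comparing a protein's features with those of other proteins in the same genome can be illustrated with a percentile-style score. This is explicitly not the published QIPP formula: the sketch assumes numeric features where larger values indicate higher "quality", scores each feature as the fraction of proteins in the genome at or below the protein's value, and averages across features. The function name and feature names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def percentile_quality(protein: Dict[str, float],
                       genome: List[Dict[str, float]]) -> float:
    """Hypothetical within-genome score in [0, 1] (not the QIPP formula).

    For each feature, compute the fraction of proteins in `genome` whose
    value is <= this protein's value, then average over features.
    Assumes every protein in `genome` has all features of `protein`,
    and that larger feature values mean higher quality.
    """
    scores = []
    for feature, value in protein.items():
        genome_values = [other[feature] for other in genome]
        at_or_below = sum(1 for v in genome_values if v <= value)
        scores.append(at_or_below / len(genome_values))
    return mean(scores)
```

With lengths of 100, 200, 300 and 400 in a four-protein genome, the 300-residue protein scores 0.75. Because the score is relative to a single genome, it is comparable within that genome, which matches the ranking use described in the abstract.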

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence

Author: A Teplyakov
C Momany
C Saveanu
CD Boyd
CR Guzzo
D-G Ha
David Mackey
Delphine L. Caly
F Tao
GG Anderson
H Mulcahy
H Slater
H Sondermann
IS Pultz
J Duevel
J Mansfield
J Nesper
J. Maxwell Dow
Joseph Ward
K-H Chin
KB Twomey
Melanie Febrer
MY Galperin
R Hengge
Robert P. Ryan
RP Ryan
S Moreau-Marquis
S-Q An
Sarah L. Murdoch
SE Maddocks
Shi-qi An
T Lundback
T Schirmer
U Romling
X Qiao
X-H Lu
Y Fouhy
Y McCarthy
Yvonne McCarthy
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

Bis-(3′,5′)-cyclic di-guanylate (cyclic di-GMP) is a key bacterial second messenger that is implicated in the regulation of many critical processes, including motility, biofilm formation and virulence. Cyclic di-GMP influences diverse functions through interaction with a range of effectors. Our knowledge of these effectors and their different regulatory actions is far from complete, however. Here we used an affinity pull-down assay with cyclic di-GMP-coupled magnetic beads to identify cyclic di-GMP binding proteins in the plant pathogen Xanthomonas campestris pv. campestris (Xcc). This analysis identified XC_3703, a protein of the YajQ family, as a potential cyclic di-GMP receptor. Isothermal titration calorimetry showed that the purified XC_3703 protein bound cyclic di-GMP with high affinity (Kd ≈ 2 µM). Mutation of XC_3703 led to reduced virulence of Xcc to plants and alteration in biofilm formation. Yeast two-hybrid and far-western analyses showed that XC_3703 was able to interact with XC_2801, a transcription factor of the LysR family. Mutation of XC_2801 and XC_3703 had partially overlapping effects on the transcriptome of Xcc, and both affected virulence. Electromobility shift assays showed that XC_3703 positively affected the binding of XC_2801 to the promoters of target virulence genes, an effect that was reversed by cyclic di-GMP. Genetic and functional analysis of YajQ family members from the human pathogens Pseudomonas aeruginosa and Stenotrophomonas maltophilia showed that they also specifically bound cyclic di-GMP and contributed to virulence in model systems. The findings thus identify a new class of cyclic di-GMP effector that regulates bacterial virulence.

Public Library of Science (PLOS)

Southampton (e-Prints Soton)

Crossref

Directory of Open Access Journals

Irish Universities

PubMed Central

Cork Open Research Archive

Discovery Research Portal

The Francis Crick Institute

Automatically extracting functionally equivalent proteins from SwissProt

Author: Lisa EM McMillan
Andrew CR Martin
MY Galperin
JM Hurst
Y Yaron
MC Lill
AA Akindahunsi
EV Koonin
WM Fitch
S Shibata
A Wagner
KP O'Brien
RL Tatusov
Y Lee
II Artamonova
E Kretschmann
GX Yu
V Kunin
A Amores
A Meyer
EJ Stellwag
T Hulsen
CH Wu
T Hulsen
F Chen
V van Noort
SB Rice
SF Altschul
RA Notebaart
LB Koski
Publication venue: Springer Nature
Publication date: 01/10/2008

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables the extraction of groups of proteins already annotated as functionally equivalent. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation; most of these would have presented equal difficulties for manual interpretation. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Enlighten

Role of the PAS sensor domains in the Bacillus subtilis sporulation kinase KinA

Author: Bartels C
Bieri M
Burkholder WF
Casino P
Cole JL
Cunningham KA
Dago AE
Eswaramoorthy P
Ferris HU
Folta-Stogniew E
Galperin MY
Gilles-Gonzalez MA
Gotoh Y
Henry JT
Jacques DA
Laue TM
Lee J
Lindebro MC
Okada A
Philo JS
Rowland SL
Sambrook J
Scheuermann TH
Schuck P
Stafford WF
Stephenson K
Tanaka T
Taylor RG
Tomomori C
Wang L
Whitten AE
Wolanin PM
Publication venue: American Society for Microbiology
Publication date: 01/05/2013

Histidine kinases are sophisticated molecular sensors that are used by bacteria to detect and respond to a multitude of environmental signals. KinA is the major histidine kinase required for initiation of sporulation upon nutrient deprivation in Bacillus subtilis. KinA has a large N-terminal region (residues 1 to 382) that is uniquely composed of three tandem Per-ARNT-Sim (PAS) domains that have been proposed to constitute a sensor module. To further enhance our understanding of this "sensor" region, we defined the boundaries that give rise to the minimal autonomously folded PAS domains and analyzed their homo- and heteroassociation properties using analytical ultracentrifugation, nuclear magnetic resonance (NMR) spectroscopy, and multiangle laser light scattering. We show that PAS(A) self-associates very weakly, while PAS(C) is primarily a monomer. In contrast, PAS(B) forms a stable dimer (Kd [dissociation constant] o

Crossref

UQ eSpace (University of Queensland)

Signatures of arithmetic simplicity in metabolic network architecture

Metabolic networks perform some of the most fundamental functions in living cells, including energy transduction and building block biosynthesis. While these are the best characterized networks in living systems, understanding their evolutionary history and complex wiring constitutes one of the most fascinating open questions in biology, intimately related to the enigma of life's origin itself. Is the evolution of metabolism subject to general principles, beyond the unpredictable accumulation of multiple historical accidents? Here we search for such principles by applying to an artificial chemical universe some of the methodologies developed for the study of genome scale models of cellular metabolism. In particular, we use metabolic flux constraint-based models to exhaustively search for artificial chemistry pathways that can optimally perform an array of elementary metabolic functions. Despite the simplicity of the model employed, we find that the ensuing pathways display a surprisingly rich set of properties, including the existence of autocatalytic cycles and hierarchical modules, the appearance of universally preferable metabolites and reactions, and a logarithmic trend of pathway length as a function of input/output molecule size. Some of these properties can be derived analytically, borrowing methods previously used in cryptography. In addition, by mapping biochemical networks onto a simplified carbon atom reaction backbone, we find that several of the properties predicted by the artificial chemistry model hold for real metabolic networks. These findings suggest that optimality principles and arithmetic simplicity might lie beneath some aspects of biochemical complexity.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

The genetic organisation of prokaryotic two-component system signalling pathways

Author: Robert HN Williams
David E Whitworth
RB Bourret
C Fabret
T Mizuno
JM Skerker
K Yamamoto
MT Laub
DE Whitworth
L Li
M Weigt
L Burger
L Løvdok
PJ Piggot
R Paul
S Jagadeesan
S Wegener-Feldbrügge
PJA Cock
PI Higgs
DE Whitworth
N Majdalani
S Romagnoli
LE Ulrich
M Barakat
MY Galperin
DE Whitworth
MY Galperin
DE Whitworth
PJ Cock
DE Whitworth
PJA Cock
A Pallejà
Y Fukuda
PJA Cock
S Schübbe
JL Appleby
P Dam
M Pertea
I Macarthur
KA Walker
W Zhang
S Romagnoli
A Busch
LE Ulrich
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Two-component systems (TCSs) are modular and diverse signalling pathways, involving a stimulus-responsive transfer of phosphoryl groups from transmitter to partner receiver domains. TCS gene and domain organisation are both potentially informative regarding biological function, interaction partnerships and molecular mechanisms. However, there is currently little understanding of the relationships between domain architecture, gene organisation and TCS pathway structure.
Results: Here we classify the gene and domain organisation of TCS gene loci from 1405 prokaryotic replicons (>40,000 TCS proteins). We find that 200 bp is the most appropriate distance cut-off for defining whether two TCS genes are functionally linked. More than 90% of all TCS gene loci encode just one or two transmitter and/or receiver domains; however, numerous other geometries exist, often with large numbers of encoded TCS domains. Such information provides insights into the distribution of TCS domains between genes, and within genes. As expected, the organisation of TCS genes and domains is affected by phylogeny, and plasmid-encoded TCSs exhibit differences in organisation from their chromosomally-encoded counterparts.
Conclusions: We provide here an overview of the genomic and genetic organisation of TCS domains, as a resource for further research. We also propose novel metrics that build upon TCS gene/domain organisation data and allow comparisons between genomic complements of TCSs. In particular, 'percentage orphaned TCS genes' (or 'Dissemination') and 'percentage of complex loci' (or 'Sophistication') appear to be useful discriminators, and to reflect mechanistic aspects of TCS organisation not captured by existing metrics.
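The 200 bp distance cut-off can be applied by chaining genes on a replicon into loci whenever the gap between consecutive genes is at most the cut-off. The sketch below is a simplified illustration, not the authors' pipeline: it assumes the input contains only TCS genes from one replicon as (gene_id, start, end) coordinates, ignores strand, treats overlapping genes as linked, and uses hypothetical gene identifiers.

```python
from typing import List, Optional, Tuple

Gene = Tuple[str, int, int]  # (gene_id, start, end) on a single replicon


def group_loci(genes: List[Gene], max_gap_bp: int = 200) -> List[List[str]]:
    """Chain TCS genes into loci when the intergenic gap is <= max_gap_bp.

    Assumptions: only TCS genes are supplied, strand is ignored, and
    overlapping genes (negative gap) are treated as linked.
    """
    ordered = sorted(genes, key=lambda g: g[1])  # sort by start coordinate
    loci: List[List[str]] = []
    locus_end: Optional[int] = None  # furthest end seen in the current locus
    for gene_id, start, end in ordered:
        if locus_end is not None and start - locus_end <= max_gap_bp:
            loci[-1].append(gene_id)           # within cut-off: same locus
            locus_end = max(locus_end, end)
        else:
            loci.append([gene_id])             # gap too large: new locus
            locus_end = end
    return loci
```

For genes hk1 (1000–2500), rr1 (2600–3300) and hk2 (10000–11500), the 100 bp gap links hk1 and rr1 into one locus while hk2 forms its own. Locus-level metrics such as those proposed in the abstract would then be computed over the resulting groups.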

Crossref

Aberystwyth Research Portal

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

Dimerisation induced formation of the active site and the identification of three metal sites in EAL-phosphodiesterases

Author: A Deepthi
A Sundriyal
A Tchigvintsev
A Ueda
A Vagin
A Winkler
AD Tischler
AJ Schmidt
B Huang
BW Holloway
C Chan
C Romier
CD Boyd
CW Phippen
D Balasubramanian
D Bellini
DA D’Argenio
DJ Hosfield
DL Caly
EF Pettersen
F Rao
G Kuppuraj
G Minasov
G Winter
GL Winsor
GN Murshudov
H Ceri
H Kulasakara
H Zheng
I Ivanov
ID Hay
J Robert-Paganin
JH Merritt
K Syson
K-H Choi
KD Miner
KM Thormann
M Christen
M Valentini
M Vorachit
MD Winn
MVAS Navarro
MY Galperin
N Barraud
O Kirillina
P Aldridge
P Emsley
P Evans
PD Newell
PR Evans
R Donlan
R Paul
R Simm
R Tamayo
RM Keegan
RP Ryan
RZ Liao
TRM Barends
TT Hoang
U Römling
V Stelitano
VB Chen
W Grzybkowski
W Kabsch
Y Li
Y Qi
Y Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 10/02/2017

The bacterial second messenger cyclic di-3′,5′-guanosine monophosphate (c-di-GMP) is a key regulator of bacterial motility and virulence. As high levels of c-di-GMP are associated with the biofilm lifestyle, c-di-GMP-hydrolysing phosphodiesterases (PDEs) have been identified as key targets for developing novel strategies to treat chronic infection by exploiting biofilm dispersal. We have studied the EAL signature motif-containing phosphodiesterase domains from the Pseudomonas aeruginosa proteins PA3825 (PA3825EAL) and PA1727 (MucREAL). Different dimerisation interfaces allow us to identify interface-independent principles of enzyme regulation. Unlike previously characterised two-metal binding EAL-phosphodiesterases, PA3825EAL in complex with pGpG provides a model for a third metal site. The third metal is positioned to stabilise the negative charge of the 5′-phosphate, and thus three metals could be required for catalysis, in analogy to other nucleases. This newly uncovered variation in metal coordination may provide a further level of bacterial PDE regulation.

University of Essex Research Repository

Crossref

Southampton (e-Prints Soton)

PubMed Central

Whole genome sequence and manual annotation of Clostridium autoethanogenum, an industrially relevant bacterium

Author: Alexander Goesman
Alexander T. Wichlacz
Anne M. Henstra
B Boeckmann
Bart Pander
C Claudel-Renard
Charlie Hodgman
Christopher M. Humphreys
CJA Sigrist
Craig Woods
D Hyatt
David Barrett
E Stackebrandt
EB Fichot
EJ Richardson
F Meyer
Florence J. Annan
H Ogata
H Tae
HN Abubackar
I Schomburg
J Abrini
J Eid
J Marmur
JL Cotter
JM Bruno-Barcena
Jochen Blom
K Lagesen
KD Pruitt
Klaus Winzer
M Köpke
M Monot
M Pagni
M Scheer
MA Quail
MG Ross
MY Galperin
N Chowdhary
Neil R. Thomas
Nigel P. Minton
O Tirado-Acevedo
P Jones
Pawel Piatek
Peter Rowe
PF Levy
R Mazzoli
R Sims
RD Finn
Ronja Breitkopf
RS Tanner
Rupert Norman
S Koren
S Kurtz
Samantha McLean
Sarah Schatschneider
SD Brown
SF Altschul
SM Utturkar
T Tatusova
The UniProt Consortium
Thomas Millat
TJ Treangen
TM Lowe
Y Feng
Y Guo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Clostridium autoethanogenum is an acetogenic bacterium capable of producing high value commodity chemicals and biofuels from the C1 gases present in synthesis gas. This common industrial waste gas can act as the sole energy and carbon source for the bacterium, which converts the low value gaseous components into cellular building blocks and industrially relevant products via the action of the reductive acetyl-CoA (Wood-Ljungdahl) pathway. Current research efforts are focused on the enhancement and extension of product formation in this organism via synthetic biology approaches. However, crucial to metabolic modelling and directed pathway engineering is a reliable and comprehensively annotated genome sequence.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Springer - Publisher Connector

Nottingham Trent Institutional Repository (IRep)

PubMed Central