5 research outputs found
Grammar-based distance in progressive multiple sequence alignment
Background: We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment proceeds by pairwise aligning each new sequence with an ensemble of the previously aligned sequences. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm builds multiple alignments of quality comparable to those of the other programs, with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance, which is particularly useful for aligning large datasets.
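The idea of ordering pairwise alignments by a grammar-based distance can be sketched as follows. The abstract does not say which grammar or which distance formula is used, so this sketch assumes an LZ78-style phrase dictionary as a stand-in for the grammar and a normalized count of "new phrases" as the distance. The names `phrase_set`, `grammar_distance`, and `alignment_order` are hypothetical and are not the authors' implementation.

```python
from itertools import combinations

def phrase_set(seq, seed=None):
    """LZ78-style parse of seq; returns the phrase dictionary (optionally grown from seed)."""
    phrases = set(seed) if seed else set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:   # unseen phrase: record it and start a new one
            phrases.add(current)
            current = ""
    return phrases

def new_phrases(a, b):
    """Number of phrases that parsing b adds to the dictionary already built from a."""
    base = phrase_set(a)
    return len(phrase_set(b, seed=base)) - len(base)

def grammar_distance(a, b):
    """Symmetric, normalized distance (assumed form); sequences must be non-empty."""
    return max(new_phrases(a, b) / len(phrase_set(b)),
               new_phrases(b, a) / len(phrase_set(a)))

def alignment_order(seqs):
    """Index pairs sorted from most to least similar, a candidate pairwise-alignment order."""
    pairs = combinations(range(len(seqs)), 2)
    return sorted(pairs, key=lambda ij: grammar_distance(seqs[ij[0]], seqs[ij[1]]))

# Toy sequences (made up for illustration): the first two differ by one residue.
seqs = ["ACGTACGTACGT", "ACGTACGTACGA", "TTGGCCAATTGG"]
print(alignment_order(seqs))
```

Because the distance needs only dictionary parses rather than full pairwise alignments, the ordering step stays cheap as the number of sequences grows, which is consistent with the running-time emphasis of the abstract.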
Global burden associated with 85 pathogens in 2019: a systematic analysis for the Global Burden of Disease Study 2019
Background
Despite a global epidemiological transition towards increased burden of non-communicable diseases, communicable diseases continue to cause substantial morbidity and mortality worldwide. Understanding the burden of a wide range of infectious diseases, and its variation by geography and age, is pivotal to research priority setting and resource mobilisation globally.
Methods
We estimated disability-adjusted life-years (DALYs) associated with 85 pathogens in 2019, globally, regionally, and for 204 countries and territories. The term pathogen included causative agents, pathogen groups, infectious conditions, and aggregate categories. We applied a novel methodological approach to account for underlying, immediate, and intermediate causes of death, which counted every death for which a pathogen had a role in the pathway to death. We refer to this measure as the burden associated with infection, which was estimated by combining different sources of information. To compare the burden among all pathogens, we used pathogen-specific ratios to incorporate the burden of immediate and intermediate causes of death for pathogens modelled previously by the GBD. We created the ratios by using multiple cause of death data, hospital discharge data, linkage data, and minimally invasive tissue sampling data to estimate the fraction of deaths coming from the pathway to death chain. We multiplied the pathogen-specific ratios by age-specific years of life lost (YLLs), calculated with GBD 2019 methods, and then added the adjusted YLLs to age-specific years lived with disability (YLDs) from GBD 2019 to produce adjusted DALYs to account for deaths in the chain. We used standard GBD methods to calculate 95% uncertainty intervals (UIs) for final estimates of DALYs by taking the 2·5th and 97·5th percentiles across 1000 posterior draws for each quantity of interest. We provided burden estimates pertaining to all ages and specifically to the under 5 years age group.
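The arithmetic described above (scale YLLs by a pathogen-specific ratio, add YLDs, then take the 2·5th and 97·5th percentiles across draws) can be sketched as follows. This is a minimal illustration only: the draws are synthetic rather than GBD data, the function names are hypothetical, and the linear-interpolation percentile is an assumption about how the percentiles might be computed.

```python
import random

def adjusted_dalys(ratio, ylls, ylds):
    """Per age group: multiply YLLs by the pathogen-specific ratio, then add YLDs."""
    return [ratio * yll + yld for yll, yld in zip(ylls, ylds)]

def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def uncertainty_interval(draws):
    """95% uncertainty interval: 2.5th and 97.5th percentiles across draws."""
    return percentile(draws, 2.5), percentile(draws, 97.5)

# Synthetic example: 1000 draws of a hypothetical ratio, two age groups.
rng = random.Random(0)
ylls, ylds = [100.0, 40.0], [10.0, 5.0]
draws = [sum(adjusted_dalys(rng.gauss(1.3, 0.05), ylls, ylds)) for _ in range(1000)]
lower, upper = uncertainty_interval(draws)
print(f"adjusted DALYs 95% UI: {lower:.1f} to {upper:.1f}")
```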
Findings
Globally in 2019, an estimated 704 million (95% UI 610–820) DALYs were associated with 85 different pathogens, including 309 million (250–377; 43·9% of the burden) in children younger than 5 years. This burden accounted for 27·7% (and 65·5% in those younger than 5 years) of the previously reported total DALYs from all causes in 2019. Comparing super-regions, considerable differences were observed in the estimated pathogen-associated burdens in relation to DALYs from all causes, with the highest burden observed in sub-Saharan Africa (314 million [270–368] DALYs; 61·5% of total regional burden) and the lowest in the high-income super-region (31·8 million [25·4–40·1] DALYs; 9·8%). Three leading pathogens were responsible for more than 50 million DALYs each in 2019: tuberculosis (65·1 million [59·0–71·2]), malaria (53·6 million [27·0–91·3]), and HIV or AIDS (52·1 million [46·6–60·9]). Malaria was the leading pathogen for DALYs in children younger than 5 years (37·2 million [17·8–64·2]). We also observed substantial burden associated with previously less recognised pathogens, including Staphylococcus aureus and specific Gram-negative bacterial species (ie, Klebsiella pneumoniae, Escherichia coli, Pseudomonas aeruginosa, Acinetobacter baumannii, and Helicobacter pylori). Conversely, some pathogens had a burden that was smaller than anticipated.
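The reported shares can be cross-checked from the rounded figures in this paragraph. The sketch below is plain arithmetic on those published numbers; the helper names are hypothetical, and the implied totals are approximations derived from rounded inputs, not values reported in the abstract.

```python
def share_percent(part, whole):
    """Part as a percentage of whole."""
    return 100 * part / whole

def implied_total(part, share_pct):
    """Total implied by a part and the percentage share it represents."""
    return part / (share_pct / 100)

# Under-5 share of the pathogen-associated burden (million DALYs).
print(round(share_percent(309, 704), 1))   # 43.9, matching the reported share

# Totals implied by the reported shares (million DALYs, approximate).
print(round(implied_total(704, 27.7)))     # all-cause DALYs, all ages, 2019
print(round(implied_total(314, 61.5)))     # all-cause DALYs, sub-Saharan Africa
```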
Interpretation
Our detailed breakdown of DALYs associated with a comprehensive list of pathogens on a global, regional, and country level has revealed the magnitude of the problem and helps to indicate where research funding mismatches might exist. Given the disproportionate impact of infection on low-income and middle-income countries, an essential next step is for countries and relevant stakeholders to address these gaps by making targeted investments.
Cancer Biomarker Discovery: The Entropic Hallmark
Background: It is a commonly accepted belief that cancer cells modify their transcriptional state during the progression of the disease. We propose that the progression of cancer cells towards malignant phenotypes can be efficiently tracked using high-throughput technologies that follow the gradual changes observed in the gene expression profiles by employing Shannon's mathematical theory of communication. Methods based on Information Theory can then quantify the divergence of cancer cells' transcriptional profiles from those of normally appearing cells of the originating tissues. The relevance of the proposed methods can be evaluated using microarray datasets available in the public domain, but the method is in principle applicable to other high-throughput methods. Methodology/Principal Findings: Using melanoma and prostate cancer datasets, we illustrate how it is possible to employ Shannon Entropy and the Jensen-Shannon divergence to trace the transcriptional changes that accompany the progression of the disease. We establish how the variations of these two measures correlate with established biomarkers of cancer progression. The Information Theory measures allow us to identify novel biomarkers for both progressive and relatively more sudden transcriptional changes leading to malignant phenotypes. At the same time, the methodology was able to validate a large number of genes and processes that seem to be implicated in the progression of melanoma and prostate cancer. Conclusions/Significance: We thus present a quantitative guiding rule, a new unifying hallmark of cancer: the cancer cell's transcriptome changes lead to measurable observed transitions of Normalized Shannon Entropy values (as measured by high-throughput technologies). At the same time, tumor cells increment their divergence from the normal tissue profile, increasing their disorder via creation of states that we might not directly measure.
This unifying hallmark makes it possible, via the Jensen-Shannon divergence, to identify the arrow of time of the processes from the gene expression profiles, and helps to map the phenotypical and molecular hallmarks of specific cancer subtypes. The deep mathematical basis of the approach allows us to suggest that this principle is, hopefully, of general applicability to other diseases.
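The two measures named in this abstract can be sketched as follows. This is a minimal illustration, assuming an expression profile is turned into a probability distribution by dividing each value by the profile total, and that Normalized Shannon Entropy means entropy divided by its maximum, log2 of the number of genes; the abstract does not spell out either convention. The toy profiles are made up.

```python
import math

def to_distribution(expression):
    """Scale non-negative expression values so they sum to 1."""
    total = sum(expression)
    return [x / total for x in expression]

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def normalized_entropy(p):
    """Entropy divided by its maximum, log2(n), giving a value in [0, 1]."""
    return shannon_entropy(p) / math.log2(len(p))

def jensen_shannon_divergence(p, q):
    """JSD in bits: entropy of the mixture minus the mean entropy of p and q."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Toy profiles over four genes (hypothetical values, not from the datasets).
normal = to_distribution([10, 10, 10, 10])
tumor = to_distribution([30, 5, 3, 2])
print(normalized_entropy(normal), normalized_entropy(tumor))
print(jensen_shannon_divergence(normal, tumor))
```

With base-2 logarithms the Jensen-Shannon divergence is bounded between 0 (identical profiles) and 1 (profiles with no shared expressed genes), which makes divergences from a reference normal-tissue profile comparable across samples.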
Global, regional, and national incidence and mortality burden of non-COVID-19 lower respiratory infections and aetiologies, 1990-2021: a systematic analysis from the Global Burden of Disease Study 2021
Background
Lower respiratory infections (LRIs) are a major global contributor to morbidity and mortality. In 2020–21, non-pharmaceutical interventions associated with the COVID-19 pandemic reduced not only the transmission of SARS-CoV-2, but also the transmission of other LRI pathogens. Tracking LRI incidence and mortality, as well as the pathogens responsible, can guide health-system responses and funding priorities to reduce future burden. We present estimates from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 of the burden of non-COVID-19 LRIs and corresponding aetiologies from 1990 to 2021, inclusive of pandemic effects on the incidence and mortality of select respiratory viruses, globally, regionally, and for 204 countries and territories.
Methods
We estimated mortality, incidence, and aetiology attribution for LRI, defined by the GBD as pneumonia or bronchiolitis, not inclusive of COVID-19. We analysed 26 259 site-years of mortality data using the Cause of Death Ensemble model to estimate LRI mortality rates. We analysed all available age-specific and sex-specific data sources, including published literature identified by a systematic review, as well as household surveys, hospital admissions, health insurance claims, and LRI mortality estimates, to generate internally consistent estimates of incidence and prevalence using DisMod-MR 2.1. For aetiology estimation, we analysed multiple causes of death, vital registration, hospital discharge, microbial laboratory, and literature data using a network analysis model to produce the proportion of LRI deaths and episodes attributable to the following pathogens: Acinetobacter baumannii, Chlamydia spp, Enterobacter spp, Escherichia coli, fungi, group B streptococcus, Haemophilus influenzae, influenza viruses, Klebsiella pneumoniae, Legionella spp, Mycoplasma spp, polymicrobial infections, Pseudomonas aeruginosa, respiratory syncytial virus (RSV), Staphylococcus aureus, Streptococcus pneumoniae, and other viruses (ie, the aggregate of all viruses studied except influenza and RSV), as well as a residual category of other bacterial pathogens.
Findings
Globally, in 2021, we estimated 344 million (95% uncertainty interval [UI] 325–364) incident episodes of LRI, or 4350 episodes (4120–4610) per 100 000 population, and 2·18 million deaths (1·98–2·36), or 27·7 deaths (25·1–29·9) per 100 000. 502 000 deaths (406 000–611 000) were in children younger than 5 years, among which 254 000 deaths (197 000–320 000) occurred in countries with a low Socio-demographic Index. Of the 18 modelled pathogen categories in 2021, S pneumoniae was responsible for the highest proportions of LRI episodes and deaths, with an estimated 97·9 million (92·1–104·0) episodes and 505 000 deaths (454 000–555 000) globally. The pathogens responsible for the second and third highest episode counts globally were other viral aetiologies (46·4 million [43·6–49·3] episodes) and Mycoplasma spp (25·3 million [23·5–27·2]), while those responsible for the second and third highest death counts were S aureus (424 000 [380 000–459 000]) and K pneumoniae (176 000 [158 000–194 000]). From 1990 to 2019, the global all-age non-COVID-19 LRI mortality rate declined by 41·7% (35·9–46·9), from 56·5 deaths (51·3–61·9) to 32·9 deaths (29·9–35·4) per 100 000. From 2019 to 2021, during the COVID-19 pandemic and implementation of associated non-pharmaceutical interventions, we estimated a 16·0% (13·1–18·6) decline in the global all-age non-COVID-19 LRI mortality rate, largely accounted for by a 71·8% (63·8–78·9) decline in the number of influenza deaths and a 66·7% (56·6–75·3) decline in the number of RSV deaths.
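The rate changes quoted above can be reproduced approximately from the rounded figures. The sketch below uses hypothetical helper names; because the inputs are already rounded, the results can differ in the last digit from the published values, which were presumably computed from unrounded estimates.

```python
def percent_decline(start, end):
    """Relative decline from start to end, as a percentage of start."""
    return 100 * (start - end) / start

def apply_decline(value, decline_pct):
    """Value remaining after a percentage decline."""
    return value * (1 - decline_pct / 100)

def implied_population(count, rate_per_100k):
    """Population implied by an event count and a rate per 100 000."""
    return count / rate_per_100k * 100_000

# 1990 to 2019 mortality rates per 100 000 (published decline: 41.7%).
print(round(percent_decline(56.5, 32.9), 2))

# 2019 rate after the reported 16.0% decline (published 2021 rate: 27.7).
print(round(apply_decline(32.9, 16.0), 2))

# Population implied by 2.18 million deaths at 27.7 per 100 000 (approximate).
print(round(implied_population(2.18e6, 27.7) / 1e9, 2), "billion")
```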
Interpretation
Substantial progress has been made in reducing LRI mortality, but the burden remains high, especially in low-income and middle-income countries. During the COVID-19 pandemic, with its associated non-pharmaceutical interventions, global incident LRI cases and mortality attributable to influenza and RSV declined substantially. Expanding access to health-care services and vaccines, including S pneumoniae, H influenzae type B, and novel RSV vaccines, along with new low-cost interventions against S aureus, could mitigate the LRI burden and prevent transmission of LRI-causing pathogens.
