Toric ideals of homogeneous phylogenetic models
We consider the phylogenetic tree model in which every node of the tree is
observed and binary and the transitions are given by the same matrix on each
edge of the tree. We are able to compute the Gröbner basis and Markov basis of
the toric ideal of invariants for trees with up to 11 nodes. These are perhaps
the first non-trivial Gröbner basis calculations in 2^11 indeterminates. We
conjecture that there is a quadratic Gröbner basis for binary trees. Finally,
we give an explicit description of the polytope associated to this toric ideal
for an infinite family of binary trees and conjecture that there is a universal
bound on the number of vertices of this polytope for binary trees.
Comment: 6 pages, 17 figures
Conjunctive Bayesian networks
Conjunctive Bayesian networks (CBNs) are graphical models that describe the
accumulation of events which are constrained in the order of their occurrence.
A CBN is given by a partial order on a (finite) set of events. CBNs generalize
the oncogenetic tree models of Desper et al. by allowing the occurrence of an
event to depend on more than one predecessor event. The present paper studies
the statistical and algebraic properties of CBNs. We determine the maximum
likelihood parameters and present a combinatorial solution to the model
selection problem. Our method performs well on two datasets where the events
are HIV mutations associated with drug resistance. Concluding with a study of
the algebraic properties of CBNs, we show that CBNs are toric varieties after a
coordinate transformation and that their ideals possess a quadratic Gröbner
basis.
Comment: Published at http://dx.doi.org/10.3150/07-BEJ6133 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Phylogenetic Algebraic Geometry
Phylogenetic algebraic geometry is concerned with certain complex projective
algebraic varieties derived from finite trees. Real positive points on these
varieties represent probabilistic models of evolution. For small trees, we
recover classical geometric objects, such as toric and determinantal varieties
and their secant varieties, but larger trees lead to new and largely unexplored
territory. This paper gives a self-contained introduction to this subject and
offers numerous open problems for algebraic geometers.
Comment: 15 pages, 7 figures
Genome-wide analysis points to roles for extracellular matrix remodeling, the visual cycle, and neuronal development in myopia
Myopia, or nearsightedness, is the most common eye disorder, resulting
primarily from excess elongation of the eye. The etiology of myopia, although
known to be complex, is poorly understood. Here we report the largest ever
genome-wide association study (43,360 participants) on myopia in Europeans. We
performed a survival analysis on age of myopia onset and identified 19
significant associations (p < 5e-8), two of which are replications of earlier
associations with refractive error. These 19 associations in total explain 2.7%
of the variance in myopia age of onset, and point towards a number of different
mechanisms behind the development of myopia. One association is in the gene
PRSS56, which has previously been linked to abnormally small eyes; one is in a
gene that forms part of the extracellular matrix (LAMA2); two are in or near
genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are
near genes known to be involved in the growth and guidance of retinal ganglion
cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal
signaling or development. These novel findings point towards multiple genetic
factors involved in the development of myopia and suggest that complex
interactions between extracellular matrix remodeling, neuronal development, and
visual signals from the retina may underlie the development of myopia in
humans.
Efficient Replication of Over 180 Genetic Associations with Self-Reported Medical Data
While the cost and speed of generating genomic data have come down dramatically in recent years, the slow pace of collecting medical data for large cohorts continues to hamper genetic research. Here we evaluate a novel online framework for amassing large amounts of medical information in a recontactable cohort by assessing our ability to replicate genetic associations using these data. Using web-based questionnaires, we gathered self-reported data on 50 medical phenotypes from a generally unselected cohort of over 20,000 genotyped individuals. Of a list of genetic associations curated by NHGRI, we successfully replicated about 75% of the associations that we expected to replicate (based on the number of cases in our cohort and reported odds ratios, and excluding a set of associations with contradictory published evidence). Altogether we replicated over 180 previously reported associations, including many for type 2 diabetes, prostate cancer, cholesterol levels, and multiple sclerosis. We found significant variation across categories of conditions in the percentage of expected associations that we were able to replicate, which may reflect systematic inflation of the effects in some initial reports, or differences across diseases in the likelihood of misdiagnosis or misreport. We also demonstrated that we could improve replication success by taking advantage of our recontactable cohort, offering more in-depth questions to refine self-reported diagnoses. Our data suggest that online collection of self-reported data in a recontactable cohort may be a viable method for both broad and deep phenotyping in large populations.
