A model-based approach to selection of tag SNPs
BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. We also show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs.
This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
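The entropy-maximization strategy described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' software): it computes the empirical Shannon entropy of haplotypes restricted to a candidate tag set and greedily adds the SNP that increases entropy the most, which is a common simplification of the constrained maximization in the abstract.

```python
from collections import Counter
from math import log2

def tag_entropy(haplotypes, tags):
    """Empirical Shannon entropy (bits) of the haplotype distribution
    restricted to the positions in `tags`."""
    counts = Counter(tuple(h[i] for i in tags) for h in haplotypes)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def greedy_tag_selection(haplotypes, k):
    """Greedily pick k tag SNPs, each step adding the SNP whose
    inclusion yields the largest entropy of the tag set."""
    n_snps = len(haplotypes[0])
    tags = []
    for _ in range(k):
        best = max((s for s in range(n_snps) if s not in tags),
                   key=lambda s: tag_entropy(haplotypes, tags + [s]))
        tags.append(best)
    return sorted(tags)

# Toy data: 4 haplotypes over 4 SNPs; SNP 1 is monomorphic and
# carries no information, so the greedy search avoids it.
haps = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0], [1, 0, 1, 0]]
print(greedy_tag_selection(haps, 2))  # → [0, 2]
```

A greedy search is only a heuristic for the entropy-constrained optimum; the paper's model-based methods replace the empirical counts with model probabilities (e.g. from the Li and Stephens hidden Markov model).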
The structure of idealization in biological theories: the case of the Wright-Fisher model.
In this paper we present a new framework of idealization in biology. We characterize idealizations as a network of counterfactual and hypothetical conditionals that can exhibit different “degrees of contingency”. We use this idea to argue that, in departing more or less from the actual world, idealizations can serve numerous epistemic, methodological or heuristic purposes within scientific research. We contend that, in part, this structure explains why idealizations, despite being deformations of reality, are so successful in scientific practice. For illustrative purposes, we provide an example from population genetics, the Wright-Fisher Model.
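For readers unfamiliar with the paper's case study, the Wright-Fisher model itself is simple to state: each generation, a fixed-size population of N alleles is resampled binomially from the current allele frequency. The sketch below is a minimal, assumed haploid, drift-only version (no selection or mutation), not anything from the paper itself.

```python
import random

def wright_fisher(N, p0, generations, seed=0):
    """Simulate allele frequency under pure genetic drift.
    Each generation, N offspring alleles are drawn independently
    with probability p (the current frequency) of carrying the allele."""
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(N))  # binomial(N, p) draw
        p = count / N
        freqs.append(p)
    return freqs

# Frequencies wander randomly and are absorbed at 0 or 1.
trajectory = wright_fisher(N=50, p0=0.5, generations=20)
```

The model's idealizations (fixed population size, non-overlapping generations, random mating) are exactly the kind of counterfactual departures from the actual world that the paper analyzes.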
Giant Abdominopelvic Haematoma Arising from Ovulation in a Glanzmann's Thrombasthenia Patient with Platelet Refractoriness: Treatment with Surgery and Intra-Abdominal Tranexamic Acid
Glanzmann's thrombasthenia (GT) is a very rare autosomal recessive genetic bleeding disorder. Women with coagulation abnormalities are at increased risk of corpus luteum rupture and haemoperitoneum. Here we present a severe case of GT resulting in a haematoma extending from the pelvis to the liver that could only be controlled by surgery and intra-abdominal tranexamic acid. Copyright (c) 2012 S. Karger AG, Basel.
Efficient Estimation of Mutation Rates during Individual Development by Minimization of Chi-Square
Mutation primarily occurs when cells divide, and it is highly desirable to have knowledge of the mutation rate for each of the cell divisions during individual development. Recently, recessive lethal or nearly lethal mutations observed in a large mutation accumulation experiment in Drosophila melanogaster suggested that mutation rates vary significantly during the germline development of male Drosophila melanogaster. The analysis of the data was based on a combination of the maximum likelihood framework with numerical assistance from a newly developed coalescent algorithm. Although powerful, the likelihood-based framework is computationally demanding, which limited the scope of the inference. This paper presents a new estimation approach based on minimizing the chi-square statistic, which is asymptotically consistent with the maximum likelihood method. When at most one mutation per family is considered, the chi-square minimization simplifies to a constrained weighted least-squares problem that can be solved easily with standard optimization methods. The new method effectively eliminates the computational bottleneck of the likelihood approach. Reanalysis of the published Drosophila melanogaster mutation data results in similar estimates of mutation rates. The new method is also expected to be applicable to the analysis of mutation data generated by next-generation sequencing technology.
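The minimum chi-square idea in the abstract can be illustrated with a small, hypothetical sketch (not the paper's actual estimator): given observed counts of families with and without a mutation, choose the per-family mutation probability whose expected counts minimize the Pearson chi-square statistic. The two-category model below, with F families and expected counts [F(1-r), Fr], corresponds to the "at most one mutation per family" simplification; a grid search stands in for the constrained least-squares solver.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def estimate_rate(observed, model, grid):
    """Return the rate in `grid` whose model-expected counts
    minimize the chi-square statistic against the observations."""
    return min(grid, key=lambda r: chi_square(observed, model(r)))

# 100 families: 90 with no mutation, 10 with (at most) one mutation.
observed = [90, 10]
model = lambda r: [100 * (1 - r), 100 * r]   # expected counts given rate r
grid = [i / 100 for i in range(1, 51)]
print(estimate_rate(observed, model, grid))  # → 0.1
```

With more categories (e.g. mutation counts per cell division stage), the same statistic is minimized over a vector of rates under nonnegativity constraints, which is where the weighted least-squares formulation from the abstract comes in.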
