
    A Primer on Causality in Data Science

    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators, including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner. Comment: 26 pages (with references); 4 figure
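    As a rough illustration of the "translate" and "identify" steps for an exposure at a single time point, the block below writes the average treatment effect first as a causal parameter and then as a statistical estimand. The notation is an assumption made here for illustration and does not come from the abstract: W denotes baseline covariates, A a binary exposure, Y the outcome, and Y_a the counterfactual outcome under exposure level a. The equality holds under the usual no-unmeasured-confounding and positivity assumptions, stated on the last two lines.

```latex
\begin{align*}
\Psi^{F} &= \mathbb{E}[Y_{1}] - \mathbb{E}[Y_{0}]
  && \text{(causal parameter)} \\
\Psi(P_{0}) &= \mathbb{E}_{W}\big[\, \mathbb{E}(Y \mid A = 1, W) - \mathbb{E}(Y \mid A = 0, W) \,\big]
  && \text{(statistical estimand)} \\
&\text{assuming } Y_{a} \perp A \mid W \text{ for } a \in \{0, 1\}
  && \text{(no unmeasured confounding)} \\
&\text{and } 0 < P_{0}(A = 1 \mid W) < 1 \text{ almost surely}
  && \text{(positivity)}
\end{align*}
```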

    Non-equilibrium Green's function approach to inhomogeneous quantum many-body systems using the Generalized Kadanoff Baym Ansatz

    In non-equilibrium Green's function calculations the use of the Generalized Kadanoff-Baym Ansatz (GKBA) allows for a simple approximate reconstruction of the two-time Green's function from its time-diagonal value. With this a drastic reduction of the computational needs is achieved in time-dependent calculations, making longer time propagation possible and more complex systems accessible. This paper gives credit to the GKBA that was introduced 25 years ago. After a detailed derivation of the GKBA, we recall its application to homogeneous systems and show how to extend it to strongly correlated, inhomogeneous systems. As a proof of concept, we present results for a 2-electron quantum well, where the correct treatment of the correlated electron dynamics is crucial for the correct description of the equilibrium and dynamic properties

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment

    A Comprehensive Survey of Brane Tilings

    An infinite class of 4d $\mathcal{N}=1$ gauge theories can be engineered on the worldvolume of D3-branes probing toric Calabi-Yau 3-folds. This kind of setup has multiple applications, ranging from the gauge/gravity correspondence to local model building in string phenomenology. Brane tilings fully encode the gauge theories on the D3-branes and have substantially simplified their connection to the probed geometries. The purpose of this paper is to push the boundaries of computation and to produce as comprehensive a database of brane tilings as possible. We develop efficient implementations of brane tiling tools particularly suited for this search. We present the first complete classification of toric Calabi-Yau 3-folds with toric diagrams up to area 8 and the corresponding brane tilings. This classification is of interest to physicists and mathematicians alike. Comment: 39 pages. Link to Mathematica modules provide

    Estimating Effects on Rare Outcomes: Knowledge is Power

    Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite sample simulations, the proposed estimator performed as well as, if not better than, alternative estimators, including the propensity score matching estimator, the inverse probability of treatment weighted (IPTW) estimator, augmented-IPTW, and the standard TMLE algorithm. The new estimator remained unbiased if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed the point estimates were within the parameter range. Our results highlight the potential for double robust, semiparametric efficient estimation with rare event

    Quantum Zeno and anti-Zeno effects by indirect measurement with finite errors

    We study the quantum Zeno effect and the anti-Zeno effect in the case of 'indirect' measurements, where a measuring apparatus does not act directly on an unstable system, for a realistic model with finite errors in the measurement. A general and simple formula for the decay rate of the unstable system under measurement is derived. In the case of a Lorentzian form factor, we calculate the full time evolutions of the decay rate, the response of the measuring apparatus, and the probability of errors in the measurement. It is shown that not only the response time but also the detection efficiency plays a crucial role. We present the prescription for observing the quantum Zeno and anti-Zeno effects, as well as the prescriptions for avoiding or calibrating these effects in general experiments. Comment: 4 pages, 3 figure

    Cis-regulatory elements of the mitotic regulator, string/Cdc25

    Mitosis in most Drosophila cells is triggered by brief bursts of transcription of string (stg), a Cdc25-type phosphatase that activates the mitotic kinase, Cdk1 (Cdc2). To understand how string transcription is regulated, we analyzed the expression of string-lacZ reporter genes covering approximately 40 kb of the string locus. We also tested protein-coding fragments of the string locus of 6 kb to 31.6 kb for their ability to complement loss of string function in embryos and imaginal discs. A plethora of cis-acting elements spread over >30 kb control string transcription in different cells and tissue types. Regulatory elements specific to subsets of epidermal cells, mesoderm, trachea and nurse cells were identified, but the majority of the string locus appears to be devoted to controlling cell proliferation during neurogenesis. Consistent with this, compact promoter-proximal sequences are sufficient for string function during imaginal disc growth, but additional distal elements are required for the development of neural structures in the eye, wing, leg and notum. We suggest that, during evolution, cell-type-specific control elements were acquired by a simple growth-regulated promoter as a means of coordinating cell division with developmental processes, particularly neurogenesis. Dara A. Lehman; Briony Patterson; Laura A. Johnston; Tracy Balzer; Jessica S. Britton; Robert Saint and Bruce A. Edga

    The epidemiology of chronic kidney disease (CKD) in rural East Africa: A population-based study.

    Background: Chronic kidney disease (CKD) may be common among individuals living in sub-Saharan Africa due to the confluence of CKD risk factors and genetic predisposition. Methods: We ascertained the prevalence of CKD and its risk factors among a sample of 3,686 participants of a population-based HIV trial in rural Uganda and Kenya. Prevalent CKD was defined as a serum creatinine-based estimated glomerular filtration rate <60 mL/min/1.73 m² or proteinuria (urine dipstick ≥1+). We used inverse-weighting to estimate the population prevalence of CKD, and multivariable log-link Poisson models to assess the associations of potential risk factors with CKD. Results: The estimated CKD prevalence was 6.8% (95% CI 5.7-8.1%) overall and varied by region, being 12.5% (10.1-15.4%) in eastern Uganda, 3.9% (2.2-6.8%) in southwestern Uganda and 3.7% (2.7-5.1%) in western Kenya. Risk factors associated with greater CKD prevalence included age ≥60 years (adjusted prevalence ratio [aPR] 3.5 [95% CI 1.9-6.5] compared with age 18-29 years), HIV infection (aPR 1.6 [1.1-2.2]), and residence in eastern Uganda (aPR 3.9 [2.6-5.9]). However, two-thirds of individuals with CKD did not have HIV, diabetes, or hypertension as risk factors. Furthermore, we noted many individuals who did not have proteinuria had dipstick positive leukocyturia or hematuria. Conclusion: The prevalence of CKD is appreciable in rural East Africa and there are considerable regional differences. Conventional risk factors appear to only explain a minority of cases, and leukocyturia and hematuria were common, highlighting the need for further research into understanding the nature of CKD in sub-Saharan Africa.
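    The abstract above does not spell out its weighting scheme, so the following is only a minimal sketch of one common form of inverse weighting for a prevalence estimate. It assumes each participant has a known or estimated probability of being sampled and measured, and it weights each observed outcome by the inverse of that probability. The function name `ipw_prevalence` and the toy numbers are hypothetical and are not the study's data.

```python
from typing import Sequence


def ipw_prevalence(outcomes: Sequence[int], probs: Sequence[float]) -> float:
    """Inverse-probability-weighted prevalence (normalized weights).

    outcomes: 1 if the participant has the condition, 0 otherwise.
    probs:    assumed probability that each participant was sampled and
              measured; must lie in (0, 1].
    """
    if len(outcomes) != len(probs):
        raise ValueError("outcomes and probs must have the same length")
    if len(outcomes) == 0:
        raise ValueError("at least one participant is required")
    if any(not (0.0 < p <= 1.0) for p in probs):
        raise ValueError("each probability must be in (0, 1]")

    # Each participant stands in for 1/p members of the target population.
    weights = [1.0 / p for p in probs]
    weighted_cases = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_cases / sum(weights)


# Toy example (made-up numbers): two under-sampled participants have the
# condition, so the weighted prevalence exceeds the crude 2/4 = 0.5.
print(ipw_prevalence([1, 0, 0, 1], [0.5, 1.0, 1.0, 0.25]))  # 0.75
```

    Normalizing by the sum of the weights keeps the estimate between 0 and 1; when all probabilities are equal it reduces to the ordinary sample proportion.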

    Targeted Estimation and Inference for the Sample Average Treatment Effect

    While the population average treatment effect has been the subject of extensive methods and applied research, less consideration has been given to the sample average treatment effect: the mean difference in the counterfactual outcomes for the study units. The sample parameter is easily interpretable and is arguably the most relevant when the study units are not representative of a greater population or when the exposure's impact is heterogeneous. Formally, the sample effect is not identifiable from the observed data distribution. Nonetheless, targeted maximum likelihood estimation (TMLE) can provide an asymptotically unbiased and efficient estimate of both the population and sample parameters. In this paper, we study the asymptotic and finite sample properties of the TMLE for the sample effect and provide a conservative variance estimator. In most settings, the sample parameter can be estimated more efficiently than the population parameter. Finite sample simulations illustrate the potential gains in precision and power from selecting the sample effect as the target of inference. As a motivating example, we discuss the Sustainable East Africa Research in Community Health (SEARCH) study, an ongoing cluster randomized trial for HIV prevention and treatment.
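    To make the distinction between the two targets concrete, the block below contrasts the population and sample average treatment effects. The notation is an assumption introduced here for illustration: Y_i(1) and Y_i(0) are the counterfactual outcomes of study unit i under exposure and no exposure, and n is the number of study units.

```latex
\begin{align*}
\text{PATE} &= \mathbb{E}\big[\, Y(1) - Y(0) \,\big]
  && \text{(expectation over the target population)} \\
\text{SATE} &= \frac{1}{n} \sum_{i=1}^{n} \big[\, Y_i(1) - Y_i(0) \,\big]
  && \text{(average over the $n$ study units)}
\end{align*}
```

    Because only one of Y_i(1) and Y_i(0) is observed for each unit, the sample effect depends on unobserved counterfactuals, which is consistent with the abstract's remark that it is not identifiable from the observed data distribution.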

    The age-specific burden and household and school-based predictors of child and adolescent tuberculosis infection in rural Uganda.

    Background: The age-specific epidemiology of child and adolescent tuberculosis (TB) is poorly understood, especially in rural areas of East Africa. We sought to characterize the age-specific prevalence and predictors of TB infection among children and adolescents living in rural Uganda, and to explore the contribution of household TB exposure to TB infection. Methods: From 2015-2016 we placed and read 3,121 tuberculin skin tests (TST) in children (5-11 years old) and adolescents (12-19 years old) participating in a nested household survey in 9 rural Eastern Ugandan communities. TB infection was defined as a positive TST (induration ≥10 mm or ≥5 mm if living with HIV). Age-specific prevalence was estimated using inverse probability weighting to adjust for incomplete measurement. Generalized estimating equations were used to assess the association between TB infection and multi-level predictors. Results: The adjusted prevalence of TB infection was 8.5% (95% CI: 6.9-10.4) in children and 16.7% (95% CI: 14.0-19.7) in adolescents. Nine percent of children and adolescents with a prevalent TB infection had a household TB contact. Among children, having a household TB contact was strongly associated with TB infection (aOR 5.5, 95% CI: 1.7-16.9), but the strength of this association declined among adolescents and did not meet significance (aOR 2.3, 95% CI: 0.8-7.0). The population attributable fraction of TB infection due to a household TB contact was 8% for children and 4% among adolescents. Mobile children and adolescents who travel outside of their community for school had 1.7-fold (95% CI: 1.0-2.9) higher odds of TB infection than those who attended school in the community. Conclusion: Children and adolescents in this area of rural eastern Uganda suffer a significant burden of TB. The majority of TB infections are not explained by a known household TB contact. Our findings underscore the need for community-based TB prevention interventions, especially among mobile youth.