
    Pruning Attributes From Data Cubes with Diamond Dicing

    Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or, separately, the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC.

    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow the application of both of these thresholds simultaneously, selecting products and shops satisfying both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages
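
To make the interaction between dimensions concrete, here is a minimal Python sketch of one naive way to apply a product threshold and a shop threshold simultaneously. It is an illustration under stated assumptions, not the authors' algorithm or implementation: the function name `diamond_dice`, the `(product, shop, amount)` fact layout, and the repeated-pruning loop are all choices made for this example. Removing a shop lowers the totals of its products, which may push a product below its threshold, so pruning is repeated until nothing changes.

```python
from collections import defaultdict


def diamond_dice(facts, min_product_total, min_shop_total):
    """Keep only facts whose product AND shop both meet their thresholds.

    facts: iterable of (product, shop, amount) tuples (hypothetical layout).
    Totals are recomputed over the surviving facts after every pruning pass,
    because dropping facts in one dimension changes totals in the other.
    """
    cells = list(facts)
    while True:
        product_totals = defaultdict(int)
        shop_totals = defaultdict(int)
        for product, shop, amount in cells:
            product_totals[product] += amount
            shop_totals[shop] += amount

        kept = [
            (product, shop, amount)
            for product, shop, amount in cells
            if product_totals[product] >= min_product_total
            and shop_totals[shop] >= min_shop_total
        ]
        if len(kept) == len(cells):  # fixed point: nothing was pruned
            return kept
        cells = kept


# Made-up example data, not taken from the paper.
example_facts = [
    ("tv", "A", 90_000),
    ("tv", "B", 20_000),
    ("radio", "A", 450_000),
    ("phone", "B", 60_000),
]
result = diamond_dice(example_facts, 100_000, 400_000)
```

In this made-up data set, "tv" initially passes the product threshold (110 000 in total), but once shop "B" is pruned its remaining total drops to 90 000 and it is removed in the next pass. Only the ("radio", "A") fact survives, which a single independent filter per dimension would not reveal.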

    Influence of a dam on fine-sediment storage in a canyon river

    Glen Canyon Dam has caused a fundamental change in the distribution of fine sediment storage in the 99-km reach of the Colorado River in Marble Canyon, Grand Canyon National Park, Arizona. The two major storage sites for fine sediment (i.e., sand and finer material) in this canyon river are lateral recirculation eddies and the main-channel bed. We use a combination of methods, including direct measurement of sediment storage change, measurements of sediment flux, and comparison of the grain size of sediment found in different storage sites relative to the supply and that in transport, in order to evaluate the change in both the volume and location of sediment storage. The analysis shows that the bed of the main channel was an important storage environment for fine sediment in the predam era. In years of large seasonal accumulation, approximately 50% of the fine sediment supplied to the reach from upstream sources was stored on the main-channel bed. In contrast, sediment budgets constructed for two short-duration, high experimental releases from Glen Canyon Dam indicate that approximately 90% of the sediment discharge from the reach during each release was derived from eddy storage, rather than from sandy deposits on the main-channel bed. These results indicate that the majority of the fine sediment in Marble Canyon is now stored in eddies, even though they occupy a small percentage (~17%) of the total river area. Because of a 95% reduction in the supply of fine sediment to Marble Canyon, future high releases without significant input of tributary sediment will potentially erode sediment from long-term eddy storage, resulting in continued degradation in Marble Canyon.

    The impact of maternal infection with Mycobacterium tuberculosis on the infant response to bacille Calmette-Guérin immunization.

    Bacille Calmette-Guérin (BCG) immunization provides variable protection against tuberculosis. Prenatal antigen exposure may have lifelong effects on responses to related antigens and pathogens. We therefore hypothesized that maternal latent Mycobacterium tuberculosis infection (LTBI) influences infant responses to BCG immunization at birth. We measured antibody (n = 53) and cellular (n = 31) responses to M. tuberculosis purified protein derivative (PPD) in infants of mothers with and without LTBI, in cord blood and at one and six weeks after BCG. The concentrations of PPD-specific antibodies declined between birth (median [interquartile range (IQR)] 5600 ng/ml [3300-11 050] in cord blood) and six weeks (0.00 ng/ml [0-288]). Frequencies of PPD-specific IFN-γ-expressing CD4+ T cells increased at one week and declined between one and six weeks (p = 0.031). Frequencies of IL-2- and TNF-α-expressing PPD-specific CD4+ T cells increased between one and six weeks (p = 0.019, p = 0.009, respectively). At one week, the frequency of PPD-specific CD4+ T cells expressing any of the three cytokines, combined, was lower among infants of mothers with LTBI, in crude analyses (p = 0.002) and after adjusting for confounders (mean difference [95% CI] -0.041% [-0.082, -0.001]). In conclusion, maternal LTBI was associated with lower infant anti-mycobacterial T-cell responses immediately following BCG immunization. These findings are being explored further in a larger study.

    The Use of Interferon Gamma Inducible Protein 10 as a Potential Biomarker in the Diagnosis of Latent Tuberculosis Infection in Uganda.

    BACKGROUND: In the absence of a gold standard for the diagnosis of latent tuberculosis (TB) infection (LTBI), the current tests available for the diagnosis of LTBI are limited by their inability to differentiate between LTBI and active TB disease. We investigated IP-10 as a potential biomarker for LTBI among household contacts exposed to sputum positive active TB cases. METHODS: Active TB cases and contacts were recruited into a cohort with six months' follow-up. Contacts were tested for LTBI using QuantiFERON®-TB Gold In-Tube (QFN) assay and the tuberculin skin test (TST). Baseline supernatants from the QFN assay of 237 contacts and 102 active TB cases were analysed for Mycobacterium tuberculosis (MTB) specific and mitogen specific IP-10 responses. RESULTS: Contacts with LTBI (QFN+TST+) had the highest MTB specific IP-10 responses at baseline, compared to uninfected contacts (QFN-TST-), p < 0.0001; and active cases, p = 0.01. Using a cut-off of 8,239 pg/ml, MTB specific IP-10 was able to diagnose LTBI with a sensitivity of 87.1% (95% CI, 76.2-94.3) and specificity of 90.9% (95% CI, 81.3-96.6). MTB specific to mitogen specific IP-10 ratio was higher in HIV negative active TB cases, compared to HIV negative latently infected contacts, p = 0.0004. Concentrations of MTB specific IP-10 were higher in contacts with TST conversion (negative at baseline, positive at 6 months) than in those that were persistently TST negative, p = 0.001. CONCLUSION: IP-10 performed well in differentiating contacts with either latent or active TB from those who were uninfected but was not able to differentiate LTBI from active disease except when MTB specific to mitogen specific ratios were used in HIV negative adults. In addition, IP-10 had the potential to diagnose 'recent TB infection' in persons classified as having LTBI using the TST. Such individuals with strong IP-10 responses would likely benefit from chemoprophylaxis.

    Effects of antiplatelet therapy on stroke risk by brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases: subgroup analyses of the RESTART randomised, open-label trial

    Background: Findings from the RESTART trial suggest that starting antiplatelet therapy might reduce the risk of recurrent symptomatic intracerebral haemorrhage compared with avoiding antiplatelet therapy. Brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases (such as cerebral microbleeds) are associated with greater risks of recurrent intracerebral haemorrhage. We did subgroup analyses of the RESTART trial to explore whether these brain imaging features modify the effects of antiplatelet therapy.

    Genetic aspects of congenital nephrotic syndrome : a consensus statement from the ERKNet-ESPN inherited glomerulopathy working group

    Congenital nephrotic syndrome (CNS) is a heterogeneous group of disorders presenting with massive proteinuria within the first 3 months of life, almost inevitably leading to end-stage kidney disease. The Work Group for the European Reference Network for Kidney Diseases (ERKNet) and the European Society for Pediatric Nephrology (ESPN) has developed a consensus statement on genetic aspects of CNS diagnosis and management. The presented expert opinion recommends genetic diagnostics as the key diagnostic test to be ordered already during the initial evaluation of the patient, discusses which phenotyping workup should be performed and presents known genotype-phenotype correlations. Peer reviewed

    Annotating datasets in behavioural and social sciences to promote interoperability: development of the Schema for Ontology-based Dataset Annotation (SODA) version 1.0 [version 1; peer review: awaiting peer review]

    Background and aims: Ontologies are increasingly employed to help find, use and synthesise information, but methods for using them to annotate documents and datasets remain in their infancy in the behavioural and social sciences. The Behavioural Research UK DEMO-DATA project aimed to develop a prototype schema for annotating datasets in behavioural and social sciences. / Methods: A case-study dataset (the ‘Smoking Toolkit Study’), used to inform an Agent-Based Model of trajectories in cigarette smoking and cessation in England, was chosen for annotation using two ontologies - The Behaviour Change Intervention Ontology (BCIO) and the Addiction Ontology (AddictO). The data set included 21 variables representing information about sociodemographic and tobacco and nicotine use attributes of the study population. A preliminary version of the schema for linking variables to ontology classes was developed as a basis for annotating each variable in the dataset. This was applied and revised iteratively until it was judged by an expert panel of domain experts and modellers to represent the variables sufficiently accurately to enable searching for and integration of data. / Results: The prototype Schema for Ontology-based Dataset Annotation (SODA) version 1.0 was developed over seven iterations. Variables were represented by an ‘object property’|‘ontology class’ expression (e.g., ‘has characteristic’|‘extent of social smoking’) together with information about the data types (e.g., numbers, ontology subclasses, or Boolean values), measurement source, unit of measurement, any coding or data transformations and whether or not the variable was fully characterised by the annotation. The prototype schema was applied successfully to the smoking dataset with 15 new ontology classes being created as required. / Conclusions: A prototype schema for annotating behavioural and social science datasets was developed and successfully applied to a dataset on smoking in England using ontology relations and classes. The next step is to further develop and evaluate the schema by application to case studies with a range of users and other datasets.
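
The annotation described in the Results could be pictured as a small record per variable. The Python sketch below is only an illustration of that idea: the class name `VariableAnnotation`, every field name, the variable name `social_smoking`, and the chosen data type are hypothetical and are not taken from the SODA specification. Only the ‘has characteristic’|‘extent of social smoking’ pairing comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariableAnnotation:
    """Hypothetical record linking one dataset variable to an ontology class."""

    variable_name: str                      # column name in the dataset (assumed)
    object_property: str                    # e.g. "has characteristic"
    ontology_class: str                     # e.g. "extent of social smoking"
    data_type: str                          # e.g. "number", "ontology subclass", "boolean"
    measurement_source: Optional[str] = None
    unit: Optional[str] = None
    transformations: List[str] = field(default_factory=list)  # coding or data transformations
    fully_characterised: bool = True        # does the annotation capture the variable fully?

    def expression(self) -> str:
        """Render the 'object property'|'ontology class' expression."""
        return f"'{self.object_property}'|'{self.ontology_class}'"


# Illustrative instance; the variable name and data type are assumptions.
annotation = VariableAnnotation(
    variable_name="social_smoking",
    object_property="has characteristic",
    ontology_class="extent of social smoking",
    data_type="number",
)
```

Calling `annotation.expression()` yields the paired expression as a string, which could serve as a search key when looking for variables across annotated datasets.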

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions, 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.