
    LINVIEW: Incremental View Maintenance for Complex Analytical Queries

    Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.

    On correctness in RDF stream processor benchmarking

    Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They put a special focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, making it possible to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench to address query result correctness verification using an automatic method.

    SRBench: A streaming RDF/SPARQL benchmark

    We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art.

    Care in the time of COVID-19: impact on the diagnosis and treatment of breast cancer in a large, integrated health care system

    Purposes: To delineate operational changes in Kaiser Permanente Northern California breast care and evaluate the impact of these changes during the initial COVID-19 Shelter-in-Place period (SiP, 3/17/20-5/17/20). Methods: By extracting data from institutional databases and reviewing electronic medical charts, we compared clinical and treatment characteristics of breast cancer patients diagnosed 3/17/20-5/17/20 to those diagnosed 3/17/19-5/17/2019. Outcomes included time from biopsy to consultation and treatment. Comparisons were made using Chi-square or Wilcoxon rank-sum tests. Results: Fewer new breast cancers were diagnosed in 2020 during the SiP period than during a similar period in 2019 (n = 247 vs n = 703). A higher percentage presented with symptomatic disease in 2020 than 2019 (78% vs 37%, p < 0.001). Higher percentages of 2020 patients presented with grade 3 (37% vs 25%, p = 0.004) and triple-negative tumors (16% vs 10%, p = 0.04). A smaller percentage underwent surgery first in 2020 (71% vs 83%, p < 0.001) and a larger percentage had neoadjuvant chemotherapy (16% vs 11%, p < 0.001). Telehealth utilization increased from 0.8% in 2019 to 70.0% in 2020. Times to surgery and neoadjuvant chemotherapy were shorter in 2020 than 2019 (19 vs 26 days, p < 0.001, and 23 vs 28 days, p = 0.03, respectively). Conclusions: During SiP, fewer breast cancers were diagnosed than during a similar period in 2019, and a higher proportion presented with symptomatic disease. Early-stage breast cancer diagnoses decreased, while metastatic cancer diagnoses remained similar. Telehealth increased significantly, and times to treatment were shorter in 2020 than 2019. Our system continued to provide timely breast cancer treatment despite significant pandemic-driven disruption.