308 research outputs found
LINVIEW: Incremental View Maintenance for Complex Analytical Queries
Many analytics tasks and machine learning problems can be naturally expressed
by iterative linear algebra programs. In this paper, we study the incremental
view maintenance problem for such complex analytical queries. We develop a
framework, called LINVIEW, for capturing deltas of linear algebra programs and
understanding their computational cost. Linear algebra operations tend to cause
an avalanche effect where even very local changes to the input matrices spread
out and infect all of the intermediate results and the final view, causing
incremental view maintenance to lose its performance benefit over
re-evaluation. We develop techniques based on matrix factorizations to contain
such epidemics of change. As a consequence, our techniques make incremental
view maintenance of linear algebra practical and usually substantially cheaper
than re-evaluation. We show, both analytically and experimentally, the
usefulness of these techniques when applied to standard analytics tasks. Our
evaluation demonstrates the efficiency of LINVIEW in generating parallel
incremental programs that outperform re-evaluation techniques by more than an
order of magnitude. Comment: 14 pages, SIGMO
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale
On correctness in RDF stream processor benchmarking
Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They put a special focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, making it possible to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench to address query result correctness verification using an automatic method
SRBench: A streaming RDF/SPARQL benchmark
We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art
Glucosinolates, myrosinase hydrolysis products, and flavonols found in rocket (Eruca sativa and Diplotaxis tenuifolia)
Rocket species have been shown to have very high concentrations of glucosinolates and flavonols, which have numerous positive health benefits with regular consumption. In this review we highlight how breeders and processors of rocket species can utilize genomic and phytochemical research to improve varieties and enhance the nutritive benefits to consumers. Plant breeders are increasingly looking to new technologies such as HPLC, UPLC, LC-MS and GC-MS to screen populations for their phytochemical content to inform plant selections. Here we collate the research that has been conducted to-date in rocket, and summarise all glucosinolate and flavonol compounds identified in the species. We emphasize the importance of the broad screening of populations for phytochemicals and myrosinase degradation products, as well as unique traits that may be found in underutilized gene bank resources. We also stress that collaboration with industrial partners is becoming essential for long-term plant breeding goals through research
Care in the time of COVID-19: impact on the diagnosis and treatment of breast cancer in a large, integrated health care system
Purposes: To delineate operational changes in Kaiser Permanente Northern California breast care and evaluate the impact of these changes during the initial COVID-19 Shelter-in-Place period (SiP, 3/17/20-5/17/20). Methods: By extracting data from institutional databases and reviewing electronic medical charts, we compared clinical and treatment characteristics of breast cancer patients diagnosed 3/17/20-5/17/20 to those diagnosed 3/17/19-5/17/2019. Outcomes included time from biopsy to consultation and treatment. Comparisons were made using Chi-square or Wilcoxon rank-sum tests. Results: Fewer new breast cancers were diagnosed in 2020 during the SiP period than during a similar period in 2019 (n = 247 vs n = 703). A higher percentage presented with symptomatic disease in 2020 than 2019 (78% vs 37%, p < 0.001). Higher percentages of 2020 patients presented with grade 3 (37% vs 25%, p = 0.004) and triple-negative tumors (16% vs 10%, p = 0.04). A smaller percentage underwent surgery first in 2020 (71% vs 83%, p < 0.001) and a larger percentage had neoadjuvant chemotherapy (16% vs 11%, p < 0.001). Telehealth utilization increased from 0.8% in 2019 to 70.0% in 2020. Times to surgery and neoadjuvant chemotherapy were shorter in 2020 than 2019 (19 vs 26 days, p < 0.001, and 23 vs 28 days, p = 0.03, respectively). Conclusions: During SiP, fewer breast cancers were diagnosed than during a similar period in 2019, and a higher proportion presented with symptomatic disease. Early-stage breast cancer diagnoses decreased, while metastatic cancer diagnoses remained similar. Telehealth increased significantly, and times to treatment were shorter in 2020 than 2019. Our system continued to provide timely breast cancer treatment despite significant pandemic-driven disruption
