97 research outputs found

Causality and the semantics of provenance

Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs. Comment: Workshop submissio

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Multiparameter shallow-seismic waveform inversion based on the Jensen-Shannon divergence

Author: Chen Xiaofei
Cui Shihao
Guan Jianbo
Li Jing
Liu Yu
Yan Yingwei
Publication venue: Oxford University Press
Publication date: 01/07/2024

ABSTRACT: Seismic full-waveform inversion (FWI), or waveform inversion (WI), has gained extensive attention as a cutting-edge imaging method that is expected to reveal high-resolution images of complex geological structures. In this paper, we regard each 1-D signal in the inversion system as a 1-D probability distribution, use the Jensen–Shannon divergence from information theory to measure the discrepancy between the predicted and observed signals, and implement a novel 2-D multiparameter shallow-seismic WI (MSWI). Essentially, the novel approach applies an implicit weighting along the time axis to each 1-D adjoint source defined by classical WI (CWI), thus enhancing the illumination of the deeper medium compared with CWI. In inversion tests on a two-layer model and a fault model, the reconstruction accuracy of the new method for S-wave velocity and density is about 30 and 20 per cent higher, respectively, than that of CWI under the same conditions. The reconstruction performance for P-wave velocity is almost equal for the two methods. In addition, the new 2-D MSWI is resilient to white Gaussian noise in the data. Numerically, the inversion system shows its strongest sensitivities to the S-wave velocity and density and its poorest sensitivity to the P-wave velocity. Finally, we test the novel method with a detection case for a power tunnel
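To make the misfit idea in this abstract concrete, here is a minimal sketch of a Jensen–Shannon divergence between two 1-D signals. The abstract does not say how a signal is turned into a probability distribution, so the squaring-and-normalizing step in `to_distribution` is purely an assumption for illustration, as are the helper names and the Gaussian test pulses. It is not the authors' implementation.

```python
import math

def to_distribution(signal, eps=1e-12):
    # Assumption (not from the abstract): square the amplitudes so they are
    # non-negative, add a small eps to avoid log(0), and normalize to sum to 1.
    energy = [s * s + eps for s in signal]
    total = sum(energy)
    return [e / total for e in energy]

def kl_divergence(p, q):
    # Kullback-Leibler divergence KL(p || q) for strictly positive p and q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def js_divergence(p, q):
    # Jensen-Shannon divergence: average KL of p and q to their mixture m.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def pulse(center, width=0.05, n=200):
    # Hypothetical test signal: a Gaussian pulse sampled on [0, 1].
    return [math.exp(-(((i / (n - 1)) - center) / width) ** 2) for i in range(n)]

observed = to_distribution(pulse(0.40))
predicted = to_distribution(pulse(0.45))
misfit = js_divergence(predicted, observed)
print(misfit)
```

The divergence is symmetric in its two arguments and, with natural logarithms, bounded above by log 2, which is one reason it can be attractive as a misfit measure.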

PolyPublie

Proteogenomic Data and Resources for Pan-Cancer Analysis

Author: Aguet François
Akiyama Yo
Anand Shankara
Birger Chet
Calinawan Anna P
Cao Song
Chaudhary Rekha
Chilappagari Padmini
Cieslik Marcin
Colaprico Antonio
Da Veiga Leprevost Felipe
Day Corbin
Ding Li
Domagalski Marcin J
Dou Yongchao
Esai Selvan Myvizhi
Fenyö David
Foltz Steven M
Francis Alicia
Geffen Yifat
Getz Gad
Gonzalez-Robles Tania
Gümüş Zeynep H
Heiman David
Holck Michael
Hong Runyu
Hu Yingwei
Jaehnig Eric J
Ji Jiayi
Jiang Wen
Katsnelson Lizabeth
Ketchum Karen A
Klein Robert J
Lei Jonathan T
Li Yize
Liang Wen-Wei
Liao Yuxing
Lindgren Caleb M
Ma Lei
Ma Weiping
MacCoss Michael J
Martins Rodrigues Fernanda
McKerrow Wilson
Nesvizhskii Alexey I
Nguyen Ngoc
Oldroyd Robert
Payne Samuel H
Pilozzi Alexander
Pugliese Pietro
Reva Boris
Robles Ana I
Rudnick Paul
Ruggles Kelly V
Rykunov Dmitry
Savage Sara R
Schnaubelt Michael
Schraink Tobias
Shi Zhiao
Singhal Deepak
Song Xiaoyu
Storrs Erik
Terekhanova Nadezhda V
Thangudu Ratna R
Thiagarajan Mathangi
Wang Joshua M
Wang Liang-Bo
Wang Pei
Wang Ying
Wen Bo
Wu Yige
Wyczalkowski Matthew A
Xin Yi
Yao Lijun
Yi Xinpei
Zhang Bing
Zhang Hui
Zhang Qing
Zhou Daniel Cui
Zuhl Maya
Publication venue: DigitalCommons@TMC
Publication date: 14/08/2023

The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) investigates tumors from a proteogenomic perspective, creating rich multi-omics datasets connecting genomic aberrations to cancer phenotypes. To facilitate pan-cancer investigations, we have generated harmonized genomic, transcriptomic, proteomic, and clinical data for >1000 tumors in 10 cohorts to create a cohesive and powerful dataset for scientific discovery. We outline efforts by the CPTAC pan-cancer working group in data harmonization, data dissemination, and computational resources for aiding biological discoveries. We also discuss challenges for multi-omics data integration and analysis, specifically the unique challenges of working with both nucleotide sequencing and mass spectrometry proteomics data

DigitalCommons@The Texas Medical Center

Storing Auxiliary Data for Efficient Maintenance and Lineage Tracing of Complex Views

Author: Jennifer Widom
Yingwei Cui
Publication date: 01/01/1999

As views in a data warehouse become more complex, the view maintenance process can become very complicated and potentially very inefficient. Storing auxiliary views in the warehouse can reduce the complexity and improve the efficiency of view maintenance, and the same auxiliary views can help in efficiently answering lineage tracing queries over the warehouse views. In this paper, we study the problem of selecting auxiliary views to materialize in order to minimize the total view maintenance and lineage tracing cost. We consider relational views with arbitrary use of aggregation operators, and we define an initial search space for our optimization problem based on a normal form for such view definitions. We present several auxiliary view selection algorithms, and to study their performance we conduct experiments using the TPC-D benchmark in addition to synthetic view definitions and statistics. The results of our experiments show: (1) the exhaustive algorithm that selects the optimal set of auxiliary views is far too expensive in many cases; (2) two heuristic algorithms that we present select good (often optimal) sets of auxiliary views in a much shorter time; (3) even auxiliary views selected by a very simple algorithm can significantly reduce the overall view maintenance and lineage tracing cost

CiteSeerX

Lineage Tracing for General Data Warehouse Transformations

Author: Jennifer Widom
Yingwei Cui
Publication date: 01/01/2001

Data warehousing systems integrate information from operational data sources into a central repository to enable analysis and mining of the integrated information. During the integration process, source data typically undergoes a series of transformations, which may vary from simple algebraic operations or aggregations to complex "data cleansing" procedures. In a warehousing environment, the data lineage problem is that of tracing warehouse data items back to the original source items from which they were derived. We formally define the lineage tracing problem in the presence of general data warehouse transformations, and we present algorithms for lineage tracing in this environment. Our tracing procedures take advantage of known structure or properties of transformations when present, but also work in the absence of such information. Our results can be used as the basis for a lineage tracing tool in a general warehousing setting, and also can guide the design of data warehouses that enable efficient lineage tracing.
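The contrast between tracing with and without known transformation properties can be sketched as follows. The item-wise independence property, the function names, and the `clean_names` cleansing step are all hypothetical choices made for this illustration. The abstract only states that the procedures exploit structure when it is known, so this is not the paper's algorithm.

```python
def trace_black_box(transform, inputs, output_item):
    # With no knowledge of the transformation, any input item may have
    # contributed, so the safest lineage is the entire input set.
    return list(inputs)

def trace_item_wise(transform, inputs, output_item):
    # Assumption: the transformation processes each input item independently,
    # so an input belongs to the lineage if it alone yields the output item.
    return [item for item in inputs if output_item in transform([item])]

def clean_names(rows):
    # Hypothetical "data cleansing" step: trim, lowercase, drop empty rows.
    return [r.strip().lower() for r in rows if r.strip()]

source = ["  IBM ", "ibm", "", "Dell"]
warehouse = clean_names(source)

print(trace_item_wise(clean_names, source, "ibm"))
print(len(trace_black_box(clean_names, source, "ibm")))
```

Knowing the property narrows the lineage of `"ibm"` to the two source rows that actually produce it, whereas the black-box fallback has to return every source row.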

CiteSeerX

Lineage Tracing in a Data Warehousing System

Author: Jennifer Widom
Yingwei Cui

e system applies the tracing procedures to the source tables and/or auxiliary views to obtain the lineage results and show the specific view data derivation process. 1 Lineage Tracing System 1.1 Lineage Example Given a view data item I, the exact set of source data that produced I is called I's lineage. We use an example to illustrate the concepts; a full formalization of the problem along with solutions and algorithms are given in [2]. Consider a financial data warehouse with the three source tables shown in Figure 3. A view Promising (Figure 4) is defined to contain all "promising" industries, where an industry is regarded as promising if some stock in that industry is gaining money over all purchases, and the stock has a price-earnings ratio below 40. Over our sample source data the view contains two tuples, ⟨computer⟩ and ⟨m
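The Promising view and the lineage of one of its tuples can be illustrated with a small sketch. The source tables of Figure 3 are not reproduced in this excerpt, so the three tables below (`STOCK`, `QUOTE`, `BUY`), their columns, and every value in them are invented for illustration. The sketch also assumes that "gaining money over all purchases" means a positive total of shares times (current price minus price paid).

```python
# Hypothetical source tables; the real ones appear in Figure 3 of the paper.
STOCK = [("AAA", "computer"), ("BBB", "computer"), ("CCC", "retail")]
QUOTE = {"AAA": (120.0, 25.0), "BBB": (30.0, 60.0), "CCC": (10.0, 15.0)}  # price, P/E
BUY = [("AAA", 10, 100.0), ("AAA", 5, 130.0), ("BBB", 10, 20.0), ("CCC", 10, 12.0)]

def gain(stock_id):
    # Total gain over all purchases of one stock (assumed definition).
    price, _ = QUOTE[stock_id]
    return sum(shares * (price - paid) for sid, shares, paid in BUY if sid == stock_id)

def is_promising_stock(stock_id):
    _, pe = QUOTE[stock_id]
    return gain(stock_id) > 0 and pe < 40

def promising_view():
    # An industry is promising if some stock in it qualifies.
    return sorted({ind for sid, ind in STOCK if is_promising_stock(sid)})

def lineage(industry):
    # Source rows that produced the view tuple: the qualifying stocks of the
    # industry, their quotes, and all of their purchases (the aggregate over
    # purchases depends on every purchase row, including losing ones).
    ids = [sid for sid, ind in STOCK if ind == industry and is_promising_stock(sid)]
    return {
        "stock": [row for row in STOCK if row[0] in ids],
        "quote": {sid: QUOTE[sid] for sid in ids},
        "buy": [row for row in BUY if row[0] in ids],
    }

print(promising_view())
print(lineage("computer"))
```

In this toy data the stock `BBB` gains money but fails the price-earnings test, so its rows are excluded from the lineage of the `computer` tuple.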

CiteSeerX

Design of a novel digital-IF receiver based on software radio

Author: Xiaolin Zhang
Yingwei Cui
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Lineage Tracing in a Data Warehousing System (Demonstration Proposal)

Author: Jennifer Widom
Yingwei Cui

A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure 1 shows the basic architecture of a data warehousing system. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often useful to be able to "drill through" from interesting (or potentially erroneous) view data to the original source data that derived the view data. For a given view data item, identifying the exact set of base data items that produced the view data item is termed the view data lineage problem. Motivation for and applications of lineage tracing in a warehousing environment are provided in [2]. In the context of the WHIPS data warehousing project at Stanford [3], we have developed a complete prototype that performs efficient and consistent lineage tracing. Some commercial data warehousing systems support schema-level lineage tracing, or provide specialized drill-down and/or drill-through facilities for multi-dimensional warehouse views. Our lineage tracing prototype supports more fine-grained instance-level lineage tracing for arbitrarily complex relational views, including aggregation. Our prototype automatically generates lineag

CiteSeerX

Practical Lineage Tracing in Data Warehouses

Author: Jennifer Widom
Yingwei Cui

We consider the view data lineage problem in a warehousing environment: For a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formalize the problem and present a lineage tracing algorithm for relational views with aggregation. Based on our tracing algorithm, we propose a number of schemes for storing auxiliary views that enable consistent and efficient lineage tracing in a multisource data warehouse. We report on a performance study of the various schemes, identifying which schemes perform best in which settings. Based on our results, we have implemented a lineage tracing package in the WHIPS data warehousing system prototype at Stanford. With this package, users can select view tuples of interest, then efficiently "drill down" to examine the source data that produced them. 1 Introduction Data warehousing systems collect data from multiple distributed sources, integrate the information as materialized v..

CiteSeerX