DBpedia's triple pattern fragments: usage patterns and insights
Queryable Linked Data is published through several interfaces, including SPARQL endpoints and Linked Data documents. In October 2014, the DBpedia Association announced an official Triple Pattern Fragments interface to its popular DBpedia dataset. This interface aims to improve the availability of live queryable data by dividing query execution between clients and servers. In this paper, we present a usage analysis between November 2014 and July 2015. In nine months, the interface had an average availability of 99.99%, handling 16,776,170 requests, 43.0% of which were served from cache. These numbers provide promising evidence that low-cost Triple Pattern Fragments interfaces are a viable strategy for live applications on top of public, queryable datasets.
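To make the client/server split concrete, the following minimal Python sketch fetches one page of a Triple Pattern Fragment over plain HTTP. The base URL and parameter names follow the common TPF convention and are assumptions for illustration, not taken from the paper.

```python
import urllib.parse
import urllib.request

# Assumed base URL of the DBpedia TPF interface (illustrative only).
BASE = "http://fragments.dbpedia.org/2015/en"

def fetch_fragment(subject=None, predicate=None, obj=None):
    # Build a fragment URL for one triple pattern; omitted terms act as variables.
    params = {k: v for k, v in (("subject", subject),
                                ("predicate", predicate),
                                ("object", obj)) if v is not None}
    url = BASE + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# One page of all triples using dbo:birthPlace; the client pages through
# results and performs joins itself, which keeps the server cheap.
page = fetch_fragment(predicate="http://dbpedia.org/ontology/birthPlace")
print(page[:300])
```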
Opportunistic linked data querying through approximate membership metadata
Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other means, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments involve a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow full result recall to be reached earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, at only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
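As an illustration of how such metadata saves requests, here is a minimal, self-contained Bloom filter sketch (not the paper's implementation; the paper evaluates both Bloom filters and Golomb-coded sets served as response metadata). The client tests a concrete triple against the filter first and only issues an HTTP membership subquery on a possible hit.

```python
import hashlib

class BloomFilter:
    # Minimal Bloom filter: m bits, k independent hash positions per item.
    def __init__(self, m_bits, k_hashes):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# The filter has no false negatives: a miss means the triple is certainly
# absent, so no HTTP membership subquery is needed. A hit must still be
# confirmed, since false positives are possible.
server_filter = BloomFilter(1 << 16, 4)
server_filter.add("<s> <p> <o>")
if "<s> <p> <o>" in server_filter:
    pass  # only now issue the actual triple-pattern request
```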
Substring filtering for low-cost linked data interfaces
Recently, Triple Pattern Fragments (TPF) were introduced as a low-cost server-side interface for cases where high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPF interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elasticsearch and a case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries.
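A hypothetical sketch of what such a substring extension could look like from the client side; the parameter name `objectSubstring` is an assumption for illustration, not the interface's actual API.

```python
import urllib.parse

def substring_fragment_url(base, predicate, needle):
    # Ask the server to filter literal objects by substring directly.
    return base + "?" + urllib.parse.urlencode({
        "predicate": predicate,
        "objectSubstring": needle,  # assumed extension parameter
    })

# Without server-side support, the client must fetch every literal and
# evaluate FILTER(CONTAINS(...)) locally, at a much higher request count:
def client_side_contains(literals, needle):
    needle = needle.lower()
    return [lit for lit in literals if needle in lit.lower()]
```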
Interest-based RDF Update Propagation
Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large amounts of requests from diverse applications. Many data products and services rely on full or partial local LOD replications to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication with all the associated undesirable consequences. Given the evolving nature of the original and authoritative datasets, ensuring consistent and up-to-date replicas requires frequent replacements at great cost. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally without the need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition of graph-pattern-based interest expressions that is used to filter the interesting parts of updates from the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach.
Comment: 16 pages. Keywords: Change Propagation, Dataset Dynamics, Linked Data, Replication
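A minimal sketch of the core idea, assuming interest expressions are simplified to a single triple pattern with wildcards; the actual iRap framework supports richer graph patterns.

```python
def matches(triple, pattern):
    # None in the pattern acts as a wildcard for that position.
    return all(p is None or p == t for t, p in zip(triple, pattern))

def filter_changeset(added, removed, pattern):
    # Propagate only the parts of a changeset that match the interest.
    return ([t for t in added if matches(t, pattern)],
            [t for t in removed if matches(t, pattern)])

# Subscribe only to population updates, ignoring everything else:
interest = (None, "http://dbpedia.org/ontology/populationTotal", None)
adds, dels = filter_changeset(
    added=[("dbr:Berlin", "http://dbpedia.org/ontology/populationTotal", "3700000"),
           ("dbr:Berlin", "rdfs:comment", "Capital of Germany")],
    removed=[],
    pattern=interest)
print(adds)  # only the population triple survives
```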
Effect of heuristics on serendipity in path-based storytelling with linked data
Path-based storytelling with Linked Data on the Web gives users the ability to discover concepts in an entertaining and educational way. Given a query context, many state-of-the-art pathfinding approaches aim at telling a story that coincides with the user's expectations by investigating paths over Linked Data on the Web. By taking serendipity in storytelling into account, we aim at improving and tailoring existing approaches to better fit user expectations, so that users can discover interesting knowledge without feeling unsure or even lost in the story facts. To this end, we propose to optimize the estimation of links between, and the selection of, facts in a story by increasing the consistency and relevancy of links between facts through additional domain delineation and refinement steps. In order to address multiple aspects of serendipity, we propose and investigate combinations of weights and heuristics in the paths that form the essential building blocks of each story. Our experimental findings with stories based on DBpedia indicate the improvements obtained when applying the optimized algorithm.
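To illustrate the building blocks, the sketch below runs a weighted shortest-path search between two concepts. The single numeric edge weight stands in for the paper's combined heuristics (link consistency, relevancy, serendipity) and is purely illustrative, not the paper's actual scoring function.

```python
import heapq

def shortest_story(graph, start, goal):
    # graph: {node: [(neighbor, weight), ...]}, lower weight = better link.
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path        # the path is the story's fact sequence
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None  # no connecting story found
```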
Co-evolution of RDF Datasets
Linked Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, different experimental studies have shown that the availability of LOD datasets cannot always be ensured, making RDF data replication necessary for reliable federated query frameworks. Albeit enhancing data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change data over time; i.e., co-evolution management needs to be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our proposed approach is property-oriented and allows for exploiting semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia Live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas.
Comment: 18 pages, 4 figures. Accepted in ICWE, 201
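A minimal sketch of what property-oriented conflict resolution can look like, with assumed policy names; the paper's actual exploitation of RDF property semantics is richer than this.

```python
# Assumed per-property policies (illustrative, not the paper's semantics).
POLICY = {
    "http://dbpedia.org/ontology/populationTotal": "source-wins",
    "http://www.w3.org/2000/01/rdf-schema#label": "union",
}

def resolve(prop, source_vals, replica_vals):
    # source_vals / replica_vals: sets of conflicting object values.
    strategy = POLICY.get(prop, "source-wins")
    if strategy == "union":
        return source_vals | replica_vals
    return source_vals  # default: the authoritative dataset wins

print(resolve("http://www.w3.org/2000/01/rdf-schema#label",
              {"Berlin"}, {"Berlin, Germany"}))  # both labels are kept
```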
Moving real-time linked data query evaluation to the client
Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. To allow a large number of concurrent clients to perform continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give an overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients compared to the alternatives.
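A rough sketch of client-side continuous evaluation, under the simplifying assumption that the engine re-requests a fragment at a fixed interval and diffs the results; the actual engine builds on the time-sensitive TPF extension rather than naive polling.

```python
import time

def poll(fetch_pattern, interval_s=5.0):
    # fetch_pattern: callable returning the current matches of one triple pattern.
    previous = set()
    while True:
        current = set(fetch_pattern())      # re-evaluate against the TPF server
        for triple in current - previous:   # emit only newly appeared triples
            print("new:", triple)
        previous = current
        time.sleep(interval_s)
```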
HDTQ: Managing RDF Datasets in Compressed Space
HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often contain additional graph information, such as the origin, version, or validity time of a triple. Traditional HDT is not capable of handling such additional parameters. This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable. Two HDTQ-based approaches are introduced, Annotated Triples and Annotated Graphs, and their performance is compared to leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.
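The Annotated Triples variant can be pictured as one bitmap per graph over HDT's triple identifiers, so a quad lookup reduces to a triple lookup plus a bit test. The sketch below is a simplified illustration, not the HDTQ implementation.

```python
class AnnotatedTriples:
    def __init__(self, num_triples, graphs):
        # One bitmap per graph, indexed by HDT triple id.
        self.bitmaps = {g: bytearray((num_triples + 7) // 8) for g in graphs}

    def annotate(self, triple_id, graph):
        # Mark that this triple occurs in the given graph.
        self.bitmaps[graph][triple_id // 8] |= 1 << (triple_id % 8)

    def in_graph(self, triple_id, graph):
        # Quad test = triple lookup (yielding triple_id) + one bit test.
        return bool(self.bitmaps[graph][triple_id // 8] & (1 << (triple_id % 8)))

index = AnnotatedTriples(num_triples=1000, graphs=["g1", "g2"])
index.annotate(42, "g1")
print(index.in_graph(42, "g1"), index.in_graph(42, "g2"))  # True False
```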
