Give more data, awareness and control to individual citizens, and they will help COVID-19 containment.
The rapid dynamics of COVID-19 call for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where all data sensed by the app are sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, alerting us to the need to minimize personal data collection and to avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows detailed information to be gathered about infected people in a privacy-preserving fashion, which in turn enables both contact tracing and the early detection of outbreak hotspots at a finer geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, so that location data are collected and managed locally on the device, and the user can share spatio-temporal aggregates (if and when they want, and for specific aims) with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to the collective good to the extent they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
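Not part of the paper, but a minimal Python sketch of the data flow argued for above: raw location fixes never leave a toy "personal data store", and only coarse spatio-temporal aggregates are shared, only after a positive test and explicit consent. All names and parameters (PersonalDataStore, cell_size_deg, time_bucket_s) are invented for illustration, not taken from any deployed contact-tracing app.

```python
# Illustrative sketch only: a toy "personal data store" in the spirit of the
# decentralized approach described above. Names and granularity parameters are
# hypothetical.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PersonalDataStore:
    """Keeps raw location fixes on the device; nothing leaves it by default."""
    raw_fixes: list = field(default_factory=list)  # (timestamp, lat, lon)

    def record(self, timestamp, lat, lon):
        self.raw_fixes.append((timestamp, lat, lon))

    def aggregate(self, cell_size_deg=0.01, time_bucket_s=3600):
        """Coarsen fixes into (time bucket, grid cell) counts.

        Coarser cells and longer buckets reveal less detail, i.e. preserve
        more privacy.
        """
        counts = Counter()
        for ts, lat, lon in self.raw_fixes:
            bucket = int(ts // time_bucket_s)
            cell = (round(lat / cell_size_deg), round(lon / cell_size_deg))
            counts[(bucket, cell)] += 1
        return counts

    def share_if_positive(self, tested_positive: bool, consent: bool):
        """Only a positive, consenting user uploads aggregates; others share nothing."""
        if tested_positive and consent:
            return self.aggregate()
        return None
```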
Real-time adaptive problem detection in poultry
Real-time identification of unexpected values when monitoring the production parameters of egg-laying hens is challenging, as the collected data include natural variability in addition to chance fluctuation. We present an adaptive method for calculating residuals that reflect the latter type of fluctuation only, and thereby provide for more accurate detection of potential problems. We report on the application of our method to real-world poultry data.
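The abstract does not spell out the method, so the following is only a generic illustration of the idea of adaptive residuals: deviations are measured against a slowly adapting baseline (here an exponentially weighted moving average), so natural drift is absorbed while chance fluctuations stand out. The function and its parameters are assumptions for illustration, not the authors' algorithm.

```python
# Hedged sketch: residuals against an adaptive (exponentially weighted)
# baseline; the paper's own method may differ.
def adaptive_residuals(values, alpha=0.1, threshold=3.0):
    """Return (residual, flagged) pairs for a stream of daily production values."""
    mean = values[0]          # adaptive baseline, initialised with the first value
    var = 0.0                 # adaptive estimate of the residual variance
    results = []
    for x in values[1:]:
        resid = x - mean
        std = var ** 0.5
        flagged = std > 0 and abs(resid) > threshold * std
        results.append((resid, flagged))
        # Update baseline and spread only afterwards, so the current point
        # does not mask its own deviation.
        mean = (1 - alpha) * mean + alpha * x
        var = (1 - alpha) * var + alpha * resid ** 2
    return results
```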
Entity Matching in Digital Humanities Knowledge Graphs
We propose a method for entity matching that takes into account the characteristic complex properties of decentralized cultural heritage data sources, where duplicates may occur both within and between sources. We apply the proposed method to historical data from the Amsterdam City Archives using several clustering algorithms and evaluate the results against a partial ground truth. We also evaluate our method on a semi-synthetic data set for which we have a complete ground truth. The results show that the proposed method for entity matching performs well and is able to handle the complex properties of historical data sources.
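A minimal sketch of the generic similarity-plus-clustering recipe such a method builds on: records from any source are compared pairwise, and connected components over high-similarity pairs form candidate entities, so duplicates within a single source and across sources are treated alike. The similarity measure (token Jaccard) and the union-find clustering below are illustrative assumptions, not the authors' actual algorithms.

```python
# Illustrative entity-matching recipe: pairwise similarity + clustering.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_entities(records, threshold=0.6):
    """records: list of (record_id, name string) from one or more sources."""
    parent = {rid: rid for rid, _ in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (id1, name1), (id2, name2) in combinations(records, 2):
        if jaccard(name1, name2) >= threshold:
            parent[find(id1)] = find(id2)      # merge the two clusters

    clusters = {}
    for rid, _ in records:
        clusters.setdefault(find(rid), []).append(rid)
    return list(clusters.values())
```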
Tailored Graph Embeddings for Entity Alignment on Historical Data
In the domain of Dutch cultural heritage, various data sets describe different aspects of life during the Dutch Golden Age. These data sets, in the form of RDF graphs, use different standards and contain noise in the values of literal nodes, such as misspelled names and uncertainty in dates. The Golden Agents project aims at answering queries about the Dutch Golden Age using these distributed and independently maintained data sets. One problem in this project, among many others, is the identification of persons who occur in multiple data sets but under different URIs. This paper aims to solve this specific problem and generate a linkset, i.e. a set of pairs of URIs which are judged to represent the same person. We use domain knowledge in the application of an existing node context generation algorithm, whose output serves as input for GloVe, an algorithm originally designed for embedding words. This embedding is then used to train a classifier on pairs of URIs which are known duplicates and non-duplicates. Using just the cosine similarity between URI pairs in embedding space for prediction, we obtain a simple classifier with an F12-score of around 0.85, even when very few training examples are provided. On larger training sets, more complex classifiers are shown to reach an F12-score of up to 0.8.
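Assuming node embeddings have already been produced (for instance by running GloVe on generated node context sequences), the final classification step described above reduces to thresholding the cosine similarity between URI pairs. The sketch below, including the simple threshold-tuning loop, illustrates that step only; the embedding dictionary, metric details and tuning procedure are assumptions, not the paper's code.

```python
# Sketch of a cosine-similarity threshold classifier over precomputed URI embeddings.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def tune_threshold(pairs, labels, embeddings):
    """pairs: list of (uri1, uri2); labels: 1 = known duplicate, 0 = non-duplicate."""
    sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
    best_t, best_f1 = 0.0, -1.0
    for t in np.linspace(0.0, 1.0, 101):
        pred = [int(s >= t) for s in sims]
        tp = sum(p and l for p, l in zip(pred, labels))
        fp = sum(p and not l for p, l in zip(pred, labels))
        fn = sum((not p) and l for p, l in zip(pred, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def same_person(uri1, uri2, embeddings, threshold):
    """Predict that two URIs denote the same person."""
    return cosine(embeddings[uri1], embeddings[uri2]) >= threshold
```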
Graph Embeddings for Enrichment of Historical Data
In this work-in-progress paper we describe our method of combining expert knowledge and RDF graph embeddings to solve specific downstream tasks such as entity resolution. We show that efficiency gains can be made by choosing the correct gradient descent algorithm and that expert input can lead to the desired results.
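As a toy illustration (not taken from the paper) of why the choice of gradient descent variant affects efficiency, the following compares plain gradient descent with Adam on a badly scaled quadratic; all numbers and learning rates are arbitrary.

```python
# Toy comparison: on a badly scaled objective, plain gradient descent is forced
# to a tiny step size by the steep coordinate and makes little progress along
# the flat one, while Adam adapts its per-coordinate step sizes.
import numpy as np

def grad(x):                       # gradient of f(x, y) = 0.5 * (1000*x^2 + y^2)
    return np.array([1000.0, 1.0]) * x

def run_gd(lr=0.0015, steps=200):
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def run_adam(lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    x = np.array([1.0, 1.0])
    m, v = np.zeros(2), np.zeros(2)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        x -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return x

print("gradient descent:", run_gd())   # second coordinate is still far from 0
print("Adam:", run_adam())
```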
Google Scholar makes it hard - the complexity of organizing one's publications
With Google Scholar, scientists can maintain their publications on personal profile pages, while the citations to these works are automatically collected and counted. Maintenance of publications is done manually by the researcher herself, and involves deleting erroneous ones, merging ones that are the same but were not recognized as such, adding forgotten co-authors, and correcting titles of papers and venues. In the web interface from 2012–2014, publications are presented on pages of 20 or 100 papers. (Since mid 2014, Google Scholar's profile pages allow any number of papers on a single page.) The interface does not allow a scientist to merge two versions of a paper if they appear on different pages. This not only implies that a scientist who wants to merge certain subsets of publications will sometimes be unable to do so; we also show in this note that the decision problem of determining whether it is possible to merge given subsets of papers is NP-complete.
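To make the merging constraint concrete, here is a toy model of the feasibility question; the paper's exact formalisation may differ. In this simplification, entries appear in a fixed order on pages of a given size, two entries can be merged only while they are shown on the same page, and each merge removes an entry so later entries shift forward. The brute-force search below asks whether some sequence of merges turns each target group of papers into a single entry.

```python
# Toy model (for illustration only) of the "can these subsets be merged?" question.
from functools import lru_cache

def can_merge_all(n_papers, groups, page_size):
    """groups: disjoint tuples of paper indices that should each become one entry."""
    group_of = {}
    for g, members in enumerate(groups):
        for p in members:
            group_of[p] = g

    def done(entries):
        return all(any(set(g) <= e for e in entries) for g in groups)

    @lru_cache(maxsize=None)
    def search(entries):
        if done(entries):
            return True
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                if i // page_size != j // page_size:
                    continue          # the two entries are not on the same page
                gi = {group_of.get(p) for p in entries[i]}
                gj = {group_of.get(p) for p in entries[j]}
                if gi != gj or gi == {None}:
                    continue          # only merge entries of the same target group
                merged = (entries[:i] + (entries[i] | entries[j],)
                          + entries[i + 1:j] + entries[j + 1:])
                if search(merged):
                    return True
        return False

    return search(tuple(frozenset([p]) for p in range(n_papers)))

# With pages of size 2 and papers 0..3, merging {2, 3} first and then {0, 1}
# succeeds, while the reverse order gets stuck; the search tries both orders.
print(can_merge_all(4, [(0, 1), (2, 3)], page_size=2))   # True
print(can_merge_all(4, [(0, 3)], page_size=2))           # False: 0 and 3 never share a page
```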
Adding Domain Knowledge to Improve Entity Resolution in 17th and 18th Century Amsterdam Archival Records
The problem of entity resolution is central in the field of Digital Humanities. It is also one of the major issues in the Golden Agents project, which aims at creating an infrastructure that enables researchers to search for patterns that span decentralised knowledge graphs from cultural heritage institutes. To this end, we created a method to perform entity resolution on complex historical knowledge graphs. In previous work, we encoded and embedded the relevant (duplicate) entities in a vector space to derive similarities between them, based on their sharing a similar context in the RDF graphs. In some cases, however, available domain knowledge or rational axioms can be applied to improve entity resolution performance. We show how domain knowledge and rational axioms relevant to the task at hand can be expressed as (probabilistic) rules, and how the information derived from rule application can be combined with quantitative information from the embedding. In this work, we apply our entity resolution method to two data sets. First, we apply it to a data set for which we have a detailed ground truth for validation. This experiment shows that combining the embedding with the application of domain knowledge and rational axioms leads to improved resolution performance. Second, we perform a case study by applying our method to a larger data set for which there is no ground truth and where the outcome is subsequently validated by a domain expert. The results of this case study demonstrate that our method achieves a very high precision.
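A hedged sketch of how rule-derived information might be combined with the embedding similarity: hard rational axioms veto a match outright, while soft probabilistic rules adjust the score before thresholding. The specific rules, attributes and weights below are invented examples of the kind of domain knowledge meant, not the paper's rule set.

```python
# Illustrative combination of embedding similarity with (probabilistic) rules.
def rule_adjusted_score(sim, rec_a, rec_b):
    """sim: cosine similarity from the embedding; rec_*: dicts of record attributes."""
    # Hard rational axiom: a person cannot be baptised after being buried.
    if rec_a.get("burial_year") and rec_b.get("baptism_year"):
        if rec_a["burial_year"] < rec_b["baptism_year"]:
            return 0.0
    score = sim
    # Soft probabilistic rules nudge the score up or down.
    if rec_a.get("spouse") and rec_a.get("spouse") == rec_b.get("spouse"):
        score = min(1.0, score + 0.2)      # same spouse name: strong positive evidence
    if rec_a.get("occupation") and rec_b.get("occupation") \
            and rec_a["occupation"] != rec_b["occupation"]:
        score = max(0.0, score - 0.1)      # conflicting occupations: weak negative evidence
    return score

def is_duplicate(sim, rec_a, rec_b, threshold=0.8):
    return rule_adjusted_score(sim, rec_a, rec_b) >= threshold
```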
Definability equals recognizability for k-outerplanar graphs and l-chordal partial k-trees
One of the most famous algorithmic meta-theorems states that every graph property which can be defined in counting monadic second-order logic (CMSOL) can be checked in linear time on graphs of bounded treewidth; this is known as Courcelle's Theorem (Courcelle, 1990). These algorithms are constructed as finite-state tree automata, and hence every CMSOL-definable graph property is recognizable. Courcelle also conjectured that the converse holds, i.e. every recognizable graph property is definable in CMSOL for graphs of bounded treewidth. In this paper we prove two special cases of this conjecture: first for the class of k-outerplanar graphs, which are known to have treewidth at most 3k−1 (Bodlaender, 1998), and second for graphs of bounded treewidth without chordless cycles of length at least some constant ℓ. We furthermore show that for a proof of Courcelle's Conjecture it is sufficient to show that all members of a graph class admit constant-width tree decompositions whose bags and edges can be identified with MSOL-predicates. For graph classes that admit MSOL-definable constant-width tree decompositions that have bounded degree, or that allow for a linear ordering of all nodes with the same parent, we give an even stronger result: in that case, the counting predicates of CMSOL are not needed.
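For intuition, a standard textbook example of a graph property definable in monadic second-order logic, and hence checkable in linear time on graphs of bounded treewidth by Courcelle's Theorem, is 3-colourability; it is not an example taken from this paper. The formula quantifies over three vertex sets that cover all vertices and are each independent:

```latex
% Standard MSOL definition of 3-colourability (illustration, not from the paper).
\exists X_1\, \exists X_2\, \exists X_3\;
  \Bigl[\,\forall v\, \bigl(v \in X_1 \lor v \in X_2 \lor v \in X_3\bigr)
  \;\land\; \bigwedge_{i=1}^{3} \forall u\, \forall v\,
     \bigl(\mathrm{edge}(u,v) \rightarrow \lnot(u \in X_i \land v \in X_i)\bigr)\Bigr]
```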
Parameterized algorithms for recognizing monopolar and 2-subcolorable graphs
A graph G is an (A, B)-graph if V(G) can be bipartitioned into A and B such that G[A] satisfies property A and G[B] satisfies property B. The (A, B)-Recognition problem is to recognize whether a given graph is an (A, B)-graph. There are many (A, B)-Recognition problems, including the recognition problems for bipartite, split, and unipolar graphs. We present efficient algorithms for many cases of (A, B)-Recognition based on a technique which we dub inductive recognition. In particular, we give fixed-parameter algorithms for two NP-hard (A, B)-Recognition problems, Monopolar Recognition and 2-Subcoloring, parameterized by the number of maximal cliques in G[A]. We complement our algorithmic results with several hardness results for (A, B)-Recognition.
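For intuition only, the simplest (A, B)-Recognition instance, where both properties ask for an edgeless induced subgraph, is bipartiteness testing; the sketch below solves it by 2-colouring with breadth-first search. The paper's fixed-parameter algorithms for Monopolar Recognition and 2-Subcoloring are substantially more involved than this illustration.

```python
# Bipartiteness as the simplest (A, B)-Recognition instance: both sides must be edgeless.
from collections import deque

def bipartition(adj):
    """adj: dict mapping each vertex to an iterable of neighbours.
    Returns (A, B) with both sides edgeless, or None if no such partition exists."""
    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None               # odd cycle: no (edgeless, edgeless)-partition
    A = {v for v, s in side.items() if s == 0}
    B = {v for v, s in side.items() if s == 1}
    return A, B

# Example: a 4-cycle is bipartite, a triangle is not.
print(bipartition({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}))
print(bipartition({0: [1, 2], 1: [0, 2], 2: [0, 1]}))
```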
