
    Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and recovers a failed task by restoring its latest checkpoint. An active approach, by contrast, employs backup nodes to run replicated tasks; upon failure, the active replica takes over processing of the failed task with minimal latency. Both approaches, however, have inadequacies in Massively Parallel Stream Processing Engines (MPSPEs): the passive approach incurs a long recovery latency, especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework that is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks, while only a selected set of tasks is actively replicated; the number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs are generated before the recovery process completes. We also propose effective and efficient algorithms to optimize a partially active replication plan that maximizes the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach.
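    The core idea of the PPA scheme described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: every task is checkpointed passively, and a subset is chosen for active replication subject to a replica budget. The task names and the "importance" score (a stand-in for the paper's output-quality objective) are invented for the example.

    ```python
    def choose_active_replicas(tasks, budget):
        """Greedily pick the tasks whose failure would hurt output
        quality most, until the replica budget is exhausted.
        (A simple greedy stand-in for the paper's optimization.)"""
        ranked = sorted(tasks, key=lambda t: t["importance"], reverse=True)
        return {t["name"] for t in ranked[:budget]}

    tasks = [
        {"name": "join",   "importance": 0.9},
        {"name": "filter", "importance": 0.2},
        {"name": "agg",    "importance": 0.7},
    ]

    # Only two replica slots available: "join" and "agg" get active
    # replicas; every task (including "filter") still gets periodic
    # passive checkpoints.
    active = choose_active_replicas(tasks, budget=2)
    print(active)
    ```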

    Incremental learning with respect to new incoming input attributes

    Neural networks are generally exposed to a dynamic environment where the training patterns or the input attributes (features) will likely be introduced into the current domain incrementally. This paper considers the situation where a new set of input attributes must be considered and added to the existing neural network. The conventional method is to discard the existing network and redesign one from scratch, which wastes the old knowledge and the previous effort. In order to reduce computational time, improve generalization accuracy, and enhance the intelligence of the learned models, we present the ILIA algorithms (namely ILIA1, ILIA2, ILIA3, ILIA4, and ILIA5), capable of Incremental Learning in terms of Input Attributes. Using the ILIA algorithms, when new input attributes are introduced into the original problem, the existing neural network is retained and a new sub-network is constructed and trained incrementally. The new sub-network and the old one are later merged to form a new network for the changed problem. In addition, the ILIA algorithms can decide whether the new incoming input attributes are relevant to the output and consistent with the existing input attributes, and suggest whether to accept or reject them. Experimental results show that the ILIA algorithms are efficient and effective for both classification and regression problems.
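    The retain-and-merge idea behind ILIA can be sketched in a minimal form. The linear "networks", the merge weighting, and all names here are illustrative stand-ins, not the paper's algorithms: the model trained on the old attributes is kept frozen, a small model is trained on the new attributes only, and a merge step combines the two outputs.

    ```python
    def linear_net(weights, bias):
        """A toy stand-in for a trained network: a linear map."""
        return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

    old_net = linear_net([0.5, -0.2], 0.1)  # frozen, trained on old attributes
    sub_net = linear_net([0.3], 0.0)        # trained only on the new attribute

    def merged(x_old, x_new, alpha=0.5):
        # Merge layer (illustrative): a weighted combination of the
        # old network's output and the new sub-network's output.
        return (1 - alpha) * old_net(x_old) + alpha * sub_net(x_new)

    # Old attributes [1.0, 2.0] plus newly arrived attribute [4.0]:
    print(merged([1.0, 2.0], [4.0]))
    ```

    The design point the sketch captures is that the old model's weights are never discarded or retrained; only the sub-network and the merge step are fit to the changed problem.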

    Cooperative Data Exchange based on MDS Codes

    The cooperative data exchange problem is studied for the fully connected network. In this problem, each node initially possesses only a subset of the K packets making up a file, and nodes make broadcast transmissions that are received by all other nodes; the goal is for each node to recover the full file. In this paper, we present a polynomial-time deterministic algorithm that computes the optimal (i.e., minimal) number of required broadcast transmissions and determines the precise transmissions to be made by the nodes. A particular feature of our approach is that each of the K−d transmissions is a linear combination of exactly d+1 packets, and we show how to optimally choose the value of d. We also show how the coefficients of these linear combinations can be chosen by leveraging a connection to Maximum Distance Separable (MDS) codes. Moreover, we show that our method can be used to solve cooperative data exchange problems with weighted cost, as well as the so-called successive local omniscience problem. Comment: 21 pages, 1 figure
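    A toy instance of the setting above, using XOR (i.e., GF(2)) combinations: with K = 3 packets and each node missing exactly one packet, d = 1 gives K − d = 2 broadcasts, each mixing exactly d + 1 = 2 packets, after which every node holds the full file. The concrete packet values are illustrative, and the paper's MDS-based coefficient choice is not reproduced here.

    ```python
    # Three packets of the file (arbitrary example values).
    p1, p2, p3 = 0b1010, 0b0110, 0b1100

    # Initial side information:
    #   node A holds {p2, p3}; node B holds {p1, p3}; node C holds {p1, p2}.
    # Two broadcasts, each a combination of exactly two packets:
    t1 = p2 ^ p3   # broadcast by A
    t2 = p1 ^ p3   # broadcast by B

    # Each node XORs a broadcast with a packet it already holds to
    # recover its missing packet:
    a_recovers_p1 = t2 ^ p3   # A knows p3
    b_recovers_p2 = t1 ^ p3   # B knows p3
    c_recovers_p3 = t1 ^ p2   # C knows p2

    assert (a_recovers_p1, b_recovers_p2, c_recovers_p3) == (p1, p2, p3)
    ```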

    Bootstrapping Mixed Correlators in the Five Dimensional Critical O(N) Models

    We use the conformal bootstrap approach to explore 5D CFTs with O(N) global symmetry, which contain N scalars φ_i transforming as an O(N) vector. Specifically, we study multiple four-point correlators of the leading O(N) vector φ_i and the O(N) singlet σ. The crossing symmetry of the four-point functions and the unitarity condition provide nontrivial constraints on the scaling dimensions (Δ_φ, Δ_σ) of φ_i and σ. With reasonable assumptions on the gaps between the scaling dimensions of φ_i (σ) and the next O(N) vector (singlet) scalar, we are able to isolate the scaling dimensions (Δ_φ, Δ_σ) in small islands. In particular, for large N = 500 the isolated region is highly consistent with the result obtained from the large-N expansion. We also study the interacting O(N) CFTs for 1 ≤ N ≤ 100. Isolated regions in the (Δ_φ, Δ_σ) plane are obtained using the conformal bootstrap program with a lower order of derivatives Λ; however, they disappear as Λ is increased. We believe these islands correspond to interacting but nonunitary O(N) CFTs. Our results provide a lower bound on the critical value N_c > 100, below which the interacting O(N) CFTs become nonunitary. This critical value is unexpectedly large compared with previous estimates. Comment: 28 pages, 4 figures
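    The crossing-symmetry constraint that drives the bootstrap can be sketched in its standard single-correlator form (for the ⟨φφφφ⟩ channel; the paper's mixed-correlator system generalizes this to a vector of such equations):

    ```latex
    \sum_{\mathcal{O}} \lambda_{\mathcal{O}}^{2}\, F_{\Delta,\ell}(u,v) = 0,
    \qquad
    F_{\Delta,\ell}(u,v) \equiv
      v^{\Delta_\phi}\, g_{\Delta,\ell}(u,v)
      - u^{\Delta_\phi}\, g_{\Delta,\ell}(v,u),
    ```

    where the sum runs over exchanged operators with squared OPE coefficients λ_O² ≥ 0 (unitarity), and g_{Δ,ℓ} are conformal blocks. Searching for a linear functional (truncated at derivative order Λ) that is positive on all F's allowed by the gap assumptions is what carves out, or excludes, the islands in the (Δ_φ, Δ_σ) plane.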