5,324 research outputs found

    Enhanced methods for local ancestry assignment in sequenced admixed individuals.

    Get PDF
    Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring large computational time for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers comparable local ancestries, as the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs

    Finding a boundary between valid and invalid regions of the input space

    Full text link
    In the context of robustness testing, the boundary between the valid and invalid regions of the input space can be an interesting source of erroneous inputs. Knowing where a specific software under test (SUT) has a boundary is essential for validation in relation to requirements. However, finding where a SUT actually implements the boundary is a non-trivial problem that has not gotten much attention. This paper proposes a method of finding the boundary between the valid and invalid regions of the input space. The proposed method consists of two steps. First, test data generators, directed by a search algorithm to maximise distance to known, valid test cases, generate valid test cases that are closer to the boundary. Second, these valid test cases undergo mutations to try to push them over the boundary and into the invalid part of the input space. This results in a pair of test sets, one consisting of test cases on the valid side of the boundary and a matched set on the outer side, with only a small distance between the two sets. The method is evaluated on a number of examples from the standard library of a modern programming language. We propose a method of determining the boundary between valid and invalid regions of the input space and apply it on a SUT that has a non-contiguous valid region of the input space. From the small distance between the developed pairs of test sets, and the fact that one test set contains valid test cases and the other invalid test cases, we conclude that the pair of test sets described the boundary between the valid and invalid regions of that input space. Differences of behaviour can be observed between different distances and sets of mutation operators, but all show that the method is able to identify the boundary between the valid and invalid regions of the input space. This is an important step towards more automated robustness testing.Comment: 10 pages, conferenc

    A Testability Analysis Framework for Non-Functional Properties

    Full text link
    This paper presents background, the basic steps and an example for a testability analysis framework for non-functional properties

    Magnetic Instability in Strongly Correlated Superconductors

    Full text link
    Recently a new phenomenological Hamiltonian has been proposed to describe the superconducting cuprates. This so-called Gossamer Hamiltonian is an apt model for a superconductor with strong on-site Coulomb repulsion betweenthe electrons. It is shown that as one approaches half-filling the Gossamer superconductor, and hence the superconducting state, with strong repulsion is unstable toward an antiferromagnetic insulator an can undergo a quantum phase transition to such an insulator if one increases the on-site Coulomb repulsion

    PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

    Full text link
    Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. In order to facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain.Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on Machine Learning and Applications 201
    corecore