1,564 research outputs found

    Natural Notation for the Domestic Internet of Things

    This study explores the use of natural language to give instructions that might be interpreted by Internet of Things (IoT) devices in a domestic 'smart home' environment. We start from the proposition that reminders can be considered as a type of end-user programming, in which the executed actions might be performed either by an automated agent or by the author of the reminder. We conducted an experiment in which people wrote sticky notes specifying future actions in their home. In different conditions, these notes were addressed to themselves, to others, or to a computer agent. We analyse the linguistic features and strategies that are used to achieve these tasks, including the use of graphical resources as an informal visual language. The findings provide a basis for design guidance related to end-user development for the Internet of Things. Comment: Proceedings of the 5th International Symposium on End-User Development (IS-EUD), Madrid, Spain, May, 201

    Longest Common Extensions in Sublinear Space

    The longest common extension problem (LCE problem) is to construct a data structure for an input string $T$ of length $n$ that supports $\mathrm{LCE}(i,j)$ queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions $i$ and $j$ in $T$. This classic problem has a well-known solution that uses $O(n)$ space and $O(1)$ query time. In this paper we show that for any trade-off parameter $1 \leq \tau \leq n$, the problem can be solved in $O(\frac{n}{\tau})$ space and $O(\tau)$ query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound. Comment: An extended abstract of this paper has been accepted to CPM 201
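
To make the query concrete, here is a minimal sketch of an LCE query answered by direct character comparison. This is only the trivial baseline (no preprocessing, worst-case query time linear in the string length), not the data structure from the paper. The function name `lce` and the 0-based indexing are assumptions made for illustration.

```python
def lce(text: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of text[i:] and text[j:].

    Naive baseline: compares characters one by one, so a query can take
    time proportional to len(text). Positions are 0-based (an assumption).
    """
    n = len(text)
    length = 0
    # Extend the match while both positions stay inside the string
    # and the characters agree.
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length
```

For example, in `"banana"` the suffixes starting at positions 1 and 3 are `"anana"` and `"ana"`, so `lce("banana", 1, 3)` returns 3.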

    Improved Algorithms for Approximate String Matching (Extended Abstract)

    The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string, and calculation of the longest common subsequence that two strings share. We designed an output-sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|) min(m,n,s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound makes the quadratic factor of the algorithm independent of the length of the longer string and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length. Source code of our algorithm is available at http://www.cs.miami.edu/~dimitris/edit_distance Comment: 10 page
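
For readers unfamiliar with the edit distance problem, the following sketch shows the textbook dynamic program (unit-cost insertions, deletions and substitutions). It runs in O(mn) time and is not the output-sensitive algorithm described in the abstract; the function name `edit_distance` is an assumption made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance via the classic row-by-row DP.

    Textbook baseline only: O(len(a) * len(b)) time, and space linear in
    the shorter string because only two rows are kept.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row is short
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion).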

    The hidden perils of read mapping as a quality assessment tool in genome sequencing

    This article provides a comparative analysis of the various methods of genome sequencing, focusing on verification of the assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, quality of assemblies is assessed by using CLC Genomics Workbench read mapping and optical mapping developed by OpGen. Over-extension of contigs without prior knowledge of contig location can lead to misassembled contigs, even when commonly used quality indicators such as read mapping suggest that a contig is well assembled. Precautions must also be taken when using long read sequencing technology, which may also lead to misassembled contigs.

    Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

    The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
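
The "seven times better" figure can be checked from the numbers quoted in the abstract. The sketch below assumes that "sequence missed" means 100% minus the reported coverage; that reading, and the function name `quality_ratio`, are assumptions rather than definitions taken from the paper.

```python
def quality_ratio(cov_old: float, err_old: float,
                  cov_new: float, err_new: float) -> float:
    """Ratio of new to old assembly quality, with quality taken as
    1 / (error rate * sequence missed).

    Assumption: sequence missed = 100 - coverage (in percent).
    """
    missed_old = 100.0 - cov_old
    missed_new = 100.0 - cov_new
    # quality_new / quality_old = (err_old * missed_old) / (err_new * missed_new)
    return (err_old * missed_old) / (err_new * missed_new)


# Figures from the abstract: coverage 93.4% -> 96.3%,
# errors per 10,000 bases 4.5 -> 1.1.
ratio = quality_ratio(93.4, 4.5, 96.3, 1.1)
print(f"{ratio:.1f}")  # about 7.3
```

Under that assumption the ratio is (4.5 × 6.6) / (1.1 × 3.7) ≈ 7.3, consistent with the abstract's rounded claim.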

    Enzyme‐assisted aqueous extraction of Kalahari melon seed oil: optimization using response surface methodology

    Enzymatic extraction of oil from Kalahari melon seeds was investigated and evaluated by response surface methodology (RSM). Two commercial protease enzyme products were used separately: Neutrase® 0.8 L and Flavourzyme® 1000 L from Novozymes (Bagsvaerd, Denmark). RSM was applied to model and optimize the reaction conditions, namely concentration of enzyme (20–50 g kg⁻¹ of seed mass), initial pH of mixture (pH 5–9), incubation temperature (40–60 °C), and incubation time (12–36 h). Well-fitting models were successfully established for both enzymes: Neutrase 0.8 L (R² = 0.9410) and Flavourzyme 1000 L (R² = 0.9574) through multiple linear regressions with backward elimination. Incubation time was the most significant reaction factor on oil yield for both enzymes. The optimal conditions for Neutrase 0.8 L were: an enzyme concentration of 25 g kg⁻¹, an initial pH of 7, a temperature of 58 °C and an incubation time of 31 h with constant shaking at 100 rpm. Centrifuging the mixture at 8,000g for 20 min separated the oil with a recovery of 68.58 ± 3.39%. The optimal conditions for Flavourzyme 1000 L were an enzyme concentration of 21 g kg⁻¹, an initial pH of 6, a temperature of 50 °C and an incubation time of 36 h. These optimum conditions yielded a 71.55 ± 1.28% oil recovery.

    Conducting a large, multi-site survey about patients' views on broad consent: challenges and solutions

    BACKGROUND: As biobanks play an increasing role in the genomic research that will lead to precision medicine, input from diverse and large populations of patients in a variety of health care settings will be important in order to successfully carry out such studies. One important topic is participants’ views towards consent and data sharing, especially since the 2011 Advanced Notice of Proposed Rulemaking (ANPRM), and subsequently the 2015 Notice of Proposed Rulemaking (NPRM), were issued by the Department of Health and Human Services (HHS) and Office of Science and Technology Policy (OSTP). These notices required that participants consent to research uses of their de-identified tissue samples and most clinical data, and allowed such consent to be obtained in a one-time, open-ended or “broad” fashion. Conducting a survey across multiple sites provides clear advantages over either a single-site survey or the use of a large online database, and is a potentially powerful way of understanding the views of diverse populations on this topic. METHODS: A workgroup of the Electronic Medical Records and Genomics (eMERGE) Network, a national consortium of 9 sites (13 separate institutions, 11 clinical centers) supported by the National Human Genome Research Institute (NHGRI) that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale genetic research, conducted a survey to understand patients’ views on consent, sample and data sharing for future research, biobank governance, data protection, and return of research results. RESULTS: Working across 9 sites to design and conduct a national survey presented challenges in organization, meeting human subjects guidelines at each institution, and survey development and implementation. The challenges were met through a committee structure to address each aspect of the project with representatives from all sites. Each committee’s output was integrated into the overall survey plan. A number of site-specific issues were successfully managed, allowing the survey to be developed and implemented uniformly across 11 clinical centers. CONCLUSIONS: Conducting a survey across a number of institutions with different cultures and practices is a methodological and logistical challenge. With a clear infrastructure, collaborative attitudes, excellent lines of communication, and the right expertise, this can be accomplished successfully.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
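
The N50 statistic quoted above is commonly defined as the largest length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. A minimal sketch of that common definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths: list[int]) -> int:
    """N50 of a set of contig or scaffold lengths (common definition).

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of the assembly
            return length
    raise ValueError("lengths must be non-empty")
```

For the toy input `[2, 2, 2, 3, 3, 4, 8, 8]` (total 32), the two longest contigs already sum to 16, so `n50` returns 8.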

    Dynamical Boson Stars

    The idea of stable, localized bundles of energy has strong appeal as a model for particles. In the 1950s John Wheeler envisioned such bundles as smooth configurations of electromagnetic energy that he called geons, but none were found. Instead, particle-like solutions were found in the late 1960s with the addition of a scalar field, and these were given the name boson stars. Since then, boson stars have found use in a wide variety of models as sources of dark matter, as black hole mimickers, in simple models of binary systems, and as a tool in finding black holes in higher dimensions with only a single Killing vector. We discuss important varieties of boson stars, their dynamic properties, and some of their uses, concentrating on recent efforts. Comment: 79 pages, 25 figures, invited review for Living Reviews in Relativity; major revision in 201