    Online Pattern Matching for String Edit Distance with Moves

    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves, in addition to ordinary editing operations, to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substrings in a string. ESP can be used to compute an approximate EDM as the L1 distance between characteristic vectors built from the node labels of parse trees. However, ESP is not applicable to streaming text data, where the whole text is not known in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation, leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP's ability to compute EDM online on benchmark datasets and show its efficiency.
    Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2014).
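
    To make "L1 distance between characteristic vectors" concrete, the sketch below treats a characteristic vector as a count of node labels and keeps the L1 distance to a fixed pattern vector up to date as labels enter or leave a text window. It does not implement ESP or OESP: the labels are hypothetical stand-ins for the node labels a parser would emit, and the class name `SlidingL1` is our own. Because each update changes a single coordinate, it costs O(1).

```python
from collections import Counter
from typing import Hashable, Iterable


class SlidingL1:
    """L1 distance between a fixed pattern vector and a mutable text vector.

    Hypothetical helper: labels stand in for parse-tree node labels; no
    parsing is performed here.
    """

    def __init__(self, pattern_labels: Iterable[Hashable]) -> None:
        self.pattern = Counter(pattern_labels)
        self.text: Counter = Counter()
        # With an empty text vector, the distance is the pattern's total count.
        self.distance = sum(self.pattern.values())

    def _shift(self, label: Hashable, delta: int) -> None:
        # Only the coordinate for `label` changes, so adjust the sum locally.
        before = abs(self.text[label] - self.pattern[label])
        self.text[label] += delta
        after = abs(self.text[label] - self.pattern[label])
        self.distance += after - before

    def add(self, label: Hashable) -> None:
        self._shift(label, +1)

    def remove(self, label: Hashable) -> None:
        if self.text[label] == 0:
            raise KeyError(f"label {label!r} is not in the text vector")
        self._shift(label, -1)


def l1_distance(a: Iterable[Hashable], b: Iterable[Hashable]) -> int:
    """Batch L1 distance between the characteristic vectors of two label lists."""
    u, v = Counter(a), Counter(b)
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


if __name__ == "__main__":
    window = SlidingL1(["A", "B", "B"])
    for label in ["A", "B", "C"]:
        window.add(label)
    print(window.distance)  # 2
    window.remove("C")
    window.add("B")
    print(window.distance)  # 0
```

    The batch function is useful as a cross-check: after any sequence of updates, `window.distance` should equal `l1_distance` over the current window's labels.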

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once and that provide competitive tradeoffs between the time for counting and reporting all exact occurrences of a pattern and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs. Rather than augmenting the RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats and which is powerful enough to support matching statistics and constant-space traversal.
    Comment: (the name of the third co-author was inadvertently omitted from the previous version)
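
    Two of the repetition measures named above, the number r of BWT runs and the number z of LZ77 factors, can be computed naively for small inputs. The sketch below is our own illustration, not the paper's construction: it builds the BWT by sorting all rotations (quadratic time, toy inputs only, assuming `$` does not occur in the text and is smaller than every character in it), counts the runs of the result, and counts the factors of a greedy LZ77 parse in which a factor may overlap its source and a previously unseen character forms a one-character literal. Other LZ77 variants can give slightly different values of z.

```python
from __future__ import annotations


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    assert all(sentinel < c for c in text), "sentinel must be absent and smallest"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lz77_factor_count(text: str) -> int:
    """Greedy LZ77 parse; a factor's source may overlap the factor itself."""
    i, z, n = 0, 0, len(text)
    while i < n:
        longest = 0
        for j in range(i):  # every earlier starting position is a candidate source
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            longest = max(longest, k)
        i += max(longest, 1)  # length-1 literal if the character is new
        z += 1
    return z


if __name__ == "__main__":
    for text in ["banana", "abababab"]:
        r = len(run_length_encode(bwt(text)))
        print(text, "r =", r, "z =", lz77_factor_count(text))
    # banana r = 5 z = 4
    # abababab r = 3 z = 3
```

    On the periodic string both measures stay small, which is the regime the RLBWT- and LZ77-based structures are designed for.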

    One-dimensional staged self-assembly

    17th International Conference, DNA 17, Pasadena, CA, USA, September 19-23, 2011. Proceedings.
    We introduce the problem of staged self-assembly of one-dimensional nanostructures, which becomes interesting when the elements are labeled (e.g., representing functional units that must be placed at specific locations). In a restricted model in which each operation has a single terminal assembly, we prove that assembling a given string of labels with the fewest stages is equivalent, up to constant factors, to compressing the string so that it is uniquely derived from the smallest possible context-free grammar (a well-studied O(log n)-approximable problem). Without this restriction, we show that the optimal assembly can be substantially smaller than the optimal context-free grammar, by a factor of Ω(√n / log n), even for binary strings of length n. Fortunately, we can bound this separation in model power by a quadratic function in the number of distinct glues or tiles allowed in the assembly, which is typically small in practice.
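
    Computing a smallest grammar is hard in general, so practical grammar compressors rely on heuristics. To illustrate what a context-free grammar that uniquely derives one string looks like, here is a sketch of Re-Pair, a well-known heuristic that repeatedly replaces the most frequent adjacent pair of symbols with a new nonterminal. It is not the construction used in the paper and, as written, carries no approximation guarantee; the nonterminal names `R0`, `R1`, and so on are an arbitrary, hypothetical naming scheme, and grammar size is counted here as the total length of all right-hand sides.

```python
from __future__ import annotations

from collections import Counter

Rules = dict[str, tuple[str, str]]


def repair(text: str) -> tuple[list[str], Rules]:
    """Re-Pair-style compression: returns (start sequence, binary rules)."""
    seq = list(text)
    rules: Rules = {}
    while True:
        # Counter counts overlapping occurrences (e.g. in "aaa"), so a rule may
        # end up used only once; the grammar still derives the text exactly.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # no repeated pair left
            break
        name = f"R{len(rules)}"  # hypothetical nonterminal name
        rules[name] = pair
        out: list[str] = []
        i = 0
        while i < len(seq):  # replace non-overlapping occurrences left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol: str, rules: Rules) -> str:
    """Derive the terminal string generated by `symbol`."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol  # terminal character


def grammar_size(start: list[str], rules: Rules) -> int:
    """Total length of all right-hand sides, including the start sequence."""
    return len(start) + 2 * len(rules)


if __name__ == "__main__":
    start, rules = repair("abababab")
    print(start, rules)  # ['R1', 'R1'] {'R0': ('a', 'b'), 'R1': ('R0', 'R0')}
    print(grammar_size(start, rules))  # 6
```

    Each rule has exactly one right-hand side, so the grammar derives exactly one string, matching the "uniquely derived" requirement in the abstract.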

    On the maximal number of cubic subwords in a string

    We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood, and several new results related to it are given in the paper. We consider two simple problems related to the maximum number of subwords that are squares or that are highly repetitive; then we provide a nontrivial estimate for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2} \rfloor - 1$, and that the maximum number of subwords of the form $x^k$, for $k \ge 3$, is exactly $n - 2$. In particular, the maximum number of cubes in a word is not greater than $n - 2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly: we show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.)
    Comment: 14 pages
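
    For short words, the quantities bounded above can be checked by brute force. The sketch below, a hypothetical helper rather than anything from the paper, enumerates every factor of the form $x^k$ and collects the distinct ones, so `k=3` counts distinct cubes and `k=2` distinct squares. It runs in roughly cubic time and is meant only for experimenting with small examples.

```python
from __future__ import annotations

from itertools import product


def distinct_powers(word: str, k: int = 3) -> set[str]:
    """Distinct factors of `word` of the form x^k with x non-empty."""
    n = len(word)
    found: set[str] = set()
    for start in range(n):
        # Only periods whose k-fold repetition still fits in the word.
        for period in range(1, (n - start) // k + 1):
            root = word[start:start + period]
            candidate = word[start:start + k * period]
            if candidate == root * k:
                found.add(candidate)
    return found


def max_cubes(n: int, alphabet: str = "ab") -> int:
    """Maximum number of distinct cubes over all words of length n (exhaustive)."""
    return max(len(distinct_powers("".join(w), 3)) for w in product(alphabet, repeat=n))


if __name__ == "__main__":
    print(sorted(distinct_powers("aaaaaa")))  # ['aaa', 'aaaaaa']
    print(len(distinct_powers("abcabcabc")))  # 1
```

    Exhaustive search over all words grows exponentially with n, so `max_cubes` is only practical for very small lengths; it can be used to sanity-check the $n - 2$ upper bound on small cases.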

    Fingerprints in Compressed Strings

    The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries: given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n)-space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLPs) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as random access in SLPs. We use the fingerprint data structures to solve the longest common extension (LCE) problem in query time O(log N log l) and O(log l log log l + log log N) for SLPs and Linear SLPs, respectively. Here, l denotes the length of the LCE.
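
    The sketch below shows the two query types on an uncompressed string, as a baseline for what the compressed structures answer. It is not the paper's SLP-based data structure: it stores O(N) prefix fingerprints, defines the fingerprint of S[i..j] as the sum of S[k]·b^(j−k) mod p over k = i..j with a random base b and the Mersenne prime p = 2^61 − 1 (both our choices), and answers an LCE query by exponential search followed by binary search over fingerprint comparisons. That search uses O(log l) comparisons, which is where a log l factor in LCE query times typically comes from. Like any Karp-Rabin method, the LCE answer is Monte Carlo: a hash collision can make it too long, with small probability.

```python
import random
from typing import Optional


class PrefixFingerprints:
    """Karp-Rabin fingerprints of substrings of an uncompressed string."""

    MOD = (1 << 61) - 1  # a Mersenne prime

    def __init__(self, s: str, base: Optional[int] = None) -> None:
        self.n = len(s)
        self.base = base if base is not None else random.randrange(256, self.MOD - 1)
        self.prefix = [0] * (self.n + 1)  # prefix[k] = fingerprint of S[0..k-1]
        self.power = [1] * (self.n + 1)   # power[k] = base^k mod MOD
        for k, ch in enumerate(s):
            self.prefix[k + 1] = (self.prefix[k] * self.base + ord(ch)) % self.MOD
            self.power[k + 1] = (self.power[k] * self.base) % self.MOD

    def fingerprint(self, i: int, j: int) -> int:
        """Fingerprint of S[i..j], inclusive and 0-based, in O(1) time."""
        length = j - i + 1
        return (self.prefix[j + 1] - self.prefix[i] * self.power[length]) % self.MOD

    def _match(self, i: int, j: int, length: int) -> bool:
        if length == 0:
            return True
        return self.fingerprint(i, i + length - 1) == self.fingerprint(j, j + length - 1)

    def lce(self, i: int, j: int) -> int:
        """Longest common prefix of S[i:] and S[j:], up to fingerprint collisions."""
        limit = self.n - max(i, j)
        hi = 1
        while hi <= limit and self._match(i, j, hi):  # exponential search
            hi *= 2
        lo = hi // 2                # known to match
        hi = min(hi, limit + 1)     # known not to match (or out of range)
        while hi - lo > 1:          # binary search between lo and hi
            mid = (lo + hi) // 2
            if self._match(i, j, mid):
                lo = mid
            else:
                hi = mid
        return lo


if __name__ == "__main__":
    fp = PrefixFingerprints("abracadabra")
    print(fp.lce(0, 7))  # 4 ("abra")
    print(fp.lce(0, 3))  # 1 ("a")
```

    Because the search only ever compares fingerprints of equal-length substrings, any structure that answers fingerprint queries, including a compressed one, could be plugged into `_match` in place of the prefix table.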

    Willow short-rotation production systems in Canada and Northern United States: A review

    Willow short rotation coppice (SRC) systems are becoming an attractive practice because they are a sustainable system fulfilling multiple ecological objectives with significant environmental benefits. A sustainable supply of bioenergy feedstock can be produced by willow on marginal land using well-adapted or tolerant cultivars. Across Canada and the northern U.S.A., there are millions of hectares of available degraded land that have the potential for willow SRC biomass production, with a C sequestration potential capable of offsetting an appreciable amount of anthropogenic greenhouse gas emissions. A fundamental question concerning sustainable SRC willow yields was whether long-term soil productivity is maintained within a multi-rotation SRC system, given the rapid growth rate and the associated nutrient exports offsite when the willow biomass is harvested after repeated short rotations. Based on early results from the first willow SRC rotation, it was found that willow systems are relatively low nutrient-demanding, with minimal nutrient output other than in harvested biomass. The overall aim of this manuscript is to summarize the literature and present findings and data from ongoing research trials across Canada and the northern U.S.A. examining willow SRC system establishment and viability. The research areas presented here are the crop production of willow SRC systems, above- and below-ground biomass dynamics and the C budget, a comprehensive soil-willow system nutrient budget, and soil nutrient amendments (via fertilization) in willow SRC systems. Existing research gaps were also identified for the Canadian context.

    Cryptosporidium Priming Is More Effective than Vaccine for Protection against Cryptosporidiosis in a Murine Protein Malnutrition Model

    Cryptosporidium is a major cause of severe diarrhea, especially in malnourished children. Using a murine model of C. parvum oocyst challenge that recapitulates clinical features of severe cryptosporidiosis during malnutrition, we interrogated the effect of protein malnutrition (PM) on primary and secondary responses to C. parvum challenge, and tested the differential ability of mucosal priming strategies to overcome the PM-induced susceptibility. We determined that while PM fundamentally alters systemic and mucosal primary immune responses to Cryptosporidium, priming with C. parvum (10⁶ oocysts) provides robust protective immunity against re-challenge despite ongoing PM. C. parvum priming restores mucosal Th1-type effectors (CD3+CD8+CD103+ T-cells) and cytokines (IFNγ and IL12p40) that otherwise decrease with ongoing PM. Vaccination strategies with Cryptosporidium antigens expressed in the S. Typhi vector 908htr, however, do not enhance Th1-type responses to C. parvum challenge during PM, even though vaccination strongly boosts immunity in challenged fully nourished hosts. Remote non-specific exposures to the attenuated S. Typhi vector alone or to the TLR9 agonist CpG ODN-1668 can partially attenuate C. parvum severity during PM, but neither as effectively as viable C. parvum priming. We conclude that although PM interferes with basal and vaccine-boosted immune responses to C. parvum, sustained reductions in disease severity are possible through mucosal activators of host defenses, and specifically that C. parvum priming can elicit impressively robust Th1-type protective immunity despite ongoing protein malnutrition. These findings add insight into potential correlates of Cryptosporidium immunity and future vaccine strategies in malnourished children.

    The effectiveness of ω-3 polyunsaturated fatty acid interventions during pregnancy on obesity measures in the offspring: an up-to-date systematic review and meta-analysis.

    BACKGROUND: The potential role of ω-3 long chain polyunsaturated fatty acid (LCPUFA) supplementation during pregnancy on subsequent risk of obesity outcomes in the offspring is not clear, and there is a need to synthesise this evidence. OBJECTIVE: A systematic review and meta-analysis of randomised controlled trials (RCTs), including the most recent studies, was conducted to assess the effectiveness of ω-3 LCPUFA interventions during pregnancy on obesity measures (e.g. BMI, body weight, fat mass) in offspring. METHODS: Included RCTs had a minimum of 1-month follow-up post-partum. The search included the CENTRAL, MEDLINE, SCOPUS, WHO International Clinical Trials Registry Platform, E-theses and Web of Science databases. Study quality was evaluated using the Cochrane Collaboration's risk of bias tool. RESULTS: Eleven RCTs from ten unique trials (3644 children) examined the effectiveness of ω-3 LCPUFA maternal supplementation during pregnancy on the development of obesity outcomes in offspring. The trials were heterogeneous in terms of their samples, the type and duration of intervention, and follow-up. Pooled estimates did not show an association between prenatal intake of fatty acids and obesity measures in offspring. CONCLUSION: These results indicate that maternal supplementation with ω-3 LCPUFA during pregnancy does not have a beneficial effect on obesity risk. Due to the high heterogeneity between studies, along with small sample sizes and high rates of attrition, the long-term effects of ω-3 LCPUFA supplementation during pregnancy for prevention of childhood obesity remain unclear. Large, high-quality RCTs designed specifically to examine the effect of prenatal intake of fatty acids on prevention of childhood obesity are needed. There is also a need to identify specific sub-groups in the population that might benefit more, and to determine whether different ω-3 LCPUFA, i.e. eicosapentaenoic (EPA) vs. docosahexaenoic (DHA) acids, have different effects.

    Minimal Holocene retreat of large tidewater glaciers in Køge Bugt, southeast Greenland

    Køge Bugt, in southeast Greenland, hosts three of the largest glaciers of the Greenland Ice Sheet; these have been major contributors to ice loss in the last two decades. Despite its importance, the Holocene history of this area has not been investigated. We present a 9100-year sediment core record of glaciological and oceanographic changes from analysis of foraminiferal assemblages, the abundance of ice-rafted debris, and sortable silt grain size data. Results show that ice-rafted debris accumulated constantly throughout the core, demonstrating that glaciers in Køge Bugt remained in tidewater settings throughout the last 9100 years. This observation constrains maximum Holocene glacier retreat here to less than 6 km from present-day positions. Retreat was minimal despite early-Holocene oceanic and climatic conditions that were at least as warm as the present day. The limited Holocene retreat of glaciers in Køge Bugt was controlled by the subglacial topography of the area; the steeply sloping bed allowed glaciers here to stabilise during retreat. These findings underscore the need to account for individual glacier geometry when predicting future behaviour. We anticipate that glaciers in Køge Bugt will remain in stable configurations in the near future, despite the predicted continuation of atmospheric and oceanic warming.