    Detecting sequential structure

    Programming by demonstration requires detection and analysis of sequential patterns in a user’s input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.

    Steady-state, effective-temperature dynamics in a glassy material

    We present an STZ-based analysis of numerical simulations by Haxton and Liu (HL). The extensive HL data sharply test the basic assumptions of the STZ theory, especially the central role played by the effective disorder temperature as a dynamical state variable. We find that the theory survives these tests, and that the HL data provide important and interesting constraints on some of its specific ingredients. Our most surprising conclusion is that, when driven at various constant shear rates in the low-temperature glassy state, the HL system exhibits a classic glass transition, including super-Arrhenius behavior, as a function of the effective temperature. Comment: 9 pages, 6 figures

    Extracting text from PostScript

    We show how to extract plain text from PostScript files. A textual scan is inadequate because PostScript interpreters can generate characters on the page that do not appear in the source file. Furthermore, word and line breaks are implicit in the graphical rendition, and must be inferred from the positioning of word fragments. We present a robust technique for extracting text and recognizing words and paragraphs. The method uses a standard PostScript interpreter but redefines several PostScript operators, and simple heuristics are employed to locate word and line breaks. The scheme has been used to create a full-text index, and plain-text versions, of 40,000 technical reports (34 Gbyte of PostScript). Other text-extraction systems are reviewed: none offers the same combination of robustness and simplicity.
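
    The idea of inferring word and line breaks from the positions of word fragments can be illustrated with a minimal sketch. This is not the paper's method: the `Fragment` record, the `fragments_to_lines` function, and the `word_gap` and `line_tol` thresholds are all hypothetical, and the sketch assumes the interpreter has already reported each fragment's text, left edge, baseline, and rendered width.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record for one piece of text painted on the page."""
    text: str
    x: float      # left edge of the fragment
    y: float      # baseline position (PostScript y grows upward)
    width: float  # rendered width of the fragment


def fragments_to_lines(fragments, word_gap=2.0, line_tol=3.0):
    """Group fragments into lines by baseline, then join them by horizontal gap.

    word_gap and line_tol are illustrative thresholds, not values from the paper.
    """
    # Walk the fragments from the top of the page down and start a new line
    # whenever the baseline moves by more than line_tol.
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    result = []
    for line in lines:
        # Within a line, read left to right.
        line.sort(key=lambda f: f.x)
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            # A large gap between fragments is taken as a word break;
            # a small gap means the fragments belong to the same word.
            gap = cur.x - (prev.x + prev.width)
            text += (" " if gap > word_gap else "") + cur.text
        result.append(text)
    return result


page = [
    Fragment("text", 53.0, 700.0, 20.0),
    Fragment("here", 10.0, 688.0, 22.0),
    Fragment("Ex", 10.0, 700.0, 12.0),
    Fragment("tract", 22.5, 700.0, 25.0),
]
print(fragments_to_lines(page))
```

    Fixed thresholds are the simplest possible choice; a practical system would presumably scale them with the font size in use.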

    Inter- and Intra-Chain Attractions in Solutions of Flexible Polyelectrolytes at Nonzero Concentration

    Constant temperature molecular dynamics simulations were used to study solutions of flexible polyelectrolyte chains at nonzero concentrations with explicit counterions and unscreened Coulombic interactions. Counterion condensation, measured via the self-diffusion coefficient of the counterions, is found to increase with polymer concentration, but contrary to the prediction of Manning theory, the renormalized charge fraction on the chains decreases with increasing Bjerrum length without showing any saturation. Scaling analysis of the radius of gyration shows that the chains are extended at low polymer concentrations and small Bjerrum lengths, while at sufficiently large Bjerrum lengths, the chains shrink to produce compact structures with exponents smaller than a Gaussian chain, suggesting the presence of attractive intrachain interactions. A careful study of the radial distribution function of the center-of-mass of the polyelectrolyte chains shows clear evidence that effective interchain attractive interactions also exist in solutions of flexible polyelectrolytes, similar to what has been found for rodlike polyelectrolytes. Our results suggest that the broad maximum observed in scattering experiments is due to clustering of chains. Comment: 12 pages, REVTeX, 15 eps figures

    Separation of long DNA chains using non-uniform electric field: a numerical study

    We study the migration of DNA molecules through a microchannel with a series of electric traps controlled by an ac electric field. We describe the motion of DNA based on Brownian dynamics simulations of a bead-spring chain. Our simulation demonstrates that a chain captured by an electrode escapes from the binding electric field due to thermal fluctuation. We find that the mobility of the chain depends on the chain length; the mobility sharply increases when the length of a chain exceeds a critical value, which is strongly affected by the amplitude of the applied ac field. Thus we can adjust the length regime in which this microchannel separates DNA molecules well, without changing the structure of the channel. We also present a theoretical insight into the relation between the critical chain length and the field amplitude. Comment: 12 pages, 9 figures

    Online and offline heuristics for inferring hierarchies of repetitions in sequences

    Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimizes the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique.
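
    As a rough illustration of offline, most-common-first replacement, the sketch below repeatedly replaces the most frequent adjacent pair of symbols with a new production rule. It is a simplification under stated assumptions, not the paper's algorithm: it considers only pairs rather than arbitrary repeated strings, it rescans the sequence on every pass (so it is not linear-time), and the names `most_frequent_first`, `expand`, and the `R0`, `R1`, ... rule labels are hypothetical.

```python
from collections import Counter


def most_frequent_first(sequence, min_count=2):
    """Build a grammar by replacing the most frequent adjacent pair first.

    Returns the compressed top-level sequence and a dict of rules.
    Assumes the input contains no symbols named like the generated
    rule labels ("R0", "R1", ...).
    """
    seq = list(sequence)
    rules = {}
    while True:
        # Naive count of adjacent pairs; overlapping occurrences
        # (as in "aaa") are counted too, which is a simplification.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        # Rewrite the sequence left to right, replacing
        # non-overlapping occurrences of the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols, rules):
    """Recursively expand rule names back into the original symbols."""
    out = []
    for s in symbols:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out


top, grammar = most_frequent_first("abcabcabc")
print(top, grammar)
print("".join(expand(top, grammar)))
```

    Expanding the top-level sequence with the inferred rules reproduces the input, which is a convenient sanity check for any replacement order.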

    Transposon mutagenesis in a hyper-invasive clinical isolate of Campylobacter jejuni reveals a number of genes with potential roles in invasion

    Transposon mutagenesis has been applied to a hyper-invasive clinical isolate of Campylobacter jejuni, 01/51. A random transposon mutant library was screened in an in vitro assay of invasion and 26 mutants with a significant reduction in invasion were identified. Given that the invasion potential of C. jejuni is relatively poor compared to other enteric pathogens, the use of a hyper-invasive strain was advantageous as it greatly facilitated the identification of mutants with reduced invasion. The location of the transposon insertion in 23 of these mutants has been determined; all but three of the insertions are in genes also present in the genome-sequenced strain NCTC 11168. Eight of the mutants contain transposon insertions in one region of the genome (∼14 kb), which when compared with the genome of NCTC 11168 overlaps with one of the previously reported plasticity regions and is likely to be involved in genomic variation between strains. Further characterization of one of the mutants within this region has identified a gene that might be involved in adhesion to host cells.

    Counterions at Charged Cylinders: Criticality and universality beyond mean-field

    The counterion-condensation transition at charged cylinders is studied using Monte-Carlo simulation methods. Employing logarithmically rescaled radial coordinates, large system sizes are tractable and the critical behavior is determined by a combined finite-size and finite-ion-number analysis. Critical counterion localization exponents are introduced and found to be in accord with mean-field theory in both 2 and 3 dimensions. In 3D the heat capacity shows a universal jump at the transition, while in 2D, it consists of discrete peaks where single counterions successively condense. Comment: 4 pages, 3 figures; submitted to Phys. Rev. Lett. (2005)

    Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data

    There are two broad classes of models used to address the econometric problems caused by skewness in data commonly encountered in health care applications: (1) transformation to deal with skewness (e.g., OLS on ln(y)); and (2) alternative weighting approaches based on exponential conditional models (ECM) and generalized linear model (GLM) approaches. In this paper, we encompass these two classes of models using the three-parameter generalized gamma (GGM) distribution, which includes several of the standard alternatives as special cases: OLS with a normal error, OLS for the log normal, the standard gamma and exponential with a log link, and the Weibull. Using simulation methods, we find the tests of identifying distributions to be robust. The GGM also provides a potentially more robust alternative estimator to the standard alternatives. An example using inpatient expenditures is also analyzed.