Detecting sequential structure
Programming by demonstration requires detection and analysis of sequential patterns in a user’s input, and the synthesis of an appropriate structural model that can be used for prediction. This paper describes SEQUITUR, a scheme for inducing a structural description of a sequence from a single example. SEQUITUR integrates several different inference techniques: identification of lexical subsequences or vocabulary elements, hierarchical structuring of such subsequences, identification of elements that have equivalent usage patterns, inference of programming constructs such as looping and branching, generalisation by unifying grammar rules, and the detection of procedural substructure. Although SEQUITUR operates with abstract sequences, a number of concrete illustrations are provided.
Steady-state, effective-temperature dynamics in a glassy material
We present an STZ-based analysis of numerical simulations by Haxton and Liu
(HL). The extensive HL data sharply test the basic assumptions of the STZ
theory, especially the central role played by the effective disorder
temperature as a dynamical state variable. We find that the theory survives
these tests, and that the HL data provide important and interesting constraints
on some of its specific ingredients. Our most surprising conclusion is that,
when driven at various constant shear rates in the low-temperature glassy
state, the HL system exhibits a classic glass transition, including
super-Arrhenius behavior, as a function of the effective temperature. Comment: 9 pages, 6 figures
Extracting text from PostScript
We show how to extract plain text from PostScript files. A textual scan is inadequate because PostScript interpreters can generate characters on the page that do not appear in the source file. Furthermore, word and line breaks are implicit in the graphical rendition, and must be inferred from the positioning of word fragments. We present a robust technique for extracting text and recognizing words and paragraphs. The method uses a standard PostScript interpreter but redefines several PostScript operators, and simple heuristics are employed to locate word and line breaks. The scheme has been used to create a full-text index, and plain-text versions, of 40,000 technical reports (34 Gbyte of PostScript). Other text-extraction systems are reviewed: none offer the same combination of robustness and simplicity.
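The idea of inferring word and line breaks from fragment positions can be illustrated with a minimal Python sketch. It is not the paper's method: the `Fragment` record, the `join_fragments` function, and the `space_gap` and `line_tol` thresholds are all hypothetical, and the sketch assumes each fragment's position and rendered width have already been captured from the interpreter.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text as painted on the page (hypothetical record)."""
    text: str
    x: float      # left edge of the fragment
    y: float      # baseline; PostScript y grows upward
    width: float  # rendered width of the fragment


def join_fragments(fragments, space_gap=2.0, line_tol=1.0):
    """Rebuild lines of text from positioned fragments.

    Assumed heuristic: fragments whose baselines differ by at most
    `line_tol` share a line, and a horizontal gap wider than
    `space_gap` marks a word break.
    """
    lines = []  # each entry is a list of fragments on one baseline
    for frag in sorted(fragments, key=lambda f: -f.y):  # top to bottom
        if lines and abs(lines[-1][0].y - frag.y) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    result = []
    for line in lines:
        line.sort(key=lambda f: f.x)  # left to right
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            text += (" " if gap > space_gap else "") + cur.text
        result.append(text)
    return result


page = [
    Fragment("Script", 20, 100, 30),
    Fragment("Post", 0, 100, 20),
    Fragment("files", 55, 100, 25),
    Fragment("next", 0, 88, 20),
]
print(join_fragments(page))
```

Fixed thresholds are the simplest choice; a practical system would more plausibly scale them with the font size in use.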
Inter- and Intra-Chain Attractions in Solutions of Flexible Polyelectrolytes at Nonzero Concentration
Constant temperature molecular dynamics simulations were used to study
solutions of flexible polyelectrolyte chains at nonzero concentrations with
explicit counterions and unscreened coulombic interactions. Counterion
condensation, measured via the self-diffusion coefficient of the counterions,
is found to increase with polymer concentration, but contrary to the prediction
of Manning theory, the renormalized charge fraction on the chains decreases
with increasing Bjerrum length without showing any saturation. Scaling analysis
of the radius of gyration shows that the chains are extended at low polymer
concentrations and small Bjerrum lengths, while at sufficiently large Bjerrum
lengths, the chains shrink to produce compact structures with exponents smaller
than those of a Gaussian chain, suggesting the presence of attractive intrachain
interactions. A careful study of the radial distribution function of the
center-of-mass of the polyelectrolyte chains shows clear evidence that
effective interchain attractive interactions also exist in solutions of
flexible polyelectrolytes, similar to what has been found for rodlike
polyelectrolytes. Our results suggest that the broad maximum observed in
scattering experiments is due to clustering of chains. Comment: 12 pages, REVTeX, 15 eps figures
Separation of long DNA chains using non-uniform electric field: a numerical study
We study migration of DNA molecules through a microchannel with a series of
electric traps controlled by an ac electric field. We describe the motion of
DNA based on Brownian dynamics simulations of a bead-spring chain. Our
simulation demonstrates that a chain captured by an electrode escapes from
the binding electric field due to thermal fluctuations. We find that the
mobility of a chain depends on its length; the mobility sharply
increases when the length of a chain exceeds a critical value, which is
strongly affected by the amplitude of the applied ac field. Thus we can adjust
the length regime in which this microchannel separates DNA molecules well,
without changing the structure of the channel. We also present a theoretical
insight into the relation between the critical chain length and the field
amplitude. Comment: 12 pages, 9 figures
Online and offline heuristics for inferring hierarchies of repetitions in sequences
Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimizes the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique.
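As a rough illustration of offline replacement, the sketch below applies a "most common first" policy restricted to adjacent pairs of symbols. It is a simplification under stated assumptions, not the paper's algorithm: the function names `most_frequent_first` and `expand` and the rule names `R0`, `R1`, ... are hypothetical, only pairs are considered rather than arbitrary substrings, and this naive version recounts pairs on every pass, so it does not achieve linear time.

```python
from collections import Counter


def most_frequent_first(seq):
    """Build a grammar by repeatedly replacing the most frequent pair.

    Returns (top-level sequence, rules), where rules maps a rule name
    to the pair of symbols it stands for. Assumes the input contains
    no symbols that look like rule names ("R0", "R1", ...).
    """
    seq = list(seq)
    rules = {}
    while True:
        # Naive count: overlapping occurrences (as in "aaa") are
        # counted too, so a rule may end up used only once.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing left to replace
        name = f"R{len(rules)}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the original string."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


top, rules = most_frequent_first("abcabcabc")
restored = "".join(expand(s, rules) for s in top)
print(top, rules, restored)
```

Expanding the top-level sequence recovers the input, which is a quick check that every replacement was lossless.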
Transposon mutagenesis in a hyper-invasive clinical isolate of Campylobacter jejuni reveals a number of genes with potential roles in invasion
Transposon mutagenesis has been applied to a hyper-invasive clinical isolate of Campylobacter jejuni, 01/51. A random transposon mutant library was screened in an in vitro assay of invasion and 26 mutants with a significant reduction in invasion were identified. Given that the invasion potential of C. jejuni is relatively poor compared to other enteric pathogens, the use of a hyper-invasive strain was advantageous as it greatly facilitated the identification of mutants with reduced invasion. The location of the transposon insertion in 23 of these mutants has been determined; all but three of the insertions are in genes also present in the genome-sequenced strain NCTC 11168. Eight of the mutants contain transposon insertions in one region of the genome (∼14 kb), which when compared with the genome of NCTC 11168 overlaps with one of the previously reported plasticity regions and is likely to be involved in genomic variation between strains. Further characterization of one of the mutants within this region has identified a gene that might be involved in adhesion to host cells.
Counterions at Charged Cylinders: Criticality and universality beyond mean-field
The counterion-condensation transition at charged cylinders is studied using
Monte-Carlo simulation methods. Employing logarithmically rescaled radial
coordinates, large system sizes are tractable and the critical behavior is
determined by a combined finite-size and finite-ion-number analysis. Critical
counterion localization exponents are introduced and found to be in accord with
mean-field theory both in 2 and 3 dimensions. In 3D the heat capacity shows a
universal jump at the transition, while in 2D, it consists of discrete peaks
where single counterions successively condense. Comment: 4 pages, 3 figures; submitted to Phys. Rev. Lett. (2005)
Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data
There are two broad classes of models used to address the econometric problems caused by skewness in data commonly encountered in health care applications: (1) transformation to deal with skewness (e.g., OLS on ln(y)); and (2) alternative weighting approaches based on exponential conditional models (ECM) and generalized linear model (GLM) approaches. In this paper, we encompass these two classes of models using the three-parameter generalized gamma (GGM) distribution, which includes several of the standard alternatives as special cases: OLS with a normal error, OLS for the log normal, the standard gamma and exponential with a log link, and the Weibull. Using simulation methods, we find the tests of identifying distributions to be robust. The GGM also provides a potentially more robust alternative estimator to the standard alternatives. An example using inpatient expenditures is also analyzed.
