5,287 research outputs found

    Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features

    Most research on bilingual automatic term extraction (ATE) from comparable corpora focuses on both components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether using information gathered for the former might be beneficial for the cross-lingual linking as well, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora with re-use of information across the components. To test this hypothesis, an existing dataset was expanded, which covers three languages and four domains. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains

    Accurate determination of node and arc multiplicities in de Bruijn graphs using conditional random fields

    Background: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence also the underlying genome sequence. However, sequencing errors and repeated subsequences make it difficult to identify the true underlying sequence. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times each k-mer (resp. (k+1)-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and the presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage; however, coverage variability and coverage biases make their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. Results: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. Conclusions: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. True k-mers can be distinguished from erroneous k-mers with a higher F1 score than existing methods. A C++11 implementation is available under the GNU AGPL v3.0 license
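
    The definition of node and arc multiplicities given in this abstract can be illustrated with a minimal sketch. It assumes the genome sequence is already known, ignores reverse complements, and uses a hypothetical helper name (`kmer_multiplicities`); it shows only the ground-truth quantity that the abstract's CRF model tries to infer from read coverage, not the CRF model itself.

```python
from collections import Counter


def kmer_multiplicities(genome: str, k: int) -> Counter:
    """Count how many times each k-mer occurs in a known genome string.

    Illustrative assumption: single strand only, no reverse complements.
    """
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


# Toy genome (an assumption for illustration, not data from the paper).
genome = "ACGTACGA"
k = 3

node_mult = kmer_multiplicities(genome, k)      # nodes correspond to k-mers
arc_mult = kmer_multiplicities(genome, k + 1)   # arcs correspond to (k+1)-mers

print(node_mult["ACG"])   # 2: "ACG" is a repeated k-mer in the toy genome
print(arc_mult["ACGT"])   # 1
```

    In practice the genome is unknown and only noisy coverage is observed, which is why the abstract frames multiplicity determination as an inference problem.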

    A genetic approach to Markovian characterisation of H.264 scalable video

    We propose an algorithm for multivariate Markovian characterisation of H.264/SVC scalable video traces at the sub-GoP (Group of Pictures) level. A genetic algorithm yields Markov models with limited state space that accurately capture temporal and inter-layer correlation. Key to our approach is the covariance-based fitness function. In comparison with the classical Expectation Maximisation algorithm, ours is capable of matching the second order statistics more accurately at the cost of less accuracy in matching the histograms of the trace. Moreover, a simulation study shows that our approach outperforms Expectation Maximisation in predicting performance of video streaming in various networking scenarios

    Properties of Random Complex Chemical Reaction Networks and Their Relevance to Biological Toy Models

    We investigate the properties of large random conservative chemical reaction networks composed of elementary reactions endowed with either mass-action or saturating kinetics, assigning kinetic parameters in a thermodynamically-consistent manner. We find that such complex networks exhibit qualitatively similar behavior when fed with external nutrient flux. The nutrient is preferentially transformed into one specific chemical that is an intrinsic property of the network. We propose a self-consistent proto-cell toy model in which the preferentially synthesized chemical is a precursor for the cell membrane, and show that such proto-cells can exhibit sustainable homeostatic growth when fed with any nutrient diffusing through the membrane, provided that nutrient is metabolized at a sufficient rate

    Markovian Characterisation of H.264/SVC scalable video

    In this paper, a multivariate Markovian traffic model is proposed to characterise H.264/SVC scalable video traces. Parametrisation by a genetic algorithm results in models with a limited state space which accurately capture both the temporal and the inter-layer correlation of the traces. A simulation study further shows that the model is capable of predicting performance of video streaming in various networking scenarios

    Delay analysis of a variable-capacity batch-server queue with general class-dependent service times

    In manufacturing, a batch server groups multiple customers that require the same type of service based on a specific characteristic, such as temperature or destination. In this paper, we extend previous work with the analysis of the delay in a variable-capacity batch-service queueing system with general class-dependent service times and customer-based correlation in the arrival process. The impact of asymmetry and correlation in the arrival process on the mean delay of a random customer and the tail distribution of the delay is investigated as well

    An analysis of a batch server with variable and class-dependent service capacity

    In many studies on batch service queueing systems, the service capacity is assumed to be constant. However, this service capacity often depends on the content of the queue. In this paper, we analyse a discrete-time single-server batch-server queue with general independent arrivals. We distinguish two different classes in the arrival stream, and products of both classes are added to the tail of a single queue. The single batch server can group all waiting customers at the head of the queue that belong to the same product class, up to a certain class-dependent maximum capacity. This results in a stochastic service capacity that depends on both the number of customers in the queue and their respective classes. Since it is clear that the length of a sequence of same-class customers will have a significant impact on the performance of the system, we also include correlation between the classes of consecutive customers. Applications of this type of batch server can, for instance, be found in the pacemaker loop of a Lean manufacturing system. In the course of the analysis, we calculate the probability generating function of the system occupancy at service initiation opportunities. In the numerical experiments, we will look at the impact of different parameters on both the mean system occupancy and the probability that the server is idle at a random service initiation opportunity. We also provide a number of guidelines to pick between the exact solution and an approximated approach with unlimited service capacities, by looking at the trade-off between accuracy and computational complexity
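
    The service discipline described in this abstract can be sketched as a small simulation. This is a rough illustration, not the paper's analysis (which works with probability generating functions): it assumes single-slot service times, at most one arrival per slot, and a simple two-state rule for class correlation. The names `serve_batch`, `simulate`, `arrival_prob` and `same_class_prob` are hypothetical.

```python
import random
from collections import deque


def serve_batch(queue: deque, max_capacity: dict) -> list:
    """Remove and return the run of same-class customers at the head of the
    queue, limited by the class-dependent maximum capacity."""
    if not queue:
        return []
    head_class = queue[0]
    batch = []
    while queue and queue[0] == head_class and len(batch) < max_capacity[head_class]:
        batch.append(queue.popleft())
    return batch


def simulate(num_slots, arrival_prob, same_class_prob, max_capacity, seed=0):
    """Estimate the fraction of service initiation opportunities at which the
    server finds an empty queue (assumed: one slot per service)."""
    rng = random.Random(seed)
    queue = deque()
    last_class = "A"
    idle = 0
    for _ in range(num_slots):
        # Service initiation opportunity at the start of the slot.
        if not serve_batch(queue, max_capacity):
            idle += 1
        # Assumed arrival process: at most one customer per slot, whose class
        # repeats the previous arrival's class with probability same_class_prob.
        if rng.random() < arrival_prob:
            if rng.random() >= same_class_prob:
                last_class = "B" if last_class == "A" else "A"
            queue.append(last_class)
    return idle / num_slots


capacities = {"A": 2, "B": 3}
q = deque(["A", "A", "A", "B", "A"])
print(serve_batch(q, capacities))  # ['A', 'A']: capped at class A's capacity
print(serve_batch(q, capacities))  # ['A']: the run ends at the first B
```

    The second call shows why the effective capacity is stochastic: the batch size depends on both the class-dependent limit and the length of the same-class run at the head of the queue.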