282 research outputs found

    Developing Deployable Spoken Language Translation Systems given Limited Resources

    Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low-cost portability to new language pairs. Proposed translation model pruning techniques achieve high translation performance even in low-memory situations. Named entity and specialty vocabulary coverage, particularly on small and mobile devices, is tailored to an individual user through translation model personalization.

    Communicating Unknown Words in Machine Translation

    A new approach to handling unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required, which generally offer broader coverage than bilingual resources and are available for a large number of languages. To use this in a machine translation system, definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.

    Language Model Adaptation for Statistical Machine Translation with Structured Query Models

    We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from a very large monolingual text collection. Specific language models are then built from the retrieved data and interpolated with a general background model. Experiments show significant improvements when translating with these adapted language models.
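
    The interpolation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear interpolation, uses toy unigram models in place of real n-gram language models, and the function names and the weight `lam` are hypothetical.

```python
from collections import Counter


def unigram_model(sentences):
    """Estimate a toy unigram model (relative frequencies) from sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def interpolate(p_adapted, p_background, lam):
    """Linear interpolation: lam * adapted + (1 - lam) * background.

    `lam` is an assumed mixing weight; in practice it would be tuned
    on held-out data.
    """
    vocab = set(p_adapted) | set(p_background)
    return {
        w: lam * p_adapted.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


# Background model from general text; adapted model from retrieved sentences.
background = unigram_model(["the cat sat", "the dog ran"])
adapted = unigram_model(["the patient has fever"])
mixed = interpolate(adapted, background, lam=0.5)
```

    Because both inputs are probability distributions and the weights sum to one, the interpolated model is again a distribution over the joint vocabulary.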

    Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language System

    Processing texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database (the Unified Medical Language System) for the translation of dialogues between doctors and patients using a statistical machine translation system. We show that extracting a large dictionary and using semantic type information to generalize the training data significantly improve translation performance.

    Extracting Translation Pairs from Social Network Content

    We introduce two methods to collect additional training data for statistical machine translation systems from public social network content. The first method identifies multilingual content where the author translated their own post to reach additional friends, fans or customers. Once identified, we can split the post into its language segments and extract translation pairs from this content. The second method considers web links (URLs) that users add as part of their post to point the reader to a video, article or website. If the same URL is shared by users of different languages, there is a chance they give the same comment in their respective languages. We use a support vector machine (SVM) as a classifier to identify true translations among all candidate pairs. We collected additional translation pairs using both methods for the language pairs Spanish-English and Portuguese-English. Using the collected data as additional training data for statistical machine translation and testing on in-domain test sets resulted in very significant improvements of up to 5 BLEU.
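
    The candidate generation step of the second method might look roughly like the sketch below. It is an assumption-laden illustration: the post fields (`url`, `lang`, `text`) and the function name are hypothetical, language identification is assumed to have been done already, and the SVM filtering step is not shown.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(posts, src_lang, tgt_lang):
    """Pair up comments in two languages that share the same URL.

    Each post is a dict with hypothetical keys "url", "lang" and "text".
    The result is a list of (source_text, target_text) candidates that a
    classifier would still have to accept or reject.
    """
    by_url = defaultdict(lambda: defaultdict(list))
    for post in posts:
        by_url[post["url"]][post["lang"]].append(post["text"])

    pairs = []
    for by_lang in by_url.values():
        # Every source-language comment is paired with every
        # target-language comment posted with the same link.
        pairs.extend(product(by_lang.get(src_lang, []), by_lang.get(tgt_lang, [])))
    return pairs


posts = [
    {"url": "http://example.com/a", "lang": "es", "text": "Gran video"},
    {"url": "http://example.com/a", "lang": "en", "text": "Great video"},
    {"url": "http://example.com/a", "lang": "en", "text": "Watch this"},
    {"url": "http://example.com/b", "lang": "es", "text": "Buen articulo"},
]
pairs = candidate_pairs(posts, "es", "en")
```

    Most candidates produced this way are not translations of each other, which is why a classification step is needed before the pairs can serve as training data.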

    Tools for Collecting Speech Corpora via Mechanical-Turk

    When rapidly porting speech applications to new languages, one of the most difficult tasks is the initial collection of sufficient speech corpora. State-of-the-art automatic speech recognition systems are typically trained on hundreds of hours of speech data. While corpora exist for major languages, a sufficient amount of quality speech data is not available for most of the world's languages. Previous work has focused on the collection of translations and the transcription of audio via Mechanical-Turk mechanisms; in this paper we introduce two tools that enable the remote collection of speech data. We then compare the quality of audio collected from paid part-time staff and unsupervised volunteers, and determine that basic user training is critical to obtaining usable data.

    Phrase Pair Rescoring with Term Weighting for Statistical Machine Translation

    We propose to score phrase translation pairs for statistical machine translation using term-weight-based models. These models employ tf.idf to encode the weights of content and non-content words in phrase translation pairs. The translation probability is then modeled by similarity functions defined in a vector space. Two similarity functions are compared. Using these models in a statistical machine translation task shows significant improvements.
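
    A rough sketch of tf.idf-weighted phrase pair scoring is given below. The abstract does not name the two similarity functions, so cosine similarity is used here as an assumed stand-in; projecting the source phrase into the target vocabulary through a word lexicon is likewise an assumption, and the corpus, lexicon and function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(documents):
    """Inverse document frequency, log(N / df), for each word."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(doc.split()))
    return {w: math.log(n / df[w]) for w in df}


def tfidf_vector(phrase, idf):
    """Sparse tf.idf vector of a phrase; unseen words get weight 0."""
    tf = Counter(phrase.split())
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pair(src_phrase, tgt_phrase, lexicon, idf):
    """Project the source phrase through the lexicon, then compare."""
    projected = " ".join(lexicon[w] for w in src_phrase.split() if w in lexicon)
    return cosine(tfidf_vector(projected, idf), tfidf_vector(tgt_phrase, idf))


# Toy target-side corpus and a toy German-English lexicon (both made up).
corpus = ["the old house", "the red car", "the house is big", "the car is fast"]
lexicon = {"das": "the", "alte": "old", "haus": "house", "auto": "car"}
idf = idf_table(corpus)

good = score_pair("das alte haus", "the old house", lexicon, idf)
bad = score_pair("das alte haus", "the fast car", lexicon, idf)
```

    In this toy setup the function word "the" occurs in every document and so receives an idf of zero, which means the score is driven entirely by the content words.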

    Cosmic ray transport and anisotropies

    We show that the large-scale cosmic ray anisotropy at ~10 TeV can be explained by a modified Compton-Getting effect in the magnetized flow field of old supernova remnants. This approach suggests an optimum energy scale for detecting the anisotropy. Two key assumptions are that propagation is based on turbulence following a Kolmogorov law and that cosmic ray interactions are dominated by transport through stellar winds of the exploding stars. A prediction is that the amplitude is smaller at lower energies due to incomplete sampling of the velocity field and also smaller at larger energies due to smearing. Comment: 6 pages, 1 figure

    Differential diagnosis of laryngeal spindle cell carcinoma and inflammatory myofibroblastic tumor – report of two cases with similar morphology

    BACKGROUND: Spindle cell tumors of the larynx are rare. In some cases, the dignity (benign or malignant nature) is difficult to determine. We report two cases of laryngeal spindle cell tumors. CASE PRESENTATION: Case 1 is a spindle cell carcinoma (SPC) in a 55-year-old male patient and case 2 an inflammatory myofibroblastic tumor (IMT) in a 34-year-old female patient. A comprehensive morphological and immunohistochemical analysis was performed. Both tumors arose at the vocal folds. Magnified laryngoscopy showed polypoid tumors. After resection, conventional histological investigation revealed spindle cell lesions with similar morphology. We found ulceration, mild atypia, and myxoid stroma. Before immunohistochemistry, the dignity was uncertain. Immunohistochemical investigations led to the diagnosis of two distinct tumors with different biological behaviour. Both expressed vimentin. Furthermore, the SPC was positive for pan-cytokeratin AE1/3, CK5/6, and smooth-muscle actin, whereas the IMT reacted with antibodies against ALK-1 and EMA. The proliferation rate (Ki67) was up to 80% in the SPC and 10% in the IMT. Other stainings with antibodies against p53, p21, Cyclin D1, or Rb did not yield additional information. After resection, the patient with SPC has been free of disease for seven months. The IMT recurred three months after the first surgery, but no relapse was found eight months after repeat surgery. CONCLUSION: Differential diagnosis can be difficult without immunohistochemistry. Therefore, a comprehensive morphological and immunohistochemical analysis is necessary, but cell cycle markers (apart from the assessment of proliferation) do not help.