282 research outputs found

Developing Deployable Spoken Language Translation Systems given Limited Resources

Author: Eck Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2008

Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low-cost portability to new language pairs. Proposed translation model pruning techniques achieve high translation performance even in low-memory situations. Named entity and specialty vocabulary coverage, particularly on small and mobile devices, is tailored to the individual user through translation model personalization.

Repository KITopen

Communicating Unknown Words in Machine Translation

Author: Eck Matthias
Vogel Stephan
Waibel Alex
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

A new approach to handling unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required, which generally offer broader coverage than bilingual resources and are available for a large number of languages. To use this in a machine translation system, definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.

Repository KITopen

Language Model Adaptation for Statistical Machine Translation with Structured Query Models

Author: Eck Matthias
Vogel Stephan
Zhao Bing
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from a very large monolingual text collection. Specific language models are then built from the retrieved data and interpolated with a general background model. Experiments show significant improvements when translating with these adapted language models.
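
The interpolation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes unigram models and a fixed linear interpolation weight; the function names, the weight `lam`, and the toy sentences are hypothetical and are not taken from the paper, which does not specify these details here.

```python
from collections import Counter


def unigram_model(sentences):
    """Estimate a maximum-likelihood unigram model from whitespace-tokenized text."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(adapted, background, lam):
    """Linearly interpolate two models: lam * adapted + (1 - lam) * background."""
    vocab = set(adapted) | set(background)
    return {
        word: lam * adapted.get(word, 0.0) + (1.0 - lam) * background.get(word, 0.0)
        for word in vocab
    }


# Toy data (assumed): a general background corpus and sentences retrieved
# for the current translation hypotheses.
background = unigram_model(["the cat sat", "the dog ran"])
adapted = unigram_model(["the patient sat"])
mixed = interpolate(adapted, background, lam=0.5)
```

Because both inputs are probability distributions, the interpolated model also sums to one, and words seen only in the retrieved data (here "patient") receive nonzero probability.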

Repository KITopen

Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language System

Author: Eck Matthias
Vogel Stephan
Waibel Alex
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

Processing texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database (the Unified Medical Language System) for the translation of dialogues between doctors and patients using a statistical machine translation system. We are able to show that the extraction of a large dictionary and the usage of semantic type information to generalize the training data significantly improve the translation performance.

Repository KITopen

Extracting Translation Pairs from Social Network Content

Author: Eck Matthias
Waibel Alexander
Zemlyanskiy Yury
Zhang Joy
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

We introduce two methods to collect additional training data for statistical machine translation systems from public social network content. The first method identifies multilingual content where the author translated their own post to reach additional friends, fans or customers. Once identified, we can split the post into its language segments and extract translation pairs from this content. The second method considers web links (URLs) that users add as part of their post to point the reader to a video, article or website. If the same URL is shared by users of different languages, there is a chance they might give the same comment in their respective languages. We use a support vector machine (SVM) as a classifier to identify true translations among all candidate pairs. We collected additional translation pairs using both methods for the language pairs Spanish-English and Portuguese-English. Testing the collected data as additional training data for statistical machine translation on in-domain test sets resulted in very significant improvements of up to 5 BLEU.
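
The candidate generation behind the second method (posts in different languages sharing the same URL) might be sketched as follows. This is only an illustration under assumptions: the `(url, lang, text)` tuple layout, the function name, and the example posts are hypothetical, and the SVM filtering step described in the abstract is not shown.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(posts, src_lang, tgt_lang):
    """Pair every src_lang post with every tgt_lang post that shares its URL.

    posts is an iterable of (url, lang, text) tuples (an assumed layout).
    The result is a list of (source_text, target_text) candidates that a
    classifier would still need to filter for true translations.
    """
    by_url = defaultdict(lambda: defaultdict(list))
    for url, lang, text in posts:
        by_url[url][lang].append(text)

    pairs = []
    for by_lang in by_url.values():
        pairs.extend(product(by_lang.get(src_lang, []), by_lang.get(tgt_lang, [])))
    return pairs


# Toy posts (assumed): only the first URL is shared across both languages.
posts = [
    ("http://example.com/a", "es", "Gran video"),
    ("http://example.com/a", "en", "Great video"),
    ("http://example.com/b", "en", "Nice article"),
]
pairs = candidate_pairs(posts, "es", "en")
```

Taking the full cross product per URL keeps the sketch simple; it also shows why a classifier is needed afterwards, since most candidates generated this way will not be translations of each other.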

Repository KITopen

Tools for Collecting Speech Corpora via Mechanical-Turk

Author: Eck Matthias
Lane Ian
Rottmann Kay
Waibel Alex
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

To rapidly port speech applications to new languages, one of the most difficult tasks is the initial collection of sufficient speech corpora. State-of-the-art automatic speech recognition systems are typically trained on hundreds of hours of speech data. While pre-existing corpora do exist for major languages, a sufficient amount of quality speech data is not available for most world languages. While previous works have focused on the collection of translations and the transcription of audio via Mechanical-Turk mechanisms, in this paper we introduce two tools which enable the collection of speech data remotely. We then compare the quality of audio collected from paid part-time staff and unsupervised volunteers, and determine that basic user training is critical to obtain usable data.

Repository KITopen

Phrase Pair Rescoring with Term Weighting for Statistical Machine Translation

Author: Eck Matthias
Vogel Stephan
Waibel Alex
Zhao Bing
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

We propose to score phrase translation pairs for statistical machine translation using term weight based models. These models employ tf.idf to encode the weights of content and non-content words in phrase translation pairs. The translation probability is then modeled by similarity functions defined in a vector space. Two similarity functions are compared. Using these models in a statistical machine translation task shows significant improvements.
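
A minimal sketch of tf.idf weighting combined with a vector-space similarity is given below. It rests on several assumptions not stated in the abstract: cosine similarity stands in for the two (unnamed) similarity functions, source words are mapped into the target vocabulary through a hypothetical toy lexicon, and all names and data are illustrative only.

```python
import math
from collections import Counter


def idf_table(documents):
    """Inverse document frequency, log(N / df), for each word in the documents."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(doc.split()))
    return {word: math.log(n / count) for word, count in df.items()}


def tfidf_vector(words, idf):
    """Sparse tf.idf vector; words missing from the idf table get weight 0."""
    tf = Counter(words)
    return {word: count * idf.get(word, 0.0) for word, count in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy target-side corpus and a hypothetical source-to-target lexicon (assumed).
idf = idf_table(["the red house", "the green house", "the red car"])
lexicon = {"das": "the", "rote": "red", "haus": "house"}

source = tfidf_vector([lexicon[w] for w in "das rote haus".split()], idf)
score_good = cosine(source, tfidf_vector("the red house".split(), idf))
score_bad = cosine(source, tfidf_vector("the red car".split(), idf))
```

In this toy corpus the function word "the" occurs in every document, so its idf is zero and it contributes nothing to the score, while content words such as "house" and "car" decide which phrase pair scores higher.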

Repository KITopen

Cosmic ray transport and anisotropies

Author: Aartsen
Abbasi
Abdo
Ackermann
Appenzeller
Bell
Berezhko
Berkhuijsen
Beuermann
Biermann
Bignall
Blasi
Brunetti
Chandrasekhar
Compton
Desiati
Drury
Erlykin
Eun-Suk Seo
Everett
Ferrando
Ginzburg
Gopal-Krishna
Guillian
Hagihara
Hanasz
Julia Becker Tjus
Kardashev
Kolmogorov
Kraichnan
Lagage
Lazarian
Lovell
Malkov
Matthias Mandelartz
McComas
Obermeier
Peter L. Biermann
Prantzos
Pshirkov
Ptuskin
Redfield
Reid
Romanova
Sagdeev
Schlüter
Sedov
Snowden
Stanev
Teshima
van Eck
van Leeuwen
Völk
Wiebel-Sooth
Wielen
Yoon
Yüksel
Publication venue: IOP Publishing
Publication date: 05/06/2012

We show that the large-scale cosmic ray anisotropy at ~10 TeV can be explained by a modified Compton-Getting effect in the magnetized flow field of old supernova remnants. This approach suggests an optimum energy scale for detecting the anisotropy. Two key assumptions are that propagation is based on turbulence following a Kolmogorov law and that cosmic ray interactions are dominated by transport through stellar winds of the exploding stars. A prediction is that the amplitude is smaller at lower energies due to incomplete sampling of the velocity field and also smaller at higher energies due to smearing. Comment: 6 pages, 1 figure

arXiv.org e-Print Archive

Crossref

Differential diagnosis of laryngeal spindle cell carcinoma and inflammatory myofibroblastic tumor – report of two cases with similar morphology

Author: Eck Matthias
Hagen Rudolf
Höller Sylvia
Müller-Hermelink Hans Konrad
Scheich Matthias
Ströbel Philipp
Völker Hans-Ullrich
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Spindle cell tumors of the larynx are rare. In some cases, the dignity (benign or malignant nature) is difficult to determine. We report two cases of laryngeal spindle cell tumors. CASE PRESENTATION: Case 1 is a spindle cell carcinoma (SPC) in a 55-year-old male patient and case 2 an inflammatory myofibroblastic tumor (IMT) in a 34-year-old female patient. A comprehensive morphological and immunohistochemical analysis was done. Both tumors arose at the vocal folds. Magnified laryngoscopy showed polypoid tumors. After resection, conventional histological investigation revealed spindle cell lesions with similar morphology. We found ulceration, mild atypia, and myxoid stroma. Before immunohistochemistry, the dignity was uncertain. Immunohistochemical investigations led to the diagnosis of two distinct tumors with different biological behaviour. Both expressed vimentin. Furthermore, the SPC was positive for pan-cytokeratin AE1/3, CK5/6, and smooth-muscle actin, whereas the IMT reacted with antibodies against ALK-1 and EMA. The proliferation (Ki67) was up to 80% in SPC and 10% in IMT. Other stainings with antibodies against p53, p21, Cyclin D1, or Rb did not yield additional information. After resection, the patient with SPC has been free of disease for seven months. The IMT recurred three months after the first surgery, but no relapses were found eight months after repeat surgery. CONCLUSION: Differential diagnosis can be difficult without immunohistochemistry. Therefore, a comprehensive morphological and immunohistochemical analysis is necessary, but markers of cell cycle (apart from the assessment of proliferation) do not help.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central