120 research outputs found
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages
Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in terms of the BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.
Bilingual lexicon induction across orthographically-distinct under-resourced Dravidian languages
Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to inducing them leverage pretrained monolingual word embeddings using supervised or semi-supervised methods. However, these approaches require cross-lingual information such as seed dictionaries to train the model and find a linear transformation between the word embedding spaces. Especially in the case of low-resourced languages, seed dictionaries are not readily available, and as such, these methods produce extremely weak results on these languages. In this work, we focus on the Dravidian languages, namely Tamil, Telugu, Kannada, and Malayalam, which are even more challenging as they are written in unique scripts. To take advantage of orthographic information and cognates in these languages, we bring the related languages into a single script. Previous approaches have used linguistically sub-optimal measures such as the Levenshtein edit distance to detect cognates, whereas we demonstrate that the longest common subsequence is linguistically more sound and improves the performance of bilingual lexicon induction. We show that our approach can increase the accuracy of bilingual lexicon induction methods on these languages many times over, making bilingual lexicon induction approaches feasible for such under-resourced languages.
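The two string measures contrasted in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the normalization by the longer string's length, and the sample strings are all assumptions made for the example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    """LCS similarity in [0, 1]. Dividing by the longer length is an
    assumption for this sketch, not a detail taken from the paper."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical strings, not data from the paper.
    w1, w2 = "kitten", "sitting"
    print(lcs_length(w1, w2), lcs_ratio(w1, w2), levenshtein(w1, w2))
```

In a cognate-detection setting, such a score would be computed on word pairs after both languages have been brought into a single script. How the score is thresholded or combined with embedding similarity is not specified in the abstract.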
Substance use and dietary practices among students attending alternative high schools: results from a pilot study
Background: Substance use and poor dietary practices are prevalent among adolescents. The purpose of this study was to examine frequency of substance use and associations between cigarette, alcohol and marijuana use and selected dietary practices, such as sugar-sweetened beverages, high-fat foods, fruits and vegetables, and frequency of fast food restaurant use among alternative high school students. Associations between multi-substance use and the same dietary practices were also examined. Methods: A convenience sample of adolescents (n = 145; 61% minority, 52% male) attending six alternative high schools in the St Paul/Minneapolis metropolitan area completed baseline surveys. Students were participants in the Team COOL (Controlling Overweight and Obesity for Life) pilot study, a group randomized obesity prevention pilot trial. Mixed model multivariate analyses procedures were used to assess associations of interest. Results: Daily cigarette smoking was reported by 36% of students. Cigarette smoking was positively associated with consumption of regular soda (p = 0.019), high-fat foods (p = 0.037), and fast food restaurant use (p = 0.002). Alcohol (p = 0.005) and marijuana use (p = 0.035) were positively associated with high-fat food intake. With increasing numbers of substances, a positive trend was observed in high-fat food intake (p = 0.0003). There were no significant associations between substance use and fruit and vegetable intake. Conclusions: Alternative high school students who use individual substances as well as multiple substances may be at high risk of unhealthful dietary practices. Comprehensive health interventions in alternative high schools have the potential of reducing health-compromising behaviors that are prevalent among this group of students. This study adds to the limited research examining substance use and diet among at-risk youth. Trial registration number: ClinicalTrials.gov NCT01315743 (http://www.clinicaltrials.gov/ct2/show/NCT01315743)
Adsorption of BSA (Bovine Serum Albuminum) and lysozyme on poly(vinyl acetate) particles
A damage-based temperature-dependent model for ductile fracture with finite strains and configurational forces
NUIG at TIAD: Combining unsupervised NLP and graph metrics for translation inference
In this paper, we present the NUIG system at the TIAD shared task. This system includes graph-based metrics calculated using novel algorithms, with an unsupervised document embedding tool called ONETA and an unsupervised multi-way neural machine translation method. The results are an improvement over our previous system and produce the highest precision among all systems in the task as well as very competitive F-Measure results. Incorporating features from other systems should be easy in the framework we describe in this paper, suggesting that it could readily be extended to achieve an even stronger result. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 P2, co-funded by the European Regional Development Fund, as well as by the H2020 project Prêt-à-LLOD under Grant Agreement number 825182. (peer-reviewed)
Linking knowledge graphs across languages with semantic similarity and machine translation
Knowledge graphs and ontologies underpin many natural language processing applications, and to apply these to new languages, these knowledge graphs must be translated. Up until now, this has been achieved either by direct label translation or by cross-lingual alignment, which matches the concepts in the graph to another graph in the target languages. We show that these two approaches can, in fact, be combined and that the combination of machine translation and cross-lingual alignment can obtain improved results for translating a biomedical ontology from English to German. This work was supported by the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight). (non-peer-reviewed)
