120 research outputs found

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Author: Arcan Mihael
Chakravarthi Bharathi Raja
McCrae John P.
Publication venue: OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019)
Publication date: 01/01/2019

Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in terms of the BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.

ZENODO

DROPS Dagstuhl Research Online Publication Server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Bilingual lexicon induction across orthographically-distinct under-resourced Dravidian languages

Author: Arcan Mihael
Chakravarthi Bharathi Raja
McCrae John P.
McGuinness Kevin
O'Connor Noel E.
Rajasekaran Navaneethan
Publication venue: International Committee on Computational Linguistics (ICCL)
Publication date: 13/12/2020

Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to inducing them leverage pretrained monolingual word embeddings using supervised or semi-supervised approaches. However, these approaches require cross-lingual information such as seed dictionaries to train the model and find a linear transformation between the word embedding spaces. Especially in the case of low-resourced languages, seed dictionaries are not readily available, and as such, these methods produce extremely weak results on these languages. In this work, we focus on the Dravidian languages, namely Tamil, Telugu, Kannada, and Malayalam, which are even more challenging as they are written in unique scripts. To take advantage of orthographic information and cognates in these languages, we bring the related languages into a single script. Previous approaches have used linguistically sub-optimal measures such as the Levenshtein edit distance to detect cognates, whereas we demonstrate that the longest common subsequence is linguistically more sound and improves the performance of bilingual lexicon induction. We show that our approach can increase the accuracy of bilingual lexicon induction methods on these languages many times, making bilingual lexicon induction approaches feasible for such under-resourced languages.

Irish Universities

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

DCU Online Research Access Service
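
The abstract above contrasts Levenshtein edit distance with the longest common subsequence (LCS) as a cognate-detection measure once related languages share a single script. The following is a minimal sketch of the two measures, not the authors' implementation. The function names, the normalisation by the longer string's length, and the romanized example strings are all assumptions made for illustration; the abstract does not specify them.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    """LCS length divided by the longer length (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def levenshtein_similarity(a: str, b: str) -> float:
    """1 minus edit distance over the longer length (assumed normalisation)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical romanized word forms, used only to exercise the functions.
    pairs = [("kan", "kannu"), ("kitten", "sitting")]
    for x, y in pairs:
        print(x, y, lcs_ratio(x, y), levenshtein_similarity(x, y))
```

In a lexicon-induction pipeline, a score like `lcs_ratio` could be thresholded to propose candidate cognate pairs after transliteration into a common script; the threshold and how the score is combined with embedding similarity would need to be taken from the paper itself.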

Substance use and dietary practices among students attending alternative high schools: results from a pilot study

Background: Substance use and poor dietary practices are prevalent among adolescents. The purpose of this study was to examine frequency of substance use and associations between cigarette, alcohol and marijuana use and selected dietary practices, such as consumption of sugar-sweetened beverages, high-fat foods, fruits and vegetables, and frequency of fast food restaurant use among alternative high school students. Associations between multi-substance use and the same dietary practices were also examined. Methods: A convenience sample of adolescents (n = 145; 61% minority, 52% male) attending six alternative high schools in the St Paul/Minneapolis metropolitan area completed baseline surveys. Students were participants in the Team COOL (Controlling Overweight and Obesity for Life) pilot study, a group randomized obesity prevention pilot trial. Mixed model multivariate analysis procedures were used to assess associations of interest. Results: Daily cigarette smoking was reported by 36% of students. Cigarette smoking was positively associated with consumption of regular soda (p = 0.019), high-fat foods (p = 0.037), and fast food restaurant use (p = 0.002). Alcohol (p = 0.005) and marijuana use (p = 0.035) were positively associated with high-fat food intake. With increasing numbers of substances, a positive trend was observed in high-fat food intake (p = 0.0003). There were no significant associations between substance use and fruit and vegetable intake. Conclusions: Alternative high school students who use individual substances as well as multiple substances may be at high risk of unhealthful dietary practices. Comprehensive health interventions in alternative high schools have the potential of reducing health-compromising behaviors that are prevalent among this group of students. This study adds to the limited research examining substance use and diet among at-risk youth.
Trial registration number: ClinicalTrials.gov NCT01315743 (http://www.clinicaltrials.gov/ct2/show/NCT01315743)

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Adsorption of BSA (Bovine Serum Albuminum) and lysozyme on poly(vinyl acetate) particles

Crossref

A damage-based temperature-dependent model for ductile fracture with finite strains and configurational forces

Author: A Gupta
A Mielke
B Bourdin
BE Amstutz
C Chen
C Truesdell
EH Lee
F Brezzi
G Allaire
GA Maugin
HA Sosa
I Doghri
J Bonet
J Korelc
J Lemaitre
J Lubliner
J Oliver
JC Simo
L Xue
M Arcan
M Jirásek
M Klisinski
MA Crisfield
MA Sutton
ME Gurtin
N. Van Goethem
P Areias
P Mattila
P. Areias
S Conti
S Li
S Nemat-Nasser
T Belytschko
TM Maccagno
Y Bai
ZP Bažant
Publication venue: 'Springer Science and Business Media LLC'

Crossref

NUIG at TIAD: Combining unsupervised NLP and graph metrics for translation inference

Author: Arcan Mihael
McCrae John P.
Publication venue: European Language Resources Association (ELRA)
Publication date: 11/05/2020

In this paper, we present the NUIG system at the TIAD shared task. This system includes graph-based metrics calculated using novel algorithms, with an unsupervised document embedding tool called ONETA and an unsupervised multi-way neural machine translation method. The results are an improvement over our previous system and produce the highest precision among all systems in the task as well as very competitive F-Measure results. Incorporating features from other systems should be easy in the framework we describe in this paper, suggesting this could very easily be extended to an even stronger result. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 P2, co-funded by the European Regional Development Fund, as well as by the H2020 project Prêt-à-LLOD under Grant Agreement number 825182. Peer-reviewed.

Irish Universities

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

University of Galway Research Repository

Linking knowledge graphs across languages with semantic similarity and machine translation

Author: Arcan Mihael
Buitelaar Paul
McCrae John P.
Publication venue: MLP 2017
Publication date: 23/01/2019

Knowledge graphs and ontologies underpin many natural language processing applications, and to apply these to new languages, these knowledge graphs must be translated. Up until now, this has been achieved either by direct label translation or by cross-lingual alignment, which matches the concepts in the graph to another graph in the target languages. We show that these two approaches can, in fact, be combined and that the combination of machine translation and cross-lingual alignment can obtain improved results for translating a biomedical ontology from English to German. This work was supported by the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight). Non-peer-reviewed.

University of Galway Research Repository