26 research outputs found

DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition

Author: Baglini Rebekah
Enevoldsen Kenneth
Jessen Emil Trenckner
Publication date: 28/02/2024

Named entity recognition is one of the cornerstones of Danish NLP, essential for language technology applications within both industry and research. However, Danish NER is inhibited by a lack of available datasets. As a consequence, no current models are capable of fine-grained named entity recognition, nor have they been evaluated for potential generalizability issues across datasets and domains. To alleviate these limitations, this paper introduces: 1) DANSK: a named entity dataset providing for high-granularity tagging as well as within-domain evaluation of models across a diverse set of domains; 2) DaCy 2.6.0 that includes three generalizable models with fine-grained annotation; and 3) an evaluation of current state-of-the-art models' ability to generalize across domains. The evaluation of existing and new models revealed notable performance discrepancies across domains, which should be addressed within the field. Shortcomings of the annotation quality of the dataset and its impact on model training and evaluation are also discussed. Despite these limitations, we advocate for the use of the new dataset DANSK alongside further work on the generalizability within Danish NER.

arXiv.org e-Print Archive

Vocal markers of autism: Assessing the generalizability of machine learning models

Author: Bilenberg Niels
Cantio Cathriona
Fusaroli Riccardo
Grossman Ruth
Jepsen Jens Richardt Møllegaard
Jessen Emil Trenckner
Larsen Stine Nyhus
Mortensen Marie Damsgaard
Rybner Astrid
Simonsen Arndis
Weed Ethan
Publication date: 01/06/2022

Machine learning (ML) approaches show increasing promise in their ability to identify vocal markers of autism. Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected, for example, using a different speech task or in a different language. In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. We train promising published ML models of vocal markers of autism on novel cross-linguistic datasets following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We test the generalizability of the models by testing them on (i) different participants from the same study, performing the same task; (ii) the same participants, performing a different (but similar) task; (iii) a different study with participants speaking a different language, performing the same type of task. While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to different, though similar, tasks and not at all to new languages. The ML pipeline is openly shared. Generalizability of ML models of vocal markers of autism is an issue. We outline three recommendations for strategies researchers could take to be more explicit about generalizability and improve it in future studies. Lay Summary: Machine learning approaches promise to be able to identify autism from voice only. These models underestimate how diverse the contexts in which we speak are, how diverse the languages used are and how diverse autistic voices are. Machine learning approaches need to be more careful in defining their limits and generalizability.

Crossref

Syddansk Universitets Forskerportal

Good and Evil in Indian Buddhism: The Five Sins of Immediate Retribution

Author: Banerjee
Black
Bloomfield
Eimer
Gómez
Hubbard
Jamison
Jamison
Jonathan A. Silk
Liebenthal
Liebenthal
Malalasekera
Matsunaga
Nyanaponika
Renou
Schopen
Schopen
Silk
Snellgrove
Trenckner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Asian Studies

Crossref

Leiden University Scholarly Publications

Studies in Dhāraṇī Literature II: Pragmatics of Dhāraṇīs

Author: Alper
Aufrecht
Austin
Bendall
Bernhard
Bloomfield
Burgess
Bühler
Carston
Cohen
Cohen
Conze
Dahl
Davidson
Davidson
Dutt
Eltschinger
Fausbøll
Fukita
Fussman
Gaastra
Gorman
Grey
Harlan
Harrison
Harrison
Hidas
Hoernle
Hoffmann
Horn
Hultzsch
Ilieva
Iwamoto
Iwamoto
Iwamoto
Iwamoto
Jacobi
Jacobs
Keith
Keith
Kern
Kielhorn
Kirste
Kurumiya
Kāle
Lacôte
Lamotte
Lamotte
Lehmann
Leumann
Lüders
Macdonnell
Manné
Marshall
Matsuda
Matsumura
Mimaki
Mirashi
Mukhopadyaya
Müller
Nobel
Nyáyaratna
Oldenberg
Olivelle
Olivelle
Parpola
Pischel
Pradhan
Pāṇḍeya
Rhys Davids
Ronald M. Davidson
Samten
Sarup
Sastri
Sastri
Searle
Sehgal
Shanon
Sharma
Shastri
Singh
Skilling
Skilling
Skilling
Staal
Sāṃtavalekara
Takubo
Tambiah
Tatia
Traugott
Trenckner
Trenckner
Tripathi
Vadantavagisa
Vaidya
Van Fintel
Van Nooten
Vira
Von Hinüber
Von Hinüber
Von Hinüber
Wakahara
Wayman
Wayman
Weber
Weber
Wessels-Mevissen
White
Witzel
Wogihara
Yamada
Yamada
Yuyama
Āpte
Śāstri
Śāstri
Publication venue: DigitalCommons@Fairfield
Publication date: 01/02/2014

This article is one of a series that reassesses the dhāraṇī texts of Mahāyāna Buddhism. The article seeks to examine dhāraṇī texts by using the linguistic tools of pragmatics, especially historical pragmatics, to assist the understanding of their statements. Rather than the meaning of the term dhāraṇī as a subject term, the domain of truth-conditional semantics, this paper examines statements in texts labelled dhāraṇī. Pragmatics examines meaning in context, and the categories of speech acts developed by Searle have been especially helpful in mapping out differences within such texts and the formalization of statements across texts. The grammaticalization of specific speech elements, especially interjections, in the context of mantra-dhāraṇīs is also discussed.

Crossref

Fairfield University: DigitalCommons@Fairfield