Lexical typology through similarity semantics: Toward a semantic map of motion verbs
This paper discusses a multidimensional probabilistic semantic map of lexical motion verb stems based on data collected from parallel texts (viz. translations of the Gospel according to Mark) for 100 languages from all continents. The crosslinguistic diversity of lexical semantics in motion verbs is illustrated in detail for the domain of 'go', 'come', and 'arrive' type contexts. It is argued that the theoretical bases underlying probabilistic semantic maps from exemplar data are the isomorphism hypothesis (given any two meanings and their corresponding forms in any particular language, more similar meanings are more likely to be expressed by the same form in any language), similarity semantics (similarity is more basic than identity), and exemplar semantics (exemplar meaning is more fundamental than abstract concepts).
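As a rough illustration of how such a map can be derived from parallel-text exemplars, the following sketch (not the authors' actual pipeline; all data and names are hypothetical) scores pairs of contexts by how many languages express both with the same verb stem, and projects the resulting similarity matrix to two dimensions with classical MDS.

```python
# Hypothetical sketch: a similarity-based semantic map from parallel-text exemplar data.
# Toy data, numpy only; not the authors' actual pipeline.
import numpy as np

# exemplars[language][context] = verb stem used in that context
exemplars = {
    "lang_A": {"mark_1_9": "go",    "mark_5_23": "come",  "mark_6_53": "arrive"},
    "lang_B": {"mark_1_9": "ir",    "mark_5_23": "ir",    "mark_6_53": "llegar"},
    "lang_C": {"mark_1_9": "aller", "mark_5_23": "venir", "mark_6_53": "arriver"},
}
contexts = ["mark_1_9", "mark_5_23", "mark_6_53"]

def similarity(c1, c2):
    """Proportion of languages expressing both contexts with the same stem
    (the isomorphism hypothesis: more similar meanings share forms more often)."""
    same = total = 0
    for forms in exemplars.values():
        if forms.get(c1) and forms.get(c2):
            total += 1
            same += forms[c1] == forms[c2]
    return same / total if total else 0.0

n = len(contexts)
S = np.array([[similarity(a, b) for b in contexts] for a in contexts])
D = 1.0 - S                           # dissimilarities between contexts

# Classical (Torgerson) MDS: double-center the squared distances, keep the top 2 axes.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]
coords = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

for ctx, (x, y) in zip(contexts, coords):
    print(f"{ctx}: ({x:+.2f}, {y:+.2f})")
```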
Languoid, Doculect, and Glossonym: Formalizing the Notion 'Language'
It is perfectly reasonable for laypeople and non-linguistic scholars to use names for languages without reflecting on the proper definition of the objects referred to by these names. Simply using a name like English or Witotoan suffices as an informal communicative designation for a particular language or a language group. However, for the linguistics community, which is by definition occupied with the details of languages and language variation, it is somewhat bizarre that there does not exist a proper technical apparatus to talk about intricate differences in opinion about the precise sense of a name like English or Witotoan when used in academic discussion. We propose three interrelated concepts—LANGUOID, DOCULECT, and GLOSSONYM—which provide a principled basis for discussion of different points of view about key issues, such as whether two varieties should be associated with the same language, and allow for a precise description of what exactly is being claimed by the use of a given genealogical or areal group name. The framework they provide should be especially useful to researchers who work on underdescribed languages where basic issues of classification remain unresolved.
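One way to read the proposal is as a small data model. The sketch below is our own illustration (class and field names are hypothetical, not taken from the paper): a doculect is a variety as attested in one specific resource, a languoid is any dialect-, language-, or group-level node defined by the doculects and sub-languoids it contains, and a glossonym is a name tied to a particular languoid.

```python
# Illustrative data model only; names and fields are hypothetical, not from the paper.
from dataclasses import dataclass, field

@dataclass
class Doculect:
    """A language variety as documented in one specific resource."""
    description: str          # e.g. a grammar, word list, or corpus
    variety: str              # the variety that resource describes

@dataclass
class Languoid:
    """Any dialect-, language-, or family-level grouping, defined by what it contains."""
    doculects: list[Doculect] = field(default_factory=list)
    sublanguoids: list["Languoid"] = field(default_factory=list)

@dataclass
class Glossonym:
    """A name, tied to the languoid it is claimed to denote."""
    name: str
    denotes: Languoid

# Two researchers may attach the glossonym "English" to different languoids; the
# disagreement becomes explicit in which doculects each languoid includes.
oed = Doculect(description="Oxford English Dictionary", variety="Standard British English")
scots_grammar = Doculect(description="A grammar of Scots", variety="Scots")

claim_a = Glossonym(name="English", denotes=Languoid(doculects=[oed]))
claim_b = Glossonym(name="English", denotes=Languoid(doculects=[oed, scots_grammar]))
print(claim_a.denotes == claim_b.denotes)   # False: the two uses of the name differ
```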
The entropy of words: Learnability and expressivity across more than 1000 languages
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and language sciences more generally. Information theory gives us the tools to measure precisely the average amount of choice associated with words: the word entropy. Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style and estimation method, as well as non-independence of words in co-text. We present two main findings. Firstly, word entropies display relatively narrow, unimodal distributions. There is no language in our sample with a unigram entropy of less than six bits/word. We argue that this is in line with information-theoretic models of communication: languages are held in a narrow range by two fundamental pressures, word learnability and word expressivity, with a potential bias towards expressivity. Secondly, there is a strong linear relationship between unigram entropies and entropy rates. The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word. In other words, knowing the preceding text reduces the uncertainty of words by roughly the same amount across languages of the world.
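To make the two quantities concrete, here is a minimal sketch using plain plug-in (maximum-likelihood) estimates on a toy token list; the study itself uses large parallel corpora and bias-corrected estimators, so this is illustrative only.

```python
# Minimal sketch: unigram word entropy vs. entropy conditioned on the preceding word,
# using plug-in estimates on a toy token list (not the corrected estimators of the paper).
import math
from collections import Counter

tokens = "the cat sat on the mat and the dog sat on the rug".split()

def entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Unigram word entropy: average uncertainty per word, ignoring co-text.
h_unigram = entropy(Counter(tokens))

# Conditional entropy H(word | previous word), a crude stand-in for the entropy rate:
# H(W_prev, W) - H(W_prev).
h_joint = entropy(Counter(zip(tokens, tokens[1:])))
h_context = entropy(Counter(tokens[:-1]))
h_conditional = h_joint - h_context

print(f"unigram entropy:        {h_unigram:.2f} bits/word")
print(f"conditional entropy:    {h_conditional:.2f} bits/word")
print(f"reduction from co-text: {h_unigram - h_conditional:.2f} bits/word")
```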
The Unicode cookbook for linguists: Managing writing systems using orthography profiles
This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with a consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus we bring to light common, but not always transparent, pitfalls which researchers face when working with Unicode and IPA. Having identified and overcome the pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools to work with languages using orthography profiles that describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research.
This book is a prime example of open publishing as envisioned by Language Science Press: it is open access, comes with accompanying open-source software, and has open peer review, versioning and so on.
The book is continuously being improved. You can follow the development on https://github.com/unicode-cookbook/cookbook/releases/latest
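To make the idea of an orthography profile concrete, here is a small self-contained sketch of greedy longest-match grapheme segmentation driven by a profile. It is not the authors' published Python/R tooling, and the profile entries and example word are illustrative.

```python
# Self-contained illustration of orthography-profile-based segmentation: greedy
# longest-match tokenization of a string into the grapheme clusters listed in a
# profile. Entries and the example word are made up; not the authors' tooling.
import unicodedata

# A profile maps document-specific grapheme clusters to a target transliteration (here IPA).
profile = {
    "tsh": "tʃʰ",
    "ng":  "ŋ",
    "a":   "a",
    "u":   "u",
    "n":   "n",
}

def segment(word, profile):
    """Split `word` into profile graphemes, always preferring the longest match."""
    word = unicodedata.normalize("NFC", word)          # normalize before matching
    graphemes, i, max_len = [], 0, max(map(len, profile))
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in profile:
                graphemes.append(chunk)
                i += length
                break
        else:                                          # no profile entry matched: flag it
            graphemes.append(f"<?{word[i]}?>")
            i += 1
    return graphemes

segments = segment("tshangu", profile)
print(" ".join(segments))                              # tsh a ng u
print(" ".join(profile.get(g, g) for g in segments))   # tʃʰ a ŋ u
```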
 
- …
