
    Constructing a poor man’s wordnet in a resource-rich world

    In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.

    Making use of iron sensors for environmental applications

    New sensors based on iron(II) coordination complexes are discussed in detail.

    Taking corpus variability into account in keyword analysis

    Most studies that make use of keyword analysis rely on the log-likelihood or the chi-square test to extract words that are particularly characteristic of a corpus (e.g. Scott & Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2007). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes & Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. the t-test, Wilcoxon's rank-sum test) but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that their statistical added value has rarely been discussed in terms of 'linguistic' added value. To the authors' knowledge, there is not a single study comparing keywords extracted by means of different measures. In our presentation, we will report on a follow-up study to Paquot (2007), which made use of the log-likelihood and measures of range and dispersion to extract academic words and design a productively-oriented academic word list. We make use of the log-likelihood, the t-test and Wilcoxon's rank-sum test in turn to compare the academic and the fiction sub-corpora of the 'British National Corpus' and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. number of keywords extracted by each measure, percentage of keywords that are shared in the three lists, frequency and distribution of academic keywords in the two corpora) and explore the specificities of the three statistical measures. We also assess the advantages and disadvantages of these measures for the design of an academic wordlist.
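
The contrast drawn in this abstract, between keyness computed from absolute frequencies and tests that see per-text variation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `log_likelihood` and `welch_t` are hypothetical, the log-likelihood is the usual two-corpus G2 formulation, Welch's variant stands in for the t-test (the abstract does not say which variant was used), and the toy counts are invented purely for illustration.

```python
import math
from statistics import mean, variance


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-corpus log-likelihood (G2) keyness from absolute frequencies."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def welch_t(rel_freqs_a, rel_freqs_b):
    """Welch's t statistic over per-text relative frequencies."""
    se_a = variance(rel_freqs_a) / len(rel_freqs_a)
    se_b = variance(rel_freqs_b) / len(rel_freqs_b)
    return (mean(rel_freqs_a) - mean(rel_freqs_b)) / math.sqrt(se_a + se_b)


# Invented toy data: five texts of 1,000 words in each corpus.
# In corpus A the word occurs 100 times, but only in a single text.
counts_a = [100, 0, 0, 0, 0]
counts_b = [2, 2, 2, 2, 2]
text_len = 1000

g2 = log_likelihood(sum(counts_a), sum(counts_b),
                    text_len * len(counts_a), text_len * len(counts_b))
t = welch_t([c / text_len for c in counts_a],
            [c / text_len for c in counts_b])
print(f"G2 = {g2:.2f}, Welch t = {t:.2f}")
```

With these toy counts the log-likelihood lies far above the 3.84 threshold commonly used at p < 0.05, while the t statistic stays small, because the t-test works on the per-text values and so registers that the word is concentrated in one text.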

    Between lexis and discourse: a cross-register study of connectives of contrast

    Most reference grammars of English (e.g. Quirk et al. 1972; Quirk et al. 1985; Leech & Svartvik 1994; Halliday & Matthiessen 2004) tend to make very general statements about adverbial connective placement in English, paying relatively little attention to variables such as lexis or register. For example, in Quirk et al. (1972), we find the claim that “the normal position for most conjuncts is I [initial]. […] M [medial] positions are rare for most conjuncts, and E [end] rarer still” (ibid.: 526-7). Similarly, but in a systemic-functional framework, Halliday & Matthiessen describe adverbial connectives as “what we might call characteristically thematic […]. They are natural Themes” (2004: 83). Yet, as Biber et al.’s (1999: 890-2) brief corpus-based description of adverbial connective placement suggests, connective placement may vary according to both lexical and stylistic factors.

    The present study investigates 10 adverbial connectives of contrast across three language registers, with a view to assessing the impact of lexis and register on their placement and discourse functions. The corpus is made up of three subcomponents: c. 1.4 million words from Europarl, a corpus of transcribed debates from the European Parliament; the English subpart of the Mult-ed corpus of quality paper editorials (c. 1 million words); and the English component of the KIAP corpus, made up of research articles from three different disciplines (c. 1.3 million words; see Fløttum et al. 2006). The corpus search focuses on the most frequent connectives from a list of 28 adverbial connectives, after extraction from the corpus via WordSmith Tools 6 (Scott 2012), and manual disambiguation in context.

    The study is grounded in the framework of Systemic Functional Linguistics (SFL), and relies on a classification of position which identifies three rhematic positions in addition to the usual thematic positions identified in SFL (see Halliday & Matthiessen 2004), thus making it possible to provide a detailed account of the placement of connectives occurring after the topical theme (see Dupont, in press). Preliminary results reveal that both the frequency and positioning of the connectives investigated vary significantly across the three registers. Parliamentary debates were found to favour thematic positions for adverbial connectives of contrast, as in (1), as opposed to the editorials, which displayed a marked tendency to use connectives rhematically, as in (2). The research articles were found to stand in between these two tendencies.

    (1) However, while short-term food aid is vital to respond to emergencies (…), EU food aid policy must work towards long-term security in food supply (Europarl).
    (2) Tony Blair, however, has been persuaded by the Home Secretary that identity cards might be the answer to the Government's own identity crisis (Mult-ed).

    A focus on the placement patterns of each individual connective revealed that connectives also seem to display fairly idiosyncratic placement patterns. The results highlighted two main types of placement profiles: while some connectives, such as instead or though, displayed very stable placement profiles across registers, other connectives, such as on the other hand and however, exhibited fairly variable patterns. The study thus provides evidence of both item- and register-related variation. A more qualitative analysis of the results revealed that connective placement frequently goes hand in hand with specific discourse effects pertaining to information structure. More particularly, connectives in rhematic positions were found to fulfil discourse functions, such as focusing attention on the theme or some element within the rheme and partitioning given and new information, in addition to their purely connective function (see also Altenberg 2006; Lenker 2011).

    References
    Altenberg, Bengt. 2006. The function of adverbial connectors in second initial position in English and Swedish. In Karin Aijmer and Anne-Marie Simon-Vandenbergen (eds.) Pragmatic Markers in Contrast. Oxford: Elsevier, pp. 11-37.
    Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan. 1999. Longman Grammar of Spoken and Written English. London: Longman.
    Cartoni, Bruno and Thomas Meyer. 2012. Extracting directional and comparable corpora from a multilingual corpus for translation studies. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, May 2012.
    Dupont, Maïté. In press. Word order in English and French. The position of English and French adverbial connectors of contrast. English Text Construction 8(1).
    Fløttum, Kjersti, Trine Dahl and Torodd Kinn. 2006. Academic Voices. Across Languages and Disciplines. Amsterdam and Philadelphia: John Benjamins.
    Halliday, M.A.K. and Christian M.I.M. Matthiessen. 2004. An Introduction to Functional Grammar. London: Hodder Arnold.
    Leech, Geoffrey and Jan Svartvik. 1994. A Communicative Grammar of English. London: Longman.
    Lenker, Ursula. 2011. A focus on adverbial connectors: Connecting, partitioning and focusing attention in the history of English. In Anneli Meurman-Solin and Ursula Lenker (eds.) Connectives in Synchrony and Diachrony in European Languages. Helsinki: VARIENG. Available at: http://www.helsinki.fi/varieng/series/volumes/08/lenker/ (last accessed on 11/07/14).
    Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik. 1972. A Grammar of Contemporary English. London: Longman.
    Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. London: Longman.
    Scott, Mike. 2012. WordSmith Tools 6. Liverpool: Lexical Analysis Software.