54 research outputs found
Constructing a poor man’s wordnet in a resource-rich world
In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.
Multilingual Question/Answering: the DIOGENE System
This paper presents the DIOGENE question/answering system developed at ITC-Irst. The system is based on a rather standard architecture which includes three components for question processing, search and answer extraction. Linguistic processing strongly relies on MULTIWORDNET, an extended version of th
African Wordnet: Sesotho sa Leboa 1.0
Developed using the expand model with Princeton WordNet 2.0 as basis. Each wordnet contains synsets with at least the following fields:
Word form (lemma; synonym)
ID (linking to the Princeton Wordnet 2.0)
Part of speech
Domain
SUMO/MILO classification

Additional data may include the following fields:
Usage example(s)
Definition
Hypernym
Hyponym
Stamp
Notes
Non-lexicalisation

Please see https://africanwordnet.wordpress.com/ for all details on the projec
