Recommended from our members
The Use of yig-cha and chos-kyi-rnam-grangs in Computing Lexical Cohesion for Tibetan Topic Boundary Detection
To properly implement a simple Tibetan Information Retrieval (IR) system, segmentation of one form or another (n-gram, POS-tagging, dictionary substring matching, etc.) must be performed (see Hackett (2000b)). To take Tibetan indexing to a more sophisticated level, however, some form of topic detection must be employed. This paper reports the results of a pilot study on the application to Tibetan of one technique for topic boundary detection: Lexical Cohesion. The resources developed and deployed, the theoretical model used, and its potential applications are discussed.
An Entropy-based Assessment of the Unicode Encoding for Tibetan
This paper presents an analysis of the Unicode encoding scheme for Tibetan from the standpoint of morpheme entropy. We can speak of two levels of entropy in Tibetan: syllable entropy (a measure of the probability of the sequential occurrence of syllables), and morpheme entropy (a measure of the probability of the sequential occurrence of characters or morphemes), the latter being a measure of the redundancy of the language. Syllable entropy is a purely statistical calculation that is a function of the domain of the literature sampled, while morpheme entropy, we show, is relatively domain independent given a statistically significant sample. Morpheme entropy can be calculated statistically, though a theoretical upper bound can also be postulated based on language dependent morphology rules. This paper presents both theoretical and statistical estimates of the morpheme entropy for Tibetan, and explores the Tibetan Unicode encoding scheme in relation to data compression, and other issues analyzed in light of entropy-based language modeling
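The two entropy levels described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the function names `unigram_entropy`, `conditional_entropy`, and `syllables` are hypothetical, the estimators are plain maximum-likelihood estimates, and the only Tibetan-specific assumption is that syllables are delimited by the tsheg mark (U+0F0B).

```python
import math
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def syllables(text):
    """Split a Tibetan string into syllables on the tsheg (assumed delimiter)."""
    return [s for s in text.split(TSHEG) if s]


def unigram_entropy(symbols):
    """Entropy in bits per symbol, estimated from relative frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(symbols):
    """H(X_n | X_{n-1}) in bits, estimated from adjacent symbol pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    contexts = Counter(symbols[:-1])  # how often each symbol starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (prev, _cur), c in pairs.items():
        # p(prev, cur) * log2 p(cur | prev)
        h -= (c / total) * math.log2(c / contexts[prev])
    return h
```

Passing a list of syllables gives a syllable-level estimate, while passing the characters of a string (`list(text)`) gives a character-level one. Estimates of this kind are only meaningful on a large sample, which the sketch does not check.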
Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation
This paper presents what we believe to be the first reported work on Tibetan machine translation (MT). Of the three conceptually distinct components of an MT system (analysis, transfer, and generation), the first phase, consisting of POS tagging, has been successfully completed. The combination POS tagger / word-segmenter was manually constructed as a rule-based multi-tagger relying on the Wilson formulation of Tibetan grammar. Partial parsing was also performed in combination with POS-tag sequence disambiguation. The component was evaluated at the task of document indexing for Information Retrieval (IR). Preliminary analysis indicated slightly better (though statistically comparable) performance to n-gram based approaches at a known-item IR task. Although segmentation is application specific, error analysis placed segmentation accuracy at 99%; the accuracy of the POS tagger is also estimated at 99% based on IR error analysis and random sampling.
Document Translation for Cross-Language Text Retrieval at the University of Maryland
The University of Maryland participated in three TREC-6 tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was evaluation of a cross-language text retrieval technique based on fully automatic machine translation. The results show that approaches based on document translation can be approximately as effective as approaches based on query translation, but that additional work will be needed to develop a solid basis for choosing between the two in specific applications. Ad hoc and spoken document retrieval results are also presented
Mustard catch crop enhances denitrification in shallow groundwater beneath a spring barley field
The study was funded by Department of Agriculture and Food through the Research Stimulus Fund Programme (Grant RSF 06383) in collaboration with the Department of Civil, Structural & Environmental Engineering, Trinity College Dublin, Ireland. Peer-reviewed. Over-winter green cover crops have been reported to increase dissolved organic carbon (DOC) concentrations in groundwater, which can be used as an energy source for denitrifiers. This study investigates the impact of a mustard catch crop on in situ denitrification and nitrous oxide (N2O) emissions from an aquifer overlain by arable land. Denitrification rates and N2O-N/(N2O-N + N2-N) mole fractions were measured in situ with a push–pull method in shallow groundwater under a spring barley system in experimental plots with and without a mustard cover crop. The results suggest that a mustard cover crop could substantially enhance reduction of groundwater nitrate (NO3−-N) via denitrification without significantly increasing N2O emissions. Mean total denitrification (TDN) rates below mustard cover crop and no cover crop were 7.61 and 0.002 μg kg−1 d−1, respectively. Estimated N2O-N/(N2O-N + N2-N) ratios, being 0.001 and 1.0 below mustard cover crop and no cover crop respectively, indicate that denitrification below mustard cover crop reduces N2O to N2, unlike the plot with no cover crop. The observed enhanced denitrification under the mustard cover crop may result from the higher groundwater DOC under mustard cover crop (1.53 mg L−1) than no cover crop (0.90 mg L−1) being added by the root exudates and root masses of mustard. This study gives insights into the missing piece in agricultural nitrogen (N) balance and groundwater derived N2O emissions under arable land and thus helps minimise the uncertainty in agricultural N and N2O-N balances.
Re-engineering Public Education: Developing New Technologies in Teaching and Assessment
In the nineteen-nineties, I was principal of a middle school when the accountability issue burst into prominence in the state of Alabama in the form of norm-referenced testing as the main tool to evaluate school performance. Designed by well-meaning educators to meet the requirements of Alabama legislation, the accountability program in Alabama was developed to put some teeth into the curriculum. Schools and systems that performed poorly faced state takeover. The Alabama accountability issue was one face of a national movement predicated on the idea that the public schools in the United States have failed egregiously and that more stringent accountability standards will set expectations forcing teachers to do a better job teaching and students to do a better job learning (Houston, 2003). Schools and school systems across the United States were facing the same types of accountability standards and were being evaluated through student performance on standardized tests, criterion referenced tests, or a combination of the two
Meningococcal disease in children in Merseyside, England: a 31 year descriptive study
Meningococcal disease (MCD) is the leading infectious cause of death in early childhood in the United Kingdom, making it a public health priority. MCD most commonly presents as meningococcal meningitis (MM), septicaemia (MS), or as a combination of the two syndromes (MM/MS). We describe the changing epidemiology and clinical presentation of MCD, and explore associations with socioeconomic status and other risk factors. A hospital-based study of children admitted with MCD to a tertiary children's centre, Alder Hey Children's Foundation Trust, was undertaken between 1977 and 2007 (n = 1157). Demographics, clinical presentations, microbiological confirmation and measures of deprivation were described. The majority of cases occurred in the 1-4 year age group and there was a dramatic fall in serogroup C cases observed with the introduction of the meningococcal C conjugate (MCC) vaccine. The proportion of MS cases increased over the study period, from 11% in the first quarter to 35% in the final quarter. Presentation with MS (compared to MM) and serogroup C disease (compared to serogroup B) were demonstrated to be independent risk factors for mortality, with odds ratios of 3.5 (95% CI 1.18 to 10.08) and 2.18 (95% CI 1.26 to 3.80) respectively. Cases admitted to Alder Hey were from a relatively more deprived population (mean Townsend score 1.25, 95% CI 1.09 to 1.41) than the Merseyside reference population. Our findings represent one of the largest single-centre studies of MCD. The presentation of MS is confirmed to be a risk factor for mortality from MCD. Our study supports the association between social deprivation and MCD.
Plant exudates may stabilize or weaken soil depending on species, origin and time
We hypothesized that plant exudates could either gel or disperse soil depending on their chemical characteristics. Barley (Hordeum vulgare L. cv. Optic) and maize (Zea mays L. cv. Freya) root exudates were collected using an aerated hydroponic method and compared to chia (Salvia hispanica L.) seed exudate, a commonly used root exudate analogue. Sandy loam soil passed through a 500-μm mesh was treated with each exudate at a concentration of 4.6 mg exudate g−1 dry soil. Two sets of soil samples were prepared. One set of treated soil samples was maintained at 4 °C to suppress microbial processes. To characterize the effect of decomposition, the second set of samples was incubated at 16 °C for 2 weeks at −30 kPa matric potential. Gas chromatography–mass spectrometry (GC–MS) analysis of the exudates found that barley had the largest organic acid content and chia the largest content of sugars (polysaccharide-derived or free), and maize was in between barley and chia. Yield stress of amended soil samples was measured by an oscillatory strain sweep test with a cone plate rheometer. When microbial decomposition was suppressed at 4 °C, yield stress increased 20-fold for chia seed exudate and two-fold for maize root exudate compared to the control, whereas for barley root exudate it decreased to half. The yield stress after 2 weeks of incubation compared to soil with suppressed microbial decomposition increased by 85% for barley root exudate, but for chia and maize it decreased by 87% and 54%, respectively. Barley root exudation might therefore disperse soil and this could facilitate nutrient release. The maize root and chia seed exudates gelled soil, which could create a more stable soil structure around roots or seeds.
