199 research outputs found

    Calculating the random guess scores of multiple-response and matching test items

    For achievement tests, the guess score is often used as a baseline for the lowest possible grade in score-to-grade transformations and in setting cut scores. For test item types such as multiple-response, matching and drag-and-drop, determining the guess score requires more elaborate calculations than the more straightforward calculation of the guess score for True-False and multiple-choice test item formats. For various variants of multiple-response and matching types with respect to dichotomous and polytomous scoring, methods for determining the guess score are presented and illustrated with practical applications. The implications for theory and practice are discussed.
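To make the idea of a guess score concrete, here is a minimal sketch of the simplest textbook cases. It is not the paper's method, which the abstract does not spell out. The function names are hypothetical, and each function rests on an assumption about how a blind guesser behaves: one option picked uniformly at random for multiple choice, a random subset of the known key size for multiple-response, and a random one-to-one assignment for matching.

```python
from fractions import Fraction
from math import comb, factorial


def guess_score_multiple_choice(n_options: int) -> Fraction:
    """Expected score on a one-point item with a single keyed option,
    assuming one option is picked uniformly at random."""
    return Fraction(1, n_options)


def guess_score_multiple_response(n_options: int, n_keyed: int) -> Fraction:
    """Expected score under dichotomous (all-or-nothing) scoring.

    Assumption: the examinee knows how many options are keyed and picks
    a random subset of exactly that size.
    """
    return Fraction(1, comb(n_options, n_keyed))


def guess_score_matching_dichotomous(n_pairs: int) -> Fraction:
    """All-or-nothing scoring of a one-to-one matching item, assuming a
    uniformly random assignment (one of n_pairs! permutations)."""
    return Fraction(1, factorial(n_pairs))


def guess_score_matching_polytomous(n_pairs: int) -> Fraction:
    """One point per correct match. Each position is correct with
    probability 1/n_pairs, so by linearity of expectation the expected
    number of correct matches is n_pairs * (1/n_pairs)."""
    return n_pairs * Fraction(1, n_pairs)


if __name__ == "__main__":
    print(guess_score_multiple_choice(4))
    print(guess_score_multiple_response(5, 2))
    print(guess_score_matching_dichotomous(4))
    print(guess_score_matching_polytomous(4))
```

Exact fractions are used so that guess scores for several items can be summed without rounding error. Other guessing assumptions (for example, an unknown number of keyed options) would give different values.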

    The rating reliability calculator

    BACKGROUND: Rating scales form an important means of gathering evaluation data. Since important decisions are often based on these evaluations, determining the reliability of rating data can be critical. Most commonly used methods of estimating reliability require a complete set of ratings, i.e., every subject being rated must be rated by each judge. Over fifty years ago Ebel described an algorithm for estimating the reliability of ratings based on incomplete data. While his article has been widely cited over the years, software based on the algorithm is not readily available. This paper describes an easy-to-use Web-based utility for estimating the reliability of ratings based on incomplete data using Ebel's algorithm. METHODS: The program is available for public use on our server and the source code is freely available under the GNU General Public License. The utility is written in PHP, a common open source embedded scripting language. The rating data can be entered in a convenient format on the user's personal computer, and the program will upload them to the server for calculating the reliability and other statistics describing the ratings. RESULTS: When the program is run it displays the reliability, the number of subjects rated, the harmonic mean number of judges rating each subject, and the mean and standard deviation of the averaged ratings per subject. The program also displays the mean, standard deviation and number of ratings for each subject rated. Additionally, the program will estimate the reliability of an average of a number of ratings for each subject via the Spearman-Brown prophecy formula. CONCLUSION: This simple web-based program provides a convenient means of estimating the reliability of rating data without the need to conduct special studies in order to provide complete rating data. I would welcome other researchers revising and enhancing the program.
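The Spearman-Brown prophecy formula mentioned in the results can be sketched in a few lines. This is only an illustration of the standard formula, r_k = k·r / (1 + (k − 1)·r), written in Python rather than the utility's PHP. The function name is hypothetical, and the code does not reproduce Ebel's algorithm for incomplete data.

```python
def spearman_brown(single_reliability: float, k: float) -> float:
    """Predicted reliability of the average of k ratings, given the
    reliability of a single rating (Spearman-Brown prophecy formula)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


if __name__ == "__main__":
    # Averaging more ratings per subject raises the predicted reliability.
    for k in (1, 2, 4, 8):
        print(k, round(spearman_brown(0.5, k), 3))
```

With k = 1 the formula returns the single-rating reliability unchanged, which is a quick sanity check on any implementation.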

    Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

    Often questions arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history of growth given only the network as it exists today and a generative model by which the network is believed to have evolved. Our likelihood-based method finds a probable previous state of the network by reversing the forward growth model. This approach retains node identities so that the history of individual nodes can be tracked. We apply these algorithms to uncover older, non-extant biological and social networks believed to have grown via several models, including duplication-mutation with complementarity, forest fire, and preferential attachment. Through experiments on both synthetic and real-world data, we find that our algorithms can estimate node arrival times, identify anchor nodes from which new nodes copy links, and can reveal significant features of networks that have long since disappeared. Comment: 16 pages, 10 figures

    The Aguablanca Ni–(Cu) sulfide deposit, SW Spain: geologic and geochemical controls and the relationship with a midcrustal layered mafic complex

    The Aguablanca Ni–(Cu) sulfide deposit is hosted by a breccia pipe within a gabbro–diorite pluton. The deposit probably formed due to the disruption of a partially crystallized layered mafic complex at about 12–19 km depth and the subsequent emplacement of melts and breccias at shallow levels (<2 km). The ore-hosting breccias are interpreted as fragments of an ultramafic cumulate, which were transported to the near surface along with a sulfide melt. Phlogopite Ar–Ar ages are 341–332 Ma in the breccia pipe and 338–334 Ma in the layered mafic complex, and are similar to recently reported U–Pb ages of the host Aguablanca Stock and other nearby calcalkaline metaluminous intrusions (ca. 350–330 Ma). Ore deposition resulted from the combination of two critical factors: the emplacement of a layered mafic complex deep in the continental crust, and the development of small dilational structures along transcrustal strike-slip faults that triggered the forceful intrusion of magmas to shallow levels. The emplacement of basaltic magmas in the lower middle crust was accompanied by major interaction with the host rocks, immiscibility of a sulfide melt, and the formation of a magma chamber with ultramafic cumulates and sulfide melt at the bottom and vertically zoned mafic to intermediate magmas above. Dismembered bodies of mafic/ultramafic rocks thought to be parts of the complex crop out about 50 km southwest of the deposit in a tectonically uplifted block (Cortegana Igneous Complex, Aracena Massif). Reactivation of Variscan structures that merged at the depth of the mafic complex led to sequential extraction of melts, cumulates, and sulfide magma. Lithogeochemistry and Sr and Nd isotope data of the Aguablanca Stock reflect mixing between two distinct reservoirs, i.e., an evolved siliciclastic middle-upper continental crust and a primitive tholeiitic melt.
Crustal contamination in the deep magma chamber was so intense that orthopyroxene replaced olivine as the main mineral phase controlling the early fractional crystallization of the melt. Geochemical evidence includes enrichment in SiO2 and incompatible elements, and Sr and Nd isotope compositions (initial 87Sr/86Sr 0.708–0.710; initial 143Nd/144Nd 0.512–0.513). However, rocks of the Cortegana Igneous Complex have low initial 87Sr/86Sr and high initial 143Nd/144Nd values, suggesting contamination by lower crustal rocks. Comparison of the geochemical and geological features of igneous rocks in the Aguablanca deposit and the Cortegana Igneous Complex indicates that, although probably part of the same magmatic system, they are rather different, and the rocks of the Cortegana Igneous Complex were not the direct source of the Aguablanca deposit. Crust–magma interaction was a complex process, and the generation of orebodies was controlled by local but highly variable factors. The model for the formation of the Aguablanca deposit presented in this study implies that dense sulfide melts can effectively travel long distances through the continental crust and that dilational zones within compressional belts can effectively focus such melt transport into shallow environments.

    An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis

    Background: Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Methods: Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. Results: The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. Conclusion: The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
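The classification rule stated in the methods can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code. The function names and the example item are hypothetical, and the discrimination statistic is taken as a given input because the abstract does not say how it was computed. The last lines recompute the reported mean number of functioning distractors from the reported percentages.

```python
def is_functioning(choice_proportion: float, discrimination: float,
                   min_proportion: float = 0.05) -> bool:
    """A distractor counts as functioning if at least 5% of examinees
    chose it and its discrimination statistic is not positive."""
    return choice_proportion >= min_proportion and discrimination <= 0


def count_functioning(distractors) -> int:
    """Count functioning distractors in one item, given an iterable of
    (choice_proportion, discrimination) pairs."""
    return sum(is_functioning(p, d) for p, d in distractors)


# Hypothetical four-option item: three distractors as (proportion, discrimination).
example_item = [(0.30, -0.25), (0.03, -0.10), (0.12, 0.08)]
n_functioning = count_functioning(example_item)

# Shares of items with 0, 1, 2, and 3 functioning distractors, as reported.
reported_shares = {0: 0.123, 1: 0.348, 2: 0.391, 3: 0.138}
mean_functioning = sum(k * share for k, share in reported_shares.items())

if __name__ == "__main__":
    print(n_functioning)
    print(round(mean_functioning, 2))
```

In the example item only the first distractor qualifies: the second is chosen too rarely and the third has a positive discrimination value. The recomputed mean agrees with the 1.54 given in the abstract.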

    Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment

    Background: It is widely recognised that deficiencies in fetal surveillance practice continue to contribute significantly to the burden of adverse outcomes. This has prompted the development of evidence-based clinical practice guidelines by the Royal Australian and New Zealand College of Obstetricians and Gynaecologists and an associated Fetal Surveillance Education Program to deliver the associated learning. This article describes initial steps in the validation of a corresponding multiple-choice assessment of the relevant educational outcomes through a combination of item response modelling and expert judgement. Methods: The Rasch item response model was employed for item and test analysis and to empirically derive the substantive interpretation of the assessment variable. This interpretation was then compared to the hierarchy of competencies specified a priori by a team of eight subject-matter experts. Classical Test Theory analyses were also conducted. Results: A high level of agreement between the hypothesised and derived variable provided evidence of construct validity. Item and test indices from Rasch analysis and Classical Test Theory analysis suggested that the current test form was of moderate quality. However, the analyses made clear the required steps for establishing a valid assessment of sufficient psychometric quality. These steps included: increasing the number of items from 40 to 50 in the first instance, reviewing ineffective items, targeting new items to specific content and difficulty gaps, and formalising the assessment blueprint in light of empirical information relating item structure to item difficulty. Conclusion: The application of the Rasch model for criterion-referenced assessment validation with an expert stakeholder group is herein described. Recommendations for subsequent item and test construction are also outlined in this article.

    Measuring emotional and social wellbeing in Aboriginal and Torres Strait Islander populations: an analysis of a Negative Life Events Scale

    Aboriginal and Torres Strait Islander Australians experience widespread socioeconomic disadvantage and health inequality. In an attempt to make Indigenous health research more culturally appropriate, Aboriginal and Torres Strait Islander Australians have called for more attention to the concept of emotional and social wellbeing (ESWB). Although it has been widely recognised that ESWB is of crucial importance to the health of Aboriginal and Torres Strait Islander peoples, there is little consensus on how to measure it in Indigenous populations, hampering efforts to better understand and improve the psychosocial determinants of health. This paper explores the policy and political context to this situation, and suggests ways to move forward. The second part of the paper explores how scales can be evaluated in a health research setting, including assessments of endorsement, discrimination, and internal and external reliability.

    Mantle Pb paradoxes: the sulfide solution

    Author Posting. © Springer, 2006. This is the author's version of the work. It is posted here by permission of Springer for personal use, not for redistribution. The definitive version was published in Contributions to Mineralogy and Petrology 152 (2006): 295-308, doi:10.1007/s00410-006-0108-1. There is growing evidence that the budget of Pb in mantle peridotites is largely contained in sulfide, and that Pb partitions strongly into sulfide relative to silicate melt. In addition, there is evidence to suggest that diffusion rates of Pb in sulfide (solid or melt) are very fast. Given the possibility that sulfide melt ‘wets’ sub-solidus mantle silicates, and has very low viscosity, the implications for Pb behavior during mantle melting are profound. There is only sparse experimental data relating to Pb partitioning between sulfide and silicate, and no data on Pb diffusion rates in sulfides. A full understanding of Pb behavior in sulfide may hold the key to several long-standing and important Pb paradoxes and enigmas. The classical Pb isotope paradox arises from the fact that all known mantle reservoirs lie to the right of the Geochron, with no consensus as to the identity of the “balancing” reservoir. We propose that long-term segregation of sulfide (containing Pb) to the core may resolve this paradox. Another Pb paradox arises from the fact that the Ce/Pb ratio of both OIB and MORB is greater than bulk earth, and constant at a value of 25. The constancy of this “canonical ratio” implies similar partition coefficients for Ce and Pb during magmatic processes (Hofmann et al. 1986), whereas most experimental studies show that Pb is more incompatible in silicates than Ce. Retention of Pb in residual mantle sulfide during melting has the potential to bring the bulk partitioning of Ce into equality with Pb if the sulfide melt/silicate melt partition coefficient for Pb has a value of ~14.
Modeling shows that the Ce/Pb (or Nd/Pb) of such melts will still accurately reflect that of the source, thus enforcing the paradox that OIB and MORB mantles have markedly higher Ce/Pb (and Nd/Pb) than the bulk silicate earth. This implies large deficiencies of Pb in the mantle sources for these basalts. Sulfide may play other important roles during magmagenesis: 1) advective/diffusive sulfide networks may form potent metasomatic agents (in both introducing and obliterating Pb isotopic heterogeneities in the mantle); 2) silicate melt networks may easily exchange Pb with ambient mantle sulfides (by diffusion or assimilation), thus ‘sampling’ Pb in isotopically heterogeneous mantle domains differently from the silicate-controlled isotope tracer systems (Sr, Nd, Hf), with an apparent ‘de-coupling’ of these systems. Our intemperance should not be blamed on the support we gratefully acknowledge from NSF: EAR-0125917 to SRH and OCE-0118198 to GAG.

    A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants

    Background: This paper presents the first meta-analysis for the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree. Methodology/Principal Findings: Altogether, 70 reliability coefficients (Cohen’s Kappa, intra-class correlation [ICC], and Pearson product-moment correlation [r]) from 48 studies were taken into account in the meta-analysis. The studies were based on a total of 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1983). The results of the meta-analysis confirmed the findings of the narrative literature reviews published to date: The level of IRR (mean ICC/r² = .34, mean Cohen’s Kappa = .17) was low. To explain the study-to-study variation of the IRR coefficients, meta-regression analyses were calculated using seven covariates. Two covariates that emerged in the meta-regression analyses as statistically significant to gain an approximate homogeneity of the intra-class correlations indicated that, firstly, the more manuscripts that a study is based on, the smaller the reported IRR coefficients are. Secondly, if the information of the rating system for reviewers was reported in a study, then this was associated with a smaller IRR coefficient than if the information was not conveyed. Conclusions/Significance: Studies that report a high level of IRR are to be considered less credible than those with a low level of IRR.