
    An Improved Method for Bandwidth Selection when Estimating ROC Curves

    The receiver operating characteristic (ROC) curve is used to describe the performance of a diagnostic test which classifies observations into two groups. We introduce a new method for selecting bandwidths when computing kernel estimates of ROC curves. Our technique allows for interaction between the distributions of each group of observations and gives substantial improvement in MISE over other proposed methods, especially when the two distributions are very different.
    Keywords: bandwidth selection; binary classification; kernel estimator; ROC curve

    Non Parametric Confidence Intervals for Receiver Operating Characteristic Curves

    We study methods for constructing confidence intervals, and confidence bands, for estimators of receiver operating characteristics. Particular emphasis is placed on the way in which smoothing should be implemented, when estimating either the characteristic itself or its variance. We show that substantial undersmoothing is necessary if coverage properties are not to be impaired. A theoretical analysis of the problem suggests an empirical, plug-in rule for bandwidth choice, optimising the coverage accuracy of interval estimators. The performance of this approach is explored. Our preferred technique is based on asymptotic approximation, rather than a more sophisticated approach using the bootstrap, since the latter requires a multiplicity of smoothing parameters all of which must be chosen in nonstandard ways. It is shown that the asymptotic method can give very good performance.
    Keywords: bandwidth selection; binary classification; kernel estimator; receiver operating characteristic curve

    A Bayesian Approach to Graphical Record Linkage and De-duplication

    We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture-recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature.
    Comment: 39 pages, 8 figures, 8 tables. Longer version of arXiv:1403.0211. In press, Journal of the American Statistical Association: Theory and Methods (2015)

    Internal tides in a dendritic submarine canyon

    Submarine canyons are a common geomorphological feature along continental slopes worldwide and are often found to be ‘hotspots’ of internal tide activity. However, the majority of well-studied submarine canyons are simple linear incisions or have meandering morphology; internal tide energetics in branching (dendritic) canyons has not previously been investigated. Here we present a high-resolution (500-m) numerical modelling study of the internal tide within Whittard Canyon, a large, dendritic submarine canyon system that incises the Celtic Sea continental slope. A modified version of the Princeton Ocean Model is used to simulate the M2 (semidiurnal) internal tide in the Whittard Canyon region, verified against a hydrographic dataset collected by an autonomous ocean glider. Much of the internal tide energy entering Whittard Canyon originates to the southeast, along the Celtic Sea shelf break. Internal tide generation also occurs within the canyon itself, but is in part compensated by areas of negative energy conversion. Depth-integrated internal tide energy fluxes exceed 8 kW m⁻¹ in the eastern limb of the canyon. The internal tide is topographically steered through the major limbs and along-canyon energy flux is bottom intensified, suggesting topographic focusing. The down-canyon extent of bottom intensification closely corresponds to the point at which the along-canyon slope becomes near-critical to the semidiurnal internal tide. Energetically, the multiple limbs of Whittard Canyon behave differently: some are net sources of internal tide energy whilst others are net sinks. Internal tide energy dissipation also varies between the canyon limbs; bulk dissipation rates are 2.1–7.7 × 10⁻⁸ W kg⁻¹. In addition, the effect of bathymetric resolution on internal tide generation and propagation is investigated by progressively smoothing the model domain. Decreasing the bathymetric resolution reduces internal tide generation and energy dissipation in both Whittard Canyon and the model domain as a whole; however, internal tide energy flux into the canyon is not consistently changed. At least 1.5-km resolution bathymetry is required to adequately resolve the semidiurnal internal tide field in this region of complex topography.

    Diaries or questionnaires for collecting self-reported healthcare utilisation and patient cost data? CHERE Project Report No 20

    The literature comparing diaries and questionnaires was reviewed in order to identify the most appropriate method of collecting patient self-reported data, on health service utilisation and out-of-pocket costs, for a longitudinal study. Nine published studies met the review inclusion criteria; four compared the diary method with a self-completed questionnaire and five with an interviewer-administered questionnaire. None of the eligible studies measured patient costs, and only two measured some aspects of health service utilisation. Most of the studies reported higher response rates for questionnaires than for diaries, and there was some evidence of selection bias. There was a tendency to report more symptoms, symptom intensity or health care utilisation by questionnaires compared to diaries, and compared to physician reports (included in only two studies). The review provides some information about the two approaches for collecting self-reported data, but does not provide sufficient evidence to favour either approach.
    Keywords: diaries, health care utilisation

    Partly standing internal tides in a dendritic submarine canyon observed by an ocean glider

    An autonomous ocean glider is used to make the first direct measurements of internal tides within Whittard Canyon, a large, dendritic submarine canyon system that incises the Celtic Sea continental slope and a site of high benthic biodiversity. This is the first time a glider has been used for targeted observations of internal tides in a submarine canyon. Vertical isopycnal displacement observations at different stations fit a one-dimensional model of partly standing semidiurnal internal tides, composed of a major, incident wave propagating up the canyon limbs and a minor wave reflected back down-canyon by steep, supercritical bathymetry near the canyon heads. The up-canyon internal tide energy flux in the primary study limb decreases from 9.2 to 2.0 kW m⁻¹ over 28 km (a dissipation rate of [value not rendered in the source]), comparable to elevated energy fluxes and internal tide driven mixing measured in other canyon systems. Within Whittard Canyon, enhanced mixing is inferred from collapsed temperature-salinity curves and weakened dissolved oxygen concentration gradients near the canyon heads. It has previously been hypothesised that internal tides impact benthic fauna through elevated near-bottom current velocities and particle resuspension. In support of this, we infer near-bottom current velocities of order 20 cm s⁻¹ in the canyon and observe high concentrations of suspended particulate matter. The glider observations are also used to estimate a 1 °C temperature range and 12 μmol kg⁻¹ dissolved oxygen concentration range, experienced twice a day by organisms on the canyon walls, due to the presence of internal tides. This study highlights how a well-designed glider mission, incorporating a series of tide-resolving stations at key locations, can be used to understand internal tide dynamics in a region of complex topography, a sampling strategy that is applicable to continental shelves and slopes worldwide.