11 research outputs found
Enhancing data-limited assessments with random effects: A case study on Korea chub mackerel (Scomber japonicus)
In a state-space framework, temporal variations in fishery-dependent
processes can be modeled as random effects. This modeling flexibility makes
state-space models (SSMs) powerful tools for data-limited assessments. Though
SSMs enable the model-based inference of the unobserved processes, their
flexibility can lead to overfitting and non-identifiability issues. To address
these challenges, we developed a suite of state-space length-based
age-structured models and applied them to the Korean chub mackerel (Scomber
japonicus) stock. Our research demonstrated that incorporating temporal
variations in fishery-dependent processes can rectify model mis-specification
but may compromise robustness, which can be diagnosed through a series of model
checking processes. To tackle non-identifiability, we used a non-degenerate
estimator, implementing a gamma distribution as a penalty for the standard
deviation parameters of observation errors. This penalty function enabled the
simultaneous estimation of both process and observation error variances with
minimal bias, a notably challenging task in SSMs. These results highlight the
importance of model checking and the effectiveness of the penalized approach in
estimating SSMs. Additionally, we discussed novel assessment outcomes for the
mackerel stock.
Comment: 78 pages, 21 figures
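The penalized approach described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the gamma shape and rate values, and the idea of passing in a precomputed marginal negative log-likelihood are all assumptions made for the example.

```python
import math


def gamma_log_density(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def penalized_nll(marginal_nll, obs_sds, shape=2.0, rate=1.0):
    """Add a gamma penalty on observation-error SDs to a marginal NLL.

    marginal_nll: negative log-likelihood of the state-space model with
        random effects integrated out (assumed to be computed elsewhere).
    obs_sds: standard deviations of the observation errors.
    shape, rate: hypothetical penalty settings; with shape > 1 the penalty
        grows without bound as an SD approaches zero.
    """
    penalty = -sum(gamma_log_density(s, shape, rate) for s in obs_sds)
    return marginal_nll + penalty


# The penalty is mild at moderate SD values and steep near zero.
print(penalized_nll(0.0, [0.5]))
print(penalized_nll(0.0, [1e-8]))
```

With a shape above 1, the penalty keeps the estimated standard deviations away from the degenerate value of zero, which is the general motivation for a non-degenerate estimator of this kind.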
‘Drivin' with your eyes closed’: Results from an international, blinded simulation experiment to evaluate spatial stock assessments
Spatial models help in understanding the potential redistribution of marine resources associated with ecosystem drivers and climate change. Stock assessment platforms can incorporate spatial processes, but spatial assessments have not been widely implemented or simulation tested. To address this research gap, an international simulation experiment was organized. The study design was blinded to replicate uncertainty similar to a real-world stock assessment process, and a data-conditioned, high-resolution operating model (OM) was used to emulate the spatial dynamics and data for Indian Ocean yellowfin tuna (Thunnus albacares). Six analyst groups developed both single-region and spatial stock assessment models using an assessment platform of their choice, and then applied each model to the simulated data. Results indicated that across all spatial structures and platforms, assessments were able to adequately recreate the population trends from the OM. Additionally, spatial models were able to estimate regional population trends that generally reflected the true dynamics from the OM, particularly for the regions with higher biomass and fishing pressure. However, a consistent population biomass scaling pattern emerged, where spatial models estimated higher population scale than single-region models within a given assessment platform. Balancing the trade-off between parsimony and complexity was difficult, but adequate complexity in spatial parametrizations (e.g., allowing time- and age-variation in movement and appropriate tag mixing periods) was critical to model performance. We recommend expanded use of high-resolution OMs and blinded studies, given their ability to portray realistic performance of assessment models. Moreover, increased support for international simulation experiments is warranted to facilitate dissemination of methodology across organizations.
Peer reviewed
Simultaneous quantification of aquatic ecosystem metabolism and reaeration using a Bayesian statistical model of oxygen dynamics
Overdiagnosis and Lives Saved by Reflex Testing Men With Intermediate Prostate-Specific Antigen Levels
Abstract
Background
Several prostate cancer (PCa) early-detection biomarkers are available for reflex testing in men with intermediate prostate-specific antigen (PSA) levels. Studies of these biomarkers typically provide information about diagnostic performance but not about overdiagnosis and lives saved, the primary drivers of associated harm and benefit.
Methods
We projected overdiagnoses and lives saved using an established microsimulation model of PCa incidence and mortality with screening and treatment efficacy based on randomized trials. We used this framework to evaluate four urinary reflex biomarkers (measured in 1112 men presenting for prostate biopsy at 10 US academic or community clinics) and two hypothetical ideal biomarkers (with 100% sensitivity or specificity for any or for high-grade PCa) at one-time screening tests at ages 55 and 65 years.
Results
Compared with biopsying all men with elevated PSA, reflex testing reduced overdiagnoses (range across ages and biomarkers = 8.8–60.6%) but also reduced lives saved (by 7.3–64.9%), producing similar overdiagnoses per life saved. The ideal biomarker for high-grade disease improved this ratio (by 35.2% at age 55 years and 42.0% at age 65 years). Results were similar under continued screening for men not diagnosed at age 55 years, but the ideal biomarker for high-grade disease produced smaller incremental improvement.
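The arithmetic behind "similar overdiagnoses per life saved" can be illustrated with a short sketch. The function name and the example percentages below are hypothetical; the abstract reports ranges across ages and biomarkers, not paired values, so none of these inputs are results from the study.

```python
def relative_change_in_ratio(overdx_reduction, lives_saved_reduction):
    """Relative change in overdiagnoses per life saved.

    Both arguments are fractional reductions relative to biopsying all
    men with elevated PSA (e.g. 0.30 for a 30% reduction). A negative
    return value means fewer overdiagnoses per life saved.
    """
    if not 0.0 <= lives_saved_reduction < 1.0:
        raise ValueError("lives_saved_reduction must be in [0, 1)")
    return (1.0 - overdx_reduction) / (1.0 - lives_saved_reduction) - 1.0


# Hypothetical inputs: equal reductions leave the ratio unchanged.
print(relative_change_in_ratio(0.50, 0.50))
# Hypothetical inputs: cutting overdiagnoses without losing lives saved
# improves the ratio by the same fraction.
print(relative_change_in_ratio(0.20, 0.00))
```

When the two reductions are commensurate, the ratio barely moves, which is why a biomarker that reduces overdiagnoses and lives saved by similar proportions does not change the harm-benefit balance.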
Conclusions
Modeling is a useful tool for projecting the implications of using reflex biomarkers for long-term PCa outcomes. Under simplified conditions, reflex testing with urinary biomarkers is expected to reduce overdiagnoses but also produce commensurate reductions in lives saved. Reflex testing that accurately identifies high-grade PCa could improve the net benefit of screening.
Abstract A09: Predicting recurrence or second breast cancer using linked claims and cancer registry data with limited gold-standard information: A gradient-boosting approach
Abstract
Background: Cancer recurrence is a major event affecting the burden of the disease and is a critical decision point for patients and their providers. Population-based information on the risk of cancer recurrence is lacking because it is not routinely collected by cancer registries.
Objective: To develop and implement a scalable, supervised learning algorithm to predict breast cancer recurrence status using information about disease at diagnosis from registry data and information about health care utilization from medical claims.
Data: Medical claims from private insurers and Medicare (2011-2016) linked with the Puget Sound SEER Cancer Registry were made available via the Hutch Institute for Cancer Outcomes Research (HICOR). Gold-standard information on the recurrence of initially localized breast cancer was provided by investigators on the BRAVO study of breast cancer survivors diagnosed 2004-2016 in the Puget Sound area. The HICOR and BRAVO data were linked. The analysis dataset consisted of 111 patients with a recurrence or second breast cancer event and 689 patients without a recurrence or second breast cancer event who had adequate claims (insurance enrollment before and after their second event or for at least 12 consecutive months after primary treatment) available for analysis.
Methods: A gradient-boosting algorithm (XGBoost) was used to predict month-level recurrence status, i.e., whether any given month was before or after a recurrence event. Features included registry information on patient demographics, initial extent of disease, and hormone-receptor status, as well as engineered features based on the counts of diagnosis, procedure, and drug claims within groups determined by a blend of previously defined groups and groups customized for this application. Time-varying features included monthly counts of codes within each group, months since the most recent and subsequent occurrence of each code group, and cumulative sums of each code group. Subjects were split into a training (n=94) and test (n=17) set for reporting performance results. The training data were further split 5:1 for cross-validation purposes.
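The time-varying features listed above might be derived along the following lines for a single code group. This is a sketch under stated assumptions: the function name, the use of None when no occurrence exists, and the choice to count the current month as an occurrence are illustrative and are not specified in the abstract.

```python
def time_varying_features(monthly_counts):
    """Derive per-month features from monthly claim counts for one code group.

    Returns the cumulative sum, months since the most recent occurrence,
    and months until the next occurrence. None marks months with no
    earlier (or later) occurrence; that encoding is an assumption.
    """
    n = len(monthly_counts)
    cumulative, since_last = [], []
    total, last_seen = 0, None
    for month, count in enumerate(monthly_counts):
        total += count
        if count > 0:
            last_seen = month
        cumulative.append(total)
        since_last.append(None if last_seen is None else month - last_seen)

    # Scan backwards to find the next month with at least one code.
    until_next = [None] * n
    next_seen = None
    for month in range(n - 1, -1, -1):
        if monthly_counts[month] > 0:
            next_seen = month
        until_next[month] = None if next_seen is None else next_seen - month

    return {
        "cumulative": cumulative,
        "months_since_last": since_last,
        "months_until_next": until_next,
    }


# Hypothetical counts for one code group over five months.
features = time_varying_features([0, 2, 0, 0, 1])
print(features)
```

Features of this shape would be computed per code group and joined to the static registry variables before model fitting.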
Results: The list of most important variables included time since coding of secondary malignancy, cumulative sum of codes related to pathology, and codes related to catheter placement. The month-specific AUC on a validation subset (n=17 patients) was 0.89; individual-level (sensitivity, specificity) ranged from (0.824, 0.946) to (0.706, 0.982).
Conclusions: Data sources that link claims, cancer registry, and gold-standard disease status information are critical for the development of novel, automated approaches for detecting cancer recurrence. Gradient-boosted learning with engineered time-varying features shows promise for identifying recurrence events in administrative claims. Proper coding of procedure and drug groups is likely to be key to the performance of such algorithms. Incompleteness of claims data is a major challenge.
Citation Format: Teresa A'mar, Daniel Markowitz, Jessica Chubak, David Beatty, Catherine Fedorenko, Christopher Li, Kathi Malone, Ruth Etzioni. Predicting recurrence or second breast cancer using linked claims and cancer registry data with limited gold-standard information: A gradient-boosting approach [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr A09.
Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study
Background
There is a need for automated approaches to incorporate information on cancer recurrence events into population-based cancer registries.
Objective
The aim of this study is to determine the accuracy of a novel data mining algorithm to extract information from linked registry and medical claims data on the occurrence and timing of second breast cancer events (SBCE).
Methods
We used supervised data from 3092 stage I and II breast cancer cases (with 394 recurrences), diagnosed between 1993 and 2006 inclusive, of patients at Kaiser Permanente Washington and cases in the Puget Sound Cancer Surveillance System. Our goal was to classify each month after primary treatment as pre- versus post-SBCE. The prediction feature set for a given month consisted of registry variables on disease and patient characteristics related to the primary breast cancer event, as well as features based on monthly counts of diagnosis and procedure codes for the current, prior, and future months. A month was classified as post-SBCE if the predicted probability exceeded a probability threshold (PT); the predicted time of the SBCE was taken to be the month of maximum increase in the predicted probability between adjacent months.
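The classification and timing rules in the last sentence of this paragraph can be sketched as follows. The function names and the example probabilities are hypothetical, and two details are assumptions not stated in the abstract: a person is treated as having an SBCE if any month exceeds the threshold, and the predicted month is taken to be the later month of the adjacent pair with the largest increase.

```python
def classify_months(probs, threshold=0.5):
    """Label each month as post-SBCE when its probability exceeds the threshold."""
    return [p > threshold for p in probs]


def predicted_event_month(probs, threshold=0.5):
    """Return the index of the predicted SBCE month, or None if no event.

    The event month is the month with the largest increase in predicted
    probability relative to the previous month.
    """
    if not any(classify_months(probs, threshold)):
        return None
    if len(probs) < 2:
        return 0
    return max(range(1, len(probs)), key=lambda i: probs[i] - probs[i - 1])


# Hypothetical month-level predicted probabilities for one patient.
probs = [0.02, 0.05, 0.10, 0.80, 0.95, 0.97]
print(classify_months(probs))
print(predicted_event_month(probs))
```

Using the largest month-to-month jump rather than the first threshold crossing makes the predicted timing less sensitive to the particular value chosen for PT.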
Results
The Kaplan-Meier net probability of SBCE was 0.25 at 14 years. The month-level receiver operating characteristic curve on test data (20% of the data set) had an area under the curve of 0.986. The person-level predictions (at a monthly PT of 0.5) had a sensitivity of 0.89, a specificity of 0.98, a positive predictive value of 0.85, and a negative predictive value of 0.98. The corresponding median difference between the observed and predicted months of recurrence was 0 and the mean difference was 0.04 months.
Conclusions
Data mining of medical claims holds promise for the streamlining of cancer registry operations to feasibly collect information about second breast cancer events.
Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study (Preprint)
BACKGROUND
There is a need for automated approaches to incorporate information on cancer recurrence events into population-based cancer registries.
OBJECTIVE
The aim of this study is to determine the accuracy of a novel data mining algorithm to extract information from linked registry and medical claims data on the occurrence and timing of second breast cancer events (SBCE).
METHODS
We used supervised data from 3092 stage I and II breast cancer cases (with 394 recurrences), diagnosed between 1993 and 2006 inclusive, of patients at Kaiser Permanente Washington and cases in the Puget Sound Cancer Surveillance System. Our goal was to classify each month after primary treatment as pre- versus post-SBCE. The prediction feature set for a given month consisted of registry variables on disease and patient characteristics related to the primary breast cancer event, as well as features based on monthly counts of diagnosis and procedure codes for the current, prior, and future months. A month was classified as post-SBCE if the predicted probability exceeded a probability threshold (PT); the predicted time of the SBCE was taken to be the month of maximum increase in the predicted probability between adjacent months.
RESULTS
The Kaplan-Meier net probability of SBCE was 0.25 at 14 years. The month-level receiver operating characteristic curve on test data (20% of the data set) had an area under the curve of 0.986. The person-level predictions (at a monthly PT of 0.5) had a sensitivity of 0.89, a specificity of 0.98, a positive predictive value of 0.85, and a negative predictive value of 0.98. The corresponding median difference between the observed and predicted months of recurrence was 0 and the mean difference was 0.04 months.
CONCLUSIONS
Data mining of medical claims holds promise for the streamlining of cancer registry operations to feasibly collect information about second breast cancer events.
Correction: Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study (Preprint)
In “Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study (J Med Internet Res Ca 2020;6(2):e18143)” the authors noted two errors. The metadata erroneously listed only Drs. A’mar and Etzioni as having contributed equally; this has been corrected to reflect that Drs. A’mar, Chubak, and Etzioni contributed equally. In addition, Dr. Chubak’s affiliation has been corrected from “Washington Health Research Institute, Kaiser Permanente” to “Kaiser Permanente Washington Health Research Institute.”
Correction: Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study
