Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data.
OBJECTIVE: Data quality assessment is a challenging facet of research using coded administrative health data. Current assessment approaches are time and resource intensive. We explored whether association rule mining (ARM) can be used to develop rules for assessing data quality. MATERIALS AND METHODS: We extracted 2013 and 2014 records from the hospital discharge abstract database (DAD) for patients between the ages of 55 and 65 from five acute care hospitals in Alberta, Canada. ARM was conducted on the 2013 DAD to extract rules with support ≥0.0019 and confidence ≥0.5 using the bootstrap technique, and the rules were tested in the 2014 DAD. The rules were compared against the coding frequency method and assessed for their ability to detect errors introduced by two kinds of data manipulation: random permutation and random deletion. RESULTS: The association rules generally had clear clinical meanings. Comparing 2014 data to 2013 data (both original), there were 3 rules with a confidence difference >0.1, while the coding frequency difference for codes on the right-hand side of the rules was less than 0.004. After random permutation of 50% of codes in the 2014 data, average rule confidence dropped from 0.72 to 0.27 while coding frequency remained unchanged. Rule confidence decreased as code deletion increased, as expected. Rule confidence was more sensitive to code deletion than coding frequency, with the slope of change ranging from 1.7 to 184.9 with a median of 9.1. CONCLUSION: ARM is a promising technique for assessing data quality. It offers a systematic way to derive coding association rules hidden in data, and potentially provides a sensitive and efficient method of assessing data quality compared to standard methods.
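The support and confidence thresholds named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy records and the placeholder codes "A", "B" and "C" are all assumptions made for illustration, and the bootstrap step is not shown. Only the two threshold values (0.0019 and 0.5) come from the abstract.

```python
# Thresholds taken from the abstract; everything else here is hypothetical.
MIN_SUPPORT = 0.0019
MIN_CONFIDENCE = 0.5


def support(records, items):
    """Fraction of records that contain every code in `items`."""
    items = set(items)
    return sum(items <= set(record) for record in records) / len(records)


def confidence(records, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    lhs_support = support(records, lhs)
    if lhs_support == 0:
        return 0.0  # rule never fires, so treat confidence as zero
    return support(records, set(lhs) | set(rhs)) / lhs_support


def keep_rule(records, lhs, rhs):
    """True if the rule lhs -> rhs meets both thresholds."""
    rule_support = support(records, set(lhs) | set(rhs))
    return (rule_support >= MIN_SUPPORT
            and confidence(records, lhs, rhs) >= MIN_CONFIDENCE)


# Toy discharge records; "A", "B", "C" are placeholders, not real ICD-10 codes.
records = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

rule_support = support(records, {"A", "B"})        # 2 of 4 records
rule_confidence = confidence(records, {"A"}, {"B"})  # 2 of the 3 records with "A"
print(rule_support, rule_confidence, keep_rule(records, {"A"}, {"B"}))
```

Recomputing `confidence` for the same rule on a second year of records and comparing the two values is one way to picture the year-to-year check described in the results.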
Data on coding association rules from an inpatient administrative health data coded by International classification of disease - 10th revision (ICD-10) codes
Data presented in this article relate to the research article entitled “Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data” (Peng et al. [1], in preparation). We provide a set of ICD-10 coding association rules for the age group of 55 to 65. The rules were extracted from inpatient administrative health data at five acute care hospitals in Alberta, Canada, using association rule mining. Thresholds of support and confidence for the association rule mining process were set at 0.19% and 50% respectively. The data set contains 426 rules, of which 86 are not nested. Data are provided in the supplementary material. The presented coding association rules provide a reference for future research on the use of association rule mining for data quality assessment.
Quality assessment of RNA in long-term storage: The All Our Families biorepository
Background
The All Our Families (AOF) cohort study is a longitudinal population-based study which collected biological samples from 1948 pregnant women between May 2008 and December 2010. As the quality of samples can decline over time, the objective of the current study was to assess the association between storage time and RNA (ribonucleic acid) yield and purity, and confirm the quality of these samples after 7–10 years in long-term storage.
Methods
Maternal whole blood samples were previously collected by trained phlebotomists and stored in four separate PAXgene Blood RNA Tubes (PreAnalytiX) between 2008 and 2011. RNA was isolated in 2011 and 2018 using PAXgene Blood RNA Kits (PreAnalytiX) as per the manufacturer’s instructions. RNA purity (260/280) and RNA yield were measured using a Nanodrop. The RNA integrity number (RIN) was also assessed after 5–25 and 111–130 months of storage using the RNA 6000 Nano Kit and Agilent 2100 BioAnalyzer. Descriptive statistics, paired t-tests, and response feature analysis using linear regression were used to assess the association between various predictor variables and the quality of the RNA isolated.
Results
Overall, RNA purity and yield of the samples did not decline over time. RNA purity of samples isolated in 2011 (2.08, 95% CI: 2.08–2.09) was statistically lower (p<0.000) than that of samples isolated in 2018 (2.101, 95% CI: 2.097–2.104), and there was no statistical difference in RNA yield between 2011 (13.08 μg/tube, 95% CI: 12.27–13.89) and 2018 (12.64 μg/tube, 95% CI: 11.83–13.46) (p = 0.2964). For every month of storage, the change in RNA purity is -0.01 (260/280), and the change in RNA yield between 2011 and 2018 is -0.90 μg/tube. The mean RIN was 8.49 (95% CI: 8.44–8.54), and it ranged from 7.2 to 9.5. The rate of change in expected RIN per month of storage is 0.003 (95% CI: 0.002–0.004), so while statistically significant, this change is too small to be of practical relevance.
Conclusions
RNA quality does not decrease over time, and the methods used to collect and store samples within a population-based study are robust to inherent operational factors which may degrade sample quality over time.
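The response feature analysis mentioned in the methods can be pictured as follows: each participant's repeated measurements are reduced to a single summary (here, a least-squares slope of RIN against months of storage), and the summaries are then analysed across participants. The sketch below is only an illustration under that reading. The participant labels and all measurement values are hypothetical and are not study data, and the function name `ols_slope` is an assumption.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data: participant -> (months in storage, RIN at each time point).
measurements = {
    "p1": ([10, 110], [8.0, 8.5]),
    "p2": ([20, 120], [8.4, 8.6]),
    "p3": ([15, 115], [8.2, 8.4]),
}

# Step 1: one response feature (slope) per participant.
slopes = {pid: ols_slope(months, rin) for pid, (months, rin) in measurements.items()}

# Step 2: summarise the per-participant features across the cohort.
mean_slope = mean(slopes.values())
print(slopes, mean_slope)
```

Reducing each participant to one number first keeps the repeated measurements on the same person from being treated as independent observations.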
