wsrf: An R Package for Classification with Scalable Weighted Subspace Random Forests
We describe a parallel implementation in R of the weighted subspace random forest algorithm (Xu, Huang, Williams, Wang, and Ye 2012), available as the wsrf package. A novel variable weighting method is used for variable subspace selection in place of the traditional approach of random variable sampling. This new approach is particularly useful in building models for high-dimensional data, which often consist of thousands of variables. Parallel computation takes advantage of multi-core machines and clusters of machines to build random forest models from high-dimensional data in considerably shorter times. A series of experiments presented in this paper demonstrates that wsrf is faster than existing packages whilst retaining, and often improving on, classification performance, particularly for high-dimensional data.
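The core idea of the algorithm, drawing each node's variable subspace by weighted rather than uniform sampling, can be sketched in Python. This is an illustrative sketch, not the wsrf implementation: the `weighted_subspace` helper and its inputs are hypothetical, and it assumes per-variable weights (e.g. information-gain scores against the class) have already been computed.

```python
import random

def weighted_subspace(weights, k, rng=None):
    """Draw k distinct feature indices with probability proportional to
    their weights (weighted sampling without replacement), replacing the
    uniform random feature sampling used at each node of a classical
    random forest."""
    rng = rng or random.Random()
    pool = dict(enumerate(weights))  # remaining candidates: index -> weight
    chosen = []
    for _ in range(min(k, len(pool))):
        idxs, ws = zip(*pool.items())
        pick = rng.choices(idxs, weights=ws, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen
```

Informative variables receive large weights and therefore enter the subspace far more often than under uniform sampling, which is what helps when only a few of thousands of variables carry signal.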
Data quality model for assessing public COVID-19 big datasets
For decision-making support and evidence-based healthcare, high-quality data are crucial, particularly if the relevant knowledge is lacking. For public health practitioners and researchers, reported COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, although the efficacy of these systems has not been thoroughly evaluated. Moreover, the current COVID-19 pandemic has revealed widespread flaws in data quality. We propose a data quality model (canonical data model, four adequacy levels, and Benford's law) to assess the quality issues of COVID-19 data reporting carried out by the World Health Organization (WHO) in the six Central African Economic and Monetary Community (CEMAC) region countries between March 6, 2020, and June 22, 2022, and suggest potential solutions. These levels of data quality sufficiency can be interpreted as dependability indicators and sufficiency of big dataset inspection. This model effectively identified the quality of the entry data for big dataset analytics. The future development of this model requires scholars and institutions from all sectors to deepen their understanding of its core concepts, improve integration with other data processing technologies, and broaden the scope of its applications.
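As a concrete illustration of the Benford's-law component of such a model, the first-digit distribution of reported case counts can be compared against the expected distribution P(d) = log10(1 + 1/d) using a chi-square statistic. This is a minimal stdlib-Python sketch, not the authors' implementation:

```python
import math
from collections import Counter

def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

def benford_chi2(counts):
    """Chi-square distance between the observed first-digit distribution
    of positive counts and Benford's law, P(d) = log10(1 + 1/d);
    a large value flags data whose reporting deviates from the law."""
    counts = [c for c in counts if c > 0]
    observed = Counter(first_digit(c) for c in counts)
    n = len(counts)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2
```

A series spanning several orders of magnitude, such as successive powers of 2, fits Benford's law closely, while a series whose counts all share one leading digit yields a very large statistic.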
Unsupervised Adaptation for High-Dimensional with Limited-Sample Data Classification Using Variational Autoencoder
High-dimensional, limited-sample-size (HDLSS) datasets exhibit two critical problems: (1) Because the sample size is small, there are not enough samples to build reliable classification models; models trained on a limited sample may overfit and produce erroneous or meaningless results. (2) The 'curse of dimensionality' phenomenon is often an obstacle to the use of many methods for solving the high-dimensional, limited-sample-size problem and reduces classification accuracy. This study proposes an unsupervised framework for HDLSS data classification using dimension reduction based on a variational autoencoder (VAE). First, the variational autoencoder is applied to project high-dimensional data onto a lower-dimensional space. Then, clustering is applied to the obtained VAE latent space to find the data groups and classify the input data. The method is validated by comparing the clustering results with the actual labels using purity, Rand index, and normalized mutual information. Moreover, to evaluate the strength of the proposed model, we analyzed 14 datasets from the Arizona State University Digital Repository. An empirical comparison of dimensionality reduction techniques is also presented to assess their applicability in the high-dimensional, limited-sample-size setting. Experimental results demonstrate that the variational autoencoder can achieve higher accuracy than traditional dimensionality reduction techniques in high-dimensional, limited-sample-size data analysis.
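The external validation measures named above are standard. As an illustrative stdlib-Python sketch (not the paper's code), purity and the Rand index can be computed directly from the true and predicted labelings:

```python
from collections import Counter
from itertools import combinations

def purity(true_labels, cluster_labels):
    """Fraction of points assigned to the majority true class of their
    cluster: 1.0 means every cluster is class-pure."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(true_labels)

def rand_index(true_labels, cluster_labels):
    """Fraction of point pairs on which the two labelings agree, i.e.
    both place the pair together or both place it apart."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = 0
    for i, j in pairs:
        same_true = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        agree += same_true == same_cluster
    return agree / len(pairs)
```

Both measures are invariant to how clusters are named, which is why they suit unsupervised evaluation: a clustering that perfectly recovers the classes scores 1.0 even if the cluster labels are permuted.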
A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis
Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies for speeding up the computation of big data and increasing scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed, with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
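Of the classical sampling methods surveyed, reservoir sampling is the one designed for the streaming setting: it keeps a uniform random sample of k records from a stream of unknown length in a single pass using O(k) memory. A minimal Python sketch of the standard formulation (Algorithm R), not taken from the survey itself:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: one pass over an iterable of unknown length,
    returning a uniform random sample of up to k items."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)     # item i survives with prob k/(i+1)
            if j < k:
                sample[j] = item         # evict a random reservoir slot
    return sample
```

Because each record is inspected exactly once, the same routine works whether the stream is a local file, a network feed, or one partition of a distributed dataset.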
On Quantum Methods for Machine Learning Problems Part II: Quantum Classification Algorithms
This is a review of quantum methods for machine learning problems that consists of two parts. The first part, "quantum tools", presented some of the fundamentals and introduced several quantum tools based on known quantum search algorithms. This second part of the review presents several classification problems in machine learning that can be accelerated with quantum subroutines. We have chosen supervised learning tasks as typical classification problems to illustrate the use of quantum methods for classification.
Matching supplementary motor area-primary motor cortex paired transcranial magnetic stimulation improves motor dysfunction in Parkinson’s disease: a single-center, double-blind randomized controlled clinical trial protocol
Background: Non-invasive neuroregulation techniques have been demonstrated to improve certain motor symptoms in Parkinson’s disease (PD). However, the currently employed regulatory techniques primarily concentrate on stimulating single target points, neglecting the functional regulation of networks and circuits. The supplementary motor area (SMA) has significant value in motor control, and its functionality is often impaired in patients with PD. This study elucidates a matching SMA-primary motor cortex (M1) paired transcranial magnetic stimulation (TMS) treatment protocol, which benefits patients by modulating the sequential and functional connections between the SMA and M1.
Methods: This was a single-center, double-blind, randomized controlled clinical trial. We recruited 78 subjects and allocated them in a 1:1 ratio by stratified randomization into the paired stimulation (n = 39) and conventional stimulation (n = 39) groups. Each patient underwent 3 weeks of matching SMA-M1 paired TMS or sham-paired stimulation. The subjects were evaluated before treatment initiation, 3 weeks into the intervention, and 3 months after the cessation of therapy. The primary outcome measure was the Unified Parkinson’s Disease Rating Scale III, and the secondary outcome measures included non-motor functional assessment, quality of life (Parkinson’s Disease Questionnaire-39), and objective assessments (electromyography and functional near-infrared spectroscopy).
Discussion: Clinical protocols aimed at single targets using non-invasive neuroregulation techniques often improve only one function. Emphasizing circuit and network regulation in PD is important for enhancing the effectiveness of TMS rehabilitation. Pairing the regulation of cortical circuits may be a potential treatment method for PD.
As a crucial node in motor control, the SMA has direct fiber connections with basal ganglia circuits and complex fiber connections with M1, which is responsible for motor execution. Regulating the SMA may therefore indirectly regulate the function of basal ganglia circuits, so the developed cortical pairing stimulation pattern can reshape the control of information flow from the SMA to M1. The novel neuroregulation model designed for this study is based on the circuit mechanisms of PD and previous research results, with a scientific foundation and the potential to become a means of neuroregulation for PD.
Clinical trial registration: ClinicalTrials.gov, identifier [ChiCTR2400083325].
Expert consensus on spontaneous ventilation video-assisted thoracoscopic surgery in primary spontaneous pneumothorax (Guangzhou)
- …
