202 research outputs found

    An integrated analysis tool for analyzing hybridization intensities and genotypes using new-generation population-optimized human arrays

    The cross-sample plot of the multipoint LOH/LCSH analyses of the three samples used in Fig. 5. The plot comprises four panels: (a) The top-left panel is a cross-sample and cross-chromosome plot. The vertical axis is the index of study samples, and the horizontal axis is the physical position (Mb) on each of the 23 chromosomes. The blue and red bars represent SNPs without and with LOH/LCSH, respectively. (b) The top-right panel is a histogram of cross-chromosome aberration frequency. The vertical axis is the index of study samples, and the horizontal axis is the cross-chromosome aberration frequency of the corresponding samples. The pink (skyblue) background represents that the genetic gender of a sample is female (male). The histogram represents the aberration frequency of LOH/LCSH SNPs across the chromosomes of the corresponding samples. (c) The bottom-left panel is a histogram of the cross-sample aberration frequency. The vertical axis is the cross-sample aberration frequency of a SNP, and the horizontal axis is the physical position (Mb) on each of the 23 chromosomes. The purple line represents the aberration proportion of samples carrying the SNPs with LOH/LCSH. (d) The bottom-right panel is the legend of the genetic gender that is used in panel (b), where the pink (skyblue) background represents that the genetic gender of a sample is female (male). (TIFF 1656 kb)

    A large-scale survey of genetic copy number variations among Han Chinese residing in Taiwan

    Background: Copy number variations (CNVs) have recently been recognized as important structural variations in the human genome. CNVs can affect gene expression and thus may contribute to phenotypic differences. The copy number inferring tool (CNIT) is an effective hidden Markov model-based algorithm for estimating allele-specific copy number and predicting chromosomal alterations from single nucleotide polymorphism microarrays. The CNIT algorithm, which was constructed using data from 270 HapMap multi-ethnic individuals, was applied to identify CNVs from 300 unrelated Han Chinese individuals in Taiwan. Results: Using stringent selection criteria, 230 regions with variable copy numbers were identified in the Han Chinese population; 133 (57.83%) had been reported previously, 64 displayed greater than 1% CNV allele frequency. The average size of the CNV regions was 322 kb (ranging from 1.48 kb to 5.68 Mb) and covered a total of 2.47% of the human genome. A total of 196 of the CNV regions were simple deletions and 27 were simple amplifications. There were 449 genes and 5 microRNAs within these CNV regions; some of these genes are known to be associated with diseases. Conclusion: The identified CNVs are characteristic of the Han Chinese population and should be considered when genetic studies are conducted. The CNV distribution in the human genome is still poorly characterized, and there is much diversity among different ethnic populations.

    Physical frailty identification using machine learning to explore the 5-item FRAIL scale, Cardiovascular Health Study index, and Study of Osteoporotic Fractures index

    Background: Physical frailty is an important issue in aging societies. Three models of physical frailty assessment, the 5-Item fatigue, resistance, ambulation, illness and loss of weight (FRAIL); Cardiovascular Health Study (CHS); and Study of Osteoporotic Fractures (SOF) indices, have been regularly used in clinical and research studies. However, no previous studies have investigated the predictive ability of machine learning (ML) for physical frailty assessment. The aim was to use two ML algorithms, random forest (RF) and extreme gradient boosting (XGBoost), to predict these three physical frailty assessment models. Materials and methods: Questionnaires regarding demographic characteristics, lifestyle habits, living environment, and physical frailty assessment were answered by 445 participants aged 60 years and above. The RF and XGBoost algorithms were used to assess their scores for the three physical frailty indices. Furthermore, feature importance and Shapley additive explanations (SHAP) were used to determine the important physical frailty factors. Results: The XGBoost algorithm obtained higher accuracy for predicting the three physical frailty indices; the areas under the curve obtained by the XGBoost algorithm for the 5-Item FRAIL, CHS, and SOF indices were 0.84, 0.79, and 0.69, respectively. The feature importance and SHAP of the XGBoost algorithm revealed that systolic blood pressure, diastolic blood pressure, age, and body mass index play important roles in all three physical frailty models. Conclusion: The XGBoost algorithm has a more accurate predictive rate than RF across all three physical frailty assessments. Thus, ML can be a useful tool for the early detection of physical frailty.

    A Large-Scale Evaluation of Speech Foundation Models

    The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol, and the statistical significance and robustness of the benchmark. Comment: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred.

    Associations of the distance-saturation product and low-attenuation area percentage in pulmonary computed tomography with acute exacerbation in patients with chronic obstructive pulmonary disease

    Background: Chronic obstructive pulmonary disease (COPD) is a major global health concern, and previous research has proposed various indicators to predict mortality, such as the distance-saturation product (DSP), derived from the 6-min walk test (6MWT), and the low-attenuation area percentage (LAA%) in pulmonary computed tomographic images. However, the feasibility of using these indicators to evaluate the stability of COPD remains to be investigated. Associations of the DSP and LAA% with other COPD-related clinical parameters are also unknown. This study, thus, aimed to explore these associations. Methods: This retrospective study enrolled 111 patients with COPD from northern Taiwan. The data collected for each individual included results of a pulmonary function test (PFT), 6MWT, life quality survey [i.e., the modified Medical Research Council (mMRC) scale and COPD assessment test (CAT)], history of acute exacerbation of COPD (AECOPD), and LAA%. Next, the DSP was derived from the distance walked and the lowest oxygen saturation recorded during the 6MWT. In addition, the DSP and clinical phenotype grouping based on clinically significant outcomes by previous study approaches were employed for further investigation (i.e., DSP of 290 m%, LAA% of 20%, and AECOPD frequency of ≥1). Mean comparisons and linear and logistic regression models were utilized to explore associations among the assessed variables. Results: The low-DSP group (<290 m%) had significantly higher values for the mMRC, CAT, AECOPD frequency, and LAA% at different lung volume scales (total, right, and left), whereas it had lower values of the PFT and 6MWT parameters compared to the high-DSP group. Significant associations (with high odds ratios) were observed of the mMRC, CAT, AECOPD frequency, and PFT with low- and high-DSP groupings. Next, the risk of having AECOPD was associated with the mMRC, CAT, DSP, and LAA% (for the total, right, and left lungs).
Conclusion: A lower value of the DSP was related to a greater worsening of symptoms, more-frequent exacerbations, poorer pulmonary function, and more-severe emphysema (higher LAA%). These readily determined parameters, including the DSP and LAA%, can serve as indicators for assessing the COPD clinical course and may serve as a guide to corresponding treatments.
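
As a rough illustration of the DSP described in this abstract, the sketch below multiplies the 6MWT distance by the lowest recorded oxygen saturation and applies the 290 m% cutoff. The function names are hypothetical, and treating the saturation as a fraction so that the product comes out in m% is an assumption based on the units quoted in the abstract, not the authors' published code.

```python
def distance_saturation_product(distance_m: float, lowest_spo2_percent: float) -> float:
    """Return the DSP in m%: distance walked (m) times lowest SpO2 as a fraction.

    Assumption: the saturation is entered in percent (0-100) and divided by 100,
    which matches the "m%" unit used in the abstract.
    """
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    if not 0 <= lowest_spo2_percent <= 100:
        raise ValueError("lowest_spo2_percent must be between 0 and 100")
    return distance_m * lowest_spo2_percent / 100.0


def is_low_dsp(dsp: float, cutoff: float = 290.0) -> bool:
    """Flag the low-DSP group using the <290 m% cutoff quoted in the abstract."""
    return dsp < cutoff


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only.
    dsp = distance_saturation_product(distance_m=300, lowest_spo2_percent=85)
    print(dsp, is_low_dsp(dsp))
```

The cutoff is left as a parameter because the abstract attributes the 290 m% grouping to previous study approaches, and other cohorts may use a different value.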

    Proteoglycan 4 is a diagnostic biomarker for COPD

    INTRODUCTION: The measurement of C-reactive protein (CRP) to confirm the stability of COPD has been reported. However, CRP is a systemic inflammatory biomarker that is related to many other diseases. OBJECTIVE: The objective of this study is to discover a diagnostic biomarker for COPD. METHODS: Sixty-one subjects with COPD and 15 healthy controls (10 healthy non-smokers and 5 smokers) were recruited for a 1-year follow-up study. Data regarding the 1-year acute exacerbation frequency and changes in lung function were collected. CRP and the identified biomarkers were assessed in the validation COPD cohort patients and healthy subjects. Receiver operating characteristic values of CRP and the identified biomarkers were determined. A validation COPD cohort was used to reexamine the identified biomarker. Correlation of the biomarker with 1-year lung function decline was determined. RESULTS: Proteoglycan 4 (PRG4) was identified as a biomarker in COPD. The serum concentrations of PRG4 in COPD Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages 1+2 and 3+4 were 10.29 ng/mL and 13.20 ng/mL, respectively; 4.99 ng/mL for healthy controls (P<0.05); and 4.49 ng/mL for healthy smokers (P<0.05). PRG4 was more sensitive and specific than CRP for confirming COPD severity and acute exacerbation frequency. There was no correlation between CRP and PRG4 levels, and PRG4 was negatively correlated with the 1-year change in predicted forced vital capacity percent (R²=0.91, P=0.013). CONCLUSION: PRG4 may be a biomarker for identification of severity in COPD. It was related to the 1-year forced vital capacity decline in COPD patients.