Using machine learning to identify patient characteristics to predict mortality of in-patients with COVID-19 in south Florida
Introduction: The SARS-CoV-2 (COVID-19) pandemic has created substantial health and economic burdens in the US and worldwide. As new variants continuously emerge, predicting critical clinical events in the context of relevant individual risks is a promising option for reducing the overall burden of COVID-19. This study aims to train an AI-driven decision support system that helps build a model to understand the most important features that predict the mortality of patients hospitalized with COVID-19.

Methods: We conducted a retrospective analysis of 5,371 patients hospitalized for COVID-19-related symptoms from the South Florida Memorial Health Care System between March 14th, 2020, and January 16th, 2021. A data set comprising patients' sociodemographic characteristics, pre-existing health information, and medication was analyzed. We trained a Random Forest classifier to predict mortality for patients hospitalized with COVID-19.

Results: Based on the interpretability of the model, age emerged as the primary predictor of mortality, followed by diarrhea, diabetes, hypertension, BMI, early stages of kidney disease, smoking status, sex, pneumonia, and race in descending order of importance. Notably, individuals aged over 65 years (referred to as "older adults"), males, Whites, Hispanics, and current smokers were identified as being at higher risk of death. Additionally, BMI, specifically in the overweight and obese categories, significantly predicted mortality. These findings indicated that the model effectively learned from various categories, such as patients' sociodemographic characteristics, pre-hospital comorbidities, and medications, with a predominant focus on characterizing pre-hospital comorbidities. Consequently, the model demonstrated the ability to predict mortality with transparency and reliability.

Conclusion: AI can potentially provide healthcare workers with the ability to stratify patients and streamline optimal care solutions when time is of the essence and resources are limited. This work sets the platform for future work that forecasts patient responses to treatments at various levels of disease severity and assesses health disparities and patient conditions that promote improved health care in a broader context. This study provides one of the first predictive analyses applying AI/ML techniques to COVID-19 data using a vast sample from South Florida.
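A minimal sketch of the modeling approach described above, using scikit-learn. The file name and feature columns are hypothetical stand-ins (the study's dataset is not public), and the impurity-based importance ranking is one common way to obtain the kind of feature ordering the paper reports; the study's exact pipeline may differ.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical columns standing in for the study's sociodemographic,
# comorbidity, and medication features.
df = pd.read_csv("covid_inpatients.csv")  # placeholder, not the actual study data
features = ["age", "bmi", "diabetes", "hypertension", "smoker", "sex", "race"]
X = pd.get_dummies(df[features], drop_first=True)  # one-hot encode categoricals
y = df["mortality"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# Rank features by impurity-based importance, analogous to the
# age-first ordering of predictors reported in the paper.
for name, score in sorted(zip(X.columns, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")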
An Exploration of Consistency Learning with Data Augmentation
Deep Learning has achieved remarkable success with Supervised Learning. Nearly all of these successes require very large manually annotated datasets. Data augmentation has enabled Supervised Learning with less labeled data, while avoiding the pitfalls of overfitting. However, Supervised Learning still fails to be Robust, making different predictions for original and augmented data points. We study the addition of a Consistency Loss between representations of original and augmented data points. Although this offers additional structure for invariance to augmentation, it may fall into the trap of representation collapse. Representation collapse describes the solution of mapping every input to a constant output, thus cheating to solve the consistency task. Many techniques have been developed to avoid representation collapse such as stopping gradients, entropy penalties, and applying the Consistency Loss at intermediate layers. We provide an analysis of these techniques in interaction with Supervised Learning for the CIFAR-10 image classification dataset. Our consistency learning models achieve a 1.7% absolute improvement on the CIFAR-10 original test set over the supervised baseline. More interestingly, we are able to dramatically reduce our proposed Distributional Distance metric with the Consistency Loss. Distributional Distance provides a more fine-grained analysis of the invariance to corrupted images. Readers will understand the practice of adding a Consistency Loss to improve Robustness in Deep Learning.
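A minimal PyTorch sketch of the objective discussed above: a supervised cross-entropy term plus a Consistency Loss between representations of a clean image and its augmented view, with a stop-gradient on the clean branch to discourage representation collapse. The toy encoder, loss weighting, and noise-based augmentation are illustrative assumptions, not the paper's exact setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder and classifier head for CIFAR-10-shaped inputs.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
classifier = nn.Linear(256, 10)

def loss_fn(x, x_aug, labels, lam=1.0):
    z, z_aug = encoder(x), encoder(x_aug)
    supervised = F.cross_entropy(classifier(z), labels)
    # Stop gradients through the clean-view branch so the consistency
    # term cannot be trivially minimized by collapsing all inputs to
    # a constant representation.
    consistency = F.mse_loss(z_aug, z.detach())
    return supervised + lam * consistency

# Usage with random tensors standing in for a CIFAR-10 batch:
x = torch.randn(8, 3, 32, 32)
x_aug = x + 0.1 * torch.randn_like(x)  # stand-in for a real augmentation
labels = torch.randint(0, 10, (8,))
loss = loss_fn(x, x_aug, labels)
loss.backward()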
Deep Learning applications for COVID-19
This survey explores how Deep Learning has battled the COVID-19 pandemic and provides directions for future research on COVID-19. We cover Deep Learning applications in Natural Language Processing, Computer Vision, Life Sciences, and Epidemiology. We describe how each of these applications vary with the availability of big data and how learning tasks are constructed. We begin by evaluating the current state of Deep Learning and conclude with key limitations of Deep Learning for COVID-19 applications. These limitations include Interpretability, Generalization Metrics, Learning from Limited Labeled Data, and Data Privacy. Natural Language Processing applications include mining COVID-19 research for Information Retrieval and Question Answering, as well as Misinformation Detection, and Public Sentiment Analysis. Computer Vision applications cover Medical Image Analysis, Ambient Intelligence, and Vision-based Robotics. Within Life Sciences, our survey looks at how Deep Learning can be applied to Precision Diagnostics, Protein Structure Prediction, and Drug Repurposing. Deep Learning has additionally been utilized in Spread Forecasting for Epidemiology. Our literature review has found many examples of Deep Learning systems to fight COVID-19. We hope that this survey will help accelerate the use of Deep Learning for COVID-19 research.
Text Data Augmentation for Deep Learning
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to name a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
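As one concrete instance of the word-level augmentation frameworks the survey covers, here is a minimal sketch of synonym-style replacement. The tiny replacement table is a hand-written stand-in for a real lexical resource such as WordNet, and the function is an illustrative assumption rather than a framework from the paper.

import random

# Tiny hand-written replacement table; a real pipeline would draw
# from a lexical resource such as WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "slow": ["sluggish"],
}

def augment(sentence, p=0.3, seed=None):
    """Replace each known word with a synonym with probability p."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if word.lower() in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[word.lower()]))
        else:
            out.append(word)
    return " ".join(out)

print(augment("a good movie with a slow start", seed=0))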
A Comparison of House Price Classification with Structured and Unstructured Text Data
Purchasing a home is one of the largest investments most people make. House price prediction allows individuals to be informed about their asset wealth. Transparent pricing on homes allows for a more efficient market and economy. We report the performance of machine learning models trained with structured tabular representations and unstructured text descriptions. We collected a dataset of 200 houses, each including meta-information as well as a text description. We test logistic regression and multi-layer perceptron (MLP) classifiers on dividing these houses into binary buckets based on fixed price thresholds. We present an exploration into strategies to represent unstructured text descriptions of houses as inputs for machine learning models. This includes a comparison of term frequency-inverse document frequency (TF-IDF), bag-of-words (BoW), and zero-shot inference with large language models. We find the best predictive performance with TF-IDF representations of house descriptions. Readers will gain an understanding of how to use machine learning models optimized with structured and unstructured text data to predict house prices.
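A minimal scikit-learn sketch of the best-performing setup reported above: TF-IDF features of house descriptions fed into a logistic regression classifier for a binary above/below price-threshold label. The example descriptions, labels, and n-gram range are made up for illustration and are not the paper's dataset or tuned configuration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the 200 collected house descriptions.
descriptions = [
    "spacious waterfront home with renovated kitchen and pool",
    "cozy two-bedroom starter house near the highway",
    "luxury estate with home theater and three-car garage",
    "small fixer-upper on a quiet street",
]
# 1 = above the fixed price threshold, 0 = below.
above_threshold = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(descriptions, above_threshold)
print(model.predict(["updated waterfront property with pool"]))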
