On Identifying and Mitigating Bias in the Estimation of the COVID-19 Case Fatality Rate
The relative case fatality rates (CFRs) between groups and countries are key
measures of relative risk that guide policy decisions regarding scarce medical
resource allocation during the ongoing COVID-19 pandemic. In the middle of an
active outbreak when surveillance data is the primary source of information,
estimating these quantities involves compensating for competing biases in time
series of deaths, cases, and recoveries. These include time- and severity-dependent
reporting of cases as well as time lags in observed patient outcomes.
In the context of COVID-19 CFR estimation, we survey such biases and their
potential significance. Further, we theoretically analyze the effect of certain
biases, such as preferential reporting of fatal cases, on naive estimators of CFR.
We provide a partially corrected version of these naive estimators that
accounts for time lag and imperfect reporting of deaths and recoveries. We show
that collection of randomized data by testing the contacts of infectious
individuals regardless of the presence of symptoms would mitigate bias by
limiting the covariance between diagnosis and death. Our analysis is
supplemented by theoretical and numerical results and a simple and fast
open-source codebase (https://github.com/aangelopoulos/cfr-covid-19).
Comment: Harvard Data Science Review (2020) article available at
https://hdsr.mitpress.mit.edu/pub/y9vc2u3
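The two biases named above can be illustrated with a minimal sketch. The outbreak is hypothetical, with constant incidence, a true CFR of 2%, and every outcome observed exactly 10 days after diagnosis. In that setting, the naive ratio of cumulative deaths to cumulative cases is biased low, while a lag-shifted denominator or a closed-case (deaths plus recoveries) denominator recovers the true value. The function `reported_cfr` applies Bayes' rule to show how preferential reporting of fatal cases inflates the CFR among reported cases. It assumes every reported case's outcome is eventually observed. All function names and numbers here are hypothetical, and this is not the authors' partially corrected estimator, which lives in their repository.

```python
from typing import Sequence


def naive_cfr(deaths: Sequence[int], cases: Sequence[int], t: int) -> float:
    """Cumulative deaths / cumulative cases up to day t (ignores outcome lag)."""
    return sum(deaths[: t + 1]) / sum(cases[: t + 1])


def lagged_cfr(deaths: Sequence[int], cases: Sequence[int], t: int, lag: int) -> float:
    """Cumulative deaths up to day t / cumulative cases up to day t - lag."""
    if t < lag:
        raise ValueError("need t >= lag")
    return sum(deaths[: t + 1]) / sum(cases[: t - lag + 1])


def closed_case_cfr(deaths: Sequence[int], recoveries: Sequence[int], t: int) -> float:
    """Deaths / (deaths + recoveries): uses only cases with a known outcome."""
    d = sum(deaths[: t + 1])
    r = sum(recoveries[: t + 1])
    return d / (d + r)


def reported_cfr(true_cfr: float, p_fatal: float, p_nonfatal: float) -> float:
    """P(death | reported) when fatal cases are reported with prob p_fatal
    and non-fatal cases with prob p_nonfatal (Bayes' rule)."""
    fatal = true_cfr * p_fatal
    return fatal / (fatal + (1.0 - true_cfr) * p_nonfatal)


# Hypothetical outbreak: 100 new cases/day for 30 days, true CFR 2%,
# each outcome (death or recovery) observed exactly LAG days after diagnosis.
LAG = 10
cases = [100] * 30
deaths = [0] * LAG + [2] * (30 - LAG)
recoveries = [0] * LAG + [98] * (30 - LAG)

t = 19
print(naive_cfr(deaths, cases, t))             # 0.01 (biased low by the lag)
print(lagged_cfr(deaths, cases, t, LAG))       # 0.02
print(closed_case_cfr(deaths, recoveries, t))  # 0.02
print(reported_cfr(0.02, p_fatal=0.9, p_nonfatal=0.3))  # about 0.058, inflated
```

In real data the lag is a distribution rather than a constant, and reporting probabilities are unknown, which is why these simple corrections are only partial.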
Generalizing Beyond the Training Data: New Theory and Algorithms for Optimal Transfer Learning
Traditional machine learning often assumes that training (source) data closely resembles the testing (target) data. However, in many contemporary applications, this is unrealistic: in e-commerce, consumer behavior is time-varying; in medicine, patient populations can exhibit more or less heterogeneity; in autonomous driving, models are rolled out to new environments. Ignoring these “distribution shifts” can lead to costly, harmful, and even dangerous outcomes. This thesis tackles these challenges by developing an algorithmic and statistical toolkit for addressing distribution shifts. Specifically, this work focuses on covariate shift, a form of distribution shift where the source and target distributions have different covariate laws.
I demonstrate that for a large class of problems, transfer learning is possible, even when the source and target data have non-overlapping support. We study covariate shift in the case of kernel classes, Hölder smoothness classes, and sparsity classes. We demonstrate how a suitably defined notion of defect or dissimilarity in the problem instance can be leveraged algorithmically, leading to methods with optimal learning guarantees.
Our final chapter provides instance-optimal learning guarantees. We introduce a new method: penalized risk minimization with a non-traditional choice of regularization that is chosen via semidefinite programming. We show that the performance of our method is optimal with respect to the particular covariate shift instance. To our knowledge, these are the first instance-optimal guarantees for transfer learning. Moreover, our results are assumption-light: we impose essentially no restrictions on the underlying covariate laws, thereby broadening the applicability of our theory.
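As a rough, hypothetical picture of "penalized risk minimization with a non-traditional regularizer", the sketch below fits a generalized ridge estimator. It minimizes ||y − Xb||² + bᵀΩb, where Ω is a non-isotropic positive semidefinite matrix, so the penalty can shrink some directions much more than others. The quadratic form of the penalty and the hand-picked Ω are assumptions of this sketch. The thesis selects its regularizer via semidefinite programming, and that step is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def generalized_ridge(X: Matrix, y: List[float], omega: Matrix) -> List[float]:
    """Minimize ||y - X b||^2 + b^T omega b for a PSD penalty matrix omega.

    Hypothetical illustration: omega is supplied by the caller, NOT chosen by an SDP.
    """
    d = len(X[0])
    # Normal equations: (X^T X + omega) b = X^T y
    a = [[sum(row[i] * row[j] for row in X) + omega[i][j] for j in range(d)]
         for i in range(d)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(a, rhs)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]  # generated by b = (1, 2) without noise
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
first_coord_penalty = [[100.0, 0.0], [0.0, 0.0]]  # penalizes only b[0]
print(generalized_ridge(X, y, no_penalty))           # [1.0, 2.0]
print(generalized_ridge(X, y, first_coord_penalty))  # b[0] shrunk toward 0
```

With Ω = λI this reduces to ordinary ridge regression. A direction-dependent Ω is one simple way a regularizer could be adapted to a particular source/target pair.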
Transformers can optimally learn regression mixture models
Mixture models arise in many regression problems, but most methods have seen
limited adoption, partly due to the highly tailored and model-specific nature
of these algorithms. On the other hand, transformers are flexible neural
sequence models that present the intriguing possibility of providing
general-purpose prediction methods, even in this mixture setting. In this work,
we investigate the hypothesis that transformers can learn an optimal predictor
for mixtures of regressions. We construct a generative process for a mixture of
linear regressions for which the decision-theoretic optimal procedure is given
by data-driven exponential weights on a finite set of parameters. We observe
that transformers achieve low mean-squared error on data generated via this
process. By probing the transformer's output at inference time, we also show
that transformers typically make predictions that are close to the optimal
predictor. Our experiments also demonstrate that transformers can learn
mixtures of regressions in a sample-efficient fashion and are somewhat robust
to distribution shifts. We complement our experimental observations by proving
constructively that the decision-theoretic optimal procedure is indeed
implementable by a transformer.
Comment: 24 pages, 9 figures
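The "data-driven exponential weights on a finite set of parameters" predictor can be sketched as a posterior mean. Each candidate weight vector βₖ receives a weight proportional to exp(−Σᵢ(yᵢ − ⟨βₖ, xᵢ⟩)² / (2σ²)). The prediction at a query point is the weighted average of the candidates' predictions. This sketch makes several assumptions: a uniform prior over the candidates, Gaussian noise with known σ, and hypothetical function names. The paper's exact generative process (dimension, number of components, noise level) is not reproduced.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def exp_weights_predict(xs: List[Vector], ys: List[float], x_query: Vector,
                        candidates: List[Vector], sigma: float = 1.0) -> float:
    """Posterior-mean prediction with a uniform prior over candidate weight
    vectors and Gaussian noise of known standard deviation sigma."""
    # Gaussian log-likelihood (up to a shared constant) of the data under each candidate.
    log_w = [
        -sum((y - dot(w, x)) ** 2 for x, y in zip(xs, ys)) / (2.0 * sigma ** 2)
        for w in candidates
    ]
    # Subtract the max before exponentiating for numerical stability.
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    # Exponential-weights average of each candidate's prediction at x_query.
    return sum(wt * dot(w, x_query) for wt, w in zip(weights, candidates)) / total


candidates = [[1.0, 0.0], [0.0, 1.0]]
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
ys = [1.0, 3.0, 0.5]  # generated noiselessly by the first candidate
print(exp_weights_predict(xs, ys, [2.0, 5.0], candidates))  # close to 2.0
print(exp_weights_predict([], [], [2.0, 5.0], candidates))  # 3.5, uniform average
```

With no observations the predictor falls back to the prior average. As evidence accumulates, the weights concentrate on the best-fitting candidate.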
Needle Biopsy Accelerates Pro-metastatic Changes and Systemic Dissemination in Breast Cancer: Implications for Mortality by Surgery Delay
Increased breast cancer (BC) mortality risk posed by delayed surgical resection of the tumor after diagnosis is a growing concern, yet the underlying mechanisms remain unknown. Our cohort analyses of early-stage BC patients reveal the emergence of a significantly rising mortality risk when the biopsy-to-surgery interval is extended beyond 53 days. Additionally, histology of post-biopsy tumors shows prolonged retention of a metastasis-permissive wound stroma dominated by M2-like macrophages capable of promoting cancer cell epithelial-to-mesenchymal transition and angiogenesis. We show that needle biopsy promotes systemic dissemination of cancer cells through sustained activation of the COX-2/PGE2/EP2 feedforward loop, which favors M2 polarization and its associated pro-metastatic changes; these changes are abrogated by oral treatment with COX-2 or EP2 inhibitors in estrogen-receptor-positive (ER+) syngeneic mouse tumor models. Therefore, we conclude that needle biopsy of ER+ BC provokes progressive pro-metastatic changes, which may explain the mortality risk posed by surgery delay after diagnosis.
Collaborating to Compete: Blood Profiling Atlas in Cancer (BloodPAC) Consortium
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/136731/1/cpt666.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/136731/2/cpt666_am.pd
Minimum Technical Data Elements for Liquid Biopsy Data Submitted to Public Databases
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/154656/1/cpt1747.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/154656/2/cpt1747-sup-0001-FigS1.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/154656/3/cpt1747_am.pd
Viral escape mutations do not account for non-protection from SIVmac239 challenge in RhCMV/SIV vaccinated rhesus macaques
Simian immunodeficiency virus (SIV) vaccines based upon 68-1 Rhesus Cytomegalovirus (RhCMV) vectors show remarkable protection against pathogenic SIVmac239 challenge. Across multiple independent rhesus macaque (RM) challenge studies, nearly 60% of vaccinated RMs show early, complete arrest of SIVmac239 replication after effective challenge, whereas the remainder show progressive infection similar to controls. Here, we performed viral sequencing to determine whether the failure to control viral replication in non-protected RMs is associated with the acquisition of viral escape mutations. While low-level viral mutations accumulated in all animals by 28 days post-challenge (after viral control was established in protected animals), the dominant circulating virus in virtually all unprotected RMs was nearly identical to the challenge stock, and mutation patterns did not differ between this cohort and unvaccinated controls. These data definitively demonstrate that viral mutation does not explain the lack of viral control in RMs not protected by RhCMV/SIV vaccination. We further demonstrate that during chronic infection, RhCMV/SIV-vaccinated RMs do not acquire escape mutations in epitopes targeted by RhCMV/SIV, but instead display mutations in canonical MHC-Ia epitopes, similar to unvaccinated RMs. This suggests that after the initial failure of viral control, unconventional T cell responses induced by 68-1 RhCMV/SIV vaccination do not exert strong selective pressure on systemically replicating SIV.
Genome-wide transcriptional profiling of peripheral blood leukocytes from cattle infected with Mycobacterium bovis reveals suppression of host immune genes
Background
Mycobacterium bovis is the causative agent of bovine tuberculosis (BTB), a pathological infection with significant economic impact. Recent studies have highlighted the role of functional genomics to better understand the molecular mechanisms governing the host immune response to M. bovis infection. Furthermore, these studies may enable the identification of novel transcriptional markers of BTB that can augment current diagnostic tests and surveillance programmes. In the present study, we have analysed the transcriptome of peripheral blood leukocytes (PBL) from eight M. bovis-infected and eight control non-infected age-matched and sex-matched Holstein-Friesian cattle using the Affymetrix® GeneChip® Bovine Genome Array with 24,072 gene probe sets representing more than 23,000 gene transcripts.
Results
Control and infected animals had similar mean white blood cell counts. However, the mean number of lymphocytes was significantly increased in the infected group relative to the control group (P = 0.001), while the mean number of monocytes was significantly decreased in the BTB group (P = 0.002). Hierarchical clustering analysis using gene expression data from all 5,388 detectable mRNA transcripts unambiguously partitioned the animals according to their disease status. In total, 2,960 gene transcripts were differentially expressed (DE) between the infected and control animal groups (adjusted P-value threshold ≤ 0.05), with the number of gene transcripts showing decreased relative expression (1,563) exceeding the number showing increased relative expression (1,397). Systems analysis using the Ingenuity® Systems Pathway Analysis (IPA) Knowledge Base revealed an over-representation of DE genes involved in the immune response functional category. More specifically, 64.5% of genes in the "affects immune response" subcategory displayed decreased relative expression levels in the infected animals compared to the control group.
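The abstract reports an adjusted P-value threshold of ≤ 0.05 but does not say which multiple-testing adjustment was used. The sketch below shows the Benjamini-Hochberg false-discovery-rate adjustment only as one common choice for microarray differential expression. It is an assumption, not necessarily the study's method, and the P-values in the example are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    The adjusted value at rank k (1 = smallest P) is min over j >= k of
    p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-transcript P-values
adj = benjamini_hochberg(raw)
de_calls = [a <= 0.05 for a in adj]  # DE call at an adjusted threshold of 0.05
print(adj)
print(de_calls)
```

Note that a transcript with raw P = 0.04 is not called DE here once the adjustment accounts for the number of tests.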
Conclusions
This study demonstrates that genome-wide transcriptional profiling of PBL can distinguish active M. bovis-infected animals from control non-infected animals. Furthermore, the results obtained support previous investigations demonstrating that mycobacterial infection is associated with host transcriptional suppression. These data support the use of transcriptomic technologies to enable the identification of robust, reliable transcriptional markers of active M. bovis infection.
This work was supported by Investigator Grants from Science Foundation Ireland (Nos: SFI/01/F.1/B028 and SFI/08/IN.1/B2038), a Research Stimulus Grant from the Department of Agriculture, Fisheries and Food (No: RSF 06 405) and a European Union Framework 7 Project Grant (No: KBBE-211602-MACROSYS). KEK is supported by the Irish Research Council for Science, Engineering and Technology (IRCSET) funded Bioinformatics and Systems Biology PhD Programme http://bioinfo-casl.ucd.ie/PhD
Optimally tackling covariate shift in RKHS-based nonparametric regression
We study the covariate shift problem in the context of nonparametric
regression over a reproducing kernel Hilbert space (RKHS). We focus on two
natural families of covariate shift problems defined using the likelihood
ratios between the source and target distributions. When the likelihood ratios
are uniformly bounded, we prove that the kernel ridge regression (KRR)
estimator with a carefully chosen regularization parameter is minimax
rate-optimal (up to a log factor) for a large family of RKHSs with regular
kernel eigenvalues. Interestingly, KRR does not require full knowledge of the
likelihood ratio apart from an upper bound on it. In striking contrast to the
standard statistical setting without covariate shift, we also demonstrate that
a naïve estimator, which minimizes the empirical risk over the function
class, is strictly suboptimal under covariate shift as compared to KRR. We then
address the larger class of covariate shift problems where the likelihood ratio is
possibly unbounded yet has a finite second moment. Here, we show via careful
simulations that KRR fails to attain the optimal rate. Instead, we propose a
reweighted KRR estimator that weights samples based on a careful truncation of
the likelihood ratios. Again, we are able to show that this estimator is
minimax optimal, up to logarithmic factors.
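The reweighted estimator can be sketched as weighted kernel ridge regression with truncated weights. The fit minimizes (1/n) Σᵢ wᵢ(yᵢ − f(xᵢ))² + λ‖f‖²_H with wᵢ = min(ρ(xᵢ), τ), where ρ is the likelihood ratio. By the representer theorem, f = Σⱼ αⱼ k(·, xⱼ), and α solves (WK + nλI)α = Wy with W = diag(w). This sketch rests on several assumptions: the exact weighted objective, a Gaussian kernel, one-dimensional inputs, likelihood ratios known at the sample points, and hand-picked λ and τ. The paper specifies how the truncation level and regularization are actually chosen, and that choice is not reproduced here.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gaussian_kernel(x: float, z: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


def reweighted_krr(xs: List[float], ys: List[float], ratios: List[float],
                   lam: float, tau: float,
                   bandwidth: float = 1.0) -> Callable[[float], float]:
    """Minimize (1/n) sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2,
    where w_i = min(ratios[i], tau) truncates the likelihood ratios at tau."""
    n = len(xs)
    w = [min(r, tau) for r in ratios]
    K = [[gaussian_kernel(xi, xj, bandwidth) for xj in xs] for xi in xs]
    # f = sum_j alpha_j k(., x_j); a solution of the first-order condition
    # satisfies (W K + n * lam * I) alpha = W y with W = diag(w).
    A = [[w[i] * K[i][j] + (n * lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(A, [w[i] * ys[i] for i in range(n)])
    return lambda x: sum(a * gaussian_kernel(x, xj, bandwidth)
                         for a, xj in zip(alpha, xs))


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
# All ratios 1 (no shift): plain KRR; a tiny lam nearly interpolates the data.
f_plain = reweighted_krr(xs, ys, [1.0, 1.0, 1.0], lam=1e-8, tau=10.0)
print(f_plain(1.0))  # close to 1.0
# A huge ratio at x = 0 is capped at tau = 5, so these two fits coincide.
f_capped = reweighted_krr(xs, ys, [100.0, 1.0, 1.0], lam=0.1, tau=5.0)
f_five = reweighted_krr(xs, ys, [5.0, 1.0, 1.0], lam=0.1, tau=5.0)
print(f_capped(0.5) == f_five(0.5))  # True
```

Truncation keeps a few samples with very large likelihood ratios from dominating the fit, which is the concern when the ratio is unbounded but has a finite second moment.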
