
    Getting Aligned on Representational Alignment

    Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.
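The abstract asks how to measure whether two systems' representations agree. One widely used measure is linear centered kernel alignment (CKA). This is our own illustrative choice and is not taken from the paper. Below is a minimal pure-Python sketch of it. It assumes both systems were probed with the same stimuli, so `X` (stimuli x features of system A) and `Y` (stimuli x features of system B) share rows. The helper names are hypothetical.

```python
import math

def _center(M):
    """Subtract each column's mean so every feature has zero mean across stimuli."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in M]

def _cross(A, B):
    """Return A^T B for row-major matrices A (n x p) and B (n x q)."""
    return [[sum(A[k][i] * B[k][j] for k in range(len(A)))
             for j in range(len(B[0]))]
            for i in range(len(A[0]))]

def _frob_sq(M):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(v * v for row in M for v in row)

def linear_cka(X, Y):
    """Linear CKA between two response matrices that share rows (stimuli).

    CKA = ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F), with Xc, Yc column-centered.
    The result lies in [0, 1] and is unchanged by orthogonal transforms or
    isotropic rescaling of either representation.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must describe the same stimuli (same number of rows)")
    Xc, Yc = _center(X), _center(Y)
    numerator = _frob_sq(_cross(Yc, Xc))
    denominator = math.sqrt(_frob_sq(_cross(Xc, Xc)) * _frob_sq(_cross(Yc, Yc)))
    return numerator / denominator
```

Comparing a representation with a rotated or rescaled copy of itself gives a CKA of 1. Because of this invariance, CKA suits systems whose feature axes are arbitrary, such as two networks or two brains. It is only one of many measures the alignment literature uses.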

    Effect of missing data on multitask prediction methods

    There has been growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models that predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We used two complete data sets to simulate sparseness by removing data from the training set, and compared different models for removing the data. These sparse sets were used to train two different multitask methods: deep neural networks and Macau, a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease caused by missing data is small at first but accelerates once large amounts of data are removed. This work provides a first approximation of how much data is required to produce good performance in multitask prediction exercises.
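The study simulates sparseness by deleting values from a complete compound x target matrix. The abstract does not say which removal models were compared. The sketch below assumes the simplest one, uniform random removal of a fixed fraction of observed cells, and marks missing values with `None`. The function name `sparsify` is hypothetical.

```python
import random

def sparsify(matrix, fraction_removed, seed=0):
    """Return a copy of a compound x target matrix with a fraction of observed cells set to None.

    Assumption: cells are removed uniformly at random. This is one possible
    removal model, not necessarily one used in the study.
    """
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must be between 0 and 1")
    # Only cells that currently hold a value are candidates for removal.
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, value in enumerate(row) if value is not None]
    rng = random.Random(seed)  # seeded so each simulated sparsity level is reproducible
    k = round(fraction_removed * len(observed))
    removed = set(rng.sample(observed, k))
    # Build a new matrix so the complete original data set is left untouched.
    return [[None if (i, j) in removed else value for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]
```

You can sweep `fraction_removed` from 0 toward 1, train a multitask model on each sparsified copy, and score it on held-out complete data. Plotting the scores against the fraction removed shows how performance degrades as data disappear.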

    Low Data Drug Discovery with One-Shot Learning

    Comparative study of multitask toxicity modeling on a broad chemical space.

    Acute toxicity is one of the most challenging properties to predict purely with computational methods because of its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, and it is questionable whether knowledge transfer between end points is possible. We performed a comparative study of multitask toxicity prediction for a broad chemical space using different descriptors and modeling algorithms, and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research shows that multitask learning can substantially improve the quality of acute toxicity modeling, and it raises the question of whether multitask approaches should be used for regulatory purposes.
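A compound in a multi-end-point toxicity data set typically has measured values for only some of the end points (species and types of administration). This is an assumption; the abstract does not state it. Multitask training must therefore ignore the end points that were never measured. Below is a minimal sketch of a masked mean squared error. It uses `None` for unmeasured end points, and the function name `masked_mse` is hypothetical. It is not presented as the loss used in the study.

```python
def masked_mse(predictions, labels):
    """Mean squared error over observed (compound, end point) pairs only.

    predictions: one list of predicted values per compound, one value per end point.
    labels: same shape, with None wherever that end point was not measured.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:
                continue  # unmeasured end point: contributes no error and no count
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels to score against")
    return total / count
```

Averaging only over observed pairs lets every compound contribute to the end points it was tested on. Sharing one model across end points is what permits knowledge transfer between them.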