478 research outputs found

Minimax rates of convergence for nonparametric location-scale models

Author: Yang Yuhong
Zhao Bingxin
Publication date: 03/07/2023

This paper studies minimax rates of convergence for nonparametric location-scale models, which include mean, quantile and expectile regression settings. Under Hellinger differentiability on the error distribution and other mild conditions, we show that the minimax rate of convergence for estimating the regression function under the squared L_2 loss is determined by the metric entropy of the nonparametric function class. Different error distributions, including asymmetric Laplace distribution, asymmetric connected double truncated gamma distribution, connected normal-Laplace distribution, Cauchy distribution and asymmetric normal distribution are studied as examples. Applications on low order interaction models and multiple index models are also given.

arXiv.org e-Print Archive

Student Recital

Author: Yang Bingxin Jillian
Zhang Mengmeng
Publication venue: Scholarship@Western
Publication date: 27/03/2017

Scholarship@Western

Student Recital

Author: Cho Brian
Yang Jillian Bingxin
Publication venue: Scholarship@Western
Publication date: 06/04/2021

Scholarship@Western

Recent developments in measurement and tracking of the APS storage ring beam emittance

Author: Bingxin Yang
Publication venue: 'AIP Publishing'
Publication date: 01/01/2003

Crossref

Optical System Design for High-Energy Particle Beam Diagnostics

Author: Bingxin Yang
Publication venue: 'AIP Publishing'
Publication date: 01/01/2003

Crossref

High accuracy momentum compaction measurement for the APS storage ring with undulator radiation

Author: Bingxin Yang
Publication venue: 'AIP Publishing'
Publication date: 01/01/2003

Crossref

Student Recital

Author: Cho Brian
Yang Jillian Bingxin
Zhang Mengmeng
Publication venue: Scholarship@Western
Publication date: 09/04/2019

Scholarship@Western

ProtSolM: Protein Solubility Prediction with Multi-modal Features

Author: Hong Liang
Tan Yang
Zheng Jia
Zhou Bingxin
Publication date: 28/06/2024

Understanding the solubility of proteins is essential for their functional applications. Computational methods for predicting protein solubility are crucial for reducing experimental costs and enhancing the efficiency and success rates of protein engineering. Existing methods either construct a supervised learning scheme on small-scale datasets with manually processed physicochemical properties, or blindly apply pre-trained protein language models to extract amino acid interaction information. The scale and quality of available training datasets leave significant room for improvement in terms of accuracy and generalization. To address these research gaps, we propose ProtSolM, a novel deep learning method that combines pre-training and fine-tuning schemes for protein solubility prediction. ProtSolM integrates information from multiple dimensions, including physicochemical properties, amino acid sequences, and protein backbone structures. Our model is trained using PDBSol, the largest solubility dataset that we have constructed. PDBSol includes over 60,000 protein sequences and structures. We provide a comprehensive leaderboard of existing statistical learning and deep learning methods on independent datasets with computational and experimental labels. ProtSolM achieved state-of-the-art performance across various evaluation metrics, demonstrating its potential to significantly advance the accuracy of protein solubility prediction.

Comment: 10 pages, 7 figures, 9 tables

arXiv.org e-Print Archive

CNETML: Maximum likelihood inference of phylogeny from copy number profiles of spatio-temporal samples

Author: Barnes Chris
Curtius Kit
Graham Trevor
Lu Bingxin
Yang Ziheng
Publication venue: bioRxiv
Publication date: 20/03/2022

Phylogenetic trees based on copy number alterations (CNAs) for multi-region samples of a single cancer patient are helpful to understand the spatio-temporal evolution of cancers, especially in tumours driven by chromosomal instability. Due to the high cost of deep sequencing data, low-coverage data are more accessible in practice, which only allow the calling of (relative) total copy numbers due to the lower resolution. However, methods to reconstruct sample phylogenies from CNAs often use allele-specific copy numbers, and those using total copy numbers are mostly distance matrix or maximum parsimony methods, which do not handle temporal data or estimate mutation rates. In this work, we developed a new maximum likelihood method based on a novel evolutionary model of CNAs, CNETML, to infer phylogenies from spatio-temporal samples taken within a single patient. CNETML is the first program to jointly infer the tree topology, node ages, and mutation rates from total copy numbers when samples were taken at different time points. Our extensive simulations suggest CNETML performed well even on relative copy numbers with subclonal whole genome doubling events and under slight violation of model assumptions. The application of CNETML to real data from Barrett’s esophagus patients also generated results consistent with previous discoveries and identified novel early CNAs for further investigation.

UCL Discovery

Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance?

Author: Hong Liang
Tan Yang
Zheng Lirong
Zhong Bozitao
Zhou Bingxin
Publication date: 28/06/2024

Deep learning has become a crucial tool in studying proteins. While the significance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input as a default operation for many inference tasks. This study demonstrates with a structure alignment task that embedding amino acid types may, in some cases, not help a deep learning model learn a better representation. To this end, we propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation. The effectiveness of ProtLOCA is examined by a global structure-matching task on protein pairs with an independent test dataset based on CATH labels. Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains. Furthermore, in local structure pairing tasks, ProtLOCA for the first time provides a valid solution to highlight common local structures among proteins with different overall structures but the same function. This suggests a new possibility for using deep learning methods to analyze protein structure to infer function.

Comment: 8 pages, 4 figures

arXiv.org e-Print Archive