Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors
We present a scalable Bayesian model for low-rank factorization of massive
tensors with binary observations. The proposed model has the following key
properties: (1) in contrast to the models based on the logistic or probit
likelihood, using a zero-truncated Poisson likelihood for binary data allows
our model to scale up in the number of \emph{ones} in the tensor, which is
especially appealing for massive but sparse binary tensors; (2)
side-information in the form of binary pairwise relationships (e.g., an adjacency
network) between objects in any tensor mode can also be leveraged, which can be
especially useful in "cold-start" settings; and (3) the model admits simple
Bayesian inference via batch, as well as \emph{online} MCMC; the latter allows
scaling up even for \emph{dense} binary data (i.e., when the number of ones in
the tensor/network is also massive). In addition, non-negative factor matrices
in our model provide easy interpretability, and the tensor rank can be inferred
from the data. We evaluate our model on several large-scale real-world binary
tensors, achieving excellent computational scalability, and also demonstrate
its usefulness in leveraging side-information provided in the form of
mode-network(s).
Comment: UAI (Uncertainty in Artificial Intelligence) 201
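The scaling claim in property (1) can be made concrete with a minimal sketch. Under a zero-truncated-Poisson-style link, P(y = 1) = 1 - exp(-λ), with λ given by a non-negative CP factorization, the log-likelihood term over the zeros collapses to a closed-form sum, so only the coordinates of the ones need to be enumerated. The function name and parameterization below are illustrative, not the paper's exact model:

```python
import numpy as np

def binary_cp_loglik(ones_idx, U, V, W):
    """Log-likelihood of a binary tensor under P(y=1) = 1 - exp(-lambda),
    lambda_ijk = sum_r U[i,r] V[j,r] W[k,r], with non-negative factors.
    ones_idx: (N, 3) int array of the coordinates where y = 1."""
    i, j, k = ones_idx.T
    # Rates only at the ones -- cost scales with the number of ones:
    lam_ones = np.einsum('nr,nr,nr->n', U[i], V[j], W[k])
    # Sum of lambda over *all* I*J*K entries, in O(rank*(I+J+K)) time:
    lam_total = np.dot(U.sum(0) * V.sum(0), W.sum(0))
    # log P(y=1) at the ones; each zero contributes log P(y=0) = -lambda:
    return np.sum(np.log1p(-np.exp(-lam_ones))) - (lam_total - lam_ones.sum())
```

The key point is the second line: because the zeros' contribution is linear in λ, summing λ over the full tensor factorizes into per-mode column sums, which is what lets the method ignore the (massive) set of zeros entirely.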
Mammalian DNA2 helicase/nuclease cleaves G-quadruplex DNA and is required for telomere integrity
Efficient and faithful replication of telomeric DNA is critical for maintaining genome integrity. The G-quadruplex (G4) structure arising in the repetitive TTAGGG sequence is thought to stall replication forks, impairing efficient telomere replication and leading to telomere instabilities. However, pathways modulating telomeric G4 are poorly understood, and it is unclear whether defects in these pathways contribute to genome instabilities in vivo. Here, we report that mammalian DNA2 helicase/nuclease recognizes and cleaves telomeric G4 in vitro. Consistent with DNA2’s role in removing G4, DNA2 deficiency in mouse cells leads to telomere replication defects, elevating the levels of fragile telomeres (FTs) and sister telomere associations (STAs). Such telomere defects are enhanced by stabilizers of G4. Moreover, DNA2 deficiency induces telomere DNA damage and chromosome segregation errors, resulting in tetraploidy and aneuploidy. Consequently, DNA2-deficient mice develop aneuploidy-associated cancers containing dysfunctional telomeres. Collectively, our genetic, cytological, and biochemical results suggest that mammalian DNA2 reduces replication stress at telomeres, thereby preserving genome stability and suppressing cancer development, and that this may involve, at least in part, nucleolytic processing of telomeric G4.
Topic-Based Embeddings for Learning from Large Knowledge Graphs
We present a scalable probabilistic framework for learning from multi-relational data, given in the form of entity-relation-entity triplets, with a potentially massive number of entities and relations (e.g., in multi-relational networks, knowledge bases, etc.). We define each triplet via a relation-specific bilinear function of the embeddings of the entities associated with it (these embeddings correspond to "topics"). To handle a massive number of relations and the data-sparsity problem (very few observations per relation), we also extend this model to allow sharing of parameters across relations, which leads to a substantial reduction in the number of parameters to be learned. In addition to yielding excellent predictive performance (e.g., for knowledge-base completion tasks), the interpretability of our topic-based embedding framework enables easy qualitative analyses. The computational cost of our models scales with the number of positive triplets, which makes it easy to scale to massive real-world multi-relational data sets, which are usually extremely sparse. We develop simple-to-implement batch as well as online Gibbs sampling algorithms and demonstrate the effectiveness of our models on tasks such as multi-relational link prediction and learning from large knowledge bases.
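The two modeling ideas above, a relation-specific bilinear score over entity embeddings and parameter sharing across relations, can be sketched in a few lines. All names, shapes, and the mixture-of-bases form of the sharing are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 3, 4
A = rng.random((n_entities, dim))          # entity ("topic") embeddings
Wr = rng.random((n_relations, dim, dim))   # one bilinear matrix per relation

def triplet_score(s, r, o):
    """Bilinear score of the triplet (subject s, relation r, object o)."""
    return A[s] @ Wr[r] @ A[o]

# One way to share parameters across relations (to fight data sparsity) is to
# express each relation's matrix as a weighted mixture of a few shared bases:
n_bases = 2
B = rng.random((n_bases, dim, dim))                        # shared base matrices
alpha = rng.dirichlet(np.ones(n_bases), size=n_relations)  # per-relation weights
Wr_shared = np.einsum('rb,bij->rij', alpha, B)
```

With sharing, the per-relation cost drops from one dim×dim matrix to a handful of mixture weights, which is what makes very sparse relations learnable.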
Non-negative Matrix Factorization for Discrete Data with Hierarchical Side-Information
We present a probabilistic framework for efficient non-negative matrix factorization of discrete (count/binary) data with side-information. The side-information is given as a multi-level structure, taxonomy, or ontology, with nodes at each level being categorical-valued observations. For example, when modeling documents with two-level side-information (documents being at level-zero), level-one may represent the (one or more) authors associated with each document and level-two may represent the affiliations of each author. The model easily generalizes to more than two levels (or a taxonomy/ontology of arbitrary depth). Our model can learn embeddings of the entities present at each level in the data/side-information hierarchy (e.g., documents, authors, and affiliations in the previous example), with appropriate sharing of information across levels. The model also enjoys full local conjugacy, facilitating efficient Gibbs sampling for model inference. The inference cost scales with the number of non-zero entries in the data matrix, which is especially appealing for real-world massive but sparse matrices. We demonstrate the effectiveness of the model on several real-world data sets.
