874 research outputs found

    CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

    In this paper, we study the problem of learning image classification models with label noise. Existing approaches that depend on human supervision are generally not scalable, as manually identifying correct or incorrect labels is time-consuming, whereas approaches that do not rely on human supervision are scalable but less effective. To reduce the amount of human supervision needed for label noise cleaning, we introduce CleanNet, a joint neural embedding network that requires only a fraction of the classes to be manually verified in order to provide knowledge of label noise that can be transferred to the other classes. We further integrate CleanNet and a conventional convolutional neural network classifier into one framework for image classification learning. We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the image classification on noisy data task, using several large-scale datasets. Experimental results show that CleanNet reduces the label noise detection error rate on held-out classes, where no human supervision is available, by 41.5% compared to current weakly supervised methods. On an image classification task, it also achieves 47% of the performance gain of verifying all images while verifying only 3.2% of them. Source code and dataset will be available at kuanghuei.github.io/CleanNetProject. Comment: Accepted to CVPR 201
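    The abstract describes a joint embedding that compares a query image against a small reference set for its class and flags the label as noisy when the two embeddings disagree. A minimal PyTorch sketch of that idea follows; the module names, feature dimensions, mean-pooling of the reference set, and the similarity threshold are all illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a CleanNet-style label-noise check (not the authors' code).
# Idea: embed a query image and a class-level reference set in a shared space,
# then treat low cosine similarity between the two embeddings as evidence that
# the query's class label is noisy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingSketch(nn.Module):
    def __init__(self, feat_dim=512, emb_dim=128):
        super().__init__()
        # Query branch: projects a pre-extracted CNN feature into the shared space.
        self.query_proj = nn.Linear(feat_dim, emb_dim)
        # Reference branch: pools a set of per-class reference features, then projects.
        self.ref_proj = nn.Linear(feat_dim, emb_dim)

    def forward(self, query_feat, ref_feats):
        # query_feat: (B, feat_dim); ref_feats: (B, K, feat_dim) reference images per class
        q = F.normalize(self.query_proj(query_feat), dim=-1)
        r = F.normalize(self.ref_proj(ref_feats.mean(dim=1)), dim=-1)
        return (q * r).sum(dim=-1)          # cosine similarity in [-1, 1]

# Usage sketch: flag labels whose similarity falls below some verified threshold.
model = JointEmbeddingSketch()
sim = model(torch.randn(4, 512), torch.randn(4, 8, 512))
is_noisy = sim < 0.1                        # threshold is a placeholder assumption
```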

    Several Issues on Hieroglyph of Naxi Ethnic Minority

    The hieroglyph of the Naxi ethnic minority is a picture script and is so far the only "living hieroglyph". Naxi Hieroglyph is the general name for Dongba Script, Geba Script, Malimasha Script, and Ruanke Script. Moreover, the creation of Naxi Hieroglyph is closely related to the migration routes of the Naxi ancestors, which correspond with the dialect areas of the Naxi language, and its creation can date back to the 11th century. Geba Script, which is based on Dongba Script, was created through contact with foreign cultures and carries the characteristics of Chinese and Tibetan writing.

    Thorium-doping induced superconductivity up to 56 K in Gd1-xThxFeAsO

    Following the discovery of superconductivity in the iron-based arsenide LaO1-xFxFeAs with a superconducting transition temperature (Tc) of 26 K[1], Tc was surprisingly pushed up to above 40 K by either applying pressure[2] or replacing La with Sm[3], Ce[4], Nd[5] and Pr[6]. The maximum Tc has climbed to 55 K, observed in SmO1-xFxFeAs[7, 8] and SmFeAsO1-x[9]. The value of Tc was found to increase with decreasing lattice parameters in LnFeAsO1-xFx (Ln stands for the lanthanide elements) at an apparently optimal doping level. However, F-doping in GdFeAsO is particularly difficult[10,11] due to the lattice mismatch between the Gd2O2 layers and the Fe2As2 layers. Here we report the observation of superconductivity with Tc as high as 56 K obtained by substituting Th4+ for Gd3+ in GdFeAsO. The incorporation of the relatively large Th4+ ions relaxes the lattice mismatch and hence induces the high-temperature superconductivity. Comment: 4 pages, 3 figures

    Doubly Robust Conditional Independence Testing with Generative Neural Networks

    This article addresses the problem of testing the conditional independence of two generic random vectors X and Y given a third random vector Z, which plays an important role in statistical and machine learning applications. We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions but instead requires sampling from the two marginal conditional distributions of X given Z and Y given Z. We further propose using a generative neural network (GNN) framework to sample from these approximated marginal conditional distributions, which tends to mitigate the curse of dimensionality due to its adaptivity to any low-dimensional structures and smoothness underlying the data. Theoretically, our test statistic is shown to enjoy a doubly robust property against GNN approximation errors, meaning that the test statistic retains all desirable properties of the oracle test statistic utilizing the true marginal conditional distributions, as long as the product of the two approximation errors decays to zero faster than the parametric rate. Asymptotic properties of our statistic and the consistency of a bootstrap procedure are derived under both null and local alternatives. Extensive numerical experiments and real data analysis illustrate the effectiveness and broad applicability of our proposed test.
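    As a rough illustration of the sampling-based testing idea described above, the sketch below resamples X from an (approximate) conditional sampler of X given Z and compares a dependence statistic against its resampled null distribution. The absolute-correlation statistic, the helper names, and the Gaussian stand-in sampler are assumptions for illustration; the paper's doubly robust statistic and bootstrap procedure are not reproduced here.

```python
# Hypothetical sketch of a sampling-based conditional independence test (not the
# paper's exact statistic): draw X* from an approximate conditional sampler of
# X given Z and compare a dependence statistic T(X, Y) against its resampled
# null distribution T(X*, Y).
import numpy as np

def dependence_stat(x, y):
    # Simple placeholder statistic: absolute sample correlation.
    return abs(np.corrcoef(x, y)[0, 1])

def ci_test(x, y, z, sample_x_given_z, n_resamples=500, rng=None):
    """p-value for H0: X independent of Y given Z.

    sample_x_given_z(z, rng) must return a draw from an (approximate)
    conditional distribution of X given Z -- e.g. a generative neural
    network trained on (X, Z) pairs.
    """
    rng = rng or np.random.default_rng(0)
    t_obs = dependence_stat(x, y)
    t_null = np.array([
        dependence_stat(sample_x_given_z(z, rng), y) for _ in range(n_resamples)
    ])
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)

# Toy usage with a Gaussian stand-in for the learned conditional sampler.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
x = z + 0.1 * rng.normal(size=200)
y = z + 0.1 * rng.normal(size=200)          # X is independent of Y given Z here
p = ci_test(x, y, z, lambda z, r: z + 0.1 * r.normal(size=z.shape))
print(f"p-value: {p:.3f}")
```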

    Nuclear tunneling effects of charge transport in rubrene, tetracene, and pentacene

    The mechanism of charge transport in organic materials is still controversial from both experimental and theoretical perspectives. At room temperature, molecular deformations interact strongly with the charge carrier through both intermolecular and intramolecular phonons, suggesting a thermally activated hopping mechanism as described by the Marcus electron transfer theory. However, several experimental measurements have indicated that the electronic transport behaves in a "bandlike" manner, as indicated by a decrease in mobility with increasing temperature, in contradiction to the Marcus description. Bandlike first-principles calculations based on the Holstein-Peierls model tend to overestimate the charge mobility by about 2 orders of magnitude. Here, a hopping model is derived that not only quantitatively describes the charge mobility but also explains the observed bandlike behavior. This model uses the quantum version of charge-transfer theory coupled with a random-walk simulation of charge diffusion. The results bridge the gap between the two extreme mechanisms. This first-principles method predicts the room-temperature hole mobilities to be 2.4, 2.0, and 0.67 cm^2/(V s) for rubrene, pentacene, and tetracene, respectively, in good agreement with experiment.
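    To make the hopping picture concrete, the sketch below estimates a hole mobility from a hopping rate via the Einstein relation. It uses the classical Marcus rate rather than the quantum (nuclear-tunneling) rate the paper derives, and the transfer integral, reorganization energy, and hopping distance are placeholder values of a plausible order of magnitude, not first-principles inputs.

```python
# Illustrative back-of-the-envelope hopping-mobility estimate (classical Marcus
# rate + Einstein relation). All parameter values below are assumptions chosen
# only to be of a reasonable order of magnitude for an organic crystal.
import numpy as np

HBAR = 6.582e-16      # reduced Planck constant, eV*s
KB = 8.617e-5         # Boltzmann constant, eV/K

def marcus_rate(V, lam, T):
    """Classical Marcus hopping rate (1/s) at zero driving force.
    V: electronic transfer integral (eV); lam: reorganization energy (eV)."""
    return (2 * np.pi / HBAR) * V**2 / np.sqrt(4 * np.pi * lam * KB * T) \
        * np.exp(-lam / (4 * KB * T))

def hopping_mobility(V, lam, a_cm, T=300.0):
    """1D Einstein-relation mobility estimate in cm^2/(V s)."""
    k = marcus_rate(V, lam, T)      # hops per second
    D = 0.5 * k * a_cm**2           # 1D diffusion constant, cm^2/s
    # Einstein relation mu = e*D/(k_B*T); with energies in eV, k_B*T/e is in volts.
    return D / (KB * T)

# Placeholder parameters: 50 meV coupling, 0.2 eV reorganization, 7 A hop distance.
print(f"mu ~ {hopping_mobility(V=0.05, lam=0.2, a_cm=7e-8):.2f} cm^2/(V s)")
```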

    Large Search Model: Redefining Search Stack in the Era of LLMs

    Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems. Comment: SIGIR Forum, Vol. 57 No. 2 - December 202
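    A toy sketch of the unified-stack idea follows: each search component becomes a natural-language prompt filled from a template, and a single text-in/text-out model produces that component's output. The prompt templates, function names, and the stub generator are assumptions for illustration, not the paper's system.

```python
# Hypothetical sketch of "one LLM, many search tasks via prompts" (not the
# paper's implementation): each stack component is a prompt template, and the
# single model's generated text is the component's result.
TEMPLATES = {
    "query_rewrite": "Rewrite the search query to be clearer.\nQuery: {query}\nRewritten query:",
    "rerank": "Query: {query}\nPassage: {passage}\nIs this passage relevant? Answer yes or no:",
    "answer": "Answer the question using only the passage.\nQuestion: {query}\nPassage: {passage}\nAnswer:",
}

def run_search_task(task, generate, **fields):
    """`generate` is any text-in/text-out LLM callable (an assumption here)."""
    prompt = TEMPLATES[task].format(**fields)
    return generate(prompt)

# Usage with a stub generator standing in for a real LLM.
fake_llm = lambda prompt: "(model output for: " + prompt.splitlines()[0] + ")"
print(run_search_task("query_rewrite", fake_llm, query="cheap flights nyc la"))
print(run_search_task("rerank", fake_llm, query="iron-based superconductors",
                      passage="GdFeAsO doped with Th shows Tc of 56 K."))
```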

    Improving Text Embeddings with Large Language Models

    In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets that are often constrained by task diversity and language coverage. We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages. We then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, our model sets new state-of-the-art results on the BEIR and MTEB benchmarks. Comment: Accepted by ACL 202
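    The fine-tuning step described above relies on a standard contrastive loss over (query, positive) embedding pairs. The sketch below shows a conventional in-batch InfoNCE formulation; the temperature value and the use of in-batch negatives are common defaults assumed here, not necessarily the paper's exact recipe.

```python
# Minimal sketch of a standard in-batch contrastive (InfoNCE) loss over
# (query, passage) embedding pairs, as typically used when fine-tuning an
# embedding model.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.02):
    """q_emb, p_emb: (B, D) embeddings; row i of p_emb is the positive for row i of q_emb."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.T / temperature              # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are the positives

# Toy usage with random embeddings standing in for LLM outputs.
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```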