
1,777 research outputs found

Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

Authors: Cheng Can, Li Bing, Li Zengyang, Liang Peng
Publication venue: 'KSI Research Inc.'
Publication date: 08/05/2018

Hosting over 10 million software projects, GitHub is one of the most important data sources for studying the behavior of developers and software projects. However, as open source datasets have grown, so have the potential threats to mining them. As a dataset grows, it becomes increasingly unrealistic for humans to confirm the quality of all samples. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (e.g., finding development projects) require human intervention. When the amount of data to be processed increases, these semi-automatic solutions become less useful, since the effort needed for human intervention is far beyond what is affordable. To solve this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) we provide a simple model to automatically select samples (with 0.827 precision and 0.947 recall); (2) we also offer a complex model to help researchers carefully screen samples (requiring 63.2% less effort than manually confirming all samples, while achieving 0.926 precision and 0.959 recall).

Comment: Accepted by the SEKE2018 Conference

arXiv.org e-Print Archive

Crossref

Navigation Objects Extraction for Better Content Structure Understanding

Authors: Bu Jiajun, Li Bangpeng, Peng Zilun, Wang Can, Zhao Kui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/08/2017

Existing work on extracting navigation objects from webpages focuses on navigation menus, so as to reveal the information architecture of the site. However, Web 2.0 sites such as social networks and e-commerce portals are making the content structure of a web site increasingly difficult to understand. Dynamic and personalized elements in a webpage, such as top stories and recommended lists, are vital to understanding the dynamic nature of Web 2.0 sites. To better understand the content structure of Web 2.0 sites, in this paper we propose a new extraction method for navigation objects in a webpage. Our method extracts not only the static navigation menus, but also the dynamic and personalized page-specific navigation lists. Since the navigation objects in a webpage naturally come in blocks, we first cluster hyperlinks into different blocks by exploiting the spatial locations of hyperlinks, the hierarchical structure of the DOM tree, and the hyperlink density. We then identify navigation objects from those blocks using an SVM classifier with novel features such as anchor text lengths. Experiments on real-world data sets with webpages from various domains and styles verified the effectiveness of our method.

Comment: 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI

arXiv.org e-Print Archive

Crossref

A Raman-Heterodyne Study of the Hyperfine Interaction of the Optically-Excited State $^5$D$_0$ of Eu$^{3+}$:Y$_2$SiO$_5$

Authors: Guo Guang-Can, Han Yong-Jian, Hu Jun, Hua Yi-Lin, Li Chuan-Feng, Li Pei-Yun, Li Xue, Li Zong-Feng, Liang Peng-Jun, Liu Chao, Liu Xiao, Ma Yu, Tu Tao, Xiao Yi-Xin, Yang Tian-Shu, Zhou Zong-Quan
Publication venue: 'Elsevier BV'
Publication date: 28/01/2018

The spin coherence time of $^{151}$Eu$^{3+}$ which substitutes the yttrium at site 1 in Y$_2$SiO$_5$ crystal has been extended to 6 hours in a recent work [Nature 517, 177 (2015)]. To make this long-lived spin coherence useful for optical quantum memory applications, we experimentally characterize the hyperfine interaction of the optically-excited state $^5$D$_0$ using Raman-heterodyne-detected nuclear magnetic resonance. The effective spin Hamiltonians for excited and ground state are fitted based on the experimental spectra obtained in 200 magnetic fields with various orientations. To show the correctness of the fitted parameters and potential application in quantum memory protocols, we also characterize the ground-state hyperfine interaction and predict the critical magnetic field which produces the 6-hour-long coherence time. The complete energy level structure for both the $^7$F$_0$ ground state and $^5$D$_0$ excited state at the critical magnetic field are obtained. These results enable the design of quantum memory protocols and the optimization of optical pumping strategy for realization of photonic quantum memory with hour-long lifetime.

arXiv.org e-Print Archive

Crossref