Affine Hirsch foliations on 3-manifolds

Author: Yu Bin
Publication venue: 'Mathematical Sciences Publishers'
Publication date: 24/02/2017

This paper is devoted to discussing affine Hirsch foliations on $3$-manifolds. First, we prove that up to isotopic leaf-conjugacy, every closed orientable $3$-manifold $M$ admits $0$, $1$ or $2$ affine Hirsch foliations. Furthermore, every case is possible. Then, we analyze the $3$-manifolds admitting two affine Hirsch foliations (abbreviated as Hirsch manifolds). On the one hand, we construct Hirsch manifolds by using exchangeable braided links (abbreviated as DEBL Hirsch manifolds); on the other hand, we show that every Hirsch manifold virtually is a DEBL Hirsch manifold. Finally, we show that for every $n\in \mathbb{N}$, there are only finitely many Hirsch manifolds with strand number $n$. Here the strand number of a Hirsch manifold $M$ is a positive integer defined by using strand numbers of braids.

Comment: 30 pages, 4 figures, to appear at Algebr. Geom. Topo

arXiv.org e-Print Archive

Crossref

Comment: Monitoring Networked Applications With Incremental Quantile Estimation

Author: Yu Bin
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 02/08/2007

Comment: Monitoring Networked Applications With Incremental Quantile Estimation [arXiv:0708.0302]

Comment: Published at http://dx.doi.org/10.1214/088342306000000628 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Number of paths versus number of basis functions in American option pricing

Authors: Glasserman Paul, Yu Bin
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2003

An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties in applying standard numerical methods to complex pricing problems have motivated the development of techniques that combine Monte Carlo simulation with dynamic programming. One class of methods approximates the option value at each time using a linear combination of basis functions, and combines Monte Carlo with backward induction to estimate optimal coefficients in each approximation. We analyze the convergence of such a method as both the number of basis functions and the number of simulated paths increase. We get explicit results when the basis functions are polynomials and the underlying process is either Brownian motion or geometric Brownian motion. We show that the number of paths required for worst-case convergence grows exponentially in the degree of the approximating polynomials in the case of Brownian motion and faster in the case of geometric Brownian motion.

Comment: Published at http://dx.doi.org/10.1214/105051604000000846 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
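To make the class of methods described in this abstract concrete, here is a minimal sketch of a regression-based Monte Carlo pricer for a put under geometric Brownian motion. It follows the widely used least-squares (Longstaff-Schwartz style) variant, which may differ in detail from the scheme the paper analyzes. The function names, the choice of a put payoff, the restriction of the regression to in-the-money paths, and the rescaling of prices by the strike are all assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients for the basis 1, x, ..., x^degree (normal equations)."""
    k = degree + 1
    gram = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear(gram, rhs)


def american_put_lsm(s0, strike, rate, sigma, maturity, steps, paths, degree, seed=0):
    """Estimate a put price with exercise allowed at each of `steps` equally spaced dates."""
    rng = random.Random(seed)
    dt = maturity / steps
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-rate * dt)
    # Simulate geometric Brownian motion; path[t] is the price at time (t + 1) * dt.
    grid = []
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        grid.append(path)
    # cash[i] is path i's realized payoff, discounted to the date currently processed.
    cash = [max(strike - path[-1], 0.0) for path in grid]
    for t in range(steps - 2, -1, -1):  # backward induction over exercise dates
        cash = [disc * c for c in cash]
        itm = [i for i in range(paths) if grid[i][t] < strike]
        if len(itm) <= degree:
            continue  # too few points to fit the polynomial
        xs = [grid[i][t] / strike for i in itm]  # rescale for numerical stability
        coef = fit_polynomial(xs, [cash[i] for i in itm], degree)
        for i, x in zip(itm, xs):
            continuation = sum(c * x ** p for p, c in enumerate(coef))
            exercise = strike - grid[i][t]
            if exercise > continuation:
                cash[i] = exercise  # exercising beats the estimated continuation value
    return disc * sum(cash) / paths
```

The two quantities the abstract varies appear here as `degree` (number of basis functions minus one) and `paths` (number of simulated paths). Raising `degree` while holding `paths` fixed makes the normal equations harder to estimate reliably, which is the trade-off the paper quantifies.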

arXiv.org e-Print Archive

CiteSeerX

Crossref

Boosting with early stopping: Convergence and consistency

Authors: Yu Bin, Zhang Tong
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2005

Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with $\epsilon\to 0$ step-size becomes an $L^1$-margin maximizer when left to run to convergence.

Comment: Published at http://dx.doi.org/10.1214/009053605000000255 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
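As an illustration of the ingredients named in this abstract (greedy minimization over the linear span of basis functions, a restricted step size, and early stopping on held-out data), the following is a minimal sketch for squared-error loss. It is not the authors' procedure: the coordinate functions used as basis, the `patience` stopping rule, and all function names are assumptions chosen to keep the example short.

```python
import random


def predict(row, coef):
    """Additive model: a linear combination of the coordinate basis functions."""
    return sum(c * v for c, v in zip(coef, row))


def mse(X, y, coef):
    """Mean squared error of the additive model on (X, y)."""
    return sum((t - predict(row, coef)) ** 2 for row, t in zip(X, y)) / len(y)


def boost_early_stopping(X_tr, y_tr, X_val, y_val, step=0.1, max_iter=500, patience=20):
    """Greedy L2 boosting with a shrunken step, stopped by validation error."""
    n_features = len(X_tr[0])
    coef = [0.0] * n_features
    residual = list(y_tr)
    best_coef, best_err, best_iter = coef[:], mse(X_val, y_val, coef), 0
    for it in range(1, max_iter + 1):
        # Greedy search: pick the basis function whose fit to the residual
        # gives the largest reduction in training squared error.
        best_j, best_gain, best_beta = -1, 0.0, 0.0
        for j in range(n_features):
            sxx = sum(row[j] ** 2 for row in X_tr)
            if sxx == 0.0:
                continue
            sxr = sum(row[j] * r for row, r in zip(X_tr, residual))
            gain = sxr * sxr / sxx
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, sxr / sxx
        if best_j < 0:
            break  # residual is orthogonal to every basis function
        # Restricted step: move only a fraction `step` of the full least-squares update.
        coef[best_j] += step * best_beta
        residual = [r - step * best_beta * row[best_j] for row, r in zip(X_tr, residual)]
        # Early stopping: remember the iterate with the lowest validation error.
        err = mse(X_val, y_val, coef)
        if err < best_err:
            best_coef, best_err, best_iter = coef[:], err, it
        elif it - best_iter >= patience:
            break
    return best_coef, best_iter


def make_data(n, rng):
    """Synthetic regression data: only the first two of five features matter."""
    X = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]
    return X, y
```

A typical call draws a training and a validation set from `make_data` with a seeded `random.Random` and passes both to `boost_early_stopping`, which returns the coefficients at the best validation iterate together with the iteration at which that iterate occurred.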

arXiv.org e-Print Archive

Crossref

Hong Kong University of Science and Technology Institutional Repository

Impact of regularization on Spectral Clustering

Authors: Joseph Antony, Yu Bin
Publication date: 01/01/2014

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. (2012). Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter $\tau$ is large we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a `bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigen gap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique \textit{DKest} (standing for estimated Davis-Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.

Comment: 37 page
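For orientation, one common form of the regularization discussed in this abstract can be written as follows. This is an assumption, since the abstract itself does not state the formula: $A$ denotes the adjacency matrix of the $n$-node graph, $\mathbf{1}$ the all-ones vector, and $\tau \ge 0$ the regularization parameter.

```latex
A_\tau = A + \frac{\tau}{n}\,\mathbf{1}\mathbf{1}^{\top}, \qquad
D_\tau = \operatorname{diag}\!\left(A_\tau \mathbf{1}\right), \qquad
L_\tau = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}.
```

Under this form every node's degree increases by exactly $\tau$, so low-degree nodes no longer dominate $D_\tau^{-1/2}$. That is one way to read why the minimum degree condition can be relaxed when $\tau$ is large.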

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data

Authors: Barter Rebecca L, Yu Bin
Publication date: 26/01/2017

The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics for visualizing large gene expression datasets, they remain a severely underutilized visualization tool in modern data analysis. In this paper we introduce superheat, a new R package that provides an extremely flexible and customizable platform for visualizing large datasets using extendable heatmaps. Superheat enhances the traditional heatmap by providing a platform to visualize a wide range of data types simultaneously, adding to the heatmap a response variable as a scatterplot, model results as boxplots, correlation information as barplots, text information, and more. Superheat allows the user to explore their data to greater depths and to take advantage of the heterogeneity present in the data to inform analysis decisions. The goal of this paper is two-fold: (1) to demonstrate the potential of the heatmap as a default visualization method for a wide range of data types using reproducible examples, and (2) to highlight the customizability and ease of implementation of the superheat package in R for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three case studies, each based on publicly available data sources and accompanied by a file outlining the step-by-step analytic pipeline (with code).

Comment: 26 pages, 10 figure

arXiv.org e-Print Archive

Crossref

eScholarship - University of California