
    Affine Hirsch foliations on 3-manifolds

    This paper is devoted to discussing affine Hirsch foliations on 3-manifolds. First, we prove that up to isotopic leaf-conjugacy, every closed orientable 3-manifold $M$ admits 0, 1 or 2 affine Hirsch foliations. Furthermore, every case is possible. Then, we analyze the 3-manifolds admitting two affine Hirsch foliations (abbreviated as Hirsch manifolds). On the one hand, we construct Hirsch manifolds by using exchangeable braided links (abbreviated as DEBL Hirsch manifolds); on the other hand, we show that every Hirsch manifold virtually is a DEBL Hirsch manifold. Finally, we show that for every $n\in \mathbb{N}$, there are only finitely many Hirsch manifolds with strand number $n$. Here the strand number of a Hirsch manifold $M$ is a positive integer defined by using strand numbers of braids. Comment: 30 pages, 4 figures, to appear at Algebr. Geom. Topo

    Comment: Monitoring Networked Applications With Incremental Quantile Estimation

    Comment: Monitoring Networked Applications With Incremental Quantile Estimation [arXiv:0708.0302]. Comment: Published at http://dx.doi.org/10.1214/088342306000000628 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Number of paths versus number of basis functions in American option pricing

    An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties in applying standard numerical methods to complex pricing problems have motivated the development of techniques that combine Monte Carlo simulation with dynamic programming. One class of methods approximates the option value at each time using a linear combination of basis functions, and combines Monte Carlo with backward induction to estimate optimal coefficients in each approximation. We analyze the convergence of such a method as both the number of basis functions and the number of simulated paths increase. We get explicit results when the basis functions are polynomials and the underlying process is either Brownian motion or geometric Brownian motion. We show that the number of paths required for worst-case convergence grows exponentially in the degree of the approximating polynomials in the case of Brownian motion and faster in the case of geometric Brownian motion. Comment: Published at http://dx.doi.org/10.1214/105051604000000846 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
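
The class of methods described in this abstract can be illustrated with a minimal sketch. It assumes a Bermudan put on geometric Brownian motion, a polynomial basis, and a plain least-squares fit at each exercise date. The function names (`solve`, `fit_polynomial`, `price_bermudan_put`) and every parameter value are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients for the basis 1, x, ..., x^degree."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


def price_bermudan_put(s0=1.0, strike=1.0, rate=0.05, sigma=0.2, maturity=1.0,
                       steps=10, n_paths=2000, degree=2, seed=0):
    """Regression-based backward induction (illustrative parameters only)."""
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-rate * dt)
    # Simulate geometric Brownian motion paths.
    paths = []
    for _ in range(n_paths):
        s, path = s0, [s0]
        for _ in range(steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((rate - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            path.append(s)
        paths.append(path)
    # Terminal values are the exercise payoffs.
    values = [max(strike - p[-1], 0.0) for p in paths]
    # Backward induction: regress discounted values on the polynomial basis.
    for t in range(steps - 1, 0, -1):
        xs = [p[t] for p in paths]
        coef = fit_polynomial(xs, [disc * v for v in values], degree)
        values = [
            max(max(strike - x, 0.0), sum(c * x ** k for k, c in enumerate(coef)))
            for x in xs
        ]
    continuation = disc * sum(values) / n_paths
    return max(max(strike - s0, 0.0), continuation)


if __name__ == "__main__":
    print(price_bermudan_put())
```

Raising `degree` without also raising `n_paths` makes the fitted coefficients noisier, which is the trade-off the abstract analyzes.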

    Boosting with early stopping: Convergence and consistency

    Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with \epsilon\to0 step-size becomes an L^1-margin maximizer when left to run to convergence. Comment: Published at http://dx.doi.org/10.1214/009053605000000255 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
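
As a rough illustration of the ideas in this abstract, the sketch below runs greedy boosting with squared-error loss over the linear span of coordinate basis functions, shrinks each greedy step, and stops early using a held-out validation set. The function name `boost_early_stop`, the `step` and `patience` parameters, and the choice of squared-error loss are assumptions made for this example, not the paper's setup.

```python
def mean_squared_error(rows, targets, coef):
    """Average squared error of the linear predictor with coefficients coef."""
    total = 0.0
    for row, target in zip(rows, targets):
        pred = sum(c * x for c, x in zip(coef, row))
        total += (target - pred) ** 2
    return total / len(targets)


def boost_early_stop(x_train, y_train, x_val, y_val,
                     step=0.1, max_iter=500, patience=10):
    """Greedy boosting with restricted step size and validation-based stopping."""
    p = len(x_train[0])
    coef = [0.0] * p
    best_coef = coef[:]
    best_val = mean_squared_error(x_val, y_val, coef)
    stale = 0
    for _ in range(max_iter):
        # Residuals of the current additive fit on the training set.
        resid = [
            y - sum(c * x for c, x in zip(coef, row))
            for row, y in zip(x_train, y_train)
        ]
        # Greedy step: pick the basis function that best fits the residuals.
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j in range(p):
            sxx = sum(row[j] ** 2 for row in x_train)
            if sxx == 0.0:
                continue
            sxr = sum(row[j] * r for row, r in zip(x_train, resid))
            gain = sxr * sxr / sxx
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, sxr / sxx
        if best_j is None:
            break
        # Restricted step size: move only a fraction of the full fit.
        coef[best_j] += step * best_beta
        # Early stopping: keep the coefficients with the lowest validation error.
        val = mean_squared_error(x_val, y_val, coef)
        if val < best_val:
            best_val, best_coef, stale = val, coef[:], 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_coef, best_val
```

Calling `boost_early_stop(x_train, y_train, x_val, y_val)` returns the coefficients at the best validation iteration together with that validation error. A smaller `step` slows the fit and makes the stopping time a finer regularization knob.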

    Impact of regularization on Spectral Clustering

    The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. (2012). Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter $\tau$ is large we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a 'bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigen gap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique DKest (standing for estimated Davis-Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set. Comment: 37 pages
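
A minimal sketch of regularized spectral clustering for two clusters follows. It assumes the regularization adds $\tau/n$ to every entry of the adjacency matrix before the symmetric degree normalization; the abstract does not spell out this form, so treat it as an assumption. The function name `regularized_bipartition`, the power-iteration solver, and the toy graph are hypothetical and chosen only to keep the example dependency-free. The DKest procedure for choosing $\tau$ is not implemented here.

```python
import math


def regularized_bipartition(adj, tau, iters=300):
    """Split nodes in two using the second eigenvector of the regularized,
    degree-normalized adjacency matrix (assumed form: A + (tau / n) * ones)."""
    n = len(adj)
    # Regularized adjacency and degrees.
    a = [[adj[i][j] + tau / n for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # Symmetric normalization D^{-1/2} A_tau D^{-1/2}.
    norm_adj = [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    # The leading eigenvector is proportional to sqrt(deg); deflate it.
    scale = math.sqrt(sum(deg))
    top = [math.sqrt(d) / scale for d in deg]
    # Deterministic, non-constant starting vector.
    v = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # Multiply by (normalized adjacency + identity) so that the
        # algebraically largest remaining eigenvalue dominates.
        w = [v[i] + sum(norm_adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        proj = sum(wi * ti for wi, ti in zip(w, top))
        w = [wi - proj * ti for wi, ti in zip(w, top)]
        size = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / size for wi in w]
    # Cluster by the sign of the second eigenvector.
    return [1 if vi > 0 else 0 for vi in v]


if __name__ == "__main__":
    # Toy graph: two 4-cliques joined by a single edge between nodes 3 and 4.
    n = 8
    graph = [[0.0] * n for _ in range(n)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    graph[i][j] = 1.0
    graph[3][4] = graph[4][3] = 1.0
    print(regularized_bipartition(graph, tau=1.0))
```

With `tau > 0` every regularized degree is strictly positive, so the normalization is defined even when the graph contains isolated or very low-degree nodes.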

    Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data

    The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics for visualizing large gene expression datasets, they remain a severely underutilized visualization tool in modern data analysis. In this paper we introduce superheat, a new R package that provides an extremely flexible and customizable platform for visualizing large datasets using extendable heatmaps. Superheat enhances the traditional heatmap by providing a platform to visualize a wide range of data types simultaneously, adding to the heatmap a response variable as a scatterplot, model results as boxplots, correlation information as barplots, text information, and more. Superheat allows the user to explore their data to greater depths and to take advantage of the heterogeneity present in the data to inform analysis decisions. The goal of this paper is two-fold: (1) to demonstrate the potential of the heatmap as a default visualization method for a wide range of data types using reproducible examples, and (2) to highlight the customizability and ease of implementation of the superheat package in R for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three case studies, each based on publicly available data sources and accompanied by a file outlining the step-by-step analytic pipeline (with code). Comment: 26 pages, 10 figures