
    Simple Asymmetric Exclusion Model and Lattice Paths: Bijections and Involutions

    We study the combinatorics of the change of basis of three representations of the stationary state algebra of the two-parameter simple asymmetric exclusion process. Each of the representations considered corresponds to a different set of weighted lattice paths which, when summed over, give the stationary state probability distribution. We show that all three sets of paths are combinatorially related via sequences of bijections and sign-reversing involutions. Comment: 28 pages

    SNE: Signed Network Embedding

    Several network embedding models have been developed for unsigned networks. However, these models based on skip-gram cannot be applied to signed networks because they can only deal with one type of link. In this paper, we present our signed network embedding model called SNE. Our SNE adopts the log-bilinear model, uses node representations of all nodes along a given path, and further incorporates two signed-type vectors to capture the positive or negative relationship of each edge along the path. We conduct two experiments, node classification and link prediction, on both directed and undirected signed networks and compare with four baselines including a matrix factorization method and three state-of-the-art unsigned network embedding models. The experimental results demonstrate the effectiveness of our signed network embedding. Comment: To appear in PAKDD 201

    A combinatorial approach to jumping particles II: general boundary conditions

    We consider a model of particles jumping on a row, the TASEP. From the point of view of combinatorics, a remarkable feature of this Markov chain is that Catalan numbers are involved in several entries of its stationary distribution. In a companion paper, we gave a combinatorial interpretation and a simple proof of these observations in the simplest case where the particles enter, jump and exit at the same rate. In this paper we show how to deal with general rates.
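
The appearance of Catalan numbers in the equal-rate case can be checked numerically. The sketch below is an illustration, not the combinatorial construction of the paper: it assumes the open-boundary TASEP with entry, jump and exit rates all equal to 1 and uses the standard matrix-ansatz reduction rule DE = D + E from the TASEP literature. The function names (`tasep_weight`, `partition_function`, `catalan`) and the encoding of configurations as strings of "1" (particle) and "0" (hole) are choices made here for illustration.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def tasep_weight(config: str) -> int:
    """Unnormalized stationary weight of a configuration (all rates = 1).

    Assumption: matrix-ansatz rule DE = D + E, with D = particle ("1")
    and E = hole ("0"). A word with no "10" substring (all holes to the
    left of all particles) has weight 1 when every rate equals 1.
    """
    i = config.find("10")
    if i == -1:
        return 1
    # Replace the first occurrence of DE by D, then by E, and add the results.
    with_particle = config[:i] + "1" + config[i + 2:]
    with_hole = config[:i] + "0" + config[i + 2:]
    return tasep_weight(with_particle) + tasep_weight(with_hole)


def partition_function(n: int) -> int:
    """Sum of the weights over all 2**n configurations of n sites."""
    return sum(tasep_weight("".join(bits)) for bits in product("01", repeat=n))


def catalan(n: int) -> int:
    """n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, partition_function(n), catalan(n + 1))
```

Dividing `tasep_weight(config)` by `partition_function(n)` gives the stationary probability of a configuration under these assumptions; for two sites the weights of "00", "01", "10", "11" are 1, 1, 2, 1, which sum to 5.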

    Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models

    Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to release and analyze synthetic graphs in order to protect privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case study using a version of the Enron e-mail corpus dataset demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data to ensure reproducibility of existing studies and discovering new scientific insights that can be obtained by analyzing such data. We use a simple yet effective randomized response mechanism to generate synthetic networks under ε-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks. Comment: Updated, 39 pages
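
A randomized response mechanism on edges can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the flip probability, so the code assumes the textbook symmetric choice, in which each potential edge of an undirected graph is flipped independently with probability 1/(1 + e^ε). The names `flip_probability` and `randomized_response_graph` are hypothetical.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Assumed symmetric flip probability p = 1 / (1 + e^epsilon).

    With this choice (1 - p) / p = e^epsilon, which is the likelihood
    ratio bound required for epsilon-edge differential privacy.
    """
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response_graph(n_nodes, edges, epsilon, rng=None):
    """Return a synthetic undirected edge set on nodes 0..n_nodes-1.

    Every dyad (i, j) with i < j keeps its true state with probability
    1 - p and is flipped (edge <-> non-edge) with probability p.
    """
    rng = rng or random.Random()
    p = flip_probability(epsilon)
    true_edges = {frozenset(e) for e in edges}
    synthetic = set()
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            present = frozenset((i, j)) in true_edges
            if rng.random() < p:
                present = not present
            if present:
                synthetic.add((i, j))
    return synthetic


if __name__ == "__main__":
    graph = [(0, 1), (1, 2), (2, 3)]
    print(randomized_response_graph(5, graph, epsilon=1.0, rng=random.Random(0)))
```

Smaller ε means more flips and stronger privacy; at ε = 0 each dyad is flipped with probability 1/2, so the synthetic graph carries no information about the original. Any model fitted to the output has to account for the known flip probability, which is the role the abstract assigns to likelihood-based inference for missing data.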

    BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

    The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model. Comment: 22 pages, SIGMOD 201

    Discretization of variational regularization in Banach spaces

    Consider a nonlinear ill-posed operator equation F(u) = y, where F is defined on a Banach space X. In general, for solving this equation numerically, a finite dimensional approximation of X and an approximation of F are required. Moreover, in general the given data y^δ of y are noisy. In this paper we analyze finite dimensional variational regularization, which takes into account operator approximations and noisy data: We show (semi-)convergence of the regularized solution of the finite dimensional problems and establish convergence rates in terms of Bregman distances under appropriate sourcewise representation of a solution of the equation. The more involved case of regularization in nonseparable Banach spaces is discussed in detail. In particular we consider the space of finite total variation functions, the space of functions of finite bounded deformation, and the L^∞-space.

    Differentially Private Model Selection with Penalized and Constrained Likelihood

    In statistical disclosure control, the goal of data analysis is twofold: the released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.

    Deep Markov Random Field for Image Modeling

    Markov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by a lack of expressive power. This issue is primarily due to the fact that conventional MRF formulations tend to use simplistic factors to capture local patterns. In this paper, we move beyond such limitations, and propose a novel MRF model that uses fully-connected neurons to express the complex interactions among pixels. Through theoretical analysis, we reveal an inherent connection between this model and recurrent neural networks, and thereon derive an approximated feed-forward network that couples multiple RNNs along opposite directions. This formulation combines the expressive power of deep neural networks and the cyclic dependency structure of MRFs in a unified model, bringing the modeling capability to a new level. The feed-forward approximation also allows it to be efficiently learned from data. Experimental results on a variety of low-level vision tasks show notable improvement over the state of the art. Comment: Accepted at ECCV 201

    Aerosol mass and black carbon concentrations, a two year record at NCO-P (5079 m, Southern Himalayas)

    Aerosol mass and the absorbing fraction are important variables, needed to constrain the role of atmospheric particles in the Earth radiation budget, both directly and indirectly through CCN activation. In particular, their monitoring in remote areas and mountain sites is essential for determining source regions, elucidating the mechanisms of long-range transport of anthropogenic pollutants, and validating regional and global models. Since March 2006, aerosol mass and black carbon concentration have been monitored at the Nepal Climate Observatory-Pyramid (NCO-P), a permanent high-altitude research station located in the Khumbu valley at 5079 m a.s.l. below Mt. Everest. The first two-year averages of PM₁ and PM₁₋₁₀ mass were 1.94 μg m⁻³ and 1.88 μg m⁻³, with standard deviations of 3.90 μg m⁻³ and 4.45 μg m⁻³, respectively, while the black carbon concentration averaged 160.5 ng m⁻³, with a standard deviation of 296.1 ng m⁻³. Both aerosol mass and black carbon show well-defined annual cycles, with a maximum during the pre-monsoon season and a minimum during the monsoon. They also display a typical diurnal cycle during all the seasons, with the lowest particle concentration recorded during the night and a considerable increase during the afternoon, revealing the major role played by thermal winds in influencing the behaviour of atmospheric compounds over the high Himalayas. The aerosol concentration is subject to high variability: alongside frequent "background conditions" (55% of the time), when BC concentrations are mainly below 100 ng m⁻³, concentrations up to 5 μg m⁻³ are reached during some episodes (a few days every year) in the pre-monsoon seasons. The variability of PM and BC is the result of both short-term changes due to thermal wind development in the valley and long-range transport/synoptic circulation.
    At NCO-P, higher concentrations of PM₁ and BC are mostly associated with regional circulation and westerly air masses from the Middle East, while the strongest contributions of mineral dust arrive from the Middle East and regional circulation, with a special contribution from North Africa and the south-west Arabian Peninsula in the post-monsoon and winter seasons.

    Preliminary Estimation of Black Carbon Deposition from Nepal Climate Observatory-Pyramid Data and Its Possible Impact on Snow Albedo Changes Over Himalayan Glaciers During the Pre-Monsoon Season

    The possible minimal range of reduction in snow surface albedo due to dry deposition of black carbon (BC) in the pre-monsoon period (March-May) was estimated as a lower bound, together with an estimate of its accuracy, based on atmospheric observations at the Nepal Climate Observatory-Pyramid (NCO-P), sited at 5079 m a.s.l. in the Himalayan region. We estimated a total BC deposition rate of 2.89 μg m⁻² day⁻¹, providing a total deposition of 266 μg m⁻² for March-May at the site, based on a calculation with a minimal deposition velocity of 1.0 × 10⁻⁴ m s⁻¹ and atmospheric data of equivalent BC concentration. The main BC size at the NCO-P site was determined as 103.1-669.8 nm by correlation analysis between equivalent BC concentration and particulate size distribution in the atmosphere. We also estimated BC deposition from the size distribution data and found that 8.7% of the estimated dry deposition corresponds to the estimated BC deposition from equivalent BC concentration data. If all the BC is deposited uniformly on the top 2 cm of pure snow, the corresponding BC concentration is 26.0-68.2 μg kg⁻¹, assuming snow density variations of 195-512 kg m⁻³ for Yala Glacier, close to the NCO-P site. Such a concentration of BC in snow could result in albedo reductions of 2.0-5.2%. From a simple numerical calculation, and assuming these albedo reductions continue throughout the year, this would lead to runoff increases of 70-204 mm of water, equivalent to 11.6-33.9% of the annual discharge of a typical Tibetan glacier. Our estimates of BC concentration in the snow surface for the pre-monsoon season can be considered comparable to those at similar altitude in the Himalayan region, where the glacier and perpetual snow region starts in the vicinity of NCO-P.
    Our estimates from BC alone are likely to represent a lower bound for snow albedo reductions, since a fixed slower deposition velocity was used and atmospheric wind and turbulence effects, snow aging, dust deposition, and snow albedo feedbacks were not considered. This study represents the first investigation of BC deposition on snow from atmospheric aerosol data in the Himalayas, and the related albedo effect is, in particular, the first such estimate for the southern slope of the Himalayas.
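
The deposition figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not the authors' calculation. It assumes the daily rate is 2.89 micrograms per square metre per day (the reading consistent with the 266 micrograms per square metre seasonal total), that March-May spans 92 days, and that the deposited BC is mixed uniformly into the top 2 cm of snow. The function names are hypothetical.

```python
DAYS_MARCH_TO_MAY = 31 + 30 + 31  # assumption: full calendar months, 92 days


def seasonal_deposition(rate_ug_m2_day: float, days: int = DAYS_MARCH_TO_MAY) -> float:
    """Total dry deposition over the season, in micrograms per square metre."""
    return rate_ug_m2_day * days


def snow_concentration(deposition_ug_m2: float, depth_m: float, density_kg_m3: float) -> float:
    """BC mass per mass of snow (micrograms per kilogram).

    The snow mass per square metre of surface is depth * density.
    """
    return deposition_ug_m2 / (depth_m * density_kg_m3)


if __name__ == "__main__":
    total = seasonal_deposition(2.89)
    print(f"seasonal total: {total:.1f} ug/m^2")  # close to the quoted 266
    for density in (195.0, 512.0):
        conc = snow_concentration(266.0, depth_m=0.02, density_kg_m3=density)
        print(f"density {density:.0f} kg/m^3 -> {conc:.1f} ug/kg")
```

Under these assumptions the low-density snow gives the upper end of the quoted concentration range and the high-density snow the lower end, because the same deposited mass is diluted in more snow.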