High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features. Comment: 18 pages
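The idea of a Lasso over per-feature kernels can be illustrated with a minimal sketch. Several choices here are assumptions and are not stated in the abstract: Gaussian kernels with a fixed width, centred Gram matrices scaled to unit Frobenius norm, a non-negativity constraint on the coefficients, and a plain cyclic coordinate-descent solver. The function names `centered_gram` and `featurewise_kernel_lasso` are hypothetical. Because the objective is convex, a simple solver of this kind converges to a global optimum, which is in the spirit of the scalability claim above.

```python
import numpy as np


def centered_gram(v, sigma=1.0):
    """Centred, unit-norm Gaussian Gram matrix of a 1-D sample vector."""
    sq_dist = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    n = len(v)
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    Kc = H @ K @ H
    return Kc / (np.linalg.norm(Kc) + 1e-12)     # Frobenius scaling (assumption)


def featurewise_kernel_lasso(X, y, lam=0.05, n_iter=100):
    """Non-negative Lasso over vectorised per-feature Gram matrices.

    Minimises 0.5 * ||L - sum_k alpha_k K_k||_F^2 + lam * sum_k alpha_k
    subject to alpha_k >= 0, using cyclic coordinate descent.
    """
    n, d = X.shape
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # standardise inputs
    ys = (y - y.mean()) / (y.std() + 1e-12)               # standardise output
    # One column per feature: the flattened Gram matrix of that feature.
    A = np.column_stack([centered_gram(Xs[:, k]).ravel() for k in range(d)])
    b = centered_gram(ys).ravel()                         # flattened output Gram
    col_sq = (A ** 2).sum(axis=0)
    alpha = np.zeros(d)
    residual = b.copy()
    for _ in range(n_iter):
        for k in range(d):
            residual += A[:, k] * alpha[k]                # drop feature k from the fit
            rho = A[:, k] @ residual
            alpha[k] = max(0.0, (rho - lam) / (col_sq[k] + 1e-12))
            residual -= A[:, k] * alpha[k]                # add it back with new weight
    return alpha
```

Features with a non-zero entry in the returned vector are the selected ones. With `y` depending on a feature only through its square, a linear Lasso would see almost no correlation, while the kernelized variant can still assign that feature the largest weight.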
Thermal constraints on the reionisation of hydrogen by population-II stellar sources
Measurements of the intergalactic medium (IGM) temperature provide a
potentially powerful constraint on the reionisation history due to the thermal
imprint left by the photo-ionisation of neutral hydrogen. However, until
recently IGM temperature measurements were limited to redshifts 2 < z < 4.8,
restricting the ability of these data to probe the reionisation history at z >
6. In this work, we use recent measurements of the IGM temperature in the
near-zones of seven quasars at z ~ 5.8 - 6.4, combined with a semi-numerical
model for inhomogeneous reionisation, to establish new constraints on the
redshift at which hydrogen reionisation completed. We calibrate the model to
reproduce observational constraints on the electron scattering optical depth
and the HI photo-ionisation rate, and compute the resulting spatially
inhomogeneous temperature distribution at z ~ 6 for a variety of reionisation
scenarios. Under standard assumptions for the ionising spectra of population-II
sources, the near-zone temperature measurements constrain the redshift by which
hydrogen reionisation was complete to be z > 7.9 (6.5) at 68 (95) per cent
confidence. We conclude that future temperature measurements around other high
redshift quasars will significantly increase the power of this technique,
enabling these results to be tightened and generalised. Comment: 15 pages, 8 figures, accepted for publication in MNRAS
Addressing preference heterogeneity in public health policy by combining Cluster Analysis and Multi-Criteria Decision Analysis: Proof of Method.
The use of subgroups based on biological-clinical and socio-demographic variables to deal with population heterogeneity is well-established in public policy. The use of subgroups based on preferences is rare, except when religion-based, and controversial. If it were decided to treat subgroup preferences as valid determinants of public policy, a transparent analytical procedure would be needed. In this proof-of-method study we show how public preferences could be incorporated into policy decisions in a way that respects both the multi-criterial nature of those decisions and the heterogeneity of the population in relation to the importance assigned to relevant criteria. It involves combining Cluster Analysis (CA), to generate the subgroup sets of preferences, with Multi-Criteria Decision Analysis (MCDA), to provide the policy framework into which the clustered preferences are entered. We employ three techniques of CA to demonstrate not only that different techniques produce different clusters, but also that choosing among techniques (as well as developing the MCDA structure) is an important task to be undertaken in implementing the approach outlined in any specific policy context. Data for the illustrative, not substantive, application are from a Randomized Controlled Trial of online decision aids for Australian men aged 40-69 years considering Prostate-specific Antigen testing for prostate cancer. We show that such analyses can provide policy-makers with insights into the criterion-specific needs of different subgroups. Implementing CA and MCDA in combination to assist in the development of policies on important health and community issues, such as drug coverage, reimbursement, and screening programs, poses major challenges (conceptual, methodological, ethical-political, and practical), but most are exposed by the techniques, not created by them.
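The combination described above can be sketched in a few lines. This is an illustration only: the abstract does not name its three clustering techniques or its MCDA model, so the sketch assumes plain k-means for the CA step and a weighted-sum score for the MCDA step. The helpers `nearest_center`, `kmeans` and `mcda_scores`, and any data passed to them, are hypothetical.

```python
import numpy as np


def nearest_center(W, centers):
    """Index of the closest centre for each row of W (squared Euclidean)."""
    dist = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(W, k, n_iter=50, seed=0):
    """Cluster respondents' criterion-weight vectors (rows of W) into k groups."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(W, centers)
        for j in range(k):
            members = W[labels == j]
            if len(members):                      # leave empty clusters unchanged
                centers[j] = members.mean(axis=0)
    return nearest_center(W, centers), centers


def mcda_scores(weights, ratings):
    """Weighted-sum MCDA: ratings is (options x criteria), weights is (criteria,)."""
    w = np.asarray(weights, dtype=float)
    return np.asarray(ratings, dtype=float) @ (w / w.sum())
```

Each cluster centre acts as that subgroup's preference profile. Passing it to `mcda_scores` together with a table of option ratings gives one ranking of the options per subgroup, and swapping `kmeans` for another clustering technique shows how sensitive those rankings are to the choice of CA method.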
Improved measurements of the intergalactic medium temperature around quasars: possible evidence for the initial stages of He II reionization at z ≃ 6
We present measurements of the intergalactic medium (IGM) temperature within ∼5 proper Mpc of seven luminous quasars at z ≃ 6. The constraints are obtained from the Doppler widths of Lyα absorption lines in the quasar near zones and build upon our previous measurement for the z = 6.02 quasar SDSS J0818+1722. The expanded data set, combined with an improved treatment of systematic uncertainties, yields an average temperature at the mean density of $\log(T_0/\mathrm{K}) = 4.21 \pm 0.03\ (^{+0.06}_{-0.07})$ at 68 (95) per cent confidence for a flat prior distribution over $3.2 \le \log(T_0/\mathrm{K}) \le 4.8$. In comparison, temperatures measured from the general IGM at z ≃ 5 are ∼0.3 dex cooler, implying an additional source of heating around these quasars which is not yet present in the general IGM at slightly lower redshift. This heating is most likely due to the recent reionization of He II in the vicinity of these quasars, which have hard and non-thermal ionizing spectra. The elevated temperatures may therefore represent evidence for the earliest stages of He II reionization in the most biased regions of the high-redshift Universe. The temperature as a function of distance from the quasars is consistent with being constant, $\log(T_0/\mathrm{K}) \simeq 4.2$, with no evidence for a line-of-sight thermal proximity effect. However, the limited extent of the quasar near zones prevents the detection of He III regions larger than ∼5 proper Mpc. Under the assumption that the quasars have reionized the He II in their vicinity, we infer that the data are consistent with an average optically bright phase of duration in excess of $10^{6.5}$ yr. These measurements represent the highest redshift IGM temperature constraints to date, and thus provide a valuable data set for confronting models of H I reionization.
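To see why line widths carry temperature information, the purely thermal part of the Doppler parameter of a hydrogen line can be computed directly from b = sqrt(2 k_B T / m_H). The sketch below covers only this textbook relation. It is not the measurement procedure of the paper, which the abstract does not describe, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
M_H = 1.6735575e-27    # mass of a hydrogen atom in kg


def thermal_doppler_b(temperature_k):
    """Thermal Doppler parameter b = sqrt(2 k_B T / m_H), returned in km/s."""
    return math.sqrt(2.0 * K_B * temperature_k / M_H) / 1e3


def temperature_from_b(b_kms):
    """Inverse relation: temperature in K for a purely thermal width in km/s."""
    return M_H * (b_kms * 1e3) ** 2 / (2.0 * K_B)
```

Because b scales as the square root of T, a 0.3 dex difference in temperature corresponds to only a 0.15 dex difference in the thermal line width.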
SDSS-IV MaNGA: uncovering the angular momentum content of central and satellite early-type galaxies
We study 379 central and 159 satellite early-type galaxies with two-dimensional kinematics from the integral field survey Mapping Nearby Galaxies at APO (MaNGA) to determine how their angular momentum content depends on stellar and halo mass. Using the Yang et al. (2007) group catalog, we identify central and satellite galaxies in groups with halo masses in the range $10^{12.5}\,h^{-1}\,M_\odot$ … $10^{11}\,h^{-2}\,M_\odot$ tend to have very little rotation, while nearly all galaxies at lower mass show some net rotation. The ~ 30% of high-mass galaxies that have significant rotation do not stand out in other galaxy properties except for a higher incidence of ionized gas emission. Our data are consistent with recent simulation results suggesting that major merging and gas accretion have more impact on the rotational support of lower-mass galaxies. When carefully matching the stellar mass distributions, we find no residual differences in angular momentum content between satellite and central galaxies at the 20% level. Similarly, at fixed mass, galaxies have consistent rotation properties across a wide range of halo mass. However, we find that errors in the classification of centrals and satellites with group finders systematically lower differences between satellite and central galaxies at a level that is comparable to current measurement uncertainties. To improve constraints, the impact of group finding methods will have to be forward modeled via mock catalogs.
Early stopping and non-parametric regression: An optimal data-dependent stopping rule
Early stopping is a form of regularization based on choosing when to stop running an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, and we prove upper bounds on the squared error of the resulting function estimate, measured in either the $L^2(\mathbb{P})$ or the $L^2(\mathbb{P}_n)$ norm. These upper bounds lead to minimax-optimal rates for various kernel classes, including Sobolev smoothness classes and other forms of reproducing kernel Hilbert spaces. We show through simulation that our stopping rule compares favorably to two other stopping rules, one based on hold-out data and the other based on Stein's unbiased risk estimate. We also establish a tight connection between our early stopping strategy and the solution path of a kernel ridge regression estimator. © 2014 Garvesh Raskutti, Martin J. Wainwright and Bin Yu
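The role of the iteration count as a regularizer can be seen in a small sketch of gradient descent on the least-squares loss in an RKHS, tracked through the fitted values at the sample points. The abstract does not state the data-dependent stopping rule, so the number of steps `t` is left as a plain argument here. The Gaussian kernel, its bandwidth, the step size, and the pairing of `t` steps with a ridge parameter of 1/(step · t) are all assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def gaussian_gram(x, bandwidth=0.5):
    """Gaussian kernel Gram matrix for 1-D inputs (kernel choice is an assumption)."""
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_gd_fit(K, y, t, step=None):
    """Fitted values after t gradient steps on (1/2n) * sum_i (y_i - f(x_i))^2.

    Starts from f = 0. In terms of fitted values the RKHS gradient step is
    f <- f - step * (K / n) @ (f - y).
    """
    n = len(y)
    Kn = K / n
    if step is None:
        step = 1.0 / np.linalg.eigvalsh(Kn).max()   # keeps the iteration stable
    f = np.zeros(n)
    for _ in range(t):
        f = f - step * (Kn @ (f - y))
    return f, step


def kernel_ridge_fit(K, y, nu):
    """Fitted values of kernel ridge regression with penalty nu."""
    n = len(y)
    return K @ np.linalg.solve(K + n * nu * np.eye(n), y)
```

Stopping after few steps keeps the fit smooth, while running for many steps drives the training residual towards zero. Comparing `kernel_gd_fit(K, y, t)` with `kernel_ridge_fit(K, y, 1.0 / (step * t))` for a range of `t` gives a rough numerical feel for the link between the two regularization paths.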
