Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss actually correspond to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
Finally, we explore an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
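The confidence measure and the redundancy notion described above can be illustrated with a small sketch (the toy transactions and attribute names below are invented for illustration). The key observation is that conf(XY → Z) = supp(XYZ)/supp(XY) ≥ supp(XYZ)/supp(X) = conf(X → YZ), since supp(XY) ≤ supp(X); so a rule with a larger antecedent and smaller consequent is redundant with respect to the stronger rule.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Empirical conditional probability of the consequent given the antecedent."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent), transactions) / support(a, transactions)

# Toy dataset: each transaction is the set of binary attributes that are true.
transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"}]]

c_strong = confidence({"a"}, {"b", "c"}, transactions)   # a -> bc : 0.5
c_weak   = confidence({"a", "b"}, {"c"}, transactions)   # ab -> c : 2/3

# ab -> c holds with at least the confidence of a -> bc in any dataset,
# hence it is redundant given the stronger rule.
assert c_weak >= c_strong
```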
Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols,
or simply within a meta-analysis comparison, it is often the case that sets of
alternative feature lists (possibly of different lengths) are obtained instead
of a single list. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset.
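One simple way to quantify stability between ranked lists is a distance between their rank vectors; the sketch below uses the Canberra distance, with features absent from a partial list assigned the midpoint of the unused rank positions. The tie rule and the feature names are illustrative assumptions, not necessarily the paper's exact construction.

```python
def canberra(r1, r2):
    """Canberra distance between two rank vectors of equal length."""
    return sum(abs(a - b) / (a + b) for a, b in zip(r1, r2))

def ranks(ordered_features, universe):
    """Rank of each feature of `universe` under a partial ordered list;
    features absent from the list get the average of the unused ranks."""
    pos = {f: i + 1 for i, f in enumerate(ordered_features)}
    k, n = len(ordered_features), len(universe)
    tail = (k + 1 + n) / 2.0  # average of ranks k+1 .. n
    return [pos.get(f, tail) for f in sorted(universe)]

universe = {"g1", "g2", "g3", "g4", "g5"}
list_a = ["g1", "g2", "g3"]
list_b = ["g2", "g1", "g3"]  # same genes, top two swapped

d = canberra(ranks(list_a, universe), ranks(list_b, universe))  # 2/3
```

A set of lists can then be summarized by the mean pairwise distance, a scalar indicator of list stability.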
Space-Time Structure of Loop Quantum Black Hole
In this paper we have improved the semiclassical analysis of loop quantum
black hole (LQBH) in the conservative approach of constant polymeric parameter.
In particular we have focused our attention on the space-time structure. We
have introduced a very simple modification of the spherically symmetric
Hamiltonian constraint in its holonomic version. The new quantum constraint
reduces to the classical constraint when the polymeric parameter goes to
zero. Using this modification, we have obtained a large class of semiclassical
solutions parametrized by a generic function of the polymeric parameter. We
have found that only a particular choice of this function reproduces the black
hole solution with the correct asymptotically flat limit. At r = 0 the semiclassical
metric is regular and the Kretschmann invariant has a maximum peaked at
the Planck length. The radial position of the peak depends neither on the black hole
mass nor on the polymeric parameter. The semiclassical solution is very similar to
the Reissner-Nordstrom metric. We have constructed the Carter-Penrose diagrams
explicitly, giving a causal description of the space-time and its maximal
extension. The LQBH metric interpolates between two asymptotically flat
regions: the r → ∞ region and the r → 0 region. We have studied the
thermodynamics of the semiclassical solution. The temperature, entropy and the
evaporation process are regular and can be defined independently of the
polymeric parameter. We have studied the particular metric obtained when the polymeric
parameter goes to zero. This metric is regular at r = 0 and has only one
event horizon at r = 2m. The maximum of the Kretschmann invariant depends only on
the Planck length. The polymeric parameter does not play any role in the black hole
singularity resolution. The thermodynamics is the same. Comment: 17 pages, 19 figures
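For context, the classical divergence that the semiclassical solution regularises can be stated as follows; the bound on the semiclassical invariant is written on dimensional grounds only, as a schematic reading of the abstract, not as the paper's exact result.

```latex
% Classical Schwarzschild curvature blows up at the centre (G = c = 1):
K_{\mathrm{class}} \;=\; R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma}
  \;=\; \frac{48\, m^{2}}{r^{6}}
  \;\xrightarrow[\; r \to 0 \;]{}\; \infty .
% Semiclassically, the abstract states K stays finite, with a maximum of
% Planckian height at a radius independent of the mass m and of the
% polymeric parameter; dimensionally,
K_{\mathrm{LQBH}}(r) \;\le\; K_{\max} \sim \ell_{P}^{-4},
  \qquad r_{\max} = O(\ell_{P}).
```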
Active children through individual vouchers – evaluation (ACTIVE): protocol for a mixed method randomised control trial to increase physical activity levels in teenagers
Background: Many teenagers are insufficiently active despite the health benefits of physical activity (PA). There is strong evidence that inactivity and low fitness levels increase the risk of non-communicable diseases such as coronary heart disease (CHD), type 2 diabetes and breast and colon cancers (Lee et al. Lancet 380:219–29, 2012). A major barrier facing adolescents is accessibility (e.g. cost and lack of local facilities). The ACTIVE project aims to tackle this barrier through a multi-faceted intervention, giving teenagers vouchers to spend on activities of their choice and empowering young people to improve their fitness and PA levels.
Design: ACTIVE is a mixed methods randomised control trial in 7 secondary schools in Swansea, South Wales. Quantitative and qualitative measures, including PA (Cooper run test (CRT), accelerometry over 7 days), cardiovascular (CV) measures (blood pressure, pulse wave analysis) and focus groups, will be undertaken at 4 separate time points (baseline, 6 months, 12 months and follow-up at 18 months). Intervention schools will receive a multi-component intervention involving 12 months of £20 vouchers to spend on physical activities of their choice, a peer mentor scheme and opportunities to attend advocacy meetings. Control schools are encouraged to continue usual practice. The primary aim is to examine the effect of the intervention on improving cardiovascular fitness.
Discussion: This paper describes the protocol for the ACTIVE randomised control trial, which aims to increase the fitness, physical activity and socialisation of teenagers in Swansea, UK via a voucher scheme combined with peer mentoring. Results can contribute to the evidence base on teenage physical activity and, if effective, the intervention has the potential to inform future physical activity interventions and policy.
Limitations of estimating branch volume from terrestrial laser scanning
Quantitative structural models (QSMs) are frequently used to simplify single tree point clouds obtained by terrestrial laser scanning (TLS). QSMs use geometric primitives to derive topological and volumetric information about trees. Previous studies have shown that TLS and QSM total volume estimates agree well with field-measured data for whole trees. Although already broadly applied, the uncertainties of combining TLS and QSM modelling are still largely unexplored. In our study, we investigated the effect of scanning distance on length and volume estimates of branches when deriving QSMs from TLS data. We scanned ten European beech (Fagus sylvatica L.) branches with an average length of 2.6 m. The branches were scanned from distances ranging from 5 to 45 m at step intervals of 5 m, from three scan positions each. Twelve close-range scans were performed as a benchmark. For each distance and branch, QSMs were derived. We found that with increasing distance, the point cloud density and the cumulative length of the reconstructed branches decreased, whereas individual volumes increased. Depending on the QSM hyperparameters, at a scanning distance of 45 m, cumulative branch length was on average underestimated by −75%, while branch volume was overestimated by up to +539%. We assume that the high deviations are related to point cloud quality. As the scanning distance increases, the size of the individual laser footprints and the distances between them increase, making it more difficult to fully capture small branches and to fit suitable QSMs.
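The footprint argument in the last sentence can be made concrete with a back-of-the-envelope sketch, assuming a linearly diverging beam. The default beam parameters below are illustrative values loosely in the range of common TLS instruments, not those of the instrument used in the study.

```python
def footprint_diameter(distance_m, exit_diameter_mm=3.5, divergence_mrad=0.35):
    """Approximate laser footprint diameter (mm) at a given range,
    for a beam with linear full-angle divergence.
    Note: 1 mrad of divergence adds 1 mm of diameter per metre of range."""
    return exit_diameter_mm + distance_m * divergence_mrad

near = footprint_diameter(5.0)   # 5.25 mm at the closest scan distance
far = footprint_diameter(45.0)   # 19.25 mm at the farthest scan distance
```

At 45 m the footprint is several times larger than at 5 m, comparable to the diameter of fine branches, which is consistent with small branches being smeared or missed at long range.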
Assessment of Bias in Pan-Tropical Biomass Predictions
Above-ground biomass (AGB) is an essential descriptor of forests, of use in ecological and climate-related research. At tree and stand scale, destructive but direct measurements of AGB are replaced with predictions from allometric models characterizing the correlational relationship between AGB and predictor variables including stem diameter, tree height and wood density. These models are constructed from harvested calibration data, usually via linear regression. Here, we assess systematic error in out-of-sample predictions of AGB introduced during measurement, compilation and modeling of in-sample calibration data. Various conventional bivariate and multivariate models are constructed from open access data of tropical forests. Metadata analysis, fit diagnostics and cross-validation results suggest several model misspecifications: chiefly, unaccounted-for inconsistent measurement error in predictor variables between in- and out-of-sample data. Simulations demonstrate that even conservative inconsistencies can introduce significant bias into tree- and stand-scale AGB predictions. When tree height and wood density are included as predictors, models should be modified to correct for bias. Finally, we explore a fundamental assumption of conventional allometry: that model parameters are independent of tree size, i.e. that the same model can provide predictions of consistent trueness irrespective of size class. Most observations in current calibration datasets are from smaller trees, meaning the existence of a size dependency would bias predictions for larger trees. We determine that detecting the absence or presence of a size dependency is currently prevented by model misspecifications and calibration data imbalances. We call for the collection of additional harvest data, specifically of under-represented larger trees.
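The conventional construction examined here, a log-log linear allometric fit with the standard lognormal back-transformation (Baskerville) correction, can be sketched as follows. This is a generic bivariate illustration, not the authors' exact models.

```python
import math

def fit_loglog(diams_cm, agb_kg):
    """OLS fit of ln(AGB) = a + b ln(D), returning (a, b, CF) where
    CF = exp(s^2 / 2) is the lognormal back-transformation correction."""
    x = [math.log(d) for d in diams_cm]
    y = [math.log(m) for m in agb_kg]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    return a, b, math.exp(s2 / 2)

def predict_agb(d_cm, a, b, cf):
    """Back-transformed, bias-corrected prediction of AGB (kg)."""
    return cf * math.exp(a) * d_cm ** b

# Hypothetical calibration data drawn from an exact power law AGB = 0.1 D^2.5,
# so the fit recovers the exponent and CF is ~1 (zero residual variance).
diams = [10.0, 20.0, 30.0, 40.0]
agb = [0.1 * d ** 2.5 for d in diams]
a, b, cf = fit_loglog(diams, agb)
```

The measurement-error issue the abstract raises enters exactly through the predictor values fed to `predict_agb`: if out-of-sample diameters or heights are measured differently from the calibration data, the correction above does not remove the resulting bias.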
Benchmarking airborne laser scanning tree segmentation algorithms in broadleaf forests shows high accuracy only for canopy trees
Individual tree segmentation from airborne laser scanning data is a longstanding and important challenge in forest remote sensing. Tree segmentation algorithms are widely available, but robust intercomparison studies are rare due to the difficulty of obtaining reliable reference data. Here we provide a benchmark data set for temperate and tropical broadleaf forests, generated from labelled terrestrial laser scanning data. We compared the performance of four widely used tree segmentation algorithms against this benchmark data set. All algorithms performed reasonably well on the canopy trees. The point-cloud-based algorithm AMS3D (Adaptive Mean Shift 3D) had the highest overall accuracy, closely followed by the 2D raster-based region-growing algorithm Dalponte2016+. However, all algorithms failed to accurately segment the understory trees. This result was consistent across both forest types. This study emphasises the need to assess tree segmentation algorithms directly against benchmark data, rather than by comparison with forest indices such as biomass or the number and size distribution of trees. We provide the first openly available benchmark data set for tropical forests, and we hope future studies will extend this work to other regions.
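A minimal sketch of how segmentation output can be scored against such a benchmark: greedily match predicted trees to reference trees by point-set IoU, then report precision, recall and F1. This is a simplified stand-in for whatever matching scheme the study actually used; tree ids and thresholds are illustrative.

```python
def evaluate_segmentation(reference, predicted, iou_threshold=0.5):
    """Greedy matching of predicted to reference trees by point-set IoU.
    `reference` and `predicted` map tree ids to sets of point indices."""
    matches = 0
    unused = dict(predicted)
    for ref_pts in reference.values():
        best_id, best_iou = None, 0.0
        for pid, pred_pts in unused.items():
            iou = len(ref_pts & pred_pts) / len(ref_pts | pred_pts)
            if iou > best_iou:
                best_id, best_iou = pid, iou
        if best_id is not None and best_iou >= iou_threshold:
            matches += 1
            del unused[best_id]  # each prediction matches at most one tree
    precision = matches / len(predicted)
    recall = matches / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1

# Two reference trees; two good predictions plus one spurious cluster.
reference = {"t1": set(range(10)), "t2": set(range(10, 20))}
predicted = {"p1": set(range(9)), "p2": set(range(12, 20)), "p3": {100, 101}}
precision, recall, f1 = evaluate_segmentation(reference, predicted)
```

Scoring per size class (canopy vs. understory reference trees) with the same routine would surface exactly the understory failure mode the study reports.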
Leaf and wood classification framework for terrestrial LiDAR point clouds
Published in Methods in Ecology and Evolution by John Wiley & Sons Ltd on behalf of the British Ecological Society. Leaf and wood separation is a key step to enable a new range of estimates from terrestrial LiDAR data, such as quantifying above-ground biomass, leaf and wood area, and their 3D spatial distributions. We present a new method to separate leaf and wood from single tree point clouds automatically. Our approach combines unsupervised classification of geometric features and shortest-path analysis. The automated separation algorithm and its intermediate steps are presented and validated. Validation used a testing framework with synthetic point clouds, simulated using ray-tracing and 3D tree models, and 10 field-scanned tree point clouds. To evaluate results we calculated accuracy, the kappa coefficient and the F-score. Validation using simulated data resulted in an overall accuracy of 0.83, ranging from 0.71 to 0.94. Per-tree average accuracy from synthetic data ranged from 0.77 to 0.89. Field data results presented an overall average accuracy of 0.89. Analysis of each step showed accuracy ranging from 0.75 to 0.98. F-scores from both simulated and field data were similar, with scores for leaf usually higher than those for wood. Our separation method showed results similar to others in the literature, albeit from a completely automated workflow. Analysis of each separation step suggests that the addition of path analysis improved the robustness of our algorithm. Accuracy can be improved with per-tree parameter optimization. The library containing our separation script can be easily installed and applied to single tree point clouds. Average processing times are below 10 min per tree.
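The geometric features underlying the unsupervised classification step are commonly eigenvalue-based shape descriptors of local point neighborhoods; wood tends to score high on linearity or planarity, leaves on scattering. The sketch below shows this common construction, which may differ from the exact feature set the method uses.

```python
import numpy as np

def geometric_features(neighborhood):
    """Eigenvalue-based shape descriptors of a local Nx3 point neighborhood:
    (linearity, planarity, scattering), from the sorted eigenvalues
    l1 >= l2 >= l3 of the neighborhood's covariance matrix."""
    pts = np.asarray(neighborhood, dtype=float)
    cov = np.cov(pts.T)                             # 3x3 covariance matrix
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1

# A perfectly linear neighborhood (e.g. a thin branch segment)
# scores ~1 on linearity and ~0 on the other two descriptors.
linearity, planarity, scattering = geometric_features(
    [[t, 0.0, 0.0] for t in range(10)])
```

These per-point features can then feed any unsupervised classifier (e.g. k-means with k = 2) to produce the initial leaf/wood labeling that the shortest-path analysis refines.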