Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
improvements in software defect prediction. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict quality. The same pattern of improvement was observed when
SMOTE and SMOTUNED were compared against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
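To illustrate the kind of pre-processing the abstract refers to, here is a minimal sketch of plain SMOTE-style oversampling, not the paper's SMOTUNED implementation. The function names, the parameters `k` (neighbours), `r` (Minkowski exponent) and `n_synthetic`, and the sample points are assumptions made for illustration; a self-tuning variant would presumably search over such parameters.

```python
import random


def minkowski(a, b, r=2):
    """Minkowski distance between two equal-length points (r=2 is Euclidean)."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)


def smote(minority, n_synthetic, k=5, r=2, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    points and one of their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank every other minority point by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: minkowski(base, p, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority-class feature vectors.
    defective = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(defective, n_synthetic=3, k=2):
        print(point)
```

Each synthetic point lies on a line segment between two existing minority points, so oversampling densifies the minority region without duplicating rows.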
The Minimum Shared Edges Problem on Grid-like Graphs
We study the NP-hard Minimum Shared Edges (MSE) problem on graphs: decide
whether it is possible to route a given number of paths from a start vertex to a
target vertex in a given graph while using at most a given number of edges more
than once. We show
that MSE can be decided on bounded (i.e. finite) grids in linear time when both
dimensions are either small or large compared to the number of paths. On
the contrary, we show that MSE remains NP-hard on subgraphs of bounded grids.
Finally, we study MSE from a parametrised complexity point of view. It is known
that MSE is fixed-parameter tractable with respect to the number of paths.
We show that, under standard complexity-theoretical assumptions, the problem
parametrised by the combined parameter of number of paths, number of shared
edges, maximum degree, diameter, and
treewidth does not admit a polynomial-size problem kernel, even when restricted
to planar graphs.
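The decision problem described above can be made concrete with a small brute-force sketch. This is an exponential-time illustration for tiny graphs only, not one of the algorithms from the paper; the function names and the example graph are hypothetical, and an edge is counted as shared when at least two of the chosen paths use it.

```python
from collections import Counter
from itertools import combinations_with_replacement


def simple_paths(adj, current, target, visited=()):
    """Yield every simple path from current to target as a tuple of vertices."""
    visited = visited + (current,)
    if current == target:
        yield visited
        return
    for nxt in adj[current]:
        if nxt not in visited:
            yield from simple_paths(adj, nxt, target, visited)


def min_shared_edges(adj, start, target, num_paths):
    """Smallest number of edges used by more than one of num_paths
    start-target paths (paths may repeat); None if no path exists."""
    edge_sets = [
        [frozenset(edge) for edge in zip(path, path[1:])]
        for path in simple_paths(adj, start, target)
    ]
    if not edge_sets:
        return None
    best = None
    for choice in combinations_with_replacement(edge_sets, num_paths):
        usage = Counter(edge for path in choice for edge in path)
        shared = sum(1 for count in usage.values() if count > 1)
        best = shared if best is None else min(best, shared)
    return best


if __name__ == "__main__":
    # A 4-cycle s-a-t-b-s: two edge-disjoint routes between s and t.
    cycle = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
    print(min_shared_edges(cycle, "s", "t", 2))
    print(min_shared_edges(cycle, "s", "t", 3))
```

The decision version then asks whether the returned minimum is at most the allowed number of shared edges.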
The Complexity of Routing with Few Collisions
We study the computational complexity of routing multiple objects through a
network in such a way that only few collisions occur: Given a graph with
two distinct terminal vertices and two positive integers, one bounding the
number of routes from below and one bounding the number of shared edges from
above, the question is whether one can connect the terminals by at least the
required number of routes (e.g. paths) such that at most the allowed number of
edges are time-wise shared among them. We study
three types of routes: traverse each vertex at most once (paths), each edge at
most once (trails), or no such restrictions (walks). We prove that for paths
and trails the problem is NP-complete on undirected and directed graphs even if
is constant or the maximum vertex degree in the input graph is constant.
For walks, however, it is solvable in polynomial time on undirected graphs for
arbitrary and on directed graphs if is constant. We additionally study
for all route types a variant of the problem where the maximum length of a
route is restricted by some given upper bound. We prove that this
length-restricted variant has the same complexity classification with respect
to paths and trails, but for walks it becomes NP-complete on undirected graphs.
Does rapid urbanization aggravate health disparities? Reflections on the epidemiological transition in Pune, India
Background: Rapid urbanization in low- and middle-income countries reinforces risk and epidemiological transition in urban societies, which are characterized by high socioeconomic gradients. Limited availability of disaggregated morbidity data in these settings impedes research on epidemiological profiles of different population subgroups. Objective: The study aimed to analyze the epidemiological transition in the emerging megacity of Pune with respect to changing morbidity and mortality patterns, also taking into consideration health disparities among different socioeconomic groups. Design: A mixed-methods approach was used, comprising secondary analysis of mortality data, a survey among 900 households in six neighborhoods with different socioeconomic profiles, 46 in-depth interviews with laypeople, and expert interviews with 37 health care providers and 22 other health care workers. Results: The mortality data account for an epidemiological transition with an increasing number of deaths due to non-communicable diseases (NCDs) in Pune. The share of deaths due to infectious and parasitic diseases remained nearly constant, though the cause of deaths changed considerably within this group. The survey data and expert interviews indicated a slightly higher prevalence of diabetes and hypertension among higher socioeconomic groups, but a higher incidence and more frequent complications and comorbidities in lower socioeconomic groups. Although the self-reported morbidity for malaria, gastroenteritis, and tuberculosis did not show a socioeconomic pattern, experts estimated the prevalence in lower socioeconomic groups to be higher, though all groups in Pune would be affected. Conclusions: The rising burden of NCDs among all socioeconomic groups and the concurrent persistence of communicable diseases pose a major challenge for public health. 
Improvement of urban health requires a stronger focus on health promotion and disease prevention for all socioeconomic groups with a holistic understanding of urban health. In order to derive evidence-based solutions and interventions, routine surveillance data become indispensable.
Application of the Shiono and Knight Method in asymmetric compound channels with different side slopes of the internal wall
The Shiono and Knight Method (SKM) is widely used to predict the lateral distribution of depth-averaged velocity and boundary shear stress for flows in compound channels. Three calibrating coefficients need to be estimated to apply the SKM, namely the eddy viscosity coefficient (λ), the friction factor (f) and the secondary flow coefficient (k). Several tested methods can satisfactorily be used to estimate λ and f. However, calibrating the secondary flow coefficient k so that it correctly accounts for secondary flow effects is still problematic. In this paper, the secondary flow coefficients are calibrated using two approaches to estimate correct values of k for simulating asymmetric compound channels with different side slopes of the internal wall. The first approach is based on Abril and Knight (2004), who suggest fixed values for the main channel and floodplain regions. In the second approach, the equations developed by Devi and Khatua (2017), which relate the variation of the secondary flow coefficients to the relative depth (β) and width ratio (α), are used. The results indicate that the calibration method developed by Devi and Khatua (2017) is a better choice for calibrating the secondary flow coefficients than the first approach, which assumes a fixed value of k for different flow depths. The results also indicate that the boundary condition based on shear force continuity can successfully be used for simulating rectangular compound channels, while the continuity of depth-averaged velocity and its gradient is an accepted boundary condition in simulations of trapezoidal compound channels. However, the SKM performance for predicting the boundary shear stress over the shear layer region may not be improved by only imposing suitably calibrated values of the secondary flow coefficients. This is because of the difficulties of modelling the complex interaction that develops between the flows in the main channel and on the floodplain in this region.
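For orientation, the depth-averaged momentum balance usually quoted for the SKM is sketched below to show where the three coefficients enter. The abstract does not state the equation, so the notation is an assumption: ρ is the fluid density, g the gravitational acceleration, H the local flow depth, S_0 the bed slope, U_d the depth-averaged streamwise velocity, s the side slope (1 vertical to s horizontal), y the lateral coordinate and Γ the secondary flow term. The last line shows one common parameterisation in which k scales Γ.

```latex
\[
\rho g H S_0
- \frac{1}{8}\,\rho f U_d^{2}\sqrt{1 + \frac{1}{s^{2}}}
+ \frac{\partial}{\partial y}\!\left[
    \rho \lambda H^{2} \sqrt{\frac{f}{8}}\; U_d \frac{\partial U_d}{\partial y}
  \right]
= \Gamma ,
\qquad
\Gamma = \frac{\partial}{\partial y}\bigl[ H (\rho U V)_d \bigr]
\]
\[
\Gamma \approx k\,\rho g S_0 H
\quad \text{(assumed parameterisation of the secondary flow term)}
\]
```

Under this reading, a fixed-value approach assigns one constant k to each of the main channel and floodplain, whereas a depth-dependent approach lets k vary with β and α.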
The nuclear receptors of Biomphalaria glabrata and Lottia gigantea: Implications for developing new model organisms
© 2015 Kaur et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Nuclear receptors (NRs) are transcription regulators involved in an array of diverse physiological functions including key roles in endocrine and metabolic function. The aim of this study was to identify nuclear receptors in the fully sequenced genome of the gastropod snail, Biomphalaria glabrata, intermediate host for Schistosoma mansoni, and compare these to known vertebrate NRs, with a view to assessing the snail's potential as an invertebrate model organism for endocrine function, both as a prospective new test organism and to elucidate the fundamental genetic and mechanistic causes of disease. For comparative purposes, the genome of a second gastropod, the owl limpet, Lottia gigantea, was also investigated for nuclear receptors. Thirty-nine and thirty-three putative NRs were identified from the B. glabrata and L. gigantea genomes respectively, based on the presence of a conserved DNA-binding domain and/or ligand-binding domain. Nuclear receptor transcript expression was confirmed and sequences were subjected to a comparative phylogenetic analysis, which demonstrated that these molluscs have representatives of all the major NR subfamilies (1-6). Many of the identified NRs are conserved between vertebrates and invertebrates; however, differences exist, most notably the absence of receptors of Group 3C, which includes some of the vertebrate endocrine hormone targets. The mollusc genomes also contain NR homologues that are present in insects and nematodes but not in vertebrates, such as Group 1J (HR48/DAF12/HR96). 
The identification of many shared receptors between humans and molluscs indicates the potential for molluscs as model organisms; however, the absence of several steroid hormone receptors indicates snail endocrine systems are fundamentally different. Funding: The National Centre for the Replacement, Refinement and Reduction of Animals in Research, Grant Ref:G0900802 to CSJ, LRN, SJ & EJR [www.nc3rs.org.uk]
Observation of mesoscopic crystalline structures in a two-dimensional Rydberg gas
The ability to control and tune interactions in ultracold atomic gases has
paved the way towards the realization of new phases of matter. Whereas
experiments have so far achieved a high degree of control over short-ranged
interactions, the realization of long-range interactions would open up a whole
new realm of many-body physics and has become a central focus of research.
Rydberg atoms are very well-suited to achieve this goal, as the van der Waals
forces between them are many orders of magnitude larger than for ground state
atoms. Consequently, the mere laser excitation of ultracold gases can cause
strongly correlated many-body states to emerge directly when atoms are
transferred to Rydberg states. Key examples are quantum crystals, composed of
coherent superpositions of different spatially ordered configurations of
collective excitations. Here we report on the direct measurement of strong
correlations in a laser-excited two-dimensional atomic Mott insulator using
high-resolution, in-situ Rydberg atom imaging. The observations reveal the
emergence of spatially ordered excitation patterns in the high-density
components of the prepared many-body state. They have random orientation, but
well-defined geometry, forming mesoscopic crystals of collective excitations
delocalised throughout the gas. Our experiment demonstrates the potential of
Rydberg gases to realise exotic phases of matter, thereby laying the basis for
quantum simulations of long-range interacting quantum magnets.
Comment: 10 pages, 7 figures
Structural, elastic, mechanical and thermodynamic properties of Terbium oxide: First-principles investigations
First-principles investigations of the structural, elastic, mechanical and thermodynamic properties of terbium oxide (TbO) are performed. The investigations employ the full-potential linearized augmented plane wave (FP-LAPW) method within density functional theory (DFT), as implemented in the WIEN2k package. The exchange-correlation energy functional, a part of the total energy functional, is treated through the Perdew-Burke-Ernzerhof scheme of the generalized gradient approximation (PBE-GGA). The ground-state structural parameters, namely lattice constants a0, bulk moduli B and their pressure derivatives B′, are calculated for the rock-salt (RS), zinc-blende (ZB), cesium chloride (CsCl), wurtzite (WZ) and nickel arsenide (NiAs) polymorphs of TbO. The elastic constants (C11, C12, C13, C33, and C44) and mechanical properties (Young's modulus Y, shear modulus S, Poisson's ratio σ, anisotropy ratio A and compressibility β) were also calculated to assess its potential for applications. According to our calculations, the RS phase of TbO is mechanically the strongest of the studied cubic structures, whereas among the hexagonal phases the NiAs-type structure is stronger than the WZ phase. To analyze the ductility of the different structures of TbO, Pugh's rule (B/SH) and the Cauchy pressure (C12–C44) are used. The ZB, CsCl and WZ-type structures of TbO were found to be ductile, with ionic bonding dominant, while the RS and NiAs structures were brittle, with covalent bonding dominant. Moreover, the Debye temperature was calculated for both the cubic and hexagonal structures of TbO by averaging the computed sound velocities.
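The ductility criteria named in the abstract can be illustrated with a short sketch for the cubic case. It assumes the standard Voigt-Reuss-Hill averages and the commonly quoted Pugh threshold of 1.75; the function names and the elastic constants in the example are hypothetical placeholders, not values from the paper.

```python
def cubic_moduli(c11, c12, c44):
    """Voigt-Reuss-Hill averaged moduli for a cubic crystal.
    All inputs and returned moduli share one unit (for example GPa)."""
    bulk = (c11 + 2.0 * c12) / 3.0
    shear_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    shear_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    shear = 0.5 * (shear_voigt + shear_reuss)  # Hill average
    return {
        "bulk": bulk,
        "shear": shear,
        "young": 9.0 * bulk * shear / (3.0 * bulk + shear),
        "poisson": (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear)),
        "anisotropy": 2.0 * c44 / (c11 - c12),
        "pugh_ratio": bulk / shear,
        "cauchy_pressure": c12 - c44,
    }


def is_ductile_by_pugh(moduli, threshold=1.75):
    """Pugh's rule: a bulk-to-shear ratio above the threshold suggests ductility."""
    return moduli["pugh_ratio"] > threshold


if __name__ == "__main__":
    # Hypothetical elastic constants, chosen only to exercise the formulas.
    result = cubic_moduli(c11=200.0, c12=100.0, c44=50.0)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
    print("ductile by Pugh's rule:", is_ductile_by_pugh(result))
```

A positive Cauchy pressure is usually read as pointing the same way as a high Pugh ratio, which is why the two criteria are often reported together.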
