485 research outputs found
Predicting software faults in large space systems using machine learning techniques
Recently, machine learning (ML) algorithms have proven to be of great practical value in solving a variety of engineering problems, including the prediction of failure, fault, and defect-proneness as space system software becomes more complex. One of the most active areas of recent ML research has been the use of ensemble classifiers. This paper shows how ML techniques (or classifiers) can be used to predict software faults in space systems, including many aerospace systems, and further combines individual classifiers into ensembles that vote for the most popular class in order to improve the prediction of software fault-proneness. Benchmarking results on four NASA public datasets show that the Naive Bayes classifier is the more robust software fault predictor, while most ensembles with a decision tree classifier as one of their components achieve higher accuracy rates.
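As an illustration of the voting idea described above, the following is a minimal sketch of majority voting over individual classifiers. The three threshold rules and the module metrics (`loc`, `cyclomatic`, `changes`) are hypothetical stand-ins, not the classifiers or features used in the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the base classifiers' predictions.

    Ties are broken in favour of the label that appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical base classifiers: simple threshold rules on illustrative metrics.
def by_size(m):
    return "faulty" if m["loc"] > 300 else "clean"

def by_complexity(m):
    return "faulty" if m["cyclomatic"] > 10 else "clean"

def by_churn(m):
    return "faulty" if m["changes"] > 5 else "clean"

def ensemble_predict(module, classifiers):
    """Let every classifier vote and return the winning class."""
    return majority_vote([clf(module) for clf in classifiers])

# Illustrative module: two of the three rules flag it as fault-prone.
module = {"loc": 450, "cyclomatic": 7, "changes": 9}
label = ensemble_predict(module, [by_size, by_complexity, by_churn])
```

In practice the base classifiers would be trained models (for example Naive Bayes and a decision tree) rather than fixed rules; only the voting step is shown here.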
Ensemble missing data techniques for software effort prediction
Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models that are used for prediction tasks require good quality data. Unfortunately, software engineering datasets tend to suffer from incompleteness, which can result in inaccurate decision making and poor project management and implementation. Recently, machine learning algorithms have proven to be of great practical value in solving a variety of software engineering problems, including software prediction, notably through ensembles of (combined) classifiers. Research indicates that combining individual classifiers into an ensemble, by having them vote for the most popular class, leads to a significant improvement in classification performance. This paper proposes a method for improving the software effort prediction accuracy of a decision tree learning algorithm by generating an ensemble that uses two imputation methods as elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.
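The general shape of such an ensemble can be sketched as follows, under several assumptions: the two imputation methods (mean and nearest-donor imputation), the one-split regression stump standing in for a decision tree, the averaging rule, and the project data are all illustrative choices, since the abstract does not specify them.

```python
from statistics import mean

def impute_mean(rows, col):
    """Fill missing values (None) in column `col` with the mean of the observed values."""
    fill = mean(r[col] for r in rows if r[col] is not None)
    return [r[:col] + (fill,) + r[col + 1:] if r[col] is None else r for r in rows]

def impute_nearest(rows, col, key):
    """Fill missing values in `col` by copying from the complete row closest on column `key`."""
    donors = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is None:
            donor = min(donors, key=lambda d: abs(d[key] - r[key]))
            r = r[:col] + (donor[col],) + r[col + 1:]
        out.append(r)
    return out

def fit_stump(rows, col, target):
    """One-split regression tree: split at the mean of `col`, predict the leaf averages."""
    threshold = mean(r[col] for r in rows)
    overall = mean(r[target] for r in rows)
    left = [r[target] for r in rows if r[col] <= threshold]
    right = [r[target] for r in rows if r[col] > threshold]
    left_value = mean(left) if left else overall
    right_value = mean(right) if right else overall
    return lambda x: left_value if x <= threshold else right_value

# Hypothetical projects: (team size, size in KLOC, effort in person-months).
projects = [(2, 10.0, 20.0), (3, None, 35.0), (5, 40.0, 80.0),
            (6, None, 95.0), (8, 70.0, 130.0)]

# One completed dataset, and one model, per imputation method.
completed = [impute_mean(projects, 1), impute_nearest(projects, 1, key=0)]
stumps = [fit_stump(rows, col=1, target=2) for rows in completed]

def ensemble_predict(size_kloc):
    """Combine the per-imputation models by averaging their predictions."""
    return mean(stump(size_kloc) for stump in stumps)
```

Each ensemble member sees the same projects but a differently completed dataset, so disagreement between members reflects uncertainty about the missing values.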
Improving the performance of the Ripper in insurance risk classification : a comparative study using feature selection
The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that its classification performance deteriorates as data quality declines through an increasing amount of missing data. In this paper, feature selection is used to help improve the classification performance of the Ripper model. Two techniques are applied for this purpose: principal component analysis and evidence automatic relevance determination. A comparison is made to see which technique improves the algorithm the most. Training datasets with completely observable data were used to construct the model, and testing datasets with missing values were used for measuring accuracy. The results showed that principal component analysis is the better feature selection technique for improving the classification performance of the Ripper.
REFLECTION AND COGITATION ON THE FALLACY OF POST-APARTHEID JURISPRUDENCE AND THE RESIDUE OF APARTHEID JURISPRUDENCE: THE MARIKANA MASSACRE
This article reflects on the fallacy of post-apartheid jurisprudence (particularly, the Marikana Massacre as a cornerstone of this ‘fallacy’) in South Africa. For the purposes of this article, the term ‘fallacy’ is used in its literal sense – that is, the so-called ‘post-apartheid jurisprudence’ is a mistaken belief, because there is still residue of apartheid economic policies in the ‘post’-apartheid legal dispensation. This mistaken belief is demonstrated below by considering the events that took place prior to (i.e. economic policies), during (i.e. law from below) and after (i.e. access to justice) the Marikana Massacre, and how one of the instrumental projects of the post-apartheid legal dispensation, that of transformative constitutionalism, was hindered during these events.
The Three Million Gang in Maokeng Township (Kroonstad) and the reaction of the African National Congress’s aligned structures
As early as 1989, when it was clear that there was a possibility of unbanning liberation movements in South Africa and securing the release of political prisoners, the African National Congress (ANC)-aligned structures in the different townships began openly and radically mobilising for the organisation. ANC-aligned demonstrations and protests became everyday scenes around the country, and it was evident that the South African Police (SAP) was gradually battling to control the ANC-aligned citizens in most townships. In mid-1989, a gang known as the Three Million emerged in Maokeng Township (Kroonstad) and was accused by community members of operating as a vigilante group. Incidents of vigilantism by the Three Million Gang became a regular scene in this township. Using the Three Million as a case in point, I attempted to show how the ANC-aligned structures reacted to this gang, which was viewed as a vigilante group in the Maokeng Township.
Ngiyabonga belungu bami
Considering how Black People are often seen/unseen as a monolith of ‘Black Bodies’ in township spaces in South Africa, I attempt to unpack what it looks like for Black People to blend into or stand out from contradictory, collective-led definitions of who/what Black people are in these spaces. By exploring where the individual Black Self meets the collective and how this delineation is blurred, I aim to delve further into notions of individuality and how these seep into a real or imagined whole. What does it take to be ‘kasi’? Who wants to be known as ‘kasi’? Who is ‘uDarkie ekasi’? As an arts practitioner developing an interdisciplinary praxis, I’m keen to explore these kinds of identity politics through the lens of translanguaging and township-based experiences and expressions. My aim is for this text to offer alternative insights into the intersection of ‘Black Bodies,’ the various notions associated with how ‘Black People’ are perceived, and their self-perceptions within township spaces.
Andrew Manson and Bernard Mbenga, Land, Chiefs, Mining: South Africa’s North West Province since 1840
Interest in the historical dynamics of ethnicity and land ownership in South Africa has been on the rise recently. In the introduction to this work, the authors deal with the importance of the history of the Setswana-speaking population of today's North West Province of South Africa. They also provide detail on the location of the province and show that the territory offers a number of unique features, including its important mining industry. The authors have in the past published scholarly work on the Batswana and their history. In this publication, they continue in the same vein by highlighting some of the neglected aspects of Batswana history.
Handling Out-of-Sequence Data: Kalman Filter Methods or Statistical Imputation?
The issue of handling sensor measurement data over single and multiple lag delays, also known as out-of-sequence measurement (OOSM), is considered. It is argued that this problem can also be addressed using model-based imputation strategies, and their application is demonstrated in comparison with Kalman filter (KF)-based approaches for a multi-sensor tracking prediction problem. The effectiveness of two model-based imputation procedures against five OOSM methods was investigated in Monte Carlo simulation experiments. The delayed measurements were either incorporated (or fused) at the time they finally became available (using OOSM methods) or imputed in a random way, with a higher probability of delays for multiple lags and a lower probability of delays for a single lag (using single or multiple imputation). For a single lag, estimates of target tracking computed from the observed data and those based on a data set in which the delayed measurements were imputed were equally unbiased; however, the KF estimates obtained using the Bayesian framework (BF-KF) were more precise. When the measurements were delayed in a multiple lag fashion, there were significant differences in bias or precision between multiple imputation (MI) and OOSM methods, with the former exhibiting superior performance at nearly all levels of probability of measurement delay and range of manoeuvring indices. Researchers working on sensor data are encouraged to take advantage of software to handle delayed measurements using MI, as estimates of tracking are more precise and less biased in the presence of delayed multi-sensor data than those derived from an observed data analysis approach. Defence Science Journal, 2010, 60(1), pp. 87-99, DOI: http://dx.doi.org/10.14429/dsj.60.11
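To make the contrast between filtering and imputation concrete, here is a minimal sketch of a scalar random-walk Kalman filter in which a measurement that has not arrived is either skipped or replaced by a simple single imputation (last observation carried forward). This is not one of the OOSM algorithms or imputation procedures evaluated in the paper; the noise variances, the initial state, and the measurement track are illustrative assumptions.

```python
def kalman_step(x, p, z, q=0.01, r=1.0):
    """One predict/update cycle of a scalar random-walk Kalman filter.

    x, p : prior state estimate and its variance
    z    : measurement, or None if it has not arrived yet
    q, r : process and measurement noise variances (illustrative values)
    """
    # Predict: the random-walk model keeps the estimate and inflates the variance.
    p = p + q
    if z is None:
        return x, p  # no measurement, so no update
    # Update: blend prediction and measurement using the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p

def run_filter(measurements, impute=False):
    """Filter a sequence in which delayed measurements are marked as None."""
    x, p = 0.0, 1.0  # assumed initial estimate and variance
    last_seen = None
    for z in measurements:
        if z is None and impute and last_seen is not None:
            z = last_seen  # single imputation: last observation carried forward
        elif z is not None:
            last_seen = z
        x, p = kalman_step(x, p, z)
    return x, p

# Hypothetical track with one delayed measurement in the middle.
track = [1.0, None, 1.2]
x_skip, p_skip = run_filter(track)
x_imp, p_imp = run_filter(track, impute=True)
```

Note that single imputation treats the filled-in value as if it had been observed, so the resulting variance is optimistic; drawing several plausible values and combining the results, as multiple imputation does, is one way to account for that extra uncertainty.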
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning and are currently applied to a variety of data mining or knowledge discovery applications, particularly for classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited to the assumption that data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is a set of extensive simulations that study the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values occur in both the training and test sets or in only one of the two. All simulations are performed under the missing completely at random, missing at random and informatively missing mechanisms, and for different missing data patterns and proportions.
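The three mechanisms named above differ in what the probability of a value being missing depends on. A minimal sketch of how each could be simulated follows; the two-column data layout, the median-based rule, and the rates are illustrative assumptions rather than the simulation design used in the thesis.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of `rows` ([(x, y), ...]) with some y values set to None.

    mcar: every y is deleted with the same probability
    mar : the deletion probability depends only on the always-observed x
    im  : the deletion probability depends on the value of y itself
    """
    rng = random.Random(seed)
    xs = sorted(x for x, _ in rows)
    ys = sorted(y for _, y in rows)
    x_med, y_med = xs[len(xs) // 2], ys[len(ys) // 2]
    out = []
    for x, y in rows:
        if mechanism == "mcar":
            p = rate
        elif mechanism == "mar":
            p = 2 * rate if x >= x_med else 0.0
        elif mechanism == "im":
            p = 2 * rate if y >= y_med else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out

# Hypothetical data: x is always observed, y is the attribute that may go missing.
rows = [(i, (i * 7) % 20) for i in range(20)]
mcar = make_missing(rows, "mcar")
mar = make_missing(rows, "mar")
im = make_missing(rows, "im")
```

Under the informatively missing setting the deleted values are systematically the large ones, which is why methods that implicitly assume the observed values are representative tend to suffer most there.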
The next contribution is a simple, novel, yet effective procedure for training and testing decision trees in the presence of missing data. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real world application domains. In this work, the proposed algorithm maintains (and sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. The proposed algorithm also greatly improves accuracy for informatively missing (IM) data. Another major advantage of this method over multiple imputation is the important saving in computational resources due to its simplicity.
The next contribution is the proposal of three versions of simple probabilistic techniques that can be used for classifying incomplete vectors using decision trees built from complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve quality comparable to that of sophisticated algorithms like multiple imputation and are therefore applicable to all kinds of datasets.
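For context, the fractional-cases idea that the proposed procedure is compared with can be sketched as follows: when an instance lacks the value tested at a node, it is sent down every branch, weighted by each branch's share of the training cases, and the resulting class distributions are combined. This is a sketch of that baseline idea only, not of the thesis's own techniques; the tree, its weights, and the attribute names are hypothetical.

```python
def classify(node, instance):
    """Return a class-probability dict for `instance` (a dict; missing values are None)."""
    if "dist" in node:  # leaf: stored class distribution
        return dict(node["dist"])
    value = instance.get(node["attr"])
    if value is not None:
        return classify(node["branches"][value], instance)
    # Missing value: follow every branch, weighted by its share of training cases.
    combined = {}
    for branch_value, weight in node["weights"].items():
        branch_dist = classify(node["branches"][branch_value], instance)
        for label, prob in branch_dist.items():
            combined[label] = combined.get(label, 0.0) + weight * prob
    return combined

# Hypothetical tree learned from complete data; weights are branch proportions.
tree = {
    "attr": "outlook",
    "weights": {"sunny": 0.5, "rain": 0.5},
    "branches": {
        "sunny": {"dist": {"play": 0.25, "stay": 0.75}},
        "rain": {
            "attr": "windy",
            "weights": {"yes": 0.25, "no": 0.75},
            "branches": {
                "yes": {"dist": {"play": 0.0, "stay": 1.0}},
                "no": {"dist": {"play": 1.0, "stay": 0.0}},
            },
        },
    },
}

# The test vector is missing "outlook", so both top-level branches contribute.
result = classify(tree, {"outlook": None, "windy": "no"})
```

The predicted class is then the label with the highest combined probability.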
Finally, two novel ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Empirical tests are used to evaluate and validate the success of the proposed ensemble methods with respect to individual missing data techniques. EMIMIA attains the highest overall level of prediction accuracy.
