
    Revisiting Unsupervised Learning for Defect Prediction

    Collecting quality data from software projects can be time-consuming and expensive. Hence, some researchers explore "unsupervised" approaches to quality prediction that do not require labelled data. An alternative technique is to use "supervised" approaches that learn models from project data labelled with, say, "defective" or "not-defective". Most researchers use these supervised models since, it is argued, they can exploit more knowledge of the projects. At FSE'16, Yang et al. reported startling results in which unsupervised defect predictors outperformed supervised predictors for effort-aware just-in-time defect prediction. If confirmed, these results would lead to a dramatic simplification of a seemingly complex task (data mining) that is widely explored in the software engineering literature. This paper repeats and refutes those results as follows. (1) There is much variability in the efficacy of the Yang et al. predictors, so even with their approach, some supervised data is required to prune weaker predictors away. (2) Their findings were grouped across N projects. When we repeat their analysis on a project-by-project basis, supervised predictors are seen to work better. Even though this paper rejects the specific conclusions of Yang et al., we still endorse their general goal. In our experiments, supervised predictors did not perform outstandingly better than unsupervised ones for effort-aware just-in-time defect prediction. Hence, there may indeed be some combination of unsupervised learners that achieves performance comparable to supervised ones. We therefore encourage others to work in this promising area.
    Comment: 11 pages, 5 figures. Accepted at FSE201
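    A minimal Python sketch of the kind of project-by-project, effort-aware comparison described above. It assumes each project's change data is a pandas DataFrame with numeric metric columns, a lines-of-code column named "loc" used as the effort proxy, and a binary "defective" label; the 20% effort budget, the logistic-regression learner, and the 1/LOC unsupervised ranking are illustrative assumptions, not necessarily the paper's exact setup.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def recall_at_20pct_effort(changes, ranking):
        """Fraction of defective changes caught when inspecting top-ranked
        changes until 20% of the total lines of code have been reviewed."""
        ordered = changes.iloc[np.argsort(-ranking)]
        within_budget = ordered["loc"].cumsum() <= 0.2 * ordered["loc"].sum()
        found = ordered.loc[within_budget, "defective"].sum()
        return found / max(changes["defective"].sum(), 1)

    def compare_on_project(train, test, features):
        # Column names "loc" and "defective" are assumed, illustrative names.
        # Supervised: rank test changes by predicted defect probability.
        clf = LogisticRegression(max_iter=1000).fit(train[features], train["defective"])
        supervised = clf.predict_proba(test[features])[:, 1]
        # Unsupervised: rank by the reciprocal of a single metric, no labels needed.
        unsupervised = (1.0 / (test["loc"] + 1)).to_numpy()
        return (recall_at_20pct_effort(test, supervised),
                recall_at_20pct_effort(test, unsupervised))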

    Bellwethers: A Baseline Method For Transfer Learning

    Software analytics builds quality prediction models for software projects. Experience shows that (a) the more projects studied, the more varied are the conclusions; and (b) project managers lose faith in the results of software analytics if those results keep changing. To reduce this conclusion instability, we propose the use of "bellwethers": given N projects from a community, the bellwether is the project whose data yields the best predictions on all the others. Bellwethers offer a way to mitigate conclusion instability because conclusions about a community remain stable as long as the bellwether continues to be the best oracle. Bellwethers are also simple to discover (just wrap a for-loop around standard data miners), as sketched below. When compared to other transfer learning methods (TCA+, transfer Naive Bayes, value cognitive boosting), using just the bellwether data to construct a simple transfer learner yields comparable predictions. Further, bellwethers appear in many SE tasks such as defect prediction, effort estimation, and bad smell detection. We hence recommend using bellwethers as a baseline method for transfer learning against which future work should be compared.
    Comment: 23 Pages
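    A minimal Python sketch of that for-loop, assuming each project's data is available as a (features, labels) pair in a dict keyed by project name; GaussianNB stands in for the "standard data miners" and F1 for the evaluation measure, both illustrative choices rather than the paper's exact configuration.

    from itertools import product

    from sklearn.metrics import f1_score
    from sklearn.naive_bayes import GaussianNB

    def find_bellwether(projects):
        """Return the name of the project whose data best predicts all the others.

        `projects` maps project name -> (X, y), where X holds the metric rows and
        y the defect labels for that project (names here are illustrative)."""
        scores = {name: [] for name in projects}
        for src, tgt in product(projects, repeat=2):
            if src == tgt:
                continue
            X_src, y_src = projects[src]
            X_tgt, y_tgt = projects[tgt]
            model = GaussianNB().fit(X_src, y_src)      # train on one candidate project
            scores[src].append(f1_score(y_tgt, model.predict(X_tgt)))  # test on every other
        # The bellwether is the source project with the best average cross-project score.
        return max(scores, key=lambda name: sum(scores[name]) / len(scores[name]))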

    Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle

    Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner. This paper tests for the delayed issue effect in 171 software projects conducted around the world in the period 2006 to 2014. To the best of our knowledge, this is the largest study yet published on this effect. We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction. This paper documents the above study and explores reasons for the mismatch between this common rule of thumb and the empirical data. In summary, DIE is not some constant across all projects. Rather, DIE might be a historical relic that occurs intermittently only in certain kinds of projects. This is a significant result since it predicts that new development processes promising to retire more issues faster will not have a guaranteed return on investment (depending on the context where they are applied), and that a long-held truth in software engineering should not be considered a global truism.
    Comment: 31 pages. Accepted with minor revisions to the Journal of Empirical Software Engineering. Keywords: software economics, phase delay, cost to fix