Revisiting Unsupervised Learning for Defect Prediction
Collecting quality data from software projects can be time-consuming and
expensive. Hence, some researchers explore "unsupervised" approaches to quality
prediction that do not require labelled data. The alternative is to use
"supervised" approaches that learn models from project data labelled with,
say, "defective" or "not-defective". Most researchers use these supervised
models since, it is argued, they can exploit more knowledge of the projects.
At FSE'16, Yang et al. reported startling results where unsupervised defect
predictors outperformed supervised predictors for effort-aware just-in-time
defect prediction. If confirmed, these results would lead to a dramatic
simplification of a seemingly complex task (data mining) that is widely
explored in the software engineering literature.
This paper repeats and refutes those results as follows. (1) There is much
variability in the efficacy of the Yang et al. predictors, so even with their
approach, some supervised data is required to prune weaker predictors away.
(2) Their findings were grouped across projects. When we repeat their
analysis on a project-by-project basis, supervised predictors are seen to work
better.
Even though this paper rejects the specific conclusions of Yang et al., we
still endorse their general goal. In our experiments, supervised predictors
did not perform outstandingly better than unsupervised ones for effort-aware
just-in-time defect prediction. Hence, there may indeed be some combination of
unsupervised learners that achieves performance comparable to supervised ones.
We therefore encourage others to work in this promising area.
Bellwethers: A Baseline Method For Transfer Learning
Software analytics builds quality prediction models for software projects.
Experience shows that (a) the more projects studied, the more varied are the
conclusions; and (b) project managers lose faith in the results of software
analytics if those results keep changing. To reduce this conclusion
instability, we propose the use of "bellwethers": given N projects from a
community the bellwether is the project whose data yields the best predictions
on all others. The bellwethers offer a way to mitigate conclusion instability
because conclusions about a community are stable as long as this bellwether
continues as the best oracle. Bellwethers are also simple to discover (just
wrap a for-loop around standard data miners). When compared to other transfer
learning methods (TCA+, transfer Naive Bayes, value cognitive boosting), using
just the bellwether data to construct a simple transfer learner yields
comparable predictions. Further, bellwethers appear in many SE tasks such as
defect prediction, effort estimation, and bad smell detection. We hence
recommend using bellwethers as a baseline method for transfer learning against
which future work should be compared.
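As a concrete illustration of the "for-loop around standard data miners" idea, the sketch below discovers a bellwether by training a learner on each project in turn and scoring it on all the others; the choice of scikit-learn's Naive Bayes and the F1 score are assumptions made for illustration, not the paper's exact configuration.

    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import f1_score

    def find_bellwether(projects):
        """projects: dict mapping project name -> (X, y) features and labels."""
        best_name, best_score = None, -1.0
        for name, (X_train, y_train) in projects.items():
            model = GaussianNB().fit(X_train, y_train)
            # Score this project's model on every *other* project's data.
            scores = [
                f1_score(y_test, model.predict(X_test))
                for other, (X_test, y_test) in projects.items()
                if other != name
            ]
            mean_score = sum(scores) / len(scores)
            if mean_score > best_score:
                best_name, best_score = name, mean_score
        return best_name, best_score  # the bellwether and its mean score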
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Many practitioners and academics believe in a delayed issue effect (DIE);
i.e. the longer an issue lingers in the system, the more effort it requires to
resolve. This belief is often used to justify major investments in new
development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects
conducted around the world between 2006 and 2014. To the best of our
knowledge, this is the largest study yet published on this effect. We found no
evidence for the delayed issue effect; i.e. the effort to resolve issues in a
later phase was not consistently or substantially greater than when issues were
resolved soon after their introduction.
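For illustration only, a test of this kind can be sketched as grouping resolution effort by how many phases an issue lingered and comparing medians; under DIE, effort should climb steeply with delay. The phase names, record fields, and numbers below are assumptions, not data from the study.

    from statistics import median

    PHASES = ["requirements", "design", "coding", "testing"]

    issues = [  # illustrative records: where introduced/resolved, hours to fix
        {"introduced": "requirements", "resolved": "requirements", "effort": 4.0},
        {"introduced": "requirements", "resolved": "testing", "effort": 5.0},
        {"introduced": "design", "resolved": "coding", "effort": 3.5},
        {"introduced": "coding", "resolved": "coding", "effort": 2.0},
        {"introduced": "coding", "resolved": "testing", "effort": 2.5},
    ]

    # Group resolution effort by how many phases each issue lingered.
    by_delay = {}
    for issue in issues:
        delay = PHASES.index(issue["resolved"]) - PHASES.index(issue["introduced"])
        by_delay.setdefault(delay, []).append(issue["effort"])

    for delay in sorted(by_delay):
        print(f"delay={delay} phases: median effort={median(by_delay[delay]):.1f}h")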
This paper documents the above study and explores reasons for this mismatch
between this common rule of thumb and empirical data. In summary, DIE is not
some constant across all projects. Rather, DIE might be a historical relic
that occurs intermittently only in certain kinds of projects. This is a
significant result, since it predicts that new development processes that
promise to retire issues faster will not have a guaranteed return on
investment (depending on the context where they are applied), and that a
long-held truth in software engineering should not be considered a global
truism.
