Irom Sharmila : the world’s longest hunger strike in world’s largest ‘democracy’
Ravi Nitesh is an India-based human rights activist. He is the founder of Mission Bhartiyam, an organisation working in the fields of peace and harmony, human rights, and the environment. He is a core member of the Save Sharmila Solidarity Campaign, a nationwide campaign in support of Irom Sharmila and the repeal of AFSPA.
Impact of Biases in Big Data
The underlying paradigm of big data-driven machine learning reflects the
desire to derive better conclusions simply by analyzing more data, without
the need to consider theory and models. But is simply having more data
always helpful? In 1936, The Literary Digest collected 2.3M completed
questionnaires to predict the outcome of that year's US presidential election.
The outcome of this big data prediction proved to be entirely wrong, whereas
George Gallup needed only 3K handpicked people to make an accurate prediction.
Generally, biases occur in machine learning whenever the distributions of the
training set and the test set differ. In this work, we provide a review of
different sorts of biases in (big) data sets in machine learning. We provide
definitions and discussions of the most commonly appearing biases in machine
learning: class imbalance and covariate shift. We also show how these biases
can be quantified and corrected. This work is an introductory text intended to
make both researchers and practitioners more aware of this topic and thus help
them derive more reliable models for their learning problems.
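The covariate-shift correction mentioned in this abstract can be sketched with importance weighting, a standard technique for re-weighting training samples by the density ratio between test and training input distributions. The 1-D Gaussian setup below is an illustrative assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: training inputs come from N(0, 1), while test
# inputs come from a shifted N(1, 1) -- a covariate shift.
x_train = rng.normal(0.0, 1.0, 5000)

def p_train(x):  # training input density, N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def p_test(x):   # test input density, N(1, 1)
    return np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2 * np.pi)

# Target quantity: the test-distribution mean of f(x) = x, whose true
# value is 1.0. Averaging over training samples alone is biased.
f = x_train
naive = f.mean()                        # estimates E_train[f] ~ 0.0

# Importance weights w(x) = p_test(x) / p_train(x) correct the bias:
# a self-normalized weighted average estimates E_test[f] ~ 1.0.
w = p_test(x_train) / p_train(x_train)
corrected = np.average(f, weights=w)
```

In practice the densities are unknown and the weight function itself must be estimated, e.g. by training a classifier to distinguish training from test inputs; the analytic densities here only keep the sketch self-contained.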
A systematic study of the class imbalance problem in convolutional neural networks
In this study, we systematically investigate the impact of class imbalance on
classification performance of convolutional neural networks (CNNs) and compare
frequently used methods to address the issue. Class imbalance is a common
problem that has been comprehensively studied in classical machine learning,
yet very limited systematic research is available in the context of deep
learning. In our study, we use three benchmark datasets of increasing
complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of
imbalance on classification and perform an extensive comparison of several
methods to address the issue: oversampling, undersampling, two-phase training,
and thresholding that compensates for prior class probabilities. Our main
evaluation metric is area under the receiver operating characteristic curve
(ROC AUC) adapted to multi-class tasks, since the overall accuracy metric is
associated with notable difficulties in the context of imbalanced data. Based
on the results of our experiments, we conclude that (i) the effect of class
imbalance on classification performance is detrimental; (ii) the method of
addressing class imbalance that emerged as dominant in almost all analyzed
scenarios was oversampling; (iii) oversampling should be applied to the level
that completely eliminates the imbalance, whereas the optimal undersampling
ratio depends on the extent of imbalance; (iv) as opposed to some classical
machine learning models, oversampling does not cause overfitting of CNNs; (v)
thresholding should be applied to compensate for prior class probabilities when
the overall number of properly classified cases is of interest.
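The thresholding method described in this abstract amounts to rescaling a network's posterior estimates by the training-set class priors before taking the argmax. A minimal sketch, using made-up posterior values and a hypothetical 9:1 two-class imbalance:

```python
import numpy as np

# Hypothetical training-set class frequencies (class 0 outnumbers
# class 1 nine to one) and posterior estimates from a trained model.
priors = np.array([0.9, 0.1])
posteriors = np.array([
    [0.70, 0.30],
    [0.55, 0.45],
    [0.95, 0.05],
])

# Plain argmax decides under the skewed training prior.
plain = posteriors.argmax(axis=1)       # every sample goes to class 0

# Dividing by the priors rescales the scores as if the classes were
# balanced; the argmax then compensates for the imbalance.
adjusted = posteriors / priors
balanced = adjusted.argmax(axis=1)      # first two samples flip to class 1
```

Which decision rule is preferable depends on the evaluation target: the abstract's conclusion (v) notes that compensating for the priors matters when the overall number of properly classified cases is of interest.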
