3,045 research outputs found
Practical Block-wise Neural Network Architecture Generation
Convolutional neural networks have achieved remarkable success in computer
vision. However, most usable network architectures are hand-crafted and usually
require expertise and elaborate design. In this paper, we provide a block-wise
network generation pipeline called BlockQNN which automatically builds
high-performance networks using the Q-Learning paradigm with an epsilon-greedy
exploration strategy. The optimal network block is constructed by a learning
agent trained to sequentially choose component layers. The block is then
stacked to construct the whole auto-generated network. To accelerate the
generation process, we also propose a distributed asynchronous framework and an
early-stop strategy. The block-wise generation brings unique advantages: (1) it
yields results competitive with state-of-the-art hand-crafted networks on image
classification; in particular, the best network generated by BlockQNN achieves
a 3.54% top-1 error rate on CIFAR-10, beating all existing auto-generated
networks; (2) it greatly reduces the search space of network design, requiring
only 3 days with 32 GPUs; and (3) it generalizes strongly: the network built on
CIFAR also performs well on the larger-scale ImageNet dataset.
Comment: Accepted to CVPR 2018
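As a rough illustration of the Q-Learning loop this abstract describes, here is a minimal sketch in which an epsilon-greedy agent sequentially picks a block's component layers and updates its Q-values from a terminal reward. The layer vocabulary, block depth, hyperparameters, and the evaluate placeholder are all hypothetical; in BlockQNN the reward would come from training and validating the network built by stacking the block.

```python
import random

# Hypothetical layer vocabulary and block depth; BlockQNN's actual action
# space (its network structure encoding) is richer than this sketch.
LAYERS = ["conv3x3", "conv5x5", "maxpool", "avgpool", "identity"]
BLOCK_DEPTH = 4
EPSILON = 0.1  # epsilon-greedy exploration rate (illustrative value)
ALPHA = 0.1    # Q-learning step size (illustrative value)
GAMMA = 1.0    # discount factor (illustrative value)

# Q[t][a]: value of appending layer a at step t. Using the step index as the
# state is a simplification of the paper's state representation.
Q = [{a: 0.0 for a in LAYERS} for _ in range(BLOCK_DEPTH)]

def choose_layer(t):
    """Epsilon-greedy selection over the layer vocabulary at step t."""
    if random.random() < EPSILON:
        return random.choice(LAYERS)
    return max(Q[t], key=Q[t].get)

def evaluate(block):
    """Placeholder reward. In BlockQNN this would be the validation accuracy
    of the network obtained by stacking the block (with early stop)."""
    return random.random()

for episode in range(200):
    block = [choose_layer(t) for t in range(BLOCK_DEPTH)]
    reward = evaluate(block)  # reward arrives only at the end of the episode
    # Q-learning backup along the sampled trajectory, terminal reward only.
    for t in reversed(range(BLOCK_DEPTH)):
        a = block[t]
        target = reward if t == BLOCK_DEPTH - 1 else GAMMA * max(Q[t + 1].values())
        Q[t][a] += ALPHA * (target - Q[t][a])

best_block = [max(Q[t], key=Q[t].get) for t in range(BLOCK_DEPTH)]
```

In the paper's distributed asynchronous setup, many such evaluate calls would run in parallel across GPUs, with the early-stop strategy cutting short unpromising candidates.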
Accelerated Training for Massive Classification via Dynamic Class Selection
Massive classification, a classification task defined over a vast number of
classes (hundreds of thousands or even millions), has become an essential part
of many real-world systems, such as face recognition. Existing methods,
including the deep networks that have achieved remarkable success in recent
years, were mostly devised for problems with a moderate number of classes. They
run into substantial difficulties, e.g. excessive memory demand and
computational cost, when applied to massive problems. We present a new method
to tackle this problem. It can efficiently and accurately identify a small
number of "active classes" for each mini-batch, based on a set of dynamic class
hierarchies constructed on the fly. We also develop an adaptive allocation
scheme on top of this, which leads to a better tradeoff between performance and
cost. On several large-scale benchmarks, our method significantly reduces
training cost and memory demand while maintaining competitive performance.
Comment: 8 pages, 6 figures, AAAI 2018
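To make the "active classes" idea concrete, below is a minimal NumPy sketch of a partial softmax whose cost scales with the number of selected classes rather than the full class count. The class and feature sizes are hypothetical, and the selection step here merely pads the batch's ground-truth classes with random negatives as a stand-in for the paper's dynamic class hierarchies, which is where the method's actual selection logic lives.

```python
import numpy as np

NUM_CLASSES, FEAT_DIM, BATCH = 10_000, 64, 32  # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(NUM_CLASSES, FEAT_DIM)).astype(np.float32)  # full classifier

def select_active_classes(labels, num_extra=64):
    """Stand-in for the paper's selection step: the batch's ground-truth
    classes plus random negatives. The actual method picks negatives via
    class hierarchies built on the fly, keeping selection itself cheap."""
    extra = rng.integers(0, NUM_CLASSES, size=num_extra)
    return np.unique(np.concatenate([labels, extra]))

def partial_softmax_loss(features, labels, active):
    """Cross-entropy restricted to the active classes: memory and compute
    scale with len(active) instead of NUM_CLASSES."""
    logits = features @ W[active].T                # (BATCH, |active|)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    col = {c: i for i, c in enumerate(active)}     # class id -> column index
    idx = np.array([col[c] for c in labels])
    return -log_prob[np.arange(len(labels)), idx].mean()

features = rng.normal(size=(BATCH, FEAT_DIM)).astype(np.float32)
labels = rng.integers(0, NUM_CLASSES, size=BATCH)
active = select_active_classes(labels)
loss = partial_softmax_loss(features, labels, active)
```

Gradients in such a scheme touch only the active rows of W, which is the source of the memory and compute savings the abstract claims; the paper's adaptive allocation scheme would further tune how the selection budget is spent.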
