Fixing the train-test resolution discrepancy
Data augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128×128 images, and 79.8% with one trained on 224×224 images. In addition, if we use extra training data, we get 82.5% with a ResNet-50 trained on 224×224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224×224 and further optimizing for test resolution 320×320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge, these are the highest ImageNet single-crop top-1 and top-5 accuracies to date.
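A minimal sketch of the two-phase recipe the abstract describes, written in PyTorch. The abstract only states that the fine-tuning at test resolution is computationally cheap; freezing everything except the classifier head is one assumed way to keep it cheap and is not claimed to be the paper's exact procedure. The data loaders `train_loader_128` and `finetune_loader_224` are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights=None)

# Phase 1: standard training at a low resolution (e.g. 128x128 crops).
# ... ordinary training loop over train_loader_128 (placeholder) ...

# Phase 2: cheap fine-tuning at the target test resolution (e.g. 224x224).
# Assumption for illustration: only the classifier head is updated.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in finetune_loader_224:  # 224x224 images (placeholder loader)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```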
Co-training Submodels for Visual Recognition
We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging.

Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt, and our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
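A minimal sketch of the cosub idea as described in the abstract, in PyTorch. It assumes `model` applies stochastic depth internally, so two forward passes on the same batch draw two different random subsets of layers (two "submodels"). The exact loss combination, the temperature `tau` and the mixing weight `lam` are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def cosub_loss(model, images, labels, tau=1.0, lam=0.5):
    # Two forward passes = two stochastic-depth submodels of the same weights.
    logits_a = model(images)
    logits_b = model(images)

    # Regular supervised loss from the one-hot labels, averaged over both submodels.
    ce = 0.5 * (F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels))

    # Each submodel acts as a soft teacher for the other (teacher targets detached).
    kd_a = F.kl_div(F.log_softmax(logits_a / tau, dim=-1),
                    F.softmax(logits_b.detach() / tau, dim=-1),
                    reduction="batchmean")
    kd_b = F.kl_div(F.log_softmax(logits_b / tau, dim=-1),
                    F.softmax(logits_a.detach() / tau, dim=-1),
                    reduction="batchmean")

    return (1.0 - lam) * ce + lam * 0.5 * (kd_a + kd_b)
```

Because both submodels come from a single set of weights, no external teacher or weight averaging is needed; the loss above is simply used in place of plain cross-entropy in the training loop.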
Facile synthesis of 3-substituted thieno[3,2-b]furan derivatives
A facile synthesis of dimethyl 3-hydroxythieno[3,2-b]furan-2,5-dicarboxylate is reported, starting from the available materials methyl thioglycolate and dimethyl acetylenedicarboxylate. This compound represents an efficient precursor for the synthesis of 3-substituted thieno[3,2-b]furan derivatives.
- …
