DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
Today, there are two major paradigms for vision-based autonomous driving
systems: mediated perception approaches that parse an entire scene to make a
driving decision, and behavior reflex approaches that directly map an input
image to a driving action by a regressor. In this paper, we propose a third
paradigm: a direct perception approach to estimate the affordance for driving.
We propose to map an input image to a small number of key perception indicators
that directly relate to the affordance of a road/traffic state for driving. Our
representation provides a set of compact yet complete descriptions of the scene
to enable a simple controller to drive autonomously. Falling in between the two
extremes of mediated perception and behavior reflex, we argue that our direct
perception representation provides the right level of abstraction. To
demonstrate this, we train a deep Convolutional Neural Network using recordings
from 12 hours of human driving in a video game and show that our model can
reliably drive a car in a very diverse set of virtual environments. We also
train a model for car distance estimation on the KITTI dataset. Results show
that our direct perception approach can generalize well to real driving images.
Source code and data are available on our project website.
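As an illustration of the direct-perception paradigm described above, here is a minimal sketch in PyTorch (the framework choice, network shape, indicator names, indicator count, and controller thresholds are all assumptions for illustration, not the paper's actual implementation):

```python
import torch
import torch.nn as nn

class AffordanceNet(nn.Module):
    """Maps an RGB image to a small vector of affordance indicators
    (e.g. angle to lane centre, distance to the lead car). The indicator
    set here is illustrative, not the paper's exact definition."""
    def __init__(self, num_indicators: int = 13):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.regressor = nn.Linear(64, num_indicators)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(image).flatten(1))

def simple_controller(affordances: torch.Tensor) -> dict:
    """Toy per-image controller: steer toward the lane centre, brake if
    the lead car is close. Thresholds and gains are placeholders."""
    angle, lead_distance = affordances[0].item(), affordances[1].item()
    return {"steer": -0.5 * angle, "brake": lead_distance < 10.0}
```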
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
While there has been remarkable progress in the performance of visual
recognition algorithms, the state-of-the-art models tend to be exceptionally
data-hungry. Large labeled training datasets, expensive and tedious to produce,
are required to optimize millions of parameters in deep network models. Lagging
behind the growth in model capacity, the available datasets are quickly
becoming outdated in terms of size and density. To circumvent this bottleneck,
we propose to amplify human effort through a partially automated labeling
scheme, leveraging deep learning with humans in the loop. Starting from a large
set of candidate images for each category, we iteratively sample a subset, ask
people to label them, classify the others with a trained model, split the set
into positives, negatives, and unlabeled based on the classification
confidence, and then iterate with the unlabeled set. To assess the
effectiveness of this cascading procedure and enable further progress in visual
recognition research, we construct a new image dataset, LSUN. It contains
around one million labeled images for each of 10 scene categories and 20 object
categories. We experiment with training popular convolutional networks and find
that they achieve substantial performance gains when trained on this dataset.
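The labeling cascade described above is essentially an iterative algorithm; a rough Python sketch of one possible reading of it follows (the helpers ask_humans_to_label and train_classifier and the confidence thresholds are hypothetical placeholders, not the authors' pipeline):

```python
def label_category(candidates, high=0.95, low=0.05, sample_size=1000):
    """Iteratively label a candidate pool for one category, mixing
    human annotation with model predictions (illustrative sketch)."""
    positives, negatives, unlabeled = [], [], list(candidates)
    while unlabeled:
        batch = unlabeled[:sample_size]            # sample a subset
        human_labels = ask_humans_to_label(batch)  # hypothetical crowd step
        model = train_classifier(batch, human_labels)  # hypothetical trainer
        next_round = []
        for image in unlabeled[sample_size:]:
            p = model.predict_proba(image)         # confidence for "positive"
            if p >= high:
                positives.append(image)            # confidently positive
            elif p <= low:
                negatives.append(image)            # confidently negative
            else:
                next_round.append(image)           # still ambiguous -> iterate
        positives += [im for im, y in zip(batch, human_labels) if y]
        negatives += [im for im, y in zip(batch, human_labels) if not y]
        unlabeled = next_round
    return positives, negatives
```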
Anatomy-specific classification of medical images using deep convolutional nets
Automated classification of human anatomy is an important prerequisite for
many computer-aided diagnosis systems. The spatial complexity and variability
of anatomy throughout the human body make classification difficult. "Deep
learning" methods such as convolutional networks (ConvNets) outperform other
state-of-the-art methods in image classification tasks. In this work, we
present a method for organ- or body-part-specific anatomical classification of
medical images acquired using computed tomography (CT) with ConvNets. We train
a ConvNet on 4,298 separate axial 2D key-images to learn 5 anatomical
classes. The key-images were mined from a hospital PACS archive covering
1,675 patients. We show that a data augmentation approach can help to enrich
the data set and improve classification performance. Using ConvNets and data
augmentation, we achieve an anatomy-specific classification error of 5.9% and
an average area-under-the-curve (AUC) value of 0.998 in testing. We
demonstrate that deep learning can be used to train very reliable and accurate
classifiers that could initialize further computer-aided diagnosis.
Comment: Presented at the 2015 IEEE International Symposium on Biomedical
Imaging, April 16-19, 2015, New York Marriott at Brooklyn Bridge, NY, US.
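The recipe in the abstract is a standard ConvNet plus data augmentation on 2D CT key-images; a minimal setup sketch in PyTorch/torchvision follows (the transforms, architecture, and hyperparameters are assumptions for illustration, not the paper's configuration):

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Illustrative augmentation for single-channel CT key-images:
# small random rotations and crops enrich the training set.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

model = nn.Sequential(                 # toy ConvNet for 5 anatomical classes
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 5),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```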
A New 2.5D Representation for Lymph Node Detection using Random Sets of Deep Convolutional Neural Network Observations
Automated Lymph Node (LN) detection is an important clinical diagnostic task
but very challenging due to the low contrast of surrounding structures in
Computed Tomography (CT) and to their varying sizes, poses, shapes and sparsely
distributed locations. State-of-the-art studies report performance in the range
of 52.9% sensitivity at 3.1 false-positives per volume (FP/vol.), or 60.9% at
6.1 FP/vol., for mediastinal LNs, using one-shot boosting on 3D Haar features.
In this paper, we first run a preliminary candidate generation stage, tuned
towards 100% sensitivity at the cost of high FP levels (40 per patient), to
harvest volumes of interest (VOI). Our 2.5D approach then decomposes each 3D
VOI by resampling 2D reformatted orthogonal views N times, via random scales,
translations, and rotations with respect to the VOI centroid coordinates. These
random views are then used to train a deep Convolutional Neural Network (CNN)
classifier. In testing, the CNN is employed to assign LN probabilities for all
N random views that can be simply averaged (as a set) to compute the final
classification probability per VOI. We validate the approach on two datasets:
90 CT volumes with 388 mediastinal LNs and 86 patients with 595 abdominal LNs.
We achieve sensitivities of 70%/83% at 3 FP/vol. and 84%/90% at 6 FP/vol. in
mediastinum and abdomen, respectively, which drastically improves over the
previous state-of-the-art work.
Comment: This article will be presented at MICCAI (Medical Image Computing and
Computer-Assisted Intervention) 201
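A minimal sketch of the 2.5D test-time aggregation described above, assuming a trained per-view CNN classifier is available; sample_orthogonal_view and its parameter ranges are hypothetical placeholders, not the authors' code:

```python
import numpy as np

def voi_probability(voi, cnn, num_views=100, seed=0):
    """Score one volume of interest by averaging a per-view CNN's
    lymph-node probabilities over N randomly transformed 2D views.
    `sample_orthogonal_view` is a hypothetical resampling helper that
    applies a random scale/translation/rotation about the VOI centroid."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(num_views):
        view = sample_orthogonal_view(
            voi,
            scale=rng.uniform(0.8, 1.2),
            translation=rng.uniform(-4, 4, size=3),
            rotation=rng.uniform(0, 360),
        )
        probs.append(cnn.predict_proba(view))   # per-view LN probability
    return float(np.mean(probs))                # set-level average per VOI
```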
Discrete Object Generation with Reversible Inductive Construction
The success of generative modeling in continuous domains has led to a surge
of interest in generating discrete data such as molecules, source code, and
graphs. However, construction histories for these discrete objects are
typically not unique and so generative models must reason about intractably
large spaces in order to learn. Additionally, structured discrete domains are
often characterized by strict constraints on what constitutes a valid object
and generative models must respect these requirements in order to produce
useful novel samples. Here, we present a generative model for discrete objects
employing a Markov chain where transitions are restricted to a set of local
operations that preserve validity. Building on generative interpretations
of denoising autoencoders, the Markov chain alternates between producing 1) a
sequence of corrupted objects that are valid but not from the data
distribution, and 2) a learned reconstruction distribution that attempts to fix
the corruptions while also preserving validity. This approach constrains the
generative model to only produce valid objects, requires the learner to only
discover local modifications to the objects, and avoids marginalization over an
unknown and potentially large space of construction histories. We evaluate the
proposed approach on two highly structured discrete domains, molecules and
Laman graphs, and find that it compares favorably to alternative methods at
capturing distributional statistics for a host of semantically relevant
metrics.
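A schematic sketch of the corrupt-and-reconstruct Markov chain described above; the validity-preserving local operations and the learned reconstruction distribution are stand-in placeholders rather than the paper's components:

```python
import random

def run_chain(initial_object, local_ops, reconstructor, steps=1000, corruptions=5):
    """Alternate between (1) corrupting a valid object with random
    validity-preserving local operations and (2) sampling a repaired
    object from a learned reconstruction distribution. Every state
    visited stays valid by construction."""
    obj = initial_object
    samples = []
    for _ in range(steps):
        corrupted = obj
        for _ in range(corruptions):
            op = random.choice(local_ops)      # each op maps valid -> valid
            corrupted = op(corrupted)
        obj = reconstructor.sample(corrupted)  # learned denoising step
        samples.append(obj)
    return samples
```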
Interleaved text/image Deep Mining on a large-scale radiology database
Despite tremendous progress in computer vision, effective learning on very
large-scale (>100K patients) medical image databases has been vastly hindered.
We present an interleaved text/image deep learning system to extract and mine
the semantic interactions of radiology images and reports from a national
research hospital's picture archiving and communication system. Instead of
using full 3D medical volumes, we focus on a collection of representative
~216K 2D key images/slices (selected by clinicians for diagnostic reference)
with text-driven scalar and vector labels. Our system interleaves between
unsupervised learning (e.g., latent Dirichlet allocation, recurrent neural net
language models) on document- and sentence-level texts to generate semantic
labels and supervised learning via deep convolutional neural networks (CNNs)
to map from images to label spaces. Disease-related key words can be predicted
for radiology images in a retrieval manner. We have demonstrated promising
quantitative and qualitative results. The large-scale datasets of extracted
key images and their categorization, embedded vector labels and sentence
descriptions can be harnessed to alleviate the deep learning "data-hungry"
obstacle in the medical domain.
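A rough sketch of the interleaving idea using scikit-learn for the unsupervised topic stage; the report corpus, key images, CNN, and topic count are placeholders, not the authors' actual system:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def mine_semantic_labels(report_texts, num_topics=80):
    """Unsupervised stage: derive a dominant-topic label per radiology
    report, which then serves as the supervised target for an image CNN."""
    counts = CountVectorizer(stop_words="english").fit_transform(report_texts)
    lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
    topic_mix = lda.fit_transform(counts)      # per-report topic distribution
    return topic_mix.argmax(axis=1)            # dominant topic = semantic label

# Supervised stage (schematic): train an image classifier so that each key
# image maps to the label mined from its paired report, e.g.
#   labels = mine_semantic_labels(reports)
#   cnn.fit(key_images, labels)   # `cnn` is a placeholder image classifier
```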
