Large-scale log analysis of digital reading
In this paper, we address the daily reading practices of the general public in Russia by analyzing 10 months of log data from the commercial ebook site Bookmate. We study several characteristics of ebook reading, i.e., reading volume and preferences, reading schedule, reading speed, and reading style (including parallel reading patterns and book abandonment rates), with respect to reader gender, book length, and book genre. We find that book genre impacts certain reading behaviors, while gender differences and book length seem to play less of a role in ebook reading. Parallel book reading and book abandonment occur very frequently, possibly pointing towards changing reading behaviors in the ebook environment. The obtained insights demonstrate the high potential of log analysis for book reading studies. Copyright © 2016 by Association for Information Science and Technology
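To make this kind of log analysis concrete, here is a minimal pandas sketch of how such reading characteristics could be computed from session logs. The schema (user_id, book_id, session_start, pages_read, book_length_pages, genre), the 90% completion threshold for "abandonment", and the weekly window for parallel reading are all illustrative assumptions, not details from the paper.

```python
import pandas as pd

# Hypothetical log schema; field names and values are illustrative,
# not taken from the Bookmate dataset analyzed in the paper.
logs = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2],
    "book_id":  ["a", "a", "b", "a", "c"],
    "genre":    ["fiction", "fiction", "sci-fi", "fiction", "sci-fi"],
    "session_start": pd.to_datetime(
        ["2016-01-04", "2016-01-05", "2016-01-05", "2016-01-11", "2016-01-12"]),
    "pages_read": [120, 180, 40, 300, 15],
    "book_length_pages": [300, 300, 250, 300, 400],
})

# Per (user, book): total pages read vs. book length.
progress = (logs.groupby(["user_id", "book_id", "genre"], as_index=False)
                .agg(pages=("pages_read", "sum"),
                     length=("book_length_pages", "first")))

# Assumed definition: a book is "abandoned" if <90% of it was read.
progress["abandoned"] = progress["pages"] < 0.9 * progress["length"]
print(progress.groupby("genre")["abandoned"].mean())  # abandonment rate by genre

# Parallel reading: distinct books a user touches within the same week.
logs["week"] = logs["session_start"].dt.to_period("W")
print(logs.groupby(["user_id", "week"])["book_id"].nunique())
```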
Simplex Random Features
We present Simplex Random Features (SimRFs), a new random feature (RF)
mechanism for unbiased approximation of the softmax and Gaussian kernels by
geometrical correlation of random projection vectors. We prove that SimRFs
provide the smallest possible mean square error (MSE) on unbiased estimates of
these kernels among the class of weight-independent geometrically-coupled
positive random feature (PRF) mechanisms, substantially outperforming the
previously most accurate Orthogonal Random Features at no observable extra
cost. We present a more computationally expensive SimRFs+ variant, which we
prove is asymptotically optimal in the broader family of weight-dependent
geometrical coupling schemes (which permit correlations between random vector
directions and norms). In extensive empirical studies, we show consistent gains
provided by SimRFs in settings including pointwise kernel estimation,
nonparametric classification, and scalable Transformers.
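To illustrate the mechanism, here is a minimal NumPy sketch, under simplifying assumptions, of positive random features (PRFs) for the softmax kernel exp(x·y) with m = d features. The i.i.d. baseline draws Gaussian projection rows independently, while the SimRF variant couples the row directions to be vertices of a randomly rotated regular simplex, keeping chi(d)-distributed norms so each row is still marginally Gaussian (hence the estimator stays unbiased). This is a simplified sketch of the idea, not the paper's full construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def simplex_projections(d, rng):
    # d unit vectors in R^d with pairwise dot product -1/(d-1):
    # centered, normalized standard basis vectors (a regular simplex).
    S = np.eye(d) - np.ones((d, d)) / d
    S /= np.linalg.norm(S, axis=1, keepdims=True)
    # Haar-random rotation via sign-corrected QR of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))
    # chi(d)-distributed norms keep each row marginally N(0, I_d).
    norms = np.sqrt(rng.chisquare(d, size=d))
    return norms[:, None] * (S @ Q)

def prf(x, W):
    # Positive random features: E[prf(x, W) @ prf(y, W)] = exp(x @ y).
    return np.exp(W @ x - x @ x / 2) / np.sqrt(W.shape[0])

d = 16
x = rng.standard_normal(d) / 4
y = rng.standard_normal(d) / 4
exact = np.exp(x @ y)

est_iid, est_sim = [], []
for _ in range(2000):
    W_iid = rng.standard_normal((d, d))   # baseline: i.i.d. Gaussian rows
    W_sim = simplex_projections(d, rng)   # geometrically coupled SimRFs
    est_iid.append(prf(x, W_iid) @ prf(y, W_iid))
    est_sim.append(prf(x, W_sim) @ prf(y, W_sim))

print("exact:", exact)
print("iid   PRF MSE:", np.mean((np.array(est_iid) - exact) ** 2))
print("SimRF PRF MSE:", np.mean((np.array(est_sim) - exact) ** 2))
```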
Scalable Neural Network Kernels
We introduce the concept of scalable neural network kernels (SNNKs), replacements for regular feedforward layers (FFLs) that approximate the latter but have favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly more expressive, as they can model complicated relationships beyond functions of the dot-products of parameter-input vectors. We also introduce the neural network bundling process that applies
SNNKs to compactify deep neural network architectures, resulting in additional
compression gains. In its extreme version, it leads to the fully bundled
network whose optimal parameters can be expressed via explicit formulae for
several loss functions (e.g., mean squared error), opening the possibility of bypassing backpropagation. As a by-product of our analysis, we introduce the mechanism of universal random features (URFs), which we apply to instantiate several SNNK variants and which is interesting in its own right in the context of scalable kernel methods. We provide a rigorous theoretical analysis of all these concepts
as well as an extensive empirical evaluation, ranging from point-wise kernel
estimation to Transformers' fine-tuning with novel adapter layers inspired by
SNNKs. Our mechanism provides up to a 5x reduction in the number of trainable
parameters while maintaining competitive accuracy.
Comment: ICLR 2024
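As a hedged illustration of the input-parameter disentanglement (one special case only, not the paper's general URF construction): an FFL with exponential activation, f(x)_j = exp(w_j·x + b_j), factors exactly in expectation into a dot product of an input-side feature map and a parameter-side feature map that share the same random projections. The sketch below verifies this in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n_out = 8, 4096, 3              # input dim, random features, output units

Omega = rng.standard_normal((m, d))   # projections shared by BOTH feature maps

def phi(x):
    # Input-side map: depends only on the input x.
    return np.exp(Omega @ x - x @ x / 2) / np.sqrt(m)

def psi(w, b):
    # Parameter-side map: depends only on the layer's weights and bias.
    return np.exp(Omega @ w - w @ w / 2 + b) / np.sqrt(m)

x = rng.standard_normal(d) / 4
W = rng.standard_normal((n_out, d)) / 4
b = rng.standard_normal(n_out) / 4

exact = np.exp(W @ x + b)   # exponential-activation FFL output
approx = np.array([phi(x) @ psi(W[j], b[j]) for j in range(n_out)])
print(exact)                # exact FFL outputs
print(approx)               # unbiased estimate; error shrinks as m grows
```

Because phi(x) can be computed once and reused against every psi(w_j, b_j), the input-dependent and parameter-dependent computations factor apart, which is the property the bundling process exploits.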
Adaptive Computation with Elastic Input Sequence
Humans have the ability to adapt the type of information they use, the
procedure they employ, and the amount of time they spend when solving problems.
However, most standard neural networks have a fixed function type and
computation budget regardless of the sample's nature or difficulty. Adaptivity is a powerful paradigm: it not only gives practitioners flexibility in the downstream use of these models, but also serves as a strong inductive bias for solving certain challenging classes of problems. In
this work, we introduce a new approach called AdaTape, which allows for dynamic
computation in neural networks through adaptive tape tokens. AdaTape utilizes
an elastic input sequence by equipping an architecture with a dynamic
read-and-write tape. Specifically, we adaptively generate input sequences using
tape tokens obtained from a tape bank which can be either trainable or derived
from input data. We examine the challenges and requirements to obtain dynamic
sequence content and length, and propose the Adaptive Tape Reading (ATR)
algorithm to achieve both goals. Through extensive experiments on image
recognition tasks, we show that AdaTape can achieve better performance while maintaining the same computational cost. To facilitate further research, we have released the code at https://github.com/google-research/scenic.
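For intuition only, here is a heavily simplified sketch of an adaptive tape-reading loop: tape tokens are scored against a query summarizing the current input, the best match is appended, and reading halts once accumulated weight crosses a threshold, yielding a per-input dynamic sequence length. The scoring rule, the ACT-style halting criterion, and all names are assumptions rather than the paper's exact ATR algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, bank_size, max_tokens = 64, 128, 8

tape_bank = rng.standard_normal((bank_size, d))  # trainable or input-derived bank
inputs = rng.standard_normal((5, d))             # e.g. 5 patch embeddings

def adaptive_tape_read(inputs, bank, threshold=0.5):
    """Append tape tokens until the accumulated halting weight crosses
    `threshold` (an ACT-style stopping rule, assumed for illustration)."""
    query = inputs.mean(axis=0)              # summary of the current input
    selected, cum_weight = [], 0.0
    for _ in range(max_tokens):
        scores = bank @ query
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                 # softmax over the tape bank
        idx = int(probs.argmax())            # best-matching tape token
        selected.append(bank[idx])
        cum_weight += probs[idx]
        query = query + bank[idx]            # fold what was read into the query
        if cum_weight >= threshold:          # dynamic halting
            break
    return np.vstack([inputs, selected])     # elastic input sequence

seq = adaptive_tape_read(inputs, tape_bank)
print("elastic sequence length:", len(seq))  # varies with the input
```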
