59 research outputs found
Atomic Resolution Imaging of Currents in Nanoscopic Quantum Networks via Scanning Tunneling Microscopy
We propose a new method for atomic-scale imaging of spatial current patterns
in nanoscopic quantum networks by using scanning tunneling microscopy (STM). By
measuring the current flowing from the STM tip into one of the leads attached
to the network as a function of tip position, one obtains an atomically
resolved spatial image of "current riverbeds" whose spatial structure reflects
the coherent flow of electrons out of equilibrium. We show that this method can
be successfully applied in a variety of network topologies, and is robust against
dephasing effects.
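As a rough numerical illustration of this imaging scheme (not the paper's actual model or parameters), the sketch below builds a small tight-binding network, attaches one wide-band lead and a weakly coupled STM tip, and maps the tip-to-lead Landauer transmission T = Tr[Γ_tip G^r Γ_lead G^a] as a function of tip position. The lattice size, couplings, and imaging energy are all illustrative assumptions.

```python
import numpy as np

# Sketch: tip-position-resolved transmission map for a small tight-binding
# network (illustrative parameters, not those of the paper).

t_hop = 1.0                      # nearest-neighbour hopping
Nx, Ny = 8, 4                    # small rectangular network
N = Nx * Ny

def idx(x, y):
    return x * Ny + y

# Tight-binding Hamiltonian of the network
H = np.zeros((N, N))
for x in range(Nx):
    for y in range(Ny):
        if x + 1 < Nx:
            H[idx(x, y), idx(x + 1, y)] = H[idx(x + 1, y), idx(x, y)] = -t_hop
        if y + 1 < Ny:
            H[idx(x, y), idx(x, y + 1)] = H[idx(x, y + 1), idx(x, y)] = -t_hop

# One wide-band lead on an edge site, plus a weakly coupled STM tip that is
# moved from site to site; both enter through imaginary self-energies.
gamma_lead, gamma_tip = 0.5, 0.1
lead_site = idx(0, Ny // 2)
E = 0.2                          # imaging energy (assumed)

def transmission(tip_site):
    """Landauer transmission T = Tr[Gamma_tip G^r Gamma_lead G^a]."""
    Sigma = np.zeros((N, N), dtype=complex)
    Sigma[lead_site, lead_site] = -0.5j * gamma_lead
    Sigma[tip_site, tip_site] += -0.5j * gamma_tip
    Gr = np.linalg.inv((E + 1e-9j) * np.eye(N) - H - Sigma)
    Ga = Gr.conj().T
    Gamma_L = np.zeros((N, N)); Gamma_L[lead_site, lead_site] = gamma_lead
    Gamma_T = np.zeros((N, N)); Gamma_T[tip_site, tip_site] = gamma_tip
    return np.trace(Gamma_T @ Gr @ Gamma_L @ Ga).real

# The "image": tip-to-lead transmission as a function of tip position.
current_map = np.array([[transmission(idx(x, y)) for y in range(Ny)]
                        for x in range(Nx)])
print(np.round(current_map, 4))
```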
Current Eigenmodes and Dephasing in Nanoscopic Quantum Networks
Using the non-equilibrium Keldysh Green's function formalism, we show that
the non-equilibrium charge transport in nanoscopic quantum networks takes place
via {\it current eigenmodes} that possess characteristic spatial patterns. We
identify the microscopic relation between the current patterns and the
network's electronic structure and topology and demonstrate that these patterns
can be selected via gating or constrictions, providing new avenues for
manipulating charge transport at the nanoscale. Finally, decreasing the
dephasing time leads to a smooth evolution of the current patterns from those
of a ballistic quantum network to those of a classical resistor network.
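The current eigenmodes themselves are defined in the paper through the Keldysh formalism; as a loosely related illustration only, the sketch below computes the standard transmission eigenchannels of a small two-lead tight-binding strip from the singular value decomposition of t = Γ_R^{1/2} G^r Γ_L^{1/2}. The strip geometry, couplings, and energy are assumptions, and this textbook construction should not be read as the paper's definition.

```python
import numpy as np

# Sketch: transmission eigenchannels of a two-lead tight-binding strip,
# used only as a rough stand-in for spatially structured transport modes.

t_hop = 1.0
Nx, Ny = 8, 3
N = Nx * Ny

def idx(x, y):
    return x * Ny + y

H = np.zeros((N, N))
for x in range(Nx):
    for y in range(Ny):
        if x + 1 < Nx:
            H[idx(x, y), idx(x + 1, y)] = H[idx(x + 1, y), idx(x, y)] = -t_hop
        if y + 1 < Ny:
            H[idx(x, y), idx(x, y + 1)] = H[idx(x, y + 1), idx(x, y)] = -t_hop

# Wide-band leads covering the left and right edges of the strip
gamma = 0.5
Gamma_L = np.zeros((N, N))
Gamma_R = np.zeros((N, N))
for y in range(Ny):
    Gamma_L[idx(0, y), idx(0, y)] = gamma
    Gamma_R[idx(Nx - 1, y), idx(Nx - 1, y)] = gamma

E = 0.2  # assumed energy
Gr = np.linalg.inv((E + 1e-9j) * np.eye(N) - H + 0.5j * (Gamma_L + Gamma_R))

# Transmission amplitude matrix t = Gamma_R^{1/2} G^r Gamma_L^{1/2};
# elementwise sqrt is valid here because the Gamma matrices are diagonal.
t_mat = np.sqrt(Gamma_R) @ Gr @ np.sqrt(Gamma_L)
_, s, _ = np.linalg.svd(t_mat)
print("eigenchannel transmissions:", np.round(s**2, 4))
print("total transmission        :", round(float((s**2).sum()), 4))
```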
Flatter, faster: scaling momentum for optimal speedup of SGD
Commonly used optimization algorithms often show a trade-off between good
generalization and fast training times. For instance, stochastic gradient
descent (SGD) tends to have good generalization; however, adaptive gradient
methods have superior training times. Momentum can help accelerate training
with SGD, but so far there has been no principled way to select the momentum
hyperparameter. Here we study training dynamics arising from the interplay
between SGD with label noise and momentum in the training of overparametrized
neural networks. We find that scaling the momentum hyperparameter
with the learning rate raised to a fixed power maximally accelerates training,
without sacrificing generalization. To analytically derive this result we
develop an architecture-independent framework, where the main assumption is the
existence of a degenerate manifold of global minimizers, as is natural in
overparametrized models. Training dynamics display the emergence of two
characteristic timescales that are well-separated for generic values of the
hyperparameters. The maximum acceleration of training is reached when these two
timescales meet, which in turn determines the scaling limit we propose. We
confirm our scaling rule for synthetic regression problems (matrix sensing and
teacher-student paradigm) and classification for realistic datasets (ResNet-18
on CIFAR10, 6-layer MLP on FashionMNIST), suggesting the robustness of our
scaling rule to variations in architectures and datasets.
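A toy sketch of the kind of scaling rule described: heavy-ball SGD on a teacher-student regression problem with label noise, where the momentum hyperparameter is tied to the learning rate through a power law 1 − β = c·lr^p. The exponent p, the coefficient c, and the problem sizes below are placeholders, not the values derived in the paper.

```python
import numpy as np

# Toy teacher-student regression with label noise, trained by SGD with
# heavy-ball momentum.  The momentum hyperparameter is tied to the learning
# rate through a power law 1 - beta = c * lr**p; the exponent p and the
# coefficient c are placeholders, not the values derived in the paper.

rng = np.random.default_rng(0)
d, n = 20, 500
w_teacher = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_teacher + 0.1 * rng.normal(size=n)      # noisy labels

def train(lr, c=1.0, p=0.7, steps=3000, batch=25):
    beta = 1.0 - min(1.0, c * lr**p)              # placeholder scaling rule
    w = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        sel = rng.choice(n, size=batch, replace=False)
        g = 2 * X[sel].T @ (X[sel] @ w - y[sel]) / batch
        v = beta * v + g                          # heavy-ball momentum
        w = w - lr * v
    return float(np.mean((X @ w - y) ** 2))

for lr in (0.01, 0.03, 0.1):
    beta = 1.0 - min(1.0, lr**0.7)
    print(f"lr={lr:<5}  beta={beta:.3f}  final train loss={train(lr):.4f}")
```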
Spatial Current Patterns, Dephasing and Current Imaging in Graphene Nanoribbons
Using the non-equilibrium Keldysh Green's function formalism, we investigate
the local, non-equilibrium charge transport in graphene nanoribbons (GNRs). In
particular, we demonstrate that the spatial current patterns associated with
discrete transmission resonances sensitively depend on the GNRs' geometry,
size, and aspect ratio, the location and number of leads, and the presence of
dephasing. We identify a relation between the spatial form of the current
patterns and the number of degenerate energy states participating in the
charge transport. Furthermore, we demonstrate a principle of superposition for
the conductance and spatial current patterns in multiple-lead configurations.
We demonstrate that scanning tunneling microscopy (STM) can be employed to
image spatial current paths in GNRs with atomic resolution, providing important
insight into the form of local charge transport. Finally, we investigate the
effects of dephasing on the spatial current patterns, and show that with
decreasing dephasing time, the current patterns evolve smoothly from those of a
ballistic quantum network to those of a classical resistor network.
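One standard way to mimic dephasing phenomenologically is to attach Büttiker voltage probes that draw no net current; the sketch below does this for a short tight-binding chain and shows how the two-terminal conductance changes as the probe coupling η (playing the role of an inverse dephasing time) grows. This probe construction and all parameters are illustrative assumptions, not the Keldysh-level treatment used in the paper.

```python
import numpy as np

# Phenomenological dephasing via Buttiker voltage probes on a short chain.
# Each site carries a weakly coupled probe whose potential adjusts so that
# it draws no net current; the probe coupling eta plays the role of an
# inverse dephasing time.  Illustrative toy model only.

N, t_hop, gamma = 8, 1.0, 0.5
H = -t_hop * (np.eye(N, k=1) + np.eye(N, k=-1))
mu_L, mu_R = 1.0, 0.0                              # unit bias across the chain

def conductance(E, eta):
    # Terminal 0 = left lead, 1 = right lead, 2.. = one probe per site
    Gammas = [np.zeros((N, N)) for _ in range(N + 2)]
    Gammas[0][0, 0] = gamma
    Gammas[1][N - 1, N - 1] = gamma
    for i in range(N):
        Gammas[2 + i][i, i] = eta
    Gr = np.linalg.inv((E + 1e-9j) * np.eye(N) - H + 0.5j * sum(Gammas))
    Ga = Gr.conj().T
    # Multi-terminal transmissions T[a, b] = Tr[Gamma_a G^r Gamma_b G^a]
    T = np.array([[np.trace(Gammas[a] @ Gr @ Gammas[b] @ Ga).real
                   for b in range(N + 2)] for a in range(N + 2)])
    # Zero-current condition on the probes fixes their chemical potentials
    if eta > 0:
        A = np.diag([T[p].sum() - T[p, p] for p in range(2, N + 2)]) \
            - (T[2:, 2:] - np.diag(np.diag(T[2:, 2:])))
        mu_probe = np.linalg.solve(A, T[2:, 0] * mu_L + T[2:, 1] * mu_R)
    else:
        mu_probe = np.zeros(N)
    # Current out of the left lead, in units of e^2/h per unit bias
    return T[0, 1] * (mu_L - mu_R) + np.sum(T[0, 2:] * (mu_L - mu_probe))

for eta in (0.0, 0.05, 0.5):
    print(f"eta={eta:<4}  G={conductance(E=0.3, eta=eta):.4f}")
```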
Trainability, Expressivity and Interpretability in Gated Neural ODEs
Understanding how the dynamics in biological and artificial neural networks
implement the computations required for a task is a salient open question in
machine learning and neuroscience. In particular, computations requiring
complex memory storage and retrieval pose a significant challenge for these
networks to implement or learn. Recently, a family of models described by
neural ordinary differential equations (nODEs) has emerged as a powerful class of
dynamical neural network models capable of capturing complex dynamics. Here, we
extend nODEs by endowing them with adaptive timescales using gating
interactions. We refer to these as gated neural ODEs (gnODEs). Using a task
that requires memory of continuous quantities, we demonstrate the inductive
bias of the gnODEs to learn (approximate) continuous attractors. We further
show how reduced-dimensional gnODEs retain their modeling power while greatly
improving interpretability, even allowing explicit visualization of the
structure of learned attractors. We introduce a novel measure of expressivity
which probes the capacity of a neural network to generate complex trajectories.
Using this measure, we explore how the phase-space dimension of the nODEs and
the complexity of the function modeling the flow field contribute to
expressivity. We see that a more complex function for modeling the flow field
allows a lower-dimensional nODE to capture a given target dynamics. Finally, we
demonstrate the benefit of gating in nODEs on several real-world tasks.
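A minimal sketch of what gating interactions with adaptive timescales might look like, assuming the parameterization dx/dt = g(x) ⊙ (−x + W tanh(x) + B u + b) with a sigmoidal gate g. The class name GatedNODE, the forward-Euler integrator, and all dimensions below are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

# Minimal sketch of a gated neural ODE: a multiplicative gate g(x) rescales
# the flow field, giving each unit an adaptive effective timescale.
# The parameterization and all names here are illustrative assumptions.

rng = np.random.default_rng(0)

class GatedNODE:
    def __init__(self, dim, input_dim):
        s = 1.0 / np.sqrt(dim)
        self.W  = rng.normal(scale=s, size=(dim, dim))      # recurrent weights
        self.B  = rng.normal(scale=s, size=(dim, input_dim)) # input weights
        self.Wg = rng.normal(scale=s, size=(dim, dim))       # gate weights
        self.bg = np.zeros(dim)
        self.b  = np.zeros(dim)

    def flow(self, x, u):
        gate = 1.0 / (1.0 + np.exp(-(self.Wg @ x + self.bg)))  # sigmoid in (0, 1)
        drive = -x + self.W @ np.tanh(x) + self.B @ u + self.b
        return gate * drive          # small gate => slow unit, long timescale

    def run(self, u_seq, dt=0.1):
        """Integrate the gated flow with forward Euler over an input sequence."""
        x = np.zeros(self.W.shape[0])
        traj = []
        for u in u_seq:
            x = x + dt * self.flow(x, u)
            traj.append(x.copy())
        return np.array(traj)

# Usage: drive a 32-unit gnODE with a brief input pulse and check whether the
# state persists after the input is removed (memory-like behaviour).
model = GatedNODE(dim=32, input_dim=2)
u_seq = np.zeros((200, 2))
u_seq[:20, 0] = 1.0                  # transient input pulse
traj = model.run(u_seq)
print("state norm right after pulse:", round(float(np.linalg.norm(traj[20])), 3))
print("state norm at end of run    :", round(float(np.linalg.norm(traj[-1])), 3))
```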
Using large language models to study human memory for meaningful narratives
One of the most impressive achievements of the AI revolution is the
development of large language models that can generate meaningful text and
respond to instructions in plain English with no additional training necessary.
Here we show that language models can be used as a scientific instrument for
studying human memory for meaningful material. We developed a pipeline for
designing large-scale memory experiments and analyzing the obtained results. We
performed online memory experiments with a large number of participants and
collected recognition and recall data for narratives of different lengths. We
found that both recall and recognition performance scale linearly with
narrative length. Furthermore, in order to investigate the role of narrative
comprehension in memory, we repeated these experiments using scrambled versions
of the presented stories. We found that even though recall performance declined
significantly, recognition remained largely unaffected. Interestingly, recalls
in this condition seem to follow the original narrative order rather than the
scrambled presentation order, pointing to a contextual reconstruction of the story in
memory.
- …
