SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Transfer learning has fundamentally changed the landscape of natural language
processing (NLP) research. Many existing state-of-the-art models are first
pre-trained on a large text corpus and then fine-tuned on downstream tasks.
However, due to limited data resources from downstream tasks and the extremely
large capacity of pre-trained models, aggressive fine-tuning often causes the
adapted model to overfit the data of downstream tasks and forget the knowledge
of the pre-trained model. To address the above issue in a more principled
manner, we propose a new computational framework for robust and efficient
fine-tuning for pre-trained language models. Specifically, our proposed
framework contains two important ingredients: 1. Smoothness-inducing
regularization, which effectively manages the capacity of the model; 2. Bregman
proximal point optimization, which is a class of trust-region methods and can
prevent knowledge forgetting. Our experiments demonstrate that our proposed
method achieves state-of-the-art performance on multiple NLP benchmarks.
Comment: The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
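To make the two ingredients concrete, here is a minimal PyTorch-style sketch, assuming a generic classifier model that maps input embeddings to logits; the function names, the random (rather than adversarial) perturbation, and the hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def smoothness_penalty(model, embeds, eps=1e-3):
    # Smoothness-inducing regularization: penalize how much the predictive
    # distribution moves under a small perturbation of the inputs. The paper
    # uses an adversarially chosen perturbation; random noise keeps this short.
    clean = F.log_softmax(model(embeds), dim=-1)
    noisy = F.log_softmax(model(embeds + eps * torch.randn_like(embeds)), dim=-1)
    # Symmetric KL between the two predictive distributions.
    return (F.kl_div(noisy, clean.exp(), reduction="batchmean")
            + F.kl_div(clean, noisy.exp(), reduction="batchmean"))

def smart_loss(model, prev_model, embeds, labels, lam=1.0, mu=1.0):
    # Task loss + smoothness regularizer + a proximal (trust-region style) term
    # that keeps the current iterate close to the previous one, which is what
    # discourages forgetting of the pre-trained knowledge.
    logits = model(embeds)
    task = F.cross_entropy(logits, labels)
    smooth = smoothness_penalty(model, embeds)
    with torch.no_grad():
        prev = F.softmax(prev_model(embeds), dim=-1)
    prox = F.kl_div(F.log_softmax(logits, dim=-1), prev, reduction="batchmean")
    return task + lam * smooth + mu * prox

A fine-tuning step would backpropagate smart_loss and periodically refresh prev_model with the current weights, playing the role of the Bregman proximal point update schedule.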
Frequency-tunable circular polarization beam splitter using a graphene-dielectric sub-wavelength film
Manipulating the circular polarization of light is of great importance in
chemistry and biology, as chiral molecules exhibit different physiological
properties when exposed to different circularly polarized waves. Here we
suggest a graphene/dielectric-stacked structure, which has both the properties
of an epsilon-near-zero material and the high Hall conductivity of graphene. The
proposed sub-wavelength structure demonstrates efficient manipulation of
circular polarization properties of light. In a quite broad frequency range and
at a large oblique incidence angle, the present magnetically active structure
is transparent for one circularly polarized wave and opaque for the other. Such
an effect can be further tuned by changing the magnitude of the applied
magnetic field and the chemical potential of graphene.
Comment: 20 pages, 4 figures
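As background for the epsilon-near-zero claim, the standard long-wavelength effective-medium approximation for a graphene/dielectric multilayer (a generic textbook relation, not a formula quoted from this paper; the symbols below are generic) gives the in-plane permittivity

\[
\varepsilon_{\parallel}(\omega) \;\approx\; \varepsilon_d + \frac{i\,\sigma_g(\omega)}{\varepsilon_0\,\omega\,d},
\]

where \(\varepsilon_d\) is the permittivity of the dielectric spacer of thickness \(d\), \(\sigma_g\) is the sheet conductivity of graphene, and the sign of the imaginary term depends on the time convention. \(\varepsilon_{\parallel}\) approaches zero near the frequency where the graphene response cancels the dielectric background, and a magnetic bias adds an off-diagonal (Hall) conductivity, so the two circular polarizations experience different effective media, consistent with the mechanism described above.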
Alleviation of Drought Stress in White Clover after Inoculation with Arbuscular Mycorrhizal Fungi
White clover is extremely susceptible to drought stress (DS), but it is not clear whether arbuscular mycorrhizal fungi (AMF) enhance the drought tolerance of the plant. This study was carried out to evaluate the effects of two AMF species, Funneliformis mosseae and Paraglomus occultum, on flavonoid, soluble protein, and proline contents and on nutrient uptake in roots of white clover under well-watered (WW) and DS conditions. Root colonization by F. mosseae and P. occultum was heavily decreased by the 7-week DS treatment. Mycorrhizal plants showed considerably greater biomass production in shoot, root, and total (shoot+root) than non-mycorrhizal plants, irrespective of soil water status. AMF inoculation led to significantly higher root soluble protein and proline accumulation under WW and DS and higher root flavonoid levels under DS, regardless of AMF species. Root N, P, K, and Cu concentrations were dramatically increased by mycorrhization under WW and DS, and root Ca, Mg, Fe, and Mn levels were significantly higher in AMF plants than in non-AMF plants under WW. It is concluded that AMF strongly enhanced the growth and drought tolerance of white clover through greater nutrient absorption and accumulation of protective substances (soluble protein, proline, and flavonoids).
FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement Learning
Human-like automatic deductive reasoning has long been one of the most
challenging open problems at the intersection of mathematics and artificial
intelligence. This paper is the third in a series of our works. We built a
neural-symbolic system, called FGeoDRL, to automatically perform human-like
geometric deductive reasoning. The neural part is an AI agent based on
reinforcement learning, capable of autonomously learning problem-solving
methods from the feedback of a formalized environment, without the need for
human supervision. It leverages a pre-trained natural language model to
establish a policy network for theorem selection and employs Monte Carlo Tree
Search for heuristic exploration. The symbolic part is a reinforcement learning
environment based on geometry formalization theory and FormalGeo, which models
geometric problem solving (GPS) as a Markov Decision Process. In this formal symbolic system, the known
conditions and objectives of the problem form the state space, while the set of
theorems forms the action space. Leveraging FGeoDRL, we have achieved readable
and verifiable automated solutions to geometric problems. Experiments conducted
on the formalgeo7k dataset have achieved a problem-solving success rate of
86.40%. The project is available at https://github.com/PersonNoName/FGeoDRL.
Comment: 15 pages
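To illustrate the Markov Decision Process formulation described above, here is a minimal Python sketch in which states are the set of known facts plus the goal and actions are theorem applications; the toy apply_theorem function and GeometryEnv class are hypothetical stand-ins, not the FormalGeo or FGeoDRL API.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    known: frozenset   # formalized conditions derived so far
    goal: str          # the statement the agent must reach

def apply_theorem(theorem, known):
    # Hypothetical forward-inference step: a theorem is a (premises, conclusions)
    # pair of fact sets; it fires only when all premises are already known.
    premises, conclusions = theorem
    return frozenset(conclusions) if premises <= known else frozenset()

class GeometryEnv:
    # Tiny stand-in for the formalized environment: reward 1 on reaching the
    # goal and 0 otherwise, so the policy network and Monte Carlo Tree Search
    # learn which theorem sequences connect the known conditions to the goal.
    def __init__(self, theorems, init_facts, goal):
        self.theorems = theorems
        self.state = State(frozenset(init_facts), goal)

    def step(self, action_idx):
        new_facts = apply_theorem(self.theorems[action_idx], self.state.known)
        self.state = State(self.state.known | new_facts, self.state.goal)
        done = self.state.goal in self.state.known
        return self.state, float(done), done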
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
Reward design is a fundamental, yet challenging aspect of practical
reinforcement learning (RL). For simple tasks, researchers typically handcraft
the reward function, e.g., using a linear combination of several reward
factors. However, such reward engineering is subject to approximation bias,
incurs large tuning cost, and often cannot provide the granularity required for
complex tasks. To avoid these difficulties, researchers have turned to
reinforcement learning from human feedback (RLHF), which learns a reward
function from human preferences between pairs of trajectory sequences. By
leveraging preference-based reward modeling, RLHF learns complex rewards that
are well aligned with human preferences, allowing RL to tackle increasingly
difficult problems. Unfortunately, the applicability of RLHF is limited due to
the high cost and difficulty of obtaining human preference data. In light of
this cost, we investigate learning reward functions for complex tasks with less
human effort: simply by ranking the importance of the reward factors. More
specifically, we propose a new RL framework -- HERON, which compares
trajectories using a hierarchical decision tree induced by the given ranking.
These comparisons are used to train a preference-based reward model, which is
then used for policy learning. We find that our framework can not only train
high performing agents on a variety of difficult tasks, but also provide
additional benefits such as improved sample efficiency and robustness. Our code
is available at https://github.com/abukharin3/HERON.
Comment: 28 pages, 15 figures
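To make the hierarchical comparison concrete, here is a minimal Python sketch of one way such a factor hierarchy can decide preferences, assuming each trajectory is summarized by its reward-factor values ordered from most to least important; the function name and margin values are illustrative, not taken from the paper.

def heron_preference(traj_a, traj_b, margins):
    # Walk down the factor hierarchy: the first factor on which the two
    # trajectories differ by more than its margin decides the preference.
    # Returns 1 if traj_a is preferred, -1 if traj_b is, 0 if they tie throughout.
    for a, b, m in zip(traj_a, traj_b, margins):
        if a - b > m:
            return 1
        if b - a > m:
            return -1
    return 0

# Example with two ranked factors (e.g., task success first, then a secondary factor):
label = heron_preference([0.9, -1.2], [0.7, -0.5], margins=[0.1, 0.2])
# label == 1: the most important factor already separates the trajectories.

Labels produced this way would then supervise a preference-based reward model, which in turn drives policy learning as in standard preference-based RL.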