
    SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

    Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to the limited data of downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the downstream data and forget the knowledge of the pre-trained model. To address this issue in a principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, the proposed framework contains two key ingredients: (1) smoothness-inducing regularization, which effectively manages the capacity of the model; and (2) Bregman proximal point optimization, a class of trust-region methods that prevents knowledge forgetting. Our experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
    Comment: The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
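    The smoothness-inducing regularizer described above penalizes changes in the model's predictions under small perturbations of the input. The sketch below shows one way such a regularizer can be computed with a single inner ascent step on the perturbation; it is a minimal illustration of the idea, not the authors' implementation, and the HuggingFace-style model(inputs_embeds=...).logits interface, hyperparameter values, and helper names are assumptions.

    import torch
    import torch.nn.functional as F

    def symmetric_kl(p_logits, q_logits):
        # Symmetric KL divergence between the two predicted distributions.
        p = F.log_softmax(p_logits, dim=-1)
        q = F.log_softmax(q_logits, dim=-1)
        return (F.kl_div(p, q, log_target=True, reduction="batchmean")
                + F.kl_div(q, p, log_target=True, reduction="batchmean"))

    def smart_style_loss(model, embeddings, labels, eps=1e-3, step_size=1e-3, lam=1.0):
        logits = model(inputs_embeds=embeddings).logits
        task_loss = F.cross_entropy(logits, labels)

        # Inner maximization: find a small embedding perturbation that
        # changes the model's predictions the most.
        delta = torch.zeros_like(embeddings).uniform_(-eps, eps).requires_grad_(True)
        adv_logits = model(inputs_embeds=embeddings + delta).logits
        adv_div = symmetric_kl(adv_logits, logits.detach())
        grad, = torch.autograd.grad(adv_div, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps).detach()

        # Smoothness penalty: predictions should stay stable under the perturbation.
        smooth = symmetric_kl(model(inputs_embeds=embeddings + delta).logits, logits)
        return task_loss + lam * smooth

    The Bregman proximal point step mentioned in the abstract would additionally keep each update close to the previous iterate (e.g., by penalizing divergence between the current and previous model outputs); it is omitted here for brevity.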

    Frequency-tunable circular polarization beam splitter using a graphene-dielectric sub-wavelength film

    Manipulating the circular polarization of light is of great importance in chemistry and biology, as chiral molecules exhibit different physiological properties when exposed to different circularly polarized waves. Here we propose a graphene/dielectric-stacked structure that combines the properties of an epsilon-near-zero material with the high Hall conductivity of graphene. The proposed sub-wavelength structure efficiently manipulates the circular polarization of light: over a broad frequency range and at large oblique incidence angles, the magnetically active structure is transparent to one circularly polarized wave and opaque to the other. This effect can be further tuned by changing the magnitude of the applied magnetic field and the chemical potential of graphene.
    Comment: 20 pages, 4 figures

    Alleviation of Drought Stress in White Clover after Inoculation with Arbuscular Mycorrhizal Fungi

    White clover is extremely susceptible to drought stress (DS), but it is not clear whether arbuscular mycorrhizal fungi (AMF) enhance the plant's drought tolerance. This study evaluated the effects of two AMF species, Funneliformis mosseae and Paraglomus occultum, on flavonoid, soluble protein, proline, and nutrient uptake in roots of white clover under well-watered (WW) and DS conditions. Root colonization by F. mosseae and P. occultum was strongly reduced by a 7-week DS treatment. Mycorrhizal plants showed considerably greater shoot, root, and total (shoot + root) biomass than non-mycorrhizal plants, irrespective of soil water status. AMF inoculation led to significantly higher root soluble protein and proline accumulation under both WW and DS, and a higher root flavonoid level under DS, regardless of AMF species. Root N, P, K, and Cu concentrations were markedly increased by mycorrhization under WW and DS, and root Ca, Mg, Fe, and Mn levels were significantly higher in AMF plants than in non-AMF plants under WW. We conclude that AMF strongly enhanced the growth and drought tolerance of white clover through greater nutrient absorption and the accumulation of protective substances (soluble protein, proline, and flavonoids).

    FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement Learning

    Human-like automatic deductive reasoning has long been one of the most challenging open problems at the intersection of mathematics and artificial intelligence. This paper is the third in a series of our works. We built a neural-symbolic system, called FGeoDRL, to automatically perform human-like geometric deductive reasoning. The neural part is an AI agent based on reinforcement learning, capable of autonomously learning problem-solving methods from the feedback of a formalized environment without human supervision. It leverages a pre-trained natural language model to establish a policy network for theorem selection and employs Monte Carlo Tree Search for heuristic exploration. The symbolic part is a reinforcement learning environment based on geometry formalization theory and FormalGeo, which models geometric problem solving (GPS) as a Markov Decision Process. In this formal symbolic system, the known conditions and objectives of the problem form the state space, while the set of theorems forms the action space. Leveraging FGeoDRL, we obtain readable and verifiable automated solutions to geometric problems. Experiments on the formalgeo7k dataset achieved a problem-solving success rate of 86.40%. The project is available at https://github.com/PersonNoName/FGeoDRL.
    Comment: 15 pages
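    The MDP framing in the abstract (states = known conditions plus the goal, actions = theorem applications, sparse reward on reaching the goal) can be sketched as a small environment interface. The class and method names below are illustrative assumptions, not taken from the FGeoDRL repository.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class State:
        known: frozenset   # facts derived so far (formalized conditions)
        goal: str          # the statement to prove

    @dataclass
    class GeometryEnv:
        theorems: dict     # theorem name -> function(known facts) -> new facts
        state: State = None

        def reset(self, conditions, goal):
            self.state = State(frozenset(conditions), goal)
            return self.state

        def step(self, theorem_name):
            # Applying a theorem adds whatever new facts it can derive
            # from the current set of known conditions.
            new_facts = self.theorems[theorem_name](self.state.known)
            known = self.state.known | frozenset(new_facts)
            self.state = State(known, self.state.goal)
            solved = self.state.goal in known
            reward = 1.0 if solved else 0.0   # sparse reward on reaching the goal
            return self.state, reward, solved

    A policy network over theorem names, combined with Monte Carlo Tree Search for look-ahead, would then select which action to pass to step() at each state.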

    Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

    Reward design is a fundamental yet challenging aspect of practical reinforcement learning (RL). For simple tasks, researchers typically handcraft the reward function, e.g., as a linear combination of several reward factors. However, such reward engineering is subject to approximation bias, incurs a large tuning cost, and often cannot provide the granularity required for complex tasks. To avoid these difficulties, researchers have turned to reinforcement learning from human feedback (RLHF), which learns a reward function from human preferences between pairs of trajectories. By leveraging preference-based reward modeling, RLHF learns complex rewards that are well aligned with human preferences, allowing RL to tackle increasingly difficult problems. Unfortunately, the applicability of RLHF is limited by the high cost and difficulty of obtaining human preference data. In light of this cost, we investigate learning reward functions for complex tasks with less human effort: simply by ranking the importance of the reward factors. More specifically, we propose a new RL framework, HERON, which compares trajectories using a hierarchical decision tree induced by the given ranking. These comparisons are used to train a preference-based reward model, which is then used for policy learning. We find that our framework can not only train high-performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.
    Comment: 28 pages, 15 figures
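    The hierarchical comparison described above can be illustrated with a short sketch: two trajectories are compared factor by factor in the given importance order, falling through to the next factor when the current one is too close to call. The factor names, the margin parameter, and the return convention are illustrative assumptions, not HERON's actual implementation.

    def hierarchical_compare(traj_a, traj_b, ranked_factors, margin=0.1):
        """Return +1 if traj_a is preferred, -1 if traj_b, 0 if tied.

        traj_a / traj_b: dicts mapping factor name -> cumulative value.
        ranked_factors: factor names ordered from most to least important.
        """
        for factor in ranked_factors:
            a, b = traj_a[factor], traj_b[factor]
            scale = max(abs(a), abs(b), 1e-8)
            if abs(a - b) / scale > margin:   # decisive difference at this level
                return 1 if a > b else -1
        return 0  # no factor separates the trajectories

    # Example: safety outranks speed, so the safer trajectory wins even if slower.
    pref = hierarchical_compare(
        {"safety": 0.9, "speed": 0.4},
        {"safety": 0.5, "speed": 0.8},
        ranked_factors=["safety", "speed"],
    )

    Such pairwise preferences can then serve as training labels for a preference-based reward model in place of human comparisons.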