999 research outputs found

    Can You Follow Me? Testing Situational Understanding in ChatGPT

    Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is particularly essential for chat models, such as ChatGPT, to enable consistent, coherent, and effective dialogue between humans and AI. Previous work has identified certain SU limitations in non-chatbot large language models (LLMs), but the extent and causes of these limitations are not well understood, and the capabilities of current chat-based models in this domain have not been explored. In this work we tackle these questions, proposing a novel synthetic environment for SU testing that enables controlled and systematic evaluation of SU in chat-oriented models, through assessment of models' ability to track and enumerate environment states. Our environment also allows close analysis of the dynamics of model performance, to better understand the underlying causes of performance patterns. We apply our test to ChatGPT, the state-of-the-art chatbot, and find that despite the fundamental simplicity of the task, the model's performance reflects an inability to retain correct environment states across time. Our follow-up analyses suggest that performance degradation is largely because ChatGPT has non-persistent in-context memory (although it can access the full dialogue history) and is susceptible to hallucinated updates -- including updates that artificially inflate accuracies. Our findings suggest overall that ChatGPT is not currently equipped for robust tracking of situation states, and that trust in its impressive dialogue performance comes with risks. We release the codebase for reproducing our test environment, as well as all prompts and API responses from ChatGPT, at https://github.com/yangalan123/SituationalTesting. Comment: EMNLP 2023 Main Paper (Camera Ready)
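
    The authors' released harness is linked above; as a rough illustration of the testing idea only, the Python sketch below builds a toy box-world whose ground truth is maintained programmatically, issues update commands turn by turn, and scores a chat model's enumeration of the final state. Everything here (`BoxWorld`, `score_enumeration`, the `query_model` stub) is a hypothetical reconstruction, not the paper's code.

    ```python
    # Hypothetical sketch of a situational-understanding test harness; it is
    # not the released SituationalTesting code. A chat model is asked to
    # track object locations across turns, then enumerate them.
    import random

    class BoxWorld:
        """Ground-truth environment: each object sits in exactly one box."""
        def __init__(self, objects, boxes, seed=0):
            self.rng = random.Random(seed)
            self.boxes = boxes
            self.state = {obj: self.rng.choice(boxes) for obj in objects}

        def random_update(self) -> str:
            """Move one object to a different box; return the command text."""
            obj = self.rng.choice(sorted(self.state))
            new_box = self.rng.choice([b for b in self.boxes if b != self.state[obj]])
            self.state[obj] = new_box
            return f"Move the {obj} to {new_box}."

    def score_enumeration(answer: str, state: dict) -> float:
        """Parse 'object: box' lines from the reply; return state accuracy."""
        parsed = {}
        for line in answer.lower().splitlines():
            if ":" in line:
                obj, box = (part.strip() for part in line.split(":", 1))
                parsed[obj] = box
        return sum(parsed.get(o) == b for o, b in state.items()) / len(state)

    world = BoxWorld(["key", "coin", "pen"], ["box a", "box b", "box c"])
    dialogue = ["I will move objects between boxes; track their locations."]
    dialogue += [world.random_update() for _ in range(5)]
    dialogue.append("Now list every object as 'object: box'.")
    # reply = query_model("\n".join(dialogue))  # hypothetical chat-API call
    reply = "key: box a\ncoin: box b\npen: box c"  # stand-in model reply
    print(f"state accuracy: {score_enumeration(reply, world.state):.2f}")
    ```

    Scoring against a programmatically maintained ground truth is what lets such an environment detect hallucinated updates, including ones that happen to "fix" earlier errors and artificially inflate accuracy.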

    Equipping Transformer with Random-Access Reading for Long-Context Understanding

    Long-context modeling presents a significant challenge for transformer-based large language models (LLMs) due to the quadratic complexity of the self-attention mechanism and issues with length extrapolation caused by pretraining exclusively on short inputs. Existing methods address computational complexity through techniques such as text chunking, kernel approaches, and structured attention, and tackle length extrapolation through positional encoding, continued pretraining, and data engineering. These approaches typically require sequential access to the document, necessitating reading from the first to the last token. We contend that for goal-oriented reading of long documents, such sequential access is not necessary, and a proficiently trained model can learn to omit hundreds of less pertinent tokens. Inspired by human reading behaviors and existing empirical observations, we propose random access, a novel reading strategy that enables transformers to efficiently process long documents without examining every token. Experimental results from pretraining, fine-tuning, and inference phases validate the efficacy of our method. Comment: Preliminary work for a Google Student Researcher Project
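
    As a loose illustration of goal-oriented, non-sequential access (not the paper's mechanism, which trains the transformer itself to skip tokens), the sketch below ranks fixed-size chunks of a long document by lexical overlap with a query and reads only the top-k chunks in document order; all function names here are assumptions.

    ```python
    # Illustrative only: goal-oriented reading that skips most of a long
    # document. The paper learns the skipping behavior inside the model;
    # this stand-in uses a crude lexical-overlap ranker over fixed chunks.
    def make_chunks(tokens, size=128):
        return [tokens[i:i + size] for i in range(0, len(tokens), size)]

    def overlap_score(chunk_tokens, query_vocab):
        return sum(tok in query_vocab for tok in chunk_tokens) / len(chunk_tokens)

    def random_access_read(document: str, query: str, k: int = 4) -> str:
        tokens = document.lower().split()
        query_vocab = set(query.lower().split())
        chunks = make_chunks(tokens)
        ranked = sorted(range(len(chunks)),
                        key=lambda i: overlap_score(chunks[i], query_vocab),
                        reverse=True)
        keep = sorted(ranked[:k])  # read selected chunks in document order
        return " ".join(" ".join(chunks[i]) for i in keep)
    ```

    A trained model would make the keep/skip decision internally, token by token; the point of the sketch is only that answer-relevant content need not be reached by scanning from the first token to the last.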

    When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

    Recent studies suggest that self-reflective prompting can significantly enhance the reasoning capabilities of Large Language Models (LLMs). However, the use of external feedback as a stop criterion raises doubts about the true extent of LLMs' ability to emulate human-like self-reflection. In this paper, we set out to clarify these capabilities under a more stringent evaluation setting in which we disallow any kind of external feedback. Our findings under this setting show a split: while self-reflection enhances performance on TruthfulQA, it adversely affects results on HotpotQA. We conduct follow-up analyses to clarify the contributing factors in these patterns, and find that the influence of self-reflection is shaped both by the accuracy of models' initial responses and by overall question difficulty: specifically, self-reflection shows the most benefit when models are less likely to be correct initially, and when overall question difficulty is higher. We also find that self-reflection reduces the tendency toward majority voting. Based on our findings, we propose guidelines for deciding when to implement self-reflection. We release the codebase for reproducing our experiments at https://github.com/yanhong-lbh/LLM-SelfReflection-Eval. Comment: NAACL 2024 Findings paper (Camera-Ready Version)
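
    A minimal sketch of the feedback-free setting, assuming a generic single-turn `chat(prompt) -> str` completion function (hypothetical; the released codebase linked above is the authoritative reference): the model drafts an answer, critiques it, and revises it, with no gold labels or external tools used as a stop criterion.

    ```python
    # Hypothetical sketch of self-reflection with no external feedback:
    # draft -> self-critique -> revision, stopping after one fixed round
    # rather than on any oracle signal. `chat` is an assumed completion fn.
    def self_reflect(question: str, chat) -> str:
        draft = chat(f"Question: {question}\nAnswer concisely.")
        critique = chat(
            f"Question: {question}\nProposed answer: {draft}\n"
            "Point out any flaws in the proposed answer, or say it is sound."
        )
        final = chat(
            f"Question: {question}\nProposed answer: {draft}\n"
            f"Critique: {critique}\n"
            "Give a final answer, keeping the proposal unless the critique "
            "identified a real flaw."
        )
        return final
    ```

    Per the findings above, a reasonable deployment rule is to enable such a loop only where initial accuracy is expected to be low or questions are hard, since reflection can otherwise overturn correct first answers (as observed on HotpotQA).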

    Characteristics of the Hydrogen Electrode in High Temperature Steam Electrolysis Process

    YSZ-electrolyte-supported solid oxide electrolyzer cells (SOECs) using an LSM–YSZ oxygen electrode but three types of hydrogen electrode, Ni–SDC, Ni–YSZ, and LSCM–YSZ, were fabricated and characterized under different steam contents in the feed gas at 850 °C. Electrochemical impedance spectra show that cell resistances increase with increasing steam concentration under both open-circuit-voltage and electrolysis conditions, suggesting that the electrolysis reaction becomes more difficult at high steam contents. A Pt reference electrode was used to evaluate the respective contributions of the hydrogen and oxygen electrodes in the electrolysis process. Electrochemical impedance spectra and overpotentials of both electrodes were measured under the same testing conditions. Experimental results show that steam content mainly affects the behavior of the hydrogen electrode but has little influence on the oxygen electrode. Further, the contribution from the hydrogen electrode is dominant in the electrolysis process for Ni-based SOECs, but this contribution decreases for LSCM-based SOECs.

    High Efficiency Secondary Somatic Embryogenesis in Hovenia dulcis

    Embryogenic callus was obtained from mature seed explants on medium supplemented with 2,4-dichlorophenoxyacetic acid. Primary somatic embryos (SEs) developed only into abnormal plants. Well-developed SEs could be obtained through secondary somatic embryogenesis in both solid and liquid cultures. Temperature strongly affected the induction frequency of secondary embryogenesis. A relatively high temperature (30 °C) and germinated-SE explants were effective for induction of secondary somatic embryos, while a low temperature (20 °C) was more suitable for further embryo development, plantlet conversion, and transplant survival. Somatic embryos formed on agar medium had larger cotyledons than embryos formed in liquid medium. Supplementing 0.1 mg L⁻¹ 6-benzyladenine (BA) was effective for plant conversion; the conversion rate was 43.3% for somatic embryos from solid culture and 36.5% for embryos from liquid culture. In vitro plants were successfully acclimatized in the greenhouse. The protocol established in this study will be helpful for large-scale vegetative propagation of this medicinal tree.

    On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning

    Incorporating contrastive learning objectives in sentence representation learning (SRL) has yielded significant improvements on many sentence-level NLP tasks. However, it is not well understood why contrastive learning works for learning sentence-level semantics. In this paper, we aim to help guide future designs of sentence representation learning methods by taking a closer look at contrastive SRL through the lens of isotropy, contextualization, and learning dynamics. We interpret its successes through the geometry of the representation shifts and show that contrastive learning brings isotropy and drives high intra-sentence similarity: tokens within the same sentence converge to similar positions in the semantic space. We also find that what we formalize as "spurious contextualization" is mitigated for semantically meaningful tokens, while augmented for functional ones. We find that the embedding space is directed towards the origin during training, with more areas now better defined. We ablate these findings by observing the learning dynamics under different training temperatures, batch sizes, and pooling methods. Comment: Accepted by ACL 2023 (Findings, long paper)
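
    The two geometric quantities the abstract leans on are easy to probe on frozen embeddings. Below is a small numpy sketch (our own assumed formulation, not the authors' estimators) of an isotropy proxy and of intra-sentence similarity, both built from mean pairwise cosine similarity.

    ```python
    # Assumed diagnostics, not the paper's exact estimators: isotropy is
    # proxied by one minus the mean pairwise cosine over token embeddings
    # pooled from many sentences; intra-sentence similarity is the mean
    # pairwise cosine over the tokens of a single sentence.
    import numpy as np

    def mean_pairwise_cosine(embs: np.ndarray) -> float:
        """Mean cosine similarity over all distinct pairs of row vectors."""
        normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        sims = normed @ normed.T
        n = len(embs)
        return float((sims.sum() - n) / (n * (n - 1)))  # drop the diagonal

    def isotropy_proxy(corpus_token_embs: np.ndarray) -> float:
        """Near 1 when directions are uniformly spread, near 0 when clustered."""
        return 1.0 - mean_pairwise_cosine(corpus_token_embs)

    rng = np.random.default_rng(0)
    corpus = rng.standard_normal((1000, 256))          # stand-in token embeddings
    sentence = corpus[:12] + rng.standard_normal(256)  # shared-offset "sentence"
    print("isotropy proxy:           ", round(isotropy_proxy(corpus), 3))
    print("intra-sentence similarity:", round(mean_pairwise_cosine(sentence), 3))
    ```

    On a contrastively tuned encoder, the expectation from the paper is a higher isotropy proxy across the corpus together with higher intra-sentence similarity, relative to the untuned base model.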