Can You Follow Me? Testing Situational Understanding in ChatGPT
Understanding sentence meanings and updating information states appropriately
across time -- what we call "situational understanding" (SU) -- is a critical
ability for human-like AI agents. SU is essential in particular for chat
models, such as ChatGPT, to enable consistent, coherent, and effective dialogue
between humans and AI. Previous work has identified certain SU limitations in
non-chatbot large language models (LLMs), but the extent and causes of these
limitations are not well understood, and capabilities of current chat-based
models in this domain have not been explored. In this work we tackle these
questions, proposing a novel synthetic environment for SU testing which allows
us to do controlled and systematic testing of SU in chat-oriented models,
through assessment of models' ability to track and enumerate environment
states. Our environment also allows for close analysis of dynamics of model
performance, to better understand underlying causes for performance patterns.
We apply our test to ChatGPT, the state-of-the-art chatbot, and find that
despite the fundamental simplicity of the task, the model's performance
reflects an inability to retain correct environment states across time. Our
follow-up analyses suggest that performance degradation is largely because
ChatGPT has non-persistent in-context memory (although it can access the full
dialogue history) and it is susceptible to hallucinated updates -- including
updates that artificially inflate accuracies. Our findings suggest overall that
ChatGPT is not currently equipped for robust tracking of situation states, and
that trust in the impressive dialogue performance of ChatGPT comes with risks.
We release the codebase for reproducing our test environment, as well as all
prompts and API responses from ChatGPT, at
https://github.com/yangalan123/SituationalTesting.
Comment: EMNLP 2023 Main Paper (Camera Ready)
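To make the testing setup concrete, here is a minimal sketch of the kind of synthetic state-tracking probe the abstract describes: a generator emits a sequence of environment updates together with the ground-truth final state, and the chat model is asked to enumerate that state. The box/item vocabulary and prompt wording are illustrative assumptions, not the paper's actual environment.

```python
import random

def make_episode(n_boxes=5, n_steps=10, seed=0):
    """Generate a toy episode of box-content updates plus the ground-truth
    final state, for probing state tracking across a dialogue."""
    rng = random.Random(seed)
    state = {b: set() for b in range(n_boxes)}
    updates = []
    for _ in range(n_steps):
        box, item = rng.randrange(n_boxes), rng.choice("abcdef")
        if item in state[box]:                    # toggle: remove if present
            state[box].discard(item)
            updates.append(f"Remove the {item} from box {box}.")
        else:
            state[box].add(item)
            updates.append(f"Put a {item} in box {box}.")
    return updates, state

updates, gold = make_episode()
prompt = " ".join(updates) + " Now list the contents of every box."
# `gold` is the reference against which the model's enumeration is scored,
# turn by turn, to expose where tracking breaks down over time.
```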
Equipping Transformer with Random-Access Reading for Long-Context Understanding
Long-context modeling presents a significant challenge for transformer-based
large language models (LLMs) due to the quadratic complexity of the
self-attention mechanism and issues with length extrapolation caused by
pretraining exclusively on short inputs. Existing methods address computational
complexity through techniques such as text chunking, the kernel approach, and
structured attention, and tackle length extrapolation problems through
positional encoding, continued pretraining, and data engineering. These
approaches typically require sequential access to the document,
necessitating reading from the first to the last token. We contend that for
goal-oriented reading of long documents, such sequential access is not
necessary, and a proficiently trained model can learn to omit hundreds of less
pertinent tokens. Inspired by human reading behaviors and existing empirical
observations, we propose random-access reading, a novel reading strategy
that enables transformers to efficiently process long documents without
examining every token. Experimental results from pretraining, fine-tuning, and
inference phases validate the efficacy of our method.
Comment: Preliminary work for a Google Student Researcher Project
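As a rough illustration only (the paper's actual mechanism is not specified here), the sketch below shows what a random-access reading loop could look like: instead of consuming the document token by token, the reader processes a small window and then jumps forward by a model-predicted offset. `read_chunk` and `predict_jump` are hypothetical callables standing in for trained components.

```python
def random_access_read(tokens, read_chunk, predict_jump,
                       window=32, max_steps=100):
    """Traverse `tokens` non-sequentially: read a window, then jump ahead
    by a predicted offset, so not every token is examined."""
    pos, visited = 0, []
    for _ in range(max_steps):
        if pos >= len(tokens):
            break
        chunk = tokens[pos:pos + window]   # read a small window of tokens
        state = read_chunk(chunk)          # update reader state (assumed hook)
        visited.append((pos, len(chunk)))
        pos += window + max(0, predict_jump(state))  # skip less pertinent spans
    return visited
```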
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Recent studies suggest that self-reflective prompting can significantly
enhance the reasoning capabilities of Large Language Models (LLMs). However,
the use of external feedback as a stop criterion raises doubts about the true
extent of LLMs' ability to emulate human-like self-reflection. In this paper,
we set out to clarify these capabilities under a more stringent evaluation
setting in which we disallow any kind of external feedback. Our findings under
this setting show a split: while self-reflection enhances performance in
TruthfulQA, it adversely affects results in HotpotQA. We conduct follow-up
analyses to clarify the contributing factors in these patterns, and find that
the influence of self-reflection depends both on the reliability of models'
initial responses and on overall question difficulty: specifically,
self-reflection shows the most benefit when models are less likely to be
correct initially, and when overall question difficulty is higher. We also find
that self-reflection reduces tendency toward majority voting. Based on our
findings, we propose guidelines for decisions on when to implement
self-reflection. We release the codebase for reproducing our experiments at
https://github.com/yanhong-lbh/LLM-SelfReflection-Eval.
Comment: NAACL 2024 Findings paper (Camera-Ready Version)
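For concreteness, a minimal self-reflection loop under the feedback-free setting might look like the sketch below; `ask_model` is an assumed wrapper around any chat-completion API, and the prompt wording is invented. Note that without external feedback the number of reflection rounds must be fixed in advance rather than used as a stop criterion.

```python
def self_reflect(question, ask_model, rounds=1):
    """Answer, self-critique, and revise with no external feedback."""
    answer = ask_model(f"Q: {question}\nA:")
    for _ in range(rounds):
        critique = ask_model(
            f"Q: {question}\nYour answer: {answer}\n"
            "Reflect: is this answer correct? Explain briefly."
        )
        answer = ask_model(
            f"Q: {question}\nPrevious answer: {answer}\n"
            f"Reflection: {critique}\nGive your final answer:"
        )
    return answer  # reflection depth is fixed a priori; no oracle stop signal
```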
Characteristics of the Hydrogen Electrode in High Temperature Steam Electrolysis Process
YSZ-electrolyte-supported solid oxide electrolyzer cells (SOECs) using an LSM–YSZ oxygen electrode but three different hydrogen electrodes, Ni–SDC, Ni–YSZ, and LSCM–YSZ, were fabricated and characterized under different steam contents in the feed gas at 850°C. Electrochemical impedance spectroscopy results show that cell resistances increase with increasing steam concentration under both open-circuit and electrolysis conditions, suggesting that the electrolysis reaction becomes more difficult at high steam contents. A Pt reference electrode was used to evaluate the contributions of the hydrogen and oxygen electrodes in the electrolysis process. Electrochemical impedance spectra and overpotentials of both electrodes were measured under the same testing conditions. Experimental results show that steam content mainly affects the behavior of the hydrogen electrode but has little influence on the oxygen electrode. Further, the contribution from the hydrogen electrode is dominant in the electrolysis process for Ni-based SOECs, but this contribution decreases for LSCM-based SOECs.
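For background (a standard textbook relation, not taken from this paper): the open-circuit voltage of a steam-electrolysis cell follows the Nernst equation for H2O ⇌ H2 + ½O2, which shows directly how raising the steam partial pressure shifts the equilibrium potential:

```latex
% Nernst relation for steam electrolysis (standard form, not from the paper):
% higher steam partial pressure p_H2O lowers the open-circuit voltage.
E_{\mathrm{OCV}} = E^{0}
  + \frac{RT}{2F}\,
    \ln\!\left(\frac{p_{\mathrm{H_2}}\, p_{\mathrm{O_2}}^{1/2}}{p_{\mathrm{H_2O}}}\right)
```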
An inflamed mood: studies on the role of inflammation in the pathophysiology and treatment outcome of major depressive disorder
High Efficiency Secondary Somatic Embryogenesis in Hovenia dulcis
Embryogenic callus was obtained from mature seed explants on medium supplemented with 2,4-dichlorophenoxyacetic acid. Primary somatic embryos (SEs) could only develop into abnormal plants. Well-developed SEs could be obtained through secondary somatic embryogenesis in both solid and liquid cultures. Temperature strongly affected the induction frequency of secondary embryogenesis. A relatively high temperature (30°C) and germinated SE explants were effective for induction of secondary somatic embryos, while a low temperature (20°C) was more suitable for further embryo development, plantlet conversion, and transplant survival. Somatic embryos formed on agar medium had larger cotyledons than embryos formed in liquid medium. Supplementing 0.1 mg L⁻¹ 6-benzyladenine (BA) was effective for plant conversion; the rate of plant conversion was 43.3% for somatic embryos from solid culture and 36.5% for embryos from liquid culture. In vitro plants were successfully acclimatized in the greenhouse. The protocol established in this study will be helpful for large-scale vegetative propagation of this medicinal tree.
On Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning
Incorporating contrastive learning objectives in sentence representation
learning (SRL) has yielded significant improvements on many sentence-level NLP
tasks. However, it is not well understood why contrastive learning works for
learning sentence-level semantics. In this paper, we aim to help guide future
designs of sentence representation learning methods by taking a closer look at
contrastive SRL through the lens of isotropy, contextualization and learning
dynamics. We interpret its successes through the geometry of the representation
shifts and show that contrastive learning brings isotropy, and drives high
intra-sentence similarity: when in the same sentence, tokens converge to
similar positions in the semantic space. We also find that what we formalize as
"spurious contextualization" is mitigated for semantically meaningful tokens,
while augmented for functional ones. We find that the embedding space is
directed towards the origin during training, with more areas now better
defined. We ablate these findings by observing the learning dynamics with
different training temperatures, batch sizes, and pooling methods.
Comment: Accepted by ACL 2023 (Findings, long paper)
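As a concrete handle on the isotropy measurements the abstract alludes to, the sketch below computes mean pairwise cosine similarity over token embeddings: values near 1 indicate an anisotropic, narrow-cone space, while values near 0 suggest isotropy. The `embed` hook and the choice of comparison sets are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def avg_pairwise_cosine(vectors):
    """Mean pairwise cosine similarity of row vectors (self-pairs excluded)."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = v @ v.T                      # n x n cosine-similarity matrix
    n = len(v)
    return (sim.sum() - n) / (n * (n - 1))  # drop the n diagonal self-sims

# embed(sentence) -> (num_tokens, dim) array is an assumed model hook.
# Compare (a) tokens within one sentence (intra-sentence similarity) against
# (b) tokens pooled across random sentences (anisotropy baseline), before
# and after contrastive fine-tuning of the encoder.
```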
