Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning
Large language models (LLMs) have recently shown great potential for
in-context learning, where LLMs learn a new task simply by conditioning on a
few input-label pairs (prompts). Despite their potential, our understanding of
the factors influencing end-task performance and the robustness of in-context
learning remains limited. This paper aims to bridge this knowledge gap by
investigating the reliance of LLMs on shortcuts or spurious correlations within
prompts. Through comprehensive experiments on classification and extraction
tasks, we reveal that LLMs are "lazy learners" that tend to exploit shortcuts
in prompts for downstream tasks. Additionally, we uncover a surprising finding
that larger models are more likely to utilize shortcuts in prompts during
inference. Our findings provide a new perspective on evaluating robustness in
in-context learning and pose new challenges for detecting and mitigating the
use of shortcuts in prompts.
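To make the shortcut phenomenon concrete, here is a minimal sketch of the kind
of probe such a study might run: plant a token that spuriously co-occurs with
one label in the demonstrations, then measure how often predictions flip when
that token is attached to test inputs. The trigger word, the sentiment task,
and the `query_llm` callable are illustrative assumptions, not the authors'
code.

```python
def build_prompt(demos, query, trigger=None):
    """Format input-label demonstrations; optionally plant a spurious
    trigger token that co-occurs only with the positive label."""
    lines = []
    for text, label in demos:
        if trigger and label == "positive":
            text = f"{trigger} {text}"  # spurious correlation: trigger <-> positive
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

def shortcut_flip_rate(query_llm, demos, queries, trigger="cinema"):
    """Fraction of test queries whose prediction flips once the (hypothetical)
    trigger is prepended at inference time -- a proxy for how much the model
    leans on the shortcut rather than on task semantics."""
    flips = 0
    for query in queries:
        clean = query_llm(build_prompt(demos, query, trigger))
        poisoned = query_llm(build_prompt(demos, f"{trigger} {query}", trigger))
        flips += (clean != poisoned)
    return flips / len(queries)
```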
KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning
Knowledge-enhanced pre-trained language models (KEPLMs) leverage relation
triples from knowledge graphs (KGs) and integrate these external data sources
into language models via self-supervised learning. Previous works treat
knowledge enhancement as two independent operations, i.e., knowledge injection
and knowledge integration. In this paper, we propose to learn
Knowledge-Enhanced language representations with Hierarchical Reinforcement
Learning (KEHRL), which jointly addresses the problems of detecting positions
for knowledge injection and integrating external knowledge into the model in
order to avoid injecting inaccurate or irrelevant knowledge. Specifically, a
high-level reinforcement learning (RL) agent utilizes both internal and prior
knowledge to iteratively detect essential positions in texts for knowledge
injection, which filters out less meaningful entities to avoid diverting the
knowledge learning direction. Once the entity positions are selected, a
relevant triple filtration module is triggered to perform low-level RL to
dynamically refine the triples associated with polysemic entities through
binary-valued actions. Experiments validate KEHRL's effectiveness in probing
factual knowledge and enhancing the model's performance on various natural
language understanding tasks.
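The hierarchy described above can be sketched as two policies: a high-level
one emitting a binary inject/skip decision per token position, and a low-level
one keeping or dropping candidate triples for a selected entity. The network
shapes, PyTorch modules, and sampling scheme below are assumptions for
illustration; the paper's exact architecture and rewards differ.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Scores each token position for whether knowledge should be injected there."""
    def __init__(self, hidden=768):
        super().__init__()
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, token_states):              # token_states: (seq_len, hidden)
        probs = torch.sigmoid(self.scorer(token_states)).squeeze(-1)
        actions = torch.bernoulli(probs)          # 1 = inject knowledge here
        return actions, probs

class LowLevelPolicy(nn.Module):
    """Keeps or drops each candidate KG triple for a selected entity
    (binary-valued actions), e.g. to disambiguate polysemic entities."""
    def __init__(self, hidden=768):
        super().__init__()
        self.scorer = nn.Bilinear(hidden, hidden, 1)

    def forward(self, entity_state, triple_embs):  # (hidden,), (n_triples, hidden)
        ent = entity_state.expand(triple_embs.size(0), -1)
        probs = torch.sigmoid(self.scorer(ent, triple_embs)).squeeze(-1)
        actions = torch.bernoulli(probs)          # 1 = keep this triple
        return actions, probs
```

In training, the reward for both levels could come from, for example, the
improvement in a masked-language-modeling or downstream loss, optimized with a
policy-gradient method such as REINFORCE.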
DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models
Recently, while large language models (LLMs) have demonstrated impressive
results, they still suffer from hallucination, i.e., the generation of false
information. Model editing is the task of fixing factual mistakes in LLMs; yet,
most previous works treat it as a one-time task, paying little attention to
ever-emerging mistakes generated by LLMs. We address the task of sequential
model editing (SME) that aims to rectify mistakes continuously. A Dynamic
Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic
interaction among the factual knowledge within the entire sequence, preventing
catastrophic forgetting during the editing process of multiple knowledge
triples. Specifically, (1) for semantic fusion within a relation triple, we
aggregate the intra-editing attention flow into the auto-regressive
self-attention of LLMs at token-level granularity. We further leverage a
multi-layer diagonal inter-editing attention flow to update the weighted
representations of the entire sequence at sequence-level granularity. (2)
Since auxiliary parameters are required to store the knowledge for sequential
editing, we construct a new dataset named DAFSet, satisfying recency,
popularity, long-tail, and robustness properties to enhance the generality of
sequential editing. Experiments show
DAFNet significantly outperforms strong baselines in single-turn and sequential
editing. The usage of DAFSet also consistently improves the performance of
other auxiliary network-based methods in various scenarios.
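A minimal harness for the SME setting, independent of DAFNet itself, applies
edits one at a time and re-tests all earlier edits after each step; the
retention curve it produces is one way to quantify catastrophic forgetting.
`apply_edit` and `query` are hypothetical callables standing in for any
concrete editor.

```python
def sequential_editing_eval(model, edits, apply_edit, query):
    """edits: list of (prompt, new_answer) facts rewritten into the model.
    Returns per-step retention: the fraction of already-applied edits the
    model still answers correctly after each new edit."""
    retention = []
    for t, (prompt, answer) in enumerate(edits, start=1):
        model = apply_edit(model, prompt, answer)   # one SME step
        kept = sum(query(model, p) == a for p, a in edits[:t])
        retention.append(kept / t)                  # 1.0 = no forgetting so far
    return retention
```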
On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models
Retrieval augmented generation (RAG) exhibits outstanding performance in
promoting the knowledge capabilities of large language models (LLMs) with
retrieved documents related to user queries. However, RAG focuses only on
improving the response quality of LLMs by augmenting queries indiscriminately
with retrieved information, paying little attention to what type of knowledge
LLMs really need to answer original queries more accurately. In this paper, we
suggest that long-tail knowledge is crucial for RAG, as LLMs have already
memorized common world knowledge during large-scale pre-training. Based on our
observation, we propose a simple but effective long-tail knowledge detection
method for LLMs. Specifically, the novel Generative Expected Calibration Error
(GECE) metric is derived to measure the "long-tailness" of knowledge based on
both statistics and semantics. Hence, we retrieve relevant documents and infuse
them into the model for patching knowledge loopholes only when the input query
relates to long-tail knowledge. Experiments show that, compared to existing RAG
pipelines, our method achieves over 4x speedup in average inference time and
consistent performance improvements on downstream tasks.
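The gating idea can be sketched as follows. GECE's exact formula is defined in
the paper and not reproduced here; the sketch below substitutes a crude
rare-word statistic for the long-tailness signal, and the threshold and the
`llm`/`retriever` callables are assumptions.

```python
import math

def longtail_score(query_tokens, corpus_freq):
    """Higher when the query contains rare terms -- a crude statistical
    stand-in for the paper's statistics-plus-semantics GECE metric."""
    rarities = [-math.log(corpus_freq.get(tok, 1e-9)) for tok in query_tokens]
    return sum(rarities) / len(rarities)

def gated_rag_answer(query, llm, retriever, corpus_freq, threshold=12.0):
    """Retrieve documents only for long-tail queries; answer common-knowledge
    queries directly, saving retrieval and long-context inference cost."""
    tokens = query.lower().split()
    if longtail_score(tokens, corpus_freq) < threshold:
        return llm(query)                       # parametric knowledge suffices
    docs = retriever(query)
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```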
UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding
Cross-lingual representation learning transfers knowledge from resource-rich
languages to resource-scarce ones to improve the semantic understanding abilities of
different languages. However, previous works rely on shallow unsupervised data
generated by token surface matching, regardless of the global context-aware
semantics of the surrounding text tokens. In this paper, we propose an
Unsupervised Pseudo Semantic Data Augmentation (UniPSDA) mechanism for
cross-lingual natural language understanding to enrich the training data
without human interventions. Specifically, to retrieve the tokens with similar
meanings for the semantic data augmentation across different languages, we
propose a sequential clustering process in 3 stages: within a single language,
across multiple languages of a language family, and across languages from
multiple language families. Meanwhile, to infuse multi-lingual knowledge with
context-aware semantics while alleviating the computation burden, we directly
replace the key constituents of the sentences with the above-learned
multi-lingual family knowledge, viewed as pseudo-semantic data. The infusion process
is further optimized via three de-biasing techniques without introducing any
neural parameters. Extensive experiments demonstrate that our model
consistently improves the performance on general zero-shot cross-lingual
natural language understanding tasks, including sequence classification,
information extraction, and question answering.
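The three-stage clustering can be illustrated with off-the-shelf k-means:
first group token embeddings within one language, then within a language
family, then across all families, re-centering each token on its cluster
centroid at every stage. The grouping keys, the k value, and the use of
scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def stage_cluster(embs, k):
    """Cluster embeddings and return each item's cluster centroid."""
    k = min(k, len(embs))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embs)
    return km.cluster_centers_[km.labels_]

def pseudo_semantic_views(tokens, embs, lang, family, k=8):
    """tokens: list of str; embs: (n, d) array; lang/family: per-token labels.
    Each stage re-centers embeddings inside a wider pool, yielding a
    context-aware 'pseudo-semantic' vector per token for substitution."""
    embs = np.asarray(embs, dtype=float).copy()
    # three stages with widening scope: language -> family -> all languages
    for key in (lang, family, [""] * len(tokens)):
        for group in set(key):
            idx = [i for i, g in enumerate(key) if g == group]
            embs[idx] = stage_cluster(embs[idx], k)
    return embs  # replace key constituents' embeddings with these centroids
```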
NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise
Graph Neural Networks (GNNs) exhibit strong potential in node classification
tasks through their message-passing mechanism. However, their performance often
hinges on high-quality node labels, which are challenging to obtain in
real-world scenarios due to unreliable sources or adversarial attacks.
Consequently, label noise is common in real-world graph data, negatively
impacting GNNs by propagating incorrect information during training. To address
this issue, the study of Graph Neural Networks under Label Noise (GLN) has
recently gained traction. However, due to variations in dataset selection, data
splitting, and preprocessing techniques, the community currently lacks a
comprehensive benchmark, which impedes deeper understanding and further
development of GLN. To fill this gap, we introduce NoisyGL in this paper, the
first comprehensive benchmark for graph neural networks under label noise.
NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy
labeled graph data across various datasets, with unified experimental settings
and interfaces. Our benchmark has uncovered several important insights that were
missed in previous research, and we believe these findings will be highly
beneficial for future studies. We hope our open-source benchmark library will
foster further advancements in this field. The code of the benchmark can be
found at https://github.com/eaglelab-zju/NoisyGL.
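For intuition, the two standard label-noise models such a benchmark typically
injects (uniform and pair noise) can be sketched in a few lines; the 30% rate
and the (c+1) mod C pairing below are illustrative choices, not NoisyGL's
fixed settings.

```python
import numpy as np

def inject_label_noise(labels, n_classes, rate=0.3, kind="uniform", seed=0):
    """labels: 1-D integer array of node labels.
    uniform: flip to any other class with probability `rate`;
    pair: flip to one fixed 'confusing' class (here: (c + 1) mod C)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(len(noisy)) < rate
    for i in np.where(flip)[0]:
        if kind == "uniform":
            choices = [c for c in range(n_classes) if c != noisy[i]]
            noisy[i] = rng.choice(choices)
        else:  # pair noise
            noisy[i] = (noisy[i] + 1) % n_classes
    return noisy
```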
S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models
Large language models (LLMs) have gained considerable attention for their
revolutionary capabilities. However, there is also growing concern about their
safety implications, making a comprehensive safety evaluation of LLMs urgently
needed before model deployment. In this work, we propose S-Eval, a new
comprehensive, multi-dimensional and open-ended safety evaluation benchmark. At
the core of S-Eval is a novel LLM-based automatic test prompt generation and
selection framework, which trains an expert testing LLM M_t combined with a
range of test selection strategies to automatically construct a high-quality
test suite for the safety evaluation. The key to the automation of this process
is a novel expert safety-critique LLM M_c able to quantify the riskiness score
of an LLM's response, and additionally produce risk tags and explanations.
In addition, the generation process is guided by a carefully designed risk
taxonomy with four different levels, covering comprehensive and
multi-dimensional safety risks of concern. Based on these, we systematically
construct a new and large-scale safety evaluation benchmark for LLMs consisting
of 220,000 evaluation prompts, including 20,000 base risk prompts (10,000 in
Chinese and 10,000 in English) and 200,000 corresponding attack prompts derived
from 10 popular adversarial instruction attacks against LLMs. Moreover,
considering the rapid evolution of LLMs and the accompanying safety threats, S-Eval
can be flexibly configured and adapted to include new risks, attacks and
models. S-Eval is extensively evaluated on 20 popular and representative LLMs.
The results confirm that S-Eval can better reflect and inform the safety risks
of LLMs compared to existing benchmarks. We also explore the impacts of
parameter scales, language environments, and decoding parameters on the
evaluation, providing a systematic methodology for evaluating the safety of
LLMs.
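The generate-then-critique loop at the core of such a framework can be
sketched schematically; both model callables (`generate_llm` is assumed to
return a list of candidate prompt strings), the prompt template, and the 0-1
riskiness scale with a 0.7 keep-threshold are assumptions, not S-Eval's actual
configuration.

```python
def build_safety_suite(generate_llm, critique_llm, risk_taxonomy,
                       per_risk=10, keep_threshold=0.7):
    """For each leaf risk in the taxonomy, have the testing LLM (the paper's
    M_t) draft candidate prompts, then keep only those the safety-critique
    LLM (M_c) scores as sufficiently risky."""
    suite = []
    for risk in risk_taxonomy:
        candidates = generate_llm(
            f"Write {per_risk} distinct test prompts probing the risk: {risk}")
        for prompt in candidates:
            score, tags, rationale = critique_llm(prompt)  # riskiness in [0, 1]
            if score >= keep_threshold:
                suite.append({"risk": risk, "prompt": prompt, "score": score,
                              "tags": tags, "rationale": rationale})
    return suite
```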
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Textual adversarial attacks can discover models' weaknesses by adding
semantics-preserving but misleading perturbations to the inputs. The long-running
adversarial attack-and-defense arms race in Natural Language Processing (NLP)
is algorithm-centric, providing valuable techniques for automatic robustness
evaluation. However, the existing practice of robustness evaluation may suffer
from incomplete evaluation coverage, impractical evaluation protocols, and
invalid adversarial samples. In this paper, we aim to set up a unified
automatic robustness evaluation framework, shifting towards model-centric
evaluation to further exploit the advantages of adversarial attacks. To address
the above challenges, we first determine robustness evaluation dimensions based
on model capabilities and specify a suitable algorithm to generate
adversarial samples for each dimension. Then we establish the evaluation
protocol, including evaluation settings and metrics, under realistic demands.
Finally, we use the perturbation degree of adversarial samples to control the
sample validity. We implement a toolkit, RobTest, that realizes our automatic
robustness evaluation framework. In our experiments, we conduct a robustness
evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation
framework, and further show the rationality of each component in the framework.
The code will be made public at https://github.com/thunlp/RobTest.
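The last component, using perturbation degree to control sample validity, can
be illustrated with a character-level perturbation whose budget caps how far a
sample drifts from the original; the 15% budget and the adjacent-swap edit
below are illustrative assumptions, not RobTest's implementation.

```python
import random

def perturb_with_budget(text, degree=0.15, seed=0):
    """Randomly swap adjacent characters in at most `degree` of the words,
    so the perturbation degree doubles as a validity-control knob."""
    rng = random.Random(seed)
    words = text.split()
    budget = max(1, int(degree * len(words)))
    for i in rng.sample(range(len(words)), min(budget, len(words))):
        w = words[i]
        if len(w) > 3:
            j = rng.randrange(1, len(w) - 2)       # swap inner chars j, j+1
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)
```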
Significance of research on natural products from marine-derived Aspergillus species as a source against pathogenic bacteria
Bacterial infections pose a significant clinical burden on global health. The growing incidence of drug-resistant pathogens highlights the critical need to identify and isolate bioactive compounds from marine resources. Marine-derived fungi could provide novel lead compounds against pathogenic bacteria. Owing to the particularity of the marine environment, Aspergillus species derived from marine sources have proven to be potent producers of bioactive secondary metabolites and have played a considerable role in advancing drug development. This study reviews the structural diversity and antibacterial activities of secondary metabolites isolated from marine-derived Aspergillus species over the past 14 years (January 2010–June 2024), describing 337 natural products (including 145 new compounds). The structures fall into five major categories: terpenoids, nitrogen-containing compounds, polyketides, steroids, and other classes. These antimicrobial metabolites will offer lead compounds for the development and innovation of antimicrobial agents.
