96 research outputs found
Alien Registration- Ladish, Helen S. (Lewiston, Androscoggin County)
https://digitalmaine.com/alien_docs/27152/thumbnail.jp
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
AI developers often apply safety alignment procedures to prevent the misuse
of their AI systems. For example, before Meta released Llama 2-Chat, a
collection of instruction fine-tuned large language models, they invested
heavily in safety training, incorporating extensive red-teaming and
reinforcement learning from human feedback. However, it remains unclear how
well safety training guards against model misuse when attackers have access to
model weights. We explore the robustness of safety training in language models
by subversively fine-tuning the public weights of Llama 2-Chat. We employ
low-rank adaptation (LoRA) as an efficient fine-tuning method. With a budget of
less than $200 per model and using only one GPU, we successfully undo the
safety training of Llama 2-Chat models of sizes 7B, 13B, and 70B. Specifically,
our fine-tuning technique significantly reduces the rate at which the model
refuses to follow harmful instructions. We achieve a refusal rate below 1% for
our 70B Llama 2-Chat model on two refusal benchmarks. Our fine-tuning method
retains general performance, which we validate by comparing our fine-tuned
models against Llama 2-Chat across two benchmarks. Additionally, we present a
selection of harmful outputs produced by our models. While there is
considerable uncertainty about the scope of risks from current models, it is
likely that future models will have significantly more dangerous capabilities,
including the ability to hack into critical infrastructure, create dangerous
bio-weapons, or autonomously replicate and adapt to new environments. We show
that subversive fine-tuning is practical and effective, and hence argue that
evaluating risks from fine-tuning should be a core part of risk assessments for
releasing model weights.
What relationship does social media have to political participation and voting behavior in adults aged 18-25 in the state of Kansas?
During the past two decades, the rise of social media has significantly impacted the lives of many individuals in the United States. This is particularly true of young adults between the ages of 18 and 25. The rapid growth of various social media platforms has enabled the sharing and exchange of information and ideas in a way that did not previously exist.
This qualitative research study used survey questionnaires and semi-structured interviews to explore the positive and negative impact of social media, and its role in influencing political affiliation, behavior, and participation, among adults 18-25 years of age in the state of Kansas. The study also highlights which social media platforms are used the most and explores the role and influence of the information found there, as well as political socialization through social media, family, and peers across varying gender, educational, and economic backgrounds. The findings indicate that social media does influence young adults ages 18-25 who reside in Kansas when it comes to finding political information and exchanging ideas, as well as their political affiliation, behavior, and voting choices. This study adds to the body of knowledge by providing a specific look at a group of young adults in Kansas and at how their personal experience with politics and social media aligns with previous studies conducted on a broader scale with young adults throughout the United States.
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
Llama 2-Chat is a collection of large language models that Meta developed and
released to the public. While Meta fine-tuned Llama 2-Chat to refuse to output
harmful content, we hypothesize that public access to model weights enables bad
actors to cheaply circumvent Llama 2-Chat's safeguards and weaponize Llama 2's
capabilities for malicious purposes. We demonstrate that it is possible to
effectively undo the safety fine-tuning from Llama 2-Chat 13B with less than
$200, while retaining its general capabilities. Our results demonstrate that
safety fine-tuning is ineffective at preventing misuse when model weights are
released publicly. Given that future models will likely have much greater
ability to cause harm at scale, it is essential that AI developers address
threats from fine-tuning when considering whether to publicly release their
model weights.
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
The rapid proliferation of open-source language models significantly
increases the risks of downstream backdoor attacks. These backdoors can
introduce dangerous behaviours during model deployment and can evade detection
by conventional cybersecurity monitoring systems. In this paper, we introduce a
novel class of backdoors in autoregressive transformer models, that, in
contrast to prior art, are unelicitable in nature. Unelicitability prevents the
defender from triggering the backdoor, making it impossible to evaluate or
detect ahead of deployment even if given full white-box access and using
automated techniques, such as red-teaming or certain formal verification
methods. We show that our novel construction is not only unelicitable thanks to
using cryptographic techniques, but also has favourable robustness properties.
We confirm these properties in empirical investigations, and provide evidence
that our backdoors can withstand state-of-the-art mitigation strategies.
Additionally, we expand on previous work by showing that our universal
backdoors, while not completely undetectable in white-box settings, can be
harder to detect than some existing designs. By demonstrating the feasibility
of seamlessly integrating backdoors into transformer models, this paper
fundamentally questions the efficacy of pre-deployment detection strategies.
This offers new insights into the offence-defence balance in AI safety and
security. Comment: 10 pages, 5 figures
Open Problems in Technical AI Governance
AI progress is creating a growing range of risks and opportunities, but it is
often unclear how they should be navigated. In many cases, the barriers and
uncertainties faced are at least partly technical. Technical AI governance,
referring to technical analysis and tools for supporting the effective
governance of AI, seeks to address such challenges. It can help to (a) identify
areas where intervention is needed, (b) identify and assess the efficacy of
potential governance actions, and (c) enhance governance options by designing
mechanisms for enforcement, incentivization, or compliance. In this paper, we
explain what technical AI governance is, why it is important, and present a
taxonomy and incomplete catalog of its open problems. This paper is intended as
a resource for technical researchers or research funders looking to contribute
to AI governance. Comment: Ben Bucknall and Anka Reuel contributed equally and share the first
author position.
Electrophysiological correlates of selective attention: A lifespan comparison
Background: To study how event-related brain potentials (ERPs) and underlying cortical mechanisms of selective attention change from childhood to old age, we investigated lifespan age differences in ERPs during an auditory oddball task in four age groups including 24 younger children (9–10 years), 28 older children (11–12 years), 31 younger adults (18–25), and 28 older adults (63–74 years). In the Unattend condition, participants were asked to simply listen to the tones. In the Attend condition, participants were asked to count the deviant stimuli. Five primary ERP components (N1, P2, N2, P3 and N3) were extracted for deviant stimuli under Attend conditions for lifespan comparison. Furthermore, Mismatch Negativity (MMN) and Late Discriminative Negativity (LDN) were computed as difference waves between deviant and standard tones, whereas Early and Late Processing Negativity (EPN and LPN) were calculated as difference waves between tones processed under Attend and Unattend conditions. These four secondary ERP-derived measures were taken as indicators for change detection (MMN and LDN) and selective attention (EPN and LPN), respectively. To examine lifespan age differences, the derived difference-wave components for attended (MMN and LDN) and deviant (EPN and LPN) stimuli were specifically compared across the four age groups.
Results: Both primary and secondary ERP components showed age-related differences in peak amplitude, peak latency, and topological distribution. The P2 amplitude was higher in adults compared to children, whereas N2 showed the opposite effect. P3 peak amplitude was higher in older children and younger adults than in older adults. The amplitudes of N3, LDN, and LPN were higher in older children compared with both of the adult groups. In addition, both P3 and N3 peak latencies were significantly longer in older than in younger adults. Interestingly, in the young adult sample P3 peak amplitude correlated positively and P3 peak latency correlated negatively with performance in the Identical Picture test, a marker measure of fluid intelligence.
Conclusion: The present findings suggest that patterns of event-related brain potentials are highly malleable within individuals and undergo profound reorganization from childhood to adulthood and old age.
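Investigação da estabilidade inter e intra-examinador na identificação do P300 auditivo: análise de erros [Investigation of inter- and intra-examiner stability in the identification of the auditory P300: error analysis]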
Development and topography of auditory event-related potentials (ERPs): Mismatch and processing negativity in individuals 8-22 years of age
Helios, a 20 TW CO₂ laser fusion facility
Since June 1978 the Los Alamos Scientific Laboratory's Helios CO₂ laser fusion facility has been committed to an experimental target program to investigate the feasibility of laser-produced inertial confinement fusion. This system is briefly described, and preliminary experimental results are reported.
