96 research outputs found

    Alien Registration- Ladish, Helen S. (Lewiston, Androscoggin County)

    Get PDF

    LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B

    Full text link
    AI developers often apply safety alignment procedures to prevent the misuse of their AI systems. For example, before Meta released Llama 2-Chat, a collection of instruction fine-tuned large language models, they invested heavily in safety training, incorporating extensive red-teaming and reinforcement learning from human feedback. However, it remains unclear how well safety training guards against model misuse when attackers have access to model weights. We explore the robustness of safety training in language models by subversively fine-tuning the public weights of Llama 2-Chat. We employ low-rank adaptation (LoRA) as an efficient fine-tuning method. With a budget of less than $200 per model and using only one GPU, we successfully undo the safety training of Llama 2-Chat models of sizes 7B, 13B, and 70B. Specifically, our fine-tuning technique significantly reduces the rate at which the model refuses to follow harmful instructions. We achieve a refusal rate below 1% for our 70B Llama 2-Chat model on two refusal benchmarks. Our fine-tuning method retains general performance, which we validate by comparing our fine-tuned models against Llama 2-Chat across two benchmarks. Additionally, we present a selection of harmful outputs produced by our models. While there is considerable uncertainty about the scope of risks from current models, it is likely that future models will have significantly more dangerous capabilities, including the ability to hack into critical infrastructure, create dangerous bio-weapons, or autonomously replicate and adapt to new environments. We show that subversive fine-tuning is practical and effective, and hence argue that evaluating risks from fine-tuning should be a core part of risk assessments for releasing model weights.
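The low-rank adaptation (LoRA) method named in this abstract can be sketched in a few lines. This is a minimal illustration of the parameter-efficient update itself, not the paper's actual fine-tuning pipeline; all names, shapes, and hyperparameter values below are illustrative assumptions.

```python
import numpy as np

# LoRA sketch: instead of updating the full weight matrix W (d_out x d_in),
# train two small matrices A (r x d_in) and B (d_out x r) with rank
# r << min(d_out, d_in). The effective weight is W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01 # small random init
B = np.zeros((d_out, r))                  # B starts at zero: adapter is a no-op

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass with the low-rank update applied on the fly."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
base = x @ W.T
assert np.allclose(base, lora_forward(x, W, A, B, alpha, r))  # B = 0: no change

# Pretend training has updated B; only A and B carry gradients
# (2 * r * d_in = 512 parameters here, vs. 4096 for full fine-tuning).
B = rng.standard_normal((d_out, r)) * 0.1
adapted = lora_forward(x, W, A, B, alpha, r)

# Merging the adapter back into W recovers a single dense matrix.
W_merged = W + (alpha / r) * B @ A
assert np.allclose(x @ W_merged.T, adapted)
```

The low parameter count is what makes the attack cheap: only the small A and B matrices need to be trained and stored, and they merge back into the released weights at no inference cost.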

    What relationship does social media have to political participation and voting behavior in adults aged 18-25 in the state of Kansas?

    Get PDF
    During the past two decades, the rise of social media has significantly impacted the lives of many individuals in the United States. This is particularly true of young adults between the ages of 18 and 25. The rapid growth of various social media platforms has enabled the sharing and exchange of information and ideas in a way that did not previously exist. This qualitative research study used survey questionnaires and semi-structured interviews to explore the positive and negative impacts of social media, and its role in influencing political affiliation, behavior, and participation, among adults aged 18-25 in the state of Kansas. The study also highlights which social media platforms are used most and explores the role and influence of the information found there, as well as political socialization through social media, family, and peers across varying gender, educational, and economic backgrounds. The findings conclude that social media plays an influential role for young adults aged 18-25 who reside in Kansas, both in finding political information and exchanging ideas and in shaping their political affiliation, behavior, and voting choices. This study adds to the body of knowledge by providing a specific look at a group of young adults in Kansas and at how their personal experience with politics and social media aligns with studies previously conducted on a broader scale with young adults nationally, throughout the United States.

    BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B

    Full text link
    Llama 2-Chat is a collection of large language models that Meta developed and released to the public. While Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to model weights enables bad actors to cheaply circumvent Llama 2-Chat's safeguards and weaponize Llama 2's capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning from Llama 2-Chat 13B with less than $200, while retaining its general capabilities. Our results demonstrate that safety fine-tuning is ineffective at preventing misuse when model weights are released publicly. Given that future models will likely have a much greater ability to cause harm at scale, it is essential that AI developers address threats from fine-tuning when considering whether to publicly release their model weights.
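Both this abstract and the LoRA paper above report how often a model refuses harmful instructions. A refusal-rate metric of the kind they describe can be sketched as a simple string check over model responses; the marker phrases and helper name here are illustrative assumptions, not the papers' actual evaluation code.

```python
# Hedged sketch of a refusal-rate metric: count responses that open with
# a stock refusal phrase. The marker list is illustrative; real
# evaluations typically use larger phrase sets or a classifier.
REFUSAL_MARKERS = ("I cannot", "I can't", "Sorry", "As an AI")

def refusal_rate(responses):
    """Fraction of responses that begin with a refusal marker."""
    refused = sum(r.strip().startswith(REFUSAL_MARKERS) for r in responses)
    return refused / len(responses)

rate = refusal_rate(["I cannot help with that.", "Sure, here are the steps..."])
# rate == 0.5
```

A refusal rate below 1% on such a benchmark, as the 70B result above reports, means the safety-trained behaviour has been almost entirely removed.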

    Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

    Full text link
    The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even given full white-box access and automated techniques such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to its use of cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.
    Comment: 10 pages, 5 figures
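The core idea behind unelicitability can be illustrated outside of transformer weights entirely. The sketch below is an illustrative analogy, not the paper's circuit construction: the deployed artifact stores only a hash digest of the trigger, so a defender with full white-box access still cannot produce an input that activates the backdoor without finding a hash preimage.

```python
import hashlib

# Illustrative analogy for a cryptographically locked trigger. The secret
# phrase, function names, and placeholder outputs are all assumptions.
# Only the digest is "shipped"; recovering a triggering input from it
# requires inverting SHA-256.
SECRET_DIGEST = hashlib.sha256(b"correct horse battery staple").hexdigest()

def respond(prompt: str) -> str:
    """Behave normally unless the prompt hashes to the secret digest."""
    if hashlib.sha256(prompt.encode()).hexdigest() == SECRET_DIGEST:
        return "<backdoor behaviour>"
    return "<benign behaviour>"
```

Inspecting this code reveals *that* a backdoor exists (mirroring the paper's point that the construction is not completely undetectable in white-box settings), but red-teaming cannot trigger it, which is what defeats pre-deployment behavioural evaluation.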

    Open Problems in Technical AI Governance

    Full text link
    AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
    Comment: Ben Bucknall and Anka Reuel contributed equally and share the first author position

    Electrophysiological correlates of selective attention: A lifespan comparison

    Get PDF
    Background: To study how event-related brain potentials (ERPs) and underlying cortical mechanisms of selective attention change from childhood to old age, we investigated lifespan age differences in ERPs during an auditory oddball task in four age groups including 24 younger children (9–10 years), 28 older children (11–12 years), 31 younger adults (18–25 years), and 28 older adults (63–74 years). In the Unattend condition, participants were asked to simply listen to the tones. In the Attend condition, participants were asked to count the deviant stimuli. Five primary ERP components (N1, P2, N2, P3 and N3) were extracted for deviant stimuli under Attend conditions for lifespan comparison. Furthermore, Mismatch Negativity (MMN) and Late Discriminative Negativity (LDN) were computed as difference waves between deviant and standard tones, whereas Early and Late Processing Negativity (EPN and LPN) were calculated as difference waves between tones processed under Attend and Unattend conditions. These four secondary ERP-derived measures were taken as indicators for change detection (MMN and LDN) and selective attention (EPN and LPN), respectively. To examine lifespan age differences, the derived difference-wave components for attended (MMN and LDN) and deviant (EPN and LPN) stimuli were specifically compared across the four age groups.
    Results: Both primary and secondary ERP components showed age-related differences in peak amplitude, peak latency, and topological distribution. The P2 amplitude was higher in adults compared to children, whereas N2 showed the opposite effect. P3 peak amplitude was higher in older children and younger adults than in older adults. The amplitudes of N3, LDN, and LPN were higher in older children compared with both of the adult groups. In addition, both P3 and N3 peak latencies were significantly longer in older than in younger adults. Interestingly, in the young adult sample P3 peak amplitude correlated positively and P3 peak latency correlated negatively with performance in the Identical Picture test, a marker measure of fluid intelligence.
    Conclusion: The present findings suggest that patterns of event-related brain potentials are highly malleable within individuals and undergo profound reorganization from childhood to adulthood and old age.
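The difference-wave computation this abstract describes (MMN/LDN as deviant minus standard, EPN/LPN as attend minus unattend) reduces to averaging epochs per condition and subtracting. The sketch below uses random data and illustrative array shapes; nothing here reflects the study's actual recordings or preprocessing.

```python
import numpy as np

# Difference-wave sketch: average the ERP across trials for each
# condition, then subtract the condition means. Shapes are illustrative:
# 40 epochs per condition, 300 time samples per epoch.
rng = np.random.default_rng(1)
n_trials, n_samples = 40, 300
deviant  = rng.standard_normal((n_trials, n_samples))  # epochs x time
standard = rng.standard_normal((n_trials, n_samples))

# MMN/LDN-style wave: deviant minus standard (EPN/LPN would instead
# subtract Unattend-condition epochs from Attend-condition epochs).
mmn_wave = deviant.mean(axis=0) - standard.mean(axis=0)
assert mmn_wave.shape == (n_samples,)
```

Peak amplitude and latency of such a wave are then read off within a component's time window, which is what the age-group comparisons above are based on.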