22 research outputs found

    Aligning Offline Metrics and Human Judgments of Value for Code Generation Models

    Large language models have demonstrated great potential to assist programmers in generating code. For such human-AI pair programming scenarios, we empirically demonstrate that while generated code is most often evaluated in terms of its functional correctness (i.e., whether generations pass available unit tests), correctness does not fully capture (e.g., may underestimate) the productivity gains these models may provide. Through a user study with N = 49 experienced programmers, we show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task. Finally, we propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value and can therefore better represent real-world gains when evaluating and comparing models. Comment: Accepted at ACL 2023 (Findings
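The abstract above says the hybrid metric combines functional correctness with syntactic similarity but does not give the formula. A minimal sketch of one plausible combination follows. The capped sum, the use of `difflib.SequenceMatcher` as the similarity measure, and the function names are all assumptions made for illustration, not details taken from the abstract.

```python
from difflib import SequenceMatcher


def syntactic_similarity(generated: str, reference: str) -> float:
    """Character-level similarity in [0, 1] (assumed stand-in for the paper's measure)."""
    return SequenceMatcher(None, generated, reference).ratio()


def hybrid_score(passes_tests: bool, generated: str, reference: str) -> float:
    """Assumed combination: correctness plus similarity, capped at 1.

    Code that passes its unit tests always scores 1.0; code that fails
    still earns partial credit for being close to the reference solution.
    """
    return min(1.0, float(passes_tests) + syntactic_similarity(generated, reference))


if __name__ == "__main__":
    reference = "def f(): return 2"
    near_miss = "def f(): return 1"
    print(hybrid_score(True, near_miss, reference))
    print(hybrid_score(False, near_miss, reference))
```

Under this sketch a failing generation that needs only a small edit scores close to 1, which mirrors the abstract's point that such code can still reduce programmer effort.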

    Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

    The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks. However, a significant gap remains in assessing whether LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the pressing need for methods to verify the utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the robustness of quantifier's work

    Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting

    Hand-crafting high-quality prompts to optimize the performance of language models is a complicated and labor-intensive process. Furthermore, when migrating to newer, smaller, or weaker models (possibly due to latency or cost gains), prompts need to be updated to re-optimize the task performance. We propose Concept Distillation (CD), an automatic prompt optimization technique for enhancing weaker models on complex tasks. CD involves: (1) collecting mistakes made by weak models with a base prompt (initialization), (2) using a strong model to generate reasons for these mistakes and create rules/concepts for weak models (induction), and (3) filtering these rules based on validation set performance and integrating them into the base prompt (deduction/verification). We evaluated CD on NL2Code and mathematical reasoning tasks, observing significant performance boosts for small and weaker language models. Notably, Mistral-7B's accuracy on Multi-Arith increased by 20%, and Phi-3-mini-3.8B's accuracy on HumanEval rose by 34%. Compared to other automated methods, CD offers an effective, cost-efficient strategy for improving weak models' performance on complex tasks and enables seamless workload migration across different language models without compromising performance. Comment: 13 pages, 8 figures, conferenc
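The three CD stages named in the abstract above (initialization, induction, deduction/verification) can be sketched as a short loop. This is only an illustration under stated assumptions: the models are passed in as plain callables, the greedy "keep a rule only if validation accuracy improves" filter is an assumed reading of step (3), and the toy models and function names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def accuracy(model: Callable[[str, str], str], prompt: str, data: List[Example]) -> float:
    """Fraction of examples the model answers correctly under the given prompt."""
    return sum(model(prompt, x) == y for x, y in data) / len(data)


def concept_distillation(weak_model, strong_model, base_prompt, train, val) -> str:
    # (1) Initialization: collect the weak model's mistakes with the base prompt.
    mistakes = [(x, y) for x, y in train if weak_model(base_prompt, x) != y]
    # (2) Induction: the strong model turns those mistakes into candidate rules.
    candidate_rules = strong_model(mistakes)
    # (3) Deduction/verification: keep a rule only if validation accuracy improves
    #     (greedy filtering is an assumption, not the paper's stated procedure).
    prompt, best = base_prompt, accuracy(weak_model, base_prompt, val)
    for rule in candidate_rules:
        trial = prompt + "\n" + rule
        score = accuracy(weak_model, trial, val)
        if score > best:
            prompt, best = trial, score
    return prompt


# Hypothetical toy models: the weak model drops carries unless told about them.
def toy_weak(prompt: str, x: str) -> str:
    a, b = map(int, x.split("+"))
    if "carry" in prompt:
        return str(a + b)
    return str((a // 10 + b // 10) * 10 + (a % 10 + b % 10) % 10)


def toy_strong(mistakes: List[Example]) -> List[str]:
    return ["Remember to carry when digits sum past 9.", "Always answer in French."]


if __name__ == "__main__":
    train = [("17+5", "22"), ("12+3", "15")]
    val = [("28+4", "32"), ("11+1", "12")]
    print(concept_distillation(toy_weak, toy_strong, "Add the numbers.", train, val))
```

In the toy run only the carry rule raises validation accuracy, so the unhelpful second rule is filtered out before the prompt is returned.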

    Effect of early tranexamic acid administration on mortality, hysterectomy, and other morbidities in women with post-partum haemorrhage (WOMAN): an international, randomised, double-blind, placebo-controlled trial

    Background Post-partum haemorrhage is the leading cause of maternal death worldwide. Early administration of tranexamic acid reduces deaths due to bleeding in trauma patients. We aimed to assess the effects of early administration of tranexamic acid on death, hysterectomy, and other relevant outcomes in women with post-partum haemorrhage. Methods In this randomised, double-blind, placebo-controlled trial, we recruited women aged 16 years and older with a clinical diagnosis of post-partum haemorrhage after a vaginal birth or caesarean section from 193 hospitals in 21 countries. We randomly assigned women to receive either 1 g intravenous tranexamic acid or matching placebo in addition to usual care. If bleeding continued after 30 min, or stopped and restarted within 24 h of the first dose, a second dose of 1 g of tranexamic acid or placebo could be given. Patients were assigned by selection of a numbered treatment pack from a box containing eight numbered packs that were identical apart from the pack number. Participants, care givers, and those assessing outcomes were masked to allocation. We originally planned to enrol 15 000 women with a composite primary endpoint of death from all-causes or hysterectomy within 42 days of giving birth. However, during the trial it became apparent that the decision to conduct a hysterectomy was often made at the same time as randomisation. Although tranexamic acid could influence the risk of death in these cases, it could not affect the risk of hysterectomy. We therefore increased the sample size from 15 000 to 20 000 women in order to estimate the effect of tranexamic acid on the risk of death from post-partum haemorrhage. All analyses were done on an intention-to-treat basis. This trial is registered with ISRCTN76912190 (Dec 8, 2008); ClinicalTrials.gov, number NCT00872469; and PACTR201007000192283. 
    Findings Between March, 2010, and April, 2016, 20 060 women were enrolled and randomly assigned to receive tranexamic acid (n=10 051) or placebo (n=10 009), of whom 10 036 and 9985, respectively, were included in the analysis. Death due to bleeding was significantly reduced in women given tranexamic acid (155 [1·5%] of 10 036 patients vs 191 [1·9%] of 9985 in the placebo group, risk ratio [RR] 0·81, 95% CI 0·65–1·00; p=0·045), especially in women given treatment within 3 h of giving birth (89 [1·2%] in the tranexamic acid group vs 127 [1·7%] in the placebo group, RR 0·69, 95% CI 0·52–0·91; p=0·008). All other causes of death did not differ significantly by group. Hysterectomy was not reduced with tranexamic acid (358 [3·6%] patients in the tranexamic acid group vs 351 [3·5%] in the placebo group, RR 1·02, 95% CI 0·88–1·07; p=0·84). The composite primary endpoint of death from all causes or hysterectomy was not reduced with tranexamic acid (534 [5·3%] deaths or hysterectomies in the tranexamic acid group vs 546 [5·5%] in the placebo group, RR 0·97, 95% CI 0·87–1·09; p=0·65). Adverse events (including thromboembolic events) did not differ significantly in the tranexamic acid versus placebo group. Interpretation Tranexamic acid reduces death due to bleeding in women with post-partum haemorrhage with no adverse effects. When used as a treatment for postpartum haemorrhage, tranexamic acid should be given as soon as possible after bleeding onset. Funding London School of Hygiene & Tropical Medicine, Pfizer, UK Department of Health, Wellcome Trust, and Bill & Melinda Gates Foundation
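The headline risk ratio for death due to bleeding can be reproduced from the counts reported in the Findings (155 of 10 036 vs 191 of 9985). The sketch below uses the standard log-scale (Wald) confidence interval. That interval method is an assumption, since the abstract does not state how the trial computed its CI, and the function name is hypothetical.

```python
import math


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio of group A vs group B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


if __name__ == "__main__":
    # Death due to bleeding: tranexamic acid vs placebo, counts from the abstract.
    rr, lower, upper = risk_ratio(155, 10036, 191, 9985)
    print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Rounded to two decimals this gives RR 0.81 with an interval of 0.65 to 1.00, matching the reported figures for this outcome.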

    Evolution of Mobile Money Technologies in Developing Nations: Successes and Lessons.

    No full text

    FOQUS
