69 research outputs found

    On the Impact of Cross-Domain Data on German Language Models

    Traditionally, large language models have been trained either on general web crawls or on domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. Through training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements of up to 4.45% over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen. Comment: 13 pages, 1 figure, accepted at Findings of the Association for Computational Linguistics: EMNLP 202

    Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

    Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks. Comment: Accepted at LREC-COLING 202

    Prognostic model to predict postoperative acute kidney injury in patients undergoing major gastrointestinal surgery based on a national prospective observational cohort study.

    Background: Acute illness, existing co-morbidities and surgical stress response can all contribute to postoperative acute kidney injury (AKI) in patients undergoing major gastrointestinal surgery. The aim of this study was prospectively to develop a pragmatic prognostic model to stratify patients according to risk of developing AKI after major gastrointestinal surgery. Methods: This prospective multicentre cohort study included consecutive adults undergoing elective or emergency gastrointestinal resection, liver resection or stoma reversal in 2-week blocks over a continuous 3-month period. The primary outcome was the rate of AKI within 7 days of surgery. Bootstrap stability was used to select clinically plausible risk factors into the model. Internal model validation was carried out by bootstrap validation. Results: A total of 4544 patients were included across 173 centres in the UK and Ireland. The overall rate of AKI was 14·2 per cent (646 of 4544) and the 30-day mortality rate was 1·8 per cent (84 of 4544). Stage 1 AKI was significantly associated with 30-day mortality (unadjusted odds ratio 7·61, 95 per cent c.i. 4·49 to 12·90; P < 0·001), with increasing odds of death with each AKI stage. Six variables were selected for inclusion in the prognostic model: age, sex, ASA grade, preoperative estimated glomerular filtration rate, planned open surgery and preoperative use of either an angiotensin-converting enzyme inhibitor or an angiotensin receptor blocker. Internal validation demonstrated good model discrimination (c-statistic 0·65). Discussion: Following major gastrointestinal surgery, AKI occurred in one in seven patients. This preoperative prognostic model identified patients at high risk of postoperative AKI. Validation in an independent data set is required to ensure generalizability
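
    The c-statistic quoted above measures discrimination: the probability that a randomly chosen patient who developed AKI was assigned a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that calculation follows. The function name and the toy risks and outcomes are hypothetical illustrations, not the study's data or code, and the bootstrap validation step is not reproduced.

```python
def c_statistic(risks, outcomes):
    """Concordance (c-statistic) for binary outcomes.

    risks:    predicted probabilities, one per patient
    outcomes: 1 if the event occurred, 0 otherwise
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    pairs = 0
    for r_event, y_event in zip(risks, outcomes):
        if y_event != 1:
            continue
        for r_none, y_none in zip(risks, outcomes):
            if y_none != 0:
                continue
            pairs += 1
            if r_event > r_none:
                concordant += 1.0
            elif r_event == r_none:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / pairs


# Hypothetical toy data: four patients, two of whom had the event.
toy_risks = [0.1, 0.4, 0.35, 0.8]
toy_outcomes = [0, 0, 1, 1]
toy_c = c_statistic(toy_risks, toy_outcomes)
```

    A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of events above non-events.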

    MedShapeNet – a large-scale dataset of 3D medical shapes for computer vision

    Objectives: The shape is commonly used to describe the objects. State-of-the-art algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from the growing popularity of ShapeNet (51,300 models) and Princeton ModelNet (127,915 models). However, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments is missing. Methods: We present MedShapeNet to translate data-driven vision algorithms to medical applications and to adapt state-of-the-art vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. We present use cases in classifying brain tumors, skull reconstructions, multi-class anatomy completion, education, and 3D printing. Results: By now, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. Conclusions: MedShapeNet contains medical shapes from anatomy and surgical instruments and will continue to collect data for benchmarks and applications. The project page is: https://medshapenet.ikim.nrw/

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback. Comment: 16 pages

    Global age-sex-specific all-cause mortality and life expectancy estimates for 204 countries and territories and 660 subnational locations, 1950–2023: a demographic analysis for the Global Burden of Disease Study 2023

    Background: Comprehensive, comparable, and timely estimates of demographic metrics—including life expectancy and age-specific mortality—are essential for evaluating, understanding, and addressing trends in population health. The COVID-19 pandemic highlighted the importance of timely and all-cause mortality estimates for being able to respond to changing trends in health outcomes, showing a strong need for demographic analysis tools that can produce all-cause mortality estimates more rapidly with more readily available all-age vital registration (VR) data. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) is an ongoing research effort that quantifies human health by estimating a range of epidemiological quantities of interest across time, age, sex, location, cause, and risk. This study—part of the latest GBD release, GBD 2023—aims to provide new and updated estimates of all-cause mortality and life expectancy for 1950 to 2023 using a novel statistical model that accounts for complex correlation structures in demographic data across age and time. Methods: We used 24 025 data sources from VR, sample registration, surveys, censuses, and other sources to estimate all-cause mortality for males, females, and all sexes combined across 25 age groups in 204 countries and territories as well as 660 subnational units in 20 countries and territories, for the years 1950–2023. For the first time, we used complete birth history data for ages 5–14 years, age-specific sibling history data for ages 15–49 years, and age-specific mortality data from Health and Demographic Surveillance Systems. We developed a single statistical model that incorporates both parametric and non-parametric methods, referred to as OneMod, to produce estimates of all-cause mortality for each age-sex-location group. 
OneMod includes two main steps: a detailed regression analysis with a generalised linear modelling tool that accounts for age-specific covariate effects such as the Socio-demographic Index (SDI) and a population attributable fraction (PAF) for all risk factors combined; and a non-parametric analysis of residuals using a multivariate kernel regression model that smooths across age and time to adaptably follow trends in the data without overfitting. We calibrated asymptotic uncertainty estimates using Pearson residuals to produce 95% uncertainty intervals (UIs) and corresponding 1000 draws. Life expectancy was calculated from age-specific mortality rates with standard demographic methods. For each measure, 95% UIs were calculated with the 25th and 975th ordered values from a 1000-draw posterior distribution. Findings: In 2023, 60·1 million (95% UI 59·0–61·1) deaths occurred globally, of which 4·67 million (4·59–4·75) were in children younger than 5 years. Due to considerable population growth and ageing since 1950, the number of annual deaths globally increased by 35·2% (32·2–38·4) over the 1950–2023 study period, during which the global age-standardised all-cause mortality rate declined by 66·6% (65·8–67·3). Trends in age-specific mortality rates between 2011 and 2023 varied by age group and location, with the largest decline in under-5 mortality occurring in east Asia (67·7% decrease); the largest increases in mortality for those aged 5–14 years, 25–29 years, and 30–39 years occurring in high-income North America (11·5%, 31·7%, and 49·9%, respectively); and the largest increases in mortality for those aged 15–19 years and 20–24 years occurring in Eastern Europe (53·9% and 40·1%, respectively). 
We also identified higher than previously estimated mortality rates in sub-Saharan Africa for all sexes combined aged 5–14 years (87·3% higher in GBD 2023 than GBD 2021 on average across countries and territories over the 1950–2021 period) and for females aged 15–29 years (61·2% higher), as well as lower than previously estimated mortality rates in sub-Saharan Africa for all sexes combined aged 50 years and older (13·2% lower), reflecting advances in our modelling approach. Global life expectancy followed three distinct trends over the study period. First, between 1950 and 2019, there were considerable improvements, from 51·2 (50·6–51·7) years for females and 47·9 (47·4–48·4) years for males in 1950 to 76·3 (76·2–76·4) years for females and 71·4 (71·3–71·5) years for males in 2019. Second, this period was followed by a decrease in life expectancy during the COVID-19 pandemic, to 74·7 (74·6–74·8) years for females and 69·3 (69·2–69·4) years for males in 2021. Finally, the world experienced a period of post-pandemic recovery in 2022 and 2023, wherein life expectancy generally returned to pre-pandemic (2019) levels in 2023 (76·3 [76·0–76·6] years for females and 71·5 [71·2–71·8] years for males). 194 (95·1%) of 204 countries and territories experienced at least partial post-pandemic recovery in age-standardised mortality rates by 2023, with 61·8% (126 of 204) recovering to or falling below pre-pandemic levels. There were several mortality trajectories during and following the pandemic across countries and territories. Long-term mortality trends also varied considerably between age groups and locations, demonstrating the diverse landscape of health outcomes globally. Interpretation: This analysis identified several key differences in mortality trends from previous estimates, including higher rates of adolescent mortality, higher rates of young adult mortality in females, and lower rates of mortality in older age groups in much of sub-Saharan Africa. 
The findings also highlight stark differences across countries and territories in the timing and scale of changes in all-cause mortality trends during and following the COVID-19 pandemic (2020–23). Our estimates of evolving trends in mortality and life expectancy across locations, ages, sexes, and SDI levels in recent years as well as over the entire 1950–2023 study period provide crucial information for governments, policy makers, and the public to ensure that health-care systems, economies, and societies are prepared to address the world's health needs, particularly in populations with higher rates of mortality than previously known. The estimates from this study provide a robust framework for GBD and a valuable foundation for policy development, implementation, and evaluation around the world
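
The Methods text above states that each 95% UI was taken as the 25th and 975th ordered values of a 1000-draw distribution. A minimal sketch of that step follows. The function name is hypothetical and the draws are synthetic numbers generated purely for illustration; they are not GBD estimates, and the OneMod modelling that produces real draws is not reproduced.

```python
import random


def uncertainty_interval(draws):
    """Return (lower, upper) bounds of a 95% UI from exactly 1000 draws.

    The bounds are the 25th and 975th ordered values. Ordered values are
    counted from 1, while Python lists are indexed from 0.
    """
    if len(draws) != 1000:
        raise ValueError("expected exactly 1000 draws")
    ordered = sorted(draws)
    return ordered[24], ordered[974]


# Synthetic draws (assumption): a normal distribution with arbitrary parameters.
rng = random.Random(0)
synthetic_draws = [rng.gauss(60.0, 0.5) for _ in range(1000)]
lower, upper = uncertainty_interval(synthetic_draws)
```

Taking ordered values directly avoids any interpolation choice: with 1000 draws, 24 values fall below the lower bound and 25 fall above the upper bound.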

    ChatGPT in Healthcare: A Taxonomy and Systematic Review

    The recent release of ChatGPT, a chat bot research project/product of natural language processing (NLP) by OpenAI, stirs up a sensation among both the general public and medical professionals, amassing a phenomenally large user base in a short time. This is a typical example of the 'productization' of cutting-edge technologies, which allows the general public without a technical background to gain firsthand experience in artificial intelligence (AI), similar to the AI hype created by AlphaGo (DeepMind Technologies, UK) and self-driving cars (Google, Tesla, etc.). However, it is crucial, especially for healthcare researchers, to remain prudent amidst the hype. This work provides a systematic review of existing publications on the use of ChatGPT in healthcare, elucidating the 'status quo' of ChatGPT in medical applications, for general readers, healthcare professionals as well as NLP scientists. The large biomedical literature database PubMed is used to retrieve published works on this topic using the keyword 'ChatGPT'. An inclusion criterion and a taxonomy are further proposed to filter the search results and categorize the selected publications, respectively. It is found through the review that the current release of ChatGPT has achieved only moderate or 'passing' performance in a variety of tests, and is unreliable for actual clinical deployment, since it is not intended for clinical applications by design. We conclude that specialized NLP models trained on (bio)medical datasets still represent the right direction to pursue for critical clinical applications.

    Management of immunohaematological testing and workflow using automated technologies in routine blood banks / Management immunhämatologischer Untersuchungen und Workflow mit automatisierten Technologien in Routine-Blutbanken

    The increase of productivity and safety of diagnostic testing as well as the reduction of costs and fulfilling the strict legal recommendations and guidelines are major tasks in blood bank services. To achieve all these aims without allocating enormous capacities of personnel, the introduction of automated technologies and data processing systems may be the only alternative. In the present assessment, we describe the most widely distributed automatic systems for blood bank services. This review describes the currently available automated immunohaematological systems and illustrates the evaluation of the fully automated system - Ortho AutoVue (R) Innova - as compared to standard manual methods. Work processes of our blood bank were reviewed to establish where improvements could be achieved before installing the automated testing instrument. The suitability of automation and data processing in order to support the bulk of routine and emergency work was explored. The improvement of the arrangement of the workspace led to a reduction of the specimen receipt operator working process time by 6%, whereas the review of the activity of the operator's time led to a reduction of 53%. Consistent results were achieved for blood group (100%), rhesus typing (100%), antibody screening (100%) and for auto control (87%) using the manual Diamed system and the automatic AutoVue (R) system. A consistent result of 8206 was achieved for antibody screening with frozen samples with known antibodies. The turnaround time, as calculated from the receipt at the blood bank to the completion of standard cross-match, blood typing and antibody screening, was decreased up to 17% using the automatic system due to the standardisation and automation of testing. The Ortho AutoVue (R) Innova system automates routine blood bank testing encompassing blood typing, cross-matching and antibody screening with high reliability, sensitivity and specificity compared to standard manual methods