
    Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision

    Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm for synthesizing training labels efficiently. The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources abstracted as labeling functions (LFs). Existing statistical label models typically rely only on the LF outputs, ignoring instance features when modeling the underlying generative process. In this paper, we incorporate instance features into a statistical label model via the proposed FABLE. FABLE is built on a mixture of Bayesian label models, each corresponding to a global correlation pattern, and the mixture coefficients are predicted from instance features by a Gaussian Process classifier. We adopt an auxiliary-variable-based variational inference algorithm to handle the non-conjugacy between the Gaussian Process and the Bayesian label models. An extensive empirical comparison on eleven benchmark datasets shows that FABLE achieves the highest average performance compared with nine baselines. Comment: 16 pages
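The central idea of combining several label models with instance-dependent weights can be illustrated with a toy sketch. It is not FABLE itself: each mixture component here is a simple naive-Bayes label model with fixed, made-up per-LF accuracies; a linear softmax gate over the features stands in for the Gaussian Process classifier; and no variational inference is performed. The helper names (`softmax`, `component_posterior`, `mixture_posterior`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence

ABSTAIN = None  # value an LF returns when it does not fire


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def component_posterior(votes: Sequence[Optional[int]],
                        accuracies: Sequence[float],
                        n_classes: int) -> List[float]:
    """Hypothetical naive-Bayes label model: LF j is correct with probability
    accuracies[j] and otherwise votes uniformly for one of the other classes."""
    log_scores = [0.0] * n_classes  # uniform class prior
    for vote, acc in zip(votes, accuracies):
        if vote is ABSTAIN:
            continue  # abstaining LFs carry no evidence
        for y in range(n_classes):
            p = acc if vote == y else (1.0 - acc) / (n_classes - 1)
            log_scores[y] += math.log(p)
    return softmax(log_scores)


def mixture_posterior(votes: Sequence[Optional[int]],
                      features: Sequence[float],
                      component_accuracies: Sequence[Sequence[float]],
                      gate_weights: Sequence[Sequence[float]],
                      n_classes: int) -> List[float]:
    """p(y | votes, x) = sum_k pi_k(x) * p_k(y | votes).

    pi_k(x) comes from a linear softmax gate over the instance features,
    a stand-in (assumption) for FABLE's Gaussian Process classifier."""
    gate_scores = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    pis = softmax(gate_scores)
    posterior = [0.0] * n_classes
    for pi, accs in zip(pis, component_accuracies):
        comp = component_posterior(votes, accs, n_classes)
        for y in range(n_classes):
            posterior[y] += pi * comp[y]
    return posterior


if __name__ == "__main__":
    votes = [1, 0, ABSTAIN]  # LF0 says class 1, LF1 says class 0, LF2 abstains
    # Component 0 trusts LF0, component 1 trusts LF1 (made-up accuracies).
    component_accuracies = [[0.9, 0.6, 0.7], [0.6, 0.9, 0.7]]
    # Feature 0 routes to component 0, feature 1 routes to component 1.
    gate_weights = [[2.0, 0.0], [0.0, 2.0]]
    print(mixture_posterior(votes, [1.0, 0.0], component_accuracies, gate_weights, 2))
    print(mixture_posterior(votes, [0.0, 1.0], component_accuracies, gate_weights, 2))
```

The same conflicting LF votes resolve to different labels depending on the instance features, which is the behavior a feature-blind label model cannot express.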

    Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification

    To obtain large numbers of training labels inexpensively, researchers have recently adopted the weak supervision (WS) paradigm, which leverages labeling rules rather than individual annotations to synthesize training labels, achieving competitive results on natural language processing (NLP) tasks. However, data imbalance is often overlooked when applying the WS paradigm, despite being a common issue in a variety of NLP tasks. To address this challenge, we propose Adaptive Ranking-based Sample Selection (ARS2), a model-agnostic framework that alleviates data imbalance in the WS paradigm. Specifically, it calculates a probabilistic margin score from the output of the current model to measure and rank the cleanliness of each data point. The ranked data are then sampled using both class-wise and rule-aware ranking. The two sampling strategies correspond to our two motivations: (1) to train the model with balanced data batches to reduce data imbalance and (2) to exploit the expertise of each labeling rule to collect clean samples. Experiments on four text classification datasets with four different imbalance ratios show that ARS2 outperforms state-of-the-art imbalanced learning and WS methods, with F1-score improvements of 2% to 57.8%.
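The margin-based ranking and the two sampling strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not the ARS2 implementation: the margin is taken as the model's probability for the weak label minus the largest probability of any other class, each sample is assumed to be labeled by exactly one rule, and the per-class and per-rule budgets `k_per_class` and `k_per_rule` are fixed hypothetical parameters rather than adaptive ones.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def probabilistic_margin(probs: Sequence[float], label: int) -> float:
    """Probability of the weak label minus the strongest competing class;
    a higher margin is read as a cleaner sample (negative = likely mislabeled)."""
    competitor = max(p for c, p in enumerate(probs) if c != label)
    return probs[label] - competitor


def top_k(indices: List[int], margins: List[float], k: int) -> List[int]:
    """Return the k indices with the largest margins."""
    return sorted(indices, key=lambda i: margins[i], reverse=True)[:k]


def select_samples(probs: List[List[float]], weak_labels: List[int],
                   rule_ids: List[int], k_per_class: int,
                   k_per_rule: int) -> Set[int]:
    margins = [probabilistic_margin(p, y) for p, y in zip(probs, weak_labels)]
    by_class: Dict[int, List[int]] = defaultdict(list)
    by_rule: Dict[int, List[int]] = defaultdict(list)
    for i, (y, r) in enumerate(zip(weak_labels, rule_ids)):
        by_class[y].append(i)
        by_rule[r].append(i)
    selected: Set[int] = set()
    # Class-wise ranking: the same budget per class yields balanced batches.
    for indices in by_class.values():
        selected.update(top_k(indices, margins, k_per_class))
    # Rule-aware ranking: each rule contributes its most confident samples.
    for indices in by_rule.values():
        selected.update(top_k(indices, margins, k_per_rule))
    return selected


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55], [0.7, 0.3]]
    weak_labels = [0, 0, 1, 1, 0]
    rule_ids = [0, 1, 2, 2, 1]  # hypothetical: one rule per sample
    print(sorted(select_samples(probs, weak_labels, rule_ids, 1, 1)))  # [0, 2, 4]
```

Taking the union of the two selections means a minority class still receives its full per-class budget, while every rule also contributes its most confident samples.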

    Better Explain Transformers by Illuminating Important Information

    Transformer-based models excel at various natural language processing (NLP) tasks, attracting extensive efforts to explain their inner workings. Prior methods explain Transformers by using raw gradients and attention as token attribution scores; because irrelevant information is often included in the explanation computation, the results can be confusing. In this work, we propose highlighting important information and eliminating irrelevant information through a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as important attention heads and focus on the relevance obtained from these heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. Compared with eight baselines on both classification and question-answering datasets, our method consistently performs best, improving explanation metrics by 3% to 33%. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LR
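The masking step can be illustrated with a toy sketch that assumes per-head, per-token relevance scores have already been produced by LRP (the LRP propagation itself is not shown). The positional-head test is an assumed heuristic, not necessarily the one used in this work: a head counts as positional if its strongest attention almost always lands at a fixed relative offset. Syntactic-head detection is omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Set


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda k: row[k])


def is_positional_head(attn: Sequence[Sequence[float]], offset: int,
                       threshold: float = 0.9) -> bool:
    """Assumed heuristic: a head is positional if, for at least `threshold`
    of the query tokens q, its strongest attention is at position q + offset."""
    n = len(attn)
    valid = hits = 0
    for q, row in enumerate(attn):
        target = q + offset
        if 0 <= target < n:
            valid += 1
            hits += int(argmax(row) == target)
    return valid > 0 and hits / valid >= threshold


def masked_relevance(head_relevance: Sequence[Sequence[float]],
                     important: Set[int]) -> List[float]:
    """Token attribution that keeps relevance only from important heads;
    every other head is masked out and contributes zero."""
    n_tokens = len(head_relevance[0])
    return [sum(head_relevance[h][t] for h in sorted(important))
            for t in range(n_tokens)]


if __name__ == "__main__":
    # Toy 3-token head that attends to the previous token.
    prev_token_head = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.05, 0.9, 0.05]]
    print(is_positional_head(prev_token_head, offset=-1))  # True
    relevance = [[0.5, 0.1, 0.2], [0.3, 0.3, 0.3], [-0.4, 0.9, 0.0]]
    print(masked_relevance(relevance, important={0, 2}))
```

Head 1 in the example spreads its relevance evenly over all tokens; masking it keeps that diffuse signal out of the final attribution.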

    NLPBench: Evaluating Large Language Models on Solving NLP Problems

    Recent developments in large language models (LLMs) have shown promise in enhancing the capabilities of natural language processing (NLP). Despite these successes, research on the NLP problem-solving abilities of LLMs remains scarce. To fill this gap, we present a unique benchmarking dataset, NLPBench, comprising 378 college-level NLP questions spanning various NLP topics, sourced from Yale University's prior final exams. NLPBench includes questions with context, in which multiple sub-questions share the same public information, and diverse question types, including multiple choice, short answer, and math. Our evaluation, centered on LLMs such as GPT-3.5/4, PaLM-2, and LLAMA-2, incorporates advanced prompting strategies such as chain-of-thought (CoT) and tree-of-thought (ToT). Our study reveals that the effectiveness of advanced prompting strategies can be inconsistent, occasionally hurting LLM performance, especially in smaller models such as LLAMA-2 (13b). Furthermore, our manual assessment revealed specific shortcomings in LLMs' scientific problem-solving skills, with weaknesses in logical decomposition and reasoning notably affecting results.

    SCP: Spherical-Coordinate-based Learned Point Cloud Compression

    In recent years, learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by LiDAR sensors spinning on vehicles. This process produces numerous circular shapes and azimuthal-angle invariance features in the point clouds. However, these two features have been largely overlooked by previous methods. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to fully leverage these features. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
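Why spherical coordinates suit spinning LiDAR can be shown with a plain coordinate conversion. This sketch assumes a (range, azimuth, elevation) parameterization with elevation measured from the horizontal plane; SCP's exact parameterization, quantization, and octree construction are not shown. A single laser ring at fixed range and elevation becomes a set of points that differ only in azimuth, so the circular shape turns into a line along one axis.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_spherical(p: Vec3) -> Vec3:
    """(x, y, z) -> (r, azimuth, elevation) in radians.

    Azimuth comes from math.atan2; elevation is the angle above the x-y plane."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return (0.0, 0.0, 0.0)
    azimuth = math.atan2(y, x)
    elevation = math.asin(max(-1.0, min(1.0, z / r)))  # clamp rounding error
    return (r, azimuth, elevation)


def to_cartesian(s: Vec3) -> Vec3:
    """Inverse of to_spherical."""
    r, azimuth, elevation = s
    horizontal = r * math.cos(elevation)
    return (horizontal * math.cos(azimuth), horizontal * math.sin(azimuth),
            r * math.sin(elevation))


if __name__ == "__main__":
    # One simulated laser ring: fixed range 10 m and elevation 0.1 rad,
    # sampled at 8 azimuths around the sensor.
    ring = [to_cartesian((10.0, k * math.pi / 4 - math.pi + 0.01, 0.1))
            for k in range(8)]
    for r, az, el in map(to_spherical, ring):
        # r and el stay constant; only az varies.
        print(f"r={r:.3f} az={az:+.3f} el={el:.3f}")
```

Rotating the whole cloud about the vertical axis only shifts every azimuth by the same angle, which is the azimuthal invariance the abstract refers to.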

    Au@h-Al2O3 Analogic Yolk–Shell Nanocatalyst for Highly Selective Synthesis of Biomass-Derived D-xylonic Acid via Regulation of Structure Effects

    Selective oxidation of biomass-based monosaccharides into value-added sugar acids is highly desirable, but producing D-xylonic acid has met with limited success. Herein, we report an efficient catalyst system, Au nanoparticles anchored on the inner walls of hollow Al2O3 nanospheres (Au@h-Al2O3), which catalyzes the selective oxidation of D-xylose into D-xylonic acid under base-free conditions. The mesoporous Al2O3 shell first acts as an adsorbent for D-xylose. Then, the interface between the Au nanoparticles and Al2O3 serves as the active site that spontaneously dissociates O2, and the exposed Au nanoparticle surface serves as the catalytic site that drives the transformation. With this catalyst system, D-xylonic acid was produced in excellent yields by the aerobic oxidation of D-xylose. Extensive investigation showed that Au@h-Al2O3 is an efficient catalyst with high stability and recyclability.

    LARE: Latent Augmentation using Regional Embedding with Vision-Language Model

    In recent years, considerable research has been conducted on vision-language models that handle both image and text data; these models are applied to diverse downstream tasks, such as image-related chat, instruction-based image recognition, and visual question answering. Vision-language models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), are also high-performance image classifiers and are being developed into domain adaptation methods that use language information to extend to unseen domains. However, because these VLMs embed each image as a single point in a unified embedding space, there is room to improve classification accuracy. Therefore, in this study, we propose Latent Augmentation using Regional Embedding (LARE), which embeds each image as a region in the unified embedding space learned by the VLM. By sampling augmented image embeddings from within this latent region, LARE enables data augmentation toward various unseen domains, not just specific ones. LARE achieves robust image classification on both in-domain and out-of-domain data by using the augmented image embeddings to fine-tune VLMs. We demonstrate that LARE outperforms previous fine-tuning models in image classification accuracy on three benchmarks. We also demonstrate that LARE is a more robust and general model that remains effective under multiple conditions, such as unseen domains, small amounts of data, and imbalanced data. Comment: 10 pages, 4 figures
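Sampling augmented embeddings from a region around an image embedding can be sketched as below. The region shape is an assumption (an axis-aligned box given by a center and per-dimension half-widths), as is the final L2 normalization in the style of CLIP embeddings; how LARE actually parameterizes and learns the region is not shown. `sample_region` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence


def sample_region(center: Sequence[float], half_width: Sequence[float],
                  n: int, rng: random.Random) -> List[List[float]]:
    """Draw n embeddings uniformly from an axis-aligned box (assumed region
    shape) around `center`, then project each onto the unit sphere, since
    CLIP-style embeddings are typically L2-normalised."""
    samples = []
    for _ in range(n):
        v = [c + rng.uniform(-w, w) for c, w in zip(center, half_width)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
        samples.append([x / norm for x in v])
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    image_embedding = [0.8, 0.6, 0.0]        # toy 3-d stand-in for a VLM image embedding
    region_half_width = [0.05, 0.05, 0.05]   # hypothetical learned region size
    augmented = sample_region(image_embedding, region_half_width, 4, rng)
    for emb in augmented:
        print([round(x, 3) for x in emb])
```

Under this assumption, each augmented embedding would be paired with the source image's label as an extra training input when fine-tuning the classifier.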

    Sex Differences in Frequency, Severity, and Distribution of Cerebral Microbleeds

    Importance: Cerebral small vessel disease (SVD) is associated with various cerebrovascular outcomes, but data on sex differences in SVD are scarce. Objective: To investigate whether the frequency, severity, and distribution of cerebral microbleeds (CMB), other SVD markers on magnetic resonance imaging (MRI), and outcomes differ by sex. Design, Setting, and Participants: This cohort study used pooled individual patient data from the Microbleeds International Collaborative Network, including patients from 38 prospective cohort studies in 18 countries between 2000 and 2018, with clinical follow-up of at least 3 months (up to 5 years). Participants included patients with acute ischemic stroke or transient ischemic attack with available brain MRI. Data were analyzed from April to December 2023. Main Outcomes and Measures: Outcomes of interest were presence of CMB, lacunes, and severe white matter hyperintensities determined on MRI. Additionally, mortality, recurrent ischemic stroke, and intracranial hemorrhage during follow-up were assessed. Multivariable random-effects logistic regression models, Cox regression, and competing risk regression models were used to investigate sex differences in individual SVD markers, risk of recurrent cerebrovascular events, and death. Results: A total of 20 314 patients (mean [SD] age, 70.1 [12.7] years; 11 721 [57.7%] male) were included, of whom 5649 (27.8%) had CMB. CMB were more frequent in male patients, and this was consistent across age groups and locations and in multivariable models (female vs male adjusted odds ratio [aOR], 0.86; 95% CI, 0.80-0.92; P < .001). Female patients had fewer lacunes (aOR, 0.82; 95% CI, 0.74-0.90; P < .001) but a higher prevalence of severe white matter hyperintensities (aOR, 1.10; 95% CI, 1.01-1.20; P = .04) compared with male patients. A total of 2419 patients (11.9%) died during a median (IQR) follow-up of 1.4 (0.7-2.5) years. CMB presence was associated with a higher risk of mortality in female patients (hazard ratio, 1.15; 95% CI, 1.02-1.31), but not in male patients (hazard ratio, 0.95; 95% CI, 0.84-1.07) (P for interaction = .01). A total of 1113 patients (5.5%) had recurrent ischemic stroke, and 189 patients (0.9%) had recurrent intracranial hemorrhage, with no sex differences. Conclusions and Relevance: This cohort study using pooled individual patient data found varying frequencies of individual SVD markers between female and male patients, indicating potential pathophysiological differences in the manifestation and severity of SVD. Further research addressing differences in pathomechanisms and outcomes of SVD between female and male patients is required.
