62 research outputs found
Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to
synthesize training labels efficiently. The core component of PWS is the label
model, which infers true labels by aggregating the outputs of multiple noisy
supervision sources abstracted as labeling functions (LFs). Existing
statistical label models typically rely only on the outputs of LFs, ignoring the
instance features when modeling the underlying generative process. In this
paper, we attempt to incorporate the instance features into a statistical label
model via the proposed FABLE. In particular, it is built on a mixture of
Bayesian label models, each corresponding to a global pattern of correlation,
and the coefficients of the mixture components are predicted by a Gaussian
Process classifier based on instance features. We adopt an auxiliary
variable-based variational inference algorithm to tackle the non-conjugate
issue between the Gaussian Process and Bayesian label models. Extensive
empirical comparison on eleven benchmark datasets shows FABLE achieving the
highest average performance against nine baselines.
Comment: 16 pages
Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification
To obtain a large amount of training labels inexpensively, researchers have
recently adopted the weak supervision (WS) paradigm, which leverages labeling
rules to synthesize training labels rather than using individual annotations to
achieve competitive results for natural language processing (NLP) tasks.
However, data imbalance is often overlooked in applying the WS paradigm,
despite being a common issue in a variety of NLP tasks. To address this
challenge, we propose Adaptive Ranking-based Sample Selection (ARS2), a
model-agnostic framework to alleviate the data imbalance issue in the WS
paradigm. Specifically, it calculates a probabilistic margin score based on the
output of the current model to measure and rank the cleanliness of each data
point. Then, the ranked data are sampled based on both class-wise and
rule-aware ranking. In particular, the two sampling strategies correspond to our
motivations: (1) to train the model with balanced data batches to reduce the
data imbalance issue and (2) to exploit the expertise of each labeling rule for
collecting clean samples. Experiments on four text classification datasets with
four different imbalance ratios show that ARS2 outperformed the
state-of-the-art imbalanced learning and WS methods, yielding a 2%-57.8%
improvement in F1-score.
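The ranking and class-wise sampling steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact score, so the margin used here (probability of the weak label minus the largest other class probability) is an assumption, and the function names `margin_scores` and `classwise_top_k` are hypothetical. Rule-aware ranking is omitted.

```python
def margin_scores(probs, weak_labels):
    """Assumed margin: p(weak label) minus the largest other class probability.

    A larger margin is treated as a sign that the weak label is more likely clean.
    """
    scores = []
    for p, y in zip(probs, weak_labels):
        best_other = max(q for c, q in enumerate(p) if c != y)
        scores.append(p[y] - best_other)
    return scores


def classwise_top_k(scores, weak_labels, k):
    """Select up to k highest-margin sample indices from each class.

    Taking the same number per class gives a class-balanced selection.
    """
    by_class = {}
    for i, y in enumerate(weak_labels):
        by_class.setdefault(y, []).append(i)
    selected = []
    for y in sorted(by_class):
        ranked = sorted(by_class[y], key=lambda i: scores[i], reverse=True)
        selected.extend(ranked[:k])
    return selected


# Toy model outputs for four samples and two classes (made-up numbers).
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]
weak_labels = [0, 0, 1, 1]
scores = margin_scores(probs, weak_labels)
chosen = classwise_top_k(scores, weak_labels, k=1)
```

Here samples 1 and 3 receive negative margins because the model disagrees with their weak labels, so the per-class selection keeps samples 0 and 2.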
Better Explain Transformers by Illuminating Important Information
Transformer-based models excel in various natural language processing (NLP)
tasks, attracting countless efforts to explain their inner workings. Prior
methods explain Transformers by focusing on the raw gradient and attention as
token attribution scores, where non-relevant information is often considered
during explanation computation, resulting in confusing results. In this work,
we propose highlighting the important information and eliminating irrelevant
information by a refined information flow on top of the layer-wise relevance
propagation (LRP) method. Specifically, we consider identifying syntactic and
positional heads as important attention heads and focus on the relevance
obtained from these important heads. Experimental results demonstrate that
irrelevant information does distort output attribution scores and should
therefore be masked during explanation computation. Compared to eight
baselines on both classification and question-answering datasets, our method
consistently outperforms them, with improvements of 3% to 33% on explanation
metrics. Our anonymous code repository is available
at: https://github.com/LinxinS97/Mask-LR
NLPBench: Evaluating Large Language Models on Solving NLP Problems
Recent developments in large language models (LLMs) have shown promise in
enhancing the capabilities of natural language processing (NLP). Despite these
successes, there remains a dearth of research dedicated to the NLP
problem-solving abilities of LLMs. To fill the gap in this area, we present a
unique benchmarking dataset, NLPBench, comprising 378 college-level NLP
questions spanning various NLP topics sourced from Yale University's prior
final exams. NLPBench includes questions with context, in which multiple
sub-questions share the same public information, and diverse question types,
including multiple choice, short answer, and math. Our evaluation, centered on
LLMs such as GPT-3.5/4, PaLM-2, and LLAMA-2, incorporates advanced prompting
strategies like the chain-of-thought (CoT) and tree-of-thought (ToT). Our study
reveals that the effectiveness of the advanced prompting strategies can be
inconsistent, occasionally damaging LLM performance, especially in smaller
models like LLAMA-2 (13b). Furthermore, our manual assessment illuminated
specific shortcomings in LLMs' scientific problem-solving skills, with
weaknesses in logical decomposition and reasoning notably affecting results.
SCP: Spherical-Coordinate-based Learned Point Cloud Compression
In recent years, the task of learned point cloud compression has gained
prominence. An important type of point cloud, the spinning LiDAR point cloud,
is generated by spinning LiDAR on vehicles. This process results in numerous
circular shapes and azimuthal angle invariance features within the point
clouds. However, these two features have been largely overlooked by previous
methodologies. In this paper, we introduce a model-agnostic method called
Spherical-Coordinate-based learned Point cloud compression (SCP), designed to
leverage the aforementioned features fully. Additionally, we propose a
multi-level Octree for SCP to mitigate the reconstruction error for distant
areas within the Spherical-coordinate-based Octree. SCP exhibits excellent
universality, making it applicable to various learned point cloud compression
techniques. Experimental results demonstrate that SCP surpasses previous
state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
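To illustrate why spherical coordinates suit spinning LiDAR data, the sketch below converts Cartesian points to (radius, azimuth, elevation). It assumes the sensor sits at the origin and uses one common angle convention; the function name `cartesian_to_spherical` is hypothetical, and SCP's actual coordinate convention and Octree construction are not described in the abstract and are not reproduced here.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Convert a point to (rho, theta, phi) with the sensor at the origin.

    rho   : distance from the sensor
    theta : azimuth in the x-y plane, in radians, from atan2(y, x)
    phi   : elevation above the x-y plane, in radians
    """
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.atan2(z, math.hypot(x, y))
    return rho, theta, phi


# Three points on one horizontal ring of radius 10, two units below the sensor.
ring = [(10.0, 0.0, -2.0), (0.0, 10.0, -2.0), (-10.0, 0.0, -2.0)]
spherical = [cartesian_to_spherical(*p) for p in ring]
```

All three points share the same radius and elevation and differ only in azimuth, so a circular scan line in Cartesian space becomes a straight line along the azimuth axis.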
Au@h-Al2O3 Analogic Yolk–Shell Nanocatalyst for Highly Selective Synthesis of Biomass-Derived D-xylonic Acid via Regulation of Structure Effects
Selective oxidation of biomass-based monosaccharides into value-added sugar acids is highly desirable, but success in producing D-xylonic acid has been limited. Herein, we report an efficient catalyst system, viz., Au nanoparticles anchored on the inner walls of hollow Al2O3 nanospheres (Au@h-Al2O3), which can catalyze the selective oxidation of D-xylose into D-xylonic acid under base-free conditions. The mesoporous Al2O3 shell acts as the adsorbent and first adsorbs D-xylose. Then, the interface between the Au nanoparticles and Al2O3 serves as the active site that spontaneously dissociates O2, and the exposed Au nanoparticle surface serves as the catalytic site that drives the transformation. With this catalyst system, the valuable D-xylonic acid was produced in excellent yields by the aerobic oxidation of D-xylose. Extensive investigation showed that Au@h-Al2O3 is an efficient catalyst with high stability and recyclability.
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
In recent years, considerable research has been conducted on vision-language models that handle both image and text data; these models are being applied to diverse downstream tasks, such as image-related chat, image recognition by instruction, and visual question answering. Vision-language models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), are also high-performance image classifiers that are being developed into domain adaptation methods that can use language information to extend into unseen domains. However, because these VLMs embed each image as a single point in a unified embedding space, there is room for improvement in classification accuracy. Therefore, in this study, we propose Latent Augmentation using Regional Embedding (LARE), which embeds the image as a region in the unified embedding space learned by the VLM. By sampling augmented image embeddings from within this latent region, LARE enables data augmentation to various unseen domains, not just to specific unseen domains. LARE achieves robust image classification for both in-domain and out-of-domain data by using the augmented image embeddings to fine-tune VLMs. We demonstrate that LARE outperforms previous fine-tuning models in terms of image classification accuracy on three benchmarks. We also demonstrate that LARE is a more robust and general model that is valid under multiple conditions, such as unseen domains, small amounts of data, and imbalanced data. 10 pages, 4 figures
Sex Differences in Frequency, Severity, and Distribution of Cerebral Microbleeds
Importance: Cerebral small vessel disease (SVD) is associated with various cerebrovascular outcomes, but data on sex differences in SVD are scarce. Objective: To investigate whether the frequency, severity, and distribution of cerebral microbleeds (CMB), other SVD markers on magnetic resonance imaging (MRI), and outcomes differ by sex. Design, Setting, and Participants: This cohort study used pooled individual patient data from the Microbleeds International Collaborative Network, including patients from 38 prospective cohort studies in 18 countries between 2000 and 2018, with clinical follow-up of at least 3 months (up to 5 years). Participants included patients with acute ischemic stroke or transient ischemic attack with available brain MRI. Data were analyzed from April to December 2023. Main Outcomes and Measures: Outcomes of interest were presence of CMB, lacunes, and severe white matter hyperintensities determined on MRI. Additionally, mortality, recurrent ischemic stroke, and intracranial hemorrhage during follow-up were assessed. Multivariable random-effects logistic regression models, Cox regression, and competing risk regression models were used to investigate sex differences in individual SVD markers, risk of recurrent cerebrovascular events, and death. Results: A total of 20 314 patients (mean [SD] age, 70.1 [12.7] years; 11 721 [57.7%] male) were included, of whom 5649 (27.8%) had CMB. CMB were more frequent in male patients, and this was consistent throughout different age groups, locations, and in multivariable models (female vs male adjusted odds ratio [aOR], 0.86; 95% CI, 0.80-0.92; P < .001). Female patients had fewer lacunes (aOR, 0.82; 95% CI, 0.74-0.90; P < .001) but a higher prevalence of severe white matter hyperintensities (aOR, 1.10; 95% CI, 1.01-1.20; P = .04) compared with male patients. A total of 2419 patients (11.9%) died during a median (IQR) follow-up of 1.4 (0.7-2.5) years. 
CMB presence was associated with a higher risk of mortality in female patients (hazard ratio, 1.15; 95% CI, 1.02-1.31), but not male patients (hazard ratio, 0.95; 95% CI, 0.84-1.07) (P for interaction = .01). A total of 1113 patients (5.5%) had recurrent ischemic stroke, and 189 patients (0.9%) had recurrent intracranial hemorrhage, with no sex differences. Conclusions and Relevance: This cohort study using pooled individual patient data found varying frequencies of individual SVD markers between female and male patients, indicating potential pathophysiological differences in manifestation and severity of SVD. Further research addressing differences in pathomechanisms and outcomes of SVD between female and male patients is required.
