Scene Parsing with Global Context Embedding
We present a scene parsing method that utilizes global context information
based on both the parametric and non-parametric models. Compared to previous
methods that only exploit the local relationship between objects, we train a
context network based on scene similarities to generate feature representations
for global contexts. In addition, these learned features are utilized to
generate global and spatial priors for explicit class inference. We then
design modules to embed the feature representations and the priors into the
segmentation network as additional global context cues. We show that the
proposed method can eliminate false positives that are not compatible with the
global context representations. Experiments on both the MIT ADE20K and PASCAL
Context datasets show that the proposed method performs favorably against
existing methods.
Comment: Accepted in ICCV'17. Code available at
https://github.com/hfslyc/GCPNe
Large-scale event extraction from literature with multi-level gene normalization
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/).
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Recent advancements in multimodal large language models (LLMs) have shown
their potential in various domains, especially concept reasoning. Despite these
developments, applications in understanding 3D environments remain limited.
This paper introduces Reason3D, a novel LLM designed for comprehensive 3D
understanding. Reason3D takes point cloud data and text prompts as input to
produce textual responses and segmentation masks, facilitating advanced tasks
like 3D reasoning segmentation, hierarchical searching, express referring, and
question answering with detailed mask outputs. Specifically, we propose a
hierarchical mask decoder to locate small objects within expansive scenes. This
decoder initially generates a coarse location estimate covering the object's
general area. This foundational estimation facilitates a detailed,
coarse-to-fine segmentation strategy that significantly enhances the precision
of object identification and segmentation. Experiments validate that Reason3D
achieves remarkable results on large-scale ScanNet and Matterport3D datasets
for 3D express referring, 3D question answering, and 3D reasoning segmentation
tasks. Code and models are available at:
https://github.com/KuanchihHuang/Reason3D.
Comment: Project Page: https://KuanchihHuang.github.io/project/reason3
BioRED: A Comprehensive Biomedical Relation Extraction Dataset
Automated relation extraction (RE) from biomedical literature is critical for
many downstream text mining applications in both research and real-world
settings. However, most existing benchmarking datasets for biomedical RE only
focus on relations of a single type (e.g., protein-protein interactions) at the
sentence level, greatly limiting the development of RE systems in biomedicine.
In this work, we first review commonly used named entity recognition (NER) and
RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus
with multiple entity types (e.g., gene/protein, disease, chemical) and relation
pairs (e.g., gene-disease; chemical-chemical), on a set of 600 PubMed articles.
Further, we label each relation as describing either a novel finding or
previously known background knowledge, enabling automated algorithms to
differentiate between novel and background information. We assess the utility
of BioRED by benchmarking several existing state-of-the-art methods, including
BERT-based models, on the NER and RE tasks. Our results show that while
existing approaches can reach high performance on the NER task (F-score of
89.3%), there is much room for improvement for the RE task, especially when
extracting novel relations (F-score of 47.7%). Our experiments also demonstrate
that such a comprehensive dataset can successfully facilitate the development
of more accurate, efficient, and robust RE systems for biomedicine.
BC4GO: a full-text corpus for the BioCreative IV GO task
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need to use full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community.
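The abstract above reports inter-annotator agreement as F1-measures. As a rough illustration only, the sketch below computes a strict, exact-match F1 between two annotators' sets of selected evidence sentences. The function name `f1_agreement` and the sentence identifiers are hypothetical, and the corpus authors' actual strict and relaxed matching criteria are not described here, so this should not be read as their procedure.

```python
def f1_agreement(annotator_a, annotator_b):
    """Exact-match F1 between two annotators' selections.

    Annotator A is treated as the reference and B as the prediction;
    F1 is symmetric, so swapping the two gives the same score.
    """
    a, b = set(annotator_a), set(annotator_b)
    overlap = len(a & b)
    if overlap == 0:
        # No shared items (or an empty selection): agreement is zero.
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence IDs chosen by two curators for the same article.
curator_1 = {"s1", "s4", "s7", "s9"}
curator_2 = {"s4", "s9", "s12"}

score = f1_agreement(curator_1, curator_2)
print(f"strict F1 agreement: {score:.3f}")
```

A relaxed variant could count partially overlapping passages as matches, but how overlap is defined would be a further assumption.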
Microstructure and superelastic properties of FeNiCoAlTi single crystals with the <100> orientation under tension
The microstructure and superelastic response of an Fe41Ni28Co17Al11.5Ti2.5 (at.%) single crystal along the <100> orientation were investigated under tension at room temperature after aging at 600 °C for 24 h. From the superelastic results, the samples aged at 600 °C for 24 h exhibited 4.5% recoverable strain at room temperature. The digital image correlation (DIC) method was used to observe the strain distribution during the 6.5% applied strain loading. The DIC results showed that the strain was uniformly distributed during the loading and unloading cycles. Only one martensite variant was observed from the DIC results. This was related to the aging heat treatment times. The martensite morphology became a single variant with a longer aging time. The thermo-magnetization results indicated that the phase transformation and temperature hysteresis was around 36 °C. As the magnetic field increased from 0.05 to 7 T, the transformation temperatures increased. The maximum magnetization was 160 emu/g under a magnetic field of 7 T. From the transmission electron microscopy results, the L12 precipitates were around 10 nm in size, and they were high in Ni content and low in Fe content.
BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale
Capturing the semantics of related biological concepts, such as genes and
mutations, is of significant importance to many research tasks in computational
biology such as protein-protein interaction detection, gene-drug association
prediction, and biomedical literature-based discovery. Here, we propose to
leverage state-of-the-art text mining tools and machine learning models to
learn the semantics via vector representations (a.k.a. embeddings) of over
400,000 biological concepts mentioned across all PubMed abstracts. Our
learned embeddings, namely BioConceptVec, can capture related concepts based on
their surrounding contextual information in the literature, which is beyond
exact term match or co-occurrence-based methods. BioConceptVec has been
thoroughly evaluated in multiple bioinformatics tasks consisting of over 25
million instances from nine different biological datasets. The evaluation
results demonstrate that BioConceptVec has better performance than existing
methods in all tasks. Finally, BioConceptVec is made freely available to the
research community and general public via
https://github.com/ncbi-nlp/BioConceptVec.
Comment: 33 pages, 6 figures, 7 tables, accepted by PLOS Computational Biolog
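Related concepts are commonly retrieved from embeddings like those described above by ranking vectors on cosine similarity. The sketch below is a minimal illustration of that idea. The concept names and three-dimensional vectors are invented toy values, and nothing here reflects the actual BioConceptVec file format, identifiers, or dimensionality.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, top_k=2):
    """Return the top_k concepts closest to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy, made-up embeddings (hypothetical names and values).
toy_embeddings = {
    "Gene_A": [0.9, 0.1, 0.0],
    "Gene_B": [0.8, 0.2, 0.1],
    "Disease_X": [0.0, 0.1, 0.9],
}

neighbors = most_similar("Gene_A", toy_embeddings, top_k=1)
print(neighbors[0][0])
```

With real embeddings, a vectorized library would be the more practical choice. Plain Python is used here only to keep the sketch self-contained.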
