Enhancing Sensitivity Classification with Semantic Features using Word Embeddings
Government documents must be reviewed to identify any sensitive information
they may contain, before they can be released to the public. However,
traditional paper-based sensitivity review processes are not practical for reviewing
born-digital documents. Therefore, there is a timely need for automatic sensitivity
classification techniques, to assist the digital sensitivity review process.
However, sensitivity is typically a product of the relations between combinations
of terms, such as who said what about whom, which makes automatic sensitivity
classification a difficult task. Vector representations of terms, such as word
embeddings, have been shown to be effective at encoding latent term features
that preserve semantic relations between terms, which can also be beneficial to
sensitivity classification. In this work, we present a thorough evaluation of the
effectiveness of semantic word embedding features, along with term and grammatical
features, for sensitivity classification. On a test collection of government
documents containing real sensitivities, we show that extending text classification
with semantic features and additional term n-grams results in significant improvements
in classification effectiveness, correctly classifying 9.99% more sensitive
documents compared to the text classification baseline
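As a rough illustration of the idea described in this abstract, the sketch below concatenates term n-gram counts with an averaged word-embedding vector to form one feature vector per document. It is not the authors' implementation: the toy embeddings, the vocabulary, and the function names (ngram_counts, mean_embedding, featurise) are all assumptions made up for this example, and the grammatical features mentioned in the abstract are left out.

```python
from collections import Counter
from typing import Dict, List

# Toy 3-dimensional embeddings; hypothetical values for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "minister": [0.9, 0.1, 0.0],
    "criticised": [0.1, 0.8, 0.2],
    "ambassador": [0.8, 0.2, 0.1],
}


def ngram_counts(tokens: List[str], n_max: int = 2) -> Counter:
    """Count every term n-gram of length 1..n_max in the token list."""
    counts: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def mean_embedding(tokens: List[str],
                   embeddings: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (zero vector if none)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def featurise(text: str,
              vocabulary: List[str],
              embeddings: Dict[str, List[float]],
              dim: int) -> List[float]:
    """Concatenate sparse n-gram counts with the dense semantic features."""
    tokens = text.lower().split()
    counts = ngram_counts(tokens)
    sparse = [float(counts[term]) for term in vocabulary]
    dense = mean_embedding(tokens, embeddings, dim)
    return sparse + dense


# Hypothetical vocabulary of unigrams and bigrams chosen for the example.
VOCABULARY = ["minister", "minister criticised", "budget"]
features = featurise("Minister criticised ambassador",
                     VOCABULARY, TOY_EMBEDDINGS, dim=3)
```

The resulting vector could then be passed to any standard text classifier; which classifier and which embedding model the paper used is not stated in the abstract.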
DYNAMO-MAS: a multi-agent system for ontology evolution from text
Manual ontology development and evolution are complex and time-consuming tasks, even when textual documents are used as knowledge sources in addition to human expertise or existing ontologies. Processing natural language in text produces huge amounts of linguistic data that need to be filtered out and structured. To support both of these tasks, we have developed DYNAMO-MAS, an interactive tool based on an adaptive multi-agent system (adaptive MAS or AMAS) that builds and evolves ontologies from text. DYNAMO-MAS is a partner system to build ontologies; the ontologist interacts with the system to validate or modify its outputs. This paper presents the architecture of DYNAMO-MAS, its operating principles and its evaluation on three case studies.
SerpinB2 regulates stromal remodelling and local invasion in pancreatic cancer
Pancreatic cancer has a devastating prognosis, with an overall 5-year survival rate of ~8%, restricted treatment options and characteristic molecular heterogeneity. SerpinB2 expression, particularly in the stromal compartment, is associated with reduced metastasis and prolonged survival in pancreatic ductal adenocarcinoma (PDAC), and our genomic analysis revealed that SERPINB2 is frequently deleted in PDAC. We show that SerpinB2 is required by stromal cells for normal collagen remodelling in vitro, regulating fibroblast interaction and engagement with collagen in the contracting matrix. In a pancreatic cancer allograft model, co-injection of PDAC cancer cells and SerpinB2(-/-) mouse embryonic fibroblasts (MEFs) resulted in increased tumour growth, aberrant remodelling of the extracellular matrix (ECM) and increased local invasion from the primary tumour. These tumours also displayed elevated proteolytic activity of the primary biochemical target of SerpinB2, urokinase plasminogen activator (uPA). In a large cohort of patients with resected PDAC, we show that increasing uPA mRNA expression was significantly associated with poorer survival following pancreatectomy. This study establishes a novel role for SerpinB2 in the stromal compartment in PDAC invasion through regulation of stromal remodelling and highlights the SerpinB2/uPA axis for further investigation as a potential therapeutic target in pancreatic cancer.
Text Mining the History of Medicine
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform
Cytogerontology since 1881: A reappraisal of August Weismann and a review of modern progress
Cytogerontology, the science of cellular ageing, originated in 1881 with the prediction by August Weismann that the somatic cells of higher animals have limited division potential. Weismann's prediction was derived by considering the role of natural selection in regulating the duration of an organism's life. For various reasons, Weismann's ideas on ageing fell into neglect following his death in 1914, and cytogerontology has only reappeared as a major research area following the demonstration by Hayflick and Moorhead in the early 1960s that diploid human fibroblasts are restricted to a finite number of divisions in vitro.
In this review we give a detailed account of Weismann's theory, and we reveal that his ideas were both more extensive in their scope and more pertinent to current research than is generally recognised. We also appraise the progress which has been made over the past hundred years in investigating the causes of ageing, with particular emphasis being given to (i) the evolution of ageing, and (ii) ageing at the cellular level. We critically assess the current state of knowledge in these areas and recommend a series of points as primary targets for future research
Evolutionary and pulsational properties of white dwarf stars
Abridged. White dwarf stars are the final evolutionary stage of the vast
majority of stars, including our Sun. The study of white dwarfs has potential
applications to different fields of astrophysics. In particular, they can be
used as independent reliable cosmic clocks, and can also provide valuable
information about the fundamental parameters of a wide variety of stellar
populations, like our Galaxy and open and globular clusters. In addition, the
high densities and temperatures characterizing white dwarfs allow these
stars to be used as cosmic laboratories for studying physical processes under extreme
conditions that cannot be achieved in terrestrial laboratories. They can be
used to constrain fundamental properties of elementary particles such as axions
and neutrinos, and to study problems related to the variation of fundamental
constants.
In this work, we review the essentials of the physics of white dwarf stars.
Special emphasis is placed on the physical processes that lead to the formation
of white dwarfs as well as on the different energy sources and processes
responsible for chemical abundance changes that occur along their evolution.
Moreover, in the course of their lives, white dwarfs cross different
pulsational instability strips. The existence of these instability strips
provides astronomers with a unique opportunity to peer into their internal
structure that would otherwise remain hidden from observers. We will show that
this allows us to measure the stellar masses with unprecedented precision, to
infer their envelope thicknesses, to probe the core chemical stratification,
and to detect rotation rates and magnetic fields. Consequently, in this work,
we also review the pulsational properties of white dwarfs and the most recent
applications of white dwarf asteroseismology.
Comment: 85 pages, 28 figures. To be published in The Astronomy and
Astrophysics Review
Soccer Team Vectors
In this work we present STEVE - Soccer TEam VEctors, a principled approach
for learning real-valued vectors for soccer teams where similar teams are close
to each other in the resulting vector space. STEVE only relies on freely
available information about the matches teams played in the past. These vectors
can serve as input to various machine learning tasks. Evaluating on the task of
team market value estimation, STEVE outperforms all its competitors. Moreover,
we use STEVE for similarity search and to rank soccer teams.
Comment: 11 pages, 1 figure; This paper was presented at the 6th Workshop on
Machine Learning and Data Mining for Sports Analytics at ECML/PKDD 2019,
Würzburg, Germany, 2019
Comparing High Dimensional Word Embeddings Trained on Medical Text to Bag-of-Words For Predicting Medical Codes
Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. It is shown that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison to the traditional bag-of-words feature representation, demonstrating that in many cases, this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
Acquired resistance to oxaliplatin is not directly associated with increased resistance to DNA damage in SK-N-ASrOXALI4000, a newly established oxaliplatin-resistant sub-line of the neuroblastoma cell line SK-N-AS
The formation of acquired drug resistance is a major reason for the failure of anti-cancer therapies after initial response. Here, we introduce a novel model of acquired oxaliplatin resistance, a sub-line of the non-MYCN-amplified neuroblastoma cell line SK-N-AS that was adapted to growth in the presence of 4000 ng/mL oxaliplatin (SK-N-ASrOXALI4000). SK-N-ASrOXALI4000 cells displayed enhanced chromosomal aberrations compared to SK-N-AS, as indicated by 24-chromosome fluorescence in situ hybridisation. Moreover, SK-N-ASrOXALI4000 cells were resistant not only to oxaliplatin but also to the two other commonly used anti-cancer platinum agents cisplatin and carboplatin. SK-N-ASrOXALI4000 cells exhibited a stable resistance phenotype that was not affected by culturing the cells for 10 weeks in the absence of oxaliplatin. Interestingly, SK-N-ASrOXALI4000 cells showed no cross resistance to gemcitabine and increased sensitivity to doxorubicin and UVC radiation, alternative treatments that, like platinum drugs, target DNA integrity. Notably, UVC-induced DNA damage is thought to be predominantly repaired by nucleotide excision repair, and nucleotide excision repair has been described as the main oxaliplatin-induced DNA damage repair system. SK-N-ASrOXALI4000 cells were also more sensitive to lysis by influenza A virus, a candidate for oncolytic therapy, than SK-N-AS cells. In conclusion, we introduce a novel oxaliplatin resistance model. The oxaliplatin resistance mechanisms in SK-N-ASrOXALI4000 cells appear to be complex and not to directly depend on enhanced DNA repair capacity. Models of oxaliplatin resistance are of particular relevance since research on platinum drugs has so far predominantly focused on cisplatin and carboplatin.
Prolactin-induced mouse mammary carcinomas model estrogen resistant luminal breast cancer.
INTRODUCTION: Tumors that express estrogen receptor alpha (ERα+) comprise 75% of breast cancers in women. While treatments directed against this receptor have successfully lowered mortality rates, many primary tumors initially or later exhibit resistance. The paucity of murine models of this luminal tumor subtype has hindered studies of factors that promote their pathogenesis and modulate responsiveness to estrogen-directed therapeutics. Since epidemiologic studies closely link prolactin and the development of ERα+ tumors in women, we examined characteristics of the aggressive ERα+ and ERα- carcinomas which develop in response to mammary prolactin in a murine transgenic model (neu-related lipocalin-prolactin (NRL-PRL)). To evaluate their relationship to clinical tumors, we determined phenotypic relationships among these carcinomas, other murine models of breast cancer, and features of luminal tumors in women.
METHODS: We examined a panel of prolactin-induced tumors for characteristics relevant to clinical tumors: histotype, ERα/progesterone receptor (PR) expression and estrogen responsiveness, Activating Protein 1 (AP-1) components, and phosphorylation of signal transducer and activator of transcription 5 (Stat5), extracellular signal regulated kinase (ERK) 1/2 and AKT. We compared levels of transcripts in the ERα-associated luminal signature that defines this subtype of tumors in women and transcripts enriched in various mammary epithelial lineages to other well-studied genetically modified murine models of breast cancer. Finally, we used microarray analyses to compare prolactin-induced ERα+ and ERα- tumors, and examined responsiveness to estrogen and the anti-estrogen, Faslodex, in vivo.
RESULTS: Prolactin-induced carcinomas were markedly diverse with respect to histotype, ERα/PR expression, and activated signaling cascades. They constituted a heterogeneous, but distinct group of murine mammary tumors, with molecular features of the luminal subtype of human breast cancer. In contrast to morphologically normal and hyperplastic structures in NRL-PRL females, carcinomas were insensitive to ERα-mediated signals. These tumors were distinct from mouse mammary tumor virus (MMTV)-neu tumors, and contained elevated transcripts for factors associated with luminal/alveolar expansion and differentiation, suggesting that they arose from physiologic targets of prolactin. These features were shared by ERα+ and ERα- tumors, suggesting a common origin, although the former exhibited transcript profiles reflecting greater differentiation.
CONCLUSIONS: Our studies demonstrate that prolactin can promote diverse carcinomas in mice, many of which resemble luminal breast cancers, providing a novel experimental model to examine the pathogenesis, progression and treatment responsiveness of this tumor subtype
