A Framework for Comparing Groups of Documents
We present a general framework for comparing multiple groups of documents. A
bipartite graph model is proposed where document groups are represented as one
node set and the comparison criteria are represented as the other node set.
Using this model, we present basic algorithms to extract insights into
similarities and differences among the document groups. Finally, we demonstrate
the versatility of our framework through an analysis of NSF funding programs
for basic research.
Comment: 6 pages; 2015 Conference on Empirical Methods in Natural Language
Processing (EMNLP '15)
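As a rough illustration of the bipartite model described in this abstract, the sketch below stores document groups on one side and comparison criteria on the other, with weighted edges between them. The group names, criterion names, weights, and the choice of cosine similarity are all assumptions made for the example; the abstract does not specify them.

```python
from math import sqrt

# Hypothetical bipartite graph: document group -> {comparison criterion: edge weight}.
# All names and weights are invented for illustration.
edges = {
    "program_A": {"topic_ml": 0.6, "topic_bio": 0.1, "topic_energy": 0.3},
    "program_B": {"topic_ml": 0.5, "topic_energy": 0.5},
    "program_C": {"topic_bio": 0.9, "topic_ml": 0.1},
}


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(u[c] * v[c] for c in set(u) & set(v))
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(group):
    """Return the other group whose criterion weights are closest to `group`."""
    others = [g for g in edges if g != group]
    return max(others, key=lambda g: cosine(edges[group], edges[g]))


def distinguishing_criteria(a, b, top=1):
    """Return the criteria on which groups `a` and `b` differ the most."""
    criteria = set(edges[a]) | set(edges[b])
    diffs = {c: abs(edges[a].get(c, 0.0) - edges[b].get(c, 0.0)) for c in criteria}
    return sorted(diffs, key=diffs.get, reverse=True)[:top]


print(most_similar("program_A"))
print(distinguishing_criteria("program_A", "program_C"))
```

Similarities come from comparing the edge weights two groups share, and differences come from the criteria where those weights diverge most.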
Minimising the expectation value of the procurement cost in electricity markets based on the prediction error of energy consumption
In this paper, we formulate a method for minimising the expectation value of
the procurement cost of electricity in two popular spot markets, day-ahead and
intra-day, under the assumption that the expectation values of the unit prices
and the distributions of the prediction errors for the electricity demand
traded in the two markets are known. The expectation value of the total
electricity cost is minimised over two parameters that change the amounts of
electricity. These two parameters depend only on the expected unit prices of
electricity and the distributions of the prediction errors for the electricity
demand traded in the two markets. That is, even if we do not know the predictions
for the electricity demand, we can determine the values of the two parameters that
minimise the expectation value of the procurement cost of electricity in the two
spot markets. We demonstrate numerically that the estimate of the two
parameters often results in a small variance of the total electricity cost, and
illustrate the usefulness of the proposed procurement method through the
analysis of actual data.
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14-year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData
2014)
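A minimal sketch of a topic similarity network in the sense of this abstract: nodes are topics and a link is drawn when two topics are similar enough. The toy topic-word distributions, the use of cosine similarity, and the 0.8 threshold are assumptions for illustration; the abstract does not state which similarity measure or cutoff the authors use.

```python
from itertools import combinations
from math import sqrt

# Hypothetical topic-word distributions over a 4-word vocabulary.
# In practice these would be the rows of a fitted LDA topic-word matrix.
topics = {
    "t0": [0.5, 0.4, 0.1, 0.0],
    "t1": [0.4, 0.5, 0.0, 0.1],
    "t2": [0.0, 0.1, 0.5, 0.4],
}


def cosine(p, q):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_network(topic_vectors, threshold=0.8):
    """Return {(topic_a, topic_b): similarity} for pairs at or above the threshold."""
    network = {}
    for a, b in combinations(sorted(topic_vectors), 2):
        similarity = cosine(topic_vectors[a], topic_vectors[b])
        if similarity >= threshold:
            network[(a, b)] = similarity
    return network


network = build_network(topics)
print(sorted(network))
```

With these toy vectors only `t0` and `t1` are linked, which is the kind of grouping of topics into a larger theme that the abstract describes.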
Mining Measured Information from Text
We present an approach to extract measured information from text (e.g., a
1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such
extractions are critically important across a wide range of domains,
especially those involving search and exploration of scientific and technical
documents. We first propose a rule-based entity extractor to mine measured
quantities (i.e., a numeric value paired with a measurement unit), which
supports a vast and comprehensive set of both common and obscure measurement
units. Our method is highly robust and can correctly recover valid measured
quantities even when significant errors are introduced through the process of
converting document formats like PDF to plain text. Next, we describe an
approach to extracting the properties being measured (e.g., the property "pixel
pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we
present MQSearch: the realization of a search engine with full support for
measured information.
Comment: 4 pages; 38th International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR '15)
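To make the idea of a rule-based extractor for measured quantities concrete, here is a small regular-expression sketch that pairs a numeric value with a unit from a fixed list. The unit list, the pattern, and the function name `extract_quantities` are hypothetical; this is not the authors' extractor, which supports far more units and tolerates conversion errors.

```python
import re

# Hypothetical, deliberately tiny unit inventory; longer units are tried first
# so that "kg/m^2" is not cut short to "kg".
UNITS = ["kg/m^2", "degrees C", "μm", "mm", "kg", "m"]
unit_pattern = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# A number (integer or decimal), optional whitespace, then a known unit that is
# not immediately followed by another ASCII letter.
QUANTITY = re.compile(
    rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{unit_pattern})(?![A-Za-z])"
)


def extract_quantities(text):
    """Return (value, unit) pairs for every measured quantity found in `text`."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in QUANTITY.finditer(text)
    ]


sample = (
    "a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2, "
    "a pixel pitch as high as 352 μm"
)
print(extract_quantities(sample))
```

Identifying the property being measured (such as "pixel pitch") is a separate step that this sketch does not attempt.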
