Economic Impacts of Proposed Limits on Trans Fats in Canada
In response to growing concerns about coronary heart disease (CHD), the Government of Canada has recently taken policy measures to reduce Canadian trans fatty acid (TFA) consumption. Mandatory labelling of trans fat content in foods began in December 2005. The House of Commons also established a task force in November 2004 to develop regulations banning the sale of food products with a TFA content greater than 2 percent. The issue at stake is whether the mandatory content restriction has economic merit. While mandatory TFA reductions could reduce heart disease and improve the health of Canadians, they also have the potential to increase the economic costs faced by all parts of the Canadian food oil complex, from primary producers to consumers. The goal of this article is to examine the impacts of a mandatory reduction of trans fat content by estimating the potential health benefits and the potential adverse impacts on the agri-food sector.
Agricultural and Food Policy; Food Consumption/Nutrition/Food Safety
Automated Construction of Relational Attributes ACORA: A Progress Report
Data mining research has not only developed a large number of algorithms, but also enhanced our knowledge and understanding of their applicability and performance. However, the application of data mining technology in business environments is still not very common, despite the fact that organizations have access to large amounts of data and make decisions that could profit from data mining on a daily basis. One of the reasons is the mismatch between the data representations used for storage and for analysis: data are most commonly stored in multi-table relational databases, whereas data mining methods require that the data be represented as a simple feature vector. This work presents a general framework for feature construction from multiple relational tables for data mining applications. The second part describes our prototype implementation, ACORA (Automated Construction of Relational Attributes).
Information Systems Working Papers Series
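To make the storage/analysis mismatch concrete, here is a minimal Python sketch of the kind of transformation such a framework automates: flattening a toy one-to-many schema (hypothetical customers and transactions tables) into one feature vector per target case using simple aggregates. The schema, the names, and the choice of aggregates are illustrative assumptions, not ACORA's actual pipeline.

```python
import pandas as pd

# Toy relational schema (hypothetical): a target table of customers and a
# one-to-many table of their transactions.
customers = pd.DataFrame({"cid": [1, 2, 3]})
transactions = pd.DataFrame({
    "cid":      [1, 1, 2, 2, 2, 3],
    "amount":   [10.0, 25.0, 5.0, 5.0, 40.0, 7.5],
    "category": ["food", "books", "food", "food", "toys", "books"],
})

# Flatten the one-to-many relationship into one feature vector per customer
# using simple aggregates: COUNT and MEAN for the numeric attribute,
# MODE for the categorical one.
agg = transactions.groupby("cid").agg(
    n_trans=("amount", "count"),
    mean_amount=("amount", "mean"),
    mode_category=("category", lambda s: s.mode().iloc[0]),
)

features = customers.merge(agg, on="cid", how="left")
print(features)
```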
Aggregation-Based Feature Invention and Relational Concept Classes
Model induction from relational data requires aggregation of the values of attributes of related entities. This paper makes three contributions to the study of relational learning. (1) It presents a hierarchy of relational concepts of increasing complexity, using relational schema characteristics such as cardinality, and derives the classes of aggregation operators needed to learn these concepts. (2) Expanding one level of the hierarchy, it introduces new aggregation operators that model the distribution of the values to be aggregated and (for classification problems) the differences in these distributions by class. (3) It demonstrates empirically, on a noisy business domain, that more-complex aggregation methods can increase generalization performance. Constructing features using target-dependent aggregations can transform relational prediction tasks so that well-understood feature-vector-based modeling algorithms can be applied successfully.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
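A minimal sketch of the target-dependent aggregation idea described above: estimate class-conditional distributions of a categorical attribute from training cases, then collapse each case's bag of values into a numeric feature reflecting the difference between those distributions. The data, the add-one smoothing, and the log-likelihood-ratio aggregate are illustrative assumptions, not the paper's exact operators.

```python
from collections import Counter
import math

# Training cases (illustrative): each is a bag of categorical values from
# related entities, plus a class label.
train = [
    (["food", "food", "books"], 1),
    (["toys", "food"],          1),
    (["books", "books"],        0),
    (["toys", "books", "toys"], 0),
]

# Estimate class-conditional value distributions with add-one smoothing.
vocab = {v for bag, _ in train for v in bag}
counts = {0: Counter(), 1: Counter()}
for bag, y in train:
    counts[y].update(bag)
totals = {y: sum(c.values()) for y, c in counts.items()}

def p(value, y):
    return (counts[y][value] + 1) / (totals[y] + len(vocab))

def aggregate(bag):
    """Turn a bag into a single numeric feature: the mean log-likelihood
    ratio of its values under the two class-conditional distributions."""
    return sum(math.log(p(v, 1) / p(v, 0)) for v in bag) / len(bag)

for bag, y in train:
    print(y, round(aggregate(bag), 3))  # positive for class-1-like bags
```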
Predicting citation rates for physics papers: Constructing features for an ordered probit model
Gehrke et al. introduce the citation prediction task in their paper "Overview of the KDD Cup 2003" (in this issue). The objective was to predict the change in the number of citations a paper will receive, not the absolute number of citations. There are obvious factors affecting the number of citations, including the quality and topic of the paper and the reputation of the authors. However, it is not clear which factors might influence the change in citations between quarters, rendering the construction of predictive features a challenging task. A high-quality and timely paper will be cited more often than a lower-quality paper, but that does not determine the change in citation counts. The selection of training data was critical, as the evaluation would cover only papers that received more than five citations in the quarter following the submission of results. After considering several modeling approaches, we used a modified version of an ordered probit model. We describe each of these steps in turn.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
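As a sketch of the modeling step, the following fits an ordered probit to synthetic data using statsmodels' OrderedModel (available in recent statsmodels releases). The two stand-in features and the discretization of the outcome are invented for illustration; they are not the KDD Cup features or the authors' modified model.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)

# Synthetic stand-ins for paper-level features (illustrative only):
# e.g., current citation count and months since publication.
n = 500
X = np.column_stack([rng.poisson(10, n), rng.uniform(0, 36, n)])

# A latent "change in citations", discretized into ordered bins:
# the kind of ordered outcome an ordered probit models.
latent = 0.15 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 1, n)
y = np.digitize(latent, np.quantile(latent, [0.25, 0.5, 0.75]))

model = OrderedModel(y, X, distr="probit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```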
ACORA: Distribution-Based Aggregation for Relational Learning from Identifier Attributes
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method, as part of the relational learning system ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach by deriving a "relational fixed-effect" model within a Bayesian framework, and discuss the implications of identifier aggregation for the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and we also explore the limits of the methods' applicability.
Information Systems Working Papers Series
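A minimal sketch of one plausible distance-based aggregator in the spirit described above: represent each bag of identifiers as a count vector and compute its cosine similarity to class-conditional reference vectors built from training data. The toy data and the specific choice of distance are illustrative assumptions, not ACORA's exact implementation.

```python
import numpy as np

# Toy data (illustrative): each case has a bag of identifier values
# (e.g., product ids) and a class label.
train = [
    ({"p1", "p2"},       1),
    ({"p2", "p3", "p1"}, 1),
    ({"p4", "p5"},       0),
    ({"p5", "p4", "p6"}, 0),
]
ids = sorted({i for bag, _ in train for i in bag})
idx = {v: k for k, v in enumerate(ids)}

def to_vec(bag):
    v = np.zeros(len(ids))
    for i in bag:
        if i in idx:  # unseen identifiers are simply ignored
            v[idx[i]] += 1.0
    return v

# Class-conditional reference vectors: summed identifier counts per class.
ref = {y: sum(to_vec(bag) for bag, lab in train if lab == y) for y in (0, 1)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def aggregate(bag):
    """Aggregate a bag of identifiers into two numeric features:
    cosine similarity to each class-conditional count vector."""
    v = to_vec(bag)
    return cosine(v, ref[0]), cosine(v, ref[1])

print(aggregate({"p1", "p3"}))  # closer to the class-1 reference
print(aggregate({"p4", "p6"}))  # closer to the class-0 reference
```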
Distribution-based aggregation for relational learning with identifier attributes
Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., that they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators do indeed facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
Aggregation-Based Feature Invention and Relational Concept Classes
Due to interest in social and economic networks, relational modeling is attracting increasing attention. The field of relational data mining/learning, which traditionally was dominated by logic-based approaches, has recently been extended by adapting learning methods such as naive Bayes, Bayesian networks, and decision trees to relational tasks. One aspect inherent to all methods of model induction from relational data is the construction of features through the aggregation of sets. The theoretical part of this work (1) presents an ontology of relational concepts of increasing complexity, (2) derives classes of aggregation operators that are needed to learn these concepts, and (3) classifies relational domains based on relational schema characteristics such as cardinality. We then present a new class of aggregation functions, ones that are particularly well suited for relational classification and class probability estimation. The empirical part of this paper demonstrates, on a real domain, the effects of different aggregation methods on system performance across different relational concepts. The results suggest that more-complex aggregation methods can significantly increase generalization performance and that, in particular, task-specific aggregation can reduce relational prediction tasks to well-understood propositional learning problems.
Information Systems Working Papers Series
Evaluating and Optimizing Online Advertising: Forget the click, but there are good proxies
A main goal of online display advertising is to drive purchases (or similar conversions) following ad engagement. However, there often are too few purchase conversions for campaign evaluation and optimization, due to low conversion rates, cold-start periods, and long purchase cycles (e.g., with brand advertising). This paper presents results across dozens of experiments within individual online display advertising campaigns, each comparing different 'proxies' for measuring success. Measuring success is critical both for evaluating and comparing different targeting strategies, and for designing and optimizing the strategies in the first place (for example, via predictive modeling). Proxies are necessary because data on the actual goals of advertising (e.g., purchasing, increased brand affinity) often are scarce, missing, or fundamentally difficult or impossible to observe. The paper presents bad news and good news. The most commonly cited and used proxy for success is a click on an advertisement. The bad news is that, across a large number of campaigns, clicks are good proxies neither for evaluation nor for optimization: buyers do not resemble clickers. The good news is that an alternative sort of proxy performs remarkably well: observed visits to the brand's website. Specifically, predictive models built based on brand-site visits do a remarkably good job of predicting which browsers will purchase. The practical bottom line: evaluating campaigns and optimizing based on clicks seems wrongheaded; however, there is an easy and attractive alternative: use a well-chosen site-visit proxy instead.
m6d research; NYU Stern School of Business
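A minimal sketch of the evaluation logic on synthetic data: train a targeting model against each candidate proxy label, then measure how well its scores rank browsers by the actual goal (purchase), here via AUC. Note that the data-generating assumptions (site visits correlated with purchase, clicks nearly independent of it) encode the paper's finding for illustration rather than derive it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic browsers (illustrative): features plus three outcomes each.
n = 5000
X = rng.normal(size=(n, 10))
score = X @ rng.normal(size=10)

purchase = rng.random(n) < 1 / (1 + np.exp(-(score - 3)))  # rare true goal
visit = rng.random(n) < 1 / (1 + np.exp(-(score - 1)))     # correlated proxy
click = rng.random(n) < 0.01                               # ~unrelated proxy

# Train a targeting model on each proxy, then evaluate how well it ranks
# browsers by the *actual* goal: purchase.
train, test = slice(0, 2500), slice(2500, None)
for name, proxy in [("visit", visit), ("click", click)]:
    m = LogisticRegression(max_iter=1000).fit(X[train], proxy[train])
    auc = roc_auc_score(purchase[test], m.predict_proba(X[test])[:, 1])
    print(f"model trained on {name:5s} -> AUC vs purchase: {auc:.3f}")
```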
