Boolean algebras and Lubell functions
Let $2^{[n]}$ denote the power set of $[n]:=\{1,2,\ldots,n\}$. A collection
$\mathcal{B}\subset 2^{[n]}$ forms a $d$-dimensional {\em Boolean algebra} if there
exist pairwise disjoint sets $X_0, X_1, \ldots, X_d\subseteq [n]$, all non-empty
with perhaps the exception of $X_0$, so that $\mathcal{B}=\{X_0\cup \bigcup_{i\in I}
X_i\colon I\subseteq [d]\}$. Let $b(n,d)$ be the maximum cardinality of a family
$\mathcal{F}\subset 2^{[n]}$ that does not contain a $d$-dimensional Boolean algebra.
Gunderson, R\"odl, and Sidorenko proved that $b(n,d)\leq c_d n^{-1/2^d}\cdot 2^n$,
where $c_d$ depends only on $d$.
In this paper, we use the Lubell function as a new measurement for large
families instead of cardinality. The Lubell value of a family of sets $\mathcal{F}$ with
$\mathcal{F}\subseteq 2^{[n]}$ is defined by $h_n(\mathcal{F}):=\sum_{F\in \mathcal{F}}1/{{n\choose |F|}}$.
We prove the following Tur\'an type theorem. If $\mathcal{F}\subseteq 2^{[n]}$ contains
no $d$-dimensional Boolean algebra, then $h_n(\mathcal{F})\leq 2(n+1)^{1-2^{1-d}}$ for
sufficiently large $n$. This result implies $b(n,d)\leq C n^{-1/2^d}\cdot 2^n$, where $C$ is an absolute constant independent of $n$ and $d$. As a
consequence, we improve several Ramsey-type bounds on Boolean algebras. We also
prove a canonical Ramsey theorem for Boolean algebras. Comment: 10 pages
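As an illustration of the Lubell value defined in this abstract, the following is a minimal sketch that evaluates h_n(F) exactly for small families. The function names `lubell_value` and `power_set` are hypothetical helpers introduced here, not taken from the paper, and the brute-force enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def lubell_value(family, n):
    """Return h_n(F) = sum over F in the family of 1 / C(n, |F|).

    `family` is an iterable of subsets of {1, ..., n}; duplicates are
    ignored because a family is a set of sets. Exact rational arithmetic
    avoids floating-point rounding.
    """
    distinct = {frozenset(s) for s in family}
    return sum((Fraction(1, comb(n, len(s))) for s in distinct), Fraction(0))


def power_set(n):
    """Enumerate all subsets of {1, ..., n} (illustrative helper only)."""
    ground = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(ground, k)]


# Each of the n + 1 levels of the power set contributes exactly 1 in total.
full = lubell_value(power_set(4), 4)

# A single complete level (here: all 2-element subsets of [4]) has value 1.
middle = lubell_value([s for s in power_set(4) if len(s) == 2], 4)
```

The Lubell value weights each set by the inverse of the size of its level, so a family is "large" in this sense when it occupies a large fraction of many levels, rather than when it simply has many members.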
Data-based stochastic model reduction for the Kuramoto--Sivashinsky equation
The problem of constructing data-based, predictive, reduced models for the
Kuramoto-Sivashinsky equation is considered, under circumstances where one has
observation data only for a small subset of the dynamical variables. Accurate
prediction is achieved by developing a discrete-time stochastic reduced system,
based on a NARMAX (Nonlinear Autoregressive Moving Average with eXogenous
input) representation. The practical issue, with the NARMAX representation as
with any other, is to identify an efficient structure, i.e., one with a small
number of terms and coefficients. This is accomplished here by estimating
coefficients for an approximate inertial form. The broader significance of the
results is discussed. Comment: 23 pages, 7 figures
Characteristics and Fertility Status of Soils and Minesoils in Selected Areas of Usibelli Coal Mine, Healy, Alaska
Alaska has been proven to contain not only bountiful oil and gas reserves, but also vast coal fields occurring from the southcentral coastline to the Interior and the Arctic zone to the north. Because of concerns for stable sources of energy, particularly by the energy-short, industrial nations of the Orient, more exploration and stripmining for coal can be expected in the near future. Therefore, it is important to know the consequences of large-area soil disturbances in the subarctic and how the effects of man's reclamation efforts and natural processes combine in reestablishing the vegetative community. The culmination or synthesis of these processes is soil development, which is of great importance in successful stripmine reclamation.
The Usibelli Coal Mine Company in the Healy coal field, located in Interior Alaska, commenced stripmining in 1943. Its operation has been continuous, moving from area to area, for the last 40 years. Stripmining requires the excavation of overburden and subsequent redeposition; therefore, the Healy operation has exposed minespoils from different strata on various topography. In 1972, the Usibelli Coal Mine Company initiated a reclamation program and, over the ensuing 10 years, has seeded and fertilized over 2000 acres. Nevertheless, there remain barren areas and areas undergoing natural revegetation. Additionally, experimental trials in seeding and fertilization were started in 1980. Large areas of intact native plant communities adjoin the mined areas. The company property provides opportunities to study the processes of soil formation under different sets of conditions.
The objectives of this study were (1) to characterize the soils on the mine lease area for baseline data, (2) to characterize the minesoils with various histories, (3) to study the process of soil formation under different sets of conditions, and (4) to evaluate the nutrient levels of both soils and minesoils to form a basis for establishing soil-handling requirements to promote reclamation practices.
This study was supported by funds from the U.S. Department of Energy (AM06-76RL02229) and the U.S. Department of Agriculture Hatch project. Our appreciation goes to Drs. W.M. Mitchell, G.A. Mitchell, and F. Wooding of the Agricultural and Forestry Experiment Station, and Mr. J.P. Moore of the USDA Soil Conservation Service, for reviewing the manuscript and offering many useful suggestions.
Our appreciation also goes to Dr. Milton A. Wiltse of the Division of Geological and Geophysical Surveys, Department of Natural Resources, for access to the X-ray diffractometer and technical advice. Special thanks to the Usibelli Coal Mine Inc. for logistic and technical assistance in carrying out this study.
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. This paper
outlines some important considerations for strategically designing relevant and
informative data sets to maximize the learning outcome from hosting a
competition based on our experience. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore, they also provide more detailed analysis
of individual sub-questions including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment. Comment: 36 pages
Pantheon 1.0, a manually verified dataset of globally famous biographies
We present the Pantheon 1.0 dataset: a manually verified dataset of
individuals that have transcended linguistic, temporal, and geographic
boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in
more than 25 languages in Wikipedia and is enriched with: (i) manually verified
demographic information (place and date of birth, gender), (ii) a taxonomy of
occupations classifying each biography at three levels of aggregation, and (iii)
two measures of global popularity: the number of languages in which a
biography is present in Wikipedia (L), and the Historical Popularity Index
(HPI), a metric that combines information on L, time since birth, and page-views
(2008-2013). We compare the Pantheon 1.0 dataset to data from the 2003 book,
Human Accomplishment, and also to external measures of accomplishment in
individual games and sports: Tennis, Swimming, Car Racing, and Chess. In all of
these cases we find that measures of popularity (L and HPI) correlate highly
with individual accomplishment, suggesting that measures of global popularity
proxy the historical impact of individuals. Comment: Scientific Data 2:15007
