How I won the "Chess Ratings - Elo vs the Rest of the World" Competition
This article discusses in detail the rating system that won the Kaggle
competition "Chess Ratings: Elo vs the rest of the world". The competition
provided a historical dataset of outcomes for chess games, and aimed to
discover whether novel approaches can predict the outcomes of future games,
more accurately than the well-known Elo rating system. The winning rating
system, called Elo++ in the rest of the article, builds upon the Elo rating
system. Like Elo, Elo++ uses a single rating per player and predicts the
outcome of a game by applying a logistic curve to the difference in the
players' ratings. The major component of Elo++ is a regularization technique that
avoids overfitting these ratings. The dataset of chess games and outcomes is
relatively small and one has to be careful not to draw "too many conclusions"
out of the limited data. Many approaches tested in the competition showed signs
of such overfitting. The leaderboard was dominated by attempts that did a
very good job on a small test dataset, but couldn't generalize well to the
private hold-out dataset. The Elo++ regularization takes into account the
number of games per player, the recency of these games and the ratings of the
opponents. Finally, Elo++ employs a stochastic gradient descent scheme for
training the ratings, and uses only two global parameters (white's advantage
and regularization constant) that are optimized using cross-validation.
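The ingredients named in the abstract above (one rating per player, a logistic curve over the rating difference, a regularization term, and stochastic gradient descent) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's actual Elo++ formulation: the function names, the squared-error loss, the learning rate, and the regularizer (shrinking each rating toward zero, scaled by the player's game count) are all hypothetical simplifications. The article's regularizer also accounts for game recency and opponent ratings, which are omitted here.

```python
import math
import random


def predict(r_white, r_black, white_adv):
    """Expected score for White: logistic curve over the rating difference."""
    return 1.0 / (1.0 + math.exp(-(r_white + white_adv - r_black)))


def train(games, n_players, white_adv=0.1, lam=0.05, epochs=50, lr=0.1, seed=0):
    """Fit one rating per player with stochastic gradient descent.

    games: list of (white_id, black_id, score), score in {0.0, 0.5, 1.0}.
    white_adv and lam are the two global parameters; the values here are
    placeholders, not tuned by cross-validation.
    """
    rng = random.Random(seed)
    ratings = [0.0] * n_players
    n_games = [0] * n_players
    for w, b, _ in games:
        n_games[w] += 1
        n_games[b] += 1
    games = list(games)
    for _ in range(epochs):
        rng.shuffle(games)
        for w, b, score in games:
            p = predict(ratings[w], ratings[b], white_adv)
            # Proportional to the gradient of (p - score)^2 w.r.t. White's rating.
            g = (p - score) * p * (1.0 - p)
            # Assumed regularizer: pull ratings toward 0, spread over the
            # player's games (a stand-in for the article's scheme).
            ratings[w] -= lr * (g + lam * ratings[w] / n_games[w])
            ratings[b] -= lr * (-g + lam * ratings[b] / n_games[b])
    return ratings
```

For example, calling `train([(0, 1, 1.0), (1, 0, 0.0)] * 10, 2)` on a toy dataset in which player 0 always wins should leave player 0 with the higher rating.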
Energy Savings in EAF Steelmaking by Process Simulation and Data-Science Modeling on the Reproduced Results
The Electric-Arc-Furnace (EAF)-based process route in modern steelmaking for the production of plates and special quality bars requires a series of stations for the secondary metallurgy treatment (Ladle-Furnace, and potentially Vacuum-Degasser), until the final casting for the production of slabs and blooms in the corresponding continuous casting machines. However, since every steel grade has its own melting characteristics, the melting (liquidus) temperature per grade is generally different and plays an important role in the final casting temperature, which has to exceed the melting temperature by an amount called superheat. The superheat is adjusted at the ladle-furnace (LF) station by the operator, who decides mostly based on personal experience. However, since the ladle has to pass through downstream processes, the liquid steel loses temperature not only due to the duration of the processes until casting but also due to the ladle refractory history. Simulation software was developed in order to reproduce the phenomena that occur in a meltshop and influence downstream superheats. Data science models were deployed in order to check the potential of controlling casting temperatures by adjusting liquid-steel exit temperatures at LF.
Evaluation of Solidification Times for Medium and High Carbon Steels Based upon Heat Transfer and Solidification Phenomena in the Continuous Casting of Blooms
A Numerical Solution Model for the Heat Transfer in Octagonal Billets
In the quest for high-quality steel products, the need for cast billets with minimum surface and internal defects is of paramount importance. On the other hand, productivity is required to be as high as possible in order to reduce production cost. Different billet shapes have been applied with emphasis upon square, rectangular, and circular cross-sections. It is obvious that the best billet shape that minimizes surface and subsurface defects is the circular one. Nevertheless, this shape creates some problems with respect to handling and safety. One recent attempt is to produce normal octagonal-shaped billets that appear to approach the circular shape while being easier to handle. In this study, a numerical solution for the heat transfer during solidification in the continuous casting of octagonal billets has been carried out. The developed model deploys an implicit scheme in order to solve the differential equations of heat transfer under the appropriate boundary conditions in a section of an octagonal billet, assuming fully axisymmetric cooling of the bloom. The geometry of the octagonal billet plays an interesting role in the development of the heat transfer analysis. Based upon fundamental principles, a computer program has been developed for this purpose. Consequently, results from the numerical solution are presented and discussed.
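To illustrate what an implicit scheme for heat transfer looks like in the simplest setting, the sketch below advances a one-dimensional temperature profile by one backward-Euler step and solves the resulting tridiagonal system with the Thomas algorithm. It is only an assumed, simplified analogue: the function name, the 1D slab geometry, the constant thermal diffusivity `alpha`, and the fixed-temperature surfaces are illustrative choices, and none of the octagonal geometry, solidification, or boundary conditions of the study is modeled.

```python
def implicit_heat_step(T, alpha, dx, dt):
    """Advance a 1D temperature profile by one backward-Euler step.

    T[0] and T[-1] are held fixed (assumed Dirichlet surfaces). Interior
    nodes satisfy -r*T[i-1] + (1 + 2r)*T[i] - r*T[i+1] = T_old[i],
    with r = alpha*dt/dx**2. Requires len(T) >= 3.
    """
    n = len(T)
    r = alpha * dt / dx ** 2
    a = [-r] * n             # sub-diagonal
    b = [1.0 + 2.0 * r] * n  # main diagonal
    c = [-r] * n             # super-diagonal
    d = list(T)              # right-hand side (old temperatures)
    # Move the known boundary temperatures to the right-hand side.
    d[1] += r * T[0]
    d[n - 2] += r * T[-1]
    # Thomas algorithm: forward elimination over interior nodes 1..n-2.
    for i in range(2, n - 1):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    # Back substitution.
    new = list(T)
    new[n - 2] = d[n - 2] / b[n - 2]
    for i in range(n - 3, 0, -1):
        new[i] = (d[i] - c[i] * new[i + 1]) / b[i]
    return new
```

Because the step is implicit, it remains stable for any positive time step, which is the usual reason for preferring such schemes over explicit ones in solidification calculations.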
Overlapping Community Detection in Graphs with Attention Networks
National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Program (Δ.Π.Μ.Σ.) "Data Science and Machine Learning"
Dwarf: A Complete System for Analyzing High-Dimensional Data Sets
The need for data analysis by different industries, including
telecommunications, retail, manufacturing and financial services, has
generated a flurry of research, highly sophisticated methods and
commercial products. However, all of the current attempts are haunted
by the so-called "high-dimensionality curse"; the complexity of space
and time increases exponentially with the number of analysis
"dimensions". This means that all existing approaches are limited
only to coarse levels of analysis and/or to approximate answers with
reduced precision. As the need for detailed analysis keeps
increasing, along with the volume and the detail of the data that is
stored, these approaches are very quickly rendered unusable. I have
developed a unique method for efficiently performing analysis that is
not affected by the high-dimensionality of data and scales only
polynomially -and almost linearly- with the dimensions without
sacrificing any accuracy in the returned results. I have implemented a
complete system (called "Dwarf") and performed an extensive
experimental evaluation that demonstrated tremendous improvements over
existing methods for all aspects of performing analysis -initial
computation, storing, querying and updating it.
I have extended my research to the "data-streaming" model, where
updates are performed on-line, which complicates any concurrent analysis
but has a very high impact on applications like security, network
management/monitoring, router traffic control and sensor networks. I
have devised streaming algorithms that provide complex statistics
within user-specified relative-error bounds over a data stream. I
introduced the class of "distinct implicated statistics", which is
much more general than the established class of "distinct count"
statistics. The latter has proved invaluable in applications such
as analyzing and monitoring the distinct count of species in a
population or even in query optimization. The "distinct implicated
statistics" class provides invaluable information about the
correlations in the stream and is necessary for applications such as
security. My algorithms are designed to use bounded amounts of memory
and processing -so that they can even be implemented in hardware for
resource-limited environments such as network-routers or sensors- and
also to work in "noisy" environments, where some data may be flawed
either implicitly due to the extraction process or explicitly
The Dwarf Data Cube Eliminates the High Dimensionality Curse
The data cube operator encapsulates all possible groupings of a
data set and has proved to be an invaluable tool in analyzing vast amounts
of data. However, its apparent exponential complexity has significantly
limited its applicability to low dimensional datasets. Recently the idea
of the dwarf data cube model was introduced, and it was shown that
high-dimensional "dwarf data cubes" are orders of magnitude smaller in
size than the original data cubes even when they calculate and store every
possible aggregation with 100% precision.
In this paper we present a surprising analytical result proving
that the size of dwarf cubes grows polynomially with the
dimensionality of the data set and, therefore, a full data cube at 100%
precision is not inherently cursed by high dimensionality. This striking
result of polynomial complexity reformulates the context of cube
management and redefines most of the problems associated with
data-warehousing and On-Line Analytical Processing. We also develop an
efficient algorithm for estimating the size of dwarf data cubes before
actually computing them. Finally, we complement our analytical approach
with an experimental evaluation using real and synthetic data sets, and
demonstrate our results.
UMIACS-TR-2003-12
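The "apparent exponential complexity" of the data cube operator mentioned in the abstract above comes from the fact that d dimensions admit 2^d distinct groupings. The naive sketch below makes that explicit by materializing every grouping separately. It is an illustration only: the function name, the row format, and the choice of a sum aggregate are assumptions, and this is the brute-force baseline rather than the Dwarf structure, which the abstract describes as far more compact.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Naively compute SUM(measure) for every subset of the dimensions.

    rows: list of dicts; dims: list of dimension names; measure: key of the
    numeric field. Returns {subset_of_dims: {group_key: total}} with
    2**len(dims) entries, one per possible group-by.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in subset)
                totals[key] += row[measure]
            cube[subset] = dict(totals)
    return cube
```

With two dimensions such as `["store", "product"]`, the result has four groupings: the grand total under the empty tuple, one per single dimension, and the full two-dimensional breakdown.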
Cholesteatoma of the external ear canal: etiological factors, symptoms and clinical findings in a series of 48 cases
BACKGROUND: To evaluate symptoms, clinical findings, and etiological factors in external ear canal cholesteatoma (EECC). METHOD: Retrospective evaluation of clinical records of all consecutive patients with EECC in the period 1979 to 2005 in a tertiary referral centre. Main outcome measures were incidence rates, classification according to causes, symptoms, extensions in the ear canal including adjacent structures, and possible etiological factors. RESULTS: Forty-five patients were identified with 48 EECC. Overall incidence rate was 0.30 cases per year per 100,000 inhabitants. Twenty-five cases were primary, while 23 cases were secondary: postoperative (n = 9), postinflammatory (n = 5), postirradiatory (n = 7), and posttraumatic (n = 2). Primary EECC showed a right/left ratio of 12/13 and presented with otalgia (n = 15), itching (n = 5), occlusion (n = 4), hearing loss (n = 3), fullness (n = 2), and otorrhea (n = 1). Similar symptoms were found in secondary EECC, but less pronounced. In total the temporomandibular joint was exposed in 11 cases, while the mastoid and middle ear were invaded in six and three cases, respectively. In one primary case the facial nerve was exposed and in a posttraumatic case the atticus and antrum were invaded. In primary EECC 48% of cases reported mechanical trauma. CONCLUSION: EECC is a rare condition with inconsistent and silent symptoms, whereas the extent of destruction may be pronounced. Otalgia was the predominant symptom and often related to extension into nearby structures. Whereas the aetiology of secondary EECC can be explained, the origin of primary EECC remains uncertain; smoking and minor trauma of the ear canal may predispose.
