517 research outputs found
A study of hierarchical and flat classification of proteins
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies; the latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not on the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
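The flat nested-dichotomy variant compared in this abstract can be illustrated with a toy implementation. The sketch below is not the authors' code: it builds a single random nested dichotomy by recursively splitting the class set in two and training a binary learner at each internal node, with a trivial 1-nearest-neighbour stand-in for the paper's base learners (logistic regression, decision trees, bagging, SVMs) and synthetic Gaussian data.

```python
# Minimal sketch of one random nested dichotomy (flat multi-class setting).
import random
import numpy as np

class OneNN:
    """Toy binary base learner: 1-nearest-neighbour (memorises training data)."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        return int(self.y[np.argmin(np.linalg.norm(self.X - x, axis=1))])

def train_nd(X, y, classes, rng):
    """Recursively split the class set in two; train a binary learner per node."""
    if len(classes) == 1:
        return classes[0]                         # leaf: a single class
    classes = list(classes)
    rng.shuffle(classes)
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    mask = np.isin(y, classes)
    yb = np.isin(y[mask], left).astype(int)       # 1 = go left
    clf = OneNN().fit(X[mask], yb)
    return (clf, train_nd(X, y, left, rng), train_nd(X, y, right, rng))

def predict_nd(node, x):
    """Descend the dichotomy tree, taking the predicted branch at each node."""
    while isinstance(node, tuple):
        clf, left, right = node
        node = left if clf.predict(x) == 1 else right
    return node

# Four well-separated Gaussian blobs, one per class (synthetic data).
gen = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], dtype=float)
X = np.vstack([c + gen.normal(scale=0.3, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(4), 50)

tree = train_nd(X, y, list(range(4)), random.Random(0))
acc = np.mean([predict_nd(tree, X[i]) == y[i] for i in range(len(y))])
print(f"one nested dichotomy, training accuracy: {acc:.2f}")  # 1.00 (1-NN memorises)
```

An ensemble of nested dichotomies averages class probabilities over many such random trees; the hierarchical variants discussed in the article would instead split along the expert-defined class hierarchy.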
Comparative Analysis of Fatal Work Injuries and Fatal Traffic Accidents in the Czech Republic and Slovakia in 2017 and its Use in Risk Prevention
Occupational health and safety (OSH) is governed by a number of legal regulations that are updated relatively frequently, and it is necessary to monitor their development. The cornerstone of correct and functional OSH management is risk analysis at the particular workplace (activity). The aim of this article is a comparative analysis of the most recent fatal occupational injuries and fatal traffic accidents that occurred in the Czech Republic and Slovakia in 2017. This analysis can help an employer, or a professionally competent person in risk prevention (OZO), to correctly assess the risk of driving a motor vehicle. The risk analysis was carried out using FMEA (Failure Mode and Effects Analysis).
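The FMEA step mentioned in the abstract reduces, at its core, to scoring each failure mode and ranking by Risk Priority Number. A minimal sketch (the failure modes and ratings below are invented examples, not the article's data):

```python
# Each failure mode gets Severity, Occurrence, and Detection ratings
# (1-10); the Risk Priority Number RPN = S * O * D ranks the risks.
# Failure modes and ratings are invented examples, not the article's data.
failure_modes = [
    # (description,                      S,  O, D)
    ("collision while driving on duty",  9,  4, 7),
    ("fall from height",                10,  2, 5),
    ("electric shock",                   8,  2, 6),
]
for name, s, o, d in sorted(failure_modes, key=lambda m: -m[1] * m[2] * m[3]):
    print(f"RPN {s * o * d:4d}  {name}")
```

Risks with the highest RPN are addressed first; with these illustrative ratings, driving on duty dominates the list, matching the article's motivation for assessing driving risk.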
Integration of computation methods into radiation sterilization process
The authors have developed a computational technology for the simulation of practical tasks in the radiation sterilization process in accordance with Method 1 of the ISO 11137 standard and the IAEA Code of Practice. The technology is built on the RT-Office modules, which use the Monte Carlo method to model the absorbed dose in heterogeneous targets irradiated with electron beams (EB), X-rays, and gamma rays, together with the specialized software ModeSAL. ModeSAL was used for simulation and comparative analysis of the sterility assurance level (SAL) and of the sterilizing dose required for a given bioburden to achieve the target SAL in an irradiated product. The mathematical approach rests on a detailed and precise treatment of self-consistent physical and geometrical models for calculating EB, X-ray, and gamma-ray dose maps in an irradiated product, the SAL, the sterilizing dose, and the spatial and temporal uncertainties of the dose delivered to the product. The interrelations between the parameters of an EB radiation facility and the resulting SAL and sterilizing dose are also analyzed.
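As a rough illustration of the dose-SAL relationship such software evaluates: under the standard single-D10 exponential inactivation model (a simplification of the ISO 11137 Method 1 dose-setting tables), the surviving fraction after dose D is 10^(−D/D10), so SAL = N0 · 10^(−D/D10) for initial bioburden N0. The sketch below inverts this for the sterilizing dose; all numbers are illustrative, not from the article.

```python
# Hedged sketch of the D10 inactivation model (illustrative numbers only).
import math

def sal(dose_kgy, bioburden, d10_kgy):
    """Expected surviving organisms per unit after dose_kgy (D10 model)."""
    return bioburden * 10.0 ** (-dose_kgy / d10_kgy)

def sterilizing_dose(bioburden, target_sal, d10_kgy):
    """Dose bringing expected survivors down to target_sal."""
    return d10_kgy * math.log10(bioburden / target_sal)

# e.g. initial bioburden 1000 CFU, target SAL 1e-6, D10 = 2.5 kGy:
d = sterilizing_dose(bioburden=1000, target_sal=1e-6, d10_kgy=2.5)
print(f"required dose: {d:.1f} kGy")   # 2.5 * log10(1e9) = 22.5 kGy
```

Real dose setting additionally accounts for mixed microbial populations and dose-map non-uniformity, which is what the RT-Office Monte Carlo modules contribute.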
Experimental benchmarking of the software ModeStEB for simulation of electron beam processing
The successful introduction of radiation technologies into practice depends substantially on the development of computational dosimetry, which relies on accurate, validated programs capable of efficiently calculating the absorbed dose in irradiation processes. The absorbed dose distributions in thin polyvinylchloride dosimetric films placed in a stack of plates of reference materials irradiated with a scanned 10 MeV electron beam were simulated. The modeling of the electron beam dose distributions in the multi-layer packages was performed by the Monte Carlo method in a three-dimensional geometrical model using the software ModeStEB. The results of a benchmarking experiment for ModeStEB, which is used for the simulation of industrial electron beam processing, are discussed.
Dose field formation in thin films irradiated by electron beams: comparison of the Monte Carlo simulation with dosimetry
The absorbed dose distribution in thin films of different thicknesses inserted between the flat boundaries of two homogeneous materials, as well as the absorbed dose distribution near the boundaries of these materials, irradiated with a scanned electron beam, was simulated. The films inserted in the multi-layer targets were oriented parallel to the incident electron beam (EB). The simulation was performed by the Monte Carlo (MC) method using the software ModePEB, which was designed to predict dose distributions in heterogeneous targets irradiated by EB with electron energies from 0.1 to 25 MeV.
A comparison of the MC simulation results with dosimetry for the formation of the electron dose distribution in multi-layer targets with thin films is discussed.
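The Monte Carlo depth-dose calculations described in these abstracts can be caricatured in a few lines. The toy model below (not ModePEB or ModeStEB, and with arbitrary parameters) steps electrons through a slab with a constant stopping power and Gaussian small-angle scattering per step; the lengthening of the path per unit depth reproduces the characteristic sub-surface build-up of the electron depth-dose curve.

```python
# Toy 1-D Monte Carlo depth-dose model (illustrative physics only).
import math
import random

def depth_dose(n_electrons=2000, csda_range=1.0, step=0.02,
               sigma=0.15, n_bins=50, seed=0):
    """Histogram of deposited dose vs. relative depth in a slab."""
    rng = random.Random(seed)
    dose = [0.0] * n_bins
    for _ in range(n_electrons):
        z, theta, path = 0.0, 0.0, 0.0   # depth, direction angle, path length
        while path < csda_range and 0.0 <= z < 1.0:
            dose[int(z * n_bins)] += step       # constant stopping power
            z += step * math.cos(theta)         # advance along current direction
            theta += rng.gauss(0.0, sigma)      # small-angle scattering
            path += step
    return dose

d = depth_dose()
peak_bin = max(range(len(d)), key=lambda i: d[i])
print(f"dose peak at relative depth {peak_bin / 50:.2f}")
```

Production codes like those benchmarked above track full 3-D geometry, energy-dependent stopping powers, and secondary particles; the toy model only shows why the dose maximum sits below the entrance surface.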
Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features
The one-class support vector machine (OC-SVM) has long been one of the most effective anomaly detection methods and is widely adopted in both research and industrial applications. Its biggest limitation, however, is the difficulty of operating on large, high-dimensional datasets due to optimization complexity. These problems can be mitigated by dimensionality reduction techniques such as manifold learning or autoencoders, but previous work has typically treated representation learning and anomaly prediction separately. In this paper, we propose the autoencoder-based one-class support vector machine (AE-1SVM), which brings the OC-SVM into a deep learning context: random Fourier features approximate the radial basis function kernel, the OC-SVM is combined with a representation learning architecture, and stochastic gradient descent is exploited jointly to obtain end-to-end training. Interestingly, this also opens up the use of gradient-based attribution methods to explain the decision making in anomaly detection, which has been challenging because of the implicit mapping between the input space and the kernel space. To the best of our knowledge, this is the first work to study the interpretability of deep learning in anomaly detection. We evaluate our method on a wide range of unsupervised anomaly detection tasks, in which our end-to-end training architecture performs significantly better than previous work using separate training.
Comment: Accepted at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 201
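The random Fourier feature approximation at the heart of AE-1SVM is easy to demonstrate in isolation. For the RBF kernel k(x, y) = exp(−γ‖x − y‖²), drawing rows of W from N(0, 2γ) and b uniformly from [0, 2π) gives features z(x) = √(2/D) · cos(Wx + b) whose inner product approximates the kernel. A minimal check (dimensions and γ are illustrative):

```python
# Random Fourier feature approximation of the RBF kernel (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 4096, 0.5            # input dim, feature dim, kernel width

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    """Random Fourier feature map: z(x) @ z(y) approximates the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = float(z(x) @ z(y))
print("exact RBF kernel :", exact)
print("RFF approximation:", approx)
```

Because z is an explicit finite-dimensional map, the OC-SVM objective becomes a differentiable function of both the features and an upstream encoder, which is what enables the paper's joint SGD training and gradient-based attribution.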
Electron beam irradiator for post-harvest processing of chestnut fruits: technical parameters and feasibility
According to a recent worldwide estimate, food irradiation processing amounts to about 400,000 t, of which almost half (186,000 t) is treated to eliminate insects. In the Mediterranean countries of the EU, chestnut fruit production represents a market of more than 100,000 t; Portugal is the third-largest producer with about 20,000 t, exporting 25% of its production for an income of about 15 million euros. In March 2010, a European Commission decision prohibited the use of methyl bromide (MeBr), a broad-spectrum fumigant used for various agricultural purposes, including post-harvest disinfestation of chestnut fruits. The banning of MeBr could be an opportunity to introduce ionizing radiation treatment, a technology well tested for post-harvest preservation of other food commodities.
Electron beam irradiators require more sophisticated hardware than gamma irradiators, but several factors are making them increasingly popular and, whenever the product can be treated by low-penetration radiation, the first choice.
Since the current focus of food irradiation is on the versatility and advantages of e-beams, this paper presents a detailed analysis and discussion of the technical characteristics and feasibility of post-harvest irradiation of chestnut fruits, taking into account the physical dimensions and seasonality of the fruit, beam energy, throughput, and total operating costs, in order to estimate the impact on the final price of the irradiated product.
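The throughput figures in such feasibility studies follow from a simple energy balance: 1 kGy is 1 kJ absorbed per kg, so mass throughput is beam power times dose-utilisation efficiency divided by dose. A sketch with illustrative numbers (not the paper's parameters):

```python
# Back-of-the-envelope e-beam throughput estimate (illustrative numbers).
def throughput_t_per_h(beam_power_kw, efficiency, dose_kgy):
    """Tonnes of product per hour: 1 kGy = 1 kJ absorbed per kg."""
    kg_per_s = beam_power_kw * efficiency / dose_kgy   # (kJ/s) / (kJ/kg)
    return kg_per_s * 3600.0 / 1000.0

# e.g. a 10 kW e-beam at an assumed 50% dose-utilisation efficiency
# delivering a hypothetical 0.25 kGy disinfestation dose:
print(f"{throughput_t_per_h(10, 0.5, 0.25):.0f} t/h")  # 72 t/h
```

Seasonality matters because the installed beam power must be sized for the harvest peak rather than the annual average, which drives the cost analysis the paper discusses.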
Correlation Clustering
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The core step of the KDD process is the application of a Data Mining algorithm in order to produce a particular enumeration of patterns and relationships in large databases. Clustering is one of the major data mining techniques and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. This can serve to group customers with similar interests, or to group genes with related functionalities.
A particular current challenge for clustering techniques is posed by high-dimensional feature spaces. Owing to modern data collection facilities, real data sets usually contain many features. These features are often noisy or exhibit correlations among each other. However, since these effects are relevant to different degrees in different parts of the data set, irrelevant features cannot be discarded in advance. The selection of relevant features must therefore be integrated into the data mining technique.
For about ten years, specialized clustering approaches have been developed that cope with the problems of high-dimensional data better than classic clustering approaches. Often, however, problems of very different natures are not distinguished from one another. A main objective of this thesis is therefore a systematic classification of the diverse approaches developed in recent years according to their task definition, their basic strategy, and their algorithmic approach. We discern as main categories the search for clusters (i) with respect to closeness of objects in axis-parallel subspaces, (ii) with respect to common behavior (patterns) of objects in axis-parallel subspaces, and (iii) with respect to closeness of objects in arbitrarily oriented subspaces (so-called correlation clusters).
For the third category, the remaining parts of the thesis describe novel approaches. A first approach is the adaptation of density-based clustering to the problem of correlation clustering; the starting point here is the first density-based approach in this field, the algorithm 4C. Subsequently, enhancements and variations of this approach are discussed that allow for more robust, more efficient, or more effective behavior, or that even find hierarchies of correlation clusters and the corresponding subspaces. The density-based approach to correlation clustering is, however, fundamentally unable to solve some issues, since it requires an analysis of local neighborhoods, which is problematic in high-dimensional data. Therefore, a novel method is proposed that tackles the correlation clustering problem with a global approach. Finally, a method is proposed to derive models for correlation clusters, to allow for an interpretation of the clusters and to facilitate more thorough analysis in the corresponding domain science. Possible applications of these models are proposed and discussed.
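The notion of a correlation cluster in the third category can be illustrated directly: points concentrated near an arbitrarily oriented lower-dimensional subspace reveal themselves through a gap in the PCA eigenvalue spectrum, the observation underlying 4C-style approaches. A synthetic sketch (not the thesis' algorithms; data and threshold are illustrative):

```python
# PCA eigenvalue gap as a detector of correlation clusters (synthetic data).
import numpy as np

rng = np.random.default_rng(1)

# 200 points near a 1-D line in 3-D space (a "correlation cluster") ...
t = rng.uniform(-5.0, 5.0, size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]]) / np.sqrt(6.0)   # unit direction vector
line_pts = t @ direction + rng.normal(scale=0.05, size=(200, 3))
# ... versus 200 points of full-dimensional uniform noise.
noise_pts = rng.uniform(-5.0, 5.0, size=(200, 3))

def correlation_dim(points, threshold=0.01):
    """Number of strong PCA directions (eigenvalue share above threshold)."""
    evals = np.linalg.eigvalsh(np.cov(points.T))
    return int(np.sum(evals / evals.sum() > threshold))

print("line cluster dimensionality :", correlation_dim(line_pts))   # 1
print("uniform noise dimensionality:", correlation_dim(noise_pts))  # 3
```

Density-based methods such as 4C apply this eigenvalue analysis to local neighborhoods, which is exactly the step that becomes unreliable in high dimensions and motivates the thesis' global approach.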
