101 research outputs found
Data Strategies for Fleetwide Predictive Maintenance
For predictive maintenance, we examine one of the largest public datasets of
machine failures together with their corresponding precursors: error rates,
historical part replacements, and sensor inputs. To simplify the time
and accuracy comparison between 27 different algorithms, we treat the imbalance
between normal and failing states with nominal under-sampling. We identify 3
promising regression and discriminant algorithms with both higher accuracy
(96%) and twenty-fold faster execution times than previous work. Because
predictive maintenance success hinges on input features prior to prediction, we
provide a methodology to rank-order feature importance and show that for this
dataset, error counts prove more predictive than scheduled maintenance might
imply solely based on more traditional factors such as machine age or last
replacement times. Comment: 3 pages, 3 figures
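The abstract above does not spell out its under-sampling scheme, so the following is only a minimal sketch of one common variant: randomly dropping majority-class rows until every class matches the size of the smallest class. The function name `undersample`, the `label_of` callback, and the fixed seed are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly keep n_min rows per class, where n_min is the smallest class size.

    Hypothetical helper for illustration; the source only says the class
    imbalance is treated with under-sampling.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    n_min = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        # sample() draws without replacement, so no row is duplicated
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # 95 normal records and 5 failure records (made-up toy data)
    rows = [("normal", i) for i in range(95)] + [("failure", i) for i in range(5)]
    balanced = undersample(rows, label_of=lambda r: r[0])
    print(len(balanced))
```

Under-sampling discards data, which is part of why it shortens training time; whether that trade is acceptable depends on how many failure examples remain.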
The rotating spectrometer: New biotechnology for cell separations
An instrument for biochemical studies, called the rotating spectrometer, separates previously inseparable cell cultures. The rotating spectrometer is intended for use in pharmacological studies which require fractional splitting of heterogeneous cell cultures based on cell morphology and swimming behavior. As a method to separate and concentrate cells in free solution, the rotating method requires active organism participation and can effectively split the large class of organisms known to form spontaneous patterns. Examples include the biochemical star, an organism called Tetrahymena pyriformis. Following focusing in a rotated frame, the separation is accomplished using different radial dependencies of concentrated algal and protozoan species. The focusing itself appears as concentric rings and arises from the coupling between swimming direction and Coriolis forces. A dense cut is taken at varying radii and extraction is replenished at an inlet. Unlike standard separation and concentrating techniques such as filtration or centrifugation, the instrument is able to separate motile from immotile fractions. For a single pass, typical split efficiencies can reach 200 to 300 percent compared to the inlet concentration.
A biosensor for cadmium based on bioconvective patterns
An 'in vitro' method for monitoring cadmium, one of the most lethal bivalent heavy metals, can detect biologically active levels. The effects of cadmium tend to concentrate in protozoa far above natural levels and therein begin transferring through freshwater food chains to animals and humans. In a small sample volume (approximately 5 ml) the method uses the toxic response of the protozoan Tetrahymena pyriformis to cadmium. The assay relies on macroscopic bioconvective patterns to measure the toxic response, giving a sensitivity better than 1 µg/l and a toxicity threshold of 7 µg/l for Cd(2+). Cadmium hinders pattern formation in a dose-dependent manner. Arrested organism growth arises from slowed division and mutation to non-dividing classes. Unlike previous efforts, this method can be performed in a shallow flow device and does not require electronic or chemical analyses to monitor toxicity.
Virus-MNIST: A Benchmark Malware Dataset
The short note presents an image classification dataset consisting of 10
executable code varieties and approximately 50,000 virus examples. The
malicious classes include 9 families of computer viruses and one benign set.
The image formatting for the first 1024 bytes of the Portable Executable (PE)
mirrors the familiar MNIST handwriting dataset, such that most of the
previously explored algorithmic methods can transfer with minor modifications.
The designation of 9 virus families for malware derives from unsupervised
learning of class labels; we discover the families with KMeans clustering that
excludes the non-malicious examples. As a benchmark using deep learning methods
(MobileNetV2), we find an overall 80% accuracy for virus identification by
families when benign examples are included. We also find that once a positive malware
detection occurs (by signature or heuristics), the projection of the first 1024
bytes into a thumbnail image can classify with 87% accuracy the type of virus.
The work generalizes what other malware investigators have demonstrated as
promising convolutional neural networks originally developed to solve image
problems but applied to a new abstract domain in pixel bytes from executable
files. The dataset is available on Kaggle and GitHub.
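As a rough illustration of projecting the first 1024 bytes of an executable into a thumbnail, the sketch below maps each byte to one grayscale pixel. The 32 x 32 layout is an assumption chosen because 32 * 32 = 1024; the abstract does not state the image dimensions, and `bytes_to_thumbnail` is a hypothetical name.

```python
def bytes_to_thumbnail(data: bytes, side: int = 32) -> list[list[int]]:
    """Map the first side*side bytes to a side x side grid of 0..255 pixels.

    Files shorter than side*side bytes are zero-padded (an assumption;
    the source does not say how short files are handled).
    """
    n = side * side
    header = data[:n].ljust(n, b"\x00")
    # Iterating over bytes yields ints in 0..255, i.e. grayscale intensities
    return [list(header[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    # PE files begin with the two-byte "MZ" signature
    image = bytes_to_thumbnail(b"MZ" + bytes(100))
    print(len(image), len(image[0]), image[0][:2])
```

The resulting grid can be fed to any image classifier that accepts single-channel input, which is what lets MNIST-style pipelines transfer with minor changes.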
Can Large Language Models Find And Fix Vulnerable Software?
In this study, we evaluated the capability of Large Language Models (LLMs),
particularly OpenAI's GPT-4, in detecting software vulnerabilities, comparing
their performance against traditional static code analyzers like Snyk and
Fortify. Our analysis covered numerous repositories, including those from NASA
and the Department of Defense. GPT-4 identified approximately four times as
many vulnerabilities as its counterparts. Furthermore, it provided viable fixes
for each vulnerability, demonstrating a low rate of false positives. Our tests
encompassed 129 code samples across eight programming languages, revealing the
highest vulnerabilities in PHP and JavaScript. GPT-4's code corrections led to
a 90% reduction in vulnerabilities, requiring only an 11% increase in code
lines. A critical insight was LLMs' ability to self-audit, suggesting fixes for
their identified vulnerabilities and underscoring their precision. Future
research should explore system-level vulnerabilities and integrate multiple
static code analyzers for a holistic perspective on LLMs' potential.
The Enron Corpus: Where the Email Bodies are Buried?
To probe the largest public-domain email database for indicators of fraud, we
apply machine learning and accomplish four investigative tasks. First, we
identify persons of interest (POI), using financial records and email, and
report a peak accuracy of 95.7%. Secondly, we find any publicly exposed
personally identifiable information (PII) and discover 50,000 previously
unreported instances. Thirdly, we automatically flag legally responsive emails
as scored by human experts in the California electricity blackout lawsuit, and
find a peak 99% accuracy. Finally, we track three years of primary topics and
sentiment across over 10,000 unique people before, during and after the onset
of the corporate crisis. Where possible, we compare accuracy against execution
times for 51 algorithms and report human-interpretable business rules that can
scale to vast datasets.
The Multimodal And Modular AI Chef: Complex Recipe Generation From Imagery
The AI community has embraced multi-sensory or multi-modal approaches to
advance this generation of AI models to resemble expected intelligent
understanding. Combining language and imagery represents a familiar method for
specific tasks like image captioning or generation from descriptions. This
paper compares these monolithic approaches to a lightweight and specialized
method based on employing image models to label objects, then serially
submitting this resulting object list to a large language model (LLM). This use
of multiple Application Programming Interfaces (APIs) enables better than 95%
mean average precision for correct object lists, which serve as input to the
latest OpenAI text generator (GPT-4). To demonstrate the API as a modular
alternative, we solve the problem of a user taking a picture of ingredients
available in a refrigerator, then generating novel recipe cards tailored to
complex constraints on cost, preparation time, dietary restrictions, portion
sizes, and multiple meal plans. The research concludes that monolithic
multimodal models currently lack the coherent memory to maintain context and
format for this task and that, until recently, language models like GPT-2/3
struggled to format similar problems without degenerating into repetitive or
nonsensical combinations of ingredients. For the first time, an AI chef or
cook seems not only possible but offers some enhanced capabilities to augment
human recipe libraries in pragmatic ways. The work generates a 100-page recipe
book featuring the thirty top ingredients using over 2000 refrigerator images
as initializing lists.
Overhead MNIST: A Benchmark Satellite Dataset
The research presents an overhead view of 10 important objects and follows
the general formatting requirements of the most popular machine learning task:
digit recognition with MNIST. This dataset offers a public benchmark extracted
from over a million human-labelled and curated examples. The work outlines the
key multi-class object identification task while matching with prior work in
handwriting, cancer detection, and retail datasets. A prototype deep learning
approach with transfer learning and convolutional neural networks (MobileNetV2)
correctly identifies the ten overhead classes with an average accuracy of
96.7%. This model exceeds the peak human performance of 93.9%. For upgrading
satellite imagery and object recognition, this new dataset benefits diverse
endeavors such as disaster relief, land use management, and other traditional
remote sensing tasks. The work extends satellite benchmarks with new
capabilities to identify efficient and compact algorithms that might work
on-board small satellites, a practical task for future multi-sensor
constellations. The dataset is available on Kaggle and GitHub.
Rock Hunting With Martian Machine Vision
The Mars Perseverance rover applies computer vision for navigation and hazard
avoidance. The challenge to do onboard object recognition highlights the need
for low-power, customized training, often including low-contrast backgrounds.
We investigate deep learning methods for the classification and detection of
Martian rocks. We report greater than 97% accuracy for binary classifications
(rock vs. rover). We fine-tune a detector to render geo-located bounding boxes
while counting rocks. For these models to run on microcontrollers, we shrink
and quantize the neural networks' weights and demonstrate a low-power rock
hunter with faster frame rates (1 frame per second) but lower accuracy (37%).
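To make the idea of quantizing network weights concrete, here is a minimal sketch of symmetric linear quantization to 8-bit integers. The abstract does not name the quantization scheme or toolchain used, so the scheme, the function names, and the toy weights are all assumptions for illustration.

```python
def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to recover approximate
    floats. Illustrative only; not the scheme from the paper.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.1, 1.0]  # made-up values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, worst <= scale / 2 + 1e-9)
```

Each weight shrinks from 32 bits to 8, and the rounding error per weight is bounded by half the scale. That lost precision is one plausible contributor to the accuracy drop reported for the microcontroller model.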
Image Classifiers for Network Intrusions
This research recasts the network attack dataset from UNSW-NB15 as an
intrusion detection problem in image space. Using one-hot-encodings, the
resulting grayscale thumbnails provide a quarter-million examples for deep
learning algorithms. Applying the MobileNetV2 convolutional neural network
architecture, the work demonstrates a 97% accuracy in distinguishing normal and
attack traffic. Further class refinements to 9 individual attack families
(exploits, worms, shellcodes) show an overall 56% accuracy. Using feature
importance rank, a random forest solution on subsets shows the most important
source-destination factors and the least important ones as mainly obscure
protocols. The dataset is available on Kaggle.
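One way to picture a one-hot encoding becoming a grayscale thumbnail is sketched below: each categorical field turns into a run of pixels with a single white (255) entry, and the concatenated runs are zero-padded into a square grid. The field names, vocabularies, grid size, and function names are hypothetical, and the actual encoding used for UNSW-NB15 may differ.

```python
def one_hot(value, vocabulary):
    """Return one pixel per vocabulary entry: 255 for the match, else 0."""
    return [255 if value == v else 0 for v in vocabulary]


def record_to_pixels(record, vocabularies, side):
    """Concatenate one-hot runs for each field and pad into a side x side grid.

    `vocabularies` maps field name -> ordered list of allowed values
    (illustrative structure, not taken from the source).
    """
    pixels = []
    for field, vocab in vocabularies.items():
        pixels.extend(one_hot(record[field], vocab))
    if len(pixels) > side * side:
        raise ValueError("encoded record does not fit in the thumbnail")
    pixels += [0] * (side * side - len(pixels))  # zero-pad the remainder
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    vocabularies = {"proto": ["tcp", "udp", "icmp"], "state": ["FIN", "CON"]}
    image = record_to_pixels({"proto": "udp", "state": "FIN"}, vocabularies, side=3)
    for row in image:
        print(row)
```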
