101 research outputs found

Data Strategies for Fleetwide Predictive Maintenance

Author: Noever David
Publication date: 11/12/2018

For predictive maintenance, we examine one of the largest public datasets of machine failures, recorded along with their corresponding precursors: error rates, historical part replacements, and sensor inputs. To simplify the time and accuracy comparison between 27 different algorithms, we treat the imbalance between normal and failing states with nominal under-sampling. We identify 3 promising regression and discriminant algorithms with both higher accuracy (96%) and twenty-fold faster execution times than previous work. Because predictive maintenance success hinges on input features prior to prediction, we provide a methodology to rank-order feature importance and show that, for this dataset, error counts prove more predictive than scheduled maintenance might imply based solely on more traditional factors such as machine age or last replacement times. Comment: 3 pages, 3 figures

arXiv.org e-Print Archive
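
The under-sampling step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which scheme was used, so this assumes plain random under-sampling down to the size of the smallest class; the function name `undersample` and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class.

    Assumption: simple random under-sampling; the paper's exact scheme is not stated.
    """
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is cut down to the size of the rarest one.
    n_min = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            balanced.append((row, label))

    # Shuffle so the classes are not stored in contiguous runs.
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


# Toy data (hypothetical): 10 failing records among 100.
rows = [[i, i % 7] for i in range(100)]
labels = ["fail" if i % 10 == 0 else "normal" for i in range(100)]
bal_rows, bal_labels = undersample(rows, labels)
```

Under-sampling discards most of the majority class, which shortens training time for a many-algorithm comparison at the cost of throwing away normal-state examples.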

The rotating spectrometer: New biotechnology for cell separations

Authors: Matsos Helen C.; Noever David A.

An instrument for biochemical studies, called the rotating spectrometer, separates previously inseparable cell cultures. The rotating spectrometer is intended for use in pharmacological studies which require fractional splitting of heterogeneous cell cultures based on cell morphology and swimming behavior. As a method to separate and concentrate cells in free solution, the rotating method requires active organism participation and can effectively split the large class of organisms known to form spontaneous patterns. Examples include the biochemical star, an organism called Tetrahymena pyriformis. Following focusing in a rotated frame, the separation is accomplished using different radial dependencies of concentrated algal and protozoan species. The focusing itself appears as concentric rings and arises from the coupling between swimming direction and Coriolis forces. A dense cut is taken at varying radii and extraction is replenished at an inlet. Unlike standard separation and concentrating techniques such as filtration or centrifugation, the instrument is able to separate motile from immotile fractions. For a single pass, typical split efficiencies can reach 200 to 300 percent compared to the inlet concentration

NASA Technical Reports Server

A biosensor for cadmium based on bioconvective patterns

Authors: Matsos Helen C.; Noever David A.

An 'in vitro' method for monitoring cadmium, one of the most lethal bivalent heavy metals, can detect biologically active levels. The effects of cadmium tend to concentrate in protozoa far above natural levels and therein begin transferring through freshwater food chains to animals and humans. In a small sample volume (approximately 5 ml), the method uses the toxic response of the protozoan Tetrahymena pyriformis to cadmium. The assay relies on macroscopic bioconvective patterns to measure the toxic response, giving a sensitivity better than 1 micro-g/l and a toxicity threshold to 7 micro-g/l for Cd(2+). Cadmium hinders pattern formation in a dose-dependent manner. Arrested organism growth arises from slowed division and mutation to non-dividing classes. Unlike previous efforts, this method can be performed in a shallow flow device and does not require electronic or chemical analyses to monitor toxicity

NASA Technical Reports Server

Virus-MNIST: A Benchmark Malware Dataset

Authors: Noever David; Noever Samantha E. Miller
Publication date: 28/02/2021

The short note presents an image classification dataset consisting of 10 executable code varieties and approximately 50,000 virus examples. The classes include 9 families of computer viruses and one benign set. The image formatting for the first 1024 bytes of the Portable Executable (PE) mirrors the familiar MNIST handwriting dataset, such that most of the previously explored algorithmic methods can transfer with minor modifications. The designation of 9 virus families for malware derives from unsupervised learning of class labels; we discover the families with KMeans clustering that excludes the non-malicious examples. As a benchmark using deep learning methods (MobileNetV2), we find an overall 80% accuracy for virus identification by families when benign software is included. We also find that once a positive malware detection occurs (by signature or heuristics), the projection of the first 1024 bytes into a thumbnail image can classify the type of virus with 87% accuracy. The work generalizes what other malware investigators have demonstrated as promising: convolutional neural networks originally developed to solve image problems but applied to a new abstract domain, the pixel bytes from executable files. The dataset is available on Kaggle and Github

arXiv.org e-Print Archive
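
The projection of the first 1024 bytes into a thumbnail, as described in the abstract above, might look like the following sketch. It assumes a square 32 x 32 grid (1024 = 32 x 32), one byte per grayscale pixel, and zero padding for short files; these layout details and the name `bytes_to_thumbnail` are assumptions for illustration, not confirmed by the abstract.

```python
def bytes_to_thumbnail(data, side=32):
    """Map the first side*side bytes of a file onto a side x side grayscale grid.

    Each byte (0-255) becomes one pixel intensity. Files shorter than
    side*side bytes are zero-padded (an assumption made for this sketch).
    """
    n = side * side
    head = data[:n].ljust(n, b"\x00")
    # Slice the flat byte string into rows of `side` pixels each.
    return [list(head[r * side:(r + 1) * side]) for r in range(side)]


# Hypothetical input: a PE file starts with the two bytes "MZ".
sample = b"MZ" + bytes(range(256)) * 2
thumb = bytes_to_thumbnail(sample)
```

Because the result has the same single-channel grid shape as a handwriting thumbnail, it can be fed to image classifiers with few changes, which is the transfer the abstract describes.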

Can Large Language Models Find And Fix Vulnerable Software?

Author: Noever David
Publication date: 20/08/2023

In this study, we evaluated the capability of Large Language Models (LLMs), particularly OpenAI's GPT-4, in detecting software vulnerabilities, comparing their performance against traditional static code analyzers like Snyk and Fortify. Our analysis covered numerous repositories, including those from NASA and the Department of Defense. GPT-4 identified approximately four times as many vulnerabilities as its counterparts. Furthermore, it provided viable fixes for each vulnerability, demonstrating a low rate of false positives. Our tests encompassed 129 code samples across eight programming languages, revealing the highest vulnerabilities in PHP and JavaScript. GPT-4's code corrections led to a 90% reduction in vulnerabilities, requiring only an 11% increase in code lines. A critical insight was LLMs' ability to self-audit, suggesting fixes for their identified vulnerabilities and underscoring their precision. Future research should explore system-level vulnerabilities and integrate multiple static code analyzers for a holistic perspective on LLMs' potential

arXiv.org e-Print Archive

The Enron Corpus: Where the Email Bodies are Buried?

Author: Noever David
Publication date: 24/01/2020

To probe the largest public-domain email database for indicators of fraud, we apply machine learning and accomplish four investigative tasks. First, we identify persons of interest (POI), using financial records and email, and report a peak accuracy of 95.7%. Secondly, we find any publicly exposed personally identifiable information (PII) and discover 50,000 previously unreported instances. Thirdly, we automatically flag legally responsive emails as scored by human experts in the California electricity blackout lawsuit, and find a peak 99% accuracy. Finally, we track three years of primary topics and sentiment across over 10,000 unique people before, during and after the onset of the corporate crisis. Where possible, we compare accuracy against execution times for 51 algorithms and report human-interpretable business rules that can scale to vast datasets

arXiv.org e-Print Archive

The Multimodal And Modular AI Chef: Complex Recipe Generation From Imagery

Authors: Noever David; Noever Samantha Elizabeth Miller
Publication date: 19/03/2023

The AI community has embraced multi-sensory or multi-modal approaches to advance this generation of AI models to resemble expected intelligent understanding. Combining language and imagery represents a familiar method for specific tasks like image captioning or generation from descriptions. This paper compares these monolithic approaches to a lightweight and specialized method based on employing image models to label objects, then serially submitting the resulting object list to a large language model (LLM). This use of multiple Application Programming Interfaces (APIs) enables better than 95% mean average precision for correct object lists, which serve as input to the latest OpenAI text generator (GPT-4). To demonstrate the API as a modular alternative, we solve the problem of a user taking a picture of ingredients available in a refrigerator, then generating novel recipe cards tailored to complex constraints on cost, preparation time, dietary restrictions, portion sizes, and multiple meal plans. The research concludes that monolithic multimodal models currently lack the coherent memory to maintain context and format for this task and that, until recently, language models like GPT-2/3 struggled to format similar problems without degenerating into repetitive or nonsensical combinations of ingredients. For the first time, an AI chef or cook seems not only possible but offers some enhanced capabilities to augment human recipe libraries in pragmatic ways. The work generates a 100-page recipe book featuring the thirty top ingredients using over 2000 refrigerator images as initializing lists

arXiv.org e-Print Archive

Overhead MNIST: A Benchmark Satellite Dataset

Authors: Noever David; Noever Samantha E. Miller
Publication date: 08/02/2021

The research presents an overhead view of 10 important objects and follows the general formatting requirements of the most popular machine learning task: digit recognition with MNIST. This dataset offers a public benchmark extracted from over a million human-labelled and curated examples. The work outlines the key multi-class object identification task while matching with prior work in handwriting, cancer detection, and retail datasets. A prototype deep learning approach with transfer learning and convolutional neural networks (MobileNetV2) correctly identifies the ten overhead classes with an average accuracy of 96.7%. This model exceeds the peak human performance of 93.9%. For upgrading satellite imagery and object recognition, this new dataset benefits diverse endeavors such as disaster relief, land use management, and other traditional remote sensing tasks. The work extends satellite benchmarks with new capabilities to identify efficient and compact algorithms that might work on-board small satellites, a practical task for future multi-sensor constellations. The dataset is available on Kaggle and Github

arXiv.org e-Print Archive

Rock Hunting With Martian Machine Vision

Authors: Noever David; Noever Samantha E. Miller
Publication date: 09/04/2021

The Mars Perseverance rover applies computer vision for navigation and hazard avoidance. The challenge to do onboard object recognition highlights the need for low-power, customized training, often including low-contrast backgrounds. We investigate deep learning methods for the classification and detection of Martian rocks. We report greater than 97% accuracy for binary classifications (rock vs. rover). We fine-tune a detector to render geo-located bounding boxes while counting rocks. For these models to run on microcontrollers, we shrink and quantize the neural networks' weights and demonstrate a low-power rock hunter with faster frame rates (1 frame per second) but lower accuracy (37%)

arXiv.org e-Print Archive
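
The weight quantization mentioned in the abstract above can be illustrated in general terms. The abstract does not name a scheme or toolchain, so this sketch assumes a common 8-bit affine (scale and zero-point) quantization applied to a flat list of weights; the function names and the example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights onto signed 8-bit integers with an affine transform.

    Assumption: per-tensor affine quantization; not necessarily the
    scheme used in the paper.
    """
    lo, hi = min(weights), max(weights)
    # One integer step covers 1/255 of the observed weight range.
    scale = (hi - lo) / 255 or 1.0
    # The zero point aligns the smallest weight with -128.
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the 8-bit integers."""
    return [(v - zero_point) * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
```

Storing one byte per weight instead of a 32-bit float cuts memory roughly fourfold, which is the kind of saving that lets a model fit on a microcontroller, while the rounding error is one plausible source of the accuracy loss the abstract reports.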

Image Classifiers for Network Intrusions

Authors: Noever David A.; Noever Samantha E. Miller
Publication date: 13/03/2021

This research recasts the network attack dataset from UNSW-NB15 as an intrusion detection problem in image space. Using one-hot encodings, the resulting grayscale thumbnails provide a quarter-million examples for deep learning algorithms. Applying the MobileNetV2 convolutional neural network architecture, the work demonstrates a 97% accuracy in distinguishing normal and attack traffic. Further class refinements to 9 individual attack families (exploits, worms, shellcodes) show an overall 56% accuracy. Using feature importance rank, a random forest solution on subsets shows the most important source-destination factors and the least important ones as mainly obscure protocols. The dataset is available on Kaggle

arXiv.org e-Print Archive
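
As a rough illustration of recasting tabular network records as one-hot grayscale thumbnails, the sketch below encodes each categorical field as a run of black and white pixels and pads the result to a square. The field names, vocabularies, pixel layout, and padding rule are all assumptions chosen for the example; the abstract does not specify them.

```python
import math


def one_hot(value, vocabulary):
    """Return one pixel per vocabulary entry: 255 for the match, 0 elsewhere."""
    return [255 if value == v else 0 for v in vocabulary]


def record_to_thumbnail(record, vocabularies):
    """Concatenate one-hot encodings of each field and pad to a square grid."""
    pixels = []
    for field, vocab in vocabularies.items():
        pixels.extend(one_hot(record[field], vocab))

    # Smallest square that holds every pixel (ceiling of the square root).
    side = math.isqrt(len(pixels) - 1) + 1 if pixels else 1
    pixels.extend([0] * (side * side - len(pixels)))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical vocabularies and record, for illustration only.
vocabularies = {
    "proto": ["tcp", "udp", "arp"],
    "service": ["http", "dns", "ftp", "-"],
}
record = {"proto": "udp", "service": "dns"}
image = record_to_thumbnail(record, vocabularies)
```

Here the seven one-hot pixels are padded to a 3 x 3 grid; a real record with many more fields would yield a larger thumbnail suited to an image classifier.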