Prediction of Biological Activities of Volatile Metabolites Using Molecular Fingerprints and Machine Learning Methods
Volatile metabolites are small molecules with high vapor pressures under ambient conditions, and they form a chemically diverse group with various biological activities. Determining these activities is crucial because volatile metabolites play important roles in chemical ecology and human healthcare. In this study, we accumulated 341 volatiles emitted by biological species and associated with 11 types of biological activity, and deposited the data into our database, the KNApSAcK Metabolite Ecology Database. Using this dataset, we developed 72 classification models with various machine learning methods to predict the biological activities of volatile metabolites. Eight types of molecular fingerprints were used to represent the molecules: PubChem (881 bits), CDK (1024 bits), Extended CDK (1024 bits), MACCS (166 bits), Klekota-Roth (4860 bits), Substructure (307 bits), Estate (79 bits), and atom pairs (780 bits). We also proposed a new fingerprint, Combine (9121 bits), which combines all features of these eight fingerprints. The best classification model used the Combine fingerprint trained with the gradient boosting method (GBM) and achieved a predictive accuracy of 94.43%. The results indicate that molecular fingerprints and machine learning methods can be useful for predicting the biological activities of volatile metabolites.
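The abstract says the Combine fingerprint merges all features of the eight fingerprints, and the listed lengths sum to exactly 9121 bits. A minimal sketch, assuming "combining" means concatenating the bit vectors in a fixed order, is shown below. The fingerprint names and bit lengths come from the abstract; the helper `combine_fingerprints` and the dictionary layout are hypothetical, and computing the individual fingerprints with a cheminformatics toolkit is out of scope.

```python
from typing import Dict, List

# Bit lengths of the eight fingerprints, as listed in the abstract.
# Dict order (Python 3.7+) fixes the concatenation order; the order itself is an assumption.
FINGERPRINT_BITS: Dict[str, int] = {
    "PubChem": 881,
    "CDK": 1024,
    "ExtendedCDK": 1024,
    "MACCS": 166,
    "KlekotaRoth": 4860,
    "Substructure": 307,
    "Estate": 79,
    "AtomPairs": 780,
}


def combine_fingerprints(fps: Dict[str, List[int]]) -> List[int]:
    """Concatenate per-type bit vectors into one vector (hypothetical helper).

    Raises ValueError if a fingerprint is missing or has the wrong length.
    """
    combined: List[int] = []
    for name, n_bits in FINGERPRINT_BITS.items():
        bits = fps.get(name)
        if bits is None or len(bits) != n_bits:
            raise ValueError(f"{name}: expected {n_bits} bits")
        combined.extend(bits)
    return combined


total = sum(FINGERPRINT_BITS.values())
print(total)  # 9121
```

Concatenation keeps each source bit at a fixed offset (for example, the first CDK bit lands at index 881), so a tree-based learner such as GBM can still tell which original fingerprint a feature came from.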
On the chaotic nature of biological signals using nonlinear data analysis methodology
Organized by School of Mechatronic Engineering (UniMAP) & co-organized by The Institution of Engineering Malaysia (IEM), 11th - 13th October 2009 at Batu Feringhi, Penang, Malaysia. In this study, we analyze the characteristics of biological signals using a nonlinear data analysis methodology. Because biological signals are not linear, obtaining an accurate portrait of them requires nonlinear analysis methods. Nonlinear analysis is a relatively new and rapidly growing approach in the biomedical field. One of its most useful tools is the Lyapunov exponent, which is often used to determine whether a dynamical system is chaotic: if a purely deterministic system exhibits at least one positive Lyapunov exponent, it is chaotic. In this work, we measure the finger pulse signal for twenty minutes in two different situations, analyze the signal using nonlinear data analysis, and extract and evaluate Lyapunov exponent parameters from it. We find a positive Lyapunov exponent, which confirms the existence of chaotic behavior in biological systems. Technically sponsored by IEEE Malaysia Section.
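The sign criterion above (a positive exponent in a deterministic system means chaos) can be illustrated on a textbook deterministic system, the logistic map x_{n+1} = r x_n (1 - x_n), whose Lyapunov exponent is the long-run average of ln|f'(x_n)| with f'(x) = r(1 - 2x). This is a minimal sketch, not the authors' procedure: estimating the exponent from a measured finger pulse signal needs a time-series method (phase-space reconstruction followed by, for example, the Wolf or Rosenstein algorithm), which is not implemented here. The function name `logistic_lyapunov` and its default parameters are hypothetical.

```python
import math


def logistic_lyapunov(r: float, x0: float = 0.4,
                      n_transient: int = 1000, n_iter: int = 100_000) -> float:
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1 - x).

    lambda = (1/N) * sum of ln|f'(x_n)|, where f'(x) = r*(1 - 2x).
    """
    x = x0
    for _ in range(n_transient):          # let the orbit settle onto its attractor
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        deriv = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(deriv, 1e-300))  # guard against log(0) at x = 0.5
        x = r * x * (1.0 - x)
    return total / n_iter


for r in (2.5, 3.2, 3.9):
    lam = logistic_lyapunov(r)
    print(f"r={r}: lambda={lam:+.3f} ->", "chaotic" if lam > 0 else "not chaotic")
```

At r = 2.5 the orbit settles on the fixed point x = 0.6, so the exponent is ln 0.5 (negative); at r = 3.2 it settles on a stable period-2 cycle (also negative); at r = 3.9 the map is chaotic and the exponent is positive.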
Cervical cancer detection method using an improved cellular neural network (CNN) algorithm
Cervical cancer is the second most common cancer among women in Malaysia and the fourth most frequent cancer among women worldwide. The Pap smear test is often neglected, although it is a useful, beneficial and essential screening tool for cervical cancer. However, Pap smear images have low sensitivity and specificity, which makes it difficult to determine whether abnormal cells are cancerous. Computer-based algorithms have recently been widely used in cervical cancer screening. In this study, an improved cellular neural network (CNN) algorithm is proposed to detect cancerous cells in real time by processing Pap smear images. Several templates are combined and modified to form the CNN algorithm, which is applied to detect cancerous cells in a total of 115 Pap smear images. A MATLAB-based CNN is developed for automated detection of cervical cancer cells, in which the templates segment the nuclei of the cells. The simulation results show that the proposed CNN algorithm detects cervical cancer cells automatically with more than 88% accuracy.
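The abstract does not give the combined or modified templates, so the sketch below only shows how a cellular neural network processes an image with a template. It uses the standard Chua-Yang dynamics, dx/dt = -x + A*y + B*u + z with output y = 0.5(|x + 1| - |x - 1|), integrated with forward Euler, and a textbook threshold template (A centre 2, B zero, z = -z*, initial state x(0) = u). By CNN convention +1 is black and -1 is white. The zero-valued boundary, step size, step count and all function names are assumptions; this is not the authors' MATLAB implementation or their nucleus-segmentation templates.

```python
from typing import List

Grid = List[List[float]]
Template = List[List[float]]


def cnn_output(x: float) -> float:
    """Standard piecewise-linear CNN output: clips the state to [-1, 1]."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def apply_template(t: Template, img: Grid, i: int, j: int) -> float:
    """Weighted sum over the 3x3 neighbourhood of (i, j); cells outside the grid count as 0."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                total += t[di + 1][dj + 1] * img[r][c]
    return total


def run_cnn(u: Grid, a: Template, b: Template, z: float,
            steps: int = 300, dt: float = 0.1) -> Grid:
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + z, starting from x(0) = u."""
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in u]
    # The input term B*u + z does not change over time, so compute it once.
    feed = [[apply_template(b, u, i, j) + z for j in range(cols)] for i in range(rows)]
    for _ in range(steps):
        y = [[cnn_output(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-x[i][j] + apply_template(a, y, i, j) + feed[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[cnn_output(v) for v in row] for row in x]


# Textbook threshold template: pixels above z_star go to +1 (black), the rest to -1 (white).
z_star = 0.0
A_THRESH = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
B_ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

image = [[0.8, -0.6, 0.3],
         [0.1, -0.1, -0.9]]  # hypothetical grey levels scaled to [-1, 1]
print(run_cnn(image, A_THRESH, B_ZERO, -z_star))
```

In the linear region the threshold template reduces to dx/dt = x - z*, so each cell drifts away from z* until it saturates at +1 or -1. Detection pipelines chain several such templates, which is the general idea behind combining and modifying templates.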
Detection of topic on Health News in Twitter Data
Abstract: The development and rapid popularization of the internet have led to exponential growth of data on the network, so text mining has become more important. Users search for information within the immense amount of content available online. Obtaining valuable information, and automatically classifying, organizing and managing vast amounts of text, makes text processing even more difficult. To address these problems and requirements, intelligent information processing has been studied extensively. Topic modelling has been widely employed in natural language processing. Current research focuses on improving the speed and accuracy of text classification and topic detection, and on feature selection methods that achieve better dimensionality reduction. The Latent Dirichlet Allocation (LDA) topic model works well for reducing noise in data, and LDA is widely used as a feature model combined with a classifier to achieve good classification performance. This study aims to mine the data and reduce the load of processing the huge database. Three supervised learning algorithms are therefore run: Naïve Bayes, Decision Tree and Random Forest. The Random Forest classifier outperforms the other two, with 99.99% accuracy. Seven topic clusters were revealed using the Random Forest classifier. Each topic output is limited to its four highest-weighted words, showing each term and its weight. The highest-weighted term in the dataset is 'Ebola'. The findings show that combining LDA with a supervised learning algorithm effectively addresses the problem of data sparseness in short texts. Selecting the microblogs most likely to discuss news topics significantly reduces the volume of data of concern and, to a certain extent, eliminates interference from non-news blogs.
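The abstract uses LDA as a feature model ahead of a classifier and reports each topic by its four highest-weighted words. The abstract does not say which LDA inference method was used, so the following is a minimal sketch using collapsed Gibbs sampling, one common choice. It returns per-document topic proportions `theta`, which could serve as classifier features, and per-topic word weights `phi`. The function names, the hyperparameters (alpha = 0.1, beta = 0.01) and the toy documents are assumptions, and the Naïve Bayes, Decision Tree and Random Forest step is omitted.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> Tuple[List[List[float]], Dict[str, List[float]]]:
    """Collapsed Gibbs sampling for LDA; returns (theta, phi).

    theta[d][t]: proportion of topic t in document d.
    phi[w][t]:   weight of word w in topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    ndk = [[0] * k for _ in docs]                   # topic counts per document
    nkw = [defaultdict(int) for _ in range(k)]      # word counts per topic
    nk = [0] * k                                    # total tokens per topic
    z: List[List[int]] = []                         # topic assigned to each token
    for d, doc in enumerate(docs):                  # random initial assignment
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                ndk[d][t] -= 1                      # remove this token's current assignment
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = {w: [(nkw[j][w] + beta) / (nk[j] + v * beta) for j in range(k)] for w in vocab}
    return theta, phi


def top_words(phi: Dict[str, List[float]], topic: int, n: int = 4) -> List[str]:
    """The n words with the highest weight in the given topic."""
    return sorted(phi, key=lambda w: phi[w][topic], reverse=True)[:n]


# Hypothetical toy "tweets", already tokenized.
docs = [["ebola", "outbreak", "virus", "ebola"], ["diet", "exercise", "health"],
        ["ebola", "virus", "vaccine"], ["exercise", "diet", "sleep"]]
theta, phi = lda_gibbs(docs, k=2, iters=100)
for t in range(2):
    print(t, top_words(phi, t))
```

Each row of `theta` is a fixed-length vector of length k, which is what makes LDA convenient as a dimensionality-reduction step before a supervised classifier on short, sparse texts.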
In-vitro diagnosis of single and poly microbial species targeted for diabetic foot infection using e-nose technology
BACKGROUND: Effective management of patients with diabetic foot infection is a crucial concern. A delay in prescribing an appropriate antimicrobial agent can lead to amputation or life-threatening complications. The electronic nose (e-nose) technique is therefore intended to provide a diagnostic tool that allows rapid and accurate identification of the pathogen. RESULTS: This study investigates the performance of the e-nose technique, which directly measures the static headspace, together with algorithms and data interpretation validated by headspace SPME-GC-MS, in determining the causative bacteria responsible for diabetic foot infection. The technique is proposed to complement the wound swabbing method for bacterial culture and to serve as a rapid screening tool for identifying bacterial species. The investigation covered both single and poly microbial species cultured on different agar media. Multi-class classification was performed with statistical approaches such as Support Vector Machine (SVM), K Nearest Neighbor (KNN) and Linear Discriminant Analysis (LDA), as well as a Probabilistic Neural Network (PNN). Most of the classifiers identified poly and single microbial species with up to 90% accuracy. CONCLUSIONS: The results show that the e-nose was able to identify and differentiate between poly and single microbial species, comparably to the conventional clinical technique. They also indicate that, even though poly and single bacterial species on different agar media emit different headspace volatiles, they can still be discriminated and identified using multivariate techniques.
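Two of the classifiers named above, KNN and PNN, are short enough to sketch directly. The sketch assumes each e-nose measurement has already been reduced to a fixed-length feature vector (for example, one response value per sensor). The sensor values, the labels `species_A` and `species_B`, the kernel width `sigma` and the function names are all hypothetical. Preprocessing, validation against SPME-GC-MS, and the SVM and LDA classifiers are out of scope.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (sensor-response vector, species label)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> str:
    """K nearest neighbours: majority vote among the k closest training samples."""
    nearest = sorted(train, key=lambda s: sq_dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pnn_predict(train: List[Sample], x: Sequence[float], sigma: float = 1.0) -> str:
    """Probabilistic neural network: average Gaussian kernel per class, pick the largest."""
    sums: Dict[str, float] = {}
    counts: Counter = Counter()
    for vec, label in train:
        sums[label] = sums.get(label, 0.0) + math.exp(-sq_dist(vec, x) / (2 * sigma ** 2))
        counts[label] += 1
    return max(sums, key=lambda c: sums[c] / counts[c])


# Hypothetical three-sensor responses for two culture classes.
train: List[Sample] = [
    ((0.0, 0.0, 0.1), "species_A"), ((0.5, 0.2, 0.0), "species_A"), ((0.1, 0.4, 0.2), "species_A"),
    ((5.0, 5.0, 4.8), "species_B"), ((5.2, 4.9, 5.1), "species_B"), ((4.7, 5.3, 5.0), "species_B"),
]
print(knn_predict(train, (0.2, 0.1, 0.1)), pnn_predict(train, (4.8, 5.1, 5.0)))
```

KNN keeps only the k closest samples, while PNN lets every training sample vote with a weight that decays with distance. In PNN, `sigma` controls how far that influence reaches and would normally be tuned on held-out data.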
Data-Intensive Science of Relationships Among Species, Volatile Organic Compounds and Biological Activities
Nara Institute of Science and Technology (奈良先端科学技術大学院大学), Doctor of Engineering, doctoral thesis
