79 research outputs found

    When linearity prevails over hierarchy in syntax

    Get PDF
    Hierarchical structure has been cherished as a grammatical universal. We use experimental methods to show where linear order is also a relevant syntactic relation. An identical methodology and design were used across six research sites on South Slavic languages. Experimental results show that in certain configurations, grammatical production can in fact favor linear order over hierarchical structure. However, these findings are limited to coordinate structures and are distinct from the kind of production errors found in comparable configurations, such as “attraction” errors. The results demonstrate that agreement morphology may be computed in a series of steps, one of which is partly independent of syntactic hierarchy.

    Coupled clustering ensemble by exploring data interdependence

    Get PDF
    Clustering ensembles combine multiple partitions of data into a single clustering solution. They are an effective technique for improving the quality of clustering results. Current clustering ensemble algorithms are usually built on pairwise agreements: between clusterings, which focus on similarity via consensus functions; between data objects, which induce similarity measures from partitions and re-cluster objects; and between clusters, which collapse groups of clusters into meta-clusters. Most of these models make a strong IIDness assumption (independent and identical distribution), which states that base clusterings perform independently of one another and that all objects are also independent. In the real world, however, objects are generally related to each other through features that are either explicit or implicit, and there is a latent but definite relationship among intermediate base clusterings because they are derived from the same set of data. All of this calls for a further investigation of clustering ensembles that explores the interdependence characteristics of the data. To address this, we propose a coupled clustering ensemble (CCE) framework that works on the interdependent nature of objects and intermediate base clusterings. The main idea is to model the coupling relationship between objects by aggregating the similarity of base clusterings, and the interactive relationship among objects by addressing their neighborhood domains. Once these interdependence relationships are discovered, they act as critical supplements to clustering ensembles. We verified the proposed framework using three types of consensus functions: clustering-based, object-based, and cluster-based. Substantial experiments on multiple synthetic and real-life benchmark datasets indicate that CCE can effectively capture the implicit interdependence relationships among base clusterings and among objects, with higher clustering accuracy, stability, and robustness than 14 state-of-the-art techniques, supported by statistical analysis. A sensitivity analysis further shows that the final clustering quality depends on the data characteristics (e.g., quality and consistency) of the base clusterings. Finally, applications in document clustering, as well as on datasets of much larger size and dimensionality, further demonstrate the effectiveness, efficiency, and scalability of the proposed models.
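    The baseline the abstract builds on, pairwise agreement between base clusterings, can be illustrated with a generic co-association ensemble. The sketch below shows only that baseline, with made-up data and parameters; it does not implement the CCE coupling or neighborhood modelling described in the paper.

```python
# Hedged sketch: a generic co-association (pairwise-agreement) clustering
# ensemble, NOT the paper's CCE model. It illustrates the baseline the
# abstract builds on: combining base partitions via object-pair similarity.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Generate diverse base clusterings (different k and seeds).
base_labels = [
    KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    for k, seed in [(2, 1), (3, 2), (4, 3), (5, 4)]
]

# Co-association matrix: fraction of base clusterings that group i and j together.
n = X.shape[0]
coassoc = np.zeros((n, n))
for labels in base_labels:
    coassoc += (labels[:, None] == labels[None, :]).astype(float)
coassoc /= len(base_labels)

# Consensus step: hierarchical clustering on the co-association distance.
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)
print(consensus[:20])
```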

    OECEP: Enriching Complex Event Processing with Domain Knowledge from Ontologies

    No full text
    With the increasing adoption of an event-based perspective in many organizations, the demands for automatic processing of events are becoming more sophisticated. Although complex event processing systems can process events in near real time, these systems rely heavily upon human domain experts. This becomes an issue in application areas that are rich in specialized domain knowledge and background information, such as clinical environments. To address this issue, we utilize a framework of four techniques to enhance complex event processing with domain knowledge from ontologies. We realize this framework in our novel approach of ontology-supported complex event processing, which stands in contrast to related approaches and emphasizes the strengths of current advances in the individual fields of complex event processing and ontologies. Experimental results from an implementation of our approach based on a state-of-the-art system show its feasibility and indicate directions for future research.
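    As a toy illustration of the general idea (not the OECEP framework or its four techniques), the sketch below enriches an event with superclasses from a small hand-written ontology so that a simple rule can match on broader clinical concepts. The ontology, event fields, and rule are hypothetical.

```python
# Hedged sketch: enriching event processing with ontology knowledge.
# All names and the subclass hierarchy below are made-up examples.
from dataclasses import dataclass

# Minimal "ontology": each concept points to its broader concepts.
SUBCLASS_OF = {
    "amoxicillin": ["penicillin"],
    "penicillin": ["beta-lactam antibiotic"],
    "beta-lactam antibiotic": ["antibiotic"],
}

def superclasses(concept):
    """Transitively expand a concept to all of its ancestors."""
    result = set()
    stack = list(SUBCLASS_OF.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return result

@dataclass
class MedicationEvent:
    patient_id: str
    drug: str

def penicillin_allergy_rule(event, allergies):
    """Fire if the administered drug is (a subclass of) a known allergen."""
    concepts = {event.drug} | superclasses(event.drug)
    return bool(concepts & allergies.get(event.patient_id, set()))

allergies = {"p42": {"penicillin"}}
event = MedicationEvent(patient_id="p42", drug="amoxicillin")
print(penicillin_allergy_rule(event, allergies))  # True: matched via the ontology
```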

    Hybrid Method for Short Text Topic Modeling

    No full text
    The rise in social media’s popularity has led to a significant increase in user-generated content across various topics. Extracting information from these data can be valuable for different domains; however, due to the vast volume of data, it is not possible to extract this information manually. Different aspects of information extraction have been introduced in the literature, including identifying which topic is discussed in a text. The challenge becomes even bigger when the text is short, as is typical of social media. Various methods for topic modeling have been proposed in the literature, which can be broadly categorized as unsupervised or supervised learning. However, unsupervised topic modeling methods have shortcomings, such as semantic loss and poor explainability, and are sensitive to the choice of parameters, such as the number of topics. While supervised machine learning methods based on deep learning can achieve high accuracy, they need data annotated by humans, which is time-consuming and costly. To overcome these disadvantages, this work proposes a hybrid topic modeling method that combines the advantages of both unsupervised and supervised methods. We built a hybrid model by combining Latent Dirichlet Allocation (LDA) with a deep learning classifier built on top of the Bidirectional Encoder Representations from Transformers (BERT) model. LDA is used to identify the optimal number of topics and the topic-relevant keywords; the only human input needed, aided by ChatGPT, is to identify the associated topics based on the topic-specific keywords. This annotation is used to train and fine-tune the BERT model. In an experimental evaluation on posts related to climate change, we show that the proposed concept is applicable for predicting topics from short text without the need for lengthy and costly annotation.
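    A minimal sketch of the unsupervised half of such a pipeline is given below: LDA over a toy corpus to obtain per-topic keyword lists. The corpus, the number of topics, and the comments about the subsequent labelling and BERT fine-tuning steps are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of the LDA stage of a hybrid short-text topic pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "sea levels rising faster than expected",
    "new solar farm opens to cut emissions",
    "heatwave breaks temperature records again",
    "wind and solar now cheaper than coal",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Top keywords per topic. In the approach described above, keyword lists like
# these would be handed to an annotator (e.g. ChatGPT) to name the topics, and
# the resulting labels would then be used to fine-tune a BERT classifier.
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {topic_id}: {top}")
```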

    Efficient Data Mining Method to Localise Errors in RFID Data

    No full text
    Since the emergence of Radio Frequency Identification (RFID) technology, the community has been promised a cost-effective and efficient means to identify and track large numbers of items with relative ease. Unfortunately, due to the unreliable nature of the passive architecture, the RFID revolution has reached only a fraction of its intended audience because of data anomalies. These anomalies are duplicate, false positive, and false negative readings. While duplicate readings and wrong data (false positives) can be easily identified and rectified, that is not the case for false negative (missed) readings. Data mining methods can be used to identify missed readings; however, due to the vast volume and complex spatio-temporal structure of RFID data, traditional data mining methods are not necessarily directly applicable. In this paper we propose a method to identify possible missed RFID readings by applying association rule mining. In an empirical study we show that our algorithm is accurate and efficient, and that it scales well with an increasing number of rows, making it applicable to large volumes of spatio-temporal RFID data.
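    The sketch below illustrates the general idea of using association rules to localise likely missed readings, assuming a toy set of historical reader traces and a hand-rolled confidence computation; it is not the paper's algorithm.

```python
# Hedged sketch: a toy association-rule check for likely missed RFID readings.
# Idea illustrated: if historical traces show that tags read at readers A and C
# are almost always also read at reader B, a new trace {A, C} suggests a
# missed reading at B.
from itertools import combinations

# Historical traces: the set of readers that detected each tagged item.
history = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"B", "C"}, {"A", "B", "C"},
]

def rule_confidence(antecedent, consequent, traces):
    """confidence(antecedent -> consequent) over the historical traces."""
    support_ante = sum(1 for t in traces if antecedent <= t)
    support_both = sum(1 for t in traces if antecedent <= t and consequent in t)
    return support_both / support_ante if support_ante else 0.0

def suspected_missed(trace, traces, all_readers, min_conf=0.8):
    """Readers absent from the trace that high-confidence rules say should be there."""
    missed = set()
    for reader in all_readers - trace:
        for size in (1, 2):
            for ante in combinations(trace, size):
                if rule_confidence(set(ante), reader, traces) >= min_conf:
                    missed.add(reader)
    return missed

print(suspected_missed({"A", "C"}, history, {"A", "B", "C"}))  # likely {'B'}
```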

    Building a Dynamic Classifier for Large Text Data Collections

    No full text
    Due to the lack of built-in tools to navigate the web, people have to use external solutions to find information. The most popular of these are search engines and web directories. Search engines allow users to locate specific information about a particular topic, whereas web directories facilitate exploration of a wider topic. In the recent past, statistical machine learning methods have been successfully exploited in search engines, while web directories have remained in their primitive state, which resulted in their decline. Exploration, however, answers a different information need of the user and should not be neglected; web directories should provide a user experience of the same quality as search engines. Their development with machine learning methods, however, is hindered by the noisy nature of the web, which makes text classifiers unreliable when applied to web data. In this paper we propose Stochastic Prior Distribution Adjustment (SPDA), a variation of the Multinomial Naive Bayes (MNB) classifier that makes it more suitable for classifying real-world data. By stochastically adjusting class prior distributions we achieve a better overall success rate, but more importantly we also significantly improve the error distribution across classes, making the classifier equally reliable for all classes and therefore more usable.
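    The sketch below illustrates the general idea of re-weighting MNB class priors based on per-class validation error; the adjustment rule, the synthetic data, and the deterministic (rather than stochastic) update are assumptions for illustration, not the SPDA algorithm itself.

```python
# Hedged sketch: a generic "adjust class priors by observed per-class recall"
# loop around Multinomial Naive Bayes. It only illustrates the kind of prior
# re-weighting the abstract describes; it is not SPDA.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import recall_score

def fit_with_adjusted_priors(X_train, y_train, X_val, y_val, rounds=5, step=0.2):
    classes = np.unique(y_train)
    priors = np.full(len(classes), 1.0 / len(classes))
    best = None
    for _ in range(rounds):
        clf = MultinomialNB(class_prior=priors).fit(X_train, y_train)
        per_class_recall = recall_score(
            y_val, clf.predict(X_val), labels=classes, average=None
        )
        # Keep the model with the best worst-class recall seen so far.
        if best is None or per_class_recall.min() > best[0]:
            best = (per_class_recall.min(), clf)
        # Boost the priors of the classes the model currently under-recalls.
        priors = priors * (1.0 + step * (1.0 - per_class_recall))
        priors = priors / priors.sum()
    return best[1]

# Tiny synthetic usage with random count features.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 30))
y = rng.integers(0, 3, size=200)
clf = fit_with_adjusted_priors(X[:150], y[:150], X[150:], y[150:])
print(clf.predict(X[150:155]))
```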

    Applying a Neural Network to Recover Missed RFID Readings

    No full text
    Since the emergence of Radio Frequency Identification (RFID) technology, the community has been promised a cost-effective and efficient means of identifying and tracking large numbers of items with relative ease. Unfortunately, due to the unreliable nature of the passive architecture, the RFID revolution has reached only a fraction of its intended audience because of anomalies such as missed readings. Previous work in this field has focused on restoring the data at the recording phase, which we believe does not provide enough evidence for consecutive missed readings to be corrected. In this study, we propose a methodology for intelligently imputing missing observations through the use of an Artificial Neural Network (ANN) in a static environment. Through experimentation, we identify the most effective algorithm for training the network and establish that the ANN restores a cleaner data set than other intelligent classifier methodologies.
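    A minimal sketch of the underlying idea, predicting whether a tag should have been read in a cycle from its neighbouring read cycles with a small neural network, is shown below. The simulated portal data and the network configuration are assumptions, not the paper's experimental setup.

```python
# Hedged sketch: train a small neural network to recover a possibly missed
# reading in the middle read cycle from the surrounding cycles.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n, cycles = 500, 7

# Ground truth: each tag is present for 3 consecutive read cycles.
truth = np.zeros((n, cycles), dtype=int)
starts = rng.integers(0, cycles - 3, size=n)
for i, s in enumerate(starts):
    truth[i, s:s + 3] = 1

# Observed data: drop ~15% of reads at random to simulate missed readings.
observed = truth.copy()
observed[rng.random(truth.shape) < 0.15] = 0

# Predict the middle cycle from the other observed cycles.
X = np.delete(observed, 3, axis=1)
y = truth[:, 3]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```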

    Handling of Current Time in Native XML Databases

    No full text
    The introduction of native XML databases opens many research questions related to the data models used to represent and manipulate data, including temporal data, in XML. The increasing use of XML for Valid Web pages warrants an adequate treatment of 'now' in native XML databases. In this study, we examine how to represent and manipulate now-relative temporal data. We identify the different approaches used to represent current time in XML temporal databases, and introduce the notion of storing variables such as 'now' or 'UC' as strings in native XML databases. All approaches are empirically evaluated on a query that time-slices the timeline at the current time. The experimental results indicate that the proposed extension offers several advantages over other approaches: better semantics, less storage space, and better response time.
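    The sketch below illustrates, in Python rather than XQuery, what storing 'now' as a string and binding it only at query time looks like for a current-time time-slice; the element and attribute names are hypothetical, and the sketch is not tied to any particular native XML database.

```python
# Hedged sketch: time-slicing XML temporal data in which an open-ended
# validity end is stored as the string "now" and bound at query time.
import xml.etree.ElementTree as ET
from datetime import date

DOC = """
<employees>
  <employee name="Ana"   vstart="2019-01-01" vend="2021-06-30"/>
  <employee name="Boris" vstart="2020-03-15" vend="now"/>
</employees>
"""

def time_slice(xml_text, at=None):
    """Return employees whose valid-time interval contains the given date."""
    at = at or date.today()
    root = ET.fromstring(xml_text)
    current = []
    for emp in root.iter("employee"):
        start = date.fromisoformat(emp.get("vstart"))
        end_attr = emp.get("vend")
        # 'now' is bound to the evaluation time only when the query runs.
        end = at if end_attr == "now" else date.fromisoformat(end_attr)
        if start <= at <= end:
            current.append(emp.get("name"))
    return current

print(time_slice(DOC))  # ['Boris'] when run after 2021-06-30
```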

    Connecting social media data with observed hybrid data for environment monitoring

    No full text
    Environmental monitoring has been regarded as one of the effective solutions for protecting our living places from potential risks. Traditional methods rely on periodically recording assessments of observed objects, which results in large, hybrid data sets. In addition, public opinion regarding certain topics can be extracted from social media and used as another source of descriptive data. In this work, we investigate how to connect and process public opinion from social media together with hybrid observation records. In particular, we study Twitter posts from a designated region with respect to specific topics, such as marine environmental activities. Sentiment analysis on tweets is performed to reflect public opinion on the environmental topics. Two hybrid observational data sets are also considered. To process these data we use a Hadoop cluster and utilize NoSQL and relational databases to store data distributed across nodes in a shared-nothing architecture. We compare the public sentiment in social media with scientific observations in real time and show that “citizen science” enhanced with real-time analytics can provide an avenue for monitoring natural environments. The approach presented in this paper provides an innovative method for monitoring the environment with the power of social media analysis and distributed computing.
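    The sketch below illustrates the sentiment side of such a pipeline: scoring tweets with a lexicon-based analyser and lining daily averages up against observation records. The tweets, the water-quality figures, and the field names are made-up examples, and the distributed Hadoop/NoSQL storage layer is not reproduced.

```python
# Hedged sketch: daily tweet sentiment compared with hypothetical observation records.
from collections import defaultdict
from statistics import mean
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

tweets = [
    ("2023-06-01", "The beach water looks crystal clear today, amazing!"),
    ("2023-06-01", "So much plastic washed up on the shore, disgusting."),
    ("2023-06-02", "Great day for snorkelling, the reef looks healthy."),
]
# Hypothetical daily water-quality index from monitoring stations (0-100).
observations = {"2023-06-01": 62, "2023-06-02": 81}

daily_sentiment = defaultdict(list)
for day, text in tweets:
    daily_sentiment[day].append(sia.polarity_scores(text)["compound"])

# Side-by-side view of average public sentiment and the observed index.
for day in sorted(observations):
    print(day, round(mean(daily_sentiment[day]), 3), observations[day])
```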