23 research outputs found

    Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation

    Traffic speed data imputation is a fundamental challenge for data-driven transport analysis. In recent years, with the ubiquity of GPS-enabled devices and the widespread use of crowdsourcing alternatives for the collection of traffic data, transportation professionals increasingly look to such user-generated data for many analysis, planning, and decision support applications. However, due to the mechanics of the data collection process, crowdsourced traffic data such as probe-vehicle data is highly prone to missing observations, making accurate imputation crucial for the success of any application that makes use of that type of data. In this article, we propose the use of multi-output Gaussian processes (GPs) to model the complex spatial and temporal patterns in crowdsourced traffic data. While the Bayesian nonparametric formalism of GPs allows us to model observation uncertainty, the multi-output extension based on convolution processes effectively enables us to capture complex spatial dependencies between nearby road segments. Using 6 months of crowdsourced traffic speed data or "probe vehicle data" for several locations in Copenhagen, the proposed approach is empirically shown to significantly outperform popular state-of-the-art imputation methods. Comment: 10 pages, IEEE Transactions on Intelligent Transportation Systems, 201

    Application of the Empirical Bayes Method with the Finite Mixture Model for Identifying Accident-Prone Spots

    Hotspot identification (HSID) is an important component of the highway safety management process. A number of methods have been proposed to identify hotspots. Among these methods, previous studies have indicated that the empirical Bayes (EB) method can outperform other methods for identifying hotspots, since the EB method combines the historical crash records of the site with the expected number of crashes obtained from a safety performance function (SPF) for similar sites. However, SPFs are usually developed from a large number of sites, which may be heterogeneous in their traffic characteristics. As a result, the hotspot identification accuracy of EB methods can be affected by the SPFs when heterogeneity is present in the crash data. Thus, it is necessary to consider the heterogeneity and homogeneity of roadway segments when using EB methods. To address this problem, this paper proposes three different classification-based EB methods to identify hotspots. Rural highway crash data collected in Texas were analyzed and classified into different groups using the proposed methods. Based on the modeling results for the Texas crash dataset, one proposed classification-based EB method performs better than the standard EB method as well as other HSID methods.
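The way the standard EB method blends a site's crash history with an SPF prediction can be illustrated with a minimal sketch. It assumes the commonly used negative binomial form, in which the weight on the SPF prediction is 1 / (1 + overdispersion × SPF mean); the function names, the site data, and the overdispersion value are hypothetical and are not taken from the paper, and the classification-based variants it proposes are not shown.

```python
def eb_estimate(observed, spf_mean, overdispersion):
    """Empirical Bayes expected crash count for one site.

    Assumes a negative binomial SPF with variance
    spf_mean + overdispersion * spf_mean**2 (an assumption of this sketch).
    """
    # Weight given to the SPF prediction; the remainder goes to the site's own record.
    weight = 1.0 / (1.0 + overdispersion * spf_mean)
    return weight * spf_mean + (1.0 - weight) * observed


def rank_hotspots(sites, overdispersion):
    """Return site names ordered from highest to lowest EB estimate."""
    scored = {
        name: eb_estimate(observed, spf_mean, overdispersion)
        for name, (observed, spf_mean) in sites.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical sites: name -> (observed crashes, SPF-predicted crashes).
sites = {"A": (6, 2.0), "B": (5, 4.0), "C": (1, 1.0)}
ranking = rank_hotspots(sites, overdispersion=0.5)
print(ranking)
```

In this made-up example site B ranks above site A even though A recorded more crashes, because B's SPF prediction is higher and pulls its estimate up.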

    A Framework for Understanding and Addressing Bias and Sparsity in Mobile Location-Based Traffic Data

    Thesis (Ph.D.)--University of Washington, 2018. Traffic data derived from Global Positioning System (GPS) traces of individual travelers is achieving widespread adoption in transportation engineering and planning, practice, and research. Currently, the majority of such data is obtained from commercial sources, which provide little information about the processes and quality control methods that have been applied to address informative missing data patterns and sampling bias. Looking forward to a future of connected and autonomous vehicles, when fixed mechanical sensing will likely be a thing of the past, there is a growing need to highlight this issue and develop methods to address bias in a principled way. To do this, it is necessary to understand the sampling mechanisms and their impact on missing data and bias. The goal of this work is to describe the mechanisms leading to bias, inaccuracy, and missing data in GPS-based probe vehicle data, and to quantify their impact. It is most often the case that commercial probe vehicle data is collected from multiple traveler subpopulations, each with a distinct driving profile, data collection technology, and penetration rate. Thus, this work develops a framework for estimating the impact of these factors on data completeness and bias under heterogeneous driver populations and data collection technologies. This framework is validated using microscopic traffic simulation software under a range of sampling and traffic conditions. The implications of the estimation framework are investigated with respect to real-world probe vehicle datasets and transportation applications. The primary contributions of this work are as follows. First, this work develops a mathematical framework for describing the relationship between observed data and the true on-road traffic conditions under different sampling parameters and mixed vehicle populations. Second, this work presents an in-depth analysis of the impact of sampling and traffic parameters on the statistical representation of real-world probe vehicle data. Finally, a set of case studies is presented illustrating how the proposed framework can be used to improve probe vehicle data quality and fidelity, including the development of a methodology for addressing sampling bias. The methods and guidance provided in this work will be of significant value to public agencies wishing to use probe vehicle data for various forms of transportation analysis, and will inform experimental design and data acquisition agreements for future data collection efforts. Further, this work will support future work in missing data imputation and quality assessment.

    Flexible and Robust Treatments for Missing Traffic Sensor Data

    Thesis (Master's)--University of Washington, 2014. The focus of the work contained in this thesis is missing data treatments in traffic loop detector datasets. This work is motivated by the need to improve data quality and coverage for performance reporting and system management decisions. Missing data, whether due to hardware malfunction or error detection and removal, is a critical concern in loop detector data quality control in Washington State and elsewhere, and can quickly become the controlling factor in overall data quality as the rate of missingness increases. First, the various causal factors and resulting patterns of missingness in loop detector datasets are discussed with respect to the assumptions underlying common missing data treatments. Next, two multiple imputation methodologies are introduced for loop detector data, which have seen use in a number of fields but have not yet been applied to traffic data. These methods are able to take advantage of the various spatial correlation structures present in volume and speed data, and can produce reliable imputation even under high rates of missingness and missing entire days and months. The proposed imputation algorithms are demonstrated in different locations, time periods, and missing data patterns, and are shown to be capable of reliably representing the statistical properties of the true data. Aggregation levels, model structure, and limitations of the proposed methods are discussed, and some guidelines for implementation are presented. The proposed algorithms are designed to be incorporated into a comprehensive quality control process for traffic data, to be implemented as part of the STAR Lab DRIVE Net data analysis, visualization, and dissemination platform.

    Flexible and Robust Method for Missing Loop Detector Data Imputation

    This study is primarily focused on missing traffic sensor data imputation for the purpose of improving the coverage and accuracy of traffic analysis and performance estimation. Missing data, whether attributable to hardware failure or error detection and removal, are a constant problem in loop and other traffic detector data sets. As the rate of missingness increases, the treatment of missing values quickly becomes the controlling factor in overall data quality. Previously, several imputation approaches have been developed for traffic data. However, few studies aim at handling the traffic data with large blocks of missing values for networkwide implementation. A proven predictive mean matching multiple imputation method is introduced; it was applied to loop detector volume data collected on Interstate 5 in Washington State. With the use of the iterative multiple imputation by chained equations approach, the spatial correlation between nearby detectors was considered for prediction, and the presence of missing data was effectively dealt with in all predictors. The proposed methodology is shown to perform well on a range of missing data patterns, including missing completely at random, missing days, and missing months. After the imputation method was applied to 20-s data and postimputation aggregation was performed, the results in this study suggest that the proposed method can outperform elementary pairwise regression and produce reliable imputation estimates, even when entire days and months are missing from the data set. Thus the predictive mean matching multiple imputation method can be used as an effective approach for imputing missing traffic data in a range of challenging scenarios.
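The core idea of predictive mean matching can be sketched for a single target detector and one fully observed neighboring detector. This is a simplified illustration under stated assumptions, not the study's procedure: it performs one imputation pass with a plain least-squares fit, omits the parameter draws and the chained-equations iteration used in full multiple imputation, and the function name `pmm_impute`, the donor count `k`, and the sample volumes are all hypothetical.

```python
import numpy as np


def pmm_impute(x, y, k=3, seed=0):
    """Fill NaNs in y by predictive mean matching on a fully observed predictor x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)

    # Least-squares fit of y on x using only rows where y is observed.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design[~missing], y[~missing], rcond=None)
    predicted = design @ coef

    donor_pred = predicted[~missing]
    donor_obs = y[~missing]
    filled = y.copy()
    for i in np.flatnonzero(missing):
        # The k observed rows whose predicted value is closest to this row's prediction.
        nearest = np.argsort(np.abs(donor_pred - predicted[i]))[:k]
        # Impute the actually observed value of one randomly chosen donor.
        filled[i] = donor_obs[rng.choice(nearest)]
    return filled


# Hypothetical volumes: a neighboring detector and a target detector with gaps.
neighbor = [10, 12, 15, 20, 22, 25]
target = [11, np.nan, 16, 21, np.nan, 26]
filled = pmm_impute(neighbor, target)
print(filled)
```

Because each gap is filled with a value that was actually observed at the target detector, the imputed series stays within the range of real measurements.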

    Estimation of Origin and Destination Information from Bluetooth and Wi-Fi Sensing for Transit

    Public urban transit offers a convenient, affordable, and sustainable mode of transportation for many. However, limited subsidies and revenue collected from bus fares place restrictions on the number of bus lines that can operate; these restrictions in turn limit the number of individuals who can benefit from public transit. To make well-informed operational decisions for transit planning and operations, understanding the origin and destination patterns of riders is crucial. However, traditional methods of transit data collection are labor-intensive and costly. Although some transit agencies use data from smart card transactions to obtain trip origin information readily, the trip destination information cannot be directly inferred. To aid in transit data acquisition efforts, this study presents a new technique that uses Bluetooth and wireless fidelity (Wi-Fi) sensing technologies to estimate the origin and destination information for transit lines. Sensors were installed on four buses to collect Bluetooth, Wi-Fi, and GPS location data for a 4-week period. New methods for data processing and reduction were introduced to exclude invalid detections. On the basis of valid samples, the origin and destination information at different bus stops was estimated for a university-operated transit line. The developed methods have the potential to be applied to large-scale transit networks.

    Hybrid short‐term prediction of traffic volume at ferry terminal based on data fusion
