Optimal Crowdsourced Classification with a Reject Option in the Presence of Spammers
We explore the design of an effective crowdsourcing system for an M-ary
classification task. Crowd workers complete simple binary microtasks whose
results are aggregated to give the final decision. We consider the scenario
where workers have a reject option, so they may skip microtasks when they are
unable or choose not to respond.
We present an aggregation approach using a weighted majority voting
rule, where each worker's response is assigned an optimized weight to maximize
the crowd's classification performance.
Comment: submitted to ICASSP 201
Multi-object Classification via Crowdsourcing with a Reject Option
Consider designing an effective crowdsourcing system for an M-ary
classification task. Crowd workers complete simple binary microtasks whose
results are aggregated to give the final result. We consider the novel scenario
where workers have a reject option so they may skip microtasks when they are
unable or choose not to respond. For example, in mismatched speech
transcription, workers who do not know the language may not be able to respond
to microtasks focused on phonological dimensions outside their categorical
perception. We present an aggregation approach using a weighted majority voting
rule, where each worker's response is assigned an optimized weight to maximize
the crowd's classification performance. We evaluate system performance in both
exact and asymptotic forms. Further, we consider the setting where there may be
a set of greedy workers that complete microtasks even when they are unable to
perform them reliably. We consider an oblivious and an expurgation strategy to
deal with greedy workers, developing an algorithm to adaptively switch between
the two based on the estimated fraction of greedy workers in the anonymous
crowd. Simulation results show improved performance compared with conventional
majority voting.
Comment: two column, 15 pages, 8 figures, submitted to IEEE Trans. Signal Proces
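The weighted majority voting rule with a reject option described in this abstract can be illustrated with a minimal sketch. It assumes that the class index is encoded by N binary microtasks (so M = 2^N), that a skipped microtask is recorded as None, and that per-worker weights are already given. The names weighted_vote and classify and the weight values are hypothetical; the optimized weights derived in the paper are not reproduced here.

```python
from typing import List, Optional, Sequence


def weighted_vote(responses: Sequence[Optional[int]],
                  weights: Sequence[float]) -> int:
    """Aggregate one binary microtask.

    Each response is 1, 0, or None (the worker used the reject option).
    """
    score = 0.0
    for response, weight in zip(responses, weights):
        if response is None:
            # A skipped microtask contributes nothing to the decision.
            continue
        score += weight if response == 1 else -weight
    # Ties are broken toward 1; this is an arbitrary choice for the sketch.
    return 1 if score >= 0 else 0


def classify(bit_responses: List[Sequence[Optional[int]]],
             weights: Sequence[float]) -> int:
    """Combine N binary microtask decisions into a class index in [0, 2**N)."""
    label = 0
    for per_task in bit_responses:
        label = (label << 1) | weighted_vote(per_task, weights)
    return label


# Hypothetical weights: worker 0 is trusted more than workers 1 and 2.
example_weights = [3.0, 1.0, 1.0]
example_responses = [
    [1, 0, 0],     # microtask 1: weighted vote gives 1, plain majority gives 0
    [None, 0, 0],  # microtask 2: worker 0 skips, the rest vote 0
]
print(classify(example_responses, example_weights))  # prints 2
```

With equal weights, the same function reduces to conventional majority voting over the workers who responded.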
Bifacial dye-sensitized solar cells : a strategy to enhance overall efficiency based on transparent polyaniline electrode
Dye-sensitized solar cells (DSSCs) are a promising solution to global energy and environmental problems
because they are clean and low-cost, with high efficiency, good durability, and easy fabrication. However, enhancing the
efficiency of DSSCs remains an important issue. Here we devise a bifacial DSSC based on a transparent
polyaniline (PANI) counter electrode (CE). Owing to simultaneous sunlight irradiation from the front
and the rear sides, more dye molecules are excited and more carriers are generated, which results in the
enhancement of short-circuit current density and therefore overall conversion efficiency. The photoelectric
properties of PANI can be improved by modification with 4-aminothiophenol (4-ATP). The bifacial DSSC
with 4-ATP/PANI CE achieves a light-to-electric energy conversion efficiency of 8.35%, which is
about 24.6% higher than that of the DSSC irradiated from the front only. This new concept, along with promising
results, provides a new approach for enhancing the photovoltaic performance of solar cells.
The authors acknowledge the joint financial support of the National High Technology Research and Development Program of China (No. 2009AA03Z217), the National Natural Science Foundation of China (nos. 90922028, U1205112, 51002053, 61306077), the Seed Fund from Ocean University of China, and the Fundamental Research Funds for the Central Universities (201313001).
On Classification in Human-driven and Data-driven Systems
Classification systems are ubiquitous, and the design of effective classification algorithms has been an even more active area of research since the emergence of machine learning techniques. Despite the significant efforts devoted to training and feature selection in classification systems, misclassifications do occur, and their effects can be critical in various applications. The central goal of this thesis is to analyze classification problems in human-driven and data-driven systems with potentially unreliable components, and to design effective strategies that ensure reliable and effective classification in such systems. The components/agents in the system can be machines and/or humans. The system components can be unreliable for a variety of reasons, such as faulty machines, security attacks causing machines to send falsified information, unskilled human workers sending imperfect information, or human workers providing random responses. This thesis first quantifies the effect of such unreliable agents on the classification performance of the systems. It then designs schemes that mitigate misclassifications and their effects by adapting the behavior of the classifier to samples from machines and/or humans, ensuring effective and reliable overall classification.
In the first part of this thesis, we study the case when only humans are present in the systems, and consider crowdsourcing systems. Human workers in crowdsourcing systems observe the data and respond individually by providing label-related information to a fusion center in a distributed manner. In such systems, we consider unskilled human workers who have a reject option, so that they may choose not to provide information regarding the label of the data. To maximize the classification performance at the fusion center, an optimal aggregation rule is proposed to fuse the human workers' responses in a weighted majority voting manner.
Next, the presence of unreliable human workers, referred to as spammers, is considered. Spammers are human workers that provide random guesses regarding the data label information to the fusion center in crowdsourcing systems. The effect of spammers on the overall classification performance is characterized when the spammers can strategically respond to maximize their reward in reward-based crowdsourcing systems. For such systems, an optimal aggregation rule is proposed by adapting the classifier based on the responses from the workers.
The next line of human-driven classification is considered in the context of social networks. The problem is to classify whether a person is influential in propagating information in a social network. Since knowledge of the social network structure is not always available, influential agent classification is studied without knowing that structure. A multi-task low-rank linear influence model is proposed to exploit the relationships between different information topics. The proposed approach can simultaneously predict the volume of information diffusion for each topic and automatically classify the influential nodes for each topic.
In the third part of the thesis, a data-driven decentralized classification framework is developed where machines interact with each other to perform complex classification tasks. However, the machines in the system can be unreliable due to a variety of reasons such as noise, faults and attacks. Providing erroneous updates leads the classification process in a wrong direction, and degrades the performance of decentralized classification algorithms. First, the effect of erroneous updates on the convergence of the classification algorithm is analyzed, and it is shown that the algorithm linearly converges to a neighborhood of the optimal classification solution. Next, guidelines are provided for network design to achieve faster convergence. Finally, to mitigate the impact of unreliable machines, a robust variant of ADMM is proposed, and its resilience to unreliable machines is shown with an exact convergence to the optimal classification result.
The final part of research in this thesis considers machine-only data-driven classification problems. First, the fundamentals of classification are studied in an information-theoretic framework. We investigate the nonparametric classification problem for arbitrary unknown composite distributions in the asymptotic regime where both the sample size and the number of classes grow exponentially large. The notion of discrimination capacity is introduced, which captures the largest exponential growth rate of the number of classes relative to the sample size such that there exists a test with asymptotically vanishing probability of error. Error exponent analysis using the maximum mean discrepancy is provided, and the discrimination rate, i.e., a lower bound on the discrimination capacity, is characterized. Furthermore, an upper bound on the discrimination capacity based on Fano's inequality is developed.
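To make the role of the maximum mean discrepancy (MMD) more concrete, the following is a minimal sketch of a nonparametric nearest-class rule built on it. It is not the test analyzed in the thesis. The scalar samples, the Gaussian kernel, the bandwidth, the biased estimator, and the function names are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed parameter."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[float], ys: Sequence[float],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(us: Sequence[float], vs: Sequence[float]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def classify_by_mmd(test_sample: Sequence[float],
                    training_samples: Dict[str, Sequence[float]],
                    bandwidth: float = 1.0) -> str:
    """Return the class whose training sample is closest to the test sample in MMD."""
    return min(
        training_samples,
        key=lambda name: mmd_squared(test_sample, training_samples[name], bandwidth),
    )


# Hypothetical data: two classes with well-separated samples.
training = {
    "a": [0.0, 0.1, -0.1, 0.2],
    "b": [5.0, 5.1, 4.9, 5.2],
}
print(classify_by_mmd([0.05, -0.05, 0.15], training))  # prints a
```

No class distribution is assumed to be known here; each class is represented only by its training sample, which matches the nonparametric setting described above.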
Learning Graph Neural Networks with Approximate Gradient Descent
The first provably efficient algorithm for learning graph neural networks
(GNNs) with one hidden layer for node information convolution is provided in
this paper. Two types of GNNs are investigated, depending on whether labels are
attached to nodes or graphs. A comprehensive framework for designing and
analyzing convergence of GNN training algorithms is developed. The algorithm
proposed is applicable to a wide range of activation functions including ReLU,
Leaky ReLU, Sigmoid, Softplus and Swish. It is shown that the proposed algorithm
guarantees a linear convergence rate to the underlying true parameters of GNNs.
For both types of GNNs, sample complexity in terms of the number of nodes or
the number of graphs is characterized. The impact of feature dimension and GNN
structure on the convergence rate is also theoretically characterized.
Numerical experiments are further provided to validate our theoretical
analysis.
Comment: 23 pages, accepted at AAAI 202
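For reference, the activation functions named in this abstract can be written out as a short sketch using their standard scalar definitions. The Leaky ReLU slope of 0.01 and the Swish form x * sigmoid(x) are common defaults assumed here; the abstract does not state which parameter values the paper uses.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # slope is an assumed default for the negative side.
    return x if x > 0 else slope * x


def sigmoid(x: float) -> float:
    # Branch on the sign so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    # Equivalent to log(1 + exp(x)), rewritten to avoid overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float) -> float:
    # Assumed form with unit scaling: x * sigmoid(x).
    return x * sigmoid(x)


for name, fn in [("relu", relu), ("leaky_relu", leaky_relu),
                 ("sigmoid", sigmoid), ("softplus", softplus), ("swish", swish)]:
    print(name, fn(-2.0), fn(0.0), fn(2.0))
```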
