Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms
Crowdsourcing is a popular paradigm for meeting the high demand for labeled data in the era of big data. It aims to produce accurate labels by effectively integrating noisy, non-expert labels from crowdsourced workers (annotators). The machine learning community has studied crowdsourcing methods for many years, and many models and algorithms exist for this task. Among these efforts, one of the (arguably) most notable is the expectation maximization (EM) approach proposed by Dawid and Skene in 1979. The algorithm is based on a very simple model, yet it has been quite effective in practice. However, theoretical understanding of the Dawid-Skene approach is still very limited. Recently, tensor algebra-based methods were proposed to establish identifiability of the Dawid-Skene model. However, tensor-based methods have very high sample complexity, since they hinge on third-order statistics of the annotator responses, which are quite hard to estimate in practice. In this work, we propose a simple algebraic algorithm that can efficiently solve large-scale crowdsourcing problems under the Dawid-Skene model with provable identifiability guarantees. We also propose two approaches that can enhance the performance of the algebraic algorithm in more challenging scenarios. The proposed methods use second-order statistics (pairwise co-occurrences) of the annotator responses and thus naturally enjoy much lower sample complexity than tensor-based methods. Experiments show that the proposed algorithms outperform state-of-the-art algorithms under a variety of scenarios.
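To make the second-order statistics concrete: under the Dawid-Skene model, each annotator m has a K×K confusion matrix A_m (column l is the distribution of that annotator's answer when the true class is l), and annotators respond independently given the true label. The joint distribution of two annotators' responses is then the pairwise co-occurrence matrix R_mn = A_m D A_n^T, where D is the diagonal matrix of class priors. The Python sketch below is a minimal illustration under those assumptions (NumPy assumed; the function names and example parameters are hypothetical): it simulates two annotators and checks that simply counting co-occurring answers recovers R_mn. It shows only the statistic the abstract refers to, not the thesis's identification algorithm.

```python
import numpy as np


def simulate_dawid_skene(n_items, priors, confusions, rng):
    """Draw true labels and annotator responses under the Dawid-Skene model (illustrative helper).

    confusions[m][k, l] = P(annotator m answers k | true class l); each column sums to 1.
    Responses are drawn independently across annotators given the true label.
    """
    K = len(priors)
    truth = rng.choice(K, size=n_items, p=priors)
    responses = np.empty((len(confusions), n_items), dtype=int)
    for m, A in enumerate(confusions):
        # Column `truth[j]` of A is the answer distribution for item j.
        cdf = np.cumsum(A[:, truth], axis=0)  # shape (K, n_items)
        u = rng.random(n_items)
        # Inverse-CDF sampling: the answer is the number of cumulative values <= u.
        responses[m] = np.minimum((cdf <= u[None, :]).sum(axis=0), K - 1)
    return truth, responses


def empirical_cooccurrence(resp_m, resp_n, K):
    """Estimate R_mn[k, k'] = P(annotator m says k, annotator n says k') by counting pairs."""
    R = np.zeros((K, K))
    np.add.at(R, (resp_m, resp_n), 1.0)
    return R / len(resp_m)


rng = np.random.default_rng(0)
priors = np.array([0.6, 0.4])
A1 = np.array([[0.8, 0.3], [0.2, 0.7]])  # hypothetical annotator 1
A2 = np.array([[0.9, 0.2], [0.1, 0.8]])  # hypothetical annotator 2

truth, resp = simulate_dawid_skene(20000, priors, [A1, A2], rng)
R_hat = empirical_cooccurrence(resp[0], resp[1], K=2)
R_true = A1 @ np.diag(priors) @ A2.T  # model-implied second-order statistic

print(np.round(R_hat, 3))
print(np.round(R_true, 3))
```

Each entry of R_hat is an average over all items, so its estimation error shrinks roughly like one over the square root of the number of items. Third-order statistics spread the same samples over K×K×K cells per annotator triple, which is one intuition for the sample-complexity gap the abstract describes.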
Major Advisor: Xiao Fu
Committee: Raviv Raich
Committee: Weng-Keen Wong
Committee: Xiaoli Fern
GCR: Adam Schultz
Monday, November 18, 2019 at 1:00pm to 3:00pm
Kelley Engineering Center, 1007
110 SW Park Terrace, Corvallis, OR 97331
Calvin Hughes