Anomaly Detection and Probabilistic Diagnosis for Automated Data Quality Control
Advances in sensor technology are greatly expanding the range of quantities that can be measured while simultaneously reducing the cost. However, deployed sensors drift out of calibration and fail, so every sensor network requires quality control procedures to promptly detect these failures. To address these problems, we propose a two-level architecture, SENSOR-DX, for automated quality control. SENSOR-DX is based on defining a collection of views of the network, where each view captures the behavior of one or more sensors at one or more sites over a specified time interval. The lower level of SENSOR-DX consists of anomaly detectors trained for each view. Each detector produces an anomaly score based on the sensor readings in its view. The upper level of SENSOR-DX performs probabilistic reasoning over these anomaly scores to infer which individual sensors are malfunctioning. SENSOR-DX thus combines the enhanced ability to detect sensor failures that comes from modeling correlations among multiple sensors with the ability of probabilistic inference to determine which individual sensors are malfunctioning. This dissertation also studies two subproblems that arise as part of SENSOR-DX. First, the data collected from sensors may contain missing values, which existing anomaly detection methods cannot handle. We studied various methods for addressing this and concluded that two of them, proportional distribution and imputation, work best. Second, some weather variables, such as precipitation, have difficult probability distributions that are not handled well by general-purpose anomaly detection methods. We study special-purpose models that predict the amount of precipitation at each station as a function of the precipitation observed at neighboring stations. We find that a conditional mixture model gives the most effective anomaly detection for this task.
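As a rough illustration of the upper level described above, the following Python sketch infers which sensors are likely broken from a set of flagged views. It is not the SENSOR-DX model itself: the function name `diagnose`, the parameters `prior`, `p_detect`, and `p_false`, and the assumption that a view is flagged with high probability whenever any sensor it covers is broken are all hypothetical choices made for this example. It also assumes the lower-level anomaly scores have already been thresholded into yes/no flags.

```python
from itertools import product


def diagnose(views, flagged, prior=0.1, p_detect=0.9, p_false=0.05):
    """Posterior probability that each sensor is broken (brute-force enumeration).

    views:    list of tuples naming the sensors each view covers
    flagged:  list of bools, True if that view's anomaly score crossed a threshold
    prior:    assumed prior probability that any one sensor is broken
    p_detect: assumed chance a view is flagged when it contains a broken sensor
    p_false:  assumed chance a view is flagged when all its sensors are healthy
    """
    sensors = sorted({s for view in views for s in view})
    total = 0.0
    marginal = {s: 0.0 for s in sensors}

    # Enumerate every joint assignment of broken/healthy states.
    # This is exponential in the number of sensors, so it only suits toy cases.
    for states in product([False, True], repeat=len(sensors)):
        broken = dict(zip(sensors, states))

        # Prior weight of this joint assignment (sensors fail independently).
        weight = 1.0
        for is_broken in states:
            weight *= prior if is_broken else 1.0 - prior

        # Likelihood of the observed flags under this assignment.
        for view, is_flagged in zip(views, flagged):
            p_flag = p_detect if any(broken[s] for s in view) else p_false
            weight *= p_flag if is_flagged else 1.0 - p_flag

        total += weight
        for s in sensors:
            if broken[s]:
                marginal[s] += weight

    return {s: marginal[s] / total for s in sensors}


# Three sensors, three single-sensor views, and three pairwise views.
views = [("A",), ("B",), ("C",), ("A", "B"), ("B", "C"), ("A", "C")]
# Every view that includes sensor B is flagged; the others are not.
flagged = [False, True, False, True, True, False]
posterior = diagnose(views, flagged)
```

In this toy case, the only sensor shared by all flagged views is B, so its posterior is by far the largest. The multi-sensor views provide the extra evidence, and the inference step attributes it to an individual sensor.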
Major Advisor: Thomas Dietterich
Committee: Alan Fern
Committee: Fuxin Li
Committee: Mike Rosulek
GCR: Debashis Mondal
Tuesday, April 7 at 1:00pm to 3:00pm
Virtual Event