About this Event
110 SW Park Terrace, Corvallis, OR 97331
Pixel- and Frame-level Video Labeling using Spatial and Temporal Convolutional Networks
This research report addresses the problem of video labeling, at both the frame and pixel levels, using deep learning.

For semantic pixel labeling, our initial work developed the recurrent temporal deep field (RTDF). The RTDF is a conditional random field (CRF) that combines a deconvolutional neural network with a recurrent temporal restricted Boltzmann machine (RTRBM), and can be jointly trained end-to-end. We have derived a mean-field inference algorithm that jointly predicts all latent variables in both the RTRBM and the CRF. Our previous work on pixel labeling has also addressed boundary flow estimation with a fully convolutional Siamese network (FCSN). The FCSN first estimates object boundaries in two consecutive frames and then predicts boundary correspondences between the two frames.

For frame labeling, we have specified a temporal deformable residual network (TDRN), which computes two parallel temporal streams: i) a residual stream that analyzes video information at its full temporal resolution, and ii) a pooling/unpooling stream that captures long-range visual cues. The former facilitates local, fine-scale action segmentation, while the latter uses multiscale context to improve the accuracy of frame classification.

Leveraging our previous work, we propose two related lines of research. The first study will introduce new regularizations in learning a temporal convolutional network (TCN), aimed at extracting meaningful temporal patterns and their relevance scores for frame-level labeling. The second study will guide a 3D convolutional (C3D) segmentation network toward pixel-level labeling using only action- or activity-level labels as supervision.
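The two-stream idea behind the TDRN can be illustrated with a minimal NumPy sketch. This is not the actual TDRN (which uses learned, deformable temporal convolutions); here the residual stream is stood in for by temporal smoothing with a skip connection at full resolution, the pooling/unpooling stream by a stride-2 max-pool followed by smoothing and nearest-neighbor unpooling, and the classifier head `w` is an untrained placeholder:

```python
import numpy as np

def residual_stream(x, k=3):
    """Full-resolution stream: temporal smoothing plus a residual
    (skip) connection, preserving fine-scale frame detail.
    x: (T, D) array of per-frame features."""
    T, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    smoothed = np.stack([xp[t:t + k].mean(axis=0) for t in range(T)])
    return x + smoothed

def pooling_stream(x, k=3):
    """Low-resolution stream: temporal max-pool -> smooth -> unpool,
    trading temporal detail for longer-range context."""
    T, D = x.shape
    Te = T + (T % 2)                                # pad to even length
    xp = np.pad(x, ((0, Te - T), (0, 0)), mode="edge")
    pooled = xp.reshape(Te // 2, 2, D).max(axis=1)  # stride-2 max-pool
    ctx = residual_stream(pooled, k)                # smooth at coarse scale
    return np.repeat(ctx, 2, axis=0)[:T]            # nearest-neighbor unpool

def frame_scores(x, w):
    """Fuse both streams by concatenation and score every frame with a
    linear head `w` (a stand-in for a trained classifier)."""
    fused = np.concatenate([residual_stream(x), pooling_stream(x)], axis=1)
    return fused @ w  # (T, num_classes)

rng = np.random.default_rng(0)
T, D, C = 9, 4, 3
x = rng.standard_normal((T, D))      # hypothetical per-frame features
w = rng.standard_normal((2 * D, C))  # hypothetical classifier weights
scores = frame_scores(x, w)
print(scores.shape)  # (9, 3): one class score vector per frame
```

The key design point carried over from the abstract is that both streams run over the whole video and are fused per frame, so the final labeling sees both full-resolution detail and pooled long-range context.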
Major Advisor: Sinisa Todorovic
Committee: Fuxin Li
Committee: Xiaoli Fern
Committee: Raviv Raich
GCR: Leonard Coop