110 SW Park Terrace, Corvallis, OR 97331

Free Event

Pixel- and Frame-level Video Labeling using Spatial and Temporal Convolutional Networks

This research report addresses the problem of video labeling, at both the frame and pixel levels, using deep learning.

For semantic pixel labeling, in our initial work we have developed the recurrent temporal deep field (RTDF). RTDF is a conditional random field (CRF) that combines a deconvolutional neural network and a recurrent temporal restricted Boltzmann machine (RTRBM), and can be jointly trained end-to-end. We have derived a mean-field inference algorithm that jointly predicts all latent variables in both the RTRBM and the CRF. Our previous work on pixel labeling has also addressed boundary flow estimation using a fully convolutional Siamese network (FCSN), which first estimates object boundaries in two consecutive frames and then predicts boundary correspondences between the two frames.

For frame labeling, we have specified a temporal deformable residual network (TDRN), which computes two parallel temporal streams: i) a residual stream that analyzes video information at its full temporal resolution, and ii) a pooling/unpooling stream that captures long-range visual cues. The former facilitates local, fine-scale action segmentation, while the latter uses multiscale context to improve the accuracy of frame classification.

Leveraging this previous work, we propose two related lines of research. The first study will introduce new regularizations into the learning of temporal convolutional networks (TCNs), aimed at extracting meaningful temporal patterns and their relevance scores for frame-level labeling. The second study will guide a 3D convolutional (C3D) segmentation network for pixel-level labeling using only action- or activity-level labels as supervision.
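The two parallel streams of TDRN can be illustrated with a minimal NumPy sketch. This is an illustrative simplification, not the actual implementation: the function names are invented here, pooling is reduced to stride-2 subsampling, and unpooling to nearest-neighbor upsampling.

```python
import numpy as np

def temporal_conv(x, w):
    # Same-padded 1D convolution over the time axis (one value per frame).
    return np.convolve(x, w, mode="same")

def two_stream(x, w_res, w_pool):
    # Residual stream: operates at full temporal resolution, so it
    # preserves local, fine-scale cues for action segmentation.
    res = x + temporal_conv(x, w_res)

    # Pooling/unpooling stream: halve the temporal resolution so the
    # same kernel covers a longer time span (long-range context),
    # then upsample back to frame rate.
    pooled = x[::2]                            # simplified pooling
    ctx = temporal_conv(pooled, w_pool)
    unpooled = np.repeat(ctx, 2)[: len(x)]     # simplified unpooling

    # Fuse both streams into per-frame features for classification.
    return res + unpooled
```

Here `x` stands in for a per-frame feature sequence; in the actual network both streams are learned convolutional layers rather than fixed kernels.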

Major Advisor: Sinisa Todorovic
Committee: Fuxin Li
Committee: Xiaoli Fern
Committee: Raviv Raich
GCR: Leonard Coop
