Learning to Integrate Vision, Language, and Action

Stefan Lee, Assistant Professor
School of Electrical Engineering and Computer Science
Oregon State University

Recently, the computer vision (CV) and natural language processing (NLP) communities have begun to approach problems that integrate perception and language understanding with simplified control tasks in simulated environments. Examples include learning to follow the instruction "Walk out of the bathroom, turn left, go downstairs and wait near the coat rack" or navigating never-before-seen environments to answer the question "What color is the car?" This talk will provide a brief introduction to this research area and discuss some of my recent work in embodied question answering and vision-and-language navigation. These embodied tasks are often motivated in the context of robotics; however, there has been limited work on actual deployment. Throughout the talk, I'll highlight opportunities and challenges for transferring work in this area to real robots.

Stefan Lee is an assistant professor in EECS at Oregon State. Before coming to OSU, he was a research scientist in the School of Interactive Computing at Georgia Tech, working on multimodal vision-and-language understanding problems. He obtained his Ph.D. in 2016 from Indiana University and was then a postdoctoral fellow at Virginia Tech. He won a best paper award at EMNLP 2017 and has received seven outstanding reviewer awards at major machine learning and computer vision conferences.

Friday, November 22 at 10:00am to 11:00am

Rogers Hall, 226
2000 SW Monroe Avenue, Corvallis, OR 97331

Lecture or Presentation

Electrical Engineering and Computer Science, Mechanical, Industrial, and Manufacturing Engineering
Dylan Jones

