Dear all,
Please note that the speaker will join via Zoom, but the event will be set up in Rogers 230 (different from the usual location) for everyone to attend.
We investigate methods that allow computer vision architectures to self-improve on unlabelled data by exploiting rich regularities of the natural world. As a starting point, we embrace the fact that the world is 3D, and design neural architectures that map RGB-D observations into 3D feature maps. This representation allows us to generate self-supervision objectives from other regularities: two objects cannot occupy the same location at once, and multiple views of a scene are related by geometry. We use these facts to train viewpoint-invariant 3D features without supervision, yielding improvements in object detection and tracking. We then discuss entity-centric architectures in which entities are informed by associative retrieval or by reconstruction feedback, and show that they generalize better than models without memory or without reconstruction feedback. We then shift focus to extracting information from dynamic scenes. We propose a way to improve motion estimation itself by revisiting the classic concept of “particle videos”. Using learned temporal priors and within-inference optimization, we can track points across occlusions and outperform flow-based and feature-matching methods on fine-grained multi-frame correspondence tasks.
https://engineering.oregonstate.edu/EECS/research/AI
Rajesh Mangannavar,