Introduction

In this course, we are interested in vision in The Real World: a world of moving 3D objects and scenes in which the imaging systems are not ideal, and in which objects, noise, illumination, and motion are all difficult to constrain in advance.

We are not interested in single static images, nor in 2-D Flat Worlds, Blocks Worlds, or other such constrained worlds. This eliminates many approaches and algorithms.


The standard model of vision: Vision As Recovery

An imaging model is a mapping from \(R^3\), the 3D scene, to \(R^2\), the image plane. That is, it maps a 3D scene into a 2D image of that scene.
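
As one concrete instance, assume an ideal pinhole camera with focal length \(f\) and optical axis along \(Z\). This is an assumption made purely for illustration; these notes do not commit to a particular imaging model, and the symbols \(f\), \((X, Y, Z)\) and \((x, y)\) are introduced only for this sketch. Under that assumption, a scene point \((X, Y, Z)\) with \(Z > 0\) maps to the image point

\[ (x, y) = \left( f \frac{X}{Z},\; f \frac{Y}{Z} \right). \]

Real imaging systems depart from this ideal, which is part of what makes the real-world setting hard.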

The standard model of vision is that the purpose of vision is to invert the imaging model. That is, given an image, recover (reconstruct) the scene: determine the shapes and locations of all objects in the scene.
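
To make "inverting the imaging model" concrete, here is a minimal Python sketch. It assumes an ideal pinhole camera, which is an illustrative choice and not a model fixed by these notes; the function names `project` and `points_along_ray` are hypothetical. It shows that several 3D points at different depths produce the same image point, so a single image point does not by itself determine the scene point it came from.

```python
def project(point, focal_length=1.0):
    """Forward imaging model (assumed pinhole): 3D point (X, Y, Z) -> image (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


def points_along_ray(image_point, depths, focal_length=1.0):
    """Candidate 3D points that are consistent with one image point.

    For each assumed depth Z, back-project the image point onto the plane
    at that depth. Every returned point lies on the same viewing ray.
    """
    x, y = image_point
    return [(x * Z / focal_length, y * Z / focal_length, Z) for Z in depths]


# One image point, three assumed depths: three different scene points.
candidates = points_along_ray((0.5, 0.25), depths=[1.0, 2.0, 4.0])

for p in candidates:
    # Each candidate re-projects to the same image point.
    print(p, "->", project(p))
```

In this sketch the inverse is one-to-many: recovery needs extra information (more views, motion, or prior assumptions about the scene) to pick out one candidate.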


There are attractive elements to the "Vision As Recovery" approach:

Because recovery is so general and its errors quantifiable, it has been the standard model for understanding what vision "is" for a long time.


This approach is called passive vision. We do not actively choose images or goals. In this course we will consider passive vision for 3D and motion recovery.


As we review this literature, we will find that the general problem of scene recovery from passive imagery is far from satisfactorily solved. Only a few very special cases of this approach have succeeded:


Why is recovery so difficult?


Given these difficulties, perhaps we should look at the best existing systems, namely natural ones, for clues to alternative approaches.

  • E.g., a frog waiting for an insect to fly by. It does not need to recover the scene, only to detect moving objects and estimate the distance to them.
  • E.g., a bee flying to the hive. It does not need to recover the scenes it confronts on the way; it just needs to recognize a few landmarks and avoid obstacles.
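
As a rough illustration of how a task such as "detect moving objects" can be served without recovering the scene, here is a minimal Python sketch of frame differencing. It is not a model of how a frog's visual system works. The static camera, the grayscale frames stored as nested lists, the threshold value, and the names `changed_pixels` and `centroid` are all assumptions made for this example.

```python
def changed_pixels(prev_frame, curr_frame, threshold=20):
    """Return (row, col) positions whose intensity changed by more than threshold."""
    moved = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (before, after) in enumerate(zip(prev_row, curr_row)):
            if abs(after - before) > threshold:
                moved.append((r, c))
    return moved


def centroid(pixels):
    """Mean (row, col) of the changed pixels, or None if nothing changed."""
    if not pixels:
        return None
    mean_row = sum(r for r, _ in pixels) / len(pixels)
    mean_col = sum(c for _, c in pixels) / len(pixels)
    return (mean_row, mean_col)


# Two tiny grayscale frames: a bright spot moves one pixel to the right.
frame_a = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame_b = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

moved = changed_pixels(frame_a, frame_b)
print("changed pixels:", moved)
print("where to look:", centroid(moved))
```

Nothing here estimates shape or 3D position; the output is only an image location where something changed, which may already be enough to drive a behavior.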

Purposive vision

Recovery is too difficult and produces much information that is not needed, so vision as recovery wastes important resources. Here is an alternative to Vision As Recovery: vision exists not to recover the scene but to support specific behaviors and tasks. We refer to this as Purposive Vision. From this point of view, all representations, algorithms and strategies should be task-dependent, not geared toward recovery.


Vision vs. Visions

Thus there is not one "vision"; there are many "visions." Vision for a cheetah chasing its prey should use quite different algorithms than vision for an ant seeking food, or for a cheetah returning to its den. Those interested in purposive vision do not ask how things see in general, but rather how vision-enabled system X supports task Y. X can be a cheetah, a human, or a CCD camera linked to a computer. Y can be egomotion estimation, obstacle avoidance, object detection, tracking, etc.


So purposive vision is about selecting representations, algorithms and strategies that fit with:

  • E.g., goal: homing;

It was the dream of the Vision As Recovery scientists to devise one set of algorithms, representations and schemes for all vision. But it is hard to imagine that the same algorithm that optimizes the use of a bee's eye in homing will be useful for guiding a walking human or an anti-missile missile. Each has its own needs, and each requires an algorithm optimized for those needs.


Purposive vision is also active vision, in the sense that it is linked with the selection of future views and the integration of vision with dynamic behaviors. In addition to early and late passive vision, in this course we will consider: