CSE668 Principles of Animate Vision Spring 2011
 

    1. Introduction

 

In this course, we are interested in vision in

The Real World, a world of moving 3D objects and

scenes, in which the imaging systems are not

ideal, and objects, noise, illumination and

motion are all difficult to constrain in advance.
 
 

We are not interested in single static images, or

2-D Flat Worlds, or Blocks Worlds, or other such

constrained worlds. This eliminates many approaches

and algorithms.
 

 


CSE668 Sp2011 Peter Scott 01-01


 
The standard model of vision: Vision As Recovery
 

An imaging model is a mapping from R^3, the 3D

scene, to R^2, the image plane. That is, it maps

a 3D scene into a 2D image of that scene.
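
As a minimal sketch of one such mapping, the ideal pinhole
(perspective) model sends a scene point (X, Y, Z), given in camera
coordinates, to the image point (f*X/Z, f*Y/Z). The pinhole model,
the function name project_pinhole and the focal_length parameter are
assumptions chosen for illustration; the notes do not commit to a
particular imaging model at this point.

```python
from typing import Tuple


def project_pinhole(
    point: Tuple[float, float, float], focal_length: float = 1.0
) -> Tuple[float, float]:
    """Map a 3D point (X, Y, Z) in camera coordinates to image coordinates (x, y).

    Assumes an ideal pinhole camera at the origin looking along +Z.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# One scene point in R^3 becomes one image point in R^2.
image_point = project_pinhole((2.0, 1.0, 4.0))
print(image_point)  # (0.5, 0.25)
```

Real cameras add lens distortion, pixel scaling and an offset for
the image center, which this sketch leaves out.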
 
 

The standard model of vision is that the purpose

of vision is to invert the imaging model. That is,

given an image, recover (reconstruct) the scene.

Determine the shapes and locations of all

objects in the scene.




There are attractive elements to the "Vision As

Recovery" approach:
 

1. Compatible with all cameras and scenes;

2. Recovery supports any narrower task;

3. Recovery uses visual data maximally;

4. Objectively assessable and quantifiable.
 

Because recovery is so general and its errors

quantifiable, it has been the standard model

for understanding what vision "is" for a

long time.
 




This approach is called passive vision. We do not

actively choose images or goals. In this course we

will consider passive vision for 3D and motion

recovery.


Early passive vision:

    A. 3D imaging models: projective geometry,

       stereopsis, epipolar geometry

    B. Shape recovery: shape from shading, other

       shape-from algorithms, illumination and

       reflection (radiometry), correspondence.

Late passive vision:

    C. 3D object recognition: 3D object-centered,

       2D view-centered, indexing and matching.

    D. Motion analysis: optical flow, structure

       from motion, passive egomotion, tracking.
 
 




 

As we review this literature, we will find that

the general problem of scene recovery from

passive imagery is far from satisfactorily

solved. Only a few very special cases of this

approach have succeeded:
 

    * Blocks World object recognition (OR) systems;

    * Autonomous vehicle navigation systems operating slowly on
      structured roadways;


    * Robots in controlled environments.

 




Why is recovery so difficult?

 

    * Imaging model is many-to-one.

        Recovery is not well posed: the inverse is

        one-to-many, non-robust, underconstrained

        and sensitive.


        Eg: Is this cube tilted up or down?
  
         

 

    * Imaging model has many parameters.

        Eg: intrinsic camera parameters, extrinsic

            parameters, illumination parameters,

            surface reflectance parameters, etc.

            They are hard to identify accurately.

            Motion and shape parameters are hard to

            separate.
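
The many-to-one property listed above can be sketched under an
assumed ideal pinhole model. The helper names project and
back_project are hypothetical, not from the notes. Every scene point
along one viewing ray lands on the same image point, so the image
alone does not determine depth.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of (X, Y, Z) to image coordinates (x, y)."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


def back_project(image_point, depth, focal_length=1.0):
    """Return the scene point at the given depth that projects to image_point."""
    x, y = image_point
    return (x * depth / focal_length, y * depth / focal_length, depth)


image_point = (0.25, 0.5)

# Four different scene points, one for each hypothesized depth.
candidates = [back_project(image_point, depth) for depth in (1.0, 2.0, 4.0, 8.0)]

# All of them map back to the same single image point.
images = {project(p) for p in candidates}
print(candidates)
print(images)  # {(0.25, 0.5)}
```

Inverting the projection therefore needs extra information, such as
a second view or prior knowledge about the scene.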
 
 




 

Given these difficulties, perhaps we should look

at the best existing systems, namely natural ones,

for clues to alternative approaches.
 

    Q: Does biological vision strive for recovery?

    A: Almost never! Biological vision is

       designed to support specific behaviors, not

       to recover every detail of everything it

       sees. It is purposive.
 

Eg: Frog waiting for an insect to fly by. It does

    not need to recover the scene, just to detect

    moving objects and estimate distance to them.
 

Eg: Bee flying to the hive. It does not need to

    recover the scenes it confronts on the way,

    it just needs to recognize a few landmarks

    and do obstacle avoidance.
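
To illustrate how little of the scene a task like the frog's may
require, here is a sketch of motion detection by simple frame
differencing. It is an illustrative stand-in, not a model of frog
vision. The function name moving_pixels, the threshold and the tiny
frames are made up for the example, and distance estimation is not
attempted.

```python
def moving_pixels(prev_frame, next_frame, threshold=10):
    """Return (row, col) positions whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame))
        for c, (a, b) in enumerate(zip(prev_row, next_row))
        if abs(a - b) > threshold
    ]


# A bright "insect" moves one pixel to the right between two frames.
prev_frame = [
    [0, 0, 0, 0],
    [0, 200, 0, 0],
    [0, 0, 0, 0],
]
next_frame = [
    [0, 0, 0, 0],
    [0, 0, 200, 0],
    [0, 0, 0, 0],
]

print(moving_pixels(prev_frame, next_frame))  # [(1, 1), (1, 2)]
```

No shapes or 3D locations are recovered. The output only says where
something changed, which may be enough to trigger a behavior.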
 




Purposive vision

 

Recovery is too difficult and produces much

information that is not needed. Vision as

recovery is wasteful of important resources.


So here is an alternative to Vision As Recovery:

Vision exists not to recover but to support

specific behaviors and tasks. We refer to this

as Purposive Vision. From this point of view,

all representations, algorithms and strategies

should be task-dependent, not set on recovery.
 
 




Vision vs. Visions

 

Thus there is not one "vision" but there are many

"visions." Vision for a cheetah chasing its prey

should have quite different algorithms than vision

for an ant seeking food, or a cheetah returning to

its den.
 

Those interested in purposive vision do not ask,

how in general do things see, but rather, how does

vision-enabled system X support task Y? X can be a

cheetah, a human, or a CCD camera linked to a computer.

Y can be egomotion estimation, obstacle avoidance,

object detection, tracking, etc.
 
 




So purposive vision is about selecting

representations, algorithms and strategies

which fit with:

    A specific goal, task or behavior.

    A given embodiment.

    A given set of environmental constraints.
 

Eg: goal: homing;

    embodiment: bee with multifaceted bee eyes;

    environmental constraints: can fly

    up to 5 mph, must fly at low altitudes also

    occupied by trees and bushes; may sustain

    attack by reptiles and spiders.
 




Eg: behavior: walking;

    embodiment: human being;

    environmental constraints: path is uneven,

    can trip over roots and rocks. Path is

    difficult to see in places. Must divide

    attention between footfall area and area

    ahead to maintain track on path.
 

Eg: task: Scud missile interception;

    embodiment: anti-missile missile with onboard

        forward-looking camera;

    environmental constraints: ballistic target,

        chaff, very high speed intercept.
 
 



 
It was the dream of the Vision As Recovery

scientists to devise one set of algorithms,

representations and schemes for all vision.

But it is hard to imagine that the same

algorithm that optimizes use of a bee eye

in homing will be useful to guide a walking

human or an anti-missile missile. Each has its

own needs, each requires an algorithm optimized

for those needs.
 



 

    Purposive vision is also active vision, in the

    sense that it is linked with selection of future 

    views and integration of vision with dynamic 

    behaviors. In addition to early and late passive

    vision, in this course we will consider:

 

    A. Early active vision: active vision for

       navigation. Egomotion estimation,

       obstacle avoidance, visual servoing,

       homing.

 

    B. Late active vision: active vision for

       recognition and tracking. Active object

       recognition, active tracking.
