CSE668 Principles of Animate Vision
Spring 2011
Peter Scott
1. Introduction

In this course we are interested in vision in The Real World: a world of moving 3D objects and scenes, in which the imaging systems are not ideal, and in which objects, noise, illumination and motion are all difficult to constrain in advance.
We are not interested in single static images, or in 2-D Flat Worlds, Blocks Worlds, or other such constrained worlds. This eliminates many approaches and algorithms.
The standard model of vision: Vision for Recovery

An imaging model is a mapping from R3, the 3D scene, to R2, the image plane: it maps a 3D scene into a 2D image of that scene.
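For concreteness, here is a hedged illustration (the standard pinhole model, assumed here rather than developed in these notes): place the optical center at the origin with the optical axis along $Z$ and focal length $f$; a scene point then projects as

    $(X,\,Y,\,Z) \;\mapsto\; (x,\,y) \;=\; \left(\dfrac{fX}{Z},\ \dfrac{fY}{Z}\right), \qquad Z > 0.$

Depth is divided out of the image coordinates, which already hints at what recovery must undo.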
The standard model of vision is that the purpose
of vision is to invert the imaging model. That is,
given an image, recover (reconstruct) the scene.
Determine the shapes and locations of all
objects in the scene.
There are attractive elements to the "Vision As Recovery" approach:

1. Compatible with all cameras and scenes;
2. Recovery supports any narrower task;
3. Recovery uses visual data maximally;
4. Objectively assessable, quantifiable.
Because recovery is so general and its errors quantifiable, it has been the standard model for understanding what vision "is" for a long time.
This approach is called passive vision. We do not
actively choose images or goals. In this course we
will consider passive vision for 3D and motion
recovery.
Early passive vision:
A. 3D imaging models: projective geometry, stereopsis (a depth-from-disparity sketch follows this outline), epipolar geometry
B. Shape recovery: shape from shading, other shape-from algorithms, illumination and reflection (radiometry), correspondence.

Late passive vision:
C. 3D object recognition: 3D object-centered, 2D view-centered, indexing and matching.
D. Motion analysis: optical flow, structure from motion, passive egomotion, tracking.
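As a minimal, hedged illustration of the stereopsis item above (a sketch under assumed standard conditions, namely a rectified stereo pair with known focal length f in pixels and baseline B in meters, not an algorithm from these notes), depth follows from disparity as Z = f * B / d:

def depth_from_disparity(x_left, x_right, f, baseline):
    """Depth Z (meters) of a point from its disparity in a rectified
    stereo pair with focal length f (pixels) and baseline (meters)."""
    d = x_left - x_right              # disparity in pixels
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    return f * baseline / d

# Hypothetical numbers for illustration: f = 800 px, B = 0.1 m, d = 16 px
print(depth_from_disparity(416.0, 400.0, f=800.0, baseline=0.1))   # -> 5.0 m

Finding which left-image and right-image points correspond is the hard part (the correspondence problem in item B); the depth formula itself is the easy step.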
As we review this literature, we will find that the general problem of scene recovery from passive imagery is far from satisfactorily solved. Only a few very special cases of this approach have succeeded.
Why is recovery so difficult?

* The imaging model is many-to-one, so recovery is not well posed: the inverse is one-to-many, underconstrained, non-robust, and sensitive to noise (made concrete in the note below).
  Eg: Is this cube tilted up or down? (the classic Necker-cube ambiguity)
* The imaging model has many parameters.
  Eg: intrinsic camera parameters, extrinsic parameters, illumination parameters, surface reflectance parameters, etc. They are hard to identify accurately, and motion and shape parameters are hard to separate.
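Both difficulties can be made concrete with the standard projective camera (a hedged aside; this notation is assumed here, not taken from these notes). The full imaging model is often written

    $\mathbf{x} \;\simeq\; K\,[\,R \mid t\,]\,\mathbf{X},$

where the intrinsic matrix $K$ (focal length, principal point, skew), the extrinsic pose $(R,\,t)$, and any illumination and reflectance terms are all parameters that must be identified. And even with every parameter known, the map stays many-to-one: under the pinhole model sketched earlier, every scene point on the ray

    $\{\; \lambda\,(x,\ y,\ f) \;:\; \lambda > 0 \;\}$

projects to the same image point $(x, y)$, so a single image fixes only a ray per pixel, never a depth.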
Given these difficulties, perhaps we should look at the best existing systems, namely natural ones, for clues to alternative approaches.
Q: Does biological vision strive for recovery?
A: Almost never! Biological vision is
designed to support specific behaviors, not
to recover every detail of everything it
sees.
It is purposive.
Eg: Frog waiting for an insect to fly by. It does not need to recover the scene, just to detect moving objects and estimate distance to them.
Eg: Bee flying to the hive. It does not need to recover the scenes it confronts on the way; it just needs to recognize a few landmarks and do obstacle avoidance.
Purposive vision
Recovery is too difficult and produces much
information that is not needed. Vision as
recovery is wasteful of important resources.
So here is an alternative to Vision As Recovery: vision exists not to recover but to support specific behaviors and tasks. We refer to this as Purposive Vision. From this point of view, all representations, algorithms and strategies should be task-dependent, not set on recovery.
Vision vs. Visions
Thus there is not one "vision" but there are many
"visions." Vision for a cheetah chasing its prey
should have quite different algorithms than vision
for an ant seeking food, or a cheetah returning to
its den.
Those interested in purposive vision do not ask how, in general, things see, but rather how a vision-enabled system X supports a task Y. X can be a cheetah, a man, or a CCD camera linked to a computer. Y can be egomotion estimation, obstacle avoidance, object detection, tracking, etc.
So purposive vision is about selecting representations, algorithms and strategies which fit with:

A specific goal, task or behavior.
A given embodiment.
A given set of environmental constraints.

Eg: goal: homing;
    embodiment: bee with multifaceted bee eyes;
    environmental constraints: can fly up to 5 mph, must fly at low altitudes also occupied by trees and bushes; may sustain attack by reptiles and spiders.
Eg: behavior: walking;
    embodiment: human being;
    environmental constraints: path is uneven, can trip over roots and rocks; path is difficult to see in places; must divide attention between footfall area and area ahead to maintain track on path.

Eg: task: Scud missile interception;
    embodiment: anti-missile missile with onboard forward-looking camera;
    environmental constraints: ballistic target, chaff, very high speed intercept.
It was the dream of the Vision As Recovery scientists to devise one set of algorithms, representations and schemes for all vision. But it is hard to imagine that the same algorithm that optimizes use of a bee eye in homing will be useful to guide a walking human or an anti-missile missile. Each has its own needs, and each requires an algorithm optimized for those needs.
Purposive vision is also active vision, in the sense that it is linked with selection of future views and integration of vision with dynamic behaviors.

In addition to early and late passive vision, in this course we will consider:
A. Early active vision: Active vision for navigation. Egomotion estimation, Obstacle avoidance, Visual servoing (a minimal sketch follows below), Homing.
B. Late active vision: Active vision for recognition and tracking. Active object recognition, Active tracking.
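As a minimal, hedged illustration of the visual-servoing idea in item A (a sketch under assumed conditions, namely a single tracked image feature and a purely image-plane proportional control law, not an algorithm from these notes):

def servo_step(feature_xy, target_xy, gain=0.5):
    """Proportional image-based servo step: return an image-plane velocity
    command that moves a tracked feature toward its desired location."""
    ex = target_xy[0] - feature_xy[0]   # horizontal image error (pixels)
    ey = target_xy[1] - feature_xy[1]   # vertical image error (pixels)
    return (gain * ex, gain * ey)       # commanded correction, pixels per frame

# Hypothetical numbers: feature at (300, 220), desired at image center (320, 240)
print(servo_step((300.0, 220.0), (320.0, 240.0)))   # -> (10.0, 10.0)

In a real system this image-plane command would still have to be mapped through a camera/robot Jacobian into actuator commands; the point here is only that the computation is driven by the task (center the feature), not by full scene recovery.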