CSE668 Principles of Animate Vision
Spring 2011
Peter Scott
1. Introduction

In this course we are interested in vision in The Real World: a world of moving 3D objects and scenes, in which the imaging systems are not ideal, and in which objects, noise, illumination and motion are all difficult to constrain in advance.
We are not interested in single static images, or in 2-D Flat Worlds, Blocks Worlds, or other such constrained worlds. This eliminates many approaches and algorithms.
The standard model of vision: Vision for Recovery

An imaging model is a mapping from R3, the 3D scene, to R2, the image plane: it maps a 3D scene into a 2D image of that scene.
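For concreteness, here is a hedged illustration (the standard pinhole model, assumed here rather than developed in these notes): place the optical center at the origin with the optical axis along $Z$ and focal length $f$; a scene point then projects as

    $(X,\,Y,\,Z) \;\mapsto\; (x,\,y) \;=\; \left(\dfrac{fX}{Z},\ \dfrac{fY}{Z}\right), \qquad Z > 0.$

Depth is divided out of the image coordinates, which already hints at what recovery must undo.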
The standard model of vision is that the purpose
of vision is to invert the imaging model. That is,
given an image, recover (reconstruct) the scene.
Determine the shapes and locations of all
objects in the scene.
There are attractive elements to the "Vision As Recovery" approach:

1. Compatible with all cameras and scenes;
2. Recovery supports any narrower task;
3. Recovery uses visual data maximally;
4. Objectively assessable, quantifiable.
Because recovery is so general and its errors quantifiable, it has been the standard model for understanding what vision "is" for a long time.
This approach is called passive vision. We do not
actively choose images or goals. In this course we
will consider passive vision for 3D and motion
recovery.
Early passive vision:
A. 3D imaging models: projective geometry, stereopsis (a depth-from-disparity sketch follows this outline), epipolar geometry
B. Shape recovery: shape from shading, other shape-from algorithms, illumination and reflection (radiometry), correspondence.

Late passive vision:
C. 3D object recognition: 3D object-centered, 2D view-centered, indexing and matching.
D. Motion analysis: optical flow, structure from motion, passive egomotion, tracking.
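As a minimal, hedged illustration of the stereopsis item above (a sketch under assumed standard conditions, namely a rectified stereo pair with known focal length f in pixels and baseline B in meters, not an algorithm from these notes), depth follows from disparity as Z = f * B / d:

def depth_from_disparity(x_left, x_right, f, baseline):
    """Depth Z (meters) of a point from its disparity in a rectified
    stereo pair with focal length f (pixels) and baseline (meters)."""
    d = x_left - x_right              # disparity in pixels
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    return f * baseline / d

# Hypothetical numbers for illustration: f = 800 px, B = 0.1 m, d = 16 px
print(depth_from_disparity(416.0, 400.0, f=800.0, baseline=0.1))   # -> 5.0 m

Finding which left-image and right-image points correspond is the hard part (the correspondence problem in item B); the depth formula itself is the easy step.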
As we review this literature, we will find that the general problem of scene recovery from passive imagery is far from satisfactorily solved. Only a few very special cases of this approach have succeeded.
Why is recovery so difficult?

* The imaging model is many-to-one, so recovery is not well posed: the inverse is one-to-many, underconstrained, non-robust, and sensitive to noise (made concrete in the note below).
  Eg: Is this cube tilted up or down? (the classic Necker-cube ambiguity)
* The imaging model has many parameters.
  Eg: intrinsic camera parameters, extrinsic parameters, illumination parameters, surface reflectance parameters, etc. They are hard to identify accurately, and motion and shape parameters are hard to separate.
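Both difficulties can be made concrete with the standard projective camera (a hedged aside; this notation is assumed here, not taken from these notes). The full imaging model is often written

    $\mathbf{x} \;\simeq\; K\,[\,R \mid t\,]\,\mathbf{X},$

where the intrinsic matrix $K$ (focal length, principal point, skew), the extrinsic pose $(R,\,t)$, and any illumination and reflectance terms are all parameters that must be identified. And even with every parameter known, the map stays many-to-one: under the pinhole model sketched earlier, every scene point on the ray

    $\{\; \lambda\,(x,\ y,\ f) \;:\; \lambda > 0 \;\}$

projects to the same image point $(x, y)$, so a single image fixes only a ray per pixel, never a depth.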
Given these difficulties, perhaps we should look at the best existing systems, namely natural ones, for clues to alternative approaches.
Q: Does biological vision strive for recovery?
A: Almost never! Biological vision is
designed to support specific behaviors, not
to recover every detail of everything it
sees.
It is purposive.
Eg: Frog waiting for an insect to fly by. It does not need to recover the scene, just to detect moving objects and estimate distance to them.
Eg: Bee flying to the hive. It does not need to recover the scenes it confronts on the way; it just needs to recognize a few landmarks and do obstacle avoidance.
Purposive vision
Recovery is too difficult and produces much
information that is not needed. Vision as
recovery is wasteful of important resources.
So here is an alternative to Vision As Recovery: vision exists not to recover but to support specific behaviors and tasks. We refer to this as Purposive Vision. From this point of view, all representations, algorithms and strategies should be task-dependent, not set on recovery.
Vision vs. Visions
Thus there is not one "vision" but there are many
"visions." Vision for a cheetah chasing its prey
should have quite different algorithms than vision
for an ant seeking food, or a cheetah returning to
its den.
Those interested in purposive vision do not ask how, in general, things see, but rather how a vision-enabled system X supports a task Y. X can be a cheetah, a man, or a CCD camera linked to a computer. Y can be egomotion estimation, obstacle avoidance, object detection, tracking, etc.
So purposive vision is about selecting representations, algorithms and strategies which fit with:

A specific goal, task or behavior.
A given embodiment.
A given set of environmental constraints.

Eg: goal: homing;
    embodiment: bee with multifaceted bee eyes;
    environmental constraints: can fly up to 5 mph, must fly at low altitudes also occupied by trees and bushes; may sustain attack by reptiles and spiders.
Eg: behavior: walking;
    embodiment: human being;
    environmental constraints: path is uneven, can trip over roots and rocks; path is difficult to see in places; must divide attention between footfall area and area ahead to maintain track on path.

Eg: task: Scud missile interception;
    embodiment: anti-missile missile with onboard forward-looking camera;
    environmental constraints: ballistic target, chaff, very high speed intercept.
It was the dream of the Vision As Recovery scientists to devise one set of algorithms, representations and schemes for all vision. But it is hard to imagine that the same algorithm that optimizes use of a bee eye in homing will be useful to guide a walking human or an anti-missile missile. Each has its own needs, and each requires an algorithm optimized for those needs.
Purposive vision is also active vision, in the sense that it is linked with selection of future views and integration of vision with dynamic behaviors.

In addition to early and late passive vision, in this course we will consider:
A. Early active vision: Active vision for navigation. Egomotion estimation, Obstacle avoidance, Visual servoing (a minimal sketch follows below), Homing.
B. Late active vision: Active vision for recognition and tracking. Active object recognition, Active tracking.
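As a minimal, hedged illustration of the visual-servoing idea in item A (a sketch under assumed conditions, namely a single tracked image feature and a purely image-plane proportional control law, not an algorithm from these notes):

def servo_step(feature_xy, target_xy, gain=0.5):
    """Proportional image-based servo step: return an image-plane velocity
    command that moves a tracked feature toward its desired location."""
    ex = target_xy[0] - feature_xy[0]   # horizontal image error (pixels)
    ey = target_xy[1] - feature_xy[1]   # vertical image error (pixels)
    return (gain * ex, gain * ey)       # commanded correction, pixels per frame

# Hypothetical numbers: feature at (300, 220), desired at image center (320, 240)
print(servo_step((300.0, 220.0), (320.0, 240.0)))   # -> (10.0, 10.0)

In a real system this image-plane command would still have to be mapped through a camera/robot Jacobian into actuator commands; the point here is only that the computation is driven by the task (center the feature), not by full scene recovery.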