CSE668 Lecture Slides Spring 2011
7. Passive 3-D Object Recognition
We will now return from our foray into low-level active vision (Sections 5-6 of these notes) to complete our discussion of the standard (passive) paradigm approaches to 3-D and motion vision begun in Secs. 2-3. These are the higher-level passive operations, more AI-dependent, as opposed to the lower-level passive operations such as stereopsis, which are more signal-processing dependent and which we have discussed thus far.
Required reading: Sonka, Ch. 10.
Most of our presentation of 3-D OR will be descriptive rather than analytic. That is, we will not go into detail concerning the representations and algorithms. The only exceptions will be the Goad algorithm and the aspect graph method, both of which will be discussed towards the end of this section.
CSE668 Sp2011 Peter Scott 04-00
In our consideration of Shape Recovery in Sec. 3, we saw that the paradigm of "vision-for-recovery" is very ambitious, and in most general cases the results have been limited and perhaps disappointing. Here we consider a somewhat less ambitious, though still very difficult, task: 3-D Object Recognition (OR). It is easier than shape recovery because you do not have to recover an object in order to recognize it.

Eg: You can recognize an object as "a ball" without recovering its full details: surface texture, exact shape, size, etc.
CSE668 Sp2011 Peter Scott 04-01
Suppose there are m hypotheses (object classes) in the Frame Of Discernment (FOD), the set of all object classes you can declare. OR is the association of an object from the scene with a hypothesis 1...m from the frame of discernment, ie. declaration of one class label as the true hypothesis.

Typically ("open system assumption"), one of the hypotheses is called the null hypothesis, the hypothesis that states that the object does not belong to any of the other m-1 classes. In a closed system the null hypothesis is not included, and it is assumed all objects in the scene can be correctly associated with one (or more) of the hypotheses.
CSE668 Sp2011 Peter Scott 04-02
Distinct hypotheses may correspond to physically distinct object classes, such as class 1: vehicles, class 2: people, class 3: clouds, etc. Or the hypotheses may be combinations of object physical characteristics and other properties.

Eg: FOD = { people facing camera, people facing away from camera, animals moving, animals stationary, null hypothesis }
CSE668 Sp2011 Peter Scott 04-03
How are objects declared to belong to class i rather than class j? OR always involves comparisons. These comparisons may be of two general types:

1. Comparing descriptions of the object features with features of archetype objects of each class;

2. Comparing object class construction rules with the manner in which the object to be recognized may be constructed.
CSE668 Sp2011 Peter Scott 04-04
The first is called recognition by analysis, in which features of the object are matched against features of each of the stored archetypes (except the null hypothesis), and the best match, if it is above a preset threshold, determines the classification decision. If no class matches sufficiently, and the null hypothesis is available (open-world assumption), the null hypothesis is chosen.
Eg: FOD = {vehicles, buildings, null hypothesis}.
A car would be classed a vehicle since there
would be an archetype with similar features:
four wheels, similar size, windows, etc.
CSE668 Sp2011 Peter Scott 04-05
The second is called recognition by synthesis, in which the constructive procedures for each class and the object are compared. Again, the best match determines the classification decision.

Eg: Syntactic recognition: reduce the object to a string of symbols that represents a constructive procedure for generating it. See if that string is "grammatical" in the grammars that represent each hypothesis (object class). In other words, see if it obeys the rules for construction which define each class separately.

In either case, to make the comparisons we need data concerning the object and data concerning objects representative of each class in the frame of discernment.
CSE668 Sp2011 Peter Scott 04-06
Describing 3-D shapes

Descriptions can be complete or partial. 3-D recovery is complete description, ie. a different description for each distinct object. Complete descriptions are also called representations.

Eg: (x,y,z) for each surface point is complete.
Eg: (p,q) for each surface point is complete.
Eg: Polygonal approximations are incomplete.
Eg: Superquadrics are incomplete.
CSE668 Sp2011 Peter Scott 04-07
For 3-D OR we seldom work with complete descriptions, for several reasons:

1. Limited view angles and image sets;
2. Hard to compute;
3. Contain more detail than needed for OR matching.

Instead, we work with partial representations, ie. models. Typically, we select a class of models and within this class find the parameter values (features) that best describe the object and that best describe the archetypes. These feature vectors may then be matched.

Models are never complete, ie. many distinct objects are represented by the same model. Models represent classes of objects. But often models are unique (each object has a unique best model fit). Unique models are well suited for OR use.
CSE668 Sp2011 Peter Scott 04-08
Volumetric (space-filling) models

Consider a discrete binary image. Each pixel has 2 possible values: true (white, background) and false (black, object). Similarly, we may define a discrete binary 3-D scene model which consists of a grid of voxels (volume elements). If voxel (i,j,k)=0, that means that the corresponding small spatial cube is filled by object rather than background.

Eg: Voxel representation of a tetrahedron.

For finite voxel size, since many distinct objects have the same "staircased" appearance, this creates a partial representation, or model, of the object. In the limit, as the voxel size -> 0, voxel models become complete representations. Voxel models come with all the analytical tools of signal processing. They tend to be high-dimensional and require lots of storage, particularly non-binary voxel models.
CSE668 Sp2011 Peter Scott 04-09
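The voxel model of the tetrahedron example above can be sketched in a few lines. This is a minimal illustration (the grid size N and the particular tetrahedron are invented here, not from the slides), using the slide's convention that voxel value 0 marks object and 1 marks background:

```python
import numpy as np

# Binary voxel model of the tetrahedron {x,y,z >= 0, x+y+z <= 1}
# on an N x N x N grid over the unit cube. 0 = object, 1 = background,
# following the slide's pixel/voxel convention.
N = 16
c = (np.arange(N) + 0.5) / N                 # voxel-center coordinates
x, y, z = np.meshgrid(c, c, c, indexing="ij")
voxels = np.where(x + y + z <= 1.0, 0, 1)

object_count = int((voxels == 0).sum())
fill_fraction = object_count / N**3
# The tetrahedron occupies 1/6 of the unit cube, so the fill fraction
# approaches 1/6 as N grows and the staircasing error shrinks.
print(object_count, round(fill_fraction, 3))
```

Shrinking the voxel size (increasing N) illustrates the limit claim above: the partial, staircased model converges toward a complete representation.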
Volume primitives approach

Another way to fill space is with unions and intersections of volume primitives, simple 3-D solids. By using smooth-surfaced primitives such as spheres and ellipsoids, or non-right-angle primitives such as parallelepipeds and pyramids, the 3-D staircasing in voxel models can be avoided. Different classes of volume primitives may be employed.

Note that voxels are cubic volume primitives which are combined through union to form space-filling models, so voxel modelling is a special case of the volume primitive approach.
CSE668 Sp2011 Peter Scott 04-10
Polyhedral models

The single volume primitive used to create 3-D polyhedral models is the half-space, ie. the set of points x in R3 satisfying

    αᵀx + β ≥ 0

for some α in R3 and scalar β. Only intersections of them are required for convex objects. General non-convex objects can be decomposed into a union of convex parts, and so can be modeled by unions of such intersected half-spaces.

One nice feature of polyhedral models is that they can be characterized by 4 scalars (a,b,c,d) for each facet (bounding plane), since ax + by + cz + d ≥ 0 defines a half-space. Another is that linear programming can be used to determine vertices and characteristics. The obvious weak point of polyhedral modelling is the proliferation of artificial facet edges.
CSE668 Sp2011 Peter Scott 04-11
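The (a,b,c,d)-per-facet characterization above makes point membership testing trivial for convex polyhedra. A minimal sketch, using the unit cube (which also appears later in these slides) as the example solid:

```python
# Convex polyhedral model as an intersection of half-spaces
# ax + by + cz + d >= 0, one (a,b,c,d) quadruple per facet.
CUBE = [
    ( 1,  0,  0, 0),   #  x >= 0
    ( 0,  1,  0, 0),   #  y >= 0
    ( 0,  0,  1, 0),   #  z >= 0
    (-1,  0,  0, 1),   # -x + 1 >= 0, i.e. x <= 1
    ( 0, -1,  0, 1),   #  y <= 1
    ( 0,  0, -1, 1),   #  z <= 1
]

def inside(facets, p):
    """True iff point p=(x,y,z) satisfies every half-space constraint."""
    x, y, z = p
    return all(a*x + b*y + c*z + d >= 0 for (a, b, c, d) in facets)

print(inside(CUBE, (0.5, 0.5, 0.5)))   # interior point
print(inside(CUBE, (2.0, 0.0, 0.0)))   # outside the cube
```

Note that this intersection form only covers convex solids; a non-convex object would be a union of several such facet lists, as the slide describes.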
Eg: Polyhedral model of teapot, simple shading.
Source:
http://www.frontiernet.net/~imaging/fo_arch.html
CSE668 Sp2011 Peter Scott 04-11a
Eg: Polyhedral model of a bust of Beethoven
postprocessed with non-Lambertian surface
model and spot illumination.
Source: http://www.frontiernet.net/~imaging/fo_arch.html
Some kind of smoothing is obviously necessary to make the model reasonably realistic and accurate.
CSE668 Sp2011 Peter Scott 04-11b
Constructive geometry models

The volume primitives are (typically) sphere, cylinder, cone, cuboid (or prism) and half-space. Sometimes ellipsoids, paraboloids and other simple forms are added to the set of primitives.

These models are surprisingly expressive, but it is often difficult to determine the smallest set for a given permissible deviation. The expressiveness of a modeling scheme is the range of different types of objects which it can accurately represent.
CSE668 Sp2011 Peter Scott 04-12
Super-quadrics

Quadric solids are defined by

    ax² + by² + cz = d    (Type I quadric)

or

    ax² + by² + cz² = d   (Type II quadric)

For instance, if a and b are of the same sign then the Type I quadric is called a paraboloid, else a hyperbolic paraboloid. If a, b, c and d are all of the same sign then the Type II quadric is called an ellipsoid.
CSE668 Sp2011 Peter Scott 04-13
Superquadrics are generalizations of quadrics with additional parameters to make them less smooth and symmetric than quadrics. See Sonka Fig. 10-14, p. 526, for some super-ellipsoids. But note the errors in the exponent s in eqn 10-23, p. 525.
CSE668 Sp2011 Peter Scott 04-13a
A description of superquadrics and development of some of their mathematical properties can be found here. Superquadrics have many interesting mathematical properties, and may be usefully generalized in various ways. They are much more expressive than quadrics, at the cost of two additional parameters (6 vs. 4) in the models.
CSE668 Sp2011 Peter Scott 04-13b
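The two extra parameters can be made concrete with the standard superellipsoid inside-outside function. This sketch uses the usual textbook parameterization (semi-axes a1,a2,a3 and shape exponents e1,e2 are the conventional symbols, not taken from these slides): F < 1 inside the solid, F = 1 on its surface, F > 1 outside.

```python
# Superellipsoid inside-outside function:
#   F = ( |x/a1|^(2/e2) + |y/a2|^(2/e2) )^(e2/e1) + |z/a3|^(2/e1)
def superquadric_F(x, y, z, a1=1.0, a2=1.0, a3=1.0, e1=1.0, e2=1.0):
    xy = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)

# With e1 = e2 = 1 this reduces to the ordinary ellipsoid
# (x/a1)^2 + (y/a2)^2 + (z/a3)^2, i.e. a quadric.
print(superquadric_F(0.0, 0.0, 0.5))           # inside the unit sphere
print(superquadric_F(2.0, 0.0, 0.0, e2=0.2))   # outside a boxier solid
```

Varying e1 and e2 between near 0 and 2 sweeps the shape from box-like through ellipsoidal to pinched/star-like, which is the extra expressiveness the slide refers to.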
Sweep representations

Given a 3-D object and a curvilinear path or "spine" entirely contained within it, the object may be described by defining at each point along the spine the 2-D shape of the cross-section of the object at that point. This is called a sweep representation, or generalized cylindrical representation.
CSE668 Sp2011 Peter Scott 04-14
Eg: If the spine is a straight line in 3-D, and the cross-section is a circle with linearly decreasing radius, the figure swept out is a cone.

Sweep-represented objects need not have axes of symmetry, or any symmetry whatsoever. But they work best for "spaghetti-like" objects, that is, objects for which the change in the cross-section along a properly selected spine is relatively slow and smooth.

In the case of hand-like objects with several protruding parts, the spine must divide into multiple skeleton-like extensions. This complicates sweep representations significantly.
CSE668 Sp2011 Peter Scott 04-15
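The cone example above can be written directly as a sweep: a straight spine from (0,0,0) to (0,0,1) with a circular cross-section of linearly decreasing radius. Function names here are illustrative, not from the slides.

```python
import math

def cone_surface_point(t, theta):
    """Point on the swept surface at spine parameter t in [0,1], angle theta.

    Spine: straight segment (0,0,t). Cross-section at t: circle of
    radius r(t) = 1 - t, shrinking to zero at the apex t = 1.
    """
    r = 1.0 - t
    return (r * math.cos(theta), r * math.sin(theta), t)

# The cross-section radius decreases linearly along the spine:
for t in (0.0, 0.5, 1.0):
    x, y, z = cone_surface_point(t, 0.0)
    print(t, math.hypot(x, y))
```

Replacing r(t) or the circle with some other slowly varying cross-section gives the general "generalized cylinder" of the slide.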
Surface models

The alternative to space-filling models is models that describe the boundaries of the object: surface models. Just as there are many ways to model boundaries in 2-D, ie. 1-D curves on a plane, there are many ways to model curvilinear 2-D manifolds in 3-space.

There are two important problems to be solved in computing a surface model: what class of manifolds to use, and how and where they will intersect as 1-D curves in 3-D.
CSE668 Sp2011 Peter Scott 04-16
The simplest volumetric model was the polyhedron, a solid 3-D object with planar boundaries. If we think of the elements of a polyhedron model not as half-spaces but as the facets that bound them, this model class can also be considered a surface model. However, if we are describing a polyhedron by quantifying its facets, we need to specify its edges and vertices. The edges can be characterized by equating pairs of half-space boundaries, and the vertices found by solving triples of these equations.
CSE668 Sp2011 Peter Scott 04-16a
Eg: Consider the cube modelled volumetrically by the intersection of six half-spaces defined by the inequalities

    1x + 0y + 0z >= 0
    0x + 1y + 0z >= 0
    0x + 0y + 1z >= 0
    1x + 0y + 0z <= 1
    0x + 1y + 0z <= 1
    0x + 0y + 1z <= 1
CSE668 Sp2011 Peter Scott 04-17
The half-space boundaries are these inequalities considered as equalities (e.g. x=0). The first two facets intersect where x=0 and y=0, which is the 1-D straight line

    { (x,y,z) = (0,0,z) for all z }

The vertices for this edge between facets x=0 and y=0 are the points on all of the first three half-space boundaries x=0, y=0, z=0,

    (x,y,z) = (0,0,0)

and on the first two and the sixth,

    (x,y,z) = (0,0,1)
CSE668 Sp2011 Peter Scott 04-18
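The "solve triples of boundary equations" step above mechanizes nicely: each facet plane is one row of a linear system, each non-degenerate triple of planes meets in one candidate point, and candidates satisfying all six inequalities are the true vertices. A sketch for the cube:

```python
import itertools
import numpy as np

# Facet planes of the unit cube as (a, b, c, rhs) for ax + by + cz = rhs.
planes = [
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
    (1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1),
]

def cube_vertices():
    verts = set()
    for trio in itertools.combinations(planes, 3):
        A = np.array([p[:3] for p in trio], dtype=float)
        b = np.array([p[3] for p in trio], dtype=float)
        if abs(np.linalg.det(A)) < 1e-9:
            continue                      # parallel planes: no single point
        x, y, z = np.linalg.solve(A, b)
        if 0 <= x <= 1 and 0 <= y <= 1 and 0 <= z <= 1:
            verts.add((round(x, 6), round(y, 6), round(z, 6)))
    return verts

print(len(cube_vertices()))   # a cube has 8 vertices
```

The inequality filter at the end is what discards intersection points of extended planes that lie outside the solid.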
In this example, the 6 half-space inequalities define the volumetric model, while the 6 corresponding equalities, together with the 12 edge equations and 8 vertices, define the corresponding polyhedral boundary model. Clearly, the cube requires more constraints to be described as a surface model than as a volumetric model. But the additional boundary parameters in surface models permit modelling of non-convex bodies, which cannot be done with intersections of half-spaces.

Eg: This object cannot be modelled using intersections of half-spaces, but can be using a surface polyhedral model.
CSE668 Sp2011 Peter Scott 04-19
Delaunay Triangulation

Given a set of points on the surface of a 3-D object, or surface point cloud, these points can be used as the vertices of contiguous triangles whose union models the surface of the object. This process is called surface triangulation, and the most common strategy is Delaunay Triangulation:

1. Select 3 points from the point cloud and find the unique circle they lie on. Extend that circle to the unique sphere that has that circle as a great circle.

2. If there are no other points in the surface point cloud in the interior of that sphere, make these three points the vertices of a triangular facet of the model.

3. Repeat steps 1 and 2 for all sets of 3 points.
CSE668 Sp2011 Peter Scott 04-20
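The empty-circumsphere test in step 2 can be sketched directly: the sphere having the triangle's circumcircle as a great circle is centered at the triangle's circumcenter with radius equal to its circumradius. This is a minimal illustration (function names are mine, and the standard 3-D circumcenter formula is assumed, not taken from the slides):

```python
import numpy as np

def circumsphere(A, B, C):
    """Center and radius of the sphere whose great circle passes through A, B, C."""
    A, B, C = (np.asarray(p, float) for p in (A, B, C))
    a, b = B - A, C - A
    axb = np.cross(a, b)
    denom = 2.0 * np.dot(axb, axb)
    center = A + np.cross(np.dot(a, a) * b - np.dot(b, b) * a, axb) / denom
    return center, float(np.linalg.norm(center - A))

def empty_circumsphere(tri, cloud, tol=1e-9):
    """True iff no cloud point other than the triangle lies strictly inside."""
    center, r = circumsphere(*tri)
    tri_set = {tuple(p) for p in tri}
    return all(np.linalg.norm(np.asarray(p, float) - center) >= r - tol
               for p in cloud if tuple(p) not in tri_set)

tri = [(0, 0, 0), (2, 0, 0), (0, 2, 0)]   # circumsphere: center (1,1,0), r = sqrt(2)
print(empty_circumsphere(tri, tri + [(1, 1, 5)]))    # far point: facet accepted
print(empty_circumsphere(tri, tri + [(1, 1, 0.5)]))  # near point: facet rejected
```

Step 3's brute-force loop over all point triples is O(n^3) tests of this kind; practical implementations use incremental or divide-and-conquer constructions instead.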
A problem with the algorithm given above is that, like polyhedral volumetric modelling, it requires additional constraints to handle non-convex objects. The unconstrained algorithm above returns the convex hull of the point cloud.

Eg: Given the vertices of the non-convex figure of 04-18 as the surface point cloud, unconstrained Delaunay Triangulation will return the cube shown in 04-19.

Delaunay Triangulation can also be extended to non-convex objects by first decomposing the point cloud into the union of point clouds in which the convex hull models the object, then performing the algorithm on each separately and merging the resulting convex polyhedra.
CSE668 Sp2011 Peter Scott 04-21
Eg: This non-convex object can be decomposed into the union of two convex objects by splitting it vertically at z1.

There are numerous other triangulation schemes (e.g. constrained triangulation, Morse triangulation), but Delaunay Triangulation is still the most commonly used in practice.

CSE668 Sp2011 Peter Scott 04-21a
Object-based OR using volumetric or surface models

Having considered several types of 3-D models, we now turn to the question of how we are to use a 3-D model for OR. As suggested earlier, general pattern recognition procedures most often have these steps:

A. Offline:

1. Compute one or more archetype models (ideal, noise- and distortion-free models) for each class. Call this the class library.

2. Precompile a set of attributes Ai for each member of the class library. Ai will be used for purposes of matching.

3. Precompile other properties of the class library and the data environment which will be of use online, such as thresholds and probabilities.
CSE668 Sp2011 Peter Scott 04-22
B. Online:

1. Given the online data, compute a model of the object to be classified, including the value of its attributes A.

2. Initialize: make an initial guess hypothesis Hi.

3. Compute the match score for the current hypothesis Hi.

4. If the match threshold is exceeded, declare the object to be recognized and its class to be class i; else make another hypothesis and go to 3. If all hypotheses have been tried, declare recognition failure.

5. If a learning pattern recognition procedure, update the precompiled database in light of experience.
CSE668 Sp2011 Peter Scott 04-23
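The offline/online split above can be sketched as a toy recognizer. Everything concrete here is invented for illustration (the two archetypes, the two-element feature vectors, the Euclidean match score, and the threshold); note the score is a distance, so "match threshold exceeded" becomes "distance below threshold":

```python
import math

LIBRARY = {                      # offline steps A1/A2: archetype feature vectors
    "vehicle":  [4.0, 1.5],     # hypothetical features, e.g. (wheels, height in m)
    "building": [0.0, 10.0],
}
THRESHOLD = 2.0                  # offline step A3: precompiled threshold

def recognize(features):
    """Online steps B2-B4: try each hypothesis; best match under threshold wins."""
    best_class, best_dist = None, float("inf")
    for cls, archetype in LIBRARY.items():       # hypothesize each class in turn
        d = math.dist(features, archetype)       # match score (lower = better)
        if d < best_dist:
            best_class, best_dist = cls, d
    if best_dist <= THRESHOLD:                   # verify against the threshold
        return best_class
    return "null hypothesis"                     # open-world: nothing matched

print(recognize([4.0, 1.4]))     # close to the vehicle archetype
print(recognize([50.0, 50.0]))   # matches no archetype
```

Step B5 (learning) would amount to updating LIBRARY and THRESHOLD from labelled outcomes, which this sketch omits.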
By a hypothesis, we mean a declaration of the object class, together with any necessary view parameters, ie. a declaration of what we are looking at and how we are looking at it.

Eg: In a 3-D object-centered OR system with unknown illumination and scale, a hypothesis would be an object class together with camera pose and location parameters and illumination parameters.

Eg: In a 3-D view-centered OR system with fixed pose and scale and illumination, a hypothesis would be an object class together with a view index number (eg. this is view 17 of object 6).
CSE668 Sp2011 Peter Scott 04-24
In methods employing the recognition-by-analysis approach, the "set of attributes" are quantitative or categorical mappings from the model called features. Note: a categorical mapping is a mapping into a non-numerical set which need not be endowed with a similarity or distance metric.

Eg: The feature set {surface point, edge point, vertex point} is categorical.

Eg: For a polyhedral surface model, we can take the set of outward pointing normals and the locations of the centroids of each facet as the features. These are all numerical features.

Eg: For a voxel model, we can take the (i,j) moments for 0<=i,j<=M as the (numerical) features.

Eg: For a sweep representation model, we can take the length and curvature of the spine, together with Fourier Descriptors of sampled cross-sections, as the feature vector.
CSE668 Sp2011 Peter Scott 04-25
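The voxel-moment feature example can be made concrete. This sketch generalizes the slide's (i,j) notation to full 3-D (i,j,k) moments of a binary voxel grid, which is an assumption on my part; restricting to low orders gives a compact numerical feature vector for matching:

```python
import numpy as np

def voxel_moments(V, M=1):
    """All moments m_ijk = sum over filled voxels of x^i * y^j * z^k, 0 <= i,j,k <= M."""
    filled = np.argwhere(V)            # (x, y, z) indices of object voxels
    feats = []
    for i in range(M + 1):
        for j in range(M + 1):
            for k in range(M + 1):
                feats.append(float(np.sum(filled[:, 0]**i *
                                          filled[:, 1]**j *
                                          filled[:, 2]**k)))
    return feats

V = np.zeros((4, 4, 4), dtype=bool)
V[1, 2, 3] = True                      # a single filled voxel at (1,2,3)
m = voxel_moments(V, M=1)
# m[0] is the zeroth moment m_000, i.e. the filled-voxel count;
# the first-order moments encode the centroid times the count.
print(m)
```

Two voxel models can then be matched by comparing their moment vectors, exactly the feature-vector matching described earlier in this section.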
Object-centered coordinate systems assume that the object being viewed is located at the origin of the world coord system, (xw,yw,zw)=(0,0,0), and that it has a specific known pose (3-D rotation relative to the world coord axes).

Eg: If the distance to the origin of world coords from the camera is known, each extrinsic camera model can be represented as a 3-D vector indicating the direction of the camera coordinate z-axis ("optical axis" of the system) in world coords, a point on a sphere indicating camera location, and a 2-D rotation vector indicating the rotation of the image plane in camera coordinates.
CSE668 Sp2011 Peter Scott 04-26
The canonical position of the object in world coords is often taken to be such that a selected vertex is at the origin as shown, or such that the centroid of the object is at the origin. The canonical pose is such that the major axes of the object are oriented with the world coord axes, or principal edges are aligned as much as possible.
CSE668 Sp2011 Peter Scott 04-27
The Goad algorithm

If we use 3-D polyhedral models, the Goad algorithm can be used to implement the OR procedure. Assume the class library consists of polyhedral archetypes, one for each class. Take the feature set for each member of the class library to be the complete set of object edges, each represented by an ordered pair of points in world coords.

Eg: Object is an ellipsoid; its polyhedral model has 8 facets, 6 vertices, 12 edges. Then

    e1=(v1,v3), e2=(v1,v6), ... e12=(v2,v4)
CSE668 Sp2011 Peter Scott 04-28
The camera is assumed to be a unit distance away from the object to be classified, and records an image. That image is precompiled to a polyhedral model and thence to a set of edges (represented by pairs of vertices). The location p on the unit sphere is not known, nor the optical axis, nor the camera image plane rotation q. The optical axis degree of freedom can be removed by translating the object to the center of the image, orienting the optical axis to the world coord origin. Thus, a hypothesis will consist of an object class together with (p,q), the view location and image plane rotation extrinsic camera parameters.

Note: Do not confuse the 3-D location vector p and 2-D rotation vector q here with the scalar components (p,q) of the gradient space defined for shape recovery.
CSE668 Sp2011 Peter Scott 04-29
The standard form of Goad's algorithm assumes the object is opaque. Thus, given view sphere location p, we must consider that only some edges will be visible. The set of p's from which a given edge e is visible is called the visibility locus of e, and denoted L(e).

Eg: For the p shown in 04-28, edges (v5,v6), (v2,v6) and (v2,v5) will not be visible. So this p is not in L(v5,v6), L(v2,v6) or L(v2,v5).

The basic idea of Goad's algorithm is to seek an object label, p and q for which edges in the object image and edges in an image computed from the class library match up as much as possible for p in L(e).
CSE668 Sp2011 Peter Scott 04-29a
The algorithm proceeds by hypothesize-and-verify over increasing edge sets, with backtracking. At a given iteration, we have our current hypothesis (ωi, p, q), where ωi is the hypothesized object class, along with the set of edges that we have matched with this hypothesis up to this iteration. We select an unmatched edge in the object image and, based on the intersections of L(e) for all matched edges, determine a region where the matching edge must lie in the (ωi, p, q) pre-stored image. If there is one there, we have verified the hypothesis so far, and we continue by seeking to match another edge that has not yet been matched.

If there fails to be one, ie. no edge in the (ωi, p, q) pre-stored image which is in the intersections of L(e) for matched edges, the hypothesis (ωi, p, q) is rejected and we backtrack through (p,q) space.
CSE668 Sp2011 Peter Scott 04-29b
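The intersect-the-visibility-loci idea can be sketched with a toy, fully discrete version of Goad-style hypothesize-and-verify. All data here are invented for illustration: view locations are a small finite set, each model edge carries its visibility locus L(e) as a set of views, and a class hypothesis survives only while the observed edges share at least one common view. Image-plane rotation q and the 2-D geometric matching are omitted.

```python
LIBRARY = {
    "wedge": {                       # edge -> visibility locus L(e)
        "e1": {"p1", "p2", "p3"},
        "e2": {"p2", "p3"},
        "e3": {"p3"},
    },
    "slab": {
        "e1": {"p1"},
        "e2": {"p1", "p2"},
    },
}
ALL_VIEWS = {"p1", "p2", "p3"}

def recognize(observed_edges):
    for cls, loci in LIBRARY.items():            # hypothesize a class
        views, ok = set(ALL_VIEWS), True
        for e in observed_edges:                 # grow the matched edge set
            if e not in loci:
                ok = False                       # edge has no counterpart: reject
                break
            views &= loci[e]                     # intersect the L(e)'s
            if not views:
                ok = False                       # no view shows all edges: reject
                break
        if ok:
            return cls, sorted(views)            # class plus consistent view(s)
    return None                                  # recognition failure

print(recognize(["e1", "e2", "e3"]))
print(recognize(["e1", "e9"]))
```

In the real algorithm the "views" set is a region of the continuous view sphere, and rejection triggers backtracking through (p,q) rather than simply moving to the next class.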
The algorithm terminates with success when there are no unmatched edges in the object image, and the current hypothesis (ωi, p, q) is declared. It terminates in failure when we have backtracked through all (ωi, p, q) for the current ωi. In this case the object is not of class ωi, and we move on to the next class.

Note that since there are a finite number of hypotheses, and each polyhedral object image has a finite number of edges, the algorithm has guaranteed convergence.
CSE668 Sp2011 Peter Scott 04-30
2-D view-based methods

There are 3 distinct strategies we can employ for matching in a 3-D OR problem:

1. 3-D model-based matching: the library stores 3-D archetypes. The image object is recovered, ie. a 3-D model is produced, and the size, position and pose of the image object relative to the archetype are hypothesized. Matching is done in R3.

2. Hybrid matching: the library stores 3-D archetypes. Based on the 2-D view (image), but without recovery, the pose, size and location are hypothesized and used to create a synthetic view. Matching is done in R2. Goad's algorithm is an example of hybrid matching.
CSE668 Sp2011 Peter Scott 04-31
Full 3-D model-based matching is seldom attempted for practical applications except in very narrow domains (eg. blocks-world type). The problem is the difficulty of recovery. Hybrid matching is sometimes used for non-time-critical applications. There is a way to implement 2-D matching which does not require creating synthetic views online, in which a set of such views is precompiled.

3. 2-D view-based matching: the library stores a set of 2-D views of the object. Based on the 2-D image object, the best stored 2-D view is selected and matching is done in R2. This is the kind of matching we saw earlier with the Nelson homing algorithm. In that case the entire scene, not just an object in the scene, was matched.
CSE668 Sp2011 Peter Scott 04-32
To perform view interpolation, that is, the rendering of a synthetic view from a view location which has not been stored, an important fact is that a given object point in one view may be mapped into its location in another view using a simple transformation (affine in the case of orthographic projection, trilinear in perspective projection). This can permit "tuning" the view location before matching as long as the set of visible object points has not changed between views.

Eg: Note that only edges and vertices visible in a given image can be mapped using such methods. Features visible from an unstored pose but not visible from a stored pose will be missed in the "tuned" image.
CSE668 Sp2011 Peter Scott 04-33
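The affine case above is easy to demonstrate: given point correspondences between two orthographic views, the 2-D affine transform can be estimated by least squares and then used to map further points. The sample transform here is invented for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src view points to dst view points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # solves A @ M ~= dst
    return M.T                                     # 2x3 affine matrix

# Correspondences in view 1 and their (synthetically generated) positions
# in view 2 under a known affine map:
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
true_M = np.array([[0.9, -0.2,  3.0],
                   [0.1,  1.1, -2.0]])
dst = [tuple(true_M @ np.array([x, y, 1.0])) for (x, y) in src]

M = fit_affine(src, dst)
p = M @ np.array([0.5, 0.5, 1.0])   # "tune" an unseen point into the other view
print(np.round(M, 3))
print(np.round(p, 3))
```

As the slide warns, this only works for points visible in both views; a point occluded in the stored view has no correspondence to fit or map.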
Most of the work on view-centric approaches assumes that the object models are polyhedral, both library views and object views. Polyhedra have many nice properties for relating views around the view sphere:

1. They can be easily characterized in a given view by sets of vertices and edges;

2. Polyhedral edges are straight in any view;

3. For most small changes of viewpoint, there is no change in the set of visible object points.
CSE668 Sp2011 Peter Scott 04-34
If the 3-D polyhedron is convex, then there is a fourth important simplifying property:

4. Either an entire edge and both its terminating vertices are visible, or one of its two vertices and none of the edge interior is visible, or no part of the edge is visible.

Eg:

Note that a view of a non-convex polyhedron may contain "false vertices," that is, vertices of the view that do not correspond to object vertices.
CSE668 Sp2011 Peter Scott 04-35
The most direct approach to 2-D view-based OR would be to store views equally spaced around the view sphere. For instance, the centers of the 20 triangular facets of the icosahedron. Or, if a greater set were desired, dividing each of these 20 triangles into 4 triangles whose vertices are the original triangle vertices plus the midpoints of the three sides.

A problem with such a method is that equal-sized view loci may over-model certain aspects of the object and under-model others, making reliable recognition hard or impossible.
CSE668 Sp2011 Peter Scott 04-36
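The view-count arithmetic behind the icosahedron scheme above is simple: each midpoint subdivision splits every triangular view cell into 4, so the 20 facet-center views grow by a factor of 4 per refinement level. A one-line sketch:

```python
def num_views(levels):
    """Triangular view cells on the sphere after `levels` 4-way subdivisions
    of the icosahedron's 20 facets (each step splits every triangle into 4)."""
    return 20 * 4**levels

print([num_views(k) for k in range(4)])
```

The sequence grows quickly, yet as the slide notes, the cells stay equal-sized, so no amount of uniform refinement targets the under-modelled aspects specifically.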
Eg: Consider the union of the view loci of the two vertices at the bottom of the slot. If there is not a viewpoint on the view sphere in that union, we would get no information about the structure of the bottom of the slot. That union would be a small proportion of the surface area of the view sphere, so having a viewpoint in there would not be very likely. Then, if there were two object classes that differed only in what's going on in that slot, we could never distinguish them, since their sets of stored views would be identical.
CSE668 Sp2011 Peter Scott 04-37
Characteristic views

In the late 1970's and early 1980's, groups led by J.J. Koenderink and H. Freeman developed the ideas of characteristic views and the aspect graph. These ideas have made 2-D view-centric approaches the dominant choice for 3-D OR ever since.

Let us limit ourselves to employing polyhedral models. Any view of a polyhedral model, either perspective or orthographic, reduced to a line drawing, is called an aspect of the model. An aspect is identified (uniquely specified) by the set of its visible facets. Two aspects are said to be adjacent if they differ by only a single facet. A nondirected graph in which the aspects are nodes and two aspects are connected by an edge iff they are adjacent views is called an aspect graph.
CSE668 Sp2011 Peter Scott 04-38
Every view of a given object always has both topological and geometric characteristics.

Topological: graph structure of the view when considered as an undirected graph (Image Structure Graph, ISG), ie. number of vertices and their interconnectivity.

Geometrical: lengths of edges, size of angles.

Call a view "stable" if a small neighborhood around the viewpoint contains only views with the same topological structure, ie. where the same aspect is viewed. Then a contiguous stable set of views bounded by unstable viewpoints is called a characteristic view domain, and the aspect being viewed in that domain is called a characteristic view. A characteristic view class is a set of characteristic view domains which are topologically equivalent, ie. have identical ISGs.
CSE668 Sp2011 Peter Scott 04-39
Eg: For a cube, there are 26 characteristic view domains and 3 characteristic view classes. The 3 ISGs corresponding to these 3 classes are completely connected graphs with 1, 2 and 3 nodes.

For convex polyhedra, the characteristic view locus boundaries correspond to view loci formed by the facets of the polyhedron projected onto the view sphere.
CSE668 Sp2011 Peter Scott 04-40
To determine the aspect graph for a convex polyhedral object, extend the facets to complete planes in R3, and note how the planes partition R3 into cells. All cell boundaries are planes. Each cell except the object itself corresponds to an aspect of the polyhedron.

The nodes in the aspect graph are the aspects of the polyhedron. Aspects are adjacent if they are connected by a cell face. Each pair of adjacent aspects is connected by an edge in the aspect graph. That completes the definition of the aspect graph for polyhedra.
CSE668 Sp2011 Peter Scott 04-41
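The cell count produced by this construction can be checked numerically for the cube. Extending the six facet planes (x=0, x=1, y=0, y=1, z=0, z=1) to all of R3 makes each coordinate fall independently into one of 3 slabs, so sampling points and collecting their slab patterns should find 3^3 = 27 cells: the object plus 26 aspects. The sampling approach is an illustrative shortcut, not part of the slides' construction.

```python
import itertools

def slab(v):
    """Which of the 3 slabs a coordinate lies in: below 0, in [0,1], above 1."""
    return 0 if v < 0 else (1 if v <= 1 else 2)

# A grid of sample points spanning all three slabs on each axis.
samples = [x / 4.0 for x in range(-4, 9)]        # -1.0 ... 2.0 in steps of 0.25
cells = {tuple(slab(c) for c in p)
         for p in itertools.product(samples, repeat=3)}

print(len(cells), len(cells) - 1)   # total cells, and aspects excluding the object
```

The same sign-pattern idea generalizes: for any arrangement of planes, two points are in the same cell iff they agree in their relation to every plane.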
Eg: Construct the aspect graph of an elongated cube. Begin by extending each object facet into a full plane. For an elongated cube, there are six facets, which partition R3 into 27 parts, one occupied by the object itself, leaving 26 aspects.

Let's number the aspects, beginning with the nine aspects above the object. Starting from the leftmost and backmost aspect, let's number them in raster order looking down at the object. Continue in that fashion for the eight aspects neither above nor below the object, then the nine below. So for instance aspect 2 is above and behind, while aspect 14 is directly to the right of the object.
CSE668 Sp2011 Peter Scott 04-42
Let's also number the object facets. Call the one nearest to us facet 1, facet 2 to its right, facet 3 behind, facet 4 to its left, 5 above, 6 below.

We are ready to construct the aspect graph. It will have 26 nodes, each labelled by a line drawing of the object as seen from that aspect. Each of the extended planes has 8 cell faces connecting aspects, for a total of 48 edges.
CSE668 Sp2011 Peter Scott 04-43
For instance, aspect 1 looks like the line drawing shown, and has edges connecting this node to aspects number 2, 4 and 10. So this part of the aspect graph would look like the fragment shown.

Note that the aspect graph nodes do not need to contain line drawings; they are just shown here for illustrative purposes.
CSE668 Sp2011 Peter Scott 04-44
For a typical example, many of the aspects are topologically equivalent. That is, if we made an ISG of each aspect, in which the visible facets were nodes and adjacent facets were connected with edges, these ISGs would be isomorphic. For our example, there are only 3 topologically distinct aspects, ie. 3 non-isomorphic aspect-topology graphs: 1-facet, 2-facet, 3-facet, and thus three characteristic views.
CSE668 Sp2011 Peter Scott 04-45
CSE668 Sp2011 Peter Scott 04-45a
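Checking whether two of these tiny ISGs are isomorphic can be done by brute force over vertex relabellings, which is perfectly adequate at this scale (and hopeless for large graphs). A sketch, with graphs given as a vertex count plus undirected edge lists:

```python
import itertools

def isomorphic(n, edges_a, edges_b):
    """True iff some relabelling of n vertices maps edge set a onto edge set b."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    if len(ea) != len(eb):
        return False
    for perm in itertools.permutations(range(n)):
        if {frozenset((perm[u], perm[v])) for (u, v) in ea} == eb:
            return True
    return False

# The cube's 3-facet aspect has a completely connected 3-node ISG (K3):
K3 = [(0, 1), (1, 2), (0, 2)]
P3 = [(0, 1), (1, 2)]                                # a 3-node path, by contrast
print(isomorphic(3, K3, [(2, 0), (0, 1), (2, 1)]))   # same graph, relabelled
print(isomorphic(3, K3, P3))
```

Since ISG matching ignores edge lengths and angles, this topological test is exactly the first (non-geometric) stage of the aspect graph match method described later in this section.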
Adjacent views, those linked in the aspect graph, have the property that there is a continuous path of stable view locations connecting them. That is, a viewer can move from one aspect to the other without encountering any other aspects, and do so even with small perturbations to the path. Thus no single-facet view is linked to a 3-facet view, since in moving from a one-facet view, one of the two additional facets would be encountered first; to first see them both at exactly the same point in the path is unstable.
CSE668 Sp2011 Peter Scott 04-45b
Construction of the aspect graph for non-convex polyhedra is more difficult. Freeman calls the partitioning planes described so far type A planes. There are type B planes formed by an edge and a non-adjacent vertex, sometimes called edge-vertex planes.

Type B plane

And in certain complex cases involving non-convex polyhedra there are type C surfaces, which are not planes but piecewise planar "ruled surfaces." For more detail, see Wang and Freeman's paper in H. Freeman (ed.), "Machine vision for 3-D scenes," Academic Press, 1990.
CSE668 Sp2011 Peter Scott 04-46
The Aspect Graph Match Method for 3-D OR
Aspect graphs are used in 3-D OR by segmenting out
the object to be recognized from the image, mapping
it into a 2-D
polyhedron (ie. an aspect) and thence to
an Image
Structure
Graph (ISG). The ISG is an undirected
unattributed
graph
in which the nodes of the graph are
the vertices of
the
aspect, and the links are its edges.
We look for a match with the ISGs of the aspects of each class aspect graph. If there are one or more matches to aspects within a single aspect graph, a recognition is declared. If there are no matches anywhere, the null hypothesis is declared.

If there are matches in two or more aspect graphs, a second view can be taken from an adjacent viewing position and the process repeated. Candidate classes must have aspects that match both views, with an edge in the aspect graph between them. The process is continued until all matching aspect graphs have been found. If none, declare OR failure (null hypothesis). If one or more, proceed to the geometric matching step, since aspect graph matching does not consider geometry.
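Since the ISG is an undirected, unattributed graph, the matching step is a graph-isomorphism test. A minimal sketch (hypothetical encoding: each ISG is an edge list over vertex labels, and isomorphism is checked by brute force, which is feasible only for the small graphs of polyhedral views):

```python
from itertools import permutations

def isomorphic(edges_a, verts_a, edges_b, verts_b):
    """Brute-force isomorphism test for two small undirected,
    unattributed graphs (like ISGs of polyhedral views)."""
    if len(verts_a) != len(verts_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(verts_b):
        mapping = dict(zip(verts_a, perm))
        remapped = {frozenset({mapping[u], mapping[v]}) for u, v in edges_a}
        if remapped == {frozenset(e) for e in edges_b}:
            return True
    return False

# Two triangular views: same topology, so their ISGs match.
tri1 = ([("a", "b"), ("b", "c"), ("c", "a")], ["a", "b", "c"])
tri2 = ([("x", "y"), ("y", "z"), ("z", "x")], ["x", "y", "z"])
# A 3-vertex path has the same vertex count but different structure.
path = ([("p", "q"), ("q", "r")], ["p", "q", "r"])

print(isomorphic(*tri1, *tri2))  # True
print(isomorphic(*tri1, *path))  # False
```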
CSE668 Sp2011 Peter Scott 04-47
Note: There is a graph related to the ISG called the Planar Structure Graph (PSG). The nodes of the PSG are the visible vertices of the object, and the edges are its visible edges. While any two aspects which are topologically equivalent have the same PSGs, two aspects can have the same PSGs without being topologically equivalent (the same is not true of ISGs).
Eg: These two views are not topologically equivalent even though their PSGs are identical:

[Figure: two views with identical PSGs that are not topologically equivalent]
To be topologically equivalent, two views need to have isomorphic PSGs and be related by rotation, translation and perspective projection in R3.
CSE668 Sp2011 Peter Scott 04-48
Finally, note that views that match topologically (perfectly matching aspect graphs) need not match geometrically. So a final step must be done after topological aspect graph matching to determine if the object view geometries match up as well. Thus we should check for matches in angles and proportions.
Eg: The topologies of these views match but not their geometries. There is no 3-D polyhedron such that Views A and B are both views of that object.
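The angle-and-proportion check can be sketched as follows (a hypothetical helper, not from the notes, assuming vertex correspondence has already been established by the topological match): compare the interior angles and the perimeter-normalized side lengths of two matched polygonal views within a tolerance.

```python
import math

def view_geometry(poly):
    """Interior angles and perimeter-normalized side lengths of a
    polygon given as a list of (x, y) vertices in order."""
    n = len(poly)
    sides, angles = [], []
    for i in range(n):
        p0, p1, p2 = poly[i - 1], poly[i], poly[(i + 1) % n]
        u = (p0[0] - p1[0], p0[1] - p1[1])
        v = (p2[0] - p1[0], p2[1] - p1[1])
        dot = u[0] * v[0] + u[1] * v[1]
        angles.append(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))
        sides.append(math.hypot(p1[0] - p2[0], p1[1] - p2[1]))
    total = sum(sides)
    return angles, [s / total for s in sides]

def geometries_match(poly_a, poly_b, tol=1e-2):
    """True if two topologically matched views also agree in
    angles and proportions (up to tolerance)."""
    ang_a, prop_a = view_geometry(poly_a)
    ang_b, prop_b = view_geometry(poly_b)
    return (all(abs(a - b) < tol for a, b in zip(ang_a, ang_b)) and
            all(abs(a - b) < tol for a, b in zip(prop_a, prop_b)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bigger_square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # scaled copy: matches
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]       # same topology, not geometry

print(geometries_match(square, bigger_square))  # True
print(geometries_match(square, rectangle))      # False
```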
CSE668 Sp2011 Peter Scott 04-49
Batlle on natural object OR from outdoor scenes

Required reading: Batlle et al, "A review on strategies for recognizing natural objects in color images of outdoor scenes," Image and Vision Computing v18 (2000) 515-530.
We close our look at 3-D OR with a general review of OR constrained to an important problem domain: recognizing objects in unstructured outdoor scenes. This is a typical 3-D domain: it is constrained in terms of object classes, but there is great variability in the appearance of objects. The number of object classes in the 20 or so systems surveyed ranged up to 14. Typical: {bushes, car, fence, grass, road, roof, shadow, window}.
CSE668 Sp2011 Peter Scott 04-50
Outdoor scenes are challenging because the appearance can change radically in seconds due to cloud movement. In such circumstances, color is very important for segmentation, and all the systems tested use color.

Natural objects tend to have complex and highly variable shapes that make them poor candidates for accurate shape modelling (eg. trees, rocks, grass), whether 2-D view-based or 3-D object-based. Most frequently they are modelled by a feature vector capturing a few coarse shape descriptors (eg. aspect ratio, major axis) together with texture and color features.
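Such a feature vector might be assembled as follows (a hypothetical sketch: the region is given as pixel coordinates with per-pixel colors, and only two coarse shape descriptors plus a mean color are computed; real systems add texture features as well):

```python
def region_features(pixels, colors):
    """Coarse feature vector for one segmented region.
    pixels: list of (row, col); colors: list of (r, g, b), same length."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    n = len(pixels)
    return {
        "aspect_ratio": width / height,    # coarse shape descriptor
        "major_axis": max(width, height),  # coarse shape descriptor
        "mean_color": tuple(sum(c[i] for c in colors) / n for i in range(3)),
    }

# A 2x4 horizontal green patch, e.g. a candidate "grass" region.
patch = [(r, c) for r in range(2) for c in range(4)]
greens = [(30, 180, 40)] * len(patch)
f = region_features(patch, greens)
print(f["aspect_ratio"])  # 2.0
print(f["mean_color"])    # (30.0, 180.0, 40.0)
```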
An important part of OR in such scenes is adjacency rules: for instance, the sky cannot be under a rock, or a rock high in a tree. This is an example of the use of object relationship or context rules in OR.
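A context-rule filter of this kind can be sketched as a table of forbidden vertical adjacencies (hypothetical labels and rules, in the spirit of the examples above):

```python
# Forbidden (above, below) pairs: e.g. sky may never appear below a rock.
forbidden_below = {("rock", "sky"), ("grass", "sky"), ("road", "roof")}

def consistent(labelled_regions):
    """labelled_regions: list of (label, row) pairs, row increasing downward.
    Reject any labelling that violates a forbidden vertical adjacency."""
    for top_label, top_row in labelled_regions:
        for bot_label, bot_row in labelled_regions:
            if top_row < bot_row and (top_label, bot_label) in forbidden_below:
                return False
    return True

print(consistent([("sky", 0), ("roof", 5), ("road", 9)]))  # True
print(consistent([("rock", 2), ("sky", 8)]))               # False
```

In a full system such rules prune hypotheses before (or instead of) expensive matching.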
CSE668 Sp2011 Peter Scott 04-51
The surveyed algorithms are of three basic types: top-down, bottom-up and hybrid.

Top-down: begin with a small set of hypotheses, and employ hypothesize-and-verify to aid segmentation. Use heterogeneous knowledge-based procedures such as context rules and object-part rules for both segmentation and matching.

Bottom-up: begin with pixel values, using data-driven procedures and uniform classes of archetype models and image object models. Only hypothesize at the final matching step.

Hybrid: do some bottom-up processing (eg. non-purposive low-level segmentation), then top-down processing (eg. class-specific high-level merging rules for segmentation).
CSE668 Sp2011 Peter Scott 04-52
Top-down systems can handle very complex scenes due to their flexibility: different procedures for different hypotheses. But they are not good at handling or learning unmodelled objects, or at cases where there are many object classes.

Bottom-up systems can handle unmodelled objects well, since they generate descriptions of all objects encountered, not just objects of known classes. But they have great difficulty with complex scenes and are mostly limited in practical use to "blocks-world" type problems.

Hybrid systems combine the discrimination power of top-down systems with the robustness of bottom-up systems. After initial non-purposive segmentation, the segments and their adjacencies can be used to order hypotheses for purposive continuations. The descriptions derived for new classes can be the basis for learning.
CSE668 Sp2011 Peter Scott 04-53
Main takeaway from Batlle's paper: current computer vision systems working in natural (not artificially constrained) outdoor environments are not yet capable of reliably recognizing lexicons of hundreds or thousands of distinct object types. Small lexicons can be handled, but the scale-up problem has not yet been solved.

There has been some progress in natural scene labelling since the paper was written in 2000, but I think the conclusion is still valid.
CSE668 Sp2011 Peter Scott 04-54