CSE668 Lecture Slides Spring 2011

 

3. Passive Shape Recovery

 

Required reading: Sonka Ch 9, Ch 10 sec 1.

Recommended: Trucco Ch 9.

 

Having developed the basic 3-D imaging models, we

will turn here to applying them to shape recovery,

ie. determining the range z(x,y) for each (x,y) in

the image. Various cues can be used to do this:

stereo disparity (shape-from-stereo), radiometric

variations over the image plane (shape-from-

shading), appearance of surface textures

(shape-from-texture) and others. We begin with

shape-from-stereo. The first, and hardest, problem

in using shape-from-stereo is solving the so-called

correspondence problem, which we must discuss first.

CSE668 Sp2011 03-01



The correspondence problem
 
 

    Stereopsis can be used to find the range of any

    scene point X as long as X is visible in both

    images. So, logically, the use of stereoptic
    processing requires two steps:

       1. Given an image point (ul,vl) in one image

          corresponding to a desired scene point X, find

          the corresponding point (ur,vr) in the other

          image.

       2. Apply the stereoptic formulas to find X (a code
          sketch of this step follows the equations):
 

          First solve for a_l and a_r:

            R_l^-1 a_l K_l^-1 [u_l v_l 1]^T + t_l  =  R_r^-1 a_r K_r^-1 [u_r v_r 1]^T + t_r

          Then plug back into either of these to get X_w:

            X_w = R_l^-1 a_l K_l^-1 [u_l v_l 1]^T + t_l
                = R_r^-1 a_r K_r^-1 [u_r v_r 1]^T + t_r
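
A minimal numpy sketch of step 2, written in the slide's convention
X_w = R^-1 a K^-1 [u v 1]^T + t for each camera; the function name and
argument layout are illustrative, not from the text.

import numpy as np

def triangulate(ul, vl, ur, vr, Kl, Rl, tl, Kr, Rr, tr):
    """Solve the two-ray equation above for the scale factors a_l, a_r
    in the least-squares sense, then back-substitute to recover X_w."""
    dl = np.linalg.inv(Rl) @ np.linalg.inv(Kl) @ np.array([ul, vl, 1.0])
    dr = np.linalg.inv(Rr) @ np.linalg.inv(Kr) @ np.array([ur, vr, 1.0])
    A = np.column_stack([dl, -dr])      # 3 equations in the 2 unknowns a_l, a_r
    b = tr - tl
    (al, ar), *_ = np.linalg.lstsq(A, b, rcond=None)
    Xw_left  = al * dl + tl             # plug back into either ray;
    Xw_right = ar * dr + tr             # with noisy data the two differ slightly
    return 0.5 * (Xw_left + Xw_right)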

CSE668 Sp2011 03-02




The correspondence problem (step 1 above) is easily solved

when only a single prominent feature point of an easily

recognizable object is required.
 

Eg: The upper left hand corner of the roofline of the

    large building in the stereo image pair.
 

Any sparse set of distinctive feature points can be easily

corresponded. But difficulties arise when the set is

dense, or when the features are not prominent.
 

Eg: Given a large pointset of locations on a blank wall in

    an image, find the corresponding pointset in the other

    image.

 

The most general, and most difficult, case is called 

shape-from-stereo correspondence. This is the goal of finding,

for each point (u,v) in one image, the corresponding point in

the other. This allows us to compute Xw for every visible

scene point, ie. it completely solves the recovery problem.
 

CSE668 Sp2011 03-03



Constraints which help reduce the computational complexity

of the correspondence problem
 

1. The epipolar constraint: the corresponding point must

   lie along the corresponding epipolar line. This converts
   a 2-D search into a 1-D search.
 

2. Similarity constraints: corresponding points must be

   similar with respect to:

       a. brightness: photometric compatibility constraint;

       b. geometric properties: geometric similarity

          constraint;

       c. edge occupancy: figural disparity constraint;
 

Some of these "constraints" should be considered hard

constraints which must be satisfied exactly, others soft

constraints. Eg., the epipolar constraint is hard and
photometric compatibility is soft.

CSE668 Sp2011 03-04



3. Set constraints: sets of correspondences must satisfy:

       a. continuity of disparity : disparity smoothness

          constraint;

Note: Disparity is defined as the difference (ul,vl)-(ur,vr)

between image plane locations of corresponding points in a

stereo pair of images.

       b. ordering invariance: ordering constraint;

4. Correspondence invertibility constraints:

       a. At most one corresponding point: uniqueness constraint;

       b. If ur corresponds to ul , then ul must correspond

          to ur: mutual correspondence constraint.
 

CSE668 Sp2011 03-05



Correspondence as feature matching
 

The most direct approach to solving the correspondence
problem over sparse pointsets is to identify prominent
features of each point u to be corresponded, then search
the other image for points with identical or very similar
features. The correspondence constraints cited above can
then be used to pare down the set of candidate matching
points for each u.
 

Features of all kinds can be used: geometric, photometric,

color, texture, moments, etc.
 

CSE668 Sp2011 03-06



Feature-matching can also be used for dense point-sets.

The key is to have "prominent features" characterize

each point. For instance, suppose all edges are extracted

from both images. Starting from a prominent corner, the

ordering and disparity smoothness constraints can be used

to propagate correspondences along the edge away from
the corner.
 

See the text's description of one of many such algorithms,
the PMF algorithm.
 

CSE668 Sp2011 03-07



Correspondence as maximum correlation over a window
 

As the pointsets to be corresponded get denser, and the

features less prominent, the performance of feature-matching
algorithms degrades. For instance, shape-from-stereo
algorithms could be initialized by feature-matching "front

ends", but not every point in an image has prominent

features.
 

Correlation-based correspondence: for a given pointset
U={u1...un} in the left image to be corresponded (a code
sketch follows the steps below),

   1. i=1;

   2. center an mxm template at ui in the left image;

   3. for each point (p,q) in the right image compute

      the mean-square error between the template and an

      mxm window centered at (p,q).

   4. Designate the set of all points in the right image

      whose mse is within a tolerance level of the minimum

      mse as the ui-correspondence candidate set.

   5. Use the correspondence constraints to eliminate

      candidates, then select the remaining candidate

      with the least mse as the corresponding point.

   6. If i=n stop, else increment i and return to step 2.
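
A minimal Python/numpy sketch of steps 1-6 above, assuming grayscale
images indexed as image[v, u] (row, column) and input points far enough
from the border for an m x m window; the function name, the tol
parameter and the optional v_search row restriction (a crude stand-in
for the epipolar constraint) are illustrative, not from the text.

import numpy as np

def correspond(left, right, points, m=11, tol=1.05, v_search=None):
    """Correlation-based correspondence: one best match per left-image point."""
    r = m // 2
    H, W = right.shape
    matches = []
    for (u, v) in points:                           # steps 1, 2, 6: loop over U
        T = left[v - r:v + r + 1, u - r:u + r + 1]  # m x m template centred at u_i
        rows = range(r, H - r) if v_search is None else v_search
        scored = []
        for q in rows:                              # step 3: scan the right image
            for p in range(r, W - r):
                win = right[q - r:q + r + 1, p - r:p + r + 1]
                scored.append((np.mean((T - win) ** 2), p, q))
        mse_min = min(s[0] for s in scored)
        # step 4: candidates whose mse is within tolerance of the minimum
        cand = [s for s in scored if s[0] <= tol * mse_min + 1e-12]
        # step 5 (simplified): the correspondence constraints would prune cand
        # here; we just keep the least-mse candidate
        best = min(cand)
        matches.append((best[1], best[2]))
    return matches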
 
 

CSE668 Sp2011 03-08



Shape-from-shading
 

The basic idea of shape-from-shading is that

variations in the brightness of nearby points in a

scene can usually be ascribed to variations of the

orientation of the surface with respect to the

light source.
 

Eg:
 

CSE668 Sp2011 03-09



If the angles subtended by Light Rays 1 and 2 are

the same, the total radiant energy striking the

object is the same. But that energy is spread out

over a larger surface patch for Ray 1, since the

angle between the outward pointing normal (OPN) and a

vector pointing back at the light source is larger

than for Ray 2. So if this radiant energy

is reflected off the surface and captured by two

pixels of equal size on the image plane, pixel 1

will see a surface patch with lower energy density

striking it and so pixel 2 will be brighter. 

 

CSE668 Sp2011 03-09a



From the brightness variation we can deduce that

the OPN of the surface patch imaged at pixel 2

points back at the light source, while the normal

at pixel 1, appearing less bright, must "look away"

from the light source.

Eg: Sphere with light source at camera.


CSE668 Sp2011 03-10



Radiometry and Photometry
 

Radiometry deals with the measurement of flows of

radiant energy (eg. visible light, RF, UV) and

photometry deals with how the human eye interprets

visible light.
 

The detailed study of brightness and color

variations in an image and how they are related to

objects in the scene, their shapes, locations and

surface characteristics, the scene illumination, and

the properties of the focal plane array requires

careful radiometric modelling. Here we can only

summarize these considerations as they pertain to

the problem of shape-from-shading.
 

We will not consider photometry at all. That is

essential for understanding how humans do

shape-from-shading, but our goal is to understand

the general process.

CSE668 Sp2011 03-11


 

Some radiometric quantities and their definitions
 

Radiometry is concerned with how radiant energy is

emitted from a source, distributes itself in space,

is scattered by material objects, is redistributed

after scattering, then strikes and stimulates a

detector surface.
 

1. Radiant flux (Phi) is the amount of power

radiated by a source. Units: Watts (W). Radiant

energy is the time-integral of radiant flux.

Units: Joules (J).
 

2. Solid angle (theta) subtended by a 3-D cone of
rays can be measured by surrounding the apex of the
cone with a unit sphere, then measuring the surface
area cut out by the cone, ie. the area of the
intersection of the sphere's surface and the cone.
Units: steradians (sr).
 

3. Radiance (L) is the radiant flux from an

emitting surface per unit surface area and per unit

solid angle. Units: W m^-2 sr^-1.

CSE668 Sp2011 03-12


   


Eg: A sphere with total surface area of 1 mm2

    radiates a radiant flux of 10 W. Then if the

    flux is uniformly distributed, this source

    produces a radiance of

    L = (10 W) / ((10^-6 m^2)(4 pi sr)) = 795.8 kW m^-2 sr^-1

4.  Irradiance (E) is the power (radiant flux) per unit

    surface area that strikes a surface in the presence

    of radiant energy.

 

So a radiant source "throws" a spatial distribution

of radiance L(x,y,z) out into the scene. The source

might be a lightbulb, or it might be a reflecting

surface. A detector "catches" some amount of this

radiance at the point (u,v) on the image plane as

irradiance E. The brightness of the pixel at (u,v)

depends upon E. Note that for a given radiance L at

a point (x,y,z), the irradiance of a surface point

there will depend on its angle relative to the

direction of the radiant ray.

CSE668 Sp2011 03-13



    Eg: Our light source in the previous example
    illuminates a 1 cm x 1 cm film located 5 m away.
    What is the irradiance on the film?

    The source produces 795.8*10^3 * 10^-6 = 0.796 W/sr
    of radiant intensity. The 1 cm x 1 cm film located
    5 m away subtends a solid angle of approximately

    theta = 10^-4 m^2 / (5^2 m^2) = 4.0 micro-sr

    So the film will "catch"

    0.796 W/sr * 4.0 micro-sr = 3.18 microW

    of radiant power. Then dividing by the film's surface
    area, the irradiance is

        E = 3.18 microW / 10^-4 m^2 = 31.8 mW/m^2
 

CSE668 Sp2011 03-14


 

 

    Note that if the film were oriented at an angle of
    say 45° with respect to the radiant ray, then the
    solid angle subtended by the film at the source
    drops by a factor of cos 45°, and so the irradiance
    will be reduced to 0.707 times the value calculated
    above.

     This figure is a cross-section showing that the film patch
     height on a sphere surrounding the source is reduced by a
     factor of 0.707 when the film is tipped 45°. Thus the solid
     angle subtended by the film as viewed from the source is
     decreased by the same factor.

     For this 45°-"tipped" film surface, the irradiance is now

         E = 0.707 * 31.8 mW/m^2 = 22.5 mW/m^2
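
A short numeric check of the two worked examples above (a sketch;
values in SI units, printed results match the slides up to rounding).

import numpy as np

Phi    = 10.0                        # radiant flux of the source, W
A_src  = 1e-6                        # source surface area, m^2 (1 mm^2)
L      = Phi / (A_src * 4 * np.pi)   # radiance: ~795.8e3 W m^-2 sr^-1
I      = L * A_src                   # radiant intensity: ~0.796 W/sr
A_film, d = 1e-4, 5.0                # 1 cm x 1 cm film, 5 m away
omega  = A_film / d**2               # solid angle subtended: 4.0e-6 sr
P      = I * omega                   # power caught: ~3.18e-6 W
E      = P / A_film                  # irradiance: ~31.8e-3 W/m^2
E_tip  = E * np.cos(np.radians(45))  # film tipped 45 deg: ~22.5e-3 W/m^2
print(L, I, omega, P, E, E_tip)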

CSE668 Sp2011 03-14a



Reflectance
 

The power which is reflected from a surface per

unit surface area into a given solid angle depends

on several factors: the irradiance at the surface,

the surface orientation relative to the ray carrying

that irradiance towards the surface, and the surface

characteristics.
 

CSE668 Sp2011 03-15



Surface orientation

 

For each point (x,y) in the image, let z(x,y)

specify the z-coord of the surface point in the

scene visible at (x,y) in camera coordinates.

Then let us define the surface gradient vector

[p q]T as the vector of partial derivatives
 

        [p q]^T  =  [dz/dx  dz/dy]^T
 

The 3-D vectors [1 0 p]^T and [0 1 q]^T, in the xz

and yz planes respectively, point tangent to the

surface, hence a vector perpendicular to both

must be a normal to the surface. One such is

the vector cross-product of these two vectors

[-p -q 1]^T. (p,q) is called gradient space, and

gradient space specifies the surface orientation.
 
 

CSE668 Sp2011 03-15a



Surface characteristics

 

Surfaces reflect a specific fraction of their
irradiance; that fraction is called the albedo. The
reflected power leaves the surface as radiance
distributed through the 2*pi solid angle (the
hemisphere) centered about the surface normal.
 

If the reflected radiance is uniform in that

solid angle, the surface is said to be Lambertian,

and the reflectance (reflected radiance) satisfies
 

        R =  rho*cos(Theta_i)/pi
 

where Theta_i is the angle of incidence (angle

between the normal and the incoming irradiance

ray) and rho is the albedo. 
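
A minimal sketch of this Lambertian reflectance as a function of the
surface gradient (p,q), using the normal n = [-p -q 1]^T from the
previous slide; the function name is illustrative and s is assumed to
be the unit vector pointing back toward the light source.

import numpy as np

def lambertian_R(p, q, s, rho=1.0):
    """R = rho * cos(Theta_i) / pi for a Lambertian surface; cos(Theta_i)
    is clamped at 0 for self-shadowed patches (an added assumption)."""
    n = np.array([-p, -q, 1.0])                     # outward surface normal
    cos_ti = max(0.0, float(np.dot(n, s)) / np.linalg.norm(n))
    return rho * cos_ti / np.pi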

CSE668 Sp2011 03-15b



 

In general, R depends on rho, on cos(Theta_i), and on the
difference between the angle of incidence and the angle
at which R is being evaluated (the angle of reflection,
or view angle). Only for pure Lambertian surfaces with no
specular component is R independent of the angle of
reflection. That is the only case we will consider here;
the equation for R gets complicated quickly when a
specular component is included as well.
 

CSE668 Sp2011 03-16



Specular vs. Lambertian
 

A pure specular surface is one whose reflectance R

is zero except for that angle of reflection

equal to the angle of incidence. Such surfaces

have a "mirror" or shiny quality. Lambertian

surfaces, on the other hand, have a matte or

dull surface luster.
 

Pure specular and pure Lambertian surfaces are

just limiting cases. Real surfaces in general

have reflectance functions R which are neither

uniform nor non-zero in just one direction.
 

CSE668 Sp2011 03-16a




Image irradiance equation

 

The basic radiometric conservation equation is that

which says the irradiance E "caught" at a point

(x,y) on the image plane equals the radiance R

"thrown" at that point from the corresponding

object point which is being imaged. In simplest

form (see Sonka p 493-494, Trucco p 223-224)
 

 E(x,y) = R(p(x,y),q(x,y)) = R(dz/dx,dz/dy)
 

where z=z(x,y) is the depth of the surface point.

CSE668 Sp2011 03-16b



This form of the irradiance equation assumes,

among other things, that the radiance function

depends only on surface orientation (Lambertian

assumption), that the light sources and

image plane are distant from the scene, and so

perspective projection is well characterized by

orthographic projection, ie. image coordinates

(u,v) equal the camera coordinates (x,y).

CSE668 Sp2011 03-17


Eg:

Orthographic projection means that the image is

projected onto the x-y image plane in the "usual"

way we think of projection of a 3D point onto a

2D plane, [x y z] -> [x y 0]. Note that using

orthographic projection, the image plane location

[u v]^T = [x y]^T does not depend on z. With perspective
projection, as in our pinhole model, the projection of

an object point [x y z] into the image plane depends not

just on x and y, but z as well.

CSE668 Sp2011 03-18




Then continuing with the present example, for each point
on the surface of the paraboloid,

            z(x,y) = x^2 + y^2 + 10

    p(x,y) = dz/dx = 2x;  q(x,y) = dz/dy = 2y.
 

Let's determine R(p(x,y),q(x,y)). Assuming a Lambertian
surface with albedo rho = 1, R = cos(Theta_i)/pi where
Theta_i is the angle between the surface normal and the
vector pointing back at the light source. Here

        n = [-p -q 1]^T = [-2x -2y 1]^T

But the cosine of the angle between a pair of vectors
equals their inner product divided by the product of
their norms, so with the source along the -x axis
(unit vector [-1 0 0]^T pointing back at it),

        cos Theta_i =  [-2x -2y 1][-1 0 0]^T
                      ------------------------
                      |[-2x -2y 1]|*|[-1 0 0]|
 

CSE668 Sp2011 03-19


and the irradiance equation becomes
 

        E(x,y) =              2x
                    ----------------------
                    pi*(4x^2+4y^2+1)^(1/2)
 

Thus for instance, the brightness E on the image

plane would be zero along the y-axis (why?) and

go asymptotically to its max for large x (why?).
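
A short sketch that evaluates this shading model numerically and checks
the two claims above; the light direction s = [-1 0 0]^T and unit albedo
follow the example, the rest is illustrative.

import numpy as np

def E_paraboloid(x, y):
    """Predicted irradiance for z = x^2 + y^2 + 10 under the Lambertian model."""
    p, q = 2 * x, 2 * y                         # surface gradient
    n = np.array([-p, -q, 1.0])                 # (unnormalized) outward normal
    s = np.array([-1.0, 0.0, 0.0])              # unit vector back toward the source
    cos_ti = max(0.0, float(np.dot(n, s)) / np.linalg.norm(n))
    return cos_ti / np.pi

print(E_paraboloid(0.0, 3.0))    # 0 on the y-axis: the normal is perpendicular to s
print(E_paraboloid(50.0, 0.0))   # ~1/pi = 0.318: for large x the normal tends toward s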
 

CSE668 Sp2011 03-20




Solving the IIE E(x,y)=R(p,q) for p(x,y) and q(x,y)



Given the irradiance E(x,y) over an image, the image

irradiance equation E(x,y)=R(p(x,y),q(x,y)) can be

used to determine (or estimate) p(x,y) and q(x,y).

If you know the orientation of the tangent plane

to the surface everywhere, you know the shape.


There are two classes of methods used for solving

the IIE, which is a partial differential equation. The

characteristic strip method is a classical pde-solver,
while the optimization method works on the principle

of finding a function which is a good global fit to the

data.

CSE668 Sp2011 03-20a




 Characteristic strip method:
 

Starting from a point (x,y) where z, p and q are
known, pick a path parameter l and solve

      dx/dl = R_p,  dy/dl = R_q,  dz/dl = p*R_p + q*R_q,
      dp/dl = E_x,  dq/dl = E_y

where R_p, R_q denote the partials of R with respect to
p, q and E_x, E_y the partials of E with respect to x, y.
 

Known points include the occluding boundary and the max/min
points of brightness, where the patch orientation is
perpendicular to, respectively parallel to, the light ray.
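
A rough Euler-integration sketch of these ODEs, assuming E is a 2-D
image array indexed E[y, x] and R(p, q) is a callable reflectance map;
the function name, nearest-pixel lookup of E_x, E_y, numeric partials
of R and the fixed step dl are all illustrative simplifications.

import numpy as np

def characteristic_strip(E, R, x0, y0, z0, p0, q0, steps=200, dl=0.5, h=1e-4):
    """Integrate one characteristic strip starting where (z, p, q) are known."""
    Ey, Ex = np.gradient(E)                         # image partials E_y, E_x
    x, y, z, p, q = map(float, (x0, y0, z0, p0, q0))
    strip = [(x, y, z)]
    for _ in range(steps):
        Rp = (R(p + h, q) - R(p - h, q)) / (2 * h)  # numeric dR/dp
        Rq = (R(p, q + h) - R(p, q - h)) / (2 * h)  # numeric dR/dq
        i, j = int(round(y)), int(round(x))
        if not (0 <= i < E.shape[0] and 0 <= j < E.shape[1]):
            break                                   # strip has left the image
        x += dl * Rp
        y += dl * Rq
        z += dl * (p * Rp + q * Rq)
        p += dl * Ex[i, j]
        q += dl * Ey[i, j]
        strip.append((x, y, z))
    return strip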

 

CSE668 Sp2011 03-21




Optimization method:

 

Given the irradiance map E(x,y), and given the
reflectance map R(p,q) (but not knowing

the (p,q) values as functions of (x,y)), find

the functions p(x,y) and q(x,y) which minimize

the sum over the entire image of the energy fn
 

    [E(x,y)-R(p,q)]^2 + λ[ |grad p|^2 + |grad q|^2 ]

Here |grad()|^2 is the squared gradient magnitude (the sum
of squares of the first partials with respect to x and y).
It is a frequently-used measure of the irregularity of a fn.


This minimization is a variational calculus problem

which can be solved via relaxation methods.
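
A hedged sketch of one common relaxation scheme for this minimization
(in the spirit of the Ikeuchi-Horn iteration); R, Rp, Rq are callables
for the reflectance map and its partials with respect to p and q
(vectorized over arrays), boundary handling is simplified to
wrap-around, and all names are illustrative.

import numpy as np

def sfs_relaxation(E, R, Rp, Rq, lam=1000.0, iters=500):
    """Relax p(x,y), q(x,y) toward a minimum of
       sum [E - R(p,q)]^2 + lam * (|grad p|^2 + |grad q|^2)."""
    p = np.zeros_like(E)
    q = np.zeros_like(E)
    for _ in range(iters):
        # 4-neighbour local averages used by the smoothness term
        pbar = 0.25 * (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
                       np.roll(p, 1, 1) + np.roll(p, -1, 1))
        qbar = 0.25 * (np.roll(q, 1, 0) + np.roll(q, -1, 0) +
                       np.roll(q, 1, 1) + np.roll(q, -1, 1))
        err = E - R(pbar, qbar)
        # Jacobi-style update derived from the Euler-Lagrange equations
        p = pbar + err * Rp(pbar, qbar) / (4.0 * lam)
        q = qbar + err * Rq(pbar, qbar) / (4.0 * lam)
    return p, q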

 

          CSE668 Sp2011 03-22 


 

Once p(x,y) and q(x,y) are known over the entire
image, since the surface gradient is defined by

           [p q]^T  =  [dz/dx  dz/dy]^T

then taking a single fixed depth reference point
(x_0, y_0, 1) we can integrate outward over the image
from that point to recover normalized z(x,y).
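
A naive integration sketch, assuming p and q are arrays indexed [y, x]
with unit pixel spacing and the depth reference at the top-left pixel;
more robust schemes average over many integration paths.

import numpy as np

def integrate_gradient(p, q, z0=1.0):
    """Recover z(x,y) up to the reference depth z0 by marching along the
    first row with p = dz/dx, then down each column with q = dz/dy."""
    H, W = p.shape
    z = np.zeros((H, W))
    z[0, 0] = z0
    z[0, 1:] = z0 + np.cumsum(p[0, 1:])
    z[1:, :] = z[0, :] + np.cumsum(q[1:, :], axis=0)
    return z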
 
 

CSE668 Sp2011 03-23




Now what can we say about the effectiveness of the

shape-from-shading algorithms? The problem is

in general ill-posed, ie. has more degrees of freedom

than constraints induced by shading of the image.

Ill-posed problems can be regularized by addition

of hard constraints, for instance limiting the

surface characteristics or the types of objects

being imaged (blocks-world approach) and/or by

addition of soft constraints (optimization

approach using irregularity as soft constraint).
 

The problem is whether the regularized problem becomes

artificially overconstrained.

CSE668 Sp2011 03-24



 

Other shape-from algorithms

Stereo and shading are not the only cues that can

be used to infer local surface orientation and

depth distribution.
 

Shape-from-texture
 

Consider the following image:

It is hard to avoid the perception that the

surface shown tips down to the left and

forward. Why do we have that perception?
 

CSE668 Sp2011 03-25


A texel is the basic repeating pattern of a 2-D

texture. A texture is any 2-D periodic image or

portion of an image.
 

Eg: In the binary image

   000010000100001000010000100001000010000100001
   100101001010010100101001010010100101001010010
   000010000100001000010000100001000010000100001
   100101001010010100101001010010100101001010010
   000010000100001000010000100001000010000100001
   100101001010010100101001010010100101001010010
          .....
 

   the texel is

   00100
   01010

   The texture has 2-D periodicity (2,5).

Note that the texel for a given periodic image is defined

only up to arbitrary horizontal and vertical phases.


Eg (cont'd): shifting the horizontal phase by 2 pixels, the

    texel for the previous example can be written 10000
                                                  01001

    or shifting horizontal phase by 2 and vertical phase by

    1, can also be written 01001
                           10000.
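
A small sketch that recovers the 2-D periodicity of such a texture by
testing cyclic shifts, assuming the array spans a whole number of
periods in each direction; the function name is illustrative. For the
example above it returns (2, 5).

import numpy as np

def periods(T):
    """Smallest vertical and horizontal periods of a 2-D array T."""
    def smallest(n, equal_after_shift):
        return next(s for s in range(1, n + 1) if equal_after_shift(s))
    pv = smallest(T.shape[0], lambda s: np.array_equal(np.roll(T, s, axis=0), T))
    ph = smallest(T.shape[1], lambda s: np.array_equal(np.roll(T, s, axis=1), T))
    return pv, ph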

CSE668 Sp2011 03-26



 

The eye tends to assume that once a texel has been
identified, distortions to its shape are due to
changes in the orientation of the surface, or due
to perspective projection.
 

   Eg: Suppose we have identified a texel as a
       square, but at one point in the image
       one of these texels looks like a trapezoid
       in perspective projection. From that distortion
       we can infer the plane of the surface, and
       hence its normal.


 

CSE668 Sp2011 03-27



 

Eg: A square mesh distorted by the shape of

    a function into a "wire-mesh 3-D

    graph." The function is the sinc fn
 

       z(x,y)=sin(d(x,y))/d(x,y)

    where

       d(x,y) = sqrt(x^2+y^2)

CSE668 Sp2011 03-28



 
 

Shape-from-texture algorithms work by determining

the undistorted texel shape, then matching each

texel in the image to that shape modified by a

distortion function which reveals the surface

orientation. It can be done discretely, texel by

texel, or continuously, computing rate of change

of surface gradient vector.
 

Eg: A square texel would look like one of the

    texels two images ago if its surface normal

    was approximately [1 1 -1]^T.
 

Sonka does not give any actual algorithms; see

Trucco STAT-SHAPE_FROM_TEXTURE p. 240-241 for

one such.
 

CSE668 Sp2011 03-29



Shape-from-photometric-stereo

 

For a Lambertian surface and using orthographic projection,

we showed that
 

    R =  rho (cos Theta_i / pi)
 

where rho was the albedo, and Theta_i the angle

between the light source direction and the surface normal. Rewrite as
 

    R = rho (L*n)
 

where L is a vector whose direction is that of the

light ray and magnitude is the strength of the

irradiance, and n is the unit surface normal

vector. Note the factor of 1/pi is subsumed in L.
 

CSE668 Sp2011 03-30



 

For a single fixed surface point (x,y) suppose we

take 3 images with three different L-vectors, ie.

using three different lighting conditions:
 

    E1(x,y) = rho (L1*n)
    E2(x,y) = rho (L2*n)
    E3(x,y) = rho (L3*n)
 

Here Ei are the three measured irradiances at the

same point (x,y) in the image plane under the three

different lighting conditions. Then we can re-form

these 3 scalar equations into a single vector-matrix

equation
 

                E/rho = L*n
 

where E = [E1 E2 E3]T, and the 3x3 matrix L has

L1, L2 and L3 as its rows. Then
 

        n = (1/rho)*L^-1*E
 

CSE668 Sp2011 03-31



 

This is shape-from-photometric-stereo. Note that
if more than 3 images are available, a least-squares
fit to the overdetermined equations can be used (more robust).
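
A minimal sketch of this recovery at a single pixel, using least squares
so that it covers both the k = 3 case (equivalent to the L^-1 formula
above) and the overdetermined k > 3 case; the function name is
illustrative.

import numpy as np

def photometric_stereo(E, L):
    """E : k measured irradiances at one pixel (shape (k,));
    L : k light vectors as rows (shape (k, 3)).
    Returns the albedo rho and the unit surface normal n."""
    g, *_ = np.linalg.lstsq(L, E, rcond=None)   # g = rho * n
    rho = np.linalg.norm(g)
    return rho, g / rho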
 

The limitations of s-f-p-s as presented here are the
perfect Lambertian assumption and the use of
orthographic projection. It gets a lot more complicated
without those idealizations.
 

Another concern: not very biomimetic, ie. doesn't

relate to any evolutionary mechanism.
 
 

CSE668 Sp2011 03-31a



 

Shape-from-motion
 

If we watch an object go around a merry-go-round,

we can perceive its 3-D shape from the many views

of it which we see. Similarly, if we walk around a

statue, we can recover its shape. This is

shape-from-multiple-views, or shape-from-motion.  

The key tool here is the Structure from motion theorem.


Structure from motion theorem: given 4 or more

non-co-planar points on a rigid body as viewed

in 3 or more distinct orthographic

projections, the 3-D displacements between the

points may be determined.
 

CSE668 Sp2011 03-32



 

Proof of the theorem involves only projective

geometry, and proceeds by selecting one of the

points as the origin, finding the translation

of that point between images and the rotation

of the rigid body around that point (see Sonka

p. 510-512).
 

One difficulty with shape-from-motion is separating
shape parameters from motion parameters.

Another is the rigid body assumption. For

instance, cars are rigid bodies but tractor

trailer trucks are not. People are not.
 

CSE668 Sp2011 03-33



 

Shape-from-motion can also be approached on the

continuous level from optical flow, ie. the

movement in image space of grey levels. We will

develop optical flow in the next section of this

course, on motion analysis.
 

Other shape-from concepts discussed in Sonka

include shape-from-focus/defocus and shape-

from-contour. These are perhaps less important

than shape-from-X where X={stereo, shading,

texture, photometric stereo, motion}. In nature

all these, with the exception of X = photometric

stereo, have been shown to be important shape

determination mechanisms.

CSE668 Sp2011 03-34



Summary remarks about shape-from methods:

 

1.  Shape-from-stereo is a practical tool in

    many realistic settings.
 

2.  Shape-from-shading works well only in

    structured and well-characterized settings.
 

3.  Other shape-from methods are of limited

    practical value in current practice; they work
    only for a narrow range of applications.
 

4.  What is missing from these methods is the

    notion of goal-seeking behavior, view

    planning. We will explore that shortly.
 

CSE668 Sp2011 03-35