Announcement: Office hours today, Feb 15, 4:40-5:30pm
CSE668 Principles of Animate Vision, Spring 2011
4. Natural Vision
Required reading: Levine Ch 3.
This course is about animate vision principles and systems. Animate vision
refers to the design of vision systems based on the physiological and
anatomical principles of vision in animal species. Since the animate vision
approach to the design of computer vision systems is inspired by natural
(biological) vision, it would do us well to begin by looking at how vision
works in humans and other animals. So that is what we will do.
CSE668 Sp2011 Peter Scott 06-01
Biological vision systems input energy from a range of frequencies in the
electromagnetic spectrum (usually but not exclusively the "visible spectrum,"
400 to 700 nm wavelengths), and output behaviors.
Eg: You see a car coming, so you stop crossing the street.
Eg: You recognize your friend and you smile.
Eg: A snail detects a moving shadow, so it gathers itself into its shell
and stops.
Because the output of vision is behavior, it is difficult to do input-output
(black-box) analysis. Behavior is complicated, and is the outcome of more than
just the visual or general perceptual cues that impinge on the animal.
CSE668 Sp2011 Peter Scott 06-02
We will describe some elements of the anatomy and the physiology of vision.
Anatomy is the description of the physical parts; physiology is the
description of the functioning of these parts.
A biological vision system can be divided into three major parts:
1. The eye, that part which receives the visible
energy input and converts it to an electrical signal
that the animal's nervous system can work with;
2. The optic nerve, which transmits the coded
signals to the central nervous system;
3. The brain, which interprets the vision signals,
integrates them with other processes such as cognitive
and motor, and produces behaviors.
CSE668 Sp2011 Peter Scott 06-03
The eye
The eye serves as the camera for the vision system. Here
we will describe some of its anatomy and physiology.
Please refer to the Levine reading, which has some
excellent detailed graphics I cannot reproduce here.
The action of the eye is to focus an image onto the retina. This is the
"image plane." If the eye were a digital camera, this is where the CCD chip
would be. If it were an analog camera, this is where we would place the film.
The retina, covering a bit more than the back half of the spherical eyeball,
is clearly not a plane. It should really be called the "image hemisphere"
rather than the "image plane."
CSE668 Sp2011 Peter Scott 06-04
In order to be focused onto the retina, light rays in any system other than a
pinhole camera must be bent. An image is said to be focused when all the light
radiating from a given point (x,y,z) in world coordinates irradiates only a
single point (u,v) on the image plane (or image hemisphere in the case of the
eye).
In the eye, this focusing function is done by the cornea and anterior chamber
(where the aqueous humor is) and the lens. Together they form a multielement
or compound lens.
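As an aside (not from the Levine reading), the compound optics can be
approximated by a single thin lens. Here is a minimal Python sketch of the
standard thin-lens focus condition, 1/d_o + 1/d_i = 1/f, with an assumed
eye-like image distance of about 17mm (a textbook ballpark, not a value from
the slides):

# Thin-lens approximation of the eye's focusing (illustrative sketch only).
# A world point at distance d_o is imaged in focus at distance d_i behind
# the lens when 1/d_o + 1/d_i = 1/f.

def focal_length_for(d_o_m, d_i_m):
    """Focal length needed to image a point at d_o sharply at d_i."""
    return 1.0 / (1.0 / d_o_m + 1.0 / d_i_m)

d_i = 0.017   # assumed lens-to-retina distance, roughly 17mm
for d_o in (1e9, 1.0, 0.25):   # far away, 1 m, 25 cm reading distance
    f = focal_length_for(d_o, d_i)
    print(f"object at {d_o:.2f} m -> required focal length {f*1000:.2f} mm")

Nearer objects require a shorter focal length, which the eye achieves by
changing the shape of its lens (accommodation) rather than by moving the
image surface.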
[Diagram: a compound lens in a camera.] Normally, the lens elements are
cemented together, and the focal plane, where the image is formed, is flat.
CSE668 Sp2011 Peter Scott 06-05
A camera needs a diaphragm to control the size of the aperture (hole) through
which light travels. The human eye's diaphragm is the iris-pupil system, whose
aperture is controlled by muscle fibers. There are sphincter (tangential)
muscles to stop the eye down, and dilator (radial) muscles to open up the
pupil when more light is needed.
An imperfect lens produces distortions on the image surface. That is the case
with the human eye, and these distortions are not compensated in the optics;
it is the brain that "cleans up" the image so it appears undistorted. But the
spatial bandwidth limitation cannot be compensated.
Eg: You look at a stripe pattern. As it gets further and further away, at
some point you can no longer see the individual stripes. This is the spatial
bandwidth limit, expressed in lines/mm.
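To put rough numbers on the stripe example (an illustrative sketch; the 30
cycles/degree figure is an assumed ballpark for foveal acuity, not a number
from the slides), the distance at which a stripe pattern of a given pitch
blurs together can be computed as follows:

import math

def max_resolvable_distance(stripe_pitch_m, cycles_per_degree=30.0):
    """Distance beyond which one stripe pair (one cycle) subtends less
    than the smallest resolvable visual angle."""
    min_angle_rad = math.radians(1.0 / cycles_per_degree)
    # small-angle approximation: angle ~ pitch / distance
    return stripe_pitch_m / min_angle_rad

print(max_resolvable_distance(0.01))   # 1 cm stripes blur beyond ~17 m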
CSE668 Sp2011 Peter Scott 06-06
The retina
The image is formed on a thin layer called the retina. The retina covers more
than 180 degrees around the interior surface of the eyeball (>2π steradians),
allowing us to see a few degrees towards the back while we are looking
forward.
The retina consists of five layers of cells arranged
radially. One layer, the photoreceptive layer, is
devoted to electrooptical transduction. There are
two kinds of photoreceptors in that layer: rods and
cones.
Cones are photopic, color-sensitive photoreceptors. Rods are scotopic
brightness photoreceptors. Photopic vision is normal day vision; scotopic is
night vision.
Both rods and cones are unevenly distributed along the retina. Cones are
packed closely together in the fovea, or optical center of the retina. Their
density falls off with distance from the fovea. Rods are absent from the
central part of the fovea, are dense in the para-fovea, the area surrounding
the fovea, and their density also falls off with distance from the fovea
thereafter. The area far from the fovea is called the peri-fovea or
peripheral retina.
CSE668 Sp2011 Peter Scott 06-07
The fovea is a depression in the retina about 1.5mm in diameter, and subtends
about 5.2 degrees of visual angle*. Its central part, only 0.3mm across, is
where the density of cones, and thus the spatial acuity (resolution,
bandwidth), is greatest. This highest-acuity part of the visual field, the
part impinging on the center of the fovea, is about the size of your thumbnail
viewed at arm's length, roughly 1.0 degree of visual angle*. That's all we can
see in great detail; the rest is blurred.
The spatial resolution in the periphery is about two orders of magnitude less
than that in the central fovea.
* Concerning visual angle: When an angle is measured in degrees or radians
rather than steradians, it refers to a plane angle, not a solid angle. The
solid angle subtended in R^3 due to a plane angle in R^2 is computed by
rotating the arms of the angle around their bisector to form a cone.
CSE668 Sp2011 Peter Scott 06-07a
In the case in point, the plane is that defined in space by the three points
x1: top of the thumbnail held at arm's length, x2: center of the fovea,
x3: bottom of the thumbnail. Then 1 degree is the plane angle formed by the
connected line segments x1-x2-x3 at the vertex x2.
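A quick numerical check of this footnote (a sketch with assumed numbers: a
1.5cm thumbnail held at 65cm, roughly arm's length), computing both the plane
angle at the eye and the solid angle of the cone obtained by rotating that
angle about its bisector:

import math

thumbnail_m = 0.015   # assumed thumbnail height
distance_m = 0.65     # assumed arm's length

# Plane angle x1-x2-x3 at the vertex x2 (center of the fovea), in degrees.
plane_angle_rad = 2.0 * math.atan((thumbnail_m / 2.0) / distance_m)
print(math.degrees(plane_angle_rad))   # ~1.3 degrees, i.e. roughly 1 degree

# Solid angle of the cone swept out by rotating the plane angle about its
# bisector (standard formula: 2*pi*(1 - cos(theta/2)) steradians).
print(2.0 * math.pi * (1.0 - math.cos(plane_angle_rad / 2.0)))   # ~4e-4 sr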
CSE668 Sp2011 Peter Scott 06-08
The photoreceptors contain dyes, or photopigments, that
absorb photons and change their transmissivity. There
are three dyes in cones, just one in rods. Light passing
through these dyes strikes a photosensitive membrane
within the rod or cone, which absorbs photons and
builds up a transmembrane potential. When this TMP
reaches threshold, the cell "fires off" an action
potential. The action potential is an electrical pulse
which is then communicated to other cells by wiring
structures called synapses.
CSE668 Sp2011 Peter Scott 06-09
Retinal neurons
There are four other layers of cells in the retina that preprocess the
electrical signals originating in the photoreceptors and prepare them for
transmission to the central nervous system via the optic nerve. They are all
nerve cells, or neurons.
Horizontal cells interface with rods and cones.
Bipolar cells blend together the outputs of several
horizontals.
Amacrine cells create horizontal filtering, e.g. center-surround filters
(see the sketch after this list).
Ganglion cells interface with the optic nerve.
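As an illustration of center-surround filtering (a sketch of the idea, not
code from the course), a difference-of-Gaussians kernel has the classic
excitatory center and inhibitory surround:

import numpy as np

def difference_of_gaussians(size=9, sigma_center=1.0, sigma_surround=2.5):
    """Center-surround (difference-of-Gaussians) kernel: a narrow
    excitatory center minus a broader inhibitory surround."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    center /= center.sum()
    surround /= surround.sum()
    return center - surround   # responds to local contrast, not uniform light

print(difference_of_gaussians().round(3))   # positive center, negative ring

Convolving an image with such a kernel enhances edges and suppresses uniform
regions, one form of the contrast enhancement and redundancy elimination
discussed on the next slide.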
CSE668 Sp2011 Peter Scott 06-10
There is strong evidence for hardwired preprocessing
at the retinal level to accomplish various "preattentive"
operations, including:
1. Contrast enhancement;
2. Motion detection;
3. Elimination of redundant information;
4. Anti-aliasing;
5. Noise suppression.
For our purposes, it is enough to say that the
signals transmitted to the brain are preprocessed to
enhance important image features and suppress noise and
distortion. Preattentive means processing that is always
done regardless of what part (if any) of the visual data
we are actually paying attention to.
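As one concrete example of a preattentive operation, motion detection can be
sketched as simple frame differencing (an illustrative sketch, not a model of
the actual retinal circuitry):

import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=15.0):
    """Crude motion detector: flag pixels whose brightness changed by more
    than `threshold` between consecutive frames."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    return diff > threshold   # boolean mask: "something moved here"

# Toy usage: a bright square shifts one pixel to the right between frames.
prev = np.zeros((8, 8)); prev[2:5, 2:5] = 100
curr = np.zeros((8, 8)); curr[2:5, 3:6] = 100
print(motion_mask(prev, curr).astype(int))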
CSE668 Sp2011 Peter Scott 06-11
An important fact is that the optic nerve contains only about 1x10^6 fibers,
while there are 10-100x10^6 photoreceptors. So another important property of
the retina is data compression, by a factor of roughly 10 to 100.
Pathways beyond the retina: the optic nerve
The left and right eye branches of the optic nerve merge beneath the brain in
the optic chiasm, where fibers from the two left-half images merge, and
likewise the two right-half images. This is for stereo.
The two halves of the optic nerve then project to the superior colliculus and
the visual or striate cortex. The superior colliculus is common to man and
lower animals; the cortex is the part of the brain in which higher cognitive
processing occurs. More primitive visual functions, like deciding eye
movements, occur in the superior colliculus, while abstract image
understanding occurs in the striate cortex and the brain regions it projects
to.
CSE668 Sp2011 Peter Scott 06-12
Neurons and neuronal signal processing
Neural nets have been on the scene a very long time, perhaps a half billion
years. We are just getting around to engineering them artificially.
Information processing in the human brain takes place in
a vast interconnection of specialized cells called
neurons.
CSE668 Sp2011 Peter Scott 06-13
When the cell body (soma) reaches threshold potential, an action potential is
fired down the axon.
When the action potential reaches a synapse, or connection with another
neuron, it causes a neurotransmitter to flow into the synaptic cleft.
Neurotransmitters include the organic chemicals dopamine, acetylcholine,
serotonin and norepinephrine. They each drive different types of neurons.
The neurotransmitter causes the post-synaptic membrane to become more negative
(inhibitory synapse) or more positive (excitatory synapse). The net effect of
all the post-synaptic membrane stimuli is to move the soma potential closer to
or further from threshold.
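A minimal leaky integrate-and-fire sketch (an abstraction for illustration,
not a biophysical model from the reading) captures this behavior: synaptic
drive pushes the soma potential toward or away from threshold, and when
threshold is reached an action potential fires and the potential resets:

def integrate_and_fire(inputs, threshold=1.0, leak=0.95, reset=0.0):
    """Toy leaky integrate-and-fire neuron.  `inputs` is the net synaptic
    drive per time step (positive = excitatory, negative = inhibitory).
    Returns the spike train (1 = action potential fired)."""
    potential = reset
    spikes = []
    for drive in inputs:
        potential = potential * leak + drive   # leaky summation at the soma
        if potential >= threshold:
            spikes.append(1)    # all-or-nothing action potential
            potential = reset   # soma potential resets after firing
        else:
            spikes.append(0)
    return spikes

print(integrate_and_fire([0.3] * 20))   # steady drive -> regular firing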
CSE668 Sp2011 Peter Scott 06-14
In a human brain:
* There are 10-100 billion neurons;
* There are on average 1-10 thousand synapses per neuron (10-100 trillion
synapses altogether);
* Action potentials are fired asynchronously (no central clock signal);
* No information is conveyed in the shape of the action potential, only its
presence or absence; that is, channel coding is binary;
* The maximum action potential rate is about 500/sec for a given neuron.
CSE668 Sp2011 Peter Scott 06-15
Why binary channel coding? Since an action potential is an "all-or-nothing"
response, it can be regenerated along the axon without accumulating noise.
This would not be possible with analog channel coding.
Symbol coding in natural neural nets is pulse frequency modulation. That is,
the information in a neural signal is conveyed in the rate of pulses (action
potentials) per unit time.
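A minimal sketch of pulse frequency modulation (rate coding) in Python; the
500/sec ceiling is the figure from the previous slide, everything else is an
illustrative assumption:

def rate_code(intensity, duration_s=1.0, max_rate_hz=500.0):
    """Encode a normalized intensity (0..1) as evenly spaced spike times.
    The information is carried by how many pulses occur per unit time,
    not by the shape of any individual pulse."""
    rate = max(0.0, min(1.0, intensity)) * max_rate_hz
    if rate == 0.0:
        return []
    period = 1.0 / rate
    return [i * period for i in range(int(rate * duration_s))]

print(len(rate_code(0.1)))   # ~50 spikes in one second for a weak stimulus
print(len(rate_code(0.9)))   # ~450 spikes in one second for a strong stimulus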
CSE668 Sp2011 Peter Scott 06-16
This review of the human visual system is admittedly superficial. Our goal in
this course is to understand how to synthesize and analyze computer vision
systems for 3-D and motion, not to study human vision. We cannot take the time
to do anything but scratch the surface. But even the surface will have things
to teach us.
As we will see, many of the basic design principles exhibited by animate
vision systems can be suitably migrated to artificial vision systems with
great benefit.
CSE668 Sp2011 Peter Scott 06-17