Announcement: Office hours today, Feb 15, 4:40-5:30pm
CSE668 Principles of Animate Vision, Spring 2011
4. Natural Vision
Required reading: Levine Ch 3.
This course is about animate vision principles and systems. Animate vision
refers to the design of vision systems based on the physiological and
anatomical principles of vision in animal species. Since the animate vision
approach to the design of computer vision systems is inspired by natural
(biological) vision, it would do us well to begin by looking at how vision
works in humans and other animals. So that is what we will do.
CSE668 Sp2011 Peter Scott 06-01
Biological vision systems input energy from a range of frequencies in the
electromagnetic spectrum (usually but not exclusively the "visible spectrum,"
400 to 700 nm wavelengths), and output behaviors.
Eg: You see a car coming, so you stop crossing the street.
Eg: You recognize your friend and you smile.
Eg: A snail detects a moving shadow, so it gathers itself into its shell
and stops.
Because the output of vision is behavior, it is difficult to do input-output
(black-box) analysis. Behavior is complicated, and is the outcome of more than
just the visual or general perceptual cues that impinge on the animal.
CSE668 Sp2011 Peter Scott 06-02
We will describe some elements of the anatomy and the physiology of vision.
Anatomy is the description of the physical parts; physiology is the
description of the functioning of these parts.
A biological vision system can be divided into three major parts:
1. The eye, that part which receives the visible
energy input and converts it to an electrical signal
that the animal's nervous system can work with;
2. The optic nerve, which transmits the coded
signals to the central nervous system;
3. The brain, which interprets the vision signals,
integrates them with other processes such as cognitive
and motor, and produces behaviors.
CSE668 Sp2011 Peter Scott 06-03
The eye
The eye serves as the camera for the vision system. Here
we will describe some of its anatomy and physiology.
Please refer to the Levine reading, which has some
excellent detailed graphics I cannot reproduce here.
The action of the eye is to focus an image onto the retina. This is the
"image plane." If the eye were a digital camera, this is where the CCD chip
would be. If it were an analog camera, this is where we would place the film.
The retina, covering a bit more than the back half of the spherical eyeball,
is clearly not a plane. It should really be called the "image hemisphere"
rather than the "image plane."
CSE668 Sp2011 Peter Scott 06-04
In order to be focused onto the retina, light rays in any system other than a
pinhole camera must be bent. An image is said to be focused when all the light
radiating from a given point (x,y,z) in world coordinates irradiates only a
single point (u,v) on the image plane (or image hemisphere in the case of the
eye).
In the eye, this focusing function is done by the cornea and anterior chamber
(where the aqueous humor is) and the lens. Together they form a multielement
or compound lens.
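As an aside (not from the Levine reading), the compound optics can be
approximated by a single thin lens. Here is a minimal Python sketch of the
standard thin-lens focus condition, 1/d_o + 1/d_i = 1/f, with an assumed
eye-like image distance of about 17mm (a textbook ballpark, not a value from
the slides):

# Thin-lens approximation of the eye's focusing (illustrative sketch only).
# A world point at distance d_o is imaged in focus at distance d_i behind
# the lens when 1/d_o + 1/d_i = 1/f.

def focal_length_for(d_o_m, d_i_m):
    """Focal length needed to image a point at d_o sharply at d_i."""
    return 1.0 / (1.0 / d_o_m + 1.0 / d_i_m)

d_i = 0.017   # assumed lens-to-retina distance, roughly 17mm
for d_o in (1e9, 1.0, 0.25):   # far away, 1 m, 25 cm reading distance
    f = focal_length_for(d_o, d_i)
    print(f"object at {d_o:.2f} m -> required focal length {f*1000:.2f} mm")

Nearer objects require a shorter focal length, which the eye achieves by
changing the shape of its lens (accommodation) rather than by moving the
image surface.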
[Diagram: a compound lens in a camera.] Normally, the lens elements are
cemented together, and the focal plane, where the image is formed, is flat.
CSE668 Sp2011 Peter Scott 06-05
A camera needs a diaphragm to control the size of the aperture (hole) through
which light travels. The human eye's diaphragm is the iris-pupil system, whose
aperture is controlled by muscle fibers. There are sphincter (tangential)
muscles to stop the eye down, and dilator (radial) muscles to open up the
pupil when more light is needed.
An imperfect lens produces distortions on the image surface. That is the case
with the human eye, and these distortions are not compensated in the optics;
it is the brain that "cleans up" the image so it appears undistorted. But the
spatial bandwidth limitation cannot be compensated.
Eg: You look at a stripe pattern. As it gets further and further away, at
some point you can no longer see the individual stripes. This is the spatial
bandwidth limit, expressed in lines/mm.
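To put rough numbers on the stripe example (an illustrative sketch; the 30
cycles/degree figure is an assumed ballpark for foveal acuity, not a number
from the slides), the distance at which a stripe pattern of a given pitch
blurs together can be computed as follows:

import math

def max_resolvable_distance(stripe_pitch_m, cycles_per_degree=30.0):
    """Distance beyond which one stripe pair (one cycle) subtends less
    than the smallest resolvable visual angle."""
    min_angle_rad = math.radians(1.0 / cycles_per_degree)
    # small-angle approximation: angle ~ pitch / distance
    return stripe_pitch_m / min_angle_rad

print(max_resolvable_distance(0.01))   # 1 cm stripes blur beyond ~17 m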
CSE668 Sp2011 Peter Scott 06-06
The retina
The image is formed on a thin layer called the retina. The retina covers more
than 180 degrees around the interior surface of the eyeball (>2π steradians),
allowing us to see a few degrees towards the back while we are looking
forward.
The retina consists of five layers of cells arranged
radially. One layer, the photoreceptive layer, is
devoted to electrooptical transduction. There are
two kinds of photoreceptors in that layer: rods and
cones.
Cones are photopic, color-sensitive photoreceptors. Rods are scotopic
brightness photoreceptors. Photopic vision is normal day vision; scotopic is
night vision.
Both rods and cones are unevenly distributed along the retina. Cones are
packed closely together in the fovea, or optical center of the retina. Their
density falls off with distance from the fovea. Rods are absent from the
central part of the fovea, are dense in the para-fovea, the area surrounding
the fovea, and their density also falls off with distance from the fovea
thereafter. The area far from the fovea is called the peri-fovea or
peripheral retina.
CSE668 Sp2011 Peter Scott 06-07
The fovea is a depression in the retina about 1.5mm in diameter, and subtends
about 5.2 degrees of visual angle*. Its central part, only 0.3mm across, is
where the density of cones, and thus the spatial acuity (resolution,
bandwidth), is greatest. This highest-acuity part of the visual field, the
part impinging on the center of the fovea, is about the size of your thumbnail
viewed at arm's length, roughly 1.0 degree of visual angle*. That's all we can
see in great detail; the rest is blurred.
The spatial resolution in the periphery is about two orders of magnitude less
than that in the central fovea.
* Concerning visual angle: When an angle is measured in degrees or radians
rather than steradians, it refers to a plane angle, not a solid angle. The
solid angle subtended in R^3 due to a plane angle in R^2 is computed by
rotating the arms of the angle around their bisector to form a cone.
CSE668 Sp2011 Peter Scott 06-07a
In the case in point, the plane is that defined in space by the three points
x1: top of the thumbnail held at arm's length, x2: center of the fovea,
x3: bottom of the thumbnail. Then 1 degree is the plane angle formed by the
connected line segments x1-x2-x3 at the vertex x2.
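A quick numerical check of this footnote (a sketch with assumed numbers: a
1.5cm thumbnail held at 65cm, roughly arm's length), computing both the plane
angle at the eye and the solid angle of the cone obtained by rotating that
angle about its bisector:

import math

thumbnail_m = 0.015   # assumed thumbnail height
distance_m = 0.65     # assumed arm's length

# Plane angle x1-x2-x3 at the vertex x2 (center of the fovea), in degrees.
plane_angle_rad = 2.0 * math.atan((thumbnail_m / 2.0) / distance_m)
print(math.degrees(plane_angle_rad))   # ~1.3 degrees, i.e. roughly 1 degree

# Solid angle of the cone swept out by rotating the plane angle about its
# bisector (standard formula: 2*pi*(1 - cos(theta/2)) steradians).
print(2.0 * math.pi * (1.0 - math.cos(plane_angle_rad / 2.0)))   # ~4e-4 sr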
CSE668 Sp2011 Peter Scott 06-08
The photoreceptors contain dyes, or photopigments, that
absorb photons and change their transmissivity. There
are three dyes in cones, just one in rods. Light passing
through these dyes strikes a photosensitive membrane
within the rod or cone, which absorbs photons and
builds up a transmembrane potential. When this TMP
reaches threshold, the cell "fires off" an action
potential. The action potential is an electrical pulse
which is then communicated to other cells by wiring
structures called synapses.
CSE668 Sp2011 Peter Scott 06-09
Retinal neurons
There are four other layers of cells in the retina that preprocess the
electrical signals originating in the photoreceptors and prepare them for
transmission to the central nervous system via the optic nerve. They are all
nerve cells, or neurons.
Horizontal cells interface with rods and cones.
Bipolar cells blend together the outputs of several
horizontals.
Amacrine cells create horizontal filtering, e.g. center-surround filters
(see the sketch after this list).
Ganglion cells interface with the optic nerve.
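As an illustration of center-surround filtering (a sketch of the idea, not
code from the course), a difference-of-Gaussians kernel has the classic
excitatory center and inhibitory surround:

import numpy as np

def difference_of_gaussians(size=9, sigma_center=1.0, sigma_surround=2.5):
    """Center-surround (difference-of-Gaussians) kernel: a narrow
    excitatory center minus a broader inhibitory surround."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    center /= center.sum()
    surround /= surround.sum()
    return center - surround   # responds to local contrast, not uniform light

print(difference_of_gaussians().round(3))   # positive center, negative ring

Convolving an image with such a kernel enhances edges and suppresses uniform
regions, one form of the contrast enhancement and redundancy elimination
discussed on the next slide.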
CSE668 Sp2011 Peter Scott 06-10
There is strong evidence for hardwired preprocessing
at the retinal level to accomplish various "preattentive"
operations, including:
1. Contrast enhancement;
2. Motion detection;
3. Elimination of redundant information;
4. Anti-aliasing;
5. Noise suppression.
For our purposes, it is enough to say that the
signals transmitted to the brain are preprocessed to
enhance important image features and suppress noise and
distortion. Preattentive means processing that is always
done regardless of what part (if any) of the visual data
we are actually paying attention to.
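As one concrete example of a preattentive operation, motion detection can be
sketched as simple frame differencing (an illustrative sketch, not a model of
the actual retinal circuitry):

import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=15.0):
    """Crude motion detector: flag pixels whose brightness changed by more
    than `threshold` between consecutive frames."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    return diff > threshold   # boolean mask: "something moved here"

# Toy usage: a bright square shifts one pixel to the right between frames.
prev = np.zeros((8, 8)); prev[2:5, 2:5] = 100
curr = np.zeros((8, 8)); curr[2:5, 3:6] = 100
print(motion_mask(prev, curr).astype(int))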
CSE668 Sp2011 Peter Scott 06-11
An important fact is that the optic nerve contains only about 1x10^6 fibers,
while there are 10-100x10^6 photoreceptors. So another important property of
the retina is data compression, by a factor of roughly 10 to 100.
Pathways beyond the retina: the optic nerve
The left and right eye branches of the optic nerve merge beneath the brain in
the optic chiasm, where fibers from the two left-half images merge, and
likewise the two right-half images. This is for stereo.
The two halves of the optic nerve then project to the superior colliculus and
the visual or striate cortex. The superior colliculus is common to man and
lower animals; the cortex is the part of the brain in which higher cognitive
processing occurs. More primitive visual functions, like deciding eye
movements, occur in the superior colliculus, while abstract image
understanding occurs in the striate cortex and the brain regions it projects
to.
CSE668 Sp2011 Peter Scott 06-12
Neurons and neuronal signal processing
Neural nets have been on the scene a very long time, perhaps a half billion
years. We are just getting around to engineering them artificially.
Information processing in the human brain takes place in
a vast interconnection of specialized cells called
neurons.
CSE668 Sp2011 Peter Scott 06-13
When the cell body (soma) reaches threshold potential, an action potential is
fired down the axon.
When the action potential reaches a synapse, or connection with another
neuron, it causes a neurotransmitter to flow into the synaptic cleft.
Neurotransmitters include the organic chemicals dopamine, acetylcholine,
serotonin and norepinephrine. They each drive different types of neurons.
The neurotransmitter causes the post-synaptic membrane to become more negative
(inhibitory synapse) or more positive (excitatory synapse). The net effect of
all the post-synaptic membrane stimuli is to move the soma potential closer to
or further from threshold.
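A minimal leaky integrate-and-fire sketch (an abstraction for illustration,
not a biophysical model from the reading) captures this behavior: synaptic
drive pushes the soma potential toward or away from threshold, and when
threshold is reached an action potential fires and the potential resets:

def integrate_and_fire(inputs, threshold=1.0, leak=0.95, reset=0.0):
    """Toy leaky integrate-and-fire neuron.  `inputs` is the net synaptic
    drive per time step (positive = excitatory, negative = inhibitory).
    Returns the spike train (1 = action potential fired)."""
    potential = reset
    spikes = []
    for drive in inputs:
        potential = potential * leak + drive   # leaky summation at the soma
        if potential >= threshold:
            spikes.append(1)    # all-or-nothing action potential
            potential = reset   # soma potential resets after firing
        else:
            spikes.append(0)
    return spikes

print(integrate_and_fire([0.3] * 20))   # steady drive -> regular firing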
CSE668 Sp2011 Peter Scott 06-14
In a human brain:
* There are 10-100 billion neurons;
* There are on average 1-10 thousand synapses per neuron (10-100 trillion
synapses altogether);
* Action potentials are fired asynchronously (no central clock signal);
* No information is conveyed in the shape of the action potential, only its
presence or absence; that is, channel coding is binary;
* The maximum action potential rate is about 500/sec for a given neuron.
CSE668 Sp2011 Peter Scott 06-15
Why binary channel coding? Since an action potential is an "all-or-nothing"
response, it can be regenerated along the axon without accumulating noise.
This would not be possible with analog channel coding.
Symbol coding in natural neural nets is pulse frequency modulation. That is,
the information in a neural signal is conveyed in the rate of pulses (action
potentials) per unit time.
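A minimal sketch of pulse frequency modulation (rate coding) in Python; the
500/sec ceiling is the figure from the previous slide, everything else is an
illustrative assumption:

def rate_code(intensity, duration_s=1.0, max_rate_hz=500.0):
    """Encode a normalized intensity (0..1) as evenly spaced spike times.
    The information is carried by how many pulses occur per unit time,
    not by the shape of any individual pulse."""
    rate = max(0.0, min(1.0, intensity)) * max_rate_hz
    if rate == 0.0:
        return []
    period = 1.0 / rate
    return [i * period for i in range(int(rate * duration_s))]

print(len(rate_code(0.1)))   # ~50 spikes in one second for a weak stimulus
print(len(rate_code(0.9)))   # ~450 spikes in one second for a strong stimulus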
CSE668 Sp2011 Peter Scott 06-16
This review of the human visual system is admittedly superficial. Our goal in
this course is to understand how to synthesize and analyze computer vision
systems for 3-D and motion, not to study human vision. We cannot take the time
to do anything but scratch the surface. But even the surface will have things
to teach us.
As we will see, many of the basic design principles exhibited by animate
vision systems can be suitably migrated to artificial vision systems with
great benefit.
CSE668 Sp2011 Peter Scott 06-17