Eye to Eye
Cameras and Vision

Human vision and photographic imaging have some significant similarities. The eye and the camera each consist of a lens, an aperture, an image plane, and light sensors. A camera also usually has a mechanical shutter, while our vision is simply sampled through the optic nerve. But there are some important differences. The primary objective of this article is to explore the similarities, highlight the differences, and correlate the terminology of each to the other. We will start with three key physical image elements: resolution, contrast, and color.


Resolution is the ability to recognize minute details as separate image elements. In the eye, this is limited by the density of rods and cones on the retina, which peaks in a small central area called the fovea. With film, resolution is limited by the chemical grains on the film substrate. With digital sensors, it is defined by the size of the photoreceptor sites (photosites). Within the retina these receptors vary greatly in size and density. The density is much higher in the fovea, the center of the macula and of our central vision, than it is in the other areas of the retina that make up our peripheral vision. With film, the grains are somewhat randomly distributed but with a fairly uniform density. With digital sensors, the photosites are uniformly distributed in a fixed pattern.

The human eye contains about 120 million rods and 6 million cones, channeled into 1.5 million optic nerve endings. These basic numbers lead to some interesting but misleading pixel counts for the eye. Rods are very sensitive to luminosity but turn off in bright light. Cones are sensitive to color, but only in bright light. There can be as many as 1000 rods or as few as a single cone connected to a single nerve ending (ganglion cell). In some areas, a single rod or cone may connect to as many as five ganglion cells. Multiple rods with different light sensitivities form a single image point, and the color attributes of a single cone contribute to multiple image points. In other words, the relationship between photo sites and image points is not one-to-one.

Our peripheral vision has a high refresh rate, so it responds quickly to motion, but our central vision has higher latency, so it is more acute and less responsive to motion. Since the fovea (central vision) does not have rods, it is not sensitive to dim light. Astronomers know this, so to observe a dim star they use averted vision, looking slightly to one side of the star rather than directly at it. Our central vision represents only 2% of the retina, but 50% of the visual cortex region of our brains where images are formed. It is also important to note that there are significant differences among individuals based on age, gender, disease, and other factors. Therefore, rod and cone photo-sensor counts do not correlate directly to resolution. But we will look at some numbers later.


Contrast is a measured distance between tones in an image that allows a viewer to differentiate unique tones. Taken together, resolution and contrast determine the sharpness of an image. The full range between maximum black and maximum white is typically called the dynamic range. The difference between two adjacent tones is usually called local contrast. Measuring these can be tricky because some sources quote a contrast ratio, some decibels (dB), and some exposure values (EV).

The eye has an amazing ability to adapt to high or low levels of light. But even the eye cannot adapt to total darkness and direct sunlight at the same time. Whatever numbers are quoted, the one thing that is clear is that the dynamic range of our night vision is as much as 600 times that of our daylight vision. The following table illustrates this (approximately).

                 Contrast Ratio   EV    dB
Night vision     10,000,000:1     27    160
Daylight vision  15,000:1         13    80

By the same token, each film type, each paper type, and each digital sensor has a unique characteristic dynamic range.   By manipulating ISO sensitivity we can adapt film or a digital sensor to different levels of light just as with our night vision and daylight vision.   We measure these light levels as exposure values (EV) or stops.   In nature they vary roughly from -6 EV (black night) to +22 EV (direct sunlight).   Reflective surfaces such as paper are typically constrained to a range of 8 EV.   Film ranges are typically 8-11 EV.   Digital sensor ranges are typically 8-12 EV.   For the record, a 12 EV difference equates to about 72 dB or a contrast ratio of 4000:1.
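
The relationships between EV, contrast ratio, and dB quoted above can be checked with a few lines of Python. This is only a minimal sketch: it assumes each EV (stop) doubles the light and that decibels are computed as 20 times the base-10 logarithm of the ratio, which is the convention that reproduces the 12 EV, 72 dB, 4000:1 figures. The function names are illustrative, not from any library.

```python
import math

def ev_to_ratio(ev):
    """Contrast ratio spanned by `ev` stops, assuming each stop doubles the light."""
    return 2.0 ** ev

def ratio_to_ev(ratio):
    """Number of stops (EV) spanned by a contrast ratio."""
    return math.log2(ratio)

def ratio_to_db(ratio):
    """Contrast ratio in decibels, assuming the 20*log10 convention."""
    return 20.0 * math.log10(ratio)

if __name__ == "__main__":
    ratio = ev_to_ratio(12)
    # Prints: 12 EV -> 4096:1, 72.2 dB
    print(f"12 EV -> {ratio:.0f}:1, {ratio_to_db(ratio):.1f} dB")
```

Under the same assumptions, a ratio of 10,000,000:1 works out to roughly 23 EV and 140 dB, and 15,000:1 to roughly 14 EV and 84 dB, so the figures in the vision table are best read as loose approximations.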


Color is a psychological response to a physical stimulus provided by three different types of cones (RGB).   We can measure the physics (spectral response) of various colors but our perception and the names we assign to these colors will always remain subjective.   Our visual systems can adapt to the spectral colors in the light source as well as the spectral response of the object’s color.   This is possible because the cones in our central vision area are covered by a yellow filter called the macula.   Elsewhere in the retina (peripheral vision) they are not.   This additional information allows our brains to adapt our perception of colors to various light sources.   In digital or photographic imaging we call this white balance.

It is important to note that in human vision these adaptations to color and to light intensity are not instantaneous. Adapting from dark to bright can take a minute; adapting from bright to dark can take 10 to 30 minutes. Likewise, in digital imaging these adaptations are among the more computationally intensive functions.


Another difference is that the image plane of our retina is roughly hemispherical, not flat as it is with film or digital photography. The small area covered by the macula determines the angle of view of our central vision. Within this, an even smaller area, the fovea, determines how we focus. The rest of the retinal area determines the much larger angle of view of our peripheral vision. And we use two eyes to form a single image. In general our peripheral vision is less sharp and less color sensitive, but our brains are very sensitive to distance and motion in this area. We classify camera lenses by how their angle of view compares with that of our central vision: we call them wide angle, normal, or telephoto based on the angle of view in the captured image. We sometimes stitch multiple digital images together to produce impressively wide-angle views called panoramas. In a similar fashion, the brain uses information from each eye to expand our peripheral vision, add depth, and fill in the other eye’s blind spot at the optic nerve.

Precisely measuring the normal angle of view is difficult. It is usually based on the horizontal or diagonal angle of view. For normal central vision, this will be approximately 50 degrees, but our peripheral vision expands this to 120 degrees. The focal area (under the fovea) is less than 2 degrees. For a camera, the angle of view is a function of both the lens focal length and the size of the image plane, so the proper lens will vary with the image format. A simple test is to look at the scene through the camera’s viewfinder and again with the unaided eye. The normal lens will produce a similar perspective or field of view. Just for reference, the combined focal length of the eye’s lens and cornea is about 22 mm.
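
The dependence of angle of view on focal length and image size can be sketched with the standard thin-lens approximation, angle = 2 * atan(d / 2f), where d is the image dimension and f the focal length. The code below is an illustration only: it assumes a rectilinear lens focused at infinity, and the 36 x 24 mm frame and 50 mm lens are example values chosen here, not figures from this article.

```python
import math

def angle_of_view(image_dimension_mm, focal_length_mm):
    """Angle of view in degrees along one image dimension.

    Assumes a rectilinear lens focused at infinity (thin-lens approximation).
    """
    return math.degrees(2.0 * math.atan(image_dimension_mm / (2.0 * focal_length_mm)))

def diagonal(width_mm, height_mm):
    """Diagonal of a rectangular image format."""
    return math.hypot(width_mm, height_mm)

if __name__ == "__main__":
    # Example values (assumed): a 36 x 24 mm frame with a 50 mm lens.
    width, height, focal = 36.0, 24.0, 50.0
    print(f"horizontal: {angle_of_view(width, focal):.1f} degrees")
    print(f"diagonal:   {angle_of_view(diagonal(width, height), focal):.1f} degrees")
```

With these example values the diagonal angle comes out near 47 degrees, in the same neighborhood as the approximate figure quoted for normal central vision. A smaller image format needs a proportionally shorter focal length to give the same angle.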

The retina’s spherical image plane also means that our vision is less prone to the spatial distortions that can occur with some camera lens systems, particularly wide angle lenses. But the visual cortex has to learn to recognize shapes and details in the first several months of infancy. And we are not completely immune to optical distortions. Simple evidence of this is found in the design of Roman columns.

Some Measurements

Just for fun, here are some more miscellaneous factoids. Rods vary from 1 to 5.5 µm in diameter and cones from 1 to 10 µm. Typical camera photosites vary from 5 to 12 µm across. The retina’s diameter is about 25 mm. The macula is only about 7 mm wide, and at its middle the central fovea is a minuscule 1.5 mm. Thus maximum resolution and sharpness are concentrated in a very small area of our central vision. At the fovea’s center all of the photoreceptors are cones; there are no rods. The area of maximum rod density is a ring about 10 mm across surrounding the macula. These densities are shown in the illustration below. But as mentioned already, they do not correlate to image-forming pixels.

This figure would imply a peak density of about 160,000 per mm² (roughly 10,000 PPI). Another way to look at this would be to compare image-forming neural sensors to some typical DSLR cameras and a printed image, as shown below.

                 Rods         Cones      Image Sensors  Density (per mm²)  Density (PPI)
Retina           120,000,000  6,000,000  1,500,000      1,061              827
Fovea/Macula     20,000       115,000    115,000        11,937             2,775
Fovea/Center     0            25,000     25,000         14,147             3,021
Nikon D40 (DX)   -            -          6,016,000      16,272             3,240
Nikon D300 (DX)  -            -          12,169,344     32,498             4,579
Nikon D3 (FF)    -            -          12,081,312     14,042             3,010
Print 8x10       -            -          7,200,000      140                300
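
The two density columns are related by simple arithmetic: an areal density per mm² converts to a linear PPI by taking the square root and multiplying by 25.4 mm per inch. The sketch below assumes a square grid of sensors, and it assumes the Fovea/Center row treats the 1.5 mm fovea as a circle, which is a guess that happens to reproduce that row. The function names are illustrative.

```python
import math

MM_PER_INCH = 25.4

def density_per_mm2(sensor_count, area_mm2):
    """Areal density: sensors per square millimeter."""
    return sensor_count / area_mm2

def density_to_ppi(density_mm2):
    """Linear pixels per inch for an areal density, assuming a square grid."""
    return math.sqrt(density_mm2) * MM_PER_INCH

def circle_area_mm2(diameter_mm):
    """Area of a circle given its diameter."""
    return math.pi * (diameter_mm / 2.0) ** 2

if __name__ == "__main__":
    # Fovea center: 25,000 cones in an assumed 1.5 mm diameter circle.
    fovea = density_per_mm2(25_000, circle_area_mm2(1.5))
    print(f"fovea center: {fovea:,.0f} per mm2, {density_to_ppi(fovea):,.0f} PPI")

    # 8x10 inch print holding 7,200,000 pixels.
    print_area = 8 * 10 * MM_PER_INCH ** 2
    print_density = density_per_mm2(7_200_000, print_area)
    print(f"8x10 print:   {print_density:,.0f} per mm2, {density_to_ppi(print_density):,.0f} PPI")
```

The same conversion applied to the peak density of 160,000 per mm² gives about 10,160 PPI, consistent with the rounded 10,000 PPI quoted earlier.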

Of course, unlike our vision, we also want resolution to be uniform in all areas of a photographic image. That is, unless we want to emphasize depth or subject separation for artistic effect. If we want larger prints we can capture with larger image formats, just as with film. Or we can use digital tools to stitch multiple images together in a single print. So pure resolution comparisons can be misleading: not only are the technologies different, the objectives are different too.

The Rest

In the world of digital imaging we can attempt to simulate our visual adaptation with special tools and techniques such as High Dynamic Range (HDR).   Basically these expand the contrast in both the shadows and the highlights while compressing contrast in the mid tones.   The result is that we can reproduce details in both the shadows and highlights that would be difficult to see with our unaided eyes in a single view.   In other words, the image is more aesthetically pleasing than technically accurate.
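
The tone adjustment described above, more contrast in the shadows and highlights and less in the midtones, can be illustrated with a simple inverse S-curve. This is a toy sketch only: real HDR tools merge several exposures and use far more elaborate tone mapping, and the curve shape, the `gamma` parameter, and the function name are assumptions made here for illustration.

```python
def inverse_s_curve(x, gamma=2.0):
    """Map a tone x in [0, 1] through an inverse S-curve.

    With gamma > 1 the slope is steep near 0 and 1 (shadows and highlights
    gain contrast) and shallow near 0.5 (midtones are compressed).
    """
    u = 2.0 * x - 1.0                      # recenter so mid gray is 0
    sign = 1.0 if u >= 0 else -1.0
    return 0.5 + 0.5 * sign * abs(u) ** gamma

if __name__ == "__main__":
    for tone in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{tone:.2f} -> {inverse_s_curve(tone):.3f}")
```

With the default `gamma`, input tones from 0.25 to 0.75 are squeezed into 0.375 to 0.625, while the outer quarters of the range are stretched to fill the rest.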

Advanced spectral imaging for digital images is currently only being used in highly specialized scientific equipment and applications.   But it may hold promise for future photographic applications.   Obviously, depth and motion are also more advanced photographic imaging topics.   Some examples are three dimensional systems such as stereo viewfinders and lenticular prints.   And motion can be incorporated through video systems or animation graphics.

This is not an in-depth coverage of either photographic imaging or human vision, but I hope that it helps to illustrate both the similarities and the differences. It is misleading to compare the 1.5 million nerve endings in a typical human eye to a 1.5 MP digital camera. It is just as misleading to compare the 126 million rods and cones to any digital camera metrics. Both are fascinating examples of technological excellence.

I hope you also gained some new insight from this article. If you have any comments or suggestions, I would welcome your input. Please send me an email.

Rags Gardner
Rags Int., Inc.
204 Trailwood Drive
Euless, TX 76039
(817) 267-2554
August 1, 2008

This page last updated on: Friday August 01 2008