In Time and Conscious Experience it was proposed that our sense of depth is due to the representation of objects in time as well as space. It is interesting that optical depth illusions such as the Necker Cube are activated strongly by scanning the image, moving your eyes from one part to another.
Necker cube (courtesy Wikimedia Commons)
It takes about half a second for the cube to change form. The 3D sensation generated by the cube is a neural event, a "representation". The representational nature of the events is shown by the Kanizsa version of the Necker Cube:
Necker Nocube (courtesy Wikimedia Commons)
Although we call the Necker Cube a "3D" effect, it is nothing of the sort, because we cannot see behind the bars of the cube.
If the Necker Cube and similar illusions are not 3D effects, what is the nature of the apparent distance that occurs between the front and the back of the image? The most likely explanation is that this distance is a time interval that corresponds to the motion that occurs when we scan the image and a part of it pops out towards us. This is also suggested by the instability of the "3D" image: when we stare at the cube in one configuration for any length of time, it loses its "3D" character until it flips into a new form.
This relationship of "3D" to active viewing is beautifully brought out in some "3D" films (for instance in the recently released "Avatar"). In this form of "3D" each eye is fed an image that corresponds to the image that it might receive if we were actually viewing the objects on the screen in real life. This allows us to focus on different planes in the image and hence actively explore the visual field. It is the active exploration of the field that gives it a "3D" effect.
Notice that we can never see the back of objects, and that when we see "behind" a nearby object using two eyes we are actually looking through a distorted, stretched, transparent, 2D representation of the nearby object. The truth is that "3D" vision is active vision: it is the act of focusing from one plane to another in a visual field (not seeing the back of objects, as would be the case with real 3D). The "3D" of modern cinema provides us with the best 2D representation of the world to date. Previous artwork using perspective and other tricks was less able to provide that essential ingredient of a 2D representation of an nD world: the active engagement of the onlooker.
The apparently 3D world of our perception is actually a 2D world plus timings to different places within perception. It is a model for the effects of movement: the separation of this page from me is related to how long it takes to re-focus, to the change in length of the ciliary muscles and, given that biological timings of motion will have known relationships in my brain, to how long my arm would take to touch it. The "depth" in perception is an "action space" (Cutting and Vishton 1995).
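The idea of "a 2D world plus timings" can be made concrete with a toy sketch. This is only an illustration of the proposal, not a model taken from Cutting and Vishton: the class `PerceivedPoint`, the function `action_space_point` and both constants are hypothetical, and the numbers are placeholders rather than measured values. The only borrowed fact is that focusing demand in dioptres is the reciprocal of distance in metres.

```python
from dataclasses import dataclass

# Hypothetical constants, chosen purely for illustration (not measured values).
REACH_SPEED_M_PER_S = 1.0      # assumed average speed of a reaching hand
REFOCUS_S_PER_DIOPTRE = 0.25   # assumed time to shift focus by one dioptre


@dataclass
class PerceivedPoint:
    """A place in perception: a 2D position plus timings instead of a depth."""
    x: float               # horizontal position in the visual field
    y: float               # vertical position in the visual field
    reach_time_s: float    # how long an arm would take to touch it
    refocus_time_s: float  # how long the eye would take to re-focus on it


def action_space_point(x, y, distance_m, current_focus_m):
    """Replace a physical distance with the timings of the actions it implies."""
    reach_time = distance_m / REACH_SPEED_M_PER_S
    # Focusing demand in dioptres is 1 / distance in metres.
    dioptre_change = abs(1.0 / distance_m - 1.0 / current_focus_m)
    refocus_time = dioptre_change * REFOCUS_S_PER_DIOPTRE
    return PerceivedPoint(x, y, reach_time, refocus_time)


# A page 0.5 m away, seen while the eyes are focused on something 2 m away.
page = action_space_point(x=0.0, y=0.0, distance_m=0.5, current_focus_m=2.0)
print(page)
```

In this sketch nothing in `PerceivedPoint` stores a third spatial coordinate; the "depth" of the page is carried entirely by the two time intervals.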
Cutting, J.E. & Vishton, P.M. (1995). Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. In W. Epstein & S. Rogers (eds.), Handbook of perception and cognition, Vol. 5: Perception of space and motion (pp. 69-117). San Diego, CA: Academic Press.