Perception Is Something You Do
Section 6 of Chapter 2 — sensorimotor contingencies and affordances
The most concrete form of the embodied claim — and the one that lands directly in the laboratory.
The previous section traced how a seventeenth-century strategic move shaped four centuries of how we think about mind and body. It explained why the embodied turn has had to be argued for at all. It did not yet say what perception, on the embodied view, actually is.
This section is where that account lands.
Let me ask you to do something that seems trivial but isn’t. Look at the screen in front of you. Now shift your gaze slightly to the left. Now back to the right.
What happened? The visual scene changed. Not because the scene itself moved — it didn’t. Because you moved. Because your action — the eye movement — produced a specific, lawful change in what you were receiving through your visual system. If you had turned your head instead, the change would have been different. If you had closed one eye, different again. If you had moved closer to the screen, different again.
The relationship between what you do and what you see is not arbitrary. It is structured. It is lawful. And — this is the central claim of this section — mastering that lawful structure is what perception actually is.
The standard picture and its problems
The standard picture of perception — the one implicit in most introductory accounts and in much of the Cartesian inheritance we just examined — goes something like this.
The world contains objects with properties. Those properties emit or reflect energy: light, sound, pressure, chemicals. That energy impinges on sensory receptors — the retina, the cochlea, the skin. The receptors transduce the energy into neural signals. The neural signals are processed by the brain. The brain constructs an internal representation of the world. And the agent perceives that representation.
On this picture, perception is fundamentally passive. The world acts on the agent. The agent registers and represents. The agent’s job in perception is to receive accurately and represent faithfully.
There is something right about it. Energy does impinge on receptors. Neural processing does occur. But there is something deeply incomplete about it — and the incompleteness matters for how we understand scientific observation.
Here is one thing the standard picture cannot explain. If perception were just passive reception, then identical retinal stimulations should produce identical percepts. But they don’t. The same pattern of light on the retina is perceived as near or far, moving or stationary, figure or ground, depending on what the perceiver is doing and has been doing. The context of action shapes what is perceived. The perceiver’s history of sensorimotor engagement with the world shapes what a given stimulation means.
Here is another. If perception were just passive reception, then equalised visual stimulation — without active motor engagement — should produce normal perceptual development. The evidence, as we’ll see shortly, points decisively against this. Perception is not what happens when signals arrive. It is what happens when an agent acts in the world and the world responds.
Sensorimotor contingencies
J. Kevin O’Regan and Alva Noë, in a landmark 2001 paper, made this precise. Perception, they argued, depends on the perceiver’s practical mastery of what they called sensorimotor contingencies: the lawful, learnable dependencies between an agent’s actions and the changes in sensory stimulation those actions produce.

A sensorimotor contingency is a structured relationship of the form: if I do this, my sensory stimulation will change in this way. If I turn my head to the left, the visual field will shift in a specific direction and at a specific rate. If I move my hand across a textured surface, the tactile sensation will change in a specific pattern. If I open my mouth and change my vocal tract configuration, the acoustic signal I produce will change in a specific way. These are not rules that the agent explicitly knows and consciously applies. They are practical skills — embodied competences that the agent has acquired through engagement with the world and now exercises fluently, without deliberation.
To perceive, on this account, is to exercise that practical mastery. It is to engage with the world in a way informed by an implicit grasp of how sensory stimulation depends on action. The agent who sees a cube does not have a stored internal representation of the cube. The agent has a set of sensorimotor competences — knowing, implicitly, how the visual array will change if they move around the cube, lean forward, shift their gaze to the hidden faces. The cube’s three-dimensionality is not stored in a representation. It is available through the agent’s capacity for action.
This is a radical reframing. Perception is not a matter of having the right internal pictures. It is a matter of being able to do the right things — of having the right practical access to the world through action.
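The form of a sensorimotor contingency ("if I do this, my sensory stimulation will change in this way") can be made concrete with a toy model. The sketch below is only an illustration, not the formalism of the 2001 paper: the one-dimensional world, the function names retinal_offset, learn_gain, and predict_offset, and the numeric values are all assumptions introduced here. A simulated agent makes self-produced gaze shifts, records how the position of a stationary object changes on its "retina", and recovers the law linking the two.

```python
import random


def retinal_offset(object_pos: float, gaze: float) -> float:
    """Where a stationary object falls relative to the current gaze direction."""
    return object_pos - gaze


def learn_gain(trials: int = 200, seed: int = 0) -> float:
    """Estimate g in 'change in offset = g * gaze shift' from self-produced movements.

    Uses least squares through the origin over (action, observed change) pairs.
    """
    rng = random.Random(seed)
    object_pos, gaze = 3.0, 0.0  # arbitrary toy values (assumption)
    sum_ax = 0.0
    sum_aa = 0.0
    for _ in range(trials):
        action = rng.uniform(-1.0, 1.0)            # a self-produced gaze shift
        before = retinal_offset(object_pos, gaze)  # stimulation before acting
        gaze += action
        after = retinal_offset(object_pos, gaze)   # stimulation after acting
        sum_ax += action * (after - before)
        sum_aa += action * action
    return sum_ax / sum_aa


def predict_offset(before: float, action: float, gain: float) -> float:
    """Use the learned law to anticipate the sensory consequence of an action."""
    return before + gain * action


if __name__ == "__main__":
    g = learn_gain()
    print("learned gain:", round(g, 6))
    print("predicted offset after a 0.5 shift:", predict_offset(3.0, 0.5, g))
```

In this toy world the learned gain comes out at essentially -1: shifting the gaze one way moves the object's image the same amount the other way. The agent never stores a picture of the object. What it acquires is the ability to anticipate how stimulation will change when it acts, which is the sense of "mastery" at issue here.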
And notice how this connects to what we have built in this chapter. Merleau-Ponty’s body schema is precisely the pre-reflective mastery of sensorimotor contingencies: the body’s implicit grasp of how its actions couple with the world’s responses. Piaget’s action schemas are the developing agent’s progressive acquisition of new sensorimotor competences through engagement and accommodation. Autopoiesis grounds the whole picture biologically: the living system maintains its organisation through structural coupling with its environment, and sensorimotor contingencies are the perceptual face of that coupling.
Three things the laboratory shows
This is not just a philosophical position. It has empirical anchors. Three of them deserve to be named.
Change blindness. In a striking series of experiments, researchers showed that observers fail to notice large, obvious changes in a visual scene when those changes occur during a brief interruption — a blink, a cut in a film, a momentary disturbance. A person’s jumper changes colour. An object disappears from a table. An entire background shifts. And observers, remarkably, simply do not notice.
This is puzzling on the standard picture. If perception involves constructing a detailed internal representation of the scene, why don’t we notice when the scene changes? The sensorimotor account gives a clean answer: the visual system does not maintain a complete internal copy of the scene. It maintains practical access — the capacity to look at any part of the scene and sample it when needed. When the change occurs while the relevant part of the scene is not being actively sampled, it goes undetected. The world functions as its own external memory. Perception is the activity of re-accessing it, not the possession of a copy of it.
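The contrast between holding a complete copy of the scene and keeping practical access to it can be sketched in a few lines. This is a deliberately crude illustration under stated assumptions: the scene is a small dictionary, "attention" is a single key, and the two function names are hypothetical. It does not model any particular experiment.

```python
def detects_change_full_copy(scene_before: dict, scene_after: dict) -> bool:
    """An observer that stored the whole scene would notice any difference."""
    return scene_before != scene_after


def detects_change_sampling(scene_before: dict, scene_after: dict, attended: str) -> bool:
    """An observer that retains only the attended item and re-samples the world.

    Everything else is left 'out there' to be looked up when needed, so a
    change elsewhere has nothing in memory to be compared against.
    """
    remembered = scene_before[attended]
    return scene_after[attended] != remembered


# Toy scene (illustrative values); the jumper changes colour during the interruption.
before = {"jumper": "red", "cup": "on table", "background": "park"}
after = dict(before, jumper="blue")

if __name__ == "__main__":
    print(detects_change_full_copy(before, after))
    print(detects_change_sampling(before, after, attended="cup"))
    print(detects_change_sampling(before, after, attended="jumper"))
```

The full-copy observer flags the change regardless of where it occurs. The sampling observer flags it only when the changed item is the one being actively sampled, which is the pattern the sensorimotor account predicts.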

The kitten carousel. A classic developmental study by Held and Hein, published in 1963. Kittens were raised in pairs under conditions of matched visual stimulation: both members of a pair received equivalent visual input. But one kitten could actively move through the environment, while the other was passively carried in a gondola whose motion mirrored the active kitten’s. The active kittens developed normal visual perception. The passive kittens, despite receiving equivalent visual stimulation, showed severe perceptual deficits on tests of visually guided behaviour, such as paw placement and depth avoidance.
The implication is clear: perception is not a function of stimulation alone. It depends on the agent’s active, self-produced movement through the environment. Sensorimotor contingencies are learned through action, not through passive exposure. The body that moves and perceives simultaneously is not receiving information from the world — it is constituting its perceptual access to the world through its own activity.
Sensory substitution. The work of Paul Bach-y-Rita on tactile-visual substitution devices, dating to the late 1960s and developed for decades afterwards. A camera feeds a signal to a tactile array placed on a blind person’s back or tongue. With practice, subjects learn to use this device to perceive spatial layout — to navigate environments, to identify objects, even to catch thrown balls. The tactile channel becomes, through the acquisition of the relevant sensorimotor contingencies, a channel for spatial perception.
What does this tell us? It tells us that what matters in perception is not which sensory channel carries the information — it is whether the agent has mastered the lawful relationship between its actions and the changes in stimulation those actions produce. Perception is the skill of sensorimotor engagement, not the possession of a particular sensory organ.
Affordances and the ecological view
Alongside sensorimotor contingencies, there is a complementary framework that approaches the same reality from an ecological angle: James Gibson’s theory of affordances.
What agents perceive, Gibson argued, is not — in the first instance — the physical properties of objects, their mass or chemistry or geometry. What agents perceive are action possibilities that the environment offers relative to the agent’s capacities. A surface affords walking-on if it is flat, solid, and roughly horizontal relative to the agent’s size and locomotor capacities. A branch affords grasping if it is the right diameter and rigidity relative to the agent’s hand. A tool affords using if it has the shape, weight, and balance that the agent’s action repertoire can exploit.
Affordances are relational. They exist neither purely in the environment nor purely in the agent, but in the relationship between the agent’s embodied capacities and the structural properties of the environment. A doorknob affords grasping for a human hand but not for a paw. A puddle affords wading for a child but jumping-over for an adult. The world is not a neutral collection of physical objects. It is a field of action possibilities structured by what the perceiving, acting agent can do.
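One way to see what "relational" means here is to note that an affordance cannot be written as a property of the object alone or of the agent alone. It has to be a function of both. The sketch below is a hypothetical illustration: the Agent fields, the function names, and every number are assumptions chosen for the example, not measurements and not Gibson's own formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    max_grip_cm: float    # widest object the agent can close a grip around (0 = no grip)
    jump_reach_cm: float  # widest gap the agent can clear in one jump


def affords_grasping(agent: Agent, object_diameter_cm: float) -> bool:
    """Graspability depends on the object's size relative to the agent's grip."""
    return 0 < object_diameter_cm <= agent.max_grip_cm


def puddle_affordance(agent: Agent, puddle_width_cm: float) -> str:
    """The same puddle invites different actions from different bodies."""
    return "jumping-over" if puddle_width_cm <= agent.jump_reach_cm else "wading"


# Illustrative agents and objects (all values are assumptions).
adult = Agent("adult human", max_grip_cm=9.0, jump_reach_cm=150.0)
child = Agent("small child", max_grip_cm=6.0, jump_reach_cm=40.0)
dog = Agent("dog", max_grip_cm=0.0, jump_reach_cm=120.0)

DOORKNOB_CM = 5.5
PUDDLE_CM = 80.0

if __name__ == "__main__":
    for agent in (adult, child, dog):
        print(agent.name,
              affords_grasping(agent, DOORKNOB_CM),
              puddle_affordance(agent, PUDDLE_CM))
```

Neither DOORKNOB_CM nor any Agent on its own settles whether grasping is possible. The answer appears only when the two are taken together, and changing either side of the pair changes the affordance.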
This connects directly to the habitat concept the series introduced in Chapter 1. The habitat is precisely the world as structured by affordances — the environment as it presents itself to an agent with specific morphological and sensorimotor capacities. And the affordance landscape — the full set of action possibilities available to an agent — is shaped not only by biology but by development, skill acquisition, cultural practice, and institutional and technological infrastructure.
The affordance landscape of a trained scientist is enormously richer than that of a novice. The instruments, notations, protocols, and shared practices of a scientific community manufacture new affordances — they make perceivable and actionable things that were previously imperceptible. The telescope made the moons of Jupiter visible — not by changing Jupiter or the moons, but by extending the affordance landscape of the astronomer. The balance made mass differences perceptible at a precision the unaided hand could not reach. Scientific instruments are, in this precise sense, affordance-extending technologies. They expand what the agent’s sensorimotor system can couple with.
This will return throughout the series. The whole architecture of scientific knowing, on the picture we are building, is the long historical project of extending the affordance landscape of inquiry.
A horizon — the whole-body view
Before we close, a gesture toward where this framework is heading.
Sensorimotor contingencies give us a precise theory of perception as active skill. But the account, as classically stated, tends to locate the relevant machinery in the relationship between sensory input and motor output — in the coupling between what the agent does and what signals arrive at its sensory surfaces.
An explicitly embodied extension asks: what if perception is not just a matter of the relationship between action and sensory stimulation at the periphery, but of the whole body conceived as a network of sites that simultaneously sense and act? Each part of the body is not just a motor effector or a sensory receptor — it is both at once. The hand that reaches also feels. The eye that looks also moves. The throat that vocalises also resonates. Every action is simultaneously a form of sensing, and every sensation occurs within an ongoing matrix of action.
This is the direction the Sensation-Modulating Network framework explores — and it will surface again later in the series when we return to instruments, measurement, and what bodies-with-tools make perceivable. For now, hold it as a horizon. Sensorimotor contingencies give us the relationship between action and sensory change. The whole-body network gives us the context within which that relationship is always already embedded. Neither is complete without the other.

Two clean landings
So what do sensorimotor contingencies and affordances give us?
They give us the perceptual mechanism that Merleau-Ponty’s phenomenology described but did not explain, that Piaget’s developmental account assumed but did not specify, and that autopoiesis grounded biologically but did not elaborate perceptually. Perception is not passive reception — it is active, skilled engagement with a world that responds to action in lawful, learnable ways. Affordances are the world’s side of that engagement — the action possibilities the environment offers to an agent with specific embodied capacities. Together they give us a picture of the perceiving, acting agent as fully embedded in its world: not a receiver of signals but a participant in an ongoing sensorimotor conversation.
Take-home point 1. Perception depends on the agent’s practical mastery of sensorimotor contingencies — the lawful dependencies between actions and the changes in sensory stimulation those actions produce. Perceiving is not having the right internal pictures but having skilled access to the world through action. Change blindness, the kitten carousel, and sensory substitution all converge on the same conclusion: perception is constituted through active, self-produced engagement, not through passive reception of stimulation.
Take-home point 2. Affordances — Gibson’s term for the action possibilities the environment offers relative to the agent’s capacities — are relational: they exist in the coupling between agent and environment, not in either alone. The affordance landscape is shaped by biology, development, skill, cultural practice, and technological infrastructure. Scientific instruments extend affordances — they make perceivable and actionable what was previously beyond the agent’s sensorimotor reach.
Next: “The Biological Basis of Social Life and Language” — what changes when the agents that develop sensorimotor mastery are also social and linguistic. The last threads of Chapter 2 before the synthesis.
Image prompts used for this post. Try them on your own AI model and compare what it produces with our figures.
1. Two pictures of perception
Output format: PNG. Landscape, 16cm × 9cm. A single schematic diagram divided into two halves by a thick vertical line. LEFT HALF, labeled "Standard picture: passive reception": cool gray-blue tones; the world is a small scene of objects on the left edge, with arrows of "energy" (labeled light, sound, pressure) flowing toward a stylized human head on the right; inside the head, a small picture of the world is being assembled (a faded internal screen); a caption below: "The world impinges. The brain constructs. The agent receives." RIGHT HALF, labeled "Sensorimotor account: active engagement": warm amber tones; a stylized human figure in profile, body engaged, with arrows going BOTH ways: outward from the figure (labeled "action: turn, move, reach") and inward (labeled "sensory change"); the figure and a small environment are joined by a closed loop of action-and-change; a caption below: "The agent acts. The world responds. Perception is the mastery of that loop." Above both halves, large caption: "Two pictures of perception." Below both halves, smaller caption: "Same retinal stimulation. Different theories of what perception is." Soft warm palette for the sensorimotor side; cooler gray-blue for the standard picture; the dividing line solid and unambiguous. Sketched, schematic line-art; not photographic.
2. The sensorimotor loop: seeing a cube
Output format: PNG. Landscape, 16cm × 9cm. A single diagram showing the sensorimotor mastery of a cube. CENTER of the figure: a wireframe cube floating in space, with three of its faces visible and three hidden. AROUND the cube, four small stylized human figures in profile, each in a different posture, with curved arrows showing their possible movements and what each movement would reveal. Figure 1 (top): leans forward, with an arrow showing "lean forward → top face becomes visible." Figure 2 (right): walks around to the right, arrow showing "circumnavigate → hidden faces revealed." Figure 3 (bottom): tilts head, arrow showing "tilt → parallax changes." Figure 4 (left): reaches a hand toward the cube, arrow showing "reach → tactile contingency activated." Each arrow is dashed and labeled with the action and the sensorimotor consequence. Above the cube, large caption: "The cube's three-dimensionality is not stored. It is available through action." Below the cube, smaller caption: "Knowing a cube = mastering the family of sensorimotor contingencies it affords." Soft warm tones; the cube in clean line; the four figures sketched lightly so the action-flows are the visual focus. Sketched, schematic, not photographic.
3. Three things the laboratory shows
Output format: PNG. Landscape, 18cm × 8cm. Three side-by-side panels, each illustrating one piece of empirical evidence for the sensorimotor view of perception. PANEL 1, "Change blindness": a person looking at a scene of a desk with several objects; between two frames of the figure (shown side by side), one object has subtly changed (e.g., a coffee cup has moved or a book has changed color), with a small "blink" or "flash" symbol indicating the brief interruption between the two frames; the person's expression unchanged, with a thought bubble showing the unchanged scene; caption below: "Large changes go unnoticed when the interruption breaks active sampling. The brain does not store a copy of the scene." PANEL 2, "Kitten carousel (Held & Hein, 1963)": two kittens, side by side, in a circular arrangement. The left kitten walks freely with a harness, casting its own shadow as it moves through a striped environment; the right kitten sits in a small gondola/basket that moves passively, mirroring the active kitten's motion, in the same environment; arrows show the active kitten's self-produced motion versus the passive kitten's externally-produced motion. Below the two kittens, two small icons: a healthy eye (left, "normal vision") and a faded eye with a question mark (right, "perceptual deficits"); caption below: "Same stimulation. Different action. Different perceptual development." PANEL 3, "Sensory substitution (Bach-y-Rita)": a blind person with a small camera on glasses or head, and a tactile array placed on the back, shown in cross-section; lines connecting the camera to the array indicate signal flow; in front of the person, a ball is being thrown; the person reaches confidently to catch it; caption below: "Through practice, the tactile channel becomes a channel for spatial perception. The medium is not the perception; the mastered contingency is." Above all three panels, large caption: "Three things the laboratory shows." Below all three panels, smaller caption: "Different experiments. Same conclusion: perception is something the agent does, not something done to the agent." Soft warm and earth tones; sketched, schematic line-art; not photographic.
4. Affordances are relational
Output format: PNG. Landscape, 18cm × 8cm. Three side-by-side panels, each showing the SAME object (a doorknob, a tree branch, and a flat surface) but encountered by DIFFERENT agents with different embodied capacities. The panels make visible that the affordance is in the relationship, not in the object. PANEL 1, "Doorknob": LEFT side of the panel shows a human hand reaching for a round doorknob, with a small arrow labeled "affords grasping"; RIGHT side of the same panel shows a dog standing on hind legs in front of the same doorknob, paw raised but unable to grip, with a small arrow labeled "does NOT afford grasping (for a paw)." Caption below: "A doorknob is graspable for hands, not paws." PANEL 2, "Branch": LEFT side shows a small bird perching easily on a slender branch with claws curled around it, labeled "affords perching"; RIGHT side shows a large heavy mammal (e.g., a leopard) attempting the same branch but the branch bending under its weight, labeled "does NOT afford perching (at this body weight)." Caption below: "A branch is perchable for a bird, not for a heavy body." PANEL 3, "Puddle": LEFT side shows a small child wading happily through a puddle up to mid-shin, labeled "affords wading"; RIGHT side shows the same puddle but with an adult stepping easily across it without getting wet, labeled "affords stepping-over." Caption below: "A puddle is wadeable for a small body, step-overable for a larger one." Above all three panels, large caption: "Affordances are relational." Below all three panels, smaller caption: "The world is not a neutral set of objects. It is a field of action possibilities, structured by what the perceiving, acting body can do." Soft warm tones; sketched, schematic line-art; not photographic.
The same stream (prompts) activates different snapshots (models) in different receivers (agents). Try the prompts above on your own AI model and compare what it produces with our figures.
This is “The Roots of STEM,” a series exploring the cognitive bases of science, technology, engineering, and mathematics. Subscribe to follow the arc from the body to the laboratory.


