Your visual system does not work like a camera. If you don’t believe me, think about what happens to an image if, while taking it, you shake the camera from side to side. The image is fuzzy, right? Now shake your head from side to side. The world doesn’t get blurry, does it? This is because the brain does more than just take snapshots of what hits your retina. The instant light hits your retina, the image is deconstructed into, essentially, pixels. In visual processing, these “pixels” are interpreted and eventually reintegrated into recognizable percepts.
Photoreceptors in the retina translate the stimulus into neural signals. The initial deconstruction of the stimulus breaks the image up based on the brightness and wavelengths of light that fall on the retina. The extensive processing of visual information performed even just within the retina constitutes an elaborate convergence of information. We have about 260 million photoreceptors, but only about 2 million ganglion cells, which are the cells that send information out from the retina, via the optic nerve, to the central nervous system. At this very early stage of visual processing, the image is being parsed, and the compression of this information suggests that higher-level visual centers must be sophisticated and efficient processors in order to recover visual details.
Signals from the retina are then carried by the optic nerve to the optic chiasm, where they are split up yet again. Signals carrying information about the right visual field are routed to the left hemisphere, and signals carrying information about the left visual field are routed to the right hemisphere.1 The signals are now “in the brain”. Each optic nerve divides into pathways that terminate at different places in the subcortex, but 90% of the fibers go to the lateral geniculate nucleus (LGN).2 From the LGN, nearly all of the optic fibers terminate in the primary visual cortex. By the time visual information reaches the primary visual cortex, it has been processed by at least four distinct neurons. As you recall, this means that a whole lot of information has been compressed, because each neuron can send only one of two signals: 0 or 1. How, then, does the visual cortex reintegrate all of this parsed, compressed information into recognizable percepts?
From the primary visual cortex, information is sent to distinct regions throughout the larger visual cortex that carry out specialized processing functions. For example, area V4 processes color information, while area V5 processes motion information. Generally, as information moves from the primary visual cortex to these more specialized visual processing areas—moving from the very back of the brain toward the front—the processing becomes more and more sophisticated.
To get the big picture, information that hits the retina is parsed and compressed as it is projected to the primary visual cortex at the very back of the brain. From there it is projected forward through the visual cortex and reintegrated into recognizable percepts. How the information is reintegrated isn’t fully understood. One possibility is that different processing tasks are delegated to different visual areas, and each visual area provides its own limited analysis based on the attributes it’s in charge of processing. As information progresses through the visual system, different areas elaborate on the initial information projected onto the primary visual cortex and begin to integrate that information across dimensions.
This (somewhat) detailed description of visual processing will hopefully help you better understand how neural networks in machine learning work. Machine learning programs, just like the brain, have to process a lot of information. To make them more efficient, computer scientists have started modeling machine learning programs after the brain. It’s not a crazy idea; the brain is the most efficient information processing system we know of!
Say you want to write a program to recognize hand-written digits. The computer receives as one input, for example, an image of the digits 0–9 written by me on a piece of paper, and, as another input, an image of the digits 0–9 written by you on a piece of paper. Now all the computer “sees” when it “looks at” these images are huge matrices of pixel values.
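As a toy sketch of what such a matrix looks like (the pixel values here are invented, not taken from any real scan), a small grayscale image is just a grid of brightness numbers:

```python
import numpy as np

# A hypothetical 5x5 grayscale "image": 0 = black, 255 = white.
# The bright pixels roughly trace a vertical stroke, like a "1".
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
])

print(image.shape)  # (5, 5): five rows, five columns of pixels
```

A real scanned digit would be much larger (e.g. 28×28 pixels in common handwritten-digit datasets), but the idea is the same: to the computer, the image is nothing but this grid of numbers.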
Each value in such a matrix corresponds to the brightness of a single pixel of the input image. How does the computer interpret all of those numbers? How does it find a pattern in them such that it “knows” it’s looking at a “0” or a “1” or a “2” et cetera? To handle this and other machine learning problems, computer scientists build programs called artificial neural networks (ANNs) that parse, compress, and interpret data analogously to how the brain parses, compresses, and interprets information. ANNs are inspired by biological neural networks, and they consist of an interconnected group of artificial neurons that find and model complex relationships and patterns in data.
Artificial neurons are, no surprise here, based on neurons. The “cell bodies” in these artificial neuron models are called units, and each unit has a function to compute. Activation of these units (“the neurons firing”) just means that the unit’s computation is performed. These units are organized into layers based on where they receive information. The first layer is called the input layer because its units receive the raw data as input. The last layer usually comprises just a single unit, and it is called the output layer.3 The output unit is the unit that computes the final value, whatever that might be. (In our example above, it would be a “0” or a “1” or a “2” et cetera.)
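A single unit’s computation can be sketched in a few lines. This is one common, minimal formulation (a weighted sum of inputs plus a bias, passed through a threshold activation); the numbers are hand-picked for illustration:

```python
import numpy as np

def unit(inputs, weights, bias):
    """One artificial 'neuron': take a weighted sum of the inputs,
    add a bias, and pass the total through a step activation."""
    total = np.dot(weights, inputs) + bias
    return 1 if total > 0 else 0  # "fires" (1) or stays silent (0)

# Three input values with illustrative, hand-picked weights:
# total = 0.5*1.0 + 0.2*(-2.0) + 0.9*1.0 - 0.5 = 0.5 > 0
out = unit(np.array([0.5, 0.2, 0.9]),
           np.array([1.0, -2.0, 1.0]),
           bias=-0.5)
print(out)  # 1
```

The weights determine how strongly each input influences the unit, much as synaptic strengths determine how strongly one neuron influences another.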
There are also hidden layers, which is misleading terminology. Hidden layers are just any and all layers that are neither the input layer nor the output layer. All hidden layers receive input either from the input layer or from other hidden layers. As in the visual system, information is propagated forward in artificial neural networks. Each unit computes on the input it receives from each unit in the previous layer and outputs a single value. The raw data values are computed by the first-layer units, each of which sends a discrete value on to the units of the second layer (or the output layer, if the network is simple). Then the second-layer units compute input → output, then the third, and fourth, and fifth—however many layers the network has—until the output layer computes the final value.
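The layer-by-layer forward pass described above can be sketched as a loop: each layer takes the previous layer’s outputs, applies its weights, and passes the result on. The network shape and weights here are invented purely for illustration:

```python
import numpy as np

def step(x):
    # Threshold activation: each unit outputs 0 or 1.
    return (x > 0).astype(int)

def forward(x, layers):
    """Propagate an input vector through a list of (weights, bias)
    layers: input layer -> hidden layer(s) -> output layer."""
    for weights, bias in layers:
        x = step(weights @ x + bias)
    return x

# Illustrative 3 -> 2 -> 1 network with hand-picked weights.
layers = [
    (np.array([[1.0, -1.0, 0.0],
               [0.0,  1.0, 1.0]]), np.array([-0.5, -0.5])),  # hidden layer
    (np.array([[1.0, 1.0]]),       np.array([-1.5])),        # output layer
]
print(forward(np.array([1.0, 0.0, 1.0]), layers))  # [1]
```

Note that the loop body is the same at every layer; only the weights differ. That is the sense in which each layer “elaborates on” the outputs of the layer before it.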
Hopefully the similarity between neurons and units is becoming clearer to you. Neurons receive and process a lot of signals from other cells and output a single signal; units receive and compute a lot of values from other units and output a single value. This is not a coincidence; units are modeled after neurons!
What about bona fide neural networks and artificial neural networks as whole networks? Without getting into the specific computations performed by each unit, it’s difficult to understand what exactly the units are doing, and thus difficult to grasp the intuition of ANNs on a larger scale. Essentially, the units in each layer compute on their input and, at least in all of the (admittedly few) ANNs that I’ve programmed, output either 0 or 1, meaning that the information they were programmed to detect either was present in the data they received or it wasn’t. Analogously, certain cells in the visual system will fire if the stimulus is moving but not if it is stationary, for example. To put it concretely, such a cell would output a 1 if the stimulus is moving, and a 0 if the stimulus isn’t moving.
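The motion-sensitive cell can itself be sketched as a binary unit. This toy version (an assumption for illustration, not a model of any real cell) compares two successive frames and fires only if some pixel’s brightness changed noticeably:

```python
def motion_unit(frame_prev, frame_curr, threshold=10):
    """A toy 'motion detector' unit: outputs 1 if any pixel's
    brightness changed by more than the threshold between the
    two frames, and 0 otherwise."""
    changed = any(abs(a - b) > threshold
                  for a, b in zip(frame_prev, frame_curr))
    return 1 if changed else 0

print(motion_unit([100, 100, 100], [100, 100, 100]))  # 0: nothing moved
print(motion_unit([100, 100, 100], [100, 160, 100]))  # 1: something moved
```

Whatever the details of its computation, from the outside the unit behaves exactly as described: present → 1, absent → 0.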
Computation tasks are delegated to different layers of an artificial neural network analogously to the delegation of different visual processing tasks to different visual areas, and each layer/area provides its own limited analysis based on the information it is designated to compute/process. As data moves through an ANN, each unit in each layer computes on the outputs from the units in the antecedent layer, until the output layer computes the final value.
I understand that this is vague, and I hope that you will extend me a bit of faith. However, the correlation between the visual system and an artificial neural network should be clear enough that we can switch back and forth between the two in understanding information processing in both. It is sometimes useful to use ANN terminology to refer to visual processes, and it is sometimes useful to use visual system terminology to refer to ANN functions. If nothing else, that we can switch back and forth in this way should alert you to the fact that these two information processors are analogous in very real ways. Again, this is not a coincidence!
This does not mean that signals carrying information from the right eye are routed to the left hemisphere or that information from the left eye is routed to the right hemisphere, which is a common misconception. Information that falls on the outside of each eye stays on that same side, while information that falls on the inside of each eye gets crossed over to the other side. ↩
The other 10% get sent to either the pulvinar nucleus or the superior colliculus. These subcortical structures play a major role in visual attention. Though most visual information gets routed to the LGN, keep in mind that 10% of the human optic nerve still contains more fibers than are found in the entire human auditory pathway. ↩
In multi-class classification problems, the output layer will have more than one output unit. ↩