We know a lot about how neurons function. At the same time, we know a lot about the psychology of our behaviour, and even about the general brain areas that give rise to these behaviours (cognitive neuropsychology and cognitive neuroscience). But between these two levels of inspection, there is a great divide that we still know very little about. This goes by the name of systems neuroscience, where the systems in question can be anything from simple two-neuron networks like the half-centre oscillator, up through more complex central pattern generators, to hugely complex systems of interaction between different brain areas, such as the basal ganglia and cerebral cortex. Furthermore, just as cellular biology is grounded by molecular biology, which in turn is grounded by chemistry, and then physics, so we seek to ground the psychological, behavioural and cognitive sciences in neuroscience by understanding the levels that exist between them.
Below I describe a network which gives an example of the way that neuronal processing can give rise to the higher level ideas that we think in. It is a highly (highly highly!) simplified model, but hopefully will serve to demonstrate the idea. At the bottom of the page, I discuss how biologically relevant this model is, along with the more complex models that I based it upon.
The task
Similar to the way the retina is activated by light stimulating different photoreceptors, yet what we perceive is different people, animals, things, etc., this network will be stimulated by a series of light and dark points on a layer of "retina" cells, and will decide which of a number of shapes is being formed by the light and dark patches.
The network will be presented with an array of cells that are either black or white. This represents an object in the world stimulating the photoreceptors of the retina. In the picture on the right, the object which causes the array is an X, but to the network, what is sensed is just a collection of ones and minus ones (I'm using -1 instead of 0 for practical reasons).
The problem is for the network to recognise the object as an X from the pattern of activation. However, to make it more difficult, I want the network to be able to recognise an X that falls anywhere on the retina, so it should recognise all of the Xs shown on the right. Notice that the two Xs in each row do not share even a single active (black) cell, meaning that recognising the X pattern cannot simply be a case of knowing which cells are usually active in an X, but must involve an abstract knowledge of what an "X" is (this sentence is important, read it again!). This can be likened to our ability to recognise objects no matter from which angle we view them. Furthermore, we want the network to be able to recognise more than just one kind of object. So, this network will be able to recognise each of the objects on the right when they are presented anywhere on its retina (the 4x4 grid). Each of the four objects can be presented to the network in four different positions, except for "water", which can only be presented in three positions (its width prevents any sideways shift). This is like our ability to recognise everything we know, despite the fact that many of the neurons that become active when we sense something are the same.
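To make the shifted-patterns idea concrete, here is a minimal Python sketch (the names and the exact cell coordinates are my own, not part of the Excel model): it stamps the same X shape at two horizontal offsets on a 4x4 retina of -1s and confirms that the two placements share no active cell.

```python
# Hypothetical sketch: a 4x4 "retina" of -1s, with an object's
# active cells set to +1. Names and coordinates are illustrative.

def blank_retina():
    return [[-1] * 4 for _ in range(4)]

# An X drawn in a 3x3 box: the four corners plus the centre.
X_CELLS = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]

def stamp(shape_cells, row_off, col_off):
    """Place a shape on a fresh retina at the given offset."""
    retina = blank_retina()
    for r, c in shape_cells:
        retina[r + row_off][c + col_off] = 1
    return retina

# The same X, shifted one column to the right:
a = stamp(X_CELLS, 0, 0)
b = stamp(X_CELLS, 0, 1)

# Count cells that are active (+1) in both placements.
overlap = sum(1 for r in range(4) for c in range(4)
              if a[r][c] == 1 and b[r][c] == 1)
```

The two placements activate completely disjoint sets of cells, which is exactly why a fixed lookup of "cells usually active in an X" cannot work.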
The way in which the network is going to solve this problem is to employ hierarchical processing. Each subsequent level of the hierarchy will recognise certain patterns in the activation pattern of the previous layer. Here is an example:
The X is made up of two different mini-shapes, each used twice. On the left I've called the two mini-shapes A and B. Think of each as a tile. If you arranged four tiles like this:
A B
B A
with all four overlapping slightly, an X can be formed where the four tiles all share the central black cell (it might be worth drawing this out on paper). So, the first stage of the processing is to recognise the mini-shapes. Since each mini-shape occupies a 2x2 grid, and the entire retina is 4x4, there are 9 different positions in which each mini-shape could appear. On the left I've shown four of them. The second layer of cells in the network is therefore a 3x3 grid (3x3=9), where each cell is active if and only if it senses the exact pattern it is looking for in the 2x2 grid that it references in the previous layer. An example:
Here we can see the retina layer (the input layer). There are four cells active, and they form two diagonal lines, the same as the B tile above. This is one section of the first layer of pattern-detecting neurons. It detects diagonal lines going from top right to bottom left. In this case it has detected two of these patterns, and it is showing in which positions on the input layer they are. Notice that it has simplified the pattern by transforming a four-cell pattern into a single cell which basically says: "it's here".
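What one of these pattern-detecting sub-layers computes can be sketched in a few lines of Python (my own naming, not the spreadsheet's): slide a 2x2 tile over the 4x4 retina and mark exact matches in a 3x3 output grid.

```python
# Illustrative sketch: each cell of a 3x3 output layer checks one
# 2x2 window of the retina against a target tile. Weighting each
# input by the tile's +1/-1 values means an exact match sums to 4.

def detect_tile(retina, tile):
    out = [[-1] * 3 for _ in range(3)]
    for r in range(3):
        for c in range(3):
            s = sum(tile[i][j] * retina[r + i][c + j]
                    for i in range(2) for j in range(2))
            if s > 3.5:          # only a perfect 4/4 match passes
                out[r][c] = 1
    return out

# The "B" mini-shape: a diagonal running top right to bottom left.
B = [[-1, 1],
     [1, -1]]

# A retina containing two copies of that diagonal.
retina = [[-1, 1, -1, -1],
          [1, -1, -1, -1],
          [-1, -1, -1, 1],
          [-1, -1, 1, -1]]

layer = detect_tile(retina, B)   # fires at (0, 0) and (2, 2)
```

Each firing cell replaces a four-cell pattern with a single "it's here" signal, which is the simplification described above.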
One of the special things about this network is its hierarchical structure. At each level the cells look for patterns in the previous layer of cells, so as the information travels further up the hierarchy, active cells represent more complex combinations of features. Here is a simple illustration of this:
The red spots are the low-level patterns (edges, diagonal lines, curves, etc) that are detected in the pattern of activation of the retina.
The green spots are the cells that pick up certain patterns of red spots. For example; corner + curve + corner = wheel arch.
The blue spots are the cells that pick up certain patterns in the green spots. For example; wheel arch + door + bonnet + .... = car.
One thing you may notice about this picture is that even though the things being recognised become more complex as we look up the layers, the cells in each layer are not actually doing any different job to the cells in the other layers...
In each layer the cells simply find patterns that exist in the activation of the previous layer.
This is the amazing thing! Think about this for a moment; even though the ideas are becoming more and more complex, the neural layout that is processing them is the same.
example
Below there is an example of how the network I made does this. I've used the 'worm' pattern which I showed you earlier. The network consists of four layers of neurons, and the activation flowing from layer to layer is shown as black cells. Each layer's activation depends solely on the previous layer.
The first layer is the retina. When a worm appears in the visual field, its form is projected upon the retina.
The next layer (layer 1) looks for any basic patterns in the image that is being picked up by the retina. Here I've only shown one kind of pattern detector (like a backwards L), but in the full model there are another two basic patterns (the A and B tiles I talked about earlier). With just these three basic pattern detectors, all four objects can be detected in any position on the retina.
The next layer (layer 2) has a similar 2x2 pattern detector which searches layer 1 for a diagonal line.
The final layer contains only a single cell. This is active if it detects any activity at all in the previous layer. This last layer is like the grandmother cells that you may have heard of.
One final thing needs mentioning before I present the full model. In the above example, every cell in a layer looks back at a 2x2 square of adjacent cells in the previous layer. In the full model I have been flexible with these constraints, so one feature detector in layer 2 is 1x3 (a line of three adjacent cells in the previous layer), while another looks at just two cells in the previous layer that are quite far apart, but none of the cells in between. These variations are all biologically plausible, and I will explain them in more detail later.
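The flexible wiring just described can be sketched as data rather than geometry (Python; the connection lists, thresholds and names here are mine, for illustration): a detector cell is simply a list of taps into the previous layer plus a threshold, so a 2x2 square, a 1x3 line, or two far-apart cells are all the same kind of object.

```python
# Sketch of flexible receptive fields: a detector cell is a list of
# (row, col, weight) taps into the previous layer plus a threshold.

def cell_output(prev_layer, connections, threshold):
    """Fire (+1) if the weighted sum over the taps exceeds the
    threshold, otherwise -1. prev_layer holds +1/-1 values."""
    s = sum(w * prev_layer[r][c] for r, c, w in connections)
    return 1 if s > threshold else -1

# A 1x3 detector: three adjacent cells in one row, all expected on.
line_of_three = [(1, 0, 1), (1, 1, 1), (1, 2, 1)]

# A detector watching two far-apart cells and nothing in between.
two_distant = [(0, 0, 1), (2, 2, 1)]

prev = [[1, -1, -1],
        [1, 1, 1],
        [-1, -1, 1]]

hit_line = cell_output(prev, line_of_three, 2.5)   # sum 3: fires
hit_pair = cell_output(prev, two_distant, 1.5)     # sum 2: fires
```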
The model
This model is quite complex, so rather than having you make the full thing, I'll show you how to set up one pathway down the hierarchy, then give you the full program later.
The pathway we will make together is the one that recognises the "worm" pattern anywhere on the retina (like the example above). It will be composed of:
- the retina layer, where you put the pattern to simulate a worm being seen
- layer 1, which picks out basic patterns in the retina layer's activation
- layer 2, which picks out patterns in the activation of layer 1
- layer 4, which contains only a single cell which will be active when a worm is detected, and inactive otherwise. This could be called the 'concept' layer, or the 'grandmother cell' layer.
To detect "worm" and "water", only three layers are needed (layers 1, 2 and 4), while "X" and "O" require one more layer (layer 3). Don't worry about this till later.
the retina layer
Choose a place on your Excel file and highlight a 4x4 square of cells in any colour you like. This will be the retina.
In each square, input -1. If you look carefully at my example, you'll notice there is a worm in there... can you see it?
Cells E4, F4, F3, G3 and G2 are not -1s; they are 1s. This pattern of 1s against a background of -1s is how the network senses things. When the network is finished, you can draw a worm in any of the four possible positions by setting the relevant cells to 1, and having all the others be -1.
Why is the retina 4x4?
We have to find a balance between simplicity, for the sake of making it easier to build and understand, and complexity, for the sake of observing interesting behaviour. I tried 3x3 first, but it didn't give enough options for combinations of patterns higher up the hierarchy. More than 4x4 would take longer to make. This retina has 16 cells, whereas a human retina has more than 100 million!

Why -1?
In the previous models, I've used 1 and 0 for the firing and non-firing behaviours of the neurons. However, zeros can be troublesome, because they don't do anything! It is often more useful to replace the 0s with -1s, so that we get two perfectly opposing signals. In the pictures above, the black cells are the 1s, and the white cells are the -1s. You don't have to use the same place in the Excel file for your retina, but it might make it easier to follow.
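The arithmetic behind this choice fits in a few lines (a Python sketch with my own example values): with +1/-1 coding, every cell that disagrees with the target pattern actively pulls the weighted sum down, so an exact match is cleanly separated from a near miss.

```python
# Why -1 instead of 0: each disagreeing cell subtracts from the
# weighted sum instead of contributing nothing, so a single flipped
# cell drops the sum from 4 to 2, safely below a 3.5 threshold.

tile = [1, -1, -1, 1]        # synapse weights for a flattened 2x2
exact = [1, -1, -1, 1]        # input identical to the pattern
one_off = [1, 1, -1, 1]       # one white cell flipped to black

def weighted_sum(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

s_exact = weighted_sum(tile, exact)       # 4: clears the threshold
s_one_off = weighted_sum(tile, one_off)   # 2: rejected
```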
layer 1
As we saw above, layer 1 is a 3x3 square. Choose a place for this and colour it with a different colour (the colours are just to make it easier to see).
You don't need to put any numbers in here yourself. Each Excel cell represents a neuron, so we will use a function to connect it with some cells in the retina layer, and it will calculate its own activity. Let's start by wiring up the top left cell in layer 1 (D6 in my model). In the picture on the right, you can see how this is done. If the sum of the four cells that D6 is connected to is greater than 3.5 (the threshold), the cell fires (output = 1), otherwise it doesn't fire (output = -1). However, if you look carefully, this is not a normal sum: one of the values (E1) is negative. Why is this? The answer is in the pattern we are trying to detect. This 3x3 square is trying to detect the pattern shown above, and E1 is in the same position as the white cell. Since we defined white cells as -1, giving that synapse a negative weight means that only when exactly this pattern is present will the sum be 4. If one cell is a different colour to the pattern, the sum drops to 2, and the threshold will not be surpassed.
Because every cell in layer 1 detects the same pattern, just in a different place on the retina, we can fill in the rest of the layer 1 cells by simply dragging D6 across, then down. This is shown in the two pictures on the right. You can now check that everything is connected up correctly by double-clicking on each cell. For example, if you double-click on the middle cell of layer 1, it should refer to the middle four cells of the retina layer, as shown in the picture.
I used columns D, E & F, and rows 6, 7 & 8, as above.
layer 2
Layer 2 does exactly the same thing as layer 1 did, but with two small differences:
Look back at the example. You will see that in order to detect the "worm" pattern, layer 2 must find activation in layer 1 that forms the pattern on the right. Because white means a negative synapse and black means a positive one, the four synapses take the signs -, +, +, -. The formula is shown on the right. Again, you can fill in the other cells in the layer just by dragging the first one across and then down. This time, though, there are only four (2x2) cells in the layer. When you've done it, be sure to double-check that all cells are wired up correctly by double-clicking them.
layer 4 - the concept layer
Layer 4 consists of just one cell, and that cell needs to fire if and only if any of the cells in layer 2 is active.
As there are four cells in layer 2, if all of them are inactive, the sum will be -4. If just one of them is active, the sum will be -2. For this reason, I've made the threshold -2.5. The formula is shown on the right. I've used the SUM( ) function rather than writing D14+E14+D15+E15, but the result is exactly the same. Notice that I've used 1 and 0 for the two outputs here. This is because there are no deeper layers that rely on this cell, so it's fine to use a 0 now. You can use a -1 if you prefer.
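The behaviour of this concept cell can be sketched in Python (a hypothetical re-implementation of the spreadsheet formula described above): sum the four layer 2 outputs and compare against the -2.5 threshold.

```python
# Layer 4 "concept" cell: with four +1/-1 inputs, the sum is -4
# when nothing is active and -2 when exactly one cell is active,
# so a threshold of -2.5 acts as an OR over the layer 2 cells.

def concept_cell(layer2_cells):
    """layer2_cells: four +1/-1 values; returns 1 if any is +1."""
    return 1 if sum(layer2_cells) > -2.5 else 0

no_worm = concept_cell([-1, -1, -1, -1])   # sum -4: below threshold
worm_seen = concept_cell([-1, 1, -1, -1])  # sum -2: above threshold
```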
testing it
We have set up a "worm" detecting hierarchical network, so the only tests that we can perform are whether a worm is present somewhere on the retina or not. There are four possible positions that a worm can appear on the retina (shown below). Try each of them. If the network is functioning correctly, the layer 4 neuron should output 1 in each of these four cases.
But this alone is not enough. We also need to test that the network does not say that a worm is present when there isn't one on the retina.
Try presenting one of the other patterns to the retina. I used the "X" and "water" patterns, as shown on the right. In both cases, the network did not output 1. This is equivalent to it saying: "no, this is not a worm". It can't yet recognise what they are, but it can recognise that they are not "worm".

One final test. It would be good if the network could recognise a worm even when there is some other stuff around it. We can test this by drawing a worm again, and then randomly changing a few of the other -1s to 1s. I've shown an example on the right. This works. However, if you play around, you might find that changing certain cells to 1 means that the network can no longer recognise the worm. The pattern on the right is not recognised as a worm. Can you work out why? The answer lies in layer 1. A worm is composed of two backwards-L patterns in layer 1, which are then recognised in their relative positions in layer 2. However, the upper of these two is corrupted in the picture on the right, so it will not be recognised by layer 1 as a backwards L. This error propagates up the network.
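These tests can be run end-to-end in a compact Python sketch of the worm pathway. The exact tile shapes (a backwards L in layer 1, a diagonal in layer 2) and all names are my assumptions for illustration, not copied from the spreadsheet, but the layer structure (4x4 retina, 3x3 layer 1, 2x2 layer 2, one concept cell) and the thresholds follow the description above.

```python
# End-to-end sketch of the worm pathway (illustrative tile shapes).

def scan(grid, tile):
    """Slide a 2x2 tile over a square +1/-1 grid; +1 marks an
    exact match (weighted sum 4 > threshold 3.5), else -1."""
    n = len(grid) - 1
    return [[1 if sum(tile[i][j] * grid[r + i][c + j]
                      for i in range(2) for j in range(2)) > 3.5
             else -1
             for c in range(n)] for r in range(n)]

BACKWARDS_L = [[-1, 1],
               [1, 1]]
DIAGONAL = [[-1, 1],
            [1, -1]]

def detect_worm(retina):
    layer1 = scan(retina, BACKWARDS_L)   # 4x4 retina -> 3x3
    layer2 = scan(layer1, DIAGONAL)      # 3x3 -> 2x2
    # Layer 4 concept cell: fires if any layer 2 cell is active.
    return 1 if sum(v for row in layer2 for v in row) > -2.5 else 0

worm = [[-1, -1, -1, -1],    # a staircase-shaped "worm"
        [-1, -1, -1, 1],
        [-1, -1, 1, 1],
        [-1, 1, 1, -1]]

x_shape = [[1, -1, 1, -1],   # an X: should NOT be called a worm
           [-1, 1, -1, -1],
           [1, -1, 1, -1],
           [-1, -1, -1, -1]]
```

Running `detect_worm` on the two grids reproduces both kinds of test: the worm is accepted and the X is rejected, because the X never produces a backwards-L match in layer 1.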
Would you say that this is a reasonable error to make? Do you think the final picture above looks like a worm?
The complete network
Here is the full thing:
hierarchical_vision_network.xlsx (15 kb)
The complete network is considerably more complex. I've tried to make it as easy as possible to work out what everything does by colour coding and separating the layers, and having black and white (and purple and white) pictures next to each area in a layer to show the pattern that it is searching for. Here is a picture of the full thing:
The bit that you just made is down the left.
You can see that there are more pattern detectors in each layer. In layer 1, for example, there are three different pattern detectors. Beneath each of these, I have drawn the pattern that it searches the retina layer for in black and white.
On the other end of the hierarchy, down at the bottom, you will see that each concept cell in layer 4 has a purple and white picture below it. I used different colours here because this is NOT the pattern that this cell is looking for in the previous layer, but instead it is the top-down overall pattern that is understood (the combination of all previous bottom-up layers).
If you look at the right-hand side of layer 2, you can see two purple-and-white patterns. If you click on any of the cells in the layer 2 pattern detectors for "X" or "O", you will see that they are connected to cells in more than one of the layer 1 pattern detectors. I found it easier to see what the top-down goal was at this point, which is why I used the purple-and-white pictures.
Another useful feature I added is the colour coding of active and inactive cells (1s in a darker shade than -1s in each layer). You can do this too using the Conditional Formatting function in the Styles section of the Home tab in Excel.
To test this network, you can use all the same kinds of tests that we used earlier for the one you built.
Have a good click around all the different cells. Try to understand exactly what is happening everywhere.
Biological relevance
The visual system, from retina to higher areas of the visual cortex, functions hierarchically. When visual information enters our eye, it actually goes through quite a lot of processing in the retina before it even starts its journey down the optic nerve to the brain (see my Retina page). The model presented here skips this retinal processing because we have the luxury of using clear pictures, whereas the images that the eye sees may be fuzzy, incomplete, or unclear in many other ways.
Hubel & Wiesel proposed the ice-cube model of cortex, shown on the right. Striate cortex is not actually organised exactly like this, but the ice-cube model is a good rough conceptual model to help us understand what striate cortex is doing.
Once the information gets to the brain, it goes through several levels of processing. The picture on the left shows these levels. The information arrives via the optic radiation (the wiring from the thalamus to the cortex) at V1.
Layer 1 of the model presented above can be thought of as similar to area V1 (also known as the striate cortex) in the visual cortex, because this is where the basic features are extracted. In the model the basic features are simple corners, lines, etc., while in the striate cortex the basic features are things like lines oriented in different directions and movement in different directions.
Later layers of processing in the brain add increasingly more complexity to the visual data. V2 does much the same kind of thing as V1, but also adds some more psychological functions, such as working out the orientation of illusory contours, figure and ground differentiation, and binocular disparity. However, apart from the increasing complexity of the information it manipulates, relatively little is known about how extrastriate (non-V1) visual cortex functions.
Inferior temporal (IT) cortex is known to be involved in object recognition, as lesions here can lead to an inability to differentiate between objects (trying to eat your phone, or make a call with an apple). IT neurons also have much larger receptive fields and are not retinotopic (light that hits the retina in adjacent positions doesn't necessarily reach IT in adjacent positions). These last two features are both found in layer 4 of the model presented here, which is the object-recognising layer.
differences to the biological system
- First of all, the model shown here is a simplification of the real system. The retina is tiny - just big enough to support slightly complex shapes (composed of a combination of active cells) and allow the shapes to appear in different positions.
- Levels. I haven't attempted to make the right number of levels in the hierarchy - I just used whatever seemed right for each shape. It is not clear how many levels are in the ventral visual stream. The basic flow is: Retina --> Thalamus --> V1 --> V2 --> V4 --> IT. However, some of these layers may be better described as multiple layers (posterior IT, central IT and anterior IT are thought to be functionally differentiable, for example).
- Feedback. In our cortex there are many neural pathways feeding information back down the hierarchy (from ideas/concepts down towards the sensory neurons), but in this model there are no feedback pathways. Feedback is useful for filling in missing details. For example, if you see your best friend wearing a mask, the visual data is incomplete. However, as long as there is sufficient data to recognise your friend, sensory data will travel up the hierarchy until the "friend" concept is activated, and this will then feed back down the hierarchy, filling in the details that were missing from what was sensed.
- Time. The dynamical (time related) functioning of the brain is one of its most fundamental properties. If you are unable to recognise a thing from one angle, a useful technique is to move your head around to get different perspectives. The same is true of sound - it is impossible to recognise a melody unless you listen to at least a few notes. This model, however, does not have any dynamical properties.