
LM101-059: How to Properly Introduce a Neural Network

Episode Summary:

I discuss the concept of a “neural network” by providing some examples of recent successes in neural network machine learning algorithms and providing a historical perspective on the evolution of the neural network concept from its biological origins.

Show Notes:

Hello everyone! Welcome to the fifty-ninth podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in hopefully an entertaining and educational manner. The earlier episodes in the series are intended for listeners who are relatively new to the field of machine learning while the later episodes in the series will eventually cover advanced topics in machine learning. Currently, we are about half-way between these two extremes but today we will jump back a bit and present an introductory episode.

Today, I will discuss the concept of a “neural network” by providing some examples of recent successes in neural network machine learning algorithms and providing a historical perspective on the evolution of the neural network concept from its biological origins.

Let’s begin with some terminology. The field of “Artificial Intelligence” is concerned with the construction of intelligent machines. The field of “Machine Learning” is a particular branch of the field of artificial intelligence which is concerned with the construction of intelligent machines which learn from experience.

Today, the concept “Neural Network” has multiple definitions as discussed in Episode 35 of Learning Machines 101 (https://www.learningmachines101.com/what-is-a-neural-network/). For a biologist, a “neural network” is a collection of brain cells. For a computer scientist, a “neural network” is a type of Machine Learning Algorithm inspired by how brains work. A “deep learning network” is a type of “neural network machine learning algorithm”.

Before exploring these ideas in detail, let’s discuss a few popular examples of neural network applications.

Nathan Sprague has implemented an example of a type of Deep Learning Neural Network that learns to play Pong. Pong, released in 1972, was one of the first video games ever invented. It is basically a highly simplified version of ping-pong. Each of the two opponents has a paddle which moves in only one dimension: either up or down. In addition, there is a ping-pong ball which travels across the screen. You try to prevent the ball from getting past you by moving your paddle up and down to block it; the ball then bounces off your paddle towards your opponent, who tries to move her paddle up or down to block the ball.

You can watch the Deep Learning Neural Network learn to play Pong on YouTube. I have provided the link to the YouTube channel in the show notes of this episode of Learning Machines 101 at: www.learningmachines101.com. Nathan Sprague notes that his implementation is based upon the technical paper “Playing Atari with Deep Reinforcement Learning” and I have provided a link to that technical paper in the show notes as well.

The video tells a compelling story. You see the artificial neural network feebly move the paddle up and down while it misses many ping-pong balls. But as the neural network acquires more experience, it manages to block more ping-pong balls. And with large amounts of experience, the neural network plays like a world-class champion who has both lightning-fast reflexes and the ability to anticipate the ping-pong ball’s trajectory!

Andrew Ng and his colleagues at Stanford University have used similar ideas to train a toy remote-controlled helicopter to perform complex acrobatic maneuvers. The basic idea in these experiments involved first having the neural network watch a human fly the helicopter, and then having the neural network experiment on its own to refine its skills. Watching a human fly the toy helicopter is not sufficient for the neural network to acquire the ability to perform complex acrobatic maneuvers, but such experiences help the neural network acquire the general basic principles of toy helicopter flying. After watching the human fly the toy helicopter, the system experiments with the helicopter and refines its skills. In the show notes of this episode of Learning Machines 101 at the website: www.learningmachines101.com I have provided hyperlinks to the YouTube video illustrating these ideas as well as the original paper which was published at the Neural Information Processing Systems Conference. Episode 44 of Learning Machines 101 provides further discussions of Deep Reinforcement Learning!

A third example is a “self-driving car” which learns to drive using a “neural network machine learning algorithm” after watching videos of humans driving cars. This is a really fun YouTube video to watch where you see the car go over the divider and swerve off the road in the initial stages of learning but in the later stages of learning it drives without human intervention on populated roadways without difficulty. Again, hyperlinks to the YouTube Video and the original scientific paper can be found in the show notes of this episode.

So now that we have seen some examples of neural networks in the field of machine learning, let’s revisit the concept of a neural network more carefully by comparing biological neural networks with digital computer architectures, so that we can understand more explicitly the similarities and differences between a “biological neural network” and a “machine learning neural network”.

Let’s use the “transistor” as the basis for the concept of a “computing unit” in a computer’s “brain”, which is called the “central processing unit”. Let’s use a “brain cell” or “neuron” as the basis for the concept of a “computing unit” in the brain.

In 1971, the typical central processing unit of a computer which corresponds to the “brain” of a computer had approximately 2,000 computing elements where each computing element completed a computation in about one millionth of a second. Today, the typical central processing unit of a computer has approximately 2 billion computing units where each computing unit completes its computation in about one-third of a billionth of a second.

In contrast, the human brain has about 100 billion computing units, where each computing unit generates a response to an electrical impulse in about one-tenth of a second. Note that 100 billion computing units is a really big number: it is about equal to the number of stars in our Milky Way Galaxy!

Thus, a human brain has only about 50 times as many computing units as a modern digital computer, but its components are several hundred million times slower. Yet despite this slow component operating speed, the human brain can solve complex perceptual and cognitive information processing tasks, such as recognizing a familiar object, in a fraction of a second. How is this possible? How can a biological system whose computing units operate on time scales of about a tenth of a second identify and process large amounts of data in a fraction of a second, more effectively than a digital computer whose computing components operate hundreds of millions of times faster than their biological counterparts?
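To make the comparison concrete, the short Python sketch below simply divides the rough figures quoted in this episode (2,000 units at one microsecond per step in 1971; 2 billion units at a third of a nanosecond per step today; 100 billion neurons at a tenth of a second per response). The variable names are illustrative, and the inputs are order-of-magnitude estimates, so the ratios are only ballpark values.

```python
# Rough figures quoted in the episode (order-of-magnitude estimates only).
cpu_1971_units, cpu_1971_step = 2_000, 1e-6      # computing units, seconds per step
cpu_today_units, cpu_today_step = 2e9, 1e-9 / 3
brain_units, neuron_step = 100e9, 0.1

# How much faster is each computing unit today than in 1971?
speed_gain_since_1971 = cpu_1971_step / cpu_today_step    # about 3,000
# How many more computing units does a CPU have today than in 1971?
unit_gain_since_1971 = cpu_today_units / cpu_1971_units   # about 1,000,000
# How many more computing units does a brain have than a modern CPU?
brain_vs_cpu_units = brain_units / cpu_today_units        # about 50
# How much slower is a neuron than a modern transistor?
neuron_slowdown = neuron_step / cpu_today_step            # about 300,000,000

for name, value in [("speed gain since 1971", speed_gain_since_1971),
                    ("unit gain since 1971", unit_gain_since_1971),
                    ("brain units / CPU units", brain_vs_cpu_units),
                    ("neuron slowdown vs transistor", neuron_slowdown)]:
    print(f"{name}: {value:,.0f}")
```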

These observations are remarkable because they make an interesting statement about the role of computational complexity in human brains. In particular, Feldman and Ballard, in a 1982 article in the journal Cognitive Science, concluded that these abilities of biological brains can only be explained if we assume that calculations in the brain are done using massively parallel processing. In a digital computer, computation is distributed so that small amounts of computation per unit time are completed over large numbers of time units, even though each time unit might be only a billionth of a second. In a biological human brain, computation is distributed so that large amounts of computation per unit time are completed over small numbers of time units, where each time unit is about a tenth of a second rather than a billionth of a second. Still, despite this fundamental difference in computing architectures, in many cases one can approximately or exactly simulate a massively parallel processing machine on a digital computer.
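One common way a serial digital computer simulates a massively parallel machine is to compute every unit's next state from a frozen copy of the current state, so that all units appear to update at the same instant. The Python sketch below assumes a toy network of threshold units described by a weight matrix and a list of thresholds (both hypothetical); the `in_place_step` function is included only to show what goes wrong when units are updated one at a time in the same array.

```python
def parallel_step(weights, thresholds, state):
    """One synchronous update: every unit reads the same frozen copy of `state`."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= theta else 0
            for row, theta in zip(weights, thresholds)]


def in_place_step(weights, thresholds, state):
    """Naive serial update: later units see values already overwritten this step."""
    state = list(state)
    for i, (row, theta) in enumerate(zip(weights, thresholds)):
        state[i] = 1 if sum(w * s for w, s in zip(row, state)) >= theta else 0
    return state


# Hypothetical two-unit network in which each unit copies the other's state.
weights = [[0, 1],
           [1, 0]]
thresholds = [1, 1]
print(parallel_step(weights, thresholds, [1, 0]))  # [0, 1]: the states swap
print(in_place_step(weights, thresholds, [1, 0]))  # [0, 0]: unit 1 read unit 0's new value
```

The parallel version costs one pass over all units per simulated time step, which is why simulating a network with billions of units takes many serial operations per step.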

An important goal of computational neuroscience is the development of biologically realistic computer simulation models of the human brain. This is an important area of research because neuroscientists can compare the responses of specific small regions of the brain with the responses of corresponding simulated brain regions. If the responses agree, then this provides evidence that the neuroscientist has discovered Mathematical Laws of Neuron Response Properties which make quantitative predictions about neuron behavior.

This constitutes a fundamental advancement in our understanding of human biological systems, in the same sense that Newton’s Laws of Motion constitute a mathematical theory of motion in the physical world. A big difference, however, is that Newton’s Laws of Motion are very simple while the mathematical laws governing the behavior of biological neural networks are very complicated.

In 1952, the University of Cambridge neuroscientists Hodgkin and Huxley published a paper in the Journal of Physiology (see the show notes for a hyperlink to the paper) titled: “A quantitative description of membrane current and its application to conduction and excitation in nerve”. This paper culminated a ten-year investigation into the electrical response properties of the squid giant axon and provided a mathematical theory (inspired by the mathematical theory of how electrical signals propagate through telegraph cables) which described how electrical impulses propagate along nerve fibers. In 1963, they received the Nobel Prize for their work in this area because they were the first researchers to write down mathematical formulas which made quantitative predictions of how electrical activity flowed through biological brains.

Since 1952, these mathematical equations have been modified and extended to include new advances in our understanding of neuroscience. They are extremely complex and highly nonlinear and take considerable resources to simulate on a conventional digital computer. It’s interesting to note that these types of equations take into account the specific structural features of a particular neuron so neurons of different sizes and shapes have different response properties and these equations can be used to specify how the size and shape and structural characteristics of a neuron will influence the neuron’s response properties.
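To give a flavor of these equations, here is a minimal Python sketch of the original single-compartment Hodgkin-Huxley model of the squid giant axon, integrated with a simple forward-Euler scheme. The parameter values are the standard textbook ones written in the modern voltage convention (resting potential near -65 mV), not the exact notation of the 1952 paper, and the function names are illustrative. This single patch of membrane runs quickly; the cost grows when many coupled compartments describe a neuron's shape and when large networks of such neurons are simulated.

```python
import math

# Standard textbook squid-axon parameters, modern convention (rest near -65 mV).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def rates(v):
    """Opening (alpha) and closing (beta) rates in 1/ms for gates m, h, n.

    The removable singularities at exactly -40 mV and -55 mV are ignored here.
    """
    a_m = 0.1 * (v + 40.0) / (1.0 - math.exp(-(v + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def simulate(i_ext, t_max=50.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration; returns the membrane voltage trace in mV."""
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v0)
    # Start each gating variable at its steady-state value for the initial voltage.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    v, trace = v0, [v0]
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_ion = (G_NA * m**3 * h * (v - E_NA)     # sodium current
                 + G_K * n**4 * (v - E_K)         # potassium current
                 + G_L * (v - E_L))               # leak current
        v += dt * (i_ext - i_ion) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace.append(v)
    return trace


def count_spikes(trace, threshold=0.0):
    """Count upward crossings of the threshold voltage (one per action potential)."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)
```

For example, `count_spikes(simulate(10.0))` reports a train of action potentials for a sustained input current of 10 µA/cm², while `simulate(0.0)` stays near the resting potential.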

The Blue Brain Project described at “bluebrain.epfl.ch” has the goal of building “biologically detailed digital reconstructions and simulations of the rodent, and ultimately the human brain” through the extensive use of supercomputer technology. Also, it is important to note that there are at least four major open-source computer simulation software packages which you can download and use to simulate biologically realistic neural networks on your home computer. These simulation packages are: GENESIS, NEURON, VERTEX, and NEST. The links to these software packages can be found in the show notes of this episode. I encourage you to visit these websites and start simulating parts of biological brains on your home computer tonight!!

Still, simulating a massively parallel processing machine such as a biologically realistic human brain on a digital computer is a challenging task. There was a fun article published in 2013 on the website www.extremetech.com (see the show notes for the specific reference) by Ryan Whitwam titled “Simulating 1 second of human brain activity takes 82,944 processors”. In the article, it is reported that Markus Diesmann and Abigail Morrison created an artificial neural network of 1.73 billion nerve cells connected by 10.4 trillion synapses. Simulating 1 second of biological brain activity on this neural network, using a supercomputer with 82,944 processors, took about 40 minutes of computer time.
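The reported figures can be turned into a few simple ratios. This Python snippet uses only numbers quoted above, plus the episode's earlier estimate of 100 billion neurons in the human brain; the variable names are illustrative.

```python
simulated_seconds = 1          # biological time simulated
wall_clock_seconds = 40 * 60   # about 40 minutes of supercomputer time
neurons, synapses = 1.73e9, 10.4e12
brain_neurons = 100e9          # episode's estimate for the human brain

slowdown = wall_clock_seconds / simulated_seconds   # 2,400 times slower than real time
synapses_per_neuron = synapses / neurons            # roughly 6,000
fraction_of_brain = neurons / brain_neurons         # under 2 percent

print(f"slowdown: {slowdown:,.0f}x")
print(f"synapses per neuron: {synapses_per_neuron:,.0f}")
print(f"fraction of brain simulated: {fraction_of_brain:.1%}")
```

So even this supercomputer ran thousands of times slower than real time while modeling only a small fraction of a human brain's neurons.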

Now recall that, by the figures above, each of today’s computing units is roughly three thousand times faster than the computing units of the early 1970s, and today’s central processing unit also has about a million times more computing units than a central processing unit of the 1970s. Computers of that era therefore had only a tiny fraction of today’s capacity for simulating detailed biological models. This issue of computational complexity led researchers in the early 1970s to explore the computational power of highly abstract simplifications of the more biologically realistic mathematical models.

The exploration of such simplifications yielded a number of interesting benefits. First, by using simplified mathematical models of a brain cell, one could construct and simulate much larger neural networks on a digital computer. Second, if one could demonstrate that a network of simplified mathematical models of neurons yields behavioral phenomena characteristic of biological brains, this would support the hypothesis that the specific brain cell characteristics which were eliminated in the abstract brain cell models are not necessary for creating specific types of behavioral phenomena. Or, in other words, this illustrates the concept of a sufficiency analysis which can be used to demonstrate that a particular extreme caricature of reality is sufficient to generate particular patterns of behavioral phenomena.

Now surprisingly, the first simplified model of a brain cell was proposed in 1943! It was developed by the Neurophysiologist, Psychologist, and Mathematical Physicist Warren McCulloch who was 45 years old at the time and the Mathematical Prodigy Walter Pitts who was about 20 years old in 1943. The paper they published was titled “A logical calculus of the ideas immanent in nervous activity” and this paper was published in the Bulletin of Mathematical Biophysics in 1943. A copy of this original paper can be found in the show notes of this episode. In this paper, they proposed that neurons in the brain worked like “logic gates” and could compute functions such as “OR”, “AND”, and “NOT”. In Episode 14 of Learning Machines 101, I discuss the concept of a McCulloch-Pitts neuron in detail. Computer scientists today often refer to McCulloch-Pitts neurons as Threshold Logic Units. By choosing the parameters of a TLU (Threshold Logic Unit) in a particular manner, it is possible for the TLU to implement an OR, AND, or NOT logic gate.
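A Threshold Logic Unit outputs 1 when the weighted sum of its binary inputs reaches a threshold, and 0 otherwise. The Python sketch below shows one possible choice of weights and thresholds for each gate; these particular values are illustrative assumptions, and many other choices work equally well.

```python
def tlu(weights, threshold, inputs):
    """Threshold Logic Unit: 1 if the weighted input sum reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# One possible parameter choice per gate (illustrative; many others work).
def and_gate(a, b):
    return tlu([1, 1], 2, [a, b])   # fires only when both inputs are on


def or_gate(a, b):
    return tlu([1, 1], 1, [a, b])   # fires when at least one input is on


def not_gate(a):
    return tlu([-1], 0, [a])        # inhibitory input: fires only when a is off


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```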

McCulloch and Pitts then showed in their paper that a multi-layer network of such Threshold Logic computing units could implement any arbitrary logical function. These ideas had a considerable impact on the mathematical genius John von Neumann, who cited the McCulloch-Pitts paper in his 1945 description of the EDVAC, one of the first stored-program digital computers, and described its binary logic circuits using McCulloch-Pitts formal neurons. Thus, the first digital computers and their modern descendants can all be viewed as “neural networks” because they were developed based upon inspiration from theories regarding how the brain processes information!
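One simple way to see how a two-layer network of TLUs can compute any logical function is the textbook sum-of-products construction sketched below in Python: give each input pattern that should produce a 1 its own hidden detector unit, and let the output unit act as an OR over those detectors. This is an illustrative construction, not necessarily the one used in the 1943 paper, and `build_network` is a hypothetical helper name. It also handles XOR, which no single TLU can compute.

```python
from itertools import product


def tlu(weights, threshold, inputs):
    """Threshold Logic Unit: 1 if the weighted input sum reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def build_network(truth_fn, n_inputs):
    """Two-layer TLU network computing any Boolean function of n_inputs bits.

    Hidden layer: one detector per input pattern for which truth_fn is true.
    The detector has weight +1 on bits that should be 1 and -1 on bits that
    should be 0, with threshold equal to the number of 1 bits, so it fires
    for exactly that pattern. Output layer: a single OR unit.
    """
    hidden = []
    for pattern in product((0, 1), repeat=n_inputs):
        if truth_fn(*pattern):
            weights = [1 if bit else -1 for bit in pattern]
            hidden.append((weights, sum(pattern)))

    def network(*inputs):
        detections = [tlu(w, t, inputs) for w, t in hidden]
        return tlu([1] * len(detections), 1, detections)

    return network


xor = build_network(lambda a, b: a != b, 2)
print([xor(a, b) for a, b in product((0, 1), repeat=2)])  # [0, 1, 1, 0]
```

The construction can need one hidden unit per true row of the truth table, so it demonstrates that a solution exists rather than being an economical design.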

Unfortunately, by the 1960s it was pretty clear to neuroscientists that although McCulloch-Pitts formal neurons were great for building really cool digital computers, they were far from biologically realistic models of the human brain!

So there is a lesson to this parable! Just because a system architecture is inspired by our knowledge of neuroscience doesn’t imply that it can be justified by reference to actual biological systems. Such inspirations are abstractions of reality, and the relevance of those abstractions is fundamentally an experimental question.

Moving forward in time to the early 1960s, we find the publication of Frank Rosenblatt’s 1962 book titled “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms”. You can find a detailed discussion of Perceptrons in Episode 15 of this podcast series. Briefly, however, Rosenblatt’s book covers a wide range of multi-layer network architectures and learning rules including cross-coupled layers. The generic Perceptron architecture, however, consists of a set of input units which generate a pattern of neural activity over a group of hidden units, and then the hidden units, in turn, generate a pattern of neural activity over a group of output units. All units in the system are McCulloch-Pitts formal neurons and this network architecture is often called a “feed-forward” network architecture.

Now the big idea that Frank Rosenblatt was interested in exploring was the following. McCulloch and Pitts had shown in 1943 that a multi-layer network architecture was capable of representing any arbitrary logical function, but they did not show how the parameters of such a multi-layer network architecture could be learned from experience. Rosenblatt proposed that the connections from the input units to the hidden units could be wired randomly and left fixed. If one has enough hidden units, this would generate a large number of feature detection units at the hidden layer which would act like AND gates and pick up important feature conjunctions. The units in the output layer receive their inputs from the units in the hidden layer, and those connections are permitted to change as a function of experience.

In particular, the following simple learning rule was proposed. Present an input pattern to the perceptron; if it generates the correct response, do not adjust the connections from the hidden units to the output units. However, if it generates an incorrect response, then for each output unit which is generating a logical 1 rather than a logical 0, slightly decrease the connection weight from each active hidden unit to that output unit. On the other hand, if an output unit is generating a logical 0 rather than a logical 1, then the connections entering that output unit from active hidden units are increased by a small amount. Rosenblatt and others showed that this learning procedure was guaranteed to find a correct mapping from input to output patterns in a finite number of learning trials, provided that some setting of the hidden-to-output connection weights could produce such a mapping. Clearly this was an extremely exciting theoretical result, especially if you consider that this was at the very beginning of computer science in the 1960s. Indeed, this idea has recently experienced a type of resurgence in the form of Extreme Learning Machines, which are essentially based upon these principles. References to recently published articles on Extreme Learning Machines can be found in the show notes.
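The Python sketch below puts the Perceptron architecture and its learning rule together: a fixed layer of randomly wired threshold units, followed by an output unit trained with the error-correction rule. The ±1 random weights, the random thresholds, the always-active bias unit, the step size of 1, and the function names are illustrative assumptions rather than Rosenblatt's exact settings.

```python
import random


def tlu(weights, threshold, inputs):
    """Threshold Logic Unit (McCulloch-Pitts style): 1 if weighted sum >= threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def random_hidden_layer(n_inputs, n_hidden, rng):
    """Randomly wired, fixed input-to-hidden units (never trained)."""
    layer = []
    for _ in range(n_hidden):
        weights = [rng.choice([-1, 1]) for _ in range(n_inputs)]
        threshold = rng.randint(1, n_inputs)  # positive threshold: conjunction-like detector
        layer.append((weights, threshold))
    return layer


def hidden_activity(layer, inputs):
    """Hidden-unit pattern, plus an always-on unit so the output can learn a bias."""
    return [tlu(w, t, inputs) for w, t in layer] + [1]


def predict(weights, features):
    return 1 if sum(w * a for w, a in zip(weights, features)) > 0 else 0


def train_output_unit(features, targets, step=1.0, max_epochs=100):
    """Perceptron rule on the hidden-to-output weights only.

    Correct response: no change. Output 1 but target 0: decrease weights from
    active hidden units. Output 0 but target 1: increase them. Inactive units
    are left unchanged because their update is multiplied by an activity of 0.
    """
    weights = [0.0] * len(features[0])
    for _ in range(max_epochs):
        mistakes = 0
        for h, target in zip(features, targets):
            if predict(weights, h) != target:
                mistakes += 1
                delta = step if target == 1 else -step
                weights = [w + delta * a for w, a in zip(weights, h)]
        if mistakes == 0:
            return weights, True   # every pattern classified correctly
    return weights, False          # no solution found within max_epochs


# Usage: try to learn XOR on top of 40 random hidden units.
rng = random.Random(0)
layer = random_hidden_layer(2, 40, rng)
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
features = [hidden_activity(layer, p) for p in patterns]
weights, converged = train_output_unit(features, [0, 1, 1, 0])
```

Whether `converged` comes out True for XOR depends on whether the random hidden features happen to make the task learnable; the convergence guarantee applies only when a correct set of hidden-to-output weights exists, and adding hidden units generally makes that more likely.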

In the mid-1980s, around 1985, the concept of the multi-layer Perceptron network was extended to allow for learning in both the input-to-hidden and hidden-to-output connections. The idea was developed independently by three different research groups: Parker, who was a researcher at Stanford University; Yann LeCun, who was a PhD student in France; and the team of Rumelhart, McClelland, Hinton, and Williams. From an engineering perspective, the key idea was that the goal of the multi-layer learning machine was to minimize a prediction error, and one could use conventional gradient descent methods to adjust the connection weights of the multi-layer network architecture each time a training stimulus was presented. The concept of gradient descent is discussed in Episode 16.
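Here is a minimal Python sketch of that idea, assuming a tiny network with two inputs, two hidden units, and one output, logistic sigmoid units, and a squared prediction error (all illustrative choices, not any one group's exact formulation). The chain rule propagates the output error backwards to obtain the gradient for every connection weight, and `sgd_step` performs one gradient-descent update for a single training stimulus.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return hidden activities and output for input vector x."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y


def loss(params, x, target):
    """Squared prediction error for one training stimulus."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def gradients(params, x, target):
    """Backpropagation: chain rule from the output error back to every weight."""
    W1, b1, w2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1.0 - y)            # dLoss / d(output net input)
    grad_w2 = [delta_out * hj for hj in h]
    delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta_hid]
    return grad_W1, delta_hid, grad_w2, delta_out       # bias gradients equal the deltas


def sgd_step(params, x, target, lr=0.5):
    """One gradient-descent update after presenting a single training stimulus."""
    W1, b1, w2, b2 = params
    gW1, gb1, gw2, gb2 = gradients(params, x, target)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return W1, b1, w2, b2 - lr * gb2
```

Calling `sgd_step` repeatedly over the training stimuli gives the per-stimulus gradient descent procedure described above. Whether such a small network actually learns a given task (XOR, for example) can depend on its random starting weights and the learning rate.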

Now let’s temporarily go backwards in time to 1980. In 1980, Fukushima published a neural network model of visual pattern recognition called the neocognitron, based upon a hierarchical modular structure identified in biological visual systems by the neuroscientists Hubel and Wiesel. Qualitatively, the big idea here is that the same feature detection unit is employed to detect a particular feature anywhere in the input pattern, and more than one such feature detection unit can exist. The learning process involves learning to adjust the parameters of these feature detection units. And in situations where two or more feature detection units disagree about the presence of a particular feature in the input pattern, the feature detection unit with the strongest response is chosen as the winner. These are the core ideas of having units with “shared weights” and “max-pooling” in convolutional hierarchical neural networks, which are discussed in greater detail in Episodes 29 and 41 of Learning Machines 101. Episode 35 of Learning Machines 101 also discusses the concept of a neural network in further detail.
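The two core ideas can be illustrated in one dimension. In this Python sketch, a single hand-set feature detector (a hypothetical "two adjacent ones" kernel) is slid across every position of the input, which is what "shared weights" means, and max-pooling then keeps only the strongest response in each window. In a real network the detector weights would be learned, and this modern convolution-plus-max-pooling form is a simplification of Fukushima's original neocognitron.

```python
def conv1d(signal, kernel):
    """Slide ONE shared feature detector (kernel) across every input position."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]


def max_pool(responses, size):
    """Within each non-overlapping window, keep only the strongest response."""
    return [max(responses[i:i + size])
            for i in range(0, len(responses) - size + 1, size)]


signal = [0, 1, 1, 0, 0, 1, 1, 0]
detector = [1, 1]                     # hypothetical "two adjacent ones" detector
responses = conv1d(signal, detector)  # [1, 2, 1, 0, 1, 2, 1]
pooled = max_pool(responses, 2)       # [2, 1, 2]
print(responses, pooled)
```

Because the same weights are reused at every position, the detector responds to its feature wherever it appears; shifting the input shifts the response pattern rather than requiring a separate detector for each location.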

In conclusion, the goal of this episode was to provide a brief discussion of some example technological successes of neural network machine learning algorithms and then provide some insights into their biological and historical origins. A recently published article in the MIT Technology Review in 2013 notes that neural network technology is based upon our knowledge of neuroscience and suggests that, as a consequence, “artificial intelligence is finally getting smart.”

Now this is a very exciting time for everyone, including myself, but it is important to remember that we should not be too surprised if we reach some unexpected and seemingly insurmountable barrier in the near future. That is, we have recently seen orders-of-magnitude improvements in artificial intelligence technology in a very short time period, but there is no reason to expect such large technological leaps to continue indefinitely into the future.

A Science News Letter article published in 1958 described the Perceptron as a “human being without life” which processed information in a manner similar to that of the human brain and which would shortly handle speech commands. The Perceptron’s rapid success at solving a variety of machine learning tasks in the late 1950s led some scientists to speculate that it would shortly be solving a much larger variety of more difficult information processing tasks.

Approximately 50 years later, those more difficult information processing tasks were indeed solved by descendants of the Perceptron, but the scientists and especially the scientific news press of the late 1950s did not have a good grasp of how difficult those tasks really were to solve. Today, we should embrace the impressive success of machine learning technology, but it is best to remain cautiously optimistic about what types of successes in machine learning and artificial intelligence might be expected in the near future.

Before leaving today, I would like to note that John Sonmez, who is the founder of the “Simple Programmer” blog and author of “Soft Skills: The Software Developer’s Life Manual”, has compiled “The Ultimate List of Developer Podcasts”. I was quite pleased to be included on the list of “Data and Machine Learning” podcasts, which includes not only my podcast “Learning Machines 101” but also other great Data Science and Machine Learning podcasts such as: Partially Derivative, the Data Skeptic, Linear Digressions, the O’Reilly Data Show, Data Stories, and Talking Machines.

I encourage you to visit John’s website to check out the “Data and Machine Learning” podcast list as well as the other podcast lists he has constructed for his “Ultimate List of Developer Podcasts”. Again, the link to his “Ultimate List of Developer Podcasts” can be found by visiting the website: www.learningmachines101.com.

Thank you again for listening to this episode of Learning Machines 101! I would like to remind you also that if you are a member of the Learning Machines 101 community, please update your user profile and let me know what topics you would like me to cover in this podcast.

You can update your user profile when you receive the email newsletter by simply clicking on the: “Let us know what you want to hear” link!

If you are not a member of the Learning Machines 101 community, you can join the community by visiting our website at: www.learningmachines101.com and you will have the opportunity to update your user profile at that time. You can also post requests for specific topics or comments about the show in the Statistical Machine Learning Forum on LinkedIn.

From time to time, I will review the profiles of members of the Learning Machines 101 community and comments posted in the Statistical Machine Learning Forum on LinkedIn and do my best to talk about topics of interest to the members of this group!

And don’t forget to follow us on Twitter. The Twitter handle for Learning Machines 101 is “lm101talk”!

And finally, I noticed I have been getting some nice reviews on iTunes. Thank you so much. Your feedback and encouragement are greatly valued!

Keywords: Neural Networks, Perceptrons, Convolutional Neural Networks, Biological Neural Networks, Computational Neuroscience