LM101-053: How to Enhance Learning Machines with Swarm Intelligence (Particle Swarm Optimization)
In this 53rd episode of Learning Machines 101, we introduce the concept of a Swarm Intelligence with respect to Particle Swarm Optimization Algorithms. The essential idea of “Swarm Intelligence” is that you have a group of individual entities which behave in a coordinated manner yet there is no master control center providing directions to all of the individuals in the group. The global group behavior is an “emergent property” of local interactions among individuals in the group! We will analyze the concept of swarm intelligence as a Markov Random Field, discuss how it can be harnessed to enhance the performance of machine learning algorithms, and comment upon relevant mathematics for analyzing and designing “swarm intelligences” so they behave in an appropriate manner by viewing the Swarm as a nonlinear optimization algorithm.
Hello everyone! Welcome to the fifty-third podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in hopefully an entertaining and educational manner. Today, we discuss the concept of “swarm intelligence”…specifically situations where you have a group of individual entities which behave in a coordinated manner yet there is no master control center providing directions to all of the individuals in the group. The global group behavior is an “emergent property” of local interactions among individuals in the group! We will analyze the concept of swarm intelligence, discuss how it can be harnessed to enhance the performance of machine learning algorithms, and comment upon some relevant mathematics which can be used to prove a “swarm intelligence” will behave in an appropriate manner.
Bird flocks, gaggles of Geese, fish schools, cattle herds, bee swarms, ant hills, human crowds, and robotic drone swarms have several critical shared features. We will refer to situations where one has a group of individuals which move together in such a manner that it seems as if the group itself is making decisions as “swarm intelligence”. For example, consider a flock of birds. Flying in a flock has a number of advantages. First, if you are an isolated bird, then a predator can sneak up behind you but if you are flying in a flock then it is very difficult for the predator to sneak up and surprise the flock. Someone will be looking in pretty much every direction! Second, if you are searching for food, then again you have multiple birds searching in slightly different directions resulting in a greater chance the food will be found. And third, if you are performing some sort of long migration effort and you are not sure which way to go, it is usually easier to get there if you have many flying buddies who are making suggestions regarding which way is the correct direction.
Various theories have been proposed to explain Swarm Intelligence. It is interesting also to note that the behavior of a flock of birds seems so coordinated that theories of bird mental telepathy were proposed in the past to account for the high degree of coordination in their behavior. Another theory is that all of the birds are following a single leader bird but for many species of birds this argument is not convincing because even though bird response times can be quite fast, they are not fast enough to account for the coordinated behavior of a flock of birds under the assumption that there is a single leader for the entire flock and all of the other birds are “following the leader”. Another theory is that all of the birds are following their neighbors but this hypothesis can’t account for the data either. One should also note that these types of arguments hold for schools of fish, herds of cattle, swarms of bees, migrating groups of human beings traveling across continents, and groups of human beings in a panic situation generated by someone yelling “fire” in a dark crowded movie theatre. So this leads to the question…how is the “swarm concept” actually implemented? Where is the “swarm intelligence” located since it is apparently not stored in any one individual in the swarm?
A recent article published in the January issue 2016 of DefenseTech.Org (visit www.learningmachines101.com for the hyperlink was titled US Navy plans to fly first drone swarm this summer and an article published in the May 2016 issue of DefenseTech.org is titled Drone Swarm Will Fly Alongside F35 Fighter Jet. Hyperlinks to both of these on-line articles can be found in the show notes at: www.learningmachines101.com. The concept of a swarm of flying robots is quite attractive as a military weapon or to survey territory. An individual robot might be difficult to detect by a radar system so you could have all of the elements of the Swarm converge to the target point from different locations creating the Swarm magically at the desired location, also it is difficult to attack a swarm of robots since if you destroy only part of the swarm the remaining flying robots can still accomplish their mission, and finally it is difficult to launch a sneak attack on a swarm of robots because they are looking in all directions and attacking one robot will startle the other robots!
So where is the location of the “swarm intelligence”? Let’s formally define a swarm as a set of points but if you want to be more concrete just think of a swarm as a set of honey bees. At each instant in time, we can record the current position of every honey bee. The entire list of positions of all honey bees at a particular instant in time corresponds to a “snapshot” of the dynamics of the swarm which we will call the “global swarm state”. If the “swarm” is treated as an entity, then the behavior of the swarm is expressed as a sequence of global swarm states. The behavior of the swarm might exhibit intelligent behavior by seeking out food and avoiding predators even though no individual in the swarm is actually controlling the swarm and the swarm is not being controlled by some outside entity.
So…how does this work? The basic idea is that each individual in the swarm updates its current position velocity based upon the current positions of some of the other individuals in the swarm. If these local update rules are formulated in just the right way, they can give rise to emergent global behavior. Another analogy which is helpful for understanding swarm dynamics is watching a ballet where multiple dancers are working together in unison. From the audience’s perspective, the dancers are perfectly synchronized and exhibit an interpretable global dynamics. From the perspective of an individual dancer, however, each dancer has specific expectations regarding what is the next step in the dance but those expectations and the execution of those expectations must be modified and adapted to the actual performance of the dancer’s colleagues. This is essentially a hypothesis introduced by Wayne Potts in his 1984 Nature article which was titled: “The chorus-line hypothesis of manoeuvre coordination in avian flocks”.
It’s worth mentioning that a useful tool for the analysis of situations where local probabilistic rules give rise to globally emergent probabilistic phenomena is the Markov Random Field which is a generalization of a Markov Chain. One can visualize a Markov Random Field as a Swarm of individuals where the probability of the next action of an individual is determined by the current actions of the individual’s neighbors in the Swarm.
Note that it is assumed that the neighborhood dynamics is fixed. The neighbors of an individual don’t change in time. With these assumptions, the group dynamics is entirely determined by local probabilistic rules.
A fundamental question for the theory of Markov Random Fields is under what conditions does the actions of the entire Swarm …equivalently all of the actions of all individuals in the group at each instant in time governed by a single global probabilistic law which species the probability that the Swarm will take a particular action? One might imagine that in some cases the local probabilistic relationships might be inconsistent with one another so characterization of the global probabilistic behavior of the Swarm is not possible. Alternatively, one might imagine that there may not be sufficient complexity and richness of the local probabilistic relationships to uniquely determine a particular single global probabilistic law governing the behavior of the Swarm.
A simple sufficient condition for guaranteeing a single global probabilistic law governing the behavior of the Swarm is uniquely determinable by local probabilistic rules of the individuals in the Swarm is provided by the Hammersley-Clifford Theorem. For simplicity, we assume discrete-time and a finite number of individuals. We also assume each individual can only generate a finite number of actions. Chances are…you had already made these assumptions!! But now we need one additional assumption which seems fairly obvious but is essential because it constrains the class of possible global probabilistic rules governing the behavior of the Swarm. This additional assumption is called the Positivity Condition. Intuitively, the positivity condition states that if an action is “possible” for an individual in the swarm based upon local probabilistic rules, then the swarm’s behavior must be such that the individual’s action is also “possible” based upon the global probabilistic law governing the behavior of the Swarm. I have provided some technical references to these ideas in the Show Notes. Also see Episodes 11, 21, and 42 for further discussions of Markov Chains and Markov Random Fields.
In addition to guiding a swarm of flying robots towards a specific goal, the basic principles of swarm intelligence as embodied in the PSO algorithm can be used to solve nonlinear optimization problems relevant to machine learning. To see this, suppose that the prediction error of a learning machine depends upon the choice of the parameter value settings for the learning machine. By adjusting the learning machine’s parameter values, the learning machine’s predictive performance can be either improved or diminished. We will now show how we can use the PSO algorithm to adjust the learning machine’s parameter values to improve the learning machine’s predictive performance.
Assume the position of an individual in the swarm at an instant in time corresponds to a particular choice of parameter values of the learning machine. For example, suppose the learning machine has two parameters. Then we can interpret a particular choice for the two parameters as a location in a two-dimensional space. If the learning machine has three parameters this can be interpreted as a location in a three-dimensional space. Thus, honey bees as they fly create Swarms which move in a 3-dimensional physical space. More generally, if the learning machine has D parameters then the location of the individual is a point in a D-dimensional parameter space which is called a “hyperspace” since it contains more than three dimensions. The swarm thus is flying through a d-dimensional hyperspace towards a target landmark in that hyperspace which corresponds to an optimal parameter value setting for the learning machine which maximizes the learning machine’s ability to generate a desired output pattern for a given input pattern with respect to a collection of training data.
These ideas can be generalized in a straightforward manner to not only solving supervised learning problems but can be used to solve unsupervised and reinforcement learning problems as well. In fact, any nonlinear optimization problem on a finite state space can be reformulated as a Particle Swarm Optimization problem!
Although there are many types of algorithms which are used in generating swarm intelligence, in this podcast we will focus upon the most popular and classical algorithm which is typically called “PSO” which stands for “Particle Swarm Optimization”. Here is the basic algorithm which is similar to the algorithm described in the Wikipedia entry: Particle Swarm Optimization.
The basic idea of the algorithm works as follows. At any given instant in time, we define the position of the Swarm as the list of all positions of the individuals in the Swarm at that instant in time. In addition, although we know how to measure the effectiveness of parameter values for an individual in the Swarm as the distance of that individual to a target landmark, we need to measure the effectiveness of the Parameter Value Choices for the Entire Swarm. This is accomplished by adding up the distances of all individuals to the target landmark plus the distance of the swarm to the target landmark. At each point in time, multiple individuals in the swarm change their respective locations in hyperspace. This situation where multiple individuals in the swarm have changed their locations corresponds to the case where those same individuals have changed their opinions regarding the choice of parameter values for the learning machine. Note that each individual in the swarm is providing a different opinion regarding what should be the parameter value choices for optimizing predictive performance of the learning machine.
It is assumed that each individual in the swarm keep’s track of three different things. First, each individual keeps track of it’s current position in hyperspace. Second, each individual keeps track of it’s own position in the past which was “closest” to the goal. That is, the parameter values which corresponded to maximizing the learning machine’s predictive performance. And third, each individual keeps track of the position of the Swarm’s position in the past which is “closest” to the goal.
The individual in the Swarm uses this information by taking a small CANDIDATE random step towards its best position from its current position plus a small CANDIDATE random step towards the best position of the swarm given its current position. The individual then decides whether as a result of these random steps whether or not it is closer to the goal of maximizing the learning machine’s predictive performance. If not, then the individual returns to its position where it was located before taking the random steps; otherwise the individual in the Swarm keeps its new position. If this new position is closer to the goal than the current position of the Swarm, then the current position of the Swarm is replaced with this new position. This process is then continually repeated.
Note that all sorts of additional variations are possible. For example, one can modify the random steps of an individual in a group so that it also takes a small random step to move away from its physical neighbors in the group. This type of regularization constraint keeps the individuals in the group from colliding into one another from the perspective of a flying swarm. From an optimization theory perspective, this same regularization constraint is basically saying that each individual in the group should be looking at a different choice of parameter values for the learning machine at any instant in time.
It can be shown that the Swarm is forced to move in such a way so that at each instant in time, the Swarm is only allowed to move in such a manner that it gets closer to its goal or equivalently modifies the parameter values of the learning machine to improve the learning machine’s predictive performance. Thus, the Swarm and its individuals can be viewed as search algorithms which randomly explore the parameter space and only change their current best choice for the learning machine’s parameter values if they find parameter values which are definitely better than the current parameter values. The exploration of the parameter space, however, is not random but rather is biased by a variety of factors.
Although this algorithm is a deterministic algorithm which could easily be trapped in local minima, one can modify the above basic PSO algorithm in a variety of ways to improve the quality of the search. One possible modification is to view the state of the swarm as an ordered pair of all positions of the swarm’s individuals and the position of the Swarm itself. The random steps can be interpreted as a candidate step in a Metropolis-Hastings Monte Carlo Markov Chain algorithm as described in Episode 21 and Episode 42. So the modification could be informally described as modifying the original PSO algorithm so that with some small probability an individual accepts the new position generated from the random steps even if it takes the individual further away from the goal. The required formula is the general Metropolis-Hastings step. The usual Metropolis step may not always be applicable because the Metropolis Algorithm, unlike the Metropolis-Hastings Algorithm, requires the probability that the Swarm moves to point B from point A is the same as the probability that the Swarm moves from point A to point B. To summarize, one can modify the PSO algorithm so that the asymptotic behavior of the Swarm can be “designed” using a Monte Carlo Markov Chain Metropolis-Hastings Algorithm as discussed in Episode 21 and Episode 42. With the correct implementation, the MCMC Metropolis-Hasting Convergence Theorem shows that the state of the Swarm will tend to randomly visit globally optimal solutions which globally maximize the learning machine’s performance more frequently as the number of iterations of the algorithm becomes very large. Note that as the temperature parameter of this algorithm approaches zero, the resulting modified algorithm is formally equivalent to the original unmodified PSO algorithm.
Swarm Intelligence has been applied in a variety of areas including solving complex traffic and transportation problems, currency trade portfolio optimization, distributed job scheduling, designing sensor communication networks, routing in communication networks, Semantic Clustering of Web Documents, swarms of robot drones for surveying disaster sites, Military Swarm Robot Drone Armies, application of swarm intelligence to the design of analog circuits, and selecting optimal feature selection algorithms for machine learning.
In addition, in October 2015, there was a blog at blog.ventureradar.com which posted the top 5 Swarm Intelligence Startups which solve problems in areas such as scouting fires, solving big data problems, cybersecurity, risk forecasting, controlling groups of autonomous underwater vehicles, and the control of robots on the ground and in the air. I have provided links in the show notes to these startups as well as links to research papers covering various applications of Swarm Intelligence.
Finally, I can’t resist sharing with you a Michael Crichton book I read recently called “Prey”. Michael Crichton is the science-fiction author who wrote the book “Jurassic Park”. In this science fiction novel, Crichton describes a military research lab which is producing miniature robots called nanobots which look like small black specs and which are capable of rearranging themselves into black clouds which are ingested by people and eventually control people for the purpose of taking over the planet. In this scenario, the famous technological singularity where Artificial Intelligence rapidly becomes super powerful and attempts to take over humanity is realized by a Swarm of reproducing nano-robots. Of course this is purely fictional so please don’t panic! However, if you want a fun silly fiction book about this podcast topic…check out the book “Prey” by Michael Crichton! And, if you have to worry about something, don’t worry about the nano-robot artificial intelligence technology in Crichton’s book “Prey” which won’t be realized for thousands of years but rather worry about artificially intelligent military robot drones currently deployed today which are actively involved in making decisions regarding who should live and who should die. Worrying about that would be much more productive!
Thank you again for listening to this episode of Learning Machines 101! I would like to remind you also that if you are a member of the Learning Machines 101 community, please update your user profile and let me know what topics you would like me to cover in this podcast.
You can update your user profile when you receive the email newsletter by simply clicking on the: “Let us know what you want to hear” link!
If you are not a member of the Learning Machines 101 community, you can join the community by visiting our website at: www.learningmachines101.com and you will have the opportunity to update your user profile at that time. You can also post requests for specific topics or comments about the show in the Statistical Machine Learning Forum on Linked In.
A quick update!!! We will be starting a new series of special sessions on the Statistical Machine Learning Forum designed to encourage discussions. Note that the forum is a closed group so you have to apply to become a member of this group and we will process your application as soon as possible. So the sooner you apply to join the group, the more quickly we can get to your application!!
From time to time, I will review the profiles of members of the Learning Machines 101 community and comments posted in the Statistical Machine Learning Forum on Linked In and do my best to talk about topics of interest to the members of this group!
And don’t forget to follow us on TWITTER. The twitter handle for Learning Machines 101 is “lm101talk”!
And finally, I noticed I have been getting some nice reviews on ITUNES. Thank you so much. Your feedback and encouragement is greatly valued!
Keywords: Particle Swarm Optimization, PSO, Swarm Intelligence, Monte Carlo Markov Chain, Metropolis-Hastings, Positivity Condition, Markov Field
Further References for Particle Swarm Optimization:
“Prey” by Michael Crichton (Fiction Novel)
how-do-flocking-birds-move-in-unison? (Great Videos of Real Bird Swarms!!!)
Further References to Markov Random Fields
Wikipedia Entry: Metropolis-Hastings Algorithm. Click here to visit!
Wikipedia Entry: Gibbs Sampling. Click here to visit!
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence. Click here to get a copy of the paper!
Copyright © 2016 by Richard M. Golden. All rights reserved.