LM101-008: How to Represent Beliefs using Probability Theory

July 14, 2014
A robot uses probabilistic inference to make deductions about the world.

Episode Summary:

This episode focuses on how an intelligent system can represent beliefs about its environment using fuzzy measure theory. Probability theory is introduced as a special case of fuzzy measure theory that is consistent with the classical laws of logical inference.

Show Notes:

Hello everyone! Welcome to the eighth podcast in the podcast series Learning Machines 101. In this series of podcasts my goal is to discuss important concepts of artificial intelligence and machine learning in a hopefully entertaining and educational manner. The world in which we live is characterized by uncertainty. Nothing is absolutely true and nothing is absolutely false. Moreover, we have various beliefs about how the world works. In some cases, we may have absolute certainty that our beliefs about the world are true, but in many cases we may strongly believe something to be true yet realize that exceptional circumstances may always exist.

In Episode 3, we introduced the idea of representing knowledge using logical IF-THEN rules. Unfortunately, the real world is often not very predictable. Therefore, in this episode we discuss an extension of logical IF-THEN rules that allows us to represent and manipulate beliefs rather than dogmatic assertions about the uncertain world in which we live.

For example, suppose I have some piece of knowledge such as: Birds can fly. Knowledge of this type can often be represented in terms of IF-THEN rules. So, for example, we can rewrite the assertion “Birds can fly” as a logical IF-THEN rule. Specifically, IF something is a BIRD, THEN that something can FLY. However, there are always exceptions to rules. A rule may be applicable 98% of the time but not applicable 2% of the time. For example, suppose we have a rule that: IF Tweety is a bird, THEN Tweety can fly. We have the assertion that “Tweety is a bird” but we also have the assertion that “Tweety is an ostrich”, which leads to the implication “Tweety can fly and Tweety is an ostrich.” This would be an example where the existing IF-THEN rule is not adequate since ostriches cannot fly.

In Episode 7, we began our discussion of representing uncertainty in the world by considering two major approaches to modeling uncertainty which are called “fuzzy set theory” and “fuzzy measure theory”. Fuzzy set theory is based upon the idea that it is difficult, perhaps impossible, to divide the world up into “crisp sets”. For example, how could one define the “set of chairs”? If we want to have the concept of a “chair” then we might define that concept as a “crisp set”, which means that something either is a chair or is not a chair. When you eat at the dinner table on a chair, it seems reasonable to include a “dinner table chair” in that set. When you sit at your desk at work, it seems reasonable to include a “desk chair” in that set. But there are lots of other things in the world which are “chair-like”. For example, would we want to include a “bean-bag chair” in the set of chairs?

Suppose that we are on a walking path in the woods and a rock has been provided for us to sit on. Would we want to include the rock in the set of chairs? Fuzzy set theory deals with this issue by introducing a “fuzzy set membership function” which assigns a number to each object in the world. If the object is definitely a chair, then the fuzzy set membership function assigns that object the membership number of 1. If the object is definitely not a chair, then the fuzzy set membership function assigns that object the membership number of 0. If an object is “kind-of” a chair then the fuzzy set membership function assigns that object a membership number between zero and one. The idea of a fuzzy set membership function was discussed in a little more detail in the previous episode which is Episode 7.
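As a minimal sketch of this idea, a fuzzy set membership function can be written as a lookup from objects to numbers between 0 and 1. The function name `chair_membership` and every membership number below are illustrative assumptions chosen for this sketch. They are not values given in the episode.

```python
# Hypothetical membership grades for the fuzzy set "chair".
# All numbers are illustrative assumptions, not values from the episode.
CHAIR_MEMBERSHIP = {
    "dinner table chair": 1.0,  # definitely a chair
    "desk chair": 1.0,          # definitely a chair
    "bean-bag chair": 0.7,      # "kind-of" a chair
    "rock on a path": 0.3,      # chair-like only because we can sit on it
    "pizza": 0.0,               # definitely not a chair
}


def chair_membership(obj: str) -> float:
    """Return the membership grade of obj in the fuzzy set of chairs."""
    grade = CHAIR_MEMBERSHIP[obj]
    # A membership grade must lie between 0 and 1 inclusive.
    assert 0.0 <= grade <= 1.0
    return grade


for name in CHAIR_MEMBERSHIP:
    print(f"{name}: {chair_membership(name)}")
```

The only structural requirement illustrated here is that every object receives a number between zero and one, with the endpoints reserved for “definitely not a chair” and “definitely a chair”.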

In this episode, we will focus on fuzzy measure theory. The essential idea of fuzzy measure theory is that we begin by assuming that, in fact, the world consists of crisp sets. Given a set of objects, a particular object is either a member of that set of objects or it is not a member of that set of objects. This is what we mean by a crisp set. When we use the term “set” or “collection” of objects we typically are referring to the concept of a “crisp set”. In fuzzy measure theory, we assign degrees of belief to assertions about crisp sets. So, for example, in fuzzy measure theory every object in the world either is a chair or is not a chair. However, an intelligent agent can have varying beliefs about the assertion that a particular object is a chair. So, for example, my belief that a “dinner table chair” or a “desk chair” is a chair might be equal to ONE, indicating that I am certain these are examples of chairs. But my belief that a “bean-bag chair” or a “rock” is a chair might be a number such as 8/10, indicating that I find the assertion that a “rock is a chair” to be somewhat believable. However, the belief numbers that I assign to the assertions that “pizza is a chair” or “the Earth is a chair” or “Godzilla is a chair” would be much smaller and probably equal to zero or very close to zero.

More technically, fuzzy measure theory is defined by having a collection of crisp sets and a special function called the fuzzy measure, which assigns a number between zero and one to an assertion about a particular crisp set, where zero indicates the assertion is believed to be false and one indicates the assertion is believed to be true. When the fuzzy measure assigns a number such as ½ to an assertion about a particular crisp set, this indicates that the degree of belief assigned to the assertion is equal to ½. That is, here we will use the terminology “degree of belief” to refer to the number assigned by a fuzzy measure to a particular crisp set of assertions.

There is another important characteristic of a fuzzy measure. This characteristic is called the monotonicity assumption. Suppose one has a numerical degree of belief that some set of assertions is true. Now consider an additional set of assertions. The numerical belief that the original set of assertions is true OR that the additional set of assertions is true must always be greater than or equal to the numerical belief assigned to the original set of assertions. In other words, if we believe that birds can fly with a degree of belief of 9/10 then our belief that birds can fly or fish can swim must be greater than or equal to 9/10.
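The monotonicity assumption can be checked mechanically on a small example. The sketch below assumes that each set of assertions is represented as a Python `frozenset` and that the union of two sets stands for “the first set is true OR the second set is true”. The assertion names, the helper `is_monotone`, and all belief values other than the 9/10 mentioned above are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def powerset(universe):
    """Yield every subset of universe as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(measure, universe):
    """Check that belief(A) <= belief(A OR B) for every pair of sets A, B."""
    subsets = list(powerset(universe))
    return all(measure[a] <= measure[a | b] for a in subsets for b in subsets)


universe = {"birds_fly", "fish_swim"}

# Hypothetical degrees of belief; only the 9/10 comes from the text.
g = {
    frozenset(): Fraction(0),
    frozenset({"birds_fly"}): Fraction(9, 10),
    frozenset({"fish_swim"}): Fraction(8, 10),
    frozenset({"birds_fly", "fish_swim"}): Fraction(1),
}

# Same beliefs, except the OR of the two assertions is believed LESS
# than "birds can fly" alone, which breaks monotonicity.
g_bad = dict(g)
g_bad[frozenset({"birds_fly", "fish_swim"})] = Fraction(1, 2)

print(is_monotone(g, universe))      # True
print(is_monotone(g_bad, universe))  # False
```

Exact fractions are used instead of floating-point numbers so that comparisons such as 9/10 versus 1/2 are not affected by rounding.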

To summarize, fuzzy set theory represents uncertainty by assuming true beliefs about fuzzy sets, while fuzzy measure theory represents uncertainty by assuming partially true beliefs about crisp sets.

The ability to represent and manipulate uncertainty is an important characteristic of intelligence. That is, an intelligent system should be able to deal with highly unusual situations which one might imagine couldn’t possibly occur. The assignment of “measures of belief” to sets of feature vectors allows us to explicitly represent the degree of belief we may have that a particular set of feature vectors exists. This approach is called “fuzzy measure theory” and some references to fuzzy measure theory may be found in the show notes.

The assumption that we can represent the “believability” of an event as a number seems like a rather innocuous assumption. However, it actually involves two very strong assumptions about the representation of belief. These two assumptions are called the “completeness” and “transitivity” axioms. The “completeness axiom” states that all comparisons are well-defined: given any two alternatives, either one is more believable than the other or the two are equally believable.

An example violation of the completeness axiom is illustrated in the classic 1956 science-fiction movie “Forbidden Planet” whose plot is similar to Shakespeare’s “The Tempest”. In the movie, space travelers from Earth land upon a planet and find a human scientific genius named Morbius who has shunned humanity and created an amazing robot. The robot is called Robbie the Robot and has been programmed such that it will never harm a human being. To demonstrate his programming, Morbius borrows the phaser of one of the space travelers and hands it to Robbie the Robot. Morbius commands Robbie to blast a plant on the terrace. Robbie does so and the plant is destroyed. Then Morbius asks Robbie if he understands how the phaser works and Robbie comments that it is a simple “blaster”. Morbius then asks Robbie to aim and fire the phaser at the Captain of the space travelers. Robbie begins to carry out the order but then “freezes up”. Morbius explains “he’s helpless…locked in a sub-electronic dilemma between my direct orders…and his basic inhibitions against harming rational beings.” Morbius cancels his command and Robbie the Robot becomes operational. Morbius then comments that if he had not cancelled his command, Robbie the Robot’s electronic circuitry would eventually have been destroyed.

In this episode from Forbidden Planet, Robbie is unable to comply with the request of his master Morbius. Robbie does not find the request to kill a space traveler to be more preferable, less preferable, or equally preferable to the command to not kill human beings. Robbie, instead, “freezes up” and goes into a catatonic state until his master retracts his order. Robbie’s inability to make a decision in this situation indicates that his method of making decisions violates the completeness axiom of rational decision making.

If Robbie the Robot were making decisions using numerical degrees of belief, then Robbie would assign one number to the belief that the correct action was to obey his master and kill the Space Captain and another number to the belief that the correct action was to disobey his master. Robbie would then choose the action with the greater degree of belief. If both actions had the same numerical degree of belief, then Robbie would be ambivalent regarding whether or not he killed the Space Captain.
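A short sketch shows why a numerical representation rules out this kind of “freezing up”. The function name `compare_beliefs` and the belief values passed to it are hypothetical and chosen only for illustration.

```python
def compare_beliefs(belief_obey: float, belief_disobey: float) -> str:
    """Compare two numerical degrees of belief.

    Any two numbers can be compared, so exactly one of the three
    outcomes below always applies. This is the completeness axiom.
    """
    if belief_obey > belief_disobey:
        return "obey"
    if belief_obey < belief_disobey:
        return "disobey"
    return "indifferent"


# Hypothetical degrees of belief, chosen only for illustration.
print(compare_beliefs(0.2, 0.9))  # disobey
print(compare_beliefs(0.5, 0.5))  # indifferent
```

Whatever numbers are supplied, the function always returns an answer, which is exactly what Robbie’s decision procedure in the movie fails to do.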

The representation of belief as a number between 0 and 1 entails a second very strong assumption. Every possible method for representing belief as a number will imply that if the belief in A is greater than the belief in B and the belief in B is greater than the belief in C then it must always be the case that the belief in A is greater than the belief in C. This is called the “Transitivity Axiom”. If you prefer pizza to beer and prefer beer to ice cream, then the Transitivity Axiom states that you must prefer pizza to ice cream. If you prefer pizza to beer and prefer beer to ice cream and, in addition, you prefer ice cream to pizza, then you are violating the Transitivity Axiom. You can, of course, believe anything you want! It’s a free world! However, we cannot mathematically represent YOUR belief system using a numerical representation where a number represents your degree of belief.

In other words, the Transitivity Axiom follows directly from the assumption that we represent our belief in an event by the assignment of a numerical degree of belief to that event. Suppose we assign a degree of belief of 8/10 to the event that pizza is our favorite food and a degree of belief of 7/10 to the event that beer is our favorite food in order to represent the idea that pizza is preferable to beer. Then it is mathematically impossible to pick a numerical degree of belief for ice cream that is less than the beer degree of belief of 7/10 and also greater than the pizza degree of belief of 8/10!
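The pizza, beer, and ice cream argument can be turned into a small test: given a list of strict preferences, try to assign numbers that respect all of them. The sketch below is one possible way to do this, using Python’s standard `graphlib` module (Python 3.9 or later) to detect preference cycles. The function name `numeric_beliefs` and the particular scoring scheme are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def numeric_beliefs(preferences):
    """Try to assign a number in (0, 1) to each item so that every pair
    (better, worse) satisfies belief[better] > belief[worse].

    Returns a dict of numbers, or None when the preferences contain a
    cycle and therefore violate the Transitivity Axiom.
    """
    graph = {}
    for better, worse in preferences:
        # TopologicalSorter lists predecessors first, so the "worse"
        # item is placed earlier in the ordering than the "better" one.
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
    # Items earlier in the order are less preferred: give them smaller numbers.
    return {item: (rank + 1) / (len(order) + 1) for rank, item in enumerate(order)}


consistent = [("pizza", "beer"), ("beer", "ice cream")]
cyclic = consistent + [("ice cream", "pizza")]

print(numeric_beliefs(consistent))  # pizza gets the largest number
print(numeric_beliefs(cyclic))      # None
```

For the consistent preferences a numerical representation exists. Once “ice cream is preferred to pizza” is added, no assignment of numbers can satisfy all three preferences at once.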

Every Tuesday, I have breakfast at a restaurant with the Mad Hatter, who is a good friend of mine but has his own crazy way of reasoning. In particular, my friend the Mad Hatter tends to make decisions which violate both the completeness axiom and the transitivity axiom!!! Listen to the following conversation carefully and see if you can spot violations of the transitivity and completeness axioms as well as a violation of the monotonicity assumption.

<Transition Music>

RMG: Isn’t this place great! I love having breakfast at the Alice in Wonderland Coffee Shop!!

MH: Me too! This is fun!! Yippee!

RMG: And…here comes the coffee!!!!!

RMG: The waiter just brought you a cup of coffee? Don’t you like it?

MH: No. I want the cup of coffee the waiter brought you!

RMG: But it’s exactly the same! I watched the waiter pour coffee from the coffee pot into my cup and then he poured the coffee into your cup and we both have identical cups.

MH: I don’t care! I definitely do not want this cup of coffee. I want to switch with you. I definitely want to have your cup of coffee!! I definitely do not want my cup of coffee!

RMG: Ok! We can switch! The waiter will be coming back soon! What are you going to order?

MH: Well…I definitely prefer blueberry pancakes to eggs and prefer eggs to waffles. On the other hand, I definitely do not prefer blueberry pancakes to waffles.

RMG: Oh…I see this will take a while. Maybe I can help you a little with deciding what to order because I’m getting pretty hungry.

MH: Ok!

RMG: What do you think of apple pancakes? Do you like apple pancakes?

MH: I really do not like apple pancakes but I do love any kind of pancake!!

RMG: I see…let me try this again…

RMG: Do you prefer apple pancakes to the Mexican omelet? Or do you like apple pancakes just as much as a Mexican omelet?

MH: I really don’t know if I like apple pancakes more or less than a Mexican omelet. I also don’t know if I like apple pancakes just as much as a Mexican omelet.

RMG: I’m just going to go ahead and order for you like I do every week. I’m sure you will like what I order!

MH: Ok!

<Transition Music>

To summarize, the assumptions of completeness and transitivity are implied by the assumption that we have chosen a numerical representation of belief. The simple assumption that we can represent the belief in an event by assigning a number to that event which has the property that more believable events are assigned larger numbers, automatically implies that the axioms of completeness and transitivity must hold.

We will now discuss a very special type of fuzzy measure which I will refer to as an “additive” fuzzy measure. The additive fuzzy measure incorporates two additional rules for combining and manipulating beliefs. By incorporating these two additional rules, the additive fuzzy measure can be shown to result in a theory of belief representation that is consistent with the laws of deductive logic. These two additional rules will be called the rules of “additivity” and “conditioning”.

The rule of additivity may be illustrated as follows. Suppose that we have the rules

“IF something is a bird, THEN I believe with belief = 1 that something can fly.”

“IF something is a bird, THEN I believe with belief = 1 that something cannot fly.”

These two rules taken together are inconsistent with the laws of logic. Again, one is entitled to believe anything one wishes to believe but these beliefs are not logically consistent. The problem is that we have not made the assumption that the belief that a particular set of assertions is true is related to the belief that the same set of assertions is false.

Ideally, we would like to constrain our method of representing numerical beliefs such that the belief that something cannot fly will tend to decrease as we increase the belief that something can fly. And, in particular, it is convenient to define the belief that something CANNOT FLY as exactly equal to ONE minus our belief that something CAN FLY. Thus, in the special case where we believe with certainty that something CAN FLY, we MUST believe with certainty that the assertion that something CANNOT FLY is false. This is called the “Additivity Rule”.

An alternative formulation of the “Additivity Rule” is that the belief that something can fly OR something cannot fly must be exactly equal to one. This is where the “Additivity Rule” gets its name. The degree of belief that a set of assertions is EITHER true or false is constrained to be exactly equal to one. In other words, we always believe with certainty that a particular set of assertions is either true or false. Although this seems obvious, this constraint remains an assumption regarding how beliefs are represented and manipulated. And this assumption is called the Additivity Axiom.

Note that the monotonicity assumption of a fuzzy measure always holds if the additivity assumption holds. The additivity assumption is a stronger assumption than the monotonicity assumption.
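The Additivity Rule is simple enough to sketch in a few lines. The function names `belief_in_negation` and `is_consistent_pair` and the 9/10 belief value are illustrative assumptions. Exact fractions are used so that the check “the two beliefs sum to one” is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def belief_in_negation(belief):
    """Additivity Rule: belief(NOT A) = 1 - belief(A)."""
    if not 0 <= belief <= 1:
        raise ValueError("a degree of belief must lie between 0 and 1")
    return 1 - belief


def is_consistent_pair(belief_true, belief_false):
    """Return True when belief(A) and belief(NOT A) sum to exactly one."""
    return belief_true + belief_false == 1


can_fly = Fraction(9, 10)  # hypothetical degree of belief
cannot_fly = belief_in_negation(can_fly)

print(cannot_fly)                               # 1/10
print(is_consistent_pair(can_fly, cannot_fly))  # True

# The two bird rules above: belief = 1 that it can fly AND belief = 1
# that it cannot fly. Together they break the Additivity Rule.
print(is_consistent_pair(Fraction(1), Fraction(1)))  # False
```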

Let’s now discuss the “conditioning assumption” associated with an additive fuzzy measure. Suppose we have a degree of belief that Tweety is a bird. We also have a rule that IF something is a bird, THEN we have a degree of belief that the something can fly. We would like a rule for combining beliefs so that we can explicitly calculate the degree of belief that Tweety is a BIRD and Tweety can fly from the degree of belief in the rule “IF something is a bird, THEN that something can fly” and the degree of belief that “Tweety is a bird”. Consider the special case where the degree of belief that Tweety is a BIRD is equal to 1 and the degree of belief in the rule “IF something is a BIRD, THEN that something can fly” is also equal to 1. Multiplying these two degrees of belief together provides us with exactly the right answer, since we can conclude “Tweety is a bird and Tweety can fly” with certainty only if we are certain that Tweety is a BIRD and certain that IF something is a BIRD, THEN that something can fly.

More generally, the conditioning assumption can be expressed as saying that the belief that an event E occurs multiplied by the belief that an event F occurs given that the event E occurs must be equal to the belief that both event E and event F occur.
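In symbols, the conditioning assumption can be written as follows. The notation is an assumption introduced here for illustration: B(E) stands for the degree of belief that event E occurs, and B(F | E) for the degree of belief that F occurs given that E occurs.

```latex
B(E \text{ and } F) \;=\; B(E)\, B(F \mid E)
```

In the special case discussed above, B(E) = 1 and B(F | E) = 1, so the product is 1 and we are certain that both E and F occur.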

The above assumptions can be summarized as follows. First, it is assumed that degrees of belief can be represented as numbers between 0 and 1 and that beliefs are assigned to EVENTS which can be CONCEPTS or RULES. This first assumption implies that comparisons based upon degrees of belief satisfy the axioms of completeness and transitivity. Second, it is assumed that beliefs are assigned to events such that the belief that an event is either true or false is always equal to one. In other words, we are always certain that a particular event is either true or false. And third, it is assumed that beliefs are assigned to events such that if we are CERTAIN an IF-THEN rule always holds and we are also certain the IF portion of that IF-THEN rule holds, then we must be CERTAIN both the IF part and the THEN part of the IF-THEN rule are true.

It can be shown that these characteristics of an additive fuzzy measure are mathematically equivalent to the laws of probability theory. In other words, instead of saying that Tweety is a BIRD with degree of belief 8/10, we will say the probability that Tweety is a BIRD is equal to 8/10. Instead of saying “IF something is a BIRD, THEN the belief is 9/10 that something CAN FLY”, we will say: The probability that something CAN FLY is equal to 9/10 given that something is a BIRD.
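As a worked example, the two numbers just mentioned can be combined using the conditioning and additivity rules. The sketch assumes that the general rule about birds is applied to Tweety in particular. The variable names are hypothetical, and the final line relies on the standard probability fact that the additivity rule also applies to beliefs conditioned on the same event.

```python
from fractions import Fraction

p_bird = Fraction(8, 10)            # probability that Tweety is a bird
p_fly_given_bird = Fraction(9, 10)  # probability of flying, given a bird

# Conditioning rule: P(bird and can fly) = P(bird) * P(can fly | bird)
p_bird_and_fly = p_bird * p_fly_given_bird

# Additivity rule applied to the conditional belief:
# P(cannot fly | bird) = 1 - P(can fly | bird)
p_bird_and_not_fly = p_bird * (1 - p_fly_given_bird)

print(p_bird_and_fly)      # 18/25
print(p_bird_and_not_fly)  # 2/25

# The two cases together recover the original belief that Tweety is a bird.
print(p_bird_and_fly + p_bird_and_not_fly == p_bird)  # True
```

So a belief of 8/10 that Tweety is a bird, combined with a belief of 9/10 in the rule, gives a belief of 72/100 (that is, 18/25) that Tweety is a bird and can fly.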

It is important to realize that this is a more general concept of probability than that which is typically taught in introductory classes. The probability of an event is not the percentage of times that event occurs but rather the belief that the event occurs. However, under certain conditions, it can be shown that the percentage of times that an event occurs will be a good approximation to the belief that the event occurs. This observation is very important for understanding how the learning process in many machine learning algorithms works and this will be discussed in greater detail in a future episode.
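A small simulation gives a feel for the claim that observed percentages can approximate beliefs. The sketch below makes an assumption the episode does not spell out: the observations are independent and each one occurs with the same probability. The function name `observed_frequency`, the probability of 0.9, and the trial counts are all chosen for illustration.

```python
import random


def observed_frequency(probability, trials, seed=0):
    """Simulate independent yes/no observations of an event and return
    the fraction of trials in which the event occurred."""
    rng = random.Random(seed)  # fixed seed so that a run is repeatable
    hits = sum(rng.random() < probability for _ in range(trials))
    return hits / trials


belief_can_fly = 0.9  # hypothetical degree of belief
for n in (10, 1_000, 100_000):
    print(n, observed_frequency(belief_can_fly, n))
```

With only a handful of observations the observed fraction can be far from 0.9, but as the number of observations grows it tends to settle close to the underlying value.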

Finally, the importance of the concepts introduced in this episode cannot be emphasized enough. Using probabilistic reasoning methods as well as other methods of reasoning about uncertainty which have not been discussed here, it is possible to represent and manipulate assertions and rules about the world which are neither completely true nor completely false. This is a crucial component of intelligence.

Further Reading:

The discussion presented here is based on the theorem by Cox (1946), which shows how probability theory is equivalent to a type of fuzzy measure theory that is consistent with deductive logic. The theorem identifies assumptions which ensure that probability theory is the only special case of fuzzy measure theory which is consistent with deductive logic.

Cox, R. T. (1946). Probability, frequency, and reasonable expectation. American Journal of Physics, 14, 1-13. http://jimbeck.caltech.edu/summerlectures/references/ProbabilityFrequencyReasonableExpectation.pdf

Cox’s (1946) Theorem. Wikipedia http://en.wikipedia.org/wiki/Cox’s_theorem

Savage, L. (1972). The Foundations of Statistics. Dover, New York. http://www.amazon.com/The-Foundations-Statistics-Leonard-Savage/dp/0486623491. An alternative classic derivation of the axioms of probability theory and expectation. This is a more advanced text.

Wikipedia Fuzzy Measure entry (http://en.wikipedia.org/wiki/Fuzzy_measure)

McNeil, D. and Freiberger, P. (1994). Fuzzy Logic: The Revolutionary Computer Technology. Simon and Schuster. http://www.amazon.com/Fuzzy-Logic-Revolutionary-Computer-Technology/dp/0671875353. This is a very readable introduction to fuzzy logic at the level of this blog.

Wang, Z. and Klir, G. (2010). Fuzzy measure theory. www.amazon.com/Fuzzy-Measure-Theory-Zhenyuan-Wang/dp/1441932259/. This is an advanced text and requires a strong math background but is a useful introduction to the field for advanced undergraduate students or graduate students with background in advanced mathematics such as real analysis.

Copyright Notice:

Copyright © 2014 by Richard M. Golden. All rights reserved.

 
