Tutorial on Bayesian Networks with Netica

A. Introduction to Bayes Nets

1. What is a Bayes net?

A Bayes net is a model. It reflects the states of some part of a world that is being modeled and it describes how those states are related by probabilities. The model might be of your house, or your car, your body, your community, an ecosystem, a stock market, etc. Absolutely anything can be modeled by a Bayes net. All the possible states of the model represent all the possible worlds that can exist, that is, all the possible ways that the parts or states can be configured. The car engine can be running normally or giving trouble. Its tires can be inflated or flat. Your body can be sick or healthy, and so on.

So where do the probabilities come in? Well, typically some states will tend to occur more frequently when other states are present. Thus, if you are sick, the chances of a runny nose are higher. If it is cloudy, the chances of rain are higher, and so on.

Here is a simple Bayes net that illustrates these concepts. In this simple world, let us say the weather can have three states: sunny, cloudy, or rainy, also that the grass can be wet or dry, and that the sprinkler can be on or off. Now there are some causal links in this world. If it is rainy, then it will make the grass wet directly. But if it is sunny for a long time, that too can make the grass wet, indirectly, by causing us to turn on the sprinkler.

When actual probabilities are entered into this net that reflect the reality of real weather, lawn, and sprinkler-use-behavior, such a net can be made to answer a number of useful questions, like, "if the lawn is wet, what are the chances it was caused by rain or by the sprinkler", and "if the chance of rain increases, how does that affect my having to budget time for watering the lawn".

Here is another simple Bayes net called Asia. It is an example which is popular for introducing Bayes nets and is from Lauritzen&Spiegelhalter88. Note, it is for example purposes only, and should not be used for real decision making.

It is a simplified version of a network that could be used to diagnose patients arriving at a clinic. Each node in the network corresponds to some condition of the patient, for example, "Visit to Asia" indicates whether the patient recently visited Asia. The arrows (also called links) between any two nodes indicate that there are probability relationships that are known to exist between the states of those two nodes. Thus, smoking increases the chances of getting lung cancer and of getting bronchitis. Both lung cancer and bronchitis increase the chances of getting dyspnea (shortness of breath). Both lung cancer and tuberculosis, but not usually bronchitis, can cause an abnormal lung x-ray. And so on.

The direction of the link arrows roughly corresponds to "causality". That is, the nodes higher up in the diagram tend to influence those below, rather than (or at least more so than) the other way around.

In a Bayes net, the links may form loops, but they may not form cycles. This is not an expressive limitation; it does not limit the modeling power of these nets. It only means we must be more careful in building our nets. In the left diagram below, there are numerous loops. These are fine. In the right diagram, the addition of the link from D to B creates a cycle, which is not permitted.


A valid Bayes net		Not a Bayes net

The key advantage of not allowing cycles is that it makes possible very fast update algorithms, since there is no way for probabilistic influence to "cycle around" indefinitely.
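The difference between a loop and a cycle can be checked mechanically. The following is a minimal Python sketch that tests whether a set of directed links contains a cycle. The node names A to D and the two link lists are hypothetical stand-ins for the diagrams, not a transcription of them, and `has_cycle` is an illustrative helper, not part of Netica.

```python
from collections import defaultdict

def has_cycle(links):
    """Return True if the directed links (parent, child) contain a cycle."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in links:
        children[parent].append(child)
        nodes.update((parent, child))

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in children[node]:
            if color[nxt] == GRAY:              # arrow leads back onto the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

# Hypothetical links: two routes lead from A to D (a loop), but no route
# follows the arrows back to where it started, so there is no cycle.
valid_links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
# Adding D -> B lets us travel B -> D -> B along the arrows: a cycle.
invalid_links = valid_links + [("D", "B")]

print(has_cycle(valid_links))    # False
print(has_cycle(invalid_links))  # True
```

The check is a depth-first search that reports a cycle only when an arrow points back to a node on the path currently being explored, which is why multiple routes between two nodes are accepted.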

To diagnose a patient, values could be entered for some of the nodes when they are known. This would allow us to re-calculate the probabilities for all the other nodes. Thus if we take a chest x-ray and the x-ray is abnormal, then the chances of the patient having TB or lung-cancer rise. If we further learn that our patient visited Asia, then the chances that they have tuberculosis would rise further, and of lung-cancer would drop (since the X-ray is now better explained by the presence of TB than of lung-cancer). We will see how this is done in a later section.

Summary

In this section we learned that a Bayesian network is a model, one that represents the possible states of a world. We also learned that a Bayes net possesses probability relationships between some of the states of the world.

1.1. Why are Bayes nets useful?

1.1.1 Modeling reality

A model is generally useful if it helps us to better understand the world we are modeling, and if it allows us to make useful predictions about how the world will behave. It is often easier to experiment with the model than with reality.

In the past, when scientists, engineers, and economists wanted to build probabilistic models of worlds, so that they could attempt to predict what was likely to happen when something else happened, they would typically try to represent what is called the "joint distribution". This is a table of all the probabilities of all the possible combinations of states in that world model. Such a table can become huge, since it ends up storing one probability value for every combination of states. The number of combinations is the product of the numbers of states of all the nodes. In the Weather model above, this would be 3 x 2 x 2 = 12 probabilities. In the Asia model it would be 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 = 2⁸ = 256 probabilities. For models of any reasonable complexity, the joint distribution can end up with millions, trillions, or unbelievably many entries. Clearly a better way is needed.
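The growth of the joint table is easy to check numerically. The short Python sketch below multiplies the state counts given in the text. The node names are labels chosen here for illustration, and the 40-node net at the end is hypothetical.

```python
from math import prod

# Number of states per node, as described in the text (node names are illustrative).
weather_net = {"Weather": 3, "Sprinkler": 2, "Grass": 2}
asia_net = {name: 2 for name in [
    "VisitAsia", "Smoking", "Tuberculosis", "LungCancer",
    "Bronchitis", "TbOrCancer", "XRay", "Dyspnea"]}

def joint_table_size(states_per_node):
    """One probability per combination of states: the product of the state counts."""
    return prod(states_per_node.values())

print(joint_table_size(weather_net))  # 12
print(joint_table_size(asia_net))     # 256

# A hypothetical net of 40 two-state nodes already needs 2**40 entries.
print(joint_table_size({f"N{i}": 2 for i in range(40)}))  # 1099511627776
```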

Bayesian nets are one such way. Because a Bayes net only relates nodes that are probabilistically related by some sort of causal dependency, an enormous saving of computation can result. There is no need to store all possible configurations of states, all possible worlds, if you will. All that needs to be stored and worked with are the possible combinations of states within each set of related parent and child nodes (families of nodes, if you will). This makes for a great saving of table space and computation. (Of course, some models are still too large for today's Bayes net algorithms. But new algorithms are being developed and breakthroughs are promising. This is a hotly researched area of modern computer science.)
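To give a feel for the saving, here is a rough Python sketch that counts table entries per family for the Asia net and compares the total with the full joint table. The parent sets below are an assumption: they follow the commonly published structure of the Asia example, including a combined "TB or cancer" node, and the node names are chosen here for illustration.

```python
from math import prod

# Assumed parent sets for the eight two-state Asia nodes.
parents = {
    "VisitAsia": [],
    "Smoking": [],
    "Tuberculosis": ["VisitAsia"],
    "LungCancer": ["Smoking"],
    "Bronchitis": ["Smoking"],
    "TbOrCancer": ["Tuberculosis", "LungCancer"],
    "XRay": ["TbOrCancer"],
    "Dyspnea": ["TbOrCancer", "Bronchitis"],
}
n_states = {name: 2 for name in parents}

def family_table_size(node):
    """Entries in one node's conditional table: its states times its parents' states."""
    return n_states[node] * prod(n_states[p] for p in parents[node])

factored = sum(family_table_size(node) for node in parents)
joint = prod(n_states.values())
print(factored, joint)  # 36 256
```

The gap widens quickly as nets grow, because the joint table grows with the product of all state counts while each family table depends only on a node and its parents.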

A second reason Bayesian nets are proving so useful is that they are so adaptable. You can start them off small, with limited knowledge about a domain, and grow them as you acquire new knowledge. Furthermore, when you go to apply them, you don't need complete knowledge about the instance of the world you are applying it to. You can use as much knowledge as is available and the net will do as good a job as is possible with the available knowledge.

To illustrate this, let us return to our Asia net that we saw in section 1 above. Let us suppose that you are a newly graduated medical doctor in Los Angeles, a specialist in lung diseases, and you decide to set up a chest clinic, one that handles serious lung-related disease. From your textbook studies you know something about the rates of lung cancer, tuberculosis, and bronchitis, and their causes and symptoms, so you can set up a basic Bayes net with some of that theoretical knowledge. For example, let's say according to your textbooks:

30% of the US population smokes.
Lung cancer can be found in about 70 people per 100,000.
TB occurs in about 10 people per 100,000.
Bronchitis can be found in about 800 people per 100,000.
Dyspnea can be found in about 10% of people, but most of that is due to asthma and causes other than TB, lung cancer, or bronchitis.

Armed with these statistics you could set up the following Bayes net:

Unfortunately, this net is not very helpful to you, because it really doesn't reflect the population of people that seek help from your clinic. Most of them have been referred by their family physicians, and so the incidence of lung disease amongst that population is much higher, you would imagine.

So you really should not use the above Bayes net in your practice. You need more data.

As your clinic grows and you handle hundreds of patient cases, you learn that while the text books may have described the North American situation, the reality of your clinic and its population of patients is very different. This is what your data collection efforts reveal:

50% of your patients smoke.
1% have TB.
5.5% have lung cancer.
45% have some form of mild or chronic bronchitis.

You enter these new figures into your net, and now you have a practical Bayes net, one that really describes the kind of patient you typically deal with.

So, let us see how we would use this net in our daily medical practice.

The first thing we should note is that the above net describes a new patient, one who has just been referred to us, and for whom we have no knowledge whatsoever, other than that they are from our target population. As we acquire knowledge specific to each particular patient, the probabilities in the net will automatically adjust. This is the great beauty and power of Bayesian inference in action. And the great strength of the Bayes net approach is that the probabilities that result at each stage of knowledge buildup are mathematically and scientifically sound. In other words, given whatever knowledge we have about our patient, then based on the best mathematical and statistical knowledge to date, the net will tell us what we can legitimately conclude. This is a very powerful tool, indeed. Take a moment to think on it. You as a doctor are not just relying on hunches, or an intuitive sense of the likelihood of illness, as you may have in the past, but, rather, on a scientifically and provably accurate estimate of the likelihood of illness, one that gets more and more accurate as you gain knowledge about the particular patient, or about the particular population that the patient comes from.

So, let us see how adding knowledge about a particular patient adjusts the probabilities. Let us say a woman walks in, a new patient, and we begin talking to her. She tells us that she is often short of breath (dyspnea). So, we enter that finding into our net. With Netica, as we shall see, this is as simple as pointing your mouse at a node and clicking on it once, whereupon a list of available states pops up, and you then click on the correct item in the list. After doing that, this is what the net looks like. Notice how the Dyspnea box is grayed, indicating that we have evidence for it being in one of its states. In this case, because our patient appears trustworthy, we say we are 100% certain that our patient has dyspnea. It is easy with Netica to enter an uncertain finding (also called a likelihood finding), say of 90% Present, but let's keep things simple for now.

Observe how, with this new finding that our patient has dyspnea, the probabilities for all three illnesses have increased. Why is this? Well, since all those illnesses have dyspnea as a symptom, and our patient is indeed exhibiting this symptom, it only makes sense that our belief in the possible presence of those illnesses should increase. Basically, the presence of the symptom has increased our belief that she might be seriously ill.

Let's look at those inferences more closely.

The most significant jump is for Bronchitis, from 45% to 83.4%. Why such a large jump? Well, bronchitis is far more common than cancer or TB. So, once we have evidence for serious lung illness, it becomes our most likely candidate diagnosis.
The chances that our patient is a smoker have now increased substantially, from 50% to 63.4%.
The chances that she recently visited Asia have increased very slightly, from 1% to 1.03%, which is insignificant.
The chances of our getting an abnormal X-ray from our patient have also gone up marginally, from 11% to 16%.

If you think about this expansion of our knowledge, it is truly quite helpful. We have only entered one finding, the presence of Dyspnea, and this knowledge has "propagated" or spread its influence around the net, accurately updating all the other possible beliefs. Some of our beliefs are increased substantially, others hardly at all. And the beauty of it is that the amounts are precisely quantified.
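The propagation described above can be reproduced by brute force on a net this small. The Python sketch below sums the joint probability over every possible world and compares beliefs before and after the dyspnea finding. The conditional probabilities are an assumption: they are the values commonly published for the Asia example and are not listed in this text, although they are consistent with the clinic figures quoted above (50% smokers, about 1% TB, 5.5% lung cancer, 45% bronchitis). This enumeration is only an illustration of the arithmetic and should not be read as a description of the algorithm Netica uses.

```python
from itertools import product

# Assumed conditional probabilities (commonly published Asia values; not stated in the text).
P_ASIA = 0.01
P_SMOKE = 0.50
P_TB = {True: 0.05, False: 0.01}        # P(TB | visited Asia?)
P_CANCER = {True: 0.10, False: 0.01}    # P(lung cancer | smoker?)
P_BRONC = {True: 0.60, False: 0.30}     # P(bronchitis | smoker?)
P_XRAY = {True: 0.98, False: 0.05}      # P(abnormal X-ray | TB or cancer?)
P_DYSP = {(True, True): 0.90, (True, False): 0.70,    # keyed by (TB or cancer, bronchitis)
          (False, True): 0.80, (False, False): 0.10}

NAMES = ["asia", "smoke", "tb", "cancer", "bronc", "xray", "dysp"]

def bern(p, value):
    """Probability of a two-state node being in the given state."""
    return p if value else 1.0 - p

def joint(w):
    """Joint probability of one complete world w (a dict of node -> bool)."""
    tb_or_ca = w["tb"] or w["cancer"]   # deterministic "TB or cancer" node
    return (bern(P_ASIA, w["asia"]) * bern(P_SMOKE, w["smoke"])
            * bern(P_TB[w["asia"]], w["tb"])
            * bern(P_CANCER[w["smoke"]], w["cancer"])
            * bern(P_BRONC[w["smoke"]], w["bronc"])
            * bern(P_XRAY[tb_or_ca], w["xray"])
            * bern(P_DYSP[(tb_or_ca, w["bronc"])], w["dysp"]))

def prob(query, evidence=None):
    """P(query is True | evidence), summing the joint over all consistent worlds."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        w = dict(zip(NAMES, values))
        if any(w[k] != v for k, v in evidence.items()):
            continue
        p = joint(w)
        den += p
        if w[query]:
            num += p
    return num / den

# Belief in each node before and after entering the dyspnea finding.
for node in ["bronc", "smoke", "asia", "xray"]:
    print(node, round(prob(node), 4), "->", round(prob(node, {"dysp": True}), 4))
```

Under these assumed values the printed beliefs should agree with the percentages quoted in the list above, to within rounding. Further findings can be tried by adding entries to the evidence dictionary, for example `prob("tb", {"dysp": True, "asia": True})`.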

We still do not know what precisely is ailing our patient. Our current best belief is that she suffers from Bronchitis (probability of Present=83.4%). However, we would like to increase our chances of a correct diagnosis. If we stop here and diagnose her with Bronchitis and she really has Cancer, we would be a poor doctor indeed. We really need more information.

So, being thorough, we run through our standard check-list of questions. We ask her if she has been to Asia recently. Surprisingly, she answers "yes". Now, let us see how this knowledge affects the net.

Suddenly, the chances of tuberculosis have increased substantially, from 2% to 9%. Note, interestingly, that the chances of lung cancer, bronchitis, or of our patient being a smoker have all decreased. Why is this? Well, this is because the dyspnea is now more strongly explained by tuberculosis than before (although bronchitis still remains the best candidate diagnosis). And because cancer and bronchitis are now less probable, so is smoking. This phenomenon is called "explaining away" in Bayes net circles. It says that when you have competing possible causes for some event, and the chances of one of those causes increase, the chances of the other causes must decline since they are being "explained away" by the first explanation.
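Explaining away is easiest to see in the smallest net that can show it: two independent causes, A and B, sharing one effect, E. The Python sketch below uses hypothetical numbers chosen only for illustration; they are not taken from the Asia net.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_A, P_B = 0.10, 0.30                                # priors of two independent causes
P_E = {(True, True): 0.95, (True, False): 0.80,
       (False, True): 0.80, (False, False): 0.05}    # P(effect | A, B)

def posterior_b(evidence):
    """P(B is True | evidence), where evidence is a dict over 'a' and 'e'."""
    num = den = 0.0
    for a, b, e in product([True, False], repeat=3):
        world = {"a": a, "b": b, "e": e}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = ((P_A if a else 1 - P_A)
             * (P_B if b else 1 - P_B)
             * (P_E[(a, b)] if e else 1 - P_E[(a, b)]))
        den += p
        if b:
            num += p
    return num / den

print(round(posterior_b({}), 3))                       # belief in B with no findings
print(round(posterior_b({"e": True}), 3))              # effect observed: B becomes more likely
print(round(posterior_b({"e": True, "a": True}), 3))   # A also confirmed: B is explained away
```

With these numbers the belief in B rises once the effect is seen, then falls back most of the way once cause A is confirmed, even though A and B were independent to begin with.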

To continue with our example, suppose we ask more questions and find out that our patient is indeed a smoker. Here is the updated net.

Note that our current best hypothesis still remains that the patient is suffering from Bronchitis, and not TB or lung cancer. But to be sure, we order a diagnostic X-ray. Let us say that the X-ray turns out normal. The result is:

Note how this more strongly confirms Bronchitis and disconfirms TB or lung cancer.

But suppose the X-ray were abnormal. The result is:

Note the big difference. TB or Lung Cancer has shot up enormously in probability. Bronchitis is still the most probable of the three separate illnesses, but it is less than the combination hypothesis of TB or Lung Cancer. So, we would then decide to perform further tests, order blood tests, lung tissue biopsies, and so forth. Our current Bayes net does not cover those tests, but it would be easy to extend it by simply adding extra nodes as we acquire new statistics for those diagnostic procedures. And we do not need to throw away any part of the previous net. This is another powerful feature of Bayes nets. They are easily extended (or reduced, simplified) to suit your changing needs and your changing knowledge.

Summary

In this section we learned that a Bayesian network is a mathematically rigorous way to model a world, one which is flexible and adaptable to whatever degree of knowledge you have, and one which is computationally efficient.

1.1.2 Assisting Decision Making

It is one thing to predict reality as accurately as is possible, but a natural and extremely useful extension of this is simply to weigh the states of your model with degrees of "goodness" or "badness". That is, if some states of the world lead to pleasure, while others to pain, you simply want to find out how you can change the world to maximize the pleasure and minimize the pain. Of course, you can use other terms for value, other than pleasure and pain, such as money, leisure-time, increased survival, and so forth. There is a science of decision making that mixes probability with measurements of value. It is called Decision Theory or Utility Theory. Bayes nets are easily extended to computing utility, given the degree of knowledge we have on a situation, and so they have become very popular in business and civic decision making as much as in scientific and economic modeling. We will see several examples of this later on in the tutorial when we use Netica for decision making.
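The core calculation of decision theory is small enough to sketch in a few lines of Python: weight the value of each outcome by its probability, and pick the action with the highest expected value. The beliefs, actions, and utility numbers below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical beliefs and utilities, for illustration only.
belief = {"rain": 0.3, "no_rain": 0.7}
utility = {
    "take_umbrella":  {"rain": 70, "no_rain": 80},
    "leave_umbrella": {"rain": 0,  "no_rain": 100},
}

def expected_utility(action):
    """Sum of each outcome's utility weighted by our belief in that outcome."""
    return sum(belief[state] * utility[action][state] for state in belief)

best = max(utility, key=expected_utility)
for action in utility:
    print(action, expected_utility(action))
print("best:", best)
```

In a decision net the beliefs would come from the net itself, so the best action changes automatically as findings are entered.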

Note on terminology. Bayes nets that are used strictly for modeling reality are often called "belief nets", while those that also mix in an element of value and decision making are called "decision nets". Of course, you can use a belief net to make decisions, but in a true decision net, the correct decision amongst the given options is computed for you, on quantitative grounds. The net itself gives you the optimal decisions. If you choose to act differently than the net suggests, it must be because you have extra information not represented in the net, or else you are in some sense not deciding rationally. This of course assumes that a rational person will want to maximize pleasure, money, or whatever measure of value you choose, which is a question that is debated amongst philosophers. But we won't enter that arena here.

One interesting application of decision nets is in robotic controllers. The robot not only computes the best action using a Bayes net, but actually performs that action.

1.2. Why are Bayes nets called Bayes nets?

Bayes nets are networks of relationships, hence nets. And they are named "Bayes" after Reverend Thomas Bayes, 1702-1761, a British theologian and mathematician who wrote down a basic law of probability which is now called Bayes rule.

Bayes Rule:

For any two events, A and B,
p(B|A) = p(A|B) x p(B) / p(A)

where you read 'p(A)' as "the probability of A", and
'p(A|B)' as "the probability of A given that B has occurred".

It turns out that Bayes' rule is very powerful and is the basic computation rule that allows us to update all the probabilities in a net, when any one piece of information changes. Here is an example of it. Suppose you live in London, England, and you notice that during the winter, it rains 50% of the time and that it is cloudy 80% of the time (sometimes it is cloudy without rain). You know, of course, that 100% of the time, if it is raining, then it is also cloudy. What do you think the chances are of rain, given that it is just cloudy? Well, Bayes rule allows you to compute this. Bayes rule says that
p(R|C) = p(R)p(C|R)/p(C) = 0.5 x 1.0 / 0.8 = 0.625 = 5/8.
So, 5/8 of the time, in London during winter, if it is cloudy, then it is rainy.
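The same calculation can be written as a short Python sketch. The function name `bayes` and the variable names are chosen here for illustration, and exact fractions are used so that the result comes out as 5/8 rather than a rounded decimal.

```python
from fractions import Fraction

def bayes(p_b, p_a_given_b, p_a):
    """Bayes' rule: p(B|A) = p(A|B) * p(B) / p(A)."""
    return p_a_given_b * p_b / p_a

p_rain = Fraction(1, 2)              # it rains 50% of the time
p_cloudy = Fraction(4, 5)            # it is cloudy 80% of the time
p_cloudy_given_rain = Fraction(1)    # whenever it rains, it is also cloudy

p_rain_given_cloudy = bayes(p_rain, p_cloudy_given_rain, p_cloudy)
print(p_rain_given_cloudy, float(p_rain_given_cloudy))  # 5/8 0.625
```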

Bayes rule can be extended to multiple variables with multiple states. Those equations are far more complex to write out, and tough to compute by hand, but easy for computers to solve, which is one reason why programs like Netica are so valuable.

1.3 What are Bayes nets used for?

Bayes nets may be used in any walk of life where modeling an uncertain reality is involved (and hence probabilities are present), and, in the case of decision nets, wherever it is helpful to make intelligent, justifiable, quantifiable decisions that will maximize the chances of a desirable outcome. In short, Bayes nets are useful everywhere.

1.3.1 Diagnosis

The Asia net is a typical diagnostic net.

The two top nodes are for predispositions which influence the likelihood of the diseases. Those diseases appear in the row below them. At the bottom are symptoms of the diseases. To a large degree, the links of the network correspond to causation. This is a common structure for diagnostic networks: predisposition nodes at the top, with links to nodes representing internal conditions and failure states, which in turn have links to nodes for observables. Often there are many layers of nodes representing internal conditions, with links between them representing their complex inter-relationships.

The diagnosis can be medical or mechanical. Many industrial applications of Bayes nets are for determining the chance of component failure. The nuclear industry, the airline industry, the construction industry, anywhere where lives or money are at stake, are all natural domains for applying Bayes nets.

1.3.2 Prediction

Since Bayes nets naturally represent causal chains, that is, the links may be cause-effect relationships between parent and child nodes, you can supply evidence of past events, and then run the Bayes net to see what the most likely future outcomes will be. Bayes nets are used for weather forecasting, stock market prediction, ecological modeling, etc., for making such predictions. Their strength is that they are very robust to missing information, and will make the best possible prediction with whatever information is present.

1.3.3 Financial risk management, portfolio allocation, insurance

Bank officers, insurance underwriters, investment advisors, all need to make difficult decisions, where often not all the factors influencing a case are known. With Bayes nets, they are still able to make intelligent, quantifiable, and justifiable decisions, with whatever information is available.

1.3.4 Modeling ecosystems

Bayes nets are being heavily used in modeling ecosystems. Often fish and wildlife experts are faced with the difficult task of suggesting land use policy. They must balance the interests of industry, community, and nature and they need scientifically sound and justifiable arguments to back-up their analyses and decisions. With Bayes nets they can model an ecosystem and derive sound probabilities on whether certain species are at risk by certain industrial developments. For example, below is a Bayes net that depicts how the population of Townsend's big-eared bats is linked to various habitats being available, their temperature during Breeding times or Hibernation times, and so forth. It is taken from Bruce G. Marcot, of the US Forest Service, in his paper: Using Bayesian Belief Networks to Evaluate Fish and Wildlife Population Viability Under Land Management Alternatives from an Environmental Impact Statement. An on-line version can be found here.

1.3.5 Sensor fusion

Sensor fusion refers to the class of problems where data from various sources must be integrated to arrive at an interpretation of a situation. For instance, data from various cameras taken from various angles and resolutions might need to be integrated to determine what is in a scene. Or industrial sensors might each report on the state of a machine, and only by joining all their readings together does one get the complete picture. Often in sensor fusion problems, one must deal with different temporal or spatial resolutions, and one must solve the "correspondence problem", namely deciding which events from one sensor correspond to the same events as reported in the other sensors. Because Bayes nets are robust to missing data, they combine information well. Thus whereas each sensor has only a limited chance of giving a correct interpretation, the combination of all the sensors' chances typically increases the likelihood of a valid interpretation. In the area of computer and robotic vision, Bayes nets have been widely used. Likewise for sonar image understanding.
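A minimal Python sketch of the idea follows, under a simplifying assumption that is not made in the text: the sensors are treated as independent of one another once the true state of the machine is known. The sensor accuracies and the prior are hypothetical, and `fuse` is an illustrative helper.

```python
def fuse(prior, readings):
    """Posterior P(fault) after a list of independent sensor readings.

    Each reading is (reported_fault, hit_rate, false_alarm_rate), where
    hit_rate = P(report fault | fault) and
    false_alarm_rate = P(report fault | no fault).
    """
    p_fault, p_ok = prior, 1.0 - prior
    for reported, hit, false_alarm in readings:
        p_fault *= hit if reported else 1.0 - hit
        p_ok *= false_alarm if reported else 1.0 - false_alarm
    return p_fault / (p_fault + p_ok)

# Hypothetical sensors: each is right 80% of the time; prior chance of a fault is 50%.
one = fuse(0.5, [(True, 0.8, 0.2)])
three = fuse(0.5, [(True, 0.8, 0.2)] * 3)
# One dissenting sensor out of three cancels one of the agreeing ones.
mixed = fuse(0.5, [(True, 0.8, 0.2), (False, 0.8, 0.2), (True, 0.8, 0.2)])
print(round(one, 3), round(three, 3), round(mixed, 3))
```

A sensor that fails to report is handled by leaving its reading out of the list, which is the sense in which the combination is robust to missing data.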

1.3.6 Monitoring and alerting

An extension to diagnosing a system is deciding when to send an alert that the system is in an unhealthy state. Often there is a severe cost if the system fails and also a severe cost for sending a false alarm (jets are scrambled, assembly lines are shut down, etc.). Bayes nets, as decision nets, are a natural fit for making the best possible decision with available sensor data. The Vista system of Eric Horvitz "is a decision net that was used at NASA Mission Control Center in Houston for several years. The system uses Bayesian networks to interpret live telemetry and provides advice on the likelihood of alternative failures of the space shuttle's propulsion systems." (ref #1)

The Netica API toolkits offer all the necessary tools to build such applications.

Summary

Bayes nets have the potential to be applied pretty much everywhere.

1.4 Interesting Properties of Bayes Nets

1.4.1 Probabilities need not be exact to be useful

Some people have shied away from using Bayes nets because they imagine they will only work well if the probabilities upon which they are based are exact. This is not true. It turns out very often that approximate probabilities, even subjective ones that are guessed at, give very good results. Bayes nets are generally quite robust to imperfect knowledge. Often the combination of several strands of imperfect knowledge can allow us to make surprisingly strong conclusions.

1.4.2 Causal Conditional Probabilities are easier to estimate than the reverse

Studies have shown people are better at estimating probabilities "in the forward direction". For example, doctors are better at giving probability estimates for "if the patient has lung cancer, what are the chances their X-ray will be abnormal?" than for the reverse, "if the X-ray is abnormal, what are the chances of lung cancer being the cause?" (Jensen96)
