BETA UPDATING

This example is intended only for those who want a deeper understanding of the technicalities of Bayesian learning, illustrated in a very simple situation.

Suppose there is a large bag with white balls and black balls in it, but you don't know what the proportion of black balls is. You will draw a few balls from the bag (replacing each one after it is drawn) to try to determine that proportion.

This all sounds rather artificial, but it has the same form as many learning problems where a proportion is involved. For example, instead of drawing balls you may be making telephone calls to determine the proportion of people who will vote for a certain political party. Or you may be picking medical files to see what proportion of a certain class of patients given a certain treatment were eventually cured. The purpose of replacing the balls after each draw is just to simulate a very large "population". If there are enough balls in the bag, it doesn't matter whether they are replaced or not.

While doing the drawing, and learning the proportion, you could keep track of your belief for each possible proportion. This would be a probability distribution over all possible proportions. If you compile the example network you will see such a distribution in the leftmost node called "Proportion Black" (there are actually three Bayes nets included in the example; for now we concentrate on the leftmost one, and ignore the two single node nets to the right). This learner starts off believing all proportions are equally likely.
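
As a rough illustration (a hand-rolled sketch in Python, not what Netica does internally), that belief can be represented as a grid of possible proportions, each initially equally likely:

    import numpy as np

    # Possible proportions of black balls, discretized onto a grid of 101 points.
    proportions = np.linspace(0.0, 1.0, 101)

    # Uniform prior: every proportion starts out equally probable.
    belief = np.ones_like(proportions) / len(proportions)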

Suppose the learner had to provide a probability that the next draw was going to be black. Such a probability is provided by the node Draw_2. You can see that its equation is simply:

   P (Draw_2 | PB) = (Draw_2==black) ? PB : (1 - PB)

Where PB is the Proportion Black node. It just means that the probability of the next draw being black is the proportion of black balls, and the probability of the next draw being white is one minus the proportion of black balls.

Effectively, the probability that Draw_2 is black ends up being the average value of Proportion Black, which is 50.0%, as can be seen from the bottom line of the Proportion Black node.
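
In the discretized sketch started above (repeated here so it runs on its own), this predictive probability is the belief-weighted average of the proportions, which comes out to that same 50.0% under the uniform prior:

    import numpy as np

    proportions = np.linspace(0.0, 1.0, 101)
    belief = np.ones_like(proportions) / len(proportions)    # uniform prior

    # Likelihood of one draw given a proportion p of black balls:
    #   P(black | p) = p,   P(white | p) = 1 - p
    def likelihood(color, p):
        return p if color == "black" else 1.0 - p

    # Probability that the next draw is black: average P(black | p)
    # over all proportions, weighted by the current belief.
    predictive_black = np.sum(likelihood("black", proportions) * belief)
    print(predictive_black)    # 0.5 (up to rounding) for the uniform prior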

You can enter findings for the other Draw_x nodes, and observe the changes in the probability distribution over proportions, and in the probabilities for the other draws. If you wish to add more Draw_x nodes, just select an existing Draw_x node and hold down the Ctrl key while dragging it to a new location.
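
Numerically, entering findings corresponds to applying Bayes' rule on the grid of proportions. Here is a continuation of the sketch above (illustrative only, not Netica's actual algorithm), showing two black draws and one white draw:

    import numpy as np

    proportions = np.linspace(0.0, 1.0, 101)
    belief = np.ones_like(proportions) / len(proportions)    # uniform prior

    # Update the belief with each observed draw:
    # posterior is proportional to likelihood times prior.
    for color in ["black", "black", "white"]:
        like = proportions if color == "black" else 1.0 - proportions
        belief = like * belief
        belief /= belief.sum()                               # renormalize

    # Predictive probability that the next draw is black,
    # given two black draws and one white draw so far.
    print(np.sum(proportions * belief))    # about 0.6 (the Beta(3,2) mean)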

The initial distribution for the "Proportion Black" node, before any findings were entered, is a Beta(1,1) distribution, as can be seen by examining the node's equation. You could change the distribution to anything you want, and the example would still work. However, when the distribution is a beta distribution, some simplifications are possible. Even if your prior isn't exactly a beta distribution, beta distributions come in so many different shapes that one which fits fairly closely can usually be found.

If the prior distribution is beta, then after drawing some balls the distribution will still be beta, but with different parameters. If it was Beta(x,y), then it will be Beta(x+b, y+w), where b and w are the number of black and white balls drawn, respectively. To demonstrate this, the "Proportion Black" node to the right has a Beta(5,3) distribution. Enter findings of black into 4 of the Draw_x nodes, and findings of white into 2 of them. Now the leftmost "Proportion Black" should have a Beta(1+4, 1+2) = Beta(5,3) distribution as well. You will see that the two nodes agree to within sampling error.
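
This conjugacy is easy to check numerically. A minimal sketch using scipy (the grid and helper code are just for this illustration): updating the uniform Beta(1,1) prior with 4 black and 2 white draws gives the same distribution, point for point, as Beta(5,3) evaluated on the same grid:

    import numpy as np
    from scipy.stats import beta

    proportions = np.linspace(0.0, 1.0, 101)
    belief = np.ones_like(proportions) / len(proportions)    # Beta(1,1) prior

    # Update with 4 black draws and 2 white draws.
    for color in ["black"] * 4 + ["white"] * 2:
        like = proportions if color == "black" else 1.0 - proportions
        belief = like * belief
        belief /= belief.sum()

    # Conjugate answer: Beta(1+4, 1+2) = Beta(5,3), discretized the same way.
    conjugate = beta.pdf(proportions, 5, 3)
    conjugate /= conjugate.sum()

    print(np.max(np.abs(belief - conjugate)))    # essentially zero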

Netica's built-in learning is based on beta distributions. It allows you to define a single node like the "Draw" node in the lower right which represents the probability that the next draw seen will be black. After entering the color of the actual next draw as a finding into the "Draw" node, you can tell Netica to learn from the experience by incorporating the case.

For example, if you enter black as the first finding into Draw, choose Cases -> Incorporate Case, accept the default degree of 1, then remove the finding and compile, the new probability for black will be 66.7%, the same as the probability of a new draw in the network to the left after a single black finding has been entered (as described above).
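
That 66.7% is just the beta-updating rule from above at work: the uniform starting point behaves like a Beta(1,1) prior on the chance of black, one black case with degree 1 takes it to Beta(1+1, 1+0) = Beta(2,1), and the mean of a Beta(a,b) distribution is a/(a+b), here 2/3, or 66.7%.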

This type of learning is most valuable when it isn't for a single probability, but for all the probabilities involved in the relationships between several nodes. So in a more complex Bayes net, if after entering the findings from a case, you choose Cases -> Incorporate Case, then the CPTs of the Bayes net will be changed so that the network has learned from that case. After learning from a few cases, you can save the smarter network back to its file. Then the next time you use it, its probabilistic inference will be more accurate for the cases you are receiving.
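
The same counting idea, extended to a conditional probability table, can be sketched as follows (plain Python, purely illustrative of the bookkeeping, not Netica's implementation; the node and state names are made up): each row of a CPT keeps a count per state, incorporating a case adds the case's degree to the counts in the row selected by the parent values, and the row's probabilities are the normalized counts.

    # Illustrative sketch: a CPT for a node "Draw" with a hypothetical parent "Bag",
    # stored as counts (experience) per parent configuration.
    # Counts of 1 per state correspond to a uniform Beta(1,1)-style prior.
    cpt_counts = {
        "bag_A": {"black": 1.0, "white": 1.0},
        "bag_B": {"black": 1.0, "white": 1.0},
    }

    def incorporate_case(parent_value, child_value, degree=1.0):
        """Learn from one case by adding 'degree' to the matching count."""
        cpt_counts[parent_value][child_value] += degree

    def probabilities(parent_value):
        """Current CPT row: counts normalized to sum to 1."""
        counts = cpt_counts[parent_value]
        total = sum(counts.values())
        return {state: c / total for state, c in counts.items()}

    incorporate_case("bag_A", "black")
    print(probabilities("bag_A"))   # roughly {'black': 0.667, 'white': 0.333}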

Copyright 1998 Norsys Software Corp.