Tutorial on Bayesian Networks with Netica

B. Basic Netica Operation

3. Defining Node Relationships

In the previous tutorial we saw how to build the basic structure of a net, that is, how to define nodes and link them up. Here we will learn how to define the probability relationships between the nodes that have been linked up.

Although you might think that links would naturally hold these relationships, that arrangement would make it difficult to specify any interdependence between them. It works best if each node holds the relationship it bears with its parents. Therefore, to find the conditional probabilities associated with a link, examine the child node of that link. You can view the relationship in any of the following ways:

first selecting the child node and then choosing Relation->View Edit,
first selecting the child node and then clicking the corresponding toolbar button, or
raising the child node's dialog box and clicking the "Table" button.

3.1 Defining probability tables manually

The most basic and straightforward way to define the probabilistic relationship between a node and its parents is to explicitly specify its Conditional Probability Table, or CPT for short.

The CPT is simply a table that has one probability for every possible combination of parent and child states. It is an (N+1)-dimensional table, where N is the number of parents. However, the table can be "flattened" into two dimensions by listing all combinations of parent states along one dimension and all child states along the other. This is what Netica does, since multidimensional tables are hard to visualize. Let us look at the CPT for node Dyspnea in net Asia:

Dyspnea has two parents, TbOrCa and Bronchitis, each with two states. On the left, listed vertically, are all possible combinations of the parent states. Along the top right are the states of Dyspnea, "Present" and "Absent". The body of the table, at the bottom right, gives the probability of each child state for each combination of parent states.
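To make the flattening concrete, here is a small Python sketch (not part of Netica) that produces one row per combination of parent states and one column per child state, using Dyspnea's parents and states from the Asia net. The helper names are invented for illustration, and the probability values are hypothetical placeholders, not the numbers in Netica's Asia example.

```python
from itertools import product


def flatten_cpt(parents, child_states, prob):
    """Flatten an (N+1)-dimensional CPT into a 2D list of rows.

    parents:      list of (parent_name, [parent_states]) in table order
    child_states: list of the child's states (one column per state)
    prob:         function (parent_config, child_state) -> probability
    Returns one (parent_config, [probabilities]) pair per parent combination;
    itertools.product varies the last parent fastest.
    """
    state_lists = [states for _, states in parents]
    return [
        (config, [prob(config, c) for c in child_states])
        for config in product(*state_lists)
    ]


# Dyspnea's parents and states as in the Asia net.
parents = [("TbOrCa", ["True", "False"]), ("Bronchitis", ["Present", "Absent"])]
dyspnea_states = ["Present", "Absent"]

# Hypothetical P(Dyspnea=Present | TbOrCa, Bronchitis): placeholder numbers only.
p_present = {
    ("True", "Present"): 0.9, ("True", "Absent"): 0.7,
    ("False", "Present"): 0.8, ("False", "Absent"): 0.1,
}


def dyspnea_prob(config, state):
    p = p_present[config]
    return p if state == "Present" else 1.0 - p


rows = flatten_cpt(parents, dyspnea_states, dyspnea_prob)
for config, probs in rows:
    print(config, probs)
```

With N parents having k1, ..., kN states, the flattened table has k1 × ... × kN rows; for Dyspnea that is 2 × 2 = 4 rows of 2 columns each.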

Rows must sum to 1.0: The probabilities in each row of the table must sum to exactly 1.0. Each row describes one possible world, namely the one in which the parents are in the given states, and in that world the child must be in exactly one of its states, so the probabilities of those states must sum to 1.0.

You can edit the probabilities in the table by clicking the appropriate cell and typing in a new value. When done, click "Okay". If any of the rows does not total 1.0, an error dialog is raised; make the necessary corrections and then click "Okay" again.
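The row-sum rule is easy to check mechanically before entering a table. The Python sketch below flags rows that do not total 1.0 within a small floating-point tolerance; the helper name and the tolerance are assumptions for illustration, since the tutorial does not say how strictly Netica compares sums.

```python
def invalid_rows(rows, tol=1e-6):
    """Return the indices of rows whose probabilities do not sum to 1.0.

    rows: list of lists of probabilities (one list per parent configuration)
    tol:  allowed floating-point slack (an assumed value, not Netica's)
    """
    return [i for i, row in enumerate(rows) if abs(sum(row) - 1.0) > tol]


# The second row was mistyped (0.7 + 0.2 = 0.9), so it is reported.
table = [
    [0.9, 0.1],
    [0.7, 0.2],
    [0.8, 0.2],
]
print(invalid_rows(table))  # -> [1]
```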

Terminology: Deterministic vs. Probabilistic relations. Sometimes a child node has exactly one possible value for each configuration of parent states. Such a node is said to be deterministic, since its value is determined exactly by its parents, with no element of chance involved. In the child's CPT, this means that each row has one column with a value of 1.0, and all the other columns have a value of 0.0. A relation that does not meet this condition is probabilistic.
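A deterministic CPT can be recognized from its rows alone. In this Python sketch, `is_deterministic` is a hypothetical helper that tests for a single 1.0 per row, and the example builds a table for TbOrCa from the Asia net, modeled here as the logical OR of Tuberculosis and Cancer being present, as its name suggests.

```python
from itertools import product


def is_deterministic(rows, tol=1e-9):
    """True if every row has exactly one 1.0 and all other entries 0.0."""
    for row in rows:
        ones = sum(1 for p in row if abs(p - 1.0) <= tol)
        zeros = sum(1 for p in row if abs(p) <= tol)
        if ones != 1 or zeros != len(row) - 1:
            return False
    return True


# TbOrCa modeled as "Tuberculosis or Cancer is present"; columns are [True, False].
tborca_rows = []
for tb, ca in product(["Present", "Absent"], repeat=2):
    either = tb == "Present" or ca == "Present"
    tborca_rows.append([1.0, 0.0] if either else [0.0, 1.0])

print(is_deterministic(tborca_rows))               # -> True
print(is_deterministic([[0.9, 0.1], [0.0, 1.0]]))  # -> False
```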

Load Button: Clicking the "Load" button updates all the probabilities in the table to reflect their current values in the network. It is the inverse of the "Apply" button, which copies the table into the net. Beginners rarely need "Load", but it can be very handy when you are doing complex "what if" analyses (Netica allows multiple table dialog boxes to be open for the same node, each with different probabilities that you are experimenting with). Also, sometimes the underlying net is being updated by a program (say, when the net is learning its probabilities from cases as they arrive), and you can click "Load" repeatedly to see the latest probabilities.

3.2 Defining probability tables by equation

Tables can be cumbersome to enter by hand, especially when there are many parent states to consider. Netica lets you describe a conditional probability table more concisely with an equation. The equation language is expressive, and its syntax follows that of the popular programming languages C, C++, and Java.

Equations can be used whether the nodes are continuous or discrete, and whether the relation is probabilistic or deterministic. All equations must be converted to tables before compiling a network, doing network transforms, or solving decision problems. The tables are then used just as if you had entered them by hand. Because tables assume a discrete set of states for the parents and the child, any continuous nodes taking part in an equation must first be discretized.

We will learn how to use equations by learning their basic syntactic form, and then by looking at a few examples. You should then try to create a few for yourself.

The syntax for equations differs slightly depending on whether the node's value is determined deterministically by its parents (it has a unique value for each configuration of parent states) or probabilistically.

Deterministic nodes

Syntax: Child(Parent1, Parent2, ... Parent N) = some expression that yields legal state values of Child

Examples:

   /* convert F to Centigrade */
   C(F) = 5.0/9 * (F-32)

   /* total distance traveled, X, is the average 
      velocity * time traveled + initial distance */
   X (Vel, dt, X0) = X0 + Vel * dt 

   /* if taste is sour,  choose a blue color; else 
      if taste is sweet, choose red; else
      if taste is salty, choose green; else
                         choose gray;  
   */
   Color (Taste) = Taste==sour?  blue:
                   Taste==sweet? red:
                   Taste==salty? green:  gray
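To see how a deterministic equation becomes a table once continuous nodes are discretized, the Python sketch below turns the Fahrenheit-to-Celsius conversion, C = 5/9 (F - 32), into a CPT. The interval boundaries and helper names are hypothetical, and the method is a simplification: each parent interval is represented only by its midpoint. Netica's own conversion is not described in this tutorial and may evaluate more points.

```python
import bisect


def celsius(f):
    # Deterministic equation C(F) = 5/9 * (F - 32).
    return 5.0 * (f - 32.0) / 9.0


def state_index(value, edges):
    """Index i of the interval [edges[i], edges[i+1]) that contains value."""
    i = bisect.bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError(f"{value} is outside the discretized range")
    return i


def equation_to_table(func, parent_edges, child_edges):
    """Deterministic CPT for a child with one discretized continuous parent.

    Simplification: each parent interval is represented by its midpoint only.
    """
    n_child = len(child_edges) - 1
    rows = []
    for lo, hi in zip(parent_edges, parent_edges[1:]):
        row = [0.0] * n_child
        row[state_index(func((lo + hi) / 2.0), child_edges)] = 1.0
        rows.append(row)
    return rows


# Hypothetical discretizations: F in 18-degree steps, C in 10-degree steps.
f_edges = [32, 50, 68, 86, 104]
c_edges = [0, 10, 20, 30, 40]
for row in equation_to_table(celsius, f_edges, c_edges):
    print(row)
```

Every row gets a single 1.0 because each midpoint lands in one Celsius interval. If a parent interval straddled several child intervals, evaluating several points per interval and recording the fraction that falls in each child state would give a more faithful, and no longer strictly deterministic, row.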

Probabilistic nodes

Syntax: p(Child|Parent1, Parent2, ... Parent N) = some expression that yields probabilities, that is, numbers in the range 0.0 to 1.0.

Examples:

   /* the total distance traveled, X, follows a normal distribution with
      a mean of Vel*dt+X0, and a standard deviation of 'spread' */
   p (X | Vel, dt, X0, spread) = NormalDist (X, Vel*dt+X0, spread)

   /* the chemical's color is a probabilistic function of the temperature:
      if the temperature is high, the color is always yellow;
      if the temperature is medium, the color is always orange;
      but if the temperature is low, the color can be orange 20% of the 
      time and red 80% of the time. */
   p (Color | Temp) =  Temp == high ? (Color==yellow ? 1.0 : 0.0) :
                       Temp == med  ? (Color==orange ? 1.0 : 0.0) :
                       Temp == low  ? (Color==orange ? 0.2 : 
                                       Color==red    ? 0.8 : 0.0) : 0
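A probabilistic equation is expanded into a table by evaluating it for every combination of child and parent states. The Python sketch below transcribes the p(Color | Temp) equation directly, with Python's conditional expression standing in for C's ?: operator. It assumes Color has exactly the states yellow, orange, and red, which the tutorial does not state.

```python
def p_color(color, temp):
    # Same nesting as the Netica equation; Python's "a if cond else b"
    # plays the role of C's "cond ? a : b".
    return ((1.0 if color == "yellow" else 0.0) if temp == "high" else
            (1.0 if color == "orange" else 0.0) if temp == "med" else
            (0.2 if color == "orange" else
             0.8 if color == "red" else 0.0) if temp == "low" else 0.0)


temps = ["high", "med", "low"]
colors = ["yellow", "orange", "red"]  # assumed state list for Color

# One CPT row per parent state, one column per child state.
cpt = {t: [p_color(c, t) for c in colors] for t in temps}
for t in temps:
    print(t, cpt[t], "sum =", sum(cpt[t]))
```

Each row sums to 1.0, a quick way to confirm that the equation assigns a probability to every child state.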

Note that spaces and line breaks have no bearing whatsoever on the equation syntax; you can use as many or as few as you like, to suit your taste. You can also add C-style comments (/* ... */) anywhere you like.

There are some rules that must be followed for an equation to make sense to Netica:

The only nodes that may be mentioned in an equation are the node the equation describes, its parents, and any constant nodes.
If the equation is for a probabilistic node, its right-hand side must provide a probability for every one of the node's possible states. You cannot leave any out, since otherwise the probability table would be incompletely defined. There is one exception to this rule: for boolean nodes (those whose states are true and false), if you give only one probability, Netica assumes it is for the True state.
A common error among beginning users of equations is to define probabilistic functions that yield states rather than numbers. For instance, they might type p(Child|Parent) = (Parent==True)?False:True, but False and True are states, not probabilities, so they are not legal values for the right-hand side of a probabilistic equation. The correct equation is p(Child|Parent) = (Parent==True)?0.0:1.0, and p(Child|Parent) = not(Parent) works just as well.
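The rules above can be mimicked in a few lines. This Python sketch is an illustration, not Netica's code: `cpt_row` (a hypothetical helper) turns one evaluation of a probabilistic equation into a CPT row, rejects a state name where a number is expected, accepts a lone number only for a boolean node and reads it as P(True), and otherwise requires a probability for every state. Representing the boolean states as the strings "True" and "False" is an assumption of the sketch.

```python
BOOLEAN_STATES = ["True", "False"]  # assumed names for a boolean node's states


def cpt_row(result, child_states):
    """Convert one evaluation of a probabilistic equation into a CPT row.

    result: a dict {state: probability}, or a single number, which is only
            accepted for a boolean child and is read as P(True).
    """
    if isinstance(result, str):
        # e.g. yielding the state False instead of the number 0.0
        raise TypeError(f"expected a probability, got the state {result!r}")
    if isinstance(result, (int, float)):  # bool counts too: False -> 0, True -> 1
        if list(child_states) != BOOLEAN_STATES:
            raise ValueError("a lone number is only allowed for a boolean node")
        result = {"True": float(result), "False": 1.0 - float(result)}
    if not isinstance(result, dict):
        raise TypeError(f"unsupported equation result: {result!r}")
    missing = [s for s in child_states if s not in result]
    if missing:
        raise ValueError(f"no probability given for states {missing}")
    row = [float(result[s]) for s in child_states]
    if any(p < 0.0 or p > 1.0 for p in row):
        raise ValueError(f"probabilities must lie in [0, 1]: {row}")
    return row


# Correct form: p(Child|Parent) = (Parent==True) ? 0.0 : 1.0
for parent in (True, False):
    print(parent, cpt_row(0.0 if parent else 1.0, BOOLEAN_STATES))

# The common mistake: yielding a state rather than a probability.
try:
    cpt_row("False", BOOLEAN_STATES)
except TypeError as err:
    print("rejected:", err)
```

Because Python treats a boolean as 0 or 1, passing `not parent` gives the same rows as the explicit 0.0/1.0 form, mirroring the remark that not(Parent) works just as well.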

Netica's on-screen help contains a complete listing of the available equation functions, with a detailed reference describing the parameters and behavior of each. There are over 50 of them, ranging from simple mathematical functions to complex statistical ones. Here they are listed by name, just so you have an idea of what is available.

Common Operators

Functional Notation      Operator Notation
neg (x)                  - x
not (b)                  ! b
equal (x, y)             x == y
not_equal (x, y)         x != y
approx_eq (x, y)         x ~= y
less (x, y)              x < y
greater (x, y)           x > y
less_eq (x, y)           x <= y
greater_eq (x, y)        x >= y
plus (x1, x2, ... xn)    x1 + x2 + ... + xn
minus (x, y)             x - y
mult (x1, x2, ... xn)    x1 * x2 * ... * xn
div (x, y)               x / y
mod (x, base)            x % base
power (x, y)             x ^ y
and (b1, b2, ... bn)     b1 && b2 && ... && bn
or (b1, b2, ... bn)      b1 || b2 || ... || bn
if (test, tval, fval)    test ? tval : fval

Common Math

abs (x)
sqrt (x)
exp (x)
log (x)
log2 (x)
log10 (x)
sin (x)
cos (x)
tan (x)
asin (x)
acos (x)
atan (x)
atan2 (x, y)
sinh (x)
cosh (x)
tanh (x)
floor (x)
ceil (x)
integer (x)
frac (x)
Special Math

round (x)
roundto (dx, x)
approx_eq (x, y)
eqnear (reldiff, x, y)
clip (min, max, x)
sign (x)
xor (b1, b2, ... bn)
increasing (x1, x2, ... xn)
nondecreasing (x1, x2, ... xn)
min (x1, x2, ... xn)
max (x1, x2, ... xn)
argmin0/1 (x0, x1, ... xn)
argmax0/1 (x0, x1, ... xn)
nearest0/1 (val, c0, c1, ... cn)
select0/1 (index, c0, c1, ... cn)
member (elem, s1, s2, ... sn)
factorial (n)
logfactorial (n)
gamma (x)
loggamma (x)
beta (z, w)
erf (x)
erfc (x)
binomial (n, k)
multinomial (n1, n2, ... nn)

Continuous Probability Distributions

UniformDist (x, a, b)
TriangularDist (x, a, w)
NormalDist (x, m, s)
LognormalDist (x, h, f)
ChiSquareDist (x, n)
ExponentialDist (x, l)
GammaDist (x, a, b)
WeibullDist (x, a, b)
BetaDist (x, a, b)
Beta4Dist (x, a, b, c, d)
CauchyDist (x, m, s)
ExtremeValueDist (x, m, s)

Discrete Probability Distributions

SingleDist (k, c)
DiscUniformDist (k, a, b)
BernoulliDist (k, p)
BinomialDist (k, n, p)
PoissonDist (k, m)
HypergeometricDist (k, n, s, N)
NegBinomialDist (k, n, p)
GeometricDist (k, p)
NoisyOrDist (b, leak, b1, p1, b2, p2, ... bn, pn)
NoisyAndDist (b, inh, b1, p1, b2, p2, ... bn, pn)
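The continuous distribution functions describe densities, so filling a table for a discretized node requires the probability of each interval. As a hedged sketch of one natural approach (Netica's exact method is documented in its on-screen help, not here), the Python code below computes interval masses for NormalDist (x, m, s) from the normal CDF, written with `math.erf`, and renormalizes them so the row sums to 1.0. The interval edges and helper names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of N(mean, sd) at x; also works for x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def discretize_normal(edges, mean, sd):
    """Probability of each interval [edges[i], edges[i+1]) under N(mean, sd).

    Mass outside the outermost edges is dropped and the row is renormalized
    to sum to 1.0 (an assumption of this sketch).
    """
    masses = [normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
              for lo, hi in zip(edges, edges[1:])]
    total = sum(masses)
    return [m / total for m in masses]


# Hypothetical child intervals; with infinite end edges no mass is dropped.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
row = discretize_normal(edges, mean=0.0, sd=1.0)
print([round(p, 4) for p in row])
```

In a full table this calculation would be repeated for each combination of parent states, with the mean and standard deviation computed from those states (for example, Vel*dt+X0 and spread in the earlier NormalDist example).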

Tips

The tables generated by equations may result in large files (and therefore slow reading), so you may want to remove a node's relation table with Relation->Remove or the corresponding toolbar button before saving the network to file. When you later read the network back in, do Relation->"Equation to Table" (with no nodes selected) before using it.
If you need to define intermediate variables to simplify the equations, implement them as new (intermediate) nodes.
If the equations get large, it may be easier to create them in a text editor, and then paste them into the node dialog box.
When editing an equation, you can cut, copy, paste, undo, etc. using the standard CTRL key shortcuts, or by right-clicking.

3.3 Learning probability tables

In the two preceding sections we learned how to define the probabilistic relation between a node and its parents by either manually editing a table of probabilities or by writing an equation that is a short-hand expression for such a table.

In this section we discover a third way that Netica allows these conditional probabilities to be defined. This is by learning them from a collection of cases. If the collection of cases is a sample from the population we are modeling, then we can use the frequency information implicit in that data as approximations of the desired probabilities.

This is a very powerful and easy-to-use feature of Netica. Here is how to use it:

Collect your data in tabular format, one row per dataset instance (i.e., each row represents an occurrence of a possible world).
Ensure the data values are separated by tabs or spaces (in the next release of Netica, commas will also be accepted, and other characters can be defined as delimiters).
Ensure missing data fields are replaced with an asterisk ('*'). In the next release of Netica other characters can be defined as the missing data character.
At the top of the table, create one row with the names of the nodes. The names must be in the same order as the data columns in the rest of the table.
Finally, place a single row at the very top of the case file containing the string "// ~->[CASE-1]->~". This tells Netica that the file is in Netica case file format #1, so that Netica knows what kind of file it is dealing with (more advanced formats may be added in the future).
Save the resulting file to disk and then load it with:
- Relation->"Incorp Case File". (In the next release of Netica this is Cases->"Incorp Case File".)

Here is a sample case file for the net Asia. Notice that we have added a special column called 'IDnum'. It is not required, but it is useful for keeping track of individual cases.

   // ~->[CASE-1]->~
   IDnum VisitAsia  Tuberculosis Smoking   Cancer  TbOrCa XRay     Bronchitis Dyspnea
   1     No_Visit   Present      Smoker    Absent  True   Abnormal Absent     Present
   2     No_Visit   Absent       Smoker    Absent  False  Normal   Present    Present
   3     No_Visit   *            Smoker    Present True   Abnormal *          Present
   4     No_Visit   Absent       NonSmoker Absent  False  Normal   Absent     Absent
   5     No_Visit   Absent       Smoker    Present True   Abnormal Present    Present
   6     No_Visit   Absent       Smoker    Absent  False  Abnormal Present    Present
  ...

The exact case file format is given here. It describes some other useful features of case files, including how to add comments, give a case a multiplicity factor, and so forth.
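If you prepare case files with scripts, it can help to read them back in the same simple form. The following Python sketch is a hypothetical helper that parses only the features listed earlier in this section: the header line, a row of node names, and whitespace-separated data rows with '*' marking missing values. It ignores comments, multiplicity factors, and the other features of the full format. The sample reuses rows 1 and 3 of the Asia case file.

```python
HEADER = "// ~->[CASE-1]->~"
MISSING = "*"


def read_case_file(text):
    """Parse a minimal CASE-1 file into (node_names, list_of_case_dicts).

    Missing values ('*') become None. Comments and multiplicity factors
    from the full format are NOT handled.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != HEADER:
        raise ValueError("first line must be " + HEADER)
    if len(lines) < 2:
        raise ValueError("missing the row of node names")
    names = lines[1].split()
    cases = []
    for line in lines[2:]:
        values = line.split()
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}: {line!r}")
        cases.append({n: (None if v == MISSING else v) for n, v in zip(names, values)})
    return names, cases


sample = """// ~->[CASE-1]->~
IDnum VisitAsia  Tuberculosis Smoking   Cancer  TbOrCa XRay     Bronchitis Dyspnea
1     No_Visit   Present      Smoker    Absent  True   Abnormal Absent     Present
3     No_Visit   *            Smoker    Present True   Abnormal *          Present
"""

names, cases = read_case_file(sample)
print(names)
for case in cases:
    print(case["IDnum"], case["Tuberculosis"], case["Bronchitis"])
```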

As a teaching exercise, let us create a case file and use it both to define our nodes (as we saw earlier in tutorial section B.2.1.6) and to learn their probabilities. Perform the following steps:

Copy and paste the following into your favorite text editor, perhaps add a few lines of your own, and save the result as your case file.

  // ~->[CASE-1]->~
  Forecast    ActualWeather
  rain        rain
  cloudy      sunny
  cloudy      cloudy
  cloudy      rain
  sunny       sunny
  sunny       sunny
  sunny       cloudy
  rain        cloudy

Create a new empty net.
Select Modify->"Add Case File Nodes" (Cases->"Add Case File Nodes..." in the next release of Netica), and give your case file as input. Two nodes should be constructed for you, Forecast and ActualWeather, each with the three states: rain, sunny, and cloudy.
Link Forecast to ActualWeather (or ActualWeather to Forecast: for our purposes here it makes no difference; the probability tables will end up incorporating whatever correlation exists).
Select Relation->"Incorp Case File" (Cases->"Incorp Case File" in the next release of Netica), and again choose your case file as input.
That is all. View the tables that were generated to confirm that they match what you expect.
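Because learning uses the frequencies in the data, you can anticipate roughly what the tables will look like. The Python sketch below computes plain relative frequencies from the eight weather cases, with Forecast as the parent of ActualWeather. Netica's Bayesian learning algorithm also takes prior experience into account, so its tables need not match these raw frequencies exactly.

```python
from collections import Counter

# The eight (Forecast, ActualWeather) cases from the case file above.
cases = [
    ("rain", "rain"), ("cloudy", "sunny"), ("cloudy", "cloudy"),
    ("cloudy", "rain"), ("sunny", "sunny"), ("sunny", "sunny"),
    ("sunny", "cloudy"), ("rain", "cloudy"),
]
states = ["rain", "sunny", "cloudy"]

# Count each (Forecast, ActualWeather) pair and each Forecast value.
pair_counts = Counter(cases)
forecast_counts = Counter(f for f, _ in cases)

# Relative frequencies: P(ActualWeather | Forecast) and P(Forecast).
cpt = {
    f: [pair_counts[(f, a)] / forecast_counts[f] for a in states]
    for f in states
}
prior = [forecast_counts[f] / len(cases) for f in states]

print("P(Forecast):", prior)
for f in states:
    print("Forecast =", f, "->", cpt[f])
```

For example, of the three cases with Forecast = sunny, two have ActualWeather = sunny and one has cloudy, giving 2/3 and 1/3; no case pairs a sunny forecast with rain, so that entry is 0.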

More on Learning

Learning algorithm used. The theory of Bayes nets does not dictate how probability tables are learned, and many different learning algorithms are possible. Some are known as "true Bayesian" learning algorithms, and Netica uses one of these. It is simple and works well in most situations. Its underlying assumptions break down when a node has many parents (so its conditional probability table is large) and there are few data samples. In such situations the learning is less than optimal, and you may want to find another way of estimating the probability tables. For details of the exact algorithm, see the on-screen help system.
Learning from a single case. Rather than loading cases from a file, you can present them to Netica one by one as positive findings (negative and likelihood findings are ignored in this kind of learning). Simply enter the finding values for the nodes you know and then select Relation->"Incorporate Case".
Unlearning. Sometimes you want to remove a previously learned case. To do this, learn it again as a single case (see above), but give it a degree of -1. The result is the same as if you had never learned it in the first place.
Fading/Forgetting. Sometimes you want to bias what you are learning toward the most recent cases. With Netica, you can "fade" the old knowledge gradually. To do this, select Relation->Fade and enter a degree, d, from 0 to 1. Netica will then reduce the experience and smooth the probabilities of the selected nodes by an amount dictated by that degree: 0 has no effect, and 1 creates uniform distributions with no experience, thereby undoing all previous learning. Consistent with those endpoints, fading twice, by d1 and then by d2, is equivalent to fading once by 1 - (1 - d1)(1 - d2). See the on-screen help for further details.
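The learning, unlearning, and fading operations above can be pictured with per-row pseudo-counts. The Python sketch below uses an assumed model, not Netica's documented algorithm: a CPT row is a probability list plus an experience value; learning a case with degree w adds w to the observed state's count; and fading by d scales the experience by (1 - d) while blending the probabilities toward uniform. Under that model a degree of -1 exactly reverses a degree of 1, and two fades compose like a single fade of 1 - (1 - d1)(1 - d2). The function names and starting values are hypothetical.

```python
def learn(probs, experience, state, degree=1.0):
    """Assumed counting update for one CPT row.

    Treat probs * experience as pseudo-counts, add `degree` to the count of
    the observed state, and renormalize. degree = -1 undoes a degree-1 case.
    """
    counts = [p * experience for p in probs]
    counts[state] += degree
    new_experience = experience + degree
    if new_experience <= 0:
        raise ValueError("experience must stay positive")
    return [c / new_experience for c in counts], new_experience


def fade(probs, experience, d):
    """Assumed fading model: 0 = no change, 1 = uniform with no experience."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("degree must be between 0 and 1")
    uniform = 1.0 / len(probs)
    faded = [(1.0 - d) * p + d * uniform for p in probs]
    return faded, (1.0 - d) * experience


# Hypothetical starting point: a uniform two-state row with experience 1.
row, exp = [0.5, 0.5], 1.0
row, exp = learn(row, exp, state=0)   # [0.75, 0.25], experience 2
row, exp = learn(row, exp, state=0)   # about [0.833, 0.167], experience 3
undone, exp_undone = learn(row, exp, state=0, degree=-1.0)
print(undone, exp_undone)             # back to about [0.75, 0.25], experience 2

# Two fades versus one combined fade under this model.
twice = fade(*fade(row, exp, 0.3), 0.5)
once = fade(row, exp, 1 - (1 - 0.3) * (1 - 0.5))
print(twice, once)
```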
