Learning from Case Data

Bayes net learning is the process of automatically determining a representative Bayes net given data in the form of cases (called the training cases).  Each case represents an example, event, object or situation in the world (presumably one that exists or has occurred), and the case supplies values for a set of variables that describe the event, object, etc., as specified in the previous chapter.  Each variable will become a node in the learned net (unless you want to ignore some of them), and the possible values of that variable will become the node’s states.  Learning from case data results in probability revision.

The learned net can be used to analyze a new case which comes from the same (or appropriately similar) world as the training cases did.  Typically the new case will provide values for only some of the variables.  These are entered as findings, and then Netica does probabilistic inference to determine beliefs for the values of the rest of the variables for that case.  Sometimes we aren't interested in values for all the rest of the variables, but only some of them, and we call the nodes that correspond to these variables target nodes.  If the links of the net correspond to a causal structure, and the target nodes are ancestors of the nodes with findings, then you could say that the net has learned to do diagnosis.  If the target nodes are descendants, then the net has learned to do prediction, and if the target node corresponds to a "class" variable, then the net has learned to do classification.  Of course the same net could do all three, even at the same time.
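As a rough illustration of the difference between diagnosis and prediction, here is a minimal Python sketch of a two-node causal net, Disease -> Test.  It does not use Netica; the node names, the probabilities and the helper functions are all made up for this example.  A finding on the parent with the child as target is prediction, while a finding on the child with the parent as target is diagnosis (done here with Bayes' rule).

```python
# Hypothetical two-node causal net: Disease -> Test.
# All probabilities are invented for illustration only.
p_disease = {"present": 0.01, "absent": 0.99}
p_test_given_disease = {
    "present": {"positive": 0.95, "negative": 0.05},
    "absent": {"positive": 0.10, "negative": 0.90},
}


def predict_test(disease_state):
    """Prediction: the finding is on the parent, the target is the child."""
    return dict(p_test_given_disease[disease_state])


def diagnose_disease(test_state):
    """Diagnosis: the finding is on the child, the target is the parent."""
    # Joint probability of each disease state with the observed test result.
    joint = {
        d: p_disease[d] * p_test_given_disease[d][test_state]
        for d in p_disease
    }
    total = sum(joint.values())
    # Normalize to get the belief in each disease state (Bayes' rule).
    return {d: joint[d] / total for d in joint}


prediction = predict_test("present")
diagnosis = diagnose_disease("positive")
print("Beliefs for Test given Disease=present:", prediction)
print("Beliefs for Disease given Test=positive:", diagnosis)
```

In a real net the same idea extends to many nodes, where the inference is done by Netica rather than by hand-written enumeration.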

The Bayes net learning task has traditionally been divided into two parts: structure learning and parameter learning.  Structure learning determines the dependence and independence of variables and suggests a direction of causation; in other words, it determines the placement of the links in the net.  Parameter learning determines the conditional probability table (CPT) at each node, given the link structure and the data.
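To make parameter learning concrete, the following Python sketch estimates CPTs by simple relative-frequency counting, assuming the link structure is already known and every case is complete.  This is only an illustration of the idea, not Netica's actual algorithm; the variables (Rain, WetGrass), the case data and the function learn_cpt are hypothetical.

```python
from collections import Counter

# Hypothetical complete training cases for the structure Rain -> WetGrass.
cases = [
    {"Rain": "yes", "WetGrass": "yes"},
    {"Rain": "yes", "WetGrass": "yes"},
    {"Rain": "yes", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "yes"},
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "no"},
]


def learn_cpt(cases, child, parents, states):
    """Estimate P(child | parents) as relative frequencies in the cases."""
    # Count how often each (parent configuration, child state) pair occurs.
    counts = Counter(
        (tuple(c[p] for p in parents), c[child]) for c in cases
    )
    parent_configs = sorted({tuple(c[p] for p in parents) for c in cases})
    cpt = {}
    for config in parent_configs:
        total = sum(counts[(config, s)] for s in states)
        cpt[config] = {s: counts[(config, s)] / total for s in states}
    return cpt


# Rain has no parents, so its CPT has a single row keyed by the empty tuple.
rain_cpt = learn_cpt(cases, "Rain", [], ["yes", "no"])
wet_cpt = learn_cpt(cases, "WetGrass", ["Rain"], ["yes", "no"])
print(rain_cpt)
print(wet_cpt)
```

Only parent configurations that actually occur in the cases get a row here; a practical learner also needs some way to handle configurations that were never observed.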

You might not want Netica to learn the CPTs of all the nodes in your Bayes net.  Some of the nodes may have CPTs that have already been learned well, were created manually by an expert, or are based on theoretical knowledge of the problem at hand (perhaps expressed by an equation).  Netica allows you to restrict the learning process to a subset of the nodes, and those nodes are called the learning nodes.

If every case supplies a value with certainty for each of the variables, then the learning process is greatly simplified.  If not, there are varying degrees of partial information:

If there is a variable for which none of the cases have any information, that variable is known as a latent variable or “hidden variable”.

If some cases have values for a certain variable, and others don’t, that is known as missing data.

Some values for variables may not be given with certainty, but only as likelihood findings.
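The first two kinds of partial information can be told apart just by inspecting the case data.  The Python sketch below is a hypothetical helper, not part of Netica, and it assumes that a missing value is represented by None.  It labels each variable as complete, missing data, or latent.  Likelihood findings are not covered, since they need a richer representation than a single value per variable.

```python
MISSING = None  # assumed marker for "this case has no value for the variable"

# Hypothetical training cases; Stress is never observed.
cases = [
    {"Age": "old", "Smoker": "yes", "Stress": MISSING},
    {"Age": "young", "Smoker": MISSING, "Stress": MISSING},
    {"Age": "old", "Smoker": "no", "Stress": MISSING},
]


def classify_variables(cases, variables):
    """Label each variable by how much information the cases give on it."""
    result = {}
    for var in variables:
        # Number of cases that supply a value for this variable.
        observed = sum(1 for c in cases if c.get(var) is not MISSING)
        if observed == 0:
            result[var] = "latent"        # no case has any information
        elif observed < len(cases):
            result[var] = "missing data"  # some cases have it, some do not
        else:
            result[var] = "complete"      # every case supplies a value
    return result


kinds = classify_variables(cases, ["Age", "Smoker", "Stress"])
print(kinds)
```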

It may seem strange to be learning a net that has latent variables, since none of the training cases have any information on them.  You introduce a latent variable as a parent node (or intermediate node) of multiple child nodes, and Netica uses the correlations among the children to determine the relationships between the latent node and the others.  The result may be a Bayes net that is actually simpler (has fewer CPT entries) and generalizes better (i.e. performs better on new cases).  For an example of using Netica to learn a latent variable, see the “Learn Latent.dne” net in the Examples folder of the Netica Application distribution, or get it from the Norsys net library.
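To see why a latent parent can reduce the number of CPT entries, here is a small Python calculation.  It compares two hypothetical ways of capturing the correlations among a set of child nodes: linking the children directly to each other in a fully connected net, or making them all children of a single latent node.  The function names and the choice of a fully connected net as the baseline are assumptions made for this illustration.

```python
def fully_connected_entries(num_nodes, num_states):
    """Total CPT entries when each node has all earlier nodes as parents."""
    # Node i (0-based) has i parents, so its CPT has num_states ** (i + 1)
    # entries: one row per parent configuration, one column per state.
    return sum(num_states ** (i + 1) for i in range(num_nodes))


def latent_parent_entries(num_children, num_states, latent_states):
    """Total CPT entries with one latent parent and no child-to-child links."""
    # The latent node has no parents; each child has only the latent parent.
    return latent_states + num_children * latent_states * num_states


direct = fully_connected_entries(num_nodes=5, num_states=2)
with_latent = latent_parent_entries(
    num_children=5, num_states=2, latent_states=2
)
print("Fully connected children:", direct, "CPT entries")
print("Single latent parent:    ", with_latent, "CPT entries")
```

With five binary children this gives 62 entries for the fully connected net against 22 with a binary latent parent.  The saving grows with the number of children, but with only two or three children the latent version is not smaller, which is why the text says the result "may be" simpler.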

More info on Netica’s learning.

More info on Learning Algorithms.