Learning from Data – Part of Quick Tour

Tutorial: Open the Bayes net called "Car_Diagnosis_0" from the Examples folder (note: do not use "Car_Diagnosis_2"). It is a simplified example net containing nodes for a few variables of interest when diagnosing a car that is not running. The nodes are linked in a causal manner, but the net contains no information other than the node names, their states, and how they are linked. If you right-click one of the nodes and choose Table, the table dialog box that appears shows that the node has no CPT defined (the boxes in the right-hand panel are empty). None of the nodes have any probabilities defined, so the net is not yet ready for inference; in this step we will learn the probabilities from data.

To learn a probability table for each node, we will use a file of cases of cars that previously arrived at a garage, called “Car Cases”, in the “Examples” folder. You may want to examine this file with a text editor. The headings across the top row are the names of the nodes in the net, and the entries in each column are state names of the corresponding node. “BatAge” is a continuous node, so its values are real numbers. Asterisks (*) indicate values that are not known (i.e. missing data).
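To make the case-file layout concrete, here is a minimal Python sketch of a reader for it. It assumes a whitespace-delimited file with one header row and one case per line, which matches the description above but is not taken from Netica's file-format specification. The function name `parse_case_file`, its `continuous` parameter, and the sample columns other than BatAge are hypothetical.

```python
def parse_case_file(text, continuous=(), missing="*"):
    """Parse case data: a header row of node names, then one row per case.

    Assumes fields are separated by whitespace (tabs or spaces). Values equal to
    `missing` become None; values of nodes listed in `continuous` become floats;
    all other values are kept as state-name strings.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = lines[0].split()
    cases = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(headers):
            raise ValueError(f"expected {len(headers)} fields, got {len(fields)}: {ln!r}")
        case = {}
        for name, value in zip(headers, fields):
            if value == missing:
                case[name] = None          # unknown value (missing data)
            elif name in continuous:
                case[name] = float(value)  # real-valued node such as BatAge
            else:
                case[name] = value         # discrete node: a state name
        cases.append(case)
    return headers, cases


# Hypothetical sample; only "BatAge" comes from the tutorial text.
sample = """\
BatAge  Charge  Starts
2.5     good    yes
*       low     no
4.1     *       no
"""

headers, cases = parse_case_file(sample, continuous={"BatAge"})
print(headers)   # ['BatAge', 'Charge', 'Starts']
print(cases[1])  # {'BatAge': None, 'Charge': 'low', 'Starts': 'no'}
```

Keeping missing values as `None` rather than dropping whole rows lets a later learning step decide how to treat each gap.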

With the net window active and no nodes selected (otherwise learning will apply only to the selected nodes), choose Cases → Learn → Incorp Case File. When prompted for a file, choose Car Cases, and enter 1 for the degree. Netica uses the cases to learn a probability table for each node of the net, and the Messages window displays the fraction completed.

When it is finished, examine a few nodes with the table dialog box to see the learned probability distributions. You can click the selector that says "% Probability" and choose "Counts" from the menu to see how many times each possibility (each node state under each combination of parent states) occurs in the data file. From the Counts table, Netica generates the "Unnormalized" table by adding a small constant (usually 1) to each cell. Summing each row of the Unnormalized table gives the "Experience" table, which is then used to normalize the Unnormalized table, producing the "Probabilities" table. Thus, Netica learns the local probability tables by simple counting and normalization. If you have latent variables or a lot of missing data, Netica must instead use considerably more complex algorithms, such as EM or gradient descent, to learn the tables.
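The Counts → Unnormalized → Experience → Probabilities sequence can be sketched in Python as follows. This is an illustration, not Netica's code: `learn_cpt` and the node names are hypothetical, the added constant (`prior=1.0`, i.e. add-one or Laplace smoothing) follows the description above, and cases with a missing value for the node or any of its parents are simply skipped, which is a simplification rather than the EM approach mentioned above.

```python
from collections import Counter
from itertools import product


def learn_cpt(cases, child, parents, states, prior=1.0):
    """Learn one node's CPT by counting.

    cases:  list of dicts mapping node name -> state name (None = missing)
    states: dict mapping node name -> list of its state names
    Returns (probabilities, experience), both keyed by a tuple of parent states.
    """
    # Counts: how often each (parent states..., child state) combination occurs.
    counts = Counter()
    for case in cases:
        values = tuple(case.get(p) for p in parents) + (case.get(child),)
        if None in values:
            continue  # simplification: skip incomplete cases (no EM)
        counts[values] += 1

    probabilities, experience = {}, {}
    # product() of zero lists yields one empty tuple, so root nodes get one row.
    for config in product(*(states[p] for p in parents)):
        # Unnormalized: each count plus the small constant.
        unnormalized = [counts[config + (s,)] + prior for s in states[child]]
        # Experience: the row sum, used to normalize the row.
        experience[config] = sum(unnormalized)
        probabilities[config] = [u / experience[config] for u in unnormalized]
    return probabilities, experience


cases = [
    {"Charge": "good", "Starts": "yes"},
    {"Charge": "good", "Starts": "yes"},
    {"Charge": "low", "Starts": "no"},
    {"Charge": None, "Starts": "no"},   # skipped for Starts: parent value missing
]
states = {"Charge": ["good", "low"], "Starts": ["yes", "no"]}
probs, experience = learn_cpt(cases, "Starts", ["Charge"], states)
print(probs[("good",)], experience[("good",)])  # [0.75, 0.25] 4.0
```

The added constant is what keeps every row well defined: with `prior=0`, a parent configuration that never appears in the data would have an Experience of 0 and the division would fail.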

Now that the local CPTs have been learned, you can compile the net and do inference, paste parts of it into a decision net, absorb nodes, and so on.

If you want to learn the link structure of your net based on a case file, use the structure learning feature.

Netica can also generate files of cases that follow the probability distribution of a Bayes net (i.e. “sample” from the Bayes net). These cases can serve as realistic examples of possible scenarios, or as synthetic data for learning experiments. Select the nodes of the net for which you want columns in the case file, then choose Cases → Simulate Cases. You will be asked for the number of cases to generate, the name of the file to create, and how much missing data you want (enter 0 for none, 1 for all missing).
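A standard way to sample cases from a Bayes net is forward (ancestral) sampling: visit the nodes so that parents come before children, and draw each node's state from the CPT row selected by the parents' already-sampled states. The Python sketch below assumes that technique and a tiny hypothetical two-node net; it is not Netica's implementation, and `simulate_cases` and `to_case_text` are hypothetical names. All nodes are sampled even when only some become columns, since unselected ancestors still influence the selected ones, and values are hidden with probability `missing_frac` only after the whole case is drawn.

```python
import random


def simulate_cases(nodes, n, missing_frac=0.0, seed=None):
    """Forward (ancestral) sampling from a discrete Bayes net.

    nodes: list of (name, parents, states, cpt) in an order where every
           parent appears before its children; cpt maps a tuple of parent
           states to a list of probabilities over `states`.
    Each sampled value is replaced by "*" with probability missing_frac.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sample = {}
        for name, parents, states, cpt in nodes:
            # Pick the CPT row selected by the parents' already-sampled states.
            row = cpt[tuple(sample[p] for p in parents)]
            sample[name] = rng.choices(states, weights=row)[0]
        # Hide values only after the whole case is drawn, so that
        # missingness does not change which states get sampled.
        rows.append({k: ("*" if rng.random() < missing_frac else v)
                     for k, v in sample.items()})
    return rows


def to_case_text(rows, columns):
    """Write only the selected columns: a header row, then one case per line."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(row[c] for c in columns) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical two-node net: Charge -> Starts.
nodes = [
    ("Charge", [], ["good", "low"], {(): [0.8, 0.2]}),
    ("Starts", ["Charge"], ["yes", "no"],
     {("good",): [0.9, 0.1], ("low",): [0.2, 0.8]}),
]
rows = simulate_cases(nodes, 5, missing_frac=0.1, seed=42)
print(to_case_text(rows, ["Starts"]))  # only the selected node gets a column
```

Because `random()` returns values in [0, 1), `missing_frac=0` never hides a value and `missing_frac=1` hides every value, matching the 0-to-1 range of the missing-data prompt.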