1. Generating Random Cases
In section B.3 we saw how Netica can learn the conditional probabilities between nodes
in a model of the world by reading in a set of sample values that are taken at random from
that world. In this tutorial we show how you can also do the opposite: take a given Bayes net
and generate a set of sample values that follow the same probability distributions found
in the net. This process is known as simulation, or sometimes as sampling.
With Netica, drawing simulation samples from the modeled world is extremely easy.
In fact, only a few button clicks are required. The process will create a file
of sample cases. This file can then be used for learning or for another application.
Here is the procedure for generating the file:
- Compile the net (click the compile button on the toolbar).
- Select all the nodes for which you wish to have values in the case file. To do this,
either click on each of the nodes while holding down the Ctrl key on your keyboard, or drag a
rectangle around the desired nodes (or combine the two operations). Note that all of the nodes of the network are used to generate the cases, but output columns are made only for the selected ones.
- Click on Network->Simulate-Cases (Note: in the next release of Netica, this function will be moving from the Network menu to the Cases menu).
- A dialog will be raised, asking you to enter the number of cases you wish to generate. Enter a number greater than zero.
- A second dialog will be raised, a standard Windows "Save As" file dialog, asking you to choose in which directory and with what name to create the file. Choose these and click on "Save".
- Finally, a third dialog will be raised, asking you what percentage of the entries you wish to be missing data. Normally you will enter 0 for the amount of missing data, but if you want to have a case file with asterisks for some fraction of the fields, enter that fraction as a percentage (25 => one quarter).
- The file will then be written to disk. It is a standard text (ASCII) file, which you can read with any text editor. You can also press F8 to browse the individual cases and see what they would look like if entered into the net as evidence.
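To make the idea behind these steps concrete, here is a minimal Python sketch of generating cases from a net and blanking out a percentage of the entries. It is not Netica's implementation or API: the names NET, sample_case, simulate_cases and format_case_file are hypothetical, the three-node chain and its probabilities are illustrative only, and it assumes each entry is replaced by an asterisk independently with the requested probability.

```python
import random

# Hypothetical three-node chain; the probabilities are illustrative only.
# Each entry maps a node to (parent names, table), where the table maps a
# tuple of parent states to a {state: probability} distribution.
# Nodes are listed parents-first so they can be sampled in order.
NET = {
    "VisitAsia": ((), {(): {"Visit": 0.1, "No_Visit": 0.9}}),
    "Tuberculosis": (("VisitAsia",), {
        ("Visit",): {"Present": 0.3, "Absent": 0.7},
        ("No_Visit",): {"Present": 0.05, "Absent": 0.95},
    }),
    "XRay": (("Tuberculosis",), {
        ("Present",): {"Abnormal": 0.9, "Normal": 0.1},
        ("Absent",): {"Abnormal": 0.1, "Normal": 0.9},
    }),
}

def sample_case(rng):
    """Draw one case by sampling every node given its already-sampled parents."""
    case = {}
    for node, (parents, table) in NET.items():
        dist = table[tuple(case[p] for p in parents)]
        states = list(dist)
        case[node] = rng.choices(states, weights=[dist[s] for s in states])[0]
    return case

def simulate_cases(num_cases, columns, percent_missing, rng):
    """Sample all nodes, but output only the selected columns.

    Assumption: each entry is blanked independently with the given percentage.
    """
    rows = []
    for idnum in range(1, num_cases + 1):
        case = sample_case(rng)
        row = [str(idnum)]
        for col in columns:
            missing = rng.random() * 100 < percent_missing
            row.append("*" if missing else case[col])
        rows.append(row)
    return rows

def format_case_file(columns, rows):
    """Render a header line plus one tab-separated line per case."""
    lines = ["\t".join(["IDnum"] + list(columns))]
    lines += ["\t".join(row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, simulate_cases(100, ["Tuberculosis", "XRay"], 25, random.Random(0)) samples all three nodes for each of 100 cases but returns columns only for the two selected ones, with roughly a quarter of the entries shown as asterisks.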
Let us give this a try. We will use Asia again, since we are familiar with it, but the procedure is identical for any Bayes net. Suppose we select all the nodes in Asia except Dyspnea and Bronchitis, and ask for 100 cases to be generated with 25% missing data.
The resultant file will look something like this (the case file you obtain will very likely be different, since random numbers are involved; also, the column formatting may be slightly different, due to differences in tab specifications on different platforms):
// ~->[CASE-1]->~
// File created by an unlicensed user using Netica 1.12
// on Oct 21, 2002 at 10:36:22.
IDnum Tuberculosis TbOrCa Cancer VisitAsia Smoking XRay
1 Absent False Absent No_Visit Smoker Normal
2 * False Absent No_Visit * Normal
3 Absent * Absent No_Visit NonSmoker *
4 Absent False * No_Visit NonSmoker Normal
5 Absent False Absent * Smoker *
6 Absent * Absent * Smoker *
7 * False * No_Visit NonSmoker *
8 Absent False Absent No_Visit * Normal
9 * False Absent No_Visit NonSmoker Normal
10 Absent False * No_Visit Smoker Normal
...
98 Absent * * No_Visit * Normal
99 Absent False Absent No_Visit Smoker *
100 Absent False Absent No_Visit Smoker Normal
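If you want to use such a file in another application, a small reader is enough. The following Python sketch is one possible way to do it, not part of Netica: read_case_file is a hypothetical helper, and it assumes the layout shown above, namely comment lines starting with //, a header line of column names, whitespace-separated fields with no spaces inside state names, and an asterisk for a missing entry.

```python
def read_case_file(text):
    """Parse case-file text into a list of dicts; '*' becomes None."""
    header = None
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if header is None:
            header = fields  # first data line holds the column names
            continue
        if len(fields) != len(header):
            continue  # skip malformed lines, e.g. an elided "..." marker
        cases.append({
            name: (None if value == "*" else value)
            for name, value in zip(header, fields)
        })
    return cases

# The first three cases from the sample file above.
SAMPLE = """\
// ~->[CASE-1]->~
IDnum Tuberculosis TbOrCa Cancer VisitAsia Smoking XRay
1 Absent False Absent No_Visit Smoker Normal
2 * False Absent No_Visit * Normal
3 Absent * Absent No_Visit NonSmoker *
"""

cases = read_case_file(SAMPLE)
# Count missing entries, excluding the IDnum column from the total.
missing = sum(value is None for case in cases for value in case.values())
total = sum(len(case) - 1 for case in cases)
print(f"{missing} of {total} entries are missing")
```

Counting the missing entries this way is a quick check that the fraction of asterisks in a generated file is close to the percentage you requested.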
Notes on Simulation
- The sampling algorithms used are precise: the long-run frequencies of the cases will converge to the probabilities of the belief network, taking account of all findings currently entered.
- With Equations: If one or more nodes have an equation to define the relation between a node and its parents, then you may want Netica to use those equations directly to generate the random cases, instead of the probability tables which approximate the equations. In that case, don't compile the network before doing Network->Simulate-Cases.
- The sampling process will be slow if the network has an unlikely set of findings entered, because a rejection method is used.
- In the case file generated, continuous variables (whether or not they have been discretized) will have as values their continuous real number for each case, not just a state representing a range of values.
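To see why unlikely findings slow things down under a rejection method, consider the generic rejection sampler sketched below in Python. It is only an illustration of the general technique, not Netica's actual algorithm: sample_prior and rejection_sample are hypothetical names, and the two-node net and its probabilities are made up for the example.

```python
import random

def sample_prior(rng):
    """Draw one case from a made-up two-node net, ignoring any findings."""
    visit = "Visit" if rng.random() < 0.01 else "No_Visit"
    p_tb = 0.05 if visit == "Visit" else 0.01  # illustrative probabilities
    tb = "Present" if rng.random() < p_tb else "Absent"
    return {"VisitAsia": visit, "Tuberculosis": tb}

def rejection_sample(findings, num_cases, rng):
    """Keep drawing cases, discarding any that contradict the findings."""
    accepted = []
    attempts = 0
    while len(accepted) < num_cases:
        attempts += 1
        case = sample_prior(rng)
        if all(case[node] == state for node, state in findings.items()):
            accepted.append(case)
    return accepted, attempts

rng = random.Random(1)
_, attempts_none = rejection_sample({}, 50, rng)
_, attempts_rare = rejection_sample({"VisitAsia": "Visit"}, 50, rng)
print("attempts with no findings:", attempts_none)
print("attempts with an unlikely finding:", attempts_rare)
```

With no findings every draw is accepted, so the number of attempts equals the number of cases. With a finding of probability p, the expected number of attempts per accepted case is 1/p, so in this sketch a finding with probability 0.01 needs about 100 draws per case on average.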