C. Intermediate Topics

1. Generating Random Cases

In section B.3 we saw how Netica can learn the conditional probabilities between nodes in a model of the world by reading in a set of sample values taken at random from that world. In this tutorial we show how you can also do the opposite: take a given Bayes net and generate a set of sample values that follow the same probability distributions found in the net. This process is known as simulation, or sometimes as sampling. With Netica, drawing simulation samples from the modeled world takes only a few button clicks. The process creates a file of sample cases, which can then be used for learning or by another application. Here is the procedure for generating the file:

  1. Compile the net (click the compile toolbar button).
  2. Select all the nodes for which you wish to have values in the case file. To do this, either click on each of the nodes while holding down the Ctrl key, or drag a rectangle around the desired nodes (or combine the two operations). Note that all of the nodes of the network are used to generate the cases, but columns are written only for the selected ones.
  3. Click on Network->Simulate-Cases (Note: in the next release of Netica, this function will be moving from the Network menu to the Cases menu).
  4. A dialog will be raised, asking you to enter the number of cases you wish to generate. Enter a number greater than zero.
  5. A second dialog will be raised, a standard Windows "Save As" file dialog, asking you to choose in which directory and with what name to create the file. Choose these and click on "Save".
  6. Finally, a third dialog will be raised, asking what percentage of the entries should be missing data. Normally you will enter 0, but if you want a case file with asterisks in some fraction of the fields, enter that fraction as a percentage (for example, 25 for one quarter).
  7. The file will then be written to disk. It is a standard text (ASCII) file, which you can read with any text editor. You can also press F8 to browse the individual cases and see what they would look like if entered into the net as evidence.
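
For readers who want to see what the steps above amount to, here is a minimal Python sketch of the same idea outside Netica. It is not Netica's implementation. The three-node net (Smoking -> Cancer -> XRay), its probabilities, and the function names sample_case and simulate_cases are all made up for illustration, and hiding each entry independently with the requested probability is an assumption about how the missing-data percentage might be applied.

```python
import random


def sample_case(rng):
    """Draw one case by sampling each node after its parents (forward sampling).

    Hypothetical toy net: Smoking -> Cancer -> XRay.
    The probabilities are illustrative only.
    """
    smoking = "Smoker" if rng.random() < 0.5 else "NonSmoker"
    p_cancer = 0.10 if smoking == "Smoker" else 0.01
    cancer = "Present" if rng.random() < p_cancer else "Absent"
    p_abnormal = 0.98 if cancer == "Present" else 0.05
    xray = "Abnormal" if rng.random() < p_abnormal else "Normal"
    return {"Smoking": smoking, "Cancer": cancer, "XRay": xray}


def simulate_cases(num_cases, columns, missing_percent=0.0, seed=None):
    """Return case-file text: a header row plus one row per sampled case."""
    if num_cases <= 0:
        raise ValueError("num_cases must be greater than zero")
    rng = random.Random(seed)
    lines = ["\t".join(["IDnum"] + list(columns))]
    for idnum in range(1, num_cases + 1):
        case = sample_case(rng)      # every node is sampled ...
        row = [str(idnum)]
        for name in columns:         # ... but only the selected columns are written
            # Assumption: each entry is hidden independently.
            hide = rng.random() < missing_percent / 100.0
            row.append("*" if hide else case[name])
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

For example, simulate_cases(100, ["Smoking", "XRay"], missing_percent=25) samples all three nodes for each of 100 cases but writes only the Smoking and XRay columns, with roughly a quarter of the entries replaced by asterisks.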

Let us give this a try. We will use Asia again, since we are familiar with it, but the procedure is identical for any Bayes net. Suppose we select all the nodes in Asia except Dyspnea and Bronchitis, and ask for 100 cases to be generated with 25% missing data.

The resultant file will look something like this (the case file you obtain will very likely be different, since random numbers are involved; also, the column formatting may be slightly different, due to differences in tab specifications on different platforms):

// ~->[CASE-1]->~

// File created by an unlicensed user using Netica 1.12
// on Oct 21, 2002 at 10:36:22.

IDnum  Tuberculosis  TbOrCa  Cancer  VisitAsia  Smoking    XRay
1      Absent        False   Absent  No_Visit   Smoker     Normal
2      *             False   Absent  No_Visit   *          Normal
3      Absent        *       Absent  No_Visit   NonSmoker  *
4      Absent        False   *       No_Visit   NonSmoker  Normal
5      Absent        False   Absent  *          Smoker     *
6      Absent        *       Absent  *          Smoker     *
7      *             False   *       No_Visit   NonSmoker  *
8      Absent        False   Absent  No_Visit   *          Normal
9      *             False   Absent  No_Visit   NonSmoker  Normal
10     Absent        False   *       No_Visit   Smoker     Normal
  ...
98     Absent        *       *       No_Visit   *          Normal
99     Absent        False   Absent  No_Visit   Smoker     *
100    Absent        False   Absent  No_Visit   Smoker     Normal
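
If the file is to be used by another application, it has to be read back in. The following is a minimal Python sketch based only on what the listing above shows: comment lines starting with //, a header row, whitespace-separated columns, and * for a missing entry. The function name parse_case_file is hypothetical, and the sketch assumes that state names contain no spaces, as is the case in this listing.

```python
def parse_case_file(text):
    """Parse case-file text into a list of dicts; '*' becomes None (missing)."""
    header = None
    cases = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and '//' comment lines.
        if not stripped or stripped.startswith("//"):
            continue
        fields = stripped.split()  # columns are separated by tabs or spaces
        if header is None:
            header = fields        # first data line holds the column names
            continue
        if len(fields) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(fields)}: {line!r}"
            )
        cases.append({name: (None if value == "*" else value)
                      for name, value in zip(header, fields)})
    return cases


# The first two cases from the listing above.
sample = """\
// ~->[CASE-1]->~
IDnum  Tuberculosis  TbOrCa  Cancer  VisitAsia  Smoking    XRay
1      Absent        False   Absent  No_Visit   Smoker     Normal
2      *             False   Absent  No_Visit   *          Normal
"""
cases = parse_case_file(sample)
# Count how many entries are missing across all parsed cases.
missing = sum(value is None for case in cases for value in case.values())
```

Representing a missing entry as None keeps it distinct from every real state name, so later code can decide for itself whether to skip incomplete cases or treat them specially.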

 


Notes on Simulation

  • The sampling algorithms used are exact: the long-run frequencies of the generated cases converge to the probabilities of the belief network, taking into account all findings currently entered.
  • With Equations: If one or more nodes have an equation to define the relation between a node and its parents, then you may want Netica to use those equations directly to generate the random cases, instead of the probability tables which approximate the equations. In that case, don't compile the network before doing Network->Simulate-Cases.
  • The sampling process will be slow if the network has an unlikely set of findings entered, because a rejection method is used.
  • In the generated case file, continuous variables (whether or not they have been discretized) take their continuous real-number value for each case, not just a state representing a range of values.
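
To illustrate why unlikely findings slow things down, here is a minimal Python sketch of generic rejection sampling. The notes above say only that "a rejection method is used", so this should not be read as Netica's actual algorithm. The two-node net (Cancer -> XRay), its probabilities, and the function names are hypothetical.

```python
import random


def sample_case(rng):
    """Forward-sample a hypothetical two-node net: Cancer -> XRay."""
    cancer = "Present" if rng.random() < 0.05 else "Absent"
    p_abnormal = 0.98 if cancer == "Present" else 0.05
    xray = "Abnormal" if rng.random() < p_abnormal else "Normal"
    return {"Cancer": cancer, "XRay": xray}


def rejection_sample(findings, num_cases, seed=None):
    """Keep drawing cases until num_cases of them agree with the findings.

    Returns the accepted cases and the total number of draws needed.
    """
    rng = random.Random(seed)
    accepted, draws = [], 0
    while len(accepted) < num_cases:
        draws += 1
        case = sample_case(rng)
        # Discard any case that contradicts one of the findings.
        if all(case[node] == state for node, state in findings.items()):
            accepted.append(case)
    return accepted, draws


cases, draws = rejection_sample({"XRay": "Abnormal"}, 100, seed=0)
```

In this toy net the finding XRay = Abnormal has probability 0.05 × 0.98 + 0.95 × 0.05 = 0.0965, so on average about 1 / 0.0965 ≈ 10 draws are needed per accepted case. The less likely the findings, the more draws are thrown away.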