BERKELEY ADMISSIONS

This net illustrates "Simpson's paradox" using a classic dataset on admissions to the graduate departments of the University of California at Berkeley in 1973.

The primary question is whether there is gender bias in the admissions policy.

Aggregated data is shown below:

```
         |      male     female |     Total
---------+----------------------+----------
  accept |      1198        557 |      1755
         |     44.5%      30.4% |     38.8%
---------+----------------------+----------
  reject |      1493       1278 |      2771
         |     55.5%      69.6% |     61.2%
---------+----------------------+----------
   Total |      2691       1835 |      4526
```
(from Bickel, P.J., E.A. Hammel and J.W. O'Connell (1975) "Sex bias in graduate admissions: Data from Berkeley" in Science, 187:398-403)

It appears that there is a bias since the acceptance rate for males is 44.5%, while for females it is only 30.4%.
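The percentages in the table can be checked directly from the counts. The following is a minimal sketch in Python; the variable and function names are illustrative only and are not part of the net or the case file.

```python
# Aggregate counts copied from the table above: gender -> (accepted, rejected).
counts = {"male": (1198, 1493), "female": (557, 1278)}


def acceptance_rate(accepted, rejected):
    """Fraction of applicants who were accepted."""
    return accepted / (accepted + rejected)


# Acceptance rate for each gender.
rates = {g: acceptance_rate(a, r) for g, (a, r) in counts.items()}

# Overall acceptance rate across both genders.
total_accepted = sum(a for a, _ in counts.values())
total_applicants = sum(a + r for a, r in counts.values())
overall = total_accepted / total_applicants

for g, rate in rates.items():
    print(f"{g:>7}: {rate:.1%}")
print(f"overall: {overall:.1%}")
```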

To observe this in the Bayes net, select all the nodes and choose Table -> Remove (if it is grayed out, the tables are already removed). Then choose Cases -> Incorp Case File, and select the file "Berkeley Admissions.cas". It contains the data in the table below, which is the same as the data above but broken down by university department (you can view the file by choosing File -> Open As Text):

```
--------+-------------------------
        | ---male---    --female--
   Dept |  acc   rej     acc   rej
--------+-------------------------
      1 |  512   313      89    19
      2 |  353   207      17     8
      3 |  120   205     202   391
      4 |  138   279     131   244
      5 |   53   138      94   299
      6 |   22   351      24   317
        |-------------------------
  Total | 1198  1493     557  1278
--------+-------------------------
```

Then choose Table -> Harden and accept the default degree of 1. This indicates that the dataset is complete, so the Bayes net should not add any uncertainty for unobserved cases. Then choose Network -> Compile (if it is grayed out, the net is already compiled).

You can now read the probabilities of various occurrences from the net. The overall probability of being admitted is 38.8%. If you enter a finding of "male", that probability jumps to 44.5%, and if you enter "female" it drops to 30.4%, as we observed above in the table of aggregate data.

But now suppose we want to examine each of the departments individually, to see which ones have the most bias. Try entering a finding for each of the departments, and for each try a finding of male and of female, and see the probability of being admitted.

You may be surprised that for all but two departments (3 and 5), females have a better chance of being admitted than males, and in those two the rates are quite close. In fact, by changing the numbers slightly we could arrange for the female acceptance rate to be higher in every department, and yet still have the male rate be higher in the aggregation over all departments.
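The department-by-department comparison can also be reproduced outside the net. This is a minimal Python sketch using the counts from the department table; the names `data` and `male_higher` are illustrative assumptions, not anything defined by the net.

```python
# Rows from the department table:
# dept -> (male accepted, male rejected, female accepted, female rejected)
data = {
    1: (512, 313, 89, 19),
    2: (353, 207, 17, 8),
    3: (120, 205, 202, 391),
    4: (138, 279, 131, 244),
    5: (53, 138, 94, 299),
    6: (22, 351, 24, 317),
}

# Departments in which the male acceptance rate exceeds the female rate.
male_higher = []
for dept, (ma, mr, fa, fr) in data.items():
    male_rate = ma / (ma + mr)
    female_rate = fa / (fa + fr)
    if male_rate > female_rate:
        male_higher.append(dept)
    print(f"dept {dept}: male {male_rate:.1%}  female {female_rate:.1%}")

print("departments where the male rate is higher:", male_higher)
```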

The reason is that, while there is a causal influence from node Gender to node Admitted, there is a stronger one that goes from Gender through Department to Admitted. In other words, it is mainly the choice of department that determines the acceptance rate, and females are more likely to choose the departments with low acceptance rates.
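To make the influence through Department concrete, the sketch below compares each department's overall acceptance rate with the share of male and female applicants who applied there. It is an illustrative Python calculation on the department table only; picking the "two easiest departments" as a summary is an assumption made for this example.

```python
# Rows from the department table:
# dept -> (male accepted, male rejected, female accepted, female rejected)
data = {
    1: (512, 313, 89, 19),
    2: (353, 207, 17, 8),
    3: (120, 205, 202, 391),
    4: (138, 279, 131, 244),
    5: (53, 138, 94, 299),
    6: (22, 351, 24, 317),
}

# Overall acceptance rate of each department, ignoring gender.
dept_rate = {d: (ma + fa) / (ma + mr + fa + fr)
             for d, (ma, mr, fa, fr) in data.items()}

male_total = sum(ma + mr for ma, mr, _, _ in data.values())
female_total = sum(fa + fr for _, _, fa, fr in data.values())

# Fraction of each gender's applications that went to each department.
male_share = {d: (ma + mr) / male_total for d, (ma, mr, _, _) in data.items()}
female_share = {d: (fa + fr) / female_total for d, (_, _, fa, fr) in data.items()}

for d in sorted(data, key=dept_rate.get, reverse=True):
    print(f"dept {d}: accepts {dept_rate[d]:.1%} | "
          f"{male_share[d]:.1%} of males, {female_share[d]:.1%} of females applied")

# The two departments with the highest overall acceptance rates.
easiest = sorted(data, key=dept_rate.get, reverse=True)[:2]
male_easy = sum(male_share[d] for d in easiest)
female_easy = sum(female_share[d] for d in easiest)
print(f"applied to departments {easiest}: "
      f"male {male_easy:.1%}, female {female_easy:.1%}")
```

In this dataset, more than half of the male applicants applied to the two departments with the highest acceptance rates, compared with under a tenth of the female applicants.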

To see how fair the admissions policy is, we will consider a new individual (we will try both male and female) whose department is selected based only on the overall probabilities for the different departments (not on the individual's gender), and see what that individual's chance of acceptance is.

Select the Gender and Admitted nodes and, while holding down the Ctrl key, drag one of them to the right and slightly down. This creates the nodes for a new individual whose gender does not affect the choice of department (notice there is no link from Gender1 to Department).

Compile the net, and try entering findings of "male" and "female" for that new individual. You will see that the probability of a male being accepted is only 38.7%, while for a female it is 43%.
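These two figures correspond to weighting each department's gender-specific acceptance rate by that department's share of all applicants, so that gender no longer influences the choice of department. The Python sketch below performs that calculation by hand from the department table. It is an illustration of the arithmetic under that assumption, not a description of how the Bayes net software computes it internally, and the names `p_dept`, `p_admit` and `adjusted` are made up for this example.

```python
# Rows from the department table:
# dept -> (male accepted, male rejected, female accepted, female rejected)
data = {
    1: (512, 313, 89, 19),
    2: (353, 207, 17, 8),
    3: (120, 205, 202, 391),
    4: (138, 279, 131, 244),
    5: (53, 138, 94, 299),
    6: (22, 351, 24, 317),
}

total = sum(sum(row) for row in data.values())

# P(Department): share of all applicants in each department, regardless of gender.
p_dept = {d: sum(row) / total for d, row in data.items()}

# P(Admitted | Gender, Department): acceptance rate per gender within each department.
p_admit = {
    d: {"male": ma / (ma + mr), "female": fa / (fa + fr)}
    for d, (ma, mr, fa, fr) in data.items()
}

# Acceptance probability when the department is drawn from P(Department)
# instead of from the gender-specific department preferences.
adjusted = {
    g: sum(p_dept[d] * p_admit[d][g] for d in data)
    for g in ("male", "female")
}

for g, p in adjusted.items():
    print(f"{g:>7}: {p:.1%}")
```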

So, by this dataset, the admissions policy actually has a gender bias against males (further analysis reveals it is only in department 1). In the short run, the university may want to change its admissions policy to correct for that. In the longer run, the university may want to reallocate resources among the departments, so that departments that females favour can be larger and have a higher acceptance rate. Making those decisions may involve many other criteria, but an analysis such as this one shows that it would be a mistake to simply change the admissions policy to account for the perceived inequity mentioned at the beginning.

To read more about explanations of Simpson's paradox using causal Bayes nets, see Judea Pearl's (2000) book "Causality", page 174.

Thanks go to Jack King for demonstrating Simpson's paradox using a Bayes net on this problem. You can read about it in his book: King, Jack (2001) Operational Risk: Measurement and Modeling, Wiley.

The literature section of http://www.norsys.com/resources.htm has more information on the book. He uses a non-standard version of the dataset, so the numbers obtained in his book are different.
Copyright 2002 Norsys Software Corp.