Calculating a joint probability distribution


Post by jonnie »

I have a Bayesian network whose nodes are designated as disease nodes, symptom nodes, or auxiliary nodes. Given evidence on some of the symptoms, I want to calculate the *joint* probability distribution of the disease nodes. That means not the diseases' marginals but the probabilities of their combinations. I am only interested in "reasonable" combinations of at most n diseases (take n=3, for example), which limits the combinatorial explosion a little.
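To get a feel for how much the "at most n diseases" limit helps, here is a small Python sketch that enumerates those combinations and counts them. It assumes binary (present/absent) disease nodes and treats a "combination" as an exact assignment: the listed diseases present, all others absent. The function names and the 20-disease example are hypothetical and not part of SMILE.

```python
from itertools import combinations
from math import comb


def disease_combinations(diseases, n):
    """Yield each set of at most n present diseases (all others absent),
    starting with the empty set, i.e. no disease present."""
    for size in range(n + 1):
        for combo in combinations(diseases, size):
            yield frozenset(combo)


def count_combinations(k, n):
    """Number of such sets for k diseases: C(k, 0) + C(k, 1) + ... + C(k, n)."""
    return sum(comb(k, i) for i in range(n + 1))


diseases = [f"D{i}" for i in range(1, 21)]  # hypothetical: 20 disease nodes
combos = list(disease_combinations(diseases, 3))
print(len(combos), count_combinations(20, 3))  # 1351 1351
print(2 ** len(diseases))                      # 1048576 unrestricted assignments
```

With 20 diseases and n=3 this gives 1,351 combinations instead of 2^20 = 1,048,576, so the limit helps a lot, although the count still grows roughly like k^n.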
What would be the best way to do this in Smile?
Three options come to my mind:
  1. Work with the probability of evidence. Calculate the probability of the symptoms, P(S). Then, for each combination D I'm interested in, set the combination as additional evidence and calculate P(S,D). The probability of the disease combination given the symptoms is then P(D|S) = P(S,D) / P(S).
  2. Add a "join node" to the network, with all the disease nodes as its parents, one state for each combination, and a deterministic definition.
  3. Do it manually somehow (maybe there's a method that's faster than anything implemented in Smile)...
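Option (1) can be illustrated without SMILE on a toy network small enough to enumerate. This is only an illustration under stated assumptions: the network (three binary diseases, two binary noisy-OR symptoms) and all its numbers are made up, and `prob_of_evidence` is a hypothetical brute-force stand-in for the probability-of-evidence computation an inference engine would provide, not a SMILE function.

```python
from itertools import product

# Hypothetical toy network: three binary disease nodes with these priors ...
PRIORS = {"D1": 0.10, "D2": 0.05, "D3": 0.20}
# ... and two binary symptom nodes with noisy-OR CPTs: (leak, {parent: link prob}).
SYMPTOMS = {
    "S1": (0.01, {"D1": 0.8, "D2": 0.6}),
    "S2": (0.02, {"D2": 0.7, "D3": 0.5}),
}
NODES = list(PRIORS) + list(SYMPTOMS)


def joint(a):
    """P(a) for a full assignment a (node -> bool), as the product of the CPTs."""
    p = 1.0
    for d, prior in PRIORS.items():
        p *= prior if a[d] else (1 - prior)
    for s, (leak, links) in SYMPTOMS.items():
        p_absent = 1 - leak
        for d, link in links.items():
            if a[d]:
                p_absent *= 1 - link
        p *= (1 - p_absent) if a[s] else p_absent
    return p


def prob_of_evidence(evidence):
    """P(evidence): sum of the joint over all assignments consistent with it."""
    total = 0.0
    for values in product([False, True], repeat=len(NODES)):
        a = dict(zip(NODES, values))
        if all(a[node] == value for node, value in evidence.items()):
            total += joint(a)
    return total


symptoms = {"S1": True, "S2": True}
p_s = prob_of_evidence(symptoms)  # P(S), computed once

# Option (1): enter each disease combination D as extra evidence, then divide.
posterior = {}
for values in product([False, True], repeat=len(PRIORS)):
    combo = dict(zip(PRIORS, values))
    posterior[values] = prob_of_evidence({**symptoms, **combo}) / p_s

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

Brute-force enumeration is exponential in the number of nodes, so here it only serves to check the arithmetic. On a real network, P(S) and each P(S,D) would come from the inference engine, which means one inference run for P(S) plus one per combination.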
What are the advantages and disadvantages of each approach? Or would they all need about the same amount of computation?
I know BNs are actually designed to avoid JPDs (joint probability distributions)... but I nevertheless need to calculate one!
Any comments are very much appreciated :)
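A rough sketch of the bookkeeping behind option (2): a deterministic join node whose states are the combinations of at most n present diseases. Because only those combinations are wanted, this version assumes an extra catch-all "other" state for parent configurations with more than n diseases present. The helper names are hypothetical, and the code builds a plain Python mapping, not a SMILE node definition.

```python
from itertools import combinations, product
from math import comb


def join_node_states(diseases, n):
    """Hypothetical join-node states: one per set of at most n present
    diseases, plus a catch-all 'other' state for everything else."""
    states = [frozenset(c) for size in range(n + 1)
              for c in combinations(diseases, size)]
    return states + ["other"]


def deterministic_cpt(diseases, n):
    """Map every parent configuration (tuple of bools, one per disease)
    to the index of the single join state it selects with probability 1."""
    states = join_node_states(diseases, n)
    index = {s: i for i, s in enumerate(states)}
    cpt = {}
    for config in product([False, True], repeat=len(diseases)):
        present = frozenset(d for d, on in zip(diseases, config) if on)
        cpt[config] = index.get(present, index["other"])
    return states, cpt


def dense_cpt_entries(k, n):
    """Entries in a dense CPT: 2**k parent configurations times the state count."""
    return 2 ** k * (sum(comb(k, i) for i in range(n + 1)) + 1)


states, cpt = deterministic_cpt(["D1", "D2", "D3", "D4"], 2)
print(len(states), len(cpt))     # 12 16
print(dense_cpt_entries(20, 3))  # 1417674752
```

Even with the n-limit, the parent configurations still number 2**k, so a dense table for 20 diseases and n=3 would hold 2**20 * 1352 = 1,417,674,752 entries. In addition, because the join node has every disease as a parent, moralization connects all the diseases, so a junction-tree algorithm needs a clique containing the join node together with all the disease nodes.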

Post by marek [BayesFusion] »

It seems to me that both (1) and (2) will work. I am somewhat worried about (2), as the number of states in that common child grows exponentially with the number of diseases. It should, however, work for a small number of diseases. As for (3), here is a procedure that you may want to consider:

i. set evidence for all symptoms S
ii. calculate P(D1|S), i.e., the posterior of the first disease, D1, given the symptoms S
iii. add D1, in the state it takes in the combination, to the evidence set and calculate P(D2|D1,S), the posterior of the second disease, D2, given the symptoms and the first disease
iv. calculate P(D1,D2|S) = P(D2|D1,S) P(D1|S).

Repeat this process, adding one disease at a time to the evidence, until you have exhausted the diseases in the combination.
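These steps apply the chain rule, P(D1,...,Dk|S) = P(D1|S) P(D2|D1,S) ... P(Dk|D1,...,Dk-1,S). The sketch below checks that on a made-up toy network, again using brute-force enumeration as a stand-in for belief updating in SMILE; the network, its numbers, and the function names are all hypothetical.

```python
from itertools import product

# Hypothetical toy network, a stand-in for a real model.
PRIORS = {"D1": 0.10, "D2": 0.05, "D3": 0.20}
SYMPTOMS = {
    "S1": (0.01, {"D1": 0.8, "D2": 0.6}),  # (leak, noisy-OR link probabilities)
    "S2": (0.02, {"D2": 0.7, "D3": 0.5}),
}
NODES = list(PRIORS) + list(SYMPTOMS)


def joint(a):
    """P(a) for a full assignment a (node -> bool), as the product of the CPTs."""
    p = 1.0
    for d, prior in PRIORS.items():
        p *= prior if a[d] else (1 - prior)
    for s, (leak, links) in SYMPTOMS.items():
        p_absent = 1 - leak
        for d, link in links.items():
            if a[d]:
                p_absent *= 1 - link
        p *= (1 - p_absent) if a[s] else p_absent
    return p


def prob_of_evidence(evidence):
    """P(evidence): sum of the joint over all assignments consistent with it."""
    total = 0.0
    for values in product([False, True], repeat=len(NODES)):
        a = dict(zip(NODES, values))
        if all(a[node] == value for node, value in evidence.items()):
            total += joint(a)
    return total


def posterior(node, value, evidence):
    """P(node = value | evidence), the quantity read off after a belief update."""
    return prob_of_evidence({**evidence, node: value}) / prob_of_evidence(evidence)


def chain_rule_joint(combo, symptoms):
    """P(D1=d1, ..., Dk=dk | S) following steps i-iv: read off a posterior,
    then add that disease to the evidence before moving to the next one."""
    evidence = dict(symptoms)                  # step i: symptoms as evidence
    p = 1.0
    for node, value in combo.items():
        p *= posterior(node, value, evidence)  # P(Dj | D1..Dj-1, S)
        evidence[node] = value                 # step iii: add Dj to the evidence
    return p


symptoms = {"S1": True, "S2": True}
combo = {"D1": True, "D2": False, "D3": True}
via_chain = chain_rule_joint(combo, symptoms)
via_ratio = prob_of_evidence({**symptoms, **combo}) / prob_of_evidence(symptoms)
print(abs(via_chain - via_ratio) < 1e-12)  # True: both equal P(D | S)
```

The product telescopes to P(S,D)/P(S), which is why this procedure and option (1) end up computing the same number.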
I'm not sure which is computationally more efficient, (1) or (3). They seem to be of comparable complexity.
I hope this helps.
Cheers,

Marek