Multiple parents - performance issue

orzech
Posts: 51
Joined: Wed Aug 04, 2010 11:40 pm


Post by orzech »

Hello,

I am quite new to GeNIe and I have to admit that I love it. :) Nonetheless, I was developing a diagnostic network when I came across a serious problem (probably related to my lack of knowledge of BNs rather than of GeNIe itself).

In my model, some two-outcome Noisy-OR symptom nodes have a large number of parents (about 25 - 'fever', for example, is related to many diseases). Since GeNIe internally stores Noisy-ORs as CPTs, such a node is defined by roughly 2^25 doubles, which is far too much for GeNIe to handle. The application actually freezes on the call to node->CiToCPT() when setting the node's definition.
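A quick back-of-the-envelope check (pure Python, not SMILE code; assuming all nodes are binary and probabilities are stored as 8-byte doubles) of why 25 parents blows up:

```python
# CPT size for a child node as a function of parent count.
# A binary child with n binary parents has 2**n parent configurations,
# each holding 2 outcome probabilities, stored as 8-byte doubles.
def cpt_bytes(n_parents: int, outcomes: int = 2) -> int:
    return (outcomes ** n_parents) * outcomes * 8

print(cpt_bytes(10))           # 16384 bytes - easily manageable
print(cpt_bytes(25) / 2**30)   # 0.5 GiB - the table CiToCPT() tries to build
```

So around ten parents is fine, but each extra parent doubles the table, and at 25 parents the conversion to a full CPT needs half a gigabyte for a single node.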

So here is my question - is there anything I can do about this situation? Perhaps I should use a particular algorithm or set some options in SMILE? I suspect not, but I ask in case I've missed something important.

My current workaround is to 'split' symptom nodes that have a large number of parents: once a node reaches 10 parents, it gets duplicated, and the remaining parents point to the new node instead, so each symptom node has at most 10 parents. Then, whenever I set evidence on a symptom node, I also set the equivalent evidence on all of its 'duplicates'. I hope this simple solution is clear. Now, my question - does this method make any sense, or are there other techniques I could use here? I've compared the results in 'test view' between the 'duplicated' and 'non-duplicated' versions of the BN and the results *are* different - but maybe still reasonable?
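For what it's worth, here is a tiny pure-Python sketch (made-up priors and link strengths, no leak term, not GeNIe/SMILE code) of why the split version gives different results. With the symptom duplicated so that each copy sees only its own diseases, the diseases stay independent given the evidence, so the 'explaining away' effect of the original model is lost:

```python
from itertools import product

# Toy model: two independent binary diseases D1, D2, one Noisy-OR symptom.
prior = {"D1": 0.10, "D2": 0.10}    # P(disease present) - made-up numbers
strength = {"D1": 0.8, "D2": 0.8}   # P(symptom | that disease alone), no leak

def noisy_or(active):
    """P(symptom present | set of active parent diseases)."""
    p_absent = 1.0
    for d in active:
        p_absent *= 1.0 - strength[d]
    return 1.0 - p_absent

def posterior_d1_full():
    """P(D1 present | S present) in the original single-symptom model."""
    num = den = 0.0
    for d1, d2 in product([0, 1], repeat=2):
        active = [n for n, v in (("D1", d1), ("D2", d2)) if v]
        p = (prior["D1"] if d1 else 1 - prior["D1"]) * \
            (prior["D2"] if d2 else 1 - prior["D2"]) * noisy_or(active)
        den += p
        if d1:
            num += p
    return num / den

def posterior_d1_split():
    """P(D1 present | S1 present, S2 present) after splitting the symptom:
    S1 depends only on D1, S2 only on D2. Observing S2 now says nothing
    about D1, and with no leak S1 is impossible unless D1 is present."""
    num = prior["D1"] * strength["D1"]
    den = num + (1 - prior["D1"]) * 0.0   # S1 cannot fire without D1
    return num / den

print(posterior_d1_full())   # well below 1.0: D2 partly explains the symptom
print(posterior_d1_split())  # exactly 1.0: the coupling between diseases is lost
```

In the original model, observing the symptom requires *some* cause to be active; in the split model, it requires a cause *in every group*, which is a genuinely different query - hence the different 'test view' results.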

I'd greatly appreciate your help and any comment.
Thanks in advance
adam
Posts: 25
Joined: Thu Jan 17, 2008 11:01 pm
Location: Shrivenham, UK
Contact:

Post by adam »

Hi,

In general, there is a nice solution to your problem. The noisy-OR is in fact a decomposable model, which means that you can decompose its large CPT into a series of nodes with much smaller CPTs. The easiest way to do this is to represent the noisy-OR as a ladder. You can find an explanation of how to do this in this paper: http://zagorecki.com/web_documents/flairs.pdf (2nd page, 2nd column). The paper actually exploits this idea for models other than the noisy-OR as well.
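The equivalence is easy to verify numerically. A minimal sketch (pure Python, not SMILE code; assuming binary parents and no leak term) comparing the monolithic noisy-OR with a ladder of two-parent noisy-ORs, where each rung's link to the previous rung is deterministic (strength 1):

```python
from itertools import product

def full_noisy_or(strengths, x):
    """P(child = 1 | parent states x) for one big Noisy-OR node (no leak)."""
    p_absent = 1.0
    for s, xi in zip(strengths, x):
        if xi:
            p_absent *= 1.0 - s
    return 1.0 - p_absent

def ladder_noisy_or(strengths, x):
    """Same distribution via a chain of two-parent Noisy-ORs:
    A1 = NoisyOR(X1, X2), then A_k = NoisyOR(A_{k-1}, X_{k+1}) with the
    link from the previous rung having strength 1. Each rung's CPT covers
    only 2**2 parent configurations instead of 2**n."""
    # P(A1 = 1)
    p = 1.0 - (1.0 - strengths[0] * x[0]) * (1.0 - strengths[1] * x[1])
    for s, xi in zip(strengths[2:], x[2:]):
        # Marginalize out the previous rung:
        # P(A_k = 1) = 1 - (1 - P(A_{k-1} = 1)) * (1 - s * x)
        p = 1.0 - (1.0 - p) * (1.0 - s * xi)
    return p

strengths = [0.9, 0.7, 0.5, 0.3, 0.2]
for x in product([0, 1], repeat=len(strengths)):
    assert abs(full_noisy_or(strengths, x) - ladder_noisy_or(strengths, x)) < 1e-12
print("ladder matches the monolithic Noisy-OR on all parent configurations")
```

In GeNIe/SMILE the rungs would be auxiliary two-parent Noisy-OR nodes, so the largest table the engine ever has to build is constant-sized regardless of how many diseases feed the symptom.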

I've worked with large diagnostic BNs, and this type of decomposition produced immense improvements in inference speed. Not to mention that it makes some models tractable in the first place (as will likely be the case for your model).

Theoretically, it's even better to use a tree rather than a ladder, but in my experiments it did not make much difference in practice. I may have some SMILE code that does the decompositions, if you are interested.

But... this approach is not 100% bulletproof. If there are loops in the model, there may be situations where the decomposition does not quite work. And there are some more tricks you can do to improve inference speed. Let me know if you want to explore the problem further.

Cheers,
Adam
orzech
Posts: 51
Joined: Wed Aug 04, 2010 11:40 pm

Post by orzech »

Hi,

I cannot say how thankful I am. :) I think that's *exactly* what I've been looking for. May I send you an e-mail if I run into problems with the decomposition? In that case, we could use our native language. ;)

Regards
adam
Posts: 25
Joined: Thu Jan 17, 2008 11:01 pm
Location: Shrivenham, UK
Contact:

Post by adam »

Sure, no problem. My email is zagorecki@gmail.com.