Hi!
What is the computational complexity of having multiple nodes with small CPTs (and the target set only on the top node) versus one single node with a large CPT?
I often have to choose between making a single node with few inputs or merging them together into one node with many inputs. The first option is easier to quantify, but would there be an advantage to marginalizing away the nodes after the elicitation process? I imagine that looking up a value in a CPT should be extremely quick, but the size of the CPT grows very quickly with the number of inputs, so I imagine there would be some memory trouble at some point?
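The growth can be sketched with a quick back-of-the-envelope calculation (plain Python for illustration, not SMILE code; the state counts are made-up examples):

```python
def cpt_size(num_parents, parent_states=2, node_states=2):
    """Number of probability entries in a CPT: one row of node_states
    values for every combination of parent states."""
    return (parent_states ** num_parents) * node_states

# Growth for all-binary nodes, assuming 8 bytes per stored double:
for n in (2, 10, 20, 30):
    entries = cpt_size(n)
    print(f"{n:2d} parents: {entries:>13,} entries, {entries * 8 / 1e9:.2f} GB")
```

With 30 binary parents the single table already needs roughly 17 GB, which is why merging many inputs into one node stops being feasible fairly quickly.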
I have set the target only on the nodes that are of interest (those that would not have been marginalized away).
An additional note: in my current project almost all nodes have 0/1 probabilities. Does this change anything?
Complexity of large nodes vs many nodes

 Site Admin
 Posts: 350
 Joined: Tue Dec 11, 2007 4:24 pm
Re: Complexity of large nodes vs many nodes
Hi Sverre,
I don't believe I can answer your question in general. My feeling is that the network with individual nodes will be better, but to make an informed judgment I would have to see and examine the two networks. You mention ease of getting the numbers. This is a very important factor, as speed of inference is not a terribly big issue these days, but knowledge engineering is and will remain a key issue. One question is, of course, which model is semantically correct, and that depends on the domain.
With respect to zeros and ones, it may change something, because SMILE propagates evidence through 0s and 1s in CPTs. I would like to caution you against 0s and 1s in CPTs: Bayes' theorem does not like zero probabilities. Once a probability is zero, it can never change to anything else, no matter how strong the evidence against it. If an event is possible at all, no matter how unlikely, I would suggest using some small number, say 1 in a million, rather than zero. Of course, sometimes zero is justified.
I hope this helps,
Marek
Re: Complexity of large nodes vs many nodes
Hi Marek, thanks for the reply,
Regarding the 0/1 probabilities, they have worked well for me. The network evaluation will fail if it is impossible to explain the observed evidence, but this is as expected. I see the danger of division by 0, but have not experienced any problems yet. Anything I should look out for?
I'm using the network for real-time applications (high-level decision-making for robotics), making speed an important issue.
In my case, I have some leaf nodes with a probability distribution. The rest of my network consists of logic gates such as "and", "or", "max" (whichever input is in the "largest" state), and "less than" (whether input 1 has a lower state than input 2). These are modeled as CPTs with 0/1 probabilities. There are quite a few leaf nodes and logical expressions in my model. One way of modeling this is to have a new node for each logical operator (two input nodes); another way would be to have one node with all the leaf nodes as inputs and then a giant CPT with the entire logical expression written out with 0/1 probabilities. Of course, this CPT would not be made by hand, but with equation nodes and then discretizing, or by marginalizing away intermediate nodes.
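The merging step can be sketched in plain Python (not SMILE code): composing two deterministic two-input gates and marginalizing away the intermediate node yields one larger table over all three leaves.

```python
from itertools import product

def and_cpt():
    """Deterministic two-input AND gate: P(out=1 | a, b) is always 0 or 1."""
    return {(a, b): int(a and b) for a, b in product((0, 1), repeat=2)}

def merge(gate1, gate2):
    """Marginalize away the intermediate node of gate2(gate1(a, b), c).
    The merged CPT is indexed by all three leaves: 2**3 rows instead of
    two separate tables of 2**2 rows each."""
    return {(a, b, c): gate2[(gate1[(a, b)], c)]
            for a, b, c in product((0, 1), repeat=3)}

big = merge(and_cpt(), and_cpt())   # encodes (a AND b) AND c in one table
```

With deterministic gates the merged table stays 0/1, but its row count doubles with every additional leaf that gets pulled in as a direct parent.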
This network is dynamic in time. Ideally, a new time step is added to the DBN for each observation made by the robotic system. Increasing the number of time steps quickly leads to long computation times, which won't work for real-time use. I therefore have to use a combination of limiting the saved observations to the most important ones, forgetting very old observations, and reducing the precision of the approximate solver.
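The "forgetting very old observations" part can be sketched as a fixed-lag evidence window (plain Python; the class name and evidence format are made up for illustration, this is not SMILE API):

```python
from collections import deque

class EvidenceWindow:
    """Keep only the most recent max_steps observations; anything older
    falls off the back of the deque and is forgotten automatically."""
    def __init__(self, max_steps):
        self.window = deque(maxlen=max_steps)

    def observe(self, evidence):
        self.window.append(evidence)

    def slices(self):
        """Evidence to assert on the unrolled DBN, oldest first."""
        return list(self.window)

w = EvidenceWindow(max_steps=3)
for t in range(5):
    w.observe({"sensor": t})
# only observations from t = 2, 3, 4 remain to be asserted
```

Bounding the window bounds the number of unrolled time slices, and hence the inference time per update.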
So if it's possible to affect the computation time by restructuring the network (e.g., reducing the number of nodes but increasing the size of the CPTs), then that would be very helpful. Kind of like compiling the network to a faster but less readable format.
I know there is no way for you to predict exactly what would result in the best computational time, but perhaps you have some guidelines or ideas on what affects the computations?
Is it detrimental to have very large CPTs? As the size of the CPTs grows combinatorially with the number of inputs, they quickly become very large when I marginalize away nodes. I guess this will be a problem at some point, but when does it start to have an effect?
Is the computational time affected by the number of nodes? In this case, would it help to reduce the number of nodes?
Or would this not affect anything as long as the same nodes are marked as target nodes?
Best regards, and happy holidays!
Sverre

Re: Complexity of large nodes vs many nodes
Hi Sverre,
Zeros are not dangerous because of the possibility of division by zero, but rather multiplication. In Bayes' theorem, the posterior is derived by multiplying the prior by an expression that represents the strength of the evidence. If the prior is zero, then the posterior HAS TO be zero. For example, suppose you say that there are no ghosts and your probability of the existence of ghosts is zero. Even if a ghost walks into your office through a wall, passes through you, and hits you on the head, the evidence is very strong: you can see it, you can feel it, and your head is aching. Yet if you are a Bayesian and want to process the evidence correctly, you have no choice but to say to the ghost, "I'm sorry, Sir, but you don't exist" :). Does this make it clearer? The lesson from this is that you should use probability zero with caution, only when you are absolutely, positively sure that it is zero. In many domains, for example in medicine, there are no impossible events.
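The absorbing effect of a zero prior can be shown numerically (a minimal sketch of repeated Bayesian updating; the likelihood values are made-up examples):

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One step of Bayes' theorem: P(H|e) = P(e|H) P(H) / P(e)."""
    num = p_e_given_h * prior
    den = num + p_e_given_not_h * (1.0 - prior)
    return num / den if den > 0 else 0.0

# A zero prior is absorbing: no amount of evidence moves it.
p = 0.0
for _ in range(10):
    p = update(p, 0.999, 0.001)   # evidence favoring H at 999:1
# p is still exactly 0.0

# A tiny but nonzero prior recovers after a few strong observations.
q = 1e-6
for _ in range(3):
    q = update(q, 0.999, 0.001)
# q is now close to 1
```

This is exactly the "1 in a million instead of zero" advice: the tiny prior starts out negligible, but strong evidence can still overturn it, while a hard zero never moves.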
As far as very large CPTs go, they are a problem because their size is exponential (not combinatorial!) in the number of parents. At some point, you will simply run out of memory, no matter how much (finite) memory you have. Decomposing large CPTs is essentially what Bayesian networks are about. This leads to smaller memory requirements and tractable inference. GeNIe will warn you when you have too many parents (and, hence, too large CPTs). This is a threshold that you can set in Tools/Options.
Generally, in exact inference, SMILE creates a junction tree, and the size of the junction tree depends on the number of nodes and the network connectivity (which, in turn, impacts the size of the CPTs and of the resulting clusters in the junction tree). There is no simple equation that would express the calculation time in terms of these variables.
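The effect of connectivity on cost can still be illustrated with the standard variable-elimination intuition behind junction trees (plain Python, not SMILE's actual algorithm): summing variables out of a chain of small factors creates much smaller intermediate tables than summing them out of one merged factor.

```python
def elimination_cost(card, factors, order):
    """Rough inference cost: for each eliminated variable, count the size
    of the intermediate table formed by multiplying every factor that
    mentions it. card maps variable -> number of states."""
    factors = [frozenset(f) for f in factors]
    cost = 0
    for var in order:
        touching = [f for f in factors if var in f]
        merged = frozenset().union(*touching) if touching else frozenset({var})
        size = 1
        for v in merged:
            size *= card[v]
        cost += size
        factors = [f for f in factors if var not in f]
        factors.append(merged - {var})
    return cost

card = {v: 2 for v in "ABCD"}
chain  = [{"A", "B"}, {"B", "C"}, {"C", "D"}]   # three small CPTs
single = [{"A", "B", "C", "D"}]                 # one big CPT over all four
small = elimination_cost(card, chain, "ABCD")   # 14
big = elimination_cost(card, single, "ABCD")    # 30
```

Even in this four-node toy example the decomposed structure wins; the gap grows exponentially as the merged factor acquires more variables.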
One way of being able to have many parents is to use NoisyMAX gates and set the flag "Enable NoisyMAX Decomposition" in Network Properties/Inference. When this is active, GeNIe (actually SMILE) will decompose large CPTs using the additional independences implied by the NoisyMAX assumption.
Having target nodes will generally help a lot, as SMILE will reduce the network before inference so that it can guarantee that the targets are calculated (and the rest is just "don't care").
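One such pre-inference reduction can be sketched as follows (illustrative plain Python only; SMILE's actual relevance reasoning is more sophisticated): nodes that are neither targets, evidence, nor ancestors of either are barren and can be dropped without changing the targets' posteriors.

```python
def relevant_nodes(parents, targets, evidence):
    """Keep targets, evidence, and all of their ancestors; everything
    else is barren with respect to the query and can be pruned."""
    keep = set()
    frontier = list(targets) + list(evidence)
    while frontier:
        node = frontier.pop()
        if node in keep:
            continue
        keep.add(node)
        frontier.extend(parents.get(node, ()))
    return keep

# Hypothetical network: A, B -> C; C -> D; C -> E
parents = {"C": ["A", "B"], "D": ["C"], "E": ["C"]}
kept = relevant_nodes(parents, targets={"D"}, evidence=set())
# E is barren for this query and gets pruned; A, B, C, D remain
```

This is why marking only the nodes of interest as targets pays off: the fewer targets there are, the more of the network can be discarded before the junction tree is ever built.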
I hope this helps.
Marek