Crash/error upon inference in a dynamic Bayesian network
-
- Site Admin
- Posts: 430
- Joined: Tue Dec 11, 2007 4:24 pm
Re: Crash/error upon inference in a dynamic Bayesian network
Your method is a good first cut, but it does not account for the fact that before cliques are formed, the network has to be triangulated, and triangulation introduces new connections. These new connections can make cliques much larger. The size of a clique's potential table grows exponentially with the number of variables in the clique, so your estimates may plainly go through the roof. It all depends on the topology of the network: predicting the size of the jointree based only on the number of nodes and local information, such as the number of parents/connections, will not work reliably.
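As a back-of-the-envelope illustration (not SMILE's internal code), the size of a clique's potential table is the product of the state counts of its variables, so it explodes with clique size:

```python
from math import prod

def clique_table_entries(state_counts):
    """Entries in a clique's potential table: the product of the
    state counts of the variables in the clique."""
    return prod(state_counts)

print(clique_table_entries([2] * 10))  # 1024 -- harmless
print(clique_table_entries([2] * 30))  # 1073741824 entries, roughly 8 GB of doubles
```

A clique absorbing just a few extra variables during triangulation can therefore dominate memory use on its own.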
I hope this helps,
Marek
Re: Crash/error upon inference in a dynamic Bayesian network
That's right, I forgot about triangulation. Given the complexity of my graph, pretty much everything within a slice is connected, so estimating memory usage is indeed very hard. I need to avoid the junction tree algorithm in my network, whether for learning or inference. It is frustrating that there is no algorithm offering a tunable parameter to trade off computation time against memory needs.
Re: Crash/error upon inference in a dynamic Bayesian network
I would like to know exactly which formula you use to calculate the log likelihood, because I'd like to compare the results of my own implementation of EM to yours. I learn the same parameters in the end, but my log likelihood is not the same.
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
SMILE's EM implementation calculates the log likelihood by adding log(P(e)) over all data rows. The log likelihood displayed after EM completes is the value obtained in the last iteration before convergence.
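A minimal sketch of that definition, assuming the per-row P(e) values have already been obtained from inference (the numbers below are made up for illustration):

```python
import math

def log_likelihood(pe_per_row):
    """Data-set log likelihood as described above: the sum of log(P(e))
    over all data rows, where each P(e) is the probability of that row's
    evidence under the current parameters."""
    return sum(math.log(pe) for pe in pe_per_row)

# e.g. three rows whose evidence probabilities came out as 0.5, 0.25 and 0.1
print(log_likelihood([0.5, 0.25, 0.1]))  # about -4.382
```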
Re: Crash/error upon inference in a dynamic Bayesian network
Yes, but how do you calculate P(e)? I know the log likelihood is the sum of the logarithms of the probabilities of the evidence given the parameters, but the probability of the evidence can be calculated in different ways, and I want to know how you calculate it. I calculate it as the theoretical probability of obtaining the evidence without any observations, but I don't get the same results as SMILE when there are temporal arcs. Could you give me a mathematical formula with the terms explained?
Also, a remark after using SMILE and GeNIe: it would be very convenient to be able to fix not only whole nodes but also specific parameters of nodes. Being able to tie parameters across nodes and within nodes would also be great. Finally, although it would be a completely new feature, allowing for structure variability within the network, i.e. making the existence of some arcs dependent on the values of some nodes, would help. I would have had use for all of this in my model.
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
Our P(e) implementation should give you a number consistent with the chain rule:
https://en.wikipedia.org/wiki/Chain_rul ... _variables
As I wrote earlier in this thread, SMILE first attempts to use a jointree-based algorithm for P(e). If the memory requirements of this algorithm are too great, a fallback algorithm based directly on the chain rule is used (so the number of internal inference calls is equal to the number of evidence nodes).
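A sketch of that fallback, with a hypothetical `posterior(node, state, observed)` function standing in for an inference-engine call (this is not the actual SMILE API):

```python
import math

def chain_rule_log_pe(evidence, posterior):
    """Chain-rule fallback sketch: log P(e) = sum_i log P(e_i | e_1..e_{i-1}),
    with one inference call per evidence node.
    `posterior(node, state, observed)` is a hypothetical stand-in returning
    P(node=state | observed findings)."""
    log_pe = 0.0
    observed = {}
    for node, state in evidence:
        log_pe += math.log(posterior(node, state, dict(observed)))
        observed[node] = state  # commit this finding before the next call
    return log_pe

# Toy posterior that ignores conditioning: two independent fair coins.
toy = lambda node, state, observed: 0.5
print(math.exp(chain_rule_log_pe([("A", "heads"), ("B", "heads")], toy)))  # ~0.25
```

Because each finding is committed before the next call, each factor is a conditional probability, and the product telescopes into the joint probability of all the evidence.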
For DBNs P(e) is calculated on the unrolled network.
Re: Crash/error upon inference in a dynamic Bayesian network
Hi,
Thanks for your reply.
I've come across another thing I don't understand: sometimes, when I use EPIS sampling inference via SMILE, I get NaNs for some node values. I get these only when the network is fairly large. I've worked around it by increasing the number of samples from 1K to 1M, but is that the right way to go, or is the real issue somewhere else?
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
I believe this may be an issue with an evidence set that contains evidence values with very low P(e). We can check it under the debugger if you can share the network/dataset.
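One common mechanism behind such NaNs (a toy illustration of importance sampling, not EPIS itself): when the evidence is so unlikely that every sample weight comes out zero, the final normalization is 0/0:

```python
import math

def estimate_posterior(matching_weights, all_weights):
    """Toy importance-sampling estimate (illustration only):
    P(state | e) ~= sum of weights of samples matching the state
                    divided by the sum of all sample weights.
    If no sample is consistent with very unlikely evidence, every
    weight is zero and the 0/0 estimate is reported as NaN."""
    total = sum(all_weights)
    return sum(matching_weights) / total if total != 0.0 else math.nan

print(estimate_posterior([0.0, 0.4], [0.6, 0.4]))  # a normal estimate, 0.4
print(estimate_posterior([0.0, 0.0], [0.0, 0.0]))  # nan -- no sample hit the evidence
```

This is consistent with more samples making the NaNs disappear: with enough samples, at least a few eventually carry nonzero weight.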
Re: Crash/error upon inference in a dynamic Bayesian network
Hi,
I read your answer about the calculation of P(e), and I don't see why you need one inference call for each evidence node. Here is how I calculate P(e):
I do one inference call without any evidence. This call gives me all the theoretical marginal probabilities of the network. Then, for each time slice of each realisation and for each observed node, I sum the log of the theoretical probability of obtaining the observation I got. This gives me the same result as you when there are no temporal arcs, but a different one when there is a temporal arc. The rest of my program is correct, as in both cases I learn the right parameters.
I also tried the Q function of EM, the expected log likelihood, but with the unobserved nodes removed from the sum, and I get a different value when there are unobserved nodes in the network.
With both methods, though, I get values quite close to yours even when they are wrong, so what is wrong with my methods?
Thank you in advance.
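A toy two-node example (assumed numbers, not the actual network) shows why summing logs of no-evidence marginals only matches log P(e) when the observed nodes are independent, i.e. when no arc such as a temporal arc connects them:

```python
import math

# Toy chain X1 -> X2 (assumed numbers, for illustration).
p_x1 = 0.3                               # P(X1 = T)
p_x2_given_x1 = {True: 0.9, False: 0.2}  # P(X2 = T | X1)

# Evidence e: X1 = T, X2 = T.
joint = p_x1 * p_x2_given_x1[True]       # exact P(e) via the chain rule: 0.27

# No-evidence marginal of X2, as used when summing log marginals:
marg_x2 = p_x1 * p_x2_given_x1[True] + (1 - p_x1) * p_x2_given_x1[False]  # 0.41
product_of_marginals = p_x1 * marg_x2    # 0.123 -- not P(e)

print(math.log(joint), math.log(product_of_marginals))  # the two values differ
```

Without the arc the two numbers would coincide, which is exactly the pattern of agreeing with SMILE only in the absence of temporal arcs.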
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
I'm attaching a simple DBN as an example to use with P(e). The xdsl file has one case (use View|Case Manager to apply the case) with three temporal evidence items:
Code: Select all
Rain(t=2)=true
Rain(t=5)=false
Rain(t=7)=true
P(e) is calculated on the unrolled network (use GeNIe's Unroll command after setting the evidence to obtain an unrolled network with the evidence already set).
With the chain rule, P(e) will be calculated in the following way:
Code: Select all
init pe to 1.0
Update network with no evidence,
get posterior prob. for Rain_2=true
NodeP 0.27095540499999998
pe *= NodeP -> 0.27095540499999998
Add evidence for Rain_2=true
Update network with Rain_2=true,
get posterior prob. for Rain_5=false
NodeP 0.47754831292279332
pe *= NodeP -> 0.12939429653506218
Add evidence for Rain_5=false
Update network with Rain_2=true, Rain_5=false,
get posterior prob. for Rain_7=true
NodeP 0.40445044951793174
pe *= NodeP -> 0.052333581398662454
The final P(e) is 0.052333581398662454, consistent with the jointree-based P(e).
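The running product in the trace above can be checked directly:

```python
import math

# Posterior probabilities from the three inference calls in the trace.
node_p = [0.27095540499999998, 0.47754831292279332, 0.40445044951793174]

pe = 1.0
for p in node_p:
    pe *= p  # multiply in each conditional P(e_i | earlier evidence)

print(pe)  # close to 0.052333581398662454, matching the jointree-based P(e)
```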
- Attachments
-
- dbn_pe.xdsl
Re: Crash/error upon inference in a dynamic Bayesian network
Hi,
Okay, I understand how you proceed, but that still seems a strange and extremely inefficient way of computing the log likelihood. Why add the observations one at a time? Since a single observation influences the inferred state of the nodes both before and after it, putting in a partial observation would yield something with no meaning. Moreover, I believe the chain rule applies only to conditional probabilities, not marginal ones. An observation of the network consists of observations of all its variables, so it only makes sense to observe either none of them or all of them. Furthermore, calling inference for each and every observation is extremely computationally expensive, and it would be strange for the computation of the log likelihood, which is just an evaluation metric of the learning, to take far longer than everything else in the learning procedure. In any case, that's not something I can do, as I will have at least a few hundred thousand observations in my dataset.
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
The chain-rule-based P(e) is a fallback. Our primary algorithm uses a jointree and has a running time approximately equal to one inference call, regardless of the number of evidence nodes.
The fallback approach is useful when exact inference can't be performed; multiple sampling inference calls will most likely work, trading time for memory pressure.
Re: Crash/error upon inference in a dynamic Bayesian network
Okay, thank you.
I have now more or less converged on a definitive version of my network, but when I try to run inference on 200 slices, with observations of one node (Bit) on each slice, via Java code, it crashes after a few minutes with the exception "bad allocation".
I use likelihood sampling for the inference. Everything works fine for 40 slices, however. My network has one huge node (Champ) with around 7 million parameters, and my computer has only around 7 GB of RAM, so I would like to know how I could run this inference without crashing. I can't attach my network, as it is too big, but I attach a screenshot of it. Thank you in advance.
-
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Crash/error upon inference in a dynamic Bayesian network
Can you upload your network to Google Drive, Dropbox or similar service and send the link?