Output posterior probabilities for latent variables (and missing values) after EM converges on training set

Yun
Posts: 20
Joined: Mon Oct 17, 2016 8:34 pm

Output posterior probabilities for latent variables (and missing values) after EM converges on training set

Post by Yun »

Hi, SMILE creators,

I am currently using JSMILE's em.learn to learn the parameters of a DBN with latent variables and missing values for the observables. After EM converges on the training dataset, I want to check the posterior probabilities of the latent variables and of the observables' missing values for each sample (e.g., I have 20 samples corresponding to 20 rows in the training data), because I want to better understand how the inference evolves across time slices on the training set.

For example, the DBN network structure is as follows: A, B are latent variables, and C, D, E are observables; A is the only parent of C, B is the only parent of D, and A, B are the parents of E; A and B both have order 1 (i.e., A_{t-1} is the parent of A_{t}, and B_{t-1} is the parent of B_{t}). Following is part of my training data:

A A_1 A_2 A_3 A_4 A_5 B B_1 B_2 B_3 B_4 B_5 C C_1 C_2 C_3 C_4 C_5 D D_1 D_2 D_3 D_4 D_5 E E_1 E_2 E_3 E_4 E_5
* * * * * * * * * * * * f f * * * * * * f f * * * * * * f f (row 1)
* * * * * * * * * * * * f t * * * * * * f f * * * * * * f f
* * * * * * * * * * * * t t * * * * * * f t * * * * * * f t
...

I want to output something like the following for the training set, where each value is the posterior probability of the variable being true, conditioned on the observed data.

A A_1 A_2 A_3 A_4 A_5 B B_1 B_2 B_3 B_4 B_5 C C_1 C_2 C_3 C_4 C_5 D D_1 D_2 D_3 D_4 D_5 E E_1 E_2 E_3 E_4 E_5
0.6 0.62 0.64 0.66 0.68 0.70 0.3 0.5 0.7 0.9 0.92 0.93 0 0 0.65 0.68 0.70 0.74 0.2 0.4 0 0 0.5 0.7 0.1 0.14 0.5 0.7 0 0 (row 1)
...

Am I asking a valid question? If so, is there a way I could get this output?

[Side note: I think this is different from getting the inference result on a test set. In my application, during the testing phase, I can only enter evidence observed so far, one step at a time (using net.setEvidence --> net.getNodeValue --> net.setNodeDefinition). During training, however, the posteriors should be conditioned on the entire sample (sequence) and should also account for transitions between time slices.]

Thank you so much! I appreciate your help!

Yun
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Output posterior probabilities for latent variables (and missing values) after EM converges on training set

Post by shooltz[BayesFusion] »

I'm not quite sure I fully understand your question. Can you post your data file and network here?
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Re: Output posterior probabilities for latent variables (and missing values) after EM converges on training set

Post by mark »

Can't you just load a sequence of observations, perform inference (update beliefs), and read the posterior distributions from the latent variables? Then clear sequence 1, load sequence 2, and repeat the same procedure. Unless I'm missing something, I don't see why this wouldn't be possible.
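The per-sequence procedure above might look roughly like this in jSMILE. This is only a sketch, not tested code: the file name "model.xdsl", the node/outcome names, and the two-outcome, slice-major layout of the value array are all assumptions made to match the example in the original post; check the jSMILE documentation for the exact signatures and value ordering on your version.

```java
import smile.Network;

public class DumpPosteriors {
    public static void main(String[] args) {
        // Load the DBN whose parameters EM has already learned.
        Network net = new Network();
        net.readFile("model.xdsl"); // placeholder file name

        // Enter the observed cells of one training row as temporal
        // evidence, one (node, slice) pair at a time. Node and outcome
        // names here mirror the example data (C, D observed false in
        // the first two slices of row 1).
        net.setTemporalEvidence("C", 0, "False");
        net.setTemporalEvidence("C", 1, "False");
        net.setTemporalEvidence("D", 0, "False");
        net.setTemporalEvidence("D", 1, "False");
        // ... remaining observed cells of the row; '*' cells stay unset.

        // A single inference pass conditions every slice on the whole
        // sequence, including the transitions between slices.
        net.updateBeliefs();

        // For a node on the temporal plate, getNodeValue returns the
        // posteriors for all slices in one flat array. Assuming a binary
        // node with outcomes ordered (false, true), entry 2*slice+1 is
        // P(A_slice = true | whole sequence).
        double[] postA = net.getNodeValue("A");
        for (int slice = 0; slice < postA.length / 2; slice++) {
            System.out.printf("P(A_%d = true) = %.3f%n",
                              slice, postA[2 * slice + 1]);
        }

        // Clear this row's evidence before loading the next sequence.
        net.clearAllEvidence();
    }
}
```

Doing this once per row of the training file, after em.learn has finished, should reproduce the table of smoothed posteriors sketched in the question.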