problems with parameter learning with EM
I'm trying to learn a network with several hidden variables and some missing data from real-life data. When I remove the hidden variables and learn a Naive Bayes network, learning works fine. But when I try to learn with the hidden variables I get the following message:
em: not all nodes are updated (deterministic cpts?)
and either nothing happens or all/most of my cpts are zeroed.
Any idea what the problem may be?
Thanks!
some more information
I'm using 2.0.3259.0, and the program tends to crash after several iterations of training.
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
Re: some more information
ninio wrote: I'm using 2.0.3259.0, and the program tends to crash after several iterations of training.

Did you try the most recent build (2.0.3393.0)?
some more observations
I updated to the latest version, and still get this.
In addition, I found that GeNIe has a problem when the network loaded from an xdsl file has a CPT with a conditional distribution that sums to more than one. For example, on a binary variable, if I set the probabilities to 1 and 1e-5, then train parameters and try to open the node properties, the program dies.
Worse, if you set the probabilities to 1 and 1e-9, GeNIe will not complain, but the EM parameter learning will fail with the same error message.
My current suspicion is that during the EM process, a rounding error leads to a sum that is not exactly one, and this leads to the problem.
Attached is a small network that displays the problem. The column for s2 in "problem" has 1e-009 and 1, so learning this network fails. If you open the Definition tab for the node, mark the s2 column, press normalize ("\Sigma=1") and OK, you can train the network without a problem.
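For anyone hitting the same wall, the manual "\Sigma=1" workaround described above can be scripted. The following is a minimal sketch in plain Python (it does not use the SMILE/GeNIe API; `normalize_column` is an illustrative helper, not a library function) showing how a column such as [1e-9, 1] sums to more than one, and how rescaling restores the sum-to-one constraint before EM is run:

```python
def normalize_column(probs):
    """Rescale a CPT column so its entries sum to 1.0,
    the same effect as GeNIe's 'Sigma=1' normalize button."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("column has no probability mass")
    return [p / total for p in probs]

# The problematic column from the attached network: its sum is 1 + 1e-9.
column = [1e-9, 1.0]
print(sum(column) == 1.0)             # False: the sum exceeds one

fixed = normalize_column(column)
print(abs(sum(fixed) - 1.0) < 1e-12)  # True after normalization
```

Running a check like this over every CPT column before saving the xdsl file would avoid handing EM a distribution whose mass exceeds one.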
Attachments:
- Network3test.xdsl: the network (2.33 KiB) Downloaded 425 times
- Network2.txt: the data file (8.13 KiB) Downloaded 407 times
not randomizing the parameters
I have a good idea what I want, and randomizing will give my hidden variables some random meaning, which will probably not match what I want.
And while on the subject: in what units is the "Confidence" setting? I used 1 to get the results above.
I will mention that on my large network (a dozen hidden variables and about twice that many observed ones) I get the same error message even if I do randomize (though the time it takes to reach this state depends on the randomization). I will also mention that after a couple of dozen training sessions GeNIe tends to become unstable: I see problems in image rendering and a significant tendency to crash, mostly when I try to close the program.
Re: not randomizing the parameters
ninio wrote: I will also mention that after a couple of dozen training sessions GeNIe tends to become unstable, and I see problems in image rendering and a significant tendency to crash, mostly if I try to close the program.

Does this happen with the small model you've attached in one of the previous posts, or is the large network required to crash the program?
glad to hear
Is there a time frame for the next release? Is it possible to get a preview?
I'm currently stuck with my project, being unable to train my network.
As for the crashes: the software will crash when the input network has a node with a distribution whose sum is greater than one and this network is sent to the learn-parameters function. Learning will fail, and once the node is opened for editing, the program will crash.
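A defensive pre-flight check could catch such a network before it is handed to parameter learning. This is a sketch in plain Python (again not the SMILE API; `find_bad_columns` and the column layout are assumptions for illustration): it scans CPT columns and reports any whose probabilities do not sum to one within a tolerance.

```python
def find_bad_columns(cpt_columns, tol=1e-6):
    """Return indices of CPT columns whose probabilities do not
    sum to 1 within tolerance; these are the columns that trip
    up EM with the 'not all nodes are updated' error."""
    bad = []
    for i, col in enumerate(cpt_columns):
        if abs(sum(col) - 1.0) > tol:
            bad.append(i)
    return bad

# Columns of a binary node; the second is the 1 + 1e-5 case from above.
columns = [[0.3, 0.7], [1.0, 1e-5]]
print(find_bad_columns(columns))  # [1]
```

Only columns flagged here would need the normalize-and-save treatment before calling learn-parameters.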
Re: glad to hear
ninio wrote: Is there a time frame for the next release? Is it possible to get a preview?

We'll be testing the modified EM for a couple of days. If it works OK, we'll release GeNIe, or at least notify you and provide a link to a preview version.
ninio wrote: As for the crashes: the software will crash when the input network has a node with a distribution whose sum is greater than one and this network is sent to the learn-parameters function. Learning will fail, and once the node is opened for editing, the program will crash.

I can't reproduce that. Using the most recent public GeNIe build (the one which does not yet have our recent EM fix), I ran EM on your model. EM returned an error message as expected, and I can inspect all four CPTs in GeNIe without problems. This does not change when I run EM multiple times. Next I modified the network to have a sum > 1 in node 'observed'. Still, EM returns an error, but I can open the CPTs without a crash.
I can't reproduce it now
I had a set where this happened consistently, but I saved over the file. I will try to see if I can find a copy.
Thanks for all your help!
Re: I can't reproduce it now
ninio wrote: I had a set where this happened consistently, but I saved over the file. I will try to see if I can find a copy.

Come to think of it, the crashes may have happened on a slightly out-of-date version (2.0.3259.0) I had been using until recently. It may still be worth checking, as I did see problems when working with larger networks even with the new code, but I cannot duplicate this so easily.
Thanks for all your help!
Any news on the updated code?
Re: I can't reproduce it now
ninio wrote: Any news on the updated code?

Send me a private message.
Re: I can't reproduce it now
shooltz wrote: Send me a private message.

Thanks, it looks like it works fine. It's great to see such support!