error with log likelihood during EM
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
error with log likelihood during EM
I keep getting the value 8.29714e-318 as the log likelihood while running EM. I'm running EM on discrete Bayesian networks, so the LL should lie in [-inf, 0], but 8.29714e-318 keeps showing up across multiple runs of EM, even on different networks. Can someone explain why this is happening?
-
- Site Admin
- Posts: 1443
- Joined: Mon Nov 26, 2007 5:51 pm
Re: error with log likelihood during EM
Can you post your network and data file here, so we can reproduce the issue? Alternatively, send me a private message on the forum and attach the files (or provide a download link).
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
Re: error with log likelihood during EM
Hi, I attached some of the files. ex7_TD2_10.xdsl is the model file, ex7_TD2_10.csv is the data file, and em_ex7_TD2_10_ED2_0.xdsl is an initialization for the EM algorithm, which gives me 8.29714e-318 as a log likelihood.
- Attachments
-
- ex7_TD2_10.xdsl
- (6.06 KiB) Downloaded 1198 times
-
- ex7_TD2_10.csv
- (11.74 KiB) Downloaded 1391 times
-
- em_ex7_TD2_10_ED2_0.xdsl
- (4.35 KiB) Downloaded 1356 times
-
- Site Admin
- Posts: 1443
- Joined: Mon Nov 26, 2007 5:51 pm
Re: error with log likelihood during EM
We checked the log likelihood obtained after learning parameters with your files, and got values close to -1000 (both in GeNIe and with a simple Python program).
Maybe you did not create a proper matching between your network and data? Can you post your code fragment where you call EM?
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
Re: error with log likelihood during EM
I am using SMILE (the C++ version). Here is my code for the EM portion:
Code: Select all
std::vector<DSL_datasetMatch> matching;
DSL_network emModel;
DSL_dataset ds;
double loglik;

// reading data file
res = ds.ReadFile(dataFile.c_str());
if (DSL_OKAY != res)
{
    return res;
}

// passing in the initialization for the model during EM
res = emModel.ReadFile(emFile.c_str());
if (DSL_OKAY != res)
{
    std::cout << "error reading em file " << res << std::endl;
    return res;
}

// matching model
res = ds.MatchNetwork(emModel, matching, errMsg);
if (DSL_OKAY == res)
{
    DSL_em em;
    em.SetEquivalentSampleSize(0);
    res = em.Learn(ds, emModel, matching, &loglik);
}
-
- Site Admin
- Posts: 1443
- Joined: Mon Nov 26, 2007 5:51 pm
Re: error with log likelihood during EM
Your code will run fine when starting with ex7_TD2_10.xdsl. The log likelihood is around -990.
If the model used is em_ex7_TD2_10_ED2_0.xdsl, DSL_em::Learn will return DSL_ZERO_POTENTIAL, which is defined as -43. This is due to conflicting evidence in the dataset. Your code does not check the status returned from DSL_em::Learn. When the Learn method does not succeed, the value of the log likelihood variable is not modified, and you're getting the 8.29714e-318 value (which is just random bits in the local double variable).
To debug your code/data, redirect SMILE's error output to the console. If you're using the most recent SMILE 2.2.0, use the following:
Code: Select all
emModel.Logger().RedirectToFile(stdout);
If your SMILE version is earlier than 2.2, use this line instead:
Code: Select all
DSL_errorH().RedirectToFile(stdout);
With redirection enabled, you'll get this message when running EM on em_ex7_TD2_10_ED2_0.xdsl:
Code: Select all
43: EM: please run with relevance enabled
To enable relevance, add this line before the Learn call:
Code: Select all
em.SetRelevance(true);
With relevance enabled and the logger redirected to the standard output, you'll get a series of messages like this:
Code: Select all
26: EM: can't set evidence in record 1 for node Y
The node Y has some deterministic columns in its CPT, which conflict with the data in the dataset.
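The status check described in this reply can be sketched without SMILE itself. In the fragment below, `fakeLearn` is a hypothetical stand-in for DSL_em::Learn that mimics the behaviour described above (a negative status on failure, output variable left untouched); `kOkay`, `kZeroPotential`, `learnChecked` and `isPlausibleLogLik` are illustrative names, not part of the library, and -990 is only a placeholder value. The idea is to initialise the output to NaN, test the returned status before reading it, and sanity-check the range.

```cpp
#include <cmath>
#include <limits>

// Illustrative stand-ins for DSL_OKAY and DSL_ZERO_POTENTIAL.
constexpr int kOkay = 0;
constexpr int kZeroPotential = -43;

// Hypothetical substitute for DSL_em::Learn: on failure it returns a negative
// status and does NOT write to *loglik.
int fakeLearn(bool dataConsistent, double* loglik)
{
    if (!dataConsistent)
    {
        return kZeroPotential; // *loglik is left untouched on this path
    }
    *loglik = -990.0; // placeholder result
    return kOkay;
}

// Runs the learner and reports success; loglik is only meaningful when the
// function returns true.
bool learnChecked(bool dataConsistent, double& loglik, int& status)
{
    // Start from NaN so an unwritten value is recognisable instead of random.
    loglik = std::numeric_limits<double>::quiet_NaN();
    status = fakeLearn(dataConsistent, &loglik);
    return status == kOkay;
}

// For a discrete network the log likelihood must be <= 0 (possibly -inf).
bool isPlausibleLogLik(double value)
{
    return !std::isnan(value) && value <= 0.0;
}
```

With the real library, the same pattern would amount to testing the value returned by Learn against DSL_OKAY before using loglik. A positive value such as 8.29714e-318 fails the range check immediately.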
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
Re: error with log likelihood during EM
Thank you so much for this information. In the EM file I generated the CPTs randomly, but it looks like some of the values were so close to one or zero that they got rounded up/down. Would you recommend using "em.SetRandomizeParameters(true);" instead of passing in the initialization through a file? Can you give me some more information on this function call, and on how the randomizations are set?
Thank you
-
- Site Admin
- Posts: 1443
- Joined: Mon Nov 26, 2007 5:51 pm
Re: error with log likelihood during EM
"In the em file I generated CPTs randomly"
If you don't have any initial probability distribution for your model, just use EM's internal randomization (call em.SetRandomizeParameters(true), and optionally fix the random seed for the initial parameters with em.SetSeed before SetRandomizeParameters).
Do not call SetEquivalentSampleSize if your initial parameters are random. This method is used when your parameters were already learned or specified by an expert, and you just want to refine them with the new dataset.
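If a file-based random initialization is kept anyway, the rounding to exact 0 or 1 mentioned in the question could be avoided by clamping each CPT column before it is written out. The helper below is a minimal sketch of that idea and an assumption of this note, not something SMILE provides: `smoothColumn` and `minProb` are hypothetical names, and the column is assumed to be non-empty with non-negative entries.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of SMILE): raise every entry of one CPT column
// to at least minProb, then rescale so the column sums to 1 again.
// After rescaling every entry is strictly positive, so no state is impossible.
std::vector<double> smoothColumn(std::vector<double> column, double minProb = 1e-3)
{
    for (double& p : column)
    {
        p = std::max(p, minProb);
    }
    const double total = std::accumulate(column.begin(), column.end(), 0.0);
    for (double& p : column)
    {
        p /= total;
    }
    return column;
}
```

For example, a deterministic column {0, 1} becomes roughly {0.001, 0.999}, which no longer rules out any record in the data. The built-in randomization recommended above remains the simpler route.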
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
Re: error with log likelihood during EM
Okay, thank you for the elaboration. I know how to proceed now.
-
- Posts: 11
- Joined: Mon Dec 11, 2023 8:20 pm
Re: error with log likelihood during EM
I implemented the above, and I'm getting a bunch of these messages:
-26: EM: can't set evidence in record 3 for node NODE040
-26: EM: can't set evidence in record 8 for node NODE040
-26: EM: can't set evidence in record 17 for node NODE040
-26: EM: can't set evidence in record 29 for node NODE022
-26: ERROR: conficting evidence between nodes [NODE008] and [NODE038]
-26: EM: can't set evidence in record 31 for node NODE039
-26: EM: can't set evidence in record 34 for node NODE022
-26: EM: can't set evidence in record 34 for node NODE040
-26: EM: can't set evidence in record 36 for node NODE022
-26: EM: can't set evidence in record 37 for node NODE041
-26: EM: can't set evidence in record 38 for node NODE022
Is this an issue, or is it normal?
-
- Site Admin
- Posts: 1443
- Joined: Mon Nov 26, 2007 5:51 pm
Re: error with log likelihood during EM
The messages you're getting show that your data is inconsistent with the parameters in the network. You have a bunch of deterministic nodes, and you're trying to input an impossible combination of evidence from the data file.
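To make "impossible combination of evidence" concrete, here is a toy sketch that is independent of SMILE. It assumes a two-node network X -> Y with binary nodes; `ToyNet`, `Record`, `recordProbability` and `impossibleRecords` are illustrative names, and the numbers are made up. A record is impossible when the network assigns it probability exactly zero, which happens as soon as a deterministic CPT column puts 0 on the observed state.

```cpp
#include <cstddef>
#include <vector>

// Toy network X -> Y (illustrative only).
struct ToyNet
{
    std::vector<double> priorX;            // priorX[x] = P(X = x)
    std::vector<std::vector<double>> cptY; // cptY[x][y] = P(Y = y | X = x)
};

// One fully observed data row: state indices of X and Y.
struct Record
{
    std::size_t x;
    std::size_t y;
};

// Joint probability of a fully observed record under the network.
double recordProbability(const ToyNet& net, const Record& r)
{
    return net.priorX[r.x] * net.cptY[r.x][r.y];
}

// Indices (0-based, within the given vector) of records with zero probability.
std::vector<std::size_t> impossibleRecords(const ToyNet& net,
                                           const std::vector<Record>& data)
{
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        if (recordProbability(net, data[i]) == 0.0)
        {
            bad.push_back(i);
        }
    }
    return bad;
}
```

If P(Y | X = 0) is the deterministic column {1, 0}, any row with X = 0 and Y = 1 has probability zero, so its log likelihood is -inf. That is the same kind of conflict the messages above report for individual records.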