error with log likelihood during EM

annieraichev
Posts: 11
Joined: Mon Dec 11, 2023 8:20 pm

error with log likelihood during EM

Post by annieraichev »

I keep getting the value 8.29714e-318 as the log likelihood while running EM. I'm running EM on discrete Bayesian networks, so the log likelihood should lie in (-inf, 0], yet 8.29714e-318 keeps showing up across multiple runs of EM, even on different networks. Can someone explain why this is happening?
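For reference, the expected range follows from the definition: the log likelihood of discrete data is a sum of logarithms of probabilities, and each probability is at most 1. Below is a minimal sketch of that arithmetic. `logLikelihood` is a hypothetical helper, not a SMILE function, and the per-record probabilities are assumed to be computed elsewhere.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not a SMILE function): sums log P(record) over all
// records. Each probability lies in [0, 1], so every term is <= 0.
double logLikelihood(const std::vector<double>& recordProbabilities) {
    double total = 0.0;
    for (double p : recordProbabilities) {
        if (p <= 0.0) {
            // A single impossible record drives the whole sum to -infinity.
            return -std::numeric_limits<double>::infinity();
        }
        total += std::log(p);
    }
    return total;
}
```

A small positive number such as 8.29714e-318 cannot come out of this sum, which suggests the value was never produced by the likelihood computation at all.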
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: error with log likelihood during EM

Post by shooltz[BayesFusion] »

Can you post your network and data file here, so we can reproduce the issue? Alternatively, send me a private message on the forum and attach the files (or provide a download link).

Post by annieraichev »

Hi, I attached some of the files: ex7_TD2_10.xdsl is the model file, ex7_TD2_10.csv is the data file, and em_ex7_TD2_10_ED2_0.xdsl is an initialization for the EM algorithm, which gives me 8.29714e-318 as the log likelihood.
Attachments
ex7_TD2_10.xdsl (6.06 KiB)
ex7_TD2_10.csv (11.74 KiB)
em_ex7_TD2_10_ED2_0.xdsl (4.35 KiB)

Post by shooltz[BayesFusion] »

We checked the log likelihood obtained after learning parameters with your files and got values close to -1000 (both in GeNIe and with a simple Python program).

Maybe you did not create a proper matching between your network and data? Can you post your code fragment where you call EM?

Post by annieraichev »

I am using SMILE (the C++ version). Here is my code for the EM portion:

Code: Select all


std::vector<DSL_datasetMatch> matching;
DSL_network emModel;
DSL_dataset ds;
double loglik;

//reading data file
res = ds.ReadFile(dataFile.c_str());
if (DSL_OKAY != res)
{
        return res;
}
//passing in the initialization for the model during EM
res = emModel.ReadFile(emFile.c_str()); 
if (DSL_OKAY != res)
{
        std::cout << "error reading em file "<< res <<std::endl;
        return res;
}
//matching model 
res = ds.MatchNetwork(emModel, matching, errMsg);
if (DSL_OKAY == res)
{       
        DSL_em em;
        em.SetEquivalentSampleSize(0);
        res = em.Learn(ds, emModel, matching, &loglik);
}

Post by shooltz[BayesFusion] »

Your code will run fine when starting with ex7_TD2_10.xdsl. The log likelihood is around -990.

If the model used is em_ex7_TD2_10_ED2_0.xdsl, DSL_em::Learn returns DSL_ZERO_POTENTIAL, which is defined as -43. This is due to conflicting evidence in the dataset. Your code does not check the status returned from DSL_em::Learn. When the Learn method does not succeed, the log likelihood variable is not modified, so you're seeing 8.29714e-318, which is just whatever bits happened to be in the uninitialized local double variable.
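The pattern described here can be sketched without SMILE itself. In the snippet below, `kOkay`, `kZeroPotential`, `LearnResult`, `checkedLearn`, `looksUninitialized`, and the `learn` callable are all stand-ins invented for illustration. With SMILE, the callable would wrap the `em.Learn(ds, emModel, matching, &loglik)` call from the code posted earlier. The value 0 for the success code is an assumption; -43 is the value quoted in this post.

```cpp
#include <cmath>
#include <functional>
#include <limits>

// Stand-in status codes for illustration only. -43 is the value quoted in
// this thread for DSL_ZERO_POTENTIAL; 0 for success is an assumption.
constexpr int kOkay = 0;
constexpr int kZeroPotential = -43;

struct LearnResult {
    int status;     // status code returned by the learner
    double loglik;  // NaN unless status == kOkay
};

// Calls a learner that returns a status code and is assumed to write the
// log likelihood through the pointer only when it succeeds.
LearnResult checkedLearn(const std::function<int(double*)>& learn) {
    // Start from NaN so an untouched value can never pass for a result.
    double loglik = std::numeric_limits<double>::quiet_NaN();
    const int status = learn(&loglik);
    if (status != kOkay) {
        // Discard whatever is in loglik; the call did not succeed.
        loglik = std::numeric_limits<double>::quiet_NaN();
    }
    return LearnResult{status, loglik};
}

// A subnormal double such as 8.29714e-318 is a hint that the variable holds
// leftover bits rather than a computed log likelihood.
bool looksUninitialized(double value) {
    return std::fpclassify(value) == FP_SUBNORMAL;
}
```

The key point is that the caller inspects `status` before reading `loglik`. Initializing to NaN is an extra safety net, so a skipped check shows up as NaN rather than as a plausible-looking number.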

To debug your code and data, redirect SMILE's error output to the console. If you're using the most recent SMILE, 2.2.0, use the following:

Code: Select all

emModel.Logger().RedirectToFile(stdout);
If your SMILE version is earlier than 2.2, use this line instead:

Code: Select all

DSL_errorH().RedirectToFile(stdout);
With redirection enabled, you'll get this message when running EM on em_ex7_TD2_10_ED2_0.xdsl:

Code: Select all

43: EM: please run with relevance enabled
To enable relevance, add this line before the Learn call:

Code: Select all

em.SetRelevance(true);
With relevance enabled and logger redirected to the standard output, you'll get a series of messages like this:

Code: Select all

26: EM: can't set evidence in record 1 for node Y
The node Y has some deterministic columns in its CPT, which conflict with the data in the dataset.
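To illustrate what such a conflict looks like, here is a simplified sketch that scans data records against a CPT and reports the ones a deterministic column rules out. It assumes a node Y with a single parent X and fully observed records. `Cpt`, `Record`, and `impossibleRecords` are hypothetical names, and this is not how SMILE performs the check internally.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// cpt[x][y] holds P(Y = y | X = x) for a node Y with a single parent X.
using Cpt = std::vector<std::vector<double>>;
// One data record: the observed state indices (x, y).
using Record = std::pair<std::size_t, std::size_t>;

// Returns the 0-based indices of records whose observed (x, y) combination
// has probability exactly zero under the CPT, i.e. records that a
// deterministic column rules out.
std::vector<std::size_t> impossibleRecords(const Cpt& cpt,
                                           const std::vector<Record>& records) {
    std::vector<std::size_t> conflicts;
    for (std::size_t i = 0; i < records.size(); ++i) {
        const std::size_t x = records[i].first;
        const std::size_t y = records[i].second;
        if (cpt.at(x).at(y) == 0.0) {
            conflicts.push_back(i);
        }
    }
    return conflicts;
}
```

For example, with the CPT `{{1.0, 0.0}, {0.3, 0.7}}`, a record observing X in state 0 and Y in state 1 is flagged, because the column for X = 0 puts all its mass on Y = 0.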

Post by annieraichev »

Thank you so much for this information. In the em file I generated CPTs randomly, but it looks like some of the values were so close to one or zero that they got rounded up or down. Would you recommend using em.SetRandomizeParameters(true); instead of passing in the initialization through a file? Can you give me some more information on this function call and on how the randomization is done?

Thank you

Post by shooltz[BayesFusion] »

"In the em file I generated CPTs randomly"
If you don't have an initial probability distribution for your model, just use EM's internal randomization: call em.SetRandomizeParameters(true), and optionally fix the random seed for the initial parameters by calling em.SetSeed before SetRandomizeParameters.

Do not call SetEquivalentSampleSize if your initial parameters are random. This method is meant for the case where your parameters were already learned or specified by an expert, and you just want to refine them with a new dataset.
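If you still want to write your own initialization file, one way to avoid columns that round to exactly zero or one is to reserve a minimum probability for every state before normalizing. The sketch below is an assumption-laden illustration: `randomColumn` and its parameters are hypothetical, it is not part of SMILE, and it says nothing about how SMILE's internal randomization works.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of SMILE): draws one CPT column with
// `states` entries that sum to 1, each at least `minProb`.
std::vector<double> randomColumn(std::size_t states, double minProb,
                                 unsigned seed) {
    if (states == 0 || minProb < 0.0 ||
        minProb * static_cast<double>(states) >= 1.0) {
        throw std::invalid_argument(
            "need states > 0 and 0 <= minProb < 1/states");
    }
    std::mt19937 gen(seed);
    // A strictly positive lower bound keeps the normalizing sum above zero.
    std::uniform_real_distribution<double> dist(1e-6, 1.0);

    std::vector<double> column(states);
    double sum = 0.0;
    for (double& value : column) {
        value = dist(gen);
        sum += value;
    }
    // Distribute the mass that remains after reserving minProb per state.
    const double remaining = 1.0 - minProb * static_cast<double>(states);
    for (double& value : column) {
        value = minProb + remaining * value / sum;
    }
    return column;
}
```

Choose `minProb` comfortably larger than the precision used when the values are written to the file, so that no entry is rounded back to zero.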

Post by annieraichev »

Okay, thank you for the elaboration. I know how to proceed now.

Post by annieraichev »

I implemented the above, and I'm getting a bunch of these messages:
-26: EM: can't set evidence in record 3 for node NODE040
-26: EM: can't set evidence in record 8 for node NODE040
-26: EM: can't set evidence in record 17 for node NODE040
-26: EM: can't set evidence in record 29 for node NODE022
-26: ERROR: conficting evidence between nodes [NODE008] and [NODE038]
-26: EM: can't set evidence in record 31 for node NODE039
-26: EM: can't set evidence in record 34 for node NODE022
-26: EM: can't set evidence in record 34 for node NODE040
-26: EM: can't set evidence in record 36 for node NODE022
-26: EM: can't set evidence in record 37 for node NODE041
-26: EM: can't set evidence in record 38 for node NODE022

Is this an issue, or is it normal?

Post by shooltz[BayesFusion] »

The messages you're getting show that your data is inconsistent with the parameters in the network. You have a number of deterministic nodes, and the data file contains combinations of evidence that are impossible under them.