Model Comparison through log(p)

Hakankhm
Posts: 4
Joined: Thu Aug 27, 2020 11:01 am

Post by Hakankhm

Hello,
GeNIe produces a log(p), or EM log-likelihood, value after estimating a Bayesian model. This log(p) shows model accuracy, and it can be used for model selection or comparison. However, the only information about log(p) in the manual is that it is a score between minus infinity and zero. Could you please give me some information about how it is calculated and how it should be interpreted? By interpretation I mean the probability of correct prediction (accuracy) of the model. Thanks.
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Model Comparison through log(p)

Post by marek [BayesFusion]

Hi Hakan,

Log(p), which ranges from minus infinity to zero, is not a measure of model accuracy but of how well the model fits the data. Its numerical value is best used to compare multiple runs of the learning algorithm: the higher the number, the better (keep in mind that the number is negative, so a lower absolute value means a higher score, and -10 is higher than -100). We corresponded about log(p) by email, and you requested the algorithm for calculating log(p) implemented in GeNIe (actually SMILE). Here is pseudo-code for this calculation:

Inputs:
  data file D with n records (possibly with missing values);
  a model M;

double logp = 0.0;
/* Loop over all records in the data file from which you are learning */
for (int i = 0; i < n; i++) {
    Calculate the probability of evidence p(e) in M after entering the
    observed state/value of every variable in record i into the model
    (missing values are simply not entered as evidence);
    logp += log(p(e));
}
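A runnable sketch of the pseudo-code above may make the loop concrete. It is not GeNIe/SMILE code and does not use the SMILE API: the two-node network (Rain -> WetGrass), its CPT numbers, and all function names are hypothetical, chosen only for illustration. p(e) is computed by brute-force enumeration, where a missing value (None) is summed out over its states. Natural logarithms are assumed; this thread does not say which base GeNIe reports.

```python
import math
from itertools import product

# Hypothetical network Rain -> WetGrass, both binary (states 0/1).
# The CPT values are made up for illustration only.
P_RAIN = {0: 0.8, 1: 0.2}                      # p(Rain)
P_WET_GIVEN_RAIN = {0: {0: 0.9, 1: 0.1},       # p(WetGrass | Rain=0)
                    1: {0: 0.2, 1: 0.8}}       # p(WetGrass | Rain=1)

def joint(rain, wet):
    """p(Rain=rain, WetGrass=wet) via the chain rule of the network."""
    return P_RAIN[rain] * P_WET_GIVEN_RAIN[rain][wet]

def prob_of_evidence(record):
    """p(e) for one record (rain, wet); a None value is summed out."""
    rain_obs, wet_obs = record
    total = 0.0
    for rain, wet in product((0, 1), repeat=2):
        # Keep only full assignments consistent with the observed values.
        if rain_obs is not None and rain != rain_obs:
            continue
        if wet_obs is not None and wet != wet_obs:
            continue
        total += joint(rain, wet)
    return total

def log_p(data):
    """Sum of log p(e) over all records, i.e. log p(D|M) for i.i.d. records."""
    logp = 0.0
    for record in data:
        logp += math.log(prob_of_evidence(record))
    return logp

data = [(0, 0), (1, 1), (0, 1), (None, 1), (1, None)]
print(log_p(data))
```

Enumerating all joint assignments is exponential in the number of variables and is used here only because the example network is tiny; a real tool would compute p(e) with a proper inference algorithm.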

Please note that adding logarithms amounts to multiplying the underlying probabilities, so we are effectively multiplying the probabilities of all records given the model. We do this because we assume that every record was generated independently by the model (hence the multiplication of probabilities). The result is log(p) = log p(D|M) = log p(e_1|M) + log p(e_2|M) + ... + log p(e_n|M), where e_i is the evidence from record i.
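A practical reason to add logarithms rather than multiply probabilities directly is floating-point underflow. The short Python illustration below uses made-up per-record values of p(e): the direct product of many small probabilities collapses to 0.0 in double precision, while the sum of their logarithms stays finite, and for a handful of records the two routes agree.

```python
import math

# Made-up per-record probabilities of evidence p(e) for 1000 records.
p_e = [1e-5] * 1000

product_direct = 1.0
for p in p_e:
    product_direct *= p        # the true value, 1e-5000, is far below the smallest double

log_sum = sum(math.log(p) for p in p_e)

print(product_direct)  # 0.0 (underflow)
print(log_sum)         # about -11512.9, still perfectly usable

# For a few records both routes agree: log(prod p) == sum(log p).
few = [0.72, 0.16, 0.08]
assert math.isclose(math.log(math.prod(few)), sum(math.log(p) for p in few))
```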

What you may intuitively be referring to as model accuracy is perhaps p(M|D), i.e., the probability of the model given the data. Unfortunately, this is much harder to compute: by Bayes' rule, p(M|D) = p(D|M) p(M) / p(D), and p(D), the probability of the data, is something that we don't really know. This is why most scoring functions are based on p(D|M).
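One consequence of Bayes' rule is that p(D) is the same for every model learned from the same data file, so it cancels when two models are compared. This is one way to see why log(p) is meaningful mainly in comparisons. The Python sketch below turns two log(p) values into posterior odds p(M1|D) / p(M2|D); the log(p) values are hypothetical, the priors default to an assumed p(M1) = p(M2), and natural logarithms are assumed.

```python
import math

def posterior_odds(logp_m1, logp_m2, prior_m1=0.5, prior_m2=0.5):
    """p(M1|D) / p(M2|D) from log p(D|M1), log p(D|M2) and model priors.

    The unknown p(D) cancels in the ratio. Natural logs are assumed.
    For very large log-likelihood differences (above about 709),
    math.exp overflows, so compare the log odds directly instead.
    """
    log_odds = (logp_m1 + math.log(prior_m1)) - (logp_m2 + math.log(prior_m2))
    return math.exp(log_odds)

# Hypothetical log(p) values from two learning runs on the same data file.
runs = {"run A": -100.0, "run B": -10.0}
best = max(runs, key=runs.get)   # higher (less negative) log(p) is better
print(best)                                           # run B
print(posterior_odds(runs["run B"], runs["run A"]))   # e**90, about 1.22e39
```

With equal priors the posterior odds reduce to the likelihood ratio p(D|M1) / p(D|M2), so a gap between -10 and -100 is a very large difference on the probability scale.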

Does this make sense?

Marek

Re: Model Comparison through log(p)

Post by Hakankhm

Thank you so much for this detailed explanation, Marek. It really helped me a lot. I would also like to thank Tomasz and the other team members for your close interest in solving our problems and for providing us with such great software. Best regards.

Re: Model Comparison through log(p)

Post by Hakankhm

After learning a model, GeNIe gave me an EM log-likelihood of log(p) = -15. When we calculate p (the probability of the model given the data set), it turns out to be a very small probability. Does this show that my model is useless? Thanks.

Re: Model Comparison through log(p)

Post by marek [BayesFusion]

Not at all. log(p) is the logarithm of the probability of the whole data set given the model, and it depends strongly on the size of the data set and the number of variables. In general, the probability of any particular combination of values of the nodes within a record is very low. You can verify this by choosing a record in your data set, entering the values of all variables as evidence in your GeNIe model, and then calculating p(e) in GeNIe. You will see that this probability is very low, especially when there are many variables in your model. To obtain log(p), the p(e) values of all individual records are multiplied together (in practice, their logarithms are added), which typically gives an astronomically small number. So log(p) = -15 does not strike me as a particularly low number.
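To illustrate how strongly log(p) depends on data set size and the number of variables, here is a Python sketch for a deliberately simple hypothetical model: k independent binary variables, each uniformly distributed, so every fully observed record has p(e) = 0.5**k. Natural logarithms are assumed (the thread does not state which base GeNIe uses); under that assumption, log(p) = -15 corresponds to p(D|M) = e**-15, roughly 3.1e-7.

```python
import math

def uniform_log_p(n_records, n_vars):
    """log p(D|M) for n_vars independent, uniform binary variables.

    Every fully observed record has p(e) = 0.5 ** n_vars, so the data set
    log-likelihood is n_records * n_vars * log(0.5).
    """
    return n_records * n_vars * math.log(0.5)

# log(p) grows in magnitude with both the number of records and variables.
for n_records in (10, 100, 1000):
    for n_vars in (5, 20):
        print(n_records, n_vars, round(uniform_log_p(n_records, n_vars), 1))

# Dividing by the number of records gives a per-record figure.
print(uniform_log_p(1000, 20) / 1000)   # about -13.86, i.e. 20 * log(0.5)

# What log(p) = -15 means on the probability scale (natural log assumed).
print(math.exp(-15))                     # about 3.06e-07
```

Even this toy model reaches about -138.6 with only 10 records of 20 variables, which is consistent with the point that a value such as -15 is not low in itself.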
I hope this helps,

Marek