Hello,
I have a question about the EM algorithm implemented in the dsl_em class. Is any smoothing applied during the EM process? I could not find this information, and no parameter of the function seems to control it.
- If yes, what exactly is the smoothing that is implemented?
- If not, do you have any tips for implementing it myself (I use the Java wrapper)?
Thanks
EM algorithm and smoothing
Re: EM algorithm and smoothing
Hi MartinA, you should be able to achieve this by setting the equivalent sample size and picking/setting a prior distribution (I believe uniform is the default).
Re: EM algorithm and smoothing
Hi Mark,
Thanks a lot for your answer. I see the point, although I do not really know how to set the prior distribution of my variable (at least not manually). Could you help me out?
Re: EM algorithm and smoothing
MartinA wrote: "I do not really know how to set the prior distribution of my variable (at least without doing this manually)"
I am confused by your question. How else would you like to set the prior distribution over your variable? ESS allows you to weigh the prior distribution against the data, but you need a prior distribution to start. If I am missing something obvious, please let me know!
Marek
Re: EM algorithm and smoothing
You can just use a uniform distribution (which should be the default). I think that works well for smoothing.
Re: EM algorithm and smoothing
Hi Marek, Mark,
Thanks a lot for your answers, and sorry if this appears to be confusing. Let me try to rephrase and give a bit more details about what I want to do.
First, I now understand that if you use the uniform distribution as a starting point, you are doing some smoothing (to avoid having CPT values with zeros after learning). That was my initial question and now I understand better my results.
The issue now is that not all possible cases are represented in my training data (i.e., some combinations of variables never appear). The data comes from measurements that I cannot repeat. Please let me know if there is any flaw in my reasoning:
- As I understand it, training starts from a uniform distribution, and the CPTs are then updated according to the cases seen in the training data. I would therefore assume that, for the cases not represented in the data, the probability values stay the same as at the beginning of training.
- Then, when I perform inference with the BN (for example MAP or MPE) and come across a case that has never been observed, I guess it is difficult for the BN to decide which outcome has the highest probability (since we started from uniform and nothing changed).
- In this case, I was expecting that smoothing would help, to get an idea of what happens even if this exact combination of variables has never been seen. But I did not really know how to implement it in SMILE.
Sorry for the long post, I hope it's a bit clearer :)
Martin
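Martin's reasoning about unseen parent configurations can be checked with a small standalone sketch. This is plain Java for illustration only (it does not use the SMILE API; the class and method names here are made up): with a uniform Dirichlet prior weighted by an equivalent sample size, a parent configuration with zero observations keeps the uniform prior after "learning", exactly as described above.

```java
// Standalone illustration (plain Java, NOT the SMILE API): MAP estimation of
// one CPT column with a uniform Dirichlet prior of strength ess.
// A parent configuration with zero observations keeps the prior distribution.
public class UnseenConfigDemo {
    // Smoothed estimate: (count[i] + ess * prior[i]) / (total + ess),
    // with a uniform prior of 1/states per state.
    static double[] mapEstimate(int[] counts, double ess) {
        int states = counts.length;
        int total = 0;
        for (int c : counts) total += c;
        double[] p = new double[states];
        for (int i = 0; i < states; i++) {
            p[i] = (counts[i] + ess * (1.0 / states)) / (total + ess);
        }
        return p;
    }

    public static void main(String[] args) {
        double ess = 10.0;
        // Parent configuration seen in the data: the counts dominate the prior.
        double[] seen = mapEstimate(new int[]{8, 2}, ess);
        // Parent configuration never seen: the estimate stays at the uniform prior.
        double[] unseen = mapEstimate(new int[]{0, 0}, ess);
        System.out.printf("seen:   %.3f %.3f%n", seen[0], seen[1]);   // 0.650 0.350
        System.out.printf("unseen: %.3f %.3f%n", unseen[0], unseen[1]); // 0.500 0.500
    }
}
```

So after learning, the unseen column is still uniform, and inference over that column cannot prefer one state, which is the behavior Martin observes.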
Re: EM algorithm and smoothing
What you say makes sense. If a given combination of parent states never appears, the corresponding distribution in the child node obviously cannot be learned, because that column of the CPT is never updated. That is also exactly where smoothing makes a big difference, since it does not get washed away by the data. However, one usually expects that such a combination of parent states also never occurs in new/prediction data (i.e., are you using the right model?).
Re: EM algorithm and smoothing
Hi Mark,
Thanks a lot for your answer!
I agree with you: ideally, no combination of variables should appear only in the new data. However, that cannot really be avoided in my case (at least as far as I can see now), which is why I thought smoothing would help here.
Then, my question was how to implement smoothing in SMILE. From what I understand:
- I can play with the equivalent sample size
- I can change the prior distribution
Regarding the prior distribution, it is hard for me to decide what to do (even more so after Marek's answer), because I do not know which distribution I could start from other than uniform. That is why I wanted to know if any of you had tips!
Martin
Re: EM algorithm and smoothing
So if ESS=10 and the prior distribution is uniform (i.e., 0.5-0.5), then 5-5 is used as pseudo-counts for smoothing (they are combined with whatever counts the data contributes).
If ESS=10 and the prior distribution is 0.9-0.1, then 9-1 is used as pseudo-counts (more weight is given to the first parameter).
Etc. Does that make sense?
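Marek's arithmetic can be reproduced with a tiny standalone calculation. Again this is plain Java for illustration, not the SMILE API (the names are invented): pseudo-counts are ESS times the prior probabilities, and they are simply added to the observed counts before normalizing.

```java
// Standalone arithmetic check (plain Java, NOT the SMILE API): how ESS and
// the prior distribution turn into pseudo-counts, and how those combine with
// data counts to give the smoothed parameter estimate.
public class EssPseudoCounts {
    // Pseudo-counts contributed by the prior: ess * prior[i].
    static double[] pseudoCounts(double ess, double[] prior) {
        double[] pc = new double[prior.length];
        for (int i = 0; i < prior.length; i++) pc[i] = ess * prior[i];
        return pc;
    }

    // Final estimate once the observed data counts are added in.
    static double[] posterior(double[] pseudo, int[] dataCounts) {
        double total = 0;
        double[] p = new double[pseudo.length];
        for (int i = 0; i < pseudo.length; i++) total += pseudo[i] + dataCounts[i];
        for (int i = 0; i < pseudo.length; i++) p[i] = (pseudo[i] + dataCounts[i]) / total;
        return p;
    }

    public static void main(String[] args) {
        // ESS = 10, uniform prior 0.5-0.5  ->  pseudo-counts 5-5
        double[] a = pseudoCounts(10, new double[]{0.5, 0.5});
        // ESS = 10, prior 0.9-0.1          ->  pseudo-counts 9-1
        double[] b = pseudoCounts(10, new double[]{0.9, 0.1});
        System.out.printf("uniform: %.0f-%.0f%n", a[0], a[1]); // 5-5
        System.out.printf("skewed:  %.0f-%.0f%n", b[0], b[1]); // 9-1
        // Combine the 9-1 pseudo-counts with 20 observed cases (4 vs 16):
        double[] post = posterior(b, new int[]{4, 16});
        System.out.printf("posterior: %.3f-%.3f%n", post[0], post[1]);
    }
}
```

With 20 observed cases split 4-16, the 9-1 pseudo-counts give (9+4)/30 vs (1+16)/30, i.e., the prior still pulls the estimate toward the first state but is gradually outweighed as data accumulates.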
Re: EM algorithm and smoothing
Yes, it sure does!
I will try different ESS values and see if I can improve the learning.
Thanks!