by **MartinA** » Wed Apr 24, 2019 3:51 pm

Hi Marek, Mark,

Thanks a lot for your answers, and sorry if this was confusing. Let me try to rephrase and give a bit more detail about what I want to do.

First, I now understand that if you use the uniform distribution as a starting point, you are doing some smoothing (to avoid having zeros in the CPTs after learning). That was my initial question, and now I understand my results better.

The issue now is that not all possible cases are represented in the training data (i.e. some combinations of variable values never appear). The data comes from measurements that I cannot repeat. Please let me know if there is any flaw in the following reasoning:

- As I understand it, training starts from a uniform distribution, and the CPTs are then updated according to the cases seen in the training data. I would therefore assume that, for the cases not represented in the data, the probability values stay the same as at the beginning of training.

- Then, when I perform inference with the BN (for example MAP or MPE) and come across a case that never appeared in the data, I guess it is difficult for the BN to decide which outcome has the highest probability (since we started from uniform and nothing changed).

- In this case, I was expecting that smoothing would help, so that I would also have an idea of what happens even if this exact combination of variables has never been seen. But I did not really know how to implement it in SMILE.
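
To make the reasoning above concrete, here is a minimal sketch in plain Python (this is not SMILE code). It assumes that learning amounts to adding the observed counts to a uniform prior worth `prior_weight` pseudo-cases per parent configuration; the function `learn_cpt`, the parameter `prior_weight`, and the toy data are all made up for illustration and may not match what SMILE does internally.

```python
from collections import Counter

def learn_cpt(records, parent_states, child_states, prior_weight=1.0):
    """Estimate P(child | parent) from (parent, child) pairs.

    Assumption: every parent configuration starts from a uniform
    distribution worth `prior_weight` pseudo-cases in total, and the
    observed counts are simply added on top of it.
    """
    counts = Counter(records)
    # Uniform pseudo-count assigned to each child state.
    pseudo = prior_weight / len(child_states)
    cpt = {}
    for p in parent_states:
        total = sum(counts[(p, c)] for c in child_states) + prior_weight
        cpt[p] = {c: (counts[(p, c)] + pseudo) / total for c in child_states}
    return cpt

# Hypothetical training data: parent state "c" never appears.
data = [("a", "yes"), ("a", "yes"), ("a", "no"), ("b", "no")]
cpt = learn_cpt(data, ["a", "b", "c"], ["yes", "no"])
for parent, dist in cpt.items():
    print(parent, dist)
```

With this toy data, the rows for `a` and `b` are pulled toward the observed counts without ever reaching zero, while the row for the unseen parent state `c` stays exactly uniform. That is the situation I describe above: for `c`, MAP or MPE has no basis for preferring one child state over another.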

Sorry for the long post, I hope it's a bit clearer now :)

Martin