Hi GeNIe team,
While experimenting with the EM algorithm (Uniformize/Randomize) in GeNIe, I noticed that a smoothing mechanism seems to be applied.
Could you clarify what kind of smoothing or pseudocounts are used?
Thank you in advance.
EM algorithm
- marek [BayesFusion]
- Site Admin
- Posts: 451
- Joined: Tue Dec 11, 2007 4:24 pm
Re: EM algorithm
I'm afraid we don't use any smoothing, at least not consciously :-). Can you tell us more about it?
Cheers,
Marek
Re: EM algorithm
Hey Marek,
Thanks for the quick reply!
While experimenting, I noticed that after learning with EM in GeNIe I never end up with hard probabilities like exactly 0 or 1.
That made me wonder whether GeNIe might implicitly avoid zero probabilities, since hard 0/1 values can cause issues during inference.
I might be misinterpreting the behavior, but I wanted to check whether this is an intended effect.
- shooltz[BayesFusion]
- Site Admin
- Posts: 1483
- Joined: Mon Nov 26, 2007 5:51 pm
Re: EM algorithm
If the dataset used for parameter learning is complete, EM uses one-pass case counting. If the equivalent sample size is zero (so the initial network parameters have no influence on the output) and some CPT entries have no records in the dataset, 1.0/nodeOutcomeCount is added to their respective counts before normalization.
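To make the complete-data case concrete, here is a minimal sketch (not GeNIe's actual code) of the behavior described above: with equivalent sample size zero, a CPT column that matches no records gets 1.0/nodeOutcomeCount added to each outcome's count before normalization, which yields a uniform column instead of an undefined one.

```python
def learn_cpt_column(counts):
    """Normalize raw case counts for one parent configuration.

    counts: list of non-negative case counts, one per node outcome.
    Hypothetical illustration of the described one-pass counting:
    an all-zero column is filled with 1.0 / nodeOutcomeCount per
    outcome before normalization.
    """
    n = len(counts)
    if sum(counts) == 0:
        # No records for this parent configuration: add
        # 1.0 / nodeOutcomeCount to each outcome's count.
        counts = [c + 1.0 / n for c in counts]
    total = sum(counts)
    return [c / total for c in counts]

print(learn_cpt_column([8, 2]))    # observed column -> [0.8, 0.2]
print(learn_cpt_column([0, 0]))    # empty column -> uniform [0.5, 0.5]
```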
If the dataset contains missing entries, EM runs the expectation/maximization steps until convergence, and the fractional counts derived from the JPTs are not adjusted for zero avoidance. In that case you can get hard deterministic distributions in some CPT columns.
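A small hypothetical illustration of the missing-data case: since the fractional (expected) counts are normalized as-is, a column whose expected count for some outcome is exactly zero produces a hard 0/1 distribution.

```python
def normalize(fractional_counts):
    """Normalize EM's fractional counts for one CPT column,
    with no zero-avoidance adjustment (as described above)."""
    total = sum(fractional_counts)
    return [c / total for c in fractional_counts]

# One outcome received no expected count: the column becomes
# a hard deterministic distribution.
print(normalize([4.7, 0.0]))  # -> [1.0, 0.0]
```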