EM algorithm details

Ant_B
Posts: 1
Joined: Tue Mar 18, 2014 4:48 pm

Post by Ant_B »

Hi GeNIe/SMILE community,

I recently started to experiment with SMILE via JSmile. Following the documentation, I was able to get up and running - creating a network, reading in a simple data file that includes missing data, and performing EM on the dataset. The simple project is on my GitHub page (improvements/fixes very welcome):
https://github.com/amb-enthusiast/BayesianHack

I also repeated this exercise with SamIam's inflib.jar library and Mallet's GRMM library (part of the above GitHub project). When comparing results, I noticed a discrepancy: my by-hand calculations, GRMM and SamIam results all matched up with the tutorial notes. However, JSmile/SMILE gave different results. My hunch is that the differences are due to different initial estimated CPT values, but it may be something else.

I tried to look up details of the EM implementation, but couldn't find any in this forum or in the site documentation.

Having experimented with JSmile a little, I have a few questions about the SMILE EM implementation:
  • What is the default behaviour for JSmile/SMILE EM? In particular, what initial parameter estimate is used?
  • Is it possible to override these defaults, and supply initial parameters for the target BN to the EM algorithm?
  • Is it possible to set thresholds (on log-likelihood difference or max/target number of iterations) for EM execution?
I'd be grateful for any advice, information or guidance on these questions.

Thanks in advance,

Ant
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: EM algorithm details

Post by shooltz[BayesFusion] »

What is the default behaviour for JSmile/SMILE EM? In particular, what initial parameter estimate is used?
The initial values can be randomized, made uniform, or taken from the parameters already in the network. EM.setRandomizeParameters and EM.setUniformizeParameters control this behavior. Currently, randomize defaults to true and uniformize defaults to false, but I suggest calling both setters anyway to express your intent explicitly. To start from your own initial parameters, put them in the network and set both flags to false.

See also EM.get/setEqSampleSize, which defaults to 1.
Is it possible to set thresholds (on log-likelihood difference or max/target number of iterations) for EM execution?
No, this is not controlled through the API.
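
Since the stopping criteria are not exposed, a small stand-alone EM can be handy when cross-checking results against by-hand calculations. The following is only a minimal sketch and is not SMILE's implementation: the class name EmSketch, the two-node network A -> B (both binary, A sometimes missing), the sample records, and the tolerance and maxIterations parameters are all assumptions made for illustration. It shows where the initial parameter estimate enters and how a log-likelihood threshold and an iteration cap would end the loop.

```java
import java.util.Random;

/** Hypothetical stand-alone EM for a two-node network A -> B (both binary). Not SMILE code. */
final class EmSketch {
    static final int MISSING = -1;

    // Illustrative records {a, b}; a is MISSING in the last four rows.
    static final int[][] SAMPLE = {
        {1, 1}, {1, 1}, {1, 0}, {0, 0}, {0, 0}, {0, 1},
        {MISSING, 1}, {MISSING, 0}, {MISSING, 1}, {MISSING, 1}
    };

    static final class Result {
        final double[] theta;        // {P(A=1), P(B=1|A=0), P(B=1|A=1)}
        final double logLikelihood;  // log-likelihood of the data at theta
        final int iterations;        // EM iterations actually performed
        Result(double[] theta, double logLikelihood, int iterations) {
            this.theta = theta;
            this.logLikelihood = logLikelihood;
            this.iterations = iterations;
        }
    }

    // Joint probability P(A=a, B=b) under theta.
    static double joint(double[] theta, int a, int b) {
        double pA = (a == 1) ? theta[0] : 1.0 - theta[0];
        double pB1 = theta[1 + a];
        return pA * ((b == 1) ? pB1 : 1.0 - pB1);
    }

    // Log-likelihood of the observed values; a missing A is summed out.
    static double logLikelihood(int[][] data, double[] theta) {
        double ll = 0.0;
        for (int[] r : data) {
            ll += Math.log(r[0] == MISSING
                    ? joint(theta, 0, r[1]) + joint(theta, 1, r[1])
                    : joint(theta, r[0], r[1]));
        }
        return ll;
    }

    // Runs EM from 'initial' until the log-likelihood changes by less than
    // 'tolerance' or 'maxIterations' iterations have been performed.
    static Result run(int[][] data, double[] initial, double tolerance, int maxIterations) {
        double[] theta = initial.clone();
        double ll = logLikelihood(data, theta);
        int iterations = 0;
        while (iterations < maxIterations) {
            double nA1 = 0.0, nB1A0 = 0.0, nB1A1 = 0.0;
            for (int[] r : data) {
                // E-step: w = P(A=1 | record); an observed A counts fully.
                double w = r[0];
                if (r[0] == MISSING) {
                    double p1 = joint(theta, 1, r[1]);
                    w = p1 / (p1 + joint(theta, 0, r[1]));
                }
                nA1 += w;
                nB1A1 += w * r[1];
                nB1A0 += (1.0 - w) * r[1];
            }
            // M-step: re-estimate the parameters from the expected counts.
            theta = new double[] {nA1 / data.length, nB1A0 / (data.length - nA1), nB1A1 / nA1};
            double next = logLikelihood(data, theta);
            iterations++;
            boolean converged = Math.abs(next - ll) < tolerance;
            ll = next;
            if (converged) break;
        }
        return new Result(theta, ll, iterations);
    }

    // Random starting point, kept away from 0 and 1 to avoid log(0).
    static double[] randomStart(long seed) {
        Random rng = new Random(seed);
        double[] theta = new double[3];
        for (int i = 0; i < 3; i++) theta[i] = 0.1 + 0.8 * rng.nextDouble();
        return theta;
    }

    public static void main(String[] args) {
        Result uniform = run(SAMPLE, new double[] {0.5, 0.5, 0.5}, 1e-9, 500);
        Result random = run(SAMPLE, randomStart(42L), 1e-9, 500);
        for (Result x : new Result[] {uniform, random}) {
            System.out.printf("iterations=%d logLik=%.6f P(A=1)=%.4f P(B=1|A=0)=%.4f P(B=1|A=1)=%.4f%n",
                    x.iterations, x.logLikelihood, x.theta[0], x.theta[1], x.theta[2]);
        }
    }
}
```

Running main prints the estimates for a uniform and a seeded random start side by side. Passing your own array as the initial argument corresponds to starting from existing parameters, and comparing the printed values is a quick way to see how much of a discrepancy between tools could come from initialization alone.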