DSL_em

Header file: em.h


DSL_em();

The default constructor sets equivalent sample size to one, random seed to zero, and parameter randomization to true.


int Learn(const DSL_dataset& ds, DSL_network& orig,
    const std::vector<DSL_datasetMatch> &matches,
    double *loglik = NULL, DSL_progress *progress = 0);

int Learn(const DSL_dataset& ds, DSL_network& orig,
    const std::vector<DSL_datasetMatch> &matches,
    const std::vector<int> &fixedNodes,
    double *loglik = NULL, DSL_progress *progress = 0);

Learns the parameters of the specified network from the data set using the EM algorithm. Returns DSL_OKAY on success or an error code on failure.

Network nodes and data set variables are matched through the DSL_datasetMatch vector passed in the matches argument. Typically, this vector is obtained by a call to DSL_dataset::MatchNetwork, but it can also be created by your program if node and variable identifiers do not match.

The second overload should be used when some nodes' parameters are assumed to be fixed. The handles of these nodes are passed in the fixedNodes argument.

The optional argument loglik can be used to obtain the log likelihood from the EM algorithm. This value, ranging from minus infinity to zero, is a measure of fit of the model to the data.
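
To illustrate why the log likelihood lies between minus infinity and zero, the following sketch computes it for a single discrete variable with complete data. It is not SMILE code: logLikelihood and its arguments are hypothetical names chosen for this example, and the EM algorithm evaluates the quantity over the whole network, including records with missing values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SMILE.
// probs[k]   : model probability of outcome k
// records[i] : outcome index observed in record i
// Returns the sum of log P(record) over all records.
double logLikelihood(const std::vector<double>& probs,
                     const std::vector<int>& records)
{
    double ll = 0.0;
    for (std::size_t i = 0; i < records.size(); ++i)
    {
        int k = records[i];
        assert(k >= 0 && static_cast<std::size_t>(k) < probs.size());
        // Each probability is at most 1, so each term is at most 0.
        // A probability of 0 contributes minus infinity: the model
        // considers the observed outcome impossible.
        ll += std::log(probs[k]);
    }
    return ll;
}
```

When two models are scored on the same data, the one with the log likelihood closer to zero fits that data better.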

The optional argument progress can be used to stop learning by returning false from the DSL_progress::Tick method, which is called periodically within the main loop of the learning algorithm. In such a case, the Learn method returns DSL_INTERRUPTED.


void SetRandomizeParameters(bool r);

bool GetRandomizeParameters() const;

Sets/gets the value of the parameter randomization flag. If set to true, the network parameters will be randomized before entering the main loop of the EM algorithm. Defaults to true. If SetRandomizeParameters(true) is called, the uniformization is disabled and the equivalent sample size is set to 1.


void SetUniformizeParameters(bool u);

bool GetUniformizeParameters() const;

Sets/gets the value of the parameter uniformization flag. If set to true, the network parameters will be uniformized before entering the main loop of the EM algorithm. Defaults to false. If SetUniformizeParameters(true) is called, the randomization is disabled and the equivalent sample size is set to 1.


void SetSeed(int seed);

int GetSeed() const;

Sets/gets the seed used to initialize the random number generator. Defaults to zero, which causes a value based on the system clock to be used as the seed. Calling SetSeed does not automatically enable randomization.
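
The difference between uniformized and randomized starting parameters, and the role of the seed, can be sketched for a single probability distribution. The code below is an illustration only: uniformDistribution and randomDistribution are hypothetical helpers, the sampling scheme is an assumption, and SMILE's internal initialization may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helpers, not part of SMILE.

// Uniformization: every outcome gets the same probability.
std::vector<double> uniformDistribution(std::size_t outcomes)
{
    assert(outcomes > 0);
    return std::vector<double>(outcomes, 1.0 / static_cast<double>(outcomes));
}

// Randomization: draw positive weights and normalize them to sum to 1.
// Mirroring the convention described above, a seed of zero is replaced
// by a value based on the system clock (assumed mechanism).
std::vector<double> randomDistribution(std::size_t outcomes, unsigned seed)
{
    assert(outcomes > 0);
    if (seed == 0)
        seed = static_cast<unsigned>(
            std::chrono::system_clock::now().time_since_epoch().count());

    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> draw(0.01, 1.0);

    std::vector<double> p(outcomes);
    double sum = 0.0;
    for (std::size_t k = 0; k < outcomes; ++k)
    {
        p[k] = draw(gen);
        sum += p[k];
    }
    for (std::size_t k = 0; k < outcomes; ++k)
        p[k] /= sum;
    return p;
}
```

In this sketch, a fixed nonzero seed makes the random starting point repeatable between runs, while a seed of zero yields a different starting point each time.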


int SetEquivalentSampleSize(float eqs);

float GetEquivalentSampleSize() const;

Sets/gets the equivalent sample size. The equivalent sample size, also known as confidence, can be interpreted as the number of records that the current network parameters are based on. The larger the value, the less weight is assigned to new cases, which gives a mechanism for gentle refinement of the model's numerical parameters. The interpretation of this parameter is obvious when the entire network or its parameters have been learned from data: it should be equal to the number of records in the data file from which they were learned.

Equivalent sample size defaults to 1. Call SetRandomizeParameters(false) and SetUniformizeParameters(false) if you want to use larger values as equivalent sample sizes. SetEquivalentSampleSize fails if either randomization or uniformization is enabled.
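
One common way to read the equivalent sample size is as a pseudo-count weight on the existing parameters. The sketch below shows that reading for a single probability distribution. It is an illustration under that assumption, not SMILE's internal update, and blendWithCounts is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SMILE.
// prior  : current distribution over outcomes (sums to 1)
// counts : number of new records observed for each outcome
// ess    : equivalent sample size, i.e. how many records the
//          current parameters are treated as being based on
std::vector<double> blendWithCounts(const std::vector<double>& prior,
                                    const std::vector<double>& counts,
                                    double ess)
{
    assert(prior.size() == counts.size());
    assert(ess > 0.0);

    // Total weight: pseudo-records behind the prior plus new records.
    double total = ess;
    for (std::size_t k = 0; k < counts.size(); ++k)
        total += counts[k];

    std::vector<double> result(prior.size());
    for (std::size_t k = 0; k < prior.size(); ++k)
        result[k] = (ess * prior[k] + counts[k]) / total;
    return result;
}
```

With a prior of {0.5, 0.5} and ten new records that all show the first outcome, an equivalent sample size of 1 moves the first probability to about 0.95, while an equivalent sample size of 100 moves it only to about 0.55.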