Different Results in Learning

Post by **diana** » Thu Jul 22, 2010 2:56 pm

I'm using EM for parameter learning, and the results from GeNIe and SMILE differ quite a lot. I am using the default settings for EM learning in SMILE. Is there an explanation for that, or am I doing something wrong?

Code:

void EM(DSL_network &theNet, string NBfile_Learning, string DiscParameterFileName)
{
    DSL_dataset dataset;

    cout << "File to be read: " << DiscParameterFileName << endl;

    // Load the training data.
    if (dataset.ReadFile(DiscParameterFileName) != DSL_OKAY)
    {
        cout << "Reading failed!" << endl;
        return;
    }
    cout << "Reading successful!" << endl;

    // Map data set columns to network nodes.
    vector<DSL_datasetMatch> matchedData;
    string error;
    if (dataset.MatchNetwork(theNet, matchedData, error) != DSL_OKAY)
    {
        cout << "Matching failed! " << error << endl;
        return;
    }
    cout << "Matching successful!" << endl;

    // Run EM with the default DSL_em settings.
    DSL_em em;
    cout << "Learning in progress...." << endl;
    if (em.Learn(dataset, theNet, matchedData) != DSL_OKAY)
    {
        cout << "Learning failed!" << endl;
        return;
    }
    cout << "Learning successful!" << endl;

    // Save the network with the learned parameters.
    theNet.WriteFile(NBfile_Learning.c_str());
}

Thank you!

Tue Aug 03, 2010 2:18 pm

By default, DSL_em randomizes the initial parameters. Check whether running with randParams off in both GeNIe and SMILE produces similar results.

Post by **diana** » Thu Aug 12, 2010 7:22 am

First of all, thanks for your reply.

It still gives different results. But now the main difference is that GeNIe does not give me an estimate for parameters for which I don't have evidence.
So if a combination of states does not occur in my learning data, the distribution is left uniform (I am starting with a uniform distribution in every node).

Also, the SMILE learning takes much, much longer than the GeNIe learning. Why is that? (Same net, same training data.)

Thank you,
Diana

Thu Aug 12, 2010 10:55 pm

Your data file contains spaces after the separator semicolons. This causes the MatchNetwork method to return only two matches instead of three.
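
One way to rule this out is to normalize the file before loading it. The following is a minimal sketch in plain C++ (it makes no SMILE calls), assuming a semicolon-separated text file. The helper names `trim` and `normalizeLine` are made up for illustration and are not part of SMILE:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strip leading and trailing spaces, tabs and CRs from one field.
std::string trim(const std::string &s)
{
    const char *ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: rewrite a line such as "a; b ;c" as "a;b;c".
std::string normalizeLine(const std::string &line, char sep = ';')
{
    std::string out;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = line.find(sep, start);
        const std::size_t len =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        out += trim(line.substr(start, len));
        if (pos == std::string::npos) break; // last field handled
        out += sep;
        start = pos + 1;
    }
    return out;
}
```

Applying `normalizeLine` to every line of the data file (header included) and writing the result to a new file should give a file whose column names no longer carry stray spaces.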

diana wrote: But now the main difference is that GeNIe does not give me an estimate for parameters for which I don't have evidence.

How can it give any estimate in such a case? Again, make sure that both GeNIe and your SMILE program run EM with initial parameter randomization off.
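
To illustrate why no estimate is possible there, here is a small sketch of plain maximum-likelihood counting, which is what EM reduces to when every record is complete. It is not SMILE's implementation; `estimateCpt` and its arguments are hypothetical names. A parent configuration that never occurs in the data has zero counts, so there is nothing to normalize and its column keeps the starting values:

```cpp
#include <cstddef>
#include <vector>

// counts[j][k]: number of records with parent configuration j and child state k.
// init[j][k]:   starting CPT column for parent configuration j.
std::vector<std::vector<double>> estimateCpt(
    const std::vector<std::vector<int>> &counts,
    const std::vector<std::vector<double>> &init)
{
    std::vector<std::vector<double>> cpt = init;
    for (std::size_t j = 0; j < counts.size(); ++j)
    {
        int total = 0;
        for (int c : counts[j]) total += c;
        if (total == 0) continue; // no data for this configuration: keep the starting column
        for (std::size_t k = 0; k < counts[j].size(); ++k)
            cpt[j][k] = static_cast<double>(counts[j][k]) / total;
    }
    return cpt;
}
```

With a uniform start, an unobserved column stays uniform. With a randomized start, the same column would keep its random values, which could be mistaken for a learned estimate.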

diana wrote: Also, the SMILE learning takes much, much longer than the GeNIe learning. Why is that? (Same net, same training data.)

GeNIe uses the same code as SMILE for the actual learning (only data file parsing and matching differ). Are you sure you're comparing a release build of your SMILE program with GeNIe?

Post by **diana** » Fri Aug 13, 2010 10:19 am

Thanks for your prompt reply.

The net I am using is actually different from the one I posted; that one was just for testing the implementation of the learning. I am using a four-level BN with 25 input nodes. Unfortunately, I am not allowed to share the net or the training set.

For a training set of 150,000 records:
GeNIe takes 96 min.
In release mode, the SMILE implementation has been running for 2 days now...

And in previous runs there has always been a factor of at least 10x between the finishing times of GeNIe and SMILE.

Thank you,
Diana

P.S. Randomize parameters is set to off for both runs (so both start with a uniform distribution of the parameters they need to estimate).

Fri Aug 13, 2010 1:04 pm

diana wrote: For a training set of 150,000 records:
GeNIe takes 96 min.
In release mode, the SMILE implementation has been running for 2 days now...

And in previous runs there has always been a factor of at least 10x between the finishing times of GeNIe and SMILE.

The 10x factor strongly suggests a difference in node/column mappings between GeNIe and your SMILE app. Are you using the MatchNetwork method with your real data? If so, try printing the created matching entries before EM runs.
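
As a rough cross-check that does not depend on SMILE at all, the column headers of the data file can be compared against the node identifiers of the network. The sketch below assumes that columns are matched to nodes by exact name, which is an assumption about MatchNetwork rather than something confirmed here; `unmatchedColumns` is a hypothetical helper:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical diagnostic: return the column headers that do not exactly
// equal any node identifier (for example because of a stray space).
std::vector<std::string> unmatchedColumns(
    const std::vector<std::string> &columnNames,
    const std::vector<std::string> &nodeIds)
{
    std::vector<std::string> missing;
    for (const std::string &col : columnNames)
    {
        if (std::find(nodeIds.begin(), nodeIds.end(), col) == nodeIds.end())
            missing.push_back(col);
    }
    return missing;
}
```

Any column reported here would leave its node without data. EM then has to treat that node as unobserved in every record, which would plausibly make learning much slower.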

BayesFusion Support Forum
