Different Results in Learning

diana
Posts: 5
Joined: Wed Jun 30, 2010 4:04 pm

Post by diana »

I'm using EM for parameter learning, and the results from GeNIe and SMILE differ quite a lot. I am using the default settings for EM learning in SMILE. Is there an explanation for that, or am I doing something wrong?


void EM(DSL_network &theNet, string NBfile_Learning, string DiscParameterFileName)
{
    // Load the training data.
    DSL_dataset dataset;
    cout << "File to be read: " << DiscParameterFileName << endl;
    if (dataset.ReadFile(DiscParameterFileName) != DSL_OKAY)
    {
        cout << "Reading failed!" << endl;
        return;
    }
    cout << "Reading successful!" << endl;

    // Map data columns to network nodes; print the error text if matching fails.
    vector<DSL_datasetMatch> matchedData;
    string error;
    if (dataset.MatchNetwork(theNet, matchedData, error) != DSL_OKAY)
    {
        cout << "Matching failed! " << error << endl;
        return;
    }
    cout << "Matching successful!" << endl;

    // Run EM with the default settings.
    DSL_em em;
    cout << "Learning in progress...." << endl;
    if (em.Learn(dataset, theNet, matchedData) != DSL_OKAY)
    {
        cout << "Learning failed!" << endl;
        return;
    }
    cout << "Learning successful!" << endl;

    // Save the network with the learned parameters.
    theNet.WriteFile(NBfile_Learning.c_str());
}
Thank you!
Attachments
nbtest.xdsl: initial NB (1.34 KiB)
learningparameters.txt: learning parameters file (247 bytes)
nbtest_afterLearning.xdsl: NB after learning with SMILE (1.38 KiB)
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Different Results in Learning

Post by shooltz[BayesFusion] »

By default, DSL_em randomizes the initial parameters. Check whether running with randParams off in both GeNIe and SMILE produces similar results.

Post by diana »

First of all, thanks for your reply.

It still gives different results. But now the main difference is that GeNIe does not give me an estimate for parameters for which I don't have evidence.
So if a combination of states does not occur in my learning data, the distribution is left uniform (I am starting with a uniform distribution in every node).

Also, the SMILE learning takes much, much longer than the GeNIe learning. Why is that? (Same net, same training data.)

Thank you,
Diana

Post by shooltz[BayesFusion] »

Your data file contains spaces after the separator semicolons. This causes the MatchNetwork method to return only two matches instead of three.
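
One way to guard against this kind of problem is to normalize the header fields of the data file before SMILE reads it. The sketch below is plain C++ and does not call SMILE. `splitAndTrim` is a hypothetical helper, and it assumes the mismatch comes from column names that carry the stray space (for example " B" instead of "B"), which this thread does not spell out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of SMILE): split `line` on `sep` and strip
// leading/trailing spaces, tabs and carriage returns from every field.
std::vector<std::string> splitAndTrim(const std::string &line, char sep = ';')
{
    const char *blanks = " \t\r";
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true)
    {
        std::string::size_type end = line.find(sep, start);
        std::string field = (end == std::string::npos)
            ? line.substr(start)
            : line.substr(start, end - start);

        // Trim the field; a field made only of blanks becomes empty.
        std::string::size_type first = field.find_first_not_of(blanks);
        std::string::size_type last = field.find_last_not_of(blanks);
        fields.push_back(first == std::string::npos
            ? std::string()
            : field.substr(first, last - first + 1));

        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return fields;
}
```

Calling `splitAndTrim("A; B; C")` yields the three names `A`, `B` and `C` without the leading spaces, so they can be compared directly against the node identifiers in the network or written back to a cleaned copy of the file.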
diana wrote: But now the main difference is that GeNIe does not give me an estimate for parameters for which I don't have evidence.

How can it give any estimate in such a case? Again, make sure that both GeNIe and your SMILE program run EM with initial parameter randomization off.
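
To see why an unobserved parent configuration offers nothing to estimate from, consider the fully observed case, where the parameter update reduces to normalizing counts. The sketch below is plain C++ and is not SMILE's implementation. `estimateCpt` and its arguments are hypothetical names, and keeping the initial distribution for configurations with zero count is an assumption chosen to mirror the behavior described in this thread when randomization is off.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// cpt[parentConfig][childState] holds P(child = state | parents = config).
using Cpt = std::vector<std::vector<double>>;

// Hypothetical helper: re-estimate a CPT from complete records, each given
// as a (parentConfig, childState) pair. Configurations that never occur in
// the data keep whatever distribution `initial` assigned to them.
Cpt estimateCpt(Cpt initial,
                const std::vector<std::pair<std::size_t, std::size_t>> &records)
{
    // Count how often each child state occurs under each parent configuration.
    std::vector<std::vector<double>> counts(initial.size());
    for (std::size_t c = 0; c < initial.size(); ++c)
        counts[c].assign(initial[c].size(), 0.0);
    for (const auto &r : records)
        counts.at(r.first).at(r.second) += 1.0;

    for (std::size_t c = 0; c < initial.size(); ++c)
    {
        double total = 0.0;
        for (double n : counts[c])
            total += n;
        if (total == 0.0)
            continue; // no data for this configuration: nothing to estimate
        for (std::size_t s = 0; s < initial[c].size(); ++s)
            initial[c][s] = counts[c][s] / total;
    }
    return initial;
}
```

With a uniform two-state starting point and records that only ever show parent configuration 0, the row for configuration 0 is replaced by the observed frequencies, while the row for configuration 1 stays at 0.5/0.5. With randomized initial parameters, that untouched row would instead keep its random starting values, which is one plausible source of differences between runs.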
diana wrote: Also, the SMILE learning takes much, much longer than the GeNIe learning. Why is that? (Same net, same training data.)
GeNIe uses the same code as SMILE for the actual learning (data file parsing and matching are different). Are you sure you're comparing a release build of your program with GeNIe?

Post by diana »

Thanks for your prompt reply.

The net I am using is actually different from the one posted; that one was just for testing the implementation of the learning. I am using a four-level BN with 25 input nodes. Unfortunately, I am not allowed to share the net or the training set.

For a training set of 150 000 records:
GeNIe takes: 96 min
In release mode, the SMILE implementation has been running for 2 days now...

And in previous runs there has always been a factor of at least 10x between the finishing times of GeNIe and SMILE.

Thank you,
Diana

P.S. Randomize parameters is set to off for both runs (so both start with a uniform distribution of the parameters they need to estimate).

Post by shooltz[BayesFusion] »

diana wrote: For a training set of 150 000 records:
Genie takes: 96min
On release mode the smile implementation has been running for 2 days now...

And in previous runs there's always been a factor of at least 10x between the finishing times of Genie, resp. Smile.
The 10x factor strongly suggests a difference in node/column mappings between GeNIe and your SMILE app. Are you using the MatchNetwork method with your real data? If so, try printing the created matching entries before EM runs.
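
A minimal sketch of such a check follows. It is plain C++ with a stand-in `Match` struct that pairs a node handle with a data column index. That pairing is an assumption about what a matching entry contains, so adapt the field names to the actual `DSL_datasetMatch` definition in your SMILE headers. `describeMatches` and `findUnmatchedNodes` are hypothetical helpers, not SMILE calls.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a matching entry (assumed layout): which data column
// feeds which network node.
struct Match
{
    int node;
    int column;
};

// Build one line of text per matching entry so the mapping can be inspected.
std::string describeMatches(const std::vector<Match> &matches,
                            const std::vector<std::string> &nodeIds,
                            const std::vector<std::string> &columnNames)
{
    std::ostringstream out;
    for (const Match &m : matches)
        out << "column " << m.column << " (" << columnNames.at(m.column)
            << ") -> node " << m.node << " (" << nodeIds.at(m.node) << ")\n";
    return out.str();
}

// Return the handles (0 .. nodeCount-1) of nodes that no column was mapped to.
std::vector<int> findUnmatchedNodes(const std::vector<Match> &matches, int nodeCount)
{
    std::set<int> matched;
    for (const Match &m : matches)
        matched.insert(m.node);

    std::vector<int> unmatched;
    for (int n = 0; n < nodeCount; ++n)
        if (matched.count(n) == 0)
            unmatched.push_back(n);
    return unmatched;
}
```

Printing the result of `describeMatches` just before the learning call, and checking that `findUnmatchedNodes` returns an empty list, shows whether any node was left without data. Nodes without a matched column are treated as unobserved in every record, which is one way the same network and data could lead to very different running times.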