Parameters learning with jSmile

dtodor · Post by **dtodor** » Tue May 26, 2009 11:55 am

Hi,

When I use the EM class to learn the parameters of a network, I get a different result than when I learn the parameters from the same data set using GeNIe. The result delivered by GeNIe is the correct one (I've checked it against several other BN toolboxes). This is the code that I am using:

final DataSet dataSet = new DataSet();
dataSet.readFile(smileData.getAbsolutePath());
final DataMatch[] dataMatches = dataSet.matchNetwork(smileNetwork);
for (DataMatch dataMatch : dataMatches) {
    LOGGER.info("Matched column " + dataMatch.column + " with node "
            + smileNetwork.getNodeId(dataMatch.node));
}

final EM em = new EM();
em.setEqSampleSize(2);
em.setRandomizeParameters(true);
em.learn(dataSet, smileNetwork, dataMatches);

and this is the output:

11:36:49.628 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 0 with node A
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 1 with node B
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 2 with node C
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 3 with node D
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 4 with node E
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 5 with node F
11:36:49.631 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 6 with node H
11:36:49.633 [main] [INFO ] [SmileLearner] [learn:101] - Using equivalent sample size of 2
11:36:49.788 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'A': 0.4993351776532061,0.5006648223467939
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'B': 0.48866942768165256,0.5113305723183474
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'C': 0.5085294258975455,0.4914705741024545
11:36:49.791 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'H': 0.23253331363720545,0.7674666863627945,0.6798108275443848,0.3201891724556151,0.09782451050574072,0.9021754894942593,0.5998807745262033,0.40011922547379675,0.4253543994189593,0.5746456005810406,0.950173804200744,0.04982619579925605,0.08742642342156699,0.9125735765784331,0.2624864926039031,0.7375135073960969
11:36:49.796 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'D': 0.21295359092548244,0.7870464090745175,0.3808818323836003,0.6191181676163997
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'E': 0.5870861303434215,0.41291386965657856,0.5183475619628168,0.48165243803718316
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'F': 0.3218042101710616,0.6781957898289384,0.38842898782100094,0.611571012178999

I've attached the network file as well as the data set file. The network's parameters can be seen as the ground truth.

Any help is highly appreciated!

Thanks in advance,

Todor

Wed May 27, 2009 11:39 am

dtodor wrote:When I use the EM class to learn the parameters of a network, I get a different result than when I learn the parameters from the same data set using GeNIe. The result delivered by GeNIe is the correct one (I've checked it against several other BN toolboxes).

I can't reproduce that: the numbers from my Java program are different from those you've posted here. Are you using the most recent jSMILE binaries? What's your platform?

dtodor · Post by **dtodor** » Wed May 27, 2009 12:53 pm

I'm using the latest jSmile for Mac OS X.

dtodor · Post by **dtodor** » Wed May 27, 2009 1:43 pm

This doesn't work under Mac OS either:

#include <iostream>
#include <vector>

#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {

DSL_network network;
if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
std::cout << "Unable to read network\n";
return -1;
}
std::cout << "1. Successfully read network\n";

DSL_dataset dataset;
if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
return -2;
}
std::cout << "2. Successfully read data set\n";

std::string errMsg;

std::vector<DSL_datasetMatch> matches;
if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
return -3;
}
std::cout << "3. Successfully calculated matches\n";

std::vector<int> fixedNodes;

DSL_em em;
em.SetEquivalentSampleSize(2);
em.SetRandomizeParameters(true);
if (em.Learn(dataset, network, matches) != DSL_OKAY) {
std::cout << errMsg;
return -4;
}
std::cout << "4. Successfully learnt network\n";

if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
return -5;
}
std::cout << "5. Successfully written learnt network\n";

DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
std::cout << "Definition for node H: ";
for (int i=0; i<hdef.GetSize(); i++) {
std::cout << hdef[i];
if (i < hdef.GetSize()-1) {
std::cout << ",";
}
}
std::cout << "\n";

return 0;
}

Wed May 27, 2009 1:43 pm

dtodor wrote:I'm using the latest jSmile for Mac OS X.

That's going to be hard to reproduce, because we don't have any Macs at hand. The only thing we do is build the core C++ libs using virtualized Darwin. Can you test your program on another OS?

dtodor · Post by **dtodor** » Wed May 27, 2009 2:13 pm

Maybe I'm missing something when using the EM class?

dtodor · Post by **dtodor** » Wed May 27, 2009 2:13 pm

Do you have a C++ example for EM parameters learning?

Wed May 27, 2009 3:02 pm

dtodor wrote:Maybe I'm missing something when using the EM class?

You shouldn't use equivalent sample size with randomized initial parameters, but it has no effect on the output from my sample program. I tested both Java and C++.

dtodor wrote:Do you have a C++ example for EM parameters learning?

Try the code below, but make sure you replace the paths to the network and data files:

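#include <cstdio>
#include <string>
#include <vector>

#include "smile.h"
#include "smilearn.h"

using namespace std;

int main(int argc, char* argv[])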
{	
	ErrorH.RedirectToFile(stdout);

	DSL_network net;
	if (DSL_OKAY != net.ReadFile("d:/tmpx/abc-h-def-network-original.xdsl"))
	{
		return -1;
	}

	DSL_dataset ds;
	if (DSL_OKAY != ds.ReadFile("d:/tmpx/abc-h-def-network-original.txt"))
	{
		return -2;
	}

	vector<DSL_datasetMatch> matching;
	string err;
	if (DSL_OKAY != ds.MatchNetwork(net, matching, err))
	{
		return -3;
	}

	for (unsigned i = 0; i < matching.size(); i ++)
	{
		const DSL_datasetMatch &m = matching[i];
		printf("%d col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, net.GetNode(m.node)->GetId());
	}

	DSL_em em;
	em.SetEquivalentSampleSize(2);
	em.SetRandomizeParameters(true);

	if (DSL_OKAY != em.Learn(ds, net, matching))
	{
		return -4;
	}

	for (int h = net.GetFirstNode(); h >= 0; h = net.GetNextNode(h))
	{
		printf("%d %s\n", h, net.GetNode(h)->GetId());
		const DSL_Dmatrix *mtx = net.GetNode(h)->Definition()->GetMatrix();
		for (int i = 0; i < mtx->GetSize(); i ++)
		{
			printf("%g ", (*mtx)[i]);
		}
		printf("\n");
	}

	return 0;
}

dtodor · Post by **dtodor** » Wed May 27, 2009 4:28 pm

The following code delivers the CORRECT results under Windows. Unfortunately, this is NOT the case for Mac OS X:

#include <iostream>
#include <vector>

#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {

DSL_network network;
if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
std::cout << "Unable to read network\n";
return -1;
}
std::cout << "1. Successfully read network\n";

DSL_dataset dataset;
if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
return -2;
}
std::cout << "2. Successfully read data set\n";

std::string errMsg;

std::vector<DSL_datasetMatch> matches;
if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
return -3;
}
std::cout << "3. Successfully calculated matches\n";

for (int i=0; i<matches.size(); i++) {
const DSL_datasetMatch &m = matches[i];
printf("%d col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, network.GetNode(m.node)->GetId());
}

std::vector<int> fixedNodes;

DSL_em em;
em.SetEquivalentSampleSize(2);
em.SetRandomizeParameters(true);
if (em.Learn(dataset, network, matches) != DSL_OKAY) {
std::cout << errMsg;
return -4;
}
std::cout << "4. Successfully learnt network\n";

if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
return -5;
}
std::cout << "5. Successfully written learnt network\n";

DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
std::cout << "Definition for node H: ";
for (int i=0; i<hdef.GetSize(); i++) {
std::cout << hdef[i];
if (i < hdef.GetSize()-1) {
std::cout << ",";
}
}
std::cout << "\n";

return 0;
}

Wed May 27, 2009 4:32 pm

Do they produce identical output if you remove parameter randomization?

dtodor · Post by **dtodor** » Wed May 27, 2009 4:51 pm

Without parameter randomization, the code still works under Windows and under Mac OS X still produces the wrong results.

Wed May 27, 2009 4:56 pm

dtodor wrote:Without parameter randomization, the code still works under Windows and under Mac OS X still produces the wrong results.

Can you post the results from both platforms, without randomization?

Wed May 27, 2009 5:03 pm

BTW, the EM randomization is enabled by default, so please make sure you actually call DSL_em::SetRandomizeParameters(false) - commenting out SetRandomizeParameters(true) is not enough.

dtodor · Post by **dtodor** » Wed May 27, 2009 5:32 pm

Windows :: SetRandomizeParameters(false):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133

Windows :: SetRandomizeParameters(true):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133

------------------

Mac OS X :: SetRandomizeParameters(false):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098

Mac OS X :: SetRandomizeParameters(true):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098
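
To pinpoint where the two platforms diverge, the printed tables can be compared entry by entry. The helper below is a minimal sketch for that purpose. `differingEntries` and its tolerance are hypothetical and not part of SMILE; the function only assumes that each table has been copied into a flat vector in the order shown above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SMILE): returns the indices at which two
// flattened parameter tables differ by more than `tol`. If the tables have
// different lengths, every index past the end of the shorter one is reported.
std::vector<std::size_t> differingEntries(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          double tol = 1e-6) {
    std::vector<std::size_t> diff;
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    const std::size_t longest = a.size() > b.size() ? a.size() : b.size();

    for (std::size_t i = 0; i < common; ++i) {
        if (std::fabs(a[i] - b[i]) > tol) diff.push_back(i);
    }
    for (std::size_t i = common; i < longest; ++i) diff.push_back(i);
    return diff;
}
```

For node D, calling it with the Windows values {0.20216, 0.79784, 0.70362, 0.29638} and the Mac OS X values {0.20216, 0.79784, 0.677407, 0.322593} reports indices 2 and 3: the first two entries agree on both platforms and the last two do not.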

Thu May 28, 2009 6:10 pm

I can't reproduce your results on Windows. I was using two different VC++ versions. I'll need the following information before we can solve this issue:

1) VC++ version you're using, including installed service packs
2) sizes and dates of *.lib files from smile_1_1_vcxx.zip
3) the data file and the network file used to run the program. There are already two .xdsls posted in this topic.

I've also noticed that the output from your Java app in the first post has node H matched to column 6, but the attached data file has column H at position 3 (0-based). SMILearn's matching is done by node/column ID, so it seems that the data file used with Java was different from the one posted here.
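
The point about ID-based matching can be illustrated with a small standalone sketch. It is not SMILearn's implementation, and `matchByName` is a hypothetical name; it only shows that when columns are paired with nodes by identifier, the column index reported for a node reflects where that column sits in the data file.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration of ID-based matching; not SMILearn's code.
// Returns (column index, node index) pairs for every column whose name
// equals a node ID. Columns without a matching node are skipped.
std::vector<std::pair<std::size_t, std::size_t>> matchByName(
        const std::vector<std::string>& columns,
        const std::vector<std::string>& nodeIds) {
    std::unordered_map<std::string, std::size_t> nodeIndex;
    for (std::size_t n = 0; n < nodeIds.size(); ++n) nodeIndex[nodeIds[n]] = n;

    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t c = 0; c < columns.size(); ++c) {
        auto it = nodeIndex.find(columns[c]);
        if (it != nodeIndex.end()) matches.emplace_back(c, it->second);
    }
    return matches;
}
```

With columns ordered A, B, C, D, E, F, H (the order in the Java log) and nodes ordered A, B, C, H, D, E, F (the handle order printed by the C++ program), column 6 is paired with node 3. A file with H in column 3 would instead pair column 3 with node 3, which is why the log points to a different data file.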
