Parameters learning with jSmile

dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

Hi,

When I use the EM class to learn the parameters of a network, I get a different result than when I learn the parameters from the same data set using Genie. The result delivered by Genie is the correct one (I've checked it against several other BN toolboxes). This is the code that I am using:

final DataSet dataSet = new DataSet();
dataSet.readFile(smileData.getAbsolutePath());
final DataMatch[] dataMatches = dataSet.matchNetwork(smileNetwork);
for (DataMatch dataMatch : dataMatches) {
    LOGGER.info("Matched column " + dataMatch.column + " with node "
            + smileNetwork.getNodeId(dataMatch.node));
}

final EM em = new EM();
em.setEqSampleSize(2);
em.setRandomizeParameters(true);
em.learn(dataSet, smileNetwork, dataMatches);

and this is the output:

11:36:49.628 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 0 with node A
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 1 with node B
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 2 with node C
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 3 with node D
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 4 with node E
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 5 with node F
11:36:49.631 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 6 with node H
11:36:49.633 [main] [INFO ] [SmileLearner] [learn:101] - Using equivalent sample size of 2
11:36:49.788 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'A': 0.4993351776532061,0.5006648223467939
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'B': 0.48866942768165256,0.5113305723183474
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'C': 0.5085294258975455,0.4914705741024545
11:36:49.791 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'H': 0.23253331363720545,0.7674666863627945,0.6798108275443848,0.3201891724556151,0.09782451050574072,0.9021754894942593,0.5998807745262033,0.40011922547379675,0.4253543994189593,0.5746456005810406,0.950173804200744,0.04982619579925605,0.08742642342156699,0.9125735765784331,0.2624864926039031,0.7375135073960969
11:36:49.796 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'D': 0.21295359092548244,0.7870464090745175,0.3808818323836003,0.6191181676163997
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'E': 0.5870861303434215,0.41291386965657856,0.5183475619628168,0.48165243803718316
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'F': 0.3218042101710616,0.6781957898289384,0.38842898782100094,0.611571012178999

I've attached the network file as well as the data set file. The network's parameters can be seen as the ground truth.

Any help is highly appreciated!

Thanks in advance,

Todor
Attachments
abc-h-def-network-original.xdsl
Network file with ground truth for the parameters.
(2.61 KiB) Downloaded 503 times
abc-h-def-network-original.txt
The training data set. The samples are generated from the network using Genie.
(24.62 KiB) Downloaded 506 times
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Parameters learning with jSmile

Post by shooltz[BayesFusion] »

dtodor wrote: when I use the EM class for learning the parameters of a network I get a different result as compared to when learning the parameters from the same data set using Genie. The result delivered by Genie is the correct one (I've checked it against several other BN toolboxes).
I can't reproduce that. The numbers from my Java program are different from those you've posted here. Are you using the most recent jSMILE binaries? What's your platform?
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

I'm using the latest jSmile for Mac OS X.
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

The equivalent C++ program doesn't work under Mac OS either:

#include <iostream>
#include <string>
#include <vector>

#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {

    DSL_network network;
    if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
        std::cout << "Unable to read network\n";
        return -1;
    }
    std::cout << "1. Successfully read network\n";

    DSL_dataset dataset;
    if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
        return -2;
    }
    std::cout << "2. Successfully read data set\n";

    std::string errMsg;

    // Match data set columns to network nodes.
    std::vector<DSL_datasetMatch> matches;
    if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
        return -3;
    }
    std::cout << "3. Successfully calculated matches\n";

    std::vector<int> fixedNodes;

    DSL_em em;
    em.SetEquivalentSampleSize(2);
    em.SetRandomizeParameters(true);
    if (em.Learn(dataset, network, matches) != DSL_OKAY) {
        std::cout << errMsg;
        return -4;
    }
    std::cout << "4. Successfully learnt network\n";

    if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
        return -5;
    }
    std::cout << "5. Successfully written learnt network\n";

    // Print the flattened CPT of node H, comma-separated.
    DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
    std::cout << "Definition for node H: ";
    for (int i=0; i<hdef.GetSize(); i++) {
        std::cout << hdef[i];
        if (i < hdef.GetSize()-1) {
            std::cout << ",";
        }
    }
    std::cout << "\n";

    return 0;
}
Attachments
abc-h-def-network-randomized.xdsl
Randomized probability densities
(3.11 KiB) Downloaded 433 times
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

dtodor wrote: I'm using the latest jSmile for Mac OS X.
That's going to be hard to reproduce, as we don't have any Macs at hand. All we do is build the core C++ libs using virtualized Darwin. Can you test your program on another OS?
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

Maybe I'm missing something when using the EM class?
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

Do you have a C++ example for EM parameters learning?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

dtodor wrote: Maybe I'm missing something when using the EM class?
You shouldn't use equivalent sample size with randomized initial parameters, but it has no effect on the output from my sample program. I tested both Java and C++.
dtodor wrote: Do you have a C++ example for EM parameters learning?
Try the code below, but make sure you've replaced the paths for network and data file:

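#include <cstdio>
#include <string>
#include <vector>

#include "smile.h"
#include "smilearn.h"

using namespace std;

int main(int argc, char* argv[])
{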
	ErrorH.RedirectToFile(stdout);

	DSL_network net;
	if (DSL_OKAY != net.ReadFile("d:/tmpx/abc-h-def-network-original.xdsl"))
	{
		return -1;
	}

	DSL_dataset ds;
	if (DSL_OKAY != ds.ReadFile("d:/tmpx/abc-h-def-network-original.txt"))
	{
		return -2;
	}

	vector<DSL_datasetMatch> matching;
	string err;
	if (DSL_OKAY != ds.MatchNetwork(net, matching, err))
	{
		return -3;
	}

	for (unsigned i = 0; i < matching.size(); i ++)
	{
		const DSL_datasetMatch &m = matching[i];
		printf("%d col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, net.GetNode(m.node)->GetId());
	}

	DSL_em em;
	em.SetEquivalentSampleSize(2);
	em.SetRandomizeParameters(true);

	if (DSL_OKAY != em.Learn(ds, net, matching))
	{
		return -4;
	}

	for (int h = net.GetFirstNode(); h >= 0; h = net.GetNextNode(h))
	{
		printf("%d %s\n", h, net.GetNode(h)->GetId());
		const DSL_Dmatrix *mtx = net.GetNode(h)->Definition()->GetMatrix();
		for (int i = 0; i < mtx->GetSize(); i ++)
		{
			printf("%g ", (*mtx)[i]);
		}
		printf("\n");
	}

	return 0;
}

dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

The following code delivers the CORRECT results under Windows. Unfortunately, this is NOT the case under Mac OS X.


#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {

    DSL_network network;
    if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
        std::cout << "Unable to read network\n";
        return -1;
    }
    std::cout << "1. Successfully read network\n";

    DSL_dataset dataset;
    if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
        return -2;
    }
    std::cout << "2. Successfully read data set\n";

    std::string errMsg;

    // Match data set columns to network nodes.
    std::vector<DSL_datasetMatch> matches;
    if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
        return -3;
    }
    std::cout << "3. Successfully calculated matches\n";

    // Print every column/node match.
    for (int i=0; i<static_cast<int>(matches.size()); i++) {
        const DSL_datasetMatch &m = matches[i];
        printf("%d col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, network.GetNode(m.node)->GetId());
    }

    std::vector<int> fixedNodes;

    DSL_em em;
    em.SetEquivalentSampleSize(2);
    em.SetRandomizeParameters(true);
    if (em.Learn(dataset, network, matches) != DSL_OKAY) {
        std::cout << errMsg;
        return -4;
    }
    std::cout << "4. Successfully learnt network\n";

    if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
        return -5;
    }
    std::cout << "5. Successfully written learnt network\n";

    // Print the flattened CPT of node H, comma-separated.
    DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
    std::cout << "Definition for node H: ";
    for (int i=0; i<hdef.GetSize(); i++) {
        std::cout << hdef[i];
        if (i < hdef.GetSize()-1) {
            std::cout << ",";
        }
    }
    std::cout << "\n";

    return 0;
}
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

Do they produce identical output if you remove parameter randomization?
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

Without parameter randomization, the code still works under Windows and still produces the wrong results under Mac OS X.
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

dtodor wrote: Without parameter randomization, the code still works under Windows and under Mac OS X still produces the wrong results.
Can you post the results from both platforms, without randomization?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

BTW, the EM randomization is enabled by default, so please make sure you actually call DSL_em::SetRandomizeParameters(false) - commenting out SetRandomizeParameters(true) is not enough.
dtodor
Posts: 13
Joined: Tue Feb 03, 2009 8:30 am

Post by dtodor »

Windows :: SetRandomizeParameters(false):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133


Windows :: SetRandomizeParameters(true):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133

------------------

Mac OS X :: SetRandomizeParameters(false):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098

Mac OS X :: SetRandomizeParameters(true):

0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098
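
To see exactly which entries disagree between the two platforms, the listings above can be compared element by element. The following is only a minimal C++ sketch that does not use SMILE at all; the helper name differingEntries and the tolerance parameter are illustrative assumptions, not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SMILE): returns the indices at which two
// flattened CPTs differ by more than `tol`.
std::vector<std::size_t> differingEntries(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          double tol) {
    assert(a.size() == b.size() && "CPTs must have the same number of entries");
    std::vector<std::size_t> diff;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            diff.push_back(i);  // entry i disagrees between the two listings
        }
    }
    return diff;
}
```

For node D, differingEntries({0.20216, 0.79784, 0.70362, 0.29638}, {0.20216, 0.79784, 0.677407, 0.322593}, 1e-6) returns the indices 2 and 3: the first two entries agree on both platforms and the last two do not. The E and F rows show the same pattern, and for H only the first two of the sixteen entries agree.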
Attachments
abc-h-def-network-original.txt
Training data
(24.62 KiB) Downloaded 478 times
abc-h-def-network-randomized.xdsl
Network
(3.11 KiB) Downloaded 497 times
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

I can't reproduce your results on Windows. I was using two different VC++ versions. I'll need the following information before we can solve this issue:

1) VC++ version you're using, including installed service packs
2) sizes and dates of *.lib files from smile_1_1_vcxx.zip
3) the data file and the network file used to run the program. There are already two .xdsl files posted in this topic.

I've also noticed that the output from your Java app in the first post has node H matched to column 6, but the attached data file has column H at position 3 (0-based). SMILearn's matching is done by node/column ID, so it seems that the data file used with Java was different from the one posted here.
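
To illustrate why ID-based matching points to a different data file, here is a minimal C++ sketch of matching columns to nodes by ID. It is not SMILearn's implementation: ColumnMatch and matchById are hypothetical names, and the full column orders used in the note after the code are assumptions (only H at position 3 in the posted file, and the column/node pairs in the Java log, come from this topic).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a column/node match (not SMILE's DSL_datasetMatch).
struct ColumnMatch {
    int column;  // 0-based column index in the data file
    int node;    // index of the matched node in `nodeIds`
};

// Pairs each data column with the node that has the same ID.
// Columns without a same-named node are skipped.
std::vector<ColumnMatch> matchById(const std::vector<std::string>& columnIds,
                                   const std::vector<std::string>& nodeIds) {
    std::unordered_map<std::string, int> nodeIndex;
    for (std::size_t n = 0; n < nodeIds.size(); ++n) {
        bool inserted = nodeIndex.emplace(nodeIds[n], static_cast<int>(n)).second;
        assert(inserted && "node IDs must be unique");
        (void)inserted;
    }
    std::vector<ColumnMatch> matches;
    for (std::size_t c = 0; c < columnIds.size(); ++c) {
        auto it = nodeIndex.find(columnIds[c]);
        if (it != nodeIndex.end()) {
            matches.push_back({static_cast<int>(c), it->second});
        }
    }
    return matches;
}
```

With the nodes in the order A, B, C, H, D, E, F, a header of A, B, C, H, D, E, F matches H to column 3. A header of A, B, C, D, E, F, H matches H to column 6, which is what the Java log in the first post shows. Because the column position never influences which node is chosen, the two outputs can only come from files with different column orders.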