Parameters learning with jSmile
Hi,
when I use the EM class to learn the parameters of a network, I get a different result than when I learn the parameters from the same data set using Genie. The result delivered by Genie is the correct one (I've checked it against several other BN toolboxes). This is the code that I am using:
final DataSet dataSet = new DataSet();
dataSet.readFile(smileData.getAbsolutePath());
final DataMatch[] dataMatches = dataSet.matchNetwork(smileNetwork);
for (DataMatch dataMatch : dataMatches) {
    LOGGER.info("Matched column " + dataMatch.column + " with node "
            + smileNetwork.getNodeId(dataMatch.node));
}
final EM em = new EM();
em.setEqSampleSize(2);
em.setRandomizeParameters(true);
em.learn(dataSet, smileNetwork, dataMatches);
and this is the output:
11:36:49.628 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 0 with node A
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 1 with node B
11:36:49.629 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 2 with node C
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 3 with node D
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 4 with node E
11:36:49.630 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 5 with node F
11:36:49.631 [main] [INFO ] [SmileLearner] [learn:85] - Matched column 6 with node H
11:36:49.633 [main] [INFO ] [SmileLearner] [learn:101] - Using equivalent sample size of 2
11:36:49.788 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'A': 0.4993351776532061,0.5006648223467939
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'B': 0.48866942768165256,0.5113305723183474
11:36:49.789 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'C': 0.5085294258975455,0.4914705741024545
11:36:49.791 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'H': 0.23253331363720545,0.7674666863627945,0.6798108275443848,0.3201891724556151,0.09782451050574072,0.9021754894942593,0.5998807745262033,0.40011922547379675,0.4253543994189593,0.5746456005810406,0.950173804200744,0.04982619579925605,0.08742642342156699,0.9125735765784331,0.2624864926039031,0.7375135073960969
11:36:49.796 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'D': 0.21295359092548244,0.7870464090745175,0.3808818323836003,0.6191181676163997
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'E': 0.5870861303434215,0.41291386965657856,0.5183475619628168,0.48165243803718316
11:36:49.797 [main] [INFO ] [SmileLearner] [learn:154] - Definition for node 'F': 0.3218042101710616,0.6781957898289384,0.38842898782100094,0.611571012178999
I've attached the network file as well as the data set file. The network's parameters can be seen as the ground truth.
Any help is highly appreciated!
Thanks in advance,
Todor
Attachments:
- abc-h-def-network-original.xdsl: Network file with ground truth for the parameters. (2.61 KiB)
- abc-h-def-network-original.txt: The training data set. The samples are generated from the network using Genie. (24.62 KiB)
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Parameters learning with jSmile
dtodor wrote: when I use the EM class for learning the parameters of a network I get a different result as compared to when learning the parameters from the same data set using Genie. The result delivered by Genie is the correct one (I've checked it against several other BN toolboxes).
I can't reproduce that - the numbers from my Java program are different from those you've posted here. Are you using the most recent jSMILE binaries? What's your platform?
This doesn't work either under Mac OS:
#include <iostream>
#include <string>
#include <vector>
#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {
    DSL_network network;
    if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
        std::cout << "Unable to read network\n";
        return -1;
    }
    std::cout << "1. Successfully read network\n";

    DSL_dataset dataset;
    if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
        return -2;
    }
    std::cout << "2. Successfully read data set\n";

    std::string errMsg;
    std::vector<DSL_datasetMatch> matches;
    if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
        return -3;
    }
    std::cout << "3. Successfully calculated matches\n";

    std::vector<int> fixedNodes;
    DSL_em em;
    em.SetEquivalentSampleSize(2);
    em.SetRandomizeParameters(true);
    if (em.Learn(dataset, network, matches) != DSL_OKAY) {
        std::cout << errMsg;
        return -4;
    }
    std::cout << "4. Successfully learnt network\n";

    if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
        return -5;
    }
    std::cout << "5. Successfully written learnt network\n";

    // Print the learnt definition of node H as a comma-separated list.
    DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
    std::cout << "Definition for node H: ";
    for (int i=0; i<hdef.GetSize(); i++) {
        std::cout << hdef[i];
        if (i < hdef.GetSize()-1) {
            std::cout << ",";
        }
    }
    std::cout << "\n";
    return 0;
}
Attachments:
- abc-h-def-network-randomized.xdsl: Randomized probability densities (3.11 KiB)
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
dtodor wrote: Maybe I'm missing something when using the EM class?
You shouldn't use an equivalent sample size with randomized initial parameters, but it has no effect on the output from my sample program. I tested both Java and C++.
dtodor wrote: Do you have a C++ example for EM parameters learning?
Try the code below, but make sure you've replaced the paths for the network and data files:
Code:
#include <cstdio>
#include <string>
#include <vector>
#include "smile.h"
#include "smilearn.h"

using namespace std;

int main(int argc, char* argv[])
{
    ErrorH.RedirectToFile(stdout);

    DSL_network net;
    if (DSL_OKAY != net.ReadFile("d:/tmpx/abc-h-def-network-original.xdsl"))
    {
        return -1;
    }

    DSL_dataset ds;
    if (DSL_OKAY != ds.ReadFile("d:/tmpx/abc-h-def-network-original.txt"))
    {
        return -2;
    }

    vector<DSL_datasetMatch> matching;
    string err;
    if (DSL_OKAY != ds.MatchNetwork(net, matching, err))
    {
        return -3;
    }

    // Show which data column was matched to which node.
    for (unsigned i = 0; i < matching.size(); i ++)
    {
        const DSL_datasetMatch &m = matching[i];
        printf("%u col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, net.GetNode(m.node)->GetId());
    }

    DSL_em em;
    em.SetEquivalentSampleSize(2);
    em.SetRandomizeParameters(true);
    if (DSL_OKAY != em.Learn(ds, net, matching))
    {
        return -4;
    }

    // Print the learnt definition of every node.
    for (int h = net.GetFirstNode(); h >= 0; h = net.GetNextNode(h))
    {
        printf("%d %s\n", h, net.GetNode(h)->GetId());
        const DSL_Dmatrix *mtx = net.GetNode(h)->Definition()->GetMatrix();
        for (int i = 0; i < mtx->GetSize(); i ++)
        {
            printf("%g ", (*mtx)[i]);
        }
        printf("\n");
    }
    return 0;
}
The following code delivers the CORRECT results under Windows. Unfortunately, this is NOT the case under Mac OS X:
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
#include "smile.h"
#include "smilearn.h"

int main (int argc, char * const argv[]) {
    DSL_network network;
    if (network.ReadFile("abc-h-def-network-randomized.xdsl") != DSL_OKAY) {
        std::cout << "Unable to read network\n";
        return -1;
    }
    std::cout << "1. Successfully read network\n";

    DSL_dataset dataset;
    if (dataset.ReadFile("abc-h-def-network-original.txt") != DSL_OKAY) {
        return -2;
    }
    std::cout << "2. Successfully read data set\n";

    std::string errMsg;
    std::vector<DSL_datasetMatch> matches;
    if (dataset.MatchNetwork(network, matches, errMsg) != DSL_OKAY) {
        return -3;
    }
    std::cout << "3. Successfully calculated matches\n";

    // Show which data column was matched to which node.
    for (int i=0; i<(int)matches.size(); i++) {
        const DSL_datasetMatch &m = matches[i];
        printf("%d col=%d slice=%d h=%d %s\n", i, m.column, m.slice, m.node, network.GetNode(m.node)->GetId());
    }

    std::vector<int> fixedNodes;
    DSL_em em;
    em.SetEquivalentSampleSize(2);
    em.SetRandomizeParameters(true);
    if (em.Learn(dataset, network, matches) != DSL_OKAY) {
        std::cout << errMsg;
        return -4;
    }
    std::cout << "4. Successfully learnt network\n";

    if (network.WriteFile("abc-h-def-network-learnt.xdsl") != DSL_OKAY) {
        return -5;
    }
    std::cout << "5. Successfully written learnt network\n";

    // Print the learnt definition of node H as a comma-separated list.
    DSL_doubleArray hdef = network.GetNode(network.FindNode("H"))->Definition()->GetMatrix()->GetItems();
    std::cout << "Definition for node H: ";
    for (int i=0; i<hdef.GetSize(); i++) {
        std::cout << hdef[i];
        if (i < hdef.GetSize()-1) {
            std::cout << ",";
        }
    }
    std::cout << "\n";
    return 0;
}
Windows :: SetRandomizeParameters(false):
0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133
Windows :: SetRandomizeParameters(true):
0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.724004 0.275996 0.362899 0.637101 0.294494 0.705506 0.171772 0.828228 0.884285 0.115715 0.449727 0.550273 0.763873 0.236127
4 D
0.20216 0.79784 0.70362 0.29638
5 E
0.588571 0.411429 0.479142 0.520858
6 F
0.324728 0.675272 0.0908674 0.909133
------------------
Mac OS X :: SetRandomizeParameters(false):
0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098
Mac OS X :: SetRandomizeParameters(true):
0 A
0.500759 0.499241
1 B
0.492636 0.507364
2 C
0.527896 0.472104
3 H
0.195727 0.804273 0.518448 0.481552 0.584179 0.415821 0.86999 0.13001 0.0174197 0.98258 0.808501 0.191499 0.162883 0.837117 0.782767 0.217233
4 D
0.20216 0.79784 0.677407 0.322593
5 E
0.588571 0.411429 0.643097 0.356903
6 F
0.324728 0.675272 0.386902 0.613098
Attachments:
- abc-h-def-network-original.txt: Training data (24.62 KiB)
- abc-h-def-network-randomized.xdsl: Network (3.11 KiB)
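In the Windows and Mac OS X listings above, nodes A, B and C agree on both platforms, while H, D, E and F agree only in their first two entries. A comparison like this can be automated rather than done by eye. The following is a minimal sketch that does not use SMILE at all; the function name differingIndices and the default tolerance are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the 0-based positions at which two parameter listings differ by
// more than `tol`. Only the common prefix is compared if the sizes differ.
// The name and the default tolerance are illustrative, not part of SMILE.
std::vector<std::size_t> differingIndices(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          double tol = 1e-4) {
    std::vector<std::size_t> diff;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            diff.push_back(i);
        }
    }
    return diff;
}
```

Called with the two listings for node D (0.20216 0.79784 0.70362 0.29638 on Windows, 0.20216 0.79784 0.677407 0.322593 on Mac OS X), it reports positions 2 and 3.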
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
I can't reproduce your results on Windows. I was using two different VC++ versions. I'll need the following information before we can solve this issue:
1) VC++ version you're using, including installed service packs
2) sizes and dates of *.lib files from smile_1_1_vcxx.zip
3) the data file and the network file used to run the program. There are already two .xdsls posted in this topic.
I've also noticed that the output from your Java app in the first post has node H matched to column 6, but the attached data file has column H at position 3 (0-based). SMILearn's matching is done by node/column ID, so it seems that the data file used with Java was different from the one posted here.
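One way to check which position a column occupies is to look it up in the header line of the data file. The sketch below assumes that the first line of the file is a whitespace-separated list of column IDs; that format, the function name columnIndex, and the sample header used in the usage note are assumptions for illustration, not taken from the attached file.

```cpp
#include <sstream>
#include <string>

// Returns the 0-based position of `name` in a whitespace-separated header
// line, or -1 if the name does not occur. Illustrative helper, not part of
// SMILE; it assumes column IDs contain no whitespace.
int columnIndex(const std::string& headerLine, const std::string& name) {
    std::istringstream in(headerLine);
    std::string token;
    for (int i = 0; in >> token; ++i) {
        if (token == name) {
            return i;
        }
    }
    return -1;
}
```

For a hypothetical header "A B C H D E F", columnIndex(header, "H") yields 3, which would agree with the position described above; a result of 6 would indicate a file whose columns are ordered differently.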