MatchNetwork for DBNs in jSMILE

tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Post by tholorthored »

Hey,

I want to learn the parameters of a DBN using jSMILE, and I am a bit confused about the data format expected by the matchNetwork method of the DataSet class. I expected the naming convention to be similar to the one in GeNIe: A, A_1, A_2, B, B_1, B_2, as stated in this post.

However, when I use the network from Tutorial 6 and the following data file, matchNetwork only matches time slice 0 (columns A, B, C). The rest of the data (A_1, A_2, ...) is not used for learning. When I import the same data file into GeNIe, automatic matching works. What am I missing? Does matchNetwork in jSMILE work with DBNs? What should the column naming look like?

A A_1 A_2 B B_1 B_2 C C_1 C_2
t t t t t t t t t
t f f t f f t f f
f t f f t f f t f
f f f f f f f f f
t f t t f t t f t
t f f t f f t f f
f t f f t f f t f
f t t f t t f t t
t t t t t t t t t
t t t t t t t t t
t t t t t t t t t
t t f t t f t t f
f f f f f f f f f
f f t f f t f f t
f f f f f f f f f
t f f t f f t f f
f t f f t f f t f
t t t t t t t t t
t f f t f f t f f
f t f f t f f t f

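import smile.Network;
import smile.learning.DataMatch;
import smile.learning.DataSet;
import smile.learning.EM;

// Load the network and the data file.
Network net = new Network();
net.readFile("Net_tut_6_original.xdsl");

DataSet ds = new DataSet();
ds.readFile("tut_6_data.txt");

// Automatic matching of data columns to network nodes.
DataMatch[] matching = ds.matchNetwork(net);

// Learn parameters with EM, starting from uniform distributions.
final EM em = new EM();
em.setRandomizeParameters(false);
em.setEqSampleSize(1);
em.setUniformizeParameters(true);
em.learn(ds, net, matching);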
	
Thanks,
tholorthored
Last edited by tholorthored on Thu Dec 17, 2015 2:56 pm, edited 1 time in total.
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: MatchNetwork for DBNs in jSMILE

Post by shooltz[BayesFusion] »

The information in the post you've quoted is not correct; the matchNetwork method does not match timeslices. For DBN learning you'll need to create the arrays of DataMatch objects in your own code.
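
A minimal sketch of the first half of such manual matching, assuming the column names follow the A, A_1, A_2 pattern from the data file above. SliceColumns and Match are hypothetical helpers, not part of jSMILE; they only split each column name into a node id and a time slice. Turning each triple into a DataMatch (for example via new DataMatch(column, net.getNode(nodeId), slice)) is left as a comment, since it depends on the jSMILE version on your classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jSMILE: derives (column, nodeId, slice)
// triples from column names such as "A", "A_1", "A_2".
class SliceColumns {
    // One matched column: index in the data file, node id, time slice.
    static final class Match {
        final int column;
        final String nodeId;
        final int slice;

        Match(int column, String nodeId, int slice) {
            this.column = column;
            this.nodeId = nodeId;
            this.slice = slice;
        }
    }

    // "A" maps to slice 0, "A_2" to slice 2. A suffix that is not made of
    // digits is treated as part of the node id.
    static Match parse(int column, String name) {
        int us = name.lastIndexOf('_');
        if (us > 0 && us < name.length() - 1) {
            String suffix = name.substring(us + 1);
            if (suffix.chars().allMatch(Character::isDigit)) {
                return new Match(column, name.substring(0, us), Integer.parseInt(suffix));
            }
        }
        return new Match(column, name, 0);
    }

    // Splits a whitespace-separated header line and parses every column name.
    static List<Match> parseHeader(String headerLine) {
        String[] names = headerLine.trim().split("\\s+");
        List<Match> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            out.add(parse(i, names[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Match m : parseHeader("A A_1 A_2 B B_1 B_2 C C_1 C_2")) {
            // With jSMILE available, each triple would become roughly:
            //   new DataMatch(m.column, net.getNode(m.nodeId), m.slice)
            System.out.println(m.column + " -> " + m.nodeId + " @ slice " + m.slice);
        }
    }
}
```

The resulting array of DataMatch objects would then be passed to em.learn(ds, net, matching) in place of the result of matchNetwork.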
tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Re: MatchNetwork for DBNs in jSMILE

Post by tholorthored »

Great, thanks a lot for the fast clarification! Manually matching the variables works now.
I did a quick test and compared a network learned in GeNIe to the one learned in jSMILE.
If there are no missing values, the two networks are identical. However, if I introduce missing values, the two networks differ slightly in their parameters (initial parameters are uniformized in both cases).

So I have two more questions:
1. I am using blank fields in my data file to indicate missing values. The method dataset.isMissing(variable, record) returns "true" for those fields with blank values. Is my assumption correct, that the missing values will be handled correctly if the method returns "true"?

2. Is it possible that the states are not matched correctly? I am using a data file with state numbers as entries, i.e. if I have a variable with the state names "t" and "f", I won't have those strings in the data file, but the related integer of the state (0-> t, 1-> f). This seems to work in the case of complete data. However, do I need to manually match the states or is that only necessary for data files containing the state names?

Can you think of any other reason why the parameters of the two networks differ (GeNIe vs. SMILE)?

Thanks,
tholorthored
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: MatchNetwork for DBNs in jSMILE

Post by shooltz[BayesFusion] »

> 1. I am using blank fields in my data file to indicate missing values. The method dataset.isMissing(variable, record) returns "true" for those fields with blank values. Is my assumption correct, that the missing values will be handled correctly if the method returns "true"?

That's a correct assumption. Blanks are always interpreted as missing values (there are options for specifying other tokens that represent missing values when the dataset is parsed).

> However, do I need to manually match the states or is that only necessary for data files containing the state names?

If your data file contains numeric indices, no action is necessary. However, if you do not use matchNetwork and the dataset uses strings, the mapping between data and network may be incorrect. A discrete column in the dataset object always uses integers to represent its values. If the data column is text-based, these integers are also indices into an alphabetically sorted array of unique strings representing all possible values in the data column. EM, in turn, always uses the raw indices, so with "T" and "F" you are heading for trouble if you assume that the first node outcome is labelled "T". Note that matchNetwork may, in the general case, change the numeric indices in textual dataset columns.
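
To illustrate the pitfall with text columns, here is a small sketch in plain Java (no jSMILE calls). It assumes that "alphabetically sorted" behaves like ordinary String ordering, which is an assumption about SMILE's internals; StateIndexDemo and its methods are hypothetical names. It shows that a column containing "t" and "f" gets the indices f = 0, t = 1, which is the reverse of a node whose outcomes are ordered t, f.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not part of jSMILE.
class StateIndexDemo {
    // Labels of a text column in sorted order; the position of a label in
    // this array plays the role of the raw integer stored in the dataset.
    static String[] sortedUnique(List<String> columnValues) {
        return new TreeSet<>(columnValues).toArray(new String[0]);
    }

    // remap[dataIndex] = index of the same label among the node outcomes,
    // or -1 if the node has no outcome with that label.
    static int[] remapToOutcomes(String[] dataLabels, String[] nodeOutcomes) {
        List<String> outcomes = Arrays.asList(nodeOutcomes);
        int[] remap = new int[dataLabels.length];
        for (int i = 0; i < dataLabels.length; i++) {
            remap[i] = outcomes.indexOf(dataLabels[i]);
        }
        return remap;
    }

    public static void main(String[] args) {
        String[] dataLabels = sortedUnique(Arrays.asList("t", "f", "f", "t"));
        String[] nodeOutcomes = {"t", "f"}; // outcome order assumed for the node
        int[] remap = remapToOutcomes(dataLabels, nodeOutcomes);
        System.out.println("data labels:  " + Arrays.toString(dataLabels));
        System.out.println("remap to net: " + Arrays.toString(remap));
    }
}
```

When the remap is not the identity, feeding the raw indices to EM without matching would silently swap the states.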

The above does not explain why your parameters differ between GeNIe and SMILE - you've mentioned the discrepancy exists with numeric data.
tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Re: MatchNetwork for DBNs in jSMILE

Post by tholorthored »

Thanks for the insights. I will stay with numeric indices then.
Checking again for other reasons, I found that the observed difference is due to an issue in the learning procedure: I assumed that with "uniformize" enabled (and randomize disabled), the previous parameters of the network would no longer play a role in EM. However, even with the "uniformize" option, the parameter estimates still depend on the initial parameters of the network.

Example:
1. Start with a simple 2-node network and uniform CPTs.
2. Learn parameters from "small_example.txt" with uniformize = true, confidence = 1.
3. The resulting network has P(B = false | A = false, B_t-1 = true) = 0.5833 (see "first_training.xdsl").
4. Learn parameters again from "small_example.txt".
5. The resulting network has P(B = false | A = false, B_t-1 = true) = 0.5972 (see "second_training.xdsl").

The difference is small in this case, but I found larger ones for other networks. Why is there a difference in learning when I repeat the process? Does "uniformize" not reset all CPTs anyway?
(The results are the same in GeNIe and SMILE.)
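
One way to quantify such differences is to compare the CPT arrays of the two learned networks entry by entry. The sketch below is plain Java; CptDiff is a hypothetical helper, and the assumption is that each CPT is available as a flat double[] (as far as I know, Network.getNodeDefinition returns one in jSMILE). The example values are the two probabilities quoted above together with their complements, which are computed here as 1 - p and are not taken from the attached files.

```java
// Hypothetical helper, not part of jSMILE.
class CptDiff {
    // Largest absolute difference between corresponding CPT entries.
    static double maxAbsDiff(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("CPTs differ in size");
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // One CPT column after the first and after the second training run.
        double[] first = {0.5833, 0.4167};
        double[] second = {0.5972, 0.4028};
        System.out.println("max abs diff: " + maxAbsDiff(first, second));
    }
}
```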
Attachments
second_training.xdsl
small_example.txt
first_training.xdsl
Last edited by tholorthored on Wed Jan 06, 2016 8:09 am, edited 1 time in total.
tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Re: MatchNetwork for DBNs in jSMILE

Post by tholorthored »

Any idea why the initial parameters in the conditional probability tables still influence the learned parameters even though I am using "uniformize" = True ? What am I missing?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: MatchNetwork for DBNs in jSMILE

Post by shooltz[BayesFusion] »

Can you post your Java code so I can try to reproduce the issue?
tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Re: MatchNetwork for DBNs in jSMILE

Post by tholorthored »

Please find attached my Java code. I had the same results when I performed parameter learning in the GeNIe GUI.
Attachments
tut.txt
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: MatchNetwork for DBNs in jSMILE

Post by shooltz[BayesFusion] »

Both xdsl files you've attached in the previous message (first_training.xdsl, second_training.xdsl) are identical.
tholorthored
Posts: 7
Joined: Thu Dec 10, 2015 12:04 pm

Re: MatchNetwork for DBNs in jSMILE

Post by tholorthored »

Oh, you are right. second_training.xdsl was the wrong file. I have edited my previous post and uploaded the correct file. Sorry for the confusion. Were you able to reproduce the results?