Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Post by Marijn Peppelman »

I'm running into some trouble optimizing the parameters of a DBN with PySMILE.

In GeNIe, the process is painless: open the data file, open the DBN, click Learn Parameters, confirm the time series, confirm the node matching, and start the EM algorithm.

However, when working with the wrapper, it seems only the CPTs of the plain (non-temporal) BN part are optimized; the temporal CPTs are not touched at all (they all stay at their initial values of 0.5).

I'm calling the learning process with this code:

ds = pysmile.learning.DataSet()
ds.read_file(self.learningDataFile)
matching = ds.match_network(self.net)
em = pysmile.learning.EM()
em.learn(ds, self.net, matching)
I currently suspect the matching doesn't include the temporal nodes, but I can't find which methods I would need to call to do the matching for a DBN properly. The SMILE wrapper and SMILE documentation also have no further information on how parameter learning for a DBN should be called, only the method above, which is for regular BNs.

I'll go digging through the forums a bit to see if I can find anything.
Last edited by Marijn Peppelman on Fri Nov 29, 2019 12:54 am, edited 1 time in total.
Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by Marijn Peppelman »

PS: here are the files I'm working with (3 of 5):
Attachments
PerfNetwork1_3BE.xdsl
Original network that generated the learning data
(2.28 KiB) Downloaded 446 times
dev_PerfNetwork1_3BE_1574983185.5145876B.xdsl
Network after parameter learning by own script
(2.07 KiB) Downloaded 472 times
dev_PerfNetwork1_3BE_1574983185.5145876.xdsl
BN before learning, same structure, default CPTs (all parameters 0.5)
(1.92 KiB) Downloaded 446 times
Last edited by Marijn Peppelman on Fri Nov 29, 2019 12:54 am, edited 1 time in total.
Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by Marijn Peppelman »

PPS: and the script and data files (2 of 5)

The script is mostly helper functions, and the last section is printing for debugging. The important part is:

import pysmile

networkFile = "dev_PerfNetwork1_3BE_1574983185.5145876.xdsl"
networkOutFile = networkFile[:-5] + "B.xdsl"  # strip ".xdsl", append "B.xdsl"
dataFile = "PerfNetwork1_3BE_Data.csv"
net = pysmile.Network()
net.read_file(networkFile)

ds = pysmile.learning.DataSet()
ds.read_file(dataFile)
print("learning from records: " + str(ds.get_record_count()))
matching = ds.match_network(net)  # automatic column-to-node matching
em = pysmile.learning.EM()
em.learn(ds, net, matching)
lastScore = em.get_last_score()
net.write_file(networkOutFile)

print(lastScore)
Attachments
PerfNetwork1_3BE_Data.csv
Learning data
(231.15 KiB) Downloaded 468 times
experiments.txt
Script i used
(5.5 KiB) Downloaded 473 times
piotr [BayesFusion]
Site Admin
Posts: 60
Joined: Mon Nov 06, 2017 6:41 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by piotr [BayesFusion] »

Network matching in the SMILE wrappers works with BNs only. However, DBN support will be added in the 1.4.2 release (first week of December), so if you are not in a hurry, we advise you to wait a few days. Otherwise, you have to match the network manually.
Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by Marijn Peppelman »

Thank you for the reply.

I have an intermediate deadline before that, so I'll create a manual matching in the meantime.
It is still good news that I won't have to do that in future.
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by shooltz[BayesFusion] »

Note that you'll need to ensure that the indices in the dataset fit the order of outcomes in your network. Your outcomes are {working, failed}: working is at index 0, failed is at index 1. When loading the .csv with readFile, the values in each data column are sorted alphabetically, so failed will be at index 0 and working at index 1.

DataSet.matchNetwork actually changes the indices in the dataset when the match between column and node is found.

An easy alternative is to use integer indices in the dataset. In that case you only need to find the match between nodes and columns; no data translation is required.
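
A minimal sketch of the integer-index alternative, using only the standard library. It rewrites each label in a CSV as its position in the network's outcome order, so the dataset already carries the indices the network expects. The function name `labels_to_indices`, the column names `A` and `A_1`, and the sample rows are hypothetical; the outcome order is the one quoted above, and the sketch assumes every column shares that outcome list.

```python
import csv
import io

# Outcome order as defined in the network (from the post above):
# working is index 0, failed is index 1.
OUTCOMES = ["working", "failed"]


def labels_to_indices(csv_text, outcomes=OUTCOMES):
    """Return CSV text with every state label replaced by its outcome index.

    Assumes every column uses the same outcome list. Empty cells are
    passed through unchanged; unknown labels raise KeyError.
    """
    index_of = {name: i for i, name in enumerate(outcomes)}
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # header row is copied unchanged
    for row in reader:
        writer.writerow(["" if cell == "" else index_of[cell] for cell in row])
    return out.getvalue()


# Hypothetical two-column sample, not the attached data file.
sample = "A,A_1\nworking,failed\nfailed,working\n"
converted = labels_to_indices(sample)
print(converted)

# An alphabetical sort gives the opposite coding: failed first, working second.
print(sorted(OUTCOMES))
```

Writing the converted text to a new file and loading that file instead of the labelled one keeps the coding explicit and independent of any sorting done at load time.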
Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by Marijn Peppelman »

I've created a manual matching by taking the regular matching and adding more DataMatch objects with modified slice and column values.
For example, where normally I would have:

DataMatch.node = 1, DataMatch.column = 100, DataMatch.slice = 0
For a 100-time-slice DBN, I added more DataMatch objects with:

DataMatch.node = 1, DataMatch.column = 101, DataMatch.slice = 1
DataMatch.node = 1, DataMatch.column = 102, DataMatch.slice = 2
DataMatch.node = 1, DataMatch.column = 103, DataMatch.slice = 3
DataMatch.node = 1, DataMatch.column = 104, DataMatch.slice = 4
etc.

However, doing this still led to some nonsensical results, and it appears only some of the temporal CPTs were updated.
When I restricted the learning data to only the first entry, GeNIe optimised to a score of -71.6369, while the wrapper optimised to a score of -3.2147 but with nonsensical temporal CPTs: [0.994949494949495, 0.005050505050505051, 0.75, 0.25, 0.5, 0.5, 0.5, 0.5], against source values of [0.99, 0.01, 0.75, 0.25, 0.95, 0.05, 0.25, 0.75].
The base BN CPTs are exactly the same though.
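
One way to see which parts of a temporal CPT were left alone is to split the flat list into columns and flag the ones still at the uniform starting value. The sketch below is plain Python and does not call PySMILE; the helper names `split_columns` and `untouched_columns` are hypothetical, and it assumes the node's own outcome index varies fastest in the flat list.

```python
def split_columns(flat, outcome_count):
    """Split a flat CPT into columns of `outcome_count` probabilities each.

    Assumes the node's own outcome index varies fastest in the flat list.
    """
    if len(flat) % outcome_count != 0:
        raise ValueError("flat CPT length is not a multiple of outcome_count")
    return [flat[i:i + outcome_count] for i in range(0, len(flat), outcome_count)]


def untouched_columns(flat, outcome_count, tol=1e-9):
    """Return the indices of columns that are still uniform."""
    uniform = 1.0 / outcome_count
    return [
        idx
        for idx, col in enumerate(split_columns(flat, outcome_count))
        if all(abs(p - uniform) <= tol for p in col)
    ]


# Values quoted in the post above.
learned = [0.994949494949495, 0.005050505050505051, 0.75, 0.25, 0.5, 0.5, 0.5, 0.5]
source = [0.99, 0.01, 0.75, 0.25, 0.95, 0.05, 0.25, 0.75]

print(untouched_columns(learned, 2))  # columns still at the initial 0.5/0.5
print(untouched_columns(source, 2))   # the generating network has none
```

A column that stays uniform may simply mean that no evidence reached that parent configuration, which is worth ruling out before suspecting the matching.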

The edited script looks like this:

networkFile = "dev_PerfNetwork1_3BE_1574983185.5145876.xdsl"
networkOutFile = networkFile[:-5] + "B.xdsl"
dataFile = "PerfNetwork1_3BE_DataA.csv"
net = pysmile.Network()
net.read_file(networkFile)

ds = pysmile.learning.DataSet()
ds.read_file(dataFile)
print("learning from records: " + str(ds.get_record_count()))
matching = ds.match_network(net)

# Extend each slice-0 match with one match per later slice.
# This assumes the columns for consecutive slices sit next to each other.
newMatching = []
for baseMatch in matching:
    newMatching.append(baseMatch)
    for i in range(1, net.get_slice_count()):
        newMatchColumn = baseMatch.column + i
        newMatchNode = baseMatch.node
        newMatchSlice = i
        newMatch = pysmile.learning.DataMatch(newMatchColumn, newMatchNode, newMatchSlice)
        newMatching.append(newMatch)
matching = newMatching
em = pysmile.learning.EM()
em.learn(ds, net, matching)
lastScore = em.get_last_score()
net.write_file(networkOutFile)
Am I still missing a step in the EM learning, or am I misunderstanding the values the DataMatch objects should have?

EDIT: I missed the previous post, since I still had the site open and posted before refreshing. I'll investigate that now.
Last edited by Marijn Peppelman on Fri Nov 29, 2019 8:08 pm, edited 2 times in total.
Marijn Peppelman
Posts: 23
Joined: Mon Jul 08, 2019 3:14 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by Marijn Peppelman »

I've manually set the ds.set_variable_names to ["working", "failed"] for all the nodes in ds, and the above issue still persists.
In the original data set, it looks as if the self t-1 CPTs for node 3 are indeed flipped, which would suggest that the behavior you mentioned
(alphabetised "working"/"failed") is at least somewhat at play here, but I've corrected that to the best of my knowledge.

Am I missing something more, or can't this be solved at the moment?

PS: Switching to indices did indeed lead to the same outcomes between optimisation by GeNIe and PySMILE, so I'll go with that for now. When doing the matching manually, it is clear I'm missing something, so I'll wait for the new version release.

EDIT: added code and data file

ds = pysmile.learning.DataSet()
ds.read_file(dataFile)
print("learning from records: " + str(ds.get_record_count()))
matching = ds.match_network(net)

newMatching = []
for baseMatch in matching:
    newMatching.append(baseMatch)
    stateNames = ds.get_state_names(baseMatch.node * net.get_slice_count())
    for i in range(1, net.get_slice_count()):
        temporalNodeHandle = baseMatch.node * net.get_slice_count() + i
        newMatchColumn = baseMatch.column + i
        newMatchNode = baseMatch.node
        newMatchSlice = i
        newMatch = pysmile.learning.DataMatch(newMatchColumn, newMatchNode, newMatchSlice)
        newMatching.append(newMatch)
        # Copy the slice-0 state names onto the column for slice i.
        ds.set_state_names(temporalNodeHandle, stateNames)

matching = newMatching
em = pysmile.learning.EM()

em.set_uniformize_parameters(True)
em.set_auto_slices(True)
em.learn(ds, net, matching)

lastScore = em.get_last_score()
net.write_file(networkOutFile)

print(lastScore)
Attachments
PerfNetwork1_3BE_DataA.csv
reduced learning data file
(4.88 KiB) Downloaded 430 times
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Parameter learner not optimising a DBN's temporal CPT's in PySMILE

Post by shooltz[BayesFusion] »

I've manually set the ds.set_variable_names to ["working", "failed"]
I believe you meant set_state_names above. Note that the EM algorithm does not perform a lookup using the state names associated with a dataset column; it just retrieves the integer value and uses it to set evidence. MatchNetwork translates the integers in the dataset if it finds a match between node and column.

Of course, switching to a dataset that contains only indices solves this issue.
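
The translation step described above can be illustrated in plain Python. This is only a sketch of the idea, not SMILE's actual implementation; the helper names `build_translation` and `translate_column` and the sample codes are hypothetical, while the two state orders are the ones discussed earlier in the thread.

```python
def build_translation(dataset_states, network_outcomes):
    """Map each dataset integer code to the index of the same-named outcome."""
    return [network_outcomes.index(name) for name in dataset_states]


def translate_column(codes, translation):
    """Apply the translation to a column of integer codes."""
    return [translation[c] for c in codes]


# Alphabetical order, as produced when a text column is loaded.
dataset_states = ["failed", "working"]
# Outcome order in the network, from earlier in the thread.
network_outcomes = ["working", "failed"]

# working, working, failed, working in the dataset's coding.
raw_codes = [1, 1, 0, 1]
translation = build_translation(dataset_states, network_outcomes)
fixed_codes = translate_column(raw_codes, translation)

print(translation)
print(fixed_codes)

# Renaming the dataset states without translating leaves the integers
# unchanged, so code 1 is still read as network outcome 1 ("failed").
print([network_outcomes[c] for c in raw_codes])
print([network_outcomes[c] for c in fixed_codes])
```

Under these assumptions, a manually added column that was never translated would feed the learner the reversed evidence, which is consistent with the flipped CPTs reported above.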