Learned HMM parameters (by EM) different from BNT toolbox

Yun · Post by **Yun** » Tue Jan 03, 2017 2:43 pm

Hi, dear SMILE creators,

I tried to learn the parameters of Hidden Markov Models using jSMILE's EM class, and ended up with parameters quite different from those produced by BNT (Bayes Net Toolbox for Matlab), even though I set the same initial parameter values in both toolboxes. The possible reasons I can think of are:

1) The maximum number of iterations and the log-likelihood convergence threshold might differ; I don't know how to set these in jSMILE.
2) The inference algorithms might differ. In BNT, I checked that it uses the "Online Junction tree inference algorithm for DBNs" with a smoother engine (which "creates an engine which does offline (fixed-interval) smoothing in O(T) space/time"). I am not sure which inference engine jSMILE uses.
3) There might be something wrong with how I prepared the HMM data and network files for jSMILE. I am fairly confident that I am using BNT correctly, since I have used it for over a year and have compared its results with those of other tools.

Here is a brief introduction to the network. I am constructing HMMs to infer students' latent knowledge levels on three skills from their correct/wrong performance on problems (items). There are three skills (knowledge components, KCs) in total, and one HMM per skill. For each HMM I specified one hidden variable (KC1, KC2, KC3) and one observable (C1, C2, C3), so there are 6 variables in total. In the data file for jSMILE, I created 7 time slices for each variable (6*7 columns). The observables of several HMMs can be observed in the same time slice, but they can also be missing in other time slices. Each HMM is observed at most 4 times (e.g., C1 is observed at the 1st, 3rd, 5th and 7th time slices).

I attach here the two sets of parameters I got from jSMILE vs. BNT, the data file I prepared for jSMILE, and the relevant EM code. Specifically, I am using BNT-SM, a wrapper built on BNT that specializes in student modeling, i.e., using HMMs to infer students' latent knowledge levels. I have already made sure that the initial network passed to jSMILE has the same parameters as the initial parameters used for BNT.

EM code:

public static void learnParameters(String networkFile, String dataFile, String learnedNetworkFile, Integer nbSlices) {
		System.out.println("\nLearning parameters ...");

		DataSet ds = new DataSet();
		System.out.println("Reading data file " + dataFile);
		ds.readFile(dataFile, "*");

		Network net = new Network();
		net.readFile(networkFile);
		System.out.println("Reading network file " + networkFile);

		DataMatch[] matching = null;
		boolean dbn = networkFile.contains("hmm") || networkFile.contains("wkt");
		if (dbn) {//here is the part for learning HMMs
			assert nbSlices != null;
			int nbVars = net.getAllNodeIds().length;//doesn't consider the timeslices
			ArrayList<Integer> varsArray = new ArrayList<Integer>();
			int[] nodes = net.getAllNodes();
			for (int i = 0; i < nbVars; i++) {
				int node = nodes[i];
				String nodeId = net.getNodeId(node);
				if (nodeId.contains("KC")) { //record hidden variables in varsArray so that we could match columns in the data file later
					System.out.println("vars[" + varsArray.size() + "]: node=" + node + ", id=" + nodeId);
					varsArray.add(node);
				}
			}
			for (int i = 0; i < nbVars; i++) {
				int node = nodes[i];
				String nodeId = net.getNodeId(node);
				if (!nodeId.contains("KC")) {  //record observed variables in varsArray so that we could match columns in the data file later
					System.out.println("vars[" + varsArray.size() + "]: node=" + node + ", nodeId=" + nodeId);
					varsArray.add(node);
				}
			}
			
			Integer[] vars = varsArray.toArray(new Integer[varsArray.size()]);
			int nbColumns = nbVars * nbSlices;
			// for matching the columns in the dataset, nodes in the network, and the time slices
			matching = new DataMatch[nbColumns];
			int i = 0;
			for (i = 0; i < nbColumns; i++) {
				System.out.println("\tMatching column " + i + " with vars[" + i / nbSlices + "] with node=" + vars[i / nbSlices] + " timeslice " + i % nbSlices);
				matching[i] = new DataMatch(i, vars[i / nbSlices], i % nbSlices);
				/* e.g., nbSlices = 7:
				 *  0-6 column in the data matches the variable 0 time slice 0-6;
				 *  7-13 .... 1...0-6
				*/
			}

		} else {
			matching = ds.matchNetwork(net);
		}

		final EM em = new EM();
		em.setRandomizeParameters(false);
		System.out.println("Starting learning em...");
		em.learn(ds, net, matching);

		net.writeFile(learnedNetworkFile);
		System.out.println("Output to " + learnedNetworkFile);
	}

I'd appreciate it if you could give me some clues on how to resolve the difference, so that I can decide whether to replace BNT with jSMILE, because jSMILE is really much faster than BNT :D

Thank you so much,
Yun

Intelligent Systems Program
University of Pittsburgh

mark · Post by **mark** » Wed Jan 04, 2017 10:57 am

To be honest, I am not sure whether I have a good answer. All of the things you list (and more, e.g., bugs) could have contributed to the differences. It's hard to judge from such a high-level experiment what the issue is. To debug this, I would start with a simple, known network (i.e., generate data from it) and see where the results start to deviate. This would also address issues in the preparation of the data (i.e., you would see it when you do it wrong). The EM algorithm is the same as what's used for non-DBNs, except that the parameters between the slices are 'tied'.
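The "simple, known network" step could look roughly like the sketch below. It samples correct/wrong sequences from a two-state HMM whose parameters are fixed in advance, so that whatever EM learns can be compared against known values. The class name, the probabilities, and the seed are illustrative assumptions and do not come from this thread; writing the samples out in the data file format jSMILE expects is left out.

```java
import java.util.Random;

/** Samples observation sequences from a small, fully known two-state HMM. */
final class HmmSampler {
    // All numbers below are illustrative assumptions, not values from the thread.
    static final double[] PRIOR = {0.7, 0.3};          // P(KC at slice 0): {unlearned, learned}
    static final double[][] TRANSITION = {
        {0.8, 0.2},                                    // P(KC_t | KC_{t-1} = unlearned)
        {0.0, 1.0}                                     // P(KC_t | KC_{t-1} = learned)
    };
    static final double[][] EMISSION = {
        {0.75, 0.25},                                  // P(C | unlearned): {wrong, correct}
        {0.10, 0.90}                                   // P(C | learned):   {wrong, correct}
    };

    /** Draws an index from a discrete distribution by inverting its CDF. */
    static int draw(double[] dist, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int k = 0; k < dist.length; k++) {
            cumulative += dist[k];
            if (u < cumulative) {
                return k;
            }
        }
        return dist.length - 1; // guard against floating-point rounding
    }

    /** Returns one observation sequence (0 = wrong, 1 = correct). */
    static int[] sampleSequence(int length, Random rng) {
        int[] observations = new int[length];
        int state = draw(PRIOR, rng);
        for (int t = 0; t < length; t++) {
            if (t > 0) {
                state = draw(TRANSITION[state], rng);
            }
            observations[t] = draw(EMISSION[state], rng);
        }
        return observations;
    }

    /** Returns nSequences independent sequences; a fixed seed makes runs repeatable. */
    static int[][] sampleDataset(int nSequences, int length, long seed) {
        Random rng = new Random(seed);
        int[][] data = new int[nSequences][];
        for (int i = 0; i < nSequences; i++) {
            data[i] = sampleSequence(length, rng);
        }
        return data;
    }
}
```

For example, `HmmSampler.sampleDataset(1000, 7, 42L)` would give 1000 fully observed sequences of 7 slices each. Learning from those in both toolboxes, and only then introducing missing values, should show at which step the two sets of results start to diverge.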

Not a direct answer, but I still hope it helps!

Yun · Post by **Yun** » Wed Jan 04, 2017 5:30 pm

Dear Mark,

Thank you so much for your quick reply! Following your advice, I debugged in smaller steps and finally found my error: I shouldn't have put the data for all HMMs in one single file; instead, I should prepare one data file per HMM. With my previous approach, the time slices of different HMMs do not correspond, so each HMM gets missing values for its own observable in the slices where only other HMMs are observed. That makes the network different from what it should be: each HMM should have its own independent time slices.
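The thread does not show how the per-HMM files were built. One plausible reading of the fix is that each HMM's observations are packed into consecutive slices of its own, dropping the gaps that only existed because other HMMs were observed in those slices. A minimal sketch under that assumption follows; `SequenceCompactor` and `compact` are hypothetical names, not part of jSMILE.

```java
import java.util.ArrayList;
import java.util.List;

/** Packs one HMM's observations into consecutive slices (hypothetical helper). */
final class SequenceCompactor {
    // Same missing-value token that the posted code passes to ds.readFile(dataFile, "*").
    static final String MISSING = "*";

    /**
     * Takes the values of a single observable across the shared time slices
     * and returns only the observed ones, in their original order.
     */
    static List<String> compact(String[] slices) {
        List<String> dense = new ArrayList<>();
        for (String value : slices) {
            if (!MISSING.equals(value)) {
                dense.add(value);
            }
        }
        return dense;
    }
}
```

With C1 observed at the 1st, 3rd, 5th and 7th of 7 shared slices, `compact` would return 4 values, which then occupy slices 0 to 3 in the file for that HMM alone.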

Now each of the parameters learned with jSMILE for the network I posted differs by at most 0.01 (in absolute value) from the one I got from BNT (Kevin Murphy), yet jSMILE takes only 1/10 of the time!
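A comparison like the one above can be automated with a small helper that reports the largest absolute difference between two parameter vectors. This is only a sketch: `ParameterDiff` is a hypothetical name, and it assumes both toolboxes' parameters have already been flattened into arrays in the same order, with the hidden states labeled the same way in both.

```java
/** Compares two flattened parameter vectors (hypothetical helper). */
final class ParameterDiff {
    /** Returns the largest absolute element-wise difference between a and b. */
    static double maxAbsDiff(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("parameter vectors differ in length");
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }
}
```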

I will proceed with the tool to do more experiments! Once again, thank you so much for making the tool so fast and so accessible, and for such quick help. I really enjoy using it!

mark · Post by **mark** » Thu Jan 05, 2017 9:25 am

Great, that's awesome!

BayesFusion Support Forum
