Hi,
I used validation for a DBN in GeNIe, but I am not sure how the output prediction probabilities are computed.
For example, is the leave-one-out method repeated N times (where N is the data size), using N-1 records for training and leaving one record out for testing (a different record on each repetition)?
Are the output results the average probabilities?
I ran the leave-one-out method to evaluate the predictive accuracy of my model, but I am getting some strange results (accuracy = 0.99 at time slices 0 and 1, and accuracy = 1 at time slice 2).
Do I understand the validation procedure in GeNIe correctly, or am I missing something?
Thank you in advance,
Kalia
Validation methods in GeNIe
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Validation methods in GeNIe
korfan01 wrote: For example, is the leave-one-out method repeated N times (where N is the data size), using N-1 records for training and leaving one record out for testing (a different record on each repetition)?

Yes, that's correct. See http://en.wikipedia.org/wiki/Cross-vali ... validation for more info.

korfan01 wrote: Are the output results the average probabilities?

No, the accuracy value calculated by GeNIe is defined as the number of correctly classified cases divided by the total number of relevant records. For node accuracy, the number of records is equal to the total record count in the data. For outcome accuracy, the total is the number of records with the specified outcome.
Note that regardless of the validation method used (k-fold, leave one out, test only), each of the data records will be used exactly once for classification.
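To make the accuracy definitions above concrete, here is a minimal sketch of the two ratios described (node accuracy over all records, outcome accuracy over records with a given actual outcome). The helper functions `node_accuracy` and `outcome_accuracy` are hypothetical names for illustration only, not part of the GeNIe/SMILE API:

```python
def node_accuracy(predicted, actual):
    """Node accuracy: correctly classified records / total record count."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def outcome_accuracy(predicted, actual, outcome):
    """Outcome accuracy: restricted to records whose actual class is `outcome`."""
    relevant = [(p, a) for p, a in zip(predicted, actual) if a == outcome]
    correct = sum(p == a for p, a in relevant)
    return correct / len(relevant)
```

For example, with predictions ["yes", "no", "yes", "yes"] against actual values ["yes", "no", "no", "yes"], the node accuracy is 3/4 = 0.75, while the outcome accuracy for "no" is 1/2 = 0.5.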
BTW, are you using the validation on the unrolled network?
Re: Validation methods in GeNIe
Yes, I used an unrolled DBN for the validation process. Thank you for your answer; I found out what was going wrong with my results, but I have one more question.
How are the final conditional probabilities of the model computed after the validation process? Are they the average over all training repetitions, or the probabilities learned in the last training run?
Also, is it possible to access these parameters after the validation process?
Thanks,
Kalia
Re: Validation methods in GeNIe
kalia_or wrote: How are the final conditional probabilities of the model computed after the validation process? Are they the average over all training repetitions, or the probabilities learned in the last training run?

Validation does not modify the conditional probabilities of the model you have open in SMILE. When you run in leave-one-out or K-fold mode, the EM phase works on a copy of the network. Depending on your choice of EM parameters, the conditionals can be uniformized, randomized, or used as a starting point with a given confidence level. When EM completes the training phase (and a new set of conditionals is obtained), the records not used in training are instantiated, and the posteriors of the class node(s) are compared to the actual values in the data. This gives the accuracy, which is the output of validation.

kalia_or wrote: Also, is it possible to access these parameters after the validation process?

If you mean the modified conditionals, the answer is no. You should run EM to obtain new parameters.
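The workflow described above (EM runs on a copy, each held-out record is classified exactly once, and the original model's parameters are untouched) can be sketched as a leave-one-out loop. This is illustrative pseudo-workflow only; `train_em` and `predict_class` are hypothetical callbacks standing in for the real SMILE learning and inference calls:

```python
import copy

def leave_one_out(records, train_em, predict_class, base_net):
    """Leave-one-out sketch: each record is held out exactly once.
    EM works on a deep copy of the network, so base_net is never modified."""
    correct = 0
    for i, held_out in enumerate(records):
        net = copy.deepcopy(base_net)            # EM phase works on a copy
        training = records[:i] + records[i + 1:]  # N-1 records for training
        train_em(net, training)                   # hypothetical EM step
        if predict_class(net, held_out) == held_out["class"]:
            correct += 1
    return correct / len(records)                 # GeNIe-style accuracy
```

After the loop, `base_net` still holds its original conditionals, which matches the answer above: to obtain new parameters, EM must be run separately on the full data.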
Re: Validation methods in GeNIe
One last question: how can I use cross-validation to evaluate a DBN prognostic model?
Suppose the DBN has 5 variables [A, B, C, D, E] and is unrolled for 3 time slices [t0, t1, t2], and the goal is to predict the class value at t=2 (the last time slice), e.g. E_2,
given all the observations up to t=1 [A_0, A_1, B_0, B_1, C_0, C_1, D_0, D_1].
Thus, all the variables at t=2 will be hidden. Is it correct to select all the hidden variables as class variables in the cross-validation window, or is there another way to evaluate prognostic models?
Re: Validation methods in GeNIe
korfan01 wrote: Thus, all the variables at t=2 will be hidden. Is it correct to select all the hidden variables as class variables in the cross-validation window, or is there another way to evaluate prognostic models?

Yes, that's correct.