Query of the validation of bayesian network

wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

Query of the validation of bayesian network

Post by wxk8000 » Wed Sep 12, 2018 11:51 am

Hello!
I have a question about validating a Bayesian network.
I know that we can either evaluate the network directly on the data set or use k-fold cross-validation.
Can the data set used for learning be used again for validation? That is, can I validate the network with the same data set that was used for structure or parameter learning, or do I have to collect a new data set to validate the network?


Another minor question:
When I discretize the continuous data, many rows of the data set end up containing identical records, as in the attached figure.
[Attachment: QQ截图20180912134717.jpg]
In this case, the duplicate rows should not be deleted, is that right? Otherwise, deleting them would influence the results of structure or parameter learning and make the learned model less accurate.

marek [BayesFusion]
Site Admin
Posts: 271
Joined: Tue Dec 11, 2007 4:24 pm

Re: Query of the validation of bayesian network

Post by marek [BayesFusion] » Thu Sep 13, 2018 11:37 am

There is a short and a longer answer to your question. The short answer is: Yes, you can use the same data set to train and validate your models. To do so, please use cross-validation when testing/validating the model. Please see the GeNIe manual and read up on the concept of "cross-validation".

The long answer includes the short answer with a word of caution: If you do use cross-validation, please keep in mind that you used the whole data set to learn the structure, so what is being validated is the parameters, which are trained on part of the data and tested on the remaining records in each round of cross-validation. If you want to validate the structure itself, you will have to split the data into groups, learn the structure from each group, and then compare the resulting structures, compare the accuracy of each structure, or average the accuracies. This process is not automated in GeNIe and is left up to you. In my experience, very few people do this, but if you want to be absolutely positively correct about validation, you might consider it.
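
To make the distinction concrete, here is a minimal Python sketch of k-fold cross-validation in which the structure stays fixed and only the parameters are re-learned in each fold. It is not GeNIe's implementation and does not use any GeNIe or SMILE API. The function names (`fit_cpts`, `predict`, `cross_validate`), the simple structure (one target node with an arc to each feature), and the Laplace smoothing constant `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_cpts(rows, features, target, states, alpha=1.0):
    """Learn P(target) and P(feature | target) by Laplace-smoothed counting.

    The structure (target -> every feature) is assumed fixed in advance;
    only the probability tables are estimated here.
    """
    class_counts = Counter(r[target] for r in rows)
    pair_counts = Counter((f, r[target], r[f]) for r in rows for f in features)
    n = len(rows)
    prior = {
        c: (class_counts[c] + alpha) / (n + alpha * len(states[target]))
        for c in states[target]
    }
    cond = {
        (f, c, v): (pair_counts[(f, c, v)] + alpha)
        / (class_counts[c] + alpha * len(states[f]))
        for f in features
        for c in states[target]
        for v in states[f]
    }
    return prior, cond


def predict(row, features, target, states, prior, cond):
    """Return the most probable target state given the feature values."""
    def log_score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[(f, c, row[f])]) for f in features
        )
    return max(states[target], key=log_score)


def cross_validate(rows, features, target, k=5):
    """Return one accuracy per fold; assumes len(rows) >= k."""
    # State lists come from the whole data set, like the fixed structure.
    states = {name: sorted({r[name] for r in rows}) for name in features + [target]}
    accuracies = []
    for i in range(k):
        test = rows[i::k]                                   # held-out fold
        train = [r for j, r in enumerate(rows) if j % k != i]
        prior, cond = fit_cpts(train, features, target, states)  # parameters only
        hits = sum(
            predict(r, features, target, states, prior, cond) == r[target]
            for r in test
        )
        accuracies.append(hits / len(test))
    return accuracies
```

Because the structure is chosen outside the loop, the fold accuracies say nothing about whether the structure itself generalizes. To validate the structure as well, the structure-learning step would have to move inside the loop so that it also sees only the training rows of each fold.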

The answer to your second question is no, you should not delete duplicate records. They are an important source of information for learning frequencies in your data.
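
A small Python illustration of why duplicates matter, using made-up discretized records (the data and the helper name `state_frequencies` are hypothetical): parameter learning is essentially counting, so dropping repeated rows changes the estimated frequencies.

```python
from collections import Counter


def state_frequencies(records):
    """Relative frequency of each distinct record."""
    counts = Counter(records)
    total = len(records)
    return {record: counts[record] / total for record in counts}


# Hypothetical discretized data: three identical rows and one different row.
records = [("low", "yes"), ("low", "yes"), ("low", "yes"), ("high", "no")]

with_duplicates = state_frequencies(records)
without_duplicates = state_frequencies(sorted(set(records)))
```

With the duplicates kept, `("low", "yes")` has frequency 0.75. After deduplication both records have frequency 0.5, which no longer reflects how often each case actually occurred.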
I hope this helps,

Marek

Re: Query of the validation of bayesian network

Post by wxk8000 » Thu Sep 13, 2018 8:58 pm

Dear Marek,
First, I want to thank you for your careful explanations every time.
So you mean that cross-validation only validates the parameters, and that it can still use the same data that was used for structure learning.
I hope my understanding is correct.

Re: Query of the validation of bayesian network

Post by marek [BayesFusion] » Mon Sep 24, 2018 3:13 am

I believe your understanding is correct -- Marek
