How many datasets (data samples) is proper to the bayesian network?

wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

How many datasets (data samples) is proper to the bayesian network?

Post by wxk8000 »

Hello!
I have built a Bayesian network, shown in the following figure. It has five nodes, and the states of each node are shown in the figure.
[Figure: bayesian net.png]
However, I only have 100 data records, as shown in the following figure, and I don't know whether that is enough. Is there any principle or rule of thumb for deciding how many data records are needed to learn a Bayesian network? Could you give me some advice?
[Figure: dataset.png]
wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

Re: How many datasets (data samples) is proper to the bayesian network?

Post by wxk8000 »

I keep feeling that my Bayesian network has too few data records, only 100 samples. Would 50 be enough? Or do I need 10000?
What is the minimum number of data records I need for my Bayesian network?
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: How many datasets (data samples) is proper to the bayesian network?

Post by marek [BayesFusion] »

There are never too many data records in learning, and as their number increases, so does the quality of your parameters. I have seen a heuristic "ten times as many records as the number of variables" but it is really just a heuristic that can be easily criticized. Another way of estimating the number of records that you will need is to look at your largest CPT. In your case, it seems that the variable feed_mm will have the largest CPT (5*5=25 columns and 7 rows). Your hundred records will have to be distributed among the 25 columns, which gives you roughly 4 records per column. It is hard to learn a probability distribution over 7 outcomes from just 4 records. Another difficulty is that the distribution of records among the 25 columns is not going to be uniform, and there will be many columns with zero records. So, I agree with you: you have too few records to learn from.
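The CPT arithmetic above can be sketched in a few lines of Python. This is only an illustration: the two parents with 5 states each, the 7 states of feed_mm, and the 100 records come from this thread, while the function names and the skewed column probabilities are made up for the example.

```python
def cpt_summary(n_records, parent_cards, n_states):
    """Size of a CPT and the average number of records per column."""
    n_columns = 1
    for card in parent_cards:
        n_columns *= card  # one column per combination of parent states
    return {
        "columns": n_columns,
        # each column is a distribution over n_states outcomes,
        # so it has n_states - 1 free probabilities
        "free_parameters": n_columns * (n_states - 1),
        "records_per_column": n_records / n_columns,
    }


def expected_empty_columns(n_records, column_probs):
    """Expected number of columns that receive no record at all."""
    return sum((1.0 - p) ** n_records for p in column_probs)


summary = cpt_summary(n_records=100, parent_cards=[5, 5], n_states=7)
print(summary)

# Hypothetical column probabilities: uniform versus strongly skewed.
uniform = [1 / 25] * 25
skewed = [0.5 / 5] * 5 + [0.5 / 20 * w for w in [3.6] * 5 + [0.13333333333333333] * 15]
print(expected_empty_columns(100, uniform))
print(expected_empty_columns(100, skewed))
```

With a skewed distribution of parent configurations, more columns are expected to stay empty than in the uniform case, which is the second difficulty described above.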
I hope this helps,

Marek
wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

Re: How many datasets (data samples) is proper to the bayesian network?

Post by wxk8000 »

Thank you, Marek. So you mean that the number of data records needed depends on the variables and the CPTs. How can I estimate the number of data records I need for my Bayesian network? Is there any rule of thumb or formula for calculating it? This is very important for planning the experiment in which I will collect the data records. Thank you!
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: How many datasets (data samples) is proper to the bayesian network?

Post by marek [BayesFusion] »

There is no formula that I am aware of. I tried to give you an idea of how the number of records influences accuracy of your network but I'm afraid it is all qualitative.
I hope this helps,

Marek
wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

Re: How many datasets (data samples) is proper to the bayesian network?

Post by wxk8000 »

A qualitative answer is OK for me. Can you tell me the qualitative method?
I need to discuss the experiment plan, including the size of the data set, with my teacher. Thank you very much!
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: How many datasets (data samples) is proper to the bayesian network?

Post by marek [BayesFusion] »

I *tried* to give you an idea, so you will have to fish it out of what I wrote above :-).

You can indeed test this in an experiment, varying the data set size and important parameters of your networks, such as connectivity, maximum number of parents, etc. I am very interested in the results of your experiments!
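A minimal sketch of such an experiment for a single CPT column, using only the Python standard library. Everything here is an assumption made for illustration: the "true" distribution over 7 outcomes is made up, the error measure (total variation distance) is just one possible choice, and a full experiment would repeat this for whole networks rather than one column.

```python
import random


def sample_counts(true_dist, n, rng):
    """Draw n records from true_dist and count each outcome."""
    counts = [0] * len(true_dist)
    for outcome in rng.choices(range(len(true_dist)), weights=true_dist, k=n):
        counts[outcome] += 1
    return counts


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_error(true_dist, n, trials, rng):
    """Average error of the relative-frequency estimate over many trials."""
    total = 0.0
    for _ in range(trials):
        counts = sample_counts(true_dist, n, rng)
        estimate = [c / n for c in counts]
        total += total_variation(true_dist, estimate)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
true_dist = [0.30, 0.25, 0.15, 0.10, 0.10, 0.05, 0.05]  # made-up column, 7 outcomes
errors = {n: mean_error(true_dist, n, trials=500, rng=rng) for n in (4, 40, 400)}
for n, err in errors.items():
    print(f"{n:4d} records in the column: mean total variation distance = {err:.3f}")
```

The error shrinks as the number of records in the column grows, which gives a rough, qualitative way to judge how many records per column are acceptable for a given network.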
Cheers,

Marek
wxk8000
Posts: 20
Joined: Fri Jan 19, 2018 11:58 am

Re: How many datasets (data samples) is proper to the bayesian network?

Post by wxk8000 »

Dear Marek, I see that GeNIe has a "generate data file" function. Can I use it to produce more data records for learning the Bayesian network? I suspect that it is of no use. Yesterday a friend told me that there are algorithms for obtaining more data for learning, but I don't know whether they are suitable for Bayesian networks.
Thank you!
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: How many datasets (data samples) is proper to the bayesian network?

Post by marek [BayesFusion] »

Generation of a data set from an existing model is a standard GeNIe functionality. I'm not sure how this can help you if you have no model or you have a model that has been learned from a very small number of records and, hence, is not too good. Any method that is statistically sound and deals with too small data sets should work for Bayesian networks. The nice thing about Bayesian networks is that they are very close to statistics!
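One example of a statistically standard technique for sparse counts is additive (Laplace) smoothing, which corresponds to placing a uniform Dirichlet prior on each CPT column. The post above does not name this method; it is shown here only as an illustration, and the counts, the pseudo-count value, and the function name are hypothetical.

```python
def smoothed_estimate(counts, pseudo_count=1.0):
    """Additive smoothing: add pseudo_count to every outcome before normalizing."""
    total = sum(counts) + pseudo_count * len(counts)
    return [(c + pseudo_count) / total for c in counts]


# Hypothetical CPT column: 4 records spread over 7 outcomes.
counts = [2, 1, 1, 0, 0, 0, 0]

# Plain relative frequencies assign probability zero to unseen outcomes.
frequencies = [c / sum(counts) for c in counts]

# The smoothed estimate keeps every outcome possible.
smoothed = smoothed_estimate(counts)

print(frequencies)
print(smoothed)

# A column with no records at all falls back to a uniform distribution.
print(smoothed_estimate([0] * 7))
```

Smoothing does not add information. It only prevents zero probabilities and empty columns from breaking the model, so it does not replace collecting more records.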
I hope this helps,

Marek