Hi,
I have two short questions.
First, I entered my dataset and let the program build a tree-augmented BN. I want to predict the probability of a suicide or a murder given the other variables. I am a little confused by the direction of the arrows: shouldn't they be pointing at Person_shooting? (I cannot share the data in detail because of confidentiality.)
Second, I want to run some validations and get the same results every time. I understand I need to set a seed different from 0. Does it matter which number I choose and, if so, what should I choose?
Thank you!
Beginner problems
Attachment: Forum.PNG (114.45 KiB)
Re: Beginner problems
Hi Elise2992,
The TAN learning algorithm tries to fit the joint probability distribution (jpd) modeled by the BN to the jpd that generated the data. It makes no attempt to discover the causal structure of the system under study, so the direction of the arcs should not be read causally: in a TAN the class variable (Person_shooting in your case) is a parent of every feature, and the arcs point away from it. You can still compute its probability given the other variables, because inference in a BN also works against the direction of the arcs. On the other hand, TAN models are excellent when the data set is on the small side and will perform fine numerically (i.e., in making their predictions). I suspect that you will be reasonably happy with the model accuracy. I would, of course, compare this accuracy to the accuracy that you get using other algorithms. Another exercise that I would recommend is feature selection. It is possible that a subset of your feature variables will perform better than the whole set. It looks like you know what you are doing, so I'm sure you will figure this out. If not, please let us know and we will be happy to give some suggestions.
You are correct about the seed: any seed larger than zero will make your results the same each time you run validation. Different seeds will make the results slightly different, and there is no theoretical reason to prefer one seed over another.
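To illustrate why arcs that point away from the class variable do not prevent prediction, here is a minimal Python sketch of a two-node network Class -> Feature. The state names and all probabilities are made up for illustration and are not taken from the poster's data; the point is only that Bayes' theorem gives P(Class | Feature) from a model parameterized as P(Class) and P(Feature | Class).

```python
# Illustrative numbers only; hypothetical states "a"/"b" and "yes"/"no".
prior = {"a": 0.6, "b": 0.4}  # P(Class)
likelihood = {                # P(Feature | Class), i.e. the arc Class -> Feature
    "a": {"yes": 0.9, "no": 0.1},
    "b": {"yes": 0.3, "no": 0.7},
}

def posterior(feature_value):
    """Return P(Class | Feature = feature_value) via Bayes' theorem."""
    # Joint probability of each class state together with the observed feature.
    joint = {c: prior[c] * likelihood[c][feature_value] for c in prior}
    total = sum(joint.values())  # P(Feature = feature_value)
    return {c: p / total for c, p in joint.items()}

result = posterior("yes")
print(result)
```

Observing the feature updates the belief in the class even though the arc runs from the class to the feature. A TAN does the same with many features and some extra arcs among them.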
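The effect of a fixed seed can be sketched in plain Python. This is an illustration under assumptions, not the program's own validation code: `kfold_indices` is a hypothetical helper, and it uses Python's `random` module, where 0 is just another fixed seed rather than a special value as in the convention described above.

```python
import random

def kfold_indices(n_records, k, seed):
    """Shuffle record indices with a private seeded generator and deal them into k folds."""
    rng = random.Random(seed)  # separate generator, so the global random state is untouched
    indices = list(range(n_records))
    rng.shuffle(indices)
    # Fold i receives every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

run_1 = kfold_indices(20, 5, seed=42)
run_2 = kfold_indices(20, 5, seed=42)  # same seed: identical folds
run_3 = kfold_indices(20, 5, seed=7)   # another seed: still a valid split, typically a different one

print(run_1 == run_2)
```

Because the folds are identical for the same seed, any accuracy computed on them is repeatable. A different seed typically assigns records to folds differently, which is why the results shift slightly.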
I hope this helps,
Marek