The academy GeNIe always shut down.
- Yang Yajie
- Posts: 34
- Joined: Thu Mar 19, 2020 11:49 am
The academy GeNIe always shut down.
My first question:
When I use GeNIe to create a Bayesian network or a dynamic Bayesian network with only a few nodes, it works. But when I use 27 nodes as parent nodes and 2 nodes as child nodes and try to learn the structure with the PC algorithm, GeNIe shuts down and all the data, nodes, and the picture are lost. I have to start over, and it fails again every time. Do you know why? I am using the academic version of GeNIe.
My second question:
How do I connect SMILE with GeNIe? Do you have an introduction to that?
- shooltz[BayesFusion]
- Site Admin
- Posts: 1477
- Joined: Mon Nov 26, 2007 5:51 pm
Re: The academy GeNIe always shut down.
Yang Yajie wrote: "by PC algorithm, when I try to create structure, the GeNIe will shut down, and all the data, nodes and picture will lose"
Can you post your data file and PC algorithm settings here?
Yang Yajie wrote: "how to connect SMILE with GeNIe together, do you have some introduction about that?"
Please be more specific - what do you want to do with both SMILE and GeNIe?
- Yang Yajie
- Posts: 34
- Joined: Thu Mar 19, 2020 11:49 am
Re: The academy GeNIe always shut down.
I want to use GeNIe to create a dynamic Bayesian network, so I loaded my data into GeNIe, selected 27 variables as parent nodes and 2 variables as child nodes in the expert knowledge, and pressed "Learn network". I tried the PC algorithm and other learning algorithms, but after about 10 minutes of learning GeNIe always shuts down and I cannot get any results. When I use fewer variables, I get the network without problems. Do you know why?
My second question: I realized that I can use SMILE to code a dynamic Bayesian network. But I am not familiar with coding, so I want to use GeNIe to create the network and get the CPDs, and then use Python and the SMILE package to work with the dynamic Bayesian network. But I do not know how to use GeNIe and SMILE together, or what GeNIe can do for me when I code with the SMILE package in Python. I am not sure whether I have described my questions clearly. Thanks for your help!
- shooltz[BayesFusion]
- Site Admin
- Posts: 1477
- Joined: Mon Nov 26, 2007 5:51 pm
Re: The academy GeNIe always shut down.
Yang Yajie wrote: "I want to use GeNIe to create a dynamic bayesian network, so i input data into GeNIe"
Unfortunately, GeNIe does not support structure learning for dynamic Bayesian networks. However, parameter learning is supported. Therefore, you'll need to create nodes and arcs manually in GeNIe, then use parameter learning to obtain conditional probabilities.
Once you have your network created in GeNIe you can save it to .xdsl file, and subsequently load the file into your Python program.
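A minimal sketch of what such a saved file contains, assuming the simplified XDSL fragment below (an .xdsl file is XML; the `Rain`/`Umbrella` network and the helper `summarize_cpts` are made up for illustration). In a real program the file would be loaded through SMILE's Python wrapper. This sketch avoids that dependency and only uses the standard library to list each node's states, parents, and probabilities, which can be handy for checking what GeNIe wrote.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XDSL content. A file saved by GeNIe carries more
# attributes and an <extensions> section with layout information.
XDSL_TEXT = """<smile version="1.0" id="Example">
  <nodes>
    <cpt id="Rain">
      <state id="True" />
      <state id="False" />
      <probabilities>0.2 0.8</probabilities>
    </cpt>
    <cpt id="Umbrella">
      <state id="True" />
      <state id="False" />
      <parents>Rain</parents>
      <probabilities>0.9 0.1 0.2 0.8</probabilities>
    </cpt>
  </nodes>
</smile>
"""

def summarize_cpts(xdsl_text):
    """Return {node_id: (states, parents, probabilities)} for every <cpt> element."""
    root = ET.fromstring(xdsl_text)
    summary = {}
    for cpt in root.iter("cpt"):
        states = [s.get("id") for s in cpt.findall("state")]
        parents = (cpt.findtext("parents") or "").split()
        probs = [float(p) for p in (cpt.findtext("probabilities") or "").split()]
        summary[cpt.get("id")] = (states, parents, probs)
    return summary

summary = summarize_cpts(XDSL_TEXT)
for node_id, (states, parents, probs) in summary.items():
    print(node_id, states, parents, probs)
```

To inspect a real file, the same function could be given the text read from the .xdsl file on disk.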
- Yang Yajie
- Posts: 34
- Joined: Thu Mar 19, 2020 11:49 am
Re: The academy GeNIe always shut down.
Thanks for your help. I am now trying to build the dynamic Bayesian network in GeNIe first, and there are still 5 questions I need to solve. I read the GeNIe and SMILE documentation and searched the website, but I have not found answers to them. I need your help.
Let me introduce my research first. My project relates to a population's socio-demographics and life events. The socio-demographic variables include age, gender, marital status, highest completed education, employment status, number of cars, number of family members, and household income. The life course events include car ownership (buy a car; sell a car; replace a car), marriage (get married; divorce), and the birth of children (decide to have a child this year; decide not to have a child this year). You can think of life events as decisions that people make every year.
I have 10 years of socio-demographic data, and I can derive the life events in each year by comparing the socio-demographics of consecutive years. For example, if the number of cars is 4 in year T and 5 in year T+1, then the car ownership event in year T is "buy a car". I put examples of the data in the attachment.
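The derivation described above can be sketched as follows. This is only an illustration: the function name, the event labels, and the sample counts are assumptions, and a "replace a car" event cannot be detected from the counts alone because the number of cars stays the same.

```python
def derive_car_events(cars_by_year):
    """Label each year T by comparing the car count in T with the count in T+1."""
    events = []
    for now, nxt in zip(cars_by_year, cars_by_year[1:]):
        if nxt > now:
            events.append("buy")
        elif nxt < now:
            events.append("sell")
        else:
            events.append("no_change")
    return events

# Hypothetical car counts for one household over five years.
cars = [4, 5, 5, 4, 4]
events = derive_car_events(cars)
print(events)  # ['buy', 'no_change', 'sell', 'no_change']
```

The last year gets no label, because there is no following year to compare it with, so 10 years of data yield 9 derived events per household.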
Q1: To build a DBN, I first need to load my data into GeNIe. How should I name my socio-demographic and life event variables in each year? I saw an example saying that the variables should be named according to their year, so that the columns in the CSV file are marriage_1, marriage_2, marriage_3, ..., marriage_10 for marriage in years 1 to 10. Is that right? If I name all the variables in the different years Name_1, Name_2, Name_3, ..., Name_10, will GeNIe identify them as dynamic variables? Then I can build the DBN structure in GeNIe manually and learn the CPTs of the network.
Q2: When constructing a Bayesian network in GeNIe, is the conditional probability distribution calculated from the frequencies in the data?
Q3: In a DBN built in GeNIe, are the CPTs of the nodes the same in every year? When I unroll the DBN, the CPT of each node is the same in every year, right? And the transition probabilities are the same every year, right? When I do parameter learning for a DBN in GeNIe, should I unroll the DBN first? Or should I press the "Parameter Learning" and "Update" buttons directly once the DBN is constructed?
Q4: There are many nodes in my research. When I try to build the DBN in GeNIe, the software sometimes tells me that the number of parents cannot be more than 20, and it always shuts down. So I wonder whether it would be better to code the model in Python with the SMILE package. If so, can I do parameter learning with the SMILE package in Python? I notice that some DBN examples set the CPTs and transition probabilities manually, like the example of "Rain in Pittsburgh and Sahara". Are the CPTs of rain and umbrella there calculated from data frequencies or from expert knowledge? Should I learn the DBN parameters from expert knowledge or from data when I construct the DBN in Python? Should I unroll the DBN before parameter learning? Do you have programming examples of parameter learning with SMILE in Python? I have learned how to do parameter learning in GeNIe, but I do not know how to do it with SMILE in Python.
Q5: There are 4 types of variable nodes in my DBN.
Type 1 is static socio-demographic variables. These do not change over time, like gender. Should I put type 1 nodes like "gender" in the Contemporals area when I build the DBN in GeNIe?
Type 2 is dynamic socio-demographic variables that change over time but are not influenced by other socio-demographic variables or life events, like age. An individual's age increases by 1 every year, and the age node influences the life event nodes every year. Should I put age in the Temporal Plate area when I build the DBN in GeNIe? How can I express that age increases by 1 every year in GeNIe or in Python?
Type 3 is another kind of dynamic socio-demographic variable, like the number of cars, the number of family members, and the number of children in the family. These change every year as a result of the life events. For example, in year T the number of cars influences car ownership; if the car ownership event is "buy a car", then the number of cars in year T+1 increases by 1. Likewise, the number of children influences the life event "birth of child"; if the event in year T is to have a child, the number of children in year T+1 increases by 1. I do not know where to place nodes like "the number of cars" and "the number of children". In my research, most socio-demographic changes result from life events. Should I put type 3 nodes like "the number of cars" and "the number of family members" in the Init Conditions area or in the Temporal Plate area? If I put type 3 nodes and life event nodes in the Temporal Plate area, I will certainly add an arc from "the number of cars" (T) to "car ownership" (T). To reflect the dynamic change, should I draw a temporal arc from "car ownership" (T) to "car ownership" (T+1)? Or is it reasonable to draw an arc from "car ownership" (T) to "the number of cars" (T+1)?
Type 4 is dynamic life event variables. These change every year according to the socio-demographics. I chose to put these life event nodes in the Temporal Plate area. Am I right?
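The column naming asked about in Q1 can be sketched as follows, assuming a wide layout with one row per person and one column per variable per year, named with a `_t` suffix as described there. The helper `to_wide_row` and the sample records are hypothetical, and whether the first time step carries a suffix should be checked against the GeNIe manual.

```python
import csv
import io

def to_wide_row(yearly_records, variables):
    """Flatten one person's yearly records into a single row with name_<t> columns."""
    row = {}
    for t, record in enumerate(yearly_records, start=1):
        for name in variables:
            row[f"{name}_{t}"] = record[name]
    return row

# Hypothetical records for one person over three years.
person = [
    {"marriage": "single", "cars": 0},
    {"marriage": "married", "cars": 1},
    {"marriage": "married", "cars": 1},
]
row = to_wide_row(person, ["marriage", "cars"])

# Write the row as CSV text; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Each person becomes one record, so a whole 10-year sequence stays together in a single row of the data file.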
- marek [BayesFusion]
- Site Admin
- Posts: 449
- Joined: Tue Dec 11, 2007 4:24 pm
Re: The academy GeNIe always shut down.
Hi Yajie,
I wish I were able to write a comprehensive answer to your long query and answer every one of your questions. Regretfully, we are physically unable to offer modeling guidance to every one of our users, and we have tens of thousands of them. Your best bet will be to rely on your research advisor, who is undoubtedly more advanced in the field than you are.
Some of your questions and doubts relate to the exponential complexity of both representation and reasoning in probabilistic models. Even if your variables are all binary, 20 parents means 2^20 probability distributions and 2*8MB per such a node. You are thinking of building a dynamic network with such nodes, which multiplies this number. This exhausts your memory, so it is no wonder that it crashes the program.
There are many modeling tricks and rules that you need to master to attempt what you are doing. I suggest a good book on Bayesian networks and discussing things with your advisor. Let me give you brief answers to some of your questions. Hopefully, you can find more in a good book on Bayesian modeling.
Q1: This is correct and described in the manual -- variable names in different time steps should end in _1, _2, etc.
Q2: Yes
Q3: Yes, this is called the "stationarity assumption" in DBNs.
Q4: Your model is too complex, and using SMILE directly is unlikely to help. You should focus on building simpler models.
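As an illustration of the frequency-based estimation confirmed in Q2, here is a minimal sketch. It assumes complete data and no prior, in which case the maximum-likelihood estimate of a CPT is the relative frequency within each parent configuration. It shows only that counting step, not GeNIe's actual implementation, and the function name and sample records are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) by counting and normalizing per parent configuration."""
    counts = defaultdict(Counter)
    for rec in records:
        config = tuple(rec[p] for p in parents)
        counts[config][rec[child]] += 1
    cpt = {}
    for config, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[config] = {state: n / total for state, n in child_counts.items()}
    return cpt

# Hypothetical records: the car event observed for households owning 0 or 1 cars.
data = [
    {"cars": "0", "event": "buy"},
    {"cars": "0", "event": "buy"},
    {"cars": "0", "event": "none"},
    {"cars": "1", "event": "none"},
]
cpt = estimate_cpt(data, child="event", parents=["cars"])
print(cpt[("0",)])
```

Parent configurations that never occur in the data get no estimate at all here, which is one practical reason to keep the number of parents small.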
Please do not hesitate to post queries if they concern the software -- we will be happy to try to answer them.
Good luck!
Marek
- Yang Yajie
- Posts: 34
- Joined: Thu Mar 19, 2020 11:49 am
Re: The academy GeNIe always shut down.
Hi Marek, thanks for your help. What is the largest number of data records that GeNIe can handle? When I create the DBN in GeNIe, I may use at least 200,000 records involving 30-50 variables, but I will keep the number of parents of each node to no more than 8. If so, is it possible to learn the parameters of the DBN in GeNIe?
- marek [BayesFusion]
- Site Admin
- Posts: 449
- Joined: Tue Dec 11, 2007 4:24 pm
Re: The academy GeNIe always shut down.
The size of your data set sounds manageable. When we do parameter learning, we keep the data in memory, so you need enough memory for the data (the number of bytes needed is 200K*50*8=80MB, quite small for today's computers) and for additional data structures that do not depend on the number of records but more on the number of states and interdependencies.
The largest size of the data file that GeNIe can handle is limited by your available memory, so there is no set limit on the data size. Even if you ever encounter a data set that is too large to fit in memory, it is not a big problem. You can always sample the data and create a smaller data set that is a random sample of your original set. If you do the sampling correctly, the new (smaller) data set will reflect the properties of your original data set. A large data set is never a problem; it is something to be joyful about!
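A minimal sketch of such sampling, assuming the records fit in a Python list and that uniform sampling without replacement is what is wanted. The function name, the seed, and the synthetic records are illustrative assumptions.

```python
import random

def sample_rows(rows, k, seed=0):
    """Return k rows drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

# Hypothetical data set: 1,000 records with one binary column.
rows = [{"id": i, "owns_car": i % 4 == 0} for i in range(1000)]
subset = sample_rows(rows, 200)

# Compare the share of car owners in the full set and in the sample.
full_share = sum(r["owns_car"] for r in rows) / len(rows)
subset_share = sum(r["owns_car"] for r in subset) / len(subset)
print(f"full: {full_share:.3f}, sample: {subset_share:.3f}")
```

For temporal data, each row should hold one person's whole sequence of years, so that sampling rows never splits a sequence.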
Finally, the number of parents controls the exponent, but the number of states controls the base in the amount of memory that you need to represent a conditional probability distribution. There is a big difference between 2^10 and 10^10 :-).
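A rough sketch of that arithmetic, assuming a dense table with one 8-byte floating-point number per entry. The actual storage used by SMILE may differ, and the function names are illustrative.

```python
def cpt_entries(child_states, parent_states):
    """Number of probabilities in a dense CPT: child states times the product of parent states."""
    entries = child_states
    for s in parent_states:
        entries *= s
    return entries

def cpt_megabytes(child_states, parent_states, bytes_per_entry=8):
    """Approximate dense CPT size in MiB, assuming 8-byte entries."""
    return cpt_entries(child_states, parent_states) * bytes_per_entry / 2**20

# Same number of parents with a different base, and the 20-parent binary case.
for label, states, n_parents in [
    ("binary, 10 parents", 2, 10),
    ("binary, 20 parents", 2, 20),
    ("10 states, 10 parents", 10, 10),
]:
    size = cpt_megabytes(states, [states] * n_parents)
    print(f"{label}: {size:,.2f} MiB")
```

Under these assumptions a single binary node with 20 binary parents already needs 16 MiB, and a 10-state node with ten 10-state parents is far beyond the memory of an ordinary desktop.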
I hope this helps.
Marek
- Yang Yajie
- Posts: 34
- Joined: Thu Mar 19, 2020 11:49 am
Re: The academy GeNIe always shut down.
Hi Marek,
I calculated the memory needed for parameter learning and inference with my DBN, but I am not sure my understanding is right.
I assume: the number of data records is N=17000000, the number of variables is Node=40, the number of time slices in the DBN is T=10, the maximum number of states of a variable is State=5, and the maximum number of parents of a node is P=8.
Therefore, the total memory "M" needed for parameter learning "M1" and structure "M2" is:
M = M1 + M2 = T * Node * (State^P)/(2^17) + N * Node * P/(10^3) = 10 * 40 * (5^8)/(2^17) + 17000k * 40 * 8/(10^3) = 1192 MB + 5440 MB = 6.5 GB
Did I calculate the total memory correctly?
Does the memory you mentioned mean the system memory (RAM) or the free space on the Windows (C:) drive? My computer shows "Installed memory (RAM): 8.00 GB (7.85 GB usable)", and on Windows (C:) 51.6 GB of 235 GB is free. Is that enough for parameter learning and DBN inference?
Thank you very much!
- marek [BayesFusion]
- Site Admin
- Posts: 449
- Joined: Tue Dec 11, 2007 4:24 pm
Re: The academy GeNIe always shut down.
Hi Yajie,
Looking at your formula, I'm not sure why you are dividing by 2^17. The second part of the formula seems strange (why are you multiplying by the number of records and also dividing by 10^3?). The amount of memory for both learning the parameters and the structure depends on the algorithms used. When there are missing data, we use the EM algorithm, and please keep in mind that you need to represent the current model and perform inference with it. If you are using the default clustering algorithm, you need to predict the size of the join tree, which is not a trivial task. Structure learning algorithms work on different principles and need a varying amount of memory. Bayesian search will need to perform inference, so you need to represent the network and the join tree as well. Furthermore, SMILE has special, memory-efficient structures for representing the most important parameters of the data, so you need to calculate the memory needed for these structures. Complex, isn't it? :-).
Hard to say if your memory is large enough. Because the memory demands are exponential, the simplest way of dealing with the "out of memory" condition is to simplify your models.
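As a rough lower bound only, two of these terms can be written down directly. The sketch assumes 8 bytes per stored value (the figure used earlier in the thread for the data), dense CPTs, every node at the maximum number of states and parents, and the numbers from the question: N = 17,000,000 records of V = 40 values each, S = 5 states, P = 8 parents. The join tree, the EM bookkeeping, and SMILE's internal data structures are not included, and they can dominate.

```latex
\begin{align*}
M_{\text{data}} &\approx N \cdot V \cdot 8\ \text{bytes}
  = 17{,}000{,}000 \cdot 40 \cdot 8
  = 5.44 \times 10^{9}\ \text{bytes} \approx 5.4\ \text{GB},\\
M_{\text{CPT}} &\le V \cdot S^{P+1} \cdot 8\ \text{bytes}
  = 40 \cdot 5^{9} \cdot 8
  = 6.25 \times 10^{8}\ \text{bytes} \approx 0.6\ \text{GB}.
\end{align*}
```

If each record spans all T = 10 time slices, the number of values per record, and with it the data term, grows by that factor. Under these assumptions the data alone would already take most of the RAM of an 8 GB machine before any inference structures are built.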
I hope this helps.
Marek