Learn new network based on the strengths of relationships derived from a survey

Post by **Jessyboeters** » Fri Dec 08, 2017 7:29 pm

Hello,

I'm new to GeNIe and have a problem in developing a BBN for my graduation project.

I want to use GeNIe to develop a BBN of factors causing delays. For my graduation project, I ran a survey in which experts assessed the cause-effect relationships between 20 factors through a matrix table (0 = no relationship; 1 = weak relationship; 2 = strong relationship; 4 = very strong relationship). With SPSS (statistical analysis), I can calculate the average strength of each relationship and its skewness (the measure of the asymmetry of the probability distribution). Using the average and the skewness, I tested the results against logical rules, which resulted in 108 strong cause-effect relationships between factors (some relationships are accepted in both directions). An example is attached to this post.

This is far too many relationships to build an optimal BBN by hand. I want to impose the requirement that one factor has a maximum of 4 arrows to other factors (max parent count). My question is: can I develop an optimal BBN from the average strengths of the relationships that I found in my survey (in a data sheet), using an algorithm in GeNIe (Learn New Network)?

Please let me know if this is possible.

Kind regards,

PS The next step of my graduation project after developing a BBN is finding the probabilities of the factors by another survey.

Post by **marek [BayesFusion]** » Sat Dec 09, 2017 3:45 pm

Let me first make sure that I understand the attached Excel spreadsheet. N means the number of survey participants and the four rows below it state how many participants judged the relationship to be "no relation", "weak relation", etc., correct? Mean summarizes the score for the relationship and skewness is a measure of disagreement (I would think standard deviation might be a better measure of disagreement). In the spreadsheet you have five variables/factors but I suspect that you have a full version of the spreadsheet with a total of 20 factors. With five variables, there are 10 pairs that you can give strengths to and you are reporting 13. I noticed that you have 5-3 and 3-5, 1-3 and 3-1 and also 2-2. Are these correct? If so, what do they mean? With 20 variables, I would expect 190 columns. Is this correct? I just want to understand what you have as a starting point. My comments below are conditional on my understanding of your problem.

If the above is correct, a Bayesian network modeling your problem will have 20 variables, at most 190 arcs, and at most 19 parents per node (which is rather large). If you limit yourself to four parents per node, you will have fewer than 76 arcs but still a possibly prohibitive network in terms of the number of parameters. You may want to keep your maximum parent count at four and additionally skip those arcs that end up with a low survey-based score.
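The counts in this paragraph can be checked with a few lines of Python. This is only a sketch: the helper names `max_arcs` and `cpt_free_parameters` are made up for illustration, and the parameter count assumes that every factor is binary, which the thread has not yet established.

```python
from math import comb


def max_arcs(n_nodes: int, max_parents: int) -> int:
    """Upper bound on arcs in an acyclic graph with a parent cap.

    In any topological ordering, the k-th node (0-based) can only have
    parents among the k nodes that precede it.
    """
    return sum(min(k, max_parents) for k in range(n_nodes))


def cpt_free_parameters(n_parents: int, n_states: int = 2) -> int:
    """Free parameters in one conditional probability table.

    Assumes the node and all of its parents have n_states states.
    """
    return (n_states - 1) * n_states ** n_parents


n = 20
print(comb(n, 2))               # unordered pairs of factors: 190
print(max_arcs(n, n - 1))       # no parent cap: 190 arcs
print(max_arcs(n, 4))           # cap of four parents: 70 arcs
print(cpt_free_parameters(4))   # one binary node with 4 binary parents: 16
print(cpt_free_parameters(19))  # one binary node with 19 binary parents: 524288
```

Under these assumptions the four-parent cap allows at most 70 arcs, which is consistent with the "fewer than 76" bound above, and each node with four binary parents still needs 16 elicited probabilities.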

Now, coming back to your question, I'm not sure there is a good theory for optimality of the constructed network from what you have. I would try to construct the network from the modes or means of the estimates. This would be a reasonable approach for combining the knowledge of your experts, who provided the estimates of the strength of the relationships. There is no standard functionality for this in GeNIe (or any BN software that I am aware of, for that matter). There are two approaches that you can follow: (1) generate a data set from your estimates in such a way that the data set reflects the joint probability distribution over your 20 factors and then use any learning algorithm implemented in GeNIe, or (2) construct your Bayesian network straight from the parameters that you have, using any programming language that can access SMILE.
Does this help?

Marek

Post by **Jessyboeters** » Sun Dec 10, 2017 1:38 pm

Hello Marek,

Thanks for your reply! I have attached a new Excel file to make things clearer.

marek wrote: N means the number of survey participants and the four rows below it state how many participants judged the relationship to be "no relation", "weak relation", etc., correct?

Yes, this is correct.

marek wrote: Mean summarizes the score for the relationship and skewness is a measure of disagreement (I would think standard deviation might be a better measure of disagreement).

It is correct that the mean summarizes the score for the relationship. This means that a mean of 1.48 (in cell C20) lies between a weak and a strong relation.

The skewness, on the other hand, measures the asymmetry of the probability distribution of the relationship. A positive skewness indicates that the tail on the right side is longer or fatter than the tail on the left side (which means that the options "no relationship" or "weak relationship" were chosen more often). A negative skewness indicates that the tail on the left side is longer or fatter than the tail on the right side (which means that the options "strong relationship" or "very strong relationship" were chosen more often). I only use the skewness to test the results against the 9 logical rules, which are also explained in the Excel file (starting from cell A27).
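To make the sign convention concrete, here is a minimal sketch that computes the mean and skewness from the answer counts of one relationship. The counts and the equally spaced scores are hypothetical, so substitute the survey's actual coding. The formula is the plain moment-based skewness; SPSS may apply a small-sample correction, so its values can differ slightly while the sign stays the same.

```python
def mean_and_skewness(counts, scores):
    """Mean and moment-based skewness of a scored answer distribution.

    counts[i] is the number of participants who chose scores[i].
    """
    n = sum(counts)
    mean = sum(c * s for c, s in zip(counts, scores)) / n
    # Second and third central moments of the answers
    m2 = sum(c * (s - mean) ** 2 for c, s in zip(counts, scores)) / n
    m3 = sum(c * (s - mean) ** 3 for c, s in zip(counts, scores)) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, skew


# Hypothetical coding of the four answer options (an assumption)
SCORES = (0, 1, 2, 3)

# Most participants chose "no" or "weak": long right tail, positive skewness
low_mean, low_skew = mean_and_skewness((10, 6, 3, 1), SCORES)

# Most participants chose "strong" or "very strong": negative skewness
high_mean, high_skew = mean_and_skewness((1, 3, 6, 10), SCORES)

print(low_mean, low_skew > 0)
print(high_mean, high_skew < 0)
```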

marek wrote: In the spreadsheet you have five variables/factors but I suspect that you have a full version of the spreadsheet with a total of 20 factors.

Yes, this is correct

marek wrote: With five variables, there are 10 pairs that you can give strengths to and you are reporting 13. I noticed that you have 5-3 and 3-5, 1-3 and 3-1 and also 2-2. Are these correct?

To clarify this part, I have included the set-up of my survey, the matrix table (starting in cell A2). In this table, the factors in the columns (causes) are the same as the factors in the rows (effects). In the example there are 25 relationships (5x5), but logically the cause 'factor 1' cannot lead to the effect 'factor 1', because factor 1 has already occurred (this means that I made a mistake, because 2-2 is not possible). This reduces the possible relationships to 20 (25-5), as shown by the grey cells with an X.

What I have done before starting the survey is delete the illogical relationships (this is shown in the white cells with an X). This helped me to reduce the 20 possible relationships to 13 possible relationships. So, it is correct that some relationships are judged in both directions when you look at the set-up of the survey (starting in cell A2).

marek wrote: If so, what do they mean?

The relationship in C13 (X1Y3) means the relationship from factor 1 (cause = X) to factor 3 (effect = Y).

marek wrote: With 20 variables, I would expect 190 columns. Is this correct? I just want to understand what you have as a starting point.

In the case of 20 factors, this will mean that there are 380 relationships between the factors (20x20 = 400; 400-20=380). When deleting the illogical relationships I reduced the possible relationships from 380 to 267 relationships.

After the survey, I tested the results against the 9 logical rules (starting in cell A27), which helped me to check which relationships are accepted in the survey. In the example of 5 factors, only 4 of the 13 relationships are accepted based on the 9 logical rules (with 20 factors, 108 of the 267 relationships).

What I want to do is build a BBN from the accepted relationships, based on the mean (the strength of each relationship). In the example this is easy because there are only 4 accepted relationships, but it is harder with 108 relationships between 20 variables. It is not possible to build a BBN from all 108 relationships because some relationships run in both directions. What I want is to use GeNIe (if this is possible) to build an optimal BBN in which all relationships have been weighed against each other. It is easy to say that one relationship is stronger than another, but it becomes difficult when more factors and more connections are involved. After the BBN is developed, I will run another survey to find the probabilities of occurrence of the different factors.

marek wrote: (1) Generate a data set from your estimates in such a way that the data set reflects the joint probability distribution over your 20 factors and then use any learning algorithm implemented in GeNIe

I was thinking about the first option, but I don't know what this data sheet should look like. Is it possible to make a 'correlation matrix' data sheet based on the strengths of the relationships?

The second option is more difficult because I'm not familiar with programming and SMILE.

I hope this information helps you to better understand my problem and what I want to achieve.

Kind regards,
Jessy Boeters

PS if this is not possible in GeNIe, I will try to construct the network from the mean of the estimates by hand as indicated by you in the previous post.

marek wrote: This would be a reasonable approach for combining the knowledge of your experts, who provided the estimates of the strength of the relationships

Post by **marek [BayesFusion]** » Sun Dec 10, 2017 6:32 pm

Hi Jessy,

marek wrote: With five variables, there are 10 pairs that you can give strengths to and you are reporting 13. I noticed that you have 5-3 and 3-5, 1-3 and 3-1 and also 2-2. Are these correct?
Jessyboeters wrote:
To better understand this part I included the set-up from my survey, the matrix table (starting in cell A2). In this table, the factors in the column(cause) are the same factors in the row(effect). In the example, there are 25 relationships (5x5) but in a logical way, it is not possible that cause 'factor 1' will lead to the effect 'factor 1' because factor 1 already occurred (this means that I made a mistake because 2-2 is not possible). This reduces the possible relationships to 20 (25-5) as shown in the grey cells with an X.

What I have done before starting the survey is delete the illogical relationships (this is shown in the white cells with an X). This helped me to reduce the 20 possible relationships to 13 possible relationships. So, it is correct that some relationships are judged in both directions when you look at the set-up of the survey (starting in cell A2).

marek wrote: If so, what do they mean?
Jessyboeters wrote:
The relationship in C13 (X1Y3) means the relationship from factor 1 (cause = X) to factor 3 (effect = y).

I actually meant the reciprocal relationship -- you have both C13 and C31. Is this allowed? What does this mean? If you exclude reciprocal relationships, my calculations of the number of arcs should be correct, right? Generally, if your graph of influences is to be acyclic, you can think about your matrix of influences/parameters as triangular.
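One way to turn directed strength scores into an acyclic graph is a greedy pass over the relationships from strongest to weakest, keeping an arc only if it respects the parent limit and does not close a loop. The sketch below is a heuristic with no optimality guarantee and is not a GeNIe or SMILE feature. The function name `build_dag` and all scores are hypothetical.

```python
def build_dag(strengths, max_parents=4, min_strength=0.0):
    """Greedily select arcs from {(cause, effect): mean score}.

    Stronger relationships are considered first. An arc is skipped if it
    is a self-loop, scores below min_strength, would exceed max_parents
    for the effect, or would create a directed cycle.
    """
    parents = {}   # effect -> set of accepted causes
    children = {}  # cause -> set of accepted effects

    def reaches(start, target):
        # Depth-first search along the arcs accepted so far
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        return False

    arcs = []
    ranked = sorted(strengths.items(), key=lambda kv: -kv[1])
    for (cause, effect), score in ranked:
        if cause == effect or score < min_strength:
            continue
        if len(parents.get(effect, ())) >= max_parents:
            continue
        if reaches(effect, cause):  # cause -> effect would close a cycle
            continue
        parents.setdefault(effect, set()).add(cause)
        children.setdefault(cause, set()).add(effect)
        arcs.append((cause, effect, score))
    return arcs


# Hypothetical mean scores, including two reciprocal pairs
example = {(1, 3): 1.5, (3, 1): 1.1, (3, 5): 1.9, (5, 3): 2.1, (2, 4): 1.7}
print(build_dag(example))
# [(5, 3, 2.1), (2, 4, 1.7), (1, 3, 1.5)]
```

For each reciprocal pair only the stronger direction survives, so the accepted arcs fit a triangular matrix once the factors are listed in a suitable order.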

marek wrote: (1) Generate a data set from your estimates in such a way that the data set reflects the joint probability distribution over your 20 factors and then use any learning algorithm implemented in GeNIe
Jessyboeters wrote:
I was thinking about the first option but I don't know how this data sheet has to look like? Is it possible to make a 'correlation matrix' data sheet based on the strength of the relationships?

The second option is more difficult because I'm not familiar with programming and SMILE.

I hope this information helps you to better understand my problem and what I want to achieve.

I don't know off-hand how to generate a data file from your model. I guess you will have to reflect on what the elicited parameters tell you about the joint probability distribution and then write a program that generates data from your assumptions. If you find it difficult to think about the data, it may be better to create a graphical model directly, again ideally with a computer program that interprets your (triangular) matrix of influences.
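As a rough illustration of what such a data-generating program could look like, the sketch below samples binary records from an acyclic graph, visiting the factors in an order where causes come before effects. Everything numeric here is an assumption: the thread gives no rule for turning a survey strength into a probability, so `link_prob` and `leak` are hypothetical inputs, and the way several causes combine (each active cause independently triggers the effect) is just one possible modeling choice.

```python
import csv
import io
import random


def sample_records(order, parents, link_prob, leak, n_records, seed=0):
    """Forward-sample binary records from an acyclic graph.

    order:     factors listed so that every parent precedes its children
    parents:   dict factor -> list of parent factors
    link_prob: dict (parent, child) -> assumed probability that the
               parent alone produces the child (hypothetical mapping)
    leak:      assumed probability that a factor occurs with no active parent
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        row = {}
        for node in order:
            p_absent = 1.0 - leak
            for parent in parents.get(node, []):
                if row[parent]:
                    p_absent *= 1.0 - link_prob[(parent, node)]
            row[node] = int(rng.random() < 1.0 - p_absent)
        records.append(row)
    return records


def to_csv(records, order):
    """Render the records as comma-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=order, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical three-factor example: F1 and F5 both influence F3
order = ["F1", "F5", "F3"]
parents = {"F3": ["F1", "F5"]}
link_prob = {("F1", "F3"): 0.5, ("F5", "F3"): 0.7}

records = sample_records(order, parents, link_prob, leak=0.1, n_records=200, seed=1)
text = to_csv(records, order)
print(text.splitlines()[0])  # F1,F5,F3
```

The resulting text can be saved to a file and passed to a structure-learning algorithm. How well the learned network matches the experts' judgments depends entirely on the assumed probabilities.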

There is one more thing that I would like to recommend when creating your model: canonical gates, and quite possibly Noisy-OR if your factors are all binary. This should save you a lot of effort in getting numbers. Please have a look at the section "Canonical models" in the GeNIe manual (Chapter "Building blocks of GeNIe", section "Components of GeNIe models").
I hope this helps.
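To show why a Noisy-OR gate saves elicitation effort, here is a small sketch that expands one probability per parent (plus an optional leak) into the full conditional probability table of a binary effect. It uses the standard Noisy-OR formula, P(effect | active causes) = 1 - (1 - leak) * product of (1 - p_i) over the active causes. The function name `noisy_or_cpt` and the numbers are hypothetical, and this is plain Python rather than a call into GeNIe or SMILE.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Expand Noisy-OR parameters into a full table for a binary effect.

    link_probs[i] is the probability that parent i, acting alone,
    produces the effect. Returns {parent_states: P(effect present)},
    where parent_states is a tuple of booleans.
    """
    cpt = {}
    for states in product((False, True), repeat=len(link_probs)):
        p_absent = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_absent *= 1.0 - p
        cpt[states] = 1.0 - p_absent
    return cpt


# Two hypothetical causes with link probabilities 0.8 and 0.5, no leak
table = noisy_or_cpt([0.8, 0.5])
for states, p in table.items():
    print(states, round(p, 3))

# Four parents: 4 elicited numbers expand into 2**4 = 16 table rows
print(len(noisy_or_cpt([0.6, 0.4, 0.3, 0.2])))
```

With four binary parents, the experts would need to supply four link probabilities (and possibly a leak) instead of 16 separate conditional probabilities.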

Marek

Post by **Jessyboeters** » Thu Dec 14, 2017 7:19 am

Hi Marek,

marek wrote: I actually meant the reciprocal relationship -- you have both C13 and C31

This is not allowed in the network because I want to create a BBN without reciprocal relationships and loops in the network.

marek wrote: If you exclude reciprocal relationships, my calculations of the number of arcs should be correct, right?

That's right

marek wrote: Please have a look at the section "Canonical models" in GeNIe manual (Chapter "Building blocks of GeNIe", section "Components of GeNIe models"). I hope this helps.

I will look at the section "Canonical models" in the GeNIe manual.

Thanks for your quick response!

Jessy

BayesFusion Support Forum
