Advice please on canonical model

charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Advice please on canonical model

Post by charlie »

I am seeking advice on suitable canonical models for a BN of technology adoption. Put simply for the sake of this post, my problem concerns upgrading a processing operation in a specific industry by implementing the latest technology. The implementation depends, in general, on three factors: the engineering skills within a business in the industry, the business's desire to implement the technology, and the accessibility of the technology in the local market. Each of the three factors is further broken down into sub-factors: engineering skills into mechanical skills and electrical automation skills; local market accessibility into product price and physical availability of the product; and business desire into the management's perception of return on investment and its attitude to the company's image as an innovator. This factorisation can go on to include more levels or sub-factors, but the above suffices to describe my problem.

I quickly realised that it is impossible to establish the CPTs purely through elicitation, not just because of the prohibitive number of parameters but also because expert knowledge is siloed. Knowledge of the three factors needs to be acquired separately from experts in three distinct fields: engineers, business managers, and market salespeople. Their knowledge does not usually overlap. While they are knowledgeable in their respective fields, they are not proficient in the others. E.g. an engineer can comfortably talk about different levels of likelihood of adoption against different levels of engineering skills, ignoring the desire and market factors. It is, however, practically impossible, or unreliable if forced, to obtain an answer from a panel of engineers about the likelihood of adoption under a given combination of business attitudes and market conditions.

The simple part of the problem is that all the variables of interest are intuitively causal and ordinal with respect to the end result, adoption. E.g. the higher the appreciation of the return on investment, the higher the desire, and thus the higher the likelihood of adoption. All the variables would preferably have four discrete states: high, medium high, medium low and low (or the reverse in the case of price). I would also like to explicitly keep several layers of factorisation in the BN to make the logical / influence structure easier to visualise and to facilitate ongoing learning.
As you can see, I need one or more canonical models to help with the BN development. I read the 2007 paper by Francisco Diez and Marek Druzdzel but still don't have a solid idea of how to proceed. It seems a graded causal Max model would be relevant. Any idea or advice will be highly appreciated: a direct recommendation of certain models, pointers to recent research / papers in a relevant area, or just some thoughts / technical tips for dealing with the problem.
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

My suggestion is that you look carefully at the paper that I have co-authored with my colleague Javier Diez, available at the following location:

http://www.cisiad.uned.es/techreports/canonical.php

We have tried to review all existing canonical models. Perhaps you will find in there what you are looking for.
Cheers,

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

Thanks Marek. As I said in my first post, I did study that paper, which is very informative and useful. The information I collected, however, doesn't even allow me to comfortably specify the Noisy-MAX parameters. Anyway, I wrote some code to convert the raw data into a general CPT and a Noisy-MAX, but I still have no idea how to estimate the leak.

Apart from the leak, another problem I hope to get some guidance on is this: I need to weight the influences of the different parent nodes. As mentioned in the first post, knowledge about the three parents (engineering skills, business desire and local market availability) was elicited from three distinct panels of experts: engineers, business managers and salespeople, respectively. From a separate study I also understand that the strengths of influence of the three parents on the adoption of technology are very different (and vary with some other parameters). Is there any way in GeNIe to assign weights to parent variables? Thanks
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

Hi Charlie,

I'm sorry to have missed this (that you have already looked at the Diez & Druzdzel paper). My experience in applying canonical models is that they each require careful thinking. There is no easy/algorithmic solution to which model to pick.

Am I understanding correctly that you have selected the Noisy-MAX model and that you are struggling with the values of the parameters? This is not a bad situation to be in. The Noisy-MAX parameters are essentially the weights that you are talking about. They will allow you to express everything that you need to express. It is not uncommon for different experts (or groups of experts) to come up with different parameters. Each parameter has a clear interpretation and a question that you can ask your expert. The leak parameters, in particular, correspond to the situation when none of the explicit causes of the effect are present. How likely is each of the states of the child variable under such circumstances? In my experience, confronting experts with their estimates, if these differ much, works well.
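
To make the parameter semantics concrete, here is a minimal sketch of how Noisy-MAX parameters and a leak expand into a full CPT. It is an illustration under assumptions, not GeNIe's implementation: the function name noisy_max_cpt, the state ordering (index 0 is the lowest, distinguished state) and all numbers are made up for the example.

```python
from itertools import product


def cumulative(dist):
    """Running sums of a distribution: P(Y <= y) for each state y."""
    total, out = 0.0, []
    for p in dist:
        total += p
        out.append(total)
    return out


def noisy_max_cpt(parent_params, leak):
    """Expand Noisy-MAX parameters into a full CPT.

    parent_params[i][s] is the distribution over child states produced by
    parent i alone when it is in state s (hypothetical layout).
    leak is the child distribution when every parent is distinguished.
    States are ordered from lowest (index 0, distinguished) to highest.
    """
    n_child = len(leak)
    leak_cum = cumulative(leak)
    cpt = {}
    for config in product(*(range(len(p)) for p in parent_params)):
        # P(Y <= y | config) is the product of the cumulative parameters.
        cum = list(leak_cum)
        for params, state in zip(parent_params, config):
            cum = [a * b for a, b in zip(cum, cumulative(params[state]))]
        # Difference the cumulative values to get P(Y = y | config).
        cpt[config] = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, n_child)]
    return cpt


# Made-up numbers: child states are (low, medium, high).
skills = [[1.0, 0.0, 0.0], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # low, medium, high
market = [[1.0, 0.0, 0.0], [0.5, 0.3, 0.2]]                    # low, high
leak = [0.9, 0.08, 0.02]

cpt = noisy_max_cpt([skills, market], leak)
for config, column in sorted(cpt.items()):
    print(config, [round(p, 4) for p in column])
```

With two parents the sketch needs only the per-parent distributions plus one leak distribution, and the column for the configuration in which both parents are in their distinguished state reproduces the leak.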

There are two thoughts (dealt with in two sets of papers) that I would like to recommend:

(1) My papers with Agnieszka Onisko (Agnieszka Onisko and Marek J. Druzdzel. Impact of precision of Bayesian networks parameters on accuracy of medical diagnostic systems. Artificial Intelligence in Medicine, 57(3):197-206, March 2013, http://www.pitt.edu/~druzdzel/psfiles/onisko12.pdf) on the sensitivity of Bayesian networks to precision of parameters. It turns out that precision of parameters does not matter that much. Once you realize that, you will feel more comfortable estimating parameters. Just focus on reasonable values and don't worry too much about disagreements among experts and imprecise values of parameters. Try to get at least the right order of magnitude at first. Once you have your model complete, you can check through sensitivity analysis which parameters matter most and then improve on them.

(2) There is a paper on a DeMorgan canonical model that I co-authored with Paul Maaskant (Paul P. Maaskant and Marek J. Druzdzel. An ICI Model for opposing influences. In Proceedings of the Fourth European Workshop on Probabilistic Graphical Models (PGM-08), Manfred Jaeger & Thomas D. Nielsen (eds.), pages 185-192, Hirtshals, Denmark, September 17-19, 2008, http://www.pitt.edu/~druzdzel/ftp/pgm08a.pdf). Perhaps it will be helpful. That model is for binary variables. At the university we created QGeNIe, a qualitative version of GeNIe that implements this model, and used it for rapid construction of models, often a first version that we later converted to GeNIe and elaborated upon. Here is a good paper on QGeNIe: Marek J. Druzdzel. Rapid modeling and analysis with QGeNIe. In Proceedings of the International Multiconference on Computer Science and Information Technology (IMCSIT-2009), pages 101-108, Mragowo, Poland, October 12-14, 2009, http://www.pitt.edu/~druzdzel/psfiles/imcsit09.pdf. We will make it into a product one day, so please stay tuned.
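
The sensitivity-analysis suggestion in (1) can be illustrated with a crude one-at-a-time check on a small leaky Noisy-OR. This is only a sketch under assumptions: it is not the sensitivity analysis algorithm that GeNIe provides, the parents are taken to be binary and independent, and the function names and numbers are invented for the example.

```python
def p_effect(priors, links, leak):
    """Marginal P(effect) in a leaky Noisy-OR with independent binary parents.

    priors[i] is P(parent i present); links[i] is the probability that
    parent i, when present, produces the effect on its own.
    """
    absent = 1.0 - leak
    for prior, link in zip(priors, links):
        absent *= 1.0 - prior * link
    return 1.0 - absent


def one_at_a_time(priors, links, leak, delta=0.05):
    """Swing in P(effect) when each parameter moves by +/- delta."""
    swings = {}
    for i in range(len(links)):
        high = list(links)
        low = list(links)
        high[i] = min(1.0, links[i] + delta)
        low[i] = max(0.0, links[i] - delta)
        swings[f"link_{i}"] = p_effect(priors, high, leak) - p_effect(priors, low, leak)
    swings["leak"] = (p_effect(priors, links, min(1.0, leak + delta))
                      - p_effect(priors, links, max(0.0, leak - delta)))
    return p_effect(priors, links, leak), swings


# Made-up numbers: two causes with equal strength but different priors.
base, swings = one_at_a_time(priors=[0.7, 0.2], links=[0.6, 0.6], leak=0.05)
print(round(base, 5))
for name, swing in swings.items():
    print(name, round(swing, 5))
```

In this toy case the same error in the two link parameters moves the output by very different amounts, which is the kind of ranking that tells you where further elicitation effort is worth spending.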

Finally, our discussion goes quite far beyond software support. If you think it will be useful for you, we are available to support your project on a consulting basis. We have quite a lot of experience in building probabilistic models and fielding them in practice.
I hope this helps!

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

Hi Marek
Thank you so much for the detailed reply. It's really helpful. The expert data I used were not originally designed for Bayesian elicitation; they are qualitative ranking scores from expert panels collected for general consultation purposes. The quantity and quality of the data are far from satisfactory for a BN. It's not likely we can re-run those panels soon, as it's costly. It's really good to know that precision does not matter much; otherwise, what else could I possibly do?

GeNIe's leaky Noisy-MAX chance node hugely reduces the work of parameterization, but I found myself confronted with the difficulty of specifying leaks that I have no knowledge about at all. I had to work around it by writing my own code, using the Noisy-MAX equations in your paper and context-specific knowledge, to convert the raw expert data into a CPT without dealing with leaks.

I just wonder why the general CPT approach does not take the leak into account. A comprehensive treatment of the possible dependences among the identified parent variables does not guarantee that all the parent variables have been included, does it? So why do only ICI models care about the leak?

Just a note, for comment, on how I dealt with weighting among parents. By weighting I think I actually meant changing how sensitive the child node is to each parent node; otherwise the parameters of the three parents, elicited separately from three expert panels, would have to be treated equally, which is not the case in the real world. I simply raised the probabilities of the three parents to powers in the product term of the Noisy-MAX equation. Say, with q1^p1 * q2^p2 and p1=0.8, p2=0.2, a change in q1 will be more influential than a change in q2.
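
The exponent idea can be checked numerically with a minimal sketch. It assumes a binary child, so that q1 and q2 are the probabilities that each parent on its own fails to produce the effect; the function name and the numbers are invented for the example.

```python
def weighted_absence(q, w):
    """Product of q_i ** w_i: the probability that the effect stays absent
    when each parent's failure probability q_i is raised to a weight w_i."""
    result = 1.0
    for q_i, w_i in zip(q, w):
        result *= q_i ** w_i
    return result


q = [0.4, 0.4]   # made-up failure probabilities for two parents
w = [0.8, 0.2]   # exponents playing the role of weights

base = weighted_absence(q, w)
# Lower each q by the same amount in turn and compare the effect on the product.
shift_1 = weighted_absence([0.3, 0.4], w) - base
shift_2 = weighted_absence([0.4, 0.3], w) - base

print(round(base, 4), round(shift_1, 4), round(shift_2, 4))
```

The same change in q1 moves the product more than it does in q2, as intended. Note that q_i ** p_i is itself a number between 0 and 1 whenever p_i is non-negative, so the weighted product is the same as an unweighted product with rescaled parameters q_i' = q_i ** p_i; an exponent below 1 pushes q_i towards 1 and so weakens that parent's influence.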

Thanks again!
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

Hi Charlie,

The leak is present in CPTs as well. It is the probability of the effect happening when none of the parents are present. If you click the CPT button in a Noisy node, you will certainly see the leak! Interesting that you are having problems with the leak. I find it one of the most straightforward concepts in canonical gates. Think of it as the probability of the effect happening when each of the modeled parents is in its distinguished (normal) state.
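
As a small illustration of reading the leak off a general CPT, the sketch below takes the column in which every parent is absent as the leak and then backs out the individual link probabilities. It assumes binary variables and a leaky Noisy-OR interaction; the function name and the CPT numbers are made up, and the division requires the leak to be below 1.

```python
def noisy_or_from_cpt(p_effect, n_parents):
    """Recover leak and link probabilities from selected CPT columns.

    p_effect maps a tuple of 0/1 parent states (1 = present) to
    P(effect present | that configuration).
    """
    none_present = (0,) * n_parents
    leak = p_effect[none_present]
    links = []
    for i in range(n_parents):
        only_i = tuple(1 if j == i else 0 for j in range(n_parents))
        # P(effect | only cause i) = 1 - (1 - leak) * (1 - c_i), solved for c_i.
        links.append(1.0 - (1.0 - p_effect[only_i]) / (1.0 - leak))
    return leak, links


# Made-up CPT for two binary parents.
p_effect = {(0, 0): 0.05, (1, 0): 0.62, (0, 1): 0.24, (1, 1): 0.696}

leak, links = noisy_or_from_cpt(p_effect, 2)
print(leak, [round(c, 4) for c in links])

# Check: the column with both causes present is reproduced by the Noisy-OR.
both = 1.0 - (1.0 - leak) * (1.0 - links[0]) * (1.0 - links[1])
print(round(both, 4))
```

The point of the sketch is that an expert's answer to "how likely is the effect when only this cause is present?" already includes the leak, so the leak has to be divided out before the answer is used as a link parameter.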

I guess you need to model the reliability of your experts by means of noise in the distribution or the interaction between the child node and its parents. One way of thinking about it is that you bring everything to the common denominator.
Cheers,

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

Yes, I guess I am stuck in a certain mindset, but it could also be because the problem I'm studying, technology adoption (like some other social problems), does not have a legitimate distinguished or neutral state in the real world. That's why the lowest state of a node in my case is always set to "low" or "minimal" rather than "absent".

On a related note, I wonder why, when a general chance node is converted to a Noisy-MAX node in GeNIe, the conditional probabilities for the lowest state of a parent variable are always forced to be 0, 0, 0, 1 for p(y=high|x=low), p(y=medium high|x=low), p(y=medium low|x=low), p(y=low|x=low), respectively, and editing of this column is disabled. Why?
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

Hi Charlie,

In theory, "absent" is a possible state for a variable in any domain. The reason why you don't see "absent" too often is precisely because of the variables that you are unwilling or unable to model. They account for the non-zero probability of "absent" (and this is precisely what Noisy-MAX calls "leak") or of the typically observed state "minimal". If you become more concrete about your model, I will be glad to explain it more.

The grayed-out/inactivated columns correspond to the "distinguished"/"normal" state and its zero influence on the child. This belongs to the canonical model assumptions/constraints. By assumption, if each of the parents in a Noisy-OR is in its distinguished state, then the child is in its distinguished state. You can display this column as well, and move it around so that a different state is designated as normal/distinguished, but you first need to press the "Show constrained columns" button.
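
The constraint can be pictured with a minimal sketch of one parent's parameter table. The layout (one column per parent state, child states listed from high to low, distinguished state last) follows the 0, 0, 0, 1 column described in the question; the function names and the numbers are hypothetical and this is not how GeNIe stores its tables.

```python
CHILD_STATES = ["high", "medium_high", "medium_low", "low"]


def constrained_column(n_child_states):
    """Column for the parent's distinguished state: the parent contributes
    nothing, so the child's distinguished (last) state gets probability 1."""
    return [0.0] * (n_child_states - 1) + [1.0]


def full_parent_table(editable_columns, n_child_states):
    """Append the fixed column to the columns an expert actually fills in."""
    for column in editable_columns:
        if len(column) != n_child_states:
            raise ValueError("column length must match the number of child states")
        if abs(sum(column) - 1.0) > 1e-9:
            raise ValueError("each column must sum to 1")
    return [list(c) for c in editable_columns] + [constrained_column(n_child_states)]


# Made-up parameters for the parent states high, medium_high, medium_low.
editable = [
    [0.50, 0.30, 0.15, 0.05],
    [0.30, 0.30, 0.25, 0.15],
    [0.10, 0.20, 0.30, 0.40],
]

table = full_parent_table(editable, len(CHILD_STATES))
for column in table:
    print(column)
```

Only the first three columns carry elicited information; the last one is implied by the model assumption, which is why it is not editable.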
I hope this helps.

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

Thanks Marek. I hope this doesn't get too philosophical: could the non-zero probability of "absent" be due to the stochastic nature of the variables (or of the real-world problem they represent) rather than to a "leak" from outside? Thinking logically within an established system of knowledge, one would indeed conclude that the absence of causes leads to zero probability of effects. A BN, however, incorporates randomness in the model (rather than questioning the source of the randomness); that is how a BN differs from a deterministic graphical model, isn't it? Randomness should thus exist in every state of a variable because of this "intrinsic" nature, so I wonder why we have to impose a "distinguished" state and attribute all the randomness in this special state to "leak". Wouldn't this make the "distinguished" state very different in character from all the other states? In other words, if there is a leak (I'm not denying there is), wouldn't it exist in every state instead of in the "distinguished" state only? Wouldn't that leak already be absorbed in the randomness of the variable and the probabilities of its states (distinguished or not)? I know I am stuck but don't know where.
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

Hi Charlie,

There are two ways of viewing Bayesian networks. One is that they are just representations of the joint probability distributions and the structure just represents independences. From this perspective, it does not matter what the structure is and in what directions the links are, as long as they represent the independences.

The other view is that they are causal graphs. The second view allows us to practically build them. As you pointed out, the logic of causality is that no effect happens without a cause. Canonical gates make building models easier and this is the reason why one ideally should look at them as causal models. Leak probability is the probability that the effect will happen even though none of the modeled causes is present. Leak corresponds to unmodeled causes and is not really hard to elicit in practice. I think this view is quite elegant.
I hope this helps,

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

OK, thanks Marek. Just a simple question about GeNIe: for a Noisy-MAX node, the button for constrained columns is somehow disabled, so I cannot hide the constrained columns. How do I activate the button?
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Advice please on canonical model

Post by marek [BayesFusion] »

This is actually a feature. The button gets deactivated whenever there is at least one parent that has more than two states. The reason for this is that it is sometimes desirable to change the order of states in the parent (e.g., when implementing negation, like in Noisy-MIN, or when the parent is a parent of another canonical gate). In binary nodes, you can right-click on the state and pick one of the two. This is theoretically challenging when there are more than two states, so we force showing the constrained columns.
I hope this helps.

Marek
charlie
Posts: 66
Joined: Wed Aug 09, 2017 10:55 pm

Re: Advice please on canonical model

Post by charlie »

Thanks, Marek. It makes sense, but I would still prefer to have the choice of hiding the constrained columns. This is why: sometimes I have a row of, say, 10 parent nodes. The parameter data were prepared in Excel spreadsheets in a form that can be readily copied to GeNIe. Without the constrained columns I could copy all 10 variables in one go, but with these non-editable columns separating them I have to do it 10 times.