Ordinal Data > Soft Observations

zzpmarco
Posts: 2
Joined: Fri Jan 01, 2021 10:56 am

Post by zzpmarco »

In psychometrics, it is customary to represent a score as the average over multiple items, each rated on a scale (e.g., a 7-point Likert scale).
Each score is therefore a continuous value within the scale's range. Typically, an experiment produces observations (i.e., score values) for a given number of participants.

The objective is to use GeNIe to define a probabilistic (causal) network that works as a predictor of unobserved scores from observed ones.
Using discrete values for this, instead of continuous scores, greatly simplifies parameter learning and the visual interpretation of results.
Learning the network structure is not the issue here; it can be discovered by other means or defined by experts.
Discretization is not an issue either, as it can easily be done using quantiles, say in Excel.
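
As an illustration of the quantile discretization mentioned above, here is a minimal Python sketch (an alternative to doing it in Excel). The function name `quantile_discretize`, the three labels, and the sample scores are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def quantile_discretize(scores, labels=("low", "medium", "high")):
    """Assign each score to an equal-frequency bin; return (states, cut points)."""
    scores = np.asarray(scores, dtype=float)
    # Interior quantile levels, e.g. 1/3 and 2/3 for three bins.
    levels = np.linspace(0.0, 1.0, len(labels) + 1)[1:-1]
    cuts = np.quantile(scores, levels)
    # side="right": a score equal to a cut point goes to the upper bin.
    idx = np.searchsorted(cuts, scores, side="right")
    return [labels[i] for i in idx], cuts

# Hypothetical averaged scores, rescaled to the 0..1 range.
scores = [0.10, 0.25, 0.40, 0.55, 0.66, 0.80, 0.90, 0.99]
states, cuts = quantile_discretize(scores)
print(states)
print(cuts)
```

Each score ends up in exactly one state, which is the "hard" discretization the rest of this post tries to improve on.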

The main difficulty is that any discretization of the kind ('low', 'medium', 'high') discards the ordinal information, which leads to evident irregularities in the results.

One solution, also according to the literature, would be to adopt a 'softening' of observed data, using for instance a normal distribution as a spreader.
Just for clarity, observed data would be translated into something like this:
observed score = 0.99/1 -> ('low' : 0.0, 'medium' : 0.01, 'high' : 0.99)
observed score = 0.66/1 -> ('low' : 0.01, 'medium' : 0.49, 'high' : 0.50)

My problem is: how to represent all this in GeNIe?

Consider that, ideally, in the desired predictor the main chance variables should all be discrete and all causal links should be between discrete variables only.

Any comments will be greatly appreciated.
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Ordinal Data > Soft Observations

Post by marek [BayesFusion] »

Hi,

Let me try to help, although I realize that I may not know enough to give useful suggestions.

I know that you want to use discrete variables, but have you considered continuous variables with Gaussian distributions and linear relationships? Because your scores are averages and are continuous, there is a good chance that they come from a Normal distribution. If relationships between variables can be considered linear (easy to check by examining some scatterplots), then the joint distribution could be multivariate Normal and the PC algorithm could learn the structure as well.

Now, GeNIe allows you to specify the order of states of discrete variables. The order can be "High to low", "Low to high", or "none". This is used, for example, in coloring the influences (positive and negative) when looking at arc strengths.

I guess I don't understand what you mean by "softening". Does 0.99/1 mean Normal(0.99,1), i.e., mean 0.99 and standard deviation 1.0? Are the states ('low', 'medium', 'high') discrete states that your variable will then take? Representing this as a discrete distribution will be a matter of choosing states ('low', 'medium', 'high' ?) and their probabilities, including conditional probabilities given values/states of parents.
Cheers,

Marek

Re: Ordinal Data > Soft Observations

Post by zzpmarco »

Hi Marek,

Thank you for your response, which raises interesting ideas.

Let me just focus on the problem at hand:
- input data are continuous variables, obtained by averaging scores over several items;
- learning the structure is not necessary, as it has been defined already (say, by experts);
- using normals plus the linearity hypothesis is indeed an option, yet we want to stay with the non-parametric case for now;
- the objective is obtaining a trained, non-parametric BN with discrete variables and the given structure.

The main issue, the way I see it, is the discretization of continuous variables, which introduces distortions in the form of irregularities and counterintuitive results in specific cases.

In passing, I had not noticed the option for the order of states before: thank you. In my case, however, setting it did not produce any effect.

Nonetheless, the idea of a normal distribution might help clarify.

Let's assume that we consider each atomic observation (i.e., one scalar value x for a specific variable X, for a given individual i in a dataset) as bearing some uncertainty. We can then interpret x as the mean of a normal distribution with a constant variance (i.e., the same across the entire dataset). Even assuming that the cut points are also constant (for the dataset), the resulting 'soft' discretization, say into 'low', 'medium' and 'high', will be a probability distribution and not a 'one-hot' value. Ideally, this 'soft' interpretation should be applied to (some) variables in the entire dataset, for each individual data item. Also ideally, the result would be a trained, non-parametric BN with discrete variables and the given structure, as said above.
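
To make the 'soft' discretization described above concrete, here is a minimal Python sketch: the probability of each state is the mass that a normal distribution centered on the observed value places between consecutive cut points. The cut points (0.33, 0.66) and the standard deviation (0.14) are assumptions picked only for illustration, as are the function names; nothing here is GeNIe functionality.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of Normal(mean, sd) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def soften(x, cuts, sd):
    """Probability mass of Normal(x, sd) in each interval defined by the cut points."""
    edges = [-math.inf, *cuts, math.inf]
    cdf = [normal_cdf(edge, x, sd) for edge in edges]
    # Differences of consecutive CDF values give the per-state probabilities.
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

CUTS = (0.33, 0.66)  # assumed cut points between low/medium and medium/high
SD = 0.14            # assumed constant standard deviation for the whole dataset

for score in (0.99, 0.66):
    probs = soften(score, CUTS, SD)
    print(score, [round(p, 2) for p in probs])
```

With these assumed values, the rounded probabilities come out close to the figures in the example at the top of the thread. A smaller standard deviation moves the result toward the one-hot case, a larger one spreads more mass onto neighboring states.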

For completeness, I see quite some similarity with what is done in the discretization of equation nodes in GeNIe, but I must confess I have no idea whether this can be put to work for this purpose.

Anyway, thank you very much for your time. Any contributions will definitely be welcome.

- Marco
Joined: Tue Dec 11, 2007 4:24 pm

Re: Ordinal Data > Soft Observations

Post by marek [BayesFusion] »

Dear Marco,

I doubt the automatic discretization of continuous variables will be of much help to you. What it does is turn a continuous variable into a discrete variable (producing a CPT for it based on the definitions of the parents and the node in question). Since you don't have a parametric definition of the interaction, you are unlikely to be able to use this.

It seems (please correct me if I am wrong) that you want to modify the data that you ultimately want to learn the model parameters from. Instead of a value of the variable for a given individual, you would like to have a value drawn from a distribution with that value being the mean. Have you considered generating a new data file that embeds this process? You could sample from a distribution centered on the value for individual i in the data set and place the sampled value in the data. You will not be able to do this in GeNIe, but writing a simple program would accomplish this job. I hope I have not misunderstood you completely :-)?
Cheers,

Marek
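
The "simple program" suggested in the post above could look roughly like the following Python sketch. It is only one possible reading of the suggestion: each record is replaced by several noisy copies drawn from a normal distribution centered on the observed value. The column names, the standard deviation, the number of copies, and the function name `resample_rows` are all hypothetical.

```python
import csv
import io
import random

def resample_rows(rows, sd, copies, seed=0):
    """For each record, emit `copies` records with values drawn from Normal(value, sd)."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    out = []
    for row in rows:
        for _ in range(copies):
            out.append({name: rng.gauss(value, sd) for name, value in row.items()})
    return out

# Hypothetical two-variable dataset with scores on a 0..1 scale.
data = [{"X": 0.99, "Y": 0.40}, {"X": 0.66, "Y": 0.70}]
noisy = resample_rows(data, sd=0.14, copies=5)

# Written to an in-memory buffer here; use open("resampled.csv", "w", newline="")
# instead to produce an actual file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["X", "Y"])
writer.writeheader()
writer.writerows(noisy)
print(buffer.getvalue())
```

The sampled values would still need to be discretized with fixed cut points before parameter learning, and samples falling outside the original score range may need clipping. Whether this matches the intended softening is a judgment call for the original poster.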