Handling missing data in learning parameters

tstephens3956
Posts: 8
Joined: Tue Feb 19, 2013 9:43 pm


Post by tstephens3956 »

I have a strategic question about the best way to handle missing data in my situation. I have a diagnostic model for a process that is normally conducted in stages (as most probably are), usually testing the least expensive measures for targets first and progressing to the more expensive ones. Generally this leads to a situation where a test fails, which leads to the overall solution, and the higher-level nodes are never tested. Consequently, most examples do not have complete data for all nodes. How, then, is it best to handle these missing data?
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Handling missing data in learning parameters

Post by marek [BayesFusion] »

This is very domain dependent. The EM algorithm, which is part of SMILE, will learn the parameters from a data set that contains missing data. I suspect, however, that you are aware of that.
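To make the idea concrete, here is a minimal pure-Python sketch of EM parameter learning with missing values. It does not use SMILE or its API. The three-node network (Fault -> CheapTest, Fault -> ExpensiveTest), the records, and all function names are hypothetical, chosen only for illustration. The sketch also assumes that the missing values can be treated as ignorable, which may or may not hold in your domain.

```python
import math
from itertools import product

# Hypothetical toy data: (fault, cheap_test, expensive_test), all binary 0/1.
# None marks a value that was never measured.
RECORDS = [
    (1, 1, None), (1, 1, None), (0, 0, 0), (0, 0, 1),
    (1, 0, 1), (0, 0, None), (1, 1, 1), (0, 1, 0),
    (None, 0, 0), (0, 0, None),
]

def joint(params, f, t1, t2):
    """P(f, t1, t2) = P(f) * P(t1 | f) * P(t2 | f)."""
    p_f, p_t1, p_t2 = params
    pf = p_f if f == 1 else 1 - p_f
    pt1 = p_t1[f] if t1 == 1 else 1 - p_t1[f]
    pt2 = p_t2[f] if t2 == 1 else 1 - p_t2[f]
    return pf * pt1 * pt2

def completions(record):
    """All full assignments consistent with the observed values."""
    return list(product(*[(0, 1) if v is None else (v,) for v in record]))

def log_likelihood(params, records):
    """Log-likelihood of the observed values, summing out the missing ones."""
    return sum(
        math.log(sum(joint(params, *c) for c in completions(r)))
        for r in records
    )

def em_step(params, records):
    # E-step: spread each record over its completions, weighted by the
    # posterior probability of each completion under the current parameters.
    n_f = [0.0, 0.0]
    n_t1 = [[0.0, 0.0], [0.0, 0.0]]  # n_t1[f][t1]
    n_t2 = [[0.0, 0.0], [0.0, 0.0]]  # n_t2[f][t2]
    for r in records:
        comps = completions(r)
        weights = [joint(params, *c) for c in comps]
        total = sum(weights)
        for (f, t1, t2), w in zip(comps, weights):
            n_f[f] += w / total
            n_t1[f][t1] += w / total
            n_t2[f][t2] += w / total
    # M-step: re-estimate the parameters from the expected counts.
    p_f = n_f[1] / (n_f[0] + n_f[1])
    p_t1 = [n_t1[f][1] / (n_t1[f][0] + n_t1[f][1]) for f in (0, 1)]
    p_t2 = [n_t2[f][1] / (n_t2[f][0] + n_t2[f][1]) for f in (0, 1)]
    return (p_f, p_t1, p_t2)

def fit_em(records, params, iterations=25):
    """Run a fixed number of EM steps; return parameters and the log-likelihood trace."""
    history = [log_likelihood(params, records)]
    for _ in range(iterations):
        params = em_step(params, records)
        history.append(log_likelihood(params, records))
    return params, history

# Starting point: P(fault=1), P(t1=1 | fault=0/1), P(t2=1 | fault=0/1).
params, history = fit_em(RECORDS, (0.5, [0.4, 0.6], [0.4, 0.6]))
```

Records with missing entries still contribute to the counts, in proportion to how probable each completion is under the current parameters, so nothing has to be discarded or filled in by hand. The log-likelihood trace in `history` should never decrease from one iteration to the next.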

There are many ways of handling missing data other than applying the EM algorithm. People put average values in the missing fields, replace them with a new value called "missing," or replace the missing value with the most probable value (EM does something very close to that). In an old paper that I co-authored (here is the link to it: http://www.pitt.edu/~druzdzel/psfiles/iis02.pdf), we found that in the case of missing data in medicine, the best thing may be to replace the missing value with the most "normal" value, which for the node "Temperature," for example, is the value "normal," as opposed to "fever." Medicine may be different, though, because symptoms are usually reported at the outset, and if they are not reported, they were probably not observed, i.e., observed to be absent. I would experiment with different ways of dealing with missing data, especially if you can test the accuracy of the resulting model, i.e., if you have data with the ultimate diagnosis.
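As a rough illustration of the simple replacement strategies mentioned above, here is a small Python sketch for a single discrete column. The function names and the example column are hypothetical, and "most probable value" is simplified here to the most frequent observed value in the column.

```python
from collections import Counter

def impute_most_frequent(column):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    most_frequent = Counter(observed).most_common(1)[0][0]
    return [most_frequent if v is None else v for v in column]

def impute_missing_state(column, label="missing"):
    """Treat 'not measured' as an extra state of the variable."""
    return [label if v is None else v for v in column]

def impute_normal(column, normal_value):
    """Replace missing entries with the domain's 'normal' value."""
    return [normal_value if v is None else v for v in column]

# Hypothetical column for a "Temperature" node; None means not recorded.
temperature = ["fever", None, "normal", "fever", None, "fever"]

by_frequency = impute_most_frequent(temperature)
by_extra_state = impute_missing_state(temperature)
by_normal = impute_normal(temperature, "normal")
```

In this example the most frequent observed value is "fever," so the first strategy fills the gaps with "fever," while the "normal"-value strategy fills them with "normal." Learning parameters from each variant and comparing accuracy on cases with a known diagnosis is one way to run the experiment.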

I hope this helps,

Marek