Dear staff,
I have some questions about missing values; please see below:
(1) If I open the data file and replace all missing values with a specified value, will the replaced values affect structure or parameter learning? For example, node A has three values: "L", "M", and "H". If I replace the missing values in node A with "-99", will this value take part in the computation? Or is it just a label (similar to a null in SPSS) that is ignored, which is what I want?
(2) If I follow the step above and learn the structure with the PC algorithm, the missing value replacement (i.e., "-99") is not listed as a state of node A in the node properties. However, if I use Greedy Thick Thinning to learn the structure, the replacement is listed as a state of node A, shown as "S_99" (see below). Could you please explain the reason?
Many thanks.
- Attachments
- Screenshot 2023-10-08 223424.jpg (7.61 KiB)
- Site Admin
- Posts: 1419
- Joined: Mon Nov 26, 2007 5:51 pm
Re: missing value
The structure learning algorithms in SMILE/GeNIe currently require that a dataset has no missing values. From the POV of the learning algorithm the missing value replacement is no different from any state label.
The PC algorithm should output the nodes with outcomes like S_99, just like the attached image.
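To illustrate the point that a replacement value is treated like any other state label, here is a minimal Python sketch. It does not call SMILE or GeNIe; the column contents and the `fill_missing` helper are made up for illustration. It only shows that, after replacement, the placeholder becomes one more distinct value in the column, which is what a learning algorithm then sees.

```python
from collections import Counter

# Hypothetical observations for node A; None marks a missing entry.
node_a = ["L", "M", None, "H", "M", None, "L"]

def fill_missing(values, placeholder):
    """Replace missing entries (None) with a placeholder label."""
    return [placeholder if v is None else v for v in values]

filled = fill_missing(node_a, "-99")

# Count the distinct values the way a discrete learner would see them.
state_counts = Counter(filled)
states = sorted(state_counts)

print(states)        # the placeholder appears next to L, M and H
print(state_counts)  # and it has its own frequency
```

In this sketch node A ends up with four distinct values rather than three, so the placeholder takes part in any counting-based computation.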
Re: missing value
shooltz[BayesFusion] wrote: ↑Mon Oct 09, 2023 10:46 pm
The structure learning algorithms in SMILE/GeNIe currently require that a dataset has no missing values. From the POV of the learning algorithm the missing value replacement is no different from any state label.
The PC algorithm should output the nodes with outcomes like S_99, just like the attached image.

Thanks, shooltz. So structure learning doesn't allow missing data, but parameter learning does. Does that mean structure learning and parameter learning are based on different data sets? Otherwise, we must use the complete data for both structure learning and parameter learning.
Cheers,
Yan
Re: missing value
I tried the PC algorithm again, and the output doesn't show outcomes/states like S_99 (missing data replaced with 99). But if I use other structure learning algorithms (e.g., Bayesian Search), the bar chart view shows S_99, which is a bit weird. Could you please check this with some random data? Thanks a lot!
- Site Admin
Re: missing value
It seems the issue is with the PC learning algorithm when the missing value replacement is numeric. I ran PC on a data file with missing values replaced with the label "x99", and the output did contain the x99 outcome. When the missing value was replaced with a number (like 99), the outcome representing that value was not present.
We're searching for the root cause. In the meantime, please replace missing values with non-numeric labels.
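For anyone who wants to apply this workaround outside GeNIe, the following is a rough Python sketch that rewrites a CSV file so that missing cells carry a non-numeric label. The label "x99" comes from the test described above; the `label_missing` helper, the sample data, and the set of tokens treated as missing (empty cell, "*", "NA") are assumptions to adjust to your own file.

```python
import csv
import io

MISSING_LABEL = "x99"  # non-numeric label, as used in the test above

def label_missing(src, dst, missing_tokens=("", "*", "NA")):
    """Copy CSV rows from src to dst, replacing missing cells with MISSING_LABEL.

    missing_tokens is an assumption about how missing cells are written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        writer.writerow(
            [MISSING_LABEL if cell.strip() in missing_tokens else cell
             for cell in row]
        )

# Small in-memory example; with real data, pass open file objects instead.
raw = io.StringIO("A,B\nL,yes\n,no\nH,\n")
out = io.StringIO()
label_missing(raw, out)
cleaned = out.getvalue()
print(cleaned)
```

The cleaned file can then be opened for structure learning as usual, keeping in mind that "x99" will be learned as an ordinary state of each affected node.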
- Site Admin
Re: missing value
GeNIe 4.1.4109 fixes this issue - the numeric replacement for missing values is converted to a valid SMILE id, and is visible as one of the node outcomes.