I want to learn a network from a data set with no missing values and then test it on a data set that contains some missing values. I use readFile(path, missingValueToken) to read both the training set (which is complete) and the test set (which has some missing values marked with missingValueToken). Then I learn the network with GTT from the training set and test it with the test set. This gives me some results, but also messages like the following:
Invalid outcome index ? for node 'synfuels_corporation_cutback', valid indices are 0..1
"?" stands for a missing value (it has index -1), and this message is printed for every missing value in the test set. Is this normal behaviour, or does the message mean there is an error? Am I doing something wrong? What is the difference between reading data with readFile(path) and with readFile(path, missingValue)?
Missing values in test set
-
- Site Admin
- Posts: 1417
- Joined: Mon Nov 26, 2007 5:51 pm
Re: Missing values in test set
lizbona wrote:
Then I learn network by GTT with learning set and test it with testing set. This gives me some results but also following message like:
Invalid outcome index ? for node 'synfuels_corporation_cutback', valid indices are 0..1
"?" stands for missing value (it has -1 index) and this message is printed for every missing value in test set.

How exactly do you test? If you're iterating over values in the dataset and call Network.setEvidence, then you'll need to check for missing values manually before passing them to setEvidence.
lizbona wrote:
My testing is done as you mentioned above. Right now I call setEvidence and catch smileException, which is thrown when the index is -1. I guess index -1 goes with the missing value '?'.

Yes, -1 represents a missing value in the dataset by default. However, I'd suggest checking the value of the index before passing it to setEvidence instead of catching an exception.
Make sure you call clearAllEvidence before instantiating a record from the training set, or clearEvidence for each entry with a missing value.
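For illustration, here is a rough sketch of the "check the index first" approach. It is not the library's own code. The EvidenceTarget interface is a hypothetical stand-in that only mirrors the two Network calls mentioned in this thread (clearAllEvidence and setEvidence), so the example runs without the library. The record layout (one int outcome index per node, with -1 for missing) and the node name other_node are also assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingValueEvidence {
    // Index that marks a missing value, as described in this thread.
    static final int MISSING = -1;

    /** Hypothetical stand-in for the two Network calls named in the thread. */
    interface EvidenceTarget {
        void clearAllEvidence();
        void setEvidence(String nodeId, int outcomeIndex);
    }

    /**
     * Enters one record as evidence. Old evidence is cleared first and
     * missing entries are skipped instead of being passed to setEvidence.
     * Returns the number of values actually entered.
     */
    static int enterRecord(EvidenceTarget net, String[] nodeIds, int[] record) {
        net.clearAllEvidence();
        int entered = 0;
        for (int i = 0; i < record.length; i++) {
            if (record[i] == MISSING) {
                continue; // missing value: leave this node unobserved
            }
            net.setEvidence(nodeIds[i], record[i]);
            entered++;
        }
        return entered;
    }

    /** Recording stub so the sketch can run without the library. */
    static class RecordingTarget implements EvidenceTarget {
        final List<String> calls = new ArrayList<>();

        public void clearAllEvidence() {
            calls.add("clear");
        }

        public void setEvidence(String nodeId, int outcomeIndex) {
            if (outcomeIndex < 0) {
                throw new IllegalArgumentException("Invalid outcome index for node " + nodeId);
            }
            calls.add(nodeId + "=" + outcomeIndex);
        }
    }

    public static void main(String[] args) {
        RecordingTarget net = new RecordingTarget();
        String[] nodeIds = {"other_node", "synfuels_corporation_cutback"};
        int[] record = {1, MISSING}; // second entry is missing in the test set
        int entered = enterRecord(net, nodeIds, record);
        System.out.println(entered + " value(s) entered: " + net.calls);
    }
}
```

In real code the Network object would take the place of the stub, and the outcome indices would come from the dataset rather than a hard-coded array. Skipping the entry after clearAllEvidence means the node is simply left unobserved for that record, so no exception handling is needed.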