Validation


To evaluate the predictive quality of your network you can use the Validator class.

The Validator constructor requires references to DataSet and Network objects. To properly match the network and the data, the constructor also requires an array of DataMatch objects (just like the EM.learn method).

After the validator object is constructed, you need to specify which nodes in the network are considered class nodes by calling the Validator.addClassNode method. Validation requires at least one class node.

During validation, for each record in the dataset, the variables matched to non-class nodes are used to set evidence. The posterior probabilities are then calculated, and for each class node the outcome with the highest posterior probability is selected as the predicted outcome. The prediction is compared with the outcome recorded in the dataset variable associated with the class node. The number of matches and the calculated posteriors are used to obtain the accuracy, confusion matrix, ROC curves and calibration curves.
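
Conceptually, the prediction step performed for each record can be sketched in Java with the standard inference API. This is only an illustration of what the validator does internally; someEvidenceNodeId, observedOutcomeId and someClassNodeId are placeholder identifiers, not part of the Validator API.

// Illustration only: the per-record prediction step performed internally by the validator.
// Set evidence on the nodes matched to non-class variables (repeated for each such variable).
net.setEvidence("someEvidenceNodeId", "observedOutcomeId");
net.updateBeliefs();  // calculate the posterior probabilities

// Select the outcome with the highest posterior as the predicted outcome.
double[] posteriors = net.getNodeValue("someClassNodeId");
int predicted = 0;
for (int i = 1; i < posteriors.length; i++) {
    if (posteriors[i] > posteriors[predicted]) {
        predicted = i;
    }
}
// 'predicted' is then compared with the outcome stored in the record for that variable.
net.clearAllEvidence();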

Validation can be performed either without parameter learning, using the Validator.test method, or with parameter learning, using the Validator.kFold and Validator.leaveOneOut methods. K-fold cross-validation divides the dataset into K parts of approximately equal size, trains the network on K-1 parts, and tests it on the remaining part. The process is repeated K times, with a different part selected for testing each time. Leave-one-out is an extreme case of K-fold cross-validation, in which K is equal to the number of records in the dataset.
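
If the network parameters have already been learned, the network can be validated as-is with Validator.test; the cross-validation methods take an EM object and re-learn the parameters on the training part of each split. A minimal Java sketch, assuming the no-argument test() and leaveOneOut(em) overloads and a validator set up as in the examples below:

// Validate the network with its current parameters (no learning):
validator.test();
double accFixed = validator.getAccuracy(classNodeHandle, 0);

// Or re-learn the parameters with EM in each leave-one-out iteration:
EM em = new EM();
validator.leaveOneOut(em);
double accLoo = validator.getAccuracy(classNodeHandle, 0);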

The examples below perform K-fold cross-validation with 5 folds using one class node. The accuracy is obtained for the outcome with index zero (that is, the first outcome of the node).

Java:

DataSet ds = new DataSet();
Network net = new Network();

// load network and data here

DataMatch[] matching = ds.matchNetwork(net);
Validator validator = new Validator(ds, net, matching);

int classNodeHandle = net.getNode("someNodeId");
validator.addClassNode(classNodeHandle);

EM em = new EM();
// optionally tweak EM options here
validator.kFold(em, 5);
double acc = validator.getAccuracy(classNodeHandle, 0);

Python:

ds = pysmile.learning.DataSet()
net = pysmile.Network()

# load network and data here

matching = ds.match_network(net)
validator = pysmile.learning.Validator(ds, net, matching)

class_node_handle = net.get_node("someNodeId")
validator.add_class_node(class_node_handle)

em = pysmile.learning.EM()
# optionally tweak EM options here
validator.k_fold(em, 5)
acc = validator.get_accuracy(class_node_handle, 0)

R:

ds <- DataSet()
net <- Network()

# load network and data here

matching <- ds$matchNetwork(net)
validator <- Validator(ds, net, matching)

classNodeHandle <- net$getNode("someNodeId")
validator$addClassNode(classNodeHandle)

em <- EM()
# optionally tweak EM options here
validator$kFold(em, 5)
acc <- validator$getAccuracy(classNodeHandle, 0)

C#:

DataSet ds = new DataSet();
Network net = new Network();

// load network and data here

DataMatch[] matching = ds.MatchNetwork(net);
Validator validator = new Validator(ds, net, matching);

int classNodeHandle = net.GetNode("someNodeId");
validator.AddClassNode(classNodeHandle);

EM em = new EM();
// optionally tweak EM options here
validator.KFold(em, 5);
double acc = validator.GetAccuracy(classNodeHandle, 0);
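
Besides the overall accuracy, the validator also accumulates the data for the confusion matrix mentioned earlier. A minimal Java sketch, assuming an accessor named getConfusionMatrix(classNodeHandle) that returns the matrix as an array of rows; the accessor name and the row/column layout are assumptions, so check the reference for your wrapper:

// Assumed accessor: rows correspond to actual outcomes, columns to predicted outcomes.
int[][] confusion = validator.getConfusionMatrix(classNodeHandle);
for (int actual = 0; actual < confusion.length; actual++) {
    for (int predicted = 0; predicted < confusion[actual].length; predicted++) {
        System.out.print(confusion[actual][predicted] + " ");
    }
    System.out.println();
}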