Learning network structure

The following classes can be used to learn a DSL_network from a DSL_dataset:

DSL_bs: Bayesian Search, a hill-climbing procedure with random restarts, guided by a scoring heuristic

DSL_nb: Naive Bayes

DSL_tan: Tree Augmented Naive Bayes, a semi-naive method based on the Bayesian Search approach

DSL_abn: Augmented Naive Bayes, another semi-naive method based on the Bayesian Search approach

In the simplest scenario, just declare the object representing a learning algorithm and call its Learn method:

DSL_dataset ds;

ds.ReadFile("myfile.txt");

DSL_network net;

DSL_bs bayesianSearch;

int res = bayesianSearch.Learn(ds, net);

If learning succeeds, Learn returns DSL_OKAY and the DSL_network passed as an argument holds the learned network. Note that every variable in the data set takes part in the learning process. If your data comes from a text file and you want to exclude some variables, remove them with DSL_dataset::RemoveVar before calling Learn.

After learning the structure, each of the algorithms listed above learns the network parameters by counting cases. Counting cases is sufficient here, and the more sophisticated EM parameter learning algorithm is not needed, because the learning data set does not contain missing data entries.
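To make the idea of counting cases concrete, the sketch below estimates the conditional probability table of one child variable with a single parent from complete data. It is only an illustration and does not use SMILE: the Record type and the countCases function are hypothetical names, and the uniform fallback for parent states that never occur is an assumption of this sketch, not a statement about how SMILE handles that situation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record type (not part of SMILE): the state indices of the
// parent and the child observed in one complete data row.
struct Record {
    int parent;
    int child;
};

// Hypothetical helper (not part of SMILE): returns a table in which
// cpt[p][c] estimates P(child = c | parent = p).
// Assumption of this sketch: a parent state that never occurs in the data
// gets a uniform distribution.
std::vector<std::vector<double>> countCases(
    const std::vector<Record>& data, int parentStates, int childStates)
{
    assert(parentStates > 0 && childStates > 0);
    std::vector<std::vector<double>> cpt(
        parentStates, std::vector<double>(childStates, 0.0));

    // Step 1: count how often each (parent, child) combination occurs.
    for (const Record& r : data) {
        assert(r.parent >= 0 && r.parent < parentStates);
        assert(r.child >= 0 && r.child < childStates);
        cpt[r.parent][r.child] += 1.0;
    }

    // Step 2: normalize each row so that it sums to 1.
    for (std::vector<double>& row : cpt) {
        double total = 0.0;
        for (double count : row) {
            total += count;
        }
        for (double& value : row) {
            value = (total > 0.0) ? value / total : 1.0 / childStates;
        }
    }
    return cpt;
}
```

Because every row of the data is complete, a single pass over the records is enough. With missing entries the counts would be undefined for some rows, which is the situation an iterative algorithm such as EM is meant to handle.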

The code example above used the default settings for Bayesian Search. To tweak the learning process, you can set some public data members in the learning object before calling its Learn method. The example below sets the number of iterations and maximum number of parents:

DSL_bs bayesianSearch;

bayesianSearch.nrIteration = 10;

bayesianSearch.maxParents = 4;

int res = bayesianSearch.Learn(ds, net);

All settings for the learning algorithms are described in detail in the Reference section.

SMILE also contains the DSL_pc class, which implements the PC structure learning algorithm (the name is derived from the first names of its inventors, Peter and Clark). This algorithm also uses a DSL_dataset as its data source, but instead of a DSL_network it learns a DSL_pattern object. A pattern is a graph that can contain both directed and undirected edges and is not guaranteed to be acyclic.
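The difference between a pattern and a network can be illustrated with a small stand-alone sketch. The Pattern class below is hypothetical and does not mirror the interface of DSL_pattern; it only shows a graph that stores both kinds of edges and checks the two conditions that separate it from a directed acyclic graph: the presence of undirected edges and the presence of a directed cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Edge { None, Directed, Undirected };

// Hypothetical stand-in for a learned pattern (not the DSL_pattern API).
class Pattern {
public:
    explicit Pattern(int n)
        : n_(n), edges_(static_cast<std::size_t>(n) * n, Edge::None)
    {
        assert(n > 0);
    }

    void addDirected(int from, int to) { at(from, to) = Edge::Directed; }

    // An undirected edge is stored symmetrically.
    void addUndirected(int a, int b)
    {
        at(a, b) = Edge::Undirected;
        at(b, a) = Edge::Undirected;
    }

    Edge get(int from, int to) const
    {
        return edges_[static_cast<std::size_t>(from) * n_ + to];
    }

    bool hasUndirectedEdges() const
    {
        for (Edge e : edges_) {
            if (e == Edge::Undirected) return true;
        }
        return false;
    }

    // Depth-first search over directed edges only.
    bool hasDirectedCycle() const
    {
        std::vector<int> color(n_, 0);  // 0 = unvisited, 1 = on stack, 2 = done
        for (int v = 0; v < n_; ++v) {
            if (color[v] == 0 && visit(v, color)) return true;
        }
        return false;
    }

    // A pattern is a DAG only if all edges are directed and there is no cycle.
    bool isDag() const { return !hasUndirectedEdges() && !hasDirectedCycle(); }

private:
    Edge& at(int from, int to)
    {
        assert(from >= 0 && from < n_ && to >= 0 && to < n_);
        return edges_[static_cast<std::size_t>(from) * n_ + to];
    }

    bool visit(int v, std::vector<int>& color) const
    {
        color[v] = 1;
        for (int w = 0; w < n_; ++w) {
            if (get(v, w) != Edge::Directed) continue;
            if (color[w] == 1) return true;  // back edge closes a cycle
            if (color[w] == 0 && visit(w, color)) return true;
        }
        color[v] = 2;
        return false;
    }

    int n_;
    std::vector<Edge> edges_;
};
```

A graph for which isDag returns false cannot be used directly as the structure of a Bayesian network: its undirected edges would first have to be oriented, and any directed cycle would have to be broken.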

DSL_bs and DSL_pc objects contain a public data member of the DSL_bkgndKnowledge type, which can be used to pass background knowledge to the learning algorithm. The background knowledge influences the learned structure by:

forcing arcs between specified variables

forbidding arcs between specified variables

ordering specified groups of variables by temporal tiers: in the resulting structure, there will be no arcs from nodes in higher tiers to nodes in lower tiers

The example below forces an arc from X to Y and forbids an arc from Z to Y. It is assumed that the data set contains variables with the identifiers used in the calls to DSL_dataset::FindVariable.

DSL_network net;

DSL_bs baySearch;

int varX = ds.FindVariable("X");

int varY = ds.FindVariable("Y");

int varZ = ds.FindVariable("Z");

baySearch.bkk.forcedArcs.push_back(std::make_pair(varX, varY));

baySearch.bkk.forbiddenArcs.push_back(std::make_pair(varZ, varY));

int res = baySearch.Learn(ds, net);
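How the three kinds of background knowledge restrict a structure search can be sketched independently of SMILE. In the code below, the Constraints struct and the arcAllowed and arcRemovable functions are hypothetical names chosen for illustration; they are not members of DSL_bkgndKnowledge, and the sketch makes no claim about how SMILE applies the constraints internally. Variables without an assigned tier are treated as unconstrained, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical model of background knowledge (not the DSL_bkgndKnowledge API).
struct Constraints {
    std::set<std::pair<int, int>> forcedArcs;     // (parent, child) arcs that must be present
    std::set<std::pair<int, int>> forbiddenArcs;  // (parent, child) arcs that must be absent
    std::map<int, int> tiers;                     // variable index -> temporal tier
};

// Returns true if a search step may add the arc from -> to.
bool arcAllowed(const Constraints& c, int from, int to)
{
    if (from == to) {
        return false;  // no self-loops
    }
    if (c.forbiddenArcs.count(std::make_pair(from, to)) > 0) {
        return false;  // explicitly forbidden
    }
    // Temporal tiers: no arc may point from a higher tier to a lower tier.
    std::map<int, int>::const_iterator tierFrom = c.tiers.find(from);
    std::map<int, int>::const_iterator tierTo = c.tiers.find(to);
    if (tierFrom != c.tiers.end() && tierTo != c.tiers.end() &&
        tierFrom->second > tierTo->second) {
        return false;
    }
    return true;
}

// Returns true if a search step may remove the arc from -> to.
// Forced arcs must stay in the structure.
bool arcRemovable(const Constraints& c, int from, int to)
{
    return c.forcedArcs.count(std::make_pair(from, to)) == 0;
}
```

With the arc from 0 to 1 forced and the arc from 2 to 1 forbidden, arcRemovable(c, 0, 1) and arcAllowed(c, 2, 1) both return false, while the reverse arc from 1 to 2 remains available to the search.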

Tutorial 9 contains a program that performs structure learning using Bayesian Search, Tree Augmented Naive Bayes and PC.