Learning network structure


The following classes can be used to learn a Network from a DataSet:

BayesianSearch: Bayesian Search, a hill-climbing procedure guided by a scoring heuristic, with random restarts

NaiveBayes: Naive Bayes

TAN: Tree Augmented Naive Bayes, a semi-naive method based on the Bayesian Search approach

ABN: Augmented Naive Bayes, another semi-naive method based on the Bayesian Search approach

In the simplest scenario, create an object representing the learning algorithm and call its learn method:

Java:

DataSet ds = new DataSet();
ds.readFile("mydatafile.txt");

BayesianSearch baySearch = new BayesianSearch();
Network net = baySearch.learn(ds);

Python:

ds = pysmile.learning.DataSet()
ds.read_file("mydatafile.txt")

baySearch = pysmile.learning.BayesianSearch()
net = baySearch.learn(ds)

R:

ds <- DataSet()
ds$readFile("mydatafile.txt")

baySearch <- BayesianSearch()
net <- baySearch$learn(ds)

C#:

DataSet ds = new DataSet();
ds.ReadFile("mydatafile.txt");

BayesianSearch baySearch = new BayesianSearch();
Network net = baySearch.Learn(ds);

Note that every variable in the dataset takes part in the learning process. After learning the structure, each of the algorithms listed above performs parameter learning with EM, so the output network has nodes with parameters based on the data.

The code example above used the default settings for Bayesian Search. To tune the learning process, change the control options on the learning object before calling its learn method. The example below sets the number of iterations and the maximum number of parents:

Java:

BayesianSearch baySearch = new BayesianSearch();
baySearch.setIterationCount(10);
baySearch.setMaxParents(4);
Network net = baySearch.learn(ds);

Python:

baySearch = pysmile.learning.BayesianSearch()
baySearch.set_iteration_count(10)
baySearch.set_max_parents(4)
net = baySearch.learn(ds)

R:

baySearch <- BayesianSearch()
baySearch$setIterationCount(10)
baySearch$setMaxParents(4)
net <- baySearch$learn(ds)

C#:

BayesianSearch baySearch = new BayesianSearch();
baySearch.IterationCount = 10;
baySearch.MaxParents = 4;
Network net = baySearch.Learn(ds);

For NaiveBayes and its semi-naive derivatives (TAN and ABN), you must specify the class variable identifier (a string) by calling setClassVariableId before invoking learn. This identifier must match one of the columns in the DataSet object passed to learn.
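The class variable requirement can be wrapped in a small helper that validates the identifier before learning. This is a minimal sketch, not part of SMILE: learn_classifier is a hypothetical helper, and the snake_case method names set_class_variable_id, get_variable_count, and get_variable_id are assumptions that mirror the Java-style names in the Python wrapper. The helper only relies on those three methods plus learn, so it works with any objects that provide them.

```python
def learn_classifier(learner, ds, class_id):
    """Set the class variable on a NaiveBayes/TAN/ABN-style learner, then learn.

    `learner` and `ds` are duck-typed; the method names used here are
    assumptions about the Python wrapper, not confirmed signatures.
    """
    # Collect the column identifiers present in the dataset.
    column_ids = [ds.get_variable_id(i) for i in range(ds.get_variable_count())]

    # Fail early with a readable message if the class column is missing.
    if class_id not in column_ids:
        raise ValueError(
            f"class variable {class_id!r} is not a column in the dataset: {column_ids}"
        )

    # The class variable has to be set before learn is invoked.
    learner.set_class_variable_id(class_id)
    return learner.learn(ds)
```

With the wrapper installed, a call might look like learn_classifier(pysmile.learning.TAN(), ds, "Outcome"), where "Outcome" is a placeholder column name.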

SMILE also contains the PC class, which implements the PC structure learning algorithm (the name is an acronym derived from its inventors' names). This algorithm uses a DataSet as its data source, but instead of a Network it learns a Pattern object: a graph with both directed and undirected edges that is not guaranteed to be acyclic.
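To illustrate what "directed and undirected edges, not guaranteed to be acyclic" means in practice, here is a minimal sketch of such a mixed graph in plain Python. The MixedGraph class and its methods are hypothetical and are not the SMILE Pattern API; the sketch only shows the kind of structure involved and how a caller could test for a directed cycle.

```python
from collections import defaultdict


class MixedGraph:
    """Hypothetical stand-in for a graph with directed and undirected edges."""

    def __init__(self):
        self.directed = set()    # pairs (tail, head)
        self.undirected = set()  # frozensets {a, b}

    def add_directed(self, tail, head):
        self.directed.add((tail, head))

    def add_undirected(self, a, b):
        self.undirected.add(frozenset((a, b)))

    def has_directed_cycle(self):
        """Depth-first search over directed edges only."""
        children = defaultdict(list)
        for tail, head in self.directed:
            children[tail].append(head)

        WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
        color = defaultdict(int)

        def visit(node):
            color[node] = GRAY
            for child in children[node]:
                if color[child] == GRAY:
                    return True  # back edge: the current path loops
                if color[child] == WHITE and visit(child):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in list(children))
```

Undirected edges are stored but ignored by the cycle check, since they carry no orientation.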

The BayesianSearch and PC algorithms can use background knowledge provided by the caller. The background knowledge influences the learned structure by:

forcing arcs between specified variables

forbidding arcs between specified variables

ordering specified variables by temporal tiers: in the resulting structure, there will be no arcs from nodes in higher tiers to nodes in lower tiers

To specify the background knowledge, use the BayesianSearch.setBkKnowledge or PC.setBkKnowledge method.
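The three kinds of constraints can be made concrete with a small checker written in plain Python. This is a sketch under stated assumptions, not the SMILE background knowledge API: the violations function is hypothetical, arcs are modeled as directed (parent, child) pairs, forced and forbidden arcs are treated as directed, and tiers are a mapping from variable identifier to tier number.

```python
def violations(arcs, forced=(), forbidden=(), tiers=None):
    """Return a list of messages describing constraints that `arcs` breaks.

    arcs, forced, forbidden: iterables of (parent, child) pairs.
    tiers: dict mapping a variable id to its temporal tier number.
    """
    arcs = set(arcs)
    tiers = tiers or {}
    problems = []

    # Every forced arc has to appear in the structure.
    for parent, child in forced:
        if (parent, child) not in arcs:
            problems.append(f"forced arc missing: {parent} -> {child}")

    # No forbidden arc may appear in the structure.
    for parent, child in forbidden:
        if (parent, child) in arcs:
            problems.append(f"forbidden arc present: {parent} -> {child}")

    # No arc may point from a higher tier to a lower tier.
    for parent, child in sorted(arcs):
        if parent in tiers and child in tiers and tiers[parent] > tiers[child]:
            problems.append(
                f"tier violation: {parent} (tier {tiers[parent]}) -> "
                f"{child} (tier {tiers[child]})"
            )
    return problems
```

Variables without an assigned tier are left unconstrained by the tier check in this sketch.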