Learning network structure
The following classes can be used to learn a Network from a DataSet:
•BayesianSearch: performs structure learning using hill climbing with random restarts, guided by a scoring heuristic.
•NaiveBayes: builds the standard Naive Bayes structure, in which the class variable is the sole parent of every feature variable.
•TAN: Tree-Augmented Naive Bayes, a semi-naive method based on the Bayesian Search approach.
•ABN: Augmented Naive Bayes, another semi-naive method derived from Bayesian Search.
In the simplest scenario, create an instance of the desired learning algorithm class and call its learn method to perform network learning.
Python
ds = pysmile.learning.DataSet()
ds.read_file("mydatafile.txt")
baySearch = pysmile.learning.BayesianSearch()
net = baySearch.learn(ds)
Java
DataSet ds = new DataSet();
ds.readFile("mydatafile.txt");
BayesianSearch baySearch = new BayesianSearch();
Network net = baySearch.learn(ds);
C#
DataSet ds = new DataSet();
ds.ReadFile("mydatafile.txt");
BayesianSearch baySearch = new BayesianSearch();
Network net = baySearch.Learn(ds);
R
ds <- DataSet()
ds$readFile("mydatafile.txt")
baySearch <- BayesianSearch()
net <- baySearch$learn(ds)
Note that every variable in the dataset participates in the learning process. After learning the structure, each of the algorithms listed above automatically performs parameter learning using the Expectation-Maximization (EM) method, so the resulting network includes nodes with parameters estimated from the data.
The previous example used the default settings for Bayesian Search. To adjust the learning process, modify the control parameters of the learning object before calling its learn method. The example below demonstrates how to set the number of iterations and the maximum number of parents:
Python
baySearch = pysmile.learning.BayesianSearch()
baySearch.set_iteration_count(10)
baySearch.set_max_parents(4)
net = baySearch.learn(ds)
Java
BayesianSearch baySearch = new BayesianSearch();
baySearch.setIterationCount(10);
baySearch.setMaxParents(4);
Network net = baySearch.learn(ds);
C#
BayesianSearch baySearch = new BayesianSearch();
baySearch.IterationCount = 10;
baySearch.MaxParents = 4;
Network net = baySearch.Learn(ds);
R
baySearch <- BayesianSearch()
baySearch$setIterationCount(10)
baySearch$setMaxParents(4)
net <- baySearch$learn(ds)
For NaiveBayes and its semi-naive derivatives (TAN and ABN), a class variable identifier (string) must be specified by calling set_class_variable_id before invoking learn. This identifier must match one of the columns in the DataSet object passed to learn.
SMILE also provides the PC class, which implements the PC structure learning algorithm (its name is derived from the initials of its inventors). This algorithm uses a DataSet as its data source but learns a Pattern object instead of a Network. The Pattern represents a graph that may contain both directed and undirected edges and is not guaranteed to be acyclic.
Both the BayesianSearch and PC algorithms can incorporate background knowledge supplied by the caller. Background knowledge influences the learned structure by:
•Forcing arcs between specified variables.
•Forbidding arcs between specified variables.
•Ordering specified variables into temporal tiers, ensuring that no arcs are created from nodes in higher tiers to nodes in lower tiers.
To provide background knowledge, use the BayesianSearch.set_bk_knowledge or PC.set_bk_knowledge methods.
Tutorial 10 contains an example program demonstrating structure learning with Bayesian Search, Tree-Augmented Naive Bayes (TAN), and PC.