Learning network parameters


To learn parameters in an existing Network object, use the Expectation-Maximization (EM) algorithm implemented in the EM class. As with structure learning, the data is provided in a DataSet object. The network and dataset must be aligned so that the learning algorithm can determine the correspondence between dataset variables and network nodes.

If variable names and node identifiers are identical, the DataSet.match_network method can be used. This method creates an array of DataMatch objects that define the correspondence between each variable and its associated network node. DataMatch has node and column fields, representing the node handle and the variable index, respectively; the array contains one element for each node/variable pair. When automatic matching is not possible, the DataMatch array must be created manually (a hypothetical sketch follows the language examples below). In either case, the array is passed to EM.learn to perform parameter learning.

Python

ds = pysmile.learning.DataSet()

net = pysmile.Network()

# load network and data here

matching = ds.match_network(net)

em = pysmile.learning.EM()

em.learn(ds, net, matching)

Java

DataSet ds = new DataSet();

Network net = new Network();

// load network and data here

DataMatch[] matching = ds.matchNetwork(net);

EM em = new EM();

em.learn(ds, net, matching);

C#

DataSet ds = new DataSet();

Network net = new Network();

// load network and data here

var matching = ds.MatchNetwork(net);

EM em = new EM();

em.Learn(ds, net, matching);

R

ds <- DataSet()

net <- Network()

# load network and data here

matching <- ds$matchNetwork(net)

em <- EM()

em$learn(ds, net, matching)
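When automatic matching is not possible, the matching array can be assembled by hand. The Python sketch below, continuing from the Python example above, is illustrative only: it assumes that DataMatch objects can be default-constructed and that their fields are writable, and the node identifier is hypothetical; consult the pysmile API reference for the exact construction syntax.

# Hypothetical manual matching: map dataset column 0 to node "Success".
# Assumes DataMatch is default-constructible with writable fields;
# verify against your pysmile version.
m = pysmile.learning.DataMatch()
m.column = 0                        # index of the variable in the DataSet
m.node = net.get_node("Success")    # handle of the matching network node
matching = [m]
em.learn(ds, net, matching)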

 

During EM parameter learning, nodes can be marked as fixed, meaning their conditional probability distributions remain unchanged throughout the learning process. Nodes not designated as fixed are subject to parameter updates, even if they are not matched to any variable in the dataset. To specify fixed nodes, pass an array of node handles or identifiers to the EM.learn method.
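For example, the following Python call (using hypothetical node identifiers) keeps the conditional probability tables of two nodes unchanged while the remaining parameters are learned:

# Keep the CPTs of these (hypothetical) nodes unchanged during EM;
# node handles or identifiers are both accepted as fixed_nodes.
em.learn(ds, net, matching, ["Age", "Gender"])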

The EM algorithm can start parameter learning from different initial conditions. The choice of initialization method affects convergence and, in some cases, the quality of the learned parameters.

1. Uniformize parameters: when this option is enabled, the algorithm initializes all network parameters with a uniform distribution. This approach effectively discards the existing parameters in the network. Use this option when you want to start learning from a neutral, non-informative prior. Call EM.set_uniformize_parameters(true) to enable uniform initialization.

2. Randomize parameters (default): random initialization assigns random values to all parameters. Although uniform initialization is common, it does not always yield better final parameter quality. The EM algorithm uses the initial parameter values only as a starting point in its search for the parameter set that maximizes the likelihood of the data given the model. Random initialization may be particularly useful when learning parameters in networks with latent variables, as it can help the algorithm avoid local maxima near uniform distributions. Call EM.set_randomize_parameters(true) to enable random initialization. The random seed can be specified with EM.set_seed. If no seed is specified, or if zero is passed, the random generator is seeded with the system clock.

3. Keep original parameters: this option keeps the existing parameter values in the network and updates them based on the new data. It should be used when the new dataset is treated as an additional source of information rather than a complete replacement. In this case, the equivalent sample size (ESS) parameter becomes relevant. ESS represents the notional number of records that the current network parameters are based on. For networks whose parameters were previously learned from data, ESS should typically match the number of records in that dataset. The equivalent sample size defines the relative weight of the existing parameters compared to the new data. During EM learning, the algorithm combines prior information represented by the current parameters (weighted by ESS) with the likelihood derived from the dataset. A higher ESS makes the learned parameters change less in response to the new data, while a lower ESS allows the new data to have a stronger influence. Call EM.set_uniformize_parameters(false) and EM.set_randomize_parameters(false) to keep the original parameters. Set the equivalent sample size with EM.set_eq_sample_size. The sketch after this list shows all three configurations.
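As a concrete illustration, the Python sketch below configures each of the three initialization modes before calling learn. The seed and equivalent sample size values are illustrative only; in practice you would keep exactly one of the three option blocks.

em = pysmile.learning.EM()

# Option 1: start from uniform distributions, discarding current parameters
em.set_uniformize_parameters(True)

# Option 2 (default): random starting point; fix the seed for reproducibility
em.set_uniformize_parameters(False)
em.set_randomize_parameters(True)
em.set_seed(1)                # illustrative; zero or unset seeds from the system clock

# Option 3: keep current parameters, weighted against the data by ESS
em.set_uniformize_parameters(False)
em.set_randomize_parameters(False)
em.set_eq_sample_size(100)    # illustrative: prior parameters based on ~100 records

em.learn(ds, net, matching)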

The following methods of the EM class control parameter learning behavior, initialization options, and equivalent sample size.

Python

learn(data: DataSet, net: Network, matching: List[DataMatch], fixed_nodes: List[int] | List[str] | None = None)

set_uniformize_parameters(value: bool)

get_uniformize_parameters() -> bool

set_randomize_parameters(value: bool)

get_randomize_parameters() -> bool

set_seed(seed: int)

get_seed() -> int

set_eq_sample_size(size: int)

get_eq_sample_size() -> int

Java

void learn(DataSet data, Network net, DataMatch[] matching, int[] fixedNodes);

void learn(DataSet data, Network net, DataMatch[] matching, String[] fixedNodes);

void learn(DataSet data, Network net, DataMatch[] matching);

void setUniformizeParameters(boolean value);

boolean getUniformizeParameters();

void setRandomizeParameters(boolean value);

boolean getRandomizeParameters();

void setSeed(int seed);

int getSeed();

void setEqSampleSize(int size);

int getEqSampleSize();

C#

void Learn(DataSet data, Network net, List<DataMatch> matching, int[] fixedNodes);

void Learn(DataSet data, Network net, List<DataMatch> matching, string[] fixedNodes);

void Learn(DataSet data, Network net, List<DataMatch> matching);

bool UniformizeParameters { get; set; }

bool RandomizeParameters { get; set; }

int Seed { get; set; }

int EqSampleSize { get; set; }

R

learn(data, net, matching, fixedNodes = NULL)

setUniformizeParameters(value)

uniformParams <- getUniformizeParameters()

setRandomizeParameters(value)

randParams <- getRandomizeParameters()

setSeed(seed)

seed <- getSeed()

setEqSampleSize(size)

size <- getEqSampleSize()

 

The final section of Tutorial 10 demonstrates EM parameter learning applied to a network structure created by the PC algorithm.