My example of simulating categorical model and parameter learning

Post by **BayesFusionUser123** » Thu Jun 12, 2025 8:02 pm

Dear Community,

I am following the PDF version of the tutorial for SMILE wrappers, specifically for the Python language.

I tried to simulate a network with categorical variables only.

The network has four categorical variables, A, B, C, and D, with the following dependencies:

* C -> A
* A -> D
* B -> D

1. In this sample code, I first simulate a dataset from a predefined network.
2. I then run structure learning on the simulated dataset and check what the learned network structure looks like.
3. I use the predefined network and the simulated dataset for parameter learning.
4. I double-checked that the learned parameters agree with the predefined parameters used to generate the simulated dataset.
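For readers without the attachment, step 1 can be sketched in plain Python as forward (ancestral) sampling: draw the root nodes C and B first, then A given C, then D given A and B. This is only an illustration and does not use pysmile. The two-state variables, the state names `s0`/`s1`, and all probability values below are assumptions made up for the sketch, not the values from the attached network.

```python
import random

# Hypothetical CPTs for illustration only (not the values from the attachment).
# Every variable is assumed to be binary with states "s0" and "s1".
P_C = {"s0": 0.6, "s1": 0.4}
P_B = {"s0": 0.7, "s1": 0.3}
P_A_GIVEN_C = {
    "s0": {"s0": 0.9, "s1": 0.1},
    "s1": {"s0": 0.2, "s1": 0.8},
}
P_D_GIVEN_AB = {
    ("s0", "s0"): {"s0": 0.95, "s1": 0.05},
    ("s0", "s1"): {"s0": 0.60, "s1": 0.40},
    ("s1", "s0"): {"s0": 0.30, "s1": 0.70},
    ("s1", "s1"): {"s0": 0.10, "s1": 0.90},
}


def draw(dist, rng):
    """Draw one state from a {state: probability} dictionary."""
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def sample_record(rng):
    """Sample parents before children: C and B, then A | C, then D | A, B."""
    c = draw(P_C, rng)
    b = draw(P_B, rng)
    a = draw(P_A_GIVEN_C[c], rng)
    d = draw(P_D_GIVEN_AB[(a, b)], rng)
    return {"A": a, "B": b, "C": c, "D": d}


def simulate(n, seed=0):
    """Return n complete records drawn from the network."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


records = simulate(1000)
print(records[0])
```

With enough records, the relative frequencies in `records` should approach the probabilities in the tables above, which is the property step 4 relies on.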

I post this example here to let the forum administrator double-check whether I coded everything correctly.

If not, please point it out and tell me how to improve it. I am still in the process of learning.

Let's make this post a good example for future Python users. Thanks.

All related code and data are attached.

Fri Jun 13, 2025 12:43 pm

Your code is correct.

One possible issue is your choice of options for EM. When the dataset is complete (no missing data elements), the parameter learning uses case counting and obtains the CPT parameters after one pass over the data. When the dataset has missing entries, we generally recommend starting with parameter randomization set to true, unless you're refining your parameters in the existing network. Your program only creates a structure of the network (nodes added, then connected with arcs, no values specified for CPTs), so it has no existing parameters to refine.
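To make "case counting" concrete, here is a minimal plain-Python sketch of estimating one CPT from complete records in a single pass. It is not SMILE's implementation; the function name `learn_cpt`, the record layout, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def learn_cpt(records, child, parents):
    """Estimate P(child | parents) by counting complete records."""
    joint = Counter()          # counts of (parent states, child state)
    parent_totals = Counter()  # counts of parent states alone
    for rec in records:
        key = tuple(rec[p] for p in parents)
        joint[(key, rec[child])] += 1
        parent_totals[key] += 1
    # Relative frequency within each parent configuration.
    return {
        (key, state): count / parent_totals[key]
        for (key, state), count in joint.items()
    }


# Toy complete dataset for the arc C -> A (hypothetical values).
records = [
    {"C": "s0", "A": "s0"},
    {"C": "s0", "A": "s0"},
    {"C": "s0", "A": "s1"},
    {"C": "s1", "A": "s1"},
]

cpt = learn_cpt(records, child="A", parents=["C"])
# Two of the three records with C=s0 have A=s0, so the estimate is 2/3.
print(cpt[(("s0",), "s0")])
```

Because every record is complete, no iteration is needed: one pass over the data fixes the estimates. Parent configurations that never occur in the data get no entry in this sketch.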

Post by **BayesFusionUser123** » Fri Jun 13, 2025 8:41 pm

Let us discuss this section of the code below:


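import sys
import pysmile

# cate_ds (dataset) and cate_net (network) are created earlier in the script.
em = pysmile.learning.EM()

# Map dataset columns to network nodes. Stop if matching fails, because
# `matching` would otherwise be undefined when em.learn() is called.
try:
    matching = cate_ds.match_network(cate_net)
except pysmile.SMILEException:
    sys.exit("Can't automatically match network with dataset")

em.set_uniformize_parameters(False)
em.set_randomize_parameters(False)
em.set_eq_sample_size(0)

# Stop on failure so that a network with unlearned parameters is not saved.
try:
    em.learn(cate_ds, cate_net, matching)
except pysmile.SMILEException:
    sys.exit("EM failed")

print("EM finished")
cate_net.write_file("./simulated_data/simulated_data_em_cate.xdsl")
print("Complete.")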

Currently I am just learning how to use the package. My final goal is to use it for a real-world project.

In that project, we want to first use our own causal discovery algorithm to generate the skeleton of the network (much as the PC algorithm outputs a skeleton), and then use the SMILE software to learn the parameters of the network from our real-world dataset.

"When the dataset has missing entries, we generally recommend starting with parameter randomization set to true, unless you're refining your parameters in the existing network."

If we preprocess our dataset so that it contains no missing values, then the current settings are OK, right?
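As a sanity check before learning, the "no missing data" assumption can be verified with a few lines of plain Python. This is only a sketch: the set of tokens treated as missing (`""`, `"*"`, `"NA"`) and the inline sample data are assumptions, so adjust them to however the real file marks missing values.

```python
import csv
import io

# Assumption: cells equal to one of these tokens count as missing.
MISSING_TOKENS = {"", "*", "NA"}


def count_missing(csv_text):
    """Return {column: number of missing cells} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader fills short rows with None, which also counts as missing.
            if value is None or value.strip() in MISSING_TOKENS:
                missing[name] += 1
    return missing


# Hypothetical sample: B is empty in the second record, C is "*" in the third.
sample = "A,B,C,D\ns0,s1,s0,s1\ns1,,s0,s0\ns0,s0,*,s1\n"
missing = count_missing(sample)
print(missing)
```

If every count is zero, the dataset is complete in the sense used above. For a file on disk, the same function can be applied to the text returned by reading the file.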

By the way, I am just following the tutorial. Can you point me to detailed documentation (and where in the documentation to look) on what the following functions do?

em.set_uniformize_parameters(False)
em.set_randomize_parameters(False)
em.set_eq_sample_size(0)

This is not easy to find in the documentation.

Thanks.

BayesFusion Support Forum
