Dear Community,
I am following the PDF version of the tutorial for the SMILE wrappers, specifically for the Python language, and I am running the following experiment to learn more about the SMILE package.
(1)
I defined a network using only categorical variables in the following Python code:
```
cate_net = pysmile.Network()
A = create_cpt_node(cate_net, "A", "A", ['a1', 'a2', 'a3'], 10, 20)
B = create_cpt_node(cate_net, "B", "B", ['b1', 'b2'], 10, 30)
C = create_cpt_node(cate_net, "C", "C", ['c1', 'c2', 'c3'], 10, 40)
D = create_cpt_node(cate_net, "D", "D", ['d1', 'd2'], 10, 50)
cate_net.add_arc(C, A)
cate_net.add_arc(A, D)
cate_net.add_arc(B, D)
cate_net.write_file("./simulated_data/predefined_cate_net.xdsl")
```
The network file saved from this predefined network looks like this:
```
# predefined network
<nodes>
<cpt id="C">
<state id="c1" />
<state id="c2" />
<state id="c3" />
<probabilities>0.5 0.5 0</probabilities>
</cpt>
<cpt id="A">
<state id="a1" />
<state id="a2" />
<state id="a3" />
<parents>C</parents>
<probabilities>0.5 0.5 0 0.5 0.5 0 0.5 0.5 0</probabilities>
</cpt>
<cpt id="B">
<state id="b1" />
<state id="b2" />
<probabilities>0.5 0.5</probabilities>
</cpt>
<cpt id="D">
<state id="d1" />
<state id="d2" />
<parents>A B</parents>
<probabilities>0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5</probabilities>
</cpt>
</nodes>
```
(2)
I also simulated data based on this predefined network, used the simulated dataset to learn a network with the PC algorithm, and saved the learned network to a file.
Part of that network file is shown below:
```
# learned network
<cpt id="C">
<state id="a1" />
<state id="a2" />
<state id="a3" />
<probabilities>0.3333333333333333 0.3333333333333333 0.3333333333333333</probabilities>
</cpt>
<cpt id="A">
<state id="c1" />
<state id="c2" />
<state id="c3" />
<parents>C</parents>
<probabilities>0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333 0.3333333333333333</probabilities>
</cpt>
```
My question is:
Why do the probabilities in the predefined network look wrong? For instance, shouldn't the probabilities of variable C be something like `<probabilities>0.3333333333333333 0.3333333333333333 0.3333333333333333</probabilities>`, i.e. all equal?
Why is it `<probabilities>0.5 0.5 0</probabilities>` when saved from the predefined network?
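To show what I mean, the same defaults should also be visible when reading the CPTs back from the network object. A small check (a sketch only, using the `cate_net` and node handles from the code above):
```
# Print the raw CPT that SMILE stores for each node right after the
# structure is created (no parameters have been set or learned yet).
for handle in [C, A, B, D]:
    print(cate_net.get_node_id(handle), cate_net.get_node_definition(handle))
# For C this should print [0.5, 0.5, 0.0], matching the saved XDSL above.
```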
I have attached the generated network files to this post. Since it is not possible to attach a .py file, the Python code is pasted below (in case you want to run it):
```
# %% [markdown]
# # Simulate Categorical Network
# %%
# env: smile2
import pandas as pd
import pysmile
import pysmile_license
import numpy as np
# %%
!pip show pysmile
# %% [markdown]
# ## Create simulated data from a network with discrete variables
#
# To simulate categorical variables $A, B, C, D$ with the following Bayesian network dependencies:
#
# * $C \rightarrow A$
# * $A, B \rightarrow D$
#
# We'll:
#
# 1. Sample $C$ independently.
# 2. Sample $A$ conditioned on $C$.
# 3. Sample $B$ independently.
# 4. Sample $D$ conditioned on $A$ and $B$.
# %%
simulated_dataset_path = "./simulated_data/simulated_cate_data.csv"
# %%
import numpy as np
import pandas as pd
# Define categories
categories_C = ['c1', 'c2', 'c3']
categories_A = ['a1', 'a2', 'a3']
categories_B = ['b1', 'b2']
categories_D = ['d1', 'd2']
n_samples = 10000
# 1. Sample C independently
C = np.random.choice(categories_C, size=n_samples, p=[0.3, 0.4, 0.3])
# 2. Sample A conditioned on C
# Conditional probability table: P(A | C)
P_A_given_C = {
    'c1': [0.7, 0.2, 0.1],
    'c2': [0.2, 0.5, 0.3],
    'c3': [0.1, 0.3, 0.6],
}
A = np.array([
    np.random.choice(categories_A, p=P_A_given_C[c_val])
    for c_val in C
])
# 3. Sample B independently
B = np.random.choice(categories_B, size=n_samples, p=[0.6, 0.4])
# 4. Sample D conditioned on A and B
# Conditional probability table: P(D | A, B)
P_D_given_AB = {
    ('a1', 'b1'): [0.9, 0.1],
    ('a1', 'b2'): [0.6, 0.4],
    ('a2', 'b1'): [0.5, 0.5],
    ('a2', 'b2'): [0.4, 0.6],
    ('a3', 'b1'): [0.3, 0.7],
    ('a3', 'b2'): [0.2, 0.8],
}
D = np.array([
    np.random.choice(categories_D, p=P_D_given_AB[(a_val, b_val)])
    for a_val, b_val in zip(A, B)
])
# Create DataFrame
df = pd.DataFrame({'C': C, 'A': A, 'B': B, 'D': D})
print(df.head())
# Save to CSV
df.to_csv(simulated_dataset_path, index=False)
# %%
cate_ds = pysmile.learning.DataSet()
try:
    cate_ds.read_file(simulated_dataset_path)
except pysmile.SMILEException:
    print("Dataset load failed")
#endtry
print(f"Dataset has {cate_ds.get_variable_count()} variables (columns) "
      + f"and {cate_ds.get_record_count()} records (rows)")
# %% [markdown]
# ## Structure learning from simulated dataset
# %%
pc = pysmile.learning.PC()
try:
    pattern = pc.learn(cate_ds)
except pysmile.SMILEException:
    print("PC failed")
#endtry
learned_cate_net = pattern.make_network(cate_ds)
print("PC finished, proceeding to parameter learning")
learned_cate_net.write_file("./simulated_data/learned_cate_net.xdsl")
# %%
def create_cpt_node(net, id, name, outcomes, x_pos, y_pos):
    handle = net.add_node(pysmile.NodeType.CPT, id)
    net.set_node_name(handle, name)
    net.set_node_position(handle, x_pos, y_pos, 85, 55)
    initial_outcome_count = net.get_outcome_count(handle)
    for i in range(0, initial_outcome_count):
        net.set_outcome_id(handle, i, outcomes[i])
    for i in range(initial_outcome_count, len(outcomes)):
        net.add_outcome(handle, outcomes[i])
    return handle
# %% [markdown]
# To simulate categorical variables $A, B, C, D$ with the following Bayesian network dependencies:
#
# * $C \rightarrow A$
# * $A, B \rightarrow D$
# %%
# Define categories
# categories_C = ['c1', 'c2', 'c3']
# categories_A = ['a1', 'a2', 'a3']
# categories_B = ['b1', 'b2']
# categories_D = ['d1', 'd2']
cate_net = pysmile.Network()
A = create_cpt_node(cate_net, "A", "A", ['a1', 'a2', 'a3'], 10, 20)
B = create_cpt_node(cate_net, "B", "B", ['b1', 'b2'], 10, 30)
C = create_cpt_node(cate_net, "C", "C", ['c1', 'c2', 'c3'], 10, 40)
D = create_cpt_node(cate_net, "D", "D", ['d1', 'd2'], 10, 50)
cate_net.add_arc(C, A)
cate_net.add_arc(A, D)
cate_net.add_arc(B, D)
cate_net.write_file("./simulated_data/predefined_cate_net.xdsl")
# %%
em = pysmile.learning.EM()
try:
    matching = cate_ds.match_network(cate_net)
except pysmile.SMILEException:
    print("Can't automatically match network with dataset")
#endtry
em.set_uniformize_parameters(False)
em.set_randomize_parameters(False)
em.set_eq_sample_size(0)
try:
    em.learn(cate_ds, cate_net, matching)
except pysmile.SMILEException:
    print("EM failed")
#endtry
print("EM finished")
cate_net.write_file("simulated_data_em_cate.xdsl")
print("Complete.")
```
How is the default probability set in a network with only categorical variables?
Attachments:
- predefined_cate_net.xdsl
- learned_cate_net.xdsl
Site Admin:
Re: How is the default probability set in a network with only categorical variables?
```
A = create_cpt_node(cate_net, "A", "A", ['a1', 'a2', 'a3'], 10, 20)
```
Creating a node this way only gives it default parameters: a fresh CPT node starts with two outcomes sharing a uniform 0.5/0.5 distribution, and every outcome added afterwards gets probability zero, which is exactly where the `0.5 0.5 0` entries in your saved file come from. It's up to the SMILE user to provide the actual values for the CPTs, possibly by learning parameters from data with EM.
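For example, to put the distributions you used when simulating the data into the predefined network, you could set each CPT explicitly with `set_node_definition` after the arcs are added. A minimal sketch, reusing the node handles and probabilities from your own code (values are ordered with the parent states changing slowest and the node's own outcomes changing fastest):
```
# Sketch: fill the CPTs with the distributions used in the simulation code.
cate_net.set_node_definition(C, [0.3, 0.4, 0.3])   # P(C)
cate_net.set_node_definition(B, [0.6, 0.4])        # P(B)
cate_net.set_node_definition(A, [0.7, 0.2, 0.1,    # P(A | C=c1)
                                 0.2, 0.5, 0.3,    # P(A | C=c2)
                                 0.1, 0.3, 0.6])   # P(A | C=c3)
cate_net.set_node_definition(D, [0.9, 0.1,         # P(D | a1, b1)
                                 0.6, 0.4,         # P(D | a1, b2)
                                 0.5, 0.5,         # P(D | a2, b1)
                                 0.4, 0.6,         # P(D | a2, b2)
                                 0.3, 0.7,         # P(D | a3, b1)
                                 0.2, 0.8])        # P(D | a3, b2)
```
After this, `write_file` would save the intended probabilities instead of the defaults.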
Re: How is the default probability set in a network with only categorical variables?
Currently I am just learning how to use the package; my final goal is to use it in a real-world project.
In that project, we first want to use our own causal discovery algorithm to generate the skeleton of the network (similar to how the PC algorithm outputs a skeleton), and then use SMILE to learn the parameters of the network from our real-world dataset.
Our causal discovery algorithm gives us an adjacency matrix that represents the network structure.
For categorical variables, is it OK to create the network structure as demonstrated by the code below?
```
cate_net = pysmile.Network()
A = create_cpt_node(cate_net, "A", "A", ['a1', 'a2', 'a3'], 10, 20)
B = create_cpt_node(cate_net, "B", "B", ['b1', 'b2'], 10, 30)
C = create_cpt_node(cate_net, "C", "C", ['c1', 'c2', 'c3'], 10, 40)
D = create_cpt_node(cate_net, "D", "D", ['d1', 'd2'], 10, 50)
cate_net.add_arc(C, A)
cate_net.add_arc(A, D)
cate_net.add_arc(B, D)
```
Later we will call the EM algorithm for parameter learning on this constructed network `cate_net`; the initial numbers like 0.5, 0.5, 0, etc. do not matter then, right?
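To make the question concrete, this is roughly how we plan to turn the adjacency matrix into a network structure before calling EM. A sketch only: the variable names, outcome lists, and adjacency matrix below are made-up placeholders, and it reuses the `create_cpt_node` helper from my code above.
```
import numpy as np
import pysmile
import pysmile_license

# Placeholder output of our own causal discovery algorithm:
# adj[i, j] == 1 means an arc from variables[i] to variables[j].
variables = ["C", "A", "B", "D"]
outcomes = {"C": ['c1', 'c2', 'c3'], "A": ['a1', 'a2', 'a3'],
            "B": ['b1', 'b2'], "D": ['d1', 'd2']}
adj = np.array([[0, 1, 0, 0],   # C -> A
                [0, 0, 0, 1],   # A -> D
                [0, 0, 0, 1],   # B -> D
                [0, 0, 0, 0]])

# Create one CPT node per variable, then add the arcs from the matrix.
net = pysmile.Network()
handles = {v: create_cpt_node(net, v, v, outcomes[v], 10, 20 + 10 * i)
           for i, v in enumerate(variables)}
for i, src in enumerate(variables):
    for j, dst in enumerate(variables):
        if adj[i, j]:
            net.add_arc(handles[src], handles[dst])
# After this, EM would be run on `net` exactly as in the code above.
```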