How to create SMILE DataSet from a pandas dataframe?

BayesFusionUser123
Posts: 23
Joined: Tue Jun 10, 2025 3:51 pm

Post by BayesFusionUser123 »

I want to use the method
read_pandas_dataframe(self: pysmile.learning.DataSet, *args, **kwargs) -> None
to create a DataSet object directly from a pandas DataFrame, but the call fails with an AttributeError.

Could you please tell me how to fix this? Thanks.
The sample code is below:

Code:

# %%
import pandas as pd
import pysmile
import pysmile_license
import numpy as np
import os
import scipy
import math
from functools import partial

# %%
def simulate_bayesian_network():
    # Set random seed for reproducibility
    np.random.seed(1)

    N = 50000

    # Step 1: B ~ Bernoulli(0.5)
    B = np.random.binomial(1, 0.5, size=(N,))

    # Step 2: A depends on B
    A = np.zeros(N, dtype=int)
    A[B == 1] = np.random.binomial(1, 0.9, size=(np.sum(B == 1),))
    A[B == 0] = np.random.binomial(1, 0.1, size=(np.sum(B == 0),))

    # Step 3: T depends on A and B
    T = np.zeros(N, dtype=int)
    T[(A == 1) & (B == 1)] = np.random.binomial(1, 0.9, size=np.sum((A == 1) & (B == 1)))
    T[(A == 1) & (B == 0)] = np.random.binomial(1, 0.1, size=np.sum((A == 1) & (B == 0)))
    T[(A == 0) & (B == 1)] = np.random.binomial(1, 0.1, size=np.sum((A == 0) & (B == 1)))
    T[(A == 0) & (B == 0)] = np.random.binomial(1, 0.1, size=np.sum((A == 0) & (B == 0)))
    
    data = {}
    data['A'] = A
    data['B'] = B
    data['T'] = T

    return pd.DataFrame(data)

# Simulate and preview the dataset
data = simulate_bayesian_network()
print(data.head())
print(type(data))

# %%
def assign_category_name_to_discrete_variables(df, variable_list):
    # Variables that are categorical but encoded as numbers (0, 1, 2, ...)
    # are converted to string labels of the form var___0, var___1, var___2
    
    def process_categorical_name(x, prefix):
        x = str(x)
        return f"{prefix}___{x}"
    
    for var in variable_list:
        my_func_for_x = partial(process_categorical_name, prefix=var)
        df[var] = df[var].apply(my_func_for_x)
    #endfor
    
    return df

# %%
assign_category_name_to_discrete_variables(data, ["A", "B", "T"])

# %%
print(type(data))

# %%
data.info()

# %%
# read_pandas_dataframe(self: pysmile.learning.DataSet, *args, **kwargs) -> None
# Reads data from a pandas DataFrame

data = data.astype("string")
for col in data.select_dtypes(include=['string', 'object']).columns:
    data[col] = data[col].astype('category')
#endfor
data = data.reset_index(drop=True)

ds_1 = pysmile.learning.DataSet()
ds_1.read_pandas_dataframe(data)

The error message is:

Code:

# Traceback (most recent call last):
#   File "/create_SMILE_DataSet_from_pandas_dataframe.py", line 92, in <module>
#     ds_1.read_pandas_dataframe(data)
#   File "/users/anaconda3/envs/smile/lib/python3.11/site-packages/pandas/core/generic.py", line 6318, in __getattr__
#     return object.__getattribute__(self, name)
#            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# AttributeError: 'DataFrame' object has no attribute 'dtype'. Did you mean: 'dtypes'?
piotr [BayesFusion]
Site Admin
Posts: 70
Joined: Mon Nov 06, 2017 6:41 pm

Post by piotr [BayesFusion] »

Thank you for reporting this issue.

The problem was caused by a naming conflict: "T" is a built-in pandas DataFrame property that returns the transpose of the DataFrame.

Workaround for now: rename any column called T to something else (e.g. C). Other pandas attribute names to avoid as column names include: at, iat, loc, iloc, shape, columns, index, dtypes, values, axes, ndim, size, empty.
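As a rough illustration of the renaming workaround, the sketch below renames every column whose name is also an attribute of the pandas DataFrame class. The helper name rename_conflicting_columns and the "_col" suffix are hypothetical and not part of PySMILE or pandas. Checking against dir(pd.DataFrame) is an assumption that is deliberately broader than the list above, so it may rename more columns than strictly necessary.

```python
import pandas as pd


def rename_conflicting_columns(df, suffix="_col"):
    """Return (renamed copy of df, mapping of old name -> new name).

    Hypothetical helper: it treats any column name that is also an
    attribute of pd.DataFrame (T, size, shape, ...) as a possible conflict.
    """
    reserved = set(dir(pd.DataFrame))
    mapping = {
        name: f"{name}{suffix}"
        for name in df.columns
        if isinstance(name, str) and name in reserved
    }
    # rename() returns a new DataFrame; the input is left unchanged.
    return df.rename(columns=mapping), mapping


demo = pd.DataFrame({"A": ["a0"], "B": ["b1"], "T": ["t0"], "size": ["s1"]})
safe, renamed = rename_conflicting_columns(demo)
print(list(safe.columns))
print(renamed)
```

The renamed frame (safe here) would then be the one passed to ds.read_pandas_dataframe(...). The returned mapping makes it possible to translate variable names back afterwards. The sketch does not check whether a new name such as T_col already exists in the DataFrame.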

In the next PySMILE release we will fix the root cause - after that, no renaming will be necessary.
m.kozniewski
Posts: 1
Joined: Sat May 30, 2026 6:56 am

Post by m.kozniewski »

I am posting here because this concerns the same method of the DataSet class: I have a similar problem. Please find the Jupyter notebook and the data attached. The kernel log is below.

Code:

21:24:57.267 [error] Disposing session as kernel process died ExitCode: 3221225477, Reason: 
21:25:29.314 [error] Failed to write data to the kernel channel shell [
  <Buffer 3c 49 44 53 7c 4d 53 47 3e>,
  <Buffer 38 37 31 31 62 36 39 33 34 37 39 37 34 32 34 34 32 39 30 30 65 37 65 65 34 66 37 61 37 32 66 37 63 66 31 35 63 32 62 65 31 38 65 33 31 65 35 35 65 65 ... 14 more bytes>,
  <Buffer 7b 22 64 61 74 65 22 3a 22 32 30 32 36 2d 30 36 2d 30 31 54 31 39 3a 32 35 3a 32 39 2e 33 31 34 5a 22 2c 22 6d 73 67 5f 69 64 22 3a 22 33 34 63 66 36 ... 177 more bytes>,
  <Buffer 7b 7d>,
  <Buffer 7b 7d>,
  <Buffer 7b 22 73 69 6c 65 6e 74 22 3a 66 61 6c 73 65 2c 22 73 74 6f 72 65 5f 68 69 73 74 6f 72 79 22 3a 66 61 6c 73 65 2c 22 75 73 65 72 5f 65 78 70 72 65 73 ... 12617 more bytes>
] [Error: Socket is closed
	at a.postToSocket (~\.vscode\extensions\ms-toolsai.jupyter-2025.9.1-win32-x64\dist\extension.node.js:309:7983)
	at ~\.vscode\extensions\ms-toolsai.jupyter-2025.9.1-win32-x64\dist\extension.node.js:309:7727] {
  errno: 9,
  code: 'EBADF'
}
The only workaround I found is to save to CSV and read it back with SMILE.

Code:

smalldf.to_csv("./data/minidf.csv",index=False)
ds = pysmile.learning.DataSet()
ds.read_file("./data/minidf.csv")
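The same CSV round trip can be wrapped so that it does not leave a file behind. This is only a sketch: load_via_temp_csv is a hypothetical helper name, and the reader is passed in as a callable so the example runs without PySMILE installed (pd.read_csv stands in for ds.read_file here).

```python
import os
import tempfile

import pandas as pd


def load_via_temp_csv(df, read_file):
    """Write df to a temporary CSV file and pass its path to read_file."""
    # A temporary directory (rather than an open temp file) avoids
    # problems reopening the file by name on Windows.
    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = os.path.join(tmp_dir, "data.csv")
        df.to_csv(csv_path, index=False)
        return read_file(csv_path)


# Intended use with PySMILE (not executed in this sketch):
#   ds = pysmile.learning.DataSet()
#   load_via_temp_csv(smalldf, ds.read_file)

# Stand-in reader so the example is runnable on its own.
demo = pd.DataFrame({"A": ["a0", "a1"], "B": ["b1", "b0"]})
roundtrip = load_via_temp_csv(demo, pd.read_csv)
print(roundtrip)
```

The temporary directory is deleted as soon as read_file returns, so this only works if the reader loads the whole file during the call.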
Attachments
pysmile_dataset.zip
(124.13 KiB)