I am having trouble with the PC algorithm on Linux (x64) with gcc 4.4.5. The same algorithm works when I run it through the GeNIe front end on Windows. With the same dataset on Linux, calling the algorithm through SMILE directly produces a pattern with no edges.
Has anyone else noticed this issue? It is likely that I am doing something wrong.
The other algorithms (BS, GTT, NB, and TAN) all work.
PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges
-
- Site Admin
- Posts: 1422
- Joined: Mon Nov 26, 2007 5:51 pm
Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges
Can you post your code and the data file here?
Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges
Thank you.
Here is the code:
bool dataset_learnPC( void * void_dataset, void * void_pat,
                      unsigned long maxcache, int maxAdjacency,
                      int maxSearchTime, double significance,
                      int * forcedarcs, int n_forcedarcs,
                      int * forbiddenarcs, int n_forbiddenarcs,
                      int * tiers, int lentiers )
{
    DSL_dataset * dset = reinterpret_cast<DSL_dataset*>(void_dataset);
    DSL_pattern * pat = reinterpret_cast<DSL_pattern*>(void_pat);
    DSL_pc pc;
    pc.maxcache = maxcache;
    pc.maxAdjacency = maxAdjacency;
    pc.maxSearchTime = maxSearchTime;
    pc.significance = significance;
    for ( int i = 0; i < n_forcedarcs; ++i )
    {
        int indi = forcedarcs[2*i];
        int indj = forcedarcs[2*i+1];
        pc.bkk.forcedArcs.push_back(std::pair<int,int>(indi, indj));
    }
    for ( int i = 0; i < n_forbiddenarcs; ++i )
    {
        int indi = forbiddenarcs[2*i];
        int indj = forbiddenarcs[2*i+1];
        pc.bkk.forbiddenArcs.push_back(std::pair<int,int>(indi, indj));
    }
    for ( int i = 0; i < lentiers; ++i )
    {
        int ind = tiers[2*i];    // index of the variable
        int tier = tiers[2*i+1]; // its associated tier
        pc.bkk.tiers.push_back(std::pair<int,int>(ind, tier));
    }
    return pc.Learn(*dset, *pat) == DSL_OKAY;
}
The function is written so it can be compiled into a shared C library. I was successful with BS, GTT, NB, and TAN using similar methods.
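For readers following the argument layout: the wrapper receives arcs and tiers as flat int arrays in which consecutive entries form one pair. A minimal sketch of that unpacking step, independent of SMILE, might look like the following. The helper name unpack_pairs is hypothetical and is not part of SMILE or of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of SMILE): converts a flat C array laid out
// as [a0, b0, a1, b1, ...] into a vector of (a, b) pairs.
// n_pairs is the number of pairs, so flat must hold 2 * n_pairs ints.
std::vector<std::pair<int, int> > unpack_pairs(const int* flat, int n_pairs)
{
    std::vector<std::pair<int, int> > out;
    if (flat == 0 || n_pairs <= 0)
        return out; // nothing to unpack
    out.reserve(static_cast<std::size_t>(n_pairs));
    for (int i = 0; i < n_pairs; ++i)
        out.push_back(std::make_pair(flat[2 * i], flat[2 * i + 1]));
    return out;
}
```

Checking the pointer and the count at the C boundary is a design choice for this sketch; the original wrapper trusts its caller.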
The data file is attached. Notice that columns A and C are identical, so there should be an edge between them.
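To make the expectation about identical columns concrete, here is a rough sketch of a Pearson chi-square independence statistic for two discrete columns. Constraint-based learners such as PC rely on independence tests of this general kind, but which test SMILE actually uses is not stated in this thread, so treat this only as an illustration. The function name chi_square is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Illustrative only: Pearson chi-square statistic for independence of two
// discrete columns of equal length. Not SMILE code.
double chi_square(const std::vector<int>& x, const std::vector<int>& y)
{
    assert(x.size() == y.size());
    typedef std::map<int, double> Totals;
    typedef std::map<std::pair<int, int>, double> Joint;

    const double n = static_cast<double>(x.size());
    Totals rowTotal, colTotal;
    Joint joint;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        rowTotal[x[i]] += 1.0;
        colTotal[y[i]] += 1.0;
        joint[std::make_pair(x[i], y[i])] += 1.0;
    }

    double stat = 0.0;
    for (Totals::const_iterator r = rowTotal.begin(); r != rowTotal.end(); ++r)
    {
        for (Totals::const_iterator c = colTotal.begin(); c != colTotal.end(); ++c)
        {
            // Expected count under independence; positive whenever n > 0.
            const double expected = r->second * c->second / n;
            Joint::const_iterator j = joint.find(std::make_pair(r->first, c->first));
            const double observed = (j == joint.end()) ? 0.0 : j->second;
            stat += (observed - expected) * (observed - expected) / expected;
        }
    }
    return stat;
}
```

For two identical columns the statistic grows with the number of records, whereas for a perfectly balanced, unrelated pair it is zero. That is why a pattern with no edge between A and C looks suspicious.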
- Attachments
-
- sample_data.txt
- (3.78 KiB) Downloaded 364 times
Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges
Alright, so I can reproduce the error using strictly C++ (please see attached code).
Using the same dataset as before, I get no parents (and perfectly uniform CPTs) for PC. I get A->C if I use greedy-thick-thinning.
- Attachments
-
- gtt.xdsl
- results from greedy thick thinning (correct)
- (2.06 KiB) Downloaded 504 times
-
- pc.xdsl
- results from PC (incorrect)
- (1.57 KiB) Downloaded 491 times
-
- test_PC.cpp
- minimum cpp example
- (841 Bytes) Downloaded 356 times
Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges
The problem is caused by missing state names in the dataset: your data file contains integer indices, whereas we tested with strings. To fix it, either modify your data file or provide the state names in the dataset, as in the example below. Note the ErrorH.RedirectToFile call at the top of testPC; it reveals any errors or warnings emitted by the learning algorithm. Also note that DSL_pattern::ToNetwork does not perform parameter learning, so uniform CPTs are to be expected.
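// Header names below assume the standard SMILE / SMILearn distribution;
// adjust them if your installation differs.
#include <cassert>
#include <string>
#include <vector>
#include "smile.h"
#include "smilearn.h"
using namespace std;

// Assigns names "State<k>" to a discrete variable that has integer
// values but no state names.
void FixStateNames(DSL_dataset &ds, int varIdx)
{
    assert(ds.IsDiscrete(varIdx));
    assert(ds.GetStateNames(varIdx).empty());
    // unfortunately, DSL_dataset::GetMinMaxInt has a bug,
    // so we need to get min/max explicitly
    int recCount = ds.GetNumberOfRecords();
    bool minMaxInitialized = false;
    int minval = 0, maxval = 0;
    for (int recIdx = 0; recIdx < recCount; recIdx ++)
    {
        if (!ds.IsMissing(varIdx, recIdx))
        {
            int x = ds.GetInt(varIdx, recIdx);
            if (minMaxInitialized)
            {
                if (x < minval) minval = x;
                if (x > maxval) maxval = x;
            }
            else
            {
                minMaxInitialized = true;
                minval = maxval = x;
            }
        }
    }
    // every record is missing: there is no value range to name
    if (!minMaxInitialized) return;
    int stateCount = maxval - minval + 1;
    vector<string> stateNames(stateCount);
    string id;
    for (int i = 0; i < stateCount; i ++)
    {
        id = "State";
        DSL_appendInt(id, minval + i);
        stateNames[i] = id;
    }
    ds.SetStateNames(varIdx, stateNames);
}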
int testPC()
{
    ErrorH.RedirectToFile(stdout);
    DSL_dataset d;
    int res = d.ReadFile("d:/sample_data.txt");
    if (DSL_OKAY != res)
    {
        return res;
    }
    for (int i = 0; i < d.GetNumberOfVariables(); i ++)
    {
        FixStateNames(d, i);
    }
    DSL_network net;
    DSL_pc pc;
    DSL_pattern pat;
    res = pc.Learn(d, pat);
    if (DSL_OKAY != res)
    {
        return res;
    }
    pat.ToNetwork(d, net);
    net.WriteFile("d:/pc.xdsl");
    return DSL_OKAY;
}
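For anyone who wants to check the naming logic without linking against SMILE, the same idea can be sketched on a plain vector of ints. This is only an approximation of what FixStateNames does: the function name make_state_names and the use of a sentinel value for missing entries are assumptions made for the example, not SMILE conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Illustrative stand-in for the SMILE-based routine: builds names
// "State<min>" .. "State<max>" for an integer column, skipping entries
// equal to the (assumed) missing-value sentinel.
std::vector<std::string> make_state_names(const std::vector<int>& column, int missing)
{
    bool seen = false;
    int minval = 0, maxval = 0;
    for (std::size_t i = 0; i < column.size(); ++i)
    {
        const int x = column[i];
        if (x == missing)
            continue;
        if (!seen)
        {
            minval = maxval = x;
            seen = true;
        }
        else
        {
            if (x < minval) minval = x;
            if (x > maxval) maxval = x;
        }
    }

    std::vector<std::string> names;
    if (!seen)
        return names; // no usable values, so no names
    for (int v = minval; v <= maxval; ++v)
    {
        std::ostringstream id;
        id << "State" << v;
        names.push_back(id.str());
    }
    return names;
}
```

As in the routine above, a name is generated for every integer between the minimum and the maximum, including values that never occur in the data.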