PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

timw
Posts: 7
Joined: Fri Apr 25, 2014 10:42 pm

PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by timw »

I am having trouble with the PC algorithm on my Linux (x64) / gcc 4.4.5 setup. I tested the same algorithm through the GeNIe front end on Windows and it worked. Using the same dataset on Linux and running the algorithm through SMILE directly results in a Pattern without edges.

Has anyone else noticed this issue? It is likely that I am doing something wrong.
The other algorithms (BS, GTT, NB, and TAN) all work.
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by shooltz[BayesFusion] »

Can you post your code and the data file here?
timw
Posts: 7
Joined: Fri Apr 25, 2014 10:42 pm

Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by timw »

Thank you.

Here is the code:

// C-callable wrapper around DSL_pc::Learn. The dataset and pattern are passed
// as opaque pointers so the function can be exported from a shared library.
bool dataset_learnPC( void * void_dataset, void * void_pat,
                      unsigned long maxcache, int maxAdjacency,
                      int maxSearchTime, double significance,
                      int * forcedarcs, int n_forcedarcs,
                      int * forbiddenarcs, int n_forbiddenarcs,
                      int * tiers, int lentiers )
{
    DSL_dataset * dset = reinterpret_cast<DSL_dataset*>(void_dataset);
    DSL_pattern * pat = reinterpret_cast<DSL_pattern*>(void_pat);

    DSL_pc pc;
    pc.maxcache = maxcache;
    pc.maxAdjacency = maxAdjacency;
    pc.maxSearchTime = maxSearchTime;
    pc.significance = significance;

    // forcedarcs holds n_forcedarcs (from, to) index pairs, flattened
    for ( int i = 0; i < n_forcedarcs; ++i )
    {
        int indi = forcedarcs[2*i];
        int indj = forcedarcs[2*i+1];
        pc.bkk.forcedArcs.push_back(std::pair<int,int>(indi, indj));
    }

    // forbiddenarcs holds n_forbiddenarcs (from, to) index pairs, flattened
    for ( int i = 0; i < n_forbiddenarcs; ++i )
    {
        int indi = forbiddenarcs[2*i];
        int indj = forbiddenarcs[2*i+1];
        pc.bkk.forbiddenArcs.push_back(std::pair<int,int>(indi, indj));
    }

    // tiers holds lentiers (variable index, tier) pairs, flattened
    for ( int i = 0; i < lentiers; ++i )
    {
        int ind = tiers[2*i]; // index of the variable
        int tier = tiers[2*i+1]; // its associated tier
        pc.bkk.tiers.push_back(std::pair<int,int>(ind, tier));
    }

    return pc.Learn(*dset, *pat) == DSL_OKAY;
}

The function is written so it can be compiled into a shared C library. I was successful with BS, GTT, NB, and TAN using similar methods.

The data file is attached. Notice that columns A and C are identical, so there should be an edge between them.
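
To illustrate why an edge between two identical columns is expected: PC decides on edges through statistical independence tests, and identical columns are as dependent as two columns can be. The block below is a minimal, library-independent sketch of one such test statistic (G², the likelihood-ratio statistic) for two discrete columns. It is not SMILE's implementation, and the function name gSquared is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// G^2 statistic for marginal independence of two discrete columns.
// Illustrative only: this is NOT how SMILE implements its tests.
// Returns 0 for empty input or columns of different length.
double gSquared(const std::vector<int>& x, const std::vector<int>& y)
{
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n)
    {
        return 0.0;
    }

    // marginal and joint counts
    std::map<int, double> countX, countY;
    std::map<std::pair<int, int>, double> countXY;
    for (std::size_t i = 0; i < n; ++i)
    {
        countX[x[i]] += 1.0;
        countY[y[i]] += 1.0;
        countXY[std::make_pair(x[i], y[i])] += 1.0;
    }

    // G^2 = 2 * sum over observed cells of O * ln(O / E),
    // where E = count(x) * count(y) / n is the count expected under independence
    double g = 0.0;
    std::map<std::pair<int, int>, double>::const_iterator it;
    for (it = countXY.begin(); it != countXY.end(); ++it)
    {
        const double observed = it->second;
        const double expected =
            countX[it->first.first] * countY[it->first.second] / static_cast<double>(n);
        g += 2.0 * observed * std::log(observed / expected);
    }
    return g;
}
```

Calling gSquared on a column and a copy of itself gives a large value that grows with the number of records, while columns whose joint counts match the product of their marginals give 0. A test built on this statistic would therefore reject independence for columns A and C.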
Attachments
sample_data.txt
(3.78 KiB) Downloaded 343 times
timw
Posts: 7
Joined: Fri Apr 25, 2014 10:42 pm

Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by timw »

Alright, so I can reproduce the error using plain C++ (please see the attached code).
Using the same dataset as before, I get no parents (and perfectly uniform CPTs) for PC. I get A->C if I use greedy-thick-thinning.
Attachments
gtt.xdsl
results from greedy thick thinning (correct)
(2.06 KiB) Downloaded 483 times
pc.xdsl
results from PC (incorrect)
(1.57 KiB) Downloaded 470 times
test_PC.cpp
minimum cpp example
(841 Bytes) Downloaded 331 times
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by shooltz[BayesFusion] »

The problem is caused by missing state names in the dataset: your data file contains integer indices, whereas we tested with strings. To fix it, either modify your data file or provide the state names in the dataset, as in the example below. Note the ErrorH.RedirectToFile call at the top of testPC; it reveals any errors or warnings emitted by the learning algorithm. Also note that DSL_pattern::ToNetwork does not perform parameter learning, so uniform CPTs are to be expected.


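// Headers assumed for this snippet: smile.h and smilearn.h from the SMILE
// distribution (adjust to your install).
#include <cassert>
#include <string>
#include <vector>
#include "smile.h"
#include "smilearn.h"

using namespace std;

// Generates state names ("State<N>") for a discrete variable whose
// values were read as plain integers.
void FixStateNames(DSL_dataset &ds, int varIdx)
{
	assert(ds.IsDiscrete(varIdx));
	assert(ds.GetStateNames(varIdx).empty());

	// unfortunately, DSL_dataset::GetMinMaxInt has a bug,
	// so we need to get min/max explicitly
	int recCount = ds.GetNumberOfRecords();
	bool minMaxInitialized = false;
	int minval = 0, maxval = 0;
	for (int recIdx = 0; recIdx < recCount; recIdx ++)
	{
		if (!ds.IsMissing(varIdx, recIdx))
		{
			int x = ds.GetInt(varIdx, recIdx);
			if (minMaxInitialized)
			{
				if (x < minval) minval = x;
				if (x > maxval) maxval = x;
			}
			else
			{
				minMaxInitialized = true;
				minval = maxval = x;
			}
		}
	}

	// nothing to name if every record is missing for this variable
	if (!minMaxInitialized)
	{
		return;
	}

	// one state per integer in [minval, maxval]
	int stateCount = maxval - minval + 1;
	vector<string> stateNames(stateCount);
	string id;
	for (int i = 0; i < stateCount; i ++)
	{
		id = "State";
		DSL_appendInt(id, minval + i);
		stateNames[i] = id;
	}

	ds.SetStateNames(varIdx, stateNames);
}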

int testPC()
{
	ErrorH.RedirectToFile(stdout);

	DSL_dataset d;
	int res = d.ReadFile("d:/sample_data.txt");
	if (DSL_OKAY != res)
	{
		return res;
	}

	for (int i = 0; i < d.GetNumberOfVariables(); i ++)
	{
		FixStateNames(d, i);
	}

	DSL_network net;
	DSL_pc pc;
	DSL_pattern pat;
	res = pc.Learn(d, pat);
	if (DSL_OKAY != res)
	{
		return res;
	}
	
	pat.ToNetwork(d, net);
	net.WriteFile("d:/pc.xdsl");

	return DSL_OKAY;
}
timw
Posts: 7
Joined: Fri Apr 25, 2014 10:42 pm

Re: PC Algorithm on Linux (x64) / gcc 4.4.5 not making edges

Post by timw »

Thank you!