K-Fold Crossvalidation

kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

K-Fold Crossvalidation

Post by kile »

Hi all,

I was looking through the forum and found this thread: http://genie.sis.pitt.edu/forum/viewtop ... ight=cross

I assume nothing has changed since that post, so I started writing my own cross-validation function. The problem is that I was planning to do something like the following:

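Code:

DSL_dataset m_dataset;
m_dataset.ReadFile(filename);

// Intended as two independent copies of the full dataset.
DSL_dataset learnDataset = m_dataset;
DSL_dataset trainDataset = m_dataset;

// Pseudocode: drop the records listed in train from the first copy...
for each index in train
     learnDataset.RemoveRecord(index)

// ...and drop all the other records from the second copy.
for each index not in train
     trainDataset.RemoveRecord(index)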
First, I assumed DSL_dataset overloads the = operator so that each copy gets its own data instead of pointing to the same data, but it doesn't: deleting records from one dataset also deletes them from the others :(
Second, because I'm removing records while iterating over the indices, the indices of the remaining records shift and are no longer valid.

The only solution I can think of is to export both subsets to text files and then read each file into its own dataset, but that doesn't sound very nice :(

Any help?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: K-Fold Crossvalidation

Post by shooltz[BayesFusion] »

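kile wrote:First, I assumed DSL_dataset overloads the = operator so that each copy gets its own data instead of pointing to the same data, but it doesn't: deleting records from one dataset also deletes them from the others :(
Thanks for reporting that issue. Technically, what's missing is the copy constructor: a declaration such as DSL_dataset learnDataset=m_dataset; is an initialization, so it calls the copy constructor rather than operator=. The default one generated by the compiler does a memberwise copy, which is not adequate here because the copies end up sharing the same underlying data. Please add the code below to the definition of the DSL_dataset class in dataset.h: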

Code:

DSL_dataset() {}

// Deep copy: each variable's column is duplicated instead of shared.
DSL_dataset(const DSL_dataset &src) : metadata(src.metadata)
{
	int varCount = src.GetNumberOfVariables();
	data.resize(varCount);
	for (int var = 0; var < varCount; var++)
	{
		if (src.IsDiscrete(var))
		{
			// discrete variables are stored as int columns
			data[var] = new std::vector<int>(src.GetIntData(var));
		}
		else
		{
			// all other variables are stored as float columns
			data[var] = new std::vector<float>(src.GetFloatData(var));
		}
	}
}
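
For readers who want to see the general pattern outside of SMILE, here is a minimal sketch of deep copying for a class that owns one heap-allocated column per variable. ToyDataset and all of its members are hypothetical names made up for this illustration; they are not part of the SMILE API. The sketch also adds a copy assignment operator and a destructor, which a class owning raw pointers usually needs alongside the copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dataset that owns one heap-allocated column
// per variable. Not part of SMILE; it only illustrates deep copying.
class ToyDataset {
public:
    ToyDataset() {}

    // Copy constructor: allocate a fresh vector for every source column.
    ToyDataset(const ToyDataset &src) {
        columns.reserve(src.columns.size());
        for (std::size_t c = 0; c < src.columns.size(); ++c) {
            columns.push_back(new std::vector<int>(*src.columns[c]));
        }
    }

    // Copy assignment via copy-and-swap: the by-value parameter is already
    // a deep copy, and its destructor releases our old columns.
    ToyDataset &operator=(ToyDataset src) {
        columns.swap(src.columns);
        return *this;
    }

    ~ToyDataset() {
        for (std::size_t c = 0; c < columns.size(); ++c) delete columns[c];
    }

    void addColumn(const std::vector<int> &values) {
        columns.push_back(new std::vector<int>(values));
    }

    // Remove one record (row) from every column.
    void removeRecord(std::size_t rec) {
        for (std::size_t c = 0; c < columns.size(); ++c) {
            assert(rec < columns[c]->size());
            columns[c]->erase(columns[c]->begin() +
                              static_cast<std::ptrdiff_t>(rec));
        }
    }

    std::size_t recordCount() const {
        return columns.empty() ? 0 : columns[0]->size();
    }

private:
    std::vector<std::vector<int> *> columns;
};
```

With this in place, removing a record from a copy leaves the original untouched, whether the copy was made by initialization or by assignment.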

kile wrote:Second, because I'm removing records while iterating over the indices, the indices of the remaining records shift and are no longer valid.
You can easily fix this by removing records in a loop that proceeds from the last record towards the beginning of the data. Alternatively, you can use a forward-going loop and increase the index only if the record was not actually deleted.
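
Both loop variants can be sketched as follows. This is only an illustration under stated assumptions: a plain std::vector<int> stands in for the dataset, and the function names and the boolean removal mask are hypothetical. With a DSL_dataset, the erase call would presumably be replaced by RemoveRecord(index) as used earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Backward loop: every record after position i has already been handled,
// so erasing record i never shifts a record that is still to be visited.
// remove[i] says whether record i should be dropped (hypothetical mask).
void removeMarkedBackward(std::vector<int> &records,
                          const std::vector<bool> &remove) {
    assert(records.size() == remove.size());
    for (std::size_t i = records.size(); i-- > 0;) {
        if (remove[i]) {
            records.erase(records.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}

// Forward loop: `original` walks the positions of the untouched data,
// `current` is the matching position in the shrinking container.
// `current` advances only when nothing was erased.
void removeMarkedForward(std::vector<int> &records,
                         const std::vector<bool> &remove) {
    assert(records.size() == remove.size());
    std::size_t current = 0;
    for (std::size_t original = 0; original < remove.size(); ++original) {
        if (remove[original]) {
            records.erase(records.begin() +
                          static_cast<std::ptrdiff_t>(current));
        } else {
            ++current;
        }
    }
}
```

Both functions leave the same records behind; the backward version is usually the simpler one to get right because it needs only a single index.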
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Thank you very much, shooltz. I added it, along with a copyFrom(...) method so that copying is available outside the constructor too.

I think I've managed the loop too ^_^

I'll post my version of K-fold once I get it working; I think it will be useful for someone.
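
Until then, here is a minimal sketch of how records could be assigned to folds. It is not the version mentioned above, and assignFolds and testMask are hypothetical helper names invented for this example. It assumes a simple round-robin assignment without shuffling; in practice the record order would normally be shuffled first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign each of recordCount records to one of k folds, round-robin.
// Returns a vector where element i is the fold number of record i.
std::vector<int> assignFolds(std::size_t recordCount, int k) {
    assert(k > 0);
    std::vector<int> foldOf(recordCount);
    for (std::size_t i = 0; i < recordCount; ++i) {
        foldOf[i] = static_cast<int>(i % static_cast<std::size_t>(k));
    }
    return foldOf;
}

// Build a mask that is true for every record belonging to `fold`.
// Those records form the test set and are the ones to remove from the
// training copy; the inverted mask gives the ones to remove from the
// test copy.
std::vector<bool> testMask(const std::vector<int> &foldOf, int fold) {
    std::vector<bool> mask(foldOf.size());
    for (std::size_t i = 0; i < foldOf.size(); ++i) {
        mask[i] = (foldOf[i] == fold);
    }
    return mask;
}
```

For each fold from 0 to k-1, one would copy the full dataset twice, remove the masked records from the training copy and the unmasked records from the test copy, then learn on the first and evaluate on the second.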