Questions about DSL_dataset's ReadFile method

ermutarra
Posts: 17
Joined: Fri Apr 24, 2009 1:19 pm

Post by ermutarra »

Hi,

I'm having trouble loading a data file and I have a few questions about this.

1. Do the variable names have to be in the first row? Can it not load a data file that does not have the variable names in the first row?

2. Do the variable states have to be strings? Can they not be numbers?

My data file for now is just:

1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4

But DSL_dataset's ReadFile method fails.

3. Now that there is no parser anymore, how does ReadFile know whether the variables are discrete or continuous?

Thank you for your help!
ermutarra
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Questions about DSL_dataset's ReadFile method

Post by shooltz[BayesFusion] »

ermutarra wrote: 1. Do the variable names have to be in the first row? Can it not load a data file that does not have the variable names in the first row?
Such data can be loaded; you just need to override the default parse parameters passed to DSL_dataset::ReadFile:

DSL_datasetParseParams params;
params.columnIdsPresent = false; // treat the first row as data, not as column IDs
int res = dataset.ReadFile(filename, &params); // res is DSL_OKAY (zero) on success
2. Do the variable states have to be strings? Can they not be numbers? My data file for now is just:
1 1 1 1
2 2 2 2
...
They can be numbers. Your file doesn't load because the first row is considered to contain the column names, and there's a repeated '1' in the first row, so the column names are not unique. See the answer to your question 1 on how to force the parser to use the first row as data (not as the header).
3. Now that there is no parser anymore, how does ReadFile know whether the variables are discrete or continuous?
We use a simple algorithm for that. If a column contains at least one non-numeric value, it's considered discrete. If a column contains only numeric values, it's discrete if all the numbers are integers, and continuous otherwise.
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi, I just started running some tests with ReadFile, copying the basic tutorial as follows:

DSL_dataset dataset;
std::string filename="D:\\test2.txt";

if (!dataset.ReadFile(filename))
   ExitProcess(0);
And test2.txt has the example values:

c x y
State1 State1 State0
State1 State1 State0
State0 State1 State1
State0 State0 State0
State0 State0 State1
State0 State0 State1
State0 State1 State0
State0 State0 State1
State0 State1 State1
State0 State1 State1
So I just got false, and the program exits at the ExitProcess statement.
If I change the default params by setting DSL_datasetParseParams::columnIdsPresent to false and delete the first row (c x y), it loads correctly.
Can anyone tell me what I'm doing wrong?

Thank you very much.
ermutarra
Posts: 17
Joined: Fri Apr 24, 2009 1:19 pm

Post by ermutarra »

I also get false when trying to load the data from a file, but using the PrintDataset method from one of the tutorials, I can see that the data has actually been read correctly.

Therefore, my conclusion is that in the new API false means no errors.
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

ermutarra wrote: Therefore, my conclusion is that in the new API false means no errors.
That's correct. DSL_dataset::ReadFile returns an int. If the data was loaded, the returned value is DSL_OKAY, which is #defined as zero.
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

OK, now it's working ;) Thank you very much.
rollyman
Posts: 1
Joined: Wed Jun 24, 2009 5:47 am

hi

Post by rollyman »

thanks for the right DSL_dataset::ReadFile..