Help with SmiLearn changes

Greg
Posts: 16
Joined: Mon Dec 17, 2007 8:03 pm
Location: University of Connecticut

Post by Greg »

Hi,

I am trying to learn a network with continuous variables by modifying code that worked with the previous version of SmiLearn.

The dataset has 30 variables and approx 4800 records. I have run tests where all data are real numbers with no change in the results (the zero values are integers in the original datafile).

Dataset::ReadFile returns 0. Does ReadFile now return DSL_OKAY, as it did in earlier versions (before the last)?

DSL_pattern::ToDAG returns true

However, DSL_pc::Learn returns 1

Any thoughts on what is preventing this from working?

Also, I have another question related to DSL_learnProgress. Does this class enable one to stop the learning code and restart at a later time?

Thanks for your time,
Greg Johnson
shooltz[BayesFusion]
Site Admin
Posts: 1457
Joined: Mon Nov 26, 2007 5:51 pm

Re: Help with SmiLearn changes

Post by shooltz[BayesFusion] »

Greg wrote:Dataset::ReadFile returns 0. Does ReadFile now return DSL_OKAY, as it did in earlier versions (before the last)?
That's correct. The return type of DSL_dataset::ReadFile was changed from bool to int around October 2008.

Greg wrote:DSL_pattern::ToDAG returns true
Yes, this method still returns a bool value.

Greg wrote:However, DSL_pc::Learn returns 1
That's highly unlikely: DSL_pc::Learn has always returned an int and follows the SMILE convention of using negative error codes. Maybe you meant -1 (DSL_GENERAL_ERROR) instead of 1? If you have a console app, try adding the following line before any other SMILE calls. You may get some diagnostic messages from the learning code on the screen:

ErrorH.RedirectToFile(stdout);
Greg wrote:Also, I have another question related to DSL_learnProgress. Does this class enable one to stop the learning code and restart at a later time?
If you derive from DSL_learnProgress and implement the Tick method, you can stop the learning by returning false from it. For stop/restart, you would have to enter a loop or wait for an external event inside Tick, because the method is called on the same thread on which DSL_pc::Learn runs.
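The stop mechanism described above can be sketched without SMILE itself. In the sketch below, LearnProgress, StopAfter and runSteps are hypothetical stand-ins, not SMILE types, and the real DSL_learnProgress::Tick signature may differ. The sketch only shows the control flow: the learner calls Tick on its own thread and stops as soon as Tick returns false.

```cpp
// Hypothetical stand-in for the progress interface described above.
// This is NOT the SMILE class; the real Tick signature may differ.
struct LearnProgress {
    virtual ~LearnProgress() = default;
    // Return false to ask the learner to stop.
    virtual bool Tick() = 0;
};

// Example callback: asks the learner to stop once maxTicks ticks were seen.
class StopAfter : public LearnProgress {
public:
    explicit StopAfter(int maxTicks) : maxTicks_(maxTicks) {}
    bool Tick() override {
        ++ticks_;
        return ticks_ < maxTicks_;
    }
    int ticks() const { return ticks_; }
private:
    int maxTicks_;
    int ticks_ = 0;
};

// Toy "learner": does one unit of work per step, then calls Tick on the
// calling thread. Returns the number of steps completed before stopping.
int runSteps(int totalSteps, LearnProgress& progress) {
    int done = 0;
    for (int i = 0; i < totalSteps; ++i) {
        ++done;                        // placeholder for real work
        if (!progress.Tick()) break;   // callback requested a stop
    }
    return done;
}
```

A pause/resume variant would block inside Tick (for example on a condition variable) instead of returning false, since the learner cannot continue until Tick returns.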

Post by Greg »

Thanks for the quick response!

pc::Learn returns 1, not -1 as expected. However, the error message indicates that there is a mix of continuous and discrete variables. Is there a way to force all variables to be continuous? I have edited the dataset so all data elements have at least one decimal place.

Thanks again,
Greg

Post by shooltz[BayesFusion] »

Greg wrote:pc::Learn returns 1 not -1 as expected.
That's a highly unexpected outcome. What are the size and date of your pc.h file?

Greg wrote:However, the error message indicates that there is a mix of continuous and discrete variables. Is there a way to force all variables to be continuous? I have edited the dataset so all data elements have at least one decimal place.
The parser included in SMILE only looks at the actual values in the parsed file, not at the notation (so if a column contains 1.0, 2.0 and 3.0, it will still be considered discrete). You'll need to copy columns, or the entire dataset, to ensure all columns are continuous.

Post by Greg »

pc.h is 720 bytes and dated 4/16/09 01:43:17 PM

I'm afraid I don't understand what you mean about copying columns or the dataset to ensure all columns are continuous.

Post by shooltz[BayesFusion] »

Greg wrote:I'm afraid I don't understand what you mean about copying columns or the dataset to ensure all columns are continuous.
After loading the data file, iterate over the dataset columns and call DSL_dataset::IsDiscrete for each one. If IsDiscrete returns true, create a new dataset column with AddFloatVar and copy the values from the discrete column into the newly created continuous column. When the copy is complete, delete the discrete column with DSL_dataset::RemoveVar.

The copy can be done within a single dataset object, or you can create a second dataset.
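The copy procedure can be illustrated on a plain in-memory table. The Column struct and makeAllContinuous below are hypothetical and do not use the SMILE API. They only mirror the logic: find each discrete column, build a floating-point copy of its values, and replace the original with the copy.

```cpp
#include <vector>

// Hypothetical in-memory column, NOT a SMILE type.
struct Column {
    bool discrete;               // true if the column holds integer values
    std::vector<int> ints;       // values when discrete
    std::vector<double> floats;  // values when continuous
};

// Replaces every discrete column with a continuous copy of the same values.
// Returns the number of columns that were converted.
int makeAllContinuous(std::vector<Column>& data) {
    int converted = 0;
    for (Column& col : data) {
        if (!col.discrete) continue;  // already continuous, nothing to do
        Column copy;
        copy.discrete = false;
        // Each int value is converted to double during the copy.
        copy.floats.assign(col.ints.begin(), col.ints.end());
        col = copy;                   // drop the discrete original
        ++converted;
    }
    return converted;
}
```

With a real DSL_dataset, the same loop would add a float variable, copy the values element by element, and then remove the discrete variable, as described in the post.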

Post by Greg »

I've got the columns copied to continuous variables. Now I get a "pc: constant not allowed" error. Do I have a column with too little variation? I've tried adjusting the pc significance value with no success.

Thanks,
Greg

Post by shooltz[BayesFusion] »

Greg wrote:I've got the columns copied to continuous variables. Now I get a "pc: constant not allowed" error. Do I have a column with too little variation?
This error message is generated when one or more of the columns contain exactly one distinct value, that is, when a column is constant.
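One way to find the offending columns before calling the learner is to scan the data for columns whose values are all identical. This is a minimal sketch on plain vectors. The function name constantColumns and the column-major layout are assumptions for illustration, not part of SMILE.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of columns in which every value is identical.
// data[c] holds the values of column c; empty columns are skipped.
std::vector<std::size_t> constantColumns(
        const std::vector<std::vector<double>>& data) {
    std::vector<std::size_t> result;
    for (std::size_t c = 0; c < data.size(); ++c) {
        const std::vector<double>& col = data[c];
        if (col.empty()) continue;
        bool constant = true;
        for (double v : col) {
            if (v != col.front()) {  // found a second distinct value
                constant = false;
                break;
            }
        }
        if (constant) result.push_back(c);
    }
    return result;
}
```

Any column reported by such a check would need to be removed from the dataset, or the dataset extended with more varied records, before learning.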

Post by Greg »

Okay, when I use a different dataset, or a concatenation of several datasets, DSL_pattern::Print produces the following output:

None: 0
Undirected: 1
Directed: 2

And DSL_pc::Learn doesn't produce a valid network although no errors are produced/detected.

Post by shooltz[BayesFusion] »

Greg wrote:Okay, when I use a different dataset, or a concatenation of several datasets, DSL_pattern::Print produces the following output:

None: 0
Undirected: 1
Directed: 2
The output above is just a legend. If the pattern contains anything, the legend is followed by a series of numbers (0/1/2) representing the relationships between vertices in the pattern. What does DSL_pattern::GetSize return?

Post by Greg »

Perhaps I am confused about how to use the pattern, dataset and network. After a dataset has been created, I have:

if (!pattern.ToDAG(d, result))
    cout << "\tDSL_pattern::ToDAG returned false!" << endl;

pattern.Print();

cout << "Starting learning... ";

if (error_code = pc.Learn(d,pattern)!=DSL_OKAY)
{
    cout << "Learning failed with error " << error_code << "." << endl;
    return -1;
}

cout << "Learning complete!" << endl;

where result is a DSL_network. Am I using something wrong or forgetting something?

Post by shooltz[BayesFusion] »

ToDAG takes the dataset as input, but only uses it to initialize node identifiers (the actual numeric values from the data are not used). You should follow these steps:

1) obtain a dataset (parse the file, optionally ensure that all columns are continuous)
2) pass the dataset to DSL_pc::Learn. If DSL_pc::Learn returns DSL_OKAY, you should have something in the learned pattern.
3) try DSL_pattern::ToDAG. This may return false (ToDAG doesn't remove arcs between pattern vertices, and it doesn't convert bidirectional arcs to unidirectional ones). In that case, you'll need to ensure in your own code that the pattern is directed and acyclic, or provide some kind of user interface for this manipulation.

Post by Greg »

When I invoke ToDAG after DSL_pc::Learn, DSL_pattern::ToDAG returns false, but I get the following output from DSL_pattern::Print, which shows no undirected arcs.

None: 0
Undirected: 1
Directed: 2
0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
0 0 2 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 0 2 2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2 0
0 0 0 0 0 0 2 0 0 0 0 0 2 0 0 0 0 2 0 0 0 0 2 2 0 0 0 0 0 0
0 0 0 0 0 2 0 0 0 2 0 0 2 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 2 2 0 2 0 0 0 0
0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 2
0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0
0 2 0 0 2 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 2 0 0 2 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 2 2 2 0 2 0 0 0 0 0 0 0 0 0 2 0 0 0 2
0 0 0 0 0 0 0 0 0 0 2 0 0 2 2 0 0 2 0 0 2 2 0 0 0 0 0 0 0 0
0 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0
0 0 0 0 0 2 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 2 0 0 2 0 2 0 0
0 0 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 2 0 0 0 0 0 2
0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 2 0 2 0 0 2 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 0 0 0 2 0
0 0 0 2 0 0 0 2 2 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 2 0 0 0
0 0 2 0 0 2 0 0 2 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0
0 0 0 0 0 2 0 2 2 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2 0 0 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2 0 0 2 0 2 2 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 2 0 0 2 0 2 2 2 0 2 0 0 0 0
0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 2 0
0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0
2 0 2 0 0 0 0 0 0 2 0 0 0 0 2 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0

Post by shooltz[BayesFusion] »

BTW, this...
if (error_code = pc.Learn(d,pattern)!=DSL_OKAY)
... should be changed to this - notice the extra parentheses:

if ((error_code = pc.Learn(d,pattern))!=DSL_OKAY)
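
The reason the parentheses matter is that in C++ the != operator binds more tightly than =, so the unparenthesized form stores the result of the comparison (0 or 1) rather than the returned code. That would be consistent with the value 1 reported earlier in the thread. A minimal sketch, where failingCall and kOkay are hypothetical stand-ins for DSL_pc::Learn and DSL_OKAY:

```cpp
// Hypothetical stand-ins: a call that fails with a negative error code,
// and the "success" constant it is compared against.
const int kOkay = 0;
int failingCall() { return -1; }

// Without parentheses: parsed as code = (failingCall() != kOkay),
// so code receives the boolean result of the comparison.
int codeWithoutParens() {
    int code = 0;
    code = failingCall() != kOkay;
    return code;
}

// With parentheses: the returned error code is stored first,
// and only then compared against kOkay.
int codeWithParens() {
    int code = 0;
    bool failed = (code = failingCall()) != kOkay;
    (void)failed;  // the comparison result is not needed in this sketch
    return code;
}
```

Here codeWithoutParens() yields 1 and codeWithParens() yields -1, even though both wrap the same failing call.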

Post by shooltz[BayesFusion] »

Greg wrote:When I invoke ToDAG after DSL_pc::Learn, DSL_pattern::ToDAG returns false, but I get the following output from DSL_pattern::Print, which shows no undirected arcs.
A pattern that contains only directed arcs is not necessarily a directed acyclic graph. In other words, it may contain a directed cycle. In that case, ToDAG returns false.
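
To check a printed pattern for directed cycles outside SMILE, the matrix can be fed to a standard topological-sort test (Kahn's algorithm). The sketch below assumes that the matrix is square and that an entry of 2 at row i, column j means a directed arc from vertex i to vertex j. That reading of the DSL_pattern::Print output is an assumption and is not confirmed in this thread. hasDirectedCycle is a hypothetical helper, not a SMILE function.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Assumption: m is square and m[i][j] == 2 encodes a directed arc i -> j.
// Returns true if the directed arcs contain at least one cycle.
bool hasDirectedCycle(const std::vector<std::vector<int>>& m) {
    const std::size_t n = m.size();
    std::vector<int> indegree(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (m[i][j] == 2) ++indegree[j];

    // Kahn's algorithm: repeatedly remove vertices with no incoming arcs.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (indegree[i] == 0) ready.push(i);

    std::size_t removed = 0;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        ++removed;
        for (std::size_t j = 0; j < n; ++j)
            if (m[i][j] == 2 && --indegree[j] == 0) ready.push(j);
    }
    // Vertices that could never be removed lie on, or behind, a cycle.
    return removed != n;
}
```

Under this reading, a pair of 2 entries in mirrored positions (i, j) and (j, i) already counts as a two-vertex cycle, which is one way a matrix with no 1 entries can still fail to be a DAG.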