Continuous/Discrete

borisrabin
Posts: 24
Joined: Thu Sep 30, 2010 7:48 pm

Continuous/Discrete

Post by borisrabin »

Hello,

How does SMILE's algorithm determine whether a column's type is continuous or discrete?

Thanks,
Boris
shooltz[BayesFusion]
Site Admin
Posts: 1457
Joined: Mon Nov 26, 2007 5:51 pm

Re: Continuous/Discrete

Post by shooltz[BayesFusion] »

The variables (data columns) in a DSL_dataset can be created with AddIntVar or AddFloatVar; this automatically marks the column as discrete or continuous, respectively. The parser used by DSL_dataset::ReadFile treats a column as continuous when at least one of its values is non-integer.
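
As a rough illustration of the ReadFile rule described above, here is a minimal Python sketch. It is not SMILE's parser: the function name `infer_column_type` is hypothetical, and treating every token that `int()` rejects as non-integer (so "1.0" counts as non-integer) is an assumption made for this example. Missing values are not handled.

```python
def infer_column_type(tokens):
    """Return 'continuous' if at least one token is not an integer literal,
    otherwise 'discrete'. Hypothetical helper, not part of SMILE."""
    for token in tokens:
        try:
            int(token)
        except ValueError:
            # Not an integer literal. float() raises ValueError if the
            # token is not numeric at all, which this sketch does not catch.
            float(token)
            return "continuous"
    return "discrete"


print(infer_column_type(["1", "2", "3"]))    # discrete
print(infer_column_type(["1", "2.5", "3"]))  # continuous
```

Under this rule a single non-integer value is enough to make the whole column continuous, while the number of distinct integers plays no role.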

Re: Continuous/Discrete

Post by borisrabin »

shooltz wrote:The parser used by DSL_dataset::ReadFile treats the column as continuous when at least one value is non-integer.
Suppose the column contains only integers: 1, 2, 3, 4, 5, ..., 50, ..., 1000. Is this column considered continuous or discrete?

Thanks,
Boris

Re: Continuous/Discrete

Post by shooltz[BayesFusion] »

borisrabin wrote:Suppose the column contains only integers: 1, 2, 3, 4, 5, ..., 50, ..., 1000. Is this column considered continuous or discrete?
If you're asking about the type of the column as inferred by DSL_dataset::ReadFile, then the answer is 'discrete'.

Re: Continuous/Discrete

Post by borisrabin »

shooltz wrote:
borisrabin wrote:Suppose the column contains only integers: 1, 2, 3, 4, 5, ..., 50, ..., 1000. Is this column considered continuous or discrete?
If you're asking about the type of the column as inferred by DSL_dataset::ReadFile, then the answer is 'discrete'.
What is the "Discrete threshold" feature in GeNIe?
Is this feature enabled in SMILE with some default value?

Thanks,
Boris

Re: Continuous/Discrete

Post by shooltz[BayesFusion] »

borisrabin wrote:What is the "Discrete threshold" feature in GeNIe?
If the number of unique integer values in a data column is above the 'discrete threshold', the column is considered continuous. For example, the 'salar' column in retention.txt will be considered continuous despite containing only integer values.
borisrabin wrote:Is this feature enabled in SMILE with some default value?
No, this is a GeNIe feature; however, GeNIe uses the publicly available SMILearn API (DSL_dataset) to implement it. After the data is loaded into the dataset, but before learning starts, the discrete columns are checked against the 'discrete threshold' and, if required, converted to continuous.
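
The post-load check described above could look roughly like the following Python sketch. It works on plain lists rather than on a DSL_dataset, and everything named here (`apply_discrete_threshold`, the layout of the `columns` dict, the sample column names, and the threshold value of 20) is hypothetical. GeNIe's actual default threshold is not stated in this thread.

```python
def apply_discrete_threshold(columns, threshold):
    """Convert integer columns with more than `threshold` unique values to float.

    `columns` maps a column name to a list of ints (discrete) or floats
    (continuous). Returns a new dict; the input is left unchanged.
    """
    result = {}
    for name, values in columns.items():
        is_discrete = all(isinstance(v, int) for v in values)
        if is_discrete and len(set(values)) > threshold:
            # Too many distinct integers: treat the column as continuous.
            result[name] = [float(v) for v in values]
        else:
            result[name] = list(values)
    return result


data = {
    "grade": [1, 2, 3, 2, 1],        # 3 unique values, stays discrete
    "amount": list(range(1, 1001)),  # 1000 unique values, becomes continuous
}
converted = apply_discrete_threshold(data, threshold=20)
print(type(converted["grade"][0]).__name__)   # int
print(type(converted["amount"][0]).__name__)  # float
```

The pass runs once between loading and learning, so a column that the file parser inferred as discrete (such as 1 through 1000) can still end up continuous.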