Discrete and continuous variables

If the data for learning or validation comes from a source other than a text file, you will need to programmatically initialize the structure and the contents of the data set. Consider the example data set from the previous chapter: it had three variables, of which the first was continuous and the other two were discrete. The code to create the structure of this data set looks as follows:

DSL_dataset ds;
ds.AddFloatVar("Var1");
ds.AddIntVar("Var2");
ds.AddIntVar("Var3");
vector<string> stateNames({"StateX", "StateY", "StateZ"});
ds.SetStateNames(ds.FindVariable("Var3"), stateNames);

Here, DSL_dataset::FindVariable returns the index of the variable with the given identifier. Because the variables were added just before the call to DSL_dataset::SetStateNames, the example could also use a hard-coded index (2 for Var3).

You can set the number of records in the data set up front with a call to DSL_dataset::SetNumberOfRecords, or call AddEmptyRecord once for each record that you append later. The following code appends one record and fills it in:

ds.AddEmptyRecord();
int recIdx = ds.GetNumberOfRecords() - 1;
ds.SetFloat(0, recIdx, 44.225);
ds.SetInt(1, recIdx, 3);
ds.SetInt(2, recIdx, 2);

Use DSL_dataset::SetFloat for continuous variables and SetInt for discrete ones. In the example above, the last variable has associated state names, but they are not used for data entry: the value is still entered as an integer, and 2 is the index of StateZ.
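Because data entry uses integer values, code that receives state labels as text has to translate them to indices first. The following is a minimal sketch of such a translation. stateIndex is a hypothetical helper, not part of SMILE, and it assumes that the integer stored for a variable is the zero-based position of the label in the vector passed to SetStateNames.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not a SMILE function): returns the zero-based position
// of `name` in `stateNames`, or -1 when the label is not present.
int stateIndex(const std::vector<std::string> &stateNames, const std::string &name)
{
    auto it = std::find(stateNames.begin(), stateNames.end(), name);
    if (it == stateNames.end())
        return -1;
    return static_cast<int>(it - stateNames.begin());
}
```

With the data set built earlier, the last call could then be written as ds.SetInt(2, recIdx, stateIndex(stateNames, "StateZ")), after checking that the result is not -1.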

To mark an element of the variable as missing, use DSL_dataset::SetMissing. After a call to AddEmptyRecord, all elements of the last record in the data set are missing.
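Sizing the data set up front and marking missing elements can be combined when copying a column from memory. The sketch below is an illustration under stated assumptions, not SMILE code. copyColumn is a hypothetical helper, and NaN is assumed to be the caller's marker for an absent value. The (var, rec) argument order of SetMissing and the single integer argument of SetNumberOfRecords are assumptions modeled on SetFloat and IsMissing. ToyDataset is a stand-in that mimics only the calls used here so that the example compiles without SMILE; with the real library, a DSL_dataset with a continuous variable already added would be passed instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for illustration only; it mimics the few dataset calls used below
// and holds a single continuous column.
struct ToyDataset
{
    std::vector<double> values;  // element values
    std::vector<bool> missing;   // parallel "is missing" flags

    void SetNumberOfRecords(int n)
    {
        values.assign(n, 0.0);
        missing.assign(n, true);
    }
    void SetFloat(int /*var*/, int rec, double x) { values[rec] = x; missing[rec] = false; }
    void SetMissing(int /*var*/, int rec) { missing[rec] = true; }
    bool IsMissing(int /*var*/, int rec) const { return missing[rec]; }
    double GetFloat(int /*var*/, int rec) const { return values[rec]; }
    int GetNumberOfRecords() const { return static_cast<int>(values.size()); }
};

// Hypothetical helper: size the data set once, then copy one continuous column.
// NaN in the input is treated as "no value" and stored as a missing element.
template <class Dataset>
void copyColumn(Dataset &ds, int var, const std::vector<double> &column)
{
    ds.SetNumberOfRecords(static_cast<int>(column.size()));
    for (std::size_t r = 0; r < column.size(); r++)
    {
        int rec = static_cast<int>(r);
        if (std::isnan(column[r]))
            ds.SetMissing(var, rec);
        else
            ds.SetFloat(var, rec, column[r]);
    }
}
```

The helper is a template so that it depends only on the member function names, not on a particular dataset class.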

The following code snippet displays the contents of a data set. It uses DSL_dataset::IsDiscrete to determine the type of the variable. For discrete variables, the state names vector returned by DSL_dataset::GetStateNames is also checked. If the vector is empty, no strings are associated with the variable's integer values, and the integer itself is printed.

int varCount = ds.GetNumberOfVariables();
int recCount = ds.GetNumberOfRecords();
// Print one comma-separated line per record.
for (int r = 0; r < recCount; r++)
{
    for (int v = 0; v < varCount; v++)
    {
        if (v > 0) printf(",");
        if (ds.IsMissing(v, r))
        {
            printf("N/A");
        }
        else if (ds.IsDiscrete(v))
        {
            int x = ds.GetInt(v, r);
            const vector<string> &states = ds.GetStateNames(v);
            if (states.empty())
            {
                // No state names: print the integer value itself.
                printf("%d", x);
            }
            else
            {
                // State names available: the integer is an index into them.
                printf("%s", states[x].c_str());
            }
        }
        else
        {
            // Continuous variable.
            printf("%f", ds.GetFloat(v, r));
        }
    }
    printf("\n");
}
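The snippet above indexes the state names with the stored integer directly, which presumes that every value lies within the vector. A more defensive formatting step might look like the sketch below. formatDiscrete is a hypothetical helper, not a SMILE function, and falling back to the raw number for out-of-range values is only one possible policy.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a SMILE function): returns the state name for `x`
// when `x` is a valid index into `states`, otherwise the number as text.
// An empty `states` vector therefore always yields the number.
std::string formatDiscrete(int x, const std::vector<std::string> &states)
{
    if (x >= 0 && x < static_cast<int>(states.size()))
        return states[x];
    return std::to_string(x);
}
```

In the display loop, the discrete branch could then print formatDiscrete(x, states).c_str() with a single "%s" format.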