DSL_dataset
Header file: dataset.h
DSL_dataset();
DSL_dataset(const DSL_dataset &src);
DSL_dataset& operator=(const DSL_dataset &src);
~DSL_dataset();
Default constructor, copy constructor, assignment operator, and destructor are defined.
int ReadFile(const std::string &filename,
             const DSL_datasetParseParams *params = NULL,
             std::string *errOut = NULL);
Reads the contents of the data set from a text file. Returns DSL_OKAY on success or an error code on failure. If errOut is not NULL, it receives additional information about the error.
The parser reads the first line of the file and searches it for the following separator characters, in this order: tab, comma, semicolon, space. The first candidate that occurs in the line is used as the separator; for example, if the line contains both commas and spaces, the comma is chosen.
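The separator rule can be illustrated with a short sketch. DetectSeparator is a hypothetical helper written for this page, not a SMILE function; it also assumes that a line containing none of the candidates belongs to a single-column file and returns '\0' in that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of SMILE) mirroring the documented rule:
// candidates are checked in priority order (tab, comma, semicolon, space)
// and the first one that occurs anywhere in the header line is chosen.
char DetectSeparator(const std::string &firstLine)
{
    const char candidates[] = { '\t', ',', ';', ' ' };
    for (char c : candidates)
    {
        if (firstLine.find(c) != std::string::npos)
            return c;
    }
    // Assumption: no candidate found means a single-column file.
    return '\0';
}
```

Because the candidates are checked in priority order, a header such as `Age;Blood pressure` is split on the semicolon, and the space inside the second identifier is kept.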
The types of data set variables are determined as follows:
•If the data column in the file contains non-numeric entries, the corresponding data set variable is string discrete.
•If the data column in the file contains only numeric entries and there is at least one fractional value, the corresponding data set variable is numeric continuous.
•Otherwise, the corresponding data set variable is numeric discrete.
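The sketch below applies the three type rules to one column of raw text tokens. ColumnType and InferColumnType are hypothetical names; the sketch assumes missing-value tokens have already been removed, and it treats a value as fractional when it has a non-zero fractional part. Whether SMILE classifies a token such as 1.0 as fractional is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical classification mirroring the three documented rules.
enum class ColumnType { StringDiscrete, NumericContinuous, NumericDiscrete };

// Returns true if the whole token parses as a number.
static bool ParseNumber(const std::string &token, double &value)
{
    if (token.empty()) return false;
    char *end = nullptr;
    value = std::strtod(token.c_str(), &end);
    return *end == '\0';
}

// Assumption: missing-value tokens were filtered out by the caller, and
// "fractional" means a non-zero fractional part (2.5, but not 1.0).
ColumnType InferColumnType(const std::vector<std::string> &column)
{
    bool fractional = false;
    for (const std::string &token : column)
    {
        double v = 0.0;
        if (!ParseNumber(token, v))
            return ColumnType::StringDiscrete;   // any non-numeric entry wins
        if (v != std::floor(v))
            fractional = true;                   // at least one fractional value
    }
    return fractional ? ColumnType::NumericContinuous
                      : ColumnType::NumericDiscrete;
}
```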
To customize parsing, pass a pointer to a DSL_datasetParseParams structure, which is declared in dataset.h as:
struct DSL_datasetParseParams
{
    DSL_datasetParseParams() :
        missingValueToken("*"),
        missingInt(DSL_MISSING_INT),
        missingFloat(DSL_MISSING_FLOAT),
        columnIdsPresent(true) {}
    std::string missingValueToken;
    int missingInt;
    float missingFloat;
    bool columnIdsPresent;
};
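A minimal sketch of how the missing-value fields are likely applied while reading a field: a token equal to missingValueToken is stored as the missing sentinel, and any other token is converted normally. This reading is an assumption based on the field names, not documented behavior. ParseParamsSketch, ReadIntField and ReadFloatField are hypothetical, and the sentinels -1 and -1.0f are placeholders rather than the real DSL_MISSING_INT and DSL_MISSING_FLOAT values.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for DSL_datasetParseParams. The sentinel values are
// placeholders, NOT the real DSL_MISSING_INT / DSL_MISSING_FLOAT constants.
struct ParseParamsSketch
{
    std::string missingValueToken = "*";
    int missingInt = -1;          // placeholder sentinel
    float missingFloat = -1.0f;   // placeholder sentinel
};

// Assumed behavior: the missing token maps to the sentinel, anything else
// is converted with the standard library (throws on malformed input).
int ReadIntField(const std::string &field, const ParseParamsSketch &p)
{
    return field == p.missingValueToken ? p.missingInt : std::stoi(field);
}

float ReadFloatField(const std::string &field, const ParseParamsSketch &p)
{
    return field == p.missingValueToken ? p.missingFloat : std::stof(field);
}
```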
int WriteFile(const std::string &filename,
              const DSL_datasetWriteParams *params = NULL,
              std::string *errOut = NULL) const;
Writes the contents of the data set to a text file. Returns DSL_OKAY on success or an error code on failure. If errOut is not NULL, it receives additional information about the error.
To customize the output, pass a pointer to a DSL_datasetWriteParams structure, which is declared in dataset.h as:
struct DSL_datasetWriteParams
{
    DSL_datasetWriteParams() :
        missingValueToken("*"),
        columnIdsPresent(true),
        useStateIndices(false),
        separator('\t'),
        floatFormat("%g") {}
    std::string missingValueToken;
    bool columnIdsPresent;
    bool useStateIndices;
    char separator;
    std::string floatFormat;
};
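The sketch below shows how the separator, missingValueToken and floatFormat options could shape one output line for continuous variables. FormatRow is a hypothetical helper, NaN stands in for a missing value purely for illustration, and useStateIndices and columnIdsPresent are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch (not SMILE code): builds one line of output.
// Floats are printed with a printf-style floatFormat, missing values
// (represented here as NaN, an assumption) become missingValueToken,
// and fields are joined with the separator character.
std::string FormatRow(const std::vector<float> &values,
                      char separator = '\t',
                      const std::string &missingValueToken = "*",
                      const std::string &floatFormat = "%g")
{
    std::string line;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        if (i > 0) line += separator;
        if (std::isnan(values[i]))
        {
            line += missingValueToken;
        }
        else
        {
            char buf[64];
            // float is promoted to double, matching %g / %f conversions.
            std::snprintf(buf, sizeof(buf), floatFormat.c_str(), values[i]);
            line += buf;
        }
    }
    return line;
}
```

With the defaults, the values 1.5 and 2.0 are written as `1.5`, a tab, and `2`; passing `"%.2f"` as the format writes 3.14159 as `3.14`.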
int MatchNetwork(const DSL_network &net,
                 std::vector<DSL_datasetMatch> &matching,
                 std::string &errMsg);
Attempts to match the contents of the data set to the structure of the network passed as the first argument; this is typically done before parameter learning or network validation. Because the method may change the integer indices in the data set so that they correspond to the outcome ids of the network nodes, it is non-const.
On success, the method returns DSL_OKAY and fills the matching argument with a vector of DSL_datasetMatch objects. To match the network and the data successfully, at least one node and one data set variable must have identical identifiers, and
•either both the node and the data set variable are continuous, or
•both the node and the data set variable are discrete, and all values in the data set variable can be mapped onto node outcomes.
When the network and the data set cannot be matched, the method returns an error code and writes additional human-readable information about the error to the errMsg parameter.
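For discrete variables, the matching condition can be sketched as building an index remap from data set states to node outcomes. BuildStateRemap is hypothetical and only illustrates the rule; it does not reproduce MatchNetwork or the contents of DSL_datasetMatch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the discrete matching rule: every state name used by
// a data set variable must be one of the node's outcome ids. On success,
// remap[i] holds the node outcome index for data set state index i; applying
// such a remap is the kind of index change that makes MatchNetwork non-const.
bool BuildStateRemap(const std::vector<std::string> &dataStates,
                     const std::vector<std::string> &nodeOutcomes,
                     std::vector<int> &remap,
                     std::string &errMsg)
{
    remap.clear();
    for (const std::string &state : dataStates)
    {
        int found = -1;
        for (int j = 0; j < static_cast<int>(nodeOutcomes.size()); ++j)
        {
            if (nodeOutcomes[j] == state) { found = j; break; }
        }
        if (found < 0)
        {
            // Value cannot be mapped onto any node outcome: matching fails.
            errMsg = "value '" + state + "' is not an outcome of the node";
            return false;
        }
        remap.push_back(found);
    }
    return true;
}
```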
int AddIntVar(const std::string id = std::string(),
              int missingValue = DSL_MISSING_INT);
Adds a discrete integer variable to the data set. If you want to assign string values to the integer indices, call DSL_dataset::SetStateNames afterwards. Returns DSL_OKAY on success or an error code on failure.
Multiple variables with empty identifiers are allowed.
int AddFloatVar(const std::string id = std::string(),
                float missingValue = DSL_MISSING_FLOAT);
Adds a continuous, floating-point variable to the data set. Returns DSL_OKAY on success or an error code on failure.
Multiple variables with empty identifiers are allowed.
int RemoveVar(int var);
Removes the specified variable from the data set. Returns DSL_OKAY on success or an error code on failure.
void AddEmptyRecord();
Appends a record with all values missing.
void SetNumberOfRecords(int numRecords);
Sets the number of records in the data set. If the new number is greater than the current number, new records will have all values missing.
int RemoveRecord(int rec);
Removes the specified record from the data set. Returns DSL_OKAY on success or an error code on failure.
int FindVariable(const std::string &id) const;
Returns the index of the variable with the specified identifier, or a negative error code on failure.
int GetNumberOfVariables() const;
Returns the number of variables in the data set.
int GetNumberOfRecords() const;
Returns the number of records in the data set.
int GetInt(int var, int rec) const;
Returns the integer value stored in the specified variable and record.
float GetFloat(int var, int rec) const;
Returns the floating-point value stored in the specified variable and record.
void SetInt(int var, int rec, int value);
Sets the integer value in the specified variable and record.
void SetFloat(int var, int rec, float value);
Sets the floating-point value in the specified variable and record.
void SetMissing(int var, int rec);
Marks a data element in the specified variable and record as missing.
bool IsMissing(int var, int rec) const;
Returns true if the data element in the specified variable and record is missing.
int GetMissingInt(int var) const;
Returns an integer value representing missing data in the specified discrete variable.
float GetMissingFloat(int var) const;
Returns a float value representing missing data in the specified continuous variable.
bool IsDiscrete(int var) const;
Returns true if the specified variable is discrete.
enum DiscretizeAlgorithm { Hierarchical, UniformWidth, UniformCount };
int Discretize(int var, DiscretizeAlgorithm alg, int intervals,
               const std::string &statePrefix, std::vector<double> &edges);
int Discretize(int var, DiscretizeAlgorithm alg, int intervals,
               const std::string &statePrefix);
Discretizes the specified data set variable into the requested number of intervals using the selected algorithm. Returns DSL_OKAY on success or an error code on failure. The first overload also returns the discretization interval edges in the edges argument.
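For the two uniform algorithms, interval edges can be computed as in the sketch below. UniformWidthEdges and UniformCountEdges are hypothetical helpers, not SMILE functions. The sketch returns only the intervals - 1 inner edges, which may differ from what SMILE stores in edges, and it does not cover Hierarchical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical UniformWidth sketch: splits [min, max] into equal-width
// intervals and returns the intervals - 1 inner edges (an assumption about
// which edges are reported).
std::vector<double> UniformWidthEdges(const std::vector<double> &data, int intervals)
{
    assert(!data.empty() && intervals > 0);
    auto mm = std::minmax_element(data.begin(), data.end());
    double lo = *mm.first, hi = *mm.second;
    double width = (hi - lo) / intervals;
    std::vector<double> edges;
    for (int i = 1; i < intervals; ++i)
        edges.push_back(lo + i * width);
    return edges;
}

// Hypothetical UniformCount sketch: places each inner edge at a quantile
// position of the sorted data so that intervals hold roughly equal counts
// (ties in the data can make the counts unequal).
std::vector<double> UniformCountEdges(std::vector<double> data, int intervals)
{
    assert(!data.empty() && intervals > 0);
    std::sort(data.begin(), data.end());
    std::vector<double> edges;
    for (int i = 1; i < intervals; ++i)
    {
        std::size_t k = data.size() * static_cast<std::size_t>(i)
                        / static_cast<std::size_t>(intervals);
        edges.push_back(data[k]);
    }
    return edges;
}
```

For the values 1 to 8 split into 4 intervals, UniformCountEdges yields edges 3, 5 and 7, so each interval receives two values when the lower edge is inclusive.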