GeNIe can access data from three sources: text files, ODBC databases, and the native GeNIe data format. These are the subjects of the following three sections.
The simplest data format used by GeNIe is the text format. Data in the text format consist of rows of records, in which values are separated by commas (*.csv format) or TAB characters (*.txt and *.dat formats). The first row in the data file contains variable IDs. Each ID has to start with a letter, which may be followed by letters, digits, and underscore characters. Letters are a-z and A-Z but also all Unicode characters above code point 127, which allows using characters from alphabets other than Latin. The popular CSV format (used, among others, by Microsoft Excel) conforms to this standard. To access data stored in a text file, select File-Open Data File...
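Before loading a text file, it can be handy to check its header row against the ID rule described above. The following is a minimal sketch, not part of GeNIe: the helper name invalid_ids and the regular expression are assumptions that approximate the rule as stated (digits are taken to mean 0-9), and GeNIe's own parser may differ in details.

```python
import csv
import io
import re

# Hypothetical helper, not part of GeNIe: approximates the ID rule from the text.
# First character: a-z, A-Z, or any character above code point 127.
# Remaining characters: the same letters, digits 0-9 (assumed), or underscore.
_ID_PATTERN = re.compile(
    r"[A-Za-z\u0080-\U0010FFFF][A-Za-z0-9_\u0080-\U0010FFFF]*"
)


def invalid_ids(header_line, delimiter=","):
    """Return the column IDs in header_line that break the naming rule."""
    reader = csv.reader(io.StringIO(header_line), delimiter=delimiter)
    ids = next(reader, [])  # an empty header yields no IDs
    return [name for name in ids if not _ID_PATTERN.fullmatch(name)]
```

For example, invalid_ids("Age,Income_2,3rd") reports only 3rd, because it starts with a digit. For TAB-separated files, pass delimiter="\t".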
Subsequently, select the data file that you wish to load.
Data, once loaded, should look as follows:
ODBC (Open DataBase Connectivity) is a standard application programming interface (API) for accessing database management systems (DBMS). ODBC is independent of the details of any particular database system and operating system. GeNIe implements the ODBC standard, which allows it to connect to most DBMS. In this section, we will open a Microsoft Access database. To access the data from a database, select File-Import ODBC Data..., which will open the Select Data Source dialog.
If you have never created a data source before, you will have to create a new one. It is most convenient to create a Machine Data Source, which covers all files originating from a given Windows application. We will create a data source for Microsoft Access.
We select Microsoft Access Database and press OK. GeNIe will display a dialog box for selecting data, which should look as follows:
We will open the netflix.mdb database. The ensuing dialog shows the tables (or views, if you select the Views tab) present in the database. Table MovieGenres contains two variables, movie and genre.
You can select a table or a view, or create a new table through an SQL query typed in the SQL Query tab.
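To give an idea of the kind of statement that could be typed in the SQL Query tab, here is a small sketch that runs a query against a stand-in table. It uses Python's built-in sqlite3 module with an in-memory table instead of the Access database; only the table name MovieGenres and the column names movie and genre come from the example above, while the rows and the query itself are invented for illustration. The SQL dialect accepted by your DBMS may differ.

```python
import sqlite3

# Stand-in for the Access database: an in-memory SQLite table with the same
# table and column names as MovieGenres. The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieGenres (movie TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO MovieGenres (movie, genre) VALUES (?, ?)",
    [("m1", "Comedy"), ("m2", "Drama"), ("m3", "Comedy")],
)

# The kind of statement one might type in the SQL Query tab (assumed example):
# count how many movies fall into each genre.
query = (
    "SELECT genre, COUNT(*) AS n_movies "
    "FROM MovieGenres GROUP BY genre ORDER BY genre"
)
rows = conn.execute(query).fetchall()
conn.close()
```

Here rows holds one record per genre with its movie count, which is the shape of table that would be handed over as a new data set.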
Pressing OK runs the query and opens the result in GeNIe:
GeNIe allows saving data in a binary internal format called GeNIe Data Format (*.gdat). The biggest advantage of this data format is that it preserves all useful information, such as the original values in the data, the replaced missing values, discretization information, and even column widths. Because the format includes the original data, it is always possible to reverse all data preprocessing operations, such as discretization. To save your data in GeNIe Data Format, select File-Save As...
Once a data file has been loaded into GeNIe, the column types are fixed and cannot be changed. GeNIe uses a database program to keep and store data, and it checks the type of each column. Once a type has been set, it stays the same for the duration of the session. If all values in a column are numbers, the column's type is numerical. When learning, you control whether a variable is judged continuous or discrete by setting the Discrete threshold in the learning dialog. When the number of different values in a column exceeds the threshold value, the column is judged to be continuous.
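The threshold rule can be illustrated with a short sketch. This is not GeNIe's implementation: the function name judge_column is hypothetical, and ignoring missing values (represented here as None) when counting distinct values is an assumption.

```python
def judge_column(values, discrete_threshold):
    """Classify a column by its number of distinct values (illustrative only)."""
    # Assumption: missing values, represented here as None, are not counted.
    distinct = {v for v in values if v is not None}
    # The column is continuous only when the count strictly exceeds the threshold.
    if len(distinct) > discrete_threshold:
        return "continuous"
    return "discrete"
```

With a threshold of 3, a column holding the values 1, 2, 3, 1, 2 has three distinct values and is judged discrete, while a column holding 0.1, 0.2, 0.3, 0.4 has four and is judged continuous.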