Accessing data

GeNIe can access data from three sources: text files, ODBC databases, and the native GeNIe data format. The following three sections describe each of them in turn.

Text Format (*.txt, *.dat, *.csv)

The simplest data format used by GeNIe is the text format. Data in the text format consist of records, one per row, in which values are separated by commas (*.csv format) or TAB characters (*.txt and *.dat formats). The first row in the data file contains variable IDs. Each ID has to start with a letter, which may be followed by letters, digits, and underscore characters. Letters are a-z and A-Z, but also all Unicode characters above code point 127, which allows for using characters from alphabets other than Latin. The popular CSV format (used by, among others, Microsoft Excel) conforms to this standard. To access data stored in a text file, select File-Open Data File...
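The ID rule above can be checked before loading a file. The following is a minimal sketch, not GeNIe's own validation code: the function names is_valid_id and invalid_header_ids are hypothetical, and the header is split on the delimiter only, on the assumption that the header fields are not quoted.

```python
import re

# Letters per the rule above: a-z, A-Z, and any character above code point 127.
_LETTER = r"A-Za-z\u0080-\U0010FFFF"
# First character must be a letter; the rest may be letters, digits, or underscores.
_ID_PATTERN = re.compile(rf"[{_LETTER}][0-9_{_LETTER}]*")


def is_valid_id(name: str) -> bool:
    """Return True if name satisfies the variable ID rule (hypothetical helper)."""
    return _ID_PATTERN.fullmatch(name) is not None


def invalid_header_ids(header_line: str, delimiter: str = ",") -> list[str]:
    """Return the header fields that break the ID rule.

    Assumption: fields are split on the delimiter alone, with no quote handling.
    Use delimiter="\t" for *.txt and *.dat files.
    """
    fields = header_line.rstrip("\r\n").split(delimiter)
    return [field for field in fields if not is_valid_id(field)]
```

For example, invalid_header_ids("age,2nd,income") flags "2nd", because an ID may not start with a digit.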

file_menu

Subsequently, select the data file that you wish to load.

open_data_file_dialog

Data, once loaded, should look as follows:

data_spreadsheet

ODBC Data

ODBC (Open DataBase Connectivity) is a standard application programming interface (API) for accessing database management systems (DBMS). ODBC is independent of the details of any particular database system or operating system. GeNIe implements the ODBC standard, which allows it to connect to most DBMS. In this section, we will open a Microsoft Access database. To access the data from a database, select File-Import ODBC Data..., which will open the Select Data Source dialog.

select_data_source_dialog

If you have never created a data source before, you will have to create a new one. The most convenient choice is a Machine Data Source, which covers all files originating from a given Windows application. We will create a data source for Microsoft Access.

select_data_source_dialog2

We select Microsoft Access Database and press OK. GeNIe will display a dialog box for selecting the database, which should look as follows:

select_database_dialog

We will open the netflix.mdb database. The ensuing dialog shows the tables (or views, if you select the Views tab) present in the database. Table MovieGenres contains two variables, movie and genre.

import_odbc_data

You can select a table or a view, or create a new table through an SQL query typed in the SQL Query tab.

import_odbc_data3

Pressing OK runs the query and opens the result in GeNIe:

sql_query_results
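To illustrate the kind of statement that could be typed in the SQL Query tab, here is a small sketch. It does not use ODBC or Microsoft Access: an in-memory SQLite database stands in for netflix.mdb, the column types and the three rows are invented for illustration, and only the table name MovieGenres and the column names movie and genre come from the text above.

```python
import sqlite3

# Stand-in for the Access database: an in-memory SQLite database.
# Column types and rows are assumptions made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieGenres (movie INTEGER, genre TEXT)")
conn.executemany(
    "INSERT INTO MovieGenres (movie, genre) VALUES (?, ?)",
    [(1, "Drama"), (2, "Comedy"), (3, "Drama")],
)

# A query of the kind one might enter in the SQL Query tab:
# keep only the rows of one genre.
query = "SELECT movie, genre FROM MovieGenres WHERE genre = 'Drama' ORDER BY movie"
rows = conn.execute(query).fetchall()
conn.close()
```

The result of such a query, rather than a whole table or view, is what would then be loaded as the data set.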

GeNIe Data Format (*.gdat format)

GeNIe allows for saving data in a binary internal format that we call GeNIe Data Format (*.gdat). The biggest advantage of this format is that it stores all useful information, such as the original values in the data, the replaced missing values, discretization information, and even column widths. Because the format includes the original data, it is always possible to reverse all data preprocessing operations, such as discretization. To save your data in GeNIe Data Format, select File-Save As...

data_save_as_dialog

 

Once a data file has been loaded into GeNIe, the type of each column is fixed and cannot be changed. GeNIe uses a database program to store the data, and this program checks the type of each column. Once the type has been set, it stays the same for the duration of the session. If all values in a column are numbers, the column type is numerical. When learning, you control whether a variable is treated as continuous or discrete by setting the Discrete threshold in the learning dialog. When the number of distinct values in a column exceeds the threshold, the column is treated as continuous.
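The two rules above can be sketched as follows. This is an illustration under stated assumptions, not GeNIe's implementation: the function names are hypothetical, "is a number" is approximated by Python's float() parsing, missing values are not handled, and the threshold is applied only to numerical columns, which the text does not state explicitly.

```python
def is_number(text: str) -> bool:
    """Approximate 'is a number' with float() parsing (an assumption)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_column(values: list[str], discrete_threshold: int) -> str:
    """Classify a column as 'non-numerical', 'discrete', or 'continuous'.

    A column is numerical only if all of its values are numbers. A numerical
    column is continuous when its number of distinct values exceeds the
    threshold, and discrete otherwise.
    """
    if not values or not all(is_number(v) for v in values):
        return "non-numerical"
    distinct = len({float(v) for v in values})
    return "continuous" if distinct > discrete_threshold else "discrete"
```

With a threshold of 3, a column holding only the values 1 and 2 would be treated as discrete, while a column with four distinct values would be treated as continuous.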