CSV data format

Training (parameter learning) requires a dataset in CSV format. This CSV file must be formatted as follows:

  • The first row of the CSV must be a header consisting of variable names

  • The subsequent rows of the CSV must be data points, where each value is one of the categories of the variable in that column

Additionally, the CSV data must conform to some basic rules to pass validation.

For example, the Sprinkler dataset has four variables: cloudy, rain, sprinkler, and wetGrass. Each variable takes one of the following categories:

  • cloudy: yes, no

  • rain: yes, no

  • sprinkler: on, off

  • wetGrass: yes, no
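
The two formatting requirements can be checked before uploading a file. The following is a minimal sketch in Python, not the validation that the product itself runs: the function name find_problems, the CATEGORIES mapping, and the exact checks (known header names, row length, category membership) are assumptions made for illustration.

```python
import csv
import io

# Categories for the Sprinkler variables, copied from the list above.
CATEGORIES = {
    "cloudy": {"yes", "no"},
    "rain": {"yes", "no"},
    "sprinkler": {"on", "off"},
    "wetGrass": {"yes", "no"},
}


def find_problems(csv_text, categories):
    """Return a list of human-readable problems found in csv_text.

    Hypothetical helper: the checks below are assumptions, not the
    official validation rules.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ["file is empty: no header row"]

    problems = []
    unknown = [name for name in header if name not in categories]
    if unknown:
        problems.append(f"unknown variable names in header: {unknown}")

    # Row numbers count the header as row 1.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} values, got {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if name in categories and value not in categories[name]:
                problems.append(
                    f"row {row_number}: {value!r} is not a category of {name}"
                )
    return problems
```

An empty list means the text passed these particular checks; any other rules enforced during validation are not covered by this sketch.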

Shown as a table, an example of this dataset would be:

cloudy  rain  sprinkler  wetGrass
no      no    off        no
yes     no    off        no
yes     yes   off        yes
yes     yes   off        yes
yes     yes   off        yes

In CSV format, this dataset would have the following form:

cloudy,rain,sprinkler,wetGrass
no,no,off,no
yes,no,off,no
yes,yes,off,yes
yes,yes,off,yes
yes,yes,off,yes
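
A file in this form can be read with any standard CSV parser. As a quick sanity check before training, the sketch below tallies how often each category appears in each column, using only Python's standard library. The function name count_categories and the file name sprinkler.csv in the usage note are hypothetical.

```python
import csv
import io
from collections import Counter


def count_categories(csv_text):
    """Count how often each value appears under each header name.

    Hypothetical helper for inspecting a dataset; it does not perform
    parameter learning itself.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # The first row supplies the variable names (the header).
    counts = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            counts[name][row[name]] += 1
    return counts
```

For example, calling count_categories(open("sprinkler.csv").read()) on a file saved with the contents above would return one Counter per variable, which makes a misspelled category easy to spot.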