Training file

The training file format

To train your model you need a CSV file containing an example for each row.

Each example contains a set of fields, some mandatory and some optional. The index of each field in the record, starting from 0, can be passed to the EZClassifier model train command through a set of command options.

If the index of a field does not exist in the record, or the field does not contain a valid value, the default value is used:

| field name | field description | required | type | default value | field index |
|---|---|---|---|---|---|
| prototype | a text that exemplifies a typical element of the category specified in the second field | YES | String | N.A. | --prototype-index=0 |
| class | a short text that represents a category name | YES | String | N.A. | --class-index=1 |
| weight | a multiplicative factor for predicted similarity | NO | positive real number | 1.0 | --weight-index |
| bias | an additive value for predicted similarity | NO | real number | 0.0 | --bias-index |

If a header row is present, remember to pass the --header option to the model train command.

Here is an example of a model training file to find strings about cats:

prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5

In this example:

  • the first row is the header (to be discarded using -H option during model training)
  • the second row assumes a weight = 1.0 and a bias = 0.0
  • the third row assumes a weight = 0.8 and a bias = 0.0
  • the fourth row assumes a weight = 0.8 and a bias = -0.5
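
The default-value rule can be checked against the example file with a short script. The following is only a minimal Python sketch and not part of EZClassifier: the function names `parse_float` and `load_examples` are hypothetical, and placing weight at index 2 and bias at index 3 is an assumption that matches this example file. Treating non-finite values as invalid is also an assumption.

```python
import csv
import io
import math

# The example training file shown above; the first row is the header.
TRAINING_CSV = '''prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5
'''


def parse_float(record, index, default, positive=False):
    """Return record[index] as a float, or `default` if missing or invalid."""
    try:
        value = float(record[index])
    except (IndexError, ValueError):
        return default  # index not in the record, or not a number
    if not math.isfinite(value) or (positive and value <= 0):
        return default  # assumption: non-finite or non-positive is invalid
    return value


def load_examples(text, has_header=True, prototype_index=0, class_index=1,
                  weight_index=2, bias_index=3):
    """Parse training rows, applying the documented defaults (1.0 and 0.0)."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    if has_header:
        next(reader, None)  # discard the header row
    examples = []
    for record in reader:
        if not record:
            continue  # skip empty lines
        examples.append({
            # Mandatory fields have no default, so a missing one raises.
            "prototype": record[prototype_index],
            "class": record[class_index],
            "weight": parse_float(record, weight_index, 1.0, positive=True),
            "bias": parse_float(record, bias_index, 0.0),
        })
    return examples


for example in load_examples(TRAINING_CSV):
    print(example["class"], example["weight"], example["bias"])
# cat 1.0 0.0
# cat 0.8 0.0
# cat 0.8 -0.5
```

The printed values match the weights and biases in the example rows, with the defaults filling in wherever a row stops early.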
Last modified October 6, 2023: fix: updated (61991cb)