Training file

The training file format

    To train your model, you need a CSV file containing one example per row.

    Each example can contain a set of fields, some mandatory and some optional. The index of each field in the record, starting from 0, can be passed to the EZClassifier model train command through a set of command options.

    If the index of a field does not exist in the record, or the field value is not valid, the default value is used:

    | field name | field description | required | type | default value | field index |
    |---|---|---|---|---|---|
    | prototype | a text that exemplifies a typical element in the category specified in the second field | YES | String | N.A. | --prototype-index=0 |
    | class | a short text that represents a category name | YES | String | N.A. | --class-index=1 |
    | weight | a multiplicative factor for predicted similarity | NO | positive real number | 1.0 | --weight-index |
    | bias | an additive value for predicted similarity | NO | real number | 0.0 | --bias-index |

    If a header row is present, remember to use the --header option in the model train command.

    Here is an example of a model training file to find strings about cats:

    prototype, class, weight, bias
    "Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
    "siamese animal", cat, 0.8
    "from siam", cat, 0.8, -0.5
    

    In this example:

    • the first row is the header (to be discarded using the -H option during model training)
    • the second row assumes a weight = 1.0 and a bias = 0.0
    • the third row assumes a weight = 0.8 and a bias = 0.0
    • the fourth row assumes a weight = 0.8 and a bias = -0.5
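
The default-value rule can be illustrated with a minimal Python sketch that reads the example file and fills in missing or invalid weight and bias fields. This is not EZClassifier's own parser: the function names `read_float` and `parse_training_rows` are hypothetical, the parameters only mirror the index options described above, and two points are assumptions made for the illustration: a non-positive weight is treated as invalid, and the weight and bias columns sit at indexes 2 and 3 as in the example.

```python
import csv
import io

DEFAULT_WEIGHT = 1.0
DEFAULT_BIAS = 0.0


def read_float(record, index, default, positive=False):
    """Return record[index] as a float, or `default` if the field is missing or invalid."""
    if index is None or index < 0 or index >= len(record):
        return default  # the index does not exist in this record
    try:
        value = float(record[index])
    except ValueError:
        return default  # the field is not a number
    if positive and not value > 0:
        return default  # assumption: a non-positive weight counts as invalid
    return value


def parse_training_rows(text, has_header=False, prototype_index=0,
                        class_index=1, weight_index=None, bias_index=None):
    """Parse CSV text into a list of examples (hypothetical helper, not the tool's code)."""
    # skipinitialspace drops the blank that follows each comma in the example file
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    if has_header:
        next(reader, None)  # discard the header row
    rows = []
    for record in reader:
        if not record:
            continue  # ignore empty lines
        rows.append({
            "prototype": record[prototype_index].strip(),
            "class": record[class_index].strip(),
            "weight": read_float(record, weight_index, DEFAULT_WEIGHT, positive=True),
            "bias": read_float(record, bias_index, DEFAULT_BIAS),
        })
    return rows


EXAMPLE = '''prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5
'''

rows = parse_training_rows(EXAMPLE, has_header=True, weight_index=2, bias_index=3)
for row in rows:
    print(row["class"], row["weight"], row["bias"])
```

Each printed line shows the class followed by the weight and bias actually used for that row, so the defaults applied to the shorter records can be compared with the list above.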