Overview

With EZClassifier, you can classify any natural-language text using only a few examples per category.

EZClassifier uses an innovative classification technology. Unlike traditional deep learning approaches, which demand thousands of meticulously tagged real-world examples, EZClassifier uses a machine learning algorithm that works from just a few examples per category, so you can get efficient and accurate classification without collecting a large dataset.

1 - Getting Started

What you need to know to use EZClassifier

Data classification is the process of organizing and categorizing data based on specific criteria or attributes. It helps make data more structured, accessible, and understandable.

Let’s see an example

Suppose you need to categorize a set of texts that contain mixed references to cats, actors, dogs, and other things that don’t matter.

You will need a CSV file containing a few examples. In this file, each example corresponds to a row that provides at least two fields:

  • prototype: a text that exemplifies a typical element in the category specified in the second field
  • class: a short text that represents a category name

EZClassifier uses these examples to create a personalized model. You can add as many prototypes and categories to the model as you like, as long as there is at least one example for each category. EZClassifier works with any language, even when several languages are mixed in the same text, the same data file, or the same set of examples.
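As a minimal sketch of the file format described above, the following Python snippet builds an examples file with the two required fields. Python and its standard csv module are used here only for illustration and are not part of EZClassifier; the helper name examples_to_csv is hypothetical, and the prototypes are taken from the sample data in this guide.

```python
import csv
import io


def examples_to_csv(examples):
    """Return CSV text with a header row and one (prototype, class) row per example."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prototype", "class"])
    writer.writerows(examples)
    return buffer.getvalue()


# One example per category is the minimum; more examples can be added freely.
examples = [
    ("Siamese Cat: Siamese cats are known for their striking blue almond-shaped eyes, short coat, and vocal nature", "cat"),
    ("Rottweiler: Strong and loyal, originally bred for herding and guarding", "dog"),
    ("Viola Davis: Viola Davis is a talented actress known for her powerful performances", "actor"),
]

csv_text = examples_to_csv(examples)
print(csv_text, end="")
```

The csv writer quotes any prototype that contains commas, so each row still parses as exactly two fields. The printed text can be saved to a file and used as the training input.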

Once you have created your model, you can use it to classify your data stream. EZClassifier adds two fields to your data:

  • class: the predicted category for the row
  • similarity: a number from 0 to 1 that represents the confidence of EZClassifier in its classification (1 = maximum confidence, 0 = no confidence).

Here are some examples you can use to train your model:

| prototype | class |
|---|---|
| Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds | cat |
| Maine Coon: These are among the largest domestic cat breeds. They have tufted ears, a bushy tail, and a friendly, gentle personality | cat |
| Siamese Cat: Siamese cats are known for their striking blue almond-shaped eyes, short coat, and vocal nature | cat |
| Ragdoll Cat: Ragdolls are large, affectionate cats known for their tendency to go limp when you hold them, hence the name Ragdoll | cat |
| German Shepherd: Intelligent and versatile, often used in police and military work | dog |
| Rottweiler: Strong and loyal, originally bred for herding and guarding | dog |
| Siberian Husky: Known for their endurance and striking appearance, used as sled dogs | dog |
| Doberman Pinscher: Agile and protective, often used as guard dogs | dog |
| Meryl Streep: Known for her incredible talent and versatility, Meryl Streep is one of the most acclaimed and decorated actresses in Hollywood history | actor |
| Leonardo DiCaprio: Leonardo DiCaprio is a highly respected actor who has starred in a wide range of critically acclaimed films | actor |
| Viola Davis: Viola Davis is a talented actress known for her powerful performances | actor |

Here are the data you want to classify:

data
Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard.
Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance.
Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin.
Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials.
Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost.
Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen.
Abyssinian: Abyssinian cats are active and playful with a short, ticked coat.
Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc
Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy.
Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs.
Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair
Boxer: Boxers are medium to large dogs with strong, muscular bodies.
Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility.
Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona.
Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes.
British Shorthair: British Shorthairs are known for their dense, plush coat and round faces.
Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes.
Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil.
Pruning Shears: Prunin

Here are some results:

| data | class | similarity |
|---|---|---|
| Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard. | cat | 0.864848 |
| Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance. | cat | 0.858967 |
| Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin. | cat | 0.858745 |
| Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials. | OTHER | 0.760553 |
| Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost. | OTHER | 0.764859 |
| Abyssinian: Abyssinian cats are active and playful with a short, ticked coat. | cat | 0.865725 |
| Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc | OTHER | 0.755273 |
| Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy. | dog | 0.843957 |
| Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs. | dog | 0.845551 |
| Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair | dog | 0.853720 |
| Boxer: Boxers are medium to large dogs with strong, muscular bodies. | dog | 0.860509 |
| Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen. | actor | 0.855477 |
| Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility. | actor | 0.876928 |
| Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona. | actor | 0.841147 |
| Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes. | dog | 0.931415 |
| British Shorthair: British Shorthairs are known for their dense, plush coat and round faces. | cat | 0.878175 |
| Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes. | cat | 0.891341 |
| Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil. | OTHER | 0.760036 |
| Pruning Shears: Prunin | OTHER | 0.781802 |

Try it out!

HW & SW Prerequisites

To run EZClassifier, you need a computer with Java 17 or later installed. You can download Java for your platform (Linux, macOS, Windows) from the official Java site.

Install EZClassifier

Next, download the latest version of the EZC JAR file to a directory of your choice.

Lastly, you’ll need an API Key to enable the services (see prices here). Proof of Concept (POC) and free plans are available upon request.

Create a shortcut for launching the java command and set the TC_API_KEY environment variable to your API key:

For example, with bash, open a terminal and type:

export TC_API_KEY=HERE-IS-YOUR-API-KEY
alias ezc='java -jar ezc.jar' 

For example, on Windows, open a terminal window (CMD) and type:

set TC_API_KEY=HERE-IS-YOUR-API-KEY
doskey ezc=java -jar "%USERPROFILE%\Downloads\ezc.jar" $*

Test that the system is working:

# make sure it reports Java version 17 or later
ezc --version

Step 1: download example files

Here you can download some sample data:

Step 2: create a model

Create a new model from your examples:

ezc model train --name=mymodel --header --input=examples.csv

Make sure the --input argument is the path (relative or absolute) of the downloaded model source file.

To list all available models, use ezc model ls.

Step 3: classify your data using the created model:

ezc classify --name=mymodel --header --input=input-data.csv --output=result.csv --threshold=0.84

The result.csv file in your current directory will contain your classified data. Note that rows with a similarity below 0.84 are assigned to the “OTHER” category.
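The threshold rule can be pictured with a short Python sketch. It only illustrates the rule as described here and is not EZClassifier's actual implementation; the function name apply_threshold is hypothetical, and the candidate class passed for the low-similarity value is an assumption, since the output only shows the final category.

```python
def apply_threshold(candidate_class, similarity, threshold=0.84):
    """Keep the candidate class when similarity reaches the threshold, else return "OTHER"."""
    return candidate_class if similarity >= threshold else "OTHER"


# Similarity values taken from the sample results in this guide.
print(apply_threshold("cat", 0.864848))    # at or above the threshold: class is kept
print(apply_threshold("actor", 0.841147))  # just above the threshold: class is kept
print(apply_threshold("dog", 0.760553))    # below the threshold: replaced by OTHER
```

Raising the threshold sends more borderline rows to OTHER; lowering it accepts more weak matches.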

Remove your model with ezc model rm --name=mymodel.

2 - Command reference

All EZClassifier command syntax and options

All the services offered by EZClassifier can be accessed through the ezc command. See the Getting Started section for instructions on downloading and installing it.

General syntax

A Java JRE 17+ is required.

The command returns 0 on success or a value > 0 on failure. Logs are written to stdout by default and can be redirected.
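When the command is driven from a script, the exit code is the simplest way to detect failures. The Python sketch below shows one way to do that; the helper name run_command is hypothetical, and a small Python child process stands in for the real command so that the snippet runs without EZClassifier installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command; return (succeeded, captured_output), where success means exit code 0."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        text=True,
    )
    return completed.returncode == 0, completed.stdout


# Stand-in child process that prints a log line and exits with a non-zero code.
ok, logs = run_command(
    [sys.executable, "-c", "print('simulated log line'); raise SystemExit(2)"]
)
print("success" if ok else "failure")
print(logs.strip())

# With EZClassifier installed, the call might look like this (path is an assumption):
# ok, logs = run_command(["java", "-jar", "ezc.jar", "model", "ls"])
```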

All commands require a valid API key in the environment variable TC_API_KEY:

export TC_API_KEY=<api-key>

Usage:

java -jar <path of the downloaded jar file> [-hV] [COMMAND]

A software agent that classifies text based on the provided prototypes

Options:

  • -h, --help
    Show this help message and exit.

  • -V, --version
    Print version information and exit.

Commands:

  • model
    Manage models

  • classify
    Perform classification

Command model train

Train the model

model train [-hHSV] -C=<labelIndex> [-e=<apiEndpoint>]
                    [-i=<inputFilename>] [-k=<apiKey>] -n=<name> -P=<textIndex>
                    [-W=<weightIndex>]

You can enhance an already trained model by adding new examples through multiple calls to the “model train” command.

Options:

  • -C, --class-index=<labelIndex>
    The column index (from 0) of the CSV field that contains the class attached to the text. The default is 1.

  • -e, --endpoint=<apiEndpoint>
    API endpoint. The default is https://api.mopso.io/v1/tc

  • -h, --help
    Show this help message and exit.

  • -H, --header
    Will ignore the first line of the CSV file.

  • -i, --input=<inputFilename>
    The file containing the training data. The default is “-”, which means stdin (e.g. -i - ). The stream must be in CSV format and MUST contain two fields (“prototype” and “class”); it may also contain some additional fields.

  • -k, --api-key=<apiKey>
    A registered API key. If not present, the value from the env variable TC_API_KEY is used.

  • -n, --name=<name>
    Name of the model to train. A new model is created if the name is not found.

  • -P, --prototype-index=<textIndex>
    The column index in the CSV file that contains the field with the text to classify (from 0)

  • -S, --strict
    Runs the program in strict mode: any partially recoverable exception thrown during the execution will stop the training process. If not run in strict mode, the application will try to compensate for as many errors as possible.

  • -V, --version
    Print version information and exit.

  • -W, --weight-index=<weightIndex>
    The column index in the CSV file that contains the field with the classification weight (from 0). Set to -1 if not present.

Command model ls

List models

model ls [-hV] [-e=<apiEndpoint>] [-k=<apiKey>]

Options:

  • -e, --endpoint=<apiEndpoint>
    API endpoint. The default is https://api.mopso.io/v1/tc

  • -h, --help
    Show this help message and exit.

  • -k, --api-key=<apiKey>
    A registered API key. If not present, the value from the env variable TC_API_KEY is used.

  • -V, --version
    Print version information and exit.

Command model rm

model rm [-hV] [-e=<apiEndpoint>] [-k=<apiKey>] -n=<name>

Models that have not been used for more than 3 months are automatically deleted.

Options:

  • -e, --endpoint=<apiEndpoint>
    API endpoint. The default is https://api.mopso.io/v1/tc

  • -h, --help
    Show this help message and exit.

  • -k, --api-key=<apiKey>
    A registered API key. If not present, the value from the env variable TC_API_KEY is used.

  • -n, --name=<name>
    Name of the model to remove.

  • -V, --version
    Print version information and exit.

Command classify

Usage:
classify [-hHSV] [-e=<apiEndpoint>] [-i=<inFilename>] [-I=<textIndex>] [-k=<apiKey>] -n=<name> [-o=<outFilename>] [-t=<threshold>] [-T=<threads>]

Perform classification

  • -e, --endpoint=<apiEndpoint>
    API endpoint. The default is https://api.mopso.io/v1/tc

  • -h, --help
    Show this help message and exit.

  • -H, --header
    If the flag is present, the first line of the input file is copied to the output with added columns ‘CLASS’ and ‘SIMILARITY_SCORE’. By default, it is assumed that the input has no header.

  • -i, --input=<inFilename>
    The input filename; “-” means stdin (e.g. -i - ). The file must be in CSV format.

  • -I, --index=<textIndex>
    The index in the CSV file that contains the field to classify (from 0). By default, the index is 0.

  • -k, --api-key=<apiKey>
    A registered API key. If not present, the value from the env variable TC_API_KEY is used.

  • -n, --name=<name>
    Model name.

  • --no-buffer
    Execute the program in interactive mode. The --input, --output and --header options are ignored.

  • -o, --output=<outFilename>
    The output filename; “-” means stdout (e.g. -o - ).

  • -S, --strict
    Runs the program in strict mode: any partially recoverable exception thrown during the execution (e.g. a classification that fails or a row that can’t be parsed) will stop the program, truncating the output to the last stable state. If not run in strict mode, the application will try to compensate for as many errors as possible.

  • -t, --threshold=<threshold>
    Number between 0 (not included) and 1 (included) used to decide whether the ‘SIMILARITY_SCORE’ of a match is too low to be considered valid. When it is, the ‘CLASS’ is set to ‘OTHER’. Default value is 0.84.

  • -T, --threads=<threads>
    The number of parallel jobs to be used by the classification services. The default is 1. If more than 1 is used, the output order is not preserved. The value is capped to the number of CPU cores.

  • -V, --version
    Print version information and exit.
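Because the output order is not preserved when --threads is greater than 1, one possible workaround is to tag each input row with its position before classification and sort on that tag afterward. The Python sketch below illustrates the idea under the assumption that input columns are copied unchanged to the output, as the --header description suggests. The helper names are hypothetical, and with the id in column 0 the text to classify would be selected with --index=1.

```python
def add_row_ids(rows):
    """Prefix every row with its position so the original order can be restored later."""
    return [[str(position)] + list(row) for position, row in enumerate(rows)]


def restore_order(rows):
    """Sort rows by the numeric id in the first column, then drop that column."""
    return [row[1:] for row in sorted(rows, key=lambda row: int(row[0]))]


# Input rows as they would be written to the CSV file sent to the classifier.
tagged_input = add_row_ids([
    ["Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin."],
    ["Boxer: Boxers are medium to large dogs with strong, muscular bodies."],
])

# Simulated classifier output that came back in a different order.
shuffled_output = [
    ["1", "Boxer: Boxers are medium to large dogs with strong, muscular bodies.", "dog", "0.860509"],
    ["0", "Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin.", "cat", "0.858745"],
]

for row in restore_order(shuffled_output):
    print(row)
```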

3 - Training file

The training file format

To train your model you need a CSV file containing an example for each row.

Each example can contain a set of fields, some mandatory and some optional. The index of each field in the record, starting from 0, can be passed to the EZClassifier model train command with a set of command options.

If the index of a field does not exist in the record, or the field does not contain a valid value, the default value is used:

| field name | field description | required | type | default value | field index |
|---|---|---|---|---|---|
| prototype | a text that exemplifies a typical element in the category specified in the second field | YES | String | N.A. | --prototype-index=0 |
| class | a short text that represents a category name | YES | String | N.A. | --class-index=1 |
| weight | a multiplicative factor for predicted similarity | NO | positive real number | 1.0 | --weight-index |
| bias | an additive factor for predicted similarity | NO | real number | 0.0 | --bias-index |

If a header row is present, do not forget to use the --header option in the model train command.

Here is an example of a model training file to find strings about cats:

prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5

In this example:

  • the first row is the header (to be discarded using -H option during model training)
  • the second row assumes a weight = 1.0 and a bias = 0.0
  • the third row sets a weight = 0.8 and assumes a bias = 0.0
  • the fourth row sets a weight = 0.8 and a bias = -0.5
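To make the defaults and the two optional factors concrete, here is a small Python sketch. It rests on explicit assumptions: the weight is applied before the bias (adjusted = weight × similarity + bias), the result is not clamped, and the raw similarity of 0.9 is an invented value. The function names are hypothetical and this is not EZClassifier's actual implementation.

```python
def field_or_default(fields, index, default):
    """Return fields[index] as a float, or the default when it is missing or invalid."""
    if index < 0 or index >= len(fields):
        return default
    try:
        return float(fields[index])
    except ValueError:
        return default


def adjusted_similarity(similarity, weight=1.0, bias=0.0):
    """Apply the multiplicative weight, then the additive bias (assumed order)."""
    return weight * similarity + bias


# Rows mirroring the example file above (first prototype shortened).
rows = [
    ["Persian Cat", "cat"],
    ["siamese animal", "cat", "0.8"],
    ["from siam", "cat", "0.8", "-0.5"],
]

raw_similarity = 0.9  # hypothetical value, for illustration only
for fields in rows:
    weight = field_or_default(fields, 2, 1.0)
    bias = field_or_default(fields, 3, 0.0)
    adjusted = adjusted_similarity(raw_similarity, weight, bias)
    print(f"{fields[0]}: weight={weight}, bias={bias}, adjusted={adjusted:.2f}")
```

Under these assumptions, a weight below 1.0 or a negative bias makes a prototype less likely to pass the classification threshold.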

4 - Use cases

See how EZClassifier helps you

Here are some use cases where EZClassifier can increase your productivity.

4.1 - Anti-Money Laundering

detect suspicious transactions

Mopso, a leader in AML software, uses its own special models to detect suspicious transactions.

4.2 - Asset management

asset classification

Use your customized models to classify assets in a room, in a building, in a financial portfolio or in a whole city.

4.3 - Accounting

accounting reconciliation

Automatically assign a reference in the chart of accounts to each entry of the general ledger.

4.4 - Content classification

Classify the content of posts, ads and websites

Are you tired of manually sorting through endless texts, ads, and web content? We have the perfect solution for you! Our cutting-edge system can classify texts, ads, posts, and entire web pages effortlessly, saving you time and effort.

🔍 What can it do?

  • Buying/Selling: Easily distinguish between content meant for buying and selling.
  • Type of Item: Identify the type of item being sold, whether it’s electronics, a house, clothing, or something else.
  • Size Categories: Classify items based on their size, be it large, medium, small, or any other dimension.

🌟 Why Choose Us?

  • Easy to customize: create your models with just a few examples for each category and let our A.I. engine do the rest.
  • Accuracy: Our system ensures accurate classification, making sure each item is placed in the correct category.
  • Efficiency: Save time and resources with our efficient automated process.
  • Flexibility: Adapt the system to your needs, whether you’re dealing with tweets, posts, meta-descriptions, or detailed product descriptions.

🌐 Unlimited Possibilities: Once the model is created, it can handle an unlimited number of ads, regardless of their complexity. Whether it’s a short tweet or a detailed product description, our system can handle it all!

Ready to streamline your content classification process? Try our system today and experience the future of content management!

4.5 - Talent matching

skill and occupation mapping

Classify personnel skills according to your taxonomy to improve the profile creation and achieve a better match with job offerings and job postings.

5 - Prices

Please contact us for a special quote. Plans start from 100E. A free trial is also available.

The EZservices are billed based on the following factors:

  • The number of created models.
  • The amount of data to be classified.

You have the option to choose from pre-paid packages or monthly plans.

For pre-paid packages, please note that they must be used within 12 months.