Getting Started

What you need to know to use EZClassifier

Data classification is the process of organizing and categorizing data based on specific criteria or attributes. It helps make data more structured, accessible, and understandable.

Let’s see an example

Suppose you need to categorize a set of texts that contain mixed references to cats, actors, dogs, and other things that don’t matter.

You will need a CSV file containing a few examples. In this file, each example corresponds to a row that provides at least two fields:

  • prototype: a text that exemplifies a typical element in the category specified in the second field
  • class: a short text that represents the category name

These examples are used by EZClassifier to create a personalized model. You can add as many prototypes and categories to the model as you like, as long as there is at least one example for each category. EZClassifier works with any language, including when several languages are mixed within the same text, data file, or set of examples.

Once you have created your model, you can use it to classify your data stream. EZClassifier adds two fields to your data:

  • class: the predicted category for the row
  • similarity: a number from 0 to 1 that represents EZClassifier’s confidence in its classification (1 = maximum confidence, 0 = no confidence)

Here are some examples you can use to train your model:

| prototype | class |
| --- | --- |
| Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds | cat |
| Maine Coon: These are among the largest domestic cat breeds. They have tufted ears, a bushy tail, and a friendly, gentle personality | cat |
| Siamese Cat: Siamese cats are known for their striking blue almond-shaped eyes, short coat, and vocal nature | cat |
| Ragdoll Cat: Ragdolls are large, affectionate cats known for their tendency to go limp when you hold them, hence the name Ragdoll | cat |
| German Shepherd: Intelligent and versatile, often used in police and military work | dog |
| Rottweiler: Strong and loyal, originally bred for herding and guarding | dog |
| Siberian Husky: Known for their endurance and striking appearance, used as sled dogs | dog |
| Doberman Pinscher: Agile and protective, often used as guard dogs | dog |
| Meryl Streep: Known for her incredible talent and versatility, Meryl Streep is one of the most acclaimed and decorated actresses in Hollywood history | actor |
| Leonardo DiCaprio: Leonardo DiCaprio is a highly respected actor who has starred in a wide range of critically acclaimed films | actor |
| Viola Davis: Viola Davis is a talented actress known for her powerful performances | actor |
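
As a rough sketch, a few of these examples could be saved to a CSV file from a terminal as follows. The file name my-examples.csv is hypothetical, and the comma separator with double-quoted text fields is an assumption (common CSV conventions), not something confirmed for EZClassifier. The quotes are there because the prototype texts themselves contain commas.

```shell
# Write a small examples file with a header row and three examples.
# ASSUMPTION: comma-separated fields, text fields wrapped in double quotes.
cat > my-examples.csv <<'EOF'
prototype,class
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds",cat
"German Shepherd: Intelligent and versatile, often used in police and military work",dog
"Viola Davis: Viola Davis is a talented actress known for her powerful performances",actor
EOF

# Sanity check: show the header line and the total number of lines.
head -n 1 my-examples.csv
wc -l < my-examples.csv
```

Each category used in the file needs at least one row, as noted above.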

Here are the data you want to classify:

data
Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard.
Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance.
Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin.
Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials.
Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost.
Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen.
Abyssinian: Abyssinian cats are active and playful with a short, ticked coat.
Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc
Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy.
Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs.
Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair
Boxer: Boxers are medium to large dogs with strong, muscular bodies.
Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility.
Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona.
Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes.
British Shorthair: British Shorthairs are known for their dense, plush coat and round faces.
Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes.
Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil.
Pruning Shears: Prunin

Here are some results:

| data | class | similarity |
| --- | --- | --- |
| Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard. | cat | 0.864848 |
| Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance. | cat | 0.858967 |
| Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin. | cat | 0.858745 |
| Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials. | OTHER | 0.760553 |
| Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost. | OTHER | 0.764859 |
| Abyssinian: Abyssinian cats are active and playful with a short, ticked coat. | cat | 0.865725 |
| Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc | OTHER | 0.755273 |
| Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy. | dog | 0.843957 |
| Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs. | dog | 0.845551 |
| Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair | dog | 0.853720 |
| Boxer: Boxers are medium to large dogs with strong, muscular bodies. | dog | 0.860509 |
| Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen. | actor | 0.855477 |
| Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility. | actor | 0.876928 |
| Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona. | actor | 0.841147 |
| Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes. | dog | 0.931415 |
| British Shorthair: British Shorthairs are known for their dense, plush coat and round faces. | cat | 0.878175 |
| Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes. | cat | 0.891341 |
| Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil. | OTHER | 0.760036 |
| Pruning Shears: Prunin | OTHER | 0.781802 |
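
To get a quick overview of results like these, one option is to tally the rows per class. The sketch below works on a small hand-made file (sample-result.csv, a hypothetical name) and assumes that class and similarity are the last two comma-separated columns and contain no commas themselves. The exact layout of a real output file may differ.

```shell
# A tiny stand-in for a classified file (rows copied from the results above).
# ASSUMPTION: class and similarity are the last two comma-separated columns.
cat > sample-result.csv <<'EOF'
data,class,similarity
"Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin.",cat,0.858745
"Boxer: Boxers are medium to large dogs with strong, muscular bodies.",dog,0.860509
"Abyssinian: Abyssinian cats are active and playful with a short, ticked coat.",cat,0.865725
"Pruning Shears: Prunin",OTHER,0.781802
EOF

# Count rows per class, skipping the header line.
# The class is read as the second-to-last field, so commas inside the
# quoted text column do not disturb the count.
count_by_class() {
  awk -F',' 'NR > 1 { count[$(NF-1)]++ }
             END { for (c in count) print c, count[c] }' "$1" | LC_ALL=C sort
}

count_by_class sample-result.csv
```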

Try it out!

HW & SW Prerequisites

To run EZClassifier, you need a computer with Java 17 or later installed. You can download Java 17+ for your platform (Linux, macOS, Windows) from the official Java site.

Install EZClassifier

Next, download the latest version of the EZC JAR file to a directory of your choice.

Lastly, you’ll need an API Key to enable the services (see prices here). Proof of Concept (POC) and free plans are available upon request.

Create a shortcut for launching the java command and set the TC_API_KEY environment variable to your API key.

For example, with bash, open a terminal and type:

export TC_API_KEY=HERE-IS-YOUR-API-KEY
# use the full path to ezc.jar if it is not in the current directory
alias ezc='java -jar ezc.jar'

On Windows, open a Command Prompt (CMD) window and type:

set TC_API_KEY=HERE-IS-YOUR-API-KEY
doskey ezc=java -jar "%USERPROFILE%\Downloads\ezc.jar" $*

Test that the system is working:

# make sure it reports Java version 17 or later
ezc --version
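
If the command fails, a small pre-flight check can help narrow down the cause. The following is only a sketch: the function names java_major and check_prereqs are hypothetical helpers, not part of EZClassifier, and the parsing assumes the usual quoted version string printed by java -version.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme (major 8) and the current "17.0.8" scheme.
java_major() {
  echo "$1" | awk -F'[._]' '{ print ($1 == 1 ? $2 : $1) }'
}

# Report the first missing prerequisite, or "ok" if everything looks fine.
check_prereqs() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH"; return 1
  fi
  # java -version prints to stderr; the version is the first quoted string.
  version="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
  major="$(java_major "$version")"
  case "$major" in
    ''|*[!0-9]*) echo "could not parse Java version: $version"; return 1 ;;
  esac
  if [ "$major" -lt 17 ]; then
    echo "Java $version found, but 17 or later is required"; return 1
  fi
  if [ -z "${TC_API_KEY:-}" ]; then
    echo "TC_API_KEY is not set"; return 1
  fi
  echo "ok: Java $version, TC_API_KEY set"
}

check_prereqs || true
```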

Step 1: download example files

Here you can download some sample data:

Step 2: create a model

Create a new model from your examples:

ezc model train --name=mymodel --header --input=examples.csv

Be sure to pass the path (relative or absolute) of the downloaded examples file as the --input argument.

To list all the available models, you can use ezc model ls.

Step 3: classify your data using the created model

ezc classify --name=mymodel --header --input=input-data.csv --output=result.csv --threshold=0.84

The result.csv file in your current directory will contain your classified data. Note that rows with a similarity below the threshold (0.84 here) are assigned to the “OTHER” category.
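
The effect of the threshold can be pictured with a small stand-alone sketch. It only illustrates the rule described here (similarity below the threshold becomes OTHER) and says nothing about how EZClassifier works internally. The function name apply_threshold and the label "guess" are hypothetical. The similarity values are taken from the results table earlier in this guide.

```shell
# Reads "label similarity" pairs on stdin and prints OTHER in place of the
# label whenever the similarity is strictly below the given threshold.
apply_threshold() {
  awk -v threshold="$1" '{
    label = ($2 + 0 < threshold + 0) ? "OTHER" : $1
    print label, $2
  }'
}

# "guess" stands for whatever best-matching class scored 0.760553.
printf '%s\n' 'dog 0.843957' 'actor 0.841147' 'guess 0.760553' | apply_threshold 0.84
```

With a threshold of 0.84, the first two rows keep their labels and the third becomes OTHER. Raising the threshold to 0.85 would move all three to OTHER.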

Remove your model with ezc model rm --name=mymodel.
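
The steps of this guide can also be chained in a small script. This is a sketch under stated assumptions: ezc_cmd and run_pipeline are hypothetical helper names, EZC_JAR is assumed to point at the downloaded JAR, and the DRY_RUN switch is an addition of this sketch that prints the commands without running them. Only the ezc subcommands and flags shown earlier in this guide are used.

```shell
# ASSUMPTION: path to the downloaded JAR; adjust it to your setup.
EZC_JAR="${EZC_JAR:-ezc.jar}"

# Runs EZClassifier, or just prints the command when DRY_RUN=1.
# Named ezc_cmd so it does not clash with an existing "ezc" alias.
ezc_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ezc $*"
  else
    java -jar "$EZC_JAR" "$@"
  fi
}

# Train a model, classify the input, then remove the model.
# Each step runs only if the previous one succeeded.
run_pipeline() {
  model="$1"; examples="$2"; input="$3"; output="$4"; threshold="$5"
  ezc_cmd model train --name="$model" --header --input="$examples" &&
  ezc_cmd classify --name="$model" --header --input="$input" \
          --output="$output" --threshold="$threshold" &&
  ezc_cmd model rm --name="$model"
}

# Print the three commands without executing them.
DRY_RUN=1 run_pipeline mymodel examples.csv input-data.csv result.csv 0.84
```

Drop DRY_RUN=1 to actually run the commands. Leave out the final rm step if you want to keep the model for later use.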

Last modified October 24, 2023: updated examples (6d348c5)