With EZClassifier, you can classify any natural language text starting from just a few examples.
Unlike traditional deep learning approaches, which demand thousands of meticulously tagged real examples, EZClassifier uses a machine learning algorithm that works with just a few examples per category. Say goodbye to the data deluge and welcome effortless, efficient, and accurate classification.
1 - Getting Started
What you need to know to use EZClassifier
Data classification is the process of organizing and categorizing data based on specific criteria or attributes.
It helps make data more structured, accessible, and understandable.
Let’s see an example
Suppose you need to categorize a set of texts that contain mixed references to cats, actors, dogs, and other things that don’t matter.
You will need a CSV file containing a few examples. In this file, each example corresponds to a row that provides at least two fields:
prototype: a text that exemplifies a typical element in the category specified in the second field
class: a short text that represents a category name
These examples are used by EZClassifier to create a personalized model. You can add as many prototypes and categories to the model as you like, as long as there is at least one example for each category. EZClassifier works with any language, even when several languages are mixed in the same text, the same data file, or the same set of examples.
Once you have created your model, you use it to classify your data stream.
EZClassifier will add two additional fields to your data:
class: the predicted category for the row
similarity: a number from 0 to 1 that represents EZClassifier’s confidence in its classification (1 = maximum confidence, 0 = no confidence).
Here are some examples you can use to train your model:
| prototype | class |
| --- | --- |
| Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds | cat |
| Maine Coon: These are among the largest domestic cat breeds. They have tufted ears, a bushy tail, and a friendly, gentle personality | cat |
| Siamese Cat: Siamese cats are known for their striking blue almond-shaped eyes, short coat, and vocal nature | cat |
| Ragdoll Cat: Ragdolls are large, affectionate cats known for their tendency to go limp when you hold them, hence the name Ragdoll | cat |
| German Shepherd: Intelligent and versatile, often used in police and military work | dog |
| Rottweiler: Strong and loyal, originally bred for herding and guarding | dog |
| Siberian Husky: Known for their endurance and striking appearance, used as sled dogs | dog |
| Doberman Pinscher: Agile and protective, often used as guard dogs | dog |
| Meryl Streep: Known for her incredible talent and versatility, Meryl Streep is one of the most acclaimed and decorated actresses in Hollywood history | actor |
| Leonardo DiCaprio: Leonardo DiCaprio is a highly respected actor who has starred in a wide range of critically acclaimed films | actor |
| Viola Davis: Viola Davis is a talented actress known for her powerful performances | actor |
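As a minimal sketch, a training file with these two columns could be produced with Python's standard csv module. The helper names write_training_file and classes_without_examples are assumptions made for illustration; only the column names prototype and class, and the rule that every category needs at least one example, come from the text above.

```python
import csv
import io

# A few of the (prototype, class) examples listed above.
EXAMPLES = [
    ("Siamese Cat: Siamese cats are known for their striking blue almond-shaped eyes, short coat, and vocal nature", "cat"),
    ("Rottweiler: Strong and loyal, originally bred for herding and guarding", "dog"),
    ("Viola Davis: Viola Davis is a talented actress known for her powerful performances", "actor"),
]


def write_training_file(stream, examples):
    """Write a header row plus one (prototype, class) row per example."""
    writer = csv.writer(stream)  # quotes fields that contain commas
    writer.writerow(["prototype", "class"])
    writer.writerows(examples)


def classes_without_examples(examples, classes):
    """Return the classes that have no prototype yet (should be empty)."""
    seen = {label for _, label in examples}
    return sorted(set(classes) - seen)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_training_file(buffer, EXAMPLES)
    print(buffer.getvalue())
    print(classes_without_examples(EXAMPLES, ["cat", "dog", "actor"]))
```

Because this sketch writes a header row, the resulting file would be trained with the --header option so that the first line is ignored.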
Here are the data you want to classify:
data
Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard.
Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance.
Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin.
Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials.
Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost.
Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen.
Abyssinian: Abyssinian cats are active and playful with a short, ticked coat.
Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc
Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy.
Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs.
Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair
Boxer: Boxers are medium to large dogs with strong, muscular bodies.
Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility.
Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona.
Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes.
British Shorthair: British Shorthairs are known for their dense, plush coat and round faces.
Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes.
Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil.
Pruning Shears: Prunin
Here are some results:
| data | class | similarity |
| --- | --- | --- |
| Bengal Cat: Bengal cats have a wild appearance with rosette-shaped spots on their coat, reminiscent of a leopard. | cat | 0.864848 |
| Scottish Fold: Scottish Fold cats are recognized by their unique folded ears, which give them an endearing appearance. | cat | 0.858967 |
| Sphynx Cat: Sphynx cats are a hairless breed with wrinkled skin. | cat | 0.858745 |
| Shovel: A shovel is a tool with a flat, wide blade and a long handle, used for digging, lifting, and moving soil, gravel, or materials. | OTHER | 0.760553 |
| Garden Fork: A garden fork has sturdy tines and a handle, used for loosening soil, breaking up clumps, and mixing in compost. | OTHER | 0.764859 |
| Abyssinian: Abyssinian cats are active and playful with a short, ticked coat. | cat | 0.865725 |
| Rake: Rakes have curved or straight teeth attached to a handle and are used for leveling soil, removing debris, and spreading mulc | OTHER | 0.755273 |
| Poodle: Poodles are highly intelligent and come in different sizes: Standard, Miniature, and Toy. | dog | 0.843957 |
| Dachshund: Dachshunds, or wiener dogs, are known for their long bodies and short legs. | dog | 0.845551 |
| Yorkshire Terrier: Yorkies are small but spirited dogs with long, silky hair | dog | 0.853720 |
| Boxer: Boxers are medium to large dogs with strong, muscular bodies. | dog | 0.860509 |
| Denzel Washington: Denzel Washington is an iconic American actor with a commanding presence on screen. | actor | 0.855477 |
| Cate Blanchett: Cate Blanchett is an Australian actress known for her elegance and versatility. | actor | 0.876928 |
| Tom Hanks: Tom Hanks is a beloved American actor known for his likable and relatable on-screen persona. | actor | 0.841147 |
| Siberian Husky: Huskies are known for their striking appearance, with a thick double coat and blue or multicolored eyes. | dog | 0.931415 |
| British Shorthair: British Shorthairs are known for their dense, plush coat and round faces. | cat | 0.878175 |
| Russian Blue: Russian Blue cats have a distinctive bluish-gray coat and striking green eyes. | cat | 0.891341 |
| Hoe: A hoe has a flat, blade-like head and a long handle, used for weeding, cultivating, and breaking up soil. | OTHER | 0.760036 |
| Pruning Shears: Prunin | OTHER | 0.781802 |
Try it out!
HW & SW Prerequisites
To run EZClassifier, you need any computer with Java 17+ installed. You can download Java 17+ for your platform (Linux, macOS, Windows) from the official Java site.
The result.csv file in your current directory will contain your classified data.
Note that rows with a confidence below 0.84 are assigned to the “OTHER” category.
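The threshold rule in the note above can be sketched in a few lines of Python. This only illustrates the rule as described in this documentation, not EZClassifier's internal code; the function name final_class is an assumption.

```python
DEFAULT_THRESHOLD = 0.84  # default value stated in this documentation


def final_class(candidate_class, similarity, threshold=DEFAULT_THRESHOLD):
    """Keep the candidate class, or return "OTHER" when similarity is below the threshold."""
    return candidate_class if similarity >= threshold else "OTHER"


if __name__ == "__main__":
    # Similarity values taken from the results shown earlier.
    print(final_class("actor", 0.841147))  # just above the default threshold
    # The candidate class "dog" is hypothetical: the results only show OTHER.
    print(final_class("dog", 0.760553))
```

Raising the threshold makes the classifier stricter: with a threshold of 0.85, the Tom Hanks row (0.841147) would also fall into “OTHER”.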
Also with streamed data
EZClassifier can also work as a buffered stream data processor:
Remove your model with ezc model rm --name=mymodel.
Warning
Models that have not been used for more than 3 months are automatically deleted.
2 - Command reference
All EZClassifier command syntax and options
All the services offered by EZClassifier can be accessed through the ezc command.
See the getting started section for the instructions for downloading and installing it.
General syntax
Java JRE 17+ is required.
The command returns 0 on success or a value > 0 on failure. Logs are written to stdout by default and can be redirected.
All commands require a valid API key in the environment variable TC_API_KEY:
export TC_API_KEY=<api-key>
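When the command is driven from a script, the two conventions above (the TC_API_KEY environment variable and the 0 / greater-than-0 exit code) can be handled as in the following sketch. The helper name run_with_api_key is an assumption, and the demo uses a stand-in Python command instead of the EZClassifier jar so that it runs anywhere.

```python
import os
import subprocess
import sys


def run_with_api_key(argv, api_key):
    """Run argv with TC_API_KEY set; return True when the exit code is 0."""
    env = dict(os.environ, TC_API_KEY=api_key)
    completed = subprocess.run(argv, env=env)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in command (not EZClassifier): exits with 0 only if
    # TC_API_KEY is visible in its environment.
    probe = [
        sys.executable,
        "-c",
        "import os, sys; sys.exit(0 if os.environ.get('TC_API_KEY') else 1)",
    ]
    print(run_with_api_key(probe, "my-secret-key"))
```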
Usage:
java -jar <path of the downloaded jar file> [-hV][COMMAND]
A software agent that classifies text based on the provided prototypes
Options:
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Commands:
model Manage models
classify Perform classification
Command model train
Train the model
model train [-hHSV] -C=<labelIndex> [-e=<apiEndpoint>] [-i=<inputFilename>] [-k=<apiKey>] -n=<name> -P=<textIndex> [-W=<weightIndex>]
You can enhance an already trained model by adding new examples through multiple calls to the “model train” command.
Options:
-C, --class-index=<labelIndex> The column index (from 0) of the CSV field that contains the class attached to the text. By default, the index is 1.
-e, --endpoint=<apiEndpoint> API endpoint. By default https://api.mopso.io/v1/tc
-h, --help Show this help message and exit.
-H, --header Will ignore the first line of the CSV file.
-i, --input=<inputFilename> The file containing the training data. The default is “-”, which means stdin (e.g. -i - ). The input must be in CSV format and MUST contain two fields (“prototype” and “class”); it may contain additional fields.
-k, --api-key=<apiKey> A registered API key. If not present, the value from the env variable TC_API_KEY is used.
-n, --name=<name> Name of the model to train. A new model is created if the name is not found.
-P, --prototype-index=<textIndex> The column index in the CSV file that contains the field with the text to classify (from 0)
-S, --strict Runs the program in strict mode: any partially recoverable exception thrown during the execution will stop the training process. If not run in strict mode, the application will try to compensate for as many errors as possible.
-V, --version Print version information and exit.
-W, --weight-index=<weightIndex> The column index in the CSV file that contains the field with the classification weight (from 0). Set to -1 if not present.
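The options above can be assembled programmatically. The sketch below only builds the argument list and does not run anything; it uses the long option names documented in this section, while the function name build_train_args and the file names in the demo are assumptions.

```python
def build_train_args(jar_path, name, input_file="-", prototype_index=0,
                     class_index=1, weight_index=None, header=False,
                     strict=False):
    """Assemble the argument list for "model train" from the documented options."""
    args = [
        "java", "-jar", jar_path, "model", "train",
        f"--name={name}",
        f"--input={input_file}",
        f"--prototype-index={prototype_index}",
        f"--class-index={class_index}",
    ]
    if weight_index is not None:
        args.append(f"--weight-index={weight_index}")
    if header:
        args.append("--header")  # ignore the first line of the CSV file
    if strict:
        args.append("--strict")  # stop on any partially recoverable error
    return args


if __name__ == "__main__":
    # "ezclassifier.jar" and "train.csv" are hypothetical file names.
    print(" ".join(build_train_args("ezclassifier.jar", "mymodel",
                                    input_file="train.csv", header=True)))
```

Building the list first and passing it to a process runner avoids shell quoting problems with model names or paths that contain spaces.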
Command model ls
List models
model ls [-hV][-e=<apiEndpoint>][-k=<apiKey>]
Options:
-e, --endpoint=<apiEndpoint> API endpoint. By default https://api.mopso.io/v1/tc
-h, --help Show this help message and exit.
-k, --api-key=<apiKey> A registered API key. If not present, the value from the env variable TC_API_KEY is used.
-V, --version Print version information and exit.
Command model rm
model rm [-hV][-e=<apiEndpoint>][-k=<apiKey>] -n=<name>
Models that have not been used for more than 3 months are automatically deleted.
Options:
-e, --endpoint=<apiEndpoint> API endpoint. By default https://api.mopso.io/v1/tc
-h, --help Show this help message and exit.
-k, --api-key=<apiKey> A registered API key. If not present, the value from the env variable TC_API_KEY is used.
Command classify
Perform classification
Options:
-e, --endpoint=<apiEndpoint> API endpoint. By default https://api.mopso.io/v1/tc
-h, --help Show this help message and exit.
-H, --header If the flag is present, the first line of the input file is copied to the output with added columns ‘CLASS’ and ‘SIMILARITY_SCORE’. By default, it is assumed that the input has no header.
-i, --input=<inFilename> The input filename; “-” means stdin (e.g. -i - ). The file must be in CSV format.
-I, --index=<textIndex> The index in the CSV file that contains the field to classify (from 0). By default, the index is 0.
-k, --api-key=<apiKey> A registered API key. If not present, the value from the env variable TC_API_KEY is used.
-n, --name=<name> Model name.
--no-buffer Execute the program in interactive mode. Ignores the --input, --output and --header options.
-o, --output=<outFilename> The output filename; “-” means stdout (e.g. -o - ).
-S, --strict Runs the program in strict mode: any partially recoverable exception thrown during the execution (i.e. a classification that fails or a row that can’t be parsed) will stop the program, truncating the output to the last stable state. If not run in strict mode, the application will try to compensate for as many errors as possible.
-t, --threshold=<threshold> Number between 0 (not included) and 1 (included) that is used to determine whether a match’s ‘SIMILARITY_SCORE’ is too low to be considered valid. In this case, the ‘CLASS’ is set to ‘OTHER’. Default value is 0.84.
-T, --threads=<threads> The number of parallel jobs to be used by the classification services. Default value is 1. If more than 1 is used, the output order is not preserved. The value is capped at the number of CPU cores.
-V, --version Print version information and exit.
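The constraints on --threshold and --threads described above can be checked before launching a classification. This is a sketch of the documented rules only; the function names are assumptions, and treating a request below 1 as a single job is also an assumption, since the documentation does not cover that case.

```python
import os


def check_threshold(threshold):
    """Accept only values in (0, 1], the range documented for --threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError(f"threshold must be in (0, 1], got {threshold}")
    return threshold


def effective_threads(requested, cpu_cores=None):
    """Cap the requested job count at the number of CPU cores."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    # Assumption: a request below 1 falls back to a single job.
    return max(1, min(requested, cores))


def output_order_preserved(threads):
    """The output order is only preserved when a single job is used."""
    return threads <= 1


if __name__ == "__main__":
    jobs = effective_threads(8)
    print(jobs, output_order_preserved(jobs), check_threshold(0.84))
```

If the order of the output rows matters, keep --threads at its default of 1.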
3 - Training file
The training file format
To train your model you need a CSV file containing one example per row.
Each example can contain a set of fields, some mandatory and some optional. The index of each field in the record, starting from 0, can be passed to the EZClassifier model train command with a set of command options.
If the index of a field does not exist in the record, or the field value is not valid, the default value is used:
| field name | field description | required | type | default value | field index |
| --- | --- | --- | --- | --- | --- |
| prototype | a text that exemplifies a typical element in the category specified in the second field | YES | String | N.A. | --prototype-index=0 |
| class | a short text that represents a category name | YES | String | N.A. | --class-index=1 |
| weight | a multiplicative factor for predicted similarity | NO | positive real number | 1.0 | --weight-index |
| bias | an additive value for predicted similarity | NO | real number | 0.0 | --bias-index |
If a header row is present, do not forget to use the --header option in the model train command.
Here is an example of a model training file to find strings about cats:
prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5
In this example:
the first row is the header (to be discarded using -H option during model training)
the second row assumes a weight = 1.0 and a bias = 0.0
the third row assumes a weight = 0.8 and a bias = 0.0
the fourth row assumes a weight = 0.8 and a bias = -0.5
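The default-value rule can be illustrated by parsing the example file above with Python's csv module. This is a sketch of the behaviour described in this section, not EZClassifier's own parser; the function names and the fixed column positions (0 to 3) are assumptions made for illustration.

```python
import csv
import io

TRAINING_CSV = '''prototype, class, weight, bias
"Persian Cat: Known for their long, luxurious fur and sweet temperament, Persian cats are one of the most popular breeds", cat
"siamese animal", cat, 0.8
"from siam", cat, 0.8, -0.5
'''


def field_as_float(record, index, default, positive=False):
    """Return record[index] as a float, or the default if missing or invalid."""
    try:
        value = float(record[index])
    except (IndexError, ValueError):
        return default
    if positive and value <= 0:
        return default  # weight must be a positive real number
    return value


def parse_examples(text, has_header=True):
    """Parse training rows, filling in weight=1.0 and bias=0.0 when absent."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]
    if has_header:
        rows = rows[1:]
    return [
        {
            "prototype": row[0],
            "class": row[1],
            "weight": field_as_float(row, 2, 1.0, positive=True),
            "bias": field_as_float(row, 3, 0.0),
        }
        for row in rows
    ]


if __name__ == "__main__":
    for example in parse_examples(TRAINING_CSV):
        print(example["class"], example["weight"], example["bias"])
```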
4 - Use cases
See how EZClassifier helps you
Here are some use cases where EZClassifier can increase your productivity.
4.1 - Anti-Money Laundering
detect suspicious transactions
Mopso, a leader in AML software, uses their special models to detect suspicious transactions.
4.2 - Asset management
asset classification
Use your customized models to classify assets in a room, in a building, in a financial portfolio, or in a whole city.
4.3 - Accounting
accounting reconciliation
Automatically assign a reference in the chart of accounts to each general ledger entry.
4.4 - Content classification
Classify the content of posts, ads and websites
Are you tired of manually sorting through endless texts, ads, and web content? We have the perfect solution for you! Our cutting-edge system can classify texts, ads, posts, and entire web pages effortlessly, saving you time and effort.
🔍 What can it do?
Buying/Selling: Easily distinguish between content meant for buying and selling.
Type of Item: Identify the type of item being sold, whether it’s electronics, a house, clothing, or something else.
Size Categories: Classify items based on their size, be it large, medium, small, or any other dimension.
🌟 Why Choose Us?
Easy to customize: create your models with just a few examples for each category and let our A.I. engine do the rest.
Accuracy: Our system ensures accurate classification, making sure each item is placed in the correct category.
Efficiency: Save time and resources with our efficient automated process.
Flexibility: Adapt the system to your needs, whether you’re dealing with tweets, posts, meta-descriptions, or detailed product descriptions.
🌐 Unlimited Possibilities:
Once the model is created, it can handle an unlimited number of ads, regardless of their complexity. Whether it’s a short tweet or a detailed product description, our system can handle it all!
Ready to streamline your content classification process? Try our system today and experience the future of content management!
4.5 - Talent matching
skill and occupation mapping
Classify personnel skills according to your taxonomy to improve the profile creation and achieve a better match with job offerings and job postings.