The field of AI and Natural Language Processing is progressing rapidly, so much so that it's hard to keep up with the latest developments, let alone begin to put them to use.

In today’s competitive landscape, anticipating your customers' needs and serving them quickly is a must. For this reason, many teams are eager to augment their decision-making processes with predictive and programmatic applications of AI.

However, AI has become quite the buzzword. Very few real-world AI applications are truly intelligent, while hyped-up state-of-the-art concepts are often cooked up in isolated testing environments, which makes them hardly practical for real-world use.

That's why we're here – to help you understand AI, the useful tools you have at your disposal, the impact they can have for you, and most importantly, to give you access in a way that is easy to use!

This brings us to a recent release in which we brought zero-shot classification capabilities to the Caravel product. Zero-shot learning is perhaps the most exciting recent development in the world of NLP. However, the information available is very technical, making it hard to understand what it is, how it actually works, and how you can use it. As with all things AI and machine learning, the predictions are only as good as the data that goes into the algorithm. So with this release, we wanted to take the opportunity to provide an easy-to-understand explanation of what is happening under the hood so that you can get the most out of it.

Introducing Classification On-The-Fly

To put it simply, zero-shot learning is classification on-the-fly: it lets you classify your data into any set of categories you can come up with, without requiring annotated data sets.

At most companies, training data is in short supply, entirely unavailable, or hard and costly to gather. Even if you are able to collect it, gathering, cleaning, and annotating hundreds to thousands of samples just to predict a few categories quickly becomes the most time-consuming part of your project.

Data prep like this has made AI inaccessible to the many companies that can’t afford dedicated data science resources or that lack large volumes of annotated data.

You can always try to find a pre-trained model. However, keep in mind that pre-trained models have never seen your data and may have trouble adapting to it. Eventually, all paths lead back to annotating data in order to get good classifications.

Figure: Data prep is time-consuming and costly

How it Works

Zero-shot uses a different approach when making predictions. Instead of requiring you to train a model on hundreds to thousands of annotated samples of a label, it starts with just the label and makes predictions based on the information it understands from the label name.

Figure: Workflow of classification via training vs zero-shot

You've probably used zero-shot learning in real life, as people often learn to recognize objects in a similar way. Consider a child who has seen a horse but has never in her life seen a zebra.

However, she has read about one, and she learned that a zebra is horse-like with black and white stripes.

One day, the child is at the zoo and spots an animal that is shaped like a horse and has black and white stripes. Even though she has never seen a zebra before, she recognizes the attributes and instantly knows that it is a zebra!

Similarly, a machine with zero-shot capabilities can identify a zebra in an image even if it has never seen an image of a zebra before. If the machine has seen images of other animals, it can break the zebra image down into animal-like features, such as a horse-like shape and black and white stripes, and relate those to its understanding of what a zebra is.
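The zebra story can be sketched in a few lines of Python. Treat this as a toy illustration under stated assumptions, not as how Caravel or any particular model works: the attribute names, the class descriptions, and the `classify` function are all made up for this example, and a real system would detect attributes with a trained model rather than receive them as a ready-made set.

```python
# Toy attribute-based zero-shot classifier (illustrative assumptions only).

# Hand-written class descriptions. "zebra" has a description, but no example
# of a zebra was ever used to learn anything here.
CLASS_DESCRIPTIONS = {
    "horse": {"horse_shaped", "four_legs", "mane"},
    "zebra": {"horse_shaped", "four_legs", "mane", "black_white_stripes"},
    "tiger": {"cat_shaped", "four_legs", "orange_black_stripes"},
}


def jaccard(a: set, b: set) -> float:
    """Overlap between two attribute sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(observed: set, descriptions: dict) -> str:
    """Return the class whose description best matches the observed attributes."""
    return max(descriptions, key=lambda name: jaccard(observed, descriptions[name]))


# Attributes the child (or a hypothetical attribute detector) notices at the zoo.
seen_at_zoo = {"horse_shaped", "four_legs", "mane", "black_white_stripes"}
prediction = classify(seen_at_zoo, CLASS_DESCRIPTIONS)
print(prediction)
```

The point of the sketch is that the "zebra" class is recognized purely from its description: the observed attributes overlap more with the zebra description than with the horse or tiger descriptions.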

Figure: A real-world example of zero-shot

Zero-shot Text Classification

Text classification enables you to structure your text into topics, expressed intents, emotions, sentiment, categories, and more. It has unlocked many capabilities, including conversational analysis, language recognition, document organization, semantic search, automated Q&A, and article recommendations.

To date, there have been two ways you can get started with text classification:

1. Use pre-trained models that fit your use case

You can find pre-trained models to use. The goal of any pre-trained model is to generalize well across many different sources of text.

You can also use third-party APIs that offer hosted prebuilt models to alleviate the costs of deployment.

While this is an excellent way to get started quickly, there are a few significant limitations to keep in mind:

  • If you find a model that works well for your use case, you will only be able to classify the labels that the model was trained to classify
  • You won’t have much control over the precision for your use case because that model has never seen your data

How do you improve precision for your use case? This leads us to your second option...

2. Train your own models

If you’re feeling adventurous, you can train your own models. While this does get around the limitations of pre-trained models, it is more costly and time-consuming.

Before taking on this endeavor, it’s essential to understand the full scope of the effort. At a minimum, before training your machine learning model, you must account for the following tasks:

  • Creation of clear, objective definitions for every label you want to predict
  • Collection of samples of text for every label – You'll need hundreds to thousands of samples for each label depending on the complexity of your label definitions
  • You’ll also want to collect samples for an “Other” label to classify text that doesn't match any of your labels
  • Cleaning and removing extraneous data
  • Dedicated analysts to understand and annotate each data sample
  • Selection of the best model for your use case and training of that model – Be sure to account for the trial-and-error process here!
  • Deployment and ongoing maintenance of that model

Neither approach is ideal. But what if there were a way to get the efficiency of pre-trained models with the adaptability of training your own?

This is precisely the gap that zero-shot can fill.

Step-Up Zero-Shot!

Using zero-shot for text classification, you can assign any number of categories to a piece of text without having to train a model on samples of those categories – enabling custom text classification for your particular use case that is cost-effective and versatile.

What Can Zero-shot Classify?

With zero-shot, you simply come up with label names, and the algorithm relies on the information it can understand from those names to predict whether or not the text it observes matches each label.

Because of this, care must be taken when naming labels. The labels you choose must have a meaningful relationship to the text you wish to classify.

It also means that zero-shot can’t predict every label. If a label is too ambiguous or hard to understand, it may not be a good candidate for zero-shot.

When using zero-shot, we recommend using actionable, more specific labels.

So how do you come up with actionable labels that zero-shot can understand? Here are a few quick tips to help.

Tips for coming up with good labels

1. Think like a machine

Put yourself in the shoes of a robot. Would you be able to look at samples of this label and identify patterns in the text, or are the examples all over the place? If they are all over the place, you’ll want to consider breaking your label into multiple sub-labels that are more specific.

2. Avoid technical jargon and ambiguous labels

Use words in your labels that you would expect to show up in the text. If your label can be interpreted to mean different things, consider adding clarity to your label.

For example, “bug” would match both beetles and app issues, while “software bug” would give you a better classifier for mentions of app issues. If you were trying to classify customers talking about issues with managing their accounts, an abbreviated label like “AM Issues” wouldn't work. A label like “Account management issue” would work much better!
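To see why the wording of a label matters, here is a deliberately simple Python sketch. It is not a real zero-shot model: the `RELATED_WORDS` table is a hand-written, made-up stand-in for the language knowledge a real model learns, and `label_score` is a hypothetical scoring function invented for this example.

```python
import re

# Hand-written stand-in for what a real zero-shot model learns from large
# amounts of text: which words in a message support each label word.
# Every entry is a made-up assumption for illustration.
RELATED_WORDS = {
    "bug": {"bug", "bugs"},
    "software": {"software", "app", "crash", "crashes", "login", "update"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def label_score(label: str, text: str) -> float:
    """Fraction of the label's words that are supported by the text."""
    text_words = tokenize(text)
    label_words = tokenize(label)
    if not label_words:
        return 0.0
    supported = 0
    for word in label_words:
        # A label word is supported if it, or a related word, appears in the text.
        related = RELATED_WORDS.get(word, {word})
        if related & text_words:
            supported += 1
    return supported / len(label_words)


garden = "There is a bug crawling on my tomato plants"
app = "The app crashes because of a bug on the login screen"

for label in ("bug", "software bug"):
    print(label, label_score(label, garden), label_score(label, app))
```

In this toy, “bug” scores the garden message and the app message equally, while “software bug” scores the app message higher. A real model works from far richer information than a lookup table, but the lesson carries over: the extra word gives the classifier something to tell the two meanings apart.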

3. Stick to the context of the text

"Angry customer" will work if you want to label statements about an angry customer, like "This long-term client is very unhappy!". But unless the statement itself mentions an unhappy or angry customer, the classifier will not magically determine that an unhappy speaker is also a customer.

4. Try labels out and iterate!

And last but not least, the best way is to create labels and try them out on your data! The great thing about zero-shot text classification is that the barrier to testing predictions on your set of labels is... zero!

Figure: Not actionable vs actionable labels

To summarize, label names should not be too general or too ambiguous. Pick practical label names, and when in doubt, get more specific. And above all else, try variations to see what works best!

Now, to help get you started, here are a few examples of text classification tasks you can try where we’ve found zero-shot can shine.

Classification Tasks You Can Try

Topic Analysis

Perhaps the most common use case for text classification is to predict what a piece of text is about, or its topic. Using a classifier with zero-shot capabilities, you can provide any set of topic labels and determine if a conversation is about one or several of those topics. These can be general categories or more specific, actionable sub-topics of those categories.
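Since a conversation can be about one or several topics, a common pattern is to keep every label whose score clears a cutoff. The sketch below assumes a classifier has already returned an independent score between 0 and 1 for each label. The scores, the topic labels, the 0.5 threshold, and the `select_topics` helper are all hypothetical and chosen only for illustration.

```python
def select_topics(scores: dict, threshold: float = 0.5, fallback: str = "Other") -> list:
    """Return every label scoring at or above the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    chosen = [label for label, score in ranked if score >= threshold]
    # If nothing clears the threshold, say so explicitly instead of guessing.
    return chosen or [fallback]


# Made-up scores, as if a zero-shot classifier had rated one message
# against three topic labels independently.
example_scores = {"billing": 0.91, "shipping delay": 0.62, "product quality": 0.08}
print(select_topics(example_scores))
```

The right threshold depends on your data, so it is worth trying a few values alongside your label variations.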

Intent Classification

Intent classification determines what a person in a conversation is trying to achieve. It is beneficial for automation, whether that means automating Q&A, prioritizing support requests or sales leads, or automating internal routing of cases and feedback. Using a classifier with zero-shot capabilities, you can provide any set of intent labels you want to predict.

Sentiment and Emotion Detection

Provide sentiment and emotion labels to determine a feeling towards something. Sentiment labels can predict attitude on a range from “very negative” through “neutral” to “very positive”. Emotion labels can predict how a customer is feeling in the moment.

Predicting Likely Behaviors

You can even provide labels to determine the likelihood of behavior based on the context of the text. For example, you can provide labels to determine if a customer who left a review is a likely advocate.

Prioritizing Requests

You can also provide labels to determine the severity and urgency of requests.
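Once each request has an urgency label, ordering the queue is straightforward. This is a minimal sketch that assumes the labels have already been predicted. The label names in `URGENCY_ORDER`, the sample requests, and the `prioritize` helper are hypothetical and not part of any product.

```python
# Hypothetical urgency labels, ordered from most to least urgent.
URGENCY_ORDER = ["critical", "high", "normal", "low"]


def prioritize(requests: list) -> list:
    """Sort (text, label) pairs so the most urgent labels come first."""
    rank = {label: position for position, label in enumerate(URGENCY_ORDER)}
    # Unknown labels sort last; sorted() is stable, so ties keep arrival order.
    return sorted(requests, key=lambda item: rank.get(item[1], len(URGENCY_ORDER)))


# Made-up requests paired with labels a classifier might have predicted.
incoming = [
    ("How do I change my avatar?", "low"),
    ("Checkout is down for all users", "critical"),
    ("Invoice shows the wrong amount", "high"),
]
queue = prioritize(incoming)
for text, label in queue:
    print(label, "-", text)
```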

Get Started with Zero-shot Classification

With zero-shot, you are no longer limited to a set of predefined labels, so there are many possibilities for what you can predict. The best way to discover the possibilities is to try it out!

Unfortunately, zero-shot classification is not widely available outside of the machine learning community.

Luckily, we’re always striving to create an experience that makes fast and versatile AI capabilities accessible to all. So starting today, we’ve opened up zero-shot classification within Caravel!

You can get started for free and test zero-shot on your data using Caravel’s classifiers. Best of all, we place no limits on the number of times you can test classifications, so you can try a myriad of labels and see what works best for your use case.

To learn more about how to use zero-shot classification in Caravel, see here or watch the video below. To get started for free, sign up now.