Create your own chat bot in Java using Apache OpenNLP | Artificial Intelligence | Natural Language Processing

In this article we will create our own custom chat bot, or automated chat agent, using Apache OpenNLP, a library that provides “Natural Language Processing” in Java. “Natural Language Processing” is a branch of “Artificial Intelligence” that processes human language so that machines can understand it, use it and act on it.

If you are completely new to the “Natural Language Processing” side of “Artificial Intelligence”, you can go through this simple, example-based tutorial to get started.

Tutorial | Natural Language Processing in Java using Apache OpenNLP

Example in this article for Chat Bot

  • The user will chat with the chat bot through the console (to keep the example simple).
  • The chat bot will serve a hypothetical company that sells a product (a mobile phone).
  • The user will inquire about the product: its features, price etc.
  • The chat bot will reply with greetings, answers to questions about the product etc.

High level approach & flow for chat bot program

This short video gives you the high-level design approach that we will take in our code example. It is a very simple, basic approach that uses many features of “Natural Language Processing”.



Models for our example

If you have gone through the basics tutorial of Apache OpenNLP, you know that each OpenNLP feature needs either a trained, serialized model or raw sample data from which to train one.

In the tutorial, we trained our own models for everything. For this example, we will use trained models from different sources (except for the categorizer) so that we can focus on our chat bot code. All model files are also available in our GitHub repository, linked towards the end of this article.

  • Sentence Detection, Tokenizer, POS Tagger model
  • Lemmatizer model
    • Link – Github
    • We could not find a lemmatizer model from Apache, so we are using this one from a public GitHub repository. You can use any other model if you have one.
    • This is not a serialized model. These are training samples, so we have to train & serialize our own model.
  • Categorizer model
    • As explained in the video, we want to categorize or classify the user's input into certain categories so that our code knows how to respond. We will create our own custom model for the categorizer.
    • Categories: To keep the chat simple, let's define the categories below. You can add or refine categories to experiment & improve.
      • greeting – Basic greetings that we anticipate the user will use to start the chat.
      • conversation-continue – Words like “ok”, “hmm” that the user might use in the middle of the conversation.
      • conversation-complete – Words or sentences that the user might use to end the conversation.
      • product-inquiry – Questions that the user might ask to inquire about the product or its features.
      • price-inquiry – Questions that the user might ask to inquire about the price of the product.

Let's create some sample data using the above categories.
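As a rough sketch of what such sample data can look like: OpenNLP's document categorizer reads one sample per line, with the category name first and the sample text after it, separated by whitespace. The class below builds a few such lines. The sample sentences and the class name `CategorizerSamples` are illustrative assumptions, not the article's actual data. The samples are written in lemma form ("be" rather than "is") on the assumption that the bot categorizes lemmas, as described later in this article.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: holds a tiny set of categorizer training samples.
class CategorizerSamples {

    // One sample per line: "<category> <whitespace-separated sample text>".
    // These sentences are made up for illustration.
    static final List<String> SAMPLES = Arrays.asList(
        "greeting hi",
        "greeting hello",
        "greeting good morning",
        "conversation-continue ok",
        "conversation-continue hmm",
        "conversation-complete bye",
        "conversation-complete thank you that be all",
        "product-inquiry what be the feature of this phone",
        "product-inquiry tell me about the product",
        "price-inquiry what be the price",
        "price-inquiry how much do it cost");

    // Returns the category label, i.e. the first token of a sample line.
    static String categoryOf(String sampleLine) {
        return sampleLine.trim().split("\\s+", 2)[0];
    }

    // Joins the samples into the text of a training file, one sample per line.
    static String asTrainingText() {
        return String.join("\n", SAMPLES) + "\n";
    }
}
```

The string returned by `asTrainingText()` can be saved to a plain text file and used as the categorizer's training input. Adding more lines per category generally gives the categorizer more to learn from.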

Here is the code to train a categorizer model using the above samples.



Let's code the chat bot

Now that we are ready with the models and familiar with Apache OpenNLP & the approach that we are going to take, it's time to code the chat bot.

We will prepare & store our answers for each category in a HashMap so that they are easy to look up.
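A minimal sketch of such a lookup, keyed by the category names defined earlier. The answer texts, the fallback message and the class name `ChatBotAnswers` are placeholders invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical answer store: one canned reply per categorizer category.
class ChatBotAnswers {

    private static final Map<String, String> ANSWERS = new HashMap<>();

    static {
        // Placeholder replies; replace with text that suits your product.
        ANSWERS.put("greeting", "Hello, how can I help you?");
        ANSWERS.put("product-inquiry", "Here is a summary of the phone's features.");
        ANSWERS.put("price-inquiry", "Here is the current price of the phone.");
        ANSWERS.put("conversation-continue", "What else can I help you with?");
        ANSWERS.put("conversation-complete", "Nice chatting with you. Bye.");
    }

    // Looks up the reply for a category, with a fallback for unknown ones.
    static String answerFor(String category) {
        return ANSWERS.getOrDefault(category, "Sorry, I did not understand that.");
    }
}
```

Because the keys are the same strings the categorizer returns, the best category can be passed straight to `answerFor`.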

Here is the code for the steps of chat bot.

  • First, it trains the categorizer model with the latest sample data.
  • Then it takes input from the user (through the console).
  • Then it breaks the input into sentences.
  • It tokenizes each sentence into words.
  • It finds the POS tag of each word, as the tags are required in the next step.
  • It lemmatizes each word using the tokens & POS tags. This makes categorizing much easier, as we don’t need to cover every inflected form of a word in the categorizer sample data.
  • It categorizes the lemma tokens & then finds the answer for the detected category.
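The per-input part of these steps can be sketched without any OpenNLP dependency by passing each step in as a function. This is only an outline of the flow, not the article's actual code: the class `ChatFlowSketch`, its field names and the toy stand-in steps in `withToySteps()` are all assumptions made for illustration. In a real bot each function would delegate to the corresponding OpenNLP component, and the categorizer would be trained first.

```java
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical skeleton: sentence detection -> tokenizing -> POS tagging
// -> lemmatizing -> categorizing -> answer lookup.
class ChatFlowSketch {
    final Function<String, String[]> sentenceDetector;
    final Function<String, String[]> tokenizer;
    final Function<String[], String[]> posTagger;
    final BiFunction<String[], String[], String[]> lemmatizer;
    final Function<String[], String> categorizer;
    final Function<String, String> answers;

    ChatFlowSketch(Function<String, String[]> sentenceDetector,
                   Function<String, String[]> tokenizer,
                   Function<String[], String[]> posTagger,
                   BiFunction<String[], String[], String[]> lemmatizer,
                   Function<String[], String> categorizer,
                   Function<String, String> answers) {
        this.sentenceDetector = sentenceDetector;
        this.tokenizer = tokenizer;
        this.posTagger = posTagger;
        this.lemmatizer = lemmatizer;
        this.categorizer = categorizer;
        this.answers = answers;
    }

    // Builds one reply for the whole input, one answer per detected sentence.
    String reply(String userInput) {
        StringBuilder reply = new StringBuilder();
        for (String sentence : sentenceDetector.apply(userInput)) {
            String[] tokens = tokenizer.apply(sentence);
            String[] tags = posTagger.apply(tokens);          // needed by the lemmatizer
            String[] lemmas = lemmatizer.apply(tokens, tags);
            String category = categorizer.apply(lemmas);
            if (reply.length() > 0) {
                reply.append(' ');
            }
            reply.append(answers.apply(category));
        }
        return reply.toString();
    }

    // Toy stand-ins so the flow can run without models; not real NLP.
    static ChatFlowSketch withToySteps() {
        return new ChatFlowSketch(
            input -> input.split("(?<=[.!?])\\s+"),
            sentence -> sentence.toLowerCase().replaceAll("[^a-z ]", " ").trim().split("\\s+"),
            tokens -> {
                String[] tags = new String[tokens.length];
                Arrays.fill(tags, "X"); // dummy tag for every token
                return tags;
            },
            (tokens, tags) -> tokens.clone(), // pretend every token is its own lemma
            lemmas -> {
                if (Arrays.asList(lemmas).contains("price")) return "price-inquiry";
                if (Arrays.asList(lemmas).contains("hi")) return "greeting";
                return "conversation-continue";
            },
            category -> "[" + category + "]");
    }
}
```

With the toy steps, `ChatFlowSketch.withToySteps().reply("Hi. What is the price?")` returns `[greeting] [price-inquiry]`. Keeping the steps as separate functions means any one of them can be swapped out or tested on its own.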




Here are the methods for sentence detection, tokenizing, POS or part-of-speech tagging, lemmatizing & categorizing.



Let's chat now

Now comes the interesting part. Let's chat with our chat bot.

You can find the complete code of this example, including the model files, in our GitHub repository.

Further improvements you can experiment

  • You can further improve the categorizer sample data by adding more categories and more samples.
  • You can try adding the “Language detection” feature of Apache OpenNLP to detect which language the user is using. If the user is using some other language, the bot can ask them to switch to a supported one. Find a pre-trained, serialized language detection model here on the Apache site.

Further Reading

You might be interested in these articles as well.

Create your own video conference web application using Java & JavaScript


