Natural Language Processing in Java using Apache OpenNLP | Part-Of-Speech (POS) tagger | Simple example for beginners

In this article we will create a simple example of the part-of-speech (POS) tagging feature of Natural Language Processing (NLP), a branch of Artificial Intelligence, using the Apache OpenNLP API in Java. POS tagging is the process of analyzing the grammatical structure of a sentence and detecting the grammatical category of each word, such as verb or noun.

For an example of the document categorizer, see this article.

Creating training data

As per the Javadoc of WordTagSampleStream, the format of the training data file is as follows.

A stream filter which reads a sentence per line which contains words and tags in word_tag format

Let's create such a file. Here you can find a list of POS tags. For simplicity, you can use one of the free online POS taggers like this one. We will run this tool with the input “itsallbinary is a blogging website with very good articles. I like this website.” and use its output to create the training data file given below.
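To make the word_tag format concrete, here is a minimal sketch in plain Java that splits such lines into word and tag pairs. The two sample lines are hand-tagged with Penn Treebank style tags purely for illustration (an assumption, an online tagger may choose different tags), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WordTagFormatSketch {

    // Illustrative lines only: one sentence per line, each token written as word_tag.
    // The tags are hand-assigned assumptions, not verified tagger output.
    static final String[] SAMPLE_LINES = {
        "itsallbinary_NN is_VBZ a_DT blogging_NN website_NN with_IN very_RB good_JJ articles_NNS ._.",
        "I_PRP like_VBP this_DT website_NN ._."
    };

    /** Splits one line into {word, tag} pairs at the last underscore of each token. */
    static List<String[]> parseLine(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int split = token.lastIndexOf('_');
            // A valid token needs at least one character on each side of the underscore.
            if (split <= 0 || split == token.length() - 1) {
                throw new IllegalArgumentException("Not in word_tag format: " + token);
            }
            pairs.add(new String[] { token.substring(0, split), token.substring(split + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String line : SAMPLE_LINES) {
            for (String[] pair : parseLine(line)) {
                System.out.println(pair[0] + " -> " + pair[1]);
            }
        }
    }
}
```

Splitting at the last underscore keeps words that themselves contain an underscore intact, and punctuation is tagged like any other token (for example `._.`).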



Let's train a model and test it with sentences

We will use POSTaggerFactory with POSTaggerME to train a model, i.e. a POSModel. Once we have a trained model, we will wrap it in a POSTaggerME and use it to tag tokens from other sentences. We will also need the tokenizer feature to split sentences into tokens. Read here in detail about the tokenizer, with an example.

Here is the code for training and testing the POS tagger.
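A minimal sketch of the training and tagging flow described above, assuming the opennlp-tools library (1.9.x) is on the classpath. The class name `PosTaggerSketch`, the in-memory training string, and its hand-assigned tags are assumptions made for illustration; in practice the training data would usually be read from a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class PosTaggerSketch {

    // Hand-tagged illustration data (assumed tags), one sentence per line in word_tag format.
    static final String TRAINING_DATA =
          "itsallbinary_NN is_VBZ a_DT blogging_NN website_NN with_IN very_RB good_JJ articles_NNS ._.\n"
        + "I_PRP like_VBP this_DT website_NN ._.\n";

    /** Trains a POSModel from word_tag formatted text. */
    static POSModel train(String trainingData) {
        InputStreamFactory in =
            () -> new ByteArrayInputStream(trainingData.getBytes(StandardCharsets.UTF_8));

        TrainingParameters params = TrainingParameters.defaultParams();
        // With only two sentences, a cutoff of 0 keeps features that occur just once.
        params.put(TrainingParameters.CUTOFF_PARAM, "0");

        try (ObjectStream<POSSample> samples =
                 new WordTagSampleStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8))) {
            return POSTaggerME.train("en", samples, params, new POSTaggerFactory());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tokenizes a sentence and returns one tag per token. */
    static String[] tag(POSModel model, String sentence) {
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(sentence);
        return new POSTaggerME(model).tag(tokens);
    }

    public static void main(String[] args) {
        POSModel model = train(TRAINING_DATA);
        String sentence = "I like itsallbinary";
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(sentence);
        String[] tags = tag(model, sentence);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "_" + tags[i]);
        }
    }
}
```

The cutoff is lowered because the default parameters discard features seen fewer than five times, which would leave almost nothing to learn from in a training set this small. A model trained on two sentences is only useful as a demonstration; real taggers are trained on large annotated corpora.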




Output:

As you can see in the output below, our input sentence “I like itsallbinary” is tagged correctly as per our training data.


