Why sentiment analysis engines need customization



There was a time, not too long ago, when the general population would have bet their life savings that flying cars and human-like robots would exist by the year 2014.

There are definitely no flying cars, and robots aren’t exactly how we pictured them. However, we do have artificial intelligence that can understand what people are saying.

Creepy? Sort of, but more cool than creepy in my opinion.

When artificial intelligence is used to understand human (natural) language, it’s referred to as natural language processing (NLP). Most NLP engines used to analyze text come equipped with sentiment analysis, a technology that tells us whether text is positive, negative or neutral.

Good NLP engines will be able to assign sentiment to a single word or phrase. “Awful”, for example, is a word with negative sentiment. “Delicious” is positive and “Blue chair” is neutral.

Sentiment analysis can also tell us the polarity of an entire document. For example, a tweet that reads, “The service was awful but the food was delicious!” would be neutral. That’s because the positive and the negative cancel out to make for a neutral score.

Really good NLP engines will give you a sentiment score for the individual words and phrases that bear sentiment, and another score for the entire document as a whole. So in the example above, we would know that the tweet is neutral, but it contains valuable positive and negative information.
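To make the two levels of scoring concrete, here is a minimal sketch of a lexicon-based scorer. It is not how any particular engine works: the tiny `LEXICON`, the score values, and the function names `score_terms`, `label` and `analyze` are all made up for illustration, and it assumes the document score is simply the sum of the word scores.

```python
import re

# Hypothetical toy lexicon; a real engine ships a far larger dictionary
# and handles phrases, negation and intensity, which this sketch ignores.
LEXICON = {"awful": -1.0, "delicious": 1.0}


def score_terms(text, lexicon=LEXICON):
    """Return (word, score) pairs for each sentiment-bearing word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(word, lexicon[word]) for word in words if word in lexicon]


def label(score):
    """Map a numeric score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(text, lexicon=LEXICON):
    """Score individual words, then sum them into a document-level score."""
    terms = score_terms(text, lexicon)
    total = sum(score for _, score in terms)
    return {"terms": terms, "score": total, "label": label(total)}


result = analyze("The service was awful but the food was delicious!")
print(result["terms"])  # the words that carry sentiment, with their scores
print(result["label"])  # the polarity of the tweet as a whole
```

Under these assumptions the tweet comes out neutral overall, while the per-word scores still show one negative and one positive term.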

The problem with sentiment analysis is that sometimes it’s wrong. It’s a limitation we have to deal with. I mean, humans can’t agree on the polarity of a document half the time. Even grad students disagree 20% of the time.

“Oh man, that was nasty!” Is this sentence positive or negative?

Surely, it must be negative. “Nasty” is a negative word, and everything else in this sentence is neutral. Final answer, negative! Drum roll…

Wrong! It’s positive.

The person who said this used the American slang definition of nasty, which has positive sentiment. There is absolutely no way to know by reading the sentence. So, if you (a human) were just tricked by reading this article, how is a machine supposed to figure it out? Answer: Tell the engine what’s positive and what’s negative.

High-quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function and assign a positive score to the word.
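As a rough illustration of what that customization amounts to, the sketch below layers user overrides on top of a default lexicon. Everything in it is an assumption made for the example: `DEFAULT_LEXICON`, `document_score` and the `overrides` parameter are hypothetical names, not the API of any real engine, which will expose customization in its own way.

```python
import re

# Hypothetical defaults; "nasty" starts out negative, as described above.
DEFAULT_LEXICON = {"nasty": -1.0, "awful": -1.0, "delicious": 1.0}


def document_score(text, overrides=None):
    """Sum word scores, letting user overrides take priority over defaults."""
    # Build a merged copy so the defaults themselves are never modified.
    lexicon = {**DEFAULT_LEXICON, **(overrides or {})}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


sentence = "Oh man, that was nasty!"

# Out of the box, the sentence is read literally.
default_score = document_score(sentence)

# For slang-heavy content, assign "nasty" a positive score instead.
slang_score = document_score(sentence, overrides={"nasty": 1.0})

print(default_score, slang_score)
```

Keeping the overrides separate from the defaults means the same engine can score slang-heavy content one way and formal reviews another, without either setting leaking into the other.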

The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization.

Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

Scott Van Boeyen
I'm the community manager for both Lexalytics and Semantria, contributing to the text analytics and sentiment analysis community by writing/blogging, helping reporters and journalists with ideas for content and providing thought leadership on social media.

