
Who tweeted that...?

7 June 2016 | Tine Wiederer, Stefan Keil and Moritz Klack

This tweet classifier estimates which of the US presidential candidates Donald Trump, Bernie Sanders and Hillary Clinton wrote a given tweet. Simply paste or type a tweet to see which of the three is most likely to be its author.

This application is an example of document classification using a Naive Bayes classifier. We trained the classifier on 950 tweets by each of the candidates, which we scraped from Twitter. To test it, we used a separate test set of 50 tweets per candidate: 137 of the 150 test tweets (about 91%) were classified correctly. We used the library Bayes to build this application.
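To make the technique concrete, here is a minimal multinomial Naive Bayes sketch in Python. It is not the code or API of the Bayes library: the tokenizer, the add-one (Laplace) smoothing, the names `NaiveBayes`, `train`, `classify` and `probabilities`, and the made-up training sentences and labels are all our own assumptions for illustration. Each author's score is the log prior plus the sum of log word likelihoods; working in log space avoids floating-point underflow on long tweets.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase words, keeping #hashtags, @mentions and apostrophes.
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def __init__(self) -> None:
        self.doc_counts: Counter[str] = Counter()  # training tweets per author
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)  # token counts per author
        self.vocab: set[str] = set()  # every token seen during training

    def train(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_scores(self, text: str) -> dict[str, float]:
        # log P(label) + sum over tokens of log P(token | label)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(text)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                # Add-one smoothing keeps unseen words from zeroing out the product.
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def probabilities(self, text: str) -> dict[str, float]:
        # Turn log scores into probabilities summing to 1; subtracting the max first
        # keeps math.exp from underflowing.
        scores = self.log_scores(text)
        top = max(scores.values())
        weights = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}

    def classify(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy usage with invented sentences and hypothetical labels.
model = NaiveBayes()
model.train("great rally tonight, make it great", "candidate_a")
model.train("huge crowd, great people", "candidate_a")
model.train("health care is a right", "candidate_b")
model.train("we need health care for all", "candidate_b")
print(model.classify("great crowd tonight"))  # → candidate_a
print(model.probabilities("great crowd tonight"))
```

With an equal number of training tweets per candidate, as in the 950-per-candidate setup described above, the priors are identical, so in a model like this one the decision rests entirely on the word likelihoods.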

Please note that a bar at 100% length does not indicate a probability of 1! The algorithm returns, for each candidate, the probability that the tweet belongs to that category as a value between 0 and 1. We rescaled these values so that the highest probability fills the full bar length and the other bars are scaled proportionally.
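The rescaling can be sketched in a few lines of Python. This is an assumption about how "scaled proportionally" works, not the application's actual code; the function name `bar_lengths` and the example probabilities are made up for illustration. Every value is divided by the largest one, so the top candidate always gets a full bar even when its probability is well below 1.

```python
def bar_lengths(probabilities: dict[str, float], full_length: float = 100.0) -> dict[str, float]:
    """Hypothetical helper: scale values so the largest fills the whole bar."""
    top = max(probabilities.values())
    if top <= 0:
        # Nothing to scale against; draw empty bars.
        return {label: 0.0 for label in probabilities}
    # Each bar keeps its ratio to the largest probability.
    return {label: full_length * (p / top) for label, p in probabilities.items()}


# Made-up probabilities: the leader's 0.6 is drawn as a full 100% bar.
print(bar_lengths({"Trump": 0.6, "Sanders": 0.3, "Clinton": 0.1}))
```

Because only the ratios between candidates survive, two tweets with very different absolute probabilities can produce identical bar charts.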

You can read more about the technique of Naive Bayes Classifiers in our blog post "Document Classification In Javascript".