In this blog, we will learn what bots are, how they can skew the sentiment analysis used in your trading strategy, and how to identify them with the Botometer library.
Before you estimate the sentiment of a tweet, you need to know whether it was posted automatically by a bot or written by a human.
Why should we identify a bot?
Knowing what the bots are doing tells you how the sentiment of a particular stock on Twitter is being manipulated. When we calculate the Twitter sentiment of a stock, we identify and remove the tweets made by bot accounts. What remains is the true sentiment, free of that manipulation. Used together with other technical indicators, this true sentiment can be a powerful metric for calling the tops and bottoms of a trend.
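To make the idea concrete, here is a minimal sketch of how dropping bot tweets changes an average sentiment reading. It assumes each tweet already has a sentiment polarity in [-1, 1] from some sentiment model and that its author already has a bot score in [0, 1]; the `Tweet` record, its field names and the `average_sentiment` helper are hypothetical, and the 0.6 cutoff is the rule of thumb this post uses for Botometer scores.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Tweet:
    # Hypothetical record for illustration only.
    text: str
    sentiment: float   # polarity in [-1, 1] from any sentiment model
    bot_score: float   # author's bot score in [0, 1], e.g. from Botometer


def average_sentiment(tweets, bot_threshold=None):
    """Mean sentiment of `tweets`.

    If `bot_threshold` is given, tweets whose author scores above it are
    treated as bot tweets and dropped before averaging.
    """
    if bot_threshold is not None:
        tweets = [t for t in tweets if t.bot_score <= bot_threshold]
    if not tweets:
        raise ValueError("no tweets left to average")
    return mean(t.sentiment for t in tweets)
```

Comparing `average_sentiment(tweets)` with `average_sentiment(tweets, bot_threshold=0.6)` shows how far the bot accounts were pulling the raw reading.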
Botometer
In Python, we use the botometer library to check whether the account behind a tweet is a bot. Botometer uses a machine learning model trained on tens of thousands of labelled accounts. Its output is a score on a scale of 0 to 1: the closer the score is to 1, the more likely it is that the Twitter account is run by a bot.
The Botometer API takes a Twitter user ID (or screen name) as input and extracts 1200 features related to that user to compute a score. Botometer also gives separate scores for the following feature categories:
- Network features
- User features
- Friends features
- Temporal features
- Content features
- Sentiment features
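The category scores come back together in one response. As a minimal sketch, assuming a v3-style response in which the per-category scores sit in a `categories` dict under the key names listed in `CATEGORIES` (an assumption; newer API versions lay the response out differently, so inspect what you actually receive), this hypothetical helper ranks the categories so you can see which group of features makes an account look most bot-like.

```python
# Category key names as they are assumed to appear in a v3-style response.
CATEGORIES = ("content", "friend", "network", "sentiment", "temporal", "user")


def rank_categories(result):
    """Return (category, score) pairs sorted from most to least bot-like.

    `result` is assumed to be a dict with a "categories" mapping of
    category name -> score in [0, 1]; missing categories are skipped.
    """
    scores = result.get("categories", {})
    pairs = [(name, scores[name]) for name in CATEGORIES if name in scores]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```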
Network features
Network features of a user include information on the retweets, mentions, and hashtags that the user tweeted in the past.
For example, if a user retweets only the tweets made by one particular handle, the user is most likely a bot.
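Botometer's network score comes from a trained model over many features, not from a single rule. Purely to illustrate the example above, here is a hypothetical toy measure: the share of an account's retweets that point to its single most-retweeted handle. The function name and input format are assumptions for this sketch.

```python
from collections import Counter


def top_source_share(retweeted_handles):
    """Fraction of retweets that come from the single most-retweeted handle.

    `retweeted_handles` lists the original author of each retweet, e.g.
    ["@a", "@a", "@b"]. A value near 1.0 means the account mostly
    amplifies one handle. Handles are compared case-insensitively.
    """
    if not retweeted_handles:
        return 0.0
    counts = Counter(h.lower() for h in retweeted_handles)
    _, top = counts.most_common(1)[0]
    return top / len(retweeted_handles)
```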
User features
This category contains user-specific information such as the user name, language, location, and account creation date. Bots often leave this information out, and when they do provide it, it is often gibberish.
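As a toy illustration of this category (not how Botometer scores it), the sketch below lists which profile fields are empty. The field names and the `missing_profile_fields` helper are hypothetical; a real check would read them from the user object returned by the Twitter API.

```python
# Hypothetical profile fields checked by this toy example.
PROFILE_FIELDS = ("name", "description", "location", "url")


def missing_profile_fields(profile):
    """Return the profile fields that are absent, None, or blank.

    `profile` is a plain dict such as {"name": "...", "location": ""}.
    Many empty fields is one weak hint of an automated account.
    """
    return [f for f in PROFILE_FIELDS if not str(profile.get(f) or "").strip()]
```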
Temporal features
Temporal features capture the tweet rate and the timing patterns of tweeting and retweeting. For example, if an account tweets at exactly the same intervals, it is most likely a bot.
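One simple way to quantify "tweets at the same time intervals" is the coefficient of variation of the gaps between consecutive tweets: the standard deviation of the gaps divided by their mean. This is a hypothetical illustration, not Botometer's internal temporal model; values near 0 mean near-identical gaps.

```python
from datetime import datetime
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive tweets.

    `timestamps` is a list of datetime objects. Values near 0 mean the
    account posts at almost perfectly regular intervals.
    Returns None when there are fewer than three tweets.
    """
    times = sorted(timestamps)
    if len(times) < 3:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    avg = mean(gaps)
    if avg == 0:  # all tweets share one timestamp
        return 0.0
    return pstdev(gaps) / avg


# Usage: an account posting exactly every 15 minutes scores 0.0.
start = datetime(2019, 1, 1, 9, 0)
bot_like = [start.replace(minute=m) for m in (0, 15, 30, 45)]
print(interval_regularity(bot_like))  # 0.0
```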
Election Tweets: Identifying the bots
At the time of writing, India is conducting its general election, and there are many bots on Twitter supporting the political parties. It is important to know which tweets were made by bots. But first, let us check that Botometer works correctly by running it on the known Twitter handles of real political leaders and seeing whether its predictions are sensible.
First, we install the botometer library with pip, as shown below, and then instantiate a Botometer object.
!pip install botometer
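A minimal sketch of the instantiation step, assuming you have Twitter developer credentials and an API key for the Botometer endpoint. The keyword names follow the botometer-python README for releases that take a `rapidapi_key` (older releases called it `mashape_key`), so check the README of the version you install; the `make_botometer` wrapper itself is hypothetical.

```python
def make_botometer(rapidapi_key, consumer_key, consumer_secret,
                   access_token, access_token_secret):
    """Create a Botometer client (sketch; argument names assumed from the
    botometer-python README, verify against your installed version)."""
    import botometer  # installed with `pip install botometer`

    twitter_app_auth = {
        "consumer_key": consumer_key,
        "consumer_secret": consumer_secret,
        "access_token": access_token,
        "access_token_secret": access_token_secret,
    }
    # wait_on_ratelimit=True asks the client to wait out Twitter rate
    # limits instead of failing.
    return botometer.Botometer(wait_on_ratelimit=True,
                               rapidapi_key=rapidapi_key,
                               **twitter_app_auth)
```

You would then call it once, e.g. `bom = make_botometer("YOUR_KEY", ...)`, with your own credentials in place of the placeholders.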
Now let us try the known handle of the American President, Mr Donald Trump, and see what Botometer says.
@realDonaldTrump is the Twitter handle of the American President. After fetching the bot scores for this handle, we can print them and check the score for each feature category.
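A sketch of the fetch-and-print step, assuming `bom` is a Botometer client and that its `check_account` method (which accepts a screen name such as `"@realDonaldTrump"` or a numeric user ID) returns a v3-style dict with a `categories` key. The `print_category_scores` helper is hypothetical.

```python
def print_category_scores(bom, handle):
    """Fetch a Botometer result for `handle` and print per-category scores.

    The "categories" key assumes a v3-style response; the full result is
    returned so later steps can reuse it without a second API call.
    """
    result = bom.check_account(handle)
    for category, score in sorted(result["categories"].items()):
        print(f"{category:<10} {score:.2f}")
    return result
```

Usage: `result = print_category_scores(bom, "@realDonaldTrump")`.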
All the bot score values are very low on the 0 to 1 scale, indicating that the user is most likely a human. We can also get the final, aggregate score for the account.
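The aggregate scores live in a different part of the response depending on the API version. The sketch below handles two layouts, both of which are assumptions to verify against the response you receive: v3 keeps one overall score per model under `scores`, while v4 nests them as `raw_scores[<model>]["overall"]`. The helper name is hypothetical.

```python
def overall_scores(result):
    """Return the (english, universal) overall bot scores in [0, 1].

    Assumed layouts: v3 -> result["scores"]["english"/"universal"];
    v4 -> result["raw_scores"]["english"/"universal"]["overall"].
    """
    if "scores" in result:
        s = result["scores"]
        return s["english"], s["universal"]
    raw = result["raw_scores"]
    return raw["english"]["overall"], raw["universal"]["overall"]
```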
Since most of Mr Donald Trump's tweets are in English, we can take the English score of 0.03 (on the 0 to 1 scale) as the final indicator. In general, we treat a score above 0.6 as an indication that an account is controlled by a bot. So if you had any doubts about whether the tweets from @realDonaldTrump were posted by a human rather than a bot, these scores should settle them.
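The 0.6 rule of thumb translates directly into a small labelling function. This is a sketch; the cutoff is the one from the text and can be tuned for your own data.

```python
BOT_THRESHOLD = 0.6  # rule of thumb from the text: above 0.6 suggests a bot


def classify(score, threshold=BOT_THRESHOLD):
    """Label an overall bot score in [0, 1] as "bot" or "human"."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "bot" if score > threshold else "human"


print(classify(0.03))  # human
```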
Now let us get back to the Indian election and check how the Twitter handles of the two top candidates score.
First, let us run the analysis for the current Prime Minister of India, Mr Narendra Modi, whose Twitter handle is @narendramodi.
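We fetch his bot scores in the same way as before. Since some of Mr Narendra Modi's tweets are in Hindi, we need to consider the universal score in this case: the English score relies on content and sentiment features that only work for English text, while the universal score uses only language-independent features. His universal score is again very low, indicating a human user.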
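The choice between the English and universal scores can be made explicit. As a hypothetical sketch, this helper looks at the language code of each recent tweet (Twitter tags tweets with codes such as `"en"` or `"hi"`) and uses the English score only when nearly all of them are English; the 0.9 cutoff is an arbitrary illustrative choice.

```python
def pick_score(english_score, universal_score, tweet_langs, min_english_share=0.9):
    """Choose which overall bot score to trust for an account.

    If at least `min_english_share` of the tweets in `tweet_langs` are
    English, use the English score; otherwise fall back to the
    language-independent universal score.
    """
    if not tweet_langs:
        return universal_score
    english_share = sum(lang == "en" for lang in tweet_langs) / len(tweet_langs)
    return english_score if english_share >= min_english_share else universal_score
```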
Now let us perform the same exercise for the main opposition candidate for Prime Minister, Mr Rahul Gandhi.
Again, the score is very low, indicating a human user. The purpose of this exercise was to sanity-check Botometer's predictions, and they look satisfactory and as expected.
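The same sanity check can be run over several handles in one loop. This sketch takes any `check_account` callable (for example a Botometer client's bound `bom.check_account` method) that returns a v3-style dict with a `scores` key; that response layout and the `score_handles` name are assumptions. Errors for one handle are reported and recorded as `None` so the loop keeps going.

```python
def score_handles(check_account, handles, model="universal"):
    """Return {handle: overall score} for each handle.

    `model` selects the "english" or "universal" score. Handles whose
    check raises an error map to None instead of stopping the run.
    """
    scores = {}
    for handle in handles:
        try:
            scores[handle] = check_account(handle)["scores"][model]
        except Exception as exc:  # network errors, protected accounts, ...
            print(f"skipping {handle}: {exc}")
            scores[handle] = None
    return scores
```

Usage: `score_handles(bom.check_account, ["@realDonaldTrump", "@narendramodi"])`.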
But we have not yet identified the bot followers of these leaders. We can do this by fetching the IDs of their followers with the tweepy library and then running the Botometer check on each of them. Separating out the bots and calculating the market sentiment might seem like a very difficult task, but it can be done easily with proper preprocessing. In our new course on Sentiment Analysis in Trading, we show you how to fetch the tweets related to a stock using conditional statements and then preprocess them to identify and remove the unrelated tweets. We then generate the market sentiment and create a trading strategy based on it.
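A minimal sketch of the follower check, kept independent of any particular client so it can be tested offline. It assumes `follower_ids` is an iterable of user IDs, for instance from tweepy (assumed to be `tweepy.Cursor(api.get_follower_ids, screen_name=...).items()` in tweepy 4.x; tweepy 3.x named the method `followers_ids`), and that `check_account` returns a v3-style Botometer result. The universal score is used because followers may tweet in any language; the `split_followers` name is hypothetical.

```python
from itertools import islice


def split_followers(follower_ids, check_account, threshold=0.6, limit=100):
    """Split up to `limit` follower IDs into (likely bots, likely humans).

    An account counts as a bot when its universal score is above
    `threshold`. Accounts whose check fails are skipped.
    """
    bots, humans = [], []
    for user_id in islice(follower_ids, limit):
        try:
            score = check_account(user_id)["scores"]["universal"]
        except Exception as exc:  # protected/suspended accounts, API errors
            print(f"skipping {user_id}: {exc}")
            continue
        (bots if score > threshold else humans).append(user_id)
    return bots, humans
```

Capping the run with `limit` matters in practice because each check is a separate API call subject to rate limits.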
Conclusion
The botometer library is built specifically for analysing Twitter accounts. It is a very important library for preprocessing Twitter data before that data can be used to create a trading strategy. Gauging the market sentiment is a very difficult task, and it needs data from various sources, which cannot all be treated the same way. So you also need to develop techniques for preprocessing data from blogs and news articles.
You can learn how to preprocess different types of data and other important steps involved in the sentiment analysis by enrolling in our new course: Sentiment Analysis in Trading