In this blog, we will learn what bots are and how they can skew the sentiment analysis used in your trading strategy.
When we trade on the basis of market sentiment, we need to fetch data from news sources such as Twitter, Reuters, Bloomberg, Webhosie, etc. Although reading complete articles and gauging their sentiment can be difficult, estimating the sentiment of a tweet is not that complicated.
But before you estimate the sentiment of a tweet, you need to know whether the tweet was an automated response from a bot or was written by a human.
You may ask why this is relevant.
Why should we identify a bot?
It is relevant because you need to know what the bots are doing, which in turn will tell you how the sentiment of a particular stock on Twitter is being manipulated. When we calculate the Twitter sentiment of a particular stock, we identify and remove those tweets made by bot users. This will give the true sentiment sans manipulation. This true sentiment can be a very powerful metric, when used with other technical indicators, to call the tops and bottoms of a trend.
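As a minimal sketch of the filtering step described above, the snippet below drops tweets whose author has a high bot score and averages the sentiment of what remains. The `Tweet` record, its field names, and the sample numbers are hypothetical and chosen purely for illustration; the 0.6 cutoff is the rule of thumb mentioned later in this post.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Tweet:
    user: str
    sentiment: float  # hypothetical scale: -1 (bearish) to +1 (bullish)
    bot_score: float  # 0 to 1, where values near 1 suggest a bot


def human_sentiment(tweets, bot_threshold=0.6):
    """Average sentiment after dropping tweets from likely bots."""
    human = [t.sentiment for t in tweets if t.bot_score <= bot_threshold]
    return mean(human) if human else None


# Illustrative data only: two ordinary users and two bots pushing a stock.
tweets = [
    Tweet("alice", 0.2, 0.05),
    Tweet("bob", -0.4, 0.10),
    Tweet("pump_bot_1", 0.9, 0.95),
    Tweet("pump_bot_2", 0.9, 0.90),
]

raw = mean(t.sentiment for t in tweets)
clean = human_sentiment(tweets)
print(f"raw sentiment:   {raw:+.2f}")
print(f"clean sentiment: {clean:+.2f}")
```

In this toy data the bots flip the apparent mood from mildly negative to clearly positive, which is the kind of distortion the filtering is meant to remove.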
In Python, we use a library called botometer to estimate whether a particular Twitter account is run by a bot. The botometer library uses a machine learning algorithm trained on tens of thousands of labelled examples. The algorithm's output is a score on a scale of 0 to 1, where values close to 1 indicate that a Twitter account is most likely managed by a bot.
The Botometer API takes the user ID as input and then extracts about 1200 features related to that user to compute a score. Botometer gives separate scores for the following categories:
- Network features
- User features
- Friends features
- Temporal features
- Content features
- Sentiment features
Let us discuss some of these features.
Network features of a user include information on the retweets, mentions, and hashtags that a user tweeted in the past.
For example, if the user retweets only tweets made by one particular handle, then the user is most likely a bot.
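To make the retweet example concrete, here is a hand-rolled heuristic that measures how concentrated a user's retweets are on a single handle. This is only an illustration of the idea; it is not how Botometer computes its network features, and the function name and sample handles are hypothetical.

```python
from collections import Counter


def retweet_concentration(retweeted_handles):
    """Share of a user's retweets that come from the single most-retweeted handle."""
    if not retweeted_handles:
        return 0.0
    counts = Counter(retweeted_handles)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(retweeted_handles)


# Hypothetical retweet histories: one handle only vs. a mix of sources.
single_source = ["@brand_x"] * 10
mixed_sources = ["@a", "@b", "@c", "@a", "@d"]

print(retweet_concentration(single_source))
print(retweet_concentration(mixed_sources))
```

A value close to 1 means nearly every retweet amplifies the same account, which matches the pattern described above.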
User features contain user-specific information such as the user name, language, location, account creation date, etc. Generally, bot accounts do not contain such information, and when they do, it tends to be gibberish.
The temporal features category analyzes the tweet rate, the timing patterns of tweeting and retweeting, etc. For example, if the account tweets at the same time intervals, then it is most likely a bot.
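One simple way to picture the "same time intervals" signal is to look at how much the gaps between consecutive tweets vary. The sketch below uses the coefficient of variation of those gaps. It is an assumption made for illustration rather than Botometer's actual temporal feature set, and the timestamps (in seconds) are made up.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive tweets.

    Values near 0 mean the gaps are almost identical (clockwork posting).
    Returns None when there are too few tweets to judge.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


# Hypothetical posting times in seconds since some reference point.
clockwork = [0, 600, 1200, 1800, 2400]    # exactly every 10 minutes
irregular = [0, 45, 3000, 3100, 9000]     # bursts and long pauses

print(interval_regularity(clockwork))
print(interval_regularity(irregular))
```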
Election Tweets: Identifying the bots
Currently, India is conducting its general election, and there are many bots on Twitter supporting political parties. It is important to know which tweets were made by bots. But before that, let us check whether botometer is working correctly by taking the known Twitter handles of real political leaders and checking whether it classifies them correctly.
First, we install the botometer library using pip install as shown below, and then instantiate a botometer object.
!pip install botometer
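The instantiation step could look roughly like the sketch below. It assumes the `botometer.Botometer` constructor accepts Twitter app credentials plus a `rapidapi_key` keyword (older releases of the library used a different keyword for the API key, so check the version you have installed). The environment variable names and both helper functions are hypothetical.

```python
import os

# Hypothetical environment variable names -> Botometer keyword arguments.
ENV_TO_KWARG = {
    "RAPIDAPI_KEY": "rapidapi_key",
    "TWITTER_CONSUMER_KEY": "consumer_key",
    "TWITTER_CONSUMER_SECRET": "consumer_secret",
    "TWITTER_ACCESS_TOKEN": "access_token",
    "TWITTER_ACCESS_TOKEN_SECRET": "access_token_secret",
}


def load_credentials(env=os.environ):
    """Collect the API credentials from environment variables."""
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


def make_botometer(credentials):
    """Instantiate the Botometer client.

    The import is done lazily so this file loads even where the
    botometer package is not installed.
    """
    import botometer
    return botometer.Botometer(wait_on_ratelimit=True, **credentials)
```

With valid credentials in place, `bom = make_botometer(load_credentials())` would give the object used in the rest of this post.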
Now let us try with the known handle of American President Mr Donald Trump and see what the botometer says.
@realDonaldTrump is the Twitter handle of the American President. After fetching the bot score for this handle we can print and check it using the following command.
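A sketch of the fetch-and-print step is shown below. It assumes the result of `check_account` is a dictionary with a `categories` entry holding the per-category scores, which matches the response layout this post describes but may differ in newer API versions. `FakeBotometer` is a hypothetical offline stand-in with placeholder numbers (not real scores for any account) so that the snippet runs without credentials.

```python
def category_scores(bom, handle):
    """Fetch the per-category bot scores for one account."""
    result = bom.check_account(handle)
    return result.get("categories", {})


def print_scores(scores):
    """Print one 'name score' line per category, sorted by name."""
    for name, value in sorted(scores.items()):
        print(f"{name:<10} {value:.2f}")


class FakeBotometer:
    """Offline stand-in for the real client; the numbers are placeholders."""

    def check_account(self, handle):
        return {"categories": {"network": 0.05, "user": 0.02, "temporal": 0.04}}


print_scores(category_scores(FakeBotometer(), "@some_handle"))
```

With a real client, the same call would be `category_scores(bom, "@realDonaldTrump")`.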
All the bot score values are very low on a scale of 0-1, indicating that the user is most likely a human. We can also get the final, aggregate score using the following command.
Since most of the tweets made by Mr Donald Trump are in English, we can consider the English score of 0.03 on a scale of 0-1 as the final indicator. In general, we consider a score of more than 0.6 as an indication that an account is being controlled by a bot. So if any of you had doubts about whether the tweets from the account @realDonaldTrump were made by a human, these scores suggest that the account is not run by a bot.
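The aggregate lookup and the 0.6 rule of thumb can be sketched as follows. The `scores` key with `english` and `universal` entries is an assumption about the response layout based on the scores discussed in this post. The first sample reuses the 0.03 English score quoted above, while its universal value and the whole second sample are placeholders.

```python
BOT_THRESHOLD = 0.6  # rule of thumb from the text, not an official cutoff


def final_score(result, mostly_english=True):
    """Pick the English score for English-language accounts, else the universal one."""
    key = "english" if mostly_english else "universal"
    return result["scores"][key]


def is_probable_bot(result, mostly_english=True, threshold=BOT_THRESHOLD):
    """True when the chosen aggregate score exceeds the threshold."""
    return final_score(result, mostly_english) > threshold


low_score_account = {"scores": {"english": 0.03, "universal": 0.05}}
high_score_account = {"scores": {"english": 0.85, "universal": 0.80}}

print(is_probable_bot(low_score_account))
print(is_probable_bot(high_score_account))
```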
Now let us get back to the Indian election and check how the Twitter handles of the two top candidates perform.
First, let us run the analysis for the current Prime Minister of India: Mr Narendra Modi. His Twitter handle is @narendramodi.
And his bot scores are
Since some of Mr Narendra Modi’s tweets are in Hindi, we need to consider the universal score in this case, which again is very low, indicating a human user.
Now let us perform the same exercise for the main opposition prime ministerial candidate: Mr Rahul Gandhi.
Again, the score here is very low, indicating a human user. The purpose of this exercise is to sanity-check the predictions made by botometer, and they look satisfactory and as expected.
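The sanity check above can be wrapped in a small loop over known handles. This is a sketch under the same assumption about the response layout (a `scores` entry with `english` and `universal` keys); `sanity_check` and the stub client are hypothetical names, and the stub returns placeholder values rather than real scores.

```python
def sanity_check(bom, handles, language="universal", threshold=0.6):
    """Label each known handle as 'likely human' or 'likely bot'."""
    verdicts = {}
    for handle in handles:
        result = bom.check_account(handle)
        score = result["scores"][language]
        verdicts[handle] = "likely bot" if score > threshold else "likely human"
    return verdicts


class StubBotometer:
    """Offline stand-in that returns the same placeholder scores for any handle."""

    def check_account(self, handle):
        return {"scores": {"english": 0.04, "universal": 0.05}}


known_handles = ["@realDonaldTrump", "@narendramodi"]
for handle, verdict in sanity_check(StubBotometer(), known_handles).items():
    print(handle, verdict)
```

If a handle that is known to belong to a real person came back as "likely bot", that would be a reason to distrust the scores before using them for filtering.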
But we have not yet identified the bot followers of these leaders. We can do this by importing the followers' Twitter IDs using the tweepy library and then running the botometer check on each of them. The process of separating bots and calculating the market sentiment might seem like a very difficult task, but it can be done easily with proper preprocessing. In our new course on Sentiment Analysis in Trading, we show you how to fetch the tweets related to a stock using conditional statements and then perform the preprocessing to identify and remove the unrelated tweets. We then generate the market sentiment and create a trading strategy based on this.
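The follower check mentioned above (fetch follower IDs with tweepy, then score each one) might be sketched like this. It assumes an authenticated `tweepy.API` object and a Botometer client whose `check_accounts_in` yields `(account, result)` pairs. The name of the follower-ID method differs between tweepy versions, so the lookup below is hedged, and both helper functions are hypothetical.

```python
def fetch_follower_ids(api, handle, limit=100):
    """Collect up to `limit` follower IDs for a handle using tweepy."""
    import tweepy  # imported lazily so the rest of the sketch runs without it

    # tweepy 4.x calls this endpoint get_follower_ids; 3.x called it followers_ids.
    method = getattr(api, "get_follower_ids", None) or api.followers_ids
    return list(tweepy.Cursor(method, screen_name=handle).items(limit))


def bot_share(bom, follower_ids, language="universal", threshold=0.6):
    """Fraction of successfully checked followers whose score exceeds the threshold."""
    checked = 0
    bots = 0
    for _account, result in bom.check_accounts_in(follower_ids):
        if "scores" not in result:
            continue  # skip accounts that could not be scored (e.g. protected)
        checked += 1
        if result["scores"][language] > threshold:
            bots += 1
    return bots / checked if checked else None
```

Scoring followers one by one is slow and subject to API rate limits, so working on a sample of followers (the `limit` argument) is a pragmatic starting point.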
The botometer library is built specifically for analysing the Twitter feed. It is a very important library for preprocessing Twitter data before the data can be used to create a trading strategy. Gauging the market sentiment is a very difficult task, and it needs data from various sources. Not all data can be treated the same way, so you need to develop new techniques to preprocess data from blogs and news articles.
You can learn how to preprocess different types of data and other important steps involved in the sentiment analysis by enrolling in our new course: Sentiment Analysis in Trading
Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stocks, options, or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies and related information mentioned in this article are for informational purposes only.