Stock Market Data And Analysis In Python

15 min read

By Ishan Shah

In this article, you will learn to get the stock market data such as price, volume and fundamental data using python packages and how to analyze it.

In backtesting your strategies or analyzing the performance, one of the first hurdles faced is getting the right stock market data and in the right format, isn't it? Don't worry.

After reading this, you will be able to:

  • Fetch the open, high, low, close, and volume data.
  • Get data at a custom frequency such as 1 minute, 7 minutes or 2 hours
  • Perform analysis of your portfolio
  • Get the earnings data, balance sheet data, cash flow statements and various key ratios such as price to earnings (PE) and price to book value (PB)
  • Get the futures and options data for Indian stock market

Generally, web sources are quite unstable and therefore, you will learn to get the stock market data from multiple web sources.

For easy navigation, this article is divided as below.

  1. Price Volume Daily Data
  2. Intraday Data
  3. Fundamental Data
  4. Futures and Options Data
  5. Visualization and Analysis

Price Volume Daily Data

Yahoo Finance

One of the first sources from which you can get daily price-volume stock market data is Yahoo finance. You can use pandas_datareader or yfinance module to get the data.

In [ ]:
!pip install pandas_datareader==0.7.0
In [22]:
# Import pandas datareader
import pandas_datareader
pandas_datareader.__version__
Out[22]:
'0.7.0'
In [7]:
# Yahoo recently has become an unstable data source.
# If it gives an error, you may run the cell again, or try yfinance
import pandas as pd
from pandas_datareader import data
# Set the start and end date
start_date = '1990-01-01'
end_date = '2019-02-01'
# Set the ticker
ticker = 'AMZN'
# Get the data
data = data.get_data_yahoo(ticker, start_date, end_date)
data.head()
Out[7]:
DateHighLowOpenCloseVolumeAdj Close
1997-05-152.5000001.9270832.4375001.95833372156000.01.958333
1997-05-161.9791671.7083331.9687501.72916714700000.01.729167
1997-05-191.7708331.6250001.7604171.7083336106800.01.708333
1997-05-201.7500001.6354171.7291671.6354175467200.01.635417
1997-05-211.6458331.3750001.6354171.42708318853200.01.427083

To visualize the adjusted close price data, you can use the matplotlib library and plot method as shown below.

In [9]:
import matplotlib.pyplot as plt
%matplotlib inline
data['Adj Close'].plot()
plt.show()

Let us improve the plot by resizing, giving appropriate labels and adding grid lines for better readability.

In [10]:
# Plot the adjusted close price
data['Adj Close'].plot(figsize=(10, 7))
# Define the label for the title of the figure
plt.title("Adjusted Close Price of %s" % ticker, fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
# Show the plot
plt.show()

Advantages

  1. Adjusted close price stock market data is available
  2. Most recent stock market data is available
  3. Doesn't require API key to fetch the stock market data

Disadvantages

  1. It is not a stable source to fetch the stock market data

If the stock market data fetching fails from yahoo finance using the pandas_datareader then you can use yfinance package to fetch the data.

Quandl

Quandl has many data sources to get different types of data. However, some are free and some are paid. Wiki is the free data source of Quandl to get the data of the end of the day prices of 3000+ US equities.

It is curated by Quandl community and also provides information about the dividends and split.

To get the stock market data, you need to first install the quandl module if it is not already installed using the pip command as shown below.

In [ ]:
!pip install quandl

You need to get your own API Key from quandl to get the stock market data using the below code. If you are facing issue in getting the API key then you can refer to this link.

After you get your key, assign the variable `QUANDL_API_KEY` with that key. Then set the start date, end date and the ticker of the asset whose stock market data you want to fetch.

The quandl get method takes this stock market data as input and returns the open, high, low, close, volume, adjusted values and other information.

In [1]:
# Import the quandl
import quandl
# To get your API key, sign up for a free Quandl account.
# Then, you can find your API key on Quandl account settings page.
QUANDL_API_KEY = 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY'
# This is to prompt you to change the Quandl Key
if QUANDL_API_KEY == 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY':
 raise Exception("Please provide a valid Quandl API key!")
# Set the start and end date
start_date = '1990-01-01'
end_date = '2018-03-01'
# Set the ticker name
ticker = 'AMZN'
# Feth the data
data = quandl.get('WIKI/'+ticker, start_date=start_date,
 end_date=end_date, api_key=QUANDL_API_KEY)
# Print the first 5 rows of the dataframe
data.head()
Out[1]:
DateOpenHighLowCloseVolumeEx-DividendSplit RatioAdj. OpenAdj. HighAdj. LowAdj. CloseAdj. Volume
1997-05-1622.3823.7520.5020.751225000.00.01.01.8650001.9791671.7083331.72916714700000.0
1997-05-1920.5021.2519.5020.50508900.00.01.01.7083331.7708331.6250001.7083336106800.0
1997-05-2020.7521.0019.6319.63455600.00.01.01.7291671.7500001.6358331.6358335467200.0
1997-05-2119.2519.7516.5017.131571100.00.01.01.6041671.6458331.3750001.42750018853200.0
1997-05-2217.2517.3815.7516.75981400.00.01.01.4375001.4483331.3125001.39583311776800.0
In [3]:
# Define the figure size for the plot
plt.figure(figsize=(10, 7))
# Plot the adjusted close price
data['Adj. Close'].plot()
# Define the label for the title of the figure
plt.title("Adjusted Close Price of %s" % ticker, fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

Get stock market data for multiple tickers

To get the stock market data of multiple stock tickers, you can create a list of tickers and call the quandl get method for each stock ticker.[1]

For simplicity, I have created a dataframe data to store the adjusted close price of the stocks.

In [4]:
# Define the ticker list
import pandas as pd
tickers_list = ['AAPL', 'IBM', 'MSFT', 'WMT']
# Import pandas
data = pd.DataFrame(columns=tickers_list)
# Feth the data
for ticker in tickers_list:
 data[ticker] = quandl.get('WIKI/' + ticker, start_date=start_date,
 end_date=end_date, api_key=QUANDL_API_KEY)['Adj. Close']
# Print first 5 rows of the data
data.head()
Out[4]:
DateAAPLIBMMSFTWMT
1990-01-021.11809314.1381440.4102784.054211
1990-01-031.12559714.2636560.4125904.054211
1990-01-041.12949914.4266780.4247024.033561
1990-01-051.13310114.3906110.4143003.990541
1990-01-081.14060514.4800570.4206804.043886
In [5]:
# Plot all the close prices
data.plot(figsize=(10, 7))
# Show the legend
plt.legend()
# Define the label for the title of the figure
plt.title("Adjusted Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Year', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

Advantages

  1. It is free of cost
  2. Has split and dividend-adjusted stock market data

Disadvantages

  1. Only available till 27-March-2018

Intraday Data

Alpha Vantage

Alpha vantage is used to get the minute level stock market data. You need to signup on alpha vantage to get the free API key.

In [ ]:
# Install the alpha_vantage if not already installed
!pip install alpha_vantage

Assign the ALPHA_VANTAGE_API_KEY, with your API Key in the below code.

In [12]:
# Import TimeSeries class
from alpha_vantage.timeseries import TimeSeries
ALPHA_VANTAGE_API_KEY = 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY'
# This is to prompt you to change the Quandl Key
if ALPHA_VANTAGE_API_KEY == 'REPLACE-THIS-TEXT-WITH-A-REAL-API-KEY':
 raise Exception("Please provide a valid Alpha Vantage API key!")
# Initialize the TimeSeries class with key and output format
ts = TimeSeries(key=ALPHA_VANTAGE_API_KEY, output_format='pandas')
# Get pandas dataframe with the intraday data and information of the data
intraday_data, data_info = ts.get_intraday(
 'GOOGL', outputsize='full', interval='1min')
# Print the information of the data
data_info
Out[12]:
{'1. Information': 'Intraday (1min) open, high, low, close prices and volume',
 '2. Symbol': 'GOOGL',
 '3. Last Refreshed': '2019-08-01 16:00:00',
 '4. Interval': '1min',
 '5. Output Size': 'Full size',
 '6. Time Zone': 'US/Eastern'}

This gives information about the stock market data which is returned. The information includes the type of data returned such as open, high, low and close, the symbol or ticker of the stock, last refresh time of the data, frequency of the stock market data and the time zone.

In [13]:
# Print the intraday data
intraday_data.head()
Out[13]:
DateOpenHighLowCloseVolume
2019-07-26 09:31:001228.23001232.491228.00001230.7898407037.0
2019-07-26 09:32:001230.92001235.131230.43011233.0000111929.0
2019-07-26 09:33:001233.00001237.901232.75001237.900086564.0
2019-07-26 09:34:001237.44491241.901237.00001241.9000105884.0
2019-07-26 09:35:001241.93991244.491241.35001243.130074444.0
In [19]:
intraday_data['4. close'].plot(figsize=(10, 7))
# Define the label for the title of the figure
plt.title("Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Time', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

Get data at a custom frequency

During strategy modelling, you are required to work with a custom frequency of stock market data such as 7 minutes or 35 minutes. This custom frequency candles are not provided by data vendors or web sources.

In this case, you can use the pandas resample method to convert the stock market data to the frequency of your choice. The implementation of these is shown below where a 1-minute frequency data is converted to 10-minute frequency data.

The first step is to define the dictionary with the conversion logic. For example, to get the open value the first value will be used, to get the high value the maximum value will be used and so on.

In [ ]:
ohlcv_dict = {
 '1. open': 'first',
 '2. high': 'max',
 '3. low': 'min',
 '4. close': 'last',
 '5. volume': 'sum'
}

Convert the index to datetime timestamp as by default string is returned. Then call the resample method with the frequency such as

  • 10T for 10 minutes,
  • D for 1 day and
  • M for 1 month
In [130]:
intraday_data.index = pd.to_datetime(intraday_data.index)
intraday_data_10 = intraday_data.resample('10T').agg(ohlcv_dict)
intraday_data_10.head()
Out[130]:
DateOpenHighLowCloseVolume
2019-07-17 09:30:001150.92001155.55001150.5101154.2882911.0
2019-07-17 09:40:001154.29501157.84001154.2951157.7635549.0
2019-07-17 09:50:001157.42501158.43991155.6701155.6730371.0
2019-07-17 10:00:001155.47001156.02001153.0901153.0921445.0
2019-07-17 10:10:001153.01941153.42001151.0001152.2931073.0

yfinance

yfinance is another module which can be used to fetch the minute level stock market data. It returns the stock market data for the last 7 days.

In [ ]:
# Install the yfinance if not already installed
!pip install yfinance

The yfinance module has the download method which can be used to download the stock market data. It takes the following parameters:

  1. ticker: The name of the tickers you want the data for. If you want data for multiple tickers then separate them by space
  2. period: The number of days/month of data required. The valid frequencies are 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
  3. interval: The frequency of data. The valid intervals are 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo

The below code fetches the stock market data for MSFT for the past 5 days of 1-minute frequency.

In [133]:
data = yf.download(tickers="MSFT",
 period="5d",
 interval="1m")
data.head()
[*********************100%***********************] 1 of 1 downloaded
Out[133]:
DatetimeOpenHighLowCloseAdj CloseVolume
2019-07-17 09:30:00-04:00137.70137.75137.23137.33137.33645676
2019-07-17 09:31:00-04:00137.33137.43137.22137.40137.40112675
2019-07-17 09:32:00-04:00137.39137.40137.18137.29137.2973906
2019-07-17 09:33:00-04:00137.44137.58137.39137.42137.42127492
2019-07-17 09:34:00-04:00137.44137.52137.43137.45137.4556630

Stocks Fundamental Data

We have used yfinance to get the fundamental data.

The first step is to set the ticker and then call the appropriate properties to get the right stock market data.

In [ ]:
# Install the yfinance if not already installed
!pip install yfinance
In [ ]:
# Import yfinance
import yfinance as yf
# Set the ticker as MSFT
msft = yf.Ticker("MSFT")

Key Ratios

You can fetch the latest price to book ratio and price to earnings ratio as shown below.

In [ ]:
# get price to book
pb = msft.info['priceToBook']
pe = msft.info['regularMarketPrice']/msft.info['epsTrailingTwelveMonths']
print('Price to Book Ratio is: %.2f' % pb)
print('Price to Earnings Ratio is: %.2f' % pe)

Revenues

In [134]:
# show revenues
revenue = msft.financials.loc['Total Revenue']
plt.bar(revenue.index, revenue.values)
plt.ylabel("Total Revenues")
plt.show()

Earnings Before Interest and Taxes

In [135]:
EBIT = msft.financials.loc['Earnings Before Interest and Taxes']
plt.bar(EBIT.index, EBIT.values)
plt.ylabel("EBIT")
plt.show()

Balance sheet, cash flows and other information

In [ ]:
# show income statement
msft.financials
# show balance heet
msft.balance_sheet
# show cashflow
msft.cashflow
# show other info
msft.info

Futures and Options (F&O) Data for Indian Equities

NSEpy

The nsepy package is used to get the stock market data for the futures and options for Indian stocks and indices.

Futures Data

In [15]:
from datetime import date
from nsepy import get_history
# Stock options (for index options, set index = True)
stock_fut = get_history(symbol="HDFC",
 start=date(2019, 1, 15),
 end=date(2019, 2, 1),
 futures=True,
 expiry_date=date(2019, 2, 28))
stock_fut.head()
Out[15]:
DateSymbolExpiryOpenHighLowCloseLastSettle PriceNumber of ContractsTurnoverOpen InterestChange in OIUnderlying
2019-01-15HDFC2019-02-281986.702011.001982.952008.252006.202008.2548104.796817e+09253750022995001992.15
2019-01-16HDFC2019-02-282002.102010.151985.201992.151991.301992.1526562.655748e+09378350012460001975.00
2019-01-17HDFC2019-02-282003.602019.051991.602017.152013.002017.1539934.008667e+0955450001761500NaN
2019-01-18HDFC2019-02-282018.552025.752005.002018.402017.252018.404814.845300e+085637000920002006.85
2019-01-21HDFC2019-02-282011.252031.101998.002016.552016.602016.5514891.505249e+0962580006210002004.45
In [20]:
import matplotlib.pyplot as plt
stock_fut.Close.plot(figsize=(10, 5))
# Define the label for the title of the figure
plt.title("Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Date', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

Options Data

In [17]:
from datetime import date
from nsepy import get_history
stock_opt = get_history(symbol="HDFC",
 start=date(2019, 1, 15),
 end=date(2019, 2, 1),
 option_type="CE",
 strike_price=2000,
 expiry_date=date(2019, 2, 28))
stock_opt.head()
Out[17]:
DateSymbolExpiryOption TypeStrike PriceOpenHighLowCloseLastSettle PriceNumber of ContractsTurnoverPremium TurnoverOpen InterestChange in OIUnderlying
2019-01-15HDFC2019-02-28CE2000.052.7056.0052.7056.056.056.033081000.081000.01000010001992.15
2019-01-16HDFC2019-02-28CE2000.055.0055.0049.0049.049.049.01414358000.0358000.01100010001975.00
2019-01-17HDFC2019-02-28CE2000.059.1564.6551.0061.961.961.92727750000.0750000.0185007500NaN
2019-01-18HDFC2019-02-28CE2000.063.0063.0060.0060.060.060.077212000.0212000.01850002006.85
2019-01-21HDFC2019-02-28CE2000.062.0569.0062.0562.962.962.966198000.0198000.02000015002004.45
In [21]:
import matplotlib.pyplot as plt
stock_opt.Close.plot(figsize=(10, 5))
# Define the label for the title of the figure
plt.title("Close Price", fontsize=16)
# Define the labels for x-axis and y-axis
plt.ylabel('Price', fontsize=14)
plt.xlabel('Date', fontsize=14)
# Plot the grid lines
plt.grid(which="major", color='k', linestyle='-.', linewidth=0.5)
plt.show()

Visualization and Analysis

After you have the stock market data, the next step is to create trading strategies and analyze the performance.

I have created a simple buy and hold strategy for illustration purpose with four stocks namely Apple, Amazon, Microsoft and Walmart.

To analyze the performance, you can use the pyfolio tear sheet as shown below.

In [ ]:
# Install pyfolio if not already installed
!pip install pyfolio
In [ ]:
import pyfolio as pf
In [16]:
# Define the ticker list
tickers_list = ['AAPL', 'AMZN', 'MSFT', 'WMT']
# Import pandas and create a placeholder for the data
import pandas as pd
data = pd.DataFrame(columns=tickers_list)
# Feth the data
import yfinance as yf
for ticker in tickers_list:
 data[ticker] = yf.download(ticker, period='5y',)['Adj Close']
# Compute the returns of individula stocks and then compute the daily mean returns.
# The mean return is the daily portfolio returns with the above four stocks.
data = data.pct_change().dropna().mean(axis=1)
# Print first 5 rows of the data
data.head()
[*********************100%***********************] 1 of 1 downloaded
[*********************100%***********************] 1 of 1 downloaded
[*********************100%***********************] 1 of 1 downloaded
[*********************100%***********************] 1 of 1 downloaded
Out[16]:
Date
2014-07-29 -0.003227
2014-07-30 -0.002592
2014-07-31 -0.020351
2014-08-01 -0.005142
2014-08-04 0.006887
dtype: float64
In [21]:
pf.create_full_tear_sheet(data)
Start date2014-07-29
End date2019-07-25
Total months59

Backtest
Annual return26.6%
Cumulative returns224.4%
Annual volatility18.2%
Sharpe ratio1.39
Calmar ratio1.11
Stability0.96
Max drawdown-24.1%
Omega ratio1.28
Sortino ratio2.07
Skew0.15
Kurtosis4.25
Tail ratio0.94
Daily value at risk-2.2%
Worst drawdown periodsNet drawdown in %Peak dateValley dateRecovery dateDuration
024.082018-10-012018-12-242019-04-17143
113.152015-07-302015-08-252015-10-2362
213.122015-12-072016-02-092016-04-1393
39.652019-05-032019-06-032019-06-1833
48.462018-03-122018-04-022018-05-1044
Stress Eventsmeanminmax
Oct140.04%-1.68%2.17%
Fall2015-0.17%-4.67%5.35%
New Normal0.10%-4.67%7.17%


I hope you can use the Python codes to fetch the stock market data of your favourites stocks, build the strategies and analyze it. I would appreciate if you could share your thoughts and your comments below.

Python is quite essential to understand data structures, data analysis, dealing with financial data, and for generating trading signals. For traders and quants who want to learn and use Python in trading, this bundle of courses is just perfect.

Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.