python quantitative stock analysis

Stock analysis with Python: first time lucky

I’ve been looking for something different than Excel for quite some time. Don’t get me wrong, spreadsheets are great, I’m not sure how people managed without them back in the day. However, at some point you just want more and I’ll try to do the stock analysis with Python instead.

A buddy of mine suggested I try MATLAB some time ago so I installed it and then never figured out how to use it. Then I came across Python and I thought that if institutions are using it I should give it a try too.

There was only one problem. I’ve never written a single line of code!

Thankfully it’s the 21st century and we have access to the collective wisdom of the world via the internet. I have to say that the web didn’t disappoint. The amount of free tutorials, courses and code is astonishing.

The purpose of this article is to show you what you can achieve as an absolute beginner within a matter of weeks.

Applications of Python for finance

Since we’re talking about a high level programming language the applications are almost unlimited. You can analyse companies and portfolios or develop a trading strategy and even write your own high frequency trading algorythm.

But let’s not get carried away, coding is difficult so I decided to start by running a single stock analysis with Python. Before we dive into coding I’ll go through the installation and learning process.

Installation and tutorials

It’s really easy to get started, you just go to the Python download page and install it. Then search your computer for IDLE, open it, click on the File menu and then new.

You’ll get something that looks like an empty Notepad file. What do you do now? That’s when the tutorials become handy.

Think of it like learning a foreign language. You start with simple words and expressions before you can talk about theoretical physics.

The advantage of Python is that it can be easier than French depending on what you’re coding. Here’s the first bit of code I wrote as part of a beginner tutorial:

You can see that the code is relatively intuitive, most of it is just math. You should be able to write something like this very early on.

Stock analysis with Python

I just wanted to get historical data, print some charts and figure out the expected move of a stock. Then I thought that the formula below was somewhat simplistic:

Expected move = share price * (volatility / 100) * square root of (days / 365)

Why not run a Monte Carlo simulation instead? How hard can it be?

It turned out that it was easier than I expected. That’s great because I can’t do it in Excel so if I can code it I’ll try to migrate to Python.

Writing the code

I read a few hours worth of tutorials and decided that I won’t really learn anything unless I start coding.

The tutorials cover all kinds of applications. However, my own goal was to write a specific bit of code, not to become a Python developer. That’s when the internet really did its thing.

I found some awesome no nonsense finance tutorials here, here and here. Most of the code in the article is based on these 3 sources.

One important thing to consider is that you need to be comfortable to operate a computer through a command prompt. The first reason is that you’ll need to install some libraries. And the second is that the code works in a similar way to writing commands.

What does a stock analysis with Python look like

First you have to install the libraries. We’ll need a data analysis tool, plotting tool for charts and some other bits. You just go to the command prompt and type.

pip install pandas
 numpy matplotlib scipy seaborn requests

If you have any trouble installing any of these you may need to use Anaconda instead.

Afterwards you can start coding. Open a new file in IDLE and import the stuff we need. It’s a bit like when you install a software which then goes on to install drivers and other things. Only this time you are the one picking the other things.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pandas_datareader as web
from scipy.stats import norm
import seaborn as sns
import requests
import datetime
import random

Then we set a style for all the figures. There is one that blends with my website’s colours, very nice!

plt.rcParams['figure.figsize'] = (10,6)
plt.rcParams['figure.autolayout'] = False
plt.rcParams['lines.linewidth'] = 2
plt.rcParams['axes.grid'] = True'Solarize_Light2')

Data collection

Now it’s time to pick up some stocks and download the historical returns:

# Inputs:
ticker = 'SPY' 
start, end = datetime.datetime(2016, 1, 10), datetime.datetime(2021, 1, 12) #period
# Monte Carlo & bootstrapping
days = 252 
trials = 100

We’ll stick to the SPY ETF and set up datetime to input the dates of the study. These may need to be repeated again and again so I named them start and end to spare typing.

The last bit is the input for the Monte Carlo simulation: trading days and number of paths. For those unfamiliar 1 year consists of ~252 trading days. You can run as many paths as you want (millions) as long as your computer can handle it.

Next we download the data and start processing:

data = pd.DataFrame()
data[ticker] = web.DataReader(ticker, data_source = 'yahoo', start = start, end = end)['Adj Close']
stock_daily_returns = data[ticker].pct_change()
log_returns = np.log(1 + data.pct_change()) 
freq_ret = log_returns.iloc[1:]
stock_stdev = stock_daily_returns.std() * 100
stock_mean = stock_daily_returns.mean() * 100
print(f"{ticker} daily mean return is: {stock_mean.round(2)}%")
print(f"{ticker} daily standard deviation: {stock_stdev.round(2)}%")

The Pandas library has a built-in data reader which downloads information from Yahoo Finance so you need to make sure you’re using their tickers, like ^GSPC for the S&P 500.

I like to print some of the data. The output is pretty basic, however, the advantage is that you don’t need to code any user interface. Note that you have to tell the script to print stuff otherwise it will process the data but you won’t see any output.

Output and charts

We’ll move on to the exciting part, the charts.

subplots_ratio = dict(width_ratios=[3,2], height_ratios=[3,2])
fig, ax = plt.subplots(2,2, gridspec_kw=subplots_ratio, figsize=(10,6))
data.plot(ax=ax[0,0], grid=True, linewidth=2, legend=None, xlabel=(""))
data.plot.hist(ax=ax[0,1], grid=True, bins=60, ylabel=(""))
stock_daily_returns.plot(ax=ax[1,0], grid=True, linewidth=2, legend=None, xlabel=(""))
freq_ret.plot.hist(ax=ax[1,1], grid=True, bins=60, legend=None)
plt.axvline(np.percentile(freq_ret,16), color='g', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(freq_ret,84), color='g', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(freq_ret,2.5), color='orange', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(freq_ret,97.5), color='orange', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(freq_ret,0.15), color='r', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(freq_ret,99.85), color='r', linestyle='dashed', linewidth=2)

I know that looks like a lot to digest but if you start coding you’ll get used to it fairly quickly. It’s just the share price, price frequency, volatility of the returns and their frequency. You can plot these as individual charts, it’s a personal preference.

I also added some lines to mark where one, two and tree standard deviations should be based on the number of occurrences. This is to figure out what’s the actual amount of tail events vs the one implied by the standard deviation. I’ll explain this in a bit.

The returns chart

The charts are great, the first time I plotted one I thought wow, that’s really something. Here’s the result:

stock analysis with python

It’s a vector graphic so you can zoom in as much as you want. If I want to know what’s between the 2 orange lines I just need to click the magnifying glass icon and then drag a rectangle.

spy distribution of returns

Distribution statistics for stock analysis with Python

We’ll look into the mean return, standard deviation, skew and kurtosis of the distribution. However, before we start I have to make it clear that the standard deviation has limitations and has been criticised as a risk measure.

There is a lot of merit to these arguments so it’s good practice to use a few different methods to inform investing decisions. Nevertheless, it’s widely applied in Finance and I don’t see why we can’t use it as long as we take its shortcomings into account.

Limitations of Gaussian statistics

SPY’s mean daily return was 0.07% and the standard deviation (sigma) was ±1.19%, meaning that it fluctuates anywhere within the range of ±1.26% around 68% of trading days. This is a one standard deviation move. A two sigma move is supposed to be ±2.45% and cover 95% of all occurrences, while three standard deviations add up to ±3.64% and cover 99.7% of all daily returns.

Tell that to the people who traded when the NYSE’s market-wide Level 1 circuit breaker got activated minutes after market open last year, the S&P 500 lost 7% of its value.

That’s the reason I added the green, orange and red lines. We’re expecting around 4 tail events between 2016 and 2021 with a magnitude of ±3.65% or greater. Let’s compare with the actuals:

spy distribution of returns

The reality is completely different, 99.7% of the realised observations are within the -8% to +6% range. We also have around 15 tail events or about 1.2% of all occurrences if we were to use the ±3.64% range.

Some of these are of a 6 sigma magnitude (±7.3%) which is given a theoretical probability of 3.4 per every 1,000,000 observations. In other words these events should have never happened in our 1260 cycles, but they did, on 5 separate occasions.

Another issue is that the returns are not symmetrical. It’s quite clear that although the S&P 500 has an overall positive drift the tail events to the downside are of greater magnitude compared to the upside ones.

Time to move on before I digress.

Mean return, standard deviation, skew and kurtosis

Next we’ll print a chart of the above statistical measures

rs = data.apply(np.log).diff(1) 
w = 22
s1 = rs.rolling(w).mean()
s2 = rs.rolling(w).std()
s3 = rs.rolling(w).skew()
s4 = rs.rolling(w).kurt()
# Distribution statistics
signals = pd.concat([s1, s2, s3, s4], axis=1)
signals.columns = ['mean', 'std dev', 'skew', 'kurtosis']
signals.plot(title="Distribution data", subplots=True, figsize=(10,6))

Here’s the output:

stock analysis with python kurtosis skew

First we have the mean return which is mostly positive. Then the standard deviation. You can see that sigma can be unpredictable and the historic one has little to do with whatever may or may not happen in the future.

The skew shows us the shape of the distribution. For practical purposes positive skew means that the tail risk is more significant to the upside while negative skew makes it more significant to the down side.

The orange distribution has negative skew, the red has normal skew and the blue has positive skew

The kurtosis is another measure of tail risk. However, it’s more about its magnitude and the distribution of returns. A kurtosis of 0 means that it is more likely that the volatility of the daily returns will remain within the boundaries implied by the standard deviation.

Positive kurtosis is indicative of increased risk of tail events. It can’t tell us if these are to the downside or the upside but we only care about the downside anyway. I mean no one will complain if a stock they hold experiences a +30% tail event.

Negative kurtosis means that the stock is not expected to make any significant moves to the upside or the downside.

Different values of the kurtosis

So far I’m really impressed by the stock analysis with Python and it’s great that I can go through the data so quickly. Nevertheless, I’m doing this for practical reasons and I wanted something forward looking.

I figured that trying to anticipate the expected move of a stock would be useful. Sure, historical returns are not indicative of future ones but they are widely used for analysis. Let’s not forget that hundreds of technical indicators like MACD, RSI, VWAP and so on are all based on historical data.

Monte Carlo simulation

The first time I read ‘….you can run a Monte Carlo simulation on Crystal Ball…’, honestly I thought that it was a typo. But still I got interested and nowadays my broker runs 1,000 Monte Carlo paths on all options trades.

It seems like a good idea, I’ve always done a sensitivity analysis as part of financial modelling and usually assign a minimum of 1% probability for rare events. The simulation is just a different way of achieving the same.

I can’t do it in Excel so I wondered how hard can it be and will it add value to the stock analysis with Python. It turned out it’s really easy, just 10 lines of code!

u = log_returns.mean()
var = log_returns.var()
drift = u - (0.5*var)
stdev = log_returns.std()
Z = norm.ppf(np.random.rand(days, trials)) 
daily_returns = np.exp(drift.values + stdev.values * Z)
price_paths = np.zeros_like(daily_returns)
price_paths[0] = data.iloc[-1]
for t in range(1, days):
    price_paths[t] = price_paths[t-1]*daily_returns[t]

print(f"68% probability that {ticker} will trade within this range over the next {days} days:")
print("16% percentile =",np.percentile(price_paths[t],16))
print("Median ex move =",np.percentile(price_paths[t],50))
print("84% percentile =",np.percentile(price_paths[t],84))

How does that inform the stock analysis with Python

The output is the 68% probability expected move. It’s based on the range where 68% of the paths ended, so we just exclude the first 16 percentiles and the last 16 to remove 32% of the paths. We can present it graphically too by including the following code:

# Paths chart
fig = plt.figure()
ax1 = fig.add_axes([0.1,0.1,0.8,0.8])
plt.axhline(np.percentile(price_paths,16), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(price_paths,84), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(price_paths,2.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(price_paths,97.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(price_paths,0.15), color='r', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(price_paths,99.85), color='r', linestyle='dashed', linewidth=2)
ax1.set_title("Monte Carlo paths")
# Frequency of the final price
plt.axvline(np.percentile(price_paths[t],16), color='g', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(price_paths[t],84), color='g', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(price_paths[t],2.5), color='orange', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(price_paths[t],97.5), color='orange', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(price_paths[t],0.15), color='r', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(price_paths[t],99.85), color='r', linestyle='dashed', linewidth=2)
plt.title("Frequency of the Monte Carlo paths")

Here are the plots:

monte carlo stock analysis with python
monte carlo stock analysis with python

The distribution of the simulated price of SPY in a year (2022) has a kurtosis near 0 and positive skew. This is the end of period price so daily tail events are not included. Overall, I agree with the output, it is not very likely that SPY will trade for $250 or $700. Still, it’s certainly not impossible.


Bootstrapping is another method you can use to randomise the price paths. This means that the script picks random chunks of data and constructs a path.

It’s a bit like candlestick patterns. There are many well known patterns and some are alleged to be indicative of one outcome or another no matter the stock. We’re doing something similar via the bootstrapping but we’ll have to run another Monte Carlo to double check the output. Here is the code:

bootstrap_data = data[ticker]
stock = pd.DataFrame([bootstrap_data],
stock_returns = stock.pct_change().dropna().mean(axis=1)

stock_bootstrapping = (1+pd.DataFrame([random.choices(list(
    stock_returns.values), k=days) for i in 
# Bootstrapping
stock_bootstrapping.plot(figsize=(10,6), legend=False, linewidth=1, alpha=0.2, color='b')
plt.axhline(np.percentile(stock_bootstrapping,16), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_bootstrapping,84), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_bootstrapping,2.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_bootstrapping,97.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_bootstrapping,0.15), color='r', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_bootstrapping,99.85), color='r', linestyle='dashed', linewidth=2)
plt.title("Bootstrapped simulation")
print("16% percentile =",np.percentile(stock_bootstrapping,16))
print("Median ex move =",np.percentile(stock_bootstrapping,50))
print("84% percentile =",np.percentile(stock_bootstrapping,84))
# Monte Carlo 2
stockmr = stock_returns.mean()
sigma = stock_returns.std()
stock_mc = pd.DataFrame([(np.random.normal(loc=stockmr, scale=sigma, size=days)+1) for x in range(trials)]).T.cumprod()
stock_mc.plot(figsize=(10,6), legend=False, linewidth=1, alpha=0.2, color='green')
plt.axhline(np.percentile(stock_mc,16), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_mc,84), color='g', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_mc,2.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_mc,97.5), color='orange', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_mc,0.15), color='r', linestyle='dashed', linewidth=2)
plt.axhline(np.percentile(stock_mc,99.85), color='r', linestyle='dashed', linewidth=2)
plt.title("Parametric Monte Carlo simulation")
print("Parametric Monte Carlo:")
print("16% percentile =",np.percentile(stock_mc,16))
print("Median ex move =",np.percentile(stock_mc,50))
print("84% percentile =",np.percentile(stock_mc,84))

This is the output:

bootstrapping stock analysis with python
parametric monte carlo stock analysis with python

The results of the 3 simulations are pretty similar. It looks good but it remains to be seen if there is any practical value.

Stock analysis with Python: tests

We’ll tell the script to process SPY data between Jan 2015 and Jan 2020 to see what range we’ll get for 2021. The 12th of January 2021 close is ~$379.

Here are the results:

spy stock analysis with python

SPY ended in the upper end of the anticipated 68% probability range and delivered a return of ~18%.

United Airlines

Overall it’s not a bad output but I think that we should try it with something that didn’t recover from the covid crash, like United Airlines:

ual stock analysis with python

This time it failed miserably as UAL closed around $46 yesterday. Maybe it didn’t exactly fail as this price is between the 95th and the 99th percentile of anticipated returns.

Note that we get the same positive skew as when we ran the simulation for SPY.

FTSE 100

I’d say that’s fair considering UAL was trading for $25 during the 2020 covid crash. Moving on to the FTSE 100, the data reader didn’t like the ^FTSE ticker and I had to switch to the VUKE.L ETF:

ftse 100 stock analysis with python

It barely scraped the 68% probability range after yesterday’s close of £29.74.

Discussion of the results

On balance I’d say that the simulation does what it says on the tin as 2 out of 3 tests ended within the anticipated 68% probability range. The third one is a proper outlier move which serves to remind us that the risk is out there.

Even when the 68% probability expected move is accurate the range is wide and the math can’t tell us if the stock will be going up or down. Clearly you need to use other methods if you want to narrow it down.

Discounted cash flow model (DCF)

What’s not to like about the DCF? One thing is that you may find it a bit difficult at first. Thankfully, we have Financial Modelling Prep who offer a ready made DCF. However, you need to register or log in with Google to get your free API key.

#DCF model section
stocks = ['FB' 'AMZN', 'AAPL', 'NFLX', 'GOOGL']
def dcf(stock):
    data = requests.get(f"{stock}?apikey=<Enter your api key here>")
    data = data.json()
    dcf = data[0]['dcf']
    current_price = data[0]['Stock Price']
    return (dcf, current_price)
data = map(dcf, stocks)
df = pd.DataFrame(data, columns=['dcf', 'current price'], index=stocks)
df.rename(columns={'index': 'symbol'}, inplace=True)
df['value'] = df.apply(lambda x: 'Under-valued' if x['dcf']>x['current price'] else 'Over-valued',axis=1)
print("DCF results:")

If you’re running the rest of the code you may want to add the ‘stocks’ list variable to the top of the script so you don’t have to scroll down to make changes. This is the output for the FAANG stocks:

discounted cash flow dcf stock analysis with python
Note that if you run this part of the code for ETF tickers you’ll crash the script because the DCF cannot be used on them

Should you make your own DCF

Now we’ve got the make or use an API dilemma to consider. You can make your own DCF and you’ll know all the assumptions, etc. Or you can use the ready made one. Which is better?

It’s difficult to say because everyone, analyst or not, can have an opinion on what the future may look like. Nevertheless, there is always something which changes or deviates from the predicted path, like covid for example, and sends our DCF straight to the trash.

I support the semi-strong efficient market hypothesis, thus I look at the DCF as something which is already priced in. It doesn’t hurt to check it though.

Conclusion on the stock analysis with Python

Stock analysis with Python is awesome! And the community support and tutorials are great too! It’s easy to get started and I intend to stick to it. If you get any errors while coding head over to the Stack Exchange, they’ve got you covered.

I know that Python is not the same thing as Excel and it’s not fair to compare them but data analysis is just a means to an end. Using Python to process it was a lot easier for me.

It’s fairly quick too, running 3 simulations with 100k paths each takes less than 5 minutes on a 1.1GHz, 8GB RAM, Windows 10 Home tablet. Another advantage is that it picks up all the data, you only need to tell it which company and specify the dates.

None of the script output has any predictive ability. It can provide a rough guide but it can’t tell us if we’ll sustain significant gains or losses.

The complexity of the market is on par with predicting the weather over a long period which leaves us without any reliable methods. There is an argument to be made that otherwise there will be no market.

If a decent number of investors can predict the share price no transactions will be completed. For example, Bitcoin is known to increase in price dramatically from time to time. The result is that everyone holds on to it as an asset which cannibalises its application as a currency.

Who would make a sizeable purchase with Bitcoin when it may go up in value by 50 or 100%?