Stock Visualisation and Prediction using Linear Regression


by Olatoyosi Olamide Babayeju

22 Sep '22

Comparing Relative Stocks Using Visualisation and Predicting Stock Prices with Linear Regression Modelling


(The opinions expressed in this blog are for general informational purposes only and are not intended to provide specific advice or recommendations for any individual or on any specific investment. It is only intended to provide education about analytical tools and the financial industry. The views reflected in the commentary are subject to change at any time without notice.)


This month at Rockborne, we have been delving into the vast world of Python and its many libraries. Python has a huge number of libraries, which make the programming language extremely powerful.


Our first aim is to compare different stocks within the same category using Python's data visualization tools. The next aim is to learn to predict stock prices using linear regression modelling. By predicting future stock prices, we can create a strategy for daily trading. It is relatively simple to predict stock prices using linear regression; the difficulty arises when trying to find the right combinations to make predictions profitable.


Importing The Stock Data into Python

First, let us import our stock data. In this case, we got our data from Kaggle. Once we have downloaded the necessary file, we need to load it into memory as a pandas DataFrame.
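As a minimal sketch of the loading step, the snippet below reads a CSV into a pandas DataFrame. The original file name and layout are not shown in this post, so the column names (date, symbol, close) and the few rows of prices are assumptions made up for illustration, and an in-memory string stands in for the downloaded file.

```python
import io

import pandas as pd

# Stand-in for the downloaded Kaggle file. The column names and the prices
# below are made up for this example, not taken from the real dataset.
csv_text = """date,symbol,close
2010-01-04,AAPL,30.57
2010-01-04,MSFT,30.95
2010-01-05,AAPL,30.63
2010-01-05,MSFT,30.96
"""

# With a real download, pass the file path instead of the StringIO object.
# parse_dates turns the date column into proper timestamps for plotting.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.head())
print(df.dtypes)
```

Parsing the dates at load time means later plots can use them directly on the x axis.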


Importing Data Visualization Tools

For data visualization, we import the visualization tools and set the format of the graphs.
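The plotting setup is not shown here, so the following is only one plausible version, assuming matplotlib is the plotting library. The style name and figure size are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Pick a built-in style and set default sizes for every later figure.
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["axes.titlesize"] = 14
```

In a notebook the backend line can be dropped, since figures are rendered inline.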

Data Preparation and Visualisation

The next step is to visualize and compare the stocks of real estate businesses, tech giants, and automobile companies. We will extract the closing price data for each stock from our DataFrame. Our plot tool will automatically set the x axis to the date, and our y axis will be the 'close' column (the closing price of the stock).
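One way to get one line per stock is to pivot the data so that each ticker becomes a column indexed by date, then let pandas draw the lines. This is a sketch under assumptions: the long-format columns (date, symbol, close) and the prices are invented for illustration; AMT and CCI are the tickers of American Tower and Crown Castle.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd

# Illustrative long-format data: one row per (date, symbol) pair.
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-05",
                            "2010-01-05", "2010-01-06", "2010-01-06"]),
    "symbol": ["AMT", "CCI"] * 3,
    "close": [43.0, 39.0, 43.5, 39.2, 43.2, 39.6],
})

# One column per ticker, indexed by date, so the date lands on the x axis.
closes = df.pivot(index="date", columns="symbol", values="close")

ax = closes.plot(title="Real Estate Closing Prices")
ax.set_xlabel("Date")
ax.set_ylabel("Close ($)")
```

The same pivoted frame can be reused for the tech and automobile comparisons by filtering on different tickers first.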

Real Estate Closing Prices

The stock price of American Tower Corp is generally higher than that of Crown Castle. Both stock prices have a general upward trend over time. American Tower Corp and Crown Castle both focus on real estate infrastructure for wireless and broadcast communication. With everything going digital, it makes sense that both stocks are on an upward price trend. It also explains why the stocks have similar dips at similar times.


Tech Giant Closing Prices



From our graph above, we notice that the stock price for Apple kept going up from 2010, which is after the release of the iPhone, whilst Microsoft's stock price seems to dip or level off during the same period. Apple's stock price surged at the start of 2012 but dipped from mid-2012 to mid-2013 before picking up again. The first reason for this was that Apple's iPhone sales grew by only 7%, against an originally predicted growth rate of about 30%. The second reason was that Apple's profit margin dropped dramatically. Microsoft's stock price rises and dips in a very steady manner throughout the period, without any major surge. This consistency makes it seem a safer stock; however, it comes with lower profit margins if we were to trade it.


Automobile Closing Prices

Next, we have stock prices for the automobile companies. One thing is clear: both stocks are volatile and have no clear upward trend. General Motors and Ford both have major stock price drops from 2011 to mid-2012 before bouncing back to their previous highs. The reason for GM's stock price decline was that GM went public with its IPO on November 18th, 2010. This means the initial stock price was based on hype rather than the true valuation of the company. For Ford, the decline was caused by the CFO explaining that Ford could lose as much as $2 billion in Europe in 2013, which triggered a big sell-off.

Daily Percentage Gain & BoxPlot Visualisation

Next, we want to find out the daily percentage gain for each of the stocks. This will be used to help us find out the most volatile stocks, and the best way to visualise this is using a boxplot.

The daily percentage gain is calculated using the following formula: $r_t = \frac{p_t}{p_{t-1}} - 1$, where $p_t$ is the price on day $t$ and $p_{t-1}$ is the price on the previous day.
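The formula maps directly onto pandas' pct_change, which divides each value by the previous one and subtracts 1. The sketch below uses made-up prices so the returns are easy to check by hand; the tickers as column names are an assumed layout.

```python
import pandas as pd

# Illustrative closing prices, one column per ticker (values are made up).
closes = pd.DataFrame(
    {"AAPL": [100.0, 102.0, 99.96], "MSFT": [50.0, 50.5, 50.5]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
)

# r_t = p_t / p_(t-1) - 1; the first row has no previous day, so drop it.
returns = closes.pct_change().dropna()

# A simple volatility ranking: standard deviation of the daily returns.
volatility = returns.std().sort_values(ascending=False)

print(returns)
print(volatility)
```

Calling returns.boxplot() on the resulting frame draws one box per ticker, which is the kind of chart discussed next.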





The box plot shown above is the distribution of each stock's daily returns. Larger boxes and longer whiskers indicate the stocks with the most volatility. In this case, our most volatile stocks are GM, Ford, and Apple, which can be confirmed by their graphs above. These are the stocks with the most opportunity alongside the most risk. Note that predicting these stocks is harder because their volatility leaves more room for error in our model. Using this information, we will build a linear regression model based on Apple to find the best days for us to trade the stock at a profit and to determine how profitable our model is.

Linear Regression Modelling for Apple Stocks

Limitations and Assumptions

Our first assumption is that the observations in the data are independent. This means the residuals, in this case the differences between the predicted and observed values, should not be related to one another. One issue is that our stock prices are values of the same quantity recorded in sequence, since this is time series data. This is problematic because it produces a data characteristic called autocorrelation, where each value is correlated with the values that came before it.
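To make autocorrelation concrete, the sketch below measures the lag-1 autocorrelation of a simulated price series and of its daily returns using pandas' Series.autocorr. The data is a seeded random walk, not real stock data, and the step size is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Simulated prices: a geometric random walk (synthetic, not market data).
rng = np.random.default_rng(0)
log_steps = rng.normal(loc=0.0, scale=0.01, size=500)
prices = pd.Series(100.0 * np.exp(np.cumsum(log_steps)))

# Daily returns of the same series.
returns = prices.pct_change().dropna()

# Correlation of each value with the value one day earlier.
price_autocorr = prices.autocorr(lag=1)
return_autocorr = returns.autocorr(lag=1)

print(f"lag-1 autocorrelation of prices:  {price_autocorr:.3f}")
print(f"lag-1 autocorrelation of returns: {return_autocorr:.3f}")
```

Price levels are strongly autocorrelated because today's price starts from yesterday's, while the returns of this simulated series are close to uncorrelated. This is why the independence assumption is hard to satisfy when modelling raw prices.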


We will be focusing on Apple because of how volatile the stock has been over the years. First, we pull the data for Apple and then produce a plot of it.

[Figure: Apple stock price over time; y axis: Stock Price ($)]


Filtering the Data for Modeling

Next, we will filter our data for modeling and begin setting up our training data and test data.
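A minimal sketch of the filtering step, assuming the DataFrame has symbol and close columns: keep only the Apple rows, keep only the closing price, and convert it to a two-dimensional NumPy array with one column. The three prices are the first values printed in this post; the dates are placeholders.

```python
import pandas as pd

# Illustrative rows; the column names and dates are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-05",
                            "2010-01-05", "2010-01-06"]),
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT", "AAPL"],
    "close": [30.57285686, 30.95, 30.62571329, 30.96, 30.13857071],
})

# Keep Apple only, then keep just the closing price column.
apple = df[df["symbol"] == "AAPL"]
data = apple.filter(["close"])

# A 2-D array of shape (n_rows, 1), the layout that scalers expect.
dataset = data.values

print(dataset)
```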


[[ 30.57285686]
 [ 30.62571329]
 [ 30.13857071]
 ...
 [116.760002  ]
 [116.730003  ]
 [115.82      ]]

Scaling the dataset is a common requirement for machine learning estimators. We use MinMaxScaler, imported with from sklearn.preprocessing import MinMaxScaler. It transforms features (in this case, the closing price of a stock) by scaling each feature to a given range, in our case 0 to 1. The fit_transform method learns the scaling parameters from the data and scales the data in one step.
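The scaling step can be sketched as follows. The input values are made up so that the result is easy to verify: the minimum maps to 0, the maximum maps to 1, and everything else falls proportionally in between.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative closing prices as a single-column 2-D array (made-up values).
dataset = np.array([[30.0], [60.0], [90.0], [120.0]])

# Learn the min and max from the data, then rescale into [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(dataset)

# inverse_transform maps scaled values (or predictions) back to dollars.
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(restored.ravel())
```

Keeping the fitted scaler around matters, because model predictions come out in the scaled range and need inverse_transform to be read as prices.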








Developing the Testing – Training Dataset

A machine learning model needs at least two datasets: training data to fit the model and testing data to evaluate it. We will split our data 65/35 into training and testing sets, respectively. These percentages can be changed depending on how accurate our predicted values are.
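A chronological 65/35 split can be sketched like this. The array is a placeholder with 1762 rows, the total implied by the sample counts reported in this post; the variable names are assumptions. The rows are not shuffled, since the order of a time series carries information.

```python
import numpy as np

# Placeholder for the scaled closing prices (synthetic values).
scaled = np.linspace(0.0, 1.0, 1762).reshape(-1, 1)

# The first 65% of the rows go to training, rounded down to a whole number.
training_size = int(len(scaled) * 0.65)

train_data = scaled[:training_size]
test_data = scaled[training_size:]

print(len(train_data), len(test_data))
```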

Our data is now split to the nearest whole number reflecting the 65/35 ratio: 1145 training samples and 617 test samples. (Note: the test set also includes 60 overlapping data points due to the -60 offset.) From our split, we obtain the training and testing data. The empty lists x_train and y_train will store our feature and target sets, respectively. We will import tqdm, which simply shows a progress meter of the number of iterations completed.


100%|██████████| 1085/1085 [00:00<00:00, 188431.94it/s]



(1085, 1085)
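The windowing described above can be sketched as follows, assuming the -60 offset means each sample uses the previous 60 prices as features and the next price as the target. The data is a synthetic placeholder, make_windows is a hypothetical helper name, and the tqdm progress bar is left out to keep the example dependency-free.

```python
import numpy as np

# Placeholder for the scaled closing prices (synthetic values, 1762 rows).
scaled = np.linspace(0.0, 1.0, 1762).reshape(-1, 1)

lookback = 60  # assumed window length, taken from the -60 offset
training_size = int(len(scaled) * 0.65)

train_data = scaled[:training_size]
# Start the test slice 60 rows early so the first test target has a full window.
test_data = scaled[training_size - lookback:]


def make_windows(series, lookback):
    """Build (features, target) pairs from a single-column 2-D array."""
    x, y = [], []
    for i in range(lookback, len(series)):
        x.append(series[i - lookback:i, 0])  # the previous `lookback` values
        y.append(series[i, 0])               # the value to predict
    return np.array(x), np.array(y)


x_train, y_train = make_windows(train_data, lookback)
x_test, y_test = make_windows(test_data, lookback)

print(len(x_train), len(y_train))
print(x_train.shape, x_test.shape)
```

With 1145 training rows and a 60-step window, this yields 1085 training samples, which matches the counts reported above.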

The next step is to reshape the data, because the LSTM model requires three-dimensional input: samples, time steps, and features.
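The reshape itself is a single NumPy call. The sketch below assumes x_train is a two-dimensional array of 1085 windows with 60 time steps each, and uses zeros as placeholder values; the trailing dimension of size 1 represents the single feature, the closing price.

```python
import numpy as np

# Placeholder feature matrix: 1085 samples, 60 time steps each (assumed sizes).
x_train = np.zeros((1085, 60))

# Add a trailing axis: (samples, time steps) -> (samples, time steps, features).
x_train_3d = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

print(x_train_3d.shape)
```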

