Can you see the future?

Everybody is fascinated with the future and predicting it. In this article we will explore forecasting techniques and in particular focus on the application of Prophet.

Before we begin, keep in mind that in data science a lot of effort goes into understanding, cleaning and transforming the data prior to any modelling activity. In this article, we skip that groundwork and focus directly on forecasting using Prophet. The objective is to understand the library and its utility. The hyper-parameters have been explained in the introductory article here.

With that in mind, let’s define the forecast horizon to be short term.

Considerations

As you are probably aware, stock price forecasting is a tough beast to slay, primarily because many factors influence the stock price of a company. To name a few:

  • Market sentiment: Overall sentiment of the buyers and sellers of the stock.
  • Political environment: Impact of government policies.
  • Business environment: Business decisions, policies, competition and so on.
  • Black swan events: Unforeseen events ranging from natural disasters to a sudden dip or jump in sales.

This also means that closing stock prices alone will not be sufficient to make accurate predictions.

Let’s begin!

Data

For our study, we obtained daily prices for Reliance Industries Limited from January 1, 2000 to January 1, 2020, using the Alpha Vantage API. The columns present in the data-set are:

  1. Open
  2. High
  3. Low
  4. Close
  5. Volume
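
Prophet works on a two-column frame: `ds` for the date and `y` for the value to forecast. The snippet below is a minimal sketch of that reshaping step, assuming the prices are already loaded into a pandas DataFrame indexed by date with the columns listed above. The function name `to_prophet_frame` and the sample rows are made up for illustration and are not part of our pipeline.

```python
import pandas as pd


def to_prophet_frame(prices: pd.DataFrame, column: str = "Open") -> pd.DataFrame:
    """Reshape a date-indexed OHLCV frame into the ds/y layout used by Prophet."""
    frame = (
        prices[[column]]
        .rename(columns={column: "y"})
        .rename_axis("ds")
        .reset_index()
    )
    frame["ds"] = pd.to_datetime(frame["ds"])
    # Tidy up: drop rows with a missing target and sort chronologically.
    return frame.dropna(subset=["y"]).sort_values("ds").reset_index(drop=True)


# Made-up sample rows, only to show the shape of the input and output.
sample = pd.DataFrame(
    {
        "Open": [101.0, 100.0, 102.5],
        "High": [103.0, 101.5, 104.0],
        "Low": [99.5, 98.0, 101.0],
        "Close": [100.5, 101.0, 103.5],
        "Volume": [1200, 1500, 900],
    },
    index=pd.to_datetime(["2019-01-03", "2019-01-02", "2019-01-04"]),
)
prophet_df = to_prophet_frame(sample)
print(prophet_df)
```

Since we forecast open prices, `column` defaults to "Open"; passing "Close" would prepare the closing prices in the same way.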

Methodology

We divided our approach into the following:

  1. Training:
    • Uni-variate time series model to forecast open prices, using Facebook’s Prophet library. The training period was set to 720 days to cover repeated patterns and long term trends, and the prediction horizon was 10 days.
  2. Testing:
    • With time series data, the further you try to predict into the future, the less reliable the predictions usually become, so it is important to test the model the way it will actually be used. To test this method and the overall performance of the model, we used walk forward validation, which involves the following steps:
      • Re-train the model at different points of time.
      • Make the prediction on the days following the train period end date.
      • Measure the MAPE and evaluate the model for consistency and robustness.
  3. Fine-tuning

The metric we chose to optimize our model for was mean absolute percentage error (MAPE) due to its ease of interpretation.
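
The walk forward procedure and the MAPE metric described above can be sketched as follows. This is a minimal illustration rather than our production code: `walk_forward`, `forecast_fn` and `naive_last_value` are hypothetical names, the series is a toy example, and a naive last-value forecaster stands in for Prophet so that the loop runs without extra dependencies.

```python
from statistics import mean
from typing import Callable, List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def walk_forward(
    series: Sequence[float],
    forecast_fn: Callable[[Sequence[float], int], List[float]],
    train_size: int,
    horizon: int,
    step: int,
) -> List[float]:
    """Re-train at successive cut-off points and score each forecast window."""
    scores = []
    # Each cut-off is the index right after the last training observation.
    for cutoff in range(train_size, len(series) - horizon + 1, step):
        train = series[cutoff - train_size:cutoff]
        actual = series[cutoff:cutoff + horizon]
        predicted = forecast_fn(train, horizon)
        scores.append(mape(actual, predicted))
    return scores


def naive_last_value(train: Sequence[float], horizon: int) -> List[float]:
    """Stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon


# Toy series (not real prices): 5 training points, 2-step horizon, step of 2.
toy = [100.0, 102.0, 101.0, 103.0, 104.0, 106.0, 105.0, 107.0, 108.0]
scores = walk_forward(toy, naive_last_value, train_size=5, horizon=2, step=2)
print([round(s, 2) for s in scores])
```

In our setup the training window was 720 days and the horizon 10 days. With Prophet, `forecast_fn` would fit a fresh model on each training window and return its `yhat` values for the horizon. Prophet also ships a `cross_validation` helper in `prophet.diagnostics` that performs a similar rolling-origin evaluation.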

Parameters of Prophet

We used the following settings to get our base model ready:

  • growth: Growth can be linear, if we see a trend that keeps growing, or logistic, if we see a maximum achievable point or carrying capacity, where the data saturates. On plotting our data, we noticed an increasing trend with no signs of saturation and hence set growth to “linear”.
  • yearly_seasonality: This parameter is used to fit yearly seasonality and can take values of ‘auto’, True, False or a number of Fourier terms to generate. We set it to True.
  • seasonality_prior_scale: Controls the flexibility of the seasonality. Large values of this parameter enable the model to fit big seasonal fluctuations, while smaller values lessen the impact of seasonality.
    We found 0.1 to be a suitable value from our experiments, based on the values of MAPE.
  • n_changepoints: Change points are points in time where the trend changes abruptly. Prophet allows us to specify these points, as well as detect them itself.
    For our purpose, we settled through experimentation on setting the number of change points ourselves, to 50. We also tried letting Prophet detect them.
  • Effect of holidays: Prophet allows us to incorporate the effect of holidays and major events by having country specific days built-in. For India, they have included Holi and Diwali, whose effect can be further controlled by the holidays_prior_scale parameter. We tried values in the range of 5 to 50 and found that a small value of 5 gave us the lowest value of the MAPE.
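
Put together, the settings above could be passed to Prophet roughly as shown below. This is a sketch under a few assumptions: the package is imported as `prophet` (older releases were published as `fbprophet`), the history is a DataFrame with `ds` and `y` columns, and a business-day frequency is used for the future dates so that weekends are skipped. The helper names `build_base_model` and `forecast_open` are ours for illustration.

```python
# Settings described above, collected as keyword arguments for Prophet.
BASE_PARAMS = {
    "growth": "linear",              # no visible saturation in the series
    "yearly_seasonality": True,
    "seasonality_prior_scale": 0.1,
    "n_changepoints": 50,
    "holidays_prior_scale": 5,
}


def build_base_model(params=None):
    """Create an unfitted Prophet model with Indian holidays attached."""
    # Imported here so the settings can be inspected without Prophet installed.
    from prophet import Prophet  # assumption: older releases use `fbprophet`

    model = Prophet(**(params or BASE_PARAMS))
    model.add_country_holidays(country_name="IN")
    return model


def forecast_open(model, history, periods=10):
    """Fit on a ds/y DataFrame and return forecasts for the next `periods` days."""
    model.fit(history)
    # Assumption: "B" (business days) skips weekends but not exchange holidays.
    future = model.make_future_dataframe(periods=periods, freq="B")
    forecast = model.predict(future)
    return forecast[["ds", "yhat"]].tail(periods)
```

A call such as `forecast_open(build_base_model(), history)` would then return the 10 day forecast for one training window.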

Results of testing base model

Date        Actual   Prediction
2019-11-04  1465.9   1401.49
2019-11-05  1463.1   1418.29
2019-11-06  1442.7   1442.69
2019-11-07  1435.0   1457.53
2019-11-08  1449.0   1456.98
Table 1- 5 day predictions

Table 1 showcases the latest predictions achieved by the base model, along with the actual open price for Reliance. They are pretty close to the actuals. But in order to use this in production, we need to understand how we will be basing our decisions on these forecasts. From November 4 to November 7, there is clearly a short-term decrease in actual prices, while the predictions rise. Therefore, if we aim to engage in daily trading, predictions that show an increasing trend are a problem.
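
The two observations above, small errors but the wrong direction, can be checked directly on the values in Table 1. The snippet below is a small sketch that computes the absolute percentage error for each day and compares the sign of the day-over-day change in the actual and predicted prices. The helper `daily_moves` is a name chosen for illustration.

```python
# Values copied from Table 1.
dates = ["2019-11-04", "2019-11-05", "2019-11-06", "2019-11-07", "2019-11-08"]
actual = [1465.9, 1463.1, 1442.7, 1435.0, 1449.0]
predicted = [1401.49, 1418.29, 1442.69, 1457.53, 1456.98]

# Absolute percentage error for each day.
ape = [100.0 * abs(a - p) / a for a, p in zip(actual, predicted)]


def daily_moves(values):
    """Sign of each day-over-day change: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


actual_moves = daily_moves(actual)
predicted_moves = daily_moves(predicted)
same_direction = sum(a == p for a, p in zip(actual_moves, predicted_moves))

for day, err in zip(dates, ape):
    print(f"{day}  abs. error = {err:.2f}%")
print("actual moves:   ", actual_moves)
print("predicted moves:", predicted_moves)
print(f"matching directions: {same_direction} of {len(actual_moves)}")
```

On these five rows the error stays below 5% on every day, yet the predicted direction of the daily change disagrees with the actual one on all four transitions, which is exactly what makes the forecasts hard to use for daily trading.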

Figure 1- Last 10 days prediction

Figure 1 shows the trend followed by the actual and predicted open prices for the last 10 days. An interesting thing to note is that if the model were to predict only the movement (increase/decrease) of prices, then it is not performing that badly. But in terms of closeness to the actuals, it is quite off. The actuals have been stable around the 1450 mark, while the forecasts have clearly been increasing.

Figure 2- forecasts using a 10 day window
Figure 3- 10 day forecast MAPE

Figure 2 shows that the predictions get worse over a longer period of time. Figure 3 shows the MAPE plotted for 500 randomly chosen iterations. While the average MAPE is around 10%, we also see a lot of fluctuation, with values going as high as 30%.

Fine-tuning the model

So why does the performance not look too promising?

The underlying cause could be the model itself. However, it is more likely that it is not able to capture patterns influenced by external variables such as news sentiment, annual report releases etc.

So, what next?

The objective here is to build a univariate time series model, and in the current iteration we do not plan to capture the external variables. Hence, we need to get a little creative. One important question that comes out of Figure 2 is whether our MAPE is consistent across all the 10 days we have forecasted for, or whether it generally increases as we increase the forecast horizon.
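
One way to answer this question is to compute the MAPE separately for each day of the forecast horizon, averaged over all re-training points. The snippet below is a minimal sketch of that calculation; `mape_by_horizon` is a hypothetical helper and the windows are made-up numbers, not our results.

```python
from statistics import mean


def mape_by_horizon(actual_windows, predicted_windows):
    """Average absolute percentage error for each step of the forecast horizon.

    Both arguments are lists of equally long windows, one window per
    re-training point; position k in a window is the forecast k + 1 days ahead.
    """
    horizon = len(actual_windows[0])
    result = []
    for k in range(horizon):
        errors = [
            abs(a[k] - p[k]) / abs(a[k])
            for a, p in zip(actual_windows, predicted_windows)
        ]
        result.append(100.0 * mean(errors))
    return result


# Hypothetical 3-day windows from two re-training points (not our results).
actual_windows = [[100.0, 100.0, 100.0], [200.0, 200.0, 200.0]]
predicted_windows = [[101.0, 103.0, 106.0], [198.0, 196.0, 190.0]]
print(mape_by_horizon(actual_windows, predicted_windows))
```

If the values grow from the first position to the last, as they do in this toy example, the error increases with the forecast horizon.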

Figure 4- forecasts using a 10 day window with last training date marked

In Figure 4, the last training dates are marked on the line of predictions using black dots. We can see that at the start of each 10 day forecast horizon, the distance between the actual and forecasted prices is the smallest, and it increases as we move further until the start of the next horizon. So, instead of a 10 day forecast window, why not use a single day window?

Figure 5- single day prediction horizon

Figure 6- single day forecast MAPE

Figure 5 shows the forecasts using this approach. Clearly, compared to what we see in Figure 2, they are now much closer to the actuals. This is further reinforced in Figure 6, where the average MAPE has gone down to roughly 3%. Moreover, over 80% of the iterations have a MAPE of less than 7%, as compared to about 17% in Figure 3. However, the fluctuations are still pretty high.
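
The comparison above rests on three summary numbers per experiment: the average MAPE, the worst MAPE, and the share of iterations below a threshold. A small sketch of that summary is given below; `summarize_mape` is a hypothetical helper and the scores are invented for illustration, not taken from our runs.

```python
from statistics import mean


def summarize_mape(scores, threshold=7.0):
    """Mean, maximum, and share of iterations below `threshold` (in percent)."""
    below = sum(s < threshold for s in scores)
    return {
        "mean": mean(scores),
        "max": max(scores),
        "share_below": 100.0 * below / len(scores),
    }


# Hypothetical per-iteration MAPE values, for illustration only.
scores = [1.5, 2.0, 3.0, 4.5, 9.0]
summary = summarize_mape(scores)
print(summary)
```

Running the same summary on the scores of the 10 day and the single day experiments makes the two setups directly comparable.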

Figure 7- Chart from 2016-12-07 to 2018-05-10
Figure 8- Last 365 days prediction

Also, the last 365 days (Figure 8) show that the predictions have been in line with the actual stock price trends. The increase or decrease in the price over time has been mirrored by the model. Thus, this model is less likely to disappoint if our investment perspective is relatively longer, i.e. we wish to realize gains over several months. As a quick start model, Prophet looks like an easy-to-use library.

That said, building the best possible model still requires a lot of work, including incorporating external factors and a lot more testing.

Conclusion

We observed Prophet to be working well for a single day forecast horizon as opposed to a longer one. We also came across a study where the authors compared Prophet to ARIMA and Deep Learning methods for forecasting food wholesale prices and found it to be more error prone than the competition. It is important to keep in mind that there is no universal algorithm that will perform the best in all situations. Hence, it is always advisable to try out multiple algorithms and identify the best choice for the problem and data at hand. While Prophet is working well, there is more scope to better capture the seasonality and other causal links by adding external variables, such as news sentiment, macro variables (GDP), technical indicators etc.

About Us

Data science discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.

About the writers:

  • Ujjayant Sinha: Data science enthusiast with interest in natural language problems.
  • Ankit Gadi: A knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, has helped me embark on my data science journey.