Prophet is a well-known open source library for forecasting problems. Its developers vouch for its accuracy, speed and automation capabilities. Is it a quick way to forecast your time series data? How well does it work with the default settings? How do we measure its accuracy? We will start by understanding its different components and the challenges it aims to tackle.
Why time series forecasting is hard
“Prediction is very difficult, especially if it’s about the future.”
Niels Bohr, Nobel laureate in Physics
We all have an infatuation with predicting the future, especially stock prices. However, time series forecasting is, in general, a daunting task, and some of the factors that add to its complexity are listed below. A time series can be decomposed into the following components:
- Trend
- Seasonality
- Cycles
- Random component
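To make the decomposition concrete, here is a minimal sketch of a classical additive decomposition in NumPy. It assumes a synthetic monthly series (made-up trend, seasonality and noise) with a known period of 12. The trend is estimated with a centred 2×12 moving average, and the seasonal component by averaging the detrended values at each position in the cycle. Cycles are not separated here: classical decomposition folds them into the trend.

```python
import numpy as np

# Hypothetical monthly series: 4 years of made-up data with a period of 12
rng = np.random.default_rng(0)
t = np.arange(48)
trend = 100 + 0.5 * t                        # trend
seasonal = 10 * np.sin(2 * np.pi * t / 12)   # seasonality
noise = rng.normal(0, 1, size=t.size)        # random component
y = trend + seasonal + noise                 # additive model

def centred_moving_average(x, period):
    """2 x period centred moving average (even period only): the classical trend estimate."""
    weights = np.ones(period + 1)
    weights[0] = weights[-1] = 0.5
    weights /= period
    # mode="valid" drops period // 2 points at each end of the series
    return np.convolve(x, weights, mode="valid")

period = 12
half = period // 2
trend_hat = centred_moving_average(y, period)
detrended = y[half:-half] - trend_hat

# Seasonal estimate: average the detrended values at each position in the cycle
positions = t[half:-half] % period
seasonal_hat = np.array([detrended[positions == p].mean() for p in range(period)])
seasonal_hat -= seasonal_hat.mean()          # centre the seasonal indices on zero
residual = detrended - seasonal_hat[positions]
```

In practice, statsmodels' `seasonal_decompose` (with `model="additive"` and `period=12`) performs the same classical decomposition.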
“Some things are so unexpected that no one is prepared for them.”
Leo Rosten in Rome Wasn’t Burned in a Day
Beyond these components, a plethora of other factors are at play, to name a few:
- Holidays
- Market competition
- Political landscape
Given this complexity, stock prediction is not the ideal use case; our objective, however, is to understand how simple the tool is to use.
Let’s briefly go over the usual steps involved (in no particular order):
- Analyse the different components (Decomposition)
- Check stationarity
- Check the ACF and PACF (autocorrelation and partial autocorrelation)
- Identify outliers, such as major events and holidays, that influence the variability
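Checking the ACF amounts to correlating the series with lagged copies of itself. Below is a minimal, dependency-free sketch, assuming a synthetic daily series with a weekly (period 7) pattern; `sample_acf` is our own helper. In practice, `statsmodels.tsa.stattools` provides `acf` and `pacf`, as well as `adfuller` (the Augmented Dickey-Fuller test) for checking stationarity.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # Correlate the series with itself shifted by k steps
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(nlags + 1)])

# Hypothetical daily series with a weekly pattern
t = np.arange(140)
y = np.sin(2 * np.pi * t / 7)
acf = sample_acf(y, nlags=14)
```

A strong peak at lag 7 (and a negative value around lag 3 or 4) is the signature of weekly seasonality.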
On top of this, there is a variety of techniques to choose from, such as moving averages, exponential smoothing and ARIMA, and tuning them properly requires a high level of proficiency in time series forecasting.
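For a flavour of the simpler techniques, here is a sketch of a trailing moving average and simple exponential smoothing in NumPy. It assumes a short made-up series and a hand-picked smoothing factor `alpha`, and the function names are our own. Even here, the window and `alpha` have to be chosen, and ARIMA adds three orders (p, d, q) on top.

```python
import numpy as np

def moving_average(y, window):
    """Trailing moving average: the mean of each run of `window` consecutive values."""
    return np.convolve(y, np.ones(window) / window, mode="valid")

def simple_exponential_smoothing(y, alpha):
    """Level update l_t = alpha * y_t + (1 - alpha) * l_{t-1}, starting from l_0 = y_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(y[0])
    levels = [level]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return np.array(levels)

y = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])  # made-up observations
ma = moving_average(y, window=3)
levels = simple_exponential_smoothing(y, alpha=0.5)
forecast = levels[-1]  # SES forecasts the last level for every future step
```

A larger `alpha` reacts faster to recent observations; a smaller one smooths more aggressively.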
For a relatively inexperienced practitioner, navigating through all of this can be very confusing. It is quite the effort, even for a seasoned expert.
Proclaim Prophet!
“The best qualification of a prophet is to have a good memory.”
Marquis of Halifax
Facebook created the Prophet library as an answer to most of these issues. At first glance, it seems to have several features that we found promising:
- The parameters are quite intuitive and relatively simple to tune.
- It is quite fast.
- It takes care of yearly, weekly and daily seasonality.
- To a certain extent, it is robust to issues in the data such as missing values, shifts in the trend and outliers.
- It identifies changepoints (points where the trend changes significantly).
- It can incorporate irregular variations caused by holidays.
- The growth parameter turns out to be quite useful. Say, for example, we were trying to predict weekly production: if plotting the data or our business understanding shows signs of a maximum achievable level (saturation), we would set growth to logistic; otherwise, linear.
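The two growth options can be compared using the trend functions from the Prophet paper, simplified here by leaving out changepoints: linear growth g(t) = k·t + m, and logistic growth g(t) = C / (1 + exp(-k(t - m))), which saturates at the capacity C. The NumPy sketch below uses made-up parameters, and the function names are hypothetical. In Prophet itself, logistic growth is selected with `Prophet(growth='logistic')`, and the capacity is supplied as a `cap` column in both the training data frame and the future data frame.

```python
import numpy as np

def linear_growth(t, k, m):
    """Linear trend: g(t) = k * t + m (k = growth rate, m = offset)."""
    return k * t + m

def logistic_growth(t, cap, k, m):
    """Logistic trend: g(t) = cap / (1 + exp(-k * (t - m))), saturating at cap."""
    return cap / (1.0 + np.exp(-k * (t - m)))

t = np.linspace(0, 100, 101)
lin = linear_growth(t, k=2.0, m=10.0)                 # keeps growing without bound
logi = logistic_growth(t, cap=150.0, k=0.1, m=50.0)   # levels off below cap
```

The linear trend keeps climbing indefinitely, while the logistic trend passes half the capacity at t = m and then flattens towards `cap`. This is the behaviour you want when production cannot exceed some maximum.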
However, there are a few things about Prophet that should be kept in mind before using it:
- External variables are not picked up automatically: extra regressors have to be added explicitly (with add_regressor), and their values must be known for every future date being forecast. We would therefore advise some additional steps, which we will discuss in more detail in a separate post.
- Seasonality is additive by default; multiplicative seasonality has to be requested explicitly (seasonality_mode='multiplicative').
- It covers only 2 holidays for India (Holi and Diwali). Other holidays and events can be added manually, so it is important to understand which of them actually affect stock prices.
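Extra holidays and events are passed to Prophet as a data frame with a `holiday` column (the event name) and a `ds` column (its dates). Optionally, `lower_window` and `upper_window` extend the effect to surrounding days. Built-in national holidays can be added with `add_country_holidays`. Below is a minimal pandas sketch; the `earnings_release` event and its dates are hypothetical, purely for illustration.

```python
import pandas as pd

# Hypothetical company-specific event; the dates are illustrative, not real announcements
earnings = pd.DataFrame({
    "holiday": "earnings_release",
    "ds": pd.to_datetime(["2019-01-15", "2019-04-15", "2019-07-15", "2019-10-15"]),
    "lower_window": -1,  # let the effect start one day before each event
    "upper_window": 1,   # and last until one day after
})
```

The resulting frame would be passed as `Prophet(holidays=earnings)`. Several such frames can be combined with `pd.concat`.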
For more on logistic growth, see the “Saturating Forecasts” page of the Prophet documentation.
These benefits make it a good starting point for any forecasting challenge. However, one should thoroughly understand what each of the parameters involved does, rather than blindly charging in.
An overview of Prophet
Prophet’s algorithm is based on a generalized additive model, in which a non-linear trend is fit together with yearly, weekly and daily seasonality. Holidays are also taken into account: you can include a particular country’s national holidays or provide your own list.
Broadly, the components of Prophet’s model are as follows:
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$
where,
* g(t) models growth, i.e. the long-term trend of increase or decrease seen in the data. It can be either logistic or linear.
* s(t) describes periodic (seasonal) effects on the data, such as weekly or yearly patterns. For example, the effect on retail sales each year in the weeks preceding a festive event like Christmas or Diwali.
* h(t) describes the effect of holidays and important events that can impact business. For example, government elections or Christmas.
* ϵₜ represents variability that can’t be accounted for by the model.
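Here is a toy illustration of the additive structure: each component is built separately, and the observed series is simply their sum. Every number here (trend slope, weekly amplitude, the two holiday dates, the noise level) is made up for illustration. The single sine wave stands in for Prophet's more flexible seasonal term.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(365)                       # hypothetical: one year of daily observations

g = 50 + 0.05 * t                        # g(t): linear growth
s = 3 * np.sin(2 * np.pi * t / 7)        # s(t): weekly seasonality
h = np.zeros(t.size)
h[[100, 250]] = 8                        # h(t): bumps on two hypothetical holiday dates
eps = rng.normal(0, 1, size=t.size)      # error term

y = g + s + h + eps                      # y(t) = g(t) + s(t) + h(t) + eps_t
```

Prophet works in the opposite direction: given only `y`, it estimates `g`, `s` and `h`, which is why each component's flexibility matters.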
Other parameters:
- n_changepoints: We can’t list every event that has an impact on the target variable, so instead the trend is allowed to change at a number of potential changepoints (25 by default). Think of this parameter as the flexibility of the curve fitted over the points, that is, how often it can adjust in order to make more accurate forecasts; the related changepoint_prior_scale sets how much it can change at each one. Of course, consider the possibility of overfitting while tuning these parameters.
- growth: Linear or logistic
- holidays: Data frame containing list of holidays
- holidays_prior_scale: Imagine the impact of the annual release of financial reports, or of the Christmas holidays. We need to be able to adjust our forecasts around such events, and this parameter controls how strongly holiday effects may do so: larger values allow larger holiday effects.
- seasonality_prior_scale: Certain businesses flourish during a particular season, that is, their sales increase, and hence there may be an uptick in their stock prices. This parameter controls how strongly the seasonal component can fit such patterns (larger values allow larger seasonal effects), so that the seasonality is adequately captured.
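- yearly_seasonality, weekly_seasonality, daily_seasonality: whether to fit each seasonality; 'auto' by default, or True/False (an integer sets the Fourier order). Monthly seasonality is not built in, but can be added with add_seasonality.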
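Changepoints are easiest to understand through the piecewise-linear trend described in the Prophet paper: the growth rate starts at k and is adjusted by δ_j at each changepoint s_j, with the offset corrected so that the curve stays continuous. The NumPy sketch below uses hand-picked changepoints and adjustments; in Prophet these adjustments are fitted, with n_changepoints candidate locations spread over the earlier part of the history. The function name is hypothetical.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend in the spirit of Prophet's changepoint model.

    k: initial growth rate, m: initial offset,
    changepoints: times s_j at which the rate may change,
    deltas: rate adjustments delta_j applied from s_j onwards.
    """
    t = np.asarray(t, dtype=float)
    changepoints = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # a[i, j] = 1 once time t_i has reached changepoint s_j
    a = (t[:, None] >= changepoints[None, :]).astype(float)
    rate = k + a @ deltas
    # gamma_j = -s_j * delta_j keeps the trend continuous at each changepoint
    offset = m + a @ (-changepoints * deltas)
    return rate * t + offset

# Rate 0.5, then 1.5 after t = 10, then -0.5 after t = 20 (all made-up values)
trend = piecewise_linear_trend(np.arange(30), k=0.5, m=0.0,
                               changepoints=[10, 20], deltas=[1.0, -2.0])
```

More changepoints with larger adjustments give a trend that bends more often, which is exactly where the risk of overfitting comes from.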
One aspect of the algorithm we have not yet addressed is that seasonality is modelled with a Fourier series, i.e. a sum of sine and cosine terms.
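Concretely, a seasonality with period P (in days) is approximated by a truncated Fourier series of order N:

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)$$

Fitting the coefficients a_n and b_n is then a linear regression on sine and cosine features. The NumPy sketch below assumes a synthetic daily series with made-up day-of-week effects and uses plain least squares. Prophet instead fits these coefficients within its full model, under a prior controlled by seasonality_prior_scale. The function name is hypothetical. According to the Prophet documentation, the default orders are 10 for yearly and 3 for weekly seasonality.

```python
import numpy as np

def fourier_features(t, period, order):
    """Columns cos(2*pi*n*t/P) and sin(2*pi*n*t/P) for n = 1..order."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, n) / period
    return np.hstack([np.cos(angles), np.sin(angles)])

# Hypothetical daily series: made-up day-of-week effects plus small noise
rng = np.random.default_rng(1)
t = np.arange(140)
weekly = np.array([0.0, 1.0, 2.0, 1.5, 1.0, -2.0, -3.5])
y = weekly[t % 7] + rng.normal(0, 0.1, size=t.size)

X = fourier_features(t, period=7, order=3)  # 3 harmonics -> 6 columns
coef, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
s_hat = X @ coef                            # fitted seasonal component
```

Raising the order lets the seasonal curve change faster, again at the risk of overfitting. With a period of 7, order 3 is already enough to represent any day-of-week pattern in daily data.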
In the next post, we are going to dive deeper into Prophet’s mechanics and see how it can be tweaked by applying it to the data. We will forecast stock prices of a major corporation and evaluate how well Prophet does with respect to the actual values.
About Us
Data science discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.
About the writers:
- Ujjayant Sinha: Data science enthusiast with interest in natural language problems.
- Ankit Gadi: A knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, has helped me embark on my data science journey.