Revenue forecasting with linear methods.
This post highlights linear regression techniques on time series data. We have some weekly sales data from a small business that is expected to have seasonal trends. A bike rental business in a major tourist city. Let’s dive into the data. Load up the required packages.
suppressPackageStartupMessages({
library(forecast)
})
Load the data, we have two columns with weekly revenue numbers from two different sources. Online and Phone; these correspond to sales online and via phone call to the business.
filename <- '~/data/BikeTours.csv'
# Read up the data file - first line is the header
d <- read.csv(filename, header = TRUE)
# Get the seller name
sellerName <- gsub("/|~.*data|.csv", "", filename)
str(d)
## 'data.frame': 220 obs. of 2 variables:
## $ Online: num 0.001 0.001 104 442 156 52 312 0.001 0.001 854 ...
## $ Phone : num 0.001 0.001 0.001 0.001 44.5 0.001 0.001 141 0.001 0.001 ...
The weekly revenue for BikeTours
Timeseries
Let start by creating a timeseries object using the ts()
function and has the signature ts(vector, start=,end=,frequency)
where start and end are times of the first and last observation and frequency is the number of observations per unit time (1=annual, 4=quarterly, 12=monthly).
In our case we have weekly data, therefore we will set our frequncy to 52.
d_ts <- ts(d, start = c(2012,1), frequency = 52)
summary(d_ts)
## Online Phone
## Min. : 0.001 Min. : 0.001
## 1st Qu.: 211.000 1st Qu.: 0.001
## Median :1824.500 Median : 191.500
## Mean :1959.751 Mean : 333.967
## 3rd Qu.:3112.500 3rd Qu.: 571.875
## Max. :8505.000 Max. :1733.000
Online and Phone revenue
Let’s look at the weekly sales split out by mode of purchase Online vs Phone. Online sales are the primary component of sales. For the next few sections we will focus on the Online component. The same analysis and forecast can be done for the Phone component.
title <- paste(" Seasonal decompsition for ", sellerName)
ts.plot(d_ts, gpars = list(col=c("tomato", "magenta"), lwd=2, ylab="Gross Sales", xlab="year", main = title))
legend('topleft' ,legend=c("Online", "Phone"), lty=1, col=c("tomato", "magenta"), bty='n', cex=.75)
Online Revenue
Seasonal component extraction.
When a time series varies according to a periodic component, we can extract the periodicty using classical statistical methods. The stl() function in core R uses loess to effectively take the means of each period. The seasonal component is removed and the remainder is smoothed to find the trend. The sales are clearly seasonal with peaks in the middle of the year starting near spring break and ending near thanksgiving.
Weekly revenue with the seasonal component extracted
d_online <- d$Online
d_ts_online <- ts(d_online, start = c(2012,1), frequency = params$freq)
d_ts_comp_online <- stl(d_ts_online, s.window = "period", robust = TRUE)
plot(d_ts_comp_online, main = title)
Let’s combine this into a single plot. We have also extracted the seasonally adjusted numbers to show the actual variations and trends.
plot(d_ts_online, ylab="Gross Revenue", main = "Online gross revenue with trend and seasonal parts")
lines(d_ts_comp_online$time.series[,2], col="tomato")
lines(seasadj(d_ts_comp_online), col="green")
lines(d_ts_comp_online$time.series[,1], col="blue")
legend('topleft' ,legend=c("Actual", "trend", "seasonal adj", "seasonal"), lty=1, col=c("black", "tomato", "green", "blue"), bty='n', cex=.75)
Forecast using linear regression
This is the forecast for revenue taking seasonality into account. The function tslm() is the linear regression version for time series data.
Note the red line shows the model’s predicted numbers.
f_online<-tslm(d_ts_online ~ trend + season, lambda = 0.5)
ff_online <- forecast(f_online, h=params$freq)
plot(ff_online, ylab="Online Gross Revenue")
lines(fitted(ff_online), col="red")
lines(d_ts_comp_online$time.series[,2], col="tomato")
Forecast using ARIMA models
Let’s also use the [ARIMA](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average Wikipedia) model to get a forecast. Note that the D=1 parameter forces the model to take seasonality into account.
f_online_arima <-auto.arima(d_ts_online, lambda = 0.5, D=1)
ff_online_arima <- forecast(f_online_arima, h=params$freq)
plot(ff_online_arima, ylab="Online Gross Revenue")
lines(fitted(ff_online_arima), col="red")
lines(d_ts_comp_online$time.series[,2], col="tomato")
Residuals
The residuals - the difference between the actual and predicted values back fitted. Low numbers are indicative of a good fit. A 0 mean is indicative of an unbiased forecast.
res <- residuals(ff_online)
plot(res, ylab="Residuals",xlab="Year", main = paste("Residuals for ", sellerName) )
summary(res)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -41.166 -9.406 -1.330 0.000 8.180 45.450
Autocorrelation
The autocorrelation shows the correlation between seasonal data, peaks show positive correlation at the marked intervals. ( 52 weeks, so yearly correlation is statistically signficant - i.e. above the blue line)
Acf(d_ts_online , main="ACF of revenue")
This is the weekly revenue prediction using this model
ff_online
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016.231 932.32319 318.4182798 1868.2845 122.377409 2500.5504
## 2016.250 2578.50032 1450.7906998 4028.2664 980.153309 4935.1287
## 2016.269 2899.35343 1693.8126535 4426.9506 1181.579368 5375.4089
## 2016.288 3636.01505 2266.6825172 5327.4040 1666.912767 6363.3987
## 2016.308 2811.32749 1626.6914914 4318.0199 1125.630506 5255.3059
## 2016.327 2534.91901 1418.1468035 3973.7476 953.354775 4874.7646
## 2016.346 3060.37223 1817.3972106 4625.4037 1285.157659 5593.8682
## 2016.365 3436.28453 2109.5778794 5085.0476 1532.589002 6098.2615
## 2016.385 3496.85664 2157.0949601 5158.6747 1573.128986 6178.8657
## 2016.404 3544.38422 2194.4579606 5216.3669 1605.059651 6241.9902
## 2016.423 3417.33077 2094.7327927 5061.9852 1519.939747 6073.0032
## 2016.442 4106.06462 2640.8182622 5893.3674 1989.788563 6980.6221
## 2016.462 4100.72397 2636.5355784 5886.9688 1986.071299 6973.6580
## 2016.481 4906.16729 3289.5213509 6844.8697 2557.578032 8013.0379
## 2016.500 6338.84958 4479.2505450 8520.5050 3617.463974 9818.5166
## 2016.519 5186.98677 3520.1734058 7175.8566 2761.418733 8370.8362
## 2016.538 5270.49922 3589.0301382 7274.0247 2822.442901 8476.8369
## 2016.558 5797.34323 4025.9782912 7890.7646 3211.347336 9141.6205
## 2016.577 5501.54321 3780.1223019 7545.0205 2992.183310 8769.1845
## 2016.596 6625.44928 4720.6757023 8852.2793 3834.746161 10174.4338
## 2016.615 5116.51520 3462.1610316 7092.9258 2710.065021 8281.2468
## 2016.635 4448.71331 2916.9705965 6302.5125 2230.402750 7425.3053
## 2016.654 3794.23255 2391.9585078 5518.5630 1774.583579 6572.1629
## 2016.673 4162.58023 2686.1801445 5961.0367 2029.189479 7054.2524
## 2016.692 3681.91917 2302.9566493 5382.9381 1698.040275 6424.0795
## 2016.712 4390.17215 2869.6040100 6232.7967 2189.008275 7349.6174
## 2016.731 3632.94408 2264.2579505 5323.6866 1664.833664 6359.3359
## 2016.750 3369.14148 2057.0412819 5003.2981 1487.858639 6008.7057
## 2016.769 3282.84745 1989.7352906 4898.0160 1430.700569 5893.2757
## 2016.788 2810.32651 1625.9300943 4316.7793 1124.997152 5253.9373
## 2016.808 1909.04629 961.1825996 3178.9664 586.661624 3989.7123
## 2016.827 1755.20285 852.9584196 2979.5037 502.818088 3765.8690
## 2016.846 931.63616 318.0168235 1867.3119 122.128578 2499.4251
## 2016.865 962.29862 336.0346946 1910.6190 133.388706 2549.4899
## 2016.885 560.41274 120.6339549 1322.2479 17.652625 1861.4542
## 2016.904 63.23344 -22.4462142 426.0771 -132.700951 752.0473
## 2016.923 208.05583 3.0083216 735.1598 -25.476195 1148.9169
## 2016.942 259.64447 11.7223916 829.6230 -11.276489 1266.2938
## 2016.962 179.62121 0.5077909 680.7911 -36.835817 1080.6880
## 2016.981 123.96826 -2.4198465 567.5731 -69.512770 936.7052
## 2017.000 94.68002 -7.5661760 493.3463 -88.755270 834.1545
## 2017.019 219.76105 5.4910801 745.5834 -18.723170 1154.3487
## 2017.038 280.90078 18.3103595 855.0436 -5.718185 1289.6331
## 2017.058 191.89206 1.8809485 693.4556 -28.077880 1089.2560
## 2017.077 279.04190 17.8380613 851.7982 -5.986930 1285.6466
## 2017.096 210.92247 4.1703189 729.2271 -21.420211 1133.9745
## 2017.115 196.93559 2.4097589 703.0138 -26.193857 1101.2271
## 2017.135 183.21704 1.1124655 676.8741 -31.534946 1068.4489
## 2017.154 577.11718 133.2230499 1332.5637 23.735368 1864.0487
## 2017.173 927.77902 323.2239800 1843.8865 127.873075 2461.2347
## 2017.192 1177.07497 476.4394082 2189.2630 229.739463 2857.9602
## 2017.212 1031.06060 385.3014920 1988.3721 167.929587 2627.7414