FORECASTING TOURIST ARRIVALS IN SOUTH AFRICA

Problem investigated: Tourism to South Africa has grown substantially since the first democratic elections in 1994. It is currently the third largest industry in the country and a vital source of foreign exchange earnings. Tourist arrivals continue to grow annually, and have shown some resilience to a number of emerging market crises, including the terrorist attacks in the USA. Business success, marketing decisions, government’s investment policy as well as macroeconomic policy are influenced by the accuracy of tourism forecasts, since the tourism product comprises a number of services that cannot be accumulated. Accurate forecasts of tourism demand are paramount to ensure the availability of such services when demanded. In addition, the seasonal nature of tourism leads to a pattern of excess capacity followed by shortage in capacity.


INTRODUCTION AND BACKGROUND TO FORECASTING METHODS
Since the democratic elections in 1994, tourism in South Africa has grown to become one of the country's main earners of foreign exchange and creators of employment. As a result of this growth, the country is currently the main destination for tourists to the African continent (UNWTO, 2009b: 38-39). The growth rate in tourist arrivals has also surpassed the world average for more than a decade. The growth is fuelled by factors that include the hosting of major events, while political turmoil and terrorist attacks in other parts of the world have had positive effects on South African arrivals. In addition, major markets started to change their travel trends, with an increasing number of tourists from developed countries visiting developing countries (UNWTO, 2009b; Saayman & Rossouw, 2008: 2; SAT, 2009). Based on the arrival figures of visitors to South Africa (see Figure 1), it is clear that the country is experiencing significant growth, but the question remains whether this can be sustained. Answers to this question can assist both government and the private sector in pro-active decision-making (Song & Witt, 2006: 214-215).
Acta Commercii 2010
An analysis of South Africa's main international tourism markets reveals that tourists from Africa are the main source of tourist arrivals, in particular tourists from neighbouring countries, namely Lesotho, Botswana, Zimbabwe, Mozambique and Swaziland. Many of these tourists enter South Africa for the sole purpose of shopping at border towns and/or visiting family and friends. Tourists from overseas countries come mostly from the United Kingdom, Germany, the Netherlands, the United States of America and France. Since these visitors spend much more than their average African counterparts when visiting South Africa, understanding and forecasting the trends in different markets remain paramount (Saayman & Saayman, 2003).
Goh and Law (2002: 499) indicated that the first tourism forecasting studies took place in the early 1960s.
In a review of empirical evidence for forecasting tourism demand in 1995, Witt and Witt (1995) classified tourism forecasting methods into two groups, namely causal methods and time series methods. Under the causal methods, econometric models using independent variables such as income, population, prices and qualitative effects, as well as trends and lagged dependent variables were popular in earlier studies.
Other researchers used spatial models, and especially the gravity model specification, in forecasting tourism demand (Witt & Witt, 1995). More recent econometric forecasting studies tend to use autoregressive distributed lag specifications, error correction mechanisms, vector autoregressive schemes and time varying parameter models. In addition, adjustments to the almost ideal demand system (AIDS) increased its popularity in this line of research (Song & Li, 2008).
Song and Li (2008) confirm that causal and time series methods remain the most popular forecasting methods and a combination of these approaches is being used increasingly, but add that new quantitative approaches have surfaced over the past few years. These include artificial neural networks, the rough set approach, fuzzy time series methods and genetic algorithms that can all be classified under artificial intelligence methods. Yet, Song and Li (2008) criticise these techniques for their poor theoretical justification, economic rationale and policy application. They further state that almost 60 percent of the forecasting research since 2000 has implemented time series methods, indicating the popularity of this method of forecasting. These methods explore the historic trend in the tourism time series and extrapolate the trend into the future without exploring its underlying causes (Song et al., 2003a). The various time series forecasting methods and results are subsequently reviewed, with emphasis on the methods used in the current paper.

The Naïve model
The Naïve model assumes that tourism arrivals follow a random walk, and trends and turning points can therefore not be predicted (Goh & Law, 2002: 501). Two versions of the Naïve model exist, with the first model assuming that tourism arrivals in the current period (denoted by $F_t$) equal arrivals in the previous period (denoted by $X_{t-1}$) (Burger et al., 2001: 405). Thus:
$$F_t = X_{t-1}$$
The second model allows for a seasonal adjustment of the data (Goh & Law, 2002: 501-502).
The naïve models often serve as the benchmark against which the accuracy of the forecasting performance of other models is assessed (Song & Li, 2008: 210). Witt and Witt (1995: 465-467) report that the naïve models showed the lowest mean absolute percentage error (MAPE) in research results from the early 1980s. However, since the naïve models forecast no change, they tend to perform worse than other methods in forecasting the direction of change and changes in trends.
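As a minimal sketch of the Naïve 1 forecast, the Python snippet below repeats the last observed value over the forecast horizon. The function names and the arrival figures are hypothetical and serve only as illustration. The seasonal variant, which repeats the value from the same month one year earlier, is one common way of allowing for seasonality; it is an assumption and not necessarily the specification used by Goh and Law (2002).

```python
def naive1_forecast(history, horizon):
    """Naive 1: every future period is forecast as the last observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=12):
    """Assumed seasonal variant: repeat the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of observations")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


# Hypothetical monthly arrivals in thousands (illustrative numbers only).
arrivals = [30, 28, 27, 22, 18, 16, 20, 21, 23, 26, 29, 35,
            32, 30, 28, 24, 19, 17, 21, 23, 25, 28, 31, 38]

print(naive1_forecast(arrivals, 3))          # [38, 38, 38]
print(seasonal_naive_forecast(arrivals, 3))  # [32, 30, 28]
```

The flat Naïve 1 forecast makes it clear why such a model cannot anticipate turning points, while the seasonal variant at least reproduces last year's monthly pattern.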

Moving average techniques
When using moving average techniques, equal weights are given to all past observations used in computing the average (Lim & McAleer, 2001: 968). When monthly data are considered, the previous 12 observations are chosen and given equal weights in determining the average (Goh & Law, 2002: 502). Thus:
$$F_t = \frac{1}{12}\sum_{i=1}^{12} X_{t-i}$$
where $F_t$ is the forecast for period $t$ and $X_{t-i}$ denotes arrivals observed $i$ months earlier. Goh and Law (2002: 509) found that moving average techniques outperformed the naïve models based on a basket of criteria, which included the MAPE, Theil's U, the root mean square error (RMSE) and the mean square error (MSE). However, they failed to outperform other time series techniques, most notably exponential smoothing, ARIMA and SARIMA techniques (discussed below), on all the criteria.
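The 12-month moving average described above can be sketched in a few lines of Python. The function name and the data are hypothetical; the only logic is the equally weighted mean of the most recent 12 observations.

```python
def moving_average_forecast(history, window=12):
    """Forecast the next period as the equally weighted mean of the last `window` values."""
    if len(history) < window:
        raise ValueError("not enough observations for the chosen window")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly arrivals in thousands (illustrative numbers only).
arrivals = [30, 28, 27, 22, 18, 16, 20, 21, 23, 26, 29, 40]

forecast = moving_average_forecast(arrivals)
print(forecast)  # 25.0
```

Because every month receives the same weight of 1/12, the forecast smooths out the seasonal pattern entirely, which is consistent with the method's weak performance against seasonal techniques.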

Exponential smoothing techniques
While exponential smoothing techniques also use averages of the data, they assign different weights to the past observations used in forecasting a time series, unlike moving average methods. As such, these methods adjust the smoothing coefficients and reduce the fluctuations caused by the irregular component in the time series under consideration (Lim & McAleer, 2001: 966, 968). In general, the weights show an exponential decay and observations closer to the forecasting period therefore carry a greater weight (Goh & Law, 2002: 502). Lim and McAleer (2001: 971) found that, compared to other exponential smoothing techniques, the Holt-Winters exponential smoothing method delivers improved forecasts of tourism arrivals in Australia, a country that also shows severe seasonality in tourist arrivals. The smoothed series is defined by three components, each with its own smoothing coefficient, and $\tilde{X}_t$ is the smoothed estimate of arrivals at time $t$.
In line with Lim and McAleer (2001), Goh and Law (2002: 509) found that the Holt-Winters exponential smoothing forecasting technique outperformed the other smoothing techniques, based on a range of accuracy measurements. The method also outperformed the naïve models as well as the moving average forecasting methods. Its MAPE was even lower than that of the ARIMA forecasts (discussed below), but it failed to improve on the SARIMA forecasts (see below).
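To make the three-component structure concrete, the sketch below implements the additive form of Holt-Winters smoothing with a level, a trend and a seasonal component. Several points are assumptions rather than the specification used in the studies cited: the additive (as opposed to multiplicative) form, the coefficient names `alpha`, `beta` and `gamma`, the simple initialisation from the first two seasons, and the illustrative data.

```python
def holt_winters_additive(series, season, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing; returns forecasts for `horizon` periods."""
    if len(series) < 2 * season:
        raise ValueError("need at least two full seasons to initialise")

    # Assumed initialisation: level = mean of the first season, trend = average
    # per-period change between the first two seasons, seasonal = deviation
    # of each first-season value from the initial level.
    first, second = series[:season], series[season:2 * season]
    level = sum(first) / season
    trend = (sum(second) - sum(first)) / season ** 2
    seasonal = [x - level for x in first]

    for t in range(season, len(series)):
        x = series[t]
        s_prev = seasonal[t % season]  # seasonal index from one season ago
        prev_level = level
        level = alpha * (x - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season] = gamma * (x - level) + (1 - gamma) * s_prev

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % season]
            for h in range(horizon)]


# Hypothetical quarterly series with a stable seasonal pattern and no trend.
pattern = [10.0, 20.0, 30.0, 20.0]
forecasts = holt_winters_additive(pattern * 3, season=4,
                                  alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
print([round(f, 6) for f in forecasts])
```

For a series that simply repeats the same seasonal pattern, the forecasts reproduce that pattern up to floating-point rounding, which is a convenient sanity check before applying the method to real arrivals data.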

ARIMA-models and the Box-Jenkins procedure
The time series models that dominate tourism demand forecasting are autoregressive integrated moving average (ARIMA) models (Song & Li, 2008: 210). ARIMA (p, d, q) models were developed by Box and Jenkins in the 1970s and their approach of identification, estimation and diagnostics is based on the principle of parsimony. In identifying the appropriate model specification, the autocorrelation and partial autocorrelation functions are scrutinised. The models are then estimated and the model with the lowest information criterion (the Akaike information criterion or the Schwarz Bayesian criterion) is chosen as the most appropriate model specification. Before using the model for forecasting, the goodness of fit should be assessed and care taken to avoid overfitting (Asteriou & Hall, 2007: 240-242).
The forecasting equation for tourism arrivals with ARIMA (p, d, q) models, where p denotes the order of the autoregressive part, d the order of integration and q the order of the moving average part of the model, is given by Goh and Law (2002: 502) and can also be specified in lag operator notation. Song et al. (2003b: 136) remark that ARIMA models are often considered as delivering more accurate forecasts than econometric forecasting techniques. This is confirmed by du Preez and Witt (2003: 449), who found ARIMA models to outperform multivariate models in forecasting performance. Goh and Law (2002: 509) indicate that the overall performance of ARIMA models is superior to that of the naïve models and smoothing techniques. However, they indicate that ARIMA models are outperformed by SARIMA models in forecasting accuracy (see discussion below).
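For reference, a standard textbook form of the ARIMA (p, d, q) model in lag operator notation is sketched below. The symbols ($L$ for the lag operator, $\phi_i$ and $\theta_j$ for the autoregressive and moving average coefficients, $c$ for a constant and $\varepsilon_t$ for the error term) and the sign convention on the moving average terms are assumptions, and may differ from the notation in Goh and Law (2002).

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right)(1 - L)^{d} X_t
  = c + \left(1 - \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t
```

With $d = 1$, for example, the factor $(1 - L)^{d} X_t$ reduces to the first difference $X_t - X_{t-1}$, so the autoregressive and moving average parts are fitted to month-on-month changes in arrivals.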

SARIMA-models
The seasonal ARIMA model (SARIMA) is an extension of the ordinary ARIMA model to allow for seasonality in the data (Chu, 1998: 602). The seasonal components of the model are denoted by capital letters, as in SARIMA $(p, d, q)(P, D, Q)_s$, where the last bracket indicates the seasonal parameters for the order of the autoregressive, integration and moving average parts of the model. The first bracket indicates the non-seasonal parameters (Goh & Law, 2002: 502-503). Chu (1998: 603) gives the equation that a SARIMA $(1, 1, 1)(1, 1, 1)_{12}$ model yields, in which the non-seasonal parameters and the seasonal parameters are estimated and the error term is an uncorrelated random shock. In an evaluation of the various time series models, the SARIMA models are often found to deliver the most accurate forecasts for tourism demand (Goh & Law, 2002: 509; Chu, 1998: 612; Song & Li, 2008: 210).
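A standard way of writing the SARIMA $(1, 1, 1)(1, 1, 1)_{12}$ model is sketched below. The notation is an assumption and may differ from Chu (1998): $L$ is the lag operator, $\phi_1$ and $\theta_1$ are the non-seasonal autoregressive and moving average parameters, $\Phi_1$ and $\Theta_1$ their seasonal counterparts, and $\varepsilon_t$ the uncorrelated random shock. The minus signs on the moving average terms follow the Box-Jenkins convention.

```latex
(1 - \phi_1 L)\,(1 - \Phi_1 L^{12})\,(1 - L)\,(1 - L^{12})\, X_t
  = (1 - \theta_1 L)\,(1 - \Theta_1 L^{12})\,\varepsilon_t
```

The factors $(1 - L)$ and $(1 - L^{12})$ correspond to $d = 1$ and $D = 1$, that is, one regular difference and one seasonal difference at a lag of 12 months.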

Other time series models
Although the above-mentioned methods are the most widely applied time series forecasting techniques in tourism demand literature, they are not the only methods used. Volatility models, including autoregressive conditional heteroscedasticity (ARCH) and generalised autoregressive conditional heteroscedasticity (GARCH) models, are increasing in popularity, with papers such as those by Chan et al. (2005) and Coshall (2009). Chu (2004) used a cubic polynomial approach in forecasting tourism demand in Singapore. However, Chu found that the forecasting accuracy of this approach is not superior to that of seasonal ARIMA models. Chu (2008) further experimented with new expansions in ARIMA modelling by forecasting tourism demand for Singapore using a fractionally integrated autoregressive moving average (ARFIMA) approach. The ARFIMA (p, d, q) approach differs from the ARIMA (p, d, q) approach only in the estimation of the difference parameter (d). With the standard ARIMA (p, d, q) model, d can only take on integer values (e.g. 0, 1, 2), while d can take on any fractional value (e.g. 0.7, 1.2, etc.) with the ARFIMA (p, d, q) approach. The results using ARFIMA methodology are mixed: Chu (2008) and Gil-Alana (2005) indicate that the ARFIMA approach improved forecasting accuracy relative to the standard ARIMA and SARIMA models, while Franses and Ooms (1997) failed to show improved forecasting results.
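The difference between integer and fractional values of d can be illustrated with the binomial expansion of the differencing operator $(1 - L)^d$, whose lag weights follow a simple recursion. The sketch below is only an illustration of that expansion; the function name is hypothetical and the snippet is not the estimation procedure used in the studies cited.

```python
def frac_diff_weights(d, n_weights):
    """Weights w_k in (1 - L)^d = sum_k w_k L^k, with w_0 = 1 and
    w_k = w_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# Fractional d: the weights decay slowly and never cut off.
print(frac_diff_weights(0.5, 4))  # [1.0, -0.5, -0.125, -0.0625]
```

For an integer value such as d = 1 the weights are zero beyond the first lag, giving the ordinary first difference, whereas a fractional d spreads small weights over many past observations and so allows for long memory in the series.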

PROBLEM INVESTIGATED
The growing importance of tourism forecasting is highlighted by Song and Witt (2006: 214-215), who indicated that business success, marketing decisions, government's investment policy as well as macroeconomic policy are influenced by the accuracy of tourism forecasts. Archer (1987: 77) adds that since the tourism product comprises a number of services that cannot be accumulated, accurate forecasts of tourism demand are paramount to ensure the availability of such services when demanded. In addition, the seasonal nature of tourism leads to a pattern of excess capacity followed by shortage in capacity.
The aspects raised above support the notion that forecasting tourist arrivals accurately is important for the destination country's tourism industry as a whole. How to obtain the most accurate forecasts of arrivals has therefore remained a research question in tourism and forecasting literature. However, while many studies have been conducted on forecasting tourist arrivals worldwide, only one study was conducted for South Africa, at the end of the 1990s (see Burger et al., 2001). That study focused only on tourism demand to one of South Africa's popular tourist cities (Durban) and analysed tourist arrivals as a whole for the city. In contrast, this paper aims to model and forecast tourist arrivals to the whole of South Africa from its main overseas markets, using various time series approaches. The forecasting accuracy of the various approaches will also be evaluated to establish the most appropriate univariate time series forecasting method for South African tourism demand.
A variety of measures is available to evaluate the forecasting accuracy of the various time series models. Chu (2004) remarks that the most popular measures, used by more than 20 researchers, are the mean absolute percentage error (MAPE) and the root mean square error (RMSE). These two measures are based on the forecast error, which is the difference between actual arrivals and forecasted arrivals, and this forms the basis for the comparison of the various models. Since the forecast error depends on the size of the variable in question, it is difficult to compare the results of two different time series using these measures. To overcome this obstacle, Theil's U will also be assessed, which is a relative measure that compares the forecast results with those of the naïve (or random walk) model (Goh & Law, 2002: 504).
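The accuracy measures discussed above can be sketched as follows. The MAPE and RMSE follow their usual definitions. For Theil's U several variants exist; the version below, the ratio of the model's RMSE to that of the naïve forecast, is an assumption chosen to match the description of a measure relative to the random walk, and may differ from the formula in Goh and Law (2002). All names and numbers are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def theil_u(actual, forecast, naive_forecast):
    """Assumed form of Theil's U: model RMSE relative to the naive RMSE."""
    return rmse(actual, forecast) / rmse(actual, naive_forecast)


# Hypothetical three-month holdout (illustrative numbers only).
actual = [100.0, 200.0, 400.0]
model = [110.0, 180.0, 400.0]
naive = [80.0, 80.0, 80.0]  # last in-sample value repeated (Naive 1)

print(round(mape(actual, model), 2))      # 6.67
print(round(rmse(actual, model), 2))      # 12.91
print(theil_u(actual, model, naive) < 1)  # True: the model beats the naive benchmark
```

Under this definition a value of U below one indicates that the model improves on the naïve forecast, and because it is a ratio it can be compared across markets of different size.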

DATA AND METHOD
As mentioned previously, South Africa's main international tourist markets, excluding African markets, are Great Britain (22.4% of overseas arrivals in 2007), Germany (11.5%), the USA (12.5%), the Netherlands (5.8%) and France (5.2%). Together, arrivals from these markets accounted for 57.3% of all intercontinental arrivals in 2007. Forecasting arrivals from these markets is therefore important in understanding the dynamics of international tourism to South Africa. Figure 2 illustrates arrivals from these markets for the period 1994 to 2007.

ANALYSIS AND FINDINGS
The best ARIMA models, based on the Akaike Information Criteria, were used in forecasting tourism arrivals.For Germany, an ARIMA (2,1,2) model was estimated, for France an ARIMA (2,0,2), for Great Britain an ARIMA (4,1,3), for the Netherlands an ARIMA (4,1,2) and for the USA an ARIMA (2,1,2) model proved to be the best fit.
For the seasonal ARIMA models, most arrival series follow the airline model, with Germany, Great Britain and the USA following a SARIMA $(0,1,1)(0,1,1)_{12}$ structure. Arrivals from France follow a SARIMA $(1,0,1)(0,1,1)_{12}$ structure and those from the Netherlands a SARIMA $(1,1,1)(0,1,1)_{12}$ structure. The airline model was first derived by Box and Jenkins in 1970 when they estimated a seasonal ARIMA model using airline passenger data. The model proved quite robust in forecasting seasonal data series and became a benchmark for comparing the accuracy of forecasts (Ghysels et al., 2006: 665). The forecasting accuracy of the various models is compared in Tables 2 to 4.
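The airline model referred to above, SARIMA $(0,1,1)(0,1,1)_{12}$, can be written out explicitly. The notation is an assumption: $X_t$ denotes the (log) arrivals series, $L$ the lag operator, $\theta_1$ and $\Theta_1$ the non-seasonal and seasonal moving average parameters, and $\varepsilon_t$ the random shock, with the Box-Jenkins sign convention.

```latex
(1 - L)(1 - L^{12})\, X_t = (1 - \theta_1 L)(1 - \Theta_1 L^{12})\,\varepsilon_t

% Multiplying out both sides gives the estimable form:
X_t - X_{t-1} - X_{t-12} + X_{t-13}
  = \varepsilon_t - \theta_1 \varepsilon_{t-1}
    - \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}
```

The expansion shows why the model is so parsimonious: after one regular and one seasonal difference, only two parameters are needed to describe the dependence on shocks one month and one year earlier.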
Table 2 shows that the seasonal nature of tourist arrivals in South Africa clearly influences the effectiveness of forecasts over all the forecasting horizons assessed. This is especially true in forecasting over longer horizons: for the six-month and 12-month forecasting periods, the mean absolute percentage error (MAPE) is the smallest in the SARIMA forecasts, followed by the Holt-Winters exponential smoothing forecasts, for arrivals from all the major overseas tourist markets. It is also noteworthy that the naïve models perform the worst in almost all cases. The only exceptions are the three-month forecasts for tourists from Germany and France. The second naïve model performs the worst of all the forecasts over the short term.
*The number in parentheses indicates the overall ranking of forecasting performance of the methods for each country and time horizon.
In Table 3, the forecasting accuracy of the various forecasting models is assessed using the root mean square error (RMSE) as a measure. It is evident from Table 3 that the results are similar to those in Table 2. The forecasting methods that take into account the seasonal nature of tourist arrivals in South Africa clearly outperform the other models, especially over longer forecasting horizons (six and 12 months). The naïve 1 (or random walk) model only shows some accuracy in short-term forecasts, but fails to perform in the longer run.
A similar pattern can be observed when Theil's U is assessed to measure forecasting accuracy. The SARIMA models outperform the other models in all three time horizons, followed by the Holt-Winters exponential smoothing forecasts. For the first time, the ARIMA forecast takes pole position in forecasting accuracy, namely for France over a three-month forecast. Across all the other time horizons, the non-seasonal ARIMA forecasts struggle to outperform the naïve models, confirming the results of Smeral and Wüger (2005) that the forecasting accuracy of ARIMA models is not always superior to that of naïve models.
Based on the assessment of forecasting accuracy of the various models, the authors can conclude that the seasonal methods for forecasting international tourist arrivals are superior to the non-seasonal methods. The seasonal ARIMA model seems to be the best predictor over all the forecasting horizons considered, followed by the Holt-Winters exponential smoothing technique.
Figure 3 illustrates the one-year forecasts generated by the SARIMA models for all the markets under consideration, compared with the actual arrivals in 2007. It is evident from the graphs that the forecasts for Germany, Great Britain and the Netherlands are more accurate than those of arrivals from France and the USA. This is also reflected in the tables above, where the errors of the SARIMA model are lower for Germany, Great Britain and the Netherlands than for France and the USA.
The results confirm the research results of Goh and Law (2002: 509) and Chu (1998: 606), both of whom found that SARIMA models outperform other forecasting techniques in predicting arrivals in Asian-Pacific countries. In addition, Chu (1998: 606) found the Holt-Winters exponential smoothing forecasts to be the second-best method for forecasting tourist arrivals in Asian-Pacific countries. For Hong Kong, though, Goh and Law (2002: 509) showed that non-seasonal ARIMA models outperform the Holt-Winters forecasts.

IMPLICATIONS OF THE RESEARCH
The analysis indicated that the forecasting models that took into account the seasonal nature of tourist arrivals in South Africa outperformed the other models over all forecasting horizons. Therefore, firstly from a methodological point of view, one has to recognise that international tourist arrivals are seasonal and this has to be accounted for in the modelling process. Secondly, seasonal ARIMA models proved to deliver the most accurate predictions of arrivals over the three time horizons, followed by the Holt-Winters exponential smoothing method, and could therefore be identified as the superior univariate time series technique in modelling international tourist arrivals to South Africa. Non-seasonal ARIMA models fared slightly better overall than the naïve models, but since they do not take the seasonal nature of overseas arrivals into consideration, their accuracy proved to be far lower than that of the seasonal ARIMA models. As such, this research confirms research by Chu (1998) and Goh and Law (2002), who found seasonal ARIMA models to outperform other time series techniques in forecasting tourist arrivals in Asian-Pacific countries.
The contribution and importance of this research lie in the findings as well as the application to South African data. Applying seasonal ARIMA models in forecasting international tourist arrivals to South Africa could assist government and the private sector in being more pro-active in their policies and decision-making to have products available when tourists need them. However, one of the drawbacks of using univariate time series techniques in forecasting arrivals is that they do not identify the underlying causes for the changes in arrivals. As such, the research could be expanded to include econometric modelling techniques to identify these causes and to better inform the industry on how to maintain sustainable growth in tourist arrivals.

Figure 2: Tourism arrivals from main overseas destinations (1994-2007)
Source of data: Compiled from Statistics South Africa reports (StatsSA, 2008)
All data used in the analysis was obtained from Statistics South Africa's Tourism Reports (Report 03-51-02 from 1999 onwards and Statistical Release P0351 for earlier years). It is clear from the figure that tourism to South Africa grows consistently and follows a seasonal pattern. This period can be viewed as a relatively stable period in South Africa's tourist arrivals, since tourist arrivals only started to increase significantly between 1990 and 1993 (StatsSA, 2008), and 2007 is before the onset of the financial crisis, which saw a decline in tourist arrivals worldwide (UNWTO, 2009a).
Using monthly arrivals, the models are estimated for the period 1994 to 2006 and forecasted for 2007. The models that are estimated and forecasted are the Naïve 1 and 2 models, the Holt-Winters exponential smoothing model, and the ARIMA and SARIMA models, as explained above. Ex post forecasts are then conducted for 2007 in three time periods, namely a three-month, six-month and twelve-month forecast. Forecasting accuracy is assessed using the MAPE, RMSE and Theil's U statistics. All estimations are done using EViews 6, making use of natural logarithms of the arrivals series.
The Augmented Dickey-Fuller (ADF) test revealed that all the series are non-stationary when no constant or trend is added to the testing equation. However, when a constant is included, only arrivals from Germany and the USA are non-stationary. This impacted on the ARIMA model structures. The results of the ADF test are indicated in Table 1. The results of the Phillips-Perron unit root test confirm the ADF test results, but indicate that stationarity may exist in all the series when a constant is included in the test equation.