Forecasting Macroeconomic Time Series with Locally Adaptive Signal Extraction

Abstract

We introduce a non-Gaussian dynamic mixture model for macroeconomic forecasting. The locally adaptive signal extraction and regression (LASER) model is designed to capture relatively persistent AR processes (signal) contaminated by high frequency noise. The distribution of the innovations in both noise and signal is robustly modeled using mixtures of normals. The mean of the process and the variances of the signal and noise are allowed to shift suddenly or gradually at unknown locations and number of times. The model is then capable of capturing movements in the mean and conditional variance of a series as well as in the signal-to-noise ratio. Four versions of the model are estimated by Bayesian methods and used to forecast a total of nine quarterly macroeconomic series from the US, Sweden and Australia. We observe that allowing for infrequent and large parameter shifts while imposing normal and homoskedastic errors often leads to erratic forecasts, but that the model typically forecasts well if made more robust by allowing for non-normal errors and time varying variances. Our main finding is that, for the nine series we analyze, specifications with infrequent and large shifts in error variances outperform both fixed parameter specifications as well as smooth, continuous shifts when it comes to interval coverage.

Keywords: Bayesian inference, Forecast evaluation, Regime switching, State-space modeling, Dynamic Mixture models.

1. Introduction

This paper is concerned with the forecasting performance for macroeconomic time series of a class of dynamic mixture models. The widespread instability of coefficients in standard autoregressive moving average (ARMA) models for these types of data series has been widely documented (see for example Stock and Watson, 1996).
Multiple shifts in local means, error variances and autocorrelation structure in inflation, interest rates and other nominal time series are detected by various frequentist and Bayesian procedures on the last four decades of data (Levin and Piger, 2003; Stock and Watson, 2006; Koop and Potter, 2008; Giordani and Kohn, 2008). According to Clements and Hendry (1999), such shifts are the main cause of forecasting failure of univariate and multivariate linear models. A variety of models has been formulated to tackle various forms of non-Gaussian behavior. However, one cannot but be surprised by the overall difficulty in outperforming standard autoregressive (AR) processes out-of-sample even when more complex models are strongly supported in-sample (Stock and Watson, 1996).

[*] Paolo Giordani: Research department, Sveriges Riksbank, SE-103 37 Stockholm, Sweden. E-mail: paolo.giordani@riksbank.se. Mattias Villani: Research department, Sveriges Riksbank, SE-103 37 Stockholm, Sweden and Department of Statistics, Stockholm University. The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Executive Board of Sveriges Riksbank.

Some simple models have withstood the test of time and are largely adopted by practitioners and academics. Some of these simple models are cast in state space form, others are estimated by least squares, but all essentially involve some exponential discounting of past observations (discounted least squares, exponential smoothing) and/or over-differencing (ARIMA, local trends models). Marcellino (2008) recently reports good forecasting performance for time varying parameter models (TVP) for macroeconomic series, while Stock and Watson (1996) report more ambiguous results. Markov switching models (Hamilton, 1989) and the closely associated multiple change-point models (e.g.
Chib, 1998) are very popular for off-line analysis, but little has been published on the forecasting performance of the former and, as far as we are aware, nothing on the latter. The few available studies show disappointing results for Markov switching models (Clements and Krolzig, 1998; Bessec and Bouabdallah, 2005), at least for point forecasts.

We summarize the discussion above as follows. Even though in-sample analysis indicates that parameter instability in AR(MA) models is widespread in macroeconomic time series, fixed parameter specifications are competitive with simple models that assume continuous and smooth time variation and superior to complex Markov switching models when it comes to forecasting.

The goal of this paper is to shed some light on these seemingly conflicting observations. Rather than forecasting a large number of series, we choose to provide a more detailed analysis of three macroeconomic series of particular interest (real GDP growth, CPI inflation and a short interest rate) using quarterly US, Swedish and Australian data (for a total of nine series). Our tool of analysis is a Bayesian dynamic mixture model recently developed for forecasting at the Swedish Central Bank. The model, denoted LASER (locally adaptive signal extraction and regression), allows for a variety of generalizations of standard ARMA models that can account for Gaussian or non-Gaussian shifts in local mean, error variance and persistence, as well as for non-Gaussian innovations. By switching on and off various features of the model and monitoring the real-time forecasts we can thereby try to understand which features are responsible for good and poor performance.

From a technical perspective, our innovation is to expand on the work of Giordani and Kohn (2008) (henceforth GK) and introduce a more general extension of the ARMA class than currently available in the literature.
The Markov Chain Monte Carlo (MCMC) technology of GK is used to achieve fast and efficient Bayesian inference, which allows us to perform the first (to the best of our knowledge) serious forecasting evaluation of change-point and mixture innovation models.

Based on this forecasting evaluation, our main conclusions are as follows. First, it is much easier to outperform fixed coefficient models when considering interval coverage rather than point forecasts. Second, infrequent and large shifts in error variances provide better conditional interval coverage than continuous smooth shifts. Third, models that allow for infrequent and large shifts in conditional mean but normal independent identically distributed (iid) innovations are very fragile to the presence of outliers and of shifts in error variance, whereas they perform well when the Gaussian iid assumption is removed. Our interpretation of the first two results is that shifts in error variance are large and persistent in our series, and therefore easy to detect and model with a change-point approach. The intuition for the third result is that when normality is imposed on the errors, any outlier (or increased variance) will be interpreted as a parameter shift in real time, generating excessively volatile forecasts.

[1] We define over-differencing informally as the differencing of a series that cannot reasonably be considered unbounded, like the real interest rate or the consumption over income ratio.

[2] Harvey (1989) and West and Harrison (1997) provide detailed expositions of many such models from a frequentist and Bayesian perspective respectively.

Section 2 presents LASER in rather general terms, and discusses some options to model shifts in local mean and variances. Section 3 specifies four models that are nested in the LASER framework and differ in the specification of the error structure and of time variation.
Section 4 shows how two of these models imply time variation in the persistence of the process, not only in its mean and variance. Section 5 presents the forecasting experiment and discusses results, focusing on point forecasts first and then on interval coverage. Section 6 concludes.

2. Locally adaptive signal extraction and regression

LASER (locally adaptive signal extraction and regression) is a state space model for filtering and forecasting recently developed at the Swedish Central Bank and employing the approach to shifts in conditional mean and variance proposed by Giordani and Kohn (2008). The univariate random variable $y_t$ is decomposed into three processes[3], all of which can have mixture of normals (MN) innovations:

(1) A local mean $\mu_t$, which can be any conditionally Gaussian process (i.e. Gaussian conditional on a vector of latent indicators and parameters).
(2) A latent, unobserved process $x_t$, modeled as a finite-order, stationary AR model with (i) MN innovations in the log variance process, (ii) MN errors, and (iii) unknown lag length $p$.
(3) An independently distributed measurement error/additive outlier process with (i) MN innovations in the log variance process and (ii) MN errors.

[3] In fact there is a fourth component, a regression effect (also time-varying), that we omit in our discussion since the paper focuses on univariate forecasting.

The observation equation for $y_t$ and the transition equation for $x_t$ are

$$y_t = \mu_t + x_t + \varepsilon_t \tag{2.1}$$
$$x_t = \rho_1 x_{t-1} + \dots + \rho_p x_{t-p} + u_t \tag{2.2}$$
$$\varepsilon_t \sim MN(k_y, \pi_y, \alpha_y, \sigma^2_{y,t}\gamma_y) \tag{2.3}$$
$$u_t \sim MN(k_x, \pi_x, \alpha_x, \sigma^2_{x,t}\gamma_x). \tag{2.4}$$

Equations (2.3) and (2.4) are to be read as follows: $\varepsilon_t$ has a mixture of normals (MN) distribution with $k_y$ components, and parameters specified by a $k_y$ vector of probabilities $\pi_y$, a $k_y$ vector of means $\alpha_y$, and a $k_y$ vector of variances $\sigma^2_{y,t}\gamma_y$, where $\sigma^2_{y,t}$ is a (possibly time-varying) scalar common to all components of the mixture.
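As a rough illustration of the structure in (2.1) and (2.2), the following Python sketch simulates a series as a constant local mean plus an AR signal plus noise, with both innovations drawn from finite mixtures of normals. It is a minimal sketch under stated assumptions, not the estimation code used in the paper: the function names, the argument layout and all numerical values are hypothetical, and the local mean and the variance scales are held fixed.

```python
import random

def draw_mn(rng, probs, means, variances, scale2=1.0):
    """Draw once from a finite mixture of normals.

    Component i has weight probs[i], mean means[i] and variance
    scale2 * variances[i]; scale2 plays the role of the common scalar.
    """
    i = rng.choices(range(len(probs)), weights=probs)[0]
    return rng.gauss(means[i], (scale2 * variances[i]) ** 0.5)

def simulate_laser_core(n, mu, rho, mn_x, mn_y, seed=0):
    """Simulate y_t = mu + x_t + eps_t with an AR(p) signal x_t.

    rho holds the AR coefficients; mn_x and mn_y are (probs, means,
    variances) tuples for the signal and noise innovations.
    Illustrative only: mu and the variance scales never shift here.
    """
    rng = random.Random(seed)
    p = len(rho)
    x_hist = [0.0] * p  # x_{t-1}, ..., x_{t-p}, started at zero
    ys = []
    for _ in range(n):
        u = draw_mn(rng, *mn_x)
        x = sum(r * xl for r, xl in zip(rho, x_hist)) + u
        eps = draw_mn(rng, *mn_y)
        ys.append(mu + x + eps)
        x_hist = ([x] + x_hist)[:p]  # shift the lag window
    return ys

# Hypothetical settings: a persistent AR(1) signal and noise with
# occasional large outliers from a second, high-variance component.
ys = simulate_laser_core(
    200, mu=2.0, rho=[0.8],
    mn_x=([0.95, 0.05], [0.0, 0.0], [1.0, 9.0]),
    mn_y=([0.90, 0.10], [0.0, 0.0], [1.0, 25.0]),
)
```

In each mixture the first component has mean 0 and variance factor 1, and a second, rarer component with a larger variance produces the heavy tails that a single normal distribution cannot capture.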
For identification purposes, we set the first mixture means $[\alpha_y]_1 = [\alpha_x]_1 = 0$ and the first variance factors $[\gamma_y]_1 = [\gamma_x]_1 = 1$, where $[a]_i$ denotes the $i$th element of a vector $a$. The autoregressive parameters $\rho_1, \dots, \rho_p$ are constrained to lie in the stationary region. The lag length is unknown and we compute its posterior by adding an updating step in the MCMC algorithm, with the user specifying the maximum number of lags and the prior lag probabilities.[4] At this level the model is still very general and requires several choices to be made operational. We now discuss some options for the local mean process $\mu_t$.

[4] Our prior assumes that if $\rho_i \neq 0$, then also $\rho_j \neq 0$ if $i > j$. This assumption could be easily relaxed.

Modelling shifts in the local mean $\mu_t$. We refer to $\mu_t$ as the "local mean" because we typically assume that $\mu_t$ changes infrequently. One possible specification is the random walk with a two-component mixture distribution, with one component being degenerate:

$$\mu_t = \begin{cases} \mu_{t-1} + \sigma_{\mu,1} u_{\mu,t} & \text{with prob. } \omega_1 \\ \mu_{t-1} & \text{with prob. } 1 - \omega_1, \end{cases} \tag{2.5}$$

where $u_{\mu,t}$ is iid $N(0, 1)$ and $\omega_1$ is the probability of a shift. In this case $y_t$ is globally nonstationary, although it may behave as a stationary series for long stretches. For $\omega_1 = 1$ the innovations are normal as in the well-known local level model. An attractive alternative when prior information on the long-run mean of the series is available or when the sample is large is the globally stationary specification

$$\mu_t = \begin{cases} \bar{\mu} + \sigma_{\mu,2} u_{\mu,t} & \text{with prob. } \omega_1 \\ \mu_{t-1} & \text{with prob. } 1 - \omega_1. \end{cases} \tag{2.6}$$

It is also possible to allow for both types of shifts:

$$\begin{align} \mu_t &= \mu_{t-1} + \sigma_{\mu,1} u_{\mu,t} && \text{with prob. } \omega_1 \tag{2.7} \\ &= \bar{\mu} + \sigma_{\mu,2} u_{\mu,t} && \text{with prob. } \omega_2 \tag{2.8} \\ &= \mu_{t-1} && \text{with prob. } 1 - \omega_1 - \omega_2. \notag \end{align}$$

Finally, when the shifts are infrequent but possibly large, all these specifications are unappealing for most macroeconomic series in that they suggest an immediate jump of $y_t$ to the new local mean. Since we believe that large shifts (e.g. from high to low inflation) typically take place over the course of several quarters, we generalize the specification in (2.7) as follows:
$$\Delta\mu_t = (1 - \delta)(\tilde{\mu}_t - \mu_{t-1}) \tag{2.9}$$
$$\tilde{\mu}_t = \begin{cases} \mu_{t-1} + \sigma_{\mu,1} u_{\mu,t} & \text{with prob. } \omega_1 \\ \bar{\mu} + \sigma_{\mu,2} u_{\mu,t} & \text{with prob. } \omega_2 \\ \tilde{\mu}_{t-1} & \text{with prob. } 1 - \omega_1 - \omega_2, \end{cases}$$

where $\delta$ determines how gradual the transition is and $\delta = 0$ retrieves (2.7). Here $\tilde{\mu}_t$ jumps and $\mu_t$ moves gradually to $\tilde{\mu}_t$.

Modelling shifts in variances. We model shifts in log variances as random walks with a two-component mixture distribution, with one component being degenerate:

$$\ln \sigma^2_{y,t} = \begin{cases} \ln \sigma^2_{y,t-1} + \tau_y e_{y,t} & \text{with prob. } \omega_y \\ \ln \sigma^2_{y,t-1} & \text{with prob. } 1 - \omega_y, \end{cases} \tag{2.10}$$

and

$$\ln \sigma^2_{x,t} = \begin{cases} \ln \sigma^2_{x,t-1} + \tau_x e_{x,t} & \text{with prob. } \omega_x \\ \ln \sigma^2_{x,t-1} & \text{with prob. } 1 - \omega_x, \end{cases} \tag{2.11}$$

with $e_{y,t}$ and $e_{x,t}$ both iid $N(0, 1)$. This formulation allows for infrequent, large shifts as well as for continuous, small shifts ($\omega_y = \omega_x = 1$).

3. Four forecasting models

3.1. Models. If $\mu_t$, $\sigma^2_{x,t}$ and $\sigma^2_{y,t}$ are constant and all innovations normal, LASER simplifies to the state space representation of an ARMA(p,p) process. We wish to understand which additional features of LASER can be expected to contribute to forecasting accuracy (both in terms of point forecasts and of interval coverage). For this purpose we will compare the forecasting performance for several versions of LASER.[5] These versions will effectively differ only in the prior and not in the way inference is performed (which is by MCMC with lag selection and stationarity imposed in all cases; see the Appendix for a description of the MCMC scheme). The four models can be broadly characterized as follows:

(1) ARMA. $\mu_t$, $\sigma^2_{x,t}$ and $\sigma^2_{y,t}$ constant and all innovations normal.
(2) Shifts. Infrequent shifts in $\mu_t$; constant $\sigma^2_{x,t}$ and $\sigma^2_{y,t}$ and normal innovations.
(3) Robust TVP. Normal innovations in $\mu_t$, $\ln(\sigma^2_{x,t})$ and $\ln(\sigma^2_{y,t})$ and MN innovations elsewhere.
(4) Robust Shifts. Infrequent shifts in $\mu_t$, $\ln(\sigma^2_{x,t})$ and $\ln(\sigma^2_{y,t})$ and MN innovations elsewhere.

[5] It would be interesting to also evaluate the forecasting performance of a Bayesian model average of these four models. Computing marginal likelihoods for dynamic mixture models is a very difficult and time consuming endeavour, see e.g. Frühwirth-Schnatter (2006). Moreover, we would have to compute marginal likelihoods in every time period of the evaluation sample. Since our main motivation is to compare the four models to better understand their differences in a forecasting environment, we will not compute a Bayesian model average of the models.

The exact priors for each model are given in the next section. The ARMA model acts as a benchmark. The Shifts model has shifts in mean but normal iid errors. The Robust TVP specification is meant, by comparison with the Robust Shifts specification, to evaluate the relative merits of frequent, small shifts versus infrequent, larger shifts.

3.2. Priors. This section presents the priors used in this paper for blocks of parameters. Unless otherwise specified, the priors are common to all four models.

Priors for $\rho_1, \dots, \rho_p$ and the $y$- and $x$-equation parameters (including the initial values $\sigma^2_{y,0}$ and $\sigma^2_{x,0}$). We assume the following probabilities for lag lengths $p$ from 1 to 4 (longer lags have zero probability):

$$\big(\Pr(p = 1), \Pr(p = 2), \Pr(p = 3), \Pr(p = 4)\big) = (0.4, 0.3, 0.2, 0.1).$$

$$\rho_1, \dots, \rho_p \mid p \sim N\left( \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}, \; \ldots \right)$$
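The contrast between the Robust TVP and Robust Shifts specifications in Section 3.1 comes down to how often the log-variance random walk of (2.10) is allowed to move. The following Python sketch simulates that process for two hypothetical settings: a shift probability of 1 with small innovations (continuous, smooth variation) and a small shift probability with large innovations (infrequent, large shifts). The function names and all numerical values are assumptions chosen for illustration, not the priors used in the paper.

```python
import math
import random

def simulate_log_variance(n, shift_prob, shift_sd, log_var0=0.0, seed=0):
    """Simulate a log-variance path that follows a mixture random walk.

    With probability shift_prob the log variance moves by
    shift_sd * N(0, 1); otherwise it stays at its previous value.
    """
    rng = random.Random(seed)
    path, log_var = [], log_var0
    for _ in range(n):
        if rng.random() < shift_prob:
            log_var += shift_sd * rng.gauss(0.0, 1.0)
        path.append(log_var)
    return path

def count_shifts(path, start=0.0):
    """Count the periods in which the path differs from its previous value."""
    prev, shifts = start, 0
    for value in path:
        if value != prev:
            shifts += 1
        prev = value
    return shifts

# Hypothetical settings, chosen only to show the two regimes.
smooth = simulate_log_variance(400, shift_prob=1.0, shift_sd=0.05, seed=1)
rare = simulate_log_variance(400, shift_prob=0.02, shift_sd=1.0, seed=1)

# Convert log variances back to variances for inspection or plotting.
smooth_var = [math.exp(v) for v in smooth]
rare_var = [math.exp(v) for v in rare]
```

The `smooth` path changes in essentially every period by a small amount, while the `rare` path is piecewise constant with occasional jumps; `count_shifts` makes that difference explicit.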
