Introductory Econometrics for Finance
SECOND EDITION
This best-selling textbook addresses the need for an introduction to
econometrics specifically written for finance students. It includes
examples and case studies which finance students will recognise and
relate to. This new edition builds on the successful data- and
problem-driven approach of the first edition, giving students the skills to
estimate and interpret models while developing an intuitive grasp of underlying theoretical concepts.
Key features:
●Thoroughly revised and updated, including two new chapters on panel data and limited dependent variable models
●Problem-solving approach assumes no prior knowledge of econometrics, emphasising intuition rather than formulae, giving
students the skills and confidence to estimate and interpret models
●Detailed examples and case studies from finance show students how
techniques are applied in real research
●Sample instructions and output from the popular computer package EViews enable students to implement models themselves and
understand how to interpret results
●Gives advice on planning and executing a project in empirical finance,
preparing students for using econometrics in practice
●Covers important modern topics such as time-series forecasting, volatility modelling, switching models and simulation methods
●Thoroughly class-tested in leading finance schools
Chris Brooks is Professor of Finance at the ICMA Centre, University of Reading, UK, where he also obtained his PhD. He has published over
sixty articles in leading academic and practitioner journals including
the Journal of Business, the Journal of Banking and Finance, the Journal of
Empirical Finance, the Review of Economics and Statistics and the Economic
Journal. He is an associate editor of a number of journals including the International Journal of Forecasting. He has also acted as consultant for
various banks and professional bodies in the fields of finance,
econometrics and real estate.
Introductory Econometrics
for Finance
SECOND EDITION
Chris Brooks
The ICMA Centre, University of Reading
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
First published in print format 2008

© Chris Brooks 2008

Information on this title: www.cambridge.org/[anonimizat]

ISBN-13 978-0-521-87306-2 hardback
ISBN-13 978-0-521-69468-1 paperback
ISBN-13 978-0-511-39848-3 eBook (EBL)

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

Published in the United States of America by Cambridge University Press, New York

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

www.cambridge.org
Contents
List of figures page xii
List of tables xiv
List of boxes xvi
List of screenshots xvii
Preface to the second edition xix
Acknowledgements xxiv
1 Introduction 1
1.1 What is econometrics? 1
1.2 Is financial econometrics different from ‘economic econometrics’? 2
1.3 Types of data 3
1.4 Returns in financial modelling 7
1.5 Steps involved in formulating an econometric model 9
1.6 Points to consider when reading articles in empirical finance 10
1.7 Econometric packages for modelling financial data 11
1.8 Outline of the remainder of this book 22
1.9 Further reading 25
Appendix: Econometric software package suppliers 26
2 A brief overview of the classical linear regression model 27
2.1 What is a regression model? 27
2.2 Regression versus correlation 28
2.3 Simple regression 28
2.4 Some further terminology 37
2.5 Simple linear regression in EViews – estimation of an optimal hedge ratio 40
2.6 The assumptions underlying the classical linear regression model 43
2.7 Properties of the OLS estimator 44
2.8 Precision and standard errors 46
2.9 An introduction to statistical inference 51
2.10 A special type of hypothesis test: the t-ratio 65
2.11 An example of the use of a simple t-test to test a theory in finance:
can US mutual funds beat the market? 67
2.12 Can UK unit trust managers beat the market? 69
2.13 The overreaction hypothesis and the UK stock market 71
2.14 The exact significance level 74
2.15 Hypothesis testing in EViews – example 1: hedging revisited 75
2.16 Estimation and hypothesis testing in EViews – example 2:
the CAPM 77
Appendix: Mathematical derivations of CLRM results 81
3 Further development and analysis of the classical linear
regression model 88
3.1 Generalising the simple model to multiple linear regression 88
3.2 The constant term 89
3.3 How are the parameters (the elements of the β vector) calculated in the generalised case? 91
3.4 Testing multiple hypotheses: the F-test 93
3.5 Sample EViews output for multiple hypothesis tests 99
3.6 Multiple regression in EViews using an APT-style model 99
3.7 Data mining and the true size of the test 105
3.8 Goodness of fit statistics 106
3.9 Hedonic pricing models 112
3.10 Tests of non-nested hypotheses 115
Appendix 3.1: Mathematical derivations of CLRM results 117
Appendix 3.2: A brief introduction to factor models and principal components analysis 120
4 Classical linear regression model assumptions and
diagnostic tests 129
4.1 Introduction 129
4.2 Statistical distributions for diagnostic tests 130
4.3 Assumption 1: E(u_t) = 0 131
4.4 Assumption 2: var(u_t) = σ² < ∞ 132
4.5 Assumption 3: cov(u_i, u_j) = 0 for i ≠ j 139
4.6 Assumption 4: the x_t are non-stochastic 160
4.7 Assumption 5: the disturbances are normally distributed 161
4.8 Multicollinearity 170
4.9 Adopting the wrong functional form 174
4.10 Omission of an important variable 178
4.11 Inclusion of an irrelevant variable 179
4.12 Parameter stability tests 180
4.13 A strategy for constructing econometric models and a discussion
of model-building philosophies 191
4.14 Determinants of sovereign credit ratings 194
5 Univariate time series modelling and forecasting 206
5.1 Introduction 206
5.2 Some notation and concepts 207
5.3 Moving average processes 211
5.4 Autoregressive processes 215
5.5 The partial autocorrelation function 222
5.6 ARMA processes 223
5.7 Building ARMA models: the Box–Jenkins approach 230
5.8 Constructing ARMA models in EViews 234
5.9 Examples of time series modelling in finance 239
5.10 Exponential smoothing 241
5.11 Forecasting in econometrics 243
5.12 Forecasting using ARMA models in EViews 256
5.13 Estimating exponential smoothing models using EViews 258
6 Multivariate models 265
6.1 Motivations 265
6.2 Simultaneous equations bias 268
6.3 So how can simultaneous equations models be validly estimated? 269
6.4 Can the original coefficients be retrieved from the πs? 269
6.5 Simultaneous equations in finance 272
6.6 A definition of exogeneity 273
6.7 Triangular systems 275
6.8 Estimation procedures for simultaneous equations systems 276
6.9 An application of a simultaneous equations approach to modelling bid–ask spreads and trading activity 279
6.10 Simultaneous equations modelling using EViews 285
6.11 Vector autoregressive models 290
6.12 Does the VAR include contemporaneous terms? 295
6.13 Block significance and causality tests 297
6.14 VARs with exogenous variables 298
6.15 Impulse responses and variance decompositions 298
6.16 VAR model example: the interaction between property returns and
the macroeconomy 302
6.17 VAR estimation in EViews 308
7 Modelling long-run relationships in finance 318
7.1 Stationarity and unit root testing 318
7.2 Testing for unit roots in EViews 331
7.3 Cointegration 335
7.4 Equilibrium correction or error correction models 337
7.5 Testing for cointegration in regression: a residuals-based approach 339
7.6 Methods of parameter estimation in cointegrated systems 341
7.7 Lead–lag and long-term relationships between spot and futures markets 343
7.8 Testing for and estimating cointegrating systems using the
Johansen technique based on VARs 350
7.9 Purchasing power parity 355
7.10 Cointegration between international bond markets 357
7.11 Testing the expectations hypothesis of the term structure of
interest rates 362
7.12 Testing for cointegration and modelling cointegrated systems
using EViews 365
8 Modelling volatility and correlation 379
8.1 Motivations: an excursion into non-linearity land 379
8.2 Models for volatility 383
8.3 Historical volatility 383
8.4 Implied volatility models 384
8.5 Exponentially weighted moving average models 384
8.6 Autoregressive volatility models 385
8.7 Autoregressive conditionally heteroscedastic (ARCH) models 386
8.8 Generalised ARCH (GARCH) models 392
8.9 Estimation of ARCH/GARCH models 394
8.10 Extensions to the basic GARCH model 404
8.11 Asymmetric GARCH models 404
8.12 The GJR model 405
8.13 The EGARCH model 406
8.14 GJR and EGARCH in EViews 406
8.15 Tests for asymmetries in volatility 408
8.16 GARCH-in-mean 409
8.17 Uses of GARCH-type models including volatility forecasting 411
8.18 Testing non-linear restrictions or testing hypotheses about
non-linear models 417
8.19 Volatility forecasting: some examples and results from the
literature 420
8.20 Stochastic volatility models revisited 427
8.21 Forecasting covariances and correlations 428
8.22 Covariance modelling and forecasting in finance: some examples 429
8.23 Historical covariance and correlation 431
8.24 Implied covariance models 431
8.25 Exponentially weighted moving average model for covariances 432
8.26 Multivariate GARCH models 432
8.27 A multivariate GARCH model for the CAPM with time-varying covariances 436
8.28 Estimating a time-varying hedge ratio for FTSE stock index returns 437
8.29 Estimating multivariate GARCH models using EViews 441
Appendix: Parameter estimation using maximum likelihood 444
9 Switching models 451
9.1 Motivations 451
9.2 Seasonalities in financial markets: introduction and literature review 454
9.3 Modelling seasonality in financial data 455
9.4 Estimating simple piecewise linear functions 462
9.5 Markov switching models 464
9.6 A Markov switching model for the real exchange rate 466
9.7 A Markov switching model for the gilt–equity yield ratio 469
9.8 Threshold autoregressive models 473
9.9 Estimation of threshold autoregressive models 474
9.10 Specification tests in the context of Markov switching and
threshold autoregressive models: a cautionary note 476
9.11 A SETAR model for the French franc–German mark exchange rate 477
9.12 Threshold models and the dynamics of the FTSE 100 index and
index futures markets 480
9.13 A note on regime switching models and forecasting accuracy 484
10 Panel data 487
10.1 Introduction – what are panel techniques and why are they used? 487
10.2 What panel techniques are available? 489
10.3 The fixed effects model 490
10.4 Time-fixed effects models 493
10.5 Investigating banking competition using a fixed effects model 494
10.6 The random effects model 498
10.7 Panel data application to credit stability of banks in Central and Eastern Europe 499
10.8 Panel data with EViews 502
10.9 Further reading 509
11 Limited dependent variable models 511
11.1 Introduction and motivation 511
11.2 The linear probability model 512
11.3 The logit model 514
11.4 Using a logit to test the pecking order hypothesis 515
11.5 The probit model 517
11.6 Choosing between the logit and probit models 518
11.7 Estimation of limited dependent variable models 518
11.8 Goodness of fit measures for linear dependent variable models 519
11.9 Multinomial linear dependent variables 521
11.10 The pecking order hypothesis revisited – the choice between
financing methods 525
11.11 Ordered response linear dependent variables models 527
11.12 Are unsolicited credit ratings biased downwards? An ordered
probit analysis 528
11.13 Censored and truncated dependent variables 533
11.14 Limited dependent variable models in EViews 537
Appendix: The maximum likelihood estimator for logit and
probit models 544
12 Simulation methods 546
12.1 Motivations 546
12.2 Monte Carlo simulations 547
12.3 Variance reduction techniques 549
12.4 Bootstrapping 553
12.5 Random number generation 557
12.6 Disadvantages of the simulation approach to econometric or financial problem solving 558
12.7 An example of Monte Carlo simulation in econometrics: deriving a
set of critical values for a Dickey–Fuller test 559
12.8 An example of how to simulate the price of a financial option 56512.9 An example of bootstrapping to calculate capital risk requirements 571
13 Conducting empirical research or doing a project or dissertation
in finance 585
13.1 What is an empirical research project and what is it for? 585
13.2 Selecting the topic 586
13.3 Sponsored or independent research? 590
13.4 The research proposal 590
13.5 Working papers and literature on the internet 591
13.6 Getting the data 591
13.7 Choice of computer software 593
13.8 How might the finished project look? 593
13.9 Presentational issues 597
14 Recent and future developments in the modelling
of financial time series 598
14.1 Summary of the book 598
14.2 What was not covered in the book 598
14.3 Financial econometrics: the future? 602
14.4 The final word 606
Appendix 1 A review of some fundamental mathematical and
statistical concepts 607
A1 Introduction 607
A2 Characteristics of probability distributions 607
A3 Properties of logarithms 608
A4 Differential calculus 609
A5 Matrices 611
A6 The eigenvalues of a matrix 614
Appendix 2 Tables of statistical distributions 616
Appendix 3 Sources of data used in this book 628
References 629
Index 641
Figures
1.1 Steps involved in forming an
econometric model page 9
2.1 Scatter plot of two variables, y and x 29
2.2 Scatter plot of two variables with a line
of best fit chosen by eye 31
2.3 Method of OLS fitting a line to the data by minimising the sum of squared residuals 32
2.4 Plot of a single observation, together with the line of best fit, the residual and the fitted value 32
2.5 Scatter plot of excess returns on fund XXX versus excess returns on the market portfolio 35
2.6 No observations close to the y-axis 36
2.7 Effect on the standard errors of the coefficient estimates when (x_t − x̄) are narrowly dispersed 48
2.8 Effect on the standard errors of the coefficient estimates when (x_t − x̄) are widely dispersed 49
2.9 Effect on the standard errors of x_t² large 49
2.10 Effect on the standard errors of x_t² small 50
2.11 The normal distribution 54
2.12 The t-distribution versus the normal 55
2.13 Rejection regions for a two-sided 5%
hypothesis test 57
2.14 Rejection regions for a one-sided hypothesis test of the form H0: β = β*, H1: β < β* 57
2.15 Rejection regions for a one-sided hypothesis test of the form H0: β = β*, H1: β > β* 57
2.16 Critical values and rejection regions for a t20;5% 61
2.17 Frequency distribution of t-ratios of mutual fund alphas (gross of transactions costs) Source: Jensen (1968). Reprinted with the permission of Blackwell Publishers 68
2.18 Frequency distribution of t-ratios of mutual fund alphas (net of transactions costs) Source: Jensen (1968). Reprinted with the permission of Blackwell Publishers 68
2.19 Performance of UK unit trusts,
1979–2000 70
3.1 R² = 0 demonstrated by a flat estimated line, i.e. a zero slope coefficient 109
3.2 R² = 1 when all data points lie exactly on the estimated line 109
4.1 Effect of no intercept on a regression
line 131
4.2 Graphical illustration of
heteroscedasticity 132
4.3 Plot of û_t against û_{t−1}, showing positive autocorrelation 141
4.4 Plot of û_t over time, showing positive autocorrelation 142
4.5 Plot of û_t against û_{t−1}, showing negative autocorrelation 142
4.6 Plot of û_t over time, showing negative autocorrelation 143
4.7 Plot of û_t against û_{t−1}, showing no autocorrelation 143
4.8 Plot of û_t over time, showing no autocorrelation 144
4.9 Rejection and non-rejection regions for
DW test 147
4.10 A normal versus a skewed distribution 162
4.11 A leptokurtic versus a normal
distribution 162
4.12 Regression residuals from stock return data, showing large outlier for October 1987 165
4.13 Possible effect of an outlier on OLS
estimation 166
4.14 Plot of a variable showing suggestion
for break date 185
5.1 Autocorrelation function for sample
MA(2) process 215
5.2 Sample autocorrelation and partial autocorrelation functions for an MA(1) model: y_t = −0.5u_{t−1} + u_t 226
5.3 Sample autocorrelation and partial autocorrelation functions for an MA(2) model: y_t = 0.5u_{t−1} − 0.25u_{t−2} + u_t 226
5.4 Sample autocorrelation and partial autocorrelation functions for a slowly decaying AR(1) model: y_t = 0.9y_{t−1} + u_t 227
5.5 Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) model: y_t = 0.5y_{t−1} + u_t 227
5.6 Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) model with negative coefficient: y_t = −0.5y_{t−1} + u_t 228
5.7 Sample autocorrelation and partial autocorrelation functions for a non-stationary model (i.e. a unit coefficient): y_t = y_{t−1} + u_t 228
5.8 Sample autocorrelation and partial autocorrelation functions for an ARMA(1, 1) model: y_t = 0.5y_{t−1} + 0.5u_{t−1} + u_t 229
5.9 Use of an in-sample and an
out-of-sample period for analysis 245
6.1 Impulse responses and standard error bands for innovations in unexpected inflation equation errors 307
6.2 Impulse responses and standard error bands for innovations in the dividend yields 307
7.1 Value of R² for 1,000 sets of regressions of a non-stationary variable on another independent non-stationary variable 319
7.2 Value of t-ratio of slope coefficient for 1,000 sets of regressions of a non-stationary variable on another independent non-stationary variable 320
7.3 Example of a white noise process 324
7.4 Time series plot of a random walk versus a random walk with drift 324
7.5 Time series plot of a deterministic
trend process 325
7.6 Autoregressive processes with differing values of φ (0, 0.8, 1) 325
8.1 Daily S&P returns for January
1990–December 1999 387
8.2 The problem of local optima in
maximum likelihood estimation 397
8.3 News impact curves for S&P500 returns using coefficients implied from GARCH and GJR model estimates 410
8.4 Three approaches to hypothesis testing
under maximum likelihood 418
8.5 Source: Brooks, Henry and Persand (2002). Time-varying hedge ratios derived from symmetric and asymmetric BEKK models for FTSE returns 440
9.1 Sample time series plot illustrating a
regime shift 452
9.2 Use of intercept dummy variables for
quarterly data 456
9.3 Use of slope dummy variables 459
9.4 Piecewise linear model with threshold x* 463
9.5 Source: Brooks and Persand (2001b). Unconditional distribution of US GEYR together with a normal distribution with the same mean and variance 470
9.6 Source: Brooks and Persand (2001b). Value of GEYR and probability that it is in the High GEYR regime for the UK 471
11.1 The fatal flaw of the linear probability
model 513
11.2 The logit model 515
11.3 Modelling charitable donations as a function of income 534
11.4 Fitted values from the failure probit
regression 542
Tables
1.1 Econometric software packages for
modelling financial data page 12
2.1 Sample data on fund XXX to motivate
OLS estimation 34
2.2 Critical values from the standard
normal versus t-distribution 55
2.3 Classifying hypothesis testing errors
and correct conclusions 64
2.4 Summary statistics for the estimated
regression results for (2.52) 67
2.5 Summary statistics for unit trust
returns, January 1979–May 2000 69
2.6 CAPM regression results for unit trust
returns, January 1979–May 2000 70
2.7 Is there an overreaction effect in the
UK stock market? 73
2.8 Part of the EViews regression output
revisited 75
3.1 Hedonic model of rental values in Quebec City, 1990. Dependent variable: Canadian dollars per month 114
3A.1 Principal component ordered eigenvalues for Dutch interest rates, 1962–1970 123
3A.2 Factor loadings of the first and second principal components for Dutch interest rates, 1962–1970 123
4.1 Constructing a series of lagged values
and first differences 140
4.2 Determinants and impacts of sovereign
credit ratings 197
4.3 Do ratings add to public information? 199
4.4 What determines reactions to ratings announcements? 201
5.1 Uncovered interest parity test results 241
5.2 Forecast error aggregation 252
6.1 Call bid–ask spread and trading volume regression 283
6.2 Put bid–ask spread and trading volume
regression 283
6.3 Granger causality tests and implied
restrictions on VAR models 297
6.4 Marginal significance levels associated
with joint F-tests 305
6.5 Variance decompositions for the
property sector index residuals 306
7.1 Critical values for DF tests (Fuller, 1976,
p. 373) 328
7.2 DF tests on log-prices and returns for
high frequency FTSE data 344
7.3 Estimated potentially cointegrating equation and test for cointegration for high frequency FTSE data 345
7.4 Estimated error correction model for
high frequency FTSE data 346
7.5 Comparison of out-of-sample
forecasting accuracy 346
7.6 Trading profitability of the error
correction model with cost of carry 348
7.7 Cointegration tests of PPP with
European data 356
7.8 DF tests for international bond indices 357
7.9 Cointegration tests for pairs of international bond indices 358
7.10 Johansen tests for cointegration
between international bond yields 359
7.11 Variance decompositions for VAR of
international bond yields 360
7.12 Impulse responses for VAR of
international bond yields 361
7.13 Tests of the expectations hypothesis using the US zero coupon yield curve with monthly data 364
8.1 GARCH versus implied volatility 423
8.2 EGARCH versus implied volatility 423
8.3 Out-of-sample predictive power for weekly volatility forecasts 426
8.4 Comparisons of the relative information content of out-of-sample volatility forecasts 426
8.5 Hedging effectiveness: summary
statistics for portfolio returns 439
9.1 Values and significances of days of the
week coefficients 458
9.2 Day-of-the-week effects with the inclusion of interactive dummy variables with the risk proxy 461
9.3 Estimates of the Markov switching
model for real exchange rates 468
9.4 Estimated parameters for the Markov
switching models 470
9.5 SETAR model for FRF–DEM 478
9.6 FRF–DEM forecast accuracies 479
9.7 Linear AR(3) model for the basis 482
9.8 A two-threshold SETAR model for the basis 483
10.1 Tests of banking market equilibrium with fixed effects panel models 496
10.2 Tests of competition in banking with fixed effects panel models 497
10.3 Results of random effects panel regression for credit stability of Central and East European banks 503
11.1 Logit estimation of the probability of
external financing 517
11.2 Multinomial logit estimation of the
type of external financing 527
11.3 Ordered probit model results for the
determinants of credit ratings 531
11.4 Two-step ordered probit model allowing for selectivity bias in the determinants of credit ratings 532
11.5 Marginal effects for logit and probit models for probability of MSc failure 543
12.1 EGARCH estimates for currency futures
returns 574
12.2 Autoregressive volatility estimates for
currency futures returns 575
12.3 Minimum capital risk requirements for currency futures as a percentage of the initial value of the position 578
13.1 Journals in finance and
econometrics 589
13.2 Useful internet sites for financial
literature 592
13.3 Suggested structure for a typical
dissertation or project 594
Boxes
1.1 The value of econometrics page 2
1.2 Time series data 4
1.3 Log returns 8
1.4 Points to consider when reading a published paper 11
1.5 Features of EViews 21
2.1 Names for y and xs in regression models 28
2.2 Reasons for the inclusion of the
disturbance term 30
2.3 Assumptions concerning disturbance
terms and their interpretation 44
2.4 Standard error estimators 48
2.5 Conducting a test of significance 56
2.6 Carrying out a hypothesis test using confidence intervals 60
2.7 The test of significance and confidence interval approaches compared 61
2.8 Type I and type II errors 64
2.9 Reasons for stock market overreactions 71
2.10 Ranking stocks and forming portfolios 72
2.11 Portfolio monitoring 72
3.1 The relationship between the regression F-statistic and R² 111
3.2 Selecting between models 117
4.1 Conducting White’s test 134
4.2 ‘Solutions’ for heteroscedasticity 138
4.3 Conditions for DW to be a valid test 148
4.4 Conducting a Breusch–Godfrey test 149
4.5 The Cochrane–Orcutt procedure 151
4.6 Observations for the dummy variable 165
4.7 Conducting a Chow test 180
5.1 The stationarity condition for an AR(p) model 216
5.2 The invertibility condition for an MA(2)
model 224
5.3 Naive forecasting methods 247
6.1 Determining whether an equation is identified 270
6.2 Conducting a Hausman test for
exogeneity 274
6.3 Forecasting with VARs 299
7.1 Stationarity tests 331
7.2 Multiple cointegrating relationships 340
8.1 Testing for ‘ARCH effects’ 390
8.2 Estimating an ARCH or GARCH model 395
8.3 Using maximum likelihood estimation in practice 398
9.1 How do dummy variables work? 456
10.1 Fixed or random effects? 500
11.1 Parameter interpretation for probit and logit models 519
11.2 The differences between censored and
truncated dependent variables 535
12.1 Conducting a Monte Carlo simulation 548
12.2 Re-sampling the data 555
12.3 Re-sampling from the residuals 556
12.4 Setting up a Monte Carlo simulation 560
12.5 Simulating the price of an Asian option 565
12.6 Generating draws from a GARCH process 566
Screenshots
1.1 Creating a workfile page 15
1.2 Importing Excel data into the workfile 16
1.3 The workfile containing loaded data 17
1.4 Summary statistics for a series 19
1.5 A line graph 20
2.1 Summary statistics for spot and futures 41
2.2 Equation estimation window 42
2.3 Estimation results 43
2.4 Plot of two series 79
3.1 Stepwise procedure equation estimation window 103
3.2 Conducting PCA in EViews 126
4.1 Regression options window 139
4.2 Non-normality test results 164
4.3 Regression residuals, actual values and fitted series 168
4.4 Chow test for parameter stability 188
4.5 Plotting recursive coefficient estimates 190
4.6 CUSUM test graph 191
5.1 Estimating the correlogram 235
5.2 Plot and summary statistics for the dynamic forecasts for the percentage changes in house prices using an AR(2) 257
5.3 Plot and summary statistics for the static forecasts for the percentage changes in house prices using an AR(2) 258
5.4 Estimating exponential smoothing models 259
6.1 Estimating the inflation equation 288
6.2 Estimating the rsandp equation 289
6.3 VAR inputs screen 310
6.4 Constructing the VAR impulse responses 313
6.5 Combined impulse response graphs 314
6.6 Variance decomposition graphs 315
7.1 Options menu for unit root tests 332
7.2 Actual, Fitted and Residual plot to check for stationarity 366
7.3 Johansen cointegration test 368
7.4 VAR specification for Johansen tests 374
8.1 Estimating a GARCH-type model 400
8.2 GARCH model estimation options 401
8.3 Forecasting from GARCH models 415
8.4 Dynamic forecasts of the conditional variance 415
8.5 Static forecasts of the conditional variance 416
8.6 Making a system 441
10.1 Workfile structure window 505
11.1 ‘Equation Estimation’ window for limited dependent variables 539
11.2 ‘Equation Estimation’ options for
limited dependent variables 541
12.1 Running an EViews program 561
Preface to the second edition
Sales of the first edition of this book surpassed expectations (at least
those of the author). Almost all of those who have contacted the author seem to like the book, and while other textbooks have been published since that date in the broad area of financial econometrics, none is really at the introductory level. All of the motivations for the first edition, described below, seem just as important today. Given that the book seems to have gone down well with readers, I have left the style largely unaltered and made small changes to the structure, described below.
The main motivations for writing the first edition of the book were:
●To write a book that focused on using and applying the techniques rather than deriving proofs and learning formulae
●To write an accessible textbook that required no prior knowledge of econometrics, but which also covered more recently developed approaches usually found only in more advanced texts
●To use examples and terminology from finance rather than economics since there are many introductory texts in econometrics aimed at students of economics but none for students of finance
●To litter the book with case studies of the use of econometrics in practice taken from the academic finance literature
●To include sample instructions, screen dumps and computer output from two popular econometrics packages. This enabled readers to see how the techniques can be implemented in practice
●To develop a companion web site containing answers to end-of-chapter questions, PowerPoint slides and other supporting materials.
Why I thought a second edition was needed
The second edition includes a number of important new features.
(1) It could have reasonably been argued that the first edition of the book had a slight bias towards time-series methods, probably in part as a consequence of the main areas of interest of the author. This second edition redresses the balance by including two new chapters, on limited dependent variables and on panel techniques. Chapters 3 and 4 from the first edition, which provided the core material on linear regression, have now been expanded and reorganised into three chapters (2 to 4) in the second edition.
(2) As a result of the length of time it took to write the book, to produce the final product, and the time that has elapsed since then, the data and examples used in the book are already several years old. More importantly, the data used in the examples for the first edition were almost all obtained from Datastream International, an organisation which expressly denied the author permission to distribute the data or to put them on a web site. By contrast, this edition as far as possible uses fully updated datasets from freely available sources, so that readers should be able to directly replicate the examples used in the text.
(3) A number of new case studies from the academic finance literature are employed, notably on the pecking order hypothesis of firm financing, credit ratings, banking competition, tests of purchasing power parity, and evaluation of mutual fund manager performance.
(4) The previous edition incorporated sample instructions from EViews and WinRATS. As a result of the additional content of the new chapters, and in order to try to keep the length of the book manageable, it was decided to include only sample instructions and outputs from the EViews package in the revised version. WinRATS will continue to be supported, but in a separate handbook published by Cambridge University Press (ISBN: 9780521896955).
Motivations for the first edition
This book had its genesis in two sets of lectures given annually by the author at the ICMA Centre (formerly ISMA Centre), University of Reading and arose partly from several years of frustration at the lack of an appropriate textbook. In the past, finance was but a small sub-discipline drawn from economics and accounting, and therefore it was generally safe to assume that students of finance were well grounded in economic principles; econometrics would be taught using economic motivations and examples.
However, finance as a subject has taken on a life of its own in recent years. Drawn in by perceptions of exciting careers and telephone-number salaries in the financial markets, the number of students of finance has grown phenomenally, all around the world. At the same time, the diversity of educational backgrounds of students taking finance courses has also expanded. It is not uncommon to find undergraduate students of finance even without advanced high-school qualifications in mathematics or economics. Conversely, many with PhDs in physics or engineering are also attracted to study finance at the Masters level. Unfortunately, authors of textbooks have failed to keep pace, thus far, with the change in the nature of students. In my opinion, the currently available textbooks fall short of the requirements of this market in three main regards, which this book seeks to address:
(1) Books fall into two distinct and non-overlapping categories: the introductory and the advanced. Introductory textbooks are at the appropriate level for students with limited backgrounds in mathematics or statistics, but their focus is too narrow. They often spend too long deriving the most basic results, and treatment of important, interesting and relevant topics (such as simulation methods, VAR modelling, etc.) is covered in only the last few pages, if at all. The more advanced textbooks, meanwhile, usually require a quantum leap in the level of mathematical ability assumed of readers, so that such books cannot be used on courses lasting only one or two semesters, or where students have differing backgrounds. In this book, I have tried to sweep a broad brush over a large number of different econometric techniques that are relevant to the analysis of financial and other data.
(2) Many of the currently available textbooks with broad coverage are too theoretical in nature and students can often, after reading such a book, still have no idea of how to tackle real-world problems themselves, even if they have mastered the techniques in theory. To this end, in this book, I have tried to present examples of the use of the techniques in finance, together with annotated computer instructions and sample outputs for an econometrics package (EViews). This should assist students who wish to learn how to estimate models for themselves – for example, if they are required to complete a project or dissertation. Some examples have been developed especially for this book, while many others are drawn from the academic finance literature. In my opinion, this is an essential but rare feature of a textbook that should help to show students how econometrics is really applied. It is also hoped that this approach will encourage some students to delve deeper into the literature, and will give useful pointers and stimulate ideas for research projects. It should, however, be stated at the outset that the purpose of including examples from the academic finance literature is not to provide a comprehensive overview of the literature or to discuss all of the relevant work in those areas, but rather to illustrate the techniques. Therefore, the literature reviews may be considered deliberately deficient, with interested readers directed to the suggested readings and the references therein.
(3) With few exceptions, almost all textbooks that are aimed at the introductory level draw their motivations and examples from economics, which may be of limited interest to students of finance or business. To see this, try motivating regression relationships using an example such as the effect of changes in income on consumption and watch your audience, who are primarily interested in business and finance applications, slip away and lose interest in the first ten minutes of your course.
Who should read this book?
The intended audience is undergraduates or Masters/MBA students who require a broad knowledge of modern econometric techniques commonly employed in the finance literature. It is hoped that the book will also be useful for researchers (both academics and practitioners), who require an introduction to the statistical tools commonly employed in the area of finance. The book can be used for courses covering financial time-series analysis or financial econometrics in undergraduate or postgraduate programmes in finance, financial economics, securities and investments.
Although the applications and motivations for model-building given in the book are drawn from finance, the empirical testing of theories in many other disciplines, such as management studies, business studies, real estate, economics and so on, may usefully employ econometric analysis. For this group, the book may also prove useful.
Finally, while the present text is designed mainly for students at the undergraduate or Masters level, it could also provide introductory reading in financial time-series modelling for finance doctoral programmes where students have backgrounds which do not include courses in modern econometric techniques.
Pre-requisites for good understanding of this material
In order to make the book as accessible as possible, the only background recommended in terms of quantitative techniques is that readers have introductory knowledge of calculus, algebra (including matrices) and basic statistics. However, even these are not necessarily prerequisites since they are covered briefly in an appendix to the text. The emphasis throughout the book is on a valid application of the techniques to real data and problems in finance.
In the finance and investment area, it is assumed that the reader has knowledge of the fundamentals of corporate finance, financial markets and investment. Therefore, subjects such as portfolio theory, the Capital Asset Pricing Model (CAPM) and Arbitrage Pricing Theory (APT), the efficient markets hypothesis, the pricing of derivative securities and the term structure of interest rates, which are frequently referred to throughout the book, are not treated in this text. There are very many good books available in corporate finance, in investments, and in futures and options, including those by Brealey and Myers (2005), Bodie, Kane and Marcus (2008) and Hull (2005) respectively.
Chris Brooks, October 2007
Acknowledgements
I am grateful to Gita Persand, Olan Henry, James Chong and Apostolos Katsaris, who assisted with various parts of the software applications for the first edition. I am also grateful to Hilary Feltham for assistance with the mathematical review appendix and to Simone Varotto for useful discussions and advice concerning the EViews example used in chapter 11.
I would also like to thank Simon Burke, James Chong and Con Keating for detailed and constructive comments on various drafts of the first edition and Simon Burke for comments on parts of the second edition. The first and second editions additionally benefited from the comments, suggestions and questions of Peter Burridge, Kyongwook Choi, Thomas Eilertsen, Waleid Eldien, Andrea Gheno, Kimon Gomozias, Abid Hameed, Arty Khemlani, David McCaffrey, Tehri Jokipii, Emese Lazar, Zhao Liuyan, Dimitri Lvov, Bill McCabe, Junshi Ma, David Merchan, Victor Murinde, Thai Pham, Jean-Sebastien Pourchet, Guilherme Silva, Silvia Stanescu, Li Qui, Panagiotis Varlagas, and Meng-Feng Yen.
A number of people sent useful e-mails pointing out typos or inaccuracies in the first edition. To this end, I am grateful to Merlyn Foo, Jan de Gooijer and his colleagues, Mikael Petitjean, Fred Sterbenz, and Birgit Strikholm.
Useful comments and software support from QMS and Estima are gratefully acknowledged. Any remaining errors are mine alone.
The publisher and author have used their best endeavours to ensure that the URLs for external web sites referred to in this book are correct and active at the time of going to press. However, the publisher and author have no responsibility for the web sites and can make no guarantee that a site will remain live or that the content is or will remain appropriate.
1
Introduction
This chapter sets the scene for the book by discussing in broad terms
the questions of what is econometrics, and what are the ‘stylised facts’ describing financial data that researchers in this area typically try to capture in their models. It also collects together a number of preliminary issues relating to the construction of econometric models in finance.
Learning Outcomes
In this chapter, you will learn how to
●Distinguish between different types of data
●Describe the steps involved in building an econometric model
●Calculate asset price returns
●Construct a workfile, import data and accomplish simple tasks
in EViews
1.1 What is econometrics?
The literal meaning of the word econometrics is ‘measurement in economics’. The first four letters of the word suggest correctly that the origins of econometrics are rooted in economics. However, the main techniques employed for studying economic problems are of equal importance in financial applications. As the term is used in this book, financial econometrics will be defined as the application of statistical techniques to problems in finance. Financial econometrics can be useful for testing theories in finance, determining asset prices or returns, testing hypotheses concerning the relationships between variables, examining the effect on financial markets of changes in economic conditions, forecasting future values of financial variables and for financial decision-making. A list of possible examples of where econometrics may be useful is given in box 1.1.
Box 1.1 The value of econometrics
(1) Testing whether financial markets are weak-form informationally efficient
(2) Testing whether the Capital Asset Pricing Model (CAPM) or Arbitrage Pricing Theory
(APT) represent superior models for the determination of returns on risky assets
(3) Measuring and forecasting the volatility of bond returns
(4) Explaining the determinants of bond credit ratings used by the ratings agencies
(5) Modelling long-term relationships between prices and exchange rates
(6) Determining the optimal hedge ratio for a spot position in oil
(7) Testing technical trading rules to determine which makes the most money
(8) Testing the hypothesis that earnings or dividend announcements have no effect on stock prices
(9) Testing whether spot or futures markets react more rapidly to news
(10) Forecasting the correlation between the stock indices of two countries.
The list in box 1.1 is of course by no means exhaustive, but it hopefully
gives some flavour of the usefulness of econometric tools in terms of their financial applicability.
1.2 Is financial econometrics different from ‘economic
econometrics’?
As previously stated, the tools commonly used in financial applications are
fundamentally the same as those used in economic applications, although the emphasis and the sets of problems that are likely to be encountered when analysing the two sets of data are somewhat different. Financial data often differ from macroeconomic data in terms of their frequency, accuracy, seasonality and other properties.
In economics, a serious problem is often a lack of data at hand for testing the theory or hypothesis of interest – this is often called a ‘small samples problem’. It might be, for example, that data are required on government budget deficits, or population figures, which are measured only on an annual basis. If the methods used to measure these quantities changed a quarter of a century ago, then only at most twenty-five of these annual observations are usefully available.
Two other problems that are often encountered in conducting applied econometric work in the arena of economics are those of measurement error and data revisions. These difficulties are simply that the data may be estimated, or measured with error, and will often be subject to several vintages of subsequent revisions. For example, a researcher may estimate an economic model of the effect on national output of investment in computer technology using a set of published data, only to find that the
data for the last two years have been revised substantially in the next,
updated publication.
These issues are rarely of concern in finance. Financial data come in
many shapes and forms, but in general the prices and other entities that are recorded are those at which trades actually took place, or which were quoted on the screens of information providers. There exists, of course, the possibility for typos and the possibility for the data measurement method to change (for example, owing to stock index re-balancing or re-basing). But in general the measurement error and revisions problems are far less serious in the financial context.
Similarly, some sets of financial data are observed at much higher frequencies than macroeconomic data. Asset prices or yields are often available at daily, hourly, or minute-by-minute frequencies. Thus the number of observations available for analysis can potentially be very large – perhaps thousands or even millions, making financial data the envy of macro-econometricians! The implication is that more powerful techniques can often be applied to financial than economic data, and that researchers may also have more confidence in the results.
Furthermore, the analysis of financial data also brings with it a number of new problems. While the difficulties associated with handling and processing such a large amount of data are not usually an issue given recent and continuing advances in computer power, financial data often have a number of additional characteristics. For example, financial data are often considered very ‘noisy’, which means that it is more difficult to separate underlying trends or patterns from random and uninteresting features. Financial data are also almost always not normally distributed in spite of the fact that most techniques in econometrics assume that they are. High frequency data often contain additional ‘patterns’ which are the result of the way that the market works, or the way that prices are recorded. These features need to be considered in the model-building process, even if they are not directly of interest to the researcher.
1.3 Types of data
There are broadly three types of data that can be employed in quantitative
analysis of financial problems: time series data, cross-sectional data, and panel data.
1.3.1 Time series data
Time series data, as the name suggests, are data that have been collected over a period of time on one or more variables. Time series data have
associated with them a particular frequency of observation or collection of data points. The frequency is simply a measure of the interval over, or the regularity with which, the data are collected or recorded. Box 1.2 shows some examples of time series data.

Box 1.2 Time series data

Series                        Frequency
Industrial production         Monthly, or quarterly
Government budget deficit     Annually
Money supply                  Weekly
The value of a stock          As transactions occur
A word on ‘As transactions occur’ is necessary. Much financial data does not start its life as being regularly spaced. For example, the price of common stock for a given company might be recorded to have changed whenever there is a new trade or quotation placed by the financial information recorder. Such recordings are very unlikely to be evenly distributed over time – for example, there may be no activity between, say, 5 p.m. when the market closes and 8.30 a.m. the next day when it reopens; there is also typically less activity around the opening and closing of the market, and around lunch time. Although there are a number of ways to deal with this issue, a common and simple approach is simply to select an appropriate frequency, and use as the observation for that time period the last prevailing price during the interval.
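As a rough illustration of this ‘last prevailing price’ approach (a minimal sketch, not from the book, which uses EViews; the timestamps and prices below are invented for illustration), the following Python snippet resamples irregularly spaced trade prices onto a fixed five-minute grid using pandas:

```python
import pandas as pd

# Hypothetical, irregularly spaced trade records (timestamps and prices are made up)
trades = pd.DataFrame(
    {"price": [100.00, 100.05, 99.98, 100.10, 100.07]},
    index=pd.to_datetime([
        "2024-01-02 09:00:03", "2024-01-02 09:00:41",
        "2024-01-02 09:07:15", "2024-01-02 09:11:59",
        "2024-01-02 09:12:02",
    ]),
)

# Take the last prevailing price in each 5-minute interval, carrying it
# forward over intervals in which no trade occurred
five_min_prices = trades["price"].resample("5min").last().ffill()
print(five_min_prices)
```

The same idea applies at any chosen frequency (hourly, daily, and so on); the point is simply that the observation assigned to each interval is the last price recorded within it.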
It is also generally a requirement that all data used in a model be
of the same frequency of observation. So, for example, regressions that seek to estimate an arbitrage pricing model using monthly observations on macroeconomic factors must also use monthly observations on stock returns, even if daily or weekly observations on the latter are available.
The data may be quantitative (e.g. exchange rates, prices, number of
shares outstanding), or qualitative (e.g. the day of the week, a survey of the
financial products purchased by private individuals over a period of time, a credit rating, etc.).
Problems that could be tackled using time series data:
●How the value of a country’s stock index has varied with that country’s macroeconomic fundamentals
●How the value of a company’s stock price has varied when it announced the value of its dividend payment
●The effect on a country’s exchange rate of an increase in its trade deficit.
In all of the above cases, it is clearly the time dimension which is the
most important, and the analysis will be conducted using the values of the variables over time.
1.3.2 Cross-sectional data
Cross-sectional data are data on one or more variables collected at a single point in time. For example, the data might be on:
●A poll of usage of Internet stockbroking services
●A cross-section of stock returns on the New York Stock Exchange (NYSE)
●A sample of bond credit ratings for UK banks.
Problems that could be tackled using cross-sectional data:
●The relationship between company size and the return to investing in its shares
●The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt.
1.3.3 Panel data
Panel data have the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years. The estimation of panel regressions is an interesting and developing area, and will be examined in detail in chapter 10.
Fortunately, virtually all of the standard techniques and analysis in
econometrics are equally valid for time series and cross-sectional data. For time series data, it is usual to denote the individual observation numbers using the index t, and the total number of observations available for analysis by T. For cross-sectional data, the individual observation numbers are indicated using the index i, and the total number of observations available for analysis by N. Note that there is, in contrast to the time series case, no natural ordering of the observations in a cross-sectional sample. For example, the observations i might be on the price of bonds of different firms at a particular point in time, ordered alphabetically by company name. So, in the case of cross-sectional data, there is unlikely to be any useful information contained in the fact that Northern Rock follows National Westminster in a sample of UK bank credit ratings, since it is purely by chance that their names both begin with the letter ‘N’. On the other hand, in a time series context, the ordering of the data is relevant since the data are usually ordered chronologically.
In this book, the total number of observations in the sample will be
given by T even in the context of regression equations that could apply
either to cross-sectional or to time series data.
1.3.4 Continuous and discrete data
As well as classifying data as being of the time series or cross-sectional type, we could also distinguish it as being either continuous or discrete, exactly as their labels would suggest. Continuous data can take on any value and are not confined to take specific numbers; their values are limited only by precision. For example, the rental yield on a property could be 6.2%, 6.24% or 6.238%, and so on. On the other hand, discrete data can only take on certain values, which are usually integers¹ (whole numbers), and are often defined to be count numbers. For instance, the number of people in a particular underground carriage or the number of shares traded during a day. In these cases, having 86.3 passengers in the carriage or 5857½ shares traded would not make sense.

¹ Discretely measured data do not necessarily have to be integers. For example, until recently when they became ‘decimalised’, many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar.
1.3.5 Cardinal, ordinal and nominal numbers
Another way in which we could classify numbers is according to whether they are cardinal, ordinal, or nominal. Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values. On the other hand, ordinal numbers can only be interpreted as providing a position or an ordering. Thus, for cardinal numbers, a figure of 12 implies a measure that is ‘twice as good’ as a figure of 6. Examples of cardinal numbers would be the price of a share or of a building, and the number of houses in a street. On the other hand, for an ordinal scale, a figure of 12 may be viewed as ‘better’ than a figure of 6, but could not be considered twice as good. Examples of ordinal numbers would be the position of a runner in a race (e.g. second place is better than fourth place, but it would make little sense to say it is ‘twice as good’) or the level reached in a computer game.
The final type of data that could be encountered would be where there is no natural ordering of the values at all, so a figure of 12 is simply different to that of a figure of 6, but could not be considered to be better or worse in any sense. Such data often arise when numerical values are arbitrarily assigned, such as telephone numbers or when codings are assigned to
qualitative data (e.g. when describing the exchange that a US stock is
traded on, ‘1’ might be used to denote the NYSE, ‘2’ to denote the NASDAQ and ‘3’ to denote the AMEX). Sometimes, such variables are called nominal variables. Cardinal, ordinal and nominal variables may require different modelling approaches or at least different treatments, as should become evident in the subsequent chapters.
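As a small sketch of how such nominal codings look in practice (not from the book; the book’s software is EViews, and here the codes are assigned alphabetically by pandas, which underlines that any particular assignment is arbitrary), the snippet below codes the exchange on which a stock trades:

```python
import pandas as pd

# Hypothetical nominal variable: the exchange a US stock trades on
exchange = pd.Series(["NYSE", "NASDAQ", "AMEX", "NYSE"], dtype="category")
codes = exchange.cat.codes + 1   # here 1 = AMEX, 2 = NASDAQ, 3 = NYSE (alphabetical, arbitrary)

# The codes identify categories only: their ordering and arithmetic carry no meaning,
# so models should treat them as labels (e.g. via dummy variables), not as cardinal values
print(dict(enumerate(exchange.cat.categories, start=1)))
print(codes.tolist())
```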
1.4 Returns in financial modelling
In many of the problems of interest in finance, the starting point is a time series of prices – for example, the prices of shares in Ford, taken at 4 p.m. each day for 200 days. For a number of statistical reasons, it is preferable not to work directly with the price series, so that raw price series are usually converted into series of returns. Additionally, returns have the added benefit that they are unit-free. So, for example, if an annualised return were 10%, then investors know that they would have got back £110 for a £100 investment, or £1,100 for a £1,000 investment, and so on.
There are two methods used to calculate returns from a series of prices,
and these involve the formation of simple returns, and continuously compounded returns, which are achieved as follows:

Simple returns:
$$R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\% \quad (1.1)$$

Continuously compounded returns:
$$r_t = 100\% \times \ln\left(\frac{p_t}{p_{t-1}}\right) \quad (1.2)$$

where: R_t denotes the simple return at time t, r_t denotes the continuously compounded return at time t, p_t denotes the asset price at time t, and ln denotes the natural logarithm.
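As a quick numerical illustration of (1.1) and (1.2) (a minimal sketch, not taken from the book; the price series is invented), the following Python snippet computes both types of return from a short price series:

```python
import numpy as np

# A hypothetical short series of daily closing prices
prices = np.array([100.0, 101.5, 100.8, 102.3])

# Simple returns, equation (1.1): R_t = (p_t - p_{t-1}) / p_{t-1} * 100%
simple_returns = (prices[1:] - prices[:-1]) / prices[:-1] * 100

# Continuously compounded (log) returns, equation (1.2): r_t = 100% * ln(p_t / p_{t-1})
log_returns = 100 * np.log(prices[1:] / prices[:-1])

print(simple_returns)  # approximately [ 1.500  -0.690  1.488]
print(log_returns)     # close to, but not identical to, the simple returns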
If the asset under consideration is a stock or portfolio of stocks, the
total return to holding it is the sum of the capital gain and any dividends paid during the holding period. However, researchers often ignore any dividend payments. This is unfortunate, and will lead to an underestimation of the total returns that accrue to investors. This is likely to be negligible for very short holding periods, but will have a severe impact on cumulative returns over investment horizons of several years. Ignoring dividends will also have a distortionary effect on the cross-section of stock returns. For example, ignoring dividends will imply that ‘growth’ stocks, with large capital gains, will be inappropriately favoured over income stocks (e.g. utilities and mature industries) that pay high dividends.
Box 1.3 Log returns
(1) Log-returns have the nice property that they can be interpreted as continuously compounded returns – so that the frequency of compounding of the return does not matter and thus returns across assets can more easily be compared.
(2) Continuously compounded returns are time-additive. For example, suppose that a weekly returns series is required and daily log returns have been calculated for five days, numbered 1 to 5, representing the returns on Monday through Friday. It is valid to simply add up the five daily returns to obtain the return for the whole week:

Monday return        r_1 = ln(p_1/p_0) = ln p_1 − ln p_0
Tuesday return       r_2 = ln(p_2/p_1) = ln p_2 − ln p_1
Wednesday return     r_3 = ln(p_3/p_2) = ln p_3 − ln p_2
Thursday return      r_4 = ln(p_4/p_3) = ln p_4 − ln p_3
Friday return        r_5 = ln(p_5/p_4) = ln p_5 − ln p_4
                     ————————————————————————
Return over the week     ln p_5 − ln p_0 = ln(p_5/p_0)
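A two-line numerical check of this additivity property (a sketch with made-up prices, not from the book):

```python
import numpy as np

# Hypothetical daily closing prices from one Friday's close (p0) to the next Friday (p5)
p = np.array([100.0, 100.4, 99.9, 100.7, 101.2, 101.0])

daily_log_returns = 100 * np.log(p[1:] / p[:-1])     # r_1, ..., r_5
weekly_from_daily = daily_log_returns.sum()          # sum of the five daily log returns
weekly_direct = 100 * np.log(p[-1] / p[0])           # 100% * ln(p5/p0)

print(np.isclose(weekly_from_daily, weekly_direct))  # True: log returns are time-additive
```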
Alternatively, it is possible to adjust a stock price time series so that
the dividends are added back to generate a total return index. If p_t were a total return index, returns generated using either of the two formulae presented above thus provide a measure of the total return that would accrue to a holder of the asset during time t.
The academic finance literature generally employs the log-return formulation (also known as log-price relatives since they are the log of the ratio of this period’s price to the previous period’s price). Box 1.3 shows two key reasons for this.
There is, however, also a disadvantage of using the log-returns. The
simple return on a portfolio of assets is a weighted average of the simplereturns on the individual assets:
R
pt=N/summationdisplay
i=1wiRit (1.3)
But this does not work for the continuously compounded returns, so that
they are not additive across a portfolio. The fundamental reason why thisis the case is that the log of a sum is not the same as the sum of a log,since the operation of taking a log constitutes a non-linear transformation .
Calculating portfolio returns in this context must be conducted by firstestimating the value of the portfolio at each time period and then deter-mining the returns from the aggregate portfolio values. Or alternatively,if we assume that the asset is purchased at time t−Kfor price P
t−K
and then sold Kperiods later at price Pt, then if we calculate simple
returns for each period, Rt,Rt+1,…, RK, the aggregate return over all K
[Figure 1.1: Steps involved in forming an econometric model. Flowchart: 1a. Economic or financial theory (previous studies); 1b. Formulation of an estimable theoretical model; 2. Collection of data; 3. Model estimation; 4. Is the model statistically adequate? (No: reformulate the model; Yes: continue); 5. Interpret model; 6. Use for analysis.]
R_{Kt} = (P_t − P_{t−K}) / P_{t−K} = P_t / P_{t−K} − 1
       = [(P_t / P_{t−1}) × (P_{t−1} / P_{t−2}) × ... × (P_{t−K+1} / P_{t−K})] − 1
       = [(1 + R_t)(1 + R_{t−1}) ... (1 + R_{t−K+1})] − 1    (1.4)
In the limit, as the frequency of the sampling of the data is increased so that they are measured over a smaller and smaller time interval, the simple and continuously compounded returns will be identical.
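The portfolio and multi-period results in (1.3) and (1.4) can likewise be illustrated with a few lines of Python; all prices and weights below are invented for the purpose of the demonstration.

import numpy as np

# Two hypothetical assets over one period, with portfolio weights 0.6 and 0.4
p0 = np.array([20.0, 50.0])     # prices at the start of the period
p1 = np.array([21.0, 49.0])     # prices at the end of the period
w = np.array([0.6, 0.4])

simple = (p1 - p0) / p0                      # per-asset simple returns
log_ret = np.log(p1 / p0)                    # per-asset log returns

# Portfolio return from the change in total portfolio value
# (assume an initial investment of 100 split according to the weights)
holdings = 100 * w / p0                      # number of units of each asset held
port_simple = (holdings @ p1 - 100) / 100

print(np.isclose(w @ simple, port_simple))   # True: equation (1.3) holds for simple returns
print(np.isclose(w @ log_ret, np.log(holdings @ p1 / 100)))  # False: log returns are not additive across assets

# Multi-period aggregation, equation (1.4): compounding one-period simple returns
prices = np.array([100.0, 102.0, 101.0, 104.0])
R = (prices[1:] - prices[:-1]) / prices[:-1]
print(np.isclose(np.prod(1 + R) - 1, prices[-1] / prices[0] - 1))  # True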
1.5 Steps involved in formulating an econometric model
Although there are of course many different ways to go about the process of model building, a logical and valid approach would be to follow the steps described in figure 1.1.

The steps involved in the model construction process are now listed and described. Further details on each stage are given in subsequent chapters of this book.

●Step 1a and 1b: general statement of the problem This will usually involve the formulation of a theoretical model, or intuition from financial theory that two or more variables should be related to one another in a certain way. The model is unlikely to be able to completely capture every relevant real-world phenomenon, but it should present a sufficiently good approximation that it is useful for the purpose at hand.
●Step 2: collection of data relevant to the model The data required may be available electronically through a financial information provider, such as Reuters or from published government figures. Alternatively, the required data may be available only via a survey after distributing a set of questionnaires, i.e. primary data.
●Step 3: choice of estimation method relevant to the model proposed in step 1 For example, is a single equation or multiple equation technique to be used?
●Step 4: statistical evaluation of the model What assumptions were required to estimate the parameters of the model optimally? Were these assumptions satisfied by the data or the model? Also, does the model adequately describe the data? If the answer is 'yes', proceed to step 5; if not, go back to steps 1–3 and either reformulate the model, collect more data, or select a different estimation technique that has less stringent requirements.
●Step 5: evaluation of the model from a theoretical perspective Are the parameter estimates of the sizes and signs that the theory or intuition from step 1 suggested? If the answer is 'yes', proceed to step 6; if not, again return to stages 1–3.
●Step 6: use of model When a researcher is finally satisfied with the model, it can then be used for testing the theory specified in step 1, or for formulating forecasts or suggested courses of action. This suggested course of action might be for an individual (e.g. 'if inflation and GDP rise, buy stocks in sector X'), or as an input to government policy (e.g. 'when equity markets fall, program trading causes excessive volatility and so should be banned').

It is important to note that the process of building a robust empirical model is an iterative one, and it is certainly not an exact science. Often, the final preferred model could be very different from the one originally proposed, and need not be unique in the sense that another researcher with the same data and the same initial theory could arrive at a different final specification.
1.6 Points to consider when reading articles in empirical finance
As stated above, one of the defining features of this book relative to others in the area is in its use of published academic research as examples of the use of the various techniques. The papers examined have been chosen for a number of reasons. Above all, they represent (in this author's opinion) a clear and specific application in finance of the techniques covered in this
Box 1.4 Points to consider when reading a published paper
(1) Does the paper involve the development of a theoretical model or is it merely a technique looking for an application so that the motivation for the whole exercise is poor?
(2) Are the data of ‘good quality’? Are they from a reliable source? Is the size of the
sample sufficiently large for the model estimation task at hand?
(3) Have the techniques been validly applied? Have tests been conducted for possible
violations of any assumptions made in the estimation of the model?
(4) Have the results been interpreted sensibly? Is the strength of the results exaggerated? Do the results actually obtained relate to the questions posed by the author(s)? Can the results be replicated by other researchers?
(5) Are the conclusions drawn appropriate given the results, or has the importance of
the results of the paper been overstated?
book. They were also required to be published in a peer-reviewed journal, and hence to be widely available.

When I was a student, I used to think that research was a very pure science. Now, having had first-hand experience of research that academics and practitioners do, I know that this is not the case. Researchers often cut corners. They have a tendency to exaggerate the strength of their results, and the importance of their conclusions. They also have a tendency not to bother with tests of the adequacy of their models, and to gloss over or omit altogether any results that do not conform to the point that they wish to make. Therefore, when examining papers from the academic finance literature, it is important to cast a very critical eye over the research – rather like a referee who has been asked to comment on the suitability of a study for a scholarly journal. The questions that are always worth asking oneself when reading a paper are outlined in box 1.4.

Bear these questions in mind when reading my summaries of the articles used as examples in this book and, if at all possible, seek out and read the entire articles for yourself.
1.7 Econometric packages for modelling financial data
As the name suggests, this section contains descriptions of various computer packages that may be employed to estimate econometric models. The number of available packages is large, and over time, all packages have improved in breadth of available techniques, and have also converged in terms of what is available in each package. Some readers may already be familiar with the use of one or more packages, and if this is the case, this section may be skipped. For those who do not know how to use any
Table 1.1 Econometric software packages for modelling financial data

Package    Software supplier*
EViews     QMS Software
GAUSS      Aptech Systems
LIMDEP     Econometric Software
MATLAB     The MathWorks
RATS       Estima
SAS        SAS Institute
SHAZAM     Northwest Econometrics
SPLUS      Insightful Corporation
SPSS       SPSS
TSP        TSP International

*Full contact details for all software suppliers can be found in the appendix at the end of this chapter.
econometrics software, or have not yet found a package which suits their
requirements, then read on.
1.7.1 What packages are available?
Although this list is by no means exhaustive, a set of widely used packages is given in table 1.1. The programs can usefully be categorised according to whether they are fully interactive (menu-driven), command-driven (so that the user has to write mini-programs), or somewhere in between. Menu-driven packages, which are usually based on a standard Microsoft Windows graphical user interface, are almost certainly the easiest for novices to get started with, for they require little knowledge of the structure of the package, and the menus can usually be negotiated simply. EViews is a package that falls into this category.

On the other hand, some such packages are often the least flexible, since the menus of available options are fixed by the developers, and hence if one wishes to build something slightly more complex or just different, then one is forced to consider alternatives. EViews, however, has a command-based programming language as well as a click-and-point interface so that it offers flexibility as well as user-friendliness.
1.7.2 Choosing a package
Choosing an econometric software package is an increasingly difficult task as the packages become more powerful but at the same time more homogeneous. For example, LIMDEP, a package originally developed for the analysis of a certain class of cross-sectional data, has many useful
features for modelling financial time series. Also, many packages developed for time series analysis, such as TSP ('Time Series Processor'), can also now be used for cross-sectional or panel data. Of course, this choice may be made for you if your institution offers or supports only one or two of the above possibilities. Otherwise, sensible questions to ask yourself are:

●Is the package suitable for your intended applications – for example, does the software have the capability for the models that you want to estimate? Can it handle sufficiently large databases?
●Is the package user-friendly?
●Is it fast?
●How much does it cost?
●Is it accurate?
●Is the package discussed or supported in a standard textbook, as EViews is in this book?
●Does the package have readable and comprehensive manuals? Is help available online?
●Does the package come with free technical support so that you can e-mail
the developers with queries?
A great deal of useful information can be obtained most easily from the web pages of the software developers. Additionally, many journals (including the Journal of Applied Econometrics, the Economic Journal, the International Journal of Forecasting and the American Statistician) publish software reviews that seek to evaluate and compare the packages' usefulness for a given purpose. Three reviews that this author has been involved with, that are relevant for chapter 8 of this text in particular, are Brooks (1997) and Brooks, Burke and Persand (2001, 2003).

The EViews package will be employed in this text because it is simple to use, menu-driven, and will be sufficient to estimate most of the models required for this book. The following section gives an introduction to this software and outlines the key features and how basic tasks are executed.[2]
1.7.3 Accomplishing simple tasks using EViews
EViews is a simple to use, interactive econometrics software package, providing the tools most frequently used in practical econometrics. EViews is built around the concept of objects with each object having its own window, its own menu, its own procedure and its own view of its data.
[2] The first edition of this text also incorporated a detailed discussion of the WinRATS package, but in the interests of keeping the book at a manageable length with two new chapters included, the support for WinRATS users will now be given in a separate handbook that accompanies the main text, ISBN: 9780521896955.
Using menus, it is easy to change between displays of a spreadsheet, line and bar graphs, regression results, etc. One of the most important features of EViews that makes it useful for model-building is the wealth of diagnostic (misspecification) tests that are automatically computed, making it possible to test whether the model is econometrically valid or not. You work your way through EViews using a combination of windows, buttons, menus and sub-menus. A good way of familiarising yourself with EViews is to learn about its main menus and their relationships through the examples given in this and subsequent chapters.
This section assumes that readers have obtained a licensed copy of EViews, and have successfully loaded it onto an available computer. There now follows a description of the EViews package, together with instructions to achieve standard tasks and sample output. Any instructions that must be entered or icons to be clicked are illustrated throughout this book by bold-faced type. The objective of the treatment in this and subsequent chapters is not to demonstrate the full functionality of the package, but rather to get readers started quickly and to explain how the techniques are implemented. For further details, readers should consult the software manuals in the first instance, which are now available electronically with the software as well as in hard copy.[3] Note that EViews is not case-sensitive, so that it does not matter whether commands are entered as lower-case or CAPITAL letters.
Opening the software
To load EViews from Windows, choose Start, All Programs, EViews6 and finally, EViews6 again.
Reading in data
EViews provides support to read from or write to various file types, including 'ASCII' (text) files, Microsoft Excel '.XLS' files (reading from any named sheet in the Excel workbook), Lotus '.WKS1' and '.WKS3' files. It is usually easiest to work directly with Excel files, and this will be the case throughout this book.
Creating a workfile and importing data
The first step when the EViews software is opened is to create a workfile
that will hold the data. To do this, select New from the File menu. Then
[3] A student edition of EViews 4.1 is available at a much lower cost than the full version, but with reduced functionality and restrictions on the number of observations and objects that can be included in each workfile.
choose Workfile. The 'Workfile Create' window in screenshot 1.1 will be displayed.
Screenshot 1.1
Creating a workfile
We are going to use as an example a time series of UK average house price data obtained from Nationwide,[4] which comprises 197 monthly observations from January 1991 to May 2007. The frequency of the data (Monthly) should be set and the start (1991:01) and end (2007:05) dates should be inputted. Click OK. An untitled workfile will be created.

Under 'Workfile structure type', keep the default option, Dated – regular frequency. Then, under 'Date specification', choose Monthly. Note the format of date entry for monthly and quarterly data: YYYY:M and YYYY:Q, respectively. For daily data, a US date format must usually be used depending on how EViews has been set up: MM/DD/YYYY (e.g. 03/01/1999 would be 1st March 1999, not 3rd January). Caution therefore needs to be exercised here to ensure that the date format used is the correct one. Type the start and end dates for the sample into the boxes: 1991:01 and 2007:05 respectively. Then click OK. The workfile will now have been created. Note that two pairs of dates are displayed, 'Range' and 'Sample': the first one is the range of dates contained in the workfile and the second one (which is the same as above in this case) is for the current workfile sample.
[4] Full descriptions of the sources of data used will be given in appendix 3 and on the web site accompanying this book.
Two objects are also displayed: C (which is a vector that will eventually contain the parameters of any estimated models) and RESID (a residuals series, which will currently be empty). See chapter 2 for a discussion of these concepts. All EViews workfiles will contain these two objects, which are created automatically.
Now that the workfile has been set up, we can import the data from the Excel file UKHP.XLS. So from the File menu, select Import and Read Text-Lotus-Excel. You will then be prompted to select the directory and file name. Once you have found the directory where the file is stored, enter UKHP.XLS in the 'file name' box and select the file type 'Excel (*.xls)'. The window in screenshot 1.2 ('Excel Spreadsheet Import') will be displayed.
Screenshot 1.2
Importing Excel data
into the workfile
You have to choose the order of your data: by observations (series in columns as they are in this and most other cases) or by series (series in rows). Also you could provide the names for your series in the relevant box. If the names of the series are already in the imported Excel data file, you can simply enter the number of series (which you are importing) in the 'Names for series or Number if named in file' field in the dialog box. In this case, enter HP, say, for house prices. The 'Upper-left data cell' refers to the first cell in the spreadsheet that actually contains numbers. In this case, it can be left at B2 as the first column in the spreadsheet contains
only dates and we do not need to import those since EViews will date the observations itself. You should also choose the sample of the data that you wish to import. This box can almost always be left at EViews' suggestion which defaults to the current workfile sample. Click OK and the series will be imported. The series will appear as a new icon in the workfile window, as in screenshot 1.3.
Screenshot 1.3
The workfile
containing loaded data
Verifying the data
Double click on the new hp icon that has appeared, and this will open up a spreadsheet window within EViews containing the monthly house price values. Make sure that the data file has been correctly imported by checking a few observations at random.

The next step is to save the workfile: click on the Save As button from the File menu and select Save Active Workfile and click OK. A save dialog box will open, prompting you for a workfile name and location. You should enter XX (where XX is your chosen name for the file), then click OK. EViews will save the workfile in the specified directory with the name XX.WF1. The saved workfile can be opened later by selecting File/Open/EViews Workfile… from the menu bar.
Transformations
Variables of interest can be created in EViews by selecting the Genr button from the workfile toolbar and typing in the relevant formulae. Suppose, for example, we have a time series called Z. The latter can be modified in the following ways so as to create Variables A, B, C, etc.
A=Z/2 Dividing
B=Z*2 Multiplication
C=Zˆ2 Squaring
D=LOG(Z) Taking the logarithms
E=EXP(Z) Taking the exponential
F=Z(−1) Lagging the data
G=LOG(Z/Z(−1)) Creating the log-returns
Other functions that can be used in the formulae include: abs, sin, cos, etc. Notice that no special instruction is necessary; simply type 'new variable = function of old variable(s)'. The variables will be displayed in the same workfile window as the original (imported) series.

In this case, it is of interest to calculate simple percentage changes in the series. Click Genr and type DHP=100*(HP-HP(-1))/HP(-1). It is important to note that this new series, DHP, will be a series of monthly changes and will not be annualised.
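For readers who wish to cross-check this transformation outside EViews, a rough pandas equivalent is sketched below; the column name hp and the toy values are assumptions for illustration only, not part of the EViews instructions in the text.

import pandas as pd

# Assumes the house price data have been read into a DataFrame with a column 'hp',
# e.g. via df = pd.read_excel("UKHP.XLS"); toy values are used here instead.
df = pd.DataFrame({"hp": [52000.0, 52600.0, 52100.0, 53400.0]})

# Monthly percentage change, mirroring DHP = 100*(HP - HP(-1))/HP(-1)
df["dhp"] = 100 * (df["hp"] - df["hp"].shift(1)) / df["hp"].shift(1)
# equivalently: df["dhp"] = 100 * df["hp"].pct_change()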
Computing summary statistics
Descriptive summary statistics of a series can be obtained by selecting Quick/Series Statistics/Histogram and Stats and typing in the name of the variable (DHP). The view in screenshot 1.4 will be displayed in the window.

As can be seen, the histogram suggests that the series has a longer upper tail than lower tail (note the x-axis scale) and is centred slightly above zero. Summary statistics including the mean, maximum and minimum, standard deviation, higher moments and a test for whether the series is normally distributed are all presented. Interpreting these will be discussed in subsequent chapters. Other useful statistics and transformations can be obtained by selecting the command Quick/Series Statistics, but these are covered later in this book.
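Comparable summary statistics can also be produced outside EViews if desired; the following Python sketch uses simulated data in place of the DHP series simply so that it runs on its own.

import numpy as np
import pandas as pd
from scipy import stats

# Assume 'dhp' holds the full series of monthly percentage changes
# (replaced here by simulated data purely so the snippet is self-contained).
rng = np.random.default_rng(0)
dhp = pd.Series(rng.normal(0.5, 1.0, size=196))

print(dhp.describe())              # count, mean, std, min, quartiles, max
print(dhp.skew(), dhp.kurtosis())  # note: pandas reports excess kurtosis
print(stats.jarque_bera(dhp))      # normality test, as in the EViews histogram view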
Plots
EViews supports a wide range of graph types including line graphs, bar graphs, pie charts, mixed line–bar graphs, high–low graphs and scatterplots. A variety of options permits the user to select the line types, colour,
Screenshot 1.4
Summary statistics
for a series
border characteristics, headings, shading and scaling, including logarithmic scale and dual scale graphs. Legends are automatically created (although they can be removed if desired), and customised graphs can be incorporated into other Windows applications using copy-and-paste, or by exporting as Windows metafiles.

From the main menu, select Quick/Graph and type in the name of the series that you want to plot (HP to plot the level of house prices) and click OK. You will be prompted with the Graph window where you choose the type of graph that you want (line, bar, scatter or pie charts). There is a Show Option button, which you click to make adjustments to the graphs. Choosing a line graph would produce screenshot 1.5.
Scatter plots can similarly be produced by selecting ‘Scatter’ in the
‘Graph Type’ box after opening a new graph object.
Printing results
Results can be printed at any point by selecting the Print button on the object window toolbar. The whole current window contents will be printed. Choosing View/Print Selected from the workfile window prints the default
Screenshot 1.5
A line graph
view for all of the selected objects. Graphs can be copied into the clipboard
if desired by right clicking on the graph and choosing Copy.
Saving data results and workfile
Data generated in EViews can be exported to other Windows applications, e.g. Microsoft Excel. From the object toolbar, select Procs/Export/Write Text-Lotus-Excel. You will then be asked to provide a name for the exported file and to select the appropriate directory. The next window will ask you to select all the series that you want to export, together with the sample period.

Assuming that the workfile has been saved after the importation of the data set (as mentioned above), additional work can be saved by just selecting Save from the File menu. It will ask you if you want to overwrite the existing file, in which case you click on the Yes button. You will also be prompted to select whether the data in the file should be saved in 'single precision' or 'double precision'. The latter is preferable for obvious reasons unless the file is likely to be very large because of the quantity of variables and observations it contains (single precision will require less space). The workfile will be saved including all objects in it – data, graphs,
equations, etc. so long as they have been given a title. Any untitled objects will be lost upon exiting the program.
Econometric tools available in EViews
Box 1.5 describes the features available in EViews, following the format of the user guides for version 6, with material discussed in this book indicated by italics.
Box 1.5 Features of EViews
The EViews user guide is now split into two volumes. Volume I contains parts I to III as
described below, while Volume II contains Parts IV to VIII.
PART I (EVIEWS FUNDAMENTALS)
●Chapters 1–4 contain introductory material describing the basics of Windows and
EViews, how workfiles are constructed and how to deal with objects.
●Chapters 5 and 6 document the basics of working with data. Importing data into
EViews, using EViews to manipulate and manage data, and exporting from EViews
into spreadsheets, text files and other Windows applications are discussed.
●Chapters 7–10 describe the EViews database and other advanced data and workfile handling features.
PART II (BASIC DATA ANALYSIS)
●Chapter 11 describes the series object. Series are the basic unit of data in EViews
and are the basis for all univariate analysis. This chapter documents the basic
graphing and data analysis features associated with series.
●Chapter 12 documents the group object. Groups are collections of series that form
the basis for a variety of multivariate graphing and data analyses.
●Chapter 13 provides detailed documentation for exploratory data analysis using distribution graphs, density plots and scatter plot graphs.
●Chapters 14 and 15 describe the creation and customisation of more advanced
tables and graphs.
PART III (COMMANDS AND PROGRAMMING)
●Chapters 16–23 describe in detail how to write programs using the EViews
programming language.
PART IV (BASIC SINGLE EQUATION ANALYSIS)
●Chapter 24 outlines the basics of ordinary least squares estimation (OLS) in EViews.
●Chapter 25 discusses the weighted least squares, two-stage least squares and
non-linear least squares estimation techniques.
●Chapter 26 describes single equation regression techniques for the analysis of time
series data: testing for serial correlation, estimation of ARMA models, using
polynomial distributed lags, and unit root tests for non-stationary time series.
●Chapter 27 describes the fundamentals of using EViews to forecast from estimated
equations.
●Chapter 28 describes the specification testing procedures available in EViews.
PART V (ADVANCED SINGLE EQUATION ANALYSIS)
●Chapter 29 discusses ARCH and GARCH estimation and outlines the EViews tools
for modelling the conditional variance of a variable.
●Chapter 30 documents EViews functions for estimating qualitative and limited
dependent variable models. EViews provides estimation routines for binary or
ordered (e.g. probit and logit), censored or truncated (tobit, etc.) and integer valued
(count) data.
●Chapter 31 discusses the fashionable topic of the estimation of quantile regressions.
●Chapter 32 shows how to deal with the log-likelihood object, and how to solve problems with non-linear estimation.
PART VI (MULTIPLE EQUATION ANALYSIS)
●Chapters 33–36 describe estimation techniques for systems of equations including
VAR and VEC models, and state space models.
PART VII (PANEL AND POOLED DATA)
●Chapter 37 outlines tools for working with pooled time series, cross-section data and
estimating standard equation specifications that account for the pooled structure of the data.
●Chapter 38 describes how to structure a panel of data and how to analyse it, while chapter 39 extends the analysis to look at panel regression model estimation.
PART VIII (OTHER MULTIVARIATE ANALYSIS)
●Chapter 40, the final chapter of the manual, explains how to conduct factor analysis
in EViews.
1.8 Outline of the remainder of this book
Chapter 2
This introduces the classical linear regression model (CLRM). The ordinary least squares (OLS) estimator is derived and its interpretation discussed. The conditions for OLS optimality are stated and explained. A hypothesis testing framework is developed and examined in the context of the linear model. Examples employed include Jensen's classic study of mutual fund performance measurement and tests of the 'overreaction hypothesis' in the context of the UK stock market.
Chapter 3
This continues and develops the material of chapter 2 by generalising the bivariate model to multiple regression – i.e. models with many variables. The framework for testing multiple hypotheses is outlined, and measures of how well the model fits the data are described. Case studies include modelling rental values and an application of principal components analysis to interest rate modelling.

Chapter 4
Chapter 4 examines the important but often neglected topic of diagnostic testing. The consequences of violations of the CLRM assumptions are described, along with plausible remedial steps. Model-building philosophies are discussed, with particular reference to the general-to-specific approach. Applications covered in this chapter include the determination of sovereign credit ratings.

Chapter 5
This presents an introduction to time series models, including their motivation and a description of the characteristics of financial data that they can and cannot capture. The chapter commences with a presentation of the features of some standard models of stochastic (white noise, moving average, autoregressive and mixed ARMA) processes. The chapter continues by showing how the appropriate model can be chosen for a set of actual data, how the model is estimated and how model adequacy checks are performed. The generation of forecasts from such models is discussed, as are the criteria by which these forecasts can be evaluated. Examples include model-building for UK house prices, and tests of the exchange rate covered and uncovered interest parity hypotheses.

Chapter 6
This extends the analysis from univariate to multivariate models. Multivariate models are motivated by way of explanation of the possible existence of bi-directional causality in financial relationships, and the simultaneous equations bias that results if this is ignored. Estimation techniques for simultaneous equations models are outlined. Vector autoregressive (VAR) models, which have become extremely popular in the empirical finance literature, are also covered. The interpretation of VARs is explained by way of joint tests of restrictions, causality tests, impulse responses and variance decompositions. Relevant examples discussed in this chapter are the simultaneous relationship between bid–ask spreads
and trading volume in the context of options pricing, and the relationship
between property returns and macroeconomic variables.
Chapter 7
The first section of the chapter discusses unit root processes and presents tests for non-stationarity in time series. The concept of and tests for cointegration, and the formulation of error correction models, are then discussed in the context of both the single equation framework of Engle–Granger, and the multivariate framework of Johansen. Applications studied in chapter 7 include spot and futures markets, tests for cointegration between international bond markets and tests of the purchasing power parity hypothesis and of the expectations hypothesis of the term structure of interest rates.
Chapter 8
This covers the important topic of volatility and correlation modelling and forecasting. This chapter starts by discussing in general terms the issue of non-linearity in financial time series. The class of ARCH (AutoRegressive Conditionally Heteroscedastic) models and the motivation for this formulation are then discussed. Other models are also presented, including extensions of the basic model such as GARCH, GARCH-M, EGARCH and GJR formulations. Examples of the huge number of applications are discussed, with particular reference to stock returns. Multivariate GARCH models are described, and applications to the estimation of conditional betas and time-varying hedge ratios, and to financial risk measurement, are given.
Chapter 9
This discusses testing for and modelling regime shifts or switches of behaviour in financial series that can arise from changes in government policy, market trading conditions or microstructure, among other causes. This chapter introduces the Markov switching approach to dealing with regime shifts. Threshold autoregression is also discussed, along with issues relating to the estimation of such models. Examples include the modelling of exchange rates within a managed floating environment, modelling and forecasting the gilt–equity yield ratio, and models of movements of the difference between spot and futures prices.
Chapter 10
This new chapter focuses on how to deal appropriately with longitudinal data – that is, data having both time series and cross-sectional dimensions. Fixed effect and random effect models are explained and illustrated by way
of examples on banking competition in the UK and on credit stability in Central and Eastern Europe. Entity fixed and time-fixed effects models are elucidated and distinguished.

Chapter 11
The second new chapter describes various models that are appropriate for situations where the dependent variable is not continuous. Readers will learn how to construct, estimate and interpret such models, and to distinguish and select between alternative specifications. Examples used include a test of the pecking order hypothesis in corporate finance and the modelling of unsolicited credit ratings.

Chapter 12
This presents an introduction to the use of simulations in econometrics and finance. Motivations are given for the use of repeated sampling, and a distinction is drawn between Monte Carlo simulation and bootstrapping. The reader is shown how to set up a simulation, and examples are given in options pricing and financial risk management to demonstrate the usefulness of these techniques.

Chapter 13
This offers suggestions related to conducting a project or dissertation in empirical finance. It introduces the sources of financial and economic data available on the Internet and elsewhere, and recommends relevant online information and literature on research in financial markets and financial time series. The chapter also suggests ideas for what might constitute a good structure for a dissertation on this subject, how to generate ideas for a suitable topic, what format the report could take, and some common pitfalls.

Chapter 14
This summarises the book and concludes. Several recent developments in the field, which are not covered elsewhere in the book, are also mentioned. Some tentative suggestions for possible growth areas in the modelling of financial time series are also given.
1.9 Further reading
EViews 6 User’s Guides I and II – Quantitative Micro Software (2007), QMS, Irvine, CA
EViews 6 Command Reference – Quantitative Micro Software (2007), QMS, Irvine, CA
Startz, R. EViews Illustrated for Version 6 (2007) QMS, Irvine, CA
Appendix: Econometric software package suppliers
Package Contact information
EViews    QMS Software, Suite 336, 4521 Campus Drive #336, Irvine, CA 92612–2621, USA
          Tel: (+1) 949 856 3368; Fax: (+1) 949 856 2044; Web: www.eviews.com
GAUSS     Aptech Systems Inc, PO Box 250, Black Diamond, WA 98010, USA
          Tel: (+1) 425 432 7855; Fax: (+1) 425 432 7832; Web: www.aptech.com
LIMDEP    Econometric Software, 15 Gloria Place, Plainview, NY 11803, USA
          Tel: (+1) 516 938 5254; Fax: (+1) 516 938 2441; Web: www.limdep.com
MATLAB    The MathWorks Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, USA
          Tel: (+1) 508 647 7000; Fax: (+1) 508 647 7001; Web: www.mathworks.com
RATS      Estima, 1560 Sherman Avenue, Evanston, IL 60201, USA
          Tel: (+1) 847 864 8772; Fax: (+1) 847 864 6221; Web: www.estima.com
SAS       SAS Institute, 100 Campus Drive, Cary, NC 27513–2414, USA
          Tel: (+1) 919 677 8000; Fax: (+1) 919 677 4444; Web: www.sas.com
SHAZAM    Northwest Econometrics Ltd., 277 Arbutus Reach, Gibsons, B.C. V0N 1V8, Canada
          Tel: –; Fax: (+1) 707 317 5364; Web: shazam.econ.ubc.ca
SPLUS     Insightful Corporation, 1700 Westlake Avenue North, Suite 500, Seattle, WA 98109–3044, USA
          Tel: (+1) 206 283 8802; Fax: (+1) 206 283 8691; Web: www.splus.com
SPSS      SPSS Inc, 233 S. Wacker Drive, 11th Floor, Chicago, IL 60606–6307, USA
          Tel: (+1) 800 543 2185; Fax: (+1) 800 841 0064; Web: www.spss.com
TSP       TSP International, PO Box 61015 Station A, Palo Alto, CA 94306, USA
          Tel: (+1) 650 326 1927; Fax: (+1) 650 328 4163; Web: www.tspintl.com
Key concepts
The key terms to be able to define and explain from this chapter are
●financial econometrics
●continuously compounded returns
●time series
●cross-sectional data
●panel data
●pooled data
●continuous data
●discrete data
2 A brief overview of the classical linear regression model
Learning Outcomes
In this chapter, you will learn how to
●Derive the OLS formulae for estimating parameters and their
standard errors
●Explain the desirable properties that a good estimator should
have
●Discuss the factors that affect the sizes of standard errors
●Test hypotheses using the test of significance and confidence
interval approaches
●Interpret p-values
●Estimate regression models and test single hypotheses in
EViews
2.1 What is a regression model?
Regression analysis is almost certainly the most important tool at the
econometrician's disposal. But what is regression analysis? In very general terms, regression is concerned with describing and evaluating the relationship between a given variable and one or more other variables. More specifically, regression is an attempt to explain movements in a variable by reference to movements in one or more other variables.
To make this more concrete, denote the variable whose movements the regression seeks to explain by y and the variables which are used to explain those variations by x_1, x_2, ..., x_k. Hence, in this relatively simple setup, it would be said that variations in k variables (the xs) cause changes in some other variable, y. This chapter will be limited to the case where the model seeks to explain changes in only one variable y (although this restriction will be removed in chapter 6).
Box 2.1 Names for y and the xs in regression models
Names for y: dependent variable, regressand, effect variable, explained variable.
Names for the xs: independent variables, regressors, causal variables, explanatory variables.
There are various completely interchangeable names for y and the xs, and all of these terms will be used synonymously in this book (see box 2.1).
2.2 Regression versus correlation
All readers will be aware of the notion and definition of correlation. The correlation between two variables measures the degree of linear association between them. If it is stated that y and x are correlated, it means that y and x are being treated in a completely symmetrical way. Thus, it is not implied that changes in x cause changes in y, or indeed that changes in y cause changes in x. Rather, it is simply stated that there is evidence for a linear relationship between the two variables, and that movements in the two are on average related to an extent given by the correlation coefficient.

In regression, the dependent variable (y) and the independent variable(s) (the xs) are treated very differently. The y variable is assumed to be random or 'stochastic' in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed ('non-stochastic') values in repeated samples.[1] Regression as a tool is more flexible and more powerful than correlation.
2.3 Simple regression
For simplicity, suppose for now that it is believed that y depends on only one x variable. Again, this is of course a severely restricted case, but the case of more explanatory variables will be considered in the next chapter. Three examples of the kind of relationship that may be of interest include:
[1] Strictly, the assumption that the xs are non-stochastic is stronger than required, an issue that will be discussed in more detail in chapter 4.
[Figure 2.1: Scatter plot of two variables, y and x]
●How asset returns vary with their level of market risk
●Measuring the long-term relationship between stock prices and
dividends
●Constructing an optimal hedge ratio.
Suppose that a researcher has some idea that there should be a relationship between two variables y and x, and that financial theory suggests that an increase in x will lead to an increase in y. A sensible first stage to testing whether there is indeed an association between the variables would be to form a scatter plot of them. Suppose that the outcome of this plot is figure 2.1.

In this case, it appears that there is an approximate positive linear relationship between x and y which means that increases in x are usually accompanied by increases in y, and that the relationship between them can be described approximately by a straight line. It would be possible to draw by hand onto the graph a line that appears to fit the data. The intercept and slope of the line fitted by eye could then be measured from the graph. However, in practice such a method is likely to be laborious and inaccurate.

It would therefore be of interest to determine to what extent this relationship can be described by an equation that can be estimated using a defined procedure. It is possible to use the general equation for a straight line

y = α + βx    (2.1)
Box 2.2 Reasons for the inclusion of the disturbance term
●Even in the general case where there is more than one explanatory variable, some determinants of y_t will always in practice be omitted from the model. This might, for example, arise because the number of influences on y is too large to place in a single model, or because some determinants of y may be unobservable or not measurable.
●There may be errors in the way that y is measured which cannot be modelled.
●There are bound to be random outside influences on y that again cannot be modelled. For example, a terrorist attack, a hurricane or a computer failure could all affect financial asset returns in a way that cannot be captured in a model and cannot be forecast reliably. Similarly, many researchers would argue that human behaviour has an inherent randomness and unpredictability!
to get the line that best 'fits' the data. The researcher would then be seeking to find the values of the parameters or coefficients, α and β, which would place the line as close as possible to all of the data points taken together.

However, this equation (y = α + βx) is an exact one. Assuming that this equation is appropriate, if the values of α and β had been calculated, then given a value of x, it would be possible to determine with certainty what the value of y would be. Imagine – a model which says with complete certainty what the value of one variable will be given any value of the other!

Clearly this model is not realistic. Statistically, it would correspond to the case where the model fitted the data perfectly – that is, all of the data points lay exactly on a straight line. To make the model more realistic, a random disturbance term, denoted by u, is added to the equation, thus

y_t = α + βx_t + u_t    (2.2)

where the subscript t (= 1, 2, 3, ...) denotes the observation number. The disturbance term can capture a number of features (see box 2.2).

So how are the appropriate values of α and β determined? α and β are chosen so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible). The parameters are thus chosen to minimise collectively the (vertical) distances from the data points to the fitted line. This could be done by 'eye-balling' the data and, for each set of variables y and x, one could form a scatter plot and draw on a line that looks as if it fits the data well by hand, as in figure 2.2.
Note that the vertical distances are usually minimised rather than the
horizontal distances or those taken perpendicular to the line. This arises
[Figure 2.2: Scatter plot of two variables with a line of best fit chosen by eye]
as a result of the assumption that x is fixed in repeated samples, so that the problem becomes one of determining the appropriate model for y given (or conditional upon) the observed values of x.

This 'eye-balling' procedure may be acceptable if only indicative results are required, but of course this method, as well as being tedious, is likely to be imprecise. The most common method used to fit a line to the data is known as ordinary least squares (OLS). This approach forms the workhorse of econometric model estimation, and will be discussed in detail in this and subsequent chapters.

Two alternative estimation methods (for determining the appropriate values of the coefficients α and β) are the method of moments and the method of maximum likelihood. A generalised version of the method of moments, due to Hansen (1982), is popular, but beyond the scope of this book. The method of maximum likelihood is also widely employed, and will be discussed in detail in chapter 8.

Suppose now, for ease of exposition, that the sample of data contains only five observations. The method of OLS entails taking each vertical distance from the point to the line, squaring it and then minimising the total sum of the areas of squares (hence 'least squares'), as shown in figure 2.3. This can be viewed as equivalent to minimising the sum of the areas of the squares drawn from the points to the line.

Tightening up the notation, let y_t denote the actual data point for observation t and let ŷ_t denote the fitted value from the regression line.
[Figure 2.3: Method of OLS fitting a line to the data by minimising the sum of squared residuals]
[Figure 2.4: Plot of a single observation, together with the line of best fit, the residual and the fitted value]
In other words, for the given value of x of this observation t, ŷ_t is the value for y which the model would have predicted. Note that a hat (ˆ) over a variable or parameter is used to denote a value estimated by a model. Finally, let û_t denote the residual, which is the difference between the actual value of y and the value fitted by the model for this data point – i.e. (y_t − ŷ_t). This is shown for just one observation t in figure 2.4.

What is done is to minimise the sum of the û_t². The reason that the sum of the squared distances is minimised rather than, for example, finding the sum of û_t that is as close to zero as possible, is that in the latter case some points will lie above the line while others lie below it. Then, when the sum to be made as close to zero as possible is formed, the points
above the line would count as positive values, while those below would count as negatives. So these distances will in large part cancel each other out, which would mean that one could fit virtually any line to the data, so long as the sum of the distances of the points above the line and the sum of the distances of the points below the line were the same. In that case, there would not be a unique solution for the estimated coefficients. In fact, any fitted line that goes through the mean of the observations (i.e. x̄, ȳ) would set the sum of the û_t to zero. However, taking the squared distances ensures that all deviations that enter the calculation are positive and therefore do not cancel out.
So minimising the sum of squared distances is given by minimising (û_1² + û_2² + û_3² + û_4² + û_5²), or minimising

Σ_{t=1}^{5} û_t²
This sum is known as the residual sum of squares (RSS) or the sum of squared residuals. But what is û_t? Again, it is the difference between the actual point and the line, y_t − ŷ_t. So minimising Σ_t û_t² is equivalent to minimising Σ_t (y_t − ŷ_t)².
Letting α̂ and β̂ denote the values of α and β selected by minimising the RSS, respectively, the equation for the fitted line is given by ŷ_t = α̂ + β̂x_t. Now let L denote the RSS, which is also known as a loss function. Take the summation over all of the observations, i.e. from t = 1 to T, where T is the number of observations

L = Σ_{t=1}^{T} (y_t − ŷ_t)² = Σ_{t=1}^{T} (y_t − α̂ − β̂x_t)²    (2.3)

L is minimised with respect to (w.r.t.) α̂ and β̂, to find the values of α and β which minimise the residual sum of squares to give the line that is closest to the data. So L is differentiated w.r.t. α̂ and β̂, setting the first derivatives to zero. A derivation of the ordinary least squares (OLS) estimator is given in the appendix to this chapter. The coefficient estimators for the slope and the intercept are given by
β̂ = (Σ x_t y_t − T x̄ȳ) / (Σ x_t² − T x̄²)    (2.4)

α̂ = ȳ − β̂x̄    (2.5)
Equations (2.4) and (2.5) state that, given only the sets of observations x_t and y_t, it is always possible to calculate the values of the two parameters, α̂ and β̂, that best fit the set of data. Equation (2.4) is the easiest formula
Table 2.1 Sample data on fund XXX to motivate OLS estimation

Year, t    Excess return on fund XXX = r_{XXX,t} − r_{ft}    Excess return on market index = r_{mt} − r_{ft}
1          17.8                                              13.7
2          39.0                                              23.2
3          12.8                                               6.9
4          24.2                                              16.8
5          17.2                                              12.3
to use to calculate the slope estimate, but the formula can also be written, more intuitively, as

β̂ = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)²    (2.6)

which is equivalent to the sample covariance between x and y divided by the sample variance of x.
To reiterate, this method of finding the optimum is known as OLS. It is also worth noting that it is obvious from the equation for α̂ that the regression line will go through the mean of the observations – i.e. that the point (x̄, ȳ) lies on the regression line.
Example 2.1
Suppose that some data have been collected on the excess returns on a fund manager's portfolio ('fund XXX') together with the excess returns on a market index as shown in table 2.1.

The fund manager has some intuition that the beta (in the CAPM framework) on this fund is positive, and she therefore wants to find whether there appears to be a relationship between x and y given the data. Again, the first stage could be to form a scatter plot of the two variables (figure 2.5).

Clearly, there appears to be a positive, approximately linear relationship between x and y, although there is not much data on which to base this conclusion! Plugging the five observations in to make up the formulae given in (2.4) and (2.5) would lead to the estimates α̂ = −1.74 and β̂ = 1.64. The fitted line would be written as

ŷ_t = −1.74 + 1.64x_t    (2.7)

where x_t is the excess return of the market portfolio over the risk-free rate (i.e. r_m − r_f), also known as the market risk premium.
[Figure 2.5: Scatter plot of excess returns on fund XXX versus excess returns on the market portfolio]
2.3.1 What are α̂ and β̂ used for?
This question is probably best answered by posing another question. If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be?

The expected value of y = '−1.74 + 1.64 × value of x', so plug x = 20 into (2.7)

ŷ_t = −1.74 + 1.64 × 20 = 31.06    (2.8)

Thus, for a given expected market risk premium of 20%, and given its riskiness, fund XXX would be expected to earn an excess over the risk-free rate of approximately 31%. In this setup, the regression beta is also the CAPM beta, so that fund XXX has an estimated beta of 1.64, suggesting that the fund is rather risky. In this case, the residual sum of squares reaches its minimum value of 30.33 with these OLS coefficient values.
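These figures can be reproduced directly from (2.4)-(2.6); a short numpy sketch using the data of table 2.1 (an illustrative cross-check, not the EViews estimation used later in the book):

import numpy as np

# Data from table 2.1: excess returns on the market (x) and on fund XXX (y)
x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

beta_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()  # equation (2.6)
alpha_hat = y.mean() - beta_hat * x.mean()                                        # equation (2.5)
rss = ((y - (alpha_hat + beta_hat * x)) ** 2).sum()                               # residual sum of squares

print(round(alpha_hat, 2), round(beta_hat, 2), round(rss, 2))  # -1.74, 1.64 and 30.33, as reported in the text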
Although it may be obvious, it is worth stating that it is not advisable to conduct a regression analysis using only five observations! Thus the results presented here can be considered indicative and for illustration of the technique only. Some further discussions on appropriate sample sizes for regression analysis are given in chapter 4.

The coefficient estimate of 1.64 for β is interpreted as saying that, 'if x increases by 1 unit, y will be expected, everything else being equal, to increase by 1.64 units'. Of course, if β̂ had been negative, a rise in x would on average cause a fall in y. α̂, the intercept coefficient estimate, is
[Figure 2.6: No observations close to the y-axis]
interpreted as the value that would be taken by the dependent variable y if the independent variable x took a value of zero. 'Units' here refer to the units of measurement of x_t and y_t. So, for example, suppose that β̂ = 1.64, x is measured in per cent and y is measured in thousands of US dollars. Then it would be said that if x rises by 1%, y will be expected to rise on average by $1.64 thousand (or $1,640). Note that changing the scale of y or x will make no difference to the overall results since the coefficient estimates will change by an off-setting factor to leave the overall relationship between y and x unchanged (see Gujarati, 2003, pp. 169–173 for a proof). Thus, if the units of measurement of y were hundreds of dollars instead of thousands, and everything else remains unchanged, the slope coefficient estimate would be 16.4, so that a 1% increase in x would lead to an increase in y of $16.4 hundreds (or $1,640) as before. All other properties of the OLS estimator discussed below are also invariant to changes in the scaling of the data.

A word of caution is, however, in order concerning the reliability of estimates of the constant term. Although the strict interpretation of the intercept is indeed as stated above, in practice, it is often the case that there are no values of x close to zero in the sample. In such instances, estimates of the value of the intercept will be unreliable. For example, consider figure 2.6, which demonstrates a situation where no points are close to the y-axis.
In such cases, one could not expect to obtain robust estimates of the value of y when x is zero as all of the information in the sample pertains to the case where x is considerably larger than zero.

A similar caution should be exercised when producing predictions for y using values of x that are a long way outside the range of values in the sample. In example 2.1, x takes values between 7% and 23% in the available data. So, it would not be advisable to use this model to determine the expected excess return on the fund if the expected excess return on the market were, say 1% or 30%, or −5% (i.e. the market was expected to fall).
2.4 Some further terminology
2.4.1 The population and the sample
The population is the total collection of all objects or people to be studied. For example, in the context of determining the relationship between risk and return for UK equities, the population of interest would be all time series observations on all stocks traded on the London Stock Exchange (LSE).

The population may be either finite or infinite, while a sample is a selection of just some items from the population. In general, either all of the observations for the entire population will not be available, or they may be so many in number that it is infeasible to work with them, in which case a sample of data is taken for analysis. The sample is usually random, and it should be representative of the population of interest. A random sample is a sample in which each individual item in the population is equally likely to be drawn. The size of the sample is the number of observations that are available, or that it is decided to use, in estimating the regression equation.
2.4.2 The data generating process, the population regression function and the
sample regression function
The population regression function (PRF) is a description of the model that is thought to be generating the actual data and it represents the true relationship between the variables. The population regression function is also known as the data generating process (DGP). The PRF embodies the true values of α and β, and is expressed as

y_t = α + βx_t + u_t    (2.9)

Note that there is a disturbance term in this equation, so that even if one had at one's disposal the entire population of observations on x and y,
it would still in general not be possible to obtain a perfect fit of the line to the data. In some textbooks, a distinction is drawn between the PRF (the underlying true relationship between y and x) and the DGP (the process describing the way that the actual observations on y come about), although in this book, the two terms will be used synonymously.
The sample regression function, SRF, is the relationship that has been
estimated using the sample observations, and is often written as
ˆy
t=ˆα+ˆβxt (2.10)
Notice that there is no error or residual term in (2.10); all this equation
states is that given a particular value of x, multiplying it by ˆβand adding
ˆαwill give the model fitted or expected value for y, denoted ˆy. It is also
possible to write
yt=ˆα+ˆβxt+ˆut (2.11)
Equation (2.11) splits the observed value of yinto two components: the
fitted value from the model, and a residual term.
The SRF is used to infer likely values of the PRF. That is, the estimates
α̂ and β̂ are constructed for the sample of data at hand, but what is really of interest is the true relationship between x and y – in other words, the PRF is what is really wanted, but all that is ever available is the SRF! However, what can be said is how likely it is, given the figures calculated for α̂ and β̂, that the corresponding population parameters take on certain values.
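The decomposition in (2.11) can be verified numerically. The short Python sketch below is an illustration only (the simulated data and the true parameter values are arbitrary assumptions, not taken from the text): it generates data from an assumed PRF, estimates the SRF by OLS, and confirms that each observed y splits exactly into a fitted value plus a residual.

    import numpy as np

    # Simulate data from an assumed PRF, estimate the SRF by OLS and verify (2.11).
    rng = np.random.default_rng(0)
    alpha_true, beta_true = 1.0, 0.5          # assumed population parameters
    x = rng.uniform(5, 25, size=100)
    u = rng.normal(0, 2, size=100)            # population disturbances (unobservable in practice)
    y = alpha_true + beta_true * x + u        # the PRF / data generating process, as in (2.9)

    # OLS estimates (the SRF)
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha_hat = y.mean() - beta_hat * x.mean()

    y_fitted = alpha_hat + beta_hat * x        # fitted values, as in (2.10)
    residuals = y - y_fitted                   # residuals u-hat
    assert np.allclose(y, y_fitted + residuals)   # each y is fitted value plus residual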
2.4.3 Linearity and possible forms for the regression function
In order to use OLS, a model that is linear is required. This means that,
in the simple bivariate case, the relationship between x and y must be capable of being expressed diagrammatically using a straight line. More specifically, the model must be linear in the parameters (α and β), but it does not necessarily have to be linear in the variables (y and x). By 'linear in the parameters', it is meant that the parameters are not multiplied together, divided, squared, or cubed, etc.
Models that are not linear in the variables can often be made to take
a linear form by applying a suitable transformation or manipulation. For example, consider the following exponential regression model
Y_t = A X_t^β e^{u_t}    (2.12)
Taking logarithms of both sides, applying the laws of logs and rearranging the right-hand side (RHS)
ln Y_t = ln(A) + β ln X_t + u_t    (2.13)
where A and β are parameters to be estimated. Now let α = ln(A), y_t = ln Y_t and x_t = ln X_t
y_t = α + βx_t + u_t    (2.14)
This is known as an exponential regression model since Y varies according to some exponent (power) function of X. In fact, when a regression equation is expressed in 'double logarithmic form', which means that both the dependent and the independent variables are natural logarithms, the coefficient estimates are interpreted as elasticities (strictly, they are unit changes on a logarithmic scale). Thus a coefficient estimate of 1.2 for β̂ in (2.13) or (2.14) is interpreted as stating that 'a rise in X of 1% will lead on average, everything else being equal, to a rise in Y of 1.2%'. Conversely, for y and x in levels rather than logarithmic form (e.g. (2.9)), the coefficients denote unit changes as described above.
Similarly, if theory suggests that x should be inversely related to y according to a model of the form
y_t = α + β/x_t + u_t    (2.15)
the regression can be estimated using OLS by setting
z_t = 1/x_t
and regressing y on a constant and z. Clearly, then, a surprisingly varied array of models can be estimated using OLS by making suitable transformations to the variables. On the other hand, some models are intrinsically non-linear, e.g.
y_t = α + βx_t^γ + u_t    (2.16)
Such models cannot be estimated using OLS, but might be estimable using
a non-linear estimation method (see chapter 8).
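To illustrate how such transformations can be implemented in practice, a minimal Python sketch is given below. It is an illustration only (the book itself works in EViews), and the simulated data and true parameter values are assumptions chosen purely so that the estimated coefficients have known targets.

    import numpy as np

    rng = np.random.default_rng(1)

    def ols(x, y):
        """Bivariate OLS: return the intercept and slope estimates."""
        beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        return y.mean() - beta * x.mean(), beta

    # Double-log model (2.12)-(2.14): Y = A * X^beta * exp(u) -> ln Y = ln A + beta ln X + u
    X = rng.uniform(1, 10, 200)
    Y = 2.0 * X ** 1.2 * np.exp(rng.normal(0, 0.1, 200))
    alpha_hat, beta_hat = ols(np.log(X), np.log(Y))   # beta_hat is an elasticity, close to 1.2

    # Reciprocal model (2.15): y = alpha + beta/x + u -> regress y on z = 1/x
    x = rng.uniform(1, 10, 200)
    y = 3.0 + 5.0 / x + rng.normal(0, 0.1, 200)
    a_hat, b_hat = ols(1.0 / x, y)                    # estimates close to 3.0 and 5.0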
2.4.4 Estimator or estimate?
Estimators are the formulae used to calculate the coefficients – for example, the expressions given in (2.4) and (2.5) above – while the estimates, on the other hand, are the actual numerical values for the coefficients that are obtained from the sample.
2.5 Simple linear regression in EViews – estimation
of an optimal hedge ratio
This section shows how to run a bivariate regression using EViews. The
example considers the situation where an investor wishes to hedge a long position in the S&P500 (or its constituent stocks) using a short position in futures contracts. Many academic studies assume that the objective of hedging is to minimise the variance of the hedged portfolio returns. If this is the case, then the appropriate hedge ratio (the number of units of the futures asset to sell per unit of the spot asset held) will be the slope estimate (i.e. β̂) in a regression where the dependent variable is a time series of spot returns and the independent variable is a time series of futures returns.2
This regression will be run using the file 'SandPhedge.xls', which contains monthly prices for the S&P500 index (in column 2) and S&P500 futures (in column 3). As described in chapter 1, the first step is to open an appropriately dimensioned workfile. Open EViews and click on File/New/Workfile; choose Dated – regular frequency and Monthly frequency data. The start date is 2002:02 and the end date is 2007:07. Then import the Excel file by clicking Import and Read Text-Lotus-Excel. The data start in B2 and, as for the previous example in chapter 1, the first column contains only dates, which we do not need to read in. In 'Names for series or Number if named in file', we can write Spot Futures. The two imported series will now appear as objects in the workfile and can be verified by checking a couple of entries at random against the original Excel file.
The next step is to transform the levels of the two series into percentage returns. It is common in academic research to use continuously compounded returns rather than simple returns. To achieve this (i.e. to produce continuously compounded returns), click on Genr and in the 'Enter Equation' dialog box, enter rfutures=100*dlog(futures). Then click Genr again and do the same for the spot series: rspot=100*dlog(spot). Do not forget to Save the workfile. Continue to re-save it at regular intervals to ensure that no work is lost!
Before proceeding to estimate the regression, now that we have im-
ported more than one series, we can examine a number of descriptive statistics together and measures of association between the series. For example, click Quick and Group Statistics. From there you will see that it is possible to calculate the covariances or correlations between series and a number of other measures that will be discussed later in the book. For now, click on Descriptive Statistics and Common Sample.3 In the dialog box that appears, type rspot rfutures and click OK. Some summary statistics for the spot and futures are presented, as displayed in screenshot 2.1, and these are quite similar across the two series, as one would expect.
2 See chapter 8 for a detailed discussion of why this is the appropriate hedge ratio.
Screenshot 2.1 Summary statistics for spot and futures
Note that the number of observations has reduced from 66 for the levels of the series to 65 when we computed the returns (as one observation is 'lost' in constructing the t − 1 value of the prices in the returns formula). If you want to save the summary statistics, you must name them by clicking Name and then choose a name, e.g. Descstats. The default name is 'group01', which could have also been used. Click OK.
We can now proceed to estimate the regression. There are several ways to do this, but the easiest is to select Quick and then Estimate Equation. You will be presented with a dialog box, which, when it has been completed, will look like screenshot 2.2.
3 'Common sample' will use only the part of the sample that is available for all the series selected, whereas 'Individual sample' will use all available observations for each individual series. In this case, the number of observations is the same for both series and so identical results would be observed for both options.
Screenshot 2.2 Equation estimation window
In the ‘Equation Specification’ window, you insert the list of variables
to be used, with the dependent variable (y) first, and including a constant (c), so type rspot c rfutures. Note that it would have been possible to write this in an equation format as rspot = c(1) + c(2)*rfutures, but this is more
cumbersome.
In the ‘Estimation settings’ box, the default estimation method is OLS
and the default sample is the whole sample, and these need not be modified. Click OK and the regression results will appear, as in screenshot 2.3. The parameter estimates for the intercept (α̂) and slope (β̂) are 0.36 and 0.12 respectively. Name the regression results returnreg, and it will now appear as a new object in the list. A large number of other statistics are also presented in the regression output – the purpose and interpretation of these will be discussed later in this and subsequent chapters.
Screenshot 2.3 Estimation results
Now estimate a regression for the levels of the series rather than
the returns (i.e. run a regression of spot on a constant and futures) and
examine the parameter estimates. The return regression slope parameter estimated above measures the optimal hedge ratio and also measures the short run relationship between the two series. By contrast, the slope parameter in a regression using the raw spot and futures indices (or the log of the spot series and the log of the futures series) can be interpreted as measuring the long run relationship between them. This issue of the long and short runs will be discussed in detail in chapter 4. For now, click Quick/Estimate Equation and enter the variables spot c futures in the Equation Specification dialog box, click OK, then name the regression results 'levelreg'. The intercept estimate (α̂) in this regression is 21.11 and the slope estimate (β̂) is 0.98. The intercept can be considered to approximate the cost of carry, while, as expected, the long-term relationship between spot and futures prices is almost 1:1 – see chapter 7 for further discussion of the estimation and interpretation of this long-term relationship. Finally, click the Save button to save the whole workfile.
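For readers working outside EViews, a rough Python equivalent of the return-regression steps above is sketched below. It assumes the data sit in a spreadsheet with price columns named 'Spot' and 'Futures' (the column names, and the use of pandas and statsmodels, are assumptions for illustration, not part of the EViews instructions).

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Assumed: monthly S&P500 spot and futures price levels in columns 'Spot' and 'Futures'
    # (reading a .xls file may require the xlrd package).
    prices = pd.read_excel("SandPhedge.xls")

    # Continuously compounded percentage returns, the analogue of 100*dlog(.) in EViews;
    # one observation is lost, as noted in the text.
    rspot = 100 * np.log(prices["Spot"]).diff().dropna()
    rfutures = 100 * np.log(prices["Futures"]).diff().dropna()

    # Regress spot returns on a constant and futures returns; the slope is the hedge ratio.
    results = sm.OLS(rspot, sm.add_constant(rfutures)).fit()
    print(results.params)   # intercept and slope (the optimal hedge ratio)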
2.6 The assumptions underlying the classical linear regression model
The model y_t = α + βx_t + u_t that has been derived above, together with
the assumptions listed below, is known as the classical linear regression model
Box 2.3 Assumptions concerning disturbance terms and their interpretation
Technical notation            Interpretation
(1) E(u_t) = 0                The errors have zero mean
(2) var(u_t) = σ² < ∞         The variance of the errors is constant and finite over all values of x_t
(3) cov(u_i, u_j) = 0         The errors are linearly independent of one another
(4) cov(u_t, x_t) = 0         There is no relationship between the error and the corresponding x variate
(CLRM). Data for x_t is observable, but since y_t also depends on u_t, it is necessary to be specific about how the u_t are generated. The set of assumptions shown in box 2.3 are usually made concerning the u_t, the unobservable error or disturbance terms. Note that no assumptions are made concerning their observable counterparts, the estimated model's residuals.
As long as assumption 1 holds, assumption 4 can be equivalently written E(x_t u_t) = 0. Both formulations imply that the regressor is orthogonal to (i.e. unrelated to) the error term. An alternative assumption to 4, which is slightly stronger, is that the x_t are non-stochastic or fixed in repeated samples. This means that there is no sampling variation in x_t, and that its value is determined outside the model.
A fifth assumption is required to make valid inferences about the population parameters (the actual α and β) from the sample parameters (α̂ and β̂) estimated using a finite amount of data:
(5) u_t ∼ N(0, σ²) – i.e. u_t is normally distributed
2.7 Properties of the OLS estimator
If assumptions 1–4 hold, then the estimators α̂ and β̂ determined by OLS will have a number of desirable properties, and are known as Best Linear Unbiased Estimators (BLUE). What does this acronym stand for?
●'Estimator' – α̂ and β̂ are estimators of the true value of α and β
●'Linear' – α̂ and β̂ are linear estimators – that means that the formulae for α̂ and β̂ are linear combinations of the random variables (in this case, y)
●'Unbiased' – on average, the actual values of α̂ and β̂ will be equal to their true values
●'Best' – means that the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators; the Gauss–Markov theorem proves that the OLS estimator is best by examining an arbitrary alternative linear unbiased estimator and showing in all cases that it must have a variance no smaller than the OLS estimator.
Under assumptions 1–4 listed above, the OLS estimator can be shown
to have the desirable properties that it is consistent, unbiased and efficient. Unbiasedness and efficiency have already been discussed above, and consistency is an additional desirable property. These three characteristics will now be discussed in turn.
2.7.1 Consistency
The least squares estimators α̂ and β̂ are consistent. One way to state this algebraically for β̂ (with the obvious modifications made for α̂) is
lim_{T→∞} Pr[|β̂ − β| > δ] = 0   ∀ δ > 0    (2.17)
This is a technical way of stating that the probability (Pr) that β̂ is more than some arbitrary fixed distance δ away from its true value tends to zero as the sample size tends to infinity, for all positive values of δ. In the limit (i.e. for an infinite number of observations), the probability of the estimator being different from the true value is zero. That is, the estimates will converge to their true values as the sample size increases to infinity. Consistency is thus a large sample, or asymptotic property. The assumptions that E(x_t u_t) = 0 and E(u_t) = 0 are sufficient to derive the consistency of the OLS estimator.
2.7.2 Unbiasedness
The least squares estimators α̂ and β̂ are unbiased. That is,
E(α̂) = α    (2.18)
and
E(β̂) = β    (2.19)
Thus, on average, the estimated values for the coefficients will be equal to their true values. That is, there is no systematic overestimation or underestimation of the true coefficients. To prove this also requires the assumption that cov(u_t, x_t) = 0. Clearly, unbiasedness is a stronger condition than consistency, since it holds for small as well as large samples (i.e. for all sample sizes).
2.7.3 Efficiency
An estimator β̂ of a parameter β is said to be efficient if no other estimator has a smaller variance. Broadly speaking, if the estimator is efficient, it will be minimising the probability that it is a long way off from the true value of β. In other words, if the estimator is 'best', the uncertainty associated with estimation will be minimised for the class of linear unbiased estimators. A technical way to state this would be to say that an efficient estimator would have a probability distribution that is narrowly dispersed around the true value.
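A simple way to build intuition for unbiasedness and consistency is a small Monte Carlo experiment, such as the Python sketch below. It is an illustration only: the true parameter values and data generating process are arbitrary assumptions, not results from the text.

    import numpy as np

    rng = np.random.default_rng(42)
    alpha, beta = 2.0, 0.8    # assumed true values

    def ols_beta(T):
        # One simulated sample of size T and its OLS slope estimate
        x = rng.uniform(0, 10, T)
        y = alpha + beta * x + rng.normal(0, 1, T)
        return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

    for T in (20, 200, 2000):
        estimates = np.array([ols_beta(T) for _ in range(5000)])
        # The mean of the estimates stays close to the true beta (unbiasedness),
        # while their spread shrinks as T grows (consistency).
        print(T, estimates.mean().round(3), estimates.std().round(3))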
2.8 Precision and standard errors
Any set of regression estimates α̂ and β̂ are specific to the sample used in their estimation. In other words, if a different sample of data was selected from within the population, the data points (the x_t and y_t) will be different, leading to different values of the OLS estimates.
Recall that the OLS estimators (α̂ and β̂) are given by (2.4) and (2.5). It would be desirable to have an idea of how 'good' these estimates of α and β are, in the sense of having some measure of the reliability or precision of the estimators (α̂ and β̂). It is thus useful to know whether one can have confidence in the estimates, and whether they are likely to vary much from one sample to another sample within the given population. An idea of the sampling variability, and hence of the precision of the estimates, can be calculated using only the sample of data available. This estimate is given by its standard error. Given assumptions 1–4 above, valid estimators of the standard errors can be shown to be given by
SE(α̂) = s √[ Σx_t² / (T Σ(x_t − x̄)²) ] = s √[ Σx_t² / (T(Σx_t² − T x̄²)) ]    (2.20)
SE(β̂) = s √[ 1 / Σ(x_t − x̄)² ] = s √[ 1 / (Σx_t² − T x̄²) ]    (2.21)
where s is the estimated standard deviation of the residuals (see below). These formulae are derived in the appendix to this chapter.
It is worth noting that the standard errors give only a general indication
of the likely accuracy of the regression parameters. They do not show how accurate a particular set of coefficient estimates is. If the standard errors are small, it shows that the coefficients are likely to be precise on average, not how precise they are for this particular sample. Thus standard errors give a measure of the degree of uncertainty in the estimated values for the coefficients. It can be seen that they are a function of the actual observations on the explanatory variable, x, the sample size, T, and another term, s. The last of these is an estimate of the variance of the disturbance term. The actual variance of the disturbance term is usually denoted by σ². How can an estimate of σ² be obtained?
2.8.1 Estimating the variance of the error term (σ²)
From elementary statistics, the variance of a random variable u_t is given by
var(u_t) = E[u_t − E(u_t)]²    (2.22)
Assumption 1 of the CLRM was that the expected or average value of the errors is zero. Under this assumption, (2.22) above reduces to
var(u_t) = E[u_t²]    (2.23)
So what is required is an estimate of the average value of u_t², which could be calculated as
s² = (1/T) Σ u_t²    (2.24)
Unfortunately (2.24) is not workable since u_t is a series of population disturbances, which is not observable. Thus the sample counterpart to u_t, which is û_t, is used
s² = (1/T) Σ û_t²    (2.25)
But this estimator is a biased estimator of σ². An unbiased estimator, s², would be given by the following equation instead of the previous one
s² = Σ û_t² / (T − 2)    (2.26)
where Σ û_t² is the residual sum of squares, so that the quantity of relevance for the standard error formulae is the square root of (2.26)
s = √[ Σ û_t² / (T − 2) ]    (2.27)
s is also known as the standard error of the regression or the standard error of the estimate. It is sometimes used as a broad measure of the fit of the regression equation. Everything else being equal, the smaller this quantity is, the closer is the fit of the line to the actual data.
2.8.2 Some comments on the standard error estimators
It is possible, of course, to derive the formulae for the standard errors of the coefficient estimates from first principles using some algebra, and this is left to the appendix to this chapter. Some general intuition is now given as to why the formulae for the standard errors given by (2.20) and (2.21) contain the terms that they do and in the form that they do. The presentation offered in box 2.4 loosely follows that of Hill, Griffiths and Judge (1997), which is the clearest that this author has seen.
Box 2.4 Standard error estimators
(1) The larger the sample size, T, the smaller will be the coefficient standard errors.
T appears explicitly in SE(α̂) and implicitly in SE(β̂). T appears implicitly since the sum Σ(x_t − x̄)² is from t = 1 to T. The reason for this is simply that, at least for now, it is assumed that every observation on a series represents a piece of useful information which can be used to help determine the coefficient estimates. So the larger the size of the sample, the more information will have been used in estimation of the parameters, and hence the more confidence will be placed in those estimates.
(2) Both SE(α̂) and SE(β̂) depend on s² (or s). Recall from above that s² is the estimate of the error variance. The larger this quantity is, the more dispersed are the residuals, and so the greater is the uncertainty in the model. If s² is large, the data points are collectively a long way away from the line.
(3) The sum of the squares of the x_t about their mean appears in both formulae, since Σ(x_t − x̄)² appears in the denominators. The larger the sum of squares, the smaller the coefficient variances. Consider what happens if Σ(x_t − x̄)² is small or large, as shown in figures 2.7 and 2.8, respectively.
In figure 2.7, the data are close together so that Σ(x_t − x̄)² is small. In this first case, it is more difficult to determine with any degree of certainty exactly where the line should be. On the other hand, in figure 2.8, the points are widely dispersed across a long section of the line, so that one could hold more confidence in the estimates in this case.
Figure 2.7 Effect on the standard errors of the coefficient estimates when (x_t − x̄) are narrowly dispersed
(4) The term Σx_t² affects only the intercept standard error and not the slope standard error. The reason is that Σx_t² measures how far the points are away from the y-axis. Consider figures 2.9 and 2.10.
In figure 2.9, all of the points are bunched a long way from the y-axis, which makes it more difficult to accurately estimate the point at which the estimated line crosses the y-axis (the intercept). In figure 2.10, the points collectively are closer to the y-axis and hence it will be easier to determine where the line actually crosses the axis. Note that this intuition will work only in the case where all of the x_t are positive!
Figure 2.8 Effect on the standard errors of the coefficient estimates when (x_t − x̄) are widely dispersed
Figure 2.9 Effect on the standard errors of Σx_t² large
Figure 2.10 Effect on the standard errors of Σx_t² small
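The intuition in point (3) of box 2.4 can be illustrated numerically. The following Python sketch (an illustration only; the chosen parameter values are assumptions) computes SE(β̂) from (2.21) and (2.27) for one sample of narrowly dispersed x values and one sample of widely dispersed x values.

    import numpy as np

    rng = np.random.default_rng(7)
    T, sigma = 50, 2.0    # assumed sample size and error standard deviation

    def se_beta(x):
        # Simulate y, estimate the line by OLS and return the slope standard error (2.21)
        y = 1.0 + 0.5 * x + rng.normal(0, sigma, T)
        beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        alpha_hat = y.mean() - beta_hat * x.mean()
        resid = y - alpha_hat - beta_hat * x
        s = np.sqrt(np.sum(resid ** 2) / (T - 2))            # as in (2.27)
        return s / np.sqrt(np.sum((x - x.mean()) ** 2))      # as in (2.21)

    x_narrow = rng.uniform(9, 11, T)    # x values bunched together, as in figure 2.7
    x_wide = rng.uniform(0, 20, T)      # x values widely dispersed, as in figure 2.8
    print(se_beta(x_narrow), se_beta(x_wide))   # the second is much smaller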
Example 2.2
Assume that the following data have been calculated from a regression of y on a single variable x and a constant over 22 observations
Σx_t y_t = 830102,  T = 22,  x̄ = 416.5,  ȳ = 86.65,  Σx_t² = 3919654,  RSS = 130.6
Determine the appropriate values of the coefficient estimates and their standard errors.
This question can simply be answered by plugging the appropriate numbers into the formulae given above. The calculations are
β̂ = [830102 − (22 × 416.5 × 86.65)] / [3919654 − 22 × (416.5)²] = 0.35
α̂ = 86.65 − 0.35 × 416.5 = −59.12
The sample regression function would be written as
ŷ_t = α̂ + β̂x_t
ŷ_t = −59.12 + 0.35x_t
Now, turning to the standard error calculations, it is necessary to obtain an estimate, s, of the error variance
SE(regression), s = √[ Σû_t² / (T − 2) ] = √(130.6/20) = 2.55
SE(α̂) = 2.55 × √[ 3919654 / (22 × (3919654 − 22 × 416.5²)) ] = 3.35
SE(β̂) = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079
With the standard errors calculated, the results are written as
ŷ_t = −59.12 + 0.35x_t
        (3.35)    (0.0079)    (2.28)
The standard error estimates are usually placed in parentheses under the
relevant coefficient estimates.
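The arithmetic of example 2.2 can be checked with a few lines of Python, as in the sketch below; only the summary quantities quoted in the example are used.

    import numpy as np

    # Summary quantities from example 2.2
    sum_xy, T, xbar, ybar = 830102.0, 22, 416.5, 86.65
    sum_x2, rss = 3919654.0, 130.6

    beta_hat = (sum_xy - T * xbar * ybar) / (sum_x2 - T * xbar ** 2)     # approx. 0.35
    alpha_hat = ybar - beta_hat * xbar                                   # approx. -59.12

    s = np.sqrt(rss / (T - 2))                                           # (2.27): approx. 2.55
    se_alpha = s * np.sqrt(sum_x2 / (T * (sum_x2 - T * xbar ** 2)))      # (2.20): approx. 3.35
    se_beta = s * np.sqrt(1.0 / (sum_x2 - T * xbar ** 2))                # (2.21): approx. 0.0079
    print(round(beta_hat, 2), round(alpha_hat, 2), round(se_alpha, 2), round(se_beta, 4))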
2.9 An introduction to statistical inference
Often, financial theory will suggest that certain coefficients should take on particular values, or values within a given range. It is thus of interest to determine whether the relationships expected from financial theory are upheld by the data to hand or not. Estimates of α and β have been obtained from the sample, but these values are not of any particular interest; the population values that describe the true relationship between the variables would be of more interest, but are never available. Instead, inferences are made concerning the likely population values from the regression parameters that have been estimated from the sample of data to hand. In doing this, the aim is to determine whether the differences between the coefficient estimates that are actually obtained, and expectations arising from financial theory, are a long way from one another in a statistical sense.
Example 2.3
Suppose the following regression results have been calculated:
ŷ_t = 20.3 + 0.5091x_t
        (14.38)  (0.2561)    (2.29)
β̂ = 0.5091 is a single (point) estimate of the unknown population parameter, β. As stated above, the reliability of the point estimate is measured by the coefficient's standard error. The information from one or more of the sample coefficients and their standard errors can be used to make inferences about the population parameters. So the estimate of the slope coefficient is β̂ = 0.5091, but it is obvious that this number is likely to vary to some degree from one sample to the next. It might be of interest to answer the question, 'Is it plausible, given this estimate, that the true population parameter, β, could be 0.5? Is it plausible that β could be 1?', etc. Answers to these questions can be obtained through hypothesis testing.
2.9.1 Hypothesis testing: some concepts
In the hypothesis testing framework, there are always two hypotheses that
go together, known as the null hypothesis (denoted H0 or occasionally HN) and the alternative hypothesis (denoted H1 or occasionally HA). The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.
For example, suppose that given the regression results above, it is of
interest to test the hypothesis that the true value of β is in fact 0.5. The following notation would be used.
H0: β = 0.5
H1: β ≠ 0.5
This states that the hypothesis that the true but unknown value of β could be 0.5 is being tested against an alternative hypothesis where β is not 0.5. This would be known as a two-sided test, since the outcomes of both β < 0.5 and β > 0.5 are subsumed under the alternative hypothesis.
Sometimes, some prior information may be available, suggesting for
example that β > 0.5 would be expected rather than β < 0.5. In this case, β < 0.5 is no longer of interest to us, and hence a one-sided test would be conducted:
H0: β = 0.5
H1: β > 0.5
Here the null hypothesis that the true value of β is 0.5 is being tested against a one-sided alternative that β is more than 0.5.
On the other hand, one could envisage a situation where there is prior
information that β< 0.5 is expected. For example, suppose that an in-
vestment bank bought a piece of new risk management software that isintended to better track the riskiness inherent in its traders’ books andthatβis some measure of the risk that previously took the value 0.5.
Clearly, it would not make sense to expect the risk to have risen, and so
A brief overview of the classical linear regression model 53
β> 0.5, corresponding to an increase in risk, is not of interest. In this
case, the null and alternative hypotheses would be specified as
H0:β=0.5
H1:β< 0.5
This prior information should come from the financial theory of the prob-
lem under consideration, and not from an examination of the estimatedvalue of the coefficient. Note that there is always an equality under thenull hypothesis. So, for example, β< 0.5 would not be specified under
the null hypothesis.
There are two ways to conduct a hypothesis test: via the test of significance
approach or via the confidence interval approach. Both methods centre on a statistical comparison of the estimated value of the coefficient, and its value under the null hypothesis. In very general terms, if the estimated value is a long way away from the hypothesised value, the null hypothesis is likely to be rejected; if the value under the null hypothesis and the estimated value are close to one another, the null hypothesis is less likely to be rejected. For example, consider β̂ = 0.5091 as above. A hypothesis that the true value of β is 5 is more likely to be rejected than a null hypothesis that the true value of β is 0.5. What is required now is a statistical decision rule that will permit the formal testing of such hypotheses.
2.9.2 The probability distribution of the least squares estimators
In order to test hypotheses, assumption 5 of the CLRM must be used,
namely that u_t ∼ N(0, σ²) – i.e. that the error term is normally distributed. The normal distribution is a convenient one to use for it involves only two parameters (its mean and variance). This makes the algebra involved in statistical inference considerably simpler than it otherwise would have been. Since y_t depends partially on u_t, it can be stated that if u_t is normally distributed, y_t will also be normally distributed.
Further, since the least squares estimators are linear combinations of the random variables, i.e. β̂ = Σw_t y_t, where the w_t are effectively weights, and since the weighted sum of normal random variables is also normally distributed, it can be said that the coefficient estimates will also be normally distributed. Thus
α̂ ∼ N(α, var(α̂))  and  β̂ ∼ N(β, var(β̂))
Will the coefficient estimates still follow a normal distribution if the er-
rors do not follow a normal distribution? Well, briefly, the answer is usually 'yes', provided that the other assumptions of the CLRM hold, and the sample size is sufficiently large. The issue of non-normality, how to test for it, and its consequences, will be further discussed in chapter 4.
Figure 2.11 The normal distribution
Standard normal variables can be constructed from α̂ and β̂ by subtracting the mean and dividing by the square root of the variance
(α̂ − α)/√var(α̂) ∼ N(0, 1)  and  (β̂ − β)/√var(β̂) ∼ N(0, 1)
The square roots of the coefficient variances are the standard errors. Unfortunately, the standard errors of the true coefficient values under the PRF are never known – all that is available are their sample counterparts, the calculated standard errors of the coefficient estimates, SE(α̂) and SE(β̂).4
Replacing the true values of the standard errors with the sample estimated versions induces another source of uncertainty, and also means that the standardised statistics follow a t-distribution with T − 2 degrees of freedom (defined below) rather than a normal distribution, so
(α̂ − α)/SE(α̂) ∼ t_{T−2}  and  (β̂ − β)/SE(β̂) ∼ t_{T−2}
This result is not formally proved here. For a formal proof, see Hill,
Griffiths and Judge (1997, pp. 88–90).
2.9.3 A note on the t and the normal distributions
The normal distribution, shown in figure 2.11, should be familiar to readers. Note its characteristic 'bell' shape and its symmetry around the mean (of zero for a standard normal distribution).
4 Strictly, these are the estimated standard errors conditional on the parameter estimates, and so should be denoted SÊ(α̂) and SÊ(β̂), but the additional layer of hats will be omitted here since the meaning should be obvious from the context.
Table 2.2 Critical values from the standard normal versus t-distribution
Significance level (%)    N(0,1)    t40     t4
50                        0         0       0
5                         1.64      1.68    2.13
2.5                       1.96      2.02    2.78
0.5                       2.57      2.70    4.60
Figure 2.12 The t-distribution versus the normal distribution
A normal variate can be scaled to have zero mean and unit variance
by subtracting its mean and dividing by its standard deviation. There is a specific relationship between the t- and the standard normal distribution, and the t-distribution has another parameter, its degrees of freedom.
What does the t-distribution look like? It looks similar to a normal distribution, but with fatter tails, and a smaller peak at the mean, as shown in figure 2.12.
Some examples of the percentiles from the normal and t-distributions taken from the statistical tables are given in table 2.2. When used in the context of a hypothesis test, these percentiles become critical values. The values presented in table 2.2 would be those critical values appropriate for a one-sided test of the given significance level.
It can be seen that as the number of degrees of freedom for the t-
distribution increases from 4 to 40, the critical values fall substantially. In figure 2.12, this is represented by a gradual increase in the height of the distribution at the centre and a reduction in the fatness of the tails as the number of degrees of freedom increases. In the limit, a t-distribution with an infinite number of degrees of freedom is a standard normal, i.e. t_∞ = N(0, 1), so the normal distribution can be viewed as a special case of the t.
Putting the limit case, t_∞, aside, the critical values for the t-distribution are larger in absolute value than those from the standard normal. This arises from the increased uncertainty associated with the situation where the error variance must be estimated. So now the t-distribution is used, and for a given statistic to constitute the same amount of reliable evidence against the null, it has to be bigger in absolute value than in circumstances where the normal is applicable.
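The entries in table 2.2 can be reproduced, to within rounding, from any statistical package. A minimal Python sketch using scipy is shown below purely as an illustration; it is not part of the original text.

    from scipy import stats

    # One-sided critical values from the standard normal and t-distributions,
    # which should match the entries in table 2.2 to within rounding.
    for level in (0.50, 0.05, 0.025, 0.005):
        print(f"{level:>6.1%}",
              round(stats.norm.ppf(1 - level), 2),
              round(stats.t.ppf(1 - level, df=40), 2),
              round(stats.t.ppf(1 - level, df=4), 2))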
There are broadly two approaches to testing hypotheses under regres-
sion analysis: the test of significance approach and the confidence interval approach. Each of these will now be considered in turn.
2.9.4 The test of significance approach
Assume the regression equation is given by y_t = α + βx_t + u_t, t = 1, 2, . . . , T. The steps involved in doing a test of significance are shown in box 2.5.
Box 2.5 Conducting a test of significance
(1) Estimate α̂, β̂ and SE(α̂), SE(β̂) in the usual way.
(2) Calculate the test statistic. This is given by the formula
test statistic = (β̂ − β*) / SE(β̂)    (2.30)
where β* is the value of β under the null hypothesis. The null hypothesis is H0: β = β* and the alternative hypothesis is H1: β ≠ β* (for a two-sided test).
(3) A tabulated distribution with which to compare the estimated test statistics is required. Test statistics derived in this way can be shown to follow a t-distribution with T − 2 degrees of freedom.
(4) Choose a 'significance level', often denoted α (not the same as the regression intercept coefficient). It is conventional to use a significance level of 5%.
(5) Given a significance level, a rejection region and non-rejection region can be determined. If a 5% significance level is employed, this means that 5% of the total distribution (5% of the area under the curve) will be in the rejection region. That rejection region can either be split in half (for a two-sided test) or it can all fall on one side of the y-axis, as is the case for a one-sided test.
For a two-sided test, the 5% rejection region is split equally between the two tails,
as shown in figure 2.13.
For a one-sided test, the 5% rejection region is located solely in one tail of the
distribution, as shown in figures 2.14 and 2.15, for a test where the alternative is of the 'less than' form, and where the alternative is of the 'greater than' form, respectively.
Figure 2.13 Rejection regions for a two-sided 5% hypothesis test
Figure 2.14 Rejection region for a one-sided hypothesis test of the form H0: β = β*, H1: β < β*
Figure 2.15 Rejection region for a one-sided hypothesis test of the form H0: β = β*, H1: β > β*
Box 2.5 contd.
(6) Use the t-tables to obtain a critical value or values with which to compare the test
statistic. The critical value will be that value of x that puts 5% into the rejection region.
(7) Finally perform the test. If the test statistic lies in the rejection region then reject
the null hypothesis (H0), else do not reject H0.
Steps 2–7 require further comment. In step 2, the estimated value of β is compared with the value that is subject to test under the null hypothesis, but this difference is 'normalised' or scaled by the standard error of the coefficient estimate. The standard error is a measure of how confident one is in the coefficient estimate obtained in the first stage. If a standard error is small, the value of the test statistic will be large relative to the case where the standard error is large. For a small standard error, it would not require the estimated and hypothesised values to be far away from one another for the null hypothesis to be rejected. Dividing by the standard error also ensures that, under the five CLRM assumptions, the test statistic follows a tabulated distribution.
In this context, the number of degrees of freedom can be interpreted as the number of pieces of additional information beyond the minimum requirement. If two parameters are estimated (α and β – the intercept and the slope of the line, respectively), a minimum of two observations is required to fit this line to the data. As the number of degrees of freedom increases, the critical values in the tables decrease in absolute terms, since less caution is required and one can be more confident that the results are appropriate.
The significance level is also sometimes called the size of the test (note that this is completely different from the size of the sample) and it determines the region where the null hypothesis under test will be rejected or not rejected. Remember that the distributions in figures 2.13–2.15 are for a random variable. Purely by chance, a random variable will take on extreme values (either large and positive values or large and negative values) occasionally. More specifically, a significance level of 5% means that a result as extreme as this or more extreme would be expected only 5% of the time as a consequence of chance alone. To give one illustration, if the 5% critical value for a one-sided test is 1.68, this implies that the test statistic would be expected to be greater than this only 5% of the time by chance alone. There is nothing magical about the test – all that is done is to specify an arbitrary cutoff value for the test statistic that determines whether the null hypothesis would be rejected or not. It is conventional to use a 5% size of test, but 10% and 1% are also commonly used.
However, one potential problem with the use of a fixed (e.g. 5%) size
of test is that if the sample size is sufficiently large, any null hypothesis can be rejected. This is particularly worrisome in finance, where tens of thousands of observations or more are often available. What happens is that the standard errors reduce as the sample size increases, thus leading to an increase in the value of all t-test statistics. This problem is frequently overlooked in empirical work, but some econometricians have suggested that a lower size of test (e.g. 1%) should be used for large samples (see, for example, Leamer, 1978, for a discussion of these issues).
Note also the use of terminology in connection with hypothesis tests: it is said that the null hypothesis is either rejected or not rejected. It is incorrect to state that if the null hypothesis is not rejected, it is 'accepted' (although this error is frequently made in practice), and it is never said that the alternative hypothesis is accepted or rejected. One reason why it is not sensible to say that the null hypothesis is 'accepted' is that it is impossible to know whether the null is actually true or not! In any given situation, many null hypotheses will not be rejected. For example, suppose that H0: β = 0.5 and H0: β = 1 are separately tested against the relevant two-sided alternatives and neither null is rejected. Clearly then it would not make sense to say that 'H0: β = 0.5 is accepted' and 'H0: β = 1 is accepted', since the true (but unknown) value of β cannot be both 0.5 and 1. So, to summarise, the null hypothesis is either rejected or not rejected on the basis of the available evidence.
2.9.5 The confidence interval approach to hypothesis testing (box 2.6)
To give an example of its usage, one might estimate a parameter, say β̂, to be 0.93, and a '95% confidence interval' to be (0.77, 1.09). This means that in many repeated samples, 95% of the time the true value of β will be contained within this interval. Confidence intervals are almost invariably estimated in a two-sided form, although in theory a one-sided interval can be constructed. Constructing a 95% confidence interval is equivalent to using the 5% level in a test of significance.
2.9.6 The test of significance and confidence interval approaches always
give the same conclusion
Under the test of significance approach, the null hypothesis that β = β* will not be rejected if the test statistic lies within the non-rejection region, i.e. if the following condition holds
−t_crit ≤ (β̂ − β*)/SE(β̂) ≤ +t_crit
Box 2.6 Carrying out a hypothesis test using confidence intervals
(1) Calculate α̂, β̂ and SE(α̂), SE(β̂) as before.
(2) Choose a significance level, α (again the convention is 5%). This is equivalent to choosing a (1 − α)×100% confidence interval, i.e. a 5% significance level = a 95% confidence interval.
(3) Use the t-tables to find the appropriate critical value, which will again have T − 2 degrees of freedom.
(4) The confidence interval for β is given by
(β̂ − t_crit · SE(β̂), β̂ + t_crit · SE(β̂))
Note that a centre dot (·) is sometimes used instead of a cross (×) to denote when two quantities are multiplied together.
(5) Perform the test: if the hypothesised value of β (i.e. β*) lies outside the confidence interval, then reject the null hypothesis that β = β*, otherwise do not reject the null.
Rearranging, the null hypothesis would not be rejected if
−t_crit · SE(β̂) ≤ β̂ − β* ≤ +t_crit · SE(β̂)
i.e. one would not reject if
β̂ − t_crit · SE(β̂) ≤ β* ≤ β̂ + t_crit · SE(β̂)
But this is just the rule for non-rejection under the confidence interval approach. So it will always be the case that, for a given significance level, the test of significance and confidence interval approaches will provide the same conclusion by construction. One testing approach is simply an algebraic rearrangement of the other.
Example 2.4
Given the regression results above
ŷ_t = 20.3 + 0.5091x_t,  T = 22
        (14.38)  (0.2561)    (2.31)
Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative. This hypothesis might be of interest, for a unit coefficient on the explanatory variable implies a 1:1 relationship between movements in x and movements in y.
The null and alternative hypotheses are respectively:
H0: β = 1
H1: β ≠ 1
Box 2.7 The test of significance and confidence interval approaches compared
Test of significance approach:
test statistic = (β̂ − β*)/SE(β̂) = (0.5091 − 1)/0.2561 = −1.917
Find t_crit = t_{20;5%} = ±2.086
Do not reject H0 since the test statistic lies within the non-rejection region.
Confidence interval approach:
Find t_crit = t_{20;5%} = ±2.086
β̂ ± t_crit · SE(β̂) = 0.5091 ± 2.086 × 0.2561 = (−0.0251, 1.0433)
Do not reject H0 since 1 lies within the confidence interval.
The results of the test according to each approach are shown in box 2.7. A couple of comments are in order. First, the critical value from the t-distribution that is required is for 20 degrees of freedom and at the 5% level. This means that 5% of the total distribution will be in the rejection region, and since this is a two-sided test, 2.5% of the distribution is required to be contained in each tail. From the symmetry of the t-distribution around zero, the critical values in the upper and lower tail will be equal in magnitude, but opposite in sign, as shown in figure 2.16.
Figure 2.16 Critical values and rejection regions for a t_{20;5%} two-sided test
What if instead the researcher wanted to test H0: β = 0 or H0: β = 2? In order to test these hypotheses using the test of significance approach, the test statistic would have to be reconstructed in each case, although the critical value would be the same. On the other hand, no additional work would be required if the confidence interval approach had been adopted, since it effectively permits the testing of an infinite number of hypotheses.
So for example, suppose that the researcher wanted to test
H0: β = 0
versus
H1: β ≠ 0
and
H0: β = 2
versus
H1: β ≠ 2
In the first case, the null hypothesis (that β = 0) would not be rejected since 0 lies within the 95% confidence interval. By the same argument, the second null hypothesis (that β = 2) would be rejected since 2 lies outside the estimated confidence interval.
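A short Python sketch reproducing the example 2.4 and box 2.7 calculations, and applying the confidence interval approach to the three hypothesised values discussed above, is given below (illustrative only).

    from scipy import stats

    # Numbers from example 2.4 / box 2.7
    beta_hat, se, T = 0.5091, 0.2561, 22
    t_crit = stats.t.ppf(0.975, T - 2)                 # approx. 2.086 for 20 degrees of freedom

    test_stat = (beta_hat - 1) / se                    # approx. -1.917: do not reject H0: beta = 1
    ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)   # approx. (-0.0251, 1.0433)

    for beta_star in (0.0, 1.0, 2.0):
        rejected = not (ci[0] <= beta_star <= ci[1])
        print(beta_star, "rejected" if rejected else "not rejected")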
On the other hand, note that this book has so far considered only the
results under a 5% size of test. In marginal cases (e.g. H0: β = 1, where the test statistic and critical value are close together), a completely different answer may arise if a different size of test was used. This is where the test of significance approach is preferable to the construction of a confidence interval.
For example, suppose that now a 10% size of test is used for the null hypothesis given in example 2.4. Using the test of significance approach,
test statistic = (β̂ − β*)/SE(β̂) = (0.5091 − 1)/0.2561 = −1.917
as above. The only thing that changes is the critical t-value. At the 10% level (so that 5% of the total distribution is placed in each of the tails for this two-sided test), the required critical value is t_{20;10%} = ±1.725. So now, as the test statistic lies in the rejection region, H0 would be rejected. In order to use a 10% test under the confidence interval approach, the interval itself would have to have been re-estimated, since the critical value is embedded in the calculation of the confidence interval.
So the test of significance and confidence interval approaches both have
their relative merits. The testing of a number of different hypotheses is easier under the confidence interval approach, while a consideration of the effect of the size of the test on the conclusion is easier to address under the test of significance approach.
Caution should therefore be used when placing emphasis on or making decisions in the context of marginal cases (i.e. in cases where the null is only just rejected or not rejected). In this situation, the appropriate conclusion to draw is that the results are marginal and that no strong inference can be made one way or the other. A thorough empirical analysis should involve conducting a sensitivity analysis on the results to determine whether using a different size of test alters the conclusions. It is worth stating again that it is conventional to consider sizes of test of 10%, 5% and 1%. If the conclusion (i.e. 'reject' or 'do not reject') is robust to changes in the size of the test, then one can be more confident that the conclusions are appropriate. If the outcome of the test is qualitatively altered when the size of the test is modified, the conclusion must be that there is no conclusion one way or the other!
It is also worth noting that if a given null hypothesis is rejected using a 1% significance level, it will also automatically be rejected at the 5% level, so that there is no need to actually state the latter. Dougherty (1992, p. 100) gives the analogy of a high jumper. If the high jumper can clear 2 metres, it is obvious that the jumper could also clear 1.5 metres. The 1% significance level is a higher hurdle than the 5% significance level. Similarly, if the null is not rejected at the 5% level of significance, it will automatically not be rejected at any stronger level of significance (e.g. 1%). In this case, if the jumper cannot clear 1.5 metres, there is no way s/he will be able to clear 2 metres.
2.9.7 Some more terminology
If the null hypothesis is rejected at the 5% level, it would be said that the result of the test is 'statistically significant'. If the null hypothesis is not rejected, it would be said that the result of the test is 'not significant', or that it is 'insignificant'. Finally, if the null hypothesis is rejected at the 1% level, the result is termed 'highly statistically significant'.
Note that a statistically significant result may be of no practical significance. For example, if the estimated beta for a stock under a CAPM regression is 1.05, and a null hypothesis that β = 1 is rejected, the result will be statistically significant. But it may be the case that a slightly higher beta will make no difference to an investor's choice as to whether to buy the stock or not. In that case, one would say that the result of the test was statistically significant but financially or practically insignificant.
Table 2.3 Classifying hypothesis testing errors and correct conclusions
                                        Reality
                               H0 is true          H0 is false
Result of test:
Significant (reject H0)        Type I error = α    √ (correct)
Insignificant
(do not reject H0)             √ (correct)         Type II error = β
2.9.8 Classifying the errors that can be made using hypothesis tests
H0 is usually rejected if the test statistic is statistically significant at a chosen significance level. There are two possible errors that could be made:
(1) Rejecting H0 when it was really true; this is called a type I error.
(2) Not rejecting H0 when it was in fact false; this is called a type II error.
The possible scenarios can be summarised in table 2.3.
The probability of a type I error is just α, the significance level or size of test chosen. To see this, recall what is meant by 'significance' at the 5% level: it is only 5% likely that a result as or more extreme as this could have occurred purely by chance. Or, to put this another way, it is only 5% likely that this null would be rejected when it was in fact true.
Note that there is no chance for a free lunch (i.e. a cost-less gain) here!
What happens if the size of the test is reduced (e.g. from a 5% test to a 1% test)? The chances of making a type I error would be reduced ... but so would the probability that the null hypothesis would be rejected at all, so increasing the probability of a type II error. The two competing effects of reducing the size of the test can be shown in box 2.8.
Box 2.8 Type I and Type II errors
Reducing the size of the test (e.g. from 5% to 1%) imposes a more strict criterion for rejection, so that the null hypothesis is rejected less often. As a result, the test is less likely to falsely reject a true null (a lower chance of a type I error), but more likely to incorrectly fail to reject a false null (a higher chance of a type II error).
There always exists, therefore, a direct trade-off between type I and type II errors when choosing a significance level. The only way to reduce the chances of both is to increase the sample size or to select a sample with more variation, thus increasing the amount of information upon which the results of the hypothesis test are based. In practice, up to a certain level, type I errors are usually considered more serious and hence a small size of test is usually chosen (5% or 1% are the most common).
The probability of a type I error is the probability of incorrectly reject-
ing a correct null hypothesis, which is also the size of the test. Another important piece of terminology in this area is the power of a test. The power of a test is defined as the probability of (appropriately) rejecting an incorrect null hypothesis. The power of the test is also equal to one minus the probability of a type II error.
An optimal test would be one with an actual test size that matched the nominal size and which had as high a power as possible. Such a test would imply, for example, that using a 5% significance level would result in the null being rejected exactly 5% of the time by chance alone, and that an incorrect null hypothesis would be rejected close to 100% of the time.
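The trade-off between size and power can be illustrated by simulation. The Python sketch below uses an arbitrary assumed data generating process (all parameter values are assumptions) to estimate the empirical rejection rate of a 5% two-sided t-test when the null is true (the size) and when it is false (the power).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    T, reps, sig_level = 50, 2000, 0.05
    t_crit = stats.t.ppf(1 - sig_level / 2, T - 2)

    def reject_rate(true_beta, beta_star=0.5):
        """Share of replications in which H0: beta = beta_star is rejected."""
        rejections = 0
        for _ in range(reps):
            x = rng.uniform(0, 10, T)
            y = 1.0 + true_beta * x + rng.normal(0, 1, T)
            sxx = np.sum((x - x.mean()) ** 2)
            beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
            resid = y - (y.mean() - beta_hat * x.mean()) - beta_hat * x
            se_beta = np.sqrt(np.sum(resid ** 2) / (T - 2) / sxx)
            rejections += abs((beta_hat - beta_star) / se_beta) > t_crit
        return rejections / reps

    print(reject_rate(true_beta=0.5))   # empirical size: close to 0.05 (type I error rate)
    print(reject_rate(true_beta=0.6))   # power against beta = 0.6: one minus the type II error rate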
2.10 A special type of hypothesis test: the t-ratio
Recall that the formula under a test of significance approach to hypothesis
testing using a t-test for the slope parameter was
test statistic = (β̂ − β*) / SE(β̂)    (2.32)
with the obvious adjustments to test a hypothesis about the intercept. If the test is
H0: β = 0
H1: β ≠ 0
i.e. a test that the population parameter is zero against a two-sided alternative, this is known as a t-ratio test. Since β* = 0, the expression in (2.32) collapses to
test statistic = β̂ / SE(β̂)    (2.33)
Thus the ratio of the coefficient to its standard error, given by this expression, is known as the t-ratio or t-statistic.
Example 2.5
Suppose that we have calculated the estimates for the intercept and the
slope (1.10 and −19.88 respectively) and their corresponding standard errors (1.35 and 1.98 respectively). The t-ratios associated with each of the intercept and slope coefficients would be given by
                α̂         β̂
Coefficient     1.10      −19.88
SE              1.35        1.98
t-ratio         0.81      −10.04
Note that if a coefficient is negative, its t-ratio will also be negative. In order to test (separately) the null hypotheses that α = 0 and β = 0, the test statistics would be compared with the appropriate critical value from a t-distribution. In this case, the number of degrees of freedom, given by T − k, is equal to 15 − 3 = 12. The 5% critical value for this two-sided test (remember, 2.5% in each tail for a 5% test) is 2.179, while the 1% two-sided critical value (0.5% in each tail) is 3.055. Given these t-ratios and critical values, would the following null hypotheses be rejected?
H0: α = 0?  (No)
H0: β = 0?  (Yes)
If H0 is rejected, it would be said that the test statistic is significant. If the variable is not 'significant', it means that while the estimated value of the coefficient is not exactly zero (e.g. 1.10 in the example above), the coefficient is indistinguishable statistically from zero. If a zero were placed in the fitted equation instead of the estimated value, this would mean that whatever happened to the value of that explanatory variable, the dependent variable would be unaffected. This would then be taken to mean that the variable is not helping to explain variations in y, and that it could therefore be removed from the regression equation. For example, if the t-ratio associated with x had been −1.04 rather than −10.04 (assuming that the standard error stayed the same), the variable would be classed as insignificant (i.e. not statistically different from zero). The only insignificant term in the above regression is the intercept. There are good statistical reasons for always retaining the constant, even if it is not significant; see chapter 4.
It is worth noting that, for degrees of freedom greater than around 25,
the 5% two-sided critical value is approximately ±2. So, as a rule of thumb
(i.e. a rough guide), the null hypothesis would be rejected if the t-statistic
exceeds 2 in absolute value.
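The t-ratios and decisions in example 2.5 can be reproduced as in the minimal Python sketch below; the critical values are obtained from scipy rather than from statistical tables.

    from scipy import stats

    # Estimates and standard errors from example 2.5
    coefficients = {"intercept (alpha-hat)": (1.10, 1.35),
                    "slope (beta-hat)": (-19.88, 1.98)}
    df = 12
    crit_5pct = stats.t.ppf(0.975, df)    # approx. 2.179
    crit_1pct = stats.t.ppf(0.995, df)    # approx. 3.055

    for name, (estimate, se) in coefficients.items():
        t_ratio = estimate / se           # (2.33)
        verdict = "significant at 5%" if abs(t_ratio) > crit_5pct else "not significant"
        print(name, round(t_ratio, 2), verdict)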
Some authors place the t-ratios in parentheses below the corresponding
coefficient estimates rather than the standard errors. One thus needs to check which convention is being used in each particular application, and also to state this clearly when presenting estimation results.
There will now follow two finance case studies that involve only the estimation of bivariate linear regression models and the construction and interpretation of t-ratios.
2.11 An example of the use of a simple t-test to test a theory in
finance: can US mutual funds beat the market?
Jensen (1968) was the first to systematically test the performance of mutual
funds, and in particular to examine whether any 'beat the market'. He used a sample of annual returns on the portfolios of 115 mutual funds from 1945–64. Each of the 115 funds was subjected to a separate OLS time series regression of the form
R_jt − R_ft = α_j + β_j(R_mt − R_ft) + u_jt    (2.52)
where R_jt is the return on portfolio j at time t, R_ft is the return on a risk-free proxy (a 1-year government bond), R_mt is the return on a market portfolio proxy, u_jt is an error term, and α_j, β_j are parameters to be estimated. The quantity of interest is the significance of α_j, since this parameter defines whether the fund outperforms or underperforms the market index. Thus the null hypothesis is given by H0: α_j = 0. A positive and significant α_j for a given fund would suggest that the fund is able to earn significant abnormal returns in excess of the market-required return for a fund of this given riskiness. This coefficient has become known as 'Jensen's alpha'. Some summary statistics across the 115 funds for the estimated regression results for (2.52) are given in table 2.4.
Table 2.4 Summary statistics for the estimated regression results for (2.52)
                                            Extremal values
Item           Mean value    Median value   Minimum    Maximum
α̂              −0.011        −0.009         −0.080      0.058
β̂               0.840         0.848          0.219      1.405
Sample size     17            19             10         20
Source: Jensen (1968). Reprinted with the permission of Blackwell Publishers.
Figure 2.17 Frequency distribution of t-ratios of mutual fund alphas (gross of transactions costs). Source: Jensen (1968). Reprinted with the permission of Blackwell Publishers
Figure 2.18 Frequency distribution of t-ratios of mutual fund alphas (net of transactions costs). Source: Jensen (1968). Reprinted with the permission of Blackwell Publishers
As table 2.4 shows, the average (defined as either the mean or the me-
dian) fund was unable to 'beat the market', recording a negative alpha in both cases. There were, however, some funds that did manage to perform significantly better than expected given their level of risk, with the best fund of all yielding an alpha of 0.058. Interestingly, the average fund had a beta estimate of around 0.85, indicating that, in the CAPM context, most funds were less risky than the market index. This result may be attributable to the funds investing predominantly in (mature) blue chip stocks rather than small caps.
The most visual method of presenting the results was obtained by plotting the number of mutual funds in each t-ratio category for the alpha coefficient, first gross and then net of transactions costs, as in figures 2.17 and 2.18, respectively.
Table 2.5 Summary statistics for unit trust returns, January 1979–May 2000

                                       Mean (%)   Minimum (%)   Maximum (%)   Median (%)
Average monthly return, 1979–2000        1.0         0.6           1.4           1.0
Standard deviation of returns over time  5.1         4.3           6.9           5.0
The appropriate critical value for a two-sided test of α_j = 0 is approximately 2.10 (assuming 20 years of annual data leading to 18 degrees of freedom). As can be seen, only five funds have estimated t-ratios greater than 2 and are therefore implied to have been able to outperform the market before transactions costs are taken into account. Interestingly, five firms have also significantly underperformed the market, with t-ratios of −2 or less.

When transactions costs are taken into account (figure 2.18), only one fund out of 115 is able to significantly outperform the market, while 14 significantly underperform it. Given that a nominal 5% two-sided size of test is being used, one would expect two or three funds to 'significantly beat the market' by chance alone. It would thus be concluded that, during the sample period studied, US fund managers appeared unable to systematically generate positive abnormal returns.
2.12 Can UK unit trust managers beat the market?

Jensen's study has proved pivotal in suggesting a method for conducting empirical tests of the performance of fund managers. However, it has been criticised on several grounds. One of the most important of these in the context of this book is that only between 10 and 20 annual observations were used for each regression. Such a small number of observations is really insufficient for the asymptotic theory underlying the testing procedure to be validly invoked.

A variant on Jensen's test is now estimated in the context of the UK market, by considering monthly returns on 76 equity unit trusts. The data cover the period January 1979–May 2000 (257 observations for each fund). Some summary statistics for the funds are presented in table 2.5.
From these summary statistics, the average continuously compounded return is 1.0% per month, although the most interesting feature is the wide variation in the performances of the funds. The worst-performing fund yields an average return of 0.6% per month over the sample period, while the best would give 1.4% per month. This variability is further demonstrated in figure 2.19, which plots over time the value of £100 invested in each of the funds in January 1979.

Table 2.6 CAPM regression results for unit trust returns, January 1979–May 2000

Estimates of    Mean     Minimum   Maximum   Median
α (%)           −0.02    −0.54      0.33     −0.03
β                0.91     0.56      1.09      0.91
t-ratio on α    −0.07    −2.44      3.11     −0.25

Figure 2.19 Performance of UK unit trusts, 1979–2000
A regression of the form (2.52) is applied to the UK data, and the summary results presented in table 2.6. A number of features of the regression results are worthy of further comment. First, most of the funds have estimated betas less than one again, perhaps suggesting that the fund managers have historically been risk-averse or investing disproportionately in blue chip companies in mature sectors. Second, gross of transactions costs, nine funds of the sample of 76 were able to significantly outperform the market by providing a significant positive alpha, while seven funds yielded significant negative alphas. The average fund (where 'average' is measured using either the mean or the median) is not able to earn any excess return over the required rate given its level of risk.
Box 2.9 Reasons for stock market overreactions

(1) That the 'overreaction effect' is just another manifestation of the 'size effect'. The size effect is the tendency of small firms to generate, on average, superior returns to large firms. The argument would follow that the losers were small firms and that these small firms would subsequently outperform the large firms. DeBondt and Thaler did not believe this a sufficient explanation, but Zarowin (1990) found that allowing for firm size did reduce the subsequent return on the losers.

(2) That the reversals of fortune reflect changes in equilibrium required returns. The losers are argued to be likely to have considerably higher CAPM betas, reflecting investors' perceptions that they are more risky. Of course, betas can change over time, and a substantial fall in the firms' share prices (for the losers) would lead to a rise in their leverage ratios, leading in all likelihood to an increase in their perceived riskiness. Therefore, the required rate of return on the losers will be larger, and their ex post performance better. Ball and Kothari (1989) find the CAPM betas of losers to be considerably higher than those of winners.
2.13 The overreaction hypothesis and the UK stock market

2.13.1 Motivation

Two studies by DeBondt and Thaler (1985, 1987) showed that stocks experiencing a poor performance over a 3–5-year period subsequently tend to outperform stocks that had previously performed relatively well. This implies that, on average, stocks which are 'losers' in terms of their returns subsequently become 'winners', and vice versa. This chapter now examines a paper by Clare and Thomas (1995) that conducts a similar study using monthly UK stock returns from January 1955 to 1990 (36 years) on all firms traded on the London Stock Exchange.

This phenomenon seems at first blush to be inconsistent with the efficient markets hypothesis, and Clare and Thomas propose two explanations (box 2.9).

Zarowin (1990) also finds that 80% of the extra return available from holding the losers accrues to investors in January, so that almost all of the 'overreaction effect' seems to occur at the start of the calendar year.
2.13.2 Methodology

Clare and Thomas take a random sample of 1,000 firms and, for each, they calculate the monthly excess return of the stock over the market for a 12-, 24- or 36-month period for each stock i

U_{it} = R_{it} - R_{mt}    t = 1, ..., n;  i = 1, ..., 1000;  n = 12, 24 or 36    (2.53)
Box 2.10 Ranking stocks and forming portfolios

Portfolio      Ranking
Portfolio 1    Best performing 20% of firms
Portfolio 2    Next 20%
Portfolio 3    Next 20%
Portfolio 4    Next 20%
Portfolio 5    Worst performing 20% of firms
Box 2.11 Portfolio monitoring

Estimate R̄_i for year 1
Monitor portfolios for year 2
Estimate R̄_i for year 3
...
Monitor portfolios for year 36
Then the average monthly return over each stock i for the first 12-, 24-, or 36-month period is calculated:

\bar{R}_i = \frac{1}{n} \sum_{t=1}^{n} U_{it}    (2.54)
The stocks are then ranked from highest average return to lowest and, from these, five portfolios are formed and returns are calculated assuming an equal weighting of stocks in each portfolio (box 2.10).
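A minimal pandas sketch of this ranking step is given below; the DataFrame, column names and simulated returns are invented for illustration and are not the Clare and Thomas data.

# Sketch: rank stocks on formation-period average returns and assign quintile portfolios.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
avg_ret = pd.DataFrame({"stock": [f"s{i}" for i in range(1000)],
                        "rbar": rng.normal(0.0, 0.01, 1000)})   # simulated average excess returns

# Portfolio 1 = best-performing 20%, portfolio 5 = worst-performing 20% (as in box 2.10)
avg_ret["portfolio"] = pd.qcut(avg_ret["rbar"].rank(ascending=False),
                               5, labels=[1, 2, 3, 4, 5])
print(avg_ret.groupby("portfolio", observed=True)["rbar"].mean())   # equally weighted averages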
The same sample length n is used to monitor the performance of each portfolio. Thus, for example, if the portfolio formation period is one, two or three years, the subsequent portfolio tracking period will also be one, two or three years, respectively. Then another portfolio formation period follows and so on until the sample period has been exhausted. How many samples of length n will there be? n = 1, 2, or 3 years. First, suppose n = 1 year. The procedure adopted would be as shown in box 2.11.

So if n = 1, there are 18 independent (non-overlapping) observation periods and 18 independent tracking periods. By similar arguments, n = 2 gives 9 independent periods and n = 3 gives 6 independent periods. The mean returns for each month over the 18, 9, or 6 periods for the winner and loser portfolios (the top 20% and bottom 20% of firms in the portfolio formation period) are denoted by R̄^W_pt and R̄^L_pt, respectively. Define the difference between these as R̄_Dt = R̄^L_pt − R̄^W_pt.
Table 2.7 Is there an overreaction effect in the UK stock market?

Panel A: All months
                                          n = 12       n = 24       n = 36
Return on loser                            0.0033       0.0011       0.0129
Return on winner                           0.0036      −0.0003       0.0115
Implied annualised return difference      −0.37%        1.68%        1.56%
Coefficient for (2.55): α̂1               −0.00031      0.0014**     0.0013
                                           (0.29)       (2.01)       (1.55)
Coefficients for (2.56): α̂2              −0.00034      0.00147**    0.0013*
                                          (−0.30)       (2.01)       (1.41)
Coefficients for (2.56): β̂               −0.022        0.010       −0.0025
                                          (−0.25)       (0.21)      (−0.06)

Panel B: All months except January
Coefficient for (2.55): α̂1               −0.0007       0.0012*      0.0009
                                          (−0.72)       (1.63)       (1.05)

Notes: t-ratios in parentheses; * and ** denote significance at the 10% and 5% levels, respectively.
Source: Clare and Thomas (1995). Reprinted with the permission of Blackwell Publishers.
The first regression to be performed is of the excess return of the losers over the winners on a constant only

\bar{R}_{Dt} = \alpha_1 + \eta_t    (2.55)
where η_t is an error term. The test is of whether α_1 is significant and positive. However, a significant and positive α_1 is not a sufficient condition for the overreaction effect to be confirmed, because it could simply reflect higher required returns on loser stocks if loser stocks are more risky. The solution, Clare and Thomas (1995) argue, is to allow for risk differences by regressing against the market risk premium
\bar{R}_{Dt} = \alpha_2 + \beta(R_{mt} - R_{ft}) + \eta_t    (2.56)

where R_mt is the return on the FTA All-share, and R_ft is the return on a UK government three-month Treasury Bill. The results for each of these two regressions are presented in table 2.7.
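To make the two test regressions concrete, the following Python/statsmodels sketch estimates (2.55) and (2.56) on simulated series standing in for R̄_Dt and the market risk premium; all numbers and names are illustrative assumptions, not the Clare and Thomas data.

# Sketch of regressions (2.55) and (2.56) on simulated series.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_obs = 216                                 # e.g. 18 one-year tracking periods of monthly data
rm_rf = rng.normal(0.005, 0.04, n_obs)      # simulated market risk premium
rd = 0.0014 + rng.normal(0, 0.01, n_obs)    # simulated loser-minus-winner return

# (2.55): regression on a constant only; the intercept estimate equals the mean of rd
res1 = sm.OLS(rd, np.ones(n_obs)).fit()
print(res1.params[0], rd.mean())

# (2.56): add the market risk premium to control for risk differences
res2 = sm.OLS(rd, sm.add_constant(rm_rf)).fit()
print(res2.params, res2.tvalues)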
As can be seen by comparing the returns on the winners and losers in the first two rows of table 2.7, 12 months is not a sufficiently long time for losers to become winners. By the two-year tracking horizon, however, the losers have become winners, and similarly for the three-year samples. This translates into an average 1.68% higher return on the losers than the winners at the two-year horizon, and 1.56% higher return at the three-year horizon. Recall that the estimated value of the coefficient in a regression of a variable on a constant only is equal to the average value of that variable. It can also be seen that the estimated coefficients on the constant terms for each horizon are exactly equal to the differences between the returns of the losers and the winners. This coefficient is statistically significant at the two-year horizon, and marginally significant at the three-year horizon.
In the second test regression, β̂ represents the difference between the market betas of the winner and loser portfolios. None of the beta coefficient estimates are even close to being significant, and the inclusion of the risk term makes virtually no difference to the coefficient values or significances of the intercept terms.
Removal of the January returns from the samples reduces the subsequent degree of overperformance of the loser portfolios, and the significances of the α̂_1 terms are somewhat reduced. It is concluded, therefore, that only a part of the overreaction phenomenon occurs in January. Clare and Thomas then proceed to examine whether the overreaction effect is related to firm size, although the results are not presented here.
2.13.3 Conclusions

The main conclusions from Clare and Thomas' study are:

(1) There appears to be evidence of overreactions in UK stock returns, as found in previous US studies.
(2) These overreactions are unrelated to the CAPM beta.
(3) Losers that subsequently become winners tend to be small, so that most of the overreaction in the UK can be attributed to the size effect.
2.14 The exact significance level

The exact significance level is also commonly known as the p-value. It gives the marginal significance level where one would be indifferent between rejecting and not rejecting the null hypothesis. If the test statistic is 'large' in absolute value, the p-value will be small, and vice versa. For example, consider a test statistic that is distributed as a t_62 and takes a value of 1.47. Would the null hypothesis be rejected? It would depend on the size of the test. Now, suppose that the p-value for this test is calculated to be 0.12:
● Is the null rejected at the 5% level? No
● Is the null rejected at the 10% level? No
● Is the null rejected at the 20% level? Yes
Table 2.8 Part of the EViews regression output revisited
Coefficient Std. Error t-Statistic Prob.
C 0.363302 0.444369 0.817569 0.4167
RFUTURES 0.123860 0.133790 0.925781 0.3581
In fact, the null would have been rejected at the 12% level or higher. To see this, consider conducting a series of tests with size 0.1%, 0.2%, 0.3%, 0.4%, ... 1%, ..., 5%, ..., 10%, ... Eventually, the critical value and test statistic will meet, and this will be the p-value. p-values are almost always provided automatically by software packages. Note how useful they are! They provide all of the information required to conduct a hypothesis test without requiring the researcher to calculate a test statistic or to find a critical value from a table – both of these steps have already been taken by the package in producing the p-value. The p-value is also useful since it avoids the requirement of specifying an arbitrary significance level (α). Sensitivity analysis of the effect of the significance level on the conclusion occurs automatically.
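The mechanics are easy to reproduce directly; as an illustrative sketch (assuming scipy is available), the exact significance level for the t_62 example above can be computed as follows.

# Computing an exact significance level (p-value) for a t-distributed test statistic.
from scipy import stats

def exact_sig_level(t_stat: float, df: int) -> float:
    """Two-sided p-value: twice the upper-tail probability of |t_stat|."""
    return 2 * stats.t.sf(abs(t_stat), df)

# For the t_62 statistic of 1.47 above this gives roughly 0.15; the 0.12 in the
# text is a supposed value used purely for the purposes of the illustration.
print(exact_sig_level(1.47, 62))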
Informally, the p-value is also often referred to as the probability of being wrong when the null hypothesis is rejected. Thus, for example, if a p-value of 0.05 or less leads the researcher to reject the null (equivalent to a 5% significance level), this is equivalent to saying that if the probability of incorrectly rejecting the null is more than 5%, do not reject it. The p-value has also been termed the 'plausibility' of the null hypothesis; so, the smaller is the p-value, the less plausible is the null hypothesis.
2.15 Hypothesis testing in EViews – example 1: hedging revisited

Reload the 'hedge.wf1' EViews work file that was created above. If we re-examine the results table from the returns regression (screenshot 2.3 on p. 43), it can be seen that as well as the parameter estimates, EViews automatically calculates the standard errors, the t-ratios, and the p-values associated with a two-sided test of the null hypothesis that the true value of a parameter is zero. Part of the results table is replicated again here (table 2.8) for ease of interpretation.
The third column presents the t-ratios, which are the test statistics for testing the null hypothesis that the true values of these parameters are zero against a two-sided alternative – i.e. these statistics test H0: α = 0 versus H1: α ≠ 0 in the first row of numbers and H0: β = 0 versus H1: β ≠ 0 in the second. The fact that these test statistics are both very small is indicative that neither of these null hypotheses is likely to be rejected. This conclusion is confirmed by the p-values given in the final column. Both p-values are considerably larger than 0.1, indicating that the corresponding test statistics are not even significant at the 10% level.
Suppose now that we wanted to test the null hypothesis that H0: β = 1 rather than H0: β = 0. We could test this, or any other hypothesis about the coefficients, by hand, using the information we already have. But it is easier to let EViews do the work by typing View and then Coefficient Tests/Wald – Coefficient Restrictions .... EViews defines all of the parameters in a vector C, so that C(1) will be the intercept and C(2) will be the slope. Type C(2)=1 and click OK. Note that using this software, it is possible to test multiple hypotheses, which will be discussed in chapter 3, and also non-linear restrictions, which cannot be tested using the standard procedure for inference described above.
Wald Test:
Equation: RETURNREG

Test Statistic      Value        df         Probability
F-statistic         42.88455     (1, 63)    0.0000
Chi-square          42.88455     1          0.0000

Null Hypothesis Summary:

Normalised Restriction (=0)    Value        Std. Err.
−1 + C(2)                      −0.876140    0.133790

Restrictions are linear in coefficients.
The test is performed in two different ways, but both results suggest that the null hypothesis should clearly be rejected, as the p-value for the test is zero to four decimal places. Since we are testing a hypothesis about only one parameter, the two test statistics ('F-statistic' and 'Chi-square') will always be identical. These are equivalent to conducting a t-test, and these alternative formulations will be discussed in detail in chapter 4. EViews also reports the 'normalised restriction', although this can be ignored for the time being since it merely reports the regression slope parameter (in a different form) and its standard error.
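The same single restriction can of course be tested outside EViews; a hedged statsmodels sketch with simulated stand-ins for the spot and futures return series (so the output will not reproduce the tables shown here) is as follows.

# Sketch: testing H0: beta = 1 outside EViews (simulated stand-ins for the hedging series).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
rfutures = rng.normal(0, 2, 65)
rspot = 0.3 + 0.12 * rfutures + rng.normal(0, 2, 65)   # slope deliberately far from 1

res = sm.OLS(rspot, sm.add_constant(rfutures)).fit()
print(res.t_test("x1 = 1"))    # t-test of the single restriction beta = 1
print(res.f_test("x1 = 1"))    # equivalent Wald F-test; the F statistic is the square of the t-ratio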
Now go back to the regression in levels (i.e. with the raw prices rather than the returns) and test the null hypothesis that β = 1 in this regression. You should find in this case that the null hypothesis is not rejected (table below).
Wald Test:
Equation: LEVELREG

Test Statistic      Value        df         Probability
F-statistic         0.565298     (1, 64)    0.4549
Chi-square          0.565298     1          0.4521

Null Hypothesis Summary:

Normalised Restriction (=0)    Value        Std. Err.
−1 + C(2)                      −0.017777    0.023644

Restrictions are linear in coefficients.
2.16 Estimation and hypothesis testing in EViews – example 2: the CAPM

This exercise will estimate and test some hypotheses about the CAPM beta for several US stocks. First, open a new workfile to accommodate monthly data commencing in January 2002 and ending in April 2007. Then import the Excel file 'capm.xls'. The file is organised by observation and contains six columns of numbers plus the dates in the first column, so in the 'Names for series or Number if named in file' box, type 6. As before, do not import the dates, so the data start in cell B2. The monthly stock prices of four companies (Ford, General Motors, Microsoft and Sun) will appear as objects, along with index values for the S&P500 ('sandp') and three-month US Treasury bills ('ustb3m'). Save the EViews workfile as 'capm.wk1'.
In order to estimate a CAPM equation for the Ford stock, for example, we need to first transform the price series into returns and then into excess returns over the risk-free rate. To transform the series, click on the Generate button (Genr) in the workfile window. In the new window, type

RSANDP=100*LOG(SANDP/SANDP(-1))

This will create a new series named RSANDP that will contain the returns of the S&P500. The operator (-1) is used to instruct EViews to use the one-period lagged observation of the series. To estimate percentage returns on the Ford stock, press the Genr button again and type

RFORD=100*LOG(FORD/FORD(-1))
This will yield a new series named RFORD that will contain the returns of the Ford stock. EViews allows various kinds of transformations to the series. For example

X2=X/2          creates a new variable called X2 that is half of X
XSQ=X^2         creates a new variable XSQ that is X squared
LX=LOG(X)       creates a new variable LX that is the log of X
LAGX=X(-1)      creates a new variable LAGX containing X lagged by one period
LAGX2=X(-2)     creates a new variable LAGX2 containing X lagged by two periods

Other functions include:

d(X)            first difference of X
d(X,n)          nth order difference of X
dlog(X)         first difference of the logarithm of X
dlog(X,n)       nth order difference of the logarithm of X
abs(X)          absolute value of X
If, in the transformation, the new series is given the same name as the old series, then the old series will be overwritten. Note that the returns for the S&P index could have been constructed using a simpler command in the 'Genr' window such as

RSANDP=100*DLOG(SANDP)

as we used in chapter 1. Before we can transform the returns into excess returns, we need to be slightly careful because the stock returns are monthly, but the Treasury bill yields are annualised. We could run the whole analysis using monthly data or using annualised data and it should not matter which we use, but the two series must be measured consistently. So, to turn the T-bill yields into monthly figures and to write over the original series, press the Genr button again and type

USTB3M=USTB3M/12

Now, to compute the excess returns, click Genr again and type

ERSANDP=RSANDP-USTB3M

where 'ERSANDP' will be used to denote the excess returns, so that the original raw returns series will remain in the workfile. The Ford returns can similarly be transformed into a set of excess returns.
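For comparison, the same transformations could be carried out outside EViews; the sketch below is a hypothetical pandas version, assuming a DataFrame called prices with columns named SANDP, FORD and USTB3M (an annualised yield), all of which are assumptions made purely for illustration.

# Hypothetical pandas version of the return and excess-return transformations.
import numpy as np
import pandas as pd

def to_excess_returns(prices: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=prices.index)
    out["RSANDP"] = 100 * np.log(prices["SANDP"] / prices["SANDP"].shift(1))
    out["RFORD"] = 100 * np.log(prices["FORD"] / prices["FORD"].shift(1))
    monthly_rf = prices["USTB3M"] / 12          # convert the annualised yield to a monthly figure
    out["ERSANDP"] = out["RSANDP"] - monthly_rf
    out["ERFORD"] = out["RFORD"] - monthly_rf
    return out.dropna()                         # the first observation is lost to the lag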
Now that the excess returns have been obtained for the two series, before running the regression, plot the data to examine visually whether the series appear to move together. To do this, create a new object by clicking on the Object/New Object menu on the menu bar. Select Graph, provide a name (call the graph Graph1) and then in the new window provide the names of the series to plot. In this new window, type

ERSANDP ERFORD

Then press OK and screenshot 2.4 will appear.
Screenshot 2.4 Plot of two series
This is a time-series plot of the two variables, but a scatter plot may be more informative. To examine a scatter plot, click Options, choose the Type tab, then select Scatter from the list and click OK. There appears to be a weak association between ERSANDP and ERFORD. Close the window of the graph and return to the workfile window.
To estimate the CAPM equation, click on Object/New Objects. In the new window, select Equation and name the object CAPM. Click on OK. In the window, specify the regression equation. The regression equation takes the form

(R_{Ford} - r_f)_t = \alpha + \beta(R_M - r_f)_t + u_t

Since the data have already been transformed to obtain the excess returns, in order to specify this regression equation, type in the equation window

ERFORD C ERSANDP

To use all the observations in the sample and to estimate the regression using LS – Least Squares (NLS and ARMA), click on OK. The results screen appears as in the following table. Make sure that you save the Workfile again to include the transformed series and regression results!
Dependent Variable: ERFORD
Method: Least Squares
Date: 08/21/07   Time: 15:02
Sample (adjusted): 2002M02 2007M04
Included observations: 63 after adjustments

                     Coefficient    Std. Error    t-Statistic    Prob.
C                    2.020219       2.801382      0.721151       0.4736
ERSANDP              0.359726       0.794443      0.452803       0.6523

R-squared            0.003350       Mean dependent var       2.097445
Adjusted R-squared  −0.012989       S.D. dependent var       22.05129
S.E. of regression   22.19404       Akaike info criterion    9.068756
Sum squared resid    30047.09       Schwarz criterion        9.136792
Log likelihood      −283.6658       Hannan-Quinn criter.     9.095514
F-statistic          0.205031       Durbin-Watson stat       1.785699
Prob(F-statistic)    0.652297
Take a couple of minutes to examine the results of the regression. What is the slope coefficient estimate and what does it signify? Is this coefficient statistically significant? The beta coefficient (the slope coefficient) estimate is 0.3597. The p-value of the t-ratio is 0.6523, signifying that the excess return on the market proxy has no significant explanatory power for the variability of the excess returns of Ford stock. What is the interpretation of the intercept estimate? Is it statistically significant?
In fact, there is a considerably quicker method for using transformed variables in regression equations, and that is to write the transformation directly into the equation window. In the CAPM example above, this could be done by typing

DLOG(FORD)-USTB3M C DLOG(SANDP)-USTB3M

into the equation window. As well as being quicker, an advantage of this approach is that the output will show more clearly the regression that has actually been conducted, so that any errors in making the transformations can be seen more clearly.
How could the hypothesis that the value of the population coefficient is equal to 1 be tested? The answer is to click on View/Coefficient Tests/Wald – Coefficient Restrictions... and then in the box that appears, type C(2)=1. The conclusion here is that the null hypothesis that the CAPM beta of Ford stock is 1 cannot be rejected and hence the estimated beta of 0.359 is not significantly different from 1.⁵

⁵ Although the value 0.359 may seem a long way from 1, considered purely from an econometric perspective, the sample size is quite small and this has led to a large parameter standard error, which explains the failure to reject both H0: β = 0 and H0: β = 1.
Key concepts

The key terms to be able to define and explain from this chapter are

● regression model            ● disturbance term
● population                  ● sample
● linear model                ● consistency
● unbiasedness                ● efficiency
● standard error              ● statistical inference
● null hypothesis             ● alternative hypothesis
● t-distribution              ● confidence interval
● test statistic              ● rejection region
● type I error                ● type II error
● size of a test              ● power of a test
● p-value                     ● data mining
● asymptotic
Appendix: Mathematical derivations of CLRM results
2A.1 Derivation of the OLS coefficient estimator in the bivariate case
L = \sum_{t=1}^{T}(y_t - \hat{y}_t)^2 = \sum_{t=1}^{T}(y_t - \hat{\alpha} - \hat{\beta}x_t)^2    (2A.1)

It is necessary to minimise L w.r.t. α̂ and β̂, to find the values of α and β that give the line that is closest to the data. So L is differentiated w.r.t. α̂ and β̂, and the first derivatives are set to zero. The first derivatives are given by

\frac{\partial L}{\partial \hat{\alpha}} = -2\sum_{t}(y_t - \hat{\alpha} - \hat{\beta}x_t) = 0    (2A.2)

\frac{\partial L}{\partial \hat{\beta}} = -2\sum_{t}x_t(y_t - \hat{\alpha} - \hat{\beta}x_t) = 0    (2A.3)
The next step is to rearrange (2A.2) and (2A.3) in order to obtain expressions for α̂ and β̂. From (2A.2)

\sum_{t}(y_t - \hat{\alpha} - \hat{\beta}x_t) = 0    (2A.4)

Expanding the parentheses and recalling that the sum runs from 1 to T so that there will be T terms in α̂

\sum y_t - T\hat{\alpha} - \hat{\beta}\sum x_t = 0    (2A.5)

But \sum y_t = T\bar{y} and \sum x_t = T\bar{x}, so it is possible to write (2A.5) as

T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0    (2A.6)

or

\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0    (2A.7)

From (2A.3)

\sum_{t}x_t(y_t - \hat{\alpha} - \hat{\beta}x_t) = 0    (2A.8)

From (2A.7)

\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}    (2A.9)

Substituting into (2A.8) for α̂ from (2A.9)

\sum_{t}x_t(y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta}x_t) = 0    (2A.10)

\sum_{t}x_t y_t - \bar{y}\sum x_t + \hat{\beta}\bar{x}\sum x_t - \hat{\beta}\sum x_t^2 = 0    (2A.11)

\sum_{t}x_t y_t - T\bar{x}\bar{y} + \hat{\beta}T\bar{x}^2 - \hat{\beta}\sum x_t^2 = 0    (2A.12)

Rearranging for β̂,

\hat{\beta}\left(T\bar{x}^2 - \sum x_t^2\right) = T\bar{x}\bar{y} - \sum x_t y_t    (2A.13)

Dividing both sides of (2A.13) by \left(T\bar{x}^2 - \sum x_t^2\right) gives

\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}    (2A.14)
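The closed-form expressions in (2A.14) are easy to verify numerically; a short numpy sketch on artificial data (the series and seed are arbitrary assumptions) compares them with a standard least-squares routine.

# Numerical check of the bivariate OLS formulas in (2A.14) on artificial data.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 100)
y = 1.5 + 2.0 * x + rng.normal(0, 0.5, 100)
T = len(x)

beta_hat = (np.sum(x * y) - T * x.mean() * y.mean()) / (np.sum(x**2) - T * x.mean()**2)
alpha_hat = y.mean() - beta_hat * x.mean()

slope, intercept = np.polyfit(x, y, 1)   # numpy's own least-squares line, highest power first
print(beta_hat, slope)                   # the two slope estimates agree
print(alpha_hat, intercept)              # as do the two intercept estimates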
2A.2 Derivation of the OLS standard error estimators for the intercept and
slope in the bivariate case
Recall that the variance of the random variable α̂ can be written as

var(\hat{\alpha}) = E(\hat{\alpha} - E(\hat{\alpha}))^2    (2A.15)

and since the OLS estimator is unbiased

var(\hat{\alpha}) = E(\hat{\alpha} - \alpha)^2    (2A.16)

By similar arguments, the variance of the slope estimator can be written as

var(\hat{\beta}) = E(\hat{\beta} - \beta)^2    (2A.17)

Working first with (2A.17), replacing β̂ with the formula for it given by the OLS estimator

var(\hat{\beta}) = E\left(\frac{\sum(x_t - \bar{x})(y_t - \bar{y})}{\sum(x_t - \bar{x})^2} - \beta\right)^2    (2A.18)

Replacing y_t with α + βx_t + u_t, and replacing ȳ with α + βx̄ in (2A.18)

var(\hat{\beta}) = E\left(\frac{\sum(x_t - \bar{x})(\alpha + \beta x_t + u_t - \alpha - \beta\bar{x})}{\sum(x_t - \bar{x})^2} - \beta\right)^2    (2A.19)

Cancelling α and multiplying the last β term in (2A.19) by \frac{\sum(x_t - \bar{x})^2}{\sum(x_t - \bar{x})^2}

var(\hat{\beta}) = E\left(\frac{\sum(x_t - \bar{x})(\beta x_t + u_t - \beta\bar{x}) - \beta\sum(x_t - \bar{x})^2}{\sum(x_t - \bar{x})^2}\right)^2    (2A.20)

Rearranging

var(\hat{\beta}) = E\left(\frac{\sum(x_t - \bar{x})\beta(x_t - \bar{x}) + \sum u_t(x_t - \bar{x}) - \beta\sum(x_t - \bar{x})^2}{\sum(x_t - \bar{x})^2}\right)^2    (2A.21)

var(\hat{\beta}) = E\left(\frac{\beta\sum(x_t - \bar{x})^2 + \sum u_t(x_t - \bar{x}) - \beta\sum(x_t - \bar{x})^2}{\sum(x_t - \bar{x})^2}\right)^2    (2A.22)

Now the β terms in (2A.22) will cancel to give

var(\hat{\beta}) = E\left(\frac{\sum u_t(x_t - \bar{x})}{\sum(x_t - \bar{x})^2}\right)^2    (2A.23)
Now let x*_t denote the mean-adjusted observation for x_t, i.e. (x_t − x̄). Equation (2A.23) can be written

var(\hat{\beta}) = E\left(\frac{\sum u_t x^*_t}{\sum x^{*2}_t}\right)^2    (2A.24)

The denominator of (2A.24) can be taken through the expectations operator under the assumption that x is fixed or non-stochastic

var(\hat{\beta}) = \frac{1}{\left(\sum x^{*2}_t\right)^2} E\left(\sum u_t x^*_t\right)^2    (2A.25)

Writing the terms out in the last summation of (2A.25)

var(\hat{\beta}) = \frac{1}{\left(\sum x^{*2}_t\right)^2} E\left(u_1 x^*_1 + u_2 x^*_2 + \cdots + u_T x^*_T\right)^2    (2A.26)

Now expanding the brackets of the squared term in the expectations operator of (2A.26)

var(\hat{\beta}) = \frac{1}{\left(\sum x^{*2}_t\right)^2} E\left(u_1^2 x^{*2}_1 + u_2^2 x^{*2}_2 + \cdots + u_T^2 x^{*2}_T + \text{cross-products}\right)    (2A.27)

where 'cross-products' in (2A.27) denotes all of the terms u_i x*_i u_j x*_j (i ≠ j). These cross-products can be written as u_i u_j x*_i x*_j (i ≠ j) and their expectation will be zero under the assumption that the error terms are uncorrelated with one another. Thus, the 'cross-products' term in (2A.27) will drop out. Recall also from the chapter text that E(u_t^2) is the error variance, which is estimated using s^2

var(\hat{\beta}) = \frac{1}{\left(\sum x^{*2}_t\right)^2} \left(s^2 x^{*2}_1 + s^2 x^{*2}_2 + \cdots + s^2 x^{*2}_T\right)    (2A.28)

which can also be written

var(\hat{\beta}) = \frac{s^2}{\left(\sum x^{*2}_t\right)^2} \left(x^{*2}_1 + x^{*2}_2 + \cdots + x^{*2}_T\right) = \frac{s^2 \sum x^{*2}_t}{\left(\sum x^{*2}_t\right)^2}    (2A.29)

A term in \sum x^{*2}_t can be cancelled from the numerator and denominator of (2A.29), and recalling that x*_t = (x_t − x̄), this gives the variance of the slope coefficient as

var(\hat{\beta}) = \frac{s^2}{\sum(x_t - \bar{x})^2}    (2A.30)
so that the standard error can be obtained by taking the square root of (2A.30)

SE(\hat{\beta}) = s\sqrt{\frac{1}{\sum(x_t - \bar{x})^2}}    (2A.31)

Turning now to the derivation of the intercept standard error, this is in fact much more difficult than that of the slope standard error. In fact, both are very much easier using matrix algebra as shown below. Therefore, this derivation will be offered in summary form. It is possible to express α̂ as a function of the true α and of the disturbances, u_t

\hat{\alpha} = \alpha + \frac{\sum u_t\left[\sum x_t^2 - x_t\sum x_t\right]}{\left[T\sum x_t^2 - \left(\sum x_t\right)^2\right]}    (2A.32)

Denoting all of the elements in square brackets as g_t, (2A.32) can be written

\hat{\alpha} - \alpha = \sum u_t g_t    (2A.33)

From (2A.15), the intercept variance would be written

var(\hat{\alpha}) = E\left(\sum u_t g_t\right)^2 = \sum g_t^2 E\left(u_t^2\right) = s^2\sum g_t^2    (2A.34)

Writing (2A.34) out in full for g_t^2 and expanding the brackets

var(\hat{\alpha}) = \frac{s^2\left[T\left(\sum x_t^2\right)^2 - 2\sum x_t\left(\sum x_t^2\right)\sum x_t + \left(\sum x_t^2\right)\left(\sum x_t\right)^2\right]}{\left[T\sum x_t^2 - \left(\sum x_t\right)^2\right]^2}    (2A.35)

This looks rather complex, but fortunately, if we take \sum x_t^2 outside the square brackets in the numerator, the remaining numerator cancels with a term in the denominator to leave the required result

SE(\hat{\alpha}) = s\sqrt{\frac{\sum x_t^2}{T\sum(x_t - \bar{x})^2}}    (2A.36)
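As with the coefficient formulas, (2A.31) and (2A.36) can be checked numerically; the sketch below compares them with the standard errors reported by statsmodels on artificial data (the series are simulated assumptions).

# Numerical check of the standard error formulas (2A.31) and (2A.36) on artificial data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 100)
y = 1.5 + 2.0 * x + rng.normal(0, 0.5, 100)
T = len(x)

res = sm.OLS(y, sm.add_constant(x)).fit()
s2 = res.ssr / (T - 2)                                       # s^2 = RSS / (T - 2)
se_beta = np.sqrt(s2 / np.sum((x - x.mean())**2))            # equation (2A.31)
se_alpha = np.sqrt(s2 * np.sum(x**2) /
                   (T * np.sum((x - x.mean())**2)))          # equation (2A.36)
print(se_alpha, se_beta)
print(res.bse)             # statsmodels reports the same values: [SE(alpha), SE(beta)]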
Review questions
1. (a) Why does OLS estimation involve taking vertical deviations of the
points to the line rather than horizontal distances?
(b) Why are the vertical distances squared before being added
together?
(c) Why are the squares of the vertical distances taken rather than the
absolute values?
2. Explain, with the use of equations, the difference between the sample
regression function and the population regression function.
3. What is an estimator? Is the OLS estimator superior to all other
estimators? Why or why not?
4. What five assumptions are usually made about the unobservable error terms in the classical linear regression model (CLRM)? Briefly explain the meaning of each. Why are these assumptions made?
5. Which of the following models can be estimated (following a suitable rearrangement if necessary) using ordinary least squares (OLS), where X, y, Z are variables and α, β, γ are parameters to be estimated? (Hint: the models need to be linear in the parameters.)

y_t = \alpha + \beta x_t + u_t    (2.57)
y_t = e^{\alpha} x_t^{\beta} e^{u_t}    (2.58)
y_t = \alpha + \beta\gamma x_t + u_t    (2.59)
\ln(y_t) = \alpha + \beta\ln(x_t) + u_t    (2.60)
y_t = \alpha + \beta x_t z_t + u_t    (2.61)
6. The capital asset pricing model (CAPM) can be written as

E(R_i) = R_f + \beta_i[E(R_m) - R_f]    (2.62)

using the standard notation.
The first step in using the CAPM is to estimate the stock's beta using the market model. The market model can be written as

R_{it} = \alpha_i + \beta_i R_{mt} + u_{it}    (2.63)

where R_it is the excess return for security i at time t, R_mt is the excess return on a proxy for the market portfolio at time t, and u_it is an iid random disturbance term. The coefficient beta in this case is also the CAPM beta for security i.
Suppose that you had estimated (2.63) and found that the estimated value of beta for a stock, β̂, was 1.147. The standard error associated with this coefficient, SE(β̂), is estimated to be 0.0548.
A city analyst has told you that this security closely follows the market, but that it is no more risky, on average, than the market. This can be tested by the null hypothesis that the value of beta is one. The model is estimated over 62 daily observations. Test this hypothesis against a one-sided alternative that the security is more risky than the market, at the 5% level. Write down the null and alternative hypotheses. What do you conclude? Are the analyst's claims empirically verified?
7. The analyst also tells you that shares in Chris Mining PLC have no systematic risk, in other words that the returns on its shares are completely unrelated to movements in the market. The value of beta and its standard error are calculated to be 0.214 and 0.186, respectively. The model is estimated over 38 quarterly observations. Write down the null and alternative hypotheses. Test this null hypothesis against a two-sided alternative.
8. Form and interpret a 95% and a 99% confidence interval for beta using
the figures given in question 7.
9. Are hypotheses tested concerning the actual values of the coefficients (i.e. β) or their estimated values (i.e. β̂), and why?
10. Using EViews, select one of the other stock series from the 'capm.wk1' file and estimate a CAPM beta for that stock. Test the null hypothesis that the true beta is one and also test the null hypothesis that the true alpha (intercept) is zero. What are your conclusions?
3
Further development and analysis of the
classical linear regression model
Learning Outcomes
In this chapter, you will learn how to
● Construct models with more than one explanatory variable
● Test multiple hypotheses using an F-test
● Determine how well a model fits the data
● Form a restricted regression
● Derive the OLS parameter and standard error estimators using matrix algebra
● Estimate multiple regression models and test multiple hypotheses in EViews
3.1 Generalising the simple model to multiple linear regression
Previously, a model of the following form has been used:
y_t = \alpha + \beta x_t + u_t,    t = 1, 2, ..., T    (3.1)

Equation (3.1) is a simple bivariate regression model. That is, changes in the dependent variable are explained by reference to changes in one single explanatory variable x. But what if the financial theory or idea that is sought to be tested suggests that the dependent variable is influenced by more than one independent variable? For example, simple estimation and tests of the CAPM can be conducted using an equation of the form of (3.1), but arbitrage pricing theory does not pre-suppose that there is only a single factor affecting stock returns. So, to give one illustration, stock returns might be purported to depend on their sensitivity to unexpected changes in:
(1) inflation
(2) the differences in returns on short- and long-dated bonds
(3) industrial production
(4) default risks.
Having just one independent variable would be no good in this case. It would of course be possible to use each of the four proposed explanatory factors in separate regressions. But it is of greater interest and it is more valid to have more than one explanatory variable in the regression equation at the same time, and therefore to examine the effect of all of the explanatory variables together on the explained variable.
It is very easy to generalise the simple model to one with k regressors (independent variables). Equation (3.1) becomes

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t,    t = 1, 2, ..., T    (3.2)

So the variables x_2t, x_3t, ..., x_kt are a set of k − 1 explanatory variables which are thought to influence y, and the coefficient estimates β_1, β_2, ..., β_k are the parameters which quantify the effect of each of these explanatory variables on y. The coefficient interpretations are slightly altered in the multiple regression context. Each coefficient is now known as a partial regression coefficient, interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all other explanatory variables. For example, β̂_2 measures the effect of x_2 on y after eliminating the effects of x_3, x_4, ..., x_k. Stating this in other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.
3.2 The constant term
In (3.2) above, astute readers will have noticed that the explanatory variables are numbered x_2, x_3, ... i.e. the list starts with x_2 and not x_1. So, where is x_1? In fact, it is the constant term, usually represented by a column of ones of length T:

x_1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}    (3.3)
Thus there is a variable implicitly hiding next to β_1, which is a column vector of ones, the length of which is the number of observations in the sample. The x_1 in the regression equation is not usually written, in the same way that one unit of p and 2 units of q would be written as 'p + 2q' and not '1p + 2q'. β_1 is the coefficient attached to the constant term (which was called α in the previous chapter). This coefficient can still be referred to as the intercept, which can be interpreted as the average value which y would take if all of the explanatory variables took a value of zero.

A tighter definition of k, the number of explanatory variables, is probably now necessary. Throughout this book, k is defined as the number of 'explanatory variables' or 'regressors' including the constant term. This is equivalent to the number of parameters that are estimated in the regression equation. Strictly speaking, it is not sensible to call the constant an explanatory variable, since it does not explain anything and it always takes the same values. However, this definition of k will be employed for notational convenience.
Equation (3.2) can be expressed even more compactly by writing it in matrix form

y = X\beta + u    (3.4)

where: y is of dimension T × 1, X is of dimension T × k, β is of dimension k × 1 and u is of dimension T × 1.

The difference between (3.2) and (3.4) is that all of the time observations have been stacked up in a vector, and also that all of the different explanatory variables have been squashed together so that there is a column for each in the X matrix. Such a notation may seem unnecessarily complex, but in fact, the matrix notation is usually more compact and convenient. So, for example, if k is 2, i.e. there are two regressors, one of which is the constant term (equivalent to a simple bivariate regression y_t = α + βx_t + u_t), it is possible to write
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}    (3.5)

where the dimensions are T × 1, T × 2, 2 × 1 and T × 1, respectively, so that the x_ij element of the matrix X represents the jth time observation on the ith variable. Notice that the matrices written in this way are conformable – in other words, there is a valid matrix multiplication and addition on the RHS.
The above presentation is the standard way to express matrices in the time series econometrics literature, although the ordering of the indices is different to that used in the mathematics of matrix algebra (as presented in the mathematical appendix at the end of this book). In the latter case, x_ij would represent the element in row i and column j, although in the notation used in the body of this book it is the other way around.
3.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

Previously, the residual sum of squares, \sum \hat{u}_i^2, was minimised with respect to α and β. In the multiple regression context, in order to obtain estimates of the parameters, β_1, β_2, ..., β_k, the RSS would be minimised with respect to all the elements of β. Now, the residuals can be stacked in a vector:

\hat{u} = \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix}    (3.6)
The RSS is still the relevant loss function, and would be given in a matrix notation by

L = \hat{u}'\hat{u} = [\hat{u}_1\ \hat{u}_2 \cdots \hat{u}_T] \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_T^2 = \sum \hat{u}_t^2    (3.7)
Using a similar procedure to that employed in the bivariate regression case, i.e. substituting into (3.7), and denoting the vector of estimated parameters as β̂, it can be shown (see the appendix to this chapter) that the coefficient estimates will be given by the elements of the expression

\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = (X'X)^{-1}X'y    (3.8)

If one were to check the dimensions of the RHS of (3.8), it would be observed to be k × 1. This is as required since there are k parameters to be estimated by the formula for β̂.
But how are the standard errors of the coefficient estimates calculated? Previously, to estimate the variance of the errors, σ², an estimator denoted by s² was used

s^2 = \frac{\sum \hat{u}_t^2}{T - 2}    (3.9)

The denominator of (3.9) is given by T − 2, which is the number of degrees of freedom for the bivariate regression model (i.e. the number of observations minus two). This essentially applies since two observations are effectively 'lost' in estimating the two model parameters (i.e. in deriving estimates for α and β). In the case where there is more than one explanatory variable plus a constant, and using the matrix notation, (3.9) would be modified to

s^2 = \frac{\hat{u}'\hat{u}}{T - k}    (3.10)

where k = number of regressors including a constant. In this case, k observations are 'lost' as k parameters are estimated, leaving T − k degrees of freedom. It can also be shown (see the appendix to this chapter) that the parameter variance–covariance matrix is given by

var(\hat{\beta}) = s^2(X'X)^{-1}    (3.11)
The leading diagonal terms give the coefficient variances while the off-diagonal terms give the covariances between the parameter estimates, so that the variance of β̂_1 is the first diagonal element, the variance of β̂_2 is the second element on the leading diagonal, and the variance of β̂_k is the kth diagonal element. The coefficient standard errors are thus simply given by taking the square roots of each of the terms on the leading diagonal.
Example 3.1

The following model with 3 regressors (including the constant) is estimated over 15 observations

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u    (3.12)

and the following data have been calculated from the original xs

(X'X)^{-1} = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix}, \quad (X'y) = \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix}, \quad \hat{u}'\hat{u} = 10.96
Calculate the coefficient estimates and their standard errors.

\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = (X'X)^{-1}X'y = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix} \times \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 1.10 \\ -4.40 \\ 19.88 \end{bmatrix}    (3.13)

To calculate the standard errors, an estimate of σ² is required

s^2 = \frac{RSS}{T - k} = \frac{10.96}{15 - 3} = 0.91    (3.14)

The variance–covariance matrix of β̂ is given by

s^2(X'X)^{-1} = 0.91(X'X)^{-1} = \begin{bmatrix} 1.82 & 3.19 & -0.91 \\ 3.19 & 0.91 & 5.92 \\ -0.91 & 5.92 & 3.91 \end{bmatrix}    (3.15)

The coefficient variances are on the diagonals, and the standard errors are found by taking the square roots of each of the coefficient variances

var(\hat{\beta}_1) = 1.82 \quad SE(\hat{\beta}_1) = 1.35    (3.16)
var(\hat{\beta}_2) = 0.91 \quad SE(\hat{\beta}_2) = 0.95    (3.17)
var(\hat{\beta}_3) = 3.91 \quad SE(\hat{\beta}_3) = 1.98    (3.18)

The estimated equation would be written

\hat{y} = 1.10 - 4.40x_2 + 19.88x_3
          (1.35)\ (0.95)\ \ (1.98)    (3.19)
Fortunately, in practice all econometrics software packages will estimate the coefficient values and their standard errors. Clearly, though, it is still useful to understand where these estimates came from.
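The arithmetic of example 3.1 can be reproduced in a few lines; a minimal numpy sketch using exactly the quantities given above is:

# Reproducing the calculations of example 3.1 with numpy.
import numpy as np

xtx_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
xty = np.array([-3.0, 2.2, 0.6])
rss, T, k = 10.96, 15, 3

beta_hat = xtx_inv @ xty                  # (3.8): gives [1.10, -4.40, 19.88]
s2 = rss / (T - k)                        # (3.14): roughly 0.91
var_cov = s2 * xtx_inv                    # (3.11): the parameter variance-covariance matrix
std_errs = np.sqrt(np.diag(var_cov))      # roughly [1.35, 0.95, 1.98]
print(beta_hat, s2, std_errs)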
3.4 Testing multiple hypotheses: the F-test
The t-test was used to test single hypotheses, i.e. hypotheses involving only one coefficient. But what if it is of interest to test more than one coefficient simultaneously? For example, what if a researcher wanted to determine whether a restriction that the coefficient values for β_2 and β_3 are both unity could be imposed, so that an increase in either one of the two variables x_2 or x_3 would cause y to rise by one unit? The t-testing framework is not sufficiently general to cope with this sort of hypothesis test. Instead, a more general framework is employed, centring on an F-test.

Under the F-test framework, two regressions are required, known as the unrestricted and the restricted regressions. The unrestricted regression is the one in which the coefficients are freely determined by the data, as has been constructed previously. The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some βs. Thus the F-test approach to hypothesis testing is also termed restricted least squares, for obvious reasons.
The residual sums of squares from each regression are determined, and the two residual sums of squares are 'compared' in the test statistic. The F-test statistic for testing multiple hypotheses about the coefficient estimates is given by

\text{test statistic} = \frac{RRSS - URSS}{URSS} \times \frac{T - k}{m}    (3.20)

where the following notation applies:

URSS = residual sum of squares from unrestricted regression
RRSS = residual sum of squares from restricted regression
m = number of restrictions
T = number of observations
k = number of regressors in unrestricted regression
The most important part of the test statistic to understand is the numerator expression RRSS − URSS. To see why the test centres around a comparison of the residual sums of squares from the restricted and unrestricted regressions, recall that OLS estimation involved choosing the model that minimised the residual sum of squares, with no constraints imposed. Now if, after imposing constraints on the model, a residual sum of squares results that is not much higher than the unconstrained model's residual sum of squares, it would be concluded that the restrictions were supported by the data. On the other hand, if the residual sum of squares increased considerably after the restrictions were imposed, it would be concluded that the restrictions were not supported by the data and therefore that the hypothesis should be rejected.

It can be further stated that RRSS ≥ URSS. Only under a particular set of very extreme circumstances will the residual sums of squares for the restricted and unrestricted models be exactly equal. This would be the case when the restriction was already present in the data, so that it is not really a restriction at all (it would be said that the restriction is 'not binding', i.e. it does not make any difference to the parameter estimates). So, for example, if the null hypothesis is H0: β_2 = 1 and β_3 = 1, then RRSS = URSS only in the case where the coefficient estimates for the unrestricted regression had been β̂_2 = 1 and β̂_3 = 1. Of course, such an event is extremely unlikely to occur in practice.
Example 3.2
Dropping the time subscripts for simplicity, suppose that the general regression is

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u    (3.21)

and that the restriction β_3 + β_4 = 1 is under test (there exists some hypothesis from theory which suggests that this would be an interesting hypothesis to study). The unrestricted regression is (3.21) above, but what is the restricted regression? It could be expressed as

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u \quad \text{s.t. (subject to)} \quad \beta_3 + \beta_4 = 1    (3.22)

The restriction (β_3 + β_4 = 1) is substituted into the regression so that it is automatically imposed on the data. The way that this would be achieved would be to make either β_3 or β_4 the subject of (3.22), e.g.

\beta_3 + \beta_4 = 1 \Rightarrow \beta_4 = 1 - \beta_3    (3.23)

and then substitute into (3.21) for β_4

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + (1 - \beta_3)x_4 + u    (3.24)
Equation (3.24) is already a restricted form of the regression, but it is not yet in the form that is required to estimate it using a computer package. In order to be able to estimate a model using OLS, software packages usually require each RHS variable to be multiplied by one coefficient only. Therefore, a little more algebraic manipulation is required. First, expanding the brackets around (1 − β_3)

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + x_4 - \beta_3 x_4 + u    (3.25)

Then, gathering all of the terms in each β_i together and rearranging

(y - x_4) = \beta_1 + \beta_2 x_2 + \beta_3(x_3 - x_4) + u    (3.26)

Note that any variables without coefficients attached (e.g. x_4 in (3.25)) are taken over to the LHS and are then combined with y. Equation (3.26) is the restricted regression. It is actually estimated by creating two new variables – call them, say, P and Q, where P = y − x_4 and Q = x_3 − x_4 – so the regression that is actually estimated is

P = \beta_1 + \beta_2 x_2 + \beta_3 Q + u    (3.27)
What would have happened if instead β_3 had been made the subject of (3.23) and β_3 had therefore been removed from the equation? Although the equation that would have been estimated would have been different from (3.27), the value of the residual sum of squares for these two models (both of which have imposed upon them the same restriction) would be the same.
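A hedged sketch of how the restricted regression of example 3.2 might be constructed and estimated in practice is given below; the data are simulated and statsmodels is used purely for illustration, so the point is the creation of the P and Q variables rather than the particular numbers.

# Sketch: unrestricted vs. restricted regression for the restriction beta3 + beta4 = 1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 200
x2, x3, x4 = rng.normal(size=(3, T))
y = 0.5 + 1.2 * x2 + 0.3 * x3 + 0.7 * x4 + rng.normal(0, 0.5, T)   # true beta3 + beta4 = 1

unres = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3, x4]))).fit()

# Restricted form: substitute beta4 = 1 - beta3, i.e. regress P = y - x4 on x2 and Q = x3 - x4
P, Q = y - x4, x3 - x4
restr = sm.OLS(P, sm.add_constant(np.column_stack([x2, Q]))).fit()
print(unres.ssr, restr.ssr)   # RRSS >= URSS always; they are close here because the restriction is true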
The test statistic follows the F-distribution under the null hypothesis. The F-distribution has 2 degrees of freedom parameters (recall that the t-distribution had only 1 degree of freedom parameter, equal to T − k). The values of the degrees of freedom parameters for the F-test are m, the number of restrictions imposed on the model, and (T − k), the number of observations less the number of regressors for the unrestricted regression, respectively. Note that the order of the degrees of freedom parameters is important. The appropriate critical value will be in column m, row (T − k) of the F-distribution tables.
3.4.1 The relationship between the t- and the F-distributions
Any hypothesis that could be tested with a t-test could also have been tested using an F-test, but not the other way around. So, single hypotheses involving one coefficient can be tested using a t- or an F-test, but multiple hypotheses can be tested only using an F-test. For example, consider the hypothesis

H0: \beta_2 = 0.5
H1: \beta_2 \neq 0.5

This hypothesis could have been tested using the usual t-test

\text{test stat} = \frac{\hat{\beta}_2 - 0.5}{SE(\hat{\beta}_2)}    (3.28)
or it could be tested in the framework above for the F-test. Note that the two tests always give the same conclusion since the t-distribution is just a special case of the F-distribution. For example, consider any random variable Z that follows a t-distribution with T − k degrees of freedom, and square it. The square of the t is equivalent to a particular form of the F-distribution

Z^2 \sim t^2(T - k) \quad \text{then also} \quad Z^2 \sim F(1, T - k)

Thus the square of a t-distributed random variable with T − k degrees of freedom also follows an F-distribution with 1 and T − k degrees of freedom. This relationship between the t- and the F-distributions will always hold – take some examples from the statistical tables and try it!
The F-distribution has only positive values and is not symmetrical. Therefore, the null is rejected only if the test statistic exceeds the critical F-value, although the test is a two-sided one in the sense that rejection will occur if β̂_2 is significantly bigger or significantly smaller than 0.5.
3.4.2 Determining the number of restrictions, m
How is the appropriate value of m decided in each case? Informally, the number of restrictions can be seen as 'the number of equality signs under the null hypothesis'. To give some examples:

H0: hypothesis                      No. of restrictions, m
β_1 + β_2 = 2                       1
β_2 = 1 and β_3 = −1                2
β_2 = 0, β_3 = 0 and β_4 = 0        3

At first glance, you may have thought that in the first of these cases, the number of restrictions was two. In fact, there is only one restriction that involves two coefficients. The number of restrictions in the second two examples is obvious, as they involve two and three separate component restrictions, respectively.
The last of these three examples is particularly important. If the model is

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u    (3.29)

then the null hypothesis of

H0: \beta_2 = 0 \text{ and } \beta_3 = 0 \text{ and } \beta_4 = 0

is tested by 'THE' regression F-statistic. It tests the null hypothesis that all of the coefficients except the intercept coefficient are zero. This test is sometimes called a test for 'junk regressions', since if this null hypothesis cannot be rejected, it would imply that none of the independent variables in the model was able to explain variations in y.

Note the form of the alternative hypothesis for all tests when more than one restriction is involved

H1: \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \text{ or } \beta_4 \neq 0

In other words, 'and' occurs under the null hypothesis and 'or' under the alternative, so that it takes only one part of a joint null hypothesis to be wrong for the null hypothesis as a whole to be rejected.
3.4.3 Hypotheses that cannot be tested with either an F- or a t-test

It is not possible to test hypotheses that are not linear or that are multiplicative using this framework – for example, H0: β_2 β_3 = 2, or H0: β_2^2 = 1 cannot be tested.
Example 3.3
Suppose that a researcher wants to test whether the returns on a company stock (y) show unit sensitivity to two factors (factor x_2 and factor x_3) among three considered. The regression is carried out on 144 monthly observations. The regression is

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u    (3.30)

(1) What are the restricted and unrestricted regressions?
(2) If the two RSS are 436.1 and 397.2, respectively, perform the test.

Unit sensitivity to factors x_2 and x_3 implies the restriction that the coefficients on these two variables should be unity, so H0: β_2 = 1 and β_3 = 1. The unrestricted regression will be the one given by (3.30) above. To derive the restricted regression, first impose the restriction:

y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u \quad \text{s.t.} \quad \beta_2 = 1 \text{ and } \beta_3 = 1    (3.31)

Replacing β_2 and β_3 by their values under the null hypothesis

y = \beta_1 + x_2 + x_3 + \beta_4 x_4 + u    (3.32)

Rearranging

y - x_2 - x_3 = \beta_1 + \beta_4 x_4 + u    (3.33)

Defining z = y − x_2 − x_3, the restricted regression is one of z on a constant and x_4

z = \beta_1 + \beta_4 x_4 + u    (3.34)

The formula for the F-test statistic is given in (3.20) above. For this application, the following inputs to the formula are available: T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2. Plugging these into the formula gives an F-test statistic value of 6.86. This statistic should be compared with an F(m, T − k), which in this case is an F(2, 140). The critical values are 3.07 at the 5% level and 4.79 at the 1% level. The test statistic clearly exceeds the critical values at both the 5% and 1% levels, and hence the null hypothesis is rejected. It would thus be concluded that the restriction is not supported by the data.
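The plug-in arithmetic and the critical values quoted in example 3.3 can be verified with a few lines; scipy's F quantile function is used here as an illustrative alternative to statistical tables.

# Checking the F-test arithmetic and critical values for example 3.3.
from scipy import stats

RRSS, URSS, T, k, m = 436.1, 397.2, 144, 4, 2
F_stat = ((RRSS - URSS) / URSS) * ((T - k) / m)
print(round(F_stat, 2))                 # about 6.86, as in the text
print(stats.f.ppf(0.95, m, T - k))      # 5% critical value, close to the tabulated 3.07
print(stats.f.ppf(0.99, m, T - k))      # 1% critical value, close to the tabulated 4.79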
The following sections will now re-examine the CAPM model as an illustration of how to conduct multiple hypothesis tests using EViews.
3.5 Sample EViews output for multiple hypothesis tests
Reload the ‘capm.wk1’ workfile constructed in the previous chapter. As
a reminder, the results are included again below.
Dependent Variable: ERFORD
Method: Least Squares
Date: 08/21/07   Time: 15:02
Sample (adjusted): 2002M02 2007M04
Included observations: 63 after adjustments

                     Coefficient    Std. Error    t-Statistic    Prob.
C                    2.020219       2.801382      0.721151       0.4736
ERSANDP              0.359726       0.794443      0.452803       0.6523

R-squared            0.003350       Mean dependent var       2.097445
Adjusted R-squared  −0.012989       S.D. dependent var       22.05129
S.E. of regression   22.19404       Akaike info criterion    9.068756
Sum squared resid    30047.09       Schwarz criterion        9.136792
Log likelihood      −283.6658       Hannan-Quinn criter.     9.095514
F-statistic          0.205031       Durbin-Watson stat       1.785699
Prob(F-statistic)    0.652297
If we examine the regression F-test, this also shows that the regression slope coefficient is not significantly different from zero, which in this case is exactly the same result as the t-test for the beta coefficient (since there is only one slope coefficient). Thus, in this instance, the F-test statistic is equal to the square of the slope t-ratio.
Now suppose that we wish to conduct a joint test that both the intercept and slope parameters are 1. We would perform this test exactly as for a test involving only one coefficient. Select View/Coefficient Tests/Wald – Coefficient Restrictions … and then in the box that appears, type C(1)=1, C(2)=1. There are two versions of the test given: an F-version and a χ²-version. The F-version is adjusted for small sample bias and should be used when the regression is estimated using a small sample (see chapter 4). Both statistics asymptotically yield the same result, and in this case the p-values are very similar. The conclusion is that the joint null hypothesis, H0: β1 = 1 and β2 = 1, is not rejected.
3.6 Multiple regression in EViews using an APT-style model
In the spirit of arbitrage pricing theory (APT), the following example will
examine regressions that seek to determine whether the monthly returns
on Microsoft stock can be explained by reference to unexpected changes in a set of macroeconomic and financial variables. Open a new EViews workfile to store the data. There are 254 monthly observations in the file ‘macro.xls’, starting in March 1986 and ending in April 2007. There are 13 series plus a column of dates. The series in the Excel file are the Microsoft stock price, the S&P500 index value, the consumer price index, an industrial production index, Treasury bill yields for the following maturities: three months, six months, one year, three years, five years and ten years, a measure of ‘narrow’ money supply, a consumer credit series, and a ‘credit spread’ series. The latter is defined as the difference in annualised average yields between a portfolio of bonds rated AAA and a portfolio of bonds rated BAA.
Import the data from the Excel file and save the resulting workfile as
‘macro.wf1’.
The first stage is to generate a set of changes or differences for each of the variables, since the APT posits that the stock returns can be explained by reference to the unexpected changes in the macroeconomic variables rather than their levels. The unexpected value of a variable can be defined as the difference between the actual (realised) value of the variable and its expected value. The question then arises about how we believe that investors might have formed their expectations, and while there are many ways to construct measures of expectations, the easiest is to assume that investors have naive expectations that the next period value of the variable is equal to the current value. This being the case, the entire change in the variable from one period to the next is the unexpected change (because investors are assumed to expect no change).¹
Transforming the variables can be done as described above. Press Genr and then enter the following in the ‘Enter equation’ box:

dspread = baa_aaa_spread - baa_aaa_spread(-1)

Repeat these steps to conduct all of the following transformations:

dcredit = consumer_credit - consumer_credit(-1)
dprod = industrial_production - industrial_production(-1)
rmsoft = 100*dlog(microsoft)
rsandp = 100*dlog(sandp)
dmoney = m1money_supply - m1money_supply(-1)
¹ It is an interesting question as to whether the differences should be taken on the levels of the variables or their logarithms. If the former, we have absolute changes in the variables, whereas the latter would lead to proportionate changes. The choice between the two is essentially an empirical one, and this example assumes that the former is chosen, apart from for the stock price series themselves and the consumer price series.
inflation = 100*dlog(cpi)
term = ustb10y - ustb3m

and then click OK. Next, we need to apply further transformations to some of the transformed series, so repeat the above steps to generate

dinflation = inflation - inflation(-1)
mustb3m = ustb3m/12
rterm = term - term(-1)
ermsoft = rmsoft - mustb3m
ersandp = rsandp - mustb3m
The final two of these calculate excess returns for the stock and for the
index.
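The same set of transformations can be reproduced outside EViews. The sketch below is a minimal Python/pandas version; the file name and the column names used here are assumptions that would need to match whatever is actually in the Excel file.

import numpy as np
import pandas as pd

# Load the raw series (column names here are assumed, not taken from the file itself)
macro = pd.read_excel("macro.xls", index_col=0)

# First differences proxy for the 'unexpected changes' under naive expectations
macro["dspread"] = macro["baa_aaa_spread"].diff()
macro["dcredit"] = macro["consumer_credit"].diff()
macro["dprod"] = macro["industrial_production"].diff()
macro["dmoney"] = macro["m1money_supply"].diff()

# Continuously compounded percentage returns (the analogue of 100*dlog in EViews)
macro["rmsoft"] = 100 * np.log(macro["microsoft"]).diff()
macro["rsandp"] = 100 * np.log(macro["sandp"]).diff()
macro["inflation"] = 100 * np.log(macro["cpi"]).diff()

# Second-round transformations
macro["dinflation"] = macro["inflation"].diff()
macro["mustb3m"] = macro["ustb3m"] / 12                    # three-month yield per month
macro["rterm"] = (macro["ustb10y"] - macro["ustb3m"]).diff()
macro["ermsoft"] = macro["rmsoft"] - macro["mustb3m"]      # excess stock return
macro["ersandp"] = macro["rsandp"] - macro["mustb3m"]      # excess market return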
We can now run the regression. So click Object/New Object/Equation and name the object ‘msoftreg’. Type the following variables in the Equation specification window

ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
and use Least Squares over the whole sample period. The table of results
will appear as follows.
Dependent Variable: ERMSOFT
Method: Least Squares
Date: 08/21/07   Time: 21:45
Sample (adjusted): 1986M05 2007M04
Included observations: 252 after adjustments

                     Coefficient   Std. Error   t-Statistic   Prob.
C                    −0.587603     1.457898     −0.403048     0.6873
ERSANDP               1.489434     0.203276      7.327137     0.0000
DPROD                 0.289322     0.500919      0.577583     0.5641
DCREDIT              −5.58E-05     0.000160     −0.347925     0.7282
DINFLATION            4.247809     2.977342      1.426712     0.1549
DMONEY               −1.161526     0.713974     −1.626847     0.1051
DSPREAD              12.15775     13.55097       0.897187     0.3705
RTERM                 6.067609     3.321363      1.826843     0.0689

R-squared             0.203545     Mean dependent var     −0.420803
Adjusted R-squared    0.180696     S.D. dependent var     15.41135
S.E. of regression   13.94965      Akaike info criterion   8.140017
Sum squared resid    47480.62      Schwarz criterion       8.252062
Log likelihood     −1017.642       Hannan-Quinn criter.    8.185102
F-statistic           8.908218     Durbin-Watson stat      2.156221
Prob(F-statistic)     0.000000
Take a few minutes to examine the main regression results. Which of the variables has a statistically significant impact on the Microsoft excess returns? Using your knowledge of the effects of the financial and macroeconomic environment on stock returns, examine whether the coefficients have their expected signs and whether the sizes of the parameters are plausible.
The regression F-statistic takes a value of 8.908. Remember that this tests the null hypothesis that all of the slope parameters are jointly zero. The p-value of zero attached to the test statistic shows that this null hypothesis should be rejected. However, there are a number of parameter estimates that are not significantly different from zero – specifically those on the DPROD, DCREDIT and DSPREAD variables. Let us test the null hypothesis that the parameters on these three variables are jointly zero using an F-test. To test this, click on View/Coefficient Tests/Wald – Coefficient Restrictions … and in the box that appears type C(3)=0, C(4)=0, C(7)=0 and click OK. The resulting F-test statistic follows an F(3, 244) distribution as there are three restrictions, 252 usable observations and eight parameters to estimate in the unrestricted regression. The F-statistic value is 0.402 with p-value 0.752, suggesting that the null hypothesis cannot be rejected. The parameters on DINFLATION and DMONEY are almost significant at the 10% level and so the associated parameters are not included in this F-test and the variables are retained.
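The regression and the joint test can equally be carried out with statsmodels in Python. This is only a sketch: it assumes a DataFrame called macro holding the transformed series defined earlier, and the estimates should come out very close to the EViews figures above.

import statsmodels.formula.api as smf

# Estimate the unrestricted APT-style regression (dropping the rows lost to differencing)
model = smf.ols("ermsoft ~ ersandp + dprod + dcredit + dinflation"
                " + dmoney + dspread + rterm", data=macro.dropna()).fit()
print(model.summary())

# Joint test that the coefficients on DPROD, DCREDIT and DSPREAD are all zero
print(model.f_test("dprod = 0, dcredit = 0, dspread = 0"))
# Compare the F-statistic and p-value with the 0.402 and 0.752 reported in the text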
There is a procedure known as a stepwise regression that is now available in EViews 6. Stepwise regression is an automatic variable selection procedure which chooses the jointly most ‘important’ (variously defined) explanatory variables from a set of candidate variables. There are a number of different stepwise regression procedures, but the simplest is the uni-directional forwards method. This starts with no variables in the regression (or only those variables that are always required by the researcher to be in the regression) and then it selects first the variable with the lowest p-value (largest t-ratio) if it were included, then the variable with the second lowest p-value conditional upon the first variable already being included, and so on. The procedure continues until the next lowest p-value relative to those already included variables is larger than some specified threshold value, then the selection stops, with no more variables being incorporated into the model.
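The forwards selection rule is easy to state in code, which may help to fix ideas before turning to the EViews implementation. The sketch below is a simplified, hypothetical version written with statsmodels; it is not intended to reproduce EViews’ stepwise routine exactly.

import statsmodels.api as sm

def forwards_stepwise(y, X, p_threshold=0.2):
    # y is a pandas Series; X is a DataFrame of candidate regressors (no constant)
    selected = []
    candidates = list(X.columns)
    while candidates:
        # p-value of each candidate if it were added next, given those already selected
        pvals = {}
        for var in candidates:
            trial = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            pvals[var] = trial.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] > p_threshold:
            break                      # no remaining candidate is 'significant enough'
        selected.append(best)
        candidates.remove(best)
    return sm.OLS(y, sm.add_constant(X[selected])).fit()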
To conduct a stepwise regression which will automatically select from among these variables the most important ones for explaining the variations in Microsoft stock returns, click Proc and then Equation. Name the equation Msoftstepwise and then in the ‘Estimation settings/Method’ box, change LS – Least Squares (NLS and ARMA) to STEPLS – Stepwise Least Squares and then in the top box that appears, ‘Dependent variable followed by list of always included regressors’, enter

ERMSOFT C

This shows that the dependent variable will be the excess returns on Microsoft stock and that an intercept will always be included in the regression. If the researcher had a strong prior view that a particular explanatory variable must always be included in the regression, it should be listed in this first box. In the second box, ‘List of search regressors’, type the list of all of the explanatory variables used above: ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM. The window will appear as in screenshot 3.1.
Screenshot 3.1   Stepwise procedure equation estimation window
Clicking on the ‘Options’ tab gives a number of ways to conduct the regression. For example, ‘Forwards’ will start with the list of required regressors (the intercept only in this case) and will sequentially add to them, while ‘Backwards’ will start by including all of the variables and will sequentially delete variables from the regression. The default criterion is to include variables if the p-value is less than 0.5, but this seems high and could potentially result in the inclusion of some very insignificant variables, so modify this to 0.2 and then click OK to see the results.

As can be seen, the excess market return, the term structure, money supply and unexpected inflation variables have all been included, while the default spread and credit variables have been omitted.
Dependent Variable: ERMSOFT
Method: Stepwise Regression
Date: 08/27/07   Time: 10:21
Sample (adjusted): 1986M05 2007M04
Included observations: 252 after adjustments
Number of always included regressors: 1
Number of search regressors: 7
Selection method: Stepwise forwards
Stopping criterion: p-value forwards/backwards = 0.2/0.2

                     Coefficient   Std. Error   t-Statistic   Prob.∗
C                    −0.947198     0.8787       −1.077954     0.2821
ERSANDP               1.471400     0.201459      7.303725     0.0000
RTERM                 6.121657     3.292863      1.859068     0.0642
DMONEY               −1.171273     0.702523     −1.667238     0.0967
DINFLATION            4.013512     2.876986      1.395040     0.1643

R-squared             0.199612     Mean dependent var     −0.420803
Adjusted R-squared    0.186650     S.D. dependent var     15.41135
S.E. of regression   13.89887      Akaike info criterion   8.121133
Sum squared resid    47715.09      Schwarz criterion       8.191162
Log likelihood     −1018.263       Hannan-Quinn criter.    8.149311
F-statistic          15.40008      Durbin-Watson stat      2.150604
Prob(F-statistic)     0.000000

Selection Summary
Added ERSANDP
Added RTERM
Added DMONEY
Added DINFLATION

∗Note: p-values and subsequent tests do not account for stepwise selection.
Stepwise procedures have been strongly criticised by statistical purists.
At the most basic level, they are sometimes argued to be no better than automated procedures for data mining, in particular if the list of potential candidate variables is long and results from a ‘fishing trip’ rather than a strong prior financial theory. More subtly, the iterative nature of the variable selection process implies that the size of the tests on parameters attached to variables in the final model will not be the nominal values (e.g. 5%) that would have applied had this model been the only one estimated. Thus the p-values for tests involving parameters in the final regression should really be modified to take into account that the model results from a sequential procedure, although they are usually not in statistical packages such as EViews.
3.6.1 A note on sample sizes and asymptotic theory
A question that is often asked by those new to econometrics is ‘what is an appropriate sample size for model estimation?’ While there is no definitive answer to this question, it should be noted that most testing procedures in econometrics rely on asymptotic theory. That is, the results in theory hold only if there are an infinite number of observations. In practice, an infinite number of observations will never be available and fortunately, an infinite number of observations are not usually required to invoke the asymptotic theory! An approximation to the asymptotic behaviour of the test statistics can be obtained using finite samples, provided that they are large enough. In general, as many observations as possible should be used (although there are important caveats to this statement relating to ‘structural stability’, discussed in chapter 4). The reason is that all the researcher has at his disposal is a sample of data from which to estimate parameter values and to infer their likely population counterparts. A sample may fail to deliver something close to the exact population values owing to sampling error. Even if the sample is randomly drawn from the population, some samples will be more representative of the behaviour of the population than others, purely owing to ‘luck of the draw’. Sampling error is minimised by increasing the size of the sample, since the larger the sample, the less likely it is that all of the data drawn will be unrepresentative of the population.
3.7 Data mining and the true size of the test
Recall that the probability of rejecting a correct null hypothesis is equal to the size of the test, denoted α. The possibility of rejecting a correct null hypothesis arises from the fact that test statistics are assumed to follow a random distribution and hence they will take on extreme values that fall in the rejection region some of the time by chance alone. A consequence of this is that it will almost always be possible to find significant relationships between variables if enough variables are examined. For example, suppose that a dependent variable yt and 20 explanatory variables x2t, …, x21t (excluding a constant term) are generated separately as independent normally distributed random variables. Then y is regressed separately on each of the 20 explanatory variables plus a constant, and the significance of each explanatory variable in the regressions is examined. If this experiment is repeated many times, on average one of the 20 regressions will have a slope coefficient that is significant at the 5% level for each experiment. The implication is that for any regression, if enough explanatory variables are employed in a regression, often one or more will be significant by chance alone. More concretely, it could be stated that if an α% size of test is used, on average one in every (100/α) regressions will have a significant slope coefficient by chance alone.
Trying many variables in a regression without basing the selection of the candidate variables on a financial or economic theory is known as ‘data mining’ or ‘data snooping’. The result in such cases is that the true significance level will be considerably greater than the nominal significance level assumed. For example, suppose that 20 separate regressions are conducted, of which three contain a significant regressor, and a 5% nominal significance level is assumed, then the true significance level would be much higher (e.g. 25%). Therefore, if the researcher then shows only the results for the three regressions containing significant regressors and states that they are significant at the 5% level, inappropriate conclusions concerning the significance of the variables would result.
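The point is easily verified by simulation. The following sketch (Python with statsmodels) generates a dependent variable that is pure noise together with 20 unrelated regressors, runs the 20 bivariate regressions, and records how often at least one of them appears ‘significant’ at the 5% level.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
T, n_vars, n_reps = 100, 20, 1000
count_any_significant = 0

for _ in range(n_reps):
    y = rng.standard_normal(T)
    X = rng.standard_normal((T, n_vars))          # 20 independent, irrelevant regressors
    pvals = [sm.OLS(y, sm.add_constant(X[:, [j]])).fit().pvalues[1]
             for j in range(n_vars)]
    if min(pvals) < 0.05:
        count_any_significant += 1

# With 20 independent tests at the 5% level, on average one slope per experiment is
# spuriously significant, and roughly 1 - 0.95**20 (about 64%) of experiments contain
# at least one such 'significant' regressor
print(count_any_significant / n_reps)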
As well as ensuring that the selection of candidate regressors for inclusion in a model is made on the basis of financial or economic theory, another way to avoid data mining is by examining the forecast performance of the model in an ‘out-of-sample’ data set (see chapter 5). The idea is essentially that a proportion of the data is not used in model estimation, but is retained for model testing. A relationship observed in the estimation period that is purely the result of data mining, and is therefore spurious, is very unlikely to be repeated for the out-of-sample period. Therefore, models that are the product of data mining are likely to fit very poorly and to give very inaccurate forecasts for the out-of-sample period.
3.8 Goodness of fit statistics
3.8.1 R²

It is desirable to have some measure of how well the regression model actually fits the data. In other words, it is desirable to have an answer to the question, ‘how well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?’ Quantities known as goodness of fit statistics are available to test how well the sample regression function (SRF) fits the data – that is, how ‘close’ the fitted regression line is to all of the data points taken together. Note that it is not possible to say how well the sample regression function fits the population regression function – i.e. how the estimated model compares with the true relationship between the variables, since the latter is never known.
But what measures might make plausible candidates to be goodness of fit statistics? A first response to this might be to look at the residual sum of squares (RSS). Recall that OLS selected the coefficient estimates that minimised this quantity, so the lower was the minimised value of the RSS, the better the model fitted the data. Consideration of the RSS is certainly one possibility, but RSS is unbounded from above (strictly, RSS is bounded from above by the total sum of squares – see below) – i.e. it can take any (non-negative) value. So, for example, if the value of the RSS under OLS estimation was 136.4, what does this actually mean? It would therefore be very difficult, by looking at this number alone, to tell whether the regression line fitted the data closely or not. The value of RSS depends to a great extent on the scale of the dependent variable. Thus, one way to pointlessly reduce the RSS would be to divide all of the observations on y by 10!
In fact, a scaled version of the residual sum of squares is usually employed. The most common goodness of fit statistic is known as R². One way to define R² is to say that it is the square of the correlation coefficient between y and ŷ – that is, the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model. A correlation coefficient must lie between −1 and +1 by definition. Since R² defined in this way is the square of a correlation coefficient, it must lie between 0 and 1. If this correlation is high, the model fits the data well, while if the correlation is low (close to zero), the model is not providing a good fit to the data.
Another definition of R² requires a consideration of what the model is attempting to explain. What the model is trying to do in effect is to explain variability of y about its mean value, ȳ. This quantity, ȳ, which is more specifically known as the unconditional mean of y, acts like a benchmark since, if the researcher had no model for y, he could do no worse than to regress y on a constant only. In fact, the coefficient estimate for this regression would be the mean of y. So, from the regression

yt = β1 + ut   (3.35)

the coefficient estimate β̂1 will be the mean of y, i.e. ȳ. The total variation across all observations of the dependent variable about its mean value is
known as the total sum of squares, TSS, which is given by:

TSS = Σt (yt − ȳ)²   (3.36)

The TSS can be split into two parts: the part that has been explained by the model (known as the explained sum of squares, ESS) and the part that the model was not able to explain (the RSS). That is

TSS = ESS + RSS   (3.37)

Σt (yt − ȳ)² = Σt (ŷt − ȳ)² + Σt ût²   (3.38)
Recall also that the residual sum of squares can also be expressed as

Σt (yt − ŷt)²

since a residual for observation t is defined as the difference between the actual and fitted values for that observation. The goodness of fit statistic is given by the ratio of the explained sum of squares to the total sum of squares:
R² = ESS / TSS   (3.39)

but since TSS = ESS + RSS, it is also possible to write

R² = ESS / TSS = (TSS − RSS) / TSS = 1 − RSS / TSS   (3.40)
R² must always lie between zero and one (provided that there is a constant term in the regression). This is intuitive from the correlation interpretation of R² given above, but for another explanation, consider two extreme cases

RSS = TSS, i.e. ESS = 0, so R² = ESS/TSS = 0
ESS = TSS, i.e. RSS = 0, so R² = ESS/TSS = 1

In the first case, the model has not succeeded in explaining any of the variability of y about its mean value, and hence the residual and total sums of squares are equal. This would happen only where the estimated values of all of the coefficients were exactly zero. In the second case, the model has explained all of the variability of y about its mean value, which implies that the residual sum of squares will be zero. This would happen only in the case where all of the observation points lie exactly on the fitted line. Neither of these two extremes is likely in practice, of course, but they do show that R² is bounded to lie between zero and one, with a higher R² implying, everything else being equal, that the model fits the data better.
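The equivalence of the two definitions of R² is easy to confirm numerically. The sketch below (Python/numpy, using simulated data purely for illustration) computes R² both as 1 − RSS/TSS and as the squared correlation between y and the fitted values.

import numpy as np

rng = np.random.default_rng(seed=2)
T = 200
x = rng.standard_normal(T)
y = 0.5 + 1.5 * x + rng.standard_normal(T)

# OLS fit of y on a constant and x
X = np.column_stack([np.ones(T), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fitted = X @ beta_hat

RSS = np.sum((y - y_fitted) ** 2)
TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_fitted - y.mean()) ** 2)

print(ESS / TSS)                              # R-squared as ESS/TSS
print(1 - RSS / TSS)                          # R-squared as 1 - RSS/TSS
print(np.corrcoef(y, y_fitted)[0, 1] ** 2)    # squared correlation definition; all three agree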
Figure 3.1   R² = 0 demonstrated by a flat estimated line, i.e. a zero slope coefficient

Figure 3.2   R² = 1 when all data points lie exactly on the estimated line
To sum up, a simple way (but crude, as explained next) to tell whether the regression line fits the data well is to look at the value of R². A value of R² close to 1 indicates that the model explains nearly all of the variability of the dependent variable about its mean value, while a value close to zero indicates that the model fits the data poorly. The two extreme cases, where R² = 0 and R² = 1, are indicated in figures 3.1 and 3.2 in the context of a simple bivariate regression.
3.8.2 Problems with R² as a goodness of fit measure

R² is simple to calculate, intuitive to understand, and provides a broad indication of the fit of the model to the data. However, there are a number of problems with R² as a goodness of fit measure:
(1) R² is defined in terms of variation about the mean of y so that if a model is reparameterised (rearranged) and the dependent variable changes, R² will change, even if the second model was a simple rearrangement of the first, with identical RSS. Thus it is not sensible to compare the value of R² across models with different dependent variables.
(2) R² never falls if more regressors are added to the regression. For example, consider the following two models:

Regression 1: y = β1 + β2x2 + β3x3 + u   (3.41)
Regression 2: y = β1 + β2x2 + β3x3 + β4x4 + u   (3.42)

R² will always be at least as high for regression 2 relative to regression 1. The R² from regression 2 would be exactly the same as that for regression 1 only if the estimated value of the coefficient on the new variable were exactly zero, i.e. β̂4 = 0. In practice, β̂4 will always be non-zero, even if not significantly so, and thus in practice R² always rises as more variables are added to a model. This feature of R² essentially makes it impossible to use as a determinant of whether a given variable should be present in the model or not.
(3) R² can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of R².
3.8.3 Adjusted R²

In order to get around the second of these three problems, a modification to R² is often made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as R̄², or adjusted R², which is defined as

R̄² = 1 − [(T − 1)/(T − k)](1 − R²)   (3.43)

So if an extra regressor (variable) is added to the model, k increases and unless R² increases by a more than off-setting amount, R̄² will actually fall. Hence R̄² can be used as a decision-making tool for determining whether a given variable should be included in a regression model or not, with the rule being: include the variable if R̄² rises and do not include it if R̄² falls.
However, there are still problems with the maximisation of R̄² as a criterion for model selection, and principal among these is that it is a ‘soft’ rule, implying that by following it, the researcher will typically end up with a large model, containing a lot of marginally significant or insignificant variables. Also, while R² must be at least zero if an intercept is included in the regression, its adjusted counterpart may take negative values, even with an intercept in the regression, if the model fits the data very poorly.
Now reconsider the results from the previous exercises using EViews in the previous chapter and earlier in this chapter. If we first consider the hedging model from chapter 2, the R² value for the returns regression was only 0.01, indicating that a mere 1% of the variation in spot returns is explained by the futures returns – a very poor model fit indeed.

The fit is no better for the Ford stock CAPM regression described in chapter 2, where the R² is less than 1% and the adjusted R² is actually negative. The conclusion here would be that for this stock and this sample period, almost none of the monthly movement in the excess returns can be attributed to movements in the market as a whole, as measured by the S&P500.
Finally, if we look at the results from the recent regressions for Microsoft, we find a considerably better fit. It is of interest to compare the model fit for the original regression that included all of the variables with the results of the stepwise procedure. We can see that the raw R² is slightly higher for the original regression (0.204 versus 0.200 for the stepwise regression, to three decimal places), exactly as we would expect. Since the original regression contains more variables, the R²-value must be at least as high. But comparing the R̄²s, the stepwise regression value (0.187) is slightly higher than for the full regression (0.181), indicating that the additional regressors in the full regression do not justify their presence, at least according to this criterion.
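These figures can be checked directly from equation (3.43) using the raw R² values and the sample size and parameter counts reported in the two output tables; a short Python check follows.

def adjusted_r2(r2, T, k):
    # Adjusted R-squared as in equation (3.43)
    return 1 - (T - 1) / (T - k) * (1 - r2)

print(adjusted_r2(0.203545, T=252, k=8))   # full regression: about 0.181
print(adjusted_r2(0.199612, T=252, k=5))   # stepwise regression: about 0.187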
Box 3.1 The relationship between the regression F-statistic and R²

There is a particular relationship between a regression’s R² value and the regression F-statistic. Recall that the regression F-statistic tests the null hypothesis that all of the regression slope parameters are simultaneously zero. Let us call the residual sum of squares for the unrestricted regression including all of the explanatory variables RSS, while the restricted regression will simply be one of yt on a constant

yt = β1 + ut   (3.44)

Since there are no slope parameters in this model, none of the variability of yt about its mean value would have been explained. Thus the residual sum of squares for equation (3.44) will actually be the total sum of squares of yt, TSS. We could write the usual F-statistic formula for testing this null that all of the slope parameters are jointly zero as

F-stat = [(TSS − RSS)/RSS] × [(T − k)/(k − 1)]   (3.45)

In this case, the number of restrictions (‘m’) is equal to the number of slope parameters, k − 1. Recall that TSS − RSS = ESS and dividing the numerator and denominator of equation (3.45) by TSS, we obtain

F-stat = [(ESS/TSS)/(RSS/TSS)] × [(T − k)/(k − 1)]   (3.46)

Now the numerator of equation (3.46) is R², while the denominator is 1 − R², so that the F-statistic can be written

F-stat = [R²(T − k)] / [(1 − R²)(k − 1)]   (3.47)

This relationship between the F-statistic and R² holds only for a test of this null hypothesis and not for any others.
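As a quick numerical check of (3.47), the figures from the full Microsoft regression reported earlier can be plugged in (R² = 0.203545, T = 252, k = 8); the result should reproduce the regression F-statistic of 8.908.

R2, T, k = 0.203545, 252, 8
F_stat = R2 * (T - k) / ((1 - R2) * (k - 1))
print(F_stat)   # approximately 8.908, matching the EViews output above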
There now follows another case study of the application of the OLS method of regression estimation, including interpretation of t-ratios and R².
3.9 Hedonic pricing models
One application of econometric techniques where the coefficients have a particularly intuitively appealing interpretation is in the area of hedonic pricing models. Hedonic models are used to value real assets, especially housing, and view the asset as representing a bundle of characteristics, each of which gives either utility or disutility to its consumer. Hedonic models are often used to produce appraisals or valuations of properties, given their characteristics (e.g. size of dwelling, number of bedrooms, location, number of bathrooms, etc). In these models, the coefficient estimates represent ‘prices of the characteristics’.
One such application of a hedonic pricing model is given by Des Rosiers and Thérialt (1996), who consider the effect of various amenities on rental values for buildings and apartments in five sub-markets in the Quebec area of Canada. After accounting for the effect of ‘contract-specific’ features which will affect rental values (such as whether furnishings, lighting, or hot water are included in the rental price), they arrive at a model where the rental value in Canadian dollars per month (the dependent variable) is a function of 9–14 variables (depending on the area under consideration). The paper employs 1990 data for the Quebec City region, and there are 13,378 observations. The 12 explanatory variables are:
LnAGE        log of the apparent age of the property
NBROOMS      number of bedrooms
AREABYRM     area per room (in square metres)
ELEVATOR     a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT     a dummy variable = 1 if the unit is located in a basement; 0 otherwise
OUTPARK      number of outdoor parking spaces
INDPARK      number of indoor parking spaces
NOLEASE      a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise
LnDISTCBD    log of the distance in kilometres to the central business district (CBD)
SINGLPAR     percentage of single parent families in the area where the building stands
DSHOPCNTR    distance in kilometres to the nearest shopping centre
VACDIFF1     vacancy difference between the building and the census figure
This list includes several variables that are dummy variables. Dummy variables are also known as qualitative variables because they are often used to numerically represent a qualitative entity. Dummy variables are usually specified to take on one of a narrow range of integer values, and in most instances only zero and one are used.

Dummy variables can be used in the context of cross-sectional or time series regressions. The latter case will be discussed extensively below. Examples of the use of dummy variables as cross-sectional regressors would be for sex in the context of starting salaries for new traders (e.g. male = 0, female = 1) or in the context of sovereign credit ratings (e.g. developing country = 0, developed country = 1), and so on. In each case, the dummy variables are used in the same way as other explanatory variables and the coefficients on the dummy variables can be interpreted as the average differences in the values of the dependent variable for each category, given all of the other factors in the model.
Des Rosiers and Thérialt (1996) report several specifications for five different regions, and they present results for the model with variables as discussed here in their exhibit 4, which is adapted and reported here as table 3.1.
Table 3.1 Hedonic model of rental values in Quebec City, 1990.
Dependent variable: Canadian dollars per month

                                              A priori
Variable       Coefficient     t-ratio        sign expected
Intercept        282.21         56.09         +
LnAGE            −53.10        −59.71         −
NBROOMS           48.47        104.81         +
AREABYRM           3.97         29.99         +
ELEVATOR          88.51         45.04         +
BASEMENT         −15.90        −11.32         −
OUTPARK            7.17          7.07         +
INDPARK           73.76         31.25         +
NOLEASE          −16.99         −7.62         −
LnDISTCBD          5.84          4.60         −
SINGLPAR          −4.27        −38.88         −
DSHOPCNTR        −10.04         −5.97         −
VACDIFF1           0.29          5.98         −

Notes: Adjusted R² = 0.651; regression F-statistic = 2082.27.
Source: Des Rosiers and Thérialt (1996). Reprinted with permission of American Real Estate Society.
The adjusted R² value indicates that 65% of the total variability of rental prices about their mean value is explained by the model. For a cross-sectional regression, this is quite high. Also, all variables are significant at the 0.01% level or lower and consequently, the regression F-statistic rejects very strongly the null hypothesis that all coefficient values on explanatory variables are zero.

As stated above, one way to evaluate an econometric model is to determine whether it is consistent with theory. In this instance, no real theory is available, but instead there is a notion that each variable will affect rental values in a given direction. The actual signs of the coefficients can be compared with their expected values, given in the last column of table 3.1 (as determined by this author). It can be seen that all coefficients except two (the log of the distance to the CBD and the vacancy differential) have their predicted signs. It is argued by Des Rosiers and Thérialt that the ‘distance to the CBD’ coefficient may be expected to have a positive sign since, while it is usually viewed as desirable to live close to a town centre, everything else being equal, in this instance most of the least desirable neighbourhoods are located towards the centre.
The coefficient estimates themselves show the Canadian dollar rental price per month of each feature of the dwelling. To offer a few illustrations, the NBROOMS value of 48 (rounded) shows that, everything else being equal, one additional bedroom will lead to an average increase in the rental price of the property by $48 per month at 1990 prices. A basement coefficient of −16 suggests that an apartment located in a basement commands a rental $16 less than an identical apartment above ground. Finally the coefficients for parking suggest that on average each outdoor parking space adds $7 to the rent while each indoor parking space adds $74, and so on. The intercept shows, in theory, the rental that would be required of a property that had zero values on all the attributes. This case demonstrates, as stated previously, that the coefficient on the constant term often has little useful interpretation, as it would refer to a dwelling that has just been built, has no bedrooms each of zero size, no parking spaces, no lease, right in the CBD and shopping centre, etc.
One limitation of such studies that is worth mentioning at this stage is their assumption that the implicit price of each characteristic is identical across types of property, and that these characteristics do not become saturated. In other words, it is implicitly assumed that if more and more bedrooms or allocated parking spaces are added to a dwelling indefinitely, the monthly rental price will rise each time by $48 and $7, respectively. This assumption is very unlikely to be upheld in practice, and will result in the estimated model being appropriate for only an ‘average’ dwelling. For example, an additional indoor parking space is likely to add far more value to a luxury apartment than a basic one. Similarly, the marginal value of an additional bedroom is likely to be bigger if the dwelling currently has one bedroom than if it already has ten. One potential remedy for this would be to use dummy variables with fixed effects in the regressions; see, for example, chapter 10 for an explanation of these.
3.10 Tests of non-nested hypotheses
All of the hypothesis tests conducted thus far in this book have been in the context of ‘nested’ models. This means that, in each case, the test involved imposing restrictions on the original model to arrive at a restricted formulation that would be a sub-set of, or nested within, the original specification.

However, it is sometimes of interest to compare between non-nested models. For example, suppose that there are two researchers working independently, each with a separate financial theory for explaining the
variation in some variable, yt. The models selected by the researchers respectively could be

yt = α1 + α2x2t + ut   (3.48)
yt = β1 + β2x3t + vt   (3.49)

where ut and vt are iid error terms. Model (3.48) includes variable x2 but not x3, while model (3.49) includes x3 but not x2. In this case, neither model can be viewed as a restriction of the other, so how then can the two models be compared as to which better represents the data, yt? Given the discussion in section 3.8, an obvious answer would be to compare the values of R² or adjusted R² between the models. Either would be equally applicable in this case since the two specifications have the same number of RHS variables. Adjusted R² could be used even in cases where the number of variables was different across the two models, since it employs a penalty term that makes an allowance for the number of explanatory variables. However, adjusted R² is based upon a particular penalty function (that is, T − k appears in a specific way in the formula). This form of penalty term may not necessarily be optimal. Also, given the statement above that adjusted R² is a soft rule, it is likely on balance that use of it to choose between models will imply that models with more explanatory variables are favoured. Several other similar rules are available, each having more or less strict penalty terms; these are collectively known as ‘information criteria’. These are explained in some detail in chapter 5, but suffice to say for now that a different strictness of the penalty term will in many cases lead to a different preferred model.
An alternative approach to comparing between non-nested models would be to estimate an encompassing or hybrid model. In the case of (3.48) and (3.49), the relevant encompassing model would be

yt = γ1 + γ2x2t + γ3x3t + wt   (3.50)

where wt is an error term. Formulation (3.50) contains both (3.48) and (3.49) as special cases when γ3 and γ2 are zero, respectively. Therefore, a test for the best model would be conducted via an examination of the significances of γ2 and γ3 in model (3.50). There will be four possible outcomes (box 3.2).
However, there are several limitations to the use of encompassing regressions to select between non-nested models. Most importantly, even if models (3.48) and (3.49) have a strong theoretical basis for including the RHS variables that they do, the hybrid model may be meaningless. For example, it could be the case that financial theory suggests that y could either follow model (3.48) or model (3.49), but model (3.50) is implausible.
Box 3.2 Selecting between models

(1) γ2 is statistically significant but γ3 is not. In this case, (3.50) collapses to (3.48), and the latter is the preferred model.
(2) γ3 is statistically significant but γ2 is not. In this case, (3.50) collapses to (3.49), and the latter is the preferred model.
(3) γ2 and γ3 are both statistically significant. This would imply that both x2 and x3 have incremental explanatory power for y, in which case both variables should be retained. Models (3.48) and (3.49) are both ditched and (3.50) is the preferred model.
(4) Neither γ2 nor γ3 is statistically significant. In this case, none of the models can be dropped, and some other method for choosing between them must be employed.
Also, if the competing explanatory variables x2 and x3 are highly related (i.e. they are near collinear), it could be the case that if they are both included, neither γ2 nor γ3 is statistically significant, while each is significant in its separate regression, (3.48) or (3.49); see the section on multicollinearity in chapter 4.
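A minimal sketch of the encompassing approach using statsmodels is given below. The names y, x2, x3 and df are placeholders for whatever series and DataFrame the two competing theories actually involve.

import statsmodels.formula.api as smf

# Hybrid (encompassing) regression corresponding to equation (3.50)
hybrid = smf.ols("y ~ x2 + x3", data=df).fit()
print(hybrid.summary())

# Individual significance of gamma_2 and gamma_3, to be read against box 3.2;
# note that near collinearity between x2 and x3 can make both appear insignificant
# here even though each is significant in its own separate regression
print(hybrid.pvalues[["x2", "x3"]])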
An alternative approach is via the J-encompassing test due to Davidson and MacKinnon (1981). Interested readers are referred to their work or to Gujarati (2003, pp. 533–6) for further details.
Key concepts
The key terms to be able to define and explain from this chapter are
● multiple regression model      ● variance-covariance matrix
● restricted regression          ● F-distribution
● R²                             ● R̄²
● hedonic model                  ● encompassing regression
● data mining
Appendix 3.1 Mathematical derivations of CLRM results
Derivation of the OLS coefficient estimator in the
multiple regression context
In the multiple regression context, in order to obtain the parameter estimates, β1, β2, …, βk, the RSS would be minimised with respect to all the elements of β. Now the residuals are expressed in a vector:

û = [û1 û2 ··· ûT]′   (3A.1)
The RSS is still the relevant loss function, and would be given in a matrix notation by expression (3A.2)

L = û′û = [û1 û2 ··· ûT][û1 û2 ··· ûT]′ = û1² + û2² + ··· + ûT² = Σt ût²   (3A.2)
Denoting the vector of estimated parameters as β̂, it is also possible to write

L = û′û = (y − Xβ̂)′(y − Xβ̂) = y′y − β̂′X′y − y′Xβ̂ + β̂′X′Xβ̂   (3A.3)

It turns out that β̂′X′y is (1 × k) × (k × T) × (T × 1) = 1 × 1, and also that y′Xβ̂ is (1 × T) × (T × k) × (k × 1) = 1 × 1, so in fact β̂′X′y = y′Xβ̂. Thus (3A.3) can be written

L = û′û = (y − Xβ̂)′(y − Xβ̂) = y′y − 2β̂′X′y + β̂′X′Xβ̂   (3A.4)
Differentiating this expression with respect to β̂ and setting it to zero in order to find the parameter values that minimise the residual sum of squares would yield

∂L/∂β̂ = −2X′y + 2X′Xβ̂ = 0   (3A.5)

This expression arises since the derivative of y′y is zero with respect to β̂, and β̂′X′Xβ̂ acts like a square of Xβ̂, which is differentiated to 2X′Xβ̂. Rearranging (3A.5)

2X′y = 2X′Xβ̂   (3A.6)
X′y = X′Xβ̂   (3A.7)

Pre-multiplying both sides of (3A.7) by the inverse of X′X

β̂ = (X′X)⁻¹X′y   (3A.8)

Thus, the vector of OLS coefficient estimates for a set of k parameters is given by

β̂ = [β̂1 β̂2 ··· β̂k]′ = (X′X)⁻¹X′y   (3A.9)
Derivation of the OLS standard error estimator in the
multiple regression context
The variance of a vector of random variables β̂ is given by the formula E[(β̂ − β)(β̂ − β)′]. Since y = Xβ + u, it can also be stated, given (3A.9), that

β̂ = (X′X)⁻¹X′(Xβ + u)   (3A.10)

Expanding the parentheses

β̂ = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u   (3A.11)
β̂ = β + (X′X)⁻¹X′u   (3A.12)

Thus, it is possible to express the variance of β̂ as

E[(β̂ − β)(β̂ − β)′] = E[(β + (X′X)⁻¹X′u − β)(β + (X′X)⁻¹X′u − β)′]   (3A.13)

Cancelling the β terms in each set of parentheses

E[(β̂ − β)(β̂ − β)′] = E[((X′X)⁻¹X′u)((X′X)⁻¹X′u)′]   (3A.14)

Expanding the parentheses on the RHS of (3A.14) gives

E[(β̂ − β)(β̂ − β)′] = E[(X′X)⁻¹X′uu′X(X′X)⁻¹]   (3A.15)
E[(β̂ − β)(β̂ − β)′] = (X′X)⁻¹X′E[uu′]X(X′X)⁻¹   (3A.16)

Now E[uu′] is estimated by s²I, so that

E[(β̂ − β)(β̂ − β)′] = (X′X)⁻¹X′s²IX(X′X)⁻¹   (3A.17)

where I is a T × T identity matrix. Rearranging further,

E[(β̂ − β)(β̂ − β)′] = s²(X′X)⁻¹X′X(X′X)⁻¹   (3A.18)

The X′X and the last (X′X)⁻¹ term cancel out to leave

var(β̂) = s²(X′X)⁻¹   (3A.19)

as the expression for the parameter variance–covariance matrix. This quantity, s²(X′X)⁻¹, is known as the estimated variance–covariance matrix of the coefficients. The leading diagonal terms give the estimated coefficient variances while the off-diagonal terms give the estimated covariances between the parameter estimates. The variance of β̂1 is the first diagonal element, the variance of β̂2 is the second element on the leading diagonal, …, and the variance of β̂k is the kth diagonal element, etc. as discussed in the body of the chapter.
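The two key results, (3A.9) and (3A.19), translate directly into a few lines of numpy. The sketch below uses simulated data purely for illustration; in practice y and X would be the actual data, with a column of ones in X for the intercept.

import numpy as np

rng = np.random.default_rng(seed=3)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(T)

# OLS coefficient vector, equation (3A.9): beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals and the variance estimate s^2
u_hat = y - X @ beta_hat
s2 = (u_hat @ u_hat) / (T - k)

# Variance-covariance matrix of beta_hat, equation (3A.19): s^2 (X'X)^(-1)
var_beta = s2 * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(var_beta))   # coefficient standard errors

print(beta_hat)
print(std_errors)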
Appendix 3.2 A brief introduction to factor models and principal
components analysis
Factor models are employed primarily as dimensionality reduction techniques in situations where we have a large number of closely related variables and where we wish to allow for the most important influences from all of these variables at the same time. Factor models decompose the structure of a set of series into factors that are common to all series and a proportion that is specific to each series (idiosyncratic variation). There are broadly two types of such models, which can be loosely characterised as either macroeconomic or mathematical factor models. The key distinction between the two is that the factors are observable for the former but are latent (unobservable) for the latter. Observable factor models include the APT model of Ross (1976). The most common mathematical factor model is principal components analysis (PCA). PCA is a technique that may be useful where explanatory variables are closely related – for example, in the context of near multicollinearity. Specifically, if there are k explanatory variables in the regression model, PCA will transform them into k uncorrelated new variables. To elucidate, suppose that the original explanatory variables are denoted x1, x2, …, xk, and denote the principal components by p1, p2, …, pk. These principal components are independent linear combinations of the original data

p1 = α11x1 + α12x2 + ··· + α1kxk
p2 = α21x1 + α22x2 + ··· + α2kxk   (3A.20)
…
pk = αk1x1 + αk2x2 + ··· + αkkxk

where αij are coefficients to be calculated, representing the coefficient on the jth explanatory variable in the ith principal component. These coefficients are also known as factor loadings. Note that there will be T observations on each principal component if there were T observations on each explanatory variable.
It is also required that the sum of the squares of the coefficients for each component is one, i.e.

α11² + α12² + ··· + α1k² = 1
…   (3A.21)
αk1² + αk2² + ··· + αkk² = 1
This requirement could also be expressed using sigma notation

Σ(j=1 to k) αij² = 1   ∀ i = 1, …, k   (3A.22)
Constructing the components is a purely mathematical exercise in constrained optimisation, and thus no assumption is made concerning the structure, distribution, or other properties of the variables.

The principal components are derived in such a way that they are in descending order of importance. Although there are k principal components, the same as the number of explanatory variables, if there is some collinearity between these original explanatory variables, it is likely that some of the (last few) principal components will account for so little of the variation that they can be discarded. However, if all of the original explanatory variables were already essentially uncorrelated, all of the components would be required, although in such a case there would have been little motivation for using PCA in the first place.
The principal components can also be understood in terms of the eigenvalues and associated eigenvectors of (X′X), where X is the matrix of observations on the original variables. Thus the number of eigenvalues will be equal to the number of variables, k. If the ordered eigenvalues are denoted λi (i = 1, …, k), the ratio

φi = λi / Σ(i=1 to k) λi

gives the proportion of the total variation in the original data explained by principal component i. Suppose that only the first r (0 < r < k) principal components are deemed sufficiently useful in explaining the variation of (X′X), and that they are to be retained, with the remaining k − r components being discarded. The regression finally estimated, after the principal components have been formed, would be one of y on the r principal components

yt = γ0 + γ1p1t + ··· + γrprt + ut   (3A.23)
In this way, the principal components are argued to keep most of the important information contained in the original explanatory variables, but are orthogonal. This may be particularly useful for independent variables that are very closely related. The principal component estimates (γ̂i, i = 1, …, r) will be biased estimates, although they will be more efficient than the OLS estimators since redundant information has been removed. In fact, if the OLS estimator for the original regression of y on x is denoted β̂, it can be shown that

γ̂r = Pr′β̂   (3A.24)

where γ̂r are the coefficient estimates for the principal components, and Pr is a matrix of the first r principal components. The principal component coefficient estimates are thus simply linear combinations of the original OLS estimates.
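The construction described above amounts to an eigenvalue decomposition of the correlation matrix of the standardised data, and can be carried out with a few lines of numpy. The sketch below is illustrative only; the data matrix X is a placeholder for whatever set of related series is being analysed.

import numpy as np

def principal_components(X):
    # Standardise each column to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)            # k x k correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh is for symmetric matrices
    order = np.argsort(eigvals)[::-1]              # descending order of importance
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Z @ loadings                          # the principal components p_1, ..., p_k
    proportion = eigvals / eigvals.sum()           # phi_i, share of variation explained
    return eigvals, loadings, scores, proportion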
An application of principal components to interest rates
Many economic and financial models make use of interest rates in some form or another as independent variables. Researchers may wish to include interest rates on a large number of different assets in order to reflect the variety of investment opportunities open to investors. However, market interest rates could be argued to be not sufficiently independent of one another to make the inclusion of several interest rate series in an econometric model statistically sensible. One approach to examining this issue would be to use PCA on several related interest rate series to determine whether they did move independently of one another over some historical time period or not.
Fase (1973) conducted such a study in the context of monthly Dutch market interest rates from January 1962 until December 1970 (108 months). Fase examined both ‘money market’ and ‘capital market’ rates, although only the money market results will be discussed here in the interests of brevity. The money market instruments investigated were:
●Call money
●Three-month Treasury paper
●One-year Treasury paper
●Two-year Treasury paper
●Three-year Treasury paper
●Five-year Treasury paper
●Loans to local authorities: three-month
●Loans to local authorities: one-year
●Eurodollar deposits
●Netherlands Bank official discount rate.
Prior to analysis, each series was standardised to have zero mean and unit variance by subtracting the mean and dividing by the standard deviation in each case. The three largest of the ten eigenvalues are given in table 3A.1.
Table 3A.1 Principal component ordered eigenvalues for Dutch interest rates, 1962–1970

        Monthly data                                    Quarterly data
        Jan 62–Dec 70   Jan 62–Jun 66   Jul 66–Dec 70   Jan 62–Dec 70
λ1          9.57            9.31            9.32            9.67
λ2          0.20            0.31            0.40            0.16
λ3          0.09            0.20            0.17            0.07
φ1         95.7%           93.1%           93.2%           96.7%

Source: Fase (1973). Reprinted with the permission of Elsevier Science.
Table 3A.2 Factor loadings of the first and second principal components for Dutch interest rates, 1962–1970

j    Debt instrument                               αj1      αj2
1    Call money                                    0.95    −0.22
2    3-month Treasury paper                        0.98     0.12
3    1-year Treasury paper                         0.99     0.15
4    2-year Treasury paper                         0.99     0.13
5    3-year Treasury paper                         0.99     0.11
6    5-year Treasury paper                         0.99     0.09
7    Loans to local authorities: 3-month           0.99    −0.08
8    Loans to local authorities: 1-year            0.99    −0.04
9    Eurodollar deposits                           0.96    −0.26
10   Netherlands Bank official discount rate       0.96    −0.03
     Eigenvalue, λi                                9.57     0.20
     Proportion of variability explained
     by eigenvalue i, φi (%)                       95.7     2.0

Source: Fase (1973). Reprinted with the permission of Elsevier Science.
The results in table 3A.1 are presented for the whole period using the monthly data, for two monthly sub-samples, and for the whole period using data sampled quarterly instead of monthly. The results show clearly that the first principal component is sufficient to describe the common variation in these Dutch interest rate series. The first component is able to explain over 90% of the variation in all four cases, as given in the last row of table 3A.1. Clearly, the estimated eigenvalues are fairly stable across the sample periods and are relatively invariant to the frequency of sampling of the data. The factor loadings (coefficient estimates) for the first two ordered components are given in table 3A.2.
As table 3A.2 shows, the loadings on each factor making up the first principal component are all positive. Since each series has been standardised to have zero mean and unit variance, the coefficients αj1 and αj2 can be interpreted as the correlations between the interest rate j and the first and second principal components, respectively. The factor loadings for each interest rate series on the first component are all very close to one. Fase (1973) therefore argues that the first component can be interpreted simply as an equally weighted combination of all of the market interest rates. The second component, which explains much less of the variability of the rates, shows a factor loading pattern of positive coefficients for the Treasury paper series and negative or almost zero values for the other series. Fase (1973) argues that this is owing to the characteristics of the Dutch Treasury instruments that they rarely change hands and have low transactions costs, and therefore have less sensitivity to general interest rate movements. Also, they are not subject to default risks in the same way as, for example, Eurodollar deposits. Therefore, the second principal component is broadly interpreted as relating to default risk and transactions costs.
Principal components can be useful in some circumstances, although the technique has limited applicability for the following reasons:

● A change in the units of measurement of x will change the principal components. It is thus usual to transform all of the variables to have zero mean and unit variance prior to applying PCA.
● The principal components usually have no theoretical motivation or interpretation whatsoever.
● The r principal components retained from the original k are the ones that explain most of the variation in x, but these components might not be the most useful as explanations for y.
Calculating principal components in EViews
In order to calculate the principal components of a set of series with EViews, the first stage is to compile the series concerned into a group. Re-open the ‘macro.wf1’ file which contains US Treasury bill and bond series of various maturities. Select New Object/Group but do not name the object. When EViews prompts you to give a ‘List of series, groups and/or series expressions’, enter

USTB3M USTB6M USTB1Y USTB3Y USTB5Y USTB10Y

and click OK, then name the group Interest by clicking the Name tab. The group will now appear as a set of series in a spreadsheet format. From within this window, click View/Principal Components. Screenshot 3.2 will appear.
There are many features of principal components that can be examined,
but for now keep the defaults and click OK. The results will appear as in
the following table.
Principal Components Analysis
Date: 08/31/07   Time: 14:45
Sample: 1986M03 2007M04
Included observations: 254
Computed using: Ordinary correlations
Extracting 6 of 6 possible components

Eigenvalues: (Sum = 6, Average = 1)
                                                Cumulative   Cumulative
Number    Value       Difference   Proportion   Value        Proportion
1         5.645020    5.307297     0.9408       5.645020     0.9408
2         0.337724    0.323663     0.0563       5.982744     0.9971
3         0.014061    0.011660     0.0023       5.996805     0.9995
4         0.002400    0.001928     0.0004       5.999205     0.9999
5         0.000473    0.000150     0.0001       5.999678     0.9999
6         0.000322    –            0.0001       6.000000     1.0000

Eigenvectors (loadings):
Variable    PC 1        PC 2        PC 3        PC 4        PC 5        PC 6
USTB3M      0.405126   −0.450928    0.556508   −0.407061    0.393026   −0.051647
USTB6M      0.409611   −0.393843    0.084066    0.204579   −0.746089    0.267466
USTB1Y      0.415240   −0.265576   −0.370498    0.577827    0.335650   −0.416211
USTB3Y      0.418939    0.118972   −0.540272   −0.295318    0.243919    0.609699
USTB5Y      0.410743    0.371439   −0.159996   −0.461981   −0.326636   −0.589582
USTB10Y     0.389162    0.647225    0.477986    0.3973990   0.100167    0.182274

Ordinary correlations:
            USTB3M     USTB6M     USTB1Y     USTB3Y     USTB5Y     USTB10Y
USTB3M      1.000000
USTB6M      0.997052   1.000000
USTB1Y      0.986682   0.995161   1.000000
USTB3Y      0.936070   0.952056   0.973701   1.000000
USTB5Y      0.881930   0.899989   0.929703   0.987689   1.000000
USTB10Y     0.794794   0.814497   0.852213   0.942477   0.981955   1.000000
It is evident that there is a great deal of common variation in the series, since the first principal component captures 94% of the variation in the series and the first two components capture 99.7%. Consequently, if we wished, we could reduce the dimensionality of the system by using two components rather than the entire six interest rate series. Interestingly, the first component comprises almost exactly equal weights in all six series.

Screenshot 3.2   Conducting PCA in EViews

Then Minimise this group and you will see that the ‘Interest’ group has been added to the list of objects.
Review questions
1. By using examples from the relevant statistical tables, explain the
relationship between the t- and the F-distributions.
For questions 2–5, assume that the econometric model is of the form

yt = β1 + β2x2t + β3x3t + β4x4t + β5x5t + ut   (3.51)
2. Which of the following hypotheses about the coefficients can be tested using a t-test? Which of them can be tested using an F-test? In each case, state the number of restrictions.
(a) H0: β3 = 2
(b) H0: β3 + β4 = 1
(c)H0:β3+β4=1 andβ5=1
(d)H0:β2=0 andβ3=0 andβ4=0 andβ5=0
(e)H0:β2β3=1
3. Which of the above null hypotheses constitutes ‘THE’ regression
F-statistic in the context of (3.51)? Why is this null hypothesis
always of interest whatever the regression relationship under study? What exactly would constitute the alternative hypothesis in this case?
4. Which would you expect to be bigger – the unrestricted residual sum of
squares or the restricted residual sum of squares, and why?
5. You decide to investigate the relationship given in the null hypothesis of
question 2, part (c). What would constitute the restricted regression? The regressions are carried out on a sample of 96 quarterly observations, and the residual sums of squares for the restricted and unrestricted regressions are 102.87 and 91.41, respectively. Perform the test. What is your conclusion?
6. You estimate a regression of the form given by (3.52) below in order to
evaluate the effect of various firm-specific factors on the returns of a sample of firms. You run a cross-sectional regression with 200 firms
ri = β0 + β1Si + β2MBi + β3PEi + β4BETAi + ui   (3.52)
where: ri is the percentage annual return for the stock
Si is the size of firm i measured in terms of sales revenue
MBi is the market to book ratio of the firm
PEi is the price/earnings (P/E) ratio of the firm
BETAi is the stock's CAPM beta coefficient
You obtain the following results (with standard errors in parentheses)
ˆri = 0.080 + 0.801Si + 0.321MBi + 0.164PEi − 0.084BETAi
        (0.064)  (0.147)   (0.136)    (0.420)    (0.120)        (3.53)
Calculate the t-ratios. What do you conclude about the effect of each
variable on the returns of the security? On the basis of your results, what variables would you consider deleting from the regression? If a stock's beta increased from 1 to 1.2, what would be the expected effect on the stock's return? Is the sign on beta as you would have expected? Explain your answers in each case.
7. A researcher estimates the following econometric models including a
lagged dependent variable
yt=β1+β2x2t+β3x3t+β4yt−1+ut (3.54)
Δyt=γ1+γ2x2t+γ3x3t+γ4yt−1+vt (3.55)
where ut and vt are iid disturbances.
Will these models have the same value of (a) The residual sum of
squares ( RSS), (b) R2, (c) Adjusted R2? Explain your answers in each
case.
8. A researcher estimates the following two econometric models
yt=β1+β2x2t+β3x3t+ut (3.56)
yt=β1+β2x2t+β3x3t+β4x4t+vt (3.57)
where ut and vt are iid disturbances and x4t is an irrelevant variable which does not enter into the data generating process for yt. Will the value of (a) R2, (b) Adjusted R2, be higher for the second model than the first? Explain your answers.
9. Re-open the CAPM EViews file and estimate CAPM betas for each of the other stocks in the file.
(a) Which of the stocks, on the basis of the parameter estimates you obtain, would you class as defensive stocks and which as aggressive stocks? Explain your answer.
(b) Is the CAPM able to provide any reasonable explanation of the overall variability of the returns to each of the stocks over the sample period? Why or why not?
10. Re-open the Macro file and apply the same APT-type model to some of
the other time-series of stock returns contained in the CAPM-file.
(a) Run the stepwise procedure in each case. Is the same sub-set of variables selected for each stock? Can you rationalise the differences between the series chosen?
(b) Examine the sizes and signs of the parameters in the regressions
in each case – do these make sense?
11. What are the units of R2?
4
Classical linear regression model assumptions
and diagnostic tests
Learning Outcomes
In this chapter, you will learn how to
●Describe the steps involved in testing regression residuals for
heteroscedasticity and autocorrelation
●Explain the impact of heteroscedasticity or autocorrelation on
the optimality of OLS parameter and standard error estimation
●Distinguish between the Durbin–Watson and Breusch–Godfrey
tests for autocorrelation
●Highlight the advantages and disadvantages of dynamic models
●Test for whether the functional form of the model employed is
appropriate
●Determine whether the residual distribution from a regression
differs significantly from normality
●Investigate whether the model parameters are stable
●Appraise different philosophies of how to build an econometric
model
●Conduct diagnostic tests in EViews
4.1 Introduction
Recall that five assumptions were made relating to the classical linear regression model (CLRM). These were required to show that the estimation technique, ordinary least squares (OLS), had a number of desirable properties, and also so that hypothesis tests regarding the coefficient estimates could validly be conducted. Specifically, it was assumed that:
(1) E(ut) = 0
(2) var(ut) = σ² < ∞
(3) cov(ui, uj) = 0
(4) cov(ut, xt) = 0
(5) ut ∼ N(0, σ²)
These assumptions will now be studied further, in particular looking at
the following:
●How can violations of the assumptions be detected?
●What are the most likely causes of the violations in practice?
●What are the consequences for the model if an assumption is violated but this fact is ignored and the researcher proceeds regardless?
The answer to the last of these questions is that, in general, the model
could encounter any combination of three problems:
●the coefficient estimates ( ˆβs) are wrong
●the associated standard errors are wrong
●the distributions that were assumed for the test statistics are inappropriate.
A pragmatic approach to 'solving' problems associated with the use of models where one or more of the assumptions is not supported by the data will then be adopted. Such solutions usually operate such that:
●the assumptions are no longer violated, or
●the problems are side-stepped, so that alternative techniques are used which are still valid.
4.2 Statistical distributions for diagnostic tests
The text below discusses various regression diagnostic (misspecification) tests that are based on the calculation of a test statistic. These tests can be constructed in several ways, and the precise approach to constructing the test statistic will determine the distribution that the test statistic is assumed to follow. Two particular approaches are in common usage and their results are given by the statistical packages: the LM test and the Wald test. Further details concerning these procedures are given in chapter 8.
For now, all that readers require to know is that LM test statistics in the context of the diagnostic tests presented here follow a χ² distribution with degrees of freedom equal to the number of restrictions placed on the model, and denoted m. The Wald version of the test follows an F-distribution with (m, T − k) degrees of freedom. Asymptotically, these two tests are equivalent, although their results will differ somewhat in small samples. They are equivalent as the sample size increases towards infinity since there is a direct relationship between the χ²- and F-distributions. Taking a χ² variate and dividing by its degrees of freedom asymptotically gives an F-variate
χ²(m)/m → F(m, T − k)   as T → ∞
Computer packages typically present results using both approaches, although only one of the two will be illustrated for each test below. They will usually give the same conclusion, although if they do not, the F-version is usually considered preferable for finite samples, since it is sensitive to sample size (one of its degrees of freedom parameters depends on sample size) in a way that the χ²-version is not.
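The convergence of the two distributions can be verified numerically. The short sketch below, which assumes scipy is available, compares 5% critical values of χ²(m)/m and F(m, T − k) for a hypothetical model with m = 5 restrictions and k = 8 parameters as T grows.

from scipy import stats

m, k = 5, 8                                 # restrictions and parameters (illustrative values)
chi2_crit = stats.chi2.ppf(0.95, df=m) / m
for T in (30, 100, 1000, 100000):
    f_crit = stats.f.ppf(0.95, dfn=m, dfd=T - k)
    print(f"T = {T:6d}:  F critical = {f_crit:.4f},  chi-squared(m)/m critical = {chi2_crit:.4f}")
# As T increases, the two critical values converge, illustrating chi2(m)/m -> F(m, T-k).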
4.3 Assumption 1: E(ut)=0
The first assumption required is that the average value of the errors is
zero. In fact, if a constant term is included in the regression equation, this assumption will never be violated. But what if financial theory suggests that, for a particular application, there should be no intercept so that the regression line is forced through the origin? If the regression did not include an intercept, and the average value of the errors was non-zero, several undesirable consequences could arise. First, R², defined as ESS/TSS, can be negative, implying that the sample average, ȳ, 'explains' more of the variation in y than the explanatory variables. Second, and more fundamentally, a regression with no intercept parameter could lead to potentially severe biases in the slope coefficient estimates. To see this, consider figure 4.1.
Figure 4.1  Effect of no intercept on a regression line
The solid line shows the regression estimated including a constant term,
while the dotted line shows the effect of suppressing (i.e. setting to zero) the constant term. The effect is that the estimated line in this case is forced through the origin, so that the estimate of the slope coefficient (ˆβ) is biased. Additionally, R² and R̄² are usually meaningless in such a context. This arises since the mean value of the dependent variable, ȳ, will not be equal to the mean of the fitted values from the model, i.e. the mean of ˆy if there is no constant in the regression.
4.4 Assumption 2: var(ut)=σ2<∞
It has been assumed thus far that the variance of the errors is constant, σ² – this is known as the assumption of homoscedasticity. If the errors do not have a constant variance, they are said to be heteroscedastic. To consider one illustration of heteroscedasticity, suppose that a regression had been estimated and the residuals, ût, have been calculated and then plotted against one of the explanatory variables, x2t, as shown in
figure 4.2.
It is clearly evident that the errors in figure 4.2 are heteroscedastic – that is, although their mean value is roughly constant, their variance is increasing systematically with x2t.
Figure 4.2  Graphical illustration of heteroscedasticity
4.4.1 Detection of heteroscedasticity
How can one tell whether the errors are heteroscedastic or not? It is possible to use a graphical method as above, but unfortunately one rarely knows the cause or the form of the heteroscedasticity, so that a plot is likely to reveal nothing. For example, if the variance of the errors was an increasing function of x3t, and the researcher had plotted the residuals against x2t, he would be unlikely to see any pattern and would thus wrongly conclude that the errors had constant variance. It is also possible that the variance of the errors changes over time rather than systematically with one of the explanatory variables; this phenomenon is known as 'ARCH' and is described in chapter 8.
Fortunately, there are a number of formal statistical tests for heteroscedasticity, and one of the simplest such methods is the Goldfeld–Quandt (1965) test. Their approach is based on splitting the total sample of length T into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the two residual variances are calculated as s1² = û1′û1/(T1 − k) and s2² = û2′û2/(T2 − k) respectively. The null hypothesis is that the variances of the disturbances are equal, which can be written H0: σ1² = σ2², against a two-sided alternative. The test statistic, denoted GQ, is simply the ratio of the two residual variances where the larger of the two variances must be placed in the numerator (i.e. s1² is the higher sample variance for the sample with length T1, even if it comes from the second sub-sample):

GQ = s1²/s2²   (4.1)
The test statistic is distributed as an F(T1 − k, T2 − k) under the null hypothesis, and the null of a constant variance is rejected if the test statistic exceeds the critical value.
The GQ test is simple to construct but its conclusions may be contingent upon a particular, and probably arbitrary, choice of where to split the sample. Clearly, the test is likely to be more powerful when this choice is made on theoretical grounds – for example, before and after a major structural event. Suppose that it is thought that the variance of the disturbances is related to some observable variable zt (which may or may not be one of the regressors). A better way to perform the test would be to order the sample according to values of zt (rather than through time) and then to split the re-ordered sample into T1 and T2.
An alternative method that is sometimes used to sharpen the inferences
from the test and to increase its power is to omit some of the observations
from the centre of the sample so as to introduce a degree of separation
between the two sub-samples.
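A minimal sketch of the Goldfeld–Quandt test in Python follows, assuming the observations have already been ordered by the variable thought to drive the heteroscedasticity and that a handful of central observations are dropped; the data are simulated placeholders rather than any series used in this book.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, k = 200, 2
x = np.sort(rng.uniform(1, 10, size=T))
u = rng.normal(scale=x, size=T)            # error variance increases with x
y = 2.0 + 0.5 * x + u

def residual_variance(y_sub, x_sub, k):
    X = np.column_stack([np.ones(len(x_sub)), x_sub])
    beta, *_ = np.linalg.lstsq(X, y_sub, rcond=None)
    resid = y_sub - X @ beta
    return resid @ resid / (len(y_sub) - k)

# Split the (ordered) sample, omitting the central 20 observations
T1 = T2 = 90
s2_a = residual_variance(y[:T1], x[:T1], k)
s2_b = residual_variance(y[-T2:], x[-T2:], k)

# The larger residual variance goes in the numerator
gq = max(s2_a, s2_b) / min(s2_a, s2_b)
crit = stats.f.ppf(0.95, T1 - k, T2 - k)
print(f"GQ = {gq:.3f}, 5% critical value = {crit:.3f}")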
A further popular test is White's (1980) general test for heteroscedasticity. The test is particularly useful because it makes few assumptions about the likely form of the heteroscedasticity. The test is carried out as in box 4.1.
Box 4.1 Conducting White’s test
(1) Assume that the regression model estimated is of the standard linear form, e.g.
yt=β1+β2x2t+β3x3t+ut (4.2)
To test var(ut) = σ², estimate the model above, obtaining the residuals, ût.
(2) Then run the auxiliary regression

ût² = α1 + α2x2t + α3x3t + α4x2t² + α5x3t² + α6x2tx3t + vt   (4.3)

where vt is a normally distributed disturbance term independent of ut. This regression is of the squared residuals on a constant, the original explanatory variables, the squares of the explanatory variables and their cross-products. To see why the squared residuals are the quantity of interest, recall that for a random variable ut, the variance can be written
var(ut) = E[(ut − E(ut))²]   (4.4)

Under the assumption that E(ut) = 0, the second part of the RHS of this expression disappears:

var(ut) = E[ut²]   (4.5)
Once again, it is not possible to know the squares of the population disturbances, ut², so their sample counterparts, the squared residuals, are used instead.
The reason that the auxiliary regression takes this form is that it is desirable to investigate whether the variance of the residuals (embodied in ût²) varies systematically with any known variables relevant to the model. Relevant variables will include the original explanatory variables, their squared values and their cross-products. Note also that this regression should include a constant term, even if the original regression did not. This is as a result of the fact that ût² will always have a non-zero mean, even if ût has a zero mean.
(3) Given the auxiliary regression, as stated above, the test can be conducted using two different approaches. First, it is possible to use the F-test framework described in chapter 3. This would involve estimating (4.3) as the unrestricted regression and then running a restricted regression of ût² on a constant only. The RSS from each specification would then be used as inputs to the standard F-test formula.
With many diagnostic tests, an alternative approach can be adopted that does not require the estimation of a second (restricted) regression. This approach is known as a Lagrange Multiplier (LM) test, which centres around the value of R² for the auxiliary regression. If one or more coefficients in (4.3) is statistically significant, the value of R² for that equation will be relatively high, while if none of the variables is significant, R² will be relatively low. The LM test would thus operate
by obtaining R² from the auxiliary regression and multiplying it by the number of observations, T. It can be shown that

TR² ∼ χ²(m)

where m is the number of regressors in the auxiliary regression (excluding the constant term), equivalent to the number of restrictions that would have to be placed under the F-test approach.
(4) The test is one of the joint null hypothesis that α2 = 0, and α3 = 0, and α4 = 0, and α5 = 0, and α6 = 0. For the LM test, if the χ²-test statistic from step 3 is greater than the corresponding value from the statistical table then reject the null hypothesis that the errors are homoscedastic.
Example 4.1
Suppose that the model (4.2) above has been estimated using 120 observations, and the R² from the auxiliary regression (4.3) is 0.234. The test statistic will be given by TR² = 120 × 0.234 = 28.08, which will follow a χ²(5) under the null hypothesis. The 5% critical value from the χ² table is 11.07. The test statistic is therefore more than the critical value and hence the null hypothesis is rejected. It would be concluded that there is significant evidence of heteroscedasticity, so that it would not be plausible to assume that the variance of the errors is constant in this case.
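As a complement to the EViews route described later, the sketch below works through the LM form of White's test by hand in Python; the two regressors and the data are hypothetical placeholders, and the auxiliary regression follows the structure of (4.3).

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T = 120
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=T)   # placeholder data

# Step 1: estimate the original model by OLS and obtain the residuals
X = np.column_stack([np.ones(T), x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

# Step 2: auxiliary regression of squared residuals on levels, squares and cross-product
Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])
gamma, *_ = np.linalg.lstsq(Z, u_hat**2, rcond=None)
fitted = Z @ gamma
rss = np.sum((u_hat**2 - fitted) ** 2)
tss = np.sum((u_hat**2 - np.mean(u_hat**2)) ** 2)
r_squared = 1 - rss / tss

# Step 3: LM version of the test: T*R^2 ~ chi-squared(m), with m = 5 regressors here
lm_stat = T * r_squared
p_value = stats.chi2.sf(lm_stat, df=5)
print(f"LM statistic = {lm_stat:.3f}, p-value = {p_value:.4f}")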
4.4.2 Consequences of using OLS in the presence of heteroscedasticity
What happens if the errors are heteroscedastic, but this fact is ignored and the researcher proceeds with estimation and inference? In this case, OLS estimators will still give unbiased (and also consistent) coefficient estimates, but they are no longer BLUE – that is, they no longer have the minimum variance among the class of unbiased estimators. The reason is that the error variance, σ², plays no part in the proof that the OLS estimator is consistent and unbiased, but σ² does appear in the formulae for the coefficient variances. If the errors are heteroscedastic, the formulae presented for the coefficient standard errors no longer hold. For a very accessible algebraic treatment of the consequences of heteroscedasticity, see Hill, Griffiths and Judge (1997, pp. 217–18).
So, the upshot is that if OLS is still used in the presence of heteroscedasticity, the standard errors could be wrong and hence any inferences made could be misleading. In general, the OLS standard errors will be too large for the intercept when the errors are heteroscedastic. The effect of heteroscedasticity on the slope standard errors will depend on its form. For example, if the variance of the errors is positively related to the square of an explanatory variable (which is often the case in practice), the OLS standard error for the slope will be too low. On the other hand, the OLS slope standard errors will be too big when the variance of the errors is inversely related to an explanatory variable.
4.4.3 Dealing with heteroscedasticity
If the form (i.e. the cause) of the heteroscedasticity is known, then an alternative estimation method which takes this into account can be used. One possibility is called generalised least squares (GLS). For example, suppose that the error variance was related to zt by the expression

var(ut) = σ²zt²   (4.6)
All that would be required to remove the heteroscedasticity would be to divide the regression equation through by zt

yt/zt = β1(1/zt) + β2(x2t/zt) + β3(x3t/zt) + vt   (4.7)

where vt = ut/zt is an error term.
Now, if var(ut) = σ²zt², var(vt) = var(ut/zt) = var(ut)/zt² = σ²zt²/zt² = σ² for known z.
Therefore, the disturbances from (4.7) will be homoscedastic. Note that
this latter regression does not include a constant since β1 is multiplied by (1/zt). GLS can be viewed as OLS applied to transformed data that satisfy the OLS assumptions. GLS is also known as weighted least squares (WLS), since under GLS a weighted sum of the squared residuals is minimised, whereas under OLS it is an unweighted sum.
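As an illustration of the transformation in (4.7), the following sketch applies OLS to data divided through by an assumed, known zt; the series are simulated placeholders, and the known-variance assumption is exactly the one flagged as unrealistic in the next paragraph.

import numpy as np

rng = np.random.default_rng(3)
T = 500
z = rng.uniform(0.5, 3.0, size=T)
x2, x3 = rng.normal(size=T), rng.normal(size=T)
u = rng.normal(scale=z)                      # var(u_t) = sigma^2 * z_t^2 with sigma = 1
y = 1.0 + 0.5 * x2 - 0.8 * x3 + u

# GLS/WLS: divide every term (including the former intercept) through by z_t
X_star = np.column_stack([1.0 / z, x2 / z, x3 / z])
y_star = y / z
beta_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
print("GLS estimates (beta1, beta2, beta3):", np.round(beta_gls, 3))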
However, researchers are typically unsure of the exact cause of the heteroscedasticity, and hence this technique is usually infeasible in practice. Two other possible 'solutions' for heteroscedasticity are shown in box 4.2.
Examples of tests for heteroscedasticity in the context of the single index market model are given in Fabozzi and Francis (1980). Their results are strongly suggestive of the presence of heteroscedasticity, and they examine various factors that may constitute the form of the heteroscedasticity.
4.4.4 Testing for heteroscedasticity using EViews
Re-open the Microsoft Workfile that was examined in the previous chapter and the regression that included all the macroeconomic explanatory variables. First, plot the residuals by selecting View/Actual, Fitted, Residuals/Residual Graph. If the residuals of the regression have systematically changing variability over the sample, that is a sign of heteroscedasticity.
In this case, it is hard to see any clear pattern, so we need to run the
formal statistical test. To test for heteroscedasticity using White's test, click on the View button in the regression window and select Residual Tests/Heteroscedasticity Tests. You will see a large number of different tests available, including the ARCH test that will be discussed in chapter 8. For now, select the White specification. You can also select whether to include the cross-product terms or not (i.e. each variable multiplied by each other variable) or include only the squares of the variables in the auxiliary regression. Uncheck the 'Include White cross terms' given the relatively large number of variables in this regression and then click OK.
The results of the test will appear as follows.
Heteroskedasticity Test: White
F-statistic 0.626761 Prob. F(7,244) 0.7336
Obs∗R-squared 4.451138 Prob. Chi-Square(7) 0.7266
Scaled explained SS 21.98760 Prob. Chi-Square(7) 0.0026
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 08/27/07   Time: 11:49
Sample: 1986M05 2007M04
Included observations: 252
Coefficient Std. Error t-Statistic Prob.
C 259.9542 65.85955 3.947099 0.0001
ERSANDP∧2 −0.130762 0.826291 −0.158252 0.8744
DPROD∧2 −7.465850 7.461475 −1.000586 0.3180
DCREDIT∧2 −1.65E-07 3.72E-07 −0.443367 0.6579
DINFLATION∧2 −137.6317 227.2283 −0.605698 0.5453
DMONEY∧2 12.79797 13.66363 0.936645 0.3499
DSPREAD∧2 −650.6570 3144.176 −0.20694 0.8362
RTERM∧2 −491.0652 418.2860 −1.173994 0.2415
R-squared 0.017663 Mean dependent var 188.4152
Adjusted R-squared −0.010519 S.D. dependent var 612.8558
S.E. of regression 616.0706      Akaike info criterion 15.71583
Sum squared resid 92608485       Schwarz criterion 15.82788
Log likelihood −1972.195         Hannan-Quinn criter. 15.76092
F-statistic 0.626761             Durbin-Watson stat 2.068099
Prob(F-statistic) 0.733596
EViews presents three different types of tests for heteroscedasticity and
then the auxiliary regression in the first results table displayed. The test statistics give us the information we need to determine whether the assumption of homoscedasticity is valid or not, but seeing the actual
Box 4.2 ‘Solutions’ for heteroscedasticity
(1) Transforming the variables into logs or reducing by some other measure of 'size'. This has the effect of re-scaling the data to 'pull in' extreme observations. The regression would then be conducted upon the natural logarithms or the transformed data. Taking logarithms also has the effect of making a previously multiplicative model, such as the exponential regression model discussed previously (with a multiplicative error term), into an additive one. However, logarithms of a variable cannot be taken in situations where the variable can take on zero or negative values, for the log will not be defined in such cases.
(2) Using heteroscedasticity-consistent standard error estimates. Most standard econometrics software packages have an option (usually called something like 'robust') that allows the user to employ standard error estimates that have been modified to account for the heteroscedasticity following White (1980). The effect of using the correction is that, if the variance of the errors is positively related to the square of an explanatory variable, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors, which would make hypothesis testing more 'conservative', so that more evidence would be required against the null hypothesis before it would be rejected.
auxiliary regression in the second table can provide useful additional information on the source of the heteroscedasticity if any is found. In this case, both the F- and χ² ('LM') versions of the test statistic give the same conclusion that there is no evidence for the presence of heteroscedasticity, since the p-values are considerably in excess of 0.05. The third version of the test statistic, 'Scaled explained SS', which as the name suggests is based on a normalised version of the explained sum of squares from the auxiliary regression, suggests in this case that there is evidence of heteroscedasticity. Thus the conclusion of the test is somewhat ambiguous here.
4.4.5 Using White’s modified standard error estimates in EViews
In order to estimate the regression with heteroscedasticity-robust standard errors in EViews, select this from the option button in the regression entry window. In other words, close the heteroscedasticity test window and click on the original 'Msoftreg' regression results, then click on the Estimate button and in the Equation Estimation window, choose the Options tab and screenshot 4.1 will appear.
Check the ‘ Heteroskedasticity consistent coefficient variance ’ box and
click OK. Comparing the results of the regression using heteroscedasticity-
robust standard errors with those using the ordinary standard er-rors, the changes in the significances of the parameters are onlymarginal. Of course, only the standard errors have changed and theparameter estimates have remained identical to those from before. The
Classical linear regression model assumptions and diagnostic tests 139
Screenshot 4.1
Regression options
window
heteroscedasticity-consistent standard errors are smaller for all variables
except for money supply, resulting in the p-values being smaller. The main
changes in the conclusions reached are that the term structure variable,which was previously significant only at the 10% level, is now significantat 5%, and the unexpected inflation variable is now significant at the 10%level.
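Outside EViews, heteroscedasticity-robust (White) standard errors are available in most packages; the sketch below shows one way to request them in Python with statsmodels, again on simulated placeholder data rather than the Microsoft workfile.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 252
x = rng.normal(size=(T, 2))
u = rng.normal(scale=1 + x[:, 0] ** 2)         # heteroscedastic errors
y = 0.5 + x @ np.array([1.0, -0.5]) + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                        # conventional OLS standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")       # White heteroscedasticity-robust SEs

print("OLS std errors:   ", np.round(ols.bse, 4))
print("Robust std errors:", np.round(robust.bse, 4))
# The coefficient estimates are identical; only the standard errors differ.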
4.5 Assumption 3: cov(ui, uj) = 0 for i ≠ j
Assumption 3 that is made of the CLRM’s disturbance terms is that the
covariance between the error terms over time (or cross-sectionally, for that type of data) is zero. In other words, it is assumed that the errors are uncorrelated with one another. If the errors are not uncorrelated with one another, it would be stated that they are 'autocorrelated' or that they are 'serially correlated'. A test of this assumption is therefore required.
Again, the population disturbances cannot be observed, so tests for
autocorrelation are conducted on the residuals, ˆu. Before one can proceed
to see how formal tests for autocorrelation are formulated, the concept of the lagged value of a variable needs to be defined.
Table 4.1 Constructing a series of lagged values and first differences
t          yt      yt−1    Δyt
2006 M09   0.8     −       −
2006 M10   1.3     0.8     (1.3 − 0.8) = 0.5
2006 M11   −0.9    1.3     (−0.9 − 1.3) = −2.2
2006 M12   0.2     −0.9    (0.2 − (−0.9)) = 1.1
2007 M01   −1.7    0.2     (−1.7 − 0.2) = −1.9
2007 M02   2.3     −1.7    (2.3 − (−1.7)) = 4.0
2007 M03   0.1     2.3     (0.1 − 2.3) = −2.2
2007 M04   0.0     0.1     (0.0 − 0.1) = −0.1
…          …       …       …
4.5.1 The concept of a lagged value
The lagged value of a variable (which may be yt, xt, or ut) is simply the value that the variable took during a previous period. So for example, the value of yt lagged one period, written yt−1, can be constructed by shifting all of the observations forward one period in a spreadsheet, as illustrated in table 4.1.
So, the value in the 2006 M10 row and the yt−1 column shows the value that yt took in the previous period, 2006 M09, which was 0.8. The last column in table 4.1 shows another quantity relating to y, namely the 'first difference'. The first difference of y, also known as the change in y, and denoted Δyt, is calculated as the difference between the values of y in this period and in the previous period. This is calculated as

Δyt = yt − yt−1   (4.8)
Note that when one-period lags or first differences of a variable are constructed, the first observation is lost. Thus a regression of Δyt using the above data would begin with the October 2006 data point. It is also possible to produce two-period lags, three-period lags, and so on. These would be accomplished in the obvious way.
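The spreadsheet construction in table 4.1 has a direct counterpart in most software. The short sketch below reproduces the table's columns in Python using pandas, with the same illustrative monthly values.

import pandas as pd

y = pd.Series([0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0],
              index=pd.period_range("2006-09", periods=8, freq="M"), name="y")

lags_and_diffs = pd.DataFrame({
    "y_t": y,
    "y_t-1": y.shift(1),   # one-period lag: each observation moved forward one period
    "dy_t": y.diff(),      # first difference: y_t - y_{t-1}
})
print(lags_and_diffs)
# Note that the first observation of both the lag and the difference is missing (NaN).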
4.5.2 Graphical tests for autocorrelation
In order to test for autocorrelation, it is necessary to investigate whether any relationships exist between the current value of û, ût, and any of its previous values, ût−1, ût−2, … The first step is to consider possible relationships between the current residual and the immediately previous one, ût−1, via a graphical exploration. Thus ût is plotted against ût−1, and ût is plotted over time. Some stereotypical patterns that may be found in the residuals are discussed below.
Figure 4.3  Plot of ût against ût−1, showing positive autocorrelation
Figures 4.3 and 4.4 show positive autocorrelation in the residuals, which is indicated by a cyclical residual plot over time. This case is known as positive autocorrelation since on average if the residual at time t − 1 is positive, the residual at time t is likely to be also positive; similarly, if the residual at t − 1 is negative, the residual at t is also likely to be negative. Figure 4.3 shows that most of the dots representing observations are in the first and third quadrants, while figure 4.4 shows that a positively autocorrelated series of residuals will not cross the time-axis very frequently.
Figures 4.5 and 4.6 show negative autocorrelation, indicated by an alternating pattern in the residuals. This case is known as negative autocorrelation since on average if the residual at time t − 1 is positive, the residual at time t is likely to be negative; similarly, if the residual at t − 1 is negative, the residual at t is likely to be positive. Figure 4.5 shows that most of the dots are in the second and fourth quadrants, while figure 4.6 shows that a negatively autocorrelated series of residuals will cross the time-axis more frequently than if they were distributed randomly.
Figure 4.4  Plot of ût over time, showing positive autocorrelation
Figure 4.5  Plot of ût against ût−1, showing negative autocorrelation
Finally, figures 4.7 and 4.8 show no pattern in residuals at all: this is what is desirable to see. In the plot of ût against ût−1 (figure 4.7), the points are randomly spread across all four quadrants, and the time series plot of the residuals (figure 4.8) does not cross the x-axis either too frequently or too little.
Figure 4.6  Plot of ût over time, showing negative autocorrelation
Figure 4.7  Plot of ût against ût−1, showing no autocorrelation
4.5.3 Detecting autocorrelation: the Durbin–Watson test
Of course, a first step in testing whether the residual series from an estimated model are autocorrelated would be to plot the residuals as above, looking for any patterns. Graphical methods may be difficult to interpret in practice, however, and hence a formal statistical test should also be applied. The simplest test is due to Durbin and Watson (1951).
Figure 4.8  Plot of ût over time, showing no autocorrelation
Durbin–Watson (DW) is a test for first order autocorrelation – i.e. it tests only for a relationship between an error and its immediately previous value. One way to motivate the test and to interpret the test statistic would be in the context of a regression of the time t error on its previous value

ut = ρut−1 + vt   (4.9)

where vt ∼ N(0, σv²). The DW test statistic has as its null and alternative hypotheses

H0: ρ = 0  and  H1: ρ ≠ 0
Thus, under the null hypothesis, the errors at time t − 1 and t are independent of one another, and if this null were rejected, it would be concluded that there was evidence of a relationship between successive residuals. In fact, it is not necessary to run the regression given by (4.9) since the test statistic can be calculated using quantities that are already available after the first regression has been run
DW = \frac{\sum_{t=2}^{T}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T}\hat{u}_t^2}   (4.10)
The denominator of the test statistic is simply (the number of observations − 1) × the variance of the residuals. This arises since if the average of the residuals is zero

var(ût) = E(ût²) = [1/(T − 1)] Σ_{t=2}^{T} ût²

so that

Σ_{t=2}^{T} ût² = var(ût) × (T − 1)
The numerator 'compares' the values of the error at times t − 1 and t. If there is positive autocorrelation in the errors, this difference in the numerator will be relatively small, while if there is negative autocorrelation, with the sign of the error changing very frequently, the numerator will be relatively large. No autocorrelation would result in a value for the numerator between small and large.
It is also possible to express the DW statistic as an approximate function of the estimated value of ρ

DW ≈ 2(1 − ˆρ)   (4.11)
where ˆρ is the estimated correlation coefficient that would have been obtained from an estimation of (4.9). To see why this is the case, consider that the numerator of (4.10) can be written as the parts of a quadratic

Σ_{t=2}^{T} (ût − ût−1)² = Σ_{t=2}^{T} ût² + Σ_{t=2}^{T} ût−1² − 2 Σ_{t=2}^{T} ût ût−1   (4.12)
Consider now the composition of the first two summations on the RHS of (4.12). The first of these is

Σ_{t=2}^{T} ût² = û2² + û3² + û4² + ··· + ûT²

while the second is

Σ_{t=2}^{T} ût−1² = û1² + û2² + û3² + ··· + ûT−1²
Thus, the only difference between them is that they differ in the first and last terms in the summation:

Σ_{t=2}^{T} ût²

contains ûT² but not û1², while

Σ_{t=2}^{T} ût−1²

contains û1² but not ûT². As the sample size, T, increases towards infinity, the difference between these two will become negligible. Hence, the expression in (4.12), the numerator of (4.10), is approximately
2 Σ_{t=2}^{T} ût² − 2 Σ_{t=2}^{T} ût ût−1

Replacing the numerator of (4.10) with this expression leads to

DW ≈ \frac{2\sum_{t=2}^{T}\hat{u}_t^2 - 2\sum_{t=2}^{T}\hat{u}_t\hat{u}_{t-1}}{\sum_{t=2}^{T}\hat{u}_t^2} = 2\left(1 - \frac{\sum_{t=2}^{T}\hat{u}_t\hat{u}_{t-1}}{\sum_{t=2}^{T}\hat{u}_t^2}\right)   (4.13)
The covariance between ut and ut−1 can be written as E[(ut − E(ut))(ut−1 − E(ut−1))]. Under the assumption that E(ut) = 0 (and therefore that E(ut−1) = 0), the covariance will be E[ut ut−1]. For the sample residuals, this covariance will be evaluated as

[1/(T − 1)] Σ_{t=2}^{T} ût ût−1
Thus, the sum in the numerator of the expression on the right of (4.13) can be seen as T − 1 times the covariance between ût and ût−1, while the sum in the denominator of the expression on the right of (4.13) can be seen from the previous exposition as T − 1 times the variance of ût. Thus, it is possible to write

DW ≈ 2\left(1 - \frac{(T-1)\,\mathrm{cov}(\hat{u}_t,\hat{u}_{t-1})}{(T-1)\,\mathrm{var}(\hat{u}_t)}\right) = 2\left(1 - \frac{\mathrm{cov}(\hat{u}_t,\hat{u}_{t-1})}{\mathrm{var}(\hat{u}_t)}\right) = 2(1 - \mathrm{corr}(\hat{u}_t,\hat{u}_{t-1}))   (4.14)
so that the DW test statistic is approximately equal to 2(1 − ˆρ). Since ˆρ is a correlation, it implies that −1 ≤ ˆρ ≤ 1. That is, ˆρ is bounded to lie between −1 and +1. Substituting in these limits for ˆρ to calculate DW from (4.11) would give the corresponding limits for DW as 0 ≤ DW ≤ 4. Consider now the implication of DW taking one of three important values (0, 2, and 4):
●ˆρ = 0, DW = 2  This is the case where there is no autocorrelation in the residuals. So roughly speaking, the null hypothesis would not be rejected if DW is near 2, i.e. there is little evidence of autocorrelation.
●ˆρ = 1, DW = 0  This corresponds to the case where there is perfect positive autocorrelation in the residuals.
●ˆρ = −1, DW = 4  This corresponds to the case where there is perfect negative autocorrelation in the residuals.
Figure 4.9  Rejection and non-rejection regions for DW test: reject H0 (positive autocorrelation) for 0 ≤ DW < dL; inconclusive for dL ≤ DW ≤ dU; do not reject H0 (no evidence of autocorrelation) for dU < DW < 4 − dU; inconclusive for 4 − dU ≤ DW ≤ 4 − dL; reject H0 (negative autocorrelation) for 4 − dL < DW ≤ 4.
The DW test does not follow a standard statistical distribution such as a t, F, or χ². DW has two critical values: an upper critical value (dU) and a lower critical value (dL), and there is also an intermediate region where the null hypothesis of no autocorrelation can neither be rejected nor not rejected! The rejection, non-rejection, and inconclusive regions are shown on the number line in figure 4.9.
So, to reiterate, the null hypothesis is rejected and the existence of positive autocorrelation presumed if DW is less than the lower critical value; the null hypothesis is rejected and the existence of negative autocorrelation presumed if DW is greater than 4 minus the lower critical value; the null hypothesis is not rejected and no significant residual autocorrelation is presumed if DW is between the upper and 4 minus the upper limits.
Example 4.2
A researcher wishes to test for first order serial correlation in the residuals from a linear regression. The DW test statistic value is 0.86. There are 80 quarterly observations in the regression, and the regression is of the form

yt=β1+β2x2t+β3x3t+β4x4t+ut   (4.15)

The relevant critical values for the test (see table A2.6 in the appendix of statistical distributions at the end of this book) are dL = 1.42 and dU = 1.57, so 4 − dU = 2.43 and 4 − dL = 2.58. The test statistic is clearly lower than the lower critical value and hence the null hypothesis of no autocorrelation is rejected and it would be concluded that the residuals from the model appear to be positively autocorrelated.
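The DW statistic is easy to compute directly from a set of residuals. The sketch below does so in Python for simulated, deliberately positively autocorrelated residuals, and also reports the approximation 2(1 − ˆρ) from (4.11); the residual series is a hypothetical placeholder.

import numpy as np

rng = np.random.default_rng(5)
T = 80
u = np.empty(T)
u[0] = rng.normal()
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()     # AR(1) residuals with rho = 0.6

dw = np.sum(np.diff(u) ** 2) / np.sum(u[1:] ** 2)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[1:] ** 2)
print(f"DW = {dw:.3f}, approximation 2(1 - rho_hat) = {2 * (1 - rho_hat):.3f}")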
4.5.4 Conditions which must be fulfilled for DW to be a valid test
In order for the DW test to be valid for application, three conditions must
be fulfilled (box 4.3).
Box 4.3 Conditions for DW to be a valid test
(1) There must be a constant term in the regression
(2) The regressors must be non-stochastic – as assumption 4 of the CLRM (see p. 160
and chapter 6)
(3) There must be no lags of dependent variable (see section 4.5.8) in the regression.
If the test were used in the presence of lags of the dependent variable or otherwise stochastic regressors, the test statistic would be biased towards 2, suggesting that in some instances the null hypothesis of no autocorrelation would not be rejected when it should be.
4.5.5 Another test for autocorrelation: the Breusch–Godfrey test
Recall that DW is a test only of whether consecutive errors are related to one another. So, not only can the DW test not be applied if a certain set of circumstances are not fulfilled, there will also be many forms of residual autocorrelation that DW cannot detect. For example, if corr(ût, ût−1) = 0, but corr(ût, ût−2) ≠ 0, DW as defined above will not find any autocorrelation. One possible solution would be to replace ût−1 in (4.10) with ût−2. However, pairwise examinations of the correlations (ût, ût−1), (ût, ût−2), (ût, ût−3), … will be tedious in practice and are not coded in econometrics software packages, which have been programmed to construct DW using only a one-period lag. In addition, the approximation in (4.11) will deteriorate as the difference between the two time indices increases. Consequently, the critical values should also be modified somewhat in these cases.
Therefore, it is desirable to examine a joint test for autocorrelation that will allow examination of the relationship between ût and several of its lagged values at the same time. The Breusch–Godfrey test is a more general test for autocorrelation up to the rth order. The model for the errors under this test is

ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + ··· + ρrut−r + vt,   vt ∼ N(0, σv²)   (4.16)
The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and … and ρr = 0
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or … or ρr ≠ 0
So, under the null hypothesis, the current error is not related to any of
its r previous values. The test is carried out as in box 4.4.
Note that (T − r) pre-multiplies R² in the test for autocorrelation rather than T (as was the case for the heteroscedasticity test). This arises because
Box 4.4 Conducting a Breusch–Godfrey test
(1) Estimate the linear regression using OLS and obtain the residuals, ˆut
(2) Regress ût on all of the regressors from stage 1 (the xs) plus ût−1, ût−2, …, ût−r; the regression will thus be

ût = γ1 + γ2x2t + γ3x3t + γ4x4t + ρ1ût−1 + ρ2ût−2 + ρ3ût−3 + ··· + ρrût−r + vt,   vt ∼ N(0, σv²)   (4.17)

Obtain R² from this auxiliary regression
(3) Letting T denote the number of observations, the test statistic is given by

(T − r)R² ∼ χ²(r)
the first r observations will effectively have been lost from the sample in order to obtain the r lags used in the test regression, leaving (T − r) observations from which to estimate the auxiliary regression. If the test statistic exceeds the critical value from the Chi-squared statistical tables, reject the null hypothesis of no autocorrelation. As with any joint test, only one part of the null hypothesis has to be rejected to lead to rejection of the hypothesis as a whole. So the error at time t has to be significantly related only to one of its previous r values in the sample for the null of no autocorrelation to be rejected. The test is more general than the DW test, and can be applied in a wider variety of circumstances since it does not impose the DW restrictions on the format of the first stage regression.
One potential difficulty with Breusch–Godfrey, however, is in determining an appropriate value of r, the number of lags of the residuals, to use in computing the test. There is no obvious answer to this, so it is typical to experiment with a range of values, and also to use the frequency of the data to decide. So, for example, if the data is monthly or quarterly, set r equal to 12 or 4, respectively. The argument would then be that errors at any given time would be expected to be related only to those errors in the previous year. Obviously, if the model is statistically adequate, no evidence of autocorrelation should be found in the residuals whatever value of r is chosen.
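For reference, the Breusch–Godfrey test is pre-programmed in several packages. The sketch below shows one possible call in Python using statsmodels on simulated quarterly-style data with r = 4 lags; the variables and data are illustrative placeholders only.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(6)
T = 200
x = rng.normal(size=(T, 2))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()      # autocorrelated errors
y = 1.0 + x @ np.array([0.8, -0.3]) + u

results = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=4)
print(f"LM statistic = {lm_stat:.3f} (p = {lm_pval:.4f}), F statistic = {f_stat:.3f} (p = {f_pval:.4f})")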
4.5.6 Consequences of ignoring autocorrelation if it is present
In fact, the consequences of ignoring autocorrelation when it is present are similar to those of ignoring heteroscedasticity. The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even at large sample sizes, so that the standard error estimates could be wrong. There thus exists the possibility that the wrong inferences could be made about whether a variable is or is not an important determinant of variations in y. In the case of positive serial correlation in the residuals, the OLS standard error estimates will be biased downwards relative to the true standard errors. That is, OLS will understate their true variability. This would lead to an increase in the probability of type I error – that is, a tendency to reject the null hypothesis sometimes when it is correct. Furthermore, R² is likely to be inflated relative to its 'correct' value if autocorrelation is present but ignored, since residual autocorrelation will lead to an underestimate of the true error variance (for positive autocorrelation).
4.5.7 Dealing with autocorrelation
If the form of the autocorrelation is known, it would be possible to use a GLS procedure. One approach, which was once fairly popular, is known as the Cochrane–Orcutt procedure (see box 4.5). Such methods work by assuming a particular form for the structure of the autocorrelation (usually a first order autoregressive process – see chapter 5 for a general description of these models). The model would thus be specified as follows:

yt = β1 + β2x2t + β3x3t + ut,   ut = ρut−1 + vt   (4.18)
Note that a constant is not required in the specification for the errors
since E( ut)=0. If this model holds at time t, it is assumed to also hold
for time t−1, so that the model in (4.18) is lagged one period
yt−1=β1+β2x2t−1+β3x3t−1+ut−1 (4.19)
Multiplying (4.19) by ρ
ρyt−1=ρβ1+ρβ2x2t−1+ρβ3x3t−1+ρut−1 (4.20)
Subtracting (4.20) from (4.18) would give
yt−ρyt−1=β1−ρβ1+β2x2t−ρβ2x2t−1+β3x3t−ρβ3x3t−1+ut−ρut−1
(4.21)
Factorising, and noting that vt=ut−ρut−1
(yt−ρyt−1)=(1−ρ)β1+β2(x2t−ρx2t−1)+β3(x3t−ρx3t−1)+vt
(4.22)
Setting y∗t = yt − ρyt−1, β∗1 = (1 − ρ)β1, x∗2t = (x2t − ρx2t−1) and x∗3t = (x3t − ρx3t−1), the model in (4.22) can be written

y∗t = β∗1 + β2x∗2t + β3x∗3t + vt   (4.23)
Box 4.5 The Cochrane–Orcutt procedure
(1) Assume that the general model is of the form (4.18) above. Estimate the equation
in (4.18) using OLS, ignoring the residual autocorrelation.
(2) Obtain the residuals, and run the regression
ˆut=ρˆut−1+vt (4.24)
(3) Obtain ˆρ and construct y∗t etc. using this estimate of ˆρ.
(4) Run the GLS regression (4.23).
Since the final specification (4.23) contains an error term that is free from autocorrelation, OLS can be directly applied to it. This procedure is effectively an application of GLS. Of course, the construction of y∗t etc. requires ρ to be known. In practice, this will never be the case so that ρ has to be estimated before (4.23) can be used.
A simple method would be to use the ρ obtained from rearranging the equation for the DW statistic given in (4.11). However, this is only an approximation as the related algebra showed. This approximation may be poor in the context of small samples.
The Cochrane–Orcutt procedure is an alternative, which operates as in
box 4.5.
This could be the end of the process. However, Cochrane and Orcutt (1949) argue that better estimates can be obtained by going through steps 2–4 again. That is, given the new coefficient estimates, β∗1, β2, β3, etc., construct again the residual and regress it on its previous value to obtain a new estimate for ˆρ. This would then be used to construct new values of the variables y∗t, x∗2t, x∗3t and a new (4.23) is estimated. This procedure would be repeated until the change in ˆρ between one iteration and the next is less than some fixed amount (e.g. 0.01). In practice, a small number of iterations (no more than 5) will usually suffice.
However, the Cochrane–Orcutt procedure and similar approaches require a specific assumption to be made concerning the form of the model for the autocorrelation. Consider again (4.22). This can be rewritten taking ρyt−1 over to the RHS

yt = (1 − ρ)β1 + β2(x2t − ρx2t−1) + β3(x3t − ρx3t−1) + ρyt−1 + vt   (4.25)

Expanding the brackets around the explanatory variable terms would give

yt = (1 − ρ)β1 + β2x2t − ρβ2x2t−1 + β3x3t − ρβ3x3t−1 + ρyt−1 + vt   (4.26)
Now, suppose that an equation containing the same variables as (4.26)
were estimated using OLS
yt=γ1+γ2x2t+γ3x2t−1+γ4x3t+γ5x3t−1+γ6yt−1+vt (4.27)
It can be seen that (4.26) is a restricted version of (4.27), with the restrictions imposed that the coefficient on x2t in (4.26) multiplied by the negative of the coefficient on yt−1 gives the coefficient on x2t−1, and that the coefficient on x3t multiplied by the negative of the coefficient on yt−1 gives the coefficient on x3t−1. Thus, the restrictions implied for (4.27) to get (4.26) are

γ2γ6 = −γ3  and  γ4γ6 = −γ5
These are known as the common factor restrictions, and they should be tested before the Cochrane–Orcutt or similar procedure is implemented. If the restrictions hold, Cochrane–Orcutt can be validly applied. If not, however, Cochrane–Orcutt and similar techniques would be inappropriate, and the appropriate step would be to estimate an equation such as (4.27) directly using OLS. Note that in general there will be a common factor restriction for every explanatory variable (excluding a constant) x2t, x3t, …, xkt in the regression. Hendry and Mizon (1978) argued that the restrictions are likely to be invalid in practice and therefore a dynamic model that allows for the structure of y should be used rather than a residual correction on a static model – see also Hendry (1980).
The White variance–covariance matrix of the coefficients (that is, calculation of the standard errors using the White correction for heteroscedasticity) is appropriate when the residuals of the estimated equation are heteroscedastic but serially uncorrelated. Newey and West (1987) develop a variance–covariance estimator that is consistent in the presence of both heteroscedasticity and autocorrelation. So an alternative approach to dealing with residual autocorrelation would be to use appropriately modified standard error estimates.
While White's correction to standard errors for heteroscedasticity as discussed above does not require any user input, the Newey–West procedure requires the specification of a truncation lag length to determine the number of lagged residuals used to evaluate the autocorrelation. EViews uses INTEGER[4(T/100)^(2/9)]. In EViews, the Newey–West procedure for estimating the standard errors is employed by invoking it from the same place as the White heteroscedasticity correction. That is, click the Estimate button and in the Equation Estimation window, choose the Options tab and then instead of checking the 'White' box, check Newey-West. While this option is listed under 'Heteroskedasticity consistent coefficient variance', the Newey-West procedure in fact produces 'HAC' (Heteroscedasticity and Autocorrelation Consistent) standard errors that correct for both autocorrelation and heteroscedasticity that may be present.
A more ‘modern’ view concerning autocorrelation is that it presents
an opportunity rather than a problem! This view, associated with Sargan, Hendry and Mizon, suggests that serial correlation in the errors arises as a consequence of 'misspecified dynamics'. For another explanation of the reason why this stance is taken, recall that it is possible to express the dependent variable as the sum of the parts that can be explained using the model, and a part which cannot (the residuals)

yt = ˆyt + ût   (4.28)

where ˆyt are the fitted values from the model (= ˆβ1 + ˆβ2x2t + ˆβ3x3t + ··· + ˆβkxkt). Autocorrelation in the residuals is often caused by a dynamic structure in y that has not been modelled and so has not been captured in the fitted values. In other words, there exists a richer structure in the dependent variable y and more information in the sample about that structure than has been captured by the models previously estimated. What is required is a dynamic model that allows for this extra structure in y.
4.5.8 Dynamic models
All of the models considered so far have been static in nature, e.g.
yt=β1+β2x2t+β3x3t+β4x4t+β5x5t+ut (4.29)
In other words, these models have allowed for only a contemporaneous relationship between the variables, so that a change in one or more of the explanatory variables at time t causes an instant change in the dependent variable at time t. But this analysis can easily be extended to the case where the current value of yt depends on previous values of y or on previous values of one or more of the variables, e.g.
yt = β1 + β2x2t + β3x3t + β4x4t + β5x5t + γ1yt−1 + γ2x2t−1 + ··· + γkxkt−1 + ut   (4.30)
It is of course possible to extend the model even more by adding further lags, e.g. x2t−2, yt−3. Models containing lags of the explanatory variables (but no lags of the explained variable) are known as distributed lag models. Specifications with lags of both explanatory and explained variables are known as autoregressive distributed lag (ADL) models.
How many lags and of which variables should be included in a dynamic regression model? This is a tricky question to answer, but hopefully recourse to financial theory will help to provide an answer; for another response, see section 4.13.
Another potential ‘remedy’ for autocorrelated residuals would be to
switch to a model in first differences rather than in levels. As explained previously, the first difference of yt, i.e. yt − yt−1, is denoted Δyt; similarly, one can construct a series of first differences for each of the explanatory variables, e.g. Δx2t = x2t − x2t−1, etc. Such a model has a number of other useful features (see chapter 7 for more details) and could be expressed as

Δyt = β1 + β2Δx2t + β3Δx3t + ut   (4.31)
Sometimes the change in y is purported to depend on previous values of the level of y or xi (i = 2, …, k) as well as changes in the explanatory variables

Δyt = β1 + β2Δx2t + β3Δx3t + β4x2t−1 + β5yt−1 + ut   (4.32)
4.5.9 Why might lags be required in a regression?
Lagged values of the explanatory variables or of the dependent variable (or
both) may capture important dynamic structure in the dependent variable that might be caused by a number of factors. Two possibilities that are relevant in finance are as follows:
●Inertia of the dependent variable Often a change in the value of one
of the explanatory variables will not affect the dependent variable immediately during one time period, but rather with a lag over several time periods. For example, the effect of a change in market microstructure or government policy may take a few months or longer to work through since agents may be initially unsure of what the implications for asset pricing are, and so on. More generally, many variables in economics and finance will change only slowly. This phenomenon arises partly as a result of pure psychological factors – for example, in financial markets, agents may not fully comprehend the effects of a particular news announcement immediately, or they may not even believe the news. The speed and extent of reaction will also depend on whether the change in the variable is expected to be permanent or transitory. Delays in response may also arise as a result of technological or institutional factors. For example, the speed of technology will limit how quickly investors' buy or sell orders can be executed. Similarly, many investors have savings plans or other financial products where they are 'locked in' and therefore unable to act for a fixed period. It is also worth noting that
dynamic structure is likely to be stronger and more prevalent the higher
is the frequency of observation of the data.
●Overreactions  It is sometimes argued that financial markets overreact to good and to bad news. So, for example, if a firm makes a profit warning, implying that its profits are likely to be down when formally reported later in the year, the markets might be anticipated to perceive this as implying that the value of the firm is less than was previously thought, and hence that the price of its shares will fall. If there is an overreaction, the price will initially fall below that which is appropriate for the firm given this bad news, before subsequently bouncing back up to a new level (albeit lower than the initial level before the announcement).
Moving from a purely static model to one which allows for lagged effects is likely to reduce, and possibly remove, serial correlation which was present in the static model's residuals. However, other problems with the regression could cause the null hypothesis of no autocorrelation to be rejected, and these would not be remedied by adding lagged variables to the model:
●Omission of relevant variables, which are themselves autocorrelated  In other words, if there is a variable that is an important determinant of movements in y, but which has not been included in the model, and which itself is autocorrelated, this will induce the residuals from the estimated model to be serially correlated. To give a financial context in which this may arise, it is often assumed that investors assess one-step-ahead expected returns on a stock using a linear relationship

rt = α0 + α1Ωt−1 + ut   (4.33)

where Ωt−1 is a set of lagged information variables (i.e. Ωt−1 is a vector of observations on a set of variables at time t − 1). However, (4.33) cannot be estimated since the actual information set used by investors to form their expectations of returns is not known. Ωt−1 is therefore proxied with an assumed sub-set of that information, Zt−1. For example, in many popular arbitrage pricing specifications, the information set used in the estimated model includes unexpected changes in industrial production, the term structure of interest rates, inflation and default risk premia. Such a model is bound to omit some informational variables used by actual investors in forming expectations of returns, and if these are autocorrelated, it will induce the residuals of the estimated model to be also autocorrelated.
●Autocorrelation owing to unparameterised seasonality Suppose that the dependent variable contains a seasonal or cyclical pattern, where certain features periodically occur. This may arise, for example, in the context of sales of gloves, where sales will be higher in the autumn and winter than in the spring or summer. Such phenomena are likely to lead to a positively autocorrelated residual structure that is cyclical in shape, such as that of figure 4.4, unless the seasonal patterns are captured by the model. See chapter 9 for a discussion of seasonality and how to deal with it.
●If ‘misspecification’ error has been committed by using an inappropriate functional form For example, if the relationship between y and the explanatory variables was a non-linear one, but the researcher had specified a linear regression model, this may again induce the residuals from the estimated model to be serially correlated.
4.5.10 The long-run static equilibrium solution
Once a general model of the form given in (4.32) has been found, it may contain many differenced and lagged terms that make it difficult to interpret from a theoretical perspective. For example, if the value of x2 were to increase in period t, what would be the effect on y in periods t, t+1, t+2, and so on? One interesting property of a dynamic model that can be calculated is its long-run or static equilibrium solution.
The relevant definition of ‘equilibrium’ in this context is that a system has reached equilibrium if the variables have attained some steady state values and are no longer changing, i.e. if y and x are in equilibrium, it is possible to write

yt = yt+1 = … = y and x2t = x2t+1 = … = x2, and so on.

Consequently, Δyt = yt − yt−1 = y − y = 0, Δx2t = x2t − x2t−1 = x2 − x2 = 0, etc., since the values of the variables are no longer changing. So the way to obtain a long-run static solution from a given empirical model such as (4.32) is:
(1) Remove all time subscripts from the variables
(2) Set error terms equal to their expected values of zero, i.e. E(ut) = 0
(3) Remove differenced terms (e.g. Δyt) altogether
(4) Gather terms in x together and gather terms in y together
(5) Rearrange the resulting equation if necessary so that the dependent variable y is on the left-hand side (LHS) and is expressed as a function of the independent variables.
Example 4.3
Calculate the long-run equilibrium solution for the following model
Δyt = β1 + β2Δx2t + β3Δx3t + β4x2t−1 + β5yt−1 + ut (4.34)

Applying first steps 1–3 above, the static solution would be given by

0 = β1 + β4x2 + β5y (4.35)

Rearranging (4.35) to bring y to the LHS

β5y = −β1 − β4x2 (4.36)

and finally, dividing through by β5

y = −β1/β5 − (β4/β5)x2 (4.37)

Equation (4.37) is the long-run static solution to (4.34). Note that this equation does not feature x3, since the only term which contained x3 was in first differenced form, so that x3 does not influence the long-run equilibrium value of y.
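The algebra in steps 1–5 can also be checked mechanically. The following is a minimal sketch (not from the text) using Python’s sympy package; the symbol names are illustrative.

# A minimal sketch (not from the text) of recovering the long-run solution of
# (4.34) symbolically; the symbol names are illustrative.
import sympy as sp

b1, b4, b5, x2, y = sp.symbols('beta1 beta4 beta5 x2 y')

# Steps 1-3: drop time subscripts, set the error to zero and remove the
# differenced terms, so (4.34) collapses to 0 = beta1 + beta4*x2 + beta5*y.
static_eq = sp.Eq(0, b1 + b4 * x2 + b5 * y)

# Steps 4-5: solve for y to obtain the long-run solution (4.37).
long_run = sp.solve(static_eq, y)[0]
print(long_run)   # -(beta1 + beta4*x2)/beta5, i.e. y = -beta1/beta5 - (beta4/beta5)x2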
4.5.11 Problems with adding lagged regressors to ‘cure’ autocorrelation
In many instances, a move from a static model to a dynamic one will result
in a removal of residual autocorrelation. The use of lagged variables in a regression model does, however, bring with it additional problems:
●Inclusion of lagged values of the dependent variable violates the assumption that the explanatory variables are non-stochastic (assumption 4 of the CLRM), since by definition the value of y is determined partly by a random error term, and so its lagged values cannot be non-stochastic. In small samples, inclusion of lags of the dependent variable can lead to biased coefficient estimates, although they are still consistent, implying that the bias will disappear asymptotically (that is, as the sample size increases towards infinity).
●What does an equation with a large number of lags actually mean? A model with many lags may have solved a statistical problem (autocorrelated residuals) at the expense of creating an interpretational one (the empirical model containing many lags or differenced terms is difficult to interpret and may not test the original financial theory that motivated the use of regression analysis in the first place).
Note that if there is still autocorrelation in the residuals of a model
including lags, then the OLS estimators will not even be consistent. To see
why this occurs, consider the following regression model
yt = β1 + β2x2t + β3x3t + β4yt−1 + ut (4.38)

where the errors, ut, follow a first order autoregressive process

ut = ρut−1 + vt (4.39)

Substituting into (4.38) for ut from (4.39)

yt = β1 + β2x2t + β3x3t + β4yt−1 + ρut−1 + vt (4.40)

Now, clearly yt depends upon yt−1. Taking (4.38) and lagging it one period (i.e. subtracting one from each time index)

yt−1 = β1 + β2x2t−1 + β3x3t−1 + β4yt−2 + ut−1 (4.41)

It is clear from (4.41) that yt−1 is related to ut−1 since they both appear in that equation. Thus, the assumption that E(X′u) = 0 is not satisfied for (4.41) and therefore for (4.38). Thus the OLS estimator will not be consistent, so that even with an infinite quantity of data, the coefficient estimates would be biased.
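A small simulation makes the point concrete. The sketch below (an illustration constructed for this discussion, not from the text) generates data from a simplified version of (4.38)–(4.39) containing only a lagged dependent variable, with AR(1) errors, and shows that the OLS slope estimate does not settle at the true value even in a very large sample.

# Illustrative simulation (not from the text): y_t = b1 + b4*y_{t-1} + u_t with
# u_t = rho*u_{t-1} + v_t. OLS is applied to y on its own lag; the slope
# estimate settles well above the true value of 0.5 even with a huge sample.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, b1, b4, rho = 100_000, 0.5, 0.5, 0.7

v = rng.standard_normal(T)
u = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + v[t]
    y[t] = b1 + b4 * y[t - 1] + u[t]

X = sm.add_constant(y[:-1])          # regressor: one lag of y
res = sm.OLS(y[1:], X).fit()
print(res.params)                    # intercept and slope; the slope is far from 0.5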
4.5.12 Autocorrelation and dynamic models in EViews
In EViews, the lagged values of variables can be used as regressors or for other purposes by using the notation x(−1) for a one-period lag, x(−5) for a five-period lag, and so on, where x is the variable name. EViews will automatically adjust the sample period used for estimation to take into account the observations that are lost in constructing the lags. For example, if the regression contains five lags of the dependent variable, five observations will be lost and estimation will commence with observation six.
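For readers working outside EViews, lags can be constructed in the same way with, for example, pandas in Python. The following is a minimal sketch with an illustrative made-up series; the column and variable names are hypothetical.

# Sketch of the analogue of EViews' x(-1) and x(-5) notation using pandas;
# the DataFrame and column names here are purely illustrative.
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.arange(10.0)},
                  index=pd.date_range('2000-01-01', periods=10, freq='MS'))

df['x_lag1'] = df['x'].shift(1)   # one-period lag, like x(-1)
df['x_lag5'] = df['x'].shift(5)   # five-period lag, like x(-5)

# Rows whose lags are not yet available contain NaN; dropping them mirrors
# EViews starting estimation only once all lagged observations exist.
est_sample = df.dropna()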
In EViews, the DW statistic is calculated automatically, and was given in the general estimation output screens that result from estimating any regression model. To view the results screen again, click on the View button in the regression window and select Estimation output. For the Microsoft macroeconomic regression that included all of the explanatory variables, the value of the DW statistic was 2.156. What is the appropriate conclusion regarding the presence or otherwise of first order autocorrelation in this case?
The Breusch–Godfrey test can be conducted by selecting View; Residual Tests; Serial Correlation LM Test… In the new window, type again the number of lagged residuals you want to include in the test and click on OK. Assuming that you selected to employ ten lags in the test, the results would be as given in the following table.
Breusch-Godfrey Serial Correlation LM Test:
F-statistic 1.497460 Prob. F(10,234) 0.1410
Obs*R-squared 15.15657 Prob. Chi-Square(10) 0.1265
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 08/27/07 Time: 13:26
Sample: 1986M05 2007M04
Included observations: 252
Presample missing value lagged residuals set to zero.
Coefficient Std. Error t-Statistic Prob.
C 0.087053 1.461517 0.059563 0.9526
ERSANDP −0.021725 0.204588 −0.106187 0.9155
DPROD −0.036054 0.510873 −0.070573 0.9438
DCREDIT −9.64E-06 0.000162 −0.059419 0.9527
DINFLATION −0.364149 3.010661 −0.120953 0.9038
DMONEY 0.225441 0.718175 0.313909 0.7539
DSPREAD 0.202672 13.70006 0.014794 0.9882
RTERM −0.19964 3.363238 −0.059360 0.9527
RESID( −1) −0.12678 0.065774 −1.927509 0.0551
RESID( −2) −0.063949 0.066995 −0.954537 0.3408
RESID( −3) −0.038450 0.065536 −0.586694 0.5580
RESID( −4) −0.120761 0.065906 −1.832335 0.0682
RESID( −5) −0.126731 0.065253 −1.942152 0.0533
RESID( −6) −0.090371 0.066169 −1.365755 0.1733
RESID( −7) −0.071404 0.065761 −1.085803 0.2787
RESID( −8) −0.119176 0.065926 −1.807717 0.0719
RESID( −9) −0.138430 0.066121 −2.093571 0.0374
RESID( −10) −0.060578 0.065682 −0.922301 0.3573
R-squared 0.060145 Mean dependent var 8.11E-17
Adjusted R-squared −0.008135 S.D. dependent var 13.75376
S.E. of regression 13.80959 Akaike info criterion 8.157352
Sum squared resid 44624.90 Schwarz criterion 8.409454
Log likelihood −1009.826 Hannan-Quinn criter. 8.258793
F-statistic 0.880859 Durbin-Watson stat 2.013727
Prob(F-statistic) 0.597301
In the first table of output, EViews offers two versions of the test – an F-version and a χ2 version, while the second table presents the estimates from the auxiliary regression. The conclusion from both versions of the test in this case is that the null hypothesis of no autocorrelation should not be rejected. Does this agree with the DW test result?
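The same test is easily run outside EViews. The following minimal sketch uses the statsmodels package and assumes that results is an already-fitted OLS results object for the Microsoft regression (the object name is illustrative).

# Breusch-Godfrey test with ten lags, assuming `results` is a fitted
# statsmodels OLS results object (e.g. results = sm.OLS(y, X).fit()).
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=10)
print('Obs*R-squared: %.4f (p = %.4f)' % (lm_stat, lm_pval))
print('F-statistic:   %.4f (p = %.4f)' % (f_stat, f_pval))
# Large p-values, as in the EViews output above, mean that the null of no
# autocorrelation up to lag 10 is not rejected.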
4.5.13 Autocorrelation in cross-sectional data
The possibility that autocorrelation may occur in the context of a time
series regression is quite intuitive. However, it is also plausible that autocorrelation could be present in certain types of cross-sectional data. For example, if the cross-sectional data comprise the profitability of banks in different regions of the US, autocorrelation may arise in a spatial sense, if there is a regional dimension to bank profitability that is not captured by the model. Thus the residuals from banks of the same region or in neighbouring regions may be correlated. Testing for autocorrelation in this case would be rather more complex than in the time series context, and would involve the construction of a square, symmetric ‘spatial contiguity matrix’ or a ‘distance matrix’. Both of these matrices would be N × N, where N is the sample size. The former would be a matrix of zeros and ones, with one for element i, j when observation i occurred for a bank in the same region as, or sufficiently close to, region j and zero otherwise (i, j = 1, …, N). The distance matrix would comprise elements that measured the distance (or the inverse of the distance) between bank i and bank j. A potential solution to a finding of autocorrelated residuals in such a model would be again to use a model containing a lag structure, in this case known as a ‘spatial lag’. Further details are contained in Anselin (1988).
4.6 Assumption 4: the xt are non-stochastic
Fortunately, it turns out that the OLS estimator is consistent and unbiased
in the presence of stochastic regressors, provided that the regressors are not correlated with the error term of the estimated equation. To see this, recall that

β̂ = (X′X)−1X′y and y = Xβ + u (4.42)

Thus

β̂ = (X′X)−1X′(Xβ + u) (4.43)
β̂ = (X′X)−1X′Xβ + (X′X)−1X′u (4.44)
β̂ = β + (X′X)−1X′u (4.45)

Taking expectations, and provided that X and u are independent,1

E(β̂) = E(β) + E((X′X)−1X′u) (4.46)
E(β̂) = β + E[(X′X)−1X′]E(u) (4.47)
1 A situation where X and u are not independent is discussed at length in chapter 6.
Since E(u) = 0, this expression will be zero and therefore the estimator is
still unbiased, even if the regressors are stochastic.
However, if one or more of the explanatory variables is contemporaneously correlated with the disturbance term, the OLS estimator will not even be consistent. This results from the estimator assigning explanatory power to the variables where in reality it is arising from the correlation between the error term and yt. Suppose for illustration that x2t and ut are positively correlated. When the disturbance term happens to take a high value, yt will also be high (because yt = β1 + β2x2t + ··· + ut). But if x2t is positively correlated with ut, then x2t is also likely to be high. Thus the OLS estimator will incorrectly attribute the high value of yt to a high value of x2t, where in reality yt is high simply because ut is high, which will result in biased and inconsistent parameter estimates and a fitted line that appears to capture the features of the data much better than it does in reality.
4.7 Assumption 5: the disturbances are normally distributed
Recall that the normality assumption (ut ∼ N(0, σ2)) is required in order
to conduct single or joint hypothesis tests about the model parameters.
4.7.1 Testing for departures from normality
One of the most commonly applied tests for normality is the Bera–Jarque (hereafter BJ) test. BJ uses the property of a normally distributed random variable that the entire distribution is characterised by the first two moments – the mean and the variance. The standardised third and fourth moments of a distribution are known as its skewness and kurtosis. Skewness measures the extent to which a distribution is not symmetric about its mean value and kurtosis measures how fat the tails of the distribution are. A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3. It is possible to define a coefficient of excess kurtosis, equal to the coefficient of kurtosis minus 3; a normal distribution will thus have a coefficient of excess kurtosis of zero. A normal distribution is symmetric and said to be mesokurtic. To give some illustrations of what a series having specific departures from normality may look like, consider figures 4.10 and 4.11.
A normal distribution is symmetric about its mean, while a skewed
distribution will not be, but will have one tail longer than the other, such as in the right hand part of figure 4.10.
Figure 4.10 A normal versus a skewed distribution
Figure 4.11 A leptokurtic versus a normal distribution
A leptokurtic distribution is one which has fatter tails and is more
peaked at the mean than a normally distributed random variable with the same mean and variance, while a platykurtic distribution will be less peaked at the mean, will have thinner tails, and more of the distribution in the shoulders than a normal. In practice, a leptokurtic distribution is far more likely to characterise financial (and economic) time series, and to characterise the residuals from a financial time series model. In figure 4.11, the leptokurtic distribution is shown by the bold line, with the normal by the faint line.
Bera and Jarque (1981) formalise these ideas by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero. Denoting the errors by u and their variance by σ2, it can be proved that the coefficients of skewness and kurtosis can be expressed respectively as

b1 = E[u^3]/(σ2)^(3/2) and b2 = E[u^4]/(σ2)^2 (4.48)

The kurtosis of the normal distribution is 3 so its excess kurtosis (b2 − 3) is zero.
The Bera–Jarque test statistic is given by

W = T[b1^2/6 + (b2 − 3)^2/24] (4.49)

where T is the sample size. The test statistic asymptotically follows a χ2(2) under the null hypothesis that the distribution of the series is symmetric and mesokurtic.
b1 and b2 can be estimated using the residuals from the OLS regression, û. The null hypothesis is of normality, and this would be rejected if the residuals from the model were either significantly skewed or leptokurtic/platykurtic (or both).
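The statistic in (4.49) is straightforward to compute directly from the estimated residuals. The following is a minimal Python sketch (an illustration, not part of the text), assuming resid is a numpy array of OLS residuals.

# Bera-Jarque statistic from (4.48)-(4.49), computed from a numpy array of
# OLS residuals `resid` (an illustrative sketch).
import numpy as np
from scipy import stats

def bera_jarque(resid):
    T = len(resid)
    u = resid - resid.mean()                # residuals have zero mean if a constant is included
    sigma2 = np.mean(u ** 2)
    b1 = np.mean(u ** 3) / sigma2 ** 1.5    # coefficient of skewness
    b2 = np.mean(u ** 4) / sigma2 ** 2      # coefficient of kurtosis
    W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    p_value = 1 - stats.chi2.cdf(W, df=2)   # asymptotically chi-squared(2) under normality
    return W, p_value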
4.7.2 Testing for non-normality using EViews
The Bera–Jarque normality test results can be viewed by selecting View/Residual Tests/Histogram – Normality Test. The statistic has a χ2 distribution with 2 degrees of freedom under the null hypothesis of normally distributed errors. If the residuals are normally distributed, the histogram should be bell-shaped and the Bera–Jarque statistic would not be significant. This means that the p-value given at the bottom of the normality test screen should be bigger than 0.05 to not reject the null of normality at the 5% level. In the example of the Microsoft regression, the screen would appear as in screenshot 4.2.
In this case, the residuals are very negatively skewed and are leptokurtic.
Hence the null hypothesis for residual normality is rejected very strongly (the p-value for the BJ test is zero to six decimal places), implying that the inferences we make about the coefficient estimates could be wrong, although the sample is probably just about large enough that we need be less concerned than we would be with a small sample. The non-normality in this case appears to have been caused by a small number of very large negative residuals representing monthly stock price falls of more than 25%.
Screenshot 4.2 Non-normality test results
4.7.3 What should be done if evidence of non-normality is found?
It is not obvious what should be done! It is, of course, possible to employ an estimation method that does not assume normality, but such a method may be difficult to implement, and one can be less sure of its properties. It is thus desirable to stick with OLS if possible, since its behaviour in a variety of circumstances has been well researched. For sample sizes that are sufficiently large, violation of the normality assumption is virtually inconsequential. Appealing to a central limit theorem, the test statistics will asymptotically follow the appropriate distributions even in the absence of error normality.2
In economic or financial modelling, it is quite often the case that one
or two very extreme residuals cause a rejection of the normality assumption. Such observations would appear in the tails of the distribution, and would therefore lead u^4, which enters into the definition of kurtosis, to be very large. Such observations that do not fit in with the pattern of the remainder of the data are known as outliers. If this is the case, one way
2 The law of large numbers states that the average of a sample (which is a random variable) will converge to the population mean (which is fixed), and the central limit theorem states that the appropriately standardised sample mean converges to a normal distribution.
Figure 4.12 Regression residuals from stock return data, showing large outlier for October 1987
to improve the chances of error normality is to use dummy variables or
some other method to effectively remove those observations.
In the time series context, suppose that a monthly model of asset returns from 1980–90 had been estimated, and the residuals plotted, and that a particularly large outlier has been observed for October 1987, shown in figure 4.12.
A new variable called D87M10t could be defined as

D87M10t = 1 during October 1987 and zero otherwise
The observations for the dummy variable would appear as in box 4.6.
The dummy variable would then be used just like any other variable in
the regression model, e.g.
yt = β1 + β2x2t + β3x3t + β4D87M10t + ut (4.50)
Box 4.6 Observations for the dummy variable
Time Value of dummy variable D87M10t
1986 M12 0
1987 M01 0
……
1987 M09 0
1987 M10 1
1987 M11 0
……
Figure 4.13 Possible effect of an outlier on OLS estimation
This type of dummy variable that takes the value one for only a single
observation has an effect exactly equivalent to knocking out that observation from the sample altogether, by forcing the residual for that observation to zero. The estimated coefficient on the dummy variable will be equal to the residual that the dummied observation would have taken if the dummy variable had not been included.
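This equivalence is easy to verify on simulated data. The sketch below (constructed for illustration, not from the text) fits the same regression twice – once with a single-observation dummy and once with that observation dropped – and shows that the remaining coefficients coincide and that the dummy coefficient equals the prediction error for the dummied observation relative to the line fitted without it.

# Illustrative check (not from the text) that a single-observation dummy is
# equivalent to dropping that observation from the sample.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 100
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + rng.standard_normal(T)
y[40] -= 15.0                                    # plant an artificial outlier
X = sm.add_constant(x)

d = np.zeros(T)
d[40] = 1.0                                      # dummy for the outlying observation
res_dummy = sm.OLS(y, np.column_stack([X, d])).fit()

keep = np.arange(T) != 40                        # same regression with that observation dropped
res_drop = sm.OLS(y[keep], X[keep]).fit()

print(res_dummy.params[:2], res_drop.params)     # intercept and slope coincide
print(res_dummy.params[2],                       # dummy coefficient equals the prediction error
      y[40] - res_drop.predict(X[40:41])[0])     # for obs 40 from the line fitted without it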
However, many econometricians would argue that dummy variables to
remove outlying residuals can be used to artificially improve the characteristics of the model – in essence fudging the results. Removing outlying observations will reduce standard errors, reduce the RSS, and therefore increase R2, thus improving the apparent fit of the model to the data. The removal of observations is also hard to reconcile with the notion in statistics that each data point represents a useful piece of information.
The other side of this argument is that outliers – observations that are ‘a long way away’ from the rest, and seem not to fit in with the general pattern of the rest of the data – can have a serious effect on coefficient estimates, since by definition, OLS will receive a big penalty, in the form of an increased RSS, for points that are a long way from the fitted line. Consequently, OLS will try extra hard to minimise the distances of points that would have otherwise been a long way from the line. A graphical depiction of the possible effect of an outlier on OLS estimation is given in figure 4.13.
In figure 4.13, one point is a long way away from the rest. If this point
is included in the estimation sample, the fitted line will be the dotted one, which has a slight positive slope. If this observation were removed, the full line would be the one fitted. Clearly, the slope is now large and negative. OLS would not select this line if the outlier is included since the
observation is a long way from the others and hence when the residual
(the distance from the point to the fitted line) is squared, it would lead to a big increase in the RSS. Note that outliers could be detected by plotting y against x only in the context of a bivariate regression. In the case where there are more explanatory variables, outliers are most easily identified by plotting the residuals over time, as in figure 4.12, etc.
So, it can be seen that a trade-off potentially exists between the need
to remove outlying observations that could have an undue impact on the OLS estimates and cause residual non-normality on the one hand, and the notion that each data point represents a useful piece of information on the other. The latter is coupled with the fact that removing observations at will could artificially improve the fit of the model. A sensible way to proceed is by introducing dummy variables to the model only if there is both a statistical need to do so and a theoretical justification for their inclusion. This justification would normally come from the researcher’s knowledge of the historical events that relate to the dependent variable and the model over the relevant sample period. Dummy variables may be justifiably used to remove observations corresponding to ‘one-off’ or extreme events that are considered highly unlikely to be repeated, and the information content of which is deemed of no relevance for the data as a whole. Examples may include stock market crashes, financial panics, government crises, and so on.
Non-normality in financial data could also arise from certain types of
heteroscedasticity, known as ARCH – see chapter 8. In this case, the non-normality is intrinsic to all of the data and therefore outlier removal would not make the residuals of such a model normal.
Another important use of dummy variables is in the modelling of seasonality in financial data, and accounting for so-called ‘calendar anomalies’, such as day-of-the-week effects and weekend effects. These are discussed in chapter 9.
4.7.4 Dummy variable construction and use in EViews
As we saw from the plot of the distribution above, the non-normality in the residuals from the Microsoft regression appears to have been caused by a small number of outliers in the regression residuals. Such events can be identified, if they are present, by plotting the actual values, the fitted values and the residuals of the regression. This can be achieved in EViews by selecting View/Actual, Fitted, Residual/Actual, Fitted, Residual Graph.
The plot should look as in screenshot 4.3.
From the graph, it can be seen that there are several large (negative)
outliers, but the largest of all occur in early 1998 and early 2003. All of the
Screenshot 4.3 Regression residuals, actual values and fitted series
large outliers correspond to months where the actual return was much
smaller (i.e. more negative) than the model would have predicted. Interestingly, the residual in October 1987 is not quite so prominent because even though the stock price fell, the market index value fell as well, so that the stock price fall was at least in part predicted (this can be seen by comparing the actual and fitted values during that month).
In order to identify the exact dates that the biggest outliers were realised, we could use the shading option by right clicking on the graph and selecting the ‘add lines & shading’ option. But it is probably easier to just examine a table of values for the residuals, which can be achieved by selecting View/Actual, Fitted, Residual/Actual, Fitted, Residual Table. If we do this, it is evident that the two most extreme residuals (with values to the nearest integer) were in February 1998 (−68) and February 2003 (−67).
As stated above, one way to remove big outliers in the data is by using
dummy variables. It would be tempting, but incorrect, to construct one dummy variable that takes the value 1 for both Feb 98 and Feb 03, but this would not have the desired effect of setting both residuals to zero. Instead, to remove two outliers requires us to construct two separate dummy
variables. In order to create the Feb 98 dummy first, we generate a series
called ‘FEB98DUM’ that will initially contain only zeros. Generate this series (hint: you can use ‘Quick/Generate Series’ and then type in the box ‘FEB98DUM = 0’). Double click on the new object to open the spreadsheet and turn on the editing mode by clicking ‘Edit +/−’ and input a single 1 in the cell that corresponds to February 1998. Leave all other cell entries as zeros.
Once this dummy variable has been created, repeat the process above to
create another dummy variable called ‘FEB03DUM’ that takes the value
1 in February 2003 and zero elsewhere and then rerun the regression
including all the previous variables plus these two dummy variables. This can most easily be achieved by clicking on the ‘Msoftreg’ results object,
then the Estimate button and adding the dummy variables to the end of
the variable list. The full list of variables is
ermsoft c ersandp dprod dcredit dinflation dmoney dspread rterm
feb98dum feb03dum
and the results of this regression are as in the following table.
Dependent Variable: ERMSOFT
Method: Least Squares
Date: 08/29/07 Time: 09:11
Sample (adjusted): 1986M05 2007M04
Included observations: 252 after adjustments
Coefficient Std. Error t-Statistic Prob.
C −0.086606 1.315194 −0.065850 0.9476
ERSANDP 1.547971 0.183945 8.415420 0.0000
DPROD 0.455015 0.451875 1.006948 0.315
DCREDIT −5.92E-05 0.000145 −0.409065 0.6829
DINFLATION 4.913297 2.685659 1.829457 0.0686
DMONEY −1.430608 0.644601 −2.219369 0.0274
DSPREAD 8.624895 12.22705 0.705395 0.4812
RTERM 6.893754 2.993982 2.302537 0.0222
FEB98DUM −69.14177 12.68402 −5.451093 0.0000
FEB03DUM −68.24391 12.65390 −5.393113 0.0000
R-squared 0.358962 Mean dependent var −0.420803
Adjusted R-squared 0.335122 S.D. dependent var 15.41135
S.E. of regression 12.56643 Akaike info criterion 7.938808
Sum squared resid 38215.45 Schwarz criterion 8.078865
Log likelihood −990.2898 Hannan-Quinn criter. 7.995164
F-statistic 15.05697 Durbin-Watson stat 2.142031
Prob(F-statistic) 0.000000
Note that the dummy variable parameters are both highly significant and
take approximately the values that the corresponding residuals would have taken if the dummy variables had not been included in the model.3
By comparing the results with those of the regression above that excluded the dummy variables, it can be seen that the coefficient estimates on the remaining variables change quite a bit in this instance and the significances improve considerably. The term structure and money supply parameters are now both significant at the 5% level, and the unexpected inflation parameter is now significant at the 10% level. The R2 value has risen from 0.20 to 0.36 because of the perfect fit of the dummy variables to those two extreme outlying observations.
Finally, if we re-examine the normality test results by clicking
View/Residual Tests/Histogram – Normality Test, we will see that while the skewness and kurtosis are both slightly closer to the values that they would take under normality, the Bera–Jarque test statistic still takes a value of 829 (compared with over 1000 previously). We would thus conclude that the residuals are still a long way from following a normal distribution. While it would be possible to continue to generate dummy variables, there is a limit to the extent to which it would be desirable to do so. With this particular regression, we are unlikely to be able to achieve a residual distribution that is close to normality without using an excessive number of dummy variables. As a rule of thumb, in a monthly sample with 252 observations, it is reasonable to include, perhaps, two or three dummy variables, but more would probably be excessive.
4.8 Multicollinearity
An implicit assumption that is made when using the OLS estimation method is that the explanatory variables are not correlated with one another. If there is no relationship between the explanatory variables, they would be said to be orthogonal to one another. If the explanatory variables were orthogonal to one another, adding or removing a variable from a regression equation would not cause the values of the coefficients on the other variables to change.
In any practical context, the correlation between explanatory variables
will be non-zero, although this will generally be relatively benign in the
3 Note the inexact correspondence between the values of the residuals and the values of the dummy variable parameters because two dummies are being used together; had we included only one dummy, the value of the dummy variable coefficient and that which the residual would have taken would be identical.
sense that a small degree of association between explanatory variables will almost always occur but will not cause too much loss of precision. However, a problem occurs when the explanatory variables are very highly correlated with each other, and this problem is known as multicollinearity. It is possible to distinguish between two classes of multicollinearity: perfect multicollinearity and near multicollinearity.
Perfect multicollinearity occurs when there is an exact relationship between two or more variables. In this case, it is not possible to estimate all of the coefficients in the model. Perfect multicollinearity will usually be observed only when the same explanatory variable is inadvertently used twice in a regression. For illustration, suppose that two variables were employed in a regression function such that the value of one variable was always twice that of the other (e.g. suppose x3 = 2x2). If both x3 and x2 were used as explanatory variables in the same regression, then the model parameters cannot be estimated. Since the two variables are perfectly related to one another, together they contain only enough information to estimate one parameter, not two. Technically, the difficulty would occur in trying to invert the (X′X) matrix since it would not be of full rank (two of the columns would be linearly dependent on one another), so that the inverse of (X′X) would not exist and hence the OLS estimates β̂ = (X′X)−1X′y could not be calculated.
Near multicollinearity is much more likely to occur in practice, and would
arise when there was a non-negligible, but not perfect, relationship between two or more of the explanatory variables. Note that a high correlation between the dependent variable and one of the independent variables is not multicollinearity.
Visually, we could think of the difference between near and perfect
multicollinearity as follows. Suppose that the variables x2t and x3t were highly correlated. If we produced a scatter plot of x2t against x3t, then perfect multicollinearity would correspond to all of the points lying exactly on a straight line, while near multicollinearity would correspond to the points lying close to the line, and the closer they were to the line (taken altogether), the stronger would be the relationship between the two variables.
4.8.1 Measuring near multicollinearity
Testing for multicollinearity is surprisingly difficult, and hence all that is presented here is a simple method to investigate the presence or otherwise of the most easily detected forms of near multicollinearity. This method simply involves looking at the matrix of correlations
between the individual variables. Suppose that a regression equation has three explanatory variables (plus a constant term), and that the pair-wise correlations between these explanatory variables are:
corr
x2 x3 x4
x2 – 0.2 0.8
x3 0.2 – 0.3
x4 0.8 0.3 –
Clearly, if multicollinearity was suspected, the most likely culprit would
be a high correlation between x2 and x4. Of course, if the relationship involves three or more variables that are collinear – e.g. x2 + x3 ≈ x4 –
then multicollinearity would be very difficult to detect.
4.8.2 Problems if near multicollinearity is present but ignored
First, R2 will be high but the individual coefficients will have high standard errors, so that the regression ‘looks good’ as a whole4, but the individual variables are not significant. This arises in the context of very closely related explanatory variables as a consequence of the difficulty in observing the individual contribution of each variable to the overall fit of the regression. Second, the regression becomes very sensitive to small changes in the specification, so that adding or removing an explanatory variable leads to large changes in the coefficient values or significances of the other variables. Finally, near multicollinearity will make confidence intervals for the parameters very wide, and significance tests might therefore give inappropriate conclusions, and so make it difficult to draw sharp inferences.
4.8.3 Solutions to the problem of multicollinearity
A number of alternative estimation techniques have been proposed that are valid in the presence of multicollinearity – for example, ridge regression, or principal components. Principal components analysis was discussed briefly in an appendix to the previous chapter. Many researchers do not use these techniques, however, as they can be complex, their properties are less well understood than those of the OLS estimator and, above all, many econometricians would argue that multicollinearity is more a problem with the data than with the model or estimation method.
4 Note that multicollinearity does not affect the value of R2 in a regression.
Other, more ad hoc methods for dealing with the possible existence of
near multicollinearity include:
●Ignore it, if the model is otherwise adequate, i.e. statistically and in
terms of each coefficient being of a plausible magnitude and having an appropriate sign. Sometimes, the existence of multicollinearity does not reduce the t-ratios on variables that would have been significant without the multicollinearity sufficiently to make them insignificant. It is worth stating that the presence of near multicollinearity does not affect the BLUE properties of the OLS estimator – i.e. it will still be consistent, unbiased and efficient since the presence of near multicollinearity does not violate any of the CLRM assumptions 1–4. However, in the presence of near multicollinearity, it will be hard to obtain small standard errors. This will not matter if the aim of the model-building exercise is to produce forecasts from the estimated model, since the forecasts will be unaffected by the presence of near multicollinearity so long as this relationship between the explanatory variables continues to hold over the forecasted sample.
●Drop one of the collinear variables, so that the problem disappears.
However, this may be unacceptable to the researcher if there were strong a priori theoretical reasons for including both variables in the model. Also, if the removed variable was relevant in the data generating process for y, an omitted variable bias would result (see section 4.10).
●Transform the highly correlated variables into a ratio and include
only the ratio and not the individual variables in the regression. Again, this may be unacceptable if financial theory suggests that changes in the dependent variable should occur following changes in the individual explanatory variables, and not a ratio of them.
●Finally, as stated above, it is also often said that near multicollinearity is more a problem with the data than with the model, so that there is insufficient information in the sample to obtain estimates for all of the coefficients. This is why near multicollinearity leads coefficient estimates to have wide standard errors, which is exactly what would happen if the sample size were small. An increase in the sample size will usually lead to an increase in the accuracy of coefficient estimation and consequently a reduction in the coefficient standard errors, thus enabling the model to better dissect the effects of the various explanatory variables on the explained variable. A further possibility, therefore, is for the researcher to go out and collect more data – for example, by taking a longer run of data, or switching to a higher frequency of
sampling. Of course, it may be infeasible to increase the sample size
if all available data is being utilised already. A further method of increasing the available quantity of data as a potential remedy for near multicollinearity would be to use a pooled sample. This would involve the use of data with both cross-sectional and time series dimensions (see chapter 10).
4.8.4 Multicollinearity in EViews
For the Microsoft stock return example given above, a correlation matrix for the independent variables can be constructed in EViews by clicking Quick/Group Statistics/Correlations and then entering the list of regressors (not including the regressand) in the dialog box that appears:

ersandp dprod dcredit dinflation dmoney dspread rterm

A new window will be displayed that contains the correlation matrix of the series in a spreadsheet format:
ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
ERSANDP 1.000000 −0.096173 −0.012885 −0.013025 −0.033632 −0.038034 0.013764
DPROD −0.096173 1.000000 −0.002741 0.168037 0.121698 −0.073796 −0.042486
DCREDIT −0.012885 −0.002741 1.000000 0.071330 0.035290 0.025261 −0.062432
DINFLATION −0.013025 0.168037 0.071330 1.000000 0.006702 −0.169399 −0.006518
DMONEY −0.033632 0.121698 0.035290 0.006702 1.000000 −0.075082 0.170437
DSPREAD −0.038034 −0.073796 0.025261 −0.169399 −0.075082 1.000000 0.018458
RTERM 0.013764 −0.042486 −0.062432 −0.006518 0.170437 0.018458 1.000000
Do the results indicate any significant correlations between the independent variables? In this particular case, the largest observed correlation is 0.17 between the money supply and term structure variables and this is sufficiently small that it can reasonably be ignored.
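The same check takes only a few lines outside EViews; for example, with pandas, assuming the regressors are held in a DataFrame called regressors (the name and column labels are illustrative):

# Correlation matrix of the regressors, assuming they are columns of a pandas
# DataFrame called `regressors` (the name and column labels are illustrative).
cols = ['ersandp', 'dprod', 'dcredit', 'dinflation', 'dmoney', 'dspread', 'rterm']
corr_matrix = regressors[cols].corr()
print(corr_matrix.round(3))

# Flag any pair with absolute correlation above, say, 0.8 as a possible source
# of near multicollinearity.
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if abs(corr_matrix.iloc[i, j]) > 0.8:
            print('High correlation:', cols[i], cols[j], round(corr_matrix.iloc[i, j], 3))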
4.9 Adopting the wrong functional form
A further implicit assumption of the classical linear regression model is that the appropriate ‘functional form’ is linear. This means that the appropriate model is assumed to be linear in the parameters, and that in the bivariate case, the relationship between y and x can be represented by a straight line. However, this assumption may not always be upheld. Whether the model should be linear can be formally tested using Ramsey’s (1969) RESET test, which is a general test for misspecification of functional
form. Essentially, the method works by using higher order terms of the fitted values (e.g. ŷt^2, ŷt^3, etc.) in an auxiliary regression. The auxiliary regression is thus one where yt, the dependent variable from the original regression, is regressed on powers of the fitted values together with the original explanatory variables

yt = α1 + α2ŷt^2 + α3ŷt^3 + ··· + αpŷt^p + Σi βi xit + vt (4.51)

Higher order powers of the fitted values of y can capture a variety of non-linear relationships, since they embody higher order powers and cross-products of the original explanatory variables, e.g.

ŷt^2 = (β̂1 + β̂2x2t + β̂3x3t + ··· + β̂kxkt)^2 (4.52)

The value of R2 is obtained from the regression (4.51), and the test statistic, given by TR2, is distributed asymptotically as a χ2(p − 1). Note that the degrees of freedom for this test will be (p − 1) and not p. This arises because p is the highest order term in the fitted values used in the auxiliary regression and thus the test will involve p − 1 terms, one for the square of the fitted value, one for the cube, …, one for the pth power. If the value of the test statistic is greater than the χ2 critical value, reject the null hypothesis that the functional form was correct.
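Outside EViews, the test is most easily run in its F-test form, comparing the residual sums of squares of the original regression and of the auxiliary regression that adds powers of the fitted values – this is the version EViews reports in the next subsection. A minimal sketch, assuming y and X (the original regressors including a constant) are already defined:

# Sketch of Ramsey's RESET test in its F-test form, assuming y and X (the
# original regressors including a constant) are already defined.
import numpy as np
import statsmodels.api as sm

restricted = sm.OLS(y, X).fit()
fitted_sq = restricted.fittedvalues ** 2                 # one fitted term (the square only)
unrestricted = sm.OLS(y, np.column_stack([X, fitted_sq])).fit()

# Standard F-test on the added term, comparing the two residual sums of squares.
f_stat, p_value, _ = unrestricted.compare_f_test(restricted)
print('F-statistic: %.4f  p-value: %.4f' % (f_stat, p_value))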
4.9.1 What if the functional form is found to be inappropriate?
One possibility would be to switch to a non-linear model, but the RESET test presents the user with no guide as to what a better specification might be! Also, non-linear models in the parameters typically preclude the use of OLS, and require the use of a non-linear estimation technique. Some non-linear models can still be estimated using OLS, provided that they are linear in the parameters. For example, if the true model is of the form

yt = β1 + β2x2t + β3x2t^2 + β4x2t^3 + ut (4.53)

– that is, a third order polynomial in x – and the researcher assumes that the relationship between yt and xt is linear (i.e. x2t^2 and x2t^3 are missing from the specification), this is simply a special case of omitted variables, with the usual problems (see section 4.10) and obvious remedy.
However, the model may be multiplicatively non-linear. A second possibility that is sensible in this case would be to transform the data into logarithms. This will linearise many previously multiplicative models into additive ones. For example, consider again the exponential growth model

yt = β1 xt^β2 ut (4.54)
Taking logs, this becomes
ln(yt) = ln(β1) + β2 ln(xt) + ln(ut) (4.55)

or

Yt = α + β2Xt + vt (4.56)

where Yt = ln(yt), α = ln(β1), Xt = ln(xt), vt = ln(ut). Thus a simple logarithmic transformation makes this model a standard linear bivariate regression equation that can be estimated using OLS.
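As a concrete sketch (illustrative only), the transformed model (4.56) can be estimated like any other bivariate regression, assuming positive-valued numpy arrays y and x are available:

# Sketch of estimating the double log form (4.55)/(4.56) by OLS, assuming
# positive-valued numpy arrays y and x (illustrative names).
import numpy as np
import statsmodels.api as sm

Y = np.log(y)                        # Yt = ln(yt)
X_log = sm.add_constant(np.log(x))   # constant (alpha = ln(beta1)) and Xt = ln(xt)
res = sm.OLS(Y, X_log).fit()

beta2 = res.params[1]                # elasticity: a 1% rise in x implies roughly a beta2% rise in y
beta1 = np.exp(res.params[0])        # beta1 recovered from alpha = ln(beta1)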
Loosely following the treatment given in Stock and Watson (2006), the
following list shows four different functional forms for models that are either linear or can be made linear following a logarithmic transformation to one or more of the dependent or independent variables, examining only a bivariate specification for simplicity. Care is needed when interpreting the coefficient values in each case.
(1) Linear model: yt = β1 + β2x2t + ut; a 1-unit increase in x2t causes a β2-unit increase in yt.
(2) Log-linear: ln(yt) = β1 + β2x2t + ut; a 1-unit increase in x2t causes a 100×β2% increase in yt.
(3) Linear-log: yt = β1 + β2 ln(x2t) + ut; a 1% increase in x2t causes a 0.01×β2-unit increase in yt.
(4) Double log: ln(yt) = β1 + β2 ln(x2t) + ut; a 1% increase in x2t causes a β2% increase in yt. Note that to plot y against x2 would be more complex since the shape would depend on the size of β2.
Note also that we cannot use R2 or adjusted R2 to determine which of these four types of model is most appropriate since the dependent variables are different across some of the models.
4.9.2 RESET tests using EViews
Using EViews, the Ramsey RESET test is found in the View menu of the
regression window (for ‘Msoftreg’) under Stability tests/Ramsey RESET
test…. EViews will prompt you for the ‘number of fitted terms’, equivalent
to the number of powers of the fitted value to be used in the regression; leave the default of 1 to consider only the square of the fitted values. The Ramsey RESET test for this regression is in effect testing whether the relationship between the Microsoft stock excess returns and the explanatory
variables is linear or not. The results of this test for one fitted term are
shown in the following table.
Ramsey RESET Test:
F-statistic 1.603573 Prob. F(1,241) 0.2066
Log likelihood ratio 1.671212 Prob. Chi-Square(1) 0.1961
Test Equation:
Dependent Variable: ERMSOFT
Method: Least Squares
Date: 08/29/07 Time: 09:54
Sample: 1986M05 2007M04
Included observations: 252
Coefficient Std. Error t-Statistic Prob.
C −0.531288 1.359686 −0.390743 0.6963
ERSANDP 1.639661 0.197469 8.303368 0.0000
DPROD 0.487139 0.452025 1.077681 0.2823
DCREDIT −5.99E-05 0.000144 −0.414772 0.6787
DINFLATION 5.030282 2.683906 1.874239 0.0621
DMONEY −1.413747 0.643937 −2.195475 0.0291
DSPREAD 8.488655 12.21231 0.695090 0.4877
RTERM 6.692483 2.994476 2.234943 0.0263
FEB98DUM −94.39106 23.62309 −3.995712 0.0001
FEB03DUM −105.0831 31.71804 −3.313037 0.0011
FITTED^2 0.007732 0.006106 1.266323 0.2066
R-squared 0.363199 Mean dependent var −0.420803
Adjusted R-squared 0.336776 S.D. dependent var 15.41135
S.E. of regression 12.55078 Akaike info criterion 7.940113
Sum squared resid 37962.85 Schwarz criterion 8.094175
Log likelihood −989.4542 Hannan-Quinn criter. 8.002104
F-statistic 13.74543 Durbin-Watson stat 2.090304
Prob(F-statistic) 0.000000
Both F- and χ2 versions of the test are presented, and it can be seen that there is no apparent non-linearity in the regression equation and so it would be concluded that the linear model for the Microsoft returns is appropriate.
4.10 Omission of an important variable
What would be the effects of excluding from the estimated regression a variable that is a determinant of the dependent variable? For example,
suppose that the true, but unknown, data generating process is represented by
yt=β1+β2x2t+β3x3t+β4x4t+β5x5t+ut (4.57)
but the researcher estimated a model of the form
yt=β1+β2x2t+β3x3t+β4x4t+ut (4.58)
so that the variable x5t is omitted from the model. The consequence would be that the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables. Even if this condition is satisfied, the estimate of the coefficient on the constant term will be biased, which would imply that any forecasts made from the model would be biased. The standard errors will also be biased (upwards), and hence hypothesis tests could yield inappropriate inferences. Further intuition is offered in Dougherty (1992, pp. 168–73).
4.11 Inclusion of an irrelevant variable
Suppose now that the researcher makes the opposite error to section 4.10, i.e. that the true DGP was represented by

yt = β1 + β2x2t + β3x3t + β4x4t + ut (4.59)

but the researcher estimates a model of the form

yt = β1 + β2x2t + β3x3t + β4x4t + β5x5t + ut (4.60)

thus incorporating the superfluous or irrelevant variable x5t. As x5t is irrelevant, the expected value of β5 is zero, although in any practical application, its estimated value is very unlikely to be exactly zero. The consequence of including an irrelevant variable would be that the coefficient estimators would still be consistent and unbiased, but the estimators would be inefficient. This would imply that the standard errors for the coefficients are likely to be inflated relative to the values which they would have taken if the irrelevant variable had not been included. Variables which would otherwise have been marginally significant may no longer be so in the presence of irrelevant variables. In general, it can also be stated that the extent of the loss of efficiency will depend positively on the absolute value of the correlation between the included irrelevant variable and the other explanatory variables.
Summarising the last two sections, it is evident that when trying to determine whether to err on the side of including too many or too few variables in a regression model, there is an implicit trade-off between inconsistency and efficiency; many researchers would argue that while in an ideal world, the model will incorporate precisely the correct variables – no more and no less – the former problem is more serious than the latter and therefore in the real world, one should err on the side of incorporating marginally significant variables.
4.12 Parameter stability tests
So far, regressions of a form such as
yt=β1+β2x2t+β3x3t+ut (4.61)
have been estimated. These regressions embody the implicit assumption
that the parameters (β1, β2 and β3) are constant for the entire sample, both for the data period used to estimate the model, and for any subsequent period used in the construction of forecasts.
This implicit assumption can be tested using parameter stability tests.
The idea is essentially to split the data into sub-periods and then to estimate up to three models, for each of the sub-parts and for all the data, and then to ‘compare’ the RSS of each of the models. There are two types of test that will be considered, namely the Chow (analysis of variance) test and predictive failure tests.
4.12.1 The Chow test
The steps involved are shown in box 4.7.
Box 4.7 Conducting a Chow test
(1) Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.
(2) The restricted regression is now the regression for the whole period while the ‘unrestricted regression’ comes in two parts: one for each of the sub-samples. It is thus possible to form an F-test, which is based on the difference between the RSSs. The statistic is

test statistic = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (T − 2k)/k (4.62)
where RSS = residual sum of squares for whole sample
RSS1 = residual sum of squares for sub-sample 1
RSS2 = residual sum of squares for sub-sample 2
T = number of observations
2k = number of regressors in the ‘unrestricted’ regression (since it comes in two parts)
k = number of regressors in (each) ‘unrestricted’ regression

The unrestricted regression is the one where the restriction has not been imposed on the model. Since the restriction is that the coefficients are equal across the sub-samples, the restricted regression will be the single regression for the whole sample. Thus, the test is one of how much the residual sum of squares for the whole sample (RSS) is bigger than the sum of the residual sums of squares for the two sub-samples (RSS1 + RSS2). If the coefficients do not change much between the samples, the residual sum of squares will not rise much upon imposing the restriction. Thus the test statistic in (4.62) can be considered a straightforward application of the standard F-test formula discussed in chapter 3. The restricted residual sum of squares in (4.62) is RSS, while the unrestricted residual sum of squares is (RSS1 + RSS2). The number of restrictions is equal to the number of coefficients that are estimated for each of the regressions, i.e. k. The number of regressors in the unrestricted regression (including the constants) is 2k, since the unrestricted regression comes in two parts, each with k regressors.
(3) Perform the test. If the value of the test statistic is greater than the critical value
from the F-distribution, which is an F(k,T−2k), then reject the null hypothesis that
the parameters are stable over time.
Note that it is also possible to use a dummy variables approach to calculating both Chow and predictive failure tests. In the case of the Chow test, the unrestricted regression would contain dummy variables for the intercept and for all of the slope coefficients (see also chapter 9). For example, suppose that the regression is of the form

yt = β1 + β2x2t + β3x3t + ut (4.63)

If the split of the total of T observations is made so that the sub-samples contain T1 and T2 observations (where T1 + T2 = T), the unrestricted regression would be given by

yt = β1 + β2x2t + β3x3t + β4Dt + β5Dtx2t + β6Dtx3t + vt (4.64)

where Dt = 1 for t ∈ T1 and zero otherwise. In other words, Dt takes the value one for observations in the first sub-sample and zero for observations in the second sub-sample. The Chow test viewed in this way would then be a standard F-test of the joint restriction H0: β4 = 0 and β5 = 0 and β6 = 0, with (4.64) and (4.63) being the unrestricted and restricted regressions, respectively.
Example 4.4
Suppose that it is now January 1993. Consider the following regression
for the standard CAPM β for the returns on a stock

rgt = α + βrMt + ut (4.65)

where rgt and rMt are excess returns on Glaxo shares and on a market portfolio, respectively. Suppose that you are interested in estimating beta using monthly data from 1981 to 1992, to aid a stock selection decision. Another researcher expresses concern that the October 1987 stock market crash fundamentally altered the risk–return relationship. Test this conjecture using a Chow test. The model for each sub-period is
1981 M1–1987 M10

r̂gt = 0.24 + 1.2rMt T = 82 RSS1 = 0.03555 (4.66)

1987 M11–1992 M12

r̂gt = 0.68 + 1.53rMt T = 62 RSS2 = 0.00336 (4.67)

1981 M1–1992 M12

r̂gt = 0.39 + 1.37rMt T = 144 RSS = 0.0434 (4.68)
The null hypothesis is

H0: α1 = α2 and β1 = β2

where the subscripts 1 and 2 denote the parameters for the first and second sub-samples, respectively. The test statistic will be given by

test statistic = [0.0434 − (0.0355 + 0.00336)] / (0.0355 + 0.00336) × (144 − 4)/2 (4.69)
= 7.698

The test statistic should be compared with the 5% critical value from an F(2,140), which is 3.06. H0 is rejected at the 5% level and hence it is concluded that the restriction that the coefficients are the same in the two periods cannot be employed. The appropriate modelling response would probably be to employ only the second part of the data in estimating the CAPM beta relevant for investment decisions made in early 1993.
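The arithmetic in (4.69) is easy to reproduce. The following sketch (an illustration, not from the text) wraps (4.62) in a small function; because the published RSS figures are rounded, the value it returns is close to, but not exactly equal to, the 7.698 reported above.

# Sketch of the Chow statistic (4.62) applied to the figures in example 4.4
# (k = 2 regressors: an intercept and the market beta).
from scipy import stats

def chow_test(rss_whole, rss1, rss2, T, k):
    stat = (rss_whole - (rss1 + rss2)) / (rss1 + rss2) * (T - 2 * k) / k
    p_value = 1 - stats.f.cdf(stat, k, T - 2 * k)
    return stat, p_value

stat, p = chow_test(rss_whole=0.0434, rss1=0.0355, rss2=0.00336, T=144, k=2)
print(stat, p)   # well above the 5% critical value of 3.06, so H0 of stable parameters is rejected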
4.12.2 The predictive failure test
A problem with the Chow test is that it is necessary to have enough data to do the regression on both sub-samples, i.e. T1 ≫ k, T2 ≫ k. This may not
hold in the situation where the total number of observations available is
small. Even more likely is the situation where the researcher would like to examine the effect of splitting the sample at some point very close to the start or very close to the end of the sample. An alternative formulation of a test for the stability of the model is the predictive failure test, which requires estimation for the full sample and one of the sub-samples only. The predictive failure test works by estimating the regression over a ‘long’ sub-period (i.e. most of the data) and then using those coefficient estimates for predicting values of y for the other period. These predictions for y are then implicitly compared with the actual values. Although it can be expressed in several different ways, the null hypothesis for this test is that the prediction errors for all of the forecasted observations are zero.
To calculate the test:
●Run the regression for the whole period (the restricted regression) and
obtain the RSS.
●Run the regression for the ‘large’ sub-period and obtain the RSS (called RSS1). Note that in this book, the number of observations for the long estimation sub-period will be denoted by T1 (even though it may come second). The test statistic is given by

test statistic = (RSS − RSS1)/RSS1 × (T1 − k)/T2 (4.70)

where T2 = number of observations that the model is attempting to ‘predict’. The test statistic will follow an F(T2, T1 − k).
For an intuitive interpretation of the predictive failure test statistic formulation, consider an alternative way to test for predictive failure using a regression containing dummy variables. A separate dummy variable would be used for each observation that was in the prediction sample. The unrestricted regression would then be the one that includes the dummy variables, which will be estimated using all T observations, and will have (k + T2) regressors (the k original explanatory variables, and a dummy variable for each prediction observation, i.e. a total of T2 dummy variables). Thus the numerator of the last part of (4.70) would be the total number of observations (T) minus the number of regressors in the unrestricted regression (k + T2). Noting also that T − (k + T2) = (T1 − k), since T1 + T2 = T, this gives the numerator of the last term in (4.70). The restricted regression would then be the original regression containing the explanatory variables but none of the dummy variables. Thus the number
of restrictions would be the number of observations in the prediction
period, which would be equivalent to the number of dummy variables included in the unrestricted regression, T2.
To offer an illustration, suppose that the regression is again of the form
of (4.63), and that the last three observations in the sample are used for a predictive failure test. The unrestricted regression would include three dummy variables, one for each of the observations in T2

rgt = α + βrMt + γ1D1t + γ2D2t + γ3D3t + ut (4.71)

where D1t = 1 for observation T−2 and zero otherwise, D2t = 1 for observation T−1 and zero otherwise, D3t = 1 for observation T and zero otherwise. In this case, k = 2, and T2 = 3. The null hypothesis for the predictive failure test in this regression is that the coefficients on all of the dummy variables are zero (i.e. H0: γ1 = 0 and γ2 = 0 and γ3 = 0). Both approaches to conducting the predictive failure test described above are equivalent, although the dummy variable regression is likely to take more time to set up.
However, for both the Chow and the predictive failure tests, the dummy
variables approach has the one major advantage that it provides theuser with more information. This additional information comes fromthe fact that one can examine the significances of the coefficients onthe individual dummy variables to see which part of the joint null hy-pothesis is causing a rejection. For example, in the context of the Chowregression, is it the intercept or the slope coefficients that are signifi-cantly different across the two sub-samples? In the context of the pre-dictive failure test, use of the dummy variables approach would showfor which period(s) the prediction errors are significantly different fromzero.
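To illustrate the dummy variable formulation, the following sketch (not part of the original text, and using simulated rather than real returns data) creates one dummy per forecast observation and tests their joint significance with an F-test; the individual dummy t-ratios would then reveal which forecast periods are responsible for any rejection.

# Illustrative sketch (simulated data, not from the text): predictive failure
# test via observation-specific dummies for the last T2 observations.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, T2 = 144, 3
r_m = rng.normal(size=T)
r_g = 0.4 + 1.3 * r_m + rng.normal(scale=0.5, size=T)    # simulated returns

X = sm.add_constant(r_m)                                  # k = 2 regressors
dummies = np.zeros((T, T2))
for j in range(T2):
    dummies[T - T2 + j, j] = 1.0                          # one dummy per forecast obs

unrestricted = sm.OLS(r_g, np.column_stack([X, dummies])).fit()

# H0: all T2 dummy coefficients are zero (no predictive failure)
R = np.hstack([np.zeros((T2, 2)), np.eye(T2)])
print(unrestricted.f_test(R))
# The t-ratios on the individual dummies (unrestricted.tvalues[2:]) show
# which forecast observations, if any, drive a rejection.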
4.12.3 Backward versus forward predictive failure tests
There are two types of predictive failure tests – forward tests and backwards tests. Forward predictive failure tests are where the last few observations are kept back for forecast testing. For example, suppose that observations for 1980Q1–2004Q4 are available. A forward predictive failure test could involve estimating the model over 1980Q1–2003Q4 and forecasting 2004Q1–2004Q4. Backward predictive failure tests attempt to ‘back-cast’ the first few observations, e.g. if data for 1980Q1–2004Q4 are available, the model is estimated over 1981Q1–2004Q4 and used to back-cast 1980Q1–1980Q4. Both types of test offer further evidence on the stability of the
Example 4.5
Suppose that the researcher decided to determine the stability of the estimated model for stock returns over the whole sample in example 4.4 by using a predictive failure test of the last two years of observations. The following models would be estimated:

1981M1–1992M12 (whole sample)
r̂_gt = 0.39 + 1.37 r_Mt     T = 144, RSS = 0.0434     (4.72)

1981M1–1990M12 (‘long sub-sample’)
r̂_gt = 0.32 + 1.31 r_Mt     T = 120, RSS_1 = 0.0420     (4.73)

Can this regression adequately ‘forecast’ the values for the last two years? The test statistic would be given by

test statistic = [(0.0434 − 0.0420)/0.0420] × [(120 − 2)/24] = 0.164     (4.74)

Compare the test statistic with the 5% critical value from an F(24, 118) distribution, which is 1.66. So the null hypothesis that the model can adequately predict the last few observations would not be rejected. It would thus be concluded that the model did not suffer from predictive failure during the 1991M1–1992M12 period.
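The arithmetic in example 4.5 is easily checked in code. The short sketch below (not from the text) plugs the RSS values above into (4.70) and obtains the 5% critical value from scipy (which may differ slightly in the last decimal place from the tabulated 1.66 quoted above).

# Sketch: predictive failure test of example 4.5 using equation (4.70).
from scipy import stats

RSS, RSS1 = 0.0434, 0.0420      # whole-sample and long sub-sample RSS
T1, T2, k = 120, 24, 2          # long sub-sample size, forecast obs, parameters

stat = (RSS - RSS1) / RSS1 * (T1 - k) / T2
crit = stats.f.ppf(0.95, T2, T1 - k)      # 5% critical value, F(24, 118)
print(f"test statistic = {stat:.3f}, 5% critical value = {crit:.2f}")
# The statistic is far below the 5% critical value: do not reject H0.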
4.12.4 How can the appropriate sub-parts to use be decided?
As a rule of thumb, some or all of the following methods for selecting where the overall sample split occurs could be used:
●Plot the dependent variable over time and split the data according to any obvious structural changes in the series, as illustrated in figure 4.14.
[Figure 4.14 Plot of a variable showing suggestion for break date: y_t plotted against observation number]
It is clear that y in figure 4.14 underwent a large fall in its value around observation 175, and it is possible that this may have caused a change in its behaviour. A Chow test could be conducted with the sample split at this observation.
●Split the data according to any known important historical events (e.g. a stock market crash, change in market microstructure, new government elected). The argument is that a major change in the underlying environment in which y is measured is more likely to cause a structural change in the model’s parameters than a relatively trivial change.
●Use all but the last few observations and do a forwards predictive failure test on those.
●Use all but the first few observations and do a backwards predictive failure test on those.
If a model is good, it will survive a Chow or predictive failure test with any break date. If the Chow or predictive failure tests are failed, two approaches could be adopted. Either the model is respecified, for example, by including additional variables, or separate estimations are conducted for each of the sub-samples. On the other hand, if the Chow and predictive failure tests show no rejections, it is empirically valid to pool all of the data together in a single regression. This will increase the sample size and therefore the number of degrees of freedom relative to the case where the sub-samples are used in isolation.
4.12.5 The QLR test
The Chow and predictive failure tests will work satisfactorily if the date of a structural break in a financial time series can be specified. But more often, a researcher will not know the break date in advance, or may know only that it lies within a given range (sub-set) of the sample period. In such circumstances, a modified version of the Chow test, known as the Quandt likelihood ratio (QLR) test, named after Quandt (1960), can be used instead. The test works by automatically computing the usual Chow F-test statistic repeatedly with different break dates, then the break date giving the largest F-statistic value is chosen. While the test statistic is of the F-variety, it will follow a non-standard distribution rather than an F-distribution since we are selecting the largest from a number of F-statistics rather than examining a single one.
The test is well behaved only when the range of possible break dates is sufficiently far from the end points of the whole sample, so it is usual to ‘trim’ the sample by (typically) 15% at each end. To illustrate, suppose that the full sample comprises 200 observations; then we would test for
a structural break between observations 31 and 170 inclusive. The critical values will depend on how much of the sample is trimmed away, the number of restrictions under the null hypothesis (the number of regressors in the original regression as this is effectively a Chow test) and the significance level.
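To make the mechanics concrete, the sketch below (not from the text) applies the QLR idea to simulated data: it loops over candidate break dates in the trimmed range, computes the Chow F-statistic at each, and records the largest value. The simulated series is an assumption purely for illustration, and the non-standard critical values discussed above would still be required to judge significance.

# Sketch of the QLR (sup-F) procedure on simulated data.
import numpy as np

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(1)
T, k = 200, 2
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)           # no break in this simulated series
X = np.column_stack([np.ones(T), x])

trim = int(0.15 * T)                             # trimming at each end
f_stats = {}
for tb in range(trim, T - trim):                 # candidate break after observation tb
    rss_pooled = rss(y, X)
    rss_1 = rss(y[:tb], X[:tb])
    rss_2 = rss(y[tb:], X[tb:])
    f_stats[tb] = ((rss_pooled - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (T - 2 * k))

best = max(f_stats, key=f_stats.get)
print(f"QLR (sup-F) statistic = {f_stats[best]:.2f} at candidate break {best}")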
4.12.6 Stability tests based on recursive estimation
An alternative to the QLR test for use in the situation where a researcher believes that a series may contain a structural break but is unsure of the date is to perform a recursive estimation. This is sometimes known as recursive least squares (RLS). The procedure is appropriate only for time-series data or cross-sectional data that have been ordered in some sensible way (for example, a sample of annual stock returns, ordered by market capitalisation). Recursive estimation simply involves starting with a sub-sample of the data, estimating the regression, then sequentially adding one observation at a time and re-running the regression until the end of the sample is reached. It is common to begin the initial estimation with the very minimum number of observations possible, which will be k + 1. So at the first step, the model is estimated using observations 1 to k + 1; at the second step, observations 1 to k + 2 are used and so on; at the final step, observations 1 to T are used. The final result will be the production of T − k separate estimates of every parameter in the regression model.
It is to be expected that the parameter estimates produced near the start of the recursive procedure will appear rather unstable since these estimates are being produced using so few observations, but the key question is whether they then gradually settle down or whether the volatility continues through the whole sample. Seeing the latter would be an indication of parameter instability.
It should be evident that RLS in itself is not a statistical test for parameter stability as such, but rather it provides qualitative information which can be plotted and thus gives a very visual impression of how stable the parameters appear to be. But two important stability tests, known as the CUSUM and CUSUMSQ tests, are derived from the residuals of the recursive estimation (known as the recursive residuals).⁵ The CUSUM statistic is based on a normalised (i.e. scaled) version of the cumulative sums of the residuals. Under the null hypothesis of perfect parameter stability, the CUSUM statistic is zero however many residuals are included in the sum (because the expected value of a disturbance is always zero). A set of ±2 standard error bands is usually plotted around zero and any statistic lying outside the bands is taken as evidence of parameter instability.

⁵ Strictly, the CUSUM and CUSUMSQ statistics are based on the one-step ahead prediction errors – i.e. the differences between y_t and its predicted value based on the parameters estimated at time t−1. See Greene (2002, chapter 7) for full technical details.
The CUSUMSQ test is based on a normalised version of the cumulative sums of squared residuals. The scaling is such that under the null hypothesis of parameter stability, the CUSUMSQ statistic will start at zero and end the sample with a value of 1. Again, a set of ±2 standard error bands is usually plotted around zero and any statistic lying outside these is taken as evidence of parameter instability.
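As an illustration of what recursive estimation involves computationally, the sketch below (not from the text) re-estimates a simple regression on expanding samples of simulated data and stores the full path of coefficient estimates; plotting these paths with ±2 standard error bands is essentially what EViews produces in the screenshots that follow. Libraries such as statsmodels also provide recursive least squares routines that return CUSUM-type statistics directly, but the explicit loop makes the mechanics transparent.

# Sketch: recursive least squares "by hand" on simulated data.
import numpy as np

rng = np.random.default_rng(2)
T, k = 200, 2
x = rng.normal(size=T)
y = 0.5 + 1.2 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

coef_path = np.full((T, k), np.nan)
for t in range(k + 1, T + 1):                    # samples 1..k+1, 1..k+2, ..., 1..T
    beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
    coef_path[t - 1] = beta                      # T - k sets of estimates in total

# The early estimates are volatile (tiny samples); instability persisting
# throughout the sample would be the warning sign discussed in the text.
print("first estimates:", np.round(coef_path[k], 3),
      "  final estimates:", np.round(coef_path[-1], 3))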
4.12.7 Stability tests in EViews
In EViews, to access the Chow test, click on the View/Stability Tests/Chow Breakpoint Test … in the ‘Msoftreg’ regression window. In the new window that appears, enter the date at which it is believed that a breakpoint occurred. Input 1996:01 in the dialog box in screenshot 4.4 to split the sample roughly in half. Note that it is not possible to conduct a Chow test or a parameter stability test when there are outlier dummy variables
Screenshot 4.4 Chow test for parameter stability
in the regression. This occurs because when the sample is split into two parts, the dummy variable for one of the parts will have values of zero for all observations, which would thus cause perfect multicollinearity with the column of ones that is used for the constant term. So ensure that the Chow test is performed using the regression containing all of the explanatory variables except the dummies. By default, EViews allows the values of all the parameters to vary across the two sub-samples in the unrestricted regressions, although if we wanted, we could force some of the parameters to be fixed across the two sub-samples.
EViews gives three versions of the test statistics, as shown in the following table.

Chow Breakpoint Test: 1996M01
Null Hypothesis: No breaks at specified breakpoints
Varying regressors: All equation variables
Equation Sample: 1986M05 2007M04

F-statistic              0.581302     Prob. F(8,236)         0.7929
Log likelihood ratio     4.917407     Prob. Chi-Square(8)    0.7664
Wald Statistic           4.650416     Prob. Chi-Square(8)    0.7942

The first version of the test is the familiar F-test, which computes a restricted version and an unrestricted version of the auxiliary regression and ‘compares’ the residual sums of squares, while the second and third versions are based on χ² formulations. In this case, all three test statistics are smaller than their critical values and so the null hypothesis that the parameters are constant across the two sub-samples is not rejected. Note that the Chow forecast (i.e. the predictive failure) test could also be employed by clicking on the View/Stability Tests/Chow Forecast Test … in the regression window. Determine whether the model can predict the last four observations by entering 2007:01 in the dialog box. The results of this test are given in the following table.
Chow Forecast Test: Forecast from 2007M01 to 2007M04

F-statistic              0.056576     Prob. F(4,240)         0.9940
Log likelihood ratio     0.237522     Prob. Chi-Square(4)    0.9935

The table indicates that the model can indeed adequately predict the 2007 observations. Thus the conclusions from both forms of the test are that there is no evidence of parameter instability. However, the conclusion should really be that the parameters are stable with respect to these particular break dates. It is important to note that for the model to be deemed
adequate, it needs to be stable with respect to any break dates that we may choose. A good way to test this is to use one of the tests based on recursive estimation.
Click on View/Stability Tests/Recursive Estimates (OLS Only) …. You will be presented with a menu as shown in screenshot 4.5 containing a number of options including the CUSUM and CUSUMSQ tests described above and also the opportunity to plot the recursively estimated coefficients.

Screenshot 4.5 Plotting recursive coefficient estimates

First, check the box next to Recursive coefficients and then recursive estimates will be given for all those parameters listed in the ‘Coefficient display list’ box, which by default is all of them. Click OK and you will be presented with eight small figures, one for each parameter, showing the recursive estimates and ±2 standard error bands around them. As discussed above, it is bound to take some time for the coefficients to stabilise since the first few sets are estimated using such small samples. Given this, the parameter estimates in all cases are remarkably stable over time. Now go back to View/Stability Tests/Recursive Estimates (OLS Only) … and choose CUSUM Test. The resulting graph is in screenshot 4.6.
Since the line is well within the confidence bands, the conclusion would be again that the null hypothesis of stability is not rejected. Now repeat the above but using the CUSUMSQ test rather than CUSUM. Do we retain the same conclusion? (No) Why?
[Screenshot 4.6 CUSUM test graph: the CUSUM statistic plotted over 1988–2006 together with its 5% significance bands]
4.13 A strategy for constructing econometric models and a
discussion of model-building philosophies
The objective of many econometric model-building exercises is to build a statistically adequate empirical model which satisfies the assumptions of the CLRM, is parsimonious, has the appropriate theoretical interpretation, and has the right ‘shape’ (i.e. all signs on coefficients are ‘correct’ and all sizes of coefficients are ‘correct’).
But how might a researcher go about achieving this objective? A common approach to model building is the ‘LSE’ or general-to-specific methodology associated with Sargan and Hendry. This approach essentially involves starting with a large model which is statistically adequate and restricting and rearranging the model to arrive at a parsimonious final formulation. Hendry’s approach (see Gilbert, 1986) argues that a good model is consistent with the data and with theory. A good model will also encompass rival models, which means that it can explain all that rival models can and more. The Hendry methodology suggests the extensive use of diagnostic tests to ensure the statistical adequacy of the model.
An alternative philosophy of econometric model-building, which pre-dates Hendry’s research, is that of starting with the simplest model and adding to it sequentially so that it gradually becomes more complex and a better description of reality. This approach, associated principally with Koopmans (1937), is sometimes known as a ‘specific-to-general’ or
‘bottoms-up’ modelling approach. Gilbert (1986) termed this the ‘Average
Economic Regression’ since most applied econometric work had been tackled in that way. The term was also a joke at the expense of a top economics journal that published many papers using such a methodology.
Hendry and his co-workers have severely criticised this approach, mainly on the grounds that diagnostic testing is undertaken, if at all, almost as an after-thought and in a very limited fashion. However, if diagnostic tests are not performed, or are performed only at the end of the model-building process, all earlier inferences are potentially invalidated. Moreover, if the specific initial model is generally misspecified, the diagnostic tests themselves are not necessarily reliable in indicating the source of the problem. For example, if the initially specified model omits relevant variables which are themselves autocorrelated, introducing lags of the included variables would not be an appropriate remedy for a significant DW test statistic. Thus the eventually selected model under a specific-to-general approach could be sub-optimal in the sense that the model selected using a general-to-specific approach might represent the data better. Under the Hendry approach, diagnostic tests of the statistical adequacy of the model come first, with an examination of inferences for financial theory drawn from the model left until after a statistically adequate model has been found.
According to Hendry and Richard (1982), a final acceptable model should
satisfy several criteria (adapted slightly here). The model should:
●be logically plausible
●be consistent with underlying financial theory, including satisfying any relevant parameter restrictions
●have regressors that are uncorrelated with the error term
●have parameter estimates that are stable over the entire sample
●have residuals that are white noise (i.e. completely random and exhibiting no patterns)
●be capable of explaining the results of all competing models and more.
The last of these is known as the encompassing principle. A model that nests within it a smaller model always trivially encompasses it. But a small model is particularly favoured if it can explain all of the results of a larger model; this is known as parsimonious encompassing.
The advantages of the general-to-specific approach are that it is statistically sensible and also that the theory on which the models are based usually has nothing to say about the lag structure of a model. Therefore, the lag structure incorporated in the final model is largely determined by the data themselves. Furthermore, the statistical consequences from
excluding relevant variables are usually considered more serious than
those from including irrelevant variables.
The general-to-specific methodology is conducted as follows. The first step is to form a ‘large’ model with lots of variables on the RHS. This is known as a generalised unrestricted model (GUM), which should originate from financial theory, and which should contain all variables thought to influence the dependent variable. At this stage, the researcher is required to ensure that the model satisfies all of the assumptions of the CLRM. If the assumptions are violated, appropriate actions should be taken to address or allow for this, e.g. taking logs, adding lags, adding dummy variables.
It is important that the steps above are conducted prior to any hypothesis testing. It should also be noted that the diagnostic tests presented above should be cautiously interpreted as general rather than specific tests. In other words, rejection of a particular diagnostic test null hypothesis should be interpreted as showing that there is something wrong with the model. So, for example, if the RESET test or White’s test show a rejection of the null, such results should not be immediately interpreted as implying that the appropriate response is to find a solution for inappropriate functional form or heteroscedastic residuals, respectively. It is quite often the case that one problem with the model could cause several assumptions to be violated simultaneously. For example, an omitted variable could cause failures of the RESET, heteroscedasticity and autocorrelation tests. Equally, a small number of large outliers could cause non-normality and residual autocorrelation (if they occur close together in the sample) and heteroscedasticity (if the outliers occur for a narrow range of the explanatory variables). Moreover, the diagnostic tests themselves do not operate optimally in the presence of other types of misspecification since they essentially assume that the model is correctly specified in all other respects. For example, it is not clear that tests for heteroscedasticity will behave well if the residuals are autocorrelated.
Once a model that satisfies the assumptions of the CLRM has been obtained, it could be very big, with large numbers of lags and independent variables. The next stage is therefore to reparameterise the model by knocking out very insignificant regressors. Also, some coefficients may be insignificantly different from each other, so that they can be combined. At each stage, it should be checked whether the assumptions of the CLRM are still upheld. If this is the case, the researcher should have arrived at a statistically adequate empirical model that can be used for testing underlying financial theories, forecasting future values of the dependent variable, or for formulating policies.
However, needless to say, the general-to-specific approach also has its critics. For small or moderate sample sizes, it may be impractical. In such instances, the large number of explanatory variables will imply a small number of degrees of freedom. This could mean that none of the variables is significant, especially if they are highly correlated. This being the case, it would not be clear which of the original long list of candidate regressors should subsequently be dropped. Moreover, in any case the decision on which variables to drop may have profound implications for the final specification of the model. A variable whose coefficient was not significant might have become significant at a later stage if other variables had been dropped instead.
In theory, sensitivity of the final specification to the various possible paths of variable deletion should be carefully checked. However, this could imply checking many (perhaps even hundreds) of possible specifications. It could also lead to several final models, none of which appears noticeably better than the others.
The general-to-specific approach, if followed faithfully to the end, will hopefully lead to a statistically valid model that passes all of the usual model diagnostic tests and contains only statistically significant regressors. However, the final model could also be a bizarre creature that is devoid of any theoretical interpretation. There would also be more than just a passing chance that such a model could be the product of a statistically vindicated data mining exercise. Such a model would closely fit the sample of data at hand, but could fail miserably when applied to other samples if it is not based soundly on theory.
There now follows another example of the use of the classical linear regression model in finance, based on an examination of the determinants of sovereign credit ratings by Cantor and Packer (1996).
4.14 Determinants of sovereign credit ratings
4.14.1 Background
Sovereign credit ratings are an assessment of the riskiness of debt issued by governments. They embody an estimate of the probability that the borrower will default on her obligation. Two famous US ratings agencies, Moody’s and Standard and Poor’s, provide ratings for many governments. Although the two agencies use different symbols to denote the given riskiness of a particular borrower, the ratings of the two agencies are comparable. Gradings are split into two broad categories: investment grade and speculative grade. Investment grade issuers have good or adequate payment capacity, while speculative grade issuers either have a high
degree of uncertainty about whether they will make their payments, or are already in default. The highest grade offered by the agencies, for the highest quality of payment capacity, is ‘triple A’, which Moody’s denotes ‘Aaa’ and Standard and Poor’s denotes ‘AAA’. The lowest grade issued to a sovereign in the Cantor and Packer sample was B3 (Moody’s) or B− (Standard and Poor’s). Thus the number of grades of debt quality from the highest to the lowest given to governments in their sample is 16.
The central aim of Cantor and Packer’s paper is an attempt to explain and model how the agencies arrived at their ratings. Although the ratings themselves are publicly available, the models or methods used to arrive at them are shrouded in secrecy. The agencies also provide virtually no explanation as to what the relative weights of the factors that make up the rating are. Thus, a model of the determinants of sovereign credit ratings could be useful in assessing whether the ratings agencies appear to have acted rationally. Such a model could also be employed to try to predict the rating that would be awarded to a sovereign that has not previously been rated and when a re-rating is likely to occur. The paper continues, among other things, to consider whether ratings add to publicly available information, and whether it is possible to determine what factors affect how the sovereign yields react to ratings announcements.
4.14.2 Data
Cantor and Packer (1996) obtain a sample of government debt ratings for 49 countries as of September 1995 that range between the above gradings. The ratings variable is quantified, so that the highest credit quality (Aaa/AAA) in the sample is given a score of 16, while the lowest rated sovereign in the sample is given a score of 1 (B3/B−). This score forms the dependent variable. The factors that are used to explain the variability in the ratings scores are macroeconomic variables. All of these variables embody factors that are likely to influence a government’s ability and willingness to service its debt costs. Ideally, the model would also include proxies for socio-political factors, but these are difficult to measure objectively and so are not included. It is not clear in the paper from where the list of factors was drawn. The included variables (with their units of measurement) are:
●Per capita income (in 1994 thousand US dollars). Cantor and Packer argue that per capita income determines the tax base, which in turn influences the government’s ability to raise revenue.
●GDP growth (annual 1991–4 average, %). The growth rate of increase in GDP is argued to measure how much easier it will become to service debt costs in the future.
●Inflation (annual 1992–4 average, %). Cantor and Packer argue that high inflation suggests that inflationary money financing will be used to service debt when the government is unwilling or unable to raise the required revenue through the tax system.
●Fiscal balance (average annual government budget surplus as a proportion of GDP 1992–4, %). Again, a large fiscal deficit shows that the government has a relatively weak capacity to raise additional revenue and to service debt costs.
●External balance (average annual current account surplus as a proportion of GDP 1992–4, %). Cantor and Packer argue that a persistent current account deficit leads to increasing foreign indebtedness, which may be unsustainable in the long run.
●External debt (foreign currency debt as a proportion of exports in 1994, %). Reasoning as for external balance (which is the change in external debt over time).
●Dummy for economic development (=1 for a country classified by the IMF as developed, 0 otherwise). Cantor and Packer argue that credit ratings agencies perceive developing countries as relatively more risky beyond that suggested by the values of the other factors listed above.
●Dummy for default history (=1 if a country has defaulted, 0 otherwise). It is argued that countries that have previously defaulted experience a large fall in their credit rating.
The income and inflation variables are transformed to their logarithms. The model is linear and estimated using OLS. Some readers of this book who have a background in econometrics will note that strictly, OLS is not an appropriate technique when the dependent variable can take on only one of a certain limited set of values (in this case, 1, 2, 3, …, 16). In such applications, a technique such as ordered probit (not covered in this text) would usually be more appropriate. Cantor and Packer argue that any approach other than OLS is infeasible given the relatively small sample size (49), and the large number (16) of ratings categories.
The results from regressing the rating value on the variables listed above are presented in their exhibit 5, adapted and presented here as table 4.2. Four regressions are conducted, each with identical independent variables but a different dependent variable. Regressions are conducted for the rating score given by each agency separately, with results presented in columns (4) and (5) of table 4.2. Occasionally, the ratings agencies give different scores to a country – for example, in the case of Italy, Moody’s gives a rating of ‘A1’, which would generate a score of 12 on a 16-scale. Standard and Poor’s (S and P), on the other hand, gives a rating of ‘AA’,
Table 4.2 Determinants and impacts of sovereign credit ratings

                                          Dependent variable
Explanatory          Expected   Average      Moody's      S&P          Difference
variable             sign       rating       rating       rating       Moody's/S&P
(1)                  (2)        (3)          (4)          (5)          (6)

Intercept            ?           1.442        3.408       −0.524        3.932**
                                (0.663)      (1.379)      (−0.223)      (2.521)
Per capita income    +           1.242***     1.027***     1.458***    −0.431***
                                (5.302)      (4.041)      (6.048)      (−2.688)
GDP growth           +           0.151        0.130        0.171**     −0.040
                                (1.935)      (1.545)      (2.132)       (0.756)
Inflation            −          −0.611***    −0.630***    −0.591***    −0.039
                               (−2.839)     (−2.701)     (−2.671)      (−0.265)
Fiscal balance       +           0.073        0.049        0.097*      −0.048
                                (1.324)      (0.818)      (1.71)       (−1.274)
External balance     +           0.003        0.006        0.001        0.006
                                (0.314)      (0.535)      (0.046)       (0.779)
External debt        −          −0.013***    −0.015***    −0.011***    −0.004***
                               (−5.088)     (−5.365)     (−4.236)      (−2.133)
Development dummy    +           2.776***     2.957***     2.595***     0.362
                                (4.25)       (4.175)      (3.861)       (0.81)
Default dummy        −          −2.042***    −1.63**      −2.622***     1.159***
                               (−3.175)     (−2.097)     (−3.962)       (2.632)
Adjusted R²                      0.924        0.905        0.926        0.836

Notes: t-ratios in parentheses; *, ** and *** indicate significance at the 10%, 5% and 1% levels, respectively.
Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
which would score 14 on the 16-scale, two gradings higher. Thus a regression with the average score across the two agencies, and with the difference between the two scores as dependent variables, is also conducted, and presented in columns (3) and (6), respectively, of table 4.2.
4.14.3 Interpreting the models
The models are difficult to interpret in terms of their statistical adequacy, since virtually no diagnostic tests have been undertaken. The values of the adjusted R², at over 90% for each of the three ratings regressions, are high for cross-sectional regressions, indicating that the model seems able to capture almost all of the variability of the ratings about their
mean values across the sample. There does not appear to be any attempt at reparameterisation presented in the paper, so it is assumed that the authors reached this set of models after some searching.
In this particular application, the residuals have an interesting interpretation as the difference between the actual and fitted ratings. The actual ratings will be integers from 1 to 16, although the fitted values from the regression and therefore the residuals can take on any real value. Cantor and Packer argue that the model is working well as no residual is bigger than 3, so that no fitted rating is more than three categories out from the actual rating, and only four countries have residuals bigger than two categories. Furthermore, 70% of the countries have ratings predicted exactly (i.e. the residuals are less than 0.5 in absolute value).
Now, turning to interpret the models from a financial perspective, it is of interest to investigate whether the coefficients have their expected signs and sizes. The expected signs for the regression results of columns (3)–(5) are displayed in column (2) of table 4.2 (as determined by this author). As can be seen, all of the coefficients have their expected signs, although the fiscal balance and external balance variables are not significant or are only very marginally significant in all three cases. The coefficients can be interpreted as the average change in the rating score that would result from a unit change in the variable. So, for example, a rise in per capita income of $1,000 will on average increase the rating by 1.0 units according to Moody’s and 1.5 units according to Standard & Poor’s. The development dummy suggests that, on average, a developed country will have a rating three notches higher than an otherwise identical developing country. And everything else equal, a country that has defaulted in the past will have a rating two notches lower than one that has always kept its obligation.
By and large, the ratings agencies appear to place similar weights on each of the variables, as evidenced by the similar coefficients and significances across columns (4) and (5) of table 4.2. This is formally tested in column (6) of the table, where the dependent variable is the difference between Moody’s and Standard and Poor’s ratings. Only three variables are statistically significantly differently weighted by the two agencies. Standard & Poor’s places higher weights on income and default history, while Moody’s places more emphasis on external debt.
4.14.4 The relationship between ratings and yields
In this section of the paper, Cantor and Packer try to determine whether ratings have any additional information useful for modelling the cross-sectional variability of sovereign yield spreads over and above that contained in publicly available macroeconomic data. The dependent variable
Table 4.3 Do ratings add to public information?

Dependent variable: ln (yield spread)
Variable             Expected sign    (1)            (2)           (3)

Intercept            ?                 2.105***       0.466         0.074
                                      (16.148)       (0.345)       (0.071)
Average rating       −                −0.221***                    −0.218***
                                     (−19.175)                    (−4.276)
Per capita income    −                               −0.144         0.226
                                                    (−0.927)       (1.523)
GDP growth           −                               −0.004         0.029
                                                    (−0.142)       (1.227)
Inflation            +                                0.108        −0.004
                                                     (1.393)      (−0.068)
Fiscal balance       −                               −0.037        −0.02
                                                    (−1.557)      (−1.045)
External balance     −                               −0.038        −0.023
                                                    (−1.29)       (−1.008)
External debt        +                                0.003***      0.000
                                                     (2.651)       (0.095)
Development dummy    −                               −0.723***     −0.38
                                                    (−2.059)      (−1.341)
Default dummy        +                                0.612***      0.085
                                                     (2.577)       (0.385)
Adjusted R²                            0.919          0.857         0.914

Notes: t-ratios in parentheses; *, ** and *** indicate significance at the 10%, 5% and 1% levels, respectively.
Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
is now the log of the yield spread, i.e.

ln(Yield on the sovereign bond − Yield on a US Treasury Bond)

One may argue that such a measure of the spread is imprecise, for the true credit spread should be defined by the entire credit quality curve rather than by just two points on it. However, leaving this issue aside, the results are presented in table 4.3.
Three regressions are presented in table 4.3, denoted specifications (1), (2) and (3). The first of these is a regression of the ln(spread) on only a constant and the average rating (column (1)), and this shows that ratings have a highly significant inverse impact on the spread. Specification (2)
is a regression of the ln(spread) on the macroeconomic variables used in the previous analysis. The expected signs are given (as determined by this author) in column (2). As can be seen, all coefficients have their expected signs, although now only the coefficients belonging to the external debt and the two dummy variables are statistically significant. Specification (3) is a regression on both the average rating and the macroeconomic variables. When the rating is included with the macroeconomic factors, none of the latter is any longer significant – only the rating coefficient is statistically significantly different from zero. This message is also portrayed by the adjusted R² values, which are highest for the regression containing only the rating, and slightly lower for the regression containing the macroeconomic variables and the rating. One may also observe that, under specification (3), the coefficients on the per capita income, GDP growth and inflation variables now have the wrong sign. This is, in fact, never really an issue, for if a coefficient is not statistically significant, it is indistinguishable from zero in the context of hypothesis testing, and therefore it does not matter whether it is actually insignificant and positive or insignificant and negative. Only coefficients that are both of the wrong sign and statistically significant imply that there is a problem with the regression.
It would thus be concluded from this part of the paper that there is no more incremental information in the publicly available macroeconomic variables that is useful for predicting the yield spread than that embodied in the rating. The information contained in the ratings encompasses that contained in the macroeconomic variables.
4.14.5 What determines how the market reacts to ratings announcements?
Cantor and Packer also consider whether it is possible to build a model to predict how the market will react to ratings announcements, in terms of the resulting change in the yield spread. The dependent variable for this set of regressions is now the change in the log of the relative spread, i.e. log[(yield − treasury yield)/treasury yield], over a two-day period at the time of the announcement. The sample employed for estimation comprises every announcement of a ratings change that occurred between 1987 and 1994; 79 such announcements were made, spread over 18 countries. Of these, 39 were actual ratings changes by one or more of the agencies, and 40 were listed as likely in the near future to experience a regrading. Moody’s calls this a ‘watchlist’, while Standard and Poor’s term it their ‘outlook’ list. The explanatory variables are mainly dummy variables for:
●whether the announcement was positive – i.e. an upgrade
●whether there was an actual ratings change or just listing for probable
regrading
●whether the bond was speculative grade or investment grade
●whether there had been another ratings announcement in the previous 60 days
●the ratings gap between the announcing and the other agency.
The following cardinal variable was also employed:
●the change in the spread over the previous 60 days.
The results are presented in table 4.4, but in this text, only the final specification (numbered 5 in Cantor and Packer’s exhibit 11) containing all of the variables described above is included.
As can be seen from table 4.4, the models appear to do a relatively poor job of explaining how the market will react to ratings announcements. The adjusted R² value is only 12%, and this is the highest of the five
Table 4.4 What determines reactions to ratings announcements?

Dependent variable: log relative spread
Independent variable                                       Coefficient (t-ratio)

Intercept                                                  −0.02      (−1.4)
Positive announcements                                      0.01       (0.34)
Ratings changes                                            −0.01      (−0.37)
Moody's announcements                                       0.02       (1.51)
Speculative grade                                           0.03**     (2.33)
Change in relative spreads from day −60 to day −1          −0.06      (−1.1)
Rating gap                                                  0.03*      (1.7)
Other rating announcements from day −60 to day −1           0.05**     (2.15)
Adjusted R²                                                 0.12

Note: * and ** denote significance at the 10% and 5% levels, respectively.
Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.
specifications tested by the authors. Further, only two variables are significant and one marginally significant of the seven employed in the model. It can therefore be stated that yield changes are significantly higher following a ratings announcement for speculative than investment grade bonds, and that ratings changes have a bigger impact on yield spreads if there is an agreement between the ratings agencies at the time the announcement is made. Further, yields change significantly more if there has been a previous announcement in the past 60 days than if not. On the other hand, neither whether the announcement is an upgrade or downgrade, nor whether it is an actual ratings change or a name on the watchlist, nor whether the announcement is made by Moody’s or Standard & Poor’s, nor the amount by which the relative spread has already changed over the past 60 days, has any significant impact on how the market reacts to ratings announcements.
4.14.6 Conclusions
●To summarise, six factors appear to play a big role in determining sovereign credit ratings – incomes, GDP growth, inflation, external debt, industrialised or not and default history
●The ratings provide more information on yields than all of the macroeconomic factors put together
●One cannot determine with any degree of confidence what factors determine how the markets will react to ratings announcements.
Key concepts
The key terms to be able to define and explain from this chapter are
●homoscedasticity            ●heteroscedasticity
●autocorrelation             ●dynamic model
●equilibrium solution        ●robust standard errors
●skewness                    ●kurtosis
●outlier                     ●functional form
●multicollinearity           ●omitted variable
●irrelevant variable         ●parameter stability
●recursive least squares     ●general-to-specific approach
Review questions
1. Are assumptions made concerning the unobservable error terms (u_t) or about their sample counterparts, the estimated residuals (û_t)? Explain your answer.
2. What pattern(s) would one like to see in a residual plot and why?
3. A researcher estimates the following model for stock market returns, but thinks that there may be a problem with it. By calculating the t-ratios, and considering their significance and by examining the value of R² or otherwise, suggest what the problem might be.

ŷ_t = 0.638 + 0.402 x_2t − 0.891 x_3t     R² = 0.96, R̄² = 0.89     (4.75)
      (0.436)  (0.291)     (0.763)

How might you go about solving the perceived problem?
4. (a) State in algebraic notation and explain the assumption about the CLRM’s disturbances that is referred to by the term ‘homoscedasticity’.
(b) What would the consequence be for a regression model if the
errors were not homoscedastic?
(c) How might you proceed if you found that (b) were actually the case?
5. (a) What do you understand by the term ‘autocorrelation’?
(b) An econometrician suspects that the residuals of her model might be autocorrelated. Explain the steps involved in testing this theory using the Durbin–Watson (DW) test.
(c) The econometrician follows your guidance (!!!) in part (b) and calculates a value for the Durbin–Watson statistic of 0.95. The regression has 60 quarterly observations and three explanatory variables (plus a constant term). Perform the test. What is your conclusion?
(d) In order to allow for autocorrelation, the econometrician decides to use a model in first differences with a constant

Δy_t = β_1 + β_2 Δx_2t + β_3 Δx_3t + β_4 Δx_4t + u_t     (4.76)

By attempting to calculate the long-run solution to this model, explain what might be a problem with estimating models entirely in first differences.
(e) The econometrician finally settles on a model with both first differences and lagged levels terms of the variables

Δy_t = β_1 + β_2 Δx_2t + β_3 Δx_3t + β_4 Δx_4t + β_5 x_2t−1 + β_6 x_3t−1 + β_7 x_4t−1 + v_t     (4.77)

Can the Durbin–Watson test still validly be used in this case?
6. Calculate the long-run static equilibrium solution to the following dynamic econometric model

Δy_t = β_1 + β_2 Δx_2t + β_3 Δx_3t + β_4 y_t−1 + β_5 x_2t−1 + β_6 x_3t−1 + β_7 x_3t−4 + u_t     (4.78)
7. What might Ramsey’s RESET test be used for? What could be done if it
were found that the RESET test has been failed?
8. (a) Why is it necessary to assume that the disturbances of a
regression model are normally distributed?
(b) In a practical econometric modelling situation, how might the problem that the residuals are not normally distributed be addressed?
9. (a) Explain the term ‘parameter structural stability’?
(b) A financial econometrician thinks that the stock market crash of October 1987 fundamentally changed the risk–return relationship given by the CAPM equation. He decides to test this hypothesis using a Chow test. The model is estimated using monthly data from January 1981–December 1995, and then two separate regressions are run for the sub-periods corresponding to data before and after the crash. The model is

r_t = α + β R_mt + u_t     (4.79)

so that the excess return on a security at time t is regressed upon the excess return on a proxy for the market portfolio at time t. The results for the three models estimated for shares in British Airways (BA) are as follows:

1981M1–1995M12
r_t = 0.0215 + 1.491 r_mt     RSS = 0.189, T = 180     (4.80)

1981M1–1987M10
r_t = 0.0163 + 1.308 r_mt     RSS = 0.079, T = 82     (4.81)

1987M11–1995M12
r_t = 0.0360 + 1.613 r_mt     RSS = 0.082, T = 98     (4.82)
(c) What are the null and alternative hypotheses that are being tested here, in terms of α and β?
(d) Perform the test. What is your conclusion?
10. For the same model as above, and given the following results, do a forward and backward predictive failure test:

1981M1–1995M12
r_t = 0.0215 + 1.491 r_mt     RSS = 0.189, T = 180     (4.83)

1981M1–1994M12
r_t = 0.0212 + 1.478 r_mt     RSS = 0.148, T = 168     (4.84)

1982M1–1995M12
r_t = 0.0217 + 1.523 r_mt     RSS = 0.182, T = 168     (4.85)

What is your conclusion?
11. Why is it desirable to remove insignificant variables from a regression?
12. Explain why it is not possible to include an outlier dummy variable in a regression model when you are conducting a Chow test for parameter stability. Will the same problem arise if you were to conduct a predictive failure test? Why or why not?
13. Re-open the ‘macro.wf1’ and apply the stepwise procedure including all of the explanatory variables as listed above, i.e. ersandp dprod dcredit dinflation dmoney dspread rterm, with a strict 5% threshold criterion for inclusion in the model. Then examine the resulting model both financially and statistically by investigating the signs, sizes and significances of the parameter estimates and by conducting all of the diagnostic tests for model adequacy.
5
Univariate time series modelling and forecasting
Learning Outcomes
In this chapter, you will learn how to
●Explain the defining characteristics of various types of stochastic processes
●Identify the appropriate time series model for a given data series
●Produce forecasts for ARMA and exponential smoothing models
●Evaluate the accuracy of predictions using various metrics
●Estimate time series models and produce forecasts from them in EViews
5.1 Introduction
Univariate time series models are a class of specifications where one attempts to model and to predict financial variables using only information contained in their own past values and possibly current and past values of an error term. This practice can be contrasted with structural models, which are multivariate in nature, and attempt to explain changes in a variable by reference to the movements in the current or past values of other (explanatory) variables. Time series models are usually a-theoretical, implying that their construction and use is not based upon any underlying theoretical model of the behaviour of a variable. Instead, time series models are an attempt to capture empirically relevant features of the observed data that may have arisen from a variety of different (but unspecified) structural models. An important class of time series models is the family of AutoRegressive Integrated Moving Average (ARIMA) models, usually associated with Box and Jenkins (1976). Time series models may be useful
when a structural model is inappropriate. For example, suppose that there is some variable y_t whose movements a researcher wishes to explain. It may be that the variables thought to drive movements of y_t are not observable or not measurable, or that these forcing variables are measured at a lower frequency of observation than y_t. For example, y_t might be a series of daily stock returns, where possible explanatory variables could be macroeconomic indicators that are available monthly. Additionally, as will be examined later in this chapter, structural models are often not useful for out-of-sample forecasting. These observations motivate the consideration of pure time series models, which are the focus of this chapter.
The approach adopted for this topic is as follows. In order to define, estimate and use ARIMA models, one first needs to specify the notation and to define several important concepts. The chapter will then consider the properties and characteristics of a number of specific models from the ARIMA family. The book endeavours to answer the following question: ‘For a specified time series model with given parameter values, what will be its defining characteristics?’ Following this, the problem will be reversed, so that the reverse question is asked: ‘Given a set of data, with characteristics that have been determined, what is a plausible model to describe that data?’
5.2 Some notation and concepts
The following sub-sections define and describe several important concepts in time series analysis. Each will be elucidated and drawn upon later in the chapter. The first of these concepts is the notion of whether a series is stationary or not. Determining whether a series is stationary or not is very important, for the stationarity or otherwise of a series can strongly influence its behaviour and properties. Further detailed discussion of stationarity, testing for it, and implications of it not being present, are covered in chapter 7.
5.2.1 A strictly stationary process
A strictly stationary process is one where, for any t_1, t_2, …, t_T ∈ Z, any k ∈ Z and T = 1, 2, …

F_{y_t1, y_t2, …, y_tT}(y_1, …, y_T) = F_{y_t1+k, y_t2+k, …, y_tT+k}(y_1, …, y_T)     (5.1)

where F denotes the joint distribution function of the set of random variables (Tong, 1990, p. 3). It can also be stated that the probability measure for the sequence {y_t} is the same as that for {y_t+k} ∀ k (where ‘∀ k’ means ‘for all values of k’). In other words, a series is strictly stationary if the distribution of its values remains the same as time progresses, implying that the probability that y falls within a particular interval is the same now as at any time in the past or the future.
5.2.2 A weakly stationary process
If a series satisfies (5.2)–(5.4) for t = 1, 2, …, ∞, it is said to be weakly or covariance stationary

(1) E(y_t) = μ     (5.2)
(2) E(y_t − μ)(y_t − μ) = σ² < ∞     (5.3)
(3) E(y_t1 − μ)(y_t2 − μ) = γ_{t2−t1}   ∀ t_1, t_2     (5.4)
These three equations state that a stationary process should have a constant mean, a constant variance and a constant autocovariance structure, respectively. Definitions of the mean and variance of a random variable are probably well known to readers, but the autocovariances may not be.
The autocovariances determine how y is related to its previous values, and for a stationary series they depend only on the difference between t_1 and t_2, so that the covariance between y_t and y_t−1 is the same as the covariance between y_t−10 and y_t−11, etc. The moment

E(y_t − E(y_t))(y_t−s − E(y_t−s)) = γ_s,   s = 0, 1, 2, …     (5.5)

is known as the autocovariance function. When s = 0, the autocovariance at lag zero is obtained, which is the autocovariance of y_t with y_t, i.e. the variance of y. These covariances, γ_s, are also known as autocovariances since they are the covariances of y with its own previous values. The autocovariances are not a particularly useful measure of the relationship between y and its previous values, however, since the values of the autocovariances depend on the units of measurement of y_t, and hence the values that they take have no immediate interpretation.
It is thus more convenient to use the autocorrelations, which are the autocovariances normalised by dividing by the variance

τ_s = γ_s/γ_0,   s = 0, 1, 2, …     (5.6)

The series τ_s now has the standard property of correlation coefficients that the values are bounded to lie between ±1. In the case that s = 0, the autocorrelation at lag zero is obtained, i.e. the correlation of y_t with y_t, which is of course 1. If τ_s is plotted against s = 0, 1, 2, …, a graph known as the autocorrelation function (acf) or correlogram is obtained.
5.2.3 A white noise process
Roughly speaking, a white noise process is one with no discernible structure. A definition of a white noise process is

E(y_t) = μ     (5.7)
var(y_t) = σ²     (5.8)
γ_{t−r} = σ² if t = r, and 0 otherwise     (5.9)

Thus a white noise process has constant mean and variance, and zero autocovariances, except at lag zero. Another way to state this last condition would be to say that each observation is uncorrelated with all other values in the sequence. Hence the autocorrelation function for a white noise process will be zero apart from a single peak of 1 at s = 0. If μ = 0, and the three conditions hold, the process is known as zero mean white noise.
If it is further assumed that y_t is distributed normally, then the sample autocorrelation coefficients are also approximately normally distributed

τ̂_s ∼ approx. N(0, 1/T)

where T is the sample size, and τ̂_s denotes the autocorrelation coefficient at lag s estimated from a sample. This result can be used to conduct significance tests for the autocorrelation coefficients by constructing a non-rejection region (like a confidence interval) for an estimated autocorrelation coefficient to determine whether it is significantly different from zero. For example, a 95% non-rejection region would be given by

±1.96 × 1/√T

for s ≠ 0. If the sample autocorrelation coefficient, τ̂_s, falls outside this region for a given value of s, then the null hypothesis that the true value of the coefficient at that lag s is zero is rejected.
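As a quick illustration (not part of the original text), the sample acf and the ±1.96/√T non-rejection bounds can be computed in a few lines; the white noise series below is simulated purely for demonstration.

# Sketch: sample autocorrelation function with white-noise bounds.
import numpy as np

def sample_acf(y, max_lag):
    y = np.asarray(y, dtype=float)
    y_dem = y - y.mean()
    gamma0 = np.mean(y_dem ** 2)                     # autocovariance at lag 0
    return np.array([np.mean(y_dem[s:] * y_dem[:len(y) - s]) / gamma0
                     for s in range(1, max_lag + 1)])

rng = np.random.default_rng(3)
y = rng.normal(size=100)                             # simulated white noise
acf_hat = sample_acf(y, max_lag=5)
bound = 1.96 / np.sqrt(len(y))                       # 95% non-rejection bound
for s, tau in enumerate(acf_hat, start=1):
    flag = "significant" if abs(tau) > bound else "insignificant"
    print(f"lag {s}: {tau:+.3f} ({flag}, bound ±{bound:.3f})")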
It is also possible to test the joint hypothesis that all m of the τ_k correlation coefficients are simultaneously equal to zero using the Q-statistic developed by Box and Pierce (1970)

Q = T Σ_{k=1}^{m} τ̂_k²     (5.10)

where T = sample size, m = maximum lag length.
The correlation coefficients are squared so that the positive and negative coefficients do not cancel each other out. Since the sum of squares of independent standard normal variates is itself a χ² variate with degrees
of freedom equal to the number of squares in the sum, it can be stated that the Q-statistic is asymptotically distributed as a χ²_m under the null hypothesis that all m autocorrelation coefficients are zero. As for any joint hypothesis test, only one autocorrelation coefficient needs to be statistically significant for the test to result in a rejection.
However, the Box–Pierce test has poor small sample properties, implying that it leads to the wrong decision too frequently for small samples. A variant of the Box–Pierce test, having better small sample properties, has been developed. The modified statistic is known as the Ljung–Box (1978) statistic

Q* = T(T + 2) Σ_{k=1}^{m} τ̂_k²/(T − k)  ∼  χ²_m     (5.11)

It should be clear from the form of the statistic that asymptotically (that is, as the sample size increases towards infinity), the (T + 2) and (T − k) terms in the Ljung–Box formulation will cancel out, so that the statistic is equivalent to the Box–Pierce test. This statistic is very useful as a portmanteau (general) test of linear dependence in time series.
Example 5.1
Suppose that a researcher had estimated the first five autocorrelation coefficients using a series of length 100 observations, and found them to be

Lag                             1        2        3        4        5
Autocorrelation coefficient    0.207   −0.013    0.086    0.005   −0.022

Test each of the individual correlation coefficients for significance, and test all five jointly using the Box–Pierce and Ljung–Box tests.
A 95% confidence interval can be constructed for each coefficient using

±1.96 × 1/√T

where T = 100 in this case. The decision rule is thus to reject the null hypothesis that a given coefficient is zero in the cases where the coefficient lies outside the range (−0.196, +0.196). For this example, it would be concluded that only the first autocorrelation coefficient is significantly different from zero at the 5% level.
Now, turning to the joint tests, the null hypothesis is that all of the first five autocorrelation coefficients are jointly zero, i.e.

H_0: τ_1 = 0, τ_2 = 0, τ_3 = 0, τ_4 = 0, τ_5 = 0
The test statistics for the Box–Pierce and Ljung–Box tests are given respec-
tively as
Q=100×(0.2072+−0.0132+0.0862+0.0052+−0.0222)
=5.09 (5.12)
Q∗=100×102×/parenleftbigg0.2072
100−1+−0.0132
100−2+0.0862
100−3
+0.0052
100−4+−0.0222
100−5/parenrightbigg
=5.26 (5.13)
The relevant critical values are from a χ2distribution with 5 degrees of
freedom, which are 11.1 at the 5% level, and 15.1 at the 1% level. Clearly,in both cases, the joint null hypothesis that all of the first five autocorre-lation coefficients are zero cannot be rejected. Note that, in this instance,the individual test caused a rejection while the joint test did not. This is anunexpected result that may have arisen as a result of the low power of thejoint test when four of the five individual autocorrelation coefficients areinsignificant. Thus the effect of the significant autocorrelation coefficientis diluted in the joint test by the insignificant coefficients. The sample sizeused in this example is also modest relative to those commonly availablein finance.
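As an arithmetic check, the two statistics in this example can be reproduced with a few lines of Python (a minimal sketch assuming only that numpy is available; the taus array simply holds the five estimated coefficients above):

    import numpy as np

    taus = np.array([0.207, -0.013, 0.086, 0.005, -0.022])   # estimated autocorrelations
    T = 100                                                   # sample size
    m = len(taus)                                             # maximum lag length

    # Box-Pierce statistic, equation (5.10): Q = T * sum of squared autocorrelations
    Q = T * np.sum(taus**2)

    # Ljung-Box statistic, equation (5.11): Q* = T(T+2) * sum of tau_k^2 / (T - k)
    lags = np.arange(1, m + 1)
    Q_star = T * (T + 2) * np.sum(taus**2 / (T - lags))

    print(round(Q, 2), round(Q_star, 2))   # 5.09 and 5.26

Both values fall well short of the χ²(5) critical value of 11.1, in line with the conclusion above.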
5.3 Moving average processes
The simplest class of time series model that one could entertain is that of the moving average process. Let u_t (t = 1, 2, 3, . . .) be a white noise process with E(u_t) = 0 and var(u_t) = σ². Then

y_t = \mu + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}            (5.14)

is a qth order moving average model, denoted MA(q). This can be expressed using sigma notation as

y_t = \mu + \sum_{i=1}^{q} \theta_i u_{t-i} + u_t            (5.15)

A moving average model is simply a linear combination of white noise processes, so that y_t depends on the current and previous values of a white noise disturbance term. Equation (5.15) will later have to be manipulated, and such manipulation is most easily achieved by introducing the lag operator notation. This would be written L y_t = y_{t-1} to denote that y_t is lagged once. In order to show that the ith lag of y_t is being taken (that is, the value that y_t took i periods ago), the notation would be L^i y_t = y_{t-i}. Note that in some books and studies, the lag operator is referred to as the 'backshift operator', denoted by B. Using the lag operator notation, (5.15) would be written as

y_t = \mu + \sum_{i=1}^{q} \theta_i L^i u_t + u_t            (5.16)

or as

y_t = \mu + \theta(L) u_t            (5.17)

where \theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q.
In much of what follows, the constant (μ) is dropped from the equations. Removing μ considerably eases the complexity of the algebra involved, and this can be done without loss of generality. To see this, consider a sample of observations on a series, z_t, that has a mean z̄. A zero-mean series, y_t, can be constructed by simply subtracting z̄ from each observation z_t.
The distinguishing properties of the moving average process of order q given above are

(1)  E(y_t) = \mu            (5.18)

(2)  var(y_t) = \gamma_0 = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2            (5.19)

(3)  covariances

\gamma_s = \begin{cases} (\theta_s + \theta_{s+1}\theta_1 + \theta_{s+2}\theta_2 + \cdots + \theta_q\theta_{q-s})\sigma^2 & \text{for } s = 1, 2, \ldots, q \\ 0 & \text{for } s > q \end{cases}            (5.20)

So, a moving average process has constant mean, constant variance, and autocovariances which may be non-zero to lag q and will always be zero thereafter. Each of these results will be derived below.
Example 5.2
Consider the following MA(2) process

y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2}            (5.21)

where u_t is a zero mean white noise process with variance σ².

(1) Calculate the mean and variance of y_t.
(2) Derive the autocorrelation function for this process (i.e. express the autocorrelations, τ_1, τ_2, . . . as functions of the parameters θ_1 and θ_2).
(3) If θ_1 = −0.5 and θ_2 = 0.25, sketch the acf of y_t.
Solution
(1) If E(u_t) = 0, then E(u_{t-i}) = 0 ∀ i            (5.22)

So the expected value of the error term is zero for all time periods. Taking expectations of both sides of (5.21) gives

E(y_t) = E(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2}) = E(u_t) + \theta_1 E(u_{t-1}) + \theta_2 E(u_{t-2}) = 0            (5.23)

var(y_t) = E[y_t - E(y_t)][y_t - E(y_t)]            (5.24)

but E(y_t) = 0, so that the last component in each set of square brackets in (5.24) is zero and this reduces to

var(y_t) = E[(y_t)(y_t)]            (5.25)

Replacing y_t in (5.25) with the RHS of (5.21)

var(y_t) = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})]            (5.26)

var(y_t) = E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2 + \text{cross-products}]            (5.27)

But E[cross-products] = 0 since cov(u_t, u_{t-s}) = 0 for s ≠ 0. 'Cross-products' is thus a catch-all expression for all of the terms in u which have different time subscripts, such as u_{t-1}u_{t-2} or u_{t-5}u_{t-20}, etc. Again, one does not need to worry about these cross-product terms, since these are effectively the autocovariances of u_t, which will all be zero by definition since u_t is a random error process, which will have zero autocovariances (except at lag zero). So

var(y_t) = \gamma_0 = E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2]            (5.28)

var(y_t) = \gamma_0 = \sigma^2 + \theta_1^2\sigma^2 + \theta_2^2\sigma^2            (5.29)

var(y_t) = \gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma^2            (5.30)

γ_0 can also be interpreted as the autocovariance at lag zero.
(2) Calculating now the acf of y_t, first determine the autocovariances and then the autocorrelations by dividing the autocovariances by the variance.
The autocovariance at lag 1 is given by

\gamma_1 = E[y_t - E(y_t)][y_{t-1} - E(y_{t-1})]            (5.31)

\gamma_1 = E[y_t][y_{t-1}]            (5.32)

\gamma_1 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-1} + \theta_1 u_{t-2} + \theta_2 u_{t-3})]            (5.33)

Again, ignoring the cross-products, (5.33) can be written as

\gamma_1 = E[\theta_1 u_{t-1}^2 + \theta_1\theta_2 u_{t-2}^2]            (5.34)

\gamma_1 = \theta_1\sigma^2 + \theta_1\theta_2\sigma^2            (5.35)

\gamma_1 = (\theta_1 + \theta_1\theta_2)\sigma^2            (5.36)

The autocovariance at lag 2 is given by

\gamma_2 = E[y_t - E(y_t)][y_{t-2} - E(y_{t-2})]            (5.37)

\gamma_2 = E[y_t][y_{t-2}]            (5.38)

\gamma_2 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-2} + \theta_1 u_{t-3} + \theta_2 u_{t-4})]            (5.39)

\gamma_2 = E[\theta_2 u_{t-2}^2]            (5.40)

\gamma_2 = \theta_2\sigma^2            (5.41)

The autocovariance at lag 3 is given by

\gamma_3 = E[y_t - E(y_t)][y_{t-3} - E(y_{t-3})]            (5.42)

\gamma_3 = E[y_t][y_{t-3}]            (5.43)

\gamma_3 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-3} + \theta_1 u_{t-4} + \theta_2 u_{t-5})]            (5.44)

\gamma_3 = 0            (5.45)

So γ_s = 0 for s > 2. All autocovariances for the MA(2) process will be zero for any lag length, s, greater than 2.
The autocorrelation at lag 0 is given by

\tau_0 = \frac{\gamma_0}{\gamma_0} = 1            (5.46)

The autocorrelation at lag 1 is given by

\tau_1 = \frac{\gamma_1}{\gamma_0} = \frac{(\theta_1 + \theta_1\theta_2)\sigma^2}{(1 + \theta_1^2 + \theta_2^2)\sigma^2} = \frac{\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}            (5.47)

The autocorrelation at lag 2 is given by

\tau_2 = \frac{\gamma_2}{\gamma_0} = \frac{\theta_2\sigma^2}{(1 + \theta_1^2 + \theta_2^2)\sigma^2} = \frac{\theta_2}{1 + \theta_1^2 + \theta_2^2}            (5.48)

The autocorrelation at lag 3 is given by

\tau_3 = \frac{\gamma_3}{\gamma_0} = 0            (5.49)

The autocorrelation at lag s is given by

\tau_s = \frac{\gamma_s}{\gamma_0} = 0 \quad \forall\ s > 2            (5.50)
Figure 5.1 Autocorrelation function for sample MA(2) process
(3) For θ_1 = −0.5 and θ_2 = 0.25, substituting these into the formulae above gives the first two autocorrelation coefficients as τ_1 = −0.476 and τ_2 = 0.190. Autocorrelation coefficients for lags greater than 2 will all be zero for an MA(2) model. Thus the acf plot will appear as in figure 5.1.
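These figures (and figure 5.1) are easy to verify numerically: the sketch below computes the theoretical acf from (5.47)–(5.48) and, as an optional check, compares it with the sample acf of a long simulated MA(2) series (numpy and statsmodels are assumed to be installed).

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess
    from statsmodels.tsa.stattools import acf

    theta1, theta2 = -0.5, 0.25

    # Theoretical autocorrelations for the MA(2), from equations (5.47) and (5.48)
    denom = 1 + theta1**2 + theta2**2
    tau1 = (theta1 + theta1 * theta2) / denom      # approximately -0.476
    tau2 = theta2 / denom                          # approximately  0.190

    # Simulated check: the ma argument lists the MA lag polynomial 1 + theta1*L + theta2*L^2
    y = ArmaProcess(ar=[1], ma=[1, theta1, theta2]).generate_sample(nsample=100_000)

    print(round(tau1, 3), round(tau2, 3))
    print(np.round(acf(y, nlags=3)[1:], 3))   # sample acf at lags 1-3; lag 3 should be near zero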
5.4 Autoregressive processes
An autoregressive model is one where the current value of a variable, y, depends upon only the values that the variable took in previous periods plus an error term. An autoregressive model of order p, denoted as AR(p), can be expressed as

y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + u_t            (5.51)

where u_t is a white noise disturbance term. A manipulation of expression (5.51) will be required to demonstrate the properties of an autoregressive model. This expression can be written more compactly using sigma notation

y_t = \mu + \sum_{i=1}^{p} \phi_i y_{t-i} + u_t            (5.52)

or, using the lag operator, as

y_t = \mu + \sum_{i=1}^{p} \phi_i L^i y_t + u_t            (5.53)

or

\phi(L) y_t = \mu + u_t            (5.54)

where \phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p.
5.4.1 The stationarity condition
Stationarity is a desirable property of an estimated AR model, for several reasons. One important reason is that a model whose coefficients are non-stationary will exhibit the unfortunate property that previous values of the error term will have a non-declining effect on the current value of y_t as time progresses. This is arguably counter-intuitive and empirically implausible in many cases. More discussion on this issue will be presented in chapter 7. Box 5.1 defines the stationarity condition algebraically.

Box 5.1 The stationarity condition for an AR(p) model
Setting μ to zero in (5.54), for a zero mean AR(p) process, y_t, given by

\phi(L) y_t = u_t            (5.55)

it would be stated that the process is stationary if it is possible to write

y_t = \phi(L)^{-1} u_t            (5.56)

with the coefficients of φ(L)^{-1} converging to zero. This means that the autocorrelations will decline eventually as the lag length is increased. When the expansion φ(L)^{-1} is calculated, it will contain an infinite number of terms, and can be written as an MA(∞), e.g. a_1 u_{t-1} + a_2 u_{t-2} + a_3 u_{t-3} + · · · + u_t. If the process given by (5.54) is stationary, the coefficients in the MA(∞) representation will decline eventually with lag length. On the other hand, if the process is non-stationary, the coefficients in the MA(∞) representation would not converge to zero as the lag length increases.
The condition for testing for the stationarity of a general AR(p) model is that the roots of the 'characteristic equation'

1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0            (5.57)

all lie outside the unit circle. The characteristic equation is so-called because its roots determine the characteristics of the process y_t – for example, the acf for an AR process will depend on the roots of this characteristic equation, which is a polynomial in z.
Example 5.3
Is the following model stationary?

y_t = y_{t-1} + u_t            (5.58)

In order to test this, first write y_{t-1} in lag operator notation (i.e. as L y_t), take this term over to the LHS of (5.58), and factorise

y_t = L y_t + u_t            (5.59)

y_t - L y_t = u_t            (5.60)

y_t(1 - L) = u_t            (5.61)

Then the characteristic equation is

1 - z = 0            (5.62)

having the root z = 1, which lies on, not outside, the unit circle. In fact, the particular AR(p) model given by (5.58) is a non-stationary process known as a random walk (see chapter 7).
This procedure can also be adopted for autoregressive models with longer lag lengths and where the stationarity or otherwise of the process is less obvious. For example, is the following process for y_t stationary?

y_t = 3y_{t-1} - 2.75y_{t-2} + 0.75y_{t-3} + u_t            (5.63)

Again, the first stage is to express this equation using the lag operator notation, and then take all the terms in y over to the LHS

y_t = 3L y_t - 2.75L^2 y_t + 0.75L^3 y_t + u_t            (5.64)

(1 - 3L + 2.75L^2 - 0.75L^3) y_t = u_t            (5.65)

The characteristic equation is

1 - 3z + 2.75z^2 - 0.75z^3 = 0            (5.66)

which fortunately factorises to

(1 - z)(1 - 1.5z)(1 - 0.5z) = 0            (5.67)

so that the roots are z = 1, z = 2/3 and z = 2. Only one of these lies outside the unit circle and hence the process for y_t described by (5.63) is not stationary.
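When the characteristic polynomial does not factorise so conveniently, its roots can be found numerically. The following sketch (numpy assumed) checks the process in (5.63):

    import numpy as np

    # Characteristic equation 1 - 3z + 2.75z^2 - 0.75z^3 = 0;
    # numpy.roots expects the coefficients ordered from the highest power downwards
    roots = np.roots([-0.75, 2.75, -3.0, 1.0])
    print(roots)                             # the roots 2, 1 and 2/3, in some order

    # Stationarity requires every root to lie strictly outside the unit circle
    print(all(abs(r) > 1 for r in roots))    # False, so the process is non-stationary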
5.4.2 Wold’s decomposition theorem
Wold's decomposition theorem states that any stationary series can be decomposed into the sum of two unrelated processes, a purely deterministic part and a purely stochastic part, which will be an MA(∞). A simpler way of stating this in the context of AR modelling is that any stationary autoregressive process of order p with no constant and no other terms can be expressed as an infinite order moving average model. This result is important for deriving the autocorrelation function for an autoregressive process.
For the AR(p) model, given in, for example, (5.51) (with μ set to zero for simplicity) and expressed using the lag polynomial notation, φ(L)y_t = u_t, the Wold decomposition is

y_t = \psi(L) u_t            (5.68)

where \psi(L) = \phi(L)^{-1} = (1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)^{-1}.

The characteristics of an autoregressive process are as follows. The (unconditional) mean of y is given by

E(y_t) = \frac{\mu}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}            (5.69)

The autocovariances and autocorrelation functions can be obtained by solving a set of simultaneous equations known as the Yule–Walker equations. The Yule–Walker equations express the correlogram (the τs) as a function of the autoregressive coefficients (the φs)

\tau_1 = \phi_1 + \tau_1\phi_2 + \cdots + \tau_{p-1}\phi_p
\tau_2 = \tau_1\phi_1 + \phi_2 + \cdots + \tau_{p-2}\phi_p
\vdots            (5.70)
\tau_p = \tau_{p-1}\phi_1 + \tau_{p-2}\phi_2 + \cdots + \phi_p

For any AR model that is stationary, the autocorrelation function will decay geometrically to zero.¹ These characteristics of an autoregressive process will be derived from first principles below using an illustrative example.
Example 5.4
Consider the following simple AR(1) model

y_t = \mu + \phi_1 y_{t-1} + u_t            (5.71)

(i) Calculate the (unconditional) mean of y_t. For the remainder of the question, set the constant to zero (μ = 0) for simplicity.
(ii) Calculate the (unconditional) variance of y_t.
(iii) Derive the autocorrelation function for this process.

¹ Note that the τs will not follow an exact geometric sequence, but rather the absolute value of the τs is bounded by a geometric series. This means that the autocorrelation function does not have to be monotonically decreasing and may change sign.
Solution
(i) The unconditional mean will be given by the expected value of expression (5.71)

E(y_t) = E(\mu + \phi_1 y_{t-1})            (5.72)

E(y_t) = \mu + \phi_1 E(y_{t-1})            (5.73)

But also

y_{t-1} = \mu + \phi_1 y_{t-2} + u_{t-1}            (5.74)

So, replacing y_{t-1} in (5.73) with the RHS of (5.74)

E(y_t) = \mu + \phi_1(\mu + \phi_1 E(y_{t-2}))            (5.75)

E(y_t) = \mu + \phi_1\mu + \phi_1^2 E(y_{t-2})            (5.76)

Lagging (5.74) by a further one period

y_{t-2} = \mu + \phi_1 y_{t-3} + u_{t-2}            (5.77)

Repeating the steps given above one more time

E(y_t) = \mu + \phi_1\mu + \phi_1^2(\mu + \phi_1 E(y_{t-3}))            (5.78)

E(y_t) = \mu + \phi_1\mu + \phi_1^2\mu + \phi_1^3 E(y_{t-3})            (5.79)

Hopefully, readers will by now be able to see a pattern emerging. Making n such substitutions would give

E(y_t) = \mu(1 + \phi_1 + \phi_1^2 + \cdots + \phi_1^{n-1}) + \phi_1^n E(y_{t-n})            (5.80)

So long as the model is stationary, i.e. |φ_1| < 1, then φ_1^n → 0 as n → ∞. Therefore, taking limits as n → ∞, lim_{n→∞} φ_1^n E(y_{t-n}) = 0, and so

E(y_t) = \mu(1 + \phi_1 + \phi_1^2 + \cdots)            (5.81)

Recall the rule of algebra that the sum of an infinite number of geometrically declining terms in a series is finite and given by 'the first term in the series divided by (1 minus the common ratio)', where the common ratio is the quantity that each term in the series is multiplied by to arrive at the next term. It can thus be stated from (5.81) that

E(y_t) = \frac{\mu}{1 - \phi_1}            (5.82)
Thus the expected or mean value of an autoregressive process of order one is given by the intercept parameter divided by one minus the autoregressive coefficient.

(ii) Calculating now the variance of y_t, with μ set to zero

y_t = \phi_1 y_{t-1} + u_t            (5.83)

This can be written equivalently as

y_t(1 - \phi_1 L) = u_t            (5.84)

From Wold's decomposition theorem, the AR(p) can be expressed as an MA(∞)

y_t = (1 - \phi_1 L)^{-1} u_t            (5.85)

y_t = (1 + \phi_1 L + \phi_1^2 L^2 + \cdots) u_t            (5.86)

or

y_t = u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \phi_1^3 u_{t-3} + \cdots            (5.87)

So long as |φ_1| < 1, i.e. so long as the process for y_t is stationary, this sum will converge.
From the definition of the variance of any random variable y, it is possible to write

var(y_t) = E[y_t - E(y_t)][y_t - E(y_t)]            (5.88)

but E(y_t) = 0, since μ is set to zero to obtain (5.83) above. Thus

var(y_t) = E[(y_t)(y_t)]            (5.89)

var(y_t) = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \cdots)(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \cdots)]            (5.90)

var(y_t) = E[u_t^2 + \phi_1^2 u_{t-1}^2 + \phi_1^4 u_{t-2}^2 + \cdots + \text{cross-products}]            (5.91)

As discussed above, the 'cross-products' can be set to zero.

var(y_t) = \gamma_0 = E[u_t^2 + \phi_1^2 u_{t-1}^2 + \phi_1^4 u_{t-2}^2 + \cdots]            (5.92)

var(y_t) = \sigma^2 + \phi_1^2\sigma^2 + \phi_1^4\sigma^2 + \cdots            (5.93)

var(y_t) = \sigma^2(1 + \phi_1^2 + \phi_1^4 + \cdots)            (5.94)

Provided that |φ_1| < 1, the infinite sum in (5.94) can be written as

var(y_t) = \frac{\sigma^2}{1 - \phi_1^2}            (5.95)
(iii) Turning now to the calculation of the autocorrelation function, the autocovariances must first be calculated. This is achieved by following similar algebraic manipulations as for the variance above, starting with the definition of the autocovariances for a random variable. The autocovariances for lags 1, 2, 3, . . . , s, will be denoted by γ_1, γ_2, γ_3, . . . , γ_s, as previously.
\gamma_1 = \text{cov}(y_t, y_{t-1}) = E[y_t - E(y_t)][y_{t-1} - E(y_{t-1})]            (5.96)

Since μ has been set to zero, E(y_t) = 0 and E(y_{t-1}) = 0, so

\gamma_1 = E[y_t y_{t-1}]            (5.97)

under the result above that E(y_t) = E(y_{t-1}) = 0. Thus

\gamma_1 = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \cdots)(u_{t-1} + \phi_1 u_{t-2} + \phi_1^2 u_{t-3} + \cdots)]            (5.98)

\gamma_1 = E[\phi_1 u_{t-1}^2 + \phi_1^3 u_{t-2}^2 + \cdots + \text{cross-products}]            (5.99)

Again, the cross-products can be ignored so that

\gamma_1 = \phi_1\sigma^2 + \phi_1^3\sigma^2 + \phi_1^5\sigma^2 + \cdots            (5.100)

\gamma_1 = \phi_1\sigma^2(1 + \phi_1^2 + \phi_1^4 + \cdots)            (5.101)

\gamma_1 = \frac{\phi_1\sigma^2}{1 - \phi_1^2}            (5.102)
For the second autocovariance,

\gamma_2 = \text{cov}(y_t, y_{t-2}) = E[y_t - E(y_t)][y_{t-2} - E(y_{t-2})]            (5.103)

Using the same rules as applied above for the lag 1 covariance

\gamma_2 = E[y_t y_{t-2}]            (5.104)

\gamma_2 = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \cdots)(u_{t-2} + \phi_1 u_{t-3} + \phi_1^2 u_{t-4} + \cdots)]            (5.105)

\gamma_2 = E[\phi_1^2 u_{t-2}^2 + \phi_1^4 u_{t-3}^2 + \cdots + \text{cross-products}]            (5.106)

\gamma_2 = \phi_1^2\sigma^2 + \phi_1^4\sigma^2 + \cdots            (5.107)

\gamma_2 = \phi_1^2\sigma^2(1 + \phi_1^2 + \phi_1^4 + \cdots)            (5.108)

\gamma_2 = \frac{\phi_1^2\sigma^2}{1 - \phi_1^2}            (5.109)
By now it should be possible to see a pattern emerging. If these steps were repeated for γ_3, the following expression would be obtained

\gamma_3 = \frac{\phi_1^3\sigma^2}{1 - \phi_1^2}            (5.110)

and for any lag s, the autocovariance would be given by

\gamma_s = \frac{\phi_1^s\sigma^2}{1 - \phi_1^2}            (5.111)
The acf can now be obtained by dividing the covariances by the variance, so that

\tau_0 = \frac{\gamma_0}{\gamma_0} = 1            (5.112)

\tau_1 = \frac{\gamma_1}{\gamma_0} = \frac{\phi_1\sigma^2/(1 - \phi_1^2)}{\sigma^2/(1 - \phi_1^2)} = \phi_1            (5.113)

\tau_2 = \frac{\gamma_2}{\gamma_0} = \frac{\phi_1^2\sigma^2/(1 - \phi_1^2)}{\sigma^2/(1 - \phi_1^2)} = \phi_1^2            (5.114)

\tau_3 = \phi_1^3            (5.115)

The autocorrelation at lag s is given by

\tau_s = \phi_1^s            (5.116)

which means that corr(y_t, y_{t-s}) = φ_1^s. Note that use of the Yule–Walker equations would have given the same answer.
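The result τ_s = φ_1^s is straightforward to verify by simulation. The sketch below (numpy and statsmodels assumed available) generates a long AR(1) series with φ_1 = 0.5 and compares its sample acf with the theoretical values:

    import numpy as np
    from statsmodels.tsa.stattools import acf

    rng = np.random.default_rng(0)
    phi1, T = 0.5, 100_000

    # Simulate y_t = phi1 * y_{t-1} + u_t with standard normal white noise u_t
    u = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi1 * y[t - 1] + u[t]

    print(np.round(acf(y, nlags=5)[1:], 3))   # sample acf at lags 1-5
    print(phi1 ** np.arange(1, 6))            # theoretical acf: 0.5, 0.25, 0.125, ...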
5.5 The partial autocorrelation function
The partial autocorrelation function, or pacf (denoted τ_kk), measures the correlation between an observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k) – i.e. the correlation between y_t and y_{t-k}, after removing the effects of y_{t-k+1}, y_{t-k+2}, . . . , y_{t-1}. For example, the pacf for lag 3 would measure the correlation between y_t and y_{t-3} after controlling for the effects of y_{t-1} and y_{t-2}.
At lag 1, the autocorrelation and partial autocorrelation coefficients are equal, since there are no intermediate lag effects to eliminate. Thus, τ_11 = τ_1, where τ_1 is the autocorrelation coefficient at lag 1.
At lag 2

\tau_{22} = \frac{\tau_2 - \tau_1^2}{1 - \tau_1^2}            (5.117)

where τ_1 and τ_2 are the autocorrelation coefficients at lags 1 and 2, respectively. For lags greater than two, the formulae are more complex and hence a presentation of these is beyond the scope of this book. An intuitive explanation of the characteristic shape of the pacf for a moving average and for an autoregressive process now follows, however.
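As a small numerical illustration of (5.117), plugging in the MA(2) autocorrelations from Example 5.2 (τ_1 = −0.476, τ_2 = 0.190) gives a non-zero, if small, partial autocorrelation at lag 2, consistent with the discussion of MA pacfs below:

    # Partial autocorrelation at lag 2 from equation (5.117)
    tau1, tau2 = -0.476, 0.190
    tau22 = (tau2 - tau1**2) / (1 - tau1**2)
    print(round(tau22, 3))    # approximately -0.047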
In the case of an autoregressive process of order p, there will be direct connections between y_t and y_{t-s} for s ≤ p, but no direct connections for s > p. For example, consider the following AR(3) model

y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + u_t            (5.118)

There is a direct connection through the model between y_t and y_{t-1}, and between y_t and y_{t-2}, and between y_t and y_{t-3}, but not between y_t and y_{t-s} for s > 3. Hence the pacf will usually have non-zero partial autocorrelation coefficients for lags up to the order of the model, but will have zero partial autocorrelation coefficients thereafter. In the case of the AR(3), only the first three partial autocorrelation coefficients will be non-zero.
What shape would the partial autocorrelation function take for a moving average process? One would need to think about the MA model as being transformed into an AR in order to consider whether y_t and y_{t-k}, k = 1, 2, . . . , are directly connected. In fact, so long as the MA(q) process is invertible, it can be expressed as an AR(∞). Thus a definition of invertibility is now required.
5.5.1 The invertibility condition
An MA(q) model is typically required to have roots of the characteristic equation θ(z) = 0 greater than one in absolute value. The invertibility condition is mathematically the same as the stationarity condition, but is different in the sense that the former refers to MA rather than AR processes. This condition prevents the model from exploding under an AR(∞) representation, so that θ^{-1}(L) converges to zero. Box 5.2 shows the invertibility condition for an MA(2) model.
Box 5.2 The invertibility condition for an MA(2) model
In order to examine the shape of the pacf for moving average processes, consider the following MA(2) process for y_t

y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} = \theta(L) u_t            (5.119)

Provided that this process is invertible, this MA(2) can be expressed as an AR(∞)

y_t = \sum_{i=1}^{\infty} c_i L^i y_t + u_t            (5.120)

y_t = c_1 y_{t-1} + c_2 y_{t-2} + c_3 y_{t-3} + \cdots + u_t            (5.121)

It is now evident when expressed in this way that for a moving average model, there are direct connections between the current value of y and all of its previous values. Thus, the partial autocorrelation function for an MA(q) model will decline geometrically, rather than dropping off to zero after q lags, as is the case for its autocorrelation function. It could thus be stated that the acf for an AR has the same basic shape as the pacf for an MA, and the acf for an MA has the same shape as the pacf for an AR.

5.6 ARMA processes

By combining the AR(p) and MA(q) models, an ARMA(p, q) model is obtained. Such a model states that the current value of some series y depends linearly on its own previous values plus a combination of current and previous values of a white noise error term. The model could be written

\phi(L) y_t = \mu + \theta(L) u_t            (5.122)

where

\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p  and  \theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q

or

y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q} + u_t            (5.123)

with

E(u_t) = 0;\quad E(u_t^2) = \sigma^2;\quad E(u_t u_s) = 0,\ t \neq s
The characteristics of an ARMA process will be a combination of those from the autoregressive (AR) and moving average (MA) parts. Note that the pacf is particularly useful in this context. The acf alone can distinguish between a pure autoregressive and a pure moving average process. However, an ARMA process will have a geometrically declining acf, as will a pure AR process. So, the pacf is useful for distinguishing between an AR(p) process and an ARMA(p, q) process – the former will have a geometrically declining autocorrelation function, but a partial autocorrelation function which cuts off to zero after p lags, while the latter will have both autocorrelation and partial autocorrelation functions which decline geometrically.
We can now summarise the defining characteristics of AR, MA and ARMA processes.
An autoregressive process has:
● a geometrically decaying acf
● a number of non-zero points of pacf = AR order.
A moving average process has:
● a number of non-zero points of acf = MA order
● a geometrically decaying pacf.
A combination autoregressive moving average process has:
● a geometrically decaying acf
● a geometrically decaying pacf.
In fact, the mean of an ARMA series is given by

E(y_t) = \frac{\mu}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}            (5.124)

The autocorrelation function will display combinations of behaviour derived from the AR and MA parts, but for lags beyond q, the acf will simply be identical to that of the individual AR(p) process, so that the AR part will dominate in the long term. Deriving the acf and pacf for an ARMA process requires no new algebra, but is tedious and hence is left as an exercise for interested readers.
5.6.1 Sample acf and pacf plots for standard processes
Figures 5.2–5.8 give some examples of typical processes from the ARMA family with their characteristic autocorrelation and partial autocorrelation functions. The acf and pacf are not produced analytically from the relevant formulae for a model of that type, but rather are estimated using 100,000 simulated observations with disturbances drawn from a normal distribution. Each figure also has 5% (two-sided) rejection bands represented by dotted lines. These are based on ±1.96/√100,000 = ±0.0062, calculated in the same way as given above. Notice how, in each case, the acf and pacf are identical for the first lag.
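A sketch of how figures of this kind can be generated, here for the MA(1) of figure 5.2 (numpy, matplotlib and statsmodels are assumed to be installed; the confidence bands drawn by plot_acf and plot_pacf are produced automatically at the 5% level):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    rng = np.random.default_rng(1)
    T = 100_000

    # Simulate the MA(1) of figure 5.2: y_t = -0.5*u_{t-1} + u_t
    u = rng.standard_normal(T + 1)
    y = u[1:] - 0.5 * u[:-1]

    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(y, lags=10, ax=axes[0], zero=False)
    plot_pacf(y, lags=10, ax=axes[1], zero=False)
    plt.show()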
In figure 5.2, the MA(1) has an acf that is significant for only lag 1, while the pacf declines geometrically, and is significant until lag 7. The acf at lag 1 and all of the pacfs are negative as a result of the negative coefficient in the MA generating process.
Figure 5.2 Sample autocorrelation and partial autocorrelation functions for an MA(1) model: y_t = -0.5u_{t-1} + u_t
Figure 5.3 Sample autocorrelation and partial autocorrelation functions for an MA(2) model: y_t = 0.5u_{t-1} - 0.25u_{t-2} + u_t
Figure 5.4 Sample autocorrelation and partial autocorrelation functions for a slowly decaying AR(1) model: y_t = 0.9y_{t-1} + u_t
Figure 5.5 Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) model: y_t = 0.5y_{t-1} + u_t
Figure 5.6 Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) model with negative coefficient: y_t = -0.5y_{t-1} + u_t
Figure 5.7 Sample autocorrelation and partial autocorrelation functions for a non-stationary model (i.e. a unit coefficient): y_t = y_{t-1} + u_t
Figure 5.8 Sample autocorrelation and partial autocorrelation functions for an ARMA(1,1) model: y_t = 0.5y_{t-1} + 0.5u_{t-1} + u_t
Again, the structures of the acf and pacf in figure 5.3 are as anticipated. Only the first two autocorrelation coefficients are significant, while the partial autocorrelation coefficients are geometrically declining. Note also that, since the second coefficient on the lagged error term in the MA is negative, the acf and pacf alternate between positive and negative. In the case of the pacf, we term this alternating and declining function a 'damped sine wave' or 'damped sinusoid'.
For the autoregressive model of order 1 with a fairly high coefficient – i.e. relatively close to 1 – the autocorrelation function would be expected to die away relatively slowly, and this is exactly what is observed here in figure 5.4. Again, as expected for an AR(1), only the first pacf coefficient is significant, while all others are virtually zero and are not significant.
Figure 5.5 plots an AR(1) that was generated using identical error terms, but a much smaller autoregressive coefficient. In this case, the autocorrelation function dies away much more quickly than in the previous example, and in fact becomes insignificant after around 5 lags.
Figure 5.6 shows the acf and pacf for an identical AR(1) process to that used for figure 5.5, except that the autoregressive coefficient is now negative. This results in a damped sinusoidal pattern for the acf, which again becomes insignificant after around lag 5. Recalling that the autocorrelation coefficient for this AR(1) at lag s is equal to (−0.5)^s, this will be positive for even s and negative for odd s. Only the first pacf coefficient is significant (and negative).
Figure 5.7 plots the acf and pacf for a non-stationary series (see chapter 7 for an extensive discussion) that has a unit coefficient on the lagged dependent variable. The result is that shocks to y never die away, and persist indefinitely in the system. Consequently, the acf remains relatively flat at unity, even up to lag 10. In fact, even by lag 10, the autocorrelation coefficient has fallen only to 0.9989. Note also that on some occasions, the acf does die away, rather than looking like figure 5.7, even for such a non-stationary process, owing to its inherent instability combined with finite computer precision. The pacf, however, is significant only for lag 1, correctly suggesting that an autoregressive model with no moving average term is most appropriate.
Finally, figure 5.8 plots the acf and pacf for a mixed ARMA process. As one would expect of such a process, both the acf and the pacf decline geometrically – the acf as a result of the AR part and the pacf as a result of the MA part. The coefficients on the AR and MA are, however, sufficiently small that both acf and pacf coefficients have become insignificant by lag 6.
5.7 Building ARMA models: the Box–Jenkins approach
Although the existence of ARMA models predates them, Box and Jenkins (1976) were the first to approach the task of estimating an ARMA model in a systematic manner. Their approach was a practical and pragmatic one, involving three steps:

(1) Identification
(2) Estimation
(3) Diagnostic checking.
These steps are now explained in greater detail.
Step 1
This involves determining the order of the model required to capture the dynamic features of the data. Graphical procedures are used (plotting the data over time and plotting the acf and pacf) to determine the most appropriate specification.

Step 2
This involves estimation of the parameters of the model specified in step 1. This can be done using least squares or another technique, known as maximum likelihood, depending on the model.

Step 3
This involves model checking – i.e. determining whether the model specified and estimated is adequate. Box and Jenkins suggest two methods: overfitting and residual diagnostics. Overfitting involves deliberately fitting a larger model than that required to capture the dynamics of the data as identified in stage 1. If the model specified at step 1 is adequate, any extra terms added to the ARMA model would be insignificant. Residual diagnostics imply checking the residuals for evidence of linear dependence which, if present, would suggest that the model originally specified was inadequate to capture the features of the data. The acf, pacf or Ljung–Box tests could be used.
It is worth noting that 'diagnostic testing' in the Box–Jenkins world essentially involves only autocorrelation tests rather than the whole barrage of tests outlined in chapter 4. Also, such approaches to determining the adequacy of the model could only reveal a model that is underparameterised ('too small') and would not reveal a model that is overparameterised ('too big').
Examining whether the residuals are free from autocorrelation is much more commonly used than overfitting, and this may partly have arisen because, for ARMA models, overfitting can give rise to common factors in the overfitted model that make estimation of this model difficult and the statistical tests ill behaved. For example, if the true model is an ARMA(1,1) and we deliberately then fit an ARMA(2,2), there will be a common factor so that not all of the parameters in the latter model can be identified. This problem does not arise with pure AR or MA models, only with mixed processes.
It is usually the objective to form a parsimonious model, which is one that describes all of the features of the data of interest using as few parameters (i.e. as simple a model) as possible. A parsimonious model is desirable because:
● The residual sum of squares is inversely proportional to the number of degrees of freedom. A model which contains irrelevant lags of the variable or of the error term (and therefore unnecessary parameters) will usually lead to increased coefficient standard errors, implying that it will be more difficult to find significant relationships in the data. Whether an increase in the number of variables (i.e. a reduction in the number of degrees of freedom) will actually cause the estimated parameter standard errors to rise or fall will obviously depend on how much the RSS falls, and on the relative sizes of T and k. If T is very large relative to k, then the decrease in RSS is likely to outweigh the reduction in T − k so that the standard errors fall. Hence 'large' models with many parameters are more often chosen when the sample size is large.
● Models that are profligate might be inclined to fit to data-specific features, which would not be replicated out-of-sample. This means that the models may appear to fit the data very well, with perhaps a high value of R², but would give very inaccurate forecasts. Another interpretation of this concept, borrowed from physics, is that of the distinction between 'signal' and 'noise'. The idea is to fit a model which captures the signal (the important features of the data, or the underlying trends or patterns), but which does not try to fit a spurious model to the noise (the completely random aspect of the series).
5.7.1 Information criteria for ARMA model selection
The identification stage would now typically not be done using graphical plots of the acf and pacf. The reason is that when 'messy' real data is used, it unfortunately rarely exhibits the simple patterns of figures 5.2–5.8. This makes the acf and pacf very hard to interpret, and thus it is difficult to specify a model for the data. Another technique, which removes some of the subjectivity involved in interpreting the acf and pacf, is to use what are known as information criteria. Information criteria embody two factors: a term which is a function of the residual sum of squares (RSS), and some penalty for the loss of degrees of freedom from adding extra parameters. So, adding a new variable or an additional lag to a model will have two competing effects on the information criteria: the residual sum of squares will fall but the value of the penalty term will increase.
The object is to choose the number of parameters which minimises the value of the information criteria. So, adding an extra term will reduce the value of the criteria only if the fall in the residual sum of squares is sufficient to more than outweigh the increased value of the penalty term. There are several different criteria, which vary according to how stiff the penalty term is. The three most popular information criteria are Akaike's (1974) information criterion (AIC), Schwarz's (1978) Bayesian information criterion (SBIC) and the Hannan–Quinn criterion (HQIC).
Algebraically, these are expressed, respectively, as

AIC = \ln(\hat{\sigma}^2) + \frac{2k}{T}            (5.125)

SBIC = \ln(\hat{\sigma}^2) + \frac{k}{T}\ln T            (5.126)

HQIC = \ln(\hat{\sigma}^2) + \frac{2k}{T}\ln(\ln(T))            (5.127)

where σ̂² is the residual variance (also equivalent to the residual sum of squares divided by the number of observations, T), k = p + q + 1 is the total number of parameters estimated and T is the sample size. The information criteria are actually minimised subject to p ≤ p̄, q ≤ q̄, i.e. an upper limit is specified on the number of moving average (q̄) and/or autoregressive (p̄) terms that will be considered.
It is worth noting that SBIC embodies a much stiffer penalty term than AIC, while HQIC is somewhere in between. The adjusted R² measure can also be viewed as an information criterion, although it is a very soft one, which would typically select the largest models of all.
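A minimal sketch of how these three criteria could be computed for any fitted model, given its residuals and the number of estimated parameters (the function name is illustrative; numpy assumed):

    import numpy as np

    def information_criteria(residuals, k):
        # Return (AIC, SBIC, HQIC) as defined in equations (5.125)-(5.127)
        # residuals : array of model residuals
        # k         : total number of estimated parameters (p + q + 1)
        T = len(residuals)
        sigma2_hat = np.sum(residuals**2) / T          # residual variance
        aic = np.log(sigma2_hat) + 2 * k / T
        sbic = np.log(sigma2_hat) + k * np.log(T) / T
        hqic = np.log(sigma2_hat) + 2 * k * np.log(np.log(T)) / T
        return aic, sbic, hqic

Note that, as discussed in section 5.8.3 below, packaged software often uses a log-likelihood-based formulation instead, so the absolute values produced by a given package need not match these.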
5.7.2 Which criterion should be preferred if they suggest different model orders?
SBIC is strongly consistent (but inefficient) and AIC is not consistent, but is generally more efficient. In other words, SBIC will asymptotically deliver the correct model order, while AIC will deliver on average too large a model, even with an infinite amount of data. On the other hand, the average variation in selected model orders from different samples within a given population will be greater in the context of SBIC than AIC. Overall, then, no criterion is definitely superior to the others.
5.7.3 ARIMA modelling
ARIMA modelling, as distinct from ARMA modelling, has the additional letter 'I' in the acronym, standing for 'integrated'. An integrated autoregressive process is one whose characteristic equation has a root on the unit circle. Typically researchers difference the variable as necessary and then build an ARMA model on those differenced variables. An ARMA(p, q) model in the variable differenced d times is equivalent to an ARIMA(p, d, q) model on the original data – see chapter 7 for further details. For the remainder of this chapter, it is assumed that the data used in model construction are stationary, or have been suitably transformed to make them stationary. Thus only ARMA models will be considered further.
5.8 Constructing ARMA models in EViews
5.8.1 Getting started
This example uses the monthly UK house price series which was already incorporated in an EViews workfile in chapter 1. There were a total of 196 monthly observations running from February 1991 (recall that the January observation was 'lost' in constructing the lagged value) to May 2007 for the percentage change in house price series.
The objective of this exercise is to build an ARMA model for the house price changes. Recall that there are three stages involved: identification, estimation and diagnostic checking. The first stage is carried out by looking at the autocorrelation and partial autocorrelation coefficients to identify any structure in the data.
5.8.2 Estimating the autocorrelation coefficients for up to 12 lags
Double click on the DHP series and then click View and choose Correlogram…. In the 'Correlogram Specification' window, choose Level (since the series we are investigating has already been transformed into percentage returns or percentage changes) and in the 'Lags to include' box, type 12. Click on OK. The output, including relevant test statistics, is given in screenshot 5.1.
It is clearly evident from the first columns that the series is quite persistent given that it is already in percentage change form. The autocorrelation function dies away quite slowly. Only the first partial autocorrelation coefficient appears strongly significant. The numerical values of the autocorrelation and partial autocorrelation coefficients at lags 1–12 are given in the fourth and fifth columns of the output, with the lag length given in the third column.
The penultimate column of output gives the statistic resulting from a Ljung–Box test with the number of lags in the sum equal to the row number (i.e. the number in the third column). The test statistics will follow a χ²(1) for the first row, a χ²(2) for the second row, and so on. p-values associated with these test statistics are given in the last column.
Remember that, as a rule of thumb, a given autocorrelation coefficient is classed as significant if it is outside a ±1.96 × 1/√T band, where T is the number of observations. In this case, it would imply that a correlation coefficient is classed as significant if it is bigger than approximately 0.14 or smaller than −0.14. The band is of course wider when the sampling frequency is monthly, as it is here, rather than daily, where there would be more observations. It can be deduced that the first three autocorrelation coefficients and the first two partial autocorrelation coefficients are significant under this rule. Since the first acf coefficient is highly significant, the Ljung–Box joint test statistic rejects the null hypothesis of no autocorrelation at the 1% level for all numbers of lags considered. It could be concluded that a mixed ARMA process could be appropriate, although it is hard to precisely determine the appropriate order given these results. In order to investigate this issue further, the information criteria are now employed.

Screenshot 5.1  Estimating the correlogram
5.8.3 Using information criteria to decide on model orders
As demonstrated above, deciding on the appropriate model orders from autocorrelation functions could be very difficult in practice. An easier way is to choose the model order that minimises the value of an information criterion.
An important point to note is that books and statistical packages often differ in their construction of the test statistic. For example, the formulae given earlier in this chapter for Akaike's and Schwarz's information criteria were

AIC = \ln(\hat{\sigma}^2) + \frac{2k}{T}            (5.128)

SBIC = \ln(\hat{\sigma}^2) + \frac{k}{T}\ln T            (5.129)

where σ̂² is the estimator of the variance of the regression disturbances u_t, k is the number of parameters and T is the sample size. When using the criterion based on the estimated standard errors, the model with the lowest value of AIC and SBIC should be chosen. However, EViews uses a formulation of the test statistic derived from the log-likelihood function value based on a maximum likelihood estimation (see chapter 8). The corresponding EViews formulae are

AIC_\ell = -2\ell/T + \frac{2k}{T}            (5.130)

SBIC_\ell = -2\ell/T + \frac{k}{T}\ln T            (5.131)

where \ell = -\frac{T}{2}\left(1 + \ln(2\pi) + \ln(\hat{u}'\hat{u}/T)\right)

Unfortunately, this modification is not benign, since it affects the relative strength of the penalty term compared with the error variance, sometimes leading different packages to select different model orders for the same data and criterion!
Suppose that it is thought that ARMA models from order (0,0) to (5,5) are plausible for the house price changes. This would entail considering 36 models (ARMA(0,0), ARMA(1,0), ARMA(2,0), . . . , ARMA(5,5)), i.e. up to five lags in both the autoregressive and moving average terms.
In EViews, this can be done by separately estimating each of the models and noting down the value of the information criteria in each case.² This would be done in the following way. On the EViews main menu, click on Quick and choose Estimate Equation…. EViews will open an Equation Specification window. In the Equation Specification editor, type, for example

dhp c ar(1) ma(1)

For the estimation settings, select LS – Least Squares (NLS and ARMA), select the whole sample, and click OK – this will specify an ARMA(1,1). The output is given in the table below.

² Alternatively, any reader who knows how to write programs in EViews could set up a structure to loop over the model orders and calculate all the values of the information criteria together – see chapter 12.
Dependent Variable: DHP
Method: Least Squares
Date: 08/31/07  Time: 16:09
Sample (adjusted): 1991M03 2007M05
Included observations: 195 after adjustments
Convergence achieved after 19 iterations
MA Backcast: 1991M02

                      Coefficient    Std. Error    t-Statistic    Prob.
C                       0.868177      0.334573      2.594884     0.0102
AR(1)                   0.975461      0.019471     50.09854      0.0000
MA(1)                  −0.909851      0.039596    −22.9784       0.0000

R-squared               0.144695      Mean dependent var        0.635212
Adjusted R-squared      0.135786      S.D. dependent var        1.149146
S.E. of regression      1.068282      Akaike info criterion     2.985245
Sum squared resid     219.1154        Schwarz criterion         3.035599
Log likelihood       −288.0614        Hannan-Quinn criter.      3.005633
F-statistic            16.24067       Durbin-Watson stat        1.842823
Prob(F-statistic)       0.000000

Inverted AR Roots    .98
Inverted MA Roots    .91
In theory, the output would then be interpreted in a similar way to that discussed in chapter 3. However, in reality it is very difficult to interpret the parameter estimates in the sense of, for example, saying, 'a 1 unit increase in x leads to a β unit increase in y'. In part because the construction of ARMA models is not based on any economic or financial theory, it is often best not to even try to interpret the individual parameter estimates, but rather to examine the plausibility of the model as a whole and to determine whether it describes the data well and produces accurate forecasts (if this is the objective of the exercise, which it often is).
The inverses of the AR and MA roots of the characteristic equation are also shown. These can be used to check whether the process implied by the model is stationary and invertible. For the AR and MA parts of the process to be stationary and invertible, respectively, the inverted roots in each case must be smaller than 1 in absolute value, which they are in this case, although only just. Note also that the header for the EViews output for ARMA models states the number of iterations that have been used in the model estimation process. This shows that, in fact, an iterative numerical optimisation procedure has been employed to estimate the coefficients (see chapter 8 for further details).
Repeating these steps for the other ARMA models would give all of the required values for the information criteria. To give just one more example, in the case of an ARMA(5,5), the following would be typed in the Equation Specification editor box:

dhp c ar(1) ar(2) ar(3) ar(4) ar(5) ma(1) ma(2) ma(3) ma(4) ma(5)

Note that, in order to estimate an ARMA(5,5) model, it is necessary to write out the whole list of terms as above rather than to simply write, for example, 'dhp c ar(5) ma(5)', which would give a model with a fifth lag of the dependent variable and a fifth lag of the error term but no other variables. The values of all of the information criteria, calculated using EViews, are as follows:
Information criteria for ARMA models of the percentage changes in UK house prices

AIC
p \ q      0        1        2        3        4        5
0        3.116    3.086    2.973    2.973    2.977    2.977
1        3.065    2.985    2.965    2.935    2.931    2.938
2        2.951    2.961    2.968    2.924    2.941    2.957
3        2.960    2.968    2.970    2.980    2.937    2.914
4        2.969    2.979    2.931    2.940    2.862*   2.924
5        2.984    2.932    2.955    2.986    2.937    2.936

SBIC
p \ q      0        1        2        3        4        5
0        3.133    3.120    3.023    3.040    3.061    3.078
1        3.098    3.036    3.032    3.019    3.032    3.056
2        3.002*   3.029    3.053    3.025    3.059    3.091
3        3.028    3.053    3.072    3.098    3.072    3.066
4        3.054    3.081    3.049    3.076    3.015    3.094
5        3.086    3.052    3.092    3.049    3.108    3.123
So which model actually minimises the two information criteria? In this case, the criteria choose different models: AIC selects an ARMA(4,4), while SBIC selects the smaller ARMA(2,0) model – i.e. an AR(2). These chosen models are marked with an asterisk in the tables above. It will always be the case that SBIC selects a model that is at least as small (i.e. with fewer or the same number of parameters) as AIC, because the former criterion has a stricter penalty term. This means that SBIC penalises the incorporation of additional terms more heavily. Many different models provide almost identical values of the information criteria, suggesting that the chosen models do not provide particularly sharp characterisations of the data and that a number of other specifications would fit the data almost as well.
5.9 Examples of time series modelling in finance
5.9.1 Covered and uncovered interest parity
The determination of the price of one currency in terms of another (i.e. the exchange rate) has received a great deal of empirical examination in the international finance literature. Of these, three hypotheses in particular are studied – covered interest parity (CIP), uncovered interest parity (UIP) and purchasing power parity (PPP). The first two of these will be considered as illustrative examples in this chapter, while PPP will be discussed in chapter 7. All three relations are relevant for students of finance, for violation of one or more of the parities may offer the potential for arbitrage, or at least will offer further insights into how financial markets operate. All are discussed briefly here; for a more comprehensive treatment, see Cuthbertson and Nitsche (2004) or the many references therein.
5.9.2 Covered interest parity
Stated in its simplest terms, CIP implies that, if financial markets are efficient, it should not be possible to make a riskless profit by borrowing at a risk-free rate of interest in a domestic currency, switching the funds borrowed into another (foreign) currency, investing them there at a risk-free rate and locking in a forward sale to guarantee the rate of exchange back to the domestic currency. Thus, if CIP holds, it is possible to write

f_t - s_t = (r - r^*)_t            (5.132)

where f_t and s_t are the logs of the forward and spot prices of the domestic currency in terms of the foreign currency at time t, r is the domestic interest rate and r* is the foreign interest rate. This is an equilibrium condition which must hold, for otherwise there would exist riskless arbitrage opportunities, and the existence of such arbitrage would ensure that any deviation from the condition cannot hold indefinitely. It is worth noting that underlying CIP are the assumptions that the risk-free rates are truly risk-free – that is, there is no possibility of default risk. It is also assumed that there are no transactions costs, such as broker's fees, bid–ask spreads, stamp duty, etc., and that there are no capital controls, so that funds can be moved without restriction from one currency to another.
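As a simple worked illustration of (5.132), with made-up numbers purely for intuition: if the domestic risk-free rate over the horizon of the forward contract is 5% and the foreign rate is 3%, then CIP requires

f_t - s_t = 0.05 - 0.03 = 0.02

i.e. the log forward rate must exceed the log spot rate by approximately 2%; any other forward quote would permit a riskless profit of the kind described above.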
5.9.3 Uncovered interest parity
UIP takes CIP and adds to it a further condition known as 'forward rate unbiasedness' (FRU). Forward rate unbiasedness states that the forward rate of foreign exchange should be an unbiased predictor of the future value of the spot rate. If this condition does not hold, again in theory riskless arbitrage opportunities could exist. UIP, in essence, states that the expected change in the exchange rate should be equal to the interest rate differential between that available risk-free in each of the currencies. Algebraically, this may be stated as

s^e_{t+1} - s_t = (r - r^*)_t            (5.133)

where the notation is as above and s^e_{t+1} is the expectation, made at time t, of the spot exchange rate that will prevail at time t + 1.
The literature testing CIP and UIP is huge, with literally hundreds of published papers. Tests of CIP unsurprisingly (for it is a pure arbitrage condition) tend not to reject the hypothesis that the condition holds. Taylor (1987, 1989) has conducted extensive examinations of CIP, and concluded that there were historical periods when arbitrage was profitable, particularly during periods where the exchange rates were under management.
Relatively simple tests of UIP and FRU take equations of the form (5.133) and add intuitively relevant additional terms. If UIP holds, these additional terms should be insignificant. Ito (1988) tests UIP for the yen/dollar exchange rate with the three-month forward rate for January 1973 until February 1985. The sample period is split into three as a consequence of perceived structural breaks in the series. Strict controls on capital movements were in force in Japan until 1977, when some were relaxed and finally removed in 1980. A Chow test confirms Ito's intuition and suggests that the three sample periods should be analysed separately. Two separate regressions are estimated for each of the three sample sub-periods

s_{t+3} - f_{t,3} = a + b_1(s_t - f_{t-3,3}) + b_2(s_{t-1} - f_{t-4,3}) + u_t            (5.134)

where s_{t+3} is the spot exchange rate prevailing at time t + 3, f_{t,3} is the forward rate for three periods ahead available at time t, and so on, and u_t is an error term. A natural joint hypothesis to test is H_0: a = 0 and b_1 = 0 and b_2 = 0. This hypothesis represents the restriction that the deviation of the forward rate from the realised rate should have a mean value insignificantly different from zero (a = 0) and it should be independent of any information available at time t (b_1 = 0 and b_2 = 0). All three of these conditions must be fulfilled for UIP to hold. The second equation that Ito
tests is

s_{t+3} - f_{t,3} = a + b(s_t - f_{t,3}) + v_t            (5.135)

where v_t is an error term and the hypothesis of interest in this case is H_0: a = 0 and b = 0.
Equation (5.134) tests whether past forecast errors have information useful for predicting the difference between the actual exchange rate at time t + 3 and the value of it that was predicted by the forward rate. Equation (5.135) tests whether the forward premium has any predictive power for the difference between the actual exchange rate at time t + 3 and the value of it that was predicted by the forward rate. The results for the three sample periods are presented in Ito's table 3, and are adapted and reported here in table 5.1.

Table 5.1 Uncovered interest parity test results

Sample period                    1973M1–1977M3    1977M4–1980M12    1981M1–1985M2

Panel A: Estimates and hypothesis tests for
s_{t+3} - f_{t,3} = a + b_1(s_t - f_{t-3,3}) + b_2(s_{t-1} - f_{t-4,3}) + u_t
Estimate of a                        0.0099           0.0031            0.027
Estimate of b_1                      0.020            0.24              0.077
Estimate of b_2                     −0.37             0.16             −0.21
Joint test χ²(3)                    23.388            5.248             6.022
p-value for joint test               0.000            0.155             0.111

Panel B: Estimates and hypothesis tests for
s_{t+3} - f_{t,3} = a + b(s_t - f_{t,3}) + v_t
Estimate of a                        0.00            −0.052            −0.89
Estimate of b                        0.095            4.18              2.93
Joint test χ²(2)                    31.923           22.06              5.39
p-value for joint test               0.000            0.000             0.07

Source: Ito (1988). Reprinted with permission from MIT Press Journals.
The main conclusion is that UIP clearly failed to hold throughout the period of strictest controls, but there is less and less evidence against UIP as controls were relaxed.
5.10 Exponential smoothing
Exponential smoothing is another modelling technique (not based on the ARIMA approach) that uses only a linear combination of the previous values of a series for modelling it and for generating forecasts of its future values. Given that only previous values of the series of interest are used, the only question remaining is how much weight should be attached to each of the previous observations. Recent observations would be expected to have the most power in helping to forecast future values of a series. If this is accepted, a model that places more weight on recent observations than those further in the past would be desirable. On the other hand, observations a long way in the past may still contain some information useful for forecasting future values of a series, which would not be the case under a centred moving average. An exponential smoothing model will achieve this, by imposing a geometrically declining weighting scheme on the lagged values of a series. The equation for the model is

S_t = \alpha y_t + (1 - \alpha)S_{t-1}            (5.136)

where α is the smoothing constant, with 0 < α < 1, y_t is the current realised value and S_t is the current smoothed value.
Since α + (1 − α) = 1, S_t is modelled as a weighted average of the current observation y_t and the previous smoothed value. The model above can be rewritten to express the exponential weighting scheme more clearly. By lagging (5.136) by one period, the following expression is obtained

S_{t-1} = \alpha y_{t-1} + (1 - \alpha)S_{t-2}            (5.137)

and lagging again

S_{t-2} = \alpha y_{t-2} + (1 - \alpha)S_{t-3}            (5.138)

Substituting into (5.136) for S_{t-1} from (5.137)

S_t = \alpha y_t + (1 - \alpha)(\alpha y_{t-1} + (1 - \alpha)S_{t-2})            (5.139)

S_t = \alpha y_t + (1 - \alpha)\alpha y_{t-1} + (1 - \alpha)^2 S_{t-2}            (5.140)

Substituting into (5.140) for S_{t-2} from (5.138)

S_t = \alpha y_t + (1 - \alpha)\alpha y_{t-1} + (1 - \alpha)^2(\alpha y_{t-2} + (1 - \alpha)S_{t-3})            (5.141)

S_t = \alpha y_t + (1 - \alpha)\alpha y_{t-1} + (1 - \alpha)^2\alpha y_{t-2} + (1 - \alpha)^3 S_{t-3}            (5.142)

T successive substitutions of this kind would lead to

S_t = \left(\sum_{i=0}^{T} \alpha(1 - \alpha)^i y_{t-i}\right) + (1 - \alpha)^{T+1} S_{t-1-T}            (5.143)

Since α > 0, the effect of each observation declines geometrically as the variable moves another observation forward in time. In the limit as T → ∞, (1 − α)^T S_0 → 0, so that the current smoothed value is a geometrically weighted infinite sum of the previous realisations.
The forecasts from an exponential smoothing model are simply set to the current smoothed value, for any number of steps ahead, s

f_{t,s} = S_t,\quad s = 1, 2, 3, \ldots            (5.144)

The exponential smoothing model can be seen as a special case of a Box–Jenkins model, an ARIMA(0,1,1), with MA coefficient (1 − α) – see Granger and Newbold (1986, p. 174).
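A minimal sketch of the recursion in (5.136) and the flat forecast rule in (5.144), in plain Python, with the starting smoothed value simply set equal to the first observation (one common, but not unique, initialisation convention):

    def simple_exponential_smoothing(y, alpha):
        # Apply S_t = alpha*y_t + (1 - alpha)*S_{t-1} and return the smoothed series
        smoothed = [y[0]]                    # initialise the first smoothed value
        for obs in y[1:]:
            smoothed.append(alpha * obs + (1 - alpha) * smoothed[-1])
        return smoothed

    y = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]       # illustrative data
    S = simple_exponential_smoothing(y, alpha=0.3)
    forecast = S[-1]                         # by (5.144), the same for all horizons s
    print(round(forecast, 3))                # approximately 1.154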
The technique above is known as single or simple exponential smoothing, and it can be modified to allow for trends (Holt's method) or to allow for seasonality (Winter's method) in the underlying variable. These augmented models are not pursued further in this text since there is a much better way to model the trends (using a unit root process – see chapter 7) and the seasonalities (see chapters 1 and 9) of the form that are typically present in financial data.
Exponential smoothing has several advantages over the slightly more complex ARMA class of models discussed above. First, exponential smoothing is obviously very simple to use. There is no decision to be made on how many parameters to estimate (assuming only single exponential smoothing is considered). Thus it is easy to update the model if a new realisation becomes available.
Among the disadvantages of exponential smoothing is the fact that it is overly simplistic and inflexible. Exponential smoothing models can be viewed as but one model from the ARIMA family, which may not necessarily be optimal for capturing any linear dependence in the data. Also, the forecasts from an exponential smoothing model do not converge on the long-term mean of the variable as the horizon increases. The upshot is that long-term forecasts are overly affected by recent events in the history of the series under investigation and will therefore be sub-optimal.
A discussion of how exponential smoothing models can be estimated using EViews will be given after the following section on forecasting in econometrics.
5.11 Forecasting in econometrics
Although the words ‘forecasting’ and ‘prediction’ are sometimes given different meanings in some studies, in this text the words will be used synonymously. In this context, prediction or forecasting simply means an attempt to determine the values that a series is likely to take. Of course, forecasts might also usefully be made in a cross-sectional environment. Although the discussion below refers to time series data, some of the arguments will carry over to the cross-sectional context.
Determining the forecasting accuracy of a model is an important test of its adequacy. Some econometricians would go as far as to suggest that the statistical adequacy of a model, in terms of whether it violates the CLRM assumptions or whether it contains insignificant parameters, is largely irrelevant if the model produces accurate forecasts. The following subsections of the book discuss why forecasts are made, how they are made from several important classes of models, how to evaluate the forecasts, and so on.
5.11.1 Why forecast?
Forecasts are made essentially because they are useful! Financial decisions often involve a long-term commitment of resources, the returns to which will depend upon what happens in the future. In this context, the decisions made today will reflect forecasts of the future state of the world, and the more accurate those forecasts are, the more utility (or money!) is likely to be gained from acting on them.
Some examples in finance of where forecasts from econometric models
might be useful include:
●Forecasting tomorrow’s return on a particular share
●Forecasting the price of a house given its characteristics
●Forecasting the riskiness of a portfolio over the next year
●Forecasting the volatility of bond returns
●Forecasting the correlation between US and UK stock market movements
tomorrow
●Forecasting the likely number of defaults on a portfolio of home loans.
Again, it is evident that forecasting can apply either in a cross-sectional or
a time series context. It is useful to distinguish between two approaches to forecasting:
●Econometric (structural) forecasting – relates a dependent variable to one or more independent variables. Such models often work well in the long run, since a long-run relationship between variables often arises from no-arbitrage or market efficiency conditions. Examples of such forecasts would include return predictions derived from arbitrage pricing models, or long-term exchange rate prediction based on purchasing power parity or uncovered interest parity theory.
●Time series forecasting – involves trying to forecast the future values of a
series given its previous values and/or previous values of an error term.
The distinction between the two types is somewhat blurred – for example,
it is not clear where vector autoregressive models (see chapter 6 for an extensive overview) fit into this classification.
Figure 5.9 Use of an in-sample estimation period (Jan 1990–Dec 1998) and an out-of-sample forecast evaluation period (Jan 1999–Dec 1999) for analysis
It is also worth distinguishing between point and interval forecasts.
Point forecasts predict a single value for the variable of interest, while
interval forecasts provide a range of values in which the future value of
the variable is expected to lie with a given level of confidence.
5.11.2 The difference between in-sample and out-of-sample forecasts
In-sample forecasts are those generated for the same set of data that was
used to estimate the model’s parameters. One would expect the ‘forecasts’ of a model to be relatively good in-sample, for this reason. Therefore, a sensible approach to model evaluation through an examination of forecast accuracy is not to use all of the observations in estimating the model parameters, but rather to hold some observations back. The latter sample, sometimes known as a holdout sample, would be used to construct out-of-sample forecasts.
To give an illustration of this distinction, suppose that some monthly
FTSE returns for 120 months (January 1990–December 1999) are available. It would be possible to use all of them to build the model (and generate only in-sample forecasts), or some observations could be kept back, as shown in figure 5.9.
What would be done in this case would be to use data from 1990M1 until 1998M12 to estimate the model parameters, and then the observations for 1999 would be forecast from the estimated parameters. Of course, where each of the in-sample and out-of-sample periods should start and finish is somewhat arbitrary and at the discretion of the researcher. One could then compare how close the forecasts for the 1999 months were to their actual values in the holdout sample. This procedure would represent a better test of the model than an examination of the in-sample fit of the model since the information from 1999M1 onwards has not been used when estimating the model parameters.
5.11.3 Some more terminology: one-step-ahead versus multi-step-ahead
forecasts and rolling versus recursive samples
A one-step-ahead forecast is a forecast generated for the next observation only, whereas multi-step-ahead forecasts are those generated for 1, 2, 3, …, s steps
ahead, so that the forecasting horizon is for the next s periods. Whether one-step- or multi-step-ahead forecasts are of interest will be determined by the forecasting horizon of interest to the researcher.
Suppose that the monthly FTSE data are used as described in the example above. If the in-sample estimation period stops in December 1998, then up to 12-step-ahead forecasts could be produced, giving 12 predictions that can be compared with the actual values of the series. Comparing the actual and forecast values in this way is not ideal, for the forecasting horizon is varying from 1 to 12 steps ahead. It might be the case, for example, that the model produces very good forecasts for short horizons (say, one or two steps), but that it produces inaccurate forecasts further ahead. It would not be possible to evaluate whether this was in fact the case or not since only a single one-step-ahead forecast, a single 2-step-ahead forecast, and so on, are available. An evaluation of the forecasts would require a considerably larger holdout sample.
A useful way around this problem is to use a recursive or rolling window, which generates a series of forecasts for a given number of steps ahead. A recursive forecasting model would be one where the initial estimation date is fixed, but additional observations are added one at a time to the estimation period. A rolling window, on the other hand, is one where the length of the in-sample period used to estimate the model is fixed, so that the start date and end date successively increase by one observation. Suppose now that only one-, two-, and three-step-ahead forecasts are of interest. They could be produced using the following recursive and rolling window approaches:
Objective: to produce 1-, 2-, 3-step-     Data used to estimate model parameters
ahead forecasts for:                      Rolling window          Recursive window
1999 M1,M2,M3 1990 M1–1998 M12 1990 M1–1998 M12
1999 M2,M3,M4 1990 M2–1999 M1 1990 M1–1999 M1
1999 M3,M4,M5 1990 M3–1999 M2 1990 M1–1999 M2
1999 M4,M5,M6 1990 M4–1999 M3 1990 M1–1999 M3
1999 M5,M6,M7 1990 M5–1999 M4 1990 M1–1999 M4
1999 M6,M7,M8 1990 M6–1999 M5 1990 M1–1999 M5
1999 M7,M8,M9 1990 M7–1999 M6 1990 M1–1999 M6
1999 M8,M9,M10 1990 M8–1999 M7 1990 M1–1999 M7
1999 M9,M10,M11 1990 M9–1999 M8 1990 M1–1999 M8
1999 M10,M11,M12 1990 M10–1999 M9 1990 M1–1999 M9
The sample length for the rolling windows above is always set at 108
observations, while the number of observations used to estimate the
parameters in the recursive case increases as we move down the table
and through the sample.
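The distinction between the two schemes amounts to how the estimation-sample indices are chosen at each step. A minimal sketch of that bookkeeping is given below (the function name and index conventions are illustrative, not taken from the text); estimating the model over each window and producing one- to three-step-ahead forecasts would then proceed as usual.

```python
def estimation_windows(first_obs, last_in_sample, n_forecasts, scheme="recursive"):
    """Return (start, end) index pairs for the estimation sample used at each
    forecasting step, mirroring the table above.

    'recursive' fixes the start date; 'rolling' keeps the window length fixed.
    Indices are positions in the full data set.
    """
    windows = []
    window_length = last_in_sample - first_obs + 1
    for i in range(n_forecasts):
        end = last_in_sample + i
        start = first_obs if scheme == "recursive" else end - window_length + 1
        windows.append((start, end))
    return windows

# 120 monthly observations (0 = 1990M1, ..., 119 = 1999M12);
# the in-sample period ends at 1998M12 (index 107), giving 108-observation rolling windows
rolling = estimation_windows(0, 107, 10, scheme="rolling")
recursive = estimation_windows(0, 107, 10, scheme="recursive")
```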
5.11.4 Forecasting with time series versus structural models
To understand how to construct forecasts, the idea of conditional expectations is required. A conditional expectation would be expressed as

E(y_{t+1} | Ω_t)

This expression states that the expected value of y is taken for time t + 1, conditional upon, or given (|), all information available up to and including time t (Ω_t). Contrast this with the unconditional expectation of y, which is the expected value of y without any reference to time, i.e. the unconditional mean of y. The conditional expectations operator is used to generate forecasts of the series.
How this conditional expectation is evaluated will of course depend on
the model under consideration. Several families of models for forecasting will be developed in this and subsequent chapters.
A first point to note is that by definition the optimal forecast for a zero
mean white noise process is zero
E(u_{t+s} | Ω_t) = 0   ∀ s > 0    (5.145)
The two simplest forecasting ‘methods’ that can be employed in almost
every situation are shown in box 5.3.
Box 5.3 Naive forecasting methods
(1) Assume no change, so that the forecast, f, of the value of y, s steps into the future is the current value of y

E(y_{t+s} | Ω_t) = y_t    (5.146)

Such a forecast would be optimal if y_t followed a random walk process.
(2) In the absence of a full model, forecasts can be generated using the long-term average of the series. Forecasts using the unconditional mean would be more useful than ‘no change’ forecasts for any series that is ‘mean-reverting’ (i.e. stationary).
Time series models are generally better suited to the production of time series forecasts than structural models. For an illustration of this, consider the following linear regression model

y_t = β_1 + β_2 x_{2t} + β_3 x_{3t} + ··· + β_k x_{kt} + u_t    (5.147)
To forecast y, the conditional expectation of its future value is required.
Taking expectations of both sides of (5.147) yields
E(y_t | Ω_{t−1}) = E(β_1 + β_2 x_{2t} + β_3 x_{3t} + ··· + β_k x_{kt} + u_t)    (5.148)
The parameters can be taken through the expectations operator, since
this is a population regression function and therefore they are assumed known. The following expression would be obtained
E(y_t | Ω_{t−1}) = β_1 + β_2 E(x_{2t}) + β_3 E(x_{3t}) + ··· + β_k E(x_{kt})    (5.149)
But there is a problem: what are E(x_{2t}), etc.? Remembering that information is available only until time t − 1, the values of these variables are unknown. It may be possible to forecast them, but this would require another set of forecasting models for every explanatory variable. To the extent that forecasting the explanatory variables may be as difficult, or even more difficult, than forecasting the explained variable, this equation has achieved nothing! In the absence of a set of forecasts for the explanatory variables, one might think of using x̄_2, etc., i.e. the mean values of the explanatory variables, giving

E(y_t) = β_1 + β_2 x̄_2 + β_3 x̄_3 + ··· + β_k x̄_k = ȳ!    (5.150)
Thus, if the mean values of the explanatory variables are used as inputs
to the model, all that will be obtained as a forecast is the average value of
y. Forecasting using pure time series models is relatively common, since
it avoids this problem.
5.11.5 Forecasting with ARMA models
Forecasting using ARMA models is a fairly simple exercise in calculating conditional expectations. Although any consistent and logical notation could be used, the following conventions will be adopted in this book. Let f_{t,s} denote a forecast made using an ARMA(p, q) model at time t for s steps into the future for some series y. The forecasts are generated by what is known as a forecast function, typically of the form

f_{t,s} = Σ_{i=1}^{p} a_i f_{t,s−i} + Σ_{j=1}^{q} b_j u_{t+s−j}    (5.151)

where f_{t,s} = y_{t+s}, s ≤ 0;   u_{t+s} = 0, s > 0
                                  = u_{t+s}, s ≤ 0

and a_i and b_i are the autoregressive and moving average coefficients, respectively.
A demonstration of how one generates forecasts for separate AR and
MA processes, leading to the general equation (5.151) above, will now be given.
5.11.6 Forecasting the future value of an MA(q) process
A moving average process has a memory only of length q, and this limits the sensible forecasting horizon. For example, suppose that an MA(3) model has been estimated

y_t = μ + θ_1 u_{t−1} + θ_2 u_{t−2} + θ_3 u_{t−3} + u_t    (5.152)

Since parameter constancy over time is assumed, if this relationship holds for the series y at time t, it is also assumed to hold for y at time t + 1, t + 2, …, so 1 can be added to each of the time subscripts in (5.152), and 2 added to each of the time subscripts, and then 3, and so on, to arrive at the following

y_{t+1} = μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2} + u_{t+1}    (5.153)
y_{t+2} = μ + θ_1 u_{t+1} + θ_2 u_t + θ_3 u_{t−1} + u_{t+2}    (5.154)
y_{t+3} = μ + θ_1 u_{t+2} + θ_2 u_{t+1} + θ_3 u_t + u_{t+3}    (5.155)
Suppose that all information up to and including that at time t is available and that forecasts for 1, 2, …, s steps ahead – i.e. forecasts for y at times t + 1, t + 2, …, t + s – are wanted. y_t, y_{t−1}, …, and u_t, u_{t−1}, … are known, so producing the forecasts is just a matter of taking the conditional expectation of (5.153)

f_{t,1} = E(y_{t+1}|t) = E(μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2} + u_{t+1} | Ω_t)    (5.156)

where E(y_{t+1}|t) is a short-hand notation for E(y_{t+1} | Ω_t)

f_{t,1} = E(y_{t+1}|t) = μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2}    (5.157)

Thus the forecast for y, 1 step ahead, made at time t, is given by this linear combination of the disturbance terms. Note that it would not be appropriate to set the values of these disturbance terms to their unconditional mean of zero. This arises because it is the conditional expectation of their values that is of interest. Given that all information up to and including that at time t is available, the values of the error terms up to time t are known. But u_{t+1} is not known at time t and therefore E(u_{t+1}|t) = 0, and so on.
The forecast for 2 steps ahead is formed by taking the conditional expectation of (5.154)

f_{t,2} = E(y_{t+2}|t) = E(μ + θ_1 u_{t+1} + θ_2 u_t + θ_3 u_{t−1} + u_{t+2} | Ω_t)    (5.158)
f_{t,2} = E(y_{t+2}|t) = μ + θ_2 u_t + θ_3 u_{t−1}    (5.159)

In the case above, u_{t+2} is not known since information is available only to time t, so E(u_{t+2}) is set to zero. Continuing and applying the same rules to generate 3-, 4-, …, s-step-ahead forecasts

f_{t,3} = E(y_{t+3}|t) = E(μ + θ_1 u_{t+2} + θ_2 u_{t+1} + θ_3 u_t + u_{t+3} | Ω_t)    (5.160)
f_{t,3} = E(y_{t+3}|t) = μ + θ_3 u_t    (5.161)
f_{t,4} = E(y_{t+4}|t) = μ    (5.162)
f_{t,s} = E(y_{t+s}|t) = μ   ∀ s ≥ 4    (5.163)

As the MA(3) process has a memory of only three periods, all forecasts four or more steps ahead collapse to the intercept. Obviously, if there had been no constant term in the model, the forecasts four or more steps ahead for an MA(3) would be zero.
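The mechanics of (5.157)–(5.163) can be written as a short routine: future disturbances are replaced by their conditional expectation of zero, while disturbances dated t or earlier keep their known values. The sketch below is illustrative only, and the parameter values in the example are made up rather than estimated.

```python
def ma_forecasts(mu, thetas, last_residuals, horizon):
    """Forecast an MA(q) process s = 1, ..., horizon steps ahead.

    thetas        : [theta_1, ..., theta_q]
    last_residuals: [u_t, u_{t-1}, ..., u_{t-q+1}]  (most recent first)
    Future errors have conditional expectation zero, so each forecast is mu
    plus the surviving weighted lagged residuals; beyond q steps it is just mu.
    """
    q = len(thetas)
    forecasts = []
    for s in range(1, horizon + 1):
        f = mu
        for j in range(1, q + 1):
            lag = j - s            # index of u_{t+s-j} counted back from u_t
            if lag >= 0:           # only residuals dated t or earlier are known
                f += thetas[j - 1] * last_residuals[lag]
        forecasts.append(f)
    return forecasts

# MA(3) example with made-up values: forecasts 4 or more steps ahead collapse to mu
print(ma_forecasts(mu=0.1, thetas=[0.5, 0.3, 0.2],
                   last_residuals=[0.4, -0.2, 0.1], horizon=5))
```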
5.11.7 Forecasting the future value of an AR(p) process
Unlike a moving average process, an autoregressive process has infinite memory. To illustrate, suppose that an AR(2) model has been estimated

y_t = μ + φ_1 y_{t−1} + φ_2 y_{t−2} + u_t    (5.164)
Again, by appealing to the assumption of parameter stability, this equation
will hold for times t + 1, t + 2, and so on

y_{t+1} = μ + φ_1 y_t + φ_2 y_{t−1} + u_{t+1}    (5.165)
y_{t+2} = μ + φ_1 y_{t+1} + φ_2 y_t + u_{t+2}    (5.166)
y_{t+3} = μ + φ_1 y_{t+2} + φ_2 y_{t+1} + u_{t+3}    (5.167)
Producing the one-step-ahead forecast is easy, since all of the information
required is known at time t. Applying the expectations operator to (5.165), and setting E(u_{t+1}) to zero would lead to

f_{t,1} = E(y_{t+1}|t) = E(μ + φ_1 y_t + φ_2 y_{t−1} + u_{t+1} | Ω_t)    (5.168)
f_{t,1} = E(y_{t+1}|t) = μ + φ_1 E(y_t|t) + φ_2 E(y_{t−1}|t)    (5.169)
f_{t,1} = E(y_{t+1}|t) = μ + φ_1 y_t + φ_2 y_{t−1}    (5.170)
Applying the same procedure in order to generate a two-step-ahead forecast

f_{t,2} = E(y_{t+2}|t) = E(μ + φ_1 y_{t+1} + φ_2 y_t + u_{t+2} | Ω_t)    (5.171)
f_{t,2} = E(y_{t+2}|t) = μ + φ_1 E(y_{t+1}|t) + φ_2 E(y_t|t)    (5.172)

The case above is now slightly more tricky, since E(y_{t+1}) is not known, although this in fact is the one-step-ahead forecast, so that (5.172) becomes

f_{t,2} = E(y_{t+2}|t) = μ + φ_1 f_{t,1} + φ_2 y_t    (5.173)

Similarly, for three, four, … and s steps ahead, the forecasts will be, respectively, given by

f_{t,3} = E(y_{t+3}|t) = E(μ + φ_1 y_{t+2} + φ_2 y_{t+1} + u_{t+3} | Ω_t)    (5.174)
f_{t,3} = E(y_{t+3}|t) = μ + φ_1 E(y_{t+2}|t) + φ_2 E(y_{t+1}|t)    (5.175)
f_{t,3} = E(y_{t+3}|t) = μ + φ_1 f_{t,2} + φ_2 f_{t,1}    (5.176)
f_{t,4} = μ + φ_1 f_{t,3} + φ_2 f_{t,2}    (5.177)

etc., so

f_{t,s} = μ + φ_1 f_{t,s−1} + φ_2 f_{t,s−2}    (5.178)

Thus the s-step-ahead forecast for an AR(2) process is given by the intercept + the coefficient on the one-period lag multiplied by the time s − 1 forecast + the coefficient on the two-period lag multiplied by the s − 2 forecast.
ARMA(p, q) forecasts can easily be generated in the same way by applying the rules for their component parts, and using the general formula given by (5.151).
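A sketch of the general forecast function is given below. It follows (5.151), with earlier forecasts substituted for unknown future values of y and future disturbances set to zero; an intercept μ is included explicitly, as in the AR(2) example above. All names and example values are hypothetical.

```python
def arma_forecasts(mu, ar, ma, y_hist, u_hist, horizon):
    """Generate 1- to horizon-step-ahead forecasts from an ARMA(p, q) model
    via the forecast function (5.151), with an explicit intercept mu.

    ar, ma : AR and MA coefficients [a_1, ..., a_p], [b_1, ..., b_q]
    y_hist : [y_t, y_{t-1}, ...]  (most recent first, at least p values)
    u_hist : [u_t, u_{t-1}, ...]  (most recent first, at least q values)
    """
    forecasts = []
    for s in range(1, horizon + 1):
        f = mu
        for i, a in enumerate(ar, start=1):
            lag = s - i
            # use earlier forecasts where available, actual values otherwise
            f += a * (forecasts[lag - 1] if lag > 0 else y_hist[-lag])
        for j, b in enumerate(ma, start=1):
            lag = s - j
            if lag <= 0:           # future errors have conditional expectation zero
                f += b * u_hist[-lag]
        forecasts.append(f)
    return forecasts

# AR(2) example in the spirit of (5.170)-(5.178), with made-up values
print(arma_forecasts(mu=0.05, ar=[0.5, 0.2], ma=[],
                     y_hist=[1.0, 0.8], u_hist=[], horizon=4))
```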
5.11.8 Determining whether a forecast is accurate or not
For example, suppose that tomorrow’s return on the FTSE is predicted to be 0.2, and that the outcome is actually −0.4. Is this an accurate forecast? Clearly, one cannot determine whether a forecasting model is good or not based upon only one forecast and one realisation. Thus in practice, forecasts would usually be produced for the whole of the out-of-sample period, which would then be compared with the actual values, and the difference between them aggregated in some way. The forecast error for observation i is defined as the difference between the actual value for observation i and the forecast made for it. The forecast error, defined in this way, will be positive (negative) if the forecast was too low (high). Therefore, it is not possible simply to sum the forecast errors, since the
Table 5.2 Forecast error aggregation

Steps ahead   Forecast   Actual    Squared error                     Absolute error
1             0.20       −0.40     (0.20 − (−0.40))^2 = 0.360        |0.20 − (−0.40)| = 0.600
2             0.15       0.20      (0.15 − 0.20)^2 = 0.002           |0.15 − 0.20| = 0.050
3             0.10       0.10      (0.10 − 0.10)^2 = 0.000           |0.10 − 0.10| = 0.000
4             0.06       −0.10     (0.06 − (−0.10))^2 = 0.026        |0.06 − (−0.10)| = 0.160
5             0.04       −0.05     (0.04 − (−0.05))^2 = 0.008        |0.04 − (−0.05)| = 0.090
positive and negative errors will cancel one another out. Thus, before the
forecast errors are aggregated, they are usually squared or the absolute value taken, which renders them all positive. To see how the aggregation works, consider the example in table 5.2, where forecasts are made for a series up to 5 steps ahead, and are then compared with the actual realisations (with all calculations rounded to 3 decimal places).
The mean squared error, MSE, and mean absolute error, MAE, are now calculated by taking the average of the fourth and fifth columns, respectively

MSE = (0.360 + 0.002 + 0.000 + 0.026 + 0.008)/5 = 0.079    (5.179)
MAE = (0.600 + 0.050 + 0.000 + 0.160 + 0.090)/5 = 0.180    (5.180)
Taken individually, little can be gleaned from considering the size of the MSE or MAE, for the statistic is unbounded from above (like the residual sum of squares or RSS). Instead, the MSE or MAE from one model would be compared with those of other models for the same data and forecast period, and the model(s) with the lowest value of the error measure would be argued to be the most accurate.
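The aggregation in table 5.2 can be reproduced in a few lines; a minimal sketch is given below using the forecasts and actual values from the table.

```python
import numpy as np

forecasts = np.array([0.20, 0.15, 0.10, 0.06, 0.04])
actuals   = np.array([-0.40, 0.20, 0.10, -0.10, -0.05])

errors = actuals - forecasts
mse = np.mean(errors ** 2)        # approximately 0.079, as in (5.179)
mae = np.mean(np.abs(errors))     # 0.180, as in (5.180)
print(round(mse, 3), round(mae, 3))
```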
MSE provides a quadratic loss function, and so may be particularly useful in situations where large forecast errors are disproportionately more serious than smaller errors. This may, however, also be viewed as a disadvantage if large errors are not disproportionately more serious, although the same critique could also, of course, be applied to the whole least squares methodology. Indeed, Dielman (1986) goes as far as to say that when there are outliers present, least absolute values should be used to determine model parameters rather than least squares. Makridakis (1993, p. 528) argues that mean absolute percentage error (MAPE) is ‘a relative measure that incorporates the best characteristics among the various accuracy criteria’. Once again, denoting s-step-ahead forecasts of a variable made at time t as f_{t,s} and the actual value of the variable at time t as y_t,
then the mean square error can be defined as

MSE = [1/(T − (T_1 − 1))] Σ_{t=T_1}^{T} (y_{t+s} − f_{t,s})^2    (5.181)

where T is the total sample size (in-sample + out-of-sample), and T_1 is the first out-of-sample forecast observation. Thus in-sample model estimation initially runs from observation 1 to (T_1 − 1), and observations T_1 to T are available for out-of-sample estimation, i.e. a total holdout sample of T − (T_1 − 1).
Mean absolute error (MAE) measures the average absolute forecast error, and is given by

MAE = [1/(T − (T_1 − 1))] Σ_{t=T_1}^{T} |y_{t+s} − f_{t,s}|    (5.182)
Adjusted MAPE (AMAPE) or symmetric MAPE corrects for the problem of asymmetry between the actual and forecast values

AMAPE = [100/(T − (T_1 − 1))] Σ_{t=T_1}^{T} |(y_{t+s} − f_{t,s})/(y_{t+s} + f_{t,s})|    (5.183)

The symmetry in (5.183) arises since the forecast error is divided by twice the average of the actual and forecast values. So, for example, AMAPE will be the same whether the forecast is 0.5 and the actual value is 0.3, or the actual value is 0.5 and the forecast is 0.3. The same is not true of the standard MAPE formula, where the denominator is simply y_{t+s}, so that whether y_t or f_{t,s} is larger will affect the result

MAPE = [100/(T − (T_1 − 1))] Σ_{t=T_1}^{T} |(y_{t+s} − f_{t,s})/y_{t+s}|    (5.184)
MAPE also has the attractive additional property compared to MSE that it can be interpreted as a percentage error and, furthermore, its value is bounded from below by 0.
Unfortunately, it is not possible to use the adjustment if the series and the forecasts can take on opposite signs (as they could in the context of returns forecasts, for example). This is due to the fact that the prediction and the actual value may, purely by coincidence, take on values that are almost equal and opposite, thus almost cancelling each other out in the denominator. This leads to extremely large and erratic values of AMAPE. In such an instance, it is not possible to use MAPE as a criterion either.
Consider the following example: say we forecast a value of f_{t,s} = 3, but the out-turn is that y_{t+s} = 0.0001. The addition to total MSE from this one observation is given by

(1/391) × (0.0001 − 3)^2 = 0.0230    (5.185)

This value for the forecast is large, but perfectly feasible since in many cases it will be well within the range of the data. But the addition to total MAPE from just this single observation is given by

(100/391) × |(0.0001 − 3)/0.0001| = 7670    (5.186)
MAPE has the advantage that for a random walk in the log levels (i.e. a zero forecast), the criterion will take the value one (or 100 if we multiply the formula by 100 to get a percentage, as was the case for the equation above). So if a forecasting model gives a MAPE smaller than one (or 100), it is superior to the random walk model. In fact, the criterion is also not reliable if the series can take on absolute values less than one. This point may seem somewhat obvious, but it is clearly important for the choice of forecast evaluation criteria.
Another criterion which is popular is Theil’s U-statistic (1966). The metric is defined as follows

U = sqrt( Σ_{t=T_1}^{T} ((y_{t+s} − f_{t,s})/y_{t+s})^2 ) / sqrt( Σ_{t=T_1}^{T} ((y_{t+s} − fb_{t,s})/y_{t+s})^2 )    (5.187)

where fb_{t,s} is the forecast obtained from a benchmark model (typically a simple model such as a naive or random walk). A U-statistic of one implies that the model under consideration and the benchmark model are equally (in)accurate, while a value of less than one implies that the model is superior to the benchmark, and vice versa for U > 1. Although the measure is clearly useful, as Makridakis and Hibon (1995) argue, it is not without problems since if fb_{t,s} is the same as y_{t+s}, U will be infinite since the denominator will be zero. The value of U will also be influenced by outliers in a similar vein to MSE and has little intuitive meaning.³

³ Note that the Theil’s U-formula reported by EViews is slightly different.
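For completeness, the following sketch implements (5.183), (5.184) and (5.187) directly. The benchmark forecasts required for Theil’s U are assumed to be supplied by the user (for example, from a naive ‘no change’ model); function names are illustrative.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, as in (5.184)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(np.abs((actual - forecast) / actual))

def amape(actual, forecast):
    """Adjusted (symmetric) MAPE, as in (5.183)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(np.abs((actual - forecast) / (actual + forecast)))

def theil_u(actual, forecast, benchmark):
    """Theil's U-statistic (5.187): model forecast errors relative to those of
    a benchmark (e.g. naive/random walk) forecast, both scaled by the actuals."""
    actual, forecast, benchmark = map(lambda x: np.asarray(x, float),
                                      (actual, forecast, benchmark))
    num = np.sqrt(np.sum(((actual - forecast) / actual) ** 2))
    den = np.sqrt(np.sum(((actual - benchmark) / actual) ** 2))
    return num / den   # U < 1 indicates the model beats the benchmark
```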
5.11.9 Statistical versus financial or economic loss functions
Many econometric forecasting studies evaluate the models’ success using
statistical loss functions such as those described above. However, it is not
necessarily the case that models classed as accurate because they have small mean squared forecast errors are useful in practical situations. To give one specific illustration, it has recently been shown (Gerlow, Irwin and Liu, 1993) that the accuracy of forecasts according to traditional statistical criteria may give little guide to the potential profitability of employing those forecasts in a market trading strategy. So models that perform poorly on statistical grounds may still yield a profit if used for trading, and vice versa.
On the other hand, models that can accurately forecast the sign of future returns, or can predict turning points in a series, have been found to be more profitable (Leitch and Tanner, 1991). Two possible indicators of the ability of a model to predict direction changes irrespective of their magnitude are those suggested by Pesaran and Timmerman (1992) and by Refenes (1995). The relevant formulae to compute these measures are, respectively
% correct sign predictions = [1/(T − (T_1 − 1))] Σ_{t=T_1}^{T} z_{t+s}    (5.188)

where z_{t+s} = 1 if (y_{t+s} f_{t,s}) > 0
      z_{t+s} = 0 otherwise

and

% correct direction change predictions = [1/(T − (T_1 − 1))] Σ_{t=T_1}^{T} z_{t+s}    (5.189)

where z_{t+s} = 1 if (y_{t+s} − y_t)(f_{t,s} − y_t) > 0
      z_{t+s} = 0 otherwise
Thus, in each case, the criteria give the proportion of correctly predicted
signs and directional changes for some given lead time s, respectively.
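Both measures are simple proportions of indicator variables and can be computed as in the sketch below (argument names are illustrative). The inputs are the realised values y_{t+s}, the forecasts f_{t,s} and, for the direction-change measure, the values y_t from which each forecast was made.

```python
import numpy as np

def pct_correct_sign(actual_future, forecasts):
    """Proportion of correct sign predictions, as in (5.188):
    z = 1 when the realised value and the forecast have the same sign."""
    actual_future, forecasts = np.asarray(actual_future), np.asarray(forecasts)
    return np.mean((actual_future * forecasts) > 0)

def pct_correct_direction(actual_future, forecasts, current_values):
    """Proportion of correctly predicted direction changes, as in (5.189):
    z = 1 when the forecast and the outcome move the same way relative to y_t."""
    actual_future = np.asarray(actual_future)
    forecasts = np.asarray(forecasts)
    current_values = np.asarray(current_values)
    return np.mean(((actual_future - current_values)
                    * (forecasts - current_values)) > 0)
```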
Considering how strongly each of the three criteria outlined above (MSE, MAE and proportion of correct sign predictions) penalises large errors relative to small ones, the criteria can be ordered as follows:

Penalises large errors least → penalises large errors most heavily
Sign prediction → MAE → MSE
MSE penalises large errors disproportionately more heavily than small errors, MAE penalises large errors proportionately equally as heavily as small errors, while the sign prediction criterion does not penalise large errors any more than small errors.
5.11.10 Finance theory and time series analysis
An example of ARIMA model identification, estimation and forecasting in the context of commodity prices is given by Chu (1978). He finds ARIMA models useful compared with structural models for short-term forecasting, but also finds that they are less accurate over longer horizons. He also observed that ARIMA models have limited capacity to forecast unusual movements in prices.
Chu (1978) argues that, although ARIMA models may appear to be completely lacking in theoretical motivation and interpretation, this may not necessarily be the case. He cites several papers and offers an additional example to suggest that ARIMA specifications quite often arise naturally as reduced form equations (see chapter 6) corresponding to some underlying structural relationships. In such a case, not only would ARIMA models be convenient and easy to estimate, they could also be well grounded in financial or economic theory after all.
5.12 Forecasting using ARMA models in EViews
Once a specific model order has been chosen and the model estimated for a particular set of data, it may be of interest to use the model to forecast future values of the series. Suppose that the AR(2) model selected for the house price percentage changes series were estimated using observations February 1991–December 2004, leaving 29 remaining observations to construct forecasts for and to test forecast accuracy (for the period January 2005–May 2007).
Once the required model has been estimated and EViews has opened a
window displaying the output, click on the Forecast icon. In this instance,
the sample range to forecast would, of course, be 169–197 (which should be entered as 2005M01–2007M05). There are two methods available in EViews for constructing forecasts: dynamic and static. Select the option Dynamic to calculate multi-step forecasts starting from the first period in the forecast sample, or Static to calculate a sequence of one-step-ahead forecasts, rolling the sample forwards one observation after each forecast to use actual rather than forecasted values for lagged dependent variables. The outputs for the dynamic and static forecasts are given in screenshots 5.2 and 5.3.
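Although the text works in EViews, an analogous dynamic/static comparison could be run in Python with statsmodels; the sketch below assumes dhp is a pandas Series of monthly house price percentage changes with a monthly date index, which is not provided here.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assumed: dhp is a pandas Series with a monthly DatetimeIndex, 1991-02 to 2007-05
in_sample = dhp.loc["1991-02":"2004-12"]
out_sample = dhp.loc["2005-01":"2007-05"]

fit = ARIMA(in_sample, order=(2, 0, 0)).fit()       # AR(2) with a constant

# 'Dynamic' forecasts: multi-step-ahead, using forecasts of lagged y
dynamic_fc = fit.forecast(steps=len(out_sample))

# 'Static' forecasts: one-step-ahead, using actual lagged values; the hold-out
# data are appended without re-estimating the parameters
static_fc = fit.append(out_sample, refit=False).predict(
    start=out_sample.index[0], end=out_sample.index[-1], dynamic=False)
```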
The forecasts are plotted using the continuous line, while a confidence
interval is given by the two dotted lines in each case. For the dynamic forecasts, it is clearly evident that the forecasts quickly converge upon the long-term unconditional mean value as the horizon increases. Of course,
Screenshot 5.2  Plot and summary statistics for the dynamic forecasts for the percentage changes in house prices using an AR(2)
this does not occur with the series of one-step-ahead forecasts produced
by the ‘static’ command. Several other useful measures concerning the forecast errors are displayed in the plot box, including the square root of the mean squared error (RMSE), the MAE, the MAPE and Theil’s U-statistic. The MAPE for the dynamic and static forecasts for DHP are well over 100% in both cases, which can sometimes happen for the reasons outlined above. This indicates that the model forecasts are unable to account for much of the variability of the out-of-sample part of the data. This is to be expected as forecasting changes in house prices, along with the changes in the prices of any other assets, is difficult!
EViews provides another piece of useful information – a decomposition
of the forecast errors. The mean squared forecast error can be decomposed into a bias proportion, a variance proportion and a covariance proportion. The bias component measures the extent to which the mean of the forecasts is different to the mean of the actual data (i.e. whether the forecasts are biased). Similarly, the variance component measures the difference between the variation of the forecasts and the variation of the actual data, while the covariance component captures any remaining unsystematic part of the
Screenshot 5.3  Plot and summary statistics for the static forecasts for the percentage changes in house prices using an AR(2)
forecast errors. As one might have expected, the forecasts are not biased.
Accurate forecasts would be unbiased and also have a small variance proportion, so that most of the forecast error should be attributable to the covariance (unsystematic or residual) component. For further details, see Granger and Newbold (1986).
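The proportions EViews reports are based on a Theil-style decomposition of the mean squared forecast error. A sketch of that standard decomposition (assumed here rather than taken from the text) is given below; the three proportions sum to one.

```python
import numpy as np

def forecast_error_decomposition(actual, forecast):
    """Split the mean squared forecast error into bias, variance and
    covariance proportions (Theil-style decomposition)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    mse = np.mean((forecast - actual) ** 2)
    bias_prop = (forecast.mean() - actual.mean()) ** 2 / mse
    var_prop = (forecast.std() - actual.std()) ** 2 / mse          # population std
    r = np.corrcoef(forecast, actual)[0, 1]
    cov_prop = 2 * (1 - r) * forecast.std() * actual.std() / mse
    return bias_prop, var_prop, cov_prop
```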
A robust forecasting exercise would of course employ a longer out-of-
sample period than the two years or so used here, would perhaps employ several competing models in parallel, and would also compare the accuracy of the predictions by examining the error measures given in the box after the forecast plots.
5.13 Estimating exponential smoothing models using EViews
This class of models can be easily estimated in EViews by double clicking on the desired variable in the workfile, so that the spreadsheet for that variable appears, and selecting Proc on the button bar for that variable
and then Exponential Smoothing …. The screen with options will appear
as in screenshot 5.4.
Screenshot 5.4  Estimating exponential smoothing models
There is a variety of smoothing methods available, including single and double, or various methods to allow for seasonality and trends in the data. Select Single (exponential smoothing), which is the only smoothing method that has been discussed in this book, and specify the estimation sample period as 1991M1–2004M12 to leave 29 observations for out-of-sample forecasting. Clicking OK will give the results in the following
table.
Date: 09/02/07 Time: 14:46
Sample: 1991M02 2004M12
Included observations: 167
Method: Single Exponential
Original Series: DHP
Forecast Series: DHPSM
Parameters: Alpha                 0.0760
Sum of Squared Residuals          208.5130
Root Mean Squared Error           1.117399
End of Period Levels: Mean        0.994550
The output includes the value of the estimated smoothing coefficient
(= 0.076 in this case), together with the RSS for the in-sample estimation period and the RMSE for the 29 forecasts. The final in-sample smoothed value will be the forecast for those 29 observations (which in this case would be 0.994550). EViews has automatically saved the smoothed values (i.e. the model fitted values) and the forecasts in a series called ‘DHPSM’.
Key concepts
The key terms to be able to define and explain from this chapter are
●ARIMA models ●Ljung–Box test
●invertible MA ●Wold’s decomposition theorem
●autocorrelation function ●partial autocorrelation function
●Box-Jenkins methodology ●information criteria
●exponential smoothing ●recursive window
●rolling window ●out-of-sample
●multi-step forecast ●mean squared error
●mean absolute percentage error
Review questions
1. What are the differences between autoregressive and moving average
models?
2. Why might ARMA models be considered particularly useful for financial
time series? Explain, without using any equations or mathematical notation, the difference between AR, MA and ARMA processes.
3. Consider the following three models that a researcher suggests might
be a reasonable model of stock market prices

y_t = y_{t−1} + u_t    (5.190)
y_t = 0.5y_{t−1} + u_t    (5.191)
y_t = 0.8u_{t−1} + u_t    (5.192)
(a) What classes of models are these examples of?
(b) What would the autocorrelation function for each of these
processes look like? (You do not need to calculate the acf, simply consider what shape it might have given the class of model from which it is drawn.)
(c) Which model is more likely to represent stock market prices from a
theoretical perspective, and why? If any of the three models truly represented the way stock market prices move, which could
potentially be used to make money by forecasting future values of
the series?
(d) By making a series of successive substitutions or from your
knowledge of the behaviour of these types of processes, consider the extent of persistence of shocks in the series in each case.
4. (a) Describe the steps that Box and Jenkins (1976) suggested should
be involved in constructing an ARMA model.
(b) What particular aspect of this methodology has been the subject of
criticism and why?
(c) Describe an alternative procedure that could be used for this
aspect.
5. You obtain the following estimates for an AR(2) model of some returns
data
y_t = 0.803y_{t−1} + 0.682y_{t−2} + u_t

where u_t is a white noise error process. By examining the characteristic
equation, check the estimated model for stationarity.
6. A researcher is trying to determine the appropriate order of an ARMA
model to describe some actual data, with 200 observations available. She has the following figures for the log of the estimated residual variance (i.e. log(σ̂²)) for various candidate models. She has assumed that an order greater than (3,3) should not be necessary to model the dynamics of the data. What is the ‘optimal’ model order?

ARMA(p,q) model order    log(σ̂²)
(0,0)                    0.932
(1,0)                    0.864
(0,1)                    0.902
(1,1)                    0.836
(2,1)                    0.801
(1,2)                    0.821
(2,2)                    0.789
(3,2)                    0.773
(2,3)                    0.782
(3,3)                    0.764
7. How could you determine whether the order you suggested for question
6 was in fact appropriate?
8. ‘Given that the objective of any econometric modelling exercise is to
find the model that most closely ‘fits’ the data, then adding more lags
to an ARMA model will almost invariably lead to a better fit. Therefore a
large model is best because it will fit the data more closely.’ Comment on the validity (or otherwise) of this statement.
9. (a) You obtain the following sample autocorrelations and partial
autocorrelations for a sample of 100 observations from actual data:
Lag     1      2      3      4      5      6      7      8
acf 0.420 0.104 0.032 −0.206 −0.138 0.042 −0.018 0.074
pacf 0.632 0.381 0.268 0.199 0.205 0.101 0.096 0.082
Can you identify the most appropriate time series process for this
data?
(b) Use the Ljung–Box Q* test to determine whether the first three autocorrelation coefficients taken together are jointly significantly different from zero.
10. You have estimated the following ARMA(1,1) model for some time
series data
y_t = 0.036 + 0.69y_{t−1} + 0.42u_{t−1} + u_t

Suppose that you have data for time up to t − 1, i.e. you know that y_{t−1} = 3.4 and û_{t−1} = −1.3
(a) Obtain forecasts for the series y for times t, t + 1, and t + 2 using the estimated ARMA model.
(b) If the actual values for the series turned out to be −0.032, 0.961, 0.203 for t, t + 1, t + 2, calculate the (out-of-sample) mean squared error.
(c) A colleague suggests that a simple exponential smoothing model might be more useful for forecasting the series. The estimated value of the smoothing constant is 0.15, with the most recently available smoothed value, S_{t−1}, being 0.0305. Obtain forecasts for the series y for times t, t + 1, and t + 2 using this model.
(d) Given your answers to parts (a) to (c) of the question, determine whether Box–Jenkins or exponential smoothing models give the most accurate forecasts in this application.
11. (a) Explain what stylised shapes would be expected for the
autocorrelation and partial autocorrelation functions for the following stochastic processes:
●white noise
●an AR(2)
●an MA(1)
●an ARMA (2,1).
(b) Consider the following ARMA process.
y_t = 0.21 + 1.32y_{t−1} + 0.58u_{t−1} + u_t
Determine whether the MA part of the process is invertible.
(c) Produce 1-, 2-, 3- and 4-step-ahead forecasts for the process given in part (b).
(d) Outline two criteria that are available for evaluating the forecasts produced in part (c), highlighting the differing characteristics of each.
(e) What procedure might be used to estimate the parameters of an ARMA model? Explain, briefly, how such a procedure operates, and why OLS is not appropriate.
12. (a) Briefly explain any difference you perceive between the
characteristics of macroeconomic and financial data. Which of these features suggest the use of different econometric tools for each class of data?
(b) Consider the following autocorrelation and partial autocorrelation coefficients estimated using 500 observations for a weakly stationary series, y_t:

Lag      acf       pacf
1        0.307     0.307
2       −0.013     0.264
3        0.086     0.147
4        0.031     0.086
5       −0.197     0.049
Using a simple ‘rule of thumb’, determine which, if any, of the acf
and pacf coefficients are significant at the 5% level. Use both the Box–Pierce and Ljung–Box statistics to test the joint null hypothesis that the first five autocorrelation coefficients are jointly zero.
(c) What process would you tentatively suggest could represent the
most appropriate model for the series in part (b)? Explain your answer.
(d) Two researchers are asked to estimate an ARMA model for a daily
USD/GBP exchange rate return series, denoted x_t. Researcher A uses Schwarz’s criterion for determining the appropriate model order and arrives at an ARMA(0,1). Researcher B uses Akaike’s information criterion, which deems an ARMA(2,0) to be optimal. The estimated models are

A: x̂_t = 0.38 + 0.10u_{t−1}
B: x̂_t = 0.63 + 0.17x_{t−1} − 0.09x_{t−2}

where u_t is an error term.
You are given the following data for time until day z (i.e. t = z)

x_z = 0.31,  x_{z−1} = 0.02,  x_{z−2} = −0.16
u_z = −0.02,  u_{z−1} = 0.13,  u_{z−2} = 0.19

Produce forecasts for the next 4 days (i.e. for times z + 1, z + 2, z + 3, z + 4) from both models.
(e) Outline two methods proposed by Box and Jenkins (1970) for
determining the adequacy of the models proposed in part (d).
(f) Suppose that the actual values of the series x on days z + 1, z + 2, z + 3, z + 4 turned out to be 0.62, 0.19, −0.32, 0.72, respectively. Determine which researcher’s model produced the most accurate forecasts.
13. Select two of the stock series from the ‘CAPM.XLS’ Excel file, construct
a set of continuously compounded returns, and then perform a time-series analysis of these returns. The analysis should include
(a) An examination of the autocorrelation and partial autocorrelation functions.
(b) An estimation of the information criteria for each ARMA model order
from (0,0) to (5,5).
(c) An estimation of the model that you feel most appropriate given the
results that you found from the previous two parts of the question.
(d) The construction of a forecasting framework to compare the
forecasting accuracy of
i. Your chosen ARMA model
ii. An arbitrary ARMA(1,1)
iii. A single exponential smoothing model
iv. A random walk with drift in the log price levels (hint: this is easiest achieved by treating the returns as an ARMA(0,0) – i.e. simply estimating a model including only a constant).
(e) Then compare the fitted ARMA model with the models that were
estimated in chapter 4 based on exogenous variables. Which type of model do you prefer and why?
6
Multivariate models
Learning Outcomes
In this chapter, you will learn how to
●Compare and contrast single equation and systems-based
approaches to building models
●Discuss the cause, consequence and solution to simultaneous
equations bias
●Derive the reduced form equations from a structural model
●Describe several methods for estimating simultaneous
equations models
●Explain the relative advantages and disadvantages of VAR
modelling
●Determine whether an equation from a system is identified
●Estimate optimal lag lengths, impulse responses and variance
decompositions
●Conduct Granger causality tests
●Construct simultaneous equations models and VARs in EViews
6.1 Motivations
All of the structural models that have been considered thus far have been
single equations models of the form
y = Xβ + u    (6.1)
One of the assumptions of the classical linear regression model (CLRM)
is that the explanatory variables are non-stochastic, or fixed in repeated samples. There are various ways of stating this condition, some of which are slightly more or less strict, but all of which have the same broad
implication. It could also be stated that all of the variables contained in the X matrix are assumed to be exogenous – that is, their values are determined outside that equation. This is a rather simplistic working definition of exogeneity, although several alternatives are possible; this issue will be revisited later in the chapter. Another way to state this is that the model is ‘conditioned on’ the variables in X.
As stated in chapter 2, the X matrix is assumed not to have a probability distribution. Note also that causality in this model runs from X to y, and not vice versa, i.e. that changes in the values of the explanatory variables cause changes in the values of y, but that changes in the value of y will not impact upon the explanatory variables. On the other hand, y is an endogenous variable – that is, its value is determined by (6.1).
The purpose of the first part of this chapter is to investigate one of the important circumstances under which the assumption presented above will be violated. The impact on the OLS estimator of such a violation will then be considered.
To illustrate a situation in which such a phenomenon may arise, con-
sider the following two equations that describe a possible model for the total aggregate (country-wide) supply of new houses (or any other physical asset).

Q_{dt} = α + βP_t + γS_t + u_t    (6.2)
Q_{st} = λ + μP_t + κT_t + v_t    (6.3)
Q_{dt} = Q_{st}    (6.4)

where
Q_{dt} = quantity of new houses demanded at time t
Q_{st} = quantity of new houses supplied (built) at time t
P_t = (average) price of new houses prevailing at time t
S_t = price of a substitute (e.g. older houses)
T_t = some variable embodying the state of housebuilding technology,
and u_t and v_t are error terms.
Equation (6.2) is an equation for modelling the demand for new houses,
and (6.3) models the supply of new houses. (6.4) is an equilibrium condition for there to be no excess demand (people willing and able to buy new houses but cannot) and no excess supply (constructed houses that remain empty owing to lack of demand).
Assuming that the market always clears, that is, that the market is
always in equilibrium, and dropping the time subscripts for simplicity,
(6.2)–(6.4) can be written
Q = α + βP + γS + u    (6.5)
Q = λ + μP + κT + v    (6.6)

Equations (6.5) and (6.6) together comprise a simultaneous structural form of the model, or a set of structural equations. These are the equations incorporating the variables that economic or financial theory suggests should be related to one another in a relationship of this form. The point is that price and quantity are determined simultaneously (price affects quantity and quantity affects price). Thus, in order to sell more houses, everything else equal, the builder will have to lower the price. Equally, in order to obtain a higher price for each house, the builder should construct and expect to sell fewer houses. P and Q are endogenous variables, while S and T are exogenous.
A set of reduced form equations corresponding to (6.5) and (6.6) can be
obtained by solving (6.5) and (6.6) for P and for Q (separately). There will
be a reduced form equation for each endogenous variable in the system.
Solving for Q

α + βP + γS + u = λ + μP + κT + v    (6.7)

Solving for P

Q/β − α/β − γS/β − u/β = Q/μ − λ/μ − κT/μ − v/μ    (6.8)

Rearranging (6.7)

βP − μP = λ − α + κT − γS + v − u    (6.9)
(β − μ)P = (λ − α) + κT − γS + (v − u)    (6.10)
P = (λ − α)/(β − μ) + [κ/(β − μ)]T − [γ/(β − μ)]S + (v − u)/(β − μ)    (6.11)

Multiplying (6.8) through by βμ and rearranging

μQ − μα − μγS − μu = βQ − βλ − βκT − βv    (6.12)
μQ − βQ = μα − βλ − βκT + μγS + μu − βv    (6.13)
(μ − β)Q = (μα − βλ) − βκT + μγS + (μu − βv)    (6.14)
Q = (μα − βλ)/(μ − β) − [βκ/(μ − β)]T + [μγ/(μ − β)]S + (μu − βv)/(μ − β)    (6.15)
(6.11) and (6.15) are the reduced form equations for P and Q. They are the
equations that result from solving the simultaneous structural equations
given by (6.5) and (6.6). Notice that these reduced form equations have
only exogenous variables on the RHS.
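The algebra leading to (6.11) and (6.15) can be verified symbolically; the following sketch uses the sympy library with symbol names that mirror the structural model (illustrative only).

```python
import sympy as sp

P, Q, S, T, u, v = sp.symbols('P Q S T u v')
alpha, beta, gamma, lam, mu, kappa = sp.symbols('alpha beta gamma lambda mu kappa')

demand = sp.Eq(Q, alpha + beta * P + gamma * S + u)     # (6.5)
supply = sp.Eq(Q, lam + mu * P + kappa * T + v)         # (6.6)

# Solve the two structural equations simultaneously for the endogenous P and Q
reduced = sp.solve([demand, supply], [P, Q], dict=True)[0]
print(sp.simplify(reduced[P]))   # matches (6.11)
print(sp.simplify(reduced[Q]))   # matches (6.15)
```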
6.2 Simultaneous equations bias
It would not be possible to estimate (6.5) and (6.6) validly using OLS, as they are clearly related to one another since they both contain P and Q, and OLS would require them to be estimated separately. But what would have happened if a researcher had estimated them separately using OLS? Both equations depend on P. One of the CLRM assumptions was that X and u are independent (where X is a matrix containing all the variables on the RHS of the equation), and given also the assumption that E(u) = 0, then E(X′u) = 0, i.e. the errors are uncorrelated with the explanatory variables. But it is clear from (6.11) that P is related to the errors in (6.5) and (6.6) – i.e. it is stochastic. So this assumption has been violated.
What would be the consequences for the OLS estimator, β̂, if the simultaneity were ignored? Recall that

β̂ = (X′X)⁻¹X′y    (6.16)

and that

y = Xβ + u    (6.17)

Replacing y in (6.16) with the RHS of (6.17)

β̂ = (X′X)⁻¹X′(Xβ + u)    (6.18)

so that

β̂ = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u    (6.19)
β̂ = β + (X′X)⁻¹X′u    (6.20)

Taking expectations,

E(β̂) = E(β) + E((X′X)⁻¹X′u)    (6.21)
E(β̂) = β + E((X′X)⁻¹X′u)    (6.22)

If the Xs are non-stochastic (i.e. if the assumption had not been violated), E[(X′X)⁻¹X′u] = (X′X)⁻¹X′E[u] = 0, which would be the case in a single equation system, so that E(β̂) = β in (6.22). The implication is that the OLS estimator, β̂, would be unbiased.
But, if the equation is part of a system, then E[(X′X)⁻¹X′u] ≠ 0, in general, so that the last term in (6.22) will not drop out, and so it can be
concluded that application of OLS to structural equations which are part
of a simultaneous system will lead to biased coefficient estimates. This is known as simultaneity bias or simultaneous equations bias.
Is the OLS estimator still consistent, even though it is biased? No, in fact, the estimator is inconsistent as well, so that the coefficient estimates would still be biased even if an infinite amount of data were available, although proving this would require a level of algebra beyond the scope of this book.
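The practical consequence of simultaneity bias can be illustrated with a small Monte Carlo experiment: data are generated from the structural demand and supply system (6.5)–(6.6) via the reduced form for P, and the demand equation is then estimated by OLS as if P were exogenous. All parameter values below are made up for illustration; the average slope estimate does not settle down at the true β even in large samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_reps = 500, 1000
alpha, beta, lam, mu, gamma, kappa = 2.0, -1.0, 0.5, 1.5, 0.8, 0.6
slope_estimates = []

for _ in range(n_reps):
    S = rng.normal(size=n)           # exogenous demand shifter
    T = rng.normal(size=n)           # exogenous supply shifter
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    # Reduced form (6.11) for the endogenous price
    P = ((lam - alpha) + kappa * T - gamma * S + (v - u)) / (beta - mu)
    Q = alpha + beta * P + gamma * S + u   # structural demand equation (6.5)
    # Estimate the demand equation by OLS, ignoring the simultaneity
    X = np.column_stack([np.ones(n), P, S])
    b = np.linalg.lstsq(X, Q, rcond=None)[0]
    slope_estimates.append(b[1])

print(np.mean(slope_estimates))      # noticeably different from the true beta = -1
```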
6.3 So how can simultaneous equations models
be validly estimated?
Taking (6.11) and (6.15), i.e. the reduced form equations, they can be rewritten as

P = π_{10} + π_{11}T + π_{12}S + ε_1    (6.23)
Q = π_{20} + π_{21}T + π_{22}S + ε_2    (6.24)
where the π coefficients in the reduced form are simply combinations of the original coefficients, so that

π_{10} = (λ − α)/(β − μ),  π_{11} = κ/(β − μ),  π_{12} = −γ/(β − μ),  ε_1 = (v − u)/(β − μ),
π_{20} = (μα − βλ)/(μ − β),  π_{21} = −βκ/(μ − β),  π_{22} = μγ/(μ − β),  ε_2 = (μu − βv)/(μ − β)
Equations (6.23) and (6.24) can be estimated using OLS since all the RHS
variables are exogenous, so the usual requirements for consistency and unbiasedness of the OLS estimator will hold (provided that there are no other misspecifications). Estimates of the π_{ij} coefficients would thus be obtained. But the values of the π coefficients are probably not of much interest; what was wanted were the original parameters in the structural equations – α, β, γ, λ, μ, κ. The latter are the parameters whose values determine how the variables are related to one another according to financial or economic theory.
6.4 Can the original coefficients be retrieved from the πs?
The short answer to this question is ‘sometimes’, depending upon whether
the equations are identified. Identification is the issue of whether there is enough information in the reduced form equations to enable the structural form coefficients to be calculated. Consider the following demand
and supply equations
Q = α + βP    Supply equation    (6.25)
Q = λ + μP    Demand equation    (6.26)

It is impossible to tell which equation is which, so that if one simply observed some quantities of a good sold and the price at which they were sold, it would not be possible to obtain the estimates of α, β, λ and μ. This arises since there is insufficient information from the equations to estimate 4 parameters. Only 2 parameters could be estimated here, although each would be some combination of demand and supply parameters, and so neither would be of any use. In this case, it would be stated that both equations are unidentified (or not identified or underidentified). Notice that this problem would not have arisen with (6.5) and (6.6) since they have different exogenous variables.
6.4.1 What determines whether an equation is identified or not?
Any one of three possible situations could arise, as shown in box 6.1.
How can it be determined whether an equation is identified or not?
Broadly, the answer to this question depends upon how many and which variables are present in each structural equation. There are two conditions that could be examined to determine whether a given equation from a system is identified – the order condition and the rank condition:
●The order condition – is a necessary but not sufficient condition for an
equation to be identified. That is, even if the order condition is satisfied, the equation might not be identified.
●The rank condition – is a necessary and sufficient condition for identi-
fication. The structural equations are specified in a matrix form and the rank of a coefficient matrix of all of the variables excluded from a
Box 6.1 Determining whether an equation is identified
(1) An equation is unidentified, such as (6.25) or (6.26). In the case of an unidentified equation, structural coefficients cannot be obtained from the reduced form estimates by any means.
(2) An equation is exactly identified (just identified), such as (6.5) or (6.6). In the case of a just identified equation, unique structural form coefficient estimates can be obtained by substitution from the reduced form equations.
(3) If an equation is overidentified, more than one set of structural coefficients can be obtained from the reduced form. An example of this will be presented later in this chapter.
particular equation is examined. An examination of the rank condition
requires some technical algebra beyond the scope of this text.
Even though the order condition is not sufficient to ensure identification of an equation from a system, the rank condition will not be considered further here. For relatively simple systems of equations, the two rules would lead to the same conclusions. Also, in fact, most systems of equations in economics and finance are overidentified, so that underidentification is not a big issue in practice.
6.4.2 Statement of the order condition
There are a number of different ways of stating the order condition; thatemployed here is an intuitive one (taken from Ramanathan, 1995, p. 666,and slightly modified):
LetGdenote the number of structural equations. An equation is just
identified if the number of variables excluded from an equation is G−1,
where ‘excluded’ means the number of all endogenous and exogenousvariables that are not present in this particular equation. If more than
G−1 are absent, it is over-identified. If less than G−1 are absent, it is
not identified.
One obvious implication of this rule is that equations in a system can have
differing degrees of identification, as illustrated by the following example.
Example 6.1
In the following system of equations, the Ys are endogenous, while the
Xs are exogenous (with time subscripts suppressed). Determine whether
each equation is overidentified, underidentified, or just identified.
Y_1 = α_0 + α_1 Y_2 + α_3 Y_3 + α_4 X_1 + α_5 X_2 + u_1    (6.27)
Y_2 = β_0 + β_1 Y_3 + β_2 X_1 + u_2    (6.28)
Y_3 = γ_0 + γ_1 Y_2 + u_3    (6.29)
In this case, there are G = 3 equations and 3 endogenous variables. Thus, if the number of excluded variables is exactly 2, the equation is just identified. If the number of excluded variables is more than 2, the equation is overidentified. If the number of excluded variables is less than 2, the equation is not identified.
The variables that appear in one or more of the three equations are Y_1, Y_2, Y_3, X_1, X_2. Applying the order condition to (6.27)–(6.29):
●Equation (6.27): contains all variables, with none excluded, so that it is not identified
●Equation (6.28): has variables Y_1 and X_2 excluded, and so is just identified
●Equation (6.29): has variables Y_1, X_1, X_2 excluded, and so is overidentified.
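The counting in example 6.1 can be automated with a small helper that applies the order condition; the variable labels below refer only to this example.

```python
def order_condition(all_vars, eq_vars, n_equations):
    """Classify an equation using the order condition: compare the number of
    system variables excluded from it with G - 1."""
    excluded = len(set(all_vars) - set(eq_vars))
    if excluded == n_equations - 1:
        return "just identified"
    return "overidentified" if excluded > n_equations - 1 else "not identified"

system_vars = {"Y1", "Y2", "Y3", "X1", "X2"}
print(order_condition(system_vars, {"Y1", "Y2", "Y3", "X1", "X2"}, 3))  # (6.27)
print(order_condition(system_vars, {"Y2", "Y3", "X1"}, 3))              # (6.28)
print(order_condition(system_vars, {"Y3", "Y2"}, 3))                    # (6.29)
```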
6.5 Simultaneous equations in finance
There are of course numerous situations in finance where a simultaneous equations framework is more relevant than a single equation model. Two illustrations from the market microstructure literature are presented later in this chapter, while another, drawn from the banking literature, will be discussed now.
There has recently been much debate internationally, but especially in
the UK, concerning the effectiveness of competitive forces in the banking industry. Governments and regulators express concern at the increasing concentration in the industry, as evidenced by successive waves of merger activity, and at the enormous profits that many banks made in the late 1990s and early twenty-first century. They argue that such profits result from a lack of effective competition. However, many (most notably, of course, the banks themselves!) suggest that such profits are not the result of excessive concentration or anti-competitive practices, but rather partly arise owing to recent world prosperity at that phase of the business cycle (the ‘profits won’t last’ argument) and partly owing to massive cost-cutting by the banks, given recent technological improvements. These debates have fuelled a resurgent interest in models of banking profitability and banking competition. One such model is employed by Shaffer and DiSalvo (1994) in the context of two banks operating in south central Pennsylvania. The model is given by
ln q_{it} = a_0 + a_1 ln P_{it} + a_2 ln P_{jt} + a_3 ln Y_t + a_4 ln Z_t + a_5 t + u_{i1t}    (6.30)
ln TR_{it} = b_0 + b_1 ln q_{it} + Σ_{k=1}^{3} b_{k+1} ln w_{ikt} + u_{i2t}    (6.31)
where i = 1, 2 are the two banks, q is bank output, P_t is the price of the output at time t, Y_t is a measure of aggregate income at time t, Z_t is the price of a substitute for bank activity at time t, the variable t represents a time trend, TR_{it} is the total revenue of bank i at time t, w_{ikt} are the prices of input k (k = 1, 2, 3 for labour, bank deposits, and physical capital) for bank i at time t and the u are unobservable error terms. The coefficient estimates are not presented here, but suffice to say that a simultaneous framework, with the resulting model estimated separately using annual time series data for each bank, is necessary. Output is a function of price on the RHS of (6.30), while in (6.31), total revenue, which is a function of output on the RHS, is obviously related to price. Therefore, OLS is again an inappropriate estimation technique. Both of the equations in this system are overidentified, since there are only two equations, and the income, the substitute for banking activity, and the trend terms are missing from (6.31), whereas the three input prices are missing from (6.30).
6.6 A definition of exogeneity
Leamer (1985) defines a variable x as exogenous if the conditional distribution of y given x does not change with modifications of the process generating x. Although several slightly different definitions exist, it is possible to classify two forms of exogeneity – predeterminedness and strict exogeneity:
●A predetermined variable is one that is independent of the contemporaneous and future errors in that equation
●A strictly exogenous variable is one that is independent of all contemporaneous, future and past errors in that equation.
6.6.1 Tests for exogeneity
How can a researcher tell whether variables really need to be treated as endogenous or not? In other words, financial theory might suggest that there should be a two-way relationship between two or more variables, but how can it be tested whether a simultaneous equations model is necessary in practice?
Example 6.2
Consider again (6.27)–(6.29). Equation (6.27) contains Y2 and Y3 – but are separate equations required for them, or could the variables Y2 and Y3 be treated as exogenous variables (in which case, they would be called X3 and X4!)? This can be formally investigated using a Hausman test, which is calculated as shown in box 6.2.
Box 6.2 Conducting a Hausman test for exogeneity
(1) Obtain the reduced form equations corresponding to (6.27)–(6.29). The reduced form equations are obtained as follows.
Substituting in (6.28) for Y3 from (6.29):
Y2 = β0 + β1(γ0 + γ1Y2 + u3) + β2X1 + u2 (6.32)
Y2 = β0 + β1γ0 + β1γ1Y2 + β1u3 + β2X1 + u2 (6.33)
Y2(1 − β1γ1) = (β0 + β1γ0) + β2X1 + (u2 + β1u3) (6.34)
Y2 = (β0 + β1γ0)/(1 − β1γ1) + β2X1/(1 − β1γ1) + (u2 + β1u3)/(1 − β1γ1) (6.35)
(6.35) is the reduced form equation for Y2, since there are no endogenous variables on the RHS. Substituting in (6.27) for Y3 from (6.29)
Y1 = α0 + α1Y2 + α3(γ0 + γ1Y2 + u3) + α4X1 + α5X2 + u1 (6.36)
Y1 = α0 + α1Y2 + α3γ0 + α3γ1Y2 + α3u3 + α4X1 + α5X2 + u1 (6.37)
Y1 = (α0 + α3γ0) + (α1 + α3γ1)Y2 + α4X1 + α5X2 + (u1 + α3u3) (6.38)
Substituting in (6.38) for Y2 from (6.35):
Y1 = (α0 + α3γ0) + (α1 + α3γ1)[(β0 + β1γ0)/(1 − β1γ1) + β2X1/(1 − β1γ1) + (u2 + β1u3)/(1 − β1γ1)] + α4X1 + α5X2 + (u1 + α3u3) (6.39)
Y1 = [α0 + α3γ0 + (α1 + α3γ1)(β0 + β1γ0)/(1 − β1γ1)] + (α1 + α3γ1)β2X1/(1 − β1γ1) + (α1 + α3γ1)(u2 + β1u3)/(1 − β1γ1) + α4X1 + α5X2 + (u1 + α3u3) (6.40)
Y1 = [α0 + α3γ0 + (α1 + α3γ1)(β0 + β1γ0)/(1 − β1γ1)] + [(α1 + α3γ1)β2/(1 − β1γ1) + α4]X1 + α5X2 + [(α1 + α3γ1)(u2 + β1u3)/(1 − β1γ1) + (u1 + α3u3)] (6.41)
(6.41) is the reduced form equation for Y1. Finally, to obtain the reduced form equation for Y3, substitute in (6.29) for Y2 from (6.35)
Y3 = [γ0 + γ1(β0 + β1γ0)/(1 − β1γ1)] + γ1β2X1/(1 − β1γ1) + [γ1(u2 + β1u3)/(1 − β1γ1) + u3] (6.42)
So, the reduced form equations corresponding to (6.27)–(6.29) are, respectively, given by (6.41), (6.35) and (6.42). These three equations can also be expressed using πij for the coefficients, as discussed above
Y1 = π10 + π11X1 + π12X2 + v1 (6.43)
Y2 = π20 + π21X1 + v2 (6.44)
Y3 = π30 + π31X1 + v3 (6.45)
Estimate the reduced form equations (6.43)–(6.45) using OLS, and obtain the fitted values, Ŷ₁¹, Ŷ₂¹, Ŷ₃¹, where the superfluous superscript 1 denotes the fitted values from the reduced form estimation.
(2) Run the regression corresponding to (6.27) – i.e. the structural form equation, at
this stage ignoring any possible simultaneity.
(3) Run the regression (6.27) again, but now also including the fitted values from the reduced form equations, Ŷ₂¹, Ŷ₃¹, as additional regressors
Y1 = α0 + α1Y2 + α3Y3 + α4X1 + α5X2 + λ2Ŷ₂¹ + λ3Ŷ₃¹ + ε1 (6.46)
(4) Use an F-test to test the joint restriction that λ2 = 0 and λ3 = 0. If the null hypothesis is rejected, Y2 and Y3 should be treated as endogenous. If λ2 and λ3 are significantly different from zero, there is extra important information for modelling Y1 from the reduced form equations. On the other hand, if the null is not rejected, Y2 and Y3 can be treated as exogenous for Y1, and there is no useful additional information available for Y1 from modelling Y2 and Y3 as endogenous variables.
Steps 2–4 would then be repeated for (6.28) and (6.29).
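A compact Python sketch of steps (1)–(4) is given below. It is deliberately more generic than (6.27)–(6.29): to keep the augmented regression well behaved, it assumes a single potentially endogenous regressor y2, included exogenous regressors x1 and x2, and two further exogenous variables z1 and z2 that appear elsewhere in the system, with all series held in a pandas DataFrame df (all of these names are hypothetical).

import statsmodels.formula.api as smf

def hausman_exogeneity_test(df):
    """Regression form of the Hausman test that y2 can be treated as exogenous in
    the structural equation y1 = a0 + a1*y2 + a2*x1 + a3*x2 + u."""
    df = df.copy()
    # Step 1: reduced form for y2 on all exogenous variables in the system; save the fitted values
    df["y2_hat"] = smf.ols("y2 ~ x1 + x2 + z1 + z2", data=df).fit().fittedvalues
    # Steps 2-3: structural equation augmented with the reduced form fitted values
    augmented = smf.ols("y1 ~ y2 + x1 + x2 + y2_hat", data=df).fit()
    # Step 4: test the restriction that the coefficient on the fitted value term is zero
    return augmented.f_test("y2_hat = 0")

A rejection of the null would indicate that y2 should be treated as endogenous and the equation estimated by an instrumental variables method such as 2SLS, as described in the remainder of this chapter.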
6.7 Triangular systems
Consider the following system of equations, with time subscripts omitted for simplicity
Y1 = β10 + γ11X1 + γ12X2 + u1 (6.47)
Y2 = β20 + β21Y1 + γ21X1 + γ22X2 + u2 (6.48)
Y3 = β30 + β31Y1 + β32Y2 + γ31X1 + γ32X2 + u3 (6.49)
Assume that the error terms from each of the three equations are not
correlated with each other. Can the equations be estimated individually using OLS? At first blush, an appropriate answer to this question might appear to be, ‘No, because this is a simultaneous equations system.’ But consider the following:
●Equation (6.47): contains no endogenous variables, so X1 and X2 are not correlated with u1. So OLS can be used on (6.47).
●Equation (6.48): contains endogenous Y1 together with exogenous X1 and X2. OLS can be used on (6.48) if all the RHS variables in (6.48) are uncorrelated with that equation’s error term. In fact, Y1 is not correlated with u2 because there is no Y2 term in (6.47). So OLS can be used on (6.48).
●Equation (6.49): contains both Y1 and Y2; these are required to be uncorrelated with u3. By similar arguments to the above, (6.47) and (6.48) do not contain Y3. So OLS can be used on (6.49).
This is known as a recursive or triangular system, which is really a special case – a set of equations that looks like a simultaneous equations system, but isn’t. In fact, there is not a simultaneity problem here, since the dependence is not bi-directional: for each equation, it all goes one way.
6.8 Estimation procedures for simultaneous equations systems
Each equation that is part of a recursive system can be estimated separately using OLS. But in practice, not many systems of equations will be recursive, so a direct way to address the estimation of equations that are from a true simultaneous system must be sought. In fact, there are potentially many methods that can be used, three of which – indirect least squares, two-stage least squares and instrumental variables – will be discussed below.
6.8.1 Indirect least squares (ILS)
Although it is not possible to use OLS directly on the structural equations, it is possible to validly apply OLS to the reduced form equations. If the system is just identified, ILS involves estimating the reduced form equations using OLS, and then using them to substitute back to obtain the structural parameters. ILS is intuitive to understand in principle; however, it is not widely applied because:
(1) Solving back to get the structural parameters can be tedious. For a large system, the equations may be set up in a matrix form, and to solve them may therefore require the inversion of a large matrix.
(2) Most simultaneous equations systems are overidentified, and ILS can be used to obtain coefficients only for just identified equations. For overidentified systems, ILS would not yield unique structural form estimates.
ILS estimators are consistent and asymptotically efficient, but in general they are biased, so that in finite samples ILS will deliver biased structural form estimates. In a nutshell, the bias arises from the fact that the structural form coefficients under ILS estimation are transformations of the reduced form coefficients. When expectations are taken to test for unbiasedness, it is in general not the case that the expected value of a (non-linear) combination of reduced form coefficients will be equal to the combination of their expected values (see Gujarati, 1995, pp. 704–5 for a proof).
6.8.2 Estimation of just identified and overidentified systems using 2SLS
This technique is applicable for the estimation of overidentified systems, where ILS cannot be used. In fact, it can also be employed for estimating the coefficients of just identified systems, in which case the method would yield asymptotically equivalent estimates to those obtained from ILS.
Two-stage least squares (2SLS or TSLS) is done in two stages:
●Stage 1 Obtain and estimate the reduced form equations using OLS.
Save the fitted values for the dependent variables.
●Stage 2 Estimate the structural equations using OLS, but replace any
RHS endogenous variables with their stage 1 fitted values.
Example 6.3
Suppose that estimates of the structural coefficients in (6.27)–(6.29) are required. 2SLS would involve the following two steps:
●Stage 1 Estimate the reduced form equations (6.43)–(6.45) individually by OLS and obtain the fitted values, and denote them Ŷ₁¹, Ŷ₂¹, Ŷ₃¹, where the superfluous superscript 1 indicates that these are the fitted values from the first stage.
●Stage 2 Replace the RHS endogenous variables with their stage 1 estimated values
Y1 = α0 + α1Ŷ₂¹ + α3Ŷ₃¹ + α4X1 + α5X2 + u1 (6.50)
Y2 = β0 + β1Ŷ₃¹ + β2X1 + u2 (6.51)
Y3 = γ0 + γ1Ŷ₂¹ + u3 (6.52)
where Ŷ₂¹ and Ŷ₃¹ are the fitted values from the reduced form estimation. Now Ŷ₂¹ and Ŷ₃¹ will not be correlated with u1, Ŷ₃¹ will not be correlated with u2, and Ŷ₂¹ will not be correlated with u3. The simultaneity problem has therefore been removed. It is worth noting that the 2SLS estimator is consistent, but not unbiased.
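To make the two stages concrete, the sketch below performs them by hand in Python for a generic structural equation with a single endogenous regressor; the data-generating process and the names y1, y2, x1, z1 and z2 are hypothetical rather than the system above, and, as discussed below, the standard errors reported by the naive second-stage OLS are not valid 2SLS standard errors.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: y2 is endogenous because it depends on the structural error u1
x1 = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)     # exogenous variables excluded from the y1 equation
u1 = rng.normal(size=n)
y2 = 1.0 + 0.8 * x1 + 0.7 * z1 + 0.5 * z2 + 0.6 * u1 + rng.normal(size=n)
y1 = 2.0 + 1.5 * y2 + 0.5 * x1 + u1

# Stage 1: reduced form for the endogenous regressor on all exogenous variables
stage1 = sm.OLS(y2, sm.add_constant(np.column_stack([x1, z1, z2]))).fit()
y2_hat = stage1.fittedvalues

# Stage 2: structural equation with y2 replaced by its stage 1 fitted values
stage2 = sm.OLS(y1, sm.add_constant(np.column_stack([y2_hat, x1]))).fit()
print(stage2.params)     # the coefficient on y2_hat should be close to its true value of 1.5
# Note: the standard errors from stage2 require the 2SLS correction described below;
# packaged routines apply it automatically.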
In a simultaneous equations framework, it is still of concern whether the usual assumptions of the CLRM are valid or not, although some of the test statistics require modifications to be applicable in the systems context. Most econometrics packages will automatically make any required changes. To illustrate one potential consequence of the violation of the CLRM assumptions, if the disturbances in the structural equations are autocorrelated, the 2SLS estimator is not even consistent.
The standard error estimates also need to be modified compared with
their OLS counterparts (again, econometrics software will usually do this automatically), but once this has been done, the usual t-tests can be used to test hypotheses about the structural form coefficients. This modification arises as a result of the use of the reduced form fitted values on the RHS rather than actual variables, which implies that a modification to the error variance is required.
6.8.3 Instrumental variables
Broadly, the method of instrumental variables (IV) is another technique for parameter estimation that can be validly used in the context of a simultaneous equations system. Recall that the reason that OLS cannot be used directly on the structural equations is that the endogenous variables are correlated with the errors.
One solution to this would be not to use Y2 or Y3, but rather to use some other variables instead. These other variables should be (highly) correlated with Y2 and Y3, but not correlated with the errors – such variables would be known as instruments. Suppose that suitable instruments for Y2 and Y3 were found and denoted z2 and z3, respectively. The instruments are not used in the structural equations directly, but rather, regressions of the following form are run
Y2 = λ1 + λ2z2 + ε1 (6.53)
Y3 = λ3 + λ4z3 + ε2 (6.54)
Obtain the fitted values from (6.53) and (6.54), Ŷ₂¹ and Ŷ₃¹, and replace Y2 and Y3 with these in the structural equation. It is typical to use more than one instrument per endogenous variable. If the instruments are the variables in the reduced form equations, then IV is equivalent to 2SLS, so that the latter can be viewed as a special case of the former.
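In practice, a packaged IV/2SLS routine would normally be used rather than running the two stages manually, not least because it produces the corrected standard errors automatically. The sketch below uses the Python package linearmodels, which is an assumption about tooling rather than something used in this book, and hypothetical simulated data in which z1 and z2 serve as the excluded instruments.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(1)
n = 500
u = rng.normal(size=n)
df = pd.DataFrame({"x1": rng.normal(size=n), "z1": rng.normal(size=n), "z2": rng.normal(size=n)})
df["y2"] = 0.5 * df["x1"] + 0.8 * df["z1"] + 0.6 * df["z2"] + 0.5 * u + rng.normal(size=n)
df["y1"] = 2.0 + 1.5 * df["y2"] + 0.5 * df["x1"] + u

# Included exogenous regressors go in `exog`, the endogenous regressor in `endog`,
# and the excluded instruments in `instruments`
res = IV2SLS(dependent=df["y1"],
             exog=sm.add_constant(df[["x1"]]),
             endog=df["y2"],
             instruments=df[["z1", "z2"]]).fit()
print(res.params)     # the coefficient on y2 should be close to its true value of 1.5

If the instruments supplied are exactly the exogenous variables of the reduced form equations, this reproduces 2SLS, in line with the equivalence noted above.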
6.8.4 What happens if IV or 2SLS are used unnecessarily?
In other words, suppose that one attempted to estimate a simultaneous system when the variables specified as endogenous were in fact independent of one another. The consequences are similar to those of including irrelevant variables in a single equation OLS model. That is, the coefficient estimates will still be consistent, but will be inefficient compared to those obtained from applying OLS directly.
6.8.5 Other estimation techniques
There are, of course, many other estimation techniques available for
systems of equations, including three-stage least squares (3SLS), full information maximum likelihood (FIML) and limited information maximum likelihood (LIML). Three-stage least squares provides a third step in the estimation process that allows for non-zero covariances between the error terms in the structural equations. It is asymptotically more efficient than 2SLS since the latter ignores any information that may be available concerning the error covariances (and also any additional information that may be contained in the endogenous variables of other equations). Full information maximum likelihood involves estimating all of the equations in the system simultaneously using maximum likelihood (see chapter 8 for a discussion of the principles of maximum likelihood estimation). Thus under FIML, all of the parameters in all equations are treated jointly, and an appropriate likelihood function is formed and maximised. Finally, limited information maximum likelihood involves estimating each equation separately by maximum likelihood. LIML and 2SLS are asymptotically equivalent. For further technical details on each of these procedures, see Greene (2002, chapter 15).
The following section presents an application of the simultaneous equations approach in finance to the joint modelling of bid–ask spreads and trading activity in the S&P100 index options market. Two related applications of this technique that are also worth examining are by Wang et al. (1997) and by Wang and Yau (2000). The former employs a bivariate system to model trading volume and bid–ask spreads, and they show using a Hausman test that the two are indeed simultaneously related, so both must be treated as endogenous variables and are modelled using 2SLS. The latter paper employs a trivariate system to model trading volume, spreads and intra-day volatility.
6.9 An application of a simultaneous equations approach
to modelling bid–ask spreads and trading activity
6.9.1 Introduction
One of the most rapidly growing areas of empirical research in finance is the study of market microstructure. This research is involved with issues such as price formation in financial markets, how the structure of the market may affect the way it operates, determinants of the bid–ask spread, and so on. One application of simultaneous equations methods in the market microstructure literature is a study by George and Longstaff (1993). Among other issues, this paper considers the questions:
●Is trading activity related to the size of the bid–ask spread?
●How do spreads vary across options, and how is this related to the volume of contracts traded? ‘Across options’ in this case means for different maturities and strike prices for an option on a given underlying asset.
This chapter will now examine the George and Longstaff models, results and conclusions.
6.9.2 The data
The data employed by George and Longstaff comprise options prices on the S&P100 index, observed on all trading days during 1989. The S&P100 index has been traded on the Chicago Board Options Exchange (CBOE) since 1983 on a continuous open-outcry auction basis. The option price as used in the paper is defined as the average of the bid and the ask. The average bid and ask prices are calculated for each option during the time 2.00p.m.–2.15p.m. (US Central Standard Time) to avoid time-of-day effects, such as differences in behaviour at the open and the close of the market. The following are then dropped from the sample for that day to avoid any effects resulting from stale prices:
●Any options that do not have bid and ask quotes reported during the 1/4 hour
●Any options with fewer than ten trades during the day.
This procedure results in a total of 2,456 observations. A ‘pooled’ regression is conducted since the data have both time series and cross-sectional dimensions. That is, the data are measured every trading day and across options with different strikes and maturities, and the data are stacked in a single column for analysis.
6.9.3 How might the option price/trading volume and the
bid–ask spread be related?
George and Longstaff argue that the bid–ask spread will be determined by the interaction of market forces. Since there are many market makers trading the S&P100 contract on the CBOE, the bid–ask spread will be set to just cover marginal costs. There are three components of the costs associated with being a market maker. These are administrative costs, inventory holding costs, and ‘risk costs’. George and Longstaff consider three possibilities for how the bid–ask spread might be determined:
●Market makers equalise spreads across options. This is likely to be the case if order-processing (administrative) costs make up the majority of costs associated with being a market maker. This could be the case since the CBOE charges market makers the same fee for each option traded. In fact, for every contract (100 options) traded, a CBOE fee of 9 cents and an Options Clearing Corporation (OCC) fee of 10 cents is levied on the firm that clears the trade.
●The spread might be a constant proportion of the option value. This would be the case if the majority of the market maker’s cost is in inventory holding costs, since the more expensive options will cost more to hold and hence the spread would be set wider.
●Market makers might equalise marginal costs across options irrespective of trading volume. This would occur if the riskiness of an unwanted position were the most important cost facing market makers. Market makers typically do not hold a particular view on the direction of the market – they simply try to make money by buying and selling. Hence, they would like to be able to offload any unwanted (long or short) positions quickly. But trading is not continuous, and in fact the average time between trades in 1989 was approximately five minutes. The longer market makers hold an option, the higher the risk they face, since the higher the probability that there will be a large adverse price movement. Thus options with low trading volumes would command higher spreads since it is more likely that the market maker would be holding these options for longer.
In a non-quantitative exploratory analysis, George and Longstaff find that, comparing across contracts with different maturities, the bid–ask spread does indeed increase with maturity (as the option with longer maturity is worth more) and with ‘moneyness’ (that is, an option that is deeper in the money has a higher spread than one which is less in the money). This is seen to be true for both call and put options.
6.9.4 The influence of tick-size rules on spreads
The CBOE limits the tick size (the minimum granularity of price quotes), which will of course place a lower limit on the size of the spread. The tick sizes are:
●$1/8 for options worth $3 or more
●$1/16 for options worth less than $3.
6.9.5 The models and results
The intuition that the bid–ask spread and trading volume may be simultaneously related arises since a wider spread implies that trading is relatively more expensive, so that marginal investors would withdraw from the market. On the other hand, market makers face additional risk if the level of trading activity falls, and hence they may be expected to respond by increasing their fee (the spread). The models developed seek to simultaneously determine the size of the bid–ask spread and the time between trades.
For the calls, the model is:
CBAi = α0 + α1CDUMi + α2Ci + α3CLi + α4Ti + α5CRi + ei (6.55)
CLi = γ0 + γ1CBAi + γ2Ti + γ3Ti² + γ4Mi² + vi (6.56)
And symmetrically for the puts:
PBAi = β0 + β1PDUMi + β2Pi + β3PLi + β4Ti + β5PRi + ui (6.57)
PLi = δ0 + δ1PBAi + δ2Ti + δ3Ti² + δ4Mi² + wi (6.58)
where CBAi and PBAi are the call bid–ask spread and the put bid–ask spread for option i, respectively
Ci and Pi are the call price and put price for option i, respectively
CLi and PLi are the times between trades for the call and put option i, respectively
CRi and PRi are measures of risk for the call and put, respectively, given by the squares of their deltas
CDUMi and PDUMi are dummy variables to allow for the minimum tick size:
= 0 if Ci or Pi < $3
= 1 if Ci or Pi ≥ $3
T is the time to maturity
T² allows for a non-linear relationship between time to maturity and the spread
M² is the square of moneyness, which is employed in quadratic form since at-the-money options have a higher trading volume, while out-of-the-money and in-the-money options both have lower trading activity.
Equations (6.55) and (6.56), and then separately (6.57) and (6.58), are estimated using 2SLS. The results are given here in tables 6.1 and 6.2.
Table 6.1 Call bid–ask spread and trading volume regression
CBAi = α0 + α1CDUMi + α2Ci + α3CLi + α4Ti + α5CRi + ei (6.55)
CLi = γ0 + γ1CBAi + γ2Ti + γ3Ti² + γ4Mi² + vi (6.56)

α0          α1          α2          α3          α4          α5          Adj. R²
0.08362     0.06114     0.01679     0.00902     −0.00228    −0.15378    0.688
(16.80)     (8.63)      (15.49)     (14.01)     (−12.31)    (−12.52)

γ0          γ1          γ2          γ3          γ4          Adj. R²
−3.8542     46.592      −0.12412    0.00406     0.00866     0.618
(−10.50)    (30.49)     (−6.01)     (14.43)     (4.76)

Note: t-ratios in parentheses.
Source: George and Longstaff (1993). Reprinted with the permission of School of Business Administration, University of Washington.
Table 6.2 Put bid–ask spread and trading volume regression
PBAi = β0 + β1PDUMi + β2Pi + β3PLi + β4Ti + β5PRi + ui (6.57)
PLi = δ0 + δ1PBAi + δ2Ti + δ3Ti² + δ4Mi² + wi (6.58)

β0          β1          β2          β3          β4          β5          Adj. R²
0.05707     0.03258     0.01726     0.00839     −0.00120    −0.08662    0.675
(15.19)     (5.35)      (15.90)     (12.56)     (−7.13)     (−7.15)

δ0          δ1          δ2          δ3          δ4          Adj. R²
−2.8932     46.460      −0.15151    0.00339     0.01347     0.517
(−8.42)     (34.06)     (−7.74)     (12.90)     (10.86)

Note: t-ratios in parentheses.
Source: George and Longstaff (1993). Reprinted with the permission of School of Business Administration, University of Washington.
The adjusted R² ≈ 0.6 for all four equations, indicating that the variables selected do a good job of explaining the spread and the time between trades. George and Longstaff argue that strategic market maker behaviour, which cannot be easily modelled, is important in influencing the spread and that this precludes a higher adjusted R².
A next step in examining the empirical plausibility of the estimates is to consider the sizes, signs and significances of the coefficients. In the call and put spread regressions, respectively, α1 and β1 measure the tick size constraint on the spread – both are statistically significant and positive. α2 and β2 measure the effect of the option price on the spread. As expected, both of these coefficients are again significant and positive, since these are inventory or holding costs. The coefficient value of approximately 0.017 implies that a 1 dollar increase in the price of the option will on average lead to a 1.7 cent increase in the spread. α3 and β3 measure the effect of trading activity on the spread. Recalling that an inverse trading activity variable is used in the regressions, again, the coefficients have their correct sign. That is, as the time between trades increases (that is, as trading activity falls), the bid–ask spread widens. Furthermore, although the coefficient values are small, they are statistically significant. In the put spread regression, for example, the coefficient of approximately 0.009 implies that, even if the time between trades widened from one minute to one hour, the spread would increase by only 54 cents. α4 and β4 measure the effect of time to maturity on the spread; both are negative and statistically significant. The authors argue that this may arise as market making is a more risky activity for near-maturity options. A possible alternative explanation, which they dismiss after further investigation, is that the early exercise possibility becomes more likely for very short-dated options since the loss of time value would be negligible. Finally, α5 and β5 measure the effect of risk on the spread; in both the call and put spread regressions, these coefficients are negative and highly statistically significant. This seems an odd result, which the authors struggle to justify, for it seems to suggest that more risky options will command lower spreads.
Turning attention now to the trading activity regressions, γ1 and δ1 measure the effect of the spread size on call and put trading activity, respectively. Both are positive and statistically significant, indicating that a rise in the spread will increase the time between trades. The coefficients are such that a 1 cent increase in the spread would lead to an increase in the average time between call and put trades of nearly half a minute. γ2 and δ2 give the effect of an increase in time to maturity, while γ3 and δ3 are coefficients attached to the square of time to maturity. For both the call and put regressions, the coefficient on the level of time to maturity is negative and significant, while that on the square is positive and significant. As time to maturity increases, the squared term would dominate, and one could therefore conclude that the time between trades will show a U-shaped relationship with time to maturity. Finally, γ4 and δ4 give the effect of an increase in the square of moneyness (i.e. the effect of an option going deeper into the money or deeper out of the money) on the time between trades. For both the call and put regressions, the coefficients are statistically significant and positive, showing that as the option moves further from the money in either direction, the time between trades rises. This is consistent with the authors’ supposition that trade is most active in at-the-money options, and less active in both out-of-the-money and in-the-money options.
6.9.6 Conclusions
The value of the bid–ask spread on S&P100 index options and the time between trades (a measure of market liquidity) can be usefully modelled in a simultaneous system with exogenous variables such as the options’ deltas, time to maturity, moneyness, etc.
This study represents a nice example of the use of a simultaneous equations system, but, in this author’s view, it can be criticised on several grounds. First, there are no diagnostic tests performed. Second, clearly the equations are all overidentified, but it is not obvious how the overidentifying restrictions have been generated. Did they arise from consideration of financial theory? For example, why do the CL and PL equations not contain the CR and PR variables? Why do the CBA and PBA equations not contain moneyness or squared maturity variables? The authors could also have tested for endogeneity of CBA and CL. Finally, the wrong sign on the highly statistically significant squared deltas is puzzling.
6.10 Simultaneous equations modelling using EViews
What is the relationship between inflation and stock returns? Holding stocks is often thought to provide a good hedge against inflation, since the payments to equity holders are not fixed in nominal terms and represent a claim on real assets (unlike the coupons on bonds, for example). However, the majority of empirical studies that have investigated the sign of this relationship have found it to be negative. Various explanations of this puzzling empirical phenomenon have been proposed, including a link through real activity, so that real activity is negatively related to inflation but positively related to stock returns, and therefore stock returns and inflation vary positively. Clearly, inflation and stock returns ought to be simultaneously related given that the rate of inflation will affect the discount rate applied to cashflows and therefore the value of equities, but the performance of the stock market may also affect consumer demand and therefore inflation through its impact on householder wealth (perceived or actual).¹

¹ Crucially, good econometric models are based on solid financial theory. This model is clearly not, but represents a simple way to illustrate the estimation and interpretation of simultaneous equations models using EViews with freely available data!
This simple example uses the same macroeconomic data as used previously to estimate this relationship simultaneously. Suppose (without justification) that we wish to estimate the following model, which does not allow for dynamic effects or partial adjustments and does not distinguish between expected and unexpected inflation
inflationt = α0 + α1returnst + α2dcreditt + α3dprodt + α4dmoneyt + u1t (6.59)
returnst = β0 + β1dprodt + β2dspreadt + β3inflationt + β4rtermt + u2t (6.60)
where ‘returns’ are stock returns and all of the other variables are defined
as in the previous example in chapter 4.
It is evident that there is feedback between the two equations since the inflation variable appears in the stock returns equation and vice versa. Are the equations identified? Since there are two equations, each will be identified if one variable is missing from that equation. Equation (6.59), the inflation equation, omits two variables. It does not contain the default spread or the term spread, and so is over-identified. Equation (6.60), the stock returns equation, omits two variables as well – the consumer credit and money supply variables – and so is over-identified too. Two-stage least squares (2SLS) is therefore the appropriate technique to use.
In EViews, to do this we need to specify a list of instruments, which would be all of the variables from the reduced form equation. In this case, the reduced form equations would be
inflation = f(constant, dprod, dspread, rterm, dcredit, qrev, dmoney) (6.61)
returns = g(constant, dprod, dspread, rterm, dcredit, qrev, dmoney) (6.62)
We can perform both stages of 2SLS in one go, but by default, EViews estimates each of the two equations in the system separately. To do this, click Quick, Estimate Equation and then select TSLS – Two-Stage Least Squares (TSNLS and ARMA) from the list of estimation methods. Then fill in the dialog box as in screenshot 6.1 to estimate the inflation equation.
Thus the format of writing out the variables in the first window is as usual, and the full structural equation for inflation as a dependent variable should be specified here. In the instrument list, include every variable from the reduced form equation, including the constant, and click OK.
The results would then appear as in the following table.
Dependent Variable: INFLATION
Method: Two-Stage Least Squares
Date: 09/02/07  Time: 20:55
Sample (adjusted): 1986M04 2007M04
Included observations: 253 after adjustments
Instrument list: C DCREDIT DPROD RTERM DSPREAD DMONEY

                Coefficient    Std. Error    t-Statistic    Prob.
C               0.066248       0.337932      0.196038       0.8447
DPROD           0.068352       0.090839      0.752453       0.4525
DCREDIT         4.77E-07       1.38E-05      0.034545       0.9725
DMONEY          0.027426       0.05882       0.466266       0.6414
RSANDP          0.238047       0.363113      0.655573       0.5127

R-squared            −15.398762    Mean dependent var    0.253632
Adjusted R-squared   −15.663258    S.D. dependent var    0.269221
S.E. of regression     1.098980    Sum squared resid     299.5236
F-statistic            0.179469    Durbin-Watson stat    1.923274
Prob(F-statistic)      0.948875    Second-Stage SSR      17.39799
Similarly, the dialog box for the rsandp equation would be specified as in screenshot 6.2. The output for the returns equation is shown in the following table.
Dependent Variable: RSANDP
Method: Two-Stage Least Squares
Date: 09/02/07  Time: 20:30
Sample (adjusted): 1986M04 2007M04
Included observations: 253 after adjustments
Instrument list: C DCREDIT DPROD RTERM DSPREAD DMONEY

                Coefficient    Std. Error    t-Statistic    Prob.
C                0.682709       3.531687      0.193310      0.8469
DPROD           −0.242299       0.251263     −0.964322      0.3358
DSPREAD         −2.517793      10.57406      −0.238110      0.8120
RTERM            0.138109       1.263541      0.109303      0.9131
INFLATION        0.322398      14.10926       0.02285       0.9818

R-squared              0.006553    Mean dependent var    0.721483
Adjusted R-squared    −0.009471    S.D. dependent var    4.355220
S.E. of regression     4.375794    Sum squared resid     4748.599
F-statistic            0.688494    Durbin-Watson stat    2.017386
Prob(F-statistic)      0.600527    Second-Stage SSR      4727.189
Screenshot 6.1 Estimating the inflation equation
The results overall are not very enlightening. None of the parameters is even close to statistical significance in either equation, although interestingly, the fitted relationship between the stock returns and inflation series is positive (albeit not significantly so). The R̄² values from both equations are also negative, so should be interpreted with caution. As the EViews User’s Guide warns, this can sometimes happen even when there is an intercept in the regression.
It may also be of relevance to conduct a Hausman test for the endogeneity of the inflation and stock return variables. To do this, estimate the reduced form equations and save the residuals. Then create series of fitted values by constructing new variables which are equal to the actual values minus the residuals. Call the fitted value series inflation_fit and rsandp_fit. Then estimate the structural equations (separately), adding the fitted values from the relevant reduced form equations.
Screenshot 6.2 Estimating the rsandp equation
The two sets of variables (in EViews format, with the dependent variables first followed by the lists of independent variables) are as follows.
For the stock returns equation:
rsandp c dprod dspread rterm inflation inflation_fit
and for the inflation equation:
inflation c dprod dcredit dmoney rsandp rsandp_fit
The conclusion is that the inflation fitted value term is not significant in the stock return equation and so inflation can be considered exogenous for stock returns. Thus it would be valid to simply estimate this equation (minus the fitted value term) on its own using OLS. But the fitted stock return term is significant in the inflation equation, suggesting that stock returns are endogenous.
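For readers who prefer to work outside EViews, a rough Python sketch of the inflation equation (6.59) estimated by 2SLS is given below. The file name macro.csv and the lower-case column names are assumptions about how the chapter’s monthly series might be stored, and the linearmodels package is used for the estimation, so small numerical differences from the EViews output (for instance in sample alignment) should be expected.

import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

macro = pd.read_csv("macro.csv")    # hypothetical file holding the monthly series used in this chapter

# Inflation equation (6.59): rsandp is the endogenous regressor, and the exogenous
# variables excluded from this equation (rterm and dspread) act as the instruments
res_inf = IV2SLS(dependent=macro["inflation"],
                 exog=sm.add_constant(macro[["dprod", "dcredit", "dmoney"]]),
                 endog=macro["rsandp"],
                 instruments=macro[["rterm", "dspread"]]).fit(cov_type="unadjusted")
print(res_inf.summary)

# The returns equation (6.60) would be estimated symmetrically, with inflation as the
# endogenous regressor and dcredit and dmoney as the excluded instruments.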
6.11 Vector autoregressive models
Vector autoregressive models (VARs) were popularised in econometrics by Sims (1980) as a natural generalisation of the univariate autoregressive models discussed in chapter 5. A VAR is a systems regression model (i.e. there is more than one dependent variable) that can be considered a kind of hybrid between the univariate time series models considered in chapter 5 and the simultaneous equations models developed previously in this chapter. VARs have often been advocated as an alternative to large-scale simultaneous equations structural models.
The simplest case that can be entertained is a bivariate VAR, where there are only two variables, y1t and y2t, each of whose current values depend on different combinations of the previous k values of both variables, and error terms
y1t = β10 + β11y1t−1 + ··· + β1ky1t−k + α11y2t−1 + ··· + α1ky2t−k + u1t (6.63)
y2t = β20 + β21y2t−1 + ··· + β2ky2t−k + α21y1t−1 + ··· + α2ky1t−k + u2t (6.64)
where uit is a white noise disturbance term with E(uit) = 0, (i = 1, 2), E(u1tu2t) = 0.
As should already be evident, an important feature of the VAR model is its flexibility and the ease of generalisation. For example, the model could be extended to encompass moving average errors, which would be a multivariate version of an ARMA model, known as a VARMA. Instead of having only two variables, y1t and y2t, the system could also be expanded to include g variables, y1t, y2t, y3t, ..., ygt, each of which has an equation.
Another useful facet of VAR models is the compactness with which the notation can be expressed. For example, consider the case from above where k = 1, so that each variable depends only upon the immediately previous values of y1t and y2t, plus an error term. This could be written as
y1t=β10+β11y1t−1+α11y2t−1+u1t (6.65)
y2t=β20+β21y2t−1+α21y1t−1+u2t (6.66)
or

[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]        (6.67)

or even more compactly as

yt = β0 + β1yt−1 + ut        (6.68)

where yt, β0 and ut are each g × 1 and β1 is g × g.
In (6.68), there are g = 2 variables in the system. Extending the model to the case where there are k lags of each variable in each equation is also easily accomplished using this notation

yt = β0 + β1yt−1 + β2yt−2 + ··· + βkyt−k + ut        (6.69)

where yt, β0 and ut are each g × 1 and β1, β2, ..., βk are each g × g.
The model could be further extended to the case where it includes first difference terms and cointegrating relationships (a vector error correction model (VECM) – see chapter 7).
6.11.1 Advantages of VAR modelling
VAR models have several advantages compared with univariate time series models or simultaneous equations structural models:
●The researcher does not need to specify which variables are endogenous or exogenous – all are endogenous. This is a very important point, since a requirement for simultaneous equations structural models to be estimable is that all equations in the system are identified. Essentially, this requirement boils down to a condition that some variables are treated as exogenous and that the equations contain different RHS variables. Ideally, this restriction should arise naturally from financial or economic theory. However, in practice theory will be at best vague in its suggestions of which variables should be treated as exogenous. This leaves the researcher with a great deal of discretion concerning how to classify the variables. Since Hausman-type tests are often not employed in practice when they should be, the specification of certain variables as exogenous, required to form identifying restrictions, is likely in many cases to be invalid. Sims termed these identifying restrictions ‘incredible’. VAR estimation, on the other hand, requires no such restrictions to be imposed.
●VARs allow the value of a variable to depend on more than just its own lags or combinations of white noise terms, so VARs are more flexible than univariate AR models; the latter can be viewed as a restricted case of VAR models. VAR models can therefore offer a very rich structure, implying that they may be able to capture more features of the data.
●Provided that there are no contemporaneous terms on the RHS of the equations, it is possible to simply use OLS separately on each equation. This arises from the fact that all variables on the RHS are pre-determined – that is, at time t, they are known. This implies that there is no possibility for feedback from any of the LHS variables to any of the RHS variables. Pre-determined variables include all exogenous variables and lagged values of the endogenous variables.
●The forecasts generated by VARs are often better than ‘traditional structural’ models. It has been argued in a number of articles (see, for example, Sims, 1980) that large-scale structural models performed badly in terms of their out-of-sample forecast accuracy. This could perhaps arise as a result of the ad hoc nature of the restrictions placed on the structural models to ensure identification discussed above. McNees (1986) shows that forecasts for some variables (e.g. the US unemployment rate and real GNP, etc.) are produced more accurately using VARs than from several different structural specifications.
6.11.2 Problems with VARs
VAR models of course also have drawbacks and limitations relative to other model classes:
●VARs are a-theoretical (as are ARMA models), since they use little theoretical information about the relationships between the variables to guide the specification of the model. On the other hand, valid exclusion restrictions that ensure identification of equations from a simultaneous structural system will inform on the structure of the model. An upshot of this is that VARs are less amenable to theoretical analysis and therefore to policy prescriptions. There also exists an increased possibility under the VAR approach that a hapless researcher could obtain an essentially spurious relationship by mining the data. It is also often not clear how the VAR coefficient estimates should be interpreted.
●How should the appropriate lag lengths for the VAR be determined? There are several approaches available for dealing with this issue, which will be discussed below.
●So many parameters! If there are g equations, one for each of g variables and with k lags of each of the variables in each equation, (g + kg²) parameters will have to be estimated. For example, if g = 3 and k = 3 there will be 30 parameters to estimate. For relatively small sample sizes, degrees of freedom will rapidly be used up, implying large standard errors and therefore wide confidence intervals for model coefficients.
●Should all of the components of the VAR be stationary? Obviously, if one wishes to use hypothesis tests, either singly or jointly, to examine the statistical significance of the coefficients, then it is essential that all of the components in the VAR are stationary. However, many proponents of the VAR approach recommend that differencing to induce stationarity should not be done. They would argue that the purpose of VAR estimation is purely to examine the relationships between the variables, and that differencing will throw information on any long-run relationships between the series away. It is also possible to combine levels and first differenced terms in a VECM – see chapter 7.
6.11.3 Choosing the optimal lag length for a VAR
Often, financial theory will have little to say on what is an appropriate lag length for a VAR and how long changes in the variables should take to work through the system. In such instances, there are broadly two methods that could be used to arrive at the optimal lag length: cross-equation restrictions and information criteria.
6.11.4 Cross-equation restrictions for VAR lag length selection
A first (but incorrect) response to the question of how to determine the appropriate lag length would be to use the block F-tests highlighted in section 6.13 below. These, however, are not appropriate in this case, as the F-test would be used separately for the set of lags in each equation, and what is required here is a procedure to test the coefficients on a set of lags on all variables for all equations in the VAR at the same time.
It is worth noting here that in the spirit of VAR estimation (as Sims, for example, thought that model specification should be conducted), the models should be as unrestricted as possible. A VAR with different lag lengths for each equation could be viewed as a restricted VAR. For example, consider a VAR with 3 lags of both variables in one equation and 4 lags of each variable in the other equation. This could be viewed as a restricted model where the coefficients on the fourth lag of each variable in the first equation have been set to zero.
An alternative approach would be to specify the same number of lags in each equation and to determine the model order as follows. Suppose that a VAR estimated using quarterly data has 8 lags of the two variables in each equation, and it is desired to examine a restriction that the coefficients on lags 5–8 are jointly zero. This can be done using a likelihood ratio test (see chapter 8 for more general details concerning such tests). Denote the variance–covariance matrix of the residuals (given by ûû′) as Σ̂. The likelihood ratio test for this joint hypothesis is given by

LR = T[log|Σ̂r| − log|Σ̂u|]        (6.70)

where |Σ̂r| is the determinant of the variance–covariance matrix of the residuals for the restricted model (with 4 lags), |Σ̂u| is the determinant of the variance–covariance matrix of the residuals for the unrestricted VAR (with 8 lags) and T is the sample size. The test statistic is asymptotically distributed as a χ² variate with degrees of freedom equal to the total number of restrictions. In the VAR case above, 4 lags of two variables are being restricted in each of the 2 equations = a total of 4 × 2 × 2 = 16 restrictions. In the general case of a VAR with g equations, to impose the restriction that the last q lags have zero coefficients, there would be g²q restrictions altogether. Intuitively, the test is a multivariate equivalent to examining the extent to which the RSS rises when a restriction is imposed. If Σ̂r and Σ̂u are ‘close together’, the restriction is supported by the data.
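A sketch of this LR test in Python is given below; the simulated data and the use of statsmodels for the restricted and unrestricted estimations are illustrative assumptions, while the residual covariance matrices and the g²q degrees of freedom follow the definitions above.

import numpy as np
from scipy import stats
from statsmodels.tsa.api import VAR

# Simulated stationary bivariate data; the true process is a VAR(1), so lags 5-8 are redundant
rng = np.random.default_rng(0)
T = 600
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

def resid_cov(res):
    """Residual variance-covariance matrix (u'u)/T entering (6.70)."""
    u = np.asarray(res.resid)
    return (u.T @ u) / len(u)

unrestricted = VAR(y).fit(8)       # effective sample: observations 8, ..., T-1
restricted = VAR(y[4:]).fit(4)     # same effective sample, but only 4 lags of each variable

T_used = unrestricted.nobs
lr = T_used * (np.log(np.linalg.det(resid_cov(restricted)))
               - np.log(np.linalg.det(resid_cov(unrestricted))))
dof = 2 ** 2 * 4                   # g^2 * q restrictions: g = 2 variables, q = 4 excluded lags
print(lr, 1 - stats.chi2.cdf(lr, dof))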
6.11.5 Information criteria for VAR lag length selection
The likelihood ratio (LR) test explained above is intuitive and fairly easy to estimate, but has its limitations. Principally, one of the two VARs must be a special case of the other and, more seriously, only pairwise comparisons can be made. In the above example, if the most appropriate lag length had been 7 or even 10, there is no way that this information could be gleaned from the LR test conducted. One could achieve this only by starting with a VAR(10), and successively testing one set of lags at a time.
A further disadvantage of the LR test approach is that the χ² test will strictly be valid asymptotically only under the assumption that the errors from each equation are normally distributed. This assumption is unlikely to be upheld for financial data. An alternative approach to selecting the appropriate VAR lag length would be to use an information criterion, as defined in chapter 5 in the context of ARMA model selection. Information criteria require no such normality assumptions concerning the distributions of the errors. Instead, the criteria trade off a fall in the RSS of each equation as more lags are added, with an increase in the value of the penalty term. The univariate criteria could be applied separately to each equation but, again, it is usually deemed preferable to require the number of lags to be the same for each equation. This requires the use of multivariate versions of the information criteria, which can be defined as
MAIC = log|Σ̂| + 2k′/T (6.71)
MSBIC = log|Σ̂| + (k′/T) log(T) (6.72)
MHQIC = log|Σ̂| + (2k′/T) log(log(T)) (6.73)
Multivariate models 295
where again Σ̂ is the variance–covariance matrix of residuals, T is the number of observations and k′ is the total number of regressors in all equations, which will be equal to p²k + p for p equations in the VAR system, each with k lags of the p variables, plus a constant term in each equation. As previously, the values of the information criteria are constructed for 0, 1, ..., k̄ lags (up to some pre-specified maximum k̄), and the chosen number of lags is that number minimising the value of the given information criterion.
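Most packages automate this search. For instance, statsmodels reports multivariate information criteria for each lag length up to a chosen maximum, as in the brief sketch below, which reuses the simulated y array and the VAR class from the previous sketch; the maximum of 12 lags is an arbitrary illustrative choice.

# Report multivariate information criteria for lag lengths 0, ..., 12 and the lag
# length that minimises each criterion (AIC, BIC/SBIC, HQIC and FPE)
selection = VAR(y).select_order(maxlags=12)
print(selection.summary())
print(selection.selected_orders)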
6.12 Does the VAR include contemporaneous terms?
So far, it has been assumed that the VAR specified is of the form
y1t=β10+β11y1t−1+α11y2t−1+u1t (6.74)
y2t=β20+β21y2t−1+α21y1t−1+u2t (6.75)
so that there are no contemporaneous terms on the RHS of (6.74) or (6.75) – i.e. there is no term in y2t on the RHS of the equation for y1t and no term in y1t on the RHS of the equation for y2t. But what if the equations had a contemporaneous feedback term, as in the following case?
y1t=β10+β11y1t−1+α11y2t−1+α12y2t+u1t (6.76)
y2t=β20+β21y2t−1+α21y1t−1+α22y1t+u2t (6.77)
Equations (6.76) and (6.77) could also be written by stacking up the terms into matrices and vectors:

[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ α12   0  ] [ y2t ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [  0   α22 ] [ y1t ] + [ u2t ]        (6.78)
This would be known as a VAR in primitive form, similar to the structural form for a simultaneous equations model. Some researchers have argued that the a-theoretical nature of reduced form VARs leaves them unstructured and their results difficult to interpret theoretically. They argue that the forms of VAR given previously are merely reduced forms of a more general structural VAR (such as (6.78)), with the latter being of more interest.
The contemporaneous terms from (6.78) can be taken over to the LHS and written as

[   1   −α12 ] [ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ −α22    1  ] [ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]        (6.79)
or

Ayt = β0 + β1yt−1 + ut        (6.80)

If both sides of (6.80) are pre-multiplied by A⁻¹,

yt = A⁻¹β0 + A⁻¹β1yt−1 + A⁻¹ut        (6.81)

or

yt = A0 + A1yt−1 + et        (6.82)
This is known as a standard form VAR, which is akin to the reduced form from a set of simultaneous equations. This VAR contains only pre-determined values on the RHS (i.e. variables whose values are known at time t), and so there is no contemporaneous feedback term. This VAR can therefore be estimated equation by equation using OLS.
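The move from (6.80) to (6.82) is just a matrix inversion, as the short numerical sketch below illustrates for made-up parameter values, with α22 restricted to zero in anticipation of the identification discussion that follows.

import numpy as np

# Hypothetical primitive-form parameter values for (6.79)-(6.80), with alpha22 set to zero
alpha12, alpha22 = 0.4, 0.0
A = np.array([[1.0, -alpha12],
              [-alpha22, 1.0]])
beta0 = np.array([0.1, 0.2])              # primitive-form intercepts
beta1 = np.array([[0.5, 0.3],
                  [0.1, 0.2]])            # primitive-form coefficients on the lagged variables

A_inv = np.linalg.inv(A)
A0 = A_inv @ beta0                        # standard-form intercepts, as in (6.81)-(6.82)
A1 = A_inv @ beta1                        # standard-form lag coefficient matrix
print(A0)
print(A1)                                 # these are what equation-by-equation OLS would estimate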
Equation (6.78), the structural or primitive form VAR, is not identified, since identical pre-determined (lagged) variables appear on the RHS of both equations. In order to circumvent this problem, a restriction that one of the coefficients on the contemporaneous terms is zero must be imposed. In (6.78), either α12 or α22 must be set to zero to obtain a triangular set of VAR equations that can be validly estimated. The choice of which of these two restrictions to impose is ideally made on theoretical grounds. For example, if financial theory suggests that the current value of y1t should affect the current value of y2t but not the other way around, set α12 = 0, and so on. Another possibility would be to run separate estimations, first imposing α12 = 0 and then α22 = 0, to determine whether the general features of the results are much changed. It is also very common to estimate only a reduced form VAR, which is of course perfectly valid provided that such a formulation is not at odds with the relationships between variables that financial theory says should hold.
One fundamental weakness of the VAR approach to modelling is that its a-theoretical nature and the large number of parameters involved make the estimated models difficult to interpret. In particular, some lagged variables may have coefficients which change sign across the lags, and this, together with the interconnectivity of the equations, could render it difficult to see what effect a given change in a variable would have upon the future values of the variables in the system. In order to partially alleviate this problem, three sets of statistics are usually constructed for an estimated VAR model: block significance tests, impulse responses and variance decompositions. How important an intuitively interpretable model is will of course depend on the purpose of constructing the model. Interpretability may not be an issue at all if the purpose of producing the VAR is to make forecasts.
Table 6.3 Granger causality tests and implied restrictions on VAR models

Hypothesis                                     Implied restriction
1 Lags of y1t do not explain current y2t       β21 = 0 and γ21 = 0 and δ21 = 0
2 Lags of y1t do not explain current y1t       β11 = 0 and γ11 = 0 and δ11 = 0
3 Lags of y2t do not explain current y1t       β12 = 0 and γ12 = 0 and δ12 = 0
4 Lags of y2t do not explain current y2t       β22 = 0 and γ22 = 0 and δ22 = 0
6.13 Block significance and causality tests
It is likely that, when a VAR includes many lags of variables, it will be difficult to see which sets of variables have significant effects on each dependent variable and which do not. In order to address this issue, tests are usually conducted that restrict all of the lags of a particular variable to zero. For illustration, consider the following bivariate VAR(3)
[ y1t ]   [ α10 ]   [ β11  β12 ] [ y1t−1 ]   [ γ11  γ12 ] [ y1t−2 ]   [ δ11  δ12 ] [ y1t−3 ]   [ u1t ]
[ y2t ] = [ α20 ] + [ β21  β22 ] [ y2t−1 ] + [ γ21  γ22 ] [ y2t−2 ] + [ δ21  δ22 ] [ y2t−3 ] + [ u2t ]        (6.83)
This VAR could be written out to express the individual equations as
y1t = α10 + β11y1t−1 + β12y2t−1 + γ11y1t−2 + γ12y2t−2 + δ11y1t−3 + δ12y2t−3 + u1t (6.84)
y2t = α20 + β21y1t−1 + β22y2t−1 + γ21y1t−2 + γ22y2t−2 + δ21y1t−3 + δ22y2t−3 + u2t
One might be interested in testing the hypotheses and their implied restrictions on the parameter matrices given in table 6.3.
Assuming that all of the variables in the VAR are stationary, the joint hypotheses can easily be tested within the F-test framework, since each individual set of restrictions involves parameters drawn from only one equation. The equations would be estimated separately using OLS to obtain the unrestricted RSS, then the restrictions imposed and the models re-estimated to obtain the restricted RSS. The F-statistic would then take the usual form described in chapter 3. Thus, evaluation of the significance of variables in the context of a VAR almost invariably occurs on the basis of joint tests on all of the lags of a particular variable in an equation, rather than by examination of individual coefficient estimates.
In fact, the tests described above could also be referred to as causality tests. Tests of this form were described by Granger (1969) and a slight variant due to Sims (1972). Causality tests seek to answer simple questions of the type, ‘Do changes in y1 cause changes in y2?’ The argument follows that if y1 causes y2, lags of y1 should be significant in the equation for y2. If this is the case and not vice versa, it would be said that y1 ‘Granger-causes’ y2 or that there exists unidirectional causality from y1 to y2. On the other hand, if y2 causes y1, lags of y2 should be significant in the equation for y1. If both sets of lags were significant, it would be said that there was ‘bi-directional causality’ or ‘bi-directional feedback’. If y1 is found to Granger-cause y2, but not vice versa, it would be said that variable y1 is strongly exogenous (in the equation for y2). If neither set of lags is statistically significant in the equation for the other variable, it would be said that y1 and y2 are independent. Finally, the word ‘causality’ is somewhat of a misnomer, for Granger-causality really means only a correlation between the current value of one variable and the past values of others; it does not mean that movements of one variable cause movements of another.
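Such causality tests are built into most VAR software. The sketch below, on simulated data in which lags of y1 help to predict y2 but not the reverse, uses the test_causality method of a fitted statsmodels VAR, which performs the kind of block F-test described above; the package and the simulated series are illustrative assumptions.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulated data in which lags of y1 help to predict y2, but not vice versa
rng = np.random.default_rng(7)
T = 500
y = np.zeros((T, 2))
for t in range(1, T):
    y[t, 0] = 0.5 * y[t - 1, 0] + rng.normal()
    y[t, 1] = 0.4 * y[t - 1, 1] + 0.3 * y[t - 1, 0] + rng.normal()

res = VAR(pd.DataFrame(y, columns=["y1", "y2"])).fit(3)
print(res.test_causality(caused="y2", causing=["y1"], kind="f").summary())   # expect a rejection
print(res.test_causality(caused="y1", causing=["y2"], kind="f").summary())   # expect no rejection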
6.14 VARs with exogenous variables
Consider the following specification for a VAR(1) where Xt is a vector of exogenous variables and B is a matrix of coefficients
yt = A0 + A1yt−1 + BXt + et (6.85)
The components of the vector Xt are known as exogenous variables since their values are determined outside of the VAR system – in other words, there are no equations in the VAR with any of the components of Xt as dependent variables. Such a model is sometimes termed a VARX, although it could be viewed as simply a restricted VAR where there are equations for each of the exogenous variables, but with the coefficients on the RHS in those equations restricted to zero. Such a restriction may be considered desirable if theoretical considerations suggest it, although it is clearly not in the true spirit of VAR modelling, which is not to impose any restrictions on the model but rather to ‘let the data decide’.
6.15 Impulse responses and variance decompositions
Box 6.3 Forecasting with VARs
One of the main advantages of the VAR approach to modelling and forecasting is that since only lagged variables are used on the right hand side, forecasts of the future values of the dependent variables can be calculated using only information from within the system. We could term these unconditional forecasts since they are not constructed conditional on a particular set of assumed values. However, conversely it may be useful to produce forecasts of the future values of some variables conditional upon known values of other variables in the system. For example, it may be the case that the values of some variables become known before the values of the others. If the known values of the former are employed, we would anticipate that the forecasts should be more accurate than if estimated values were used unnecessarily, thus throwing known information away. Alternatively, conditional forecasts can be employed for counterfactual analysis based on examining the impact of certain scenarios. For example, in a trivariate VAR system incorporating monthly stock returns, inflation and GDP, we could answer the question: ‘What is the likely impact on the stock market over the next 1–6 months of a 2-percentage point increase in inflation and a 1% rise in GDP?’
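As a brief illustration of the unconditional case, multi-step forecasts can be generated directly from a fitted VAR; the sketch below assumes the data DataFrame and fitted results object from the earlier statsmodels sketch in section 6.11, and the six-step horizon is arbitrary.

h = 6
last_obs = data.values[-results.k_ar:]                  # the most recent k_ar observations are the starting values
print(results.forecast(y=last_obs, steps=h))            # point forecasts, one row per step ahead
print(results.forecast_interval(y=last_obs, steps=h))   # point forecasts with lower and upper bounds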
Block F-tests and an examination of causality in a VAR will suggest which of the variables in the model have statistically significant impacts on the future values of each of the variables in the system. But F-test results will not, by construction, be able to explain the sign of the relationship or how long these effects require to take place. That is, F-test results will not reveal whether changes in the value of a given variable have a positive or negative effect on other variables in the system, or how long it would take for the effect of that variable to work through the system. Such information will, however, be given by an examination of the VAR’s impulse responses and variance decompositions.
Impulse responses trace out the responsiveness of the dependent variables
in the VAR to shocks to each of the variables. So, for each variable from each equation separately, a unit shock is applied to the error, and the effects upon the VAR system over time are noted. Thus, if there are g variables in a system, a total of g^2 impulse responses could be generated. The way that this is achieved in practice is by expressing the VAR model as a VMA – that is, the vector autoregressive model is written as a vector moving average (in the same way as was done for univariate autoregressive models in chapter 5). Provided that the system is stable, the shock should gradually die away.
To illustrate how impulse responses operate, consider the following
bivariate VAR(1)
y_t = A_1 y_{t-1} + u_t   (6.86)

where A_1 = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix}
The VAR can also be written out using the elements of the matrices and
vectors as
\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}   (6.87)
Consider the effect at time t = 0, 1, . . . , of a unit shock to y_{1t} at time t = 0

y_0 = \begin{bmatrix} u_{10} \\ u_{20} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}   (6.88)

y_1 = A_1 y_0 = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}   (6.89)

y_2 = A_1 y_1 = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0 \end{bmatrix}   (6.90)
and so on. It would thus be possible to plot the impulse response functions of y_{1t} and y_{2t} to a unit shock in y_{1t}. Notice that the effect on y_{2t} is always zero, since the variable y_{1,t-1} has a zero coefficient attached to it in the equation for y_{2t}.
Now consider the effect of a unit shock to y_{2t} at time t = 0

y_0 = \begin{bmatrix} u_{10} \\ u_{20} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}   (6.91)

y_1 = A_1 y_0 = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix}   (6.92)

y_2 = A_1 y_1 = \begin{bmatrix} 0.5 & 0.3 \\ 0.0 & 0.2 \end{bmatrix} \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix} = \begin{bmatrix} 0.21 \\ 0.04 \end{bmatrix}   (6.93)
and so on. Although it is probably fairly easy to see what the effects of
shocks to the variables will be in such a simple VAR, the same principles can be applied in the context of VARs containing more equations or more lags, where it is much more difficult to see by eye what the interactions between the equations are.
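The recursion above is easy to automate. The following is a minimal sketch in Python using numpy (an addition, not part of the original text); it simply iterates y_s = A_1 y_{s-1} forward from a unit shock, reproducing the responses in (6.88)–(6.93). The function name and the four-step horizon are arbitrary choices made for illustration.

import numpy as np

# Coefficient matrix A1 from the bivariate VAR(1) in equation (6.86)
A1 = np.array([[0.5, 0.3],
               [0.0, 0.2]])

def impulse_response(A, shock, horizon):
    # Trace the response of each variable to a one-off unit shock by
    # iterating y_s = A y_{s-1}, starting from the shock vector
    responses = [np.asarray(shock, dtype=float)]
    for _ in range(horizon):
        responses.append(A @ responses[-1])
    return np.array(responses)

print(impulse_response(A1, [1.0, 0.0], horizon=4))   # shock to y1: (6.88)-(6.90)
print(impulse_response(A1, [0.0, 1.0], horizon=4))   # shock to y2: (6.91)-(6.93)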
Variance decompositions offer a slightly different method for examining
VAR system dynamics. They give the proportion of the movements in the dependent variables that are due to their ‘own’ shocks, versus shocks to the other variables. A shock to the ith variable will directly affect that variable of course, but it will also be transmitted to all of the other variables in the system through the dynamic structure of the VAR. Variance decompositions determine how much of the s-step-ahead forecast error variance of a given variable is explained by innovations to each explanatory variable for s = 1, 2, . . . In practice, it is usually observed that own
series shocks explain most of the (forecast) error variance of the series in
a VAR. To some extent, impulse responses and variance decompositions offer very similar information.
For calculating impulse responses and variance decompositions, the or-
dering of the variables is important. To see why this is the case, recall that the impulse responses refer to a unit shock to the errors of one VAR equation alone. This implies that the error terms of all other equations in the VAR system are held constant. However, this is not realistic since the error terms are likely to be correlated across equations to some extent. Thus, assuming that they are completely independent would lead to a misrepresentation of the system dynamics. In practice, the errors will have a common component that cannot be associated with a single variable alone.
The usual approach to this difficulty is to generate orthogonalised impulse
responses. In the context of a bivariate VAR, the whole of the common component of the errors is attributed somewhat arbitrarily to the first variable in the VAR. In the general case where there are more than two variables in the VAR, the calculations are more complex but the interpretation is the same. Such a restriction in effect implies an ‘ordering’ of variables, so that the equation for y_{1t} would be estimated first and then that of y_{2t}, a bit like a recursive or triangular system.
Assuming a particular ordering is necessary to compute the impulse
responses and variance decompositions, although the restriction underlying the ordering used may not be supported by the data. Again, ideally, financial theory should suggest an ordering (in other words, that movements in some variables are likely to follow, rather than precede, others). Failing this, the sensitivity of the results to changes in the ordering can be observed by assuming one ordering, and then exactly reversing it and re-computing the impulse responses and variance decompositions. It is also worth noting that the more highly correlated are the residuals from an estimated equation, the more the variable ordering will be important. But when the residuals are almost uncorrelated, the ordering of the variables will make little difference (see Lütkepohl, 1991, chapter 2 for further details).
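As a rough numerical illustration of orthogonalisation (this Python sketch is an addition, not part of the original text), the fragment below computes orthogonalised impulse responses and a forecast error variance decomposition for a bivariate VAR(1). The error covariance matrix and the ten-step horizon are hypothetical values chosen only to show the mechanics; the Cholesky factor attributes the common error component to whichever variable is ordered first, which is exactly why the ordering matters.

import numpy as np

A1 = np.array([[0.5, 0.3],          # VAR(1) coefficient matrix from (6.86)
               [0.0, 0.2]])
Sigma_u = np.array([[1.0, 0.4],     # hypothetical error covariance matrix
                    [0.4, 1.0]])    # with correlated errors across equations

P = np.linalg.cholesky(Sigma_u)     # lower-triangular factor: the common error
                                    # component is attributed to the first variable

horizon = 10
# Orthogonalised impulse responses: Theta_s = A1^s P for s = 0, 1, ..., horizon
Theta = [np.linalg.matrix_power(A1, s) @ P for s in range(horizon + 1)]

# Forecast error variance decomposition at the chosen horizon
total_var = np.diag(sum(T @ T.T for T in Theta))     # forecast error variance of each variable
contributions = sum(T**2 for T in Theta)             # contribution of each orthogonal shock
decomposition = contributions / total_var[:, None]   # rows: variables, columns: shocks
print(decomposition)                                 # each row sums to one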
Runkle (1987) argues that both impulse responses and variance decom-
positions are notoriously difficult to interpret accurately. He argues that confidence bands around the impulse responses and variance decompositions should always be constructed. However, he further states that, even then, the confidence intervals are typically so wide that sharp inferences are impossible.
6.16 VAR model example: the interaction between
property returns and the macroeconomy
6.16.1 Background, data and variables
Brooks and Tsolacos (1999) employ a VAR methodology for investigat-
ing the interaction between the UK property market and various macroeconomic variables. Monthly data, in logarithmic form, are used for the period from December 1985 to January 1998. The selection of the variables for inclusion in the VAR model is governed by the time series that are commonly included in studies of stock return predictability. It is assumed that stock returns are related to macroeconomic and business conditions, and hence time series which may be able to capture both current and future directions in the broad economy and the business environment are used in the investigation.
Broadly, there are two ways to measure the value of property-based
assets – direct measures of property value and equity-based measures. Direct property measures are based on periodic appraisals or valuations of the actual properties in a portfolio by surveyors, while equity-based measures evaluate the worth of properties indirectly by considering the values of stock market traded property companies. Both sources of data have their drawbacks. Appraisal-based value measures suffer from valuation biases and inaccuracies. Surveyors are typically prone to ‘smooth’ valuations over time, such that the measured returns are too low during property market booms and too high during periods of property price falls. Additionally, not every property in the portfolio that comprises the value measure is appraised during every period, resulting in some stale valuations entering the aggregate valuation, further increasing the degree of excess smoothness of the recorded property price series. Indirect property vehicles – property-related companies traded on stock exchanges – do not suffer from the above problems, but are excessively influenced by general stock market movements. It has been argued, for example, that over three-quarters of the variation over time in the value of stock exchange traded property companies can be attributed to general stock market-wide price movements. Therefore, the value of equity-based property series reflects much more the sentiment in the general stock market than the sentiment in the property market specifically.
Brooks and Tsolacos (1999) elect to use the equity-based FTSE Property
Total Return Index to construct property returns. In order to purge the real estate return series of its general stock market influences, it is common to regress property returns on a general stock market index (in this case
the FTA All-Share Index is used), saving the residuals. These residuals are
expected to reflect only the variation in property returns, and thus become the property market return measure used in subsequent analysis, and are denoted PROPRES.
Hence, the variables included in the VAR are the property returns (with
general stock market effects removed), the rate of unemployment, nominal interest rates, the spread between the long- and short-term interest rates, unanticipated inflation and the dividend yield. The motivations for including these particular variables in the VAR together with the property series are as follows:
●The rate of unemployment (denoted UNEM) is included to indicate general
economic conditions. In US research, authors tend to use aggregate consumption, a variable that has been built into asset pricing models and examined as a determinant of stock returns. Data for this variable and for alternative variables such as GDP are not available on a monthly basis in the UK. Monthly data are available for industrial production series but other studies have not shown any evidence that industrial production affects real estate returns. As a result, this series was not considered as a potential causal variable.
●Short-term nominal interest rates (denoted SIR) are assumed to contain
information about future economic conditions and to capture the state of investment opportunities. It was found in previous studies that short-term interest rates have a very significant negative influence on property stock returns.
●Interest rate spreads (denoted SPREAD), i.e. the yield curve, are usually
measured as the difference in the returns between long-term Treasury Bonds (of maturity, say, 10 or 20 years), and the one-month or three-month Treasury Bill rate. It has been argued that the yield curve has extra predictive power, beyond that contained in the short-term interest rate, and can help predict GDP up to four years ahead. It has also been suggested that the term structure also affects real estate market returns.
●Inflation rate influences are also considered important in the pricing
of stocks. For example, it has been argued that unanticipated inflation could be a source of economic risk and as a result, a risk premium will also be added if the stock of firms has exposure to unanticipated inflation. The unanticipated inflation variable (denoted UNINFL) is defined as the difference between the realised inflation rate, computed as the percentage change in the Retail Price Index (RPI), and an estimated series of expected inflation. The latter series was produced by fitting an ARMA model to the actual series and making a one-period (month)-ahead forecast, then rolling the sample forward one period, and re-estimating the parameters and making another one-step-ahead forecast, and so on (a sketch of this rolling procedure is given after this list).
●Dividend yields (denoted DIVY) have been widely used to model stock
market returns, and also real estate property returns, based on the assumption that movements in the dividend yield series are related to long-term business conditions and that they capture some predictable components of returns.
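The rolling one-step-ahead forecasting scheme used to build the expected inflation series can be sketched as follows. This Python fragment (an addition, not from the original text) uses statsmodels; the ARMA(1,1) order, the 60-observation initial window and the simulated stand-in inflation series are all assumptions made purely for illustration, since the paper does not report these details.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# 'inflation' should be the realised monthly inflation series (percentage
# changes in the RPI); a simulated series stands in here for illustration.
rng = np.random.default_rng(0)
inflation = pd.Series(0.3 + 0.5 * rng.standard_normal(146))

window = 60                       # size of the initial estimation sample (assumed)
expected = []
for t in range(window, len(inflation)):
    # Re-estimate an ARMA(1,1) on the data available up to time t-1
    fit = ARIMA(inflation.iloc[:t], order=(1, 0, 1)).fit()
    # Store the one-step-ahead forecast as expected inflation for time t
    expected.append(fit.forecast(steps=1).iloc[0])

expected = pd.Series(expected, index=inflation.index[window:])
uninfl = inflation.iloc[window:] - expected     # unanticipated inflation (UNINFL)
print(uninfl.head())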
All variables to be included in the VAR are required to be stationary in
order to carry out joint significance tests on the lags of the variables. Hence, all variables are subjected to augmented Dickey–Fuller (ADF) tests (see chapter 7). Evidence that the log of the RPI and the log of the unemployment rate both contain a unit root is observed. Therefore, the first differences of these variables are used in subsequent analysis. The remaining four variables led to rejection of the null hypothesis of a unit root in the log-levels, and hence these variables were not first differenced.
6.16.2 Methodology
A reduced form VAR is employed and therefore each equation can effectively be estimated using OLS. For a VAR to be unrestricted, it is required that the same number of lags of all of the variables is used in all equations. Therefore, in order to determine the appropriate lag lengths, the multivariate generalisation of Akaike’s information criterion (AIC)
is used.
Within the framework of the VAR system of equations, the significance
of all the lags of each of the individual variables is examined jointly with an F-test. Since several lags of the variables are included in each of the equations of the system, the coefficients on individual lags may not appear significant for all lags, and may have signs and degrees of significance that vary with the lag length. However, F-tests will be able to establish whether all of the lags of a particular variable are jointly significant. In order to consider further the effect of the macroeconomy on the real estate returns index, the impact multipliers (orthogonalised impulse responses) are also calculated for the estimated VAR model. Two standard error bands are calculated using the Monte Carlo integration approach employed by McCue and Kling (1994), and based on Doan (1994). The forecast error variance is also decomposed to determine the proportion of the movements in the real estate series that are a consequence of its own shocks rather than shocks to other variables.
Table 6.4 Marginal significance levels associated with joint F-tests

                       Lags of variable
Dependent
variable      SIR      DIVY     SPREAD   UNEM     UNINFL   PROPRES
SIR           0.0000   0.0091   0.0242   0.0327   0.2126   0.0000
DIVY          0.5025   0.0000   0.6212   0.4217   0.5654   0.4033
SPREAD        0.2779   0.1328   0.0000   0.4372   0.6563   0.0007
UNEM          0.3410   0.3026   0.1151   0.0000   0.0758   0.2765
UNINFL        0.3057   0.5146   0.3420   0.4793   0.0004   0.3885
PROPRES       0.5537   0.1614   0.5537   0.8922   0.7222   0.0000

The test is that all 14 lags have no explanatory power for that particular equation in the VAR.
Source: Brooks and Tsolacos (1999).
6.16.3 Results
The number of lags that minimises the value of Akaike’s information
criterion is 14, consistent with the 15 lags used by McCue and Kling (1994). There are thus (1 + 14 × 6) = 85 variables in each equation, implying 59 degrees of freedom. F-tests for the null hypothesis that all of the lags of a given variable are jointly insignificant in a given equation are presented in table 6.4.
In contrast to a number of US studies which have used similar vari-
ables, it is found to be difficult to explain the variation in the UK real estate returns index using macroeconomic factors, as the last row of table 6.4 shows. Of all the lagged variables in the real estate equation, only the lags of the real estate returns themselves are highly significant, and the dividend yield variable is significant only at the 20% level. No other variables have any significant explanatory power for the real estate returns. Therefore, based on the F-tests, an initial conclusion is that the variation in property returns, net of stock market influences, cannot be explained by any of the main macroeconomic or financial variables used in existing research. One possible explanation for this might be that, in the UK, these variables do not convey the information about the macroeconomy and business conditions assumed to determine the intertemporal behaviour of property returns. It is possible that property returns may reflect property market influences, such as rents, yields or capitalisation rates, rather than macroeconomic or financial variables. However, again the use of monthly data limits the set of both macroeconomic and property market variables that can be used in the quantitative analysis of real estate returns in the UK.
Table 6.5 Variance decompositions for the property sector index residuals

                              Explained by innovations in
              SIR          DIVY         SPREAD       UNEM         UNINFL       PROPRES
Months ahead  I     II     I     II     I     II     I     II     I     II     I      II
1             0.0   0.8    0.0   38.2   0.0   9.1    0.0   0.7    0.0   0.2    100.0  51.0
2             0.2   0.8    0.2   35.1   0.2   12.3   0.4   1.4    1.6   2.9    97.5   47.5
3             3.8   2.5    0.4   29.4   0.2   17.8   1.0   1.5    2.3   3.0    92.3   45.8
4             3.7   2.1    5.3   22.3   1.4   18.5   1.6   1.1    4.8   4.4    83.3   51.5
12            2.8   3.1    15.5  8.7    15.3  19.5   3.3   5.1    17.0  13.5   46.1   50.0
24            8.2   6.3    6.8   3.9    38.0  36.2   5.5   14.7   18.1  16.9   23.4   22.0

Source: Brooks and Tsolacos (1999).
It appears, however, that lagged values of the real estate variable have
explanatory power for some other variables in the system. These results are shown in the last column of table 6.4. The property sector appears to help in explaining variations in the term structure and short-term interest rates, and moreover since these variables are not significant in the property index equation, it is possible to state further that the property residual series Granger-causes the short-term interest rate and the term spread. This is a bizarre result. The fact that property returns are explained by own lagged values – i.e. that there is interdependency between neighbouring data points (observations) – may reflect the way that property market information is produced and reflected in the property return indices.
Table 6.5 gives variance decompositions for the property returns index
equation of the VAR for 1, 2, 3, 4, 12 and 24 steps ahead for the two variable orderings:
Order I: PROPRES, DIVY, UNINFL, UNEM, SPREAD, SIR
Order II: SIR, SPREAD, UNEM, UNINFL, DIVY, PROPRES.
Unfortunately, the ordering of the variables is important in the decom-
position. Thus two orderings are applied, which are the exact opposite of one another, and the sensitivity of the result is considered. It is clear that by the two-year forecasting horizon, the variable ordering has become almost irrelevant in most cases. An interesting feature of the results is that shocks to the term spread and unexpected inflation together account for over 50% of the variation in the real estate series. The short-term interest rate and dividend yield shocks account for only 10–15% of the variance of
Figure 6.1 Impulse responses and standard error bands for innovations in unexpected inflation equation errors

Figure 6.2 Impulse responses and standard error bands for innovations in the dividend yields
the property index. One possible explanation for the difference in results
between the F-tests and the variance decomposition is that the former
is a causality test and the latter is effectively an exogeneity test. Hence the latter implies the stronger restriction that both current and lagged shocks to the explanatory variables do not influence the current value of the dependent variable of the property equation. Another way of stating this is that the term structure and unexpected inflation have a contemporaneous rather than a lagged effect on the property index, which implies insignificant F-test statistics but explanatory power in the variance decomposition. Therefore, although the F-tests did not establish any significant effects, the error variance decompositions show evidence of a contemporaneous relationship between PROPRES and both SPREAD and UNINFL. The lack of lagged effects could be taken to imply speedy adjustment of the market to changes in these variables.
Figures 6.1 and 6.2 give the impulse responses for PROPRES associated
with separate unit shocks to unexpected inflation and the dividend yield,
as examples (as stated above, a total of 36 impulse responses could be
calculated since there are 6 variables in the system).
Considering the signs of the responses, innovations to unexpected
inflation (figure 6.1) always have a negative impact on the real estate index, since the impulse response is negative, and the effect of the shock does not die down, even after 24 months. Increasing stock dividend yields (figure 6.2) have a negative impact for the first three periods, but beyond that, the shock appears to have worked its way out of the system.
6.16.4 Conclusions
The conclusion from the VAR methodology adopted in the Brooks and Tsolacos paper is that overall, UK real estate returns are difficult to explain on the basis of the information contained in the set of the variables used in existing studies based on non-UK data. The results are not strongly suggestive of any significant influences of these variables on the variation of the filtered property returns series. There is, however, some evidence that the interest rate term structure and unexpected inflation have a contemporaneous effect on property returns, in agreement with the results of a number of previous studies.
6.17 VAR estimation in EViews
By way of illustration, a VAR is estimated in order to examine whether there are lead–lag relationships for the returns to three exchange rates against the US dollar – the euro, the British pound and the Japanese yen. The data are daily and run from 7 July 2002 to 7 July 2007, giving a total of 1,827 observations. The data are contained in the Excel file ‘currencies.xls’. First, Create a new workfile, called ‘currencies.wf1’, and import the three currency series. Construct a set of continuously compounded percentage returns called ‘reur’, ‘rgbp’ and ‘rjpy’. VAR estimation in EViews can be accomplished by clicking on the Quick menu and then Estimate VAR. The
VAR inputs screen appears as in screenshot 6.3.
In the Endogenous variables box, type the three variable names, reur
rgbp rjpy. In the Exogenous box, leave the default ‘C’ and in the Lag Interval box, enter 1 2 to estimate a VAR(2), just as an example. The output appears in a neatly organised table as shown on the following page, with one column for each equation in the first and second panels, and a single column of statistics that describes the system as a whole in the third. So values of the information criteria are given separately for each equation in the second panel and jointly for the model as a whole in the third.
Vector Autoregression Estimates
Date: 09/03/07   Time: 21:54
Sample (adjusted): 7/10/2002 7/07/2007
Included observations: 1824 after adjustments
Standard errors in ( ) & t-statistics in [ ]
REUR RGBP RJPY
REUR(−1)         0.031460    0.016776    0.040970
                 (0.03681)   (0.03234)   (0.03444)
                 [0.85471]   [0.51875]   [1.18944]
REUR(−2)         0.011377    0.045542    0.030551
                 (0.03661)   (0.03217)   (0.03426)
                 [0.31073]   [1.41574]   [0.89167]
RGBP(−1)        −0.070259    0.040547   −0.060907
                 (0.04051)   (0.03559)   (0.03791)
                [−1.73453]   [1.13933]  [−1.60683]
RGBP(−2)         0.026719   −0.015074   −0.019407
                 (0.04043)   (0.03552)   (0.03784)
                 [0.66083]  [−0.42433]  [−0.51293]
RJPY(−1)        −0.020698   −0.029766    0.011809
                 (0.03000)   (0.02636)   (0.02807)
                [−0.68994]  [−1.12932]   [0.42063]
RJPY(−2)        −0.014817   −0.000392    0.035524
                 (0.03000)   (0.02635)   (0.02807)
                [−0.49396]  [−0.01489]   [1.26557]
C               −0.017229   −0.012878    0.002187
                 (0.01100)   (0.00967)   (0.01030)
                [−1.56609]  [−1.33229]   [0.21239]

R-squared        0.003403    0.004040    0.003797
Adj. R-squared   0.000112    0.000751    0.000507
Sum sq. resids   399.0767    308.0701    349.4794
S.E. equation    0.468652    0.411763    0.438564
F-statistic      1.034126    1.228431    1.154191
Log likelihood  −1202.238   −966.1886   −1081.208
Akaike AIC       1.325919    1.067093    1.193210
Schwarz SC       1.347060    1.088234    1.214351
Mean dependent  −0.017389   −0.014450    0.002161
S.D. dependent   0.468679    0.411918    0.438676

Determinant resid covariance (dof adj.)   0.002214
Determinant resid covariance              0.002189
Log likelihood                           −2179.054
Akaike information criterion              2.412339
Schwarz criterion                         2.475763
Screenshot 6.3
VAR inputs screen
We will shortly discuss the interpretation of the output, but the exam-
ple so far has assumed that we know the appropriate lag length for the VAR.
However, in practice, the first step in the construction of any VAR model, once the variables that will enter the VAR have been decided, will be to determine the appropriate lag length. This can be achieved in a variety of ways, but one of the easiest is to employ a multivariate information criterion. In EViews, this can be done easily from the EViews VAR output we have by clicking View/Lag Structure/Lag Length Criteria…. You will be invited to specify the maximum number of lags to entertain including in the model, and for this example, arbitrarily select 10. The output in the following table would be observed.
EViews presents the values of various information criteria and other
methods for determining the lag order. In this case, the Schwarz and Hannan–Quinn criteria both select a zero order as optimal, while Akaike’s criterion chooses a VAR(1). Estimate a VAR(1) and examine the results.
Does the model look as if it fits the data well? Why or why not?
VAR Lag Order Selection Criteria
Endogenous variables: REUR RGBP RJPY
Exogenous variables: C
Date: 09/03/07   Time: 21:58
Sample: 7/07/2002 7/07/2007
Included observations: 1816

Lag   LogL        LR          FPE         AIC         SC          HQ
0     −2192.395   NA          0.002252    2.417836    2.426929*   2.421191*
1     −2175.917   32.88475    0.002234*   2.409600*   2.445973    2.423020
2     −2170.888   10.01901    0.002244    2.413973    2.477625    2.437459
3     −2167.760   6.221021    0.002258    2.420441    2.511372    2.453992
4     −2158.361   18.66447    0.002257    2.420001    2.538212    2.463617
5     −2151.563   13.47494    0.002263    2.422426    2.567917    2.476109
6     −2145.132   12.72714    0.002269    2.425256    2.598026    2.489004
7     −2141.412   7.349932    0.002282    2.431071    2.631120    2.504884
8     −2131.693   19.17197    0.002281    2.430278    2.657607    2.514157
9     −2121.823   19.43540*   0.002278    2.429320    2.683929    2.523264
10    −2119.745   4.084453    0.002296    2.436944    2.718832    2.540953

* indicates lag order selected by the criterion
LR: sequential modified LR test statistic (each test at 5% level)
FPE: Final prediction error
AIC: Akaike information criterion
SC: Schwarz information criterion
HQ: Hannan-Quinn information criterion
Next, run a Granger causality test by clicking View/Lag Structure/
Granger Causality/Block Exogeneity Tests . The table of statistics will
appear immediately as on the following page.
The results, unsurprisingly, show very little evidence of lead–lag interac-
tions between the series. Since we have estimated a tri-variate VAR, three panels are displayed, with one for each dependent variable in the system. None of the results shows any causality that is significant at the 5% level, although there is causality from the pound to the euro and from the pound to the yen that is almost significant at the 10% level, but no causality in the opposite direction and no causality between the euro–dollar and the yen–dollar in either direction. These results might be interpreted as suggesting that information is incorporated slightly more quickly in the pound–dollar rate than in the euro–dollar or yen–dollar rates.
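For readers working outside EViews, the same exercise can be sketched in Python with statsmodels (this fragment is an addition and not part of the original text). A simulated price panel stands in for the contents of ‘currencies.xls’; with the real data, the commented read_excel line would replace it. The returns construction mirrors the continuously compounded percentage returns built above, and the block exogeneity test corresponds to the Granger causality output.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Placeholder for the three daily dollar exchange rates; with the real data use e.g.
# prices = pd.read_excel('currencies.xls', index_col=0)
rng = np.random.default_rng(2)
prices = pd.DataFrame(np.exp(0.005 * rng.standard_normal((1827, 3)).cumsum(axis=0)),
                      columns=['EUR', 'GBP', 'JPY'])

# Continuously compounded percentage returns
returns = 100 * np.log(prices / prices.shift(1)).dropna()
returns.columns = ['reur', 'rgbp', 'rjpy']

results = VAR(returns).fit(2)      # a VAR(2), matching the lag interval '1 2'
print(results.summary())

# Does rgbp Granger-cause reur? (block exogeneity / Granger causality test)
print(results.test_causality('reur', ['rgbp'], kind='wald').summary())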
It is worth also noting that the term ‘Granger causality’ is something of
a misnomer since a finding of ‘causality’ does not mean that movements
VAR Granger Causality/Block Exogeneity Wald Tests
Date: 09/04/07   Time: 13:50
Sample: 7/07/2002 7/07/2007
Included observations: 1825
Dependent variable: REUR
Excluded Chi-sq df Prob.
RGBP 2.617817 1 0.1057
RJPY 0.473950 1 0.4912
All 3.529180 2 0.1713
Dependent variable: RGBP
Excluded Chi-sq df Prob.
REUR 0.188122 1 0.6645
RJPY 1.150696 1 0.2834
All 1.164752 2 0.5586
Dependent variable: RJPY
Excluded Chi-sq df Prob.
REUR 1.206092 1 0.2721
RGBP 2.424066 1 0.1195
All 2.435252 2 0.2959
in one variable physically cause movements in another. For example, in
the above analysis, if movements in the euro–dollar market were found to Granger-cause movements in the pound–dollar market, this would not have meant that the pound–dollar rate changed as a direct result of, or because of, movements in the euro–dollar market. Rather, causality simply implies a chronological ordering of movements in the series. It could validly be stated that movements in the euro–dollar rate appear to lead those of the pound–dollar rate, and so on.
The EViews manual suggests that block F-test restrictions can be per-
formed by estimating the VAR equations individually using OLS and then by using the View then Lag Structure then Lag Exclusion Tests. EViews tests for whether the parameters for a given lag of all the variables in a particular equation can be restricted to zero.
To obtain the impulse responses for the estimated model, simply click Impulse on the button bar above the VAR object and a new dialog box
will appear as in screenshot 6.4.
Screenshot 6.4 Constructing the VAR impulse responses
By default, EViews will offer to estimate and plot all of the responses
to separate shocks of all of the variables in the order that the variables were listed in the estimation window, using ten steps and confidence intervals generated using analytic formulae. If 20 steps ahead had been selected, with ‘combined response graphs’, you would see the graphs in the format in screenshot 6.5 (obviously they appear small on the page and the colour has been lost, but the originals are much clearer). As one would expect given the parameter estimates and the Granger causality test results, again few linkages between the series are established here. The responses to the shocks are very small, except for the response of a variable to its own shock, and they die down to almost nothing after the first lag.
Plots of the variance decompositions can also be generated by clicking
on View and then Variance Decomposition. A similar plot for the variance
decompositions would appear as in screenshot 6.6.
There is little again that can be seen from these variance decomposition
graphs that appear small on a printed page apart from the fact that the
Screenshot 6.5 Combined impulse response graphs
behaviour is observed to settle down to a steady state very quickly. Inter-
estingly, while the percentage of the errors that is attributable to own shocks is 100% in the case of the euro rate, for the pound, the euro series explains around 55% of the variation in returns, and for the yen, the euro series explains around 30% of the variation.
We should remember that the ordering of the variables has an effect
on the impulse responses and variance decompositions, and when, as in this case, theory does not suggest an obvious ordering of the series, some sensitivity analysis should be undertaken. This can be achieved by clicking on the ‘Impulse Definition’ tab when the window that creates the impulses is open. A window entitled ‘Ordering for Cholesky’ should be apparent, and it would be possible to reverse the order of variables or to select any other order desired. For the variance decompositions, the ‘Ordering for Cholesky’ box is observed in the window for creating the decompositions without having to select another tab.
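Continuing the Python sketch started above (again an illustrative addition using statsmodels rather than part of the text), the impulse responses, variance decompositions and an ordering-sensitivity check could be produced along the following lines, assuming 'results' and 'returns' are the objects created in that earlier fragment.

import matplotlib.pyplot as plt

irf = results.irf(20)           # orthogonalised impulse responses, 20 steps ahead
irf.plot(orth=True)             # response graphs with asymptotic error bands

fevd = results.fevd(20)         # forecast error variance decompositions
fevd.plot()

# Sensitivity to the Cholesky ordering: reverse the column order and re-estimate
reversed_results = VAR(returns[['rjpy', 'rgbp', 'reur']]).fit(2)
reversed_results.fevd(20).plot()
plt.show()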
Screenshot 6.6 Variance decomposition graphs
Key concepts
The key terms to be able to define and explain from this chapter are
●endogenous variable ●exogenous variable
●simultaneous equations bias ●identified
●order condition ●rank condition
●Hausman test ●reduced form
●structural form ●instrumental variables
●indirect least squares ●two-stage least squares
●vector autoregression ●Granger causality
●impulse response ●variance decomposition
Review questions
1. Consider the following simultaneous equations system
y_{1t} = α_0 + α_1 y_{2t} + α_2 y_{3t} + α_3 X_{1t} + α_4 X_{2t} + u_{1t}   (6.94)
y_{2t} = β_0 + β_1 y_{3t} + β_2 X_{1t} + β_3 X_{3t} + u_{2t}   (6.95)
y_{3t} = γ_0 + γ_1 y_{1t} + γ_2 X_{2t} + γ_3 X_{3t} + u_{3t}   (6.96)
(a) Derive the reduced form equations corresponding to (6.94)–(6.96).
(b) What do you understand by the term ‘identification’? Describe a rule
for determining whether a system of equations is identified. Apply this rule to (6.94)–(6.96). Does this rule guarantee that estimates of the structural parameters can be obtained?
(c) Which would you consider the more serious misspecification: treating
exogenous variables as endogenous, or treating endogenous variables as exogenous? Explain your answer.
(d) Describe a method of obtaining the structural form coefficients
corresponding to an overidentified system.
(e) Using EViews, estimate a VAR model for the interest rate series
used in the principal components example of chapter 3. Use a method for selecting the lag length in the VAR optimally. Determine whether certain maturities lead or lag others, by conducting Granger causality tests and plotting impulse responses and variance decompositions. Is there any evidence that new information is reflected more quickly in some maturities than others?
2. Consider the following system of two equations
y_{1t} = α_0 + α_1 y_{2t} + α_2 X_{1t} + α_3 X_{2t} + u_{1t}   (6.97)
y_{2t} = β_0 + β_1 y_{1t} + β_2 X_{1t} + u_{2t}   (6.98)
(a) Explain, with reference to these equations, the undesirable
consequences that would arise if (6.97) and (6.98) were estimated separately using OLS.
(b) What would be the effect upon your answer to (a) if the variable y_{1t} had not appeared in (6.98)?
(c) State the order condition for determining whether an equation which
is part of a system is identified. Use this condition to determine whether (6.97) or (6.98) or both or neither are identified.
(d) Explain whether indirect least squares (ILS) or two-stage least
squares (2SLS) could be used to obtain the parameters of (6.97) and (6.98). Describe how each of these two procedures (ILS and 2SLS) is used to calculate the parameters of an equation. Compare and evaluate the usefulness of ILS, 2SLS and IV.
(e) Explain briefly the Hausman procedure for testing for exogeneity.
3. Explain, using an example if you consider it appropriate, what you
understand by the equivalent terms ‘recursive equations’ and ‘triangular system’. Can a triangular system be validly estimated using OLS? Explain your answer.
4. Consider the following vector autoregressive model
y_t = β_0 + \sum_{i=1}^{k} β_i y_{t-i} + u_t   (6.99)
where y_t is a p × 1 vector of variables determined by k lags of all p variables in the system, u_t is a p × 1 vector of error terms, β_0 is a p × 1 vector of constant term coefficients and β_i are p × p matrices of coefficients on the ith lag of y.
(a) If p = 2, and k = 3, write out all the equations of the VAR in full, carefully defining any new notation you use that is not given in the question.
(b) Why have VARs become popular for application in economics and
finance, relative to structural models derived from some underlying theory?
(c) Discuss any weaknesses you perceive in the VAR approach to
econometric modelling.
(d) Two researchers, using the same set of data but working
independently, arrive at different lag lengths for the VAR equation (6.99). Describe and evaluate two methods for determining which of the lag lengths is more appropriate.
5. Define carefully the following terms
●Simultaneous equations system
●Exogenous variables
●Endogenous variables
●Structural form model
●Reduced form model
7
Modelling long-run relationships in finance
Learning Outcomes
In this chapter, you will learn how to
●Highlight the problems that may occur if non-stationary data
are used in their levels form
●Test for unit roots
●Examine whether systems of variables are cointegrated
●Estimate error correction and vector error correction models
●Explain the intuition behind Johansen’s test for cointegration
●Describe how to test hypotheses in the Johansen framework
●Construct models for long-run relationships between variables
in EViews
7.1 Stationarity and unit root testing
7.1.1 Why are tests for non-stationarity necessary?
There are several reasons why the concept of non-stationarity is important
and why it is essential that variables that are non-stationary be treated differently from those that are stationary. Two definitions of non-stationarity were presented at the start of chapter 5. For the purpose of the analysis in this chapter, a stationary series can be defined as one with a constant mean, constant variance and constant autocovariances for each given lag. Therefore, the discussion in this chapter relates to the concept of weak stationarity. An examination of whether a series can be viewed as stationary or not is essential for the following reasons:
●The stationarity or otherwise of a series can strongly influence its behaviour
and properties. To offer one illustration, the word ‘shock’ is usually used
to denote a change or an unexpected change in a variable or perhaps
simply the value of the error term during a particular time period. For a stationary series, ‘shocks’ to the system will gradually die away. That is, a shock during time t will have a smaller effect in time t + 1, a smaller effect still in time t + 2, and so on. This can be contrasted with the case of non-stationary data, where the persistence of shocks will always be infinite, so that for a non-stationary series, the effect of a shock during time t will not have a smaller effect in time t + 1, and in time t + 2, etc.
●The use of non-stationary data can lead to spurious regressions. If two stationary variables are generated as independent random series, when one of those variables is regressed on the other, the t-ratio on the slope coefficient would be expected not to be significantly different from zero, and the value of R² would be expected to be very low. This seems obvious, for the variables are not related to one another. However, if two variables are trending over time, a regression of one on the other could have a high R² even if the two are totally unrelated. So, if standard regression techniques are applied to non-stationary data, the end result could be a regression that ‘looks’ good under standard measures (significant coefficient estimates and a high R²), but which is really valueless. Such a model would be termed a ‘spurious regression’.
To give an illustration of this, two independent sets of non-stationary
variables, y and x, were generated with sample size 500, one regressed on the other and the R² noted. This was repeated 1,000 times to obtain 1,000 R² values. A histogram of these values is given in figure 7.1.
As figure 7.1 shows, although one would have expected the R² values for each regression to be close to zero, since the explained and explanatory variables in each case are independent of one another, in fact R² takes on values across the whole range. For one set of data, R² is bigger than 0.9, while it is bigger than 0.5 over 16% of the time!

Figure 7.1 Value of R² for 1,000 sets of regressions of a non-stationary variable on another independent non-stationary variable

Figure 7.2 Value of t-ratio of slope coefficient for 1,000 sets of regressions of a non-stationary variable on another independent non-stationary variable
●If the variables employed in a regression model are not stationary, then it can be proved that the standard assumptions for asymptotic analysis will not be valid. In other words, the usual ‘t-ratios’ will not follow a t-distribution, and the F-statistic will not follow an F-distribution, and so on. Using the same simulated data as used to produce figure 7.1, figure 7.2 plots a histogram of the estimated t-ratio on the slope coefficient for each set of data.
In general, if one variable is regressed on another unrelated variable,
the t-ratio on the slope coefficient will follow a t-distribution. For a
sample of size 500, this implies that 95% of the time, the t-ratio will
lie between ±2. As figure 7.2 shows quite dramatically, however, the
standard t-ratio in a regression of non-stationary variables can take on
enormously large values. In fact, in the above example, the t-ratio is
bigger than 2 in absolute value over 98% of the time, when it should be bigger than 2 in absolute value only approximately 5% of the time! Clearly, it is therefore not possible to validly undertake hypothesis tests about the regression parameters if the data are non-stationary. A small simulation sketch illustrating these results follows below.
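The following Python fragment (an illustrative addition, not part of the original text) repeats the experiment just described using statsmodels: two independent random walks of length 500 are generated, one is regressed on the other, and the R² and slope t-ratio are recorded over 1,000 replications. The exact proportions will differ from run to run and from the figures quoted above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
r2_values, t_ratios = [], []

for _ in range(1000):
    # Two independent random walks (non-stationary, unrelated series)
    y = rng.standard_normal(500).cumsum()
    x = rng.standard_normal(500).cumsum()
    res = sm.OLS(y, sm.add_constant(x)).fit()
    r2_values.append(res.rsquared)
    t_ratios.append(res.tvalues[1])        # t-ratio on the slope coefficient

r2_values, t_ratios = np.array(r2_values), np.array(t_ratios)
print('Proportion of R-squared values above 0.5:', (r2_values > 0.5).mean())
print('Proportion of |t-ratios| above 2:', (np.abs(t_ratios) > 2).mean())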
7.1.2 Two types of non-stationarity
There are two models that have been frequently used to characterise the non-stationarity, the random walk model with drift

y_t = μ + y_{t-1} + u_t   (7.1)
and the trend-stationary process – so-called because it is stationary around
a linear trend
y_t = α + βt + u_t   (7.2)
where u_t is a white noise disturbance term in both cases.
Note that the model (7.1) could be generalised to the case where y_t is an explosive process

y_t = μ + φy_{t-1} + u_t   (7.3)
where φ > 1. Typically, this case is ignored and φ = 1 is used to characterise the non-stationarity because φ > 1 does not describe many data series in economics and finance, but φ = 1 has been found to describe accurately many financial and economic time series. Moreover, φ > 1 has an intuitively unappealing property: shocks to the system are not only persistent through time, they are propagated so that a given shock will have an increasingly large influence. In other words, the effect of a shock during time t will have a larger effect in time t + 1, a larger effect still in time t + 2, and so on. To see this, consider the general case of an AR(1) with no drift

y_t = φy_{t-1} + u_t   (7.4)
Let φ take any value for now. Lagging (7.4) one and then two periods

y_{t-1} = φy_{t-2} + u_{t-1}   (7.5)
y_{t-2} = φy_{t-3} + u_{t-2}   (7.6)

Substituting into (7.4) from (7.5) for y_{t-1} yields

y_t = φ(φy_{t-2} + u_{t-1}) + u_t   (7.7)
y_t = φ^2 y_{t-2} + φu_{t-1} + u_t   (7.8)

Substituting again for y_{t-2} from (7.6)

y_t = φ^2(φy_{t-3} + u_{t-2}) + φu_{t-1} + u_t   (7.9)
y_t = φ^3 y_{t-3} + φ^2 u_{t-2} + φu_{t-1} + u_t   (7.10)

T successive substitutions of this type lead to

y_t = φ^{T+1} y_{t-(T+1)} + φu_{t-1} + φ^2 u_{t-2} + φ^3 u_{t-3} + ··· + φ^T u_{t-T} + u_t   (7.11)
There are three possible cases:

(1) φ < 1 ⇒ φ^T → 0 as T → ∞
So the shocks to the system gradually die away – this is the stationary case.
(2) φ = 1 ⇒ φ^T = 1 ∀ T
So shocks persist in the system and never die away. The following is obtained

y_t = y_0 + \sum_{t=0}^{∞} u_t   as T → ∞   (7.12)

So the current value of y is just an infinite sum of past shocks plus some starting value of y_0. This is known as the unit root case, for the root of the characteristic equation would be unity.
(3) φ > 1. Now given shocks become more influential as time goes on, since if φ > 1, φ^3 > φ^2 > φ, etc. This is the explosive case which, for the reasons listed above, will not be considered as a plausible description of the data.
Going back to the two characterisations of non-stationarity, the random
walk with drift
y_t = μ + y_{t-1} + u_t   (7.13)

and the trend-stationary process

y_t = α + βt + u_t   (7.14)
The two will require different treatments to induce stationarity. The
second case is known as deterministic non-stationarity and de-trending is
required. In other words, if it is believed that only this class of non-stationarity is present, a regression of the form given in (7.14) would be run, and any subsequent estimation would be done on the residuals from (7.14), which would have had the linear trend removed.
The first case is known as stochastic non-stationarity, where there is a stochastic trend in the data. Letting Δy_t = y_t − y_{t-1} and Ly_t = y_{t-1} so that (1 − L)y_t = y_t − Ly_t = y_t − y_{t-1}. If (7.13) is taken and y_{t-1} subtracted from both sides

y_t − y_{t-1} = μ + u_t   (7.15)
(1 − L)y_t = μ + u_t   (7.16)
Δy_t = μ + u_t   (7.17)
There now exists a new variable Δy_t, which will be stationary. It would be said that stationarity has been induced by ‘differencing once’. It should also be apparent from the representation given by (7.16) why y_t is also known as a unit root process: i.e. that the root of the characteristic equation, (1 − z) = 0, will be unity.
Although trend-stationary and difference-stationary series are both
‘trending’ over time, the correct approach needs to be used in each case. If first differences of a trend-stationary series were taken, it would ‘remove’ the non-stationarity, but at the expense of introducing an MA(1) structure into the errors. To see this, consider the trend-stationary model

y_t = α + βt + u_t   (7.18)

This model can be expressed for time t − 1, which would be obtained by removing 1 from all of the time subscripts in (7.18)

y_{t-1} = α + β(t − 1) + u_{t-1}   (7.19)

Subtracting (7.19) from (7.18) gives

Δy_t = β + u_t − u_{t-1}   (7.20)

Not only is this a moving average in the errors that has been created, it is a non-invertible MA (i.e. one that cannot be expressed as an autoregressive process). Thus the series Δy_t would in this case have some very undesirable properties.
Conversely if one tried to de-trend a series which has stochastic trend,
then the non-stationarity would not be removed. Clearly then, it is not always obvious which way to proceed. One possibility is to nest both cases in a more general model and to test that. For example, consider the model

Δy_t = α_0 + α_1 t + (γ − 1)y_{t-1} + u_t   (7.21)
Note again, of course, that the t-ratios in (7.21) will not follow a t-distribution. Such a model could allow for both deterministic and stochastic non-stationarity. However, this book will now concentrate on the stochastic non-stationarity model, since it is the model that has been found to best describe most non-stationary financial and economic time series. Consider again the simplest stochastic trend model

y_t = y_{t-1} + u_t   (7.22)

or

Δy_t = u_t   (7.23)
This concept can be generalised to consider the case where the series
contains more than one ‘unit root’. That is, the first difference operator, Δ, would need to be applied more than once to induce stationarity. This
situation will be described later in this chapter.
Arguably the best way to understand the ideas discussed above is to
consider some diagrams showing the typical properties of certain relevant
Figure 7.3 Example of a white noise process

Figure 7.4 Time series plot of a random walk versus a random walk with drift
types of processes. Figure 7.3 plots a white noise (pure random) process,
while figures 7.4 and 7.5 plot a random walk versus a random walk with drift and a deterministic trend process, respectively.
Comparing these three figures gives a good idea of the differences between the properties of a stationary, a stochastic trend and a deterministic trend process. In figure 7.3, a white noise process visibly has no trending behaviour, and it frequently crosses its mean value of zero. The random walk (thick line) and random walk with drift (faint line) processes of figure 7.4 exhibit ‘long swings’ away from their mean value, which they cross very rarely. A comparison of the two lines in this graph reveals that the positive drift leads to a series that is more likely to rise over time than to fall; obviously, the effect of the drift on the series becomes greater and
Figure 7.5 Time series plot of a deterministic trend process

Figure 7.6 Autoregressive processes with differing values of φ (0, 0.8, 1)
greater the further the two processes are tracked. Finally, the determin-
istic trend process of figure 7.5 clearly does not have a constant mean, and exhibits completely random fluctuations about its upward trend. If the trend were removed from the series, a plot similar to the white noise process of figure 7.3 would result. In this author’s opinion, more time series in finance and economics look like figure 7.4 than either figure 7.3 or 7.5. Consequently, as stated above, the stochastic trend model will be the focus of the remainder of this chapter.
Finally, figure 7.6 plots the value of an autoregressive process of order
1 with different values of the autoregressive coefficient as given by (7.4).
Values of φ = 0 (i.e. a white noise process), φ = 0.8 (i.e. a stationary AR(1)) and φ = 1 (i.e. a random walk) are plotted over time.
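The processes plotted in figures 7.3–7.6 are easy to simulate. The Python fragment below (an illustrative addition, not from the original text) generates a white noise series, a random walk, a random walk with drift, a deterministic trend process and a stationary AR(1); the drift of 0.1, the trend slope of 0.1 and the sample size of 500 are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T = 500
u = rng.standard_normal(T)                  # white noise disturbances

white_noise = u                             # figure 7.3
random_walk = np.cumsum(u)                  # phi = 1, no drift
drift_walk = np.cumsum(0.1 + u)             # random walk with drift (mu = 0.1)
det_trend = 0.1 * np.arange(T) + u          # deterministic (trend-stationary) process

ar08 = np.zeros(T)                          # stationary AR(1) with phi = 0.8
for t in range(1, T):
    ar08[t] = 0.8 * ar08[t - 1] + u[t]

for series, label in [(white_noise, 'white noise'), (random_walk, 'random walk'),
                      (drift_walk, 'random walk with drift'),
                      (det_trend, 'deterministic trend'), (ar08, 'AR(1), phi = 0.8')]:
    plt.plot(series, label=label)
plt.legend()
plt.show()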
7.1.3 Some more definitions and terminology
If a non-stationary series, y_t, must be differenced d times before it becomes stationary, then it is said to be integrated of order d. This would be written y_t ∼ I(d). So if y_t ∼ I(d) then Δ^d y_t ∼ I(0). This latter piece of terminology states that applying the difference operator, Δ, d times leads to an I(0) process, i.e. a process with no unit roots. In fact, applying the difference operator more than d times to an I(d) process will still result in a stationary series (but with an MA error structure). An I(0) series is a stationary series, while an I(1) series contains one unit root. For example, consider the random walk

y_t = y_{t-1} + u_t   (7.24)
An I(2) series contains two unit roots and so would require differencing twice to induce stationarity. I(1) and I(2) series can wander a long way from their mean value and cross this mean value rarely, while I(0) series should cross the mean frequently. The majority of financial and economic time series contain a single unit root, although some are stationary and some have been argued to possibly contain two unit roots (series such as nominal consumer prices and nominal wages). The efficient markets hypothesis together with rational expectations suggests that asset prices (or the natural logarithms of asset prices) should follow a random walk or a random walk with drift, so that their differences are unpredictable (or only predictable to their long-term average value).
To see what types of data generating process could lead to an I(2) series, consider the equation

y_t = 2y_{t-1} − y_{t-2} + u_t   (7.25)

Taking all of the terms in y over to the LHS, and then applying the lag operator notation

y_t − 2y_{t-1} + y_{t-2} = u_t   (7.26)
(1 − 2L + L^2)y_t = u_t   (7.27)
(1 − L)(1 − L)y_t = u_t   (7.28)

It should be evident now that this process for y_t contains two unit roots, and would require differencing twice to induce stationarity.
What would happen if y_t in (7.25) were differenced only once? Taking first differences of (7.25), i.e. subtracting y_{t-1} from both sides

y_t − y_{t-1} = y_{t-1} − y_{t-2} + u_t   (7.29)
y_t − y_{t-1} = (y_t − y_{t-1})_{-1} + u_t   (7.30)
Δy_t = Δy_{t-1} + u_t   (7.31)
(1 − L)Δy_t = u_t   (7.32)

First differencing would therefore have removed one of the unit roots, but there is still a unit root remaining in the new variable, Δy_t.
7.1.4 Testing for a unit root
One immediately obvious (but inappropriate) method that readers may
think of to test for a unit root would be to examine the autocorrelation function of the series of interest. However, although shocks to a unit root process will remain in the system indefinitely, the acf for a unit root process (a random walk) will often be seen to decay away very slowly to zero. Thus, such a process may be mistaken for a highly persistent but stationary process. Hence it is not possible to use the acf or pacf to determine whether a series is characterised by a unit root or not. Furthermore, even if the true data generating process for y_t contains a unit root, the results of the tests for a given sample could lead one to believe that the process is stationary. Therefore, what is required is some kind of formal hypothesis testing procedure that answers the question, ‘given the sample of data to hand, is it plausible that the true data generating process for y contains one or more unit roots?’
The early and pioneering work on testing for a unit root in time series
was done by Dickey and Fuller (Fuller, 1976; Dickey and Fuller, 1979). The basic objective of the test is to examine the null hypothesis that φ = 1 in

y_t = φy_{t-1} + u_t   (7.33)

against the one-sided alternative φ < 1. Thus the hypotheses of interest are H_0: series contains a unit root versus H_1: series is stationary. In practice, the following regression is employed, rather than (7.33), for ease of computation and interpretation

Δy_t = ψy_{t-1} + u_t   (7.34)

so that a test of φ = 1 is equivalent to a test of ψ = 0 (since φ − 1 = ψ).
Dickey–Fuller (DF) tests are also known as τ-tests, and can be conducted
allowing for an intercept, or an intercept and deterministic trend, or
Table 7.1 Critical values for DF tests (Fuller, 1976, p. 373)

Significance level               10%      5%       1%
CV for constant but no trend    −2.57    −2.86    −3.43
CV for constant and trend       −3.12    −3.41    −3.96
neither, in the test regression. The model for the unit root test in each
case is
y_t = φy_{t-1} + μ + λt + u_t   (7.35)

The tests can also be written, by subtracting y_{t-1} from each side of the equation, as

Δy_t = ψy_{t-1} + μ + λt + u_t   (7.36)
In another paper, Dickey and Fuller (1981) provide a set of additional
test statistics and their critical values for joint tests of the significance of the lagged y, and the constant and trend terms. These are not examined further here. The test statistics for the original DF tests are defined as

test statistic = \hat{ψ} / SE(\hat{ψ})   (7.37)

The test statistics do not follow the usual t-distribution under the null hypothesis, since the null is one of non-stationarity, but rather they follow a non-standard distribution. Critical values are derived from simulation experiments in, for example, Fuller (1976); see also chapter 12 in this book. Relevant examples of the distribution are shown in table 7.1. A full set of Dickey–Fuller (DF) critical values is given in the appendix of statistical tables at the end of this book. A discussion and example of how such critical values (CV) are derived using simulation methods are presented in chapter 12.
Comparing these with the standard normal critical values, it can be
seen that the DF critical values are much bigger in absolute terms (i.e. more negative). Thus more evidence against the null hypothesis is required in the context of unit root tests than under standard t-tests. This arises partly from the inherent instability of the unit root process, the fatter distribution of the t-ratios in the context of non-stationary data (see figure 7.2), and the resulting uncertainty in inference. The null hypothesis of a unit root is rejected in favour of the stationary alternative in each case if the test statistic is more negative than the critical value.
The tests above are valid only if u_t is white noise. In particular, u_t is assumed not to be autocorrelated, but would be so if there was autocorrelation in the dependent variable of the regression (Δy_t) which has not been modelled. If this is the case, the test would be ‘oversized’, meaning that the true size of the test (the proportion of times a correct null hypothesis is incorrectly rejected) would be higher than the nominal size used (e.g. 5%). The solution is to ‘augment’ the test using p lags of the dependent variable. The alternative model in case (i) is now written

Δy_t = ψy_{t-1} + \sum_{i=1}^{p} α_i Δy_{t-i} + u_t   (7.38)

The lags of Δy_t now ‘soak up’ any dynamic structure present in the dependent variable, to ensure that u_t is not autocorrelated. The test is known as
an augmented Dickey–Fuller (ADF) test and is still conducted on ψ, and
the same critical values from the DF tables are used as before.
A problem now arises in determining the optimal number of lags of
the dependent variable. Although several ways of choosing p have been proposed, they are all somewhat arbitrary, and are thus not presented here. Instead, the following two simple rules of thumb are suggested. First, the frequency of the data can be used to decide. So, for example, if the data are monthly, use 12 lags, if the data are quarterly, use 4 lags, and so on. Clearly, there would not be an obvious choice for the number of lags to use in a regression containing higher frequency financial data (e.g. hourly or daily)! Second, an information criterion can be used to decide. So choose the number of lags that minimises the value of an information criterion, as outlined in chapter 6.
It is quite important to attempt to use an optimal number of lags of the
dependent variable in the test regression, and to examine the sensitivity of the outcome of the test to the lag length chosen. In most cases, hopefully the conclusion will not be qualitatively altered by small changes in p, but sometimes it will. Including too few lags will not remove all of the autocorrelation, thus biasing the results, while using too many will increase the coefficient standard errors. The latter effect arises since an increase in the number of parameters to estimate uses up degrees of freedom. Therefore, everything else being equal, the absolute values of the test statistics will be reduced. This will result in a reduction in the power of the test, implying that for a stationary process the null hypothesis of a unit root will be rejected less frequently than would otherwise have been the case.
7.1.5 Testing for higher orders of integration
Consider the simple regression

\Delta y_t = \psi y_{t-1} + u_t     (7.39)

H_0: ψ = 0 is tested against H_1: ψ < 0.
If H_0 is rejected, it would simply be concluded that y_t does not contain a unit root. But what should be the conclusion if H_0 is not rejected? The series contains a unit root, but is that it? No! What if y_t ∼ I(2)? The null hypothesis would still not have been rejected. It is now necessary to perform a test of

H_0: y_t ∼ I(2)  vs.  H_1: y_t ∼ I(1)

Δ²y_t (= Δy_t − Δy_{t−1}) would now be regressed on Δy_{t−1} (plus lags of Δ²y_t to augment the test if necessary). Thus, testing H_0: Δy_t ∼ I(1) is equivalent to testing H_0: y_t ∼ I(2). So in this case, if H_0 is not rejected (very unlikely in practice), it would be concluded that y_t is at least I(2). If H_0 is rejected, it would be concluded that y_t contains a single unit root. The tests should continue for a further unit root until H_0 is rejected.
Dickey and Pantula (1987) have argued that an ordering of the tests as described above (i.e. testing for I(1), then I(2), and so on) is, strictly speaking, invalid. The theoretically correct approach would be to start by assuming some highest plausible order of integration (e.g. I(2)), and to test I(2) against I(1). If I(2) is rejected, then test I(1) against I(0). In practice, however, to the author's knowledge, no financial time series contain more than a single unit root, so that this matter is of less concern in finance.
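A rough Python sketch of the simpler 'test up' sequence described before the Dickey–Pantula caveat is given below; the simulated I(2) series, the function name order_of_integration and the 5% decision rule are hypothetical choices made purely for illustration.

# Illustrative only: keep differencing the series and re-applying the ADF test
# until the unit root null is rejected, giving a rough estimate of d in I(d).
import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(y, max_d=2, level='5%'):
    for d in range(max_d + 1):
        series = np.diff(y, n=d) if d > 0 else y
        stat, _, _, _, crit, _ = adfuller(series, regression='c', autolag='AIC')
        if stat < crit[level]:        # more negative than the critical value
            return d                  # stationary after d differences, i.e. I(d)
    return max_d + 1                  # at least I(max_d + 1)

rng = np.random.default_rng(1)
y = np.cumsum(np.cumsum(rng.normal(size=300)))   # an I(2) series by construction
print("Estimated order of integration:", order_of_integration(y))   # expect 2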
7.1.6 Phillips–Perron (PP) tests
Phillips and Perron have developed a more comprehensive theory of unit root non-stationarity. The tests are similar to ADF tests, but they incorporate an automatic correction to the DF procedure to allow for autocorrelated residuals. The tests often give the same conclusions as, and suffer from most of the same important limitations as, the ADF tests.
7.1.7 Criticisms of Dickey–Fuller- and Phillips–Perron-type tests
The most important criticism that has been levelled at unit root tests is that their power is low if the process is stationary but with a root close to the non-stationary boundary. So, for example, consider an AR(1) data generating process with coefficient 0.95. If the true data generating process is

y_t = 0.95 y_{t-1} + u_t     (7.40)

the null hypothesis of a unit root should be rejected. It has thus been argued that the tests are poor at deciding, for example, whether φ = 1 or φ = 0.95, especially with small sample sizes. The source of this problem is that, under the classical hypothesis-testing framework, the null hypothesis is never accepted; it is simply stated that it is either rejected or not rejected. This means that a failure to reject the null hypothesis could occur either because the null was correct, or because there is insufficient information in the sample to enable rejection. One way to get around this problem is to use a stationarity test as well as a unit root test, as described in box 7.1.

Box 7.1 Stationarity tests

Stationarity tests have stationarity under the null hypothesis, thus reversing the null and alternative hypotheses relative to the Dickey–Fuller approach. Under stationarity tests, therefore, the data will appear stationary by default if there is little information in the sample. One such stationarity test is the KPSS test (Kwiatkowski et al., 1992). The computation of the test statistic is not discussed here, but the test is available within the EViews software. The results of these tests can be compared with those of the ADF/PP procedure to see whether the same conclusion is obtained. The null and alternative hypotheses under each testing approach are as follows:

ADF/PP: H_0: y_t ∼ I(1)  versus  H_1: y_t ∼ I(0)
KPSS:   H_0: y_t ∼ I(0)  versus  H_1: y_t ∼ I(1)

There are four possible outcomes:
(1) Reject H_0 under ADF/PP and do not reject H_0 under KPSS
(2) Do not reject H_0 under ADF/PP and reject H_0 under KPSS
(3) Reject H_0 under both approaches
(4) Do not reject H_0 under either approach

For the conclusions to be robust, the results should fall under outcomes 1 or 2, which would be the case when both tests conclude that the series is stationary or non-stationary, respectively. Outcomes 3 and 4 imply conflicting results. The joint use of stationarity and unit root tests is known as confirmatory data analysis.
7.2 Testing for unit roots in EViews
This example uses the same data on UK house prices as employed in chapter 5. Assuming that the data have been loaded, and the variables are defined as in chapter 5, double click on the icon next to the name of the series that you want to perform the unit root test on, so that a spreadsheet
appears containing the observations on that series. Open the raw house price series, 'hp', by clicking on the hp icon. Next, click on the View button on the button bar above the spreadsheet and then Unit Root Test…. You will then be presented with a menu containing various options, as in screenshot 7.1.
Screenshot 7.1 Options menu for unit root tests
From this, choose the following options:
(1) Test Type: Augmented Dickey–Fuller
(2) Test for Unit Root in: Levels
(3) Include in test equation: Intercept
(4) Maximum lags: 12

and click OK.
This will obviously perform an augmented Dickey–Fuller (ADF) test with up to 12 lags of the dependent variable in a regression equation on the raw data series with a constant but no trend in the test equation. EViews presents a large number of options here – for example, instead of the
Dickey–Fuller test, we could run the Phillips–Perron or KPSS tests as described above. Or, if we find that the levels of the series are non-stationary, we could repeat the analysis on the first differences directly from this menu rather than having to create the first differenced series separately. We can also choose between various methods for determining the optimum lag length in an augmented Dickey–Fuller test, with the Schwarz criterion being the default. The results for the raw house price series would appear as in the following table.
Null Hypothesis: HP has a unit root
Exogenous: Constant
Lag Length: 2 (Automatic based on SIC, MAXLAG=11)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic          2.707012     1.0000
Test critical values:      1% level            −3.464101
                           5% level            −2.876277
                           10% level           −2.574704
*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(HP)
Method: Least Squares
Date: 09/05/07   Time: 21:15
Sample (adjusted): 1991M04 2007M05
Included observations: 194 after adjustments

               Coefficient    Std. Error    t-Statistic    Prob.
HP(-1)           0.004890      0.001806       2.707012     0.0074
D(HP(-1))        0.220916      0.070007       3.155634     0.0019
D(HP(-2))        0.291059      0.070711       4.116164     0.0001
C              −99.91536     155.1872        −0.643838     0.5205

R-squared             0.303246    Mean dependent var       663.3590
Adjusted R-squared    0.292244    S.D. dependent var      1081.701
S.E. of regression  910.0161      Akaike info criterion     16.48520
Sum squared resid    1.57E+08     Schwarz criterion         16.55258
Log likelihood    −1595.065       Hannan-Quinn criter.      16.51249
F-statistic          27.56430     Durbin-Watson stat         2.010299
Prob(F-statistic)     0.000000
The value of the test statistic and the relevant critical values, given the type of test equation (e.g. whether there is a constant and/or trend included) and the sample size, are given in the first panel of the output above.
Schwarz's criterion has in this case chosen to include 2 lags of the dependent variable in the test regression. Clearly, the test statistic is not more negative than the critical value, so the null hypothesis of a unit root in the house price series cannot be rejected. The remainder of the output presents the estimation results. Since the dependent variable in this regression is non-stationary, it is not appropriate to examine the coefficient standard errors or their t-ratios in the test regression.
Now repeat all of the above steps for the first difference of the house price series (use the 'First Difference' option in the unit root testing window rather than using the level of the dhp series). The output would appear as in the following table.
Null Hypothesis: D(HP) has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic based on SIC, MAXLAG=11)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic         −5.112531     0.0000
Test critical values:      1% level            −3.464101
                           5% level            −2.876277
                           10% level           −2.574704
*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(HP,2)
Method: Least Squares
Date: 09/05/07   Time: 21:20
Sample (adjusted): 1991M04 2007M05
Included observations: 194 after adjustments

               Coefficient    Std. Error    t-Statistic    Prob.
D(HP(-1))       −0.374773      0.073305      −5.112531     0.0000
D(HP(-1),2)     −0.346556      0.068786      −5.038192     0.0000
C              259.6274       81.58188        3.182415     0.0017

R-squared             0.372994    Mean dependent var         9.661185
Adjusted R-squared    0.366429    S.D. dependent var      1162.061
S.E. of regression  924.9679      Akaike info criterion     16.51274
Sum squared resid    1.63E+08     Schwarz criterion         16.56327
Log likelihood    −1598.736       Hannan-Quinn criter.      16.53320
F-statistic          56.81124     Durbin-Watson stat         2.045299
Prob(F-statistic)     0.000000
In this case, as one would expect, the test statistic is more negative than the critical value and hence the null hypothesis of a unit root in the first differences is convincingly rejected. For completeness, run a unit root test on the levels of the dhp series, which are the percentage changes rather than the absolute differences in prices. You should find that these are also stationary.
Finally, run the KPSS test on the hp levels series by selecting it from the 'Test Type' box in the unit root testing window. You should observe now that the test statistic exceeds the critical value, even at the 1% level, so that the null hypothesis of a stationary series is strongly rejected, thus confirming the result of the unit root test previously conducted on the same series.
7.3 Cointegration
In most cases, if two variables that are I(1) are linearly combined, then the combination will also be I(1). More generally, if variables with differing orders of integration are combined, the combination will have an order of integration equal to the largest. If X_{i,t} ∼ I(d_i) for i = 1, 2, 3, ..., k, so that there are k variables each integrated of order d_i, and letting

z_t = \sum_{i=1}^{k} \alpha_i X_{i,t}     (7.41)

then z_t ∼ I(max d_i). z_t in this context is simply a linear combination of the k variables X_i. Rearranging (7.41)

X_{1,t} = \sum_{i=2}^{k} \beta_i X_{i,t} + z_t'     (7.42)

where β_i = −α_i/α_1, z_t' = z_t/α_1, i = 2, ..., k. All that has been done is to take one of the variables, X_{1,t}, and to rearrange (7.41) to make it the subject. It could also be said that the equation has been normalised on X_{1,t}. But viewed another way, (7.42) is just a regression equation where z_t' is a disturbance term. These disturbances would have some very undesirable properties: in general, z_t' will not be stationary and will be autocorrelated if all of the X_i are I(1).
As a further illustration, consider the following regression model containing variables y_t, x_{2t}, x_{3t} which are all I(1)

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t     (7.43)

For the estimated model, the SRF would be written

y_t = \hat{\beta}_1 + \hat{\beta}_2 x_{2t} + \hat{\beta}_3 x_{3t} + \hat{u}_t     (7.44)
Taking everything except the residuals to the LHS

y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2t} - \hat{\beta}_3 x_{3t} = \hat{u}_t     (7.45)

Again, the residuals when expressed in this way can be considered a linear combination of the variables. Typically, this linear combination of I(1) variables will itself be I(1), but it would obviously be desirable to obtain residuals that are I(0). Under what circumstances will this be the case? The answer is that a linear combination of I(1) variables will be I(0), in other words stationary, if the variables are cointegrated.
7.3.1 Definition of cointegration (Engle and Granger, 1987)
Let w_t be a k × 1 vector of variables; then the components of w_t are integrated of order (d, b) if:

(1) all components of w_t are I(d)
(2) there is at least one vector of coefficients α such that α′w_t ∼ I(d − b).

In practice, many financial variables contain one unit root, and are thus I(1), so that the remainder of this chapter will restrict the analysis to the case where d = b = 1. In this context, a set of variables is defined as cointegrated if a linear combination of them is stationary. Many time series are non-stationary but 'move together' over time – that is, there exist some influences on the series (for example, market forces), which imply that the two series are bound by some relationship in the long run. A cointegrating relationship may also be seen as a long-term or equilibrium phenomenon, since it is possible that cointegrating variables may deviate from their relationship in the short run, but their association would return in the long run.
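To make the definition concrete, the sketch below simulates two I(1) series that share a common stochastic trend and verifies that, while each series individually has a unit root, the particular linear combination y − 2x is stationary; the coefficient of 2, the sample size and the series names are arbitrary choices made for illustration only.

# Two I(1) series built from a common random-walk component are cointegrated:
# each is non-stationary, but the combination y - 2x is I(0).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
trend = np.cumsum(rng.normal(size=500))            # common I(1) component
x = trend + rng.normal(scale=0.5, size=500)
y = 2.0 * trend + rng.normal(scale=0.5, size=500)

for name, series in [("x", x), ("y", y), ("y - 2x", y - 2.0 * x)]:
    stat, pvalue, *_ = adfuller(series, regression='c', autolag='AIC')
    print(f"ADF on {name:7s}: statistic {stat:8.3f}, p-value {pvalue:.3f}")
# Only the combination y - 2x should reject the unit root null.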
7.3.2 Examples of possible cointegrating relationships in finance
Financial theory should suggest where two or more variables would be expected to hold some long-run relationship with one another. There are many examples in finance of areas where cointegration might be expected to hold, including:
●Spot and futures prices for a given commodity or asset
●Ratio of relative prices and an exchange rate
●Equity prices and dividends.
In all three cases, market forces arising from no-arbitrage conditions
suggest that there should be an equilibrium relationship between the
series concerned. The easiest way to understand this notion is perhaps to consider what would be the effect if the series were not cointegrated. If there were no cointegration, there would be no long-run relationship binding the series together, so that the series could wander apart without bound. Such an effect would arise since all linear combinations of the series would be non-stationary, and hence would not have a constant mean that would be returned to frequently.

Spot and futures prices may be expected to be cointegrated since they are obviously prices for the same asset at different points in time, and hence will be affected in very similar ways by given pieces of information. The long-run relationship between spot and futures prices would be given by the cost of carry.

Purchasing power parity (PPP) theory states that a given representative basket of goods and services should cost the same wherever it is bought when converted into a common currency. Further discussion of PPP occurs in section 7.9, but for now suffice it to say that PPP implies that the ratio of relative prices in two countries and the exchange rate between them should be cointegrated. If they did not cointegrate, assuming zero transactions costs, it would be profitable to buy goods in one country, sell them in another, and convert the money obtained back to the currency of the original country.

Finally, if it is assumed that some stock in a particular company is held to perpetuity (i.e. for ever), then the only return that would accrue to that investor would be in the form of an infinite stream of future dividend payments. Hence the discounted dividend model argues that the appropriate price to pay for a share today is the present value of all future dividends. Hence, it may be argued that one would not expect current prices to 'move out of line' with future anticipated dividends in the long run, thus implying that share prices and dividends should be cointegrated.

An interesting question to ask is whether a potentially cointegrating regression should be estimated using the levels of the variables or the logarithms of the levels of the variables. Financial theory may provide an answer as to the more appropriate functional form, but fortunately even if not, Hendry and Juselius (2000) note that if a set of series is cointegrated in levels, they will also be cointegrated in log levels.
7.4 Equilibrium correction or error correction models
When the concept of non-stationarity was first considered in the 1970s, a usual response was to independently take the first differences of each of
the I(1) variables and then to use these first differences in any subsequent modelling process. In the context of univariate modelling (e.g. the construction of ARMA models), this is entirely the correct approach. However, when the relationship between variables is important, such a procedure is inadvisable. While this approach is statistically valid, it does have the problem that pure first difference models have no long-run solution. For example, consider two series, y_t and x_t, that are both I(1). The model that one may consider estimating is

\Delta y_t = \beta \Delta x_t + u_t     (7.46)

One definition of the long run that is employed in econometrics implies that the variables have converged upon some long-term values and are no longer changing, thus y_t = y_{t−1} = y; x_t = x_{t−1} = x. Hence all the difference terms will be zero in (7.46), i.e. Δy_t = 0, Δx_t = 0, and thus everything in the equation cancels. Model (7.46) has no long-run solution and it therefore has nothing to say about whether x and y have an equilibrium relationship (see chapter 4).
Fortunately, there is a class of models that can overcome this problem by using combinations of first differenced and lagged levels of cointegrated variables. For example, consider the following equation

\Delta y_t = \beta_1 \Delta x_t + \beta_2 (y_{t-1} - \gamma x_{t-1}) + u_t     (7.47)

This model is known as an error correction model or an equilibrium correction model, and y_{t−1} − γx_{t−1} is known as the error correction term. Provided that y_t and x_t are cointegrated with cointegrating coefficient γ, then (y_{t−1} − γx_{t−1}) will be I(0) even though the constituents are I(1). It is thus valid to use OLS and standard procedures for statistical inference on (7.47). It is of course possible to have an intercept in either the cointegrating term (e.g. y_{t−1} − α − γx_{t−1}) or in the model for Δy_t (e.g. Δy_t = β_0 + β_1Δx_t + β_2(y_{t−1} − γx_{t−1}) + u_t) or both. Whether a constant is included or not could be determined on the basis of financial theory, considering the arguments on the importance of a constant discussed in chapter 4.
The error correction model is sometimes termed an equilibrium correction model, and the two terms will be used synonymously for the purposes of this book. Error correction models are interpreted as follows. y is purported to change between t − 1 and t as a result of changes in the values of the explanatory variable(s), x, between t − 1 and t, and also in part to correct for any disequilibrium that existed during the previous period. Note that the error correction term (y_{t−1} − γx_{t−1}) appears in (7.47) with a lag. It would be implausible for the term to appear without any lag (i.e. as y_t − γx_t), for this would imply that y changes between t − 1 and
t in response to a disequilibrium at time t. γ defines the long-run relationship between x and y, while β_1 describes the short-run relationship between changes in x and changes in y. Broadly, β_2 describes the speed of adjustment back to equilibrium, and its strict definition is that it measures the proportion of last period's equilibrium error that is corrected for.
Of course, an error correction model can be estimated for more than two variables. For example, if there were three variables, x_t, w_t, y_t, that were cointegrated, a possible error correction model would be

\Delta y_t = \beta_1 \Delta x_t + \beta_2 \Delta w_t + \beta_3 (y_{t-1} - \gamma_1 x_{t-1} - \gamma_2 w_{t-1}) + u_t     (7.48)

The Granger representation theorem states that if there exists a dynamic linear model with stationary disturbances and the data are I(1), then the variables must be cointegrated of order (1,1).
7.5 Testing for cointegration in regression:
a residuals-based approach
The model for the equilibrium correction term can be generalised further to include k variables (y and the k − 1 xs)

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t     (7.49)

u_t should be I(0) if the variables y_t, x_{2t}, ..., x_{kt} are cointegrated, but u_t will still be non-stationary if they are not.

Thus it is necessary to test the residuals of (7.49) to see whether they are non-stationary or stationary. The DF or ADF test can be used on û_t, using a regression of the form

\Delta \hat{u}_t = \psi \hat{u}_{t-1} + v_t     (7.50)

with v_t an iid error term.
However, since this is a test on the residuals of a model, û_t, the critical values are changed compared with those for a DF or an ADF test on a series of raw data. Engle and Granger (1987) have tabulated a new set of critical values for this application and hence the test is known as the Engle–Granger (EG) test. The reason that modified critical values are required is that the test is now operating on the residuals of an estimated model rather than on raw data. The residuals have been constructed from a particular set of coefficient estimates, and the sampling estimation error in those coefficients will change the distribution of the test statistic. Engle and Yoo (1987) tabulate a new set of critical values that are larger in absolute
value (i.e. more negative) than the DF critical values, also given at the end of this book. The critical values also become more negative as the number of variables in the potentially cointegrating regression increases.
It is also possible to use the Durbin–Watson (DW) test statistic or the Phillips–Perron (PP) approach to test for non-stationarity of û_t. If the DW test is applied to the residuals of the potentially cointegrating regression, it is known as the Cointegrating Regression Durbin Watson (CRDW). Under the null hypothesis of a unit root in the errors, CRDW ≈ 0, so the null of a unit root is rejected if the CRDW statistic is larger than the relevant critical value (which is approximately 0.5).
What are the null and alternative hypotheses for any unit root test applied to the residuals of a potentially cointegrating regression?

H_0: û_t ∼ I(1)
H_1: û_t ∼ I(0)
Thus, under the null hypothesis there is a unit root in the potentially cointegrating regression residuals, while under the alternative, the residuals are stationary. Under the null hypothesis, therefore, a stationary linear combination of the non-stationary variables has not been found. Hence, if this null hypothesis is not rejected, there is no cointegration. The appropriate strategy for econometric modelling in this case would be to employ specifications in first differences only. Such models would have no long-run equilibrium solution, but this would not matter since no cointegration implies that there is no long-run relationship anyway.

On the other hand, if the null of a unit root in the potentially cointegrating regression's residuals is rejected, it would be concluded that a stationary linear combination of the non-stationary variables had been found. Therefore, the variables would be classed as cointegrated. The appropriate strategy for econometric modelling in this case would be to form and estimate an error correction model, using a method described below.
Box 7.2 Multiple cointegrating relationships
In the case where there are only two variables in an equation, y_t and x_t, say, there can be at most only one linear combination of y_t and x_t that is stationary – i.e. at most one cointegrating relationship. However, suppose that there are k variables in a system (ignoring any constant term), denoted y_t, x_{2t}, ..., x_{kt}. In this case, there may be up to r linearly independent cointegrating relationships (where r ≤ k − 1). This potentially presents a problem for the OLS regression approach described above, which is capable of finding at most one cointegrating relationship no matter how many variables there are in the system. And if there are multiple cointegrating relationships, how can one know if there are others, or whether the 'best' or strongest cointegrating relationship has been found? An OLS regression will find the minimum variance stationary linear combination of the variables,¹ but there may be other linear combinations of the variables that have more intuitive appeal. The answer to this problem is to use a systems approach to cointegration, which will allow determination of all r cointegrating relationships. One such approach is Johansen's method – see section 7.8.

¹ Readers who are familiar with the literature on hedging with futures will recognise that running an OLS regression will minimise the variance of the hedged portfolio, i.e. it will minimise the regression's residual variance; the situation here is analogous.
7.6 Methods of parameter estimation in cointegrated systems
What should be the modelling strategy if the data at hand are thought to be non-stationary and possibly cointegrated? There are (at least) three methods that could be used: Engle–Granger, Engle–Yoo and Johansen. The first and third of these will be considered in some detail below.
7.6.1 The Engle–Granger 2-step method
This is a single equation technique, which is conducted as follows:
Step 1
Make sure that all the individual variables are I(1). Then estimate the cointegrating regression using OLS. Note that it is not possible to perform any inferences on the coefficient estimates in this regression – all that can be done is to estimate the parameter values. Save the residuals of the cointegrating regression, û_t. Test these residuals to ensure that they are I(0). If they are I(0), proceed to Step 2; if they are I(1), estimate a model containing only first differences.
Step 2
Use the step 1 residuals as one variable in the error correction model, e.g.

\Delta y_t = \beta_1 \Delta x_t + \beta_2 \hat{u}_{t-1} + v_t     (7.51)

where û_{t−1} = y_{t−1} − τ̂x_{t−1}. The stationary, linear combination of non-stationary variables is also known as the cointegrating vector. In this case, the cointegrating vector would be [1  −τ̂]. Additionally, any linear transformation of the cointegrating vector will also be a cointegrating vector. So, for example, −10y_{t−1} + 10τ̂x_{t−1} will also be stationary. In (7.45) above, the cointegrating vector would be [1  −β̂_1  −β̂_2  −β̂_3]. It is now valid to perform inferences in the second-stage regression, i.e. concerning the parameters β_1 and β_2 (provided that there are no other forms of misspecification, of course), since all variables in this regression are stationary.
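A compact Python sketch of the two steps, using ordinary least squares from statsmodels on simulated cointegrated series, might look as follows; the variable names, sample size and the absence of additional lags are illustrative assumptions.

# Engle-Granger 2-step estimation of an error correction model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
trend = np.cumsum(rng.normal(size=500))
x = trend + rng.normal(scale=0.5, size=500)
y = 2.0 * trend + rng.normal(scale=0.5, size=500)

# Step 1: cointegrating regression in levels; keep the residuals u_hat.
step1 = sm.OLS(y, sm.add_constant(x)).fit()
u_hat = step1.resid

# Step 2: regress dy_t on dx_t and the lagged residual (the error correction term).
dy, dx = np.diff(y), np.diff(x)
X2 = sm.add_constant(np.column_stack([dx, u_hat[:-1]]))
step2 = sm.OLS(dy, X2).fit()
print(step2.params)   # [constant, beta_1 (short run), beta_2 (speed of adjustment)]
# beta_2 should be negative, so that part of any deviation from the long-run
# relationship in one period is corrected in the next.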
The Engle–Granger 2-step method suffers from a number of problems:
(1) The usual finite sample problem of a lack of power in unit root and
cointegration tests discussed above.
(2) There could be a simultaneous equations bias if the causality between y and x runs in both directions, but this single equation approach requires the researcher to normalise on one variable (i.e. to specify one variable as the dependent variable and the others as independent variables). The researcher is forced to treat y and x asymmetrically, even though there may have been no theoretical reason for doing so. A further issue is the following. Suppose that the following specification had been estimated as a potential cointegrating regression

y_t = \alpha_1 + \beta_1 x_t + u_{1t}     (7.52)

What if instead the following equation was estimated?

x_t = \alpha_2 + \beta_2 y_t + u_{2t}     (7.53)

If it is found that u_{1t} ∼ I(0), does this imply automatically that u_{2t} ∼ I(0)? The answer in theory is 'yes', but in practice different conclusions may be reached in finite samples. Also, if there is an error in the model specification at stage 1, this will be carried through to the cointegration test at stage 2, as a consequence of the sequential nature of the computation of the cointegration test statistic.
(3) It is not possible to perform any hypothesis tests about the actual cointegrating relationship estimated at stage 1.
Problems 1 and 2 are small sample problems that should disappear asymptotically. Problem 3 is addressed by another method due to Engle and Yoo. There is also another alternative technique, which overcomes problems 2 and 3 by adopting a different approach based on estimation of a VAR system – see section 7.8.
7.6.2 The Engle and Yoo 3-step method
The Engle and Yoo (1987) 3-step procedure takes its first two steps from Engle–Granger (EG). Engle and Yoo then add a third step giving updated estimates of the cointegrating vector and its standard errors. The Engle and Yoo (EY) third step is algebraically technical and, additionally, EY suffers from all of the remaining problems of the EG approach. There is
arguably a far superior procedure available to remedy the lack of testability of hypotheses concerning the cointegrating relationship – namely, the Johansen (1988) procedure. For these reasons, the Engle–Yoo procedure is rarely employed in empirical applications and is not considered further here.
There now follows an application of the Engle–Granger procedure in
the context of spot and futures markets.
7.7 Lead–lag and long-term relationships between spot
and futures markets
7.7.1 Background
If the markets are frictionless and functioning efficiently, changes in the (log of the) spot price of a financial asset and the corresponding changes in the (log of the) futures price would be expected to be perfectly contemporaneously correlated and not to be cross-autocorrelated. Mathematically, these notions would be represented as

corr(\Delta \ln f_t, \Delta \ln s_t) ≈ 1     (a)
corr(\Delta \ln f_t, \Delta \ln s_{t-k}) ≈ 0  ∀ k > 0     (b)
corr(\Delta \ln f_{t-j}, \Delta \ln s_t) ≈ 0  ∀ j > 0     (c)

In other words, changes in spot prices and changes in futures prices are expected to occur at the same time (condition (a)). The current change in the futures price is also expected not to be related to previous changes in the spot price (condition (b)), and the current change in the spot price is expected not to be related to previous changes in the futures price (condition (c)). The changes in the logs of the spot and futures prices are also, of course, known as the spot and futures returns.
For the case when the underlying asset is a stock index, the equilibrium relationship between the spot and futures prices is known as the cost of carry model, given by

F_t^* = S_t e^{(r-d)(T-t)}     (7.54)

where F_t^* is the fair futures price, S_t is the spot price, r is a continuously compounded risk-free rate of interest, d is the continuously compounded yield in terms of dividends derived from the stock index until the futures contract matures, and (T − t) is the time to maturity of the futures contract. Taking logarithms of both sides of (7.54) gives

f_t^* = s_t + (r - d)(T - t)     (7.55)
where f_t^* is the log of the fair futures price and s_t is the log of the spot price. Equation (7.55) suggests that the long-term relationship between the logs of the spot and futures prices should be one to one. Thus the basis, defined as the difference between the futures and spot prices (and, if necessary, adjusted for the cost of carry), should be stationary, for if it could wander without bound, arbitrage opportunities would arise, which would be assumed to be quickly acted upon by traders such that the relationship between spot and futures prices will be brought back to equilibrium.
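As a purely hypothetical numerical illustration of (7.54) and (7.55), suppose that the spot index stands at 6500, the continuously compounded risk-free rate is 5% per annum, the dividend yield is 3% and the futures contract has three months (0.25 years) to maturity; none of these figures comes from the studies discussed below.

F_t^* = 6500\,e^{(0.05-0.03)\times 0.25} \approx 6532.6, \qquad f_t^* = \ln 6500 + (0.05-0.03)\times 0.25 \approx 8.785

The fair futures price therefore lies only slightly above the spot price, and the fair log basis, f_t^* − s_t = 0.005, would be expected to shrink towards zero as the contract approaches maturity.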
The notion that there should not be any lead–lag relationships between the spot and futures prices and that there should be a long-term one to one relationship between the logs of spot and futures prices can be tested using simple linear regressions and cointegration analysis. This book will now examine the results of two related papers – Tse (1995), who employs daily data on the Nikkei Stock Average (NSA) and its futures contract, and Brooks, Rew and Ritson (2001), who examine high-frequency data from the FTSE 100 stock index and index futures contract.
The data employed by Tse (1995) consists of 1,055 daily observations on NSA stock index and stock index futures values from December 1988 to April 1993. The data employed by Brooks et al. comprises 13,035 ten-minutely observations for all trading days in the period June 1996–May 1997, provided by FTSE International. In order to form a statistically adequate model, the variables should first be checked as to whether they can be considered stationary. The results of applying a Dickey–Fuller (DF) test to the logs of the spot and futures prices of the 10-minutely FTSE data are shown in table 7.2.

Table 7.2 DF tests on log-prices and returns for high frequency FTSE data

                                        Futures         Spot
Dickey–Fuller statistic for
  log-price data                        −0.1329        −0.7335
Dickey–Fuller statistic for
  returns data                         −84.9968      −114.1803
As one might anticipate, both studies conclude that the two log-price series contain a unit root, while the returns are stationary. Of course, it may be necessary to augment the tests by adding lags of the dependent variable to allow for autocorrelation in the errors (i.e. an augmented Dickey–Fuller or ADF test). Results for such tests are not presented, since the conclusions are not altered. A statistically valid model would therefore be one in the returns. However, a formulation containing only first differences has no
long-run equilibrium solution. Additionally, theory suggests that the two series should have a long-run relationship. The solution is therefore to see whether there exists a cointegrating relationship between f_t and s_t which would mean that it is valid to include levels terms along with returns in this framework. This is tested by examining whether the residuals, ẑ_t, of a regression of the form

s_t = \gamma_0 + \gamma_1 f_t + z_t     (7.56)

are stationary, using a Dickey–Fuller test, where z_t is the error term. The coefficient values for the estimated (7.56) and the DF test statistic are given in table 7.3.

Table 7.3 Estimated potentially cointegrating equation and test for cointegration for high frequency FTSE data

Coefficient               Estimated value
γ̂_0                        0.1345
γ̂_1                        0.9834

DF test on residuals      Test statistic
ẑ_t                       −14.7303

Source: Brooks, Rew and Ritson (2001).
Clearly, the residuals from the cointegrating regression can be considered stationary. Note also that the estimated slope coefficient in the cointegrating regression takes on a value close to unity, as predicted from the theory. It is not possible to formally test whether the true population coefficient could be one, however, since there is no way in this framework to test hypotheses about the cointegrating relationship.
The final stage in building an error correction model using the Engle–Granger 2-step approach is to use a lag of the first-stage residuals, ẑ_t, as the equilibrium correction term in the general equation. The overall model is

\Delta \ln s_t = \beta_0 + \delta \hat{z}_{t-1} + \beta_1 \Delta \ln s_{t-1} + \alpha_1 \Delta \ln f_{t-1} + v_t     (7.57)

where v_t is an error term. The coefficient estimates for this model are presented in table 7.4.
Table 7.4 Estimated error correction model for high frequency FTSE data

Coefficient     Estimated value      t-ratio
β̂_0              9.6713E−06          1.6083
δ̂               −0.8388             −5.1298
β̂_1              0.1799             19.2886
α̂_1              0.1312             20.4946

Source: Brooks, Rew and Ritson (2001).
Consider first the signs and significances of the coefficients (these can now be interpreted validly since all variables used in this model are stationary). α̂_1 is positive and highly significant, indicating that the futures market does indeed lead the spot market, since lagged changes in futures prices lead to a positive change in the subsequent spot price. β̂_1 is positive and highly significant, indicating on average a positive autocorrelation in spot returns. δ̂, the coefficient on the error correction term, is negative and significant, indicating that if the difference between the logs of the spot and futures prices is positive in one period, the spot price will fall during the next period to restore equilibrium, and vice versa.
7.7.2 Forecasting spot returns
Both Brooks, Rew and Ritson (2001) and Tse (1995) show that it is possible to use an error correction formulation to model changes in the log of a stock index. An obvious related question to ask is whether such a model can be used to forecast the future value of the spot series for a holdout sample of data not used previously for model estimation. Both sets of researchers employ forecasts from three other models for comparison with the forecasts of the error correction model. These are an error correction model with an additional term that allows for the cost of carry, an ARMA model (with lag length chosen using an information criterion) and an unrestricted VAR model (with lag length chosen using a multivariate information criterion).

The results are evaluated by comparing their root-mean squared errors, mean absolute errors and percentage of correct direction predictions. The forecasting results from the Brooks, Rew and Ritson paper are given in table 7.5.

Table 7.5 Comparison of out-of-sample forecasting accuracy

                      ECM         ECM-COC      ARIMA       VAR
RMSE                  0.0004382   0.0004350    0.0004531   0.0004510
MAE                   0.4259      0.4255       0.4382      0.4378
% Correct direction   67.69%      68.75%       64.36%      66.80%

Source: Brooks, Rew and Ritson (2001).
It can be seen from table 7.5 that the error correction models have both the lowest mean squared and mean absolute errors, and the highest proportion of correct direction predictions. There is, however, little to choose between the models, and all four have over 60% of the signs of the next returns predicted correctly.
It is clear that on statistical grounds the out-of-sample forecasting performances of the error correction models are better than those of their competitors, but this does not necessarily mean that such forecasts have any practical use. Many studies have questioned the usefulness of statistical measures of forecast accuracy as indicators of the profitability of using these forecasts in a practical trading setting (see, for example, Leitch and Tanner, 1991). Brooks, Rew and Ritson (2001) investigate this proposition directly by developing a set of trading rules based on the forecasts of the error correction model with the cost of carry term, the best statistical forecasting model. The trading period is an out-of-sample data series not used in model estimation, running from 1 May to 30 May 1997. The ECM-COC model yields 10-minutely one-step-ahead forecasts. The trading strategy involves analysing the forecast for the spot return, and incorporating the decision dictated by the trading rules described below. It is assumed that the original investment is £1,000, and if the holding in the stock index is zero, the investment earns the risk-free rate. Five trading strategies are employed, and their profitabilities are compared with that obtained by passively buying and holding the index. There are of course an infinite number of strategies that could be adopted for a given set of spot return forecasts, but Brooks, Rew and Ritson use the following (a stylised sketch of how such rules can be mechanised is given after the list):
● Liquid trading strategy This trading strategy involves making a round-trip trade (i.e. a purchase and sale of the FTSE 100 stocks) every 10 minutes that the return is predicted to be positive by the model. If the return is predicted to be negative by the model, no trade is executed and the investment earns the risk-free rate.
● Buy-and-hold while forecast positive strategy This strategy allows the trader to continue holding the index if the return at the next predicted investment period is positive, rather than making a round-trip transaction for each period.
● Filter strategy: better predicted return than average This strategy involves purchasing the index only if the predicted returns are greater than the average positive return (there is no trade for negative returns, therefore the average is only taken of the positive returns).
● Filter strategy: better predicted return than first decile This strategy is similar to the previous one, but rather than utilising the average as previously, only the returns predicted to be in the top 10% of all returns are traded on.
● Filter strategy: high arbitrary cutoff An arbitrary filter of 0.0075% is imposed, which will result in trades only for returns that are predicted to be extremely large for a 10-minute interval.
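The sketch below shows, in Python, how the liquid rule and a simple filter rule might be mechanised for a given vector of return forecasts; the forecast and return series, the risk-free rate and the omission of slippage and transactions costs are all invented assumptions, and the numbers bear no relation to those of Brooks, Rew and Ritson (2001).

# Hypothetical illustration of forecast-based trading rules versus buy-and-hold.
import numpy as np

rng = np.random.default_rng(6)
actual = rng.normal(loc=0.0001, scale=0.001, size=500)      # realised 10-minute returns
forecast = actual + rng.normal(scale=0.001, size=500)        # noisy one-step-ahead forecasts
rf_per_period = 0.00001                                      # assumed risk-free return per period

def terminal_wealth(position):
    # position[t] = 1 -> hold the index over period t; 0 -> earn the risk-free rate
    per_period = np.where(position == 1, actual, rf_per_period)
    return 1000.0 * np.prod(1.0 + per_period)

liquid = (forecast > 0).astype(int)                          # trade whenever the forecast is positive
threshold = forecast[forecast > 0].mean()                    # filter: beat the average positive forecast
filter_rule = (forecast > threshold).astype(int)

print("Buy and hold:", round(terminal_wealth(np.ones(500, dtype=int)), 2))
print("Liquid rule :", round(terminal_wealth(liquid), 2))
print("Filter rule :", round(terminal_wealth(filter_rule), 2))
# Slippage and transactions costs, which reverse the ranking in the study
# discussed below, are deliberately ignored in this toy example.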
The results from employing each of the strategies using the forecasts for the spot returns obtained from the ECM-COC model are presented in table 7.6.

Table 7.6 Trading profitability of the error correction model with cost of carry

                          Terminal      Return (%)        Terminal wealth (£)   Return (%) annualised   Number
Trading strategy          wealth (£)    annualised        with slippage         with slippage           of trades
Passive investment        1040.92        4.09 {49.08}     1040.92                4.09 {49.08}             1
Liquid trading            1156.21       15.62 {187.44}    1056.38                5.64 {67.68}           583
Buy-and-hold while
  forecast positive       1156.21       15.62 {187.44}    1055.77                5.58 {66.96}           383
Filter I                  1144.51       14.45 {173.40}    1123.57               12.36 {148.32}          135
Filter II                 1100.01       10.00 {120.00}    1046.17                4.62 {55.44}            65
Filter III                1019.82        1.98 {23.76}     1003.23                0.32 {3.84}              8

Note: annualised figures are shown in braces.
Source: Brooks, Rew and Ritson (2001).
The test month of May 1997 was a particularly bullish one, with a pure buy-and-hold-the-index strategy netting a return of 4%, or almost 50% on an annualised basis. Ideally, the forecasting exercise would be conducted over a much longer period than one month, and preferably over different market conditions. However, this was simply impossible due to the lack of availability of very high frequency data over a long time period. Clearly, the forecasts have some market timing ability in the sense that they seem to ensure trades that, on average, would have invested in the index when it rose, but be out of the market when it fell. The most profitable trading strategies in gross terms are those that trade on the basis of every positive spot return forecast, and all rules except the strictest filter make more money than a passive investment. The strict filter appears not to work well since it is out of the index for too long during a period when the market is rising strongly.
However, the picture of immense profitability painted thus far is somewhat misleading for two reasons: slippage time and transactions costs. First, it is unreasonable to assume that trades can be executed in the market the minute they are requested, since it may take some time to find counterparties for all the trades required to 'buy the index'. (Note, of course, that in practice, a similar returns profile to the index can be achieved with a very much smaller number of stocks.) Brooks, Rew and Ritson therefore allow for ten minutes of 'slippage time', which assumes that it takes ten minutes from when the trade order is placed to when it is executed. Second, it is unrealistic to consider gross profitability, since transactions costs in the spot market are non-negligible and the strategies examined suggested a lot of trades. Sutcliffe (1997, p. 47) suggests that total round-trip transactions costs for FTSE stocks are of the order of 1.7% of the investment.
The effect of slippage time is to make the forecasts less useful than they would otherwise have been. For example, if the spot price is forecast to rise, and it does, it may have already risen and then stopped rising by the time that the order is executed, so that the forecasts lose their market timing ability. Terminal wealth appears to fall substantially when slippage time is allowed for, with the monthly return falling by between 1.5% and 10%, depending on the trading rule.
Finally, if transactions costs are allowed for, none of the trading rules can outperform the passive investment strategy, and all in fact make substantial losses.
7.7.3 Conclusions
If the markets are frictionless and functioning efficiently, changes in the spot price of a financial asset and its corresponding futures price would be expected to be perfectly contemporaneously correlated and not to be cross-autocorrelated. Many academic studies, however, have documented that the futures market systematically 'leads' the spot market, reflecting news more quickly as a result of the fact that the stock index is not a single entity. The latter implies that:

● Some components of the index are infrequently traded, implying that the observed index value contains 'stale' component prices
● It is more expensive to transact in the spot market and hence the spot market reacts more slowly to news
● Stock market indices are recalculated only every minute so that new information takes longer to be reflected in the index.
Clearly, such spot market impediments cannot explain the inter-daily lead–lag relationships documented by Tse (1995). In any case, however, since it appears impossible to profit from these relationships, their existence is entirely consistent with the absence of arbitrage opportunities and is in accordance with modern definitions of the efficient markets hypothesis.
7.8 Testing for and estimating cointegrating systems using the
Johansen technique based on VARs
Suppose that a set of g variables (g ≥ 2) are under consideration that are I(1) and which are thought may be cointegrated. A VAR with k lags containing these variables could be set up:

y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_k y_{t-k} + u_t     (7.58)

where y_t and u_t are g × 1 vectors and each β_i is a g × g matrix. In order to use the Johansen test, the VAR (7.58) above needs to be turned into a vector error correction model (VECM) of the form

\Delta y_t = \Pi y_{t-k} + \Gamma_1 \Delta y_{t-1} + \Gamma_2 \Delta y_{t-2} + \cdots + \Gamma_{k-1} \Delta y_{t-(k-1)} + u_t     (7.59)

where \Pi = \left( \sum_{i=1}^{k} \beta_i \right) - I_g and \Gamma_i = \left( \sum_{j=1}^{i} \beta_j \right) - I_g.
This VAR contains g variables in first differenced form on the LHS, and k − 1 lags of the dependent variables (differences) on the RHS, each with a Γ coefficient matrix attached to it. In fact, the Johansen test can be affected by the lag length employed in the VECM, and so it is useful to attempt to select the lag length optimally, as outlined in chapter 6. The Johansen test centres around an examination of the Π matrix. Π can be interpreted as a long-run coefficient matrix, since in equilibrium, all the Δy_{t−i} will be zero, and setting the error terms, u_t, to their expected value of zero will leave Πy_{t−k} = 0. Notice the comparability between this set of equations and the testing equation for an ADF test, which has a first differenced term as the dependent variable, together with a lagged levels term and lagged differences on the RHS.
The test for cointegration between the ys is calculated by looking at the rank of the Π matrix via its eigenvalues.² The rank of a matrix is equal to the number of its characteristic roots (eigenvalues) that are different from zero (see the appendix at the end of this book for some algebra and examples). The eigenvalues, denoted λ_i, are put in descending order λ_1 ≥ λ_2 ≥ ... ≥ λ_g. If the λs are roots, in this context they must be less than 1 in absolute value and positive, and λ_1 will be the largest (i.e. the closest to one), while λ_g will be the smallest (i.e. the closest to zero). If the variables are not cointegrated, the rank of Π will not be significantly different from zero, so λ_i ≈ 0 ∀ i. The test statistics actually incorporate ln(1 − λ_i), rather than the λ_i themselves, but still, when λ_i = 0, ln(1 − λ_i) = 0.

² Strictly, the eigenvalues used in the test statistics are taken from rank-restricted product moment matrices and not from Π itself.
Suppose now that rank(Π) = 1; then ln(1 − λ_1) will be negative and ln(1 − λ_i) = 0 ∀ i > 1. More generally, if an eigenvalue λ_i is non-zero, then ln(1 − λ_i) < 0. That is, for Π to have a rank of 1, the largest eigenvalue must be significantly non-zero, while the others will not be significantly different from zero.
There are two test statistics for cointegration under the Johansen approach, which are formulated as

\lambda_{trace}(r) = -T \sum_{i=r+1}^{g} \ln(1 - \hat{\lambda}_i)     (7.60)

and

\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1})     (7.61)
where r is the number of cointegrating vectors under the null hypothesis and λ̂_i is the estimated value for the ith ordered eigenvalue from the Π matrix. Intuitively, the larger is λ̂_i, the larger in absolute value (and the more negative) will be ln(1 − λ̂_i) and hence the larger will be the test statistic. Each eigenvalue will have associated with it a different cointegrating vector, given by its corresponding eigenvector. A significantly non-zero eigenvalue indicates a significant cointegrating vector.
λ_trace is a joint test where the null is that the number of cointegrating vectors is less than or equal to r against an unspecified or general alternative that there are more than r. It starts with g eigenvalues, and then successively the largest is removed. λ_trace = 0 when all the λ_i = 0, for i = 1, ..., g.
λ_max conducts separate tests on each eigenvalue, and has as its null hypothesis that the number of cointegrating vectors is r against an alternative of r + 1.
Johansen and Juselius (1990) provide critical values for the two statistics. The distribution of the test statistics is non-standard, and the critical values depend on the value of g − r, the number of non-stationary components, and whether constants are included in each of the equations. Intercepts can be included either in the cointegrating vectors themselves or as additional terms in the VAR. The latter is equivalent to including a trend in the data generating processes for the levels of the series. Osterwald-Lenum (1992) provides a more complete set of critical values for the Johansen test, some of which are also given in the appendix of statistical tables at the end of this book.
If the test statistic is greater than the critical value from Johansen's tables, reject the null hypothesis that there are r cointegrating vectors in favour of the alternative that there are r + 1 (for λ_max) or more than r (for λ_trace). The testing is conducted in a sequence and, under the null, r = 0, 1, ..., g − 1, so that the hypotheses for λ_trace are

H_0: r = 0       versus   H_1: 0 < r ≤ g
H_0: r = 1       versus   H_1: 1 < r ≤ g
H_0: r = 2       versus   H_1: 2 < r ≤ g
...
H_0: r = g − 1   versus   H_1: r = g
The first test involves a null hypothesis of no cointegrating vectors (corresponding to Π having zero rank). If this null is not rejected, it would be concluded that there are no cointegrating vectors and the testing would be completed. However, if H_0: r = 0 is rejected, the null that there is one cointegrating vector (i.e. H_0: r = 1) would be tested and so on. Thus the value of r is continually increased until the null is no longer rejected.
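For reference, the sequential procedure is also available outside specialist packages; the Python sketch below uses the coint_johansen function from statsmodels on simulated data with a single cointegrating vector, and the data, lag length and deterministic-term choices are illustrative assumptions only.

# Johansen trace and maximum-eigenvalue tests on a simulated bivariate system.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(7)
trend = np.cumsum(rng.normal(size=500))
x = trend + rng.normal(scale=0.5, size=500)
y = 2.0 * trend + rng.normal(scale=0.5, size=500)
data = np.column_stack([y, x])

# det_order=0 includes a constant; k_ar_diff is the number of lagged differences.
result = coint_johansen(data, det_order=0, k_ar_diff=1)

print("Eigenvalues         :", np.round(result.eig, 4))
print("Trace statistics    :", np.round(result.lr1, 2))      # H0: r = 0, r <= 1, ...
print("Trace 95% crit vals :", np.round(result.cvt[:, 1], 2))
print("Max-eig statistics  :", np.round(result.lr2, 2))
print("Max-eig 95% crit    :", np.round(result.cvm[:, 1], 2))
# Rejecting H0: r = 0 but not H0: r <= 1 points to one cointegrating vector,
# given (up to normalisation) by the first column of result.evec.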
But how does this correspond to a test of the rank of the Π matrix? r is the rank of Π. Π cannot be of full rank (g) since this would correspond to the original y_t being stationary. If Π has zero rank, then by analogy to the univariate case, Δy_t depends only on Δy_{t−j} and not on y_{t−1}, so that there is no long-run relationship between the elements of y_{t−1}. Hence there is no cointegration. For 1 ≤ rank(Π) < g, there are r cointegrating vectors. Π is then defined as the product of two matrices, α and β′, of dimension (g × r) and (r × g), respectively, i.e.

\Pi = \alpha \beta'     (7.62)

The matrix β gives the cointegrating vectors, while α gives the amount of each cointegrating vector entering each equation of the VECM, also known as the 'adjustment parameters'.
For example, suppose that g = 4, so that the system contains four variables. The elements of the Π matrix would be written

\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} & \pi_{13} & \pi_{14} \\ \pi_{21} & \pi_{22} & \pi_{23} & \pi_{24} \\ \pi_{31} & \pi_{32} & \pi_{33} & \pi_{34} \\ \pi_{41} & \pi_{42} & \pi_{43} & \pi_{44} \end{pmatrix}     (7.63)
If r = 1, so that there is one cointegrating vector, then α and β will be (4 × 1):

\Pi = \alpha\beta' = \begin{pmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \end{pmatrix}     (7.64)

If r = 2, so that there are two cointegrating vectors, then α and β will be (4 × 2):

\Pi = \alpha\beta' = \begin{pmatrix} \alpha_{11} & \alpha_{21} \\ \alpha_{12} & \alpha_{22} \\ \alpha_{13} & \alpha_{23} \\ \alpha_{14} & \alpha_{24} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \\ \beta_{21} & \beta_{22} & \beta_{23} & \beta_{24} \end{pmatrix}     (7.65)

and so on for r = 3, ...
Suppose now that g = 4 and r = 1, as in (7.64) above, so that there are four variables in the system, y_1, y_2, y_3 and y_4, that exhibit one cointegrating vector. Then Πy_{t−k} will be given by

\Pi y_{t-k} = \begin{pmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \end{pmatrix} \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}_{t-k}     (7.66)

Equation (7.66) can also be written

\Pi y_{t-k} = \begin{pmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \end{pmatrix} \left( \beta_{11} y_1 + \beta_{12} y_2 + \beta_{13} y_3 + \beta_{14} y_4 \right)_{t-k}     (7.67)
Given (7.67), it is possible to write out the separate equations for each variable Δy_t. It is also common to 'normalise' on a particular variable, so that the coefficient on that variable in the cointegrating vector is one. For example, normalising on y_1 would make the cointegrating term in the equation for Δy_1

\alpha_{11}\left( y_1 + \frac{\beta_{12}}{\beta_{11}} y_2 + \frac{\beta_{13}}{\beta_{11}} y_3 + \frac{\beta_{14}}{\beta_{11}} y_4 \right)_{t-k}, \text{ etc.}
Finally, it must be noted that the above description is not exactly how the
Johansen procedure works, but is an intuitive approximation to it.
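A tiny numerical sketch may help to fix ideas: with g = 4 variables and r = 1, any Π of the form αβ′ has rank one, which is exactly the property the rank tests above are looking for; the numbers below are invented purely for illustration.

# Illustrative only: Pi = alpha * beta' with one cointegrating vector has rank 1.
import numpy as np

alpha = np.array([[-0.5], [0.2], [0.1], [0.0]])    # (4 x 1) adjustment coefficients
beta = np.array([[1.0], [-0.8], [0.3], [0.0]])     # (4 x 1) cointegrating vector
Pi = alpha @ beta.T                                # (4 x 4) long-run matrix

print(Pi)
print("rank(Pi) =", np.linalg.matrix_rank(Pi))     # prints 1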
7.8.1 Hypothesis testing using Johansen
Engle–Granger did not permit the testing of hypotheses on the cointegrating relationships themselves, but the Johansen setup does permit the testing of hypotheses about the equilibrium relationships between the variables. Johansen allows a researcher to test a hypothesis about one or more coefficients in the cointegrating relationship by viewing the hypothesis as a restriction on the Π matrix. If there exist r cointegrating vectors, only these linear combinations or linear transformations of them, or combinations of the cointegrating vectors, will be stationary. In fact, the matrix of cointegrating vectors β can be multiplied by any non-singular conformable matrix to obtain a new set of cointegrating vectors.

A set of required long-run coefficient values or relationships between the coefficients does not necessarily imply that the cointegrating vectors have to be restricted. This is because any combination of cointegrating vectors is also a cointegrating vector. So it may be possible to combine the cointegrating vectors thus far obtained to provide a new one or, in general, a new set, having the required properties. The simpler and fewer are the required properties, the more likely it is that this recombination process (called renormalisation) will automatically yield cointegrating vectors with the required properties. However, as the restrictions become more numerous or involve more of the coefficients of the vectors, it will eventually become impossible to satisfy all of them by renormalisation. After this point, all other linear combinations of the variables will be non-stationary. If the restriction does not affect the model much, i.e. if the restriction is not binding, then the eigenvectors should not change much following imposition of the restriction. A test statistic to test this hypothesis is given by
\text{test statistic} = -T \sum_{i=1}^{r} \left[ \ln(1 - \lambda_i) - \ln(1 - \lambda_i^*) \right] \sim \chi^2(m)     (7.68)

where λ_i^* are the characteristic roots of the restricted model, λ_i are the characteristic roots of the unrestricted model, r is the number of non-zero characteristic roots in the unrestricted model and m is the number of restrictions.
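As a purely hypothetical worked example of (7.68), suppose T = 200, there is r = 1 non-zero root, the unrestricted eigenvalue is λ_1 = 0.20 and imposing a single restriction (m = 1) reduces it to λ_1^* = 0.15:

-T\left[ \ln(1-\lambda_1) - \ln(1-\lambda_1^*) \right] = -200\left[ \ln 0.80 - \ln 0.85 \right] = -200(-0.2231 + 0.1625) \approx 12.1

Since 12.1 exceeds the 5% critical value of a χ²(1) distribution (3.84), the restriction would be rejected at the 5% level in this hypothetical case.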
Restrictions are actually imposed by substituting them into the relevant α or β matrices as appropriate, so that tests can be conducted on either the cointegrating vectors or their loadings in each equation in the system (or both). For example, considering (7.63)–(7.65) above, it may be that theory suggests that the coefficients on the loadings of the cointegrating vector(s) in each equation should take on certain values, in which case it would be relevant to test restrictions on the elements of α (e.g. α_11 = 1, α_23 = −1, etc.). Equally, it may be of interest to examine whether only a sub-set of the variables in y_t is actually required to obtain a stationary linear combination. In that case, it would be appropriate to test restrictions on elements of β. For example, to test the hypothesis that y_4 is not necessary to form a long-run relationship, set β_14 = 0, β_24 = 0, etc.
For an excellent detailed treatment of cointegration in the context of both single equation and multiple equation models, see Harris (1995). Several applications of tests for cointegration and modelling cointegrated systems in finance will now be given.
7.9 Purchasing power parity
Purchasing power parity (PPP) states that the equilibrium or long-run exchange rate between two countries is equal to the ratio of their relative price levels. Purchasing power parity implies that the real exchange rate, Q_t, is stationary. The real exchange rate can be defined as

Q_t = \frac{E_t P_t^*}{P_t}     (7.69)

where E_t is the nominal exchange rate in domestic currency per unit of foreign currency, P_t is the domestic price level and P_t^* is the foreign price level. Taking logarithms of (7.69) and rearranging, another way of stating the PPP relation is obtained

e_t - p_t + p_t^* = q_t     (7.70)

where the lower case letters in (7.70) denote logarithmic transforms of the corresponding upper case letters used in (7.69). A necessary and sufficient condition for PPP to hold is that the variables on the LHS of (7.70) – that is, the log of the exchange rate between countries A and B, and the logs of the price levels in countries A and B – be cointegrated with cointegrating vector [1  −1  1].
A test of this form is conducted by Chen (1995) using monthly data
from Belgium, France, Germany, Italy and the Netherlands over the
Table 7.7  Cointegration tests of PPP with European data

Tests for cointegration between   r = 0     r ≤ 1    r ≤ 2    α1      α2
FRF–DEM                           34.63*    17.10    6.26     1.33    −2.50
FRF–ITL                           52.69*    15.81    5.43     2.65    −2.52
FRF–NLG                           68.10*    16.37    6.42     0.58    −0.80
FRF–BEF                           52.54*    26.09*   3.63     0.78    −1.15
DEM–ITL                           42.59*    20.76*   4.79     5.80    −2.25
DEM–NLG                           50.25*    17.79    3.28     0.12    −0.25
DEM–BEF                           69.13*    27.13*   4.52     0.87    −0.52
ITL–NLG                           37.51*    14.22    5.05     0.55    −0.71
ITL–BEF                           69.24*    32.16*   7.15     0.73    −1.28
NLG–BEF                           64.52*    21.97*   3.88     1.69    −2.17
Critical values                   31.52     17.95    8.18     –       –

Notes: FRF – French franc; DEM – German mark; NLG – Dutch guilder; ITL – Italian lira; BEF – Belgian franc.
Source: Chen (1995). Reprinted with the permission of Taylor & Francis Ltd <www.tandf.co.uk>.
period April 1973 to December 1990. Pair-wise evaluations of the existence or otherwise of cointegration are examined for all combinations of these countries (10 country pairs). Since there are three variables in the system (the log exchange rate and the two log nominal price series) in each case, and the variables in their log-levels forms are non-stationary, there can be at most two linearly independent cointegrating relationships for each country pair. The results of applying Johansen's trace test are presented in Chen's table 1, adapted and presented here as table 7.7.
As can be seen from the results, the null hypothesis of no cointegrating vectors is rejected for all country pairs, and the null of one or fewer cointegrating vectors is rejected for France–Belgium, Germany–Italy, Germany–Belgium, Italy–Belgium and Netherlands–Belgium. In no case is the null of two or fewer cointegrating vectors rejected. It is therefore concluded that the PPP hypothesis is upheld and that there are either one or two cointegrating relationships between the series depending on the country pair. Estimates of $\alpha_1$ and $\alpha_2$ are given in the last two columns of table 7.7. PPP suggests that the estimated values of these coefficients should be 1 and −1, respectively. In most cases, the coefficient estimates are a long way from these expected values. Of course, it would be possible to impose this restriction and to test it in the Johansen framework as discussed above, but Chen does not conduct this analysis.
7.10 Cointegration between international bond markets
Often, investors will hold bonds from more than one national market in the expectation of achieving a reduction in risk via the resulting diversification. If international bond markets are very strongly correlated in the long run, diversification will be less effective than if the bond markets operated independently of one another. An important indication of the degree to which long-run diversification is available to international bond market investors is given by determining whether the markets are cointegrated. This book will now study two examples from the academic literature that consider this issue: Clare, Maras and Thomas (1995), and Mills and Mills (1991).
7.10.1 Cointegration between international bond markets: a univariate approach
Clare, Maras and Thomas (1995) use the Dickey–Fuller and Engle–Granger single-equation method to test for cointegration using a pair-wise analysis of four countries' bond market indices: US, UK, Germany and Japan. Monthly Salomon Brothers' total return government bond index data from January 1978 to April 1990 are employed. An application of the Dickey–Fuller test to the log of the indices reveals the following results (adapted from their table 1), given in table 7.8.
Neither the critical values, nor a statement of whether a constant or a trend was included in the test regressions, is offered in the paper. Nevertheless, the results are clear. Recall that the null hypothesis of a unit root is rejected if the test statistic is smaller (more negative) than the critical value. For samples of the size given here, the 5% critical value would
Table 7.8 DF tests for international bond indices
Panel A: test on log-index for country DF Statistic
Germany −0.395
Japan −0.799
UK −0.884
US 0.174
Panel B: test on log-returns for country
Germany −10.37
Japan −10.11
UK −10.56
US −10.64
Source : Clare, Maras and Thomas (1995). Reprinted with
the permission of Blackwell Publishers.
Table 7.9  Cointegration tests for pairs of international bond indices

Test    UK–Germany   UK–Japan   UK–US   Germany–Japan   Germany–US   Japan–US   5% Critical value
CRDW    0.189        0.197      0.097   0.230           0.169        0.139      0.386
DF      2.970        2.770      2.020   3.180           2.160        2.160      3.370
ADF     3.160        2.900      1.800   3.360           1.640        1.890      3.170

Source: Clare, Maras and Thomas (1995). Reprinted with the permission of Blackwell Publishers.
be somewhere between −1.95 and −3.50. It is thus demonstrated quite
conclusively that the logarithms of the indices are non-stationary, while taking the first difference of the logs (that is, constructing the returns) induces stationarity.
Given that all logs of the indices in all four cases are shown to be
I(1), the next stage in the analysis is to test for cointegration by forming
a potentially cointegrating regression and testing its residuals for non-stationarity. Clare, Maras and Thomas use regressions of the form
$B_i = \alpha_0 + \alpha_1 B_j + u$   (7.71)

with time subscripts suppressed and where $B_i$ and $B_j$ represent the log-bond indices for any two countries $i$ and $j$. The results are presented in their tables 3 and 4, which are combined into table 7.9 here. Clare, Maras and Thomas apply seven different tests, but only the results for the Cointegrating Regression Durbin Watson (CRDW), Dickey–Fuller and Augmented Dickey–Fuller tests (although the lag lengths for the latter are not given) are presented here.

In this case, the null hypothesis of a unit root in the residuals from regression (7.71) cannot be rejected. The conclusion is therefore that there is no cointegration between any pair of bond indices in this sample.
7.10.2 Cointegration between international bond markets:
a multivariate approach
Mills and Mills (1991) also consider the issue of cointegration or non-
cointegration between the same four international bond markets. However, unlike Clare, Maras and Thomas, who use bond price indices, Mills and Mills employ daily closing observations on the redemption yields. The latter's sample period runs from 1 April 1986 to 29 December 1989, giving 960 observations. They employ a Dickey–Fuller-type regression procedure to test the individual series for non-stationarity and conclude that all four yield series are I(1).
Table 7.10  Johansen tests for cointegration between international bond yields

r (number of cointegrating                              Critical values
vectors under the null hypothesis)   Test statistic     10%     5%
0                                    22.06              35.6    38.6
1                                    10.58              21.2    23.8
2                                     2.52              10.3    12.0
3                                     0.12               2.9     4.2
Source : Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
The Johansen systems procedure is then used to test for cointegration
between the series. Unlike the Clare, Maras and Thomas paper, Mills and Mills (1991) consider all four indices together rather than investigating them in a pair-wise fashion. Therefore, since there are four variables in the system (the redemption yield for each country), i.e. $g = 4$, there can be at most three linearly independent cointegrating vectors, i.e. $r \leq 3$. The trace statistic is employed, and it takes the form

$\lambda_{trace}(r) = -T \sum_{i=r+1}^{g} \ln(1 - \hat{\lambda}_i)$   (7.72)

where $\hat{\lambda}_i$ are the ordered eigenvalues. The results are presented in their
table 2, which is modified slightly here, and presented in table 7.10.
Looking at the first row under the heading, it can be seen that the test statistic is smaller than the critical value, so the null hypothesis that $r = 0$ cannot be rejected, even at the 10% level. It is thus not necessary to look at the remaining rows of the table. Hence, reassuringly, the conclusion from this analysis is the same as that of Clare, Maras and Thomas – i.e. that there are no cointegrating vectors.
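A minimal Python sketch of the Johansen trace test in (7.72) is given below for readers who wish to reproduce this style of analysis; it uses four simulated random-walk series in place of the Mills and Mills yield data, so its output is illustrative only.

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Simulated stand-ins for the four daily bond yield series (illustrative only).
rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=(960, 4)), axis=0)

# det_order=0 includes a constant; k_ar_diff is the number of lagged differences.
res = coint_johansen(y, det_order=0, k_ar_diff=8)

# res.lr1 holds the trace statistics for r = 0, 1, 2, 3 as in (7.72), and
# res.cvt the corresponding 90%, 95% and 99% critical values.
for r, (stat, cvs) in enumerate(zip(res.lr1, res.cvt)):
    print(f"H0: r <= {r}:  trace = {stat:7.2f}   critical values (90/95/99%) = {cvs}")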
Given that there are no linear combinations of the yields that are stationary, and therefore that there is no error correction representation, Mills and Mills then continue to estimate a VAR for the first differences of the yields. The VAR is of the form

$\Delta X_t = \sum_{i=1}^{k} \Gamma_i \Delta X_{t-i} + v_t$   (7.73)

where

$X_t = \begin{bmatrix} X(US)_t \\ X(UK)_t \\ X(WG)_t \\ X(JAP)_t \end{bmatrix}, \quad \Gamma_i = \begin{bmatrix} \Gamma_{11i} & \Gamma_{12i} & \Gamma_{13i} & \Gamma_{14i} \\ \Gamma_{21i} & \Gamma_{22i} & \Gamma_{23i} & \Gamma_{24i} \\ \Gamma_{31i} & \Gamma_{32i} & \Gamma_{33i} & \Gamma_{34i} \\ \Gamma_{41i} & \Gamma_{42i} & \Gamma_{43i} & \Gamma_{44i} \end{bmatrix}, \quad v_t = \begin{bmatrix} v_{1t} \\ v_{2t} \\ v_{3t} \\ v_{4t} \end{bmatrix}$
Table 7.11  Variance decompositions for VAR of international bond yields

                                    Explained by movements in
Explaining movements in   Days ahead   US     UK     Germany   Japan
US                         1           95.6   2.4     1.7        0.3
                           5           94.2   2.8     2.3        0.7
                          10           92.9   3.1     2.9        1.1
                          20           92.8   3.2     2.9        1.1
UK                         1            0.0  98.3     0.0        1.7
                           5            1.7  96.2     0.2        1.9
                          10            2.2  94.6     0.9        2.3
                          20            2.2  94.6     0.9        2.3
Germany                    1            0.0   3.4    94.6        2.0
                           5            6.6   6.6    84.8        3.0
                          10            8.3   6.5    82.9        3.6
                          20            8.4   6.5    82.7        3.7
Japan                      1            0.0   0.0     1.4      100.0
                           5            1.3   1.4     1.1       96.2
                          10            1.5   2.1     1.8       94.6
                          20            1.6   2.2     1.9       94.2
Source : Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
They set k, the number of lags of each change in the yield in each regression, to 8, arguing that likelihood ratio tests rejected the possibility of smaller numbers of lags. Unfortunately, and as one may anticipate for a regression of daily yield changes, the $R^2$ values for the VAR equations are low, ranging from 0.04 for the US to 0.17 for Germany. Variance decompositions and impulse responses are calculated for the estimated VAR. Two orderings of the variables are employed: one based on a previous study and one based on the chronology of the opening (and closing) of the financial markets considered: Japan → Germany → UK → US. Only results for the latter, adapted from tables 4 and 5 of Mills and Mills (1991), are presented here. The variance decompositions and impulse responses for the VARs are given in tables 7.11 and 7.12, respectively.
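A rough Python sketch of the corresponding modelling steps – a VAR(8) in the yield changes followed by variance decompositions and orthogonalised impulse responses – is given below; the data are simulated and the column names and ordering are assumptions for illustration rather than the Mills and Mills series.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulated stand-ins for daily changes in the four redemption yields,
# ordered by the chronology of market opening (illustrative only).
rng = np.random.default_rng(2)
dX = pd.DataFrame(rng.normal(scale=0.1, size=(960, 4)),
                  columns=['JAP', 'WG', 'UK', 'US'])

results = VAR(dX).fit(8)       # k = 8 lags, as in Mills and Mills (1991)

# Variance decompositions (cf. table 7.11) and impulse responses (cf. table
# 7.12); the column ordering determines the Cholesky ordering of the shocks.
fevd = results.fevd(20)
irf = results.irf(20)
print(fevd.summary())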
As one may expect from the low $R^2$ of the VAR equations, and the lack of cointegration, the bond markets seem very independent of one another. The variance decompositions, which show the proportion of the movements in the dependent variables that are due to their 'own' shocks, versus shocks to the other variables, seem to suggest that the US, UK and Japanese markets are to a certain extent exogenous in this system. That is, little of the movement of the US, UK or Japanese series can be
Table 7.12  Impulse responses for VAR of international bond yields

Response of US to innovations in
Days after shock   US      UK      Germany   Japan
 0                 0.98    0.00     0.00      0.00
 1                 0.06    0.01    −0.10      0.05
 2                −0.02    0.02    −0.14      0.07
 3                 0.09   −0.04     0.09      0.08
 4                −0.02   −0.03     0.02      0.09
10                −0.03   −0.01    −0.02     −0.01
20                 0.00    0.00    −0.10     −0.01

Response of UK to innovations in
Days after shock   US      UK      Germany   Japan
 0                 0.19    0.97     0.00      0.00
 1                 0.16    0.07     0.01     −0.06
 2                −0.01   −0.01    −0.05      0.09
 3                 0.06    0.04     0.06      0.05
 4                 0.05   −0.01     0.02      0.07
10                 0.01    0.01    −0.04     −0.01
20                 0.00    0.00    −0.01      0.00

Response of Germany to innovations in
Days after shock   US      UK      Germany   Japan
 0                 0.07    0.06     0.95      0.00
 1                 0.13    0.05     0.11      0.02
 2                 0.04    0.03     0.00      0.00
 3                 0.02    0.00     0.00      0.01
 4                 0.01    0.00     0.00      0.09
10                 0.01    0.01    −0.01      0.02
20                 0.00    0.00     0.00      0.00

Response of Japan to innovations in
Days after shock   US      UK      Germany   Japan
 0                 0.03    0.05     0.12      0.97
 1                 0.06    0.02     0.07      0.04
 2                 0.02    0.02     0.00      0.21
 3                 0.01    0.02     0.06      0.07
 4                 0.02    0.03     0.07      0.06
10                 0.01    0.01     0.01      0.04
20                 0.00    0.00     0.00      0.01
Source : Mills and Mills (1991). Reprinted with the permission of
Blackwell Publishers.
explained by movements other than their own bond yields. In the German
case, however, after 20 days, only 83% of movements in the German yield are explained by German shocks. The German yield seems particularly influenced by US (8.4% after 20 days) and UK (6.5% after 20 days) shocks. It also seems that Japanese shocks have the least influence on the bond yields of other markets.
A similar pattern emerges from the impulse response functions, which
show the effect of a unit shock applied separately to the error of each equation of the VAR. The markets appear relatively independent of one another, and also informationally efficient in the sense that shocks work through the system very quickly. There is never a response of more than 10% to shocks in any series three days after they have happened; in most cases, the shocks have worked through the system in two days. Such a result implies that the possibility of making excess returns by trading in one market on the basis of 'old news' from another appears very unlikely.
7.10.3 Cointegration in international bond markets: conclusions
A single set of conclusions can be drawn from both of these papers. Both approaches have suggested that international bond markets are not cointegrated. This implies that investors can gain substantial diversification benefits. This is in contrast to results reported for other markets, such as foreign exchange (Baillie and Bollerslev, 1989), commodities (Baillie, 1989), and equities (Taylor and Tonks, 1989). Clare, Maras and Thomas (1995) suggest that the lack of long-term integration between the markets may be due to 'institutional idiosyncrasies', such as heterogeneous maturity and taxation structures, and differing investment cultures, issuance patterns and macroeconomic policies between countries, which imply that the markets operate largely independently of one another.
7.11 Testing the expectations hypothesis of the term structure
of interest rates
The following notation replicates that employed by Campbell and Shiller (1991) in their seminal paper. The single, linear expectations theory of the term structure used to represent the expectations hypothesis (hereafter EH) defines a relationship between an n-period interest rate or yield, denoted $R_t^{(n)}$, and an m-period interest rate, denoted $R_t^{(m)}$, where $n > m$. Hence $R_t^{(n)}$ is the interest rate or yield on a longer-term instrument relative to a shorter-term interest rate or yield, $R_t^{(m)}$. More precisely, the EH states
that the expected return from investing in an n-period rate will equal the expected return from investing in m-period rates up to $n - m$ periods in the future plus a constant risk-premium, c, which can be expressed as

$R_t^{(n)} = \dfrac{1}{q} \sum_{i=0}^{q-1} E_t R_{t+mi}^{(m)} + c$   (7.74)

where $q = n/m$. Consequently, the longer-term interest rate, $R_t^{(n)}$, can be expressed as a weighted average of current and expected shorter-term interest rates, $R_t^{(m)}$, plus a constant risk premium, c. If (7.74) is considered, it can be seen that by subtracting $R_t^{(m)}$ from both sides of the relationship we have

$R_t^{(n)} - R_t^{(m)} = \dfrac{1}{q} \sum_{i=0}^{q-1} \sum_{j=1}^{i} E_t \left[\Delta^{(m)} R_{t+jm}^{(m)}\right] + c$   (7.75)
Examination of (7.75) generates some interesting restrictions. If the interest rates under analysis, say $R_t^{(n)}$ and $R_t^{(m)}$, are I(1) series, then, by definition, $\Delta R_t^{(n)}$ and $\Delta R_t^{(m)}$ will be stationary series. There is a general acceptance that interest rates, Treasury Bill yields, etc. are well described as I(1) processes and this can be seen in Campbell and Shiller (1988) and Stock and Watson (1988). Further, since c is a constant then it is by definition a stationary series. Consequently, if the EH is to hold, given that c and $\Delta R_t^{(m)}$ are I(0), implying that the RHS of (7.75) is stationary, then $R_t^{(n)} - R_t^{(m)}$ must by definition be stationary, otherwise we will have an inconsistency in the order of integration between the RHS and LHS of the relationship.
$R_t^{(n)} - R_t^{(m)}$ is commonly known as the spread between the n-period and m-period rates, denoted $S_t^{(n,m)}$, which in turn gives an indication of the slope of the term structure. Consequently, it follows that if the EH is to hold, then the spread will be found to be stationary and therefore $R_t^{(n)}$ and $R_t^{(m)}$ will cointegrate with a cointegrating vector $(1, -1)$ for $[R_t^{(n)}, R_t^{(m)}]$. Therefore, the integrated process driving each of the two rates is common to both and hence it can be said that the rates have a common stochastic trend. As a result, since the EH predicts that each interest rate series will cointegrate with the one-period interest rate, it must be true that the stochastic process driving all the rates is the same as that driving the one-period rate, i.e. any combination of rates formed to create a spread should be found to cointegrate with a cointegrating vector $(1, -1)$.
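This implication is straightforward to check: under the EH the spread should pass a standard stationarity test. The short sketch below applies an ADF test to the spread between two simulated rates that share a common stochastic trend; the series are invented purely so that the example runs.

import numpy as np
from statsmodels.tsa.stattools import adfuller

# Two simulated rates driven by a common stochastic trend (illustrative only).
rng = np.random.default_rng(3)
trend = np.cumsum(rng.normal(size=500))
R_m = trend + rng.normal(scale=0.3, size=500)          # short rate
R_n = 0.5 + trend + rng.normal(scale=0.3, size=500)    # long rate = short rate + premium

# Under the EH the spread S = R_n - R_m should be I(0), i.e. the two rates
# cointegrate with vector (1, -1).
spread = R_n - R_m
stat, pval, *_ = adfuller(spread, regression='c')
print(f"ADF on the spread: statistic = {stat:.2f}, p-value = {pval:.3f}")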
Many examinations of the expectations hypothesis of the term structure
have been conducted in the literature, and still no overall consensus appears to have emerged concerning its validity. One such study that tested
Table 7.13  Tests of the expectations hypothesis using the US zero coupon yield curve with monthly data

Sample period      Interest rates included                        Lag length of VAR   Hypothesis   λmax        λtrace
1952 M1–1978 M12   Xt = [Rt R(6)t]′                               2                   r = 0        47.54***    49.82***
                                                                                      r ≤ 1         2.28        2.28
1952 M1–1987 M2    Xt = [Rt R(120)t]′                             2                   r = 0        40.66***    43.73***
                                                                                      r ≤ 1         3.07        3.07
1952 M1–1987 M2    Xt = [Rt R(60)t R(120)t]′                      2                   r = 0        40.13***    42.63***
                                                                                      r ≤ 1         2.50        2.50
1973 M5–1987 M2    Xt = [Rt R(60)t R(120)t R(180)t R(240)t]′      7                   r = 0        34.78***    75.50***
                                                                                      r ≤ 1        23.31*      40.72
                                                                                      r ≤ 2        11.94       17.41
                                                                                      r ≤ 3         3.80        5.47
                                                                                      r ≤ 4         1.66        1.66

Notes: *, ** and *** denote significance at the 20%, 10% and 5% levels, respectively; r is the number of cointegrating vectors under the null hypothesis.
Source: Shea (1992). Reprinted with the permission of the American Statistical Association. All rights reserved.
the expectations hypothesis using a standard data-set due to McCulloch (1987) was conducted by Shea (1992). The data comprises a zero coupon term structure for various maturities from 1 month to 25 years, covering the period January 1952–February 1987. Various techniques are employed in Shea's paper, while only his application of the Johansen technique is discussed here. A vector $X_t$ containing the interest rate at each of the maturities is constructed

$X_t = \left[R_t\ R_t^{(2)}\ \ldots\ R_t^{(n)}\right]'$   (7.76)

where $R_t$ denotes the spot interest rate. It is argued that each of the elements of this vector is non-stationary, and hence the Johansen approach is used to model the system of interest rates and to test for cointegration between the rates. Both the $\lambda_{max}$ and $\lambda_{trace}$ statistics are employed, corresponding to the use of the maximum eigenvalue and the cumulated eigenvalues, respectively. Shea tests for cointegration between various combinations of the interest rates, measured as returns to maturity. A selection of Shea's results is presented in table 7.13.
The results below, together with the other results presented by Shea,
seem to suggest that the interest rates at different maturities are typically cointegrated, usually with one cointegrating vector. As one may have
expected, the cointegration becomes weaker in the cases where the analysis involves rates a long way apart on the maturity spectrum. However, cointegration between the rates is a necessary but not sufficient condition for the expectations hypothesis of the term structure to be vindicated by the data. Validity of the expectations hypothesis also requires that any combination of rates formed to create a spread should be found to cointegrate with a cointegrating vector $(1, -1)$. When comparable restrictions are placed on the $\beta$ estimates associated with the cointegrating vectors, they are typically rejected, suggesting only limited support for the expectations hypothesis.
7.12 Testing for cointegration and modelling cointegrated
systems using EViews
The S&P500 spot and futures series that were discussed in chapters 2 and 3
will now be examined for cointegration using EViews. If the two series are cointegrated, this means that the spot and futures prices have a long-term relationship, which prevents them from wandering apart without bound. To test for cointegration using the Engle–Granger approach, the residuals of a regression of the spot price on the futures price are examined.³ Create two new variables, for the log of the spot series and the log of the futures series, and call them 'lspot' and 'lfutures' respectively. Then generate a new equation object and run the regression:

LSPOT C LFUTURES

Note again that it is not valid to examine anything other than the coefficient values in this regression. The residuals of this regression are found in the object called RESID. First, if you click on the Resids tab, you will see a plot of the levels of the residuals (blue line), which looks much more like a stationary series than the original spot series (the red line corresponding to the actual values of y) looks. The plot should appear as in screenshot 7.2.
Generate a new series that will keep these residuals in an object for
later use:
STATRESIDS = RESID
³ Note that it is common to run a regression of the log of the spot price on the log of the futures rather than a regression in levels; the main reason for using logarithms is that the differences of the logs are returns, whereas this is not true for the levels.
Screenshot 7.2  Actual, fitted and residual plot to check for stationarity
This is required since every time a regression is run, the RESID object is up-
dated (overwritten) to contain the residuals of the most recently conducted regression. Perform the ADF Test on the residual series STATRESIDS. Assuming again that up to 12 lags are permitted, and that a constant but not a trend is employed in a regression on the levels of the series, the results are:
Null Hypothesis: STATRESIDS has a unit root
Exogenous: Constant
Lag Length: 0 (Automatic based on SIC, MAXLAG=12)

                                              t-Statistic    Prob.*
Augmented Dickey-Fuller test statistic        −8.050542      0.0000
Test critical values:    1% level             −3.534868
                         5% level             −2.906923
                         10% level            −2.591006
*MacKinnon (1996) one-sided p-values.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(STATRESIDS)
Method: Least Squares
Date: 09/06/07   Time: 10:55
Sample (adjusted): 2002M03 2007M07
Included observations: 65 after adjustments

                      Coefficient   Std. Error   t-Statistic   Prob.
STATRESIDS(-1)        −1.027830     0.127672     −8.050542     0.000000
C                      0.000352     0.003976      0.088500     0.929800

R-squared             0.507086     Mean dependent var       −0.000387
Adjusted R-squared    0.499262     S.D. dependent var        0.045283
S.E. of regression    0.032044     Akaike info criterion    −4.013146
Sum squared resid     0.064688     Schwarz criterion        −3.946241
Log likelihood        132.4272     Hannan-Quinn criter.     −3.986748
F-statistic           64.81123     Durbin-Watson stat        1.935995
Prob(F-statistic)     0.000000
Since the test statistic ( −8.05) is more negative than the critical values,
even at the 1% level, the null hypothesis of a unit root in the test regression residuals is strongly rejected. We would thus conclude that the two series are cointegrated. This means that an error correction model (ECM) can be estimated, as there is a linear combination of the spot and futures prices that would be stationary. The ECM would be the appropriate model rather than a model in pure first difference form because it would enable us to capture the long-run relationship between the series as well as the short-run one. We could now estimate an error correction model by running the regression⁴

rspot c rfutures statresids(−1)
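For readers working outside EViews, a rough Python analogue of these steps – the potentially cointegrating regression, the ADF test on its residuals and the error correction model built from the lagged residuals – is sketched below. The spot and futures series are simulated here so that the example runs, and the variable names simply mirror those used above.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# Simulated log spot and futures prices (illustrative only).
rng = np.random.default_rng(4)
lfutures = pd.Series(np.cumsum(rng.normal(0, 0.01, 300)))
lspot = lfutures + rng.normal(0, 0.005, 300)

# Step 1: potentially cointegrating regression and ADF test on its residuals.
step1 = sm.OLS(lspot, sm.add_constant(lfutures)).fit()
statresids = step1.resid
print(adfuller(statresids, maxlag=12, regression='c', autolag='AIC')[:2])

# Step 2: error correction model using the lagged residuals.
rspot = lspot.diff().dropna()
rfutures = lfutures.diff().dropna()
ecm_X = sm.add_constant(pd.DataFrame({'rfutures': rfutures,
                                      'ecm_term': statresids.shift(1)}).dropna())
ecm = sm.OLS(rspot.loc[ecm_X.index], ecm_X).fit()
print(ecm.params)

Note that statsmodels also provides a one-line residual-based test, coint(lspot, lfutures), which uses critical values appropriate for residuals from a cointegrating regression.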
Although the Engle–Granger approach is evidently very easy to use, as
outlined above, one of its major drawbacks is that it can estimate only up to one cointegrating relationship between the variables. In the spot-futures example, there can be at most one cointegrating relationship since there are only two variables in the system. But in other situations, if there are more variables, there could potentially be more than one linearly independent cointegrating relationship. Thus, it is appropriate instead to examine the issue of cointegration within the Johansen VAR framework.
⁴ If you run this regression, you will see that the estimated ECM results from this example are not entirely plausible, but this may have resulted from the relatively short sample period employed!
The application we will now examine centres on whether the yields
on treasury bills of different maturities are cointegrated. Re-open the
'macro.wf1' workfile that was used in chapter 3. There are six interest rate series corresponding to three and six months, and one, three, five and ten years. Each series has a name in the file starting with the letters 'ustb'. The first step in any cointegration analysis is to ensure that the variables are all non-stationary in their levels form, so confirm that this
is the case for each of the six series, by running a unit root test on
each one.
Next, to run the cointegration test, highlight the six series and then
click Quick/Group Statistics/Cointegration Test . A box should then appear
with the names of the six series in it. Click OK, and then the following
list of options will appear (screenshot 7.3).
Screenshot 7.3  Johansen cointegration test
The differences between models 1 to 6 centre on whether an intercept or
a trend or both are included in the potentially cointegrating relationship and/or the VAR. It is usually a good idea to examine the sensitivity of the result to the type of specification used, so select Option 6 which will do
this and click OK. The results appear as in the following table
Date: 09/06/07   Time: 11:43
Sample: 1986M03 2007M04
Included observations: 249
Series: USTB10Y USTB1Y USTB3M USTB3Y USTB5Y USTB6M
Lags interval: 1 to 4

Selected (0.05 level*) Number of Cointegrating Relations by Model

Data Trend:   None           None        Linear      Linear      Quadratic
Test Type     No Intercept   Intercept   Intercept   Intercept   Intercept
              No Trend       No Trend    No Trend    Trend       Trend
Trace         4              3           4           4           6
Max-Eig       3              2           2           1           1

*Critical values based on MacKinnon-Haug-Michelis (1999)

Information Criteria by Rank and Model

Data Trend:   None           None        Linear      Linear      Quadratic
Rank or       No Intercept   Intercept   Intercept   Intercept   Intercept
No. of CEs    No Trend       No Trend    No Trend    Trend       Trend

Log Likelihood by Rank (rows) and Model (columns)
0   1667.058   1667.058   1667.807   1667.807   1668.036
1   1690.466   1691.363   1691.975   1692.170   1692.369
2   1707.508   1709.254   1709.789   1710.177   1710.363
3   1719.820   1722.473   1722.932   1726.801   1726.981
4   1728.513   1731.269   1731.728   1738.760   1738.905
5   1733.904   1737.304   1737.588   1746.100   1746.238
6   1734.344   1738.096   1738.096   1751.143   1751.143
Akaike Information Criteria by Rank (rows) and Model (columns)
0 −12.23340 −12.23340 −12.19122 −12.19122 −12.14487
1 −12.32503 −12.32420 −12.28896 −12.28249 −12.24393
2 −12.36552 −12.36349 −12.33566 −12.32271 −12.29208
3 −12.36803∗−12.36524 −12.34484 −12.35182 −12.32916
4 −12.34147 −12.33148 −12.31910 −12.34345 −12.32856
5 −12.28838 −12.27553 −12.26979 −12.29799 −12.29107
6 −12.19553 −12.17748 −12.17748 −12.23408 −12.23408
Schwarz Criteria by Rank (rows) and Model (columns)
0 −10.19921∗−10.19921∗−10.07227 −10.07227 −9.941161
1 −10.12132 −10.10637 −10.00049 −9.979903 −9.870707
2 −9.992303 −9.962013 −9.877676 −9.836474 −9.749338
3 −9.825294 −9.780129 −9.717344 −9.681945 −9.616911
4 −9.629218 −9.562721 −9.522087 −9.489935 −9.446787
5 −9.406616 −9.323131 −9.303259 −9.260836 −9.239781
6 −9.144249 −9.041435 −9.041435 −9.013282 −9.013282
The results across the six types of model and the type of test (the ‘trace’
or 'max' statistics) are a little mixed concerning the number of cointegrating vectors (the top panel) but they do at least all suggest that the series are cointegrated – in other words, all specifications suggest that there is at least one cointegrating vector. The following three panels all provide information that could be used to determine the appropriate lag length for the VAR. The values of the log-likelihood function could be used to run tests of whether a VAR of a given order could be restricted to a VAR of lower order; AIC and SBIC values are provided in the final two panels. Fortunately, whichever model is used concerning whether intercepts and/or trends are incorporated, AIC selects a VAR with 3 lags and SBIC a VAR with 0 lags. Note that the difference in optimal model order could be attributed to the relatively small sample size available with this monthly sample compared with the number of observations that would have been available were daily data used, implying that the penalty term in SBIC is more severe on extra parameters in this case.
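As a cross-check on the lag length, the information criteria for different VAR orders can also be computed directly; the sketch below shows how this might be done with statsmodels, using simulated series standing in for the six 'ustb' variables. Note that this compares lag orders directly rather than cointegrating ranks, so it is a complement to, not a replica of, the panels above.

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulated stand-ins for the six yield series (illustrative only).
rng = np.random.default_rng(8)
levels = pd.DataFrame(np.cumsum(rng.normal(size=(249, 6)), axis=0),
                      columns=['ustb10y', 'ustb1y', 'ustb3m',
                               'ustb3y', 'ustb5y', 'ustb6m'])

# Compare AIC and BIC (SBIC) across VAR orders up to six lags in the levels.
order = VAR(levels).select_order(maxlags=6)
print(order.summary())    # reports the lag length chosen by each criterion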
So, in order to see the estimated models, click View/Cointegration Test
and select Option 3 (Intercept (no trend) in CE and test VAR), changing
the 'Lag Intervals' to 1 3, and clicking OK. EViews produces a very large quantity of output, as shown in the following table.⁵
Date: 09/06/07   Time: 13:20
Sample (adjusted): 1986M07 2007M04
Included observations: 250 after adjustments
Trend assumption: Linear deterministic trend
Series: USTB10Y USTB1Y USTB3M USTB3Y USTB5Y USTB6M
Lags interval (in first differences): 1 to 3
Unrestricted Cointegration Rank Test (Trace)
Hypothesized Trace 0.05
No. of CE(s) Eigenvalue Statistic Critical Value Prob.∗∗
None∗0.185263 158.6048 95.75366 0.0000
At most 1∗0.140313 107.3823 69.81889 0.0000
At most 2∗0.136686 69.58558 47.85613 0.0001
At most 3∗0.082784 32.84123 29.79707 0.0216
At most 4    0.039342   11.23816   15.49471   0.1973
At most 5    0.004804   1.203994   3.841466   0.2725
Trace test indicates 4 cointegrating eqn(s) at the 0.05 level
∗denotes rejection of the hypothesis at the 0.05 level
∗∗MacKinnon-Haug-Michelis (1999) p-values
⁵ Estimated cointegrating vectors and loadings are provided by EViews for 2–5
cointegrating vectors as well, but these are not shown to preserve space.
Unrestricted Cointegration Rank Test (Maximum Eigenvalue)
Hypothesized Max-Eigen 0.05
No. of CE(s) Eigenvalue Statistic Critical Value Prob.∗∗
None∗0.185263 51.22249 40.07757 0.0019
At most 1∗0.140313 37.79673 33.87687 0.0161
At most 2∗0.136686 36.74434 27.58434 0.0025
At most 3∗0.082784 21.60308 21.13162 0.0429
At most 4    0.039342   10.03416   14.26460   0.2097
At most 5    0.004804   1.203994   3.841466   0.2725
Max-eigenvalue test indicates 4 cointegrating eqn(s) at the 0.05 level
∗denotes rejection of the hypothesis at the 0.05 level
∗∗MacKinnon-Haug-Michelis (1999) p-values
Unrestricted Cointegrating Coefficients (normalized by b′*S11*b = I):
USTB10Y USTB1Y USTB3M USTB3Y USTB5Y USTB6M
2.775295 −6.449084 −14.79360 1.880919 −4.947415 21.32095
2.879835 0.532476 −0.398215 −7.247578 0.964089 3.797348
6.676821 −15.83409 1.422340 21.39804 −20.73661 6.834275
−7.351465 −9.144157 −3.832074 −6.082384 15.06649 11.51678
1.301354 0.034196 3.251778 8.469627 −8.131063 −4.915350
−2.919091 1.146874 0.663058 −1.465376 3.350202 −1.422377
Unrestricted Adjustment Coefficients (alpha):
D(USTB10Y) 0.030774 0.009498 0.038434 −0.042215 0.004975 0.012630
D(USTB1Y) 0.047301 −0.013791 0.037992 −0.050510 −0.012189 0.004599
D(USTB3M) 0.063889 −0.028097 0.004484 −0.031763 −0.003831 0.001249
D(USTB3Y) 0.042465 0.014245 0.035935 −0.062930 −0.006964 0.010137
D(USTB5Y) 0.039796 0.018413 0.041033 −0.058324 0.001649 0.010563
D(USTB6M) 0.042840 −0.029492 0.018767 −0.046406 −0.006399 0.002473
1 Cointegrating Equation(s): Log likelihood 1656.437
Normalized cointegrating coefficients (standard error in parentheses)
USTB10Y USTB1Y USTB3M USTB3Y USTB5Y USTB6M
1.000000 −2.323747 −5.330461 0.677737 −1.782662 7.682407
(0.93269) (0.78256) (0.92410) (0.56663) (1.28762)
Adjustment coefficients (standard error in parentheses)
D(USTB10Y) 0.085407
(0.04875)
D(USTB1Y) 0.131273
(0.04510)
D(USTB3M) 0.177312
(0.03501)
D(USTB3Y) 0.117854
(0.05468)
D(USTB5Y) 0.110446
(0.05369)
D(USTB6M) 0.118894
(0.03889)
2 Cointegrating Equation(s): Log likelihood 1675.335
Normalized cointegrating coefficients (standard error in parentheses)
USTB10Y USTB1Y USTB3M USTB3Y USTB5Y USTB6M
1.000000   0.000000   −0.520964   −2.281223    0.178708    1.787640
                      (0.76929)   (0.77005)   (0.53441)   (0.97474)
0.000000   1.000000    2.069717   −1.273357    0.844055   −2.536751
                      (0.43972)   (0.44016)   (0.30546)   (0.55716)
Adjustment coefficients (standard error in parentheses)
D(USTB10Y)   0.112760   −0.193408
             (0.07021)   (0.11360)
D(USTB1Y)    0.091558   −0.312389
             (0.06490)   (0.10500)
D(USTB3M)    0.096396   −0.426988
             (0.04991)   (0.08076)
D(USTB3Y)    0.158877   −0.266278
             (0.07871)   (0.12735)
D(USTB5Y)    0.163472   −0.246844
             (0.07722)   (0.12494)
D(USTB6M)    0.033962   −0.291983
             (0.05551)   (0.08981)
Note: Table truncated.
The first two panels of the table show the results for the $\lambda_{trace}$ and $\lambda_{max}$ statistics respectively. The second column in each case presents the ordered eigenvalues, the third column the test statistic, the fourth column the critical value and the final column the p-value. Examining the trace test, if we look at the first row after the headers, the statistic of 158.6048 considerably exceeds the critical value (of 95) and so the null of no cointegrating vectors is rejected. If we then move to the next row, the test statistic (107.3823) again exceeds the critical value so that the null of at most one cointegrating vector is also rejected. This continues, until we do not reject the null hypothesis of at most four cointegrating vectors at the 5% level, and this is the conclusion. The max test, shown in the second
panel, confirms this result.
The unrestricted coefficient values are the estimated values of coeffi-
cients in the cointegrating vector, and these are presented in the third panel. However, it is sometimes useful to normalise the coefficient values to set the coefficient value on one of them to unity, as would be the case in the cointegrating regression under the Engle–Granger approach. The normalisation will be done by EViews with respect to the first variable given in the variable list (i.e. whichever variable you listed first in the system will by default be given a coefficient of 1 in the normalised cointegrating vector). Panel 6 of the table presents the estimates if there were only one cointegrating vector, which has been normalised so that the coefficient on the ten-year bond yield is unity. The adjustment coefficients, or loadings in each regression (i.e. the 'amount of the cointegrating vector' in each equation), are also given in this panel. In the next panel, the same format is used (i.e. the normalised cointegrating vectors are presented and then the adjustment parameters) but under the assumption that there are two cointegrating vectors, and this proceeds until the situation where there are five cointegrating vectors, the maximum number possible for a system containing six variables.
In order to see the whole VECM model, select Proc/Make Vector
Autoregression …. Starting on the default ‘Basics’ tab, in ‘VAR type’, se-
lect Vector Error Correction , and in the ‘Lag Intervals for D(Endogenous):’
box, type 1 3. Then click on the Cointegration tab and leave the default as 1 cointegrating vector for simplicity in the 'Rank' box and option 3 to have an intercept but no trend in the cointegrating equation and the VAR. When OK is clicked, the output for the entire VECM will be seen.
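A comparable VECM can also be estimated outside EViews with the statsmodels VECM class; the sketch below uses a cointegrating rank of one, three lagged differences and an intercept restricted to the cointegrating relation, roughly mirroring the options chosen above. The yield series are simulated stand-ins and the column names are assumptions, so the estimates are illustrative only.

import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM

# Simulated stand-ins for the six yield series, sharing one common trend
# (illustrative only; in practice the 'ustb' series would be loaded instead).
rng = np.random.default_rng(5)
common = np.cumsum(rng.normal(size=(250, 1)), axis=0)
yields = pd.DataFrame(common + rng.normal(scale=0.2, size=(250, 6)),
                      columns=['ustb10y', 'ustb1y', 'ustb3m',
                               'ustb3y', 'ustb5y', 'ustb6m'])

# Rank 1, three lagged differences, constant inside the cointegrating relation.
res = VECM(yields, k_ar_diff=3, coint_rank=1, deterministic='ci').fit()
print(res.beta)     # cointegrating vector (normalised on the first variable)
print(res.alpha)    # adjustment coefficients (loadings)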
It is sometimes of interest to test hypotheses about either the parame-
ters in the cointegrating vector or their loadings in the VECM. To do this
from the ‘Vector Error Correction Estimates’ screen, click the Estimate
button and click on the VEC Restrictions tab.
In EViews, restrictions concerning the cointegrating relationships em-
bodied in β are denoted by B(i,j), where B(i,j) represents the jth coefficient
in the ith cointegrating relationship (screenshot 7.4).
Screenshot 7.4  VAR specification for Johansen tests
In this case, we are allowing for only one cointegrating relationship, so
suppose that we want to test the hypothesis that the three-month and six-month yields do not appear in the cointegrating equation. We could test this by specifying the restriction that their parameters are zero, which in EViews terminology would be achieved by writing B(1,3)=0, B(1,6)=0 in the 'VEC Coefficient Restrictions' box and clicking OK. EViews will then show the value of the test statistic, followed by the restricted cointegrating vector and the VECM. To preserve space, only the test statistic and restricted cointegrating vector are shown in the following table.
Since there are two restrictions, the test statistic follows a $\chi^2$ distribution with 2 degrees of freedom. In this case, the p-value for the test is 0.001, and so the restrictions are not supported by the data and
Vector Error Correction Estimates
Date: 09/06/07   Time: 14:04
Sample (adjusted): 1986M07 2007M04
Included observations: 250 after adjustments
Standard errors in ( ) & t-statistics in [ ]
Cointegration Restrictions:
   B(1,3)=0, B(1,6)=0
Convergence achieved after 38 iterations.
Not all cointegrating vectors are identified
LR test for binding restrictions (rank = 1):
Chi-square(2)   13.50308
Probability      0.001169

Cointegrating Eq:   CointEq1
USTB10Y(-1)         −0.088263
USTB1Y(-1)          −2.365941
USTB3M(-1)           0.000000
USTB3Y(-1)           5.381347
USTB5Y(-1)          −3.149580
USTB6M(-1)           0.000000
C                    0.923034

Note: Table truncated.
we would conclude that the cointegrating relationship must also include
the short end of the yield curve.
When performing hypothesis tests concerning the adjustment coeffi-
cients (i.e. the loadings in each equation), the restrictions are denoted by
A(i,j), which is the coefficient on the cointegrating vector for the ith
variable in the jth cointegrating relation. For example, A(2, 1)=0 would
test the null that the equation for the second variable in the order that they were listed in the original specification (USTB1Y in this case) does not include the first cointegrating vector, and so on. Examining some restrictions of this type is left as an exercise.
Key concepts
The key terms to be able to define and explain from this chapter are
●non-stationary ●explosive process
●unit root ●spurious regression
●augmented Dickey–Fuller test ●cointegration
●error correction model ●Engle–Granger 2-step approach
●Johansen technique ●vector error correction model
●eigenvalues
Review questions
1. (a) What kinds of variables are likely to be non-stationary? How can
such variables be made stationary?
(b) Why is it in general important to test for non-stationarity in time
series data before attempting to build an empirical model?
(c) Define the following terms and describe the processes that they
represent
(i) Weak stationarity
(ii) Strict stationarity
(iii) Deterministic trend
(iv) Stochastic trend.
2. A researcher wants to test the order of integration of some time series
data. He decides to use the DF test. He estimates a regression of the form

$\Delta y_t = \mu + \psi y_{t-1} + u_t$

and obtains the estimate $\hat{\psi} = -0.02$ with standard error = 0.31.
(a) What are the null and alternative hypotheses for this test?
(b) Given the data, and a critical value of −2.88, perform the test.
(c) What is the conclusion from this test and what should be the next
step?
(d) Why is it not valid to compare the estimated test statistic with the
corresponding critical value from a t-distribution, even though the test
statistic takes the form of the usual t-ratio?
3. Using the same regression as for question 2, but on a different set of
data, the researcher now obtains the estimate $\hat{\psi} = -0.52$ with standard error = 0.16.
(a) Perform the test.
(b) What is the conclusion, and what should be the next step?
(c) Another researcher suggests that there may be a problem with this
methodology since it assumes that the disturbances ($u_t$) are white noise. Suggest a possible source of difficulty and how the researcher might in practice get around it.
4. (a) Consider a series of values for the spot and futures prices of a given
commodity. In the context of these series, explain the concept of cointegration. Discuss how a researcher might test for cointegration between the variables using the Engle–Granger approach. Explain also the steps involved in the formulation of an error correction model.
(b) Give a further example from finance where cointegration between a
set of variables may be expected. Explain, by reference to the implication of non-cointegration, why cointegration between the series might be expected.
5. (a) Briefly outline Johansen’s methodology for testing for cointegration
between a set of variables in the context of a VAR.
(b) A researcher uses the Johansen procedure and obtains the following
test statistics (and critical values):
r    λmax     95% critical value
0    38.962   33.178
1    29.148   27.169
2    16.304   20.278
3     8.861   14.036
4     1.994    3.962
Determine the number of cointegrating vectors.
(c) ‘If two series are cointegrated, it is not possible to make inferences
regarding the cointegrating relationship using the Engle–Granger technique since the residuals from the cointegrating regression are likely to be autocorrelated.' How does Johansen circumvent this problem to test hypotheses about the cointegrating relationship?
(d) Give one or more examples from the academic finance literature of
where the Johansen systems technique has been employed. What were the main results and conclusions of this research?
(e) Compare the Johansen maximal eigenvalue test with the test based
on the trace statistic. State clearly the null and alternative hypotheses in each case.
6. (a) Suppose that a researcher has a set of three variables,
$y_t$ ($t = 1, \ldots, T$), i.e. $y_t$ denotes a p-variate, or $p \times 1$, vector, that she wishes to test for the existence of cointegrating relationships using the Johansen procedure.
What is the implication of finding that the rank of the appropriate matrix takes on a value of (i) 0, (ii) 1, (iii) 2, (iv) 3?
(b) The researcher obtains results for the Johansen test using the
variables outlined in part (a) as follows:
r    λmax    5% critical value
0    38.65   30.26
1    26.91   23.84
2    10.67   17.72
3     8.55   10.71
Determine the number of cointegrating vectors, explaining your
answer.
7. Compare and contrast the Engle–Granger and Johansen methodologies
for testing for cointegration and modelling cointegrated systems. Which, in your view, represents the superior approach and why?
8. In EViews, open the ‘currencies.wf1’ file that will be discussed in detail
in the following chapter. Determine whether the exchange rate series (in their raw levels forms) are non-stationary. If that is the case, test for cointegration between them using both the Engle–Granger and Johansen approaches. Would you have expected the series to cointegrate? Why or why not?
8
Modelling volatility and correlation
Learning Outcomes
In this chapter, you will learn how to
●Discuss the features of data that motivate the use of GARCH
models
●Explain how conditional volatility models are estimated
●Test for ‘ARCH-effects’ in time series data
●Produce forecasts from GARCH models
●Contrast various models from the GARCH family
●Discuss the three hypothesis testing procedures available under
maximum likelihood estimation
●Construct multivariate conditional volatility models and
compare between alternative specifications
●Estimate univariate and multivariate GARCH models in EViews
8.1 Motivations: an excursion into non-linearity land
All of the models that have been discussed in chapters 2–7 of this book
have been linear in nature – that is, the model is linear in the parameters, so that there is one parameter multiplied by each variable in the model. For example, a structural model could be something like

$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$   (8.1)

or more compactly $y = X\beta + u$. It was additionally assumed that $u_t \sim N(0, \sigma^2)$.
The linear paradigm as described above is a useful one. The properties
of linear estimators are very well researched and very well understood. Many models that appear, prima facie, to be non-linear, can be made linear
by taking logarithms or some other suitable transformation. However, it
is likely that many relationships in finance are intrinsically non-linear. As Campbell, Lo and MacKinlay (1997) state, the payoffs to options are non-linear in some of the input variables, and investors' willingness to trade off returns and risks is also non-linear. These observations provide clear motivations for consideration of non-linear models in a variety of circumstances in order to capture better the relevant features of the data.
Linear structural (and time series) models such as (8.1) are also unable
to explain a number of important features common to much financial data, including:
●Leptokurtosis – that is, the tendency for financial asset returns to have
distributions that exhibit fat tails and excess peakedness at the mean.
●Volatility clustering or volatility pooling – the tendency for volatility in
financial markets to appear in bunches. Thus large returns (of either sign) are expected to follow large returns, and small returns (of either sign) to follow small returns. A plausible explanation for this phenomenon, which seems to be an almost universal feature of asset return series in finance, is that the information arrivals which drive price changes themselves occur in bunches rather than being evenly spaced over time.
●Leverage effects – the tendency for volatility to rise more following a large
price fall than following a price rise of the same magnitude.
Campbell, Lo and MacKinlay (1997) broadly define a non-linear data generating process as one where the current value of the series is related non-linearly to current and previous values of the error term

$y_t = f(u_t, u_{t-1}, u_{t-2}, \ldots)$   (8.2)

where $u_t$ is an iid error term and $f$ is a non-linear function. According to Campbell, Lo and MacKinlay, a more workable and slightly more specific definition of a non-linear model is given by the equation

$y_t = g(u_{t-1}, u_{t-2}, \ldots) + u_t \sigma^2(u_{t-1}, u_{t-2}, \ldots)$   (8.3)

where $g$ is a function of past error terms only, and $\sigma^2$ can be interpreted as a variance term, since it is multiplied by the current value of the error. Campbell, Lo and MacKinlay usefully characterise models with non-linear $g(\cdot)$ as being non-linear in mean, while those with non-linear $\sigma^2(\cdot)$ are characterised as being non-linear in variance.
Models can be linear in mean and variance (e.g. the CLRM, ARMA mod-
els) or linear in mean, but non-linear in variance (e.g. GARCH models).
Models could also be classified as non-linear in mean but linear in variance (e.g. bicorrelations models, a simple example of which is of the following form (see Brooks and Hinich, 1999))

$y_t = \alpha_0 + \alpha_1 y_{t-1} y_{t-2} + u_t$   (8.4)
Finally, models can be non-linear in both mean and variance (e.g. the
hybrid threshold model with GARCH errors employed by Brooks, 2001).
8.1.1 Types of non-linear models
There are an infinite number of different types of non-linear model. However, only a small number of non-linear models have been found to be useful for modelling financial data. The most popular non-linear financial models are the ARCH or GARCH models used for modelling and forecasting volatility, and switching models, which allow the behaviour of a series to follow different processes at different points in time. Models for volatility and correlation will be discussed in this chapter, with switching models being covered in chapter 9.
8.1.2 Testing for non-linearity
How can it be determined whether a non-linear model may potentially be appropriate for the data? The answer to this question should come at least in part from financial theory: a non-linear model should be used where financial theory suggests that the relationship between variables should be such as to require a non-linear model. But the linear versus non-linear choice may also be made partly on statistical grounds – deciding whether a linear specification is sufficient to describe all of the most important features of the data at hand.
So what tools are available to detect non-linear behaviour in financial
time series? Unfortunately, 'traditional' tools of time series analysis (such as estimates of the autocorrelation or partial autocorrelation function, or 'spectral analysis', which involves looking at the data in the frequency domain) are likely to be of little use. Such tools may find no evidence of linear structure in the data, but this would not necessarily imply that the same observations are independent of one another.
However, there are a number of tests for non-linear patterns in time
series that are available to the researcher. These tests can broadly be split into two types: general tests and specific tests. General tests, also sometimes called 'portmanteau' tests, are usually designed to detect many departures from randomness in data. The implication is that such tests will
detect a variety of non-linear structures in data, although these tests are
unlikely to tell the researcher which type of non-linearity is present! Perhaps the simplest general test for non-linearity is Ramsey's RESET test discussed in chapter 4, although there are many other popular tests available. One of the most widely used tests is known as the BDS test (see Brock et al., 1996) named after the three authors who first developed it. BDS is a pure hypothesis test. That is, it has as its null hypothesis that the data are pure noise (completely random), and it has been argued to have power to detect a variety of departures from randomness – linear or non-linear stochastic processes, deterministic chaos, etc. (see Brock et al., 1991). The BDS test follows a standard normal distribution under the null hypothesis. The details of this test, and others, are technical and beyond the scope of this book, although computer code for BDS estimation is now widely available free of charge on the Internet.
As well as applying the BDS test to raw data in an attempt to ‘see if
there is anything there', another suggested use of the test is as a model diagnostic. The idea is that a proposed model (e.g. a linear model, GARCH, or some other non-linear model) is estimated, and the test applied to the (standardised) residuals in order to 'see what is left'. If the proposed model is adequate, the standardised residuals should be white noise, while if the postulated model is insufficient to capture all of the relevant features of the data, the BDS test statistic for the standardised residuals will be statistically significant. This is an excellent idea in theory, but has difficulties in practice. First, if the postulated model is a non-linear one (such as GARCH), the asymptotic distribution of the test statistic will be altered, so that it will no longer follow a normal distribution. This requires new critical values to be constructed via simulation for every type of non-linear model whose residuals are to be tested. More seriously, if a non-linear model is fitted to the data, any remaining structure is typically garbled, resulting in the test either being unable to detect additional structure present in the data (see Brooks and Henry, 2000) or selecting as adequate a model which is not even in the correct class for that data generating process (see Brooks and Heravi, 1999).
The BDS test is available in EViews. To run it on a given series, simply
open the series to be tested (which may be a set of raw data or residuals from an estimated model) so that it appears as a spreadsheet. Then select the View menu and BDS Independence Test…. You will then be
offered various options. Further details are given in the EViews User’s
Guide.
Other popular tests for non-linear structure in time series data include
the bispectrum test due to Hinich (1982) and the bicorrelation test (see Hsieh, 1993; Hinich, 1996; or Brooks and Hinich, 1999 for its multivariate generalisation).
Most applications of the above tests conclude that there is non-linear
dependence in financial asset returns series, but that the dependence is best characterised by a GARCH-type process (see Hinich and Patterson, 1985; Baillie and Bollerslev, 1989; Brooks, 1996; and the references therein for applications of non-linearity tests to financial data).
Specific tests, on the other hand, are usually designed to have power
to find specific types of non-linear structure. Specific tests are unlikely to detect other forms of non-linearities in the data, but their results will by definition offer a class of models that should be relevant for the data at hand. Examples of specific tests will be offered later in this and subsequent chapters.
8.2 Models for volatility
Modelling and forecasting stock market volatility has been the subject of vast empirical and theoretical investigation over the past decade or so by academics and practitioners alike. There are a number of motivations for this line of inquiry. Arguably, volatility is one of the most important concepts in the whole of finance. Volatility, as measured by the standard deviation or variance of returns, is often used as a crude measure of the total risk of financial assets. Many value-at-risk models for measuring market risk require the estimation or forecast of a volatility parameter. The volatility of stock market prices also enters directly into the Black–Scholes formula for deriving the prices of traded options.
The next few sections will discuss various models that are appropriate
to capture the stylised features of volatility, discussed below, that have been observed in the literature.
8.3 Historical volatility
The simplest model for volatility is the historical estimate. Historical volatility simply involves calculating the variance (or standard deviation) of returns in the usual way over some historical period, and this then becomes the volatility forecast for all future periods. The historical average variance (or standard deviation) was traditionally used as the volatility input to options pricing models, although there is a growing body of evidence suggesting that the use of volatility predicted from more
sophisticated time series models will lead to more accurate option val-
uations (see, for example, Akgiray, 1989; or Chu and Freund, 1996). Historical volatility is still useful as a benchmark for comparing the forecasting ability of more complex time models.
8.4 Implied volatility models
All pricing models for financial options require a volatility estimate or forecast as an input. Given the price of a traded option obtained from transactions data, it is possible to determine the volatility forecast over the lifetime of the option implied by the option's valuation. For example, if the standard Black–Scholes model is used, the option price, the time to maturity, a risk-free rate of interest, the strike price and the current value of the underlying asset are all either specified in the details of the options contracts or are available from market data. Therefore, given all of these quantities, it is possible to use a numerical procedure, such as the method of bisections or Newton–Raphson, to derive the volatility implied by the option (see Watsham and Parramore, 2004). This implied volatility is the market's forecast of the volatility of underlying asset returns over the lifetime of the option.
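The sketch below illustrates the idea by backing out the volatility implied by a European call price via the method of bisections applied to the Black–Scholes formula; the inputs are invented for the example rather than taken from market data.

import numpy as np
from scipy.stats import norm

def bs_call(S, K, r, T, sigma):
    # Black-Scholes price of a European call on a non-dividend-paying asset.
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, r, T, lo=1e-4, hi=3.0, tol=1e-6):
    # Back out the volatility implied by an observed call price by bisection.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) > price:
            hi = mid     # volatility too high: model price exceeds the observed price
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Illustrative inputs (not real market data): the implied vol is about 0.20.
print(implied_vol(price=10.45, S=100, K=100, r=0.05, T=1.0))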
8.5 Exponentially weighted moving average models
The exponentially weighted moving average (EWMA) is essentially a simple extension of the historical average volatility measure, which allows more recent observations to have a stronger impact on the forecast of volatility than older data points. Under an EWMA specification, the latest observation carries the largest weight, and weights associated with previous observations decline exponentially over time. This approach has two advantages over the simple historical model. First, volatility is in practice likely to be affected more by recent events, which carry more weight, than events further in the past. Second, the effect on volatility of a single given observation declines at an exponential rate as weights attached to recent events fall. On the other hand, the simple historical approach could lead to an abrupt change in volatility once the shock falls out of the measurement sample. And if the shock is still included in a relatively long measurement sample period, then an abnormally large observation will imply that the forecast will remain at an artificially high level even if the market is subsequently tranquil. The exponentially weighted moving
average model can be expressed in several ways, e.g.

$\sigma_t^2 = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j (r_{t-j} - \bar{r})^2$   (8.5)

where $\sigma_t^2$ is the estimate of the variance for period t, which also becomes the forecast of future volatility for all periods, $\bar{r}$ is the average return estimated over the observations and $\lambda$ is the 'decay factor', which determines how much weight is given to recent versus older observations. The decay factor could be estimated, but in many studies is set at 0.94 as recommended by RiskMetrics, producers of popular risk measurement software. Note also that RiskMetrics and many academic papers assume that the average return, $\bar{r}$, is zero. For data that is of daily frequency or higher, this is not an unreasonable assumption, and is likely to lead to negligible loss of accuracy since it will typically be very small. Obviously, in practice, an infinite number of observations will not be available on the series, so that the sum in (8.5) must be truncated at some fixed lag. As with exponential smoothing models, the forecast from an EWMA model for all prediction horizons is the most recent weighted average estimate.
It is worth noting two important limitations of EWMA models. First, while there are several methods that could be used to compute the EWMA, the crucial element in each case is to remember that when the infinite sum in (8.5) is replaced with a finite sum of observable data, the weights from the given expression will now sum to less than one. In the case of small samples, this could make a large difference to the computed EWMA and thus a correction may be necessary. Second, most time-series models, such as GARCH (see below), will have forecasts that tend towards the unconditional variance of the series as the prediction horizon increases. This is a good property for a volatility forecasting model to have, since it is well known that volatility series are 'mean-reverting'. This implies that if they are currently at a high level relative to their historic average, they will have a tendency to fall back towards their average level, while if they are at a low level relative to their historic average, they will have a tendency to rise back towards the average. This feature is accounted for in GARCH volatility forecasting models, but not by EWMAs.
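A minimal sketch of the EWMA estimator in (8.5), assuming the returns are held in a NumPy array, is given below; it truncates the infinite sum at the sample length and re-normalises the weights so that they sum to one, in line with the small-sample correction just discussed.

import numpy as np

def ewma_variance(returns, lam=0.94, demean=True):
    # lam = 0.94 follows the RiskMetrics convention mentioned in the text
    r = np.asarray(returns, dtype=float)
    rbar = r.mean() if demean else 0.0
    dev2 = (r[::-1] - rbar) ** 2                    # most recent observation first
    weights = (1 - lam) * lam ** np.arange(len(r))  # exponentially declining weights
    weights /= weights.sum()                        # correct for the truncated sum
    return np.sum(weights * dev2)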
8.6 Autoregressive volatility models
Autoregressive volatility models are a relatively simple example from the class of stochastic volatility specifications. The idea is that a time series of observations on some volatility proxy are obtained. The standard Box–Jenkins-type procedures for estimating autoregressive (or ARMA) models can then be applied to this series. If the quantity of interest in the study is a daily volatility estimate, two natural proxies have been employed in the literature: squared daily returns, or daily range estimators. Producing a series of daily squared returns trivially involves taking a column of observed returns and squaring each observation. The squared return at each point in time, t, then becomes the daily volatility estimate for day t. A range estimator typically involves calculating the log of the ratio of the highest observed price to the lowest observed price for trading day t, which then becomes the volatility estimate for day t

\sigma_t^2 = \log\left(\frac{\mathrm{high}_t}{\mathrm{low}_t}\right)    (8.6)

Given either the squared daily return or the range estimator, a standard autoregressive model is estimated, with the coefficients β_j estimated using OLS (or maximum likelihood – see below). The forecasts are also produced in the usual fashion discussed in chapter 5 in the context of ARMA models

\sigma_t^2 = \beta_0 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2 + \varepsilon_t    (8.7)
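The sketch below, assuming daily high and low price arrays are available, builds the range proxy in (8.6) and fits the autoregression in (8.7) by OLS using statsmodels; the lag order of 5 is purely illustrative. Forecasts can then be produced from the fitted results object in the usual autoregressive fashion.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def autoregressive_volatility(high, low, p=5):
    proxy = np.log(np.asarray(high) / np.asarray(low))  # range estimator, (8.6)
    return AutoReg(proxy, lags=p).fit()                  # AR(p) for the proxy, (8.7)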
8.7 Autoregressive conditionally heteroscedastic (ARCH) models
One particular non-linear model in widespread usage in finance is known as an 'ARCH' model (ARCH stands for 'autoregressive conditionally heteroscedastic'). To see why this class of models is useful, recall that a typical structural model could be expressed by an equation of the form given in (8.1) above with u_t ∼ N(0, σ^2). The assumption of the CLRM that the variance of the errors is constant is known as homoscedasticity (i.e. it is assumed that var(u_t) = σ^2). If the variance of the errors is not constant, this would be known as heteroscedasticity. As was explained in chapter 4, if the errors are heteroscedastic, but assumed homoscedastic, an implication would be that standard error estimates could be wrong. It is unlikely in the context of financial time series that the variance of the errors will be constant over time, and hence it makes sense to consider a model that does not assume that the variance is constant, and which describes how the variance of the errors evolves.
Another important feature of many series of financial asset returns that provides a motivation for the ARCH class of models is known as 'volatility clustering' or 'volatility pooling'. Volatility clustering describes the tendency of large changes in asset prices (of either sign) to follow large changes, and small changes (of either sign) to follow small changes. In other words, the current level of volatility tends to be positively correlated with its level during the immediately preceding periods. This phenomenon is demonstrated in figure 8.1, which plots daily S&P500 returns for January 1990–December 1999.

[Figure 8.1: Daily S&P500 returns for January 1990–December 1999]
The important point to note from figure 8.1 is that volatility occurs in bursts. There appears to have been a prolonged period of relative tranquillity in the market during the mid-1990s, evidenced by only relatively small positive and negative returns. On the other hand, during mid-1997 to late 1998, there was far more volatility, when many large positive and large negative returns were observed during a short space of time. Abusing the terminology slightly, it could be stated that 'volatility is autocorrelated'.
How could this phenomenon, which is common to many series of financial asset returns, be parameterised (modelled)? One approach is to use an ARCH model. To understand how the model works, a definition of the conditional variance of a random variable, u_t, is required. The distinction between the conditional and unconditional variances of a random variable is exactly the same as that of the conditional and unconditional mean. The conditional variance of u_t may be denoted σ_t^2, which is written as

\sigma_t^2 = \mathrm{var}(u_t \mid u_{t-1}, u_{t-2}, \ldots) = \mathrm{E}\big[(u_t - \mathrm{E}(u_t))^2 \mid u_{t-1}, u_{t-2}, \ldots\big]    (8.8)

It is usually assumed that E(u_t) = 0, so

\sigma_t^2 = \mathrm{var}(u_t \mid u_{t-1}, u_{t-2}, \ldots) = \mathrm{E}\big[u_t^2 \mid u_{t-1}, u_{t-2}, \ldots\big]    (8.9)

Equation (8.9) states that the conditional variance of a zero mean normally distributed random variable u_t is equal to the conditional expected value of the square of u_t. Under the ARCH model, the 'autocorrelation in volatility' is modelled by allowing the conditional variance of the error term, σ_t^2, to depend on the immediately previous value of the squared error

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2    (8.10)
The above model is known as an ARCH(1), since the conditional variance depends on only one lagged squared error. Notice that (8.10) is only a partial model, since nothing has been said yet about the conditional mean. Under ARCH, the conditional mean equation (which describes how the dependent variable, y_t, varies over time) could take almost any form that the researcher wishes. One example of a full model would be

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t, \quad u_t \sim N(0, \sigma_t^2)    (8.11)
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2    (8.12)

The model given by (8.11) and (8.12) could easily be extended to the general case where the error variance depends on q lags of squared errors, which would be known as an ARCH(q) model:

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_q u_{t-q}^2    (8.13)

Instead of calling the conditional variance σ_t^2, in the literature it is often called h_t, so that the model would be written

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t, \quad u_t \sim N(0, h_t)    (8.14)
h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_q u_{t-q}^2    (8.15)
The remainder of this chapter will use σ_t^2 to denote the conditional variance at time t, except for computer instructions, where h_t will be used since it is easier not to use Greek letters.
8.7.1 Another way of expressing ARCH models
For illustration, consider an ARCH(1). The model can be expressed in two ways that look different but are in fact identical. The first is as given in (8.11) and (8.12) above. The second way would be as follows

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t    (8.16)
u_t = v_t\sigma_t, \quad v_t \sim N(0, 1)    (8.17)
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2    (8.18)
The form of the model given in (8.11) and (8.12) is more commonly presented, although specifying the model as in (8.16)–(8.18) is required in order to use a GARCH process in a simulation study (see chapter 12). To show that the two methods for expressing the model are equivalent, consider that in (8.17), v_t is normally distributed with zero mean and unit variance, so that u_t will also be normally distributed with zero mean and variance σ_t^2.
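This second representation is the one a researcher would code up when simulating from the model. A minimal sketch, with arbitrary illustrative parameter values and a fixed random seed, is:

import numpy as np

def simulate_arch1(alpha0=0.1, alpha1=0.4, T=1000, seed=0):
    # u_t = v_t * sigma_t with v_t ~ N(0,1), as in (8.16)-(8.18)
    rng = np.random.default_rng(seed)
    u = np.zeros(T)
    sigma2 = np.zeros(T)
    sigma2[0] = alpha0 / (1 - alpha1)          # start at the unconditional variance
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2
        u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return u, sigma2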
8.7.2 Non-negativity constraints
Since h_t is a conditional variance, its value must always be strictly positive; a negative variance at any point in time would be meaningless. The variables on the RHS of the conditional variance equation are all squares of lagged errors, and so by definition will not be negative. In order to ensure that these always result in positive conditional variance estimates, all of the coefficients in the conditional variance are usually required to be non-negative. If one or more of the coefficients were to take on a negative value, then for a sufficiently large lagged squared innovation term attached to that coefficient, the fitted value from the model for the conditional variance could be negative. This would clearly be nonsensical. So, for example, in the case of (8.18), the non-negativity condition would be α_0 ≥ 0 and α_1 ≥ 0. More generally, for an ARCH(q) model, all coefficients would be required to be non-negative: α_i ≥ 0 for all i = 0, 1, 2, ..., q. In fact, this is a sufficient but not necessary condition for non-negativity of the conditional variance (i.e. it is a slightly stronger condition than is actually necessary).
8.7.3 Testing for ‘ARCH effects’
A test for determining whether 'ARCH effects' are present in the residuals of an estimated model may be conducted using the steps outlined in box 8.1.
Thus, the test is one of a joint null hypothesis that all q lags of the squared residuals have coefficient values that are not significantly different from zero. If the value of the test statistic is greater than the critical value from the χ^2 distribution, then reject the null hypothesis. The test can also be thought of as a test for autocorrelation in the squared residuals. As well as testing the residuals of an estimated model, the ARCH test is frequently applied to raw returns data.
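A rough Python version of the TR^2 test described in box 8.1, assuming the residuals are supplied as an array, might look as follows (statsmodels' het_arch function performs essentially the same calculation).

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def arch_lm_test(resid, q=5):
    u2 = np.asarray(resid) ** 2
    y = u2[q:]                                           # squared residuals
    lags = np.column_stack([u2[q - j:-j] for j in range(1, q + 1)])
    r2 = sm.OLS(y, sm.add_constant(lags)).fit().rsquared
    stat = len(y) * r2                                   # T * R^2
    return stat, chi2.sf(stat, q)                        # compare with chi-squared(q)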
8.7.4 Testing for ‘ARCH effects’ in exchange rate returns using EViews
Before estimating a GARCH-type model, it is sensible first to compute the Engle (1982) test for ARCH effects to make sure that this class of models is appropriate for the data. This exercise (and the remaining exercises of this chapter) will employ returns on the daily exchange rates, where there are 1,827 observations.
Box 8.1 Testing for 'ARCH effects'
(1) Run any postulated linear regression of the form given in the equation above, e.g.

y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t    (8.19)

saving the residuals, û_t.
(2) Square the residuals, and regress them on q own lags to test for ARCH of order q, i.e. run the regression

\hat{u}_t^2 = \gamma_0 + \gamma_1\hat{u}_{t-1}^2 + \gamma_2\hat{u}_{t-2}^2 + \cdots + \gamma_q\hat{u}_{t-q}^2 + v_t    (8.20)

where v_t is an error term. Obtain R^2 from this regression.
(3) The test statistic is defined as TR^2 (the number of observations multiplied by the coefficient of multiple correlation) from the last regression, and is distributed as a χ^2(q).
(4) The null and alternative hypotheses are
H_0: γ_1 = 0 and γ_2 = 0 and γ_3 = 0 and ... and γ_q = 0
H_1: γ_1 ≠ 0 or γ_2 ≠ 0 or γ_3 ≠ 0 or ... or γ_q ≠ 0
Models of this kind are inevitably more data intensive than those based on simple linear regressions, and hence, everything else being equal, they work better when the data are sampled daily rather than at a lower frequency.
A test for the presence of ARCH in the residuals is calculated by regressing the squared residuals on a constant and p lags, where p is set by the user. As an example, assume that p is set to 5. The first step is to estimate a linear model so that the residuals can be tested for ARCH. From the main menu, select Quick and then select Estimate Equation. In the Equation Specification Editor, input rgbp c ar(1) ma(1), which will estimate an ARMA(1,1) for the pound–dollar returns.¹ Select the Least Squares (NLS and ARMA) procedure to estimate the model, using the whole sample period, and press the OK button (output not shown).
The next step is to click on View from the Equation Window and to select Residual Tests and then Heteroskedasticity Tests . . . . In the 'Test type' box, choose ARCH, set the number of lags to include to 5, and press OK. The output below shows the Engle test results.
¹ Note that the (1,1) order has been chosen entirely arbitrarily at this stage. However, it is important to give some thought to the type and order of model used even if it is not of direct interest in the problem at hand (which will later be termed the 'conditional mean' equation), since the variance is measured around the mean and therefore any mis-specification in the mean is likely to lead to a mis-specified variance.
Heteroskedasticity Test: ARCH

F-statistic          5.909063    Prob. F(5,1814)        0.0000
Obs*R-squared       29.16797     Prob. Chi-Square(5)    0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 09/06/07   Time: 14:41
Sample (adjusted): 7/14/2002 7/07/2007
Included observations: 1820 after adjustments

               Coefficient   Std. Error   t-Statistic   Prob.
C                0.154689     0.011369     13.60633     0.0000
RESID^2(-1)      0.118068     0.023475      5.029627    0.0000
RESID^2(-2)     -0.006579     0.023625     -0.278463    0.7807
RESID^2(-3)      0.029000     0.023617      1.227920    0.2196
RESID^2(-4)     -0.032744     0.023623     -1.386086    0.1659
RESID^2(-5)     -0.020316     0.023438     -0.866798    0.3862

R-squared            0.016026   Mean dependent var      0.169496
Adjusted R-squared   0.013314   S.D. dependent var      0.344448
S.E. of regression   0.342147   Akaike info criterion   0.696140
Sum squared resid    212.3554   Schwarz criterion       0.714293
Log likelihood      -627.4872   Hannan-Quinn criter.    0.702837
F-statistic          5.909063   Durbin-Watson stat      1.995904
Prob(F-statistic)    0.000020
Both the F-version and the LM-statistic are very significant, suggesting the presence of ARCH in the pound–dollar returns.
8.7.5 Limitations of ARCH(q) models
ARCH provided a framework for the analysis and development of time series models of volatility. However, ARCH models themselves have rarely been used in the last decade or more, since they bring with them a number of difficulties:
● How should the value of q, the number of lags of the squared residual in the model, be decided? One approach to this problem would be the use of a likelihood ratio test, discussed later in this chapter, although there is no clearly best approach.
● The value of q, the number of lags of the squared error that are required to capture all of the dependence in the conditional variance, might be very large. This would result in a large conditional variance model that was not parsimonious. Engle (1982) circumvented this problem by specifying an arbitrary linearly declining lag length on an ARCH(4)

\sigma_t^2 = \gamma_0 + \gamma_1\big(0.4\,\hat{u}_{t-1}^2 + 0.3\,\hat{u}_{t-2}^2 + 0.2\,\hat{u}_{t-3}^2 + 0.1\,\hat{u}_{t-4}^2\big)    (8.21)

such that only two parameters are required in the conditional variance equation (γ_0 and γ_1), rather than the five which would be required for an unrestricted ARCH(4).
● Non-negativity constraints might be violated. Everything else equal, the more parameters there are in the conditional variance equation, the more likely it is that one or more of them will have negative estimated values.
A natural extension of an ARCH(q) model which overcomes some of these problems is a GARCH model. In contrast with ARCH, GARCH models are extremely widely employed in practice.
8.8 Generalised ARCH (GARCH) models
The GARCH model was developed independently by Bollerslev (1986) and Taylor (1986). The GARCH model allows the conditional variance to be dependent upon previous own lags, so that the conditional variance equation in the simplest case is now

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2    (8.22)

This is a GARCH(1,1) model. σ_t^2 is known as the conditional variance since it is a one-period ahead estimate for the variance calculated based on any past information thought relevant. Using the GARCH model it is possible to interpret the current fitted variance, h_t, as a weighted function of a long-term average value (dependent on α_0), information about volatility during the previous period (α_1 u_{t-1}^2) and the fitted variance from the model during the previous period (β σ_{t-1}^2). Note that the GARCH model can be expressed in a form that shows that it is effectively an ARMA model for the conditional variance. To see this, consider that the squared return at time t relative to the conditional variance is given by

\varepsilon_t = u_t^2 - \sigma_t^2    (8.23)

or

\sigma_t^2 = u_t^2 - \varepsilon_t    (8.24)

Using the latter expression to substitute in for the conditional variance in (8.22)

u_t^2 - \varepsilon_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\big(u_{t-1}^2 - \varepsilon_{t-1}\big)    (8.25)

Rearranging

u_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta u_{t-1}^2 - \beta\varepsilon_{t-1} + \varepsilon_t    (8.26)

so that

u_t^2 = \alpha_0 + (\alpha_1 + \beta)u_{t-1}^2 - \beta\varepsilon_{t-1} + \varepsilon_t    (8.27)

This final expression is an ARMA(1,1) process for the squared errors.
Why is GARCH a better and therefore a far more widely used model than ARCH? The answer is that the former is more parsimonious and avoids overfitting. Consequently, the model is less likely to breach non-negativity constraints. In order to illustrate why the model is parsimonious, first take the conditional variance equation in the GARCH(1,1) case, subtract 1 from each of the time subscripts of the conditional variance equation in (8.22), so that the following expression would be obtained
\sigma_{t-1}^2 = \alpha_0 + \alpha_1 u_{t-2}^2 + \beta\sigma_{t-2}^2    (8.28)

and subtracting 1 from each of the time subscripts again

\sigma_{t-2}^2 = \alpha_0 + \alpha_1 u_{t-3}^2 + \beta\sigma_{t-3}^2    (8.29)

Substituting into (8.22) for \sigma_{t-1}^2

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\big(\alpha_0 + \alpha_1 u_{t-2}^2 + \beta\sigma_{t-2}^2\big)    (8.30)

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_0\beta + \alpha_1\beta u_{t-2}^2 + \beta^2\sigma_{t-2}^2    (8.31)

Now substituting into (8.31) for \sigma_{t-2}^2

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_0\beta + \alpha_1\beta u_{t-2}^2 + \beta^2\big(\alpha_0 + \alpha_1 u_{t-3}^2 + \beta\sigma_{t-3}^2\big)    (8.32)

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_0\beta + \alpha_1\beta u_{t-2}^2 + \alpha_0\beta^2 + \alpha_1\beta^2 u_{t-3}^2 + \beta^3\sigma_{t-3}^2    (8.33)

\sigma_t^2 = \alpha_0(1 + \beta + \beta^2) + \alpha_1 u_{t-1}^2(1 + \beta L + \beta^2 L^2) + \beta^3\sigma_{t-3}^2    (8.34)

An infinite number of successive substitutions of this kind would yield

\sigma_t^2 = \alpha_0(1 + \beta + \beta^2 + \cdots) + \alpha_1 u_{t-1}^2(1 + \beta L + \beta^2 L^2 + \cdots) + \beta^{\infty}\sigma_0^2    (8.35)

The first expression on the RHS of (8.35) is simply a constant, and as the number of observations tends to infinity, β^∞ will tend to zero. Hence, the GARCH(1,1) model can be written as

\sigma_t^2 = \gamma_0 + \alpha_1 u_{t-1}^2(1 + \beta L + \beta^2 L^2 + \cdots)    (8.36)
           = \gamma_0 + \gamma_1 u_{t-1}^2 + \gamma_2 u_{t-2}^2 + \cdots    (8.37)

which is a restricted infinite order ARCH model. Thus the GARCH(1,1) model, containing only three parameters in the conditional variance equation, is a very parsimonious model that allows an infinite number of past squared errors to influence the current conditional variance.
The GARCH(1,1) model can be extended to a GARCH(p,q) formulation, where the current conditional variance is parameterised to depend upon q lags of the squared error and p lags of the conditional variance

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_q u_{t-q}^2 + \beta_1\sigma_{t-1}^2 + \beta_2\sigma_{t-2}^2 + \cdots + \beta_p\sigma_{t-p}^2    (8.38)

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i u_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2    (8.39)
But in general a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, and rarely is any higher order model estimated or even entertained in the academic finance literature.
8.8.1 The unconditional variance under a GARCH specification
The conditional variance is changing, but the unconditional variance of u_t is constant and given by

\mathrm{var}(u_t) = \frac{\alpha_0}{1 - (\alpha_1 + \beta)}    (8.40)

so long as α_1 + β < 1. For α_1 + β ≥ 1, the unconditional variance of u_t is not defined, and this would be termed 'non-stationarity in variance'. α_1 + β = 1 would be known as a 'unit root in variance', also termed 'Integrated GARCH' or IGARCH. Non-stationarity in variance does not have a strong theoretical motivation for its existence, as would be the case for non-stationarity in the mean (e.g. of a price series). Furthermore, a GARCH model whose coefficients imply non-stationarity in variance would have some highly undesirable properties. One illustration of these relates to the forecasts of variance made from such models. For stationary GARCH models, conditional variance forecasts converge upon the long-term average value of the variance as the prediction horizon increases (see below). For IGARCH processes, this convergence will not happen, while for α_1 + β > 1, the conditional variance forecast will tend to infinity as the forecast horizon increases!
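A one-line helper makes the point concrete; it simply evaluates (8.40) and refuses to return a value when the stationarity-in-variance condition fails.

def unconditional_variance(alpha0, alpha1, beta):
    # Long-run variance implied by a GARCH(1,1), equation (8.40)
    if alpha1 + beta >= 1:
        raise ValueError('alpha1 + beta >= 1: unconditional variance undefined')
    return alpha0 / (1 - (alpha1 + beta))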
8.9 Estimation of ARCH/GARCH models
Since the model is no longer of the usual linear form, OLS cannot be used for GARCH model estimation. There are a variety of reasons for this, but the simplest and most fundamental is that OLS minimises the residual sum of squares.
Box 8.2 Estimating an ARCH or GARCH model
(1) Specify the appropriate equations for the mean and the variance – e.g. an AR(1)-GARCH(1,1) model

y_t = \mu + \phi y_{t-1} + u_t, \quad u_t \sim N(0, \sigma_t^2)    (8.41)
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2    (8.42)

(2) Specify the log-likelihood function (LLF) to maximise under a normality assumption for the disturbances

L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log\big(\sigma_t^2\big) - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}    (8.43)

(3) The computer will maximise the function and generate parameter values that maximise the LLF and will construct their standard errors.
The RSS depends only on the parameters in the conditional mean equation, and not the conditional variance, and hence RSS minimisation is no longer an appropriate objective.
In order to estimate models from the GARCH family, another technique known as maximum likelihood is employed. Essentially, the method works by finding the most likely values of the parameters given the actual data. More specifically, a log-likelihood function is formed and the values of the parameters that maximise it are sought. Maximum likelihood estimation can be employed to find parameter values for both linear and non-linear models. The steps involved in actually estimating an ARCH or GARCH model are shown in box 8.2.
The following section will elaborate on points 2 and 3 above, explaining how the LLF is derived.
8.9.1 Parameter estimation using maximum likelihood
As stated above, under maximum likelihood estimation, a set of parameter values are chosen that are most likely to have produced the observed data. This is done by first forming a likelihood function, denoted LF. LF will be a multiplicative function of the actual data, which will consequently be difficult to maximise with respect to the parameters. Therefore, its logarithm is taken in order to turn LF into an additive function of the sample data, i.e. the LLF. A derivation of the maximum likelihood (ML) estimator in the context of the simple bivariate regression model with homoscedasticity is given in the appendix to this chapter. Essentially, deriving the ML estimators involves differentiating the LLF with respect to the parameters. But how does this help in estimating heteroscedastic models? How can the method outlined in the appendix for homoscedastic models be modified for application to GARCH model estimation?
In the context of conditional heteroscedasticity models, the model is y_t = μ + φ y_{t-1} + u_t, u_t ∼ N(0, σ_t^2), so that the variance of the errors has been modified from being assumed constant, σ^2, to being time-varying, σ_t^2, with the equation for the conditional variance as previously. The LLF relevant for a GARCH model can be constructed in the same way as for the homoscedastic case by replacing

\frac{T}{2}\log\sigma^2

with the equivalent for time-varying variance

\frac{1}{2}\sum_{t=1}^{T}\log\sigma_t^2

and replacing σ^2 in the denominator of the last part of the expression with σ_t^2 (see the appendix to this chapter). Derivation of this result from first principles is beyond the scope of this text, but the log-likelihood function for the above model with time-varying conditional variance and normally distributed errors is given by (8.43) in box 8.2.
Intuitively, maximising the LLF involves jointly minimising

\sum_{t=1}^{T}\log\sigma_t^2

and

\sum_{t=1}^{T}\frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}

(since these terms appear preceded with a negative sign in the LLF, and -\frac{T}{2}\log(2\pi) is just a constant with respect to the parameters). Minimising these terms jointly also implies minimising the error variance, as described in chapter 3. Unfortunately, maximising the LLF for a model with time-varying variances is trickier than in the homoscedastic case. Analytical derivatives of the LLF in (8.43) with respect to the parameters have been developed, but only in the context of the simplest examples of GARCH specifications. Moreover, the resulting formulae are complex, so a numerical procedure is often used instead to maximise the log-likelihood function.
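To make the procedure concrete, the sketch below codes the negative of the LLF in (8.43) for an AR(1)-GARCH(1,1) and hands it to a generic numerical optimiser; y is assumed to be a NumPy array of returns, the conditional variance is initialised at the sample variance of the residuals, and the starting values and bounds are illustrative choices rather than anything prescribed in the text.

import numpy as np
from scipy.optimize import minimize

def negative_llf(params, y):
    # Negative log-likelihood (8.43) for an AR(1)-GARCH(1,1) with normal errors
    mu, phi, alpha0, alpha1, beta = params
    u = y[1:] - mu - phi * y[:-1]
    T = len(u)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(u)                      # an assumed initialisation
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * (T * np.log(2 * np.pi) + np.sum(np.log(sigma2))
                  + np.sum(u ** 2 / sigma2))

# plausible starting values away from zero, then numerical maximisation of the LLF
start = np.array([0.0, 0.0, 0.01, 0.05, 0.90])
result = minimize(negative_llf, start, args=(y,), method='L-BFGS-B',
                  bounds=[(None, None), (None, None), (1e-8, None), (0.0, 1.0), (0.0, 1.0)])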
Essentially, all methods work by 'searching' over the parameter space until the values of the parameters that maximise the log-likelihood function are found. EViews employs an iterative technique for maximising the LLF. This means that, given a set of initial guesses for the parameter estimates, these parameter values are updated at each iteration until the program determines that an optimum has been reached. If the LLF has only one maximum with respect to the parameter values, any optimisation method should be able to find it – although some methods will take longer than others. A detailed presentation of the various methods available is beyond the scope of this book. However, as is often the case with non-linear models such as GARCH, the LLF can have many local maxima, so that different algorithms could find different local maxima of the LLF. Hence readers should be warned that different optimisation procedures could lead to different coefficient estimates and especially different estimates of the standard errors (see Brooks, Burke and Persand, 2001 or 2003 for details). In such instances, a good set of initial parameter guesses is essential.

[Figure 8.2: The problem of local optima in maximum likelihood estimation]
Local optima or multimodalities in the likelihood surface present potentially serious drawbacks with the maximum likelihood approach to estimating the parameters of a GARCH model, as shown in figure 8.2. Suppose that the model contains only one parameter, θ, so that the log-likelihood function is to be maximised with respect to this one parameter. In figure 8.2, the value of the LLF for each value of θ is denoted l(θ). Clearly, l(θ) reaches a global maximum when θ = C, and a local maximum when θ = A. This demonstrates the importance of good initial guesses for the parameters. Any initial guesses to the left of B are likely to lead to the selection of A rather than C. The situation is likely to be even worse in practice, since the log-likelihood function will be maximised with respect to several parameters, rather than one, and there could be many local optima.
Box 8.3 Using maximum likelihood estimation in practice
(1) Set up the LLF.
(2) Use regression to get initial estimates for the mean parameters.
(3) Choose some initial guesses for the conditional variance parameters. In most software packages, the default initial values for the conditional variance parameters would be zero. This is unfortunate since zero parameter values often yield a local maximum of the likelihood function. So if possible, set plausible initial values away from zero.
(4) Specify a convergence criterion – either by criterion or by value. When 'by criterion' is selected, the package will continue to search for 'better' parameter values that give a higher value of the LLF until the change in the value of the LLF between iterations is less than the specified convergence criterion. Choosing 'by value' will lead to the software searching until the changes in the coefficient estimates are small enough. The default convergence criterion for EViews is 0.001, which means that convergence is achieved and the program will stop searching if the biggest percentage change in any of the coefficient estimates for the most recent iteration is smaller than 0.1%.
Another possibility that would make optimisation difficult is when the LLF is flat around the maximum. So, for example, if the peak corresponding to C in figure 8.2 were flat rather than sharp, a range of values for θ could lead to very similar values for the LLF, making it difficult to choose between them.
So, to explain again in more detail, the optimisation is done in the way shown in box 8.3.
The optimisation methods employed by EViews are based on the determination of the first and second derivatives of the log-likelihood function with respect to the parameter values at each iteration, known as the gradient and Hessian (the matrix of second derivatives of the LLF w.r.t. the parameters), respectively. An algorithm for optimisation due to Berndt, Hall, Hall and Hausman (1974), known as BHHH, is available in EViews. BHHH employs only first derivatives (calculated numerically rather than analytically) and approximations to the second derivatives are calculated. Not calculating the actual Hessian at each iteration at each time step increases computational speed, but the approximation may be poor when the LLF is a long way from its maximum value, requiring more iterations to reach the optimum. The Marquardt algorithm, available in EViews, is a modification of BHHH (both of which are variants on the Gauss–Newton method) that incorporates a 'correction', the effect of which is to push the coefficient estimates more quickly to their optimal values. All of these optimisation methods are described in detail in Press et al. (1992).
8.9.2 Non-normality and maximum likelihood
Recall that the conditional normality assumption for u_t is essential in specifying the likelihood function. It is possible to test for non-normality using the following representation

u_t = v_t\sigma_t, \quad v_t \sim N(0, 1)    (8.44)
\sigma_t = \sqrt{\alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2}    (8.45)

Note that one would not expect u_t to be normally distributed – it is a N(0, σ_t^2) disturbance term from the regression model, which will imply it is likely to have fat tails. A plausible method to test for normality would be to construct the statistic

v_t = \frac{u_t}{\sigma_t}    (8.46)

which would be the model disturbance at each point in time t divided by the conditional standard deviation at that point in time. Thus, it is the v_t that are assumed to be normally distributed, not u_t. The sample counterpart would be

\hat{v}_t = \frac{\hat{u}_t}{\hat{\sigma}_t}    (8.47)

which is known as a standardised residual. Whether the v̂_t are normal can be examined using any standard normality test, such as the Bera–Jarque. Typically, v̂_t are still found to be leptokurtic, although less so than the û_t. The upshot is that the GARCH model is able to capture some, although not all, of the leptokurtosis in the unconditional distribution of asset returns.

Is it a problem if v̂_t are not normally distributed? Well, the answer is 'not really'. Even if the conditional normality assumption does not hold, the parameter estimates will still be consistent if the equations for the mean and variance are correctly specified. However, in the context of non-normality, the usual standard error estimates will be inappropriate, and a different variance–covariance matrix estimator that is robust to non-normality, due to Bollerslev and Wooldridge (1992), should be used. This procedure (i.e. maximum likelihood with Bollerslev–Wooldridge standard errors) is known as quasi-maximum likelihood, or QML.
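Given fitted residuals and conditional variances from any GARCH estimation (the names u_hat and sigma2_hat below are placeholders for whatever the estimation routine returns), the standardised residuals in (8.47) and a Bera–Jarque-type normality check can be computed as follows:

import numpy as np
from scipy.stats import jarque_bera

v_hat = u_hat / np.sqrt(sigma2_hat)      # standardised residuals, equation (8.47)
jb_stat, jb_pvalue = jarque_bera(v_hat)  # test whether the v_hat look normal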
8.9.3 Estimating GARCH models in EViews
To estimate a GARCH-type model, open the equation specification dialog by selecting Quick/Estimate Equation or by selecting Object/New Object/Equation . . . . Select ARCH from the 'Estimation Settings' selection box. The window in screenshot 8.1 will open.
[Screenshot 8.1: Estimating a GARCH-type model]
It is necessary to specify both the mean and the variance equations, as well as the estimation technique and sample.
The mean equation
The specification of the mean equation should be entered in the dependent variable edit box. Enter the specification by listing the dependent variable followed by the regressors. The constant term 'C' should also be included. If your specification includes an ARCH-M term (see later in this chapter), you should click on the appropriate button in the upper RHS of the dialog box to select the conditional standard deviation, the conditional variance, or the log of the conditional variance.
The variance equation
The edit box labelled 'Variance regressors' is where variables that are to be included in the variance specification should be listed. Note that EViews will always include a constant in the conditional variance, so that it is not necessary to add 'C' to the variance regressor list. Similarly, it is not necessary to include the ARCH or GARCH terms in this box as they will be dealt with in other parts of the dialog box. Instead, enter here any exogenous variables or dummies that you wish to include in the conditional variance equation, or (as is usually the case), just leave this box blank.
Variance and distribution specification
Under the 'Variance and distribution specification' label, choose the number of ARCH and GARCH terms. The default is to estimate with one ARCH and one GARCH term (i.e. one lag of the squared errors and one lag of the conditional variance, respectively). To estimate the standard GARCH model, leave the default 'GARCH/TARCH'. The other entries in this box describe more complicated variants of the standard GARCH specification, which are described in later sections of this chapter.
Estimation options
EViews provides a number of optional estimation settings. Clicking on the Options tab gives the options in screenshot 8.2 to be filled out as required.

[Screenshot 8.2: GARCH model estimation options]
The Heteroskedasticity Consistent Covariance option is used to compute the quasi-maximum likelihood (QML) covariances and standard errors using the methods described by Bollerslev and Wooldridge (1992). This option should be used if you suspect that the residuals are not conditionally normally distributed. Note that the parameter estimates will be (virtually) unchanged if this option is selected; only the estimated covariance matrix will be altered.
The log-likelihood functions for ARCH models are often not well behaved, so that convergence may not be achieved with the default estimation settings. It is possible in EViews to select the iterative algorithm (Marquardt, BHHH/Gauss–Newton), to change starting values, to increase the maximum number of iterations or to adjust the convergence criteria. For example, if convergence is not achieved, or implausible parameter estimates are obtained, it is sensible to re-do the estimation using a different set of starting values and/or a different optimisation algorithm.
Once the model has been estimated, EViews provides a variety of pieces of information and procedures for inference and diagnostic checking. For example, the following options are available on the View button:
● Actual, Fitted, Residual
The residuals are displayed in various forms, such as table, graphs and standardised residuals.
● GARCH graph
This graph plots the one-step ahead standard deviation, σ_t, or the conditional variance, σ_t^2, for each observation in the sample.
● Covariance Matrix
● Coefficient Tests
● Residual Tests/Correlogram-Q statistics
● Residual Tests/Correlogram Squared Residuals
● Residual Tests/Histogram-Normality Test
● Residual Tests/ARCH LM Test.
ARCH model procedures
These options are all available by pressing the ‘Proc’ button following the
estimation of a GARCH-type model:
● Make Residual Series
● Make GARCH Variance Series
● Forecast.
Estimating the GARCH(1,1) model for the yen–dollar ('rjpy') series using the instructions as listed above, and the default settings elsewhere, would yield the results:
Dependent Variable: RJPY
Method: ML – ARCH (Marquardt) – Normal distribution
Date: 09/06/07   Time: 18:02
Sample (adjusted): 7/08/2002 7/07/2007
Included observations: 1826 after adjustments
Convergence achieved after 10 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(2) + C(3)*RESID(-1)^2 + C(4)*GARCH(-1)

               Coefficient   Std. Error   z-Statistic   Prob.
C                0.005518     0.009396     0.587333     0.5570

Variance Equation
C                0.001345     0.000526     2.558748     0.0105
RESID(-1)^2      0.028436     0.004108     6.922465     0.0000
GARCH(-1)        0.964139     0.005528   174.3976       0.0000

R-squared           -0.000091   Mean dependent var      0.001328
Adjusted R-squared  -0.001738   S.D. dependent var      0.439632
S.E. of regression   0.440014   Akaike info criterion   1.139389
Sum squared resid    352.7611   Schwarz criterion       1.151459
Log likelihood      -1036.262   Hannan-Quinn criter.    1.143841
Durbin-Watson stat   1.981759
The coefficients on both the lagged squared residual and lagged conditional variance terms in the conditional variance equation are highly statistically significant. Also, as is typical of GARCH model estimates for financial asset returns data, the sum of the coefficients on the lagged squared error and lagged conditional variance is very close to unity (approximately 0.99). This implies that shocks to the conditional variance will be highly persistent. This can be seen by considering the equations for forecasting future values of the conditional variance using a GARCH model given in a subsequent section. A large sum of these coefficients will imply that a large positive or a large negative return will lead future forecasts of the variance to be high for a protracted period. The individual conditional variance coefficients are also as one would expect. The variance intercept term 'C' is very small, the 'ARCH parameter' is around 0.03, and the coefficient on the lagged conditional variance ('GARCH') is larger at 0.96.
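For readers working in Python rather than EViews, broadly comparable estimates can be obtained with the third-party arch package; the sketch below assumes that rjpy is a pandas Series holding the same yen–dollar returns.

from arch import arch_model   # the 'arch' package is assumed to be installed

am = arch_model(rjpy, mean='Constant', vol='GARCH', p=1, q=1, dist='normal')
res = am.fit(disp='off')
print(res.summary())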
8.10 Extensions to the basic GARCH model
Since the GARCH model was developed, a huge number of extensions and variants have been proposed. A couple of the most important examples will be highlighted here. Interested readers who wish to investigate further are directed to a comprehensive survey by Bollerslev et al. (1992).
Many of the extensions to the GARCH model have been suggested as a consequence of perceived problems with standard GARCH(p,q) models. First, the non-negativity conditions may be violated by the estimated model. The only way to avoid this for sure would be to place artificial constraints on the model coefficients in order to force them to be non-negative. Second, GARCH models cannot account for leverage effects (explained below), although they can account for volatility clustering and leptokurtosis in a series. Finally, the model does not allow for any direct feedback between the conditional variance and the conditional mean.
Some of the most widely used and influential modifications to the model will now be examined. These may remove some of the restrictions or limitations of the basic model.
8.11 Asymmetric GARCH models
One of the primary restrictions of GARCH models is that they enforce a symmetric response of volatility to positive and negative shocks. This arises since the conditional variance in equations such as (8.39) is a function of the magnitudes of the lagged residuals and not their signs (in other words, by squaring the lagged error in (8.39), the sign is lost). However, it has been argued that a negative shock to financial time series is likely to cause volatility to rise by more than a positive shock of the same magnitude. In the case of equity returns, such asymmetries are typically attributed to leverage effects, whereby a fall in the value of a firm's stock causes the firm's debt to equity ratio to rise. This leads shareholders, who bear the residual risk of the firm, to perceive their future cashflow stream as being relatively more risky.
An alternative view is provided by the 'volatility-feedback' hypothesis. Assuming constant dividends, if expected returns increase when stock price volatility increases, then stock prices should fall when volatility rises. Although asymmetries in returns series other than equities cannot be attributed to changing leverage, there is equally no reason to suppose that such asymmetries only exist in equity returns.
Two popular asymmetric formulations are explained below: the GJR model, named after the authors Glosten, Jagannathan and Runkle (1993), and the exponential GARCH (EGARCH) model proposed by Nelson (1991).
8.12 The GJR model
The GJR model is a simple extension of GARCH with an additional term added to account for possible asymmetries. The conditional variance is now given by

\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2 + \gamma u_{t-1}^2 I_{t-1}    (8.48)

where I_{t-1} = 1 if u_{t-1} < 0, and I_{t-1} = 0 otherwise.

For a leverage effect, we would see γ > 0. Notice now that the condition for non-negativity will be α_0 > 0, α_1 > 0, β ≥ 0, and α_1 + γ ≥ 0. That is, the model is still admissible, even if γ < 0, provided that α_1 + γ ≥ 0.
Example 8.1
To offer an illustration of the GJR approach, using monthly S&P500 returns from December 1979 until June 1998, the following results would be obtained, with t-ratios in parentheses

y_t = 0.172    (8.49)
      (3.198)

\sigma_t^2 = 1.243 + 0.015 u_{t-1}^2 + 0.498\sigma_{t-1}^2 + 0.604 u_{t-1}^2 I_{t-1}    (8.50)
            (16.372)  (0.437)          (14.999)             (5.772)

Note that the asymmetry term, γ, has the correct sign and is significant. To see how volatility rises more after a large negative shock than a large positive one, suppose that σ_{t-1}^2 = 0.823, and consider û_{t-1} = ±0.5. If û_{t-1} = 0.5, this implies that σ_t^2 = 1.65. However, a shock of the same magnitude but of opposite sign, û_{t-1} = -0.5, implies that the fitted conditional variance for time t will be σ_t^2 = 1.80.
8.13 The EGARCH model
The exponential GARCH model was proposed by Nelson (1991). There are various ways to express the conditional variance equation, but one possible specification is given by

\ln\big(\sigma_t^2\big) = \omega + \beta\ln\big(\sigma_{t-1}^2\big) + \gamma\frac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha\left[\frac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\frac{2}{\pi}}\right]    (8.51)

The model has several advantages over the pure GARCH specification. First, since ln(σ_t^2) is modelled, then even if the parameters are negative, σ_t^2 will be positive. There is thus no need to artificially impose non-negativity constraints on the model parameters. Second, asymmetries are allowed for under the EGARCH formulation, since if the relationship between volatility and returns is negative, γ will be negative.

Note that in the original formulation, Nelson assumed a Generalised Error Distribution (GED) structure for the errors. GED is a very broad family of distributions that can be used for many types of series. However, owing to its computational ease and intuitive interpretation, almost all applications of EGARCH employ conditionally normal errors as discussed above rather than using GED.
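Outside EViews, both asymmetric specifications can be estimated with the arch package; its EGARCH parameterisation is close to, though not written identically to, (8.51), and rjpy is again assumed to be a pandas Series of the yen–dollar returns.

from arch import arch_model

gjr_res = arch_model(rjpy, vol='GARCH', p=1, o=1, q=1).fit(disp='off')     # GJR via the 'o' (asymmetry) term
egarch_res = arch_model(rjpy, vol='EGARCH', p=1, o=1, q=1).fit(disp='off')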
8.14 GJR and EGARCH in EViews
The main menu screen for GARCH estimation demonstrates that a number of variants on the standard GARCH model are available. Arguably the most important of these are asymmetric models, such as the TGARCH ('threshold' GARCH), which is also known as the GJR model, and the EGARCH model. To estimate a GJR model in EViews, from the GARCH model equation specification screen (screenshot 8.1 above), change the 'Threshold Order' number from 0 to 1. To estimate an EGARCH model, change the 'GARCH/TARCH' model estimation default to 'EGARCH'.
Coefficient estimates for each of these specifications using the daily Japanese yen–US dollar returns data are given in the next two output tables, respectively. For both specifications, the asymmetry terms ('(RESID<0)*ARCH(1)' in the GJR model and 'RESID(-1)/@SQRT(GARCH(-1))' in the EGARCH model) are not statistically significant (although the term is almost significant in the case of the EGARCH model). Also in both cases, the coefficient estimates are negative, suggesting that positive shocks imply a higher next period conditional variance than negative shocks of the same magnitude.
Dependent Variable: RJPY
Method: ML – ARCH (Marquardt) – Normal distribution
Date: 09/06/07   Time: 18:20
Sample (adjusted): 7/08/2002 7/07/2007
Included observations: 1826 after adjustments
Convergence achieved after 9 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(2) + C(3)*RESID(-1)^2 + C(4)*RESID(-1)^2*(RESID(-1)<0) + C(5)*GARCH(-1)

                             Coefficient   Std. Error   z-Statistic   Prob.
C                              0.005588     0.009602     0.581934     0.5606

Variance Equation
C                              0.001361     0.000544     2.503534     0.0123
RESID(-1)^2                    0.029036     0.005373     5.404209     0.0000
RESID(-1)^2*(RESID(-1)<0)     -0.001027     0.006140    -0.167301     0.8671
GARCH(-1)                      0.963989     0.005644   170.7852       0.0000

R-squared           -0.000094   Mean dependent var      0.001328
Adjusted R-squared  -0.002291   S.D. dependent var      0.439632
S.E. of regression   0.440135   Akaike info criterion   1.140477
Sum squared resid    352.7622   Schwarz criterion       1.155564
Log likelihood      -1036.256   Hannan-Quinn criter.    1.146042
Durbin-Watson stat   1.981753
Dependent Variable: RJPY
Method: ML – ARCH (Marquardt) – Normal distribution
Date: 09/06/07   Time: 18:18
Sample (adjusted): 7/08/2002 7/07/2007
Included observations: 1826 after adjustments
Convergence achieved after 12 iterations
Presample variance: backcast (parameter = 0.7)
LOG(GARCH) = C(2) + C(3)*ABS(RESID(-1)/@SQRT(GARCH(-1))) + C(4)*RESID(-1)/@SQRT(GARCH(-1)) + C(5)*LOG(GARCH(-1))

        Coefficient   Std. Error   z-Statistic   Prob.
C         0.003756     0.010025     0.374722     0.7079

Variance Equation
C(2)     -1.262782     0.194243    -6.501047     0.0000
C(3)      0.214215     0.034226     6.258919     0.0000
C(4)     -0.046461     0.024983    -1.859751     0.0629
C(5)      0.329164     0.112572     2.924037     0.0035

R-squared           -0.000031   Mean dependent var      0.001328
Adjusted R-squared  -0.002227   S.D. dependent var      0.439632
S.E. of regression   0.440121   Akaike info criterion   1.183216
Sum squared resid    352.7398   Schwarz criterion       1.198303
Log likelihood      -1075.276   Hannan-Quinn criter.    1.188781
Durbin-Watson stat   1.981879
This is the opposite to what would have been expected in the case of the application of a GARCH model to a set of stock returns. But arguably, neither the leverage effect nor the volatility feedback explanations for asymmetries in the context of stocks apply here. For a positive return shock, this implies more yen per dollar and therefore a strengthening dollar and a weakening yen. Thus the results suggest that a strengthening dollar (weakening yen) leads to higher next period volatility than when the yen strengthens by the same amount.
8.15 Tests for asymmetries in volatility
Engle and Ng (1993) have proposed a set of tests for asymmetry in volatility, known as sign and size bias tests. The Engle and Ng tests should thus be used to determine whether an asymmetric model is required for a given series, or whether the symmetric GARCH model can be deemed adequate. In practice, the Engle–Ng tests are usually applied to the residuals of a GARCH fit to the returns data. Define S^-_{t-1} as an indicator dummy that takes the value 1 if û_{t-1} < 0 and zero otherwise. The test for sign bias is based on the significance or otherwise of φ_1 in

\hat{u}_t^2 = \phi_0 + \phi_1 S^-_{t-1} + \upsilon_t    (8.52)

where υ_t is an iid error term. If positive and negative shocks to û_{t-1} impact differently upon the conditional variance, then φ_1 will be statistically significant.
It could also be the case that the magnitude or size of the shock will affect whether the response of volatility to shocks is symmetric or not. In this case, a negative size bias test would be conducted, based on a regression where S^-_{t-1} is now used as a slope dummy variable. Negative size bias is argued to be present if φ_1 is statistically significant in the regression

\hat{u}_t^2 = \phi_0 + \phi_1 S^-_{t-1}u_{t-1} + \upsilon_t    (8.53)
Finally, defining S^+_{t-1} = 1 - S^-_{t-1}, so that S^+_{t-1} picks out the observations with positive innovations, Engle and Ng propose a joint test for sign and size bias based on the regression

\hat{u}_t^2 = \phi_0 + \phi_1 S^-_{t-1} + \phi_2 S^-_{t-1}u_{t-1} + \phi_3 S^+_{t-1}u_{t-1} + \upsilon_t    (8.54)

Significance of φ_1 indicates the presence of sign bias, where positive and negative shocks have differing impacts upon future volatility, compared with the symmetric response required by the standard GARCH formulation. On the other hand, the significance of φ_2 or φ_3 would suggest the presence of size bias, where not only the sign but the magnitude of the shock is important. A joint test statistic is formulated in the standard fashion by calculating TR^2 from regression (8.54), which will asymptotically follow a χ^2 distribution with 3 degrees of freedom under the null hypothesis of no asymmetric effects.
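A rough implementation of the joint regression (8.54) and its TR^2 statistic, taking the GARCH residuals as input, is sketched below.

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def engle_ng_joint_test(u_hat):
    u = np.asarray(u_hat)
    u2, lag = u[1:] ** 2, u[:-1]
    s_neg = (lag < 0).astype(float)                     # S-(t-1)
    s_pos = 1.0 - s_neg                                 # S+(t-1)
    X = sm.add_constant(np.column_stack([s_neg, s_neg * lag, s_pos * lag]))
    r2 = sm.OLS(u2, X).fit().rsquared
    stat = len(u2) * r2                                 # T * R^2
    return stat, chi2.sf(stat, 3)                       # chi-squared(3) under the null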
8.15.1 News impact curves
A pictorial representation of the degree of asymmetry of volatility to positive and negative shocks is given by the news impact curve introduced by Pagan and Schwert (1990). The news impact curve plots the next-period volatility (σ_t^2) that would arise from various positive and negative values of u_{t-1}, given an estimated model. The curves are drawn by using the estimated conditional variance equation for the model under consideration, with its given coefficient estimates, and with the lagged conditional variance set to the unconditional variance. Then, successive values of u_{t-1} are used in the equation to determine what the corresponding values of σ_t^2 derived from the model would be. For example, consider the GARCH and GJR model estimates given above for the S&P500 data from EViews. Values of u_{t-1} in the range (-1, +1) are substituted into the equations in each case to investigate the impact on the conditional variance during the next period. The resulting news impact curves for the GARCH and GJR models are given in figure 8.3.
As can be seen from figure 8.3, the GARCH news impact curve (the grey line) is of course symmetrical about zero, so that a shock of given magnitude will have the same impact on the future conditional variance whatever its sign. On the other hand, the GJR news impact curve (the black line) is asymmetric, with negative shocks having more impact on future volatility than positive shocks of the same magnitude. It can also be seen that a negative shock of given magnitude will have a bigger impact under GJR than would be implied by a GARCH model, while a positive shock of given magnitude will have more impact under GARCH than GJR. The latter result arises as a result of the reduction in the value of α_1, the coefficient on the lagged squared error, when the asymmetry term is included in the model.

[Figure 8.3: News impact curves for S&P500 returns using coefficients implied from GARCH and GJR model estimates]
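The curves themselves are straightforward to generate from a set of coefficient estimates; the sketch below evaluates the GJR conditional variance equation (8.48) over a grid of shocks (setting gamma to zero recovers the symmetric GARCH curve), with the lagged variance held at a long-run value supplied by the user.

import numpy as np

def news_impact_curve(u_grid, alpha0, alpha1, beta, gamma, sigma2_bar):
    # Next-period conditional variance from (8.48) with the lagged variance
    # held fixed at sigma2_bar (e.g. the unconditional variance)
    u = np.asarray(u_grid, dtype=float)
    return alpha0 + alpha1 * u ** 2 + beta * sigma2_bar + gamma * u ** 2 * (u < 0)

# e.g. a grid of shocks between -1 and +1, as in figure 8.3
shocks = np.linspace(-1.0, 1.0, 201)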
8.16 GARCH-in-mean
Most models used in finance suppose that investors should be rewarded for taking additional risk by obtaining a higher return. One way to
operationalise this concept is to let the return of a security be partly determined by its risk. Engle, Lilien and Robins (1987) suggested an ARCH-M specification, where the conditional variance of asset returns enters into the conditional mean equation. Since GARCH models are now considerably more popular than ARCH, it is more common to estimate a GARCH-M model. An example of a GARCH-M model is given by the specification

y_t = \mu + \delta\sigma_{t-1} + u_t, \quad u_t \sim N(0, \sigma_t^2)    (8.55)
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2    (8.56)

If δ is positive and statistically significant, then increased risk, given by an increase in the conditional variance, leads to a rise in the mean return. Thus δ can be interpreted as a risk premium. In some empirical applications, the conditional variance term, σ_{t-1}^2, appears directly in the conditional mean equation, rather than in square root form, σ_{t-1}. Also, in some applications the term is contemporaneous, σ_t^2, rather than lagged.
8.16.1 GARCH-M estimation in EViews
The GARCH-M model with the conditional standard deviation term in the mean, estimated using the rjpy data in EViews from the main GARCH menu as described above, would give the following results:
Dependent Variable: RJPY
Method: ML – ARCH (Marquardt) – Normal distribution
Date: 09/06/07   Time: 18:58
Sample (adjusted): 7/08/2002 7/07/2007
Included observations: 1826 after adjustments
Convergence achieved after 18 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(3) + C(4)*RESID(-1)^2 + C(5)*GARCH(-1)

               Coefficient   Std. Error   z-Statistic   Prob.
SQRT(GARCH)     -0.068943     0.124958    -0.551729     0.5811
C                0.033279     0.051802     0.642436     0.5206

Variance Equation
C                0.001373     0.000529     2.594929     0.0095
RESID(-1)^2      0.028886     0.004150     6.960374     0.0000
GARCH(-1)        0.963568     0.005580   172.6828       0.0000

R-squared            0.000034   Mean dependent var      0.001328
Adjusted R-squared  -0.002162   S.D. dependent var      0.439632
S.E. of regression   0.440107   Akaike info criterion   1.140302
Sum squared resid    352.7170   Schwarz criterion       1.155390
Log likelihood      -1036.096   Hannan-Quinn criter.    1.145867
F-statistic          0.015541   Durbin-Watson stat      1.982106
Prob(F-statistic)    0.999526
In this case, the estimated parameter on the conditional standard deviation term in the mean equation has a negative sign but is not statistically significant. We would thus conclude that for these currency returns, there is no feedback from the conditional variance to the conditional mean.
8.17 Uses of GARCH-type models including volatility forecasting
Essentially GARCH models are useful because they can be used to model the volatility of a series over time. It is possible to combine together more than one of the time series models that have been considered so far in this book, to obtain more complex 'hybrid' models. Such models can account for a number of important features of financial series at the same time – e.g. an ARMA–EGARCH(1,1)-M model; the potential complexity of the model is limited only by the imagination!
GARCH-type models can be used to forecast volatility. GARCH is a model to describe movements in the conditional variance of an error term, u_t, which may not appear particularly useful. But it is possible to show that

\mathrm{var}(y_t \mid y_{t-1}, y_{t-2}, \ldots) = \mathrm{var}(u_t \mid u_{t-1}, u_{t-2}, \ldots)    (8.57)

So the conditional variance of y, given its previous values, is the same as the conditional variance of u, given its previous values. Hence, modelling σ_t^2 will give models and forecasts for the variance of y_t as well. Thus, if the dependent variable in a regression, y_t, is an asset return series, forecasts of σ_t^2 will be forecasts of the future variance of y_t. So one primary usage of GARCH-type models is in forecasting volatility. This can be useful in, for example, the pricing of financial options where volatility is an input to the pricing model. For example, the value of a 'plain vanilla' call option is a function of the current value of the underlying, the strike price, the time to maturity, the risk-free interest rate and volatility. The required volatility, to obtain an appropriate options price, is really the volatility of the underlying asset expected over the lifetime of the option. As stated previously, it is possible to use a simple historical average measure as the forecast of future volatility, but another method that seems more appropriate would be to use a time series model such as GARCH to compute the volatility forecasts. The forecasting ability of various models is considered in a paper by Day and Lewis (1992), discussed in detail below.
Producing forecasts from models of the GARCH class is relatively simple, and the algebra involved is very similar to that required to obtain forecasts from ARMA models. An illustration is given by example 8.2.
Example 8.2
Consider the following GARCH(1,1) model

y_t = \mu + u_t, \quad u_t \sim N(0, \sigma_t^2)    (8.58)
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2    (8.59)

Suppose that the researcher had estimated the above GARCH model for a series of returns on a stock index and obtained the following parameter estimates: μ̂ = 0.0023, α̂_0 = 0.0172, β̂ = 0.7811, α̂_1 = 0.1251. If the researcher has data available up to and including time T, write down a set of equations in σ_t^2 and u_t^2 and their lagged values, which could be employed to produce one-, two-, and three-step-ahead forecasts for the conditional variance of y_t.

What is needed is to generate forecasts of σ_{T+1}^2 | Ω_T, σ_{T+2}^2 | Ω_T, ..., σ_{T+s}^2 | Ω_T, where Ω_T denotes all information available up to and including
Modelling volatility and correlation 413
observation T. For time T, the conditional variance equation is given by
(8.59). Adding one to each of the time subscripts of this equation, andthen two, and then three would yield equations (8.60)–(8.62)
σ²_{T+1} = α_0 + α_1 u²_T + β σ²_T    (8.60)
σ²_{T+2} = α_0 + α_1 u²_{T+1} + β σ²_{T+1}    (8.61)
σ²_{T+3} = α_0 + α_1 u²_{T+2} + β σ²_{T+2}    (8.62)
Let σ^{f2}_{1,T} be the one-step-ahead forecast for σ² made at time T. This is easy to calculate since, at time T, the values of all the terms on the RHS are known. σ^{f2}_{1,T} would be obtained by taking the conditional expectation of (8.60).

Given σ^{f2}_{1,T}, how is σ^{f2}_{2,T}, the two-step-ahead forecast for σ² made at time T, calculated?
σ^{f2}_{1,T} = α_0 + α_1 u²_T + β σ²_T    (8.63)
From (8.61), it is possible to write
σ^{f2}_{2,T} = α_0 + α_1 E(u²_{T+1}|Ω_T) + β σ^{f2}_{1,T}    (8.64)
where E(u²_{T+1}|Ω_T) is the expectation, made at time T, of u²_{T+1}, which is the squared disturbance term. It is necessary to find E(u²_{T+1}|Ω_T), using the expression for the variance of a random variable u_t. The model assumes that the series u_t has zero mean, so that the variance can be written

var(u_t) = E[(u_t − E(u_t))²] = E(u²_t)    (8.65)
The conditional variance of u_t is σ²_t, so

σ²_t = E(u²_t | Ω_{t−1})    (8.66)

Turning this argument around, and applying it to the problem at hand,

E(u²_{T+1} | Ω_T) = σ²_{T+1}    (8.67)
but σ²_{T+1} is not known at time T, so it is replaced with the forecast for it, σ^{f2}_{1,T}, so that (8.64) becomes

σ^{f2}_{2,T} = α_0 + α_1 σ^{f2}_{1,T} + β σ^{f2}_{1,T}    (8.68)
σ^{f2}_{2,T} = α_0 + (α_1 + β) σ^{f2}_{1,T}    (8.69)
What about the three-step-ahead forecast?
By similar arguments,
σ^{f2}_{3,T} = E_T(α_0 + α_1 u²_{T+2} + β σ²_{T+2})    (8.70)
σ^{f2}_{3,T} = α_0 + (α_1 + β) σ^{f2}_{2,T}    (8.71)
σ^{f2}_{3,T} = α_0 + (α_1 + β)[α_0 + (α_1 + β) σ^{f2}_{1,T}]    (8.72)
σ^{f2}_{3,T} = α_0 + α_0(α_1 + β) + (α_1 + β)² σ^{f2}_{1,T}    (8.73)
Any s-step-ahead forecasts would be produced by

σ^{f2}_{s,T} = α_0 Σ_{i=1}^{s−1} (α_1 + β)^{i−1} + (α_1 + β)^{s−1} σ^{f2}_{1,T}    (8.74)

for any value of s ≥ 2.
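This recursion is simple to implement in any programming language. The following Python sketch is purely illustrative: the function name is invented here, and the values supplied for u²_T and σ²_T are hypothetical, while the parameter values are those of example 8.2.

```python
# Multi-step-ahead conditional variance forecasts from a GARCH(1,1) model,
# following the recursion in equations (8.63), (8.69) and (8.74).

def garch11_forecasts(alpha0, alpha1, beta, u2_T, sigma2_T, steps):
    """Return a list of 1- to steps-ahead forecasts of the conditional variance."""
    forecasts = []
    # One-step-ahead forecast: all terms on the RHS of (8.60) are known at time T
    f = alpha0 + alpha1 * u2_T + beta * sigma2_T
    forecasts.append(f)
    # For s >= 2, each forecast is alpha0 + (alpha1 + beta) times the previous one
    for _ in range(steps - 1):
        f = alpha0 + (alpha1 + beta) * f
        forecasts.append(f)
    return forecasts

# Parameter estimates from example 8.2; u2_T and sigma2_T are hypothetical values
print(garch11_forecasts(alpha0=0.0172, alpha1=0.1251, beta=0.7811,
                        u2_T=0.04, sigma2_T=0.05, steps=3))
```

Each forecast beyond the first is obtained by feeding the previous forecast back into the recursion, exactly as in (8.69) and (8.71).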
It is worth noting at this point that variances, and therefore variance forecasts, are additive over time. This is a very useful property. Suppose, for example, that using daily foreign exchange returns, one-, two-, three-, four-, and five-step-ahead variance forecasts have been produced, i.e. a forecast has been constructed for each day of the next trading week. The forecasted variance for the whole week would simply be the sum of the five daily variance forecasts. If the standard deviation is the required volatility estimate rather than the variance, simply take the square root of the variance forecasts. Note also, however, that standard deviations are not additive. Hence, if daily standard deviations are the required volatility measure, they must be squared to turn them to variances. Then the variances would be added and the square root taken to obtain a weekly standard deviation.
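As a small numerical illustration of this additivity (the daily variance forecasts below are made-up numbers, not output from any model in this chapter):

```python
import math

# Hypothetical one- to five-step-ahead daily variance forecasts
daily_variance_forecasts = [0.8, 0.9, 1.1, 1.0, 1.2]

weekly_variance = sum(daily_variance_forecasts)        # variances are additive
weekly_std = math.sqrt(weekly_variance)                # standard deviations are not

# If daily standard deviations were supplied instead, square them first
daily_stds = [math.sqrt(v) for v in daily_variance_forecasts]
weekly_std_from_stds = math.sqrt(sum(s ** 2 for s in daily_stds))

print(weekly_variance, weekly_std, weekly_std_from_stds)
```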
8.17.1 Forecasting from GARCH models with EViews
Forecasts from any of the GARCH models that can be estimated using EViews are obtained by using only a sub-sample of available data for model estimation, and then by clicking on the 'Forecast' button that appears after the estimation of the required model has been completed. Suppose, for example, we stopped the estimation of the GARCH(1,1) model for the Japanese yen returns on 6 July 2005 so as to keep the last two years of data for forecasting (i.e. the 'Forecast sample' is 7/07/2005 7/07/2007). Then click Proc/Forecast . . . and the dialog box in screenshot 8.3 will appear.
Again, several options are available, including providing a name for the conditional mean and for the conditional variance forecasts, or whether to produce static (a series of rolling single-step-ahead) or dynamic (multiple-step-ahead) forecasts. The dynamic and static forecast plots that would be produced are given in screenshots 8.4 and 8.5.
Screenshot 8.3  Forecasting from GARCH models

Screenshot 8.4  Dynamic forecasts of the conditional variance (GARCH(1,1) dynamic forecasts, 2 years ahead)

Screenshot 8.5  Static forecasts of the conditional variance (GARCH(1,1) static forecasts, 1 month ahead: 22 days)
The dynamic forecasts show a completely flat forecast structure for the mean (since the conditional mean equation includes only a constant term), while at the end of the in-sample estimation period, the value of the conditional variance was at a historically low level relative to its unconditional average. Therefore, the forecasts converge upon their long-term mean value from below as the forecast horizon increases. Notice also that there are no ±2-standard error band confidence intervals for the conditional variance forecasts; to compute these would require some kind of estimate of the variance of the variance, which is beyond the scope of this book (and beyond the capability of the built-in functions of the EViews software). The conditional variance forecasts provide the basis for the standard error bands that are given by the dotted red lines around the conditional mean forecast. Because the conditional variance forecasts rise gradually as the forecast horizon increases, the standard error bands widen slightly. The forecast evaluation statistics that are presented in the box to the right of the graphs are for the conditional mean forecasts.
It is evident that the variance forecasts gradually fall over the out-of-sample period, although since these are a series of rolling one-step-ahead forecasts for the conditional variance, they show much more volatility than for the dynamic forecasts. This volatility also results in more variability in the standard error bars around the conditional mean forecasts.
Predictions can be similarly produced for any member of the GARCH
family that is estimable with the software.
8.18 Testing non-linear restrictions or testing hypotheses about
non-linear models
The usual t- and F-tests are still valid in the context of non-linear models, but they are not flexible enough. For example, suppose that it is of interest to test a hypothesis that α_1 β = 1. Now that the model class has been extended to non-linear models, there is no reason to suppose that relevant restrictions are only linear.
Under OLS estimation, the F-test procedure works by examining the degree to which the RSS rises when the restrictions are imposed. In very general terms, hypothesis testing under ML works in a similar fashion – that is, the procedure works by examining the degree to which the maximal value of the LLF falls upon imposing the restriction. If the LLF falls 'a lot', it would be concluded that the restrictions are not supported by the data and thus the hypothesis should be rejected.
There are three hypothesis testing procedures based on maximum likelihood principles: Wald, Likelihood Ratio and Lagrange Multiplier. To illustrate briefly how each of these operates, consider a single parameter, θ, to be estimated, and denote the ML estimate as θ̂ and a restricted estimate as θ̃. Denoting the maximised value of the LLF by unconstrained ML as L(θ̂) and the constrained optimum as L(θ̃), the three testing procedures can be illustrated as in figure 8.4.
Figure 8.4  Three approaches to hypothesis testing under maximum likelihood

The tests all require the measurement of the 'distance' between the points A (representing the unconstrained maximised value of the log-likelihood function) and B (representing the constrained value). The vertical distance forms the basis of the LR test. Twice this vertical distance is given by 2[L(θ̂) − L(θ̃)] = 2 ln[l(θ̂)/l(θ̃)], where L denotes the log-likelihood function, and l denotes the likelihood function. The Wald test is based on the horizontal distance between θ̂ and θ̃, while the LM test compares the slopes of the curve at A and B. At A, the unrestricted maximum of the log-likelihood function, the slope of the curve is zero. But is it 'significantly steep' at L(θ̃), i.e. at point B? The steeper the curve is at B, the less likely the restriction is to be supported by the data.
Expressions for LM test statistics involve the first and second derivatives of the log-likelihood function with respect to the parameters at the constrained estimate. The first derivatives of the log-likelihood function are collectively known as the score vector, measuring the slope of the LLF for each possible value of the parameters. The expected values of the second derivatives comprise the information matrix, measuring the peakedness of the LLF, and how much higher the LLF value is at the optimum than in other places. This matrix of second derivatives is also used to construct the coefficient standard errors. The LM test involves estimating only a restricted regression, since the slope of the LLF at the maximum will be zero by definition. Since the restricted regression is usually easier to estimate than the unrestricted case, LM tests are usually the easiest of the three procedures to employ in practice. The reason that restricted regressions are usually simpler is that imposing the restrictions often means that some components in the model will be set to zero or combined under the null hypothesis, so that there are fewer parameters to estimate. The Wald test involves estimating only an unrestricted regression, and the usual OLS t-tests and F-tests are examples of Wald tests (since again, only unrestricted estimation occurs).
Of the three approaches to hypothesis testing in the maximum likelihood framework, the likelihood ratio test is the most intuitively appealing, and therefore a deeper examination of it will be the subject of the following section; see Ghosh (1991, section 10.3) for further details.
8.18.1 Likelihood ratio tests
Likelihood ratio (LR) tests involve estimation under the null hypothesis and under the alternative, so that two models are estimated: an unrestricted model and a model where the restrictions have been imposed. The maximised values of the LLF for the restricted and unrestricted cases are 'compared'. Suppose that the unconstrained model has been estimated and that a given maximised value of the LLF, denoted L_u, has been achieved. Suppose also that the model has been estimated imposing the constraint(s) and a new value of the LLF obtained, denoted L_r. The LR test statistic asymptotically follows a Chi-squared distribution and is given by
LR = −2(L_r − L_u) ~ χ²(m)    (8.75)
where m = number of restrictions. Note that the maximised value of the log-likelihood function will always be at least as big for the unrestricted model as for the restricted model, so that L_r ≤ L_u. This rule is intuitive and comparable to the effect of imposing a restriction on a linear model estimated by OLS, that RRSS ≥ URSS. Similarly, the equality between L_r and L_u will hold only when the restriction was already present in the data. Note, however, that the usual F-test is in fact a Wald test, and not an LR test – that is, it can be calculated using an unrestricted model only. The F-test approach based on comparing RSS arises conveniently as a result of the OLS algebra.
Example 8.3
A GARCH model is estimated and a maximised LLF of 66.85 is obtained. Suppose that a researcher wishes to test whether β = 0 in (8.77)

y_t = μ + φ y_{t−1} + u_t,   u_t ~ N(0, σ²_t)    (8.76)
σ²_t = α_0 + α_1 u²_{t−1} + β σ²_{t−1}    (8.77)

The model is estimated imposing the restriction and the maximised LLF falls to 64.54. Is the restriction supported by the data, which would correspond to the situation where an ARCH(1) specification was sufficient? The test statistic is given by

LR = −2(64.54 − 66.85) = 4.62    (8.78)

The test statistic follows a χ²(1) distribution, which has a 5% critical value of 3.84, so the null is marginally rejected. It would thus be concluded that an ARCH(1) model, with no lag of the conditional variance in the variance equation, is not quite sufficient to describe the dependence in volatility over time.
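The arithmetic of this example is easily verified; a minimal Python sketch, using scipy only to obtain the χ² critical value and p-value, is given below (the inputs are simply the two log-likelihood values quoted above).

```python
from scipy.stats import chi2

L_u, L_r, m = 66.85, 64.54, 1          # unrestricted LLF, restricted LLF, no. of restrictions

LR = -2 * (L_r - L_u)                  # equation (8.75): LR = 4.62
critical_value = chi2.ppf(0.95, df=m)  # 3.84 at the 5% level
p_value = chi2.sf(LR, df=m)

print(LR, critical_value, p_value)     # LR > 3.84, so the null is (marginally) rejected
```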
8.19 Volatility forecasting: some examples and results
from the literature
There is a vast and relatively new literature that attempts to compare the accuracies of various models for producing out-of-sample volatility forecasts. Akgiray (1989), for example, finds the GARCH model superior to ARCH, exponentially weighted moving average and historical mean models for forecasting monthly US stock index volatility. A similar result concerning the apparent superiority of GARCH is observed by West and Cho (1995) using one-step-ahead forecasts of dollar exchange rate volatility, although for longer horizons the model behaves no better than their alternatives. Pagan and Schwert (1990) compare GARCH, EGARCH, Markov switching regime and three non-parametric models for forecasting monthly US stock return volatilities. The EGARCH followed by the GARCH models perform moderately; the remaining models produce very poor predictions. Franses and van Dijk (1996) compare three members of the GARCH family (standard GARCH, QGARCH and the GJR model) for forecasting the weekly volatility of various European stock market indices. They find that the non-linear GARCH models were unable to beat the standard GARCH model. Finally, Brailsford and Faff (1996) find GJR and GARCH models slightly superior to various simpler models for predicting Australian monthly stock index volatility. The conclusion arising from this growing body of research is that forecasting volatility is a 'notoriously difficult task' (Brailsford and Faff, 1996, p. 419), although it appears that conditional heteroscedasticity models are among the best that are currently available. In particular, more complex non-linear and non-parametric models are inferior in prediction to simpler models, a result echoed in an earlier paper by Dimson and Marsh (1990) in the context of relatively complex versus parsimonious linear models. Finally, Brooks (1998) considers whether measures of market volume can assist in improving volatility forecast accuracy, finding that they cannot.
A particularly clear example of the style and content of this class of research is given by Day and Lewis (1992). The Day and Lewis study will therefore now be examined in depth. The purpose of their paper is to consider the out-of-sample forecasting performance of GARCH and EGARCH models for predicting stock index volatility. The forecasts from these econometric models are compared with those given from an 'implied volatility'. As discussed above, implied volatility is the market's expectation of the 'average' level of volatility of an underlying asset over the life of the option that is implied by the current traded price of the option. Given an assumed model for pricing options, such as the Black–Scholes, all of the inputs to the model except for volatility can be observed directly from the market or are specified in the terms of the option contract. Thus, it is possible, using an iterative search procedure such as the Newton–Raphson method (see, for example, Watsham and Parramore, 2004), to 'back out' the volatility of the underlying asset from the option's price. An important question for research is whether implied or econometric models produce more accurate forecasts of the volatility of the underlying asset. If the options and underlying asset markets are informationally efficient, econometric volatility forecasting models based on past realised values of underlying volatility should have no incremental explanatory power for future values of volatility of the underlying asset. On the other hand, if econometric models do hold additional information useful for forecasting future volatility, it is possible that such forecasts could be turned into a profitable trading rule.
The data employed by Day and Lewis comprise weekly closing prices (Wednesday to Wednesday, and Friday to Friday) for the S&P100 Index option and the underlying index from 11 March 1983 to 31 December 1989. They employ both mid-week to mid-week returns and Friday to Friday returns to determine whether weekend effects have any significant impact on the latter. They argue that Friday returns contain expiration effects since implied volatilities are seen to jump on the Friday of the week of expiration. This issue is not of direct interest to this book, and consequently only the mid-week to mid-week results will be shown here.
The models that Day and Lewis employ are as follows. First, for the conditional mean of the time series models, they employ a GARCH-M specification for the excess of the market return over a risk-free proxy

R_Mt − R_Ft = λ_0 + λ_1 √h_t + u_t    (8.79)

where R_Mt denotes the return on the market portfolio, and R_Ft denotes the risk-free rate. Note that Day and Lewis denote the conditional variance by h²_t, while this is modified to the standard h_t here. Also, the notation σ²_t will be used to denote implied volatility estimates. For the variance, two specifications are employed: a 'plain vanilla' GARCH(1,1) and an EGARCH
h_t = α_0 + α_1 u²_{t−1} + β_1 h_{t−1}    (8.80)

or

ln(h_t) = α_0 + β_1 ln(h_{t−1}) + α_1 ( θ u_{t−1}/√h_{t−1} + γ [ |u_{t−1}/√h_{t−1}| − (2/π)^{1/2} ] )    (8.81)
One way to test whether implied or GARCH-type volatility models perform best is to add a lagged value of the implied volatility estimate (σ²_{t−1}) to (8.80) and (8.81). A 'hybrid' or 'encompassing' specification would thus result. Equation (8.80) becomes

h_t = α_0 + α_1 u²_{t−1} + β_1 h_{t−1} + δ σ²_{t−1}    (8.82)

and (8.81) becomes

ln(h_t) = α_0 + β_1 ln(h_{t−1}) + α_1 ( θ u_{t−1}/√h_{t−1} + γ [ |u_{t−1}/√h_{t−1}| − (2/π)^{1/2} ] ) + δ ln(σ²_{t−1})    (8.83)
The tests of interest are given by H_0: δ = 0 in (8.82) or (8.83). If these null hypotheses cannot be rejected, the conclusion would be that implied volatility contains no incremental information useful for explaining volatility beyond that derived from a GARCH model. At the same time, H_0: α_1 = 0 and β_1 = 0 in (8.82), and H_0: α_1 = 0 and β_1 = 0 and θ = 0 and γ = 0 in (8.83), are also tested. If this second set of restrictions holds, then (8.82) and (8.83) collapse to

h_t = α_0 + δ σ²_{t−1}    (8.82′)

and

ln(h_t) = α_0 + δ ln(σ²_{t−1})    (8.83′)
These sets of restrictions on (8.82) and (8.83) test whether the lagged squared error and lagged conditional variance from a GARCH model contain any additional explanatory power once implied volatility is included in the specification. All of these restrictions can be tested fairly easily using a likelihood ratio test. The results of such a test are presented in table 8.1.
It appears from the coefficient estimates and their standard errors under the specification (8.82) that the implied volatility term (δ) is statistically significant, while the GARCH terms (α_1 and β_1) are not. However, the test statistics given in the final column are both greater than their corresponding χ² critical values, indicating that both GARCH and implied volatility have incremental power for modelling the underlying stock volatility. A similar analysis is undertaken in Day and Lewis that compares EGARCH with implied volatility. The results are presented here in table 8.2.
Table 8.1  GARCH versus implied volatility

R_Mt − R_Ft = λ_0 + λ_1 √h_t + u_t                      (8.79)
h_t = α_0 + α_1 u²_{t−1} + β_1 h_{t−1}                  (8.80)
h_t = α_0 + α_1 u²_{t−1} + β_1 h_{t−1} + δ σ²_{t−1}     (8.82)
h_t = α_0 + δ σ²_{t−1}                                  (8.82′)

Equation for variance   λ_0        λ_1        α_0×10⁻⁴   α_1       β_1       δ        Log-L     χ²
(8.80)                  0.0072     0.071      5.428      0.093     0.854     −        767.321   17.77
                        (0.005)    (0.01)     (1.65)     (0.84)    (8.17)
(8.82)                  0.0015     0.043      2.065      0.266     −0.068    0.318    776.204   −
                        (0.028)    (0.02)     (2.98)     (1.17)    (−0.59)   (3.00)
(8.82′)                 0.0056     −0.184     0.993      −         −         0.581    764.394   23.62
                        (0.001)    (−0.001)   (1.50)                         (2.94)

Notes: t-ratios in parentheses; Log-L denotes the maximised value of the log-likelihood function in each case. χ² denotes the value of the test statistic, which follows a χ²(1) in the case of (8.82) restricted to (8.80), and a χ²(2) in the case of (8.82) restricted to (8.82′).
Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
Table 8.2  EGARCH versus implied volatility

R_Mt − R_Ft = λ_0 + λ_1 √h_t + u_t                                                                                       (8.79)
ln(h_t) = α_0 + β_1 ln(h_{t−1}) + α_1 ( θ u_{t−1}/√h_{t−1} + γ [ |u_{t−1}/√h_{t−1}| − (2/π)^{1/2} ] )                    (8.81)
ln(h_t) = α_0 + β_1 ln(h_{t−1}) + α_1 ( θ u_{t−1}/√h_{t−1} + γ [ |u_{t−1}/√h_{t−1}| − (2/π)^{1/2} ] ) + δ ln(σ²_{t−1})   (8.83)
ln(h_t) = α_0 + δ ln(σ²_{t−1})                                                                                           (8.83′)

Equation for variance   λ_0        λ_1        α_0×10⁻⁴   β_1       θ         γ        δ        Log-L     χ²
(8.81)                  −0.0026    0.094      −3.62      0.529     0.273     0.357    −        776.436   8.09
                        (−0.03)    (0.25)     (−2.90)    (3.26)    (−4.13)   (3.17)
(8.83)                  0.0035     −0.076     −2.28      0.373     −0.282    0.210    0.351    780.480   −
                        (0.56)     (−0.24)    (−1.82)    (1.48)    (−4.34)   (1.89)   (1.82)
(8.83′)                 0.0047     −0.139     −2.76      −         −         −        0.667    765.034   30.89
                        (0.71)     (−0.43)    (−2.30)                                 (4.01)

Notes: t-ratios in parentheses; Log-L denotes the maximised value of the log-likelihood function in each case. χ² denotes the value of the test statistic, which follows a χ²(1) in the case of (8.83) restricted to (8.81), and a χ²(3) in the case of (8.83) restricted to (8.83′).
Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
The EGARCH results tell a very similar story to those of the GARCH specifications. Neither the lagged information from the EGARCH specification nor the lagged implied volatility terms can be suppressed, according to the likelihood ratio statistics. In specification (8.83), both the EGARCH terms and the implied volatility coefficients are marginally significant.
However, the tests given above do not represent a true test of the predictive ability of the models, since all of the observations were used in both estimating and testing the models. Hence the authors proceed to conduct an out-of-sample forecasting test. There are a total of 729 data points in their sample. They use the first 410 to estimate the models, and then make a one-step-ahead forecast of the following week's volatility. They then roll the sample forward one observation at a time, constructing a new one-step-ahead forecast at each stage.
They evaluate the forecasts in two ways. The first is by regressing the realised volatility series on the forecasts plus a constant

σ²_{t+1} = b_0 + b_1 σ²_{ft} + ξ_{t+1}    (8.84)

where σ²_{t+1} is the 'actual' value of volatility at time t+1, and σ²_{ft} is the value forecasted for it during period t. Perfectly accurate forecasts would imply b_0 = 0 and b_1 = 1. The second method is via a set of forecast encompassing tests. Essentially, these operate by regressing the realised volatility on the forecasts generated by several models. The forecast series that have significant coefficients are concluded to encompass those of models whose coefficients are not significant.
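A forecast evaluation regression of the form (8.84) is straightforward to estimate. The sketch below uses statsmodels with simulated (hypothetical) forecast and realised-volatility series, so that the hypotheses b_0 = 0 and b_1 = 1 can be examined directly; it is illustrative only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical forecast and realised-volatility series (simulated for illustration)
forecasts = rng.uniform(0.5, 1.5, size=200)
realised = 0.1 + 0.9 * forecasts + rng.normal(0.0, 0.3, size=200)

X = sm.add_constant(forecasts)            # regressor matrix with an intercept (b0)
results = sm.OLS(realised, X).fit()

b0, b1 = results.params
se_b0, se_b1 = results.bse
print(b0, b1)
print("t-stat for H0: b0 = 0 ->", b0 / se_b0)
print("t-stat for H0: b1 = 1 ->", (b1 - 1.0) / se_b1)
```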
But what is volatility? In other words, with what measure of realised or 'ex post' volatility should the forecasts be compared? This is a question that received very little attention in the literature until recently. A common method employed is to assume, for a daily volatility forecasting exercise, that the relevant ex post measure is the square of that day's return. For any random variable r_t, its conditional variance can be expressed as

var(r_t) = E[r_t − E(r_t)]²    (8.85)

As stated previously, it is typical, and not unreasonable for relatively high frequency data, to assume that E(r_t) is zero, so that the expression for the variance reduces to

var(r_t) = E[r²_t]    (8.86)
Andersen and Bollerslev (1998) argue that squared daily returns provide a very noisy proxy for the true volatility, and a much better proxy for the day's variance would be to compute the volatility for the day from intra-daily data. For example, a superior daily variance measure could be obtained by taking hourly returns, squaring them and adding them up. The reason that the use of higher frequency data provides a better measure of ex post volatility is simply that it employs more information.
By using only daily data to compute a daily volatility measure, effectively only two observations on the underlying price series are employed. If the daily closing price is the same one day as the next, the squared return and therefore the volatility would be calculated to be zero, when there may have been substantial intra-day fluctuations. Hansen and Lunde (2006) go further and suggest that even the ranking of models by volatility forecast accuracy could be inconsistent if the evaluation uses a poor proxy for the true, underlying volatility.
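Constructing the intra-day ('realised') variance measure suggested by Andersen and Bollerslev is simple; a minimal sketch, assuming a hypothetical set of hourly returns for one trading day, is given below.

```python
import numpy as np

# Hypothetical hourly returns for a single trading day (eight intra-day observations)
hourly_returns = np.array([0.2, -0.1, 0.05, 0.3, -0.25, 0.1, -0.05, 0.15])

realised_variance = np.sum(hourly_returns ** 2)   # sum of squared intra-day returns
daily_return = np.sum(hourly_returns)
squared_daily_return = daily_return ** 2          # the much noisier daily proxy

print(realised_variance, squared_daily_return)
```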
Day and Lewis use two measures of ex post volatility in their study (for
which the frequency of data employed in the models is weekly):
(1) The square of the weekly return on the index, which they call SR
(2) The variance of the week’s daily returns multiplied by the number of
trading days in that week, which they call WV.
The Andersen and Bollerslev argument implies that the latter measure is likely to be superior, and therefore that more emphasis should be placed on those results.
The results for the separate regressions of realised volatility on a constant and the forecast are given in table 8.3.
The coefficient estimates for b_0 given in table 8.3 can be interpreted as indicators of whether the respective forecasting approaches are biased. In all cases, the b_0 coefficients are close to zero. Only for the historic volatility forecasts and the implied volatility forecast when the ex post measure is the squared weekly return are the estimates statistically significant. Positive coefficient estimates would suggest that on average the forecasts are too low. The estimated b_1 coefficients are in all cases a long way from unity, except for the GARCH (with daily variance ex post volatility) and EGARCH (with squared weekly variance as ex post measure) models. Finally, the R² values are very small (all less than 10%, and most less than 3%), suggesting that the forecast series do a poor job of explaining the variability of the realised volatility measure.
The forecast encompassing regressions are based on a procedure due to Fair and Shiller (1990) that seeks to determine whether differing sets of forecasts contain different sets of information from one another. The test regression is of the form

σ²_{t+1} = b_0 + b_1 σ²_{It} + b_2 σ²_{Gt} + b_3 σ²_{Et} + b_4 σ²_{Ht} + ξ_{t+1}    (8.87)

with results presented in table 8.4.
Table 8.3  Out-of-sample predictive power for weekly volatility forecasts

σ²_{t+1} = b_0 + b_1 σ²_{ft} + ξ_{t+1}    (8.84)

Forecasting model      Proxy for ex post volatility    b_0        b_1       R²
Historic               SR                              0.0004     0.129     0.094
                                                       (5.60)     (21.18)
Historic               WV                              0.0005     0.154     0.024
                                                       (2.90)     (7.58)
GARCH                  SR                              0.0002     0.671     0.039
                                                       (1.02)     (2.10)
GARCH                  WV                              0.0002     1.074     0.018
                                                       (1.07)     (3.34)
EGARCH                 SR                              0.0000     1.075     0.022
                                                       (0.05)     (2.06)
EGARCH                 WV                              −0.0001    1.529     0.008
                                                       (−0.48)    (2.58)
Implied volatility     SR                              0.0022     0.357     0.037
                                                       (2.22)     (1.82)
Implied volatility     WV                              0.0005     0.718     0.026
                                                       (0.389)    (1.95)

Notes: 'Historic' refers to the use of a simple historical average of the squared returns to forecast volatility; t-ratios in parentheses; SR and WV refer to the square of the weekly return on the S&P100, and the variance of the week's daily returns multiplied by the number of trading days in that week, respectively.
Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
Table 8.4  Comparisons of the relative information content of out-of-sample volatility forecasts

σ²_{t+1} = b_0 + b_1 σ²_{It} + b_2 σ²_{Gt} + b_3 σ²_{Et} + b_4 σ²_{Ht} + ξ_{t+1}    (8.87)

Forecast comparisons                      b_0          b_1       b_2        b_3       b_4       R²
Implied versus GARCH                      −0.00010     0.601     0.298      −         −         0.027
                                          (−0.09)      (1.03)    (0.42)
Implied versus GARCH versus Historical    0.00018      0.632     −0.243     −         0.123     0.038
                                          (1.15)       (1.02)    (−0.28)              (7.01)
Implied versus EGARCH                     −0.00001     0.695     −          0.176     −         0.026
                                          (−0.07)      (1.62)               (0.27)
Implied versus EGARCH versus Historical   0.00026      0.590     −          −0.374    0.118     0.038
                                          (1.37)       (1.45)               (−0.57)   (7.74)
GARCH versus EGARCH                       0.00005      −         1.070      −0.001    −         0.018
                                          (0.370)                (2.78)     (−0.00)

Notes: t-ratios in parentheses; the ex post measure used in this table is the variance of the week's daily returns multiplied by the number of trading days in that week.
Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
The sizes and significances of the coefficients in table 8.4 are of interest. The most salient feature is the lack of significance of most of the forecast series. In the first comparison, neither the implied nor the GARCH forecast series have statistically significant coefficients. When historical volatility is added, its coefficient is positive and statistically significant. An identical pattern emerges when forecasts from implied and EGARCH models are compared: that is, neither forecast series is significant, but when a simple historical average series is added, its coefficient is significant. It is clear from this, and from the last row of table 8.4, that the asymmetry term in the EGARCH model has no additional explanatory power compared with that embodied in the symmetric GARCH model. Again, all of the R² values are very low (less than 4%).
The conclusion reached from this study (which is broadly in line with many others) is that, within sample, the results suggest that implied volatility contains extra information not contained in the GARCH/EGARCH specifications. But the out-of-sample results suggest that predicting volatility is a difficult task!
8.20 Stochastic volatility models revisited
Under the heading of models for time-varying volatilities, only approaches based on the GARCH class of models have been discussed thus far. Another class of models is also available, known as stochastic volatility (SV) models. It is a common misconception that GARCH-type specifications are sorts of stochastic volatility models. However, as the name suggests, stochastic volatility models differ from GARCH principally in that the conditional variance equation of a GARCH specification is completely deterministic given all information available up to that of the previous period. In other words, there is no error term in the variance equation of a GARCH model, only in the mean equation.
Stochastic volatility models contain a second error term, which enters into the conditional variance equation. A very simple example of a stochastic volatility model would be the autoregressive volatility specification described in section 8.6. This model is simple to understand and simple to estimate, because it requires that we have an observable measure of volatility which is then simply used as any other variable in an autoregressive model. However, the term 'stochastic volatility' is usually associated with a different formulation, a possible example of which would be

y_t = μ + u_t σ_t,   u_t ~ N(0, 1)    (8.88)
log(σ²_t) = α_0 + β_1 log(σ²_{t−1}) + σ_η η_t    (8.89)
where η_t is another N(0,1) random variable that is independent of u_t. Here the volatility is latent rather than observed, and so is modelled indirectly.
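A short simulation makes the distinction from GARCH clear: in the sketch below (with purely illustrative parameter values), the log-variance in (8.89) receives its own innovation η_t, so the volatility path is latent and cannot be recovered exactly from past returns.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values for the stochastic volatility model (8.88)-(8.89)
mu, alpha0, beta1, sigma_eta = 0.0, -0.2, 0.95, 0.3
T = 1000

log_sigma2 = np.zeros(T)
y = np.zeros(T)
log_sigma2[0] = alpha0 / (1 - beta1)              # start at the unconditional mean

for t in range(1, T):
    eta = rng.standard_normal()
    log_sigma2[t] = alpha0 + beta1 * log_sigma2[t - 1] + sigma_eta * eta
    u = rng.standard_normal()
    y[t] = mu + u * np.exp(0.5 * log_sigma2[t])   # sigma_t multiplies the N(0,1) error

print(y[:5], np.exp(log_sigma2[:5]))
```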
Stochastic volatility models are closely related to the financial theories used in the options pricing literature. Early work by Black and Scholes (1973) had assumed that volatility is constant through time. Such an assumption was made largely for simplicity, although it could hardly be considered realistic. One unappealing side-effect of employing a model with the embedded assumption that volatility is fixed is that options deep in-the-money and far out-of-the-money are underpriced relative to actual traded prices. This empirical observation provided part of the genesis for stochastic volatility models, where the logarithm of an unobserved variance process is modelled by a linear stochastic specification, such as an autoregressive model. The primary advantage of stochastic volatility models is that they can be viewed as discrete time approximations to the continuous time models employed in options pricing frameworks (see, for example, Hull and White, 1987). However, such models are hard to estimate. For reviews of (univariate) stochastic volatility models, see Taylor (1994), Ghysels et al. (1995) or Shephard (1996) and the references therein.
While stochastic volatility models have been widely employed in the mathematical options pricing literature, they have not been popular in empirical discrete-time financial applications, probably owing to the complexity involved in the process of estimating the model parameters (see Harvey, Ruiz and Shephard, 1994). So, while GARCH-type models are further from their continuous time theoretical underpinnings than stochastic volatility, they are much simpler to estimate using maximum likelihood. Estimating a stochastic volatility model cannot be achieved by a relatively simple modification of the maximum likelihood procedure used for GARCH models, and hence stochastic volatility models are not discussed further here.
8.21 Forecasting covariances and correlations
A major limitation of the volatility models examined above is that they are entirely univariate in nature – that is, they model the conditional variance of each series entirely independently of all other series. This is potentially an important limitation for two reasons. First, to the extent that there may be 'volatility spillovers' between markets or assets (a tendency for volatility to change in one market or asset following a change in the volatility of another), the univariate model will be misspecified. Second, it is often the case in finance that the covariances between series are of interest, as well as the variances of the individual series themselves. The calculation of hedge ratios, portfolio value at risk estimates, CAPM betas, and so on, all require covariances as inputs. Multivariate GARCH models can potentially overcome both of these deficiencies of their univariate counterparts. Multivariate extensions to GARCH models can be used to forecast the volatilities of the component series, just as with univariate models. In addition, because multivariate models give estimates for the conditional covariances as well as the conditional variances, they have a number of other potentially useful applications.
Several papers have investigated the forecasting ability of various models incorporating correlations. Siegel (1997), for example, finds that implied correlation forecasts from traded options encompass all information embodied in the historical returns (although he does not consider EWMA- or GARCH-based models). Walter and Lopez (2000), on the other hand, find that implied correlation is generally less useful for predicting the future correlation between the underlying assets' returns than forecasts derived from GARCH models. Finally, Gibson and Boyer (1998) find that a diagonal GARCH and a Markov switching approach provide better correlation forecasts than simpler models in the sense that the latter produce smaller profits when the forecasts are employed in a trading strategy.
8.22 Covariance modelling and forecasting in finance: some examples
8.22.1 The estimation of conditional betas
The CAPM beta for asset i is defined as the ratio of the covariance between the market portfolio return and the asset return, to the variance of the market portfolio return. Betas are typically constructed using a set of historical data on market variances and covariances. However, like most other problems in finance, beta estimation conducted in this fashion is backward-looking, when investors should really be concerned with the beta that will prevail in the future over the time that the investor is considering holding the asset. Multivariate GARCH models provide a simple method for estimating conditional (or time-varying) betas. Then forecasts of the covariance between the asset and the market portfolio returns and forecasts of the variance of the market portfolio are made from the model, so that the beta is a forecast, whose value will vary over time

β_{i,t} = σ_{im,t} / σ²_{m,t}    (8.90)

where β_{i,t} is the time-varying beta estimate at time t for stock i, σ_{im,t} is the covariance between market returns and returns to stock i at time t, and σ²_{m,t} is the variance of the market return at time t.
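Given forecast series for the conditional covariance and the conditional market variance (however they are obtained), the time-varying beta in (8.90) is simply their element-by-element ratio; the sketch below uses hypothetical numbers purely for illustration.

```python
import numpy as np

# Hypothetical conditional covariance (asset i, market) and market variance forecasts
sigma_im = np.array([0.8, 1.1, 0.9, 1.3])   # sigma_im,t
sigma2_m = np.array([1.0, 1.2, 0.9, 1.5])   # sigma^2_m,t

conditional_beta = sigma_im / sigma2_m      # beta_i,t in equation (8.90)
print(conditional_beta)
```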
8.22.2 Dynamic hedge ratios
Although there are many techniques available for reducing and managing risk, the simplest and perhaps the most widely used is hedging with futures contracts. A hedge is achieved by taking opposite positions in spot and futures markets simultaneously, so that any loss sustained from an adverse price movement in one market should to some degree be offset by a favourable price movement in the other. The ratio of the number of units of the futures asset that are purchased relative to the number of units of the spot asset is known as the hedge ratio. Since risk in this context is usually measured as the volatility of portfolio returns, an intuitively plausible strategy might be to choose that hedge ratio which minimises the variance of the returns of a portfolio containing the spot and futures position; this is known as the optimal hedge ratio. The optimal value of the hedge ratio may be determined in the usual way, following Hull (2005), by first defining:

ΔS = change in the spot price, S, during the life of the hedge
ΔF = change in the futures price, F, during the life of the hedge
σ_S = standard deviation of ΔS
σ_F = standard deviation of ΔF
p = correlation coefficient between ΔS and ΔF
h = hedge ratio
For a short hedge (i.e. long in the asset and short in the futures contract), the change in the value of the hedger's position during the life of the hedge will be given by (ΔS − hΔF), while for a long hedge the appropriate expression will be (hΔF − ΔS).
The variances of the two hedged portfolios (long spot and short futures, or long futures and short spot) are the same. These can be obtained from

var(hΔF − ΔS)

Remembering the rules for manipulating the variance operator, this can be written

var(ΔS) + var(hΔF) − 2 cov(ΔS, hΔF)

or

var(ΔS) + h² var(ΔF) − 2h cov(ΔS, ΔF)
Hence the variance of the change in the value of the hedged position is given by

v = σ²_S + h² σ²_F − 2 h p σ_S σ_F    (8.91)

Minimising this expression w.r.t. h would give

h = p σ_S / σ_F    (8.92)
Again, according to this formula, the optimal hedge ratio is time-invariant, and would be calculated using historical data. However, what if the standard deviations are changing over time? The standard deviations and the correlation between movements in the spot and futures series could be forecast from a multivariate GARCH model, so that the expression above is replaced by

h_t = p_t σ_{S,t} / σ_{F,t}    (8.93)
Various models are available for covariance or correlation forecasting, and
several will be discussed below.
8.23 Historical covariance and correlation
In exactly the same fashion as for volatility, the historical covariance or correlation between two series can be calculated in the standard way using a set of historical data.
8.24 Implied covariance models
Implied covariances can be calculated using options whose payoffs are dependent on more than one underlying asset. The relatively small number of such options that exist limits the circumstances in which implied covariances can be calculated. Examples include rainbow options, 'crack spread' options for different grades of oil, and currency options. In the latter case, the implied variance of the cross-currency returns xy is given by
σ̃²(xy) = σ̃²(x) + σ̃²(y) − 2σ̃(x, y)    (8.94)

where σ̃²(x) and σ̃²(y) are the implied variances of the x and y returns, respectively, and σ̃(x, y) is the implied covariance between x and y. By substituting the observed option implied volatilities of the three currencies into (8.94), the implied covariance is obtained via

σ̃(x, y) = [σ̃²(x) + σ̃²(y) − σ̃²(xy)] / 2    (8.95)
So, for instance, if the implied covariance between USD/DEM and USD/JPY is of interest, then the implied variances of the returns of USD/DEM and USD/JPY, as well as the returns of the cross-currency DEM/JPY, are required so as to obtain the implied covariance using (8.95).
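Equation (8.95) involves only simple arithmetic; the sketch below uses hypothetical implied volatilities (quoted as annualised standard deviations) purely for illustration.

```python
# Hypothetical annualised implied volatilities (standard deviations)
vol_x = 0.11    # e.g. USD/DEM returns
vol_y = 0.13    # e.g. USD/JPY returns
vol_xy = 0.10   # the cross-currency, e.g. DEM/JPY returns

# Equation (8.95): implied covariance from the three implied variances
implied_cov = (vol_x ** 2 + vol_y ** 2 - vol_xy ** 2) / 2
implied_corr = implied_cov / (vol_x * vol_y)

print(implied_cov, implied_corr)
```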
8.25 Exponentially weighted moving average model for covariances
Again, as for the case of volatility modelling, an EWMA specification is available that gives more weight in the calculation of covariance to recent observations than the estimate based on the simple average. The EWMA model estimate for covariance at time t and the forecast for subsequent periods may be written

σ(x, y)_t = (1 − λ) Σ_{i=0}^{∞} λ^i x_{t−i} y_{t−i}    (8.96)

with λ (0 < λ < 1) again denoting the decay factor, determining the relative weights attached to recent versus less recent observations.
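In practice the infinite sum in (8.96) is truncated at the start of the available sample. The sketch below (hypothetical return series and an illustrative decay factor of 0.94) computes the EWMA covariance estimate at the end of the sample and compares it with the equally weighted estimate; it is a sketch under those assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=500)        # hypothetical return series for asset x
y = 0.5 * x + rng.normal(0.0, 1.0, 500)   # hypothetical, correlated, return series for y

lam = 0.94                                # illustrative decay factor

# Truncated version of equation (8.96): weight (1 - lam) * lam**i on x_{t-i} * y_{t-i}
weights = (1 - lam) * lam ** np.arange(len(x))
cross_products = (x * y)[::-1]            # most recent observation first
ewma_cov = np.sum(weights * cross_products)

print(ewma_cov, np.cov(x, y)[0, 1])       # compare with the equally weighted estimate
```

The same recursion applied to x_{t−i}² rather than the cross-product gives the EWMA variance, so covariances and variances can be produced with a single piece of code.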
8.26 Multivariate GARCH models
Multivariate GARCH models are in spirit very similar to their univariate counterparts, except that the former also specify equations for how the covariances move over time. Several different multivariate GARCH formulations have been proposed in the literature, including the VECH, the diagonal VECH and the BEKK models. Each of these is discussed in turn below; for a more detailed discussion, see Kroner and Ng (1998). In each case, it is assumed below for simplicity that there are two assets, whose return variances and covariances are to be modelled. For an excellent survey of multivariate GARCH models, see Bauwens, Laurent and Rombouts (2006).²
8.26.1 The VECH model
A common specification of the VECH model, initially due to Bollerslev, Engle and Wooldridge (1988), is

VECH(H_t) = C + A VECH(Ξ_{t−1} Ξ′_{t−1}) + B VECH(H_{t−1}),    Ξ_t | ψ_{t−1} ~ N(0, H_t)    (8.97)

where H_t is a 2×2 conditional variance–covariance matrix, Ξ_t is a 2×1 innovation (disturbance) vector, ψ_{t−1} represents the information set at time t−1, C is a 3×1 parameter vector, A and B are 3×3 parameter matrices and VECH(·) denotes the column-stacking operator applied to the upper portion of the symmetric matrix.
² It is also worth noting that there also exists a class of multivariate stochastic volatility models. These were originally proposed by Harvey, Ruiz and Shephard (1994), although see also Brooks (2006).
The model requires the estimation of 21 parameters (C has 3 elements, A and B each have 9 elements). In order to gain a better understanding of how the VECH model works, the elements are written out below. Define

H_t = [ h_{11t}  h_{12t}
        h_{21t}  h_{22t} ],    Ξ_t = [ u_{1t}
                                       u_{2t} ],    C = [ c_{11}
                                                          c_{21}
                                                          c_{31} ],

A = [ a_{11}  a_{12}  a_{13}
      a_{21}  a_{22}  a_{23}
      a_{31}  a_{32}  a_{33} ],    B = [ b_{11}  b_{12}  b_{13}
                                         b_{21}  b_{22}  b_{23}
                                         b_{31}  b_{32}  b_{33} ]
The VECH operator takes the 'upper triangular' portion of a matrix, and stacks each element into a vector with a single column. For example, in the case of VECH(H_t), this becomes

VECH(H_t) = [ h_{11t}
              h_{22t}
              h_{12t} ]
where h_{iit} represent the conditional variances at time t of the two asset return series (i = 1, 2) used in the model, and h_{ijt} (i ≠ j) represent the conditional covariances between the asset returns. In the case of VECH(Ξ_t Ξ′_t), this can be expressed as

VECH(Ξ_t Ξ′_t) = VECH( [ u_{1t}
                         u_{2t} ] [ u_{1t}  u_{2t} ] )

               = VECH( [ u²_{1t}        u_{1t}u_{2t}
                         u_{1t}u_{2t}   u²_{2t} ] )

               = [ u²_{1t}
                   u²_{2t}
                   u_{1t}u_{2t} ]
The VECH model in full is given by

h_{11t} = c_{11} + a_{11} u²_{1t−1} + a_{12} u²_{2t−1} + a_{13} u_{1t−1} u_{2t−1} + b_{11} h_{11t−1} + b_{12} h_{22t−1} + b_{13} h_{12t−1}    (8.98)
h_{22t} = c_{21} + a_{21} u²_{1t−1} + a_{22} u²_{2t−1} + a_{23} u_{1t−1} u_{2t−1} + b_{21} h_{11t−1} + b_{22} h_{22t−1} + b_{23} h_{12t−1}    (8.99)
h_{12t} = c_{31} + a_{31} u²_{1t−1} + a_{32} u²_{2t−1} + a_{33} u_{1t−1} u_{2t−1} + b_{31} h_{11t−1} + b_{32} h_{22t−1} + b_{33} h_{12t−1}    (8.100)
Thus, it is clear that the conditional variances and conditional covariances depend on the lagged values of all of the conditional variances of, and conditional covariances between, all of the asset returns in the series, as well as the lagged squared errors and the error cross-products. Estimation of such a model would be quite a formidable task, even in the two-asset case considered here.
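The mechanics of the recursion are nevertheless easy to demonstrate; the sketch below codes one step of the unrestricted VECH updating equations (8.98)–(8.100) in matrix form, using hypothetical parameter values rather than estimates from any study.

```python
import numpy as np

# Hypothetical VECH parameters: C is 3x1, A and B are 3x3
C = np.array([0.01, 0.02, 0.005])
A = np.full((3, 3), 0.05)
B = np.diag([0.9, 0.85, 0.88])

# Lagged conditional variances/covariance, stacked as (h11, h22, h12), and lagged errors
vech_H_lag = np.array([1.0, 1.2, 0.4])
u_lag = np.array([0.3, -0.2])

# Stack the lagged error outer product in the same (1,1), (2,2), (1,2) order
vech_uu_lag = np.array([u_lag[0] ** 2, u_lag[1] ** 2, u_lag[0] * u_lag[1]])

# Equations (8.98)-(8.100) in matrix form: VECH(H_t) = C + A VECH(u u') + B VECH(H_{t-1})
vech_H = C + A @ vech_uu_lag + B @ vech_H_lag
print(vech_H)     # (h11_t, h22_t, h12_t)
```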
8.26.2 The diagonal VECH model
Even in the simple case of two assets, the conditional variance and covariance equations for the unrestricted VECH model contain 21 parameters. As the number of assets employed in the model increases, the estimation of the VECH model can quickly become infeasible. Hence the VECH model's conditional variance–covariance matrix has been restricted to the form developed by Bollerslev, Engle and Wooldridge (1988), in which A and B are assumed to be diagonal. This reduces the number of parameters to be estimated to 9 (now A and B each have 3 elements) and the model, known as a diagonal VECH, is now characterised by

h_{ij,t} = ω_{ij} + α_{ij} u_{i,t−1} u_{j,t−1} + β_{ij} h_{ij,t−1}   for i, j = 1, 2    (8.101)

where ω_{ij}, α_{ij} and β_{ij} are parameters. The diagonal VECH multivariate GARCH model could also be expressed as an infinite order multivariate ARCH model, where the covariance is expressed as a geometrically declining weighted average of past cross products of unexpected returns, with recent observations carrying higher weights. An alternative solution to the dimensionality problem would be to use orthogonal GARCH or factor GARCH models (see Alexander, 2001). A disadvantage of the VECH model is that there is no guarantee of a positive semi-definite covariance matrix.
A variance–covariance or correlation matrix must always be 'positive semi-definite', and provided that the degenerate case where all the returns in a particular series are identical (so that their variance is zero) is disregarded, the matrix will be positive definite. Among other things, this means that the variance–covariance matrix will have all positive numbers on the leading diagonal, and will be symmetrical about this leading diagonal. These properties are intuitively appealing as well as important from a mathematical point of view, for variances can never be negative, and the covariance between two series is the same irrespective of which of the two series is taken first, and positive definiteness ensures that this is the case.
A positive definite correlations matrix is also important for many applications in finance – for example, from a risk management point of view. It is this property which ensures that, whatever the weight of each series in the asset portfolio, an estimated value-at-risk is always positive.
Fortunately, this desirable property is automatically a feature of time-invariant correlations matrices which are computed directly using actual data. An anomaly arises when either the correlation matrix is estimated using a non-linear optimisation procedure (as multivariate GARCH models are), or when modified values for some of the correlations are used by the risk manager. The resulting modified correlation matrix may or may not be positive definite, depending on the values of the correlations that are put in, and the values of the remaining correlations. If, by chance, the matrix is not positive definite, the upshot is that for some weightings of the individual assets in the portfolio, the estimated portfolio variance could be negative.
8.26.3 The BEKK model
The BEKK model (Engle and Kroner, 1995) addresses the difficulty with VECH of ensuring that the H matrix is always positive definite. It is represented by

H_t = W′W + A′ H_{t−1} A + B′ Ξ_{t−1} Ξ′_{t−1} B    (8.102)

where A and B are 2×2 matrices of parameters and W is an upper triangular matrix of parameters. The positive definiteness of the covariance matrix is ensured owing to the quadratic nature of the terms on the equation's RHS.
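This property is easy to verify numerically; in the sketch below (with hypothetical parameter matrices, not estimates from any study), H_t is constructed from (8.102) and its eigenvalues are computed to confirm that it is positive definite.

```python
import numpy as np

# Hypothetical BEKK parameter matrices for two assets
W = np.array([[0.10, 0.03],
              [0.00, 0.08]])              # upper triangular
A = np.array([[0.95, 0.02],
              [0.01, 0.93]])
B = np.array([[0.25, 0.05],
              [0.03, 0.22]])

H_lag = np.array([[1.0, 0.4],
                  [0.4, 1.2]])            # lagged conditional covariance matrix
xi_lag = np.array([[0.3], [-0.2]])        # lagged innovation vector (2x1)

# Equation (8.102): H_t = W'W + A' H_{t-1} A + B' xi xi' B
H_t = W.T @ W + A.T @ H_lag @ A + B.T @ (xi_lag @ xi_lag.T) @ B

print(H_t)
print(np.linalg.eigvalsh(H_t))            # all eigenvalues positive => positive definite
```

Because every term is a quadratic form, the eigenvalue check will pass for any choice of W, A and B, which is exactly the attraction of the BEKK parameterisation.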
8.26.4 Model estimation for multivariate GARCH
Under the assumption of conditional normality, the parameters of the multivariate GARCH models of any of the above specifications can be estimated by maximising the log-likelihood function

ℓ(θ) = −(TN/2) log(2π) − (1/2) Σ_{t=1}^{T} [ log|H_t| + Ξ′_t H_t^{−1} Ξ_t ]    (8.103)

where θ denotes all the unknown parameters to be estimated, N is the number of assets (i.e. the number of series in the system), T is the number of observations, and all other notation is as above. The maximum-likelihood estimate for θ is asymptotically normal, and thus traditional procedures for statistical inference are applicable. Further details on maximum-likelihood estimation in the context of multivariate GARCH models are beyond the scope of this book. But suffice to say that the additional complexity and extra parameters involved compared with univariate models make estimation a computationally more difficult task, although the principles are essentially the same.
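For illustration only, the sketch below evaluates the log-likelihood in (8.103) for a given sequence of conditional covariance matrices and residual vectors; it is not an estimation routine, and all of the inputs are hypothetical.

```python
import numpy as np

def mgarch_loglik(H_list, resid_list):
    """Evaluate equation (8.103) for given conditional covariance matrices and residuals."""
    T = len(H_list)
    N = H_list[0].shape[0]
    ll = -0.5 * T * N * np.log(2 * np.pi)
    for H, xi in zip(H_list, resid_list):
        sign, logdet = np.linalg.slogdet(H)
        ll -= 0.5 * (logdet + xi @ np.linalg.solve(H, xi))
    return ll

# Hypothetical two-asset example with a constant covariance matrix and simulated residuals
rng = np.random.default_rng(3)
H = np.array([[1.0, 0.3], [0.3, 1.5]])
residuals = rng.multivariate_normal(mean=[0.0, 0.0], cov=H, size=100)

print(mgarch_loglik([H] * 100, list(residuals)))
```

An optimiser would maximise this quantity over the model parameters, with H_t rebuilt from the chosen specification (VECH, diagonal VECH or BEKK) at each iteration.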
8.27 A multivariate GARCH model for the CAPM with
time-varying covariances
Bollerslev, Engle and Wooldridge (1988) estimate a multivariate GARCH model for returns to US Treasury bills, gilts and stocks. The data employed comprised calculated quarterly excess holding period returns for 6-month US Treasury bills, 20-year US Treasury bonds and a Center for Research in Security Prices record of the return on the New York Stock Exchange (NYSE) value-weighted index. The data run from 1959Q1 to 1984Q2 – a total of 102 observations.
A multivariate GARCH-M model of the diagonal VECH type is employed, with coefficients estimated by maximum likelihood using the Berndt et al. (1974) algorithm. The coefficient estimates are most easily presented in the following equations for the conditional mean and the conditional variance, respectively
y_{1t} =  0.070 + 0.499 Σ_j ω_{jt−1} h_{1jt} + ε_{1t}
         (0.032)  (0.160)
y_{2t} = −4.342 + 0.499 Σ_j ω_{jt−1} h_{2jt} + ε_{2t}
         (1.030)  (0.160)
y_{3t} = −3.117 + 0.499 Σ_j ω_{jt−1} h_{3jt} + ε_{3t}
         (0.710)  (0.160)                                          (8.104)

h_{11t} =  0.011 + 0.445 ε²_{1t−1} + 0.466 h_{11t−1}
          (0.004)  (0.105)           (0.056)
h_{12t} =  0.176 + 0.233 ε_{1t−1}ε_{2t−1} + 0.598 h_{12t−1}
          (0.062)  (0.092)                  (0.052)
h_{22t} = 13.305 + 0.188 ε²_{2t−1} + 0.441 h_{22t−1}
          (6.372)  (0.113)           (0.215)
h_{13t} =  0.018 + 0.197 ε_{1t−1}ε_{3t−1} − 0.362 h_{13t−1}
          (0.009)  (0.132)                  (0.361)
h_{23t} =  5.143 + 0.165 ε_{2t−1}ε_{3t−1} − 0.348 h_{23t−1}
          (2.820)  (0.093)                  (0.338)
h_{33t} =  2.083 + 0.078 ε²_{3t−1} + 0.469 h_{33t−1}
          (1.466)  (0.066)           (0.333)                       (8.105)

Source: Bollerslev, Engle and Wooldridge (1988). Reprinted with the permission of University of Chicago Press.
where y_{jt} are the returns, ω_{jt−1} is a vector of value weights at time t−1, j = 1, 2, 3 refers to bills, bonds and stocks, respectively, and standard errors are given in parentheses. Consider now the implications of the signs, sizes and significances of the coefficient estimates in (8.104) and (8.105). The coefficient of 0.499 in the conditional mean equation gives an aggregate measure of relative risk aversion, also interpreted as representing the market trade-off between return and risk. This conditional variance-in-mean coefficient gives the required additional return as compensation for taking an additional unit of variance (risk). The intercept coefficients in the conditional mean equation for bonds and stocks are very negative and highly statistically significant. The authors argue that this is to be expected since favourable tax treatments for investing in longer-term assets encourage investors to hold them even at relatively low rates of return.
The dynamic structure in the conditional variance and covariance equations is strongest for bills and bonds, and very weak for stocks, as indicated by their respective statistical significances. In fact, none of the parameters in the conditional variance or covariance equations for the stock return equations is significant at the 5% level. The unconditional covariance between bills and bonds is positive, while that between bills and stocks, and between bonds and stocks, is negative. This arises since, in the latter two cases, the lagged conditional covariance parameters are negative and larger in absolute value than those of the corresponding lagged error cross-products.
Finally, the degree of persistence in the conditional variance (given by α_1 + β), which embodies the degree of clustering in volatility, is relatively large for the bills equation, but surprisingly small for bonds and stocks, given the results of other relevant papers in this literature.
8.28 Estimating a time-varying hedge ratio for FTSE
stock index returns
A paper by Brooks, Henry and Persand (2002) compared the effectiveness of hedging on the basis of hedge ratios derived from various multivariate GARCH specifications and other, simpler techniques. Some of their main results are discussed below.
8.28.1 Background
There has been much empirical research into the calculation of optimal hedge ratios. The general consensus is that the use of multivariate generalised autoregressive conditionally heteroscedastic (MGARCH) models yields superior performances, evidenced by lower portfolio volatilities, than either time-invariant or rolling ordinary least squares (OLS) hedges. Cecchetti, Cumby and Figlewski (1988), Myers and Thompson (1989) and Baillie and Myers (1991), for example, argue that commodity prices are characterised by time-varying covariance matrices. As news about spot and futures prices arrives to the market in discrete bunches, the conditional covariance matrix, and hence the optimal hedging ratio, becomes time-varying. Baillie and Myers (1991) and Kroner and Sultan (1993), inter alia, employ MGARCH models to capture time-variation in the covariance matrix and to estimate the resulting hedge ratio.
8.28.2 Notation
Let S_t and F_t represent the logarithms of the stock index and stock index futures prices, respectively. The actual return on a spot position held from time t−1 to t is ΔS_t = S_t − S_{t−1}; similarly, the actual return on a futures position is ΔF_t = F_t − F_{t−1}. However, at time t−1 the expected return, E_{t−1}(R_t), of the portfolio comprising one unit of the stock index and β units of the futures contract may be written as

E_{t−1}(R_t) = E_{t−1}(ΔS_t) − β_{t−1} E_{t−1}(ΔF_t)                              (8.106)
where β_{t−1} is the hedge ratio determined at time t−1, for employment in period t. The variance of the expected return, h_{p,t}, of the portfolio may be written as

h_{p,t} = h_{S,t} + β²_{t−1} h_{F,t} − 2β_{t−1} h_{SF,t}                          (8.107)

where h_{p,t}, h_{S,t} and h_{F,t} represent the conditional variances of the portfolio and of the spot and futures positions, respectively, and h_{SF,t} represents the conditional covariance between the spot and futures positions. The optimal number of futures contracts in the investor's portfolio, i.e. the optimal hedge ratio β*_{t−1}, is given by

β*_{t−1} = −h_{SF,t} / h_{F,t}                                                    (8.108)

If the conditional variance–covariance matrix is time-invariant (and if S_t and F_t are not cointegrated), then an estimate of β*, the constant optimal hedge ratio, may be obtained from the estimated slope coefficient b in the regression

ΔS_t = a + b ΔF_t + u_t                                                           (8.109)

The OLS estimate of the optimal hedge ratio could be given by b = h_{SF}/h_F.
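To make the mechanics concrete, the following Python sketch (an illustration added here, not code from the original study) shows how the constant OLS hedge ratio in (8.109) and a time-varying ratio of the form used later in table 8.5 could be computed. The series spot_ret, fut_ret, h_sf and h_f are hypothetical placeholders: in practice the last two would be fitted conditional moments from a multivariate GARCH model.

import numpy as np

# Hypothetical inputs: returns on the spot index and the futures contract,
# plus stand-ins for the fitted conditional moments from an MGARCH model.
rng = np.random.default_rng(0)
spot_ret = rng.standard_normal(500)
fut_ret = 0.9 * spot_ret + 0.3 * rng.standard_normal(500)
h_f = np.full(500, fut_ret.var())                      # conditional variance of futures (stand-in)
h_sf = np.full(500, np.cov(spot_ret, fut_ret)[0, 1])   # conditional covariance (stand-in)

# Constant OLS hedge ratio: the slope b in  Delta S_t = a + b Delta F_t + u_t   (8.109)
X = np.column_stack([np.ones_like(fut_ret), fut_ret])
a_hat, b_hat = np.linalg.lstsq(X, spot_ret, rcond=None)[0]
print("constant OLS hedge ratio:", b_hat)

# Time-varying ratio h_SF,t / h_F,t, i.e. the number of futures contracts sold per
# unit of the spot position (the minus sign in (8.108) reflects the short position).
beta_star = h_sf / h_f
print("average time-varying hedge ratio:", beta_star.mean())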
8.28.3 Data and results
The data employed in the Brooks, Henry and Persand (2002) study comprises 3,580 daily observations on the FTSE 100 stock index and stock index futures contract spanning the period 1 January 1985 – 9 April 1999. Several approaches to estimating the optimal hedge ratio are investigated.

The hedging effectiveness is first evaluated in-sample, that is, where the hedges are constructed and evaluated using the same set of data. The out-of-sample hedging effectiveness for a 1-day hedging horizon is also investigated by forming one-step-ahead forecasts of the conditional variance of the futures series and the conditional covariance between the spot and futures series. These forecasts are then translated into hedge ratios using (8.108).
Table 8.5  Hedging effectiveness: summary statistics for portfolio returns

In-sample
                                          Symmetric                 Asymmetric
             Unhedged     Naive hedge     time-varying hedge        time-varying hedge
             β = 0        β = −1          β_t = h_{FS,t}/h_{F,t}    β_t = h_{FS,t}/h_{F,t}
   (1)       (2)          (3)             (4)                       (5)
Return       0.0389       −0.0003         0.0061                    0.0060
             {2.3713}     {−0.0351}       {0.9562}                  {0.9580}
Variance     0.8286       0.1718          0.1240                    0.1211

Out-of-sample
                                          Symmetric                 Asymmetric
             Unhedged     Naive hedge     time-varying hedge        time-varying hedge
             β = 0        β = −1          β_t = h_{FS,t}/h_{F,t}    β_t = h_{FS,t}/h_{F,t}
Return       0.0819       −0.0004         0.0120                    0.0140
             {1.4958}     {0.0216}        {0.7761}                  {0.9083}
Variance     1.4972       0.1696          0.1186                    0.1188

Note: t-ratios displayed as {.}.
Source: Brooks, Henry and Persand (2002).
The hedging performance of a BEKK formulation is examined, and also that of a BEKK model including asymmetry terms (in the same style as GJR models). The returns and variances for the various hedging strategies are presented in table 8.5.

The simplest approach, presented in column (2), is that of no hedge at all. In this case, the portfolio simply comprises a long position in the cash market. Such an approach is able to achieve significant positive returns in-sample, but with a large variability of portfolio returns. Although none of the alternative strategies generate returns that are significantly different from zero, either in-sample or out-of-sample, it is clear from columns (3)–(5) of table 8.5 that any hedge generates significantly less return variability than none at all.
The ‘naive’ hedge, which takes one short futures contract for every spot unit but does not allow the hedge to time-vary, generates a reduction in variance of the order of 80% in-sample and nearly 90% out-of-sample relative to the unhedged position. Allowing the hedge ratio to be time-varying and determined from a symmetric multivariate GARCH model leads to a further reduction, as a proportion of the unhedged variance, of 5% and 2% for the in-sample and holdout samples, respectively. Allowing for an asymmetric response of the conditional variance to positive and negative shocks yields a very modest reduction in variance (a further 0.5% of the initial value) in-sample, and virtually no change out-of-sample.

Figure 8.5  Time-varying hedge ratios derived from symmetric and asymmetric BEKK models for FTSE returns. Source: Brooks, Henry and Persand (2002).
Figure 8.5 graphs the time-varying hedge ratios from the symmetric and asymmetric MGARCH models. The optimal hedge ratio is never greater than 0.96 futures contracts per index contract, with an average value of 0.82 futures contracts sold per long index contract. The variance of the estimated optimal hedge ratio is 0.0019. Moreover, the optimal hedge ratio series obtained through the estimation of the asymmetric GARCH model appears stationary. An ADF test of the null hypothesis β*_{t−1} ∼ I(1) (i.e. that the optimal hedge ratio from the asymmetric BEKK model contains a unit root) was strongly rejected by the data (ADF statistic = −5.7215, 5% critical value = −2.8630). The time-varying hedge requires the sale (purchase) of fewer futures contracts per long (short) index contract and hence would save the firm wishing to hedge a short exposure money relative to the time-invariant hedge. One possible interpretation of the better performance of the dynamic strategies over the naive hedge is that the dynamic hedge uses short-run information, while the naive hedge is driven by long-run considerations and an assumption that the relationship between spot and futures price movements is 1:1.
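As a rough illustration of the stationarity check described above (and not a reproduction of the authors' calculation), the sketch below applies an ADF test to a hedge ratio series using statsmodels; beta_star is a simulated placeholder for the fitted optimal hedge ratios.

import numpy as np
from statsmodels.tsa.stattools import adfuller

# Placeholder series; in practice this would be the fitted beta*_t-1 series
# obtained from the asymmetric BEKK model.
rng = np.random.default_rng(1)
beta_star = 0.82 + 0.04 * rng.standard_normal(1000)

adf_stat, p_value, _, _, crit_values, _ = adfuller(beta_star)
print("ADF statistic:", adf_stat)
print("5% critical value:", crit_values["5%"])
# A test statistic more negative than the critical value rejects the null
# hypothesis that the hedge ratio series contains a unit root.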
Brooks, Henry and Persand also investigate the hedging performance of the various models using a modern risk management approach. They find, once again, that the time-varying hedge results in a considerable improvement, but that allowing for asymmetries results in only a very modest incremental reduction in hedged portfolio risk.
8.29 Estimating multivariate GARCH models using EViews
In previous versions of the software, multivariate GARCH models could only be estimated in EViews by writing the required instructions, but now they are available using the menus. To estimate such a model, first you need to create a system that contains the variables to be used. Highlight the three variables ‘reur’, ‘rgbp’ and ‘rjpy’ and then right-click the mouse. Choose Open/as System . . .; click Object/New Object and then click System. Screenshot 8.6 will appear.
Screenshot 8.6  Making a system
Since no explanatory variables will be used in the conditional mean equation, all of the default choices can be retained, so just click OK. A system box containing the three equations with just intercepts will be seen. Then click Proc/Estimate . . . for the ‘System Estimation’ window. Change the ‘Estimation method’ to ARCH – Conditional Heteroscedasticity. EViews permits the estimation of three important classes of multivariate GARCH model: the diagonal VECH, the constant conditional correlation, and the diagonal BEKK models. For the error distribution, either a multivariate normal or a multivariate Student’s t can be used. Additional exogenous variables can be incorporated into the variance equation, and asymmetries can be allowed for. Leaving all of these options as the defaults and clicking OK would yield the following results.³
System: UNTITLED
Estimation Method: ARCH Maximum Likelihood (Marquardt)
Covariance specification: Diagonal VECH
Date: 09/06/07   Time: 20:27
Sample: 7/08/2002 7/07/2007
Included observations: 1826
Total system (balanced) observations 5478
Presample covariance: backcast (parameter = 0.7)
Convergence achieved after 97 iterations

                 Coefficient    Std. Error    z-Statistic    Prob.
C(1)             −0.024107      0.008980      −2.684689      0.0073
C(2)             −0.014243      0.008861      −1.607411      0.1080
C(3)              0.005420      0.009368       0.578572      0.5629

                 Variance Equation Coefficients

C(4)              0.006725      0.000697       9.651785      0.0000
C(5)              0.054984      0.004840      11.36043       0.0000
C(6)              0.004792      0.000979       4.895613      0.0000
C(7)              0.129606      0.007495      17.29127       0.0000
C(8)              0.030076      0.003945       7.624554      0.0000
C(9)              0.006344      0.001276       4.971912      0.0000
C(10)             0.031130      0.002706      11.50347       0.0000
C(11)             0.047425      0.004734      10.01774       0.0000
C(12)             0.022325      0.004061       5.497348      0.0000
C(13)             0.121511      0.012267       9.905618      0.0000
C(14)             0.059994      0.007375       8.135074      0.0000
C(15)             0.034482      0.005079       6.788698      0.0000
C(16)             0.937158      0.004929     190.1436        0.0000
C(17)             0.560650      0.034187      16.39950       0.0000
C(18)             0.933618      0.011479      81.33616       0.0000
C(19)             0.127121      0.039195       3.243308      0.0012
C(20)             0.582251      0.047292      12.31189       0.0000
C(21)             0.931788      0.010298      90.47833       0.0000

Log likelihood         −1935.756     Schwarz criterion        2.206582
Avg. log likelihood    −0.353369     Hannan-Quinn criter.     2.166590
Akaike info criterion   2.143216
³ The complexity of this model means that it takes longer to estimate than any of the univariate GARCH or other models examined previously.
Equation: REUR = C(1)
R-squared             −0.000151     Mean dependent var    −0.018327
Adjusted R-squared    −0.000151     S.D. dependent var     0.469930
S.E. of regression     0.469965     Sum squared resid    403.0827
Durbin-Watson stat     2.050379

Equation: RGBP = C(2)
R-squared             −0.000006     Mean dependent var    −0.015282
Adjusted R-squared    −0.000006     S.D. dependent var     0.413105
S.E. of regression     0.413106     Sum squared resid    311.4487
Durbin-Watson stat     1.918603

Equation: RJPY = C(3)
R-squared             −0.000087     Mean dependent var     0.001328
Adjusted R-squared    −0.000087     S.D. dependent var     0.439632
S.E. of regression     0.439651     Sum squared resid    352.7596
Durbin-Watson stat     1.981767
Covariance specification: Diagonal VECH
GARCH = M + A1.*RESID(−1)*RESID(−1)' + B1.*GARCH(−1)
M is an indefinite matrix
A1 is an indefinite matrix
B1 is an indefinite matrix
Transformed Variance Coefficients

                 Coefficient    Std. Error    z-Statistic    Prob.
M(1,1)            0.006725      0.000697       9.651785      0.0000
M(1,2)            0.054984      0.004840      11.36043       0.0000
M(1,3)            0.004792      0.000979       4.895613      0.0000
M(2,2)            0.129606      0.007495      17.29127       0.0000
M(2,3)            0.030076      0.003945       7.624554      0.0000
M(3,3)            0.006344      0.001276       4.971912      0.0000
A1(1,1)           0.031130      0.002706      11.50347       0.0000
A1(1,2)           0.047425      0.004734      10.01774       0.0000
A1(1,3)           0.022325      0.004061       5.497348      0.0000
A1(2,2)           0.121511      0.012267       9.905618      0.0000
A1(2,3)           0.059994      0.007375       8.135074      0.0000
A1(3,3)           0.034482      0.005079       6.788698      0.0000
B1(1,1)           0.937158      0.004929     190.1436        0.0000
B1(1,2)           0.560650      0.034187      16.39950       0.0000
B1(1,3)           0.933618      0.011479      81.33616       0.0000
B1(2,2)           0.127121      0.039195       3.243308      0.0012
B1(2,3)           0.582251      0.047292      12.31189       0.0000
B1(3,3)           0.931788      0.010298      90.47833       0.0000
The first panel of the table presents the conditional mean estimates; in this example, only intercepts were used in the mean equations. The next panel shows the variance equation coefficients, followed by some measures of goodness of fit for the model as a whole and then for each individual mean equation. The final panel presents the transformed variance coefficients, which in this case are identical to the panel of variance coefficients, since no transformation is conducted with normal errors (these would only be different if a Student’s t specification were used). It is evident that the parameter estimates are all plausible and statistically significant.
There are a number of useful further steps that can be conducted once the model has been estimated, all of which are available by clicking the ‘View’ button. For example, we can plot the series of residuals, or estimate the correlations between them. Or, by clicking on ‘Conditional variance’, we can list or plot the values of the conditional variances and covariances over time. We can also test for autocorrelation and normality of the errors.
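The diagonal VECH specification reported in the output, GARCH = M + A1.*RESID(−1)*RESID(−1)' + B1.*GARCH(−1), can also be written out directly. The sketch below (illustrative Python, not EViews code) shows how the fitted conditional variances and covariances implied by such estimates could be reconstructed recursively; the residual matrix and the parameter matrices M, A1 and B1 are hypothetical inputs rather than the values above.

import numpy as np

def diagonal_vech_filter(resid, M, A1, B1):
    """Build H_t = M + A1 o (u_{t-1} u_{t-1}') + B1 o H_{t-1} recursively,
    where 'o' denotes element-by-element multiplication."""
    T, n = resid.shape
    H = np.zeros((T, n, n))
    H[0] = np.cov(resid.T)                     # initialise with the sample covariance
    for t in range(1, T):
        outer = np.outer(resid[t - 1], resid[t - 1])
        H[t] = M + A1 * outer + B1 * H[t - 1]
    return H

# Hypothetical three-variable example with symmetric parameter matrices
rng = np.random.default_rng(2)
resid = rng.standard_normal((1826, 3))
M = np.full((3, 3), 0.01)
A1 = np.full((3, 3), 0.04)
B1 = np.full((3, 3), 0.93)
H = diagonal_vech_filter(resid, M, A1, B1)
print("last fitted conditional covariance matrix:")
print(H[-1])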
Key concepts
The key terms to be able to define and explain from this chapter are
● non-linearity
● GARCH model
● conditional variance
● Wald test
● maximum likelihood
● likelihood ratio test
● Lagrange multiplier test
● GJR specification
● asymmetry in volatility
● exponentially weighted moving average
● constant conditional correlation
● diagonal VECH
● BEKK model
● news impact curve
● GARCH-in-mean
● volatility clustering
Appendix: Parameter estimation using maximum likelihood
For simplicity, this appendix will consider by way of illustration the bivariate regression case with homoscedastic errors (i.e. assuming that there is no ARCH and that the variance of the errors is constant over time). Suppose that the linear regression model of interest is of the form

y_t = β_1 + β_2 x_t + u_t                                                         (8A.1)
Assuming that u_t ∼ N(0, σ²), then y_t ∼ N(β_1 + β_2 x_t, σ²), so that the probability density function for a normally distributed random variable with this mean and variance is given by

f(y_t | β_1 + β_2 x_t, σ²) = (1/(σ√(2π))) exp{ −(1/2) (y_t − β_1 − β_2 x_t)²/σ² }        (8A.2)
The probability density is a function of the data given the parameters. Successive values of y_t would trace out the familiar bell-shaped curve of the normal distribution. Since the ys are iid, the joint probability density function (pdf) for all the ys can be expressed as a product of the individual density functions

f(y_1, y_2, ..., y_T | β_1 + β_2 x_1, β_1 + β_2 x_2, ..., β_1 + β_2 x_T, σ²)
   = f(y_1 | β_1 + β_2 x_1, σ²) f(y_2 | β_1 + β_2 x_2, σ²) ... f(y_T | β_1 + β_2 x_T, σ²)
   = Π_{t=1}^{T} f(y_t | β_1 + β_2 x_t, σ²)   for t = 1, ..., T                   (8A.3)
The term on the LHS of this expression is known as the joint density and the terms on the RHS are known as the marginal densities. This result follows from the independence of the y values, in the same way as under elementary probability, for three independent events A, B and C, the probability of A, B and C all happening is the probability of A multiplied by the probability of B multiplied by the probability of C. Equation (8A.3) shows the probability of obtaining all of the values of y that did occur. Substituting into (8A.3) for every y_t from (8A.2), and using the result that

Ae^{x_1} × Ae^{x_2} × ··· × Ae^{x_T} = A^T (e^{x_1} × e^{x_2} × ··· × e^{x_T}) = A^T e^{(x_1 + x_2 + ··· + x_T)},

the following expression is obtained

f(y_1, y_2, ..., y_T | β_1 + β_2 x_t, σ²) = (1/(σ^T (√(2π))^T)) exp{ −(1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)²/σ² }        (8A.4)
This is the joint density of all of the ys given the values of x_t, β_1, β_2 and σ². However, the typical situation that occurs in practice is the reverse of the above situation – that is, the x_t and y_t are given and β_1, β_2, σ² are to be estimated. If this is the case, then f(•) is known as a likelihood function, denoted LF(β_1, β_2, σ²), which would be written

LF(β_1, β_2, σ²) = (1/(σ^T (√(2π))^T)) exp{ −(1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)²/σ² }        (8A.5)
Maximum likelihood estimation involves choosing parameter values (β_1, β_2, σ²) that maximise this function. Doing this ensures that the values of the parameters are chosen that maximise the likelihood that we would have actually observed the ys that we did. It is necessary to differentiate (8A.5) w.r.t. β_1, β_2, σ², but (8A.5) is a product containing T terms, and so would be difficult to differentiate.
Fortunately, since max_x f(x) = max_x ln(f(x)), logs of (8A.5) can be taken, and the resulting expression differentiated, knowing that the same optimal values for the parameters will be chosen in both cases. Then, using the various laws for transforming functions containing logarithms, the log-likelihood function, LLF, is obtained

LLF = −T ln σ − (T/2) ln(2π) − (1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)²/σ²        (8A.6)
which is equivalent to

LLF = −(T/2) ln σ² − (T/2) ln(2π) − (1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)²/σ²        (8A.7)

Only the first part of the RHS of (8A.6) has been changed in (8A.7), to make σ² appear in that part of the expression rather than σ.
Remembering the result that

∂(ln(x))/∂x = 1/x

and differentiating (8A.7) w.r.t. β_1, β_2, σ², the following expressions for the first derivatives are obtained

∂LLF/∂β_1 = −(1/2) Σ (y_t − β_1 − β_2 x_t) · 2 · (−1)/σ²                          (8A.8)
∂LLF/∂β_2 = −(1/2) Σ (y_t − β_1 − β_2 x_t) · 2 · (−x_t)/σ²                        (8A.9)
∂LLF/∂σ² = −(T/2)(1/σ²) + (1/2) Σ (y_t − β_1 − β_2 x_t)²/σ⁴                       (8A.10)
Setting (8A.8)–(8A.10) to zero to maximise the function, and placing hats above the parameters to denote the maximum likelihood estimators, from (8A.8)

Σ (y_t − β̂_1 − β̂_2 x_t) = 0                                                      (8A.11)
Σ y_t − Σ β̂_1 − Σ β̂_2 x_t = 0                                                    (8A.12)
Σ y_t − T β̂_1 − β̂_2 Σ x_t = 0                                                    (8A.13)
(1/T) Σ y_t − β̂_1 − β̂_2 (1/T) Σ x_t = 0                                           (8A.14)

Recalling that (1/T) Σ y_t = ȳ, the mean of y, and similarly for x, an estimator for β̂_1 can finally be derived

β̂_1 = ȳ − β̂_2 x̄                                                                 (8A.15)
From (8A.9)

Σ (y_t − β̂_1 − β̂_2 x_t) x_t = 0                                                  (8A.16)
Σ y_t x_t − Σ β̂_1 x_t − Σ β̂_2 x_t² = 0                                           (8A.17)
Σ y_t x_t − β̂_1 Σ x_t − β̂_2 Σ x_t² = 0                                           (8A.18)
β̂_2 Σ x_t² = Σ y_t x_t − (ȳ − β̂_2 x̄) Σ x_t                                       (8A.19)
β̂_2 Σ x_t² = Σ y_t x_t − T x̄ ȳ + β̂_2 T x̄²                                        (8A.20)
β̂_2 (Σ x_t² − T x̄²) = Σ y_t x_t − T x̄ ȳ                                          (8A.21)
β̂_2 = (Σ y_t x_t − T x̄ ȳ) / (Σ x_t² − T x̄²)                                      (8A.22)
From (8A.10)

T/σ̂² = (1/σ̂⁴) Σ (y_t − β̂_1 − β̂_2 x_t)²                                          (8A.23)

Rearranging,

σ̂² = (1/T) Σ (y_t − β̂_1 − β̂_2 x_t)²                                              (8A.24)

But the term in parentheses on the RHS of (8A.24) is the residual for time t (i.e. the actual minus the fitted value), so

σ̂² = (1/T) Σ û_t²                                                                 (8A.25)
How do these formulae compare with the OLS estimators? (8A.15) and (8A.22) are identical to those of OLS. So maximum likelihood and OLS will deliver identical estimates of the intercept and slope coefficients. However, the estimate of σ̂² in (8A.25) is different. The OLS estimator was

σ̂² = (1/(T − k)) Σ û_t²                                                           (8A.26)

and it was also shown that the OLS estimator is unbiased. Therefore, the ML estimator of the error variance must be biased, although it is consistent, since as T → ∞, T − k ≈ T.

Note that the derivation above could also have been conducted using matrix rather than sigma algebra. The resulting estimators for the intercept and slope coefficients would still be identical to those of OLS, while the estimate of the error variance would again be biased. It is also worth noting that the ML estimator is consistent and asymptotically efficient. Derivation of the ML estimator for the GARCH LLF is algebraically difficult and therefore beyond the scope of this book.
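The algebra above can also be verified numerically. The following sketch (an added illustration using simulated data, not a routine from the book) maximises the log-likelihood (8A.7) directly and confirms that the resulting intercept and slope estimates match the OLS formulae (8A.15) and (8A.22), while the variance estimate uses a divisor of T rather than T − k.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T = 200
x = rng.standard_normal(T)
y = 0.5 + 1.2 * x + 0.8 * rng.standard_normal(T)

def neg_llf(params):
    # Negative of (8A.7); sigma^2 is parameterised in logs to keep it positive
    b1, b2, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)
    resid = y - b1 - b2 * x
    return 0.5 * T * np.log(sigma2) + 0.5 * T * np.log(2 * np.pi) \
           + 0.5 * np.sum(resid ** 2) / sigma2

res = minimize(neg_llf, x0=np.array([0.0, 0.0, 0.0]))
b1_ml, b2_ml, sigma2_ml = res.x[0], res.x[1], np.exp(res.x[2])

# Closed-form comparisons from (8A.15) and (8A.22), plus the 1/T variance of (8A.25)
b2_ols = (np.sum(y * x) - T * x.mean() * y.mean()) / (np.sum(x ** 2) - T * x.mean() ** 2)
b1_ols = y.mean() - b2_ols * x.mean()
print(b1_ml, b1_ols)      # essentially identical
print(b2_ml, b2_ols)      # essentially identical
print(sigma2_ml)          # close to the sum of squared residuals divided by T, not T - k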
Review questions
1. (a) What stylised features of financial data cannot be explained using
linear time series models?
(b) Which of these features could be modelled using a GARCH(1,1)
process?
(c) Why, in recent empirical research, have researchers preferred GARCH(1,1) models to pure ARCH(p)?
(d) Describe two extensions to the original GARCH model. What additional characteristics of financial data might they be able to capture?
(e) Consider the following GARCH(1,1) model

y_t = μ + u_t,   u_t ∼ N(0, σ²_t)                                                 (8.110)
σ²_t = α_0 + α_1 u²_{t−1} + β σ²_{t−1}                                            (8.111)

If y_t is a daily stock return series, what range of values are likely for the coefficients μ, α_0, α_1 and β?
(f) Suppose that a researcher wanted to test the null hypothesis that α_1 + β = 1 in the equation for part (e). Explain how this might be achieved within the maximum likelihood framework.
(g) Suppose now that the researcher had estimated the above GARCH model for a series of returns on a stock index and obtained the following parameter estimates: μ̂ = 0.0023, α̂_0 = 0.0172, β̂ = 0.9811, α̂_1 = 0.1251. If the researcher has data available up to and including time T, write down a set of equations in σ²_t and u²_t and their lagged values, which could be employed to produce one-, two-, and three-step-ahead forecasts for the conditional variance of y_t.
(h) Suppose now that the coefficient estimate of β̂ for this model is 0.98 instead. By re-considering the forecast expressions you derived in part (g), explain what would happen to the forecasts in this case.
2. (a) Discuss briefly the principles behind maximum likelihood.
(b) Describe briefly the three hypothesis testing procedures that are available under maximum likelihood estimation. Which is likely to be the easiest to calculate in practice, and why?
(c) OLS and maximum likelihood are used to estimate the parameters of a standard linear regression model. Will they give the same estimates? Explain your answer.
3. (a) Distinguish between the terms ‘conditional variance’ and ‘unconditional variance’. Which of the two is more likely to be relevant for producing:
i. 1-step-ahead volatility forecasts
ii. 20-step-ahead volatility forecasts.
(b) If u_t follows a GARCH(1,1) process, what would be the likely result if a regression of the form (8.110) were estimated using OLS and assuming a constant conditional variance?
(c) Compare and contrast the following models for volatility, noting their strengths and weaknesses:
i. Historical volatility
ii. EWMA
iii. GARCH(1,1)
iv. Implied volatility.
4. Suppose that a researcher is interested in modelling the correlation between the returns of the NYSE and LSE markets.
(a) Write down a simple diagonal VECH model for this problem. Discuss the values for the coefficient estimates that you would expect.
(b) Suppose that weekly correlation forecasts for two weeks ahead are required. Describe a procedure for constructing such forecasts from a set of daily returns data for the two market indices.
(c) What other approaches to correlation modelling are available?
(d) What are the strengths and weaknesses of multivariate GARCH
models relative to the alternatives that you propose in part (c)?
5. (a) What is a news impact curve? Using a spreadsheet or otherwise, construct the news impact curve for the following estimated EGARCH and GARCH models, setting the lagged conditional variance to the value of the unconditional variance (estimated from the sample data rather than the model parameter estimates), which is 0.096

σ²_t = α_0 + α_1 u²_{t−1} + α_2 σ²_{t−1}                                          (8.112)
log(σ²_t) = α_0 + α_1 u_{t−1}/√(σ²_{t−1}) + α_2 log(σ²_{t−1}) + α_3 [ |u_{t−1}|/√(σ²_{t−1}) − √(2/π) ]        (8.113)

              GARCH          EGARCH
μ            −0.0130        −0.0278
             (0.0669)       (0.0855)
α_0           0.0019         0.0823
             (0.0017)       (0.5728)
α_1           0.1022**      −0.0214
             (0.0333)       (0.0332)
α_2           0.9050**       0.9639**
             (0.0175)       (0.0136)
α_3             –            0.2326**
                            (0.0795)

(b) In fact, the models in part (a) were estimated using daily foreign exchange returns. How can financial theory explain the patterns observed in the news impact curves?
6. Using EViews, estimate a multivariate GARCH model for the spot and futures returns series in ‘sandphedge.wf1’. Note that these series are somewhat short for multivariate GARCH model estimation. Save the fitted conditional variances and covariances, and then use these to construct the time-varying optimal hedge ratios. Compare this plot with the unconditional hedge ratio calculated in chapter 2.
9
Switching models
Learning Outcomes
In this chapter, you will learn how to
●Use intercept and slope dummy variables to allow for seasonal
behaviour in time series
●Motivate the use of regime switching models in financial
econometrics
●Specify and explain the logic behind Markov switching models
●Compare and contrast Markov switching and threshold
autoregressive models
●Describe the intuition behind the estimation of regime
switching models
9.1 Motivations
Many financial and economic time series seem to undergo episodes in which the behaviour of the series changes quite dramatically compared to that exhibited previously. The behaviour of a series could change over time in terms of its mean value, its volatility, or to what extent its current value is related to its previous value. The behaviour may change once and for all, usually known as a ‘structural break’ in a series. Or it may change for a period of time before reverting back to its original behaviour or switching to yet another style of behaviour, and the latter is typically termed a ‘regime shift’ or ‘regime switch’.
9.1.1 What might cause one-off fundamental changes in the
properties of a series?
Usually, very substantial changes in the properties of a series are attributed to large-scale events, such as wars, financial panics – e.g. a ‘run on a bank’, significant changes in government policy, such as the introduction of an inflation target, or the removal of exchange controls, or changes in market microstructure – e.g. the ‘Big Bang’, when trading on the London Stock Exchange (LSE) became electronic, or a change in the market trading mechanism, such as the partial move of the LSE from a quote-driven to an order-driven system in 1997.

Figure 9.1  Sample time series plot illustrating a regime shift
However, it is also true that regime shifts can occur on a regular basis and at much higher frequency. Such changes may occur as a result of more subtle factors, but still leading to statistically important modifications in behaviour. An example would be the intraday patterns observed in equity market bid–ask spreads (see chapter 6). These appear to start with high values at the open, gradually narrowing throughout the day, before widening again at the close.
To give an illustration of the kind of shifts that may be seen to occur, figure 9.1 gives an extreme example.

As can be seen from figure 9.1, the behaviour of the series changes markedly at around observation 500. Not only does the series become much more volatile than previously, its mean value is also substantially increased. Although this is a severe case that was generated using simulated data, clearly, in the face of such ‘regime changes’ a linear model estimated over the whole sample covering the change would not be appropriate. One possible approach to this problem would be simply to split the data around the time of the change and to estimate separate models on each portion. It would be possible to allow a series, y_t, to be drawn from two or more different generating processes at different times. For example, if it was thought an AR(1) process was appropriate to capture
the relevant features of a particular series whose behaviour changed at observation 500, say, two models could be estimated:

y_t = μ_1 + φ_1 y_{t−1} + u_{1t}   before observation 500                         (9.1)
y_t = μ_2 + φ_2 y_{t−1} + u_{2t}   after observation 500                          (9.2)

In the context of figure 9.1, this would involve focusing on the mean shift only. These equations represent a very simple example of what is known as a piecewise linear model – that is, although the model is globally (i.e. when it is taken as a whole) non-linear, each of the component parts is a linear model.
This method may be valid, but it is also likely to be wasteful of information. For example, even if there were enough observations in each sub-sample to estimate separate (linear) models, there would be an efficiency loss in having fewer observations in each of two samples than if all the observations were collected together. Also, it may be the case that only one property of the series has changed – for example, the (unconditional) mean value of the series may have changed, leaving its other properties unaffected. In this case, it would be sensible to try to keep all of the observations together, but to allow for the particular form of the structural change in the model-building process. Thus, what is required is a set of models that allow all of the observations on a series to be used for estimating a model, but also that the model is sufficiently flexible to allow different types of behaviour at different points in time. Two classes of regime switching models that potentially allow this to occur are Markov switching models and threshold autoregressive models.
A first and central question to ask is: How can it be determined where the switch(es) occurs? The method employed for making this choice will depend upon the model used. A simple type of switching model is one where the switches are made deterministically using dummy variables. One important use of this in finance is to allow for ‘seasonality’ in financial data. In economics and finance generally, many series are believed to exhibit seasonal behaviour, which results in a certain element of partly predictable cycling of the series over time. For example, if monthly or quarterly data on consumer spending are examined, it is likely that the value of the series will rise rapidly in late November owing to Christmas-related expenditure, followed by a fall in mid-January, when consumers realise that they have spent too much before Christmas and in the January sales! Consumer spending in the UK also typically drops during the August vacation period when all of the sensible people have left the country. Such phenomena will be apparent in many series and will be present to some degree at the same time every year, whatever else is happening in terms of the long-term trend and short-term variability of the series.
9.2 Seasonalities in financial markets: introduction
and literature review
In the context of financial markets, and especially in the case of equities, a number of other ‘seasonal effects’ have been noted. Such effects are usually known as ‘calendar anomalies’ or ‘calendar effects’. Examples include open- and close-of-market effects, ‘the January effect’, weekend effects and bank holiday effects. Investigation into the existence or otherwise of ‘calendar effects’ in financial markets has been the subject of a considerable amount of recent academic research. Calendar effects may be loosely defined as the tendency of financial asset returns to display systematic patterns at certain times of the day, week, month, or year. One example of the most important such anomalies is the day-of-the-week effect, which results in average returns being significantly higher on some days of the week than others. Studies by French (1980), Gibbons and Hess (1981) and Keim and Stambaugh (1984), for example, have found that the average market close-to-close return in the US is significantly negative on Monday and significantly positive on Friday. By contrast, Jaffe and Westerfield (1985) found that the lowest mean returns for the Japanese and Australian stock markets occur on Tuesdays.
At first glance, these results seem to contradict the efficient markets hypothesis, since the existence of calendar anomalies might be taken to imply that investors could develop trading strategies which make abnormal profits on the basis of such patterns. For example, holding all other factors constant, equity purchasers may wish to sell at the close on Friday and to buy at the close on Thursday in order to take advantage of these effects. However, evidence for the predictability of stock returns does not necessarily imply market inefficiency, for at least two reasons. First, it is likely that the small average excess returns documented by the above papers would not generate net gains when employed in a trading strategy once the costs of transacting in the markets have been taken into account. Therefore, under many ‘modern’ definitions of market efficiency (e.g. Jensen, 1978), these markets would not be classified as inefficient. Second, the apparent differences in returns on different days of the week may be attributable to time-varying stock market risk premiums.
If any of these calendar phenomena are present in the data but ignored by the model-building process, the result is likely to be a misspecified model. For example, ignored seasonality in y_t is likely to lead to residual autocorrelation of the order of the seasonality – e.g. fifth order residual autocorrelation if y_t is a series of daily returns.
9.3 Modelling seasonality in financial data
As discussed above, seasonalities at various different frequencies in financial time series data are so well documented that their existence cannot be doubted, even if there is argument about how they can be rationalised. One very simple method for coping with this and examining the degree to which seasonality is present is the inclusion of dummy variables in regression equations. The number of dummy variables that could sensibly be constructed to model the seasonality would depend on the frequency of the data. For example, four dummy variables would be created for quarterly data, 12 for monthly data, five for daily data and so on. In the case of quarterly data, the four dummy variables would be defined as follows:

D1_t = 1 in quarter 1 and zero otherwise
D2_t = 1 in quarter 2 and zero otherwise
D3_t = 1 in quarter 3 and zero otherwise
D4_t = 1 in quarter 4 and zero otherwise
How many dummy variables can be placed in a regression model? If an intercept term is used in the regression, the number of dummies that could also be included would be one less than the ‘seasonality’ of the data. To see why this is the case, consider what happens if all four dummies are used for the quarterly series. The following gives the values that the dummy variables would take for a period during the mid-1980s, together with the sum of the dummies at each point in time, presented in the last column:

          D1   D2   D3   D4   Sum
1986 Q1    1    0    0    0    1
     Q2    0    1    0    0    1
     Q3    0    0    1    0    1
     Q4    0    0    0    1    1
1987 Q1    1    0    0    0    1
     Q2    0    1    0    0    1
     Q3    0    0    1    0    1
etc.

The sum of the four dummies would be 1 in every time period. Unfortunately, this sum is of course identical to the variable that is implicitly attached to the intercept coefficient. Thus, if the four dummy variables and the intercept were both included in the same regression, the problem would be one of perfect multicollinearity, so that (X'X)⁻¹ would not exist and none of the coefficients could be estimated. This problem is known as the dummy variable trap. The solution would be either to just use three dummy variables plus the intercept, or to use the four dummy variables with no intercept.
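The dummy variable trap can be seen directly by forming the matrix of regressors. In the short sketch below (a made-up quarterly example, not data from the text), including an intercept together with all four quarterly dummies leaves the regressor matrix short of full column rank, so (X'X)⁻¹ does not exist, whereas dropping one dummy restores full rank.

import numpy as np

quarters = np.tile([1, 2, 3, 4], 10)            # ten years of quarterly observations
D = np.column_stack([(quarters == q).astype(float) for q in (1, 2, 3, 4)])
intercept = np.ones((len(quarters), 1))

X_trap = np.hstack([intercept, D])              # intercept plus all four dummies
X_ok = np.hstack([intercept, D[:, :3]])         # intercept plus three dummies

print(np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1], "columns")   # rank 4 < 5: singular X'X
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1], "columns")       # rank 4 = 4: fine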
The seasonal features in the data would be captured using either of these, and the residuals in each case would be identical, although the interpretation of the coefficients would be changed. If four dummy variables were used (and assuming that there were no explanatory variables in the regression), the estimated coefficients could be interpreted as the average value of the dependent variable during each quarter. In the case where a constant and three dummy variables were used, the interpretation of the estimated coefficients on the dummy variables would be that they represent the average deviations of the dependent variable for the included quarters from its average value for the excluded quarter, as discussed in the example below.
Box 9.1 How do dummy variables work?
The dummy variables as described above operate by changing the intercept, so that the average value of the dependent variable, given all of the explanatory variables, is permitted to change across the seasons. This is shown in figure 9.2.
Figure 9.2  Use of intercept dummy variables for quarterly data
Consider the following regression

y_t = β_1 + γ_1 D1_t + γ_2 D2_t + γ_3 D3_t + β_2 x_{2t} + · · · + u_t             (9.3)

During each period, the intercept will be changed. The intercept will be:

● β̂_1 + γ̂_1 in the first quarter, since D1 = 1 and D2 = D3 = 0 for all quarter 1 observations
● β̂_1 + γ̂_2 in the second quarter, since D2 = 1 and D1 = D3 = 0 for all quarter 2 observations
● β̂_1 + γ̂_3 in the third quarter, since D3 = 1 and D1 = D2 = 0 for all quarter 3 observations
● β̂_1 in the fourth quarter, since D1 = D2 = D3 = 0 for all quarter 4 observations.
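A regression of the form (9.3) is straightforward to estimate in standard software. The sketch below is only an illustration with simulated quarterly data (the variable names are hypothetical); it uses statsmodels, and the coefficients on D1–D3 are read as deviations of each quarter's intercept from that of the omitted fourth quarter.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 120                                        # thirty years of quarterly data
quarter = np.tile([1, 2, 3, 4], n // 4)
x2 = rng.standard_normal(n)

# Simulated dependent variable: quarter-specific intercepts plus a common slope on x2
y = 0.5 + 0.3 * (quarter == 1) - 0.2 * (quarter == 2) + 0.1 * (quarter == 3) \
    + 0.8 * x2 + 0.5 * rng.standard_normal(n)

D1 = (quarter == 1).astype(float)
D2 = (quarter == 2).astype(float)
D3 = (quarter == 3).astype(float)
X = sm.add_constant(np.column_stack([D1, D2, D3, x2]))
results = sm.OLS(y, X).fit()
print(results.params)   # const = quarter 4 intercept; D1-D3 = deviations from it; last entry = slope on x2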
Example 9.1
Brooks and Persand (2001a) examine the evidence for a day-of-the-week effect in five Southeast Asian stock markets: South Korea, Malaysia, the Philippines, Taiwan and Thailand. The data, obtained from Primark Datastream, are collected on a daily close-to-close basis for all weekdays (Mondays to Fridays) falling in the period 31 December 1989 to 19 January 1996 (a total of 1,581 observations). The first regressions estimated, which constitute the simplest tests for day-of-the-week effects, are of the form

r_t = γ_1 D1_t + γ_2 D2_t + γ_3 D3_t + γ_4 D4_t + γ_5 D5_t + u_t                  (9.4)

where r_t is the return at time t for each country examined separately, D1_t is a dummy variable for Monday, taking the value 1 for all Monday observations and zero otherwise, and so on. The coefficient estimates can be interpreted as the average sample return on each day of the week. The results from these regressions are shown in table 9.1.
Briefly, the main features are as follows. Neither South Korea nor the Philippines have significant calendar effects; both Thailand and Malaysia have significant positive Monday average returns and significant negative Tuesday returns; Taiwan has a significant Wednesday effect.
Dummy variables could also be used to test for other calendar anomalies, such as the January effect, etc. as discussed above, and a given regression can include dummies of different frequencies at the same time. For example, a new dummy variable D6_t could be added to (9.4) for ‘April effects’, associated with the start of the new tax year in the UK. Such a variable, even for a regression using daily data, would take the value 1 for all observations falling in April and zero otherwise.
If we choose to omit one of the dummy variables and to retain the intercept, then the omitted dummy variable becomes the reference category
Table 9.1  Values and significances of days of the week coefficients

              Thailand      Malaysia      Taiwan        South Korea    Philippines
Monday        0.49E-3       0.00322       0.00185       0.56E-3        0.00119
              (0.6740)      (3.9804)**    (2.9304)**    (0.4321)       (1.4369)
Tuesday      −0.45E-3      −0.00179      −0.00175       0.00104       −0.97E-4
              (−0.3692)     (−1.6834)     (−2.1258)**   (0.5955)       (−0.0916)
Wednesday    −0.37E-3      −0.00160       0.31E-3      −0.00264       −0.49E-3
              (−0.5005)     (−1.5912)     (0.4786)      (−2.107)**     (−0.5637)
Thursday      0.40E-3       0.00100       0.00159      −0.00159        0.92E-3
              (0.5468)      (1.0379)      (2.2886)**    (−1.2724)      (0.8908)
Friday       −0.31E-3       0.52E-3       0.40E-4       0.43E-3        0.00151
              (−0.3998)     (0.5036)      (0.0536)      (0.3123)       (1.7123)

Notes: Coefficients are given in each cell followed by t-ratios in parentheses; * and ** denote significance at the 5% and 1% levels, respectively.
Source: Brooks and Persand (2001a).
against which all the others are compared. For example, consider a model such as the one above, but where the Monday dummy variable has been omitted

r_t = α + γ_2 D2_t + γ_3 D3_t + γ_4 D4_t + γ_5 D5_t + u_t                         (9.5)

The estimate of the intercept will be α̂ on Monday, α̂ + γ̂_2 on Tuesday, and so on. γ̂_2 will now be interpreted as the difference in average returns between Monday and Tuesday. Similarly, γ̂_3, ..., γ̂_5 can also be interpreted as the differences in average returns between Wednesday, ..., Friday, and Monday.
This analysis should hopefully have made it clear that, by thinking carefully about which dummy variable (or the intercept) to omit from the regression, we can control the interpretation to test naturally the hypothesis that is of most interest. The same logic can also be applied to slope dummy variables, which are described in the following section.
9.3.1 Slope dummy variables
As well as, or instead of, intercept dummies, slope dummy variables can also be used. These operate by changing the slope of the regression line, leaving the intercept unchanged. Figure 9.3 gives an illustration in the context of just one slope dummy (i.e. two different ‘states’). Such a setup would apply if, for example, the data were bi-annual (twice yearly) or bi-weekly, or observations were made at the open and close of markets. Then D_t would be defined as D_t = 1 for the first half of the year and zero for the second half.

Figure 9.3  Use of slope dummy variables: y_t = α + βx_t + u_t versus y_t = α + βx_t + γD_t x_t + u_t

A slope dummy changes the slope of the regression line, leaving the intercept unchanged. In the above case, the intercept is fixed at α, while the slope varies over time. For periods where the value of the dummy is zero, the slope will be β, while for periods where the dummy is one, the slope will be β + γ.
Of course, it is also possible to use more than one dummy variable for the slopes. For example, if the data were quarterly, the following setup could be used, with D1_t, ..., D3_t representing quarters 1–3

y_t = α + βx_t + γ_1 D1_t x_t + γ_2 D2_t x_t + γ_3 D3_t x_t + u_t                 (9.6)

In this case, since there is also a term in x_t with no dummy attached, the interpretation of the coefficients on the dummies (γ_1, etc.) is that they represent the deviation of the slope for that quarter from the average slope over all quarters. On the other hand, if the four slope dummy variables were included (and not βx_t), the coefficients on the dummies would be interpreted as the average slope coefficients during each quarter. Again, it is important not to include four quarterly slope dummies and the βx_t term in the regression together, otherwise perfect multicollinearity would result.
Example 9.2
Returning to the example of day-of-the-week effects in Southeast Asian stock markets, although significant coefficients in (9.4) will support the hypothesis of seasonality in returns, it is important to note that risk factors have not been taken into account. Before drawing conclusions on the potential presence of arbitrage opportunities or inefficient markets, it is important to allow for the possibility that the market can be more or less risky on certain days than others. Hence, low (high) significant returns in (9.4) might be explained by low (high) risk. Brooks and Persand thus test for seasonality using the empirical market model, whereby market risk is proxied by the return on the FTA World Price Index. Hence, in order to look at how risk varies across the days of the week, interactive (i.e. slope) dummy variables are used to determine whether risk increases (decreases) on the day of high (low) returns. The equation, estimated separately using time-series data for each country, can be written

r_t = Σ_{i=1}^{5} (α_i D_{it} + β_i D_{it} RWM_t) + u_t                            (9.7)

where α_i and β_i are coefficients to be estimated, D_{it} is the ith dummy variable taking the value 1 for day t = i and zero otherwise, and RWM_t is the return on the world market index. In this way, when considering the effect of market risk on seasonality, both risk and return are permitted to vary across the days of the week. The results from estimation of (9.7) are given in table 9.2. Note that South Korea and the Philippines are excluded from this part of the analysis, since no significant calendar anomalies were found in table 9.1 to explain.
As can be seen, significant Monday effects in the Bangkok and Kuala Lumpur stock exchanges, and a significant Thursday effect in the latter, remain even after the inclusion of the slope dummy variables which allow risk to vary across the week. The t-ratios do fall slightly in absolute value, however, indicating that the day-of-the-week effects become slightly less pronounced. The significant negative average return for the Taiwanese stock exchange, however, completely disappears. It is also clear that average risk levels vary across the days of the week. For example, the betas for the Bangkok stock exchange vary from a low of 0.36 on Monday to a high of over unity on Tuesday. This illustrates that not only is there a significant positive Monday effect in this market, but also that the responsiveness of
Table 9.2  Day-of-the-week effects with the inclusion of interactive dummy variables with the risk proxy

                  Thailand        Malaysia        Taiwan
Monday            0.00322         0.00185         0.544E-3
                  (3.3571)**      (2.8025)**      (0.3945)
Tuesday          −0.00114        −0.00122         0.00140
                  (−1.1545)       (−1.8172)       (1.0163)
Wednesday        −0.00164         0.25E-3        −0.00263
                  (−1.6926)       (0.3711)        (−1.9188)
Thursday          0.00104         0.00157        −0.00166
                  (1.0913)        (2.3515)*       (−1.2116)
Friday            0.31E-4        −0.3752         −0.13E-3
                  (0.03214)       (−0.5680)       (−0.0976)
Beta-Monday       0.3573          0.5494          0.6330
                  (2.1987)*       (4.9284)**      (2.7464)**
Beta-Tuesday      1.0254          0.9822          0.6572
                  (8.0035)**      (11.2708)**     (3.7078)**
Beta-Wednesday    0.6040          0.5753          0.3444
                  (3.7147)**      (5.1870)**      (1.4856)
Beta-Thursday     0.6662          0.8163          0.6055
                  (3.9313)**      (6.9846)**      (2.5146)*
Beta-Friday       0.9124          0.8059          1.0906
                  (5.8301)**      (7.4493)**      (4.9294)**

Notes: Coefficients are given in each cell followed by t-ratios in parentheses; * and ** denote significance at the 5% and 1% levels, respectively.
Source: Brooks and Persand (2001a).
Bangkok market movements to changes in the value of the general world stock market is considerably lower on this day than on other days of the week.
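A specification in the spirit of (9.7), with separate intercept and interactive (slope) dummies for each day of the week and no common intercept, could be set up as in the sketch below. The series rt and rwm are simulated stand-ins rather than the data used by Brooks and Persand (2001a).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1581
day = np.arange(n) % 5 + 1                 # 1 = Monday, ..., 5 = Friday
rwm = rng.standard_normal(n)               # stand-in world market return
rt = 0.6 * rwm + 0.1 * (day == 1) + rng.standard_normal(n)   # stand-in country return

# Five intercept dummies and five interaction (slope) dummies, no common intercept
D = np.column_stack([(day == i).astype(float) for i in range(1, 6)])
X = np.hstack([D, D * rwm[:, None]])
results = sm.OLS(rt, X).fit()
print(results.params[:5])    # alpha_i: average return on each day, allowing for market risk
print(results.params[5:])    # beta_i: market beta on each day of the week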
9.3.2 Dummy variables for seasonality in EViews
The most commonly observed calendar effect in monthly data is a January effect. In order to examine whether there is indeed a January effect in a monthly time series regression, a dummy variable is created that takes the value 1 only in the months of January. This is most easily achieved by creating a new dummy variable called JANDUM containing zeros everywhere, and then editing the variable entries manually, changing all of the zeros for January months to ones. Returning to the Microsoft stock price example of chapters 3 and 4, create this variable using the methodology described above, and run the regression again including this new dummy variable as well. The results of this regression are:
Dependent Variable: ERMSOFT
Method: Least Squares
Date: 09/06/07   Time: 20:45
Sample (adjusted): 1986M05 2007M04
Included observations: 252 after adjustments

                 Coefficient    Std. Error    t-Statistic    Prob.
C                −0.574717      1.334120      −0.430783      0.6670
ERSANDP           1.522142      0.183517       8.294282      0.0000
DPROD             0.522582      0.450995       1.158730      0.2477
DCREDIT          −6.27E-05      0.000144      −0.435664      0.6635
DINFLATION        2.162911      3.048665       0.709462      0.4787
DMONEY           −1.412355      0.641359      −2.202129      0.0286
DSPREAD           8.944002     12.16534        0.735203      0.4629
RTERM             6.944576      2.978703       2.331409      0.0206
FEB89DUM        −68.52799      12.62302       −5.428811      0.0000
FEB03DUM        −66.93116      12.60829       −5.308503      0.0000
JANDUM            6.140623      3.277966       1.873303      0.0622

R-squared             0.368162     Mean dependent var     −0.420803
Adjusted R-squared    0.341945     S.D. dependent var      15.41135
S.E. of regression    12.50178     Akaike info criterion   7.932288
Sum squared resid     37666.97     Schwarz criterion       8.086351
Log likelihood       −988.4683     Hannan-Quinn criter.    7.994280
F-statistic           14.04271     Durbin-Watson stat      2.135471
Prob(F-statistic)     0.000000
As can be seen, the dummy is just outside being statistically significant at the 5% level, and it has the expected positive sign. The coefficient value of 6.14 suggests that, on average and holding everything else equal, Microsoft stock returns are around 6% higher in January than the average for other months of the year.
9.4 Estimating simple piecewise linear functions
The piecewise linear model is one example of a general set of models known as spline techniques. Spline techniques involve the application of polynomial functions in a piecewise fashion to different portions of the data. These models are widely used to fit yield curves to available data on the yields of bonds of different maturities (see, for example, Shea, 1984).

A simple piecewise linear model could operate as follows. If the relationship between two series, y and x, differs depending on whether x is
smaller or larger than some threshold value x*, this phenomenon can be captured using dummy variables. A dummy variable, D_t, could be defined, taking the values

D_t = 0 if x_t < x*
D_t = 1 if x_t ≥ x*                                                                (9.8)

To offer an illustration of where this may be useful, it is sometimes the case that tick size limits vary according to the price of the asset. For example, according to George and Longstaff (1993; see also chapter 6 of this book), the Chicago Board Options Exchange (CBOE) limits the tick size to $(1/8) for options worth $3 or more, and $(1/16) for options worth less than $3. This means that the minimum permissible price movements are $(1/8) and $(1/16) for options worth $3 or more and less than $3, respectively. Thus, if y is the bid–ask spread for the option, and x is the option price, used as a variable to partly explain the size of the spread, the spread will vary with the option price partly in a piecewise manner owing to the tick size limit. The model could thus be specified as

y_t = β_1 + β_2 x_t + β_3 D_t + β_4 D_t x_t + u_t                                  (9.9)

with D_t defined as above. Viewed in the light of the above discussion on seasonal dummy variables, the dummy in (9.8) is used as both an intercept and a slope dummy. An example showing the data and regression line is given by figure 9.4.
Note that the value of the threshold or ‘knot’ is assumed known at this stage. Throughout, it is also possible that this situation could be generalised to the case where y_t is drawn from more than two regimes or is generated by a more complex model.

Figure 9.4  Piecewise linear model with threshold x*
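Once the threshold is treated as known, the piecewise specification (9.9) is just an OLS regression with an intercept and a slope dummy. The sketch below is an illustration with simulated data and an assumed knot at x* = 3, loosely echoing the option tick-size example; none of the numbers come from George and Longstaff (1993).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(0.5, 6.0, n)                       # option price
D = (x >= 3.0).astype(float)                       # dummy switching at the assumed threshold x* = 3
spread = 0.06 + 0.01 * x + 0.05 * D + 0.01 * D * x + 0.02 * rng.standard_normal(n)

X = sm.add_constant(np.column_stack([x, D, D * x]))   # beta1 + beta2 x + beta3 D + beta4 D x
results = sm.OLS(spread, X).fit()
print(results.params)
# Below the threshold the fitted line is beta1 + beta2 x; above it the intercept
# shifts by beta3 and the slope by beta4.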
9.5 Markov switching models
Although a large number of more complex, non-linear threshold models have been proposed in the econometrics literature, only two kinds of model have had any noticeable impact in finance (aside from threshold GARCH models of the type alluded to in chapter 8). These are the Markov regime switching model associated with Hamilton (1989, 1990), and the threshold autoregressive model associated with Tong (1983, 1990). Each of these formulations will be discussed below.
9.5.1 Fundamentals of Markov switching models
Under the Markov switching approach, the universe of possible occurrences is split into m states of the world, denoted s_i, i = 1, ..., m, corresponding to m regimes. In other words, it is assumed that y_t switches regime according to some unobserved variable, s_t, that takes on integer values. In the remainder of this chapter, it will be assumed that m = 1 or 2. So if s_t = 1, the process is in regime 1 at time t, and if s_t = 2, the process is in regime 2 at time t. Movements of the state variable between regimes are governed by a Markov process. This Markov property can be expressed as

P[a < y_t ≤ b | y_1, y_2, ..., y_{t−1}] = P[a < y_t ≤ b | y_{t−1}]                 (9.10)

In plain English, this equation states that the probability distribution of the state at any time t depends only on the state at time t−1 and not on the states that were passed through at times t−2, t−3, ... Hence Markov processes are not path-dependent. The model’s strength lies in its flexibility, being capable of capturing changes in the variance between state processes, as well as changes in the mean.
The most basic form of Hamilton’s model, also known as ‘Hamilton’s filter’ (see Hamilton, 1989), comprises an unobserved state variable, denoted z_t, that is postulated to evolve according to a first order Markov process

prob[z_t = 1 | z_{t−1} = 1] = p_{11}                                               (9.11)
prob[z_t = 2 | z_{t−1} = 1] = 1 − p_{11}                                           (9.12)
prob[z_t = 2 | z_{t−1} = 2] = p_{22}                                               (9.13)
prob[z_t = 1 | z_{t−1} = 2] = 1 − p_{22}                                           (9.14)
where p_{11} and p_{22} denote the probability of being in regime one, given that the system was in regime one during the previous period, and the probability of being in regime two, given that the system was in regime two during the previous period, respectively. Thus 1 − p_{11} defines the probability that y_t will change from state 1 in period t−1 to state 2 in period t, and 1 − p_{22} defines the probability of a shift from state 2 to state 1 between times t−1 and t. It can be shown that under this specification, z_t evolves as an AR(1) process

z_t = (1 − p_{11}) + ρ z_{t−1} + η_t                                               (9.15)

where ρ = p_{11} + p_{22} − 1. Loosely speaking, z_t can be viewed as a generalisation of the dummy variables for one-off shifts in a series discussed above. Under the Markov switching approach, there can be multiple shifts from one set of behaviour to another.
In this framework, the observed returns series evolves as given by (9.16)

y_t = μ_1 + μ_2 z_t + (σ²_1 + φ z_t)^{1/2} u_t                                     (9.16)

where u_t ∼ N(0, 1). The expected values and variances of the series are μ_1 and σ²_1, respectively, in state 1, and (μ_1 + μ_2) and σ²_1 + φ, respectively, in state 2. The variance in state 2 is thus also defined, σ²_2 = σ²_1 + φ. The unknown parameters of the model (μ_1, μ_2, σ²_1, σ²_2, p_{11}, p_{22}) are estimated using maximum likelihood. Details are beyond the scope of this book, but are most comprehensively given in Engel and Hamilton (1990).
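To see what data generated by (9.15)–(9.16) look like, the short simulation below draws a two-state Markov chain with chosen transition probabilities and then builds the returns series. All parameter values are arbitrary illustrations rather than estimates from any study; the state indicator z takes the value 0 in regime 1 and 1 in regime 2, matching the role of z_t in (9.16).

import numpy as np

rng = np.random.default_rng(8)
T, p11, p22 = 1000, 0.95, 0.90
mu1, mu2, sigma2_1, phi = 0.05, -0.10, 0.5, 1.5     # regime 2 variance = sigma2_1 + phi

z = np.zeros(T, dtype=int)                          # 0 = regime 1, 1 = regime 2
for t in range(1, T):
    stay = p11 if z[t - 1] == 0 else p22            # probability of remaining in the current regime
    z[t] = z[t - 1] if rng.uniform() < stay else 1 - z[t - 1]

# y_t = mu_1 + mu_2 z_t + (sigma_1^2 + phi z_t)^(1/2) u_t,  u_t ~ N(0, 1)
y = mu1 + mu2 * z + np.sqrt(sigma2_1 + phi * z) * rng.standard_normal(T)
print("proportion of time spent in regime 2:", z.mean())
print("sample variances by regime:", y[z == 0].var(), y[z == 1].var())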
If a variable follows a Markov process, all that is required to forecast the probability that it will be in a given regime during the next period is the current period’s probability and a set of transition probabilities, given for the case of two regimes by (9.11)–(9.14). In the general case where there are m states, the transition probabilities are best expressed in a matrix as

P = [ P_{11}  P_{12}  ...  P_{1m}
      P_{21}  P_{22}  ...  P_{2m}
       ...     ...    ...   ...
      P_{m1}  P_{m2}  ...  P_{mm} ]                                                (9.17)

where P_{ij} is the probability of moving from regime i to regime j. Since, at any given time, the variable must be in one of the m states, it must be true that

Σ_{j=1}^{m} P_{ij} = 1   for all i                                                 (9.18)

A vector of current state probabilities is then defined as

π_t = [π_1  π_2  ...  π_m]                                                         (9.19)

where π_i is the probability that the variable y is currently in state i. Given π_t and P, the probability that the variable y will be in a given regime next period can be forecast using

π_{t+1} = π_t P                                                                    (9.20)

The probabilities for S steps into the future will be given by

π_{t+S} = π_t P^S                                                                  (9.21)
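Equation (9.21) is simple to apply once the transition matrix has been estimated. The sketch below uses made-up transition probabilities to forecast the regime probabilities several periods ahead via matrix powers.

import numpy as np

P = np.array([[0.90, 0.10],      # made-up transition matrix: each row sums to one
              [0.25, 0.75]])
pi_t = np.array([1.0, 0.0])      # currently in regime 1 with certainty

for S in (1, 2, 4, 8):
    pi_S = pi_t @ np.linalg.matrix_power(P, S)   # pi_{t+S} = pi_t P^S
    print(S, "steps ahead:", pi_S)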
9.6 A Markov switching model for the real exchange rate
There have been a number of applications of the Markov switching model in finance. Clearly, such an approach is useful when a series is thought to undergo shifts from one type of behaviour to another and back again, but where the ‘forcing variable’ that causes the regime shifts is unobservable.

One such application is to modelling the real exchange rate. As discussed in chapter 7, purchasing power parity (PPP) theory suggests that the law of one price should always apply in the long run such that the cost of a representative basket of goods and services is the same wherever it is purchased, after converting it into a common currency. Under some assumptions, one implication of PPP is that the real exchange rate (that is, the exchange rate divided by a general price index such as the consumer price index (CPI)) should be stationary. However, a number of studies have failed to reject the unit root null hypothesis in real exchange rates, indicating evidence against the PPP theory.
It is widely known that the power of unit root tests is low in the presence of structural breaks, as the ADF test finds it difficult to distinguish between a stationary process subject to structural breaks and a unit root process. In order to investigate this possibility, Bergman and Hansson (2005) estimate a Markov switching model with an AR(1) structure for the real exchange rate, which allows for multiple switches between two regimes. The specification they use is

y_t = μ_{s_t} + φ y_{t−1} + ε_t                                                    (9.22)

where y_t is the real exchange rate, s_t (s_t = 1, 2) indexes the two states, and ε_t ∼ N(0, σ²).¹ The state variable s_t is assumed to follow a standard 2-regime Markov process as described above.

¹ The authors also estimate models that allow φ and σ² to vary across the states, but the restriction that the parameters are the same across the two states cannot be rejected and hence the values presented in the study assume that they are constant.
Quarterly observations from 1973Q2 to 1997Q4 (99 data points) are used on the real exchange rate (in units of foreign currency per US dollar) for the UK, France, Germany, Switzerland, Canada and Japan. The model is estimated using the first 72 observations (1973Q2–1990Q4), with the remainder retained for out-of-sample forecast evaluation. The authors use 100 times the log of the real exchange rate, and this is normalised to take a value of one for 1973Q2 for all countries. The Markov switching model estimates obtained using maximum likelihood estimation are presented in table 9.3.
As the table shows, the model is able to separate the real exchange rates into two distinct regimes for each series, with the intercept in regime one (μ_1) being positive for all countries except Japan (resulting from the phenomenal strength of the yen over the sample period), corresponding to a rise in the log of the number of units of the foreign currency per US dollar, i.e. a depreciation of the domestic currency against the dollar. μ_2, the intercept in regime 2, is negative for all countries, corresponding to a domestic currency appreciation against the dollar. The probabilities of remaining within the same regime during the following period (p_{11} and p_{22}) are fairly low for the UK, France, Germany and Switzerland, indicating fairly frequent switches from one regime to another for those countries’ currencies.

Interestingly, after allowing for the switching intercepts across the regimes, the AR(1) coefficient, φ, in table 9.3 is a considerable distance below unity, indicating that these real exchange rates are stationary. Bergman and Hansson simulate data from the stationary Markov switching AR(1) model with the estimated parameters, but they assume that the researcher conducts a standard ADF test on the artificial data. They find that for none of the cases can the unit root null hypothesis be rejected, even though clearly this null is wrong as the simulated data are stationary. It is concluded that a failure to account for time-varying intercepts (i.e. structural breaks) in previous empirical studies on real exchange rates could have been the reason for the finding that the series are unit root processes when the financial theory had suggested that they should be stationary.
Finally, the authors employ their Markov switching AR(1) model to forecast the remaining exchange rate observations in the sample, comparing the predictions with those produced by a random walk and by a Markov switching random walk model. They find that for all six series, and for forecast horizons of up to 4 steps (quarters) ahead, their Markov switching AR model produces predictions with the lowest mean squared errors; these improvements over the pure random walk are statistically significant.
Table 9.3 Estimates of the Markov switching model for real exchange rates

Parameter   UK               France           Germany          Switzerland      Canada           Japan
μ₁          3.554 (0.550)    6.131 (0.604)    6.569 (0.733)    2.390 (0.726)    1.693 (0.230)    −0.370 (0.681)
μ₂          −5.096 (0.549)   −2.845 (0.409)   −2.676 (0.487)   −6.556 (0.775)   −0.306 (0.249)   −8.932 (1.157)
φ           0.928 (0.027)    0.904 (0.020)    0.888 (0.023)    0.958 (0.027)    0.922 (0.021)    0.871 (0.027)
σ²          10.118 (1.698)   7.706 (1.293)    10.719 (1.799)   13.513 (2.268)   1.644 (0.276)    15.879 (2.665)
p₁₁         0.672            0.679            0.682            0.792            0.952            0.911
p₂₂         0.690            0.833            0.830            0.716            0.944            0.817

Notes: Standard errors in parentheses.
Source: Bergman and Hansson (2005). Reprinted with the permission of Elsevier Science.
9.7 A Markov switching model for the gilt–equity yield ratio
As discussed below, a Markov switching approach is also useful for modelling the time series behaviour of the gilt–equity yield ratio (GEYR), defined as the ratio of the income yield on long-term government bonds to the dividend yield on equities. It has been suggested that the current value of the GEYR might be a useful tool for investment managers or market analysts in determining whether to invest in equities or whether to invest in gilts. Thus the GEYR is purported to contain information useful for determining the likely direction of future equity market trends. The GEYR is assumed to have a long-run equilibrium level, deviations from which are taken to signal that equity prices are at an unsustainable level. If the GEYR becomes high relative to its long-run level, equities are viewed as being expensive relative to bonds. The expectation, then, is that for given levels of bond yields, equity yields must rise, which will occur via a fall in equity prices. Similarly, if the GEYR is well below its long-run level, bonds are considered expensive relative to stocks, and by the same analysis, the price of the latter is expected to increase. Thus, in its crudest form, an equity trading rule based on the GEYR would say, 'if the GEYR is low, buy equities; if the GEYR is high, sell equities'. The paper by Brooks and Persand (2001b) discusses the usefulness of the Markov switching approach in this context, and considers whether profitable trading rules can be developed on the basis of forecasts derived from the model.
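As a purely illustrative sketch, and not part of the Brooks and Persand analysis, the GEYR and the crude trading rule could be computed as follows; the data file, its column names and the use of an expanding mean as a proxy for the long-run level are all assumptions made only for this example.

import pandas as pd

# Hypothetical monthly data with gilt redemption yields and dividend yields (%)
data = pd.read_csv("uk_yields.csv", index_col=0, parse_dates=True)
geyr = data["gilt_yield"] / data["dividend_yield"]

# A simple proxy for the long-run level (an assumption for this sketch only)
long_run = geyr.expanding().mean()
signal = (geyr < long_run).astype(int)   # 1 = buy/hold equities, 0 = hold gilts
print(pd.concat({"GEYR": geyr, "hold_equities": signal}, axis=1).tail())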
Brooks and Persand (2001b) employ monthly stock index dividend yields and income yields on government bonds covering the period January 1975 until August 1997 (272 observations) for three countries – the UK, the US and Germany. The series used are the dividend yield and index values of the FTSE100 (UK), the S&P500 (US) and the DAX (Germany). The bond indices and redemption yields are based on the clean prices of UK government consols, and US and German 10-year government bonds.
As an example, figure 9.5 presents a plot of the distribution of the GEYR for the US (in bold), together with a normal distribution having the same mean and variance. Clearly, the distribution of the GEYR series is not normal, and the shape suggests two separate modes: an upper part of the distribution embodying most of the observations, and a lower part covering the smallest values of the GEYR.
Such an observation, together with the notion that a trading rule should be developed on the basis of whether the GEYR is 'high' or 'low', and in the absence of a formal econometric model for the GEYR, suggests that a Markov switching approach may be useful. Under the Markov switching approach, the values of the GEYR are drawn from a mixture of normal distributions, where the weights attached to each distribution sum to one and where movements between the distributions are governed by a Markov process.
Figure 9.5  Unconditional distribution of US GEYR together with a normal distribution with the same mean and variance. Source: Brooks and Persand (2001b).
Table 9.4 Estimated parameters for the Markov switching models

            μ₁         μ₂         σ₁²        σ₂²        p₁₁        p₂₂        N₁     N₂
Statistic   (1)        (2)        (3)        (4)        (5)        (6)        (7)    (8)
UK          2.4293     2.0749     0.0624     0.0142     0.9547     0.9719     102    170
            (0.0301)   (0.0367)   (0.0092)   (0.0018)   (0.0726)   (0.0134)
US          2.4554     2.1218     0.0294     0.0395     0.9717     0.9823     100    172
            (0.0181)   (0.0623)   (0.0604)   (0.0044)   (0.0171)   (0.0106)
Germany     3.0250     2.1563     0.5510     0.0125     0.9816     0.9328     200    72
            (0.0544)   (0.0154)   (0.0569)   (0.0020)   (0.0107)   (0.0323)

Notes: Standard errors in parentheses; N₁ and N₂ denote the number of observations deemed to be in regimes 1 and 2, respectively.
Source: Brooks and Persand (2001b).
The Markov switching model is estimated using a maximum likelihood procedure (as discussed in chapter 8), based on GAUSS code supplied by James Hamilton. Coefficient estimates for the model are presented in table 9.4.
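A broadly similar two-regime model with switching mean and switching variance could be estimated in Python using statsmodels' MarkovRegression class as a stand-in for the GAUSS routines mentioned above. The sketch below uses a hypothetical GEYR series and would not be expected to reproduce table 9.4 exactly.

import pandas as pd
import statsmodels.api as sm

# Hypothetical GEYR series (see the earlier sketch for one way to build it)
data = pd.read_csv("uk_yields.csv", index_col=0, parse_dates=True)
geyr = data["gilt_yield"] / data["dividend_yield"]

# Two regimes, switching intercept (mean) and switching variance
mod = sm.tsa.MarkovRegression(geyr, k_regimes=2, trend="c",
                              switching_variance=True)
res = mod.fit()
print(res.summary())

# Filtered probabilities of being in each regime at each point in time,
# of the kind plotted against the GEYR in figure 9.6 below
print(res.filtered_marginal_probabilities.tail())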
The means and variances for the values of the GEYR for each of the two regimes are given in the columns headed (1)–(4) of table 9.4, with standard errors associated with each parameter in parentheses.
It is clear that the regime switching model has split the data into two distinct samples – one with a high mean (of 2.43, 2.46 and 3.03 for the UK, US and Germany, respectively) and one with a lower mean (of 2.07, 2.12 and 2.16), as was anticipated from the unconditional distribution of the GEYR. Also apparent is the fact that the UK and German GEYRs are more variable when they are in the high mean regime, evidenced by their higher variances in that state (around four and 20 times higher than for the low GEYR state, respectively). The number of observations for which the probability that the GEYR is in the high mean state exceeds 0.5 (and thus for which the GEYR is deemed to be in this state) is 102 for the UK (37.5% of the total), while the figures are 100 (36.8%) for the US and 200 (73.5%) for Germany. Thus, overall, the GEYR is more likely to be in the low mean regime for the UK and US, while it is more likely to be in the high mean regime for Germany.
The columns marked (5) and (6) of table 9.4 give the values of p₁₁ and p₂₂ respectively – that is, the probability of staying in state 1 given that the GEYR was in state 1 in the immediately preceding month, and the probability of staying in state 2 given that the GEYR was in state 2 previously. The high values of these parameters indicate that the regimes are highly stable, with less than a 10% chance of moving from a low GEYR to a high GEYR regime and vice versa for all three series. Figure 9.6 presents a 'q-plot', which shows the value of the GEYR and the probability that it is in the high GEYR regime for the UK at each point in time.
Figure 9.6  Value of the GEYR and probability that it is in the high GEYR regime for the UK. Source: Brooks and Persand (2001b).
As can be seen, the probability that the UK GEYR is in the 'high' regime (the dotted line) varies frequently, but spends most of its time either close to zero or close to one. The model also seems to do a reasonably good job of specifying which regime the UK GEYR should be in, given that the probability seems to match the broad trends in the actual GEYR (the full line).
Engel and Hamilton (1990) show that it is possible to give a forecast of the probability that a series y_t, which follows a Markov switching process, will be in a particular regime. Brooks and Persand (2001b) use the first 60 observations (January 1975–December 1979) for in-sample estimation of the model parameters (μ₁, μ₂, σ₁², σ₂², p₁₁, p₂₂). Then a one-step-ahead forecast is produced of the probability that the GEYR will be in the high mean regime during the next period. If the probability that the GEYR will be in the low regime during the next period is forecast to be more than 0.5, it is forecast that the GEYR will be low and hence equities are bought or held. If the probability that the GEYR is in the low regime is forecast to be less than 0.5, it is anticipated that the GEYR will be high and hence gilts are invested in or held. The model is then rolled forward one observation, with a new set of model parameters and probability forecasts being constructed. This process continues until 212 such probabilities are estimated with corresponding trading rules.
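The one-step-ahead regime probability at the heart of this rule follows directly from the transition probabilities. As a minimal sketch, the filtered probability of 0.8 below and the mapping of the regimes to 'high' and 'low' are illustrative; the UK staying probabilities are taken from table 9.4, where regime 2 is the low-mean regime.

def prob_low_next(p_low_now, p_stay_low, p_stay_high):
    """One-step-ahead probability that the GEYR is in the low regime:
    P(low next) = P(stay low)*P(low now) + P(switch high->low)*P(high now)."""
    return p_stay_low * p_low_now + (1.0 - p_stay_high) * (1.0 - p_low_now)

# Illustrative inputs: a hypothetical filtered probability of 0.8, with the
# UK staying probabilities from table 9.4 (regime 1 = high mean, regime 2 = low mean)
p_next = prob_low_next(p_low_now=0.8, p_stay_low=0.9719, p_stay_high=0.9547)
buy_or_hold_equities = p_next > 0.5     # rule: forecast low GEYR -> equities
print(f"P(low GEYR next month) = {p_next:.3f}; hold equities: {buy_or_hold_equities}")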
The returns for each out-of-sample month for the switching portfolio are calculated, and their characteristics compared with those of buy-and-hold equities and buy-and-hold gilts strategies. Returns are calculated as continuously compounded percentage returns on a stock index (the FTSE in the UK, the S&P500 in the US, the DAX in Germany) or on a long-term government bond. The profitability of the trading rules generated by the forecasts of the Markov switching model is found to be superior in gross terms to that of a simple buy-and-hold equities strategy. In the UK context, the former yields higher average returns and lower standard deviations: the switching portfolio generates an average return of 0.69% per month, compared with 0.43% for the pure bond and 0.62% for the pure equity portfolios. The improvements are not so clear-cut for the US and Germany. The Sharpe ratio for the UK Markov switching portfolio is almost twice that of the buy-and-hold equities portfolio, suggesting that, after allowing for risk, the switching model provides a superior trading rule. The improvement in the Sharpe ratio for the other two countries is, by contrast, only very modest.
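For completeness, a minimal sketch of the performance measures referred to above is given below: continuously compounded percentage returns, r_t = 100 ln(P_t/P_{t−1}), and a simple Sharpe ratio. The price levels are hypothetical, and whether (and which) risk-free rate is subtracted is left open here as an assumption.

import numpy as np
import pandas as pd

def cc_returns(prices: pd.Series) -> pd.Series:
    """Continuously compounded percentage returns: 100 * ln(P_t / P_{t-1})."""
    return 100.0 * np.log(prices / prices.shift(1))

def sharpe_ratio(returns: pd.Series, rf: float = 0.0) -> float:
    """Mean return in excess of rf, divided by the standard deviation."""
    excess = returns - rf
    return excess.mean() / excess.std()

# Illustrative price levels only
prices = pd.Series([100.0, 102.0, 101.0, 105.0],
                   index=pd.period_range("1990-01", periods=4, freq="M"))
r = cc_returns(prices).dropna()
print(f"Sharpe ratio = {sharpe_ratio(r):.3f}")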
To summarise:
●The Markov switching approach can be used to model the gilt–equity yield ratio
●The resulting model can be used to produce forecasts of the probability that the GEYR will be in a particular regime
●Before transactions costs, a trading rule derived from the model produces a better performance than a buy-and-hold equities strategy, in spite of inferior predictive accuracy as measured statistically
●Net of transactions costs, rules based on the Markov switching model are not able to beat a passive investment in the index for any of the three countries studied.
9.8 Threshold autoregressive models
Threshold autoregressive (TAR) models are one class of non-linear autoregressive models. Such models are a relatively simple relaxation of standard linear autoregressive models, allowing for a locally linear approximation over a number of states. According to Tong (1990, p. 99), the threshold principle 'allows the analysis of a complex stochastic system by decomposing it into a set of smaller sub-systems'. The key difference between TAR and Markov switching models is that, under the former, the state variable is assumed known and observable, while it is latent under the latter. A very simple example of a threshold autoregressive model is given by (9.23). The model contains a first order autoregressive process in each of two regimes, and there is only one threshold. Of course, the number of thresholds will always be the number of regimes minus one. Thus, the dependent variable y_t is purported to follow an autoregressive process with intercept coefficient μ₁ and autoregressive coefficient φ₁ if the value of the state-determining variable lagged k periods, denoted s_{t−k}, is lower than some threshold value r. If the value of the state-determining variable lagged k periods is equal to or greater than that threshold value r, y_t is specified to follow a different autoregressive process, with intercept coefficient μ₂ and autoregressive coefficient φ₂. The model would be written
y_t = μ₁ + φ₁ y_{t−1} + u_{1t}   if s_{t−k} < r
    = μ₂ + φ₂ y_{t−1} + u_{2t}   if s_{t−k} ≥ r                        (9.23)
But what is s_{t−k}, the state-determining variable? It can be any variable that is thought to make y_t shift from one set of behaviour to another. Obviously, financial or economic theory should have an important role to play in making this decision. If k = 0, it is the current value of the state-determining variable that influences the regime that y is in at time t, but in many applications k is set to 1, so that the immediately preceding value of s is the one that determines the current value of y.
The simplest case for the state-determining variable is where it is the variable under study, i.e. s_{t−k} = y_{t−k}. This situation is known as a self-exciting TAR, or SETAR, since it is the lag of the variable y itself that determines the regime that y is currently in. The model would now be written
y_t = μ₁ + φ₁ y_{t−1} + u_{1t}   if y_{t−k} < r
    = μ₂ + φ₂ y_{t−1} + u_{2t}   if y_{t−k} ≥ r                        (9.24)
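To see how such a process behaves, the sketch below simulates data from the SETAR model in (9.24) with k = 1; the parameter values are purely illustrative and are not drawn from any study.

import numpy as np

rng = np.random.default_rng(1)
mu1, phi1 = 0.5, 0.9       # regime applying when y_{t-1} < r
mu2, phi2 = -0.5, 0.3      # regime applying when y_{t-1} >= r
r = 0.0                    # threshold
T = 500

y = np.zeros(T)
for t in range(1, T):
    if y[t - 1] < r:
        y[t] = mu1 + phi1 * y[t - 1] + rng.standard_normal()
    else:
        y[t] = mu2 + phi2 * y[t - 1] + rng.standard_normal()

print(f"Share of observations generated by the lower regime: {np.mean(y[:-1] < r):.2f}")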
The models of (9.23) or (9.24) can of course be extended in several directions. The number of lags of the dependent variable used in each regime may be higher than one, and the number of lags need not be the same for both regimes. The number of states can also be increased to more than two. A general threshold autoregressive model, which notationally permits the existence of more than two regimes and more than one lag, may be written
x_t = Σ_{j=1}^{J} I_t^(j) [ φ_0^(j) + Σ_{i=1}^{p_j} φ_i^(j) x_{t−i} + u_t^(j) ],   r_{j−1} ≤ z_{t−d} ≤ r_j        (9.25)
where I_t^(j) is an indicator function for the jth regime, taking the value one if the underlying variable is in state j and zero otherwise. z_{t−d} is an observed variable determining the switching point and u_t^(j) is a zero-mean independently and identically distributed error process. Again, if the regime changes are driven by own lags of the underlying variable, x_t (i.e. z_{t−d} = x_{t−d}), then the model is a self-exciting TAR (SETAR).
It is also worth re-stating that under the TAR approach, the variable y is either in one regime or another, given the relevant value of s, and there are discrete transitions between one regime and another. This is in contrast with the Markov switching approach, where the variable y is in both states with some probability at each point in time. Another class of threshold autoregressive models, known as smooth transition autoregressions (STAR), allows for a more gradual transition between the regimes by using a continuous function for the regime indicator rather than an on–off switch (see Franses and van Dijk, 2000, chapter 3).
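One common choice of transition function in the STAR literature is the logistic function G(z) = 1/(1 + exp(−γ(z − c))). The sketch below, with illustrative values only, shows that as the slope parameter γ increases, the smooth transition approaches the abrupt on–off switch of a TAR model.

import numpy as np

def logistic_transition(z, gamma, c):
    """G(z) in [0, 1]: weight placed on the second regime for transition variable z."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

z = np.linspace(-3.0, 3.0, 7)
for gamma in (1.0, 10.0):
    print(f"gamma = {gamma}:", np.round(logistic_transition(z, gamma, c=0.0), 3))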
9.9 Estimation of threshold autoregressive models
Estimation of the model parameters (φ_i, r_j, d, p_j) is considerably more difficult than for a standard linear autoregressive process, since in general they cannot be determined simultaneously in a simple way, and the values