# VECM\*
## Cointegrated Systems
::: {#def-coint2}
An $n \times 1$ vector $y_t$ is said to be **cointegrated** if each of
its elements individually is $I(1)$ and there exists a non-zero vector
$a\in\mathbb R^n$ such that $a'y_t$ is stationary. $a$ is called a
**cointegrating vector**.
:::
If there are $h<n$ linearly independent cointegrating vectors
$(a_1, a_2, \dots,a_h)$, then any linear combination
$k_1a_1 + k_2a_2+\dots+k_ha_h$ is also a cointegrating vector. Thus, we
say $(a_1, a_2, \dots,a_h)$ form a basis for **the space of
cointegrating vectors**.
Cointegrated systems can be represented by a **Vector Error Correction
Model (VECM)**:
$$
\Delta y_t = \Pi y_{t-1} + \Gamma_1\Delta y_{t-1} + \Gamma_2\Delta y_{t-2} + \dots +
\Gamma_{p-1}\Delta y_{t-p+1} + \mu+\epsilon_t
$$ {#eq-ecm}
::: {#thm-ecm}
## Engle-Granger Representation Theorem
A set of $I(1)$ variables is cointegrated if and only if there exists
an error correction model (ECM) representation for them.
:::
Therefore, it is inappropriate to model a cointegrated system with a
VAR in differences: the term $\Pi y_{t-1}$ would be omitted, which
amounts to a misspecification.
The cointegration term can be further factored as
$\Pi=\underset{n \times h}{A}\times\underset{h \times n}{B'}$, in which
the columns of $B$ are the cointegrating vectors and $A$ collects the
adjustment coefficients. In a bivariate example, it looks like
$$
\begin{bmatrix}\Delta y_{1t} \\ \Delta y_{2t}\end{bmatrix} =
\begin{bmatrix}\alpha_1 \\ \alpha_2\end{bmatrix}
\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}
\begin{bmatrix}y_{1t-1} \\ y_{2t-1}\end{bmatrix} +
\sum_{j=1}^{p-1}
\begin{bmatrix}
\gamma_{j,11} & \gamma_{j,12} \\
\gamma_{j,21} & \gamma_{j,22}
\end{bmatrix}
\begin{bmatrix}\Delta y_{1t-j} \\ \Delta y_{2t-j}\end{bmatrix} +
\begin{bmatrix}\epsilon_{1t} \\ \epsilon_{2t}\end{bmatrix}
$$
The economic interpretation is that $\beta_1 y_{1t} + \beta_2 y_{2t}$
represents some *long-run equilibrium* relationship between the two
variables. The parameters $\alpha_1,\alpha_2$ describe the speed of
adjustment, that is, how each variable reacts to deviations from the
equilibrium path. A small value of $\alpha_i$ implies a relatively
unresponsive reaction, meaning it takes a long time to return to the
equilibrium.
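To make this concrete, below is a minimal sketch in Python using
statsmodels; the simulation design (a single shared stochastic trend, so
the true cointegrating vector is proportional to $(1,-1)'$) and all
parameter values are illustrative assumptions, not taken from the text.

```python
# Minimal sketch: simulate a bivariate cointegrated system and fit a VECM.
# All data-generating choices here are illustrative assumptions.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(0)
T = 500

# One common I(1) trend drives both series, so y1 - y2 is stationary.
trend = np.cumsum(rng.normal(size=T))
y1 = trend + rng.normal(scale=0.5, size=T)
y2 = trend + rng.normal(scale=0.5, size=T)
data = np.column_stack([y1, y2])

# One lagged difference, cointegrating rank 1, no deterministic terms.
res = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="n").fit()
print("adjustment coefficients A:\n", res.alpha)
print("cointegrating vectors B:\n", res.beta)   # ~ proportional to (1, -1)'
print("short-run coefficients Gamma:\n", res.gamma)
```

Note that $B$ is only identified up to scale (and up to rotation when
$h>1$), which is why packages typically normalize its leading block.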
Note that $\Pi = AB'$ cannot be a full-rank matrix. If there were $n$
independent cointegrating vectors, any linear combination of the
components of $y_t$ would be stationary, which effectively means $y_t$
itself is stationary. $\Pi$ cannot be zero either: in that case the
system is fully characterized by a differenced VAR and there is no
cointegration. Therefore, a cointegrated system requires $0<h<n$.
A three-variable example looks like:
$$
\begin{bmatrix}\Delta y_{1t} \\ \Delta y_{2t} \\ \Delta y_{3t}\end{bmatrix} =
\begin{bmatrix}
\alpha_{11} & \alpha_{12}\\
\alpha_{21} & \alpha_{22}\\
\alpha_{31} & \alpha_{32}\\
\end{bmatrix}
\begin{bmatrix}
\beta_{11} & \beta_{12} & \beta_{13}\\
\beta_{21} & \beta_{22} & \beta_{23}\\
\end{bmatrix}
\begin{bmatrix}y_{1t-1} \\ y_{2t-1} \\ y_{3t-1}\end{bmatrix} +
\cdots
$$
In general, if $y_t$ has $n$ non-stationary components, there could be
at most $n-1$ cointegrating vectors. The number of cointegrating vectors
is also called the **cointegrating rank**.
Note that a single equation ECM is equivalent to an ARDL model:
$$
\begin{aligned}
&\Delta y_t = \alpha(y_{t-1} - \delta-\beta x_{t-1}) + \gamma\Delta x_t + u_t \\
\Leftrightarrow &\ y_t = (\alpha+1)y_{t-1} + \gamma x_t - (\alpha\beta+\gamma)x_{t-1}
-\alpha\delta + u_t \\
\Leftrightarrow &\ y_t = b_1y_{t-1} + b_2x_t + b_3x_{t-1} + c + u_t
\end{aligned}
$$
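The coefficient mapping is easy to verify symbolically; here is a quick
sympy check (the variable names are arbitrary placeholders):

```python
# Verify the ECM -> ARDL expansion above with sympy.
import sympy as sp

y_lag, x, x_lag, alpha, beta, delta, gamma, u = sp.symbols(
    "y_lag x x_lag alpha beta delta gamma u")

# y_t implied by the ECM:
# Delta y_t = alpha*(y_{t-1} - delta - beta*x_{t-1}) + gamma*Delta x_t + u_t
y_ecm = y_lag + alpha*(y_lag - delta - beta*x_lag) + gamma*(x - x_lag) + u

# y_t in ARDL form with the claimed coefficients
y_ardl = (alpha + 1)*y_lag + gamma*x - (alpha*beta + gamma)*x_lag \
         - alpha*delta + u

print(sp.simplify(y_ecm - y_ardl))  # prints 0
```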
Given a (possibly) cointegrated system, we would like to know whether
any cointegrating relations exist and how many cointegrating vectors
there are. Johansen (1991) provides a likelihood-based method to test
and estimate a cointegrated system. Before introducing Johansen's
method, we need some prerequisite knowledge of canonical correlations.
## Canonical Correlation
Principal component analysis (PCA) finds a linear combination of
$[x_1\ x_2\ \dots x_n]$ that produces the largest variance. What if we
want to extend the analysis to the correlations between two datasets:
$\underset{T \times n}{X} = [x_1\ x_2\ \dots x_n]$ and
$\underset{T \times m}{Y} = [y_1\ y_2\ \dots y_m]$? The cross-dataset
covariance matrix is
$$
\underset{n\times m}{\Sigma_{XY}} =
\begin{bmatrix}
\text{cov}(x_1,y_1) & \text{cov}(x_1,y_2) & \dots & \text{cov}(x_1,y_m) \\
\text{cov}(x_2,y_1) & \text{cov}(x_2,y_2) & \dots & \text{cov}(x_2,y_m) \\
\vdots & \vdots & \ddots & \vdots \\
\text{cov}(x_n,y_1) & \text{cov}(x_n,y_2) & \dots & \text{cov}(x_n,y_m) \\
\end{bmatrix}
$$
Canonical correlation analysis (CCA) seeks two vectors
$a \in \mathbb R^n$ and $b \in \mathbb R^m$ such that $a'X$ and $b'Y$
maximize the correlation $\rho =\text{corr}(a'X, b'Y)$. The transformed
random variables $U = a'X$ and $V = b'Y$ are the first pair of
canonical variables. The second pair of canonical variables maximizes
the same correlation subject to being orthogonal to the first pair, and
so on.
Suppose we want to choose $a$ and $b$ to maximize
$$
\rho = \frac{a'\Sigma_{XY}b}{\sqrt{a'\Sigma_{XX}a}\sqrt{b'\Sigma_{YY}b}}
$$
We may impose the normalization constraints $a'\Sigma_{XX}a = 1$ and
$b'\Sigma_{YY}b = 1$. Form the Lagrangian
$$
\mathcal{L} = a'\Sigma_{XY}b - \frac{\mu}{2}(a'\Sigma_{XX}a - 1) -\frac{\nu}{2}(b'\Sigma_{YY}b -1)
$$
The first-order conditions are
$$
\begin{aligned}
&\frac{\partial\mathcal{L}}{\partial a} = \Sigma_{XY}b -\mu\ \Sigma_{XX}a = 0 \\
&\frac{\partial\mathcal{L}}{\partial b} = \Sigma_{YX}a -\nu\ \Sigma_{YY}b = 0 \\
\end{aligned}
$$
which implies
$$
\begin{aligned}
\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} a &= \mu\nu a =\lambda a \\[1em]
\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} b &= \mu\nu b =\lambda b
\end{aligned}
$$
Therefore, $a$ is an eigenvector of
$\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$. The associated
eigenvalue $\lambda$ equals the maximized correlation squared
$\rho^{*2}$. To see this, multiply the first first-order condition by
$a'$:
$$
a'\Sigma_{XY}b = \mu\cdot a'\Sigma_{XX}a=\mu=\rho^*
$$
Multiplying the second condition by $b'$ likewise gives $\nu=\rho^*$,
so $\lambda = \mu\nu = \rho^{*2}$. Symmetrically, $b$ is an eigenvector
of $\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$.
So the canonical correlation can be computed as follows:
1. Compute the eigenvalues and eigenvectors of
$$\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$$
2. Sort the eigenvalues as
$\lambda_1 \geq \lambda_2 \geq\dots\geq \lambda_n$ and
$a_1,a_2,\dots,a_n$ are the corresponding eigenvectors such that
$a'\Sigma_{XX}a=1$.
3. Compute the eigenvalues and eigenvectors of
$$\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$$
4. Sort the eigenvalues as
$\lambda_1 \geq \lambda_2 \geq\dots\geq \lambda_m$ and
$b_1,b_2,\dots,b_m$ are the corresponding eigenvectors such that
$b'\Sigma_{YY}b=1$.
5. $(a_k,b_k)$ gives the $k$-th pair of canonical variables, where
$k\leq\min\{m,n\}$; and $\rho_k=\sqrt{\lambda_k}$ is the $k$-th largest
canonical correlation.
Note that, if $\underset{n \times k}{A} = [a_1\ a_2\ \dots a_k]$,
$\underset{m \times k}{B} = [b_1\ b_2\ \dots b_k]$, we would have
$A'\Sigma_{XY}B = \Lambda$ where
$$
\Lambda = \begin{bmatrix}
\rho_1 & & \\
& \rho_2 & \\
& & \ddots & \\
& & & \rho_k
\end{bmatrix}
$$
is a diagonal matrix. Therefore, canonical variables transform the
covariance matrix between $X$ and $Y$ into a diagonal matrix, where
entries on the diagonal best summarize the correlations between the two
datasets.
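As a sketch of the full recipe, the following numpy code computes the
canonical correlations from the two eigenvalue problems and checks that
$A'\Sigma_{XY}B$ comes out diagonal; the simulated data and dimensions
are arbitrary illustrative choices.

```python
# Canonical correlations via the eigenvalue recipe above (numpy sketch).
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 2000, 3, 2
X = rng.normal(size=(T, n))
Y = X[:, :m] @ rng.normal(size=(m, m)) + rng.normal(size=(T, m))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx, Syy = Xc.T @ Xc / T, Yc.T @ Yc / T
Sxy = Xc.T @ Yc / T

# Steps 1-2: eigen-decomposition for the a-vectors, sorted descending
Ma = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
lam_a, A = np.linalg.eig(Ma)
order = np.argsort(lam_a.real)[::-1]
lam, A = lam_a.real[order], A.real[:, order]

# Steps 3-4: same for the b-vectors
Mb = np.linalg.solve(Syy, Sxy.T) @ np.linalg.solve(Sxx, Sxy)
lam_b, B = np.linalg.eig(Mb)
B = B.real[:, np.argsort(lam_b.real)[::-1]]

# Normalize so that a'Sxx a = 1 and b'Syy b = 1 (column by column)
A = A / np.sqrt(np.einsum("in,ij,jn->n", A, Sxx, A))
B = B / np.sqrt(np.einsum("in,ij,jn->n", B, Syy, B))

k = min(n, m)
print("canonical correlations:", np.sqrt(lam[:k]))
print("A' Sxy B (diagonal up to sign):\n", A[:, :k].T @ Sxy @ B)
```

Because eigenvector signs are arbitrary, the off-diagonal entries vanish
while the diagonal entries match the canonical correlations up to sign.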
## Johansen's Procedure
1. Estimate by OLS two regressions $$
\begin{aligned}
\Delta y_t &= \hat\Psi_0 + \hat\Psi_1\Delta y_{t-1} + \dots +\hat\Psi_{p-1}\Delta y_{t-p+1} +\hat u_t \\
y_{t-1} &= \hat\Theta_0 + \hat\Theta_1\Delta y_{t-1} + \dots +\hat\Theta_{p-1}\Delta y_{t-p+1} +\hat v_t \\
\end{aligned}
$$ Save $\hat u_t$ and $\hat v_t$.
2. Compute the canonical correlations between $\hat u_t$ and
$\hat v_t$. Find the eigenvalues of
$\Sigma_{vv}^{-1}\Sigma_{vu}\Sigma_{uu}^{-1}\Sigma_{uv}$. Sort them
from the largest to the smallest:
$\hat\lambda_1 \geq \hat\lambda_2 \geq \dots\geq \hat\lambda_n$.
Then the maximized log-likelihood, subject to the constraint
that there are $h$ cointegrating relations, is given by $$
\ell^*=-\frac{Tn}{2}\log(2\pi)-\frac{Tn}{2}-\frac{T}{2}\log|\hat\Sigma_{uu}|
-\frac{T}{2}\sum_{i=1}^{h}\log(1-\hat\lambda_i)
$$ The test for there being $h$ cointegrating relations is
equivalent to testing $$
-\frac{T}{2}\sum_{i=h+1}^{n}\log(1-\hat\lambda_i) = 0.
$$
3. Calculate the MLE of the parameters. The cointegrating matrix is
given by
$$\hat B = \begin{bmatrix}\hat b_1 & \hat b_2 & \dots & \hat b_h\end{bmatrix}$$
where $\hat b_i$ are the eigenvectors associated with the $h$ largest
eigenvalues, normalized such that $\hat b_i'\hat\Sigma_{vv}\hat b_i=1$.
The adjustment matrix and other parameters are given by $$
\begin{aligned}
\hat A &= \hat\Sigma_{uv}\hat B\\
\hat\Pi &= \hat A\hat B' \\
\hat\Gamma_i &= \hat\Psi_i - \hat\Pi\hat\Theta_i\\
\hat\mu &= \hat\Psi_0 - \hat\Pi\hat\Theta_0.
\end{aligned}
$$
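A bare-bones numpy version of Steps 1-2 might look as follows, under
the simplifying assumptions that $p=2$ (a single lagged difference) and
that a constant enters both auxiliary regressions; the simulated
bivariate data mirror the VECM sketch earlier.

```python
# Johansen Steps 1-2 by hand: auxiliary regressions, then eigenvalues.
import numpy as np

def johansen_eigenvalues(data, p=2):
    """Residuals u_t, v_t from two OLS regressions, then the eigenvalues
    of Svv^{-1} Svu Suu^{-1} Suv, sorted from largest to smallest."""
    dy = np.diff(data, axis=0)              # Delta y_t
    ddep = dy[p-1:]                         # regressand of regression 1
    ylag = data[p-1:-1]                     # regressand of regression 2
    # Regressors: constant and Delta y_{t-1}, ..., Delta y_{t-p+1}
    Z = np.column_stack([np.ones(len(ddep))] +
                        [dy[p-1-j:len(dy)-j] for j in range(1, p)])
    rhs = np.column_stack([ddep, ylag])
    resid = rhs - Z @ np.linalg.lstsq(Z, rhs, rcond=None)[0]
    n = data.shape[1]
    u, v = resid[:, :n], resid[:, n:]
    T = len(u)
    Suu, Svv, Suv = u.T @ u / T, v.T @ v / T, u.T @ v / T
    M = np.linalg.solve(Svv, Suv.T) @ np.linalg.solve(Suu, Suv)
    return np.sort(np.linalg.eigvals(M).real)[::-1]

# One true cointegrating relation: expect one eigenvalue clearly
# non-zero and the other close to zero.
rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(size=500))
data = np.column_stack([trend + rng.normal(scale=0.5, size=500),
                        trend + rng.normal(scale=0.5, size=500)])
print(johansen_eigenvalues(data))
```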
To understand this procedure, note that if we treat $\Pi$ as given, the
MLE for @eq-ecm is equivalent to estimating the coefficients by OLS:
$$
\Delta y_t - \Pi y_{t-1} = \hat\Gamma_1\Delta y_{t-1} + \hat\Gamma_2\Delta y_{t-2} +
\dots + \hat\Gamma_{p-1}\Delta y_{t-p+1} + \hat\mu+ \hat\epsilon_t
$$
The log-likelihood function becomes
$$
\ell(\Pi, \Omega) = -\frac{T}{2}\log |2\pi\Omega| - \frac{1}{2}\sum_{t=1}^{T}
(\hat\epsilon_t'\Omega^{-1}\hat\epsilon_t).
$$
Step 1 does this in two separate regressions, in which
$$
\begin{aligned}
\hat\Gamma_i &= \hat\Psi_i - \Pi\hat\Theta_i \\
\hat\epsilon_t &= \hat u_t - \Pi\hat v_t \\
\end{aligned}
$$
Thus, the log-likelihood can be rewritten as
$$
\ell(\Pi, \Omega) = -\frac{T}{2}\log |2\pi\Omega| - \frac{1}{2}\sum_{t=1}^{T}
[(\hat u_t - \Pi\hat v_t)'\Omega^{-1} (\hat u_t - \Pi \hat v_t)]
$$
Concentrating out $\Omega$:
$$
\hat\Omega = \frac{1}{T}\sum_{t=1}^{T}[(\hat u_t - \Pi\hat v_t)(\hat u_t - \Pi \hat v_t)']
$$
Substituting this into the $\ell$ function
$$
\ell(\Pi) = -\frac{Tn}{2}\log (2\pi) - \frac{Tn}{2}- \frac{T}{2}\log
\left|\frac{1}{T}\sum_{t=1}^{T}[(\hat u_t - \Pi\hat v_t)(\hat u_t - \Pi \hat v_t)']\right|
$$
Thus, maximizing $\ell$ is equivalent to minimizing
$$
\left|\frac{1}{T}\sum_{t=1}^{T}[(\hat u_t - \Pi\hat v_t)(\hat u_t - \Pi \hat v_t)']\right|
$$
by choosing $\Pi$. If $u_t$ is a single variable, the optimal $\hat\Pi$
that minimizes $|T^{-1}\sum_t (u_t-\Pi v_t)^2|$ would be simply the OLS
estimator. Similarly, for the vector case, we have
$$
\hat\Pi = \left(\frac{1}{T}\sum_t u_t v_t'\right)\left(\frac{1}{T}\sum_t v_t v_t'\right)^{-1}
$$
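A quick numeric sanity check (numpy; the data-generating choices below
are arbitrary): perturbing the OLS formula never decreases the
determinant criterion.

```python
# Check that Pi_hat = (sum u v')(sum v v')^{-1} minimizes the determinant.
import numpy as np

rng = np.random.default_rng(1)
T, n = 400, 2
v = rng.normal(size=(T, n))
u = v @ np.array([[0.5, 0.0], [0.1, -0.3]]) + rng.normal(size=(T, n))

Pi_hat = (u.T @ v) @ np.linalg.inv(v.T @ v)

def criterion(Pi):
    e = u - v @ Pi.T                      # rows are (u_t - Pi v_t)'
    return np.linalg.det(e.T @ e / T)

print(criterion(Pi_hat))                                  # the minimum
print(criterion(Pi_hat + 0.05 * rng.normal(size=(n, n)))) # strictly larger
```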
Suppose we have the canonical decomposition $A'\Sigma_{uv}B = \Lambda$
where $\Lambda$ is a diagonal matrix of canonical correlations, and
$A'\Sigma_{uu}A=I$, $B'\Sigma_{vv}B=I$. The estimator can be reduced to
$$
\begin{aligned}
\hat\Pi &= (A'^{-1}\Lambda B^{-1})(B'^{-1}B^{-1})^{-1} = A'^{-1}\Lambda B' \\[1em]
&= A'^{-1}\begin{bmatrix}\rho_1 \\ & \rho_2 \\ & & \ddots \\ & & & \rho_n\end{bmatrix} B'
\end{aligned}
$$
The minimized 'sum of squared residuals' is
$$
\begin{aligned}
\left|\frac{1}{T}\sum_{t=1}^{T}[(\hat u_t - \Pi\hat v_t)(\hat u_t - \Pi \hat v_t)']\right|
&= \left| A'^{-1} \frac{1}{T}\sum_{t=1}^{T}(A'\hat u_t - \Lambda B'\hat v_t)(A'\hat u_t - \Lambda B'\hat v_t)' A^{-1} \right| \\[1em]
&= |A|^{-2} |I-\Lambda\Lambda'| \\[1em]
&= |A|^{-2} \begin{vmatrix}1-\rho_1^2 \\ & 1-\rho_2^2 \\ & & \ddots \\ & & & 1-\rho_n^2\end{vmatrix} \\[1em]
&= |A|^{-2} \prod_{i=1}^{n} (1-\lambda_i),
\end{aligned}
$$
since $\lambda_i = \rho_i^2$.
This explains the likelihood function in Step 2. The MLE for $\hat\Pi$
above is subject to no constraint. However, if there are any
cointegrating relations, $\Pi$ cannot be of full rank. If we restrict
the rank of $\Pi$ to $h$, the minimum is achieved by keeping the $h$
largest $\lambda$s.
## Hypothesis Testing
### Test 1: At most h cointegrations {.unnumbered}
Hypothesis:
$$
\begin{aligned}
&H_0: \text{there are no more than }h\text{ cointegrating relations}\\
&H_1: \text{there are more than }h\text{ cointegrating relations}\\
\end{aligned}
$$
Test statistic:
$$
\lambda_{\text{trace}} = 2(\ell_1^* - \ell_0^*) = -T\sum_{i=h+1}^{n}\log(1-\hat\lambda_i)
$$
If $H_0$ is true, $\lambda_{\text{trace}}$ should be close to zero. The
critical values are provided by the table below. *Case 1* means there is
no constant or deterministic trend; *Case 2* contains constants in
cointegrating vectors but no deterministic trend; *Case 3* contains
deterministic trend.
| $n-h$ | $T$ | 0.1 | 0.05 | 0.025 | 0.01 |
|------------|-----|-------|-------|-------|-------|
| **Case 1** | | | | | |
| 1 | 400 | 2.86 | 3.84 | 4.93 | 6.51 |
| 2 | 400 | 10.47 | 12.53 | 14.43 | 16.31 |
| ... | | | | | |
| **Case 2** | | | | | |
| 1 | 400 | 6.69 | 8.08 | 9.66 | 11.58 |
| 2 | 400 | 15.58 | 17.84 | 19.61 | 21.96 |
| ... | | | | | |
| **Case 3** | | | | | |
| 1 | 400 | 2.82 | 3.96 | 5.33 | 6.94 |
| 2 | 400 | 13.34 | 15.20 | 17.30 | 19.31 |
| ... | | | | | |
: Critical values of Johansen's likelihood ratio test of the null
hypothesis of $h$ cointegrating relations against the alternative of no
restrictions
### Test 2: h cointegrations vs h+1 {.unnumbered}
Hypothesis:
$$
\begin{aligned}
&H_0: \text{there are }h\text{ cointegrating relations}\\
&H_1: \text{there are }h+1\text{ cointegrating relations}\\
\end{aligned}
$$
Test statistic:
$$
\lambda_{\text{max}} = 2(\ell_1^* -\ell_0^*)=-T\log(1-\hat\lambda_{h+1})
$$
The critical values are given below.
| $n-h$ | $T$ | 0.1 | 0.05 | 0.025 | 0.01 |
|------------|-----|-------|-------|-------|-------|
| **Case 1** | | | | | |
| 1 | 400 | 2.86 | 3.84 | 4.96 | 6.51 |
| 2 | 400 | 9.52 | 11.44 | 13.27 | 15.69 |
| ... | | | | | |
| **Case 2** | | | | | |
| 1 | 400 | 6.69 | 8.08 | 9.66 | 11.58 |
| 2 | 400 | 12.78 | 14.60 | 16.40 | 18.78 |
| ... | | | | | |
| **Case 3** | | | | | |
| 1 | 400 | 2.82 | 3.96 | 5.33 | 6.94 |
| 2 | 400 | 12.10 | 14.04 | 15.81 | 17.94 |
| ... | | | | | |
: Critical values of Johansen's likelihood ratio test of the null
hypothesis of $h$ cointegrating relations against the alternative of
$h+1$ relations
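Both tests are implemented in statsmodels' `coint_johansen`. A minimal
sketch on the simulated bivariate data from earlier looks like this;
`det_order=-1` (no deterministic terms, i.e. Case 1) is an assumption
chosen to match that simulation.

```python
# Trace and maximum-eigenvalue tests via statsmodels.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(size=500))
data = np.column_stack([trend + rng.normal(scale=0.5, size=500),
                        trend + rng.normal(scale=0.5, size=500)])

# det_order=-1: no constant or trend (Case 1); one lagged difference
jres = coint_johansen(data, det_order=-1, k_ar_diff=1)

print("eigenvalues:       ", jres.eig)
print("trace statistics:  ", jres.lr1)  # rows: h = 0, 1, ...
print("trace critical values (90/95/99%):\n", jres.cvt)
print("max-eig statistics:", jres.lr2)  # h vs h+1
print("max-eig critical values (90/95/99%):\n", jres.cvm)
```

With one true cointegrating relation, the first trace and max-eigenvalue
statistics (testing $h=0$) should exceed their critical values, while
the second (testing $h=1$) should not.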