Granger causality






Granger causality (格兰杰因果关系)


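To discuss causality, we must first define it. This entry sets aside causality in the philosophical sense discussed by Galileo, Hume and others, and presents only its statistical definition. Statistically, causality shows up through probabilities or distribution functions: holding the occurrence of every other event in the universe fixed, if whether event A occurs affects the probability that another event B occurs (or, when the events define random variables, the distribution function), and the two events are ordered in time (A before B), then A is said to be a cause of B.

Early on, causality was defined through probability alone: if P(B|A) > P(B), then A is a cause of B (Suppes, 1970). Stated this way, the definition has two major defects. First, it takes no account of temporal order. Second, it is symmetric: by the definition of conditional probability, P(B|A) > P(B) is equivalent to P(AB) > P(A)P(B), which immediately gives P(A|B) > P(A), so B would equally be a cause of A and the definition contradicts itself. (The ">" is also arbitrary, since "<" would serve just as well. The definition was later revised to use "≠", but the same argument shows that the revised version does not hold up either.)

In fact, the definition has an even greater defect, which concerns the information set. Strictly speaking, establishing causality requires the complete information set: to conclude that "A is a cause of B", one must take every event in the universe into account, or misjudgments will often follow. The clearest example is a third event C that is a common cause of A and B. In the extreme case where P(A|C) = 1 and P(B|C) = 1, we have P(B|AC) = P(B|C), so once C is known, whether A occurs no longer has any bearing on B.

Granger (1980) therefore proposed a definition of causality built on a complete information set and on temporal order. The criterion used to judge it has evolved step by step.

The original criterion is based on (conditional) distribution functions. Here Ω_n denotes all the information in the universe up to period n, Y_n denotes all values Y_t (t = 1, …, n) up to period n, X_{n+1} is the value of X in period n+1, and Ω_n − Y_n is all information other than Y. Y causes X if

    F(X_{n+1} | Ω_n) ≠ F(X_{n+1} | Ω_n − Y_n)    (1)

Because the universal information set can never be obtained, the next best option is to replace Ω with an available information set J:

    F(X_{n+1} | J_n) ≠ F(X_{n+1} | J_n − Y_n)    (2)

Later still, testing whether two distribution functions are equal was judged too complicated, so the criterion was weakened again to testing only whether the expectations are equal. This is called causality in mean, whereas causality tested through distribution functions, as above, is called full causality:

    E(X_{n+1} | J_n) ≠ E(X_{n+1} | J_n − Y_n)    (3)

Another approach tests whether including Y reduces the prediction error for X_{n+1}:

    σ²(X_{n+1} | J_n) < σ²(X_{n+1} | J_n − Y_n)    (4)

This last approach is close to the Granger causality test in most common use. In statistics the prediction error is usually measured by the residual sum of squares, so one typically builds a regression equation from X and Y and uses a hypothesis test (an F-test) to check whether the coefficients on Y are zero.

It is clear that the Granger causality test used in practice has drifted far from the original definition: many conditions have been dropped (and the use of regression analysis and the F-test brings in several additional assumptions), which may well lead to spurious causal relationships. When using this method, therefore, be sure to check its preconditions and satisfy them as far as possible. Moreover, statistical methods are not omnipotent; judging a phenomenon usually calls for observation from several angles. As the saying goes, "listen to all sides and you will be enlightened; heed only one side and you will be left in the dark." There may be only one truth, but reaching it still depends on sound scientific methods.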
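As an illustration of the regression-based test described above, one common way to write it out is sketched below. The lag length p, the coefficient names a_i and b_i, the error term ε_t, and the number of usable observations T are notation assumed for this sketch; they do not appear in the original text.

```latex
% Unrestricted model: X is predicted from its own lags and from lags of Y
X_t = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{p} b_i Y_{t-i} + \varepsilon_t

% Restricted model: the lags of Y are dropped
X_t = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon_t

% Null hypothesis: Y does not Granger-cause X
H_0 : b_1 = b_2 = \dots = b_p = 0

% F statistic built from the residual sums of squares of the two models
F = \frac{(RSS_r - RSS_u) / p}{RSS_u / (T - 2p - 1)}
```

Here RSS_r and RSS_u are the residual sums of squares of the restricted and unrestricted regressions. Under H_0 and the usual regression assumptions, F is compared with an F distribution with p and T − 2p − 1 degrees of freedom, and a large value is evidence against H_0.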

English translation


The Granger causality test is a technique for determining whether one time series is useful in forecasting another. Ordinarily, regressions reflect "mere" correlations, but Clive Granger, who won a Nobel Prize in Economics, argued that a certain set of tests can be interpreted as revealing something about causality.

A time series X is said to Granger-cause Y if it can be shown, usually through a series of F-tests on lagged values of X (and with lagged values of Y also known), that those X values provide statistically significant information about future values of Y.

The test works by first regressing ΔY on lagged values of ΔY. Once the appropriate lag interval for ΔY has been shown to be significant (by t-statistic or p-value), lagged values of ΔX are added to the regression, provided that they (1) are significant in and of themselves and (2) add explanatory power to the model. This can be repeated for multiple ΔXs (each ΔX being tested independently of the other ΔXs, but in conjunction with the selected lags of ΔY). More than one lag of a variable can be included in the final regression model, provided each is statistically significant and adds explanatory power.
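The procedure above can be sketched in a few lines of Python. This is a simplified illustration, not a reference implementation: it uses a fixed number of lags instead of the stepwise lag selection described above, fits the regressions with a small hand-written least-squares solver so that only the standard library is needed, and reports the F statistic without a p-value. The helper names (`solve`, `rss`, `granger_f`) and the simulated data are assumptions made for the example.

```python
import random
from itertools import accumulate


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit (via the normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(x, y, lags):
    """F statistic for H0: lagged dX adds nothing to predicting dY.

    Returns (F, numerator df, denominator df).
    """
    dx = [b - a for a, b in zip(x, x[1:])]  # first differences
    dy = [b - a for a, b in zip(y, y[1:])]
    times = range(lags, len(dy))
    target = [dy[t] for t in times]
    # Restricted model: intercept plus lagged dY only.
    own = [[1.0] + [dy[t - i] for i in range(1, lags + 1)] for t in times]
    # Unrestricted model: the same regressors plus lagged dX.
    full = [row + [dx[t - i] for i in range(1, lags + 1)]
            for row, t in zip(own, times)]
    rss_r, rss_u = rss(own, target), rss(full, target)
    df_u = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / df_u), lags, df_u


# Simulated data (assumed for illustration): changes in y follow the
# previous period's change in x, but not the other way around.
rng = random.Random(0)
dx = [rng.gauss(0, 1) for _ in range(500)]
dy = [rng.gauss(0, 0.5)] + [0.8 * dx[t - 1] + rng.gauss(0, 0.5)
                            for t in range(1, 500)]
x, y = list(accumulate(dx)), list(accumulate(dy))

f_xy, df1, df2 = granger_f(x, y, lags=2)  # does x Granger-cause y?
f_yx, _, _ = granger_f(y, x, lags=2)      # does y Granger-cause x?
print(f"x -> y: F({df1}, {df2}) = {f_xy:.2f}")
print(f"y -> x: F({df1}, {df2}) = {f_yx:.2f}")
```

In this simulation the F statistic for "x Granger-causes y" should be far larger than the one for the reverse direction. For real data, a numerical library would be a better choice than the hand-written solver, and the statistic would be converted to a p-value using the F distribution.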

The researcher is often looking for a clear story, such as X Granger-causes Y but not the other way around. In practice, however, results are often hard to interpret: for instance, neither variable Granger-causes the other, or each of the two variables Granger-causes the other.

Despite its name, Granger causality does not imply true causality. If both X and Y are driven by a common third process with different lags, their measure of Granger causality could still be statistically significant, yet manipulating one process would not change the other. Indeed, the Granger test is designed to handle pairs of variables, and may produce misleading results when the true relationship involves three or more variables. A similar test involving more variables can be applied with vector autoregression. Hacker and Hatemi-J (2006) developed a newer method for testing Granger causality that is not sensitive to departures from normality in the error term. This method is especially useful in financial economics, since many financial variables are not normally distributed.
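The common-driver caveat above can be illustrated with a small simulation. In this sketch, a hidden process z drives x with a lag of one period and y with a lag of two, while x never enters the equation that generates y. The one-lag test here is a deliberately minimal version of the Granger regression, computed with the Frisch-Waugh-Lovell residual trick instead of a general solver. The function names, noise levels and sample size are all assumptions chosen for the example.

```python
import random


def residuals(target, regressor):
    """Residuals from a simple OLS regression of target on regressor."""
    n = len(target)
    mt, mr = sum(target) / n, sum(regressor) / n
    sxy = sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
    sxx = sum((r - mr) ** 2 for r in regressor)
    slope = sxy / sxx
    return [(t - mt) - slope * (r - mr) for r, t in zip(regressor, target)]


def granger_f_lag1(cause, effect):
    """One-lag F statistic: does cause[t-1] help predict effect[t]
    once effect[t-1] is already in the regression?"""
    now, own_lag, other_lag = effect[1:], effect[:-1], cause[:-1]
    e_now = residuals(now, own_lag)          # restricted-model residuals
    e_other = residuals(other_lag, own_lag)  # cause lag, net of own lag
    rss_r = sum(e * e for e in e_now)
    # Regressing residuals on residuals reproduces the residuals of the
    # full regression on both lags (Frisch-Waugh-Lovell theorem).
    rss_u = sum(e * e for e in residuals(e_now, e_other))
    # One restriction; three parameters in the unrestricted model.
    return (rss_r - rss_u) / (rss_u / (len(now) - 3))


rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]  # hidden common driver
x = [z[t - 1] + rng.gauss(0, 0.5) for t in range(2, n)]  # follows z by 1
y = [z[t - 2] + rng.gauss(0, 0.5) for t in range(2, n)]  # follows z by 2

f_xy = granger_f_lag1(x, y)  # x appears to Granger-cause y
f_yx = granger_f_lag1(y, x)  # y carries no advance information about x
print(f"x -> y: F = {f_xy:.1f}")
print(f"y -> x: F = {f_yx:.1f}")
```

The test for "x Granger-causes y" comes out strongly significant even though changing x by hand would leave y untouched, because x merely reveals z one period earlier than y does.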

This technique has also been adapted for use in neuroscience.

Here is an example of the output of the function grangertest() in the lmtest package for R:

    Granger causality test

    Model 1: fii ~ Lags(fii, 1:5) + Lags(rM, 1:5)
    Model 2: fii ~ Lags(fii, 1:5)
      Res.Df Df      F  Pr(>F)
    1    629
    2    634  5 2.5115 0.02896 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Granger causality test

    Model 1: rM ~ Lags(rM, 1:5) + Lags(fii, 1:5)
    Model 2: rM ~ Lags(rM, 1:5)
      Res.Df Df      F Pr(>F)
    1    629
    2    634  5 1.1804 0.3172

The first pair of models tests whether lagged rM can be removed from the regression explaining FII by lagged FII. It cannot (p = 0.02896). The second pair finds that lagged FII can be removed from the model explaining rM by lagged rM (p = 0.3172). From this, we conclude that rM Granger-causes FII but not the other way around.