Friday, December 4, 2015

The linearity of variance – stats.stackexchange.com


I think the following two formulas are true:


$$\mathrm{Var}(aX)=a^2\,\mathrm{Var}(X),$$ where $a$ is a constant, and $$\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$$ if $X$ and $Y$ are independent.


However, I am not sure what is wrong with the below:


$$\mathrm{Var}(2X) = \mathrm{Var}(X+X) = \mathrm{Var}(X) + \mathrm{Var}(X),$$ which does not equal $2^2\,\mathrm{Var}(X)$, i.e. $4\,\mathrm{Var}(X)$.


If $X$ is a sample taken from a population, I think we can always assume each $X$ to be independent of the other $X$s.


So where does my reasoning go wrong?






The problem with your line of reasoning is


“I think we can always assume $X$ to be independent from the other $X$s.”



$X$ is not independent of $X$. The symbol $X$ is being used to refer to the same random variable here. Once you know the value of the first $X$ to appear in your formula, this also fixes the value of the second $X$ to appear.


If two variables $X$ and $Y$ are independent then $\Pr(X=a \mid Y=b)$ is the same as $\Pr(X=a)$: knowing the value of $Y$ does not give us any additional information about the value of $X$. But $\Pr(X=a \mid X=b)$ is $1$ if $a=b$ and $0$ otherwise: knowing the value of $X$ gives you complete information about the value of $X$.


Another way of seeing things is that if two variables are independent then they have zero correlation (though zero correlation does not imply independence!), but $X$ is perfectly correlated with itself, $\mathrm{Corr}(X,X)=1$, so $X$ can't be independent of itself. Note that since the covariance is given by $\mathrm{Cov}(X,Y)=\mathrm{Corr}(X,Y)\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$, we have
$$\mathrm{Cov}(X,X)=1 \cdot \sqrt{\mathrm{Var}(X)^2}=\mathrm{Var}(X)$$


The more general formula for the variance of a sum of two random variables is


$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$$


In particular, $\mathrm{Cov}(X,X) = \mathrm{Var}(X)$, so


$$\mathrm{Var}(X+X) = \mathrm{Var}(X) + \mathrm{Var}(X) + 2\,\mathrm{Var}(X) = 4\,\mathrm{Var}(X)$$


which is the same as you would have deduced from applying the rule


$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X) \implies \mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$$
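You can also see this numerically. Below is a quick simulation sketch (using NumPy; the seed and sample size are arbitrary): adding a sample to itself quadruples the variance, while adding an *independent* sample with the same distribution only doubles it.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
n = 1_000_000

x = rng.normal(size=n)  # one sample, used twice: X + X is the same draw twice
y = rng.normal(size=n)  # an independent sample with the same distribution

print((x + x).var() / x.var())  # ratio is 4: Var(2X) = 4 Var(X)
print((x + y).var() / x.var())  # ratio is close to 2: Var(X+Y) = Var(X) + Var(Y)
```

The second ratio is only approximately 2 because the sample covariance of two independent draws is near zero but not exactly zero.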



If you are interested in linearity, then you might be interested in the bilinearity of covariance. For random variables $W$, $X$, $Y$ and $Z$ (whether dependent or independent) and constants $a$, $b$, $c$ and $d$ we have


$$\mathrm{Cov}(aW + bX, Y) = a\,\mathrm{Cov}(W,Y) + b\,\mathrm{Cov}(X,Y)$$


$$\mathrm{Cov}(X, cY + dZ) = c\,\mathrm{Cov}(X,Y) + d\,\mathrm{Cov}(X,Z)$$


and overall,


$$\mathrm{Cov}(aW + bX, cY + dZ) = ac\,\mathrm{Cov}(W,Y) + ad\,\mathrm{Cov}(W,Z) + bc\,\mathrm{Cov}(X,Y) + bd\,\mathrm{Cov}(X,Z)$$
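Bilinearity is easy to check empirically, because the *sample* covariance satisfies the same identity exactly (up to floating-point rounding). A sketch, with arbitrary constants and variables:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
w, x, y, z = rng.normal(size=(4, 10_000))
a, b, c, d = 2.0, -3.0, 0.5, 1.5  # arbitrary constants

def cov(u, v):
    # sample covariance of two equal-length arrays
    return np.cov(u, v)[0, 1]

lhs = cov(a * w + b * x, c * y + d * z)
rhs = (a * c * cov(w, y) + a * d * cov(w, z)
       + b * c * cov(x, y) + b * d * cov(x, z))
print(np.isclose(lhs, rhs))  # True: the four-term expansion matches
```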


You can then use this to prove the (non-linear) results for variance that you wrote in your post:


$$\mathrm{Var}(aX) = \mathrm{Cov}(aX, aX) = a^2\,\mathrm{Cov}(X,X) = a^2\,\mathrm{Var}(X)$$


$$\begin{align} \mathrm{Var}(aX + bY) &= \mathrm{Cov}(aX + bY, aX + bY) \\ &= a^2\,\mathrm{Cov}(X,X) + ab\,\mathrm{Cov}(X,Y) + ba\,\mathrm{Cov}(X,Y) + b^2\,\mathrm{Cov}(Y,Y) \\ &= a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y) \end{align} $$


The latter gives, as a special case when $a=b=1$,


$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$$


When $X$ and $Y$ are uncorrelated (which includes the case where they are independent), this reduces to $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. So if you want to manipulate variances in a "linear" way (which is often a nice way to work algebraically), then work with the covariances instead, and exploit their bilinearity.
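As a final sanity check, the general formula also holds when $X$ and $Y$ are deliberately *correlated*. A sketch, inducing positive correlation by giving both variables an arbitrary shared component:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 50_000
common = rng.normal(size=n)
x = common + rng.normal(size=n)  # X and Y share `common`,
y = common + rng.normal(size=n)  # so they are positively correlated

c = np.cov(x, y)  # 2x2 sample covariance matrix (ddof=1)
lhs = np.var(x + y, ddof=1)
rhs = c[0, 0] + c[1, 1] + 2 * c[0, 1]
print(np.isclose(lhs, rhs))  # True: Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
```

Here the covariance term is essential: dropping it would underestimate $\mathrm{Var}(X+Y)$ by roughly a third.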









    

Variance isn't linear; your first statement shows this (if it were, you'd have $\mathrm{Var}(aX) = a\,\mathrm{Var}(X)$). Covariance, on the other hand, is bilinear. – Batman 1 hour ago








    

Thank you so much! I got it! – lanselibai 1 hour ago

