This tutorial (TD) is to be written in R Markdown.
A random variable \(X\) is a variable whose value is determined after the realization of a phenomenon, experiment or random event.
For a discrete random variable \(X\) taking values in a set \(\xi\), the probability mass function (p.m.f.) is given by the probabilities \[P(X = k),\quad k\in\xi.\]
Classical discrete distributions:
For a continuous random variable \(X\), the probability density function (p.d.f.) is the function \(f\) such that the area under its curve between \(a\) and \(b\) gives the probability:
\[ P(a\leq X\leq b)= \int_{a}^b f(x)dx. \]
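As a quick numerical check of this formula, the sketch below integrates the standard Gaussian density between two bounds and compares the area with the probability returned by pnorm. The bounds \(a=-1\) and \(b=1\) are an illustrative choice, not part of the exercises.

```r
# Illustrative bounds (an assumption, not taken from the exercise sheet)
a <- -1
b <- 1

# Area under the standard Gaussian density between a and b
area <- integrate(dnorm, lower = a, upper = b)$value

# Same probability computed from the Gaussian distribution function
prob <- pnorm(b) - pnorm(a)

c(area = area, prob = prob)
```

Both numbers should agree up to the numerical integration error.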
Classical continuous distributions:
The cumulative distribution function (c.d.f.) is
\[ F(x) = P(X\leq x) \]
The quantile function is \[ Q(\alpha) = F^{-1}(\alpha),\quad \text{that is, the value of } x \text{ such that } P(X\leq x) = \alpha. \] N.B.: \(Q(0.5)\) is the median of \(X\), \(Q(0.25)\) is the first quartile and \(Q(0.75)\) is the third quartile.
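To illustrate the link between quantiles and the c.d.f., here is a small sketch using the exponential distribution with rate 2 (an arbitrary choice made for this example). It computes the three quartiles with qexp and checks that applying pexp gives back the probabilities.

```r
# Probabilities for the first quartile, the median and the third quartile
alpha <- c(0.25, 0.5, 0.75)

# Quantiles of the exponential distribution with rate 2 (illustrative choice)
q <- qexp(alpha, rate = 2)

# Applying the c.d.f. to the quantiles gives back alpha
pexp(q, rate = 2)

# Closed form for comparison: Q(alpha) = -log(1 - alpha) / rate
-log(1 - alpha) / 2
```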
For a continuous random variable \(X\) taking values in \(\mathbb R\) with density \(f\), the expectation of \(X\) is
\[ \mathbb E X = \int_{\mathbb R} xf(x)\, dx. \] For a discrete variable taking values in a discrete space \(\xi\), the expectation of \(X\) is \[ \mathbb E X = \sum_{k\in\xi} k P(X=k). \] The variance is \[ \operatorname{var}(X) = \mathbb E\left((X-\mathbb E(X))^2\right) = \mathbb E(X^2)-(\mathbb E(X))^2. \] The standard deviation is \(\operatorname{sd}(X)=\sqrt{\operatorname{var}(X)}\). Finally, for two random variables \(X\) and \(Y\), the covariance between \(X\) and \(Y\) is \[ \operatorname{cov}(X,Y)=\mathbb E\left((X-\mathbb E(X))(Y-\mathbb E(Y))\right) = \mathbb E(XY) - \mathbb E(X)\mathbb E(Y). \]
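The definitions of the expectation and the variance can be checked numerically. The sketch below does so for an exponential distribution with rate 2 (an illustrative choice), using integrate on the support \([0,\infty)\). The results should be close to the known values \(1/2\) and \(1/4\).

```r
rate <- 2  # illustrative rate parameter (assumption for this example)

# E(X) and E(X^2) as integrals against the density on [0, Inf)
EX  <- integrate(function(x) x   * dexp(x, rate), lower = 0, upper = Inf)$value
EX2 <- integrate(function(x) x^2 * dexp(x, rate), lower = 0, upper = Inf)$value

# var(X) = E(X^2) - (E(X))^2
VX <- EX2 - EX^2

c(expectation = EX, variance = VX, sd = sqrt(VX))
```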
For an independent and identically distributed (i.i.d.) sample \(y_1,\ldots,y_n\), the expectation and the variance are estimated by the empirical mean (or average) and the empirical variance: \[ \bar y=\operatorname{mean}(y_1,\ldots,y_n)=\frac{1}{n}\sum_{i=1}^n y_i,\quad S_y^2 =\operatorname{var}(y_1,\ldots,y_n) = \frac{1}{n}\sum_{i=1}^n(y_i - \bar y)^2 = \overline{y^2} - (\bar y)^2. \] The standard deviation (sd) of the sample is \(\operatorname{sd}(y_1,\ldots,y_n)=\sqrt{S^2_y}\).
In the same way, the empirical covariance between the samples \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ C_{x,y}=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i - \bar y) = \overline{xy} - \bar x\bar y. \]
Finally, the (Pearson’s) correlation between \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ r_{x,y}=\frac{C_{x,y}}{\sqrt{S_x^2S_y^2}}. \]
Pearson’s correlation lies in \([-1,1]\): it equals \(\pm 1\) in case of perfect linear dependence and 0 when there is no linear dependence.
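In R, use the functions mean, var, sd, cov and cor. Note that var, sd and cov divide by \(n-1\) rather than \(n\), so their results differ from the formulas above by a factor \(n/(n-1)\) (or its square root for sd).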
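The sketch below evaluates the empirical formulas on a toy sample and compares them with the built-in functions after rescaling by \((n-1)/n\). The seed, the sample size and the way the toy data are built are arbitrary choices made for illustration.

```r
set.seed(123)               # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- 2 * x + rnorm(50)      # toy sample, linearly related to x
n <- length(y)

# Empirical quantities with the 1/n normalization
S2_x <- mean((x - mean(x))^2)
S2_y <- mean((y - mean(y))^2)
C_xy <- mean((x - mean(x)) * (y - mean(y)))
r_xy <- C_xy / sqrt(S2_x * S2_y)

# var() and cov() divide by n - 1: rescale before comparing
all.equal(S2_y, var(y) * (n - 1) / n)
all.equal(C_xy, cov(x, y) * (n - 1) / n)

# The normalization cancels in the correlation
all.equal(r_xy, cor(x, y))
```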
In the following exercises, we will use the R functions:
# density and c.d.f. of the standard Gaussian distribution
plot(function(x) dnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="Density of the Gaussian distribution")
plot(function(x) pnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="c.d.f. of the Gaussian distribution")
# p.m.f. and c.d.f. of the Poisson distribution with parameter 4
barplot(dpois(0:10,4),ylab="Probability",xlab="k",
space=2,ylim=c(0,0.2), names.arg=0:10, main="p.m.f. of the Poisson distribution with parameter 4")
plot(function(x) ppois(x,4), xlim=c(-1,10), ylab="", main="c.d.f. of the Poisson distribution with parameter 4")
First, fix the seed by using the following command
set.seed(1)
n=10000
#?rexp
#?rnorm
#?rbinom
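X1<-rexp(n,2) ## exponential sample with rate 2
X2<-rnorm(n,3,2) ## Gaussian sample with mean 3 and standard deviation 2
X3<-rbinom(n,1,0.25) ## Bernoulli sample with success probability 0.25
X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) ## gather the three samples in a data frame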
head(X)
##          exp    Gauss Bernouilli
## 1 0.37759092 4.869048          0
## 2 0.59082139 3.580626          0
## 3 0.07285336 3.387249          0
## 4 0.06989763 5.147031          1
## 5 0.21803431 2.286873          0
## 6 1.44748427 3.790633          0
apply(X,2,mean)
##        exp      Gauss Bernouilli
##  0.4991806  3.0078905  0.2482000
apply(X,2,var)
##        exp      Gauss Bernouilli
##  0.2578852  4.0025920  0.1866154
cov(X)
##                    exp        Gauss   Bernouilli
## exp         0.25788516 -0.023411310 -0.006222800
## Gauss      -0.02341131  4.002592022 -0.005486475
## Bernouilli -0.00622280 -0.005486475  0.186615422
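hist(X1,breaks=30,freq=FALSE) ## histogram of the exponential sample on the density scale
plot(function(x) dexp(x,2),add=TRUE,col="red",xlim=c(0,5)) ## overlay the theoretical exponential density
hist(X2,breaks=30,freq=FALSE) ## histogram of the Gaussian sample on the density scale
plot(function(x) dnorm(x,3,2),add=TRUE,col="red",xlim=c(-10,10)) ## overlay the theoretical Gaussian density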
table(X3)/n
## X3
##      0      1
## 0.7518 0.2482
pie(table(X3))
Let \(U\) be a random variable uniformly distributed on \([0,1]\). Then for any cumulative distribution function \(F\), the random variable \[Z=F^{-1}(U)\quad \text{has distribution } F.\] Using this result, simulate the same exponential and Gaussian samples and answer the same questions as in exercise 1. Use the R functions runif, qexp, qnorm and qbinom.
set.seed(2)
n=10000
U1=runif(n)
#?qexp
##to complete
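One possible way to complete this exercise is sketched below. It assumes the same parameters as in exercise 1 (rate 2, mean 3 and standard deviation 2, success probability 0.25) and draws a separate uniform sample for each distribution, so that the three simulated samples are independent. The names U2 and U3 are introduced here for illustration.

```r
set.seed(2)
n <- 10000

# One independent uniform sample per target distribution
U1 <- runif(n)
U2 <- runif(n)   # hypothetical name, not in the original skeleton
U3 <- runif(n)   # hypothetical name, not in the original skeleton

# Inverse transform: apply the quantile function F^{-1} to the uniforms
X1 <- qexp(U1, rate = 2)                 # exponential with rate 2
X2 <- qnorm(U2, mean = 3, sd = 2)        # Gaussian with mean 3 and sd 2
X3 <- qbinom(U3, size = 1, prob = 0.25)  # Bernoulli with parameter 0.25

X <- data.frame(exp = X1, Gauss = X2, Bernouilli = X3)
apply(X, 2, mean)   # to compare with 0.5, 3 and 0.25
apply(X, 2, var)    # to compare with 0.25, 4 and 0.1875
```

The histograms and density overlays of exercise 1 can then be reused unchanged on X1 and X2.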
Let \(X\) be a continuous random variable with c.d.f. \(F\). Then the random variable \(Z=F(X)\) is uniformly distributed on \([0,1]\).
Using this result, simulate a uniform sample on \([0,1]\), first from a Gaussian sample, then from an exponential sample.
#to complete
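A minimal sketch of a possible answer follows. The parameters of the Gaussian and exponential distributions, as well as the seed, are assumptions made for the example; any parameters work as long as the same ones are used in the simulation and in the c.d.f.

```r
set.seed(3)   # arbitrary seed
n <- 10000

# Gaussian sample transformed by its own c.d.f.
G <- rnorm(n, mean = 3, sd = 2)
U_gauss <- pnorm(G, mean = 3, sd = 2)

# Exponential sample transformed by its own c.d.f.
E <- rexp(n, rate = 2)
U_exp <- pexp(E, rate = 2)

# Both histograms should be roughly flat at height 1
hist(U_gauss, breaks = 30, freq = FALSE, main = "F(X) for a Gaussian X")
abline(h = 1, col = "red")   # density of the uniform distribution on [0,1]
hist(U_exp, breaks = 30, freq = FALSE, main = "F(X) for an exponential X")
abline(h = 1, col = "red")
```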
Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mathbb E(X)=\mu\) and variance \(\operatorname{Var}(X)=\sigma^2\). Then,
\[
\sqrt{n}(\bar X_n - \mu) \rightarrow \mathcal N(0,\sigma^2)
\quad\text{as } n\rightarrow\infty.
\] We say that the rescaled mean \(\sqrt{n}(\bar X_n-\mu)\) converges in distribution to a Gaussian distribution (central limit theorem).
Using a for loop, simulate \(M=10^4\) samples of size \(n=1000\) from a Poisson distribution with parameter \(\lambda=5\) and verify the theorem. Plot the histogram of the statistic \(\sqrt{n}(\bar X_n-\mu)\) together with the density of the limiting Gaussian distribution.
M=10^4;n=10^3
Z=rep(0,M); #initialization
for(m in 1:M){
##complete the loop
## Z[m]<-
}
#hist(Z,freq=FALSE)
#plot(function(x) dnorm(x,0,..),add=TRUE,col="red",xlim=c(-6,6) )
Note that for a Poisson distribution the expectation and the variance are both equal to \(\lambda=5\), so the standard deviation is \(\sqrt{\lambda}=\sqrt{5}\approx 2.24\).
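The loop could be completed along the following lines. This is only a sketch: the seed and the variable name lambda are choices made here, and the statistic stored in Z is \(\sqrt{n}(\bar X_n-\lambda)\), which the theorem compares with a centered Gaussian of variance \(\lambda\).

```r
set.seed(4)   # arbitrary seed
M <- 10^4
n <- 10^3
lambda <- 5   # Poisson parameter: expectation and variance are both lambda

Z <- numeric(M)   # initialization
for (m in 1:M) {
  x <- rpois(n, lambda)                  # m-th sample of size n
  Z[m] <- sqrt(n) * (mean(x) - lambda)   # centered and rescaled mean
}

hist(Z, breaks = 40, freq = FALSE, main = "CLT for Poisson samples")
# Limiting density: Gaussian with mean 0 and standard deviation sqrt(lambda)
curve(dnorm(x, mean = 0, sd = sqrt(lambda)), add = TRUE, col = "red")
```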
Now try with an exponential distribution with rate parameter \(\lambda=5\). Note that for an exponential distribution the expectation is \(1/\lambda=1/5\) and the variance is \(1/\lambda^2=1/25\) (so the standard deviation is \(1/5\)).
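For the exponential case, here is a sketch that replaces the for loop by a vectorized computation: all samples are stored as the rows of a matrix and the means are obtained with rowMeans. This is an alternative to the loop requested in the exercise, not the same method, and it needs enough memory for \(M\times n\) numbers. The seed and the name rate are assumptions of the example.

```r
set.seed(5)   # arbitrary seed
M <- 10^4
n <- 10^3
rate <- 5     # exponential rate: expectation 1/rate, sd 1/rate

# One simulated sample per row (M rows of length n)
samples <- matrix(rexp(M * n, rate), nrow = M, ncol = n)

# Centered and rescaled mean of each sample
Z <- sqrt(n) * (rowMeans(samples) - 1 / rate)

hist(Z, breaks = 40, freq = FALSE, main = "CLT for exponential samples")
# Limiting density: Gaussian with mean 0 and standard deviation 1/rate
curve(dnorm(x, mean = 0, sd = 1 / rate), add = TRUE, col = "red")
```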
Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mu\). Then \[P\left(\bar X_n \rightarrow \mu \text{ as } n\rightarrow\infty\right)=1.\] We say that \(\bar X_n\) converges almost surely (a.s.) to \(\mu\) (strong law of large numbers).
Simulate a sample of size \(n=10^4\) from an exponential distribution with rate parameter \(\lambda=2\). For \(k\) varying from 1 to \(n\), compute the mean \(Z_k=\bar X_k\) of the first \(k\) observations. Plot the sequence \(Z_k\) and add the horizontal line \(y=0.5\) (use abline(h=0.5)).
#TO DO
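A possible solution is sketched below. It relies on cumsum to obtain all the running means at once, since \(Z_k\) is the sum of the first \(k\) observations divided by \(k\). The seed is an arbitrary choice.

```r
set.seed(6)   # arbitrary seed
n <- 10^4
X <- rexp(n, rate = 2)   # exponential sample, expectation 1/2

# Running means: Z[k] is the mean of X[1], ..., X[k]
Z <- cumsum(X) / seq_len(n)

plot(Z, type = "l", xlab = "k", ylab = "Z_k",
     main = "Running mean of an exponential sample")
abline(h = 0.5, col = "red")   # theoretical expectation
```

The curve should fluctuate for small \(k\) and settle around the red line as \(k\) grows.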