This practical session (TD) is to be written in R Markdown.

1 Probability reminder

A random variable \(X\) is a variable whose value is determined by the outcome of a random phenomenon, experiment or event.

1.1 Distribution functions

For a discrete random variable \(X\) taking values in a set \(\xi\), the probability mass function (p.m.f.) is given by the probabilities \[P(X = k), \quad k\in\xi.\]

Classical discrete distributions:

  • Uniform distribution on \(\{a,\ldots,b\}\) (e.g. rolling a die)
  • Bernoulli distribution (e.g. flipping a coin)
  • Binomial distribution (e.g. counting heads when tossing a coin several times)
  • Poisson distribution (counting law, e.g. number of claims in insurance)
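In R, the p.m.f. of these discrete laws is returned by the `d*` functions (`dbinom`, `dpois`, ...). The sketch below evaluates a few probabilities; the Bernoulli law is obtained as a binomial with `size = 1`, and the fair die is encoded by hand since base R has no dedicated discrete uniform p.m.f. (`die_pmf` is a hypothetical helper name).

```r
# P(X = 2) for X ~ Binomial(3, 1/2): 3/8
dbinom(2, size = 3, prob = 0.5)
# P(X = k), k = 0..3, for X ~ Poisson(4)
dpois(0:3, lambda = 4)
# Bernoulli(p) is the binomial distribution with size = 1: P(X = 1) = p
dbinom(1, size = 1, prob = 0.25)
# Fair die: uniform on {1, ..., 6}, each value has probability 1/6
die_pmf <- rep(1/6, 6)
sum(die_pmf)                    # a p.m.f. sums to 1
sum(dpois(0:100, lambda = 4))   # numerically 1 as well (tail beyond 100 is negligible)
```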

For a continuous random variable \(X\), the probability density function (p.d.f.) is the function \(f\) such that the area under its curve between \(a\) and \(b\) gives the probability:

\[ P(a\leq X\leq b)= \int_{a}^b f(x)dx. \]
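The formula can be checked numerically with `integrate`, here for \(X\sim\mathcal N(0,1)\), \(a=-1\), \(b=1\) (values chosen only for illustration). The same probability is also \(F(b)-F(a)\) with the c.d.f. `pnorm`, and a density integrates to 1 over its whole support.

```r
# P(-1 <= X <= 1) as the area under the N(0, 1) density
area <- integrate(dnorm, lower = -1, upper = 1)$value
area                           # about 0.6827
# Same probability from the c.d.f.: F(1) - F(-1)
pnorm(1) - pnorm(-1)
# The Exp(rate = 2) density integrates to 1 on [0, Inf)
total <- integrate(dexp, lower = 0, upper = Inf, rate = 2)$value
total
```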

Classical continuous distributions:

  • Uniform distribution on \([a,b]\) (e.g. drawing a real number at random in the interval \([a,b]\))
  • Gaussian distribution (e.g. heights of students in a class)
  • Exponential distribution (waiting times between events, e.g. time between arrivals at a bus stop)

The cumulative distribution function (c.d.f.) is

\[ F(x) = P(X\leq x). \]

The quantile function is

\[ Q(\alpha) = F^{-1}(\alpha), \]

that is, the value of \(x\) such that \(P(X\leq x) = \alpha\) (for a discrete distribution, where \(F\) is a step function, the smallest \(x\) such that \(P(X\leq x)\geq\alpha\)).

N.B.: \(Q(0.5)\) is the median of \(X\), \(Q(0.25)\) is the first quartile and \(Q(0.75)\) is the third quartile.
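In R, quantile functions are the `q*` functions. A short sketch: quartiles of \(\mathcal N(0,1)\), the identity \(F(Q(\alpha))=\alpha\) for a continuous law, the median \(\log(2)/\lambda\) of an exponential law, and the "smallest \(k\)" rule in the discrete Poisson case (parameter values chosen for illustration).

```r
# First quartile, median and third quartile of N(0, 1)
qnorm(c(0.25, 0.5, 0.75))       # about -0.674, 0, 0.674
# Continuous case: Q is the inverse of F, so F(Q(alpha)) = alpha
pnorm(qnorm(0.9))
# Median of Exp(rate = 2) is log(2) / 2
qexp(0.5, rate = 2)
log(2) / 2
# Discrete case: smallest k with P(X <= k) >= 0.5 for X ~ Poisson(4)
qpois(0.5, lambda = 4)
ppois(3:4, lambda = 4)          # P(X <= 3) < 0.5 <= P(X <= 4)
```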

1.2 Expectation, variance, covariance

For a continuous random variable taking values in \(\mathbb R\) with density \(f\), the expectation of \(X\) is

\[ \mathbb E X = \int_{\mathbb R} xf(x)\, dx. \]

For a discrete random variable taking values in a discrete set \(\xi\), the expectation of \(X\) is

\[ \mathbb E X = \sum_{k\in\xi} k P(X=k). \]

The variance is

\[ var(X) = \mathbb E\left((X-\mathbb E(X))^2\right) = \mathbb E(X^2)-(\mathbb E(X))^2, \]

and the standard deviation is \(sd(X)=\sqrt{var(X)}\). Finally, for two random variables \(X\) and \(Y\), the covariance between \(X\) and \(Y\) is

\[ cov(X,Y)=\mathbb E\left((X-\mathbb E(X))(Y-\mathbb E(Y))\right) = \mathbb E(XY) - \mathbb E(X)\mathbb E(Y). \]
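These definitions can be evaluated numerically. The sketch below assumes \(X\sim\mathcal E(2)\) for the continuous case (expected values \(\mathbb E X=1/2\), \(var(X)=1/4\)) and \(X\sim\mathcal P(4)\) for the discrete case, where the infinite sum is truncated at \(k=100\), an approximation whose neglected tail is tiny.

```r
# Continuous case, X ~ Exp(rate = 2)
EX  <- integrate(function(x) x   * dexp(x, rate = 2), 0, Inf)$value  # E(X)
EX2 <- integrate(function(x) x^2 * dexp(x, rate = 2), 0, Inf)$value  # E(X^2)
c(mean = EX, var = EX2 - EX^2)   # var(X) = E(X^2) - (E X)^2
# Discrete case, X ~ Poisson(4): E X = sum_k k P(X = k), truncated at k = 100
k <- 0:100
EX_pois <- sum(k * dpois(k, lambda = 4))
EX_pois
```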

1.3 On a sample

For an independent and identically distributed (i.i.d.) sample \(y_1,\ldots,y_n\), the expectation and the variance are estimated by the mean value (or average, empirical mean) and the empirical variance: \[ \bar y=mean(y_1,\ldots,y_n)=\frac{1}{n}\sum_{i=1}^n y_i,\quad S_y^2 =var(y_1,\ldots,y_n) = \frac{1}{n}\sum_{i=1}^n(y_i - \bar y)^2 = \overline{y^2} - (\bar y)^2, \] where \(\overline{y^2}=\frac{1}{n}\sum_{i=1}^n y_i^2\). The standard deviation (sd) of the sample is \(sd(y_1,\ldots,y_n)=\sqrt{S^2_y}\).

In the same way, the empirical covariance between the samples \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ C_{x,y}=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i - \bar y) = \overline{xy} - \bar x\bar y, \] where \(\overline{xy}=\frac{1}{n}\sum_{i=1}^n x_iy_i\).

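Finally, Pearson’s correlation between \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ r_{x,y}=\frac{C_{x,y}}{\sqrt{S_x^2S_y^2}}. \]

Pearson’s correlation takes values in \([-1,1]\): it equals \(\pm 1\) for a perfect linear relationship and 0 when there is no linear relationship.

In R, use the functions mean, var, sd, cov and cor. Note that var, sd and cov divide by \(n-1\) instead of \(n\) (unbiased versions), so var and cov differ from \(S_y^2\) and \(C_{x,y}\) above by a factor \(n/(n-1)\), which is negligible for large \(n\); cor is not affected.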
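The sketch below computes \(S_y^2\), \(C_{x,y}\) and \(r_{x,y}\) with the \(1/n\) formulas and compares them with R's `var`, `cov` and `cor`. The simulated data (a noisy linear relation \(y=2x+\varepsilon\), seed 1, \(n=100\)) is an arbitrary choice for illustration.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)                       # linear dependence plus noise
# Empirical variance with the 1/n normalisation
S2_y <- mean((y - mean(y))^2)
all.equal(S2_y, var(y) * (n - 1) / n)       # var() divides by n - 1
# Empirical covariance with the 1/n normalisation
C_xy <- mean((x - mean(x)) * (y - mean(y)))
all.equal(C_xy, cov(x, y) * (n - 1) / n)    # cov() divides by n - 1
# Pearson's correlation: the normalisation cancels out
r_xy <- C_xy / sqrt(mean((x - mean(x))^2) * S2_y)
all.equal(r_xy, cor(x, y))
```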

2 Generating distributions

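In the following exercises, we will use the R functions illustrated below: the d* functions give the density (or the p.m.f. for a discrete law) and the p* functions give the c.d.f.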

plot(function(x) dnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="density of the Gaussian distribution")

plot(function(x) pnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="c.d.f. of the Gaussian distribution")

barplot(dpois(0:10,4), ylab="Probability", xlab="k",
        space=2, ylim=c(0,0.2), names.arg=0:10, main="p.m.f. of the Poisson distribution with parameter 4")

plot(function(x) ppois(x,4), xlim=c(-1,10), ylab="", main="c.d.f. of the Poisson distribution with parameter 4")
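R uses the same prefixes for every distribution: for a law named, e.g., `norm`, `dnorm` is the density, `pnorm` the c.d.f., `qnorm` the quantile function and `rnorm` draws random values. A minimal illustration with \(\mathcal N(0,1)\) (the seed is arbitrary):

```r
dnorm(0)        # density at 0: 1 / sqrt(2 * pi), about 0.3989
pnorm(1.96)     # c.d.f.: about 0.975
qnorm(0.975)    # quantile: about 1.96
qnorm(pnorm(1.96))  # q* inverts p* for a continuous law
set.seed(1)
rnorm(3)        # three random draws
# The same prefixes exist for other laws: dpois, ppois, qpois, rpois, dexp, ...
```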

2.1 Simulation of classical distributions

First, fix the seed of the random number generator with the following command, so that the results are reproducible:

set.seed(1)
  • Using the R functions rexp, rnorm and rbinom, generate i.i.d. samples \(X_1\), \(X_2\) and \(X_3\) of size \(n=10000\) from, respectively, the exponential distribution with rate parameter \(\lambda=2\), the Gaussian distribution with mean \(\mu=3\) and standard deviation \(sd=2\), and the Bernoulli distribution with probability of success \(p=0.25\).
  • Create a data frame \(X\) (R command data.frame) containing \(X_1\), \(X_2\) and \(X_3\). Print the top of the data frame (use head).
  • Using the R function apply, compute the mean value and the empirical variance of each sample and compare them with the theoretical values. Compute Pearson’s correlation between the 3 samples and the covariance matrix (use \(cor(X)\), \(cov(X)\)).
  • Plot the histograms of the exponential and Gaussian samples and add the corresponding density curves (use hist(X1, freq=FALSE)).
  • Print the frequency table (use table(X3)/n) and compare these values with \(1-p\) and \(p\).
n=10000
#?rexp
#?rnorm
#?rbinom

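X1<-rexp(n,2)         # exponential sample, rate parameter lambda = 2
X2<-rnorm(n,3,2)      # Gaussian sample, mean 3 and standard deviation 2
X3<-rbinom(n,1,0.25)  # Bernoulli sample: binomial with size 1 and p = 0.25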

X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) 
head(X)  
##          exp    Gauss Bernouilli
## 1 0.37759092 4.869048          0
## 2 0.59082139 3.580626          0
## 3 0.07285336 3.387249          0
## 4 0.06989763 5.147031          1
## 5 0.21803431 2.286873          0
## 6 1.44748427 3.790633          0
apply(X,2,mean)
##        exp      Gauss Bernouilli 
##  0.4991806  3.0078905  0.2482000
apply(X,2,var)
##        exp      Gauss Bernouilli 
##  0.2578852  4.0025920  0.1866154
cov(X)
##                    exp        Gauss   Bernouilli
## exp         0.25788516 -0.023411310 -0.006222800
## Gauss      -0.02341131  4.002592022 -0.005486475
## Bernouilli -0.00622280 -0.005486475  0.186615422
hist(X1,breaks=30,freq=FALSE)  # histogram on the density scale (total area 1)
plot(function(x) dexp(x,2),add=TRUE,col="red",xlim=c(0,5))  # overlay the theoretical Exp(2) density

hist(X2,breaks=30,freq=FALSE)
plot(function(x) dnorm(x,3,2),add=TRUE,col="red",xlim=c(-10,10))  # overlay the N(3, 2^2) density

table(X3)/n
## X3
##      0      1 
## 0.7518 0.2482
pie(table(X3))
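To compare with the theoretical values, recall that \(\mathcal E(\lambda)\) has mean \(1/\lambda\) and variance \(1/\lambda^2\), \(\mathcal N(\mu,sd^2)\) has mean \(\mu\) and variance \(sd^2\), and a Bernoulli law has mean \(p\) and variance \(p(1-p)\). The sketch below regenerates the samples with the same seed and the same order of calls, then puts empirical and theoretical values side by side; the names `samples` and `comparison` are illustrative.

```r
set.seed(1)
n <- 10000
X1 <- rexp(n, 2); X2 <- rnorm(n, 3, 2); X3 <- rbinom(n, 1, 0.25)
samples <- list(exp = X1, Gauss = X2, Bernoulli = X3)
# Theoretical moments: Exp(2), N(3, 2^2), Bernoulli(0.25)
theo_mean <- c(1/2, 3, 0.25)
theo_var  <- c(1/2^2, 2^2, 0.25 * (1 - 0.25))
comparison <- data.frame(
  emp_mean  = sapply(samples, mean),
  theo_mean = theo_mean,
  emp_var   = sapply(samples, var),   # var() uses the n - 1 denominator
  theo_var  = theo_var
)
comparison
```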

2.2 Simulating distributions using uniform sampling

Let \(U\) be a random variable with uniform distribution on \([0,1]\). Then, for any cumulative distribution function \(F\), the random variable \[Z=F^{-1}(U)\quad \text{has distribution } F\] (when \(F\) is not invertible, e.g. for a discrete distribution, \(F^{-1}\) is the quantile function \(Q\) of Section 1.1). Using this result, simulate the same exponential, Gaussian and Bernoulli samples as in Section 2.1 and answer the same questions. Use the R functions runif, qexp, qnorm and qbinom.

set.seed(2)
n=10000
U1=runif(n)
#?qexp
##to complete
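One possible sketch of the inverse transform method, for the exponential and Bernoulli cases only (the Gaussian case follows the same pattern with qnorm). It also checks `qexp` against the closed-form inverse \(F^{-1}(u)=-\log(1-u)/\lambda\); using separate uniform vectors `U1` and `U3` for each law is a choice made here for illustration.

```r
set.seed(2)
n <- 10000
U1 <- runif(n)
U3 <- runif(n)
# Exponential, rate 2: Z = F^{-1}(U) through the quantile function qexp
Z1 <- qexp(U1, rate = 2)
# Same inverse in closed form: F^{-1}(u) = -log(1 - u) / lambda
Z1_manual <- -log(1 - U1) / 2
# Bernoulli(0.25) as Binomial(1, 0.25): qbinom gives 0 if u <= 0.75, else 1
Z3 <- qbinom(U3, size = 1, prob = 0.25)
c(mean_exp = mean(Z1), var_exp = var(Z1), freq_1 = mean(Z3))  # compare with 1/2, 1/4, 0.25
hist(Z1, breaks = 30, freq = FALSE)
curve(dexp(x, 2), add = TRUE, col = "red")
```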

Conversely, let \(X\) be a continuous random variable with c.d.f. \(F\). Then the random variable \(Z=F(X)\) is uniformly distributed on \([0,1]\).

Using this result, simulate a uniform sample on \([0,1]\), first from a Gaussian sample and then from an exponential sample.

#to complete
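A minimal sketch of this transformation, assuming the same Gaussian \(\mathcal N(3,2^2)\) and exponential \(\mathcal E(2)\) laws as in Section 2.1 (the seed is arbitrary). Each sample is pushed through its own c.d.f.; the results should have mean close to \(1/2\) and variance close to \(1/12\), the moments of \(\mathcal U[0,1]\). The Kolmogorov-Smirnov test `ks.test` is an extra check not required by the exercise.

```r
set.seed(3)
n <- 10000
# Gaussian sample transformed by its own c.d.f.
V1 <- pnorm(rnorm(n, mean = 3, sd = 2), mean = 3, sd = 2)
# Exponential sample transformed by its own c.d.f.
V2 <- pexp(rexp(n, rate = 2), rate = 2)
# Uniform on [0, 1]: mean 1/2, variance 1/12 (about 0.0833)
c(mean(V1), var(V1), mean(V2), var(V2))
ks.test(V1, "punif")   # goodness-of-fit test against U[0, 1]
hist(V1, breaks = 20, freq = FALSE)
```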

2.3 Central Limit Theorem (CLT), convergence in distribution

Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mathbb E(X_1)=\mu\) and finite variance \(Var(X_1) =\sigma^2\). Then
\[ \sqrt{n}(\bar X_n - \mu) \rightarrow \mathcal N(0,\sigma^2) \quad\text{in distribution as } n\rightarrow\infty. \] We say that \(\sqrt{n}(\bar X_n - \mu)\) converges in distribution to a Gaussian distribution; informally, for large \(n\), \(\bar X_n\) is approximately distributed as \(\mathcal N(\mu,\sigma^2/n)\).

Using a for loop, simulate \(M=10^4\) samples of size \(n=1000\) from a Poisson distribution with parameter \(\lambda=5\), compute the statistic \(\sqrt{n}(\bar X_n - \mu)\) for each sample and check the theorem empirically. Plot the histogram of the statistic together with the density of the limiting Gaussian distribution.

M=10^4;n=10^3
Z=rep(0,M); #initialization
for(m in 1:M){
##complete the loop
##  Z[m]<-
}
#hist(Z,freq=FALSE)
#plot(function(x) dnorm(x,0,..),add=TRUE,col="red",xlim=c(-6,6) )

Note that for a Poisson distribution the expectation and the variance are both equal to \(\lambda=5\), so the standard deviation is \(\sqrt{\lambda}=\sqrt{5}\approx 2.236\).

Now try with an exponential distribution with rate parameter \(\lambda=5\). Note that for an exponential distribution the expectation is \(1/\lambda=1/5\) and the variance is \(1/\lambda^2=1/25\) (so the sd is \(1/5\)).
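One way to complete the loop in the Poisson case is sketched below (the seed and the loop variable name `s` are arbitrary choices). Each iteration draws a sample of size \(n\) and stores \(\sqrt n(\bar X_n-\lambda)\); the limiting density is \(\mathcal N(0,\lambda)\), i.e. sd \(\sqrt 5\).

```r
set.seed(4)
M <- 10^4; n <- 10^3
lambda <- 5                      # Poisson: mean = variance = lambda
Z <- rep(0, M)                   # initialization
for (m in 1:M) {
  s <- rpois(n, lambda)          # one i.i.d. sample of size n
  Z[m] <- sqrt(n) * (mean(s) - lambda)
}
hist(Z, breaks = 40, freq = FALSE)
plot(function(x) dnorm(x, 0, sqrt(lambda)), add = TRUE, col = "red", xlim = c(-6, 6))
c(mean = mean(Z), sd = sd(Z))    # compare with 0 and sqrt(5)
```

For the exponential case, replacing `rpois(n, lambda)` by `rexp(n, 5)`, centring at \(1/5\) and using sd \(1/5\) in `dnorm` should give the analogous picture.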

2.4 Law of large numbers (LLN), almost sure convergence

Let \(X_1,\ldots,X_n\) be an i.i.d. sample with finite expectation \(\mu\). Then \[P(\bar X_n \rightarrow \mu)=1.\] We say that \(\bar X_n\) converges almost surely (a.s.) to \(\mu\).

Simulate a sample of size \(n=10^4\) from the exponential distribution with rate parameter \(\lambda=2\). For \(k\) varying from 1 to \(n\), compute the mean value \(Z_k=\bar X_k\) of the first \(k\) observations. Plot the sequence \(Z_k\) and add the horizontal line \(y=0.5\), i.e. the expectation \(1/\lambda\) (use abline(h=0.5)).

#TO DO
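A possible sketch: instead of a loop over \(k\), the running means \(Z_k=(X_1+\cdots+X_k)/k\) can all be obtained at once with `cumsum` divided by \(1,\ldots,n\). The seed is arbitrary.

```r
set.seed(5)
n <- 10^4
x <- rexp(n, rate = 2)
# Running means Z_k = (x_1 + ... + x_k) / k for k = 1, ..., n
Z <- cumsum(x) / seq_len(n)
plot(Z, type = "l", xlab = "k", ylab = "running mean")
abline(h = 0.5, col = "red")   # expectation 1 / lambda = 0.5
```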