1 Probability reminder

1.1 Distribution functions

  • For a continuous random variable \(X\), the probability density function (p.d.f.) is the function \(f\) whose area under the curve between \(a\) and \(b\) gives the probability:

\[ P(a\leq X\leq b)= \int_{a}^b f(x)dx. \]

  • For a discrete random variable \(X\) taking values in a set \(\xi\), the probability mass function (p.m.f.) gives the probabilities \[P(X = k), \quad k\in\xi.\]

  • the cumulative distribution function (c.d.f.) is

\[ F(x) = P(X\leq x) \]

  • the quantile function is

\[ Q(\alpha) = F^{-1}(\alpha);\quad \text{that is to say, the value of $x$ such that } P(X\leq x) = \alpha. \] N.B.: \(Q(0.5)\) is the median of \(X\), \(Q(0.25)\) is the first quartile and \(Q(0.75)\) is the third quartile.

1.2 Expectation, variance, covariance

For a continuous random variable \(X\) taking values in \(\mathbb R\) and with density \(f\), the expectation of \(X\) is

\[ \mathbb E X = \int_{\mathbb R} xf(x) dx. \] For a discrete variable taking values in a discrete space \(\xi\), the expectation of \(X\) is \[ \mathbb E X = \sum_{k\in\xi} k P(X=k). \] The variance is \[ var(X) = \mathbb E\left((X-\mathbb E(X))^2\right) = \mathbb E(X^2)-(\mathbb E(X))^2. \] The standard deviation is \(sd(X)=\sqrt{var(X)}\). Finally, for two random variables \(X\) and \(Y\), the covariance between \(X\) and \(Y\) is \[ cov(X,Y)=\mathbb E\left((X-\mathbb E(X))(Y-\mathbb E(Y))\right) = \mathbb E(XY) - \mathbb E(X)\mathbb E(Y). \]
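As a quick worked example, for the exponential distribution with rate \(\lambda\) (density \(f(x)=\lambda e^{-\lambda x}\) for \(x\geq 0\)), an integration by parts gives

\[ \mathbb E X = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda}, \]

a value used in several exercises below (similarly, \(var(X)=1/\lambda^2\)).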

1.3 On a sample

For an independent and identically distributed (i.i.d.) sample \(y_1,\ldots,y_n\), estimates of the expectation and of the variance are the mean value (or empirical mean) and the empirical variance, given by: \[ \bar y=\frac{1}{n}\sum_{i=1}^n y_i,\quad S_y^2 = \frac{1}{n}\sum_{i=1}^n(y_i - \bar y)^2 = \bar{y^2} - (\bar y)^2. \] The standard deviation (sd) of the sample is \(sd(y_1,\ldots,y_n)=\sqrt{S^2_y}\).

In the same way, the empirical covariance between the samples \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ C_{x,y}=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i - \bar y) = \bar{xy} - \bar x\bar y. \]

Finally, the (Pearson’s) correlation between \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ r_{x,y}=\frac{C_{x,y}}{\sqrt{S_x^2S_y^2}}. \] Pearson’s correlation lies in \([-1,1]\): \(\pm 1\) indicates perfect linear dependence and 0 indicates no linear dependence.

2 Distributions

2.1 Generating distributions

In the following exercises, we will use the R functions:

  • rnorm, rexp, rbinom, rpois to simulate random variables with respectively Gaussian, exponential, binomial and Poisson distributions
  • pnorm, pexp, pbinom, ppois to evaluate the c.d.f. of random variables with respectively Gaussian, exponential, binomial and Poisson distributions
  • qnorm, qexp, qbinom, qpois to evaluate the quantile function, i.e. the inverse of the c.d.f., of random variables with respectively Gaussian, exponential, binomial and Poisson distributions
  • dnorm, dexp, dbinom, dpois to evaluate the density or p.m.f. of random variables with respectively Gaussian, exponential, binomial and Poisson distributions
plot(function(x) dnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="density of the gaussian distribution")

plot(function(x) pnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="c.d.f. of the gaussian distribution")

 barplot(dpois(0:10,4),ylab="Probability",xlab="k",
space=2,ylim=c(0,0.2), names.arg=0:10, main="p.m.f. of the poisson distribution with parameter 4")

 plot(function(x) ppois(x,4), xlim=c(-1,10), ylab="", main="c.d.f. of the poisson distribution with parameter 4")

2.2 Simulation of classical distributions

First, fix the seed by using the following command

set.seed(1)
  • Using the R functions rexp, rnorm, rbinom, generate i.i.d. samples \(X_1\), \(X_2\) and \(X_3\) of size \(n=10000\), respectively from the exponential distribution with rate parameter \(\lambda=2\), from the Gaussian distribution with mean value \(\mu=3\) and standard deviation \(sd=2\), and from the Bernoulli distribution with probability of success \(p=0.25\).
  • Create a data frame \(X\) (R command data.frame) which contains \(X_1\), \(X_2\) and \(X_3\). Print the top of the data frame (use head).
  • Using the R function apply, calculate the mean value and the empirical variance of the samples. Compare with the theoretical values. Evaluate the Pearson’s correlation between the three samples and also the covariance matrix (use cor(X), cov(X)).
  • Plot histograms of the exponential and Gaussian samples and add the corresponding density curves (use hist(X1, freq=FALSE)).
  • Print the frequency table (use table(X3)/n) and compare these values with \(p\) and \(1-p\).
n=10000

X1<-rexp(n,2)
X2<-rnorm(n,3,2)
X3<-rbinom(n,1,0.25)

X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) ### concatenate X1, X2 and X3
head(X)  ## print the head of the data frame X
##          exp    Gauss Bernouilli
## 1 0.37759092 4.869048          0
## 2 0.59082139 3.580626          0
## 3 0.07285336 3.387249          0
## 4 0.06989763 5.147031          1
## 5 0.21803431 2.286873          0
## 6 1.44748427 3.790633          0
apply(X,2,mean)
##        exp      Gauss Bernouilli 
##  0.4991806  3.0078905  0.2482000
apply(X,2,var)
##        exp      Gauss Bernouilli 
##  0.2578852  4.0025920  0.1866154
cov(X)
##                    exp        Gauss   Bernouilli
## exp         0.25788516 -0.023411310 -0.006222800
## Gauss      -0.02341131  4.002592022 -0.005486475
## Bernouilli -0.00622280 -0.005486475  0.186615422
hist(X1,breaks=30,freq=FALSE)
plot(function(x) dexp(x,2),add=TRUE,col="red",xlim=c(0,5))

hist(X2,breaks=30,freq=FALSE)
plot(function(x) dnorm(x,3,2),add=TRUE,col="red",xlim=c(-10,10))

table(X3)/n
## X3
##      0      1 
## 0.7518 0.2482
pie(table(X3),labels=c("0","1"))

2.3 Simulating distributions using uniform samples

Let \(U\) be a random variable distributed from the uniform distribution on \([0,1]\). Then for any cumulative distribution function \(F\), the random variable \[Z=F^{-1}(U)\quad \text{has distribution } F.\] Using this result, simulate the same exponential, Gaussian and Bernoulli samples and answer the same questions as in Exercise 1. Use the R functions runif, qexp, qnorm, qbinom.

U1=runif(n)
U2=runif(n)
U3=runif(n)
X1= qexp(U1, 2)
X2= qnorm(U2,3,2)
X3=qbinom(U3,1,0.25)
X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) ### concatenate X1, X2 and X3
head(X)  ## print the head of the data frame X
##          exp     Gauss Bernouilli
## 1 0.01972026 4.5880807          0
## 2 0.18615195 0.7783434          0
## 3 0.59454820 2.5494842          0
## 4 0.08673629 6.3367267          0
## 5 0.91940992 2.3569812          1
## 6 0.78985083 4.1971632          0
apply(X,2,mean)
##        exp      Gauss Bernouilli 
##   0.507474   3.009853   0.247900
apply(X,2,var)
##        exp      Gauss Bernouilli 
##  0.2578895  4.0610531  0.1864642
cor(X)
##                      exp         Gauss    Bernouilli
## exp         1.0000000000 -0.0009199897 -0.0121168116
## Gauss      -0.0009199897  1.0000000000 -0.0009433278
## Bernouilli -0.0121168116 -0.0009433278  1.0000000000
hist(X1,breaks=30,freq=FALSE)

hist(X2,breaks=30,freq=FALSE)

table(X3)
## X3
##    0    1 
## 7521 2479

Let \(X\) be a continuous random variable with c.d.f. \(F\). Then the random variable \(Z=F(X)\) is uniformly distributed on \([0,1]\).

Using this result, from a Gaussian distribution, then from an exponential distribution, simulate a uniform sample on \([0,1]\).

X1=rnorm(n,2,3)
X2=rexp(n,3)

U1=pnorm(X1,2,3)
U2=pexp(X2,3)

hist(U1,breaks=30,freq=FALSE)

hist(U2,breaks=30,freq=FALSE)

2.3.1 Pareto distribution

The Pareto distribution has cumulative distribution function

\[ F(x) = 1 - \left(\frac{\mu}{x}\right)^k,\quad x\geq \mu,\quad k>0. \] Determine the quantile function \(Q\). Verify that the median is \(Q(0.5)=\mu\sqrt[k]{2}\). Create a function rPareto which, for given \(n\), \(\mu\) and \(k\), returns a sample of \(n\) Pareto variables with parameters \(\mu\) and \(k\). Try the function for different values of \(\mu\) and \(k\). What is the role of these parameters?
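The quantile function can be obtained by inverting \(F\): setting \(F(x)=\alpha\) and solving for \(x\),

\[ 1-\left(\frac{\mu}{x}\right)^k = \alpha \;\Longleftrightarrow\; x = \mu(1-\alpha)^{-1/k}, \quad\text{so } Q(\alpha)=\mu(1-\alpha)^{-1/k} \text{ and } Q(0.5)=\mu\,2^{1/k}=\mu\sqrt[k]{2}. \]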

rPareto<-function(n,mu,k)
{
  U<-runif(n)          ## uniform sample on [0,1]
  mu*(1-U)^(-1/k)      ## apply the quantile function: Q(U) has c.d.f. F
}

X<-rPareto(1000,5,5)
summary(X)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.000   5.297   5.729   6.240   6.584  20.758
#hist(X,breaks=30)
#X<-rPareto(1000,1,5)
#summary(X)

#X<-rPareto(1000,5,10)
#summary(X)

#X<-rPareto(1000,5,1)
#summary(X)

2.4 Central Limit Theorem (CLT), convergence in distribution

Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mathbb E(X)=\mu\) and variance \(Var(X) =\sigma^2\). Then,
\[ \sqrt{n}(\bar X_n - \mu) \rightarrow \mathcal N(0,\sigma^2) \quad\text{as } n\rightarrow\infty. \] We say that the mean value \(\bar X_n\) converges in distribution to a Gaussian distribution.

Using a for loop, simulate \(M=10^4\) samples of size \(n=1000\) from a Poisson distribution with parameter \(\lambda=5\) and verify the theorem. Plot the histogram of the statistic together with the density of the Gaussian limit.

M=10^4;n=10^3
Z=NULL;
for(m in 1:M){
  X=rpois(n,5)
  Z[m]=sqrt(n)*(mean(X) - 5)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(5)),add=TRUE,col="red",xlim=c(-6,6) )

Note that for a Poisson distribution the expectation and the variance are both equal to \(\lambda=5\), so the standard deviation is \(\sqrt{\lambda}=\sqrt{5}\).

Try now with an exponential distribution with rate parameter \(\lambda=5\). Note that for an exponential distribution the expectation is \(1/\lambda=1/5\) and the variance is \(1/\lambda^2=1/25\) (so the sd is \(1/5\)). Try with the Pareto distribution; note that its expectation is \(\frac{k\mu}{k-1}\) (for \(k>1\)). A sketch for the Pareto case follows the exponential example below.

M=10^4;n=10^3
Z=NULL;
for(m in 1:M){
  X=rexp(n,5)
  Z[m]=sqrt(n)*(mean(X) - 1/5)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(1/25)),add=TRUE,col="red",xlim=c(-1,1) )
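A possible sketch for the Pareto case, reusing the rPareto function from Section 2.3.1 with \(\mu=5\) and \(k=5\); the variance formula \(Var(X)=\frac{k\mu^2}{(k-1)^2(k-2)}\) (valid for \(k>2\)) is assumed here:

M=10^4;n=10^3
mu=5; k=5
m.theo=k*mu/(k-1)              ## theoretical expectation
v.theo=k*mu^2/((k-1)^2*(k-2))  ## assumed theoretical variance (k>2)
Z=NULL
for(m in 1:M){
  X=rPareto(n,mu,k)
  Z[m]=sqrt(n)*(mean(X) - m.theo)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(v.theo)),add=TRUE,col="red",xlim=c(-6,6))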

2.5 Law of large numbers (LLN), almost sure convergence

Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mu\). Then \[P(\bar X_n \rightarrow \mu)=1.\] We say that \(\bar X_n\) converges almost surely (a.s.) to \(\mu\).

Simulate a sample of size \(n=10^4\) from the exponential distribution with parameter \(\lambda=2\). For \(k\) varying from 1 to \(n\), evaluate the mean value \(Z_k=\bar X_k\). Plot the sequence \(Z_k\) and add the horizontal line \(y=0.5\) (use abline(h=0.5)).

set.seed(2)
n=10^4
Z=NULL;
X=rexp(n,2)
for(k in 1:n){
  Z[k]=mean(X[1:k])
}
plot(Z,xlab="length",ylab="mean value")
abline(h=1/2,col="red")

2.6 Multivariate Gaussian distribution

2.6.1 Cholesky decomposition

Any symmetric positive-definite matrix \(\Sigma\) can be decomposed as the product of a lower triangular matrix \(L\) (with zeros above the diagonal) and its transpose: \[ \Sigma=LL^T. \] Finding this triangular matrix requires a specific linear algebra algorithm (referred to as the Cholesky decomposition).

Let \[ \Sigma= \left(\begin{matrix}1 & 0.4 \\0.4 & 1\end{matrix}\right). \] Using the eigen R function, which provides the eigenvalues of a matrix, check that the matrix \(\Sigma\) is positive definite (note that a symmetric matrix is positive definite if and only if all its eigenvalues are strictly positive). Use the R function chol to get the lower triangular matrix. Compute the matrix product to check that \(\Sigma=LL^T\). The R matrix product operator is %*% and the transpose operation is t().

sigma=matrix(c(1,0.4,0.4,1),2,2)
eigen(sigma)
## eigen() decomposition
## $values
## [1] 1.4 0.6
## 
## $vectors
##           [,1]       [,2]
## [1,] 0.7071068 -0.7071068
## [2,] 0.7071068  0.7071068
L=t(chol(sigma))
L%*%t(L)
##      [,1] [,2]
## [1,]  1.0  0.4
## [2,]  0.4  1.0

2.6.2 Simulating the multivariate Gaussian distribution

Let \(X=(X_1,X_2)\) be a random vector. We say that \(X\) is multivariate Gaussian with mean value \(m=(m_1,m_2)\) and covariance matrix \(\Sigma\) if the density of \(X\) is

\[ f(x;m,\Sigma)=\frac{1}{(2 \pi)^{d / 2} \sqrt{\operatorname{det} \Sigma}} \exp \left(-\frac{1}{2}(x-m)^t \Sigma^{-1}(x-m)\right),\quad x\in \mathbb R^d, \] with here \(d=2\).

library(rvinecopulib)
## Warning: package 'rvinecopulib' was built under R version 4.4.2
Cn <- bicop_dist(family = "gaussian", parameters =0.8)

contour(Cn,col="blue",main="contour-plot of bivariate gaussian distribution")

## plots
plot(Cn,margins="norm",main="Bivariate gaussian density") # surface plot of copula density

To simulate a multivariate Gaussian sample with mean value \(m=(m_1,m_2)\) and covariance matrix \(\Sigma\), we can use the Cholesky decomposition. Indeed, consider \(Z_1\) and \(Z_2\), two independent random variables from the Gaussian distribution with mean value 0 and sd 1, and let \(Z=(Z_1,Z_2)\). Then the vector \(X=L Z + m\) is multivariate Gaussian with covariance matrix \(\Sigma\) and mean value \(m\) (a minimal sketch is given below).
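A minimal sketch of this property, assuming the matrix \(\Sigma\) of the previous section, \(m=(0,0)\) and a sample size \(n=10^4\):

sigma=matrix(c(1,0.4,0.4,1),2,2)
m=c(0,0)
n=10^4
L=t(chol(sigma))           ## lower triangular factor of sigma
Z=matrix(rnorm(2*n),2,n)   ## 2 x n matrix of independent N(0,1) variables
X=t(L%*%Z + m)             ## each column L Z + m is N(m,sigma); X is n x 2
cov(X)                     ## should be close to sigma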

  • Generate a sample of size \(n\) from the multivariate Gaussian distribution with covariance \(\Sigma\) and mean value \(m=(0,0)\). To do that, use the Cholesky decomposition (optional) or use the R package mvtnorm:
## or using mvtnorm
library(mvtnorm)
sigma=matrix(c(1,0.4,0.4,1),2,2) # matrix sigma
m=c(0,0) # mean value vector m
n=10^4 # sample size
X<-rmvnorm(n,m,sigma)
  • Plot the sample and the two histograms

  • Calculate the mean value and the empirical variance of the two components of the sample, and the empirical covariance between the two components (a sketch follows the contour-plot code below).

  • Start again with \[ \Sigma_2= \left(\begin{matrix}1 & 0.1 \\0.1 & 1\end{matrix}\right) \text{ and } \Sigma_3= \left(\begin{matrix}1 & 0.9 \\0.9 & 1\end{matrix}\right). \]

  • Use the following code to add the ‘contour-plot’ of the distribution

library(rvinecopulib)

plot(X[,1],X[,2],main=paste("rho=",0.4))
Cn <- bicop_dist(family = "gaussian", rotation = 0, parameters =0.4)
contour(Cn,col="blue",add=TRUE)
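A brief sketch for the plotting and statistics items above, applied to the sample X produced by rmvnorm:

colMeans(X)                      ## empirical mean of each component
apply(X,2,var)                   ## empirical variance of each component
cov(X[,1],X[,2])                 ## empirical covariance between the two components
hist(X[,1],breaks=30,freq=FALSE)
hist(X[,2],breaks=30,freq=FALSE)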

2.7 Other multivariate distributions

Simulate a bivariate sample \((X_1,Y_1),\ldots,(X_n,Y_n)\) in which \(X_1,\ldots,X_n\) are i.i.d. and follow an exponential distribution with rate parameter \(\lambda=3\), \(Y_1,\ldots,Y_n\) are i.i.d. and follow a Gaussian distribution with mean value \(\mu=2\) and variance \(\sigma^2=1\), and such that \(X_i\) and \(Y_i\) are correlated.

To do that:

  • Simulate a sample \((X_1,Y_1),\ldots,(X_n,Y_n)\) following a multivariate Gaussian distribution with mean value \(m=(2,2)\) and covariance 0.4, using the R package mvtnorm and the function rmvnorm.
  • Transform \(X_1,\ldots,X_n\) in order to obtain a uniform sample, following Exercise 2.
  • Transform the uniform sample in order to obtain an exponential sample, following Exercise 2.
  • Plot \(Y\) vs \(X\). Plot the histograms of \(X_1,\ldots,X_n\) and \(Y_1,\ldots,Y_n\).
  • Print the Pearson’s correlation and Spearman’s correlation.
M=10000
rho=0.4
K = matrix(c(1,rho,rho,1),2,2) ## desired covariance matrix

## in the same way as the previous exercise

L=chol(K) ## upper triangular matrix
L=t(L) ## lower triangular matrix
X=matrix(rep(0,M*2),M,2) ## initialize the matrix
for(m in 1:M)
X[m,]=L%*%rnorm(2) + c(2,2) ## fill the matrix X

##or using mvtnorm
library(mvtnorm)
X<-rmvnorm(M,c(2,2),K)

X[,1]<-pnorm(X[,1],2,1)
X[,1]<-qexp(X[,1],3)

hist(X[,1],freq=FALSE)
plot(function(x) dexp(x,3),xlim=c(0,3),add=TRUE,col="red")

plot(X[,1],X[,2])

cor(X[,1],X[,2])
## [1] 0.3643721
cor(X[,1],X[,2],method="spearman")
## [1] 0.3909433
  • Use the following R code to get the scatterplot of the distribution and the histograms:
xhist <- hist(X[,1], plot=FALSE)
yhist <- hist(X[,2], plot=FALSE)
top <- max(c(xhist$density, yhist$density))
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(4,4,1,1))
#image(vx,vy,z,col=rev(heat.colors(101)))
plot(X,cex=0.2)
#points(X,cex=.2)
par(mar=c(0,3,1,1))
barplot(xhist$density, axes=FALSE, ylim=c(0, top), space=0)
lines((density(X[,1])$x-xhist$breaks[1])/diff(xhist$breaks)[1],
      dexp(density(X[,1])$x,3),col="red")
par(mar=c(3,0,1,1))
barplot(yhist$density, axes=FALSE, xlim=c(0, top), space=0, 
        horiz=TRUE)
lines(dnorm(density(X[,2])$x,2,1),(density(X[,2])$x-yhist$breaks[1])/
        diff(yhist$breaks)[1],col="red")

3 Maximum likelihood estimator

3.1 Log-likelihood

Let \(y_1,\ldots,y_n\) be an i.i.d. sample with density (or p.m.f.) \(f\) with parameter \(\theta\). The likelihood of the sample \(y_1,\ldots,y_n\) is \[ \mathcal L(\theta; y_1,\ldots,y_n) = \prod_{i=1}^n \mathcal L(\theta; y_i) \] where \(\mathcal L(\theta; y_i)\) is the function \(\theta \mapsto f(y_i;\theta)\). Generally, instead of the likelihood function, we consider the log-likelihood: \[ \log \mathcal L(\theta; y_1,\ldots,y_n) = \log\left(\prod_{i=1}^n \mathcal L(\theta; y_i)\right) = \sum_{i=1}^n \log \mathcal L(\theta; y_i). \]

Maximum Likelihood Estimation (MLE) consists in maximizing the log-likelihood of the sample (or, equivalently, minimizing \(-\log \mathcal L(\theta; y_1,\ldots,y_n)\)). For some distributions, this can be done by solving the score equations, that is \[ \frac{\partial}{\partial\theta}\log \mathcal L(\theta; y_1,\ldots,y_n) =0. \] For many distributions, the score equations do not have an explicit solution, and a numerical optimization procedure is needed to obtain the MLE (use optim in R).

  • Let \(y_1,\ldots,y_n\) be a sample from a Gaussian distribution with mean value \(\mu\) and variance \(\sigma^2\). Find the MLE of \(\mu\) and \(\sigma^2\).

\[\begin{align*} \log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) &=\sum_{i=1}^n \log \mathcal L(\mu,\sigma^2; y_i) = \sum_{i=1}^n \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)\\ &=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right) \end{align*}\] So \[\begin{align*} \frac{\partial}{\partial\mu}\log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) = 0 &\Leftrightarrow \frac{\partial}{\partial\mu}\left(-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow \frac{\partial}{\partial\mu}\sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)=0\\ &\Leftrightarrow -\frac{1}{\sigma^2}\sum_{i=1}^n (y_i-\mu)=0\\ &\Leftrightarrow \mu=\bar y_n \end{align*}\]

\[\begin{align*} \frac{\partial}{\partial\sigma^2}\log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) = 0 &\Leftrightarrow \frac{\partial}{\partial\sigma^2}\left(-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow \frac{\partial}{\partial\sigma^2}\left(-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n(y_i-\mu)^2=0\\ &\Leftrightarrow \sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y_n)^2 \end{align*}\]

set.seed(123)
n=10^4
Y<-rnorm(n,2,2)
mu.MLE<-mean(Y)
sigma2.MLE<-var(Y)  ## note: var() divides by n-1; for large n this is close to the MLE (which divides by n)
mu.MLE;sigma2.MLE
## [1] 1.995257
## [1] 3.989101
hist(Y,freq=FALSE)
plot(function(x) dnorm(x,mu.MLE,sqrt(sigma2.MLE)),add=TRUE, col="red",xlim=c(-5,10))
plot(function(x) dnorm(x,2,2),add=TRUE, col="green",xlim=c(-5,10))

  • Generate a sample (use rnorm) of size \(n=10^6\) from a Gaussian distribution with mean value \(\mu = 2\) and variance \(\sigma^2 = 4\). Estimate the MLE and illustrate the a.s. convergence of the estimator, that is to say the convergence of the estimator as the sample size increases.
Me<-Va<-NULL
for(m in 2:n){  ## start at 2: var() needs at least two observations
Me[m]=mean(Y[1:m])
Va[m]=var(Y[1:m])
}

plot(Me)
abline(h=2,col="red")

plot(Va)
abline(h=4,col="red")

3.2 Exponential distribution

Do the same thing using the exponential distribution, whose density is given by \[ f(x;\lambda) = \lambda \exp(-\lambda x). \]

  • Derive the MLE: here some calculus (setting the derivative of the log-likelihood to zero) leads to

\[ \hat \lambda = \frac{1}{\bar{y_n}} \]
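Indeed, the log-likelihood and the score equation are

\[ \log \mathcal L(\lambda; y_1,\ldots,y_n) = n\log\lambda - \lambda\sum_{i=1}^n y_i, \qquad \frac{\partial}{\partial\lambda}\log \mathcal L(\lambda; y_1,\ldots,y_n) = \frac{n}{\lambda} - \sum_{i=1}^n y_i = 0 \;\Longleftrightarrow\; \hat\lambda = \frac{1}{\bar y_n}. \]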

  • Illustrate the almost sure convergence using \(\lambda=4\)
Z=NULL
n=10^4
lambda=4
X= rexp(n,lambda)
for(k in 1:n)
{
  Z[k] = 1/mean(X[1:k])
}

plot(Z)
abline(h=lambda,col="red")

hist(X,breaks=30,freq=FALSE)
hat.lambda=1/mean(X)
plot(function(x) dexp(x, hat.lambda), col="red",add=TRUE,xlim=c(0,10))

3.3 Gamma distribution

The density of the Gamma distribution with shape parameter \(\alpha\) and rate parameter \(\beta\) is defined by \[ f(x;\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}\exp(-\beta x),\quad x>0,\ \alpha>0,\ \beta>0, \]

where the Gamma function \(\Gamma\) is defined by

\[ \Gamma(z)=\int_0^\infty t^{z-1}e^{-t}dt. \] Note that \(\Gamma(z+1) = z\Gamma(z)\) and \(\Gamma(1)=1\). If \(Y\) follows a Gamma distribution with shape parameter \(\alpha\) and rate parameter \(\beta\), then \[ \mathbb E(Y) = \frac{\alpha}{\beta},\quad Var(Y) = \frac{\alpha}{\beta^2}. \]

  • Check that for \(\alpha=1\), the Gamma distribution is the exponential distribution.

  • Plot the density of the Gamma distribution for several values of the parameters \(\alpha\) and \(\beta\) (use the dgamma function).

plot(function(x) dgamma(x,1,3),xlim=c(0,5),xlab="",ylab="",main="gamma density")
plot(function(x) dgamma(x,2,0.5),xlim=c(0,5),col="red",add=TRUE)
plot(function(x) dgamma(x,4,6),xlim=c(0,5),col="blue",add=TRUE)
plot(function(x) dgamma(x,0.5,6),xlim=c(0,5),col="green",add=TRUE)
legend("topright",legend = c("shape=1, rate=3", "shape=2, rate=0.5","shape=4, rate=6","shape=0.5, rate=6"),col=c("black","red","blue","green"),lty=1)

  • Generate a sample (use the rgamma function) of size \(n=10^6\) from a gamma distribution with shape parameter \(\alpha = 3\) and rate parameter \(\beta = 5\). Estimate the MLE using the R function optim, then using the function fitdist (from library(fitdistrplus)).
library(fitdistrplus)
## Warning: package 'fitdistrplus' was built under R version 4.4.2
## Loading required package: MASS
## Loading required package: survival
set.seed(1)
n<-5000
Y<-rgamma(n,3,5)

optim(par=c(0.1,0.1),fn=function(x) -sum(dgamma(Y,x[1],x[2],log=TRUE)))$par
## [1] 2.997455 4.932734
fitg<-fitdist(Y,"gamma")
fitg
## Fitting of the distribution ' gamma ' by maximum likelihood 
## Parameters:
##       estimate Std. Error
## shape 2.996575  0.0569118
## rate  4.931091  0.1019496
llplot(fitg, expansion=5)
points(fitg$estimate[1],fitg$estimate[2],col="blue",pch=4)

4 Gradient descent

Classical algorithms to reach the MLE of a \(d\)-dimensional parameter \(\theta=(\theta_1,\ldots,\theta_d)\), and more generally to reach the maximum of a function \(f\) (or the minimum of the opposite function \(-f\)), are gradient descent algorithms. They consist in iterating, from a starting point \(\theta^{(0)}\),

\[ \theta^{(k+1)} = \theta^{(k)} + \gamma\, \nabla f(\theta^{(k)}), \]

where \(\gamma\) is a learning rate. If \(\gamma\) is too large, gradient descent may not converge. If \(\gamma\) is too small, convergence may be slow. I recommend starting with \(\gamma = \frac{1}{n}\).

4.1 Exponential distribution

  • Simulate a sample from an exponential distribution with parameter \(\lambda= 5\).

  • Use a gradient descent algorithm to estimate the distribution parameter. Compare with the output from fitdist. (A sketch is given after the output below.)

  • Determine the MLE using the fitdistrplus R package and compare with your result. Again using the fitdistrplus R package, draw the likelihood curve and add the successive parameter values of the algorithm.

## MLE of lambda is  4.9131

## [1] 4.913099
## Fitting of the distribution ' exp ' by maximum likelihood 
## Parameters:
##      estimate Std. Error
## rate   4.9131 0.06948173
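A minimal sketch of the gradient step, assuming a sample of size \(n=5000\) (the seed behind the output above is not shown, so exact values will differ). The score of the exponential model is \(n/\lambda - \sum_i y_i\), and we iterate with \(\gamma=1/n\):

set.seed(1)
n=5000                        ## assumed sample size
Y=rexp(n,5)
gamma=1/n                     ## learning rate
lambda=1                      ## arbitrary starting value
for(k in 1:500){
  grad=n/lambda - sum(Y)      ## score: derivative of the log-likelihood
  lambda=lambda + gamma*grad  ## gradient ascent step on the log-likelihood
}
lambda                        ## should be very close to the explicit MLE
1/mean(Y)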

4.2 Gamma distribution

  • For \(n=2000\), simulate a sample \(y_1,\ldots,y_n\) from a gamma distribution with shape parameter \(\alpha= 3\) and rate parameter \(\beta = 5\).

  • Use a gradient descent algorithm to estimate the distribution parameter by MLE.

The derivatives of the log-likelihood function are

\[ \frac{\partial}{\partial \alpha} \log \mathcal L(\alpha, \beta; y_1,\ldots,y_n) = n\log(\beta) - n\psi(\alpha) + \sum_{i=1}^n\log(y_i) \] where \(\psi\) is the digamma function (use the digamma R function): \(\psi(\alpha)= \Gamma'(\alpha)/\Gamma(\alpha)\), and

\[ \frac{\partial}{\partial \beta} \log \mathcal L(\alpha, \beta; y_1,\ldots,y_n) = n\frac{\alpha}{\beta} - \sum_{i=1}^ny_i. \]

  • Determine the MLE using the fitdistrplus R package and compare with your result. Again using the fitdistrplus R package, draw the contour of the log-likelihood function and add the successive parameter values of the algorithm. (A sketch of the gradient step follows the output below.)
## MLE for alpha and beta are  3.083185 and  5.13798
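A minimal sketch of the two-parameter gradient step, using the derivatives above with \(\gamma=1/n\); the starting values are arbitrary assumptions, and the output above comes from a run whose seed is not shown:

n=2000
Y=rgamma(n,3,5)
gamma=1/n
theta=c(1,1)                             ## starting values (alpha,beta), arbitrary
for(k in 1:5000){
  g.alpha=n*log(theta[2]) - n*digamma(theta[1]) + sum(log(Y))
  g.beta =n*theta[1]/theta[2] - sum(Y)
  theta=theta + gamma*c(g.alpha,g.beta)  ## gradient ascent step
}
theta                                    ## compare with fitdist(Y,"gamma")$estimate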

5 Dichotomy algorithm (bisection method)

The objective of the exercise is to find the value of \(x\) such that \(F(x)=P(X\leq x)=0.975\), that is to say \(F^{-1}(0.975)\), where \(X\) is a random variable following a standard Gaussian distribution, using a dichotomy algorithm.

plot(function(x) pnorm(x),xlim=c(-6,6))
abline(h=0.975,col="red",lty=2)
abline(v=qnorm(0.975),col="blue",lty=2)
points(qnorm(0.975),0.01,pch="?",col="blue")

Equivalently, we have to solve the equation \[ F(x)- 0.975 = 0. \]

Because the c.d.f. of the Gaussian distribution is continuous and strictly increasing, we can numerically solve this equation using a dichotomy algorithm.

Define the function \(Z(y) = F(y) - 0.975\) and start by defining an initial interval \([a,b]\) such that the solution of the equation satisfies \(a\leq x\leq b\), that is to say \(Z(a)Z(b)<0\).

## [1] -0.024375

Then consider \(c=\frac{a+b}{2}\) and the interval \([a,c]\). If the solution \(x\) is in the interval \([a,c]\), necessarily \(Z(a)Z(c)<0\). If the solution \(x\) is in the interval \([c,b]\), \(Z(a)Z(c)\geq0\). Hence, if \(Z(a)Z(c)<0\) we can look for a solution in \([a,c]\), and otherwise we can look for a solution in \([c,b]\); we thus define a new interval \([a,b]\) which contains the solution \(x\): it is equal to \([a,c]\) or \([c,b]\).

Consider now the middle of the new interval \([a,b]\) and find a smaller interval which contains \(x\). Reiterate the algorithm as many times as necessary until you get an interval \([a,b]\) smaller than an \(\epsilon\) chosen by the user (for example, \(\epsilon=10^{-7}\)). The final value for the estimation of \(x\) will be the middle of the considered interval.

To do that:

if(abs(b-a)<eps) break;
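A minimal sketch of the full loop; the product \(Z(a)Z(b)=-0.024375\) printed above corresponds to the initial interval \(a=-5\), \(b=5\), which is assumed here:

Z<-function(y) pnorm(y) - 0.975
a<- -5; b<- 5      ## initial interval: Z(-5)*Z(5) = -0.024375 < 0
eps<-10^-7
c<-NULL
for(m in 1:100){
  c[m]<-(a+b)/2                            ## middle of the current interval
  if(Z(a)*Z(c[m])<0) b<-c[m] else a<-c[m]  ## keep the half containing the root
  if(abs(b-a)<eps) break
}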

Then

cat("We used",m,"iterations to get the solution",c[m],"\n")
## We used 27 iterations to get the solution 1.959964
cat("The true value was",qnorm(0.975))
## The true value was 1.959964

6 Monte Carlo approximation

6.1 Monte Carlo approximation of \(\pi\)

Recall that the equation of the unit circle (circle with a radius of 1) is given by \(x^2+y^2=1\). In the following figure, the quarter-circle curve \(y=\sqrt{1-x^2}\) is plotted on \([0,1]\).

plot(function(x) sqrt(1-x^2),ylab="",xlab="")

The objective of this exercise is to approximate \(\pi\). Let’s start by sampling two random vectors \(U_1=(U_{1,1},U_{1,2})\) and \(U_2=(U_{2,1},U_{2,2})\) whose coordinates are independent uniform variables on \([0,1]\). Then we look at the positions of \(U_1\) and \(U_2\) in the unit square:

set.seed(1)
plot(function(x) sqrt(1-x^2),ylab="",xlab="")
U1=runif(2)
U2=runif(2)
points(U1[1],U1[2],col="red",pch=3)
text(U1[1]+0.04,U1[2]+0.01,"U1",col="red")
segments(x0=0,y0=U1[2],x1=U1[1],y1=U1[2],lty=2,col="red")
segments(x0=U1[1],y0=0,x1=U1[1],y1=U1[2],lty=2,col="red")

points(U2[1],U2[2],col="blue",pch=3)
text(U2[1]+0.04,U2[2]+0.01,"U2",col="blue")
segments(x0=0,y0=U2[2],x1=U2[1],y1=U2[2],lty=2,col="blue")
segments(x0=U2[1],y0=0,x1=U2[1],y1=U2[2],lty=2,col="blue")

In this example,

  • \(U_1=(U_{1,1},U_{1,2})=(0.27, 0.37)\) is under the curve of the quarter circle: \(U_{1,1}^2+U_{1,2}^2 = 0.27^2 + 0.37^2 = 0.21<1\).
  • \(U_2=(U_{2,1},U_{2,2})=(0.57, 0.91)\) is above the curve of the quarter circle: \(U_{2,1}^2+U_{2,2}^2 = 0.57^2+ 0.91^2 = 1.15 >1\).

Remember that the area of the unit circle is \(\pi\), so the area of the quarter circle is \(\pi/4\). What is the probability for a simulated point \(U=(U_{1},U_{2})\), where \(U_{1}\) and \(U_{2}\) are independent uniforms on \([0,1]\), to be under the curve of the quarter circle?

To estimate \(\pi/4\), the Monte Carlo procedure consists in simulating \(M\) random vectors \(U_1=(U_{1,1},U_{1,2}),\ldots,U_M=(U_{M,1},U_{M,2})\) according to uniform distributions on \([0,1]\), and evaluating the proportion of the points \(U_m\) which are under the curve of the quarter circle, that is to say such that \(U^2_{m,1}+U^2_{m,2}<1\).

  • Using \(M=10^4\), simulate \(M\) such random vectors
  • Plot the points which are under the curve of the quarter circle in red and the points which are above the curve of the quarter circle in blue:

Then, \(\pi/4\) can be approximated by the proportion of points which are under the curve of the quarter circle (among the total number of points): \[ \pi/4 \approx \frac{\text{number of red points}}{\text{total number of points}} \]

  • Evaluate \(\pi/4\), then \(\pi\).

  • Illustrate the (almost sure) convergence of the Monte Carlo estimator of \(\pi\). (Note that in R, you can access the true value of \(\pi\) using the R command pi.) A sketch of both items follows.
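A minimal sketch for the two items above (the colors follow the figure described earlier):

M=10^4
U=matrix(runif(2*M),M,2)           ## M random points in the unit square
under=(U[,1]^2 + U[,2]^2 < 1)      ## TRUE for points under the quarter circle
plot(U,col=ifelse(under,"red","blue"),cex=0.3)
4*mean(under)                      ## Monte Carlo estimate of pi
plot(4*cumsum(under)/(1:M),type="l",xlab="number of points",ylab="estimate of pi")
abline(h=pi,col="red")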

6.2 Monte Carlo approximation of the area under the Gaussian density

In the same way as Exercise 1, propose a Monte Carlo (MC) algorithm to evaluate the area under the Gaussian density between \(a=-1\) and \(b=5\), that is to say the probability \(P(a\leq X\leq b) = P(X\leq b) - P(X\leq a)\) for \(X\) a random variable with standard Gaussian distribution.

  • Plot the density of the Gaussian distribution on the interval \((-10,10)\).
  • In the same graphic, add the vertical lines \(a=-1\) and \(b=5\).
  • Add the points of the MC algorithm which are under the Gaussian curve in red and the points of the MC algorithm which are above the Gaussian curve in blue.

Sampling points \((X_m,Y_m)\) with \(X_m\) uniform on \([a,b]\) and \(Y_m\) uniform on \([0,1]\) (the rectangle \([a,b]\times[0,1]\) has area \(b-a\) and contains the curve, since the Gaussian density is below 1), \[ P(a\leq X\leq b) \approx \frac{\text{number of red points}}{\text{total number of points}}\times(b-a). \]

  • Evaluate the value of the area under the Gaussian density between \(a=-1\) and \(b=5\) and compare with the theoretical value obtained using pnorm.
  • Illustrate the (almost sure) convergence of the Monte Carlo estimation of the area under the Gaussian curve. A sketch follows.
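A minimal sketch, sampling the vertical coordinate uniformly on \([0,1]\) as described above (a different seed will give a first value slightly different from the outputs printed below):

M=10^4; a=-1; b=5
X=runif(M,a,b)                ## horizontal coordinates, uniform on [a,b]
Y=runif(M)                    ## vertical coordinates, uniform on [0,1]
under=(Y < dnorm(X))          ## TRUE for points under the Gaussian density
mean(under)*(b-a)             ## Monte Carlo estimate of P(a <= X <= b)
pnorm(b)-pnorm(a)             ## theoretical value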

## [1] 0.8346
## [1] 0.8413445