\[ P(a\leq X\leq b)= \int_{a}^b f(x)dx. \]
For a discrete random variable \(X\) taking values in a discrete space \(\xi\), the probability mass function (p.m.f.) gives the probabilities \[P(X = k), \quad k\in\xi.\]
The cumulative distribution function (c.d.f.) is
\[ F(x) = P(X\leq x). \]
The quantile function is \[ Q(\alpha) = F^{-1}(\alpha);\quad \text{that is, the value of $x$ such that } P(X\leq x) = \alpha. \] N.B.: \(Q(0.5)\) is the median of \(X\), \(Q(0.25)\) is the first quartile and \(Q(0.75)\) is the third quartile.
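For instance, with qnorm, the quantile function of the standard Gaussian in R:

qnorm(0.5)            # median of the standard Gaussian: 0
qnorm(c(0.25,0.75))   # first and third quartiles: -0.6744898 and 0.6744898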
For a continuous random variable taking values in \(\mathbb R\) with density \(f\), the expectation of \(X\) is
\[ \mathbb E X = \int_{\mathbb R} xf(x) dx. \] For a discrete variable taking values in a discrete space \(\xi\), the expectation of \(X\) is \[ \mathbb E X = \sum_{k\in\xi} k P(X=k). \] The variance is \[ var(X) = \mathbb E\left((X-\mathbb E(X))^2\right) = \mathbb E(X^2)-(\mathbb E(X))^2 \] and the standard deviation is \(sd(X)=\sqrt{var(X)}\). Finally, for two random variables \(X\) and \(Y\), the covariance between \(X\) and \(Y\) is \[ cov(X,Y)=\mathbb E\left((X-\mathbb E(X))(Y-\mathbb E(Y))\right) = \mathbb E(XY) - \mathbb E(X)\mathbb E(Y). \]
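These quantities can be checked numerically in R; for instance, a quick sketch for \(X\) exponential with rate 2, where \(\mathbb E X=1/2\) and \(var(X)=1/4\):

EX =integrate(function(x) x  *dexp(x,2),0,Inf)$value  # expectation: 0.5
EX2=integrate(function(x) x^2*dexp(x,2),0,Inf)$value  # second moment
EX2-EX^2                                              # variance: 0.25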
For an independent and identically distributed (i.i.d.) sample \(y_1,\ldots,y_n\), the expectation and the variance are estimated by the mean value (or empirical mean) and the empirical variance, given by: \[ \bar y=\frac{1}{n}\sum_{i=1}^n y_i,\quad S_y^2 = \frac{1}{n}\sum_{i=1}^n(y_i - \bar y)^2 = \bar{y^2} - (\bar y)^2. \] The standard deviation (sd) of the sample is \(sd(y_1,\ldots,y_n)=\sqrt{S^2_y}\).
In the same way, the empirical covariance between the samples \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ C_{x,y}=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i - \bar y) = \bar{xy} - \bar x\bar y. \]
Finally, the (Pearson’s) correlation between \(x_1,\ldots,x_n\) and \(y_1,\ldots,y_n\) is \[ r_{x,y}=\frac{C_{x,y}}{\sqrt{S_x^2S_y^2}}. \] Pearson’s correlation lies in \([-1,1]\): \(\pm 1\) indicates perfect linear dependence and 0 indicates no linear dependence.
In the following exercises, we will use the R functions:
plot(function(x) dnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="density of the Gaussian distribution")
plot(function(x) pnorm(x,mean=0,sd=1), xlim=c(-5,5), ylab="", main="c.d.f. of the Gaussian distribution")
barplot(dpois(0:10,4),ylab="Probability",xlab="k",
space=2,ylim=c(0,0.2), names.arg=0:10, main="p.m.f. of the Poisson distribution with parameter 4")
plot(function(x) ppois(x,4), xlim=c(-1,10), ylab="", main="c.d.f. of the Poisson distribution with parameter 4")
First, fix the seed by using the following command:
set.seed(1)
n=10000
X1<-rexp(n,2)
X2<-rnorm(n,3,2)
X3<-rbinom(n,1,0.25)
X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) ## concatenation of X1, X2 and X3
head(X) ##Print the head of the matrix (or dataset) X
## exp Gauss Bernouilli
## 1 0.37759092 4.869048 0
## 2 0.59082139 3.580626 0
## 3 0.07285336 3.387249 0
## 4 0.06989763 5.147031 1
## 5 0.21803431 2.286873 0
## 6 1.44748427 3.790633 0
apply(X,2,mean)
## exp Gauss Bernouilli
## 0.4991806 3.0078905 0.2482000
apply(X,2,var)
## exp Gauss Bernouilli
## 0.2578852 4.0025920 0.1866154
cov(X)
## exp Gauss Bernouilli
## exp 0.25788516 -0.023411310 -0.006222800
## Gauss -0.02341131 4.002592022 -0.005486475
## Bernouilli -0.00622280 -0.005486475 0.186615422
hist(X1,breaks=30,freq=FALSE)
plot(function(x) dexp(x,2),add=TRUE,col="red",xlim=c(0,5))
hist(X2,breaks=30,freq=FALSE)
plot(function(x) dnorm(x,3,2),add=TRUE,col="red",xlim=c(-10,10))
table(X3)/n
## X3
## 0 1
## 0.7518 0.2482
pie(table(X3),labels=c("0","1"))
Let \(U\) be a random variable with the uniform distribution on \([0,1]\). Then for any cumulative distribution function \(F\), the random variable \[Z=F^{-1}(U)\quad \text{has distribution } F.\] Using this result, simulate the same exponential, Gaussian and Bernoulli samples and answer the same questions as exercise 1. Use the R functions runif, qexp, qnorm and qbinom.
U1=runif(n)
U2=runif(n)
U3=runif(n)
X1= qexp(U1, 2)
X2= qnorm(U2,3,2)
X3=qbinom(U3,1,0.25)
X=data.frame(exp=X1,Gauss=X2,Bernouilli=X3) ## concatenation of X1, X2 and X3
head(X) ## Print the head of the matrix X
## exp Gauss Bernouilli
## 1 0.01972026 4.5880807 0
## 2 0.18615195 0.7783434 0
## 3 0.59454820 2.5494842 0
## 4 0.08673629 6.3367267 0
## 5 0.91940992 2.3569812 1
## 6 0.78985083 4.1971632 0
apply(X,2,mean)
## exp Gauss Bernouilli
## 0.507474 3.009853 0.247900
apply(X,2,var)
## exp Gauss Bernouilli
## 0.2578895 4.0610531 0.1864642
cor(X)
## exp Gauss Bernouilli
## exp 1.0000000000 -0.0009199897 -0.0121168116
## Gauss -0.0009199897 1.0000000000 -0.0009433278
## Bernouilli -0.0121168116 -0.0009433278 1.0000000000
hist(X1,breaks=30,freq=FALSE)
hist(X2,breaks=30,freq=FALSE)
table(X3)
## X3
## 0 1
## 7521 2479
Let \(X\) be a continuous random variable with c.d.f. \(F\). Then the random variable \(Z=F(X)\) is uniformly distributed on \([0,1]\).
Using this result, simulate a uniform sample on \([0,1]\) from a Gaussian sample, then from an exponential sample.
X1=rnorm(n,2,3)
X2=rexp(n,3)
U1=pnorm(X1,2,3)
U2=pexp(X2,3)
hist(U1,breaks=30,freq=FALSE)
hist(U2,breaks=30,freq=FALSE)
The Pareto distribution has cumulative distribution function
\[ F(x) = 1 - \left(\frac{\mu}{x}\right)^k,\quad x\geq \mu,\quad k>0. \] Determine the quantile function \(Q\). Verify that the median is \(Q(0.5)=\mu\sqrt[k]{2}\). Create a function rPareto which, for \(n\), \(\mu\), \(k\), returns a sample of \(n\) Pareto variables with parameters \(\mu\) and \(k\). Try the function for different values of \(\mu\) and \(k\). What is the role of these parameters?
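Solving \(F(Q(\alpha))=\alpha\) for \(Q(\alpha)\) gives \[ 1-\left(\frac{\mu}{Q(\alpha)}\right)^k = \alpha \Leftrightarrow Q(\alpha) = \mu(1-\alpha)^{-1/k}, \] and indeed \(Q(0.5)=\mu\, 2^{1/k}=\mu\sqrt[k]{2}\). This is the formula used in rPareto below.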
rPareto<-function(n,mu,k)
{
U<-runif(n)            # uniform sample
mu*(1-U)^{-1/k}        # inverse transform: Q(U) = mu*(1-U)^(-1/k)
}
X<-rPareto(1000,5,5)
summary(X)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 5.297 5.729 6.240 6.584 20.758
#hist(X,breaks=30)
#X<-rPareto(1000,1,5)
#summary(X)
#X<-rPareto(1000,5,10)
#summary(X)
#X<-rPareto(1000,5,1)
#summary(X)
Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mathbb E(X)=\mu\) and variance \(Var(X)=\sigma^2\). Then
\[
\sqrt{n}(\bar X_n - \mu) \rightarrow \mathcal N(0,\sigma^2)
\quad\text{as } n\rightarrow\infty.
\] We say that the mean value \(\bar X_n\) converges in distribution to a Gaussian distribution.
Using a for loop, simulate \(M=10^4\) samples of size \(n=1000\) from a Poisson distribution with parameter \(\lambda=5\) and verify the theorem. Plot the histogram of the statistic together with the density of the Gaussian.
M=10^4;n=10^3
Z=NULL;
for(m in 1:M){
X=rpois(n,5)
Z[m]=sqrt(n)*(mean(X) - 5)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(5)),add=TRUE,col="red",xlim=c(-6,6) )
Note that for a Poisson distribution the expectation and the variance are both equal to \(\lambda=5\), so the standard deviation is \(\sqrt{\lambda}=\sqrt 5\).
Try now with an exponential distribution with rate parameter \(\lambda=5\). Note that for an exponential distribution the expectation is \(1/\lambda=1/5\) and the variance is \(1/\lambda^2=1/25\) (so the sd is \(1/5\)). Try also with the Pareto distribution, as sketched after the next chunk; note that its expectation is \(\frac{k\mu}{k-1}\) (for \(k>1\)).
M=10^4;n=10^3
Z=NULL;
for(m in 1:M){
X=rexp(n,5)
Z[m]=sqrt(n)*(mean(X) - 1/5)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(1/25)),add=TRUE,col="red",xlim=c(-1,1) )
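For the Pareto case, a possible sketch reusing the rPareto function from above; the variance formula \(\frac{k\mu^2}{(k-1)^2(k-2)}\) (valid for \(k>2\)) is a standard Pareto fact, and the parameter values are illustrative choices:

M=10^4;n=10^3
mu=5;k=5
m.pareto=k*mu/(k-1)              # theoretical mean
v.pareto=k*mu^2/((k-1)^2*(k-2))  # theoretical variance (requires k>2)
Z=NULL
for(m in 1:M){
X=rPareto(n,mu,k)
Z[m]=sqrt(n)*(mean(X) - m.pareto)
}
hist(Z,freq=FALSE)
plot(function(x) dnorm(x,0,sqrt(v.pareto)),add=TRUE,col="red",xlim=c(-6,6))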
Let \(X_1,\ldots,X_n\) be an i.i.d. sample with expectation \(\mu\). Then \[P(\bar X_n \rightarrow \mu)=1.\] We say that \(\bar X_n\) converges almost surely (a.s.) to \(\mu\).
Simulate a sample of size \(n=10^4\) from an exponential distribution with parameter \(\lambda=2\). For \(k\) varying from 1 to \(n\), evaluate the mean value \(Z_k=\bar X_k\). Plot the sequence \(Z_k\) and add the horizontal line \(y=0.5\) (use abline(h=0.5)).
set.seed(2)
n=10^4
Z=NULL;
X=rexp(n,2)
for(k in 1:n){
Z[k]=mean(X[1:k])
}
plot(Z,xlab="length",ylab="mean value")
abline(h=1/2,col="red")
Any symmetric positive-definite matrix \(\Sigma\) can be decomposed as the product of a lower triangular matrix \(L\) (with zeros above the diagonal) and its transpose: \[ \Sigma=LL^T. \] Finding this triangular matrix requires a specific linear-algebra algorithm (referred to as the Cholesky decomposition).
Let \[ \Sigma= \left(\begin{matrix}1 & 0.4 \\0.4 & 1\end{matrix}\right). \] Using the eigen R function, which provides the eigenvalues of a matrix, check that the matrix \(\Sigma\) is positive definite (note that a symmetric matrix is positive definite if and only if all the eigenvalues of \(\Sigma\) are strictly positive). Use the R function chol (which returns the upper triangular factor, so transpose it) to get the lower triangular matrix. Compute the matrix product to check that \(\Sigma=LL^T\). The R matrix product operator is %*% and the transpose operation is t().
sigma=matrix(c(1,0.4,0.4,1),2,2)
eigen(sigma)
## eigen() decomposition
## $values
## [1] 1.4 0.6
##
## $vectors
## [,1] [,2]
## [1,] 0.7071068 -0.7071068
## [2,] 0.7071068 0.7071068
L=t(chol(sigma))
L%*%t(L)
## [,1] [,2]
## [1,] 1.0 0.4
## [2,] 0.4 1.0
Let \(X=(X_1,X_2)\) be a random vector. We say that \(X\) is multivariate Gaussian with mean value \(m=(m_1,m_2)\) and covariance matrix \(\Sigma\) if the density of \(X\) is
\[ f(x,m,\Sigma)=\frac{1}{2\pi \sqrt{\operatorname{det} \Sigma}} \exp \left(-\frac{1}{2}(x-m)^t \Sigma^{-1}(x-m)\right),\quad x\in \mathbb R^2 \] (in dimension \(d\) the normalizing constant is \((2\pi)^{d/2}\sqrt{\operatorname{det}\Sigma}\); here \(d=2\)).
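As a quick sanity check of this formula, one can compare it with dmvnorm from the mvtnorm package (used later in this document); a sketch, with an arbitrary evaluation point:

library(mvtnorm)
f<-function(x,m,S) exp(-0.5*t(x-m)%*%solve(S)%*%(x-m))/(2*pi*sqrt(det(S)))
S=matrix(c(1,0.4,0.4,1),2,2)
f(c(0.5,-0.2),c(0,0),S)       # the density above with d=2
dmvnorm(c(0.5,-0.2),c(0,0),S) # should give the same value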
library(rvinecopulib)
Cn <- bicop_dist(family = "gaussian", parameters =0.8)
contour(Cn,col="blue",main="contour-plot of bivariate gaussian distribution")
## plots
plot(Cn,margins="norm",main="Bivariate gaussian density") # surface plot of copula density
To simulate a multivariate Gaussian sample with mean value \(m=(m_1,m_2)\) and covariance matrix \(\Sigma\), we can use the Cholesky decomposition. Indeed, consider \(Z_1\) and \(Z_2\), two independent random variables from the Gaussian distribution with mean value 0 and sd 1, and let \(Z=(Z_1,Z_2)\). Then the vector \(X=LZ + m\) is multivariate Gaussian with covariance matrix \(\Sigma\) and mean value \(m\), as in the sketch below.
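A minimal sketch of this construction (the values of \(m\) and \(M\) are illustrative assumptions; \(M\) is also used by the mvtnorm chunk that follows):

sigma=matrix(c(1,0.4,0.4,1),2,2) # covariance matrix from the previous exercise
m=c(0,0)                         # mean vector
M=10^4                           # number of simulated vectors
L=t(chol(sigma))                 # lower triangular Cholesky factor
X=matrix(0,M,2)
for(i in 1:M)
X[i,]=L%*%rnorm(2) + m           # X = L Z + m with Z a standard bivariate Gaussian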
##or using mvtnorm
library(mvtnorm)
sigma=matrix(c(1,0.4,0.4,1),2,2) #matrix sigma
m=c(0,0) #mean value vector m
X<-rmvnorm(M,m,sigma)
Plot the sample and the two histograms.
Calculate the mean value and the empirical variance of the two components of the sample, and the empirical covariance between the two components (see the sketch below).
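A possible sketch, reusing the matrix X simulated above:

plot(X[,1],X[,2])                # the bivariate sample
hist(X[,1],breaks=30,freq=FALSE) # histogram of the first component
hist(X[,2],breaks=30,freq=FALSE) # histogram of the second component
apply(X,2,mean)                  # empirical means
apply(X,2,var)                   # empirical variances
cov(X[,1],X[,2])                 # empirical covariance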
Start again with \[ \Sigma_2= \left(\begin{matrix}1 & 0.1 \\0.1 & 1\end{matrix}\right) \text{ and } \Sigma_3= \left(\begin{matrix}1 & 0.9 \\0.9 & 1\end{matrix}\right). \]
Use the following code to add the contour plot of the distribution:
library(rvinecopulib)
plot(X[,1],X[,2],main=paste("rho=",0.4))
Cn <- bicop_dist(family = "gaussian", rotation = 0, parameters =0.4)
contour(Cn,col="blue",add=TRUE)
Simulate a bivariate sample \((X_1,Y_1),\ldots,(X_n,Y_n)\) in which \(X_1,\ldots,X_n\) are i.i.d. and follow an exponential distribution with rate parameter \(\lambda=3\), \(Y_1,\ldots,Y_n\) are i.i.d. and follow a Gaussian distribution with mean value \(\mu=2\) and variance \(\sigma^2=1\), and such that \(X_i\) and \(Y_i\) are correlated.
To do that:
M=10000
rho=0.4
K = matrix(c(1,rho,rho,1),2,2) ## desired covariance matrix
## in the same way as the previous exercise
L=chol(K) ## upper triangular matrix
L=t(L) ## lower triangular matrix
X=matrix(rep(0,M*2),M,2) ## initialize the matrix
for(m in 1:M)
X[m,]=L%*%rnorm(2) + c(2,2) ## fill the matrix X
##or using mvtnorm
library(mvtnorm)
X<-rmvnorm(M,c(2,2),K)
X[,1]<-pnorm(X[,1],2,1) ## transform the first component to uniform on [0,1]
X[,1]<-qexp(X[,1],3)    ## then to exponential with rate 3
hist(X[,1],freq=FALSE)
plot(function(x) dexp(x,3),xlim=c(0,3),add=TRUE,col="red")
plot(X[,1],X[,2])
cor(X[,1],X[,2])
## [1] 0.3643721
cor(X[,1],X[,2],method="spearman")
## [1] 0.3909433
xhist <- hist(X[,1], plot=FALSE)
yhist <- hist(X[,2], plot=FALSE)
top <- max(c(xhist$density, yhist$density))
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE)
par(mar=c(4,4,1,1))
#image(vx,vy,z,col=rev(heat.colors(101)))
plot(X,cex=0.2)
#points(X,cex=.2)
par(mar=c(0,3,1,1))
barplot(xhist$density, axes=FALSE, ylim=c(0, top), space=0)
lines((density(X[,1])$x-xhist$breaks[1])/diff(xhist$breaks)[1],
dexp(density(X[,1])$x,3),col="red")
par(mar=c(3,0,1,1))
barplot(yhist$density, axes=FALSE, xlim=c(0, top), space=0,
horiz=TRUE)
lines(dnorm(density(X[,2])$x,2,1),(density(X[,2])$x-yhist$breaks[1])/
diff(yhist$breaks)[1],col="red")
Let \(y_1,\ldots,y_n\) be an i.i.d. sample with density (or p.m.f.) \(f\) with parameter \(\theta\). The likelihood of the sample \(y_1,\ldots,y_n\) is \[ \mathcal L(\theta; y_1,\ldots,y_n) = \prod_{i=1}^n \mathcal L(\theta; y_i) \] where \(\mathcal L(\theta; y_i)\) is the function \(\theta \mapsto f(y_i,\theta)\). Generally, instead of the likelihood function, we consider the log-likelihood: \[ \log \mathcal L(\theta; y_1,\ldots,y_n) = \log\left(\prod_{i=1}^n \mathcal L(\theta; y_i)\right) = \sum_{i=1}^n \log \mathcal L(\theta; y_i). \]
The Maximum Likelihood Estimation (MLE) consists in maximizing the log-likelihood of the sample (or minimizing \(-\log \mathcal L(\theta; y_1,\ldots,y_n)\)). For some distributions, this can be done by solving the score equations \[ \frac{\partial}{\partial\theta}\log \mathcal L(\theta; y_1,\ldots,y_n) =0. \] For many distributions, however, the score equations have no explicit solution, and an optimization procedure must be used to obtain the MLE (use optim in R).
For a Gaussian sample with parameters \(\mu\) and \(\sigma^2\):
\[\begin{align*} \log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) &=\sum_{i=1}^n \log \mathcal L(\mu,\sigma^2; y_i) = \sum_{i=1}^n \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)\\ &=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right). \end{align*}\] So \[\begin{align*} \frac{\partial}{\partial\mu}\log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) = 0 &\Leftrightarrow \frac{\partial}{\partial\mu}\left(-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow \frac{\partial}{\partial\mu}\sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)=0\\ &\Leftrightarrow -\frac{1}{\sigma^2}\sum_{i=1}^n (y_i-\mu)=0\\ &\Leftrightarrow \mu=\bar y_n \end{align*}\]
\[\begin{align*} \frac{\partial}{\partial\sigma^2}\log \mathcal L(\mu,\sigma^2; y_1,\ldots,y_n) = 0 &\Leftrightarrow \frac{\partial}{\partial\sigma^2}\left(-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow \frac{\partial}{\partial\sigma^2}\left(-\frac{n}{2}\log(\sigma^2) - \sum_{i=1}^n\left(\frac{(y_i-\mu)^2}{2\sigma^2}\right)\right)=0\\ &\Leftrightarrow -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n(y_i-\mu)^2=0\\ &\Leftrightarrow \sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y_n)^2 \end{align*}\]
set.seed(123)
n=10^4
Y<-rnorm(n,2,2)
mu.MLE<-mean(Y)
sigma2.MLE<-var(Y) # var uses denominator n-1; for large n this is very close to the MLE
mu.MLE;sigma2.MLE
## [1] 1.995257
## [1] 3.989101
hist(Y,freq=FALSE)
plot(function(x) dnorm(x,mu.MLE,sqrt(sigma2.MLE)),add=TRUE, col="red",xlim=c(-5,10))
plot(function(x) dnorm(x,2,2),add=TRUE, col="green",xlim=c(-5,10))
Me<-Va<-NULL
for(m in 1:n){
Me[m]=mean(Y[1:m])
Va[m]=var(Y[1:m])
}
plot(Me)
abline(h=2,col="red")
plot(Va)
abline(h=4,col="red")
Do the same thing using the exponential distribution, with density \[ f(\lambda,x) = \lambda \exp(-\lambda x). \]
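The score equation gives the estimator directly: \[\begin{align*} \log \mathcal L(\lambda; y_1,\ldots,y_n) &= \sum_{i=1}^n \log\left(\lambda e^{-\lambda y_i}\right) = n\log\lambda - \lambda\sum_{i=1}^n y_i,\\ \frac{\partial}{\partial\lambda}\log \mathcal L(\lambda; y_1,\ldots,y_n) = 0 &\Leftrightarrow \frac{n}{\lambda} - \sum_{i=1}^n y_i = 0. \end{align*}\]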
Hence \[ \hat \lambda = \frac{1}{\bar y_n}. \]
Z=NULL
n=10^4
lambda=4
X= rexp(n,lambda)
for(k in 1:n)
{
Z[k] = 1/mean(X[1:k])
}
plot(Z)
abline(h=lambda,col="red")
hist(X,breaks=30,freq=FALSE)
hat.lambda=1/mean(X)
plot(function(x) dexp(x, hat.lambda), col="red",add=TRUE,xlim=c(0,10))
The density of the Gamma distribution with shape parameter \(\alpha\) and rate parameter \(\beta\) is defined by \[ f(x;\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}\exp(-\beta x),\quad x>0, \alpha>0, \beta>0, \]
where the Gamma function \(\Gamma\) is defined by
\[ \Gamma(z)=\int_0^\infty t^{z-1}e^{-t}dt. \] Note that \(\Gamma(z+1) = z\Gamma(z)\) and \(\Gamma(1)=1\). If \(Y\) follows a Gamma distribution with shape parameter \(\alpha\) and rate parameter \(\beta\), \[ \mathbb E(Y) = \frac{\alpha}{\beta},\quad Var(Y) = \frac{\alpha}{\beta^2}. \]
Check that for \(\alpha=1\) the Gamma distribution is the exponential distribution (since \(\Gamma(1)=1\), \(f(x;1,\beta)=\beta e^{-\beta x}\)).
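A quick numerical check (a sketch):

x=seq(0,5,0.1)
all.equal(dgamma(x,shape=1,rate=3),dexp(x,3)) # TRUE: Gamma(1,beta) is Exp(beta)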
Plot the density of the Gamma distribution for several values of the parameters \(\alpha\) and \(\beta\) (use the dgamma function).
plot(function(x) dgamma(x,1,3),xlim=c(0,5),xlab="",ylab="",main="gamma density")
plot(function(x) dgamma(x,2,0.5),xlim=c(0,5),col="red",add=TRUE)
plot(function(x) dgamma(x,4,6),xlim=c(0,5),col="blue",add=TRUE)
plot(function(x) dgamma(x,0.5,6),xlim=c(0,5),col="green",add=TRUE)
legend("topright",legend = c("shape=1, rate=3", "shape=2, rate=0.5","shape=4, rate=6","shape=0.5, rate=6"),col=c("black","red","blue","green"),lty=1)
library(fitdistrplus)
## Loading required package: MASS
## Loading required package: survival
set.seed(1)
n<-5000
Y<-rgamma(n,3,5)
optim(fn=function(x) -sum(dgamma(Y,x[1],x[2],log=TRUE)),par=c(0.1,0.1))$par
## [1] 2.997455 4.932734
fitg<-fitdist(Y,"gamma")
fitg
## Fitting of the distribution ' gamma ' by maximum likelihood
## Parameters:
## estimate Std. Error
## shape 2.996575 0.0569118
## rate 4.931091 0.1019496
llplot(fitg, expansion=5)
points(fitg$estimate[1],fitg$estimate[2],col="blue",pch=4)
Classical algorithms to reach the MLE of a \(d\)-dimensional parameter \(\theta=(\theta_1,\ldots,\theta_d)\), and more generally to reach the maximum of a function \(f\) (or the minimum of the opposite function \(-f\)), are gradient descent algorithms. They consist in iterating \[ \theta^{(t+1)} = \theta^{(t)} + \gamma\,\nabla f(\theta^{(t)}), \] where \(\gamma\) is a learning rate. If \(\gamma\) is too large, gradient descent may not converge. If \(\gamma\) is too small, convergence may be slow. I recommend starting with \(\gamma = \frac{1}{n}\).
Simulate a sample from an exponential distribution with parameter \(\lambda= 5\).
Use a gradient descent algorithm to estimate the distribution parameter. Compare with the output from fitdist.
Determine the MLE using the fitdistrplus R package and compare with your result. Again using the fitdistrplus R package, draw the likelihood curve and add the successive parameter values of the algorithm.
set.seed(3)
n=5000
###Exponential distribution
X=rexp(n,5)
f<-function(lambda) sum(log(lambda) - lambda*X) # log-likelihood: n*log(lambda) - lambda*sum(X)
df<- function(lambda) sum(1/lambda - X)         # its derivative (score): n/lambda - sum(X)
init = 0.5
maxit = 1000
#tolerance = 1e-6 #stopping criterion
learning_rate = 1/n
lambda = init
par=numeric(maxit+1)
par[1]=lambda
for (i in 1:maxit) {
new_lambda <- lambda + learning_rate * df(lambda)
lambda<-new_lambda
par[i+1]<-lambda
}
cat("MLE of lambda is ", lambda,"\n")
## MLE of lambda is 4.9131
lambda_values <- seq(0.1, 10, 0.1)
loglik <- sapply(lambda_values, f)
plot(loglik~lambda_values,type='l')
par_values <- sapply(par, f)
lines(par_values~par,col="red",type='l')
abline(v=5,lty=2)
abline(v=lambda,lty=2,col="red")
optim(fn=function(x) -f(x),par=1,method="BFGS")$par
## [1] 4.913099
fitg<-fitdist(X,"exp")
fitg
## Fitting of the distribution ' exp ' by maximum likelihood
## Parameters:
## estimate Std. Error
## rate 4.9131 0.06948173
llplot(fitg)
abline(v=fitg$estimate,lty=2,col="red")
For \(n=2000\), simulate a sample \(y_1,\ldots,y_n\) from a Gamma distribution with shape parameter \(\alpha= 3\) and rate parameter \(\beta = 5\).
Use a gradient descent algorithm to estimate the distribution parameters by MLE.
Note that the derivatives of the log-likelihood function are
\[ \frac{\partial}{\partial \alpha} \log \mathcal L(y_1,\ldots,y_n, \alpha , \beta) = n\log(\beta) - n\psi(\alpha) + \sum_{i=1}^n\log(y_i) \] where \(\psi\) is the digamma function (use the digamma R function): \(\psi(\alpha)= \Gamma'(\alpha)/\Gamma(\alpha)\), and
\[ \frac{\partial}{\partial \beta} \log \mathcal L(y_1,\ldots,y_n, \alpha , \beta) = n\frac{\alpha}{\beta} - \sum_{i=1}^ny_i. \]
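The corresponding code chunk is not shown here; below is a minimal sketch under the same settings (the seed, initial values and number of iterations are assumptions, so the exact output may differ slightly):

set.seed(4)                # assumed seed
n=2000
Y=rgamma(n,3,5)
dloglik<-function(theta){  # gradient of the log-likelihood (the two formulas above)
alpha=theta[1];beta=theta[2]
c(n*log(beta)-n*digamma(alpha)+sum(log(Y)),
n*alpha/beta-sum(Y))
}
theta=c(1,1)               # initial value (assumed)
learning_rate=1/n
for(i in 1:10000) theta=theta+learning_rate*dloglik(theta)
cat("MLE for alpha and beta are",theta[1],"and",theta[2],"\n")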
## MLE for alpha and beta are 3.083185 and 5.137979
The objective of the exercise is to find the value of \(x\) such that \(F(x)=P(X\leq x)=0.975\), that is to say \(F^{-1}(0.975)\), where \(X\) is a random variable following a standard Gaussian distribution, using a dichotomy algorithm.
plot(function(x) pnorm(x),xlim=c(-6,6))
abline(h=0.975,col="red",lty=2)
abline(v=qnorm(0.975),col="blue",lty=2)
points(qnorm(0.975),0.01,pch="?",col="blue")
Equivalently, we have to solve the equation \[ F(x)- 0.975 = 0. \]
Because the c.d.f. of the Gaussian distribution is continuous and strictly increasing, we can numerically solve this equation using a dichotomy algorithm.
Define the function \(Z(y) = F(y) - 0.975\) and start by defining an initial interval \([a,b]\) such that the solution of the equation satisfies \(a\leq x\leq b\), that is to say \(Z(a)Z(b)<0\).
a=-6; b=6
Z<-function(y) pnorm(y)-0.975
Z(a)*Z(b)
## [1] -0.024375
Then consider \(c=\frac{a+b}{2}\) and the interval \([a,c]\). If the solution \(x\) is in the interval \([a,c]\), necessarily \(Z(a)Z(c)<0\); if the solution \(x\) is in the interval \([c,b]\), \(Z(a)Z(c)\geq0\). Hence, if \(Z(a)Z(c)<0\) we look for a solution in \([a,c]\), and otherwise in \([c,b]\); this defines a new interval \([a,b]\) (equal to \([a,c]\) or \([c,b]\)) which contains the solution \(x\).
Consider now the middle of the new interval \([a,b]\) and find a smaller interval which contains \(x\). Reiterate the algorithm as many times as necessary until you get an interval \([a,b]\) smaller than an \(\epsilon\) chosen by the user (for example \(\epsilon=10^{-7}\)). The final estimate of \(x\) is the middle of the last interval.
To do that, stop the loop as soon as the interval is small enough, using:
if(abs(b-a)<eps) break;
Then:
maxit=1000
eps=10^{-7}
c=NULL
for(m in 1:maxit)
{
c[m]=(a+b)/2
if(Z(a)*Z(c[m]) <0) b<-c[m]
if(Z(a)*Z(c[m]) >=0) a<-c[m]
if(abs(b-a)<eps) break;
}
c[m]=(a+b)/2
#print(c[m])
#print(qnorm(0.975))
plot(c,type='l',xlab="iterations",ylab="value of the algorithm", main="Dichotomy algorithm")
abline(h=qnorm(0.975),col="red",lty=2)
cat("We used",m,"iterations to get the solution",c[m],"\n")
## We used 27 iterations to get the solution 1.959964
cat("The true value was",qnorm(0.975))
## The true value was 1.959964
Let us recall that the equation of the unit circle (circle with a radius of 1) is \(x^2+y^2=1\). In the following figure, the quarter circle \(y=\sqrt{1-x^2}\) is plotted on \([0,1]\).
plot(function(x) sqrt(1-x^2),ylab="",xlab="")
The objective of this exercise is to approximate \(\pi\). Let’s start by sampling two random vectors \(U_1=(U_{1,1},U_{1,2})\) and \(U_2=(U_{2,1},U_{2,2})\) whose coordinates are independent uniform variables on \([0,1]\). Then we look at the positions of \(U_1\) and \(U_2\) in the unit square:
set.seed(1)
plot(function(x) sqrt(1-x^2),ylab="",xlab="")
U1=runif(2)
U2=runif(2)
points(U1[1],U1[2],col="red",pch=3)
text(U1[1]+0.04,U1[2]+0.01,"U1",col="red")
segments(x0=0,y0=U1[2],x1=U1[1],y1=U1[2],lty=2,col="red")
segments(x0=U1[1],y0=0,x1=U1[1],y1=U1[2],lty=2,col="red")
points(U2[1],U2[2],col="blue",pch=3)
text(U2[1]+0.04,U2[2]+0.01,"U2",col="blue")
segments(x0=0,y0=U2[2],x1=U2[1],y1=U2[2],lty=2,col="blue")
segments(x0=U2[1],y0=0,x1=U2[1],y1=U2[2],lty=2,col="blue")
In this example, \(U_1\) falls under the quarter circle (\(U_{1,1}^2+U_{1,2}^2<1\)) while \(U_2\) falls above it.
Remember that the area of the unit disk is \(\pi\), so the area of the quarter disk is \(\pi/4\). What is the probability for a simulated point \(U=(U_1,U_2)\), where \(U_1\) and \(U_2\) are uniform on \([0,1]\), to fall under the curve of the quarter circle?
To estimate \(\pi/4\), the Monte Carlo procedure consists in simulating \(M\) random vectors \(U_1=(U_{1,1},U_{1,2}),\ldots,U_M=(U_{M,1},U_{M,2})\) with uniform coordinates on \([0,1]\), and evaluating the proportion of points \(U_m\) which are under the curve of the quarter circle, that is to say, such that \(U^2_{m,1}+U^2_{m,2}<1\).
Then \(\pi/4\) can be approximated by the proportion of points which are under the curve of the quarter circle (among the total number of points): \[ \pi/4 \approx \frac{\text{number of red points}}{\text{total number of points}} \]
Evaluate \(\pi/4\), then \(\pi\).
Illustrate the almost sure convergence of the Monte Carlo estimator of \(\pi\) (note that in R you can access the true value of \(\pi\) using the command pi):
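A minimal sketch of the whole procedure (estimate, then convergence plot):

set.seed(1)
M=10^4
U=matrix(runif(2*M),M,2)  # M uniform points in the unit square
inside=U[,1]^2+U[,2]^2<1  # TRUE when the point is under the quarter circle
4*mean(inside)            # estimates pi (mean(inside) estimates pi/4)
plot(4*cumsum(inside)/(1:M),type='l',xlab="number of points",ylab="estimate of pi")
abline(h=pi,col="red")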
In the same way as exercise 1, propose a Monte Carlo (MC) algorithm to evaluate the area under the Gaussian density between \(a=-1\) and \(b=5\), that is to say the probability \(P(a\leq X\leq b) = P(X\leq b) - P(X\leq a)\), for \(X\) a random variable with standard Gaussian distribution.
\[ P(a\leq X\leq b) \approx \frac{\text{number of red points}}{\text{total number of points}}\times(b-a), \] where the points are now drawn uniformly in the rectangle \([a,b]\times[0,1]\), whose area is \(b-a\); this rectangle contains the density curve since \(\max_x f(x)=1/\sqrt{2\pi}<1\).
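A sketch of this scheme; the two values printed below are then the MC estimate and the true value \(P(-1\leq X\leq 5)\) (the exact estimate depends on the seed and the number of points):

set.seed(1)
M=10^4;a=-1;b=5
x=runif(M,a,b)          # abscissas uniform on [a,b]
y=runif(M)              # ordinates uniform on [0,1]
mean(y<dnorm(x))*(b-a)  # MC estimate of P(a<=X<=b)
pnorm(b)-pnorm(a)       # true value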
## [1] 0.8346
## [1] 0.8413445