using Distributions
using LaTeXStrings # just for labels in some plots
using Plots
using Test
Distributions of random variables
Preliminaries
For the exercices of this page, you need to add the packages Distributions.jl, LaTeXStrings.jl, Plots.jl and Test.jl. To install them please open Julia’s interactive session (known as REPL) and press ] key in the REPL to use the package mode, then add the packages:
> ]
julia> add Distributions
pkg> add LaTeXStrings
pkg> add Plots
pkg> add Test pkg
When the installation is complete, import them:
The package Test.jl
is used to test if the outputs of functions to complete are correct. The packages Plots.jl
is for data visualization. Almost everything in Plots is done by specifying plot attributes. Do not hesitate to have a look to this tutorial. Finally, the package Distributions.jl
provides a large collection of probabilistic distributions and related functions.
Empirical distribution function
Exercise 1
- Build the empirical cumulative distribution function, also called eCDF.
Use broadcasting to complete the following code.
= [1,2,3.5]
a .< 2 a
3-element BitVector:
1
0
0
"""
Compute de number of element in the vactor t less than a value x
input
t : Vector of Real
x : Real
Output
Integer
"""
function empirique(t::Vector{<:Real}, x::Real)::Int
# to complete
return sum(t .< x) # .< vectorial operation
end
println("empirique([1.,2,3],1.5) = ", empirique([1.,2,3],1.5))
Test.@test empirique([1.,2,3],1.5) == 1
empirique([1.,2,3],1.5) = 1
Test Passed
# If the type of the vector elements is not a real then there is an error
println("empirique([1.+2im,2,3],1.5) = ", empirique([1.,2+2im,3],1.5))
MethodError: no method matching empirique(::Vector{ComplexF64}, ::Float64) The function `empirique` exists, but no method is defined for this combination of argument types. Closest candidates are: empirique(::Vector{<:Real}, ::Real) @ Main In[352]:9 Stacktrace: [1] top-level scope @ In[353]:2
- Generate a sample of N=1000 datas from a uniform distribution on [0,2] and plot the eCDF of this sample.
= 1000 # number of datas
N = 2*rand(N) # uniform law on [0,2]
u = -1:0.1:3
x_grid
# Plot of the empirical cumulative distribution function
F(x) = empirique(u,x)/N
= plot(x_grid,F,xlabel="x", ylabel="F(x)", legend=false) p_uniform_cdf
- Add on the plot the eCDF with the
Distribution.jl
package.
Distributions.jl Package
Introduction
There is lots of libraries (packages in julia
): https://julialang.org/packages/. For the documentation of the Distributions Package see https://juliastats.org/Distributions.jl/stable/.
= 0; b = 2;
a = Uniform(a,b) # dist is an object : the uniform distribution on [a,b]
dist println("type de dist = ",typeof(dist))
# you can acces to the mean or median of the distribution
println("mean(dist) = ", mean(dist))
println("median(dist) = ", median(dist))
# and the the PDF, CDF and inverse CDF function of the distribution
println("pdf(1.2) = ", pdf(dist,1.2))
println("pdf(3) = ", pdf(dist,3))
println("cdf(1.2) = ", cdf(dist,1.2))
println("cdf(3) = ", cdf(dist,3))
println("inverse of cdf(0.75) = ", quantile(dist,0.75))
type de dist = Uniform{Float64}
mean(dist) = 1.0
median(dist) = 1.0
pdf(1.2) = 0.5
pdf(3) = 0.0
cdf(1.2) = 0.6
cdf(3) = 1.0
inverse of cdf(0.75) = 1.5
Exercise 2
Plot on the same first graph the CFD of the uniform distribution on [0,2]
cdf_uniform(x) = cdf(dist,x)
plot!(p_uniform_cdf,x_grid,cdf_uniform,xlabel="x", ylabel="F(x)", legend=false)
Triangular Distribution
We consider the distribution with the following density distribution
f(x) = \begin{cases} x\quad\textrm{pour}\quad x\in[0,1]\\ 2-x\quad\textrm{pour}\quad x\in[1,2]\\ 0\quad\textrm{sinon} \end{cases}
Plot the density, cumulative dendity and inverse cumulative function.
= 0; b = 2;
a = TriangularDist(a,b,1) # min = a; max = b; mode = 1
dist println("type de dist = ",typeof(dist))
println("params(dist) = ", params(dist))
# Density function
= plot(x_grid, x->pdf(dist,x), color = :blue, linewidth=2, xlabel=(L"x"), ylabel=(L"f(x)"))
p1 # Cumulative density function
= plot(a-1:0.01:b+1, x->cdf(dist,x), linewidth=2, xlabel=(L"x"), ylabel=(L"F(x)"))
p2 # Inverse cumulative density function
= plot(0:0.01:1, x->quantile(dist,x), xlims=(0,1), ylims=(0,2), color = :green, linewidth=2, xlabel=(L"u"), ylabel=(L"F^{-1}(u)"))
p3 plot(p1,p2,p3, layout=(1,3),legend = false,size = (1200,300), margin = 0.6Plots.cm)
type de dist = TriangularDist{Float64}
params(dist) = (0.0, 2.0, 1.0)
Histogram
Generate a sample of 100 datas from the triangular distribution and plot on the same graph the histogram of the simple and the PDF function
# Sample of 100 datas
= rand(dist,100)
t histogram(t)
histogram(t,normalize=true) # normalize=true pour ajouter la fonction de densité
plot!(a-0.5:0.1:b+0.5, x->pdf.(dist,x), linewidth=2, xlabel=(L"x"), ylabel=(L"f(x)"))
Question
What is the problem ?
- Use the normalize=true parameter in the histogram function for solving the problem
- Execute for a sample of N = 10000 datas
Example of discret distribution
We present here the binomial distribution which is a discrete probability distribution.
= 10, 0.2, 10^3
n, p, N = Binomial(n,p)
bDist = 0:n
xgrid plot(xgrid,pdf.(bDist,xgrid), color=:orange, seriestype = :scatter)
plot!(xgrid,pdf.(bDist,xgrid), line = :stem, linewidth=2, color=:orange)
Central limit theorem
We are going to illustrate the central limit theorem :
Suppose X_1,X_2,\ldots is a sequence of Independent and identically distributed random variables with E(X_i)=\mu and Var(X_i)=\sigma^2 < +\infty. Then, as n approaches infinity, the random variables \sqrt{n}(\bar{X}_n - \mu) converge in distribution to a normal distribution \mathcal{N}(0,\sigma^2)
Exercise
- Choose a distribution law dist, compute its mean \mu and its variance \sigma^2 and N the number of sanple
- For n in (1,2,5,20)
- Generate N=10000 samples of lenght n from the dist distribution
- Compute the means of the N samples and the N values \sqrt{n}(\bar{X}_n - \mu)
- Plot the histogram of these N values and the normal distribution \mathcal{N}(0,\sigma^2)
= rand(dist,(2,3))
X println(X)
mean(X,dims=1)) (
[0.3900501110792506 1.6017523790079928 0.8608969107894937; 0.8100903551087728 0.820605938432214 0.7620181237169289]
1×3 Matrix{Float64}:
0.60007 1.21118 0.811458
= Uniform(0,12)
dist = mean(dist)
μ = std(dist)
σ = Normal(0,σ)
normal_dist println(L"\mu = ", μ)
println(L"\sigma = ", σ)
= 10000
N = (1,2,5,20)
n_mean
= []
p for n in n_mean
= rand(dist,(n,N))
X = vec(sqrt(n)*(mean(X,dims=1) .- μ)) # to have a vector and not a matrix of size (1,3)
Xbar_n
= histogram(Xbar_n,normalize=true) # normalize=true pour ajouter la fonction de densité
p1 plot!(p1, -3*σ:0.1:3*σ, x->pdf.(normal_dist,x), linewidth=2, xlabel=(L"x"), ylabel=(L"f(x)"))
push!(p,p1)
end
println(p)
plot(p[1],p[2],p[3],p[4],legend = false)
$\mu = $6.0
$\sigma = $3.4641016151377544
Any[Plot{Plots.GRBackend() n=2}, Plot{Plots.GRBackend() n=2}, Plot{Plots.GRBackend() n=2}, Plot{Plots.GRBackend() n=2}]