Based on the talk “From Simulation to Markov Chain Monte Carlo Methods” by Professor Lih Y. Deng, Department of Mathematical Sciences, The University of Memphis
This method simulates a local perturbation \(\epsilon\) of the current value by defining a sequence \(\{\pi_t\}\) of densities whose modes coincide with \(\arg\max(h)\). Each \(\theta_t\) in the sequence is then simulated from the density \(\pi_t\) according to a specific update mechanism.
where
\[\begin{align*} &\Delta h = h(\zeta) - h(\theta_0) \\ &\text{If $h(\zeta) \ge h(\theta_0)$, $\zeta$ is accepted.} \\ &\text{If $h(\zeta) < h(\theta_0)$, $\zeta$ may still be accepted with probability $\rho > 0$.} \\ &\text{This allows escape from local maxima.} \end{align*}\]
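In the R implementation below, the acceptance probability takes the explicit form \[\rho = \min\left\{\exp\left(\frac{\Delta h}{T_t}\right),\ 1\right\}, \qquad T_t = \frac{1}{\log t},\] where \(T_t\) is the temperature; the schedule \(T_t = 1/\log t\) is what the term exp(delta*log(i)) in the code computes.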
#From 5.3
h <- function(x){(cos(50*x)+sin(20*x))^2}
plot(function(x)h(x), xlim=c(0,1),ylim=c(0,4),lwd=2,main='function h(x)')
optimize(h, c(0, 1), tol = 0.0001, maximum=TRUE)
## $maximum
## [1] 0.3791249
##
## $objective
## [1] 3.832543
nsim <- 2500
u <- runif(nsim)
#Simulated annealing
xval <- array(0,c(nsim,1))
r <- .5
for(i in 2:nsim){
#Generate a candidate from U(a_t, b_t)
test <- runif(1, min=max(xval[i-1]-r,0),max=min(xval[i-1]+r,1))
#Calculate the acceptance probability with temperature T_i = 1/log(i)
delta <- h(test)-h(xval[i-1])
rho <- min(exp(delta*log(i)),1)
#Accept the candidate with probability rho, otherwise keep the current value
xval[i] <- test*(u[i]<rho)+xval[i-1]*(u[i]>=rho)
}
h(xval[nsim])
## [1] 3.803225
plot(xval,h(xval),type="l",lwd=2,main='Simulated Annealing Trajectory')
\[h(x,y)=(x\sin(20y)+y\sin(20x))^2\cosh(\sin(10x)\,x)+(x\cos(10y)-y\sin(10x))^2\cosh(\cos(20y)\,y)\]

* We will apply simulated annealing with different choices of \(T_i\):
  + Results depend on the choice of \(T_i\)
  + \(T_i \propto 1/\log(i+1)\)
  + The perturbation is \(g \sim U(-0.1, 0.1)\)
  + The starting point is \((0.5, 0.4)\)
h <- function(x,y){(x*sin(20*y)+y*sin(20*x))^2 * cosh(sin(10*x)*x) + (x*cos(10*y)-y*sin(10*x))^2 *cosh(cos(20*y)*y)}
Nsim <- 5000
XY <- matrix(data = NA, nrow = Nsim, ncol=2)
#Starting point
XY[1,1] <- .5
XY[1,2] <- .4
m <- 0
for(i in 2:Nsim) {
T_i <- 100/log(i+1)
U <- runif(1, max(XY[i-1, 1]-.1, -2), min(XY[i-1, 1]+.1, 2))
V <- runif(1, max(XY[i-1, 2]-.1,-2), min(XY[i-1, 2]+.1, 2))
#note the minus sign: moves that decrease h are always accepted, so this run searches for the minimum of h
rho <- min(exp(-(h(U,V)-h(XY[i-1,1], XY[i-1,2]))/T_i), 1)
if (rho > runif(1)) {
#accept the proposed move
x <- U; y <- V; m <- m+1
} else {
#otherwise stay at the current point
x <- XY[i-1,1]; y <- XY[i-1,2]
}
XY[i,1] <- x
XY[i,2] <- y
}
plot(XY[,1], XY[,2], type='l', lwd=.5, main='Ti=100/log(i+1)')
The goal is to generate a random variate \(X\) with p.d.f \(p(x)\), assuming we know how to generate \(Y\) with p.d.f \(g(y)\) and we can find a constant \(M\) such that \(p(x)\le Mg(x)\).
\(g(x)\) is the majorizing, trial, or instrumental density.
\[pr(acceptance)=\frac{1}{M}\]
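As a generic sketch of the method (the function accept_reject and its arguments are illustrative and not part of the text), the accept-reject step can be written in R as:

accept_reject <- function(n, p, rg, dg, M) {
#p: target density, rg: sampler for g, dg: density of g, M: bound with p(x) <= M*g(x)
out <- numeric(0)
while (length(out) < n) {
y <- rg(n) #candidates from the instrumental density g
u <- runif(n)
out <- c(out, y[u <= p(y)/(M*dg(y))]) #keep each y with probability p(y)/(M g(y))
}
out[1:n]
}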
Choose \(g(x)=1, 0<x<1\) and we find \(M=\frac{20}{7}\) because \[p_X(x)=\frac{10}{7}(x+x^4)\le \frac{20}{7}\] and \[\frac{p_X(x)}{Mg(x)}=\frac{1}{2}(x+x^4)\].
\[pr(acceptance)=\frac{7}{20}=0.35\]
Nsim <- 10^4
v <- runif(Nsim);y <- v
u <- runif(Nsim)
x <- na.omit(ifelse(u <= (1/2)*(y+y^4),y,NA))
#hist(x, freq = FALSE)
curve((1/2)*(x+x^4),col='green')
abline(h = 1, col = "red")
title (main="Plot of p(x)/Mg(x), g(x) =1",font.main=4)
text (0.8, 0.6, "p(x)/Mg(x)", adj=c (-.1,-.1)) # adding text to plot
# add accept/reject line segments to plot
segments(x0=0.4, x1=0.4, y0=0.0, y1=0.2, col="blue",lty=1)
segments(x0=0.4, x1=0.4, y0=0.2, y1=1.0, col="yellow",lty=1)
text (0.41, 0.1, "Accept", adj=c (-.1,-.1)) # adding “accept” text to plot
text (0.41, 0.8, "Reject", adj=c (-.1,-.1)) # adding “reject” text to plot
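Consistent with \(pr(acceptance)=0.35\), the empirical acceptance rate of the simulation above can be checked directly:

#proportion of accepted candidates; should be close to 1/M = 0.35
length(x)/Nsim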
Choose \(g(x)=2x, 0<x<1\) and we find \(M=\frac{10}{7}\) because \[\frac{p_X(x)}{g(x)}=\frac{5}{7}(1+x^3)\le \frac{10}{7}.\] Hence, \(\frac{p_X(x)}{Mg(x)}=\frac{1}{2}(1+x^3)\).
\[pr(acceptance)=\frac{7}{10}=0.70\]
Nsim <- 10^4
v <- runif(Nsim);y <- v^(1/2) #inverse-CDF draw from g(x)=2x
u <- runif(Nsim)
x <- na.omit(ifelse(u <= (1/2)*(1+y^3),y,NA))
#hist(x, freq = FALSE)
curve((1/2)*(1+x^3),col='green')
abline(h = 1, col = "red")
title (main="Plot of p(x)/Mg(x), g(x) =2x",font.main=4)
text (0.8, 0.8, "p(x)/Mg(x)", adj=c (-.1,-.1)) # adding text to plot
# add accept/reject line segments to plot
segments(x0=0.6, x1=0.6, y0=0.0, y1=0.6, col="blue",lty=1)
segments(x0=0.6, x1=0.6, y0=0.6, y1=1.0, col="yellow",lty=1)
text (0.61, 0.52, "Accept", adj=c (-.1,-.1)) # adding “accept” text to plot
text (0.61, 0.9, "Reject", adj=c (-.1,-.1)) # adding “reject” text to plot
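Similarly, for \(g(x)=2x\) the empirical acceptance rate should be close to \(1/M = 0.70\):

#proportion of accepted candidates for g(x) = 2x
length(x)/Nsim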
In the rejection method there is no general rule for choosing \(g(y)\), and the constant \(M\) may not be easy to find. In these cases we can use the Metropolis-Hastings method.
When a Markov chain is used in the Metropolis-Hastings method, we obtain a convenient way to sample from a desired density, since \(P\{X_i|X_0,...,X_{i-2}, X_{i-1}\} = P\{X_i|X_{i-1}\}\) and, when \(n\) is large, \(X_n \to X \sim p(x)\). We can achieve our goal without finding a constant \(M\) or a dominating \(g(x)\); moreover, no normalizing constant of \(p(x)\) is needed.
Random walk chain
A simple random walk Markov chain satisfies \(Y = X_i + \epsilon\), where \(\epsilon \sim g(\cdot)\) and \(g(\cdot)\) is a density symmetric around 0.
Independence chain
Since \(Y\) and \(X_i\) are independent, \(g(y|x_i) = g(y)\). To obtain good estimates, we should choose \(g(y)\) to approximate \(p(y)\) as closely as possible, and \(g\) must be easy to sample from.
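In both cases, the candidate \(Y=y\) is accepted with the standard Metropolis-Hastings probability \[\rho(x_i, y) = \min\left\{1,\ \frac{p(y)\,g(x_i|y)}{p(x_i)\,g(y|x_i)}\right\},\] which for the independence chain reduces to \(\min\{1,\ p(y)g(x_i)/[p(x_i)g(y)]\}\); this is the ratio r computed in the MCMC code below.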
\[p(\theta | \mathbf{y}) \propto (2+\theta)^{125} (1-\theta)^{38} \theta^{34}\] given that \(\theta \sim U(0,1)\)
First, let us look at the shape of the posterior distribution.
Nsim <- 10^4
p <- function(x) dbeta(x,55.75994892,33.77023722) #Beta density approximating the normalized posterior
x <- runif(Nsim)
plot(x, p(x), col='red',cex=0.05,
main= bquote('p('~theta ~'|y)=c'(2+theta)^125*~(1-theta)^38 ~theta^34),ylab='')
Compare the distribution of \(p(\theta|y)\) to three different Beta distributions. As we can see, the shape of the posterior is close to that of a Beta(56,34).
x <- runif(1000)
x_seq <- seq(from=min(x), to=max(x), length = 1000)
plot(x_seq, p(x_seq), col='red',
main=bquote('p'~(theta~'|y') ~ 'vs. Beta(55,30), Beta(30,30), Beta(56, 34)'),
ylab='',xlab=expression(theta),type='l')
lines(x_seq, dbeta(x_seq,55,30),type='l',col='green')
lines(x_seq, dbeta(x_seq,30,30),type='l',col='blue')
lines(x_seq, dbeta(x_seq,56,34),type='l',col='sienna')
legend(0,7, c('p(theta|y)','B(55,30)','B(30,30)','B(56,34)'),lty=c(1,1,1,1),col=c('red','green','blue','sienna'),cex=0.5)
Application: We can use the Metropolis-Hastings algorithm to estimate the posterior mean. After a large number of iterations, the sample mean converges to the true mean.
Example for \(g \sim Beta(55,30)\)
#Beta(55,30)
Nsim <- 10000
f <- function(x) (2+x)^125*(1-x)^38*x^34 #unnormalized posterior
g <- function() rbeta(1,55,30) #independence proposal
TH <- rep(0,Nsim)
TH[1] <- 0.5
for (i in 2:Nsim) {
th_current <- TH[i-1]
th_proposed <- g()
r <- (f(th_proposed)/f(th_current)) *
(dbeta(th_current, 55, 30)/dbeta(th_proposed, 55, 30))
if(r >= runif(1)) TH[i] <- th_proposed
else TH[i] <- th_current
}
mean_th_1 <- cumsum(TH)/(1:Nsim)
plot(TH, type='p',col='blue',cex=0.2,main='MCMC with g~Beta(55,30)',xlab='iteration', ylab = expression(theta))
abline(h=0.6228061319, col='black')
lines(mean_th_1,col='red',type='l')
#Beta(30,30)
Nsim <- 10000
g <- function() rbeta(1,30,30)
for (i in 2:Nsim) {
th_current <- TH[i-1]
th_proposed <- g()
r <- (f(th_proposed)/f(th_current)) *
(dbeta(th_current, 30, 30)/dbeta(th_proposed, 30, 30))
if(r >= runif(1)) TH[i] <- th_proposed
else TH[i] <- th_current
}
mean_th_2 <- cumsum(TH)/(1:Nsim)
plot(TH, type='p',col='green',cex=0.2,main='MCMC with g~Beta(30,30)',xlab='iterations')
abline(h=0.6228061319, col='black')
lines(mean_th_2,col='red',type='l')
#Beta(56,34)
g <- function() rbeta(1,56,34)
for (i in 2:Nsim) {
th_current <- TH[i-1]
th_proposed <- g()
r <- (f(th_proposed)/f(th_current)) *
(dbeta(th_current, 56, 34)/dbeta(th_proposed, 56, 34))
if(r >= runif(1)) TH[i] <- th_proposed
else TH[i] <- th_current
}
mean_th_3 <- cumsum(TH)/(1:Nsim)
plot(TH, type='p',col='sienna',cex=0.2,main='MCMC with g~Beta(56,34)',xlab='iterations')
abline(h=0.6228061319, col='black')
lines(mean_th_3,col='red',type='l')
As we can see, the running means of the three MCMC chains, each with a different Beta proposal, all converge to the true mean (0.6228061319).
plot(NULL, xlim=c(0,Nsim), ylim=c(0,1), ylab="", xlab="", main='All three MCMC')
abline(h=0.6228061319, col='black')
lines(mean_th_1, col='blue')
lines(mean_th_2, col='green')
lines(mean_th_3, col='red')
legend(0,0.4, c('true mean','g~B(55,30)','g~B(30,30)','g~B(56,34)'),lty=c(1,1,1),col=c('black','blue','green','red'),cex=0.5)
\[p(x_1|x_2,...,x_d)\] \[p(x_2|x_1,x_3,...,x_d)\] \[\vdots\] \[p(x_d|x_1,...,x_{d-1})\]
The Gibbs sampler is a method that generates random variables from a marginal distribution without having to calculate its density \(f(x)\), under the assumption that it is easier to sample from the conditional densities above. It is based on elementary properties of Markov chains.
If two random variables \(X\) and \(Y\) have joint density \(f(x,y)\) with corresponding conditional densities \(f_{y|x}\) and \(f_{x|y},\) the two-stage Gibbs sampler generates a Markov chain \((X_t, Y_t)\) according to the following steps:
Algorithm: Two-stage Gibbs Sampler
Take \(X_0=x_0\)
For \(t=1,2,...,\) generate
\(Y_t \sim f_{y|x}(.|x_{t-1})\)
\(X_t \sim f_{x|y}(.|y_{t})\)
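As a minimal sketch (the helpers ry_given_x and rx_given_y are assumed user-supplied samplers for \(f_{y|x}\) and \(f_{x|y}\), not functions from the text), the two-stage algorithm is just a loop over the two conditional draws:

gibbs2 <- function(nsim, x0, ry_given_x, rx_given_y) {
x <- y <- numeric(nsim)
xprev <- x0
for (t in 1:nsim) {
y[t] <- ry_given_x(xprev) #Y_t ~ f_{y|x}(.|x_{t-1})
x[t] <- rx_given_y(y[t]) #X_t ~ f_{x|y}(.|y_t)
xprev <- x[t]
}
cbind(x, y)
}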
There are many examples given in the text, but I have chosen Example 7.2 for the implementation of this algorithm.
A natural extension of the two-stage Gibbs sampler.
Suppose that for some \(p>1,\) the random variable \(X \in \mathcal{X}\) can be written as \(X=(X_1, ...,X_p),\) where the \(X_i\)'s are either unidimensional or multidimensional components.
Moreover, suppose that we can simulate from the corresponding conditional densities \(f_1,f_2,...,f_p,\) that is, we can simulate \[X_i|x_1,x_2,...,x_{i-1},x_{i+1},...,x_p \sim f_i(x_i|x_1,x_2,...,x_{i-1},x_{i+1},...,x_p)\] for \(i=1,2,...,p\)
The associated Gibbs sampler is given by the following transition from \(X^{(t)}\) to \(X^{(t+1)}\):
Algorithm: The multistage Gibbs Sampler
At iteration \(t=1,2,...,\) given \(X^{(t)}=(x_1^{(t)},...,x_p^{(t)}),\) generate
\(X_1^{(t+1)}\sim f_1(x_1|x_2^{(t)},...,x_p^{(t)});\)
\(X_2^{(t+1)}\sim f_2(x_2|x_1^{(t+1)},x_3^{(t)},...,x_p^{(t)});\)
\(\vdots\)
\(X_p^{(t+1)}\sim f_p(x_p|x_1^{(t+1)},x_2^{(t+1)},...,x_{p-1}^{(t+1)}).\)
The densities \(f_1, ...,f_p\) are called the full conditionals, and a particular feature of the Gibbs sampler is that they are the only densities used for simulation. Thus, even in a high-dimensional problem, all of the simulations may be univariate, which is usually an advantage.
In a general missing-data setting \[g(x)=\int_{\mathcal{Z}} f(x,z)\,\mathrm{d}z\] for \(p\ge2,\) we write \(y=(x,z)=(y_1,...,y_p)\) and denote the conditional densities of \(f(y)=f(y_1,...,y_p)\) by \[Y_1|y_2,...,y_p \sim f_1(y_1|y_2,...,y_p),\] \[Y_2|y_1, y_3,...,y_p \sim f_2(y_2|y_1, y_3,...,y_p),\] \[\vdots\] \[Y_p|y_1, y_2,...,y_{p-1} \sim f_p(y_p|y_1, y_2,...,y_{p-1}).\] Applying a multistage Gibbs sampler to these full conditionals, assuming they can be simulated, leads to a Markov chain \((Y^{(t)})_t\) that converges to \(f\) and therefore a subchain \((X^{(t)})_t\) that converges to \(g.\)
The Gibbs sampler is useful because, under reasonably general conditions and with a large enough number of iterations, the distribution of \(X_k\) converges to \(f(x),\) the true marginal distribution of \(X\). So the final point \(X_k\) is effectively a sample point from \(f(x).\)
Gibbs sampling can be used to estimate the density itself by averaging the final conditional densities from each Gibbs sequence. Just as the values \(X_k=x_k\) yield a realization of \(X_1,..., X_m \sim f(x)\), the values \(Y_k=y_k\) yield a realization of \(Y_1,...,Y_m \sim f(y)\). The average of the conditional densities \(f(x|Y_k=y_k)\) is a close approximation to \(f(x)\): \[\hat{f}(x) = \frac{1}{m} \sum_{i=1}^{m}f(x|y_i),\] where \(y_1,...,y_m\) is the sequence of realized values of the final \(Y\) observations from each Gibbs sequence.
The Gibbs sampler provides better estimators since they are based on \(f(x|y_1),...,f(x|y_m)\), and the simulated values \(y_1,...,y_m\) carry more information about \(f(x)\) than \(x_1,...,x_m\) do.
Beta Binomial
Considering the pair of distributions
\[X|\theta \sim Bin(n, \;\theta),\quad \theta\sim Be(a,\;b)\] leads to the joint distribution \[f(x,\;\theta)=\binom{n}{x}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{x+a-1}(1-\theta)^{n-x+b-1}\]
The corresponding conditional distribution of \(X|\theta\) is given above, while \[\theta|x \sim Be(x+a,\;n-x+b).\]
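Integrating \(\theta\) out of the joint density gives the beta-binomial marginal of \(X\), \[m(x)=\binom{n}{x}\frac{B(x+a,\;n-x+b)}{B(a,\;b)}, \qquad x=0,1,...,n,\] where \(B(\cdot,\cdot)\) is the beta function; this is what the function betabi below evaluates, using \(\binom{n}{x} = 1/[(n+1)B(x+1,\,n-x+1)]\).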
The associated Gibbs sampler can be implemented as
#beta-binomial marginal density of X, used for the histogram overlay
betabi=function(x,a,b,n){
beta(x+a,n-x+b)/(beta(a,b)*beta(x+1,n-x+1)*(n+1))}
nsim=10^4
n=15;a=3;b=7
X=T=rep(0,nsim)
T[1]=rbeta(1,a,b) #initialize the chain
X[1]=rbinom(1,n,T[1]) #initialize the chain
for (i in 2:nsim){
X[i]=rbinom(1,n,T[i-1])
T[i]=rbeta(1,a+X[i],n-X[i]+b)
}
par(mfrow=c(1,2),mar=c(4,4,2,1))
hist(X[2000:nsim],nclass=16,col="grey",freq=FALSE, xlim=c(0,15),main="",xlab="X")
curve(betabi(x,a,b,n),from=0, to=15,col="gold4",lwd=2,add=TRUE)
hist(T[2000:nsim],nclass=134,col="grey",freq=FALSE,xlim=c(0,.8),main="", xlab=expression(theta))
curve(dbeta(x,shape1=a,shape2=b),from=0, to=.8,col="sienna",lwd=2,add=TRUE)
Since this is a toy example, the closed-form marginals are available and are thus drawn on top of the histograms; they show a good fit for both marginals of the Gibbs sampler.
Bivariate normal distribution.
As we know, the marginal distributions of \(X\) and \(Y\) are \(N(\mu_X, \sigma_X^2)\) and \(N(\mu_Y, \sigma_Y^2)\). The conditional distributions are:
\[Y|X=x \sim N \left( \mu_Y + \frac{\rho \sigma_Y}{\sigma_X}(x - \mu_X),\ (1 - \rho^2) \sigma_Y^2 \right) \] and \[X|Y=y \sim N \left( \mu_X + \frac{\rho \sigma_X}{\sigma_Y}(y - \mu_Y),\ (1 - \rho^2) \sigma_X^2 \right), \] where \(\rho\) is the correlation of \(X\) and \(Y\).
R implementation
Nsim <- 10000
r=0.9;mu_x=1;mu_y=2;sd_x=1.2;sd_y=0.75
s1 <- sqrt(1-r^2)*sd_x
s2 <- sqrt(1-r^2)*sd_y
#output matrix
XY <- matrix(nrow=Nsim,ncol=2)
XY[1,] <- c(mu_x,mu_y)
Y <- XY[1,2]
for(i in 1:(Nsim-1)){
mx <- mu_x+r*(Y-mu_y)*sd_x/sd_y
#get X with new mean
X <- rnorm(1,mx,s1)
#update X into output matrix
XY[i+1,1] <- X
my <- mu_y+r*(X-mu_x)*sd_y/sd_x
#sample Y with new mean
Y <- rnorm(1,my,s2)
#update Y into output matrix
XY[i+1,2] <- Y
}
#plot X and Y with burn-in = 500. Both X and Y converged.
par(mfrow=c(1,2))
plot(XY[501:10000,1],type='l',col='darkblue',main='Plot of X', ylab='X',xlab='iteration')
plot(XY[501:10000,2],type='l',col='wheat3',main='Plot of Y', ylab='Y',xlab='iteration')
Gibbs sampling works here: after the burn-in, both X and Y fluctuate stably around their marginal means, indicating that the chain has reached its stationary distribution.
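To connect this example with the density-estimation idea above, here is a sketch (not part of the original code) of the Rao-Blackwellized estimate \(\hat{f}(x)=\frac{1}{m}\sum_i f(x|y_i)\) of the X-marginal, averaging the conditional normal densities over the retained Y draws:

xs <- seq(mu_x-4*sd_x, mu_x+4*sd_x, length.out=200)
#average f(x|y_i) over the Y draws after burn-in
f_hat <- sapply(xs, function(x) mean(dnorm(x, mean=mu_x+r*(XY[501:Nsim,2]-mu_y)*sd_x/sd_y, sd=s1)))
plot(xs, f_hat, type='l', col='darkblue', xlab='x', ylab='density', main='Rao-Blackwellized estimate of f(x)')
curve(dnorm(x, mu_x, sd_x), add=TRUE, col='red', lty=2) #true N(mu_x, sd_x^2) marginal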
Multistage Gibbs sampler
Given the normal target \(N_p(0,(1-\rho)I+\rho J)\), we produce a Gibbs sampler using the conditional distributions \[X_i|x_{(-i)} \sim N\left( \frac{(p-1)\rho}{1+(p-2)\rho} \overline{x}_{(-i)},\ \frac{1+(p-2)\rho-(p-1)\rho^2}{1+(p-2)\rho} \right),\] where \(x_{(-i)} = (x_1, x_2, ..., x_{i-1}, x_{i+1},..., x_p)\) and \(\overline{x}_{(-i)}\) is the mean of the vector \(x_{(-i)}\); here \(p=5\) and \(\rho=0.25\).
The R implementation:
T=5000; p=5; r=0.25
X=cur=rnorm(p)
for (t in 1:T){
for (j in 1:p){
m=sum(cur[-j])/(p-1)
cur[j]=rnorm(1,(p-1)*r*m/(1+(p-2)*r), sqrt((1+(p-2)*r-(p-1)*r^2)/(1+(p-2)*r)))
}
X=cbind(X,cur)
}
#Distributions of X's
par(mfrow=c(2,3))
for (i in 1:p){
hist(X[i,],prob=TRUE,col="wheat2",xlab="",main="")
curve(dnorm(x),add=TRUE,col="sienna",lwd=2)
}
#Whether X's converge
cl <- c('darkblue', 'darkgreen', 'darkred', 'sienna', 'wheat')
par(mfrow=c(2,3))
for (i in 1:p){
plot(X[i,501:5000],type='l',col=cl[i],main=paste('Plot of X_',i), ylab=paste('X_',i),xlab='iteration')
}
We can see that all five marginals are close to the standard normal density and that the trace plots indicate the \(X_i\)'s have converged.
Missing data with \(N(\theta,1)\)
The likelihood function: \[g(x|\theta)=L(\theta|x) \propto \prod_{i=1}^{m}e^{\frac{-(x_i-\theta)^2}{2}}\]
and \[f(x,z|\theta)=L(\theta|x,z) \propto \prod_{i=1}^{m}e^{\frac{-(x_i-\theta)^2}{2}} \prod_{i=m+1}^{n}e^{\frac{-(z_i-\theta)^2}{2}}\]
is the complete-data likelihood. Given a prior distribution \(\pi(\theta)\) on \(\theta\), we can then create a Gibbs sampler that iterates between the conditional distributions \(\pi(\theta|x,z)\) and \(f(z|x,\theta)\) and has stationary distribution \(\pi(\theta,z|x)\), the posterior distribution of \((\theta,z)\).
Taking a noninformative prior \(\pi(\theta)=1\), the conditional distribution of \(\theta|x,z\) is given by \[\theta|x,z \sim N \left(\frac{m\overline{x}+(n-m)\overline{z}}{n},\frac{1}{n}\right),\] while the conditional distribution of \(Z|x,\theta\) is the product of the truncated normals \(Z_i|x,\theta \sim \varphi(z-\theta) / \{1-\Phi(a-\theta)\}\), where \(\varphi\) and \(\Phi\) are the standard normal pdf and cdf, respectively. Each \(Z_i\) must be greater than the truncation point \(a\).
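In the code below, each truncated normal \(Z_i\) is simulated by inverting the cdf: with \(U_i \sim U(\Phi(a-\theta),\,1)\), the variable \[Z_i = \theta + \Phi^{-1}(U_i)\] follows the \(N(\theta,1)\) distribution restricted to \((a,\infty)\); this is what the runif/qnorm pair computes.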
Here is the R implementation of the Gibbs sampler used to estimate the posterior distribution of \(\theta.\)
x <- rnorm(20, mean=3)
m <- length(x)
n <- 30;a <- 3.5 #1/3 missing data
nsim <- 10^4
xbar <- mean(x)
#outputs
th <- array(xbar,dim=c(nsim,1))
zbar <- array(a,dim=c(nsim,1))
for (i in 2:nsim){
#get probabilities to sample z values
temp <- runif(n-m, min=pnorm(a,mean=th[i-1],sd=1), max=1)
#calculate zbar from sampling of z
zbar[i] <- mean(qnorm(temp, mean=th[i-1], sd=1))
#sample theta with posterior distribution
th[i] <- rnorm(1, mean=(m*xbar+(n-m)*zbar[i])/n, sd=sqrt(1/n))
}
par(mfrow=c(1,2))
#burn-in is 500, we start drawing histogram from the 500th iteration
hist(th[500:nsim],col="grey",breaks=25,xlab=expression(theta),main="",freq=FALSE)
curve(dnorm(x,mean(th),sd=sd(th)),add=T,lwd=2)
#Latent variables
hist(zbar[500:nsim],col="wheat",breaks=25,main="",xlab= expression(bar(Z)),freq=FALSE)
curve(dnorm(x,mean(zbar),sd=sd(zbar)),add=T,lwd=2, col='wheat4')
Both \(\theta\) and \(\bar{Z}\) have approximately normal distributions.
Given a density of interest, \(f_X(x),\) we can always represent it as the marginal density of the joint density \[f(x,u)=I\;\{0\,< u\,<f_X(x)\},\] since integrating the above in \(u\) returns \(f_X\): \(\int_0^{f_X(x)} \mathrm{d}u = f_X(x).\)
The associated conditional densities are \[f_{X|U}(x|u)=\frac{I\,\{0< u <f_X(x)\}}{\int I\,\{0< u <f_X(x)\}\,\mathrm{d}x},\qquad f_{U|X}(u|x)=\frac{I\,\{0< u <f_X(x)\}}{\int I\,\{0< u <f_X(x)\}\,\mathrm{d}u},\] which means that both conditionals are uniform.
Those two conditionals then define the slice sampler as the associated Gibbs sampler.
Algorithm: 2D Slice-sampler:
At iteration \(t,\) simulate
\(U^{(t+1)} \sim U_{[0,\;f(x^{(t)})]};\)
\(X^{(t+1)} \sim U_{A^{(t+1)}}\), with \[ A^{(t+1)} \; = \; \{ x:\;f(x) \ge \; u^{(t+1)}\}.\]
The appeal of this algorithm is that it formally applies to any density known up to a multiplicative constant with no restriction on its shape or dimension. Obviously, its implementation may be hindered by the uniform simulation over the set \(A^{(t)}.\)
Example: Slice sampler
Consider the density \(f(x)=\frac{1}{2}e^{-\sqrt{x}}\) defined for \(x>0\). While it can be simulated directly, it also yields easily to the slice sampler. Indeed, applying the formulas above (and noting that \(u < \frac{1}{2}e^{-\sqrt{x}}\) is equivalent to \(x < [\log(2u)]^2\)), we have \[U|x \sim U\left(0,\;\tfrac{1}{2}e^{-\sqrt{x}}\right),\quad X|u \sim U\left(0,\;[\log(2u)]^2\right).\]
We implement the sampler to generate 5000 variates and plot them along with the density in Figure below, which shows that the agreement is very good. The right panel does show some strong autocorrelations, which is typical of the slice sampler.
par(mfrow=c(1,1))
fff=function(y)exp(-sqrt(y))/2;
nsim=5000;x=rep(pi,nsim);x[1]=-log(runif(1))
for (i in 2:nsim){
w=runif(1,min=0,max=fff(x[i-1]));x[i]=runif(1,min=0,max=(-log(2*w))^2)
}
X11(h=3.5);par(mfrow=c(1,2),mar=c(4,4,2,1))
hist(x,xlab="Slice Sampler",main="",xlim=c(0,40),ylim=c(0,.25),freq=FALSE,col="grey",breaks=250)
curve(fff,add=TRUE,lwd=3,col="sienna",xlim=c(0,40))
acf(x,xlab="Autocorrelation",ylab="",lwd=3)
If \[p(x) \propto \prod_{j=1}^{k}\;f_j(x),\qquad f_j(x)\ge 0, \qquad 1\le j\le k,\] then \(p(x)\) can be “completed” into \[g(z_1, z_2,...,z_k, x)\; \propto \; \prod_{j=1}^{k} \mathbf{I}_{\{0\le z_j \le f_j(x)\}}.\]
Each conditional p.d.f. is a uniform distribution!
Algorithm: General slice sampler
At iteration \(i+1,\) given \(x^{(i)},\) generate
\(Z_j^{(i+1)} \sim U\left(0,\; f_j(x^{(i)})\right)\) for \(j=1,2,...,k;\)
\(X^{(i+1)} \sim U_{A^{(i+1)}}\), where \[A^{(i+1)}=\{y\;|\;f_j(y)\;\ge\;Z_j^{(i+1)},\; j=1,2,...,k\}\]
\[p(\theta|\mathbf{y}) \propto (2+\theta)^{125} (1-\theta)^{38} \theta^{34}\] Goal: Estimation of its posterior mean via Slice/Gibbs Sampler
\[f_1(\theta)\;=\;(2+\theta)^{125}, \qquad f_2(\theta)\;=\;(1-\theta)^{38}, \qquad f_3(\theta)\;=\;\theta^{34}.\]
Starting with any \(0 < \theta = \theta_0 < 1,\) repeat steps \((1)-(5)\):
Generate \(Z_1 \sim U(0, (2+\theta)^{125})\)
Generate \(Z_2 \sim U(0, (1-\theta)^{38})\)
Generate \(Z_3 \sim U(0, \theta^{34})\)
For \(Z_1,\; Z_2,\; Z_3\) generated, find the range of \(\theta:\) \[\theta \ge Z_1^{1/125}-2, \qquad \theta \le 1-Z_2^{1/38}, \qquad \theta \ge Z_3^{1/34}.\]
Generate \(\theta^{new} \sim U(A, B),\) where \((A, B)\) is \[A\;=\;max(Z_1^{1/125}\;-2,\; Z_3^{1/34}), \qquad B\;=\;1-Z_2^{1/38}.\]
Three initial values of \(\theta\) tested: \(0.05,\; 0.50,\; 0.95.\)
Nsim <- 5000
slice_sampler <- function (th) {
Th <- rep(0,Nsim)
Th[1] <- th
for (i in 2:Nsim) {
z1 <- runif(1,0,(2+Th[i-1])^125)
z2 <- runif(1,0,(1-Th[i-1])^38)
z3 <- runif(1,0,(Th[i-1])^34)
Th[i] <- runif(1, max(z1^(1/125)-2, z3^(1/34)), 1-z2^(1/38))
}
return(Th)
}
Th_1 <- slice_sampler(0.05)
Th_2 <- slice_sampler(0.5)
Th_3 <- slice_sampler(0.95)
plot(NULL, xlim=c(0,Nsim), ylim=c(0,1), ylab="", xlab="", main='Slice Sampler with three initial values')
abline(h=0.6228061319, col='black')
lines(cumsum(Th_1)/(1:Nsim), col='blue')
lines(cumsum(Th_2)/(1:Nsim), col='green')
lines(cumsum(Th_3)/(1:Nsim), col='red')
legend(3700,0.3, c('true mean','initial value=0.05','initial value=0.5','initial value=0.95'),lty=c(1,1,1),col=c('black','blue','green','red'),cex=0.7)
All three values converge to:
\[E(\theta|\mathbf{y})=\frac{\int_0^1 (2+\theta)^{125}(1-\theta)^{38}\theta^{34}\theta\, \mathrm{d}\theta}{{\int_0^1 (2+\theta)^{125}(1-\theta)^{38}\theta^{34}\, \mathrm{d}\theta}} \;=\; 0.6228061319\]
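As a quick numerical check (a sketch using R's integrate, not part of the original code), the ratio of the two integrals can be evaluated directly:

post_kernel <- function(t) (2+t)^125*(1-t)^38*t^34
num <- integrate(function(t) t*post_kernel(t), 0, 1)$value
den <- integrate(post_kernel, 0, 1)$value
num/den #approximately 0.6228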