R-hat against iterations in RStan

I am trying to generate a plot similar to the one below, showing how R-hat changes over iterations:
I have tried the following options:
summary(fit1)$summary: gives R-hat with all chains merged
summary(fit1)$c_summary: gives R-hat for each chain individually
Can you please help me get R-hat at each iteration for a given parameter?

rstan provides the Rhat() function, which takes a matrix of iterations x chains and returns R-hat. We can extract this matrix from the fitted model and apply Rhat() cumulatively over it. The code below uses the 8 schools model as an example (copied from the getting started guide).
library(tidyverse)
library(purrr)
library(rstan)
theme_set(theme_bw())
# Fit the 8 schools model.
schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
fit <- stan(file = 'schools.stan', data = schools_dat)
# Extract draws for mu as a matrix; rows are iterations and columns are chains.
mu_draws <- as.array(fit)[, , "mu"]
# Get the cumulative R-hat as of each iteration.
# drop = FALSE keeps the subset a matrix even when i = 1.
mu_rhat <- map_dfr(
  1:nrow(mu_draws),
  function(i) {
    data.frame(iteration = i,
               rhat = Rhat(mu_draws[1:i, , drop = FALSE]))
  }
)
# Plot iteration against R-hat.
mu_rhat %>%
  ggplot(aes(x = iteration, y = rhat)) +
  geom_line() +
  labs(x = "Iteration", y = expression(hat(R)))
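To see the cumulative idea without fitting a Stan model, here is a minimal sketch in base R. It is an illustration under assumptions, not what rstan does internally: the draws are simulated, classic_rhat() is a hypothetical helper implementing the classic (non-split) Gelman-Rubin formula, and rstan's Rhat() uses a rank-normalized split variant, so its values may differ.

```r
set.seed(1)
n_iter <- 500
start <- c(-3, -1, 1, 3)  # hypothetical starting offsets, one per chain

# Simulated draws (iterations x chains): each chain drifts from its
# offset towards a common mean of 0, plus unit-variance noise.
draws <- sapply(start, function(s) s * exp(-(1:n_iter) / 50) + rnorm(n_iter))

# Classic Gelman-Rubin R-hat for a matrix of iterations x chains.
classic_rhat <- function(sims) {
  n <- nrow(sims)
  W <- mean(apply(sims, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(sims))     # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

# Cumulative R-hat, evaluated every 10 iterations.
iters <- seq(10, n_iter, by = 10)
rhat_path <- data.frame(
  iteration = iters,
  rhat = sapply(iters, function(i) classic_rhat(draws[1:i, , drop = FALSE]))
)

plot(rhat_path$iteration, rhat_path$rhat, type = "l",
     xlab = "Iteration", ylab = expression(hat(R)))
abline(h = 1, lty = 2)
```

Evaluating on a coarser grid of iterations (every 10th here) keeps the loop cheap and avoids the undefined variance of a single draw.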

Related

Multiply probability distributions in R [closed]

I'm trying to multiply some probability density functions so as to update a probability given certain factors. I've tried several things using the pdqr and bayesmeta packages, but none of them work the way I intend. What am I missing?
Below is a reproducible example with two different distributions, a and b, which I want to multiply. As you can see, b has no measurements at the low values, so its probability there is 0. This should be reflected in the updated distribution.
library(tidyverse)
library(pdqr)
library(bayesmeta)
#measurements
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)
#create probability distribution functions
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
#try to combine distributions
summarized <- distr_a + distr_b
multiplied <- distr_a * distr_b
mixture <- form_mix(list(distr_a, distr_b))
convolution <- convolve(distr_a, distr_b)
The resulting PDFs are plotted like this:
bayesmeta::convolve() does the same as summing two pdqr PDFs; both seem to shift the distribution oddly to the right and make it lower than I expected.
Ordinary multiplication of the pdqr PDFs leaves a very low probability overall.
Using pdqr::form_mix() seems to even out the PDFs in between, but leaves probabilities above 0 for the lower x-values.
So, to gain some insight into what I wanted to do, I used the PDFs for a and b to generate probabilities for each x value and multiplied those:
#multiply distributions manually
x <- c(1:10)
manual <- data.frame(x) %>%
mutate(a = distr_a(x),
b = distr_b(x),
multiplied = a*b)
This indeed gives the shape I am after; however, it (logically) has probabilities that are too low:
I would like to multiply (multiple) PDFs. What am I doing wrong? Are my statistics wrong, or am I missing a useful function?
UPDATE:
It seems I am a stats noob on this subject, but I would like to achieve something like the distribution below. Given that both situations a and b are true, I would expect the distribution to be something like the dotted line. Is that possible?
multiplied is the correct one. One can check with log-normal distributions. The product of two independent log-normal random variables is log-normal with µ = µ_a + µ_b and sigma² = sigma²_a + sigma²_b.
a <- rlnorm(25000, meanlog = 0, sdlog = 1)
b <- rlnorm(25000, meanlog = 1, sdlog = 1)
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
distr_ab <- form_trans(
list(distr_a, distr_b), trans = function(x, y) x*y
)
# or: distr_ab <- distr_a * distr_b
plot(distr_ab, xlim = c(0, 40))
curve(dlnorm(x, meanlog = 1, sdlog = sqrt(2)), add = TRUE, col = "red")
As demonstrated here:
https://www.r-bloggers.com/2019/05/bayesian-models-in-r-2/
# Example distributions
probs <- seq(0, 1, length.out = 100)
# Binomial likelihood of 8 successes in 10 trials, as a function of the probability
lik <- dbinom(x = 8, prob = probs, size = 10)
# Normal prior on the probability
prior <- dnorm(x = probs, mean = .5, sd = .1)
# Multiply distributions
unstdPost <- lik * prior
# If you wanted to get an actual posterior, it must be a probability
# distribution (integrate to 1), so we can divide by the sum:
stdPost <- unstdPost / sum(unstdPost)
# Plot
plot(probs, lik, col = "black",
     type = "l", xlab = "P(Black)", ylab = "Density")
lines(probs, prior / 15, col = "red")  # rescaled
lines(probs, unstdPost, col = "green")
lines(probs, stdPost, col = "blue")
legend("topleft", legend = c("Lik", "Prior", "Unstd Post", "Post"),
       text.col = c("black", "red", "green", "blue"), bty = "n")
Created on 2022-08-06 by the reprex package (v2.0.1)
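Applied to the measurements a and b from the question, the same multiply-then-renormalize step can be sketched as follows. This is a rough illustration under assumptions: base R's density() stands in for pdqr::new_d(), the grid range and resolution are arbitrary choices, and dens_on_grid() and trapz() are hypothetical helpers.

```r
# Measurements from the question.
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)

# Common evaluation grid (range and resolution are assumptions).
grid <- seq(0, 12, length.out = 481)
step <- grid[2] - grid[1]

# Kernel density estimate evaluated on the grid.
dens_on_grid <- function(x, grid) {
  density(x, from = min(grid), to = max(grid), n = length(grid))$y
}

# Trapezoid rule on an equally spaced grid.
trapz <- function(y, h) h * (sum(y) - (y[1] + y[length(y)]) / 2)

pa <- dens_on_grid(a, grid)
pb <- dens_on_grid(b, grid)

# The pointwise product has the right shape but does not integrate to 1 ...
unnorm <- pa * pb
# ... so divide by its integral to turn it back into a density.
combined <- unnorm / trapz(unnorm, step)

# Location of the peak of the combined density.
grid[which.max(combined)]
```

Dividing by the integral (rather than by the sum, as above) keeps the result on a density scale, so it can be plotted directly against the two input densities.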

Reconstruct Time Series from FFT frequency and strength data using R

After applying a Fourier Transform to an EEG measurement, I want to compare the approximation by FFT with the original signal in the form of a plot. I have to convert the data (frequency and strength) from the FFT back to a time series.
To transform the original time series I use the eegfft method of the eegkit package. I get a list of frequencies and amplitudes to approximate the original signal.
Here the two results of the FFT are shown as shortened examples:
# Frequency in Hz
freq <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Strength in uV
ampl <- c(4.1135352, 5.1272713, 3.2069741, 1.5336438, 2.4301334, 1.0974758, 1.8238327, 0.9637886, 1.1401306, 0.2224472)
Is there a package or method that I can use to reconstruct a time series from the frequency and amplitude data that has been approximated by FFT?
EDIT:
For the reconstruction of the original signal, do I also need the phase information that the eegfft method returns in the result?
# Phase shift in range -pi to pi
phase <- c(0.0000000, 1.1469542, -2.1930702, 2.7361738,1.1597980, 2.6118647, -0.6609641, -2.1508755,1.6584852, -1.2906986)
I expect something like this should work.
Edit: I have set phases to default to zero if missing and not passed into data_from_fft.
freq <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
ampl <- c(4.1135352, 5.1272713, 3.2069741, 1.5336438, 2.4301334, 1.0974758, 1.8238327, 0.9637886, 1.1401306, 0.2224472)
phase <- c(0.0000000, 1.1469542, -2.1930702, 2.7361738,1.1597980, 2.6118647, -0.6609641, -2.1508755,1.6584852, -1.2906986)
sampl_freq <- 1000
data_from_fft <- function(xmin, xmax, sample_freq,
                          frequencies, amplitudes, phases = 0) {
  x_vals <- seq(xmin, xmax, length.out = sample_freq * (xmax - xmin))
  y_vals <- x_vals * 0
  for (i in seq_along(x_vals)) {
    # Note, I don't understand why the pi/2 phase adjustment is needed here,
    # but I couldn't get the right answers out of eegfft without it... :-(
    y_vals[i] <- sum(amplitudes * sin(2*pi*frequencies * x_vals[i] + phases + pi/2))
  }
  data.frame(x_vals, y_vals)
}
library(tidyverse)
plot_from_FFT <- data_from_fft(0, 1, sampl_freq, freq, ampl, phase)
ggplot(plot_from_FFT, aes(x_vals, y_vals)) +
geom_line()
Now, let's see if we can use that output to reconstruct the inputs:
eegkit::eegfft(plot_from_FFT$y_vals, lower = 1, upper = 20, Fs = sampl_freq) %>%
filter(abs(strength) > 0.1)
frequency strength phase.shift
1 1 4.1158607 0.004451123
2 2 5.1177070 1.154553861
3 3 3.2155744 -2.185185998
4 4 1.5319350 2.739953054
5 5 2.4283426 1.173258629
6 6 1.0813858 2.645126993
7 7 1.8323207 -0.644216053
8 8 0.9598727 -2.138381646
9 9 1.1427380 1.685081744
10 10 0.2312619 -1.265466418
Yes! These are pretty close to the inputs.
eegkit::eegfft(plot_from_FFT$y_vals, lower = 1, upper = 20, Fs = sampl_freq) %>%
filter(abs(strength) > 0.1) %>%
left_join(
tibble(frequency = freq,
strength_orig = ampl,
phase_orig = phase)
) %>%
gather(stat, value, -frequency) %>%
mutate(category = if_else(stat %>% str_detect("str"), "strength", "phase"),
version = if_else(stat %>% str_detect("orig"), "plot inputs", "reconstructed inputs"),) %>%
ggplot(aes(frequency, value, shape = version, size = version)) +
geom_point() +
scale_x_continuous(breaks = 1:10, minor_breaks = NULL) +
scale_shape_manual(values = c(16, 21)) +
scale_size_manual(values = c(1,5)) +
facet_wrap(~category)
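A note on the pi/2 adjustment: sin(theta + pi/2) equals cos(theta), so the reconstruction above is a sum of cosines. The sketch below checks with base R's fft() that amplitudes and phases read off a discrete Fourier transform are cosine-referenced. It assumes exactly one second of data sampled at 1000 Hz, so that bin k + 1 corresponds to k Hz; that eegfft follows the same convention is suggested by the results above, but not verified here.

```r
freq  <- 1:10
ampl  <- c(4.1135352, 5.1272713, 3.2069741, 1.5336438, 2.4301334,
           1.0974758, 1.8238327, 0.9637886, 1.1401306, 0.2224472)
phase <- c(0.0000000, 1.1469542, -2.1930702, 2.7361738, 1.1597980,
           2.6118647, -0.6609641, -2.1508755, 1.6584852, -1.2906986)

fs <- 1000                      # sampling frequency in Hz
n  <- 1000                      # one second of samples
time_s <- (0:(n - 1)) / fs      # excludes the endpoint, so spacing is exactly 1/fs

# Build the signal as a sum of cosines with the given amplitudes and phases.
components <- sapply(seq_along(freq), function(k) {
  ampl[k] * cos(2 * pi * freq[k] * time_s + phase[k])
})
signal <- rowSums(components)

# Read amplitude and phase back from the DFT bins of interest.
spec <- fft(signal)
bins <- freq * n / fs + 1       # bin 1 is 0 Hz
ampl_rec  <- 2 * Mod(spec[bins]) / n
phase_rec <- Arg(spec[bins])

# Rebuild the time series from the recovered values.
rebuilt <- rowSums(sapply(seq_along(freq), function(k) {
  ampl_rec[k] * cos(2 * pi * freq[k] * time_s + phase_rec[k])
}))

max(abs(rebuilt - signal))      # numerically zero
```

Using (0:(n - 1)) / fs for the time axis, rather than a sequence that includes both endpoints, puts each frequency exactly on a DFT bin, which is why the round trip is exact here and only approximate above.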

How to bootstrap the result of arithmetic operation of regression coefficients of two models

I want to bootstrap the ratio of the outputs of two regression models to get a confidence interval for the mean of that ratio.
#creating sample data
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive = 20-numdead)
dat<-data.frame(ldose, numdead, sex, SF)
tibble::rowid_to_column(dat, "indices")
#creating the function to be bootstrapped
out<-function(dat) {
d<-data[indices, ] #allows boot to select sample
fit1<- glm(SF ~ sex*ldose, family = binomial (link = log), start=c(-1,0,0,0))
fit2<- glm(SF ~ sex*ldose, family = binomial (link = log), start=c(-1,0,0,0))
coef1<-coef(fit1)
numer<-exp(coef1[2])
coef2<-coef(fit2)
denom<-exp(coef2[2])
resultX<-numer/denom
return(mean(resultX))
}
#doing bootstrap
results <- boot(dat, out, 1000)
#error message
Error in statistic(data, original, ...) : unused argument (original)
Thanks in advance for any help.
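The error arises because boot() calls the statistic as statistic(data, indices), while out() accepts a single argument. A minimal sketch of the expected shape follows, with loud assumptions: ratio_stat() is a hypothetical name, the two models are made different on purpose (the question fits the same model twice, so the ratio would always be 1), and the default logit link is swapped in for the log link because log-binomial fits can fail on resampled data.

```r
library(boot)

# Sample data from the question, with the response columns kept in the data frame.
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
dat <- data.frame(ldose, numdead, numalive = 20 - numdead, sex)

# The statistic takes the data and the row indices chosen by boot().
ratio_stat <- function(data, indices) {
  d <- data[indices, ]  # resampled rows
  # Both models are fitted to the resampled data via data = d.
  fit1 <- glm(cbind(numdead, numalive) ~ sex + ldose, family = binomial, data = d)
  fit2 <- glm(cbind(numdead, numalive) ~ sex * ldose, family = binomial, data = d)
  unname(exp(coef(fit1)["sexM"]) / exp(coef(fit2)["sexM"]))
}

set.seed(42)
# Some resamples trigger glm warnings (e.g. fitted probabilities of 0 or 1).
results <- suppressWarnings(boot(dat, ratio_stat, R = 200))

# Simple percentile interval from the bootstrap replicates.
boot_ci <- quantile(results$t, c(0.025, 0.975), na.rm = TRUE)
boot_ci
```

With only 12 rows, some resamples are degenerate, which is why the interval is computed with na.rm = TRUE; resampling individual animals rather than dose groups may be a better design, but that goes beyond this sketch.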

R: how to get optim to output parameter values at each iteration

library(stats4)
x <- 0:10
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
## Easy one-dimensional MLE:
nLL <- function(lambda) -sum(stats::dpois(y, lambda, log = TRUE))
fit0 <- mle(nLL, start = list(lambda = 5), nobs = NROW(y), method = "L-BFGS-B")
This is a toy example from mle's documentation. The optimization method I chose to use is L-BFGS-B. I'm interested in seeing the lambda values at different iterations.
Looking into optim's help page, I tried adding trace = TRUE. But that seems to give me the likelihood at each iteration and not the lambda values.
> fit0 <- mle(nLL, start = list(lambda = 5), nobs = NROW(y), method = "L-BFGS-B", control = list(trace = TRUE))
final value 42.726780
converged
How can I obtain the lambda estimates at each iteration?
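One way to see the values tried is to wrap the objective so that it records every lambda it is called with. The sketch below calls optim() directly instead of going through mle(); trace_env and nLL_traced are hypothetical names, and the lower bound is an added assumption to keep lambda positive. Note that this records every function evaluation, including those used for the finite-difference gradient, so there are more entries than iterations.

```r
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) -sum(stats::dpois(y, lambda, log = TRUE))

# Environment used to collect the lambda values as a side effect.
trace_env <- new.env()
trace_env$lambda <- numeric(0)

# Wrapper that logs its argument, then evaluates the real objective.
nLL_traced <- function(lambda) {
  trace_env$lambda <- c(trace_env$lambda, lambda)
  nLL(lambda)
}

fit <- optim(par = 5, fn = nLL_traced, method = "L-BFGS-B", lower = 1e-6)

fit$par                  # close to mean(y), the Poisson MLE
head(trace_env$lambda)   # lambda values in the order they were evaluated
```

Supplying an analytic gradient through the gr argument would remove the finite-difference evaluations from the log, leaving only the points visited by the line search.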

Optimizing an optimization

I have a discrete data set with multiple peaks. I am trying to generate an automatic method for fitting a Gaussian curve to an unknown number of data points. The ultimate goal is to provide a measure of uncertainty on the location (x-axis) of the peak in the y-axis, using the sigma value of a best-fit Gaussian curve. The full data set has a half dozen or so unique peaks of various shapes.
Here is a sample data set.
# One age per likelihood value.
likelihood <- c(10, 10, 10, 10, 10, 12, 14, 16, 17, 18,
                19, 20, 19, 18, 17, 16, 14, 12, 11, 10,
                10, 9, 8, 8, 8, 8, 7, 6, 6, 6)
working <- data.frame(age = seq_along(likelihood), likelihood = likelihood)
Here is the Gaussian fitting procedure. I found it on SO, but I can't find the page I took it from again, so please forgive the lack of link and citation.
fitG <- function(x, y, mu, sig, scale) {
  # Sum of squared differences between the scaled Gaussian and the data.
  f <- function(p) {
    d <- p[3] * dnorm(x, mean = p[1], sd = p[2])
    sum((d - y)^2)
  }
  optim(c(mu, sig, scale), f)
}
This works well if I pre-define the area to fit. For instance taking only the area around the peak and using input mean = 10, sigma = 5, and scale = 1:
work2 <- working[5:20, ]
fit1 <- fitG(work2$age, work2$likelihood, 10, 5, 1)
fitpar1 <- fit1$par
plot(work2$age, work2$likelihood, pch = 20)
lines(work2$age, fitpar1[3]*dnorm(work2$age, fitpar1[1], fitpar1[2]))
However, I am interested in automating the procedure in some way, where I define the peak centers for the whole data set using peakwindow from the cardidates package. The ideal function would then iterate the number of data points used in the fit around a given peak in order to optimize the Gaussian parameters. Here is my attempt:
fitG.2 <- function (x, y) {
g <- function (z) {
newdata <- x[(y - 1 - z) : (y + 1 + z), ]
newfit <- fitG( newdata$age, newdata$likelihood, 10, 5, 1)
}
optimize( f = g, interval = c(seq(1, 100)))
}
However, I can't get this type of function to actually work (an error I can't solve). I have also tried creating a function with a for loop and setting break parameters but this method does not work consistently for peaks with widely varying shape parameters. There are likely many other R functions unknown to me that do exactly this.
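Two things stand out in the attempt above: optimize() searches a continuous interval given by two endpoints and needs a single number back from f, whereas the window half-width is an integer and g returns a whole optim result. A plain scan over candidate half-widths avoids both problems. The sketch below is one possible approach under assumptions: fit_gauss() and scan_windows() are hypothetical names, the starting scale and the 3:10 range are guesses, and the residual standard error is just one possible selection criterion.

```r
likelihood <- c(10, 10, 10, 10, 10, 12, 14, 16, 17, 18,
                19, 20, 19, 18, 17, 16, 14, 12, 11, 10,
                10, 9, 8, 8, 8, 8, 7, 6, 6, 6)
working <- data.frame(age = seq_along(likelihood), likelihood = likelihood)

# Least-squares fit of a scaled Gaussian; abs() keeps the sd positive.
fit_gauss <- function(x, y, mu, sig, scale) {
  sse <- function(p) sum((p[3] * dnorm(x, mean = p[1], sd = abs(p[2])) - y)^2)
  optim(c(mu, sig, scale), sse)
}

# Fit one Gaussian per candidate half-width around a given peak row.
scan_windows <- function(data, peak_row, half_widths) {
  fits <- lapply(half_widths, function(z) {
    rows <- max(1, peak_row - z):min(nrow(data), peak_row + z)
    d <- data[rows, ]
    fit <- fit_gauss(d$age, d$likelihood,
                     mu = data$age[peak_row], sig = 5,
                     scale = sum(d$likelihood))  # rough area under the window
    data.frame(half_width = z, n = nrow(d),
               mu = fit$par[1], sigma = abs(fit$par[2]),
               # Residual standard error, penalising the 3 fitted parameters.
               rse = sqrt(fit$value / (nrow(d) - 3)))
  })
  do.call(rbind, fits)
}

peak_row <- which.max(working$likelihood)
scan <- scan_windows(working, peak_row, half_widths = 3:10)
best <- scan[which.min(scan$rse), ]
best
```

Looking at how mu and sigma change across the rows of scan is often as informative as the single best row, since a stable sigma over several window sizes suggests the fit is not driven by the choice of window.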

Resources