When plotting runjags output, how does one plot a single specific variable when many other variables have similar names? Providing a quoted variable name with the vars argument doesn't seem to do it (it still returns all partial matches).
Here is a simple reproducible example.
N <- 200
nobs <- 3
psi <- 0.35
p <- 0.45
z <- rbinom(n=N, size=1,prob=psi)
y <- rbinom(n=N, size=nobs,prob=p*z)
sink("model.txt")
cat("
model {
for (i in 1:N){
z[i] ~ dbern(psi)
pz[i] <- z[i]*p
y[i] ~ dbin(pz[i],nobs)
} #i
psi ~ dunif(0,1)
p ~ dunif(0,1)
}
",fill = TRUE)
sink()
m <-list(y=y,N=N,nobs=nobs)
inits <- function(){list(psi=runif(1),p=runif(1),z=as.numeric(y>0))}
parameters <- c("p","psi")
ni <- 1000
nt <- 1
nb <- 200
nc <- 3
ad <- 100
library(runjags)
out <- run.jags(model="model.txt",monitor=parameters,data=m,n.chains=nc,inits=inits,burnin=nb,
sample=ni,adapt=ad,thin=nt,modules=c("glm","dic"),method="parallel")
dev.new(width = 9, height = 4)  # windows(9,4) only works on Windows
plot(out,plot.type=c("trace","histogram"),vars="p",layout=c(1,2),new.window=FALSE)
It should be possible to wrap a variable name in double quotes to get an exact match (vars='"p"'), but this seems to be broken. It should also be possible to pass a logical vector to vars, but this seems to be broken for the plot method... how embarrassing. The following does work, though:
# Generate a logical vector to use with matching variable names:
variables <- extract(out, 'stochastic')
variables['psi'] <- FALSE
# Add summary statistics only for the specified variables and pre-draw plots:
out2 <- add.summary(out, vars=variables, plots=TRUE)
plot(out2, plot.type=c("trace","histogram"))
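Why vars="p" also returns psi can be seen with plain string matching: "p" is a partial match for "psi". The sketch below builds an exact-match logical vector for an arbitrary set of names. exact_vars() is a hypothetical helper (not part of runjags), and the commented runjags lines assume, as variables['psi'] above implies, that extract(out, 'stochastic') returns a vector named by monitored variable.

```r
# Hypothetical helper (not part of runjags): TRUE only where a name matches exactly.
exact_vars <- function(all_names, wanted) {
  setNames(all_names %in% wanted, all_names)
}

monitored <- c("p", "psi")                               # the two monitored parameters above
substring_hits <- grepl("p", monitored, fixed = TRUE)    # TRUE TRUE: "p" also occurs in "psi"
exact_hits     <- exact_vars(monitored, "p")             # p = TRUE, psi = FALSE

# Assumed runjags usage, following the answer above:
# variables <- extract(out, 'stochastic')
# variables[] <- names(variables) %in% "p"
# out2 <- add.summary(out, vars = variables, plots = TRUE)
```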
I will fix the other issues for the next release.
Matt
I am using the robustarima package in R, and arima.rob() works fine when I call it in a script. However, I want to organize my files and therefore call arima.rob() inside a function, and there the variable is suddenly not found. Here is an example:
library(robustarima)
# Works fine
ts_list <- rnorm(100)
arima.rob(ts_list~1)
# Breaks down
get_rob_estimate <- function(x){
out <- arima.rob(x~1)
return(out)
}
ts_list <- rnorm(100)
get_rob_estimate(ts_list)
Error in eval(formula[[2]]) : object 'x' not found
Does anyone know what's going on? The problem looks similar to "R: Pass argument to glm inside an R function", but I still can't figure it out, and I am curious how R evaluates calls inside functions.
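The error message points at eval(formula[[2]]): arima.rob appears to evaluate the left-hand side of the formula with a bare eval(), so the lookup starts in arima.rob's own frame and then falls through to its enclosing environments (namespace, global environment, search path), never the frame of the function that called it. That is an assumption read off the error message, not a reading of the robustarima source. The sketch reproduces the behaviour with a hypothetical stand-in, naive_fit(), and contrasts it with scoped_fit(), which evaluates in environment(formula), the frame where the formula was created.

```r
# Hypothetical stand-in mimicking what the error message suggests arima.rob does:
# a bare eval() looks in naive_fit's own frame, then its enclosing environments.
naive_fit <- function(formula) {
  y <- eval(formula[[2]])
  mean(y)
}

# Variant that evaluates the left-hand side where the formula was created.
scoped_fit <- function(formula) {
  y <- eval(formula[[2]], envir = environment(formula))
  mean(y)
}

ts_global <- c(1, 2, 3)
direct <- naive_fit(ts_global ~ 1)            # works: ts_global is a global variable

wrap_naive  <- function(vals) naive_fit(vals ~ 1)
wrap_scoped <- function(vals) scoped_fit(vals ~ 1)

# naive_fit cannot see `vals`, which lives only in wrap_naive's frame
wrapped_naive  <- tryCatch(wrap_naive(c(4, 5, 6)), error = conditionMessage)
# scoped_fit finds `vals` through the formula's environment
wrapped_scoped <- wrap_scoped(c(4, 5, 6))
```

This is why the plain wrapper fails with "object 'x' not found": x exists only inside get_rob_estimate, which is not on arima.rob's lookup path.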
Edit
Okay, I now understand the basic fix, but not why it works. What if I have the following?
check_func <- function(ind_ts){
out <- substitute(arima.rob(ind_ts~1))
return(eval(out))
}
analyze_ts <- function(){
df <- MASS::mvrnorm(100, mu=c(0,0,0), Sigma=diag(c(1,1,1)))
p <- list()
for (i in ncol(df)){
sel <- df[,i]
check_func(sel)
p <- append(p, sel)
}
return(p)
}
analyze_ts()
I then get the error
Error in eval(formula[[2]]) : object 'sel' not found
How does this work, and what is going on here? I just want to pass my vector through my functions; that shouldn't be so hard, right? Shouldn't it work no matter how many function calls it goes through?
Using substitute()
get_rob_estimate <- function(x) {
out <- substitute(robustarima::arima.rob(x ~ 1))
return(eval(out))
}
get_rob_estimate(ts_list)
# Call:
# robustarima::arima.rob(formula = ts_list ~ 1)
#
# Regression Coefficients:
# (Intercept)
# 0.1032
#
# Degrees of freedom: 100 total; 99 residual
# Innovations standard deviation: 0.9832
#
# Number of outliers detected: 1
#
# Outlier index
# [1] 59
#
# Outlier type
# [1] "AO"
#
# Outlier impact
# [1] -3.0963
#
# Outlier t-statistics
# [1] 3.1493
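Why substitute() helps at the top level but not one call deeper can be reproduced with the same kind of hypothetical stand-in. As before, the assumption that arima.rob evaluates formula[[2]] with a bare eval() is inferred from the error message. substitute() rewrites the call to use the caller's expression: get_rob_estimate(ts_list) becomes arima.rob(ts_list ~ 1), and ts_list happens to be global, so the lookup succeeds. Inside analyze_ts() the call becomes arima.rob(sel ~ 1), and sel is local, which is exactly the "object 'sel' not found" error from the question. Inlining the value itself with bquote() sidesteps the name lookup in the stand-in. Whether arima.rob accepts a literal vector inside a formula has not been checked here.

```r
# Hypothetical stand-in: evaluates the formula's left-hand side with a bare eval().
naive_fit <- function(formula) {
  y <- eval(formula[[2]])
  mean(y)
}

# substitute()-based wrapper, shaped like the answer's version
check_stub <- function(ind_ts) eval(substitute(naive_fit(ind_ts ~ 1)))

ts_global <- c(1, 2, 3)
top_level <- check_stub(ts_global)   # call becomes naive_fit(ts_global ~ 1): found globally

analyze <- function() {
  sel <- c(7, 8, 9)
  check_stub(sel)                    # call becomes naive_fit(sel ~ 1): sel is local
}
nested <- tryCatch(analyze(), error = conditionMessage)

# bquote() splices the *value* of ind_ts into the call, so no name lookup is needed
check_inline <- function(ind_ts) eval(bquote(naive_fit(.(ind_ts) ~ 1)))
analyze_inline <- function() {
  sel <- c(7, 8, 9)
  check_inline(sel)
}
inlined <- analyze_inline()
```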
edit
You can write your analyze_ts() wrapper like so (it reuses check_func() from the question):
analyze_ts <- function(){
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1, 1, 1)))
p <- vector("list", ncol(df))  # one slot per column
for (i in seq_len(ncol(df))) {
sel <- df[,i]
p[[i]] <- check_func(sel)  # store each fitted model in its own slot
}
return(p)
}
Better, using lapply() (the \(i) shorthand for function(i) requires R >= 4.1.0):
analyze_ts <- function() {
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1,1,1)))
return(lapply(seq_len(ncol(df)), \(i) check_func(df[,i])))
}
Usage:
set.seed(42) ## for sake of reproducibility
analyze_ts()
Data:
set.seed(42)
ts_list <- rnorm(100)
I am trying to determine which factor exposures are significant, using a 5x60 matrix df_exposure (5 factors, 60 portfolios).
The exposures were first estimated by regressing the monthly returns of 60 portfolios on five "macroeconomic shocks" (TM2R, IPR, InfR, UnR, OilR) in the following loop:
library(broom)     # tidy()
library(sandwich)  # vcovHC()
# Assign variable names
TM2R <- data[,2]
IPR <- data[,3]
InfR <- data[,4]
UnR <- data[,5]
OilR <- data[,6]
# Run regression
for (i in 7:69){
model_name <- paste0("model_", i, ".csv")
model <- lm(data = data, data[[i]]~TM2R+IPR+InfR+UnR+OilR)
#Model coefficients
tidy1 <- tidy(model)
#Standard Errors in regression results
tidy1$vcov <- vcovHC(model, type = "HC1")
write.csv(tidy1, file = model_name)
}
I am using the following loop to count how many of these exposures (beta-hats) are significant:
alpha <- 0.05
for (i in 1:ncol(df_exposure)){
for (j in 1:nrow(df_exposure)){
# Store Beta_hat of asset "i" to shock "j"
beta_hat <- model[[i]]$coefficients[[j+1]]
df_exposure[j,i] <- beta_hat
#Store beta_hat if significant in df_significance
pval <- coef(summary(model[[i]]))[j+1, "pr(>|t|)"]
if(pval > alpha){
df_significance[j,i] <- 0
}else{
df_significance[j,i] <- beta_hat
}
}
}
However, R returns the error "Error: $ operator is invalid for atomic vectors".
I am unable to find a way to convert model[[i]] from an atomic vector to something else within a loop.
It may be important to note that R returns
is.atomic(model[[i]]) = TRUE
is.atomic(model) = FALSE
Any help would be appreciated. Is there also a way to count how many of the model coefficients are significant within the first loop?
Thanks :)
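A likely cause: the first loop reassigns model on every pass, so afterwards model is one lm object (the last fit). model[[i]] then returns the i-th component of that fit (coefficients, residuals, ...), which is an atomic vector, so model[[i]]$coefficients fails, and is.atomic(model[[i]]) is TRUE. Note also that the p-value column is "Pr(>|t|)" with a capital P. A minimal sketch that keeps every fit in a list and fills both matrices, using hypothetical simulated data (3 portfolios instead of 60, portfolio columns named port_1 to port_3) in place of the real data:

```r
set.seed(1)
shocks     <- c("TM2R", "IPR", "InfR", "UnR", "OilR")
portfolios <- paste0("port_", 1:3)   # hypothetical: 3 portfolios instead of 60
n_months   <- 120

# Hypothetical simulated data standing in for `data`
data <- as.data.frame(matrix(rnorm(n_months * length(shocks)), ncol = length(shocks),
                             dimnames = list(NULL, shocks)))
for (pf in portfolios) data[[pf]] <- 1.0 * data$TM2R + rnorm(n_months)

# Keep every fit in a list instead of overwriting a single `model` object
model <- lapply(portfolios, function(pf) lm(reformulate(shocks, response = pf), data = data))

alpha <- 0.05
df_exposure <- matrix(NA_real_, nrow = length(shocks), ncol = length(portfolios),
                      dimnames = list(shocks, portfolios))
df_significance <- df_exposure

for (i in seq_along(model)) {
  ct <- coef(summary(model[[i]]))      # columns: Estimate, Std. Error, t value, Pr(>|t|)
  beta_hat <- ct[shocks, "Estimate"]
  pval     <- ct[shocks, "Pr(>|t|)"]   # capital "P"
  df_exposure[, i]     <- beta_hat
  df_significance[, i] <- ifelse(pval > alpha, 0, beta_hat)
}

n_significant <- colSums(df_significance != 0)   # significant exposures per portfolio
```

The same coef(summary(...)) lookup could run inside the first loop right after each lm() call, which would answer the "within the first loop" part of the question.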
I am fairly new to programming in R, so I apologize if this question is too basic. I am trying to study the properties of OLS with error terms created by three different processes (i.e., normal1, normal2, and chi-square). I include these in a list, 'fun_list'.
I would like to run 1,000 (iter) regressions, each with sample size 500 (n), and save all 1,000 x 500 observations in one dataset (big_data) as well as the regression results (reg_results).
At the end of the program, I would like 1,000 regressions for each of the three processes (3,000 regressions in total). I have set up nested loops: the outer loop over the three functions and the inner loop over the 1,000 iterations. I am having trouble getting the program to loop through the three functions, because I am not sure how to refer to each element of the list inside the nested loop. Any help would be greatly appreciated!
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- rep(NA,iter)
sen <- rep(NA,iter)
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
#if I remove the following line (for (j ....)), I can get this to work by
#referencing each function independently (i.e., using 'en <- f1(n)').
for (j in fun_list) {
for (s in 1:iter) {
# en <- f1(n)
en <- fun_list[[1]]
x <- rnorm(n, 0, .5)
yn <- .3*x + en
#this is the part that saves the data#
dat <- data.frame(yn, x, en)
dat$s <- s
datalist[[s]] <- dat
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef (lm1n)[1]
b.hatn <- coef (lm1n)[2]
se.hatn <- se.coef (lm1n) [2]
##save them for each iteration
bn[s] = b.hatn
sen[s] = se.hatn
}
}
reg_results<- tibble(bn, sen)
big_data = do.call(rbind,datalist)
When using the loop, I get the following error:
Error in 0.3 * x + en : non-numeric argument to binary operator
I assume this is because I do not fully understand how to call each of the three functions in the list.
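The error can be reproduced in isolation: fun_list[[1]] extracts the function object itself, and arithmetic on a function fails with exactly this message. Calling the extracted element with (n) returns the numbers. A minimal sketch:

```r
f1 <- function(n) rnorm(n, 0, 2)
fun_list <- list(f1)

en <- fun_list[[1]]    # the function itself, not random numbers
msg <- tryCatch(0.3 * c(1, 2, 3) + en, error = conditionMessage)
# msg is "non-numeric argument to binary operator"

en_values <- fun_list[[1]](3)   # calling it with (n) returns n random numbers
```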
Here is a complete solution that incorporates the points discussed in the comments:
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- c()
sen <- c()
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
# loop over the three error-generating functions by index
for (j in seq_along(fun_list)) {
en <- fun_list[[j]]  # en is a function, so it must be called as en(n)
for (s in 1:iter) {
x <- rnorm(n, 0, .5)
random_part <- en(n)
yn <- .3*x + random_part
#this is the part that saves the data#
dat <- data.frame(yn, x, random_part)
dat$s <- s
dat$fun <- j  # record which error process generated this sample
datalist[[length(datalist) + 1]] <- dat  # append, so samples from earlier functions are not overwritten
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef(lm1n)[1]
b.hatn <- coef(lm1n)[2]
se.hatn <- se.coef(lm1n)[2]
##save them for each iteration
bn = c(bn,b.hatn)
sen = c(sen,se.hatn)
}
}
reg_results <- tibble(fun = rep(seq_along(fun_list), each = iter), bn, sen)
big_data = do.call(rbind,datalist)
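A possible alternative, sketched under the question's setup, avoids growing bn and sen with c() inside the loop: one helper runs a single simulated regression, replicate() repeats it, and lapply() walks over a named list of error functions, so every result row records its process. The names sim_one and normal_sd2 etc. are illustrative, and iter is reduced to 200 only to keep the sketch quick.

```r
set.seed(12345)
n <- 500
iter <- 200   # fewer than the question's 1000, only to keep the sketch quick

# Named list so each result row records which error process produced it
fun_list <- list(
  normal_sd2  = function(n) rnorm(n, 0, 2),
  normal_sd10 = function(n) rnorm(n, 0, 10),
  chisq_df2   = function(n) rchisq(n, 2)
)

# One simulated regression: returns the slope estimate and its standard error
sim_one <- function(err_fun) {
  x  <- rnorm(n, 0, 0.5)
  yn <- 0.3 * x + err_fun(n)
  ct <- coef(summary(lm(yn ~ x)))
  c(b = ct["x", "Estimate"], se = ct["x", "Std. Error"])
}

reg_results <- do.call(rbind, lapply(names(fun_list), function(nm) {
  est <- t(replicate(iter, sim_one(fun_list[[nm]])))   # iter x 2 matrix (b, se)
  data.frame(fun = nm, s = seq_len(iter), est)
}))

# Average slope and standard error per error process
aggregate(cbind(b, se) ~ fun, data = reg_results, FUN = mean)
```

Saving the raw samples as well would only need sim_one() to return the data frame alongside the estimates; it is left out here to keep the result table compact.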
I'm pretty new to R and I'm stuck on one problem.
I've already found how to create many linear models at once, and I wrote code that computes the AIC for each lm, but I cannot display the results labelled with the model they belong to. I want a data frame with one column for the model, e.g. lm(a~b+c, data=data), and one for its AIC.
Here's what I have so far (with a lot of help from Stack Overflow, of course):
vars <- c("azot_stand", "przeplyw", "pH", "twardosc", "fosf_stand", "jon_stand", "tlen_stand", "BZO_stand", "spadek_stand")
N <- list(1,2,3,4,5,6,7,8)
COMB <- sapply(N, function(m) combn(x=vars[1:8], m))
COMB2 <- list()
k=0
for(i in seq(COMB)){
tmp <- COMB[[i]]
for(j in seq(ncol(tmp))){
k <- k + 1
COMB2[[k]] <- formula(paste("azot_stand", "~", paste(tmp[,j], collapse=" + ")))
}
}
res <- vector(mode="list", length(COMB2))
for(i in seq(COMB2)){
res[[i]] <- lm(COMB2[[i]], data=s)
}
aic <- vector(mode="list", length(COMB2))
d=0
for(i in seq(res)){
aic[[i]] <- AIC(res[[i]])
}
View(aic)
show(COMB2)
I guess I am missing something in the aic step, but I don't know what.
With formula you can obtain the formula of a regression model. Since you want to store the formula with the AIC, I would create a data.frame containing both:
aic <- data.frame(model = character(length(res)), aic = numeric(length(res)),
stringsAsFactors = FALSE)
for(i in seq(res)){
aic$model[i] <- deparse(formula(res[[i]]), width.cutoff = 500)
aic$aic[i] <- AIC(res[[i]])
}
Normally you would use format to convert a formula to a character string. However, for long formulas this results in multiple lines. Therefore, I use deparse (which is also used by format) and pass it the width.cutoff argument.
You cannot use res[[i]]$call as this is always equal to lm(formula = COMB2[[i]], data = s).
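On R >= 4.0.0, deparse1() does the same job in one call: it collapses the deparse() output into a single string, so even a formula longer than width.cutoff stays in one cell. A sketch on the built-in attitude data, swapping deparse1() for deparse(width.cutoff = 500) and vapply() for the fill-in loop; the three candidate formulas are made up for illustration.

```r
# Three illustrative candidate models on the built-in attitude data
forms <- list(
  rating ~ complaints,
  rating ~ complaints + learning,
  rating ~ complaints + privileges + learning + raises + critical + advance
)
res <- lapply(forms, function(f) lm(f, data = attitude))

aic <- data.frame(
  model = vapply(res, function(r) deparse1(formula(r)), character(1)),
  aic   = vapply(res, AIC, numeric(1))
)
aic[order(aic$aic), ]   # lowest AIC first
```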
Other suggestions
The first part of your code can be simplified. I would write something like:
s <- attitude
vars <- names(attitude)[-1]
yvar <- names(attitude)[1]
models <- character(0)
for (i in seq_along(vars)) {
comb <- combn(vars, i)
models <- c(models,
paste(yvar, " ~ ", apply(comb, 2, paste, collapse=" + ")))
}
res <- lapply(models, function(m) lm(as.formula(m), data = s))
It is shorter, and it has the advantage that magic constants such as the 8 and azot_stand are defined outside the main code and can easily be modified.
I also noticed that you use azot_stand both as target variable and predictor (it is also part of vars). I don't think you will want to do that.
I get an error when running the code below. I have not figured out what I am doing wrong; sorry if it is obvious, I am new to R. The idea is to generate 100 regressions and output the 100 estimated slopes.
set.seed(21)
x <- seq(1,40,1)
for (i in 1:100 ) {
y[i] = 2*x+1+5*rnorm(length(x))
reg[i] <- lm(y[i]~x)
slp[i] <- coef(reg[i])[2]
}
There are several problems with the way you use indexing. You'll probably want to revisit a short beginner's tutorial on R rather than "rush" into loops and regressions...
In the end, you want to have a vector containing 100 slope values. You need to define this (empty) vector 'slp' prior to running the loop and then fill each ith element with its value in the loop.
On the other hand,
1) at each iteration you don't fill the ith element of y: the right-hand side 2*x+1+5*rnorm(length(x)) is a whole new vector with as many values as there are in x, so assign it to y itself...
2) you don't need to keep every regression so you don't need to "index" your object reg.
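Both indexing problems can be seen in isolation. Writing to position i of an object that does not exist yet is an error, and assigning a 40-value vector to a single element only produces a warning (outside tryCatch(), R would keep just the first value). A minimal sketch; y_new and y_vec are illustrative names:

```r
# 1) Writing into an object that does not exist yet fails
msg_missing <- tryCatch({ y_new[1] <- 5 }, error = conditionMessage)
# "object 'y_new' not found"

# 2) A single element cannot hold a whole vector
x <- seq(1, 40, 1)
y_vec <- numeric(100)
msg_length <- tryCatch({ y_vec[1] <- 2 * x + 1 }, warning = conditionMessage)
# "number of items to replace is not a multiple of replacement length"
```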
So here it is:
set.seed(21)
x <- seq(1,40,1)
slp=rep(NA,100)
for (i in 1:100) {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i]<-coef(reg)[2]
}
print(slp)
In addition to the other answers, there is a better (more efficient and simpler) option: lm accepts a matrix as the response y and fits one regression per column:
set.seed(21)
y <- matrix(rep(2*x + 1, 100) + 5 *rnorm(length(x) * 100), ncol = 100)
reg1 <- lm(y ~ x)
slp1 <- coef(reg1)[2,]
all.equal(slp, slp1)
#[1] TRUE
If you had a function other than lm and needed a loop, you should use replicate instead of a for loop:
set.seed(21)
slp2 <- replicate(100, {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
unname(coef(reg)[2])
})
all.equal(slp, slp2)
#[1] TRUE
You need to create an object before you can write to position i of it (e.g. slp[i] <-); y and reg need no indexing at all. Something along these lines works:
set.seed(21)
x <- seq(1,40,1)
slp <- numeric(100)
for (i in 1:100 ) {
y <- 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i] <- coef(reg)[2]
}
> slp
[1] 2.036344 1.953487 1.949170 1.961897 2.098186 2.027659 2.002638 2.107278
[9] 2.036880 1.980800 1.893701 1.925230 1.927503 2.073176 2.101303 1.943719
...
[97] 1.966039 2.041239 2.063801 2.066801