When using the post command, I get the following error:
post command requires expressions be bound in parenthesis
My program generates a matrix that stores the regression coefficients for each simulation, and then uses the post command to declare it as float and post the contents of the matrix, written in parentheses as (betas).
A sample of the code:
*Priors
set more off
global nmc=10
global l = 4 /* number of lags */
global cnt=150 /* number of countries */
set seed 10101
* Gen empty beta matrix
matrix betas = J(153,$nmc+1,.)
*** THIS IS WHERE MONTECARLO STARTS***
program bootStrapCH5, rclass
tempname sim
postfile `sim' betas using results, replace /* As trial I'll create only the betas matrix for now. */
*postfile `sim' betas alpha_mean b1_mean b2_mean b3_mean b4_mean se_alpha se1 se2 se3 se4 using results, replace
quietly {
forvalues i = 1/$nmc {
* Fixed effects regression.
reg gdp_growth_wb L(1/4).gdp_growth_wb i.id
matrix B1= e(b)
mat li B1
predict g_hat,xb
gen e_hat= gdp_growth_wb - g_hat
*gen flag=e(sample)
* Generate the "wild" errors for the forecasts
gen eta=rnormal()
gen e_star=e_hat*eta
**RECURSION
levelsof id, local(codes)
capture noisily replace y_star= _b[_cons] + _b[L.gdp_growth_wb]*L.y_star + ///
_b[L2.gdp_growth_wb]*L2.y_star + _b[L3.gdp_growth_wb]*L3.y_star + ///
_b[L4.gdp_growth_wb]*L4.y_star + e_star if (id==1 & Dini4forward==1)
forvalues cc= 2(1)150 {
capture noisily replace y_star= _b[_cons] + _b[`cc'.id] + _b[L.gdp_growth_wb]*L.y_star + ///
_b[L2.gdp_growth_wb]*L2.y_star + _b[L3.gdp_growth_wb]*L3.y_star + ///
_b[L4.gdp_growth_wb]*L4.y_star + e_star if (id==`cc' & Dini4forward==1)
}
*Regression with new sample: y_star
reg y_star L(1/4).y_star i.id
matrix b= e(b)'
matrix betas= (betas , b)
matrix list betas
post `sim' float (betas)
}
}
postclose `sim'
end
*Execute program
bootStrapCH5
use results, clear
summarize
I also tried an alternative:
post `sim' (betas)
And got the error:
> type mismatch
post: above message corresponds to expression 1, variable betas
Any ideas on how to fix this are very much appreciated.
I'm not very familiar with postfile, but I think one issue could be that you are trying to insert a kx2 matrix into a single variable inside of your loop with post.
When you initiate postfile using:
postfile `sim' betas using results
you have declared a Stata dataset with a single variable, betas.
So, instead of using
post `sim' float (betas)
you might try:
tempname sim
postfile `sim' float (betas1 betas2) using results, replace
forvalues i = 1/$nmc {
* Some code. . .
local rows = rowsof(betas)
forvalues j = 1/`rows' {
post `sim' (betas[`j',1]) (betas[`j',2])
}
* some other code. . .
}
or something similar to declare a file with the proper number of variables which you intend on posting to the dataset.
Further, I'm not sure that you can post a matrix directly anyway (I could be wrong about this). If you can't, then you could nest a forvalues loop inside of the loop you currently have to iterate through the elements of betas and post them individually - as I have done in the example above.
Finally, you are trying to cast the values of betas as data type float in your post command. I believe the storage types need to be declared in the postfile command (but again, I could be wrong about this). The first error you cite (expressions bound in parenthesis) is a direct result of including float in the post command.
Bottom line - I suspect the first error is due to declaring the data type when you try to post the data, and the second error (type mismatch) is a result of trying to insert a kx2 matrix into a single variable. See below for an example of type mismatch when trying to (incorrectly) create data from a matrix:
clear *
mat a = (1\2)
set obs 2
gen x = a
Although I admittedly would have expected the error to be more analogous to this:
mat a = (1\2)
set obs 2
gen x = a*2
matrix operators that return matrices not allowed in this context
Also look at svmat for creating data from matrices.
I am struggling to adapt the example of the function bigglm.data.frame in the biglm package to a case where the chunk size is not constant but chunks are identified by a factor, say "GROUP", in the input data frame, say "DF" (around 20 million rows in my case). My problem is not storing the data but understanding how to feed it in gradually to bigglm. I have made a split version of DF along the variable GROUP, i.e. a list of data frames; call it DATALIST.
I understand that the function, or more exactly its subfunction datafun, must return the next chunk of data. So in my case I want it to go to the next i in DATALIST[[i]]. I can equally use the original data frame, i.e. subsetting with DF$GROUP==i. My question is how to adapt the example function from the package to do this.
From the package (https://github.com/cran/biglm/blob/master/R/bigglm.R) the function is
function (formula, data, ..., chunksize = 5000)
{
n <- nrow(data)
cursor <- 0
datafun <- function(reset = FALSE) {
if (reset) {
cursor <<- 0
return(NULL)
}
if (cursor >= n)
return(NULL)
start <- cursor + 1
cursor <<- cursor + min(chunksize, n - cursor)
data[start:cursor, ]
}
rval <- bigglm(formula = formula, data = datafun, ...)
rval$call <- sys.call()
rval$call[[1]] <- as.name(.Generic)
rval
}
I am obviously not a good programmer, rather a simple user with a loop mindset, so I had expected bigglm to have an index that I could match to i, but there is none. I see that n refers to the number of rows, and that cursor starts from zero and then increases by chunksize. I know n from my data frame. I could also advance cursor by the number of rows in each chunk (nrow(DATALIST[[i]])), but I first need to identify the chunk itself, and that is where I am stuck.
Meanwhile, I know I can just fit a glm to each chunk separately, but that is a more traditional way and I would love to have the big model fitted. One could also suggest that I go for equal chunk sizes, but I prepared the chunks exactly to make sure I never have only zeros or ones (it is a logit model) once I have controlled for the combined fixed effects.
Thanks for any help!
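One way to approach this is to keep the same closure pattern as the package function, but let the cursor count list elements instead of rows. The sketch below is only an illustration under assumptions: make_datafun is a hypothetical helper name, the toy DF stands in for the real data, and the bigglm call is left commented out because it needs the biglm package. As far as I know, bigglm also expects factors to have the same levels in every chunk, which is worth checking for the fixed effects.

```r
# Build a data-feeding function over a list of data frames (one per GROUP).
# make_datafun is a hypothetical helper, not part of biglm.
make_datafun <- function(datalist) {
  i <- 0                       # index of the last chunk handed out
  function(reset = FALSE) {
    if (reset) {
      i <<- 0                  # start again from the first chunk
      return(NULL)
    }
    if (i >= length(datalist))
      return(NULL)             # no chunks left
    i <<- i + 1
    datalist[[i]]
  }
}

# Toy data standing in for DF; split() gives the list called DATALIST above.
DF <- data.frame(GROUP = rep(c("a", "b", "c"), times = c(2, 3, 1)),
                 y = c(0, 1, 1, 0, 1, 0), x = 1:6)
DATALIST <- split(DF, DF$GROUP)

datafun <- make_datafun(DATALIST)
chunk_rows <- c(nrow(datafun()), nrow(datafun()), nrow(datafun()))
finished <- is.null(datafun())   # TRUE once every chunk has been returned

# Assumed usage with biglm installed (not run here):
# fit <- bigglm(y ~ x, data = make_datafun(DATALIST), family = binomial())
```

The closure mirrors the reset/NULL contract of the datafun shown above, so the only change is that a "chunk" is now one element of DATALIST rather than a block of chunksize rows.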
I am trying desperately to automate some model testing in lme4::lmer (as I have too many models to do by hand).
The functions I use run some models, find the best one by checking their stats, and create res (which has two lists in it).
res$res is a data frame
res$res$model is the text I need to rerun the best model (to be clear, I only use the first one)
res$fits is a List of 1, with res$fits[1] being a "Formal Class 'lmerMod'" with 13 slots, the name of which is always exactly the same as models
Here's some code to make more sense:
models <- theBigList
## run function
res <- fit.func(models=models, response='bnParam1')
# show model selection table
res$res
# This is where you get the best model from above and put it in here to set it up for plotting.
#
models <- res$res$model[1]
# run function
res <- fit.func(models=models, response='bnParam1')
## model selection table
res$res
# Once you get a model where the best result is the same as the "previous" one you copy and paste it in here to graph it.
# It will be the one with the lowest CV.R2 from the 2nd 'models'
top.fit <- res$fits$'INSERT models HERE'
#top.fit is a class lmerMod, and has the list of everything needed to be extracted and calculated to ggplot it
Normally I copy and paste the text for the best model into the space where it says 'INSERT models HERE', but I would like to automate it.
I can't seem to use models as an input, nor force it, eg as.Class or as.String, things like that, nor use other ways of referencing from a list. I am at a loss as to how to assign the right variable.
EDIT #######
So res$res is the first List in res that is a data frame, it will output something like this:
> res$res
model nPar D aic d.aic w.aic R2 cv.R2
1 Sp + (1|Spec) + SE + TC 51 3804.244 3906.244 0 1 0.6376789 0.2586369
To expand on my last sentence, which is the most important: normally the last bit of code passes the fitted model on to lme4::fixef, for example like this:
top.fit <- res$fits$"Sp + (1|Spec) + SE + TC"
The model string in that line is the part that changes every time I run a different analysis; I found earlier that I can get it with:
models <- res$res$model[1]
> models
[1] "Sp + (1|Spec) + SE + TC"
So I'd basically like to write something like top.fit <- res$fits$models, but I assume there is some type incompatibility, or a problem with using 'models' inside the reference to the List/Class?
Single square brackets were returning a list rather than the element itself, which wasn't working.
I solved this by referencing with double square brackets, which extracts the element and passes it on as the original lmerMod class it needs to be. Now I know.
top.fit <- res$fits[[1]]
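To select the fit by its name rather than by position, `[[` also accepts a character value held in a variable, whereas `$` treats whatever follows it as a literal name. A minimal sketch, with a plain list standing in for res$fits (the list contents are made up for illustration):

```r
# Stand-in for res$fits: a named list whose single name is the model string.
# (Hypothetical contents; the real element would be an lmerMod object.)
fits <- list("Sp + (1|Spec) + SE + TC" = "fitted model object")
models <- "Sp + (1|Spec) + SE + TC"

# `$` looks for an element literally called "models", so this is NULL.
by_dollar <- fits$models

# `[[` evaluates `models` and uses its value as the element name.
by_name <- fits[[models]]

# Single brackets keep the list wrapper; double brackets extract the element.
class(fits[models])    # "list"
class(fits[[models]])  # "character" here; "lmerMod" for a real fit
```

With the real object, the assumed equivalent would be top.fit <- res$fits[[models]].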
I am new to writing loops and am having some difficulties. I already looked through other questions, but didn't find the answer to my specific problem.
So let's just create a random dataset, give it column names and set the variables as character:
d<-data.frame(replicate(4,sample(1:9,197,rep=TRUE)))
colnames(d)<-c("variable1","variable2","trait1","trait2")
d$variable1<-as.character(d$variable1)
d$variable2<-as.character(d$variable2)
Now I define the vector over which I want to loop. It corresponds to trait 1 and trait 2:
trt.nm <- names(d[c(3,4)])
Now I want to apply the following model for trait 1 and trait 2 (which should now be as column names in trt.nm) in a loop:
library(lme4)
for(trait in trt.nm)
{
lmer (trait ~ 1 + variable1 + (1|variable2) ,data=d)
}
Now I get the error that variable lengths differ. How could this be explained?
If I apply the model without loop for each trait, I get a result, so the problem has to be somewhere in the loop, I think.
trait is a string, so you'll have to convert it to a formula to work; see http://www.cookbook-r.com/Formulas/Creating_a_formula_from_a_string/ for more info.
Try this (you'll have to add a print statement or save the result to actually see what it does, but this will run without errors):
for(trait in trt.nm) {
lmer(as.formula(paste(trait, " ~ 1 + variable1 + (1|variable2)")), data = d)
}
Another suggestion would be to use a list and lapply or purrr::map instead. Good luck!
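The lapply suggestion could look roughly like the sketch below. It only builds and stores the formulas, so it runs in base R; passing each formula to lmer(..., data = d) is the assumed next step and is shown as a comment. reformulate() is a base R alternative to pasting the whole formula string together.

```r
trt.nm <- c("trait1", "trait2")

# Build one formula per trait and keep them in a named list.
forms <- lapply(trt.nm, function(trait) {
  as.formula(paste(trait, "~ 1 + variable1 + (1|variable2)"))
})
names(forms) <- trt.nm

# Equivalent model via reformulate(): right-hand terms plus a response name.
forms2 <- lapply(trt.nm, function(trait) {
  reformulate(c("variable1", "(1|variable2)"), response = trait)
})

# With lme4 loaded and d defined as in the question, the fits would then be:
# fits <- lapply(forms, function(fm) lmer(fm, data = d))
```

Storing the results in a list also takes care of the "add a print statement or save the result" point, since each fit stays available under its trait name.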
I am working with some R code that I'm sure could be written using one of the apply family of functions, but I can't work out how. I have a data frame with multiple columns and I want to call a function whose inputs come from several of those columns. Let's say I have this data and a function f:
data<- data.frame(T=c(1,2,3,4), S=c(3,7,8,4), K=c(5,6,11,9))
data
V<-c(0.1,0.2,0.3,0.4,0.5,0.6)
f<-function(para_h,S,T,a,t,b){
r<- V
steps<-T
# Recursive form: Terminal condition for the A and B at time T
A_T=0
B_T=0
A=c()
B=c()
# A and B at time T-1
A[1]= r[steps]*a
B[1]= a*para_h[5]+ ((para_h[4])^(-2))
# Recursion back to time t
for (i in 2:steps){
A[i]= A[i-1]+ r[steps-i+1]*a + para_h[1]*B[i-1]
B[i]= para_h[2]*B[i-1]+a*para_h[5]+ (para_h[4]^(-2))
}
f = exp(log(S)*a + A[t] + B[t]*b )
return(f)
}
This function works well for some specific values:
> para_h<-c(0.1,0.2,0.3,0.4,0.5,0.7)
> f(para_h,S=3,T=2,a=0.4,t=1,b=0.1)
[1] 3.204144
I want to apply the function to each pair of values in columns S and T of the data frame. So my code looks like:
mapply(function(para_h,S,T,a,t,b) f(para_h,S,T,a,t,b) ,para_h,S=data$S,T=data$T,a=0.4,t=1,b=0.1)
This gives an error:
> mapply(function(para_h,S,T,a,t,b) f(para_h,S,T,a,t,b) ,para_h,S=data$S,T=data$T,a=0.4,t=1,b=0.1)
Error in A[i] = A[i - 1] + r[steps - i + 1] * a + para_h[1] * B[i - 1] :
replacement has length zero
I'm pretty sure the problem is that "steps" is a vector. I would really appreciate an elegant solution.
I hope this makes some sort of sense; any advice would be greatly appreciated.
A couple of things:
1) Each call of your function expects the full para_h vector, but in your mapply code it receives only one value at a time, so you probably want something like this:
mapply(function(S,T) f(para_h,S,T,a=0.4,t=1,b=0.1), data$S, data$T)
or this:
apply(data,1,function(d) f(para_h,d['S'],d['T'],a=0.4,t=1,b=0.1))
2) Your function throws an error when T==1 (which is the case in the first row of data), so you might need to modify your sample data set to be able to run this code.
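Both points can be checked in isolation. In the sketch below, g is a hypothetical stand-in for f (it just needs the whole para_h vector on every call), and MoreArgs is used to pass the arguments that should not be iterated over. The second part shows why T == 1 fails in the original loop: 2:1 counts down rather than being empty, so the body runs with i = 2, r[steps - i + 1] becomes r[0], and the assignment has nothing to assign.

```r
data <- data.frame(T = c(1, 2, 3, 4), S = c(3, 7, 8, 4), K = c(5, 6, 11, 9))
para_h <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.7)

# Hypothetical stand-in for f(): uses the full para_h vector on every call.
g <- function(para_h, S, T, a) sum(para_h) * a + S + T

# S and T vary row by row; para_h and a are passed unchanged via MoreArgs.
res <- mapply(g, S = data$S, T = data$T,
              MoreArgs = list(para_h = para_h, a = 0.4))

# Why T == 1 breaks `for (i in 2:steps)`: the sequence counts down.
steps <- 1
bad_idx  <- 2:steps               # c(2, 1), so the loop body still runs
safe_idx <- seq_len(steps)[-1]    # integer(0), so the loop body is skipped
```

Replacing 2:steps with seq_len(steps)[-1] inside f would be one way to let the first row through; whether that gives the intended value for T == 1 depends on the model, so treat it as an assumption.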
How can I use RCaller to get more than one result at a time?
For example, I use
code.addRCode("data<-read.table(\""+ "/home/yo/Documents/Book1.csv"+ "\", header=TRUE,sep=\"\t\")");
caller.setRCode(code);
caller.runAndReturnResult("data");
Then I can use caller.getParser().getNames().size() and similar functions.
But if I also want to run summary(data$pH), what should I do? Should I add it to the code beforehand? And in that case, which result does "caller" hold?
Thanks to anyone who can help me!
Use lists of results in R. For example, suppose you have a list
result <- list(a=c(1,2,3), b=3.6, c=5)
After calling rcaller.runAndReturnResult("result"), the variables a, b and c are accessible via
double[] a = rcaller.getParser().getAsDoubleArray("a");
or
int c = rcaller.getParser().getAsIntegerArray("c")[0];
With getNames() method, you can get the names contained in the 'result' list as well.
When you use summary(), nothing changes. Suppose you make a lm() call in R like
ols <- lm (y ~ x + z, data=mydata)
and then
detailed <- summary(ols)
This is also a list, like the value returned by lm(). You can access elements of this list using
double[] residuals = rcaller.getParser().getAsDoubleArray("residuals");
and
double rsquared = rcaller.getParser().getAsDoubleArray("r.squared")[0];
So nothing changes after summary(). Back to your code:
code.addRCode("data<-read.table(\""+ "/home/yo/Documents/Book1.csv"+ "\", header=TRUE,sep=\"\t\")");
caller.setRCode(code);
caller.runAndReturnResult("data");
does not return a list. You can write this instead:
RCode code = new RCode();
code.addRCode("myresult <- list(res1=data$pH, res2=data$anotherVector)");
caller.setRCode(code);
caller.runAndReturnResult("myresult");
After that,
double[] pH = caller.getParser().getAsDoubleArray("res1");
returns your pH variable.
For further information, visit the official blog.
You may also find the following useful; it demonstrates how to get results from RCaller using the runAndReturnValue method:
http://stdioe.blogspot.com.tr/search/label/rcaller