A function included in a package throws an error when I try to supply weights to it. The relevant part of the package call must be specified like this:
weights = c("kernel_wght")
Inside the function, the following two lines of code create a data frame object called weight:
weight1 <- sprintf("dataarg$%s", weights)
weight <- as.data.frame(eval(parse(text = weight1)))
The analytic portion of the function then calls glm to analyze the data, using the weights provided:
result1 <- glm(f1, family="gaussian", weights=weight, data=dataarg)
Doing so yields the following error:
Error in (function (arg) : object 'weight' not found
I've seen some recommendations that the whole glm call should be re-specified, and I've seen some references to global environment objects. Why can I print the data frame, verifying that it is indeed created, but can't refer to it in the call to glm? Is there a fix that I have overlooked?
As requested, here is a small example. I created some sample data, as if it had come from a multiple imputation process:
dat <- c(1, 1, 0, .5, 1, 3, 0, 1, 1, 4, 0, .5, 1, 5, 1, 1, 1, 2, 1, .5,
         2, 7, 1, 1, 2, 3, 0, .5, 2, 2, 0, 1, 2, 4, 1, .5)
dat <- data.frame(matrix(dat, ncol = 4, byrow = TRUE))
colnames(dat) <- c("id", "y", "tx", "wt")
imp_lst <- lapply(1:2, function(s) dplyr::filter(dat, id == s))
for (i in 1:length(imp_lst)) {
  assign(paste0("imp", i), as.data.frame(imp_lst[[i]]))
}
df_lst <- list()
for (i in 1:length(imp_lst)) {
  assign(paste0("imp", i), as.data.frame(imp_lst[[i]]))
  df_lst <- append(df_lst, list(get(paste0("imp", i))))
  names(df_lst)[i] <- paste0("imp", i)
}
And here is a small example, mostly taken from the package, that re-creates the problem:
my_ex <- function(datasets, y, treatment, weights=NULL, ...) {
  data <- names(datasets)
  for (i in 1:length(treatment)) {
    d1 <- sprintf("datasets$%s", data[i])
    dataarg <- eval(parse(text=d1))
    print(dataarg)
    if(!is.null(weights)) {
      weight1 <- sprintf("dataarg$%s", weights)
      weight <- as.data.frame(eval(parse(text = weight1)))
      print(weight)
    } else {
      dataarg$weight <- weight <- rep(1,nrow(dataarg))
    }
    f1 <- sprintf("%s ~ %s ", y, treatment)
    print(f1)
    result1 <- glm(f1, family="gaussian", weights=weight, data=dataarg)
    print(summary(result1))
  }
}
Using the following call, the error appears:
testrun <- my_ex(df_lst, y = c("y","y"), treatment = c("tx","tx"), weights = c("wt","wt"))
The proximal problem is that you are defining the formula as a character string and passing it to glm. It gets converted to a formula inside glm's machinery, so the formula's environment is not your function's frame, and glm doesn't know where to look for the weight variable (loosely speaking, glm looks for variables such as the weights (1) within the data frame provided as data and (2) in the environment of the formula). You can work around this by using as.formula() to convert the string to a formula before passing it to glm (e.g. glm(as.formula(f1), ...)).
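A minimal sketch of the as.formula() workaround, using the rows with id 1 from the sample data above. The helper name fit_weighted and its arguments are hypothetical, not part of the package; it also pulls the weights out as a numeric vector with [[ ]], since glm() expects a vector rather than a one-column data frame.

```r
# Hypothetical helper that mirrors the package's pattern of building the formula as a string
fit_weighted <- function(dataarg, y, treatment, wcol) {
  f1 <- sprintf("%s ~ %s", y, treatment)
  weight <- dataarg[[wcol]]  # numeric vector, not a one-column data frame
  # as.formula() is called here, so the formula's environment is this
  # function's frame, which is where glm() will find `weight`
  glm(as.formula(f1), family = gaussian, weights = weight, data = dataarg)
}

# Rows of the question's sample data with id == 1
imp1 <- data.frame(y  = c(1, 3, 4, 5, 2),
                   tx = c(0, 0, 0, 1, 1),
                   wt = c(.5, 1, .5, 1, .5))
fit <- fit_weighted(imp1, "y", "tx", "wt")
coef(fit)
```

Because tx is binary, the intercept is the weighted mean of y for tx == 0 (2.75) and the tx coefficient is the difference between the two weighted means (4 - 2.75 = 1.25).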
However: using functions like eval, parse, assign is a code smell in R — it means there's probably a more natural, simpler, more robust way to do what you want. For example, I think this function does the same as what your function is trying to do, relying on indexing within lists rather than using eval(parse(...)) and friends.
my_ex2 <- function(datasets, y, treatment, weights = NULL, ...) {
  result <- list()
  for (i in seq_along(treatment)) {
    form <- reformulate(treatment[i], response = y[i])
    data <- datasets[[i]]
    ## note the double brackets around the column name: we want
    ## the result to be a vector, not a data frame
    weight <- if (is.null(weights)) NULL else data[[weights[i]]]
    result[[i]] <- glm(formula = form, weights = weight, data = data)
  }
  result
}
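Then, to print out all the summaries, apply summary to the returned list, e.g. fits <- my_ex2(df_lst, y = c("y","y"), treatment = c("tx","tx"), weights = c("wt","wt")) followed by lapply(fits, summary) (if you really think you only need the summaries, you can save summary(...) instead of the fitted object inside the loop).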
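The same idea can be pushed a step further: split() can build the list of imputed data sets without assign() and get(), and Map() can replace the explicit loop. This is a sketch that rebuilds the question's dat; the names fit_one and fits_by_imp are hypothetical.

```r
dat <- data.frame(matrix(c(1, 1, 0, .5, 1, 3, 0, 1, 1, 4, 0, .5, 1, 5, 1, 1, 1, 2, 1, .5,
                           2, 7, 1, 1, 2, 3, 0, .5, 2, 2, 0, 1, 2, 4, 1, .5),
                         ncol = 4, byrow = TRUE))
colnames(dat) <- c("id", "y", "tx", "wt")

# One data frame per imputation, named imp1, imp2, ...
df_lst <- split(dat, dat$id)
names(df_lst) <- paste0("imp", names(df_lst))

# Hypothetical helper: one weighted fit per (data, response, treatment, weight column)
fit_one <- function(d, y, tx, w) {
  wv <- d[[w]]                           # weights as a numeric vector
  form <- reformulate(tx, response = y)  # environment is this call's frame, where wv lives
  glm(form, family = gaussian, weights = wv, data = d)
}

# Map() keeps the names of df_lst, so the fits are named imp1, imp2
fits_by_imp <- Map(fit_one, df_lst, c("y", "y"), c("tx", "tx"), c("wt", "wt"))
lapply(fits_by_imp, summary)
```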
Related
Using R, the ifelse() command seems to be changing the class of a model I've saved (and reloaded) to a list instead of keeping it as "glm".
Here is the code that demonstrates the issue:
x1 <- c( 1:10 )
x2 <- c( 1, 1, 1, 0, 0, 0, 1, 1, 1, 0 )
y <- c( 1, 0, 0, 1, 1, 0, 0, 1, 1, 1 )
dataset <- as.data.frame( cbind( y, x1, x2 ) )
m <- glm( y ~ x1 + x2, family = binomial )
saveRDS( m, "test.rds" )
model <- readRDS( "test.rds" )
class(model)
# The object "model" is of class "glm" and "lm", and clearly not "list". The predict thus works, as the next code shows:
predict( model, dataset )
# Now, I want to assign the model if a certain condition is true (replaced here with TRUE to simplify the question), and a different model if the condition is FALSE. The issue is that the following assignment changes the class of model from "glm" "lm" to "list":
assign <- ifelse( TRUE, model )
class(assign)
# Now the object "assign", which should be the same as model (right?), is of class "list", and the following line of code fails to work:
predict( assign, dataset )
The last "predict" command gives this error message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "list"
Can someone explain why the class was changed? How can I get the predict command to work on my assign?
From the documentation of the ifelse function, the return value of the function call ifelse(test, yes, no) is:
A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no. [...]
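As TRUE has length 1, you get back a result of length one. Since model is a list, that result is a length-one list holding only the first component of the model object, and the "glm" class is gone.
It seems like the functionality you want can be achieved with: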
if (test) {
  assign <- model
} else {
  assign <- the_other_model
}
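A short sketch contrasting the two, reusing the question's data. other_model is a hypothetical second model and use_full stands in for the real condition; passing data = dataset to glm() is a small change from the question so the models do not depend on the loose global vectors.

```r
x1 <- 1:10
x2 <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
y  <- c(1, 0, 0, 1, 1, 0, 0, 1, 1, 1)
dataset <- data.frame(y, x1, x2)

model       <- glm(y ~ x1 + x2, family = binomial, data = dataset)
other_model <- glm(y ~ x1, family = binomial, data = dataset)  # hypothetical alternative
use_full <- TRUE  # stands in for the real condition

# ifelse() shapes its result like `test`: a length-one result,
# here a plain list holding only the first component of the model
wrong <- ifelse(use_full, model, other_model)
class(wrong)   # "list"
length(wrong)  # 1

# if/else returns the chosen object untouched, so predict() still dispatches
chosen <- if (use_full) model else other_model
class(chosen)  # "glm" "lm"
p <- predict(chosen, dataset)
```

Writing the if/else as an expression (chosen <- if (...) a else b) is equivalent to the block form above and avoids repeating the assignment.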
Long time reader, first time poster. I have not found any previous questions about my current problem. I would like to create multiple linear functions, which I can later apply to variables. I have a data frame of slopes, df_slope, and a data frame of constants, df_constant.
Dummy data:
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
I would like to construct functions such as
myfunc <- function(slope, constant, trvalue){
  result <- trvalue*slope+constant
  return(result)
}
where the slope and constant values are
slope<- df_slope[i,j]
constant<- df_constant[i,j]
I have tried many approaches, for example this one, which tries to create a data frame of functions with a for loop:
myfunc_all<-data.frame()
for(i in 1:5){
  for(j in 1:3){
    myfunc_all[i,j]<-function (x){ x*df_slope[i,j]+df_constant[i,j] }
    full_func[[i]][j]<- func_full
  }
}
without success. The slope and constant values are paired, so that df_slope[i,j] goes with df_constant[i,j]. The desired end result would be some kind of data frame from which I can retrieve a function by giving its coordinates, for example like this:
myfunc_all[i,j]
but any form would be great. For example,
myfunc_all[2,1]
in our case would be
function(x) { x*2 + 4 }
which I can apply to different x values. I hope my problem is clear.
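So you have a slight problem with variable scope when you use a for loop to build functions: a for loop does not create a new environment, so every function created in the loop refers to the same i and j and only looks them up when it is called, by which time they hold their final values. It's safer to use something like mapply, which creates a separate closure for each (i, j) pair. Try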
myfunc_all <- with(expand.grid(1:5, 1:3), mapply(function(i, j) {
  function(x) {
    x*df_slope[i,j]+df_constant[i,j]
  }
}, Var1, Var2))
dim(myfunc_all) <- c(5,3)
This will create an array-like object. The only difference is that you need to use double brackets to extract the function. For example
myfunc_all[[2,1]](0)
# [1] 4
myfunc_all[[5,3]](0)
# [1] -1
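To see the scoping issue directly, here is a sketch that builds the same 5 x 3 grid with a plain for loop (naive), next to a variant that uses local() to give each function its own copy of i and j (fixed). local() is a swapped-in technique for illustration, not part of the answer above; the names naive, fixed, ii and jj are hypothetical.

```r
df_slope    <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(2, 3, 4, 5, 6), var3 = c(-1, 1, 0, -10, 1))
df_constant <- data.frame(var1 = c(3, 4, 6, 7, 9), var2 = c(2, 3, 4, 5, 6), var3 = c(-1, 7, 8, 0, -1))

# List "matrices" so that [[i, j]] indexing works
naive <- vector("list", 15); dim(naive) <- c(5, 3)
fixed <- vector("list", 15); dim(fixed) <- c(5, 3)

for (i in 1:5) {
  for (j in 1:3) {
    # looks up i and j only when called, i.e. after the loop has finished
    naive[[i, j]] <- function(x) x * df_slope[i, j] + df_constant[i, j]
    # local() runs in a fresh environment, so ii and jj are frozen copies
    fixed[[i, j]] <- local({
      ii <- i
      jj <- j
      function(x) x * df_slope[ii, jj] + df_constant[ii, jj]
    })
  }
}

naive[[2, 1]](0)  # -1: every naive function uses i = 5, j = 3
fixed[[2, 1]](0)  # 4
```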
Alternatively, you can write a function that returns a function. That would look like
myfunc_all <- (function(slopes, constants) {
  function(i, j)
    function(x) x*slopes[i,j]+constants[i,j]
})(df_slope, df_constant)
Then, rather than using brackets, you call the function with parentheses:
myfunc_all(2,1)(0)
# [1] 4
myfunc_all(5,3)(0)
# [1] -1
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
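functions = vector(mode = "list", length = nrow(df_slope))
for (i in 1:nrow(df_slope)) {
  # i and j are passed in as arguments, so the loop variable is never captured;
  # df_slope[i, j] selects one cell (df_slope[i] would select a whole column)
  functions[[i]] = function(i, j, x) { df_slope[i, j]*x + df_constant[i, j] }
}
f = function(i, j, x) {
  functions[[i]](i, j, x)
}
f(2, 1, 1:10)
f(3, 2, 5:10)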
I am having trouble embedding a function in another function I am writing. The function that shows some misbehaviour (or at least, it seems so to me) is r.squaredGLMM. It works fine on its own, but when called within the function I am writing, it throws an error that I honestly do not understand. Here is some example data:
library (lme4)
library (MuMIn)
my_table <- data.frame ("type" = c (rep ("low", 15), rep ("mid", 15), rep ("high", 15)),
"group" = rep (c ("A", "B", "C"),5),
"dependent" = sort (rpois (45, 3)),
"var_A" = rnorm (45),
"var_B" = c (rnorm (15), rnorm (15, 6, 0.1), rnorm (15, 18, 2)),
"var_C" = rep (c(3,1,6,2,8,2,1,6,7,3,20,12,23,15,15), 3) * c (1:3)
)
I use this data frame to fit the following GLMM:
large_model <- glmer (dependent ~ var_A + var_B + var_C + (1|group),
data = my_table,
family = "poisson")
The r.squaredGLMM function yields the following result:
r.squaredGLMM (large_model)
> R2m R2c
> delta 0.4171626 0.4171626
> lognormal 0.4485549 0.4485549
> trigamma 0.3824522 0.3824522
Now comes the interesting part. I create a function to alter large_model. Within the function, I create an object containing the call of large_model (via getCall) and an object containing the family used in large_model. A for loop then alters the model; here, it produces a set of (perhaps senseless) models fitted to shorter subsets of the data. Note that the the_family object works fine here for telling glmer which family to use. The resulting short_model can then also be used to calculate, for example, the residuals of the respective models.
my_function <- function (global.model = NULL, my_i = 2) {
  the_call <- getCall (global.model)
  the_family <- family (global.model)$family
  for (i in length(global.model@frame[[1]]): my_i) {
    short_model <- glmer (the_call, data = global.model@frame[2:i,], family = the_family)
    short_model_resids <- resid (short_model)
  }
}
This works fine:
my_function (large_model, 40)
However, when I try to pass the short_model to r.squaredGLMM, I get an error:
my_function <- function (global.model = NULL, my_i = 2) {
  the_call <- getCall (global.model)
  the_family <- family (global.model)$family
  for (i in length(global.model@frame[[1]]): my_i) {
    short_model <- glmer (the_call, data = global.model@frame[2:i,], family = the_family)
    short_model_resids <- resid (short_model)
    short_model_R <- r.squaredGLMM (short_model)
  }
}
my_function (large_model, 40)
> Error in glmer(formula = dependent ~ 1 + (1 | group), data = global.model@frame[2:i, :
> object 'the_family' not found
When I do it by hand, step by step and outside a function, it works fine (presumably because the_family then exists in the global environment):
my_i <- 40
the_call <- getCall (large_model)
the_family <- family (large_model)$family
for (i in length(large_model@frame[[1]]): my_i) {
  short_model <- glmer (the_call, data = large_model@frame[2:i,], family = the_family)
  short_model_resids <- resid (short_model)
  short_model_R <- r.squaredGLMM (short_model)
}
I really have no idea what is going wrong here. I very much hope someone can help; it would be very useful to be able to use r.squaredGLMM within a function and feed it a family object created within the same function.
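The error message shows the glmer call being re-run somewhere that the_family is not visible. Whether r.squaredGLMM does exactly this internally is an assumption, but the same effect can be reproduced in base R with glm() and update(), which re-evaluates a model's stored call in the caller's environment. The sketch below (function names are hypothetical) also shows one possible workaround: building the call with do.call() so that it holds values rather than local symbols.

```r
# Hypothetical helpers; glm() and update() stand in for glmer() and r.squaredGLMM()
fit_with_symbol <- function(d, family_name) {
  the_family <- family_name
  # the stored call contains the *symbol* the_family, not its value
  glm(y ~ x, family = the_family, data = d)
}
fit_with_values <- function(d, family_name) {
  the_family <- family_name
  # do.call() places the evaluated arguments into the stored call,
  # so nothing in it refers to this function's local variables
  do.call("glm", list(formula = y ~ x, family = the_family, data = d))
}

d <- data.frame(x = 1:8, y = c(1, 3, 2, 5, 4, 6, 8, 7))
fit_a <- fit_with_symbol(d, "gaussian")
fit_b <- fit_with_values(d, "gaussian")

# update() re-runs the stored call in the caller's environment (here, the global one)
msg <- tryCatch(update(fit_a, . ~ 1), error = function(e) conditionMessage(e))
msg  # object 'the_family' not found

null_b <- update(fit_b, . ~ 1)  # works: the call carries the values themselves
```

A trade-off of the do.call() route is that the stored call embeds the whole data frame, which makes printed calls long for large data.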
I'm working on a project to convert an R function to CUDA C++, but I can't understand one R function call. I'm really new to R and I can't find what I'm looking for. To be exact, this is the main R function code:
for (i in 1:ncy) {
  res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
  depth[i] <- sum(res[1,])
  localdepth[i] <- sum(res[2,])
}
The part that I can't understand is the "banddepthforonecurve" function call. This is the "banddepthforonecurve" function code:
banddepthforonecurve <- function(x, xdata, ydata, tau, use) {
  envsup <- apply(xdata[,x], 1, max)
  envinf <- apply(xdata[,x], 1, min)
  inenvsup <- ydata <= envsup
  inenvinf <- ydata >= envinf
  depth <- all(inenvsup) & all(inenvinf)
  localdepth <- depth & use(envsup-envinf) <= tau
  res <- c(depth,localdepth)
  return(res)
}
When it is called in:
res <- apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
I don't understand what is passed as the first parameter, "x", of banddepthforonecurve. I assumed it was like banddepthforonecurve(i, xdata=x, ydata=y[,i], tau = tau, use=use)
but if I run it separately in RStudio to understand it better, I get:
apply(xdata[, x], 1, max) : dim(X) must have a positive length
Why doesn't this error occur when I run the whole R project? What is passed as the "x" parameter in "res <- apply(...)"? I hope I was clear. Sorry for my English, and thank you in advance!
# This apply function call
res = apply(X = input, MARGIN = 2, FUN = foo, ...)
# is essentially syntactic sugar for this:
res = list()
for(i in 1:ncol(input)) {
  res[[i]] = foo(input[, i], ...)
}
# plus an attempt to simplify `res` (e.g., to a matrix or vector)
So in your line:
apply(allsubset, 2, banddepthforonecurve, xdata=x, ydata=y[,i], tau=tau, use=use)
In a single iteration of your for loop, the first parameter of banddepthforonecurve (x) will be allsubset[, 1], then allsubset[, 2], ..., allsubset[, ncol(allsubset)]. Each of these is a whole column (presumably a set of curve indices), so xdata[, x] is a matrix with several columns. When you call the function by hand with a single number i, xdata[, i] drops to a plain vector, and apply() on a vector fails with "dim(X) must have a positive length".
The xdata parameter is always x, the tau and use parameters are always tau and use, and the for loop iterates over the columns of y to supply the ydata argument. You can think of it as a nested loop: for each column of y, use it as ydata and (via apply) iterate over all columns of allsubset.
(If the MARGIN argument of apply were 1, it would iterate over rows instead of columns.)
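The sketch below copies banddepthforonecurve from the question and runs it on small made-up data to show that apply() hands each column of allsubset to the function as x, and that this matches an explicit loop. The curve values, the use of combn() to build allsubset, tau = 2.5 and use = max are all assumptions, not taken from the project.

```r
banddepthforonecurve <- function(x, xdata, ydata, tau, use) {
  envsup <- apply(xdata[, x], 1, max)  # upper envelope of the selected curves
  envinf <- apply(xdata[, x], 1, min)  # lower envelope
  inenvsup <- ydata <= envsup
  inenvinf <- ydata >= envinf
  depth <- all(inenvsup) & all(inenvinf)
  localdepth <- depth & use(envsup - envinf) <= tau
  c(depth, localdepth)
}

# Hypothetical toy data: 3 time points (rows) x 3 curves (columns)
xdata <- cbind(c(0, 0, 0), c(2, 2, 2), c(1, 3, 1))
ydata <- c(1, 1, 1)
allsubset <- combn(ncol(xdata), 2)  # each column is a pair of curve indices

res_apply <- apply(allsubset, 2, banddepthforonecurve,
                   xdata = xdata, ydata = ydata, tau = 2.5, use = max)

# The same computation as an explicit loop over the columns of allsubset
res_loop <- sapply(seq_len(ncol(allsubset)), function(k)
  banddepthforonecurve(allsubset[, k], xdata = xdata, ydata = ydata, tau = 2.5, use = max))

all(res_apply == res_loop)  # TRUE
sum(res_apply[1, ])         # depth: 2
sum(res_apply[2, ])         # localdepth: 1

# Passing a single index instead of a column reproduces the error from the question
err <- tryCatch(banddepthforonecurve(1, xdata, ydata, 2.5, max),
                error = function(e) conditionMessage(e))
err  # "dim(X) must have a positive length"
```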
I am trying to do something much more complex than my example below, but the basic idea is encapsulated in this example:
pass_thru <- function(FUN,params){
  n <- length(FUN)
  out <- list()
  for(i in 1:n){
    temp <- get(FUN[i],mode="function")
    out[[i]] <- temp(params[[i]])
  }
  return(out)
}
fun1 <- function(x,y,z){
  x+y+z
}
fun2 <- function(l,m,n){
  l*m*n
}
FUN = c("fun1","fun2")
params = list(c(x=1,y=2,z=3,
l=4,m=5,n=6))
pass_thru(FUN,params)
The passing and parsing of FUN within pass_thru() works fine, but passing params as a list only works if every element of params is a single value (only one parameter passed to each function in FUN). I am not sure how to pass multiple parameters to each function in FUN.
What I really want to be able to do is to pass some of the parameters to each FUN in my call to pass_thru(), and have the body of pass_thru calculate the rest of the parameters to be passed to fun1 and fun2.
I am trying to create a flexible architecture upfront for large-scale data analysis, so having the ability to pass functions as well as any or all of those functions' parameters to other functions would be of great help. Thank you for any insights you have into this question!
Use do.call like this and also fix params as shown:
params <- list(list(x = 1, y = 2, z = 3), list(l = 4, m = 5, n = 6))
pass_thru <- function(FUN, params) Map(do.call, FUN, params)
pass_thru(FUN, params)
giving:
$fun1
[1] 6
$fun2
[1] 120
If you really did want:
params2 <- list(x = 1, y = 2, z = 3, l = 4, m = 5, n = 6)
then try this; the params[intersect(...)] part picks out the parameters that match each function's formal arguments:
pass_thru2 <- function(FUN, params) {
  runf <- function(f) do.call(f, params[intersect(names(params), names(formals(f)))])
  lapply(FUN, runf)
}
pass_thru2(FUN, params2)
giving the same values (as an unnamed list, since lapply() does not name its output here).
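For the other part of the question (supplying some parameters in the call and computing the rest inside pass_thru), one possible sketch combines modifyList() with do.call(). The name pass_thru3 and the rule for computed values (every argument the caller omits gets the value fill) are hypothetical placeholders for whatever the real calculation is.

```r
fun1 <- function(x, y, z) x + y + z
fun2 <- function(l, m, n) l * m * n

pass_thru3 <- function(FUN, params, fill = 1) {
  runf <- function(f, p) {
    needed <- names(formals(f))
    # placeholder for the real calculation of the remaining parameters
    computed <- setNames(as.list(rep(fill, length(needed))), needed)
    # values supplied by the caller override the computed ones
    args <- modifyList(computed, as.list(p))
    do.call(f, args[needed])
  }
  # Map() names the results after the character vector FUN
  Map(runf, FUN, params)
}

FUN <- c("fun1", "fun2")
out <- pass_thru3(FUN, list(list(x = 1, y = 2), list(l = 4, m = 5, n = 6)), fill = 10)
out$fun1  # 13 (z was filled in with 10)
out$fun2  # 120
```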