converting a string into a data frame name - r

In functions such as plotmeans there is an argument that specifies the data frame to use, data=. I would like to construct the name of the data frame to be used using paste0 or something similar, df <- paste0("results", i), where i is a number to get (say) "results04". If I then use data=df, I get an error saying that data= expects a variable, not a string. Is there any way to convert the string into a form that data= will accept? data=results04 without the quotes, of course, works.
Thanks for any suggestions or pointers.

The answer would have been obvious to one with more R experience, but let me put it here for others: use the get() function, so for instance
df <- paste0("results", i)
plotmeans(a ~ b, data=get(df))
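For readers who want to try this without plotmeans, here is a minimal sketch in base R. The data frame contents and the zero-padded naming scheme are assumptions made for illustration; the named-list alternative at the end is a common way to avoid building variable names at all, not something the original answer proposed.

```r
# Hypothetical data frame standing in for one of results01, results02, ...
results04 <- data.frame(a = c(1, 2, 3, 4), b = c("x", "x", "y", "y"))

i <- 4
# sprintf() zero-pads the number; paste0("results", i) would give "results4"
df_name <- sprintf("results%02d", i)

# get() looks up the object whose name is stored in the string
df <- get(df_name)

# Alternative: keep related data frames in a named list and index by name
results <- list(results04 = results04)
df_from_list <- results[[df_name]]
```

Either df or df_from_list can then be passed wherever a data= argument expects a data frame.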

Related

R, Is there a function for deparsing and (re)parsing any regression output for lm/glm/segmented models?

Is there an R package/class/code that will deparse the output object from lm or glm into text (or JSON/XML/Other MarkUp) and can also parse it back into an object with the same structure as the original (i.e. so code can query the new object in the same way as the original)?
To complicate matters, I want to be able to do the same with output from the segmented package. So I am looking for something that can read the syntax of any parsed list, including the attr, into the originally structured list object with attr. But a solution to the single lm would be a good starting point.
So far I have tried:
1. toJSON: does not support R type 6.
2. toString then as.list: gives a list with everything in one item, and escape symbols prevent parsing.
3. deparse(str(summary(OUTPUT))): no error, but returns NULL.
4. deparse(OUTPUT) then as.list: gives a list with one item per top level, but sublevels are just strings.
5. deparse(OUTPUT) then parse directly: error, must be a string or connection.
6. Passing the string output to a new structure() object: same result as 4, except not recognised as a list by RStudio.
7. Exporting the summary metrics to a data frame and then to JSON: keeps the coefficients but loses other information.
I wish to avoid having to make any assumptions about the original structure or number of terms in the formula (so other regression models can be used) which means code to scan the deparsed lm model will need to be quite subtle.
As @AEF mentions in their comment, you might be looking for serialization
with saveRDS(). If you specifically need a text format rather than a
binary one, you can use ascii = TRUE.
fit0 <- lm(mpg ~ wt, data = mtcars)
saveRDS(fit0, "model.rds", ascii = TRUE)
fit1 <- readRDS("model.rds")
predict(fit0, data.frame(wt = 3))
#> 1
#> 21.25171
predict(fit1, data.frame(wt = 3))
#> 1
#> 21.25171
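If the text needs to live in a string rather than a file (for example, to embed it in another document), the same idea can be sketched with serialize() and unserialize() from base R. The variable names txt and fit2 are illustrative, and this is only one way to do it.

```r
fit0 <- lm(mpg ~ wt, data = mtcars)

# serialize() with connection = NULL returns a raw vector;
# with ascii = TRUE its bytes are plain text, so rawToChar() gives a string
txt <- rawToChar(serialize(fit0, connection = NULL, ascii = TRUE))

# Turn the string back into an object with the same structure
fit2 <- unserialize(charToRaw(txt))

# Compare the restored model with the original
same_coef <- all.equal(coef(fit0), coef(fit2))
same_pred <- all.equal(predict(fit0, data.frame(wt = 3)),
                       predict(fit2, data.frame(wt = 3)))
```

The resulting text is R's own serialization format, not JSON or XML, so it is meant to be read back by R rather than by other tools.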

Calculating multiple ROC curves in R using a for loop and pROC package. What variable to use in the predictor field?

I am using the pROC package and I want to calculate multiple ROC curve plots using a for loop.
My variables are column names stored as strings in a vector, and I want pROC to read that vector sequentially and use each string in the "predictor" field, which seems to accept text/characters.
However, I cannot pass the variable correctly; I get the error:
'predictor' argument should be the name of the column, optionally quoted.
here is an example code with aSAH dataset:
ROCvector<- c("s100b","ndka")
for (i in seq_along(ROCvector)){
a<-ROCvector[i]
pROC_obj <- roc(data=aSAH, outcome, as.character(a))
#code for output/print#
}
I have tried to call just "a" and using the functions print() or get() without any results.
Writing manually the variable (with or without quoting) works, of course.
Is there something I am missing about the type of variable I should use in the predictor field?
By passing data=aSAH as first argument, you are triggering the non-standard evaluation (NSE) of arguments, dplyr-style. Therefore you cannot simply pass the column name in a variable. Note the inconsistency with outcome that you pass unquoted and looks like a variable (but isn't)? Fortunately, functions with NSE in dplyr come with an equivalent function with standard evaluation, whose name ends with _. The pROC package follows this convention. You should usually use those if you are programming with column names.
Long story short, you should use the roc_ function instead, which accepts characters as column names (don't forget to quote "outcome"):
pROC_obj <- roc_(data=aSAH, "outcome", as.character(a))
A slightly more idiomatic version of your code would be:
for (predictor in ROCvector) {
pROC_obj <- roc_(data=aSAH, "outcome", predictor)
}
roc can accept formula, so we can use paste0 and as.formula to create one. i.e.
library(pROC)
ROCvector<- c("s100b","ndka")
for (i in seq_along(ROCvector)){
a<-ROCvector[i]
pROC_obj <- roc(as.formula(paste0("outcome~",a)), data=aSAH)
print(pROC_obj)
#code for output/print#
}
To get the original call, i.e. without paste0, which you can use later for downstream calculations, use eval and bquote:
pROC_obj <- eval(bquote(roc(.(as.formula(paste0("outcome~",a))), data=aSAH)))
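As a side note on the formula approach, base R also offers reformulate() for building a formula from column names without pasting strings by hand. The sketch below only constructs the formulas and does not call pROC; passing them on to roc(..., data = aSAH) is assumed to work the same way as the as.formula version.

```r
predictors <- c("s100b", "ndka")

# Build one formula per predictor: outcome ~ s100b, outcome ~ ndka
formulas <- lapply(predictors, function(p) reformulate(p, response = "outcome"))
names(formulas) <- predictors

# deparse() shows each formula as text, for inspection
formula_text <- vapply(formulas, deparse, character(1))
```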

Creating call objects to compare to formula elements

I would like to create an object from a string to compare with an element of a formula.
For example, in the following:
# note that f does not exist
myForm <- y ~ f(x)
theF <- myForm[[3]]
fString <- "f(x)"
How can I compare fString to theF?
If I know the string is "f(x)" I can manually enter the following
cheating <- as.call(quote(f(x)))
identical(theF, cheating)
which works (it gives TRUE), but I want to be able to take the string "f(x)" as an argument (e.g. maybe it's "g(x)").
The real point of this question is for me to understand better how to work with call objects and quote function.
parse(text = s) converts text, s, to an expression and e[[1]] extracts the call object from a length 1 expression e. theF is a call object so putting these together we have:
identical(theF, parse(text = fString)[[1]])
## TRUE
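The comparison can be wrapped in a small helper so that the string is an argument. This is a sketch: matches_rhs is a hypothetical name, and str2lang() (available in R 3.6.0 and later) is used here as a shorthand for parse(text = s)[[1]].

```r
myForm <- y ~ f(x)   # f does not need to exist; the formula is not evaluated

# Hypothetical helper: does the string s describe the right-hand side of form?
matches_rhs <- function(s, form) {
  identical(form[[3]], str2lang(s))
}

is_f <- matches_rhs("f(x)", myForm)      # same call as the right-hand side
is_g <- matches_rhs("g(x)", myForm)      # different function name
is_f_spaced <- matches_rhs("f( x )", myForm)  # whitespace is dropped by the parser
```

Because the string is parsed into a call before comparing, differences in spacing do not matter, only the structure of the call.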
Note that formulas do nothing on their own in R.
A formula is only stored as an unevaluated expression, which prints as
y ~ f(x)
It is then up to the functions that accept formulas to interpret it.
Check coplot for an example implementation.
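For what it's worth, a formula can be inspected like any other language object, which is what makes the comparison with a parsed string possible. A short sketch using only base R:

```r
myForm <- y ~ f(x)

form_class <- class(myForm)      # the class attribute of the formula
form_is_call <- is.call(myForm)  # a formula is a call to the `~` operator
form_head <- myForm[[1]]         # the operator itself, as a symbol
form_lhs <- myForm[[2]]          # left-hand side: the symbol y
form_rhs <- myForm[[3]]          # right-hand side: the call f(x)
form_vars <- all.vars(myForm)    # variable names, excluding function names
```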

invalid 'envir' argument of type 'character' -- in self-defined function with lattice histogram

I want a function with parameters such as data name (dat), factor(myfactor), variable names(myvar) to dynamically generate histograms (have to use lattice).
Using IRIS as a minimal example:
data(iris)
my_histogram <- function(myvar,myfactor,dat){
listofparam <- c(myvar,myfactor)
myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
histogram(myf,
data=dat,
main=bquote(paste(.(myvar),"distribution by",.(myfactor),seq=" ")))}
my_histogram("Sepal.Length","Species","iris")
I also tried do.call as some posts indicated:
my_histogram <- function(myvar,myfactor,dat){
listofparam <- c(myvar,myfactor)
myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
p <- do.call("histogram",
args = list(myf,
data=dat))
print(p)
}
my_histogram("Sepal.Length","Species","iris")
But the error appears: invalid 'envir' argument of type 'character'. I think the program doesn't know where to look for this myf string. How can I fix this, or is there a better way?
Readers of this should be aware that the question has completely mutated from an earlier version and doesn't really match up with this answer anymore. The answer to the new question appears in the comments.
There is no object named Sepal.Length. (So R raises an error even before the function gets called.) There is only a column name, and it would need to be quoted to pass it to a function. (The data object could not be created because that URL fails to deliver the data. Why aren't you using the built-in copy of the iris data object?)
You will also need to build a formula from myvar and fac. Formulas are expressions and get parsed without evaluation of their tokens. You need to build a formula inside your function that looks like: ~Sepal.Length|Species and then pass it to the histogram call. Consult ?as.formula
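For the current version of the question, one likely cause of the 'envir' error is that dat arrives as the string "iris" rather than as the data frame itself. A minimal sketch under that assumption follows; looking the name up with get() is one option, and passing the data frame directly (dat = iris) is the simpler one.

```r
library(lattice)

my_histogram <- function(myvar, myfactor, dat) {
  # Accept either a data frame or the name of one
  # (assumption: the named object is visible from the caller)
  if (is.character(dat)) dat <- get(dat, envir = parent.frame())
  # Build a formula such as ~ Sepal.Length | Species
  myf <- as.formula(paste("~", myvar, "|", myfactor))
  histogram(myf, data = dat,
            main = paste(myvar, "distribution by", myfactor))
}

p_from_name <- my_histogram("Sepal.Length", "Species", "iris")
p_from_data <- my_histogram("Sepal.Length", "Species", iris)
```

Both calls return a lattice object; print(p_from_name) draws the plot.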

Accessing Arbitrary Columns from an R Data Frame using with()

Suppose that I have a data frame with a column whose name is stored in a variable. Accessing this column using the variable is easy using bracket notation:
df <- data.frame(A = rep(1, 10), B = rep(2, 10))
column.name <- 'B'
df[,column.name]
But it is not obvious how to access an arbitrary column using a call to with(). The naive approach
with(df, column.name)
effectively evaluates column.name in the caller's environment. How can I delay evaluation sufficiently that with() will provide the same results that brackets give?
You can use get:
with(df, get(column.name))
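A quick check that this matches bracket indexing, plus one alternative that avoids get(). This is a sketch; the eval(as.name(...), df) form is an assumption about what some readers may prefer, not part of the original answer.

```r
df <- data.frame(A = rep(1, 10), B = rep(2, 10))
column.name <- "B"

via_brackets <- df[[column.name]]           # plain indexing by name
via_get <- with(df, get(column.name))       # look the name up inside df
via_eval <- eval(as.name(column.name), df)  # evaluate the symbol B inside df
```

One caveat: if df had no column B, get() and eval() would fall back to a variable named B in the enclosing environment, whereas df[["B"]] would return NULL.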
You use 'with' to create a localized and temporary namespace inside which you evaluate some expression. In your code above, you haven't passed in an expression.
For instance:
data(iris) # this data is in your R installation, just call 'data' and pass it in
Ordinarily you have to refer to variable names within a data frame like this:
tx = tapply(iris$Sepal.Length, list(iris$Species), mean)
Unless you do this:
attach(iris)
The problem with using 'attach' is the likelihood of namespace clashes, so you've got to remember to call 'detach'
It's much cleaner to use 'with':
tx = with( iris, tapply(Sepal.Length, list(Species), mean) )
So, the call signature (informally) is: with( data, expression )
