Pass generic column names to xtabs function in R - r

Is there any way to pass generic column names to functions like xtabs in R?
Typically, I'm trying to do something like:
xtabs(weight ~ col, data=dframe)
with col and weight two columns of my data.frame, weight being a column containing weights. It works, but if I want to wrap xtabs in a function to which I pass the column names as argument, it fails. So, if I do:
xtabs.wrapper <- function(dframe, colname, weightname) {
return(xtabs(weightname ~ colname, data=dframe))
}
it fails. Is there a simple way to do something similar? Perhaps I'm missing something with R logic, but it seems to me quite annoying not to be able to pass generic variables to such functions since I'm not particularly fond of copy/paste.
Any help or comments appreciated!
Edit: as mentioned in comments, I was suggested to use eval and I came with this solution:
xtabs.wrapper <- function(dframe, wname, cname) {
xt <- eval(parse(text=paste("xtabs(", wname, "~", cname, ", data=",
deparse(substitute(dframe)), ")")))
return(xt)
}
As I said, I seems to me to be an ugly trick, but I'm probably missing something about the language logic.

Not sure if this is any prettier, but here is a way to define a function without using eval ... it involves accessing the correct columns of dframe via []:
xtabs.wrapper <- function(dframe, wname, cname) {
tmp.wt <- dframe[,wname]
tmp.col <- dframe[,cname]
xt <- xtabs(tmp.wt~tmp.col)
return(xt)
}
Or you can shorten the guts of the function to:
xtabs.wrapper2 <- function(dframe, wname, cname) {
xt <- xtabs(dframe[,wname]~dframe[,cname])
return(xt)
}
To show they are equivalent here with an example from the mtcars data:
data(mtcars)
xtabs(wt~cyl, mtcars)
xtabs.wrapper(mtcars, "wt", "cyl")
xtabs.wrapper2(mtcars, "wt", "cyl")

I did this once:
creatextab<-function(factorsToUse, data)
{
newform<-as.formula(paste("Freq ~", paste(factorsToUse, collapse="+"), sep=""))
xtabs(formula= newform, drop.unused.levels = TRUE, data=data)
}
Obviously this is a different form because of the Freq, but basically .. you can generate the forumula as a string and then you are just using xtabs() directly.

If you want an n-way crosstab and cname contains a string of variable names, then you'll want the following:
xtabs.wrapper3 <- function(dframe, wname, cname) {
eval(cname)
formula <- paste0(wname, " ~ ", paste0(cname, collapse=" + ") )
xt <- xtabs(formula, data = dframe)
return(xt)
}
xtabs.wrapper3(mtcars, "wt", c("cyl", "vs"))

Related

How to find object name passed to function

I have a function which takes a dataframe and its columns and processes it in various ways (left out for simplicity). We can put in column names as arguments or transform columns directly inside function arguments (like here). I need to find out what object(s) are passed in the function.
Reproducible example:
df <- data.frame(x= 1:10, y=1:10)
myfun <- function(data, col){
col_new <- eval(substitute(col), data)
# magic part
object_name <- ...
# magic part
plot(col_new, main= object_name)
}
For instance, the expected output for myfun(data= df, x*x) is the plot plot(df$x*df$x, main= "x"). So the title is x, not x*x. What I have got so far is this:
myfun <- function(data, col){
colname <- tryCatch({eval(substitute(col))}, error= function(e) {geterrmessage()})
colname <- gsub("' not found", "", gsub("object '", "", colname))
plot(eval(substitute(col), data), main= colname)
}
This function gives the expected output but there must be some more elegant way to find out to which object the input refers to. The answer must be with base R.
Use substitute to get the expression passed as col and then use eval and all.vars to get the values and name.
myfun <- function(data, col){
s <- substitute(col)
plot(eval(s, data), main = all.vars(s), type = "o", ylab = "")
}
myfun(df, x * x)
Anothehr possibility is to pass a one-sided formula.
myfun2 <- function(formula, data){
plot(eval(formula[[2]], data), main = all.vars(formula), type = "o", ylab = "")
}
myfun2(~ x * x, df)
The rlang package can be very powerful when you get a hang of it. Does something like this do what you want?
library(rlang)
myfun <- function (data, col){
.col <- enexpr(col)
unname(sapply(call_args(.col), as_string))
}
This gives you back the "wt" column.
myfun(mtcars, as.factor(wt))
# [1] "wt"
I am not sure your use case, but this would work for multiple inputs.
myfun(mtcars, sum(x, y))
# [1] "x" "y"
And finally, it is possible you might not even need to do this, but rather store the expression and operate directly on the data. The tidyeval framework can help with that as well.

control scoping of arguments supplied to lm() from whithin a function called by lapply

I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and different wrapper functions for calling make_mods from lapply but it seems like my function, in using substitute only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment, and when passed data frames from within lapply.
The only thing that I could think of was to add an if statement to my make mods function that tests if the input is a call or not. If it's a call, it expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.

R: dynamic arguments

I'm using an R function which requires a list of variables as input arguments in the following format:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Basically the input arguments are column names in the table (and are not to be enclosed in ""). I have a list of these variables that I want to add one by one; i.e. run the function with one variable first, get the output, and incrementally adding variables (getting an output each time) i.e.
iteration 1:
output <- funName(gender ~ height, data=tableName)
iteration 2:
output <- funName(gender ~ height + weight, data=tableName)
iteration 3:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Is this possible?
Try the following:
# vector of variable names
myNames <- c("gender", "height", "weight", "varName4")
# print out results
for(i in 2:4) {
print(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))))
}
Of course, you can replace print with the appropriate funName, such as lm, along with additional arguments. So
for(i in 2:4) {
lm(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))), data=tableName)
}
Should work as you would expect it to. You could also use lapply if you wanted to save the results in an orderly fashion:
temp <- lapply(2:4, function(i) as.formula(paste(myNames[1], "~",
paste(myNames[2:i], collapse="+"))))
will save a list of formulas, for example.
Using the reformulate function as mentioned by #ben-bolker, you can simplify the web of paste functions:
for(i in 2:4) {
print(reformulate(myNames[2:i], response = myNames[1], intercept = TRUE))
}

Error: cannot join on columns: index out of bounds [duplicate]

I am trying to perform an inner join two tables using dplyr, and I think I'm getting tripped up by non-standard evaluation rules. When using the by=("a" = "b") argument, everything works as expected when "a" and "b" are actual strings. Here's a toy example that works:
library(dplyr)
data(iris)
inner_join(iris, iris, by=c("Sepal.Length" = "Sepal.Width"))
But let's say I was putting inner_join in a function:
library(dplyr)
data(iris)
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=c(xname = yname))
}
myfn("Sepal.Length", "Sepal.Width")
This returns the following error:
Error: cannot join on columns 'xname' x 'Sepal.Width': index out of bounds
I suspect there is some fancy expression, deparsing, quoting, or unquoting that I could do to make this work, but I'm a bit murky on those details.
You can use
myfn <- function(xname, yname) {
data(iris)
inner_join(iris, iris, by=setNames(yname, xname))
}
The suggested syntax in the ?inner_join documentation of
by = c("a"="b") # same as by = c(a="b")
is slightly misleading because both those values aren't proper character values. You're actually created a named character vector. To dynamically set the values to the left of the equals sign is different from those on the right. You can use setNames() to set the names of the vector dynamically.
I like MrFlick's answer and fber's addendum, but I prefer structure. For me setNames feels as something at the end of a pipe, not as an on-the-fly constructor. On another note, both setNames and structure enable the use of variables in the function call.
myfn <- function(xnames, ynames) {
data(iris)
inner_join(iris, iris, by = structure(names = xnames, .Data = ynames))
}
x <- "Sepal.Length"
myfn(x, "Sepal.Width")
A named vector argument would run into problems here:
myfn <- function(byvars) {
data(iris)
inner_join(iris, iris, by = byvars)
}
x <- "Sepal.Length"
myfn(c(x = "Sepal.Width"))
You could solve that, though, by using setNames or structure in the function call.
I know I'm late to the party, but how about:
myfn <- function(byvar) {
data(iris)
inner_join(iris, iris, by=byvar)
}
This way you can do what you want with:
myfn(c("Sepal.Length"="Sepal.Width"))
I faced a nearly identical challenge as #Peter, but needed to pass multiple different sets of by = join parameters at one time. I chose to use the map() function from the tidyverse package, purrr.
This is the subset of the tidyverse that I used.
library(magrittr)
library(dplyr)
library(rlang)
library(purrr)
First, I adapted myfn to use map() for the case posted by Peter. 42's comment and Felipe Gerard's answer made it clear that the by argument can take a named vector. map() requires a list over which to iterate.
myfn_2 <- function(xname, yname) {
by_names <- list(setNames(nm = xname, yname ))
data(iris)
# map() returns a single-element list. We index to retrieve dataframe.
map( .x = by_names,
.f = ~inner_join(x = iris,
y = iris,
by = .x)) %>%
`[[`(1)
}
myfn_2("Sepal.Length", "Sepal.Width")
I found that I didn't need quo_name() / !! in building the function.
Then, I adapted the function to take a list of by parameters. For each by_i in by_grps, we could extend x and y to add named values on which to join.
by_grps <- list( by_1 = list(x = c("Sepal.Length"), y = c("Sepal.Width")),
by_2 = list(x = c("Sepal.Width"), y = c("Petal.Width"))
)
myfn_3 <- function(by_grps_list, nm_dataset) {
by_named_vectors_list <- lapply(by_grps_list,
function(by_grp) setNames(object = by_grp$y,
nm = by_grp$x))
map(.x = by_named_vectors_list,
.f = ~inner_join(nm_dataset, nm_dataset, by = .x))
}
myfn_3(by_grps, iris)

Entering variables into regression function

I have this feature_list that contains several possible values, say "A", "B", "C" etc. And there is time in time_list.
So I will have a loop where I will want to go through each of these different values and put it in a formula.
something like for(i in ...) and then my_feature <- feature_list[i] and my_time <- time_list[i].
Then I put the time and the chosen feature to a dataframe to be used for regression
feature_list<- c("GPRS")
time_list<-c("time")
calc<-0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors<-matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
my_feature <- feature_list[i]
my_time <- time_list[i]
fitdata <- data.frame(data[my_feature], data[my_time])
for(j in 1:60) {
my_b <- 0.0001 * (2^j)
for(k in 1:60) {
my_c <- 0.0001 * (2^k)
cat("Feature: ", my_feature, "\t")
cat("b: ", my_b, "\t")
cat("c: ", my_c, "\n")
err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
calc<-calc+1
if(class(err) == "try-error") {
next
}
else {
coefs<-coef(nlsfit)
ess<-deviance(nlsfit)
result<-rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
}
}
}
}
Now in the nls() call I want to be able to call my_feature instead of just "A" or "B" or something and then to the next one on the list. But I get an error there. What am I doing wrong?
You can use paste to create a string version of your formula including the variable name you want, then use either as.formula or formula functions to convert this to a formula to pass to nls.
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use the bquote function to insert the variable names into a function call, then eval the function call.
I worked with R a while ago, maybe you can give this a try:
What you want is create a formula with a list of variables right?
so if the response variable is the first element of your list and the others are the explanatory variables you could create your formula this way:
my_feature[0] ~ reduce("+",my_feature[1:]) . This might work.
this way you can create formulae that depends on the variables in my_features.

Resources