Using dplyr: is there a way to loop over variables in a data frame and pass both the data and the variable name to a custom function?
I have a solution for this using mapply in base R. In the interest of learning I am wondering if there is a neat dplyr-way to achieve the same result.
Here is a small example, where each column in a data frame is transformed by adding a constant. The constant I wish to add is different for each variable, as listed in myconstants.
library(tidyverse)
mydata <- tibble(
a = 1:5,
b = 1:5,
c = 1:5
)
myconstants <- tibble(
a = 10,
b = 20
)
custom_function <- function (x, y, k) {
constant <- if (is.null(k[[y]])) 0 else k[[y]]
x + constant
}
# solution in base R
foo <- mapply(
custom_function,
mydata,
names(mydata),
MoreArgs = list(k = myconstants)
) %>%
as_tibble()
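One possible dplyr approach, offered as a sketch rather than a definitive answer, uses across() together with cur_column(), which exposes the name of the column currently being processed (this assumes dplyr 1.0.0 or later). The helper add_constant is a hypothetical variant of custom_function that checks the column name against names(k) instead of testing for NULL.

```r
library(dplyr)

mydata <- tibble(a = 1:5, b = 1:5, c = 1:5)
myconstants <- tibble(a = 10, b = 20)

# Hypothetical helper: add the constant stored under `name`, or 0 if none exists
add_constant <- function(x, name, k) {
  constant <- if (name %in% names(k)) k[[name]] else 0
  x + constant
}

# cur_column() supplies the name of the column that across() is working on
result <- mydata %>%
  mutate(across(everything(), ~ add_constant(.x, cur_column(), myconstants)))
```

Here column c has no entry in myconstants, so it is left numerically unchanged.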
I want to generate a dataframe where each row is given by a function of the cross product of several properties.
This can be done imperatively:
seed <- expand.grid(
c(1,2,3),
c(4,5,6)
)
d <- data.frame()
for(i in 1:nrow(seed)) {
d[i,"prop1"] = f(seed[i,])
d[i,"prop2"] = g(seed[i,])
}
where f and g are functions. But I would prefer a functional version. I tried to use pmap:
pmap(seed, ~ data.frame(
prop1 = f(..1, ..2),
prop2 = g(..1, ..2))
)
but this doesn't give me a dataframe. Is there an (efficient) way to do this without loops?
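If f and g return atomic values, why not have them return complete vectors by passing the whole of seed to them, and build d in one step? Note that assigning a column to the empty data.frame() created above would fail because the row counts differ. E.g.
d <- data.frame(prop1 = f(seed), prop2 = g(seed))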
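If f and g only accept one pair of values at a time, a loop-free sketch in base R could use mapply() over the columns of seed. The definitions of f and g below are hypothetical stand-ins, since the question does not show them; Var1 and Var2 are the default column names that expand.grid() assigns to unnamed arguments.

```r
# Hypothetical stand-ins for the question's f and g
f <- function(x, y) x + y
g <- function(x, y) x * y

seed <- expand.grid(c(1, 2, 3), c(4, 5, 6))

# mapply() calls f and g once per row, pairing up the two columns
d <- data.frame(
  prop1 = mapply(f, seed$Var1, seed$Var2),
  prop2 = mapply(g, seed$Var1, seed$Var2)
)
```

With the pmap() call from the question, the result is a list of one-row data frames, so wrapping it in dplyr::bind_rows() should give a single data frame as well.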
Consider the following data frame:
dat <- data.frame(
ID = c(1:200),
var1 = rnorm(200),
var2 = rnorm(200),
var3 = rnorm(200),
var4 = rnorm(200)
)
We want to use the lapply() function to apply the ggqqplot() function from the ggpubr package to columns 2:5 of dat (var1 to var4):
library(tidyverse)
library(ggpubr)
vars <- names(dat)[2:5] # column names var1 to var4
lapply(vars, FUN=ggqqplot, data=dat)
This works fine, but we want to print the figures side by side (e.g., in 2 rows, 2 columns). How can we do this using the apply framework?
ggqqplot allows facets using the argument facet.by (check ?ggqqplot), so all you have to do is convert your data into facet-friendly long form. See below:
dat_melt <- reshape2::melt(dat, measure.vars = vars) # converts data into long-form
ggqqplot(dat_melt, x="value", facet.by = "variable")
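For readers without reshape2, the same long-form reshaping can be sketched in base R with stack(). This is only an illustration of the data layout that faceting needs; the column names value and variable are chosen here to match the melt() output used above, and the seed is an arbitrary assumption.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
dat <- data.frame(
  ID = 1:200,
  var1 = rnorm(200),
  var2 = rnorm(200),
  var3 = rnorm(200),
  var4 = rnorm(200)
)
vars <- names(dat)[2:5]

# stack() puts all values in one column and the source column name in another
dat_long <- stack(dat[vars])
names(dat_long) <- c("value", "variable")
```

The resulting dat_long has one row per observation per variable and could be passed to ggqqplot() with x = "value" and facet.by = "variable".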
Edit:
If you still want to use apply functions, the following is the easiest solution I can think of
plist <- lapply(vars, FUN = function (x) ggqqplot(dat,x = x))
cowplot::plot_grid(plotlist = plist, ncol = 2)
or the pipe-friendly equivalent
lapply(vars, FUN = function (x) ggqqplot(dat,x = x)) %>%
{cowplot::plot_grid(plotlist=.,ncol = 2)}
I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames df., so I really need a function for this, but R returns an error saying that df.y is not found. y is meant to iterate over a list of the names of all 4,000 data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame through a dynamically constructed name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the call above returns df.1.
Hence, the function can be modified as:
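f1 <- function(x, y) {
  # look up the data frame whose name is "df.<y>"
  temp_df <- get(paste("df", y, sep = "."))
  # return the new data frame visibly instead of only assigning it to x_adj
  data.frame("DID*" = temp_df$`DM`[x], "LDI" = temp_df$`DirectorID*`[-(x)],
             "LDM" = temp_df$`DM`[-(x)], "IID*" = y)
}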
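As a sketch of how the get() lookup could be combined with the small df.1 example from the question, assuming the goal is one new data frame per row of the source: split_one and pieces are hypothetical names, and the results are kept in a named list rather than as separate variables.

```r
df.1 <- data.frame(x = 1:3, b = 5)

# Hypothetical helper: row i supplies x and b, the remaining rows supply c
split_one <- function(df, i) {
  data.frame(x = df$x[i], c = df$x[-i], b = df$b[i])
}

# Build the name dynamically, then fetch the data frame with get()
df_name <- paste("df", 1, sep = ".")
source_df <- get(df_name)

rows <- seq_len(nrow(source_df))
pieces <- lapply(rows, function(i) split_one(source_df, i))
names(pieces) <- paste0(df_name, rows)  # "df.11", "df.12", "df.13"
```

Keeping the outputs in a list means the same code can be looped over all the data frame names without creating thousands of additional global variables.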
The case I have is I want to "tack on" a bunch of columns to an existing data.frame, where each column is a function that does math on other columns. My goals are:
1. I want to specify the functions once
2. I don't want to worry about having to pass arguments in the right order and/or match them by name
3. I want to specify the order in which to apply the functions once
4. I want the new column names to be the function names
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once the function dfapply is defined, it can be used much like your original intention, without having to write things like x$a, except that it takes expressions instead of functions.
dfapply <- function(exprs, df){
for (expr in exprs) {
df <- within(df, eval(expr))
}
df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
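If plain functions are preferred over expressions, a minimal sketch is to loop over a named list: the list order fixes the order of application and the list names become the column names. The helper add_columns is a hypothetical name, and each function is assumed to take the data frame as its only argument.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(a = rnorm(10), b = rnorm(10))

# The order of this list is the order in which the columns are added
funs <- list(
  y = function(d) d$a + d$b,
  z = function(d) d$b * d$y
)

# Hypothetical helper: apply each function in turn, naming the column after it
add_columns <- function(df, funs) {
  for (name in names(funs)) {
    df[[name]] <- funs[[name]](df)
  }
  df
}

df2 <- add_columns(df, funs)
```

Because each function sees the data frame as updated so far, z can refer to the y column created one step earlier.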
I'm not very familiar with how R functions handle variables passed as arguments.
Here's the problem:
I want to build a function whose ... arguments are column names of a data frame, to be passed on to table().
f <- function (data, ...){
T <- with(data, table(...)) # ... variables input
return(T)
}
How can I make this code work?
Thanks a lot for answering!
The order of evaluation doesn't quite work right with with(), apparently. Here's an alternative that should work (using sample data from @DavidArenburg):
set.seed(1)
data1 <- data.frame(a = sample(5,5), b = sample(5,5))
f <- function (data, ...) {
xx <- lapply(substitute(...()), eval, data, parent.frame())
T <- do.call(table, xx)
return(T)
}
f(data = data1, a,b)
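A more verbose variant of the same idea, given as a sketch, captures the unevaluated arguments with substitute(list(...)) and evaluates each one inside the data frame. The name f_alt is hypothetical; labelling the table dimensions with the deparsed argument names is an addition that is not part of the answer above.

```r
set.seed(1)
data1 <- data.frame(a = sample(5, 5), b = sample(5, 5))

f_alt <- function(data, ...) {
  caller <- parent.frame()
  # Capture the arguments in ... as unevaluated expressions
  exprs <- as.list(substitute(list(...)))[-1]
  # Evaluate each expression with the data frame's columns in scope
  cols <- lapply(exprs, eval, envir = data, enclos = caller)
  # Use the argument text as the dimension names of the table
  names(cols) <- vapply(exprs, deparse, character(1))
  do.call(table, cols)
}

res <- f_alt(data1, a, b)
```

Falling back to the caller's environment through enclos means names that are not columns of data are still looked up where f_alt was called.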
It is often far easier to avoid non-standard evaluation and use character strings to reference the columns within a data.frame.
set.seed(1)
data1 <- data.frame(a = sample(5,5), b = sample(5,5))
f <- function (data, ...) {
do.call(table,data[unlist(list(...))])
}
# the following calls to `f` return the same results
f(data = data1, 'a','b')
f(data = data1, c('a','b'))
a <- c('a','b')
f(data = data1, a)