I have a function f1 on a dataframe which calls another function f2 and then does stuff with f2's output. f2 works interactively on its own but how do I get it to run when called by f1?
f1 <- function(x, y) {
z <- f2(x, y)
# do stuff with z
w <- z
return(w)
}
f2 uses subset() to de-select certain columns:
f2 <- function(x, y) {
y <- substitute(y)
subset(x, select = -eval(y))
}
As you can see, f2 works interactively. I don't care about that, but I do want it to work when called by f1.
# This works fine interactively (but I don't care about that)
f2(mtcars,mpg)
# This is what I want to work
f1(mtcars,mpg)
Error in -eval(y) : invalid argument to unary operator
I would prefer not to change f1 or its arguments. How do I re-write f2 so that it works within f1?
Here is a similar question with solutions that I'm having trouble applying to my context: R: passing expression to an inner function
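For context, the failure can be reproduced with a much smaller pair of functions. This is only a minimal sketch (the names capture_inner and capture_outer are made up for illustration): substitute() looks at the expression supplied one call up, so when the inner function is called through a wrapper it sees the wrapper's argument name rather than mpg.

```r
# Hypothetical helper names, used only to illustrate the capture behaviour.
capture_inner <- function(y) substitute(y)
capture_outer <- function(y) capture_inner(y)

direct  <- capture_inner(mpg)  # captures what the caller typed: mpg
wrapped <- capture_outer(mpg)  # captures the wrapper's argument name: y

print(direct)
print(wrapped)
```

In the original f2, eval() of that captured symbol y just returns a symbol again, and applying unary minus to a symbol gives the "invalid argument to unary operator" error.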
The easiest approach is to use rlang and tidyverse functions, which support quasiquotation:
library(dplyr)
library(rlang)
f1 <- function(x, y) {
z <- f2(x, -!!enquo(y))
# do stuff with z
w <- z
return(w)
}
f2 <- function(x, y) {
select(x, !!enquo(y))
}
f1(mtcars,mpg)
# cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# ...
This works in base R :
f1 <- function(x, y) {
z <- f2(x, substitute(y))
# do stuff with z
w <- z
return(w)
}
f2 <- function(x, y) {
eval(substitute(subset(x, select = -Y), list(Y = y)))
}
f1(mtcars,mpg)
# cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# ...
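If f1 really has to stay untouched, one possible base R sketch is to let f2 ask its caller what expression was supplied. This is an assumption-laden variant, not the answer above: it relies on f2 being called through exactly one wrapper (or directly), and the local name target is made up for illustration.

```r
# f1 is unchanged from the question.
f1 <- function(x, y) {
  z <- f2(x, y)
  # do stuff with z
  w <- z
  return(w)
}

f2 <- function(x, y) {
  # substitute(substitute(y)) builds the call `substitute(y)`;
  # eval.parent() then runs that call in the caller's frame (f1),
  # which recovers the symbol the user originally typed (mpg).
  target <- eval.parent(substitute(substitute(y)))
  # Splice the recovered symbol into the subset() call and evaluate it here.
  eval(substitute(subset(x, select = -TARGET), list(TARGET = target)))
}

res <- f1(mtcars, mpg)
names(res)
```

The trade-off is that f2 now depends on how deep it sits in the call stack: a second layer of wrapping would break it again, which is why passing the captured expression explicitly is usually the more robust choice.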
With rlang 0.4.0 or later, we can use the {{ }} (curly-curly) operator, which makes it easier to do the evaluation:
library(dplyr)
library(rlang)
f1 <- function(x, y) {
z <- f2(x, -{{y}})
# do stuff with z
w <- z
return(w)
}
f2 <- function(x, y) {
select(x, {{y}})
}
f1(mtcars,mpg) %>%
head
# cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
Hey, I'm sure I'm missing something simple with mapping, but I can't get it to work. I want to use a loop to do the same calculation for multiple dataframes that share the same columns. Basically, I want this loop to not throw an error:
mtcars1 <- mtcars
mtcars2 <- mtcars
for(x in c(mtcars1, mtcars2)){
x$new <- x$mpg * x$cyl
}
So that at the end, both mtcars1 and mtcars2 have a new variable called new, that is mpg * cyl.
If you first put your dataframes into a list, you can index into each using seq_along():
dfs <- list(mtcars1 = mtcars, mtcars2 = mtcars)
for (i in seq_along(dfs)) {
dfs[[i]]$new <- dfs[[i]]$mpg * dfs[[i]]$cyl
}
Or, using lapply():
dfs <- lapply(dfs, \(x) {
x$new <- x$mpg * x$cyl
x
})
Result from either approach:
#> head(dfs$mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
If you really want to leave your dataframes loose in the environment, you could do something like
for (nm in c("mtcars1", "mtcars2")) {
x <- get(nm)
x$new <- x$mpg * x$cyl
assign(nm, x)
}
Result:
#> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
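The for loop does not work the way you expect and, more importantly, neither does c(mtcars1, mtcars2): it concatenates the columns of both data frames into one flat list instead of producing a list of two data frames. To see this, run: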
test <- c(mtcars1, mtcars2)
length(test)
str(test)
You need to replace the c with list. Below is one solution:
mtcars1 <- mtcars
mtcars2 <- mtcars
test <- list(mtcars1, mtcars2)
newtest <- lapply(test, FUN=function(x)
within(x, new <- mpg * cyl))
We could use transform with lapply
lst1 <- lapply(lst1, transform, new = mpg * cyl)
data
lst1 <- list(mtcars1, mtcars2)
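If the data frames already exist as separate objects, a possible way to combine the list-based answers with writing the results back is mget() followed by list2env(). This is a sketch under the assumption that overwriting mtcars1 and mtcars2 in the current environment is acceptable:

```r
mtcars1 <- mtcars
mtcars2 <- mtcars

# Collect the loose data frames into a named list.
dfs <- mget(c("mtcars1", "mtcars2"), envir = environment())

# Add the new column to each element.
dfs <- lapply(dfs, function(d) {
  d$new <- d$mpg * d$cyl
  d
})

# Write the modified copies back over the originals, matching by name.
invisible(list2env(dfs, envir = environment()))

head(mtcars1$new)
```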
Following on from my previous challenging exercise: promise already under evaluation with nesting from function, I have learnt thus far how to properly use: enquos, !!!, c() within a function for a variety of calling methods. However, my next challenge is more complex - I want to call a function within a function, and only passing it parameters from the outer function. Essentially, I wanted to make a list of functions and pass different parameters to each element from the list by using another function.
for example:
anotherTest <- function(data,...){
cols = enquos(...)
testFunc <- function(df, more){
df %>% mutate(!!!c(more))
}
n <- length(cols)
addMutation <- replicate(n, testFunc, simplify=FALSE)
print(addMutation)
addCars <- replicate(n, data)
mapply(function(x, y, z) x %>% reduce(., y, z),addCars, addMutation, cols)
}
When I call:
anotherTest(mtcars, vs, gear, am)
I get this error:
Error in fn(out, elt, ...) : unused argument (~vs)
We could try
anotherTest <- function(data,...){
cols = enquos(...)
testFunc <- function(df, more){
df %>% mutate(!!!c(more))
}
n <- length(cols)
addMutation <- replicate(n, testFunc, simplify=FALSE)
addCars <- replicate(n, data, simplify = FALSE)
Map(function(x, y, z) y(x, z), addCars, addMutation, cols)
}
-testing
out <- anotherTest(mtcars, vs, gear, am)
> lapply(out, head, 3)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
[[3]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I want to copy data frame a to a new data frame b inside a function.
a <- mtcars
saveData <- function(x, y){
y <- x
return(y)
}
saveData(a, b)
In this example, the function should create the object/data frame b. b should be a copy of a (i.e., mtcars)
The crux is to flexibly "name" objects.
I played around extensively with assign(), deparse(), and substitute(), but I could not make it work.
It is not good practice to save data in the global environment from a function. However, if you want to do it, here is a way:
saveData <- function(x, y){
assign(deparse(substitute(y)), x, envir = parent.frame())
}
a <- mtcars
b
Error: object 'b' not found
saveData(a, b)
b
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#...
Another idea is to use list2env, but you have to convert to a named list, so your second argument will need to be a character, i.e.
saveData <- function(x, y) {
v1 <- setNames(list(x), y)
list2env(v1, envir = .GlobalEnv)
}
saveData(a, 'b')
b
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#.....
NOTE: I wouldn't recommend adding stuff to your global environment. It is better to keep such objects in a list.
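A minimal sketch of the list-based alternative, assuming a container called copies (a made-up name): the "flexible naming" then becomes ordinary list indexing by a character string.

```r
a <- mtcars

copies <- list()        # hypothetical container for the named copies
copies[["b"]] <- a      # name given literally

target <- "c"           # name held in a variable
copies[[target]] <- a

names(copies)
```

Each copy can then be retrieved with copies[["b"]] or copies$b, and nothing outside the list is modified.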
I want to reorder the rows of a data.table according to some given sequence of indices, which is what setcolorder does for columns. Is there a function for this?
Here is a reproducible example, with the expected output.
> DT = data.table(mtcars, keep.rownames=TRUE)[1:3]
> DT
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
> ord = c(3,1,2)
> setroworder(DT, ord)
> DT
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
2: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
3: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
neworder should be the "lookup index" of the new ordering, e.g. neworder = c(3, 1, 2) gives the 3rd row as the new first row, the 1st row as the new second row, etc...
# example
DT = data.table(mtcars, keep.rownames=TRUE)[1:3]
ord = c(3,1,2)
DT
rn mpg cyl disp hp drat wt qsec vs am gear carb
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# use DT[ord, do_stuff]:
setorderv(DT[ord, .rn := .I], ".rn")[]
rn mpg cyl disp hp drat wt qsec vs am gear carb .rn
1: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
2: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
3: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
As noted in a comment, I think it would be a bad idea to get rid of the column capturing the row ordering, but you can make a wrapper to get rid of it as in the other answers.
It may be necessary to use 1:.N in a future version if the behavior of .I changes as discussed here: https://github.com/Rdatatable/data.table/issues/2598
If your integer vector is not inherent to the table, then I don't see a function to do it automatically (I hope others will chime in, I'm not a data.table guru). Lacking that, here's a quick function, with annoying message calls that print the object's memory address to show that the reordering happens by reference (the memory location does not change):
setroworder <- function(DT, vec, verbose = TRUE, vecname = NA) {
if (is.logical(verbose)) verbose <- if (verbose) message else c
verbose("# ", data.table::address(DT))
if (is.na(vecname)) {
# find an unused name
vecname <- make.unique(c(colnames(DT), "vec"))[ ncol(DT) + 1L ]
}
verbose("# ", data.table::address(DT))
set(DT, i = NULL, j = vecname, value = order(vec))
verbose("# ", data.table::address(DT))
setorderv(DT, vecname)
verbose("# ", data.table::address(DT))
set(DT, j = vecname, value = NULL)
verbose("# ", data.table::address(DT))
invisible(DT) # convenience only; this function works by side effect
}
In action:
x <- data.table(a = 1:10)
setroworder(x, c(3,1,2,4:10))[]
# # 0000000012EFF1A8
# # 0000000012EFF1A8
# # 0000000012EFF1A8
# # 0000000012EFF1A8
# # 0000000012EFF1A8
# a
# 1: 3
# 2: 1
# 3: 2
# 4: 4
# 5: 5
# 6: 6
# 7: 7
# 8: 8
# 9: 9
# 10: 10
Edit. The original answer did not give behaviour equivalent to setcolorder. neworder should be the "lookup index" of the new ordering, e.g. neworder = c(3, 1, 2) gives the 3rd row as the new first row, the 1st row as the new second row, etc...
Here is my solution:
setroworder <- function(x, neworder) {
# This assumes that there is some convention that colnames do not start with '.'.
# I don't know if there is any such convention though.
x[, .indexcol := sort.int(neworder, index.return = TRUE)$ix]
setorder(x, .indexcol)
x[, .indexcol := NULL]
}
Testing it out:
> x <- as.data.table(mtcars)
> head(x)
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> set.seed(42)
> head(setroworder(x, sample(32)))
mpg cyl disp hp drat wt qsec vs am gear carb
1: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
2: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
3: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
4: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
5: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
6: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
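Another possible sketch avoids the temporary ordering column by replacing each column with its reordered values through set(). The helper name reorder_rows is made up and is not part of data.table; it assumes neworder is a permutation of the row numbers, and I would expect any key on the table to be dropped when its columns are overwritten.

```r
library(data.table)

# Hypothetical helper; not a data.table function.
reorder_rows <- function(DT, neworder) {
  stopifnot(is.data.table(DT), length(neworder) == nrow(DT))
  # Each column is read once in its original order and written back reordered.
  for (j in seq_along(DT)) {
    set(DT, j = j, value = DT[[j]][neworder])
  }
  invisible(DT)
}

DT <- data.table(mtcars, keep.rownames = TRUE)[1:3]
reorder_rows(DT, c(3, 1, 2))
DT$rn
```

The table object itself is modified in place, although each column vector is reallocated, so this is not cheaper in memory than the setorder-based versions above.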
As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simple way to do this in dplyr? A function like to_rownames() that is the opposite of add_rownames()?
I saw a solution using a custom function, but it's really out of dplyr philosophy.
You can now use the tibble package:
tibble::column_to_rownames()
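A short usage sketch, with a small made-up data frame standing in for the one in the question: column_to_rownames() moves the column named in var into the row names and returns a data frame, which as.matrix() then converts.

```r
library(tibble)

# Toy data: one character column and two numeric columns.
df <- tibble(a = c("w", "x", "y", "z"), b = 1:4, c = 5:8)

# Move column `a` into the row names, then convert to a matrix.
m <- as.matrix(column_to_rownames(df, var = "a"))

rownames(m)
colnames(m)
```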
This provides NSE & standard eval functions:
library(dplyr)
df <- data_frame(a=sample(letters, 4), b=c(1:4), c=c(5:8))
reset_rownames <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
col <- as.character(substitute(col))
reset_rownames_(df, col)
}
reset_rownames_ <- function(df, col="rowname") {
stopifnot(is.data.frame(df))
nm <- data.frame(df)[, col]
df <- df[, !(colnames(df) %in% col)]
rownames(df) <- nm
df
}
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.
If you really need a matrix you can just save the character column to a separate variable, drop it, and then create the matrix
library(dplyr)
df <- data_frame(a = sample(letters, 4), b = c(1:4), c = c(5:8))
row_labels <- df %>% select(a)  # renamed to avoid masking the built-in `letters`
a.matrix <- df %>% select(-a) %>% as.matrix
Not sure what you are going to do after that, but this gets you as far as you asked for...