I'm trying to write a recursive function that builds a nested ifelse call. I do realize there are much better approaches than nested ifelse, e.g., dplyr::case_when and data.table::fcase, but I'm trying to learn how to approach such problems with metaprogramming.
The following code builds out the nested ifelse, but I'm struggling to substitute data with the actual supplied value, in this case my_df.
If I replace quote(data) with substitute(data), it only works for the first ifelse, but after entering the next iteration, it turns into data.
I think something like pryr::modify_lang could solve this after the fact, but I think there's probably a base R solution someone knows.
my_df <- data.frame(group = letters[1:3],
value = 1:3)
build_ifelse <- function(data, by, values, iter=1){
x <- call("ifelse",
call("==",
call("[[", quote(data), by),
values[iter]),
1,
if(iter != length(values)) build_ifelse(data, by, values, iter = iter + 1) else NA)
return(x)
}
build_ifelse(data = my_df, by = "group", values = letters[1:3])
# ifelse(data[["group"]] == "a", 1, ifelse(data[["group"]] == "b",
# 1, ifelse(data[["group"]] == "c", 1, NA)))
Thanks for any input!
Edit:
I found this question/answer: https://stackoverflow.com/a/59242109/9244371
Based on that, I found a solution that seems to work pretty well:
build_ifelse <- function(data, by, values, iter=1){
x <- call("ifelse",
call("==",
call("[[", quote(data), by),
values[iter]),
1,
if(iter != length(values)) build_ifelse(data, by, values, iter = iter + 1) else NA)
x <- do.call(what = "substitute",
args = list(x,
list(data = substitute(data))))
return(x)
}
build_ifelse(data = my_df, by = "group", values = letters[1:3])
# ifelse(my_df[["group"]] == "a", 1, ifelse(my_df[["group"]] ==
# "b", 1, ifelse(my_df[["group"]] == "c", 1, NA)))
eval(build_ifelse(data = my_df, by = "group", values = letters[1:3]))
# [1] 1 1 1
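Another way to sidestep the problem, offered only as a sketch: capture the symbol once in the outer function, before any recursion happens, and let an inner helper do the recursing. The names build_ifelse2, data_sym and build are made up for this illustration; bquote() is base R.

```r
my_df <- data.frame(group = letters[1:3], value = 1:3)

# Hypothetical variant: substitute() runs exactly once, in the outer call,
# so the recursion never sees the formal argument name `data`.
build_ifelse2 <- function(data, by, values) {
  data_sym <- substitute(data)
  build <- function(iter) {
    rest <- if (iter < length(values)) build(iter + 1) else NA
    # .() splices the captured symbol and the current values into the call
    bquote(ifelse(.(data_sym)[[.(by)]] == .(values[iter]), 1, .(rest)))
  }
  build(1)
}

expr <- build_ifelse2(my_df, by = "group", values = letters[1:3])
print(expr)
print(eval(expr))
```

Because the symbol is captured before recursing, no after-the-fact substitution pass is needed.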
There is a base function, switch, that can deliver sequential testing and results similar to dplyr::case_when, at least when used with a loop wrapper. It's not well documented. It is really two different functions: one that expects a numeric input for its classification variable and another that expects character values. I can never remember its name, so I typically need to remind myself that it is referenced in the ?Control page. Since you're using character values, here goes. (I changed the outputs so you can see that some degree of substitution is occurring and that there is an "otherwise" option.)
sapply( my_df$group, switch, a=4, b=5, d=6, NA)
a b c
4 5 NA
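To make the "two different functions" point concrete, here is a small sketch of both modes. The variable names are just for illustration, and vapply is swapped in for sapply only to pin down the return type.

```r
# Numeric EXPR: selects the alternative by position
by_position <- switch(2, "first", "second", "third")

# Character EXPR: selects by name; a trailing unnamed argument is the default
by_name <- switch("b", a = 4, b = 5, d = 6, NA)
fallback <- switch("z", a = 4, b = 5, d = 6, NA)

# Looping over a character vector, one switch() call per element
groups <- c("a", "b", "c")
res <- vapply(groups,
              function(g) switch(g, a = 4, b = 5, d = 6, NA_real_),
              numeric(1))
print(res)
```

Here "c" has no matching name, so it falls through to the unnamed default.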
Related
I am trying to apply a function with two arguments. The first argument is a dataframe, the second is an integer that defines a row of the df.
col_1 <- c("A", "B", "C")
col_2 <- c("red", "blue", "black")
df <- data.frame(col_1, col_2)
f <- function(x, arg1) {
x[arg1, 1]
x[arg1, 2]
}
apply(df, 1, f)
It looks like the second argument is not passed to the function. Here is the error:
Error in x[arg1, 1] : incorrect number of dimensions
When I put arg1=1 like this:
apply(df, arg1=1, f)
it gives me a FUN error
Error in match.fun(FUN) : argument "FUN" is missing, with no default
The desired output is "A" and "red", i.e. in my real code I need to operate on the values of each row.
I also want to add an output variable to be able to save a plot that I am making in my real analysis in a file. Can I just add an "output" variable in function(x, arg1) and then do apply(df, arg1=1, f, output="output_file")?
As @Greg mentions, the purpose of this code isn't clear. However, the question seems to relate to how apply() works, so here goes:
Basically, when any of the apply family of functions is used, the user-entered function (f(), in this case) is applied to the subset of the data produced by apply. So here, you've asked apply to take each row and call f() on it; the first argument to f() is then a vector rather than the data frame your function requires.
Here's some functioning code:
col_1 <- c("A", "B", "C")
col_2 <- c("red", "blue", "black")
df <- data.frame(col_1, col_2)
f <- function(x) {
x[1]
x[2]
}
apply(df, 1, f)
This returns the values of the second column as a vector: x[2] is the last expression evaluated in f(), so it is the return value, and for each row it is the value in the second column. The x[1] line is evaluated but its result is discarded.
If you want the arg1 row of results, you could simply use the following:
find_row <- function(df, row) {
df[row, ]
}
find_row(df, 1)
apply() isn't required. Using a single function makes the code simpler to read and should be faster too.
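On the follow-up about passing extra arguments: apply()'s signature is apply(X, MARGIN, FUN, ...), so in apply(df, arg1=1, f) the function f ends up matched to MARGIN and FUN is left missing. Extra arguments go after FUN and are forwarded through `...`. A minimal sketch, where describe_row and prefix are made-up names:

```r
df <- data.frame(col_1 = c("A", "B", "C"),
                 col_2 = c("red", "blue", "black"))

# x is one row (a character vector); prefix is forwarded via apply()'s `...`
describe_row <- function(x, prefix) {
  paste(prefix, x[1], x[2])
}

res <- apply(df, 1, describe_row, prefix = "row:")
print(res)
```

An output file name could presumably be passed the same way, as another named argument after the function.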
Continuing on my quest to work with functions and ggplot:
I sorted out basic ways on how to use lapply and ggplot to cycle through a list of y_columns to make some individual plots:
require(ggplot2)
# using lapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
myplots <- vector('list', 3)
plot_function <- function(y_column, data) {
ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("lapply:", y_column))
}
myplots <- lapply(cols, plot_function, df)
myplots[[3]]
I now want to bring in a second variable that I will use to select rows. In my minimal example I am skipping the selection and just reusing the same plots and dfs as before; I simply add 3 iterations. So I would like to generate the same three plots as above, but now labelled as iteration A, B, and C.
It took me a while to sort out the syntax, but I now get that mapply needs two vectors of identical length that get passed on to the function as matched pairs. So I am using expand.grid to generate all pairs of variable 1 and variable 2 to create a dataframe and then pass the first and second column on via mapply. The next problem to sort out was that I need to pass on the dataframe as a list via MoreArgs =. So it seems like everything should be good to go. I am using the same syntax for aes_string() as above in my lapply example.
However, for some reason it is now not evaluating the y_column properly, but simply taking it as a value to plot, not as an indicator to plot the values contained in df$col1.
HELP!
require(ggplot2)
# using mapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
iteration <- c("Iteration A", "Iteration B", "Iteration C")
multi_plot_function <- function(y_column, iteration, data) {
plot <- ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("mapply:", y_column, "___", iteration))
}
# mapply call
combo <- expand.grid(cols=cols, iteration=iteration)
myplots <- mapply(multi_plot_function, combo[[1]], combo[[2]], MoreArgs = list(df), SIMPLIFY = F)
myplots[[3]]
We may need to use rowwise here
out <- lapply(asplit(combo, 1), function(x)
multi_plot_function(x[1], x[2], df))
In the OP's code, the only issue is that the columns of 'combo' are factors (expand.grid defaults to stringsAsFactors = TRUE), so the column name is not parsed correctly. If we change them to character, it works
out2 <- mapply(multi_plot_function, as.character(combo[[1]]),
as.character(combo[[2]]), MoreArgs = list(df), SIMPLIFY = FALSE)
-testing
out2[[1]]
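The conversion can also be avoided at the source, since expand.grid has a stringsAsFactors argument. A sketch in base R only, with the plotting replaced by building the title string so it runs without ggplot2 (make_title is a made-up stand-in for multi_plot_function):

```r
cols <- c("col1", "col2", "col3")
iteration <- c("Iteration A", "Iteration B", "Iteration C")

# Keep the grid columns as character instead of the default factors
combo <- expand.grid(cols = cols, iteration = iteration,
                     stringsAsFactors = FALSE)
print(sapply(combo, class))

# Stand-in for the plotting function: only builds the title
make_title <- function(y_column, iteration) {
  paste("mapply:", y_column, "___", iteration)
}

titles <- mapply(make_title, combo$cols, combo$iteration, USE.NAMES = FALSE)
print(titles)
```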
reprod:
df1 <- data.frame(X = c(0:9), Y = c(10:19))
df2 <- data.frame(X = c(0:9), Y = c(10:19))
df3 <- data.frame(X = c(0:9), Y = c(10:19))
list_of_df <- list(A = df1, B = df2, C = df3)
list_of_df
I'm trying to apply the rollmean function from zoo to every 'Y' column in this list of dataframes.
I've tried lapply with no success. It seems no matter which way I spin it, there is no way to get around specifying the dataframe you want to apply to at some point.
This does one of the dataframes
roll_mean <- rollmean(list_of_df$A, 2)
roll_mean
obviously this doesn't work:
roll_mean1 <- rollmean(list_of_df, 2)
roll_mean1
I also tried this:
subset(may not be necessary)
Sub1 <- lapply(list_of_df, "[", 2)
roll_mean1 <- rollmean(Sub1, 2)
roll_mean1
there doesn't seem to be a way to do it without having to
specify the particular dataframe in the rollmean function
lapply(list_of_df), function(x) rollmean(list_of_df, 2))
for loop? also no success
For (i in list_of_df) {roll_mean1 <- rollmean(Sub1, 2)
Exp
}
Stating the obvious but I'm very new to coding in general and would appreciate some pointers.
It has occurred to me that even if it did work, the column that has been averaged would be one value shorter than the rest of the dataframe; how would I get around that?
The question says at one point that it wants to apply rollmean only to Y, and at another point that roll_mean <- rollmean(list_of_df$A, 2) works, but that call averages all columns.
1) Assuming that you want to apply rollmean to all columns:
Use lapply like this:
library(zoo)
lapply(list_of_df, rollmean, 2)
This also works:
for(i in seq_along(list_of_df)) list_of_df[[i]] <- rollmean(list_of_df[[i]], 2)
2) If you only want to apply it to the Y column:
lapply(list_of_df, transform, Y = rollmean(Y, 2, fill = NA))
or
for(i in seq_along(list_of_df)) {
list_of_df[[i]]$Y <- rollmean(list_of_df[[i]]$Y, 2, fill = NA)
}
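On the length concern: a window of 2 over n values yields n - 1 means, which is why fill = NA is used above to pad the result back to n. The following base R sketch is not zoo's implementation, just an illustration of the same idea; pair_mean and Y_mean are made-up names, and putting the NA at the end is a choice made for this example.

```r
list_of_df <- list(A = data.frame(X = 0:9, Y = 10:19),
                   B = data.frame(X = 0:9, Y = 10:19))

# Mean of each value with its successor, padded with NA to keep the length
pair_mean <- function(y) {
  c((head(y, -1) + tail(y, -1)) / 2, NA)
}

# The anonymous function receives each data frame in turn, so no
# particular data frame has to be named inside the call
result <- lapply(list_of_df, function(d) {
  d$Y_mean <- pair_mean(d$Y)
  d
})
print(result$A)
```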
Assume I have a value x which is of some (unknown) type (especially: scalar, vector or list). I would like to get the R expression representing this value. If x == 1 then this function should simply return expression(1). For x == c(1,2) this function should return expression(c(1,2)). The enquote function is quite close to what I want, but not exactly.
By some playing around I found the following "solution" to my problem:
get_expr <- function(val) {
tmp_expr <- enquote(val)
tmp_expr[1] <- quote(expression())
return(eval(tmp_expr))
}
get_expr(1) # returns expression(1)
get_expr(c(1, 2)) # returns expression(c(1, 2))
get_expr(list(x = 1)) # returns expression(list(x = 1))
But I think my get_expr function is some kind of hack. Logically, the evaluation should not be necessary.
Is there some more elegant way to do this? As far as I see, substitute does not really work for me, because the parameter of my get_expr function may be the result of an evaluation (and substitute(eval(expr)) does not do the evaluation).
I found another way via parse(text = deparse(val)), but this is an even worse hack...
as.expression(list(...)) seems to do it:
> get_expr <- function(val) as.expression(list(val))
> str(get_expr(1))
expression(1)
> str(get_expr(c(1, 2)))
expression(c(1, 2))
> str(get_expr(list(x=1)))
expression(list(x = 1))
> val <- list(x=1, y=2)
> str(get_expr(val))
expression(list(x = 1, y = 2))
You can use substitute(), and just need to call it a bit differently:
express <- function(e) substitute(expression(x), env = list(x=e))
v1 <- c(1, 2)
express(v1)
# expression(c(1, 2))
v2 <- list(a = 1, b = 2)
express(v2)
# expression(list(a = 1, b = 2))
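One difference between the two answers that may matter in practice: the as.expression version returns an expression vector, while the substitute version returns an unevaluated call to expression(), which still needs an eval() to become an expression vector. A short sketch to check this, reusing the two functions from above:

```r
get_expr <- function(val) as.expression(list(val))
express  <- function(e) substitute(expression(x), env = list(x = e))

e1 <- get_expr(c(1, 2))
e2 <- express(c(1, 2))

print(class(e1))   # "expression"
print(class(e2))   # "call"

# The element stored in e1 is the value itself
print(e1[[1]])

# Evaluating the call produces an expression vector
print(is.expression(eval(e2)))
```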
I'm trying to apply a function to each row of a dataset. The function looks up matching rows in a second dataset and computes a similarity score for the product details passed to it.
The function works if I just call it with test numbers but I can't figure out how to run it on all rows of my dataset. I've tried using apply but can't get it working.
I'm going to be iterating different parameter settings to find those that best fit historical data so speed is important... meaning that a loop is out. Any help you can provide would be hugely appreciated.
Thanks! Alan
GetDistanceTest <- function(SnapshotDate, Cand_Type, Cand_Height, Cand_Age) {
HeightParam <- 1/5000
AgeParam <- 1
Stock_SameType <- HistoricalStock[!is.na(HistoricalStock$date) & !is.na(HistoricalStock$Type) & as.character(HistoricalStock$date)==as.character(SnapshotDate) & HistoricalStock$Type==Cand_Type,]
Stock_SameType$ED <- (HeightParam*(Stock_SameType$Height - Cand_Height))^2 + (AgeParam*(Stock_SameType$Age - Cand_Age))^2
return(sqrt(sum(Stock_SameType$ED)))
}
HistoricalStock <- HistoricalAQStock[,c(1, 3, 4, 5)]
colnames(HistoricalStock) <- c("date", "Age", "Height", "Type")
Sales <- AllSales[,c(2,10,11,25)]
colnames(Sales) <- c("date", "Age", "Height", "Type")
GetDistanceTest("2010-04-01", 5261, 12, 7523) #works and returns a single number
res1 <- transform(Sales, ClusterScore=GetDistanceTest(date, Type, Height, Age))
# returns Error in `$<-.data.frame`(`*tmp*`, "ED", value = c(419776714.528591, 22321257.0276852, : replacement has 4060 rows, data has 54
# also 4 warnings, one for each variable. e.g. 1: In as.character(HistoricalStock$date) == as.character(SnapshotDate) : longer object length is not a multiple of shorter object length
res2 <- apply(Sales, 1, GetDistanceTest, Sales$Type, Sales$Height, Sales$Age)
# `$<-.data.frame`(`*tmp*`, "ED", value = c(419648071.041523, 22325941.2704261, : replacement has 4060 rows, data has 13
# also same 4 warnings as res1
I took some liberties with your code because I try to vectorize rather than use loops whenever I can. With the merge function, you merge the two data frames and operate on the "columns", which allows you to use the vectorization built into R. I think this will do what you want (in the second line I'm just making sure that A and B don't have the same values for height and age so that your distance isn't always zero):
A <- B <- data.frame(date=Sys.Date()-9:0, stock=letters[1:10], type=1:10, height=1:10, age=1:10)
B$height <- B$age <- 10:1
AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
height.param <- 1/5000
age.param <- 1
temp <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )
Use mapply, the multivariate form of sapply:
res1 <- mapply(GetDistanceTest, Sales$date, Sales$Type, Sales$Height, Sales$Age)
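To show what mapply does here, a toy sketch with made-up data (stock, sales and dist_to_stock are hypothetical stand-ins for HistoricalStock, Sales and GetDistanceTest). The function is called once per row with scalar arguments, which is what the original function expects, instead of once with whole columns as transform does.

```r
stock <- data.frame(type = c(1, 1, 2), height = c(10, 20, 30))
sales <- data.frame(type = c(1, 2), height = c(15, 30))

# Expects one candidate at a time (scalar arguments)
dist_to_stock <- function(cand_type, cand_height) {
  same <- stock[stock$type == cand_type, ]
  sqrt(sum((same$height - cand_height)^2))
}

# One call per row of sales: (1, 15), then (2, 30)
scores <- mapply(dist_to_stock, sales$type, sales$height)
print(scores)
```

mapply still loops internally, so for speed the merge-based approach is likely the better bet on large data.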
Code as per above comment:
A <- data.frame(date=rep(Sys.Date()-9:0,100), id=letters[1:10], type=floor(runif(1000, 1, 10)), height=runif(1000, 1, 100), age=runif(1000, 1, 100))
B <- data.frame(date=rep(Sys.Date()-9:0,1000), type=floor(runif(10000, 1, 10)), height=runif(10000, 1, 10), age=runif(10000, 1, 10))
AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
height.param <- 1
age.param <- 1
AB$ClusterScore <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )
library(plyr)  # ddply comes from plyr
Scores <- ddply(AB, c("id"), function(df)sum(df$ClusterScore))