R - How to create a custom ifelse function that repeats

I am quite familiar with R's standard ifelse statement, and how to create nested ifelse statements. However, I want to create a "better" version, so that I don't have to copy / paste ifelse so many times.
Take this nested ifelse statement for example:
df <- data.frame(b = 1:5)
df$a <- ifelse(df$b == 1,1,
ifelse(df$b == 2,2,
ifelse(df$b == 3,3,4)))
Instead, what I would like to do is create a function that I could call like this:
df$a <- myFunction(df$b == 1,1,
df$b == 2,2,
df$b == 3,3,4)
I would want the function to be able to pick up how many arguments I have entered, and thus know how many ifelse statements to include and then plug the arguments into the correct position, up to however many I want.
There is still some repetition, but when creating longer nested ifelse statements it would be nice not to have to repeat that piece of code, and then try to keep track of the closing parentheses.

We can use Reduce() to build up the required parse tree of nested ifelse() calls and then eval() it:
ifelses <- function(...) {
## validate number of args is at least 3 and odd
stopifnot(nargs() >= 3L)
stopifnot(nargs() %% 2L == 1L)
## precompute the required number of calls and the argument parse tree list
num <- (nargs() - 1L) %/% 2L
cl <- match.call()
## build up the parse tree of nested ifelse() calls using Reduce(), then eval() it
## in the caller's frame (parent.frame()), so that variables local to a calling
## function are found, not just global ones
## terminology (following docs): ifelse(test,yes,no)
eval(Reduce(
function(i, noArg) call('ifelse', cl[[i]], cl[[i+1L]], noArg),
seq(2L, by = 2L, length.out = num), ## indexes of "test" args
cl[[length(cl)]], ## first (innermost) "no" arg
right = TRUE ## proceed from right-to-left, IOW inside-out wrt parse tree
), envir = parent.frame())
} ## end ifelses()
Useful docs:
nargs()
stopifnot()
match.call()
Reduce()
call()
eval()
seq()
ifelse()
Demo:
ifelses(c(F,T,F,F),1:4,c(T,F,F,F),5:8,c(F,T,F,T),9:12,13:16);
## [1] 5 2 15 12
OP's example:
df <- data.frame(b=1:5);
df$a <- ifelses(df$b==1L,1L,df$b==2L,2L,df$b==3L,3L,4L);
df;
## b a
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 4
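If the match.call()/eval() machinery feels fragile, the same calling convention can also be served by a function that evaluates its arguments normally and fills in the result with a loop. The sketch below is an illustration, not part of the answer above: first_match() is a hypothetical name, the result length is assumed to follow the first test, and, like nested ifelse(), an NA test yields NA for rows not already decided by an earlier test.

```r
# Hypothetical helper first_match(): same argument layout as ifelses()
# (test1, yes1, test2, yes2, ..., no), but the arguments are evaluated
# normally, so no match.call() or eval() is involved.
first_match <- function(...) {
  args <- list(...)
  n <- length(args)
  stopifnot(n >= 3L, n %% 2L == 1L)   # test/yes pairs plus one final "no"
  len <- length(args[[1L]])           # result length follows the first test
  out <- rep_len(args[[n]], len)      # start from the "no" value
  done <- rep(FALSE, len)             # rows already decided by an earlier test
  for (i in seq(1L, n - 1L, by = 2L)) {
    test <- rep_len(as.logical(args[[i]]), len)
    yes <- rep_len(args[[i + 1L]], len)
    hit <- !done & !is.na(test) & test   # first TRUE test wins
    miss_na <- !done & is.na(test)       # NA test gives NA, as nested ifelse() does
    out[hit] <- yes[hit]
    out[miss_na] <- NA
    done <- done | hit | miss_na
  }
  out
}

df <- data.frame(b = 1:5)
df$a <- first_match(df$b == 1, 1, df$b == 2, 2, df$b == 3, 3, 4)
df$a
# [1] 1 2 3 4 4
```

Because the arguments are ordinary values, this also works when the data frame is a local variable inside another function.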

This is a job for merging with a lookup table. You can wrap that in a function, but usually I wouldn't bother:
df <- data.frame(b = 1:5)
lookupif <- function(df, x, y, else.val = NA, on.col, res.col = "val") {
lookup <- data.frame(x, y)
names(lookup)[1] <- res.col
df <- merge(df, lookup, by.x = on.col, by.y = "y", all.x = TRUE)
df[is.na(df[[res.col]]), res.col] <- else.val
df
}
lookupif(df, 1:3, 1:3, 4, "b")
# b val
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 4
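For a single key column, a base-R variant of the same lookup idea uses match() instead of merge(). A minimal sketch, assuming the keys and results are given as two parallel vectors (keys, vals and else_val are illustrative names). Unlike merge(), which by default sorts the result by the key column, match() keeps the rows of df in their original order.

```r
df <- data.frame(b = 1:5)
keys <- 1:3       # values of b that get a special result
vals <- 1:3       # result for each key, in the same order as keys
else_val <- 4     # result for every other value of b

a <- vals[match(df$b, keys)]   # match() gives NA where b is not among the keys
a[is.na(a)] <- else_val        # fill the non-matches with the default
df$a <- a
df$a
# [1] 1 2 3 4 4
```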

dplyr::case_when is a cascading alternative to nested ifelses, e.g.
library(dplyr)
df <- data.frame(b = 1:5)
df %>% mutate(a = case_when(b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4))
#> b a
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 4
or just steal it and put it in base syntax:
df$a <- with(df, dplyr::case_when(b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4))
which returns the same thing.
Since it's already about as simple as you can get without sacrificing the versatility of ifelse, it may not need to be put into a function, but it could be, if you like. Using the tidy evaluation syntax from rlang (!!, !!! and :=), which was new in the development version of dplyr at the time of writing:
add_cases <- function(.data, .col, ...){
.data %>% mutate(!!.col := case_when(!!!quos(...)))
}
df %>% add_cases(.col = 'a',
b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4)
#> b a
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 4

Sorry for the shameless advertisement: you can try the if_val function from my package expss.
b = sample(1:7, 10, replace = TRUE)
if_val(b, 1 ~ 1, 2 ~ 2, 3 ~ 3, other ~ 4)
There is also ifs function: ifs(b==1 ~ 1, b==2 ~ 2, b==3 ~ 3, TRUE ~ 4).

Related

Updating a vector outside the loop in map(), using R

I have the following simple vector:
a = c(1,0,0,1,0,0,0,0)
and I would like to obtain a vector (b) such that for each index x in a, if a[x] is 1, we leave it as is, and if it is 0, we compute a[x-1] + 1, until the next 1:
b = c(1,2,3,1,2,3,4,5)
I tried using map():
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] = ifelse(a[x]==1, a[x], a[x-1]+1)
a})
Obviously this does not work because map() does not update the a vector. How can I do this using map()? Is it even possible to update something outside map()?
If you just change it to use the superassignment operator <<-, the way you attempted it does in fact work.
library(purrr)
a = c(1,0,0,1,0,0,0,0)
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
a})
a
#> [1] 1 2 3 1 2 3 4 5
A solution closer to what you're looking for (i.e. one that mimics a for loop) might be purrr::accumulate.
accumulate(1:8, .f = ~ ifelse(a[.y] == 1, 1, .x + 1))
#[1] 1 2 3 1 2 3 4 5
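Because each run restarts at every 1, the result can also be computed without map() or any side effects: number the runs with cumsum(), then count within each run. This sketch assumes, as in the example, that a starts with 1 (leading zeros would simply be counted from 1 in a run of their own); the second part writes the same recurrence as a fold with base Reduce() for comparison.

```r
a <- c(1, 0, 0, 1, 0, 0, 0, 0)

# cumsum(a == 1) labels the run each element belongs to: 1 1 1 2 2 2 2 2
run_id <- cumsum(a == 1)
# number the elements within each run: 1, 2, 3, ...
b <- ave(a, run_id, FUN = seq_along)
b
# [1] 1 2 3 1 2 3 4 5

# The same recurrence as an explicit fold: restart at 1, otherwise previous + 1
b2 <- Reduce(function(prev, i) if (a[i] == 1) 1 else prev + 1,
             seq_along(a)[-1], init = a[1], accumulate = TRUE)
b2
# [1] 1 2 3 1 2 3 4 5
```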

R: pass multiple arguments to accumulate/reduce

This is related to R: use the newly generated data in the previous row
I realized the actual problem I was faced with is a bit more complicated than the example I gave in the thread above - it seems I have to pass 3 arguments to the recursive calculation to achieve what I want. Thus, accumulate2 or reduce may not work. So I am opening a new question here to avoid possible confusion.
I have the following dataset grouped by ID:
ID <- c(1, 2, 2, 3, 3, 3)
pw <- c(1:6)
add <- c(1, 2, 3, 5, 7, 8)
x <- c(1, 2, NA, 4, NA, NA)
df <- data.frame(ID, pw, add, x)
df
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 NA
4 3 4 5 4
5 3 5 7 NA
6 3 6 8 NA
Within each group, for column x, I want to keep the value of the first row as it is, and fill in the remaining rows with the lagged (previous) value raised to the power stored in pw, plus the value in add. The lagged values should be updated as I proceed. So I would like to have:
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 2^3 + 3
4 3 4 5 4
5 3 5 7 4^5 + 7
6 3 6 8 (4^5 + 7)^6 + 8
I have to apply this calculation to a large dataset, so it would be perfect if there is a fast way to do this!
If we want to use accumulate2, we need to specify its arguments correctly: it takes two input vectors, 'pw' and 'add', plus an initialization argument, which is the first value of 'x'. As the data are grouped by 'ID', do the grouping before calling accumulate2. In the lambda, the default arguments ..1, ..2 and ..3 are, in that order, the accumulated value, the current 'pw' and the current 'add', and the recursive function is built from them:
library(dplyr)
library(purrr)
out <- df %>%
group_by(ID) %>%
mutate(x1 = accumulate2(pw[-1], add[-1], ~ ..1^..2 + ..3,
.init = first(x)) %>%
unlist()) %>%
ungroup
out$x1
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
With more than 3 arguments, a for loop would be better
# initialize an empty vector
out <- c()
# loop over the unique IDs
for(id in unique(df$ID)) {
# create a temporary subset of data based on that id
tmp_df <- subset(df, ID == id)
# initialize a temporary storage output
tmp_out <- numeric(nrow(tmp_df))
# initialize first value with the first element of x
tmp_out[1] <- tmp_df$x[1]
# if the number of rows is greater than 1
if(nrow(tmp_df) > 1) {
# loop over the rows
for(i in 2:nrow(tmp_df)) {
# do the recursive calculation and update
tmp_out[i] <- tmp_out[i - 1]^ tmp_df$pw[i] + tmp_df$add[i]
}
}
out <- c(out, tmp_out)
}
out
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
In base R we could use the following solution, which also works with more than two arguments.
First, I subset the original data set by ID value.
Then I select the row ids with seq_len(nrow(tmp))[-1], omitting the first row id since its value is provided by init.
In the anonymous function passed to Reduce, the b argument is the accumulated (previous) value, starting from init, and c is the current element of the vector being reduced, i.e. a row number.
So in every iteration the previous value (starting from init) is raised to the power of the current value of pw, and the current value of add is added to it.
cbind(df[-length(df)], unlist(lapply(unique(df$ID), function(a) {
tmp <- subset(df, df$ID == a)
Reduce(function(b, c) {
b ^ tmp$pw[c] + tmp$add[c]
}, init = tmp$x[1],
seq_len(nrow(tmp))[-1], accumulate = TRUE)
}))) |> setNames(c(names(df)))
ID pw add x
1 1 1 1 1.000000e+00
2 2 2 2 2.000000e+00
3 2 3 3 1.100000e+01
4 3 4 5 4.000000e+00
5 3 5 7 1.031000e+03
6 3 6 8 1.201025e+18
Data
structure(list(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6, add = c(1,
2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
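The cbind() above relies on df being sorted by ID, since the per-group results are concatenated in the order of unique(df$ID). Below is a sketch of the same Reduce() recurrence that uses split() and unsplit() instead, so each value goes back to its original row even when the IDs are interleaved. It assumes the rows within each ID are already in calculation order; fill_group and x_filled are illustrative names.

```r
df <- data.frame(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6,
                 add = c(1, 2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA))

# Recurrence for one group: keep x[1], then x[i] = x[i-1]^pw[i] + add[i]
fill_group <- function(g) {
  Reduce(function(prev, i) prev^g$pw[i] + g$add[i],
         seq_len(nrow(g))[-1],            # row numbers 2..n of the group
         accumulate = TRUE, init = g$x[1])
}

# split() by ID, fill each group, then unsplit() back into the original row order
df$x_filled <- unsplit(lapply(split(df, df$ID), fill_group), df$ID)
df
```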
Base R, not using Reduce() but rather a while() loop:
# Split-apply-combine while loop: res => data.frame
res <- do.call(rbind, lapply(with(df, split(df, ID)), function(y){
# While there are any NAs in x:
while(any(is.na(y$x))){
# Store the index of the first NA value: idx => integer scalar
idx <- with(y, head(which(is.na(x)), 1))
# Calculate x at that index using the business rule provided:
# x => numeric vector
y$x[idx] <- with(y, x[(idx-1)] ** pw[idx] + add[idx])
}
# Explicitly define the return object: y => GlobalEnv
y
}
)
)
OR recursive function:
# Recursive function: estimation_func => function()
estimation_func <- function(value_vec, exponent_vec, add_vec){
# Specify the termination condition; when all elements
# of value_vec are no longer NA:
if(all(!(is.na(value_vec)))){
# Return value_vec: numeric vector => GlobalEnv
return(value_vec)
# Otherwise recursively apply the below:
}else{
# Store the index of the first na value: idx => integer vector
idx <- Position(is.na, value_vec)
# Calculate the value of the value_vec at that index;
# using the provided business logic: value_vec => numeric vector
value_vec[idx] <- (value_vec[(idx-1)] ** exponent_vec[idx]) + add_vec[idx]
# Recursively apply function: function => Local Env
return(estimation_func(value_vec, exponent_vec, add_vec))
}
}
# Split data.frame into a list on ID;
# Overwrite x values, applying recursive function;
# Combine list into a data.frame
# res => data.frame
res <- data.frame(
do.call(
rbind,
Map(function(y){y$x <- estimation_func(y$x, y$pw, y$add); y}, split(df, df$ID))
), row.names = NULL
)

Create new columns for df in for-loop with name of existing columns + string

My dataframe is structured like the following:
ID A_L A_R B_L B_R
1 7 5 6 3
2 3 2 3 1
3 6 3 4 5
The goal is to create a new column for each existing column (besides the first column ID) by dividing the value of the existing column by its L/R counterpart. So A_L_ratio = A_L/A_R and A_R_ratio = A_R/A_L etc.
I've tried to create a for-loop, using if/else to differentiate between odd and even indices.
for (col in 2:length(df)) {
if( (col%%2) == 0){
a <- df[,col] / df[,col+1]}
else{
a <- df[,col] / df[,col-1]}
df[colnames(df[col])"_ratio"] <- a
}
But I seem to fail at R's syntax when it comes to naming the columns. The name should be the name of the column that is used in each loop iteration, df[,col], plus the string _ratio. At the end I want to append that column to df. Could someone tell me the right syntax to do this? Thanks a lot!
You need to paste the colnames to the string "_ratio". Something like this, maybe:
# Create the data.frame
df <- data.frame(
ID = 1:3,
A_L = c(7, 3, 6),
A_R = c(5, 2, 3),
B_L = c(6, 3, 4),
B_R = c(3, 1, 5)
)
# create the cols with "_ratio" character appended
for (col in 2:length(df)) {
if( (col%%2) == 0){
a <- df[,col] / df[,col+1]
} else {
a <- df[,col] / df[,col-1]
}
df[paste0(colnames(df)[col], "_ratio")] <- a
}
There are easier and more efficient ways to do this using the dplyr package, though.
I don't know how important this is, but I got the impression that you have many more columns than shown. If so, it is better to take it slowly and check the column names first, so that you don't get errors.
If safety checks are not needed, then disregard this.
df <- read.table( text="
ID A_L A_R B_L B_R
1 7 5 6 3
2 3 2 3 1
3 6 3 4 5
", header=TRUE )
var.names.L <- grep( "_L$", colnames(df) , value=TRUE )
var.names.R <- sub( "_L$", "_R", var.names.L )
i.has.R.name <- var.names.R %in% colnames(df)
ok.L.names <- var.names.L[i.has.R.name]
ok.R.names <- var.names.R[i.has.R.name]
new.columns.1 <- df[, ok.L.names, drop=FALSE ] / df[, ok.R.names, drop=FALSE ]
colnames(new.columns.1) <- paste0( colnames(new.columns.1), "_ratio" )
new.columns.2 <- df[, ok.R.names, drop=FALSE ] / df[, ok.L.names, drop=FALSE ]
colnames(new.columns.2) <- paste0( colnames(new.columns.2), "_ratio" )
cbind.data.frame(
df,
new.columns.1,
new.columns.2
)
The above code neatly checks that for every _L column there is a corresponding _R column, and then performs the division only on those columns.
Output:
ID A_L A_R B_L B_R A_L_ratio B_L_ratio A_R_ratio B_R_ratio
1 1 7 5 6 3 1.4 2.0 0.7142857 0.5000000
2 2 3 2 3 1 1.5 3.0 0.6666667 0.3333333
3 3 6 3 4 5 2.0 0.8 0.5000000 1.2500000

Nested ifelse: improved syntax

Description
The ifelse() function lets you filter the values of a vector through a test, and nesting it lets you apply a series of tests, each producing a different action when its result is positive. For instance, let xx be a data.frame, as follows:
xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx
a b
1 1
2 2
1 3
3 4
Suppose that you want to create a new column, c, from column b, but depending on the values in column a in the following way:
For each row,
if the value in column a is 1, the value in column c is the same as the value in column b.
if the value in column a is 2, the value in column c is 100 times the value in column b.
in any other case, the value in column c is the negative of the value in column b.
Using ifelse(), a solution could be:
xx$c <- ifelse(xx$a==1, xx$b,
ifelse(xx$a==2, xx$b*100,
-xx$b))
xx
a b c
1 1 1
2 2 200
1 3 3
3 4 -4
Problem 1
An aesthetic problem arises when the number of tests increases, say, four tests:
xx$c <- ifelse(xx$a==1, xx$b,
ifelse(xx$a==2, xx$b*100,
ifelse(xx$a==3, ...,
ifelse(xx$a==4, ...,
...))))
I found a partial solution to the problem on this page, which consists of defining the functions if.else_(), i_() and e_(), as follows:
library(lazyeval)
i_ <- function(if_stat, then) {
if_stat <- lazyeval::expr_text(if_stat)
then <- lazyeval::expr_text(then)
sprintf("ifelse(%s, %s, ", if_stat, then)
}
e_ <- function(else_ret) {
else_ret <- lazyeval::expr_text(else_ret)
else_ret
}
if.else_ <- function(...) {
args <- list(...)
for (i in 1:(length(args) - 1) ) {
if (substr(args[[i]], 1, 6) != "ifelse") {
stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
}
}
if (substr(args[[length(args)]], 1, 6) == "ifelse"){
stop("Last argument needs to be an else_ function.", call. = FALSE)
}
args$final <- paste(rep(')', length(args) - 1), collapse = '')
eval_string <- do.call('paste', args)
eval(parse(text = eval_string))
}
In this way, the problem given in the Description can be rewritten as follows:
xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx$c <- if.else_(
i_(xx$a==1, xx$b),
i_(xx$a==2, xx$b*100),
e_(-xx$b)
)
xx
a b c
1 1 1
2 2 200
1 3 3
3 4 -4
And the code for the four tests will simply be:
xx$c <- if.else_(
i_(xx$a==1, xx$b),
i_(xx$a==2, xx$b*100),
i_(xx$a==3, ...), # dots meaning actions for xx$a==3
i_(xx$a==4, ...), # dots meaning actions for xx$a==4
e_(...) # dots meaning actions for any other case
)
Problem 2 & Question
The given code apparently solves the problem. Then, I wrote the following test function:
test.ie <- function() {
dd <- data.frame(a=c(1,2,1,3), b=1:4)
if.else_(
i_(dd$a==1, dd$b),
i_(dd$a==2, dd$b*100),
e_(-dd$b)
) # it should give c(1, 200, 3, -4)
}
When I tried the test:
test.ie()
it gave the following error message:
Error in ifelse(dd$a == 1, dd$b, ifelse(dd$a == 2, dd$b * 100, -dd$b)) :
object 'dd' not found
Question
Since the if.else_() syntactic constructor is not supposed to run only from the console, is there a way for it to 'know' the variables from the function that calls it?
Note
In "Best way to replace a lengthy ifelse structure in R", a similar problem was posted. However, the given solution there focuses on building the table's new column with the given constant output values (the "then" or "else" slots of the ifelse() function), whereas my case addresses a syntactic problem in which the "then" or "else" slots can even be expressions in terms of other data.frame elements or variables.
I think you can use dplyr::case_when inside dplyr::mutate to achieve this.
library(dplyr)
df <- tibble(a=c(1,2,1,3), b=1:4)
df %>%
mutate(
foo = case_when(
.$a == 1 ~ .$b,
.$a == 2 ~ .$b * 100L,
TRUE ~ .$b * -1L
)
)
#> # A tibble: 4 x 3
#> a b foo
#> <dbl> <int> <int>
#> 1 1 1 1
#> 2 2 2 200
#> 3 1 3 3
#> 4 3 4 -4
In the upcoming release of dplyr (0.6.0), you won't need the awkward .$ workaround and can just use:
df %>%
mutate(
foo = case_when(
a == 1 ~ b,
a == 2 ~ b * 100L,
TRUE ~ b * -1L
)
)
Taking into account MrFlick's advice, I re-coded the if.else_() function as follows:
if.else_ <- function(...) {
args <- list(...)
for (i in 1:(length(args) - 1) ) {
if (substr(args[[i]], 1, 6) != "ifelse") {
stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
}
}
if (substr(args[[length(args)]], 1, 6) == "ifelse"){
stop("Last argument needs to be an else_ function.", call. = FALSE)
}
args$final <- paste(rep(')', length(args) - 1), collapse = '')
eval_string <- do.call('paste', args)
eval(parse(text = eval_string), envir = parent.frame())
}
Now the test.ie() function runs properly:
test.ie()
[1] 1 200 3 -4
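The fix works because of eval()'s default environment: without an envir argument, eval() evaluates in the frame of the function that calls it (here, if.else_() itself), where dd does not exist, and the lookup then continues in the global environment. A small self-contained illustration, with hypothetical helper names eval_here() and eval_caller():

```r
# Both helpers capture their argument unevaluated and then evaluate it.
eval_here <- function(expr) {
  e <- substitute(expr)            # e.g. the call dd$b * 100
  eval(e)                          # default envir: eval_here()'s own frame
}
eval_caller <- function(expr) {
  e <- substitute(expr)
  eval(e, envir = parent.frame())  # the frame of whoever called eval_caller()
}

test_env <- function() {
  dd <- data.frame(b = 1:4)        # local variable, invisible from eval_here()
  list(caller = eval_caller(dd$b * 100),
       here = tryCatch(eval_here(dd$b * 100), error = conditionMessage))
}
test_env()
# $caller holds 100 200 300 400; $here holds the "object 'dd' not found" message
```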
With full respect to the OP's remarkable effort to improve nested ifelse(), I prefer a different approach which I believe is easy to write, concise, maintainable and fast:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
library(data.table)
# coerce to data.table, and set the default first
setDT(xx)[, c:= -b]
xx[a == 1L, c := b] # 1st special case
xx[a == 2L, c := 100L*b] # 2nd special case, note use of integer 100L
# xx[a == 3L, c := ...] # other cases
# xx[a == 4L, c := ...]
#...
xx
# a b c
#1: 1 1 1
#2: 2 2 200
#3: 1 3 3
#4: 3 4 -4
Note that for the 2nd special case b is multiplied by the integer constant 100L to make sure that the right hand sides are all of type integer in order to avoid type conversion to double.
Edit 2: This can also be written in an even more concise (but still maintainable) way as a one-liner:
setDT(xx)[, c:= -b][a == 1L, c := b][a == 2L, c := 100L*b][]
data.table chaining works here, because c is updated in place so that subsequent expressions are acting on all rows of xx even if the previous expression was a selective update of a subset of rows.
Edit 1: This approach can be implemented with base R as well:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
xx$c <- -xx$b
idx <- xx$a == 1L; xx$c[idx] <- xx$b[idx]
idx <- xx$a == 2L; xx$c[idx] <- 100 * xx$b[idx]
xx
# a b c
#1 1 1 1
#2 2 2 200
#3 1 3 3
#4 3 4 -4

How to manipulate data by row in a data frame

I'm getting a bit confused. I've got data like this in a data frame
index times
1 1 56.60
2 1 150.75
3 1 204.41
4 2 44.71
5 2 98.03
6 2 112.20
and I know that the times indexed 1 are biased, whereas the times indexed otherwise are not. I need to create a copy of that data frame removing the bias from the samples indexed 1. I've been trying several combinations of apply, by, and the likes. The closest I got was with
by(lct, lct$index, function(x) { if(x$index == 1) x$times = x$times-50 else x$times = x$times } )
which returned an object of class by, which is unusable for me. I need to write the data back to a csv file in the same format (index, times) of the original file. Ideas?
Something like this should work:
df$times[df$index == 1] <- df$times[df$index == 1] - 50
The trick here is to take the subset of df$times that fits your filter, and realize that R can also assign to a subset.
Alternatively, you can use ifelse:
df$times = ifelse(df$index == 1, df$times - 50, df$times)
and use it in dplyr:
library(dplyr)
df = data.frame(index = sample(1:5, 100, replace = TRUE),
value = runif(100)) %>% arrange(index)
df %>% mutate(value = ifelse(index == 1, value - 50, value))
# index value
#1 1 -49.95827
#2 1 -49.98104
#3 1 -49.44015
#4 1 -49.37316
#5 1 -49.76286
#6 1 -49.22133
#etc
How about,
index <- c(1, 1, 1, 2, 2, 2)
times <- c(56.60, 150.75, 204.41, 44.71, 98.03, 112.20)
df <- data.frame(index, times)
df$times <- ifelse(df$index == 1, df$times - 50, df$times)
> df
#index times
#1 1 6.60
#2 1 100.75
#3 1 154.41
#4 2 44.71
#5 2 98.03
#6 2 112.20
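The question also asks to write the corrected copy back to a CSV file with the same index, times layout. A minimal sketch using base R's write.csv(); lct follows the question's naming, and tempfile() is only a stand-in for the real output path.

```r
index <- c(1, 1, 1, 2, 2, 2)
times <- c(56.60, 150.75, 204.41, 44.71, 98.03, 112.20)
lct <- data.frame(index, times)

# Work on a copy so the original (biased) data stays untouched
lct_fixed <- lct
is_biased <- lct_fixed$index == 1
lct_fixed$times[is_biased] <- lct_fixed$times[is_biased] - 50

# Write the two columns back as CSV; row.names = FALSE drops R's row names
out_file <- tempfile(fileext = ".csv")   # placeholder: use your real file path here
write.csv(lct_fixed, out_file, row.names = FALSE)
```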
