Nested ifelse: improved syntax - r

Description
The ifelse() function lets you filter the values in a vector through a series of tests, each of them producing a different action in case of a positive result. For instance, let xx be a data.frame, as follows:
xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx
  a b
1 1 1
2 2 2
3 1 3
4 3 4
Suppose that you want to create a new column, c, from column b, but depending on the values in column a in the following way:
For each row,
if the value in column a is 1, the value in column c is the same as the value in column b.
if the value in column a is 2, the value in column c is 100 times the value in column b.
in any other case, the value in column c is the negative of the value in column b.
Using ifelse(), a solution could be:
xx$c <- ifelse(xx$a==1, xx$b,
ifelse(xx$a==2, xx$b*100,
-xx$b))
xx
  a b   c
1 1 1   1
2 2 2 200
3 1 3   3
4 3 4  -4
Problem 1
An aesthetic problem arises when the number of tests increases, say, four tests:
xx$c <- ifelse(xx$a==1, xx$b,
ifelse(xx$a==2, xx$b*100,
ifelse(xx$a==3, ...,
ifelse(xx$a==4, ...,
...))))
I found a partial solution to the problem on this page, which consists of defining the functions if.else_(), i_() and e_(), as follows:
library(lazyeval)
i_ <- function(if_stat, then) {
if_stat <- lazyeval::expr_text(if_stat)
then <- lazyeval::expr_text(then)
sprintf("ifelse(%s, %s, ", if_stat, then)
}
e_ <- function(else_ret) {
else_ret <- lazyeval::expr_text(else_ret)
else_ret
}
if.else_ <- function(...) {
args <- list(...)
for (i in 1:(length(args) - 1) ) {
if (substr(args[[i]], 1, 6) != "ifelse") {
stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
}
}
if (substr(args[[length(args)]], 1, 6) == "ifelse"){
stop("Last argument needs to be an else_ function.", call. = FALSE)
}
args$final <- paste(rep(')', length(args) - 1), collapse = '')
eval_string <- do.call('paste', args)
eval(parse(text = eval_string))
}
In this way, the problem given in the Description can be rewritten as follows:
xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx$c <- if.else_(
i_(xx$a==1, xx$b),
i_(xx$a==2, xx$b*100),
e_(-xx$b)
)
xx
  a b   c
1 1 1   1
2 2 2 200
3 1 3   3
4 3 4  -4
And the code for the four tests will simply be:
xx$c <- if.else_(
i_(xx$a==1, xx$b),
i_(xx$a==2, xx$b*100),
i_(xx$a==3, ...), # dots meaning actions for xx$a==3
i_(xx$a==4, ...), # dots meaning actions for xx$a==4
e_(...) # dots meaning actions for any other case
)
Problem 2 & Question
The given code apparently solves the problem. Then, I wrote the following test function:
test.ie <- function() {
dd <- data.frame(a=c(1,2,1,3), b=1:4)
if.else_(
i_(dd$a==1, dd$b),
i_(dd$a==2, dd$b*100),
e_(-dd$b)
) # it should give c(1, 200, 3, -4)
}
When I tried the test:
test.ie()
it produced the following error message:
Error in ifelse(dd$a == 1, dd$b, ifelse(dd$a == 2, dd$b * 100, -dd$b)) :
object 'dd' not found
Question
Since the if.else_() syntactic constructor is not supposed to run only from the console, is there a way for it to 'know' the variables from the function that calls it?
Note
In "Best way to replace a lengthy ifelse structure in R", a similar problem was posted. However, the given solution there focuses on building the table's new column with the given constant output values (the "then" or "else" slots of the ifelse() function), whereas my case addresses a syntactic problem in which the "then" or "else" slots can even be expressions in terms of other data.frame elements or variables.

I think you can use dplyr::case_when inside dplyr::mutate to achieve this.
library(dplyr)
df <- tibble(a=c(1,2,1,3), b=1:4)
df %>%
mutate(
foo = case_when(
.$a == 1 ~ .$b,
.$a == 2 ~ .$b * 100L,
TRUE ~ .$b * -1L
)
)
#> # A tibble: 4 x 3
#> a b foo
#> <dbl> <int> <int>
#> 1 1 1 1
#> 2 2 2 200
#> 3 1 3 3
#> 4 3 4 -4
In the upcoming release of dplyr 0.6.0 you won't need the awkward workaround of .$, and you can just use:
df %>%
mutate(
foo = case_when(
a == 1 ~ b,
a == 2 ~ b * 100L,
TRUE ~ b * -1L
)
)

Taking into account MrFlick's advice, I re-coded the if.else_() function as follows:
if.else_ <- function(...) {
args <- list(...)
for (i in 1:(length(args) - 1) ) {
if (substr(args[[i]], 1, 6) != "ifelse") {
stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
}
}
if (substr(args[[length(args)]], 1, 6) == "ifelse"){
stop("Last argument needs to be an else_ function.", call. = FALSE)
}
args$final <- paste(rep(')', length(args) - 1), collapse = '')
eval_string <- do.call('paste', args)
eval(parse(text = eval_string), envir = parent.frame())
}
Now the test.ie() function runs properly:
test.ie()
[1] 1 200 3 -4
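The difference between the two versions of if.else_() comes down to which environment eval() uses. The following is a minimal sketch of that mechanism only; the helper names eval_default, eval_in_caller and demo are hypothetical and not part of the original code:

```r
# Hypothetical helpers, for illustration only.
eval_default <- function(txt) {
  # No envir given: the text is evaluated in this function's own frame,
  # so variables local to the caller are not visible from here.
  eval(parse(text = txt))
}

eval_in_caller <- function(txt) {
  # parent.frame() is the frame of whoever called eval_in_caller().
  eval(parse(text = txt), envir = parent.frame())
}

demo <- function() {
  local_value <- 42   # exists only inside demo()
  list(
    default = tryCatch(eval_default("local_value"),
                       error = function(e) "not found"),
    caller  = eval_in_caller("local_value")
  )
}

res <- demo()
```

Here res$caller holds the local value, while the default evaluation fails to find it (assuming no global object named local_value exists).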

With full respect to the OP's remarkable effort to improve nested ifelse(), I prefer a different approach which I believe is easy to write, concise, maintainable and fast:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
library(data.table)
# coerce to data.table, and set the default first
setDT(xx)[, c:= -b]
xx[a == 1L, c := b] # 1st special case
xx[a == 2L, c := 100L*b] # 2nd special case, note use of integer 100L
# xx[a == 3L, c := ...] # other cases
# xx[a == 4L, c := ...]
#...
xx
# a b c
#1: 1 1 1
#2: 2 2 200
#3: 1 3 3
#4: 3 4 -4
Note that for the 2nd special case b is multiplied by the integer constant 100L to make sure that the right hand sides are all of type integer in order to avoid type conversion to double.
Edit 2: This can also be written in an even more concise (but still maintainable) way as a one-liner:
setDT(xx)[, c:= -b][a == 1L, c := b][a == 2L, c := 100L*b][]
data.table chaining works here, because c is updated in place so that subsequent expressions are acting on all rows of xx even if the previous expression was a selective update of a subset of rows.
Edit 1: This approach can be implemented with base R as well:
xx <- data.frame(a=c(1L,2L,1L,3L), b=1:4)
xx$c <- -xx$b
idx <- xx$a == 1L; xx$c[idx] <- xx$b[idx]
idx <- xx$a == 2L; xx$c[idx] <- 100 * xx$b[idx]
xx
# a b c
#1 1 1 1
#2 2 2 200
#3 1 3 3
#4 3 4 -4

Related

How to incorporate colSums for both vector and data frame in lapply

H_D<-function(level, zero, ...){
special<-c(0,0,0)
D<-list(special,...)
cell <- do.call(expand.grid, lapply(level, seq)) # create all cell
support <- apply(cell, 1, function(x) +(x != zero)) # create all support set
# provide subset H_D (support sets and given vectors matches
hd<-lapply(D, function (x) cell[colSums(support==x)==length(x),])
h_D<-do.call(rbind, hd)
rownames(h_D)<-1:nrow(h_D)
return(h_D)
}
level<-c(3,2,4)
zero<-c(1,2,1)
y<-c(0,1,1)
H_D(level,zero,y)
> H_D(level,zero,y)
Var1 Var2 Var3
1 1 2 1
2 1 1 2
3 1 1 3
4 1 1 4
My function works fine for the above situation, as colSums works for a data frame. But if my argument is a vector instead of a data frame, it does not work and I get the following error. My input argument could be a vector or a data frame. How can I handle both in the above-mentioned function?
level = 3
zero = 2
y<-1
H_D(level,zero,y)
> H_D(level,zero,y)
Error in colSums(support == x) :
'x' must be an array of at least two dimensions
I tried drop=FALSE, but it did not work!
We could change the function with an if/else based on the number of columns of 'cell'. If it has one column, just do the subset; otherwise, do the other part of the computation:
H_D <- function(level, zero, ...){
special <- c(0,0,0)
D <- list(special,...)
cell <- do.call(expand.grid, lapply(level, seq)) # create all cell
if(ncol(cell) == 1) {
h_D <- subset(cell, Var1 != zero)
} else {
support <- apply(cell, 1, function(x) +(x != zero)) # create all support set
# provide subset H_D (support sets and given vectors matches
hd <- lapply(D, function (x) cell[colSums(support==x)==length(x),])
h_D <- do.call(rbind, hd)
rownames(h_D) <- 1:nrow(h_D)
}
return(h_D)
}
Testing:
level <- 3
zero <- 2
y <- 1
H_D(level, zero, y)
# Var1
#1 1
#3 3
and the first case
level <- c(3,2,4)
zero <- c(1,2,1)
y <- c(0,1,1)
H_D(level,zero,y)
# Var1 Var2 Var3
#1 1 2 1
#2 1 1 2
#3 1 1 3
#4 1 1 4
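An alternative to branching on ncol(cell) is to keep the support sets in matrix form. With a single column, apply() returns a plain vector with no dimensions, which is why colSums() complains. The sketch below covers only the single-vector call from the question and uses hypothetical names (support_m, keep, result); it forces the matrix shape with matrix():

```r
level <- 3
zero  <- 2
y     <- 1

cell <- do.call(expand.grid, lapply(level, seq))     # one column: Var1 = 1, 2, 3
support <- apply(cell, 1, function(x) +(x != zero))  # plain vector, no dim attribute

# Rebuild the expected layout: one row per column of 'cell', one column per cell.
support_m <- matrix(support, nrow = ncol(cell))

# Same test as in H_D(), now on a proper matrix.
keep <- colSums(support_m == y) == length(y)
result <- cell[keep, , drop = FALSE]                 # drop = FALSE keeps the data frame
result
```

The rows kept are the ones with Var1 equal to 1 and 3, the same cells the if/else version returns for this input.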

Replace nth consecutive occurrence of a value

I want to replace the nth consecutive occurrence of a particular code in my data frame. This should be a relatively easy task but I can't think of a solution.
Given a data frame
df <- data.frame(Values = c(1,4,5,6,3,3,2),
Code = c(1,1,2,2,2,1,1))
I want a result
df_result <- data.frame(Values = c(1,4,5,6,3,3,2),
Code = c(1,0,2,2,2,1,0))
The data frame is time-ordered, so I need to keep the same order after replacing the values. I guess that the nth() or duplicated() functions could be useful here, but I'm not sure how to use them. What I'm missing is a function that would count the number of consecutive occurrences of a given value. Once I have it, I could then use it to replace the nth occurrence.
This question had some ideas that I explored but still didn't solve my problem.
EDIT:
After an answer by @Gregor, I wrote the following function, which solves the problem:
library(data.table)
library(dplyr)
replace_nth <- function(x, nth, code) {
y <- data.table(x)
y <- y[, code_rleid := rleid(y$Code)]
y <- y[, seq := seq_along(Code), by = code_rleid]
y <- y[seq == nth & Code == code, Code := 0]
drop.cols <- c("code_rleid", "seq")
y %>% select(-one_of(drop.cols)) %>% data.frame() %>% return()
}
To get the solution, simply run replace_nth(df, 2, 1)
Using data.table:
library(data.table)
setDT(df)
df[, code_rleid := rleid(df$Code)]
df[, seq := seq_along(Code), by = code_rleid]
df[seq == 2 & Code == 1, Code := 0]
df
# Values Code code_rleid seq
# 1: 1 1 1 1
# 2: 4 0 1 2
# 3: 5 2 2 1
# 4: 6 2 2 2
# 5: 3 2 2 3
# 6: 3 1 3 1
# 7: 2 0 3 2
You could combine some of these (and drop the extra columns after). I'll leave it clear and let you make modifications as you like.
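For comparison, the same idea (number the positions within each run, then replace the matching ones) can be sketched in base R with rle() and sequence(). The function name replace_nth_base and its replacement argument are hypothetical, chosen only for this illustration:

```r
# Hypothetical base R helper: replace the nth element of every run of `code`.
replace_nth_base <- function(code_vec, nth, code, replacement = 0) {
  runs <- rle(code_vec)                  # lengths of consecutive runs
  pos_in_run <- sequence(runs$lengths)   # 1, 2, ... restarting at each run
  code_vec[pos_in_run == nth & code_vec == code] <- replacement
  code_vec
}

df <- data.frame(Values = c(1, 4, 5, 6, 3, 3, 2),
                 Code   = c(1, 1, 2, 2, 2, 1, 1))
df$Code <- replace_nth_base(df$Code, nth = 2, code = 1)
df$Code
# [1] 1 0 2 2 2 1 0
```

Because only the Code vector is modified, the row order is untouched and no helper columns need to be dropped afterwards.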

If loop calculation to all records

I have some noisy data with numbers, nulls and characters. I need to compute the percentage change between the numbers.
For that, I used a regular expression to check whether the % symbol is present in a column. If it is, I extract the numbers and subtract them to get the change; otherwise, I extract the numbers and calculate the percentage change.
Below is the reproducible code:
df = data.frame(Actual = c('0.10%','55.10%',NA,'20.8B'),
Previous = c('-0.50%','47.90%',NA,'16.6B'))
df
Actual Previous
1 0.10% -0.50%
2 55.10% 47.90%
3 <NA> <NA>
4 20.8B 16.6B
# if loop to calculate percentage change
if(grepl("%", df$Actual) & grepl("%", df$Previous)) {
a = as.numeric(stringr::str_extract(df$Actual,"[-\\d.][\\d]"))
p = as.numeric(stringr::str_extract(df$Previous,"[-\\d.][\\d]"))
df$Gain = a - p
} else {
a = as.numeric(stringr::str_extract(df$Actual,"[-\\d.][\\d]"))
p = as.numeric(stringr::str_extract(df$Previous,"[-\\d.][\\d]"))
df$Gain = (a - p)/p * 100
}
df
Actual Previous Gain
1 0.10% -0.50% 0.6
2 55.10% 47.90% 7.2
3 <NA> <NA> <NA>
4 20.8B 16.6B 4.2
The last value should be calculated as 25.30, instead of 4.2.
The value of the if condition is:
grepl("%", df$Actual) & grepl("%", df$Previous)
[1] TRUE TRUE FALSE FALSE
The last row should go through the else branch. Can you help me find the mistake in the code?
You could use the parse_number() function from the readr package (one of the tidyverse packages) in combination with if_else() to achieve what you want.
Using:
library(readr)
library(dplyr)
df %>%
mutate(gain = (parse_number(Actual) - parse_number(Previous)) /
if_else(grepl('%', Actual), 1, parse_number(Previous)/100) )
gives:
Actual Previous gain
1 0.10% -0.50% 0.6000
2 55.10% 47.90% 7.2000
3 <NA> <NA> NA
4 20.8B 16.6B 25.3012
A non-dplyr approach could be:
df = data.frame(Actual = c('0.10%','55.10%',NA,'20.8B'),
Previous = c('-0.50%','47.90%',NA,'16.6B'), stringsAsFactors = FALSE)
df
percChange <- function(x) {
if (all(grepl("%", x))){
d <- diff(rev(as.numeric(gsub("[^-\\d{1,2}.\\d+]", "", x, perl = TRUE))))
}
else {
n <- rev(as.numeric(gsub("[^-\\d{1,2}.\\d+]", "", x, perl = TRUE)))
d <- diff(n) / n[1] * 100
}
return (d)
}
df$diff <- apply(df, 1, percChange)
df
Actual Previous diff
1 0.10% -0.50% 0.6000
2 55.10% 47.90% 7.2000
3 <NA> <NA> NA
4 20.8B 16.6B 25.3012
Also, regarding what is wrong with your if statement: running it produces the following warning:
Warning message:
In if (grepl("%", df$Actual) & grepl("%", df$Previous)) { :
the condition has length > 1 and only the first element will be used
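This means that only the first element (which is TRUE because the first row has % values in both columns) is used, so every row goes through the first branch. Your outcome in row 4 therefore comes from the plain subtraction (20-16 = 4) rather than the percentage formula. You have to loop over the rows, or use a vectorized condition, to prevent this.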
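A vectorized version of the same logic can be sketched in base R with ifelse(), which tests every row instead of only the first one. This is an illustration under assumptions, not the original poster's code: the helper to_num and the variable names are hypothetical, and the number extraction simply strips every character that is not a digit, a dot or a minus sign. Note also that from R 4.2.0 onwards, a condition of length greater than one in if() is an error rather than a warning.

```r
df <- data.frame(Actual   = c("0.10%", "55.10%", NA, "20.8B"),
                 Previous = c("-0.50%", "47.90%", NA, "16.6B"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: keep only digits, dots and minus signs, then convert.
to_num <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

a <- to_num(df$Actual)
p <- to_num(df$Previous)

# One TRUE/FALSE per row: TRUE where both values are percentages.
is_pct <- grepl("%", df$Actual) & grepl("%", df$Previous)

# ifelse() picks the formula row by row.
df$Gain <- ifelse(is_pct, a - p, (a - p) / p * 100)
round(df$Gain, 4)
# [1]  0.6000  7.2000      NA 25.3012
```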

Flag rows in matrix that contain the same set of values

I have a matrix of integers
m <- rbind(c(1,2),
c(3,6),
c(5,1),
c(2,1),
c(6,3))
and I am looking for a function that takes this matrix as input and outputs a vector flag with length(flag) == nrow(m) that assigns the same unique (let's say integer) value to rows that contain the same set of integers.
For the above example, the desired output would be:
flag <- c(1, 2, 3, 1, 2)
So rows 1 and 4 in m get the same flag 1, because they both contain the same set of integers, in this case {1, 2}. Similarly, rows 2 and 5 get the same flag.
The solution should work for any number of columns.
The only thing I could come up with is the following approach ...
FlagSymmetric <- function(x) {
vec_sim <- rep(NA, nrow(x)) # object containing flags
ind_ord <- ncol(x)
counter <- 1
for(i in 1:nrow(x)) {
if(is.na(vec_sim[i])) { # if that row is not flagged yet, proceed ...
vec_sim[i] <- counter # ... and give the next free flag
for(j in (i+1):nrow(x)) {
if( (i+1) > nrow(x) ) next # in case of tiny matrices
ind <- x[j, ] %in% x[i, ]
if(sum(ind)==ind_ord) vec_sim[j] <- counter # if the same, assign flag
}
counter <- counter + 1
}
}
return(vec_sim)
}
... which does what I want:
> FlagSymmetric(m)
[1] 1 2 3 1 2
If n = nrow(m) this needs 1/2 n^2 operations. Of course, I could make it much quicker by writing this in C++, but this only alleviates my problem to some extent, because I am working with matrices with a potentially huge number of rows.
I guess there must be a smarter way of doing this.
EDIT:
Additional, more general example (sorting row and pasting to character string not possible):
m2 <- rbind(c(1,112),
c(11,12),
c(12,11),
c(112,1),
c(6,3))
flag2 <- c(1, 2, 2, 1, 3) # desired output
FlagSymmetric(m2) # works
[1] 1 2 2 1 3
Assuming you only have numeric data in your matrix.
First, convert the matrix to a data frame:
m <- data.frame(m)
We can sort every row and paste them together. Convert them to factor and then to numeric to get unique numbers for every combination
m$flag <- as.numeric(factor(apply(m, 1, function(x) paste0(sort(x), collapse = ""))))
m
# X1 X2 flag
#1 1 2 1
#2 3 6 3
#3 5 1 2
#4 2 1 1
#5 6 3 3
EDIT
The above solution does not work for every combination, as shown by the new example. To keep the individual numbers distinguishable, as @d.b commented, we can use any non-empty collapse argument. For the updated example:
as.numeric(factor(apply(m2, 1, function(x) paste0(sort(x), collapse = "-"))))
#[1] 1 2 2 1 3
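Note that factor() numbers the keys in sorted order, so the flags for the first matrix come out as 1 3 2 1 3 rather than the 1 2 3 1 2 asked for. If the flags should follow the order of first appearance, one option is match() against the unique keys. A small sketch, with a hypothetical function name flag_rows and an underscore as an assumed separator:

```r
# Hypothetical helper: flag rows that contain the same set of values,
# numbering the groups in order of first appearance.
flag_rows <- function(m, sep = "_") {
  # Build one key per row from its sorted values.
  keys <- apply(m, 1, function(x) paste(sort(x), collapse = sep))
  # Position of each key among the distinct keys, in order of appearance.
  match(keys, unique(keys))
}

m  <- rbind(c(1, 2), c(3, 6), c(5, 1), c(2, 1), c(6, 3))
m2 <- rbind(c(1, 112), c(11, 12), c(12, 11), c(112, 1), c(6, 3))

flag_rows(m)
# [1] 1 2 3 1 2
flag_rows(m2)
# [1] 1 2 2 1 3
```

The keys are built once per row and matched by hashing, so this avoids the pairwise row comparison of the double loop.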

R - How to Create Custom Ifelse function that repeats

I am quite familiar with R's standard ifelse statement, and how to create nested ifelse statements. However, I want to create a "better" version, so that I don't have to copy / paste ifelse so many times.
Take this nested ifelse statement for example:
df <- data.frame(b = 1:5)
df$a <- ifelse(df$b == 1,1,
ifelse(df$b == 2,2,
ifelse(df$b == 3,3,4)))
Instead, what I would like to do is create a function that I could call like this:
df$a <- myFunction(df$b == 1,1,
df$b == 2,2,
df$b == 3,3,4)
I would want the function to be able to pick up how many arguments I have entered, and thus know how many ifelse statements to include and then plug the arguments into the correct position, up to however many I want.
There is still some repetition, but when creating longer nested ifelse statements it would be nice not to have to repeat that piece of code and then try to keep track of the closing parens.
We can use Reduce() to build up the required parse tree of nested ifelse() calls and then eval() it:
ifelses <- function(...) {
## validate number of args is at least 3 and odd
stopifnot(nargs()>=3L);
stopifnot(nargs()%%2L==1L);
## precompute the required number of calls and the argument parse tree list
num <- (nargs()-1L)%/%2L;
cl <- match.call();
## build up the parse tree of nested ifelse() calls using Reduce(), then eval() it
## terminology (following docs): ifelse(test,yes,no)
eval(Reduce(
function(i,noArg) call('ifelse',cl[[i]],cl[[i+1L]],noArg),
seq(2L,by=2L,len=num), ## indexes of "test" args
cl[[length(cl)]], ## first (innermost) "no" arg
T ## proceed from right-to-left, IOW inside-out wrt parse tree
));
}; ## end ifelses()
Useful docs:
nargs()
stopifnot()
match.call()
Reduce()
call()
eval()
seq()
ifelse()
Demo:
ifelses(c(F,T,F,F),1:4,c(T,F,F,F),5:8,c(F,T,F,T),9:12,13:16);
## [1] 5 2 15 12
OP's example:
df <- data.frame(b=1:5);
df$a <- ifelses(df$b==1L,1L,df$b==2L,2L,df$b==3L,3L,4L);
df;
## b a
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 4
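The same cascade can also be sketched without building and evaluating a call, by folding over the already evaluated arguments from the final "no" value outward. The name ifelses_loop is hypothetical. One difference from the version above: all arguments are evaluated up front in the caller's environment, so local variables of a calling function are visible, at the cost of no lazy evaluation.

```r
# Hypothetical variant: arguments are test1, yes1, test2, yes2, ..., no.
ifelses_loop <- function(...) {
  args <- list(...)
  n <- length(args)
  # Same validation as above: at least 3 arguments, and an odd count.
  stopifnot(n >= 3L, n %% 2L == 1L)
  out <- args[[n]]                        # innermost "no" value
  # Walk the (test, yes) pairs from last to first, wrapping the result each time.
  for (i in seq(n - 2L, 1L, by = -2L)) {
    out <- ifelse(args[[i]], args[[i + 1L]], out)
  }
  out
}

ifelses_loop(c(FALSE, TRUE, FALSE, FALSE), 1:4,
             c(TRUE, FALSE, FALSE, FALSE), 5:8,
             c(FALSE, TRUE, FALSE, TRUE), 9:12,
             13:16)
# [1]  5  2 15 12

df <- data.frame(b = 1:5)
df$a <- ifelses_loop(df$b == 1L, 1L, df$b == 2L, 2L, df$b == 3L, 3L, 4L)
df$a
# [1] 1 2 3 4 4
```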
This is a job for merging with a lookup table. You can wrap that in a function, but usually I wouldn't bother:
df <- data.frame(b = 1:5)
lookupif <- function(df, x, y, else.val = NA, on.col, res.col = "val") {
lookup <- data.frame(x, y)
names(lookup)[1] <- res.col
df <- merge(df, lookup, by.x = on.col, by.y = "y", all.x = TRUE)
df[is.na(df[[res.col]]), res.col] <- else.val
df
}
lookupif(df, 1:3, 1:3, 4, "b")
# b val
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 4
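When the tests are all simple equality checks on one column, the lookup can also be sketched with match() instead of merge(). This keeps the original row order, whereas merge() sorts the result on the key column by default. The names keys, vals and else_val are hypothetical:

```r
df <- data.frame(b = 1:5)

keys     <- 1:3          # values of b to look up
vals     <- c(1, 2, 3)   # result for each key, in the same order
else_val <- 4            # result when b matches no key

# match() returns the position of each b in keys, or NA when there is none.
df$a <- vals[match(df$b, keys)]
df$a[is.na(df$a)] <- else_val
df$a
# [1] 1 2 3 4 4
```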
dplyr::case_when is a cascading alternative to nested ifelses, e.g.
library(dplyr)
df <- data.frame(b = 1:5)
df %>% mutate(a = case_when(b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4))
#> b a
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 4
or just steal it and put it in base syntax:
df$a <- with(df, dplyr::case_when(b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4))
which returns the same thing.
Since it's already about as simple as you can get without sacrificing the versatility of ifelse, it may not need to be put into a function, but it could, if you like. Using the development version's new rlang NSE syntax,
add_cases <- function(.data, .col, ...){
.data %>% mutate(!!.col := case_when(!!!quos(...)))
}
df %>% add_cases(.col = 'a',
b == 1 ~ 1,
b == 2 ~ 2,
b == 3 ~ 3,
TRUE ~ 4)
#> b a
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 4
Sorry for shameless advertisement - you can try if_val function in my package expss
b = sample(1:7, 10, replace = TRUE)
if_val(b, 1 ~ 1, 2 ~ 2, 3 ~ 3, other ~ 4)
There is also ifs function: ifs(b==1 ~ 1, b==2 ~ 2, b==3 ~ 3, TRUE ~ 4).

Resources