Conditional column sort - r

I have to sort one column in my df by checking a condition on a string.
Basically, I want to look at test.name and sort the column in ascending or descending order depending on the value it contains.
In the example below, I tried with paste0 after the pipe, but something is not working.
test.name <- "abc"
test.value <- data.frame(a = rnorm(100, 0, 1)
, b = rnorm(100, 0, 1))
result <- case_when(test.name == "bcd" ~ "desc"
, TRUE ~ "asc")
paste0("arrange(",result,"(b))",sep="")
test.value %>% paste0("arrange(",result,"(b))",sep="")

We could build the sort expression as a string, parse it with parse_expr from rlang, and unquote it with !!:
library(dplyr)
test.value %>%
arrange(!! rlang::parse_expr(case_when(test.name == 'bcd'~
'desc(b)', TRUE ~ 'b')))
Or we can use across as well:
test.value %>%
  arrange(across(b, ~ case_when(test.name == 'bcd' ~ desc(.), TRUE ~ .)))
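A simpler alternative, offered as a sketch and not taken from the answers above: because test.name is a single string, a plain if/else inside arrange() can choose the sort key. The seed and the 10-row data frame are assumptions made only so the example is reproducible.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only for reproducibility
test.name  <- "bcd"
test.value <- data.frame(a = rnorm(10), b = rnorm(10))

# arrange() evaluates its arguments inside the data frame, so a scalar
# if/else can return either b (ascending) or desc(b) (descending).
sorted <- test.value %>%
  arrange(if (test.name == "bcd") desc(b) else b)
```

With test.name set to anything other than "bcd", the same call sorts b in ascending order.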

Related

mutate cells of a range of columns if the column name is in another column

I have a huge dataset where I would like to change a cell value in a range of columns, if the column name is in another column.
I know I can loop through cells, and use ifelse, but this becomes very slow very soon, it seems. I got as far as using mutate() and across() but cannot work out how to make a logical with the column name.
I would be grateful if someone could suggest a vectorized approach, or point me to a similar question (which I was unable to find!), using tidyverse if possible.
Example of a dataset and the nested for loops:
a <- c(1,2,3,4)
b <- c(5,6,7,8)
c <- c(9,10,11,12)
d <- c("a","b","c","none")
test <- data.frame(a,c,b,d)
for(column in 1:3){
for(row in 1:nrow(test)){
test[row,column] <- ifelse(names(test)[column] == test$d[row], -99, test[row, column])
}
}
I found the solution to my own question: cur_column() gives the name of the current column inside across(), and it can be combined with ifelse().
test %>% mutate(across(c(a, b, c), ~ifelse(cur_column() == d, -99, .)))
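For reference, here is a self-contained sketch of the cur_column() approach on the example data from the question. It swaps ifelse() for if_else(), dplyr's type-strict variant; that substitution is an assumption of this sketch, not part of the original answer.

```r
library(dplyr)

# Same data as in the question; note the column order a, c, b, d.
test <- data.frame(a = c(1, 2, 3, 4),
                   c = c(9, 10, 11, 12),
                   b = c(5, 6, 7, 8),
                   d = c("a", "b", "c", "none"),
                   stringsAsFactors = FALSE)

# For each of a, b and c, cur_column() is that column's name, so the
# comparison with d marks the rows where d names the current column.
result <- test %>%
  mutate(across(c(a, b, c), ~ if_else(cur_column() == d, -99, .x)))
```

Each row ends up with -99 only in the column that d names; the row where d is "none" is left unchanged.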
You could also write one case_when() per column of interest, comparing the reference column against that column's name.
library(tidyverse)
test %>%
  mutate(a = case_when(
    d == names(test)[1] ~ -99,
    TRUE ~ a
  ))
You could then add a new mutate, or include it in the same mutate, for each "target" column, for example:
test %>%
  mutate(a = case_when(
    d == names(test)[1] ~ -99,
    TRUE ~ a
  )) %>%
  # b is the third column of test (the columns are ordered a, c, b, d)
  mutate(b = case_when(
    d == names(test)[3] ~ -99,
    TRUE ~ b
  ))
If you have multiple source columns (i.e. columns like d), you would need to add conditions to your mutates that account for each of them. Since your test data does not include that case, I won't get into it unless required.

Mutate to modify values and replace

Hi there, I am trying to mutate values (e.g. converting kilograms to tonnes) and replace them in the original dataset, but the changes don't seem to persist in the original dataset.
Here is a sample dataset for reference.
Country  Type       Quantity
A        Kilograms  23132
B        Kilograms  34235
C        Tonnes     700
library(dplyr)
df %>%
filter(Type == "Kilograms") %>%
group_by(Quantity) %>%
mutate(Quantity = Quantity /1000)
But I am not sure what to do for the next step. I tried the replace function, but it didn't work.
Also, I plan to add a line at the end that changes all kilograms to tonnes, something like this:
df$Unit[df$Type == 'Kilograms'] <- 'Tonnes'
You can also use case_when(), which is dplyr's equivalent to SQL's CASE WHEN. Basically, it lets you vectorize multiple if_else() statements. Below, the first condition acts as the if statement and TRUE ~ acts as the else statement:
library(dplyr)
df <- data.frame(Country = c('A', 'B', 'C'),
                 Type = c("Kilograms", "Kilograms", "Tonnes"),
                 Quantity = c(23132, 34235, 700))
df <- df %>%
  mutate(Quantity = case_when(Type == 'Kilograms' ~ Quantity/1000,
                              TRUE ~ Quantity),
         # keep any other unit as it is instead of relabelling everything
         Type = case_when(Type == 'Kilograms' ~ 'Tonnes',
                          TRUE ~ Type)
  )
Use the ifelse function to change the value based on another condition. This function also works well in a tidyverse pipeline.
Don't forget to reassign the result to the original variable, since the pipe operator does not modify its input data.
library(dplyr)
df = df %>% mutate(Quantity = ifelse(Type=="Kilograms",Quantity/1000,Quantity),
Type = ifelse(Type=='Kilograms','Tonnes',Type))
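As a self-contained sketch of the same idea, the example below rebuilds the sample data and uses if_else() (an assumption of this sketch; the answer above uses ifelse()). mutate() evaluates its arguments in order, so Quantity has to be converted before Type is overwritten; otherwise the Type == "Kilograms" test would no longer match.

```r
library(dplyr)

df <- data.frame(Country = c("A", "B", "C"),
                 Type = c("Kilograms", "Kilograms", "Tonnes"),
                 Quantity = c(23132, 34235, 700),
                 stringsAsFactors = FALSE)

# Convert the quantities first, then relabel the unit, and assign the
# result back to df so the change is kept.
df <- df %>%
  mutate(Quantity = if_else(Type == "Kilograms", Quantity / 1000, Quantity),
         Type     = if_else(Type == "Kilograms", "Tonnes", Type))
```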

How to keep original values of cells if they do not meet a criterion in case_when in R?

I want the mutate to run on all cell values (edited: in a DF with multiple columns) and, if a value doesn't meet any criterion of the case_when, to keep the original data.
For example:
mutate(~case_when(.=="hi" ~1,
.=="hello"~2,
T~(keep original value)
))
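If you are running case_when on only one column, you can refer to the column itself in TRUE ~. Every branch must return the same type, so the replacements are written as strings to match the character column.
library(dplyr)
df %>%
  mutate(col = case_when(col == 'hi' ~ '1',
                         col == 'hello' ~ '2',
                         TRUE ~ col))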
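If you are running this for multiple columns with mutate_at/across, you can use . to refer to the current column (again with string replacements, so that all branches share one type):
df %>%
  mutate(across(c(a, b), ~ case_when(. == "hi" ~ "1",
                                     . == "hello" ~ "2",
                                     TRUE ~ .)))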
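A small runnable sketch of the across() variant follows. The data frame and its values are made up for illustration, and the replacements are strings because case_when() requires every branch to have the same type as the original character columns.

```r
library(dplyr)

# Hypothetical example data with two character columns.
df <- data.frame(a = c("hi", "hello", "bye"),
                 b = c("hello", "other", "hi"),
                 stringsAsFactors = FALSE)

# Values that match a condition are replaced; everything else falls
# through to TRUE ~ .x and keeps its original value.
out <- df %>%
  mutate(across(c(a, b), ~ case_when(.x == "hi"    ~ "1",
                                     .x == "hello" ~ "2",
                                     TRUE          ~ .x)))
```

If numeric codes are needed afterwards, the unmatched values would first have to be handled, since a column cannot mix numbers and strings.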

Change iteration number in one Dplyr command

I need to be able to change the iteration number in each separate line of one dplyr pipeline.
I have prepared an example of the 'by hand' approach and of what I need to do in 'pseudo steps'.
library(tidyverse)
cr <-
mtcars %>%
group_by(gear) %>%
nest()
# This is 'by-hand' approach of what I would like to do - How to automate it? E.g. we do not know all values of 'carb'
cr$data[[1]] %>%
mutate(VARIABLE1 =
case_when(carb == 1 ~ hp/mpg,
TRUE ~ 0)) %>%
mutate(VARIABLE2 =
case_when(carb == 2 ~ hp/mpg,
TRUE ~ 0)) %>%
mutate(VARIABLE4 =
case_when(carb == 4 ~ hp/mpg,
TRUE ~ 0))
# This is a pseudo-idea of what I need to do. Is there any way to change the iteration number in ONE dplyr code?
vals <- cr$data[[1]] %>% pull(carb) %>% sort %>% unique()
for (i in vals) {
message(i)
cr$data[[1]] %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be first element of vals
TRUE ~ 0)) %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be second element of vals
TRUE ~ 0)) %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be third element of vals
TRUE ~ 0))
}
Is there any trick, maybe using the purrr package, or some other solution?
I need to iterate over the unique values of a variable and create a new column in the data frame for each unique value. I need to automate this, but I am not able to do so on my own.
You can do this using sym to convert text to symbols and !! to evaluate within dplyr functions. See this question and this vignette for further details.
For your application, you probably want something like this:
carbs = c(1,2,4)
for(cc in carbs){
var_name = sym(paste0("VARIABLE",cc))
cr$data[[1]] = cr$data[[1]] %>%
mutate(!!var_name := case_when(carb == cc ~ hp/mpg,
TRUE ~ 0))
}
There are three key parts to this:
sym turns the text string into a symbol.
!! unquotes the symbol, so it is inserted into the expression as if the variable name had been typed directly.
:= lets us use !! on the left-hand side of the assignment, where = would not allow it.
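To avoid hard-coding the carb values, the loop can run over the unique values found in the data. This is a sketch under a few assumptions: it works on a plain filtered copy of mtcars (gear == 4, chosen only for illustration) instead of cr$data[[1]], it uses if_else() in place of case_when(), and it passes the column name as a string, which := also accepts after !!.

```r
library(dplyr)

# One gear group as a plain data frame (hypothetical stand-in for cr$data[[1]]).
dat <- mtcars %>% filter(gear == 4)

# Loop over whatever carb values occur in this group.
for (cc in sort(unique(dat$carb))) {
  var_name <- paste0("VARIABLE", cc)
  # !!var_name := creates a column whose name is the value of var_name.
  dat <- dat %>%
    mutate(!!var_name := if_else(carb == cc, hp / mpg, 0))
}
```

Each pass adds one column, so the data frame ends up with one VARIABLE column per distinct carb value in the group.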

conditional mutate by dplyr

I want to create a new column in my dataset that: i) drop the last 1 character if the word itself starts with "vi"; and ii) drop the last 2 characters if the word itself does not start with "vi". I know how to work on that in R, like below:
iris$Species <- as.character(iris$Species)
iris$Species_mod <- substr(iris$Species,
1,
ifelse(grepl('^vi',iris$Species),
nchar(iris$Species)-1,
nchar(iris$Species)-2))
But I have a hard time in deciphering the mutate, if_else and matches in dplyr. Can anyone enlighten me? Thanks!
Same idea, except you explicitly need to convert your factor to a string
iris = mutate(iris, Species_mod = substr(Species, 1, nchar(as.character(Species)) -
ifelse(grepl('^vi', Species), 1, 2)))
You can try something like:
iris %>%
  mutate(Species = as.character(Species)) %>%
  mutate(species2 = case_when(
    # grepl() does the prefix match; 'vi%' is SQL LIKE syntax and never matches with ==
    grepl('^vi', Species) ~ substr(Species, 1, nchar(Species) - 1),
    TRUE ~ substr(Species, 1, nchar(Species) - 2)
  ))
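The same rule can be checked on a tiny data frame. This is a sketch only: the three-row data frame is made up from the iris species names, and startsWith() with if_else() is one possible way to express the prefix test, not the method used in the answers above.

```r
library(dplyr)

# Hypothetical three-row example using the iris species names.
species <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                      stringsAsFactors = FALSE)

# Drop 1 trailing character when the name starts with "vi", otherwise 2.
out <- species %>%
  mutate(Species_mod = substr(
    Species, 1,
    nchar(Species) - if_else(startsWith(Species, "vi"), 1, 2)
  ))
```

Only "virginica" starts with "vi", so it loses one character while the other two names lose two.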
