Using dplyr to mutate the following rows after meeting condition - r

I am trying to add a new column of character strings, based on another column, via an ifelse statement in dplyr. When the condition is met, I want the following two rows to show the same value as well.
Here is an example using the mtcars dataset:
mtcars %>%
mutate(type=ifelse(mpg>20,"Event", "No event")) %>%
mutate(type=ifelse(type=="Event", lead(type),`type`))
What I am trying to do here is produce a new column called type that states "Event" if mpg > 20 and "No event" otherwise. However, I also want the two rows following each mpg > 20 row to show "Event", even if they don't meet the criterion themselves.
I hope this makes sense.

I am not sure I understand the problem correctly.
However, you can try extending the logical expression inside if_else so that it also checks the two previous rows:
mtcars %>%
mutate(type = if_else(mpg > 20 | lag(mpg) > 20 | lag(mpg, n = 2) > 20, "Event", "No event"))
mpg type
1 21.0 Event
2 21.0 Event
3 22.8 Event
4 21.4 Event
5 18.7 Event
6 18.1 Event
7 14.3 No event
8 24.4 Event
9 22.8 Event
10 19.2 Event
11 17.8 Event
12 16.4 No event
13 17.3 No event
14 15.2 No event
15 10.4 No event
16 10.4 No event
17 14.7 No event
18 32.4 Event
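One edge case worth noting: lag() returns NA where no earlier row exists, so if the first rows do not exceed the threshold, the combined condition evaluates to NA and if_else returns NA rather than "No event". Below is a minimal sketch of one way around this, using the default argument of lag() on a small made-up data frame (df, hit and res are illustrative names, not from the question):

```r
library(dplyr)

# Made-up data: the first row does NOT exceed the threshold
df <- data.frame(mpg = c(18, 21, 15, 14, 13, 25))

res <- df %>%
  mutate(
    hit = mpg > 20,
    # default = FALSE fills the positions where no earlier row exists
    type = if_else(
      hit | lag(hit, default = FALSE) | lag(hit, n = 2, default = FALSE),
      "Event", "No event"
    )
  )

res$type
```

With mtcars this makes no difference because the first row already has mpg > 20, but it matters for data that starts below the threshold.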

For a general solution, you can use zoo's rolling functions. Here the window of 3 covers the current row plus the two rows before it; adjust the window size depending on how far you want to look back.
library(dplyr)
library(zoo)
mtcars %>% mutate(type = rollapplyr(mpg > 20, 3, any, partial = TRUE))
# mpg cyl disp hp drat wt qsec vs am gear carb type
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 TRUE
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 TRUE
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 TRUE
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 FALSE
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
#9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 TRUE
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 TRUE
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 TRUE
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 FALSE
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 FALSE
#...
#...
You can then change it to "Event"/"No event" using ifelse:
mtcars %>%
mutate(type = ifelse(rollapplyr(mpg > 20, 3, any, partial = TRUE),
'Event', 'No event'))
Or without ifelse, by indexing into a character vector:
mtcars %>%
mutate(type = c('No event', 'Event')
[rollapplyr(mpg > 20, 3, any, partial = TRUE) + 1])
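If adding zoo as a dependency is not desirable, the same "current row or any of the previous k rows" logic can be sketched with dplyr alone. The helper name flag_with_following is hypothetical (it is not part of dplyr or zoo), and this is only one possible way to write it:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or zoo): TRUE where `cond` holds in
# the current row or in any of the `k` rows before it.
flag_with_following <- function(cond, k = 2) {
  out <- cond
  for (n in seq_len(k)) {
    # default = FALSE avoids NA where no earlier row exists
    out <- out | dplyr::lag(cond, n = n, default = FALSE)
  }
  out
}

res <- mtcars %>%
  mutate(type = if_else(flag_with_following(mpg > 20, k = 2), "Event", "No event"))

head(res[c("mpg", "type")], 13)
```

Changing k plays the same role as changing the window width in rollapplyr (a window of 3 corresponds to k = 2).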

Related

Correct way to mutate with case_when while using a loop

I found somewhat similar examples here and here, but I didn't follow the examples for the problem I am trying to solve.
What I would like to do is to use mutate and case_when to create a new column. The new column would create a category classification (e.g., "category_1") depending on the values from a different column. Since the number of values may change I want to make the case_when dynamic.
The problem is that each iteration works fine on its own, but as the loop advances it overwrites the values written by previous iterations. So I am wondering how to use case_when in a loop so that a later iteration does not overwrite the results of earlier ones.
Here is a reproducible example:
library(tidyverse)
# Use built-in data frame for reproducible example
my_df <- mtcars
# Create sequence to reference beginning and end ranges within mpg values
mpg_vals <- sort(mtcars$mpg)
beg_seq <- seq(1, 31, 4)
end_seq <- seq(4, 32, 4)
# Create loop to fill in mpg category
for(i in 1:8){
my_df <- my_df %>%
mutate(mpg_class = case_when(
mpg %in% mpg_vals[beg_seq[i]:end_seq[i]] ~ paste0("category", i)
)
)
# Observe loop values
print(mpg_vals[beg_seq[i]:end_seq[i]])
print(paste0("category_", i))
}
Edit:
If I understand the question right, you want every fourth ranking of mpg to get a new category. You might use:
my_df %>%
mutate(mpg_class = paste("category", 1 + min_rank(mpg) %/% 4))
That produces:
mpg cyl disp hp drat wt qsec vs am gear carb mpg_class
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 category 5
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 category 5
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 category 7
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 category 6
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 category 4
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 category 4
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 category 2
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 category 7
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 category 7
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 category 5
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 category 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 category 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 category 4
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 category 2
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 category 1
...
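A note on the bin boundaries: as written, 1 + min_rank(mpg) %/% 4 puts ranks 1 to 3 in category 1 and rank 32 in a ninth category. If the intent is eight groups covering ranks 1-4, 5-8 and so on (my reading of the question, so treat that as an assumption), shifting the rank by one before dividing is one option:

```r
library(dplyr)

res <- mtcars %>%
  # (rank - 1) %/% 4 maps ranks 1-4 to 0, 5-8 to 1, ..., 29-32 to 7
  mutate(mpg_class = paste("category", 1 + (min_rank(mpg) - 1) %/% 4))

table(res$mpg_class)
```

Because min_rank gives tied values the same rank, ties always land in the same category, so the groups will not necessarily contain exactly four rows each.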
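Original answer: A looped case_when seems complicated when you could do:
# rows must be in ascending mpg order for the labels to line up with the ranges
my_df <- my_df[order(my_df$mpg), ]
lengths <- end_seq - beg_seq + 1
my_df$mpg_class <- rep(paste0("category", seq_along(lengths)), lengths)
This finds the length of each category. Then we make a vector that repeats each category name as many times as the length of the category and assign that to an mpg_class column. Note that the labels are assigned by row position, which is why the data frame is sorted by mpg first.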
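If you do want to keep the loop from the question, the overwriting happens because case_when returns NA for every row that matches no condition, and mutate then replaces the whole column on each pass. A minimal sketch of one fix, assuming the column is initialised before the loop and a fallback branch keeps the existing value (the "category_" label spelling is my choice; the question uses both spellings):

```r
library(dplyr)

my_df <- mtcars
mpg_vals <- sort(mtcars$mpg)
beg_seq <- seq(1, 31, 4)
end_seq <- seq(4, 32, 4)

# Start with an all-NA character column so the fallback has something to return
my_df$mpg_class <- NA_character_

for (i in seq_along(beg_seq)) {
  my_df <- my_df %>%
    mutate(mpg_class = case_when(
      mpg %in% mpg_vals[beg_seq[i]:end_seq[i]] ~ paste0("category_", i),
      TRUE ~ mpg_class  # keep whatever earlier iterations assigned
    ))
}

table(my_df$mpg_class, useNA = "ifany")
```

One caveat: a tied mpg value that straddles two ranges matches in both iterations, so with this sketch it ends up in the later category.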

Replacing values in R dataframes based on conditional

I'm having trouble replacing values in a column of a R dataframe based upon conditions related to other data variables.
I've created a new data frame called VAED1 from a left join between the original data frame VAED (which has over 20 variables) and another data frame called new_map (which has only 3 variables, one of which is called Category).
Here is the code I wrote, which works fine:
#join the left side table (VAED) with the right side table (new_map) with the left join function
VAED1 <- VAED %>%
left_join(new_map, by = c("ID1" = "ID2"), suffix = c("_VAED", "_MAP"))
I then added three extra columns (nnate, NICU, enone) to VAED1 using mutate to create a new data frame, VAED2:
VAED2 <- VAED1 %>%
mutate(nnate = if_else((substr(W25VIC,1,1) == "P") & (CARE != "U") & (AGE < 1) , "Y", "N"))%>%
mutate(NICU = if_else((nnate == "Y") & (ICUH > 0), "Y", "N"))%>%
mutate(enone = if_else(EMNL == "E", "Emerg", "Non-emerg"))
Everything works fine to this point.
Finally, I wanted to replace the values in one column called Category (a character variable from the joined dataset new_map) based on conditions on other variables in the data frame, so that values in Category change only when the W25VIC and CARE variables equal certain values and are otherwise left as they are.
I used this code:
Category <- if_else((W25VIC == "R03A") & (SAMEDAY == "Y"), "08 Other multiday", Category)
This always shows an error: objects 'W25VIC' and 'SAMEDAY' not found. It seems straightforward, but the last line of code doesn't work no matter what I do. I check the data frame with head() at each step to make sure the columns are there. They exist, but the code doesn't seem to recognise them.
Grateful for any ideas on what I am doing wrong.
Also used the command
Category[(W25VIC == "R03A") & (SAMEDAY == "Y")] <- "08 Other multiday"
Still same error message.
I think it is worth reading up on how the magrittr pipe works. The pipe takes the object on the left-hand side of an expression and passes it as the first argument to the function on the right.
So x %>% f() becomes f(x) and x %>% f(y) becomes f(x, y). In your last statement
Category <- if_else((W25VIC == "R03A") & (SAMEDAY == "Y"), "08 Other multiday", Category)
both the x (the data frame) and the function that should act on the result of if_else (for example mutate) are missing, so R looks for W25VIC and SAMEDAY as standalone objects and cannot find them. Here is an example of how to use the pipe operator together with if_else to generate a new column:
library(tidyverse)
data <- mtcars
new_data <- data %>% mutate( evaluation = if_else(hp > 150, "awesome", "lame"))
head(new_data, 20)
#> mpg cyl disp hp drat wt qsec vs am gear carb evaluation
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 lame
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 lame
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 lame
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 lame
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 awesome
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 lame
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 awesome
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 lame
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 lame
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 lame
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 lame
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 awesome
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 awesome
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 awesome
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 awesome
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 awesome
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 awesome
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 lame
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 lame
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 lame
Created on 2021-01-07 by the reprex package (v0.3.0)
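Applied to the question's own column names, the same pattern might look like the sketch below. The small VAED2 data frame here is a made-up stand-in (the real data are not shown in the question), and VAED3 is a hypothetical name for the result:

```r
library(dplyr)

# Made-up stand-in for VAED2; values are illustrative only
VAED2 <- data.frame(
  W25VIC   = c("R03A", "R03A", "P01Z"),
  SAMEDAY  = c("Y", "N", "Y"),
  Category = c("01 Medical", "01 Medical", "02 Surgical"),
  stringsAsFactors = FALSE
)

# Inside mutate(), column names are looked up in the data frame,
# so W25VIC, SAMEDAY and Category are found
VAED3 <- VAED2 %>%
  mutate(Category = if_else(W25VIC == "R03A" & SAMEDAY == "Y",
                            "08 Other multiday", Category))

VAED3
```

The base R equivalent needs the data frame spelled out on both sides, for example VAED2$Category[VAED2$W25VIC == "R03A" & VAED2$SAMEDAY == "Y"] <- "08 Other multiday".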

Using `dplyr::na_if` with a probability to create missing data?

I'm interested in simulating data with a chance of missingness. How can I do this using dplyr::na_if?
Intuitively I wanted to do something like:
mtcars %>%
mutate(mpg = na_if(mpg, rbinom(n = n(),
1,
prob = .5) == 1))
But I think this is wrong because na_if is really for matching x and y. How do I use na_if to create a probability of missingness?
(edit: Also if there is a better function for creating missing data in the tidyverse please let me know in the comments)
You don't need na_if here; just use if_else. rbinom is also overkill, since runif works fine.
mtcars %>%
mutate(mpg = if_else(runif(n = n()) > 0.5, NA_real_, mpg))
With a slight modification of your code:
mtcars %>%
mutate(mpg = if_else(rbinom(n(), 1, prob = 0.5) == 1, NA_real_, mpg))
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 NA 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 NA 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 NA 6 167.6 123 3.92 3.440 18.30 1 0 4 4
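To make the simulation repeatable and to reuse it across several columns, the idea can be wrapped in a small function. This is only a sketch: add_missing is a hypothetical helper name, and the seed, the 0.3 probability and the choice of columns are arbitrary assumptions.

```r
library(dplyr)

# Hypothetical helper: set each element of x to NA with probability `prob`
add_missing <- function(x, prob) {
  replace(x, runif(length(x)) < prob, NA)
}

set.seed(42)  # arbitrary seed, only to make the sketch repeatable
res <- mtcars %>%
  mutate(across(c(mpg, hp), ~ add_missing(.x, prob = 0.3)))

colMeans(is.na(res[c("mpg", "hp")]))  # realised share of NAs per column
```

Each column gets its own independent draw, so the missing rows differ between mpg and hp; the realised share of NAs will vary around prob from run to run.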

Combine a list of data frames column wise and return a list of combined data frames using R

I would like to combine two lists of data frames element-wise and return a list of data frames. The following code works for the mtcars dataset:
list1=split(mtcars[c(1:16),-11],mtcars[c(1:16),2])
list2=split(data.frame(mtcars[c(1:16),]),mtcars[c(1:16),2])
newList=Map(cbind, list1, list2)
How do I modify the Map function to just bind a specific column(s) from list2? Thanks
Since @thelatemail doesn't want to add an answer, here is a purrr version of his answer.
library(purrr)
map2(list1, map(list2, `[`, 'carb'), cbind)
#Or
#map2(list1, map(list2, `[`, 'carb'), dplyr::bind_cols)
#$`4`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.15 22.90 1 0 4 2
#$`6`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#$`8`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
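For completeness, the same thing can be done by modifying the Map call directly, as the question asks, without purrr. A minimal base R sketch, assuming 'carb' is the column to bring over from list2:

```r
list1 <- split(mtcars[1:16, -11], mtcars[1:16, 2])
list2 <- split(mtcars[1:16, ], mtcars[1:16, 2])

# An anonymous function lets Map select columns from the second list
# before binding; y["carb"] keeps the result a one-column data frame
newList <- Map(function(x, y) cbind(x, y["carb"]), list1, list2)

str(newList, max.level = 1)
```

To bind several columns, pass a character vector instead, for example y[c("gear", "carb")].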

how to compute rowsums using tidyverse

I did mtcars %>% by_row(sum) but got the message:
by_row() is deprecated; please use a combination of: tidyr::nest();
dplyr::mutate(); purrr::map()
My naive approach is this
mtcars %>%
group_by(id = row_number()) %>%
nest(-id) %>%
mutate(hi = map_dbl(data, sum))
Is there a way to do it without creating an "id" column?
Is this what you are looking for?
mtcars %>% mutate(rowsum = rowSums(.))
Output:
mpg cyl disp hp drat wt qsec vs am gear carb rowsum
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 328.980
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 329.795
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 259.580
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 426.135
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 590.310
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 385.540
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 656.920
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 270.980
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 299.570
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 350.460
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 349.660
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 510.740
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 511.500
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 509.850
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 728.560
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 726.644
You can also sum a subset of columns:
mtcars %>% mutate(rowsum = rowSums(.[2:4]))
Or sum across all columns row by row with purrr::pmap_dbl:
mtcars %>% mutate(rowsum = pmap_dbl(., sum))
Furthermore, you can apply rowSums to a logical comparison, but then you count how many columns meet the criterion in each row rather than summing the values:
mtcars %>%
select(all_of(c('gear', 'carb'))) %>%
mutate(
high_gear_carb = rowSums(. > 3)
)
gear carb high_gear_carb
1 4 4 2
2 4 4 2
3 4 1 1
4 3 1 0
5 3 2 0
6 3 1 0
7 3 4 1
...
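Another option that stays within dplyr and needs no id column is rowwise() with c_across(), both available since dplyr 1.0.0. A minimal sketch (res is an illustrative name):

```r
library(dplyr)

res <- mtcars %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into one vector
  mutate(rowsum = sum(c_across(everything()))) %>%
  ungroup()

head(res$rowsum)
```

This evaluates sum() once per row, so on large data frames it is likely to be noticeably slower than the vectorised rowSums; it is mainly useful when the per-row function has no vectorised equivalent.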

Resources