Replace value in dataframe non-conditionally with dplyr - r

What is the dplyr function (if any) for df[df == 2] <- 3?
(i.e. replace all values of 2 in the dataframe df by 3)
With dplyr I could do that as:
df %>% mutate_all(funs(ifelse(.==2, 3, .)))
Is there a function such as recode_all(df, old_value=2, new_value=3)?

We can also use replace
library(dplyr)
df %>%
  replace(. < 22, "smaller_22")
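Applied to the 2 → 3 case from the question, the same idea would look like this (a minimal sketch; the small numeric df below is hypothetical, just for illustration):
library(dplyr)
# hypothetical all-numeric data frame
df <- data.frame(a = c(1, 2, 3), b = c(2, 4, 2))
df %>%
  replace(. == 2, 3)   # every 2 becomes 3, equivalent to df[df == 2] <- 3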

That's a pretty good one, in my opinion:
df1 <- mtcars[1:4,1:4]
df1 %>% `[<-`(., . < 22, value = "smaller_22")
So in your specific case:
df %>% `[<-`(., . == 2, value = 3)

This could work:
df %>% mutate_all(funs(case_when(.==2 ~ 3, TRUE ~ as.numeric(.))))
(The reason it needs as.numeric(): case_when() requires all right-hand sides to share one type, and 3 is a double, so just putting a bare . at the end gives an error whenever the column is not already a double.)
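With dplyr 1.0.0 or later (across() is mentioned in an answer further down), the same idea can be written without the superseded mutate_all()/funs(); a sketch, not from the original answer:
library(dplyr)
df %>%
  mutate(across(everything(), ~ case_when(. == 2 ~ 3,
                                          TRUE ~ as.numeric(.))))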

Related

fill() in missing lubridate value from a different column

Below is a fictional reproducible example of pick-up and drop-off times for four taxis.
Taxis 1, 2, and 3 unfortunately each have a missing drop-off time. Fortunately, two of these times (for taxis 1 and 3) can be inferred to be at least 1 second before they pick up new customers (these are non-ride-sharing taxis, very corona-proof):
(the below df is - in the real use case - the result of a group_by and summarise of another df)
library(dplyr)
library(tidyr)   # fill(), used below, comes from tidyr
x <- seq(as.POSIXct('2020/01/01'), # Create sequence of dates
         as.POSIXct('2030/01/01'),
         by = "10 mins") %>%
  head(20) %>%
  sort()
taxi_nr <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)
drop_of <- x[c(TRUE, FALSE)]
pick_up <- x[c(FALSE, TRUE)]
drop_of[2] <- NA
drop_of[5] <- NA
drop_of[7] <- NA
df <- data.frame(taxi_nr, pick_up, drop_of) %>%
  arrange(pick_up)
I wish to fill in the NAs for taxis 1 and 3. I have tried the following:
df <- df %>%
  fill(drop_of, .direction = "up")
However, this takes the next drop-off value instead of the next pick-up value, and it does not take the taxi number into account.
I have also thought about:
df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, ov[,+1])
This seems to run into problems with the taxi_nr 2 case, as there is no [,+1] within the group - or so I believe is the issue. I have tried to add safely(), possibly() and quietly(), but that did not help:
df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, purrr::safely(ov[,+1]))
Does anyone have a solution?
PS: once I have the right column for filling in, it also needs 1 second subtracted and has to end up in the right lubridate format (d/m/y-h/m/s).
THANKS!
You can try to use a temporary variable for it, although it does not look pretty:
df <- df %>%
  mutate(temp = ifelse(is.na(drop_of), NA, pick_up)) %>%
  group_by(taxi_nr) %>%
  fill(temp, .direction = "up") %>%
  ungroup() %>%
  mutate(drop_of = ifelse(is.na(drop_of), temp - 1, drop_of),
         drop_of = as.POSIXct(drop_of, origin = "1970-01-01")) %>%
  select(-temp)
And if you need your data in the d/m/y-h/m/s format, you can do that with the format() function (I am not sure whether what you described is exactly what you need, but it should give you the idea):
df <- df %>% mutate(drop_of = format(drop_of, "%d/%m/%Y-%H/%M/%S"))
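For what it's worth, a more direct variant (my own sketch, not part of the original answer) takes the next pick_up within each taxi group with lead() and subtracts one second; taxi 2 keeps its NA because there is no later pick-up within its group:
library(dplyr)
df %>%
  group_by(taxi_nr) %>%
  arrange(pick_up, .by_group = TRUE) %>%
  # for rows with a missing drop-off, use the group's next pick-up minus 1 second
  mutate(drop_of = if_else(is.na(drop_of), lead(pick_up) - 1, drop_of)) %>%
  ungroup()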

How to add a new column using mutate function from a group of existing columns with similar names

I would like to add a column to my data frame based upon the values in other columns.
Here is an extract of the data.
On each row, if any of the 4 TOPER columns contains any of the following values (92514, 92515, 92508, 92510, 92511 or 92512), I want the S_Flag column to be 1; otherwise S_Flag should be 0.
I have highlighted the data where this is true (case nos 2, 4, 6 and 8) - for those rows S_Flag should be set to 1.
I have tried using ifelse() inside mutate(), but I am not sure how to look across all 4 TOPER columns within the ifelse() call.
Have tried
tt <- mutate(rr, S_Flag = ifelse( any(vars(everything()) %in% toper_vec), 1,0))
where rr is the original data frame and toper_vec is a vector containing the 6 TOPER column values.
Hope that makes sense. By the way, I am in the early stages of learning R.
Thank you for any assistance.
A couple of quick fixes should make your code work:
(1) use rowwise() and
(2) use across().
The revised code reads:
tt <- rr %>%
  rowwise() %>%
  mutate(S_Flag = if_else(any(across(everything()) %in% toper_vec), 1, 0))
A similar question was addressed in the following informative post: Check row wise if value is present in column and update new column row wise
Applying the suggested approach in that post to your immediate question, the following should work:
library(tidyverse)
toper_vec <- c(92514, 92515, 92508, 92510, 92511, 92512)
df <- data.frame("CASE"   = c(1, 2, 3, 4, 5),
                 "TOPER1" = c(86509, 92514, 87659, 45232, 86509),
                 "TOPER2" = c(12341, 10094, 12341, 92508, 10094),
                 "TOPER3" = c(86509, 67326, 41908, 50567, 50567))
new_df <- df %>%
  rowwise() %>%
  mutate(S_Flag = case_when(TOPER1 %in% toper_vec ~ 1,
                            TOPER2 %in% toper_vec ~ 1,
                            TOPER3 %in% toper_vec ~ 1,
                            TRUE ~ 0))
Here's an alternative, reusing toper_vec and df from Blue050205:
df %>%
  rowwise() %>%
  mutate(s_flag = if_else(any(c_across(starts_with("TOP")) %in% toper_vec), 1, 0))
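A vectorised variant that avoids rowwise() altogether (my own addition; it assumes dplyr >= 1.0.4, which introduced if_any()):
library(dplyr)
df %>%
  mutate(S_Flag = as.integer(if_any(starts_with("TOPER"), ~ . %in% toper_vec)))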

Add multiple columns with mutate using column-based conditions, without using explicit column name + POSIX

I have a dataframe of data: 1 column is POSIX, the rest is data.
I need to selectively remove some data from a group of columns and add these "new" columns back to the original dataframe.
I can do it "easily" in base R (I am an old-style user). I'd like to do it more compactly with mutate_at or some other function... although I am running into several issues.
A homemade solution in base R could be:
df <- data.frame("date" = seq.POSIXt(as.POSIXct(format(Sys.time(),"%F %T"),tz="UTC"),length.out=20,by="min"), "a.1" = rnorm(20,0,3), "a.2" = rnorm(20,1,2), "b.1"= rnorm(20,1,4), "b.2"= rnorm(20,3,4))
df1 <- lapply(df[,grep("^a",names(df))], function(x) replace(x, which(x > 0 & x < 0.2), NA))
df1 <- data.frame(matrix(unlist(df1), nrow = nrow(df), byrow = F)) ## convert to data.frame
names(df1) <- grep("^a",names(df),value=T) ## rename columns
df1 <- cbind.data.frame("date"=df$date, df1) ## add date
Can anyone help me set up something that works with dplyr + transmute?
So far I have come up with something like:
df %>%
  select(starts_with("a.")) %>%
  transmute(
    case_when(
      . > 0.2 ~ NA,
    )
  ) %>%
  cbind.data.frame(df)
But I am quite stuck, since I can't combine transmute with case_when: all the examples I found explicitly use the column names in case_when, but I can't, since I won't know the column names in advance. I will only know the prefix of the columns that I need to transmute.
Thanks,
Alex
We can use transmute_at if the intention is to return only those columns specified in the vars:
library(dplyr)
df %>%
  transmute_at(vars(starts_with('a')), ~ case_when(. > 0.2 ~ NA_real_, TRUE ~ .)) %>%
  bind_cols(df %>% select(date), .)
If we need all the columns returned, but only want to change the columns of interest specified in vars, then we need mutate_at instead of transmute_at:
df %>%
  mutate_at(vars(starts_with('a')), ~ case_when(. > 0.2 ~ NA_real_, TRUE ~ .)) %>%
  select(date, starts_with('a')) # only needed if we are selecting a subset of columns
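Since dplyr 1.0.0, transmute_at()/mutate_at() are superseded by across(); an equivalent sketch (my addition, not part of the original answer):
library(dplyr)
df %>%
  mutate(across(starts_with("a"), ~ if_else(. > 0.2, NA_real_, .))) %>%
  select(date, starts_with("a"))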

How do I combine mutate_all and ifelse

I would like to iterate over all columns of a data.frame with mutate_all() and then selectively change values using ifelse().
testdf <- data.frame("a"=c(1,2,3), "b"=c(4,5,6), "c"=c(7,8,9))
mutate_all(testdf, ifelse(.>9,10,.))
But this does not work. I always get "object '.' not found". How do I refer to the individual values passed through the mutate_all() function? I thought the '.' worked that way? This works:
mutate_all(testdf, funs(.*2))
Try any of these:
testdf %>% mutate_all(function(x) ifelse(x>9,10,x))
testdf %>% mutate_all(funs(ifelse(.>9,10,.)))
testdf %>% mutate_all(~ ifelse(. > 9, 10, .))
testdf %>% mutate_all(~ pmin(., 10))
testdf %>% mutate_all(pmin, 10)
testdf %>% mutate_all(~ replace(., . > 9, 10))
testdf %>% replace(. > 9, 10)
The last two are per Ronak Shah's comment below.
Update
Since this question was asked, dplyr 1.0.0 has come out and introduced the new across() function, which is used inside mutate() and is now preferred over the mutate_*() functions.
testdf %>% mutate(across(everything(), ~ pmin(., 10)))
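Since every value in testdf is at most 9, none of these calls visibly change anything; a quick check with hypothetical data that does exceed the cutoff:
library(dplyr)
data.frame(a = c(1, 12), b = c(15, 3)) %>%
  mutate(across(everything(), ~ pmin(., 10)))
#    a  b
# 1  1 10
# 2 10  3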

dplyr having trouble redefining type with group_by()

I have the following problem:
When using dplyr to mutate a numeric column after group_by(), the mutate fails if a group contains only a single value that is NaN.
Thus, as long as the grouped column contains numeric values, the result is correctly typed as dbl, but as soon as a group consists of nothing but NaN, it fails because dplyr types that group's result as lgl while all the other groups are dbl.
My first (and more general question) is:
Is there a way to tell dplyr, when using group_by(), to always define a column in a certain way?
Secondly, can someone help me with a hack for the problem explained in the MWE below:
# ERROR: This will produce the column-typing error mentioned above:
df <- data_frame(a = c(rep(LETTERS[1:2], 4), "C"),
                 g = c(rep(LETTERS[5:7], 3)),
                 x = c(7, 8, 3, 5, 9, 2, 4, 7, 8)) %>% tbl_df()
df <- df %>% group_by(a) %>% mutate_each(funs(sd(., na.rm = TRUE)), x)
df <- df %>% mutate(Winsorise = ifelse(x > 2, 2, x))
# NO ERROR (as no group has a single entry with NaN):
df2 <- data_frame(a = c(rep(LETTERS[1:2], 4), "C"),
                  g = c(rep(LETTERS[5:7], 3)),
                  x = c(7, 8, 3, 5, 9, 2, 4, 7, 8)) %>% tbl_df()
df2 <- df2 %>% group_by(a) %>% mutate_each(funs(sd(., na.rm = TRUE)), x)
# Update the group for the row with an NA - works
df2[9, 1] <- "A"
df2 <- df2 %>% mutate(Winsorise = ifelse(x > 3, 3, x))
# REASON FOR ERROR: What happens for a group whose only member is NaN,
# although we want the Winsorise column to be dbl, not lgl:
df3 <- data_frame(g = "A", x = NaN)
df3 <- df3 %>% mutate(Winsorise = ifelse(x > 3, 3, x))
The reason is, as you rightly pointed out with df3, that ifelse() initialises its result from the logical test vector, so when the test is entirely NA the result stays logical; the mutate result is therefore cast as a logical when the source column is all NaN/NA.
To circumvent this, cast your answer as numeric:
df <- df %>% mutate(Winsorise = as.numeric(ifelse(x > 2, 2, x)))
Perhaps #hadley could shed some light on why the mutate result is cast as lgl?
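An alternative (my own sketch, echoing the pmin() idiom shown in an earlier answer on this page) sidesteps the coercion entirely, since pmin() keeps the double type even when x is all NaN/NA:
library(dplyr)
df <- df %>% mutate(Winsorise = pmin(x, 2))   # pmin(NaN, 2) stays a double NaN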
