Using dplyr in assignment - r

I have a question about dplyr.
Let's say I want to update certain values in a data frame. Can I do this?
mtcars %>% filter(mpg>20) %>% select(hp)=1000
(The example is contrived: all cars with an mpg greater than 20 have hp set to 1000.)
I get an error, so I am guessing the answer is no: I can't use %>% and the dplyr verbs on the left-hand side of an assignment. Still, the dplyr syntax is a lot cleaner than:
mtcars[mtcars$mpg>20,"hp"]=1000
That is especially true in more complex cases, so I wanted to ask whether there is any way to use the dplyr syntax here.
Edit: It looks like mutate is the verb I want, so now my question is whether I can dynamically change the name of the variable in the mutate statement, like so:
for (i in c("hp","wt")) {mtcars<-mtcars %>% filter(mpg>20) %>% mutate(i=1000) }
This example just creates a column named "i" with value 1000, which isn't what I want.
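A minimal sketch of one way to do both things, assuming a reasonably recent dplyr that re-exports rlang's `!!` and `:=` operators. The names `updated` and `result` are illustrative only. Unlike the loop above, this keeps every row and only overwrites the values where mpg > 20.

```r
library(dplyr)

# Conditional update: rows with mpg > 20 get hp = 1000, other rows keep their hp.
updated <- mtcars %>%
  mutate(hp = ifelse(mpg > 20, 1000, hp))

# Dynamic column names: i holds the column name as a string.
# `!!i :=` uses the value of i as the name of the column being assigned,
# and .data[[i]] reads the current values of that same column.
result <- mtcars
for (i in c("hp", "wt")) {
  result <- result %>%
    mutate(!!i := ifelse(mpg > 20, 1000, .data[[i]]))
}
```

Writing `mutate(i = 1000)` creates a column literally named "i" because the left-hand side of `=` is taken as a name, not evaluated.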

Related

How can I write this R expression in the pipe operator format?

I am trying to rewrite this expression to magrittr’s pipe operator:
print(mean(pull(df, height), na.rm=TRUE))
which returns 175.4 for my dataset.
I know that I have to start with the data frame and write df %>%, but I'm confused about how to write the rest from the inside out. For example, should na.rm=TRUE go inside mean(), pull() or print()?
UPDATE: I actually figured it out by trial and error...
df %>%
  pull(height) %>%
  mean(na.rm = TRUE) %>%
  print()
returns 175.4
It would be good practice to make a reproducible example, with dummy data like this:
height <- 1:30
weight <- 1:30
df <- data.frame(height, weight)
The pipe operator works with most of the tidyverse, not just magrittr, and pull() actually comes from dplyr. Summary functions such as mean() and sd(), as well as functions that pick out specific data points such as min() and max(), return NA when the input contains missing values, so pass na.rm=T to ignore the NAs.
df %>% pull(height) %>% mean(na.rm=T) %>% print()
Unless your data is nested, you may not even need to use pull():
df %>% summarise(mean = mean(height,na.rm=T))
Also, using summarise you can pipe these into another dataframe rather than just printing, and call them out of the dataframe whenever you want.
df %>% summarise(meanHt = mean(height,na.rm=T), sdHt = sd(height,na.rm=T)) -> summary
summary[1]
summary[2]
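The dummy data above contains no missing values, so the effect of na.rm is not visible there. A small sketch with a hypothetical data frame `df_na` (not from the original question) that has one missing height:

```r
library(dplyr)

# Hypothetical data: one height is missing.
df_na <- data.frame(height = c(170, 180, NA, 175))

# Without na.rm, the missing value propagates and the result is NA.
mean_with_na <- df_na %>% pull(height) %>% mean()

# With na.rm = TRUE, the NA is dropped before averaging.
mean_without_na <- df_na %>% pull(height) %>% mean(na.rm = TRUE)
```

The argument goes inside mean() because mean() is the function that has to decide what to do with the missing values.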

Exclude columns by names in mutate_at in dplyr

I am trying to do something very simple, and yet I can't figure out the right way to specify it. I simply want to exclude some named columns from mutate_at. It works fine if I specify positions, but I don't want to hard-code positions.
For example, I want the same output as this:
mtcars %>% mutate_at(-c(1, 2), max)
But, by specifying mpg and cyl column names.
I tried many things, including:
mtcars %>% mutate_at(-c('mpg', 'cyl'), max)
Is there a way to work with names and exclusion in mutate_at?
You can use vars to specify the columns, which works the same way as select() and allows you to exclude columns using -:
mtcars %>% mutate_at(vars(-mpg, -cyl), max)
Another option is to pass the column names as strings inside one_of():
mtcars %>%
  mutate_at(vars(-one_of("mpg", "cyl")), max)
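For newer code, the same exclusion can be sketched with across(), assuming dplyr 1.0.0 or later. The variable names `by_name`, `excluded` and `by_string` are illustrative only.

```r
library(dplyr)

# Exclude columns by bare name.
by_name <- mtcars %>%
  mutate(across(-c(mpg, cyl), max))

# Exclude columns whose names are stored as strings.
excluded <- c("mpg", "cyl")
by_string <- mtcars %>%
  mutate(across(!all_of(excluded), max))
```

Both forms replace every column except mpg and cyl with that column's maximum, leaving mpg and cyl untouched.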

How to change column data type of a tibble (with least typing)

Pipes and the tidyverse are sometimes very convenient. I want to convert one column from one type to another.
Like so:
mtcars$qsec <-as.integer(mtcars$qsec)
This requires typing twice what I need. Please do not suggest "with" command since I find it confusing to use.
What would be the tidyverse and magrittr %<>% way of doing the same with the least amount of typing? Also, if qsec is the 6th column, how can I do it just by referring to the column position? Something like (not correct code):
mtcars %<>% mutate(as.integer,qsec)
mtcars %<>% mutate(as.integer,[[6]])
To type the column reference just once, the answer that meets the requirement is
mtcars %<>% mutate_at(6, as.integer)
Edit: note that as of 2021, the mutate_at syntax has been superseded by
mtcars %<>% mutate(across(6, as.integer))
To refer to the column by name, a solution that types the column name one extra time is
mtcars %<>% mutate(qsec = as.integer(qsec))
Note: credit goes to the commenters above.
This solution is probably the shortest:
mtcars$qsec %<>% as.integer
The trick is to perform the cast operation directly on the column, so there is no need for mutate() any more.
Update dplyr 1.0.0
The tidyverse recommends using across(), which replaces mutate_at, as noted by @Aren Cambre.
Here is what it looks like:
mtcars %>% mutate(across(qsec, as.integer))
Important:
Note that as.integer is written without parentheses ()
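A short sketch of what "without parentheses" means in practice, assuming dplyr 1.0.0 or later. The result names are illustrative. Note that in the stock mtcars data set qsec is actually column 7, so the positional form below uses 7 rather than the hypothetical 6 from the question.

```r
library(dplyr)

# Pass the function itself; across() calls it on the selected column.
by_name <- mtcars %>%
  mutate(across(qsec, as.integer))

# Same conversion, selecting the column by position (qsec is column 7 in mtcars).
by_position <- mtcars %>%
  mutate(across(7, as.integer))

# Use a lambda when the call needs more than the bare function,
# here rounding before truncating to integer.
rounded <- mtcars %>%
  mutate(across(qsec, ~ as.integer(round(.x))))
```

Writing `across(qsec, as.integer())` would call as.integer() immediately with no argument instead of handing the function to across().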

dplyr 0.5: arrange() using groupings

I've got a lot of code written in dplyr 0.4.3 that relied on the grouped arrange() function. As of the 0.5 release, arrange() no longer applies grouping.
This decision baffles me, as it makes arrange() inconsistent with the other dplyr verbs, and surely a user could just ungroup() before arrange() if ungrouped behavior is required. I would have hoped for a parameter in arrange() to retain the group_by behavior, but alas!
I therefore have to rewrite my grouped arrange. At this point, my only option seems to be to break up the pipe at the arrange call, loop through the groups and arrange group by group, and then bind the results together again. I'm hoping there might be a more elegant solution?
Below is an MRE. I'd like to run a cumsum on wt per group_by(cyl). Many thanks for ideas/suggestions.
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  arrange(desc(mpg)) %>%
  mutate(WtCum = cumsum(wt))
To order within groups in dplyr 0.5, add the grouping variable before the other ordering variables within arrange.
mtcars %>%
  group_by(cyl) %>%
  arrange(cyl, desc(mpg))
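In more recent dplyr releases, arrange() also accepts a .by_group argument that sorts by the grouping variables first. A minimal sketch of the cumulative-sum example from the question, assuming a dplyr version that has this argument (the name `cum_wt` is illustrative):

```r
library(dplyr)

cum_wt <- mtcars %>%
  group_by(cyl) %>%
  # Sort by cyl first, then by descending mpg within each cyl group.
  arrange(desc(mpg), .by_group = TRUE) %>%
  # cumsum() runs per group because the data is still grouped by cyl.
  mutate(WtCum = cumsum(wt)) %>%
  ungroup()
```

The mutate() step is grouped either way, so the running total restarts for each cyl value. The arrange() step only controls the row order that cumsum() sees.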
If you want to keep around an “old arrange”, you may use this snippet:
arrange_old <- function(.data, ...) {
  dplyr::arrange_(.data, .dots = c(groups(.data), lazyeval::lazy_dots(...)))
}
This will respect grouping by basically prepending the group variables to the new arrange call.
Then you can do:
mtcars %>%
  group_by(cyl) %>%
  arrange_old(desc(mpg))
For what it's worth, I've also found this change confusing and unintuitive, and I keep making the mistake of forgetting to explicitly specify the grouping.

r dplyr group_by - by variable content

I use the dplyr group_by function to group my data frame,
and I need to group the data by a column whose name I don't know yet. I decide it while the code runs, so the name can't be hard-coded.
For example,
I can't use
data %>% group_by(col_name)
I need to do something like
data %>% c <- col_name
data %>% group_by(c)
When I try doing so, it throws an error:
Error: unknown variable to group by : c
All the examples I find are for the trivial case where you can hard-code the name of the column (see the group by example).
The same goes for the R help.
Thanks.
As others have said in their comments, you will want to look up non-standard evaluation (NSE). Here that also means using the lazyeval package and the group_by_ function, which lets you use standard evaluation. So it will look like:
data %>% group_by_(lazyeval::interp(~var, var = as.name(c)))
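group_by_() has since been deprecated. A minimal sketch of the same idea with current tools, assuming dplyr 1.0.0 or later and using mtcars as stand-in data; the names `group_col`, `grouped` and `means` are illustrative only:

```r
library(dplyr)

# The grouping column is only known as a string at run time.
group_col <- "cyl"

# all_of() turns the string into a column selection for across().
grouped <- mtcars %>%
  group_by(across(all_of(group_col)))

# The grouped data can be summarised as usual.
means <- grouped %>%
  summarise(mean_mpg = mean(mpg))
```

Passing a vector of several names to all_of() groups by all of them, which makes this form convenient when the grouping columns come from user input or a config value.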
