I am trying to do something very simple, yet I can't figure out the right way to specify it. I simply want to exclude some named columns from mutate_at. It works fine if I specify positions, but I don't want to hard-code positions.
For example, I want the same output as this:
mtcars %>% mutate_at(-c(1, 2), max)
But by specifying the mpg and cyl column names instead of their positions.
I tried many things, including:
mtcars %>% mutate_at(-c('mpg', 'cyl'), max)
Is there a way to work with names and exclusion in mutate_at?
You can use vars to specify the columns, which works the same way as select() and allows you to exclude columns using -:
mtcars %>% mutate_at(vars(-mpg, -cyl), max)
One option is to pass the column names as strings inside one_of():
mtcars %>%
  mutate_at(vars(-one_of("mpg", "cyl")), max)
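For readers on a newer dplyr, the same exclusion by name can be sketched with across(). This assumes dplyr 1.0.0 or later; the variable name excluded is made up for illustration.

```r
library(dplyr)

# Hypothetical character vector holding the columns to leave untouched
excluded <- c("mpg", "cyl")

# Replace every column except the excluded ones with its column maximum
by_name <- mtcars %>%
  mutate(across(-all_of(excluded), max))

head(by_name)
```

all_of() raises an error if one of the names is missing from the data, which is usually preferable to silently skipping it.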
Related
A new column must be added to an existing dataframe so that it is the mean of some other columns, which are selected dynamically.
I prefer using dplyr, and thus the solution might look like something as follows:
selected_columns <- c("am", "mpg")
dplyr::mutate_at(mtcars, vars(selected_columns), funs(new_col = rowMeans(.)))
Is there a way to modify this chunk or is another approach required?
Here, we just need to subset the columns of the data (.) with the string vector and take the rowMeans:
library(dplyr)
mtcars %>%
  mutate(new_col = rowMeans(.[selected_columns]))
mutate() has no funs argument. funs() (now deprecated in favor of list()) belongs to the scoped variants mutate_if/mutate_at/mutate_all.
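A variant that avoids the magrittr dot can be sketched with pick(). This assumes dplyr 1.1.0 or later, where pick() is available; on older versions the .[selected_columns] form is the safer choice.

```r
library(dplyr)

selected_columns <- c("am", "mpg")

# pick() returns the selected columns as a tibble, which rowMeans() accepts
result <- mtcars %>%
  mutate(new_col = rowMeans(pick(all_of(selected_columns))))

head(result[c(selected_columns, "new_col")])
```

Because pick() is evaluated inside mutate(), this form also works with the native |> pipe, which has no . placeholder.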
I am using dplyr in R (with great joy) and want to get the differential of the columns mpg to gear in mtcars, that is, each value minus the value in the previous row. The first row then returns NA (for obvious reasons). Instead of this first row being NA, I would like it to keep its original value.
I am looking for a clean and efficient way to achieve this (not using a join to add the first row back to the differenced values, since the code on my own dataset contains many filters and grouped variables).
My code is as follows:
mtcars %>% mutate_at(vars(mpg:gear), funs(. - lag(., 1)))
I expect the first row to be mtcars[1, ] and the rest to be the differential.
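We can set the default argument of lag() to 0; otherwise the first value would be NA. With default = 0, the first row becomes its original value minus 0, so it stays unchanged: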
library(dplyr)
mtcars %>%
  mutate_at(vars(mpg:gear), list(~ . - lag(., default = 0)))
Another option is to use diff() and prepend the first element:
mtcars %>%
  mutate_at(vars(mpg:gear), list(~ c(first(.), diff(.))))
NOTE: funs() is deprecated. Instead, we use list() with formula (~) lambdas.
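The same idea can be sketched with across(), which supersedes the scoped mutate_at() verb. This assumes dplyr 1.0.0 or later; the result name diffed is only illustrative.

```r
library(dplyr)

# Difference from the previous row; default = 0 keeps the first row as-is
diffed <- mtcars %>%
  mutate(across(mpg:gear, ~ .x - lag(.x, default = 0)))

# First row equals the original, later rows hold the differences
diffed$mpg[1] == mtcars$mpg[1]
all.equal(diffed$mpg[-1], diff(mtcars$mpg))
```

Columns outside mpg:gear, here only carb, are left untouched.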
I am trying to rewrite this expression to magrittr’s pipe operator:
print(mean(pull(df, height), na.rm=TRUE))
which returns 175.4 for my dataset.
I know that I have to start with the data frame and write it as df %>%, but I'm confused about how to write the rest inside out. For example, should the na.rm=TRUE go inside mean(), pull() or print()?
UPDATE: I actually figured it out by trial and error...
df %>%
  pull(height) %>%
  mean(na.rm = TRUE) %>%
  print()
returns 175.4
It would be good practice to make a reproducible example, with dummy data like this:
height <- 1:30
weight <- 1:30
df <- data.frame(height, weight)
The pipe operator works with most of the tidyverse, not just magrittr; pull() itself comes from dplyr. The na.rm = TRUE argument belongs to the summary function, so it goes inside mean(). Functions such as mean, sd, min, and max return NA when the input contains NA values unless na.rm = TRUE is set.
df %>% pull(height) %>% mean(na.rm=T) %>% print()
Unless your data is nested you may not even need to use pull
df %>% summarise(mean = mean(height,na.rm=T))
Also, using summarise you can pipe these into another dataframe rather than just printing, and call them out of the dataframe whenever you want.
df %>% summarise(meanHt = mean(height,na.rm=T), sdHt = sd(height,na.rm=T)) -> summary
summary[1]
summary[2]
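To see why na.rm belongs in mean(), here is a small sketch on made-up data with one missing height. The data frame and result names are illustrative only.

```r
library(dplyr)

# Made-up data with one missing height
df <- data.frame(height = c(170, NA, 180), weight = c(65, 70, 80))

# Nested and piped forms are equivalent: pull() extracts the vector,
# and mean() is the function that accepts na.rm
nested <- mean(pull(df, height), na.rm = TRUE)
piped <- df %>% pull(height) %>% mean(na.rm = TRUE)

# Without na.rm = TRUE the missing value propagates
without_na_rm <- df %>% pull(height) %>% mean()

print(c(nested = nested, piped = piped, without_na_rm = without_na_rm))
```

Each step of the pipe receives the previous result as its first argument, so the remaining arguments stay with the function they belong to.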
I'm trying, as per "dplyr mutate using variable columns" and "dplyr - mutate: use dynamic variable names", to use dynamic names in mutate. What I am trying to do is normalize column data by groups, subject to a minimum standard deviation. Each column has a different minimum standard deviation.
e.g. (I omitted loops & map statements for convenience)
require(dplyr)
require(magrittr)
data(iris)
iris <- as_tibble(iris)  # tbl_df() is deprecated
minsd <- c('Sepal.Length' = 0.8)
varname <- 'Sepal.Length'
iris %>% group_by(Species) %>% mutate(!!varname := mean(pluck(iris,varname),na.rm=T)/max(sd(pluck(iris,varname)),minsd[varname]))
I got the dynamic assignment and variable selection to work as suggested by the reference answers. But group_by() is not respected, which, for me at least, is the main benefit of using dplyr here.
The desired answer is given by
iris %>% group_by(Species) %>% mutate(!!varname := mean(Sepal.Length,na.rm=T)/max(sd(Sepal.Length),minsd[varname]))
Is there a way around this?
I actually did not know much about pluck, so I don't know what went wrong, but I would go for this and this works:
iris %>%
  group_by(Species) %>%
  mutate(
    !!varname :=
      mean(!!as.name(varname), na.rm = TRUE) /
      max(sd(!!as.name(varname)),
          minsd[varname])
  )
Let me know if this isn't what you were looking for.
The other answer is obviously the best, and it also solved a similar problem that I have encountered. For example, with !!as.name(), there is no need to use group_by_() (or group_by_at()) or arrange_() (or arrange_at()).
However, another way is to replace pluck(iris, varname) in your code with .data[[varname]]. I suppose pluck(iris, varname) does not work because the iris inside it refers to the original, ungrouped data. In contrast, .data refers to the tibble that mutate() is operating on, and so it is grouped.
An alternative to as.name() is rlang::sym() from the rlang package.
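As a rough check, both the .data[[varname]] and rlang::sym() variants can be compared against the hard-coded column name. This is only a sketch; the result names (with_data, with_sym, hard_coded) are made up for illustration.

```r
library(dplyr)

minsd <- c(Sepal.Length = 0.8)
varname <- "Sepal.Length"

# Variant 1: the .data pronoun respects the grouping
with_data <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(.data[[varname]], na.rm = TRUE) /
           max(sd(.data[[varname]]), minsd[varname])) %>%
  ungroup()

# Variant 2: turn the string into a symbol and unquote it
with_sym <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(!!rlang::sym(varname), na.rm = TRUE) /
           max(sd(!!rlang::sym(varname)), minsd[varname])) %>%
  ungroup()

# Reference: the column name written out by hand
hard_coded <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = mean(Sepal.Length, na.rm = TRUE) /
           max(sd(Sepal.Length), minsd[varname])) %>%
  ungroup()

all.equal(with_data, hard_coded)
all.equal(with_sym, hard_coded)
```

Both variants yield one value per species rather than a single value for the whole table, which is the grouped behaviour the question asks for.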
I have a question about dplyr.
Let's say I want to update certain values in a dataframe. Can I do this?
mtcars %>% filter(mpg>20) %>% select(hp)=1000
(The example is nonsensical where all cars with MPGs greater than 20 have HP set to 1000)
I get an error so I am guessing the answer is no I can't use %>% and the dplyr verbs to the left of an assignment, but the dplyr syntax is a lot cleaner than:
mtcars[mtcars$mpg>20,"hp"]=1000
Especially when you are dealing with more complex cases, so I wanted to ask if there is any way to use the dplyr syntax in this case?
edit: It looks like mutate is the verb I want, so now my question is, can I dynamically change the name of the var in the mutate statement like so:
for (i in c("hp","wt")) {mtcars<-mtcars %>% filter(mpg>20) %>% mutate(i=1000) }
This example just creates a column named "i" with value 1000, which isn't what I want.
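One possible way to do this, borrowing the !!name := pattern and the .data pronoun, is sketched below. It uses if_else() rather than filter() so that the rows with mpg <= 20 are kept instead of dropped. The names col and updated are illustrative only.

```r
library(dplyr)

updated <- mtcars

for (col in c("hp", "wt")) {
  updated <- updated %>%
    # !!col := ... names the target column from the string in col;
    # .data[[col]] reads the current values of that same column
    mutate(!!col := if_else(mpg > 20, 1000, .data[[col]]))
}

head(updated[c("mpg", "hp", "wt")])
```

if_else() requires both branches to have the same type, which holds here because hp and wt are stored as doubles in mtcars.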