Using dplyr's select where variable names are quoted [duplicate] - r

This question already has answers here:
Pass a vector of variable names to arrange() in dplyr
(6 answers)
Closed 7 years ago.
Often I'll want to select a subset of variables where the subset is the result of a function. In this simple case, I first get all the variable names that pertain to width characteristics:
library(dplyr)
library(magrittr)
data(iris)
width.vars <- iris %>%
names %>%
extract(grep(".Width", .))
Which returns:
>width.vars
[1] "Sepal.Width" "Petal.Width"
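For comparison, here is a base R sketch that builds the same vector without magrittr's extract(). This is an alternative, not the asker's code. With value = TRUE, grep() returns the matching names directly. With fixed = TRUE, the pattern is treated as a literal string, so the leading "." is not a regex wildcard.

```r
# Base R: return the matching names themselves rather than their positions
width.vars <- grep(".Width", names(iris), value = TRUE, fixed = TRUE)
print(width.vars)
```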
It would be useful to be able to use this result to select columns. (I'm aware that contains() and its siblings exist, but there are plenty of more complicated subsets I would like to perform; this example is deliberately trivial.)
If I try to use this vector to select columns, the following happens:
iris %>%
select(Species,
width.vars)
Error: All select() inputs must resolve to integer column positions.
The following do not:
* width.vars
How can I use dplyr::select with a vector of variable names stored as strings?

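Within dplyr, most verbs have an alternate version ending in '_' that accepts strings as input; in this case, select_(). These are typically what you have to use when you are using dplyr programmatically. (Note that these underscore verbs have since been deprecated, starting with dplyr 0.7.0, in favour of tidy evaluation.)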
iris %>% select_(.dots=c("Species",width.vars))
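In current dplyr, you can write the same selection with plain select() by wrapping the character vector in all_of(). This marks the vector as holding column names. The sketch assumes dplyr 1.0.0 or later, which depends on a tidyselect version that provides all_of().

```r
library(dplyr)

width.vars <- c("Sepal.Width", "Petal.Width")

# all_of() tells select() that width.vars is a character vector of column names
res <- iris %>% select(Species, all_of(width.vars))
print(names(res))
```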

First of all, you can do the selection in dplyr with
iris %>% select(Species, contains(".Width"))
No need to create the vector of names separately. But if you did have a list of columns as string names, you could do
width.vars <- c("Sepal.Width", "Petal.Width")
iris %>% select(Species, one_of(width.vars))
See the ?select help page for all the available options.
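Newer versions of tidyselect supersede one_of() with all_of() and any_of(). The sketch below shows the difference, assuming a recent dplyr:

- all_of() is strict and errors if a requested name is missing.
- any_of() silently skips missing names, which is useful when some candidate names may not exist.

The column name "Missing.Width" is made up for illustration.

```r
library(dplyr)

wanted <- c("Sepal.Width", "Petal.Width", "Missing.Width")  # last name is hypothetical

# any_of() keeps the columns that exist and ignores "Missing.Width"
res_any <- iris %>% select(Species, any_of(wanted))
print(names(res_any))

# all_of() raises an error when a requested column is absent; we catch it here
res_all <- tryCatch(iris %>% select(all_of(wanted)), error = function(e) NULL)
print(is.null(res_all))
```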

Related

Extract all columns except numeric in R data frame [duplicate]

This question already has answers here:
How to select non-numeric columns using dplyr::select_if
(3 answers)
Closed 1 year ago.
In my project, I want to extract all the non-numeric columns from my R data frame. Following this question, I used the same method and just put a not operator (!) in front of the is.numeric() R function, but it is not working.
This returns all the numeric columns:
x<-iris %>% dplyr::select(where(is.numeric))
But this does not work as expected:
x<-iris %>% dplyr::select(where(!is.numeric))
Note: in the end, the output data frame should contain only the Species column of the iris dataset.
The purrr package from the tidyverse does exactly what you want with purrr::keep() and purrr::discard():
library(purrr)
x <- iris %>% keep(is.numeric)
With this code, you pass a logical test to keep(), and only the columns that pass the test are kept.
To reverse that operation and get what you want, you can also use discard() from purrr:
x <- iris %>% discard(is.numeric)
You can think of discard() as keep() with !is.numeric.
Alternatively, with dplyr:
x <- iris %>% select_if(~!is.numeric(.))
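The original where(!is.numeric) attempt fails because ! is applied to the function is.numeric itself rather than to its result. The sketch below fixes this in two ways: negating inside a lambda, or wrapping the predicate with base R's Negate(). It assumes dplyr 1.0.0 or later, where where() is available.

```r
library(dplyr)

# Negate the predicate's result inside a purrr-style lambda
x1 <- iris %>% select(where(~ !is.numeric(.x)))

# Equivalent: Negate() builds a new function returning the opposite logical
x2 <- iris %>% select(where(Negate(is.numeric)))

print(names(x1))
```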

Add a column based on the dynamically named columns

A new column must be added to the existing data frame so that it is the mean of some other columns, which are selected dynamically.
I prefer using dplyr, so the solution might look something like this:
selected_columns <- c("am", "mpg")
dplyr::mutate_at(mtcars, vars(selected_columns), funs(new_col = rowMeans(.)))
Is there a way to modify this chunk or is another approach required?
Here, we just need to subset the columns of the data (.) with the string vector and take rowMeans():
library(dplyr)
mtcars %>%
mutate(new_col = rowMeans(.[selected_columns]))
mutate() does not have a funs parameter; funs() is used with mutate_if()/mutate_at()/mutate_all(), and it is already deprecated in favour of list().
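Here is a sketch that stays inside mutate() without the `.` placeholder, assuming dplyr 1.1.0 or later. pick() returns the chosen columns as a data frame, which rowMeans() accepts, and all_of() makes the selection from a character vector explicit.

```r
library(dplyr)

selected_columns <- c("am", "mpg")

# pick() (dplyr >= 1.1.0) hands the selected columns to rowMeans() as a data frame
res <- mtcars %>%
  mutate(new_col = rowMeans(pick(all_of(selected_columns))))

print(head(res$new_col, 3))
```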

dplyr - convert column names containing words to character

I want to convert the columns whose names start with the word "feature" to character type using dplyr. I tried the following and a few other variations based on Stack Overflow answers. Any help would be appreciated. Thanks!
train %>% mutate_if(vars(starts_with("feature")), funs(as.character(.)))
I am trying to improve my usage of dplyr commands.
You need mutate_at() instead:
library(dplyr)
train %>% mutate_at(vars(starts_with("feature")), as.character)
As @Gregor mentioned, mutate_if() is for when the column selection is based on the actual data in the column, not on the column names.
For example,
iris %>% mutate_if(is.numeric, sqrt)
So sqrt() is applied only to the columns whose data are numeric.
To combine several name patterns into one vars() statement, we can use matches() with a regular expression:
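In dplyr 1.0.0 and later, across() supersedes mutate_if() and mutate_at(). The sketch below writes the same square-root transformation with across() and where(), assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Apply sqrt() only to columns whose data are numeric; Species is left unchanged
res <- iris %>% mutate(across(where(is.numeric), sqrt))
print(head(res, 3))
```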
merchants %>% mutate_at(vars(matches("_id|category_")), as.character)
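The name-based selections translate to across() in the same way. The train and merchants data are not shown, so this sketch uses a small hypothetical data frame, train_demo, with made-up column names. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for the asker's `train` data
train_demo <- data.frame(
  feature_a = 1:3,
  feature_b = c(0.5, 1.5, 2.5),
  target    = c(10, 20, 30)
)

# Convert every column whose name starts with "feature" to character
res <- train_demo %>% mutate(across(starts_with("feature"), as.character))

# matches() takes a regular expression, so several patterns combine with |
res2 <- train_demo %>% mutate(across(matches("^feature_|^target$"), as.character))

print(sapply(res, class))
```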

Filtering tables with a character variable as a column name in dplyr R [duplicate]

This question already has answers here:
R dplyr: Non-Standard Evaluation difficulty. Would like to use dynamic variable names in filter and mutate
(2 answers)
Closed 4 years ago.
I am trying to filter the mtcars table in R, referencing a column name with a character variable. So, I write:
var <- "cyl"
mtcars %>%
filter(!!var > 6)
But for some reason the table isn't being filtered. I think this code is equivalent to:
mtcars %>%
filter("cyl" > 6)
What I really need is to convert that string to a name. Does anybody know how to handle this problem?
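The original filter keeps every row because !!var injects the string "cyl", so the condition becomes the string comparison "cyl" > 6, which is TRUE. Converting the string to a symbol with sym() makes !! inject the column cyl instead. This works for me: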
library(dplyr)
var <- sym("cyl")
mtcars %>%
filter(!!var > 6)
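An alternative sketch uses the .data pronoun (re-exported by dplyr from rlang). It looks up a column by a string, so no symbol has to be created first:

```r
library(dplyr)

var <- "cyl"

# .data[[var]] refers to the column whose name is stored in var
res <- mtcars %>% filter(.data[[var]] > 6)
print(unique(res$cyl))
```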

dplyr use select() helpers inside mutate() [duplicate]

This question already has answers here:
dplyr mutate rowSums calculations or custom functions
(7 answers)
Closed 4 years ago.
I'd like to make a new variable that represents the sum (or another function) of many other variables, all of which start with "prefix_". Is there a way to do this neatly using the select() helpers (e.g. starts_with())?
I don't think mutate_at() works for this since I'm just trying to create a single new variable based on many existing variables.
My attempt:
df %<>%
mutate(newvar = sum(vars(starts_with("prefix_"))))
This of course doesn't work. Many thanks!
A reproducible example:
mtcars %<>%
rename("prefix_mpg" = mpg) %>%
rename("prefix_cyl" = cyl) %>%
mutate(newvar = sum(var(starts_with("prefix_"))))
The intended output is mtcars$newvar, the row-wise sum of prefix_mpg and prefix_cyl. Of course I could just name mpg and cyl explicitly, but in my actual case it's a long list of variables, too long to name conveniently.
We can use starts_with() inside a select() call and pass the result to rowSums(). Here . refers to the object produced by the previous step of the pipe.
library(dplyr)
mtcars %>%
rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
mutate(newvar = rowSums(select(., starts_with("prefix_"))))
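Here is a sketch that avoids the `.` placeholder, assuming dplyr 1.1.0 or later. pick() accepts the same selection helpers and returns the selected columns as a data frame.

```r
library(dplyr)

res <- mtcars %>%
  rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
  # pick() hands the selected columns to rowSums() as a data frame
  mutate(newvar = rowSums(pick(starts_with("prefix_"))))

print(head(res$newvar, 3))
```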
