I want to convert the columns whose names start with the word "feature" to character type using dplyr. I tried the code below and a few other variations based on answers from Stack Overflow. Any help would be appreciated. Thanks!
train %>% mutate_if(vars(starts_with("feature")), funs(as.character(.)))
I am trying to improve my usage of dplyr commands.
You need mutate_at instead:
library(dplyr)
train %>% mutate_at(vars(starts_with("feature")), as.character)
As @Gregor mentioned, mutate_if is for cases where columns are selected based on the actual data in the column, not on their names.
For example,
iris %>% mutate_if(is.numeric, sqrt)
Here the square root is calculated only for columns whose data is numeric.
If we want to combine multiple vars() statements into one, we can use matches with a regular expression:
merchants %>% mutate_at(vars(matches("_id|category_")), as.character)
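In dplyr 1.0.0 and later, the scoped verbs (mutate_at, mutate_if) are superseded by across(). A minimal sketch of the same two ideas, assuming a small made-up train data frame (its column names are hypothetical; only the "feature" prefix comes from the question):

```r
library(dplyr)

# Hypothetical stand-in for `train`; only the "feature" prefix matters here.
train <- data.frame(
  feature_1 = 1:3,
  feature_2 = c(0.5, 1.5, 2.5),
  target    = c(10, 20, 30)
)

# Select by name with starts_with() (the mutate_at case).
by_name <- train %>% mutate(across(starts_with("feature"), as.character))

# Select by content with where() (the mutate_if case).
by_content <- train %>% mutate(across(where(is.numeric), sqrt))
```

The selection helper decides which columns are touched; all other columns are returned unchanged.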
In my project, I want to extract all the non-numeric columns from my R data frame. Following this question, I used the same method and simply negated the is.numeric() function, but it does not work.
This gives all the numeric data,
x<-iris %>% dplyr::select(where(is.numeric))
But this does not work as expected,
x<-iris %>% dplyr::select(where(!is.numeric))
Note: Finally the output data frame should only contain the species column in the iris dataset
The purrr package from the tidyverse does exactly what you want with purrr::keep and purrr::discard:
library(purrr)
x <- iris %>% keep(is.numeric)
With this piece of code, you pass a logical test to keep, and only the columns that pass the test are retained.
To reverse that operation and get what you want, you can use discard, also from purrr:
x <- iris %>% discard(is.numeric)
You can think of discard as keep with !is.numeric applied to each column.
Alternatively, with dplyr:
x <- iris %>% select_if(~!is.numeric(.))
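The attempt in the question fails because `!` is applied to the function is.numeric itself rather than to its result. With dplyr 1.0.0 or later, a sketch of two ways to express the negation inside select() while keeping where():

```r
library(dplyr)

# Negate the whole selection ...
x1 <- iris %>% select(!where(is.numeric))

# ... or negate the predicate's result inside a lambda.
x2 <- iris %>% select(where(~ !is.numeric(.x)))
```

Both forms should leave only the Species column of iris.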
Is there a way to export both the column name and label from R so that they appear as the 1st and 2nd rows of a spreadsheet? I'm able to do the reverse (import), where I read in each row and then use names() and label() to assign the name/label. But I'm stuck on how to do the export without manually adding the labels as a row of data in R first.
Column name/label in Viewer
Here is a simple solution:
library(dplyr) #for pipes and mutate_if
library(purrr) #for map_chr
library(expss) #for apply_labels and labels management
iris2=iris %>%
mutate_if(is.factor, as.character) %>%
apply_labels(Sepal.Length="length", Sepal.Width="width", Petal.Length="length2", Petal.Width="width2", Species="spec")
library(Hmisc)
rtn=rbind(names(iris2), label(iris2), iris2)
rtn %>% head
You have to use mutate_if to change all factors to character vectors, as I did in my dummy dataset; otherwise you would get NA instead of the names and labels in the factor columns.
Still, please note that this leads to untidy data as the first non-heading row is not an observation. It may be OK for outputting though.
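The same idea can be sketched in base R without Hmisc or expss, assuming the labels live in each column's "label" attribute (which is where Hmisc::label() reads them from). The data frame, its labels, and the output path below are all hypothetical:

```r
# Hypothetical two-column data frame with a "label" attribute on each column.
df <- data.frame(a = c(1.5, 2), b = c("x", "y"), stringsAsFactors = FALSE)
attr(df$a, "label") <- "First"
attr(df$b, "label") <- "Second"

# Collect one label per column, using "" when a column has no label.
labels <- vapply(df, function(col) {
  lab <- attr(col, "label")
  if (is.null(lab)) "" else as.character(lab)
}, character(1))

# Convert every column to character so names, labels and data fit in one matrix.
body <- as.matrix(as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE))

# Row 1: column names, row 2: labels, remaining rows: the data.
out <- rbind(names(df), labels, body)

# Write without R's own header or row names, since the header is now row 1.
path <- tempfile(fileext = ".csv")
write.table(out, path, sep = ",", row.names = FALSE, col.names = FALSE)
```

Building a character matrix avoids the factor problem entirely, at the cost of losing column types in the exported object.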
I have a tibble and want to only do some mutation on selected columns. In the case below, all columns which have the word 'date' will be transformed by a function (as.Date()).
After I have performed some transformations on the selected columns, I want to get back all the columns from my tibble df.
Is there a way to do so?
df %>% select(contains('date')) %>% mutate_all(as.Date) %>% select(all)
Thanks
We can use mutate_at instead of select followed by mutate_all. This modifies only the columns of interest while keeping the other columns as they are:
library(dplyr)
df %>%
mutate_at(vars(contains('date')), as.Date)
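With dplyr 1.0.0 or later, the same thing can be written with across(). A minimal sketch, assuming a made-up tibble whose date columns are stored as ISO-formatted strings (the column names are hypothetical):

```r
library(dplyr)

# Hypothetical tibble: two date-like character columns and one other column.
df <- tibble(
  start_date = c("2020-01-01", "2020-02-01"),
  end_date   = c("2020-01-31", "2020-02-29"),
  id         = 1:2
)

# Only columns whose name contains "date" are converted; `id` is kept as is.
out <- df %>% mutate(across(contains("date"), as.Date))
```

Because mutate() never drops columns, there is no need to select the remaining columns back afterwards.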
Often I'll want to select a subset of variables where the subset is the result of a function. In this simple case, I first get all the variable names which pertain to width characteristics
library(dplyr)
library(magrittr)
data(iris)
width.vars <- iris %>%
names %>%
extract(grep(".Width", .))
Which returns:
>width.vars
[1] "Sepal.Width" "Petal.Width"
It would be useful to be able to use this result to select columns. (I'm aware that contains() and its siblings exist, but there are plenty of more complicated subsets I would like to perform; this example is deliberately trivial.)
If I was to attempt to use this function as a way to select columns, the following happens:
iris %>%
select(Species,
width.vars)
Error: All select() inputs must resolve to integer column positions.
The following do not:
* width.vars
How can I use dplyr::select with a vector of variable names stored as strings?
Within dplyr, most verbs historically had an alternate version ending in '_' that accepts strings as input; in this case, select_. These were the usual choice when using dplyr programmatically, but note that the underscore variants have been deprecated since dplyr 0.7.0.
iris %>% select_(.dots=c("Species",width.vars))
First of all, you can do the selection in dplyr with
iris %>% select(Species, contains(".Width"))
No need to create the vector of names separately. But if you did have a list of columns as string names, you could do
width.vars <- c("Sepal.Width", "Petal.Width")
iris %>% select(Species, one_of(width.vars))
See the ?select help page for all the available options.
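In current dplyr (1.0.0 or later), one_of() is superseded by all_of() and any_of(). A short sketch of selecting with a character vector, reusing the width.vars name from the question:

```r
library(dplyr)

# Build the vector of names programmatically, as in the question.
width.vars <- grep("\\.Width$", names(iris), value = TRUE)

# all_of() raises an error if any name is missing from the data ...
strict <- iris %>% select(Species, all_of(width.vars))

# ... while any_of() silently skips names that do not exist.
lenient <- iris %>% select(Species, any_of(c(width.vars, "not_a_column")))
```

all_of() is the safer default when a missing column should be treated as a bug.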
I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
vector %>%
group_by(patient) %>%
summarise(average=mean(prob))
This code works perfectly. However, I need to get the same values without using the word "prob" on the "summarise" line. I tried the following code, but it gives me a data.frame in which the column "average" contains 5 identical values (the overall mean rather than the per-patient means), which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
P.S.: to explain why I need this, I have another data frame with multiple columns with complex names that need to be summarised, which is why I can't list them one by one in the summarise call. What I want is to pass a vector of column names and compute the mean of each of those columns, grouped by patient.
It appears you want summarise_each
vector %>%
group_by(patient) %>%
summarise_each(funs(mean), matches('prob'))
Using data.table you could do
library(data.table)
setDT(vector)[, lapply(.SD, mean), by = patient, .SDcols = 'prob']
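For the dplyr route, summarise_each() and funs() are deprecated in current releases. A sketch of the same grouped mean with across() (dplyr 1.0.0 or later), where the column names are held in a character vector; the variable cols is a hypothetical name introduced here:

```r
library(dplyr)

# Recreate the example data from the question.
set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

# Hypothetical vector of column names; it could hold many names.
cols <- c("prob")

# One mean per patient for every column listed in `cols`.
res <- vector %>%
  group_by(patient) %>%
  summarise(across(all_of(cols), mean))
```

The column name never appears inside summarise(), so the same call works for any number of columns.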