R dplyr: select all columns after transforming selected columns

I have a tibble and want to only do some mutation on selected columns. In the case below, all columns which have the word 'date' will be transformed by a function (as.Date()).
After I have performed some transformations on the selected columns, I want to get back all the columns from my tibble df.
Is there a way to do so?
df %>% select(contains('date')) %>% mutate_all(as.Date) %>% select(all)
Thanks

We can use mutate_at instead of select followed by mutate_all. It modifies only the columns of interest and keeps the other columns as they are.
library(dplyr)
df %>%
mutate_at(vars(contains('date')), as.Date)
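In dplyr 1.0.0 and later, the scoped verbs such as mutate_at are superseded by across(). A minimal sketch of the same idea follows; the example tibble and its column names (id, start_date, end_date) are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical example data: two columns whose names contain "date", one that does not
df <- tibble(
  id         = 1:2,
  start_date = c("2020-01-01", "2020-02-01"),
  end_date   = c("2020-01-31", "2020-02-29")
)

# across() applies as.Date only to the matching columns;
# every other column (here `id`) is returned unchanged
result <- df %>%
  mutate(across(contains("date"), as.Date))
```

Because mutate() never drops columns, there is no need for a final select() to get the rest of the tibble back.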

Related

Changing a Column to an Observation in a Row in R

I am currently struggling to transition one of my columns in my data to a row as an observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As seen above, the abstention column exists at the end of my data frame, and I would like my data to look like the following:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstentions are treated like a candidate, in the can column. Thus, the rest of the data is maintained, and the abstention values are their own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to use the arguments to get what I want. I have also considered t() to transpose the column into a row, but I am having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy that will work if you have multiple unit_names
test_df %>%
group_split(unit_name) %>%
map( function(group_data) {
slice(group_data, 1) %>%
mutate(can="abstention", cv1=abstention) %>%
add_row(group_data, .) %>%
select(-abstention)
}) %>%
bind_rows()
Basically we split the data up by unit_name, then we grab the first row for each group and move the values around. Append that as a new row to each group, and then re-combine all the groups.
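The same split, modify, and recombine idea can also be sketched without map(), using group_by() and bind_rows(). This is an alternative under the assumption that the unit-level columns are constant within each unit_name, as in the example data; the names abstention_rows and result are introduced here for illustration.

```r
library(dplyr)

test_df <- tibble(
  unit_name = "Chungcheongbuk-do", unit_n = 2,
  can = c("Cho Bong-am", "Lee Seung-man", "Lee Si-yeong", "Shin Heung-woo"),
  pev1 = 510014, vot1 = 457815, vv1 = 445955, ivv1 = 11860,
  cv1 = c(25875, 386665, 23006, 10409),
  abstention = 52199
)

# Build one "abstention" row per unit from the first row of each group
abstention_rows <- test_df %>%
  group_by(unit_name) %>%
  slice(1) %>%
  ungroup() %>%
  mutate(can = "abstention", cv1 = abstention)

# Append the new rows, keep each unit's rows together, and drop the old column
result <- bind_rows(test_df, abstention_rows) %>%
  arrange(unit_name) %>%
  select(-abstention)
```

With a single unit, the abstention row ends up last, matching desired_df.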

In R, how to sum all other columns based on value of one column, without specifying column names?

I have 401 columns. The 401st column represents the PIN CODE. I want to add all the other 400 columns on the basis of pin code. Can I do this without having to specify all the 400 column names?
We can do a group_by sum. With dplyr, pass the 'PINCODE' as the grouping variable and apply sum with summarise_all which gets the sum of all other columns remaining in the data
library(dplyr)
df1 %>%
group_by(PINCODE) %>%
summarise_all(sum)
Or with aggregate from base R
aggregate(.~ PINCODE, df1, sum)
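In current dplyr, summarise_all is superseded by across(). A small sketch on made-up data follows; the columns a and b stand in for the 400 value columns and are assumptions, not part of the question.

```r
library(dplyr)

# Hypothetical stand-in for the 401-column data: two value columns plus PINCODE
df1 <- tibble(
  a       = c(1, 2, 3, 4),
  b       = c(10, 20, 30, 40),
  PINCODE = c(100, 100, 200, 200)
)

# everything() inside across() covers all non-grouping columns,
# so no column names need to be listed
result <- df1 %>%
  group_by(PINCODE) %>%
  summarise(across(everything(), sum))
```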

dplyr - convert column names containing words to character

I want to convert column names that start with the word "feature" to character type using dplyr. I tried the below and a few other variations using answers from stackoverflow. Any help would be appreciated. Thanks!
train %>% mutate_if(vars(starts_with("feature")), funs(as.character(.)))
I am trying to improve my usage of dplyr commands.
You need mutate_at instead
library(dplyr)
train %>% mutate_at(vars(starts_with("feature")), as.character)
As @Gregor mentioned, mutate_if is for when columns are selected based on the actual data in the column rather than on their names.
For example,
iris %>% mutate_if(is.numeric, sqrt)
Here the square root is calculated only for columns whose data is numeric.
If we want to combine multiple vars statements into one, we can use matches
merchants %>% mutate_at(vars(matches("_id|category_")), as.character)
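The name-based versus data-based distinction carries over to across(), which supersedes both mutate_at and mutate_if in dplyr 1.0.0 and later. The sketch below uses a small made-up train tibble; its columns are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical data: two "feature" columns and one other numeric column
train <- tibble(
  feature_1 = 1:3,
  feature_2 = c(0.5, 1.5, 2.5),
  target    = c(10, 20, 30)
)

# Selection by name (the mutate_at case): only columns starting with "feature"
by_name <- train %>%
  mutate(across(starts_with("feature"), as.character))

# Selection by content (the mutate_if case): every numeric column
by_type <- train %>%
  mutate(across(where(is.numeric), sqrt))
```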

Extracting columns from a data frame with at least three values above a cutoff

I am new to R programming and need help filtering my data. For example, my data set is mtcars. I want to extract the columns that have at least three values above 18. How do I do that? Thanks.
I have used the sort function, but that only works on one column at a time, not on a whole data frame.
You can get the names of the columns with the following code:
library(dplyr)
library(tidyr)
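# Reshape to long form, keep values above 18, and count them per column;
# n >= 3 keeps columns with at least three such values
columns = mtcars %>% gather() %>% filter(value > 18) %>% count(key) %>% filter(n >= 3) %>%
select(key)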
And then filter the dataframe with:
mtcars[, c(t(columns))]
gather transforms the dataframe to one that has two columns:
key is the name of the column
value is the value taken by the observation for the column
Only the values above 18 are kept, and we then count the number of remaining observations by key (the name of the column).
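For comparison, the same count-then-filter logic can be sketched in base R without reshaping. This is an alternative to the gather approach, not a replacement for it; the names keep and result are introduced here for illustration.

```r
# mtcars > 18 is a logical matrix; colSums() counts the TRUE values per column
keep <- colSums(mtcars > 18) >= 3

# Keep only the columns with at least three values above 18
result <- mtcars[, keep, drop = FALSE]
```

Here drop = FALSE keeps the result a data frame even if only one column qualifies.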

Error dplyr summarise

I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
vector %>%
group_by(patient) %>%
summarise(average=mean(prob))
This code works perfectly. However, I need to get the same values without using the word "prob" on the "summarise" line. I tried the following code, but it gives me a data.frame in which the column "average" holds 5 identical values, which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
PS: To explain why I need this: I have another data frame with multiple columns with complex names that need to be "summarised", which is why I can't list them one by one in the summarise command. What I want is to pass a vector there to calculate the mean of each column grouped by patient.
It appears you want summarise_each
vector %>%
group_by(patient) %>%
summarise_each(funs(mean), vars= matches('prob'))
Using data.table you could do
library(data.table)
setDT(vector)[, lapply(.SD, mean), by = patient, .SDcols = 'prob']
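Since summarise_each is deprecated in recent dplyr releases, here is a sketch of the same summary with across() and all_of(), which accepts a character vector of column names. The variable cols is an assumption standing in for the asker's vector of complex column names.

```r
library(dplyr)

set.seed(1L)
vector <- data.frame(
  patient  = rep(1:5, each = 2),
  medicine = rep(1:3, length.out = 10),
  prob     = runif(10)
)

# Hypothetical character vector of the columns to summarise
cols <- c("prob")

# all_of() selects columns by the names stored in `cols`,
# so no column name appears literally inside summarise()
result <- vector %>%
  group_by(patient) %>%
  summarise(across(all_of(cols), mean))
```

Adding more names to cols summarises more columns without changing the pipeline.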
