dplyr::mutate_at for changing prefixes? - r

I've got a data frame (df) with three variables, two of which have the prefix abc and one with the prefix def.
I'd like to use dplyr to change the prefix of the variables starting with abc, so that they instead have the prefix new.
The problem is that my current code isn't working and I don't understand why.
Thanks!
Starting point (df):
df <- data.frame(abc_question1_F1_Q1=c(1,2,1,2),abc_question_F1_Q2=c(1,2,1,2),def_question1_F1_Q3=c(1,2,1,2))
Desired outcome (dfgoal):
dfgoal <- data.frame(new_question1_F1_Q1=c(1,2,1,2),new_question_F1_Q2=c(1,2,1,2),def_question1_F1_Q3=c(1,2,1,2))
Current code:
library(dplyr)
df <- df %>% mutate_at(vars(contains("abc_")), function(x){gsub("abc_", "new_", x)})

If we need to use dplyr
df %>%
  rename_all(~ sub("^abc", "new", .))
Or with base R
names(df) <- sub("^abc", "new", names(df))
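For reference, mutate_at() applies the function to the values inside the selected columns, not to their names, which is why the attempt in the question leaves the column names untouched. A minimal sketch with rename_with(), assuming dplyr 1.0.0 or later is installed (that version requirement is an assumption, not something stated in the thread):

```r
library(dplyr)

# Same starting data as in the question
df <- data.frame(abc_question1_F1_Q1 = c(1, 2, 1, 2),
                 abc_question_F1_Q2  = c(1, 2, 1, 2),
                 def_question1_F1_Q3 = c(1, 2, 1, 2))

# rename_with() passes the selected column *names* to the function,
# so sub() rewrites the prefix instead of the cell values
df_renamed <- df %>%
  rename_with(~ sub("^abc_", "new_", .x), starts_with("abc_"))

names(df_renamed)
```

Only the columns matched by starts_with("abc_") are passed to the function, so def_question1_F1_Q3 is left alone.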

Related

How to Rename Column Headers in R

I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
I want to use its 2nd column as the column headers of the next dataset:
How can I do this? Thank you.
In general you can use colnames(), which returns a character vector with the column names of your data frame or matrix. You can then rename the columns with:
colnames(df) <- *listofnames*
Also it is possible just to rename one name by using the [] brackets.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example, we take the values of the second column of the first dataset. Try this:
colnames(df2) <- as.character(df1[,2])
Make sure that the number of header names is identical to the number of columns.
Equivalent for rows is rownames()
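Since the original datasets are not shown, here is a small sketch of the same idea with made-up data. The contents of df1 and df2 below are hypothetical stand-ins, and the length check is an optional safeguard rather than part of the answer above:

```r
# Hypothetical stand-ins for the two datasets in the question
df1 <- data.frame(id = 1:3,
                  header = c("height", "weight", "age"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c(170, 180), V2 = c(65, 80), V3 = c(30, 41))

# Take the 2nd column of df1 as the new header names
new_names <- as.character(df1[, 2])

# Stop early if the number of names does not match the number of columns
stopifnot(length(new_names) == ncol(df2))

colnames(df2) <- new_names
colnames(df2)
```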
dplyr way w/ reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. I also used names() rather than colnames(). When working with a data frame, colnames() simply calls names(), so it is better to cut out the middleman and be explicit that we are working with a data frame and not a matrix. It is also shorter to type.
You can simply do this :
names(data)[3]<- 'Newlabel'
Where names(data)[3] is the column you want to rename.

Making quick calculations on subsets with R

and thanks to all in advance.
I have the following data:
set.seed(123)
data <- data.frame (name=LETTERS[sample(1:26, 500, replace=T)],present=sample(0:1,500,replace = T))
And I want to quickly calculate the percentage of present observations (1's) for each letter. I can do it manually, but I believe there is an easier way to do this:
library(dplyr)
A <- filter(data, name=="A" & present==1)
A2 <- filter(data, name=="A")
data$Percentage[data$name=="A"] <- nrow(A)/nrow(A2)
And so on until I arrive at "Z".
Can I automate this task without having to change the values of the "name" column manually?
Best regards,
We can use prop.table with table to get the proportion
prop.table(table(data), 1)[,2]
To add it as a column, we can expand it by matching with the 'names'
data$Percentage <- prop.table(table(data), 1)[,2][as.character(data$name)]
Or as @Lars Lau Raket suggested, we don't need to convert to character
prop.table(table(data), 1)[,2][data$name]
If we need to create a column
library(dplyr)
data %>%
group_by(name) %>%
mutate(Percentage = mean(present==1))
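For comparison, the same per-letter proportion can be sketched in base R with ave() and tapply(). This is an alternative under the same data assumptions as the question, not the method from the answer above; the name per_letter is introduced here only for illustration:

```r
set.seed(123)
data <- data.frame(name = LETTERS[sample(1:26, 500, replace = TRUE)],
                   present = sample(0:1, 500, replace = TRUE))

# ave() applies mean() within each letter and returns one value per row
data$Percentage <- ave(as.numeric(data$present == 1), data$name)

# tapply() gives one value per letter instead of one per row
per_letter <- tapply(data$present == 1, data$name, mean)
head(per_letter)
```

The mean of a 0/1 indicator is the share of 1's, which is why no explicit counting of rows is needed.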

grouping in dplyr with missing columns

I have a complex dplyr pipeline within a function call. The input is a data frame which can have an extra column called s. If this column is available, I want to group by it in addition to the standard grouping.
At the moment I solve this with an if statement that checks whether the column is in the data frame and groups accordingly. After the grouping, the code is the same for both kinds of data.
Is there a more elegant way of doing this? In my original function, I calculate several variables in the summarise call and I don't want to maintain both branches separately.
Here is an example.
library(dplyr)
df1 <- data.frame(s=rep(c('a','b'), each=10),
p=rep(letters[1:5], 4),
v=runif(20))
df2 <- data.frame(p=rep(letters[1:5], each=4),
v=runif(20))
avgP <- function(df) {
  if('s' %in% names(df)) {
    df %>%
      group_by(s, p) %>%
      summarise(avg=mean(v))
  } else {
    df %>%
      group_by(p) %>%
      summarise(avg=mean(v))
  }
}
avgP(df1)
avgP(df2)
My preferred solution would be for group_by to simply ignore the missing column and group only by p when I work on df2.
We can use intersect
avgP1 <- function(df){
  df %>%
    group_by_(.dots = intersect(names(df), c("s", "p"))) %>%
    summarise(avg=mean(v))
}
avgP1(df1)
avgP1(df2)
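group_by_() has since been deprecated in dplyr. A possible equivalent, assuming dplyr 1.0.0 or later, combines across() with any_of(), which silently skips column names that are not present. The function name avgP2 is hypothetical and used only for this sketch:

```r
library(dplyr)

df1 <- data.frame(s = rep(c('a', 'b'), each = 10),
                  p = rep(letters[1:5], 4),
                  v = runif(20))
df2 <- data.frame(p = rep(letters[1:5], each = 4),
                  v = runif(20))

avgP2 <- function(df) {
  df %>%
    # any_of() keeps only those of "s" and "p" that exist in df
    group_by(across(any_of(c("s", "p")))) %>%
    summarise(avg = mean(v), .groups = "drop")
}

avgP2(df1)  # grouped by s and p
avgP2(df2)  # grouped by p only
```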

Name the column of data frame and set as factor at the same time

I need your help to simplify the following code.
I need to name the columns of a matrix and convert each of them to a factor.
How can I do that for 100 columns without doing it one by one?
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
train.data <- data.frame(x1=factor(z[,1]),x2=factor(z[,2]),....,x100=factor(z[,52]))
Here's one option
setNames(data.frame(lapply(split(z, col(z)), factor)), paste0("x", 1:p))
or use magrittr piping syntax
library(magrittr)
split(z, col(z)) %>%
  lapply(factor) %>%
  data.frame %>%
  setNames(paste0("x", 1:p))
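The question does not give values for n and p, so the sketch below picks small hypothetical ones to make the idea runnable. It is a step-by-step variant of the one-liner above (convert to a data frame, name the columns, then turn every column into a factor), not a different method:

```r
set.seed(1)
n <- 6  # hypothetical number of rows
p <- 4  # hypothetical number of columns (100 in the question)

z <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)

train.data <- as.data.frame(z)
names(train.data) <- paste0("x", seq_len(p))

# train.data[] keeps the data frame structure while replacing every column
train.data[] <- lapply(train.data, factor)

str(train.data)
```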

how to use gather_ in tidyr with variables

I'm using tidyr together with shiny and hence need to use dynamic values in tidyr operations.
However, I'm having trouble using gather_(), which I think was designed for such cases.
Minimal example below:
library(tidyr)
df <- data.frame(name=letters[1:5],v1=1:5,v2=10:14,v3=7:11,stringsAsFactors=FALSE)
#works fine
df %>% gather(Measure,Qty,v1:v3)
dyn_1 <- 'Measure'
dyn_2 <- 'Qty'
dyn_err <- 'v1:v3'
dyn_err_1 <- 'v1'
dyn_err_2 <- 'v2'
#error
df %>% gather_(dyn_1,dyn_2,dyn_err)
#error
df %>% gather_(dyn_1,dyn_2,dyn_err_1:dyn_err_2)
After some debugging I realized the error happens in the measure.vars part of melt, but I don't know how to get it to work with the ':' there...
Please help with a solution and explain a little so I can learn more.
You are telling gather_ to look for a single column named 'v1:v3' rather than for the separate column names. Simply change dyn_err <- "v1:v3" to dyn_err <- paste("v", seq(3), sep="").
If your df has different column names (e.g. var_a, qtr_b, stg_c), you can either extract those column names or use the paste function for whichever variables are of interest.
dyn_err <- colnames(df)[2:4]
or
dyn_err <- paste(c("var", "qtr", "stg"), letters[1:3], sep="_")
You need to look at what column names you want and make the corresponding vector.
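gather_() has since been deprecated in tidyr. As a hedged sketch of the same reshaping with the newer interface, assuming tidyr 1.0.0 or later, pivot_longer() accepts a character vector of column names through all_of() and plain strings for the new column names. The variable name dyn_cols is hypothetical:

```r
library(tidyr)

df <- data.frame(name = letters[1:5], v1 = 1:5, v2 = 10:14, v3 = 7:11,
                 stringsAsFactors = FALSE)

dyn_1 <- 'Measure'
dyn_2 <- 'Qty'
dyn_cols <- paste("v", seq(3), sep = "")  # "v1" "v2" "v3"

# all_of() selects the columns named in the character vector
long <- df %>%
  pivot_longer(cols = all_of(dyn_cols), names_to = dyn_1, values_to = dyn_2)

long
```

Note that pivot_longer() orders the result row by row of the input, whereas gather() stacks the result column by column, so the row order differs even though the content is the same.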
