How to rename columns in R with dplyr using a character object?

My data frame is as such:
#generic dataset
datatest <- data.frame(col1 = c(1,2,3,4), col2 = c('A', 'B', 'C', 'D'))
#character objects
name1 <- 'A'
name2 <- 'B'
I want to rename my columns using the name1 and name2 objects. These dynamically change in the code so I can't use the following:
#I DON'T WANT THIS
datatest %>% rename(A = col1, B = col2)
I want to use this:
datatest %>% rename(name1 = col1, name2 = col2)
but then the data table columns end up becoming 'name1' and 'name2' respectively, when they should be A and B. Here is the data table at the moment.
name1 (I want this to be A)   name2 (I want this to be B)
1                             A
2                             B
3                             C
4                             D
Any help is hugely appreciated. I have the same issue with kable tables too.
Thanks in advance!

Couple of options -
Using rename_with -
library(dplyr)
name1 <- 'A'
name2 <- 'B'
datatest %>% rename_with(~c(name1, name2), c(col1, col2))
#If there are only two columns in datatest
datatest %>% rename_with(~c(name1, name2))
# A B
#1 1 A
#2 2 B
#3 3 C
#4 4 D
Use a named vector
name <- c(A = 'col1', B = 'col2')
datatest %>% rename(!!name)
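Since name1 and name2 change at run time, the named vector itself can be built from them. A minimal sketch, assuming dplyr 1.0 or later (for all_of() inside rename()); the variable names lookup and renamed are only illustrative:

```r
library(dplyr)

datatest <- data.frame(col1 = c(1, 2, 3, 4), col2 = c("A", "B", "C", "D"))
name1 <- "A"
name2 <- "B"

# Names of the vector are the new column names, values are the old ones
lookup <- setNames(c("col1", "col2"), c(name1, name2))

# all_of() tells tidyselect to treat lookup as a vector of column names
renamed <- datatest %>% rename(all_of(lookup))
names(renamed)
```

The same lookup vector can be reused for any data frame that contains col1 and col2.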

You may try
datatest %>% rename({{name1}} := col1, {{name2}} := col2)
A B
1 1 A
2 2 B
3 3 C
4 4 D

Here is one more option using !!!setNames
datatest %>%
rename(!!!setNames(names(.), c(name1, name2)))
A B
1 1 A
2 2 B
3 3 C
4 4 D

Related

Subsetting data, if the column entry contains letters

I have data as follows:
DT <- as.data.frame(c("1","2", "3", "A", "B"))
names(DT)[1] <- "charnum"
What I want is quite simple, but I could not find an example on it on stackoverflow.
I want to split the dataset into two. DT1 with all the rows for which DT$charnum has numbers and DT2 with all the rows for which DT$charnum has letters. I tried something like:
DT1 <- DT[is.numeric(as.numeric(DT$charnum)),]
But that gives:
[1] 1 2 3 A B
Levels: 1 2 3 A B
Desired result:
> DT1
charnum
1 1
2 2
3 3
> DT2
charnum
1 A
2 B
You can use regular expressions to separate the two types of data that you have and then separate the two datasets.
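result <- split(DT, grepl('^\\d+$', DT$charnum))
# split() names the groups "FALSE" (letters) and "TRUE" (digits only)
DT1 <- type.convert(result[["TRUE"]], as.is = TRUE)
DT1
# charnum
#1 1
#2 2
#3 3
DT2 <- type.convert(result[["FALSE"]], as.is = TRUE)
DT2
# charnum
#4 A
#5 B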
Using tidyverse
library(dplyr)
library(purrr)
library(stringr)
DT %>%
group_split(grp = str_detect(charnum, "^\\d+$"), .keep = FALSE) %>%
map(type.convert, as.is = TRUE)
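The attempt in the question fails because is.numeric() checks the type of the whole vector and returns a single TRUE or FALSE, not one value per row. A base R sketch that tests each entry instead, assuming that "number" means anything as.numeric() can parse (the name is_num is only illustrative):

```r
DT <- data.frame(charnum = c("1", "2", "3", "A", "B"), stringsAsFactors = FALSE)

# as.numeric() yields NA for entries that are not numbers; the coercion
# warning is expected here, so it is suppressed
is_num <- !is.na(suppressWarnings(as.numeric(DT$charnum)))

# drop = FALSE keeps the result a data frame even with a single column
DT1 <- DT[is_num, , drop = FALSE]
DT2 <- DT[!is_num, , drop = FALSE]
```

Unlike the regular expression, this also treats values such as "1.5" or "1e3" as numbers, which may or may not be what is wanted.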

Swap values round in a column - R

How do I swap one value with another in a column within a dataframe?
For example swap the 2's and 4's around in df1 to give df2:
df1 <- data.frame(col1 = c(1,2,1,4))
df2 <- data.frame(col1 = c(1,4,1,2))
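Simple solution using replace in base R. Note that replace() indexes by position, so this works here only because the values 2 and 4 happen to sit at positions 2 and 4: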
df2 <- data.frame(col1 = replace(df1$col1, c(4,2), c(2,4)))
Output
col1
1 1
2 4
3 1
4 2
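For a swap that depends on the values rather than on where they sit in the column, logical indexing can be used. A minimal sketch; swap_values is a hypothetical helper, not a base R function:

```r
df1 <- data.frame(col1 = c(1, 2, 1, 4))

# Hypothetical helper: exchange every occurrence of a and b in x
swap_values <- function(x, a, b) {
  out <- x
  # Both comparisons use the original x, so the second assignment
  # does not undo the first
  out[x == a] <- b
  out[x == b] <- a
  out
}

df2 <- data.frame(col1 = swap_values(df1$col1, 2, 4))
df2
```

Because the helper compares against the untouched input vector, it behaves the same wherever the 2's and 4's appear.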
We can try using case_when from the dplyr package for some switch functionality:
library(dplyr)
df2 <- df1
df2$col1 <- case_when(
df2$col1 == 2 ~ 4,
df2$col1 == 4 ~ 2,
TRUE ~ df2$col1
)
df2
col1
1 1
2 4
3 1
4 2
Data:
df1 <- data.frame(col1 = c(1,2,1,4))
You can also swap by reassigning the row index for that column.
With the dataframe:
df1 <- data.frame(col1 = c("a","b","c","d"))
> df1
col1
1 a
2 b
3 c
4 d
we can:
df1[,1] <- df1[c(1,4,3,2),1]
to get
> df1
col1
1 a
2 d
3 c
4 b

Function that ignores missing columns

Say I have the following two data frames:
col1 <- c("a","b","c","d","e")
col2 <- c("A","B","C","D","E")
col1a <- c("a","b","c","d","e")
col2a <- c("A","B","C","D","E")
df1 <- data.frame(col1, col2)
df2 <- data.frame(col1a, col2a)
colnames(df1) <- c("c1","c2")
colnames(df2) <- c("c1","c3")
And I have the following function to rename column headers:
library(dplyr)
col_rename <- function(x) x %>% rename(new_c1 = c1, new_c2 = c2, new_c3 = c3)
When I run this function, I get an error because the columns named in the function do not all exist in the data frame.
df1 <- col_rename(df1)
Error: `c3` contains unknown variables
How can I make the function run only on the present columns, and ignore the ones not present, without removing or changing the column names specified in the function?
EDIT:
I can see how the example was a bit confusing. I have many dataframes with many columns. These columns are shared by some dataframes but not all. However, I want to rename all columns specified by the function, regardless of what is present in the dataframe. It looks something like this:
col1 <- c(1:5)
col2 <- c(1:5)
col3 <- c(1:5)
col4 <- c(1:5)
df1 <- data.frame(col1,col2,col3,col4)
df2 <- data.frame(col1,col2,col3,col4)
colnames(df1) <- c("c1","c2","c6","c8")
colnames(df2) <- c("c1","c3","c2","c8")
AB_rename <- function(x) x %>% rename(aa=c1,bb=c2,
cc=c3,dd=c4,
ee=c5,ff=c6,
gg=c7,hh=c8)
Therefore I cannot follow the example of @Ycw, as they do not all follow the same rename rule. How do I make this ignore columns that are not present?
Here is a workaround to use setNames for the col_rename function.
col_rename <- function(x) setNames(x, paste0("new_", names(x)))
col_rename(df1)
new_c1 new_c2
1 a A
2 b B
3 c C
4 d D
5 e E
col_rename(df2)
new_c1 new_c3
1 a A
2 b B
3 c C
4 d D
5 e E
Or use the select_all function from the dplyr.
library(dplyr)
df1 %>% select_all(function(x) paste0("new_", x))
new_c1 new_c2
1 a A
2 b B
3 c C
4 d D
5 e E
This (~) also works for select_all
df2 %>% select_all(~paste0("new_", .))
new_c1 new_c3
1 a A
2 b B
3 c C
4 d D
5 e E
rename_all also works well
library(dplyr)
df1 %>% rename_all(~paste0("new_", .))
new_c1 new_c2
1 a A
2 b B
3 c C
4 d D
5 e E
Update
This is an update to address OP's updated question.
We can create a named vector that maps the old column names to the new ones, and define a function that changes the names with setNames.
# Create name vector
vec <- paste0("c", 1:8)
names(vec) <- c("aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh")
# Create the function
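AB_rename <- function(x, name_vec){
old_colname <- names(x)
# Look up each existing column in name_vec, in the data frame's own column order
idx <- match(old_colname, name_vec)
# Columns without an entry in name_vec keep their old name
new_colname <- ifelse(is.na(idx), old_colname, names(name_vec)[idx])
x2 <- setNames(x, new_colname)
return(x2)
}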
AB_rename(df1, vec)
aa bb ff hh
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
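With dplyr 1.0 or later, the same named vector can probably be passed straight to rename() through any_of(), which silently skips names that are missing from the data frame. A minimal sketch under that version assumption, using the column layout of df2 from the question:

```r
library(dplyr)

# New names are the vector names, old names are the values
vec <- setNames(paste0("c", 1:8),
                c("aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh"))

df1 <- data.frame(c1 = 1:5, c2 = 1:5, c6 = 1:5, c8 = 1:5)
df2 <- data.frame(c1 = 1:5, c3 = 1:5, c2 = 1:5, c8 = 1:5)

# any_of() ignores entries of vec that are not columns of the data frame
renamed1 <- df1 %>% rename(any_of(vec))
renamed2 <- df2 %>% rename(any_of(vec))
names(renamed1)
names(renamed2)
```

Each column is matched by name, so the result does not depend on the order of the columns in the data frame.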

combining values in rows based on matching conditions in R

I have a simple question about aggregating values in R.
Suppose I have a dataframe:
DF <- data.frame(col1=c("Type 1", "Type 1B", "Type 2"), col2=c(1, 2, 3))
which looks like this:
col1 col2
1 Type 1 1
2 Type 1B 2
3 Type 2 3
I notice that I have Type 1 and Type 1B in the data, so I would like to combine Type 1B into Type 1.
So I decide to use dplyr:
filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
summarise(n = sum(col2))
But now I need to keep going with it:
DF2 <- data.frame('Type 1', filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
summarise(n = sum(col2)))
I guess I want to rbind this new DF2 back to the original DF, but that means I have to set the column names to be consistent:
names(DF2) <- c('col1', 'col2')
OK, now I can rbind:
rbind(DF2, DF[3,])
The result? It worked....
col1 col2
1 Type 1 3
3 Type 2 3
...but ugh! That was awful! There has to be a better way to simply combine values.
Here's a possible dplyr approach:
library(dplyr)
DF %>%
group_by(col1 = sub("(.*\\d+).*$", "\\1", col1)) %>%
summarise(col2 = sum(col2))
#Source: local data frame [2 x 2]
#
# col1 col2
#1 Type 1 3
#2 Type 2 3
Using sub() with aggregate(), removing anything other than a digit from the end of col1,
do.call("data.frame",
aggregate(col2 ~ cbind(col1 = sub("\\D+$", "", col1)), DF, sum)
)
# col1 col2
# 1 Type 1 3
# 2 Type 2 3
The do.call() wrapper is there so that the first column after aggregate() is properly changed from a matrix to a vector. This way there aren't any surprises later on down the road.
In my opinion, aggregate() is the perfect function for this purpose, but you shouldn't have to do any text processing (e.g. gsub()). I would do this in a two-step process:
Overwrite col1 with the new desired grouping.
Compute the aggregation using the new col1 to specify the grouping.
DF$col1 <- ifelse(DF$col1 %in% c('Type 1','Type 1B'),'Type 1',as.character(DF$col1));
DF;
## col1 col2
## 1 Type 1 1
## 2 Type 1 2
## 3 Type 2 3
DF <- aggregate(col2~col1, DF, FUN=sum );
DF;
## col1 col2
## 1 Type 1 3
## 2 Type 2 3
You can try:
library(data.table)
setDT(transform(DF, col1=gsub("(.*)[A-Z]+$","\\1",DF$col1)))[,list(col2=sum(col2)),col1]
# col1 col2
# 1: Type 1 3
# 2: Type 2 3
Or even more directly:
setDT(DF)[, .(col2 = sum(col2)), by = .(col1 = sub("[[:alpha:]]+$", "", col1))]
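If the groups to merge are known up front, the recoding can also be spelled out explicitly instead of being derived with a regular expression. A minimal dplyr sketch, assuming col1 is a character column (hence the explicit stringsAsFactors = FALSE); the name result is only illustrative:

```r
library(dplyr)

DF <- data.frame(col1 = c("Type 1", "Type 1B", "Type 2"),
                 col2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

result <- DF %>%
  # Fold "Type 1B" into "Type 1"; every other label is left unchanged
  mutate(col1 = if_else(col1 == "Type 1B", "Type 1", col1)) %>%
  group_by(col1) %>%
  summarise(col2 = sum(col2))
result
```

This trades generality for readability: each merged label has to be listed, but nothing is merged by accident.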

Remove rows from a data frame that contain duplicate information across the columns

col1 <- c('A','B','C', 'D')
col2 <- c('B','A','C', 'C')
col3 <- c('B','C','C', 'A')
dat <- data.frame(cbind(col1, col2, col3))
dat
col1 col2 col3
1 A B B
2 B A C
3 C C C
4 D C A
I would like to remove rows 1 and 3 from dat as the letter B is present more than once in row 1 and the letter C is present more than once in row 3.
EDIT:
My actual data contains over 1 million rows and 14 columns, all of which contain character data. The solution that runs the fastest is preferred as I am using the dataframe in a live setting to make decisions, and the underlying data is changing every few minutes.
You could try this (but I'm sure there is a better way)
cols <- ncol(dat)
indx <- apply(dat, 1, function(x) length(unique(x)) == cols)
dat[indx, ]
# col1 col2 col3
# 2 B A C
# 4 D C A
Another way (if your columns are characters and if you don't have too many columns) is something like the following (which is vectorized)
indx <- with(dat, (col1 == col2) | (col1 == col3) | (col2 == col3))
dat[!indx, ]
# col1 col2 col3
# 2 B A C
# 4 D C A
You could do this in dplyr, if you don't mind specifying the columns:
library(dplyr)
dat %>%
rowwise() %>%
mutate(repeats = max(table(c(col1, col2, col3))) - 1) %>%
filter(repeats == 0) %>%
select(-repeats) # if you don't want that column to appear in results
Source: local data frame [2 x 3]
col1 col2 col3
1 B A C
2 D C A
Here is an alternative. I haven't tested on big dataset,
library(data.table) #devel version v1.9.5
dat[setDT(melt(as.matrix(dat)))[,uniqueN(value)==.N , Var1]$V1,]
# col1 col2 col3
#2 B A C
#4 D C A
Or use anyDuplicated
dat[!apply(dat, 1, anyDuplicated),]
# col1 col2 col3
#2 B A C
#4 D C A
