Rename dataframe column names by switching string patterns - r

I have the following data frame and want to rename its columns to c("WBC_MIN_D7", "WBC_MAX_D7", "DBP_MIN_D3"):
> dataf <- data.frame(
+ WBC_D7_MIN=1:4,WBC_D7_MAX=1:4,DBP_D3_MIN=1:4
+ )
> dataf
WBC_D7_MIN WBC_D7_MAX DBP_D3_MIN
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
> names(dataf)
[1] "WBC_D7_MIN" "WBC_D7_MAX" "DBP_D3_MIN"
The rename_with function in the tidyverse can probably do it, but I cannot figure out how.

You can use capture groups with sub to capture the three parts of each name and write them back in a different order -
names(dataf) <- sub('^(\\w+)_(\\w+)_(\\w+)$', '\\1_\\3_\\2', names(dataf))
The same regex can be used in rename_with -
library(dplyr)
dataf %>% rename_with(~ sub('^(\\w+)_(\\w+)_(\\w+)$', '\\1_\\3_\\2', .))
# WBC_MIN_D7 WBC_MAX_D7 DBP_MIN_D3
#1 1 1 1
#2 2 2 2
#3 3 3 3
#4 4 4 4
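
One caveat with the pattern above: \w also matches an underscore, so a name with more than three parts could be split in an unexpected place. The following is a minimal base R sketch that assumes each part contains no underscore; swap_last_two is a hypothetical helper name and not part of the original answer.

```r
dataf <- data.frame(WBC_D7_MIN = 1:4, WBC_D7_MAX = 1:4, DBP_D3_MIN = 1:4)

# Hypothetical helper: swap the 2nd and 3rd underscore-separated parts.
# [^_]+ matches a run of non-underscore characters, so only names with
# exactly three parts are rewritten; anything else is returned unchanged.
swap_last_two <- function(nms) {
  sub("^([^_]+)_([^_]+)_([^_]+)$", "\\1_\\3_\\2", nms)
}

names(dataf) <- swap_last_two(names(dataf))
names(dataf)
# [1] "WBC_MIN_D7" "WBC_MAX_D7" "DBP_MIN_D3"
```

Because the function takes and returns a character vector, it could also be passed to rename_with in place of the lambda.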

You can also assign the new names directly from a character vector with names(yourDF) <- c("A","B",...,"Z"):
names(dataf) <- c("WBC_MIN_D7", "WBC_MAX_D7", "DBP_MIN_D3")

Related

Dataframe NA conversion to specific items

I have a data frame like this:
dataframe <- data.frame(ID1=c(NA,2,3,1,NA,2),ID2=c(1,2,3,1,2,2))
Now I want to replace each NA in ID1 with the value from ID2 in the same row, like this:
dataframe <- data.frame(ID1=c(1,2,3,1,2,2),ID2=c(1,2,3,1,2,2))
I think I should use an if-style function, but I would like to use %>% to keep it simple.
Could you show me how?
An ifelse solution
dataframe <- within(dataframe, ID1 <- ifelse(is.na(ID1),ID2,ID1))
such that
> dataframe
ID1 ID2
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 2 2
A straightforward solution is to find out NA values in ID1 and replace them with corresponding values from ID2.
inds <- is.na(dataframe$ID1)
dataframe$ID1[inds] <- dataframe$ID2[inds]
However, since you want a solution with pipes you can use coalesce from dplyr
library(dplyr)
dataframe %>% mutate(ID1 = coalesce(ID1, ID2))
# ID1 ID2
#1 1 1
#2 2 2
#3 3 3
#4 1 1
#5 2 2
#6 2 2
A dplyr (using %>%) solution:
sanitized <- dataframe %>%
  mutate(ID1 = ifelse(is.na(ID1), ID2, ID1))
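
A side note on choosing between the ifelse and index-based answers above: ifelse drops attributes, so it can silently change the type of classed columns such as dates, while logical-index replacement keeps the class. The data below is made up for illustration and is not from the question.

```r
# Illustrative data (not from the question): a Date column with one NA.
d <- data.frame(
  start    = as.Date(c(NA, "2020-01-02")),
  fallback = as.Date(c("2020-01-01", "2020-01-05"))
)

# ifelse() builds its result from the logical test, so the Date class is lost.
via_ifelse <- ifelse(is.na(d$start), d$fallback, d$start)
class(via_ifelse)
# [1] "numeric"

# Index-based replacement modifies the column in place and keeps the class.
inds <- is.na(d$start)
d$start[inds] <- d$fallback[inds]
class(d$start)
# [1] "Date"
```

For plain numeric columns like ID1 and ID2 the approaches give the same values.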

filter() or subset() all the dataframes stored in a list

If I want to remove all the rows that contain 0s in a specific column, I can just do:
df <- data.frame(a = c(0,1,2,3,0,5),
                 b = c(1,2,3,5,3,1))
df <- filter(df, a != 0)
How can I do the same if I'm working with lists?
My intuition tells me to use 'lapply' but I cannot seem to make the syntax work:
# Same data frame.
df <- data.frame(a = c(0,1,2,3,0,5),
                 b = c(1,2,3,5,3,1))
df2 <- df
list.df <- list(df, df2)
lapply(list.df, filter(), a !=0) # doesn't work. How do I fix this syntax?
Many thanks in advance!
One option involving purrr could be:
library(dplyr)
library(purrr)
map(.x = list.df, ~ .x %>%
  filter(a != 0))
[[1]]
a b
1 1 2
2 2 3
3 3 5
4 5 1
[[2]]
a b
1 1 2
2 2 3
3 3 5
4 5 1
You can also use lapply:
#Without dplyr (the kept rows retain their original row names)
lapply(list.df, function(x) x[x$a != 0, ])
#With dplyr
library(dplyr)
lapply(list.df, function(x)filter(x,a!=0))
# Result
# [[1]]
# a b
# 1 1 2
# 2 2 3
# 3 3 5
# 4 5 1
#
# [[2]]
# a b
# 1 1 2
# 2 2 3
# 3 3 5
# 4 5 1
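
Two details are easy to get wrong here: lapply expects a function object (so an anonymous function, not a call such as filter()), and in base R the logical index has to be built from the column itself. A minimal base R sketch; keep_nonzero is a hypothetical helper name.

```r
df <- data.frame(a = c(0, 1, 2, 3, 0, 5), b = c(1, 2, 3, 5, 3, 1))
list.df <- list(df, df)

# Hypothetical helper: keep the rows where column `a` is non-zero.
# x$a != 0 yields one TRUE/FALSE per row; drop = FALSE keeps a data frame.
keep_nonzero <- function(x) x[x$a != 0, , drop = FALSE]

# Apply the helper to every data frame in the list.
filtered <- lapply(list.df, keep_nonzero)

filtered[[1]]$a
# [1] 1 2 3 5
rownames(filtered[[1]])
# [1] "2" "3" "4" "6"
```

Unlike dplyr's filter, base subsetting keeps the original row names, which is why they read 2, 3, 4, 6 rather than 1 to 4.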

How to apply function to colnames in tidyverse

As the title says: is there a function that applies another function to the column names of a data frame? I mean something like forcats::fct_relabel, which applies a function to factor labels.
To give an example, suppose I have a data.frame as below:
X<-data.frame(
firababst = c(1,1,1),
secababond = c(2,2,2),
thiababrd = c(3,3,3)
)
X
firababst secababond thiababrd
1 1 2 3
2 1 2 3
3 1 2 3
Now I wish to get rid of abab from column names by applying stringr::str_remove. My workaround involves magrittr::set_colnames:
X %>%
set_colnames(colnames(.) %>% str_remove('abab'))
first second third
1 1 2 3
2 1 2 3
3 1 2 3
Can you suggest a more straightforward way? Ideally, something like:
X %>%
magic_foo(str_remove, 'abab')
You can do:
X %>%
rename_all(~ str_remove(., "abab"))
first second third
1 1 2 3
2 1 2 3
3 1 2 3
With base R, we can do
names(X) <- sub("abab", "", names(X))
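
For reference, rename_all() is superseded in current dplyr by rename_with(), which takes the function and forwards extra named arguments to it, so X %>% rename_with(str_remove, pattern = "abab") comes close to the magic_foo call asked for. The base R sketch below shows the same idea without any packages; apply_to_names and strip are hypothetical helper names introduced only for illustration.

```r
X <- data.frame(
  firababst  = c(1, 1, 1),
  secababond = c(2, 2, 2),
  thiababrd  = c(3, 3, 3)
)

# Hypothetical helper: apply `f` to the column names and return the data frame.
# Extra arguments in `...` are forwarded to `f`.
apply_to_names <- function(df, f, ...) {
  names(df) <- f(names(df), ...)
  df
}

# sub() takes the pattern first, so wrap it to put the names first.
strip <- function(nms, pattern) sub(pattern, "", nms, fixed = TRUE)

Y <- apply_to_names(X, strip, "abab")
names(Y)
# [1] "first"  "second" "third"
```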

Apply a vector of filters based on a string (or vector of strings) in dplyr

R and the tidyverse have some extremely powerful but equally mysterious methods for turning strings into actionable expressions. I feel like one needs to be an expert to really understand how to use them.
NOTE: this question differs from this one in that I specifically ask about a vector (that is multiple) filter conditions. I demonstrate a solution for single filters that fails when I try multiple ways of extending it to multiple filters.
I want to do something along the lines of:
df = data.frame(A=1:10, B=1:10)
df %>% filter(A<3, B<5)
But where the filters are contained in either a string such as "A<3, B<5" or a character vector such as c("A<3", "B<5").
I can do
df %>% filter(eval(str2expression("A<3")))
# A B
# 1 1 1
# 2 2 2
But this does not work:
df %>% filter(eval(str2expression("A<3, B<5")))
Error in str2expression("A<3, B<5") : <text>:1:4: unexpected ','
1: A<3,
^
These don't work either:
> df %>% filter(!!c(str2expression("A<3"), str2expression("B<5")))
Error: Argument 2 filter condition does not evaluate to a logical vector
> df %>% filter(!!!c(str2expression("A<3"), str2expression("B<5")))
Error: Can't splice an object of type `expression` because it is not a vector
Run `rlang::last_error()` to see where the error occurred.
Evaluating a vector of expressions from str2expression for some reason only applies the last expression:
> df %>% filter(eval(c(str2expression("A<3"), str2expression("B<5"))))
# A B
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
Using a vector of evaluated expressions fails altogether:
> df %>% filter(!!!c(eval(str2expression("A<3")), eval(str2expression("B<5"))))
Error in eval(str2expression("A<3")) : object 'A' not found
I can do:
> df %>% filter(!!!c(expr(A<3), expr(B<5)))
# A B
# 1 1 1
# 2 2 2
and this tells me that expr(A<3) is NOT the same thing as str2expression("A<3")
But that isn't starting from strings.
What to do?
You could use parse_exprs from rlang
library(dplyr)
expr <- c("A<3", "B<5")
filter(df, !!!rlang::parse_exprs(expr))
# A B
#1 1 1
#2 2 2
Or you could combine the two expressions and then use it in eval
filter(df, eval(parse(text = paste0(expr, collapse = "&"))))
# A B
#1 1 1
#2 2 2
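
As background on why expr(A<3) and str2expression("A<3") behave differently: str2expression returns an expression vector (a container that holds calls), whereas expr returns the call itself. Base R's str2lang (available from R 3.6.0) parses a single string straight to a call. Below is a minimal base R sketch of building the combined filter from a character vector; the names conds, masks and keep are illustrative. Evaluating strings runs arbitrary code, so this is only appropriate for trusted input.

```r
df <- data.frame(A = 1:10, B = 1:10)
conds <- c("A<3", "B<5")

is.expression(str2expression("A<3"))  # TRUE: a container holding one call
is.call(str2lang("A<3"))              # TRUE: the call itself

# Evaluate each condition with the data frame's columns in scope,
# then combine the resulting logical vectors with element-wise AND.
masks <- lapply(conds, function(s) eval(str2lang(s), envir = df))
keep <- Reduce(`&`, masks)

df[keep, ]
#   A B
# 1 1 1
# 2 2 2
```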
Learning from @Ronak Shah's answer: apparently, in dplyr I can combine multiple conditions with a single & in filter instead of a comma. I don't understand this at all; it does not behave like the && operator:
> df %>% filter(A<3 & B<5)
A B
1 1 1
2 2 2
> df %>% filter(A<3 && B<5)
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Nevertheless, the following does work:
> df %>% filter(eval(str2expression("A<3 & B<5")))
A B
1 1 1
2 2 2
> df %>% filter(eval(str2expression("A<6 & B<5")))
A B
1 1 1
2 2 2
3 3 3
4 4 4

Select list of columns from a data frame using dplyr and select_()

I'm trying to use the following function to extract some columns from a data frame:
library('dplyr')
desired_columns = c(
'a',
'b',
'c')
extract_columns <- function(data) {
  extracted_data <- data %>%
    select_(desired_columns)
  return(extracted_data)
}
But when I try it, I don't get what I expect:
> df <- data.frame(a=1:5, b=1:5, c=1:5, d=1:5)
> df
a b c d
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
> extract_columns(df)
a
1 1
2 2
3 3
4 4
5 5
I seem to be only getting the first column and I can't figure out what I'm doing wrong. How can I get all the requested columns?
You are just missing the .dots argument in select_:
extract_columns <- function(data) {
  extracted_data <- data %>%
    select_(.dots = desired_columns)
  return(extracted_data)
}
extract_columns(df)
a b c
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
In this case, you have to use .dots parameter to pass the vector (or list):
select_(.dots = desired_columns)
It seems to have something to do with how select_ lazily evaluates its arguments.
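A tibble is the tidyverse/dplyr version of a data frame. select() is a dplyr function that works on both, so converting to a tibble is optional. You can pass the character vector of names to select directly; wrapping it in all_of() makes clear that it holds column names and avoids the ambiguity message that newer versions emit for a bare external vector.
library(dplyr)
df = data.frame(a=1:5, b=1:5, c=1:5, d=1:5)
desired_columns = c( 'a', 'b', 'c')
df %>% as_tibble() %>% select(all_of(desired_columns))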
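
Note that select_() and the other underscore verbs have been deprecated since dplyr 0.7.0. Below is a sketch of the same function with the column names passed as an argument instead of read from a global variable; the cols parameter is an addition for illustration, and the body uses base R so that it needs no packages.

```r
desired_columns <- c("a", "b", "c")

# `cols` is an illustrative parameter; the original function read the global.
extract_columns <- function(data, cols = desired_columns) {
  # With dplyr this could be: dplyr::select(data, dplyr::all_of(cols))
  data[, cols, drop = FALSE]
}

df <- data.frame(a = 1:5, b = 1:5, c = 1:5, d = 1:5)
names(extract_columns(df))
# [1] "a" "b" "c"
```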
