use outside variable inside of rename() function in R - r

I'm new to R and have a problem
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
here is how I have tried to do this:
first the .csv file is read in, lets say case1_case2.csv
then the name of the .csv file is broken up into two parts
each part is assigned to a vector
so it ends up being like this:
xName=case1
yName=case2
After I have put my data into new columns I would like to rename each column to be case1 and case2
to do this I tried using the rename function in R but instead of renaming to case1 and case2 the columns get renamed to xName and yName.
here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
how can I make sure that the variables xName and yName are expanded inside of the rename() function
thanks.

You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !! and you'll need to use the special := assignment operator to make unquoting on the left hand side allowed.
library(tidyverse)
df <- data_frame(x = 1:3)
y <- "Foo"
df %>% rename(y=x) # Not what you want - need to unquote y
df %>% rename(!!y = x) # Gives error - need to use :=
df %>% rename(!!y := x) # Correct

Related

R function that selects certain columns from a dataframe

I am trying to figure out how to write a function in R than can select specific columns from a dataframe(df) for subsetting:
Essentially I have df with columns or colnames : count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both "count_A.x" and "count_A.y" columns by simply specifying count_A in function argument.
I tried the following:
e.g. pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately when I run the above code [i.e., pull_columns2(df, count_A)] the following code it rightfully says that column count_char.x does not exist and does not "convert" count_char.x to count_A
pull_columns2(df, count_A)
We can use
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
#> then use it as follows
df %>% pull_columns2("count_A")
Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")

dplyr: how to pass strings to dplyr's mutate argument

I want to write a helper function that summarizes the percentage change for column A, B and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that I have an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")
I don't think it's a good idea to evaluate string as code, also I think you are over-complicating it. Using across this should be easier.
library(dplyr)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
input_data %>%
arrange(across(all_of(year_Var_name))) %>%
mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
target_Var_list = c("A","B","C"),
year_Var_name = "D")

R: select columns with NAs for mutate function

I'm trying to subset columns in my dataframe to use them solely in a mutate function as part of conditional formating for an HTML Table with knitr::kable and kableExtra.
#Conditional Formating function
highlights <- function(x) { cell_spec(x, background = ifelse( x != NA, "#C9FFE5","white")) }
#build table
ds.tab <- ds%>%
mutate_if("column contains ANY NA values", funs(highlights(.)))%>% ...
I need to write the bit between brackets ("column contains ANY NA values") in R.
Thanks!
It should work if you use any(is.na(.)) such as the following:
ds.tab <- ds %>%
mutate_if(function(x) any(is.na(x)), funs(highlights(.))) %>% ...
Or if you prefer, the following syntax works the same way
ds.tab <- ds %>%
mutate_if(~any(is.na(.)), funs(highlights(.))) %>% ...

Using a for loop with knitr::kable and kableExtra::add_header_above

I am using R version 3.5.2.
I would like to evaluate a string in a kable function, but I am having some issues. Normally, I can pass a string through a for loop using the get function but in the kableExtra::add_header_above function I get the following error:
Error: unexpected '=' in:"print(kable(df4,"html", col.names = c("zero","one")) %>% add_header_above(c(get("string") ="
I have tried a handful of techniques like creating a string outside of the kable function and calling it, using page breaks and print statements in the knit loop and trying the eval function as well. I have also added result ="asis" as suggested here
Here is a reproducible example:
```{r results="asis"}
library("knitr")
library("kableExtra")
df1 <- mtcars %>% dplyr::select(am,vs)
df1a <- df1 %>% mutate(type = "A")
df1b <- df1 %>% mutate(type = "B")
df1c <- df1 %>% mutate(type = "C")
df2 <- rbind(df1a,df1b,df1c)
vector <- as.vector(unique(df2$type))
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c(get("string") = 3)))
}
```
Ideally, I would like the header of the table to have the string name from the column type. Below is an example of what I want it to look it:
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c("A" = 3)))
I understand that the knitr function needs to be treated differently than regular R when using loops as found in this solution but I am still struggling to get the string to be evaluated correctly. Perhaps because the function requires a vecotr input, it is not evalauting it as a string?
You have to define your header as a vector. The name of the header should be the names of the vector and the value of the vector would be the number of columns the header will use.
The loop in the code should look like this:
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
header_temp = 3
names(header_temp) = get("variable")
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(header_temp))
}
So first I define the number of columns the of the header in the variable header_temp and then i assign a name to it.

Can "assign()" and "get()" be written more concisely?

Below is my code. I use an extra variation "tmp" to clean the "ABC_Chla". Because the "Location_name" can change, I use "assign()" and "get()" function.
Location_name <- "ABC_"
tmp <- get(paste(Location_name,"DO",sep = "")) %>% filter(log.DO != -Inf)
assign(paste(Location_name,"DO",sep = ""), tmp)
My code can achieve this goal, but it seems not concise (introduce a temporary variable). Is there a better way?
Assuming the inputs shown reproducibly in the Note at the end (next time please make sure your question includes complete reproducible code including inputs) we can make the following changes:
use paste0 instead of paste
create a variable locname to hold the name of the data frame and a variable e to be the environment where our data frame is located
use e[[...]] instead of get and assign
use magrittr %<>% two-way pipe
possibly use filter(is.finite(log.DO)) -- not shown below
giving this code:
library(dplyr)
library(magrittr)
e <- .GlobalEnv # change if our data frame is in some other environment
locname <- paste0(Location_name, "DO")
e[[locname]] %<>%
filter(log.DO != -Inf)
The result is:
get(locname, e)
## log.DO
## 1 1
## 2 2
Alternative
This alternative only uses ordinary pipes. We use e and locname from above.
library(dplyr)
e[[locname]] <- e[[locname]] %>%
filter(log.DO != -Inf)
Note
Test input:
ABC_DO <- data.frame(log.DO = c(1, -Inf, 2))
Location_name <- "ABC_"
You only have a temporary variable because you store the data in tmp, i don't see it as a problem.But, n this case, the only thing that i see you can do is pass the code of tmp directly to assign, like:
assign(
paste(Location_name,"DO",sep = ""),
get(paste(Location_name,"DO",sep = "")) %>% filter(log.DO != -Inf)
)

Resources