R function that selects certain columns from a dataframe

I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting.
Essentially, I have a df with the column names count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both the "count_A.x" and "count_A.y" columns by simply specifying count_A as the function argument.
I tried the following:
pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the above code, i.e. pull_columns2(df, count_A), it rightly says that column count_char.x does not exist. It does not substitute count_A for count_char, so count_char.x never becomes count_A.x:
pull_columns2(df, count_A)

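We can use contains(), which selects every column whose name contains the given string:
library(dplyr)
pull_columns2 <- function(df, count_char){
df_subset <- df %>% select(contains(count_char))
df_subset
}
# then use it as follows, passing the prefix as a string
df %>% pull_columns2("count_A")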
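contains() matches the string anywhere in a name, so "count_A" would also pick up a column such as count_AB.x if one existed. If you want exactly the .x and .y versions, a minimal sketch is to build the two names with paste0() and pass them through all_of(). The helper name pull_columns_exact is hypothetical, and the data values are made up; only the column names come from the question.

```r
library(dplyr)

# Select exactly <prefix>.x and <prefix>.y; all_of() errors if either is missing
pull_columns_exact <- function(df, count_char) {
  df %>% select(all_of(paste0(count_char, c(".x", ".y"))))
}

# Toy data frame with the column names from the question
df <- data.frame(
  count_A.x = 1:2, count_B.x = 3:4, count_C.x = 5:6,
  count_A.y = 7:8, count_B.y = 9:10, count_C.y = 11:12
)

res <- pull_columns_exact(df, "count_A")
print(names(res))
```

If you prefer a regular expression, select(matches(paste0("^", count_char, "\\.[xy]$"))) is an anchored alternative that also avoids partial matches.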

Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")

Related

dplyr: how to pass strings to dplyr's mutate argument

I want to write a helper function that summarizes the percentage change for columns A, B, and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that there is an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")
I don't think it's a good idea to evaluate strings as code, and I think you are overcomplicating it. Using across(), this should be easier.
library(dplyr)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
input_data %>%
arrange(across(all_of(year_Var_name))) %>%
mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
target_Var_list = c("A","B","C"),
year_Var_name = "D")
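The across() version above overwrites A, B and C in place, whereas the question's code was trying to create new columns such as A_pct_chg. A sketch, assuming dplyr 1.0 or later, uses the .names argument of across() to keep the originals. It sorts with the .data pronoun instead of across(), because recent dplyr releases (1.1.0 and later) deprecate calling across() without a function to apply. The toy data below is made up so the result is easy to check by hand.

```r
library(dplyr)

calc_perct_chg <- function(input_data, target_Var_list, year_Var_name) {
  input_data %>%
    # Sort by the year column, whose name arrives as a string
    arrange(.data[[year_Var_name]]) %>%
    # Add <col>_pct_chg columns next to the originals
    mutate(across(all_of(target_Var_list),
                  ~ (.x - lag(.x)) / lag(.x),
                  .names = "{.col}_pct_chg"))
}

# Small deterministic example (years deliberately out of order)
toy <- data.frame(A = c(4, 2, 5), B = c(10, 20, 40), D = c(2003, 2001, 2002))
out <- calc_perct_chg(toy, c("A", "B"), "D")
print(out)
```

Because target_Var_list goes through all_of(), it should be a character vector such as c("A", "B", "C") rather than a list.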

Using a for loop with knitr::kable and kableExtra::add_header_above

I am using R version 3.5.2.
I would like to evaluate a string in a kable function, but I am having some issues. Normally, I can pass a string through a for loop using the get function, but in the kableExtra::add_header_above function I get the following error:
Error: unexpected '=' in:"print(kable(df4,"html", col.names = c("zero","one")) %>% add_header_above(c(get("string") ="
I have tried a handful of techniques, like creating a string outside of the kable function and calling it, using page breaks and print statements in the knit loop, and trying the eval function as well. I have also added results="asis" as suggested here.
Here is a reproducible example:
```{r results="asis"}
library("knitr")
library("kableExtra")
library("dplyr")
df1 <- mtcars %>% dplyr::select(am,vs)
df1a <- df1 %>% mutate(type = "A")
df1b <- df1 %>% mutate(type = "B")
df1c <- df1 %>% mutate(type = "C")
df2 <- rbind(df1a,df1b,df1c)
vector <- as.vector(unique(df2$type))
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c(get("string") = 3)))
}
```
Ideally, I would like the header of the table to show the string from the type column. Below is an example of what I want it to look like:
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(c("A" = 3)))
I understand that knitr output needs to be handled differently from regular R when using loops, as found in this solution, but I am still struggling to get the string evaluated correctly. Perhaps, because the function requires a vector input, it is not evaluating it as a string?
You have to define your header as a named vector: the names of the vector are the header labels, and the values are the number of columns each header spans.
The loop in the code should look like this:
for (variable in vector) {
df3 <- df2 %>% filter(type == (variable))
df4 <- table(df3$am,df3$vs)
header_temp = 3
names(header_temp) = variable # variable already holds the string, so get() is not needed
print(kable(df4,"html", col.names = c("zero","one")) %>%
add_header_above(header_temp))
}
So first I define the number of columns of the header in the variable header_temp, and then I assign a name to it.
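The original c(get("string") = 3) fails because the name on the left of = inside c() must be a literal name, not an expression, so R stops at parse time. A small base-R sketch of the difference, using a hypothetical value "A" for variable:

```r
variable <- "A"

# Inside c(), the left-hand side is taken literally: the name is "variable"
literal_header <- c(variable = 3)
print(names(literal_header))

# setNames() uses the *value* of variable as the name instead
header <- setNames(3, variable)
print(header)
```

With kableExtra loaded, the same named vector can be built inline in the loop, e.g. add_header_above(setNames(3, variable)), which avoids the temporary header_temp variable.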

issues with Iterating over f(x, y) using map2_

I am using a wrapper for the LastFM API to search for track tags.
the wrapper function is...
devtools::install_github("juyeongkim/lastfmr")
track_getInfo("track", "artist", api_key= lastkey)
I defined my own function as
INFOLM <- function(x= track, y= artist) {
output <- track_getInfo(x,y,api_key = lastkey)
output <- flatten(output)
output_1 <- output[["tag"]][["name"]]
return(output_1)
}
Then prepared my list elements from my larger data frame
artist4lf <- c(small_descriptive[1:10,2])
track4lf <- c(small_descriptive[1:10,3])
x<- vector("list", length = length(track4lf))
y<- vector("list", length = length(artist4lf))
names(x) <- track4lf
names(y) <- artist4lf
Then...
map2_df(track4lf, artist4lf, INFOLM)
I get a 0x0 tibble back every time... does anyone have a suggestion?
I think your INFOLM function will work if you just delete the api_key argument from the track_getInfo function.
Also, I'm not sure you need purrr::map2 here; you should be able to use your small_descriptive dataframe with rowwise() and mutate() to add the column(s) you want.
Here's a go, using testdf as if it's your small_descriptive dataframe with only the track and artist columns.
library(lastfmr)
library(dplyr)
library(tidyr)
testdf <- tribble(
~Artist, ~Track,
"SmashMouth", "All Star",
"Garth Brooks", "The Dance"
)
INFOLM <- function(x= track, y= artist) {
output <- track_getInfo(x,y)
output <- flatten(output)
output_1 <- output[["tag"]][["name"]]
return(paste(output_1, collapse = ","))
}
testdf %>% rowwise %>%
mutate(stuff = INFOLM(Track, Artist)) %>%
tidyr::separate(stuff, c("Tag1", "Tag2", "Tag3", "Tag4", "Tag5"), sep = ",")
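If you would rather stay with purrr than use rowwise(), map2_chr() walks over the two columns in parallel and returns one string per row, which fits directly into mutate(). This sketch cannot call the Last.fm API, so get_tags() is a hypothetical stand-in for the track_getInfo() call inside INFOLM; swap in the real function when you have a key.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for track_getInfo(); returns a character vector of tags
get_tags <- function(track, artist) {
  c(tolower(artist), "demo")
}

testdf <- tibble::tibble(
  Artist = c("SmashMouth", "Garth Brooks"),
  Track  = c("All Star", "The Dance")
)

res <- testdf %>%
  # One comma-separated string of tags per (Track, Artist) pair
  mutate(tags = map2_chr(Track, Artist, ~ paste(get_tags(.x, .y), collapse = ",")))
print(res)
```

map2_chr() insists on exactly one string per pair, so collapsing the tags with paste() matters; the tidyr::separate() step from the answer can then follow unchanged.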

Initialize an empty tibble with column names and 0 rows

I have a vector of column names called tbl_colnames.
I would like to create a tibble with 0 rows and length(tbl_colnames) columns.
The best way I've found of doing this is...
tbl <- as_tibble(data.frame(matrix(nrow=0,ncol=length(tbl_colnames))))
and then I want to name the columns so...
colnames(tbl) <- tbl_colnames
My question: Is there a more elegant way of doing this?
something like tbl <- tibble(colnames=tbl_colnames)
my_tibble <- tibble(
var_name_1 = numeric(),
var_name_2 = numeric(),
var_name_3 = numeric(),
var_name_4 = numeric(),
var_name_5 = numeric()
)
This also works if, instead of initializing numeric vectors of length 0, you use other classes (for example, character()).
This SO question explains how to do it with other R libraries.
According to this tidyverse issue, this won't be a feature for tribbles.
Since you want to combine a list of tibbles, you can just assign NULL to the variable and then bind_rows() it with the other tibbles.
res = NULL
for(i in tibbleList)
res = bind_rows(res,i)
However, a much more efficient way to do this is
bind_rows(tibbleList) # combine all tibbles in the list
For anyone still interested in an elegant way to create a 0-row tibble with column names given by a character vector tbl_colnames:
tbl_colnames %>% purrr::map_dfc(setNames, object = list(logical()))
or:
tbl_colnames %>% purrr::map_dfc(~tibble::tibble(!!.x := logical()))
or:
tbl_colnames %>% rlang::rep_named(list(logical())) %>% tibble::as_tibble()
This, of course, results in each column being of type logical.
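If the columns need different types, one option is to write the specification as a named list of zero-length vectors and convert it with as_tibble(). A sketch with made-up column names and types:

```r
library(tibble)

# Hypothetical column specification: name = zero-length prototype of the type
col_spec <- list(id = integer(), name = character(), score = numeric(), seen = logical())

tbl <- as_tibble(col_spec)
print(tbl)

# Same idea when you only have a vector of names and want one shared type
tbl_colnames <- c("one", "two", "three")
tbl2 <- as_tibble(setNames(rep(list(character()), length(tbl_colnames)), tbl_colnames))
print(tbl2)
```

Because every prototype has length 0, the result has 0 rows without any .rows argument.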
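The following command will create a tibble with 0 rows and variables (columns) named with the contents of tbl_colnames (all of type logical). Splicing the bare character vector with !!! would turn the strings into column values rather than names, so splice a named list instead:
tbl <- tibble::tibble(!!!setNames(rep(list(logical()), length(tbl_colnames)), tbl_colnames), .rows = 0)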
You could abuse readr::read_csv(), which allows reading from a string. You can control names and types, e.g.:
library(readr)
tbl_colnames <- c("one", "two", "three", "c4", "c5", "last")
read_csv("\n", col_names = tbl_colnames) # all character type
read_csv("\n", col_names = tbl_colnames, col_types = "lcniDT") # various types
I'm a bit late to the party, but for future readers:
as_tibble(matrix(nrow = 0, ncol = length(tbl_colnames)), .name_repair = ~ tbl_colnames)
.name_repair allows you to name your columns within the same function call.

use outside variable inside of rename() function in R

I'm new to R and have a problem.
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
Here is how I have tried to do this:
first the .csv file is read in, let's say case1_case2.csv;
then the name of the .csv file is broken up into two parts;
each part is assigned to a vector,
so it ends up being like this:
xName=case1
yName=case2
After I have put my data into new columns, I would like to rename the columns to case1 and case2.
To do this I tried using the rename function in R, but instead of being renamed to case1 and case2, the columns get renamed to xName and yName.
Here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
How can I make sure that the variables xName and yName are expanded inside the rename() function?
Thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !!, and you need the special := assignment operator, because unquoting is not allowed on the left-hand side of a plain =.
library(tidyverse)
df <- tibble(x = 1:3)
y <- "Foo"
df %>% rename(y = x) # Not what you want: the new column is literally named y
# df %>% rename(!!y = x) # Parse error: need to use :=
df %>% rename(!!y := x) # Correct
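Applied to the question's loop body, the fix is to unquote xName and yName on the left of :=. A minimal sketch with a made-up data frame standing in for inFile (the columns case1_a and other_b and their values are assumptions); contains() stands in for the str_detect() test so the example needs only dplyr:

```r
library(dplyr)

xName <- "case1"
yName <- "case2"

# Made-up stand-in for the CSV read into inFile
inFile <- data.frame(case1_a = 1:2, other_b = 3:4)

# stack() returns columns `values` and `ind`; !!xName := renames `values` to "case1"
xValues <- inFile %>%
  select(contains(xName)) %>%
  stack() %>%
  rename(!!xName := values) %>%
  select(all_of(xName))

yValues <- inFile %>%
  select(-contains(xName)) %>%
  stack() %>%
  rename(!!yName := values, Organisms = ind)

print(names(xValues))
print(names(yValues))
```

The final select(all_of(xName)) replaces subset(select = xName), which has the same literal-name problem as rename().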
