R function to compare columns

In R, I would like to create a function that shows selected columns side by side in the Viewer for comparison. Assuming my data frame is df1:
compare_col <- function(x){
select(df1, x) %>%
View()
}
If I define the function with a single argument x, I can only input one column:
compare_col <- function(x)
compare_col("col_1")
Only if I define the function with, say, two arguments x and y can I input two columns:
compare_col <- function(x, y)
compare_col("col_1", "col_2")
How can I create a function that is flexible enough to accept any number of columns?

You can use the rlang package to achieve this.
This lets you pass a character vector of column names: syms() turns each string into a symbol, and the !!! operator splices those symbols into select() as separate arguments, so any number of columns can be selected.
library(dplyr)
#library(rlang)
compare_col <- function(x){
df1 %>% select(!!! syms(x)) %>%
View()
}
compare_col(c("col1", "col2"))

Just realised: all I actually needed to do was combine the column names into a single character vector when calling the function:
compare_col(c("col1", "col2"))
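A dependency-free alternative, sketched in base R: collect any number of column names through `...` and index the data frame with them. The name `compare_cols` and its extra `df` argument are hypothetical (added so the function does not depend on a global `df1`), and `View()` is only called in an interactive session.

```r
# Hypothetical base R variant: accepts any number of column names via `...`
compare_cols <- function(df, ...) {
  cols <- c(...)                       # flatten all name arguments into one character vector
  out <- df[, cols, drop = FALSE]      # drop = FALSE keeps a data frame even for one column
  if (interactive()) utils::View(out)  # open the Viewer only when running interactively
  invisible(out)
}

compare_cols(mtcars, "mpg", "cyl", "wt")  # three separate arguments
compare_cols(mtcars, c("mpg", "hp"))      # or one character vector
```

Because `c(...)` flattens its inputs, both calling styles work with the same definition.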

Related

Extending an sapply to apply list of variables and saving output as list of data frames in R

I have a data set similar to the example below (complex survey sample data). Thanks to SO user IRTFM, I was able to adapt the code and save the results (I'm only interested in the total proportions, not the confidence intervals) as a reshaped object for further processing. What I would like to do is extend this sapply to generate results for 20 other variables, saving the results as data frames in a list, which I think is the most efficient way. My struggle is how to extend the sapply so that it processes multiple variables at once. I thought about a for loop over a list that holds the names of the variables and started to build this list (var_list below), but this does not seem to be the way forward. I'd rather take advantage of the apply family, since I would like the results to be stored in a list.
library(survey) # using the `dclus1` object that is standard in the examples.
library(reshape)
library(tidyverse)
data(api)
stype_t <- sapply( levels(dclus1$variables$stype),
function(x){
form <- as.formula( substitute( ~I(stype %in% x), list(x=x)))
z <- svyciprop(form, dclus1, method="me", df=degf(dclus1))
c( z, c(attr(z,"ci")) )} ) %>%
as.data.frame() %>% slice(1) %>% reshape::melt() %>% dplyr::mutate(value = round(value, digits = 4)*100)
Let's say you then wanted to repeat the above for the variable awards. You could copy the lines and do it that way, but it would be better to be more efficient. So I started by making a list of the names of the two variables in this example data, but I am stumped as to how to apply this list to the code above and keep the results in a list of data frames. I tried wrapping the sapply in an lapply, but this did not work, presumably because I did it wrong. Any advice or thoughts would be appreciated.
var_list <- list("stype", "awards")
Instead of $ to reference named elements, use the [[ extractor, which accepts the name as a string. Also, extend substitute() so that the variable name is substituted dynamically as well:
# DEFINED METHOD
df_build <- function(var) {
sapply(levels(dclus1$variables[[var]]), function(x) {
form <- as.formula(substitute(~I(var %in% x),
list(var=as.name(var), x=x)))
z <- svyciprop(form, dclus1, method="me", df=degf(dclus1))
c(z, c(attr(z,"ci")))
}) %>%
as.data.frame() %>%
slice(1) %>%
reshape::melt() %>%
dplyr::mutate(value = round(value, digits = 4)*100)
}
# ITERATE THROUGH CHARACTER VECTOR AND CALL METHOD
var_list <- list("stype", "awards")
df_list <- lapply(var_list, df_build)
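Both ingredients of that answer can be checked in base R without the survey package. This sketch uses a toy data frame standing in for `dclus1$variables` (hypothetical data) to show that `$` does not use a name stored in a variable while `[[` does, and that `substitute()` with `as.name()` rebuilds the formula the original code hard-coded for `stype`. The helper `make_form` is hypothetical.

```r
# Toy stand-in for dclus1$variables (hypothetical data)
vars_df <- data.frame(stype = factor(c("E", "H", "M", "E")))
var <- "stype"

vars_df$var     # NULL: `$` looks for a column literally named "var"
vars_df[[var]]  # the stype column: `[[` uses the string stored in `var`

# Build ~I(<column> %in% <level>) with the column name supplied as a string
make_form <- function(var, x) {
  as.formula(substitute(~I(var %in% x), list(var = as.name(var), x = x)))
}

make_form("stype", "E")  # same formula as the hard-coded ~I(stype %in% "E")
```

`as.name()` is what turns the string "stype" into a symbol; substituting the bare string instead would produce `~I("stype" %in% "E")`, which compares a constant rather than the column.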

Convert a variable in multiple dataframes to character in R

I have 4 datasets (y25_age, y30_age, y25_mri, y30_mri). Each dataset has an ID variable. I want to convert ID from numeric to character in all of these datasets. I have tried the code below:
x<-list(y25_age,y30_age,y25_mri,y30_mri)
x$ID<-lapply(x,function(x){x<-x["ID"]<-as.character(x["ID"])})
However, this gives me only the IDs as character vectors rather than the full datasets, which is not what I want. Any suggestions are welcome. Thank you in advance.
Here, the left-hand side of <- should be x (the list itself) rather than x$ID, and the anonymous function needs to return the modified data frame as its last expression:
x <- lapply(x,function(u){u$ID <-as.character(u$ID)
u})
NOTE: I changed the anonymous function's argument from 'x' to 'u' to avoid any confusion with the list x.
Another option is transform():
x <- lapply(x, transform, ID = as.character(ID))
If the intention is to change the original objects, x should be a named list:
names(x) <- c('y25_age','y30_age','y25_mri','y30_mri')
and then use list2env:
list2env(x, .GlobalEnv) # not recommended though
This should also work, using purrr:
library(tidyverse)
map(x, ~ .x %>% mutate(ID = as.character(ID)))
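The point about the return value can be seen with two toy data frames (hypothetical stand-ins for y25_age and y25_mri): the value of an assignment such as `u$ID <- ...` is its right-hand side, so a function whose last expression is that assignment returns the converted IDs, not the data frame.

```r
# Toy stand-ins for y25_age, y25_mri (hypothetical data)
dfs <- list(
  y25_age = data.frame(ID = c(101, 102), age = c(25, 30)),
  y25_mri = data.frame(ID = c(101, 102), volume = c(1.5, 2.0))
)

# Pitfall: the last expression is an assignment, whose value is its
# right-hand side, so each element becomes a character vector of IDs
bad <- lapply(dfs, function(u) { u$ID <- as.character(u$ID) })

# Fix: make the modified data frame the last expression
good <- lapply(dfs, function(u) {
  u$ID <- as.character(u$ID)
  u
})

str(bad)
str(good)
```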

R - Passing a column name to a function to be evaluated in a non standard way

I have a data frame, and I want to pass a column name to a function and then filter based on that column.
I have tried reading a few tutorials on this, and it seems to be related to non-standard evaluation in R.
I can't seem to wrap my head around the examples in the blog posts I have read.
For simplicity, I have taken the iris dataset, and I want to pass a column to a function that filters the dataset to rows where that column's value is greater than or equal to one.
mydf <- iris
filter_measurements <- function(mydf, measurement){
mydf <- filter(measurement >= 1)
mydf
}
mydf %>%
filter_measurements(measurement = Petal.Width)
Do I have to add something to my function so that R knows I mean a column of the data frame, rather than looking for a standalone object called Petal.Width?
I have seen Passing a variable name to a function in R, which I was unable to adapt to my example.
Thank you all for your time
A great resource for this is Programming with dplyr.
library(dplyr)
mydf <- iris
filter_measurements <- function(mydf, measurement){
measurement <- enquo(measurement)
mydf <- filter(mydf, (!!measurement) >= 1)
mydf
}
mydf %>%
filter_measurements(measurement = Petal.Width)
You have to tell the function that measurement is given as a bare variable name. First, use enquo() to capture the expression supplied as the measurement argument and store it as a quosure. Then, with !! in front of measurement, filter() knows that it doesn't have to quote this argument, because it is already a quosure.
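For comparison, the same capture-then-evaluate idea can be sketched in base R: `substitute()` captures the unevaluated argument (roughly the role of `enquo()`), and `eval()` evaluates it with the data frame's columns in scope (roughly what injecting it into `filter()` with `!!` achieves). This is a base R analogue, not the dplyr mechanism; unlike a quosure, a plain `substitute()` does not carry its environment, so this version becomes fragile when it is called from inside another function. The name `filter_measurements_base` is hypothetical.

```r
filter_measurements_base <- function(mydf, measurement) {
  expr <- substitute(measurement)           # capture the bare name, e.g. Petal.Width
  vals <- eval(expr, mydf, parent.frame())  # look it up among mydf's columns first
  mydf[vals >= 1, , drop = FALSE]           # keep rows where the column is >= 1
}

res <- filter_measurements_base(iris, Petal.Width)
nrow(res)
```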
Alternative
You can also pass the column you want to filter on as a string and use filter_() (note that filter_() and the other underscore verbs have since been deprecated in dplyr):
filter_measurements <- function(mydf, measurement){
mydf <- filter_(mydf, paste0(measurement, " >= 1"))
mydf
}
mydf %>%
filter_measurements(measurement = "Petal.Width")
You have to pass the column name either as a character string or as the integer index of the column. Also, the line
mydf <- filter(measurement >= 1)
within your function never states which data frame is being filtered, so measurement is expected to be a standalone object rather than a column of a data frame.
Try this:
filter_measurements <- function(mydf, measurement)
{
mydf <- filter(mydf, mydf[[measurement]] >= 1)  # [[ ]] returns a plain vector for data frames and tibbles
mydf
}
iris %>% filter_measurements("Petal.Width")
A more convoluted invocation of the function would also work:
iris %>% filter_measurements(which(names(.)=="Petal.Width"))

R apply a function using endsWith to a vector

I need to apply a function (which takes two arguments of different lengths) to each item in a vector. The function looks up the value in the first argument that ends with the characters in the second argument and outputs the index (the objective is to perform a left join on two tables using a fuzzy join, but regex_left_join crashed so this is the first step in a workaround solution).
Example input:
x <- c("492820UA665110", "492820UA742008", "493600N077751", "671884RB25355")
y <- c("RB25355", "S56890")
Function:
idx_endsWith <- function(.x, .y) {
return(ifelse(length(which(endsWith(.x, .y))) == 1,
which(endsWith(.x, .y)),
NA))
}
So for example,
> idx_endsWith(x, y[1])
[1] 4
How can I apply this function to each element in y without using a loop? I need to vectorize the function, but mapply doesn't work because the vectors need to be the same length. I'm looking for a solution in dplyr.
For dplyr, as you requested, this should work:
library(dplyr)
data.frame(y, stringsAsFactors = FALSE) %>%
rowwise %>%
mutate(index = idx_endsWith(x, y))
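Two base R alternatives, sketched without dplyr: `vapply()` calls `idx_endsWith()` once per element of `y` while passing the whole of `x` as a fixed argument, and `mapply()` also works once the fixed argument goes into `MoreArgs`, so the vectors do not need to have the same length. The inputs and function are repeated from the question so the block runs on its own.

```r
x <- c("492820UA665110", "492820UA742008", "493600N077751", "671884RB25355")
y <- c("RB25355", "S56890")

idx_endsWith <- function(.x, .y) {
  ifelse(length(which(endsWith(.x, .y))) == 1,
         which(endsWith(.x, .y)),
         NA)
}

# One call per suffix in y; x is passed unchanged each time.
# integer(1) declares the result type; the logical NA is promoted to NA_integer_.
idx_v <- vapply(y, idx_endsWith, integer(1), .x = x)

# Same idea with mapply: x goes in MoreArgs, so only y is iterated over
idx_m <- mapply(idx_endsWith, .y = y, MoreArgs = list(.x = x))

idx_v
idx_m
```

Both results are named by the suffixes in `y`, which makes it easy to see which suffix matched which row of `x`.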

Repeatedly mutate variable using dplyr and purrr

I'm self-taught in R and this is my first StackOverflow question. I apologize if this is an obvious issue; please be kind.
Short Version of my Question
I wrote a custom function to calculate the year-over-year percent change in a variable. I would like to use purrr's map_at function to apply my custom function to a vector of variable names. My custom function works when applied to a single variable, but fails when I chain it using map_at.
My custom function
calculate_delta <- function(df, col) {
#generate variable name
newcolname = paste("d", col, sep="")
#get formula for first difference.
calculate_diff <- lazyeval::interp(~(a + lag(a))/a, a = as.name(col))
#pass formula to mutate, name new variable the columname generated above
df %>%
mutate_(.dots = setNames(list(calculate_diff), newcolname)) }
When I apply this function to a single variable in the mtcars dataset, the output is as expected (although obviously the meaning of the result is nonsensical):
calculate_delta(mtcars, "wt")
Attempt to Apply the Function to a Character Vector Using Purrr
I think that I'm having trouble conceptualizing how map_at passes arguments to the function. All of the example snippets I can find online use map_at with functions like is.character, which don't require additional arguments. Here are my attempts at applying the function using purrr.
vars <- c("wt", "mpg")
mtcars %>% map_at(vars, calculate_delta)
This gives me this error message
Error in paste("d", col, sep = "") :
argument "col" is missing, with no default
I assume this is because map_at is passing vars as the df, and not passing an argument for col. To get around that issue, I tried the following:
vars <- c("wt", "mpg")
mtcars %>% map_at(vars, calculate_delta, df = .)
That throws me this error:
Error: unrecognised index type
I've monkeyed around with a bunch of different versions, including removing the df argument from the calculate_delta function, but I have had no luck.
Other potential solutions
1) A version of this using sapply, rather than purrr. I've tried solving the problem that way and had similar trouble. And my goal is to figure out a way to do this using purrr, if that is possible. Based on my understanding of purrr, this seems like a typical use case.
2) I can obviously think of how I would implement this using a for loop, but I'm trying to avoid that if possible for similar reasons.
Clearly I'm thinking about this wrong. Please help!
EDIT 1
To clarify, I am curious whether there is a method of repeatedly transforming variables that accomplishes three things.
1) Generates new variables within the original tbl_df without replacing the columns being mutated (as is the case when using dplyr's mutate_at).
2) Automatically generates new variable labels.
3) If possible, accomplishes what I've described by applying a single function using map_at.
It may be that this is not possible, but I feel like there should be an elegant way to accomplish what I am describing.
Try simplifying the process:
delta <- function(x) (x + dplyr::lag(x)) /x
cols <- c("wt", "mpg")
#This
library(dplyr)
mtcars %>% mutate_at(cols, delta)
#Or
library(purrr)
mtcars %>% map_at(cols, delta)
#If necessary, in a function
f <- function(df, cols) {
df %>% mutate_at(cols, delta)
}
f(iris, c("Sepal.Width", "Petal.Length"))
f(mtcars, c("wt", "mpg"))
Edit
If you would like to assign new names afterwards, you can write a custom pipe-ready function (note that this renames the mutated columns, so the original values are not kept):
Rename <- function(object, old, new) {
names(object)[names(object) %in% old] <- new
object
}
mtcars %>%
mutate_at(cols, delta) %>%
Rename(cols, paste0("lagged",cols))
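If you want to keep the original columns and add the results as new columns (named wt_lagged and mpg_lagged), give the function a name:
mtcars %>% mutate_at(cols, list(lagged = delta))  # funs() is deprecated; a named list works the same way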
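For the EDIT 1 requirements (add new columns alongside the originals, with generated names), a base R sketch: apply one function to each selected column with `lapply()`, prefix the names with `d` as `calculate_delta()` does, and bind the results back onto the data frame. The helpers `lag1`, `pct_change` and `add_deltas` are hypothetical, and `pct_change` assumes that year-over-year percent change means (x - previous) / previous; the formula in the question, (a + lag(a))/a, adds rather than subtracts, so swap in whichever function you actually need.

```r
# Hypothetical helpers (base R, no dplyr/purrr)
lag1 <- function(x) c(NA, x[-length(x)])           # previous value, NA for the first row
pct_change <- function(x) (x - lag1(x)) / lag1(x)  # assumed definition of percent change

add_deltas <- function(df, cols, f = pct_change) {
  new <- lapply(df[cols], f)       # one result vector per selected column
  names(new) <- paste0("d", cols)  # same naming scheme as calculate_delta()
  df[names(new)] <- new            # add as new columns; originals are untouched
  df
}

out <- add_deltas(mtcars, c("wt", "mpg"))
head(out[c("wt", "dwt", "mpg", "dmpg")])
```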

Resources