R: rolling correlation / non-standard evaluation - r

I am trying to calculate a rolling correlation on a tibble, iterating through column names in a loop. I am struggling to pass variables to a function, though. This works:
tbl <- tibble(date = seq(as.Date("1983-03-31"), by=7, length.out=100),
col1 = 1:100, col2 = sample(100, size = 100, replace=TRUE), col3 = col1 + col2)
tbl %>%
tq_mutate_xy(
x = col1,
y = col3,
mutate_fun = runCor,
n = 10,
use = "pairwise.complete.obs",
col_rename = "col1_col3_corr"
)
But this doesn't:
tbl <- tibble(date = seq(as.Date("1983-03-31"), by=7, length.out=100),
col1 = 1:100, col2 = sample(100, size = 100, replace=TRUE), col3 = col1 + col2)
c1 <- "col1"
c2 <- "col3"
tbl %>%
tq_mutate_xy(
x = !!c1,
y = !!c2,
mutate_fun = runCor,
n = 10,
use = "pairwise.complete.obs",
col_rename = paste0(c1, "_", c2, "_corr")
)
The error is "Error in check_x_y_valid(data, x, y) : x = !(!c1) not a valid name."
What am I doing wrong?

First, I think you want the underscore variant of tq_mutate_xy, that is, tq_mutate_xy_ (the tidyquant documentation describes it as the one set up for non-standard evaluation, NSE). That variant takes column names as character strings rather than bare names, so no !! is needed, and your mutate_fun argument should also be a character string. The following should work:
c1 <- "col1"
c2 <- "col3"
tbl %>%
tq_mutate_xy_(
x = c1,
y = c2,
mutate_fun = "runCor",
n = 10,
use = "pairwise.complete.obs",
col_rename = paste0(c1, "_", c2, "_corr")
)
Be sure to look at example 5 in the help documentation, ?tq_mutate_xy.
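Since the question mentions iterating through column names in a loop, here is a minimal sketch of how the string-based variant could be used for that. The list of column pairs and the name `result` are my own assumptions for illustration; the arguments to tq_mutate_xy_ are the same ones used above.

```r
library(tidyquant)
library(tibble)

set.seed(42)
tbl <- tibble(date = seq(as.Date("1983-03-31"), by = 7, length.out = 100),
              col1 = 1:100,
              col2 = sample(100, size = 100, replace = TRUE),
              col3 = col1 + col2)

# Hypothetical list of column pairs to correlate (an assumption for this sketch)
pairs <- list(c("col1", "col3"), c("col2", "col3"))

result <- tbl
for (p in pairs) {
  # Column names are passed as plain strings, so no unquoting is required
  result <- tq_mutate_xy_(
    result,
    x = p[1],
    y = p[2],
    mutate_fun = "runCor",
    n = 10,
    use = "pairwise.complete.obs",
    col_rename = paste0(p[1], "_", p[2], "_corr")
  )
}
```

Each pass through the loop should append one rolling-correlation column, named after the pair it was computed from.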

Related

Convert dplyr::select structure to base R

I have this data:
df_1 <- data.frame(
x = replicate(
n = 10, expr = runif(n = 1000, min = 20, max = 100)
)
)
My code:
library(dplyr)
df_1 |>
(\(x) cbind(x, r = apply(x[colnames(x = select(x, where(is.numeric) & head(x = everything(x), 2) & starts_with("x.")))], 1, sum, na.rm = T)))()
I tried using [ instead of colnames, but it doesn't work. I want to convert this part to base R (all conditions applied simultaneously, as in the dplyr::select structure above):
[colnames(x = select(x, where(is.numeric) & head(x = everything(x), 2) & starts_with("x.")))]
to base R.
Just do:
transform(df_1, r = x.1 + x.2)
or even:
cbind(df_1, r = rowSums(df_1[1:2]))
or even:
cbind(df_1, r = df_1[[1]] + df_1[[2]])  # [[ ]] returns vectors, so the new column is really named r
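If you would rather keep the three conditions explicit instead of hard-coding the columns, a rough base R translation of the select() call could look like the sketch below. The helper names (is_num, first_two, has_prefix, keep) are made up for this example.

```r
set.seed(1)
df_1 <- data.frame(
  x = replicate(n = 10, expr = runif(n = 1000, min = 20, max = 100))
)

# One logical vector per condition, each with one entry per column
is_num     <- vapply(df_1, is.numeric, logical(1))  # where(is.numeric)
first_two  <- seq_along(df_1) <= 2                  # first two columns
has_prefix <- startsWith(names(df_1), "x.")         # starts_with("x.")

# Combine the conditions with &, as in the select() call
keep <- is_num & first_two & has_prefix

result <- cbind(df_1, r = rowSums(df_1[keep], na.rm = TRUE))
```

With this data, keep picks out x.1 and x.2, so r is the same row sum as in the shorter versions above.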

How to plot sjPlots from a nested tibble?

I create some models like this using a nested tidyr dataframe:
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(purrr)
fits <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0, sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1, sample(10, replace = T), sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data=-group) %>%
mutate(fit= map(data, ~glm(formula = colA ~ colB + colC, data = .x, family="binomial"))) %>%
dplyr::select(group, fit) %>%
tibble::column_to_rownames("group")
I would like to use this data to create some quick marginal effects plots with sjPlot::plot_models like this
plot_models(as.list(fits), type = "pred", terms = c("colB", "colA", "colC"))
Unfortunately, I get the error
Error in if (fam.info$is_linear) tf <- NULL else tf <- "exp" :
argument is of length zero
In addition: Warning message:
Could not access model information.
I've played around a bit with the nesting of the data but I've been unable to get it into a format that sjPlot::plot_models will accept.
What I was expecting to get is a "Forest plot of multiple regression models" as described in the help file. Ultimately, the goal is to plot the marginal effects of regression models by group, which I was hoping the plot_models will do (please correct me if I'm wrong).
I think there are some issues with the original code as well as with the data. The function call passes arguments from plot_model that plot_models does not support. I first show an example of how plot_models can be called and used with a nested tibble, using {ggplot2}'s diamonds data set. Then I apply this approach to the OP's sample data, which doesn't yield usable results*. Finally, I create some new toy data to show how the approach could be applied to a binomial model.
(* In the original toy data the dependent variable is either always 0 or always 1 in each model so this is unlikely to yield useable results).
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(ggplot2)
# general example
fits <- tibble(id = c("x", "y", "z")) %>%
rowwise() %>%
mutate(fit = list(glm(reformulate(
termlabels = c("cut", "color", "depth", "table", "price", id),
response = "carat"),
data = diamonds)))
plot_models(fits$fit)
# OP's example data
fits2 <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0,
sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1,
sample(10, replace = T),
sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data = -group) %>%
rowwise() %>%
mutate(fit = list(glm(formula = colA ~ colB + colC, data = data, family="binomial")))
plot_models(fits2$fit)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 4 rows containing missing values (geom_point).
# new data for binomial model
n <- 500
g <- round(runif(n, 0L, 1L), 0)
x1 <- runif(n,0,100)
x2 <- runif(n,0,100)
y <- (x2 - x1 + rnorm(n,sd=20)) < 0
fits3 <- tibble(g, y, x1, x2) %>%
nest_by(g) %>%
mutate(fit = list(glm(formula = y ~ x1 + x2, data = data, family="binomial")))
plot_models(fits3$fit)
Created on 2021-01-23 by the reprex package (v0.3.0)
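If the goal is marginal effects per group rather than a forest plot, one option may be to call plot_model (singular) on each fitted model, since that is the function that accepts type = "pred" and terms. The following is only a sketch under that assumption: it rebuilds toy data like the above, fits one model per group with split() and lapply(), and relies on the ggeffects package being installed. The names d, fits_by_g and pred_plots are illustrative.

```r
library(sjPlot)

set.seed(1)
n <- 500
d <- data.frame(
  g  = round(runif(n, 0, 1), 0),
  x1 = runif(n, 0, 100),
  x2 = runif(n, 0, 100)
)
d$y <- as.integer((d$x2 - d$x1 + rnorm(n, sd = 20)) < 0)

# One binomial model per group
fits_by_g <- lapply(split(d, d$g), function(dat) {
  glm(y ~ x1 + x2, data = dat, family = "binomial")
})

# One predicted-probability plot per group, for the term x1
pred_plots <- lapply(fits_by_g, plot_model, type = "pred", terms = "x1")
```

pred_plots is a list with one plot per group, so pred_plots[["0"]] and pred_plots[["1"]] can be printed or arranged side by side.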

Can you pipe data into a pairwise.t.test?

I'm wondering if the following code can be simplified to allow the data to be piped directly from the summarise command to the pairwise.t.test, without creating the intermediary object?
data_for_PTT <- data %>%
group_by(subj, TT) %>%
summarise(meanRT = mean(RT))
pairwise.t.test(x = data_for_PTT$meanRT, g = data_for_PTT$TT, paired = TRUE)
I tried x = .$meanRT but it didn't like it, returning:
Error in match.arg(p.adjust.method) :
'arg' must be NULL or a character vector
You can use curly braces:
data %>%
group_by(subj, TT) %>%
summarise(meanRT = mean(RT)) %>%
{pairwise.t.test(x = .$meanRT, g = .$TT, paired = TRUE)}
Reproducible:
library(magrittr)  # provides %>%
df <- data.frame(X1 = runif(1000), X2 = runif(1000), subj = rep(c("A", "B")))
df %>%
{pairwise.t.test(.$X1, .$subj, paired = TRUE)}
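As far as I understand magrittr, the original attempt fails because the dot only appears inside nested expressions (.$meanRT), so the pipe still inserts the data frame as an extra first argument. With x and g named, that extra argument lands on the next free parameter, p.adjust.method, which would explain the match.arg error. Besides curly braces, two other ways to avoid this are sketched below: magrittr's exposition pipe %$% and base R's with().

```r
library(magrittr)

set.seed(1)
df <- data.frame(X1 = runif(1000), subj = rep(c("A", "B"), times = 500))

# Curly braces: the data is not inserted as a first argument
res_braces <- df %>%
  {pairwise.t.test(.$X1, .$subj, paired = TRUE)}

# Exposition pipe: columns are visible by name on the right-hand side
res_expose <- df %$%
  pairwise.t.test(X1, subj, paired = TRUE)

# with(): the data frame becomes the first argument of with()
res_with <- df %>%
  with(pairwise.t.test(X1, subj, paired = TRUE))
```

All three calls run the same test; which one to use is mostly a matter of taste.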

Plot a list using R

I would like to know how to plot a list.
Now I have a list
[[1977]] keyword1, keyword2, keyword3, ...
[[1978]] keyword2, keyword5, ...
...
[[2018]] keyword1, keywords3, ...
length(mylist) = 2018
lengths(mylist) = 0,0,0,0,0,......
dput(head(mylist)) = list(NULL, NULL, NULL, NULL, NULL, NULL)
And I would like to plot it using keywords' frequencies as the y-axis and 1977~2018 as the x-axis.
So it should have many lines equal to the number of the keywords. Does anyone have any idea?
Try this example:
# example data
set.seed(1); myList <- list(sample(LETTERS[1:3], 10, replace = TRUE),
sample(LETTERS[1:3], 10, replace = TRUE),
sample(LETTERS[1:3], 10, replace = TRUE),
sample(LETTERS[1:3], 10, replace = TRUE),
sample(LETTERS[1:3], 10, replace = TRUE))
names(myList) <- 1977:1981
library(ggplot2)
library(dplyr)
plotDat <- stack(myList) %>%
mutate(myYears = as.numeric(as.character(ind)),
myWords = values) %>%
group_by(myYears, myWords) %>%
summarise(myCount = n())
ggplot(plotDat, aes(x = myYears, y = myCount, col = myWords)) +
geom_line()
You can probably use data.table::rbindlist() to create a long data.table, then summarise it into a frequency table to plot with ggplot2 functions.
# using example data from @zx8754's answer.
library( data.table )
library( ggplot2 )
dt <- data.table::rbindlist( lapply( myList, as.data.table ), idcol = "year" )
dt <- dt[, .N, by = list(year, V1) ]
ggplot( data = dt, aes( x = year, y = N, group = V1, fill = V1 )) + geom_col( color = "black" )
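Note that the list in the question is indexed by year (length 2018, with the early elements NULL), whereas both answers assume a short named list. A small base R sketch of getting from one shape to the other follows; the keyword values are invented stand-ins, since the real data is not shown.

```r
# Stand-in for the question's list: positions are years, most are NULL
mylist <- vector("list", 2018)
mylist[[1977]] <- c("keyword1", "keyword2", "keyword3")
mylist[[1978]] <- c("keyword2", "keyword5")
mylist[[2018]] <- c("keyword1", "keyword3")

# Name each element by its position, then drop the empty ones
names(mylist) <- seq_along(mylist)
nonEmpty <- mylist[lengths(mylist) > 0]

# Long format, then a year-by-keyword frequency table (zeros included)
long <- stack(nonEmpty)
counts <- as.data.frame(table(year = long$ind, keyword = long$values),
                        responseName = "n", stringsAsFactors = FALSE)
counts$year <- as.numeric(counts$year)
```

counts has one row per year and keyword combination, so it can go straight into a ggplot call like the ones above, with year on the x-axis, n on the y-axis and keyword as the colour.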

Quosure within a nested function

I am struggling to write a function fun2 that uses fun1, and I keep getting errors. I have written a simplified example below. It is the first time I have dealt with "tidy evaluation" and I am not sure I understand the ins and outs of it.
Example dataframes:
d1 = data.frame(
ID = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
EXPR = c(2, 8, 3, 5, 7, 20, 1, 5, 4)
)
d2 = data.frame(
ID = c("A", "B", "C"),
NUM = c(22, 50, 31)
)
First function
fun1 <- function(
df1 = "df 1",
df2 = "df 2",
t1 = "threshold 1",
expr_col = "expr column",
id_col = "sample column - must be present in df1 and df2") {
# dataframes
df <- df1
db <- df2
# quosure
enquo_id <- enquo(id_col)
enquo_expr <- enquo(expr_col)
# classify
df <- df %>%
mutate(threshold = t1) %>%
mutate(class = ifelse(!!enquo_expr > t1, "positive", "negative")) %>%
mutate(class = factor(class, levels = c("positive", "negative")))
# calculate sample data
df.sum <- df %>%
group_by(!!enquo_id, class) %>%
summarise(count = n()) %>%
complete(class, fill = list(count = 0)) %>%
mutate(total = sum(count), freq = count/total)
# merge dataframes
df.sum <- left_join(df.sum, db, by = quo_name(enquo_id))
# return
return(df.sum)
}
If I run a test of this, I get a dataframe in return, as expected
test <- fun1(df1 = d1, df2 = d2, t1 = 3, expr_col = EXPR, id_col = ID)
Second function
Now with fun2, I am trying to use fun1 in a for loop to iterate from ti to tf of the seq vector:
fun2 <- function(
df1 = "df 1",
df2 = "df 2",
expr_col = "expr column",
id_col = "sample column - must be present in df1 and df2",
ti = "initial value",
tf = "final value",
res = "resolution") {
# define variables for fun1
var1 <- enquo(d1)
var2 <- enquo(d2)
var3 <- enquo(t1)
var4 <- enquo(EXPR)
var5 <- enquo(ID)
# get sequence of values
seq <- seq(from = ti, to = tf, by = res)
# open list
t.list <- list()
# Loop ----
for (i in seq_along(seq)){
t1 <- seq[i]
t.list[[i]] <- fun1(df1 = var1,
df2 = var2,
t1 = var3,
expr_col = var4,
id_col = var5)
}
df.out <- plyr::ldply(t.list, rbind)
### Return ---
return(df.out)
}
But if I run this
test <- fun2(df1 = d1, df2 = d2, expr_col = EXPR, id_col = ID, ti = 1, tf = 10, res = 1)
I get an error message
Error in (function (x) : object 'EXPR' not found
I tried various things... and I am kind of stuck here. I guess I am not using enquo() properly. I can get it to work by not using varX and putting directly the actual appropriate name of each element in the fun1 arguments, but the whole point of doing this, to me, is to make it "generalisable" and therefore specify the arguments only in fun2 which will then be passed to fun1.
Any help would be greatly appreciated.
Many thanks for your answer aosmith. I am now sorted using the following code:
fun2 <- function(
df1 = "df 1",
df2 = "df 2",
expr_col = "expr column",
id_col = "sample column - must be present in df1 and df2",
ti = "initial value",
tf = "final value",
res = "resolution") {
# define variables for fun1
var4 <- enquo(expr_col)
var5 <- enquo(id_col)
# get sequence of values
seq <- seq(from = ti, to = tf, by = res)
# open list
t.list <- list()
### Loop --------------------------------------------------------------
for (i in seq_along(seq)){
t1 <- seq[i]
t.list[[i]] <- fun1(df1 = df1,
df2 = df2,
t1 = t1,
expr_col = !!var4,
id_col = !!var5)
}
df.out <- plyr::ldply(t.list, rbind)
### Return ---
return(df.out)
}
# TEST FUN2
test <- fun2(df1 = d1, df2 = d2, expr_col = EXPR, id_col = ID, ti = 1, tf = 10, res = 1)
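As I read it, the first attempt failed because enquo() has to be given the function's own argument names (expr_col, id_col), not the objects the caller happens to pass (EXPR, ID). With rlang 0.4 or later, the embrace operator {{ }} may be a shorter way to forward column arguments from an outer function to an inner one. The sketch below shows the pattern on a deliberately simplified stand-in for fun1 (no join, no complete()); count_above and sweep_thresholds are hypothetical names, not the functions from the question.

```r
library(dplyr)

d1 <- data.frame(
  ID = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  EXPR = c(2, 8, 3, 5, 7, 20, 1, 5, 4)
)

# Inner function: counts rows per id whose value exceeds threshold t1
count_above <- function(df, expr_col, id_col, t1) {
  df %>%
    group_by({{ id_col }}) %>%
    summarise(n_above = sum({{ expr_col }} > t1), .groups = "drop") %>%
    mutate(threshold = t1)
}

# Outer function: forwards the column arguments with {{ }} for each threshold
sweep_thresholds <- function(df, expr_col, id_col, ti, tf, res) {
  thresholds <- seq(from = ti, to = tf, by = res)
  out <- vector("list", length(thresholds))
  for (i in seq_along(thresholds)) {
    out[[i]] <- count_above(df, {{ expr_col }}, {{ id_col }}, thresholds[i])
  }
  bind_rows(out)
}

test <- sweep_thresholds(d1, expr_col = EXPR, id_col = ID, ti = 1, tf = 10, res = 1)
```

Here {{ expr_col }} does the same job as the enquo() plus !! pair in the working fun2 above, just in one step.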

Resources