Read functions as text and use for plotting - r

I have a set of 500 equations listed in a single column of a .csv file. The equations are written as text like this (for example):
15+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2
(this is the "right" side of the equation, which equals "y", but the text "y=" does not appear in the .csv file)
These are general linear models that have been written to a .csv file by someone else. Not all models have the same number of variables.
I would like to read these functions into R and format them in a way that will allow for using them to (iteratively) make simple line plots (one for each n = 500 models) of "y" across a range of values for A (shown on the x-axis), given values of B, C, and D.
Does anyone have any suggestions for how to do this?

I thought of something based on this [post][1], it is not the best solution, but it seems to work.
Equations
Created two equations for an example
models <- c("15+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2","50+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2")
models_names <- c("model1","model2")
Data
Random data as an example
data <-
tibble(
A = rnorm(100),
B = rnorm(100),
C = rnorm(100),
D = rnorm(100)
)
Function
Then a created a function that uses those text equations and apply as function returning the values
text_model <- function(formula){
eval(parse(text = paste('f <- function(A,B,C,D) { return(' , formula , ')}', sep='')))
out <- f(data$A,data$B,data$C,data$D)
return(out)
}
Applied equations
Finally, I apply each equation for the data, binding both.
data %>%
bind_cols(
map(.x = models,.f = text_model) %>%
set_names(models_names) %>%
bind_rows(.id = "model")
)
# A tibble: 100 x 6
A B C D model1 model2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.0633 1.18 -0.409 2.01 9.52 54.9
2 -0.00207 1.35 1.28 1.59 9.16 40.3
3 0.798 -0.141 1.58 -0.123 20.6 63.2
4 -0.162 -0.0795 0.408 0.663 14.3 52.0
5 -1.11 0.788 -1.37 1.20 4.71 46.0
6 2.80 1.84 -0.850 0.161 24.4 68.7
7 1.03 0.550 0.907 -1.92 19.0 60.8
8 0.515 -0.179 -0.980 0.0437 19.0 48.9
9 -0.353 0.0643 1.39 1.30 12.5 55.3
10 -0.427 -1.01 -1.11 -0.547 16.7 39.3
# ... with 90 more rows

Related

create new variables from formulas stored in a list using dplyr

I have a list of formulas which I want to use to create new variables with mutate. For each formula stored in my list, I want to create a new variable. I want to automatically generate one variable for each element in my list. This is my code
library("dplyr")
library("purrr")
library("formula.tools")
t<-10 #just some constant which needs to be included (and found within my pipe)
ut <- list( # my list with the formulas as elements
v1 = V.1 ~ A * B*t,
v2 = V.2 ~ A+B)
data <- tibble(A=rnorm(10),B=runif(10)) %>% ## the dataset
mutate(!!lhs(ut[["v1"]]) := !!rhs(ut[["v1"]]),
!!lhs(ut[["v2"]]) := !!rhs(ut[["v2"]]))
This works fine. However, I do not want to write this for each element in my function. I want to mutate to take each element of the list, and apply the formula, i.e. I need some kind of loop. I tried with across, but across requires existing variables.
I tried to wrap it into a function and use map, but this didn't work
by_formula <- function(equation){
!!lhs(equation) := !!rhs(equation)
}
data <- tibble(A=rnorm(10),B=runif(10)) %>%
mutate(map(ut,by_formula))
I appreciate any hints how to do this so that I do not need to worry about the length of the list. This should be part of a function where the length of the list depends on the user input.
Here is one way
library(dplyr)
library(purrr)
library(formula.tools)
by_formula <- function(equation){
# //! cur_data_all may get deprecated in favor of pick
# pick(everything()) %>%
cur_data_all() %>%
transmute(!!lhs(equation) := !!rhs(equation) )
}
tibble(A=rnorm(10),B=runif(10)) %>%
mutate(map_dfc(ut, by_formula))
-output
# A tibble: 10 × 4
A B V.1 V.2
<dbl> <dbl> <dbl> <dbl>
1 1.73 0.0770 1.33 1.80
2 -1.46 0.894 -13.0 -0.562
3 -0.620 0.804 -4.99 0.184
4 0.834 0.524 4.37 1.36
5 -0.980 0.00581 -0.0569 -0.974
6 -0.361 0.316 -1.14 -0.0444
7 1.73 0.833 14.4 2.57
8 1.71 0.512 8.74 2.22
9 0.233 0.944 2.20 1.18
10 -0.832 0.474 -3.94 -0.358

Export ‘epiR" Output to Tables

Good Morning,
i am using the "epiR" packages to assess test accuracy.
https://search.r-project.org/CRAN/refmans/epiR/html/epi.tests.html
## Generate a data set listing test results and true disease status:
dis <- c(rep(1, times = 744), rep(0, times = 842))
tes <- c(rep(1, times = 670), rep(0, times = 74),
rep(1, times = 202), rep(0, times = 640))
dat.df02 <- data.frame(dis, tes)
tmp.df02 <- dat.df02 %>%
mutate(dis = factor(dis, levels = c(1,0), labels = c("Dis+","Dis-"))) %>%
mutate(tes = factor(tes, levels = c(1,0), labels = c("Test+","Test-"))) %>%
group_by(tes, dis) %>%
summarise(n = n())
tmp.df02
## View the data in conventional 2 by 2 table format:
pivot_wider(tmp.df02, id_cols = c(tes), names_from = dis, values_from = n)
rval.tes02 <- epi.tests(tmp.df02, method = "exact", digits = 2,
conf.level = 0.95)
summary(rval.tes02)
The data type is listed as "epi.test". I would like to export the summary statistics to a table (i.e. gtsummary or flextable).
As summary is a function of base R, I am struggling to do this. Can anyone help? Thank you very much
The epi.tests function has been edited so it writes the results out to a data frame (instead of a list). This will simplify export to gtsummary or flextable. epiR version 2.0.50 to be uploaded to CRAN shortly.
This was not quite as straight forward as I expected.
It appears that summary() when applied to an object x of class epi.tests simply prints x$details. x$details is a list of data.frames with statistic names as row names. That last bit makes things slightly more complicated than they would otherwise have been.
A potential tidyverse solution is
library(tidyverse)
lapply(
names(rval.tes02$detail),
function(x) {
as_tibble(rval.tes02$detail[[x]]) %>%
add_column(statistic=x, .before=1)
}
) %>%
bind_rows()
# A tibble: 18 × 4
statistic est lower upper
<chr> <dbl> <dbl> <dbl>
1 ap 0.550 0.525 0.574
2 tp 0.469 0.444 0.494
3 se 0.901 0.877 0.921
4 sp 0.760 0.730 0.789
5 diag.ac 0.826 0.806 0.844
6 diag.or 28.7 21.5 38.2
7 nndx 1.51 1.41 1.65
8 youden 0.661 0.607 0.710
9 pv.pos 0.768 0.739 0.796
10 pv.neg 0.896 0.872 0.918
11 lr.pos 3.75 3.32 4.24
12 lr.neg 0.131 0.105 0.163
13 p.rout 0.450 0.426 0.475
14 p.rin 0.550 0.525 0.574
15 p.tpdn 0.240 0.211 0.270
16 p.tndp 0.0995 0.0789 0.123
17 p.dntp 0.232 0.204 0.261
18 p.dptn 0.104 0.0823 0.128
Which is a tibble containing the same information as summary(rval.tes02), which you should be able to pass on to gtsummary or flextable. Unusually, the broom package doesn't have a tidy() verb for epi.tests objects.

Append output from iterative mapped function

This is a follow-up to a previous question: Read functions as text and use for plotting
The output of the mapped function...
data %>%
bind_cols(
map(.x = models,.f = text_model) %>%
set_names(models) %>%
bind_rows(.id = "model")
)
...generates a data frame with the results of each function written to a separate column (with the function included in the column headers).
However, it would be best to have the output from each function appended such that all results are included in the same column with a separate column to keep track of which function ("model001", "model002",..."model500") generated the results.
How can the code from the previous question (Read functions as text and use for plotting) be adjusted to write the results in this manner?
Edit: Someone suggested Read functions as text and use for plotting as an answer, but this post is a follow-up to that one asking about how the output can be written to a single column (rather than a sperate column for each function).
Given the other answer, we can pivot the data
data %>%
bind_cols(
map(.x = models,.f = text_model) %>%
set_names(models_names) %>%
bind_rows(.id = "model")
) %>%
pivot_longer(cols = model1:model2,names_to = "model")
# A tibble: 200 x 6
A B C D model value
<dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 0.833 0.538 0.647 1.65 model1 22.9
2 0.833 0.538 0.647 1.65 model2 57.9
3 2.07 1.20 -0.748 -2.04 model1 35.3
4 2.07 1.20 -0.748 -2.04 model2 70.3
5 0.880 -0.199 1.08 1.04 model1 29.2
6 0.880 -0.199 1.08 1.04 model2 64.2
7 0.252 0.400 1.45 -0.0645 model1 15.6
8 0.252 0.400 1.45 -0.0645 model2 50.6
9 0.746 0.0943 -1.00 1.44 model1 20.4
10 0.746 0.0943 -1.00 1.44 model2 55.4
# ... with 190 more rows

Export results from split and lapply to a csv or Excel file

I'm using the split() and lapply functions to run Mann Kendall trend tests in bulk. In the code below, split() separates the results (ConcLow) by Analyte (water quality parameter). Then lapply runs the MannKendall and summary for each. The output goes to the console (example shown below code), but I'd like it to go into an Excel or cvs document so I can work with it. Ideally the Excel document would have the analyte (TOC for example) in the first column, then end column = tau value, 3rd column = pvalue. Then the next tab or following columns would display results from the summary function. Any assistance you can provide is greatly appreciated! I'm quite new to R.
mk.analyte <- split(BarkTop$ConcLow, BarkTop$Analyte)
lapply(mk.analyte, MannKendall)
lapply(mk.analyte, summary)
Output for each analyte looks like this (abbreviated here, but it's a long list):
$TOC
tau = 0.0108, 2-sided pvalue =0.8081
$TOC
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.378 2.054 2.255 2.434 2.600 4.530
Data look like this:
Date Location Analyte ConcLow Units
5/8/2000 Barker Res. Hardness 3.34 mg/L (as CaCO3)
11/24/2000 Barker Res. Hardness 9.47 mg/L (as CaCO3)
6/12/2001 Barker Res. Hardness 1.4 mg/L (as CaCO3)
12/29/2001 Barker Res. Hardness 21.9 mg/L (as CaCO3)
7/17/2002 Barker Res. Fe (diss 81 ug/L
2/2/2003 Barker Res. Fe (diss 90 ug/L
8/21/2003 Barker Res. Fe (diss 0.08 ug/L
3/8/2004 Barker Res. Fe (diss 15.748 ug/L
9/24/2004 Barker Res. TSS 6.2 mg/L
4/12/2005 Barker Res. TSS 8 mg/L
10/29/2005 Barker Res. TSS 10 mg/L
In my own opinion, I would use the tidyverse, as it is easier to read.
Short way:
#Sample data
set.seed(42)
df <- data.frame(
Location = replicate(1000, sample(letters[1:15], 1)),
Analyte = replicate(1000, sample(c("Hardness", "TSS", "Fe"), 1)),
ConcLow = runif(1000, 1, 30))
#Soltion
df %>%
nest(-Location, -Analyte) %>%
mutate(
mannKendall = purrr::map(data, function(x) {
broom::tidy(Kendall::MannKendall(x$ConcLow))}),
sumData = purrr::map(data, function(x) {
broom::tidy(summary(x$ConcLow))})) %>%
select(-data) %>%
unnest(mannKendall, sumData) %>%
write_excel_csv(path = "mydata.xls")
#How the table looks like:
# A tibble: 45 x 13
Location Analyte statistic p.value kendall_score denominator var_kendall_sco~ minimum q1 median
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 n Fe 0.264 0.0907 61 231. 1258. 1.38 14.4 20.6
2 o Hardne~ 0.0870 0.568 24 276. 1625. 2.02 9.52 18.3
3 e Fe -0.108 0.499 -25 231. 1258. 1.14 9.24 15.9
4 m TSS -0.00654 1 -1 153 697 2.19 5.89 10.4
5 j TSS -0.158 0.363 -27 171. 817 1.20 6.44 12.8
6 h Hardne~ 0.0909 0.466 48 528 4165. 4.28 11.1 19.4
7 l TSS -0.0526 0.780 -9 171. 817 5.39 12.5 21.1
8 c Fe -0.0736 0.652 -17 231. 1258. 1.63 5.87 10.6
9 j Hardne~ 0.415 0.0143 71 171. 817 4.50 11.7 15.4
10 k Fe -0.146 0.342 -37 253. 1434. 2.68 12.3 15.4
# ... with 35 more rows, and 3 more variables: mean <dbl>, q3 <dbl>, maximum <dbl>
Long way
It's a bit backwards but you can do something below.
Please note that I used subset from the mtcars dataset for my solution.
require(tidyverse)
df <- mtcars %>%
select(cyl, disp)
wilx <- df %>%
split(.$cyl) %>%
map(function(x) {broom::tidy(wilcox.test(x$disp, paired = FALSE,
exact = FALSE))})
sumData <- df %>%
split(.$cyl) %>%
map(function(x) {summary(x$disp)})
for (i in 1:length(wilx)) {
write_excel_csv(as.data.frame(wilx[i]), path = paste0(getwd(), "/wilx", i, ".xls"))
write_excel_csv(as.data.frame(unlist(sumData[i])), path = paste0(getwd(), "/sumData", i, ".xls"))
}

Lookup function for mutate in data

I'd like to store functions, or at least their names, in a column of a data.frame for use in a call to mutate. A simplified broken example:
library(dplyr)
expand.grid(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd')) %>%
group_by(mu, sd, stat) %>%
mutate(sample = get(stat)(rnorm(100, mu, sd))) %>%
ungroup()
If this worked how I thought it would, the value of sample would be generated by the function in .GlobalEnv corresponding to either 'mean' or 'sd', depending on the row.
The error I get is:
Error in mutate_impl(.data, dots) :
Evaluation error: invalid first argument.
Surely this has to do with non-standard evaluation ... grrr.
A few issues here. First expand.grid will convert character values to factors. And get() doesn't like working with factors (ie get(factor("mean")) will give an error). The tidyverse-friendly version is tidyr::crossing(). (You could also pass stringsAsFactors=FALSE to expand.grid.)
Secondly, mutate() assumes that all functions you call are vectorized, but functions like get() are not vectorized, they need to be called one-at-a-time. A safer way rather than doing the group_by here to guarantee one-at-a-time evaluation is to use rowwise().
And finally, your real problem is that you are trying to call get("sd") but when you do, sd also happens to be a column in your data.frame that is part of the mutate. Thus get() will find this sd first, and this sd is just a number, not a function. You'll need to tell get() to pull from the global environment explicitly. Try
tidyr::crossing(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd')) %>%
rowwise() %>%
mutate(sample = get(stat, envir = globalenv())(rnorm(100, mu, sd)))
Three problems (that I see): (1) expand.grid is giving you factors; (2) get finds variables, so using "sd" as a stat is being confused with the column names "sd" (that was hard to find!); and (3) this really is a row-wise operation, grouping isn't helping enough. The first is easily fixed with an option, the second can be fixed by using match.fun instead of get, and the third can be mitigated with dplyr::rowwise, purrr::pmap, or base R's mapply.
This helper function was useful during debugging and can be used to "clean up" the code within mutate, but it isn't required (for other than this demonstration). Inline "anonymous" functions will work as well.
func <- function(f,m,s) get(f)(rnorm(100,mean=m,sd=s))
Several implementation methods:
set.seed(0)
expand.grid(mu = 1:5, sd = c(1, 10), stat = c('mean', 'sd'),
stringsAsFactors=FALSE) %>%
group_by(mu, sd, stat) %>% # can also be `rowwise() %>%`
mutate(
sample0 = match.fun(stat)(rnorm(100, mu, sd)),
sample1 = purrr::pmap_dbl(list(stat, mu, sd), ~ match.fun(..1)(rnorm(100, ..2, ..3))),
sample2 = purrr::pmap_dbl(list(stat, mu, sd), func),
sample3 = mapply(function(f,m,s) match.fun(f)(rnorm(100,m,s)), stat, mu, sd),
sample4 = mapply(func, stat, mu, sd)
) %>%
ungroup()
# # A tibble: 20 x 8
# mu sd stat sample0 sample1 sample2 sample3 sample4
# <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 mean 1.02 1.03 0.896 1.08 0.855
# 2 2 1 mean 1.95 2.07 2.05 1.90 1.92
# 3 3 1 mean 2.93 3.07 3.03 2.89 3.01
# 4 4 1 mean 4.01 3.94 4.23 4.05 3.96
# 5 5 1 mean 5.04 5.11 5.05 5.17 5.19
# 6 1 10 mean 1.67 1.21 1.30 2.08 -0.641
# 7 2 10 mean 1.82 2.82 2.35 3.65 1.78
# 8 3 10 mean 1.45 3.10 3.15 4.28 2.58
# 9 4 10 mean 3.49 6.33 5.11 2.84 3.41
# 10 5 10 mean 5.33 4.85 4.07 5.58 6.66
# 11 1 1 sd 0.965 1.04 0.993 0.942 1.08
# 12 2 1 sd 0.974 0.967 0.981 0.984 1.15
# 13 3 1 sd 1.12 0.902 1.06 0.977 1.02
# 14 4 1 sd 0.946 0.928 0.960 1.01 0.992
# 15 5 1 sd 1.06 1.01 0.911 1.11 1.00
# 16 1 10 sd 9.46 8.95 10.0 8.91 9.60
# 17 2 10 sd 9.51 9.11 11.5 9.85 10.6
# 18 3 10 sd 9.77 9.96 11.0 9.09 10.7
# 19 4 10 sd 10.5 9.84 10.1 10.6 8.89
# 20 5 10 sd 11.2 8.82 10.4 9.06 9.64
sample0 happens to work because you have grouped it to be row-wise. If at some point any one grouping has two or more values, this will fail.
For sample1 through sample4, you can remove the group_by and it works equally well (though sample0 demonstrates its failing, so remove it too). You won't get identical results as above with grouping removed, because the entropy is being consumed differently.

Resources