Building a regression results table - r

I'm attempting to build a regression results table and I'm stuck. I'm getting the error:
Error in summary(mod)$coefficients[vars, "Estimate"] : subscript out of bounds.
I have all these models run and labeled as so. What I want my table to look like:
| | model1L | model2L | model3L | model1P | model2P | model3P |
|----------|----------|----------|----------|----------|----------|----------|
|price | coef1L | coef2L | coef3L | coef1P | coef2P | coef3P |
| | sd1L | sd2L | sd3L | sd1P | sd2P | sd3P |
|promoflag | coef1L | coef2L | coef3L | coef1P | coef2P | coef3P |
| | sd1L | sd2L | sd3L | sd1P | sd2P | sd3P |
my functions to extract key regression results from an estimated model
model_list = c("model1L","model2L","model3L", "model1P", "model2P", "model3P")
vars = c("price","promoflag")
building the table
results_table1 = function(model_list, vars) {
# build leftmost column of results table
outrec = c()
for (j in 1:length(vars)) {
outrec = c(outrec,sprintf("%s",vars[j]))
outrec = c(outrec,"")
}
outrec = c(outrec,"R^2")
outrec = c(outrec,"Observations")
outdf = as.data.frame(outrec)
# process each model
for (i in 1:length(model_list)) {
# extract estimates for this model
mod = eval(parse(text=model_list[i]))
estimates = summary(mod)$coefficients[vars,"Estimate"]
ses = summary(mod)$coefficients[vars,"Std. Error"]
pvals = summary(mod)$coefficients[vars,"Pr(>|t|)"]
# process each parameter of interest
outrec = c()
for (j in 1:length(vars)) {
# set significance stars
star = ""
if (pvals[j] <= .05) {star = "*"}
if (pvals[j] <= .01) {star = "**"}
if (pvals[j] <= .001) {star = "***"}
# output estimate and std err
outrec = c(outrec,sprintf("%.4f%s",estimates[j],star))
outrec = c(outrec,sprintf("(%.4f)",ses[j]))
}
# add R^2, # of observations to output
outrec = c(outrec,sprintf("%.4f",summary(mod)$r.squared[1]))
outrec = c(outrec,sprintf("%d",nobs(mod)))
outdf = cbind(outdf,outrec)
}
# set column names to model names
names(outdf) = c("",model_list)
outdf
}
outputting the sample results table
model_list = c("model1L", "model2L", "model3L", "model1P", "model2P", "model3P")
vars = c("price", "promoflag")
outdf = results_table1(model_list, vars)
library(knitr)
kable(outdf,align='c')

Related

Saving click coordinates from ggplotly using event_data() and updating multiple plots

I just want to preface this by saying that this is my very first post on stackexchange. So apologies ahead of time if I violate any sort of norms or guidelines. I'll do my best to correct or clarify anything in the future. This community has saved me countless hours of frustration in the past. I hoping you can do your magic once again.
My end goal is to create an R-shiny app where users upload any number of files (in this case .txt), go through a sequential set of inputs that culminates in a peak detection algorithm selecting points along a line plot created from data in each of the uploaded files. No peak detection alg is perfect though (especially with noisy data), so as added functionality I would like users to be able to select OR deselect points on the plot that they believe should OR should not be "peaks". Using various posts, notably here and here I have been able to do this successfully in an app with only one file upload. However, I am having difficulty extending it to the case with multiple file uploads.
I am not going to include the original app code that I mentioned above as it is lengthy and some parts are unnecessary to this particular request. Instead, as a motivating example, I have included code for an R-shiny app that I think should provide the skeleton for what I am looking for, albeit in a much simplified form. In the app below, the user would select the "+Add .txt file(s)" multiple times (in this case just two should suffice; I can extrapolate from there) to upload data. I can't attach data but I have included an example of two such datasets at the end that you can save as .txt (without column names) to upload to the app. In the code, 5 random points are colored a different color (red; these represent the peaks are determined by the peak detection algorithm). At this point I would like to be able to dynamically select/deselect different points in each plot, record them, and color them. Therefore, I need the set of the points already randomly selected + any new points selected (- points deselected). Hence the use of the outersect() function. Obviouly, this code doesn't do what I would like it to do, but I think I am really close. Especially since I have a 100% working version with just one file upload. Any help would be much appreciated! Let me know if I need to clarify anything!
library(shiny)
library(plotly)
library(ggplot2)
ui <- fluidPage(
actionButton("addFiles","+Add .txt file(s)"),
uiOutput("dataUploadOption"),
uiOutput("plot1")
)
server <- function(input, output) {
counter <- reactiveValues(countervalue = 0)
observeEvent(input$addFiles,{
counter$countervalue <- counter$countervalue + 1
output$dataUploadOption <- renderUI({
lapply(1:counter$countervalue,function(i){
fileInput(paste0("fileInput",i), "Choose file", accept = c(".txt"))
})
})
})
datList <- reactive({
for(i in 1:counter$countervalue){
req(input[[paste0("fileInput",i)]])
}
datList <- list()
for(i in 1:counter$countervalue){
dat <- read.table(input[[paste0("fileInput",i)]]$datapath,
header = FALSE,
sep = "\t")
datList[[i]] <- dat
}
return(datList)
})
#set.seed(1)
pointData <- reactive({
req(datList())
datList <- datList()
randdatList <- list()
for(i in 1:length(datList)){
dat <- datList[[i]]
set.seed(i)
randpts <- sample(1:nrow(dat), size=5)
randdat <- dat[randpts,]
randdatList[[i]] <- randdat
}
return(randdatList)
})
x_save <- vector()
y_save <- vector()
outersect <- function(x, y) {
sort(c(x[!x%in%y],
y[!y%in%x]))
}
plot1 <- reactive({
req(datList())
req(pointData())
datList <- datList()
pointData <- pointData()
pList <- list()
for(i in 1:length(datList)){
dat <- datList[[i]]
pdat <- pointData[[i]]
s <- paste0("source_", i)
cpoints <- event_data("plotly_click", source = s)
x_save <<- c(x_save, cpoints$x)
y_save <<- c(y_save, cpoints$y)
clickdata <- data.frame(x = x_save, y = y_save)
osx <- outersect(pdat$V1, clickdata$x)
osy <- outersect(pdat$V2, clickdata$y)
clickdata2 <- data.frame(x = osx, y = osy)
p <- ggplot() +
geom_point(data = dat, aes(x = V1, y = V2)) +
geom_point(data = clickdata2, aes(x = x, y = y), color = "red") +
theme_bw()
s <- paste0("source_", i)
p <- ggplotly(p, source = s)
pList[[i]] <- p
}
lapply(1:length(pList), function(levels) {
output[[paste0('p1', levels)]] <- renderPlotly({
pList[[levels]]
})
})
return(pList)
})
output$plot1 <- renderUI({
req(plot1())
tagList(lapply(1:length(plot1()), function(i) {
plotlyOutput(paste0('p1', i))
}))
})
}
shinyApp(ui = ui, server = server)
Dataset 1:
| x | y |
|---------------------|------------------|
| -1.08950003950399 | 3.92061909032194 |
| 0.208720196968725 | 5.67222928235966 |
| -0.263836645668798 | 3.71247675453362 |
| 0.138665250101923 | 7.74906181389072 |
| -2.55225465270668 | 8.70452316801581 |
| 1.57869487229503 | 5.74999341498374 |
| -1.2308772107366 | 3.35530542164603 |
| -0.321971204328944 | 4.80979923193731 |
| -0.0220344748315207 | 5.71594486764801 |
| -0.744744222332549 | 4.55989156213266 |
| 1.04545646933507 | 5.91191284007836 |
| 0.487641404547292 | 5.72802397537141 |
| 0.530577651038453 | 3.09012202763487 |
| 1.66213363574977 | 3.33320134285085 |
| 0.0228775779018585 | 4.53629938354094 |
| 0.248560724286165 | 6.90396647322719 |
| -0.828199737516727 | 2.93261672162585 |
| -0.188767161133671 | 7.05261125820679 |
| -0.0516854640148708 | 6.05915293490151 |
| -1.1222846968583 | 1.39602662887452 |
Dataset 2:
| x | y |
|-------------------|--------------------|
| 1.02949171061974 | -3.01531004873537 |
| 0.142236350026741 | 4.03725957935051 |
| 0.393662548389848 | 0.693063164212043 |
| 0.850695864525208 | -0.817259089585591 |
| 0.415261536856849 | 6.20153263624976 |
| 0.286698530882788 | -1.97242366699712 |
| 0.396735374870177 | -3.49076632915453 |
| 1.31048795207181 | 10.6312475753655 |
| 1.49027462911218 | -2.36815149198265 |
| 0.25593074551849 | -4.20103425686884 |
| 0.141856102265992 | -4.96844566049411 |
| 0.557311276998118 | -2.60340268165709 |
| 1.09676002845372 | -3.13743417501215 |
| 1.3982175261235 | -2.88779134364473 |
| 0.1910142375317 | 3.04928812470083 |
| 0.305993362575559 | 4.45623609398508 |
| 1.20394527560274 | -0.766794097405343 |
| 0.329481916465341 | 8.49273280033692 |
| 0.300228890671757 | -4.72320661206269 |
| 2.07709902138915 | 2.78067540836668 |

Add row in etable from expss R package

I want to add rows at specific place to expss output etable. I did that with some brute force method which always add the row at the start of etable. Any method to add rows at specific place.
library(tidyverse)
library(expss)
test1 <-
mtcars %>%
tab_cells(cyl) %>%
tab_cols(vs) %>%
tab_stat_cpct() %>%
tab_pivot()
test1 %>%
tibble() %>%
tibble::add_row(.data = tibble("", test1[2, -1]/test1[1, -1]*100) %>%
set_names(names(test1))
, .before = 3)
Not sure there's a simple method exported with expss, but we can use expss::add_rows() with a simple custom function to split a table to accomplish this.
insert_row <- function(tbl, where, ...) {
args <- c(...)
tbl1 <- tbl[1:where,]
tbl2 <- tbl[(where+1):nrow(tbl),]
tbl1 %>%
add_rows(args) %>%
add_rows(tbl2)
}
insert_row(test1, 2, c("cyl|4", 300, 40))
| | | vs | |
| | | 0 | 1 |
| --- | ------------ | ---------------- | ---------------- |
| cyl | 4 | 5.55555555555556 | 71.4285714285714 |
| | 6 | 16.6666666666667 | 28.5714285714286 |
| | 4 | 300 | 40 |
| | 8 | 77.7777777777778 | |
| | #Total cases | 18 | 14 |
Solution based on #caldwellst code but with automatic ratio calculation:
insert_ratio <- function(tbl, where) {
if(is.character(where)) {
# if where is character we search it in the rowlabels
where = grep(where, tbl[[1]], fixed = TRUE)[1]
}
isTRUE(where>1) || stop("'where' should be greater than 1 for ratio calculation.")
isTRUE(where<=NROW(tbl)) || stop("'where' should be less or equal than number of rows in the table.")
tbl1 <- tbl[1:where,]
to_insert = c(row_labels = tbl[[1]][where], tbl[where, -1]/tbl[where - 1, -1]*100)
tbl2 <- tbl[(where+1):nrow(tbl),]
tbl1 %>%
add_rows(to_insert) %>%
add_rows(tbl2)
}
insert_ratio(test1, 2)
# | | | vs | |
# | | | 0 | 1 |
# | --- | ------------ | ----- | ---- |
# | cyl | 4 | 5.6 | 71.4 |
# | | 6 | 16.7 | 28.6 |
# | | | 300.0 | 40.0 |
# | | 8 | 77.8 | |
# | | #Total cases | 18.0 | 14.0 |
insert_ratio(test1, "cyl|6")
# the same result
UPDATE
Ratio calculation is moved to separate function:
ratio = function(tbl, where, label = NULL){
if(is.character(where)) {
# if where is character we search it in the rowlabels
where = grep(where, tbl[[1]], fixed = TRUE)[1]
}
isTRUE(where>1) || stop("'where' should be greater than 1 for ratio calculation.")
isTRUE(where<=NROW(tbl)) || stop("'where' should be less or equal than number of rows in the table.")
if(is.null(label)) label = tbl[[1]][where]
c(row_labels = label, tbl[where, -1]/tbl[where - 1, -1]*100)
}
insert_row = function(tbl, where, row) {
if(is.character(where)) {
# if where is character we search it in the rowlabels
where = grep(where, tbl[[1]], fixed = TRUE)[1]
}
isTRUE(where<=NROW(tbl)) || stop("'where' should be less or equal than number of rows in the table.")
first_part = seq_len(where)
tbl1 <- tbl[first_part,]
tbl2 <- tbl[-first_part,]
tbl1 %>%
add_rows(row) %>%
add_rows(tbl2)
}
insert_row(test1, 2, ratio(test1, 2))
insert_row(test1, "cyl|6", ratio(test1, "cyl|6"))

Can weighted data be used with the CrossTable function in R?

I have attached sample weights to my data using the code below;
s_w <- couple_dta$h_sw /1000000
design <-svydesign(ids =~s_unit + hh, strata =~res , weights = s_w ,data =c_dta)
I had earlier created crosstables with unweighted data using the code;
CrossTable(c_dta$varA, c_dta$varB, prop.c = FALSE ,prop.r = FALSE , prop.chisq = FALSE , format = "SPSS")
The result of CrossTable with unweighted data is something below.
Variable B
Variable A | f | mf | m | Row Total |
---------------------|-----------|-----------|-----------|-----------|
m | n1 | n2 | n3 |n1 +n2+n3 |
|n1/N x 100 | n2/N x 100|n3/N x 100 | |
---------------------|-----------|-----------|-----------|-----------|
mf | n5 | n6 | n7 | n5+n6+n7 |
|n5/N x 100 |n6/N x 100 | n7/N x 100| |
---------------------|-----------|-----------|-----------|-----------|
f | n8 | n9 | n10 |n8+n9+n10 |
|n8/N x 100 |n9/N x 100 |n10/N x 100| |
---------------------|-----------|-----------|-----------|-----------|
Column Total | n1+n5+n8 | n2+n6+n9 |n3+n7+n10 | N |
---------------------|-----------|-----------|-----------|-----------|
Is there a way of incorporating weights to the second data. I have looked at 'prop.table(svytable)' but not sure how to proceed, given that I would also like to display the number of observations in each cell and the corresponding percentage.
Thank you in advance
Edit : I have used the svytable and Crosstable functions to achieve my goal.
table2 <- svytable(~c_dta$VarA + c_dta$wrd_VarB, design=design)
CrossTable(table2 ,prop.c = FALSE ,prop.r = FALSE , prop.chisq = FALSE , format = "SPSS")

Generating table from dataframe with proportions of 20 variables, for each row, for each possible combination of said variable in R

I have a dataframe with 1000 rows representing a different species, for each of these rows are 20 columns with different proportions of a single variable (amino acids).
For each row (species), I would like to calculate the proportion of each possible combination of single letter variables (amino acids).
So each species should have 10 million calculated combinations of the amino acids.
My code for generating all possible combinations of amino acids is this:
S <- c('G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T')
allCombs <- function(x) c(x, lapply(seq_along(x)[-1L],
function(y) combn(x, y, collapse = "")),
recursive = TRUE)
Scombi <- allCombs(S)
My dataframe looks like this:
+----------------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| Species | Domain | Actual OGT | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
+----------------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| Acaryochloris_marina | Bacteria | 25 | 0.089806129655016 | 0.011179368033588 | 0.052093758404379 | 0.056116688487831 | 0.033311792369428 | 0.074719969063287 | 0.021456955206517 | 0.062874293719234 | 0.046629846831622 | 0.105160548187069 | 0.023372745414207 | 0.034667218445279 | 0.050847279968411 | 0.052372091362254 | 0.054393907299958 | 0.058415776607691 | 0.059282788930956 | 0.075786041807662 | 0.012266709932789 | 0.025246090272826 |
| Acetobacter_pasteurianus | Bacteria | 26 | 0.113635842586218 | 0.009802006063102 | 0.053600553080754 | 0.058133056353357 | 0.036903783608575 | 0.085210142094237 | 0.021833316616858 | 0.053123968429941 | 0.045353753818743 | 0.096549489115246 | 0.025913145427995 | 0.027225003296464 | 0.052562918173042 | 0.033342785074972 | 0.072705595398914 | 0.049908591821467 | 0.056094207383391 | 0.079084190962059 | 0.010144168305489 | 0.018873482389179 |
| Acetobacterium_woodii | Bacteria | 30 | 0.074955804625209 | 0.011863137047001 | 0.058166310295556 | 0.071786218284636 | 0.03424697521635 | 0.075626240308253 | 0.018397399287915 | 0.087245372635541 | 0.078978610001876 | 0.087790924875632 | 0.03068806687375 | 0.046498124583435 | 0.036120348133785 | 0.031790536900726 | 0.045179171055634 | 0.050727609439901 | 0.055617806111571 | 0.069643619533744 | 0.005984048340735 | 0.028693676448754 |
| Acetohalobium_arabaticum | Bacteria | 37 | 0.07294006171749 | 0.008402092275195 | 0.063388830763099 | 0.094174357919767 | 0.032968396601359 | 0.074335444399095 | 0.014775170057021 | 0.081175614650614 | 0.068173658934912 | 0.096191143631822 | 0.023591084039018 | 0.042176390239929 | 0.036535950562554 | 0.032690297143697 | 0.045929769851454 | 0.05201834344653 | 0.049098780255464 | 0.079225589949997 | 0.004923023531168 | 0.027286000029819 |
| Acholeplasma_laidlawii | Bacteria | 37 | 0.067353087090147 | 0.002160134400001 | 0.056809775441953 | 0.065310218890485 | 0.038735792072418 | 0.069508395797039 | 0.018942086187746 | 0.081435757342441 | 0.084786245636216 | 0.096181862610799 | 0.026545056054257 | 0.045549913713558 | 0.038323250930165 | 0.033008924859672 | 0.047150659509282 | 0.054698408656138 | 0.059971572823796 | 0.072199395290938 | 0.005926270925023 | 0.03540319176793 |
| Achromobacter_xylosoxidans | Bacteria | 30 | 0.120974236639852 | 0.008469732379263 | 0.054028585828065 | 0.055476991380945 | 0.035048667997051 | 0.086814010110846 | 0.02243157894653 | 0.050520668283285 | 0.039296015271673 | 0.099074202941835 | 0.028559018986725 | 0.025845147774914 | 0.049701994138614 | 0.034808403369533 | 0.073998251525545 | 0.050072992977641 | 0.051695040348985 | 0.080314177991249 | 0.011792085285623 | 0.021078197821829 |
+----------------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
So you can see, each row has the proportion of each amino acid (A,G,I etc.) over the entire set of amino acids, (all 20 add up to 1), but I would like to generate each possible combination, over 1. so something that looks like the following:
+----------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+
| Species | Domain | Actual OGT | A | AC | AD | AE |
+----------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+
| Acaryochloris_marina | Bacteria | 25 | 0.089806129655016 | 0.191179368033588 | 0.1782093758404379 | 0.186116688487831 |
+----------------------+----------+------------+-------------------+-------------------+-------------------+-------------------+
So for each species, 10 million columns (each representing one of the possible combinations of amino acids, without repetition, so the largest string is 20 with each one)
Apologies for being unclear, does anyone have any ideas on how to create this data-set? (Or the best way of asking/explaining what I should be looking up?)
Species <- structure(list(Species = c("Acaryochloris_marina",
"Acetobacter_pasteurianus",
"Acetobacterium_woodii", "Acetohalobium_arabaticum", "Acholeplasma_laidlawii",
"Achromobacter_xylosoxidans"), Domain = c("Bacteria", "Bacteria",
"Bacteria", "Bacteria", "Bacteria", "Bacteria"), Actual.OGT = c(25,
26, 30, 37, 37, 30), A = c(0.089806129655016, 0.113635842586218,
0.074955804625209, 0.07294006171749, 0.067353087090147, 0.120974236639852
), C = c(0.011179368033588, 0.009802006063102, 0.011863137047001,
0.008402092275195, 0.002160134400001, 0.008469732379263), D = c(0.052093758404379,
0.053600553080754, 0.058166310295556, 0.063388830763099, 0.056809775441953,
0.054028585828065), E = c(0.056116688487831, 0.058133056353357,
0.071786218284636, 0.094174357919767, 0.065310218890485, 0.055476991380945
), F = c(0.033311792369428, 0.036903783608575, 0.03424697521635,
0.032968396601359, 0.038735792072418, 0.035048667997051), G = c(0.074719969063287,
0.085210142094237, 0.075626240308253, 0.074335444399095, 0.069508395797039,
0.086814010110846), H = c(0.021456955206517, 0.021833316616858,
0.018397399287915, 0.014775170057021, 0.018942086187746, 0.02243157894653
), I = c(0.062874293719234, 0.053123968429941, 0.087245372635541,
0.081175614650614, 0.081435757342441, 0.050520668283285), K = c(0.046629846831622,
0.045353753818743, 0.078978610001876, 0.068173658934912, 0.084786245636216,
0.039296015271673), L = c(0.105160548187069, 0.096549489115246,
0.087790924875632, 0.096191143631822, 0.096181862610799, 0.099074202941835
), M = c(0.023372745414207, 0.025913145427995, 0.03068806687375,
0.023591084039018, 0.026545056054257, 0.028559018986725), N = c(0.034667218445279,
0.027225003296464, 0.046498124583435, 0.042176390239929, 0.045549913713558,
0.025845147774914), P = c(0.050847279968411, 0.052562918173042,
0.036120348133785, 0.036535950562554, 0.038323250930165, 0.049701994138614
), Q = c(0.052372091362254, 0.033342785074972, 0.031790536900726,
0.032690297143697, 0.033008924859672, 0.034808403369533), R = c(0.054393907299958,
0.072705595398914, 0.045179171055634, 0.045929769851454, 0.047150659509282,
0.073998251525545), S = c(0.058415776607691, 0.049908591821467,
0.050727609439901, 0.05201834344653, 0.054698408656138, 0.050072992977641
), T = c(0.059282788930956, 0.056094207383391, 0.055617806111571,
0.049098780255464, 0.059971572823796, 0.051695040348985), V = c(0.075786041807662,
0.079084190962059, 0.069643619533744, 0.079225589949997, 0.072199395290938,
0.080314177991249), W = c(0.012266709932789, 0.010144168305489,
0.005984048340735, 0.004923023531168, 0.005926270925023, 0.011792085285623
), Y = c(0.025246090272826, 0.018873482389179, 0.028693676448754,
0.027286000029819, 0.03540319176793, 0.021078197821829)), .Names = c("Species",
"Domain", "Actual.OGT", "A", "C", "D", "E", "F", "G", "H", "I",
"K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"), row.names = c(NA,
-6L), class = "data.frame")
I'm not entirely sure that R is the right tool for this job. It's going to take a very, very long time. You may be able to reduce that time using the parallel package if you have sufficient cores, however.
I've put together a process that will accomplish what you want. For each species, it takes my computer about eight minutes to generate the "joint proportion." If you run on a single thread, as R will do inherently, you're looking at close to an hour just to accomplish the these six species in your sample data.
I wrote my script to run in parallel, and using seven cores, it took about 11 minutes to complete all six. Extending this over all 1000 species, I wouldn't be surprised if it took as long as two days to do all this (on seven cores). If you have a large cluster, you may be able to cut it down some.
Please note that this will not give you your results as described in your question. I posted a comment that I wasn't sure what formula you were using to get the joint proportions. I am just taking the sum here for ease of demonstration. You will need to adjust your code appropriately.
library(parallel)
library(dplyr)
library(tidyr)
library(magrittr)
# Reshape data. This will make it easier to split and access proportion
# within each species.
SpeciesLong <-
Species %>%
gather(protein, proportion,
A:Y) %>%
arrange(Species)
# Get unique species
S <- unique(SpeciesLong$protein)
# Build the combination list
# Note, this is different than your code, I added FUN = paste0
Scombi <- unlist(lapply(seq_along(S),
function(x) combn(S, x, FUN = paste0, collapse = "")))
# Function to get the joint proportion
# I took the sum, for convenience. You'll need to replace this
# with whatever function you use to get the joint proportion.
# The key part is getting the correct proteins, which happens within
# the `sum` call.
joint_protein <- function(protein_combo, data){
sum(data$proportion[vapply(data$protein,
grepl,
logical(1),
protein_combo)])
}
# make a list data frames, one for each species
SplitSpecies <-
split(SpeciesLong,
SpeciesLong$Species)
# Make a cluster of processors to run on
cl <- makeCluster(detectCores() - 1)
# export Scombi and joint_protein to all processes in the cluster
clusterExport(cl, c("Scombi", "joint_protein"))
# Get the aggregate values for each species in a one-row data frame.
SpeciesAggregate <-
parLapply(cl,
X = SplitSpecies,
fun = function(data){
X <- lapply(Scombi,
joint_protein,
data)
names(X) <- Scombi
as.data.frame(X)
})
# Join the results to the Species data
# You may want to save your data before this step. I'm not entirely
# sure I did this right to match the rows correctly.
Species <- cbind(Species, SpeciesAggregate)

Use character String as function argument R

I lost already so much time but I don't get it.
Is it possible to use a String as an argument in a function?
My String is definded as:
mergesetting <- "all = FALSE"
(Sometimes I use "all.y = TRUE" or "all.x = TRUE" instead)
I tried to set that String as an argument into the following Function:
merged = merge.data.frame(x = DataframeA ,y = DataframeB ,by = "date_new", mergesetting )
But i get an error message: Error in fix.by(by.x, x)
The function does work if I use the argument directly:
merged = merge.data.frame(x = DataframeA,y = DataframeB,by = "date_new", all = FALSE )
As well two other approaches found on Use character string as function argument
didn't work:
L<- list(x = DataframeA,y = DataframeB,by = "date_new", mergesetting)
merged <- do.call(merge.data.frame, L)
Any help is much appreciated.
Not sure the point but,
if you had a list with your data and arguments
Say dfA is this data frame
kable(head(dfA))
|dates | datum|
|:----------|-----:|
|2010-05-11 | 1130|
|2010-05-12 | 1558|
|2010-05-13 | 1126|
|2010-05-14 | 131|
|2010-05-15 | 2223|
|2010-05-16 | 4005|
and dfB is this...
kable(head(dfB))
|dates | datum|
|:----------|-----:|
|2010-05-11 | 3256|
|2010-05-12 | 50|
|2010-05-13 | 2280|
|2010-05-14 | 4981|
|2010-05-15 | 2117|
|2010-05-16 | 791|
Your pre set list:
arg.li <- list(dfA = dfA,dfB = dfB,all = T,by = 'dates')
The wrapper for the list function...
f <- function(x)do.call('merge.data.frame',list(x = x$dfA,y = x$dfB,all = x$all))
results in:
kable(summary(f(arg.li)))
| | dates | datum |
|:--|:------------------|:------------|
| |Min. :2010-05-11 |Min. : 24 |
| |1st Qu.:2010-09-03 |1st Qu.:1288 |
| |Median :2010-12-28 |Median :2520 |
| |Mean :2011-01-09 |Mean :2536 |
| |3rd Qu.:2011-04-22 |3rd Qu.:3785 |
| |Max. :2011-12-01 |Max. :5000 |

Resources