I'm working with this data.frame and I would like to create a new column called "predicted" whose values are calculated with this formula:
rbeta(1,alfa,beta)
Here's some example data:
data<-structure(list(mu = c(0.548403436247893, 0.944576856539307, 0.72167558981069,
0.721610257581108, 0.987386739865525), kappa = c(77.8230430114621,
26.2939905325391, 28.0123299600893, 24.5166019567386, 42.8769003810988
), alfa = c(42.6784242067533, 24.8366949231, 20.2158147459191,
17.6914314530156, 42.336082882832), beta = c(35.1446188047087,
1.45729560943902, 7.7965152141702, 6.82517050372298, 0.540817498266786
)), class = "data.frame", row.names = c(NA, -5L))
Thanks
The first argument to rbeta is the number of values you want - use the number of rows of data, not 1.
data$predicted = with(data, rbeta(nrow(data), alfa, beta))
(rbeta is vectorized over the shape1 and shape2 parameters).
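To make the difference concrete, here is a minimal base R sketch. The shape values are rounded from the question's data, and the names shape1, shape2, draws and single are only illustrative.

```r
set.seed(1)  # only so the draws are repeatable

shape1 <- c(42.7, 24.8, 20.2)
shape2 <- c(35.1, 1.46, 7.80)

# One call, one draw per (shape1, shape2) pair
draws <- rbeta(length(shape1), shape1, shape2)

# The pitfall from the question: n = 1 returns a single value,
# which an assignment to a data frame column would recycle into every row
single <- rbeta(1, shape1, shape2)

length(draws)   # 3
length(single)  # 1
```

With n = 1 only the first pair of shape parameters is used, so every row would end up with the same number.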
Or with dplyr:
library(dplyr)
data %>%
mutate(predicted = rbeta(n(), alfa, beta))
We can use package purrr to iterate over 2 columns:
library(dplyr)
library(purrr)
data %>%
mutate(predicted = map2_dbl(alfa, beta, ~ rbeta(1, .x, .y)))
mu kappa alfa beta predicted
1 0.5484034 77.82304 42.67842 35.1446188 0.5618492
2 0.9445769 26.29399 24.83669 1.4572956 0.9805548
3 0.7216756 28.01233 20.21581 7.7965152 0.7686036
4 0.7216103 24.51660 17.69143 6.8251705 0.8851859
5 0.9873867 42.87690 42.33608 0.5408175 0.9991376
If you use tibble instead of data.frame you can do it at the same time you're defining the mu, kappa, alfa and beta columns:
library(tibble)
data <- tibble(mu = c(0.548403436247893, 0.944576856539307, 0.72167558981069, 0.721610257581108, 0.987386739865525),
kappa = c(77.8230430114621, 26.2939905325391, 28.0123299600893, 24.5166019567386, 42.8769003810988),
alfa = c(42.6784242067533, 24.8366949231, 20.2158147459191, 17.6914314530156, 42.336082882832),
beta = c(35.1446188047087, 1.45729560943902, 7.7965152141702, 6.82517050372298, 0.540817498266786),
# n() only works inside dplyr verbs, so use length() of an earlier column here
predicted = rbeta(length(alfa), alfa, beta)
)
data
## A tibble: 5 x 5
# mu kappa alfa beta predicted
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.548 77.8 42.7 35.1 0.534
#2 0.945 26.3 24.8 1.46 0.846
#3 0.722 28.0 20.2 7.80 0.797
#4 0.722 24.5 17.7 6.83 0.653
#5 0.987 42.9 42.3 0.541 0.991
tibble is an enhanced version of data.frame and may be worth a look: https://r4ds.had.co.nz/tibbles.html
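A small sketch of the sequential-column behaviour, assuming only the tibble package. The values are made up and the name shapes is illustrative. Note that n() is a dplyr helper that only has meaning inside verbs such as mutate(), so the row count is taken from an earlier column with length() instead.

```r
library(tibble)

set.seed(42)  # only so the draws are repeatable

# Columns are created in order, so predicted can refer to alfa and beta
shapes <- tibble(
  alfa = c(2, 5, 10),
  beta = c(3, 1, 10),
  predicted = rbeta(length(alfa), alfa, beta)
)

shapes
```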
I want to achieve a GAM plot that looks like this
Image from https://stats.stackexchange.com/questions/179947/statistical-differences-between-two-hourly-patterns/446048#446048
How can I accomplish this?
Model is
model = gam(y ~ s(t) + g, data = d)
The general way to do this is to compute model estimates (fitted values) over the range of the covariate(s) of interest for each group. The reproducible example below illustrates one way to do this using {mgcv} to fit the GAM and my {gratia} package for some helper functions to facilitate the process.
library("gratia")
library("mgcv")
library("ggplot2")
eg_data <- data_sim("eg4", n = 400, dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ s(x2) + fac, data = eg_data, method = "REML")
ds <- data_slice(m, x2 = evenly(x2, n = 100), fac = evenly(fac))
fv <- fitted_values(m, data = ds)
The last line gets you fitted values from the model at the covariate combinations specified in the data slice:
> fv
# A tibble: 300 × 6
x2 fac fitted se lower upper
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 0.00131 1 -1.05 0.559 -2.15 0.0412
2 0.00131 2 -3.35 0.563 -4.45 -2.25
3 0.00131 3 1.13 0.557 0.0395 2.22
4 0.0114 1 -0.849 0.515 -1.86 0.160
5 0.0114 2 -3.14 0.519 -4.16 -2.13
6 0.0114 3 1.34 0.513 0.332 2.34
7 0.0215 1 -0.642 0.474 -1.57 0.287
8 0.0215 2 -2.94 0.480 -3.88 -2.00
9 0.0215 3 1.54 0.473 0.616 2.47
10 0.0316 1 -0.437 0.439 -1.30 0.424
# … with 290 more rows
# ℹ Use `print(n = ...)` to see more rows
This object is in a form suitable for plotting with ggplot():
fv |>
ggplot(aes(x = x2, y = fitted, colour = fac)) +
geom_point(data = eg_data, mapping = aes(y = y), size = 0.5) +
geom_ribbon(aes(x = x2, ymin = lower, ymax = upper, fill = fac,
colour = NULL),
alpha = 0.2) +
geom_line()
which produces
You can enhance and/or modify this using your ggplot skills.
The basic point with this model is that you have a common smooth effect of a covariate (here x2) plus group means (for the factor fac). Hence the curves are "parallel".
Note that there's a lot of variation around the estimated curves in this model because the simulated data are from a richer model with group-specific smooths and smooth effects of other covariates.
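If you would rather not depend on {gratia}, the same idea can be sketched with predict() from {mgcv} alone. Everything below is an assumption for illustration: the simulated data frame d, its columns t, g and y (named after the model in the question), and the use of 1.96 standard errors as an approximate 95% interval.

```r
library(mgcv)

set.seed(1)
n <- 300
d <- data.frame(
  t = runif(n),
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Common smooth signal plus a group-specific shift
shift <- unname(c(a = 0, b = 1, c = -1)[as.character(d$g)])
d$y <- sin(2 * pi * d$t) + shift + rnorm(n, sd = 0.3)

model <- gam(y ~ s(t) + g, data = d, method = "REML")

# Grid: 100 evenly spaced values of t for every level of g
newd <- expand.grid(
  t = seq(min(d$t), max(d$t), length.out = 100),
  g = levels(d$g)
)

p <- predict(model, newdata = newd, se.fit = TRUE)
newd$fitted <- as.numeric(p$fit)
newd$lower  <- newd$fitted - 1.96 * as.numeric(p$se.fit)
newd$upper  <- newd$fitted + 1.96 * as.numeric(p$se.fit)

# The gap between two groups is the same at every t ("parallel" curves)
gap_ba <- newd$fitted[newd$g == "b"] - newd$fitted[newd$g == "a"]
range(gap_ba)
```

newd has the same kind of columns as fv (fitted, lower, upper), so the ggplot() call above should work on it after swapping in t and g for x2 and fac.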
gg.bs30 <- ggplot(data,aes(x=Predictor,y=Output,col=class))+geom_point()+
geom_smooth(method='gam',formula=y ~ splines::bs(x, 30)) + facet_grid(class ~.)
print(gg.bs30)
Code from https://github.com/mariocastro73/ML2020-2021/blob/master/scripts/gams-with-ggplot-classes.R
Good morning,
I am using the "epiR" package to assess test accuracy.
https://search.r-project.org/CRAN/refmans/epiR/html/epi.tests.html
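library(epiR)   # epi.tests()
library(dplyr)  # %>%, mutate(), group_by(), summarise()
library(tidyr)  # pivot_wider()
## Generate a data set listing test results and true disease status: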
dis <- c(rep(1, times = 744), rep(0, times = 842))
tes <- c(rep(1, times = 670), rep(0, times = 74),
rep(1, times = 202), rep(0, times = 640))
dat.df02 <- data.frame(dis, tes)
tmp.df02 <- dat.df02 %>%
mutate(dis = factor(dis, levels = c(1,0), labels = c("Dis+","Dis-"))) %>%
mutate(tes = factor(tes, levels = c(1,0), labels = c("Test+","Test-"))) %>%
group_by(tes, dis) %>%
summarise(n = n())
tmp.df02
## View the data in conventional 2 by 2 table format:
pivot_wider(tmp.df02, id_cols = c(tes), names_from = dis, values_from = n)
rval.tes02 <- epi.tests(tmp.df02, method = "exact", digits = 2,
conf.level = 0.95)
summary(rval.tes02)
The class of the result is listed as "epi.tests". I would like to export the summary statistics to a table (e.g. with gtsummary or flextable).
As summary is a function of base R, I am struggling to do this. Can anyone help? Thank you very much
The epi.tests function has been edited so it writes the results out to a data frame (instead of a list). This will simplify export to gtsummary or flextable. epiR version 2.0.50 to be uploaded to CRAN shortly.
This was not quite as straightforward as I expected.
It appears that summary(), when applied to an object x of class epi.tests, simply prints x$detail. x$detail is a list of data.frames with statistic names as row names. That last bit makes things slightly more complicated than they would otherwise have been.
A potential tidyverse solution is
library(tidyverse)
lapply(
names(rval.tes02$detail),
function(x) {
as_tibble(rval.tes02$detail[[x]]) %>%
add_column(statistic=x, .before=1)
}
) %>%
bind_rows()
# A tibble: 18 × 4
statistic est lower upper
<chr> <dbl> <dbl> <dbl>
1 ap 0.550 0.525 0.574
2 tp 0.469 0.444 0.494
3 se 0.901 0.877 0.921
4 sp 0.760 0.730 0.789
5 diag.ac 0.826 0.806 0.844
6 diag.or 28.7 21.5 38.2
7 nndx 1.51 1.41 1.65
8 youden 0.661 0.607 0.710
9 pv.pos 0.768 0.739 0.796
10 pv.neg 0.896 0.872 0.918
11 lr.pos 3.75 3.32 4.24
12 lr.neg 0.131 0.105 0.163
13 p.rout 0.450 0.426 0.475
14 p.rin 0.550 0.525 0.574
15 p.tpdn 0.240 0.211 0.270
16 p.tndp 0.0995 0.0789 0.123
17 p.dntp 0.232 0.204 0.261
18 p.dptn 0.104 0.0823 0.128
This is a tibble containing the same information as summary(rval.tes02), and you should be able to pass it on to gtsummary or flextable. Unusually, the broom package doesn't have a tidy() verb for epi.tests objects.
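As a possible shortcut, bind_rows() can add the list names as a column itself through its .id argument. The sketch below uses a mock list, not a real epi.tests result: detail_mock and its numbers are placeholders that only mimic the "named list of est/lower/upper data frames" shape described above.

```r
library(dplyr)

# Placeholder values, not output from epi.tests()
detail_mock <- list(
  se = data.frame(est = 0.90, lower = 0.88, upper = 0.92),
  sp = data.frame(est = 0.76, lower = 0.73, upper = 0.79)
)

# .id stores each list element's name in a new first column
stats_tbl <- bind_rows(detail_mock, .id = "statistic")
stats_tbl
```

The result is a plain data frame, which is the kind of input flextable::flextable() takes.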
The answer to this question clearly explains how to retrieve tidy regression results by group when running a regression through a dplyr pipe, but the solution is no longer reproducible.
How can one use dplyr and broom in combination to run a regression by group and retrieve tidy results using R 4.0.2, dplyr 1.0.0, and broom 0.7.0?
Specifically, the example answer from the question linked above,
library(dplyr)
library(broom)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
dfHour = df.h %>% group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
returns the following error (and three warnings) when I run it on my system:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom.
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
If I reformat df.h$hour as a character rather than factor,
df.h <- df.h %>%
mutate(
hour = as.character(hour)
)
re-run the regression by group, and again attempt to retrieve the results using broom::tidy,
dfHour = df.h %>% group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
I get this error:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
I assume that the problem has to do with the fact that the group-level regression results are stored as lists in dfHour$fitHour, but I am unsure how to correct the error and once again tidily and quickly compile the regression results, as used to work in the originally posted code/answer.
****** Updated with more succinct code pulled from the dplyr 1.0.0 release notes ******
Thank you. I was struggling with a similar question with the update to dplyr 1.0.0 related to using the examples in the provided link. This was both a helpful question and answer.
One note as an FYI: do() has been superseded as of dplyr 1.0.0, so you may want to use the updated verbs (now much more succinct with my update):
dfHour = df.h %>%
# replace group_by() with nest_by()
# to store each group's rows in a list-column named data
nest_by(hour) %>%
# change do() to mutate(), then add list() before your model
# make sure to change data = . to data = data
mutate(fitHour = list(lm(price ~ wind + temp, data = data))) %>%
# tidy the model column created above (fitHour)
summarise(tidy(fitHour))
Done!
This gives a compact data frame with the selected output statistics. The last line replaces the following code (from my original response), which does the same thing, but less easily:
ungroup() %>%
# then leverage the feedback from @akrun
transmute(hour, HourCoef = map(fitHour, tidy)) %>%
unnest(HourCoef)
dfHour
Which gives the output:
# A tibble: 72 x 6
hour term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 (Intercept) 68.6 21.0 3.27 0.00428
2 1 wind 0.000558 0.0124 0.0450 0.965
3 1 temp -0.866 0.907 -0.954 0.353
4 2 (Intercept) 31.9 17.4 1.83 0.0832
5 2 wind 0.00950 0.0113 0.838 0.413
6 2 temp 1.69 0.802 2.11 0.0490
7 3 (Intercept) 85.5 22.3 3.83 0.00122
8 3 wind -0.0210 0.0165 -1.27 0.220
9 3 temp 0.276 1.14 0.243 0.811
10 4 (Intercept) 73.3 15.1 4.86 0.000126
# ... with 62 more rows
Thanks for the patience, I am working through this myself!
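For anyone who wants to check the by-group result end to end, here is a self-contained sketch using group_modify(), which is a different dplyr verb from the ones used in this answer. The seed and the name coefs are assumptions added for reproducibility; the data follow the question.

```r
library(dplyr)
library(broom)

set.seed(2020)  # assumption: any seed works, this just fixes the numbers
df.h <- data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

coefs <- df.h %>%
  group_by(hour) %>%
  # .x is the data for one hour (without the hour column);
  # the returned tidy() table is row-bound across groups
  group_modify(~ tidy(lm(price ~ wind + temp, data = .x))) %>%
  ungroup()

dim(coefs)  # 24 hours x 3 terms = 72 rows
```

This skips the intermediate list-column of models, so it only fits when the tidy coefficients are all you need.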
The issue is that the result of the do() call is a rowwise data frame and the column 'fitHour' is a list. We can ungroup, loop over the list with map, and tidy each model into a list column
library(dplyr)
library(purrr)
library(tidyr)  # for unnest()
library(broom)
df.h %>%
group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .)) %>%
ungroup %>%
mutate(HourCoef = map(fitHour, tidy))
Or use unnest after the mutate
df.h %>%
group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .)) %>%
ungroup %>%
transmute(hour, HourCoef = map(fitHour, tidy)) %>%
unnest(HourCoef)
# A tibble: 72 x 6
# hour term estimate std.error statistic p.value
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 (Intercept) 89.8 20.2 4.45 0.000308
# 2 1 wind 0.00493 0.0151 0.326 0.748
# 3 1 temp -1.84 1.08 -1.71 0.105
# 4 2 (Intercept) 75.6 23.7 3.20 0.00500
# 5 2 wind -0.00910 0.0146 -0.622 0.542
# 6 2 temp 0.192 0.853 0.225 0.824
# 7 3 (Intercept) 44.0 23.9 1.84 0.0822
# 8 3 wind -0.00158 0.0166 -0.0953 0.925
# 9 3 temp 0.622 1.19 0.520 0.609
#10 4 (Intercept) 57.8 18.9 3.06 0.00676
# … with 62 more rows
If we want a single dataset, pull 'fitHour', loop over the list with map, and condense the results into one dataset by row binding (the _dfr suffix)
df.h %>%
group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .)) %>%
ungroup %>%
pull(fitHour) %>%
map_dfr(tidy, .id = 'grp')
NOTE: The OP's error message could be replicated with R 4.0.2, dplyr 1.0.0 and broom 0.7.0
tidy(dfHour,fitHour)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x),
na.rm = na.rm) :
Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom.
2: In mean.default(X[[i]], ...) :
Your code actually works for me. Checking your package versions or restarting R in a fresh session may help:
library(dplyr)
library(broom)
df.h = data.frame(
hour = factor(rep(1:24, each = 21)),
price = runif(504, min = -10, max = 125),
wind = runif(504, min = 0, max = 2500),
temp = runif(504, min = - 10, max = 25)
)
dfHour = df.h %>% group_by(hour) %>%
do(fitHour = lm(price ~ wind + temp, data = .))
tidy(dfHour,fitHour)
# A tibble: 72 x 6
# Groups: hour [24]
hour term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 (Intercept) 66.4 14.8 4.48 0.000288
2 1 wind 0.000474 0.00984 0.0482 0.962
3 1 temp 0.0691 0.945 0.0731 0.943
4 2 (Intercept) 66.5 20.4 3.26 0.00432
5 2 wind -0.00540 0.0127 -0.426 0.675
6 2 temp -0.306 0.944 -0.324 0.750
7 3 (Intercept) 86.5 17.3 5.00 0.0000936
8 3 wind -0.0119 0.00960 -1.24 0.232
9 3 temp -1.18 0.928 -1.27 0.221
10 4 (Intercept) 59.8 17.5 3.42 0.00304
# ... with 62 more rows
I first simulated 500 samples of size 55 in the normal distribution.
samples <- replicate(500, rnorm(55,mean=50, sd=10), simplify = FALSE)
1) For each sample, I want the mean, median, range, and third quartile. Then I need to store these together in a data frame.
This is what I have. I am not sure about the range or the quantile. I tried sapply and lapply but not sure how they work.
stats <- data.frame(
means = map_dbl(samples,mean),
medians = map_dbl(samples,median),
sd= map_dbl(samples,sd),
range= map_int(samples, max-min),
third_quantile=sapply(samples,quantile,type=3)
)
2) Then plot the sampling distribution (histogram) of the means.
I tried to plot this, but I don't see how to plot only the means
stats <- gather(stats, key = "Trials", value = "Mean")
ggplot(stats,aes(x=Trials))+geom_histogram()
3) Then I want to plot the other three statistics in (three separate graphs) of a single plotting window.
I know I need to use something like gather and facet_wrap, but I am not sure how to do it.
You were almost there. All that is needed is to define anonymous functions wherever there are errors.
library(tidyverse)
set.seed(1234) # Make the results reproducible
samples <- replicate(500, rnorm(55,mean=50, sd=10), simplify = FALSE)
str(samples)
stats <- data.frame(
means = map_dbl(samples, mean),
medians = map_dbl(samples, median),
sd = map_dbl(samples, sd),
range = map_dbl(samples, function(x) diff(range(x))),
third_quantile = map_dbl(samples, function(x) quantile(x, probs = 3/4, type = 3))
)
str(stats)
#'data.frame': 500 obs. of 5 variables:
# $ means : num 49.8 51.5 52.2 50.2 51.6 ...
# $ medians : num 51.5 51.7 51 51.1 50.5 ...
# $ sd : num 9.55 7.81 11.43 8.97 10.75 ...
# $ range : num 38.5 37.2 54 36.7 60.2 ...
# $ third_quantile: num 57.7 56.2 58.8 55.6 57 ...
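The reason max-min fails is that it tries to subtract one function object from another before anything is mapped; wrapping the arithmetic in function(x) delays it until each sample is passed in. A tiny base R sketch on two hand-made samples (the toy values and the names ranges and q3 are illustrative) shows the pattern:

```r
samples <- list(c(1, 2, 3, 4), c(10, 20))

# The anonymous function receives one sample at a time as x
ranges <- vapply(samples, function(x) max(x) - min(x), numeric(1))

# quantile() returns a named value; unname() keeps the result clean
q3 <- vapply(samples, function(x) unname(quantile(x, probs = 0.75)), numeric(1))

ranges  # 3 10
```

vapply() plays the same role here as map_dbl(): it checks that every result is a single number.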
The map_dbl functions you're using are definitely nice, but if you're trying to get a data frame in the end anyway, you might have an easier time converting the list into a data frame at the beginning, then taking advantage of some dplyr functions.
I'm first mapping over the list, creating tibbles, and binding it together with an added ID. The conversion creates a column value of the sample values. summarise_at lets you take a list of functions—supplying names in the list sets the names in the resultant data frame. You can use purrr's ~. notation to define these functions inline where needed. Cuts down on the number of times you have to map_dbl and so on.
library(tidyverse)
stats <- samples %>%
map_dfr(as_tibble, .id = "sample") %>%
group_by(sample) %>%
summarise_at(vars(value),
.funs = list(mean = mean, median = median, sd = sd,
range = ~(max(.) - min(.)),
third_quartile = ~quantile(., probs = 0.75)))
head(stats)
#> # A tibble: 6 x 6
#> sample mean median sd range third_quartile
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 45.0 44.4 8.71 47.6 48.6
#> 2 10 51.0 52.0 9.55 49.3 56.2
#> 3 100 51.6 52.2 10.4 60.7 58.1
#> 4 101 51.6 51.1 9.92 37.6 57.2
#> 5 102 49.1 48.2 9.65 39.8 57.0
#> 6 103 52.2 51.3 10.1 47.4 58.5
Next, in your code you gathered the data—which is often the solution folks need on SO—but if you're only trying to show the mean column, you can work with it as is.
ggplot(stats, aes(x = mean)) +
geom_histogram()
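For the third part of the question (the other three statistics in one plotting window), one possible sketch is to reshape to long format and facet. It rebuilds a small stats data frame so that it runs on its own; the names stats_long and p, the 30 bins, and the use of pivot_longer() in place of gather() are choices made for this sketch.

```r
library(tidyr)
library(ggplot2)

set.seed(1234)
samples <- replicate(500, rnorm(55, mean = 50, sd = 10), simplify = FALSE)

stats <- data.frame(
  median         = vapply(samples, median, numeric(1)),
  range          = vapply(samples, function(x) diff(range(x)), numeric(1)),
  third_quartile = vapply(samples, function(x) unname(quantile(x, 0.75)), numeric(1))
)

# One row per (sample, statistic) pair
stats_long <- pivot_longer(stats, cols = everything(),
                           names_to = "statistic", values_to = "value")

# One panel per statistic; free x scales because the range is on a different scale
p <- ggplot(stats_long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ statistic, scales = "free_x")
p
```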
I have several models fit to predict an outcome y = x1 + x2 + .....+x22. That's a fair number of predictors and a fair number of models. My customers want to know what's the marginal impact of each X on the estimated y. The models may include splines and interaction terms. I can do this, but it's cumbersome and requires loops or a lot of copy paste, which is slow or error prone. Can I do this better by writing my function differently and/or using purrr or an *apply function? Reproducible example is below. Ideally, I could write one function and apply it to longdata.
## create my fake data.
library(tidyverse)
library(rms)
ltrans<- function(l1){
newvar <- exp(l1)/(exp(l1)+1)
return(newvar)
}
set.seed(123)
mystates <- c("AL","AR","TN")
mydf <- data.frame(idno = seq(1:1500),state = rep(mystates,500))
mydf$x1[mydf$state=='AL'] <- rnorm(500,50,7)
mydf$x1[mydf$state=='AR'] <- rnorm(500,55,8)
mydf$x1[mydf$state=='TN'] <- rnorm(500,48,10)
mydf$x2 <- sample(1:5,500, replace = T)
mydf$x3 <- (abs(rnorm(1500,10,20)))^2
mydf$outcome <- as.numeric(cut2(sample(1:100,1500,replace = T),95))-1
dd<- datadist(mydf)
options(datadist = 'dd')
m1 <- lrm(outcome ~ x1 + x2+ rcs(x3,3), data = mydf)
dothemath <- function(x1 = x1ref,x2 = x2ref,x3 = x3ref) {
ltrans(-2.1802256-0.01114239*x1+0.050319692*x2-0.00079289232* x3+
7.6508189e-10*pmax(x3-7.4686271,0)^3-9.0897627e-10*pmax(x3- 217.97865,0)^3+
1.4389439e-10*pmax(x3-1337.2538,0)^3)}
x1ref <- 51.4
x2ref <- 3
x3ref <- 217.9
dothemath() ## 0.0591
mydf$referent <- dothemath()
mydf$thisobs <- dothemath(x1 = mydf$x1, x2 = mydf$x2, x3 = mydf$x3)
mydf$predicted <- predict(m1,mydf,type = "fitted.ind") ## yes, matches.
mydf$x1_marginaleffect <- dothemath(x1= mydf$x1)/mydf$referent
mydf$x2_marginaleffect <- dothemath(x2 = mydf$x2)/mydf$referent
mydf$x3_marginaleffect <- dothemath(x3 = mydf$x3)/mydf$referent
## can I do this with long data?
longdata <- mydf %>%
select(idno,state,referent,thisobs,x1,x2,x3) %>%
gather(varname,value,x1:x3)
##longdata$marginaleffect <- dothemath(longdata$varname = longdata$value) ## no, this does not work.
## I need to communicate to the function which variable it is evaluating.
longdata$marginaleffect[longdata$varname=="x1"] <- dothemath(x1 = longdata$value[longdata$varname=="x1"])/
longdata$referent[longdata$varname=="x1"]
longdata$marginaleffect[longdata$varname=="x2"] <- dothemath(x2 = longdata$value[longdata$varname=="x2"])/
longdata$referent[longdata$varname=="x2"]
longdata$marginaleffect[longdata$varname=="x3"] <- dothemath(x3 = longdata$value[longdata$varname=="x3"])/
longdata$referent[longdata$varname=="x3"]
testing<- inner_join(longdata[longdata$varname=="x1",c(1,7)],mydf[,c(1,10)])
head(testing) ## yes, both methods work.
Mostly you're just talking about a grouped mutate, with the caveat that dothemath is built such that you need to specify the variable name, which can be done by using do.call or purrr::invoke to call it on a named list of parameters:
longdata <- longdata %>%
group_by(varname) %>%
mutate(marginaleffect = invoke(dothemath, setNames(list(value), varname[1])) / referent)
longdata
#> # A tibble: 4,500 x 7
#> # Groups: varname [3]
#> idno state referent thisobs varname value marginaleffect
#> <int> <fct> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 1 AL 0.0591 0.0688 x1 46.1 1.06
#> 2 2 AR 0.0591 0.0516 x1 50.2 1.01
#> 3 3 TN 0.0591 0.0727 x1 38.0 1.15
#> 4 4 AL 0.0591 0.0667 x1 48.4 1.03
#> 5 5 AR 0.0591 0.0515 x1 47.1 1.05
#> 6 6 TN 0.0591 0.0484 x1 37.6 1.15
#> 7 7 AL 0.0591 0.0519 x1 60.9 0.905
#> 8 8 AR 0.0591 0.0531 x1 63.2 0.883
#> 9 9 TN 0.0591 0.0780 x1 47.8 1.04
#> 10 10 AL 0.0591 0.0575 x1 50.5 1.01
#> # ... with 4,490 more rows
# the first values look similar
inner_join(longdata[longdata$varname == "x1", c(1,7)], mydf[,c(1,10)])
#> Joining, by = "idno"
#> # A tibble: 1,500 x 3
#> idno marginaleffect x1_marginaleffect
#> <int> <dbl> <dbl>
#> 1 1 1.06 1.06
#> 2 2 1.01 1.01
#> 3 3 1.15 1.15
#> 4 4 1.03 1.03
#> 5 5 1.05 1.05
#> 6 6 1.15 1.15
#> 7 7 0.905 0.905
#> 8 8 0.883 0.883
#> 9 9 1.04 1.04
#> 10 10 1.01 1.01
#> # ... with 1,490 more rows
# check everything is the same
mydf %>%
gather(varname, marginaleffect, x1_marginaleffect:x3_marginaleffect) %>%
select(idno, varname, marginaleffect) %>%
mutate(varname = substr(varname, 1, 2)) %>%
all_equal(select(longdata, idno, varname, marginaleffect))
#> [1] TRUE
It may be easier to reconfigure dothemath to take an additional parameter of the variable name so as to avoid the gymnastics.
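A rough sketch of that reconfiguration follows. predict_prob() is a hypothetical, simplified stand-in for dothemath() (no spline terms, rounded coefficients), and effect_for() is an illustrative wrapper name; neither comes from the question.

```r
# Hypothetical stand-in for dothemath(): defaults are the reference values
predict_prob <- function(x1 = 51.4, x2 = 3, x3 = 217.9) {
  plogis(-2.18 - 0.011 * x1 + 0.05 * x2 - 0.0008 * x3)
}

# Takes the variable name as data, so it can be driven by a varname column
effect_for <- function(varname, value) {
  args <- setNames(list(value), varname)  # e.g. list(x2 = c(1, 3, 5))
  do.call(predict_prob, args) / predict_prob()
}

effect_for("x2", c(1, 3, 5))  # the middle value is the reference, so its ratio is 1
```

Inside the grouped mutate this would be called as effect_for(varname[1], value), since each group holds a single variable name.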