errors in apply lm regression to each row - r

I want to run a lm regression to each row of my data dt.
My code is
coe <- apply(dt, 1, FUN = function(x) lm(dbl ~ bld, data = as.data.frame(x))$coefficients)
But it returns:
Error in eval(predvars, data, env) : object 'dbl' not found
I confirm that there are dbl and bld in my data dt.
So I do not know how to deal with it.

I am guessing you have mistakenly written about running regression by row (which is impossible since there will just be one observation for x and y in y ~ x). Instead, you want to run the regression repeatedly for some grouping variable?
This is pretty easy to do with groupedstats:
groupedstats::grouped_lm(
data = ggplot2::diamonds,
grouping.vars = c(cut, color), # grouping variables
formula = price ~ carat * clarity # formula
)
#> # A tibble: 547 x 10
#> cut color term estimate std.error t.value conf.low conf.high
#> <ord> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Ideal E (Int~ -3085. 64.9 -47.5 -3212. -2958.
#> 2 Ideal E carat 10529. 74.1 142. 10384. 10674.
#> 3 Ideal E clar~ -2088. 267. -7.81 -2612. -1564.
#> 4 Ideal E clar~ 168. 265. 0.633 -352. 688.
#> 5 Ideal E clar~ -926. 217. -4.26 -1352. -500.
#> 6 Ideal E clar~ 625. 157. 3.99 318. 932.
#> 7 Ideal E clar~ -392. 107. -3.65 -602. -181.
#> 8 Ideal E clar~ 83.9 79.1 1.06 -71.1 239.
#> 9 Ideal E clar~ -40.8 67.4 -0.605 -173. 91.4
#> 10 Ideal E cara~ 9746. 287. 34.0 9185. 10308.
#> # ... with 537 more rows, and 2 more variables: p.value <dbl>,
#> # significance <chr>
Created on 2018-08-19 by the reprex package (v0.2.0.9000).

There are two problems with what you're trying to do. When you're passing dt to apply, it drops x to a (named) numeric vector. When you coerce it using as.data.frame, it becomes a data.frame with one column. This is why dbl is not found.
> x <- c(a = 1, b = 0.58)
> as.data.frame(x)
x
a 1.00
b 0.58
Second point is that you want to do a regression on two points. In essence, you are doing this:
> lm(b ~ a, data = data.frame(a = 1, b = 0.58))
Call:
lm(formula = b ~ a, data = data.frame(a = 1, b = 0.58))
Coefficients:
(Intercept) a
0.58 NA
You will not be able to estimate the parameter of interest because you'll need more points to do that.

Related

How to run a linear regression on estimated coefficients

Consider the following:
a <- c( 2, 3, 4, 5, 6, 7, 8, 9, 100, 11)
b <- c(5, 6, 7, 80, 9, 10, 11, 12, 13, 14)
c <- c(15, 16, 175, 18, 19, 20, 21, 22, 23, 24)
x <- c(17,18,50,15,64,15,3,5,6,9)
y <- c(55,66,99,83,64,51,23,64,89,101)
z <- c(98,78,56,21,45,34,61,98,45,64)
abc <- data.frame(cbind(a,b,c))
Firstly, I plan to run a regression and values abc with xyz as follows (This went according to plan):
dep_vars <- as.matrix(abc)
lm <- lm(dep_vars ~ x + y + z, data = abc)
From here, I understand that we can print the summary in the console using either of these two methods:
summary(lm)
map_dfr(summary(lm), tidy, .id = 'dep_var')
My issue is that I would like to run a regression of the coefficients of x, y and z on a, b and c.
Base R Options
The others have already answered your question, but I felt like providing multiple options in case you wished to know them for the future. Here, all of these base R functions do effectively the same thing albeit in slightly different ways:
#### Call Directly From Saved LM Object ####
coef(lm)
lm$coefficients # functionally equivalent to above
lm[1] # returns vector format from list
For example lm$coefficients and coef(lm) both look like this:
a b c
(Intercept) 15.3706338 44.2597407 -51.0505560
x -0.5862903 -0.3488530 1.0575465
y 0.4074877 0.1587221 0.7602451
z -0.2724661 -0.5257350 0.2025181
Whereas pulling it from a saved lm object gives you a vectorized version pulled from a list:
$coefficients
a b c
(Intercept) 15.3706338 44.2597407 -51.0505560
x -0.5862903 -0.3488530 1.0575465
y 0.4074877 0.1587221 0.7602451
z -0.2724661 -0.5257350 0.2025181
Broom/Dplyr Options
If you want a tidy version of this, tidy transforms a base R lm object into a tidier format that you can directly manipulate:
#### Broom and Dplyr ####
library(tidyverse)
library(broom)
tidy(lm) # pull all terms
Like so:
# A tibble: 12 × 6
response term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a (Intercept) 15.4 45.8 0.335 0.749
2 a x -0.586 0.538 -1.09 0.318
3 a y 0.407 0.454 0.898 0.404
4 a z -0.272 0.426 -0.639 0.546
5 b (Intercept) 44.3 30.4 1.46 0.195
6 b x -0.349 0.357 -0.978 0.366
7 b y 0.159 0.300 0.528 0.616
8 b z -0.526 0.282 -1.86 0.112
9 c (Intercept) -51.1 69.1 -0.739 0.488
10 c x 1.06 0.812 1.30 0.240
11 c y 0.760 0.684 1.11 0.309
12 c z 0.203 0.643 0.315 0.763
To select some specific coefficients, you can simply add slice to whatever you please:
tidy(lm) %>%
slice(2) # pull specific coefficient
Giving you a row of your data frame for a and x:
# A tibble: 1 × 6
response term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a x -0.586 0.538 -1.09 0.318

Mapping broom::tidy to nested list of {fixest} models and keep name of list element

I want to apply broom::tidy() to models nested in a fixest_multi object and extract the names of each list level as data frame columns. Here's an example of what I mean.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
This command estimates two models for each dep. var. (Ozone and Solar.R) for a subset of each Month plus the full sample. Here's how the resulting object looks like:
> names(multiple_est)
[1] "Full sample" "5" "6" "7" "8" "9"
> names(multiple_est$`Full sample`)
[1] "Ozone" "Solar.R"
I now want to tidy each model object, but keep the information of the Month / Dep.var. combination as columns in the tidied data frame. My desired output would look something like this:
I can run map_dfr from the tidyr package, giving me this result:
> map_dfr(multiple_est, tidy, .id ="Month") %>% head(9)
# A tibble: 9 x 6
Month term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Full sample (Intercept) -71.0 23.6 -3.01 3.20e- 3
2 Full sample Wind -3.06 0.663 -4.61 1.08e- 5
3 Full sample Temp 1.84 0.250 7.36 3.15e-11
4 5 (Intercept) -76.4 82.0 -0.931 3.53e- 1
5 5 Wind 2.21 2.31 0.958 3.40e- 1
6 5 Temp 3.07 0.878 3.50 6.15e- 4
7 6 (Intercept) -70.6 46.8 -1.51 1.45e- 1
8 6 Wind -1.34 1.13 -1.18 2.50e- 1
9 6 Temp 1.64 0.609 2.70 1.29e- 2
But this tidies only the first model of each Month, the model with the Ozone outcome.
My desired output would look something like this:
Month outcome term estimate more columns from tidy
Full sample Ozone (Intercept) -71.0
Full sample Ozone Wind -3.06
Full sample Ozone Temp 1.84
Full sample Solar.R (Intercept) some value
Full sample Solar.R Wind some value
Full sample Solar.R Temp some value
... rows repeated for each month 5, 6, 7, 8, 9
How can I apply tidy to all models and add another column that indicates the outcome of the model (which is stored in the name of the model object)?
So, fixest_mult has a pretty strange setup as I delved deeper. As you noticed, mapping across it or using apply just accesses part of the data frames. In fact, it isn't just the data frames for "Ozone", but actually just the data frames for the first 6 data frames (those for c("Full sample", "5", "6").
If you convert to a list, it access the data attribute, which is a sequential list of all 12 data frames, but dropping the relevant names you're looking for. So, as a workaround, could use pmap() and the names (found in the attributes of the object) to tidy() and then use mutate() for your desired columns.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
nms <- attr(multiple_est, "meta")$all_names
pmap_dfr(
list(
data = as.list(multiple_est),
month = rep(nms$sample, each = length(nms$lhs)),
outcome = rep(nms$lhs, length(nms$sample))
),
~ tidy(..1) %>%
mutate(
Month = ..2,
outcome = ..3,
.before = 1
)
)
#> # A tibble: 36 × 7
#> Month outcome term estimate std.error statistic p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Full sample Ozone (Intercept) -71.0 23.6 -3.01 3.20e- 3
#> 2 Full sample Ozone Wind -3.06 0.663 -4.61 1.08e- 5
#> 3 Full sample Ozone Temp 1.84 0.250 7.36 3.15e-11
#> 4 Full sample Solar.R (Intercept) -76.4 82.0 -0.931 3.53e- 1
#> 5 Full sample Solar.R Wind 2.21 2.31 0.958 3.40e- 1
#> 6 Full sample Solar.R Temp 3.07 0.878 3.50 6.15e- 4
#> 7 5 Ozone (Intercept) -70.6 46.8 -1.51 1.45e- 1
#> 8 5 Ozone Wind -1.34 1.13 -1.18 2.50e- 1
#> 9 5 Ozone Temp 1.64 0.609 2.70 1.29e- 2
#> 10 5 Solar.R (Intercept) -284. 262. -1.08 2.89e- 1
#> # … with 26 more rows

Improve parallel performance with batching in a static-dynamic branching pipeline

BLUF: I am struggling to understand out how to use batching in the R targets package to improve performance in a static and dynamic branching pipeline processed in parallel using tar_make_future(). I presume that I need to batch within each dynamic branch but I am unsure how to go about doing that.
Here's a reprex that uses dynamic branching nested inside static branching, similar to what my actual pipeline is doing. It first branches statically for each value in all_types, and then dynamically branches within each category. This code produces 1,000 branches and 1,010 targets total. In the actual workflow I obviously don't use replicate, and the dynamic branches vary in number depending on the type value.
# _targets.r
library(targets)
library(tarchetypes)
library(future)
library(future.callr)
plan(callr)
all_types = data.frame(type = LETTERS[1:10])
tar_map(values = all_types, names = "type",
tar_target(
make_data,
replicate(100,
data.frame(x = seq(1000) + rnorm(1000, 0, 5),
y = seq(1000) + rnorm(1000, 20, 20)),
simplify = FALSE
),
iteration = "list"
),
tar_target(
fit_model,
lm(make_data),
pattern = map(make_data),
iteration = "list"
)
)
And here's a timing comparison of tar_make() vs tar_make_future() with eight workers:
# tar_destroy()
t1 <- system.time(tar_make())
# tar_destroy()
t2 <- system.time(tar_make_future(workers = 8))
rbind(serial = t1, parallel = t2)
## user.self sys.self elapsed user.child sys.child
## serial 2.12 0.11 25.59 NA NA
## parallel 2.07 0.24 184.68 NA NA
I don't think the user or system fields are useful here since the job gets dispatched to separate R processes, but the elapsed time for the parallel job takes about 7 times longer than the serial job.
I presume this slowdown is caused by the large number of targets. Will batching improve performance in this case, and if so how can I implement batching within the dynamic branch?
You are on the right track with batching. In your case, that is a matter of breaking up your list of 100 datasets into groups of, say, 10 or so. You could do this with a nested list of datasets, but that's a lot of work. Luckily, there is an easier way.
Your question is actually really well-timed. I just wrote some new target factories in tarchetypes that could help. To access them, you will need the development version of tarchetypes from GitHub:
remotes::install_github("ropensci/tarchetypes")
Then, with tar_map2_count(), it will be much easier to batch your list of 100 datasets for each scenario.
library(targets)
tar_script({
library(broom)
library(targets)
library(tarchetypes)
library(tibble)
make_data <- function(n) {
datasets_per_batch <- replicate(
100,
tibble(
x = seq(n) + rnorm(n, 0, 5),
y = seq(n) + rnorm(n, 20, 20)
),
simplify = FALSE
)
tibble(dataset = datasets_per_batch, rep = seq_along(datasets_per_batch))
}
tar_map2_count(
name = model,
command1 = make_data(n = rows),
command2 = tidy(lm(y ~ x, data = dataset)), # Need dataset[[1]] in tarchetypes 0.4.0
values = data_frame(
scenario = LETTERS[seq_len(10)],
rows = seq(10, 100, length.out = 10)
),
columns2 = NULL,
batches = 10
)
})
tar_make(reporter = "silent")
#> Warning message:
#> `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
tar_read(model)
#> # A tibble: 2,000 × 8
#> term estimate std.error statistic p.value scenario rows tar_group
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <int>
#> 1 (Intercept) 17.1 12.8 1.34 0.218 A 10 10
#> 2 x 1.39 1.35 1.03 0.333 A 10 10
#> 3 (Intercept) 6.42 14.0 0.459 0.658 A 10 10
#> 4 x 1.75 1.28 1.37 0.209 A 10 10
#> 5 (Intercept) 32.8 7.14 4.60 0.00176 A 10 10
#> 6 x -0.300 1.14 -0.263 0.799 A 10 10
#> 7 (Intercept) 29.7 3.24 9.18 0.0000160 A 10 10
#> 8 x 0.314 0.414 0.758 0.470 A 10 10
#> 9 (Intercept) 20.0 13.6 1.47 0.179 A 10 10
#> 10 x 1.23 1.77 0.698 0.505 A 10 10
#> # … with 1,990 more rows
Created on 2021-12-10 by the reprex package (v2.0.1)
There is also tar_map_rep(), which may be easier if all your datasets are randomly generated, but I am not sure if I am overfitting your use case.
library(targets)
tar_script({
library(broom)
library(targets)
library(tarchetypes)
library(tibble)
make_one_dataset <- function(n) {
tibble(
x = seq(n) + rnorm(n, 0, 5),
y = seq(n) + rnorm(n, 20, 20)
)
}
tar_map_rep(
name = model,
command = tidy(lm(y ~ x, data = make_one_dataset(n = rows))),
values = data_frame(
scenario = LETTERS[seq_len(10)],
rows = seq(10, 100, length.out = 10)
),
batches = 10,
reps = 10
)
})
tar_make(reporter = "silent")
#> Warning message:
#> `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
tar_read(model)
#> # A tibble: 2,000 × 10
#> term estimate std.error statistic p.value scenario rows tar_batch tar_rep
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <int> <int>
#> 1 (Inter… 37.5 7.50 5.00 0.00105 A 10 1 1
#> 2 x -0.701 1.17 -0.601 0.564 A 10 1 1
#> 3 (Inter… 21.5 9.64 2.23 0.0567 A 10 1 2
#> 4 x -0.213 1.55 -0.138 0.894 A 10 1 2
#> 5 (Inter… 20.6 9.51 2.17 0.0620 A 10 1 3
#> 6 x 1.40 1.79 0.783 0.456 A 10 1 3
#> 7 (Inter… 11.6 11.2 1.04 0.329 A 10 1 4
#> 8 x 2.34 1.39 1.68 0.131 A 10 1 4
#> 9 (Inter… 26.8 9.16 2.93 0.0191 A 10 1 5
#> 10 x 0.288 1.10 0.262 0.800 A 10 1 5
#> # … with 1,990 more rows, and 1 more variable: tar_group <int>
Created on 2021-12-10 by the reprex package (v2.0.1)
Unfortunately, futures do come with overhead. Maybe it will be faster in your case if you try tar_make_clustermq()?

How to run a paired t-test on different levels of a categorical variable?

I am trying to run a paired t-test on pre- and post-intervention results of three intervention types. I am trying to run the the test on each intervention separately using "subset" in t.test function but it keeps running the test on the whole sample. I cannot separate the intervention levels manually as this is a large database and I do not have access to the excel file. Does anyone have any suggestions?
Here's the codes I am using:
Treatment (intervention) levels:"Passive" "Pro" "Peer"
"Post" and "Pre" are continuous variables.
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Peer")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Pro")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Passive")
There is no subset argument (nor a data argument) for the t.test function when using the default method:
> args(stats:::t.test.default)
function (x, y = NULL, alternative = c("two.sided", "less",
"greater"), mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
You'll have to subset first,
with(subset(data, subset=Treatment=="Peer"),
t.test(Post, Pre, paired=TRUE)
)
There's also an easier way using dplyr and broom...
library(dplyr)
library(broom)
data %>%
group_by(Treatment) %>%
do(tidy(t.test(.$Pre, .$Post, paired=TRUE)))
Reproducible example:
set.seed(123)
data <- tibble(id=1:63, Pre=rnorm(21*3,10,5), Post=rnorm(21*3,13,5),
Treatment=sample(c("Peer","Pro","Passive"), 63, TRUE))
data
# A tibble: 63 x 4
id Pre Post Treatment
<int> <dbl> <dbl> <chr>
1 1 7.20 7.91 Pro
2 2 8.85 7.64 Peer
3 3 17.8 14.5 Peer
4 4 10.4 15.2 Peer
5 5 10.6 13.3 Passive
6 6 18.6 17.6 Passive
7 7 12.3 23.3 Pro
8 8 3.67 10.5 Peer
9 9 6.57 1.45 Pro
10 10 7.77 18.0 Passive
# ... with 53 more rows
Output:
# A tibble: 3 x 9
# Groups: Treatment [3]
Treatment estimate statistic p.value parameter conf.low conf.high method alternative
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Passive -2.41 -1.72 0.107 14 -5.42 0.592 Paired t-~ two.sided
2 Peer -3.61 -2.96 0.00636 27 -6.11 -1.10 Paired t-~ two.sided
3 Pro -1.22 -0.907 0.376 19 -4.03 1.59 Paired t-~ two.sided

Calculate predicted model results by iterating through variables

I have several models fit to predict an outcome y = x1 + x2 + .....+x22. That's a fair number of predictors and a fair number of models. My customers want to know what's the marginal impact of each X on the estimated y. The models may include splines and interaction terms. I can do this, but it's cumbersome and requires loops or a lot of copy paste, which is slow or error prone. Can I do this better by writing my function differently and/or using purrr or an *apply function? Reproducible example is below. Ideally, I could write one function and apply it to longdata.
## create my fake data.
library(tidyverse)
library (rms)
ltrans<- function(l1){
newvar <- exp(l1)/(exp(l1)+1)
return(newvar)
}
set.seed(123)
mystates <- c("AL","AR","TN")
mydf <- data.frame(idno = seq(1:1500),state = rep(mystates,500))
mydf$x1[mydf$state=='AL'] <- rnorm(500,50,7)
mydf$x1[mydf$state=='AR'] <- rnorm(500,55,8)
mydf$x1[mydf$state=='TN'] <- rnorm(500,48,10)
mydf$x2 <- sample(1:5,500, replace = T)
mydf$x3 <- (abs(rnorm(1500,10,20)))^2
mydf$outcome <- as.numeric(cut2(sample(1:100,1500,replace = T),95))-1
dd<- datadist(mydf)
options(datadist = 'dd')
m1 <- lrm(outcome ~ x1 + x2+ rcs(x3,3), data = mydf)
dothemath <- function(x1 = x1ref,x2 = x2ref,x3 = x3ref) {
ltrans(-2.1802256-0.01114239*x1+0.050319692*x2-0.00079289232* x3+
7.6508189e-10*pmax(x3-7.4686271,0)^3-9.0897627e-10*pmax(x3- 217.97865,0)^3+
1.4389439e-10*pmax(x3-1337.2538,0)^3)}
x1ref <- 51.4
x2ref <- 3
x3ref <- 217.9
dothemath() ## 0.0591
mydf$referent <- dothemath()
mydf$thisobs <- dothemath(x1 = mydf$x1, x2 = mydf$x2, x3 = mydf$x3)
mydf$predicted <- predict(m1,mydf,type = "fitted.ind") ## yes, matches.
mydf$x1_marginaleffect <- dothemath(x1= mydf$x1)/mydf$referent
mydf$x2_marginaleffect <- dothemath(x2 = mydf$x2)/mydf$referent
mydf$x3_marginaleffect <- dothemath(x3 = mydf$x3)/mydf$referent
## can I do this with long data?
longdata <- mydf %>%
select(idno,state,referent,thisobs,x1,x2,x3) %>%
gather(varname,value,x1:x3)
##longdata$marginaleffect <- dothemath(longdata$varname = longdata$value) ## no, this does not work.
## I need to communicate to the function which variable it is evaluating.
longdata$marginaleffect[longdata$varname=="x1"] <- dothemath(x1 = longdata$value[longdata$varname=="x1"])/
longdata$referent[longdata$varname=="x1"]
longdata$marginaleffect[longdata$varname=="x2"] <- dothemath(x2 = longdata$value[longdata$varname=="x2"])/
longdata$referent[longdata$varname=="x2"]
longdata$marginaleffect[longdata$varname=="x3"] <- dothemath(x3 = longdata$value[longdata$varname=="x3"])/
longdata$referent[longdata$varname=="x3"]
testing<- inner_join(longdata[longdata$varname=="x1",c(1,7)],mydf[,c(1,10)])
head(testing) ## yes, both methods work.
Mostly you're just talking about a grouped mutate, with the caveat that dothemath is built such that you need to specify the variable name, which can be done by using do.call or purrr::invoke to call it on a named list of parameters:
longdata <- longdata %>%
group_by(varname) %>%
mutate(marginaleffect = invoke(dothemath, setNames(list(value), varname[1])) / referent)
longdata
#> # A tibble: 4,500 x 7
#> # Groups: varname [3]
#> idno state referent thisobs varname value marginaleffect
#> <int> <fct> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 1 AL 0.0591 0.0688 x1 46.1 1.06
#> 2 2 AR 0.0591 0.0516 x1 50.2 1.01
#> 3 3 TN 0.0591 0.0727 x1 38.0 1.15
#> 4 4 AL 0.0591 0.0667 x1 48.4 1.03
#> 5 5 AR 0.0591 0.0515 x1 47.1 1.05
#> 6 6 TN 0.0591 0.0484 x1 37.6 1.15
#> 7 7 AL 0.0591 0.0519 x1 60.9 0.905
#> 8 8 AR 0.0591 0.0531 x1 63.2 0.883
#> 9 9 TN 0.0591 0.0780 x1 47.8 1.04
#> 10 10 AL 0.0591 0.0575 x1 50.5 1.01
#> # ... with 4,490 more rows
# the first values look similar
inner_join(longdata[longdata$varname == "x1", c(1,7)], mydf[,c(1,10)])
#> Joining, by = "idno"
#> # A tibble: 1,500 x 3
#> idno marginaleffect x1_marginaleffect
#> <int> <dbl> <dbl>
#> 1 1 1.06 1.06
#> 2 2 1.01 1.01
#> 3 3 1.15 1.15
#> 4 4 1.03 1.03
#> 5 5 1.05 1.05
#> 6 6 1.15 1.15
#> 7 7 0.905 0.905
#> 8 8 0.883 0.883
#> 9 9 1.04 1.04
#> 10 10 1.01 1.01
#> # ... with 1,490 more rows
# check everything is the same
mydf %>%
gather(varname, marginaleffect, x1_marginaleffect:x3_marginaleffect) %>%
select(idno, varname, marginaleffect) %>%
mutate(varname = substr(varname, 1, 2)) %>%
all_equal(select(longdata, idno, varname, marginaleffect))
#> [1] TRUE
It may be easier to reconfigure dothemath to take an additional parameter of the variable name so as to avoid the gymnastics.

Resources