How to fill a stacked barplot with patterns or textures in R

I have used ggplot2 to draw a stacked barplot, and I want to fill the bars with patterns. However, this seems too complicated to do with ggplot2.
So is there a way to fill a stacked barplot with patterns or textures using base R or another R package?
My plot is similar to this barplot:
and I want the barplot to look like this, filled with patterns or textures:
My data is from my previous post:
plant group n percentage
1 Cucumber-1 [3.19e-39,2] 14729 0.8667686695
2 Cucumber-1 (2,4] 1670 0.0982757606
3 Cucumber-1 (4,6] 447 0.0263049491
4 Cucumber-1 (6,8] 131 0.0077090567
5 Cucumber-1 (8,10] 16 0.0009415642
6 Cucumber-2 [3.19e-39,2] 20206 0.9410394933
7 Cucumber-2 (2,4] 1155 0.0537909836
8 Cucumber-2 (4,6] 90 0.0041915052
9 Cucumber-2 (6,8] 16 0.0007451565
10 Cucumber-2 (8,10] 5 0.0002328614
11 Eggplant-1 [3.19e-39,2] 11273 0.9012631916
12 Eggplant-1 (2,4] 960 0.0767508794
13 Eggplant-1 (4,6] 181 0.0144707387
14 Eggplant-1 (6,8] 31 0.0024784138
15 Eggplant-1 (8,10] 63 0.0050367765
16 Eggplant-2 [3.19e-39,2] 16483 0.9493721921
17 Eggplant-2 (2,4] 725 0.0417578620
18 Eggplant-2 (4,6] 140 0.0080635871
19 Eggplant-2 (6,8] 12 0.0006911646
20 Eggplant-2 (8,10] 2 0.0001151941
21 Pepper-1 [3.19e-39,2] 4452 0.9763157895
22 Pepper-1 (2,4] 97 0.0212719298
23 Pepper-1 (4,6] 11 0.0024122807
24 Pepper-2 [3.19e-39,2] 23704 0.9560763119
25 Pepper-2 (2,4] 905 0.0365022385
26 Pepper-2 (4,6] 184 0.0074214496

Most of the work is getting your data into shape. The function barplot (see ?barplot) is simple to use, but for stacked bars you want to feed it a matrix. You can pass vectors to the density= and angle= arguments to distinguish the segments of each stacked bar.
d = read.table(text="plant ...
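... 184 0.0074214496", header=TRUE, stringsAsFactors=TRUE) # factors are needed for levels() below (R >= 4.0 no longer creates them by default)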
d$group <- factor(d$group, levels=c(levels(d$group)[c(5,1:4)]),
labels=c("(0,2]", levels(d$group)[1:4]))
levels(d$group)
# [1] "(0,2]" "(2,4]" "(4,6]" "(6,8]" "(8,10]"
tab <- table(d$group, d$plant)
tab
# output omitted
d <- rbind(d,
c("Pepper-1", "(6,8]", 0, 0),
c("Pepper-1", "(8,10]", 0, 0),
c("Pepper-2", "(6,8]", 0, 0),
c("Pepper-2", "(8,10]", 0, 0) )
d <- d[order(d$plant, d$group),]
d
# output omitted
mat <- matrix(as.numeric(d$percentage), nrow=5, ncol=6)
rownames(mat) <- levels(d$group)
colnames(mat) <- levels(d$plant)
names(dimnames(mat)) <- c("group", "plant")
mat
# plant
# group Cucumber-1 Cucumber-2 Eggplant-1 Eggplant-2 Pepper-1 Pepper-2
# (0,2] 0.8667686695 0.9410394933 0.901263192 0.9493721921 0.976315789 0.95607631
# (2,4] 0.0982757606 0.0537909836 0.076750879 0.0417578620 0.021271930 0.03650224
# (4,6] 0.0263049491 0.0041915052 0.014470739 0.0080635871 0.002412281 0.00742145
# (6,8] 0.0077090567 0.0007451565 0.002478414 0.0006911646 0.000000000 0.00000000
# (8,10] 0.0009415642 0.0002328614 0.005036777 0.0001151941 0.000000000 0.00000000
barplot(mat, density=5:9, angle=seq(40, 90, 10), cex.names=.8)
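For anyone who wants to try the density= and angle= idea without rebuilding the data above, here is a minimal self-contained sketch. The matrix values and the group and plant labels are made up for illustration; only barplot and its density, angle, col and legend.text arguments come from base R.

```r
# Made-up proportions: 3 groups (rows) stacked within 3 bars (columns).
# matrix() fills column by column, so each triple below is one bar.
m <- matrix(c(0.7, 0.2, 0.1,
              0.5, 0.3, 0.2,
              0.6, 0.3, 0.1),
            nrow = 3,
            dimnames = list(group = c("low", "mid", "high"),
                            plant = c("A", "B", "C")))

# One density (shading lines per inch) and one angle per row of the matrix,
# so every stacked segment gets its own hatching pattern.
mids <- barplot(m,
                density = c(5, 15, 30),
                angle = c(45, 90, 135),
                col = "black",        # colour of the shading lines
                legend.text = TRUE)   # legend labels taken from rownames(m)
```

barplot returns the bar midpoints (stored in mids here), which is handy if you want to add text or extra marks above the bars afterwards.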

Related

Rolling prediction in a data frame using dplyr and rollapply

My first question here :)
My goal is: given a data frame of predictors (each column a predictor, each row an observation), fit a regression using lm and then predict the value for the last observation, using a rolling window.
The data frame looks like:
> DfPredictor[1:40,]
Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895
For instance, with a rolling window of width = 10, the regression should be estimated on each window and then used to predict the 'Y' corresponding to X1, X2, ..., X5.
The predictions should be included in a new column 'Ypred'.
Is there some way to do that using rollapply + lm/predict + mutate?
Many thanks!!
Using the data in the Note at the end, and assuming that in a window of width 10 we want to predict the last Y (i.e. the 10th), then:
library(zoo)
pred <- function(x) tail(fitted(lm(Y ~., as.data.frame(x))), 1)
transform(DF, pred = rollapplyr(DF, 10, pred, by.column = FALSE, fill = NA))
giving:
Y X1 X2 X3 X4 X5 pred
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440 NA
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971 NA
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380 NA
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536 NA
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555 NA
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555 NA
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433 NA
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681 NA
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245 NA
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990 3.219764
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192 3.241614
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752 3.225423
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546 3.217797
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116 3.205856
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225 3.177928
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948 3.156405
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589 3.176243
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600 3.177165
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899 3.177211
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395 3.145533
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396 3.127410
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640 3.148792
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291 3.124913
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991 3.124992
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310 3.117981
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414 3.117679
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480 3.119898
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676 3.121039
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162 3.123903
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640 3.119438
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999 3.113963
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600 3.101229
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200 3.076817
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590 3.083266
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183 3.089377
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133 3.084225
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750 3.075252
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656 3.063025
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447 3.068808
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895 3.091819
Note: Input DF in reproducible form is:
Lines <- " Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895"
DF <- read.table(text = Lines, header = TRUE)
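Note that fitted() in the answer above gives the in-sample fitted value for the last row of each window, since that row is also used to estimate the model. If an out-of-sample prediction is wanted instead, one possible sketch in base R is to fit on the first 9 rows of each window and predict the 10th. The toy data frame and its column values below are made up so the example runs on its own; they only stand in for DF.

```r
# Hypothetical toy data standing in for DF (values are simulated, not from the question)
set.seed(1)
toy <- data.frame(X1 = rnorm(30), X2 = rnorm(30))
toy$Y <- 1 + 2 * toy$X1 - toy$X2 + rnorm(30, sd = 0.1)

width <- 10
toy$Ypred <- NA_real_   # the first width - 1 rows have no full window

for (i in width:nrow(toy)) {
  # rows i-9 .. i-1: the window without its last observation
  train <- toy[(i - width + 1):(i - 1), c("Y", "X1", "X2")]
  fit <- lm(Y ~ ., data = train)
  # predict Y for the last row of the window from its predictors only
  toy$Ypred[i] <- predict(fit, newdata = toy[i, ])
}
```

With only 9 training rows per window, keep the number of predictors small, otherwise the fit has almost no residual degrees of freedom.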

How to give a function a specific column of a list using a for-loop but prevent that output is named according to the iterator command

Given the following example:
library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg, append = TRUE)
dat
rma(yi, vi, data = dat, mods = ~dat[[8]], subset = (alloc=="systematic"), knha = TRUE)
trial author year tpos tneg cpos cneg ablat alloc yi vi
1 1 Aronson 1948 4 119 11 128 44 random -0.8893 0.3256
2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854 0.1946
3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481 0.4154
4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416 0.0200
5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate -0.2175 0.0512
6 6 Stein & Aronson 1953 NA NA NA NA 44 alternate NA NA
7 7 Vandiviere et al 1973 8 2537 10 619 19 random -1.6209 0.2230
8 8 TPT Madras 1980 505 87886 499 87892 NA random 0.0120 0.0040
9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random -0.4694 0.0564
10 10 Rosenthal et al 1961 17 1699 65 1600 42 systematic -1.3713 0.0730
11 11 Comstock et al 1974 186 50448 141 27197 18 systematic -0.3394 0.0124
12 12 Comstock & Webster 1969 5 2493 3 2338 33 systematic 0.4459 0.5325
13 13 Comstock et al 1976 27 16886 29 17825 33 systematic -0.0173 0.0714
Now what I basically want is to iterate with the rma() command (only for the mods argument) over, let's say, columns [7:8] and to store each result in a variable named after the column.
Two problems:
1) When I enter the command:
rma(yi, vi, data = dat, mods = ~dat[[8]], subset = (alloc=="systematic"), knha = TRUE)
the moderator is named dat[[8]]. But I want the moderator name to be the column name (i.e. colnames(dat[i])).
Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.5543 1.4045 0.3947 0.7312 -5.4888 6.5975
dat[[8]] -0.0312 0.0435 -0.7172 0.5477 -0.2185 0.1560
2) Now imagine that I have many more columns and I want to iterate over [8:53], such that each result gets stored in a variable named after the column.
Problem 2) has been solved:
for(i in 7:8){
assign(paste(colnames(dat[i]), i, sep=""), rma(yi, vi, data = dat, mods = ~dat[[i]], subset = (alloc=="systematic"), knha = TRUE))}
To answer the first part of your question, you can change the names by accessing the attributes of the model object.
In this case (assuming model holds the object returned by rma()):
# inspect the attributes
attr(model$vb, which = "dimnames")
# assign the name
attr(model$vb, which = "dimnames")[[1]][2] <- paste(colnames(dat)[8])
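Another way to get readable term names is to build the formula from the column name itself, so the term is labelled with the name rather than with dat[[i]], and to collect the fits in a named list instead of using assign. The sketch below illustrates the technique with lm and the built-in mtcars data so it runs without metafor; that rma() accepts a one-sided formula built the same way (e.g. mods = reformulate(colnames(dat)[i])) is an assumption based on the mods = ~ ... usage in the question, and is not tested here.

```r
# Columns to iterate over (picked from mtcars purely for illustration)
predictors <- c("wt", "hp")

# reformulate() builds the formula from a character name, e.g. mpg ~ wt,
# so the coefficient is labelled "wt" instead of something like dat[[i]]
fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "mpg"), data = mtcars)
})
names(fits) <- predictors   # one list element per column name

names(coef(fits$wt))
# [1] "(Intercept)" "wt"
```

A named list keeps all results together, so fits[["hp"]] or lapply(fits, summary) work without having to look up variables by constructed names.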

Outputting percentiles by filtering a data frame

Note that, as requested in the comments, this question has been revised.
Consider the following example:
df <- data.frame(FILTER = rep(1:10, each = 10), VALUE = 1:100)
I would like to, for each value of FILTER, create a data frame which contains the 1st, 2nd, ..., 99th percentiles of VALUE. The final product should be
PERCENTILE df_1 df_2 ... df_10
1 [first percentiles]
2 [second percentiles]
etc., where df_i is based on FILTER == i.
Note that FILTER, although it contains numbers, is actually categorical.
The way I have been doing this is by using dplyr:
nums <- 1:10
library(dplyr)
for (i in nums){
df_temp <- filter(df, FILTER == i)$VALUE
assign(paste0("df_", i), quantile(df_temp, probs = (1:99)/100))
}
and then I would have to cbind these (with 1:99 in the first column), but I would rather not type in every single df name. I have considered using a loop on the names of these data frames, but this would involve using eval(parse()).
Here's a basic outline of a possibly smoother approach. I have not included every single aspect of your desired output, but the modification should be fairly straightforward.
df <- data.frame(FILTER = rep(1:10, each = 10), VALUE = 1:100)
df_s <- lapply(split(df,df$FILTER),
FUN = function(x) quantile(x$VALUE,probs = c(0.25,0.5,0.75)))
out <- do.call(cbind,df_s)
colnames(out) <- paste0("df_",colnames(out))
> out
df_1 df_2 df_3 df_4 df_5 df_6 df_7 df_8 df_9 df_10
25% 3.25 13.25 23.25 33.25 43.25 53.25 63.25 73.25 83.25 93.25
50% 5.50 15.50 25.50 35.50 45.50 55.50 65.50 75.50 85.50 95.50
75% 7.75 17.75 27.75 37.75 47.75 57.75 67.75 77.75 87.75 97.75
I did this for just 3 quantiles to keep things simple, but it obviously extends. And you can add the 1:99 column afterwards as well.
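For completeness, here is a sketch of the same idea extended to all 99 percentiles, with the PERCENTILE column added. It uses the example df from the question; the intermediate names probs, pct and out are just illustrative.

```r
df <- data.frame(FILTER = rep(1:10, each = 10), VALUE = 1:100)

probs <- (1:99) / 100

# split VALUE by FILTER, then compute all 99 percentiles per group;
# sapply returns a 99 x 10 matrix with one column per FILTER value
pct <- sapply(split(df$VALUE, df$FILTER), quantile, probs = probs)

# add the percentile index as the first column and rename the rest
out <- data.frame(PERCENTILE = 1:99, pct, check.names = FALSE)
names(out)[-1] <- paste0("df_", names(out)[-1])

head(out, 3)
```

Because split treats FILTER as a grouping variable, this works the same way if FILTER is a factor or a character column.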
I suggest that you use a list.
list_of_dfs <- list()
nums <- 1:10
for (i in nums){
list_of_dfs[[i]] <- nums*i
}
# bind the list elements column-wise, then convert the matrix to a data frame
df <- as.data.frame(do.call("cbind", list_of_dfs))
colnames(df) <- paste0("df_",1:10)
You'll get the result you want:
df_1 df_2 df_3 df_4 df_5 df_6 df_7 df_8 df_9 df_10
1 1 2 3 4 5 6 7 8 9 10
2 2 4 6 8 10 12 14 16 18 20
3 3 6 9 12 15 18 21 24 27 30
4 4 8 12 16 20 24 28 32 36 40
5 5 10 15 20 25 30 35 40 45 50
6 6 12 18 24 30 36 42 48 54 60
7 7 14 21 28 35 42 49 56 63 70
8 8 16 24 32 40 48 56 64 72 80
9 9 18 27 36 45 54 63 72 81 90
10 10 20 30 40 50 60 70 80 90 100
How about using get?
df <- data.frame(1:10)
for (i in nums) {
df <- cbind(df, get(paste0("df_", i)))
}
# get rid of first useless column
df <- df[, -1]
# get names
names(df) <- paste0("df_", nums)
df
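A variation on the get idea, in case it helps: mget fetches several objects by name in one call and returns them as a named list, which can go straight into do.call(cbind, ...). The sketch below creates three small stand-in objects df_1 to df_3 with made-up values so it runs on its own.

```r
nums <- 1:3

# Stand-ins for the df_1, df_2, ... objects created by the loop in the question
for (i in nums) assign(paste0("df_", i), nums * i)

# mget() looks up all the names at once and returns a named list,
# so cbind picks up the column names automatically
out <- do.call(cbind, mget(paste0("df_", nums)))
out
```

This avoids both the dummy first column and the separate renaming step, though storing the results in a list from the start is still the cleaner option.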

Automate regression by rows

I have a data.frame
set.seed(100)
exp <- data.frame(exp = c(rep(LETTERS[1:2], each = 10)), re = c(rep(seq(1, 10, 1), 2)), age1 = seq(10, 29, 1), age2 = seq(30, 49, 1),
h = c(runif(20, 10, 40)), h2 = c(40 + runif(20, 4, 9)))
I'd like to fit an lm for each row in a data set (h and h2 ~ age1 and age2).
I do it with a loop:
exp$modelh <- 0
for (i in 1:length(exp$exp)){
age = c(exp$age1[i], exp$age2[i])
h = c(exp$h[i], exp$h2[i])
model = lm(age ~ h)
exp$modelh[i] = coef(model)[1] + 100 * coef(model)[2]
}
and it works well, but it takes some time with very large files. I would be grateful for a faster solution, e.g. with dplyr.
Using dplyr, we can try with rowwise() and do. Inside the do, we concatenate (c) the 'age1', 'age2' to create 'age', likewise, we can create 'h', apply lm, extract the coef to create the column 'modelh'.
library(dplyr)
exp %>%
rowwise() %>%
do({
age <- c(.$age1, .$age2)
h <- c(.$h, .$h2)
model <- lm(age ~ h)
data.frame(., modelh = coef(model)[1] + 100*coef(model)[2])
} )
gives the output
# exp re age1 age2 h h2 modelh
#1 A 1 10 30 19.23298 46.67906 68.85506
#2 A 2 11 31 17.73018 47.55402 66.17050
#3 A 3 12 32 26.56967 46.69174 84.98486
#4 A 4 13 33 11.69149 47.74486 61.98766
#5 A 5 14 34 24.05648 46.10051 82.90167
#6 A 6 15 35 24.51312 44.85710 89.21053
#7 A 7 16 36 34.37208 47.85151 113.37492
#8 A 8 17 37 21.10962 48.40977 74.79483
#9 A 9 18 38 26.39676 46.74548 90.34187
#10 A 10 19 39 15.10786 45.38862 75.07002
#11 B 1 20 40 28.74989 46.44153 100.54666
#12 B 2 21 41 36.46497 48.64253 125.34773
#13 B 3 22 42 18.41062 45.74346 81.70062
#14 B 4 23 43 21.95464 48.77079 81.20773
#15 B 5 24 44 32.87653 47.47637 115.95097
#16 B 6 25 45 30.07065 48.44727 101.10688
#17 B 7 26 46 16.13836 44.90204 84.31080
#18 B 8 27 47 20.72575 47.14695 87.00805
#19 B 9 28 48 20.78425 48.94782 84.25406
#20 B 10 29 49 30.70872 44.65144 128.39415
We could do this with the devel version of data.table i.e. v1.9.5. Instructions to install the devel version are here.
We convert the 'data.frame' to 'data.table' (setDT), create a column 'rn' with the option keep.rownames=TRUE. We melt the dataset by specifying the patterns in the measure to convert from 'wide' to 'long' format. Grouped by 'rn', we do the lm and get the coef. This can be assigned as a new column in the original dataset ('exp') while removing the unwanted 'rn' column by assigning (:=) it to NULL.
library(data.table)#v1.9.5+
modelh <- melt(setDT(exp, keep.rownames=TRUE), measure=patterns('^age', '^h'),
value.name=c('age', 'h'))[, {model <- lm(age ~h)
coef(model)[1] + 100 * coef(model)[2]},rn]$V1
exp[, modelh:= modelh][, rn := NULL]
exp
# exp re age1 age2 h h2 modelh
# 1: A 1 10 30 19.23298 46.67906 68.85506
# 2: A 2 11 31 17.73018 47.55402 66.17050
# 3: A 3 12 32 26.56967 46.69174 84.98486
# 4: A 4 13 33 11.69149 47.74486 61.98766
# 5: A 5 14 34 24.05648 46.10051 82.90167
# 6: A 6 15 35 24.51312 44.85710 89.21053
# 7: A 7 16 36 34.37208 47.85151 113.37492
# 8: A 8 17 37 21.10962 48.40977 74.79483
# 9: A 9 18 38 26.39676 46.74548 90.34187
#10: A 10 19 39 15.10786 45.38862 75.07002
#11: B 1 20 40 28.74989 46.44153 100.54666
#12: B 2 21 41 36.46497 48.64253 125.34773
#13: B 3 22 42 18.41062 45.74346 81.70062
#14: B 4 23 43 21.95464 48.77079 81.20773
#15: B 5 24 44 32.87653 47.47637 115.95097
#16: B 6 25 45 30.07065 48.44727 101.10688
#17: B 7 26 46 16.13836 44.90204 84.31080
#18: B 8 27 47 20.72575 47.14695 87.00805
#19: B 9 28 48 20.78425 48.94782 84.25406
#20: B 10 29 49 30.70872 44.65144 128.39415
Great (double) answer from @akrun.
Just a suggestion for your future analysis as you mentioned "it's an example of a bigger problem". Obviously, if you are really interested in building models rowwise then you'll create more and more columns as your age and h observations increase. If you get N observations you'll have to use 2xN columns for those 2 variables only.
I'd suggest to use a long data format in order to increase your rows instead of your columns.
Something like:
exp[1,] # how your first row (model building info) looks like
# exp re age1 age2 h h2
# 1 A 1 10 30 19.23298 46.67906
reshape(exp[1,], # how your model building info is transformed
varying = list(c("age1","age2"),
c("h","h2")),
v.names = c("age_value","h_value"),
direction = "long")
# exp re time age_value h_value id
# 1.1 A 1 1 10 19.23298 1
# 1.2 A 1 2 30 46.67906 1
Apologies if the "bigger problem" refers to something else and this answer is irrelevant.
With base R, the function sprintf can help us create formulas. And lapply carries out the calculation.
strings <- sprintf("c(%f,%f) ~ c(%f,%f)", exp$age1, exp$age2, exp$h, exp$h2)
lst <- lapply(strings, function(x) {model <- lm(as.formula(x));coef(model)[1] + 100 * coef(model)[2]})
exp$modelh <- unlist(lst)
exp
# exp re age1 age2 h h2 modelh
# 1 A 1 10 30 19.23298 46.67906 68.85506
# 2 A 2 11 31 17.73018 47.55402 66.17050
# 3 A 3 12 32 26.56967 46.69174 84.98486
# 4 A 4 13 33 11.69149 47.74486 61.98766
# 5 A 5 14 34 24.05648 46.10051 82.90167
# 6 A 6 15 35 24.51312 44.85710 89.21053
# 7 A 7 16 36 34.37208 47.85151 113.37493
# 8 A 8 17 37 21.10962 48.40977 74.79483
# 9 A 9 18 38 26.39676 46.74548 90.34187
# 10 A 10 19 39 15.10786 45.38862 75.07002
# 11 B 1 20 40 28.74989 46.44153 100.54666
# 12 B 2 21 41 36.46497 48.64253 125.34773
# 13 B 3 22 42 18.41062 45.74346 81.70062
# 14 B 4 23 43 21.95464 48.77079 81.20773
# 15 B 5 24 44 32.87653 47.47637 115.95097
# 16 B 6 25 45 30.07065 48.44727 101.10688
# 17 B 7 26 46 16.13836 44.90204 84.31080
# 18 B 8 27 47 20.72575 47.14695 87.00805
# 19 B 9 28 48 20.78425 48.94782 84.25406
# 20 B 10 29 49 30.70872 44.65144 128.39416
In the lapply function the expression as.formula(x) is what converts the formulas created in the first line into a format usable by the lm function.
Benchmark
library(dplyr)
library(microbenchmark)
set.seed(100)
big.exp <- data.frame(age1=sample(30, 1e4, T),
age2=sample(30:50, 1e4, T),
h=runif(1e4, 10, 40),
h2= 40 + runif(1e4,4,9))
microbenchmark(
plafort = {strings <- sprintf("c(%f,%f) ~ c(%f,%f)", big.exp$age1, big.exp$age2, big.exp$h, big.exp$h2)
lst <- lapply(strings, function(x) {model <- lm(as.formula(x));coef(model)[1] + 100 * coef(model)[2]})
big.exp$modelh <- unlist(lst)},
akdplyr = {big.exp %>%
rowwise() %>%
do({
age <- c(.$age1, .$age2)
h <- c(.$h, .$h2)
model <- lm(age ~ h)
data.frame(., modelh = coef(model)[1] + 100*coef(model)[2])
} )}
,times=5)
t: seconds
expr min lq mean median uq max neval cld
plafort 13.00605 13.41113 13.92165 13.56927 14.53814 15.08366 5 a
akdplyr 26.95064 27.64240 29.40892 27.86258 31.02955 33.55940 5 b
(Note: I downloaded the newest 1.9.5 devel version of data.table today, but continued to receive errors when trying to test it.
The results also differ fractionally (1.93 x 10^-8). Rounding likely accounts for the difference.)
all.equal(pl, ak)
[1] "Attributes: < Component “class”: Lengths (1, 3) differ (string compare on first 1) >"
[2] "Attributes: < Component “class”: 1 string mismatch >"
[3] "Component “modelh”: Mean relative difference: 1.933893e-08"
Conclusion
The lapply approach seems to perform well compared to dplyr with respect to speed, but its rounding may be an issue (sprintf's %f keeps six decimal places by default). Improvements may be possible, perhaps by converting to a matrix and using apply to increase speed and efficiency.
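One further possibility, offered as a sketch rather than a benchmarked result: each model here is fitted to exactly two points, so lm simply returns the line through them, and the slope and intercept can be computed directly on whole columns with no per-row model at all. The data frame is called rows below (a made-up name, to avoid masking the exp function); its columns follow the question.

```r
set.seed(100)
rows <- data.frame(age1 = seq(10, 29, 1), age2 = seq(30, 49, 1),
                   h = runif(20, 10, 40), h2 = 40 + runif(20, 4, 9))

# Line through the two points (h, age1) and (h2, age2), for all rows at once
slope <- (rows$age2 - rows$age1) / (rows$h2 - rows$h)
intercept <- rows$age1 - slope * rows$h

# Same quantity as coef(model)[1] + 100 * coef(model)[2] in the loop
rows$modelh <- intercept + 100 * slope

# Spot check against lm for the first row
check <- coef(lm(c(rows$age1[1], rows$age2[1]) ~ c(rows$h[1], rows$h2[1])))
all.equal(unname(check[1] + 100 * check[2]), rows$modelh[1])
```

This only applies while there are exactly two observations per row; with more age and h columns the closed form no longer holds and one of the lm-based approaches is needed.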

Given data points and y value, give x value

Given a set of (x,y) coordinates, how can I solve for x given y? If you were to plot the coordinates, they would be non-linear, but pretty close to exponential. I tried approx(), but it is way off. Here is example data. In this scenario, how could I solve for x when y == 50?
V1 V3
1 5.35 11.7906
2 10.70 15.0451
3 16.05 19.4243
4 21.40 20.7885
5 26.75 22.0584
6 32.10 25.4367
7 37.45 28.6701
8 42.80 30.7500
9 48.15 34.5084
10 53.50 37.0096
11 58.85 39.3423
12 64.20 41.5023
13 69.55 43.4599
14 74.90 44.7299
15 80.25 46.5738
16 85.60 47.7548
17 90.95 49.9749
18 96.30 51.0331
19 101.65 52.0207
20 107.00 52.9781
21 112.35 53.8730
22 117.70 54.2907
23 123.05 56.3025
24 128.40 56.6949
25 133.75 57.0830
26 139.10 58.5051
27 144.45 59.1440
28 149.80 60.0687
29 155.15 60.6627
30 160.50 61.2313
31 165.85 61.7748
32 171.20 62.5587
33 176.55 63.2684
34 181.90 63.7085
35 187.25 64.0788
36 192.60 64.5807
37 197.95 65.2233
38 203.30 65.5331
39 208.65 66.1200
40 214.00 66.6208
41 219.35 67.1952
42 224.70 67.5270
43 230.05 68.0175
44 235.40 68.3869
45 240.75 68.7485
46 246.10 69.1878
47 251.45 69.3980
48 256.80 69.5899
49 262.15 69.7382
50 267.50 69.7693
51 272.85 69.7693
52 278.20 69.7693
53 283.55 69.7693
54 288.90 69.7693
I suppose the problem you have is that approx solves for y given x, while you are talking about solving for x given y. So you need to switch your variables x and y when using approx:
df <- read.table(textConnection("
V1 V3
85.60 47.7548
90.95 49.9749
96.30 51.0331
101.65 52.0207
"), header = TRUE)
approx(x = df$V3, y = df$V1, xout = 50)
# $x
# [1] 50
#
# $y
# [1] 91.0769
Also, if V1 is roughly exponential in V3, then log(V1) is approximately linear in V3, so it makes more sense to interpolate linearly between V3 and log(V1) and then take the exponential to get back to V1:
exp(approx(x = df$V3, y = log(df$V1), xout = 50)$y)
# [1] 91.07339
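If you would rather keep the interpolation in the forward direction (V3 as a function of V1), another option is to build the interpolating function with approxfun and let uniroot find where it crosses the target value. This is only a sketch using the same four rows as above; the names pts, f and root are illustrative, and it assumes the function crosses the target exactly once inside the search interval.

```r
pts <- data.frame(V1 = c(85.60, 90.95, 96.30, 101.65),
                  V3 = c(47.7548, 49.9749, 51.0331, 52.0207))

# Forward interpolator: V3 as a (piecewise linear) function of V1
f <- approxfun(pts$V1, pts$V3)

# Find the V1 at which the interpolated V3 equals 50
root <- uniroot(function(x) f(x) - 50, interval = range(pts$V1), tol = 1e-9)$root
root
```

For linear interpolation this agrees with the swapped approx call; the advantage is that approxfun can be replaced by splinefun or a fitted model without changing the root-finding step.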
