Interpolate data in R [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last year.
I have these data for each individual (id).
I would like to interpolate the data (hr, fr, relVO2, percent_relVO2) for each individual at their respective interpolated 25, 50, and 75% of percent_power.
Thank you all very much for your help.
id power training hr fr percent_power relVO2 percent_relVO2 group temps
1 AC12-PRD-C1 25 linear 88.75 22.75 21.73913 8.797619 49.34068 CHD 1
2 AC12-PRD-C1 40 linear 93.25 23.00 34.78261 9.758929 54.73210 CHD 1
3 AC12-PRD-C1 55 linear 99.75 22.75 47.82609 11.324405 63.51193 CHD 1
4 AC12-PRD-C1 70 linear 109.75 23.00 60.86957 12.800595 71.79102 CHD 1
5 AC12-PRD-C1 85 linear 118.75 22.75 73.91304 14.273810 80.05341 CHD 1
6 AC12-PRD-C1 100 linear 127.00 26.00 86.95652 16.020833 89.85144 CHD 1
7 AC12-PRD-C1 115 linear 135.75 28.00 100.00000 17.830357 100.00000 CHD 1
8 AC12-PRD-C2 25 linear 84.25 20.50 17.24138 7.974646 40.10378 CHD 2
9 AC12-PRD-C2 40 linear 89.25 20.50 27.58621 8.649764 43.49889 CHD 2
10 AC12-PRD-C2 55 linear 96.25 22.25 37.93103 9.852594 49.54781 CHD 2
11 AC12-PRD-C2 70 linear 102.25 21.75 48.27586 12.529481 63.00964 CHD 2
12 AC12-PRD-C2 85 linear 110.75 22.25 58.62069 13.923939 70.02224 CHD 2
13 AC12-PRD-C2 100 linear 118.25 23.00 68.96552 15.931604 80.11861 CHD 2
14 AC12-PRD-C2 115 linear 129.25 24.75 79.31034 17.765330 89.34025 CHD 2
15 AC12-PRD-C2 130 linear 136.25 26.50 89.65517 18.552476 93.29874 CHD 2
16 AC12-PRD-C2 145 linear 147.50 29.75 100.00000 19.885024 100.00000 CHD 2
17 AL13-PRD-C1 25 nonlinear 69.50 16.50 19.23077 7.733918 41.36691 CHD 1
18 AL13-PRD-C1 40 nonlinear 73.00 17.50 30.76923 8.754386 46.82515 CHD 1
19 AL13-PRD-C1 55 nonlinear 83.25 15.50 42.30769 10.000000 53.48764 CHD 1
20 AL13-PRD-C1 70 nonlinear 93.75 16.00 53.84615 11.514620 61.58899 CHD 1
21 AL13-PRD-C1 85 nonlinear 104.50 16.00 65.38462 13.444444 71.91117 CHD 1
22 AL13-PRD-C1 100 nonlinear 114.25 19.25 76.92308 15.748538 84.23522 CHD 1
23 AL13-PRD-C1 115 nonlinear 125.25 20.75 88.46154 16.970760 90.77260 CHD 1
24 AL13-PRD-C1 130 nonlinear 136.25 24.75 100.00000 18.695906 100.00000 CHD 1
25 AL13-PRD-C2 25 nonlinear 60.25 15.75 15.62500 6.911408 30.83378 CHD 2
26 AL13-PRD-C2 40 nonlinear 63.25 14.25 25.00000 7.666869 34.20411 CHD 2
27 AL13-PRD-C2 55 nonlinear 72.75 15.75 34.37500 10.024272 44.72117 CHD 2
28 AL13-PRD-C2 70 nonlinear 79.00 15.50 43.75000 11.471481 51.17759 CHD 2
29 AL13-PRD-C2 85 nonlinear 88.25 16.00 53.12500 13.962379 62.29020 CHD 2
30 AL13-PRD-C2 100 nonlinear 99.00 16.75 62.50000 15.767597 70.34380 CHD 2
31 AL13-PRD-C2 115 nonlinear 107.00 18.00 71.87500 16.962985 75.67677 CHD 2
32 AL13-PRD-C2 130 nonlinear 118.50 21.00 81.25000 18.822816 83.97401 CHD 2
33 AL13-PRD-C2 145 nonlinear 128.25 24.25 90.62500 20.785801 92.73146 CHD 2
34 AL13-PRD-C2 160 nonlinear 142.50 29.00 100.00000 22.415049 100.00000 CHD 2

base R
byout <- by(dat[, c("percent_power","hr","fr","relVO2","percent_relVO2")], dat["id"],
  FUN = function(z) data.frame(lapply(z, function(a) approx(x = z[[1]], y = a, xout = c(25, 50, 75))$y)))
do.call(rbind, Map(function(x, nm) transform(x, id = nm), byout, names(byout)))
# percent_power hr fr relVO2 percent_relVO2 id
# AC12-PRD-C1.1 25 89.87500 22.81250 9.037947 50.68854 AC12-PRD-C1
# AC12-PRD-C1.2 50 101.41666 22.79167 11.570436 64.89178 AC12-PRD-C1
# AC12-PRD-C1.3 75 119.43750 23.02083 14.419396 80.86992 AC12-PRD-C1
# AC12-PRD-C2.1 25 88.00000 20.50000 8.480984 42.65011 AC12-PRD-C2
# AC12-PRD-C2.2 50 103.66667 21.83333 12.761891 64.17841 AC12-PRD-C2
# AC12-PRD-C2.3 75 124.66667 24.02083 17.001278 85.49790 AC12-PRD-C2
# AL13-PRD-C1.1 25 71.25000 17.00000 8.244152 44.09603 AL13-PRD-C1
# AL13-PRD-C1.2 50 90.25000 15.83333 11.009747 58.88854 AL13-PRD-C1
# AL13-PRD-C1.3 75 112.62500 18.70833 15.364522 82.18121 AL13-PRD-C1
# AL13-PRD-C2.1 25 63.25000 14.25000 7.666869 34.20411 AL13-PRD-C2
# AL13-PRD-C2.2 50 85.16667 15.83333 13.132080 58.58600 AL13-PRD-C2
# AL13-PRD-C2.3 75 110.83333 19.00000 17.582929 78.44252 AL13-PRD-C2
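For reference, approx() does straight-line interpolation between known (x, y) pairs; each grouped solution here applies it per id, with percent_power as x and the remaining columns as y. A minimal sketch with made-up numbers (not the data above):

```r
# Linear interpolation with base R's approx():
x <- c(20, 40, 60, 80, 100)    # e.g. percent_power
y <- c(80, 90, 100, 110, 120)  # e.g. hr
# Estimate y at x = 50, halfway between the known points (40, 90) and (60, 100)
approx(x, y, xout = 50)$y
# [1] 95
```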
dplyr
library(dplyr)
dat %>%
  group_by(id) %>%
  summarize(
    across(c(hr, fr, relVO2, percent_relVO2), ~ approx(cur_data()$percent_power, ., xout = c(25, 50, 75))$y),
    percent_power = c(25, 50, 75),
    .groups = "drop")
# # A tibble: 12 x 6
# id hr fr relVO2 percent_relVO2 percent_power
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AC12-PRD-C1 89.9 22.8 9.04 50.7 25
# 2 AC12-PRD-C1 101. 22.8 11.6 64.9 50
# 3 AC12-PRD-C1 119. 23.0 14.4 80.9 75
# 4 AC12-PRD-C2 88.0 20.5 8.48 42.7 25
# 5 AC12-PRD-C2 104. 21.8 12.8 64.2 50
# 6 AC12-PRD-C2 125. 24.0 17.0 85.5 75
# 7 AL13-PRD-C1 71.2 17 8.24 44.1 25
# 8 AL13-PRD-C1 90.3 15.8 11.0 58.9 50
# 9 AL13-PRD-C1 113. 18.7 15.4 82.2 75
# 10 AL13-PRD-C2 63.2 14.2 7.67 34.2 25
# 11 AL13-PRD-C2 85.2 15.8 13.1 58.6 50
# 12 AL13-PRD-C2 111. 19 17.6 78.4 75
Note that in dplyr, the order of calculation matters. If we put the percent_power = reassignment before across(...), then approx would never see the real (longer) percent_power column; it would only see the new c(25, 50, 75) values.
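The pitfall can be shown on a toy frame (hypothetical columns xp/yv, not the data above): interpolate first, then overwrite the x-axis column.

```r
library(dplyr)

toy <- data.frame(xp = c(10, 50, 90), yv = c(1, 5, 9))

toy %>%
  summarize(
    yv = approx(xp, yv, xout = c(25, 75))$y,  # still sees the original 3-row xp
    xp = c(25, 75))
# yv comes out as 2.5 and 7.5 here; with the two lines swapped, approx() would
# see xp = c(25, 75) instead and interpolate against the wrong (shorter) x vector.
```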
data.table
library(data.table)
DT <- as.data.table(dat) # setDT(dat) would be more canonical
DT[, c(
list(percent_power = c(25, 50, 75)),
lapply(.SD, function(a) approx(x = percent_power, y = a, xout = c(25, 50, 75))$y)
), by = .(id), .SDcols = c("hr","fr","relVO2","percent_relVO2")]
# id percent_power hr fr relVO2 percent_relVO2
# 1: AC12-PRD-C1 25 89.87500 22.81250 9.037947 50.68854
# 2: AC12-PRD-C1 50 101.41666 22.79167 11.570436 64.89178
# 3: AC12-PRD-C1 75 119.43750 23.02083 14.419396 80.86992
# 4: AC12-PRD-C2 25 88.00000 20.50000 8.480984 42.65011
# 5: AC12-PRD-C2 50 103.66667 21.83333 12.761891 64.17841
# 6: AC12-PRD-C2 75 124.66667 24.02083 17.001278 85.49790
# 7: AL13-PRD-C1 25 71.25000 17.00000 8.244152 44.09603
# 8: AL13-PRD-C1 50 90.25000 15.83333 11.009747 58.88854
# 9: AL13-PRD-C1 75 112.62500 18.70833 15.364522 82.18121
# 10: AL13-PRD-C2 25 63.25000 14.25000 7.666869 34.20411
# 11: AL13-PRD-C2 50 85.16667 15.83333 13.132080 58.58600
# 12: AL13-PRD-C2 75 110.83333 19.00000 17.582929 78.44252
Unlike in dplyr, I can give the new field the same name percent_power, since in the subsequent lapply the reference to percent_power resolves to the original (longer) column, not the new one. (This is one area where mutate and data.table semantics are very different: in mutate, one calculation can refer to a column computed earlier in the same mutate, whereas in data.table this is not the case.)
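A toy illustration of that difference (hypothetical column names, not the data above):

```r
library(dplyr)
library(data.table)

toy <- data.frame(a = 1:3)

# dplyr: the second expression sees the freshly reassigned 'a'
mutate(toy, a = a * 10, b = a + 1)              # b is 11, 21, 31

# data.table: both j-expressions see the original 'a'
as.data.table(toy)[, .(a = a * 10, b = a + 1)]  # b is 2, 3, 4
```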
Data
dat <- structure(list(id = c("AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2"), power = c(25L, 40L, 55L, 70L, 85L, 100L, 115L, 25L, 40L, 55L, 70L, 85L, 100L, 115L, 130L, 145L, 25L, 40L, 55L, 70L, 85L, 100L, 115L, 130L, 25L, 40L, 55L, 70L, 85L, 100L, 115L, 130L, 145L, 160L), training = c("linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "linear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear"), hr = c(88.75, 93.25, 99.75, 109.75, 118.75, 127, 135.75, 84.25, 89.25, 96.25, 102.25, 110.75, 118.25, 129.25, 136.25, 147.5, 69.5, 73, 83.25, 93.75, 104.5, 114.25, 125.25, 136.25, 60.25, 63.25, 72.75, 79, 88.25, 99, 107, 118.5, 128.25, 142.5), fr = c(22.75, 23, 22.75, 23, 22.75, 26, 28, 20.5, 20.5, 22.25, 21.75, 22.25, 23, 24.75, 26.5, 29.75, 16.5, 17.5, 15.5, 16, 16, 19.25, 20.75, 24.75, 15.75, 14.25, 15.75, 15.5, 16, 16.75, 18, 21, 24.25, 29), percent_power = c(21.73913, 34.78261, 47.82609, 60.86957, 73.91304, 86.95652, 100, 17.24138, 27.58621, 37.93103, 48.27586, 58.62069, 68.96552, 79.31034, 89.65517, 100, 19.23077, 30.76923, 42.30769, 53.84615, 65.38462, 76.92308, 88.46154, 100, 15.625, 25, 34.375, 43.75, 53.125, 62.5, 71.875, 81.25, 90.625, 100), relVO2 = c(8.797619, 9.758929, 11.324405, 12.800595, 14.27381, 16.020833, 17.830357, 
7.974646, 8.649764, 9.852594, 12.529481, 13.923939, 15.931604, 17.76533, 18.552476, 19.885024, 7.733918, 8.754386, 10, 11.51462, 13.444444, 15.748538, 16.97076, 18.695906, 6.911408, 7.666869, 10.024272, 11.471481, 13.962379, 15.767597, 16.962985, 18.822816, 20.785801, 22.415049), percent_relVO2 = c(49.34068, 54.7321, 63.51193, 71.79102, 80.05341, 89.85144, 100, 40.10378, 43.49889, 49.54781, 63.00964, 70.02224, 80.11861, 89.34025, 93.29874, 100, 41.36691, 46.82515, 53.48764, 61.58899, 71.91117, 84.23522, 90.7726, 100, 30.83378, 34.20411, 44.72117, 51.17759, 62.2902, 70.3438, 75.67677, 83.97401, 92.73146, 100), group = c("CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD", "CHD"), temps = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34"))

Related

Interpolation and estimation of data in R

Here are my data showing the cardiopulmonary evolution during acute exercise with gas-exchange analysis. As you can see, when I report the data according to the relative percentage of peak power (percent_power), the percentages don't line up across individuals because, of course, everyone has a different peak.
I would therefore like to interpolate each individual's data to estimate values at specific percentages (e.g. 25%, 50%, 75%) and put them in a column. Consequently, each variable should be estimated as well, so that it corresponds to the new estimated 25, 50, and 75% of percent_power.
Thank you all very much for your help.
id power training hr fr VE absVO2 VCO2 PETCO2 VES QC IC WCI RVSi RVS VTD FE body_mass percent_absVO2 percent_power relVO2 percent_relVO2 group temps
1 AC12-PRD-C1 25 linear 88.75 22.75 22.75 0.73900 0.66700 39.2925 88.650 8.025 3.975 4.825 1768.75 876.00 143.025 62.050 84.0 49.34068 21.73913 8.797619 49.34068 CHD 1
2 AC12-PRD-C1 40 linear 93.25 23.00 23.75 0.81975 0.71500 39.6200 87.375 8.050 3.975 4.825 1759.50 871.75 141.625 61.725 84.0 54.73210 34.78261 9.758929 54.73210 CHD 1
3 AC12-PRD-C1 55 linear 99.75 22.75 26.75 0.95125 0.85400 41.4100 93.375 9.175 4.550 5.525 1540.50 763.00 150.325 62.100 84.0 63.51193 47.82609 11.324405 63.51193 CHD 1
4 AC12-PRD-C1 70 linear 109.75 23.00 32.50 1.07525 1.04700 42.0150 93.825 10.025 4.925 6.000 1414.25 700.50 145.750 64.375 84.0 71.79102 60.86957 12.800595 71.79102 CHD 1
5 AC12-PRD-C1 85 linear 118.75 22.75 39.50 1.19900 1.25125 41.8425 97.375 11.225 5.575 6.750 1260.75 624.50 148.975 65.325 84.0 80.05341 73.91304 14.273810 80.05341 CHD 1
6 AC12-PRD-C1 100 linear 127.00 26.00 48.25 1.34575 1.51850 41.0950 100.900 12.550 6.225 7.525 1127.75 558.75 154.225 65.475 84.0 89.85144 86.95652 16.020833 89.85144 CHD 1
7 AC12-PRD-C1 115 linear 135.75 28.00 55.75 1.49775 1.76025 40.7275 104.475 13.950 6.875 8.375 1014.00 502.25 157.975 66.250 84.0 100.00000 100.00000 17.830357 100.00000 CHD 1
8 AC12-PRD-C2 25 linear 84.25 20.50 20.75 0.67625 0.59950 38.9575 102.700 8.650 4.275 5.575 1775.00 879.50 216.450 48.350 84.8 40.10378 17.24138 7.974646 40.10378 CHD 2
9 AC12-PRD-C2 40 linear 89.25 20.50 23.25 0.73350 0.66225 38.5500 111.625 9.725 4.800 6.250 1567.75 776.75 217.800 51.825 84.8 43.49889 27.58621 8.649764 43.49889 CHD 2
10 AC12-PRD-C2 55 linear 96.25 22.25 26.75 0.83550 0.77500 38.3350 101.300 9.325 4.625 6.000 1619.75 802.75 202.700 50.350 84.8 49.54781 37.93103 9.852594 49.54781 CHD 2
11 AC12-PRD-C2 70 linear 102.25 21.75 32.50 1.06250 1.01550 39.6525 103.550 10.350 5.125 6.625 1459.00 723.00 194.050 53.675 84.8 63.00964 48.27586 12.529481 63.00964 CHD 2
12 AC12-PRD-C2 85 linear 110.75 22.25 37.75 1.18075 1.19225 40.1300 100.825 10.650 5.275 6.825 1424.00 705.25 194.250 51.900 84.8 70.02224 58.62069 13.923939 70.02224 CHD 2
13 AC12-PRD-C2 100 linear 118.25 23.00 42.75 1.35100 1.40300 41.1500 108.950 12.375 6.100 7.950 1225.50 606.75 197.325 55.175 84.8 80.11861 68.96552 15.931604 80.11861 CHD 2
14 AC12-PRD-C2 115 linear 129.25 24.75 51.25 1.50650 1.65650 40.7575 107.625 13.275 6.550 8.525 1133.50 561.50 201.225 53.525 84.8 89.34025 79.31034 17.765330 89.34025 CHD 2
15 AC12-PRD-C2 130 linear 136.25 26.50 58.75 1.57325 1.83200 39.6750 108.925 14.375 7.125 9.250 1045.75 518.25 196.025 55.675 84.8 93.29874 89.65517 18.552476 93.29874 CHD 2
16 AC12-PRD-C2 145 linear 147.50 29.75 70.00 1.68625 2.07350 38.1600 104.875 15.025 7.450 9.600 1010.75 500.75 185.400 56.825 84.8 100.00000 100.00000 19.885024 100.00000 CHD 2
17 AL13-PRD-C1 25 nonlinear 69.50 16.50 24.00 0.66125 0.58050 31.2275 101.825 7.175 3.500 4.450 2126.50 1037.25 220.850 48.550 85.5 41.36691 19.23077 7.733918 41.36691 CHD 1
18 AL13-PRD-C1 40 nonlinear 73.00 17.50 26.50 0.74850 0.66425 32.1025 107.850 7.775 3.775 4.850 1942.00 947.25 242.825 48.000 85.5 46.82515 30.76923 8.754386 46.82515 CHD 1
19 AL13-PRD-C1 55 nonlinear 83.25 15.50 29.00 0.85500 0.79425 33.6650 110.250 9.075 4.375 5.650 1706.00 832.50 233.500 47.325 85.5 53.48764 42.30769 10.000000 53.48764 CHD 1
20 AL13-PRD-C1 70 nonlinear 93.75 16.00 36.50 0.98450 0.99925 34.5325 114.650 10.425 5.075 6.525 1462.00 713.25 233.075 49.175 85.5 61.58899 53.84615 11.514620 61.58899 CHD 1
21 AL13-PRD-C1 85 nonlinear 104.50 16.00 44.75 1.14950 1.23475 34.4225 120.650 12.150 5.925 7.550 1249.25 609.25 233.575 51.775 85.5 71.91117 65.38462 13.444444 71.91117 CHD 1
22 AL13-PRD-C1 100 nonlinear 114.25 19.25 55.25 1.34650 1.48375 33.1800 115.250 12.775 6.275 7.975 1178.25 574.75 220.375 52.350 85.5 84.23522 76.92308 15.748538 84.23522 CHD 1
23 AL13-PRD-C1 115 nonlinear 125.25 20.75 63.75 1.45100 1.65775 32.6450 117.500 14.100 6.875 8.825 1095.25 534.25 236.575 50.200 85.5 90.77260 88.46154 16.970760 90.77260 CHD 1
24 AL13-PRD-C1 130 nonlinear 136.25 24.75 78.00 1.59850 1.89075 30.9000 119.150 15.575 7.600 9.700 968.25 472.25 231.075 51.600 85.5 100.00000 100.00000 18.695906 100.00000 CHD 1
25 AL13-PRD-C2 25 nonlinear 60.25 15.75 19.00 0.56950 0.46550 32.2575 154.625 9.450 4.700 6.075 1597.75 794.75 348.975 44.850 82.4 30.83378 15.62500 6.911408 30.83378 CHD 2
26 AL13-PRD-C2 40 nonlinear 63.25 14.25 19.50 0.63175 0.52325 33.5700 143.225 9.275 4.625 5.975 1631.75 811.50 326.325 44.575 82.4 34.20411 25.00000 7.666869 34.20411 CHD 2
27 AL13-PRD-C2 55 nonlinear 72.75 15.75 25.00 0.82600 0.69925 34.4600 147.350 10.175 5.075 6.525 1497.25 744.75 312.475 47.950 82.4 44.72117 34.37500 10.024272 44.72117 CHD 2
28 AL13-PRD-C2 70 nonlinear 79.00 15.50 30.75 0.94525 0.86850 34.9675 153.575 11.925 5.925 7.675 1257.00 625.25 271.525 56.625 82.4 51.17759 43.75000 11.471481 51.17759 CHD 2
29 AL13-PRD-C2 85 nonlinear 88.25 16.00 37.50 1.15050 1.08025 35.6175 155.200 13.325 6.625 8.550 1127.00 560.50 282.300 54.975 82.4 62.29020 53.12500 13.962379 62.29020 CHD 2
30 AL13-PRD-C2 100 nonlinear 99.00 16.75 44.75 1.29925 1.31475 35.6475 154.150 14.775 7.325 9.500 1030.75 512.50 285.350 54.500 82.4 70.34380 62.50000 15.767597 70.34380 CHD 2
31 AL13-PRD-C2 115 nonlinear 107.00 18.00 50.00 1.39775 1.45600 36.0325 161.000 16.675 8.300 10.725 898.00 446.50 282.850 57.175 82.4 75.67677 71.87500 16.962985 75.67677 CHD 2
32 AL13-PRD-C2 130 nonlinear 118.50 21.00 61.50 1.55100 1.73675 34.8775 162.300 18.300 9.100 11.750 815.75 405.75 276.700 58.700 82.4 83.97401 81.25000 18.822816 83.97401 CHD 2
33 AL13-PRD-C2 145 nonlinear 128.25 24.25 74.75 1.71275 1.99100 33.3300 161.025 19.925 9.900 12.800 749.50 372.75 267.875 60.175 82.4 92.73146 90.62500 20.785801 92.73146 CHD 2
34 AL13-PRD-C2 160 nonlinear 142.50 29.00 90.50 1.84700 2.21650 30.9325 154.750 20.925 10.425 13.425 715.50 355.75 272.250 57.100 82.4 100.00000 100.00000 22.415049 100.00000 CHD 2

Remove duplicates based on column values in specific intervals in R

I have multi-column data as follows. I want to remove rows having duplicate values in depth column.
Date Levels values depth
1 2005-12-31 1 182.80 0
2 2005-12-31 2 182.80 0
3 2005-12-31 5 182.80 2
4 2005-12-31 6 182.80 2
5 2005-12-31 7 182.80 2
6 2005-12-31 8 182.80 3
7 2005-12-31 9 182.80 4
8 2005-12-31 10 182.80 4
9 2005-12-31 11 182.80 5
10 2005-12-31 13 182.70 7
11 2005-12-31 14 182.70 8
12 2005-12-31 16 182.60 10
13 2005-12-31 17 182.50 12
14 2005-12-31 20 181.50 17
15 2005-12-31 23 177.50 23
16 2005-12-31 26 165.90 31
17 2005-12-31 28 155.00 36
18 2005-12-31 29 149.20 40
19 2005-12-31 31 136.90 46
20 2005-12-31 33 126.10 53
21 2005-12-31 35 112.70 60
22 2005-12-31 38 88.23 70
23 2005-12-31 41 67.99 79
24 2005-12-31 44 54.63 87
25 2005-12-31 49 45.40 98
26 2006-12-31 1 182.80 0
27 2006-12-31 2 182.80 0
28 2006-12-31 5 182.80 2
29 2006-12-31 6 182.80 2
30 2006-12-31 7 182.80 2
31 2006-12-31 8 182.80 3
32 2006-12-31 9 182.80 4
33 2006-12-31 10 182.80 4
34 2006-12-31 11 182.70 5
35 2006-12-31 13 182.70 7
36 2006-12-31 14 182.70 8
37 2006-12-31 16 182.60 10
38 2006-12-31 17 182.50 12
39 2006-12-31 20 181.50 17
40 2006-12-31 23 178.60 23
41 2006-12-31 26 168.70 31
42 2006-12-31 28 156.90 36
43 2006-12-31 29 150.40 40
44 2006-12-31 31 137.10 46
45 2006-12-31 33 126.00 53
46 2006-12-31 35 112.70 60
47 2006-12-31 38 91.80 70
48 2006-12-31 41 75.91 79
49 2006-12-31 44 65.17 87
50 2006-12-31 49 58.33 98
I know how to remove duplicates based on a column as follows:
nodup <- distinct(df, column, .keep_all = TRUE)
But how can I do this for every 25-row interval?
base R
do.call(rbind, by(dat, (seq_len(nrow(dat)) - 1) %/% 25,
  function(z) z[!duplicated(z$depth), ]))
# Date Levels values depth
# 0.1 2005-12-31 1 182.8 0
# 0.3 2005-12-31 5 182.8 2
# 0.6 2005-12-31 8 182.8 3
# 0.7 2005-12-31 9 182.8 4
# 0.9 2005-12-31 11 182.8 5
# 0.10 2005-12-31 13 182.7 7
# 0.11 2005-12-31 14 182.7 8
# 0.12 2005-12-31 16 182.6 10
# 0.13 2005-12-31 17 182.5 12
# 0.14 2005-12-31 20 181.5 17
# 0.15 2005-12-31 23 177.5 23
# 0.16 2005-12-31 26 165.9 31
# 0.17 2005-12-31 28 155.0 36
# 0.18 2005-12-31 29 149.2 40
# 0.19 2005-12-31 31 136.9 46
# 0.20 2005-12-31 33 126.1 53
# 0.21 2005-12-31 35 112.7 60
# 0.22 2005-12-31 38 88.2 70
# 0.23 2005-12-31 41 68.0 79
# 0.24 2005-12-31 44 54.6 87
# 0.25 2005-12-31 49 45.4 98
# 1.26 2006-12-31 1 182.8 0
# 1.28 2006-12-31 5 182.8 2
# 1.31 2006-12-31 8 182.8 3
# 1.32 2006-12-31 9 182.8 4
# 1.34 2006-12-31 11 182.7 5
# 1.35 2006-12-31 13 182.7 7
# 1.36 2006-12-31 14 182.7 8
# 1.37 2006-12-31 16 182.6 10
# 1.38 2006-12-31 17 182.5 12
# 1.39 2006-12-31 20 181.5 17
# 1.40 2006-12-31 23 178.6 23
# 1.41 2006-12-31 26 168.7 31
# 1.42 2006-12-31 28 156.9 36
# 1.43 2006-12-31 29 150.4 40
# 1.44 2006-12-31 31 137.1 46
# 1.45 2006-12-31 33 126.0 53
# 1.46 2006-12-31 35 112.7 60
# 1.47 2006-12-31 38 91.8 70
# 1.48 2006-12-31 41 75.9 79
# 1.49 2006-12-31 44 65.2 87
# 1.50 2006-12-31 49 58.3 98
or
dat[!ave(dat$depth, (seq_len(nrow(dat))-1) %/% 25, FUN = duplicated),]
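This works because duplicated() returns TRUE/FALSE, which ave() coerces back into the numeric vector as 0/1 per group; negating that gives a keep-mask. A toy sketch (made-up vectors):

```r
v <- c(5, 5, 7, 5)
g <- c(1, 1, 2, 2)
ave(v, g, FUN = duplicated)   # 0 1 0 0: the second 5 repeats within group 1
!ave(v, g, FUN = duplicated)  # TRUE FALSE TRUE TRUE, the rows to keep

# The same integer-division trick used above builds the 25-row group index:
(seq_len(50) - 1) %/% 25      # 0 for rows 1-25, 1 for rows 26-50
```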
dplyr
library(dplyr)
dat %>%
  group_by(grp = (seq_len(n()) - 1) %/% 25) %>%
  distinct(depth, .keep_all = TRUE) %>%
  ungroup() %>%
  select(-grp)
# # A tibble: 42 x 4
# Date Levels values depth
# <chr> <int> <dbl> <int>
# 1 2005-12-31 1 183. 0
# 2 2005-12-31 5 183. 2
# 3 2005-12-31 8 183. 3
# 4 2005-12-31 9 183. 4
# 5 2005-12-31 11 183. 5
# 6 2005-12-31 13 183. 7
# 7 2005-12-31 14 183. 8
# 8 2005-12-31 16 183. 10
# 9 2005-12-31 17 182. 12
# 10 2005-12-31 20 182. 17
# # ... with 32 more rows
data.table
library(data.table)
as.data.table(dat)[, .SD[!duplicated(depth),], by=.( (seq_len(nrow(dat))-1) %/% 25 ) ][,-1]
(The [,-1] on the end is because the by= grouping operation implicitly prepends the seq_len(.)... counter as its first column.)
(Notice a theme? :-)
Data
dat <- structure(list(Date = c("2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2005-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31", "2006-12-31"), Levels = c(1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 16L, 17L, 20L, 23L, 26L, 28L, 29L, 31L, 33L, 35L, 38L, 41L, 44L, 49L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 16L, 17L, 20L, 23L, 26L, 28L, 29L, 31L, 33L, 35L, 38L, 41L, 44L, 49L), values = c(182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.7, 182.7, 182.6, 182.5, 181.5, 177.5, 165.9, 155, 149.2, 136.9, 126.1, 112.7, 88.23, 67.99, 54.63, 45.4, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.8, 182.7, 182.7, 182.7, 182.6, 182.5, 181.5, 178.6, 168.7, 156.9, 150.4, 137.1, 126, 112.7, 91.8, 75.91, 65.17, 58.33), depth = c(0L, 0L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 7L, 8L, 10L, 12L, 17L, 23L, 31L, 36L, 40L, 46L, 53L, 60L, 70L, 79L, 87L, 98L, 0L, 0L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 7L, 8L, 10L, 12L, 17L, 23L, 31L, 36L, 40L, 46L, 53L, 60L, 70L, 79L, 87L, 98L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50"))
We could use order and !duplicated:
df = df[order(df[,'depth']),]
df = df[!duplicated(df$depth),]
df
Date Levels values depth
<date> <dbl> <dbl> <dbl>
1 2005-12-31 1 183. 0
2 2005-12-31 5 183. 2
3 2005-12-31 8 183. 3
4 2005-12-31 9 183. 4
5 2005-12-31 11 183. 5
6 2005-12-31 13 183. 7
7 2005-12-31 14 183. 8
8 2006-12-31 49 58.3 9
9 2005-12-31 16 183. 10
10 2005-12-31 17 182. 12
# … with 12 more rows

How to average and estimate data in R

I have this data:
id power training hr percent_absVO2 percent_power relVO2 percent_relVO2
1 AC12-PRD-C1 25 linear 88.75 49.34068 21.73913 8.797619 49.34068
2 AC12-PRD-C1 40 linear 93.25 54.73210 34.78261 9.758929 54.73210
3 AC12-PRD-C1 55 linear 99.75 63.51193 47.82609 11.324405 63.51193
4 AC12-PRD-C1 70 linear 109.75 71.79102 60.86957 12.800595 71.79102
5 AC12-PRD-C1 85 linear 118.75 80.05341 73.91304 14.273810 80.05341
6 AC12-PRD-C1 100 linear 127.00 89.85144 86.95652 16.020833 89.85144
7 AC12-PRD-C1 115 linear 135.75 100.00000 100.00000 17.830357 100.00000
8 AC12-PRD-C2 25 linear 84.25 40.10378 17.24138 7.974646 40.10378
9 AC12-PRD-C2 40 linear 89.25 43.49889 27.58621 8.649764 43.49889
10 AC12-PRD-C2 55 linear 96.25 49.54781 37.93103 9.852594 49.54781
11 AC12-PRD-C2 70 linear 102.25 63.00964 48.27586 12.529481 63.00964
12 AC12-PRD-C2 85 linear 110.75 70.02224 58.62069 13.923939 70.02224
13 AC12-PRD-C2 100 linear 118.25 80.11861 68.96552 15.931604 80.11861
14 AC12-PRD-C2 115 linear 129.25 89.34025 79.31034 17.765330 89.34025
15 AC12-PRD-C2 130 linear 136.25 93.29874 89.65517 18.552476 93.29874
16 AC12-PRD-C2 145 linear 147.50 100.00000 100.00000 19.885024 100.00000
17 AL13-PRD-C1 25 nonlinear 69.50 41.36691 19.23077 7.733918 41.36691
18 AL13-PRD-C1 40 nonlinear 73.00 46.82515 30.76923 8.754386 46.82515
19 AL13-PRD-C1 55 nonlinear 83.25 53.48764 42.30769 10.000000 53.48764
20 AL13-PRD-C1 70 nonlinear 93.75 61.58899 53.84615 11.514620 61.58899
21 AL13-PRD-C1 85 nonlinear 104.50 71.91117 65.38462 13.444444 71.91117
22 AL13-PRD-C1 100 nonlinear 114.25 84.23522 76.92308 15.748538 84.23522
23 AL13-PRD-C1 115 nonlinear 125.25 90.77260 88.46154 16.970760 90.77260
24 AL13-PRD-C1 130 nonlinear 136.25 100.00000 100.00000 18.695906 100.00000
25 AL13-PRD-C2 25 nonlinear 60.25 30.83378 15.62500 6.911408 30.83378
26 AL13-PRD-C2 40 nonlinear 63.25 34.20411 25.00000 7.666869 34.20411
27 AL13-PRD-C2 55 nonlinear 72.75 44.72117 34.37500 10.024272 44.72117
28 AL13-PRD-C2 70 nonlinear 79.00 51.17759 43.75000 11.471481 51.17759
29 AL13-PRD-C2 85 nonlinear 88.25 62.29020 53.12500 13.962379 62.29020
30 AL13-PRD-C2 100 nonlinear 99.00 70.34380 62.50000 15.767597 70.34380
31 AL13-PRD-C2 115 nonlinear 107.00 75.67677 71.87500 16.962985 75.67677
32 AL13-PRD-C2 130 nonlinear 118.50 83.97401 81.25000 18.822816 83.97401
33 AL13-PRD-C2 145 nonlinear 128.25 92.73146 90.62500 20.785801 92.73146
34 AL13-PRD-C2 160 nonlinear 142.50 100.00000 100.00000 22.415049 100.00000
As you can see, not everyone has the same percent_power values. I would like to put everyone on the same increments of the percent_power column, for example 25%, 50%, 75%, and 100%.
That means the values have to be estimated, if I understand correctly.
Here is an example, for only 2 patients, of what I would expect. Values in the other columns have to be estimated when the exact percent_power doesn't exist.
id power training hr percent_absVO2 percent_power relVO2 percent_relVO2
1 AC12-PRD-C1 25 linear 88.75 49.34068 25.00000 8.797619 49.34068
2 AC12-PRD-C1 55 linear 99.75 63.51193 50.00000 11.324405 63.51193
3 AC12-PRD-C1 85 linear 118.75 80.05341 75.00000 14.273810 80.05341
4 AC12-PRD-C1 115 linear 135.75 100.00000 100.00000 17.830357 100.00000
5 AC12-PRD-C2 40 linear 89.25 43.49889 25.00000 8.649764 43.49889
6 AC12-PRD-C2 70 linear 102.25 63.00964 50.00000 12.529481 63.00964
7 AC12-PRD-C2 115 linear 129.25 89.34025 75.00000 17.765330 89.34025
16 AC12-PRD-C2 145 linear 147.50 100.00000 100.00000 19.885024 100.00000
Thank you for your help!
Your question is not well defined: you want us to predict multiple variables, but you do not say how, which is crucial. There are infinitely many ways in which you could "predict" these missing values. For example, your "percent" variables are bounded between 0 and 100, which you have to build into your prediction, or you will get nonsense results.
Having said that, I will do just that and predict all of these variables using a simple linear model, which assumes a linear relationship between each of your numeric variables and percent_power.
First create a linear model to make your predictions
mod = lm(cbind(power, hr, percent_absVO2, relVO2, percent_relVO2) ~ percent_power * id + training, data = df)
then create your dataset with all the combinations that you want to predict
tst = setNames(
  expand.grid(unique(df[, "id"]), unique(df[, "training"]), seq(25, 100, 25)),
  c("id", "training", "percent_power")
)
and then predict
cbind(
  tst,
  predict(mod, tst)
)
   id training percent_power power hr percent_absVO2 relVO2 percent_relVO2
1  AC12-PRD-C1 linear 25 28.75 88.08482 49.30491 8.791242 49.30491
2  AC12-PRD-C2 linear 25 36.25 86.88333 43.57464 8.664828 43.57464
3  AL13-PRD-C1 linear 25 32.50 70.24554 42.60440 7.965278 42.60440
4  AL13-PRD-C2 linear 25 40.00 63.65909 36.80735 8.250386 36.80735
5  AC12-PRD-C1 nonlinear 25 28.75 88.08482 49.30491 8.791242 49.30491
6  AC12-PRD-C2 nonlinear 25 36.25 86.88333 43.57464 8.664828 43.57464
7  AL13-PRD-C1 nonlinear 25 32.50 70.24554 42.60440 7.965278 42.60440
8  AL13-PRD-C2 nonlinear 25 40.00 63.65909 36.80735 8.250386 36.80735
9  AC12-PRD-C1 linear 50 57.50 103.65774 65.64847 11.705357 65.64847
10 AC12-PRD-C2 linear 50 72.50 106.05556 63.13669 12.554745 63.13669
11 AL13-PRD-C1 linear 50 65.00 91.71230 61.50428 11.498782 61.50428
12 AL13-PRD-C2 linear 50 80.00 88.20455 57.97911 12.996047 57.97911
13 AC12-PRD-C1 nonlinear 50 57.50 103.65774 65.64847 11.705357 65.64847
14 AC12-PRD-C2 nonlinear 50 72.50 106.05556 63.13669 12.554745 63.13669
15 AL13-PRD-C1 nonlinear 50 65.00 91.71230 61.50428 11.498782 61.50428
16 AL13-PRD-C2 nonlinear 50 80.00 88.20455 57.97911 12.996047 57.97911
17 AC12-PRD-C1 linear 75 86.25 119.23065 81.99203 14.619473 81.99203
18 AC12-PRD-C2 linear 75 108.75 125.22778 82.69873 16.444662 82.69873
19 AL13-PRD-C1 linear 75 97.50 113.17907 80.40415 15.032285 80.40415
20 AL13-PRD-C2 linear 75 120.00 112.75000 79.15088 17.741707 79.15088
21 AC12-PRD-C1 nonlinear 75 86.25 119.23065 81.99203 14.619473 81.99203
22 AC12-PRD-C2 nonlinear 75 108.75 125.22778 82.69873 16.444662 82.69873
23 AL13-PRD-C1 nonlinear 75 97.50 113.17907 80.40415 15.032285 80.40415
24 AL13-PRD-C2 nonlinear 75 120.00 112.75000 79.15088 17.741707 79.15088
25 AC12-PRD-C1 linear 100 115.00 134.80357 98.33560 17.533588 98.33560
26 AC12-PRD-C2 linear 100 145.00 144.40000 102.26077 20.334578 102.26077
27 AL13-PRD-C1 linear 100 130.00 134.64583 99.30403 18.565789 99.30403
28 AL13-PRD-C2 linear 100 160.00 137.29545 100.32264 22.487368 100.32264
29 AC12-PRD-C1 nonlinear 100 115.00 134.80357 98.33560 17.533588 98.33560
30 AC12-PRD-C2 nonlinear 100 145.00 144.40000 102.26077 20.334578 102.26077
31 AL13-PRD-C1 nonlinear 100 130.00 134.64583 99.30403 18.565789 99.30403
32 AL13-PRD-C2 nonlinear 100 160.00 137.29545 100.32264 22.487368 100.32264
Note how the percents have gone over 100, which was to be expected given our model.
If I understood you correctly, this is what you want:
library(dplyr)
df <- structure(list(id = c("AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1",
"AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C1", "AC12-PRD-C2",
"AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2",
"AC12-PRD-C2", "AC12-PRD-C2", "AC12-PRD-C2", "AL13-PRD-C1", "AL13-PRD-C1",
"AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1", "AL13-PRD-C1",
"AL13-PRD-C1", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2",
"AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2", "AL13-PRD-C2"
), power = c(25, 40, 55, 70, 85, 100, 115, 25, 40, 55, 70, 85,
100, 115, 130, 145, 25, 40, 55, 70, 85, 100, 115, 130, 25, 40,
55, 70, 85, 100, 115, 130, 145), training = c("linear", "linear",
"linear", "linear", "linear", "linear", "linear", "linear", "linear",
"linear", "linear", "linear", "linear", "linear", "linear", "linear",
"nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear",
"nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear",
"nonlinear", "nonlinear", "nonlinear", "nonlinear", "nonlinear",
"nonlinear", "nonlinear"), hr = c(88.75, 93.25, 99.75, 109.75,
118.75, 127, 135.75, 84.25, 89.25, 96.25, 102.25, 110.75, 118.25,
129.25, 136.25, 147.5, 69.5, 73, 83.25, 93.75, 104.5, 114.25,
125.25, 136.25, 60.25, 63.25, 72.75, 79, 88.25, 99, 107, 118.5,
128.25), percent_absVO2 = c(49.34068, 54.7321, 63.51193, 71.79102,
80.05341, 89.85144, 100, 40.10378, 43.49889, 49.54781, 63.00964,
70.02224, 80.11861, 89.34025, 93.29874, 100, 41.36691, 46.82515,
53.48764, 61.58899, 71.91117, 84.23522, 90.7726, 100, 30.83378,
34.20411, 44.72117, 51.17759, 62.2902, 70.3438, 75.67677, 83.97401,
92.73146), percent_power = c(21.73913, 34.78261, 47.82609, 60.86957,
73.91304, 86.95652, 100, 17.24138, 27.58621, 37.93103, 48.27586,
58.62069, 68.96552, 79.31034, 89.65517, 100, 19.23077, 30.76923,
42.30769, 53.84615, 65.38462, 76.92308, 88.46154, 100, 15.625,
25, 34.375, 43.75, 53.125, 62.5, 71.875, 81.25, 90.625), relVO2 = c(8.797619,
9.758929, 11.324405, 12.800595, 14.273810, 16.020833,
17.830357, 7.974646, 8.649764, 9.852594, 12.529481,
13.923939, 15.931604, 17.765330, 18.552476, 19.885024,
7.733918, 8.754386, 10.000000, 11.514620, 13.444444,
15.748538, 16.970760, 18.695906, 6.911408, 7.666869,
10.024272, 11.471481, 13.962379, 15.767597, 16.962985,
18.822816, 20.785801), percent_relVO2 = c(49.34068, 54.7321,
63.51193, 71.79102, 80.05341, 89.85144, 100, 40.10378, 43.49889,
49.54781, 63.00964, 70.02224, 80.11861, 89.34025, 93.29874, 100,
41.36691, 46.82515, 53.48764, 61.58899, 71.91117, 84.23522, 90.7726,
100, 30.83378, 34.20411, 44.72117, 51.17759, 62.2902, 70.3438,
75.67677, 83.97401, 92.73146)), row.names = c(NA, -33L), class = c("tbl_df",
"tbl", "data.frame"))
df %>% dplyr::mutate(., percent_power = dplyr::case_when(
percent_absVO2 < 25 ~ 0,
percent_absVO2 < 50 ~ 25,
percent_absVO2 < 75 ~ 50,
percent_absVO2 < 100 ~ 75,
TRUE ~ 100
))
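Note that the `case_when` above only bins `percent_power` into categories; it does not interpolate. The interpolation the question asks for (values of `hr`, `relVO2` and `percent_relVO2` at 25, 50 and 75 % of `percent_power`, per individual) can be sketched with `stats::approx()` inside a grouped `reframe()`. This is a minimal sketch, assuming dplyr >= 1.1 and the `dput` data above stored in `df` (the `fr` column is not in the `dput`, but can be added to the `across()` selection in the same way):

```r
library(dplyr)

targets <- c(25, 50, 75)

interp <- df %>%
  group_by(id) %>%
  reframe(
    percent_power_target = targets,
    # linear interpolation of each variable at the target percentages
    across(c(hr, relVO2, percent_relVO2),
           ~ approx(x = percent_power, y = .x, xout = targets)$y)
  )
interp
```

Each id's `percent_power` range covers 25-75 %, so `approx()` never has to extrapolate here; if a target fell outside an individual's range it would return `NA` unless a `rule` argument is supplied.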

How to perform a polynomial equation and curve?

I have this dataframe:
id power hr fr VE VO2 VCO2 PETCO2 percent_VO2 percent_power
1 BM06-PRD-S1 25 119.25 18.25 19.00 0.61675 0.58225 37.6425 48.87084 25.00000
2 BM06-PRD-S1 40 126.00 18.00 20.75 0.71700 0.65950 39.2175 56.81458 40.00000
3 BM06-PRD-S1 55 133.50 20.75 25.00 0.86275 0.82750 41.2150 68.36371 55.00000
4 BM06-PRD-S1 70 147.25 18.25 29.00 0.98575 1.04550 41.7050 78.11014 70.00000
5 BM06-PRD-S1 85 158.50 22.25 39.25 1.13000 1.30525 41.1425 89.54041 85.00000
6 BM06-PRD-S1 100 168.75 27.75 51.00 1.26200 1.61150 38.8925 100.00000 100.00000
7 CB19-PRD-S1 25 98.75 18.50 25.00 0.88350 0.80475 40.7550 36.15715 13.15789
8 CB19-PRD-S1 40 98.25 20.00 25.50 0.94575 0.82900 41.4675 38.70473 21.05263
9 CB19-PRD-S1 55 102.00 19.75 28.50 1.08125 0.95800 42.2775 44.25005 28.94737
10 CB19-PRD-S1 70 107.50 20.50 34.25 1.24400 1.14275 42.6450 50.91058 36.84211
11 CB19-PRD-S1 85 111.00 21.25 35.50 1.30475 1.19925 43.3600 53.39677 44.73684
12 CB19-PRD-S1 100 117.25 21.50 40.25 1.47350 1.42225 44.2650 60.30284 52.63158
13 CB19-PRD-S1 115 123.00 22.75 47.00 1.67900 1.68475 44.6400 68.71291 60.52632
14 CB19-PRD-S1 130 129.50 24.50 52.50 1.79075 1.87950 44.3425 73.28627 68.42105
15 CB19-PRD-S1 145 135.50 25.25 59.50 1.96000 2.13525 44.7300 80.21281 76.31579
16 CB19-PRD-S1 160 145.25 26.75 64.50 2.04050 2.28350 43.8825 83.50726 84.21053
17 CB19-PRD-S1 175 151.25 30.50 83.00 2.34425 2.76050 41.6025 95.93820 92.10526
18 CB19-PRD-S1 190 161.75 33.75 92.25 2.44350 2.96850 40.0400 100.00000 100.00000
19 CC14-PRD-S1 20 102.50 19.00 18.25 0.59250 0.54825 37.7175 49.26211 22.22222
20 CC14-PRD-S1 30 110.25 18.75 19.75 0.66100 0.60325 38.5800 54.95739 33.33333
21 CC14-PRD-S1 40 113.25 18.50 20.75 0.74350 0.66025 39.2950 61.81667 44.44444
22 CC14-PRD-S1 50 122.50 20.00 23.50 0.87875 0.77325 40.5650 73.06173 55.55556
23 CC14-PRD-S1 60 126.25 17.50 26.25 0.94350 0.89375 41.3525 78.44523 66.66667
24 CC14-PRD-S1 70 132.00 16.50 28.00 0.99675 0.98525 42.7575 82.87258 77.77778
25 CC14-PRD-S1 80 145.00 18.50 32.75 1.11425 1.16275 42.5025 92.64186 88.88889
26 CC14-PRD-S1 90 153.50 19.50 37.25 1.20275 1.32700 42.0975 100.00000 100.00000
27 DA24-PRD-S1 25 88.00 18.50 15.75 0.53500 0.45075 37.2200 40.33170 21.73913
28 DA24-PRD-S1 40 93.25 18.50 16.25 0.58450 0.47775 38.3375 44.06332 34.78261
29 DA24-PRD-S1 55 103.75 19.00 20.25 0.76875 0.65450 40.1875 57.95326 47.82609
30 DA24-PRD-S1 70 119.00 20.75 28.00 0.98200 0.95525 41.5175 74.02940 60.86957
31 DA24-PRD-S1 85 133.25 22.75 34.75 1.09975 1.18325 41.4125 82.90614 73.91304
32 DA24-PRD-S1 100 145.00 27.50 45.75 1.25900 1.49700 39.1475 94.91142 86.95652
33 DA24-PRD-S1 115 155.25 36.50 64.75 1.32650 1.72500 33.0275 100.00000 100.00000
I am running a plot using ggplot and ggscatter:
ggplot(dftest, aes(percent_power, PETCO2)) +
  geom_point()

ggscatter(dftest, x = "percent_power", y = "PETCO2", add = "reg.line") +
  stat_cor(label.x = 20, label.y = 3.8) +
  stat_regline_equation(label.x = 20, label.y = 0.5) +
  xlab("Percentage of power (%)") +
  geom_smooth(method = "lm", colour = "red") +
  ylab(expression(paste("PETC", O[2], " (mmHg)")))
I would like to fit a polynomial equation and curve, because so far I am only able to run a linear regression.
Thank you!
I think what you're asking is how to fit a polynomial regression line to your data while still using the ggpubr functions to annotate it.
This is possible, but the built-in regression line (`add = "reg.line"`) can only be either a straight line or a loess model, neither of which is appropriate here. However, you can fit a polynomial curve and display its equation and adjusted R-squared on the plot using the method below. In your case I have used a cubic formula, but you should choose the polynomial degree based on a known model, or on what you already know about the relationship between your variables.
You can use ggplot2's geom_smooth to add the actual curve, as suggested by @Roland, as long as it uses the same formula as the one you supply to stat_regline_equation:
ggscatter(dftest, x = "percent_power", y = "PETCO2") +
  stat_regline_equation(label.x = 20, label.y = 0.5,
                        formula = y ~ poly(x, 3),
                        aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~"))) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3)) +
  xlab("Percentage of power (%)") +
  ylab(expression(paste("PETC", O[2], " (mmHg)")))
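If you also want the fitted polynomial itself, outside the plot, the same model can be fit directly with lm. A short sketch, assuming the data frame above is stored in `dftest`:

```r
# Cubic fit on orthogonal polynomials, as used by geom_smooth above
fit <- lm(PETCO2 ~ poly(percent_power, 3), data = dftest)
summary(fit)$adj.r.squared   # the adjusted R-squared annotated on the plot

# raw = TRUE gives coefficients on the natural scale:
# intercept, x, x^2, x^3
coef(lm(PETCO2 ~ poly(percent_power, 3, raw = TRUE), data = dftest))
```

Orthogonal and raw polynomials give identical fitted values and R-squared; only the coefficient parameterisation differs.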
Which gives this result:

Apply function to each row for each group in dplyr group by

I have a dataframe that consists of functions I want to apply let f1 and f2 represent these functions that take dbh and ht as arguments.
spcd region function
122 'OR_W' f1
141 'OR_W' f2
I also have a dataframe that looks like
spcd region dbh ht
122 'OR_W' 12 101
122 'OR_W' 13 141
122 'OR_W' 15 122
141 'OR_W' 12 101
etc
I want to apply the functions stored in the first data frame to the rows in the second dataframe to produce something like this
spcd region dbh ht output
122 'OR_W' 12 101 <output of f1>
122 'OR_W' 13 141 <output of f1>
122 'OR_W' 15 122 <output of f1>
141 'OR_W' 12 101 <output of f2>
Where <output of f1> is the output of the first function with the inputs of dbh and ht.
I think that dplyr's group_by would be useful for this, by grouping on spcd and region in the second dataframe, and then applying the correct function for each row in that group.
Is there a way to apply a function, row-wise, to a group within a dplyr group_by object?
This is a base R solution; apply() accepts the function names stored in data2$fun and looks them up with match.fun:

Map(apply, split(data1[-1], data1$d), 1, c(data2$fun))

data1["output"] = c(mapply(apply, split(data1[-1], data1$d), 1, c(data2$fun)))
data1
d Girth Height Volume output
1 1 8.3 70 10.3 88.6
2 1 8.6 65 10.3 83.9
3 1 8.8 63 10.2 82.0
4 1 10.5 72 16.4 98.9
5 1 10.7 81 18.8 110.5
6 2 10.8 83 19.7 83.0
7 2 11.0 66 15.6 66.0
8 2 11.0 75 18.2 75.0
9 2 11.1 80 22.6 80.0
10 2 11.2 75 19.9 75.0
data used:
data1=structure(list(d = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
Girth = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1,
11.2), Height = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75),
Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2,
22.6, 19.9)), .Names = c("d", "Girth", "Height", "Volume"
), row.names = c(NA, 10L), class = "data.frame")
data2=structure(list(X1.2 = 1:2, fun = c("sum", "max")), .Names = c("X1.2",
"fun"), row.names = c(NA, -2L), class = "data.frame")