Computing iterations in R to calculate the rate of population increase - r

I've tried to calculate the rate of population increase, denoted as r, which is obtained from:
sum(e^(r*x)*lx*mx) = 1
I know the values of x, lx, and mx, but the value of r must be obtained iteratively so that the sum equals one.
This is the code I wrote (or tried to), which is not correct, as it returns values for the sum but not for r.
I don't know what's wrong. I'd appreciate any clue to solve this.
Thanks.
lx <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
mx <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22)
radj = sum((exp(-r*x)) * lx * mx)
for (radj in 0:1) {
  repeat {
    radj <- sum((exp(-r*x)) * lx * mx)
    print(radj)
    if (radj < 1) break
  }
}

Try this:
root <- uniroot(f = function(r) sum(exp(r * x) * lx * mx) - 1, interval = c(-1, 0))
root$root
[1] -0.8340894
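A note on sign conventions, as a quick sanity check (this sketch is not part of the original answer): the Euler-Lotka equation is usually written with exp(-r*x), in which case the same root comes out positive, and you can verify that it actually satisfies the equation:

```r
# Euler-Lotka form with exp(-r*x): the root is the (positive) growth rate.
lx <- c(rep(1, 21), 0)
mx <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
x  <- 1:22

r <- uniroot(function(r) sum(exp(-r * x) * lx * mx) - 1, interval = c(0, 1))$root
r                            # ~0.834, the negative of the root found above
sum(exp(-r * x) * lx * mx)   # ~1, confirming r solves the equation
```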

Related

Transform data to be fitted by pammtools package

I am trying to use the pammtools package to conduct some Piece-wise exponential Additive Mixed modelling: https://adibender.github.io/pammtools/index.html. In order to fit the model, I first need to transform the data using the as_ped function provided by the pammtools package: https://adibender.github.io/pammtools/reference/as_ped.html. I don't know how to make it work for my dataset. Here is a toy dataset:
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
                 time = c(0, 500, 1000, 1500,
                          0, 500, 1000, 1500, 2000, 2500,
                          0, 500, 1000, 1500),
                 event = c(0, 1, 0, 1,
                           0, 1, 0, 1, 0, 1,
                           0, 0, 0, 1),
                 tdc = c(2, 3, 4, 6,
                         2, 1, 2, 2, 3, 5,
                         4, 2, 1, 2))
It seems that the as_ped function mandates a single row for each id (no multiple rows per id). However, my data are longitudinal, and therefore each id takes up several rows. How can I make this work? Any help will be greatly appreciated!

Using R to create forest plots of coefficients for regressions on subsamples

I have a dataset of chess positions made up of 100 groups, with each group taking one of 50 positions ("Position_number") and one of two colours ("stm_white"). I want to run a linear regression for each Position_number subsample, where stm_white is the explanatory variable and stm_perform is the outcome variable. Then, I want to display the coefficient of stm_white and the associated confidence interval for each regression in a forest plot. The idea is to be able to easily see which Position_number subsample gives significant coefficients for stm_white, and to compare coefficients across positions. For example, the plot would have 50 y-axis categories labelled with each position number, the x-axis would represent the coefficient range, and the plot would display a horizontal confidence bar for each position number.
Where I'm stuck:
Getting the confidence interval bounds for each regression
Plotting each of the 50 coefficients (with confidence intervals) on one plot. (I think this is called a forest plot?)
This is how I currently get a list of the coefficients for each regression:
fits <- by(df, df[, "Position_number"],
           function(x) lm(stm_perform ~ stm_white, data = x))
# Combine coefficients from each model
do.call("rbind", lapply(fits, coef))
And here is a sample of 10 positions (apologies if there's a better way to show reproducible data):
>dput(droplevels(dfMWE[,c("Position_number","stm_white","stm_perform")]))
structure(list(Position_number = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10), stm_white = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), stm_perform = c(0.224847134350316, -0.252000458803946,
0.263005239459311, -0.337712202569111, 0.525880930891169, -0.5,
0.514387184165999, 0.520136722035817, -0.471249436107731, -0.557311633762293,
-0.382774969095054, -0.256365477992672, -0.592466230584332, 0.420100239642119,
0.35728693116738, -0.239203909010858, 0.492804918290949, -0.377349804212738,
0.498560888290847, 0.650604627933873, 0.244481117928803, 0.225852022298169,
0.448376452689039, 0.305090287270497, 0.275461757157464, 0.0232950364735793,
-0.117225030904946, 0.103523492101814, 0.098301745397805, 0.435599509759579,
-0.323024628921732, -0.790798102797238, 0.326223812111678, -0.331305043692668,
0.300230596737942, -0.340292005855252, 0.196181480575316, -0.0606495585093978,
0.789844179758131, -0.0862623926308338, -0.560150145231903, 0.697345078589853,
-0.425719796345476, 0.65321716721887, -0.878090073942596, 0.393712176214572,
0.636076899687882, 0.530184680003902, -0.567228844342952, 0.767024918145021,
-0.207303615824231, -0.332581578126777, -0.511510891217792, 0.227871326531416,
-0.0140876421179904, -0.891010911045765, -0.617225030904946,
-0.335142021445235, -0.517262524432376, 0.676301669492737, 0.375998241382333,
-0.0882899718631629, -0.154706189382, -0.108431333126633, 0.204584592662721,
0.475554538879339, 0.0840205872617279, -0.403370826694226, -0.74253555894307,
0.182570385474772, -0.484175014735265, -0.332581578126777, -0.427127748605496,
0.474119069108831, -0.0668284645696687, -0.0262098994728823,
-0.255269593134965, -0.313699742316688, -0.485612815834001, 0.302654921410147,
-0.425719796345476, 0.65321716721887, 0.393712176214572, 0.60766106412682,
0.530184680003902, 0.384135895746244, 0.564400490240421, 0.767024918145021,
0.702182602090521, 0.518699777929559, -0.281243170101218, -0.283576305897061,
0.349395372066127, -0.596629173305774, 0.0849108889395813, -0.264122555898524,
0.593855385236178, -0.418698521631085, 0.269754586702576, -0.719919005947152,
0.510072446927438, -0.0728722513945044, -0.0849108889395813,
0.0650557537775339, 0.063669188530584, -0.527315973006493, -0.716423694102939,
-0.518699777929559, 0.349395372066127, -0.518699777929559, 0.420100239642119,
-0.361262250888275, 0.431358608116332, 0.104596852632671, 0.198558626418023,
0.753386077785615, 0.418698521631085, -0.492804918290949, -0.636076899687882,
-0.294218640287997, 0.617225030904946, -0.333860575416878, -0.544494573083008,
-0.738109032540419, -0.192575818328721, -0.442688366237707, 0.455505426916992,
0.13344335621046, 0.116471711943561, 0.836830966002895, -0.125024693001636,
0.400603203290743, -0.363923100312118, -0.157741327529574, -0.281243170101218,
-0.326223812111678, -0.548774335859742, 0.104058949158278, -0.618584122089031,
-0.148779202375097, -0.543066492022212, -0.790798102797238, -0.541637702714763,
0.166337530816562, -0.431358608116332, -0.471249436107731, -0.531618297828107,
-0.135452994588696, 0.444109038883147, -0.309993792719686, 0.472684026993507,
-0.672509643334985, -0.455505426916992, -0.0304828450187082,
-0.668694956307332, 0.213036720610531, -0.370611452782498, -0.100361684849949,
-0.167940159469667, -0.256580594295053, 0.41031649686005, 0.544494573083008,
-0.675040201040299, 0.683816314193659, 0.397841906825283, 0.384135895746244,
0.634743335052317, 0.518699777929559, -0.598013765769344, -0.524445461120661,
-0.613136820153143, 0.12949974225673, -0.337712202569111, -0.189904841395243,
0.588289971863163, 0.434184796930767, -0.703385003471829, 0.505756208411145,
0.445530625978324, -0.167137309739621, 0.437015271896404, -0.550199353253537,
-0.489927553072562, -0.791748837508184, 0.434184796930767, 0.264122555898524,
-0.282408276808469, -0.574280203654524, 0.167940159469667, -0.439849854768097,
-0.604912902007957, 0.420100239642119, 0.35728693116738, 0.239220254140668,
-0.276612130560829, -0.25746444105693, 0.593855385236178, -0.632070012100074,
0.314483587504712, 0.650604627933873, -0.226860086923233, -0.702182602090521,
0.25746444105693, -0.174474012638818, 0.0166045907672774, 0.535915926945102,
0.141635395826102, 0.420100239642119, 0.557311633762293, 0.593855385236178,
0.6961287704296, 0.0444945730830079, -0.234005329233511, 0.448376452689039,
-0.86655664378954, 0.22107824319756, 0.148051654147426, 0.543066492022212,
-0.448376452689039, 0.373300918333268)), row.names = c(NA, -220L
), groups = structure(list(Position_number = c(0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10), .rows = structure(list(1:20, 21:40, 41:60,
61:80, 81:100, 101:120, 121:140, 141:160, 161:180, 181:200,
201:220), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 11L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
confint() gives you the confidence intervals of a model's coefficients.
forestplot() from the forestplot R package can draw the forest plot.
library(dplyr)
library(forestplot)

results <- lapply(unique(df$Position_number), function(pos) {
  fit <- filter(df, Position_number == pos) %>%
    lm(data = ., stm_perform ~ stm_white)
  stm_white_lm_index <- 2  # the second term in the lm() output is "stm_white"
  ci <- confint(fit)       # compute the interval matrix once, use both bounds
  data.frame(Position_number = pos,
             coefficient = coef(fit)[stm_white_lm_index],
             lb = ci[stm_white_lm_index, 1],  # lower confidence bound
             ub = ci[stm_white_lm_index, 2])  # upper confidence bound
}) %>% bind_rows()  # bind_rows() combines the output from each model in the list

with(results, forestplot(Position_number, coefficient, lb, ub))
The forest plot shows the "Position_number" labels on the left and the regression coefficients of "stm_white" with the 95% confidence intervals plotted. You can further customize the plot. See forestplot::forestplot() or this introduction by Max Gordon for details.
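If you'd rather avoid extra packages for the extraction step, the same coefficient/CI table can be built with base R alone. A sketch on hypothetical toy data (the column names match the question; the data are made up):

```r
# Base-R sketch of the extraction step: one lm() per group, with coef() and
# confint() supplying the point estimate and bounds that forestplot() (or any
# other plotting routine) needs.
set.seed(42)
toy <- data.frame(Position_number = rep(1:5, each = 20),
                  stm_white       = rep(c(0, 1), 50))
toy$stm_perform <- 0.2 * toy$stm_white + rnorm(nrow(toy), sd = 0.3)

results <- do.call(rbind, lapply(split(toy, toy$Position_number), function(d) {
  fit <- lm(stm_perform ~ stm_white, data = d)
  ci  <- confint(fit)["stm_white", ]  # 2.5 % and 97.5 % bounds
  data.frame(Position_number = d$Position_number[1],
             coefficient = unname(coef(fit)["stm_white"]),
             lb = ci[[1]],
             ub = ci[[2]])
}))

results  # one row per Position_number, ready for forestplot()
```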

ERROR in R: `[.data.frame`(y.data, , treat) : undefined columns selected - running mediation package

I am looking at the mediation of AUDITCEN --> INTERN through W1_cesd.
The relationship between AUDITCEN and INTERN is quadratic, but the relationship between AUDITCEN and W1_cesd is linear. I think this is causing my issues.
I am running:
dyad_id   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
INTERN    <- c(4, 3, 4, 2, 2, 6, 8, 6, 9, 9)
AUDITCEN  <- c(5.9, -6.1, -9.1, -5.1, -7.1, -6.1, 0.9, -2.1, -7.1, 1.9)
W1_cesd   <- c(25, 8, 5, 0, 5, 17, 10, 5, 5, 7)
GENDERKID <- c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0)
C_AGE_DI  <- c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1)
RACE_W    <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
RACE_O    <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
cesd <- data.frame(dyad_id, INTERN, AUDITCEN, W1_cesd, GENDERKID, C_AGE_DI, RACE_W, RACE_O)
library(mediation)
med.fit <- glm(W1_cesd ~ AUDITCEN + GENDERKID + C_AGE_DI + RACE_W + RACE_O, data = cesd)
out.fit <- glm(INTERN ~ W1_cesd + poly(AUDITCEN, 2) + GENDERKID + C_AGE_DI + RACE_W + RACE_O, data = cesd)
results <- mediate(med.fit, out.fit, sims = 1000, boot = TRUE, treat = "AUDITCEN",
                   control.value = -10, treat.value = 0, mediator = "W1_cesd")
"results" produces the following error:
Error in `[.data.frame`(y.data, , treat) : undefined columns selected
My treatment variable does exist and the two models look fine. What is going wrong? Am I doing something wrong when I specify the quadratic association for my exposure-outcome relationship?
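One likely cause (a hedged guess, not from the original post): poly(AUDITCEN, 2) is stored in the fitted model's frame as a single matrix column named "poly(AUDITCEN, 2)", so when mediate() subsets y.data[, treat] there is no column literally named "AUDITCEN" to find. Spelling the quadratic out with I() keeps AUDITCEN as its own column. A sketch using the question's data, trimmed to the relevant columns:

```r
# poly() hides the raw variable name from the model frame; I() does not.
cesd <- data.frame(
  INTERN   = c(4, 3, 4, 2, 2, 6, 8, 6, 9, 9),
  AUDITCEN = c(5.9, -6.1, -9.1, -5.1, -7.1, -6.1, 0.9, -2.1, -7.1, 1.9),
  W1_cesd  = c(25, 8, 5, 0, 5, 17, 10, 5, 5, 7)
)

out.poly <- glm(INTERN ~ W1_cesd + poly(AUDITCEN, 2), data = cesd)
out.I    <- glm(INTERN ~ W1_cesd + AUDITCEN + I(AUDITCEN^2), data = cesd)

"AUDITCEN" %in% colnames(model.frame(out.poly))  # FALSE: only "poly(AUDITCEN, 2)"
"AUDITCEN" %in% colnames(model.frame(out.I))     # TRUE: mediate() can find it
```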

R - run each time over one cell from one column over each cell in another column

I have a function that needs to be run for every cell of one column paired with every cell of another column.
I can do it with a loop, but I'm looking to vectorize the process or otherwise make it faster, because as it stands it would take me days to finish.
Ideally the solution would use the tidyverse, but any help would be appreciated.
My loop looks like that:
results <- data.frame(pathSubject1 = as.character(),
                      pathSubject2 = as.character())

# Loop over subject 1
for (i in 1:dim(df)[1]) {
  # Loop over subject 2
  for (j in 1:dim(df)[1]) {
    # Calculate my function for the subjects
    tempPercentSync <- myFunc(df$subject1[i], df$subject2[j])
    results <- rbind(
      results,
      data.frame(pathSubject1 = df$value[i],
                 pathSubject2 = df$value[j],
                 syncData = nest(tempPercentSync)))
  } # End second loop
} # End first loop
My example function:
myFunc <- function(x, y) {
  temp <- dplyr::inner_join(as.data.frame(x),
                            as.data.frame(y),
                            by = "Time")
  as.data.frame(summary(temp))
}
Example of my dataset using dput:
structure(list(value = c("data/ExportECG/101_1_1_0/F010.feather",
"data/ExportECG/101_1_1_0/F020.feather"), ID = c(101, 101), run = c(1,
1), timeComing = c(1, 1), part = c(0, 0), paradigm = c("F010",
"F020"), group = c(1, 1), subject1 = list(structure(list(Time = c(0,
0.5, 1, 1.5, 2, 2.5), subject1 = c(9.73940345482368, 9.08451907157601,
8.42963468832833, 7.77475030508065, 7.11986592183298, 7.24395122629289
)), .Names = c("Time", "subject1"), row.names = c(NA, 6L), class = "data.frame"),
structure(list(Time = c(0, 0.5, 1, 1.5, 2, 2.5), subject1 = c(58.3471156751544,
75.9103303197856, 83.014068283342, 89.7923167579699, 88.6748903116088,
84.7651306939912)), .Names = c("Time", "subject1"), row.names = c(NA,
6L), class = "data.frame")), subject2 = list(structure(list(
Time = c(0, 0.5, 1, 1.5, 2, 2.5), subject2 = c(77.7776200371528,
77.4139420609906, 74.9760822165258, 75.3915183650012, 77.5672070195079,
80.7418145918357)), .Names = c("Time", "subject2"), row.names = c(NA,
6L), class = "data.frame"), structure(list(Time = c(0, 0.5, 1,
1.5, 2, 2.5), subject2 = c(101.133666720578, 105.010792226714,
107.01541987713, 104.471173834529, 97.5910271952943, 92.9840354003295
)), .Names = c("Time", "subject2"), row.names = c(NA, 6L), class = "data.frame"))), .Names = c("value",
"ID", "run", "timeComing", "part", "paradigm", "group", "subject1",
"subject2"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L))
The output should look like:
pathSubject1
1 data/ExportECG/101_1_1_0/F010.feather
2 data/ExportECG/101_1_1_0/F010.feather
3 data/ExportECG/101_1_1_0/F020.feather
4 data/ExportECG/101_1_1_0/F020.feather
pathSubject2
1 data/ExportECG/101_1_1_0/F010.feather
2 data/ExportECG/101_1_1_0/F020.feather
3 data/ExportECG/101_1_1_0/F010.feather
4 data/ExportECG/101_1_1_0/F020.feather
data
1 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 20, 5, 17, 14, 8, 11, 21, 6, 19, 16, 10, 13, 22, 7, 18, 15, 9, 12
2 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 21, 6, 17, 14, 8, 12, 22, 7, 19, 16, 10, 13, 20, 5, 18, 15, 9, 11
3 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 20, 5, 17, 14, 8, 11, 21, 7, 19, 16, 10, 13, 22, 6, 18, 15, 9, 12
4 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 21, 6, 17, 14, 8, 12, 22, 7, 19, 16, 10, 13, 20, 5, 18, 15, 9, 11
Thank you!
I think you're looking for lapply (or a related function).
What's probably taking the most time is the rbind: at each step in your loops the entire results object gets slightly larger, which means it gets fully copied. With lapply, all results are calculated first, and only then do you combine them with dplyr::bind_rows.
What you get is this:
results <- dplyr::bind_rows(lapply(1:dim(df)[1], function(i) {
  dplyr::bind_rows(lapply(1:dim(df)[1], function(j) {
    data.frame(pathSubject1 = df$value[i],
               pathSubject2 = df$value[j],
               syncData = tidyr::nest(myFunc(df$subject1[[i]], df$subject2[[j]])))
  }))
}))
Does that solve your problem?
EDIT: speeding things up
I've edited it to use bind_rows instead of rbind_list, which is supposed to be faster. Furthermore, if you use [[i]] instead of [i] in the call to myFunc, you can drop the as.data.frame(x) there (and likewise for j/y).
Finally, you could optimize myFunc a bit by not assigning any intermediate results:
myFunc <- function(x, y) {
as.data.frame(summary(dplyr::inner_join(x, y, by = "Time")))
}
But my gut feeling says these will be small differences. To gain more speedup we need to reduce the actual computations, and then it matters what your actual data is, and what you need for your results.
Some observations, based on your example:
Do we need separate data.frames? We compare all values in df$subject1 with those in df$subject2. In the example, first making one large data.frame for subject1 and another for subject2 (with an extra label column if needed) would speed up the join.
Why a join? Right now the summary of the join only gives information that we could have obtained without a join as well.
We join on Time, but in the example the timestamps for subject1 and subject2 are identical. Checking whether they are the same, followed by simply copying, would be faster.
As the end result we have a data.frame with one column containing data.frames that hold the summary of the join. Is that the way you need it? I think your code could be a lot faster if you only calculate the values you actually need. Also, I haven't worked much with data.frames containing data.frames, but it could well be that bind_rows doesn't handle them efficiently. Maybe a simple list (as a column of your data.frame) would work better, as there's less overhead.
Finally, it could be that you're unable to reveal more about your real data, or that it's too complicated.
In that case I think you should look around for profiling tools: functions that help show you where most time is being spent. Personally, I like the profvis tool.
Put print(profvis::profvis({ mycode }, interval = seconds)) around a block of code; after it finishes execution you'll see which lines took the most time and which functions are called under the hood.
In the example code, almost all time is spent in the row-binding and in making data.frames. But on real data, I expect other functions may be time-consuming.
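The "build everything first, bind once" idea can also be expressed without nested lapply calls: generate all (i, j) index pairs up front and apply the function once per pair. A minimal sketch on hypothetical toy data (only the path columns, to keep it self-contained):

```r
# expand.grid() builds every ordered (i, j) pair; Map() applies the work per
# pair; a single do.call(rbind, ...) at the end avoids growing a data.frame
# inside a loop.
df <- data.frame(value = c("a.feather", "b.feather", "c.feather"))

pairs <- expand.grid(i = seq_len(nrow(df)), j = seq_len(nrow(df)))
results <- do.call(rbind, Map(function(i, j) {
  data.frame(pathSubject1 = df$value[i],
             pathSubject2 = df$value[j])
}, pairs$i, pairs$j))

nrow(results)  # 9: one row per ordered pair
```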

Creating a barplot from matrix

So, I have a matrix like this:
> dput(tbl_sum_peaks[1:40])
structure(c(2, 8, 3, 4, 1, 2, 1, 3, 1, 3, 1, 4, 4, 2, 1, 1, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 5, 4, 2, 1, 1, 2, 1,
4, 2), .Names = c("AT1G01050", "AT1G01080", "AT1G01090", "AT1G01320",
"AT1G01470", "AT1G01800", "AT1G01910", "AT1G01960", "AT1G01980",
"AT1G02150", "AT1G02470", "AT1G02500", "AT1G02560", "AT1G02780",
"AT1G02816", "AT1G02880", "AT1G02920", "AT1G02930", "AT1G03030",
"AT1G03090", "AT1G03110", "AT1G03210", "AT1G03220", "AT1G03230",
"AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680", "AT1G03740",
"AT1G03870", "AT1G04080", "AT1G04170", "AT1G04270", "AT1G04410",
"AT1G04420", "AT1G04530", "AT1G04640", "AT1G04650", "AT1G04690",
"AT1G04750"))
I would like to make a barplot whose y-axis shows how many entries have each specific value. As we can see in the given example data, most of the entries are 1, so the bar for 1 will be the tallest.
That's basic, but I can't get my brain going... so help from someone will be rewarded!
Try
barplot(table(tbl_sum_peaks))
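To see why this works, here is a small sketch with made-up values: table() counts how many times each distinct value occurs, and barplot() then draws one bar per count.

```r
# table() tabulates occurrences of each distinct value.
counts <- table(c(2, 8, 3, 1, 1, 1, 2))
counts
# 1 2 3 8
# 3 2 1 1
# barplot(counts) then draws bars of heights 3, 2, 1, 1.
```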
