How to get the results from nls - r

My data are as follows:
df<-read.table (text=" cup size
12 30.76923077
13 38.46153846
14 53.84615385
15 76.92307692
16 92.30769231
17 100", header=TRUE)
I used the following code to fit the model:
y=as.vector(df$cup)
x=as.vector(df$size)
fit <- nls(1/y ~ 1 + exp(-b*(x-c)), start = list(b = 0.01, c = 12), alg = "plinear")
I want to extract the values of b, c and .lin separately, like this:
b= 0.01552
c= 5.98258
lin= 0.04887
I have used fit[[2]], but it does not work for me.

This should work:
df<-read.table (text=" cup size
12 30.76923077
13 38.46153846
14 53.84615385
15 76.92307692
16 92.30769231
17 100", header=TRUE)
y=as.vector(df$cup)
x=as.vector(df$size)
fit <- nls(1/y ~ 1 + exp(-b*(x-c)), start = list(b = 0.01, c = 12), alg = "plinear")
myvector <- coef(fit)
b <- myvector[1]
c <- myvector[2]
lin <- myvector[3]

As suggested by @user20650, coef is the function used to extract the values from the model.
If you refer to the source code, you'll see that the values are hidden in the m element, in the function getAllPars():
fit$m$getAllPars()
b c .lin
0.01552340 5.98262729 0.04887368
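Since coef() returns a named numeric vector, the parameters can also be pulled out by name rather than by position. A minimal sketch using the same data and model as above; the names pars, b_hat, c_hat and lin_hat are illustrative and not from the original post:

```r
df <- read.table(text = "cup size
12 30.76923077
13 38.46153846
14 53.84615385
15 76.92307692
16 92.30769231
17 100", header = TRUE)
y <- df$cup
x <- df$size

fit <- nls(1/y ~ 1 + exp(-b * (x - c)),
           start = list(b = 0.01, c = 12), algorithm = "plinear")

pars <- coef(fit)            # named numeric vector: b, c, .lin
b_hat   <- pars[["b"]]
c_hat   <- pars[["c"]]
lin_hat <- pars[[".lin"]]    # the conditionally linear parameter added by "plinear"
```

Indexing by name keeps the extraction correct even if the order of the parameters in the start list changes.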


Implementing additional constraint variables in integer programming using lpSolve

I'm working to implement a lpSolve solution to optimizing a hypothetical daily fantasy baseball problem. I'm having trouble applying my last constraint:
position - Exactly 3 outfielders (OF) 2 pitchers (P) and 1 of everything else
cost - Cost less than 200
team - Max number from any one team is 6
team - Minimum number of teams on a roster is 3
Say for example you have a dataframe of 1000 players with points, cost, position, and team and you're trying to maximize average points:
library(tidyverse)
library(lpSolve)
set.seed(123)
df <- data_frame(avg_points = sample(5:45,1000, replace = T),
cost = sample(3:45,1000, replace = T),
position = sample(c("P","C","1B","2B","3B","SS","OF"),1000, replace = T),
team = sample(LETTERS,1000, replace = T)) %>% mutate(id = row_number())
head(df)
# A tibble: 6 x 5
# avg_points cost position team id
# <int> <int> <chr> <chr> <int>
#1 17 13 2B Y 1
#2 39 45 1B P 2
#3 29 33 1B C 3
#4 38 31 2B V 4
#5 17 13 P A 5
#6 10 6 SS V 6
I've implemented the first 3 constraints with the following code, but I'm having trouble figuring out how to implement the minimum number of teams on a roster. I think I need to add additional variables to the model, but I'm not sure how to do that.
#set the objective function (what we want to maximize)
obj <- df$avg_points
# set the constraint rows.
con <- rbind(t(model.matrix(~ position + 0,df)), cost = df$cost, t(model.matrix(~ team + 0, df)) )
#set the constraint values
rhs <- c(1,1,1,1,3,2,1, # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
200, # 2. at a cost less than 200
rep(6,26) # 3. max number from any team is 6
)
#set the direction of the constraints
dir <- c("=","=","=","=","=","=","=","<=",rep("<=",26))
result <- lp("max",obj,con,dir,rhs,all.bin = TRUE)
If it helps, I'm trying to replicate this paper (with minor tweaks), which has corresponding Julia code here.
This might be a solution for your problem.
This is the data I have used (identical to yours):
library(tidyverse)
library(lpSolve)
N <- 1000
set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
cost = sample(3:45,N, replace = T),
position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
team = sample(LETTERS,N, replace = T)) %>%
mutate(id = row_number())
You want to find x1...xn that maximise the objective function below:
x1 * average_points1 + x2 * average_points2 + ... + xn * average_pointsn
With the way lpSolve works, you will need to express every LHS as the sum over
x1...xn times the vector you provide.
Since you cannot express the number of teams with your current variables, you can introduce new ones (I will call them y1..yn_teams and z1..zn_teams):
# number of teams:
n_teams = length(unique(df$team))
Your new objective function (the ys and zs will not influence your overall objective function, since their coefficients are set to 0):
obj <- c(df$avg_points, rep(0, 2 * n_teams))
The first 3 constraints are the same, but padded with zero coefficients for y and z:
c1 <- t(model.matrix(~ position + 0,df))
c1 <- cbind(c1,
matrix(0, ncol = 2 * n_teams, nrow = nrow(c1)))
c2 = df$cost
c2 <- c(c2, rep(0, 2 * n_teams))
c3 = t(model.matrix(~ team + 0, df))
c3 <- cbind(c3, matrix(0, ncol = 2 * n_teams, nrow = nrow(c3)))
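To see what the model.matrix() rows contribute, here is a minimal sketch on a toy pool of 8 hypothetical players (the players data frame and its values are made up for illustration). Each row of the transposed matrix is a 0/1 indicator for one position, so multiplying it by a selection vector x gives the left-hand side of the position constraints:

```r
# Toy roster pool: 8 hypothetical players
players <- data.frame(
  position = c("P", "C", "1B", "2B", "3B", "SS", "OF", "OF"),
  cost     = c(10, 12, 8, 9, 11, 7, 15, 6)
)

# One row per position, one column per player; an entry is 1 if the player has that position
pos_rows <- t(model.matrix(~ position + 0, players))

# A candidate selection: pick every player
x <- rep(1, nrow(players))

# Left-hand sides of the position constraints: number of picked players per position
lhs <- drop(pos_rows %*% x)
lhs
```

The rows follow the sorted order of the position levels, so the values in rhs have to be listed in that same order.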
Since you want to have at least 3 teams, you first use y to count the number of players per team. For each team, sum up all the players you have picked and subtract the corresponding y variable; the result must be equal to 0. (diag() creates the identity matrix; we do not worry about z at this point.)
# should be x1...xn - y1...n = 0
c4_1 <- cbind(t(model.matrix(~team + 0, df)), # x
-diag(n_teams), # y
matrix(0, ncol = n_teams, nrow = n_teams) # z
) # == 0
Since each y is now the number of players in a team, the x and y terms below cancel out, so the constraint reduces to z <= 1. Together with the integer restriction, this makes sure that z is binary:
c4_2 <- cbind(t(model.matrix(~ team + 0, df)), # x1+...+xn ==
-diag(n_teams), # - (y1+...+yn )
diag(n_teams) # z binary
) # <= 1
This is the constraint that ensures that at least 3 teams are picked:
c4_3 <- c(rep(0, nrow(df) + n_teams), # x and y
rep(1, n_teams) # z >= 3
)
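You need to make sure that z_i is 1 whenever team i has at least one player on the roster (y_i >= 1).
You can use the big-M method for that to create a constraint, which is:
y_i <= M * z_i
Or, in a more lpSolve friendly version:
y_i - M * z_i <= 0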
In this case you can use 6 as a value for M, because it is the largest value any y can take:
c4_4 <- cbind(matrix(0, nrow = n_teams, ncol = nrow(df)),
diag(n_teams),
-diag(n_teams) * 6)
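A quick way to see what the big-M rows in c4_4 allow is to enumerate every combination of a team count y and an indicator z and test the inequality y - M * z <= 0 directly. This is only an illustrative check in base R, not part of the lpSolve model; the grid data frame is a made-up helper:

```r
M <- 6  # largest value a team count y can take (the per-team maximum)

# Enumerate every combination of a team count y and an indicator z
grid <- expand.grid(y = 0:M, z = 0:1)

# The big-M constraint from c4_4: y - M * z <= 0
grid$feasible <- grid$y - M * grid$z <= 0

grid[grid$feasible, ]
```

Under this constraint alone, y > 0 forces z = 1, while y = 0 still allows either value of z. If z must be 0 for teams without a player, one option (an assumption, not part of the original answer) would be an extra set of rows enforcing z - y <= 0.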
This constraint is added to make sure all x are binary:
#all x binary
c5 <- cbind(diag(nrow(df)), # x
matrix(0, ncol = 2 * n_teams, nrow = nrow(df)) # y + z
)
Create the new constraint matrix:
con <- rbind(c1,
c2,
c3,
c4_1,
c4_2,
c4_3,
c4_4,
c5)
#set the constraint values
rhs <- c(1,1,1,1,3,2,1, # 1. #exactly 3 outfielders 2 pitchers and 1 of everything else
200, # 2. at a cost less than 200
rep(6, n_teams), # 3. max number from any team is 6
rep(0, n_teams), # c4_1
rep(1, n_teams), # c4_2
3, # c4_3,
rep(0, n_teams), #c4_4
rep(1, nrow(df))# c5 binary
)
#set the direction of the constraints
dir <- c(rep("==", 7), # c1
"<=", # c2
rep("<=", n_teams), # c3
rep('==', n_teams), # c4_1
rep('<=', n_teams), # c4_2
'>=', # c4_3
rep('<=', n_teams), # c4_4
rep('<=', nrow(df)) # c5
)
The problem is almost the same, but I am using all.int instead of all.bin to make sure the counts work for the players in the team:
result <- lp("max",obj,con,dir,rhs,all.int = TRUE)
Success: the objective function is 450
roster <- df[result$solution[1:nrow(df)] == 1, ]
roster
# A tibble: 10 x 5
avg_points cost position team id
<int> <int> <chr> <chr> <int>
1 45 19 C I 24
2 45 5 P X 126
3 45 25 OF N 139
4 45 22 3B J 193
5 45 24 2B B 327
6 45 25 OF P 340
7 45 23 P Q 356
8 45 13 OF N 400
9 45 13 SS L 401
10 45 45 1B G 614
If you change your data to
N <- 1000
set.seed(123)
df <- tibble(avg_points = sample(5:45,N, replace = T),
cost = sample(3:45,N, replace = T),
position = sample(c("P","C","1B","2B","3B","SS","OF"),N, replace = T),
team = sample(c("A", "B"),N, replace = T)) %>%
mutate(id = row_number())
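It will now be infeasible, because the number of teams in the data is less than 3.
Going back to the original data with 26 teams, you can check that it works by comparing the teams flagged by the z variables (positions 1027 to 1052 of the solution) with the teams on the roster: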
sort(unique(df$team))[result$solution[1027:1052]==1]
[1] "B" "E" "I" "J" "N" "P" "Q" "X"
sort(unique(roster$team))
[1] "B" "E" "I" "J" "N" "P" "Q" "X"

How to perform Broken-line regression analysis in R?

I have the following data:
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
How can I do Broken-line regression analysis of this data in R to get the graph as below:
The best way to fit linear models by segments in R is to use CRAN package segmented.
In what follows, I have created a new column, coercing column Treatment from class factor to its integer codes.
library(segmented)
df1$Num <- as.integer(factor(df1$Treatment)) # factor() is needed in R >= 4.0, where read.table() returns character columns
fit <- lm(Value ~ Num, df1)
summary(fit)
seg <- segmented(fit, seg.Z = ~Num, psi = 6)
plot(Value ~ Num, df1) # plot the points
plot(seg, add = TRUE) # plot the broken line
abline(v = seg$psi[2]) # plot the vertical at the breakpoint
Data.
df1 <- read.table(text = "
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
", header = TRUE)
A different approach is to first find the threshold and then fit a regular lm() model:
library(SiZer)
df <- read.table(text = "
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
", header = TRUE)
df$Num <- as.integer(factor(df$Treatment)) # factor() is needed in R >= 4.0, where read.table() returns character columns
thr.pwl = piecewise.linear(df$Num, df$Value,
middle = 1, CI = FALSE,
bootstrap.samples = 1000, sig.level = 0.001)
thr.pwl
[1] "Threshold alpha: 6.30159931424453" #This is the threshold you need
[1] ""
[1] "Model coefficients: Beta[0], Beta[1], Beta[2]" #The estimates here are the same as in model.pwl, however, with lm() you can include also other independent variables
(Intercept) x w
111.48333 -6.63000 13.97001
model.pwl <- lm(Value ~ Num*(Num >= 6.30) + Num*(Num < 6.30),
data = df)
summary(model.pwl)
And you can plot it as:
plot(thr.pwl)
abline(v = thr.pwl$change.point)
However, with piecewise.linear() you can only use one threshold, while segmented() can handle several.
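For intuition about what both packages are doing, the single-breakpoint case can also be sketched in base R: try a grid of candidate breakpoints, fit a continuous two-segment line with a hinge term for each, and keep the one with the smallest residual sum of squares. This is a simplified stand-in, not the algorithm used by segmented or SiZer, and the names candidates, rss, best_psi and best_fit are illustrative:

```r
df1 <- data.frame(
  Num   = 1:8,  # integer codes of Treatment, in the order given above
  Value = c(104.17, 98.17, 92, 86.67, 77.33, 71.33, 74.83, 82.17)
)

# Candidate breakpoints strictly inside the range of Num
candidates <- seq(2, 7, by = 0.01)

# Residual sum of squares of a continuous two-segment fit for each candidate
rss <- sapply(candidates, function(psi) {
  fit <- lm(Value ~ Num + pmax(Num - psi, 0), data = df1)
  sum(resid(fit)^2)
})

best_psi <- candidates[which.min(rss)]
best_fit <- lm(Value ~ Num + pmax(Num - best_psi, 0), data = df1)

best_psi        # estimated breakpoint
coef(best_fit)  # intercept, first slope, change in slope after the breakpoint
```

On this data the search should land close to the threshold of about 6.3 reported by piecewise.linear() above. The grid step limits the precision, and the sketch gives no standard error for the breakpoint, which is what the dedicated packages add.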

mcmcglmm loop to create many chains

Following up from this question (see for reproducible data frame) I want to run MCMCGLMM n times, where n is the number of randomisations. I have tried to construct a loop which runs all the chains, and saves them (to retrieve the posterior distributions of the randomised variable later) but I am encountering problems.
This is what the data frame looks like (when n = 5, hence R1-R5), A = response variable, L and V are random effect variables, B is a fixed effect, R1-R5 are random assignments of L with structure of V maintained:
ID L B V A R1 R2 R3 R4 R5
1 1_1_1 1 1 1 11.1 6 19 21 1 31
2 1_1_1 1 1 1 6.9 6 19 21 1 31
3 1_1_4 1 1 4 7.7 2 24 8 22 22
4 1_1_4 1 1 4 10.5 2 24 8 22 22
5 1_1_5 1 1 5 8.5 11 27 14 17 22
6 1_1_7 1 1 7 11.2 5 24 13 18 25
I can create the names I want to assign to my chains, and the names of the variable that changes with each run of the MCMC chain (R1-Rn):
n = 5
Rs = as.vector(rep(NA,n))
for(i in 1:n){
Rs[i] = paste("R",i, sep = "")
}
Rs
Output:
> Rs
[1] "R1" "R2" "R3" "R4" "R5"
I then tried this loop to produce 5 chains:
for(i in 1:n){
chains[i] = MCMCglmm(A ~1 + B,
random = as.formula(paste0("~" ,Rs[i], " + Vial")),
rcov = ~units,
nitt = 500,
thin = 2,
burnin = 50,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = df)
}
Thanks to Roland for helping to get the random effect to be called properly; previously I was getting the error "Error in buildZ(rmodel.terms[r] ... object Rs[i] not found", which was fixed by using as.formula.
But this stores all of the data in chains and seemingly only the $Sol components, but I need to be able to access the values within the VCV, specifically the posterior distributions of the R variables (e.g. summary(chainR1$VCV))
In summary: It seems I am making a mistake in how I assign the chain names, does anyone have a suggestion of how to do this, and save the posterior distributions or even the whole chain?
Using assign was a key point:
n = 10 #Number of chains to run
chainVCVdf = matrix(rep(NA, times = ((nitt-burnin)/thin)*n), ncol = n)
colnames(chainVCVdf)=c(rep("X", times = n))
for(i in 1:n){
assign("chainX",paste0("chain",Rs[i]))
chainX = MCMCglmm(A ~1 + B,
random = as.formula(paste0("~" ,Rs[i], " + V")),
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burnin,
prior = prior1,
family = "gaussian",
start = list(QUASI = FALSE),
data = df)
assign("chainVCV", chainX$VCV[,1])
chainVCVdf[,i]=(chainVCV)
colnames(chainVCVdf)[i] = colnames(chainX$VCV)[1]
}
It then became possible to build a matrix of the VCV component that I am interested in (namely the randomised L assignment in columns R1-Rn)
It seems as though you want to run a number of different MCMCglmm formulas in a loop. @Roland has helped you find the solution to this (although I personally would create the formulas prior to the loop). @Roland also points out that in order to save the results of each model, you should save them in a list, rather than a chain as you are currently doing. You could also save each model as an .RData file, as seen at the end of the question. To formalize an answer to this question, I would perform this in the following way:
Rs = paste0("~R", 1:5, " + V") ## Create all model formulae
chainNames = paste0("chainR", 1:5) ## Names for each model
chains = list() ## Initialize list
## Loop over models
for(i in 1:length(Rs)){
chains[[i]] = MCMCglmm(A ~1 + B,
random = formula(Rs[i]),
rcov = ~units,
nitt = 500,
thin = 2,
burnin = 50,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = df)
}
names(chains) = chainNames ## Name each model
save(chains, file = "chainsR1-R5.Rdata") ## Save all model output
A side note, paste0 is the same as paste, but with the argument sep="" by default
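Once the models sit in a named list, the posterior samples of one variance component can be collected from every model in a single call. The sketch below uses a stand-in function, fake_fit(), in place of MCMCglmm() so that it runs without the package or the data; it only mimics the shape of the $VCV element, and all names in it are hypothetical:

```r
set.seed(1)
n_models  <- 5
n_samples <- (500 - 50) / 2   # (nitt - burnin) / thin stored iterations

# Stand-in for MCMCglmm(): returns a list with a VCV matrix shaped like the one
# a real fit would carry (columns: randomised term, V, units). Purely illustrative.
fake_fit <- function(r_name) {
  list(VCV = matrix(rexp(n_samples * 3), ncol = 3,
                    dimnames = list(NULL, c(r_name, "V", "units"))))
}

chains <- lapply(paste0("R", 1:n_models), fake_fit)
names(chains) <- paste0("chainR", 1:n_models)

# One column per model: samples of the first variance component
vcv_first <- sapply(chains, function(m) m$VCV[, 1])
dim(vcv_first)
summary(vcv_first[, "chainR1"])
```

With real fits, the same sapply() call on the chains list would give a matrix comparable to chainVCVdf in the question, without needing assign().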

Multi-way interaction: easy way to get numerical coefficient estimates?

Let's say there's a 4-way interaction, with a 2x2x2 factorial design plus a continuous variable.
Factors have the default contrast coding (contr.treatment). Here's an example:
set.seed(1)
cat1 <- as.factor(sample(letters[1:2], 1000, replace = TRUE))
cat2 <- as.factor(sample(letters[3:4], 1000, replace = TRUE))
cat3 <- as.factor(sample(letters[5:6], 1000, replace = TRUE))
cont1 <- rnorm(1000)
resp <- rnorm(1000)
df <- data.frame(cat1, cat2, cat3, cont1, resp)
mod <- lm(resp ~ cont1 * cat1 * cat2 * cat3, data = df)
Looking at the output of coef(mod), we get something like:
(Intercept) cont1 cat1b
0.019822407 0.011990238 0.207604677
cat2d cat3f cont1:cat1b
-0.010132897 0.105397591 -0.001153867
cont1:cat2d cat1b:cat2d cont1:cat3f
0.023358901 -0.194991402 0.060960695
cat1b:cat3f cat2d:cat3f cont1:cat1b:cat2d
-0.240624582 -0.117278931 -0.069880751
cont1:cat1b:cat3f cont1:cat2d:cat3f cat1b:cat2d:cat3f
-0.120446848 -0.141688864 0.136945262
cont1:cat1b:cat2d:cat3f
0.201792298
And to get the estimated intercept for cat1b (for example), we would add our implicit (Intercept) term and cat1b, i.e. coef(mod)[1] + coef(mod)[3]. To get the slope for the same category, we would use coef(mod)[2] + coef(mod)[6], a la this r-bloggers post. It gets pretty tedious to write all of them out, and methods(class="lm") doesn't look like it has any functions that do this right out of the gate.
Is there some obvious way to get numerical estimates for the intercept and slope for each combination of factors?
You're looking for the lsmeans package. Check it out:
library(lsmeans)
lstrends(mod, specs = c('cat1', 'cat2', 'cat3'), var = 'cont1')
cat1 cat2 cat3 cont1.trend SE df lower.CL upper.CL
a c e 0.01199024 0.08441129 984 -0.15365660 0.1776371
b c e 0.01083637 0.08374605 984 -0.15350502 0.1751778
a d e 0.03534914 0.09077290 984 -0.14278157 0.2134799
b d e -0.03568548 0.09644117 984 -0.22493948 0.1535685
a c f 0.07295093 0.08405090 984 -0.09198868 0.2378905
b c f -0.04864978 0.09458902 984 -0.23426916 0.1369696
a d f -0.04537903 0.09363128 984 -0.22911897 0.1383609
b d f -0.03506820 0.08905581 984 -0.20982934 0.1396929
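The same numbers can be cross-checked in base R with predict(). Because the model is linear in cont1, the prediction at cont1 = 0 is the intercept for a given combination of factors, and the difference between the predictions at cont1 = 1 and cont1 = 0 is its slope. A minimal sketch (the grid, p0 and p1 names are illustrative):

```r
set.seed(1)
cat1 <- as.factor(sample(letters[1:2], 1000, replace = TRUE))
cat2 <- as.factor(sample(letters[3:4], 1000, replace = TRUE))
cat3 <- as.factor(sample(letters[5:6], 1000, replace = TRUE))
cont1 <- rnorm(1000)
resp <- rnorm(1000)
df <- data.frame(cat1, cat2, cat3, cont1, resp)
mod <- lm(resp ~ cont1 * cat1 * cat2 * cat3, data = df)

# Every combination of the three factors
grid <- expand.grid(cat1 = levels(df$cat1),
                    cat2 = levels(df$cat2),
                    cat3 = levels(df$cat3))

# Predictions at cont1 = 0 and cont1 = 1 for each combination
p0 <- predict(mod, newdata = data.frame(grid, cont1 = 0))
p1 <- predict(mod, newdata = data.frame(grid, cont1 = 1))

grid$intercept <- unname(p0)
grid$slope     <- unname(p1 - p0)
grid
```

This gives point estimates only; the standard errors and confidence limits shown in the table above still need a package such as lsmeans.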

Using split function in R

I am trying to simulate three small datasets, each of which contains x1, x2, x3, x4, trt and IND.
However, when I try to split the simulated data by IND using "split" in R, I get warning messages and the outputs are incorrect. Could someone please give me a hint about what I did wrong in my R code?
# Step 2: simulate data
Alpha = 0.05
S = 3 # number of replicates
x = 8 # number of covariates
G = 3 # number of treatment groups
N = 50 # number of subjects per dataset
tot = S*N # total subjects for a simulation run
# True parameters
alpha = c(0.5, 0.8) # intercepts
b1 = c(0.1,0.2,0.3,0.4) # for pi_1 of trt A
b2 = c(0.15,0.25,0.35,0.45) # for pi_2 of trt B
b = c(1.1,1.2,1.3,1.4);
##############################################################################
# Scenario 1: all covariates are independent standard normally distributed #
##############################################################################
set.seed(12)
x1 = rnorm(n=tot, mean=0, sd=1);x2 = rnorm(n=tot, mean=0, sd=1);
x3 = rnorm(n=tot, mean=0, sd=1);x4 = rnorm(n=tot, mean=0, sd=1);
###############################################################################
p1 = exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4)/
(1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) +
exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4))
p2 = exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4)/
(1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) +
exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4))
p3 = 1/(1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) +
exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4))
# To assign subjects to one of treatment groups based on response probabilities
tmp = function(x){sample(c("A","B","C"), 1, prob=x, replace=TRUE)}
trt = apply(cbind(p1,p2,p3),1,tmp)
IND=rep(1:S,each=N) #create an indicator for split simulated data
sim=data.frame(x1,x2,x3,x4,trt, IND)
Aset = subset(sim, trt=="A")
Bset = subset(sim, trt=="B")
Cset = subset(sim, trt=="C")
Anew = split(Aset, f = IND)
Bnew = split(Bset, f = IND)
Cnew = split(Cset, f = IND)
The warning message:
> Anew = split(Aset, f = IND)
Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
and the output becomes
$`2`
x1 x2 x3 x4 trt IND
141 1.0894068 0.09765185 -0.46702047 0.4049424 A 3
145 -1.2953113 -1.94291045 0.09926239 -0.5338715 A 3
148 0.0274979 0.72971804 0.47194731 -0.1963896 A 3
$`3`
[1] x1 x2 x3 x4 trt IND
<0 rows> (or 0-length row.names)
I have checked my R code several times however, I can't figure out what I did wrong. Many thanks in advance
IND is the global variable, whose length matches the full data, sim. You want to use the one specific to the subset, e.g.:
Anew <- split(Aset, f = Aset$IND)
It's a warning, not an error, which means split executed successfully, but may not have done what you wanted to do.
From the "details" section of the help file:
f is recycled as necessary and if the length of x is not a multiple of
the length of f a warning is printed. Any missing values in f are
dropped together with the corresponding values of x.
Try checking the length of your IND against the size of your dataframe, maybe.
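A small sketch of the fix on toy data (12 made-up rows rather than the simulated 150): splitting on the subset's own IND column gives f exactly one entry per row, so nothing is recycled. It also shows one possible alternative, splitting the full data by treatment and replicate in a single call:

```r
set.seed(12)
sim <- data.frame(
  x1  = rnorm(12),
  trt = rep(c("A", "B", "C"), times = 4),
  IND = rep(1:3, each = 4)
)

Aset <- subset(sim, trt == "A")

# Split on the subset's own IND column, so f has one entry per row of Aset
Anew <- split(Aset, f = Aset$IND)
sapply(Anew, nrow)

# Alternative: split the full data by treatment and replicate in one step
by_both <- split(sim, list(sim$trt, sim$IND))
names(by_both)
```

Each element of Anew then contains only rows with a single IND value, which is what the original code was aiming for.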
Not sure what your goal is once you have your data split, but this sounds like a good candidate for the plyr package.
> library(plyr)
> ddply(sim, .(trt,IND), summarise, x1mean=mean(x1), x2sum=sum(x2), x3min=min(x3), x4max=max(x4))
trt IND x1mean x2sum x3min x4max
1 A 1 -0.49356448 -1.5650528 -1.016615 2.0027822
2 A 2 0.05908053 5.1680463 -1.514854 0.8184445
3 A 3 0.22898716 1.8584443 -1.934188 1.6326763
4 B 1 0.01531230 1.1005720 -2.002830 2.6674931
5 B 2 0.17875088 0.2526760 -1.546043 1.2021935
6 B 3 0.13398967 -4.8739380 -1.565945 1.7887837
7 C 1 -0.16993037 -0.5445507 -1.954848 0.6222546
8 C 2 -0.04581149 -6.3230167 -1.491114 0.8714535
9 C 3 -0.41610973 0.9085831 -1.797661 2.1174894
>
Where you can substitute summarise and its following arguments for any function that returns a data.frame or something that can be coerced to one. If lists are the target, dlply is your friend.
