I'm trying to fit a von Bertalanffy growth function (VBGF) in R to my data, grouped by a serial number.
This is a snippet of my data:
Serial_No <- c(315,315,315,315,315,315,315,316,316,316,316,317,317,317,317,317,317,317,317,317,318,318,318,318,319,319,319,319)
Year <- rep(1945, 28)
tl <- c(19,33,46,55,63,66,70,22,39,55,65,20,40,51,56,60,62,63,64,65,26,43,54,60,28,43,53,61)
age <- c(1,2,3,4,5,6,7,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4)
df <- data.frame(Serial_No, Year, tl, age)
I've been following this example: https://www.r-bloggers.com/2020/01/von-bertalanffy-growth-plots-ii/
and have adapted my code as follows:
library(FSA)    # vbFuns(), vbStarts()
library(dplyr)  # group_by(), summarize(), filter()
library(car)    # Boot()

vb <- vbFuns()
predict2 <- function(x) predict(x, data.frame(age = ages))

agesum <- group_by(df, Serial_No) %>%
  summarize(minage = min(age), maxage = max(age))
Serial_Nos <- unique(df$Serial_No)
nSerial_Nos <- length(Serial_Nos)

cfs <- cis <- preds1 <- preds2 <- NULL
for (i in 1:nSerial_Nos) {
  cat(Serial_Nos[i], "Loop\n")
  tmp1 <- filter(df, Serial_No == Serial_Nos[i])
  sv1 <- vbStarts(tl ~ age, data = tmp1)
  fit1 <- nls(tl ~ vb(age, Linf, K, t0), data = tmp1, start = sv1)
  cfs <- rbind(cfs, coef(fit1))
  boot1 <- Boot(fit1)
  tmp2 <- confint(boot1)
  cis <- rbind(cis, c(tmp2["Linf", ], tmp2["K", ], tmp2["t0", ]))
  ages <- seq(-1, 16, 0.2)
  boot2 <- Boot(fit1, f = predict2)
  tmp2 <- data.frame(Serial_No = Serial_Nos[i], age = ages,
                     predict(fit1, data.frame(age = ages)),
                     confint(boot2))
  preds1 <- rbind(preds1, tmp2)
  tmp2 <- filter(tmp2, age >= agesum$minage[i], age <= agesum$maxage[i])
  preds2 <- rbind(preds2, tmp2)
}
The code runs, but the VBGF results returned are the same for every Serial_No, which can't be right. I think it's the filter() call in the code above that is not working.
I've searched for solutions but can't get it to work.
If anyone can help, or knows of a solution, I'd really appreciate it.
Thank you in advance.
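A quick sanity check worth running first (a sketch, not from the original post): qualify filter() with dplyr:: to rule out any masking, and print what each iteration actually sees. If every iteration reports the same row count and coefficients, the subsetting is the culprit; otherwise the problem is further downstream.
# hypothetical debugging aid, assuming the df, vb and Serial_Nos objects above
for (i in 1:nSerial_Nos) {
  tmp1 <- dplyr::filter(df, Serial_No == Serial_Nos[i])
  cat("Serial_No", Serial_Nos[i], "- rows:", nrow(tmp1), "\n")
  print(coef(nls(tl ~ vb(age, Linf, K, t0), data = tmp1,
                 start = vbStarts(tl ~ age, data = tmp1))))
}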
Model fit with package growthrates
The following answer describes an alternative approach without a for loop and without filter(). Similar loop-free solutions can be implemented with the plain nls() function and lapply() in base R, or with group_by() in the tidyverse.
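For comparison, here is a minimal sketch of the base-R route with split() and lapply() (my addition, assuming the df from the question and rough hand-picked start values):
# loop-free base R: one nls() fit per Serial_No
vbgf <- function(age, Linf, K, t0) Linf * (1 - exp(-K * (age - t0)))
fits <- lapply(split(df, df$Serial_No), function(d)
  nls(tl ~ vbgf(age, Linf, K, t0), data = d,
      start = c(Linf = 70, K = 0.3, t0 = 0)))
t(sapply(fits, coef))  # one row of estimates per Serial_No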
Model definition
The growthrates package does not contain a von Bertalanffy function, so it has to be provided as a user-supplied model, as described in the package vignette. Here I borrowed the function from the FSA package and adapted it accordingly:
library("growthrates")
grow_von_bert <- function(time, parms) {
  with(as.list(parms), {
    y <- Linf * (1 - exp(-K * (time - t0)))
    # growthrates expects a matrix with columns "time" and "y"
    as.matrix(data.frame(time = time, y = y))
  })
}
Test of the model with a single example
p <- c(t0=5, Linf=10, K=.1)
time <- seq(5, 100)
plot(grow_von_bert(time, p), type="l")
Fit of a single data example
It is always a good idea to fit one or a few single examples first, before fitting them all.
df1 <- subset(df, Serial_No == 315)
fit1 <- fit_growthmodel(time = df1$age, y = df1$tl,
                        FUN = grow_von_bert, p = c(t0 = 0, Linf = 70, K = 0.1))
summary(fit1)
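To pull out the fitted parameters of this single fit, coef() should work on the returned fit object (my note; the growthrates vignette uses it the same way):
coef(fit1)  # fitted t0, Linf and K for Serial_No 315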
Fit of all data sets
This can be done in a loop or with appropriate tidyverse functions, but package growthrates already has such a function built in, so all models can be fitted with a single function call. It is of course necessary to specify good start parameters, either the same for all curves or individual parameter sets, depending on the quality of the data. Here is the complete code, including the data of the OP:
library("growthrates")
df <- data.frame(
  Serial_No = factor(c(315,315,315,315,315,315,315,316,316,316,316,317,317,317,317,
                       317,317,317,317,317,318,318,318,318,319,319,319,319)),
  year = rep(1945, 28),
  tl = c(19,33,46,55,63,66,70,22,39,55,65,20,40,51,56,60,62,63,64,65,26,43,54,60,28,
         43,53,61),
  age = c(1,2,3,4,5,6,7,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4)
)

grow_von_bert <- function(time, parms) {
  with(as.list(parms), {
    y <- Linf * (1 - exp(-K * (time - t0)))
    as.matrix(data.frame(time = time, y = y))
  })
}
fit <- all_growthmodels(tl ~ age | Serial_No,
                        data = df,
                        FUN = grow_von_bert,
                        p = c(t0 = 0, Linf = 70, K = 0.1))
results(fit)

par(mfrow = c(2, 3))
plot(fit, las = 1)
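As a small follow-up (my addition): results() returns a plain data frame with one row per group, so the per-group estimates can be post-processed like any other data frame. The column names below assume the parameter vector p plus the r2 column that results() reports:
res <- results(fit)     # one row of t0, Linf, K (plus fit statistics) per Serial_No
res[order(res$Linf), ]  # e.g. rank the groups by asymptotic length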
I want to be able to input the variable name that I'll be using in the weights argument of the lmer() function, so that I can change the dataset, cycle through the weights, and pull the correct variable.
I want to pull the correct column for weights within the for loop.
So for y, the call would be:
lmer(y~x+(1|study), weights = weight.var)
And for y1:
lmer(y1~x+(1|study), weights = weight.var1)
So I named the weighting variables (weight.opt) and want to use them in the formula within the for loop. I can use as.formula() to get the formula working and connected to the dataset, but I'm not sure how to do something similar with the weights.
library(lme4)

x <- rnorm(300, 0, 1)
y <- x * rnorm(300, 2, 0.5)
y1 <- x * rnorm(300, 0.1, 0.1)
study <- rep(c("a", "b", "c"), each = 100)
weight.var <- rep(c(0.5, 2, 4), each = 100)
weight.var1 <- rep(c(0.1, 0.2, 0.15), each = 100)
dataset <- data.frame(x, y, y1, study, weight.var, weight.var1)

resp1 <- c("y", "y1")
weight.opt <- c("weight.var", "weight.var1")
for(i in 1:2){
  lmer(as.formula(paste(resp1[i], "~x+(1|study)")),
       weights = weight.opt[i], data = dataset)  # fails: weight.opt[i] is just a string
}
This seems to work fine:
res_list <- list()
for(i in 1:2){
  res_list[[i]] <- lmer(as.formula(paste(resp1[i], "~x+(1|study)")),
                        weights = dataset[[weight.opt[i]]], data = dataset)
}
While using regsubsets() from package leaps on data with linear dependencies, I found that the results given by coef() and by summary()$which differ. It seems that, when linear dependencies are found, reordering changes the position of the coefficients and coef() returns wrong values.
I use mtcars just to simulate the problem I had with other data. In the first example there is no issue of linear dependencies, the best model by BIC is mpg~wt+cyl, and coef() and summary()$which give the same result. In the second example I add a dummy variable so that perfect multicollinearity is possible, but with the variables in this order (dummy in the last column) the problem does not appear. In the last example, after changing the order of variables in the dataset, the problem finally appears and coef() and summary()$which give different models. Is there anything incorrect in this approach? Is there any other way to get coefficients from regsubsets?
require("leaps") #install.packages("leaps")
###Example1
dta <- mtcars[,c("mpg","cyl","am","wt","hp") ]
bestSubset.cars <- regsubsets(mpg~., data=dta)
(best.sum <- summary(bestSubset.cars))
#
w <- which.min(best.sum$bic)
best.sum$which[w,]
#
best.sum$outmat
coef(bestSubset.cars, w)
#
###Example2
dta2 <- cbind(dta, manual=as.numeric(!dta$am))
bestSubset.cars2 <- regsubsets(mpg~., data=dta2)  # dta2: includes the added dummy
(best.sum2 <- summary(bestSubset.cars2))
#
w <- which.min(best.sum2$bic)
best.sum2$which[w,]
#
coef(bestSubset.cars2, w)
#
###Example3
bestSubset.cars3 <- regsubsets(mpg~., data=dta2[,c("mpg","manual","am","cyl","wt","hp")])
(best.sum3 <- summary(bestSubset.cars3))
#
w <- which.min(best.sum3$bic)
best.sum3$which[w,]
#
coef(bestSubset.cars3, w)
#
best.sum2$which
coef(bestSubset.cars2,1:4)
best.sum3$which
coef(bestSubset.cars3,1:4)
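One other way to get the coefficients, independent of coef.regsubsets(): read the selected variables from summary()$which and refit with lm(). A sketch, assuming the Example 3 objects:
# refit the BIC-best subset with lm() and take its coefficients
w <- which.min(best.sum3$bic)
vars <- setdiff(names(which(best.sum3$which[w, ])), "(Intercept)")
coef(lm(reformulate(vars, response = "mpg"), data = dta2))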
The orders of variables used by summary.regsubsets() and by regsubsets() itself are different. The coef() generic for regsubsets combines the two, and the results get mixed up if you use force.in or a formula with a fixed variable order. Changing some lines in the coef() function might help. Try the code below and see if it works!
coef.regsubsets <- function (object, id, vcov = FALSE, ...)
{
  s <- summary(object)
  invars <- s$which[id, , drop = FALSE]
  betas <- vector("list", length(id))
  for (i in 1:length(id)) {
    # added: map the selected variable names onto the original column order
    var.name <- names(which(invars[i, ]))
    thismodel <- which(object$xnames %in% var.name)
    names(thismodel) <- var.name
    # deleted:
    # thismodel <- which(invars[i, ])
    qr <- .Fortran("REORDR", np = as.integer(object$np),
                   nrbar = as.integer(object$nrbar), vorder = as.integer(object$vorder),
                   d = as.double(object$d), rbar = as.double(object$rbar),
                   thetab = as.double(object$thetab), rss = as.double(object$rss),
                   tol = as.double(object$tol), list = as.integer(thismodel),
                   n = as.integer(length(thismodel)), pos1 = 1L, ier = integer(1))
    beta <- .Fortran("REGCF", np = as.integer(qr$np), nrbar = as.integer(qr$nrbar),
                     d = as.double(qr$d), rbar = as.double(qr$rbar), thetab = as.double(qr$thetab),
                     tol = as.double(qr$tol), beta = numeric(length(thismodel)),
                     nreq = as.integer(length(thismodel)), ier = numeric(1))$beta
    names(beta) <- object$xnames[qr$vorder[1:qr$n]]
    reorder <- order(qr$vorder[1:qr$n])
    beta <- beta[reorder]
    if (vcov) {
      p <- length(thismodel)
      R <- diag(qr$np)
      R[row(R) > col(R)] <- qr$rbar
      R <- t(R)
      R <- sqrt(qr$d) * R
      R <- R[1:p, 1:p, drop = FALSE]
      R <- chol2inv(R)
      dimnames(R) <- list(object$xnames[qr$vorder[1:p]],
                          object$xnames[qr$vorder[1:p]])
      V <- R * s$rss[id[i]]/(object$nn - p)
      V <- V[reorder, reorder]
      attr(beta, "vcov") <- V
    }
    betas[[i]] <- beta
  }
  if (length(id) == 1)
    beta
  else betas
}
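A brief usage note (my addition): once the patched coef.regsubsets() above is sourced into the global environment, S3 dispatch picks it up instead of the package version, so the Example 3 extraction can simply be repeated:
w <- which.min(best.sum3$bic)
coef(bestSubset.cars3, w)  # should now agree with best.sum3$which[w, ]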
Another solution that works for me is to randomize the order of the columns (independent variables) in your dataset before running regsubsets(). The idea is that, after reordering, the highly correlated columns will hopefully be far apart from each other and will not trigger the reordering behavior in the regsubsets algorithm.
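A minimal sketch of that idea (my addition, assuming the dta2 data frame from the question):
# shuffle the predictor columns, keeping the response first
set.seed(1)
shuffled <- dta2[, c("mpg", sample(setdiff(names(dta2), "mpg")))]
bestSubset.rand <- regsubsets(mpg ~ ., data = shuffled)
coef(bestSubset.rand, which.min(summary(bestSubset.rand)$bic))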
I am trying to compute prediction intervals for my neural network created with the neuralnet package.
I use R inside Tableau Software, by creating .RData files that contain my functions and are loaded in Tableau.
It's a simple NN with one hidden layer containing 5 nodes. I searched and found this package: nnetpredint
So I tried to use it, following their examples.
I also tried to change the way I use it (train/test in the same data frame, separate data frames with the same column names, etc.).
The best result I got was the prediction, but without the lowerBound and upperBound columns.
In fact, I got exactly the same result as when I use compute(myNN, etc.), but I don't get the second and third columns.
Thanks for your help,
EDIT:
My data comes from Tableau; my function takes five parameters:
ValuesToExplain, train1, train2, test1, test2.
I then create and train my NN with the first three and try to predict from the last two.
(test1 = k*train1 and test2 = k2*train2 for now, but this will probably change in the future.)
Here is my whole code:
NNetwork <- function(objectiveValues, knownValues1, knownValues2, newData, newData2){
  numberOfColumn <- 3

  ## Create the training data frame
  training <- data.frame(objectiveValues, knownValues1, knownValues2)
  training[which(is.na(training[,"objectiveValues"])), "objectiveValues"] <- mean(training[,"objectiveValues"], na.rm = TRUE)
  training[which(is.na(training[,"knownValues1"])), "knownValues1"] <- mean(training[,"knownValues1"], na.rm = TRUE)
  training[which(is.na(training[,"knownValues2"])), "knownValues2"] <- mean(training[,"knownValues2"], na.rm = TRUE)

  ## Create the testing data frame
  testing <- data.frame(objectiveValues, newData, newData2)
  names(testing) <- c("objectiveValues", "knownValues1", "knownValues2")
  testing[which(is.na(testing[,"objectiveValues"])), "objectiveValues"] <- mean(testing[,"objectiveValues"], na.rm = TRUE)
  testing[which(is.na(testing[,"knownValues1"])), "knownValues1"] <- mean(testing[,"knownValues1"], na.rm = TRUE)
  testing[which(is.na(testing[,"knownValues2"])), "knownValues2"] <- mean(testing[,"knownValues2"], na.rm = TRUE)

  ## Scaling (min-max to [0, 1], using the training ranges)
  maxs <- apply(training, 2, max)
  mins <- apply(training, 2, min)
  trainingScaled <- as.data.frame(scale(training, center = mins, scale = maxs - mins))
  testingScaled <- as.data.frame(scale(testing, center = mins, scale = maxs - mins))

  ### Neural network part
  library(neuralnet)
  n <- names(trainingScaled)
  f <- as.formula(paste("objectiveValues ~", paste(n[!n %in% "objectiveValues"], collapse = " + ")))

  # Training the NN
  nn <- neuralnet(f, data = trainingScaled, hidden = 5, linear.output = TRUE)

  # Using the NN
  computedTrainingScaled <- compute(nn, trainingScaled[, 2:numberOfColumn])
  computedFromNNScaled <- compute(nn, testingScaled[, 2:numberOfColumn])

  # Unscaling
  computedTraining <- computedTrainingScaled$net.result * (max(training$objectiveValues) - min(training$objectiveValues)) + min(training$objectiveValues)
  computedFromNN <- computedFromNNScaled$net.result * (max(training$objectiveValues) - min(training$objectiveValues)) + min(training$objectiveValues)

  RSquare <- (1 - ((sum((training$objectiveValues - computedTraining)^2)) / (sum((training$objectiveValues - mean(training$objectiveValues))^2)))) * 100
  RSE <- sum((training$objectiveValues - computedTraining)^2) / nrow(training)

  res <- (1:nrow(training))

  library(nnetpredint) # Getting prediction intervals
  x <- trainingScaled[, 2:numberOfColumn]
  y <- trainingScaled[1]
  newData <- testingScaled[, 2:numberOfColumn]

  # S3 generic method: object of class nn
  yPredInt <- nnetPredInt(nn, x, y, newData)

  for(i in 1:nrow(training)){
    res[i] <- paste(computedFromNN[i], RSquare, RSE, sep = "#")
  }
  return(res)
}
save(NNetwork, file = "NNetwork.RData")
Here, I removed the part using the nnetpredint package because it was not working, but it was like this:
library(nnetpredint)
y <- trainingScaled
x <- trainingScaled[,2:3]
newData <- testingScaled[,2:3]
yPredInt <- nnetPredInt(nn, x, y, newData)
My problem is that when I try to access yPredInt$lowerBound or yPredInt$upperBound, they don't exist.
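A quick check that may narrow this down (my addition, not from the original post): inspect what nnetPredInt() actually returned, and make sure y is the response column only, since the snippet above passes the whole trainingScaled data frame as y.
str(yPredInt)  # what did nnetPredInt() really return?
yPredInt <- nnetPredInt(nn, trainingScaled[, 2:3], trainingScaled[, 1],
                        testingScaled[, 2:3])
head(yPredInt)  # the docs describe yPredValue, lowerBound and upperBound columns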