I have written a model that I am fitting to data by maximum likelihood (ML) via the mle2 package. However, I have a large data frame of samples and I would like to fit the model to each replicate and then retrieve all of the model coefficients in a data frame.
I have tried to use the ddply function in the plyr package with no success.
I get the following error message when I try:
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from S4 to logical) in subassignment type fix
Any thoughts?
Here is an example of what I am doing.
This is my data frame. I have measurements in Ponds 5...n on days 1...n. Each measurement series consists of 143 fluxes (flux.cor), which is the variable I am modelling.
Pond Obs Date Time Temp DO pH U day month PAR
932 5 932 2011-06-16 17:31:00 17:31:00 294.05 334.3750 8.47 2 1 1 685.08
933 5 933 2011-06-16 17:41:00 17:41:00 294.05 339.0625 8.47 2 1 1 808.44
934 5 934 2011-06-16 17:51:00 17:51:00 294.02 340.6250 8.46 2 1 1 752.78
935 5 935 2011-06-16 18:01:00 18:01:00 294.00 340.6250 8.45 2 1 1 684.14
936 5 936 2011-06-16 18:11:00 18:11:00 293.94 340.9375 8.50 2 1 1 625.86
937 5 937 2011-06-16 18:21:00 18:21:00 293.88 341.5625 8.48 2 1 1 597.06
day.night Treat H pOH OH DO.cor sd.DO av.DO DO.sat
932 1 A 3.388442e-09 5.53 2.951209e-06 342.1406 2.63078 342.1406 274.0811
933 1 A 3.388442e-09 5.53 2.951209e-06 339.0625 2.63078 342.1406 274.0811
934 1 A 3.467369e-09 5.54 2.884032e-06 340.6250 2.63078 342.1406 274.2432
935 1 A 3.548134e-09 5.55 2.818383e-06 340.6250 2.63078 342.1406 274.3513
936 1 A 3.162278e-09 5.50 3.162278e-06 340.9375 2.63078 342.1406 274.6763
937 1 A 3.311311e-09 5.52 3.019952e-06 341.5625 2.63078 342.1406 275.0020
DO_flux NEP.hr flux.cor sd.flux av.flux
932 -3.078125 -3.09222602 -3.078125 2.104482 -0.1070312
933 1.562500 1.54903673 1.562500 2.104482 -0.1070312
934 0.000000 -0.01375489 0.000000 2.104482 -0.1070312
935 0.312500 0.29876654 0.312500 2.104482 -0.1070312
936 0.625000 0.61126617 0.625000 2.104482 -0.1070312
Here is my model:
# function that generates predictions of O2 flux given GPP, R and gas exchange
flux.pred <- function(GPP24, PAR, R24, Temp, U, DO, DOsat){
  # calculates Schmidt number from water temperature
  Sc <- function(Temp){
    S <- -0.0476*Temp^3 + 3.7818*Temp^2 - 120.1*Temp + 1800.6
  }
  # calculates piston velocity k600 (m h-1) from wind speed at 10 m (m s-1)
  k600 <- function(U){
    k.600 <- (2.07 + 0.215*(U^1.7))/100
  }
  # temperature-corrected piston velocity k (m h-1)
  k <- function(Temp, U){
    k <- k600(U)*((Sc(Temp)/600)^-0.5)
  }
  # physical gas flux (mg O2 m-2 10min-1)
  D <- function(Temp, U, DO, DOsat){
    d <- (k(Temp, U)/6)*(DO - DOsat)
  }
  # main function to generate predictions
  flux <- (GPP24/sum(YSI$PAR[YSI$PAR > 40]))*(ifelse(YSI$PAR > 40, YSI$PAR, 0)) - (R24/144) + D(YSI$Temp, YSI$U, YSI$DO, YSI$DO.sat)
  return(flux)
}
which returns predictions for the fluxes.
I then build my likelihood function:
# likelihood function
ll <- function(GPP24, PAR, R24, Temp, U, DO.cor, DO.sat){
  pred <- flux.pred(GPP24, PAR, R24, Temp, U, DO.cor, DO.sat)
  pred <- pred[-144]
  obs <- YSI$flux.cor[-144]
  return(-sum(dnorm(obs, mean = pred, sd = sqrt(var(obs - pred)))))
}
and apply it:
ll.fit <- mle2(ll, start = list(GPP24 = 100, R24 = 100))
It works beautifully for one Pond on one day, but what I want to do is apply it to all ponds on all days automatically.
I tried ddply (as stated above):
metabolism<-ddply(YSI, .(Pond,Treat,day,month), summarise,
mle = mle2(ll,start=list(GPP24=100, R24=100)))
but had no success. I also tried extracting the coefficients with a for loop, but this did not work either:
for(i in 1:length(unique(YSI$day))){
  GPP <- numeric(length = length(unique(YSI$day)))
  GPP[i] <- mle2(ll, start = list(GPP24 = 100, R24 = 100))
}
Any help would be gratefully received.
There's at least one problem with your functions: nowhere in flux.pred or ll is there an argument that lets you specify which data to use. You hardcoded it, so how is any *ply function supposed to guess that it needs to turn YSI$... into the current subset?
Apart from that, as @hadley points out, ddply will not suit you; dlply might, or you can just use the classic approach of by() or lapply(split()).
So imagine you make a function
flux.pred <- function(data, GPP24, R24){
  # calculates Schmidt number from water temperature
  Sc <- function(Temp){
    S <- -0.0476*Temp^3 + ...
    ...
  }
  # ... and so on, with every YSI$... replaced by data$...
}
and a function
ll <- function(GPP24, R24, data){
  pred <- flux.pred(data, GPP24, R24)
  pred <- pred[-144]            # check this
  obs <- data$flux.cor[-144]    # check this
  return(-sum(dnorm(obs, mean = pred, sd = sqrt(var(obs - pred)))))
}
You should then be able to do, e.g.:
dlply(YSI, .(Pond, Treat, day, month), .fun = function(i){
  mle2(ll, start = list(GPP24 = 100, R24 = 100), data = list(data = i))
})
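If what you ultimately want is a data frame of coefficients, a rough (untested) sketch of the lapply(split()) route mentioned above would look something like this; wrapping ll in a small closure over each subset avoids having to pass a data argument through mle2 at all:
library(bbmle)   # provides mle2

# one chunk per Pond/Treat/day/month combination
groups <- split(YSI, list(YSI$Pond, YSI$Treat, YSI$day, YSI$month), drop = TRUE)

# fit each chunk; the anonymous wrapper closes over the subset d
fits <- lapply(groups, function(d) {
  mle2(function(GPP24, R24) ll(GPP24, R24, data = d),
       start = list(GPP24 = 100, R24 = 100))
})

# one row of coefficients (GPP24, R24) per group
metabolism <- data.frame(group = names(fits), t(sapply(fits, coef)))
coef() on an mle2 fit returns the named parameter vector, so binding the transposed results gives one row per group.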
How the data gets passed along depends on what you use in mle2 for the optimization. In your case you use the default optimizer, which is optim; see ?optim for more details. The subset i supplied via data = list(data = i) is handed by mle2 to ll each time the likelihood is evaluated.
What I can't check is how optim behaves. It might even be that your function doesn't really work as you intend. Normally you should have a function ll like:
ll <- function(par, data){
  GPP24 <- par[1]
  R24 <- par[2]
  ...
}
for optim to work. But if you say it works as you wrote it, I believe you. Make sure, though, that it actually does. I am not convinced...
On a side note: neither by()/lapply(split()) nor dlply() is the same as vectorizing. On the contrary, all of these constructs are essentially loops. On why you would use them anyway, read: Is R's apply family more than syntactic sugar?
I have just started learning to code in R and tried to do a classification with C5.0, but I ran into some problems I don't understand and would be grateful for help. Below is code I learned from someone else and tried to use on my own data:
require(C50)
data.resultc50 <- c()
prematrixc50 <- c()
for(i in 3863:3993)
{
  needdata$class <- as.factor(needdata$class)
  # train a boosted C5.0 tree on the first 3612 rows
  trainc50 <- C5.0(class ~ ., needdata[1:3612,], trials = 5,
                   control = C5.0Control(noGlobalPruning = TRUE, CF = 0.25))
  # predict the class of test row i
  predc50 <- predict(trainc50, newdata = testdata[i, -1], trials = 5, type = "class")
  data.resultc50[i-3862] <- sum(predc50 == testdata$class[i]) / length(predc50)
  prematrixc50[i-3862] <- as.character.factor(predc50)
}
Below are the two objects needdata and testdata used in the code above, with part of their heads shown:
class Volume MA20 MA10 MA120 MA40 MA340 MA24 BIAS10
1 1 2800 8032.00 8190.9 7801.867 7902.325 7367.976 1751 7.96
2 1 2854 8071.40 8290.3 7812.225 7936.550 7373.624 1766 6.27
3 0 2501 8117.45 8389.3 7824.350 7973.250 7379.444 1811 5.49
4 1 2409 8165.40 8488.1 7835.600 8007.900 7385.294 1825 4.02
# the above is "needdata" and actually has 15 variables with 3862 obs.
class Volume MA20 MA10 MA120 MA40 MA340 MA24 BIAS10
1 1 2800 8032.00 8190.9 7801.867 7902.325 7367.976 1751 7.96
2 1 2854 8071.40 8290.3 7812.225 7936.550 7373.624 1766 6.27
3 0 2501 8117.45 8389.3 7824.350 7973.250 7379.444 1811 5.49
4 1 2409 8165.40 8488.1 7835.600 8007.900 7385.294 1825 4.02
# the above is "testdata" and has 15 variables with 4112 obs.
The data above contain the factor class with values 0 and 1. After running the code I get the warning below:
In predict.C5.0(trainc50, newdata = testdata[i, -1], trials = 5, ... : 'trials' should be <= 1 for this object. Predictions generated
using 1 trials
And when I look at the object trainc50 just created, I notice that the number of boosting iterations is 1 due to early stopping, as shown below:
# trainc50
Call:
C5.0.formula(formula = class ~ ., data = needdata[1:3612, ],
trials = 5, control = C5.0Control(noGlobalPruning = TRUE,
CF = 0.25), earlyStopping = FALSE)
Classification Tree
Number of samples: 3612
Number of predictors: 15
Number of boosting iterations: 5 requested; 1 used due to early stopping
Non-standard options: attempt to group attributes, no global pruning
I also tried to plot the decision tree and got the error below:
plot(trainc50)
Error in if (!n.cat[i]) { : argument is of length zero
In addition: Warning message:
In 1:which(out == "Decision tree:") : numerical expression has 2 elements: only the first used
Does that mean my code is too poor for C5.0 to perform more trials? What is wrong? Can someone please help me understand why I get early stopping, what the error and warning messages mean, and how to fix them? I would be very thankful.
I used the C5.0.graphviz function from http://r-project-thanos.blogspot.tw/2014/09/plot-c50-decision-trees-in-r.html, called like this:
C5.0.graphviz(firandomf,
"a.txt",
fontname='Arial',
col.draw='black',
col.font='blue',
col.conclusion='lightpink',
col.question='grey78',
shape.conclusion='box3d',
shape.question='diamond',
bool.substitute=c('None', 'yesno', 'truefalse', 'TF'),
prefix=FALSE,
vertical=TRUE)
And on the command line (this requires Graphviz to be installed so the dot tool is available):
dot -Tpng ~/plot/a.txt > ~/plot/a.png
I just wrote my first attempt at a neural network for household classification using energy-consumption features. So far I can make it run, but the output seems questionable.
As you can see, I'm using 18 features (maybe too many?) to predict whether it's a single or non-single household.
I got 3488 rows like this:
c_day c_weekend c_evening c_morning c_night c_noon c_max c_min r_mean_max r_min_mean r_night_day r_morning_noon
12 14 1826 9 765 3 447 2 878 0 7338 4
r_evening_noon t_above_1kw t_above_2kw t_above_mean t_daily_max single
3424 1 695 0 174319075712881 1
My neural network uses these parameters:
library(neuralnet)
net.nn <- neuralnet(single
~ c_day
+ c_weekend
+ c_weekday
+ c_evening
+ c_morning
+ c_night
+ c_noon
+ c_max
+ c_min
+ r_mean_max
+ r_min_mean
+ r_night_day
+ r_morning_noon
+ r_evening_noon
+ t_above_1kw
+ t_above_2kw
+ t_above_mean
+ t_daily_max
,train, hidden=15, threshold=0.01,linear.output=F)
1 repetition was calculated.
Error Reached Threshold Steps
1 126.3425379 0.009899229932 4091
I normalized the data beforehand using the min-max normalization formula:
for(i in names(full_data)){
x <- as.numeric(full_data[,i])
full_data[,i] <- (x-min(x)/max(x)-min(x))
}
I have 3488 rows of data and split them into a training and a test set.
half <- nrow(full_data)/2
train <- full_data[1:half,]
test <- full_data[half:3488,]
net.results <- compute(net.nn,test)
net.results$net.result
I used the prediction output and bound it to the actual "single [yes/no]" column to compare the results:
predict <- net.results$net.result
cleanoutput <- cbind(predict,full_data$single[half:3488])
colnames(cleanoutput) <- c("predicted","actual")
So when I print it, this is my classification result for the first 10 rows:
predicted actual
1701 0.1661093405 0
1702 0.1317067578 0
1703 0.1677147708 1
1704 0.2051188618 1
1705 0.2013035634 0
1706 0.2088726723 0
1707 0.2683753128 1
1708 0.1661093405 0
1709 0.2385537285 1
1710 0.1257108821 0
If I understand it right, when I round the predicted output it should be either 0 or 1, but it always ends up being 0!
Am I using the wrong parameters? Is my data simply not suitable for nn prediction? Is the normalization wrong?
It means your model performance is still not good; once the model is properly tuned you should get the expected behavior. Neural network techniques are very sensitive to scale differences between columns, so standardizing the data (mean = 0, sd = 1) is good practice. As the OP points out, scale() does the job.
Using scale(full_data) on the entire data set did the trick. The data are now standardized (zero mean, unit standard deviation) and the output looks much more reliable.
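For reference, a minimal sketch of that workflow (untested; column names are taken from the question, full_data is assumed to hold all 3488 rows, and here the 0/1 response single is left unscaled so the predictions stay on the 0-1 scale):
library(neuralnet)

# standardize all predictor columns (mean = 0, sd = 1); leave the 0/1 response untouched
predictors <- setdiff(names(full_data), "single")
full_data[predictors] <- scale(full_data[predictors])

# same 50/50 split as before
half <- floor(nrow(full_data) / 2)
train <- full_data[1:half, ]
test <- full_data[(half + 1):nrow(full_data), ]

# refit the network on the standardized predictors
f <- as.formula(paste("single ~", paste(predictors, collapse = " + ")))
net.nn <- neuralnet(f, data = train, hidden = 15, threshold = 0.01, linear.output = FALSE)

# predict on the held-out rows and compare with the true labels
net.results <- compute(net.nn, test[predictors])
cleanoutput <- data.frame(predicted = as.vector(net.results$net.result),
                          actual = test$single)
head(round(cleanoutput$predicted) == cleanoutput$actual)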
I am using predict.merMod (the predict() function with glmer GLMM models). I keep getting strange, nonsensical predictions when I predict values using newdata with new levels, but it is not consistent.
My Data:
The data are tree diameters (D) and tree ages (tree_age) for individual trees (differentiated by StaticLineID) across multiple years (YearlyLineID). In other words, the data are grouped by StaticLineID, and each year's measurement for each StaticLineID gets its own YearlyLineID.
urlfile<-'http://theforestecologist.web.unc.edu/files/2015/07/dat.csv'
tree.mix1<-read.csv(urlfile,head=T)
My Goal:
Predict missing tree_age values for newdata ("trees.to.fix" below) using GLMMs fitted to tree.mix1.
My 2 Models:
I use both a linear and a quadratic GLMM in order to compare them:
library(lme4)
glmer.lin<-glmer(tree_age~D+(D|StaticLineID),
data=tree.mix1,family=Gamma(link=log),na.action=na.omit,
glmerControl(optimizer='bobyqa',optCtrl=list(maxfun=500000)))
glmer.quad<-glmer(tree_age~D+I(D^2)+(D+I(D^2)|StaticLineID),
data=tree.mix1,family=Gamma(link=log),na.action=na.omit,
glmerControl(optimizer='bobyqa',optCtrl=list(maxfun=500000)))
Newdata:
Same format as tree.mix1, except with no tree_age. It also contains an entirely different set of StaticLineIDs.
urlfile2<-'http://theforestecologist.web.unc.edu/files/2015/07/dat2.csv'
trees.to.fix<-read.csv(urlfile2,head=T)
Problem:
When I use predict, I assume I have to use allow.new.levels=T because there is no overlap in StaticLineID between datasets (and because the function will not work otherwise). However, when I do use predict this way I get strange values for some of my predictions:
pred1<-predict(glmer.lin,newdata=trees.to.fix,allow.new.levels=T)
pred.lin<-exp(pred1)
pred2<-predict(glmer.quad,newdata=trees.to.fix,allow.new.levels=T)
pred.quad<-exp(pred2)
The predictions are essentially either very large log values in the case of the linear model or very small log values in the case of the quadratic model. If I place them in a data frame with the tree data, you can see these strange values occur mostly at large D:
out<-cbind(trees.to.fix[,c('YearlyLineID','StaticLineID','D')],pred1,pred2)
out[1:20,]
YearlyLineId StaticLineID D pred.lin pred.quad
18415 16366 3089 17.5 9.857345 2.400280
18414 16367 3089 19.8 10.994224 0.924323
18416 16368 3089 22.9 12.526540 -1.682780
18417 16369 3089 25.7 13.910567 -4.647235
18424 16370 3089 28.2 15.146306 -7.783046
18419 16371 3089 30.2 16.134896 -10.623830
18426 16372 3089 32.5 17.271776 -14.255710
18422 16373 3089 37.8 19.891541 -24.111314
18425 16374 3089 37.1 19.545535 -22.690800
18423 16375 3089 38.9 20.435266 -26.416620
18418 16376 3089 39.2 20.583555 -27.060839
18420 16377 3089 40.3 21.127280 -29.479809
18681 16626 3128 13.0 7.633015 4.158762
18680 16627 3128 13.5 7.880163 4.037183
18685 16628 3128 15.0 8.621606 3.561734
18683 16629 3128 15.7 8.967613 3.283025
18682 16630 3128 16.3 9.264190 3.015348
18790 16732 3147 11.4 6.842143 4.423819
18793 16733 3147 12.4 7.336438 4.280301
18792 16734 3147 14.5 8.374459 3.738669
These of course also give ridiculous predictions if I take the exponent (either ridiculously large age predictions in the case of the linear model, or values that round to essentially age = 0 in the case of the quadratic model; the quadratic predictions also shrink as D increases, which I also don't understand):
out2<-cbind(trees.to.fix[,c('YearlyLineID','StaticLineID','D')],pred.lin,pred.quad)
out2[1:20,]
YearlyLineId StaticLineID D pred.lin pred.quad
18415 16366 3089 17.5 1.909811e+04 11.026
18414 16367 3089 19.8 5.952932e+04 2.520
18416 16368 3089 22.9 2.755543e+05 0.186
18417 16369 3089 25.7 1.099721e+06 0.010
18424 16370 3089 28.2 3.784051e+06 0.000
18419 16371 3089 30.2 1.016943e+07 0.000
18426 16372 3089 32.5 3.169837e+07 0.000
18422 16373 3089 37.8 4.352980e+08 0.000
18425 16374 3089 37.1 3.079767e+08 0.000
18423 16375 3089 38.9 7.497620e+08 0.000
18418 16376 3089 39.2 8.696097e+08 0.000
18420 16377 3089 40.3 1.497825e+09 0.000
18681 16626 3128 13.0 2.065268e+03 63.992
18680 16627 3128 13.5 2.644304e+03 56.666
18685 16628 3128 15.0 5.550295e+03 35.224
18683 16629 3128 15.7 7.844854e+03 26.656
18682 16630 3128 16.3 1.055326e+04 20.396
18790 16732 3147 11.4 9.364937e+02 83.414
18793 16733 3147 12.4 1.535234e+03 72.262
18792 16734 3147 14.5 4.334921e+03 42.042
I've been all over Stack Exchange, GitHub, etc. trying to make sense of what is going on and how to fix it. I'll be honest that all of this is a learning work in progress for me, so I am not 100% up to speed on the theory behind it yet.
If this is beyond a fixable coding problem, I would appreciate any insight into how to predict these values more properly, if there is an alternative method.
Any help would be great! Thanks!
Yeah, why are you using a GLM with two sets of continuous data instead of just an lm? I am also concerned that you chose a random intercept and slope by default without testing whether both are really warranted. I would strongly recommend reading Chapter 5 in Zuur et al. 2009, Mixed Effects Models and Extensions in Ecology with R (a 30-45 minute read).
I would have done something more like this:
urlfile<-'http://theforestecologist.web.unc.edu/files/2015/07/dat.csv'
tree.mix1<-read.csv(urlfile,head=T)
tree.mix1$StaticLineID <- as.factor(tree.mix1$StaticLineID)
# check the data
par(mfrow=c(2,2))
plot(tree.mix1$D, tree.mix1$tree_age)
hist(tree.mix1$D); hist(tree.mix1$tree_age)
plot(log(tree.mix1$D+1), tree.mix1$tree_age)
# D is skewed, we should log transform
tree.mix1$D <- log(tree.mix1$D + 1)
library(lme4)
library(nlme)
library(ggplot2)
# Start with the full model with all fixed effects
lme.Full.int.slope <- lme(tree_age~D+I(D^2), random=~D|StaticLineID, data=tree.mix1, method="REML")
lme.Full.int <- lme(tree_age~D+I(D^2), random=~1|StaticLineID, data=tree.mix1, method="REML")
# Compare it to a basic gls to see if random effects are even necessary
M.gls <- gls(tree_age~D+I(D^2), data=tree.mix1)
anova(M.gls, lme.Full.int.slope)
anova(M.gls, lme.Full.int)
anova(lme.Full.int.slope, lme.Full.int)
# ok keep random intercept and slope in model
# For the fixed effects, which model has more support,
# the linear or the quadratic? (use maximum likelihood, ML)
anova(lme.Full.int.slope)
lme.lin <- lme(tree_age~D, random=~D|StaticLineID, data=tree.mix1, method="ML")
lme.quad <- lme(tree_age~D+I(D^2), random=~D|StaticLineID, data=tree.mix1, method="ML")
anova(lme.lin, lme.quad) # ok keep the quadratic form
# Refit model in lmer with REML
lmer.quad <- lmer(tree_age~D+I(D^2)+ (D|StaticLineID), data=tree.mix1)
# Import new data & plot response curves
urlfile2<-'http://theforestecologist.web.unc.edu/files/2015/07/dat2.csv'
trees.to.fix<-read.csv(urlfile2,head=T)
# Don't forget to log transform D again
trees.to.fix$D <- log(trees.to.fix$D+1)
jvalues <- with(tree.mix1, seq(from = min(D), to = max(D), length.out = 100)) # sequence of values to plot.
# calculate predicted probabilities and store in a list
pp <- lapply(jvalues, function(j) {
trees.to.fix$D <- j
predict(lmer.quad, newdata = trees.to.fix, re.form = NA)  # population-level predictions (no random effects)
})
# get the means with lower and upper quartiles
plotdat <- t(sapply(pp, function(x) {
c(M = mean(x), quantile(x, c(0.25, 0.75)))
}))
# add in D values and convert to data frame
plotdat <- as.data.frame(cbind(plotdat, jvalues))
#better names and show the first few rows
colnames(plotdat) <- c("PredictedAge", "Lower", "Upper", "D")
head(plotdat)
library(ggplot2)
p <- ggplot(plotdat, aes(x = D, y = PredictedAge)) + geom_line() + ylim(c(0, 40))
# We could also add the lower and upper quartiles; this shows the range in which 50 percent of the predicted values fell
p + geom_point(data = tree.mix1, aes(x = D, y = tree_age, colour = StaticLineID)) +
  theme(legend.position = "none")
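To actually fill in tree_age for the new trees (the OP's original goal), population-level predictions can then be taken straight from the refitted model; since every StaticLineID in trees.to.fix is new, the random effects are simply ignored. A minimal sketch continuing from the code above:
# population-level predictions for the new StaticLineIDs (random effects set to zero)
trees.to.fix$pred_age <- predict(lmer.quad, newdata = trees.to.fix, re.form = NA)
head(trees.to.fix[, c("YearlyLineID", "StaticLineID", "D", "pred_age")])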
I have a 162 x 152 dataset. What I want to do is use stepwise regression, incorporating cross-validation, to create a model and to test how accurate that model is.
ID RT (seconds) 76_TI2 114_DECC 120_Lop 212_PCD 236_X3Av
4281 38 4.086 1.2 2.322 0 0.195
4952 40 2.732 0.815 1.837 1.113 0.13
4823 41 4.049 1.153 2.117 2.354 0.094
3840 41 4.049 1.153 2.117 3.838 0.117
3665 42 4.56 1.224 2.128 2.38 0.246
3591 42 2.96 0.909 1.686 0.972 0.138
This is part of the dataset I have. I want to construct a model where my Y variable is RT (seconds) and my predictors are all the other 151 variables in my dataset. I was told to use the SuperLearner package, and the call for that is:
test <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS")
The problem is that I'm still rather new to R. The main way I've been reading my data in and applying other machine learning algorithms to it is the following:
mydata <- read.csv("filepathway")
fit <- lm(RT..seconds~., data=mydata)
So how do I separate the RT (seconds) column from the rest of my data so that I can pass them in as Y and X data frames? i.e. something along the lines of:
mydata <- read.csv("filepathway")
mydata$RT..seconds. = Y #separating my Y response variable
Alltheother151variables = X #separating all of my X predictor variables (all 151 of them)
SL.library <- c("SL.step")
test <- CV.SuperLearner(Y (i.e RT seconds column), X (all the other 151 variables that corresponds to the RT values), V = 10, SL.library = SL.library,
verbose = TRUE, method = "method.NNLS")
I hope this all makes sense. Thanks!
If the response variable is in the first column, you can simply use:
Y <- mydata[ , 1 ]
X <- mydata[ , -1 ]
The first argument of [ (the row number) is empty, so we keep all the rows,
and the second is either 1 (the first column) or -1 (everything but the first column).
If your response variable is elsewhere, you can use the column names instead:
Y <- mydata[ , "RT..seconds." ]
X <- mydata[ , setdiff( colnames(mydata), "RT..seconds." ) ]
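Putting that together with the call from the question (a sketch, not run; "filepathway" is the placeholder from the question and "RT..seconds." is assumed to be the exact name read.csv gives the response column):
library(SuperLearner)

mydata <- read.csv("filepathway")

# response: the RT (seconds) column; predictors: the other 151 columns
Y <- mydata[ , "RT..seconds." ]
X <- mydata[ , setdiff(colnames(mydata), "RT..seconds.") ]

SL.library <- c("SL.step")
test <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library,
                        verbose = TRUE, method = "method.NNLS")
summary(test)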
I just started working with R and would like to get a nonlinear least squares fit with nls(...) to the formula y = A(1 - exp(-bL)) + R.
I define my function g by
> g<-function(x,y,A,b,R) {
y~A(1-exp(-bx))+R
}
and want to perform nls by
>nls((y~g(x,y,A,b,R)),data=Data, start=list(A=-2,b=0,R=-5))
And I end up with the following error message:
>Error in lhs - rhs : non-numeric argument to binary operator
I guess it's just a stupid basic mistake by another beginner, but I'd be extremely glad if anyone could help me out.
My next question would be whether I can add the fitted curve to my graph:
>plot(x,y,main="VI.20.29")
Thanks to everyone taking time to read and hopefully answer my question!
Detailed information:
I have a table with the x values (Light.intensity) and y values (e.g. VI.20.29)
> photo.data<-read.csv("C:/X/Y/Z.csv", header=T)
> names(photo.data)
[1] "Light.intensity" "SR.8.6" "SR.8.7"
[4] "SR.8.18" "SR.8.20" "VI.20.1"
[7] "VI.20.5" "VI.20.20" "VI.20.29"
[10] "DP.19.1" "DP.19.15" "DP.19.33"
[13] "DP.19.99"
> x<-photo.data$Light.intensity
> x
[1] 0 50 100 200 400 700 1000 1500 2000
> y<-photo.data$VI.20.29
> y
[1] -2.76 -2.26 -1.72 -1.09 0.18 0.66 1.47 1.48 1.63
> plot(x,y,main="VI.20.29")
> Data<-data.frame(x,y)
> Data
x y
1 0 -2.76
2 50 -2.26
3 100 -1.72
4 200 -1.09
5 400 0.18
6 700 0.66
7 1000 1.47
8 1500 1.48
9 2000 1.63
> g<-function(x,y,A,b,R) {
+ y~A(1-exp(-bx))+R
+ }
> nls((y~g(x,y,A,b,R)),data=Data, start=list(A=-2,b=0,R=-5))
Error in lhs - rhs : non-numeric argument to binary operator
The problem is that you're calling a function within a function. You write y ~ g(...), but g(...) itself returns the formula y ~ (other variables). It's a kind of 'double counting'.
Just do:
nls(y~A*(1-exp(-b*x))+R, data=Data, start=list(A=-2,b=0,R=-5))
Your initial guesses for the parameters were way off. I saved your data in 'data.csv' for this example, which converges and then draws the plot. To get this, I adjusted the parameters until they were close and then did the nls fit:
df <- read.csv('data.csv')
x <- df$x
y <- df$y
plot(x, y)
# fit the model with starting values close to the data
fit <- nls(y ~ A*(1 - exp(-b*x)) + R, data = df, start = list(A = 3, b = 0.005, R = -2))
s <- summary(fit)
# pull out the fitted parameter estimates
A <- s[["parameters"]][1]
b <- s[["parameters"]][2]
R <- s[["parameters"]][3]
# fitted function for drawing the curve
f <- function(z){
  v <- A*(1 - exp(-b*z)) + R
  v
}
x.t <- 0:max(x)
y.c <- sapply(x.t, f)
lines(x.t, y.c, col = 'red')
print(s)
Computers do what you tell them:
y~A(1-exp(-bx))+R
Here R interprets A(...) as a function call and bx as a single variable name.
You want y~A*(1-exp(-b*x))+R.
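To also draw the fitted curve on the scatter plot (the second part of the question), one option is predict() on a fine grid of x values. A small sketch, reusing the starting values from the answer above:
fit <- nls(y ~ A*(1 - exp(-b*x)) + R, data = Data, start = list(A = 3, b = 0.005, R = -2))
plot(x, y, main = "VI.20.29")
xs <- seq(min(x), max(x), length.out = 200)   # fine grid for a smooth curve
lines(xs, predict(fit, newdata = data.frame(x = xs)), col = "red")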