Find start values for NLS function in R [duplicate]

I am trying to fit a power function using the nls() function in R, but I am failing to find good starting values.
This is part of my data "CentroM":
Wg TLcm
3200 79
2650 77
2750 74
870 45
1480 52
3400 80.5
2400 76
2800 76.5
2900 77.5
2700 76
3215 76
3300 83
3100 79
3000 78.5
2800 76
2700 77
2500 74.5
2300 69
2700 73.5
3350 79
Here is the script I used:
plot(CentroM$TLcm, CentroM$Wg, xlab = "Total Length (cm)",
     ylab = "Total Weight (g)", pch = 1, type = "p")
f <- function(TLcm, a, b) { a * TLcm^b }
fit <- nls(CentroM$Wg ~ f(CentroM$TLcm, a, b),
           start = list(a = 0.5, b = 0.5), data = CentroM)
Here is what I get:
Error in model.frame.default(formula = ~CentroM + Wg + TLcm, data = CentroM) :
invalid type (list) for variable 'CentroM'
Any help please...

You could take logs, fit a linear model, and use its coefficients as starting values:
df <- read.table(header = TRUE, text = 'Wg TLcm
3200 79
2650 77
2750 74
870 45
1480 52
3400 80.5
2400 76
2800 76.5
2900 77.5
2700 76
3215 76
3300 83
3100 79
3000 78.5
2800 76
2700 77
2500 74.5
2300 69
2700 73.5
3350 79')
mod1 <- lm(log(Wg) ~ log(TLcm), data = df)
fit <- nls(Wg ~ a * TLcm^b,
           start = list(a = exp(coef(mod1)[1]),
                        b = coef(mod1)[2]),
           data = df)
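As an aside, the error in the question itself is unrelated to the starting values: it comes from mixing CentroM$ column references inside the formula with data = CentroM. Referring to the columns by bare name fixes it; a sketch combining that fix with the lm-based starting values above (assuming CentroM holds the same columns as df):
# Bare column names in the formula; nls finds them via data = CentroM
fit <- nls(Wg ~ a * TLcm^b,
           start = list(a = exp(coef(mod1)[1]),
                        b = coef(mod1)[2]),
           data = CentroM)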

Related

Gompertz-Makeham parameter estimation

I would like to estimate the parameters of the Gompertz-Makeham distribution, but I haven't gotten a result.
I would like a method in R, like this Weibull parameter estimation code:
weibull_loglik <- function(parm) {
  gamma <- parm[1]
  lambda <- parm[2]
  loglik <- sum(dweibull(vec, shape = gamma, scale = lambda, log = TRUE))
  return(-loglik)
}
weibull <- nlm(weibull_loglik, p = c(1, 1), hessian = TRUE, iterlim = 100)
weibull$estimate
c <- weibull$estimate[1]; b <- weibull$estimate[2]
My data:
[1] 872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
[16] 20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
[31] 50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
[46] 374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
[61] 894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
[76] 2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
[91] 2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
[106] 12 6 3 2 1 1
summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 32.0 314.0 900.9 1355.0 4054.0
It would be nice to have a reproducible example, but something like:
library(bbmle)
library(eha)
set.seed(101)
vec <- rmakeham(1000, shape = c(2, 3), scale = 2)
dmwrap <- function(x, shape1, shape2, scale, log) {
  res <- try(dmakeham(x, c(shape1, shape2), scale, log = log), silent = TRUE)
  if (inherits(res, "try-error")) return(NA)
  res
}
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 1),
           data = data.frame(y = vec),
           method = "Nelder-Mead")
- Define a wrapper that (1) takes the shape parameters as separate values and (2) returns NA rather than throwing an error when e.g. the parameters are negative.
- Use Nelder-Mead rather than the default BFGS for robustness.
- The fitdistrplus package might help too.
- If you're going to do a lot of this, it may help to fit the parameters on the log scale (i.e. use parameters logshape1, etc., and exp(logshape1) etc. in the fitting formula), as in the sketch below.
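A minimal sketch of that log-scale reparameterization (the log-parameter names are illustrative; the optimizer then works on unconstrained values, so the shapes and scale can never go negative inside dmwrap):
m2 <- mle2(y ~ dmwrap(exp(logshape1), exp(logshape2), exp(logscale)),
           start = list(logshape1 = 0, logshape2 = 0, logscale = 0),
           data = data.frame(y = vec),
           method = "Nelder-Mead")
exp(coef(m2))  # back-transform the estimates to the natural scale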
I had to work a little harder to fit your data; I scaled the variable by 1000 (and found that I could only compute the log-likelihood; the likelihood gave an error that I didn't bother trying to track down). Unfortunately, it doesn't look like a great fit (too many small values).
x <- scan(text = "872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
12 6 3 2 1 1")
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
           start = list(shape1 = 1, shape2 = 1, scale = 10000),
           data = data.frame(y = x/1000),
           method = "Nelder-Mead")
cc <- as.list(coef(m1))
png("gm.png")
hist(x, breaks = 25, freq = FALSE)
with(cc,
     curve(exp(dmwrap(x/1000, shape1, shape2, scale, log = TRUE))/1000,
           add = TRUE))
dev.off()

Error in VAR model

I have this data:
Year W L PTS GF GA S SA
1 2006 49 25 106 253 224 2380 2662
2 2007 51 23 110 266 207 2261 2553
3 2008 41 32 91 227 224 2425 2433
4 2009 40 34 88 207 228 2375 2398
5 2010 47 29 100 217 221 2508 2389
6 2011 44 27 99 213 190 2362 2506
7 2012 48 26 104 232 205 2261 2517
8 2014 38 32 88 214 233 2382 2365
9 2015 47 25 104 226 202 2614 2304
10 2016 41 27 96 224 213 2507 2231
11 2017 41 29 94 238 220 2557 2458
12 2018 53 18 117 261 204 2641 2650
I've built a VAR model from this data (it's hockey data for one team over the listed years). I converted the table into a time series with ts() and created this model:
VARselect(NSH_ts[, 3:5], lag.max = 8)
var1 <- VAR(NSH_ts[, 3:5], p = 2, type = "both", ic = c("AIC"))
serial.test(var1, type = "PT.adjusted")
forecast.var1 <- forecast(var1, h = 2)
autoplot(forecast.var1) +
  scale_x_continuous(breaks = seq(2006, 2022))
I want to use the serial.test() function, but I get this error:
Error in t(Ci) %*% C0inv : non-conformable arguments
Why won't serial.test() work? (Overall, I'm trying to forecast PTS for the next two years, based on the variables in the set.)
I've been using this as a guide: https://otexts.org/fpp2/VAR.html
I'm getting a different error, which may come from VARselect: its table is mostly -Inf entries, with one NaN and the rest 0. Reducing lag.max gave me real numbers, and I had to adjust the other values as well.
VARselect(dfVAR[, 3:5], lag.max = 2)
var1 <- VAR(dfVAR[, 3:5], p = 1, type = "both", ic = c("AIC"))
serial.test(var1, lags.pt = 4, type = "PT.adjusted")
Portmanteau Test (adjusted)
data: Residuals of VAR object var1
Chi-squared = 35.117, df = 27, p-value = 0.1359
The basis of the non-conformable error is that your matrix algebra isn't working: the number of columns in the first matrix has to match the number of rows in the second. Having no knowledge of VAR models, I can't offer help beyond this, but here is a minimal illustration of the error itself.
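A minimal reproduction outside any VAR machinery (matrix multiplication needs the inner dimensions to agree):
A <- matrix(1, nrow = 2, ncol = 3)  # 2 x 3
B <- matrix(1, nrow = 2, ncol = 2)  # 2 x 2
A %*% B  # ncol(A) = 3 does not match nrow(B) = 2, so:
# Error in A %*% B : non-conformable arguments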

Error in R code for LPPL model

I am learning R and ran into a problem when trying to fit the LPPL model using nls. I used monthly data for the KLSE.
library(tseries)
library(zoo)
ts <- read.table(file.choose(), header = TRUE)
ts
rdate Close Date
1 8/1998 302.91 0
2 9/1998 373.52 100
3 10/1998 405.33 200
4 11/1998 501.47 300
5 12/1998 586.13 400
6 1/1999 591.43 500
7 2/1999 542.23 600
8 3/1999 502.82 700
9 4/1999 674.96 800
10 5/1999 743.04 900
11 6/1999 811.10 1000
12 7/1999 768.69 1100
13 8/1999 767.06 1200
14 9/1999 675.45 1300
15 10/1999 742.87 1400
16 11/1999 734.66 1500
17 12/1999 812.33 1600
18 1/2000 922.10 1700
19 2/2000 982.24 1800
20 3/2000 974.38 1900
21 4/2000 898.35 2000
22 5/2000 911.51 2100
23 6/2000 833.37 2200
24 7/2000 798.83 2300
25 8/2000 795.84 2400
26 9/2000 713.51 2500
27 10/2000 752.36 2600
28 11/2000 729.95 2700
29 12/2000 679.64 2800
30 1/2001 727.73 2900
31 2/2001 709.39 3000
32 3/2001 647.48 3100
33 4/2001 584.50 3200
34 5/2001 572.88 3300
35 6/2001 592.99 3400
36 7/2001 659.40 3500
37 8/2001 687.16 3600
38 9/2001 615.34 3700
39 10/2001 600.07 3800
40 11/2001 638.02 3900
41 12/2001 696.09 4000
42 1/2002 718.82 4100
43 2/2002 708.91 4200
44 3/2002 756.10 4300
45 4/2002 793.99 4400
46 5/2002 741.76 4500
47 6/2002 725.44 4600
48 7/2002 721.59 4700
49 8/2002 711.36 4800
50 9/2002 638.01 4900
51 10/2002 659.57 5000
52 11/2002 629.22 5100
53 12/2002 646.32 5200
54 1/2003 664.77 5300
55 2/2003 646.80 5400
56 3/2003 635.72 5500
57 4/2003 630.37 5600
58 5/2003 671.46 5700
59 6/2003 691.96 5800
60 7/2003 720.56 5900
61 8/2003 743.30 6000
62 9/2003 733.45 6100
63 10/2003 817.12 6200
64 11/2003 779.28 6300
65 12/2003 793.94 6400
66 1/2004 818.94 6500
67 2/2004 879.24 6600
68 3/2004 901.85 6700
69 4/2004 838.21 6800
70 5/2004 810.67 6900
71 6/2004 819.86 7000
72 7/2004 833.98 7100
73 8/2004 827.98 7200
74 9/2004 849.96 7300
75 10/2004 861.14 7400
76 11/2004 917.19 7500
77 12/2004 907.43 7600
78 1/2005 916.27 7700
79 2/2005 907.38 7800
80 3/2005 871.35 7900
81 4/2005 878.96 8000
82 5/2005 860.73 8100
83 6/2005 888.32 8200
84 7/2005 937.39 8300
85 8/2005 913.56 8400
86 9/2005 927.54 8500
87 10/2005 910.76 8600
88 11/2005 896.13 8700
89 12/2005 899.79 8800
90 1/2006 914.01 8900
91 2/2006 928.94 9000
92 3/2006 926.63 9100
93 4/2006 949.23 9200
94 5/2006 927.78 9300
95 6/2006 914.69 9400
96 7/2006 935.85 9500
97 8/2006 958.12 9600
98 9/2006 967.55 9700
99 10/2006 988.30 9800
100 11/2006 1080.66 9900
101 12/2006 1096.24 10000
102 1/2007 1189.35 10100
103 2/2007 1196.45 10200
104 3/2007 1246.87 10300
105 4/2007 1322.25 10400
106 5/2007 1346.89 10500
107 6/2007 1354.38 10600
108 7/2007 1373.71 10700
109 8/2007 1273.93 10800
110 9/2007 1336.30 10900
111 10/2007 1413.65 11000
112 11/2007 1396.98 11100
113 12/2007 1445.03 11200
df <- data.frame(ts)
df <- data.frame(Date = df$Date, Y = df$Close)
df <- df[!is.na(df$Y), ]
library(minpack.lm)
library(ggplot2)
f <- function(pars, xx) {
  pars$a + pars$b * (pars$tc - xx)^pars$m *
    (1 + pars$c * cos(pars$omega * log(pars$tc - xx) + pars$phi))
}
resids <- function(p, observed, xx) { df$Y - f(p, xx) }
nls.out <- nls.lm(par = list(a = 7.048293, b = -8.8e-5, tc = 112000, m = 0.5,
                             omega = 3.03, phi = -9.76, c = -14),
                  fn = resids, observed = df$Y, xx = df$days,
                  control = nls.lm.control(maxiter = 1024, ftol = 1e-6,
                                           maxfev = 1e6))
par <- nls.out$par
nls.final <- nls(Y ~ a + (tc - days)^m * (b + c * cos(omega * log(tc - days) + phi)),
                 data = df, start = par, algorithm = "plinear",
                 control = nls.control(maxiter = 1024, minFactor = 1e-8))
Error in qr.solve(QR.B, cc) : singular matrix 'a' in solve
I got a singular matrix error. What do I need to change to avoid it?
Your problem is that the cosine term is zero for some value, which makes the matrix singular; you basically need to constrain the parameter space. Additionally, I would read more of the literature, since some trigonometric rework will remove the phi parameter; this improves the nonlinear optimization enough to get useful and reproducible results. A sketch of that rework follows.
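A hedged sketch of the reparameterization (the function name and coefficients c1, c2 are illustrative): the cosine addition formula gives
c*cos(omega*log(tc - x) + phi) = c1*cos(omega*log(tc - x)) + c2*sin(omega*log(tc - x))
with c1 = c*cos(phi) and c2 = -c*sin(phi), so phi drops out and the oscillatory part becomes linear in c1 and c2, which suits algorithm = "plinear":
f2 <- function(a, b, tc, m, omega, c1, c2, x) {
  # phi-free LPPL form; c1 and c2 absorb the phase
  dt <- tc - x
  a + dt^m * (b + c1 * cos(omega * log(dt)) + c2 * sin(omega * log(dt)))
}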

Why does my detrending not result in a different semivariogram in R?

I have a series of samples from which I wish to construct a variogram model, and eventually a kriging model. First, I need to detrend the data, as shown below:
samples
x y z
1 180 1180 2.763441
2 180 240 -2.000000
3 380 1840 1.720087
4 720 80 4.056754
5 860 800 4.361503
6 620 1360 4.737717
7 980 1920 4.352956
8 1680 260 4.568255
9 1520 800 5.025272
10 1100 1220 4.693432
11 800 1460 2.470927
12 360 1900 1.455169
13 700 1760 2.894159
14 720 1540 2.115742
15 660 1480 1.749017
16 540 1680 3.291592
17 260 1280 2.962401
18 440 1640 2.422442
19 280 1260 2.966076
20 580 1580 3.178913
21 600 1220 3.752786
22 240 1700 1.748011
23 480 1440 3.106302
24 740 1880 4.827699
25 760 1320 3.603621
26 1560 1640 5.410076
27 1960 1980 6.145778
28 1520 1620 5.499064
29 1900 1820 5.316121
30 1780 1580 5.318344
31 100 740 2.019103
32 180 760 2.353693
33 140 200 1.714856
34 380 720 3.526107
35 240 580 3.075283
36 260 600 3.329397
37 340 360 3.188613
38 280 680 2.626241
39 420 700 3.211163
40 500 240 2.960805
41 460 280 3.171664
42 480 300 2.828883
43 400 640 3.227938
44 440 480 2.420358
45 300 560 4.021187
46 1380 220 5.364264
47 1500 740 5.344526
48 1240 380 4.632060
49 1420 360 4.012537
50 1280 800 4.122139
51 1400 600 5.033020
52 1300 640 4.215308
53 1460 200 5.116025
54 1220 440 4.550290
55 1200 520 3.788613
56 1540 340 5.772432
57 1520 660 5.656598
58 1480 260 5.423685
59 1360 780 4.728220
60 1260 240 3.683696
print(mean(samples$z))
h <- gstat(formula = z ~ 1, locations = ~x + y, data = samples)
samples.vgm <- variogram(h)
plot(samples.vgm, main = 'Variogram of Samples NOT detrended')
z <- samples$z
x <- samples$x
y <- samples$y
trend <- lm(z ~ x + y)
c <- trend$coefficients[[1]]
a <- trend$coefficients[[2]]
b <- trend$coefficients[[3]]
Xs <- c()
Ys <- c()
Zs <- c()
print('started the loop')
for (i in 1:nrow(samples)) {
  row <- samples[i, ]
  z_prime <- row$z - (a * row$x + b * row$y + c)
  Xs <- c(Xs, row$x)
  Ys <- c(Ys, row$y)
  Zs <- c(Zs, z_prime)
}
sampled <- data.frame(Xs, Ys, Zs)
print(sampled)
print('the length of sampled is')
print(length(sampled[[1]]))
print(levelplot(Zs ~ Xs + Ys, sampled))
x <- seq(0, 2000, by = 20)
y <- seq(0, 2000, by = 20)
pred.grid <- data.frame(x = rep(x, times = length(y)), y = rep(y, each = length(x)))
g <- gstat(formula = Zs ~ 1, locations = ~Xs + Ys, data = sampled)
sampled.vgm <- variogram(g)
plot(sampled.vgm, main = 'Variogram of Samples hopefully detrended')
The problem is that the plot of the detrended variogram (i.e. the variogram g above) looks exactly the same as the variogram h, also above, which is NOT detrended. Any reason why this happens?
The data are clearly different: the mean of the values in the detrended version is 0, as expected, while in the non-detrended version the mean is around 3.556, also as expected.
Is there something I'm not catching here?
Not sure this question belongs here, since I think the issue is conceptual rather than a problem with your code. I'm new, though, so I'll just go ahead and give you some quick feedback.
The variogram plots the variance (technically the semi-variance) of your data within a given spatial lag. When you apply a linear transformation to your data, I don't believe you'll alter the variance, and so you shouldn't see different parameters come out of the variogram model. Instead, your kriged surface just takes on a different mean value.
P.S. It would be helpful to make the code something that anyone can copy and paste, e.g. by including coded test data.
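As a side note, gstat can also remove a trend for you when the covariates appear in the model formula, which avoids the manual loop entirely (a sketch, using the same samples data frame as above):
library(gstat)
# With covariates in the formula, variogram() works on the residuals
# of the internally fitted trend z ~ x + y
g2 <- gstat(formula = z ~ x + y, locations = ~x + y, data = samples)
plot(variogram(g2), main = 'Residual variogram (trend handled by gstat)')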

How to process multi-column data in a data.frame with plyr

I am trying to process DSC (differential scanning calorimetry) data with R, but I seem to have run into some trouble. This used to be done tediously in Origin or QtiPlot in my lab, and I wonder if there is a way to do it in batch. It did not go well; for example, maybe I have used the wrong column names in my data.frame, since the code
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
cannot reach my data.
So below is the full description of my purpose; thank you in advance!
The DSC data looks like this (I stored the CSV file via my Google Drive link):
T1 0.5min T2 1min
40.59 -0.2904 40.59 -0.2545
40.81 -0.281 40.81 -0.2455
41.04 -0.2747 41.04 -0.2389
41.29 -0.2728 41.29 -0.2361
41.54 -0.2553 41.54 -0.2239
41.8 -0.07 41.8 -0.0732
42.06 0.1687 42.06 0.1414
42.32 0.3194 42.32 0.2817
42.58 0.3814 42.58 0.3421
42.84 0.3863 42.84 0.3493
43.1 0.3665 43.11 0.3322
43.37 0.3438 43.37 0.3109
43.64 0.3265 43.64 0.2937
43.9 0.3151 43.9 0.2819
44.17 0.3072 44.17 0.2735
44.43 0.2995 44.43 0.2656
44.7 0.2899 44.7 0.2563
44.96 0.2779 44.96 0.245
In fact, I have merged the data into a data.frame and hope I can adjust it and take it further. The command is:
dat <- read.csv("Book1.csv", header = FALSE)
colnames(dat) <- c('T1', '0.5min', 'T2', '1min', 'T3', '2min', 'T4', '4min',
                   'T5', '8min', 'T6', '10min', 'T7', '20min', 'T8', 'ascast1',
                   'T9', 'ascast2', 'T10', 'ascast3', 'T11', 'ascast4',
                   'T12', 'ascast5')
So dat is a data.frame with 1163 obs. of 24 variables.
T1, T2, ..., T12 are the temperatures at which the samples were tested in the DSC; although they cover the same interval, they differ a little due to the instability of the machine.
The column next to each of T1~T12 is the heat flow recorded by the machine for a given heat-treatment duration, and ascast1~ascast5 are untreated samples used to check the accuracy of the machine.
Now I need to do something like the following:
1. T1~T12 are in degrees Celsius; I need to convert them to Kelvin, which means adding 273.16 to every value.
2. Two temperatures are chosen for comparing the results: Ts = 180.25 and Te = 240.45 (both in degrees Celsius; I checked in QtiPlot to make sure). To be clear, I list the two temperatures and the first 6 columns of data.
T1 0.5min T2 1min T3 2min T4 4min
180.25 -0.01710000 180.25 -0.01780000 180.25 -0.02120000 180.25 -0.02020000
. . . .
. . . .
240.45 0.05700000 240.45 0.04500000 240.45 0.05780000 240.45 0.05580000
3. The heat flow at Ts should be the same in every column, and can be set to 0 for convenience. So each heat-flow column (0.5min, 1min, 2min, 4min, 8min, 10min, 20min and ascast1~ascast5) should have its own value at Ts subtracted from it.
4. The heat flow at Te should then be adjusted so that all columns agree there. Concretely: (1) calculate the mean of the 12 heat-flow values at Te; call it Hmean, the value every column should take at Te. (2) For the 0.5min column, denoted col("0.5min"), the linear transform is:
col("0.5min") - [([0.05700000 - (-0.01710000)] - Hmean)/(Te - Ts)] * (col(T1) - Ts)
(Actually, [0.05700000 - (-0.01710000)] is already handled in step 3, but I write it out for your reference.) The same formula is applied to each of the 12 pairs of T1~T12 and heat-flow columns: (T1, 0.5min), (T2, 1min), (T3, 2min), and so on.
Then we can plot the 12 pairs of data on the same plot over the interval 180~240 (also in degrees Celsius) to magnify the differences between the DSC scans.
I have been stuck on this problem for 2 days, so I'm turning to Stack Overflow for help.
Thanks!
I am assuming that your question is about the error right at the beginning,
dat$0.5min
Error: unexpected numeric constant in "dat$0.5"
as I could not find a question in the rest of the steps; they read like a step-by-step description of an experiment.
To fix that error: the problem is that the column name starts with a number, so to reference such a column you must wrap its name in backticks (`):
dataF <- data.frame("0.5min" = 1:10, "T2" = 11:20, check.names = FALSE)
dataF$`0.5min`
 [1]  1  2  3  4  5  6  7  8  9 10
Based on the comments adding more information: you can add a constant to alternate columns in the following manner,
dataF <- data.frame(matrix(1:100,10,10))
const <- 237
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 11 21 31 41 51 61 71 81 91
2 2 12 22 32 42 52 62 72 82 92
3 3 13 23 33 43 53 63 73 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 35 45 55 65 75 85 95
6 6 16 26 36 46 56 66 76 86 96
7 7 17 27 37 47 57 67 77 87 97
8 8 18 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 20 30 40 50 60 70 80 90 100
dataF[,seq(1,ncol(dataF),by = 2)] <- dataF[,seq(1,ncol(dataF),by = 2)] + const
> print(dataF)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 238 11 258 31 278 51 298 71 318 91
2 239 12 259 32 279 52 299 72 319 92
3 240 13 260 33 280 53 300 73 320 93
4 241 14 261 34 281 54 301 74 321 94
5 242 15 262 35 282 55 302 75 322 95
6 243 16 263 36 283 56 303 76 323 96
7 244 17 264 37 284 57 304 77 324 97
8 245 18 265 38 285 58 305 78 325 98
9 246 19 266 39 286 59 306 79 326 99
10 247 20 267 40 287 60 307 80 327 100
To generalize: the columns of a data.frame can be referenced with a vector of numbers or of column names, and most operations in R are vectorized, so you can pick out columns by name or by position depending on the pattern you are looking for.
For example, if I change the names of my first two columns and want to access just those, I can do this:
colnames(dataF)[c(1,2)] <- c("Y1","Y2")
#Reference all column names with "Y" in it. You can do any operation you want on this.
dataF[,grep("Y",colnames(dataF))]
Y1 Y2
1 238 11
2 239 12
3 240 13
4 241 14
5 242 15
6 243 16
7 244 17
8 245 18
9 246 19
10 247 20
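Putting the pieces together, here is a hedged sketch of the full baseline correction described in the question (it assumes dat is the 24-column data.frame built above, with temperatures in the odd columns and heat flows in the even columns, and that Ts and Te occur as exact values in every temperature column):
Ts <- 180.25; Te <- 240.45
heat_cols <- seq(2, ncol(dat), by = 2)   # heat-flow columns
temp_cols <- heat_cols - 1               # matching temperature columns
# Heat flow of each column at Ts and at Te (first exact match)
Hs <- mapply(function(tc, hc) dat[dat[[tc]] == Ts, hc][1], temp_cols, heat_cols)
He <- mapply(function(tc, hc) dat[dat[[tc]] == Te, hc][1], temp_cols, heat_cols)
Hmean <- mean(He - Hs)  # common target value at Te after zeroing at Ts
for (k in seq_along(heat_cols)) {
  hc <- heat_cols[k]; tc <- temp_cols[k]
  dat[[hc]] <- (dat[[hc]] - Hs[k]) -
    ((He[k] - Hs[k]) - Hmean) / (Te - Ts) * (dat[[tc]] - Ts)
}
# Celsius-to-Kelvin conversion last, so Ts and Te above stay in Celsius
dat[temp_cols] <- dat[temp_cols] + 273.16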
