I am learning R and ran into a problem when trying to fit an LPPL model with nls. I used monthly data from the KLSE.
library(tseries)
library(zoo)
ts <- read.table(file.choose(), header = TRUE)
ts
rdate Close Date
1 8/1998 302.91 0
2 9/1998 373.52 100
3 10/1998 405.33 200
4 11/1998 501.47 300
5 12/1998 586.13 400
6 1/1999 591.43 500
7 2/1999 542.23 600
8 3/1999 502.82 700
9 4/1999 674.96 800
10 5/1999 743.04 900
11 6/1999 811.10 1000
12 7/1999 768.69 1100
13 8/1999 767.06 1200
14 9/1999 675.45 1300
15 10/1999 742.87 1400
16 11/1999 734.66 1500
17 12/1999 812.33 1600
18 1/2000 922.10 1700
19 2/2000 982.24 1800
20 3/2000 974.38 1900
21 4/2000 898.35 2000
22 5/2000 911.51 2100
23 6/2000 833.37 2200
24 7/2000 798.83 2300
25 8/2000 795.84 2400
26 9/2000 713.51 2500
27 10/2000 752.36 2600
28 11/2000 729.95 2700
29 12/2000 679.64 2800
30 1/2001 727.73 2900
31 2/2001 709.39 3000
32 3/2001 647.48 3100
33 4/2001 584.50 3200
34 5/2001 572.88 3300
35 6/2001 592.99 3400
36 7/2001 659.40 3500
37 8/2001 687.16 3600
38 9/2001 615.34 3700
39 10/2001 600.07 3800
40 11/2001 638.02 3900
41 12/2001 696.09 4000
42 1/2002 718.82 4100
43 2/2002 708.91 4200
44 3/2002 756.10 4300
45 4/2002 793.99 4400
46 5/2002 741.76 4500
47 6/2002 725.44 4600
48 7/2002 721.59 4700
49 8/2002 711.36 4800
50 9/2002 638.01 4900
51 10/2002 659.57 5000
52 11/2002 629.22 5100
53 12/2002 646.32 5200
54 1/2003 664.77 5300
55 2/2003 646.80 5400
56 3/2003 635.72 5500
57 4/2003 630.37 5600
58 5/2003 671.46 5700
59 6/2003 691.96 5800
60 7/2003 720.56 5900
61 8/2003 743.30 6000
62 9/2003 733.45 6100
63 10/2003 817.12 6200
64 11/2003 779.28 6300
65 12/2003 793.94 6400
66 1/2004 818.94 6500
67 2/2004 879.24 6600
68 3/2004 901.85 6700
69 4/2004 838.21 6800
70 5/2004 810.67 6900
71 6/2004 819.86 7000
72 7/2004 833.98 7100
73 8/2004 827.98 7200
74 9/2004 849.96 7300
75 10/2004 861.14 7400
76 11/2004 917.19 7500
77 12/2004 907.43 7600
78 1/2005 916.27 7700
79 2/2005 907.38 7800
80 3/2005 871.35 7900
81 4/2005 878.96 8000
82 5/2005 860.73 8100
83 6/2005 888.32 8200
84 7/2005 937.39 8300
85 8/2005 913.56 8400
86 9/2005 927.54 8500
87 10/2005 910.76 8600
88 11/2005 896.13 8700
89 12/2005 899.79 8800
90 1/2006 914.01 8900
91 2/2006 928.94 9000
92 3/2006 926.63 9100
93 4/2006 949.23 9200
94 5/2006 927.78 9300
95 6/2006 914.69 9400
96 7/2006 935.85 9500
97 8/2006 958.12 9600
98 9/2006 967.55 9700
99 10/2006 988.30 9800
100 11/2006 1080.66 9900
101 12/2006 1096.24 10000
102 1/2007 1189.35 10100
103 2/2007 1196.45 10200
104 3/2007 1246.87 10300
105 4/2007 1322.25 10400
106 5/2007 1346.89 10500
107 6/2007 1354.38 10600
108 7/2007 1373.71 10700
109 8/2007 1273.93 10800
110 9/2007 1336.30 10900
111 10/2007 1413.65 11000
112 11/2007 1396.98 11100
113 12/2007 1445.03 11200
df <- data.frame(ts)
df <- data.frame(days = df$Date, Y = df$Close)   # the 'Date' column holds the day count, used as 'days' below
df <- df[!is.na(df$Y), ]
library(minpack.lm)
library(ggplot2)
f <- function(pars, xx) {
  pars$a + pars$b * (pars$tc - xx)^pars$m * (1 + pars$c * cos(pars$omega * log(pars$tc - xx) + pars$phi))
}
resids <- function(p, observed, xx) observed - f(p, xx)
nls.out <- nls.lm(par = list(a = 7.048293, b = -8.8e-5, tc = 112000, m = 0.5, omega = 3.03, phi = -9.76, c = -14),
                  fn = resids, observed = df$Y, xx = df$days,
                  control = nls.lm.control(maxiter = 1024, ftol = 1e-6, maxfev = 1e6))
par <- nls.out$par
nls.final <- nls(Y ~ a + (tc - days)^m * (b + c * cos(omega * log(tc - days) + phi)),
                 data = df, start = par, algorithm = "plinear",
                 control = nls.control(maxiter = 1024, minFactor = 1e-8))
Error in qr.solve(QR.B, cc) : singular matrix 'a' in solve
I get a singular-matrix error. What do I need to change to avoid it?
Your problem is that the cosine term is zero for some values, which makes the matrix singular; you basically need to constrain the parameter space. Additionally, I would read more of the literature, since some fancy trig work removes the phi parameter, which improves the nonlinear optimization enough to give useful and reproducible results.
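As a rough illustration of both points (a sketch only, not a tuned fit: the log-price response, the starting values, and the bounds below are placeholders you would need to adapt), you can expand the cosine so that phi disappears and A, B, C1, C2 enter linearly, and give nls.lm bounds that keep tc - t strictly positive:
library(minpack.lm)
# LPPL with the phase absorbed into two linear coefficients:
# y = A + (tc - t)^m * (B + C1*cos(omega*log(tc - t)) + C2*sin(omega*log(tc - t)))
lppl <- function(p, t) {
  dt <- p["tc"] - t
  p["A"] + dt^p["m"] * (p["B"] +
    p["C1"] * cos(p["omega"] * log(dt)) +
    p["C2"] * sin(p["omega"] * log(dt)))
}
resid_fn <- function(p, observed, t) observed - lppl(p, t)
start <- c(A = 7, B = -0.01, C1 = 0.01, C2 = 0.01, tc = 11500, m = 0.5, omega = 6)
fit <- nls.lm(par = start, fn = resid_fn,
              observed = log(df$Y), t = df$days,
              # keep tc past the last observation and m, omega in the usual LPPL ranges
              lower = c(-Inf, -Inf, -Inf, -Inf, max(df$days) + 1, 0.1, 2),
              upper = c( Inf,  Inf,  Inf,  Inf, 2 * max(df$days), 0.9, 15),
              control = nls.lm.control(maxiter = 1024))
summary(fit)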
I would like to estimate the parameters of the Gompertz-Makeham distribution, but I haven't been able to get a result.
I would like a method in R similar to this Weibull parameter estimation code:
weibull_loglik <- function(parm) {
  gamma <- parm[1]
  lambda <- parm[2]
  loglik <- sum(dweibull(vec, shape = gamma, scale = lambda, log = TRUE))
  return(-loglik)
}
weibull <- nlm(weibull_loglik, p = c(1, 1), hessian = TRUE, iterlim = 100)
weibull$estimate
c <- weibull$estimate[1]; b <- weibull$estimate[2]
My data:
[1] 872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
[16] 20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
[31] 50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
[46] 374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
[61] 894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
[76] 2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
[91] 2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
[106] 12 6 3 2 1 1
summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 32.0 314.0 900.9 1355.0 4054.0
It would be nice to have a reproducible example, but try something like this:
library(bbmle)
library(eha)
set.seed(101)
vec <- rmakeham(1000, shape = c(2,3), scale = 2)
dmwrap <- function(x, shape1, shape2, scale, log) {
res <- try(dmakeham(x, c(shape1, shape2), scale, log = log), silent = TRUE)
if (inherits(res, "try-error")) return(NA)
res
}
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
start = list(shape1=1,shape2=1, scale=1),
data = data.frame(y = vec),
method = "Nelder-Mead"
)
Define a wrapper that (1) takes the shape parameters as separate values and (2) returns NA rather than throwing an error when, e.g., the parameters are negative.
Use Nelder-Mead rather than the default BFGS for robustness.
The fitdistrplus package might help too.
If you're going to do a lot of this, it may help to fit the parameters on the log scale (i.e. use parameters logshape1, etc., and exp(logshape1), etc. in the fitting formula), as sketched below.
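A minimal sketch of that log-scale idea, reusing the dmwrap wrapper and the simulated vec from above (the zero starting values are arbitrary, and I'm assuming mle2 is happy evaluating exp() of the parameters inside the formula):
# optimizing log-parameters means the optimizer can never propose negative values;
# exp() maps them back to the natural scale inside the formula
m1_log <- mle2(y ~ dmwrap(exp(logshape1), exp(logshape2), exp(logscale)),
               start = list(logshape1 = 0, logshape2 = 0, logscale = 0),
               data = data.frame(y = vec),
               method = "Nelder-Mead")
exp(coef(m1_log))  # back-transform to shape1, shape2, scale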
I had to work a little harder to fit your data; I scaled the variable by 1000 (and found that I could only compute the log-likelihood; the likelihood gave an error that I didn't bother trying to track down). Unfortunately, it doesn't look like a great fit (too many small values).
x <- scan(text = "872 52 31 26 22 17 11 17 17 8 20 12 25 14 17
20 17 23 32 37 28 24 43 40 34 29 26 32 34 51
50 67 84 70 71 137 123 137 172 189 212 251 248 272 314
374 345 411 494 461 505 506 565 590 535 639 710 733 795 786
894 963 1019 1149 1185 1356 1354 1460 1622 1783 1843 2049 2262 2316 2591
2730 2972 3187 3432 3438 3959 3140 3612 3820 3478 4054 3587 3433 3150 2881
2639 2250 1850 1546 1236 966 729 532 375 256 168 107 65 39 22
12 6 3 2 1 1")
m1 <- mle2(y ~ dmwrap(shape1, shape2, scale),
start = list(shape1=1,shape2=1, scale=10000),
data = data.frame(y = x/1000),
method = "Nelder-Mead"
)
cc <- as.list(coef(m1))
png("gm.png")
hist(x,breaks = 25, freq=FALSE)
with(cc,
curve(exp(dmwrap(x/1000, shape1, shape2, scale, log = TRUE))/1000, add = TRUE)
)
dev.off()
When using microbenchmark I have noticed that the first execution is always a lot slower than the rest. This effect was the same across different machines and with different functions. Does this have something to do with the library, or is this some kind of warm-up that is to be expected?
library(microbenchmark)
X <- matrix(rnorm(100), nrow = 10)
microbenchmark(solve(X))$time
#[1] 82700 23700 18300 17700 19700 19100 16900 17500 17300 16600 16700 16700 18500 16900 17700 16900 17000 16200 17400 17000 16800 16600 17000 16700 16800 17100
#[27] 17300 17100 16800 17800 17400 18100 17400 18100 18000 16700 17400 17300 17000 16800 16400 17300 16700 16900 16900 16700 17200 17800 16600 17100 16800 17800
#[53] 17000 17200 17500 17200 17200 17300 17800 17600 17600 17200 16600 16700 16800 16600 16400 16500 17300 17600 16800 17600 16300 16800 17100 16500 16800 16700
#[79] 16300 16700 16300 16700 16800 16700 16400 17100 16400 17100 17000 18000 16600 16600 16600 16800 16700 16500 17600 19100 17400 16900
It has to do with the warm-up time; see help('microbenchmark'), section Details, argument control:
The control list can contain the following entries:
order
[omitted]
warmup
the number of warm-up iterations performed before the actual benchmark. These are used to estimate the timing overhead as well as spinning up the processor from any sleep or idle states it might be in. The default value is 2.
If you increase the number of warm-up iterations, the first run might not be the slowest, though it often still is.
library(microbenchmark)
set.seed(2020)
X <- matrix(rnorm(100), nrow = 10)
times <- microbenchmark(solve(X), control = list(warmup = 10))$time
times
# [1] 145229 72724 65333 65305 115715 63797 689113 72101 64830 66392
# [11] 65776 66619 65531 64765 65351 65605 65745 65106 64661 65790
# [21] 65435 64964 66138 65952 66893 65654 65585 75141 74666 69060
# [31] 72725 66650 65486 65894 66808 65381 66039 65959 64842 65029
# [41] 65673 66439 64394 70585 68899 73875 73180 67807 65891 65699
# [51] 64693 63679 65504 80190 66150 65048 64372 64842 65845 65144
# [61] 65543 65297 65485 64695 66580 64921 65453 64840 65559 65805
# [71] 64362 66098 65464 65227 64998 64007 65659 63919 64727 64796
# [81] 65231 64030 65871 65735 64217 65195 65181 65130 66015 63891
# [91] 63755 65274 65116 64573 64244 64214 64148 64457 65346 64228
Now use order() to see which runs were the slowest:
order(times, decreasing = TRUE)
# [1] 7 1 5 54 28 29 46 47 31 2 8 44 30 45 48 25 35 32 12
# [20] 65 42 10 55 23 72 37 89 38 24 34 49 83 59 70 20 11 17 84
# [39] 50 41 77 26 16 27 69 61 13 53 33 63 73 67 21 36 15 99 3
# [58] 4 62 92 81 74 86 87 60 88 93 18 56 40 75 22 66 39 58 68
# [77] 9 80 14 79 64 51 19 94 98 43 57 71 95 100 85 96 97 82 76
# [96] 78 90 6 91 52
In this case the slowest was the seventh run, not the first.
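If what you want is a headline timing rather than the raw vector, note that the median (and lq/uq) reported by summary() is largely insensitive to one slow warm-up run; a small sketch with the same setup as above:
library(microbenchmark)
set.seed(2020)
X <- matrix(rnorm(100), nrow = 10)
mb <- microbenchmark(solve(X), times = 100, control = list(warmup = 10))
summary(mb)   # lq/median/uq are robust to the occasional slow iteration
boxplot(mb)   # warm-up and OS-scheduling outliers show up clearly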
I currently have the data set below, but I am unsure how to convert it from its current data frame format into a time series.
I am also unsure how to split this data into an in-sample and an out-of-sample set for forecasting.
Date Observations
1 1975/01 5172
2 1975/02 6162
3 1975/03 6979
4 1975/04 5418
5 1976/01 4801
6 1976/02 5849
7 1976/03 6292
8 1976/04 5261
9 1977/01 4461
10 1977/02 5322
11 1977/03 6153
12 1977/04 5377
13 1978/01 4808
14 1978/02 5845
15 1978/03 6023
16 1978/04 5691
17 1979/01 4683
18 1979/02 5663
19 1979/03 6068
20 1979/04 5429
21 1980/01 4897
22 1980/02 5685
23 1980/03 5862
24 1980/04 4663
25 1981/01 4566
26 1981/02 5118
27 1981/03 5261
28 1981/04 4459
29 1982/01 4352
30 1982/02 4995
31 1982/03 5559
32 1982/04 4823
33 1983/01 4462
34 1983/02 5228
35 1983/03 5997
36 1983/04 4725
37 1984/01 4223
38 1984/02 4940
39 1984/03 5780
40 1984/04 5232
41 1985/01 4723
42 1985/02 5219
43 1985/03 5855
44 1985/04 5613
45 1986/01 4987
46 1986/02 6117
47 1986/03 5777
48 1986/04 5803
49 1987/01 5113
50 1987/02 6298
51 1987/03 7152
52 1987/04 6591
53 1988/01 6337
54 1988/02 6672
55 1988/03 7224
56 1988/04 6296
57 1989/01 6957
58 1989/02 7538
59 1989/03 8022
60 1989/04 7216
61 1990/01 6633
62 1990/02 7355
63 1990/03 7897
64 1990/04 7159
65 1991/01 6637
66 1991/02 7629
67 1991/03 8080
68 1991/04 7077
69 1992/01 7190
70 1992/02 7396
71 1992/03 7795
72 1992/04 7147
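One common approach, assuming the data frame is called dat and these really are quarterly observations (1975/01 = 1975 Q1), is to build a ts object with frequency 4 and split it with window(); the 1990 cut-off for the hold-out sample below is just an example:
# dat has columns Date (e.g. "1975/01") and Observations
y <- ts(dat$Observations, start = c(1975, 1), frequency = 4)
in_sample  <- window(y, end = c(1989, 4))    # training period
out_sample <- window(y, start = c(1990, 1))  # hold-out for forecast evaluation
plot(y)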
I am trying to fit a power function using the nls function in R, but I am failing to find good starting values.
This is part of my data "CentroM":
Wg TLcm
3200 79
2650 77
2750 74
870 45
1480 52
3400 80.5
2400 76
2800 76.5
2900 77.5
2700 76
3215 76
3300 83
3100 79
3000 78.5
2800 76
2700 77
2500 74.5
2300 69
2700 73.5
3350 79
and here is the script I used:
plot(CentroM$TLcm,CentroM$Wg,xlab="Total Length(cm)",ylab="Total Weight(g)",pch=1,type="p")
f<-function(TLcm,a,b){a*TLcm^b}
fit<-nls(CentroM$Wg~f(CentroM$TLcm,a,b),start=list(a=0.5,b=0.5),data=CentroM)
and here is what I get:
Error in model.frame.default(formula = ~CentroM + Wg + TLcm, data = CentroM) :
invalid type (list) for variable 'CentroM'
Any help please...
You could take the logs, fit a linear model, and use its coefficients as starting values. (The error itself comes from writing CentroM$Wg and CentroM$TLcm inside the formula while also passing data = CentroM; refer to the columns simply as Wg and TLcm.)
df <- read.table(header = TRUE, text = 'Wg TLcm
3200 79
2650 77
2750 74
870 45
1480 52
3400 80.5
2400 76
2800 76.5
2900 77.5
2700 76
3215 76
3300 83
3100 79
3000 78.5
2800 76
2700 77
2500 74.5
2300 69
2700 73.5
3350 79')
mod1 <- lm(log(Wg) ~ log(TLcm), data = df)
fit <- nls(Wg ~ a*TLcm^b,
start = list(a = exp(coef(mod1)[1]),
b = coef(mod1)[2]),
data = df)
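Assuming that fit converges, a quick way to sanity-check it is to overlay the fitted curve on the data:
plot(df$TLcm, df$Wg, xlab = "Total Length (cm)", ylab = "Total Weight (g)")
curve(coef(fit)["a"] * x^coef(fit)["b"], add = TRUE)
coef(fit)  # the fitted a and b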
I'm using the below data to create a plot in R using ggplot2.
Hour.of.day Model N Distance.travelled sd se ci
1 0100 h300_fv30 60 3.6264709 5.078277 0.6556027 1.3118579
2 0100 h300_fv35 60 2.9746019 5.313252 0.6859379 1.3725586
3 0100 h300_fv40 60 3.0422525 3.950650 0.5100267 1.0205610
4 0200 h300_fv30 60 4.3323896 6.866003 0.8863972 1.7736767
5 0200 h300_fv35 60 3.5567420 6.259378 0.8080823 1.6169689
6 0200 h300_fv40 60 2.5232512 4.533234 0.5852380 1.1710585
7 0300 h300_fv30 60 3.1800537 5.303506 0.6846797 1.3700409
8 0300 h300_fv35 60 2.9281442 4.445953 0.5739700 1.1485113
9 0300 h300_fv40 60 2.5078045 4.058295 0.5239236 1.0483687
10 0400 h300_fv30 60 3.3408231 4.567161 0.5896180 1.1798229
11 0400 h300_fv35 60 2.8679676 5.396700 0.6967110 1.3941155
12 0400 h300_fv40 60 3.1615813 4.244155 0.5479180 1.0963815
13 0500 h300_fv30 60 3.8117851 6.970900 0.8999394 1.8007745
14 0500 h300_fv35 60 2.1130581 3.925906 0.5068323 1.0141691
15 0500 h300_fv40 60 3.6430531 4.905484 0.6332953 1.2672209
16 0600 h300_fv30 60 3.5234762 5.150027 0.6648657 1.3303931
17 0600 h300_fv35 60 2.0341804 3.192176 0.4121082 0.8246266
18 0600 h300_fv40 60 3.2838958 3.770624 0.4867855 0.9740555
19 0700 h300_fv30 60 3.8327926 6.521022 0.8418603 1.6845587
20 0700 h300_fv35 60 1.6933289 2.607322 0.3366039 0.6735428
21 0700 h300_fv40 60 2.3896956 3.435656 0.4435413 0.8875241
22 0800 h300_fv30 60 3.3077466 6.504371 0.8397107 1.6802573
23 0800 h300_fv35 60 1.4823307 3.556884 0.4591917 0.9188405
24 0800 h300_fv40 60 2.4161741 3.571444 0.4610715 0.9226019
25 0900 h300_fv30 60 2.1506438 2.893029 0.3734885 0.7473487
26 0900 h300_fv35 60 1.8821961 3.457929 0.4464167 0.8932778
27 0900 h300_fv40 60 1.7896335 2.714514 0.3504423 0.7012334
28 1000 h300_fv30 60 2.5107475 5.491835 0.7089929 1.4186914
29 1000 h300_fv35 60 0.9491365 2.061712 0.2661658 0.5325966
30 1000 h300_fv40 60 1.6678013 3.234033 0.4175119 0.8354393
31 1100 h300_fv30 60 1.8602186 3.365695 0.4345093 0.8694511
32 1100 h300_fv35 60 1.4385708 2.869765 0.3704851 0.7413389
33 1100 h300_fv40 60 1.1273899 2.010280 0.2595261 0.5193105
34 1200 h300_fv30 60 1.4870763 2.112841 0.2727667 0.5458048
35 1200 h300_fv35 60 2.5295481 4.740384 0.6119810 1.2245711
36 1200 h300_fv40 60 1.6551202 3.051420 0.3939366 0.7882653
37 1300 h300_fv30 60 2.8791490 4.925870 0.6359271 1.2724872
38 1300 h300_fv35 60 2.4731563 5.266690 0.6799268 1.3605303
39 1300 h300_fv40 60 4.5989133 8.394460 1.0837201 2.1685189
40 1400 h300_fv30 60 1.5050205 3.188480 0.4116310 0.8236717
41 1400 h300_fv35 60 1.7615688 3.064842 0.3956693 0.7917325
42 1400 h300_fv40 60 2.2766514 5.215937 0.6733746 1.3474194
43 1500 h300_fv30 60 1.9097882 2.770040 0.3576106 0.7155772
44 1500 h300_fv35 60 2.0109347 4.070014 0.5254365 1.0513961
45 1500 h300_fv40 60 1.6316881 4.119681 0.5318485 1.0642264
46 1600 h300_fv30 60 3.3246263 5.352698 0.6910304 1.3827486
47 1600 h300_fv35 60 2.0389703 3.781869 0.4882372 0.9769604
48 1600 h300_fv40 60 1.0204568 2.205685 0.2847527 0.5697888
49 1700 h300_fv30 60 3.6132519 5.467875 0.7058996 1.4125019
50 1700 h300_fv35 60 2.1139255 4.178283 0.5394140 1.0793648
51 1700 h300_fv40 60 1.5547818 3.411135 0.4403756 0.8811895
52 1800 h300_fv30 60 5.0552532 7.344069 0.9481152 1.8971742
53 1800 h300_fv35 60 2.1832792 3.824244 0.4937078 0.9879070
54 1800 h300_fv40 60 1.6532516 3.273697 0.4226325 0.8456856
55 1900 h300_fv30 60 5.6107731 6.891023 0.8896272 1.7801399
56 1900 h300_fv35 60 2.9822004 5.958244 0.7692060 1.5391777
57 1900 h300_fv40 60 2.7111394 3.798765 0.4904184 0.9813250
58 2000 h300_fv30 60 6.0438385 7.126952 0.9200855 1.8410868
59 2000 h300_fv35 60 3.9517888 6.462761 0.8343388 1.6695081
60 2000 h300_fv40 60 3.9508503 5.374253 0.6938130 1.3883167
61 2100 h300_fv30 60 4.2144712 5.648673 0.7292406 1.4592070
62 2100 h300_fv35 60 2.2205186 3.397391 0.4386013 0.8776392
63 2100 h300_fv40 60 3.9000010 5.881409 0.7592866 1.5193290
64 2200 h300_fv30 60 3.9478958 5.584154 0.7209112 1.4425401
65 2200 h300_fv35 60 3.1612149 4.788883 0.6182421 1.2370996
66 2200 h300_fv40 60 3.7812992 6.424478 0.8293965 1.6596186
67 2300 h300_fv30 61 3.3860628 5.176299 0.6627571 1.3257117
68 2300 h300_fv35 61 3.7427743 6.257596 0.8012031 1.6026448
69 2300 h300_fv40 61 3.6674335 4.945831 0.6332487 1.2666861
70 2400 h300_fv30 59 3.8745470 5.763821 0.7503856 1.5020600
71 2400 h300_fv35 59 3.1284346 5.016476 0.6530895 1.3073007
72 2400 h300_fv40 59 3.7563017 4.819053 0.6273872 1.2558520
The plotting code is:
ggplot(my_data, aes(x=Hour.of.day, y=Distance.travelled, colour=Model)) +
geom_errorbar(aes(ymin = Distance.travelled - ci, ymax = Distance.travelled + ci), width=.1, position=position_dodge(2)) +
geom_line(position=position_dodge(2)) +
geom_point(position=position_dodge(2)) +
scale_x_discrete(breaks=c("0600", "1200", "1800", "2400")) +
theme(axis.ticks = element_blank())
It is hard to differentiate the three separate patterns in the resulting plot.
Does anybody have suggestions for improving the visualization so that the three patterns can be better differentiated? For example, some way to emphasize the mean points and place the confidence intervals in the background?
Use lines and ribbons:
library(ggplot2)
ggplot(my_data, aes(x=Hour.of.day, y=Distance.travelled,
fill=Model)) +
theme_bw()+
geom_line(aes(colour=Model))+
geom_ribbon(aes(ymin = Distance.travelled - ci,
ymax = Distance.travelled + ci),alpha=0.4)+
scale_x_discrete(breaks=c("0600", "1200", "1800", "2400")) +
theme(axis.ticks = element_blank())
ggsave("ribbonplot.png",width=7,height=4)
You can make the lines wider (lwd) or the ribbons fainter (alpha) if you want to emphasize the pattern of the mean more strongly.
Here's another way, using facets:
ggplot(my_data, aes(x = Hour.of.day, y = Distance.travelled)) +
  geom_pointrange(aes(ymin = Distance.travelled - ci, ymax = Distance.travelled + ci, color = Model)) +
  facet_grid(Model ~ .) +
  stat_smooth(formula = y ~ 1, method = "lm", linetype = 2, se = FALSE) +
  geom_abline(aes(slope = 0, intercept = mean(Distance.travelled)), linetype = 3)
The main idea here is that the data should have a frame of reference (here, the mean distance travelled for a given model). This tells you at a glance when distance travelled is significantly different from that mean. The grey dotted line is the grand average across all models, which tells you whether a given model tends to travel more or less than the average over all the models.
If you set se = TRUE in the call to stat_smooth(...), you also get the variability in the means, but I think all that shading detracts from the main point.