Fitting a damped sine wave dataset with gnuplot, getting lots of errors

I was trying to fit this dataset:
#Mydataset damped sine wave data
#X ---- Y
45.80 320.0
91.60 -254.0
137.4 198.0
183.2 -156.0
229.0 126.0
274.8 -100.0
320.6 80.0
366.4 -64.0
412.2 52.0
458.0 -40.0
503.8 34.0
549.6 -26.0
595.4 22.0
641.2 -18.0
which, as you can see from a plot of the data, has the classic shape of a damped sine wave.
So I first defined the function for the fit
f(x) = exp(-a*x)*sin(b*x)
then I ran the fit
fit f(x) 'data.txt' via a,b
iter chisq delta/lim lambda a b
0 2.7377200000e+05 0.00e+00 1.10e-19 1.000000e+00 1.000000e+00
Current data point
=========================
# = 1 out of 14
x = -5.12818e+20
z = 320
Current set of parameters
=========================
a = -5.12818e+20
b = -1.44204e+20
Function evaluation yields NaN ("not a number")
getting a NaN as a result. So I looked around on Stack Overflow and remembered that I have had trouble fitting exponentials before: because of their fast growth/decay you need to set sensible initial parameters to avoid this error (as I've asked here). So I tried setting the starting parameters a and b to the values I expected, a = 9000, b = 146000, but the result was even more frustrating than before:
fit f(x) 'data.txt' via a,b
iter chisq delta/lim lambda a b
0 2.7377200000e+05 0.00e+00 0.00e+00 9.000000e+03 1.460000e+05
Singular matrix in Givens()
I thought: "these numbers are much too large, let's try smaller ones".
So I entered new values for a and b and started fitting again
a = 0.01
b = 2
fit f(x) 'data.txt' via a,b
iter chisq delta/lim lambda a b
0 2.7429059500e+05 0.00e+00 1.71e+01 1.000000e-02 2.000000e+00
1 2.7346318324e+05 -3.03e+02 1.71e+00 1.813940e-02 -9.254913e-02
* 1.0680927157e+137 1.00e+05 1.71e+01 -2.493611e-01 5.321099e+00
2 2.7344431789e+05 -6.90e+00 1.71e+00 1.542835e-02 4.310193e+00
* 6.1148639318e+81 1.00e+05 1.71e+01 -1.481123e-01 -1.024914e+01
3 2.7337226343e+05 -2.64e+01 1.71e+00 1.349852e-02 -9.008087e+00
* 6.4751980241e+136 1.00e+05 1.71e+01 -2.458835e-01 -4.089511e+00
4 2.7334273482e+05 -1.08e+01 1.71e+00 1.075319e-02 -4.346296e+00
* 1.8228530731e+121 1.00e+05 1.71e+01 -2.180542e-01 -1.407646e+00
* 2.7379223634e+05 1.64e+02 1.71e+02 8.277720e-03 -1.440256e+00
* 2.7379193486e+05 1.64e+02 1.71e+03 1.072342e-02 -3.706519e+00
5 2.7326800742e+05 -2.73e+01 1.71e+02 1.075288e-02 -4.338196e+00
* 2.7344116255e+05 6.33e+01 1.71e+03 1.069793e-02 -3.915375e+00
* 2.7327905718e+05 4.04e+00 1.71e+04 1.075232e-02 -4.332930e+00
6 2.7326776014e+05 -9.05e-02 1.71e+03 1.075288e-02 -4.338144e+00
iter chisq delta/lim lambda a b
After 6 iterations the fit converged.
final sum of squares of residuals : 273268
rel. change during last iteration : -9.0493e-07
degrees of freedom (FIT_NDF) : 12
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 150.905
variance of residuals (reduced chisquare) = WSSR/ndf : 22772.3
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 0.0107529 +/- 3.114 (2.896e+04%)
b = -4.33814 +/- 3.678 (84.78%)
correlation matrix of the fit parameters:
a b
a 1.000
b 0.274 1.000
I saw it produced some result, so I thought everything was fine, but my happiness lasted only a few seconds, until I plotted the output:
Wow. A really good one.
And I'm still here wondering what's wrong and how to get a proper fit of a damped sine wave dataset with gnuplot.
Hope someone knows the answer :)

The function you are fitting to the data is not a good match for it. The envelope of the data is a decaying function, so you want a positive damping parameter a. But then your fitting function can never exceed 1 in absolute value for positive x, unlike your data. Also, by using a sine function in your fit you assume something about the phase behavior: the fitted function will always be zero at x = 0. Your data, however, looks like it needs a large, negative amplitude.
So let's choose a better fitting function, and give gnuplot a hand by choosing some reasonable initial guesses for the parameters:
f(x)=c*exp(-a*x)*cos(b*x)   # decaying cosine with a free amplitude c
a=1./500                    # rough initial guess for the decay rate of the envelope
b=2*pi/100.                 # rough initial guess for the angular frequency (the data's period is roughly 90)
c=-400.                     # large negative amplitude, matching the first data points
fit f(x) 'data.txt' via a,b,c
plot f(x), "data.txt" w p
gives a fit that follows the data.

Related

How to run a Granger test on original stationary and transformed stationary time series in R?

I have three time series of daily frequency named a, b, and c. I have transformed the three time series via zoo and then ran three tests to check for stationarity, detailed here:
Ljung-Box is <0.01 for all
ADF hovers around 0.01 for all (0.0104, 0.01395, 0.0151), so under 0.05
KPSS test is <0.01 for two and 0.06 for one
This tells me that c is stationary and a and b are not. To transform the two non-stationary time series to stationary, I use diff()
diffa <- diff(a)
diffb <- diff(b)
So now I have the time series c, diffa, and diffb. c has length 230, while diffa and diffb have length 229. As they have different lengths, I can't run grangertest() on c, diffa, and diffb.
My questions:
As far as I understand, I should not use diff() on an already stationary time series although this would solve the different lengths problem. Correct?
How can I run grangertest() on the three time series? Should I add 0 as the first value for the two now-stationary time series? That strikes me as wrong. Adding NA as first value doesn't work as this throws an error.
Or am I missing something entirely here?
Edit: I also ran the Johansen-Procedure
######################
# Johansen-Procedure #
######################
Test type: trace statistic , with linear trend
Eigenvalues (lambda):
[1] 0.2870433 0.2059927 0.1364341
Values of teststatistic and critical values of test:
test 10pct 5pct 1pct
r <= 2 | 26.70 6.50 8.18 11.65
r <= 1 | 68.68 15.66 17.95 23.52
r = 0 | 130.25 28.71 31.52 37.22
Eigenvectors, normalised to first column:
(These are the cointegration relations)
a.l2 c.l2 b.l2
a.l2 1.000000000 1.00000000 1.000000000
c.l2 0.002389741 -0.01354087 -0.009186882
b.l2 -0.721484628 0.63371350 -5.025730289
Weights W:
(This is the loading matrix)
a.l2 c.l2 b.l2
a.d -0.4943852 -0.1442216 0.03055074
c.d -28.8624865 22.6674082 5.20934270
b.d 0.3205697 -0.1668201 0.05236327
Edit 2:
VECM output from tsDyn
AIC 7310.527 BIC 7364.995 SSR 107996319896
Cointegrating vector (estimated by 2OLS):
a c b
r1 1 -0.0003824692 -0.5213095
ECT Intercept a -1 c -1 b -1
Equation a -0.5926(0.1377)*** 8.7668(12.0209) 0.1257(0.1237) 0.0011(0.0006)* -0.2109(0.0768)**
Equation c 14.7790(21.2262) -26.1263(1852.6566) 1.3814(19.0588) -0.1389(0.0852) -0.8560(11.8419)
Equation b 0.1636(0.2013) 2.6051(17.5698) 0.0168(0.1807) 0.0035(0.0008)*** -0.3475(0.1123)**
A VECM works with the non-stationary series directly, so you want your series to be integrated of order one, I(1) (i.e., stationary after first differencing).
https://pdf4pro.com/view/vector-error-correction-models-215587.html
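For illustration, a minimal sketch of fitting such a VECM with tsDyn on the untransformed levels (assuming the three series are the columns of a matrix called dat; the lag order and cointegration rank r are placeholders to be chosen from information criteria and the Johansen test above):
library(tsDyn)
# dat: 230 x 3 matrix with columns a, c, b in levels (not differenced)
vecm_fit <- VECM(dat, lag=1, r=1, estim="2OLS")
summary(vecm_fit)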

How to simulate random states from fitted HMM with R package depmix?

I'm quite new to R, HMMs and depmix, so apologies if this question is too obvious. I fitted a toy model and want to simulate random sequences of predetermined length. The simulate function seems to be the way to go. My commands:
mod <- depmix(list(speeds~1,categ~1),data=my2Ddata,nstates=2,family=list(gaussian(),multinomial("identity")),instart=runif(2))
mod <- simulate(mod)
print(mod)
The output is not the one I expected (it is actually exactly the same output I get if I print mod before the simulate command):
Initial state probabilties model
pr1 pr2
0.615 0.385
Transition matrix
toS1 toS2
fromS1 0.5 0.5
fromS2 0.5 0.5
Response parameters
Resp 1 : gaussian
Resp 2 : multinomial
Re1.(Intercept) Re1.sd Re2.0 Re2.1
St1 0 1 0.5 0.5
St2 0 1 0.5 0.5
I was expecting something like a sequence of n random states drawn from the fitted distribution (as described on page 41 here: https://cran.r-project.org/web/packages/depmixS4/depmixS4.pdf).
Any hint someone?
mod@response[[1]][[1]]@y
mod@response[[1]][[2]]@y
would provide the simulated speeds and categ.
mod@states
would provide the simulated hidden states.
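Putting it together, a minimal sketch (assuming you want to simulate from the fitted parameters rather than from the unfitted starting values shown above):
library(depmixS4)
mod <- depmix(list(speeds~1, categ~1), data=my2Ddata, nstates=2,
              family=list(gaussian(), multinomial("identity")), instart=runif(2))
fmod <- fit(mod)            # estimate the parameters first
sim <- simulate(fmod)       # returns a new object holding the simulated data
sim@states                  # simulated hidden state sequence
sim@response[[1]][[1]]@y    # simulated speeds
sim@response[[1]][[2]]@y    # simulated categ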

Hessian Matrix in Maximum Likelihood - Gauss vs. R

I am struggling with the following problem. In a nutshell: two different software packages (GAUSS by Aptech and R) yield totally different Hessian matrices in a maximum likelihood procedure. I am using the same optimizer (BFGS), the exact same data, the same likelihood (a very simple logit model) and the exact same starting values, and, confusingly, I get the same results for the parameters and the log-likelihood. Only the Hessian matrices differ across the two programs, and therefore the standard errors and statistical inference differ.
The deviation is not large in this specific example, but every added complication of the model increases the difference, so when I try to estimate my final model, the two programs yield completely different results.
Does anyone know how the two programs differ in the way they compute the Hessian, and possibly the right way to obtain the same results?
EDIT: In the R (GAUSS) code, X (alt) is the independent variable, a two-column matrix whose first column is entirely ones and whose second column holds the subjects' responses. The vector y (itn) is the dependent variable, a single column with the subjects' responses. The example (R code and data set) has been taken from http://www.polsci.ucsb.edu/faculty/glasgow/ps206/ps206.html, just as an example to reproduce and isolate the problem.
I have attached both codes (Gauss and R syntax) and outputs.
Any help would be greatly appreciated. Thank you :)
Gauss:
start={ 0.95568840 , -0.20459156 };
library maxlik,pgraph;
maxset;
_max_Algorithm = 2;
_max_Diagnostic = 1;
{betaa,f,g,cov,ret} = maxlik(XMAT,0,&ll,start);
call maxprt(betaa,f,g,cov,ret);
print _max_FinalHess;
proc ll(b,XMAT);
local exb, probo, logexb, yn, logexbn, yt, ynt, logl;
exb = EXP(alt*b);
//print exb;
probo = exb./(1+exb);
logexb = ln(probo);
yn = 1 - itn;
logexbn = ln(1 - probo);
yt = itn';
ynt = yn';
logl = (yt*logexb + ynt*logexbn);
print(logl);
retp(logl);
endp;
R:
startv <- c(0.95568840,-0.20459156)
logit.lf <- function(beta) {
  exb <- exp(X%*%beta)
  prob1 <- exb/(1+exb)
  logexb <- log(prob1)
  y0 <- 1 - y
  logexb0 <- log(1 - prob1)
  yt <- t(y)
  y0t <- t(y0)
  # negative log-likelihood, since optim() minimizes by default
  logl <- -(yt%*%logexb + y0t%*%logexb0)
  return(logl)
}
logitmodel <- optim(startv, logit.lf, method="BFGS", control=list(trace=TRUE, REPORT=1), hessian=TRUE)
logitmodel$hessian
Gauss Output:
return code = 0
normal convergence
Mean log-likelihood -0.591820
Number of cases 1924
Covariance matrix of the parameters computed by the following method:
Inverse of computed Hessian
Parameters Estimates Std. err. Est./s.e. Prob. Gradient
------------------------------------------------------------------
P01 2.1038 0.2857 7.363 0.0000 0.0000
P02 -0.9984 0.2365 -4.221 0.0000 0.0000
Gauss Hessian:
0.20133256 0.23932571
0.23932571 0.29377761
R Output:
initial value 1153.210839
iter 2 value 1148.015749
iter 3 value 1141.420328
iter 4 value 1138.668174
iter 5 value 1138.662148
iter 5 value 1138.662137
iter 5 value 1138.662137
final value 1138.662137
converged
Coeff. Std. Err. z p value
[1,] 2.10379869 0.28570765 7.3634665 1.7919000e-13
[2,] -0.99837955 0.23651060 -4.2212889 2.4290942e-05
R Hessian:
[,1] [,2]
[1,] 387.34106 460.45379
[2,] 460.45379 565.24412
They are just scaled differently. The GAUSS numbers are around 1924 times smaller than the R numbers, and 1924 is exactly the number of cases, so GAUSS is evidently differentiating the mean log-likelihood (which its output reports), while R's optim works with the summed (negative) log-likelihood.
I think GAUSS keeps the numbers in a smaller range for numerical stability.
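A quick check in R using the two Hessians printed above: dividing the R Hessian by the 1924 cases reproduces the GAUSS Hessian, and inverting the R Hessian reproduces the standard errors both programs report, so the two results agree up to that scale factor.
gauss_H <- matrix(c(0.20133256, 0.23932571, 0.23932571, 0.29377761), 2, 2)
r_H <- matrix(c(387.34106, 460.45379, 460.45379, 565.24412), 2, 2)
r_H / 1924              # ~ gauss_H
sqrt(diag(solve(r_H)))  # ~ c(0.286, 0.237), the reported standard errors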

Parameter and initial conditions fitting ODE models with nls.lm

I am currently trying to fit ODE functional responses using the Levenberg-Marquardt routine (nls.lm) from the minpack.lm package, following the tutorial here (http://www.r-bloggers.com/learning-r-parameter-fitting-for-models-involving-differential-equations/).
In the example, the author fits the data by first setting up a rate function rxnrate, which I modified as shown below:
library(ggplot2) #library for plotting
library(reshape2) # library for reshaping data (tall-narrow <-> short-wide)
library(deSolve) # library for solving differential equations
library(minpack.lm) # library for least squares fit using levenberg-marquart algorithm
# prediction of concentration
# rate function
rxnrate=function(t,c,parms){
  # rate constants passed through a list called parms
  k1=parms$k1
  k2=parms$k2
  k3=parms$k3
  # c is the concentration of species
  # derivatives dc/dt are computed below
  r=rep(0,length(c))
  r[1]=-k1*c["A"] #dcA/dt
  r[2]=k1*c["A"]-k2*c["B"]+k3*c["C"] #dcB/dt
  r[3]=k2*c["B"]-k3*c["C"] #dcC/dt
  # the computed derivatives are returned as a list
  # order of derivatives needs to be the same as the order of species in c
  return(list(r))
}
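Just to show how this rate function is consumed by the solver (a minimal sketch with hypothetical rate constants and initial concentrations, matching the ode() call made inside ssq below):
cinit=c(A=1,B=0,C=0)
t=seq(0,5,0.1)
out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=0.5,k2=0.5,k3=0.5))
head(out)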
My problem is that the initial condition of each state can also be treated as an estimated parameter. However, it does not work properly at the moment.
Below is my code:
# function that calculates residual sum of squares
ssq=function(myparms){
  # initial concentration
  cinit=c(A=myparms[4],B=0,C=0)
  # time points for which conc is reported
  # include the points where data is available
  t=c(seq(0,5,0.1),df$time)
  t=sort(unique(t))
  # parms from the parameter estimation routine
  k1=myparms[1]
  k2=myparms[2]
  k3=myparms[3]
  # solve ODE for a given set of parameters
  out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=k1,k2=k2,k3=k3))
  # filter data that contains time points where data is available
  outdf=data.frame(out)
  outdf=outdf[outdf$time %in% df$time,]
  # evaluate predicted vs experimental residual
  preddf=melt(outdf,id.var="time",variable.name="species",value.name="conc")
  expdf=melt(df,id.var="time",variable.name="species",value.name="conc")
  ssqres=preddf$conc-expdf$conc
  # return predicted vs experimental residual
  return(ssqres)
}
# parameter fitting using the Levenberg-Marquardt algorithm
# initial guess for parameters
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
# fitting
fitval=nls.lm(par=myparms,fn=ssq)
Once I run this, I get the following error:
Error in chol.default(object$hessian) :
the leading minor of order 1 is not positive definite
The problem with your code is the following:
In the line cinit=c(A=myparms[4],B=0,C=0) you gave A both the value of myparms[4] AND the name of myparms[4], so the resulting element ends up named A.A instead of A. Let's see:
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
cinit=c(A=myparms[4],B=0,C=0)
print(cinit)
A.A B C
1 0 0
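This renamed element is what breaks the fit: rxnrate looks concentrations up by name, so c["A"] no longer matches anything and returns NA (a quick check):
cinit["A"]    # NA -- there is no element named "A", only "A.A"
cinit["A.A"]  # 1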
to solve this problem, you can do this:
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
cinit=c(A=unname(myparms[4]),B=0,C=0)
print(cinit)
A B C
1 0 0
or this:
myparms=c(k1=0.5,k2=0.5,k3=0.5,1)
cinit=c(A=unname(myparms[4]),B=0,C=0)
print(cinit)
A B C
1 0 0
Then your code will work!
Best regards,
J_F

Fitting two parameter observations into copulas

I have one set of observations containing two variables.
How do I fit a copula to it (i.e., estimate the copula parameter and the margin distributions)?
Let's say the margins are log-normal distributions and the copula is a Gumbel copula.
The data is as below:
1 974.0304 1010
2 6094.2672 1150
3 3103.2720 1490
4 1746.1872 1210
5 6683.7744 3060
6 6299.6832 3330
7 4784.0112 1550
8 1472.4288 607
9 3758.5728 1970
10 4381.2144 1350
library(copula)
gumbel.cop <- gumbelCopula(dim=2)
myMvd <- mvdc(gumbel.cop, c("lnorm","lnorm"), list(list(meanlog = 7.1445391,sdlog=0.4568783), list(meanlog = 7.957392,sdlog=0.559831)))
x <- rmvdc(myMvd, 1000)
fit <- fitMvdc(x, myMvd, c(7.1445391,0.4568783,7.957392,0.559831))
The meanlog and sdlog values are derived from the data set. Error message:
"Error in if (alpha - 1 < .Machine$double.eps^(1/3)) return(rCopula(n, :
missing value where TRUE/FALSE needed"
How to choose the copula parameter with the given data, and the margin distributions derived from the data set?
To close out the question, which was resolved in the comments:
It seems that giving the copula parameter an actual value (so that the TRUE/FALSE condition in the error message can be evaluated) closes the problem, as does first computing pseudo-observations and then fitting the copula.
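A minimal sketch along those lines (assuming the two columns of the data above are stored in a matrix called dat; the starting value 2 for the Gumbel parameter is just an illustrative guess):
library(copula)
gumbel.cop <- gumbelCopula(2, dim=2)   # give the copula parameter a starting value
myMvd <- mvdc(gumbel.cop, c("lnorm","lnorm"),
              list(list(meanlog=7.1445391, sdlog=0.4568783),
                   list(meanlog=7.957392, sdlog=0.559831)))
# margin parameters first, copula parameter last
fit <- fitMvdc(dat, myMvd, start=c(7.1445391, 0.4568783, 7.957392, 0.559831, 2))
# alternative: pseudo-observations, then fit only the copula
u <- pobs(dat)
fitCopula(gumbelCopula(dim=2), u, method="mpl")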
