Different errors occur with R's nls function

I'm new to R programming and can't find a solution to an error that occurs when I use the nls function.
I'm trying to fit data from an ecdf (values extracted and saved in y) to this model with four parameters:
fitsim <- nls(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b)))^d)),
              start = list(a = max(Abfluss), b = min(Abfluss),
                           c = 3, d = 1))
When I run the nls function, this error occurs:
Error in numericDeriv(form[[3L]], names(ind), env) :
Fehlender Wert oder etwas Unendliches durch das Modell erzeugt
which means a missing value or an infinite value was produced by the model.
My vectors Abfluss and y both have the same length. The aim is to estimate the parameters.
Maybe the problem is that the model is only valid under these conditions:
c > 0, d > 0, b <= Abfluss <= a.
I already tried adding na.rm = TRUE. Then another error appears:
Error in model.frame.default(formula = ~y + Abfluss, na.rm = TRUE) :
Variablenlängen sind unterschiedlich (gefunden für '(na.rm)')
which means the variable lengths differ.
I would appreciate any kind of help and advice.
For a better understanding, I attach my full code with all the data:
time<-c(1851:2013)
Abfluss<- c(4853,4214,5803,3430,4645,4485,3100,4797,4030,3590,5396,9864,3683,4485,4064,3420,5396,
4895,3931,4238,3790,3520,4263,5474,3790,4700,5109,4525,4007,6340,4993,6903,8160,3600,3480,3540,
3540,4565,3333,7764,
4755,7940,3112,3169,4435,5365,9422,3150,10500,4512,3790,4618,6126,3769,3704,
5938,5669,4552,5458,5854,4867,6057,4783,5753,5736,4618,6091,5820,5007,7984, 4435,
4645,7465,5820,5988,6022,4300,6062,3302,4877,4586,5275,4410,3174,4966,4939,4638,
5541,5760,6495,5435,4952,4912,6092,5182,5820,5129,6436,6648,3063,5550,5160,4400,
9600,6400,6380,6300,6180,6899,4360,5550,4580,3894,5277,7520,6780,5100,5430,4550,
6620,4050,4560,5290,6610,8560,4943,6940,4744,6650,5700,7440,6200,4597,3697,7300,
4644,5456,6302,3741,5398,9500,6296,5279,5923,6412,6559,6559,5891,5737,5010,5790,
10300,4150,4870,6740,7560,8010,5120,8170,7430, 7330,5900, 11150)
#EV4-Distribution
dEV4 <- function(x, a, b, c, d) {
  # EV4 distribution function evaluated at x (use the argument x, not the global Abfluss)
  m <- exp(-(((a - x) / (c * (x - b)))^d))
  return(m)
}
#Simulation example
Sim <- dEV4(Abfluss, a = max(Abfluss), b = min(Abfluss), c = 3, d = 1)
dEV4cdf <- cbind(Abfluss, Sim)
#Empirical cdf
p <- ecdf(Abfluss)
y <- p(Abfluss)   # extract the cumulative probabilities
m <- cbind(Abfluss, y)
#plot EV4 and ecdf
plot(dEV4cdf, type = "p", main = "EV4")
plot(ecdf(Abfluss), add = TRUE)
#Fitting EV4 nls
fitsim <- nls(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b)))^d)),
              start = list(a = max(Abfluss), b = min(Abfluss),
                           c = 3, d = 1), na.rm = TRUE)

Do not use starting values that are on the boundary of the feasible region, and try nlxb from the nlmrt package instead (it can be used with the same arguments, except that data = is not optional):
library(nlmrt)
fitsim <- nlxb(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b)))^d)),
               data = data.frame(y, Abfluss),
               start = list(a = 2 * max(Abfluss), b = min(Abfluss) / 2, c = 3, d = 1))
plot(y ~ Abfluss, pch = 20)
o <- order(Abfluss)
fit <- y - fitsim$resid
lines(fit[o] ~ Abfluss[o], col = "red")
giving:
nlmrt class object: x
residual sumsquares = 0.02908 on 163 observations
    after 5001 Jacobian and 6060 function evaluations
  name        coeff        SE     tstat     pval     gradient   JSingval
a          20047.7         NA        NA       NA    1.119e-07       3251
b         -1175384         NA        NA       NA    1.432e-09     0.1775
c        0.0129414         NA        NA       NA      -0.1296  5.808e-06
d           12.146         NA        NA       NA   -2.097e-06  6.798e-11
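As a footnote to the point about boundary starting values, here is a minimal sketch (not part of the original answer; it only assumes the Abfluss vector from the question) of why a = max(Abfluss), b = min(Abfluss) is fragile: at b = min(Abfluss) the denominator is zero for the smallest observation, and the small perturbations that numericDeriv applies to b during its finite-difference derivative step push the model to non-finite values there.
a0 <- max(Abfluss); b0 <- min(Abfluss); c0 <- 3; d0 <- 1
ratio <- (a0 - Abfluss) / (c0 * (Abfluss - b0))
range(ratio)   # Inf at the minimum observation (division by zero)
# nudge b just above min(Abfluss), as a finite-difference step may do
m_nudged <- exp(-(((a0 - Abfluss) / (c0 * (Abfluss - (b0 + 1e-6))))^d0))
sum(!is.finite(m_nudged))   # at least one non-finite model value appears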

Related

FME package: "Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix" in fitting using modFit()

I'm trying to fit the differential equation below using the least-squares method (FME package).
However, I keep getting this error that I don't know how to tackle.
The reproducible example:
library(deSolve)   # for ode()
library(FME)       # for modCost() and modFit()
times = seq(0, 4, by = 0.5)
dat = data.frame(time = seq(1, 4),
                 Tick = c(128, 52.5, 28, 121))
N = 10
tick.model <- function(time, y, params, ...) { # a function of time, state and parameters
  with(as.list(c(y, params)), {
    dTick <- (30 - s.t*Tick)*Tick*0.3*N - delta.t*Tick
    return(list(c(dTick)))
  })
}
y = c(Tick = 82.375)
cost1 <- function(p) {
  out <- ode(y, times, tick.model, p)
  modCost(out, dat, weight = "none")
}
params <- c(s.t = 0.1, delta.t = 1)
fit = modFit(f = cost1, p = params, lower = rep(0, 2),
             upper = c(10, 5))
summary(fit)
The result comes out like this:
Parameters:
         Estimate Std. Error t value Pr(>|t|)
s.t     0.3641876         NA      NA       NA
delta.t 0.0001417         NA      NA       NA
Residual standard error: 60.92 on 2 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(fit) : Cannot estimate covariance; system is singular
Also, the fitted model doesn't look nice (plot not shown).
I have no idea what I could have done wrong.

`Error in solve.default(xtx + diag(pen)): system is computationally singular:` when trying to impute 3-level dataset

I have a dataset comprising 44 variables and ~86,000 rows.
Of the 44 variables, 34 contain missing data ranging from ~2% to ~25%. Of the variables with missing values, 7 are at level 1, 25 at level 2, and 2 at level 3. The 10 remaining variables comprise the three level identifiers and level 1 variables without missing values.
I've been trying to impute the incomplete data following the sample script in the documentation. However, when trying to run the mice imputation function, I get the error `Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20`, which I don't understand. It may sound like I haven't defined a level ID variable, but I fail to see how my code is noticeably different from the example.
Some help understanding the error message would be much appreciated.
My code:
library(dplyr)
library(mice)
library(miceadds)
my_data
x <- paste0("x",1:7)
y <- paste0("y", 1:25)
z <- c("z1", "z2")
#----- specify levels of variables (only relevent for variables
# with missing values)
level <- character(ncol(my_data))
names(level) <- colnames(my_data)
level[y] <- "id2" # level 2 identifier
level[z] <- "id3" # level 3 identifier
#----- specify predictor matrix
predMatrix <- my_data %>%
make.predictorMatrix()
# remove indicator variables from predictor matrix
predMatrix[, c("id2", "id3")] <- 0
# set -2 for level identifier for level 3 variable z1
# because "2lonly" function is used
predMatrix[c(z), "id3"] <- -2
#----- specify imputation methods
impMethod <- my_data %>%
make.method()
# method for lower-level variables (x, y, and z)
impMethod[c(x, y)] <- "ml.lmer"
# method for variables at top level (w)
impMethod[c(z)] <- "2lonly.norm"
#----- specify hierarchical structure of imputation models
levels_id <- list()
#** hierarchical structure for L1 variable
l1 <- list()
l1 <- lapply(x, function(x){
append(l1, c("id2", "id3")) %>%
unlist()
})
names(l1) <- x
#** hierarchical structure for variable y1
l2 <- list()
l2 <- lapply(y, function(x){
append(l2, c("id3")) %>%
unlist()
})
names(l2) <- y
levels_id <- c(levels_id, l1, l2)
rm(l1, l2)
# run mice
imp <- mice(my_data, m = 5, maxit = 10, method = impMethod,
predictorMatrix = predMatrix, levels_id = levels_id,
variables_levels = level)
Output:
iter imp variable
1 1 x1
boundary (singular) fit: see ?isSingular
# (message repeated for all level 1 variables ...)
x7
boundary (singular) fit: see ?isSingular
y1 y2 y3 (all level 2 and 3 variables except z2)
y24 y25 z1
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20
Traceback:
1. mice(t19sel_imp, m = 5, maxit = 10, method = impMethod, predictorMatrix = predMatrix,
. levels_id = levels_id, variables_levels = level)
2. sampler(data, m, ignore, where, imp, blocks, method, visitSequence,
. predictorMatrix, formulas, blots, post, c(from, to), printFlag,
. ...)
3. sampler.univ(data = data, r = r, where = where, type = type,
. formula = ff, method = theMethod, yname = j, k = k, calltype = calltype,
. user = user, ignore = ignore, ...)
4. do.call(f, args = args)
5. mice.impute.2lonly.norm(y = c(1, 1, #[... an extremely long printout of list items follows ...]
. BCBG04 = "IDCNTSCH"))
6. .imputation.level2(y = y, ry = ry, x = x, type = type, wy = wy,
. method = "norm", ...)
7. mice.impute.norm(y = y2, ry = ry2, x = x2, wy = wy2, ...)
8. .norm.draw(y, ry, x, ...)
9. estimice(x[ry, , drop = FALSE], y[ry], ...)
10. solve(xtx + diag(pen))
11. solve.default(xtx + diag(pen))

`nls` fails to estimate parameters of my model

I am trying to estimate the constants for Heaps' law.
I have the following dataset novels_collection:
  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274
Then I built the following function:
# Function for Heaps' law
heaps <- function(K, n, B){
  K * n^B
}
heaps(2, 117795, .7)   # just to test that it works
So n = WordOccurrences, and K and B are constants that should be estimated in order to predict DistinctWords.
I tried this but it gives me an error:
fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                data = novels_collection[, 2:3],
                start = list(K = .1, B = .1), trace = T)

Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
Any idea how I could fix this, or another method to fit the function and get values for K and B?
If you take log transform on both sides of y = K * n ^ B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), hence you can fit a linear regression model to find log(K) and B.
logy <- log(novels_collection$DistinctWords)
logn <- log(novels_collection$WordOccurrences)
fit <- lm(logy ~ logn)
para <- coef(fit)         ## log(K) and B
para[1] <- exp(para[1])   ## now K and B
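As a possible follow-up (a sketch, not part of the original answer), these estimates can be fed back into nls() as starting values; nls usually converges once the start is in a sensible range. It assumes the novels_collection data frame and the heaps() function from the question.
# Sketch: reuse the lm-based estimates as starting values for nls()
fitHeaps_lmstart <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                        data = novels_collection[, 2:3],
                        start = list(K = para[[1]], B = para[[2]]))
coef(fitHeaps_lmstart)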
With minpack.lm we can fit a non-linear model directly, though I guess it will be more prone to overfitting than a linear model on the log-transformed variables (as done by Zheyuan). We could compare the residuals of the linear and non-linear models on some held-out dataset to get an empirical answer, which would be interesting to see.
library(minpack.lm)
fitHeaps <- nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B),
                  data = novels_collection[, 2:3],
                  start = list(K = .01, B = .01))
coef(fitHeaps)
#         K         B
# 5.0452566 0.6472176
plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch = 19)
lines(novels_collection$WordOccurrences,
      predict(fitHeaps, newdata = novels_collection[, 2:3]), col = 'red')
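As a rough illustration of that comparison (a sketch only: it reuses para from the log-linear answer above and fitHeaps from this one, and with only five points it is an in-sample comparison rather than a held-out one):
# In-sample residual sums of squares on the original scale
pred_lm  <- para[[1]] * novels_collection$WordOccurrences ^ para[[2]]
pred_nls <- predict(fitHeaps, newdata = novels_collection[, 2:3])
sum((novels_collection$DistinctWords - pred_lm)^2)    # back-transformed lm fit
sum((novels_collection$DistinctWords - pred_nls)^2)   # nlsLM fit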

Exponential fitting with R

I have a dataset like this:
df
           x           y
   7.3006667 -0.14383333
  -0.8983333  0.02133333
   2.7953333 -0.07466667
and I would like to fit an exponential function of the form y = a*exp(b*x).
This is what I tried, and the errors I get:
f <- function(x,a,b) {a * exp(b * x)}
st <- coef(nls(log(y) ~ log(f(x, a, b)), df, start = c(a = 1, b = -1)))
Error in qr.qty(QR, resid) : NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In log(y) : NaNs produced
2: In log(y) : NaNs produced
fit <- nls(y ~ f(x, a, b), data = df, start = list(a = st[1], b = st[2]))
Error in nls(y ~ exp(a + b * x), data = df, start = list(a = st[1], :
singular gradient
I believe it has to do with the fact that the log is not defined for negative numbers, but I don't know how to solve this.
I'm having trouble seeing the problem here.
f <- function(x, a, b) {a * exp(b * x)}
fit <- nls(y ~ f(x, a, b), df, start = c(a = 1, b = 1))
summary(fit)$coefficients
#      Estimate Std. Error    t value  Pr(>|t|)
# a -0.02285668 0.03155189 -0.7244157 0.6008871
# b  0.25568987 0.19818736  1.2901422 0.4197729
plot(y ~ x, df)
curve(predict(fit, newdata = data.frame(x)), add = TRUE)
The coefficients are very poorly estimated, but that's not surprising: you have two parameters and three data points.
As to why your code fails: the first call to nls(...) generates an error, so st is never set to anything (although it may have a value from some earlier code). Then you try to use that in the second call to nls(...).
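A quick check makes the first point concrete (a sketch using only the three rows shown in the question): log() returns NaN for the negative y values, so the initial nls() call on log(y) fails before st is ever assigned.
# log() is NaN for negative y, which is what breaks the first nls() call
df <- data.frame(x = c(7.3006667, -0.8983333, 2.7953333),
                 y = c(-0.14383333, 0.02133333, -0.07466667))
log(df$y)   # NaN for the two negative values (with warnings)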

Forecasting timeseries with tslm in R

I'm still new to R and am facing a problem I can't seem to resolve.
I would like to forecast my time series data.
I have this year's daily numbers (y) and last year's daily numbers, which I want to use as a predictor.
The numbers show weekly cycles. I tried this code (fake numbers for clarity):
library(forecast)   # for tslm() and forecast.lm()
x = rnorm(60, 0, 1)
y = rnorm(60, 0, 1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10, 0, 1)
y <- ts(y, frequency = 7)
fit <- tslm(y ~ trend + season + x)
fcast = forecast.lm(fit, h = 10, newdata = new_x)
I get the error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variable lengths differ (found for 'x')
In addition: Warning message:
'newdata' had 10 rows but variables found have 60 rows
Any hints on what I did wrong?
From your fit object:
Call:
lm(formula = formula, data = "y", na.action = na.exclude)

Coefficients:
(Intercept)       trend     season2     season3     season4     season5     season6     season7           x
  1.1644029   0.0009672  -1.5575562  -3.6723105  -3.1824001  -1.5658857   0.0789683   0.3053541   9.9233635
The last variable is named x, and the help for forecast.lm says newdata is an optional data.frame. You need to turn new_x into a data.frame with x as the column name.
library(forecast)
x = rnorm(60, 0, 1)
y = rnorm(60, 0, 1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10, 0, 1)
y <- ts(y, frequency = 7)
fit <- tslm(y ~ trend + season + x)
# You can use `forecast` directly, as `fit` is an lm object,
# and you don't need `h`, since you provide new data.
fcast = forecast(fit, newdata = data.frame(x = new_x))
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# 9.571429 -3.1541222 -4.5886075 -1.719637 -5.37216743 -0.9360771
# 9.714286 12.5962250 11.1367496 14.055700 10.33953926 14.8529108
# 9.857143 10.5924632 9.1480030 12.036924 8.35899443 12.8259321
#10.000000 15.9419378 14.4775444 17.406331 13.67764776 18.2062278
#10.142857 -7.1887433 -8.6444741 -5.733013 -9.43963897 -4.9378477
#10.285714 -9.4133170 -10.8470152 -7.979619 -11.63014523 -7.1964887
#10.428571 2.2702132 0.8331488 3.707278 0.04818005 4.4922464
#10.571429 0.3519401 -1.1037991 1.807679 -1.89896851 2.6028487
#10.714286 -11.8348209 -13.2930857 -10.376556 -14.08963475 -9.5800070
#10.857143 1.0058209 -0.4435763 2.455218 -1.23528154 3.2469233
You could have converted new_x to a data.frame, and your initial code would have worked too. The new_x variable is numeric and needs to be passed to forecast.lm as a data.frame.
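For illustration, a minimal sketch of that fix applied to the original call (assuming fit and new_x from the code above):
# same fix applied to the original forecast.lm() call
fcast <- forecast.lm(fit, newdata = data.frame(x = new_x))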
The error seems obvious: new_x has 10 random values whereas y and x have 60. Can you update new_x to have 60 random values and verify that the error does not occur?

Resources