R nls: different errors occur
I'm new to R programming and can't find a solution to an error that occurs when I use the nls function.
I am trying to fit the data from an ecdf (the values are extracted and saved in y) to this model function with four parameters:
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1))
When I run nls, this error occurs:
Error in numericDeriv(form[[3L]], names(ind), env) :
Fehlender Wert oder etwas Unendliches durch das Modell erzeugt
which means that a missing value or an infinite value was produced by the model.
My vectors Abfluss and y have the same length. The aim is to estimate the parameters.
Maybe the problem is that the model only works under these conditions:
c>0, d>0, b<=Abfluss<=a.
I already tried adding na.rm=TRUE. Then another error appears:
Error in model.frame.default(formula = ~y + Abfluss, na.rm = TRUE) :
Variablenlängen sind unterschiedlich (gefunden für '(na.rm)')
which means that the variable lengths differ.
I would appreciate any kind of help and advice.
For a better understanding, I attach my whole code with all the data:
time<-c(1851:2013)
Abfluss<- c(4853,4214,5803,3430,4645,4485,3100,4797,4030,3590,5396,9864,3683,4485,4064,3420,5396,
4895,3931,4238,3790,3520,4263,5474,3790,4700,5109,4525,4007,6340,4993,6903,8160,3600,3480,3540,
3540,4565,3333,7764,
4755,7940,3112,3169,4435,5365,9422,3150,10500,4512,3790,4618,6126,3769,3704,
5938,5669,4552,5458,5854,4867,6057,4783,5753,5736,4618,6091,5820,5007,7984, 4435,
4645,7465,5820,5988,6022,4300,6062,3302,4877,4586,5275,4410,3174,4966,4939,4638,
5541,5760,6495,5435,4952,4912,6092,5182,5820,5129,6436,6648,3063,5550,5160,4400,
9600,6400,6380,6300,6180,6899,4360,5550,4580,3894,5277,7520,6780,5100,5430,4550,
6620,4050,4560,5290,6610,8560,4943,6940,4744,6650,5700,7440,6200,4597,3697,7300,
4644,5456,6302,3741,5398,9500,6296,5279,5923,6412,6559,6559,5891,5737,5010,5790,
10300,4150,4870,6740,7560,8010,5120,8170,7430, 7330,5900, 11150)
#EV4-Distribution
dEV4 <- function(x, a, b, c, d) {
  # evaluate the model at the argument x rather than the global Abfluss
  m <- exp(-(((a - x) / (c * (x - b)))^d))
  return(m)
}
#Simulation example
Sim<-dEV4(Abfluss,a=max(Abfluss),b=min(Abfluss), c=3, d=1)
dEV4cdf<-cbind(Abfluss,Sim)
#Empirical cdf
p = ecdf(Abfluss)
y<- p(Abfluss) #Extracting of cumulated probabilities
m<-cbind(Abfluss,y)
#plot EV4 and ecdf
plot(dEV4cdf, type="p",main="EV4")
plot(ecdf(Abfluss), add=TRUE)
#Fitting EV4 nls
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1), na.rm=TRUE)
Do not use starting values that are on the boundary of the feasible region and try nlxb in nlmrt instead (which can be used with the same arguments except data = is not optional):
library(nlmrt)
fitsim <- nlxb(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b))) ^ d)),
data = data.frame(y, Abfluss),
start = list(a = 2 * max(Abfluss), b = min(Abfluss) / 2, c = 3, d = 1))
plot(y ~ Abfluss, pch = 20)
o <- order(Abfluss)
fit <- y - fitsim$resid
lines(fit[o] ~ Abfluss[o], col = "red")
giving:
nlmrt class object: x
residual sumsquares = 0.02908 on 163 observations
after 5001 Jacobian and 6060 function evaluations
name        coeff   SE  tstat  pval    gradient    JSingval
a         20047.7   NA     NA    NA   1.119e-07        3251
b        -1175384   NA     NA    NA   1.432e-09      0.1775
c       0.0129414   NA     NA    NA     -0.1296   5.808e-06
d          12.146   NA     NA    NA  -2.097e-06   6.798e-11
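One way to see why boundary starting values are risky is to check whether the model stays finite when a parameter moves slightly. The snippet below is only a sketch: it uses a small subset of the Abfluss values from the question, and the names ev4 and all_finite are hypothetical helpers introduced for illustration. With b nudged just above min(Abfluss), the term Abfluss - b becomes negative at the smallest observation and the model overflows to Inf, which is the kind of value the nls error message complains about.

```r
# Subset of the discharge values from the question (illustration only)
Abfluss <- c(4853, 4214, 5803, 3430, 3063, 9864, 10500, 11150)

# Model from the question, written as a function of x (hypothetical name)
ev4 <- function(x, a, b, c, d) exp(-(((a - x) / (c * (x - b)))^d))

# Hypothetical helper: TRUE if the model is finite for every observation
all_finite <- function(a, b, c, d) {
  all(is.finite(ev4(Abfluss, a, b, c, d)))
}

# b nudged just above min(Abfluss): x - b < 0 at the smallest observation
boundary_ok <- all_finite(a = max(Abfluss), b = min(Abfluss) + 1e-4,
                          c = 3, d = 1)

# Interior start, as suggested in the answer above
interior_ok <- all_finite(a = 2 * max(Abfluss), b = min(Abfluss) / 2,
                          c = 3, d = 1)

print(c(boundary = boundary_ok, interior = interior_ok))
```

Running a check like this on candidate starting values before calling a fitting routine can save some trial and error.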
Related
FME package: "Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix" in fitting using modFit()
I'm trying to fit a differential equation using the least squares method (FME package). However, I keep getting an error that I don't know how to tackle. The reproducible example:
library(FME)  # also attaches deSolve, which provides ode()
times = seq(0, 4, by = 0.5)
dat = data.frame(time = seq(1,4), Tick = c(128, 52.5, 28, 121))
N = 10
tick.model <- function(time, y, params, ...) { #here we begin a function with three arguments
  with(as.list(c(y, params)),{
    dTick <- (30 - s.t*Tick)*Tick*0.3*N - delta.t*Tick
    return(list(c(dTick)))
  })
}
y = c(Tick = 82.375)
cost1 <- function(p) {
  out <- ode(y, times, tick.model, p)
  modCost(out, dat, weight = "none")
}
params <- c(s.t=0.1, delta.t = 1)
fit = modFit(f = cost1, p = params, lower = rep(0,2), upper = c(10, 5))
summary(fit)
The result comes out like this:
Parameters:
         Estimate Std. Error t value Pr(>|t|)
s.t     0.3641876         NA      NA       NA
delta.t 0.0001417         NA      NA       NA
Residual standard error: 60.92 on 2 degrees of freedom
Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix
In addition: Warning message:
In summary.modFit(fit) : Cannot estimate covariance; system is singular
Also, the fitted model doesn't look nice. I have no idea what I could have done wrong.
`Error in solve.default(xtx + diag(pen)): system is computationally singular:` when trying to impute 3-level dataset
I have a dataset comprising 44 variables and ~86,000 rows. Of the 44 variables, 34 contain missing data ranging from ~2% to ~25%. Of the variables with missing data, 7 are on level 1, 25 on level 2, and 2 on level 3. The 10 remaining variables comprise three level identifiers and level 1 variables without missing values.
I've been trying to impute the incomplete data following the sample script in the documentation. However, when trying to run the mice imputation function, I get the error
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20
which I don't understand. It may sound like I haven't defined a level ID variable, but I fail to see how my code is noticeably different from the example. Some help understanding the error message would be much appreciated.
My code:
library(dplyr)
library(mice)
library(miceadds)
my_data
x <- paste0("x",1:7)
y <- paste0("y", 1:25)
z <- c("z1", "z2")
#----- specify levels of variables (only relevant for variables
# with missing values)
level <- character(ncol(my_data))
names(level) <- colnames(my_data)
level[y] <- "id2" # level 2 identifier
level[z] <- "id3" # level 3 identifier
#----- specify predictor matrix
predMatrix <- my_data %>% make.predictorMatrix()
# remove indicator variables from predictor matrix
predMatrix[, c("id2", "id3")] <- 0
# set -2 for level identifier for level 3 variable z1
# because "2lonly" function is used
predMatrix[c(z), "id3"] <- -2
#----- specify imputation methods
impMethod <- my_data %>% make.method()
# method for lower-level variables (x, y, and z)
impMethod[c(x, y)] <- "ml.lmer"
# method for variables at top level (w)
impMethod[c(z)] <- "2lonly.norm"
#----- specify hierarchical structure of imputation models
levels_id <- list()
#** hierarchical structure for L1 variable
l1 <- list()
l1 <- lapply(x, function(x){
  append(l1, c("id2", "id3")) %>% unlist()
})
names(l1) <- x
#** hierarchical structure for variable y1
l2 <- list()
l2 <- lapply(y, function(x){
  append(l2, c("id3")) %>% unlist()
})
names(l2) <- y
levels_id <- c(levels_id, l1, l2)
rm(l1, l2)
# run mice
imp <- mice(my_data, m = 5, maxit = 10, method = impMethod,
            predictorMatrix = predMatrix, levels_id = levels_id,
            variables_levels = level)
Output:
iter imp variable
1 1 x1
boundary (singular) fit: see ?isSingular
# (message repeated for all level 1 variables ...)
x7
boundary (singular) fit: see ?isSingular
y1 y2 y3 (all level 2 and 3 variables except z2) y24 y25 z1
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20
Traceback:
1. mice(t19sel_imp, m = 5, maxit = 10, method = impMethod, predictorMatrix = predMatrix,
 . levels_id = levels_id, variables_levels = level)
2. sampler(data, m, ignore, where, imp, blocks, method, visitSequence,
 . predictorMatrix, formulas, blots, post, c(from, to), printFlag,
 . ...)
3. sampler.univ(data = data, r = r, where = where, type = type,
 . formula = ff, method = theMethod, yname = j, k = k, calltype = calltype,
 . user = user, ignore = ignore, ...)
4. do.call(f, args = args)
5. mice.impute.2lonly.norm(y = c(1, 1, #[... an extremely long printout of list items follows ...]
 . BCBG04 = "IDCNTSCH"))
6. .imputation.level2(y = y, ry = ry, x = x, type = type, wy = wy,
 . method = "norm", ...)
7. mice.impute.norm(y = y2, ry = ry2, x = x2, wy = wy2, ...)
8. .norm.draw(y, ry, x, ...)
9. estimice(x[ry, , drop = FALSE], y[ry], ...)
10. solve(xtx + diag(pen))
11. solve.default(xtx + diag(pen))
`nls` fails to estimate parameters of my model
I am trying to estimate the constants for Heaps' law. I have the following dataset novels_collection:
  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274
Then I build the following function:
# Function for Heaps law
heaps <- function(K, n, B){
  K*n^B
}
heaps(2,117795,.7) #Just to test it works
So n = Word Occurrences, and K and B are the constants I need in order to predict Distinct Words.
I tried this but it gives me an error:
fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B),
                data = novels_collection[,2:3],
                start = list(K = .1, B = .1), trace = T)
Error = Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
Any idea how I could fix this, or another method to fit the function and get the values for K and B?
If you take the log transform of both sides of y = K * n ^ B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), hence you can fit a linear regression model to find log(K) and B.
logy <- log(DistinctWords)
logn <- log(WordOccurrences)
fit <- lm(logy ~ logn)
para <- coef(fit) ## log(K) and B
para[1] <- exp(para[1]) ## K and B
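The log-transform approach above can be written as a self-contained sketch using the five rows shown in the question. The data frame name novels and the variables K, B and pred are illustrative choices, not part of the original posts.

```r
# Data copied from the question (five novels)
novels <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Linear fit on the log-log scale: log(y) = log(K) + B * log(n)
fit <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels)

# Back-transform the intercept to recover K; the slope is B
K <- unname(exp(coef(fit)[1]))
B <- unname(coef(fit)[2])

# Predictions on the original scale
pred <- K * novels$WordOccurrences^B
print(c(K = K, B = B))
```

Note that least squares on the log scale weights the observations differently from least squares on the original scale, so these estimates need not match a direct non-linear fit.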
With minpack.lm we can fit a non-linear model, but I guess it will be more prone to overfitting than a linear model on the log-transformed variables (as done by Zheyuan). We could compare the residuals of the linear and non-linear models on some held-out dataset to get empirical results, which would be interesting to see.
library(minpack.lm)
fitHeaps = nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B),
                 data = novels_collection[,2:3],
                 start = list(K = .01, B = .01))
coef(fitHeaps)
#         K         B
# 5.0452566 0.6472176
plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch=19)
lines(novels_collection$WordOccurrences,
      predict(fitHeaps, newdata = novels_collection[,2:3]), col='red')
Exponential fitting with R
I have a dataset like this:
df
          x           y
  7.3006667 -0.14383333
 -0.8983333  0.02133333
  2.7953333 -0.07466667
and I would like to fit an exponential function like y = a*(exp(bx)).
This is what I tried and the error I get:
f <- function(x,a,b) {a * exp(b * x)}
st <- coef(nls(log(y) ~ log(f(x, a, b)), df, start = c(a = 1, b = -1)))
Error in qr.qty(QR, resid) :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In log(y) : NaNs produced
2: In log(y) : NaNs produced
fit <- nls(y ~ f(x, a, b), data = df, start = list(a = st[1], b = st[2]))
Error in nls(y ~ exp(a + b * x), data = df, start = list(a = st[1], :
singular gradient
I believe it has to do with the fact that the log is not defined for negative numbers, but I don't know how to solve this.
I'm having trouble seeing the problem here.
f <- function(x,a,b) {a * exp(b * x)}
fit <- nls(y~f(x,a,b),df,start=c(a=1,b=1))
summary(fit)$coefficients
#      Estimate Std. Error    t value  Pr(>|t|)
# a -0.02285668 0.03155189 -0.7244157 0.6008871
# b  0.25568987 0.19818736  1.2901422 0.4197729
plot(y~x, df)
curve(predict(fit,newdata=data.frame(x)), add=TRUE)
The coefficients are very poorly estimated, but that's not surprising: you have two parameters and three data points.
As to why your code fails: the first call to nls(...) generates an error, so st is never set to anything (although it may have a value from some earlier code). Then you try to use that in the second call to nls(...).
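The failure mode described above (a log-scale fit that errors, leaving st unset) can be guarded against with a simple check. This is a sketch under the assumption that falling back to the plain starting values from the answer is acceptable; has_bad_log is a hypothetical name.

```r
# Data copied from the question
df <- data.frame(
  x = c(7.3006667, -0.8983333, 2.7953333),
  y = c(-0.14383333, 0.02133333, -0.07466667)
)

f <- function(x, a, b) a * exp(b * x)

# log(y) is undefined for y <= 0, so the log-scale fit is not usable here
has_bad_log <- any(df$y <= 0)

# Only attempt the log-scale fit when it is well defined;
# otherwise fall back to the starting values used in the answer
st <- if (has_bad_log) {
  c(a = 1, b = 1)
} else {
  coef(nls(log(y) ~ log(f(x, a, b)), df, start = c(a = 1, b = -1)))
}
print(st)
```

With the three data points from the question, two y values are negative, so the fallback branch is taken and st is always defined before the second fit.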
Forecasting timeseries with tslm in R
I'm still new to R and am facing a problem I can't seem to resolve. I would like to forecast my time series data. I have this year's daily numbers, y, and last year's daily numbers, which I want to use as a predictor. The numbers show weekly cycles. I tried this code (fake numbers for clarity):
x = rnorm(60,0,1)
y = rnorm(60,0 ,1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10,0,1)
y <- ts(y,frequency = 7)
fit <- tslm(y ~ trend + season + x)
fcast = forecast.lm(fit, h = 10, newdata = new_x)
I get the error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variable lengths differ (found for 'x')
In addition: Warning message:
'newdata' had 10 rows but variables found have 60 rows
Any hints on what I did wrong?
From your fit object:
Call:
lm(formula = formula, data = "y", na.action = na.exclude)
Coefficients:
(Intercept)       trend     season2     season3     season4     season5     season6     season7           x
  1.1644029   0.0009672  -1.5575562  -3.6723105  -3.1824001  -1.5658857   0.0789683   0.3053541   9.9233635
The last variable is named x. And the help for forecast.lm says newdata is an optional data.frame. You need to turn new_x into a data.frame, with x as column name.
library(forecast)
x = rnorm(60,0,1)
y = rnorm(60,0 ,1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10,0,1)
y <- ts(y,frequency = 7)
fit <- tslm(y ~ trend + season + x)
# You can directly use `forecast`, as `fit` is an lm object
# and you don't need `h`, as you provide new data.
fcast = forecast(fit, newdata = data.frame(x=new_x))
#           Point Forecast       Lo 80      Hi 80        Lo 95      Hi 95
#  9.571429     -3.1541222  -4.5886075  -1.719637  -5.37216743 -0.9360771
#  9.714286     12.5962250  11.1367496  14.055700  10.33953926 14.8529108
#  9.857143     10.5924632   9.1480030  12.036924   8.35899443 12.8259321
# 10.000000     15.9419378  14.4775444  17.406331  13.67764776 18.2062278
# 10.142857     -7.1887433  -8.6444741  -5.733013  -9.43963897 -4.9378477
# 10.285714     -9.4133170 -10.8470152  -7.979619 -11.63014523 -7.1964887
# 10.428571      2.2702132   0.8331488   3.707278   0.04818005  4.4922464
# 10.571429      0.3519401  -1.1037991   1.807679  -1.89896851  2.6028487
# 10.714286    -11.8348209 -13.2930857 -10.376556 -14.08963475 -9.5800070
# 10.857143      1.0058209  -0.4435763   2.455218  -1.23528154  3.2469233
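The same requirement (newdata must be a data.frame whose column names match the predictors) can be illustrated with base R only. The sketch below swaps in a plain lm and predict instead of tslm and forecast, so it does not need the forecast package; the simulated numbers are arbitrary.

```r
set.seed(42)

# Simulated predictor and response (arbitrary numbers, illustration only)
x <- rnorm(60)
y <- 10 * x + rnorm(60)
fit <- lm(y ~ x)

# New predictor values as a bare numeric vector
new_x <- rnorm(10)

# Wrap the vector in a data.frame with the column named like the predictor
pred <- predict(fit, newdata = data.frame(x = new_x))
print(length(pred))
```

The column name is what links the new values to the model term, so a data.frame whose column is not called x would not be matched to the predictor.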
If you had converted new_x to a data.frame, your initial code would have worked too. new_x is a numeric vector, but forecast.lm expects newdata to be a data.frame. Regards, Ganesh Bhat
The error seems to be obvious: new_x has 10 random values whereas y and x have 60. Can you update new_x to have 60 random values and verify that the error does not occur? Regards, Ganesh