I've tried to fit the following data to an ADBUDG model using the nls function in R, but the "singular gradient" error keeps appearing and I don't really know how to proceed...
nprice nlv2
[1,] 0.6666667 1.91666667
[2,] 0.7500000 1.91666667
[3,] 0.8333333 1.91666667
[4,] 0.9166667 1.44444444
[5,] 1.0000000 1.00000000
[6,] 1.0833333 0.58333333
[7,] 1.1666667 0.22222222
[8,] 1.2500000 0.08333333
[9,] 1.3333333 0.02777778
Code:
fit <- nls(f=nprice~a+b*nlv2^c/(nlv2^c+d),start=list(a=0.083,b=1.89,c=-10.95,d=0.94))
Error in nls(f = nprice ~ a + b * nlv2^c/(nlv2^c + d), start = list(a = 0.083, :
singular gradient
Package nlsr provides an updated version of nls through the function nlxb, which in most cases avoids the "singular gradient" error.
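The call below references a data frame df, which the question never defines; it can be rebuilt from the values printed above (the name df is just an assumption):
df <- data.frame(
  nprice = c(0.6666667, 0.7500000, 0.8333333, 0.9166667, 1.0000000,
             1.0833333, 1.1666667, 1.2500000, 1.3333333),
  nlv2   = c(1.91666667, 1.91666667, 1.91666667, 1.44444444, 1.00000000,
             0.58333333, 0.22222222, 0.08333333, 0.02777778)
)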
library(nlsr)
fit <- nlxb(f = nprice~a+b*nlv2^c/(nlv2^c+d),
data = df,
start = list(a=0.083,b=1.89,c=-10.95,d=0.94))
## vn:[1] "nprice" "a" "b" "nlv2" "c" "d"
## no weights
fit$coefficients
## a b c d
## -2.1207e+04 2.1208e+04 -7.4083e-01 1.6236e-05
The fitted coefficients are far from the starting values and very large in magnitude, indicating that the problem is ill-conditioned.
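As a quick sanity check (a minimal sketch in base R, assuming the df built above), you can evaluate the fitted curve at the observed nlv2 values and compare it with nprice:
cf <- fit$coefficients
pred <- cf["a"] + cf["b"] * df$nlv2^cf["c"] / (df$nlv2^cf["c"] + cf["d"])
plot(df$nlv2, df$nprice)        # observed data
o <- order(df$nlv2)
lines(df$nlv2[o], pred[o])      # fitted ADBUDG curve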
I'm having issues evaluating a ranger model. In both attempts below, I am unable to subset the data (I want the first column of rf.trnprob).
rangerModel = ranger(outcome~., data=traindata, num.trees=200, probability=TRUE)
rf.trnprob = predict(rangerModel, traindata, type='prob')
trainscore <- subset(traindata, select=c("outcome"))
trainscore$score <- rf.trnprob[, 1]
Error:
incorrect number of dimensions
table(pred = rf.trnprob, true=traindata$outcome)
Error:
all arguments must have the same length
It seems the predict function is being called incorrectly: because the forest was trained with probability=TRUE, predict() already returns the class probabilities, stored in the $predictions element of its result, so no type='prob' argument is needed. Using an example dataset:
library(ranger)
traindata =iris
traindata$Species = factor(as.numeric(traindata$Species=="versicolor"))
rangerModel = ranger(Species~.,data=traindata,probability=TRUE)
rf.trnprob = predict(rangerModel, traindata)
The probabilities are stored in the predictions element, one column for each class:
head(rf.trnprob$predictions)
0 1
[1,] 1.0000000 0.000000000
[2,] 0.9971786 0.002821429
[3,] 1.0000000 0.000000000
[4,] 1.0000000 0.000000000
[5,] 1.0000000 0.000000000
[6,] 1.0000000 0.000000000
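To get the first column, as attempted in the question, subset the predictions matrix rather than the prediction object itself (a small sketch reusing this example's names, where the outcome column is Species):
trainscore <- subset(traindata, select = c("Species"))
trainscore$score <- rf.trnprob$predictions[, 1]   # probability of the first class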
But it seems you want a confusion matrix, so you can get the predicted classes by doing:
pred = levels(traindata$Species)[max.col(rf.trnprob$predictions)]
Then:
table(pred,traindata$Species)
pred 0 1
0 100 2
1 0 48
I have a data.frame containing a vector of numeric values (prcp_log).
waterdate PRCP prcp_log
<date> <dbl> <dbl>
1 2007-10-01 0 0
2 2007-10-02 0.02 0.0198
3 2007-10-03 0.31 0.270
4 2007-10-04 1.8 1.03
5 2007-10-05 0.03 0.0296
6 2007-10-06 0.19 0.174
I then pass this data through the Christiano-Fitzgerald band-pass filter, using the following command from the mFilter package.
library(mFilter)
US1ORLA0076_cffilter <- cffilter(US1ORLA0076$prcp_log,pl=180,pu=365,root=FALSE,drift=FALSE,
type=c("asymmetric"),
nfix=NULL,theta=1)
This creates an S3 object containing, among other things, a vector of "trend" values and a vector of "cycle" values, like so:
head(US1ORLA0076_cffilter$trend)
[,1]
[1,] 0.05439408
[2,] 0.07275321
[3,] 0.32150292
[4,] 1.07958965
[5,] 0.07799329
[6,] 0.22082246
head(US1ORLA0076_cffilter$cycle)
[,1]
[1,] -0.05439408
[2,] -0.05295058
[3,] -0.05147578
[4,] -0.04997023
[5,] -0.04843449
[6,] -0.04686915
Plotted:
plot(US1ORLA0076_cffilter)
I then apply the following mathematical operation in an attempt to remove the trend and seasonal components from the original numeric vector:
US1ORLA0076$decomp <- ((US1ORLA0076$prcp_log - US1ORLA0076_cffilter$trend) - US1ORLA0076_cffilter$cycle)
This creates an output whose values include unexpected elements such as dashes and letters.
head(US1ORLA0076$decomp)
[,1]
[1,] 0.000000e+00
[2,] 0.000000e+00
[3,] 1.387779e-17
[4,] -2.775558e-17
[5,] 0.000000e+00
[6,] 6.938894e-18
What has happened here? What do these additional characters signify? How can I perform this mathematical operation and achieve the desired output of simply $prcp_log minus both the $trend and $cycle values?
I am happy to provide any additional info that will help; just ask.
I'm using the R package geigen to solve the generalized eigenvalue problem A*v = lambda*B*v.
This is the code:
geigen(Gamma_chi_0, diag(diag(Gamma_xi_0)),symmetric=TRUE, only.values=FALSE) #GENERALIZED EIGENVALUE PROBLEM
Where:
Gamma_chi_0
[,1] [,2] [,3] [,4] [,5]
[1,] 1.02346 -0.50204 0.41122 -0.73066 0.00072
[2,] -0.50204 0.96712 -0.33526 0.51774 -0.37708
[3,] 0.41122 -0.33526 1.05086 0.09798 0.09274
[4,] -0.73066 0.51774 0.09798 0.99780 -0.51596
[5,] 0.00072 -0.37708 0.09274 -0.51596 1.03354
and
diag(diag(Gamma_xi_0))
[,1] [,2] [,3] [,4] [,5]
[1,] -0.0234 0.0000 0.0000 0.0000 0.0000
[2,] 0.0000 0.0329 0.0000 0.0000 0.0000
[3,] 0.0000 0.0000 -0.0509 0.0000 0.0000
[4,] 0.0000 0.0000 0.0000 0.0022 0.0000
[5,] 0.0000 0.0000 0.0000 0.0000 -0.0335
But I get this error:
> geigen(Gamma_chi_0, diag(diag(Gamma_xi_0)), only.values=FALSE)
Error in .sygv_Lapackerror(z$info, n) :
Leading minor of order 1 of B is not positive definite
In matlab, using the same two matrices, it works:
opt.disp = 0;
[P, D] = eigs(Gamma_chi_0, diag(diag(Gamma_xi_0)),r,'LM',opt);
% compute first r generalized eigenvectors and eigenvalues
For example, I get the following eigenvalue matrix:
D =
427.8208 0
0 -38.6419
Of course, in Matlab I just computed the first r = 2; in R I want all the eigenvalues and eigenvectors (n = 5), and then I subset the first 2.
Can someone help me to solve this?
geigen has detected a symmetric matrix for Gamma_chi_0 and has therefore used the symmetric solver, which is where LAPACK encounters an error and cannot continue. Specify symmetric=FALSE in the call of geigen; the manual describes what the argument symmetric does. Do this (with B = diag(diag(Gamma_xi_0))):
geigen(Gamma_chi_0, B, symmetric=FALSE, only.values=FALSE)
The result is (on my computer)
$values
[1] 4.312749e+02 -3.869203e+01 -2.328465e+01 1.706288e-05 1.840783e+01
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.067535068 1.0000000 0.2249715 -0.89744514 0.05194799
[2,] -0.035746438 0.1094176 0.3273440 0.03714518 1.00000000
[3,] 0.005083806 0.3782606 0.8588086 0.50306323 0.17858115
[4,] -1.000000000 0.2986963 0.4067701 -1.00000000 -0.48314183
[5,] -0.034226056 -0.6075727 1.0000000 -0.53017872 0.06738515
$alpha
[1] 1.365959e+00 -1.152686e+00 -9.202769e-01 4.352770e-07 5.588102e-01
$beta
[1] 0.003167259 0.029791306 0.039522893 0.025510167 0.030357208
This is quite close to what you show for Matlab. I know nothing about Matlab so I cannot help you with that.
Addendum
Matlab seems to use methods similar to geigen's, depending on whether the matrices are determined to be symmetric or not. Your matrix Gamma_chi_0 may not be exactly symmetric. See the documentation for the 'algorithm' argument of eig.
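You can check this in R with isSymmetric from base R (a quick sketch; the rounded values printed above are exactly symmetric, but your unrounded matrix may not be):
isSymmetric(Gamma_chi_0)
## TRUE for the values as printed above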
More addendum
In actual fact, your matrix B is not positive definite: try the function chol of base R on it and you'll get the same error message. In this case you have to force geigen to use the general algorithm.
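A minimal check (the exact wording of the error may vary by R version):
B <- diag(diag(Gamma_xi_0))
chol(B)
## Error: the leading minor of order 1 is not positive definite
geigen(Gamma_chi_0, B, symmetric = FALSE, only.values = FALSE)   # general algorithm works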
When using the knn() function in the class package in R, there is an argument called "prob". If I set it to TRUE, I get the probability of that particular observation being classified to whatever class it is classified as.
I have a dataset where the classifier has 9 levels. Is there any way in which I can get the probability of a particular observation for all the 9 levels?
As far as I know, the knn() function in class only returns the highest probability.
However, you can use the knnflex package, which allows you to return all probability levels using knn.probability (see here, pages 9-10).
This question still requires a proper answer.
If only the probability of the most probable class is needed, then the class package is still suitable. The key is to set the argument prob to TRUE and k higher than the default 1 - class::knn(train, test, cl, k = 5, prob = TRUE). k has to be higher than the default 1, otherwise you always get a probability of 100% for each observation.
However, if you want the probabilities for each of the classes, I recommend the caret::knn3 function together with its predict method.
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
# class package
# take into account k higher than 1 and prob equal TRUE
model <- class::knn(train, test, cl, k = 5, prob = TRUE)
tail(attributes(model)$prob, 10)
#> [1] 1.0 1.0 1.0 1.0 1.0 1.0 0.8 1.0 1.0 0.8
# caret package
model2 <- predict(caret::knn3(train, cl, k = 3), test)
tail(model2, 10)
#> c s v
#> [66,] 0.0000000 0 1.0000000
#> [67,] 0.0000000 0 1.0000000
#> [68,] 0.0000000 0 1.0000000
#> [69,] 0.0000000 0 1.0000000
#> [70,] 0.0000000 0 1.0000000
#> [71,] 0.0000000 0 1.0000000
#> [72,] 0.3333333 0 0.6666667
#> [73,] 0.0000000 0 1.0000000
#> [74,] 0.0000000 0 1.0000000
#> [75,] 0.3333333 0 0.6666667
Created on 2021-07-20 by the reprex package (v2.0.0)
I know there is an answer already marked here, but this is possible without using another function or package.
What you can do instead is build your knn model knn_model and check out its attributes for the "prob" output, like so:
attributes(knn_model)$prob
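A minimal, self-contained sketch (reusing the train/test/cl objects from the iris3 example above; note this still returns only the vote share of the winning class, not all 9 levels):
library(class)
knn_model <- knn(train, test, cl, k = 5, prob = TRUE)
head(attributes(knn_model)$prob)   # proportion of votes for the predicted class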
I read about poly() in R, and I think it should produce orthogonal polynomials, so that when we use it in a regression model like lm(y~poly(x,2)) the predictors are uncorrelated. However:
poly(1:3,2)
[1,] -7.071068e-01 0.4082483
[2,] -7.850462e-17 -0.8164966
[3,] 7.071068e-01 0.4082483
I think this is probably a stupid question, but what I don't understand is why the column vectors of poly(1:3,2) do not seem to have an inner product of zero. That is, -7.07*0.40 - 7.85*(-0.82) + 7.07*0.41 != 0, so how are these uncorrelated predictors for regression?
Your main problem is that you're missing the meaning of the e or "E notation": as commented by @MamounBenghezal above, fffeggg is shorthand for fff * 10^(ggg).
I get slightly different answers than you do (the difference is numerically trivial) because I'm running this on a different platform:
pp <- poly(1:3,2)
## 1 2
## [1,] -7.071068e-01 0.4082483
## [2,] 4.350720e-18 -0.8164966
## [3,] 7.071068e-01 0.4082483
Here it is in an easier format to read:
print(zapsmall(matrix(c(pp),3,2)),digits=3)
## [,1] [,2]
## [1,] -0.707 0.408
## [2,] 0.000 -0.816
## [3,] 0.707 0.408
sum(pp[,1]*pp[,2]) ## 5.196039e-17, effectively zero
Or to use your example, with the correct placement of decimal points:
-0.707*0.408-(7.85e-17)*(-0.82)+(0.707)*0.408
## [1] 5.551115e-17
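Equivalently (a small base-R check; zapsmall rounds floating-point noise to zero), the cross-product of the poly matrix shows the columns are orthonormal:
zapsmall(crossprod(pp))   # t(pp) %*% pp: the 2x2 identity, so the off-diagonal inner products are zero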