New Probability Model fitting using R Code

I have developed a new probability model using a generalized mixture technique.
Now I want to fit it to a discrete data set.
But I am getting this error:
Error in seq.default(0, x) : 'to' must be of length 1
I don't understand how to handle this.
The R code is below:
# rm(list=ls(all=TRUE))
obs=rep(seq(0,6),c(260,87,32,4,1,0,0))
NBWED<-function(x,r,alpha,beta){
j=seq(0,x)
C=function(n,x){
factorial(n)/(factorial(n-x)*factorial(x))
}
C(x+r-1,x)*sum(C(x,j)*(-1)^j*(alpha^2/(alpha+beta))*((r+j+alpha)+beta)/(r+j+alpha)^2)
}
library(MASS)
fit09=fitdistr(x = obs,densfun = NBWED,start = list(r=1,alpha=0.5,beta=9.4),lower = list(a = 0.1,0.001,0.001),upper=c(Inf,Inf,Inf))
fit09

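seq(0,n) creates a vector from 0 to n, and n must be a single number. fitdistr passes the whole data vector to the density function as x, so seq(0,x) receives a vector for 'to' and throws this error.
Try seq(0,5) and then seq(0,c(1,2)) to see the difference. The density function has to work on each element of x separately.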
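A minimal sketch of one way to make the density accept a vector, assuming the formula in the question is the intended one. The helper name nbwed_one is hypothetical, and base R's choose() is swapped in for the hand-written C():

```r
# Hypothetical helper: the question's formula for a single value of x.
nbwed_one <- function(x, r, alpha, beta) {
  j <- seq(0, x)  # safe here because x has length 1
  choose(x + r - 1, x) *
    sum(choose(x, j) * (-1)^j * (alpha^2 / (alpha + beta)) *
          ((r + j + alpha) + beta) / (r + j + alpha)^2)
}

# Vectorised wrapper: apply the scalar version to each element of x.
NBWED <- function(x, r, alpha, beta) {
  vapply(x, nbwed_one, numeric(1), r = r, alpha = alpha, beta = beta)
}

# One value per input element, instead of an error from seq().
p <- NBWED(0:6, r = 1, alpha = 0.5, beta = 9.4)
length(p)
```

With a wrapper like this, fitdistr can call the density on the whole obs vector. Whether the optimiser then converges depends on the model and the starting values, which this sketch does not check.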

Related

Unused argument error when building a Confusion Matrix in R

I am currently trying to run a logistic regression model on my data frame.
While creating a new model frame with the actual and predicted values, I get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)){
if(log_preds[i]>0.5){
log_class[i]="satisfied"}else{log_class[i]="neutral or dissatisfied"}}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had the same problem. I typed "?confusionMatrix" and got this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As this output shows, confusionMatrix exists in more than one package, so we need to specify which package's version we want to use.
So I called it as "caret::confusionMatrix(...)" and it worked!
This is how the call can be written to get rid of the unused argument error when building a confusion matrix in R:
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)
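To check which package a bare function name currently resolves to, something like the following may help. It is only a sketch: which_package is a hypothetical helper, and sd from the stats package stands in for confusionMatrix so that the example runs without caret installed.

```r
# Hypothetical helper: name of the namespace that a bare function name
# resolves to in the current session.
which_package <- function(fn_name) {
  environmentName(environment(get(fn_name, mode = "function")))
}

which_package("sd")

# find() lists every attached package that defines the name;
# the first entry is the one a bare call would use.
find("sd")
```

If the first entry is not the package you expect, prefix the call with the package name, as in caret::confusionMatrix(...).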

R implementation of kohonen SOMs: prediction error due to data type.

I have been trying to run an example code for supervised kohonen SOMs from https://clarkdatalabs.github.io/soms/SOM_NBA . When I tried to predict test set data I got the following error:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing)
Error in FUN(X[[i]], ...) :
Data type not allowed: should be a matrix or a factor
I tried newdata = as.matrix(NBA.testing) but it did not help. Neither did as.factor().
Why does it happen? And how can I fix that?
Pass one more argument, "whatmap", to the predict function and set its value to 1.
The code would be:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
To verify the prediction result, you can check using:
table(NBA$Pos[-training_indices], pos.prediction$predictions[[2]], useNA = 'always')
The result may differ from the tutorial's, since the tutorial does not call set.seed().
I suggest calling set.seed() with an arbitrary number somewhere before the training phase.
For simplicity, put it once at the top of your script, e.g.
set.seed(12345)
This makes the random steps of your model reproducible the next time you re-run your script.
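A small base R illustration of the effect, using sample() as a stand-in for the random initialisation in SOM training (the variable names are made up for this sketch):

```r
# Same seed, same random draws.
set.seed(12345)
first_run <- sample(100, 5)

set.seed(12345)
second_run <- sample(100, 5)

identical(first_run, second_run)  # TRUE
```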
Hope that will help.

Fitting Step functions

AIM: Find a suitable fit, using step functions, that uses age to describe wage in the Wage dataset from the ISLR library.
PLAN:
To find a suitable fit, I'll try multiple fits with different numbers of cut points. I'll use the glm() function for fitting. To check which fit is best, I'll use the cv.glm() function (from the boot library) to cross-validate each fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
However, the code given below runs without error. (It can be found here)
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
It might be that cut() and glm() do not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically, the working version calls cut(), saves its result in a variable, and then uses that variable in glm(). But we can't put the cut() call inside the glm() formula, and only when the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.
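One possible explanation, sketched on toy data rather than on Wage: cv.glm() refits the model on subsets of the rows, but cut(Wage$age, i) always refers to the full column, so the response and the predictor end up with different lengths. The data frame d and its columns are made up for illustration.

```r
set.seed(1)
d <- data.frame(x = 1:20, y = rnorm(20))
fold <- d[seq(1, 20, by = 2), ]  # a 10-row subset, like one CV training fold

# y comes from the 10-row subset, but d$x still has 20 values.
bad <- tryCatch(
  glm(y ~ cut(d$x, 3), data = fold),
  error = function(e) conditionMessage(e)
)
bad

# Storing the cut variable in the data frame first keeps the rows aligned.
d$x_cut <- cut(d$x, 3)
fold <- d[seq(1, 20, by = 2), ]
ok <- glm(y ~ x_cut, data = fold)
```

The single call glm(wage~cut(age,4),data=Wage) works because age is looked up inside whatever data frame is passed, so it always has the same length as wage.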

Kaggle Digit Recognizer Using SVM (e1071): Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty

I am trying to solve the Digit Recognizer competition on Kaggle and I ran into this error.
I loaded the training data and scaled its values by dividing by the maximum pixel value, which is 255. After that, I tried to build my model.
Here is my code:
Given_Training_data <- get(load("Given_Training_data.RData"))
Given_Testing_data <- get(load("Given_Testing_data.RData"))
Maximum_Pixel_value = max(Given_Training_data)
Tot_Col_Train_data = ncol(Given_Training_data)
training_data_adjusted <- Given_Training_data[, 2:ncol(Given_Training_data)]/Maximum_Pixel_value
testing_data_adjusted <- Given_Testing_data[, 2:ncol(Given_Testing_data)]/Maximum_Pixel_value
label_training_data <- Given_Training_data$label
final_training_data <- cbind(label_training_data, training_data_adjusted)
smp_size <- floor(0.75 * nrow(final_training_data))
set.seed(100)
training_ind <- sample(seq_len(nrow(final_training_data)), size = smp_size)
training_data1 <- final_training_data[training_ind, ]
train_no_label1 <- as.data.frame(training_data1[,-1])
train_label1 <-as.data.frame(training_data1[,1])
svm_model1 <- svm(train_label1,train_no_label1) #This line is throwing an error
Error : Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
Please share your thoughts. I am not looking for a complete answer, but rather for an idea that guides me in the right direction, as I am still learning.
Thanks.
Update to the question :
trainlabel1 <- train_label1[sapply(train_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
trainnolabel1 <- train_no_label1[sapply(train_no_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
svm_model2 <- svm(trainlabel1,trainnolabel1,scale = F)
It didn't help either.
Read the manual (https://cran.r-project.org/web/packages/e1071/e1071.pdf):
svm(x, y = NULL, scale = TRUE, type = NULL, ...)
...
Arguments:
...
x a data matrix, a vector, or a sparse matrix (object of class
Matrix provided by the Matrix package, or of class matrix.csr
provided by the SparseM package,
or of class simple_triplet_matrix provided by the slam package).
y a response vector with one label for each row/component of x.
Can be either a factor (for classification tasks) or a numeric vector
(for regression).
Therefore, the main problems are that your call to svm swaps the data matrix and the response vector, and that you pass the response vector as integers, which results in a regression model. Furthermore, you pass the response as a single-column data frame rather than as a vector or factor. Hence, if you change the call to:
svm_model1 <- svm(train_no_label1, as.factor(train_label1[, 1]))
it will work as expected. Note that training will take some minutes to run.
You may also want to remove features that are constant (where the values in the respective column of the training data matrix are all identical) in the training data, since these will not influence the classification.
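Dropping constant columns can be sketched in base R as follows. The toy data frame stands in for the pixel columns, and the names train, is_varying and train_reduced are made up for this example:

```r
# Toy data: column b is constant and carries no information.
train <- data.frame(a = c(0, 0.5, 1), b = c(0, 0, 0), c = c(1, 0, 1))

# TRUE for columns with more than one distinct value.
is_varying <- vapply(train, function(col) length(unique(col)) > 1, logical(1))

# Keep only the varying columns; drop = FALSE keeps a data frame
# even if a single column remains.
train_reduced <- train[, is_varying, drop = FALSE]
names(train_reduced)
```

The same logical vector should be applied to the test data so that both sets keep identical columns.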
I don't think you need to scale the data manually, since svm scales it itself, unlike most neural network packages.
You can also use the formula interface of svm instead of the matrix and vector:
svm(result~.,data = your_training_set)
In your case, make sure the response is a factor, because you want a label like 1, 2, 3 and not a regression output like 1.5467.
I can debug it if you can share the data: Given_Training_data.RData

For loops regression in R

I'm fitting a GARCH model to the residuals of an ARIMA model, and trying ARCH(p) for p from 1 to 10 to compare the fits. Here is my code. The for loop returns errors, but I cannot figure out why. Could anyone give some tips?
For the single value p=1 the code is below, and it works without problems.
fitone<- garchFit(~garch(1,0),data=logprice)
coef(fitone)
summary(fitone)
And for the for loop, my code is:
for (n in 1:10) {
fit [[n]]<- garchFit(~garch(n,0),data=logprice)
coef(fit[[n]])
summary(fit[[n]])
}
Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) :
Formula and data units do not match.
I have never written a loop before. Can someone help me with the code?
The problem is that the variables in a formula are generally evaluated in the context of the data= parameter, but your n variable isn't coming from logprice; it's coming from the global environment. You will need to create the formula dynamically. Here's one way to run all the models, using lapply rather than a for loop:
library(fGarch)
#sample data
x.vec = as.vector(garchSim(garchSpec(rseed = 1985), n = 200)[,1])
fits <- lapply(1:10, function(n) {
garchFit(bquote(~garch(.(n),0)), data = x.vec, trace = FALSE)
})
and then we can get the coefs with
lapply(fits, coef)
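What bquote does here can be seen in base R alone, without fGarch. In this sketch garch is just an unevaluated symbol inside the formula, and the names literal and formulas are made up for the example:

```r
# Written literally, the formula keeps the symbol n, not its value.
literal <- ~garch(n, 0)
is.name(literal[[2]][[2]])   # TRUE: still the symbol n

# bquote() substitutes the current value of n wherever .(n) appears.
formulas <- lapply(1:10, function(n) bquote(~garch(.(n), 0)))
formulas[[3]][[2]][[2]]      # 3: the value has been inserted
```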
