R predict() returns only fitted values for nls() model

First of all, I would like to mention that I am just a beginner in R.
I have encountered a problem when trying to predict data from a model generated by nls(). I fitted an exponential decay function to my data and everything seems to be fine, e.g. I got a decent regression line. However, when I use predict() on a new data set, it returns only the fitted values.
My code is:
df = data.frame(Time = c(0,5,15,30), Value = c(1, 0.38484677,0.18679383, 0.06732328))
model <- nls(Value~a*exp(-b*Time), start=list(a=1, b=0.15), data = df)
plot(Value~Time, data = df)
lines(df$Time, predict(model))
newtime <- data.frame(Time = seq(1,20, by = 1))
pr = predict(model, newdata = newtime$Time)
pr
[1] 0.979457389 0.450112312 0.095058637 0.009225664
Could someone please explain what I am doing wrong? I know there are already some answers to this problem here, but none of them helped me.
Thank you in advance for your help!

The newdata parameter should be a data.frame with the same column names as your input data. When you write newdata = newtime$Time, you are actually passing in newtime$Time, which is no longer a data.frame since it has 'dropped' down to a vector. You can just pass in newtime like so:
pr = predict(model, newdata = newtime)
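With that fix, predict() returns one value per row of newtime; a quick check, reusing the objects from the question:
length(pr) # 20 predictions, one per new Time value
lines(newtime$Time, pr, col = "red") # overlay them on the existing plot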

Related

Make a model matrix if missing the response variable and where matrix multiplication recreates the predict function

I want to create a model matrix for a test dataset that is missing the response variable, such that matrix multiplication perfectly replicates the results of calling predict() on the model. See the code below for an example.
I have code which can do this (again, see below for example), but it requires that I create a placeholder response variable in my test data. This doesn't seem very clean, and I'm wondering if there's a way to get the code to work without this workaround.
# Make data, fit model
set.seed(1); df_train = data.frame(y = rnorm(10), x = rnorm(10), z = rnorm(10))
set.seed(2); df_test = data.frame(x = rnorm(10), z = rnorm(10))
fit = lm(y ~ poly(x) + poly(z), data = df_train)
# Make model matrices. Get an error for the test data as 'y' isn't found
mm_train = model.matrix(terms(fit), df_train)
mm_test = model.matrix(terms(fit), df_test) #"Error in eval(predvars, data, env) : object 'y' not found"
# Make fake y variable for test data then build model matrix. I want to know if there's a less hacky way to do this
df_test$y = 1
mm_test = model.matrix(terms(fit), df_test)
# Check that predict and matrix multiplication give identical results on the test data.
# NB this is not the case if constructing the model matrix via (e.g.)
# mm_test = model.matrix(formula(fit)[-2], df_test), for the reason outlined here:
# https://stackoverflow.com/questions/59462820/why-are-predict-lm-and-matrix-multiplication-giving-different-predictions
preds_1 = round(predict(fit, df_test), 5)
preds_2 = round(mm_test %*% fit$coefficients, 5)
all(preds_1 == preds_2) #TRUE
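For what it's worth, a cleaner route may be stats::delete.response(), which strips the response (including its entry in the predvars attribute) from the terms object, so model.matrix() no longer looks for y. A minimal sketch, reusing the objects above:
df_test$y <- NULL # drop the placeholder again
mm_test_2 = model.matrix(delete.response(terms(fit)), df_test) # no fake y needed
all(round(mm_test_2 %*% fit$coefficients, 5) == preds_1) # TRUE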

Caret train function for multiple data frames as a function

There was a similar question to mine over 6 years ago and it was never solved (R -- Can I apply the train function in caret to a list of data frames?).
This is why I am bringing up this topic again.
I'm writing my own functions for my big R project at the moment, and I'm wondering whether there is a way to wrap the model-training function train() of the caret package for different data frames with different predictors.
My function should look like this:
lda_ex <- function(data, predictor){
  model <- train(predictor ~ ., data,
                 method = "lda",
                 trControl = trainControl(method = "none"),
                 preProc = c("center", "scale"))
  return(model)
}
Using it afterwards should work like this:
data_iris <- iris
predictor_iris <- "Species"
iris_res <- lda_ex(data = data_iris, predictor = predictor_iris)
Unfortunately, as far as I can tell, an R formula cannot take a variable like this as input.
Is there something I am missing?
Thank you in advance for helping me out!
Solving this would help me A LOT to keep my function file clean and would certainly save work.
By writing predictor_iris <- "Species", you are basically saving a string object in predictor_iris. Thus, when you run lda_ex, I guess you run into an error concerning the formula object in train(), since you are trying to predict a string using vectors of covariates.
Indeed, I tried the following toy example:
X = rnorm(1000)
Y = runif(1000)
predictor = "Y"
lm(predictor ~ X)
which gives an error about differences in the lengths of variables.
Let me modify your function:
lda_ex <- function(data, formula){
  model <- train(formula, data,
                 method = "lda",
                 trControl = trainControl(method = "none"),
                 preProc = c("center", "scale"))
  return(model)
}
The key difference is that now we must pass in the whole formula, instead of the predictor only. In that way, we avoid the string-related problem.
library(caret) # Remember to specify the packages needed to reproduce your example!
data_iris <- iris
formula_iris = Species ~ . # Key difference!
iris_res <- lda_ex(data = data_iris, formula = formula_iris)
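If you would rather keep passing the predictor name as a string, as in your original call, you could also build the formula inside the function with reformulate(); a sketch of that variant (lda_ex_str is just a name for this version):
lda_ex_str <- function(data, predictor){
  model <- train(reformulate(".", response = predictor), data,
                 method = "lda",
                 trControl = trainControl(method = "none"),
                 preProc = c("center", "scale"))
  return(model)
}
iris_res <- lda_ex_str(data = iris, predictor = "Species") # a string now works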

R: how to make predictions using gamboost

library(mboost)
### a simple two-dimensional example: cars data
cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                    control = boost_control(mstop = 50))
set.seed(1)
cars_new <- cars + rnorm(nrow(cars))
predict(cars.gb, newdata = cars_new$speed)
Error in check_newdata(newdata, blg, mf) :
  ‘newdata’ must contain all predictor variables, which were used to specify the model.
I fit a model using the example on the help(gamboost) page. I want to use this model to predict on a new dataset, cars_new, but encountered the above error. How can I fix this?
The predict function looks for a variable called speed, but when you subset with the $ sign the result is a bare vector that no longer carries the name.
So this variant of the prediction works:
predict(cars.gb, newdata = data.frame(speed = cars_new$speed))
Or keep the original name as-is:
predict(cars.gb, newdata = cars_new['speed'])
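Either way, the rule is that newdata must be a data.frame whose column names match the model terms; a third equivalent spelling uses drop = FALSE to stop the subset from collapsing to a vector:
predict(cars.gb, newdata = cars_new[, "speed", drop = FALSE])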

Why is there an error when I run ggttest?

When I run the t-test for a numeric and a dichotomous variable there is no problem and I can see the results. The problem appears when I run ggttest on the same t-test: there is an error saying that one of my variables is not found. I do not know why that happens. The aml dataset I used is from the package boot. Below you can see the code:
Screenshot of the error: https://i.stack.imgur.com/7kuaA.png
library(gginference)
time_group.test16537 = t.test(formula = time ~ group,
                              data = aml,
                              alternative = "two.sided",
                              paired = FALSE,
                              var.equal = FALSE,
                              conf.level = 0.95)
time_group.test16537
ggttest(time_group.test16537,
        colaccept = "lightsteelblue1",
        colreject = "gray84",
        colstat = "navyblue")
The problem comes with these lines of code in ggttest:
datnames <- strsplit(t$data.name, splitter)
len1 <- length(eval(parse(text = datnames[[1]][1])))
len2 <- length(eval(parse(text = datnames[[1]][2])))
It tries to find the lengths of group and time, but it doesn't know that they came from a data frame. Pretty bad bug...
For your situation, since you presumably have fewer than 30 observations in each group and it plots a t-distribution, do:
library(gginference)
library(boot)
gginference:::normt(t.test(time ~ group, data = aml),
                    colaccept = "lightsteelblue1", colreject = "grey84",
                    colstat = "navyblue")
t.test() doesn't store your data in its output, so there is no way to extract the data from the list that t.test returns.
The only way to use a formula is to reference the columns with $, so that the recorded variable names can be evaluated on their own:
library(gginference)
t_test <- t.test(questionnaire$pulse ~ questionnaire$gender)
ggttest(t_test)
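To see why the formula-plus-data variant fails while this one works, note that data.name is the only trace an htest object keeps of its inputs; a small illustration with aml from boot:
library(boot)
tt <- t.test(time ~ group, data = aml)
tt$data.name # "time by group" -- just a label, not the data itself
# ggttest() then runs eval(parse(text = "group")), which fails because
# group only exists inside the aml data frame.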
Original answer here: How to extract the dataset from an "htest" object when using formula in r

R object is not a matrix

I am new to R and trying to save my svm model in R and have read the documentation but still do not understand what is wrong.
I am getting the error "object is not a matrix" which would seem to mean that my data is not a matrix, but it is... so something is missing.
My data is defined as:
data = read.table("data.csv")
trainSet = as.data.frame(data[,1:(ncol(data)-1)])
where the last column is my label.
I am trying to define my model as:
svm.model <- svm(type ~ ., data = trainSet, type = 'C-classification', kernel = 'polynomial', scale = FALSE)
This seems like it should be correct but I am having trouble finding other examples.
Here is my code so far:
# load libraries
require(e1071)
require(pracma)
require(kernlab)
options(warn = -1)
# load dataset
SVMtimes = 1
KERNEL = "polynomial"
DEGREE = 2
data = read.table("head.csv")
results10foldAll = c()
# Cross-fold for training and validation datasets
for(timesRun in 1:SVMtimes) {
  cat("Running SVM = ", timesRun, " result = ")
  trainSet = as.data.frame(data[, 1:(ncol(data)-1)])
  trainClasses = as.factor(data[, ncol(data)])
  model = svm(trainSet, trainClasses, type = "C-classification",
              kernel = KERNEL, degree = DEGREE, coef0 = 1, cost = 1,
              cachesize = 10000, cross = 10)
  accAll = model$accuracies
  cat(mean(accAll), "/", sd(accAll), "\n")
  results10foldAll = rbind(results10foldAll, c(mean(accAll), sd(accAll)))
}
# create model
svm.model <- svm(type ~ ., data = trainSet, type = 'C-classification', kernel = 'polynomial', scale = FALSE)
An example of one of my samples would be:
10.135338 7.214543 5.758917 6.361316 0.000000 18.455875 14.082668 31
Here, trainSet is a data frame, but the svm() call expects data to be a matrix (you are assigning trainSet to data). Hence, set data = as.matrix(trainSet). This should work fine.
Indeed, as pointed out by @user5196900, you need a matrix to run svm(). However, beware that a matrix object means all columns have the same datatype: all numeric or all categorical/factors. If this is true for your data, as.matrix() may be fine.
In practice, more often than not, people want model.matrix() or sparse.model.matrix() (from the package Matrix), which creates dummy columns for categorical variables while keeping a single column for each numerical variable. But it is indeed a matrix.
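A small sketch of the difference on a toy data frame with one numeric and one factor column (column names made up for illustration):
toy <- data.frame(num = c(1.5, 2.5, 3.5),
                  cat = factor(c("a", "b", "a")))
as.matrix(toy) # coerces everything to character -- rarely what you want
model.matrix(~ num + cat, toy) # keeps num numeric, adds dummy column 'catb'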
