How to use predict from a model stored in a list in R? - r

I have a dataframe dfab that contains 2 columns that I used as argument to generate a series of linear models as following:
models = list()
for (i in 1:10){
models[[i]] = lm(fc_ab10 ~ (poly(nUs_ab, i)), data = dfab)
}
dfab has 32 observations and I want to predict fc_ab10 for only 1 value.
I thought of doing so:
newdf = data.frame(newdf = nUs_ab)
newdf[] = 0
newdf[1,1] = 56
prediction = predict(models[[1]], newdata = newdf)
First I tried writing newdf as a dataframe with only one position, but since there are 32 in the dataset on which the model was built, I thought I had to provide at least 32 points as well. I don't think this is necessary though.
Every time I run that piece of code I am given the following error:
Error: variable 'poly(nUs_ab, i) was fitted with type “nmatrix.1” but type “numeric” was supplied.
In addition: Warning message:
In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
longer object length is not a multiple of shorter object length
I thought all I need to use predict was a LM model, predictors (the number 56) given in a column-named dataframe. Obviously, I am mistaken.
How can I fix this issue?
Thanks.

newdf should be a data.frame with column name nUs_ab, otherwise R won't be able to know which column to operate upon (i.e., generate the prediction design matrix). So the following code should work
newdf = data.frame(nUs_ab = 56)
prediction = predict(models[[1]], newdata = newdf)

Related

How can I include both my categorical and numeric predictors in my elastic net model? r

As a note beforehand, I think I should mention that I am working with highly sensitive medical data that is protected by HIPAA. I cannot share real data with dput- it would be illegal to do so. That is why I made a fake dataset and explained my processes to help reproduce the error.
I have been trying to estimate an elastic net model in r using glmnet. However, I keep getting an error. I am not sure what is causing it. The error happens when I go to train the data. It sounds like it has something to do with the data type and matrix.
I have provided a sample dataset. Then I set the outcomes and certain predictors to be factors. After setting certain variables to be factors, I label them. Next, I create an object with the column names of the predictors I want to use. That object is pred.names.min. Then I partition the data into the training and test data frames. 65% in the training, 35% in the test. With the train control function, I specify a few things I want to have happen with the model- random paraments for lambda and alpha, as well as the leave one out method. I also specify that it is a classification model (categorical outcome). In the last step, I specify the training model. I write my code to tell it to use all of the predictor variables in the pred.names.min object for the trainingset data frame.
library(dplyr)
library(tidyverse)
library(glmnet),0,1,0
library(caret)
#creating sample dataset
df<-data.frame("BMIfactor"=c(1,2,3,2,3,1,2,1,3,2,1,3,1,1,3,2,3,2,1,2,1,3),
"age"=c(0,4,8,1,2,7,4,9,9,2,2,1,8,6,1,2,9,2,2,9,2,1),
"L_TartaricacidArea"=c(0,1,1,0,1,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,1,1),
"Hydroxymethyl_5_furancarboxylicacidArea_2"=
c(1,1,0,1,0,0,1,0,1,1,0,1,1,0,1,1,0,1,0,1,0,1),
"Anhydro_1.5_D_glucitolArea"=
c(8,5,8,6,2,9,2,8,9,4,2,0,4,8,1,2,7,4,9,9,2,2),
"LevoglucosanArea"=
c(6,2,9,2,8,6,1,8,2,1,2,8,5,8,6,2,9,2,8,9,4,2),
"HexadecanolArea_1"=
c(4,9,2,1,2,9,2,1,6,1,2,6,2,9,2,8,6,1,8,2,1,2),
"EthanolamineArea"=
c(6,4,9,2,1,2,4,6,1,8,2,4,9,2,1,2,9,2,1,6,1,2),
"OxoglutaricacidArea_2"=
c(4,7,8,2,5,2,7,6,9,2,4,6,4,9,2,1,2,4,6,1,8,2),
"AminopentanedioicacidArea_3"=
c(2,5,5,5,2,9,7,5,9,4,4,4,7,8,2,5,2,7,6,9,2,4),
"XylitolArea"=
c(6,8,3,5,1,9,9,6,6,3,7,2,5,5,5,2,9,7,5,9,4,4),
"DL_XyloseArea"=
c(6,9,5,7,2,7,0,1,6,6,3,6,8,3,5,1,9,9,6,6,3,7),
"ErythritolArea"=
c(6,7,4,7,9,2,5,5,8,9,1,6,9,5,7,2,7,0,1,6,6,3),
"hpresponse1"=
c(1,0,1,1,0,1,1,0,0,1,0,0,1,0,1,1,1,0,1,0,0,1),
"hpresponse2"=
c(1,0,1,0,0,1,1,1,0,1,0,1,0,1,1,0,1,0,1,0,0,1))
#setting variables as factors
df$hpresponse1<-as.factor(df$hpresponse1)
df$hpresponse2<-as.factor(df$hpresponse2)
df$BMIfactor<-as.factor(df$BMIfactor)
df$L_TartaricacidArea<- as.factor(df$L_TartaricacidArea)
df$Hydroxymethyl_5_furancarboxylicacidArea_2<-
as.factor(df$Hydroxymethyl_5_furancarboxylicacidArea_2)
#labeling factor levels
df$hpresponse1 <- factor(df$hpresponse1, labels = c("group1.2", "group3.4"))
df$hpresponse2 <- factor(df$hpresponse2, labels = c("group1.2.3", "group4"))
df$L_TartaricacidArea <- factor(df$L_TartaricacidArea, labels =c ("No",
"Yes"))
df$Hydroxymethyl_5_furancarboxylicacidArea_2 <-
factor(df$Hydroxymethyl_5_furancarboxylicacidArea_2, labels =c ("No",
"Yes"))
df$BMIfactor <- factor(df$BMIfactor, labels = c("<40", ">=40and<50",
">=50"))
#creating list of predictor names
pred.start.min <- which(colnames(df) == "BMIfactor"); pred.start.min
pred.stop.min <- which(colnames(df) == "ErythritolArea"); pred.stop.min
pred.names.min <- colnames(df)[pred.start.min:pred.stop.min]
#partition data into training and test (65%/35%)
set.seed(2)
n=floor(nrow(df)*0.65)
train_ind=sample(seq_len(nrow(df)), size = n)
trainingset=df[train_ind,]
testingset=df[-train_ind,]
#specifying that I want to use the leave one out cross-
#validation method and
use "random" as search for elasticnet
tcontrol <- trainControl(method = "LOOCV",
search="random",
classProbs = TRUE)
#training model
elastic_model1 <- train(as.matrix(trainingset[,
pred.names.min]),
trainingset$hpresponse1,
data = trainingset,
method = "glmnet",
trControl = tcontrol)
After I run the last chunk of code, I end up with this error:
Error in { :
task 1 failed - "error in evaluating the argument 'x' in selecting a
method for function 'as.matrix': object of invalid type "character" in
'matrix_as_dense()'"
In addition: There were 50 or more warnings (use warnings() to see the first
50)
I tried removing the "as.matrix" arguemtent:
elastic_model1 <- train((trainingset[, pred.names.min]),
trainingset$hpresponse1,
data = trainingset,
method = "glmnet",
trControl = tcontrol)
It still produces a similar error.
Error in { :
task 1 failed - "error in evaluating the argument 'x' in selecting a method
for function 'as.matrix': object of invalid type "character" in
'matrix_as_dense()'"
In addition: There were 50 or more warnings (use warnings() to see the first
50)
When I tried to make none of the predictors factors (but keep outcome as factor), this is the error I get:
Error: At least one of the class levels is not a valid R variable name; This
will cause errors when class probabilities are generated because the
variables names will be converted to X0, X1 . Please use factor levels that
can be used as valid R variable names (see ?make.names for help).
How can I fix this? How can I use my predictors (both the numeric and categorical ones) without producing an error?
glmnet does not handle factors well. The recommendation currently is to dummy code and re-code to numeric where possible:
Using LASSO in R with categorical variables

how to use data matrix in predict function in R

Context
I have data frame with multiple columns and i have to do regression between alternate columns (like between 1&2, 3&4) and predict using a test data set and then find residuals. I do not want to reference each column using column name (like target.data$nifty), hence I am using data matrix.
Problem
When i predict using a data matrix, I am getting the following error:
number of items to replace is not a multiple of replacement length
I know my training data set has 255 rows and test data set has 50 rows but it should be able to predict on test data set and give me 50 predicted values, but it is not.
b is a matrix having training data set of 255 rows and 8 columns and t is a matrix having test data set of 50 rows and 8 columns.
I tried using matrix directly as newdata in predict but it was giving following error:
Error in eval(predvars, data, env) : numeric 'envir' arg not of
length one
... so i converted it into a data frame. Please suggest how can I use matrix inside predict.
Here's the code i am using:
b<-as.matrix(target.data)
t<-as.matrix(target.test)
t_pred <- matrix(,nrow = 50,ncol = 8)
t_res <- matrix(,nrow = 50,ncol = 8)
for(i in 1:6) {
if (!i %% 2) {
t_model <- lm(b[,i] ~ b[,i])
t_pred[,i] <- predict(t_model, newdata=data.frame(t[,i]), type = 'response')
t_res[,i] <- t_pred[,i] - t[,i]
}
}

Getting Error in R when Performing predict function

I am getting the following error whenever I try to predict data against my linear model.
Warning message: 'newdata' had 101 rows but variables found have 296 rows
The following is the code snippet
trainingFrame = data.frame(weeksTrainingConv,bugsTraining)
validateFrame = data.frame(weekTestConv,bugsTest)
model <- lm(totWeekConv ~ totBugs,trainingFrame)
myPrediction <- predict(model,validateFrame)
The calculations for the dataframe and their components are written in a separate sheet. Here is the snippet. I have commented out the blocks according to the nature of the code. The first block represents my training dataset, the second is the dataset which I will use to test my model. Finally the last block is the total dataset.
library(lubridate)
#training DataSet
weeksTraining = as.Date(c("2003-12-28","2004-01-04","2004-01-11","2004-01-18","2004-01-25","2004-02-01","2004-02-08","2004-02-15","2004-02-22","2004-02-29","2004-03-07","2004-03-14","2004-03-21","2004-03-28","2004-04-04","2004-04-11","2004-04-18","2004-04-25","2004-05-02","2004-05-09","2004-05-16","2004-05-23","2004-05-30","2004-06-06","2004-06-13","2004-06-20","2004-06-27","2004-07-04","2004-07-11","2004-07-18","2004-07-25","2004-08-01","2004-08-08","2004-08-15","2004-08-22","2004-08-29","2004-09-05","2004-09-12","2004-09-19","2004-09-26","2004-10-03","2004-10-10","2004-10-17","2004-10-24","2004-10-31","2004-11-07","2004-11-14","2004-11-21","2004-11-28","2004-12-05","2004-12-12","2004-12-19","2004-12-26","2005-01-02","2005-01-09","2005-01-16","2005-01-23","2005-01-30","2005-02-06","2005-02-13","2005-02-20","2005-02-27","2005-03-06","2005-03-13","2005-03-20","2005-03-27","2005-04-03","2005-04-10","2005-04-17","2005-04-24","2005-05-01","2005-05-08","2005-05-15","2005-05-22","2005-05-29","2005-06-05","2005-06-12","2005-06-19","2005-06-26","2005-07-03","2005-07-10","2005-07-17","2005-07-24","2005-07-31","2005-08-07","2005-08-14","2005-08-21","2005-08-28","2005-09-04","2005-09-11","2005-09-18","2005-09-25","2005-10-02","2005-10-09","2005-10-16","2005-10-23","2005-10-30","2005-11-06","2005-11-13","2005-11-20","2005-11-27","2005-12-04","2005-12-11","2005-12-18","2005-12-25","2006-01-01","2006-01-08","2006-01-15","2006-01-22","2006-01-29","2006-02-05","2006-02-12","2006-02-19","2006-02-26","2006-03-05","2006-03-12","2006-03-19","2006-03-26","2006-04-02","2006-04-09","2006-04-16","2006-04-23","2006-04-30","2006-05-07","2006-05-14","2006-05-21","2006-05-28","2006-06-04","2006-06-11","2006-06-18","2006-06-25","2006-07-02","2006-07-09","2006-07-16","2006-07-23","2006-07-30","2006-08-06","2006-08-13","2006-08-20","2006-08-27","2006-09-03","2006-09-10","2006-09-17","2006-09-24","2006-10-01","2006-10-08","2006-10-15","2006-10-22","2006-10-29","2006-11-05","2006-11-12","2006-11-19","2006-11-26","2006-12-03","2006-12-10","2006-12-17","2006-12-24","2006-12-31","2007-01-07","2007-01-14","2007-01-21","2007-01-28","2007-02-04","2007-02-11","2007-02-18","2007-02-25","2007-03-04","2007-03-11","2007-03-18","2007-03-25","2007-04-01","2007-04-08","2007-04-15","2007-04-22","2007-04-29","2007-05-06","2007-05-13","2007-05-20","2007-05-27","2007-06-03","2007-06-10","2007-06-17","2007-06-24","2007-07-01","2007-07-08","2007-07-15","2007-07-22","2007-07-29","2007-08-05","2007-08-12","2007-08-19","2007-08-26","2007-09-02","2007-09-09","2007-09-16"))
bugsTraining = c(3,18,14,25,21,13,17,25,21,18,20,11,17,19,23,9,7,18,13,17,16,15,16,18,20,12,14,16,19,23,18,10,24,23,11,14,16,19,22,20,15,21,14,9,19,12,18,12,20,10,20,16,14,12,16,11,10,18,20,17,17,20,16,15,20,19,9,11,11,17,10,14,10,16,7,14,11,9,10,9,14,7,13,13,13,16,17,7,17,8,11,11,10,16,9,20,9,13,13,6,11,21,8,10,7,14,16,13,12,9,13,12,17,13,10,12,15,14,8,8,9,13,9,9,18,9,6,10,14,11,5,6,7,4,9,9,9,6,4,5,7,10,12,7,4,13,11,9,6,6,2,8,10,2,7,7,4,1,5,5,10,11,5,11,9,14,5,9,2,6,6,4,4,2,5,7,13,6,4,3,1,5,4,4,2,6,3,5,2,5,5,3,1,5,2)
weeksTrainingConv = numeric();
#converting Dates to numerical Value
for(i in 1:length(weeksTraining)){
val = ymd(weeksTraining[i])
val = as.numeric(val)
weeksTrainingConv[i] = c(val)
print(weeksTrainingConv[i])
}
#end Training DataSet
#test DataSet
weekTest = as.Date(c("2007-09-23","2007-09-30","2007-10-07","2007-10-14","2007-10-21","2007-10-28","2007-11-04","2007-11-11","2007-11-18","2007-11-25","2007-12-02","2007-12-09","2007-12-16","2007-12-30","2008-01-06","2008-01-13","2008-01-20","2008-01-27","2008-02-03","2008-02-10","2008-02-17","2008-02-24","2008-03-02","2008-03-09","2008-03-16","2008-03-23","2008-03-30","2008-04-06","2008-04-13","2008-04-20","2008-04-27","2008-05-04","2008-05-11","2008-05-18","2008-05-25","2008-06-01","2008-06-08","2008-06-15","2008-06-22","2008-06-29","2008-07-06","2008-07-20","2008-07-27","2008-08-03","2008-08-10","2008-08-17","2008-08-24","2008-08-31","2008-09-07","2008-09-14","2008-09-21","2008-09-28","2008-10-05","2008-10-12","2008-10-19","2008-10-26","2008-11-02","2008-11-09","2008-11-16","2008-11-30","2008-12-07","2008-12-14","2009-01-04","2009-01-11","2009-01-18","2009-01-25","2009-02-01","2009-02-15","2009-02-22","2009-03-15","2009-03-22","2009-03-29","2009-04-05","2009-04-12","2009-04-19","2009-04-26","2009-05-10","2009-05-17","2009-05-24","2009-05-31","2009-06-21","2009-06-28","2009-07-05","2009-07-12","2009-07-19","2009-07-26","2009-08-02","2009-08-09","2009-08-16","2009-08-23","2009-09-06","2009-09-20","2009-09-27","2009-10-04","2009-10-11","2009-10-25","2009-11-01","2009-11-08","2009-11-15","2009-11-29","2009-12-06"));
bugsTest = c(2,4,5,1,4,4,2,4,1,7,2,2,4,1,2,3,1,2,3,1,4,2,10,1,1,6,3,5,1,4,2,3,2,4,2,1,5,6,3,1,1,2,2,5,1,1,2,1,2,3,3,4,4,3,2,3,1,2,6,1,1,1,2,2,2,3,1,1,2,1,3,4,2,3,1,3,1,2,2,1,1,2,2,1,1,1,2,2,2,1,4,3,2,2,6,2,4,3,2,2,1)
weekTestConv = numeric()
#converting Dates to numerical Value
for(i in 1:length(weekTest)){
val = ymd(weekTest[i])
val = as.numeric(val)
weekTestConv[i] = c(val)
}
#end Test DataSet
#total DataSet
totWeek = as.Date(c("2003-12-28","2004-01-04","2004-01-11","2004-01-18","2004-01-25","2004-02-01","2004-02-08","2004-02-15","2004-02-22","2004-02-29","2004-03-07","2004-03-14","2004-03-21","2004-03-28","2004-04-04","2004-04-11","2004-04-18","2004-04-25","2004-05-02","2004-05-09","2004-05-16","2004-05-23","2004-05-30","2004-06-06","2004-06-13","2004-06-20","2004-06-27","2004-07-04","2004-07-11","2004-07-18","2004-07-25","2004-08-01","2004-08-08","2004-08-15","2004-08-22","2004-08-29","2004-09-05","2004-09-12","2004-09-19","2004-09-26","2004-10-03","2004-10-10","2004-10-17","2004-10-24","2004-10-31","2004-11-07","2004-11-14","2004-11-21","2004-11-28","2004-12-05","2004-12-12","2004-12-19","2004-12-26","2005-01-02","2005-01-09","2005-01-16","2005-01-23","2005-01-30","2005-02-06","2005-02-13","2005-02-20","2005-02-27","2005-03-06","2005-03-13","2005-03-20","2005-03-27","2005-04-03","2005-04-10","2005-04-17","2005-04-24","2005-05-01","2005-05-08","2005-05-15","2005-05-22","2005-05-29","2005-06-05","2005-06-12","2005-06-19","2005-06-26","2005-07-03","2005-07-10","2005-07-17","2005-07-24","2005-07-31","2005-08-07","2005-08-14","2005-08-21","2005-08-28","2005-09-04","2005-09-11","2005-09-18","2005-09-25","2005-10-02","2005-10-09","2005-10-16","2005-10-23","2005-10-30","2005-11-06","2005-11-13","2005-11-20","2005-11-27","2005-12-04","2005-12-11","2005-12-18","2005-12-25","2006-01-01","2006-01-08","2006-01-15","2006-01-22","2006-01-29","2006-02-05","2006-02-12","2006-02-19","2006-02-26","2006-03-05","2006-03-12","2006-03-19","2006-03-26","2006-04-02","2006-04-09","2006-04-16","2006-04-23","2006-04-30","2006-05-07","2006-05-14","2006-05-21","2006-05-28","2006-06-04","2006-06-11","2006-06-18","2006-06-25","2006-07-02","2006-07-09","2006-07-16","2006-07-23","2006-07-30","2006-08-06","2006-08-13","2006-08-20","2006-08-27","2006-09-03","2006-09-10","2006-09-17","2006-09-24","2006-10-01","2006-10-08","2006-10-15","2006-10-22","2006-10-29","2006-11-05","2006-11-12","2006-11-19","2006-11-26","2006-12-03","2006-12-10","2006-12-17","2006-12-24","2006-12-31","2007-01-07","2007-01-14","2007-01-21","2007-01-28","2007-02-04","2007-02-11","2007-02-18","2007-02-25","2007-03-04","2007-03-11","2007-03-18","2007-03-25","2007-04-01","2007-04-08","2007-04-15","2007-04-22","2007-04-29","2007-05-06","2007-05-13","2007-05-20","2007-05-27","2007-06-03","2007-06-10","2007-06-17","2007-06-24","2007-07-01","2007-07-08","2007-07-15","2007-07-22","2007-07-29","2007-08-05","2007-08-12","2007-08-19","2007-08-26","2007-09-02","2007-09-09","2007-09-16","2007-09-23","2007-09-30","2007-10-07","2007-10-14","2007-10-21","2007-10-28","2007-11-04","2007-11-11","2007-11-18","2007-11-25","2007-12-02","2007-12-09","2007-12-16","2007-12-30","2008-01-06","2008-01-13","2008-01-20","2008-01-27","2008-02-03","2008-02-10","2008-02-17","2008-02-24","2008-03-02","2008-03-09","2008-03-16","2008-03-23","2008-03-30","2008-04-06","2008-04-13","2008-04-20","2008-04-27","2008-05-04","2008-05-11","2008-05-18","2008-05-25","2008-06-01","2008-06-08","2008-06-15","2008-06-22","2008-06-29","2008-07-06","2008-07-20","2008-07-27","2008-08-03","2008-08-10","2008-08-17","2008-08-24","2008-08-31","2008-09-07","2008-09-14","2008-09-21","2008-09-28","2008-10-05","2008-10-12","2008-10-19","2008-10-26","2008-11-02","2008-11-09","2008-11-16","2008-11-30","2008-12-07","2008-12-14","2009-01-04","2009-01-11","2009-01-18","2009-01-25","2009-02-01","2009-02-15","2009-02-22","2009-03-15","2009-03-22","2009-03-29","2009-04-05","2009-04-12","2009-04-19","2009-04-26","2009-05-10","2009-05-17","2009-05-24","2009-05-31","2009-06-21","2009-06-28","2009-07-05","2009-07-12","2009-07-19","2009-07-26","2009-08-02","2009-08-09","2009-08-16","2009-08-23","2009-09-06","2009-09-20","2009-09-27","2009-10-04","2009-10-11","2009-10-25","2009-11-01","2009-11-08","2009-11-15","2009-11-29","2009-12-06"))
totBugs = c(3,18,14,25,21,13,17,25,21,18,20,11,17,19,23,9,7,18,13,17,16,15,16,18,20,12,14,16,19,23,18,10,24,23,11,14,16,19,22,20,15,21,14,9,19,12,18,12,20,10,20,16,14,12,16,11,10,18,20,17,17,20,16,15,20,19,9,11,11,17,10,14,10,16,7,14,11,9,10,9,14,7,13,13,13,16,17,7,17,8,11,11,10,16,9,20,9,13,13,6,11,21,8,10,7,14,16,13,12,9,13,12,17,13,10,12,15,14,8,8,9,13,9,9,18,9,6,10,14,11,5,6,7,4,9,9,9,6,4,5,7,10,12,7,4,13,11,9,6,6,2,8,10,2,7,7,4,1,5,5,10,11,5,11,9,14,5,9,2,6,6,4,4,2,5,7,13,6,4,3,1,5,4,4,2,6,3,5,2,5,5,3,1,5,2,2,4,5,1,4,4,2,4,1,7,2,2,4,1,2,3,1,2,3,1,4,2,10,1,1,6,3,5,1,4,2,3,2,4,2,1,5,6,3,1,1,2,2,5,1,1,2,1,2,3,3,4,4,3,2,3,1,2,6,1,1,1,2,2,2,3,1,1,2,1,3,4,2,3,1,3,1,2,2,1,1,2,2,1,1,1,2,2,2,1,4,3,2,2,6,2,4,3,2,2,1)
totWeekConv = numeric();
#converting Dates to numerical Value
for(i in 1:length(totWeek)){
val = ymd(totWeek[i])
val = as.numeric(val)
totWeekConv[i] = c(val)
}
#end Total DataSet
I wanted to create a linear model and establish a relationship between weeks vs bugs. I converted the week Dates into a numerical format for easier calculation.
I can create the model using the lm() command and I provided it with a training dataset as shown in the 1st code snippet. Whenever I want to predict against the model using testing data set which in this case is a dataframe named "validateFrame", the program gives me an error stating
Warning message: 'newdata' had 101 rows but variables found have 296
rows
I am new to R and I have already googled regarding this but am failing somewhere.I have googled it already but the solution I found doesn't seem to work for me.
The problem is in your initial code snippet.
trainingFrame = data.frame(weeksTrainingConv,bugsTraining)
validateFrame = data.frame(weekTestConv,bugsTest)
model <- lm(totWeekConv ~ totBugs, trainingFrame)
myPrediction <- predict(model,validateFrame)
First, you create the model using totWeekConv and totBugs from trainingFrame. But trainingFrame does not have variables with those names. It has columns named weeksTrainingConv and bugsTraining. Then you try to evaluate the model on validateFrame where the variables have yet different names - weekTestConv and bugsTest. You must use the same variable names throughout.
I am not quite sure how you meant to use totWeekConv and totBugs but I believe that what you wanted was:
trainingFrame = data.frame(weeksConv = weeksTrainingConv,bugs = bugsTraining)
validateFrame = data.frame(weeksConv = weekTestConv,bugs = bugsTest)
model <- lm(weeksConv ~ bugs,trainingFrame)
myPrediction <- predict(model,validateFrame)
Here, you are training on the training data and testing on the test data but the column names are the same in both places.

R predict single row

I'm trying to predict some data from PCA using leave-one-out (LOO) cross validation.
The prcomp goes well, however when I come to predict the function gets upset
error: 'newdata' must be a matrix or data frame
because I'm supplying a vector (i.e. a single row) and not a matrix (i.e. multiple rows).
I've tried as.data.frame and as.matrix and various varieties thereof but I still get errors
error: 'newdata' does not have named columns matching one or more of the original columns`
In my example here loo is the LOO index and mydata and myinfo contain the data and metadata respectively.
tdata = mydata[-loo,]
tinfo = myinfo[-loo,]
vdata = mydata[loo,]
vinfo = myinfo[loo,]
p = prcomp( tdata )
predict(p, newdata = vdata )
Nevermind, found it:
predict(p, newdata = as.data.frame(t(vdata)) )

Used Predict function on New Dataset with different Columns

Using "stackloss" data in R, I created a regression model as seen below:
stackloss.lm = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,data=stackloss)
stackloss.lm
newdata = data.frame(Air.Flow=stackloss$Air.Flow, Water.Temp= stackloss$Water.Temp, Acid.Conc.=stackloss$Acid.Conc.)
Suppose I get a new data set and would need predict its "stack.loss" based on the previous model as seen below:
#suppose I need to used my model on a new set of data
stackloss$predict1[-1] <- predict(stackloss.lm, newdata)
I get this error:
Error in `$<-.data.frame`(`*tmp*`, "predict1", value = numeric(0)) :
replacement has 0 rows, data has 21
Is their a way to used the predict function on different data set with the same columns but different rows?
Thanks in advance.
You can predict into a new data set of whatever length you want, you just need to make sure you assign the results to an existing vector of appropriate size.
This line causes a problem because
stackloss$predict1[-1] <- predict(stackloss.lm, newdata)
because you can't assign and subset a non-existing vector at the same time. This also doesn't work
dd <- data.frame(a=1:3)
dd$b[-1]<-1:2
The length of stackloss which you used to fit the model will always be the same length so re-assigning new values to that data.frame doesn't make sense. If you want to use a smaller dataset to predict on, that's fine
stackloss.lm = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,data=stackloss)
newdata = head(data.frame(Air.Flow=stackloss$Air.Flow, Water.Temp= stackloss$Water.Temp, Acid.Conc.=stackloss$Acid.Conc.),5)
predict(stackloss.lm, newdata)
1 2 3 4 5
38.76536 38.91749 32.44447 22.30223 19.71165
Since the result has the same number of values as newdata has rows (n=5), it makes sense to attach these to newdata. It would not make sense to attach to stackloss because that has a different number of rows (n=21)
newdata$predcit1 <- predict(stackloss.lm, newdata)

Resources