Getting an error in R when calling the predict function
I get the following warning whenever I try to predict from my linear model on new data.
Warning message: 'newdata' had 101 rows but variables found have 296 rows
The following is the code snippet:
trainingFrame = data.frame(weeksTrainingConv,bugsTraining)
validateFrame = data.frame(weekTestConv,bugsTest)
model <- lm(totWeekConv ~ totBugs,trainingFrame)
myPrediction <- predict(model,validateFrame)
The data frames and their components are computed in a separate script, shown below. I have marked the blocks with comments according to their purpose. The first block is my training dataset, the second is the dataset I will use to test my model, and the last block is the total dataset.
library(lubridate)
#training DataSet
weeksTraining = as.Date(c("2003-12-28","2004-01-04","2004-01-11","2004-01-18","2004-01-25","2004-02-01","2004-02-08","2004-02-15","2004-02-22","2004-02-29","2004-03-07","2004-03-14","2004-03-21","2004-03-28","2004-04-04","2004-04-11","2004-04-18","2004-04-25","2004-05-02","2004-05-09","2004-05-16","2004-05-23","2004-05-30","2004-06-06","2004-06-13","2004-06-20","2004-06-27","2004-07-04","2004-07-11","2004-07-18","2004-07-25","2004-08-01","2004-08-08","2004-08-15","2004-08-22","2004-08-29","2004-09-05","2004-09-12","2004-09-19","2004-09-26","2004-10-03","2004-10-10","2004-10-17","2004-10-24","2004-10-31","2004-11-07","2004-11-14","2004-11-21","2004-11-28","2004-12-05","2004-12-12","2004-12-19","2004-12-26","2005-01-02","2005-01-09","2005-01-16","2005-01-23","2005-01-30","2005-02-06","2005-02-13","2005-02-20","2005-02-27","2005-03-06","2005-03-13","2005-03-20","2005-03-27","2005-04-03","2005-04-10","2005-04-17","2005-04-24","2005-05-01","2005-05-08","2005-05-15","2005-05-22","2005-05-29","2005-06-05","2005-06-12","2005-06-19","2005-06-26","2005-07-03","2005-07-10","2005-07-17","2005-07-24","2005-07-31","2005-08-07","2005-08-14","2005-08-21","2005-08-28","2005-09-04","2005-09-11","2005-09-18","2005-09-25","2005-10-02","2005-10-09","2005-10-16","2005-10-23","2005-10-30","2005-11-06","2005-11-13","2005-11-20","2005-11-27","2005-12-04","2005-12-11","2005-12-18","2005-12-25","2006-01-01","2006-01-08","2006-01-15","2006-01-22","2006-01-29","2006-02-05","2006-02-12","2006-02-19","2006-02-26","2006-03-05","2006-03-12","2006-03-19","2006-03-26","2006-04-02","2006-04-09","2006-04-16","2006-04-23","2006-04-30","2006-05-07","2006-05-14","2006-05-21","2006-05-28","2006-06-04","2006-06-11","2006-06-18","2006-06-25","2006-07-02","2006-07-09","2006-07-16","2006-07-23","2006-07-30","2006-08-06","2006-08-13","2006-08-20","2006-08-27","2006-09-03","2006-09-10","2006-09-17","2006-09-24","2006-10-01","2006-10-08","2006-10-15","2006-10-22","2006-10-29","2006-11-05","2006-11-12","2006-11-19","2006-11-26","2006-12-03","2006-12-10","2006-12-17","2006-12-24","2006-12-31","2007-01-07","2007-01-14","2007-01-21","2007-01-28","2007-02-04","2007-02-11","2007-02-18","2007-02-25","2007-03-04","2007-03-11","2007-03-18","2007-03-25","2007-04-01","2007-04-08","2007-04-15","2007-04-22","2007-04-29","2007-05-06","2007-05-13","2007-05-20","2007-05-27","2007-06-03","2007-06-10","2007-06-17","2007-06-24","2007-07-01","2007-07-08","2007-07-15","2007-07-22","2007-07-29","2007-08-05","2007-08-12","2007-08-19","2007-08-26","2007-09-02","2007-09-09","2007-09-16"))
bugsTraining = c(3,18,14,25,21,13,17,25,21,18,20,11,17,19,23,9,7,18,13,17,16,15,16,18,20,12,14,16,19,23,18,10,24,23,11,14,16,19,22,20,15,21,14,9,19,12,18,12,20,10,20,16,14,12,16,11,10,18,20,17,17,20,16,15,20,19,9,11,11,17,10,14,10,16,7,14,11,9,10,9,14,7,13,13,13,16,17,7,17,8,11,11,10,16,9,20,9,13,13,6,11,21,8,10,7,14,16,13,12,9,13,12,17,13,10,12,15,14,8,8,9,13,9,9,18,9,6,10,14,11,5,6,7,4,9,9,9,6,4,5,7,10,12,7,4,13,11,9,6,6,2,8,10,2,7,7,4,1,5,5,10,11,5,11,9,14,5,9,2,6,6,4,4,2,5,7,13,6,4,3,1,5,4,4,2,6,3,5,2,5,5,3,1,5,2)
weeksTrainingConv = numeric();
#converting Dates to numerical Value
for(i in 1:length(weeksTraining)){
val = ymd(weeksTraining[i])
val = as.numeric(val)
weeksTrainingConv[i] = c(val)
print(weeksTrainingConv[i])
}
#end Training DataSet
#test DataSet
weekTest = as.Date(c("2007-09-23","2007-09-30","2007-10-07","2007-10-14","2007-10-21","2007-10-28","2007-11-04","2007-11-11","2007-11-18","2007-11-25","2007-12-02","2007-12-09","2007-12-16","2007-12-30","2008-01-06","2008-01-13","2008-01-20","2008-01-27","2008-02-03","2008-02-10","2008-02-17","2008-02-24","2008-03-02","2008-03-09","2008-03-16","2008-03-23","2008-03-30","2008-04-06","2008-04-13","2008-04-20","2008-04-27","2008-05-04","2008-05-11","2008-05-18","2008-05-25","2008-06-01","2008-06-08","2008-06-15","2008-06-22","2008-06-29","2008-07-06","2008-07-20","2008-07-27","2008-08-03","2008-08-10","2008-08-17","2008-08-24","2008-08-31","2008-09-07","2008-09-14","2008-09-21","2008-09-28","2008-10-05","2008-10-12","2008-10-19","2008-10-26","2008-11-02","2008-11-09","2008-11-16","2008-11-30","2008-12-07","2008-12-14","2009-01-04","2009-01-11","2009-01-18","2009-01-25","2009-02-01","2009-02-15","2009-02-22","2009-03-15","2009-03-22","2009-03-29","2009-04-05","2009-04-12","2009-04-19","2009-04-26","2009-05-10","2009-05-17","2009-05-24","2009-05-31","2009-06-21","2009-06-28","2009-07-05","2009-07-12","2009-07-19","2009-07-26","2009-08-02","2009-08-09","2009-08-16","2009-08-23","2009-09-06","2009-09-20","2009-09-27","2009-10-04","2009-10-11","2009-10-25","2009-11-01","2009-11-08","2009-11-15","2009-11-29","2009-12-06"));
bugsTest = c(2,4,5,1,4,4,2,4,1,7,2,2,4,1,2,3,1,2,3,1,4,2,10,1,1,6,3,5,1,4,2,3,2,4,2,1,5,6,3,1,1,2,2,5,1,1,2,1,2,3,3,4,4,3,2,3,1,2,6,1,1,1,2,2,2,3,1,1,2,1,3,4,2,3,1,3,1,2,2,1,1,2,2,1,1,1,2,2,2,1,4,3,2,2,6,2,4,3,2,2,1)
weekTestConv = numeric()
#converting Dates to numerical Value
for(i in 1:length(weekTest)){
val = ymd(weekTest[i])
val = as.numeric(val)
weekTestConv[i] = c(val)
}
#end Test DataSet
#total DataSet
totWeek = as.Date(c("2003-12-28","2004-01-04","2004-01-11","2004-01-18","2004-01-25","2004-02-01","2004-02-08","2004-02-15","2004-02-22","2004-02-29","2004-03-07","2004-03-14","2004-03-21","2004-03-28","2004-04-04","2004-04-11","2004-04-18","2004-04-25","2004-05-02","2004-05-09","2004-05-16","2004-05-23","2004-05-30","2004-06-06","2004-06-13","2004-06-20","2004-06-27","2004-07-04","2004-07-11","2004-07-18","2004-07-25","2004-08-01","2004-08-08","2004-08-15","2004-08-22","2004-08-29","2004-09-05","2004-09-12","2004-09-19","2004-09-26","2004-10-03","2004-10-10","2004-10-17","2004-10-24","2004-10-31","2004-11-07","2004-11-14","2004-11-21","2004-11-28","2004-12-05","2004-12-12","2004-12-19","2004-12-26","2005-01-02","2005-01-09","2005-01-16","2005-01-23","2005-01-30","2005-02-06","2005-02-13","2005-02-20","2005-02-27","2005-03-06","2005-03-13","2005-03-20","2005-03-27","2005-04-03","2005-04-10","2005-04-17","2005-04-24","2005-05-01","2005-05-08","2005-05-15","2005-05-22","2005-05-29","2005-06-05","2005-06-12","2005-06-19","2005-06-26","2005-07-03","2005-07-10","2005-07-17","2005-07-24","2005-07-31","2005-08-07","2005-08-14","2005-08-21","2005-08-28","2005-09-04","2005-09-11","2005-09-18","2005-09-25","2005-10-02","2005-10-09","2005-10-16","2005-10-23","2005-10-30","2005-11-06","2005-11-13","2005-11-20","2005-11-27","2005-12-04","2005-12-11","2005-12-18","2005-12-25","2006-01-01","2006-01-08","2006-01-15","2006-01-22","2006-01-29","2006-02-05","2006-02-12","2006-02-19","2006-02-26","2006-03-05","2006-03-12","2006-03-19","2006-03-26","2006-04-02","2006-04-09","2006-04-16","2006-04-23","2006-04-30","2006-05-07","2006-05-14","2006-05-21","2006-05-28","2006-06-04","2006-06-11","2006-06-18","2006-06-25","2006-07-02","2006-07-09","2006-07-16","2006-07-23","2006-07-30","2006-08-06","2006-08-13","2006-08-20","2006-08-27","2006-09-03","2006-09-10","2006-09-17","2006-09-24","2006-10-01","2006-10-08","2006-10-15","2006-10-22","2006-10-29","2006-11-05","2006-11-12","2006-11-19","2006-11-26","2006-12-03","2006-12-10","2006-12-17","2006-12-24","2006-12-31","2007-01-07","2007-01-14","2007-01-21","2007-01-28","2007-02-04","2007-02-11","2007-02-18","2007-02-25","2007-03-04","2007-03-11","2007-03-18","2007-03-25","2007-04-01","2007-04-08","2007-04-15","2007-04-22","2007-04-29","2007-05-06","2007-05-13","2007-05-20","2007-05-27","2007-06-03","2007-06-10","2007-06-17","2007-06-24","2007-07-01","2007-07-08","2007-07-15","2007-07-22","2007-07-29","2007-08-05","2007-08-12","2007-08-19","2007-08-26","2007-09-02","2007-09-09","2007-09-16","2007-09-23","2007-09-30","2007-10-07","2007-10-14","2007-10-21","2007-10-28","2007-11-04","2007-11-11","2007-11-18","2007-11-25","2007-12-02","2007-12-09","2007-12-16","2007-12-30","2008-01-06","2008-01-13","2008-01-20","2008-01-27","2008-02-03","2008-02-10","2008-02-17","2008-02-24","2008-03-02","2008-03-09","2008-03-16","2008-03-23","2008-03-30","2008-04-06","2008-04-13","2008-04-20","2008-04-27","2008-05-04","2008-05-11","2008-05-18","2008-05-25","2008-06-01","2008-06-08","2008-06-15","2008-06-22","2008-06-29","2008-07-06","2008-07-20","2008-07-27","2008-08-03","2008-08-10","2008-08-17","2008-08-24","2008-08-31","2008-09-07","2008-09-14","2008-09-21","2008-09-28","2008-10-05","2008-10-12","2008-10-19","2008-10-26","2008-11-02","2008-11-09","2008-11-16","2008-11-30","2008-12-07","2008-12-14","2009-01-04","2009-01-11","2009-01-18","2009-01-25","2009-02-01","2009-02-15","2009-02-22","2009-03-15","2009-03-22","2009-03-29","2009-04-05","2009-04-12","2009-04-19","2009-04-26","2009-05-10","2009-05-17","2009-05-24","2009-05-31","2009-06-21","2009-06-28","2009-07-05","2009-07-12","2009-07-19","2009-07-26","2009-08-02","2009-08-09","2009-08-16","2009-08-23","2009-09-06","2009-09-20","2009-09-27","2009-10-04","2009-10-11","2009-10-25","2009-11-01","2009-11-08","2009-11-15","2009-11-29","2009-12-06"))
totBugs = c(3,18,14,25,21,13,17,25,21,18,20,11,17,19,23,9,7,18,13,17,16,15,16,18,20,12,14,16,19,23,18,10,24,23,11,14,16,19,22,20,15,21,14,9,19,12,18,12,20,10,20,16,14,12,16,11,10,18,20,17,17,20,16,15,20,19,9,11,11,17,10,14,10,16,7,14,11,9,10,9,14,7,13,13,13,16,17,7,17,8,11,11,10,16,9,20,9,13,13,6,11,21,8,10,7,14,16,13,12,9,13,12,17,13,10,12,15,14,8,8,9,13,9,9,18,9,6,10,14,11,5,6,7,4,9,9,9,6,4,5,7,10,12,7,4,13,11,9,6,6,2,8,10,2,7,7,4,1,5,5,10,11,5,11,9,14,5,9,2,6,6,4,4,2,5,7,13,6,4,3,1,5,4,4,2,6,3,5,2,5,5,3,1,5,2,2,4,5,1,4,4,2,4,1,7,2,2,4,1,2,3,1,2,3,1,4,2,10,1,1,6,3,5,1,4,2,3,2,4,2,1,5,6,3,1,1,2,2,5,1,1,2,1,2,3,3,4,4,3,2,3,1,2,6,1,1,1,2,2,2,3,1,1,2,1,3,4,2,3,1,3,1,2,2,1,1,2,2,1,1,1,2,2,2,1,4,3,2,2,6,2,4,3,2,2,1)
totWeekConv = numeric();
#converting Dates to numerical Value
for(i in 1:length(totWeek)){
val = ymd(totWeek[i])
val = as.numeric(val)
totWeekConv[i] = c(val)
}
#end Total DataSet
I wanted to create a linear model to establish the relationship between weeks and bugs. I converted the week dates to a numeric format for easier calculation.
I can create the model with lm(), passing the training dataset as shown in the first code snippet. Whenever I try to predict from the model using the test dataset, which in this case is the data frame named "validateFrame", R gives me the following message:
Warning message: 'newdata' had 101 rows but variables found have 296 rows
I am new to R. I have already searched for this message, but the solutions I found don't seem to work for me.
The problem is in your initial code snippet.
trainingFrame = data.frame(weeksTrainingConv,bugsTraining)
validateFrame = data.frame(weekTestConv,bugsTest)
model <- lm(totWeekConv ~ totBugs, trainingFrame)
myPrediction <- predict(model,validateFrame)
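First, you fit the model using totWeekConv and totBugs with trainingFrame as the data. But trainingFrame has no variables with those names; its columns are named weeksTrainingConv and bugsTraining. lm() therefore falls back to the totWeekConv and totBugs vectors in your workspace, which have 296 elements. Then you try to evaluate the model on validateFrame, where the variables have yet other names, weekTestConv and bugsTest. predict() cannot find totBugs there either, so it again uses the 296-element vector, which is why the warning reports 296 rows against the 101 rows of newdata. You must use the same variable names throughout.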
I am not quite sure how you meant to use totWeekConv and totBugs but I believe that what you wanted was:
trainingFrame = data.frame(weeksConv = weeksTrainingConv,bugs = bugsTraining)
validateFrame = data.frame(weeksConv = weekTestConv,bugs = bugsTest)
model <- lm(weeksConv ~ bugs,trainingFrame)
myPrediction <- predict(model,validateFrame)
Here, you are training on the training data and testing on the test data but the column names are the same in both places.
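To see the mechanism on a small scale, here is a minimal sketch with made-up data. The names x_all, y_all, train and test are hypothetical stand-ins for the vectors in the question, not the original data. It first reproduces the mismatch and then applies the fix of using the same column names in both data frames.

```r
set.seed(42)

# Hypothetical stand-ins for the "total" vectors in the workspace (30 points)
x_all <- 1:30
y_all <- 2 * x_all + rnorm(30)

# Data frames whose column names do NOT match the formula below
train <- data.frame(xTrain = x_all[1:20],  yTrain = y_all[1:20])
test  <- data.frame(xTest  = x_all[21:30], yTest  = y_all[21:30])

# lm() cannot find x_all / y_all in 'train', so it silently uses the
# 30-element vectors from the workspace instead of the 20 training rows
bad_model <- lm(y_all ~ x_all, data = train)
bad_rows  <- nrow(model.frame(bad_model))

# predict() cannot find x_all in 'test' either; without suppressWarnings()
# this call emits the "'newdata' had 10 rows but variables found have 30 rows"
# warning and returns one prediction per workspace element
bad_pred <- suppressWarnings(predict(bad_model, newdata = test))

# Fix: identical column names in the training and test data frames
train <- data.frame(x = x_all[1:20],  y = y_all[1:20])
test  <- data.frame(x = x_all[21:30], y = y_all[21:30])
good_model <- lm(y ~ x, data = train)
good_pred  <- predict(good_model, newdata = test)

c(bad_rows = bad_rows, bad = length(bad_pred), good = length(good_pred))
```

The lengths printed on the last line show the difference: the mismatched model is fitted and predicted on the full workspace vectors, while the corrected one returns exactly one prediction per row of the test frame.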
Related
Cross validation help: Error in model.frame.default(as.formula(delete.response(Terms)), newdata, : variable lengths differ (found for 'fun factor')
So I have a specific error that I can't figure out. By searching I am finding that the model and the cross validation set do not have the data with the same levels to fit the model. I am trying to understand completely with my use case. Basically I am building a QDA model to predict vehicle country based on numeric values. This code will run for anyone since it is a public google sheets document. For those of you who follow Doug Demuro on YouTube you may find this a tad bit interesting.
#load dataset into r
library(gsheet)
url = 'https://docs.google.com/spreadsheets/d/1KTArYwDWrn52fnc7B12KvjRb6nmcEaU6gXYehWfsZSo/edit'
doug_df = read.csv(text=gsheet2text(url, format='csv'), stringsAsFactors=FALSE,header=FALSE)
#begin cleanup. remove first blank rows of data
doug_df = doug_df[-c(1,2,3), ]
attach(doug_df)
#name columns appropriately
names(doug_df) = c("year","make","model","styling","acceleration","handling","fun factor","cool factor","total weekend score","features","comfort","quality","practicality","value","total daily score","dougscore","video duration","filming city","filming state","vehicle country")
#removing categorical columns and columns not being used for discriminate analysis to include totals columns
library(dplyr)
doug_df = doug_df %>% dplyr::select (-c(make,model,`total weekend score`,`total daily score`,dougscore,`video duration`,`filming city`,`filming state`))
#convert from character to numeric
num.cols <- c("year","styling","acceleration","handling","fun factor","cool factor","features","comfort","quality","practicality","value")
doug_df[num.cols] <- sapply(doug_df[num.cols], as.numeric)
`vehicle country` = as.factor(`vehicle country`)
#create a new column to reflect groupings for response variable
doug_df$country.group=ifelse(`vehicle country`=='Germany','Germany', ifelse(`vehicle country`=='Italy','Italy', ifelse(`vehicle country`=='Japan','Japan', ifelse(`vehicle country`=='UK','UK', ifelse(`vehicle country`=='USA','USA','Other')))))
#remove the initial country column
doug_df = doug_df %>% dplyr::select (-c(`vehicle country`))
#QDA with multiple predictors
library(MASS)
qdafit1 = qda(country.group~styling+acceleration+handling+`fun factor`+`cool factor`+features+comfort+quality+value,data=doug_df)
#predict using model and compute error
n=dim(doug_df)[1]
fittedclass = predict(qdafit1,data=doug_df)$class
table(doug_df$country.group,fittedclass)
Error = sum(doug_df$country.group != fittedclass)/n; Error
#conduct k 10 fold cross validation
allpredictedCV1 = rep("NA",n)
cvk = 10
groups = c(rep(1:cvk,floor(n/cvk)))
set.seed(4)
cvgroups = sample(groups,n,replace=TRUE)
for (i in 1:cvk) {
  qdafit1 = qda(country.group~styling+acceleration+handling+`fun factor`+`cool factor`+features+comfort+quality+value,data=doug_df,subset=(cvgroups!=i))
  newdata1i = data.frame(doug_df[cvgroups==i,])
  allpredictedCV1[cvgroups==i] = as.character(predict(qdafit1,newdata1i)$class)
}
table(doug_df$country.group,allpredictedCV1)
CVmodel1 = sum(allpredictedCV1!=doug_df$country.group)/n; CVmodel1
This is throwing the error for the last part of the code w/ the cross validation:
Error in model.frame.default(as.formula(delete.response(Terms)), newdata, : variable lengths differ (found for 'fun factor')
Can someone help explain it a bit more in depth to me what is happening? I think that the variable fun factor doesn't have the same levels in each fold of the cross validation as it did the model. Now I need to know my options to fix it. Thanks in advance!
EDIT
In addition to the above, I am getting a very similar error for when I try to predict a dummy car review.
#build a dummy review and predict it using multiple models
dummy_review = data.frame(year=2014,styling=8,acceleration=6,handling=6,`fun factor`=8,`cool factor`=8,features=4,comfort=4,quality=6,practicality=3,value=5)
#predict vehicle country for dummy data using model 1
predict(qdafit1,dummy_review)$class
This returns the following error:
Error in model.frame.default(as.formula(delete.response(Terms)), newdata, : variable lengths differ (found for 'fun factor')
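One thing worth checking here, offered as a possibility rather than a confirmed diagnosis: by default data.frame() rewrites non-syntactic column names such as `fun factor` into fun.factor. If that happens to newdata, predict() can no longer find a column called `fun factor` in it and has to look for the variable elsewhere, which may produce a length mismatch. The sketch below uses a tiny made-up data frame (df and its values are hypothetical) just to show the renaming.

```r
# Hypothetical data frame with a non-syntactic column name
df <- data.frame(`fun factor` = c(3, 5, 8), value = c(1, 2, 3),
                 check.names = FALSE)
names(df)

# Wrapping a subset in data.frame() applies make.names() to the columns,
# so the space in the name is replaced by a dot
wrapped <- data.frame(df[1:2, ])
names(wrapped)

# Plain subsetting keeps the original names
subsetted <- df[1:2, , drop = FALSE]
names(subsetted)

# When building a new row by hand, check.names = FALSE keeps the name as typed
new_row <- data.frame(`fun factor` = 8, value = 5, check.names = FALSE)
names(new_row)
```

If this is what is happening, passing doug_df[cvgroups==i,] directly as newdata, or adding check.names = FALSE when constructing the dummy review, would be the things to try.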
Issue with h2o Package in R using subsetted dataframes leading to near perfect prediction accuracy
I have been stumped on this problem for a very long time and cannot figure it out. I believe the issue stems from subsets of data.frame objects retaining information of the parent but I also feel it's causing issues when training h2o.deeplearning models on what I think is just my training set (though this may not be true). See below for sample code. I included comments to clarify what I'm doing but it's fairly short code:
dataset = read.csv("dataset.csv")[,-1] # Read dataset in but omit the first column (it's just an index from the original data)
y = dataset[,1] # Create response
X = dataset[,-1] # Create regressors
X = model.matrix(y~.,data=dataset) # Automatically create dummy variables
y=as.factor(y) # Ensure y has factor data type
dataset = data.frame(y,X) # Create final data.frame dataset
train = sample(length(y),length(y)/1.66) # Create training indices -- A boolean
test = (-train) # Create testing indices
h2o.init(nthreads=2) # Initiate h2o
# BELOW: Create h2o.deeplearning model with subset of dataset.
mlModel = h2o.deeplearning(y='y',training_frame=as.h2o(dataset[train,,drop=TRUE]),activation="Rectifier", hidden=c(6,6),epochs=10,train_samples_per_iteration = -2)
predictions = h2o.predict(mlModel,newdata=as.h2o(dataset[test,-1])) # Predict using mlModel
predictions = as.data.frame(predictions) # Convert predictions to dataframe object. as.vector() caused issues for me
predictions = predictions[,1] # Extract predictions
mean(predictions!=y[test])
The problem is that if I evaluate this against my test subset I get almost 0% error:
[1] 0.0007531255
Has anyone encountered this issue? Have an idea of how to alleviate this problem?
It will be more efficient to use the H2O functions to load the data and split it.
data = h2o.importFile("dataset.csv")
y = 2 #Response is 2nd column, first is an index
x = 3:(ncol(data)) #Learn from all the other columns
data[,y] = as.factor(data[,y])
parts = h2o.splitFrame(data, 0.8) #Split 80/20
train = parts[[1]]
test = parts[[2]]
# BELOW: Create h2o.deeplearning model with subset of dataset.
mlModel = h2o.deeplearning(x=x, y=y, training_frame=train,activation="Rectifier", hidden=c(6,6),epochs=10,train_samples_per_iteration = -2)
h2o.performance(mlModel, test)
It is hard to say what the problem with your original code is, without seeing the contents of dataset.csv and being able to try it. My guess is that train and test are not being split, and it is actually being trained on the test data.
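A separate possibility that can be checked in base R, and only a possibility since dataset.csv is not shown: in model.matrix(y~.,data=dataset) the dot expands to every column of dataset that is not named on the left-hand side. If the response column in the file is not literally called y, it stays in the design matrix and the network can read the answer from its inputs. The sketch below uses a made-up data frame; the column names label, a and b are hypothetical.

```r
set.seed(1)

# Hypothetical data: 'label' is the response, 'a' and 'b' are predictors
dataset <- data.frame(label = rbinom(20, 1, 0.5),
                      a = rnorm(20),
                      b = rnorm(20))
y <- dataset[, 1]

# 'y' is a workspace vector, not a column of 'dataset', so '.' expands to
# all columns of 'dataset', including the response column 'label'
X_leaky <- model.matrix(y ~ ., data = dataset)
colnames(X_leaky)

# Naming the actual response column in the formula keeps it out of the matrix
X_clean <- model.matrix(label ~ ., data = dataset)
colnames(X_clean)
```

Inspecting colnames(X) in the original code would show quickly whether the response has ended up among the predictors.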
PGLS returns an error when referring to variables by their column position in a caper object
I am carrying out PGLS between a trait and 21 environmental variables for a clade of plant species. I am using a loop to do this 21 times, once for each of the environmental variables, and extract the p-values and some other values into a results matrix.
When normally carrying each PGLS individually I will refer to the variables by their column names, for example:
pgls(trait1~meanrainfall, data=caperobject)
But in order to loop this process for multiple environmental variables, I am referring to the variables by their column position in the data frame (which is in the form of the caper object for PGLS) instead of their column name:
pgls(caperobject[,2]~caperobject[,5], data=caperobject)
This returns the error:
Error in model.frame.default(formula, data$data, na.action = na.pass) : invalid type (list) for variable 'caperobject[, 2]'
This is not a problem when running a linear regression using the original data frame -- referring to the variables by their column name only produces this error when using the caper object as the data using PGLS. Does this way of referring to the column names not work for caper objects? Is there another way I could refer to the column names so I can incorporate them into a PGLS loop?
Your solution is to use caperobject$data[,2] ~ caperobject$data[,5], because an object of class comparative.data is a list, with the trait values stored in its element named data. Here is an example:
library(ape)
library(caper)
# generate random data
seed <- 245937
tr <- rtree(10)
dat <- data.frame(taxa = tr$tip.label, trait1 = rTraitCont(tr, root.value = 3), meanrainfall = rnorm(10, 50, 10))
# prepare a comparative.data structure
caperobject <- comparative.data(tr, dat, taxa, vcv = TRUE, vcv.dim = 3)
# run PGLS
pgls(trait1 ~ meanrainfall, data = caperobject)
pgls(caperobject$data[, 1] ~ caperobject$data[, 2], data = caperobject)
Both options return identical values for the intercept = 3.13 and slope = -0.003. When a problem comes down to data format, it is good practice to check how the data are stored with str(caperobject).
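For the loop itself, an alternative to indexing by position is to build each formula from the column names. The sketch below is a minimal illustration under stated assumptions: it uses a plain data frame and lm() so that it runs without a phylogeny, the column names meantemp and the results layout are hypothetical, and passing the same formula object to pgls(f, data = caperobject) is assumed to work but has not been tested here.

```r
set.seed(1)

# Hypothetical trait and environmental variables
dat <- data.frame(trait1       = rnorm(10),
                  meanrainfall = rnorm(10, 50, 10),
                  meantemp     = rnorm(10, 20, 3))

env_vars <- setdiff(names(dat), "trait1")
results  <- data.frame(variable = env_vars,
                       slope    = NA_real_,
                       p_value  = NA_real_)

for (k in seq_along(env_vars)) {
  # Builds e.g. trait1 ~ meanrainfall from the column names
  f <- reformulate(env_vars[k], response = "trait1")

  # lm() stands in for the model call; with caper this would presumably be
  # pgls(f, data = caperobject)
  fit   <- lm(f, data = dat)
  coefs <- summary(fit)$coefficients

  results$slope[k]   <- coefs[2, "Estimate"]
  results$p_value[k] <- coefs[2, "Pr(>|t|)"]
}

results
```

Because the formulas refer to columns by name, the fitted models also print with readable variable names instead of expressions such as caperobject$data[, 2].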
How to use predict from a model stored in a list in R?
I have a dataframe dfab that contains 2 columns that I used as argument to generate a series of linear models as following:
models = list()
for (i in 1:10){
  models[[i]] = lm(fc_ab10 ~ (poly(nUs_ab, i)), data = dfab)
}
dfab has 32 observations and I want to predict fc_ab10 for only 1 value. I thought of doing so:
newdf = data.frame(newdf = nUs_ab)
newdf[] = 0
newdf[1,1] = 56
prediction = predict(models[[1]], newdata = newdf)
First I tried writing newdf as a dataframe with only one position, but since there are 32 in the dataset on which the model was built, I thought I had to provide at least 32 points as well. I don't think this is necessary though.
Every time I run that piece of code I am given the following error:
Error: variable 'poly(nUs_ab, i) was fitted with type “nmatrix.1” but type “numeric” was supplied.
In addition: Warning message: In Z/rep(sqrt(norm2[-1L]), each = length(x)) : longer object length is not a multiple of shorter object length
I thought all I need to use predict was a LM model, predictors (the number 56) given in a column-named dataframe. Obviously, I am mistaken. How can I fix this issue? Thanks.
newdf should be a data.frame with column name nUs_ab, otherwise R won't be able to know which column to operate upon (i.e., generate the prediction design matrix). So the following code should work:
newdf = data.frame(nUs_ab = 56)
prediction = predict(models[[1]], newdata = newdf)
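A self-contained sketch of the whole workflow, using made-up data in place of dfab (the values below are hypothetical). One deliberate difference from the question's loop, which is a precaution added here and not part of the answer: the polynomial degree is written into each formula as a literal number, so that prediction does not depend on whatever value the loop variable i holds later.

```r
set.seed(1)

# Hypothetical stand-in for dfab: 32 observations, roughly linear relationship
dfab <- data.frame(nUs_ab = seq(10, 100, length.out = 32))
dfab$fc_ab10 <- 2 + 0.5 * dfab$nUs_ab + rnorm(32)

models <- list()
for (i in 1:3) {
  # Degree is pasted in as a literal, e.g. fc_ab10 ~ poly(nUs_ab, 2)
  f <- as.formula(sprintf("fc_ab10 ~ poly(nUs_ab, %d)", i))
  models[[i]] <- lm(f, data = dfab)
}

# newdata needs only the predictor column, under the same name as in the fit,
# and can have a single row
newdf <- data.frame(nUs_ab = 56)
prediction <- predict(models[[1]], newdata = newdf)
prediction

# The same one-row data frame works for every model in the list
all_predictions <- sapply(models, predict, newdata = newdf)
all_predictions
```

The number of rows in newdata is independent of the 32 rows used for fitting; predict() returns one value per row supplied.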
R predict single row
I'm trying to predict some data from PCA using leave-one-out (LOO) cross validation. The prcomp goes well, however when I come to predict the function gets upset
error: 'newdata' must be a matrix or data frame
because I'm supplying a vector (i.e. a single row) and not a matrix (i.e. multiple rows). I've tried as.data.frame and as.matrix and various varieties thereof but I still get errors
error: 'newdata' does not have named columns matching one or more of the original columns
In my example here loo is the LOO index and mydata and myinfo contain the data and metadata respectively.
tdata = mydata[-loo,]
tinfo = myinfo[-loo,]
vdata = mydata[loo,]
vinfo = myinfo[loo,]
p = prcomp( tdata )
predict(p, newdata = vdata )
Never mind, found it:
predict(p, newdata = as.data.frame(t(vdata)) )
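An alternative that avoids the transpose, sketched here under the assumption that mydata is a matrix with column names: subsetting with drop = FALSE keeps the held-out row as a one-row matrix, so predict() receives the shape it expects. The built-in USArrests data stands in for mydata, and the loop over every row is added only to illustrate the full leave-one-out pass.

```r
# Stand-in data: a numeric matrix with column names
mydata <- as.matrix(USArrests)

loo   <- 1
tdata <- mydata[-loo, ]

# drop = FALSE keeps the single row as a 1 x 4 matrix instead of a vector
vdata <- mydata[loo, , drop = FALSE]

p    <- prcomp(tdata)
pred <- predict(p, newdata = vdata)
dim(pred)

# Full leave-one-out pass: one row of principal component scores per observation
loo_scores <- t(sapply(seq_len(nrow(mydata)), function(k) {
  fit <- prcomp(mydata[-k, ])
  predict(fit, newdata = mydata[k, , drop = FALSE])[1, ]
}))
dim(loo_scores)
```

Both approaches give predict() a single row with the original column names; drop = FALSE simply prevents the row from being collapsed to a vector in the first place.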