I am running some multi-group confirmatory factor analyses (CFA) in lavaan in R after multiple imputation.
First, I created a list called Plav to store 5 imputed datasets:
library(lavaan)
library(lavaan.survey)
library(mitools)
library(semTools)
a <- imputationList(Plav) ##Tell R these are plausible values
Survey <- svydesign(ids = ~1, weights = ~wt, data = a) # set the weight
Subsequently, I conducted a multi-group CFA:
# Model without population corrections
fit <- cfa(model, data=Plav[[1]], estimator = 'MLR', missing = 'default', group = 'gender',group.equal = c("loadings"))
# Model with population corrections
fitSurvey <- lavaan.survey(lavaan.fit = fit, survey.design = Survey)
The following error was returned:
Error in FUN(X[[1L]], ...) :
dims [product 1936] do not match the length of object [0]
When I remove the grouping variable and conduct an analysis on the whole sample, no error is returned.
Can anybody explain why this error is returned?
I am trying to use a random forest regressor to predict values across a raster stack, but an error prevents me from predicting "area_pct". Have I not trained the model properly?
d100 is my dataset with predictor variables d100[,4:ncol(d100)] and prediction variable d100["area_pct"].
#change na values to zero
d100[is.na(d100)] <- 0
set.seed(100)
#split dataset into training (70%) and testing (30%)
id<- sample(2,nrow(d100), replace = TRUE, prob = c(0.7,0.3))
train_100<- d100[id==1,]
test_100 <- d100[id==2,]
I train a random forest model with the randomForest package; this appears to work fine:
final_CC_rf_20 = randomForest(x=train_100[,4:ncol(train_100)], y= train_100$area_pct,
xtest=test_100[,4:ncol(test_100)], ytest=test_100$area_pct, mtry=14, importance=TRUE, ntree = 600)
Then I try to predict a raster.
New raster stack with predictor variables
sentinel_2_20 <- stack( paste(getwd(), "Sentinel_SR_clip_20.tif", sep="/") )
area_classified_20_2018 <- predict(object = final_CC_rf_20 , newdata = sentinel_2_20,type = 'response', progress = 'window')
but this error appears:
#Error in predict.randomForest(object = final_CC_rf_20, newdata = sentinel_2_20, :
# No forest component in the object
Any help would be extremely useful.
The arguments you are using for predict (with raster data) are not correct. The first argument, object, should be the raster data, the second argument, model, should be the fitted model. There is no argument newdata.
Another problem is that you use keep.forest=FALSE which is the default when xtest is not NULL. You could set keep.forest=TRUE but that is not a good approach, generally, as you should fit your model with all data before you make a prediction (you are no longer evaluating your model). Thus, I would suggest fitting your model without xtest, like this
rfmod <- randomForest(x=d100[,4:ncol(d100)], y=d100$area_pct,
mtry=14, importance=TRUE, ntree = 600)
And then do
p <- predict(sentinel_2_20, rfmod, type='response')
See ?raster::predict or ?terra::predict for working examples
I am working on a project to forecast store sales in order to learn forecasting. So far I have successfully used the simple auto.arima() function. To make the forecasts more accurate, I want to use covariates. I have defined covariates such as holidays and promotions, which affect store sales, and passed them through the xreg argument with the help of this post:
How to setup xreg argument in auto.arima() in R?
But my code fails at line:
ARIMAfit <- auto.arima(saledata, xreg=covariates)
and gives error saying:
Error in model.frame.default(formula = x ~ xreg, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'xreg')
In addition: Warning message:
In !is.na(x) & !is.na(rowSums(xreg)) :
  longer object length is not a multiple of shorter object length
Below is a link to my dataset: https://drive.google.com/file/d/0B-KJYBgmb044blZGSWhHNEoxaHM/view?usp=sharing
This is my code:
data = read.csv("xdata.csv")[1:96,]
View(data)
saledata <- ts(data[1:96,4],start=1)
View(saledata)
saledata[saledata == 0] <- 1
View(saledata)
covariates = cbind(DayOfWeek=model.matrix(~as.factor(data$DayOfWeek)),
Customers=data$Customers,
Open=data$Open,
Promo=data$Promo,
SchoolHoliday=data$SchoolHoliday)
View(head(covariates))
# Remove intercept
covariates <- covariates[,-1]
View(covariates)
require(forecast)
ARIMAfit <- auto.arima(saledata, xreg=covariates) # HERE IS THE ERROR LINE
summary(ARIMAfit)
Also, please tell me how I can forecast the next 48 days. I know how to forecast using the simple auto.arima() and n.ahead, but I don't know how to do it when xreg is used.
A few points. One, you can just convert the entire matrix to a ts object and then isolate the variables later. Second, if you are using covariates in your arima model then you will need to provide them when you forecast out-of-sample. This may mean forecasting each of the covariates before generating forecasts for your variable of interest. In the example below I split the data into two samples for simplicity.
library(forecast)
dta = read.csv("xdata.csv")[1:96,]
dta <- ts(dta, start = 1)
# to illustrate out-of-sample forecasting with covariates, let's split the data
train <- window(dta, end = 90)
test <- window(dta, start = 91)
# fit model
covariates <- c("DayOfWeek", "Customers", "Open", "Promo", "SchoolHoliday")
fit <- auto.arima(train[,"Sales"], xreg = train[, covariates])
# forecast
fcast <- forecast(fit, xreg = test[, covariates])
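For the 48-day horizon asked about, the same idea can be sketched with base R's arima() and predict() in place of auto.arima(). This is only an illustration on simulated data: the names promo, xreg_train, and xreg_future are hypothetical, and it assumes the future covariate values (for example, planned promotions) are already known. The key constraint is that the training covariates need one row per observation and the future covariates one row per forecast step.

```r
set.seed(1)
n <- 96   # observed days
h <- 48   # days to forecast

# Hypothetical covariate known for both the observed and the future period
promo <- rbinom(n + h, 1, 0.3)

# Simulated sales: a promotion effect plus AR(1) noise
sales <- 100 + 20 * promo[1:n] + as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- ts(sales)

# Covariate matrices: n rows for fitting, h rows for forecasting
xreg_train  <- cbind(promo = promo[1:n])
xreg_future <- cbind(promo = promo[(n + 1):(n + h)])

# A length mismatch here is what triggers "variable lengths differ"
stopifnot(nrow(xreg_train) == length(y))

# Fit a regression with AR(1) errors, then forecast h steps ahead
fit <- arima(y, order = c(1, 0, 0), xreg = xreg_train)
fc  <- predict(fit, n.ahead = h, newxreg = xreg_future)

length(fc$pred)
```

With the forecast package, passing a 48-row matrix as xreg to forecast() plays the same role as newxreg does here.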
After fitting a model with glm I got this as a result:
Warning message:
glm.fit: Adjusted probabilities with numerical value 0 or 1.
After some research on Google, I tried with the brglm package. When I try to apply backward elimination on the model, I get the following error:
Error in do.call("glm.control", control) : second argument must be a list.
I searched on Google but I didn't find anything.
Here is my code with brglm:
library(mlbench)
#require(Amelia)
library(caTools)
library(mlr)
library(ciTools)
library(brglm)
data("BreastCancer")
data_bc <- BreastCancer
data_bc
head(data_bc)
dim(data_bc)
#Delete id column
data_bc<- data_bc[,-1]
data_bc
dim(data_bc)
str(data_bc)
# convert all factors columns to be numeric except class.
for(i in 1:9){
data_bc[,i]<- as.numeric(as.character(data_bc[,i]))
}
str(data_bc)
#convert class: benign and malignant to binary 0 and 1:
data_bc$Class<-ifelse(data_bc$Class=="malignant",1,0)
# now convert class to factor
data_bc$Class<- factor(data_bc$Class, levels = c(0,1))
str(data_bc)
model <- brglm(formula = Class~.^2, data = data_bc, family = "binomial",
na.action = na.exclude )
summary(model)
#Backward Elimination:
final <- step(model, direction = "backward")
You can work around this by using the brglm2 package, which supersedes the brglm package anyway:
library(brglm2)
model <- glm(formula = Class~.^2, data = na.omit(data_bc), family = "binomial",
na.action = na.fail, method="brglmFit" )
final <- step(model, direction = "backward")
length(coef(model)) ## 46
length(coef(final)) ## 42
setdiff(names(coef(model)), names(coef(final)))
## [1] "Cl.thickness:Epith.c.size" "Cell.size:Marg.adhesion"
## [3] "Cell.shape:Bl.cromatin" "Bl.cromatin:Mitoses"
Some general concerns about your approach:
stepwise reduction is one of the worst forms of model reduction (cf. lasso, ridge, elasticnet ...)
in the presence of missing data, model comparison (e.g. by AIC) is questionable, as different models will be fitted to different subsets of the data. Given that you are only going to lose a small fraction of your data by using na.omit() (compare nrow(data_bc) with sum(complete.cases(data_bc))), I would strongly recommend dropping observations with NA values from the data set before starting
it's also not clear to me that comparing penalized models via AIC is statistically appropriate (see here)
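The missing-data concern can be illustrated with a small sketch. It uses the built-in airquality data and lm() purely as stand-ins (not the BreastCancer data or the brglm fit from the question), because they need no extra packages; the variable names n_small, n_big, and aq are hypothetical.

```r
# Stand-in data: built-in airquality, which has NAs in Ozone and Solar.R
data(airquality)

# Each model silently drops the rows that are incomplete for its own variables
fit_small <- lm(Ozone ~ Wind, data = airquality)
fit_big   <- lm(Ozone ~ Wind + Solar.R, data = airquality)
n_small <- nobs(fit_small)
n_big   <- nobs(fit_big)
c(small = n_small, big = n_big)   # the two fits use different numbers of rows

# Dropping incomplete rows first puts every candidate model on the same rows
aq <- na.omit(airquality)
n_small_cc <- nobs(lm(Ozone ~ Wind, data = aq))
n_big_cc   <- nobs(lm(Ozone ~ Wind + Solar.R, data = aq))
c(small = n_small_cc, big = n_big_cc)
```

Once the row counts agree, an AIC comparison between the two fits at least refers to the same observations.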
I am trying to fit a Random Forest model on a dataset to predict a two-class variable. The code is below, and it returns the error shown after it. The variable Customer Count is in the dataset, yet the error is still thrown.
This is for my predictive model. I have tried to reorganize the dataset to get around Customer Count as the first variable. I have also tried to trim the dataset so it is not as large and that maybe that was an issue.
# Load the dataset and explore
library(readxl)
rawData <- read_excel("StrippedTransformerModelData.xlsx")
View(rawData)
head(rawData)
str(rawData)
summary(rawData)
# Split into Train and Validation sets
# Training Set : Validation Set = 70 : 30 (random)
set.seed(100)
train <- sample(nrow(rawData), 0.7*nrow(rawData), replace = FALSE)
TrainSet <- rawData[train,]
ValidSet <- rawData[-train,]
summary(TrainSet)
summary(ValidSet)
# Create a Random Forest model with default parameters
library(randomForest)
model1 <- randomForest(Failure ~ ., data = TrainSet, ntree = 500, mtry = 6, importance = TRUE)
model1
Error in eval(predvars, data, env) : object 'Customer Count' not found.
The variable Customer Count is in the dataset for sure and I don't know why it is saying it is not found.
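One possible explanation, offered as an assumption because the spreadsheet itself is not shown: read_excel() keeps column names exactly as they appear in the file, so "Customer Count" contains a space and is not a syntactically valid R name, and formula interfaces can fail to look such names up. A minimal sketch of renaming the columns with base R's make.names() follows; the data frame raw is a hypothetical stand-in for rawData.

```r
# Hypothetical stand-in for the spreadsheet: one column name contains a space
raw <- data.frame(`Customer Count` = c(10, 25, 7, 40),
                  Failure = factor(c("no", "yes", "no", "yes")),
                  check.names = FALSE)
names(raw)     # "Customer Count" "Failure"

# Convert every column name to a syntactically valid, unique R name
clean <- raw
names(clean) <- make.names(names(clean), unique = TRUE)
names(clean)   # "Customer.Count" "Failure"
```

If this is the cause, applying the same renaming to rawData before the train/validation split would let the formula Failure ~ . resolve every predictor.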
I have run a Nested Logit model in R using the mlogit package. I am now trying to measure marginal effects/elasticities and keep running into an error. Here I have recreated the error by modifying the vignette by the package author:
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
m <- mlogit(mode ~ price | income | catch, data = Fish,
nests=list(water=c("boat","charter"),
land=c("beach","pier")))
# compute a data.frame containing the mean value of the covariates in the sample
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
catch = tapply(catch, index(m)$alt, mean),
income = mean(income)))
# compute the marginal effects (the second one is an elasticity)
effects(m, covariate = "income", data = z)
I get the following error:
Error in `colnames<-`(`*tmp*`, value = c("beach", "boat", "charter", "pier" :
attempt to set 'colnames' on an object with less than two dimensions
In addition: Warning message:
In cbind(Gb, Gl) :
number of rows of result is not a multiple of vector length (arg 2)
This works fine when I do not have a nested model (for example, a regular Multinomial Logit), and that case has been covered in some previous Stack Overflow questions, but something odd is happening specifically at the step of re-predicting on a changed data frame (in this case the means frame z).
I'll note that the solution in "marginal effects of mlogit in R" did not help me.