Structural equation modeling / path analysis using lavaan - r

I'm trying to use the sem() function after defining a simple model:
Model1 <- 'Y ~ X + M
           M ~ X'
sem(Model1, data = A)
Where A is a matrix built with the commands:
A = matrix(ncol = 3, nrow = 50)
A[, 1] = read.csv2("Mydata1", header = TRUE)
A[, 2] = read.csv2("Mydata2", header = TRUE)
A[, 3] = read.csv2("Mydata3", header = TRUE)
But the software displays:
Error in lav_data_full(data = data, group = group, cluster = cluster,
: lavaan ERROR: missing observed variables in dataset: Y M
I've also tried substituting the missing values with each variable's mean, but it displays the same error.
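The error says lavaan cannot find observed variables named Y and M in the dataset, which points at the matrix A having no column names matching the model syntax. A minimal sketch of one way to address this, assuming each file holds a single numeric column of 50 observations; the mapping of files to Y, M and X is an assumption, not something stated in the question:
# Sketch: build a data frame whose column names match the model variables.
# The file-to-variable mapping below is assumed.
library(lavaan)
A <- data.frame(
  Y = read.csv2("Mydata1", header = TRUE)[[1]],
  M = read.csv2("Mydata2", header = TRUE)[[1]],
  X = read.csv2("Mydata3", header = TRUE)[[1]]
)
Model1 <- 'Y ~ X + M
           M ~ X'
fit <- sem(Model1, data = A)
summary(fit)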

Related

rootogram() error when checking for overdispersion in GAM

I have run the GAM below and am trying to plot a rootogram() using the countreg package to check for overdispersion, but I get the error Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me, that the lengths of two vectors/objects do not match, but I am none the wiser as to how to fix it. Any help/suggestions would be appreciated. Has anyone had this problem before, and if so, how did you fix it?
This may be arising due to a peculiarity in my data, as I have never had a problem producing rootograms with other datasets.
# I cannot fit a rootogram from the following GAM
> knots2 <- list(nMonth = c(0.5, 12.5))
> sup15 <- gam(Number ~ State + Virus + State*Virus +
                 s(nMonth, bs = "cc", k = 12, by = Virus) +
                 s(Time, k = 60, by = Virus),
               data = supply.pad,
               family = nb(),
               method = "REML",
               knots = knots2)
> root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
# But can fit a rootogram from the below GAM. Note that these are different datasets but pretty much the same code.
> knots1 <- list(month = c(0.5, 12.5))
> gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
               data = rhdv.gp.pad,
               family = nb(),
               method = "REML",
               knots = knots1)
> root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)
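As a rough workaround, rather than a fix for the countreg error itself, the observed-versus-expected comparison a rootogram makes can be assembled by hand from the fitted model. A minimal sketch, assuming the sup15 model and supply.pad data from the question; recovering the NB size parameter via the family's getTheta() accessor is an assumption about the mgcv nb() family:
# Sketch: hand-rolled hanging rootogram for a negative binomial GAM
mu     <- predict(sup15, type = "response")          # fitted means
theta  <- sup15$family$getTheta(TRUE)                # NB size parameter (assumed accessor)
max_ct <- max(supply.pad$Number)
obs  <- tabulate(supply.pad$Number + 1, nbins = max_ct + 1)          # observed freq of 0..max_ct
expd <- sapply(0:max_ct, function(k) sum(dnbinom(k, size = theta, mu = mu)))
x <- 0:max_ct
plot(x, sqrt(expd), type = "l", xlab = "Count", ylab = "sqrt(Frequency)")
rect(x - 0.4, sqrt(expd) - sqrt(obs), x + 0.4, sqrt(expd), col = "grey")
abline(h = 0, lty = 2)                               # bars hanging below zero flag misfit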

How can I run regression analyses from 2 datasets?

setwd("C:/Users/sevvalayse.yurtekin/Desktop/hw3")
data = read.table('DSE501_fall2020_HW3.csv', header= T, sep=',')
attach
data
getOption("max.print")
rs<-rowSums(data[,2:76], na.rm = TRUE)
data<-cbind(data,rs)
data
library(ggplot2)
p1 <- ggplot() +
  geom_line(aes(y = rs, x = year), data = data) +
  scale_x_continuous(breaks = seq(2004, 2019, 2))
p1
model = lm(rs ~ year)
model
summary(model)
residuals(model)
predict(model)
#model.fit = lm(year ~ rs)
#summary(model.fit)
new.year <- data.frame(
  year = c(2021, 2022, 2023)
)
predict(model, newdata = new.year, interval = 'confidence')
data2 = read.table('TUIK_nufus_2019.csv', header = T, sep = ",")
data2
total = data2$Total
mydata <- data[-c(1, 2, 3), ]
model2 = lm(mydata ~ total)
model2
Hello, I get the error Error in model.frame.default(formula = mydata ~ total, drop.unused.levels = TRUE) : invalid type (list) for variable 'mydata'.
How can I fix it? I want to run a regression analysis using the two datasets.
The line that's causing the issue is model2 = lm(mydata ~ total). mydata is not a vector, which is what your dependent variable should be in the lm function. When you create mydata you do not select a column: mydata <- data[-c(1, 2, 3), <enter column name of dependent variable>]
Otherwise you can fit your model with the following syntax (provided your dependent and independent variables are in the same data frame). Here I just used y as a placeholder variable name: lm(y ~ total, data = mydata)
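For concreteness, a minimal sketch of the first suggestion, assuming the dependent variable is the row-sum column rs created earlier; that column choice is an assumption, and the lengths of mydata and total must match:
# Sketch: select a single column so the response is a numeric vector, not a data frame
mydata <- data[-c(1, 2, 3), "rs"]   # "rs" is assumed; use your actual dependent variable
model2 <- lm(mydata ~ total)
summary(model2)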

How to prepare variables for nnet classification/predict in R?

For classification I pass the variable x as the values and y as the labels, as in this example for randomForest:
library(randomForest)
iris_train_values <- iris[, c(1:4)]
iris_train_labels <- iris[, 5]
model_RF <- randomForest(x = iris_train_values, y = iris_train_labels, importance = TRUE,
                         replace = TRUE, mtry = 4, ntree = 500, na.action = na.omit,
                         do.trace = 100, type = "classification")
This approach works for many classifiers; however, when I try it with nnet I get an error:
model_nnet <- nnet(x = iris_train_values, y = iris_train_labels, size = 1, decay = 0.1)
Error in nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x = iris_train_values, y = iris_train_labels, size = 1, :
NAs introduced by coercion
Or, on another dataset, I get the error:
Error in y - tmp : non-numeric argument to binary operator
How should I change the variables to classify?
The formula syntax works:
library(nnet)
model_nnet <- nnet(Species ~ ., data = iris, size = 1)
But the matrix syntax does not:
nnet::nnet(x = iris_train_values, y = as.matrix(iris_train_labels), size = 1)
I don't understand why this doesn't work, but at least there is a workaround.
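One possible explanation (my assumption, not stated in the original answer) is that nnet's matrix interface expects a numeric target rather than a factor. A minimal sketch using nnet::class.ind() to build a class-indicator matrix:
# Sketch: encode the factor labels as a 0/1 indicator matrix (one column per class)
library(nnet)
model_nnet2 <- nnet(x = iris_train_values,
                    y = class.ind(iris_train_labels),
                    size = 1, decay = 0.1, softmax = TRUE)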
predict works fine with the formula syntax:
?predict.nnet
predict(model_nnet,
        iris[c(1, 51, 101), 1:4],
        type = "class") # true classes are ['setosa', 'versicolor', 'virginica']

mgcv::gamm() and MuMIn::dredge() errors

I've been trying to fit multiple GAMs with the mgcv package inside a function, and to crudely select the most appropriate model through model-selection procedures. But my function fits the first model and then doesn't seem to recognise the input data dat again.
I get the error
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem and I've looked here, and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
library(MuMIn)   # for dredge() and get.models()

# Function to fit multiple models
best.mod <- function(dat) {
  # Set up control structure
  ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod = "L-BFGS-B")
  # AR(1)
  m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1 | Year, p = 1),
                               control = ctrl)), subset = 1)[[1]]
  # AR(2)
  m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1 | Year, p = 2),
                               control = ctrl)), subset = 1)[[1]]
  # AR(3)
  m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1 | Year, p = 3),
                               control = ctrl)), subset = 1)[[1]]
  ### Select best model to work with based on unselective AIC criteria
  if (AIC(m2$lme) > AIC(m1$lme)) {mod = m1} else {mod = m2}
  if (AIC(mod$lme) > AIC(m3$lme)) {mod = m3}
  return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor
get.models evaluates the call in the model's formula environment, which for gamm is (always?) .GlobalEnv, while it should be the function's environment (i.e. sys.frames(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))
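Applied to the question's best.mod function, that means replacing the get.models(...)[[1]] pattern. A minimal sketch for the AR(1) fit only, assuming dat and ctrl as defined inside the function:
# Sketch: dredge first, then evaluate the selected call in the current frame
ms1 <- dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                   data = dat, correlation = corARMA(form = ~ 1 | Year, p = 1),
                   control = ctrl))
m1 <- eval(getCall(ms1, 1))   # instead of get.models(ms1, subset = 1)[[1]]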

Passing the weights argument to a regression function inside an R function

I am trying to write an R function that runs an (optionally weighted) regression, and I am having difficulty getting the weights variable to work.
Here is a simplified version of the function.
HC <- function(data, FUN, formula, tau = 0.5, weights = NULL){
  if(is.null(weights)){
    est <- FUN(data = data, formula = formula, tau = tau)
    intercept = est$coef[["(Intercept)"]]
    zeroWorker <- exp(intercept)
  } else {
    est <- FUN(data = data, formula = formula, tau = tau, weights = weights)
    intercept = est$coef[["(Intercept)"]]
    zeroWorker <- exp(intercept)
  }
  return(zeroWorker)
}
The function works perfectly if I do not use the weights argument.
mod1 <- HC(data = mydata, formula = lin.model, tau = 0.2,
FUN = rq)
But, throws an error message when I use the weights argument.
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2,
FUN = rq, weights = weig)
I googled the problem, and this post seems to be the closest to mine, but I could still not get it to work: R : Pass argument to glm inside an R function.
Any help will be appreciated.
My problem can be replicated with:
library("quantreg")
data(engel)
mydata <- engel
mydata$weig <- with(mydata, log(sqrt(income))) # Create a fictitious weight variable
lin.model <- foodexp ~ income
mod1 <- HC(data = mydata, formula = lin.model, tau = 0.2,
FUN = rq) # This works perfectly
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2,
FUN = rq, weights = weig) # throws an error.
Error in HC(data = mydata, formula = lin.model, tau = 0.2, FUN = rq, weights = weig) :
object 'weig' not found
You have two problems. The first error occurs because you're trying to use the weig variable without referencing it as coming from the mydata dataset. Try using mydata$weig. That resolves the first error, but you then hit the actual one, related to how the weights argument is passed:
Error in model.frame.default(formula = formula, data = data, weights = substitute(weights), :
invalid type (symbol) for variable '(weights)'
The solution is to add the variable specified in HC's weights argument to the dataframe before passing it to FUN:
HC <- function(data, FUN, formula, tau = 0.5, weights = NULL){
  data$.weights <- weights
  if(is.null(weights)){
    est <- FUN(data = data, formula = formula, tau = tau)
  } else {
    est <- FUN(data = data, formula = formula, tau = tau, weights = .weights)
  }
  intercept = est$coef[["(Intercept)"]]
  zeroWorker <- exp(intercept)
  return(zeroWorker)
}
Then everything works:
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2, FUN = rq, weights = mydata$weig)
mod2
# [1] 4.697659e+47
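Why this works (my reading, not stated in the original answer): FUN receives the unevaluated name .weights, which the fitting function later evaluates inside data, where the column has just been added. The unweighted call from the question is unaffected, since assigning NULL adds no column:
# Sketch: the unweighted call is unchanged by the revised HC
mod1 <- HC(data = mydata, formula = lin.model, tau = 0.2, FUN = rq)
mod1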
