VAR with exogenous variables - r

I am trying to fit a VAR model in R with an exogenous variable, using the following data:
VARM <- data.frame(y,x1,x2,x3) #x3 is the exogenous variable
First, I want to choose the correct lag order by using VARselect:
VARselect(VARM, lag.max = 6, type = "const" , exogen=x3)
I then get the following error: "different row size of y and exogen"
I can't figure out what's causing this. I have viewed the data frame and confirmed that the row counts match and that there are no missing observations. I've tried various ways of passing the x3 variable, but the closest I could get is this message when VARselect runs:
"No column names supplied in exogen, using: exo1 , instead"

It seems that you were almost there. The details section of the VARselect documentation says: "providing a matrix object for exogen". If, in addition, you want to avoid the warning (it is a warning, not an error) "No column names supplied in exogen, using: exo1 , instead", you should provide a named matrix. For example:
library(vars)
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
model <- VARselect(df, exogen = cbind(x3 = rnorm(50)))


SMOTE target variable not found in R

Why does an error saying that my target variable is not found appear when I run the SMOTE function in R? I am using the smotefamily package to run this function.
tv_smote <- SMOTE(tv_smote, Churn, K = 5, dup_size = 0)
Error in table(target) : object 'Churn' not found
Generally, you should include the minimum code and data needed to reproduce the problem. This saves time and gives you more chance of getting an answer. However, try this...
tv_smote <- SMOTE(tv_smote, tv_smote$Churn, K = 5, dup_size = 0)
I can get the same error by doing this:
library(smotefamily)
df <- data.frame(x = 1:8, y = 8:1)
df_smote <- SMOTE(df, y, K = 3, dup_size = 0)
It appears to me that SMOTE doesn't know y is a column name. In the documentation, ?SMOTE, it says that the target argument is "A vector of a target class attribute corresponding to a dataset X." My interpretation of this is that you need to supply a vector, not a name. Changing it to df_smote <- SMOTE(df, df$y, K = 3, dup_size = 0) gets past that part.
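The lookup problem described above can be reproduced with base R alone. The following is a minimal sketch, assuming a toy data frame and using table() as a stand-in for the call that fails inside SMOTE; it assumes no variable named y already exists in your session.

```r
# Hypothetical toy data: y exists only as a column of df, not as a variable.
df <- data.frame(x = 1:8, y = rep(c(0, 1), times = 4))

# A bare name is looked up as a variable, not as a column, so this fails
# (assuming no object called y is defined in the session).
bare_name_msg <- tryCatch(
  {
    table(y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(bare_name_msg)

# Passing the column itself hands the function an actual vector.
target_counts <- table(df$y)
print(target_counts)
```

Any function that expects a vector argument behaves this way unless it is written to evaluate names inside a data frame.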
I am not familiar with the SMOTE function, but from testing it would appear that SMOTE cannot accept dataframes with any factor type columns. I get Error in get.knnx(data, query, k, algorithm) : Data non-numeric when using as.factor.

Error in `$<-.data.frame`(`*tmp*`, "P_value", value = 9.66218350888067e-05) : replacement has 1 row, data has 0

Can someone help me with my code? I have code that fits a lot of logistic regressions at the same time. I used the same code for an lm model and it worked quite well, but after I adapted it to a glm model it no longer works.
Output_logistic <- data.frame()
glm_output = glm(test[,1] ~ test_2[,1], family = binomial ('logit'))
Output_2 <- data.frame(R_spuared = summary(glm_output)$r.squared)
Output_2$P_value <- summary(glm_output)$coefficients[2,4]
Output_2$Variabele <- paste(colnames(test))
Output_2$Variabele_1 <- paste(colnames(test_2))
Output_2$N_NA <- length(glm_output$na.action)
Output_2$df <- paste(glm_output$df.residual)
Output_logistic <- rbind(Output_logistic,Output_2)
Running this code gives the following error:
Error in `$<-.data.frame`(`*tmp*`, "P_value", value = 9.66218350888067e-05) :
replacement has 1 row, data has 0
Does anybody know what I have to change so that the code works?
Thanks in advance
Your Output_2 is an empty data.frame (it has no rows) because summary(glm_output)$r.squared does not exist, because glm doesn’t report this value.
If you need the R-squared value you’ll have to calculate it yourself. But to fix the error you can simply change your code to construct the data-frame from the existing data in the summary:
output_2 = data.frame(
P_value = summary(glm_output)$coefficients[2, 4],
Variable = colnames(test),
# … etc.
)
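If a goodness-of-fit number is still wanted, one common substitute is McFadden's pseudo R-squared. It is not the same quantity as the R-squared from lm, and it is only one of several pseudo R-squared definitions. The sketch below uses simulated data; the names predictor, outcome and Pseudo_R2 are made up for illustration.

```r
# Simulated stand-in data (hypothetical; not the asker's test/test_2 objects).
set.seed(42)
n <- 100
predictor <- rnorm(n)
outcome <- rbinom(n, size = 1, prob = plogis(0.5 + predictor))

glm_output <- glm(outcome ~ predictor, family = binomial("logit"))

# summary.glm has no r.squared element, so this is NULL.
print(is.null(summary(glm_output)$r.squared))

# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(null model).
# For a 0/1 response this equals 1 - residual deviance / null deviance.
pseudo_r2 <- 1 - glm_output$deviance / glm_output$null.deviance

# Build the one-row result directly from values that do exist.
output_2 <- data.frame(
  Pseudo_R2 = pseudo_r2,
  P_value = summary(glm_output)$coefficients[2, 4]
)
print(output_2)
```

Because every column is created from a length-one value, the data frame has exactly one row and can be passed to rbind as in the original code.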

Undefined columns error when referencing element of data.frame

I am trying to plot many graphs, and I am having an error referencing elements of a data.frame.
Rather than manually change the variable names I would like to loop through and reference the specific variable names.
When I do this I get the "undefined columns selected" error.
When I run this code, I get the correct plot:
xy <- lm(Unfairness_Scale ~ OS_ImpCoreV_A * ImpCoreV_A, data =
branch_annual)
with(branch_annual, interact_plot(xy, pred = OS_ImpCoreV_A, modx =
ImpCoreV_A))
When I run this code, I get the "undefined columns selected" error:
xy <- lm(branch_annual$Unfairness_Scale ~ branch_annual$OS_ImpCoreV_A *
branch_annual$ImpCoreV_A, data = branch_annual)
with(branch_annual, interact_plot(xy, pred = branch_annual$OS_ImpCoreV_A,
modx = branch_annual$ImpCoreV_A))
I have tried several different methods to reference the elements of the data frame but I keep getting the same error. What am I not understanding correctly?
Thanks,
Sebastian
You can build the formula from character strings supplied by your loop, convert it with as.formula, and pass it to lm together with data = branch_annual. The variable names then refer to columns of the data frame, so no branch_annual$ prefix is needed.
xy <- lm(as.formula(paste('Unfairness_Scale', '~', 'OS_ImpCoreV_A', '*', 'ImpCoreV_A')),
data = branch_annual)
with(branch_annual, interact_plot(xy, pred = 'OS_ImpCoreV_A',
modx = 'ImpCoreV_A'))
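To show how this fits into a loop, here is a minimal sketch that uses the built-in mtcars data as a stand-in for branch_annual. The choice of outcome, moderator and predictor names is an assumption made for illustration, and the plotting call is left out so that the example needs only base R.

```r
# Hypothetical stand-in for branch_annual: the built-in mtcars data.
outcome <- "mpg"
moderator <- "wt"
predictors <- c("hp", "disp", "qsec")

models <- list()
for (pred in predictors) {
  # Build e.g. "mpg ~ hp * wt" as text, then convert it to a formula.
  model_formula <- as.formula(paste(outcome, "~", pred, "*", moderator))
  models[[pred]] <- lm(model_formula, data = mtcars)
}

# Each model has an intercept, two main effects and one interaction term.
print(sapply(models, function(m) length(coef(m))))
```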

"Input datasets must be dataframes" error in kamila package in R

I have a mixed-type data set with one continuous variable and eight categorical variables, so I wanted to try kamila clustering. It gives me an error when I use one continuous variable, but it works when I use two continuous variables.
library(kamila)
data <- read.csv("mixed.csv",header=FALSE,sep=";")
conInd <- 9
conVars <- data[,conInd]
conVars <- data.frame(scale(conVars))
catVarsFac <- data[,c(1,2,3,4,5,6,7,8)]
catVarsFac[] <- lapply(catVarsFac, factor)
kamRes <- kamila(conVars, catVarsFac, numClust=5, numInit=10,calcNumClust = "ps",numPredStrCvRun = 10, predStrThresh = 0.5)
Error in kamila(conVar = conVar[testInd, ], catFactor =
catFactor[testInd, : Input datasets must be dataframes
I think the problem is that the function assumes that you have at least two of both data types (i.e. >= 2 continuous variables, and >= 2 categorical variables). It looks like you supplied a single column index (conInd = 9, just column 9), so you have only one continuous variable in your data. Try adding another continuous variable to your continuous data.
I had the same problem (with categoricals) and this approach fixed it for me.
I think the ultimate source of the error in the program is at around line 170 of the source code. Here's the relevant snippet...
numObs <- nrow(conVar)
numInTest <- floor(numObs/2)
for (cvRun in 1:numPredStrCvRun) {
for (ithNcInd in 1:length(numClust)) {
testInd <- sample(numObs, size = numInTest, replace = FALSE)
testClust <- kamila(conVar = conVar[testInd,],
catFactor = catFactor[testInd, ],
numClust = numClust[ithNcInd],
numInit = numInit, conWeights = conWeights,
catWeights = catWeights, maxIter = maxIter,
conInitMethod = conInitMethod, catBw = catBw,
verbose = FALSE)
When the code selects the test rows with conVar[testInd, ], it is subsetting the rows of a one-column data.frame, and in that case the [ operator returns a plain vector by default (drop = TRUE). So the inner call to kamila receives something that is not a data.frame even though you did supply a data.frame. That's where the error comes from.
If you can't dig up another variable to add to your data, you could edit the code so that the calls to kamila in the cvRun for loop wrap the data.frame() function around any subsetted conVar or catFactor, e.g.
testClust <- kamila(conVar = data.frame(conVar[testInd, ]),
catFactor = data.frame(catFactor[testInd, ]), ...)
and just save that as your own version of the function called say, my_kamila, which you could use instead.
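The subsetting behaviour behind this can be checked in isolation. The sketch below uses a made-up one-column data frame; passing drop = FALSE is a base R alternative to wrapping the result in data.frame(), and it also keeps the original column name.

```r
# Hypothetical one-column data frame standing in for conVar.
one_col <- data.frame(v = c(1.5, 2.5, 3.5, 4.5))
rows <- c(1, 3)

dropped <- one_col[rows, ]             # default drop = TRUE: a numeric vector
kept <- one_col[rows, , drop = FALSE]  # stays a one-column data.frame

print(is.data.frame(dropped))
print(is.data.frame(kept))
print(names(kept))
```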
Hope this helps.

Predict using multiple variables in R

I have a slight problem with my R coursework.
I have made the following dataset:
Now I'm going to plot the values based on this dataset using the following command:
plot(x ~ Group.1, data = jarelmaks_vaikelaen23mean,
xlab = "Vanus", ylab = "PD", main = "Järelmaks ja väikelaen")
After that, I create a glm model using the following command. The difference is that I now use the original dataset (the dependent variable takes the values 1/0).
GLM command:
jarelmaks_vaikelaen23_mudel <- glm(Default ~ Vanus.aastates + Toode,
family = binomial(link = 'logit'), data = jarelmaks_vaikelaen_23)
Now, I'm trying to predict the values using my model.
predict(jarelmaks_vaikelaen23_mudel,data.frame(Vanus.aastates=x),type = "resp")
Unfortunately, I get the following error message:
Error in data.frame(Vanus.aastates = x) : object 'x' not found
Can you give me some ideas on how to solve this problem, or explain how the predict() command works?
When you provide a data-frame to the predict function's newdata argument, the data-frame should have column names that match the variables used as independent variables in your model-fitting step. That is, your predict call should look like
predict(
jarelmaks_vaikelaen23_mudel,
newdata = data.frame(
Vanus.aastates = SOMETHING,
Toode = SOMETHING_ELSE
),
type = "response"
)
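As a concrete illustration of that pattern, here is a minimal sketch on simulated data. The data frame loans and the columns age, product and default are hypothetical stand-ins for the asker's dataset, not its real contents.

```r
# Simulated stand-in data (hypothetical column names and coefficients).
set.seed(1)
n <- 200
loans <- data.frame(
  age = runif(n, 18, 70),
  product = factor(sample(c("A", "B"), n, replace = TRUE))
)
loans$default <- rbinom(
  n, 1, plogis(-2 + 0.03 * loans$age + 0.5 * (loans$product == "B"))
)

model <- glm(default ~ age + product,
             family = binomial(link = "logit"), data = loans)

# newdata must contain every predictor used in the model, under the same names.
new_cases <- data.frame(
  age = c(25, 40, 60),
  product = factor(c("A", "B", "A"), levels = levels(loans$product))
)
predicted_pd <- predict(model, newdata = new_cases, type = "response")
print(predicted_pd)
```

With type = "response" the result is one predicted probability per row of newdata.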
