Problems with function MatchIt::matchit - r

Hi, I'd like to fit a logistic regression adjusted for the propensity score. But first I'd like to match treated and untreated units on their propensity scores. Here's my first script:
mod_match<-matchit(Treatment~Prop.score, method = "nearest", data = Epidemio.prop,caliper = 0.05)
Here are the error messages
Error in matchit(Treatment~Prop.score, method = "nearest", data = Epidemio.prop, : Missing values exist in the data
I therefore dropped every variable from the data except the two variables of interest, which have no missing data.
mod_match<-matchit(Treatment~Prop.score,
method = "nearest", data = Epidemio.prop[c("Treatment","Prop.score")],
caliper = 0.1)
I still have error messages.
Error in weights.matrix(match.matrix, treat, discarded) : No units were matched
In addition: Warning messages:
1: In max(pscore[treat == 0]) : no non-missing arguments to max; returning -Inf
2: In max(pscore[treat == 1]) : no non-missing arguments to max; returning -Inf
3: In min(pscore[treat == 0]) : no non-missing arguments to min; returning Inf
4: In min(pscore[treat == 1]) : no non-missing arguments to min; returning Inf

The problem is that you are not giving any variables to be used in the propensity score calculation (i.e., you are only giving Treatment and Prop.score, whose meaning is not clear to me).
You need to pass a set of auxiliary variables that are going to be used to fit the model predicting propensity scores.
Also, in my experience MatchIt throws a missing-values error even when the missing values are in variables that are not included in the model.
I recommend you create an auxiliary data frame with the variables you want to use in the model, and delete (or impute) any of the observations with missing values in any of those variables.
Something like this:
# Treatment indicator plus the covariates for the model (x1, x2, x3, ... are placeholders)
vars_to_keep <- c("Treatment", "x1", "x2", "x3")
aux_df <- df[vars_to_keep]
# Select only complete cases (i.e. drop observations with at least one missing value)
aux_df <- aux_df[complete.cases(aux_df), ]
mod_match <- matchit(Treatment ~ x1 + x2 + x3, method = "nearest", data = aux_df)
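For anyone who wants to try the idea end to end, here is a minimal sketch on simulated data. The data frame, the covariates x1 to x3 and the all-NA column unused are made up for illustration; only the complete.cases() filtering and the matchit() call mirror the steps described above. The matching step runs only if MatchIt is installed.

```r
# Simulated data (hypothetical): three covariates, a binary treatment,
# and one column that is entirely NA but not used in the model.
set.seed(123)
n <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.5),
  unused = NA_real_
)
df$Treatment <- rbinom(n, 1, plogis(-1 + 0.5 * df$x1))
df$x2[c(3, 17, 58)] <- NA  # a few missing covariate values

# Keep only the model variables, then drop incomplete rows
vars_to_keep <- c("Treatment", "x1", "x2", "x3")
aux_df <- df[vars_to_keep]
aux_df <- aux_df[complete.cases(aux_df), ]

# Nearest-neighbour matching on a propensity score estimated from x1..x3
if (requireNamespace("MatchIt", quietly = TRUE)) {
  mod_match <- MatchIt::matchit(Treatment ~ x1 + x2 + x3,
                                data = aux_df, method = "nearest")
  print(summary(mod_match))
}
```

Subsetting the columns first matters here: calling complete.cases() on the full df would drop every row because of the unused column.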
Nevertheless, this tutorial is a much more comprehensive help. I recommend having a look at it.
Good luck!

Related

Error when trying to fit Hierarchical GAMs (Model GS or S) using mgcv

I have a large dataset (~100k observations) of presence/absence data to which I am trying to fit a hierarchical GAM with group-level smooths that share a penalty (i.e. model 'S' in Pedersen et al. 2019). The data consist of temp as a numeric variable and region (5 groups) as a factor.
Here is a simple version of the model that I am trying to fit.
modS1 <- gam(occurrence ~ s(temp, region), family = binomial,
data = df, method = "REML")
modS2 <- gam(occurrence ~ s(temp, region, k = c(10, 4)), family = binomial,
data = df, method = "REML")
In the first case I received the following error, which I assumed was because k was set too high for region, given that there are only 5 different regions in the data set:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
In the second case I attempt to lower k for region and receive this error:
Error in if (k < M + 1) { : the condition has length > 1
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
I can fit models G, GI and I from Pedersen et al. 2019 with no issues. It is with models GS and S that I run into problems.
If anyone has any insights I would really appreciate it!
The bs = "fs" argument in the code you're using as a guide is important. If we start at the ?s help page and click on the link to the ?smooth.terms help page, we see:
Factor smooth interactions
bs="fs" Smooth factor interactions are often produced using by variables (see gam.models), but a special smoother class (see factor.smooth.interaction) is available for the case in which a smooth is required at each of a large number of factor levels (for example a smooth for each patient in a study), and each smooth should have the same smoothing parameter. The "fs" smoothers are set up to be efficient when used with gamm, and have penalties on each null space component (i.e. they are fully ‘random effects’).
You need to use a smoothing basis appropriate for factors.
Notably, if you take your source code and remove the bs = "fs" argument and attempt to run gam(log(uptake) ~ s(log(conc), Plant_uo, k=5, m=2), data=CO2, method="REML"), it will produce the same error that you got.
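For comparison, here is a sketch of the same call with bs = "fs" kept in place, using the built-in CO2 data. The Plant_uo column is created here as an unordered copy of Plant, which is an assumption about how the guide code prepared it.

```r
library(mgcv)

# Unordered copy of the Plant factor (assumed to match the guide's Plant_uo)
CO2 <- transform(datasets::CO2, Plant_uo = factor(Plant, ordered = FALSE))

# Model S: one smooth per plant, all sharing a single smoothing parameter
CO2_modS <- gam(log(uptake) ~ s(log(conc), Plant_uo, k = 5, bs = "fs", m = 2),
                data = CO2, method = "REML")
summary(CO2_modS)
```

By analogy, the model in the question would presumably become s(temp, region, bs = "fs", k = 10) with a single k, since the factor only indexes the smooths and does not need a basis dimension of its own.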

Approx(): Need at least two non-NA values to interpolate R

I am trying to use nnetar for some time series forecasting, and running into an issue when the data has repeating values (i.e. the same counts observed in a time period). To reproduce the error I have created a list of values and replaced the first 10 values with a 0:
dummy.ls <- runif(n=80)
for(i in 1:10)
dummy.ls[i] <- 0
fit <- nnetar(dummy.ls, lambda=0)
When running the nnetar function I receive the following error:
Error in approx(idx, x[idx], tt, rule = 2) :
need at least two non-NA values to interpolate
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
I see similar errors in other questions, but I am unsure how to avoid this one.
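One thing worth checking, offered as a likely cause rather than a confirmed diagnosis: a Box-Cox transformation with lambda = 0 is the natural log, and log(0) is -Inf. The base R sketch below applies log() directly to the dummy data (it does not call nnetar) to show how many values stop being finite.

```r
set.seed(1)
dummy.ls <- runif(n = 80)
dummy.ls[1:10] <- 0

# Box-Cox with lambda = 0 reduces to log(x); zeros map to -Inf
transformed <- log(dummy.ls)
n_non_finite <- sum(!is.finite(transformed))
n_non_finite
```

If that is indeed what triggers the error, options to consider are leaving lambda unset or adding a small positive offset to the series before transforming it.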

having issues plotting/doing analysis as part of a column loop in R

I am trying to build a loop in which I fit a non-linear regression to each of several variables, then create the graphs and save them as a PDF. Here is a fraction of the code:
library(propagate)
library(nlstools)
nls_fit_best<-nls(reformulate("a*IDV^b", i),
start = list(a = 1, b = 1),
control = list(minFactor=0, maxiter=nls_iterations),
data=df)
#calculates the values for the confidence intervals
preds <- data.frame(IDV = seq(min(IDV), max(IDV), length=30))
y.conf <- predictNLS(nls_fit_best, newdata=preds, interval="confidence", alpha=0.05, nsim=10000)$summary
best_fit_coeffs<-as.data.frame(round(coef(nls_fit_best), digits=3))
residual_plots<-nlsResiduals(nls_fit_best)
par(mfrow = c(3,3), mar=c(5.1,4.1,2,1.1), oma=c(0,0,0,0))
layout(matrix(c(1,1,1,1,1,1,2,3,4),nrow=3, byrow = TRUE))
#plots the values into a graph with a bit of wiggle room
plot(i~IDV,
data=cor_data_centered,
ylim=c(0,max(i)+0.2),xlim=c(0,max(IDV)+0.2))
#plots best fit line
lines(IDV,predict(nls_fit_best),lty=2,col="black",lwd=3)
#plots 95% confidence interval and info
matlines(preds, y.conf[,c("Sim.2.5%", "Sim.97.5%")], col="black", lty="dashed")
mtext(paste("power function coeffs",best_fit_coeffs,sep=" "), side=3)
plot(residual_plots, which=2)
plot(residual_plots, which=4)
plot(residual_plots, which=6)
}
where IDV is my independent variable (i.e., X), which has 13 measurements. There are 14 variables in df, each with 13 measurements. To keep it simple, let's say we have this:
IDV=1:13
df <- as.data.frame(matrix(1, ncol = 14, nrow = 13))
When I run the code I get the following error message:
Error in model.frame.default(formula = ~i + IDV, data = df) :
variable lengths differ (found for 'IDV')
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
It seems to me that the code is trying to correlate the actual column names of the data frame with IDV, instead of correlating each variable in df with IDV. I suppose the error lies in the initial loop, but I do not know how to fix it.
In case anyone has a similar problem, I managed to solve the issue by using the minimum and maximum of each variable to set the plot boundaries, with the following code:
max_loop=c()
max_loop[i]= max(linear_plots[,i])
min_loop=c()
min_loop[i]= min(linear_plots[,i])
#plots the values into a graph with a bit of wiggle room
plot(reformulate("IDV",i),
data=cor_data_centered,
ylim=c(min_loop[i]-wiggle_room_y,max_loop[i]+wiggle_room_y),
xlim=c(min(IDV)-wiggle_room_x,max(IDV)+wiggle_room_x))
With "wiggle room" being whatever value you want to use.
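A self-contained sketch of the same idea, in case it helps: inside the loop i is a column name (a character string), so the column has to be looked up with df[[i]] and the formula built with reformulate(). The data, the number of columns and the output file name are made up for illustration.

```r
# Hypothetical data: 13 measurements of IDV and three response columns
set.seed(42)
IDV <- 1:13
df <- as.data.frame(matrix(runif(13 * 3, min = 1, max = 5), nrow = 13))
df$IDV <- IDV
wiggle_room <- 0.2

limits <- list()
pdf(file.path(tempdir(), "loop_plots.pdf"))
for (i in setdiff(names(df), "IDV")) {
  # i is a string such as "V1": build V1 ~ IDV and take limits from the column
  f <- reformulate("IDV", response = i)
  ylim <- c(min(df[[i]]) - wiggle_room, max(df[[i]]) + wiggle_room)
  limits[[i]] <- ylim
  plot(f, data = df, ylim = ylim,
       xlim = range(df$IDV) + c(-wiggle_room, wiggle_room))
}
dev.off()
```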

Error in panel spatial model in R using spml

I am trying to fit a panel spatial model in R using the spml function (package splm). I first define the NxN weighting matrix as follows:
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1/x)
w50 <- nb2listw(neib,zero.policy=TRUE, glist=idlist, style="W")
Thus I define two observations to be neighbours if they are at most 50 km apart. The weight attached to each pair of neighbouring observations is the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations without neighbours are assigned a vector of zero weights.
Once I do this I try to fit the panel spatial model in the following way
mod <- spml(y ~ x , data = data_p, listw = w50, na.action = na.fail, lag = F, spatial.error = "b", model = "within", effect = "twoways" ,zero.policy=TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this is related to the observations without neighbours. Can anyone please help me with this? Is there any way to deal with such observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Handle any NA values properly, both in the dataset and in the W matrix.
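Both checks can be sketched in base R on a plain weight matrix. The 4 x 4 matrix and the small panel data frame below are invented for illustration; with a real listw object you would first convert it to a matrix (spdep has listw2mat() for that).

```r
# Hypothetical inverse-distance weights; unit 4 has no neighbours
W <- matrix(c(0,    1/10, 1/20, 0,
              1/10, 0,    0,    0,
              1/20, 0,    0,    0,
              0,    0,    0,    0),
            nrow = 4, byrow = TRUE)

# Row-normalise only the rows that have at least one neighbour
row_sums <- rowSums(W)
has_neighbours <- row_sums > 0
W_std <- W
W_std[has_neighbours, ] <- W[has_neighbours, , drop = FALSE] / row_sums[has_neighbours]

# Check 1: every non-empty row sums to 1
row_normalised <- all(abs(rowSums(W_std)[has_neighbours] - 1) < 1e-12)
# Check 2: no NA/NaN/Inf in W, and count the NAs per column in the data
all_finite <- all(is.finite(W_std))
isolated_units <- which(!has_neighbours)
panel <- data.frame(y = c(1.2, NA, 0.7, 2.1), x = c(0.3, 0.5, 0.1, 0.9))
na_per_column <- colSums(is.na(panel))
```

Dividing an all-zero row by its row sum would give NaN, which is why the isolated unit is left as a row of zeros here.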

Plotting C5.0 Tree in R

I am trying to plot a C5.0 tree object in R, but it gives the following error and I can't work out how to fix it.
plot(model)
Error in partysplit(varid = as.integer(i), index = index, info = k, prob = NULL) :
minimum of ‘index’ is not equal to 1
In addition: Warning message:
In min(index, na.rm = TRUE) :
no non-missing arguments to min; returning Inf
It seems that the factor levels in your data frame contain spaces. I was facing the same issue; after I removed the spaces, it worked.
For example, if a variable has the levels " bad" and " good", change them to "bad" and "good".
"The error itself is due to NA values being passed in the index vector. The root cause is probably that the factor levels are being split on spaces" Found here https://github.com/topepo/C5.0/issues/10
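A small base R sketch of that clean-up, applied to every factor column before fitting the model. The data frame is made up, and replacing inner spaces with underscores is my own choice for illustration, not something the issue prescribes.

```r
# Hypothetical data with leading and embedded spaces in factor levels
df <- data.frame(
  quality = factor(c(" bad", " good", " bad")),
  size    = factor(c("very large", "small", "small")),
  x       = 1:3
)

# Trim surrounding whitespace and replace inner spaces in all factor levels
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], function(f) {
  levels(f) <- gsub(" ", "_", trimws(levels(f)), fixed = TRUE)
  f
})

levels(df$quality)
levels(df$size)
```

The model then needs to be refitted on the cleaned data before calling plot(model) again.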
Try this:
library(rattle)
fancyRpartPlot(model)

Resources