Translating Stata xtmelogit to R glmer (lme4 package)

I am trying to reproduce published Stata code exactly in R. As a first step, I exported the dataset from Stata and imported it into R. Nevertheless, I am struggling with errors in my code.
What am I doing wrong?
Original Stata Code:
xtmelogit redpref1 c.incomedif c.incomedif#c.forpop forpop i.year if (brncntr==1) || country:
My approach in R was:
dataset <- dataset%>%
filter(brncntr==1) %>%
mutate(c.incomedif = factor(incomedif))%>%
mutate(c.forpop = factor(forpop)) %>%
mutate(i.year = as.integer(year)))
library(lme4)
logit <- glmer(redpref1~ c.incomedif+ i.year|country,family=binomial,rueda4, nAGQ=0L)
summary(logit)
This approach generates an error as follows:
Error in if (ctrl$npt > (2 * n + 1)) warning("Setting npt > 2 * length(par) + 1 is not recommended.") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In (n + 1L) * (n + 2L) : NAs produced by integer overflow
>
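A possible translation, offered as a sketch rather than a verified reproduction of the published results: in Stata, the c. prefix marks a continuous variable, i. marks a categorical one, and || country: requests a random intercept per country. In lme4 the random-effects term has to be wrapped in parentheses, because | binds more loosely than +. Without them the whole right-hand side becomes a single random-slope term, which may be what leads to the overflow shown above. The data frame toy and its simulated values are hypothetical stand-ins for the exported dataset.

```r
# Hypothetical toy data standing in for the dataset exported from Stata.
set.seed(1)
n <- 600
toy <- data.frame(
  country   = factor(sample(letters[1:12], n, replace = TRUE)),
  year      = sample(2002:2006, n, replace = TRUE),
  incomedif = rnorm(n),
  forpop    = runif(n),
  brncntr   = sample(1:2, n, replace = TRUE, prob = c(0.9, 0.1))
)
toy$redpref1 <- rbinom(n, 1, plogis(0.3 * toy$incomedif))

# Without parentheses, `|` captures everything to its left as one term.
bad <- redpref1 ~ incomedif + year | country

# c. variables stay numeric, i.year becomes a factor,
# and "|| country:" becomes a random intercept (1 | country).
good <- redpref1 ~ incomedif + incomedif:forpop + forpop +
  factor(year) + (1 | country)

# Inspect the top-level operator on the right-hand side of each formula.
top_op_bad  <- as.character(bad[[3]][[1]])
top_op_good <- as.character(good[[3]][[1]])

# Fit only if lme4 is installed; nAGQ is left at its default here.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(good, data = subset(toy, brncntr == 1),
                     family = binomial)
  print(summary(fit))
}
```

Matching Stata's estimates to the last digit may additionally require tuning nAGQ, since the two programs do not necessarily use the same number of integration points by default.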

Related

Error in Z_aktuell * D : non-conformable arrays

Can anyone figure out why I get the error below after running the following code?
library(haven)
library(survival)
library(dplyr)
library(readr)
library(glmmLasso)
par(mar=c(1,1,1,1))
library(discSurv)
HH<- as.data.frame(read_dta("https://www.stata.com/data/jwooldridge/eacsap/recid.dta") )
HHC <- contToDisc(dataShort = HH, timeColumn = "durat", intervalLimits = 20,equi = TRUE)
dtLong<-dataLong(dataShort = HHC, timeColumn = "timeDisc",
eventColumn = "cens",timeAsFactor = FALSE)
formula.1<-y~factor(black)+factor(alcohol)
family<-binomial(link = "logit")
lambda <- 20
penal.vec<-20
next.try<-TRUE
BIC_vec<-rep(Inf,length(lambda))
Deltama.glm2<-as.matrix(t(rep(0,3)))#coefficients + Intercept
Smooth.glm2<-as.matrix(t(rep(0,20)))
j<-1;test.step<-1;
glm2 <- glmmLasso(formula.1,
rnd = NULL,family = family, data = dtLong, lambda=lambda[j],final.re=T,switch.NR=F,
control = list(smooth=list(formula=~- 1+as.numeric(timeInt),nbasis=20,spline.degree=3,
diff.ord=2,penal=penal.vec[test.step],start=Smooth.glm2[j,]),
method.final="EM", print.iter=T,print.iter.final=T,
eps.final=1e-4,epsilon=1e-4,complexity="non.zero",
start=Deltama.glm2[j,]))
Iteration 41
Final Re-estimation Iteration 9Error in Z_aktuell * D : non-conformable arrays
When I change Deltama.glm2<-as.matrix(t(rep(0,3))) to Deltama.glm2<-as.matrix(t(rep(0,2))),
I get the error:
Iteration 1Error in grad.lasso[b.is.0] <- score.beta[b.is.0] - lambda.b * sign(score.beta[b.is.0]) :
NAs are not allowed in subscripted assignments
I have tried removing the starting values, as suggested in the question "glmmLasso error and warning", without success.
Switching from R version 4.2.2 to 3.6.0 solved the issue. There seems to be a compatibility issue between glmmLasso and newer versions of R.

Having trouble with making K Nearest Neighbors work in R Studio

I'm trying to use the knn function in R, but I keep getting this error message when I run it.
> knn(Taxi_train,Taxi_test,cl,k=100)
Error in knn(Taxi_train, Taxi_test, cl, k = 100) :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(Taxi_train, Taxi_test, cl, k = 100) : NAs introduced by coercion
2: In knn(Taxi_train, Taxi_test, cl, k = 100) : NAs introduced by coercion
I don't know exactly what is wrong with my code, so I need some help getting it to work.
I tried making sure that all the variables are numeric, but that didn't change anything. It may also be an issue with the cl factor in the knn call.
Here is my current code:
date<-chicago_taxi$date
class(date)
Date <- as.Date(date)
class(Date)
Julian <- yday(Date)
class(Julian)
head(Julian)
chicago_taxi <- cbind(chicago_taxi,Julian)
chicago_taxi$seconds <- as.numeric(chicago_taxi$seconds)
set.seed(7777)
train_set <- sample(1:13081,10400,replace = FALSE)
Taxi_train <- chicago_taxi[train_set,]
Taxi_test <- chicago_taxi[-train_set,]
cl <- Taxi_train$payment_type
scale(chicago_taxi$miles)
scale(chicago_taxi$seconds)
scale(chicago_taxi$Julian)
knn(Taxi_train,Taxi_test,cl,k=100)
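A sketch of one likely fix, under the assumption that chicago_taxi still contains non-numeric columns (such as date and payment_type) when it is passed to knn(): knn() coerces its inputs to numeric matrices, so any text column turns into NA. In addition, the scale() calls above print their result without storing it. The data frame taxi below is a hypothetical stand-in that reuses the column names from the question.

```r
set.seed(7777)

# Hypothetical stand-in for chicago_taxi.
n <- 500
taxi <- data.frame(
  miles        = rexp(n, 1 / 3),
  seconds      = as.character(round(rexp(n, 1 / 600))),  # stored as text
  Julian       = sample(1:365, n, replace = TRUE),
  payment_type = factor(sample(c("Cash", "Credit Card"), n, replace = TRUE)),
  stringsAsFactors = FALSE
)
taxi$seconds <- as.numeric(taxi$seconds)

# Keep only the numeric predictors; the class label is passed separately.
features  <- c("miles", "seconds", "Julian")
train_idx <- sample(seq_len(n), 400)

# Scale with the training-set mean and sd, and store the result.
train_x <- scale(taxi[train_idx, features])
test_x  <- scale(taxi[-train_idx, features],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
cl <- taxi$payment_type[train_idx]

# class is a recommended package; run knn only if it is available.
if (requireNamespace("class", quietly = TRUE)) {
  pred <- class::knn(train_x, test_x, cl, k = 15)
  print(table(pred, taxi$payment_type[-train_idx]))
}
```

The value k = 15 is arbitrary here; the original k = 100 works the same way once the inputs are purely numeric.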

R - Error: argument 1 is not a vector when bootstrapping

I'm attempting to bootstrap my data to get 2000 measurements based on the linear regression and Theil regression (mblm function w/ repeated=FALSE).
My bootstrap R code works perfectly for the normal regression (from what I can tell), given below:
> fitfunc <- function(formula, data, index) {
+ d<- data[index,]
+ f<- lm(formula,data=d)
+ return(coef(f))
+ }
boot(dataframe, fitfunc, R=2000, formula=`Index A`~`Measurement B`)
But I get an error when attempting the Theil estimator bootstrap:
> fitfuncTheil <- function(formula,data,index) {
+ d<- data[index,]
+ f<- mblm(formula, data=d, repeated=FALSE)
+ return(coef(f))
+ }
> boot(dataframe, fitfuncTheil, R=2000, formula=`Index A`~`Measurement B`)
Error in order(x) : argument 1 is not a vector
In addition: Warning message:
In is.na(x) :
The error message seems basic but I cannot figure out why this would work in one case but not the other.
Once I removed the space from the column names (referenced in the formula field), the issue was resolved.
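The renaming step can be sketched as follows. The values in dataframe are made up for illustration; only the column names come from the question.

```r
# Hypothetical data with the non-syntactic column names from the question.
dataframe <- data.frame(
  `Index A`       = c(1.2, 2.3, 3.1, 4.8, 5.2),
  `Measurement B` = c(10, 20, 30, 40, 50),
  check.names = FALSE
)

# Replace spaces with underscores so no backticks are needed in the formula.
names(dataframe) <- gsub(" ", "_", names(dataframe), fixed = TRUE)

# The formula now refers to ordinary syntactic names.
form <- Index_A ~ Measurement_B
coefs <- coef(lm(form, data = dataframe))
print(coefs)
```

The same formula can then be passed to boot() with either fitting function.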

Error message when running npreg

I'm working the npreg example in the R np package documentation (by T. Hayfield, J. Racine), section 3.1 Univariate Regression.
library("np")
data("cps71")
model.par = lm(logwage~age + I(age^2),data=cps71)
summary(model.par)
#
attach(cps71)
bw = npregbw(logwage~age) # this line is not in example 3.1
model.np = npreg(logwage~age,regtype="ll", bwmethod="cv.aic",gradients="TRUE",
+ data=cps71)
This is copied directly from the example, but the npreg call results in the following error message:
*Rerun with Debug
Error in npreg.rbandwidth(txdat = txdat, tydat = tydat, bws = bws, ...) :
NAs in foreign function call (arg 15)
In addition: Warning message:
In npreg.rbandwidth(txdat = txdat, tydat = tydat, bws = bws, ...) :
NAs introduced by coercion*
The npreg R documentation indicates that the first argument should be the bandwidth specification. I tried setting bws=1:
model.np = npreg(bws=1,logwage~age,regtype="ll",
+ bwmethod="cv.aic",gradients="TRUE", data=cps71)
which gives the following error:
*Error in toFrame(xdat) :
xdat must be a data frame, matrix, vector, or factor*
This is my first time working with density estimation in R. Please suggest how to resolve these errors.
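One plausible cause, stated as an assumption about np's internals rather than a confirmed diagnosis: the call passes gradients="TRUE" as a character string instead of the logical TRUE. If npreg converts that argument to an integer before handing it to compiled code, the string becomes NA, which would match both the coercion warning and the foreign function call error. The base R behaviour is easy to check, and the guarded call below uses the logical value.

```r
# A quoted "TRUE" cannot be converted to an integer; a logical TRUE can.
quoted_flag  <- suppressWarnings(as.integer("TRUE"))  # NA
logical_flag <- as.integer(TRUE)                       # 1L

# Run the regression only if the np package is installed.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  data("cps71")
  model.np <- npreg(logwage ~ age, regtype = "ll", bwmethod = "cv.aic",
                    gradients = TRUE, data = cps71)
  print(summary(model.np))
}
```

Note that the leading + on the continuation line in the question is a console prompt and should not be pasted into a script.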

Using Beta.Select function in R (prior estimate)

I am trying to formulate priors using total counts and a beta distribution.
I have written the following:
quantile(df$row, probs=c(0.00001, 0.5, 0.99999))
quantile1 <- list(p=0.5, x=8)
quantile2 <- list(p=0.99999, x=10)
quantile3 <- list(p=0.00001, x=1)
library("LearnBayes")
findBeta <- function(quantile1,quantile2,quantile3)
quantile1_p <- quantile1[[1]]; quantile1_q <- quantile1[[2]]
quantile2_p <- quantile2[[1]]; quantile2_q <- quantile2[[2]]
quantile3_p <- quantile3[[1]]; quantile3_q <- quantile3[[2]]
priorA <- beta.select(list(p=0.5, x=8), list(p=0.99999, x=10))
When I try to calculate priorA using the beta.select function, I get the following error:
Error in if (p0 < p) m.hi = m0 else m.lo = m0 :
missing value where TRUE/FALSE needed
In addition: Warning message:
In pbeta(x, K * m0, K * (1 - m0)) : NaNs produced
I just can't get rid of the error and do not know how to approach it any more. I urgently need help.
I am guessing (completely out of thin air) that you are dealing with percentages, in which case you want to use x/100:
beta.select(list(p=0.5, x=.08), list(p=0.9, x=.10))
# [1] 28.02 318.74
Either way, while it would be nice for beta.select to throw a more appropriate error message (or rather, to include an input check), the root of the issue is that your x values are out of bounds. (As @Didzis noted, the support of a beta distribution is [0, 1].)
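As a quick sanity check using only base R, the shape parameters printed above should reproduce the two requested quantiles approximately (the printed values are rounded). The helper check_beta_quantile is hypothetical and not part of LearnBayes; it only illustrates the kind of bounds check that would catch x = 8 before calling beta.select.

```r
# Shape parameters as printed by beta.select in the answer (rounded).
shape <- c(28.02, 318.74)

# Both should be close to the requested values 0.08 and 0.10.
median_check <- qbeta(0.5, shape[1], shape[2])
q90_check    <- qbeta(0.9, shape[1], shape[2])
print(round(c(median_check, q90_check), 3))

# Hypothetical helper: reject quantile specifications outside (0, 1).
check_beta_quantile <- function(q) {
  stopifnot(q$p > 0, q$p < 1, q$x > 0, q$x < 1)
  q
}

ok  <- check_beta_quantile(list(p = 0.5, x = 0.08))
bad <- try(check_beta_quantile(list(p = 0.5, x = 8)), silent = TRUE)
```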
