Using svyglm within plyr call - r

This is clearly something idiosyncratic to R's survey package. I'm trying to use llply from the plyr package to make a list of svyglm models. Here's an example:
library(survey)
library(plyr)
foo <- data.frame(y1 = rbinom(50, size = 1, prob=.25),
y2 = rbinom(50, size = 1, prob=.5),
y3 = rbinom(50, size = 1, prob=.75),
x1 = rnorm(50, 0, 2),
x2 = rnorm(50, 0, 2),
x3 = rnorm(50, 0, 2),
weights = runif(50, .5, 1.5))
My list of dependent variables' column numbers
dvnum <- 1:3
Indicating no clusters or strata in this sample
wd <- svydesign(ids= ~0, strata= NULL, weights= ~weights, data = foo)
A single svyglm call works
svyglm(y1 ~ x1 + x2 + x3, design= wd)
And llply will make a list of base R glm models
llply(dvnum, function(i) glm(foo[,i] ~ x1 + x2 + x3, data = foo))
But llply throws the following error when I try to adapt this method to svyglm
llply(dvnum, function(i) svyglm(foo[,i] ~ x1 + x2 + x3, design= wd))
Error in svyglm.survey.design(foo[, i] ~ x1 + x2 + x3, design = wd) :
all variables must be in design= argument
So my question is: how do I use llply and svyglm?

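DWin was on to something with his comment about using a correct formula. svyglm requires every variable in the formula to be a column of the design, and foo[,i] is not one.
reformulate will build such a formula from the column names.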
dvnum <- names(foo)[1:3]
llply(dvnum, function(i) {
svyglm(reformulate(c('x1', 'x2', 'x3'),response = i), design = wd)})
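A minimal base R sketch of why this helps, assuming (from the error message) that svyglm compares the variable names in the formula against the columns of the design. It uses no survey functions, and the data frame is just a stand-in with the same column names as foo above.

```r
# Stand-in data with the same column names as foo above (weights omitted).
foo <- data.frame(y1 = rbinom(50, 1, .25), y2 = rbinom(50, 1, .5),
                  y3 = rbinom(50, 1, .75),
                  x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))

# The formula from the failing call names 'foo' and 'i', which are not columns.
bad <- foo[, i] ~ x1 + x2 + x3
all.vars(bad)

# reformulate() builds one formula per response, using real column names only.
good <- lapply(names(foo)[1:3], function(dv) {
  reformulate(c("x1", "x2", "x3"), response = dv)
})
good[[2]]
all(all.vars(good[[2]]) %in% names(foo))
```

Each element of good can then be passed to svyglm(..., design = wd) as in the answer above.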

Related

Object is not a matrix error using cragg package and map in R

I am trying to use this function from the cragg package and iterate it using map but I keep getting the following error:
Error in model.frame.default(object, data, xlev = xlev) : object is
not a matrix
My code and reproducible example:
library(cragg)
library(purrr)
y <- data.frame(gender = rbinom(100, 1, 0.5), age = rnorm(100), vcam = rnorm(100),
rs6696259 = rbinom(100, 2, 0.5), rs5491 = rbinom(100, 2, 0.5))
map(c("rs6696259", "rs5491"), ~
cragg_donald(X =~ gender + age,
D =~ vcam,
Z =~ .x %>% as.name(),
data = y))
However this line of coding works just fine.
cragg_donald(X =~ gender + age,
D =~ vcam,
Z =~ rs6696259,
data = y)
Thank you.
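No answer is recorded here, so the following is only a hedged sketch of one thing to try. A formula is not evaluated when it is written, so `~ .x %>% as.name()` reaches the function as that literal expression rather than as the column name. Building the one-sided formula from the string with base R's reformulate avoids this. Whether cragg_donald accepts a formula built this way is an assumption that has not been checked here, so that call is left as a comment.

```r
snps <- c("rs6696259", "rs5491")

# One one-sided formula per instrument, e.g. ~rs6696259
z_formulas <- lapply(snps, reformulate)
z_formulas[[1]]
all.vars(z_formulas[[1]])

# Untested assumption: pass each formula as Z, using an ordinary function
# instead of a purrr lambda so that the inner '~' terms are not ambiguous.
# results <- lapply(z_formulas, function(z)
#   cragg_donald(X = ~ gender + age, D = ~ vcam, Z = z, data = y))
```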

How to set all coefficients to one in model?

To fix a particular regression coefficient at one, we can use the offset function.
I want to set all coefficients to 1.
Let's take this example:
set.seed(42)
y <- rnorm(100)
df <- data.frame("Uni" = runif(100), "Exp" = rexp(100), "Wei" = rweibull(100, 1))
lm(y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)
Call:
lm(formula = y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)
Coefficients:
(Intercept)
-2.712
This code works, but what if I have a large number of variables, e.g. 800, and I want to do this for all of them? Writing out all their names would not be efficient. Is there a neater way to do it?
I think I found one solution if we do it this way:
set.seed(42)
# Assign everything to one data frame
df <- data.frame("Dep" = rnorm(100), "Uni" = runif(100),
"Exp" = rexp(100), "Wei" = rweibull(100, 1))
varnames <- names(df)[-1]
# Create formula for the sake of model creation
form <- paste0("offset","(",varnames, ")",collapse = "+")
form <- as.formula(paste0(names(df)[1], "~", form))
lm(form, data = df)
1) terms/update The following one-liner will produce the indicated formula.
update(formula(terms(y ~ ., data = df)), ~ offset(.))
## y ~ offset(Uni + Exp + Wei)
2) reformulate/sprintf another approach is:
reformulate(sprintf("offset(%s)", names(df)), "y")
## y ~ offset(Uni) + offset(Exp) + offset(Wei)
3) rowSums Another approach is to simply sum each row:
lm(y ~ offset(rowSums(df)))
4) lm.fit We could use lm.fit in which case we don't need a formula:
lm.fit(cbind(y^0), y, offset = rowSums(df))
5) mean If you only need the coefficient then it is just:
mean(y - rowSums(df))
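As a quick consistency check, the sketch below rebuilds the question's first data set (y kept separate from df) and confirms that approaches 2, 3 and 5 give the same intercept. The object names fo, fit_formula and fit_rowsum are only illustrative.

```r
set.seed(42)
y  <- rnorm(100)
df <- data.frame(Uni = runif(100), Exp = rexp(100), Wei = rweibull(100, 1))

# Approach 2: one offset() term per column
fo <- reformulate(sprintf("offset(%s)", names(df)), response = "y")
fit_formula <- lm(fo, data = df)

# Approach 3: a single offset holding the row sums
fit_rowsum <- lm(y ~ offset(rowSums(df)))

# Approach 5: the intercept is just the mean of the residual part
int_mean <- mean(y - rowSums(df))

all.equal(unname(coef(fit_formula)), unname(coef(fit_rowsum)))
all.equal(unname(coef(fit_formula)), int_mean)
```

Both comparisons should return TRUE, since with every slope fixed at 1 the only free parameter is the intercept.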

How to fit a GAM model to several pairs of (x,y) variables

I am trying to fit a GAM model to a dataset consisting of two pairs of (x,y) values i.e. (x1,y1) and (x2,y2) by first fitting the 1st pair and then moving to the second. When I call the gam function inside the ‘for’ loop, it gives an error “Not enough (non-NA) data to do anything meaningful”.
I suspect this is something to do with the way I construct the x1, y1, x2 and y2 labels of the columns because outside the ‘for’ loop the gam function works.
Thank you!
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-26. For overview type 'help("mgcv-package")'.
library(ggplot2)
library(tidyverse)
# create dataframe
x1 = seq(0, 50, by = 0.5)
y1 = dnorm(x1, mean = 22, sd = 5)
x2 = seq(0, 50, by = 0.5)
y2 = dnorm(x2, mean = 28, sd = 7)
df = cbind.data.frame(x1, y1, x2, y2)
# plot(c(x1,x2), c(y1,y2))
count = ncol(df)/2
for (i in 1:count) {
x<-noquote(paste("x", i, sep = ""))
y<-noquote(paste("y", i, sep = ""))
print(x) # test
gam(y ~ s(x), data = df, method = "REML") # this call doesn't work
}
gam(y1 ~ s(x1), data = df, method = "REML") # this call works
I have managed to figure out what the problem is. My construction of the xi and yi variables was the cause: inside the loop, x and y are character strings, so y ~ s(x) refers to those strings rather than to the columns of df. I had to build the formula as a string outside the gam function call, convert it to type “formula” with as.formula, and then use it in the gam call.
library(mgcv)
library(ggplot2)
library(tidyverse)
# create test dataframe
x1 = seq(0, 50, by = 0.5)
y1 = dnorm(x1, mean = 25, sd = 5)
x2 = seq(0, 50, by = 0.5)
y2 = dnorm(x2, mean = 29, sd = 7)
df = cbind.data.frame(x1, y1, x2, y2)
plot(c(df$x1,df$x2), c(df$y1,df$y2))
(count = ncol(df)/2)
for (i in 1:count) {
# construct the formula to go into the "gam" function and convert it to type "formula" with the "as.formula" function
part1 <- noquote(paste0("y", i))
part2 <- paste0("~ s(")
frag1 <- paste(part1, part2)
part3 <- noquote(paste0("x", i))
frag2 <- paste0(frag1, part3)
frag3 <- paste0(frag2, ")")
fmla <- as.formula(frag3)
# fit the data
gam_mod <- gam(formula = fmla, data = df, method = "REML")
print(gam_mod)
}
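The string assembly in the loop above can also be done in one step. The sketch below uses sprintf with as.formula and only builds the formulas, so it runs without mgcv. The gam call is left as a comment and is assumed to behave as in the loop above.

```r
# Same test data as above
x1 <- seq(0, 50, by = 0.5); y1 <- dnorm(x1, mean = 25, sd = 5)
x2 <- seq(0, 50, by = 0.5); y2 <- dnorm(x2, mean = 29, sd = 7)
df <- data.frame(x1, y1, x2, y2)

# One formula per (x, y) pair: y1 ~ s(x1), y2 ~ s(x2)
pair_ids <- seq_len(ncol(df) / 2)
fmlas <- lapply(pair_ids, function(i) {
  as.formula(sprintf("y%d ~ s(x%d)", i, i))
})
fmlas

# With mgcv loaded, each formula can be passed straight to gam():
# fits <- lapply(fmlas, function(f) gam(f, data = df, method = "REML"))
```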

R rollapply on glmnet

library(zoo)
library(glmnet)
I can get the rolling coefficients on a linear regression:
seat <- as.zoo(log(UKDriverDeaths))
time(seat) <- as.yearmon(time(seat))
seat <- merge(y = seat, y1 = lag(seat, k = -1),
y12 = lag(seat, k = -12), all = FALSE)
tail(seat)
fm <- rollapply(seat, width = 50,
FUN = function(z) coef(lm(y ~ y1 + y12, data = as.data.frame(z))),
by.column = FALSE, align = "right")
but I am having trouble getting the rolling coefficients for glmnet:
fm <- rollapply(seat, width = 50,
FUN = function(z) coef(cv.glmnet(z[,c(2,3)],z[,1],alpha=1, data =
as.data.frame(z))), by.column = FALSE, align = "right")
Thank you for any help
First, cv.glmnet doesn't have a data argument. It has x and y arguments which are the predictor matrix and response vector respectively.
Second, your seat dataset has missing values in the first row (unavoidable due to the lag operation). This will mess up glmnet, which has a rather bare-bones interface which does minimal checking.
Third, coef on a glmnet/cv.glmnet object returns a sparse matrix, which rollapply doesn't know what to do with.
Fixing all of these gives:
fm2 <- rollapply(seat, width=50, FUN=function(z)
{
z <- na.omit(z)
as.numeric(coef(cv.glmnet(z[, c(2, 3)], z[, 1], alpha=1)))
}, by.column=FALSE, align="right")
You can also use my glmnetUtils package, which implements a formula/data frame interface to glmnet. This deals with the first two problems above.
library(glmnetUtils)
fm3 <- rollapply(seat, width=50, FUN=function(z)
{
as.numeric(coef(cv.glmnet(y ~ y1 + y12, data=as.data.frame(z), alpha=1)))
}, by.column=FALSE, align="right")

multivariate regression

I have two dependent variables that both depend on two predictors AND on each other. This can surely be modelled in R, but I can't figure out how. Does anyone have a hint?
In clear terms:
I want to model my data with the following model:
Y1=X1*coef1+X2*coef2
Y2=X1*coef2+X2*coef3
Note: coef2 appears in both lines
Xi, Yi is input and output data respectively
I got this far:
lm(Y1~X1+X2,mydata)
now how do I add the second line of the model including the cross dependency?
Your help is greatly appreciated!
Cheers, Bastiaan
Try this:
# sample data - true coefs are 2, 3, 4
set.seed(123)
n <- 35
DF <- data.frame(X1 = 1, X2 = 1:n, X3 = (1:n)^2)
DF <- transform(DF, Y1 = X1 * 2 + X2 * 3 + rnorm(n),
Y2 = X1 * 3 + X2 * 4 + rnorm(n))
# construct data frame for required model
DF2 <- with(DF, data.frame(y = c(Y1, Y2),
x1 = c(X1, 0*X1),
x2 = c(X2, X1),
x3 = c(0*X2, X2)))
lm(y ~. - 1, DF2)
We see it does, indeed, recover the true coefs of 2, 3, 4:
> lm(y ~. - 1, DF2)
Call:
lm(formula = y ~ . - 1, data = DF2)
Coefficients:
x1 x2 x3
2.084 2.997 4.007
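To see why the stacked data frame encodes the shared coefficient, here is the same construction without noise. The Y1 rows use regressors (X1, X2, 0) and the Y2 rows use (0, X1, X2), so the middle column carries coef2 in both equations. The name stacked is only illustrative.

```r
n  <- 35
X1 <- rep(1, n)
X2 <- 1:n

# Noise-free responses with coef1 = 2, coef2 = 3, coef3 = 4
Y1 <- 2 * X1 + 3 * X2
Y2 <- 3 * X1 + 4 * X2

# Stack both equations: one column per coefficient
stacked <- data.frame(y  = c(Y1, Y2),
                      x1 = c(X1, 0 * X1),   # coef1: only in the Y1 rows
                      x2 = c(X2, X1),       # coef2: X2 in Y1 rows, X1 in Y2 rows
                      x3 = c(0 * X2, X2))   # coef3: only in the Y2 rows

fit <- lm(y ~ . - 1, data = stacked)
coef(fit)
```

Without the rnorm terms the fit should return 2, 3 and 4 up to floating-point error.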
