`ddply` fails to apply logistic regression (GLM) by group to my dataset

I'm working out the LD50 (lethal dose) for multiple populations from different experiments using the MASS package. It's simple enough when I subset the data and do one group at a time, but I'm getting an error when I use ddply. Essentially, I need an LD50 for each population at each temperature.
My data looks somewhat like this:
# dput(d)
d <- structure(list(Pop = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("a", "b", "c"), class = "factor"), Temp = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"),
Dose = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), Dead = c(0L,
11L, 12L, 14L, 2L, 16L, 17L, 7L, 5L, 3L, 17L, 15L, 9L, 20L,
8L, 19L, 7L, 2L, 20L, 14L, 9L, 15L, 1L, 15L), Alive = c(20L,
9L, 8L, 6L, 18L, 4L, 3L, 13L, 15L, 17L, 3L, 5L, 11L, 0L,
12L, 1L, 13L, 18L, 0L, 6L, 11L, 5L, 19L, 5L)), .Names = c("Pop",
"Temp", "Dose", "Dead", "Alive"), class = "data.frame", row.names = c(NA,
-24L))
The following works fine:
d$Mortality <- cbind(d$Alive, d$Dead)
a <- d[d$Pop=="a" & d$Temp=="high",]
library(MASS)
dose.p(glm(Mortality ~ Dose, family="binomial", data=a), p=0.5)[1]
But when I put this into ddply I get the following error:
library(plyr)
d$index <- paste(d$Pop, d$Temp, sep="_")
ddply(d, 'index', function(x) dose.p(glm(Mortality~Dose, family="binomial", data=x), p=0.5)[1])
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
I can get the right LD50 when I use a proportion but can't figure out where I've gone wrong with my approach (and had already written this question).
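For reference, a minimal sketch (using the Pop "a", Temp "high" rows from the data above) showing that the proportion form, with the number of trials passed as `weights`, gives the same binomial fit as the two-column count matrix. The `Total` column is a helper introduced only for this illustration.

```r
library(MASS)  # dose.p()

# Pop "a", Temp "high" rows from the question's data
x <- data.frame(Dose = 1:4, Dead = c(0, 11, 12, 14), Alive = c(20, 9, 8, 6))
x$Total <- x$Alive + x$Dead  # hypothetical helper column: trials per dose

# Two-column response: cbind(successes, failures)
fit_mat <- glm(cbind(Alive, Dead) ~ Dose, family = binomial, data = x)

# Proportion response, with the number of trials supplied as prior weights
fit_prop <- glm(Alive / Total ~ Dose, family = binomial, weights = Total, data = x)

all.equal(coef(fit_mat), coef(fit_prop))  # same maximum-likelihood fit
dose.p(fit_prop, p = 0.5)[[1]]            # LD50 for this group
```

Internally, the binomial family converts the two-column form into exactly this proportion-plus-weights form, which is why the coefficients agree.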

Perhaps surprisingly, if you use the formula
cbind(Alive, Dead) ~ Dose
instead of
Mortality ~ Dose
the problem goes away.
library(MASS)
library(plyr)
## `d` is as your `dput` result
## a function to apply
f <- function(x) {
fit <- glm(cbind(Alive, Dead) ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), f)
# Pop Temp V1
#1 a high 2.6946257
#2 a low 2.1834099
#3 b high 2.5000000
#4 b low 0.4830998
#5 c high 2.2899553
#6 c low 2.5000000
So what happened with Mortality ~ Dose? Let's set .inform = TRUE when calling ddply:
## `d` is as your `dput` result
d$Mortality <- cbind(d$Alive, d$Dead)
## a function to apply
g <- function(x) {
fit <- glm(Mortality ~ Dose, family = "binomial", data = x)
dose.p(fit, p=0.5)[[1]]
}
## call `ddply`
ddply(d, .(Pop, Temp), g, .inform = TRUE)
#Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
#Error: with piece 1:
# Pop Temp Dose Dead Alive Mortality
#1 a high 1 0 20 20
#2 a high 2 11 9 9
#3 a high 3 12 8 8
#4 a high 4 14 6 6
Now we see that the variable Mortality has lost its dimensions: only the first column (Alive) is retained. For a binomial glm whose response is a single vector, glm expects values between 0 and 1 (0/1 outcomes, or proportions together with weights) or a factor. Here we have counts 20, 9, 8, 6, ..., hence glm complains:
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
I have not found a way to make ddply keep the matrix column. I tried protecting it with I():
d$Mortality <- I(cbind(d$Alive, d$Dead))
but it still fails in the same way.
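One possible workaround, sketched in base R rather than plyr: `split()` subsets rows with `[.data.frame`, which keeps a matrix column intact, so the original `Mortality ~ Dose` formula can still be fitted per group. The sketch uses only the Pop "a" rows from the question; it does not explain plyr's internals.

```r
library(MASS)  # dose.p()

# Pop "a" rows from the question's data, both temperatures
d <- data.frame(
  Temp  = rep(c("high", "low"), each = 4),
  Dose  = rep(1:4, 2),
  Dead  = c(0, 11, 12, 14, 2, 16, 17, 7),
  Alive = c(20, 9, 8, 6, 18, 4, 3, 13)
)
d$Mortality <- cbind(d$Alive, d$Dead)  # two-column matrix stored in one column

# split() subsets rows with `[.data.frame`, which keeps matrix columns intact
pieces <- split(d, d$Temp)
sapply(pieces, function(x) ncol(x$Mortality))  # 2 for every group

# Fit the original formula in each piece and extract the LD50
ld50 <- sapply(pieces, function(x) {
  fit <- glm(Mortality ~ Dose, family = binomial, data = x)
  dose.p(fit, p = 0.5)[[1]]
})
ld50
```

The values should agree with the `ddply` output for Pop "a" shown earlier.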


Error running GEE logistic Model: NA/NaN/Inf in foreign function call (arg 2)

I am running a logistic regression model implemented through generalized estimating equations (GEEs) and keep running into the following error despite trying multiple solutions posted here on SO and elsewhere. I am unsure where this error comes from. I am using the gee package, but the error also occurs in geepack.
Does anyone know why this error may be occurring even though there are no NA, Inf, or character variables in the dataset? My suspicion is that there is something very simple I am missing, but after two days, I have to throw it to better coders than me.
Minimal data and code to reproduce the error, attempts at solutions, and relevant SO questions are below.
Data
df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L, 24L, 21L, 19L, 5L, 4L, 18L,
13L, 23L, 16L, 25L, 12L, 10L, 9L, 22L, 17L, 11L, 3L, 2L, 2L),
levels = c("ALWA28M", "BOMA13M", "BOMA41M", "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
"FASI6M", "FRRO35M", "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
"MAAD60M", "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
testres = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L),
levels = c("POS", "NEG"), class = "factor"),
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
row.names = c(NA, 26L),
class = "data.frame")
Model
gee::gee(testres ~ agegrp, data = df,
id = id,
family = binomial,
corstr = "exchangeable")
Error
Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
NAs introduced by coercion
Checking data to ensure no NA, Inf, or character variables - all are factors with no missing data
# All factors
str(df)
# 'data.frame': 26 obs. of 3 variables:
# $ id : Factor w/ 25 levels "ALWA28M","BOMA13M",..: 7 1 20 15 14 6 8 24 21 19 ...
# $ testres: Factor w/ 2 levels "POS","NEG": 1 1 1 2 1 1 1 1 1 1 ...
# $ agegrp : Factor w/ 6 levels "0","1","2","3",..: 5 3 3 5 1 1 2 2 1 2 ...
# No NAs or Infinites
lapply(df, table, useNA = "always")
# 0 NAs
lapply(df, \(x) table(is.infinite(x)))
# All FALSE
Alternative approach using geepack
geepack::geeglm(testres ~ agegrp,
data = df, id = id,
corstr = "exchangeable",
family = "binomial")
geepack error:
Error in lm.fit(zsca, qlf(pr2), offset = soffset) : NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
Changing the correlation structure yields the same error. Standard logistic regression converges:
summary(glm(testres ~ agegrp, data = df, family = "binomial"(link = logit)))
SO questions that did not resolve the issue. While this issue is common on the site, in my view there is not a sufficient answer to this question on SO, hence the decision to post.
How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" running predict with randomForest
R: NA/NaN/Inf in foreign function call (arg 1)
Error in fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)
NA/NaN/Inf in foreign function call (arg 2)
NA/NaN/Inf in foreign function call (arg 5)
lme: NA/NaN/Inf in foreign function call (arg 3)
NA/NaN/Inf in foreign function call (arg 1) when trying to run a PGLS (Pagel's lambda)
How to eliminate “NA/NaN/Inf in foreign function call (arg 3)” in bigglm
R error in glmnet: NA/NaN/Inf in foreign function call
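The warnings point to the cause: unlike glm(), gee() and geeglm() do not convert a factor response to 0/1, so the response ends up as NA (gee: "NAs introduced by coercion") or is used as a factor (geeglm: "‘-’ not meaningful for factors"). Using 0 and 1 in testres works: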
df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L, 24L, 21L, 19L, 5L, 4L, 18L,
13L, 23L, 16L, 25L, 12L, 10L, 9L, 22L, 17L, 11L, 3L, 2L, 2L),
levels = c("ALWA28M", "BOMA13M", "BOMA41M", "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
"FASI6M", "FRRO35M", "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
"MAAD60M", "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
testres = structure(c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L)),
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
row.names = c(NA, 26L),
class = "data.frame")
gee::gee(testres ~ agegrp, data = df,
id = id,
family = binomial,
corstr = "exchangeable")
#> Beginning Cgee S-function, #(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#> (Intercept) agegrp1 agegrp2 agegrp3 agegrp4
#> 1.956607e+01 -3.377525e-08 -1.817977e+01 -1.831331e+01 -1.887292e+01
#> agegrp5
#> -3.513736e-08
#> Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : Cgee: error: logistic model for probability has fitted value very close to 1.
#> estimates diverging; iteration terminated.
There is now an error because the model has fitted some probabilities very close to 0 or 1, but I think this is an unrelated problem (see the section Details in ?glm).
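Two small diagnostic steps, sketched in base R without calling gee: recoding the existing factor instead of retyping the dput, and a cross-tabulation that suggests why the 0/1 model then diverges. Age groups 0, 1 and 5 contain no NEG at all, which is (quasi-)complete separation, consistent with the huge intercept and near-zero agegrp1/agegrp5 estimates in the glm output above. Overwriting `testres` in place is my choice here.

```r
# Response and grouping columns from the question's data (id omitted)
df <- data.frame(
  testres = factor(c("POS", "NEG")[c(1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
                                     1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1)],
                   levels = c("POS", "NEG")),
  agegrp = factor(c(4, 2, 2, 4, 0, 0, 1, 1, 0, 1, 5, 3, 3,
                    2, 3, 3, 2, 3, 3, 3, 3, 2, 4, 3, 1, 1), levels = 0:5)
)

# Recode the factor to numeric 0/1 (POS = 1, as in the answer's data)
df$testres <- as.integer(df$testres == "POS")

# Cross-tabulate: an age group with a zero cell has only one outcome
tab <- table(agegrp = df$agegrp, testres = df$testres)
tab
```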

R: Error in `[.data.frame`: undefined columns selected

I have this data sample
dput()
timeseries=structure(list(sales_point_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), calendar_id_operday = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), line_fact_amt = c(55767L,
59913L, 36363L, 48558L, 505L, 76344L, 22533L, 11965L, 78944L,
36754L, 30621L, 55716L, 32470L, 62165L, 57986L, 2652L, 16487L,
72849L, 73715L, 65656L, 64411L, 47460L, 61866L, 10877L, 72392L,
53011L, 23544L, 76692L, 10388L, 24255L, 56684L, 59329L, 6655L,
65612L, 17495L, 10389L, 63702L, 47407L, 78782L, 22898L, 21151L,
32587L)), class = "data.frame", row.names = c(NA, -42L))
calendar_id_operday = 1 means the week 20210101 to 20210108 (ymd). There is no date column, only a week index; that is just a peculiarity of this data. I tried to transform my data:
library(reshape)
df <- cast(melt(timeseries, id=c("calendar_id_operday"), na.rm=TRUE),
line_fact_amt + calendar_id_operday)[, c("line_fact_amt", "calendar_id_operday", substring(month.name, 1, 3))]
colnames(df)[1] <- "sales_point_id"
df[, substring(month.name, 1, 3)] <- lapply(timeseries[, substring(month.name, 1, 3)],
function(x) as.numeric(as.character(x)))
But something goes wrong
Error in `[.data.frame`(timeseries, , substring(month.name, 1, 3)) :
undefined columns selected
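The traceback names `timeseries`, and a quick check (sketched on the first rows of the data) suggests why: `timeseries` has only three columns, and none of the month abbreviations produced by `substring(month.name, 1, 3)` is among them.

```r
# First rows of `timeseries` from the question
timeseries <- data.frame(
  sales_point_id      = 1L,
  calendar_id_operday = 1:3,
  line_fact_amt       = c(55767L, 59913L, 36363L)
)

wanted <- substring(month.name, 1, 3)  # "Jan" "Feb" ... "Dec"
setdiff(wanted, names(timeseries))     # all 12 are missing

# Selecting columns that do not exist reproduces the error message
msg <- tryCatch(timeseries[, wanted], error = conditionMessage)
msg
```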
I want to get this data.frame as the result:
sales_point_id year jan-1 jan-2 jan-3 jan-4 feb1
1 1 2021 8034.843 7485.725 8238.493 8446.994 134
2 1 2021 7810.315 7261.198 8013.965 8222.466 346
3 1 2021 7585.788 7036.670 7789.438 7997.938 54364
4 1 2021 7361.260 6812.142 7564.910 7773.411 34546
5 1 2021 7136.733 6587.615 7340.382 7548.883 46436
jan-1 is the data for the first week of January, jan-2 for the second week, and so on.
What should I do to get the desired result?
Thanks for your valuable help
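A minimal sketch of one way to get week columns, using base `reshape()` instead of `reshape::cast()`. It rests on an assumption not confirmed by the question: week 1 starts on 2021-01-01 and each `calendar_id_operday` is a consecutive 7-day block. Each week is labelled by the month and week-of-month of its start date, so under this rule January has five week starts (Jan 1, 8, 15, 22, 29) and a `jan-5` column appears, unlike the four January columns in the desired output; the labelling rule would need adjusting to your calendar.

```r
# Weeks 1-6 for both sales points, taken from the question's `timeseries`
timeseries <- data.frame(
  sales_point_id      = rep(1:2, each = 6),
  calendar_id_operday = rep(1:6, times = 2),
  line_fact_amt       = c(55767, 59913, 36363, 48558, 505, 76344,
                          47460, 61866, 10877, 72392, 53011, 23544)
)

# ASSUMPTION: week 1 starts on 2021-01-01 and weeks are consecutive 7-day blocks
start <- as.Date("2021-01-01") + 7 * (timeseries$calendar_id_operday - 1)

# Label each week by the month and week-of-month of its start date, e.g. "jan-1"
mon <- tolower(month.abb[as.integer(format(start, "%m"))])
wk  <- (as.integer(format(start, "%d")) - 1) %/% 7 + 1
timeseries$year <- as.integer(format(start, "%Y"))
timeseries$week <- paste(mon, wk, sep = "-")

# One row per sales point and year, one column per week label
wide <- reshape(timeseries[, c("sales_point_id", "year", "week", "line_fact_amt")],
                idvar = c("sales_point_id", "year"), timevar = "week",
                direction = "wide")
names(wide) <- sub("^line_fact_amt\\.", "", names(wide))
wide
```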

Subsetting a function variable within another variable

I have the following function that performs a stepwise linear regression. It works well with numeric and integer values, but when I have factors as independent variables, I get the following error:
Error in `[.data.frame`(d, , names(resul0)) : undefined columns selected
I call the function like this:
stepfor(bird$Richness, data.frame(GARDENSIZE, Site, season), alfa = 0.2)
Following the comments, I have figured out a way to split the factors into indicator columns, one per level:
library(dplyr)
library(purrr)
library(stringr)
x <- function(x) {x %>%
select(where(negate(is.numeric))) %>%
map_dfc(~ model.matrix(~ .x -1) %>%
as_tibble) %>%
rename_all(~ str_remove(., "\\.x"))
}
However, I'm not sure how to include it in the function below so that I can call stepfor like so:
stepfor(bird$Richness, data.frame(x(bird)), alfa = 0.2)
I just want to know how to include the function x within the function below so that it works as above. And if there are no factors in the data, this step should be skipped so that it doesn't return an error such as x being missing.
Here is my function:
stepfor<-function (y = y, d = d, alfa = 0.05)
{
pval <- NULL
design <- NULL
j = 1
resul0 <- summary(lm(y ~ ., data = d))$coefficients[, 4]
d <- as.data.frame(d[, names(resul0)][-1])
for (i in 1:ncol(d)) {
sub <- cbind(design, d[, i])
sub <- as.data.frame(sub)
lm2 <- lm(y ~ ., data = sub)
result <- summary(lm2)
pval[i] <- result$coefficients[, 4][j + 1]
}
min <- min(pval)
while (min < alfa) {
b <- pval == min
c <- c(1:length(pval))
pos <- c[b]
pos <- pos[!is.na(pos)][1]
design <- cbind(design, d[, pos])
design <- as.data.frame(design)
colnames(design)[j] <- colnames(d)[pos]
j = j + 1
d <- as.data.frame(d[, -pos])
pval <- NULL
if (ncol(d) != 0) {
for (i in 1:ncol(d)) {
sub <- cbind(design, d[, i])
sub <- as.data.frame(sub)
lm2 <- lm(y ~ ., data = sub)
result <- summary(lm2)
pval[i] <- result$coefficients[, 4][j + 1]
}
min <- min(pval, na.rm = TRUE)
}
else min <- 1
}
if (is.null(design)) {
lm1 <- lm(y ~ 1)
}
else {
lm1 <- lm(y ~ ., data = design)
}
return(lm1)
}
Reproducible code:
bird<- structure(list(season = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Summer",
"Winter"), class = "factor"), Richness = c(20L, 17L, 18L, 19L,
11L, 15L, 17L, 15L, 15L, 9L, 13L, 14L, 12L, 18L, 30L, 30L, 17L,
25L, 32L, 32L, 29L, 29L, 27L, 18L, 25L, 24L, 15L, 18L, 23L, 22L,
25L, 22L, 22L, 23L, 17L, 22L, 7L, 15L, 16L, 20L, 24L, 21L, 22L,
39L, 17L, 17L, 13L, 26L, 25L, 20L), GARDEN_SIZE = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L,
1L), .Label = c("L", "M", "S"), class = "factor"), Site = structure(c(1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("R", "S", "U"), class = "factor")), row.names = c(NA,
50L), class = "data.frame")
Consider this:
library(dplyr)
library(purrr)
library(stringr)
stepfor<-function (y = y, d = d, alfa = 0.05)
{
# split the incoming data to give non-numeric the factor treatment:
x1 <- d %>% select(where(negate(is.numeric))) %>%
map_dfc(~ model.matrix(~ .x -1) %>%
as_tibble) %>%
rename_all(~ str_remove(., "\\.x"))
x2 <- d %>% select(where(is.numeric))
d <- cbind( x1, x2 )
pval <- NULL
design <- NULL
j = 1
resul0 <- summary(lm(y ~ ., data = d))$coefficients[, 4][-1]
d <- as.data.frame(d[, names(resul0)])
# rest of function body as is
}
i.e. move the [-1] from the `d <- ...` line to the `resul0 <- ...` line so the intercept term is removed earlier. The remaining coefficient names should now match the columns in your data.frame, so every entry of names(resul0) exists there.
There is a problem with your approach. You do:
d <- as.data.frame( d[, names(resul0) ] [-1] )
This code tries to look up all of names(resul0) inside the d data.frame. That includes the intercept term, "(Intercept)", which is not a column of d, so the lookup fails. (At that point it's too late to remove the intercept afterwards, as the damage has already been done.)
You need to remove the intercept before looking up the names inside d. Then the name-error won't happen.
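A minimal reproduction with a toy data frame (the values in `d` and `y` are hypothetical) showing why the original lookup fails and why dropping the intercept first fixes it:

```r
# Toy data (hypothetical values): two numeric predictors and a response
d <- data.frame(a = 1:5, b = c(2, 1, 4, 3, 6))
y <- c(1.1, 2.3, 2.9, 4.2, 5.1)

p <- summary(lm(y ~ ., data = d))$coefficients[, 4]  # named p-values
names(p)                                             # "(Intercept)" "a" "b"

# Looking up every name, intercept included, fails as in stepfor()
msg <- tryCatch(d[, names(p)], error = conditionMessage)
msg

# Dropping the intercept before the lookup works
d2 <- as.data.frame(d[, names(p)[-1]])
names(d2)                                            # "a" "b"
```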
The body of the x function can be inserted there quite straightforwardly.
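As a base-R alternative to x, here is a sketch of a hypothetical helper, `expand_factors()`, that returns the data unchanged when it has no non-numeric columns and otherwise replaces each factor with one 0/1 column per level. One deliberate difference from x: the new columns are prefixed with the variable name (e.g. `Site_S`), because in `bird` both GARDEN_SIZE and Site have a level "S", so level-only names would collide. It could then be called at the top of stepfor as `d <- expand_factors(d)`; I have not run that against the full stepfor.

```r
# Hypothetical helper: expand non-numeric columns into 0/1 indicator columns.
expand_factors <- function(d) {
  is_num <- vapply(d, is.numeric, logical(1))
  if (all(is_num)) return(d)                # no factors: nothing to do

  dummies <- lapply(names(d)[!is_num], function(nm) {
    f <- factor(d[[nm]])                    # factor() also drops unused levels
    m <- model.matrix(~ f - 1)              # one column per level, no intercept
    colnames(m) <- paste(nm, levels(f), sep = "_")
    m
  })
  dummy_df <- as.data.frame(do.call(cbind, dummies))

  if (any(is_num)) cbind(d[is_num], dummy_df) else dummy_df
}

# Usage on a small example with the same kinds of columns as `bird`
ex <- data.frame(
  season      = factor(c("Summer", "Winter", "Summer")),
  Site        = factor(c("R", "S", "S"), levels = c("R", "S", "U")),
  GARDEN_SIZE = factor(c("L", "M", "S")),
  n           = c(1, 2, 3)
)
expand_factors(ex)
```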

Comparing two groups with the linear model

I want to do a regression where parendiv is my dependent variable and routine1997 is my independent variable, and compare males to females. My data look like this:
nlsy97 <- structure(list(gender = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L), .Label = c("male",
"female"), class = "factor"), parent = structure(c(2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("intact", "parentaldivorce"), class = "factor"),
routine = structure(c(1L, 1L, 1L, 1L, NA, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 2L, 1L, 3L, 3L), .Label = c("Med",
"High", "Low"), class = "factor")), row.names = c(3L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 16L, 18L, 19L, 21L, 22L, 23L, 24L,
25L, 28L, 29L, 30L, 34L), class = "data.frame")
This is the code, and I specifically want to compare the coefficients between men and women:
lm(parent~routine, data=nlsy97, subset=gender)
There are two ways to compare the coefficients.
The easiest way would be to code gender as a dummy (0/1) and include an interaction term in the model. The interaction coefficient then gives the difference gender makes to the routine coefficient, complete with a p-value. Because parent is a factor and lm() needs a numeric response, recode it to 0/1 first:
nlsy97$divorced <- as.numeric(nlsy97$parent == "parentaldivorce")
out = lm(divorced ~ routine * gender, data=nlsy97)  # routine * gender = routine + gender + routine:gender
The other way would be to use a multigroup (mixed-effects) regression and compare the pooled regression model (all genders included) with the unpooled models (separate slopes or intercepts or both for genders). The model with the smallest AIC is preferred. If your random slope model yields the lowest AIC, that suggests gender differences in your effect. If the random intercept model is best, you just have level differences between the genders and may assume equal effects. Fit the mixed models by maximum likelihood (REML = FALSE) so that their AIC values are comparable with the lm() fit:
library(lme4)
pooled = lm(divorced ~ routine, data=nlsy97)
r.inter = lmer(divorced ~ routine + (1|gender), data=nlsy97, REML=FALSE)
r.slope = lmer(divorced ~ routine + (0+routine|gender), data=nlsy97, REML=FALSE)  # slope only; (routine|gender) would also add a random intercept
r.unpooled = lmer(divorced ~ routine + (1+routine|gender), data=nlsy97, REML=FALSE)
AIC(pooled)
AIC(r.inter)
AIC(r.slope)
AIC(r.unpooled)
Calling coef() on the model with the lowest AIC gives the coefficients for the individual groups.
EDIT: I just noticed that you only have 20 cases in total. If this is your whole dataset, you should probably not do any statistical analysis at all.
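In R's formula syntax, `a * b` already expands to `a + b + a:b`, so writing `routine + gender + routine*gender` specifies the same model as `routine * gender`; the `routine:gender` terms are the ones that carry the gender difference in the routine effect. A quick check with `terms()`, which needs no data:

```r
# Term labels produced by each way of writing the interaction model
full  <- attr(terms(parent ~ routine + gender + routine * gender), "term.labels")
short <- attr(terms(parent ~ routine * gender), "term.labels")
full                    # "routine" "gender" "routine:gender"
identical(full, short)
```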

array manipulation: calculate odds ratios for a layer in a 3-way table

This is a question about array and data frame manipulation and calculation, in the context of models for log odds in contingency tables. The closest question I've found to this is How can i calculate odds ratio in many table, but mine is more general.
I have a data frame representing a 3-way frequency table, of size 5 (litter) x 2 (treatment) x 3 (deaths).
"Freq" is the frequency in each cell, and deaths is the response variable.
Mice <-
structure(list(litter = c(7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 7L, 7L, 8L,
8L, 9L, 9L, 10L, 10L, 11L, 11L), treatment = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A",
"B"), class = "factor"), deaths = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("0", "1",
"2+"), class = "factor"), Freq = c(58L, 75L, 49L, 58L, 33L, 45L,
15L, 39L, 4L, 5L, 11L, 19L, 14L, 17L, 18L, 22L, 13L, 22L, 12L,
15L, 5L, 7L, 10L, 8L, 15L, 10L, 15L, 18L, 17L, 8L)), .Names = c("litter",
"treatment", "deaths", "Freq"), row.names = c(NA, 30L), class = "data.frame")
From this, I want to calculate the log odds for adjacent categories of the last variable (deaths) and have this value in a data frame with factors litter (5), treatment (2), and contrast (2), as detailed below.
The data can be seen in xtabs() form:
mice.tab <- xtabs(Freq ~ litter + treatment + deaths, data=Mice)
ftable(mice.tab)
deaths 0 1 2+
litter treatment
7 A 58 11 5
B 75 19 7
8 A 49 14 10
B 58 17 8
9 A 33 18 15
B 45 22 10
10 A 15 13 15
B 39 22 18
11 A 4 12 17
B 5 15 8
From this, I want to calculate the (adjacent) log odds of 0 vs. 1 and 1 vs. 2+ deaths, which is easy in array format:
odds1 <- log(mice.tab[,,1]/mice.tab[,,2]) # contrast 0:1
odds2 <- log(mice.tab[,,2]/mice.tab[,,3]) # contrast 1:2+
odds1
treatment
litter A B
7 1.6625477 1.3730491
8 1.2527630 1.2272297
9 0.6061358 0.7156200
10 0.1431008 0.5725192
11 -1.0986123 -1.0986123
For analysis, however, I want these in a data frame with factors litter, treatment and contrast, and a column 'logodds' containing the entries of the odds1 and odds2 tables, suitably strung out.
More generally, for an I x J x K table, where the last factor is the response, my desired result is a data frame of IJ(K-1) rows, with adjacent log odds in a 'logodds' column. Ideally, I'd like a general function to do this.
Note that if T is the 10 x 3 matrix of frequencies shown by ftable(), the calculation is essentially
log(T) %*% matrix(c(1, -1, 0,
                    0, 1, -1), ncol = 2)
followed by reshaping and labeling.
Can anyone help with this?
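A minimal sketch of such a general function (the name `adjacent_logodds` is hypothetical), built on the `log(T) %*% C` idea above: it flattens all but the last dimension into rows, applies the K x (K-1) adjacent-contrast matrix, and labels the result with `expand.grid()` in the same column-major order. It works for any number of leading factors, not only two.

```r
# Hypothetical helper: adjacent-category log odds along the LAST dimension
# of a frequency table, as a long data frame (IJ(K-1) rows for I x J x K).
adjacent_logodds <- function(tab) {
  dn  <- dimnames(tab)
  K   <- dim(tab)[length(dim(tab))]
  lev <- dn[[length(dn)]]

  # Contrast matrix C (K x (K-1)): column k has +1 in row k and -1 in row k+1
  C <- matrix(0, nrow = K, ncol = K - 1)
  C[cbind(1:(K - 1), 1:(K - 1))] <- 1
  C[cbind(2:K, 1:(K - 1))] <- -1

  # Flatten the leading dimensions into rows (first dimension varies fastest)
  M <- matrix(log(tab), ncol = K)
  L <- M %*% C

  # Labels in the same column-major order as as.vector(L)
  grid <- expand.grid(c(dn[-length(dn)],
                        list(contrast = paste(lev[-K], lev[-1], sep = ":"))))
  grid$logodds <- as.vector(L)
  grid
}

# Rebuild the question's table (row order matches the `Mice` data frame)
Mice <- expand.grid(treatment = c("A", "B"), litter = 7:11,
                    deaths = c("0", "1", "2+"))
Mice$Freq <- c(58, 75, 49, 58, 33, 45, 15, 39,  4,  5,
               11, 19, 14, 17, 18, 22, 13, 22, 12, 15,
                5,  7, 10,  8, 15, 10, 15, 18, 17,  8)
mice.tab <- xtabs(Freq ~ litter + treatment + deaths, data = Mice)

res <- adjacent_logodds(mice.tab)
head(res)
```

Cells with a zero count would give infinite or NaN log odds, so they would need some handling before analysis.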
