I am running a one-way ANOVA on a dataset; a few of its rows are shown here:
Number Call Weight
1 X 33.29
2 Y 88.22
3 Y 70.19
4 Y 69.25
5 X 73.26
6 X 56.18
7 Y 16.19
8 Y 20.21
9 Y 50.26
10 X 95.29
I did anova using:-
aov <- aov(data$Weight ~ data$Call)
But it does not give any p value. I am also getting:-
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
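I have tried your code on these data and it works without issue. Check the str() of your data. Most probably Weight is a factor in your case and you need to convert it to numeric. Note that as.numeric() applied directly to a factor returns the integer level codes, so convert with as.numeric(as.character(data$Weight)).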
dta <- read.table(text=
"Number, Call, Weight
1, X, 33.29
2, Y, 88.22
3, Y, 70.19
4, Y, 69.25
5, X, 73.26
6, X, 56.18
7, Y, 16.19
8, Y, 20.21
9, Y, 50.26
10, X, 95.29", header=T, sep=",")
summary(aov(dta$Weight ~ dta$Call))
Result (this is the printed aov object; summary() additionally reports the F value and p-value)
Call:
aov(formula = dta$Weight ~ dta$Call)
Terms:
dta$Call Residuals
Sum of Squares 352.450 6303.466
Deg. of Freedom 1 8
Residual standard error: 28.07015
Estimated effects may be unbalanced
Result of str(dta)
'data.frame': 10 obs. of 3 variables:
$ Number: int 1 2 3 4 5 6 7 8 9 10
$ Call : Factor w/ 2 levels " X"," Y": 1 2 2 2 1 1 2 2 2 1
$ Weight: num 33.3 88.2 70.2 69.2 73.3 ...
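As a small illustration of the conversion issue described above (a sketch with made-up values; the object name dat is hypothetical and is not the asker's actual data), a factor has to go through as.character() before as.numeric(), otherwise the integer level codes are returned:

```r
# Hypothetical data in which Weight was read in as a factor
dat <- data.frame(
  Call   = factor(c("X", "Y", "Y", "X")),
  Weight = factor(c("33.29", "88.22", "70.19", "73.26"))
)

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(dat$Weight)

# Convert via character to recover the actual numbers
dat$Weight <- as.numeric(as.character(dat$Weight))

# With a numeric response, summary() reports the F value and p-value
fit <- aov(Weight ~ Call, data = dat)
summary(fit)
```

Passing data = dat and using bare column names in the formula, rather than dat$Weight ~ dat$Call, is optional here but keeps the term labels in the output shorter.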
I'm trying to fit a model to my data with the lme function, but it gives the following message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution that uses lme4::lmer instead, but is there another option that lets me keep using the lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
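If caret is not available, the same kind of problem can be detected with base R alone. The following is a minimal sketch on made-up data (the data frame d and its month labels are hypothetical): it compares the number of columns of the fixed-effect model matrix with its rank, then tabulates the cells to find the empty one.

```r
# Hypothetical toy data: every sex/month cell is filled except F in "Mar"
d <- expand.grid(sex = c("F", "M"), month = c("Jan", "Feb", "Mar"), rep = 1:3)
d <- d[!(d$sex == "F" & d$month == "Mar"), ]

X <- model.matrix(~ sex * month, data = d)
n_cols <- ncol(X)        # number of fixed-effect coefficients requested
rank_X <- qr(X)$rank     # number that can actually be estimated
n_cols > rank_X          # TRUE signals a rank-deficient design

# Locate the empty cell(s) responsible
cell_counts <- with(d, table(sex, month))
which(cell_counts == 0, arr.ind = TRUE)
```

The rank check only says that something is redundant; the cross-tabulation is what points at the missing sex/month combination.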
You can:
- construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat; but then you'll have to sort out main effects and interactions yourself (ugh)
- construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
- use a patched version of nlme (I submitted a patch here but it hasn't been considered)
remotes::install_github("bbolker/nlme")
I read that the R flexsurv package can also be used for modeling time-dependent covariates, according to Christopher Jackson (2016), "flexsurv: A Platform for Parametric Survival Modeling in R", Journal of Statistical Software, 70(8).
However, I was not able to figure out how, even after several adjustments and searches in online forums.
Before turning to the estimation of time-dependent covariates I tried to create a simple model with only time-independent covariates to test whether I specified the Surv object correctly. Here is a small example.
library(splitstackshape)
library(flexsurv)
## create sample data
n=50
set.seed(2)
t <- rpois(n,15)+1
x <- rnorm(n,t,5)
df <- data.frame(t,x)
df$id <- 1:n
df$rep <- df$t-1
Which looks like this:
t x id rep
1 12 17.696149 1 11
2 12 20.358094 2 11
3 11 2.058789 3 10
4 16 26.156213 4 15
5 13 9.484278 5 12
6 15 15.790824 6 14
...
And the long data:
long.df <- expandRows(df, "rep")
rep.vec<-c()
for(i in 1:n){
rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}
long.df$start <- rep.vec
long.df$stop <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)
Which looks like this:
t x id start stop censrec
1 12 17.69615 1 1 2 0
1.1 12 17.69615 1 2 3 0
1.2 12 17.69615 1 3 4 0
1.3 12 17.69615 1 4 5 0
1.4 12 17.69615 1 5 6 0
1.5 12 17.69615 1 6 7 0
1.6 12 17.69615 1 7 8 0
1.7 12 17.69615 1 8 9 0
1.8 12 17.69615 1 9 10 0
1.9 12 17.69615 1 10 11 0
1.10 12 17.69615 1 11 12 1
2 12 20.35809 2 1 2 0
...
Now I can estimate a simple Cox model to see whether it works:
coxph(Surv(t)~x,data=df)
This yields:
coef exp(coef) se(coef) z p
x -0.0588 0.9429 0.0260 -2.26 0.024
And in the long format:
coxph(Surv(start,stop,censrec)~x,data=long.df)
I get:
coef exp(coef) se(coef) z p
x -0.0588 0.9429 0.0260 -2.26 0.024
Taken together I conclude that my transformation into the long format was correct. Now, turning to the flexsurv framework:
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
yields:
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 5.00086 4.05569 6.16631 0.53452 NA NA NA
scale NA 13.17215 11.27876 15.38338 1.04293 NA NA NA
x 15.13380 0.01522 0.00567 0.02477 0.00487 1.01534 1.00569 1.02508
But
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")
causes an error:
Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull") :
Initial value for parameter 1 out of range
Would anyone happen to know the correct syntax for the latter Surv object? If you use the correct syntax, do you get the same estimates?
Thank you very much,
best,
David
===============
EDIT AFTER FEEDBACK FROM 42
===============
library(splitstackshape)
library(flexsurv)
x<-c(8.136527, 7.626712, 9.809122, 12.125973, 12.031536, 11.238394, 4.208863, 8.809854, 9.723636)
t<-c(2, 3, 13, 5, 7, 37 ,37, 9, 4)
df <- data.frame(t,x)
#transform into long format for time-dependent covariates
df$id <- 1:length(df$t)
df$rep <- df$t-1
long.df <- expandRows(df, "rep")
rep.vec<-c()
for(i in 1:length(df$t)){
rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}
long.df$start <- rep.vec
long.df$stop <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)
coxph(Surv(t)~x,data=df)
coxph(Surv(start,stop,censrec)~x,data=long.df)
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull",inits=c(shape=.1, scale=1))
Which yields the same estimates for both coxph models but
Call:
flexsurvreg(formula = Surv(time = t) ~ x, data = df, dist = "weibull")
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 1.0783 0.6608 1.7594 0.2694 NA NA NA
scale NA 27.7731 3.5548 216.9901 29.1309 NA NA NA
x 9.3012 -0.0813 -0.2922 0.1295 0.1076 0.9219 0.7466 1.1383
N = 9, Events: 9, Censored: 0
Total time at risk: 117
Log-likelihood = -31.77307, df = 3
AIC = 69.54614
and
Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df,
dist = "weibull", inits = c(shape = 0.1, scale = 1))
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 0.8660 0.4054 1.8498 0.3353 NA NA NA
scale NA 24.0596 1.7628 328.3853 32.0840 NA NA NA
x 8.4958 -0.0912 -0.3563 0.1739 0.1353 0.9128 0.7003 1.1899
N = 108, Events: 9, Censored: 99
Total time at risk: 108
Log-likelihood = -30.97986, df = 3
AIC = 67.95973
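One possible reason the two Weibull fits above differ, offered as an assumption rather than a confirmed diagnosis: the long format starts every subject's first interval at 1, so the first time unit of follow-up is not covered (total time at risk is 108 in the long fit versus 117 in the wide fit, a difference of one unit per subject). The sketch below builds the intervals from 0 instead, using base R only; the name long0 is hypothetical.

```r
x <- c(8.136527, 7.626712, 9.809122, 12.125973, 12.031536,
       11.238394, 4.208863, 8.809854, 9.723636)
t <- c(2, 3, 13, 5, 7, 37, 37, 9, 4)
df <- data.frame(id = seq_along(t), t = t, x = x)

# One row per unit of follow-up, so subject i contributes t[i] rows
long0 <- df[rep(seq_len(nrow(df)), times = df$t), ]
long0$start   <- sequence(df$t) - 1          # 0, 1, ..., t - 1
long0$stop    <- long0$start + 1             # 1, 2, ..., t
long0$censrec <- as.integer(long0$stop == long0$t)

# Follow-up covered by the intervals, and number of events
sum(long0$stop - long0$start)
sum(long0$censrec)

# The model could then be refitted (not run here):
# flexsurvreg(Surv(start, stop, censrec) ~ x, data = long0, dist = "weibull")
```

With intervals that tile (0, t] for each subject and a covariate that does not change over time, the counting-process likelihood should reduce to the wide-format one, so the refitted estimates would be expected to agree more closely.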
Reading the error message:
Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull", :
initial values must be a numeric vector
And then reading the help page, ?flexsurvreg, it seemed worth trying a named numeric vector for inits:
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull", inits=c(shape=.1, scale=1))
Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df,
dist = "weibull", inits = c(shape = 0.1, scale = 1))
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 5.00082 4.05560 6.16633 0.53454 NA NA NA
scale NA 13.17213 11.27871 15.38341 1.04294 NA NA NA
x 15.66145 0.01522 0.00567 0.02477 0.00487 1.01534 1.00569 1.02508
N = 715, Events: 50, Censored: 665
Total time at risk: 715
Log-likelihood = -131.5721, df = 3
AIC = 269.1443
Extremely similar results. My guess was basically a stab in the dark, so I have no guidance on how to make a choice if this had not succeeded other than to "expand the search."
I just want to mention that in flexsurv v1.1.1, running this code:
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")
doesn't return any errors. It also gives the same estimates as the non time-varying command
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
I am working on a map, where the color of each point is proportional to one response variable, and the size of the point is proportional to another. I've noticed that when I try to plot the points using formula notation things go haywire, while default notation performs as expected. I have used formula notation to plot maps many times before, and thought that the notations were nearly interchangeable. Why would these produce different results? I have read through the plot.formula and plot.default documentation and haven't been able to figure it out. Based on this I am wondering if it has to do with the columns of dat being coerced to factors, but I'm not sure why that would be happening. Any ideas?
Consider the following example data frame, dat:
latitude <- c(runif(10, min = 45, max = 48))
latitude[9] <- NA
longitude <- c(runif(10, min = -124.5, max = -122.5))
longitude[9] <- NA
color <- c("#00FFCCCC", "#99FF00CC", "#FF0000CC", "#3300FFCC", "#00FFCCCC",
"#00FFCCCC", "#3300FFCC", "#00FFCCCC", NA, "#3300FFCC")
size <- c(4.916667, 5.750000, 7.000000, 2.000000, 5.750000,
4.500000, 2.000000, 4.500000, NA, 2.000000)
dat <- as.data.frame(cbind(longitude, latitude, color, size))
Plotting according to formula notation
plot(latitude ~ longitude, data = dat, type = "p", pch = 21, col = 1, bg = color, cex = size)
produces
this mess and the following error: graphical parameter "type" is obsolete.
Plotting according to the default notation
plot(longitude, latitude, type = "p", pch = 21, col = 1, bg = color, cex = size)
works as expected, though with the same error.
There are a couple of problems with this. First is that your use of cbind is turning this into a matrix, albeit temporarily, which is converting your numbers to character. See:
dat <- as.data.frame(cbind(longitude, latitude, color, size))
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: Factor w/ 9 levels "-122.855375511572",..: 6 8 9 1 4 3 2 7 NA 5
# $ latitude : Factor w/ 9 levels "45.5418886151165",..: 6 2 4 1 3 7 5 9 NA 8
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : Factor w/ 5 levels "2","4.5","4.916667",..: 3 4 5 1 4 2 1 2 NA 1
If instead you just use data.frame, you'll get:
dat <- data.frame(longitude, latitude, color, size)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : Factor w/ 4 levels "#00FFCCCC","#3300FFCC",..: 1 3 4 2 1 1 2 1 NA 2
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)
But now the colors are all dorked. Okay, the problem is likely because your $color is a factor, which is being interpreted internally as integers. Try stringsAsFactors=FALSE (this is the default for data.frame() from R 4.0.0 onward):
dat <- data.frame(longitude, latitude, color, size, stringsAsFactors=FALSE)
str(dat)
# 'data.frame': 10 obs. of 4 variables:
# $ longitude: num -124 -124 -124 -123 -124 ...
# $ latitude : num 47.3 45.9 46.3 45.5 46 ...
# $ color : chr "#00FFCCCC" "#99FF00CC" "#FF0000CC" "#3300FFCC" ...
# $ size : num 4.92 5.75 7 2 5.75 ...
plot(latitude ~ longitude, data = dat, pch = 21, col = 1, bg = color, cex = size)
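The coercion can be checked directly without plotting. This is a minimal sketch with shortened, made-up vectors (the object m is hypothetical): a matrix can hold only one type, so combining numbers and strings with cbind() turns everything into character, while data.frame() keeps each column's own type.

```r
longitude <- c(-124.1, -123.5, NA)
color     <- c("#00FFCCCC", "#99FF00CC", NA)

# cbind() builds a matrix, and mixing numeric and character gives character
m <- cbind(longitude, color)
is.character(m)

# data.frame() stores each column separately and keeps its type
dat <- data.frame(longitude, color, stringsAsFactors = FALSE)
sapply(dat, class)
```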
I am attempting to run a classification algorithm for a dataset with no missing values. Here is the dataset description:
'data.frame': 59977 obs. of 6 variables:
$ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 1 2 2 ...
$ age : num 35.7 35.7 35.7 35.7 35.7 ...
$ code : Factor w/ 492 levels "ADN105","AXN16B",..: 128 128 128 363 363 363 104 104 221 221 ...
$ totalflags : num 4 4 4 4 4 4 3 3 2 2 ...
$ measure2 : num 30 30 30 1 1 1 23 23 22 22 ...
$ outcome : num 1 1 1 0 0 0 1 1 1 1 ...
- attr(*, "na.action")=Class 'omit' Named int [1:138] 3718 3719 5493 5494 5495 5496 7302 7303 8415 8416 ...
.. ..- attr(*, "names")= chr [1:138] "4929" "4930" "7384" "7385" ...
When I run the following command
x <- Mydataset[,1:5]
y <- Mydataset[,6]
fit <- glmnet(x, y, family="binomial", alpha=0.5, lambda=0.001)
I get
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NAs introduced by coercion
Before fitting the glmnet model, I did this:
Mydataset <- na.omit(Mydataset)
And checked to make sure no NA's exist:
sapply(Mydataset, function(y) sum(length(which(is.na(y)))))
and I got:
gender age code totalflags measure2 outcome
0 0 0 0 0 0
I looked at other questions but couldn't find anything relevant. I'd appreciate any thoughts and help with this.
EDIT: ANSWER
I did a little digging and decided to change the data frame to numeric matrix and the model ran without complaining. This is the code that helped me:
x <- data.matrix(Mydataset[,1:5])
y <- data.matrix(Mydataset[,6])
The most likely cause is a small or zero number of observations within one or more levels of the factor variables. Try this first:
Mydataset [ c('gender', 'code') ] <-
lapply( Mydataset [ c('gender', 'code') ], factor)
If that's not effective then you should show the actual code used and better description and names of all objects used. At the moment we don't even know what are x and y.
EDIT: The glmnet function does not have a formula interface and is not set up to handle data.frames and factors the way that typical R regression functions would allow. After looking at the structure of x (still a list/dataframe) and reviewing the help page for ?glmnet and doing a bit of searching for the correct way to handle factors when a numeric matrix is the expected input, I suggest converting your factors to dummies with model.matrix. It's going to be easier for interpretation of the results if you change the default contrast scheme for treatment contrasts (See https://stats.stackexchange.com/questions/69804/group-categorical-variables-in-glmnet):
contr.Dummy <- function(contrasts, ...){
conT <- contr.treatment(contrasts=FALSE, ...)
conT
}
options(contrasts=c(ordered='contr.Dummy', unordered='contr.Dummy'))
x.m <- model.matrix( ~.-1, x)
fit <- glmnet(x=x.m, y, family="binomial", alpha=0.5, lambda=0.001)
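To see what the different conversions do before anything is handed to glmnet, here is a minimal sketch on made-up predictors (the values and the three-level code factor are hypothetical, and default contrasts are assumed rather than contr.Dummy). It does not call glmnet itself.

```r
# Hypothetical predictors mixing factors and numeric columns
x <- data.frame(
  gender = factor(c("F", "M", "M", "F", "M", "F")),
  age    = c(35.7, 41.2, 28.9, 52.3, 47.0, 30.1),
  code   = factor(c("A", "B", "C", "A", "C", "B"))
)

# as.matrix() on a mixed data frame yields a character matrix
is.character(as.matrix(x))

# data.matrix() is numeric, but replaces each factor by its integer level code
x.codes <- data.matrix(x)

# model.matrix() expands each factor into 0/1 indicator columns
x.m <- model.matrix(~ . - 1, data = x)
colnames(x.m)
```

The integer codes from data.matrix() impose an arbitrary ordering on a factor such as code, which is why the indicator columns from model.matrix() are usually the safer input for a penalized regression.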
I have a mixed model fitted with lmer, constructed from some repeated-measures nutrient intake data (two 24-hour intake periods per RespondentID):
Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1). I would also like to collect the result of the fixed effects by RespondentID. coef(Male.lme1) does not provide exactly what I need, as I show below.
> summary(Male.lme1)
Linear mixed model fit by REML
Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID)
Data: Male.Data
AIC BIC logLik deviance REMLdev
9994 10039 -4990 9952 9980
Random effects:
Groups Name Variance Std.Dev.
RespondentID (Intercept) 0.19408 0.44055
Residual 0.37491 0.61230
Number of obs: 4498, groups: RespondentID, 2249
Fixed effects:
Estimate Std. Error t value
(Intercept) 13.98016 0.03405 410.6
AgeFactor4to8 0.50572 0.04084 12.4
AgeFactor9to13 0.94329 0.04159 22.7
AgeFactor14to18 1.30654 0.04312 30.3
IntakeDayDay2Intake -0.13871 0.01809 -7.7
Correlation of Fixed Effects:
(Intr) AgFc48 AgF913 AF1418
AgeFactr4t8 -0.775
AgeFctr9t13 -0.761 0.634
AgFctr14t18 -0.734 0.612 0.601
IntkDyDy2In -0.266 0.000 0.000 0.000
I have appended the fitted results to my data, head(Male.Data) shows
NutrientID RespondentID Gender Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY lmefits
2 267 100020 1 12 0.4952835 Day1Intake 12145.852 9to13 15.61196 15.22633
7 267 100419 1 14 0.3632839 Day1Intake 9591.953 14to18 15.01444 15.31373
8 267 100459 1 11 0.4952835 Day1Intake 7838.713 9to13 14.51458 15.00062
12 267 101138 1 15 1.3258785 Day1Intake 11113.266 14to18 15.38541 15.75337
14 267 101214 1 6 2.1198688 Day1Intake 7150.133 4to8 14.29022 14.32658
18 267 101389 1 5 2.1198688 Day1Intake 5091.528 4to8 13.47928 14.58117
The first couple of lines from coef(Male.lme1) are:
$RespondentID
(Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake
100020 14.28304 0.5057221 0.9432941 1.306542 -0.1387098
100419 14.00719 0.5057221 0.9432941 1.306542 -0.1387098
100459 14.05732 0.5057221 0.9432941 1.306542 -0.1387098
101138 14.44682 0.5057221 0.9432941 1.306542 -0.1387098
101214 13.82086 0.5057221 0.9432941 1.306542 -0.1387098
101389 14.07545 0.5057221 0.9432941 1.306542 -0.1387098
To demonstrate how the coef results relate to the fitted estimates in Male.Data (which were grabbed using Male.Data$lmefits <- fitted(Male.lme1)), for the first RespondentID, who has the AgeFactor level 9-13:
- the fitted value is 15.22633, which equals - from the coeffs - (Intercept) + (AgeFactor9-13) = 14.28304 + 0.9432941
Is there a clever command that will do what I want automatically, which is to extract the fixed effect estimate for each subject, or am I faced with a series of if statements trying to apply the correct AgeFactor level to each subject to get the correct fixed effect estimate, after deducting the random effect contribution from the Intercept?
Update, apologies, was trying to cut down on the output I was providing and forgot about str(). Output is:
>str(Male.Data)
'data.frame': 4498 obs. of 11 variables:
$ NutrientID : int 267 267 267 267 267 267 267 267 267 267 ...
$ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Gender : int 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ BodyWeight : num 51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
$ lmefits : num 15.2 15.3 15 15.8 14.3 ...
The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.
I have been doing horrible ifelse statements since I posted, so will try out your suggestion immediately. :)
Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)
AgeLevels <- length(unique(Male.Data$AgeFactor))
Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] +
c(0,fixef(Male.lme1)[2:AgeLevels])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18", "19to30","31to50","51to70","71Plus") )] +
c(0,fixef(Male.lme1)[(AgeLevels+1)])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )])
names(Temp) <- c("FxdEffct")
Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:
y = Xb + Zu + e
where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
# Xb
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu
ran <- t(as.matrix(getME(fm1,'Zt'))) %*% unlist(ranef(fm1))
# Xb + Zu
fixran <- fix + ran
I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:
> head(cbind(fix, ran, fixran, fitted(fm1)))
[,1] [,2] [,3] [,4]
[1,] 251.4051 2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950
# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE
# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE
nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
Reaction ~ Days | Subject, data = sleepstudy,
panel = function(x, y, ...){
panel.points(x, y, type='b', col='blue')
panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
},
key = legend
)
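A variant of the decomposition above that may be more robust across lme4 versions, given as a sketch rather than a replacement: in recent lme4 releases the elements of unlist(ranef(fm1)) may not be in the same order as the columns of Z when a grouping factor has more than one random-effect term, so here both Z and the random-effects vector b are taken from getME(), and Xb comes from predict() with re.form = NA.

```r
library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Xb: predictions with all random effects set to zero
fix <- predict(fm1, re.form = NA)

# Zu: Z and b taken from the same fitted object, so their ordering agrees
ran <- drop(as.matrix(getME(fm1, "Z") %*% getME(fm1, "b")))

# Xb + Zu should reproduce the fitted values
all.equal(unname(fix + ran), unname(fitted(fm1)))
```

Because fix and ran have one element per observation, they can be attached to the data frame as new columns, for example sleepstudy$fix <- fix.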
It is going to be something like this (although you really should have given us the results of str(Male.Data) because model output does not tell us the factor levels for the baseline values:)
#First look at the coefficients
fixef(Male.lme2)
#Then do the calculations
fixef(Male.lme2)['(Intercept)'] +
c(0,fixef(Male.lme2)[2:4])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] +
c(0,fixef(Male.lme2)[5])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]
You are basically running the original data through a match function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)
EDIT: I just noticed that you put a "-1" in the formula, so perhaps all of your AgeFactor terms are listed in the output and you can take out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.
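The match() lookup can be checked against the design-matrix product on a small example. The sketch below uses lm() on made-up data as a stand-in (the data frame d, its three age levels, and the response y are all hypothetical); for an lmer fit the analogous pieces would be fixef(fit) in place of coef(fit) and getME(fit, "X") in place of model.matrix(fit).

```r
set.seed(1)
lev_age <- c("1to3", "4to8", "9to13")
lev_day <- c("Day1Intake", "Day2Intake")
d <- data.frame(
  AgeFactor = factor(rep(lev_age, each = 10), levels = lev_age),
  IntakeDay = factor(rep(lev_day, times = 15), levels = lev_day)
)
d$y <- rnorm(nrow(d))

fit <- lm(y ~ AgeFactor + IntakeDay, data = d)
b <- coef(fit)

# match() approach: intercept plus the coefficient for each row's level,
# with 0 standing in for the baseline level of each factor
by_match <- b["(Intercept)"] +
  c(0, b[2:3])[match(d$AgeFactor, lev_age)] +
  c(0, b[4])[match(d$IntakeDay, lev_day)]

# Design-matrix approach: one matrix product, no level names to spell out
by_matrix <- drop(model.matrix(fit) %*% b)

all.equal(unname(by_match), unname(by_matrix))
```

The matrix product avoids hard-coding level names and coefficient positions, so it keeps working if levels are added or the formula changes.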