Visualizing regression coefficient of a regression - r

I am trying to figure out the best way to display a list of 30+ coefficients on a regression of a continuous variable.
(This may belong more in CrossValidated, I am not sure.)
Here is my example:
library("nycflights13")
library(dplyr)
flights <- nycflights13::flights
flights<- sample_n (flights, 3000)
m1<- glm(formula = arr_delay ~ . , data = flights)
summary(m1)

An option is dwplot from dotwhisker
library(dotwhisker)
dwplot(m1)
As #BenBolker commented, by default, the dwplot scales regression coeffficients by 2 standard deviations of the predictor variable
Or if we need a data.frame/tibble, then use tidy from broom
library(broom)
tidy(m1)

This may help. You could select a specific coefficient with the following :
str(flights) # to print list of data features
coef(m1)["age"] # here I just suppose that you have an axis called "age", you could select as many features coefficients as you want. For this you coud use a vector of relevant axis.
You could have a look at :
extract coefficients from glm in R

tl;dr dwplot is still (a) right answer, but there's a lot to say about the details of how you're fitting this model (and why it takes a really really long time).
glm vs lm
You're using glm() to fit a linear model, which isn't incorrect (and which would allow you to generalize to problems with count or binary responses). However, it's overkill in this case — lm() will work just fine, and be faster [considerably faster when it comes to generating confidence intervals etc.]
system.time(m1 <- glm(formula = arr_delay ~ . , data = flights)) ## 6 seconds
system.time(m2 <- lm(formula = arr_delay ~ . , data = flights, x=TRUE)) ## 13 seconds
(the reason for including x=TRUE will be discussed below)
The time difference becomes more stark when tidying/computing confidence intervals:
setTimeLimit(elapsed=600)
system.time(tidy(m1, conf.int=TRUE)) ## gave up after 10 minutes
system.time(tt <- tidy(m2, conf.int=TRUE)) ## 3.2 seconds
Tidying glms by default uses MASS::confint.glm() to compute confidence intervals by likelihood profiling, which is more accurate than Wald (mean +/- 1.96*SE) intervals for non-Gaussian responses), but way slower.
modeling choices
One of the reasons that everything is so slow is that there are lots of parameters (length(coef(m2)) is 1761). Why?
Although there are only 19 columns in the input data frame (so we might naively expect 18 coefficients), 4 of them are categorical, so get expanded to indicator variables:
catvars <- names(flights)[sapply(flights,is.character)]
sapply(catvars, function(x) length(unique(flights[[x]])))
## carrier tailnum origin dest
## 15 1653 3 94
So, most of the coefficients come from modeling the departures of individual planes (tailnum) [table(table(flights$tailnum)) shows that in this subsample of the data, more than half of the planes are recorded only once ...] It might not make sense to include this variable (if I were going to use tailnum, I would treat it as a random effect, although that would add a lot of modeling complexity).
Let's proceed without tailnum (we will still have plenty of coefficients to worry about).
plotting
At this point we're doing approximately what dotwhisker::dwplot does, but doing it by hand for more flexibility (in particular, ordering the terms by value).
The next step (1) extracts coefficients/conf int etc.; (2) scales non-binary variables by 2SD (using an internal function from dotwhisker); (3) drops the intercept; (4) makes term a factor ordered by the coefficient value and computes whether the term is significant (i.e., whether the lower and upper CI limits are both above or both below zero).
tt <- (tidy(m3, conf.int=TRUE)
%>% dotwhisker::by_2sd(flights)
%>% filter(term!="(Intercept)")
%>% mutate(term=reorder(factor(term),estimate),
sig=(conf.low*conf.high)>1)
)
Plot:
(ggplot(tt, aes(x=estimate,y=term,xmin=conf.low,xmax=conf.high))
+ geom_pointrange(aes(colour=sig))
+ geom_vline(xintercept=0,lty=2)
+ scale_colour_manual(values=c("black","red"))
)

Related

GAM smooths interaction differences - calculate p value using mgcv and gratia 0.6

I am using the useful gratia package by Gavin Simpson to extract the difference in two smooths for two different levels of a factor variable. The smooths are generated by the wonderful mgcv package. For example
library(mgcv)
library(gratia)
m1 <- gam(outcome ~ s(dep_var, by = fact_var) + fact_var, data = my.data)
diff1 <- difference_smooths(m1, smooth = "s(dep_var)")
draw(diff1)
This give me a graph of the difference between the two smooths for each level of the "by" variable in the gam() call. The graph has a shaded 95% credible interval (CI) for the difference.
Statistical significance, or areas of statistical significance at the 0.05 level, is assessed by whether or where the y = 0 line crosses the CI, where the y axis represents the difference between the smooths.
Here is an example from Gavin's site where the "by" factor variable had 3 levels.
The differences are clearly statistically significant (at 0.05) over nearly all of the graphs.
Here is another example I have generated using a "by" variable with 2 levels.
The difference in my example is clearly not statistically significant anywhere.
In the mgcv package, an approximate p value is outputted for a smooth fit that tests the null hypothesis that the coefficients are all = 0, based on a chi square test.
My question is, can anyone suggest a way of calculating a p value that similarly assesses the difference between the two smooths instead of solely relying on graphical evidence?
The output from difference_smooths() is a data frame with differences between the smooth functions at 100 points in the range of the smoothed variable, the standard error for the difference and the upper and lower limits of the CI.
Here is a link to the release of gratia 0.4 that explains the difference_smooths() function
enter link description here
but gratia is now at version 0.6
enter link description here
Thanks in advance for taking the time to consider this.
Don
One way of getting a p value for the interaction between the by factor variables is to manipulate the difference_smooths() function by activating the ci_level option. Default is 0.95. The ci_level can be manipulated to find a level where the y = 0 is no longer within the CI bands. If for example this occurred when ci_level = my_level, the p value for testing the hypothesis that the difference is zero everywhere would be 1 - my_level.
This is not totally satisfactory. For example, it would take a little manual experimentation and it may be difficult to discern accurately when zero drops out of the CI. Although, a function could be written to search the accompanying data frame that is outputted with difference_smooths() as the ci_level is varied. This is not totally satisfactory either because the detection of a non-zero CI would be dependent on the 100 points chosen by difference_smooths() to assess the difference between the two curves. Then again, the standard errors are approximate for a GAM using mgcv, so that shouldn't be too much of a problem.
Here is a graph where the zero first drops out of the CI.
Zero dropped out at ci_level = 0.88 and was still in the interval at ci_level = 0.89. So an approxiamte p value would be 1 - 0.88 = 0.12.
Can anyone think of a better way?
Reply to Gavin Simpson's comments Feb 19
Thanks very much Gavin for taking the time to make your comments.
I am not sure if using the criterion, >= 0 (for negative diffs), is a good way to go. Because of the draws from the posterior, there is likely to be many diffs that meet this criterion. I am interpreting your criterion as sample the posterior distribution and count how many differences meet the criterion, calculate the percentage and that is the p value. Correct me if I have misunderstood. Using this approach, I consistently got p values at around 0.45 - 0.5 for different gam models, even when it was clear the difference in the smooths should be statistically significant, at least at p = 0.05, because the confidence band around the smooth did not contain zero at a number of points.
Instead, I was thinking perhaps it would be better to compare the means of the posterior distribution of each of the diffs. For example
# get coefficients for the by smooths
coeff.level1 <- coef(gam.model1)[31:38]
coeff.level0 <- coef(gam.model1)[23:30]
# these indices are specific to my multi-variable gam.model1
# in my case 8 coefficients per smooth
# get posterior coefficients variances for the by smooths' coefficients
vp_level1 <- gam.model1$Vp[31:38, 31:38]
vp_level0 <- gam.model1$Vp[23:30, 23:30]
#run the simulation to get the distribution of each
#difference coefficient using the joint variance
library(MASS)
no.draws = 1000
sim <- mvrnorm(n = no.draws, (coeff.level1 - coeff.level0),
(vp_level1 + vp_level0))
# sim is a no.draws X no. of coefficients (8 in my case) matrix
# put the results into a data.frame.
y.group <- data.frame(y = as.vector(sim),
group = c(rep(1,no.draws), rep(2,no.draws),
rep(3,no.draws), rep(4,no.draws),
rep(5,no.draws), rep(6,no.draws),
rep(7,no.draws), rep(8,no.draws)) )
# y has the differences sampled from their posterior distributions.
# group is just a grouping name for the 8 sets of differences,
# (one set for each difference in coefficients)
# compare means with a linear regression
lm.test <- lm(y ~ as.factor(group), data = y.group)
summary(lm.test)
# The p value for the F statistic tells you how
# compatible the data are with the null hypothesis that
# all the group means are equal to each other.
# Same F statistic and p value from
anova(lm.test)
One could argue that if all coefficients are not equal to each other then they all can't be equal to zero but that isn't what we want here.
The basis of the smooth tests of fit given by summary(mgcv::gam.model1)
is a joint test of all coefficients == 0. This would be from a type of likelihood ratio test where model fit with and without a term are compared.
I would appreciate some ideas how to do this with the difference between two smooths.
Now that I got this far, I had a rethink of your original suggestion of using the criterion, >= 0 (for negative diffs). I reinterpreted this as meaning for each simulated coefficient difference distribution (in my case 8), count when this occurs and make a table where each row (my case, 8) is for one of these distributions with two columns holding this count and (number of simulation draws minus count), Then on this table run a chi square test. When I did this, I got a very low p value when I believe I shouldn't have as 0 was well within the smooth difference CI across almost all the levels of the exposure. Maybe I am still misunderstanding your suggestion.
Follow up thought Feb 24
In a follow up thought, we could create a variable that represents the interaction between the by factor and continuous variable
library(dplyr)
my.dat <- my.dat %>% mutate(interact.var =
ifelse(factor.2levels == "yes", 1, 0)*cont.var)
Here I am assuming that factor.2levels has the levels ("no", "yes"), and "no" is the reference level. The ifelse function creates a dummy variable which is multiplied by the continuous variable to generate the interactive variable.
Then we place this interactive variable in the GAM and get the usual statistical test for fit, that is, testing all the coefficients == 0.
#GavinSimpson actually posted a method of how to get the difference between two smooths and assess its statistical significance here in 2017. Thanks to Matteo Fasiolo for pointing me in that direction.
In that approach, the by variable is converted to an ordered categorical variable which causes mgcv::gam to produce difference smooths in comparison to the reference level. Statistical significance for the difference smooths is then tested in the usual way with the summary command for the gam model.
However, and correct me if I have misunderstood, the ordered factor approach causes the smooth for the main effect to now be the smooth for the reference level of the ordered factor.
The approach I suggested, see the main post under the heading, Follow up thought Feb 24, where the interaction variable is created, gives an almost identical result for the p value for the difference smooth but does not change the smooth for the main effect. It also does not change the intercept and the linear term for the by categorical variable which also both changed with the ordered variable approach.

Implementing Longitudinal Random Forest with LongituRF package in R

I have some high dimensional repeated measures data, and i am interested in fitting random forest model to investigate the suitability and predictive utility of such models. Specifically i am trying to implement the methods in the LongituRF package. The methods behind this package are detailed here :
Capitaine, L., et al. Random forests for high-dimensional longitudinal data. Stat Methods Med Res (2020) doi:10.1177/0962280220946080.
Conveniently the authors provide some useful data generating functions for testing. So we have
install.packages("LongituRF")
library(LongituRF)
Let's generate some data with DataLongGenerator() which takes as arguments n=sample size, p=number of predictors and G=number of predictors with temporal behavior.
my_data <- DataLongGenerator(n=50,p=6,G=6)
my_data is a list of what you'd expect Y (response vector),
X (matrix of fixed effects predictors), Z (matrix of random-effects predictors),
id (vector of sample identifier) and time (vector of time measurements). To fit random forest model simply
model <- REEMforest(X=my_data$X,Y=my_data$Y,Z=my_data$Z,time=my_data$time,
id=my_data$id,sto="BM",mtry=2)
takes about 50secs here so bear with me
so far so good. Now im clear about all the parameters here except for Z. What is Z when i go to fit this model on my actual data?
Looking at my_data$Z.
dim(my_data$Z)
[1] 471 2
head(my_data$Z)
[,1] [,2]
[1,] 1 1.1128914
[2,] 1 1.0349287
[3,] 1 0.7308948
[4,] 1 1.0976203
[5,] 1 1.3739856
[6,] 1 0.6840415
Each row of looks like an intercept term (i.e. 1) and values drawn from a uniform distribution runif().
The documentation of REEMforest() indicates that "Z [matrix]: A Nxq matrix containing the q predictor of the random effects." How is this matrix to be specified when using actual data?
My understanding is that traditionally Z is simply one-hot (binary) encoding of the group variables (e.g. as described here), so Z from the DataLongGenerator() should be nxG (471x6) sparse matrix no?
Clarity on how to specify the Z parameter with actual data would be appreciated.
EDIT
My specific example is as follows, i have a response variable (Y). Samples (identified with id) were randomly assigned to intervention (I, intervention or no intervention). A high dimensional set of features (X). Features and response were measured at two timepoints (Time, baseline and endpoint). I am interested in predicting Y, using X and I. I am also interested in extracting which features were most important to predicting Y (the same way Capitaine et al. did with HIV in their paper).
I will call REEMforest() as follows
REEMforest(X=cbind(X,I), Y=Y, time=Time, id=id)
What should i use for Z?
When the function DataLongGenerator() creates Z, it's a random uniform data in a matrix. The actual coding is
Z <- as.matrix(cbind(rep(1, length(f)), 2 * runif(length(f))))
Where f represents the length of the matrices that represent each of the elements. In your example, you used 6 groups of 50 participants with 6 fixed effects. That led to a length of 472.
From what I can gather, since this function is designed to simulate longitudinal data, this is a simulation of random effects on that data. If you were working with real data, I think it would be a lot easier to understand.
While this example doesn't use RE-EM forests, I thought it was pretty clear, because it uses tangible elements as an example. You can read about random effects in section 1.2.2 Fixed v. Random Effects. https://ademos.people.uic.edu/Chapter17.html#32_fixed_effects
Look at section 3.2 to see examples of random effects that you could intentionally model if you were working with real data.
Another example: You're running a cancer drug trial. You've collected patient demographics on a weekly basis: weight, temperature, and a CBC panel and different groups of drug administration: 1 unit per day, 2 units per day, and 3 units per day.
In traditional regression, you'd model these variables to determine how accurately the model identifies the outcome. The fixed effects are the explainable variance or R2. So if you've .86 or 86% then 14% is unexplained. It could be an interaction causing the noise, the unexplained variance between perfect and what the model determined was the outcome.
Let's say the patients with really low white blood cell counts and were overweight responded far better to the treatment. Or perhaps the patients with red hair responded better; that's not in your data. In terms of longitudinal data, let's say that the relationship (the interaction relationship) only appears after some measure of time passes.
You can try to model different relationships to evaluate the random interactions in the data. I think you'd be better off with one of the many ways to evaluate interactions systematically than a random attempt to identify random effects, though.
EDITED I started to write this in the comments with #JustGettinStarted, but it was too much.
Without the background - the easiest way to achieve this would be to run something like REEMtree::REEMtree(), setting the random effects argument to random = ~1 | time / id). After it runs, extract the random effects it's calculated. You can do it like this:
data2 <- data %>% mutate(oOrder = row_number()) %>% # identify original order of the data
arrange(time, id) %>%
mutate(zOrder = row_number()) # because the random effects will be in order by time then id
extRE <- data.frame(time = attributes(fit$RandomEffects[2][["id"]])[["row.names"]]) %>%
separate(col = time,
into = c("time", "id"),
sep = "\\/") %>%
mutate(Z = fit$RandomEffects[[2]] %>% unlist(),
id = as.integer(id),
time = time)) # set data type to match dataset for time
data2 <- data2 %>% left_join(extRE) %>% arrange(oOrder) # return to original order
Z = cbind(rep(1, times = nrows(data2)), data2$Z)
Alternatively, I suggest that you start with the random generation of random effects. The random-effects you start with are just a jumping-off point. The random effects at the end will be different.
No matter how many ways I tried to use LongituRF::REEMforest() with real data, I ran into errors. I had an uninvertible matrix failure every time.
I noticed that the data generated by DataLongGenerator() comes in order by id, then time. I tried to order the data (and Z) that way, but it didn't help. When I extracted all the functionality out of the package LongituRF, I used the MERF (multiple-effects random forest) function with no problems. Even in the research paper, that method was solid. Just thought it was worth mentioning.

Finding model predictor values that maximize the outcome

How do you find the set of values for model predictors (a mixture of linear and non-linear) that yield the highest value for the response.
Example Model:
library(lme4); library(splines)
summary(lmer(formula = Solar.R ~ 1 + bs(Ozone) + Wind + Temp + (1 | Month), data = airquality, REML = F))
Here I am interested in what conditions (predictors) produce the highest solar radation (outcome).
This question seems simple, but I've failed to find a good answer using Google.
If the model was simple, I could take the derivatives to find the maximum or minimum. Someone has suggested that if the model function can be extracted, the stats::optim() function might be used. As a last resort, I could simulate all the reasonable variations of input values and plug it into the predict() function and look for the maximum value.
The last approach mentioned doesn't seem very efficient and I imagine that this is a common enough task (e.g., finding optimal customers for advertising) that someone has built some tools for handling it. Any help is appreciated.
There are some conceptual issues here.
for the simple terms (Wind and Temp), the response is a linear (and hence both monotonic and unbounded) function of the predictors. Thus if these terms have positive parameter estimates, increasing their values to infinity (Inf) will give you an infinite response (Solar.R); values should be as small as possible (negative infinite) if the coefficients are negative. Practically speaking, then, you want to set these predictors to the minimum or maximum reasonable value if the parameter estimates are respectively negative or positive.
for the bs term, I'm not sure what the properties of the B-spline are beyond the boundary knots, but I'm pretty sure that the curves go off to positive or negative infinity, so you've got the same issue. However, for the case of bs, it's also possible that there are one or more interior maxima. For this case I would probably try to extract the basis terms and evaluate the spline over the range of the data ...
Alternatively, your mentioning optim makes me think that this is a possibility:
data(airquality)
library(lme4)
library(splines)
m1 <- lmer(formula = Solar.R ~ 1 + bs(Ozone) + Wind + Temp + (1 | Month),
data = airquality, REML = FALSE)
predval <- function(x) {
newdata <- data.frame(Ozone=x[1],Wind=x[2],Temp=x[3])
## return population-averaged prediction (no Month effect)
return(predict(m1, newdata=newdata, re.form=~0))
}
aq <- na.omit(airquality)
sval <- with(aq,c(mean(Ozone),mean(Wind),mean(Temp)))
predval(sval)
opt1 <- optim(fn=predval,
par=sval,
lower=with(aq,c(min(Ozone),min(Wind),min(Temp))),
upper=with(aq,c(max(Ozone),max(Wind),max(Temp))),
method="L-BFGS-B", ## for constrained opt.
control=list(fnscale=-1)) ## for maximization
## opt1
## $par
## [1] 70.33851 20.70000 97.00000
##
## $value
## [1] 282.9784
As expected, this is intermediate in the range of Ozone(1-168), and min/max for Wind (2.3-20.7) and Temp (57-97).
This brute force solution could be made much more efficient by automatically selecting the min/max values for the simple terms and optimizing only over the complex (polynomial/spline/etc.) terms.

lm() saving residuals with group_by with R- confused spss user

This is complete reEdit of my orignal question
Let's assume I'm working on RT data gathered in a repeated measure experiment. As part of my usual routine I always transform RT to natural logarytms and then compute a Z score for each RT within each partipant adjusting for trial number. This is typically done with a simple regression in SPSS syntax:
split file by subject.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT rtLN
/METHOD=ENTER trial
/SAVE ZRESID.
split file off.
To reproduce same procedure in R generate data:
#load libraries
library(dplyr); library(magrittr)
#generate data
ob<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
ob<-factor(ob)
trial<-c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt<-c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)
cond<-c("first","first","first","snd","snd","snd","first","first","first","snd","snd","snd","first","first","first","snd","snd","snd")
#Following variable is what I would get after using SPSS code
ZreSPSS<-c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)
sym<-data.frame(ob, trial, rt, cond, ZreSPSS)
I could apply a formula (blend of Mark's and Daniel's solution) to compute residuals from a lm(log(rt)~trial) regression but for some reason group_by is not working here
sym %<>%
group_by (ob) %>%
mutate(z=residuals(lm(log(rt)~trial)),
obM=mean(rt), obSd=sd(rt), zRev=z*obSd+obM)
Resulting values clearly show that grouping hasn't kicked in.
Any idea why it didn't work out?
Using dplyr and magrittr, you should be able to calculate z-scores within individual with this code (it breaks things into the groups you tell it to, then calculates within that group).
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN))
You should then be able to do use that in your model. However, one thing that may help your shift to R thinking is that you can likely build your model directly, instead of having to make all of these columns ahead of time. For example, using lme4 to treat subject as a random variable:
withRandVar <-
lmer(log(rt) ~ cond + (1|as.factor(subject))
, data = experiment)
Then, the residuals should already be on the correct scale. Further, if you use the z-scores, you probably should be plotting on that scale. I am not actually sure what running with the z-scores as the response gains you -- it seems like you would lose information about the degree of difference between the groups.
That is, if the groups are tight, but the difference between them varies by subject, a z-score may always show them as a similar number of z-scores away. Imagine, for example, that you have two subjects, one scores (1,1,1) on condition A and (3,3,3) on condition B, and a second subject that scores (1,1,1) and (5,5,5) -- both will give z-scores of (-.9,-.9,-.9) vs (.9,.9,.9) -- losing the information that the difference between A and B is larger in subject 2.
If, however, you really want to convert back, you can probably use this to store the subject means and sds, then multiply the residuals by subjSD and add subjMean.
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN)
, subjMean = mean(rtLN)
, subjSD = sd(rtLN))
mylm <- lm(x~y)
rstandard(mylm)
This returns the standardized residuals of the function. To bind these to a variable you can do:
zresid <- rstandard(mylm)
EXAMPLE:
a<-rnorm(1:10,10)
b<-rnorm(1:10,10)
mylm <- lm(a~b)
mylm.zresid<-rstandard(mylm)
See also:
summary(mylm)
and
mylm$coefficients
mylm$fitted.values
mylm$xlevels
mylm$residuals
mylm$assign
mylm$call
mylm$effects
mylm$qr
mylm$terms
mylm$rank
mylm$df.residual
mylm$model

Standardisation in MuMIn package in R

I am using the 'MuMIn' package in R to select models and calculate effect sizes of the input variables (rain, brk, onset, wid). To make my effect size comparable between variables, I standardised them using standardize function in arm package. Here is the code that I am following:
For reference, please refer to the appendix of this paper: http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2010.02210.x/full
Grueber et al. 2011: Multimodel inference in ecology and evolution: challenges and solutions
data1<-read.csv("data.csv",header=TRUE) #reads the data
global.model<-lmer(yld.res ~ rain + brk + onset + wid + (1|state),data=data1,REML="FALSE") # prepares a global model
stdz.model <- standardize(global.model,standardize.y = FALSE) # standardise the input varaibles
model.set <- dredge(stdz.model) ### generates the full submodel set
top.models <- get.models(model.set, subset= delta<2) # selects models with delta AIC <2
model.avg(top.models) # calculates the average effect size of input variables
Here is the result of model.avg(top.models) which gives the average effect size of each input variable
Coefficients:
(Intercept) brk rain wid onset
subset -4.281975e-14 -106.0919 51.54688 39.82837 35.68766
I read around how the standardize function works- subtracts mean and divides by 2SD.
My question is this: Since I have standardised the input variables, should not the effect sizes be between -1 to 1? or the effect size which the output shows is correct?
Please advise
Thanks a lot
This is more of a statistical question than a programming question, but: you've only standardized the predictor variables, not the response variable (you specified standardize.y=FALSE); therefore, each of your coefficients represents the expected change of the response (in the response's units!) per 2 SD change in the predictor. If the range of the response is large (as it must be in your example), then there could be a very large change. For example, if I were analyzing the change in elephant weight measured in milligrams, I could expect very large changes in the response for reasonably small changes in the predictors (e.g. sex, age, food availability). You should probably use standardize.y=TRUE if you want truly nondimensional/unitless effect sizes. Even nondimensional effects aren't necessarily constrained to be between -1 and +1, but it would be surprising for them to be so large.
By the way, I think your standardize function comes from the arm package, not from MuMIn (library("sos"); findFn("standardize",sortby="Function)).

Resources