I am trying to show different colors for coefficients that are not significant (p > 0.05) and the ones that are. Also, if someone has a way to show a legend or otherwise explain the colors, that would be nice.
Any ideas?
Sample code:
library(nycflights13)
library(dplyr)
library(dotwhisker)
library(MASS)
flights <- nycflights13::flights
flights<- sample_n (flights, 500)
m1<- glm(formula = arr_delay ~ dep_time + origin+ air_time+ distance , data = flights)
#m1<- glm(formula = arr_delay ~ . , data = flights)
m1<- stepAIC(m1)
p<- dotwhisker::dwplot(m1)
z<- p +
geom_vline(xintercept=0, linetype="dashed")+
geom_segment(aes(x=conf.low,y=term,xend=conf.high,
yend=term,col=p.value<0.05)) +
geom_point(aes(x=estimate,y=term,col=p.value<0.05)) +
xlab("standardized coefficient") +
ylab("coefficient") +
ggtitle("coefficients in the model and significance")
print(z)
Your code already more or less does what you want. The problem is that the object p produced by dwplot already has a geom_segment layer and a geom_point layer, each with a number of aesthetic mappings. Their colors are currently mapped to the variable model, which is just a factor that allows different colorings when comparing models side by side. It is possible to overwrite these mappings, though:
p$layers[[1]]$mapping[5] <- aes(color = p.value < 0.05)
p$layers[[2]]$mapping[4] <- aes(color = p.value < 0.05)
And you can change the legend label with
p$labels$colour <- "Significant"
By default, dwplot also hides the legend, but we can reset that with:
p$theme <- list()
So without adding any new geoms or creating the object z, we have:
p
Note that p is still a valid and internally consistent ggplot, so you can continue to style it as desired, for example:
p + theme_bw() + geom_vline(xintercept = 0, lty = 2)
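If editing the internals of the dwplot object feels too fragile (the mapping positions used above could differ between dotwhisker versions), the same kind of plot can be built from first principles. This is only a sketch: it uses a stand-in lm on mtcars rather than the flights model, and the names coef_df and p_coef are made up for the illustration.

```r
library(ggplot2)

# Stand-in model on a built-in dataset (not the flights model from the question)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Collect estimates, 95% confidence intervals and p-values in one data frame
est <- coef(summary(fit))
ci <- confint(fit)
coef_df <- data.frame(
  term = rownames(est),
  estimate = est[, "Estimate"],
  conf.low = ci[, 1],
  conf.high = ci[, 2],
  p.value = est[, "Pr(>|t|)"]
)
coef_df <- coef_df[coef_df$term != "(Intercept)", ]
coef_df$significant <- coef_df$p.value < 0.05

# One segment and one point per term, coloured by significance, with a legend
p_coef <- ggplot(coef_df, aes(y = term, colour = significant)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = conf.low, xend = conf.high, yend = term)) +
  geom_point(aes(x = estimate)) +
  scale_colour_manual(values = c("FALSE" = "grey50", "TRUE" = "firebrick"),
                      name = "p < 0.05") +
  labs(x = "coefficient estimate", y = "term")
p_coef
```

Because the colour is an ordinary mapped aesthetic here, the legend appears by default and can be relabelled through the scale.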
Related
I want to graph an interaction effect between two variables with one outcome in R. While I can successfully produce a graph using sjPlot::plot_model(), the interaction plot does not resize when I adjust the x-axis values. Instead, the plotted lines always keep their original extent while the x- and y-axes adjust. Below is an example using the mtcars data in R.
library(sjPlot)
library(sjmisc)
library(ggplot2)
mtcars.df <- mtcars
fit <- lm(mpg ~ hp * disp, data = mtcars.df)
plot_model(fit, type = "pred", terms = c("hp", "disp"))
I can get a graph like this in my own code. However, when I attempt to alter the x- and y-axes as seen below, the grid expands, but the graph itself does not.
plot_model(fit, type = "pred", terms = c("hp", "disp"), axis.lim = list(c(0,150),c(0,200)))
Picture of successfully graphed interaction with wildly exaggerated adjustments to the axes. The graph does not extend but the grid does.
What code can I use to adjust both the lines of my interaction effect AND those of the grid? Adjusting post-hoc with
plot_model(fit, type = "pred", terms = c("hp", "disp"))+xlim(0,150)
creates the same issue.
plot_model will only plot interactions over the range of your original data. It's really not difficult to do it directly in ggplot though by feeding whatever x values you want into predict:
library(ggplot2)
mtcars.df <- mtcars
fit <- lm(mpg ~ hp * disp, data = mtcars.df)
new_df <- expand.grid(hp = 0:300, disp = c(106.78, 230.72, 354.66))
predictions <- predict(fit, new_df, se.fit = TRUE)
new_df$mpg <- predictions$fit
new_df$upper <- new_df$mpg + 1.96 * predictions$se.fit
new_df$lower <- new_df$mpg - 1.96 * predictions$se.fit
new_df$disp <- factor(new_df$disp)
ggplot(new_df, aes(hp, mpg)) +
geom_ribbon(aes(ymax = upper, ymin = lower, fill = disp), alpha = 0.3) +
geom_line(aes(color = disp)) +
scale_fill_brewer(palette = "Set1") +
scale_color_brewer(palette = "Set1")
Created on 2022-05-21 by the reprex package (v2.0.1)
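The three disp values hard-coded in the answer above look like the mean of disp and one standard deviation either side of it. That is my reading of the numbers, not something the answer states. If so, they can be computed rather than typed in (disp_levels is a name made up for this sketch):

```r
# Mean of disp and one standard deviation either side
# (assumption: this is where the hard-coded values came from)
disp_levels <- mean(mtcars$disp) + c(-1, 0, 1) * sd(mtcars$disp)
round(disp_levels, 2)

# Same prediction grid as above, built from the computed levels
new_df <- expand.grid(hp = 0:300, disp = disp_levels)
```

Computing the levels keeps the grid in step with the data if the model is refitted on a different subset.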
plot_model allows you to choose the range of the plot by adding the range in square brackets after the selected variable, in the form [min,max].
I think the easiest way would be the following:
plot_model(fit, type = "pred", terms = c("hp [0,300]", "disp"))
You can find more details here:
https://strengejacke.github.io/sjPlot/articles/plot_marginal_effects.html
I am using the rms package to perform Cox regression with age as restricted cubic splines with 4 knots, see reproducible code below from the rms package documentation.
My problem is: How do I change the y axis interval for the ggplot-output? I tried adding scale_y_continuous(limits = c(0,20)) but it didn't alter the axis
library(ggplot2)
library(rms)
# NOT RUN {
# Simulate data from a population model in which the log hazard
# function is linear in age and there is no age x sex interaction
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt,e)
f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
cox.zph(f, "rank") # tests of PH
anova(f)
ggplot(Predict(f, age, sex, fun=exp)) + # plot age effect, 2 curves for 2 sexes
scale_y_continuous(limits = c(0,20))
Unusually, the Predict class of rms has its own S3 method for ggplot, which automatically adds position scales, coordinates and geom layers. This makes it easier to plot Predict objects, but limits extensibility. In particular, it already sets the y limits via a CoordCartesian, which over-rides any y axis scales you add.
The simplest way round this is to add a new coord_cartesian which will over-ride the existing Coord object (though also generate a message).
ggplot(Predict(f, age, sex, fun=exp)) + # plot age effect, 2 curves for 2 sexes
coord_cartesian(ylim = c(0, 20))
#> Coordinate system already present. Adding new coordinate system,
#> which will replace the existing one.
The alternative is to store the plot and change its coord limits, which doesn't generate a message but is a bit "hacky"
p <- ggplot(Predict(f, age, sex, fun=exp))
p$coordinates$limits$y <- c(0, 20)
p
You can also change the y axis limits to NULL in the above code, which will allow your axis limits to be set using scale_y_continuous instead.
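The distinction between coord limits and scale limits matters beyond rms plots. A small sketch on mtcars (stand-in data, nothing to do with the Cox model above; base_plot, p_zoom and p_drop are illustrative names): coord_cartesian only zooms the view, whereas limits set on a scale discard out-of-range observations before any statistics are computed.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zooms the view only: the smoother is still fitted to all 32 cars
p_zoom <- base_plot + coord_cartesian(ylim = c(15, 25))

# Out-of-range points are removed first, so the fitted line can change
p_drop <- base_plot + scale_y_continuous(limits = c(15, 25))

# The coord stores its limits in the same slot that the answer above modifies
p_zoom$coordinates$limits$y
```

For a prediction curve with a confidence band, zooming with the coord is usually the safer choice, because scale limits would cut the band wherever it leaves the range.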
Although a bit laborious, you could always create the plot from first principles by converting the object to a data.frame first. The benefit with that way is you have full control over what you want to add.
age_data <- as.data.frame(Predict(f, age, sex, fun = exp))
ggplot(age_data, aes(x = age, y = yhat, colour = sex)) +
geom_line() +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
scale_y_continuous(limits = c(0, 20))
# Allan's approach is much simpler for your particular request, though!
How can I make the method argument of geom_smooth() from ggplot be dynamic and adapt to the number of data points in a group?
For example, I have data in the following format:
DATE      PRODUCT  SIZE
3/1/2017  A        10
3/2/2017  B        14
3/3/2017  C        25
3/4/2017  A        16
etc.
This charts completely fine and adds a loess fit to each group (PRODUCT) with the following code (each PRODUCT group has about 20 entries):
DT<-read.csv("TEST_DATA.csv")
DT$DATE<-as.Date(DT$DATE, "%m/%d/%Y")
myPlot<-ggplot(DT, aes(DATE, SIZE, color = PRODUCT))
myPlot + geom_point() + geom_smooth(method = "loess", se = FALSE)
However, let's say I add in just 2 data points for a 4th Product "D". I then get the following warning messages and no loess fit lines are added to the plot for ANY group.
Warning messages:
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, ... : span too small. fewer data values than degrees of freedom.
I believe this warning arises because the number of observations for product D is smaller than the degrees of freedom needed for the loess fit.
Setting method = "auto" chooses "loess" anyway, so that doesn't help, and setting method to "lm" for every group is not what I want.
I would like to do the following but can't quite get it to work and am wondering if someone can help?
myPlot + geom_point() + geom_smooth(data = DT, method = if(length(DT$PRODUCT)<5) {"lm"} else {"loess"}, se = F)
As you can see, I am trying to have geom_smooth() use method = "lm" for any group with fewer than 5 observations and "loess" otherwise. But I can't quite figure out how to access the number of observations in each group from within the geom_smooth() call.
There's an n argument (number of points to evaluate smoother at) that you can use. See stat_smooth for details.
EDIT:
You can build the plot dynamically:
sProduct <- unique(DT$PRODUCT)
myPlot <- ggplot(DT, aes(DATE, SIZE, color = PRODUCT)) + geom_point()
for (i in sProduct){
sMethod <- if (sum(DT$PRODUCT == i) < 5) "lm" else "loess"  # fewer than 5 points: fall back to lm
myPlot <- myPlot + geom_smooth(data = subset(DT, PRODUCT == i), method = sMethod, se = FALSE)
}
myPlot
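A variation on the loop, as a rough sketch: count the rows per group once, then add one loess layer for the large groups and one lm layer for the small ones. The data below are randomly generated stand-ins for TEST_DATA.csv (which I don't have), and group_n, small and p_smooth are names made up for the example.

```r
library(ggplot2)

# Hypothetical data: products A-C with 20 rows each, product D with only 2
set.seed(1)
DT <- data.frame(
  DATE = as.Date("2017-03-01") + c(rep(0:19, 3), 0:1),
  PRODUCT = c(rep(c("A", "B", "C"), each = 20), "D", "D"),
  SIZE = round(runif(62, 5, 30))
)

# Groups with fewer than 5 observations get a straight line instead of loess
group_n <- table(DT$PRODUCT)
small <- names(group_n)[group_n < 5]

p_smooth <- ggplot(DT, aes(DATE, SIZE, colour = PRODUCT)) +
  geom_point() +
  geom_smooth(data = subset(DT, !PRODUCT %in% small),
              method = "loess", formula = y ~ x, se = FALSE) +
  geom_smooth(data = subset(DT, PRODUCT %in% small),
              method = "lm", formula = y ~ x, se = FALSE)
p_smooth
```

This keeps the plot at two smoothing layers regardless of how many products there are.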
You could write a function that chooses the smoothing method conditionally, based on minimum group length. For example:
library(tidyverse)
theme_set(theme_classic())
conditional_smooth = function(data, xvar, yvar, group) {
p = ggplot(data, aes_string(xvar, yvar, colour=group)) +
geom_point()
min_group_length = split(data, data[, group]) %>% map_dbl(nrow) %>% min
# Choose smoothing method based on minimum group length
if(min_group_length >= 5) {
p + geom_smooth(method=loess)
}
else {
p + geom_smooth(method=lm)
}
}
Let's run the function. For the iris data frame, the smallest group has length 50.
conditional_smooth(iris, "Petal.Length", "Sepal.Length", "Species")
Now let's shorten one group to four values:
conditional_smooth(iris[c(1:50,97:150), ], "Petal.Length", "Sepal.Length", "Species")
I've poked around but have been unable to find an answer. I want to draw a weighted geom_bar plot overlaid with a vertical line that shows the overall weighted average per facet, and I'm unable to make this happen. The vertical line seems to be a single value applied to all facets.
require('ggplot2')
require('plyr')
# data vectors
panel <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
instrument <-c("V1","V2","V1","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1")
cost <- c(1,4,1.5,1,4,4,1,2,1.5,1,2,1.5,2,1.5,1,2)
sensitivity <- c(3,5,2,5,5,1,1,2,3,4,3,2,1,3,1,2)
# put an initial data frame together
mydata <- data.frame(panel, instrument, cost, sensitivity)
# add a "contribution to" vector to the data frame: contribution of each instrument
# to the panel's weighted average sensitivity.
myfunc <- function(cost, sensitivity) {
return(cost*sensitivity/sum(cost))
}
mydata <- ddply(mydata, .(panel), transform, contrib=myfunc(cost, sensitivity))
# two views of each panels weighted average; should be the same numbers either way
ddply(mydata, c("panel"), summarize, wavg=weighted.mean(sensitivity, cost))
ddply(mydata, c("panel"), summarize, wavg2=sum(contrib))
# plot where each panel is getting its overall cost-weighted sensitivity from. Also
# put each panel's weighted average on the plot as a simple vertical line.
#
# PROBLEM! I don't know how to get geom_vline to honor the facet breakdown. It
# seems to be computing it overall the data and showing the resulting
# value identically in each facet plot.
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_bar(binwidth=1) +
geom_vline(xintercept=sum(contrib)) +
facet_wrap(~ panel) +
ylab("contrib")
If you pass in the presumarized data, it seems to work:
ggplot(mydata, aes(x=sensitivity, weight=contrib)) +
geom_histogram(binwidth=1) + # geom_bar() no longer accepts binwidth in current ggplot2
geom_vline(data = ddply(mydata, "panel", summarize, wavg = sum(contrib)), aes(xintercept=wavg)) +
facet_wrap(~ panel) +
ylab("contrib") +
theme_bw()
Example using dplyr and facet_wrap, in case anyone wants it.
library(dplyr)
library(ggplot2)
df1 <- mutate(iris, Big.Petal = Petal.Length > 4)
df2 <- df1 %>%
group_by(Species, Big.Petal) %>%
summarise(Mean.SL = mean(Sepal.Length))
ggplot() +
geom_histogram(data = df1, aes(x = Sepal.Length, y = ..density..)) +
geom_vline(data = df2, mapping = aes(xintercept = Mean.SL)) +
facet_wrap(Species ~ Big.Petal)
vlines <- ddply(mydata, .(panel), summarize, sumc = sum(contrib))
ggplot(merge(mydata, vlines), aes(sensitivity, weight = contrib)) +
geom_histogram(binwidth = 1) + geom_vline(aes(xintercept = sumc)) +
facet_wrap(~panel) + ylab("contrib")
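What these answers have in common is that the data frame given to geom_vline contains the faceting variable, so each panel picks up its own row. A sketch of the same idea without plyr, using only base R for the summary (wavg and p_panels are names invented here, and geom_histogram stands in for the weighted geom_bar):

```r
library(ggplot2)

# Same data as in the question
mydata <- data.frame(
  panel = c(rep("A", 6), rep("B", 10)),
  instrument = c("V1","V2","V1","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1","V1","V2","V1"),
  cost = c(1, 4, 1.5, 1, 4, 4, 1, 2, 1.5, 1, 2, 1.5, 2, 1.5, 1, 2),
  sensitivity = c(3, 5, 2, 5, 5, 1, 1, 2, 3, 4, 3, 2, 1, 3, 1, 2)
)

# Contribution of each row to its panel's cost-weighted average sensitivity
mydata$contrib <- mydata$cost * mydata$sensitivity /
  ave(mydata$cost, mydata$panel, FUN = sum)

# One row per panel; the panel column is what ties each line to its facet
wavg <- aggregate(contrib ~ panel, data = mydata, FUN = sum)

p_panels <- ggplot(mydata, aes(x = sensitivity, weight = contrib)) +
  geom_histogram(binwidth = 1) +
  geom_vline(data = wavg, aes(xintercept = contrib)) +
  facet_wrap(~ panel) +
  ylab("contrib")
p_panels
```

A constant passed as xintercept outside aes() carries no panel information, which is why the original attempt drew the same line in every facet.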
I have some data from an R course. The professor added each line more or less manually using base graphics. I'd like to do it using ggplot2.
So far I've created a facet'd plot in ggplot with scatter plots of hunger in different regions and also separately fitted a model to the data. The specific model has interaction terms between the x variable in the plot and the group/colour variable.
What I want to do now is plot the lines resulting for that model one per panel. I could do this by using geom_abline and defining the slope and the intercept as the sum of 2 of the coefficients (as the categorical variables for group have 0/1 values and in each panel only some values are multiplied by 1) - but this seems not easy.
I tried the same equation I used in lm in stat_smooth with no luck, I get an error.
Ideally, I'd think one can put the equation somehow into the stat_smooth and have ggplot do all the work. How would one go about it?
download.file("https://sparkpublic.s3.amazonaws.com/dataanalysis/hunger.csv",
"hunger.csv", method = "curl")
hunger <- read.csv("hunger.csv")
hunger <- hunger[hunger$Sex!="Both sexes",]
hunger_small <- hunger[hunger$WHO.region!="WHO Non Members",c(5,6,8)]
q<- qplot(x = Year, y = Numeric, data = hunger_small,
color = WHO.region) + theme(legend.position = "bottom")
q <- q + facet_grid(.~WHO.region)+guides(col=guide_legend(nrow=2))
q
# I could add the standard lm line from stat_smooth, but I dont want that
# q <- q + geom_smooth(method="lm",se=F)
#I want to add the line(s) from the lm fit below, it is really one line per panel
lmRegion <- lm(hunger$Numeric ~ hunger$Year + hunger$WHO.region +
hunger$Year *hunger$WHO.region)
# I also used a loop to do it, as below, but all in one panel
# I am not able to do that
# with facets, I used a function I found to get the colors
ggplotColours <- function(n=6, h=c(0, 360) +15) {
if ((diff(h)%%360) < 1) h[2] <- h[2] - 360/n
hcl(h = (seq(h[1], h[2], length = n)), c = 100, l = 65)
}
n <- length(levels(hunger_small$WHO.region))
q <- qplot(x = Year, y = Numeric, data = hunger_small,
color = WHO.region) + theme(legend.position = "bottom")
q <- q + geom_abline(intercept = lmRegion$coefficients[1],
slope = lmRegion$coefficients[2], color = ggplotColours(n=n)[1])
for (i in 2:n) {
q <- q + geom_abline(intercept = lmRegion$coefficients[1] +
lmRegion$coefficients[1+i], slope = lmRegion$coefficients[2] +
lmRegion$coefficients[7+i], color = ggplotColours(n=n)[i])
}
If one of your variables is categorical, geom_point() will not work, but geom_boxplot() will:
ggplot(hunger, aes(x = sex, y = hunger)) + geom_boxplot() + labs(x="sex") + geom_smooth(method = "lm",se=FALSE, col = "blue")
Susy
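Coming back to the original question of drawing the lines from one interaction model in each panel, one possible approach is to add the model's fitted values to the data and draw them with geom_line, so the facets split them up automatically. The sketch below uses iris as a stand-in because the hunger CSV may no longer be available: Petal.Length plays the role of Year and Species the role of WHO.region. The names iris_fit and p_facets are invented for the example.

```r
library(ggplot2)

# Single model with an interaction between the x variable and the group
fit <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# Fitted values from that model, one per observation
iris_fit <- iris
iris_fit$fit <- predict(fit)

p_facets <- ggplot(iris_fit, aes(Petal.Length, Sepal.Length, colour = Species)) +
  geom_point() +
  geom_line(aes(y = fit)) +  # the model's line, split by facet
  facet_grid(. ~ Species) +
  theme(legend.position = "bottom")
p_facets
```

Note that with a full interaction between x and the grouping variable, the fitted lines are the same as those from separate per-group regressions, so geom_smooth(method = "lm", se = FALSE) in a faceted plot would give the same lines; the predict() route becomes necessary once the model has terms that geom_smooth cannot express.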