Categorical scatter plot with mean segments using ggplot2 in R

I am trying to plot a simple scatter plot for 3 groups, overlaying a segment indicating the mean for each group and labelling the groups.
I have managed to get a scatter plot with error bars, but I would only like a segment indicating where the mean is. I also cannot seem to get the group labelling right.
To get the summary statistics I am using the function "summarySE" from this page. [EDIT: note this function is also provided in the Rmisc package]
Is there any simpler way to do this, and to get a segment instead of a point for the mean?
I really appreciate your help!
library(ggplot2)
library(plyr)
df <- data.frame(tt = rep(1:3, each = 40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
# After loading the summarySE function:
dfc <- summarySE(df, measurevar="val", groupvars="tt")
ggplot(dfc, aes(tt, val), main="Scatter plot with mean bars",
xlab="Groups", ylab="Values", names=c("Group1", "Group2", "Group3"))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin=val-sd, ymax=val+sd), width = 0.01, size = 1)

You can use geom_crossbar() and pass val as the y, ymin and ymax values, so that the bar collapses to a single segment at the mean. With scale_x_continuous() you can change the x-axis labels for the original data, or use @agstudy's solution to change the original data so that the labels appear automatically.
ggplot()+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_crossbar(data=dfc,aes(x=tt,ymin=val, ymax=val,y=val,group=tt), width = 0.5)+
scale_x_continuous(breaks=c(1,2,3),labels=c("Group1", "Group2", "Group3"))
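If loading summarySE feels heavy for this, one possible simplification is to compute the group means with base R's aggregate() and keep the same geom_crossbar() trick. This is only a sketch: the set.seed() call, the factor labels and the labs() titles are additions for illustration, not part of the code above.

```r
library(ggplot2)

set.seed(1)  # added only so the example is reproducible
df <- data.frame(tt = rep(1:3, each = 40),
                 val = round(rnorm(120, mean = rep(c(4, 5, 7), each = 40))))

# Label the groups once, before summarising, so both data frames share the factor
df$tt <- factor(df$tt, labels = c("Group1", "Group2", "Group3"))

# Group means with base R instead of summarySE
dfc <- aggregate(val ~ tt, data = df, FUN = mean)

p <- ggplot(df, aes(tt, val)) +
  geom_jitter(colour = "red", position = position_jitter(width = 0.05)) +
  # ymin = ymax = y collapses the crossbar into one horizontal segment
  geom_crossbar(data = dfc, aes(ymin = val, ymax = val), width = 0.5) +
  labs(title = "Scatter plot with mean bars", x = "Groups", y = "Values")

print(p)
```

Because tt is a factor here, the x axis is discrete and no scale_x_continuous() call is needed for the labels.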

To get the group labelling, you can convert the continuous tt to a factor like this:
df$tt <- factor(df$tt, labels = c("Group1", "Group2", "Group3"))
Of course, do this before calling summarySE and creating dfc.
Then, using geom_crossbar() as in the other answer (without scale_x_continuous(), since tt is now discrete), you get the group names on the x axis.

Related

Nested legend based on colour and shape

I want to make an xy plot of nested groups (Group and Subgroup) where points are colored by Group and have shape by Subgroup. A minimal example is below:
DATA<-data.frame(
Group=c(rep("group1",10),rep("group2",10),rep("group3",10) ),
Subgroup = c(rep(c("1.1","1.2"),5), rep(c("2.1","2.2"),5), rep(c("3.1","3.2"),5)),
x=c(rnorm(10, mean=5),rnorm(10, mean=10),rnorm(10, mean=15)),
y=c(rnorm(10, mean=3),rnorm(10, mean=4),rnorm(10, mean=5))
)
ggplot(DATA, aes(x=x, y=y,colour=Group, shape=Subgroup) ) +
geom_point(size=3)
However, because in reality I have many more subgroups than can easily be identified with the available shapes, I want to repeat the same shapes within each Group. Below is the same code but with an additional column (Shape) specifying the shape:
DATA<-data.frame(
Group=c(rep("group1",10),rep("group2",10),rep("group3",10) ),
Subgroup = c(rep(c("1.1","1.2"),5), rep(c("2.1","2.2"),5), rep(c("3.1","3.2"),5)),
Shape = as.character(c(rep(c(1,2),15) ) ),
x=c(rnorm(10, mean=5),rnorm(10, mean=10),rnorm(10, mean=15)),
y=c(rnorm(10, mean=3),rnorm(10, mean=4),rnorm(10, mean=5))
)
ggplot(DATA, aes(x=x, y=y,colour=Group, shape=Shape) ) +
geom_point(size=3)
Now the shapes and colours are as I want them. However, the legend no longer lists the subgroups. What I want is a legend that lists all subgroups under each respective Group. Something like:
Group1
1.1
1.2
Group2
2.1
2.2
Group3
3.1
3.2
(Ideally, this would be a single nested legend. If nested legends are not possible, perhaps they can be three separate legends with the Groups as titles)
Is this something that can be achieved, and how?
Thanks
One option to achieve your desired result would be the ggnewscale package, which allows for multiple scales and legends for the same aesthetic.
To this end we have to:
Split the data by Group and plot each group via a separate geom_point layer.
Give each group a separate shape scale and legend, which we achieve via ggnewscale::new_scale.
Instead of mapping the color aesthetic, set the color for each group as an argument, for which I use a named vector of colors.
Instead of copying and pasting the code for each group, use purrr::imap to loop over the split dataset and add the layers dynamically.
One more note: by default, the order of legends is set by a "magic algorithm". To get the groups in the right order we have to set the order explicitly via guide_legend.
library(ggplot2)
library(ggnewscale)
library(dplyr)
library(purrr)
library(tibble)
DATA_split <- split(DATA, DATA$Group)
# Vector of colors and shapes
colors <- setNames(scales::hue_pal()(length(DATA_split)), names(DATA_split))
shapes <- setNames(scales::shape_pal()(length(unique(DATA$Shape))), unique(DATA$Shape))
ggplot(mapping = aes(x = x, y = y)) +
purrr::imap(DATA_split, function(x, y) {
# Get Labels
labels <- x[c("Shape", "Subgroup")] %>%
distinct(Shape, Subgroup) %>%
deframe()
# Get order
order <- as.numeric(gsub("^.*?(\\d+)$", "\\1", y))
list(
geom_point(data = x, aes(shape = Shape), color = colors[[y]], size = 3),
scale_shape_manual(values = shapes, labels = labels, name = y, guide = guide_legend(order = order)),
new_scale("shape")
)
})
DATA
set.seed(123)
DATA <- data.frame(
Group = c(rep("group1", 10), rep("group2", 10), rep("group3", 10)),
Subgroup = c(rep(c("1.1", "1.2"), 5), rep(c("2.1", "2.2"), 5), rep(c("3.1", "3.2"), 5)),
Shape = as.character(c(rep(c(1, 2), 15))),
x = c(rnorm(10, mean = 5), rnorm(10, mean = 10), rnorm(10, mean = 15)),
y = c(rnorm(10, mean = 3), rnorm(10, mean = 4), rnorm(10, mean = 5))
)
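To see what the two helper steps inside the imap() loop produce, here is a small standalone check for a single group. The data frame one_group is a made-up stand-in for one element of DATA_split, and group_order is just an illustrative name.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for DATA_split[["group2"]], reduced to the two columns used
one_group <- data.frame(
  Shape = c("1", "2", "1", "2"),
  Subgroup = c("2.1", "2.2", "2.1", "2.2"),
  stringsAsFactors = FALSE
)

# deframe() turns a two-column data frame into a named vector:
# names come from the first column (Shape), values from the second (Subgroup)
labels <- one_group %>%
  distinct(Shape, Subgroup) %>%
  deframe()
print(labels)

# The trailing digits of the group name give the legend order
group_order <- as.numeric(gsub("^.*?(\\d+)$", "\\1", "group2"))
print(group_order)
```

The named vector is what scale_shape_manual() receives as labels, so each shape value is relabelled with the subgroup it stands for in that group.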

Advice on how to plot side by side histograms with line graph going through in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up, since each facet is a separate panel. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
# plyr is not attached, so summarise has to be qualified as well
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median") # the argument was called fun.y before ggplot2 3.3.0
This produces a violin for each value of rho, and joins the medians for each violin.
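The same idea can be sketched on the simulated data from the question, with the medians pre-computed so that the trend line is an ordinary geom_line() layer. This is an illustrative variant rather than the answer's own code: the explicit group = rho aesthetic, the grey fill and the axis breaks are my assumptions.

```r
library(ggplot2)

set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))

# One median per value of rho, computed with base R
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)

p <- ggplot(opt_lens_geom, aes(x = rho, y = l_1024)) +
  # rho stays numeric, so the violins are grouped explicitly
  geom_violin(aes(group = rho), fill = "grey80") +
  # the median trend line runs across a single continuous x axis
  geom_line(data = med, colour = "black") +
  geom_point(data = med, colour = "black") +
  scale_x_continuous(breaks = seq(0.1, 0.9, by = 0.1))

print(p)
```

Keeping rho on a continuous axis avoids the facet-alignment problem from the question, because the line is drawn within a single panel.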

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How to add dummy values on x-axis in ggplot2
I have 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data, and I have plotted these on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would create the corresponding gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, the added levels (8, 10, etc.) come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The tricks are to make the x-variable with all levels you need and to specify drop = FALSE in scale.
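If the x values can stay numeric rather than being turned into a factor, another possible route is a continuous scale with explicit breaks, so that the gaps fall out of the numeric spacing. A sketch with made-up y values; the sample size, seed and variable names are illustrative, and group = x is there so that geom_boxplot() draws one box per x value.

```r
library(ggplot2)

set.seed(42)
xvals <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)
df <- data.frame(x = sample(xvals, size = 500, replace = TRUE),
                 y = rnorm(500, mean = 2))

# Every even number from 0 to 26 gets a tick, whether or not it has data
all_breaks <- seq(0, 26, by = 2)

p <- ggplot(df, aes(x = x, y = y, group = x)) +
  geom_boxplot() +
  scale_x_continuous(breaks = all_breaks)

print(p)

# Ticks that have no data behind them
print(setdiff(all_breaks, xvals))
```

The factor approach above gives every level the same spacing, while this variant spaces the boxes by their numeric distance; which one reads better depends on the plot.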

R ggplot: How to define group dependent y-axis breaks using facetted ggplots?

I have 40 groups (defined by short_ID) and would like to produce 40 different plots that use different y-scale breaks for each short_ID. I want the breaks for the y-scale to be (1) mean-2SD, (2) mean and (3) mean+2SD.
I have a dataset called Dataplots containing my X and Y variables and the grouping variable "short_ID". I have created additional vectors M$SD11 (=mean-2SD), M$mean and M$SD22 (=mean+2SD) to define the breaks and M$short_ID as grouping variable. The code below partly works but the problem is that I do not know how to make the breaks group-dependent (i.e., dependent on short_ID). When I run the code below I get the same y axis breaks for all plots, namely for example the max of the vector M$SD22 instead of a different M$SD22 value for each plot. So I think I need to add something to
"scale_y_continuous(breaks=c(M$SD11, M$mean, M$SD22)", for example "scale_y_continuous(group=M$short_ID, breaks=c(M$SD11, M$mean, M$SD22)" but this does not work.
Does anybody know what I can do to define different breaks for my different groups (i.e, short_IDs)? How can I change the code below to do this? Many thanks!
Dataplot <- ggplot(data = Dataplots, aes(x = Measure, y = Amylase_u, group = short_ID)) + geom_line() + facet_wrap(~ short_ID) + scale_y_continuous(breaks=c(M$SD11, M$mean, M$SD22))
I have added an example of 'Dataplots' and 'M'. For the purpose of the example I included only two groups (i.e., short_IDs) instead of the 40 I actually have. Thus this example would need to produce 2 plots, one for each short_ID with different y-axis breaks for each of the groups.
Example of Dataplots:
Dataplots <- structure(list(short_ID = c(1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222), Measure = c(1, 2, 3, 4, 1, 2, 3, 4), Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)), .Names = c("short_ID", "Measure", "Amylase_u"), row.names = c(NA, -8L), class = "data.frame", codepage = 65001L)
Example of M:
M <- structure(list(SD11 = c(162, 682), mean = c(97, 366), SD22 = c(32, 51), short_ID = c(1111, 2222)), .Names = c("SD11", "mean", "SD22", "short_ID"), row.names = 1:2, class = "data.frame")
@Mark I have been trying to apply your suggestions to my complete dataset but cannot seem to get it right. I have 61 plots in total. I started with:
myPlots <-
lapply(unique(Dataplots$short_ID), function(thisID){
Dataplots %>%
filter(short_ID == thisID) %>%
ggplot(aes(x = Measure, y = Amylase_u)) +
geom_line() +
scale_y_continuous(breaks= M %>%
filter(short_ID == thisID) %>%
select(mean) %>%
as.numeric()
) +
ggtitle(thisID)
})
(As you can see, I decided to show only the subject mean on the y-axis and to drop the SDs.) I then continued with your final cowplot suggestion:
plot_grid(ggdraw() + draw_label("Amylase_u", angle = 90), plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1))
, nrow = 1, rel_widths = c(0.05, 0.95))
This, however, results in 61 plots with the subject mean on the y-axis but without the measurements depicted in them (so the graph itself is missing). I figured there might be a misplaced ')', so I tried:
plot_grid(
ggdraw() + draw_label("Amylase_u", angle = 90)
, plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x +theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1)
, nrow = 1
, rel_widths = c(0.05, 0.95)))
This does give me graphs but they are tiny and the layout is terrible (Rplot2). I tried adapting the rel-heights and widths too but even after reading the help-file don't quite get how I should adapt them.
Thanks again!
Rplot2
Finally, I removed the IDnumbers on top of each plot because they are not really necessary and this already greatly improves the plot (Rplot3), but still the layout needs to be adjusted.
Rplot3
My understanding is that this still remains impossible in the facet functions. However, you can accomplish it yourself using the cowplot package.
First, loop over your IDs (in lapply) and generate each of the sub-plots you want. Note that I am using dplyr for the pipe and filtering.
myPlots <-
lapply(unique(Dataplots$short_ID), function(thisID){
Dataplots %>%
filter(short_ID == thisID) %>%
ggplot(aes(x = Measure, y = Amylase_u)) +
geom_line() +
scale_y_continuous(breaks= M %>%
filter(short_ID == thisID) %>%
select(SD11, mean, SD22) %>%
as.numeric()
) +
ggtitle(thisID)
})
Then, call the function plot_grid from cowplot with the list of plots:
plot_grid(plotlist = myPlots)
A few notes:
cowplot autoloads its own default style (in versions before 1.0), so use theme_set to return to your preferred style
Your included data appear to not actually span all of the thresholds you gave for the y-axis breaks
This should work for an arbitrarily large number of subplots, though you may want/ need to adjust labels and alignment to make them readable.
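As a self-contained variant of the approach above, the sketch below computes the breaks from the data itself instead of reading them from M, and widens each y axis so that all three breaks are visible. The mean plus or minus 2 SD rule, the break_list name and the ncol value are assumptions made for illustration.

```r
library(ggplot2)
library(cowplot)

Dataplots <- data.frame(
  short_ID  = rep(c(1111, 2222), each = 4),
  Measure   = rep(1:4, times = 2),
  Amylase_u = c(81.561, 75.648, 145.25, 85.246, 311.69, 261.74, 600.93, 291.39)
)

# Per-ID breaks: mean - 2 SD, mean, mean + 2 SD (assumed definition)
break_list <- lapply(split(Dataplots$Amylase_u, Dataplots$short_ID), function(v) {
  mean(v) + c(-2, 0, 2) * sd(v)
})

myPlots <- lapply(names(break_list), function(id) {
  ggplot(Dataplots[Dataplots$short_ID == as.numeric(id), ],
         aes(x = Measure, y = Amylase_u)) +
    geom_line() +
    scale_y_continuous(breaks = break_list[[id]],
                       labels = function(b) round(b)) +
    # make sure the axis range covers all three breaks
    expand_limits(y = break_list[[id]]) +
    ggtitle(id)
})

# ncol controls the layout; with many sub-plots a larger value keeps rows short
grid <- plot_grid(plotlist = myPlots, ncol = 2)
print(grid)
```

With several dozen sub-plots, saving the grid to a file with generous dimensions, for example via ggsave() with larger width and height values, tends to be more readable than the default plotting window.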
Since I am not sure what your goal is, here is another alternative. If you just want to plot deviation from mean (in standard deviations) to make the changes comparable, you could just calculate the z-score of the column within the groups and plot the results. Using dplyr again:
Dataplots %>%
group_by(short_ID) %>%
mutate(scaledAmylase = as.numeric(scale(Amylase_u)) ) %>%
ggplot(aes(x = Measure
, y = scaledAmylase)) +
geom_line() +
facet_wrap(~short_ID)
Or, if the mean/SD are calculated/defined somewhere else (and stored in M) rather than coming directly from the data, you can scale using M instead of the data:
Dataplots %>%
left_join(M) %>%
mutate(scaledAmylase = (Amylase_u - mean) / ((SD22 - mean) / 2) ) %>%
ggplot(aes(x = Measure
, y = scaledAmylase)) +
geom_line() +
facet_wrap(~short_ID)
And, because I can't leave well enough alone, here is a version of the plot_grid approach that removes the duplicated axis titles and includes them just once instead (like facet_wrap would). As above, increasing the number of subplots or the aspect ratio will force you to tweak the relative values here:
plot_grid(
ggdraw() + draw_label("Amylase_u", angle = 90)
, plot_grid(
plot_grid(plotlist = lapply(myPlots, function(x){x + theme(axis.title = element_blank())}))
, ggdraw() + draw_label("Measurement")
, ncol = 1
, rel_heights = c(0.9, .1))
, nrow = 1
, rel_widths = c(0.05, 0.95)
)

Adding line type to legend in ggplot2

How do I make the line types used by geom_hline or geom_abline show up in the legend of a ggplot plot?
For example:
require (ggplot2)
# some data
dummy <- data.frame (category1 = rep (1:5, 8), category2 = rep (1:4, each = 10),
category3 = rep (factor (1:2), 2), expected = 10 ^ rep (4:7, each = 10),
value = 10 ^rnorm(40, 5))
# faceted plot
baseplot <-ggplot (dummy ) +
geom_point (aes (category1, value, color = category3))+
scale_y_log10 () +
facet_wrap (~category2)
# add a dotted line for expected value
p1 <-baseplot + geom_hline ( aes ( yintercept = expected), linetype = 2)
I tried a couple approaches to making the dotted line show up in the legend, but they give me the same thing as p1
p1a <- p1 + scale_linetype_discrete (labels = "expected")+
guides ( linetype= guide_legend ("", labels ="expected"))
p1b <- baseplot + geom_hline (aes (yintercept = expected, linetype = "expected")) +
scale_linetype_manual (labels= "expected", values = 2)
p1a
p1b
How about multiple lines/line types?
Let's say I also wanted to plot groupwise and overall geometric means
require (reshape)
require (plyr)
# calculate geometric means, keep them in their own data frame
geometric_mean <- function (x) exp ( mean (log (x)))
dummy $GM_overall <- geometric_mean (dummy $value)
extra <- ddply(dummy, c( "GM_overall", "expected","category2"), summarize,
GM_group = geometric_mean (value))
extra_long <- melt (extra, id.vars = "category2")
I expected this approach to show linetype in the legend based on this post, but no such luck
p2=baseplot + geom_hline ( aes ( yintercept = value , linetype = variable), extra_long)
p2
Here's another case where I would want to do something similar with abline
It would be nice to be able to label the line as 1:1
dummy$value2 <- dummy $value * runif(40, 0.5, 2)
ggplot (dummy)+coord_fixed() +
geom_point (aes (value, value2, color = category3))+
geom_abline (intercept =0, slope =1)
I'm using R 3.0.0, ggplot 0.9.3.1
You run through several examples, but this simple case should get you most of the way there:
dummy <- data.frame (category1 = rep (1:5, 8), category2 = rep (1:4, each = 10),
category3 = rep (factor (1:2), 2), expected = 10 ^ rep (4:7, each = 10),
value = 10 ^rnorm(40, 5))
# faceted plot
baseplot <- ggplot(dummy) +
geom_point(aes(category1, value, color = category3))+
scale_y_log10() +
facet_wrap(~category2)
# add a dotted line for expected value
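baseplot + geom_hline(aes(yintercept = expected, linetype = "expected"), show.legend = TRUE)
The key in most cases, I think, is adding show.legend = TRUE (the argument was called show_guide before ggplot2 2.0.0, including in the 0.9.3.1 release used in the question). It was FALSE by default for this geom in that release, which may or may not be intuitive. (I can see the rationale.)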
Note how, in this "one line type" case, I "tricked" ggplot into creating a legend by mapping linetype to the character "expected", which causes a new column to be created behind the scenes. Multiple line types should work as expected with the usual methods of creating columns and mapping them to linetype.
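For the multiple-line case, one way this could look is sketched below, assuming a reasonably recent ggplot2. The lines_df data frame and the chosen line types are illustrative; the idea is simply one row per facet and line, with the line name mapped to linetype inside aes().

```r
library(ggplot2)

set.seed(1)
dummy <- data.frame(
  category1 = rep(1:5, 8),
  category2 = rep(1:4, each = 10),
  expected  = 10^rep(4:7, each = 10),
  value     = 10^rnorm(40, 5)
)
geometric_mean <- function(x) exp(mean(log(x)))

# One row per facet and line: 'variable' names the line, 'value' is its height
lines_df <- rbind(
  data.frame(category2 = 1:4, variable = "expected", value = 10^(4:7)),
  data.frame(category2 = 1:4, variable = "GM_group",
             value = as.numeric(tapply(dummy$value, dummy$category2,
                                       geometric_mean))),
  data.frame(category2 = 1:4, variable = "GM_overall",
             value = geometric_mean(dummy$value))
)

p <- ggplot(dummy) +
  geom_point(aes(category1, value)) +
  scale_y_log10() +
  facet_wrap(~category2) +
  # linetype is mapped inside aes(), so it gets a legend entry per line
  geom_hline(data = lines_df, aes(yintercept = value, linetype = variable)) +
  scale_linetype_manual(name = NULL,
                        values = c(expected = "dashed",
                                   GM_group = "dotted",
                                   GM_overall = "solid"))

print(p)
```

The same mapping trick should carry over to the 1:1 line from the question, for example geom_abline(aes(intercept = 0, slope = 1, linetype = "1:1")).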

Resources