Join data points on boxplot with lines ggplot2 - r

I'm trying to visualize paired sets of data points on the following graphs:
ggplot(grid.mag.ROIcontrols.allRuns, aes(Model,Grid_Magnitude)) +
geom_boxplot(aes(fill=Model),outlier.shape = NA,alpha=0.6) +
geom_point(aes(fill=Model),size=2,shape=21,position=position_jitterdodge(0.2)) +
geom_line(aes(group=Side)) +
facet_grid(~Side,scales = "free") +
scale_fill_brewer(palette="GnBu") +
labs(title = "Average Grid Magnitude, pm vs al EC")
The lines should join the paired points between alLeft6/pmLeft6 and between alRight6/pmRight6.
However, geom_line with the group variable I need doesn't work: it adds vertical lines and a single horizontal line between the data points, when I need one horizontal line for each of the 10 pairs.
Without geom_line:
With geom_line:
PS: Sorry, I don't know how to share the raw data...

Without the actual data it is hard to give in-depth help; please refer to this site for a guide to writing a great reproducible example, as mentioned in the comments.
I am assuming you want to compare one data point from alLeft6 to one from pmLeft6 (otherwise the horizontal line would make little sense). This implies that your data has some column linking these points together (Pair in the example data).
With made-up data, this is as easy as setting the geom_line() grouping variable to this column (Pair). To keep the geom_point() layer aligned with the geom_line() layer when the points are offset, an easy solution is to define the position adjustment once, before the ggplot call (here called pd), and pass it to both layers.
library(tidyverse)
set.seed(1)  # make the made-up data reproducible
grid.mag.ROIcontrols.allRuns = tibble(
  Model = c(rep("alLeft6", 10), rep("pmLeft6", 10), rep("alRight6", 10), rep("pmRight6", 10)),
  Grid_Magnitude = c(runif(10, -1, 1), runif(10, -0.5, 1.5), runif(10, -1, 1), runif(10, -1.5, 0.5)),
  Side = c(rep("Left", 20), rep("Right", 20)),
  Pair = c(rep(1:10, 2), rep(11:20, 2))  # links each al point to its pm partner
) %>%
  mutate(Pair = as.factor(Pair))
# Shared dodge object so points and lines receive the same horizontal offset
pd <- position_dodge(0.2)
ggplot(grid.mag.ROIcontrols.allRuns, aes(Model,Grid_Magnitude)) +
geom_boxplot(aes(fill=Model),outlier.shape = NA,alpha=0.6) +
geom_line(aes(group=Pair), position = pd) +
geom_point(aes(fill=Model,group=Pair),size=2,shape=21, position = pd) +
facet_grid(~Side,scales = "free") +
scale_fill_brewer(palette="GnBu") +
labs(title = "Average Grid Magnitude, pm vs al EC")
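A quick way to see why group=Side fails and group=Pair works is to count the rows each grouping produces. The following is a minimal sketch in base R, assuming made-up data shaped like the example above (the data frame d and its values are hypothetical, not the asker's real data):

```r
# Hypothetical stand-in for the real data: 10 pairs per side, two models per pair
set.seed(1)
d <- data.frame(
  Model = rep(c("alLeft6", "pmLeft6", "alRight6", "pmRight6"), each = 10),
  Side  = rep(c("Left", "Right"), each = 20),
  Pair  = factor(c(rep(1:10, 2), rep(11:20, 2))),
  Grid_Magnitude = runif(40, -1, 1)
)

# Rows per group: grouping by Side puts 20 points on one path per facet,
# grouping by Pair gives exactly 2 points (one line segment) per group
rows_per_side <- table(d$Side)
rows_per_pair <- table(d$Pair)
print(rows_per_side)
print(rows_per_pair)

# Each pair should contain one point from each of its two models
pair_ok <- tapply(d$Model, d$Pair, function(m) length(unique(m)) == 2)
stopifnot(all(rows_per_pair == 2), all(pair_ok))
```

If the final check fails on real data, the pairing column probably does not identify pairs uniquely, and geom_line() would again draw more than one segment per group.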

Related

Points keep getting cut off, and standard fixes don't work well with facet grid on a log scale

Novice R user here wrestling with some arcane details of ggplot
I am trying to produce a plot that charts two data ranges: One plotted as a line, and another plotted on the same plot, but as points. The code is something roughly like this:
ggplot(data1, aes(x = Year, y = Capacity, col = Process)) +
geom_line() +
facet_grid(Country ~ ., scales = "free_y") +
scale_y_continuous(trans = "log10") +
geom_point(data = data2, aes(x = Year, y = Capacity, col = Process))
I've left out some additional cosmetic arguments for the sake of simplicity.
The problem is that the points from the geom_point keep getting cut off by the x axis:
I know the standard fix here would be to adjust the y limits to make room for the points:
scale_y_continuous(limits = c(-100, Y_MAX))
But here there is a separate problem due to the facet grid with free scales, since there is no single value for Y_MAX
I've also tried it using expansions:
scale_y_continuous(expand = c(0.5, 0))
But here, it runs into problems with the log scale, since it multiplies by different values for each facet, producing very wonky results.
I just want to produce enough blank space on the bottom of each facet to make room for the point. Or, alternatively, move each point up a little bit to make room. Is there any easy way to do this in my case?
This might be a good place for scales::pseudo_log_trans, which combines a log transformation with a linear transformation (and a flipped sign log transformation) to retain most of the benefits of a log transformation while also allowing zero and negative values. Adjust the sigma parameter of the function to adjust where the transition from linear to log should happen.
library(ggplot2)
ggplot(data = data.frame(country = rep(c("France","USA"), each = 5),
x = rep(1:5, times = 2),
y = c(10^(2:6), 0, 10^(1:4))),
aes(x,y)) +
geom_point() +
# scale_y_continuous(trans = "log10") +
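scale_y_continuous(trans = scales::pseudo_log_trans(),
breaks = c(0, 10^(0:6)),
# label_number_si() is deprecated as of scales 1.2.0; this is its replacement
labels = scales::label_number(scale_cut = scales::cut_short_scale())) +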
facet_wrap(~country, ncol = 1, scales = "free_y")
vs. with (trans = "log10"):
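To get a feel for what the pseudo-log transformation does to the values themselves, here is a small numeric sketch. The helper pseudo_log10() is hypothetical and written in base R; it uses the formula asinh(x / (2 * sigma)) / log(base), which is my understanding of what scales::pseudo_log_trans() computes, so treat that as an assumption. When the scales package is installed, the sketch compares the two.

```r
# Hypothetical base-R helper; the formula is assumed to match scales::pseudo_log_trans(base = 10)
pseudo_log10 <- function(x, sigma = 1) asinh(x / (2 * sigma)) / log(10)

x <- c(-1000, -10, 0, 1, 10, 1000, 1e6)
comparison <- data.frame(
  x          = x,
  pseudo_log = pseudo_log10(x),
  log10      = suppressWarnings(log10(x))  # NaN for negatives, -Inf for zero
)
print(comparison)

# A larger sigma widens the near-linear region around zero
print(pseudo_log10(c(0, 1, 10, 100), sigma = 100))

# Optional cross-check against the real transformation object
if (requireNamespace("scales", quietly = TRUE)) {
  tr <- scales::pseudo_log_trans(sigma = 1, base = 10)
  print(all.equal(tr$transform(x), pseudo_log10(x)))
}
```

Zero maps to zero and negative values mirror the positive ones, which is why points at or near zero stay on the panel instead of being dropped or clipped by a plain log10 scale.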

geom_step starting and ending with a horizontal segment

Sometimes I'd like to present data that refer to periods (not to points in time) as a step function. When e.g. data are per-period averages, this seems more appropriate than using a line connecting points (with geom_line). Consider, as a MWE, the following:
df = data.frame(x=1:8,y=rnorm(8,5,2))
ggplot(df,aes(x=x,y=y))+geom_step(size=1)+scale_x_continuous(breaks=seq(0,8,2))
This gives
However, the result is not fully satisfactory, as (1) I'd like the final observation to be represented by a horizontal segment and (2) I'd like to have labels on the x-axis aligned at the center of the horizontal line. What I want can be obtained with some hacking:
df %>% rbind(tail(df,1) %>% mutate(x=x+1)) %>%
ggplot(aes(x,y))+geom_step(size=1)+
scale_x_continuous(breaks=seq(0,12,2))+
theme(axis.ticks.x=element_blank(),axis.text.x=element_text(hjust=-2))
which produces:
This corresponds to what I am looking for (except that the horizontal alignment of labels requires some fine tuning and is not perfect). However, I am not sure this is the best way to proceed and I wonder if there is a better way.
Does this work for you? It comes down to altering the data as it is passed rather than changing the plotting code per se (as is often the case in ggplot)
Essentially what we do is add an extra copy of the final y value on to the end of the data frame at an incremented x value.
To make the horizontal segments line up to the major axis breaks, we simply subtract 0.5 from the x value.
ggplot(rbind(df, data.frame(x = 9, y = tail(df$y, 1))),
aes(x = x - 0.5, y = y)) +
geom_step(size = 1)+
scale_x_continuous(breaks = seq(0, 8, 2), name = "x",
minor_breaks = seq(0, 8, 1) + 0.5) +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor = element_line())
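The data alteration described above can also be wrapped in a small function so it is easy to reuse and inspect. This is only a sketch: extend_steps() is a hypothetical helper name, and it assumes a data frame with columns x and y where consecutive periods are one x unit apart.

```r
# Hypothetical helper: prepare per-period data for geom_step()
extend_steps <- function(df, shift = 0.5) {
  last <- df[nrow(df), , drop = FALSE]
  last$x <- last$x + 1   # one extra x position after the final period
  out <- rbind(df, last) # the duplicated final y yields a last horizontal segment
  out$x <- out$x - shift # centre each segment on its period's axis break
  rownames(out) <- NULL
  out
}

set.seed(1)
df <- data.frame(x = 1:8, y = rnorm(8, 5, 2))
stepped <- extend_steps(df)
print(stepped)
```

The result can be passed straight to ggplot(stepped, aes(x, y)) + geom_step(), leaving the aesthetics free of arithmetic.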

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure this is easy to do, but I can't find a good way to phrase the question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
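For a real factor with more than two levels, the numeric positions do not have to be typed by hand. The sketch below derives them from the factor's level order; the factor f, the spacing of 2, and the column names are hypothetical choices for illustration, not part of the original data.

```r
# Hypothetical factor with three levels, standing in for the real grouping column
f <- factor(c("A", "B", "C", "A", "B", "C"))

spacing <- 2                                  # axis distance between consecutive levels
centre  <- (as.integer(f) - 1) * spacing + 1  # 1, 3, 5 for levels A, B, C

rows <- data.frame(
  level      = f,
  jitter_row = centre - 0.5,  # where geom_jitter() would draw this level
  violin_row = centre + 0.5   # where geom_violin() would draw this level
)
print(unique(rows))

# Axis breaks and labels, one per level, centred between the two rows
breaks <- (seq_along(levels(f)) - 1) * spacing + 1
labels <- levels(f)
```

Mapping x to jitter_row in the jitter layer and to violin_row (with group = level) in the violin layer interleaves the rows as A jitter, A violin, B jitter, B violin, and breaks and labels can be handed to scale_x_continuous().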

ggrepel: Repelling text in only one direction, and returning values of repelled text

I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.
Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.
As an example, the code below produces a plot of 100 data points, where the x-values stand for a measured quantity (for example, the number of cylinders in a car) and the y-values are determined randomly (they have no meaning other than to provide a second dimension for text plotting purposes). Without using ggrepel, there is significant overlap in the text:
library(ggrepel)
library(ggplot2)
set.seed(1)
data = data.frame(x=runif(100, 1, 10),y=runif(100, 1, 10),label=paste0("label",seq(1:100)))
origPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text(aes(x, y, label = label)) +
theme_classic(base_size = 16)
I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values, but also the x-values. I am trying to avoid changing the x-values, as they represent an actual physical meaning (the number of cylinders):
repelPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text_repel(aes(x, y, label = label)) +
theme_classic(base_size = 16)
As a note, the reason I cannot allow the x-value of the text to change is that I am only plotting the text (not the points). By contrast, most examples in ggrepel keep the position of the points (so that their values remain true) and only repel the x and y values of the labels. The points are then connected to the labels with segments (you can see that in my second plot example).
I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)
My question is two fold:
1) Is it possible for me to repel the text labels only in the y-direction?
2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?
Thank you for any advice!
ggrepel version 0.6.8 (install from GitHub using devtools::install_github) now supports a "direction" argument, which enables repelling of labels only in the "x" or "y" direction.
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0, direction = "y") + theme_classic(base_size = 16)
Getting the y values is harder -- one approach can be to use the "repel_boxes" function from ggrepel first to get repelled values and then input those into ggplot with geom_text. For discussion and sample code of that approach, see https://github.com/slowkow/ggrepel/issues/24. Note that if using the latest version, the repel_boxes function now also has a "direction" argument, which takes in "both","x", or "y".
I don't think it is possible to repel text labels only in one direction with ggrepel.
I would approach this problem differently, by instead generating the arbitrary y-axis positions manually. For example, for the data set in your example, you could do this using the code below.
I have used the dplyr package to group the data set by the values of x, and then created a new column of data y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.
library(ggplot2)
library(dplyr)
data <- data.frame(x = mtcars$cyl, label = paste0("label", seq(1:32)))
data <- data %>%
group_by(x) %>%
mutate(y = row_number())
ggplot(data, aes(x = x, y = y, label = label)) +
geom_text(size = 2) +
xlim(3.5, 8.5) +
theme_classic(base_size = 8)
ggsave("filename.png", width = 4, height = 2)
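The same within-group row numbering can be done without dplyr, which also makes the resulting y-values easy to inspect (the second part of the question). A minimal base R sketch, using the built-in mtcars data as in the answer above; the object name stacked is just an illustrative choice:

```r
# mtcars ships with R; cyl is the number of cylinders (4, 6 or 8)
x <- mtcars$cyl
y <- ave(seq_along(x), x, FUN = seq_along)  # running row number within each x value

stacked <- data.frame(x = x, y = y, label = paste0("label", seq_along(x)))
print(head(stacked))

# Height of each column of labels, i.e. how many labels share each x value
print(tapply(stacked$y, stacked$x, max))
```

Because y is computed explicitly rather than by a layout algorithm, no two labels share the same (x, y) position, and the final positions are available in the data frame.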

Add vertical lines to ggplot2 bar plot

I am doing some research on non-defaulters and defaulters with regards to banking. In that context I am plotting their distributions relative to some score in a bar plot. The higher the score, the better the credit rating.
Since the number of defaults is very small compared with the number of non-defaults, plotting both on the same bar plot is not very informative, as you can hardly see the defaults. I therefore make a second bar plot based on the defaulters' scores only, but on the same interval scale as the full bar plot of both groups. I would then like to add vertical lines to the first bar plot indicating where the highest and the lowest defaulter scores are located, to show where the distribution of the defaulters fits into the overall distribution of defaulters and non-defaulters.
Below is the code I am using replaced with (seeded) random data instead.
library(ggplot2)
#NDS represents non-defaults and DS defaults on the same scale
#although here being just some random normals for the sake of simplicity.
set.seed(10)
NDS<-rnorm(10000,sd=1)-2
DS<-rnorm(100,sd=2)-5
#Cutoffs are constructed such that intervals of size 0.3
#contain all values of NDS & DS
minCutoff<--9.3
maxCutoff<-2.1
#Generate the actual interval "bins"
NDS_CUT<-cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
DS_CUT<-cut(DS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
#Manually generate where to put the vertical lines for min(DS) and max(DS)
minDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[1]
maxDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[32]
#Generate data frame - seems stupid, but makes sense
#when the "real" data is used :-)
NDSdataframe<-cbind(as.data.frame(NDS_CUT),rep(factor("State-1"),length(NDS_CUT)))
colnames(NDSdataframe)<-c("Score","Action")
DSdataframe<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe)<-c("Score","Action")
fulldataframe<-rbind(NDSdataframe,DSdataframe)
attach(fulldataframe)
#Plot the full distribution of NDS & DS
# with geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
# that unfortunately does not show :-(
fullplot<-ggplot(fulldataframe, aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) + geom_bar(position="stack") + opts(axis.text.x = theme_text(angle = 45)) + opts (legend.position = "none") + xlab("Scoreinterval") + ylab("Antal pr. interval") + geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
#Generate dataframe for DS only
#It might seem stupid, but again makes sense
#when using the original data :-)
DSdataframe2<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe2)<-c("theScore","theAction")
#Calculate max number of observations to adjust bar plot of DS only
myMax<-max(table(DSdataframe2))+1
attach(DSdataframe2)
#Generate bar plot of DS only
subplot<-ggplot(fulldataframe, aes(theScore, fill=factor(theAction))) + geom_bar (position="stack") + opts(axis.text.x = theme_text(angle = 45)) + opts(legend.position = "none") + ylim(0, myMax) + xlab("Scoreinterval") + ylab("Antal pr. interval")
#plot on a grid
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
vplayout <- function(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(fullplot, vp = vplayout(1, 1))
print(subplot, vp = vplayout(2, 1))
#detach dataframes
detach(DSdataframe2)
detach(fulldataframe)
Furthermore, if anybody has an idea of how I can align the two plots so that the corresponding intervals sit directly below/above each other on the grid plot, that would be great.
Hope somebody is able to help!
Thanks in advance,
Christian
Wrap aes around the xintercept in the geom_vline layer:
... + geom_vline(aes(xintercept = minDS_bar)) + geom_vline(aes(xintercept = maxDS_bar))
Question 1:
Since you provide the vertical lines as data, you have to map the aesthetics first, using aes()
fullplot <- ggplot(
  fulldataframe,
  aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) +
  geom_bar(position="stack") +
  # opts() and theme_text() no longer exist in ggplot2; theme() and element_text() replace them
  theme(axis.text.x = element_text(angle = 45),
        legend.position = "none") +
  xlab("Scoreinterval") +
  ylab("Antal pr. interval") +
  geom_vline(aes(xintercept = minDS_bar)) +
  geom_vline(aes(xintercept = maxDS_bar))
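Rather than picking the bins for the two lines by hand, the bins containing min(DS) and max(DS) can be looked up with cut(). The sketch below rests on assumptions that differ from the question: the breaks are derived from the data with a bin width of 0.5 (not the question's fixed cutoffs), and the names all_scores, ds_bins and ds_pos are hypothetical. The plotting part only runs if ggplot2 is installed.

```r
set.seed(10)
NDS <- rnorm(10000, sd = 1) - 2  # stand-in non-default scores, as in the question
DS  <- rnorm(100, sd = 2) - 5    # stand-in default scores

# Assumption: breaks are derived from the data so that every score falls in a bin
all_scores <- c(NDS, DS)
breaks <- seq(floor(min(all_scores)), ceiling(max(all_scores)), by = 0.5)

# Bin (and its index among the factor levels) of the lowest and highest default score
ds_bins <- cut(range(DS), breaks = breaks, include.lowest = TRUE)
ds_pos  <- as.integer(ds_bins)
print(data.frame(bin = as.character(ds_bins), position = ds_pos))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  scores <- data.frame(Score = cut(all_scores, breaks = breaks, include.lowest = TRUE))
  p <- ggplot(scores, aes(Score)) +
    geom_bar() +
    scale_x_discrete(drop = FALSE) +  # keep empty bins so positions match level indices
    geom_vline(xintercept = ds_pos)   # numeric positions on the discrete axis
}
```

Keeping unused levels with drop = FALSE matters here: if empty bins were dropped from the axis, the level index would no longer equal the position of the bar.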
Second question:
To align the plots, you can use the align.plots() function in package ggExtra
install.packages("dichromat")
install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)
ggExtra::align.plots(fullplot, subplot)
