plotting geom_text at individual positions in each facet of facet_wrap

I have this dataset
df <- data.frame(groups=factor(c(rep("A",6), rep("B",6), rep("C",6))),
types=factor(c(rep(c(rep("Z",2), rep("Y",2), rep("X",2)),3))),
values=c(10,11,1,2,0.1, 0.2, 12,13, 2,2.5, 0.2, 0.01,
12,14, 2,3,0.1,0.2))
library(ggplot2)
px <- ggplot(df, aes(groups, values)) + facet_wrap(~types, scale="free") + geom_point()
px
I conducted a post-hoc analysis (values ~ groups for each level of types) and created a dataset that contains the significance groups (as letters: a, b, c and so on):
df.text <- data.frame(groups=factor(c(rep("A",3), rep("B",3), rep("C",3))),
label=rep("a", 9))
I proceeded to plot the labels on px:
px + geom_text(data=df.text, aes(x=groups, y=0.1, label=label), size=4, col="red", stat="identity") +theme_bw()
which doesn't look great.
My problem is defining an aes(y) in geom_text that plots the labels at a fixed position (e.g. just above the x-axis or at the top of the panel) without shifting the limits of the y-axis too much. With previous datasets, the y-values were quite homogeneous among groups, so I could get away with a very low y-value. This time, however, the range of y is quite large, so it is not easy to do.
So the question is how to plot the labels in df.text at a fixed position in facet_wrap while keeping scale="free". Ideally they would sit above the top panel.border.

You could define a new variable height which determines the height to plot the labels:
library(tidyverse)
df %>%
group_by(types) %>%
mutate(height = max(values) + .3 * sd(values)) %>%
# df.text repeats each group three times; de-duplicate so each label is drawn once
left_join(distinct(df.text), by = "groups") %>%
ggplot(aes(groups, values)) +
facet_wrap(~types, scale = "free") +
geom_point() +
geom_text(aes(x = groups, y = height, label = label), size = 4, col = "red", stat = "identity") +
theme_bw()
Here I used the maximum value plus 0.3 times the standard deviation, but you could change that to whatever you want. I'm not sure how to get the labels on top of the panel strips, though.
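If the labels only need to sit at the top of each panel, another option is to map y to Inf and pull the text down with vjust. Infinite positions are ignored when ggplot2 trains a continuous scale, so the free y limits are unaffected. The sketch below is a variant under stated assumptions, not the method above: it rebuilds df.text with a hypothetical types column (one row per group and facet), and it places the labels just inside the top border rather than above it.

```r
library(ggplot2)

df <- data.frame(
  groups = factor(rep(c("A", "B", "C"), each = 6)),
  types  = factor(rep(rep(c("Z", "Y", "X"), each = 2), times = 3)),
  values = c(10, 11, 1, 2, 0.1, 0.2, 12, 13, 2, 2.5, 0.2, 0.01,
             12, 14, 2, 3, 0.1, 0.2)
)

# Hypothetical label table: one row per group/facet combination.
# Replace "a" with the real post-hoc letters.
df.text <- expand.grid(groups = levels(df$groups), types = levels(df$types))
df.text$label <- "a"

p_inf <- ggplot(df, aes(groups, values)) +
  facet_wrap(~types, scales = "free") +
  geom_point() +
  # y = Inf pins the text to the top edge of each panel;
  # vjust > 1 moves it down so it is not cut off by the border
  geom_text(data = df.text, aes(y = Inf, label = label),
            vjust = 1.5, size = 4, colour = "red") +
  theme_bw()
p_inf
```

Because the labels are drawn inside the panel, they can overlap the highest point in a facet; extending the upper limit with scale_y_continuous(expand = ...) leaves room for them.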

Related

How to properly form ggplot graphs, without cutting off important parts of the graph?

I have created a barchart using ggplot() + geom_bar() functions, by ggplot2 package. I have also used coord_flip() to reverse the orientation of the bars and geom_text() to add the values at the top of each bar. Some of the bars have different colors, so there is a legend following the graph. What I am getting as result is a picture half occupied by the graph, half by the legend and with the values on top of the longest bars being cut off because of the small size of the graph.
Any ideas on how to enlarge the graph and shrink the legend so that the values on the bars are not cut off?
Thank you
This is my code on imaginary data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
df <- as.data.frame(cbind(labels,freq))
type <- c("rich","poor","poor","poor","rich")
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = sort(freq, decreasing = FALSE), size = 3.5, hjust = -0.2)
And this is the graph it gives as result:

There are a few fixes to this:
Change your Limits
As indicated by @Dave2e; see his response.
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device will change the result and look of a plot. When I ran your code... no clipping was observed. You can test this out creating the plot and then saving differently. If I take your default code, here's what I get with different arguments to width= and height= for ggsave() as a png:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion to the scale limits. By default, ggplot2 actually adds some "padding" to the ends of a scale. So, if you set your limits from 0 to 10, you'll actually have a plot area that goes a bit beyond this (about 5% beyond by default). You can redefine that setting by using the expand= argument of scale_... commands in ggplot. So you can set this limit, for example in the following code:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion for an axis separately. In the code above, I set no expansion at the lower limit of the y scale and a multiplier of 0.15 (about 15%) at the upper limit. The default for continuous scales is 0.05 (5%) on each side.
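To see what a multiplier of 0.15 means in axis units, the arithmetic can be worked by hand. This is only an illustration of how multiplicative padding relates to the data range; the names data_range and padded are made up for the example.

```r
library(ggplot2)

freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)

# Bars start at 0, so the data range on the frequency axis is 0 to max(freq)
data_range <- c(0, max(freq))

# Multiplicative padding is a fraction of the width of the data range
mult <- c(0, 0.15)
padded <- c(data_range[1] - mult[1] * diff(data_range),
            data_range[2] + mult[2] * diff(data_range))
padded

# expansion() packs the same settings into the vector that expand= expects
expansion(mult = mult)
```

If the labels are still clipped, increase the second element of mult until the longest label fits.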

You can override the default limits on the y-axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to code the manual limits but you may want to automate the limit calculation with something like ylimitmax= max(freq) * 1.2.
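A minimal sketch of the automated version, assuming a 20% margin is enough for the longest label (the factor 1.2 comes from the suggestion above and may need tuning for other label widths or output sizes):

```r
library(ggplot2)

labels <- c("A", "B", "C", "D", "E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich", "poor", "poor", "poor", "rich")
df <- data.frame(labels, freq, type)

# Derive the upper limit from the data instead of hard-coding it
ylimitmax <- max(df$freq) * 1.2

p_auto <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +   # same as geom_bar(stat = "identity")
  coord_flip() +
  ylim(0, ylimitmax) +
  # mapping label inside aes() keeps each value attached to its own row
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
p_auto
```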

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure this is easy to do, but I can't find the right search terms for it on Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data

How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
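The same idea can be applied when the grouping column is a real factor. The sketch below is one way to do it, using made-up data: the column names E and C are borrowed from the question, and pos is a hypothetical helper column that spaces the levels two units apart so the jitter and violin rows interleave.

```r
library(ggplot2)
set.seed(42)

# Made-up data with a genuine factor column
plot_data <- data.frame(
  E = factor(rep(c("A", "B", "C"), each = 20)),
  C = runif(60)
)

# Numeric position for each level, spaced two units apart
plot_data$pos <- 2 * as.integer(plot_data$E)

p_rows <- ggplot(plot_data, aes(y = C)) +
  # jitter row sits half a unit before the level position ...
  geom_jitter(aes(x = pos - 0.5), width = 0.25, height = 0, alpha = 0.5) +
  # ... and the violin row half a unit after it
  geom_violin(aes(x = pos + 0.5, group = E), width = 0.8) +
  # label the numeric axis with the original factor levels
  scale_x_reverse(breaks = 2 * seq_along(levels(plot_data$E)),
                  labels = levels(plot_data$E)) +
  coord_flip() +
  labs(x = "E")
p_rows
```

Each level then occupies two adjacent rows (jitter, then violin) under a single axis label.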

geom_dotplot dot sizes change when plotting different datasets in loop

I'm trying to do many dot plots for different subsets of a dataset. Problem is that the format is not the same across plots. In particular, the size of the dots is not the same.
The range of the "y" variable is not the same across subsets. Is this the reason?
rm(list=ls())
library(ggplot2)
outdir<-"SELECT YOUR OUTPUT DIRECTORY"
#generate subsets separately
set.seed(1)
#
data1<-rbind(
data.frame(poll=rnorm(20,20,5),zone="zone1"),
data.frame(poll=rnorm(20,16,1),zone="zone2"))
data1$id="ID1"
data2<-rbind(
data.frame(poll=rnorm(20,2,3),zone="zone1"),
data.frame(poll=rnorm(20,2,1),zone="zone2"))
data2$id="ID2"
#this is the sample full data set
alldata<-rbind(data1,data2)
ids<-unique(alldata$id)
for (i in ids) {
graphdata<-subset(alldata, id==i)
p<-ggplot(graphdata, aes(x=zone, y=poll)) +
geom_dotplot(binaxis='y', stackdir='center', binwidth=0.8,
method="histodot",stackratio=0.8, dotsize=0.5) +
ggtitle(i)
fname<-paste(outdir,"/",i,".png",sep="")
ggsave(fname,last_plot())
}

While geom_dotplot looks like a dot plot, it's actually a different representation of a histogram. If we look at ?geom_dotplot, we see that the size of the dots is not an absolute size, but is based on the width of the bins relative to the x-axis or y-axis (as appropriate):
In a dot plot, the width of a dot corresponds to the bin width ...
And the dotsize argument (contrary to what you might expect) just scales the size of the dots by a relative factor:
dotsize: The diameter of the dots relative to binwidth, default 1.
We can see this with an example:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5, stackdir = "center")
By scaling the x-axis by three while keeping binwidth constant, we shrink the bins relative to the axis, and the dots shrink with them:
ggplot(mtcars, aes(x = mpg*3)) +
geom_dotplot(binwidth = 1.5, stackdir = "center")
If we multiply the size of the binwidth by three, the relative size of the bins is the same and the dots are the same size as the first example:
ggplot(mtcars, aes(x = mpg*3)) +
geom_dotplot(binwidth = 4.5, stackdir = "center")
We can also compensate by setting dotsize = 3 (up from its default value of 1). This makes the dots 3x larger so they match the size of the dots in the first example, despite the bins being smaller relative to the axis. Note that they overlap now, since the dots are larger than the space they take up on the x-axis:
ggplot(mtcars, aes(x = mpg*3)) +
geom_dotplot(binwidth = 1.5, stackdir = "center", dotsize = 3)
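The relationship in the four examples above can be checked with a little arithmetic. The helper rel_dot_size below is hypothetical and simplified: it treats the drawn dot diameter as dotsize * binwidth divided by the data range, and ignores the default axis padding, which scales every case by the same factor.

```r
# Hypothetical helper: dot diameter as a fraction of the axis range
rel_dot_size <- function(x, binwidth, dotsize = 1) {
  dotsize * binwidth / diff(range(x))
}

base     <- rel_dot_size(mtcars$mpg,     binwidth = 1.5)               # first example
shrunk   <- rel_dot_size(mtcars$mpg * 3, binwidth = 1.5)               # axis x3, same binwidth
rebinned <- rel_dot_size(mtcars$mpg * 3, binwidth = 4.5)               # axis x3, binwidth x3
resized  <- rel_dot_size(mtcars$mpg * 3, binwidth = 1.5, dotsize = 3)  # axis x3, dotsize = 3

c(base = base, shrunk = shrunk, rebinned = rebinned, resized = resized)
```

Under this simplification, shrunk is one third of base, while rebinned and resized both equal base, which matches the plots described above.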
If you want your dots to be the same size, I'd use a dynamic value for dotsize to scale them. There's probably a more elegant way to do this, but as a simple attempt, I'd calculate the maximum range of the y-axis for all your datasets:
# Put this outside the loop
# and choose whatever dataset has the largest range
max_y_range <- max(data1$poll) - min(data1$poll)
then in your loop, set:
dotsize = (max(graphdata$poll) - min(graphdata$poll))/max_y_range
This should scale your dots properly as the y-axis changes between plots.
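Putting the pieces together, a sketch of the full loop might look like this. It follows the dotsize formula above but computes the largest range across all ids instead of picking a dataset by hand; the names ranges and plots are made up, and the plots are collected in a list rather than written with ggsave() so the example has no output directory to configure.

```r
library(ggplot2)
set.seed(1)

data1 <- rbind(
  data.frame(poll = rnorm(20, 20, 5), zone = "zone1"),
  data.frame(poll = rnorm(20, 16, 1), zone = "zone2"))
data1$id <- "ID1"
data2 <- rbind(
  data.frame(poll = rnorm(20, 2, 3), zone = "zone1"),
  data.frame(poll = rnorm(20, 2, 1), zone = "zone2"))
data2$id <- "ID2"
alldata <- rbind(data1, data2)

# Range of poll within each id, and the largest of those ranges
ranges <- tapply(alldata$poll, alldata$id, function(v) diff(range(v)))
max_y_range <- max(ranges)

plots <- lapply(names(ranges), function(i) {
  graphdata <- alldata[alldata$id == i, ]
  ggplot(graphdata, aes(x = zone, y = poll)) +
    geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.8,
                 method = "histodot", stackratio = 0.8,
                 # shrink dotsize in proportion to this subset's range
                 dotsize = ranges[[i]] / max_y_range) +
    ggtitle(i)
})
names(plots) <- names(ranges)
```

Each element of plots can then be passed to ggsave(filename, plot = plots[[i]]) as in the original loop.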

Beyond @divibisan's excellent explanation, you might also want to look at the ggpubr package, which I recently came across. You can simply use ggdotplot and get nicer graphs.
Here is your original graph. I changed the plotting code a little.
library(ggplot2)
library(magrittr)  # provides %>%
set.seed(1)
#
data1<-rbind(
data.frame(poll=rnorm(20,20,5),zone="zone1"),
data.frame(poll=rnorm(20,16,1),zone="zone2"))
data1$id="ID1"
data2<-rbind(
data.frame(poll=rnorm(20,2,3),zone="zone1"),
data.frame(poll=rnorm(20,2,1),zone="zone2"))
data2$id="ID2"
#this is the sample full data set
alldata<-rbind(data1,data2)
alldata %>% ggplot(aes(x=zone, y=poll)) +
geom_dotplot(binaxis='y', stackdir='center', binwidth=0.8,
method="histodot",stackratio=0.8, dotsize=0.5) +
facet_wrap(~id, scale="free_y")
Here is how you can draw using ggdotplot.
library(ggpubr)
alldata %>% ggdotplot(x="zone", y="poll", fill="zone", size=1.5)+
facet_wrap(~id, scale="free_y")

I have found a workaround for using dotplot without manually feeding dotsize. It is not very elegant, but it does the trick:
p <- ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1, stackdir = "center")
maxY <- max(ggplot_build(p)$data[[1]]$x)
my_binwidth = maxY/30
p <- ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = my_binwidth, stackdir = "center", dotsize = 1)
p
##############################################################################
p <- ggplot(mtcars, aes(x = mpg * 3)) +
geom_dotplot(binwidth = 1, stackdir = "center")
maxY <- max(ggplot_build(p)$data[[1]]$x)
my_binwidth = maxY/30
p <- ggplot(mtcars, aes(x = mpg * 3)) +
geom_dotplot(binwidth = my_binwidth, stackdir = "center", dotsize = 1)
p
So you have to build a throwaway plot with geom_dotplot to get the maximum value from ggplot_build(p)$data[[1]]$x, which you can then use to set binwidth in your actual plot. The dot size will remain constant irrespective of the maximum data value. If the dots are too large or too small for your taste, multiply my_binwidth by an appropriate factor. Note that it is important to keep dotsize constant, because changing binwidth changes the dot size proportionally.

ggplot dual reversed y axis and geom_hline intercept calculation

I have created a line graph in ggplot2 with two y axes, and want only one dataset (blue) plotted on a reversed axis and want the other dataset (red) plotted on a different scale from the first. However, the code I am working with reverses both axes, and although the second y axis has been coded to have a different scale the second dataset (red) is being plotted using the scale of the first y axis. Furthermore I have created a line (green) for which I have to determine where the blue line intercepts it. I know the latter part of this question has been asked before and was answered, however it was noted in that post that the solution doesn't actually work. Any input would be helpful! Thank you! I've provided a sample dataset as mine is too large to recreate.
time<-c(1,2,3,4,5,6,7,8,9,10)
height<-c(100,330,410,570,200,270,230,390,400,420)
temp<-c(37,33,14,12,35,34,32,28,26,24)
tempdf<-data.frame(time,height,temp)
library(ggplot2)
makeplot<-ggplot(tempdf,aes(x=time)) + geom_line(aes(y=height),color="Blue") +
geom_line(aes(y=temp),color="Red")+
scale_y_continuous(sec.axis=sec_axis(~./100,name = "Temperature"),trans="reverse")+
geom_hline(aes(yintercept=250), color="green")

ggplot will only do 1:1 axis transformations, and if it flips one axis, will flip both, so you need to figure out an equation to translate your axes. Multiplying (or dividing) by a negative flips your temperature axis back to a standard increasing scale. These two equations worked to get the sample data you had on the same scale.
height = temp*(-10) + 600
temp = (height - 600)/(-10)
Now, you can incorporate the equations into your plot code, the first to translate the temperature data into numbers that fit on the height scale, the second to translate your secondary axis numbers to a scale that shows temperature.
makeplot<-ggplot(tempdf,aes(x=time)) +
geom_line(aes(y=height),color="blue") +
geom_line(aes(y = (temp*(-10)) + 600), color="red")+
scale_y_continuous(sec.axis=sec_axis(~(.-600)/(-10),name =
"Temperature"),trans="reverse")+
geom_hline(aes(yintercept=250), color="green")
makeplot
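A quick way to check that the two equations really undo each other is to wrap them in functions and round-trip the temperature data. The function names to_height_scale and to_temp_scale are made up for this sketch; the constants -10 and 600 are the ones from the equations above.

```r
temp <- c(37, 33, 14, 12, 35, 34, 32, 28, 26, 24)

# Forward transform: temperature -> position on the height axis
to_height_scale <- function(temp) temp * (-10) + 600

# Inverse transform: position on the height axis -> temperature
to_temp_scale <- function(height) (height - 600) / (-10)

# Round trip should return the original temperatures
stopifnot(isTRUE(all.equal(to_temp_scale(to_height_scale(temp)), temp)))

# The coldest and warmest readings land at these heights
to_height_scale(range(temp))  # 480 230
```

The forward function goes inside aes(y = ...) for the red line, and the inverse goes inside sec_axis(), as in the plot code above. For other data, the slope and offset would need to be chosen so that the transformed values fall inside the primary axis range.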

Ignoring the intersection of lines problem for now, here are a couple of alternatives to dual axes. First, facets:
library(tidyverse)
library(scales)
tempdf %>%
# convert height to depth
mutate(height = -height) %>%
rename(depth = height) %>%
gather(key, value, -time) %>%
ggplot(aes(time, value)) +
geom_line() +
facet_grid(key ~ ., scales = "free_y") +
scale_x_continuous(breaks = pretty_breaks()) +
theme_bw()
Second, use coloured points to indicate temperature at each depth:
tempdf %>%
mutate(height = -height) %>%
rename(depth = height) %>%
ggplot(aes(time, depth)) +
geom_line() +
geom_point(aes(color = temp), size = 3) +
geom_hline(yintercept = -250, color = "blue") +
scale_color_gradient2(midpoint = 25,
low = "blue",
mid = "yellow",
high = "red") +
scale_x_continuous(breaks = pretty_breaks()) +
theme_dark()

Another alternative is a path plot:
ggplot(tempdf, aes(height, temp)) +
geom_path() +
geom_point(aes(fill = time), size = 8, shape = 21) +
geom_text(aes(label = time)) +
viridis::scale_fill_viridis()
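The question also asked where the height line crosses the horizontal line at 250, which the answers above leave open. One possible approach, sketched here as an assumption rather than an established solution, is linear interpolation between consecutive points wherever height moves from one side of the threshold to the other. The names threshold, idx and cross_time are made up for the example, and points that touch the threshold exactly are not handled.

```r
time   <- 1:10
height <- c(100, 330, 410, 570, 200, 270, 230, 390, 400, 420)
threshold <- 250

# Signed distance of each point from the threshold
d <- height - threshold

# Segments (i, i + 1) whose endpoints lie on opposite sides of the threshold
idx <- which(d[-length(d)] * d[-1] < 0)

# Linear interpolation along each crossing segment
cross_time <- time[idx] +
  (threshold - height[idx]) / (height[idx + 1] - height[idx]) *
  (time[idx + 1] - time[idx])

cross_time
```

The resulting times could be passed to geom_vline(xintercept = cross_time) or geom_point() to mark the crossings. Because geom_line draws straight segments between observations, linear interpolation matches what is drawn.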

dot plot dynamically control ylim

I would like to make a dot plot that shows the count on the y-axis instead of the density, and have ylim() adjust dynamically.
library(ggplot2)
d = data.frame( x = c(-.5,-.06,-.051,-.049,-.03,.02), color = c("red", "red", "red","green", "red","blue"))
set.seed(1)
#d= data.frame(x = rnorm(10))
binwidth= .025
p=ggplot(d, aes(x = x)) + geom_dotplot(binwidth = binwidth, method="histodot") + coord_fixed(ratio=binwidth)
p + ylim(0, ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/binwidth)))))*1.2)
Is there a way to apply the coloring that is in the "color" column of the dataframe?
Also the size of the plot changes when you change the binwidth variable. Change the binwidth variable to .1 and you will see the plot becomes larger. Is there a way to have the plot be the same size?
Thank you

We can determine this number with
p = ggplot(d, aes(x = x)) + geom_dotplot(binwidth = .05, method="histodot") + coord_fixed(ratio=0.05)
p + ylim(0, ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/0.05)))))*1.2)
Find range of the values
diff(range(p$data$x))
Divide by the binwidth or coord_fixed ratio to find number of cuts
diff(range(p$data$x))/p$coordinates$ratio
Assign each number to a bin based on the number of cuts determined above
cut(p$data$x, diff(range(p$data$x))/p$coordinates$ratio)
Find the bin with the maximum number of observations. This number might not exactly match up with the plot, but that should not be an issue.
ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/0.05)))))*1.2
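The steps above can be run outside of ggplot2 on the question's data to see the intermediate values. This sketch uses the question's binwidth of 0.025; the variable names are made up, and cut() does not necessarily place its bin edges where geom_dotplot does, so the count is an approximation, as noted above.

```r
x <- c(-0.5, -0.06, -0.051, -0.049, -0.03, 0.02)
binwidth <- 0.025

# Step 1: range of the values
x_range <- diff(range(x))

# Step 2: approximate number of cuts
n_cuts <- x_range / binwidth

# Step 3: assign each value to a bin
bins <- cut(x, n_cuts)

# Step 4: largest bin count, with the 20% headroom used in the answer
max_count <- max(table(bins))
upper <- ceiling(max_count) * 1.2

max_count
upper
```

Note that ceiling() is applied before the multiplication, as in the original expression, so the upper limit is not necessarily a whole number; use ceiling(max_count * 1.2) if an integer limit is wanted.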
