I would like to make a dot plot that shows the count on the y axis instead of the density, and have ylim() adjust dynamically.
library(ggplot2)
d = data.frame( x = c(-.5,-.06,-.051,-.049,-.03,.02), color = c("red", "red", "red","green", "red","blue"))
set.seed(1)
#d= data.frame(x = rnorm(10))
binwidth= .025
p=ggplot(d, aes(x = x)) + geom_dotplot(binwidth = binwidth, method="histodot") + coord_fixed(ratio=binwidth)
p + ylim(0, ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/binwidth)))))*1.2)
Is there a way to apply the coloring that is in the "color" column of the dataframe?
Also the size of the plot changes when you change the binwidth variable. Change the binwidth variable to .1 and you will see the plot becomes larger. Is there a way to have the plot be the same size?
Thank you
We can determine the upper y limit from the data with:
p = ggplot(d, aes(x = x)) + geom_dotplot(binwidth = .05, method="histodot") + coord_fixed(ratio=0.05)
p + ylim(0, ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/0.05)))))*1.2)
Find range of the values
diff(range(p$data$x))
Divide by the binwidth or coord_fixed ratio to find number of cuts
diff(range(p$data$x))/p$coordinates$ratio
Assign each number to a bin based on the number of cuts determined above
cut(p$data$x, diff(range(p$data$x))/p$coordinates$ratio)
Find the bin with the maximum number of observations. This number might not exactly match up with the plot, but that should not be an issue.
ceiling(max(table(cut(p$data$x, (diff(range(p$data$x))/0.05)))))*1.2
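The steps above can be collected into one short base R sketch. It is only an illustration: it uses explicit, equal-width break points rather than a number of cuts, so every bin is exactly `binwidth` wide, and the names `breaks`, `max_count` and `y_upper` are made up here. The bins are not guaranteed to line up with the ones geom_dotplot() draws.

```r
# Data from the question
x <- c(-.5, -.06, -.051, -.049, -.03, .02)
binwidth <- 0.05

# Equal-width break points covering the full range of x
breaks <- seq(min(x), max(x) + binwidth, by = binwidth)

# Count observations per bin; bins are closed on the left
counts <- table(cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE))

# Tallest bin, plus 20% headroom, rounded up to a whole count
max_count <- max(counts)
y_upper <- ceiling(max_count * 1.2)
```

y_upper can then be passed to ylim(0, y_upper) in place of the one-line expression.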
I have created a bar chart using ggplot() + geom_bar() from the ggplot2 package. I have also used coord_flip() to reverse the orientation of the bars and geom_text() to add the values at the end of each bar. Some of the bars have different colors, so there is a legend next to the graph. The result is a picture half occupied by the graph and half by the legend, with the values at the end of the longest bars cut off because the graph is so small.
Any ideas on how to enlarge the graph and shrink the legend so that the bar values are not cut off?
Thank you
This is my code on imaginary data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
df <- as.data.frame(cbind(labels,freq))
type <- c("rich","poor","poor","poor","rich")
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = sort(freq, decreasing = FALSE), size = 3.5, hjust = -0.2)
And this is the graph it gives as result:
There are a few fixes to this:
Change your Limits
As indicated by @Dave2e - see his response
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device change the look of a plot. When I ran your code, no clipping was observed. You can test this by creating the plot and then saving it at different sizes. If I take your default code, here's what I get with different arguments to width= and height= for ggsave() as a png:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion to the scale limits. By default, ggplot2 actually adds some "padding" to the ends of a scale. So, if you set your limits from 0 to 10, you'll actually have a plot area that goes a bit beyond this (about 5% beyond by default). You can redefine that setting by using the expand= argument of scale_... commands in ggplot. So you can set this limit, for example in the following code:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion of an axis separately. In the code above, I've set no expansion at the lower limit of the y scale and a multiplier of 0.15 (about 15%) at the upper limit. The default is 0.05, I believe (or 5%).
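To make the arithmetic behind the multiplier concrete, here is a small base R sketch. `padded_range()` is a hypothetical helper written for this illustration, not a ggplot2 function; it assumes each end of the range moves outward by the multiplier times the width of the range.

```r
# Hypothetical helper mirroring a multiplicative expansion:
# each end of the range moves outward by mult * (range width).
padded_range <- function(limits, mult = c(0.05, 0.05)) {
  width <- diff(limits)
  c(limits[1] - mult[1] * width, limits[2] + mult[2] * width)
}

freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)

# Bars start at 0, so the data range on the value axis is 0 to max(freq)
data_range <- c(0, max(freq))

default_pad <- padded_range(data_range)              # 5% on both ends
custom_pad  <- padded_range(data_range, c(0, 0.15))  # as in expansion(mult = c(0, 0.15))
```

With the multiplier c(0, 0.15), the lower end stays at 0 and the upper end moves from max(freq) to 1.15 * max(freq), which is the extra room the text labels use.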
You can override the default limits on the y axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to code the manual limits but you may want to automate the limit calculation with something like ylimitmax= max(freq) * 1.2.
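A minimal sketch of the automated version, assuming the same imaginary data. The 1.2 factor is a guess and may need tuning for longer labels or a narrower output device; geom_col() is used here as shorthand for geom_bar(stat = "identity"), and the label is mapped inside aes() so it follows the data rows.

```r
library(ggplot2)

labels <- c("A", "B", "C", "D", "E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich", "poor", "poor", "poor", "rich")
df <- data.frame(labels, freq, type)

# Upper limit derived from the data: 20% headroom for the text labels
ylimitmax <- max(df$freq) * 1.2

p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2) +
  coord_flip() +
  ylim(0, ylimitmax)
```

Printing p draws the chart; the limit now follows the data if freq changes.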
I am pretty sure that this is easy to do but I can't seem to find a proper way to query this question into google or stack, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
I have this dataset
df <- data.frame(groups=factor(c(rep("A",6), rep("B",6), rep("C",6))),
types=factor(c(rep(c(rep("Z",2), rep("Y",2), rep("X",2)),3))),
values=c(10,11,1,2,0.1, 0.2, 12,13, 2,2.5, 0.2, 0.01,
12,14, 2,3,0.1,0.2))
library(ggplot2)
px <- ggplot(df, aes(groups, values)) + facet_wrap(~types, scale="free") + geom_point()
px
I conducted a post-hoc analysis (values~groups for each level of types) and created a dataset that contains significance groups (as letters: a, b, c and so on):
df.text <- data.frame(groups=factor(c(rep("A",3), rep("B",3), rep("C",3))),
label=rep("a", 9))
I proceeded to plot the labels on px:
px + geom_text(data=df.text, aes(x=groups, y=0.1, label=label), size=4, col="red", stat="identity") +theme_bw()
which doesn't look great.
My problem is to define an aes(y) in geom_text that plots the labels at a fixed position (e.g. above the x-axis or on top of the panel) without shifting the limits of the y axis too much. With previous datasets, the y-values were quite homogeneous among groups, so I could get away with a very low y-value. This time, however, the range of y is quite large, so that is not easy to do.
So, the question is how to plot the labels inside df.text at a fixed position in facet_wrap, while keeping scale="free". Best would be above the top panel.border.
You could define a new variable height which determines the height to plot the labels:
library(tidyverse)
df %>%
group_by(types) %>%
mutate(height = max(values) + .3 * sd(values)) %>%
left_join(df.text, by = "groups") %>%
ggplot(aes(groups, values)) +
facet_wrap(~types, scale = "free") +
geom_point() +
geom_text(aes(x = groups, y = height, label = label), size = 4, col = "red", stat = "identity") +
theme_bw()
Here I used the max value plus .3 times the standard deviation but you could change that to whatever you wanted obviously. Not sure how to get the labels on top of the panel strips though.
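Another option, sketched here under some assumptions, is to pin the labels to the top edge of each panel by giving them an infinite y position, which ggplot2 places at the panel boundary without affecting the free scales. The `labels_unique` data frame (one label per group, repeated in every facet) is made up for this illustration, and the labels sit just inside the panel, not above the border; they may overlap points that are close to the top.

```r
library(ggplot2)

df <- data.frame(
  groups = factor(rep(c("A", "B", "C"), each = 6)),
  types  = factor(rep(rep(c("Z", "Y", "X"), each = 2), 3)),
  values = c(10, 11, 1, 2, 0.1, 0.2, 12, 13, 2, 2.5, 0.2, 0.01,
             12, 14, 2, 3, 0.1, 0.2)
)

# One label per group (assumed; the same letter is shown in every facet)
labels_unique <- data.frame(groups = factor(c("A", "B", "C")), label = "a")

p <- ggplot(df, aes(groups, values)) +
  facet_wrap(~types, scales = "free") +
  geom_point() +
  # y = Inf pins the text to the top edge of each panel;
  # vjust > 1 pushes it down so it sits just inside the border
  geom_text(data = labels_unique, aes(y = Inf, label = label),
            vjust = 1.5, size = 4, colour = "red") +
  theme_bw()
```

Drawing the labels outside the panel would additionally require turning clipping off, which this sketch does not attempt.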
I'd like to plot some measures that have been standardized to z-scores. I want the size of the point in geom_point() to increase from 0 to 3, and also to increase from 0 to -3. I also want the colour to change from red, to blue. The trick is to get both to work together.
Here is an example that's as close as I can get to what I'd like, note that the size of the point increases from -2, whereas I want the size of the point to increase as the z_score moves away from zero.
library(tidyverse)
year <- rep(c(2015:2018), each = 3)
parameters <- rep(c("length", "weight", "condition"), 4)
z_score <- runif(12, min = -2, max = 2)
df <- tibble(year, parameters, z_score)
cols <- c("#d73027",
"darkgrey",
"#4575b4")
ggplot(df, aes(year, parameters, colour = z_score, size = z_score)) +
geom_point() +
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(range = c(1,15)) +
guides(color= guide_legend(), size=guide_legend())
bubble plot output
One trick I tried was to use the absolute value of z_score which scaled the points correctly but messed up the legend.
Here's what I'd like the legend and points size to be scaled to, though I'd like the colour to be a gradient as in my example. Any insight would be greatly appreciated!
Link to plot legend
You were very close. In order to adjust the size of the points in the legend, use the override.aes option in the guides function.
library(ggplot2)
library(tibble)  # provides tibble(), which ggplot2 alone does not attach
year <- rep(c(2015:2018), each = 3)
parameters <- rep(c("length", "weight", "condition"), 4)
z_score <- runif(12, min = -2, max = 2)
df <- tibble(year, parameters, z_score)
cols <- c("#d73027", "darkgrey", "#4575b4")
ggplot(df, aes(year, parameters, colour = z_score)) +
geom_point( size=abs(5*df$z_score)) + # times 5 to increase size
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(range = c(1,15)) +
guides(color=guide_legend(override.aes = list(size = c( 5, 1, 5))) )
To suppress the legend for the size attribute, I moved size outside the aes() call. This works for this example, but you will have to adjust size = c(...) to match the number of divisions in the legend.
This should get you most of the way to answering your question.
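One way to avoid hand-counting the legend divisions is to fix the colour breaks explicitly and derive the legend sizes from them. The sketch below is an assumption-laden variant of the code above: `legend_breaks`, `size_mult` and the minimum size of 1 are choices made for this illustration, and a plain data.frame with a fixed seed replaces the tibble.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  year = rep(2015:2018, each = 3),
  parameters = rep(c("length", "weight", "condition"), 4),
  z_score = runif(12, min = -2, max = 2)
)
cols <- c("#d73027", "darkgrey", "#4575b4")

# Explicit legend breaks, and point sizes computed from them
legend_breaks <- c(-2, -1, 0, 1, 2)
size_mult <- 5
legend_sizes <- pmax(abs(legend_breaks) * size_mult, 1)  # keep 0 visible

p <- ggplot(df, aes(year, parameters, colour = z_score)) +
  geom_point(size = pmax(abs(df$z_score) * size_mult, 1)) +
  scale_colour_gradientn(colours = cols, breaks = legend_breaks,
                         limits = c(-2, 2)) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(override.aes = list(size = legend_sizes)))
```

Because the sizes are computed from the same breaks the colour scale uses, changing `legend_breaks` keeps the two in step.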
Below is my code, I just want the Y axis ticks to be in 5% increments, there are too many ticks, because they correspond to each plot.
library(ggplot2)
ggplot(data = data, aes(x = X, y = Y, label = Label)) +
geom_point() +
scale_colour_manual(values = c("steelblue4", "chartreuse4", "gold", "firebrick2")) +
geom_text(aes(color = Goal), position = "jitter", hjust=0.5, vjust=1.1, size = 2.3) +
labs(title = "Google", x = "Correlation Coefficient", y = "Top-Box %")
Try adding this layer to your ggplot:
scale_y_continuous(breaks=seq(min(data$Y),max(data$Y),(max(data$Y)-min(data$Y))/20))
The breaks= argument takes a vector that lets you specify the breaks manually. The seq() call above steps from the lowest to the highest value in data$Y in increments of one twentieth of the range, which gives 21 equally spaced breaks. You could also wrap seq() in round() to clean up the potentially messy numbers that result from (max() - min())/20.
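If the goal is breaks at fixed 5% steps and not a fixed number of breaks, a small helper can snap the ends of the range to multiples of the step. `step_breaks()` is a hypothetical function written for this sketch, and the values in `y` are made up to stand in for data$Y, assuming Y is stored as a proportion.

```r
# Hypothetical helper: breaks at every multiple of `step` that spans the data
step_breaks <- function(y, step) {
  seq(floor(min(y) / step), ceiling(max(y) / step)) * step
}

# Made-up proportions standing in for data$Y
y <- c(0.12, 0.37, 0.58)
brks <- step_breaks(y, step = 0.05)
```

The result can be passed as scale_y_continuous(breaks = brks). If Y is stored on a 0 to 100 scale, use step = 5.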