I have been working on creating a histogram of some data that I recently generated, and in an effort to make the data more readable I would like to include the confidence intervals, including having the intervals numerically marked on the tick line.
This has created a small readability problem. Using the code below, you can see that having the mean as a float value causes all of the tick labels to have the same precision as the mean, leading to a large number of trailing 0's. In this case there are 7, but if you manually set the mean to something like 3.5, all of them will have 1 trailing 0.
I was wondering if anyone knows how to set the precision of each label manually. Ideally I would like the labels at 0, 1, 2, ..., 10 to be integers, while the mean would be shown with 2 digits of precision, since I will have a more accurate number listed.
library(ggplot2)
set.seed(1235)
df <- data.frame(x=rexp(1000))
mean = mean(df$x)
ggplot(df, aes(x=x)) +
geom_histogram(binwidth = .05, position="dodge", color="black", fill="transparent") +
geom_vline(xintercept=mean, linetype="dashed", color="red") +
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0), breaks = sort(c(seq(0,10,1), mean)))
You can set the labels argument of scale_x_continuous(). The mean's label still overlaps a neighbouring tick label, so adjust accordingly or put the label elsewhere, e.g. with geom_text().
ggplot(df, aes(x = x)) +
geom_histogram(binwidth = .05, position = "dodge", color = "black", fill = "transparent") +
geom_vline(xintercept = mean, linetype = "dashed", color = "red") +
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0),
breaks = sort(c(seq(0,10,1), mean)),
labels = sort(c(0L:10L, round(mean, digits = 2))))
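The trailing zeros come from the default labelling: as far as I know, ggplot2 formats the breaks of a continuous scale with base R's format() (via the scales package), which gives every element of the vector the same number of decimals. Rather than keeping a separate label vector aligned with the breaks, you can also pass a function to labels that formats each break individually. A minimal sketch, assuming the only non-integer break is the mean; label_mixed and x_mean are hypothetical names:

```r
library(ggplot2)

# format() pads every element to a common number of decimals,
# which is where the trailing zeros on the tick labels come from
padded <- format(c(0, 1, 1.0123456), trim = TRUE)

# Hypothetical helper: whole-number breaks are printed as integers,
# any other break (here only the mean) with two decimals
label_mixed <- function(breaks) {
  ifelse(breaks == round(breaks),
         sprintf("%.0f", breaks),
         sprintf("%.2f", breaks))
}

set.seed(1235)
df <- data.frame(x = rexp(1000))
x_mean <- mean(df$x)

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.05, color = "black", fill = "transparent") +
  geom_vline(xintercept = x_mean, linetype = "dashed", color = "red") +
  theme_bw() +
  scale_x_continuous(name = "Values", expand = c(0, 0),
                     breaks = sort(c(0:10, x_mean)),
                     labels = label_mixed)
```

Because the function is applied to whatever breaks the scale ends up with, the labels cannot get out of step with the breaks.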
I have created a bar chart using the ggplot() + geom_bar() functions from the ggplot2 package. I have also used coord_flip() to flip the orientation of the bars and geom_text() to add the values at the top of each bar. Some of the bars have different colors, so there is a legend next to the graph. The result is a picture half occupied by the graph and half by the legend, with the values on top of the longest bars cut off because the plot area is so small.
Any ideas on how to enlarge the graph and reduce the size of the legend, so that the values on the bars are not cut off?
Thank you
Here is my code with made-up data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
# data.frame() keeps freq numeric; cbind() would turn it into character
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
And this is the resulting graph:
There are a few fixes to this:
Change your Limits
As suggested by @Dave2e; see his answer.
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device change the result and look of a plot. When I ran your code, no clipping was observed. You can test this by creating the plot and then saving it in different ways. If I take your default code, here's what I get with different width= and height= arguments to ggsave() as a PNG:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion of the scale limits. By default, ggplot2 adds some "padding" to the ends of a scale, so if you set your limits from 0 to 10, the plot area actually extends a bit beyond this (by 5% of the range on each side for continuous scales). You can change that setting with the expand= argument of the scale_... functions, for example:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion for an axis separately, so the code above sets no expansion at the lower limit of the y scale and a multiplier of 0.15 (15% of the data range) at the upper limit. The default for continuous scales is 0.05 (5%) on each side.
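To see how much room the expansion buys, the expanded range can be worked out by hand: each side moves out by its multiplier times the width of the data range. A minimal sketch in base R, assuming the y range is [0, max(freq)] because the bars start at 0; expand_range_mult is a hypothetical helper, not a ggplot2 function:

```r
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)

# Hypothetical helper mirroring a multiplicative expansion:
# the lower/upper limits move out by mult[1]/mult[2] times the range width
expand_range_mult <- function(lo, hi, mult) {
  width <- hi - lo
  c(lo - mult[1] * width, hi + mult[2] * width)
}

data_range <- c(0, max(freq))  # bars start at 0

default_range <- expand_range_mult(data_range[1], data_range[2], c(0.05, 0.05))
custom_range  <- expand_range_mult(data_range[1], data_range[2], c(0, 0.15))
```

With c(0, 0.15) the upper end of the axis lands at about 11.92 instead of about 10.89 with the default 5%, and that extra space is what the labels on the longest bar need.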
You can override the default limits on the y axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to set the limits manually, but you may want to automate the limit calculation with something like ylimitmax <- max(freq) * 1.2.
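A sketch of that automated version, deriving the upper limit from the data instead of hard-coding 11. The 1.2 headroom factor is an assumption you may need to tune to the label length and the output size:

```r
library(ggplot2)

labels <- c("A", "B", "C", "D", "E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich", "poor", "poor", "poor", "rich")
df <- data.frame(labels, freq, type)

# 20% headroom beyond the longest bar for its text label
ylimitmax <- max(df$freq) * 1.2

p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  labs(x = "", y = "Mean frequency",
       title = "Mean frequency of different labels") +
  scale_fill_manual(name = "Type", values = c("red", "blue")) +
  ylim(0, ylimitmax) +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
```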
I am trying to create a circular plot with the means of a set of data plotted around a center point. The code I found online does this, but the y-axis range is so large that the graphic isn't useful. I want to limit the y-axis to 95-120, but when I use scale_y_continuous(limits = c(95, 120)) the bars are dropped.
Data:
"","Hour","me"
"1",0,98.9192
"2",1,100.756333333333
"3",2,101.6815
"4",3,98.6551666666667
"5",4,102.668666666667
"6",5,104.024571428571
"7",6,106.137
"8",7,103.6535
"9",8,107.868333333333
"10",9,112.261428571429
"11",10,114.99
"12",11,113.452714285714
"13",12,110.534285714286
"14",13,112.974285714286
"15",14,112.731428571429
"16",15,104.658571428571
"17",16,112.271
"18",17,108.386666666667
"19",18,113.968857142857
"20",19,107.287142857143
"21",20,110.583
"22",21,102.811714285714
"23",22,105.983571428571
"24",23,100.98625
Code:
p<-ggplot(c, aes(x = Hour, y=me)) +
geom_bar(breaks = seq(0,24), width = 2, colour="grey",stat = "identity") +
theme_minimal() +
scale_fill_brewer()+coord_polar(start=0)+
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0,24))
Bars are good at showing proportional differences between values. If you let go of the 0 baseline, they no longer have that property, and this will mislead many people: if a bar is twice as tall, it should encode a value twice as large. ggplot2 closely follows that philosophy (with limits that exclude 0, the bars, which start at 0, fall outside the scale and are removed). Consider an alternative visualization, perhaps a simple line graph:
ggplot(d, aes(x = Hour, y=me)) +
geom_polygon(fill = NA, col = 1) +
geom_point(size = 5) +
theme_minimal() +
coord_polar() +
scale_x_continuous("", breaks = 0:24, limits = c(0, 24)) +
ylim(90, 115) # adjust as you like
Perhaps the solution here is just to change your data? Mathematically speaking, this accomplishes the same thing as shifting the axis away from zero.
ggplot(df, aes(x = Hour, y=me - 95)) +
geom_bar(width = 2, colour="grey",stat = "identity") +
theme_minimal() +
scale_fill_brewer() +
coord_polar(start=0) +
scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), labels = seq(0,24))
This likely makes the chart harder to interpret, so you would need to explain any manipulation like this. If the relative values are what matters, it can help interpretation; if the absolute values matter, this kind of adjustment can range from confusing to quite misleading.
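If you do go with the shifted bars, one way to keep the radial axis readable is to add the offset back in the tick labels, so the ticks show the original values even though the bars start at the baseline. A minimal sketch, assuming a baseline of 95 (the lower limit from the question) and the question's hourly means entered directly as a data frame d; shift_labels is a hypothetical helper. The bars still encode me - 95 rather than me, so the caveat above still applies:

```r
library(ggplot2)

# Hourly means from the question (Hour 0-23)
d <- data.frame(
  Hour = 0:23,
  me = c(98.9192, 100.756333333333, 101.6815, 98.6551666666667,
         102.668666666667, 104.024571428571, 106.137, 103.6535,
         107.868333333333, 112.261428571429, 114.99, 113.452714285714,
         110.534285714286, 112.974285714286, 112.731428571429,
         104.658571428571, 112.271, 108.386666666667, 113.968857142857,
         107.287142857143, 110.583, 102.811714285714, 105.983571428571,
         100.98625)
)

baseline <- 95  # assumed lower limit, taken from the question

# Hypothetical helper: show shifted breaks in the original units
shift_labels <- function(b) as.character(b + baseline)

p <- ggplot(d, aes(x = Hour, y = me - baseline)) +
  geom_col(width = 1, colour = "grey") +  # one bar per hour
  coord_polar(start = 0) +
  scale_x_continuous("", breaks = 0:23) +
  scale_y_continuous("me", labels = shift_labels) +
  theme_minimal()
```

Leaving out the x limits keeps the bar for hour 0 (which spans -0.5 to 0.5) from being dropped.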
I have the following data frame:
observed <- c("1000","2000","3000","4000")
simulated <- c("1100","2100","3100","4100")
error <- c("-1","-2","-0.5","-4")
Date <- c("2013-01-01","2013-01-02","2013-01-03","2013-01-04")
y <- data.frame(Date,observed,simulated,error)
y[-1] <- sapply(y[-1], as.character)
y[-1] <- sapply(y[-1], as.numeric)
y$Date <- as.Date(y$Date, format="%Y-%m-%d")
The data compare observed and simulated daily river discharges (left y axis) and show the related difference in percent (right y axis). Note that the percentages are just an example here and are not correctly calculated.
I would like to plot all three in one graph with the percentage error plotted on the secondary y axis. I used the following code:
p<-ggplot(y, aes(x=Date))
p<-p + geom_line(aes(y=observed, colour = "observed"), size=1.5)
p<-p + geom_line(aes(y=simulated, colour = "simulated"), size=1.5)
p<-p + geom_line(aes(y=error*-500, colour="red"), size=1.5)
p<-p + scale_colour_manual(name="Discharge [m3/sec]", labels=c("observed","simulated","error"), values = c("blue", "black","red"))
p <- p + scale_y_continuous(sec.axis = sec_axis(~./-500,name = "Error [%]"))
p <- p + labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter")
p <- p + theme(legend.position = c(0.2, 0.87), legend.title=element_blank(),axis.title.x=element_blank())
My problem is that the secondary y axis runs from -8 at the top to 0 at the bottom. What I would like is for the secondary y axis's zero to be at the top and -8 at the bottom, where the zero of the first (left) y axis is.
The reason your secondary axis looks like that is how you transformed your data. Since you multiplied your error by -500 in your third geom_line, the line goes up as the error gets smaller (i.e., closer to -8). Therefore, for the secondary axis to correctly map to your data, it must be upside down (with -8 at the top).
If you want 0 to be at the top, multiply your error by positive 500 and divide by positive 500 in the sec_axis formula:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=error*500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~./500, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter") +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
And if you want the two plots to overlap, you can manually add 8 to your error to move it up, and then subtract 8 in the sec_axis formula to keep the numbers correct:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=(8 + error) * 500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~(. / 500) - 8, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter")) +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
Additional tips:
You can chain ggplot functions with the + operator, as I do above, instead of saving the intermediate result to a variable each time as you do in your example.
The most robust way to use scale_colour_manual() is to pass a named vector to values. This ensures that a given series (e.g. observed) is always associated with the intended color (e.g. blue).
If you want the error line to be smaller and less dominant, just reduce the transformation factor. If you multiply (in geom_line) and divide (in sec_axis) it by 100 instead of 500 you get a much flatter line. You'll have to play around with the number to get it to look like what you want. In ggplot2, the secondary axis must be a transformation of the primary axis, so you can't just pass in its own limits= argument.
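Since the secondary axis is just the inverse of whatever you did to the error values, it can help to define both directions in one place, so that changing the factor (say from 500 to 100) updates the line and the axis together. A minimal sketch; make_sec_transform and tr are hypothetical names, and the data are the question's values entered directly as numbers:

```r
library(ggplot2)

# Hypothetical helper: forward map (error -> primary-axis units)
# and its exact inverse (primary-axis units -> error)
make_sec_transform <- function(factor, offset = 0) {
  list(
    forward = function(e) (e + offset) * factor,
    inverse = function(v) v / factor - offset
  )
}

y <- data.frame(
  Date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")),
  observed = c(1000, 2000, 3000, 4000),
  simulated = c(1100, 2100, 3100, 4100),
  error = c(-1, -2, -0.5, -4)
)

# Same shift and scale as the second plot above; try factor = 100
tr <- make_sec_transform(factor = 500, offset = 8)

p <- ggplot(y, aes(x = Date)) +
  geom_line(aes(y = observed, colour = "observed")) +
  geom_line(aes(y = simulated, colour = "simulated")) +
  geom_line(aes(y = tr$forward(error), colour = "error")) +
  scale_colour_manual(values = c(observed = "blue",
                                 simulated = "black",
                                 error = "red")) +
  scale_y_continuous(sec.axis = sec_axis(~ tr$inverse(.), name = "Error [%]"))
```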
In a previous question, I asked about moving the label of a bar outside the bar when the bar is too small. I was given the following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, with an ifelse() statement used to reposition the label. Is there a way to determine the cutoff value automatically?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
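Packaging the logic into a function might look like this. A minimal sketch; label_y_position and its cutoff_frac and shift_frac arguments are hypothetical names, with defaults matching the 0.3 and 0.1 used above. As far as I know, ggplot2 evaluates an aesthetic on the whole layer's data, so max() inside the helper sees all the bars:

```r
library(ggplot2)

# Hypothetical helper: label position relative to the tallest bar.
# Bars shorter than cutoff_frac of the maximum get their label above the bar;
# the others get it inside, shift_frac of the maximum below the top.
label_y_position <- function(values, cutoff_frac = 0.3, shift_frac = 0.1) {
  ymax <- max(values, na.rm = TRUE)
  ifelse(values < cutoff_frac * ymax,
         values + shift_frac * ymax,
         values - shift_frac * ymax)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
                      Anno = 1:10)

p <- ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
  geom_col(width = 0.8) +
  geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0),
                y = label_y_position(Riserva_Riv_Fine_Periodo)),
            angle = 90, col = "red", size = 4)
```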
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(labels=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(labels=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")
I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot, colour-coding each df$category by -log10 of its df$p.value multiplied by its df$stat.sign (i.e., the sign of the statistic).
For that I first take -log10 of df$p.value and multiply it by the sign:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
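As a side note, because every negative value sorts before every positive one, the two filtered pieces stacked together are simply the rows sorted by sig, so a single order() call should give the same row order. A minimal base-R check of that, assuming no sig is exactly 0 (the filter-based version would silently drop such rows):

```r
set.seed(1)
df <- data.frame(col = rep(1, 7),
                 category = LETTERS[1:7],
                 stat.sign = sign(rnorm(7)),
                 p.value = runif(7, 0, 1),
                 stringsAsFactors = TRUE)
df$sig <- df$stat.sign * (-1 * log10(df$p.value))

# Two-step version from the question, written in base R
neg <- df[df$sig < 0, ]
pos <- df[df$sig > 0, ]
two_step <- rbind(neg[order(neg$sig), ], pos[order(pos$sig), ])

# Single sort over all rows
one_step <- df[order(df$sig), ]
```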
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
According to the ggplot2 documentation, scale_fill_gradient2(), like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is to pass a function that transforms the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels in the legend? If so, manipulating the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
If you want the values visible in the figure itself, you could add them as text on top of the tiles, for example with geom_text():
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
geom_text(aes(label = round(sig, 2)), colour = 'black', size = 4) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You may want to experiment with the size and colour arguments.