Set Coord_cartesian for just one side? - r

Is there a way to use coord_cartesian to set the lower y limit to 0, but continue the automatic setting for ymax?
I have a large data set with multiple groups and categories. I want each group to display on its own page with facets for every category, so I am content with the automatic upper bounds.
I am making qq plots, and because of skewed data, the normal distribution (from stat_qq_line) goes into the negative. Simply setting ymin would limit stat_qq_line, so I want to use coord_cartesian instead.
I was hoping
coord_cartesian(ylim = c(0, NA))
would work, but it produces
Error in if (zero_range(range)) zero_width else diff(range) :
missing value where TRUE/FALSE needed
Entire block if that helps:
p <- ggplot(dsub2, mapping = aes(sample = Usual)) +
stat_qq_line() + stat_qq_point() +
facet_wrap(~Category, scales = "free", labeller=labeller(Category = labels)) +
labs(title=paste("Group", group),
x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme(plot.title = element_text(hjust = 0.5)) +
coord_cartesian(ylim = c(0, NA))

Related

How to change the Y range for ggplot (geom_col) in R?

I am trying to create 2 ggplot bar graphs for text analysis to compare frequencies as percentages from the dictionary "loughran". Here is my code for one of the graphs. How can I edit my y range so that both graphs start at 0% and end at 100%? This way, it would be much easier to see the differences.
ggplot(loughran_nc) +
aes(x = fct_reorder(sentiment, perc), y = perc)+
geom_col()+
ylab("Percentage") +
xlab("Sentiment")+
ggtitle("Sentiment Analysis: Non-Complaints Loughran dictionary")+
theme(plot.title = element_text(hjust = 0.5))
You can set limits within coord_cartesian().
Some quick data:
library(tidyverse)
loughran_nc <- data.frame(sentiment = c("words","for","some","data"),perc=c(40,60,20,80))
Then your plot + 1 line:
ggplot(loughran_nc) +
aes(x = fct_reorder(sentiment, perc), y = perc)+
geom_col()+
ylab("Percentage") +
xlab("Sentiment")+
ggtitle("Sentiment Analysis: Non-Complaints Loughran dictionary")+
theme(plot.title = element_text(hjust = 0.5)) +
coord_cartesian(ylim = c(0,100))
An alternative to coord_cartesian() is to use scale_y_continuous() or ylim().
scale_y_continuous() lets you specify all sorts of attributes of the y axis: limits, breaks, name, etc. (see ?scale_y_continuous). For your example, you can add scale_y_continuous(limits = c(0, 100)) to your code.
ylim() is simpler, and adding ylim(c(0, 100)) would do the same job.
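The two approaches agree for this data, but they differ once a value falls outside the limits: coord_cartesian() only zooms the view, while scale limits turn out-of-range values into NA before drawing. A minimal sketch, assuming a made-up data frame d with one value above 100 (the names d, base, p_zoom and p_limit are illustrative only):

```r
library(ggplot2)

# Hypothetical data with one value above 100 to make the difference visible
d <- data.frame(sentiment = c("a", "b", "c"), perc = c(40, 80, 120))

base <- ggplot(d, aes(x = sentiment, y = perc)) + geom_col()

# Zooms the view: the 120 bar is still drawn, just cut off at the top of the panel
p_zoom <- base + coord_cartesian(ylim = c(0, 100))

# Sets scale limits: the out-of-range value becomes NA and that bar is dropped
# (ggplot2 reports the removed row when the plot is printed)
p_limit <- base + scale_y_continuous(limits = c(0, 100))
```

Printing p_zoom and p_limit side by side should show the difference. For percentages that never exceed 100, either form gives the same picture.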

Problem with y axis format ( change from 1e+01)

I want to change the format of the y axis from 1e+01... to breaks from 0 to 200000 (0, 50000, 100000, ..., 200000) in my log plot
p + geom_line(aes(group = state)) + facet_wrap(~ state)+
geom_point(aes(y = positive), col = "#8B1C62")+
scale_y_continuous(limits = c(0,200000)) + theme_minimal() +
scale_y_log10()
Also I get this error message
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
I don't know what to do. Thank you in advance.
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
This is not an error, only a message. scale_y_log10 is overwriting the parameters you define in scale_y_continuous, because a plot can have only one y scale.
To get a log axis while keeping scale_y_continuous, you can use coord_trans and change the limits using ylim (untested solution)
p + geom_line(aes(group = state)) + facet_wrap(~ state)+
geom_point(aes(y = positive), col = "#8B1C62")+
theme_minimal() +
scale_y_continuous(breaks = seq(0, 200000, by = 50000)) +
coord_trans(y = "log10", ylim = c(0,200000))
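Edit: I had forgotten the question of axis ticks. You need to use the breaks argument in scale_y_continuous. By the way, you can set ylim directly in coord_trans (cf. the doc). Note that log10(0) is not finite, so a lower limit or break of 0 may need to be replaced by a small positive value.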
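Another option is to keep a single scale_y_log10() and give it the breaks and a label function, so the ticks print as plain numbers instead of 1e+01. This is only a sketch: the data frame covid, its columns, and the helper plain_labels are made-up stand-ins for the asker's data, and the breaks are powers of ten because a log axis has no place for 0.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: positives per state over time
covid <- data.frame(
  state    = rep(c("A", "B"), each = 5),
  day      = rep(1:5, times = 2),
  positive = c(10, 100, 1000, 10000, 150000, 5, 50, 500, 5000, 50000)
)

# Format break values as plain numbers (10, 1,000, ...) rather than 1e+01
plain_labels <- function(x) {
  format(x, scientific = FALSE, big.mark = ",", trim = TRUE)
}

p <- ggplot(covid, aes(x = day, y = positive)) +
  geom_line(aes(group = state)) +
  geom_point(col = "#8B1C62") +
  facet_wrap(~ state) +
  theme_minimal() +
  # A single y scale; the lower limit must be positive on a log scale
  scale_y_log10(
    limits = c(1, 200000),
    breaks = c(1, 10, 100, 1000, 10000, 100000),
    labels = plain_labels
  )
```

Because only one y scale is added, the "Scale for 'y' is already present" message does not appear.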

Revert secondary y-scale in ggplot 2

I have the following data frame:
observed <- c("1000","2000","3000","4000")
simulated <- c("1100","2100","3100","4100")
error <- c("-1","-2","-0.5","-4")
Date <- c("2013-01-01","2013-01-02","2013-01-03","2013-01-04")
y <- data.frame(Date,observed,simulated,error)
y[-1] <- sapply(y[-1], as.character)
y[-1] <- sapply(y[-1], as.numeric)
y$Date <- as.Date(y$Date, format="%Y-%m-%d")
It compares observed with simulated daily river discharges on the left y axis and shows the related difference in percent on the right y axis (note that the percentages are just an example here and are not correctly calculated).
I would like to plot all three in one graph with the percentage error plotted on the secondary y axis. I used the following code:
p<-ggplot(y, aes(x=Date))
p<-p + geom_line(aes(y=observed, colour = "observed"), size=1.5)
p<-p + geom_line(aes(y=simulated, colour = "simulated"), size=1.5)
p<-p + geom_line(aes(y=error*-500, colour="red"), size=1.5)
p<-p + scale_colour_manual(name="Discharge [m3/sec]", labels=c("observed","simulated","error"), values = c("blue", "black","red"))
p <- p + scale_y_continuous(sec.axis = sec_axis(~./-500,name = "Error [%]"))
p <- p + labs(y=expression(paste('Q [',m^3~s^-1,']'),
colour = "Parameter"))
p <- p + theme(legend.position = c(0.2, 0.87), legend.title=element_blank(),axis.title.x=element_blank())
My problem is that the secondary y axis starts at -8 and goes down to 0 from top to bottom. What I would like is for the secondary y axis's zero to be at the top and the -8 at the bottom, where the zero of the primary (left) y axis is.
The reason your secondary axis looks like that is because that's how you transformed your data. Since you multiplied your error by -500 in your 3rd geom_line, as the error gets smaller (ie, closer to -8), the line will go up. Therefore, for the secondary axis to correctly map to the data you have, it must be upside down (with -8 at the top).
If you want 0 to be at the top, multiply your error by positive 500 and divide by positive 500 in the sec_axis trans formula:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=error*500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~./500, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter") +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
And if you want to make the two plots overlap, you can manually add 8 to your error to move it up, and then subtract it in sec_axis to keep the numbers correct:
ggplot(y, aes(x=Date)) +
geom_line(aes(y=observed, colour = "observed"), size=1.5) +
geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
geom_line(aes(y=(8 + error) * 500, colour = "error"), size=1.5) +
scale_colour_manual(name="Discharge [m3/sec]",
values = c('observed' = "blue",
'simulated' = "black",
'error' = "red")) +
scale_y_continuous(sec.axis = sec_axis(~(. / 500) - 8, name = "Error [%]",
breaks = c(0, -2, -4, -6, -8))) +
labs(y=expression(paste('Q [',m^3~s^-1,']')),
colour = "Parameter") +
theme(legend.position = c(0.2, 0.87),
legend.title=element_blank(),
axis.title.x=element_blank())
Additional tips:
You can link multiple ggplot functions with the + operator, as I do above, instead of saving the intermediate result to a variable each time as in your example.
The correct way to use scale_colour_manual is to pass a named vector to values. This ensures that a given label (e.g. observed) is always associated with the intended color (e.g. blue).
If you want the error line to be smaller and less dominant, just reduce the transformation factor. If you multiply (in geom_line) and divide (in sec_axis) it by 100 instead of 500 you get a much flatter line. You'll have to play around with the number to get it to look like what you want. In ggplot2, the secondary axis must be a transformation of the primary axis, so you can't just pass in its own limits= argument.
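The rule behind all of these variants is that the formula in sec_axis must be the exact inverse of the transformation applied to the data in geom_line. A small check of the shifted version, using the error values from the question (the function names to_primary and to_secondary are made up for illustration):

```r
# Values taken from the question's data frame
error <- c(-1, -2, -0.5, -4)

# Forward transform used inside geom_line(): error (%) -> primary-axis units
to_primary <- function(e) (8 + e) * 500

# Inverse transform given to sec_axis(): primary-axis units -> error (%)
to_secondary <- function(y) y / 500 - 8

to_primary(error)                # positions on the left axis: 3500 3000 3750 2000
to_secondary(to_primary(error))  # recovers the original errors: -1 -2 -0.5 -4
```

If the round trip does not return the original values, the right-hand axis labels will not match the plotted line.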

How to set automatic label position based on box height

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
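As a rough sketch of the "package it into a function" idea, the relative-height rule could be wrapped like this. The helper name label_y and its arguments cutoff and shift are invented here, with defaults matching the 0.3 and 0.1 used above:

```r
# Hypothetical helper: returns the y position for each label.
# Bars shorter than cutoff * tallest bar get the label above the bar,
# taller bars get it inside; shift is the offset as a fraction of the tallest bar.
label_y <- function(value, cutoff = 0.3, shift = 0.1) {
  top <- max(value, na.rm = TRUE)
  ifelse(value < cutoff * top, value + shift * top, value - shift * top)
}

label_y(c(10, 50, 100))  # 20 40 90
```

Inside the plot this would presumably be used as geom_text(aes(y = label_y(Riserva_Riv_Fine_Periodo), ...)), so that only cutoff and shift need adjusting when the plot size changes.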
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

R ggplot2 - Different precision on separate tick marks

I have been working on creating a histogram of some data that I recently generated and, in an effort to make the data more readable, would like to include the confidence intervals, including having the intervals numerically marked on the tick line.
This has created a small problem with readability. Using the code below, you can see that having the mean as a float value causes all of the tick marks to have the same precision as the mean value, leading to a large number of trailing 0's. In this case there are 7, but if you manually set the mean value to something like 3.5, all will have 1 trailing 0.
I was wondering if anyone knows how to set the precision of each mark manually. Ideally I would like the marks at 0,1,2,..,10 to be integers, while the mean value would show 2 digits of precision, since I will have a more accurate number listed.
require(ggplot2)
set.seed(1235)
df <- data.frame(x=rexp(1000))
mean = mean(df$x)
ggplot(df, aes(x=x)) +
geom_histogram(binwidth = .05, position="dodge", color="black", fill="transparent") +
geom_vline(data=df, aes(xintercept=mean), linetype="dashed", color="red") +
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0), breaks = sort(c(seq(0,10,1), mean)))
You can set the labels parameter of scale_x_continuous. The values still overlap, so adjust accordingly or put the label elsewhere, e.g. with geom_text.
ggplot(df, aes(x = x)) +
geom_histogram(binwidth = .05, position = "dodge", color = "black", fill = "transparent") +
geom_vline(aes(xintercept = mean), linetype = "dashed", color = "red") +
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0),
breaks = sort(c(seq(0,10,1), mean)),
labels = sort(c(0L:10L, round(mean, digits = 2))))
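The labels argument also accepts a function of the break values, which avoids building the label vector by hand. A minimal sketch under that assumption: the helper mixed_precision is made up for illustration, and the mean is stored as x_mean rather than mean so that base::mean() is not masked.

```r
library(ggplot2)

set.seed(1235)
df <- data.frame(x = rexp(1000))
x_mean <- mean(df$x)  # named x_mean here to avoid masking base::mean()

# Hypothetical helper: whole numbers get no decimals, anything else gets two
mixed_precision <- function(b) {
  ifelse(b %% 1 == 0, sprintf("%.0f", b), sprintf("%.2f", b))
}

ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.05, color = "black", fill = "transparent") +
  geom_vline(xintercept = x_mean, linetype = "dashed", color = "red") +
  theme_bw() +
  scale_x_continuous(name = "Values", expand = c(0, 0),
                     breaks = sort(c(0:10, x_mean)),
                     labels = mixed_precision)
```

The mean label still sits close to the label for 1, so the overlap mentioned above has to be handled separately.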
