I have the following data frame:
observed <- c("1000","2000","3000","4000")
simulated <- c("1100","2100","3100","4100")
error <- c("-1","-2","-0.5","-4")
Date <- c("2013-01-01","2013-01-02","2013-01-03","2013-01-04")
y <- data.frame(Date,observed,simulated,error)
y[-1] <- sapply(y[-1], as.character)
y[-1] <- sapply(y[-1], as.numeric)
y$Date <- as.Date(y$Date, format="%Y-%m-%d")
It compares observed and simulated daily river discharges on the left y axis and shows the corresponding error in percent on the right y axis (note that the percentages are just an example here and are not calculated correctly).
I would like to plot all three in one graph with the percentage error plotted on the secondary y axis. I used the following code:
p<-ggplot(y, aes(x=Date))
p<-p + geom_line(aes(y=observed, colour = "observed"), size=1.5)
p<-p + geom_line(aes(y=simulated, colour = "simulated"), size=1.5)
p<-p + geom_line(aes(y=error*-500, colour="red"), size=1.5)
p<-p + scale_colour_manual(name="Discharge [m3/sec]", labels=c("observed","simulated","error"), values = c("blue", "black","red"))
p <- p + scale_y_continuous(sec.axis = sec_axis(~./-500,name = "Error [%]"))
p <- p + labs(y=expression(paste('Q [',m^3~s^-1,']'),
colour = "Parameter"))
p <- p + theme(legend.position = c(0.2, 0.87), legend.title=element_blank(),axis.title.x=element_blank())
My problem is that the secondary y axis runs from -8 at the top down to 0 at the bottom. What I would like is for the secondary y axis's zero to be at the top and -8 to be at the bottom, where the zero of the primary (left) y axis is.
The reason your secondary axis looks like that is that this is how you transformed your data. Since you multiplied your error by -500 in your third geom_line, the more negative the error gets (i.e., the closer to -8), the higher the line goes. Therefore, for the secondary axis to map correctly to your data, it must be upside down (with -8 at the top).
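A quick numeric check of this argument, using the error values from the question: the sign of the scaling factor decides whether more negative errors are drawn higher or lower, and sec_axis has to apply the inverse of whichever factor was used.

```r
# Error values from the question's data frame (already numeric)
error <- c(-1, -2, -0.5, -4)

# Heights on the primary axis with the question's negative factor:
# a more negative error gives a larger value, i.e. a higher line
primary_neg <- error * -500

# Heights with a positive factor: a more negative error gives a lower line
primary_pos <- error * 500

print(primary_neg)
print(primary_pos)

# sec_axis() must divide by the same factor to recover the original errors
stopifnot(all.equal(primary_neg / -500, error),
          all.equal(primary_pos / 500, error))
```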
If you want 0 to be at the top, multiply your error by positive 500 and divide by positive 500 in the sec_axis transformation formula:
library(ggplot2)

ggplot(y, aes(x=Date)) +
  geom_line(aes(y=observed, colour = "observed"), size=1.5) +
  geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
  geom_line(aes(y=error*500, colour = "error"), size=1.5) +
  scale_colour_manual(name="Discharge [m3/sec]",
                      values = c('observed' = "blue",
                                 'simulated' = "black",
                                 'error' = "red")) +
  scale_y_continuous(sec.axis = sec_axis(~./500, name = "Error [%]",
                                         breaks = c(0, -2, -4, -6, -8))) +
  # close expression() first so that colour is an argument of labs()
  labs(y=expression(paste('Q [',m^3~s^-1,']')),
       colour = "Parameter") +
  theme(legend.position = c(0.2, 0.87),
        legend.title=element_blank(),
        axis.title.x=element_blank())
And if you want the two plots to overlap, you can manually add 8 to your error to move it up, and then subtract 8 in the sec_axis formula to keep the axis labels correct:
ggplot(y, aes(x=Date)) +
  geom_line(aes(y=observed, colour = "observed"), size=1.5) +
  geom_line(aes(y=simulated, colour = "simulated"), size=1.5) +
  geom_line(aes(y=(8 + error) * 500, colour = "error"), size=1.5) +
  scale_colour_manual(name="Discharge [m3/sec]",
                      values = c('observed' = "blue",
                                 'simulated' = "black",
                                 'error' = "red")) +
  scale_y_continuous(sec.axis = sec_axis(~(. / 500) - 8, name = "Error [%]",
                                         breaks = c(0, -2, -4, -6, -8))) +
  # close expression() first so that colour is an argument of labs()
  labs(y=expression(paste('Q [',m^3~s^-1,']')),
       colour = "Parameter") +
  theme(legend.position = c(0.2, 0.87),
        legend.title=element_blank(),
        axis.title.x=element_blank())
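To confirm that the shifted transformation and its inverse line up, the two mappings can be written as plain R functions. The names to_primary and to_secondary are illustrative only; ggplot2 just needs the expressions inside geom_line() and sec_axis().

```r
# Forward map used in geom_line(): shift the error up by 8, then scale by 500
to_primary <- function(err) (8 + err) * 500

# Inverse map used in sec_axis(): undo the scaling, then undo the shift
to_secondary <- function(q) q / 500 - 8

err_breaks <- c(0, -2, -4, -6, -8)

# -8 % error lands on Q = 0, and 0 % error lands on Q = 4000
print(to_primary(err_breaks))

# Round trip: the secondary axis labels reproduce the original error values
stopifnot(all.equal(to_secondary(to_primary(err_breaks)), err_breaks))
```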
Additional tips:
You can chain multiple ggplot functions with the + operator, as I do above, instead of saving the intermediate result to a variable each time as you do in your example.
The correct way to use scale_color_manual is to pass a named vector to values. This ensures that each value (e.g. observed) is always associated with the intended color (e.g. blue).
If you want the error line to be smaller and less dominant, just reduce the transformation factor. If you multiply (in geom_line) and divide (in sec_axis) it by 100 instead of 500 you get a much flatter line. You'll have to play around with the number to get it to look like what you want. In ggplot2, the secondary axis must be a transformation of the primary axis, so you can't just pass in its own limits= argument.
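Rather than tuning the factor purely by trial and error, one option is to derive a starting value from the data. This is a sketch under the assumption that you want the largest absolute error to span the same height as the largest discharge; scale_factor is a hypothetical variable name, not a ggplot2 setting, and the data frame below just restates the question's values as numbers.

```r
library(ggplot2)

# The question's data, already converted to Date and numeric columns
y <- data.frame(
  Date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")),
  observed = c(1000, 2000, 3000, 4000),
  simulated = c(1100, 2100, 3100, 4100),
  error = c(-1, -2, -0.5, -4)
)

# Hypothetical starting value: largest discharge divided by largest |error|
scale_factor <- max(y$observed, y$simulated) / max(abs(y$error))

p <- ggplot(y, aes(x = Date)) +
  geom_line(aes(y = observed, colour = "observed")) +
  geom_line(aes(y = simulated, colour = "simulated")) +
  # multiply here ...
  geom_line(aes(y = error * scale_factor, colour = "error")) +
  scale_colour_manual(values = c(observed = "blue",
                                 simulated = "black",
                                 error = "red")) +
  # ... and divide by the same factor here, so the labels show the raw error
  scale_y_continuous(sec.axis = sec_axis(~ . / scale_factor, name = "Error [%]"))
```

Print p to display the plot, and shrink scale_factor if the error line still dominates.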
I'm trying to make a plot that consists of two main parts: the "background" is the shape of a US state, and on top of it I'm adding measurement points (using latitude and longitude coordinates) that I want colored according to the value of the measurement (the data comes from a data frame). I'm having a hard time changing the color of the points and customizing the legend bar. I would like the bar to also show the maximum and minimum values and to use a color scale that is more visually appealing.
m = map_data('state', region = state)
finalplot <- ggplot() +
geom_polygon( data=m, aes(x=long, y=lat), colour="black", fill="white" ) +
geom_point(data=filteredtable,aes(x=LongitudeMeasure,y=LatitudeMeasure, colour = Result)) +
ggtitle(paste0("Measurement points of ", contaminant, " in ", state)) +
theme_void()
When I add something like + scale_color_grey(start = 0.8, end = 0.2), I get the following error: Continuous value supplied to discrete scale
If you have any other ideas about the best approach to this type of plot, I would appreciate them.
I think this is a good example of why it's better to post some data in your question as well as showing us your code. However, it's possible to create some data so that your exact plotting code produces a reasonable output:
library(ggplot2)  # map_data() below also needs the maps package installed
set.seed(69)
filteredtable <- data.frame(LongitudeMeasure = runif(100, -81.5, -80.5),
LatitudeMeasure = runif(100, 26, 28),
Result = runif(100))
state <- "Florida"
contaminant <- "Dilithium"
Now let's try your plotting code:
m = map_data('state', region = state)
finalplot <- ggplot() +
geom_polygon( data=m, aes(x=long, y=lat), colour="black", fill="white" ) +
geom_point(data=filteredtable,aes(x=LongitudeMeasure,y=LatitudeMeasure, colour = Result)) +
ggtitle(paste0("Measurement points of ", contaminant, " in ", state)) +
theme_void()
So our plot looks like this:
finalplot
But if we try to add the grayscale that you wanted, we get the same error:
finalplot + scale_color_grey(start = 0.8, end = 0.2)
#> Error: Continuous value supplied to discrete scale
The reason for this is that scale_color_grey produces a discrete gray color scale, but you want a continuous color scale, since you have a continuous variable for Result. You probably wanted scale_color_gradient or scale_color_gradientn. Let's try scale_color_gradient with a grayscale palette and set our breaks to 0.1 increments so we get the labels we want on the bar:
finalplot + scale_color_gradient(low = "gray20", high = "gray80", breaks = seq(0, 1, 0.1))
Or if we want something more colorful:
finalplot +
scale_color_gradientn(colours = c("red", "gold", "forestgreen"), breaks = seq(0, 1, 0.1))
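The question also asked for the bar to show the minimum and maximum values. One way to do that is to set the scale limits to the data range and include both ends in the breaks. This sketch reuses the simulated filteredtable from above but leaves out the state polygon so it runs without the maps package; scale_color_viridis_c() is just one swapped-in palette choice, and the interior breaks at 0.2 steps assume values between 0 and 1 as in the simulated data.

```r
library(ggplot2)

set.seed(69)
filteredtable <- data.frame(LongitudeMeasure = runif(100, -81.5, -80.5),
                            LatitudeMeasure = runif(100, 26, 28),
                            Result = runif(100))

# Observed minimum and maximum of the measurement
rng <- range(filteredtable$Result)

# Interior breaks plus both extremes, sorted so the bar reads bottom to top
bar_breaks <- sort(c(rng, seq(0.2, 0.8, by = 0.2)))

p <- ggplot(filteredtable,
            aes(x = LongitudeMeasure, y = LatitudeMeasure, colour = Result)) +
  geom_point() +
  # limits = rng makes the bar end exactly at the data min and max
  scale_color_viridis_c(limits = rng, breaks = bar_breaks,
                        labels = function(b) sprintf("%.2f", b)) +
  theme_void()
```

If the minimum or maximum falls close to one of the interior breaks, their labels will collide; dropping that interior break is the simplest fix.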
I am pretty sure this is easy to do, but I can't find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 that uses geom_jitter(), effectively creating one row for each level of a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but simply adding the extra geom_ function to the plot code draws the two layers, the jitter and the violin, on top of each other (as expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and violin geoms interleaved? (i.e. the element A jitter row followed by the element A violin row, then the element B jitter row followed by the element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about treating your factor as a continuous variable and nudging the positions in opposite directions in each aes() call, like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
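The same idea can be applied directly to a factor column like the question's E: convert each level to a numeric slot and place the jitter on one side of it and the violin on the other, which also produces the interleaved order from the side quest. This is a sketch with made-up data; the columns E and C only mirror the question's names, and the spacing of 2 units per level is an arbitrary choice.

```r
library(ggplot2)

set.seed(1)
# Made-up data: a factor E and a value C in [0, 1], mirroring the question's columns
dat <- data.frame(E = factor(rep(c("A", "B", "C"), each = 30)),
                  C = runif(90))

# Give each level its own slot, 2 units apart
dat$slot <- 2 * as.integer(dat$E)

p <- ggplot(dat) +
  # jitter offset to one side of the slot ...
  geom_jitter(aes(x = slot - 0.5, y = C), width = 0.2, height = 0,
              alpha = 0.4, size = 0.5) +
  # ... and the violin to the other side
  geom_violin(aes(x = slot + 0.5, y = C, group = E), width = 0.8) +
  # one labelled tick per level; reversing puts level A (jitter, then violin) at the top
  scale_x_continuous(breaks = 2 * seq_along(levels(dat$E)),
                     labels = levels(dat$E),
                     trans = "reverse") +
  labs(x = "E") +
  coord_flip()
```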
I have been working on a histogram of some data I recently generated and, to make it more readable, I would like to include the confidence intervals, with the interval values marked numerically on the axis ticks.
This has created a small readability problem. Using the code below, you can see that because mean is a floating-point value, all of the tick labels are shown with the same precision as the mean, which produces a large number of trailing zeros (7 in this case). If you manually set mean to something like 3.5, all labels get one trailing zero.
I was wondering whether anyone knows how to set the precision of each tick label manually. Ideally, the marks at 0, 1, 2, ..., 10 would be integers, while the mean would be shown with 2 digits of precision, since I will list a more precise value elsewhere.
require(ggplot2)
set.seed(1235)
df <- data.frame(x=rexp(1000))
mean = mean(df$x)
ggplot(df, aes(x=x)) +
geom_histogram(binwidth = .05, position="dodge", color="black", fill="transparent") +
geom_vline(data=df, aes(xintercept=mean), linetype="dashed", color="red") +
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0), breaks = sort(c(seq(0,10,1), mean)))
You can set the labels parameter of scale_x_continuous. The labels near the mean still overlap, so adjust accordingly or put the mean's label elsewhere, e.g. with geom_text.
ggplot(df, aes(x = x)) +
geom_histogram(binwidth = .05, position = "dodge", color = "black", fill = "transparent") +
geom_vline(xintercept = mean, linetype = "dashed", color = "red") +  # constant outside aes(), so the line is drawn once
theme_bw() +
scale_x_continuous(name="Values", expand = c(0, 0),
breaks = sort(c(seq(0,10,1), mean)),
labels = sort(c(0L:10L, round(mean, digits = 2))))
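Another option is to pass a function to labels, so each break is formatted on its own and the labels can never fall out of step with the breaks. A minimal sketch, assuming whole numbers should print without decimals and everything else with two; label_breaks is a hypothetical helper name, and the mean is stored as x_mean to avoid masking the mean() function.

```r
library(ggplot2)

set.seed(1235)
df <- data.frame(x = rexp(1000))
x_mean <- mean(df$x)

# Whole numbers print as-is; any other break (here, the mean) gets two decimals
label_breaks <- function(b) {
  ifelse(b %% 1 == 0, as.character(b), sprintf("%.2f", b))
}

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = .05, color = "black", fill = "transparent") +
  geom_vline(xintercept = x_mean, linetype = "dashed", color = "red") +
  theme_bw() +
  scale_x_continuous(name = "Values", expand = c(0, 0),
                     breaks = c(0:10, x_mean),
                     labels = label_breaks)
```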
I have data that looks like this:
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and I am producing a plot using this code:
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this:
This is 90% of the way to what I want, but I would like each point to be plotted only on top of the violin for its matching assay type. That is, the jittered positions of the points should be set so that the triangles only ever appear on the upper teal violin and the circles on the lower red violin for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
Alternatively, you can put the interaction between assay and project on the x axis:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be tidied up by using a numeric x axis instead:
# add the interaction as a numeric column
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
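# factor() makes levels() work even when project is a character column (the default since R 4.0)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))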
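The break positions 1.5 and 3.5 follow from how interaction() numbers its levels: by default the first factor (assay) varies fastest, so each project occupies a block of nlevels(assay) consecutive codes. A small sketch that computes the block centres instead of hard-coding them, assuming data built like the question's df (a seed is added here only for reproducibility):

```r
set.seed(1)
df <- data.frame(x = sample(1:5, 100, replace = TRUE),
                 y = rnorm(100),
                 assay = sample(c('a', 'b'), 100, replace = TRUE),
                 project = rep(c('primary', 'secondary'), 50))

# Interaction levels: assay varies fastest within each project
codes <- levels(interaction(df$assay, df$project))
print(codes)

n_assay <- nlevels(factor(df$assay))
projects <- levels(factor(df$project))

# Centre of each project's block of n_assay consecutive codes
label_pos <- (seq_along(projects) - 1) * n_assay + (n_assay + 1) / 2
print(label_pos)
```

These can then be passed as scale_x_continuous(breaks = label_pos, labels = projects).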
I'm trying to overlay two histograms and plot their y axis on a log scale. Some example code:
dat1<- data.frame(
x = rpois(1000, 50),
y = rep("X1", 1000)
)
dat2<- data.frame(
x = rpois(1000, 30),
y = rep("X0", 1000)
)
dat<- rbind(dat1, dat2)
p <- ggplot(dat, aes(x = x, fill = y)) +
geom_histogram(
aes(y=..density..),
breaks= seq( min(dat$x), max(dat$x),(max(dat$x)-min(dat$x))/30 ),
alpha=0.4,
position="identity", lwd=0.2
) +
scale_y_log10() +
scale_fill_manual(values=c("red", "black"), labels=c("X1", "X0"))
print(p)
Without setting scale_y_log10(), I got something like this:
However, after adding scale_y_log10(), the histograms are not filled correctly (see below): the overlap of the two histograms is not filled with color; instead, the empty area is filled. Any ideas how to fix this?
With:
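library(ggplot2)

ggplot(dat) +
  geom_histogram(aes(x = x, y = after_stat(log10(density + 1)), fill = y),
                 alpha = 0.4, position = "identity", lwd = 0.2) +
  # a named vector ties each colour to its group regardless of level order
  scale_fill_manual(values = c(X1 = "red", X0 = "black"))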
you get:
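Some explanation: your plot contains x bins with a count (and therefore a density) of zero, and log10(0) is -Inf, which breaks the log transformation. Adding 1 before taking the log maps those zeros to 0, so they can be included on the log scale. Note that the y axis then shows log10(density + 1) rather than the density itself.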
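The arithmetic behind the +1 trick, shown on a few example density values (illustrative numbers, not taken from the plot):

```r
# Example density values, including an empty bin
d <- c(0, 0.001, 0.01, 0.05)

# On a plain log scale the empty bin becomes -Inf
log10(d)

# After adding 1, the empty bin maps to 0 and every value is finite
log10(d + 1)

stopifnot(is.infinite(log10(0)), all(is.finite(log10(d + 1))))
```

Because densities like these are well below 1, log10(density + 1) squeezes the bars close to 0, so it is worth labeling the axis accordingly.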