Plotting discrete predictions with probability intervals - ggplot2 - r

I need to plot some discrete predictions with probability intervals in ggplot2, but I'm having some problems.
I have the following data.frame
city pred min.80 max.80
BH 100 50 150
RJ 120 80 140
SP 90 80 100
I want a plot with the cities on y-axis and the predicted values on x-axis. For each discrete value of y, there should be a horizontal bar with its range being the min.80 and max.80 values. My idea is to use geom_rect from ggplot2 for doing it.
I've tried the following code, but the problem is that I'm converting the discrete variable to continuous in order to plot it, and I lose their values on the label.
> ggplot(df) + geom_rect(aes(xmin=min.80, xmax=max.80, ymin=as.numeric(city)-0.4,
+ ymax=as.numeric(city)+0.4))
Is there another way to do it?

I suggest you use the geom pointrange or crossbar:
ggplot(df, aes(x=city)) +
geom_pointrange(aes(ymin=min.80, ymax=max.80, y=pred)) +
coord_flip()
ggplot(df, aes(x=city)) +
geom_crossbar(aes(ymin=min.80, ymax=max.80, y=pred)) +
coord_flip()

I think you want to keep the y axis as a factor (y=city). This kind of data (estimate plus interval) is probably better shown with something like geom_pointrange. After all, the "height" of the rectangle doesn't have an interpretation.
If you need the error bars to be horizontal, I've done this before in two ways:
using coord_flip()
recreating the geom_pointrange() functionality by combining geom_hline() with geom_point(), because coord_flip() was a bit limited the last time I tried it.
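As a minimal sketch of a third route, assuming ggplot2 3.3.0 or later (where range geoms accept xmin/xmax directly), the interval can be mapped along x so that city stays discrete on the y axis and coord_flip() is not needed. The data frame below simply retypes the table from the question.

```r
library(ggplot2)

# Data retyped from the question
df <- data.frame(
  city   = c("BH", "RJ", "SP"),
  pred   = c(100, 120, 90),
  min.80 = c(50, 80, 80),
  max.80 = c(150, 140, 100)
)

# city stays a discrete y variable; the 80% interval is drawn along x
p <- ggplot(df, aes(y = city, x = pred, xmin = min.80, xmax = max.80)) +
  geom_pointrange()
print(p)
```

Because city is never converted to a number, the axis keeps the city labels.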

Related

Setting a fixed color scale for a series of data in ggplot2

I've been searching for a while, and I've found a number of answers for problems similar to mine, but not quite working when I try to implement them.
I'm trying to make a series of radar plots for different observations of performance. The data has been normalized so that the mean is 0 and the standard deviation is 1, and the y-axis on the plot has been set from -3 to 3 to make it easy to compare visually how well the subjects performed, with more extreme observations being worse. I would like to add colors associated with that scale, preferably such that -1 to 1 is green, the bands between +/- 1-2 are yellow, and +/- 2-3 are red. All the examples I've been able to find relating to color fills are based directly on the data or on factors rather than a fixed scale, and anything I try doesn't seem to display correctly. I'm not even sure whether ggplot is able to set a color scale in the way I'm looking for...
Here's the toy data I've been working with while working out the plotting (after reshaping):
variable <- c("time", "distance", "turns")
value <- c(0.9536197, 0.5842319, -2.1814528)
df <- data.frame(variable, value)
and here's my most recent attempt as far as ggplot code goes (using ggiraphExtra):
ggplot(df, aes(x=variable, y=value, group=1)) + geom_point() + geom_polygon() +
ggiraphExtra:::coord_radar() + ylim(-3,3) +
scale_fill_gradient(low="red", high="green")
and this is the output:
[Image: radar plot with solid green geom_polygon fill]

Making line plot with discrete x-axis in ggplot2

I am building a ggplot2 figure with a facet grid. On my Y-axis are percentages, and my X-axis is the concentration (in numbers). Each facet has 3 groups (0, 24 and 48 hours)
ggplot(data=MasterTable, aes(x=Concentration, y=Percentage, group=Time)) +
geom_point() +
geom_line() +
facet_grid(Chemicals ~ Treatments)
This generates a continuous x-axis. Since the values are not evenly spread out, I would prefer a discrete axis to better visualize my data. I followed the following tutorial with no luck. The first figure is exactly what I am trying to do.
I also tried formatting the axis:
scale_x_discrete(labels("0", "0.1", "2", "50"))
and formatting the line:
geom_line(aes(Time))
and following this tutorial.
I think the problem is that the values on the x-axis are numbers rather than strings, which makes the default axis continuous. How can I change this? I am sure the solution is simple, I just can't figure it out.
Thanks in advance!
On this page they make the following modification: df2$dose <- as.factor(df2$dose). You can apply the same change to your x variable with MasterTable$Concentration <- as.factor(MasterTable$Concentration)
or like this:
ggplot(data=MasterTable, aes(x=factor(Concentration), y=Percentage, group=Time)) +
geom_point() +
geom_line() +
facet_grid(Chemicals ~ Treatments)
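A small self-contained sketch of the factor conversion follows. The MasterTable here is a hypothetical stand-in with invented values (the original data was not posted), and the facets are left out to keep it short.

```r
library(ggplot2)

# Hypothetical stand-in for MasterTable; all values are invented for illustration
MasterTable <- data.frame(
  Concentration = rep(c(0, 0.1, 2, 50), times = 3),
  Time          = rep(c(0, 24, 48), each = 4),
  Percentage    = c(10, 20, 35, 60, 12, 25, 40, 70, 15, 30, 50, 80)
)

# A factor gets one evenly spaced axis position per distinct value
MasterTable$Concentration <- factor(MasterTable$Concentration)
levels(MasterTable$Concentration)

# group = Time is still needed: with a discrete x, geom_line() would
# otherwise treat each x level as its own group and draw no lines
p <- ggplot(MasterTable, aes(x = Concentration, y = Percentage, group = Time)) +
  geom_point() +
  geom_line()
print(p)
```

The levels are sorted numerically before being turned into labels, so the axis order stays 0, 0.1, 2, 50.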

Drawing flipped Normal distribution in R without using coord_flip()

Good day
Without using coord_flip(), is there a way to draw a flipped normal distribution by exchanging the positions of x and y in aes()?
I've tried the following.
df3 <- data.frame(x=seq(-6,6,by=0.1), y=dnorm(seq(-6,6,by=0.1)))
ggplot(df3,aes(y,x))+ geom_line() # x,y position exchanged
I'm not sure what's wrong with coord_flip, but you can avoid it with geom_path. geom_path connects the points in the order they appear in the data, rather than in order of the magnitude of the x-value. So you just need to make sure the data are ordered by y-axis value (which they already are here).
ggplot(df3, aes(y,x)) +
geom_path() +
theme_classic()
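Another possibility, sketched here on the assumption that ggplot2 3.3.0 or later is installed, is the orientation argument of geom_line(), which makes the line follow the y aesthetic instead of x, so the row order of the data no longer matters.

```r
library(ggplot2)

x <- seq(-6, 6, by = 0.1)
df3 <- data.frame(x = x, y = dnorm(x))

# orientation = "y" connects the points in order of the y aesthetic
p <- ggplot(df3, aes(x = y, y = x)) +
  geom_line(orientation = "y") +
  theme_classic()
print(p)
```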

Plotting percent change for a large number of factors on same figure using ggplot by faceting or color-coding factors

Here is an example of the code I'm working with
x<-as.factor(rep(c("tree_mean","tree_qmean","tree_skew"),3))
factor<-c(rep("mfn2_burned_99",3),rep("mfna_burned_5_7",3),rep("mfna_burned_5_7_10_12",3))
y<-c(0.336457409,-0.347422910,-0.318945621,1.494109367, 0.003578698,-0.019985780,-0.484171146, 0.611589217,-0.322292664)
dat<-data.frame(x,factor,y) # data.frame() keeps y numeric; cbind() would coerce every column to character
head(dat)
x factor y
tree_mean mfn2_burned_99 -0.3364574
tree_qmean mfn2_burned_99 -0.3474229
tree_skew mfn2_burned_99 -0.3189456
tree_mean mfna_burned_5_7 -0.8269814
tree_qmean mfna_burned_5_7 -0.8088810
tree_skew mfna_burned_5_7 -2.5429226
tree_mean mfna_burned_5_7_10_12 -0.8601206
tree_qmean mfna_burned_5_7_10_12 -0.8474920
tree_skew mfna_burned_5_7_10_12 -2.9854178
I am trying to plot how much x deviates from 0, and facet it by each factor, like so:
ggplot(dat) +
geom_point(aes(x=x,y=y),shape=1,size=3)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_hline(yintercept=0)+
facet_grid(factor~.)
This works fine when I have three factors (ignore the *: I had a significance column which I have since removed).
Example below:
However, I have 8 factors in total, and faceting obscures the plot such that the distance from zero for each x value gets very distorted.
Example below
So, my question is this: given my large number of x values and factors, what would be a better way of coding and rendering this plot in ggplot, using either faceting or color-coding by factor?
I would be very open to color-coding each distance for x by factor rather than faceting, but I have been beating my head against the wall trying to figure out how to even do that in ggplot (very new to ggplot), so I can't yet say if it would make the figure much more interpretable.
One option as you note is to color your point and/or linerange by a factor. You can then use position_dodge to move the points slightly on the x axis.
For example:
ggplot(dat, aes(color = factor)) +
geom_point(aes(x=x,y=y),shape=1,size=3, position = position_dodge(width = 0.5))+
geom_linerange(aes(x=x,ymin=0,ymax=y), position = position_dodge(width =0.5))+
geom_hline(yintercept=0)
I think this would still be difficult with many factors, but with 8 it might suit your purposes.

5 dimensional plot in r

I am trying to make a 5 dimensional plot in R. I am currently using the rgl package to plot my data in 4 dimensions, using 3 variables as the x, y, z coordinates and another variable as the color. I am wondering if I can add a fifth variable using this package, for example as the size or the shape of the points in the space. Here's an example of my data, and my current code:
set.seed(1)
df <- data.frame(replicate(4,sample(1:200,1000,rep=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,rep=TRUE)))
df <- cbind(df,addme)
colnames(df) <- c("var1","var2","var3","var4","var5")
require(rgl)
plot3d(df$var1, df$var2, df$var3, col=as.numeric(df$var4), size=0.5, type='s',xlab="var1",ylab="var2",zlab="var3")
I hope it is possible to do the 5th dimension.
Many thanks,
Here is a ggplot2 option. I usually shy away from 3D plots as they are hard to interpret properly. I also almost never put in 5 continuous variables in the same plot as I have here...
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12))
While this is a bit messy, you can actually reasonably read all 5 dimensions for most points.
A better approach to multi-dimensional plotting opens up if some of your variables are categorical. If all your variables are continuous, you can turn some of them to categorical with cut and then use facet_wrap or facet_grid to plot those.
For example, here I break up var3 and var4 into quintiles and use facet_grid on them. Note that I also keep the color aesthetics as well to highlight that most of the time turning a continuous variable to categorical in high dimensional plots is good enough to get the key points across (here you'll notice that the fill and border colors are pretty uniform within any given grid cell):
df$var4.cat <- cut(df$var4, quantile(df$var4, (0:5)/5), include.lowest=TRUE)
df$var3.cat <- cut(df$var3, quantile(df$var3, (0:5)/5), include.lowest=TRUE)
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12)) +
facet_grid(var3.cat ~ var4.cat)
