How to plot a different target line for different facet charts [duplicate]

Using the Iris data set as an example, I can produce a ggplot with facet.
The code is:
library(ggplot2)
data(iris)
y=iris
y$Petal.Width.Range=factor(ifelse(y$Petal.Width<1.3,"Narrow","Wide"))
y$Petal.Length.Range=factor(ifelse(y$Petal.Length<4.35,"Short","Long"))
ggplot(y, aes(Sepal.Length,Sepal.Width)) +
geom_point(alpha=0.5)+
geom_hline(yintercept =3 ,alpha=0.3)+
facet_grid(Petal.Width.Range ~ Petal.Length.Range)
Here I have a horizontal spec of 3 in each of the 4 cases. What should I do if I want a case dependent spec please? For example, I can define 4 different specs as the following:
y$threshold=2
y$threshold[(y$Petal.Width.Range=="Narrow")&(y$Petal.Length.Range=="Short")] =2
y$threshold[(y$Petal.Width.Range=="Narrow")&(y$Petal.Length.Range=="Long")] =2.5
y$threshold[(y$Petal.Width.Range=="Wide")&(y$Petal.Length.Range=="Short")] =3.1
y$threshold[(y$Petal.Width.Range=="Wide")&(y$Petal.Length.Range=="Long")] =4
How should I add y$threshold into the ggplot commands please?

One easy solution is just to change your hline call to this: geom_hline(aes(yintercept=threshold), alpha=0.3) +.
The problem is, that would draw 150 lines on your plot (150 being the number of rows in the y data.frame). Maybe that's ok with you, because the lines would mostly be stacked on top of each other and you would really only see four lines, in their correct locations.
However, here is another solution where I create a smaller auxiliary data.frame. This is a common approach in ggplot2. Notice how the new data.frame is specified as the data source inside the geom_hline call.
hline_dat = data.frame(Petal.Width.Range=c("Narrow", "Narrow", "Wide", "Wide"),
Petal.Length.Range=c("Short", "Long", "Short", "Long"),
threshold=c(2, 2.5, 3.1, 4))
p = ggplot(y, aes(Sepal.Length,Sepal.Width)) +
geom_point(alpha=0.5) +
geom_hline(data=hline_dat, aes(yintercept=threshold), colour="salmon") +
facet_grid(Petal.Width.Range ~ Petal.Length.Range)
ggsave("plot.png", plot=p, height=4, width=6)
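As a rough sketch of the two approaches above: instead of typing the auxiliary data.frame by hand, it can be derived from the threshold column already stored in y. The use of unique() and layer_data() here is my own addition, not part of the original answer, and it assumes there is exactly one threshold per facet. layer_data() is only used to count how many lines each version draws.

```r
library(ggplot2)

# Rebuild the data from the question
y <- iris
y$Petal.Width.Range  <- factor(ifelse(y$Petal.Width  < 1.3,  "Narrow", "Wide"))
y$Petal.Length.Range <- factor(ifelse(y$Petal.Length < 4.35, "Short",  "Long"))
y$threshold <- 2
y$threshold[y$Petal.Width.Range == "Narrow" & y$Petal.Length.Range == "Long"]  <- 2.5
y$threshold[y$Petal.Width.Range == "Wide"   & y$Petal.Length.Range == "Short"] <- 3.1
y$threshold[y$Petal.Width.Range == "Wide"   & y$Petal.Length.Range == "Long"]  <- 4

# Assumption: one threshold per facet, so de-duplicating leaves one row per panel
hline_dat <- unique(y[, c("Petal.Width.Range", "Petal.Length.Range", "threshold")])

base <- ggplot(y, aes(Sepal.Length, Sepal.Width)) +
  geom_point(alpha = 0.5) +
  facet_grid(Petal.Width.Range ~ Petal.Length.Range)

# Version 1: hline layer inherits the full data set
p_all <- base + geom_hline(aes(yintercept = threshold), alpha = 0.3)

# Version 2: hline layer uses the small auxiliary data.frame
p_small <- base +
  geom_hline(data = hline_dat, aes(yintercept = threshold), colour = "salmon")

# Number of lines drawn by the hline layer (layer 2) in each version
n_all   <- nrow(layer_data(p_all, 2))
n_small <- nrow(layer_data(p_small, 2))
```

Comparing n_all with n_small shows how many overlapping lines the first version draws relative to the second.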

Related

How does one control the appearance (e.g. line size, line type, colour) of mqgam plots produced using plot.mgamViz from the "mgcViz" package?

I am using quantile regression in R with the qgam package and visualising them using the mgcViz package, but I am struggling to understand how to control the appearance of the plots. The package effectively turns gams (in my case mqgams) into ggplots.
Simple reprex:
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
plot.mgamViz(getViz(egfit))
I am able to control things that can be added, for example the axis labels and theme of the plot, but I'm struggling to affect things that would normally be addressed in the aes() or geom_x() functions.
How would I control the thickness of the line? If this were a normal geom_smooth() or geom_line() I'd simply put size = 1 inside of the geoms, but I cannot see how I'd do so here.
How can I control the linetype of these lines? The "id" is continuous and one cannot supply a linetype to a continuous scale. If this were a normal plot I would convert "id" to a character, but I can't see a way of doing so with the plot.mgamViz function.
How can I supply a new colour scale? It seems as though if I provide it with a new colour scale it invents new ID values to put on the legend that don't correlate to the actual "id" values, e.g.
plot.mgamViz(getViz(egfit)) + scale_colour_viridis_c()
I fully expect this to be relatively simple and that I'm missing something obvious, and I imagine the answers to all three of these subquestions are very similar to one another. Thanks in advance.
You need to extract your ggplot element using this:
p1 <- plot.mgamViz(getViz(egfit))
p <- p1$plots [[1]]$ggObj
Then, id should be as.factor:
p$data$id <- as.factor(p$data$id)
Now you can play with ggplot elements as you prefer:
library(mgcViz)
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
p1 <- plot.mgamViz(getViz(egfit))
# Taking gg infos and convert id to factor
p <- p1$plots [[1]]$ggObj
p$data$id <- as.factor(p$data$id)
# Changing ggplot attributes
p <- p +
geom_line(linetype = 3, size = 1)+
scale_color_brewer(palette = "Set1")+
labs(x="Petal Length", y="s(Petal Length)", color = "My ID labels:")+
theme_classic(14)+
theme(legend.position = "bottom")
p
Here is the generated plot:
Hope it is useful!
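To see why converting id to a factor matters, here is a minimal sketch in plain ggplot2, without mgcViz. The data frame curves is a made-up stand-in for the data behind the mgcViz plot, and linewidth assumes ggplot2 3.4.0 or later (older versions use size, as in the answer above). Once id is a factor, discrete scales such as linetype and the Brewer palettes can be applied to it.

```r
library(ggplot2)

# Hypothetical stand-in: one curve per quantile id (not the real mgcViz data)
curves <- expand.grid(x = seq(1, 7, length.out = 50), id = c(0.25, 0.5, 0.75))
curves$y <- sin(curves$x) + curves$id

# A continuous id cannot be mapped to linetype; a factor can
curves$id <- as.factor(curves$id)

p <- ggplot(curves, aes(x, y, colour = id, linetype = id)) +
  geom_line(linewidth = 1) +
  scale_colour_brewer(palette = "Set1") +
  labs(colour = "Quantile", linetype = "Quantile")
```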

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes; ggplot will mess up.
Anyway, I think your plot would improve if you used a geom_hline for the benchmark, instead of hacking in a single-value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")

Keep same scale in different graphs ggplot2

I want to create 3 graphs in ggplot2 as follows:
ggplot(observbest,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observmedium,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observweak,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
That is, three graphs displaying the same thing but for a different dataset each time. I want to compare them, so I want their y axes fixed to the same scale with the same margins on all graphs, which currently doesn't happen automatically.
Any suggestion?
Thanks
It sounds like a facet_wrap on all the observations, combined into a single dataframe, might be what you're looking for. E.g.
library(plyr)
library(ggplot2)
observ <- rbind(
mutate(observbest, category = "best"),
mutate(observmedium, category = "medium"),
mutate(observweak, category = "weak")
)
qplot(iteration, bottles, data = observ, geom = "line") + facet_wrap(~category)
Add + ylim(min_value,max_value) to each graph.
Another option would be to merge the three datasets with an id variable identifying which value is in which dataset, and then plot the three of them together, differentiating them by linetype for instance.
Use scale_y_continuous to define the y axis for each graph and make them all easily comparable.
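A minimal sketch of the ylim / scale_y_continuous suggestions above, assuming the three data frames share the columns iteration, Team and bottles. The data generated by make_obs() is made up, and make_obs() and plot_one() are hypothetical helper names. The idea is to compute one y range over all three data sets and pass it to every plot.

```r
library(ggplot2)

# Hypothetical stand-ins for observbest, observmedium and observweak
set.seed(42)
make_obs <- function(shift) {
  data.frame(
    iteration = rep(1:5, times = 3),
    Team      = rep(1:3, each = 5),
    bottles   = shift + rnorm(15)
  )
}
observbest   <- make_obs(20)
observmedium <- make_obs(10)
observweak   <- make_obs(0)

# One common y range computed over all three data sets
y_limits <- range(observbest$bottles, observmedium$bottles, observweak$bottles)

# Same plot recipe for each data set, with the shared y limits
plot_one <- function(dat) {
  ggplot(dat, aes(x = factor(iteration), y = bottles, colour = Team, group = Team)) +
    geom_line() +
    scale_colour_gradientn(colours = rainbow(16)) +
    scale_y_continuous(limits = y_limits)
}

plots <- lapply(
  list(best = observbest, medium = observmedium, weak = observweak),
  plot_one
)
```

Printing plots$best, plots$medium and plots$weak then gives three graphs with identical y axes.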

How to control ylim for a faceted plot with different scales in ggplot2?

In the following example, how do I set separate ylims for each of my facets?
qplot(x, value, data=df, geom=c("smooth")) + facet_grid(variable ~ ., scale="free_y")
In each of the facets, the y-axis takes a different range of values and I would like to set different ylims for each of the facets.
The default ylims are too wide for the trend that I want to see.
This was brought up on the ggplot2 mailing list a short while ago. What you are asking for is currently not possible but I think it is in progress.
As far as I know this has not been implemented in ggplot2 yet. However, a workaround that will give you ylims exceeding what ggplot provides automatically is to add "artificial data". To reduce the ylims, simply remove the data you don't want to plot (see the end for an example).
Here is an example:
Let's just set up some dummy data that you want to plot
df <- data.frame(x=rep(seq(1,2,.1),4),f1=factor(rep(c("a","b"),each=22)),f2=factor(rep(c("x","y"),22)))
df <- within(df,y <- x^2)
Which we could plot using line graphs
p <- ggplot(df,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")
print(p)
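Assume we want to let y start at -10 in the first row and at 0 in the second row, so we add a point at (0,-10) to the upper left plot and at (0,0) to the lower right plot (with scales="free_y" all panels in a row share one y scale, so one point per row is enough):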
ylim <- data.frame(x=rep(0,2),y=c(-10,0),f1=factor(c("a","b")),f2=factor(c("x","y")))
dfy <- rbind(df,ylim)
Now by limiting the x-scale between 1 and 2 those added points are not plotted (a warning is given):
p <- ggplot(dfy,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
Same would work for extending the margin above by adding points with higher y values at x values that lie outside the range of xlim.
This will not work if you want to reduce the ylim; in that case, subsetting your data is a solution. For example, to limit the upper row to between -10 and 1.5 you could use the following (current versions of ggplot2 no longer accept a subset= argument in a layer, so the subset is passed through data= instead):
p <- ggplot(dfy,aes(x,y))+geom_line(data=subset(dfy, y < 1.5 | f1 != "a"))+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
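A variant of the artificial-data workaround, sketched here as an alternative rather than as part of the original answer: the extra points can be kept in a separate small data.frame and added with geom_blank(), which draws nothing but still extends the y scale. The name lower_bounds is hypothetical. Because the points are never drawn, no xlim trick is needed and no warning is raised.

```r
library(ggplot2)

# Same dummy data as in the example above
df <- data.frame(
  x  = rep(seq(1, 2, 0.1), 4),
  f1 = factor(rep(c("a", "b"), each = 22)),
  f2 = factor(rep(c("x", "y"), 22))
)
df$y <- df$x^2

# Hypothetical helper table: the lower y bound wanted for each facet row.
# It has no f2 column, so facet_grid repeats it in every column.
lower_bounds <- data.frame(f1 = factor(c("a", "b")), y = c(-10, 0))

p <- ggplot(df, aes(x, y)) +
  geom_line() +
  # geom_blank() draws nothing but still trains the y scale of its panels
  geom_blank(data = lower_bounds, aes(y = y), inherit.aes = FALSE) +
  facet_grid(f1 ~ f2, scales = "free_y")
```

As with the original workaround, this can only widen the limits, not narrow them.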
There are actually two packages that solve that problem now:
https://github.com/zeehio/facetscales, and https://cran.r-project.org/package=ggh4x.
I would recommend using ggh4x because it has very useful tools, such as nested facets (two variables defining the rows or columns), setting the x- and y-axis scales individually for each facet, and multiple fill and colour scales.
For your problems the solution would be like this:
library(ggh4x)
scales <- list(
# Here you have to specify all the scales, one for each facet row in your case
scale_y_continuous(limits = c(2, 10)),
scale_y_continuous(breaks = c(3, 4))
)
qplot(x, value, data=df, geom=c("smooth")) +
facet_grid(variable ~ ., scale="free_y") +
facetted_pos_scales(y = scales)
I have one example of function facet_wrap
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(vars(class), scales = "free",
nrow=2,ncol=4)
The code above generates the plot (the image was not included in the original post).
