overlapping y-scales in facet (scale="free") - r

I've been learning ggplot in the last few weeks. Generally, I'm getting things done (slowly though), but now I'm stuck. I created the following facetted plot: http://dl.dropbox.com/u/7752237/example_bad_y_scales.pdf
Faceting is done by
pl <- pl + facet_wrap(~sci_name,ncol=1,scale="free")
The problem: the numbers on the y-axis don't look good, especially on the scales that run from 0 to 70, where the labels overlap.
I'd like to somehow reduce the number of breaks on the y-axis (to, say, just 1 or 2 breaks). Does anybody have an idea how to do that? Any help would be very much appreciated. :)
PS: I didn't include a minimal example because I think it wouldn't help much to solve that specific problem.
Edit after Kohske's answer:
Hi Kohske,
Wow, that was a really fast answer, thanks! However, I don't think it works well with faceted plots. Look at:
p <- ggplot(mtcars, aes(wt, mpg))
p <- p + geom_point()
p <- p + facet_wrap(~gear,ncol=1,scale="free")
On the y-scale, it gives 3 breaks in the middle plot and 8 breaks in the lower plot… not very consistent (but at least not overlapping as in my example).
p2 <- p + scale_y_continuous(breaks=c(15,30),minor_breaks=c(10,20,25))
isn't really good either: two major ticks on the lower plot, only one on the middle and upper plots.
When having scales with bigger differences than in mtcars, the result would be even less satisfying. Any other ideas? ;)
Edit after Kohske's edit:
Hi, I can't see how to implement this. Searching Google for ggplot and input_break yielded only 10 results, none of which helped.
I tried
p <- ggplot(mtcars, aes(wt, mpg))
p <- p + geom_point()
p <- p + facet_wrap(~gear,ncol=1,scale="free")
p$input_breaks<-function(., range) {
pretty(range, n=3)
}
print(p)
However, I can't see any effects in the graph (tried for n=1, 3, 15). Could you describe how to implement this on the mtcars example? Thanks!

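library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg))
p <- p + geom_point()
dev.new(height=1)   # a very short (1 inch) device, to mimic a cramped facet
print(p)
dev.new(height=1)
# set the major and minor y breaks explicitly
p <- p + scale_y_continuous(breaks=c(15,30),minor_breaks=c(10,20,25))
print(p)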
The trick is scale_y_continuous(), in which you can specify both the major and the minor breaks.
Edit:
You probably cannot specify the breaks separately for each facet.
One workaround is to control the prettiness of the breaks with:
Trans$input_breaks<-function(., range) {
pretty(range, n=3)
}
print(p)
Changing "n=3" yields different prettiness.
Edit again:
Here is a full example:
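The workaround depends on base R's pretty(), which treats n as a desired number of intervals rather than an exact count. A minimal base-R check, using the 0-70 range from the question purely as an example, shows that a smaller n gives fewer, coarser breaks:

```r
# pretty() treats n as a *desired* number of intervals and rounds to
# "nice" values (1, 2 or 5 times a power of ten), so the count can differ from n
y_range <- c(0, 70)  # like the 0-70 y-scales in the question

few  <- pretty(y_range, n = 3)
many <- pretty(y_range, n = 10)

print(few)
print(many)
length(few)   # number of break values for n = 3
length(many)  # number of break values for n = 10
```

The returned breaks always cover the whole range, so the outermost values can lie slightly beyond the data.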
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg))+geom_point()
Trans$input_breaks<-function(., range) {
    pretty(range, n=100)
}
print(p)
In this case you will probably see around a hundred ticks.
Change n=100 to customize the number.
Note that this has a side effect: all plots drawn afterwards have the same number of ticks, and the x and y axes also get the same number of ticks.
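The Trans$input_breaks assignment relies on internals of the very old, proto-based ggplot2 releases and is unlikely to work in current versions. A sketch for current ggplot2 (assuming a recent version is installed): scale_y_continuous() accepts a function as breaks, and with scales = "free" ggplot2 calls it with each panel's own range, so every facet gets a few breaks of its own and no other plot is affected. The helper name few_breaks is hypothetical:

```r
library(ggplot2)

# Hypothetical helper: x is the (per-panel) range that ggplot2 passes in;
# ask pretty() for about 2 intervals within it
few_breaks <- function(x) pretty(x, n = 2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~gear, ncol = 1, scales = "free") +
  scale_y_continuous(breaks = few_breaks)

print(p)
```

Since ggplot2 3.3.0 there is also an n.breaks argument (e.g. scale_y_continuous(n.breaks = 3)), which should likewise apply per panel. Both approaches treat the number as a target, not a guarantee.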

Related

ggplot2 + scatterplot + geom_path

Do you know how to get the curved effect Jake Kaupp achieves on his plot?
It looks like something along the lines of:
ggplot(full_data, aes(y = total_consumption_lbs, x = milk_production_lbs)) +
geom_xspline2(aes(s_open = TRUE, s_shape = 0.5))
Where geom_xspline2() comes from library(ggalt)
But don't take my word for it; here is his source code:
https://github.com/jkaupp/tidytuesdays/blob/master/2019/week5/R/analysis.R
This approach doesn't look quite as nice as your example, but it's a start, and some fiddling may get you the rest of the way.
First, some data to work with:
x <- 1:20
y <- jitter(x,amount=1.5)
df <- data.frame(x,y)
The approach using ggplot2 is to draw a geom_smooth with a very small span (small enough to trigger lots of warnings, as you'll see), and then plot points with white borders over the top of it.
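library(ggplot2)
ggplot(df, aes(x,y)) +
geom_smooth(se=FALSE, colour="black", span=0.15) +  # tiny span: loess follows the points closely
geom_point(fill="black", colour="white", shape=21, size=2.5) +  # shape 21: fill plus a separate border
theme_minimal()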
The downsides: as I noted above, you'll see many warnings about singularities in the loess fit, because the span is so small. Second, not all of the points are centred on the line, which makes sense since the line is a loess fit. Lastly, there doesn't appear to be a way to change the width of the line around the points, so you end up with quite a thin white border.
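A sketch of a different technique from the loess approach above (and not necessarily what Jake Kaupp used): compute an interpolating cubic spline with stats::spline(), which passes through every point, and draw it with geom_path(). This avoids the loess warnings and keeps the points on the line. In more recent ggplot2 versions, the stroke parameter of geom_point() also sets the border width of shape 21. The seed, n = 200 and stroke = 1.5 are arbitrary choices:

```r
library(ggplot2)

# Same kind of toy data as above, seeded so it is reproducible
set.seed(1)
x <- 1:20
y <- jitter(x, amount = 1.5)
df <- data.frame(x, y)

# Interpolating cubic spline through every point, evaluated on a fine grid
spline_df <- as.data.frame(spline(df$x, df$y, n = 200))

p <- ggplot(df, aes(x, y)) +
  geom_path(data = spline_df, colour = "black") +
  # shape 21 has a separate border; `stroke` controls its width
  geom_point(fill = "black", colour = "white", shape = 21,
             size = 2.5, stroke = 1.5) +
  theme_minimal()

print(p)
```

Because the spline must pass through every point, it can overshoot between noisy neighbours; a loess or xspline curve trades that exactness for smoothness.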

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, the results are misaligned towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both plots read the same data source, so I can't figure out what the problem is.
Actually, if all you want is to draw an outline around the areas, you can do that with the colour parameter of geom_ribbon().
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer; I hope it works for you. It looks good, but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want X to be a numeric variable, and you have it as a string. If that is not the case I can change that, but this will transform your bucket into a numeric vector:
# split each label on runs of non-digits; the number ends up as the 2nd piece
bucket.list <- strsplit(as.character(DATA2$bucket), "[^0-9]+")
x <- vapply(bucket.list, function(parts) parts[2], character(1))
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume)) + geom_line(aes(group=model))
It gives me the area and the line tracking each other; I hope that's what you needed.
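The bucket conversion above keeps the second piece of each split label. Since the real bucket labels aren't shown, here is a base-R check with hypothetical labels such as "Q1", assuming each label is a non-digit prefix followed by the number:

```r
# Hypothetical bucket labels: non-digit prefix followed by a number
bucket <- c("Q1", "Q2", "Q10")

# Splitting on runs of non-digits leaves an empty string before the number,
# e.g. "Q10" -> c("", "10"), so the number is the second element
pieces <- strsplit(bucket, "[^0-9]+")
bucket_num <- as.numeric(vapply(pieces, `[`, character(1), 2))

print(bucket_num)
```

If your labels start with the number instead (e.g. "10 weeks"), the number would be the first element, so the index 2 would need changing.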
If you switch to geom_path in place of geom_line, it all seems to work as expected. geom_line() connects observations in order of x, while geom_path() connects them in the order they appear in the data, so I don't think geom_line is ordering the points the same way as geom_ribbon (and I suspect that geom_line, like geom_area, assumes a zero base y value).
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
This should give you the expected plot.
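The ordering difference between the two geoms can be seen directly in the data each layer draws. A small sketch with made-up, unsorted data (it illustrates the ordering behaviour only and does not reproduce the ribbon misalignment, since DATA2 isn't available):

```r
library(ggplot2)

# Toy data whose rows are NOT sorted by x
df <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

p_line <- ggplot(df, aes(x, y)) + geom_line()
p_path <- ggplot(df, aes(x, y)) + geom_path()

# layer_data() returns the data each layer actually draws
line_x <- layer_data(p_line)$x  # reordered by x
path_x <- layer_data(p_path)$x  # kept in row order

print(line_x)
print(path_x)
```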

ggplot2 and geom_density: How to remove baseline?

I'm using ggplot as described here
Smoothed density estimates
and entered in the R console
m <- ggplot(movies, aes(x = rating))
m + geom_density()
This works, but is there some way to remove the connection between the x-axis and the density plot (the vertical lines that connect the density plot to the x-axis)?
The most consistent way to do so is (thanks to @baptiste):
m + stat_density(geom="line")
My original proposal was to use geom_line with an appropriate stat:
m + geom_line(stat="density")
but it is no longer recommended, since I'm receiving reports that it doesn't work in every case with newer versions of ggplot.
The suggested answers don't provide exactly the same results as geom_density. Why not draw a white line over the baseline?
+ geom_hline(yintercept=0, colour="white", size=1)
This worked for me.
Another way would be to calculate the density separately and then draw it. Something like this:
a <- density(movies$rating)
b <- data.frame(a$x, a$y)
ggplot(b, aes(x=a.x, y=a.y)) + geom_line()
It's not exactly the same, but pretty close.
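A self-contained sketch of both baseline-free approaches from this thread. It uses the built-in faithful data, because movies is no longer shipped with ggplot2 (it moved to the ggplot2movies package):

```r
library(ggplot2)

# Density curve drawn as a line only: no baseline, no vertical edges
p_stat <- ggplot(faithful, aes(x = eruptions)) +
  stat_density(geom = "line")

# Alternative: compute the density first, then draw it with geom_line()
d <- density(faithful$eruptions)
dens_df <- data.frame(x = d$x, y = d$y)
p_manual <- ggplot(dens_df, aes(x, y)) + geom_line()

print(p_stat)
print(p_manual)
```

One likely reason the two curves differ slightly: density() by default extends its grid three bandwidths beyond the data (its cut argument), while stat_density() evaluates over the range of the data.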

facet_wrap: How to add y axis to every individual graph when scales="free_x"?

The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with a fixed y axis (which is what I want). However, by default the y axis is only displayed on the left side of the faceted graph (i.e. next to the 1st and 3rd graphs).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there a ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when I change the binwidth, it starts to behave oddly (at least for me). Check this example with the default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (in fact decrease, since tighter bins hold fewer observations). However, nothing happened to the y axis; it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved; see @baptiste's edited answer.]
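The geom_blank() layer computes its own bins with stat="bin", so it needs the same binwidth as geom_histogram(); otherwise it keeps using the default 30 bins. Is this what you're after?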
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit with this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
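Building on the ggplot_build() idea, one way to avoid writing the same graph twice is to define it once, read the counts back with layer_data() (a wrapper around ggplot_build()), and add the limits to that same object. A sketch that uses base R's stack() instead of reshape2::melt() for the long format; ggplot2 still builds the plot internally twice, but it is only written once:

```r
library(ggplot2)

# Long format with base R: stack() gives columns `values` and `ind`
m <- stack(iris[, 1:4])
bin <- 0.5  # try different bin widths

# Define the histogram once...
p <- ggplot(m, aes(x = values)) +
  facet_wrap(~ind, ncol = 2, scales = "free") +
  geom_histogram(binwidth = bin)

# ...read the bin counts of all panels from the built layer...
max_count <- max(layer_data(p, 1)$count)

# ...and reuse the same object with a shared y range
p_fixed <- p + scale_y_continuous(limits = c(0, max_count))

print(p_fixed)
```

If you are on ggplot2 3.5.0 or later (worth checking with packageVersion("ggplot2")), facet_wrap() also has an axes argument; keeping scales = "free_x" and setting axes = "all_y" should draw the shared y axis on every panel without computing any limits.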

How to add vertical lines to ggplot boxplots in R

I am plotting boxplots from this data:
MY_LABEL MY_REAL MY_CATEGORY
1 [POS] .56 POS
1 [POS] .57 POS
1 [POS] .37 POS
2 [POS] .51 POS
1 [sim v] .65 sim v
...
I'm using ggplot2:
ggplot( data=myDF, aes( x=MY_LABEL, y=MY_REAL, fill=MY_CATEGORY ) ) +
scale_colour_manual( values=palette ) +
coord_flip() +
geom_boxplot( outlier.size = 0 )
This works fine, and groups the boxplots by the field MY_CATEGORY:
I'd like to do 2 things:
1) To improve the clarity of this plot, I'd like to add separators between the various blocks, i.e. between POS and sim v, between sim v and C, etc (see the ugly red lines in the plot).
I've been struggling with geom_vline with no luck.
Alternatively, I'd like to add blank space between the blocks.
2) If I print this plot in grayscale, I can't distinguish the different blocks. I'm trying to force a different palette with:
scale_colour_manual( values=c("black","darkgray","gray","white") )
Again, no luck, the plot doesn't change at all.
What would you suggest to do?
Would this work for you?
library(ggplot2)
# split the cylinder groups into two blocks: A (6 and 8 cyl) and B (4 cyl)
mtcars$cyl2 <- ifelse(mtcars$cyl > 4, "A", "B")
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + facet_grid(. ~ cyl2, scales = "free", space = "free")
would give something like this,
No one covered the horizontal line route, so I thought I'd add it. Not sure why geom_vline() wasn't working for you. Here's what I did (chose to play off of Eric Fail's approach):
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p <- p + geom_boxplot(aes(fill=factor(cyl))) + coord_flip()
# the boxes sit at positions 1, 2, 3; lines at 1.5 and 2.5 fall between them
p <- p + geom_vline(xintercept=c(1.5,2.5))
p
There are only three boxplots here, but from playing around, ggplot appears to place them at integer positions. Just figure out which box (the nth) you want a line after, and set the xintercept argument to n+0.5. You can change the thickness and colour to your liking: just add size=width and colour="name" after the xintercept argument.
By the way, geom_vline() seems to work for me regardless of whether it's before or after coord_flip(). I find that counter-intuitive.
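To place separators between category blocks rather than between individual boxes, the n+0.5 positions can be computed from the data. A sketch with a small hypothetical stand-in for myDF (labels, categories and random values are made up), assuming each category's labels are contiguous in the factor order:

```r
library(ggplot2)

# Hypothetical stand-in for myDF: each MY_CATEGORY occupies a
# contiguous block of MY_LABEL levels
set.seed(42)
labels <- c("1 [POS]", "2 [POS]", "1 [sim v]", "2 [sim v]", "1 [C]")
cats   <- c("POS", "POS", "sim v", "sim v", "C")
myDF <- data.frame(
  MY_LABEL    = factor(rep(labels, each = 5), levels = labels),
  MY_CATEGORY = rep(cats, each = 5),
  MY_REAL     = runif(25)
)

# Category of each box, in the order the boxes appear on the axis
level_cat <- myDF$MY_CATEGORY[match(levels(myDF$MY_LABEL), myDF$MY_LABEL)]

# Box n sits at position n; put a line at n + 0.5 wherever box n and
# box n + 1 belong to different categories
separators <- which(head(level_cat, -1) != tail(level_cat, -1)) + 0.5

p <- ggplot(myDF, aes(x = MY_LABEL, y = MY_REAL, fill = MY_CATEGORY)) +
  geom_boxplot() +
  geom_vline(xintercept = separators, colour = "red") +
  coord_flip()

print(p)
```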
I'm not sure bdemarest is correct that you need the names to match the category names. I think the issue is that you used scale_colour_manual(), which applies when you map aes(..., colour=var), whereas you mapped fill=var. Thus, you need scale_fill_manual(). Building on the above, we can add:
p <- p + scale_fill_manual(values=c("black","gray","white"))
p
Note that I've not defined any factor names for the colors to match. I think the colors are simply applied to your factor levels according to their order, but I could be wrong.
The end result of all of the above:
To change the fill colors, you need a named vector of values. The names need to exactly match the y-axis category names.
scale_fill_manual(values=c("POS"="black", "sim v"="gray50",
"C"="gray80", "sim t"="white"))
To separate the y-axis categories, try facet_grid().
facet_grid(factor(MY_CATEGORY) ~ ., drop=TRUE)
I'm not sure that this will work because I don't have your data to test it.
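Whether scale_fill_manual() needs names can be checked directly, using mtcars as in the earlier answers: layer_data() exposes the fill each box actually receives, so an unnamed vector (matched to levels by order) and a named one (matched by level name) can be compared:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(fill = factor(cyl)))

# Unnamed values: matched to the factor levels ("4", "6", "8") in order
p_unnamed <- p + scale_fill_manual(values = c("black", "gray", "white"))
# Named values: matched by level name, so their order does not matter
p_named <- p + scale_fill_manual(values = c("8" = "white", "4" = "black", "6" = "gray"))

# One row per box in layer_data(), in level order
fill_unnamed <- layer_data(p_unnamed)$fill
fill_named   <- layer_data(p_named)$fill
print(fill_unnamed)
print(fill_named)
```

Both approaches work; names are mainly a safeguard when the level order is uncertain or some levels should keep specific colours.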
