ggplot margins - change distance to axis - r

I am switching from base R plotting tools to ggplot2 and am struggling with one issue.
In base R you can control the distance to each of the four axes (or the "box") by setting margins. The resulting margins are fixed and do not depend on what you plot. This allows me to produce plots for my papers with exactly the same plot area sizes regardless of the size of tick labels and axis labels.
In ggplot, I encountered this (minimal working example):
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
print(p + scale_y_log10())
Black arrows at the left-hand side of the plots show the difference between the actual margins I get. The axis label (y) stays in place, while the position of the y-axis shifts depending on the size of the tick labels (their text representation). The effect becomes stronger if you change axis.text.y, e.g. to increase the text size.
What I want is to control the actual margins no matter what tick labels are drawn, so that I can get figures of the same size for different data sets.

Although there are many theme options in ggplot2, there does not appear to be an option which sets a fixed margin space for the axes (or if there is it is well hidden). The cowplot package has an align_plots function which can align one or both axes in a list of plots. align_plots returns a list, each component of which is the original plot but with the axes specified aligned. I am using the grid.arrange function from the gridExtra package to output both plots so you can see the way the alignment works:
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
p1 = p + scale_y_log10()
print(p1)
library(cowplot)
library(gridExtra)
p2 = align_plots(p, p1, align = "hv")
grid.arrange(p2[[1]], p2[[2]])
This is how the two original plots would have output:
grid.arrange(p, p1)

Following the approach suggested by Stewart Ross in this message, I ended up in a similar thread. I played around with the grobs generated from my sample ggplots using this method and was able to work out how to manually control the layout of individual grobs (at least to some extent).
For a sample plot, the generated grob's layout looks like this (p1 here is the grob, not the ggplot object):
> p1$layout
    t l  b r  z clip       name
17  1 1 10 7  0   on background
1   5 3  5 3  5  off     spacer
2   6 3  6 3  7  off     axis-l
3   7 3  7 3  3  off     spacer
4   5 4  5 4  6  off     axis-t
5   6 4  6 4  1   on      panel
6   7 4  7 4  9  off     axis-b
7   5 5  5 5  4  off     spacer
8   6 5  6 5  8  off     axis-r
9   7 5  7 5  2  off     spacer
10  4 4  4 4 10  off     xlab-t
11  8 4  8 4 11  off     xlab-b
12  6 2  6 2 12  off     ylab-l
13  6 6  6 6 13  off     ylab-r
14  3 4  3 4 14  off   subtitle
15  2 4  2 4 15  off      title
16  9 4  9 4 16  off    caption
Here we are interested in the 4 axes: axis-l, axis-t, axis-b and axis-r. Suppose we want to control the left margin, so we look for axis-l. Notice that this particular grob has a 7x10 layout (7 columns, 10 rows).
> p1$layout[p1$layout$name == "axis-l", ]
  t l b r z clip   name
2 6 3 6 3 7  off axis-l
As far as I understand it, this output means that the left axis takes one grid cell (#3 horizontally, #6 vertically). Note the index ind = 3.
Now, there are two other fields in the grob: widths and heights. Let's go to widths (which appears to be a list of grid units) and pick the width with the index ind we just obtained. In my sample case the output is something like
> p1$widths[3]
[1] sum(1grobwidth, 3.5pt)
I guess it is a size determined at draw time: the width of some grob (1grobwidth) plus an additional 3.5pt. Now we can replace this value with another unit (I tested very simple things like centimeters or points), e.g. p1$widths[3] = unit(4, "cm"). So far I have been able to confirm that if you specify equal widths for the left axis of two different plots, you get identical margins.
Exploring the $layout table might provide other ways of controlling plot layout (e.g. look at $layout$name == "panel" to change the plot area size).
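The steps above can be sketched as follows. This is a minimal sketch, assuming the grobs are created with ggplot2::ggplotGrob(); the helper name set_left_axis_width and the 2 cm width are hypothetical choices for illustration. The column is looked up by name instead of being hard-coded as 3, since the index may differ between ggplot2 versions.

```r
library(ggplot2)
library(grid)

dat <- data.frame(x = 1:5, y = 1e-5 * (1:5)^2)
p  <- ggplot(dat, aes(x, y)) + geom_point() + geom_line()
p1 <- p + scale_y_log10()

# Hypothetical helper: convert a plot to a grob and give the
# layout column that holds the left axis ("axis-l") a fixed width.
set_left_axis_width <- function(plot, width) {
  g <- ggplotGrob(plot)
  ind <- g$layout$l[g$layout$name == "axis-l"]  # column index of the left axis
  g$widths[ind] <- width
  g
}

g_lin <- set_left_axis_width(p,  unit(2, "cm"))
g_log <- set_left_axis_width(p1, unit(2, "cm"))
```

Drawing each grob with grid::grid.newpage() followed by grid::grid.draw(g_lin) (or g_log) should then place the y-axis at the same distance from the axis title in both plots, provided the chosen width is large enough for the widest tick label.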

Regarding "produce plots for my papers with exactly same plot area sizes", this might help:
grid::grid.draw(egg::set_panel_size(ggplot2::qplot(1,1), width = grid::unit(3, "in")))

Related

R: barplot inconsistent width

I have two sets of data
Co1 Col2
1 10
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
I put these two columns into two variables, Num and Leaf respectively,
and tried to plot them with barplot(Leaf,Num, space=0.5,col="red")
Everything is fine, except that some bars are wider than others, while some other bars have no width and literally become a line.
Why is that?
I know I can fix it by doing barplot(Leaf,Num, space=0.5,col="red", width=0.5)
But I am wondering why the default behavior of this function gives inconsistent bar widths.
In base R you can do
barplot(df$Col2, names.arg = df$Co1)
Or using ggplot
library(ggplot2)
ggplot(df, aes(as.factor(Co1), Col2)) + geom_col() + xlab("Co1")
dataset = data.frame(Leaf,Num)
barplot(dataset$Leaf,dataset$Num,space=0.5,col="red")
or
barplot(Leaf,Num,space=0.5,col="red")
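A likely explanation, sketched under the assumption that Num holds Co1 and Leaf holds Col2 as described in the question: the second positional argument of barplot() is width, so barplot(Leaf, Num) uses the values of Num as bar widths instead of as labels. Passing Num through names.arg keeps the default equal widths.

```r
Num  <- 1:9                                    # Co1
Leaf <- c(10, 12, 13, 14, 15, 16, 17, 18, 19)  # Col2

pdf(NULL)  # draw to a null device so the sketch also runs non-interactively

# Num is matched to 'width': the bar widths become 1, 2, ..., 9
mids_unequal <- barplot(Leaf, Num, space = 0.5, col = "red")

# Num is used only for labels: all bars keep the default width of 1
mids_equal <- barplot(Leaf, names.arg = Num, space = 0.5, col = "red")

invisible(dev.off())
```

This would also explain why adding width=0.5 appears to fix the plot: with width and space supplied by name, the unnamed Num falls through to the next free argument, names.arg.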

Plot frequency heatmap of positions from set of coordinates

I have a bunch of data that looks like this:
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
I want to plot a heatmap of this, so a nice looking graph labelled x and y for the position coordinates, with a gradient color fill indicating the frequency that a particular point showed up, with a scale indicator showing what the colors mean. I'm looking for a simple gradient fill with a single color low and high.
I've been at this for a while but I think the first step should be to construct another data-set with the positions and a new column showing the frequencies? But I'm not 100% sure how to structure this.
So far my attempts look similar to:
ggplot(data=all_data, aes(x=X, y=Y)) + geom_tile(aes(fill=all_data$X)) +
scale_fill_gradient2(low="green", high="blue") + coord_equal()
As Jon Spring suggested, the following code produces a graph like this:
all_data <- read.table(text = "
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
", header = T, row.names = NULL)
ggplot(data=all_data, aes(x=X, y=Y)) + geom_bin2d()
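Building on geom_bin2d(), which counts how many points fall into each rectangular bin, the two-colour gradient asked for in the question might be sketched like this. The bin width of 5 units, the legend title and the colours are assumptions chosen for illustration; only the X and Y columns of the sample data are reproduced.

```r
library(ggplot2)

all_data <- data.frame(
  X = c(147.8333, 148.5000, 151.1667, 154.5000, 158.1667, 161.5000),
  Y = c(258.5000, 258.8333, 260.8333, 264.5000, 266.5000, 269.5000)
)

heat <- ggplot(all_data, aes(x = X, y = Y)) +
  geom_bin2d(binwidth = c(5, 5)) +   # count points per 5 x 5 bin
  scale_fill_gradient(name = "Frequency", low = "green", high = "blue") +
  coord_equal() +
  labs(x = "x position", y = "y position")
```

No separate frequency column is needed with this approach, because the binning statistic computes the counts itself. Use print(heat) to draw the plot.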

Preventing inconsistent spacing/bar widths in geom_bar with many bars

In a bar plot with lots of bars, the spacing between bars and/or the widths of the bars become inconsistent, and they also change when the width of the plot changes.
set.seed(23511)
dat <- data.frame(x = 1:540, y = rnorm(540))
library(ggplot2)
ggplot(dat) +
geom_bar(aes(x = x, y = y), stat = "identity")
Is there a way to solve this? I tried playing with width and the overall plot size to no avail.
In response to alistaire's comment here's a screenshot of the first few bars from RStudio. Looking at the first 10 values..
x y
1 1 0.9450960
2 2 0.9277378
3 3 0.4371033
4 4 -1.0333073
5 5 2.0473397
6 6 0.8174123
7 7 0.4277842
8 8 -0.4336887
9 9 0.2156801
10 10 0.4918345
.. to me it clearly looks like for the first 3 positive values there's space between the bars/the bars are narrower than for the second set of 3 positive values where there's no space between the bars/the bars are wider.
I think this is a pixel issue. If the x of a bar goes from 1.5 to 2.7 pixels, it will be one pixel wide, if it goes from 1.9 to 3.1 (same width) it will be 2 pixels wide.
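The arithmetic in that example can be reproduced with a simplified model in which both bar edges snap down to the pixel grid. This is an assumption for illustration only; real graphics devices may round or anti-alias differently. The helper name pixel_width is hypothetical.

```r
# Hypothetical helper: width in whole pixels when both edges are
# snapped down to the pixel grid (a simplified rasterization model).
pixel_width <- function(left, right) floor(right) - floor(left)

pixel_width(1.5, 2.7)  # true width 1.2 -> 1 pixel
pixel_width(1.9, 3.1)  # true width 1.2 -> 2 pixels
```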
You could do lines instead of bars.
ggplot(data=dat, aes(x=x, y=y)) +
geom_segment(aes(xend=x, yend=0), size = 0.6)
I think you can still sometimes run into pixel issues, but it's maybe easier to control with size (called linewidth in ggplot2 3.4.0 and later).

Reordering legend while modifying one particular line for a line chart in ggplot

Let's say I have a simple data frame as shown below:
> A <- data.frame(x=1:10, a=rep(1,10), d=rep(2,10), b=rep(3,10))
> A
x a d b
1 1 1 2 3
2 2 1 2 3
3 3 1 2 3
4 4 1 2 3
5 5 1 2 3
6 6 1 2 3
7 7 1 2 3
8 8 1 2 3
9 9 1 2 3
10 10 1 2 3
I want to plot this with x on the x-axis and the other columns as lines on the y-axis. I want the line representing final column to be a little thicker than the other lines. So I can do this with the following code, which leads to the plot shown below it:
library(ggplot2)
#Plot that creates a thicker line for last column of data.
#However, order of legend is changed to alphabetical order.
p <- ggplot(A, aes(x))
for(i in 2:length(A)){
  gg.data <- data.frame(x=A$x, value=A[,i], name=names(A)[i])
  if(i==length(A)){
    p <- p + geom_line(data=gg.data, aes(y=value, color=name), size=1.1)
  } else{
    p <- p + geom_line(data=gg.data, aes(y=value, color=name))
  }
}
Now the problem with the plot above is that the order of the variables in the legend has changed to align with alphabetical order. I don't want that; instead I want the order to remain a,d,b.
I can keep the order as I wish by using melt and then plotting using the code below, but now I don't see how to increase the size of the line representing the last column in A.
library(reshape2) #provides melt()
Amelt <- melt(A, id.vars='x')
#Plot that orders legend according to order of columns in data frame.
#However, not sure how to thicken one particular line over the others.
pmelt <- ggplot(Amelt)+geom_line(aes(x=x, y=value, color=variable))
How can I get both things that I want?
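Have you tried using scale_color_discrete(breaks=c("a","d","b")) to specify the order of the legend entries? (The lines are mapped to color, so the color scale is the one to change, not scale_fill_discrete.)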
Please have a look at this link:
http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/
Hope this helps!
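One way to get both the column order and a thicker last line is sketched below. It builds the long data in base R, fixes the legend order through the factor levels, and maps line width to the same variable. It assumes ggplot2 3.4.0 or later, where line width is set with linewidth (earlier versions use size); the names long, vars and widths and the values 0.5 and 1.1 are illustrative.

```r
library(ggplot2)

A <- data.frame(x = 1:10, a = rep(1, 10), d = rep(2, 10), b = rep(3, 10))
vars <- names(A)[-1]  # "a", "d", "b" in column order

# Long format; the factor levels keep the original column order
long <- data.frame(
  x        = rep(A$x, times = length(vars)),
  variable = factor(rep(vars, each = nrow(A)), levels = vars),
  value    = unlist(A[vars], use.names = FALSE)
)

# Thicker line for the last column only
widths <- setNames(c(0.5, 0.5, 1.1), vars)

p_lines <- ggplot(long, aes(x, value, color = variable, linewidth = variable)) +
  geom_line() +
  scale_linewidth_manual(values = widths)
```

Because color and linewidth are mapped to the same variable, the two legends should merge into one, listed as a, d, b.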

How to change from row to column major order with facet_wrap?

I want to make a 2x4 array of plots that show distributions changing over time. The default ggplot arrangement with facet_wrap is that the top row has series 1&2, the second row has series 3&4, etc. I would like to change this so that the first column has series in order (1->2->3->4) and then the second column has the next 4 series. This way your eye can compare immediately adjacent distributions in time vertically (as I think they should be).
Use the dir (direction) parameter of facet_wrap(). The default is horizontal ("h"), and it can be switched to vertical ("v"):
# Horizontally
ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + facet_wrap(~ cyl, ncol=2)
# Vertically
ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + facet_wrap(~ cyl, ncol=2, dir="v")
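For the 2x4 case in the question, a sketch with made-up data (eight hypothetical series of random values) shows dir = "v" filling the first column with series 1 to 4 and the second column with series 5 to 8:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 8 series of 50 random values each
d <- data.frame(
  series = factor(rep(1:8, each = 50)),
  value  = rnorm(400)
)

p_cols <- ggplot(d, aes(value)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ series, ncol = 2, dir = "v")  # fill panels column by column
```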
Looks like you need to do this by ordering the factor prior to the facet_wrap call:
fac <- factor( fac, levels=as.character(c(1, 10, 2, 20, 3, 30, 4, 40) ) )
The default for as.table in facet_wrap is TRUE, which puts the lowest value ("1" in this case) at the upper left and the highest value ("40" in the example above) at the lower right corner. So:
pl + facet_wrap(~fac, ncol=2, nrow=4)
Your comments suggest you are working with numeric class variables. (Your comments still do not provide a working example and you seem to think this is our responsibility and not yours. Where does one acquire such notions of entitlement?) This should create a factor that might be "column major" ordered with either numeric or factor input:
> ss <- 1:8; factor(ss, levels=ss[matrix(ss, ncol=2, byrow=TRUE)])
[1] 1 2 3 4 5 6 7 8
Levels: 1 3 5 7 2 4 6 8
On the other hand I can think of situations where this might be the effective approach:
> ss <- 1:8; factor(ss, levels=ss[matrix(ss, nrow=2, byrow=TRUE)])
[1] 1 2 3 4 5 6 7 8
Levels: 1 5 2 6 3 7 4 8

Resources