Preventing inconsistent spacing/bar widths in geom_bar with many bars - r

In a bar plot with many bars, the spacing between bars and/or the bar widths become inconsistent, and the pattern changes whenever the width of the plot changes.
set.seed(23511)
dat <- data.frame(x = 1:540, y = rnorm(540))
library(ggplot2)
ggplot(dat) +
geom_bar(aes(x = x, y = y), stat = "identity")
Is there a way to solve this? I tried playing with width and the overall plot size to no avail.
In response to alistaire's comment here's a screenshot of the first few bars from RStudio. Looking at the first 10 values..
x y
1 1 0.9450960
2 2 0.9277378
3 3 0.4371033
4 4 -1.0333073
5 5 2.0473397
6 6 0.8174123
7 7 0.4277842
8 8 -0.4336887
9 9 0.2156801
10 10 0.4918345
.. to me it clearly looks like for the first 3 positive values there's space between the bars/the bars are narrower than for the second set of 3 positive values where there's no space between the bars/the bars are wider.

I think this is a pixel issue. If the x of a bar goes from 1.5 to 2.7 pixels, it will be one pixel wide, if it goes from 1.9 to 3.1 (same width) it will be 2 pixels wide.
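A rough way to see the effect, as a simplified sketch rather than a description of how any particular graphics device rasterizes: if each bar edge is snapped down to the pixel grid, bars of identical width cover different numbers of pixels depending on where they start. The helper pixels_covered() and the 700-pixel panel width are assumptions made up for illustration.

```r
# Hypothetical helper: whole pixels spanned by a bar from `left` to `right`
# when both edges are snapped down to the pixel grid (a simplification;
# real devices may anti-alias instead).
pixels_covered <- function(left, right) floor(right) - floor(left)

pixels_covered(1.5, 2.7)  # 1 pixel
pixels_covered(1.9, 3.1)  # 2 pixels, although the bar is equally wide

# 540 bars in an assumed 700-pixel-wide panel, each bar 90% of its slot
n_bars <- 540
slot   <- 700 / n_bars
left   <- (seq_len(n_bars) - 1) * slot + 0.05 * slot
widths <- pixels_covered(left, left + 0.9 * slot)
table(widths)  # a mix of 1- and 2-pixel bars
```

Under this model, resizing the plot changes slot and therefore which bars land on 1 or 2 pixels, which would match the pattern shifting with the plot width.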
You could do lines instead of bars.
ggplot(data=dat, aes(x=x, y=y)) +
geom_segment(aes(xend=x, yend=0), size = 0.6)
I think you still sometimes run into pixel issues, but it's maybe easier to control with size.

Related

ggplot2: independent axes for grid facet

I have xy data on a grid with some categorical variables. Here is some dummy data.
fn <- function(x,group,type) {
y <- expand.grid(x=seq(1,x),y=seq(1,x))
y$group <- group
y$type <- type
return(y)
}
dfr <- rbind(
fn(5,"a","1"),
fn(5,"b","1"),
fn(20,"c","1"),
fn(10,"a","2"),
fn(50,"c","2")
)
head(dfr)
x y group type
1 1 1 a 1
2 2 1 a 1
3 3 1 a 1
4 4 1 a 1
5 5 1 a 1
6 1 2 a 1
I need them on a grid layout due to the grouping variables.
Default grid
Good: Keeps grid layout, Gaps between points fixed, aspect ratio fixed
Bad: Data does not fill facet
ggplot(dfr,aes(x,y))+
geom_point()+
facet_grid(group~type)+
theme(aspect.ratio=1)
Grid free scales
Good: Keeps grid layout, aspect ratio fixed.
Bad: Data does not fill facet, Gaps between points differ
ggplot(dfr,aes(x,y))+
geom_point()+
facet_grid(group~type,scales="free")+
theme(aspect.ratio=1)
Grid free scales + free space
Good: Keeps grid layout, Gaps between points fixed
Bad: Data does not fill facet, Aspect ratio cannot be set
ggplot(dfr,aes(x,y))+
geom_point()+
facet_grid(group~type,scales="free",space="free")
Wrap free scales
Good: Data fills facet, aspect ratio fixed
Bad: Loses grid layout, gap between points not equal
ggplot(dfr,aes(x,y))+
geom_point()+
facet_wrap(group~type,scales="free")+
theme(aspect.ratio=1)
I would like all of these combined.
Keeps grid layout
Gaps between points fixed
Aspect ratio fixed
Data fills the facet (ie; variable size facets)
I think this is only possible using grid layout with independent axes. Is there some option I am missing or do I have to build this manually?
Update 1:
Expected end result.
Update 2:
I think to be able to achieve what I want, the facets need to have free scales, independent axes, fixed aspect ratio and variable sized facets/free space. The first three together are possible using ggh4x.
library(ggh4x)
ggplot(dfr,aes(x,y))+
geom_point()+
facet_grid2(group~type,scales="free",independent="all")+
theme(aspect.ratio=1)

R: barplot inconsistent width

I have two sets of data
Co1 Col2
1 10
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
I put these two columns into two variables, Num and Leaf, respectively,
and I tried to plot them with barplot(Leaf,Num, space=0.5,col="red")
Everything is fine, except that some bars are wider than others, while some other bars have no width and literally become a line.
Why is that?
I know I can fix it by doing barplot(Leaf,Num, space=0.5,col="red", width=0.5)
But I am wondering why the default behavior of this function gives inconsistent bar widths.
In base R you can do
barplot(df$Col2, names.arg = df$Co1)
Or using ggplot
library(ggplot2)
ggplot(df, aes(as.factor(Co1), Col2)) + geom_col() + xlab("Co1")
dataset = data.frame(Leaf,Num)
barplot(dataset$Leaf,dataset$Num,space=0.5,col="red")
or
barplot(Leaf,Num,space=0.5,col="red")
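A likely explanation for the uneven widths in the question: the second positional argument of barplot's default method is width, so barplot(Leaf, Num, ...) treats Num as a vector of bar widths rather than as labels. The sketch below recreates the question's data (the values of Num and Leaf are assumed from the table shown) and compares that call with one that passes Num as names.arg.

```r
# Data assumed from the table in the question
Num  <- 1:9
Leaf <- c(10, 12:19)

# The second formal argument of the default method is `width`
names(formals(getS3method("barplot", "default")))[2]

# Num is matched to `width`: bar widths grow from 1 to 9
mids_uneven <- barplot(Leaf, Num, space = 0.5, col = "red")

# Num passed as labels: all bars use the default width
mids_even <- barplot(Leaf, names.arg = Num, space = 0.5, col = "red")
```

barplot() returns the bar midpoints, so evenly spaced values in mids_even and unevenly spaced values in mids_uneven reflect the difference.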

Plot frequency heatmap of positions from set of coordinates

I have a bunch of data that looks like this:
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
I want to plot a heatmap of this, so a nice looking graph labelled x and y for the position coordinates, with a gradient color fill indicating the frequency that a particular point showed up, with a scale indicator showing what the colors mean. I'm looking for a simple gradient fill with a single color low and high.
I've been at this for a while but I think the first step should be to construct another data-set with the positions and a new column showing the frequencies? But I'm not 100% sure how to structure this.
So far my attempts look similar to:
ggplot(data=all_data, aes(x=X, y=Y)) + geom_tile(aes(fill=all_data$X)) +
scale_fill_gradient2(low="green", high="blue") + coord_equal()
As Jon Spring suggested, the following code produces a graph like this:
all_data <- read.table(text = "
Track X1 X Y
1 Point 1 147.8333 258.5000
2 Point 2 148.5000 258.8333
3 Point 3 151.1667 260.8333
4 Point 4 154.5000 264.5000
5 Point 5 158.1667 266.5000
6 Point 6 161.5000 269.5000
", header = T, row.names = NULL)
ggplot(data=all_data, aes(x=X, y=Y)) + geom_bin2d()
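The question also asked for a single-colour gradient with a legend. One possible sketch, building on geom_bin2d (which does the counting itself, so no separate frequency column is needed): the bin count, the colours, and the legend title below are arbitrary assumptions, not values from the thread.

```r
library(ggplot2)

# The six sample rows from the question
all_data <- data.frame(
  X = c(147.8333, 148.5000, 151.1667, 154.5000, 158.1667, 161.5000),
  Y = c(258.5000, 258.8333, 260.8333, 264.5000, 266.5000, 269.5000)
)

p_heat <- ggplot(all_data, aes(x = X, y = Y)) +
  geom_bin2d(bins = 10) +                       # counts points per 2D bin
  scale_fill_gradient(low = "lightblue",        # assumed colours
                      high = "darkblue",
                      name = "Frequency") +     # legend title
  coord_equal()

p_heat
```

With real tracking data, bins (or binwidth) controls how coarse the position grid is.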

ggplot margins - change distance to axis

I am switching from base R plotting tools to ggplot2 and am struggling with one issue.
In base R you can control the distance to each of the four axes (or the "box") by setting margins. The resulting margins are fixed and do not depend on what you plot. This allows me to produce plots for my papers with exactly same plot area sizes regardless of the size of tick labels and axis labels.
In ggplot, I encountered this (minimum working example):
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
print(p + scale_y_log10())
Black arrows at the left-hand side of the plots show the difference between the actual margins I get. The y-axis label stays in place, while the position of the y-axis shifts depending on the width of the tick labels (their text representation). The shift grows further if axis.text.y is changed, e.g. to increase the text size.
What I want is to control the actual margins no matter what tick labels are drawn, so that figures for different data sets end up the same size.
Although there are many theme options in ggplot2, there does not appear to be an option which sets a fixed margin space for the axes (or if there is it is well hidden). The cowplot package has an align_plots function which can align one or both axes in a list of plots. align_plots returns a list, each component of which is the original plot but with the axes specified aligned. I am using the grid.arrange function from the gridExtra package to output both plots so you can see the way the alignment works:
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
p1 = p + scale_y_log10()
print(p1)
library(cowplot)
library(gridExtra)
p2 = align_plots(p, p1, align = "hv")
grid.arrange(p2[[1]], p2[[2]])
This is how the two original plots would have output:
grid.arrange(p, p1)
Following the approach suggested by Stewart Ross in this message, I ended up in a similar thread. I played around with grobs generated from my sample ggplots using this method and was able to work out how to manually control the layout of each grob individually (at least to some extent).
For a sample plot, the generated grob's layout looks like this (p1 here is the grob, e.g. as returned by ggplotGrob(), not the ggplot object):
> p1$layout
    t l  b r  z clip       name
17  1 1 10 7  0   on background
1   5 3  5 3  5  off     spacer
2   6 3  6 3  7  off     axis-l
3   7 3  7 3  3  off     spacer
4   5 4  5 4  6  off     axis-t
5   6 4  6 4  1   on      panel
6   7 4  7 4  9  off     axis-b
7   5 5  5 5  4  off     spacer
8   6 5  6 5  8  off     axis-r
9   7 5  7 5  2  off     spacer
10  4 4  4 4 10  off     xlab-t
11  8 4  8 4 11  off     xlab-b
12  6 2  6 2 12  off     ylab-l
13  6 6  6 6 13  off     ylab-r
14  3 4  3 4 14  off   subtitle
15  2 4  2 4 15  off      title
16  9 4  9 4 16  off    caption
Here we are interested in 4 axes - axis-l,t,b,r. Suppose we want to control left margin - look for axis-l. Notice that this particular grob has a layout of 7x10.
> p1$layout[p1$layout$name == "axis-l", ]
  t l b r z clip   name
2 6 3 6 3 7  off axis-l
As far as I understood it, this output means that left axis takes one grid cell (#3 horizontally, #6 vertically). Note index ind = 3.
Now, there are two other fields in the grob: widths and heights. Let's go to widths (which appears to be a list of grid units) and pick the width with the index ind we just obtained. In my sample case the output is something like
> p1$widths[3]
[1] sum(1grobwidth, 3.5pt)
I guess it is a 'runtime-determined' size of some 1grobwidth plus an additional 3.5pt. Now we can replace this value with another unit (I tested very simple things like centimeters or points), e.g. p1$widths[3] = unit(4, "cm"). So far I was able to confirm that if you specify equal 'widths' for the left axis of two different plots, you will get identical margins.
Exploring $layout table might provide other ways of controlling plot layout (e.g. look at the $layout$name == "panel" to change plot area size).
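The steps above can be sketched end to end as follows. This is only an illustration of the approach described: the helper name fix_left_axis() and the 2 cm width are assumptions, and the column index is looked up by name instead of being hard-coded as 3.

```r
library(ggplot2)
library(grid)

dat <- data.frame(x = 1:5, y = 1e-5 * (1:5)^2)
p   <- ggplot(dat, aes(x, y)) + geom_point() + geom_line()

# Hypothetical helper: convert a plot to a grob and force the column
# that holds the left axis to a fixed width
fix_left_axis <- function(plot, width = unit(2, "cm")) {
  g   <- ggplotGrob(plot)
  ind <- g$layout$l[g$layout$name == "axis-l"]  # column index of axis-l
  g$widths[ind] <- width
  g
}

g1 <- fix_left_axis(p)                    # linear y axis
g2 <- fix_left_axis(p + scale_y_log10())  # log y axis, wider tick labels
```

Each result can then be drawn with grid.newpage() followed by grid.draw(g1); both grobs reserve the same width for the left axis regardless of the tick labels.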
"produce plots for my papers with exactly same plot area sizes"
This might help:
grid::grid.draw(egg::set_panel_size(ggplot2::qplot(1,1), width = grid::unit(3, "in")))

Barchart in log scale: cut-off bars, missing values

I'm trying to create a nice barchart from the following data:
> counts$counts_16
[1] 46921 1546 248 78 31 15 1 3 2 2 0
> counts$score
[1] 0 1 2 3 4 5 6 7 8 9 10
With the following code:
ggplot(data = counts, aes(x=score, y=counts_16)) + geom_bar(stat="identity", width=bar.width) + scale_y_continuous(trans=log2_trans())
Unfortunately, the result looks a bit odd. First of all, the bars do not start from the x axis, but are located too high.
Then, there is no bar for the 6th value, which should be 1.
For zero, there is a bar, although there should not be one.
Here's an example:
Now, I understand why it behaves odd for values of 0 on log scale, but how can I work around it? And how do I fix the other issues?
After a log transformation, the default "baseline" of the bar graph will be 1, rather than zero, because log(0) is -Inf. So when you have a count of 1, there's no bar to display since both the bottom and top of the bar are equal to 1. On the other hand, because log(0) = -Inf, the bar with a count of zero will extend downward beyond the bottom of the y-range of the graph for any lower y-limit less than 1.
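The arithmetic behind this can be checked directly on the counts from the question. This is a small sketch using base R only; the variable name log_height is made up for illustration.

```r
counts_16 <- c(46921, 1546, 248, 78, 31, 15, 1, 3, 2, 2, 0)

# Bar tops on the transformed (log2) scale; the baseline sits at log2(1) = 0
log_height <- log2(counts_16)

log_height[7]   # count of 1: top equals the baseline, so the bar has zero height
log_height[11]  # count of 0: -Inf, so the bar runs off the bottom of the panel
```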
UPDATE: Regarding your comment, another option is to add points to the plot, so that you get a point where the y-value equals 1. ggplot also includes the top half of the point for y=0, which sort of marks the zero count. For example:
counts = data.frame(score=0:6, counts_16=c(11000,10000,0:4))
ggplot(data = counts, aes(x=score, y=counts_16)) +
geom_bar(stat="identity", width=0.1, fill="grey50") +
geom_point(pch=21, fill="red", size=4) +
scale_y_log10(limits=c(1e-1,2e4), breaks=10^seq(-1,4,1),
labels=c(0.1, sprintf("%1.0f", 10^seq(0,4,1)))) +
scale_x_continuous(breaks=0:6)
You can, of course, just go with points (and perhaps a connecting line to guide the eye) and eliminate the bars, which avoids the awkward baseline issue with a bar plot on a log scale.
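A points-and-line version might look like the sketch below. It reuses the example data from above; dropping the zero count before plotting is an assumption made here, since log10(0) = -Inf has no position on the axis, and the object names are made up.

```r
library(ggplot2)

counts <- data.frame(score = 0:6, counts_16 = c(11000, 10000, 0:4))

# Assumption: zero counts are removed rather than drawn at the axis limit
nonzero <- subset(counts, counts_16 > 0)

p_points <- ggplot(nonzero, aes(x = score, y = counts_16)) +
  geom_line(colour = "grey50") +                # connecting line to guide the eye
  geom_point(pch = 21, fill = "red", size = 4) +
  scale_y_log10() +
  scale_x_continuous(breaks = 0:6)

p_points
```

Because no bars are drawn, there is no baseline to choose, and a count of 1 shows up as an ordinary point.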
