R: barplot inconsistent width

I have two sets of data
Co1 Col2
1 10
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
I put these two columns into two variables, Num and Leaf, respectively,
and I tried to plot them with barplot(Leaf,Num, space=0.5,col="red")
Everything is fine, except that some bars are wider than others, while some bars have no width at all and literally become a line.
Why is that?
I know I can fix it by doing barplot(Leaf,Num, space=0.5,col="red", width=0.5)
But I am wondering why the default behavior of this function gives inconsistent bar widths.

In base R you can do
barplot(df$Col2, names.arg = df$Co1)
Or using ggplot
library(ggplot2)
ggplot(df, aes(as.factor(Co1), Col2)) + geom_col() + xlab("Co1")

dataset = data.frame(Leaf,Num)
barplot(dataset$Leaf,dataset$Num,space=0.5,col="red")
or
barplot(Leaf,Num,space=0.5,col="red")
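As for why the widths differ: a likely explanation, going by the argument order of barplot's default method, is that its second formal argument is width. In barplot(Leaf, Num, ...) the values of Num are therefore matched positionally as bar widths rather than as labels. Once width is named explicitly, Num falls through to the next free argument, names.arg, which would also explain why the workaround in the question behaves. A minimal sketch using the question's data (pdf(NULL) is only there so that no plot file is written):

```r
Num  <- 1:9            # Co1 from the question
Leaf <- c(10, 12:19)   # Col2 from the question

# The first two formals of the default method are height and width
barplot_formals <- names(formals(getS3method("barplot", "default")))
print(barplot_formals[1:2])

pdf(NULL)  # null graphics device, nothing is written to disk

# Num is matched positionally to `width`, so each bar gets its own width
mids_bad <- barplot(Leaf, Num, space = 0.5, col = "red")

# Passing Num as labels keeps the default width of 1 for every bar
mids_ok <- barplot(Leaf, names.arg = Num, space = 0.5, col = "red")

dev.off()
```

barplot returns the bar midpoints, so evenly spaced values in mids_ok and unevenly spaced values in mids_bad are a quick way to confirm the difference without looking at the picture.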

Related

ggplot margins - change distance to axis

I am switching from base R plotting tools to ggplot2 and am struggling with one issue.
In base R you can control the distance to each of the four axes (or the "box") by setting margins. The resulting margins are fixed and do not depend on what you plot. This allows me to produce plots for my papers with exactly the same plot area sizes regardless of the size of the tick labels and axis labels.
In ggplot, I encountered this (minimal working example):
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
print(p + scale_y_log10())
Black arrows at the left-hand side of the plots show the difference between the actual margins I get. The axis label (y) stays in place, while the position of the y-axis shifts depending on the size of the tick labels (their text representation). The effect gets even stronger if you change axis.text.y, e.g. to increase the text size.
What I want is to control the actual margins no matter what tick labels are drawn, so that figures for different data sets come out the same size.
Although there are many theme options in ggplot2, there does not appear to be an option which sets a fixed margin space for the axes (or if there is it is well hidden). The cowplot package has an align_plots function which can align one or both axes in a list of plots. align_plots returns a list, each component of which is the original plot but with the axes specified aligned. I am using the grid.arrange function from the gridExtra package to output both plots so you can see the way the alignment works:
library(ggplot2)
dat = data.frame(x = 1:5, y = 1e-5* (1:5) ^ 2)
p = ggplot(dat, aes(x, y)) + geom_point() + geom_line()
print(p)
p1 = p + scale_y_log10()
print(p1)
library(cowplot)
library(gridExtra)
p2 = align_plots(p, p1, align = "hv")
grid.arrange(p2[[1]], p2[[2]])
This is how the two original plots would have been output without alignment:
grid.arrange(p, p1)
Following the approach suggested by Stewart Ross in this message, I ended up in a similar thread. I played around with the grobs generated from my sample ggplots using this method, and was able to work out how to manually control the layout of the grobs individually (at least to some extent).
For a sample plot, the generated grob's layout looks like this:
> p1$layout
   t l  b r  z clip       name
17 1 1 10 7  0   on background
1  5 3  5 3  5  off     spacer
2  6 3  6 3  7  off     axis-l
3  7 3  7 3  3  off     spacer
4  5 4  5 4  6  off     axis-t
5  6 4  6 4  1   on      panel
6  7 4  7 4  9  off     axis-b
7  5 5  5 5  4  off     spacer
8  6 5  6 5  8  off     axis-r
9  7 5  7 5  2  off     spacer
10 4 4  4 4 10  off     xlab-t
11 8 4  8 4 11  off     xlab-b
12 6 2  6 2 12  off     ylab-l
13 6 6  6 6 13  off     ylab-r
14 3 4  3 4 14  off   subtitle
15 2 4  2 4 15  off      title
16 9 4  9 4 16  off    caption
Here we are interested in the four axes: axis-l, axis-t, axis-b and axis-r. Suppose we want to control the left margin, so we look for axis-l. Notice that this particular grob has a layout of 7 columns by 10 rows (see the extent of the background row).
p1$layout[p1$layout$name == "axis-l", ]
  t l b r z clip   name
2 6 3 6 3 7  off axis-l
As far as I understand it, this output means that the left axis takes one grid cell (#3 horizontally, #6 vertically). Note the index ind = 3.
Now, there are two other fields in the grob: widths and heights. Let's go to widths (which appears to be a list of grid units) and pick the width with the index ind we just obtained. In my sample case the output is something like
> p1$widths[3]
[1] sum(1grobwidth, 3.5pt)
I guess it is a size determined at draw time: the width of some grob (1grobwidth) plus an additional 3.5pt. Now we can replace this value with another unit (I tested very simple things like centimeters or points), e.g. p1$widths[3] = unit(4, "cm"). So far I was able to confirm that if you specify equal widths for the left axis of two different plots, you will get identical margins.
Exploring the $layout table might provide other ways of controlling the plot layout (e.g. look at $layout$name == "panel" to change the plot area size).
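The steps above can be put together as a short sketch. It assumes the plots are first converted with ggplot2::ggplotGrob(); the helper name fix_left_axis and the 2 cm width are illustrative choices, not part of ggplot2.

```r
library(ggplot2)
library(grid)

pdf(NULL)  # null device so that building the grobs writes no file

dat <- data.frame(x = 1:5, y = 1e-5 * (1:5)^2)
p   <- ggplot(dat, aes(x, y)) + geom_point() + geom_line()

# Convert both plots to gtables so that $layout and $widths are available
g1 <- ggplotGrob(p)
g2 <- ggplotGrob(p + scale_y_log10())

# Hypothetical helper: give the column holding the left axis a fixed width
fix_left_axis <- function(g, width = unit(2, "cm")) {
  ind <- g$layout$l[g$layout$name == "axis-l"]  # column index of the left axis
  g$widths[ind] <- width
  g
}

g1 <- fix_left_axis(g1)
g2 <- fix_left_axis(g2)

# Draw one of the modified gtables
grid.newpage()
grid.draw(g1)

dev.off()
```

Both gtables now reserve the same width for the left axis, whatever tick labels are drawn. A width that is too small for the labels will clip or overlap them, so the value has to be chosen for the widest labels you expect.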
Regarding "produce plots for my papers with exactly same plot area sizes",
this might help:
grid::grid.draw(egg::set_panel_size(ggplot2::qplot(1,1), width = grid::unit(3, "in")))

ggplot multiple lines in same graph

I am trying to plot the expression of multiple genes over time in the same graph to demonstrate a similar profile, and then add a line showing the mean across all genes at each timepoint (like figure 4b in a recent Nature Communications article, https://www.nature.com/articles/s41467-017-02546-5/figures/4). My data have been normalised to be around 0, so they are all on the same scale.
df2 sample:
    variable        value gene
1          5 -0.610384193    1
2          5 -6.25967087     2
3          5 -3.773389731    3
50         6 -0.358879035    1
51         6 -6.066341017    2
52         6 -4.202998579    3
99         7 -0.103885903    1
100        7 -6.648844687    2
101        7 -5.041554127    3
I plot the expression levels with ggplot2:
plotC <- ggplot(df2, aes(x=variable, y=value, group=factor(gene), colour=gene)) + geom_line(size=0.5, aes(color=gene), alpha=0.4)
But adding the mean line in red to this plot is proving difficult. I calculated the means and put them in another dataframe:
means
       value variable gene
1 -1.5037354        5   50
2 -0.8783492        6   50
3 -0.7769085        7   50
Then tried adding them as another layer:
plotC + geom_line(data=means, aes(x=variable, y=value, color="red", group=factor(gene)), size=0.75)
But I get the error "Error: Discrete value supplied to continuous scale".
Do you have any suggestions as to how I can plot this mean on the same graph in another color?
Thank you,
Anna
edit: the answer by RG20 is helpful, thanks for pointing out I had the color in the wrong place. However it plots the line outside the rest of the graph... I really don't understand what's wrong with my graph...
plotC + geom_line(data=means, aes(x=variable, y=value, group=factor(gene)), color='red',size=0.75)
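The error most likely arises because colour = gene maps a numeric column to a continuous colour scale, and color = "red" inside aes() then asks that same scale to hold a discrete value. A minimal sketch with only the nine sample rows shown in the question: the means are computed with aggregate() (one possible way, not necessarily how the asker did it), and the red colour is set outside aes(). The linewidth argument assumes ggplot2 3.4 or later; older versions use size instead.

```r
library(ggplot2)

# Only the sample rows shown in the question
df2 <- data.frame(
  variable = rep(5:7, each = 3),
  value    = c(-0.610384193, -6.25967087,  -3.773389731,
               -0.358879035, -6.066341017, -4.202998579,
               -0.103885903, -6.648844687, -5.041554127),
  gene     = rep(1:3, times = 3)
)

# Mean expression per timepoint
means <- aggregate(value ~ variable, data = df2, FUN = mean)

plotC <- ggplot(df2, aes(x = variable, y = value)) +
  # one line per gene, coloured on the continuous gene scale
  geom_line(aes(group = gene, colour = gene), alpha = 0.4) +
  # mean line: a constant colour, set outside aes()
  geom_line(data = means, colour = "red", linewidth = 0.75)
```

Because x and y are set in ggplot() and means uses the same column names, the second layer needs no aes() of its own. If the mean line still appears off to the side with the full data, it may be worth checking that variable has the same type (numeric or factor) in both data frames.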

Preventing inconsistent spacing/bar widths in geom_bar with many bars

In a bar plot with lots of bars, the spacing between bars and/or the widths of the bars become inconsistent, and they also change when the width of the plot changes.
set.seed(23511)
dat <- data.frame(x = 1:540, y = rnorm(540))
library(ggplot2)
ggplot(dat) +
geom_bar(aes(x = x, y = y), stat = "identity")
Is there a way to solve this? I tried playing with width and the overall plot size to no avail.
In response to alistaire's comment here's a screenshot of the first few bars from RStudio. Looking at the first 10 values..
    x          y
1   1  0.9450960
2   2  0.9277378
3   3  0.4371033
4   4 -1.0333073
5   5  2.0473397
6   6  0.8174123
7   7  0.4277842
8   8 -0.4336887
9   9  0.2156801
10 10  0.4918345
.. to me it clearly looks like for the first 3 positive values there's space between the bars/the bars are narrower than for the second set of 3 positive values where there's no space between the bars/the bars are wider.
I think this is a pixel issue. If the x of a bar goes from 1.5 to 2.7 pixels, it will be one pixel wide, if it goes from 1.9 to 3.1 (same width) it will be 2 pixels wide.
You could do lines instead of bars.
ggplot(data=dat, aes(x=x, y=y)) +
geom_segment(aes(xend=x, yend=0), size = 0.6)
I think you still sometimes run into pixel issues, but it's maybe easier to control with size.
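The pixel argument can be illustrated with a deliberately simplified model in which a bar paints every whole pixel position that lies between its left and right edge. This is an assumption for illustration, not how any particular graphics device rasterises. The 700-pixel panel width and the helper name px_covered are hypothetical; the 0.9 is geom_bar's default bar width relative to the spacing of the data.

```r
# Simplified model: pixels sit at integer positions, and a bar covers
# every integer between its left and right edge (inclusive)
px_covered <- function(lo, hi) floor(hi) - ceiling(lo) + 1

# The two cases from the text: same width (1.2 px), different pixel counts
px_covered(1.5, 2.7)
px_covered(1.9, 3.1)

n_bars      <- 540            # bars in the question
plot_px     <- 700            # hypothetical panel width in pixels
bar_frac    <- 0.9            # geom_bar's default width
px_per_unit <- plot_px / n_bars

x     <- seq_len(n_bars)
left  <- (x - bar_frac / 2) * px_per_unit
right <- (x + bar_frac / 2) * px_per_unit

# Every bar is about 1.17 px wide, yet the painted widths differ
px_width <- px_covered(left, right)
table(px_width)
```

Under this model, making the panel wide enough that each bar spans several pixels makes the relative difference between bars much smaller, which matches the observation that the pattern changes with the plot width.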

Creating a Bar Plot with Proportions on ggplot

I'm trying to create a bar graph on ggplot that has proportions rather than counts, and I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)) but I'm not sure what either of the counts refer to. I tried putting in the data but it didn't seem to work. What should I input here?
This is the data I'm using
> describe(topprob1)
topprob1
      n missing  unique    Info    Mean
    500       0       9    0.93   3.908
            1   2   3   4   5   6   7   8   9
Frequency 128 105   9  15  13 172  39  12   7
%          26  21   2   3   3  34   8   2   1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count values.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
You can also use the ..prop.. computed variable together with the group aesthetic:
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
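In newer ggplot2 versions (3.3.0 and later) the ..count.. notation is superseded by after_stat(). A sketch of the same two plots in that style, assuming a recent ggplot2 and scales: count is the per-bar tally computed by the stat and sum(count) is the total over all bars, so nothing from the data needs to be filled in by hand.

```r
library(ggplot2)

# Share of observations per bar: count / sum(count), both computed by the stat
p_share <- ggplot(mtcars, aes(factor(am))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "am", y = "share")

# Same result via the computed variable prop; group = 1 makes the
# proportions relative to the whole data set rather than to each bar
p_prop <- ggplot(mtcars, aes(factor(am))) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "am", y = "share")
```

For the asker's data the same pattern would apply with the data frame and column in place of mtcars and am (the object c in the question presumably already holds that ggplot call).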

Multiple Plots in R

I want to plot 2 graphs in 1 frame. Basically I want to compare the results.
Anyways, the code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I DO get 2 plots in a frame BUT it changes my y-axis. Added my plot below. Anyways, y-axis values are -4,-4,-3,-3,...
It's like both of the plots display their own axis.
Please help.
Thanks
You don't need the second plot. Just use
> plot(male,pch=16,col="red")
> lines(male, pch=16, col = "red")
> lines(female, pch=16, col = "green")
> points(female, pch=16, col = "green")
Note: that will set the frame boundaries based on the first data set, so some data from the second plot could be outside the boundaries of the plot. You can fix it by e.g. setting the limits of the first plot yourself.
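A minimal sketch of setting the limits yourself, assuming male and female are plain numeric vectors (simulated here, since the original data were not posted):

```r
set.seed(1)
male   <- rnorm(10, mean = 1)   # stand-in data
female <- rnorm(10, mean = 1)   # stand-in data

# y limits that cover both series
ylim <- range(male, female)

pdf(NULL)  # null device so that no plot file is written
plot(male, type = "b", pch = 16, col = "red", ylim = ylim, ylab = "value")
lines(female, type = "b", pch = 16, col = "green")
dev.off()
```

Here type = "b" draws points and connecting lines in one call, so the separate points() and lines() calls are not needed.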
For this kind of plot I usually like the plotting with ggplot2 much better. The main reason: It generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not available as a data.frame, which is required for ggplot2. Furthermore, you always need an x-variable to plot against. So let us first create a data.frame out of your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
   index          vals  group
1      1 -0.4334269341   male
2      2  0.8829902521   male
3      3 -0.6052638138   male
4      4  0.2270191965   male
5      5  3.5123679143   male
6      6  0.0615821014   male
7      7  3.6280155376   male
8      8  2.3508890457   male
9      9  2.9824432680   male
10    10  1.1938052833   male
11     1  1.3151289227 female
12     2  1.9956491556 female
13     3  0.8229389822 female
14     4  1.2062726250 female
15     5  0.6633392820 female
16     6  1.1331669670 female
17     7 -0.9002109636 female
18     8  3.2137052284 female
19     9  0.3113656610 female
20    10  1.4664434215 female
Note that my command assumes you have 10 data values each. That command would have to be adjusted according to your actual data.
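If the vectors do not have exactly 10 values each, the same data frame can be built without hard-coding the length. A small sketch, where to_long is a hypothetical helper name and the input vectors are made up for illustration:

```r
# Hypothetical helper: stack any number of named numeric vectors
# into the index / vals / group layout used above
to_long <- function(...) {
  series <- list(...)
  data.frame(
    index = unlist(lapply(series, seq_along), use.names = FALSE),
    vals  = unlist(series, use.names = FALSE),
    group = rep(names(series), times = lengths(series))
  )
}

# Vectors of different lengths work as well
dat_long <- to_long(male = c(0.2, 1.5, -0.3), female = c(1.1, 0.4))
dat_long
```

The result can be passed to the ggplot call in place of dat, and a further group only needs another named argument.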
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements: ggplot initializes the plot, tells R to use dat as datasource and defines the plot aesthetics, or better: Which aesthetic properties of the plot (such as color, position, size, etc.) are influenced by your data. We use the x and y-values as expected and furthermore set the color aesthetic to the grouping variable - that makes ggplot automatically plot two groups with different colors. Finally, we add two geometries, that pretty much do what is written above: Draw lines and draw points.
The result:
If you have your data saved in the standard way in R (in a data.frame), you end up with one line of code. And if, after a few thousand years of evolution, you want to add another gender, it is still one line of code.

Resources