I have created a barchart but the bars from the two data sets are overlying one another. I was wondering if anyone could help me separate the bars of the two data sets so they are sitting side by side rather than overlapping. Both of the categories for the x axis are exactly the same. Here is my code:
h.length.category <- sabdata.dat[,"H_Length_Category"]
h.length.sum <- sabdata.dat[,"H_Length_Sum"]
v.length.category <- sabdata.dat[,"V_Length_Category"]
v.length.sum <- sabdata.dat[,"V_Length_Sum"]
hum.len <- tapply(h.length.sum, list(h.length.category), sum)
ven.len <- tapply(v.length.sum, list(v.length.category), sum)
barplot(hum.len, ylim = c(0,80), las = 2, xlab = "Length (mm)", ylab = "Number of individuals", col = "dark grey")
par(new=T)
barplot(ven.len, ylim = c(0,80), las = 2, xlab = "", ylab = "", axes = F, col = "light grey")
par(new=F)
Here's a subset of the data:
H_Length_Category H_Length_Sum V_Length_Category V_Length_Sum
08-09.9 0 08-09.9 1
10-11.9 0 10-11.9 10
12-13.9 3 12-13.9 31
14-15.9 12 14-15.9 58
16-17.9 30 16-17.9 66
18-19.9 35 18-19.9 77
20-21.9 62 20-21.9 64
22-23.9 63 22-23.9 41
I think what's happening, without having seen your data, is that you're trying to overlay two different plots on the same set of axes. As a result, the second plot covers up whatever was plotted first:
#make up some data
x <- c(10, 11, 12, 16)
y <- c(9, 12, 10, 13)
barplot(x)
barplot(y, col = "yellow", add = T) #The add statement is effectively the same as what you coded above
However, if all of your data is in one matrix,
dF <- as.matrix(cbind(x, y))
barplot(dF, beside = T)
The result is probably much closer to what you're looking for. Depending upon your data and how you want to present it, you may have to determine how your matrix is formatted to display what you want to display.
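Applied to the subset of data posted in the question, the same idea might look like the sketch below. It is only an illustration: the counts are typed in from the posted subset, and the row labels "H" and "V" are placeholders, since the question does not say what the two series represent.

```r
# Length categories and counts typed in from the subset shown in the question
len_cat <- c("08-09.9", "10-11.9", "12-13.9", "14-15.9",
             "16-17.9", "18-19.9", "20-21.9", "22-23.9")
h_sum <- c(0, 0, 3, 12, 30, 35, 62, 63)
v_sum <- c(1, 10, 31, 58, 66, 77, 64, 41)

# One row per data set, one column per length category
# ("H" and "V" are placeholder labels)
counts <- rbind(H = h_sum, V = v_sum)
colnames(counts) <- len_cat

# beside = TRUE draws the rows of each column next to each other
bar_mids <- barplot(counts, beside = TRUE, ylim = c(0, 80), las = 2,
                    ylab = "Number of individuals",
                    col = c("dark grey", "light grey"),
                    legend.text = TRUE, args.legend = list(x = "topleft"))
```

Note the orientation: barplot() groups bars by column, so with cbind(x, y) each data set forms its own group, whereas rbind() (or t() of the cbind result) puts the two data sets side by side within each category.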
I have grouped data, where every one of the bars has a different sample size, ranging from 0 samples to >600. I would like to have 2 more panels of this same graph for different data, and that would make it very crowded/hard to read if I simply wrote the sample size above each of the bars.
I decided to make a second axis and plot sample size as a dot plot over the bar chart. However, I can't get the dots to align over the bars. I've tried adjusting the width of the bars, the spacing between grouped bars, and the spacing between the sets of bars. The spacing set for the dot plot should be the same as these widths/spaces (see verts), but it evidently isn't (see photo linked below). Does anyone have an idea of what is going wrong? Is there any fix, or should I move on to trying a different way to communicate the sample sizes?
Here is a pared-down version of the code I am using to draw the figure and a picture of what it looks like right now.
#From https://statisticsglobe.com/r-draw-plot-with-two-y-axes
par(mar = c(5, 4, 4, 4) + 0.3) # Additional space for second y-axis
barplot(t(mxAe), beside=T,
space=c(0,0.75), width=c(0.75,0.75), # Spacing of bars
las=2, col= c("#DDCC77", "#44AA99") ,
ylim=c(0,100) ,
xlim=c(0.5,45),
main="")
par(new = TRUE) # Add new plot
plot(x=mxAe2$place,y=mxAe2$Tot, pch = 16,
cex= 0.5, col = 1, axes = FALSE,
xlab = "", ylab = "") # Create second plot without axes
axis(side = 4, at = pretty(range(0,800))) # Add second axis
verts <- c(1,1.75,3,3.75,5,5.75,7,7.75,9,9.75,11,11.75,
13, 13.75,15,15.75,17,17.75, 19, 19.75,21,21.75,
23,23.75,25,25.75,27,27.75,29,29.75,31,31.75,
33,33.75,35,35.75,37,37.75,39,39.75,41,41.75,43,43.75) #Position of dots
abline(v=verts, col="gray30", lty=3) # Add vertical lines along dot plot points
Reproducible code:
df_mxAe <- data.frame(group1 = c(9,0,30), group2 = c(5,20,90))
dotx <- c(1.375,2.125,3.625,4.375,5.875,6.625)
doty <- c(200, 400, 0, 600, 50, 100)
par(mar = c(5, 4, 4, 4) + 0.3) # Additional space for second y-axis
barplot(t(df_mxAe), beside=T, space=c(0,1), width=c(0.75,0.75),
las=2 ,
col= c("#DDCC77", "#44AA99") ,
ylim=c(0,100),
xlim=c(0.5,6.625),
main="") # Create first plot
par(new = TRUE) # Add new plot
plot(x=dotx,y=doty, pch = 18,
cex= 0.5, col = 1, axes = FALSE, xlim=c(0.5,6.625),
xlab = "", ylab = "") # Create second plot without axes
axis(side = 4, at = pretty(range(0,800))) # Add second axis
abline(v=dotx, col="gray30", lty=3) # Add vertical lines along dot plot points
I think that given your setup, your verts should maybe look more like this:
verts <- NULL
k <- 1
for(i in 1:22){
  x <- c(.375, .375+.75)
  verts <- c(verts, k+x)
  k <- max(verts) + .375 + .75
}
verts
# [1] 1.375 2.125 3.625 4.375 5.875 6.625 8.125 8.875 10.375 11.125 12.625 13.375 14.875 15.625 17.125 17.875 19.375 20.125 21.625 22.375
# [21] 23.875 24.625 26.125 26.875 28.375 29.125 30.625 31.375 32.875 33.625 35.125 35.875 37.375 38.125 39.625 40.375 41.875 42.625 44.125 44.875
# [41] 46.375 47.125 48.625 49.375
Since the first bar starts at 1 and has a width of .75, you want the line to be half-way between the start and end of the bar, which would be 1.375. The second bar starts at 1.75 and goes to 2.5. Again, half-way between those two numbers is 2.125. After the second bar ends at 2.5 there is a .75 space, which means the third bar (first in the second group) starts at 2.5+.75 = 3.25. So, the line through the third bar should be at 3.25 + .375 = 3.625, etc...
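Instead of working the positions out by hand, it may be easier to let barplot() report them: it returns the midpoint of every bar it draws. Hand computation is easy to get wrong because `space` is interpreted as a fraction of the average bar width rather than in axis units. The sketch below uses the reproducible data from the question; `xlim_shared` and the cleaned-up column names are assumptions made for illustration, and it relies on both plots using the same `xlim` and the default x-axis style.

```r
# Reproducible data from the question (column names are illustrative)
df_mxAe <- data.frame(group1 = c(9, 0, 30), group2 = c(5, 20, 90))
doty <- c(200, 400, 0, 600, 50, 100)
xlim_shared <- c(0.5, 7)   # assumed limits, reused for both plots

par(mar = c(5, 4, 4, 4) + 0.3)
# barplot() returns the midpoint of every bar (a 2 x 3 matrix here)
bar_mids <- barplot(t(df_mxAe), beside = TRUE, space = c(0, 1),
                    width = 0.75, las = 2, col = c("#DDCC77", "#44AA99"),
                    ylim = c(0, 100), xlim = xlim_shared)

par(new = TRUE)
# Same xlim as the bar chart, so both plots share one x coordinate system
plot(x = as.vector(bar_mids), y = doty, pch = 18, cex = 0.5,
     axes = FALSE, xlab = "", ylab = "",
     xlim = xlim_shared, ylim = c(0, 800))
axis(side = 4, at = pretty(c(0, 800)))
abline(v = as.vector(bar_mids), col = "gray30", lty = 3)
```

Because the dots are placed at the returned midpoints, they should stay centered on the bars even if `width` or `space` is changed later.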
My current plot:
My desired plot (never mind the variables):
Specifically: explanatory variables on the bottom with an x-axis, response variables on the right, relative frequency and the y-axis on the left. I'll attach my R code below.
mosaictable <- matrix (c (3, 9, 22, 21), byrow = T, ncol = 2)
rownames (mosaictable) = c ("White", "Blue")
colnames (mosaictable) = c ("Captured", "Not Captured")
mosaicplot ((mosaictable), sub = "Pigeon Color", ylab = "Relative frequency",
col = c ("firebrick", "goldenrod1"), font = 2, main = "Mosaic Plot of Pigeon Color and Their Capture Rate"
)
axis (1)
axis (4)
This particular flavor of mosaic display where you have a "dependent" variable on the y-axis and want to add corresponding annotation, is sometimes also called a "spine plot". R implements this in the spineplot() function. Also plot(y ~ x) internally calls spineplot() when both y and x are categorical.
In your case, spineplot() does almost everything you want automatically provided that you supply it with a nicely formatted "table" object:
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
"Pigeon Color" = c("White", "Blue"),
"Relative Frequency" = c("Captured", "Not Captured")
)
tab
##             Relative Frequency
## Pigeon Color Captured Not Captured
##        White        3            9
##        Blue        22           21
And then you get:
spineplot(tab)
Personally, I would leave it at that. But if it is really important to switch the axis labels from left to right and vice versa, then you can do so by first suppressing the default axes with axes = FALSE and then adding them manually afterwards. The coordinates for that need to be obtained from the marginal distribution of the first variable and the conditional distribution of the second variable given the first, respectively:
x <- prop.table(margin.table(tab, 1))
y <- prop.table(tab, 1)[2, ]
spineplot(tab, col = c("firebrick", "goldenrod1"), axes = FALSE)
axis(1, at = c(0, x[1]) + x/2, labels = rownames(tab), tick = FALSE)
axis(2)
axis(4, at = c(0, y[1]) + y/2, labels = colnames(tab), tick = FALSE)
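To make the tick arithmetic concrete, here is a small sketch that only computes the label positions for the pigeon table, without drawing anything. The helper names `x_at` and `y_at` are introduced here for illustration. Each position is the left (or lower) edge of a segment plus half its width (or height).

```r
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
  "Pigeon Color" = c("White", "Blue"),
  "Relative Frequency" = c("Captured", "Not Captured")
)

# Marginal share of each colour: the widths of the two bars
x <- as.vector(prop.table(margin.table(tab, 1)))
# Capture shares within the last colour (Blue): the segment
# heights that sit next to the right-hand axis
y <- as.vector(prop.table(tab, 1)[2, ])

# Midpoint = starting edge + half the extent
x_at <- c(0, x[1]) + x / 2
y_at <- c(0, y[1]) + y / 2
print(round(x_at, 3))
print(round(y_at, 3))
```

spineplot() leaves a small gap between the bars (controlled by its `off` argument), so the x positions computed this way are close to, but not exactly, the bar centers.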
I am struggling with something that, I believe, should be pretty straightforward in R.
Please consider the following example:
library(tidyverse)  # attaches dplyr, tibble and ggplot2
time = c('2013-01-03 22:04:21.549', '2013-01-03 22:04:22.349', '2013-01-03 22:04:23.559', '2013-01-03 22:04:25.559' )
value1 = c(1,2,3,4)
value2 = c(400,500,444,210)
data <- tibble(time, value1, value2)  # data_frame() is deprecated in favour of tibble()
data <- data %>% mutate(time = as.POSIXct(time))
> data
# A tibble: 4 × 3
  time                value1 value2
  <dttm>               <dbl>  <dbl>
1 2013-01-03 22:04:21      1    400
2 2013-01-03 22:04:22      2    500
3 2013-01-03 22:04:23      3    444
4 2013-01-03 22:04:25      4    210
My problem is simple:
I want to plot value1 AND value2 on the SAME chart with TWO different Y axes.
Indeed, as you can see in the example, the units are largely different between the two variables so using just one axis would compress one of the time series.
Surprisingly, getting a nice looking chart for this problem has proven to be very difficult. I am mad (of course, not really mad. Just puzzled ;)).
In Python Pandas, one could simply use:
data.set_index('time', inplace = True)
data[['value1', 'value2']].plot(secondary_y = 'value2')
In Stata, one could simply say:
twoway (line value1 time, sort ) (line value2 time, sort)
In R, I don't know how to do it. Am I missing something here? Base R, ggplot2, some weird package, any working solution with decent customization options would be fine here.
A base R hack that may answer your need. I'll go out of my way to make it clear which color (blue vs. red) belongs to which series and axis. It's ugly, but it demonstrates the requisite points. Using your data:
# making sure the left and right sides have the same space
par(mar = c(4,4,1,4) + 0.1)
# first plot
plot(value1 ~ time, data = data, pch = 16, col = "blue", las = 1,
col.axis = "blue", col.lab = "blue")
grid(lty = 1, col = "blue")
# "reset" the whole plot for an overlay
par(fig = c(0,1,0,1), new = TRUE)
# second plot, sans axes and other annotation
plot(value2 ~ time, data = data, pch = 16, col = "red",
axes = FALSE, ann = FALSE)
grid(lty = 3, col = "red")
# add the right-axis and label
axis(side = 4, las = 1, col.axis = "red")
mtext("value2", side = 4, line = 3, col = "red")
I added the grids to highlight an aesthetic issue: they don't align "neatly". If you're okay with that, feel free to stop now.
Here's one method (which has not been tested with significantly-different data ranges). (There are most certainly other methods depending on your data and your preferences.)
# one way that may "normalize" the y-axes for you, so that the grid should be identical
y1 <- pretty(data$value1)
y1n <- length(y1)
y2 <- pretty(data$value2)
y2n <- length(y2)
if (y1n < y2n) {
  y1 <- c(y1, y1[y1n] + diff(y1)[1])
} else if (y1n > y2n) {
  y2 <- c(y2, y2[y2n] + diff(y2)[1])
}
And the ensuing plot, adding ylim=range(...):
# making sure the left and right sides have the same space
par(mar = c(4,4,1,4) + 0.1)
# first plot
plot(value1 ~ time, data = data, pch = 16, col = "blue", las = 1, ylim = range(y1),
col.axis = "blue", col.lab = "blue")
grid(lty = 1, col = "blue")
# "reset" the whole plot for an overlay
par(fig = c(0,1,0,1), new = TRUE)
# second plot, sans axes and other annotation
plot(value2 ~ time, data = data, pch = 16, col = "red", ylim = range(y2),
axes = FALSE, ann = FALSE)
grid(lty = 3, col = "red")
# add the right-axis and label
axis(side = 4, las = 1, col.axis = "red")
mtext("value2", side = 4, line = 3, col = "red")
(Though the red-blue alternating grid lines are atrocious, they demonstrate that the grids do in fact align well.)
NB: the use of par(fig = c(0,1,0,1), new = TRUE) is a bit fragile. Changing margins or making other significant changes between the two plots can easily break the overlay, and you won't really know unless you do some manual work to see how the additive process actually pans out. In this "check" process, you will likely want to remove axes = FALSE, ann = FALSE from the second plot in order to confirm that at least the boxes and x-axis are aligning as intended.
Version 2.2.0 of ggplot2 allows you to define a secondary axis. The second time series can then be scaled appropriately and displayed in the same chart:
data %>%
  mutate(value2 = value2 / 100) %>%   # scale value2
  gather(variable, value, -time) %>%  # reshape wide to long
  ggplot(aes(time, value, colour = variable)) +
  geom_point() + geom_line() +
  scale_y_continuous(name = "value1", sec.axis = sec_axis(~ . * 100, name = "value2"))
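The same rescaling idea can be sketched in base R without overlaying a second plot: divide the second series by a factor, draw it in the first series' coordinates, and label the right axis in the original units. This is only a sketch; the rule used to pick the factor `k` (largest value2 mapped onto largest value1) and the fixed UTC time zone are assumptions made here, not something taken from the answers above.

```r
# Toy data copied from the question
time <- as.POSIXct(c("2013-01-03 22:04:21", "2013-01-03 22:04:22",
                     "2013-01-03 22:04:23", "2013-01-03 22:04:25"),
                   tz = "UTC")
value1 <- c(1, 2, 3, 4)
value2 <- c(400, 500, 444, 210)

# Assumed scaling rule: map the largest value2 onto the largest value1
k <- max(value2) / max(value1)
value2_scaled <- value2 / k

par(mar = c(4, 4, 1, 4) + 0.1)
plot(time, value1, type = "b", pch = 16, col = "blue", las = 1,
     ylim = range(0, value1, value2_scaled), ylab = "value1")
lines(time, value2_scaled, type = "b", pch = 16, col = "red")

# Right axis: ticks chosen in value2 units, positioned in value1 units
ticks2 <- pretty(c(0, value2))
axis(side = 4, at = ticks2 / k, labels = ticks2, las = 1, col.axis = "red")
mtext("value2", side = 4, line = 3, col = "red")
```

Since everything is drawn in a single coordinate system, this variant does not depend on par(new = TRUE) keeping two plots aligned.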
I'd like to make a histogram of my variable "sex" with the values 1 = male and 2 = female. My code works properly, but I'd like to have only the values 1 and 2 on the x-axis (at the moment R also prints intermediate values in steps, which makes little sense in the case of sex).
hist(g1_sex,
     main = "Häufigkeitsverteilung Geschlecht",
     sub = "1 = männlich, 2 = weiblich",
     xlab = "Geschlecht",
     ylab = "Häufigkeit",
     ylim = c(0,120),
     col = "lightblue",
     labels = TRUE,
     breaks = 2)
I already tried to do it with
breaks = seq (1,2,1)
but this doesn't look nice either.
I would be very thankful for any hints!
Best wishes!
I think you really want barplot. See examples:
set.seed(0); x <- rbinom(500, 1, 0.3) ## generate toy 0-1 data
y <- table(x) ## make contingency table
names(y) <- c("male", "female")
ylim = c(0, 1.2 * max(y)) ## set plotting range
z <- barplot(y, space = 0, col = 5, main = "statistics", ylim = ylim)
text(z, y + 20, y, cex = 2, col = 5) ## add count number above each bar
I have also shown how to add the count above each bar: leave extra space at the top by setting ylim, then use text() to place the labels.
Note that barplot also accepts main, etc, so you can add other annotations if you want.
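For data coded as 1 and 2, as in the question, one possible variant is to turn the codes into a labelled factor before tabulating, so the bar labels come from the data rather than from renaming the table afterwards. The vector `g1_sex` below is simulated as a hypothetical stand-in for the asker's variable.

```r
# Hypothetical stand-in for g1_sex: codes 1 = male, 2 = female
set.seed(1)
g1_sex <- sample(c(1, 2), size = 150, replace = TRUE)

# Attach labels to the codes explicitly, so the order cannot get mixed up
sex <- factor(g1_sex, levels = c(1, 2), labels = c("male", "female"))
counts <- table(sex)

# barplot() returns the bar midpoints, which text() uses for placement
mids <- barplot(counts, col = "lightblue", xlab = "Sex", ylab = "Frequency",
                ylim = c(0, 1.2 * max(counts)))
text(mids, as.vector(counts), labels = as.vector(counts), pos = 3)
```

Using pos = 3 places each count just above its bar, so no manual offset has to be tuned to the scale of the data.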