I'd like to make a histogram of my variable "sex", coded 1 = male and 2 = female. My code works, but I'd like to have only the values 1 and 2 on the x-axis (at the moment R prints all values between 1 and 2 in steps, which makes little sense for sex).
hist(g1_sex,
main = "Häufigkeitsverteilung Geschlecht",
sub = "1 = männlich, 2 = weiblich",
xlab = "Geschlecht",
ylab ="Häufigkeit",
ylim = c(0,120),
col = "lightblue",
labels = TRUE,
breaks=2)
I already tried to do it with
breaks = seq (1,2,1)
but this doesn't look nice either.
I would be very grateful for any hints!
Best wishes!
I think what you really want is barplot(). See this example:
set.seed(0); x <- rbinom(500, 1, 0.3) ## generate toy 0-1 data
y <- table(x) ## make contingency table
names(y) <- c("male", "female")
ylim = c(0, 1.2 * max(y)) ## set plotting range
z <- barplot(y, space = 0, col = 5, main = "statistics", ylim = ylim)
text(z, y + 20, y, cex = 2, col = 5) ## add count number above each bar
I have also shown how to add the count above each bar: ylim leaves extra space at the top, and text() places the labels.
Note that barplot() also accepts main and similar arguments, so you can add other annotations if you want.
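Applied to a variable coded 1/2 as in the question, the same idea might look like the sketch below. The vector g1_sex is simulated here as a stand-in for the real data (an assumption), and factor() attaches the labels so that the x-axis shows only the two categories.

```r
# Simulated stand-in for the asker's g1_sex (assumption): 1 = male, 2 = female
set.seed(1)
g1_sex <- sample(c(1, 2), size = 150, replace = TRUE)

# Turn the numeric codes into a labelled factor, then count each category
sex <- factor(g1_sex, levels = c(1, 2), labels = c("male", "female"))
counts <- table(sex)

# barplot() returns the bar midpoints, which text() uses to place the counts
mids <- barplot(counts, col = "lightblue", xlab = "Sex", ylab = "Frequency",
                main = "Frequency distribution of sex",
                ylim = c(0, 1.2 * max(counts)))
text(mids, as.vector(counts), labels = as.vector(counts), pos = 3)
```

Because the labels come from the factor, no numeric axis between the two codes is drawn at all.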
My current plot:
My desired plot (never mind the variables):
Specifically: explanatory variables on the bottom with an x-axis, response variables on the right, relative frequency and the y-axis on the left. I'll attach my R code below.
mosaictable <- matrix (c (3, 9, 22, 21), byrow = T, ncol = 2)
rownames (mosaictable) = c ("White", "Blue ")
colnames (mosaictable) = c ("Captured", "Not Captured")
mosaicplot ((mosaictable), sub = "Pigeon Color", ylab = "Relative frequency",
col = c ("firebrick", "goldenrod1"), font = 2, main = "Mosaic Plot of Pigeon Color and Their Capture Rate"
)
axis (1)
axis (4)
This particular flavor of mosaic display where you have a "dependent" variable on the y-axis and want to add corresponding annotation, is sometimes also called a "spine plot". R implements this in the spineplot() function. Also plot(y ~ x) internally calls spineplot() when both y and x are categorical.
In your case, spineplot() does almost everything you want automatically provided that you supply it with a nicely formatted "table" object:
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
"Pigeon Color" = c("White", "Blue"),
"Relative Frequency" = c("Captured", "Not Captured")
)
tab
## Relative Frequency
## Pigeon Color Captured Not Captured
## White 3 9
## Blue 22 21
And then you get:
spineplot(tab)
Personally, I would leave it at that. But if it is really important to switch the axis labels from left to right and vice versa, you can do so by first suppressing the axes with axes = FALSE and then adding them manually afterwards. The coordinates for the labels need to be obtained from the marginal distribution of the first variable and the conditional distribution of the second variable given the first, respectively:
x <- prop.table(margin.table(tab, 1))
y <- prop.table(tab, 1)[2, ]
spineplot(tab, col = c("firebrick", "goldenrod1"), axes = FALSE)
axis(1, at = c(0, x[1]) + x/2, labels = rownames(tab), tick = FALSE)
axis(2)
axis(4, at = c(0, y[1]) + y/2, labels = colnames(tab), tick = FALSE)
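To make the label coordinates less mysterious, here is a small sketch that computes the same midpoints without plotting. The names x_mid and y_mid are introduced only for illustration; the cumsum() form is an equivalent way of writing c(0, x[1]) + x/2 that also works for more than two categories.

```r
# Same table as above: rows = pigeon color, columns = capture status
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
  "Pigeon Color" = c("White", "Blue"),
  "Relative Frequency" = c("Captured", "Not Captured")
)

# Marginal distribution of color: these are the relative bar widths
x <- as.vector(prop.table(margin.table(tab, 1)))
# Conditional distribution of capture status within the last (rightmost) bar
y <- as.vector(prop.table(tab, 1)[2, ])

# Midpoint of each segment = its upper edge minus half its size
x_mid <- cumsum(x) - x / 2
y_mid <- cumsum(y) - y / 2
names(x_mid) <- rownames(tab)
names(y_mid) <- colnames(tab)

print(round(x_mid, 3))
print(round(y_mid, 3))
```

The right-hand labels use the second row of the conditional table because axis 4 sits next to the last bar, so its segment heights are the ones that line up with the labels.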
I am struggling with something that, I believe, should be pretty straightforward in R.
Please consider the following example:
library(dplyr)
library(tidyverse)
time = c('2013-01-03 22:04:21.549', '2013-01-03 22:04:22.349', '2013-01-03 22:04:23.559', '2013-01-03 22:04:25.559' )
value1 = c(1,2,3,4)
value2 = c(400,500,444,210)
data <- tibble(time, value1, value2) # data_frame() is deprecated in favour of tibble()
data <-data %>% mutate(time = as.POSIXct(time))
> data
# A tibble: 4 × 3
time value1 value2
<dttm> <dbl> <dbl>
1 2013-01-03 22:04:21 1 400
2 2013-01-03 22:04:22 2 500
3 2013-01-03 22:04:23 3 444
4 2013-01-03 22:04:25 4 210
My problem is simple:
I want to plot value1 AND value2 on the SAME chart with TWO different Y axis.
Indeed, as you can see in the example, the units are largely different between the two variables so using just one axis would compress one of the time series.
Surprisingly, getting a nice looking chart for this problem has proven to be very difficult. I am mad (of course, not really mad. Just puzzled ;)).
In Python Pandas, one could simply use:
data.set_index('time', inplace = True)
data[['value1', 'value2']].plot(secondary_y = 'value2')
In Stata, one could simply say:
twoway (line value1 time, sort ) (line value2 time, sort)
In R, I don't know how to do it. Am I missing something here? Base R, ggplot2, some weird package, any working solution with decent customization options would be fine here.
A base R hack that may meet your need. I'll go out of my way to make it clear which elements of the plot (blue vs. red) belong to which series. It's ugly, but it demonstrates the requisite points. Using your data:
# making sure the left and right sides have the same space
par(mar = c(4,4,1,4) + 0.1)
# first plot
plot(value1 ~ time, data = data, pch = 16, col = "blue", las = 1,
col.axis = "blue", col.lab = "blue")
grid(lty = 1, col = "blue")
# "reset" the whole plot for an overlay
par(fig = c(0,1,0,1), new = TRUE)
# second plot, sans axes and other annotation
plot(value2 ~ time, data = data, pch = 16, col = "red",
axes = FALSE, ann = FALSE)
grid(lty = 3, col = "red")
# add the right-axis and label
axis(side = 4, las = 1, col.axis = "red")
mtext("value2", side = 4, line = 3, col = "red")
I added the grids to highlight an aesthetic issue: they don't align "neatly". If you're okay with that, feel free to stop now.
Here's one method (which has not been tested with significantly-different data ranges). (There are most certainly other methods depending on your data and your preferences.)
# one way that may "normalize" the y-axes for you, so that the grid should be identical
y1 <- pretty(data$value1)
y1n <- length(y1)
y2 <- pretty(data$value2)
y2n <- length(y2)
if (y1n < y2n) {
y1 <- c(y1, y1[y1n] + diff(y1)[1])
} else if (y1n > y2n) {
y2 <- c(y2, y2[y2n] + diff(y2)[1])
}
And the ensuing plot, adding ylim=range(...):
# making sure the left and right sides have the same space
par(mar = c(4,4,1,4) + 0.1)
# first plot
plot(value1 ~ time, data = data, pch = 16, col = "blue", las = 1, ylim = range(y1),
col.axis = "blue", col.lab = "blue")
grid(lty = 1, col = "blue")
# "reset" the whole plot for an overlay
par(fig = c(0,1,0,1), new = TRUE)
# second plot, sans axes and other annotation
plot(value2 ~ time, data = data, pch = 16, col = "red", ylim = range(y2),
axes = FALSE, ann = FALSE)
grid(lty = 3, col = "red")
# add the right-axis and label
axis(side = 4, las = 1, col.axis = "red")
mtext("value2", side = 4, line = 3, col = "red")
(Though the red-blue alternating grid lines are atrocious, they demonstrate that the grids do in fact align well.)
NB: the use of par(fig = c(0,1,0,1), new = TRUE) is a bit fragile. Doing things like changing margins or other significant changes between plots can easily break the overlay, and you won't really know unless you do some manual work to see how the additive process actually pans out. In this "check" process, you will likely want to remove axes=F, ann=F from the second plot in order to confirm that at least the boxes and x-axis are aligning as intended.
Version 2.2.0 of ggplot2 allows you to define a secondary axis. The second time series can now be scaled appropriately and displayed in the same chart:
data %>%
mutate(value2 = value2 / 100) %>% # scale value2
gather(variable, value, -time) %>% # reshape wide to long
ggplot(aes(time, value, colour = variable)) +
geom_point() + geom_line() +
scale_y_continuous(name = "value1", sec.axis = sec_axis(~ . * 100, name = "value2"))
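The factor of 100 above is picked by eye. A variation that derives the factor from the data might look like the following sketch; the name k and the choice of max(value2) / max(value1) are assumptions made for illustration, and other scalings are equally valid. The data frame is rebuilt so that the block runs on its own.

```r
library(ggplot2)

# The question's data, rebuilt as a plain data frame
data <- data.frame(
  time = as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:22.349",
                      "2013-01-03 22:04:23.559", "2013-01-03 22:04:25.559"),
                    tz = "UTC"),
  value1 = c(1, 2, 3, 4),
  value2 = c(400, 500, 444, 210)
)

# Scale factor that maps value2 onto the value1 range (one possible choice)
k <- max(data$value2) / max(data$value1)

p <- ggplot(data, aes(x = time)) +
  geom_line(aes(y = value1, colour = "value1")) +
  geom_line(aes(y = value2 / k, colour = "value2")) +   # squeeze value2
  scale_y_continuous(name = "value1",
                     sec.axis = sec_axis(~ . * k, name = "value2")) +  # undo it
  labs(colour = NULL)
print(p)
```

The secondary axis is only a relabelling of the primary one, so the division applied to the data and the multiplication inside sec_axis() must be exact inverses of each other.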
I have created a barchart but the bars from the two data sets are overlying one another. I was wondering if anyone could help me separate the bars of the two data sets so they are sitting side by side rather than overlapping. Both of the categories for the x axis are exactly the same. Here is my code:
h.length.category <- sabdata.dat[,"H_Length_Category"]
h.length.sum <- sabdata.dat[,"H_Length_Sum"]
v.length.category <- sabdata.dat[,"V_Length_Category"]
v.length.sum <- sabdata.dat[,"V_Length_Sum"]
hum.len <- tapply(h.length.sum, list(h.length.category), sum)
ven.len <- tapply(v.length.sum, list(v.length.category), sum)
barplot(hum.len, ylim = c(0,80), las = 2, xlab = "Length (mm)", ylab = "Number of individuals", col = "dark grey")
par(new=T)
barplot(ven.len, ylim = c(0,80), las = 2, xlab = "", ylab = "", axes = F, col = "light grey")
par(new=F)
Here's a subset of the data:
H_Length_Category H_Length_Sum V_Length_Category V_Length_Sum
08-09.9 0 08-09.9 1
10-11.9 0 10-11.9 10
12-13.9 3 12-13.9 31
14-15.9 12 14-15.9 58
16-17.9 30 16-17.9 66
18-19.9 35 18-19.9 77
20-21.9 62 20-21.9 64
22-23.9 63 22-23.9 41
I think what's happening (without having seen your data) is that you're overlaying two different plots on the same set of axes. As a result, the second plot covers up what was plotted first:
#make up some data
x <- c(10, 11, 12, 16)
y <- c(9, 12, 10, 13)
barplot(x)
barplot(y, col = "yellow", add = T) #The add statement is effectively the same as what you coded above
However, if all of your data is in one matrix,
dF <- as.matrix(cbind(x, y))
barplot(dF, beside = T)
The result is probably much closer to what you're looking for. Depending upon your data and how you want to present it, you may have to determine how your matrix is formatted to display what you want to display.
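For the data subset posted in the question, a sketch of this approach could look as follows. With beside = TRUE, barplot() draws one group of bars per matrix column, so the two series go into rows (rbind) and the length categories into columns. The object name counts and the legend placement are assumptions for illustration.

```r
# Length categories and summed counts, typed in from the posted subset
cats <- c("08-09.9", "10-11.9", "12-13.9", "14-15.9",
          "16-17.9", "18-19.9", "20-21.9", "22-23.9")
hum.len <- c(0, 0, 3, 12, 30, 35, 62, 63)
ven.len <- c(1, 10, 31, 58, 66, 77, 64, 41)

# One row per series, one column per category
counts <- rbind(hum.len, ven.len)
colnames(counts) <- cats

# Each category gets a pair of adjacent bars
mids <- barplot(counts, beside = TRUE, ylim = c(0, 80), las = 2,
                ylab = "Number of individuals",
                col = c("darkgrey", "lightgrey"),
                legend.text = TRUE, args.legend = list(x = "topleft"))
```

Using cbind() instead would group the bars by series rather than by length category, which is usually not what a side-by-side comparison needs.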
I have some data that I want to display graphically. Here's what it looks like:
data<- c(0.119197746, 0.054207788, 0.895580411, 0.64861727, 0.143249592,
0.284314897, 0.070027632, 0.297172433, 0.183569184, 0.713896071,
1.942425326, 1)
Using this command:
barplot(data, main="Ratio of Lipidated and Unlipidated LC3 I & II forms\nNormalized
to GAPDH", names.arg = c("PT250", "PT219", "PT165", "PT218", "PT244", "PT253", "PT279", "PT281",
"PT240", "PT262", "PT264", "CCD"), ylab = "Fold LC3 II/LC3I/GAPDH")
I produced this graph:
I would like to position the x-axis at 1 so that all values less than one appear as downward bars. I could achieve the desired effect by simply subtracting 1 from all of the values and plotting again, but this would make the numbers on the y-axis inaccurate. Is there some way to get R to plot values less than 1 as downward bars?
Solution with custom axis.
barplot(data - 1, main="Ratio of Lipidated and Unlipidated LC3 I & II forms\nNormalized
to GAPDH", names.arg = c("PT250", "PT219", "PT165", "PT218", "PT244", "PT253", "PT279", "PT281",
"PT240", "PT262", "PT264", "CCD"), ylab = "Fold LC3 II/LC3I/GAPDH",
axes = F, ylim = c(-1, 1)) # closing parenthesis of barplot() was missing
my_labs <- seq(-1, 1, by = 0.5)
axis(side = 2, at = my_labs, labels = my_labs + 1)
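An alternative sketch that avoids relabelling the axis by hand, assuming barplot()'s offset argument (which shifts the bars relative to the x-axis) suits your layout. The names ratios, samples and baseline are introduced here only for illustration.

```r
ratios <- c(0.119197746, 0.054207788, 0.895580411, 0.64861727, 0.143249592,
            0.284314897, 0.070027632, 0.297172433, 0.183569184, 0.713896071,
            1.942425326, 1)
samples <- c("PT250", "PT219", "PT165", "PT218", "PT244", "PT253", "PT279",
             "PT281", "PT240", "PT262", "PT264", "CCD")

# Bar heights are measured from the baseline, so subtract it first;
# offset then shifts every bar back up, so the y-axis shows the raw values
baseline <- 1
mids <- barplot(ratios - baseline, offset = baseline, names.arg = samples,
                las = 2, ylim = c(0, 2), ylab = "Fold LC3 II/LC3I/GAPDH")
abline(h = baseline)  # draw the reference line at 1
```

Here the axis ticks come straight from the plot's own coordinate system, so no manual labels are needed.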