R: Bar plot on a continuous x-axis (time-scaled)

I'm fairly new to R so please comment on anything you see.
I have data taken at different timepoints, under two conditions for one of the timepoints, and I want to plot this as a bar plot with error bars, with each bar placed at its appropriate timepoint.
I currently have this (stolen from another question on this site):
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
ggplot(example, aes(x = tp, y = means)) +
  # stat = "identity" plots the means as given instead of counting rows
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_errorbar(aes(ymin = means - std, ymax = means + std))
Now my timepoints are a factor, but the measurements are unevenly spaced in time, so the evenly spaced bars make the plot less nice.
This is how I imagine the graph:
I find the ggplot2 package can give you very nice graphs, but I have a lot more difficulty understanding it than I have with other R stuff.

Before we get into R, you have to realize that even in a bar plot the x-axis needs numeric values. If you treat the timepoints as a factor, the software assumes equal spacing between the bars by default. What should the x-value of each bar be in this case? It could be (0, 14, 14, 24, 48, 72), but then two bars are plotted at x = 14, which you don't seem to want. So you have to come up with the x-values yourself.
Joran provides an elegant solution by modifying the width of the bars at position 14. Modifying the code given by joran to make the bars fall at the right position in the x-axis, the final solution is:
library(ggplot2)
example <- data.frame(tp = factor(c(0, "14a", "14b", 24, 48, 72)), means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2), std = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3))
example$tp1 <- gsub("a|b","",example$tp)
example$grp <- c('a','a','b','a','a','a')
example$tp2 <- as.numeric(example$tp1)
ggplot(example, aes(x = tp2, y = means, fill = grp)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_errorbar(aes(ymin = means - std, ymax = means + std), position = "dodge")
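If you would rather choose the x-values yourself, as discussed above, here is a minimal sketch. It assumes a bar width of 3 hours (`bar_width` is a made-up value, not something from the question) and shifts the two conditions at timepoint 14 half a bar to either side, so every bar has the same width. Treat it as one possible layout rather than the definitive one.

```r
library(ggplot2)

example <- data.frame(
  tp    = c(0, 14, 14, 24, 48, 72),
  grp   = c("a", "a", "b", "a", "a", "a"),
  means = c(1, 2.1, 1.9, 1.8, 1.7, 1.2),
  std   = c(0.3, 0.4, 0.2, 0.6, 0.2, 0.3)
)

bar_width <- 3  # assumed width in x-axis units (hours); adjust to taste

# TRUE for timepoints that occur more than once (here: the two bars at 14)
shared <- duplicated(example$tp) | duplicated(example$tp, fromLast = TRUE)

# Shift condition "a" half a bar to the left and "b" half a bar to the right
offset <- ifelse(shared,
                 ifelse(example$grp == "a", -bar_width / 2, bar_width / 2),
                 0)
example$x <- example$tp + offset

ggplot(example, aes(x = x, y = means, fill = grp)) +
  geom_col(width = bar_width) +
  geom_errorbar(aes(ymin = means - std, ymax = means + std),
                width = bar_width / 2) +
  scale_x_continuous(breaks = unique(example$tp))
```

The axis breaks are set to the real timepoints, so the pair of bars straddles the tick at 14.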

Related

R stairstep without upwards line after last data point

I'm plotting a cumulative step function, and I want to suppress the behavior of the line jumping up after the last row in dataset. This happens both in base R and ggplot2.
Is there a way to do it without specifying xlim to exclude the jump upwards?
data = data.frame(V1 = c(-0.1, 0, 0, 1, 1.1), V2 = c(0, 0, 0.7, 0.3, 0.3))
base R
plot(data$V1, cumsum(data$V2), type="s")
ggplot2
ggplot(data, aes(x = V1, y = cumsum(V2))) +
  geom_step()
The way the step function works seems correct to me: sum(data$V2) is 1.3, and that is where your line ends. It is also identical to tail(cumsum(data$V2), 1). However, if you insist on not drawing the last line segment, you can set the last value of data$V2 to 0. Example below:
library(ggplot2)
data = data.frame(V1 = c(-0.1, 0, 0, 1, 1.1), V2 = c(0, 0, 0.7, 0.3, 0.3))
ggplot(data, aes(x = V1, y = cumsum(c(head(V2, -1), 0)))) +
  geom_step()
Note that the example doesn't generalise to multiple groups; pre-processing the data should help then.
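For the multi-group case, a rough sketch of that pre-processing could look like the following. The grouping column `grp` and the second group's values are invented for illustration; the idea is simply to compute the truncated cumulative sum within each group before plotting.

```r
library(ggplot2)

# Hypothetical two-group data; `grp` and group "b" are not in the question
data <- data.frame(
  grp = rep(c("a", "b"), each = 5),
  V1  = rep(c(-0.1, 0, 0, 1, 1.1), times = 2),
  V2  = c(0, 0, 0.7, 0.3, 0.3,
          0, 0.2, 0.5, 0.1, 0.4)
)

# Cumulative sum that ignores the last increment, so the final step stays flat
cumsum_without_last <- function(v) cumsum(c(head(v, -1), 0))

# ave() applies the function within each group and keeps the row order
data$cum <- ave(data$V2, data$grp, FUN = cumsum_without_last)

ggplot(data, aes(x = V1, y = cum, colour = grp)) +
  geom_step()
```

This assumes the rows are already sorted by V1 within each group.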

R code for plotting multiple line segments with unique R ranges

I know there are many many questions on here around plotting multiple lines in a graph in R, but I've been struggling with a more specific task. I would like to add multiple line segments to a graph using only the intercept and slope specified for each line. abline() would work great for this, except each line has a specific range on the X axis, and I do not want the line plotted beyond the range.
I managed to get the graph I want using plotrix, but I am hoping to publish the work, and the graph does not look up to par (very basic). I am somewhat familiar with ggplot2, and I think that graphs generated in ggplot2 look much better than what I have made, especially with the various themes available, but I cannot figure out how to do something similar using ggplot2.
Code:
library(plotrix)
plot(1, type="n", xlab="PM2.5(ug/m3)", ylab="LogRR Preeclampsia ", xlim=c(0, 20), ylim=c(-1, 2.5))
ablineclip(a = 0, b = 0.3, x1=1.2, x2=3)
ablineclip(a = 0, b = 0.08, x1=8.0, x2=13.1)
ablineclip(a = 0, b = 0.5, x1=10.1, x2=18.9)
ablineclip(a = 0, b = 0.12, x1=2.6, x2=14.1)
Any help would be appreciated!
Thank you.
You can write a basic function doing a bit of algebra to calculate the start/stop points for the line segments and then feed that into ggplot. For example:
to_points <- function(intercept, slope, start, stop) {
  data.frame(
    segment = seq_along(start),
    xstart = start,
    xend = stop,
    ystart = intercept + slope * start,
    yend = intercept + slope * stop)
}
And then use that with
library(ggplot2)
# Slopes and x ranges taken from the ablineclip() calls in the question
segments <- to_points(0, c(0.3, 0.08, 0.5, 0.12),
                      c(1.2, 8.0, 10.1, 2.6),
                      c(3, 13.1, 18.9, 14.1))
ggplot(segments) +
  aes(xstart, ystart, xend = xend, yend = yend) +
  geom_segment() +
  coord_cartesian(xlim = c(0, 20), ylim = c(-1, 2.5)) +
  labs(x = "PM2.5(ug/m3)", y = "LogRR Preeclampsia ")
That will produce the following plot
(Note the third segment is outside the region you specified. You can drop the coord_cartesian to see all the segments.)
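To see which segments fall outside the plotting region without looking at the plot, you can check the computed endpoints directly. This is a small base R sketch that repeats the to_points() helper from above and uses the slopes and ranges from the question's ablineclip() calls; the `inside` column and the `ylim` variable are names made up for this illustration.

```r
to_points <- function(intercept, slope, start, stop) {
  data.frame(
    segment = seq_along(start),
    xstart = start,
    xend = stop,
    ystart = intercept + slope * start,
    yend = intercept + slope * stop)
}

segments <- to_points(0, c(0.3, 0.08, 0.5, 0.12),
                      c(1.2, 8.0, 10.1, 2.6),
                      c(3, 13.1, 18.9, 14.1))

ylim <- c(-1, 2.5)  # the y range requested in the question

# A segment is fully visible if both of its endpoints lie within ylim
segments$inside <- pmin(segments$ystart, segments$yend) >= ylim[1] &
  pmax(segments$ystart, segments$yend) <= ylim[2]

segments
```

With slope 0.5 the third segment runs from y = 5.05 to y = 9.45, which is why it is the only one flagged as not inside.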

How to put labels between columns in a bar plot in R?

I'm a beginner with R and looking for help with plotting.
I would like to make a distribution plot in R that looks like a histogram of continuous data bucketed into columns with x-axis labels between each column to denote the range captured in each column.
Instead of continuous data though, I only have the bucketed counts. I can create a plot with barplot, however I can't find a way to label BETWEEN the columns to denote the range captured in each bar.
I've tried barplot but cannot get the labels to fall between columns instead of being treated as column labels and falling directly beneath each column.
dat <- list()
dat$freq = c(5,15,20,10)
dat$mid = c(-1.5,-.5,.5,1.5) #midpoint in each bucketed range
dat$perc = dat$freq/sum(dat$freq)
barplot(dat$perc, names.arg = dat$mid)
Each column is labeled with the midpoint. I would instead like the labels to be -2,-1,0,1,2 BETWEEN the columns.
Thank you!
edit: dput(dat) outputs:
list(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5), perc =
c(0.1, 0.3, 0.4, 0.2))
Is this what you're after?
df <- data.frame(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5), perc = c(0.1, 0.3, 0.4, 0.2))
I'm using the awesome and highly customisable library ggplot2 to plot this, which renders the plot as I think you want it. You can install this with install.packages('ggplot2'):
# install.packages('ggplot2')
library(ggplot2)
p <- ggplot(df)
p <- p + geom_bar(aes(mid, perc), stat='identity')
p
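If the tick labels should sit exactly on the bucket edges (-2, -1, 0, 1, 2), one option is to derive the edges from the midpoints and pass them as axis breaks. This is only a sketch: it assumes every bucket is one unit wide (`bucket_width` is my own name, not from the question) and that the midpoints are sorted in increasing order.

```r
library(ggplot2)

df <- data.frame(freq = c(5, 15, 20, 10), mid = c(-1.5, -0.5, 0.5, 1.5))
df$perc <- df$freq / sum(df$freq)

bucket_width <- 1  # assumed: every bucket spans one unit

# Left edge of every bucket, plus the right edge of the last one
edges <- c(df$mid - bucket_width / 2, max(df$mid) + bucket_width / 2)

ggplot(df, aes(x = mid, y = perc)) +
  geom_col(width = bucket_width) +        # bars touch, like a histogram
  scale_x_continuous(breaks = edges)      # ticks between the bars
```

Setting width = bucket_width removes the gaps between the bars, so each tick lands on the border between two columns.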

How to get quantile from category count in R?

For example, I have sample data of human heights in a data frame:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(20, 30, 50, 30, 20))
How can I calculate the 90% quantile of this sample?
I know ggplot2 has a function that can plot the ECDF of the sample:
ggplot(df, aes(x = height, y = number)) + stat_ecdf()
but I only need a specified quantile not the plot.
I could repeat each height `number` times to make a vector and use the quantile function on that vector, but as the counts get larger, this method becomes very inefficient.
EDIT:
It seems stat_ecdf is not supposed to be used in this way, and when the data distribution is skewed:
df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9), number = c(100, 2, 3, 4, 5))
only quantile of the repeated vector gives the desired result:
quantile(c(rep(1.5,100), rep(1.6,2), rep(1.7,3), rep(1.8,4), rep(1.9,5)))
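One way to avoid building the repeated vector is to work with cumulative counts directly. The sketch below uses a hypothetical helper, `weighted_quantile()` (not a base R function), that returns the smallest height whose cumulative share of the counts reaches the requested probability. That is the inverse-ECDF definition, which corresponds to quantile(..., type = 1); the default type 7 interpolates between values and can give slightly different answers in general.

```r
# Hypothetical helper: quantile from values and their counts (inverse ECDF)
weighted_quantile <- function(values, counts, p) {
  ord <- order(values)
  values <- values[ord]
  counts <- counts[ord]
  cum_prop <- cumsum(counts) / sum(counts)  # ECDF evaluated at each value
  values[which(cum_prop >= p)[1]]           # first value reaching p
}

df <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9),
                 number = c(20, 30, 50, 30, 20))
q_even <- weighted_quantile(df$height, df$number, 0.9)

skewed <- data.frame(height = c(1.5, 1.6, 1.7, 1.8, 1.9),
                     number = c(100, 2, 3, 4, 5))
q_skewed <- weighted_quantile(skewed$height, skewed$number, 0.9)

# Cross-check against the repeated-vector approach from the question
check_even   <- quantile(rep(df$height, df$number), 0.9, type = 1)
check_skewed <- quantile(rep(skewed$height, skewed$number), 0.9, type = 1)
```

The cost depends only on the number of distinct heights, not on how large the counts are.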

Cutpoint histograms in ggplot2

In Minitab, it is easy to create "cutpoint" histograms. Can this be done in ggplot2?
For example,
df <- data.frame(x = c(0.08, 0.21, 0.25, 0.4, 0.5, 0.6))
ggplot(df, aes(x = x)) + geom_histogram(binwidth = 0.1)
As you can see here, R defaults to "midpoint" histograms. The bar containing the 0.08 is being marked with a bar at the 0.1. The bar containing the 0.21 and 0.25 is being marked at the 0.2 and so forth.
Can I somehow change these bars so the first bar covers the area between 0 and 0.1, and the second bar covers the area between 0.2 and 0.3, and so forth?
You can solve the problem in two ways: using the parameter "center" or using "boundary". With "center" you specify the center of one of the bins; "boundary" is similar, but you specify the position of a boundary between two bins. It is worth noting that "center" and "boundary" can lie above or below the range of the data; in that case the value provided is shifted by the appropriate integer multiple of the bin width.
In this case you already know the bin width and where the boundaries should be, so with these parameters you can easily do what you asked:
library(ggplot2)
df <- data.frame(x = c(0.08, 0.21, 0.25, 0.4, 0.5, 0.6))
# This is "center" solution:
ggplot(df, aes(x = x)) + geom_histogram(binwidth = 0.1, center=0.05)
# This is with "boundary" parameter
ggplot(df, aes(x = x)) + geom_histogram(binwidth = 0.1, boundary=0.1)
You can find details and more information in the help page, ?geom_histogram.
Hope this helps
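To confirm where the bin edges ended up, you can inspect the data ggplot2 computed for the histogram layer. A quick sketch using layer_data() from ggplot2 on the "boundary" version; `bins` and `edges_in_tenths` are just names chosen for this example.

```r
library(ggplot2)

df <- data.frame(x = c(0.08, 0.21, 0.25, 0.4, 0.5, 0.6))

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.1, boundary = 0.1)

# Computed bin edges and counts for the first (and only) layer
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
bins

# With boundary = 0.1 and binwidth = 0.1, every left edge should be a
# whole multiple of 0.1 (up to floating-point error)
edges_in_tenths <- bins$xmin / 0.1
```

Observations that fall exactly on an edge (0.4, 0.5 and 0.6 here) go into the bin to their left by default; see the closed argument in ?geom_histogram if you want the opposite.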
