I have a table with averages and interquartile ranges. I would like to create a dotplot, where the dot would show this average, and a bar would stretch through the dot, to show the interquartile range. In other words, the dot would be at the midpoint of a bar, the length of which would equal my interquartile range data. I am working in R.
For example,
labels<-c('a','b','c','d')
averages<-c(10,40,20,30)
ranges<-c(5,8,4,10)
dotchart(averages,labels=labels)
where the ranges would then be added to this plot as bars.
Any ideas?
Thanks!
Yet another method, using base.
labels <- c('a', 'b', 'c', 'd')
averages <- c(10, 40, 20, 30)
ranges <- c(5, 8, 4, 10)
dotchart(averages, labels=labels, xlab='average', pch=20,
         xlim=c(min(averages-ranges/2), max(averages+ranges/2)))
# half the range either side of the dot, so each bar's length equals the range
segments(averages-ranges/2, 1:4, averages+ranges/2, 1:4)
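If capped bars are preferred, arrows() with angle = 90 and code = 3 draws a flat cap at both ends of each bar. A minimal sketch, assuming ranges holds full interquartile widths (so each bar extends ranges/2 either side of the dot); the cap length of 0.05 inches is an arbitrary choice:

```r
labels <- c('a', 'b', 'c', 'd')
averages <- c(10, 40, 20, 30)
ranges <- c(5, 8, 4, 10)   # assumed to be full IQR widths
low  <- averages - ranges / 2
high <- averages + ranges / 2

dotchart(averages, labels = labels, xlab = 'average', pch = 20,
         xlim = range(low, high))
# dotchart() draws the i-th value at y = i when no groups are given
arrows(low, seq_along(averages), high, seq_along(averages),
       angle = 90, code = 3, length = 0.05)
```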
For the record, here's a lattice solution, which uses a couple of functions from the Hmisc package:
library(lattice)
library(Hmisc)
labels<-c('a','b','c','d')
averages<-c(10,40,20,30)
ranges<-c(5,8,4,10)
low <- averages - ranges/2
high <- averages + ranges/2
d <- data.frame(labels, averages, low, high)
Dotplot(labels ~ Cbind(averages, low, high), data = d,
col = 1, # for black points
par.settings = list(plot.line = list(col = 1)), # for black bars
xlab = "Value")
ggplot2 has a good facility for doing this:
library(ggplot2)
labels<-c('a','b','c','d')
averages<-c(10,40,20,30)
ranges<-c(5,8,4,10)
x <- data.frame(labels, averages, ranges)
ggplot(x, aes(averages, labels)) +
  geom_point() +
  # half the range either side of the dot, so each bar's length equals the range
  geom_errorbarh(aes(xmin = averages - ranges/2, xmax = averages + ranges/2))
Gives you a plot like:
I need to do a deviance chart (lollipop chart with lines from the mean to values above / below the mean). From this question and answer Drawing line segments in R, it is clear that I need to plot segments and then add the points. However, my x axis is a factor and the solution fails.
This works:
df <- data.frame(ID = c(1, 2, 3),
score = c(30, 42, 48))
mid <- mean(df$score)
plot(range(df$ID), range(df$score),type="n")
segments(df$ID, df$score, df$ID, mid)
But changing my identifier variable into a factor breaks it.
df$ID2 <- factor(df$ID)
plot(range(df$ID2), range(df$score),type="n")
segments(df$ID2, df$score, df$ID2, mid)
How can I set up the plot area and x-axis values to deal with a factor?
Note that I need a base graphics solution to fit with the other charts in a dashboard style report.
You can convert the factor into a numeric variable, suppress the x-axis, and then add the correct labels to the plot:
df$ID2 <- factor(letters[df$ID]) # Use letters to show that this is working
plot(range(as.numeric(df$ID2)), range(df$score), type = "n", xaxt = "n")
segments(as.numeric(df$ID2), df$score, as.numeric(df$ID2), mid)
axis(1, at = seq_along(levels(df$ID2)), labels = levels(df$ID2))
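Building on that, a sketch of the complete deviance chart (sticks from the mean plus the points themselves) might look like the following. The 0.5 padding on the x-limits and the dashed reference line are arbitrary styling choices, not part of the original answer.

```r
df <- data.frame(ID = factor(c("a", "b", "c")),
                 score = c(30, 42, 48))
mid <- mean(df$score)
xpos <- as.numeric(df$ID)            # integer codes 1, 2, 3 for the levels

plot(xpos, df$score, type = "n", xaxt = "n",
     xlim = range(xpos) + c(-0.5, 0.5),
     ylim = range(df$score, mid), xlab = "ID", ylab = "score")
abline(h = mid, lty = 2)             # reference line at the mean
segments(xpos, mid, xpos, df$score)  # sticks from the mean to each value
points(xpos, df$score, pch = 19)     # lollipop heads drawn last, on top
axis(1, at = seq_along(levels(df$ID)), labels = levels(df$ID))
```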
I have the following dataframe that I would like to plot. Is it possible to color portions of the lines connecting my outcome variable (stackOne$y) differently, depending on whether the value is below a certain threshold? For example, I would like the portions of the lines falling below 2.2 to be red.
set.seed(123)
stackOne = data.frame(id = rep(c(1, 2, 3), each = 3),
y = rnorm(9, 2, 1),
x = rep(c(1, 2, 3), 3))
ggplot(stackOne, aes(x = x, y = y)) +
geom_point() +
geom_line(aes(group = id))
Thanks!
You have at least a couple of options here. The first is quite simple, general (in that it's not limited to straight-line segments) and precise, but uses base plot rather than ggplot. The second uses ggplot, but is slightly more complicated, and colour transition will not be 100% precise (but near enough, as long as you specify an appropriate resolution... read on).
base:
If you're willing to use base plotting functions rather than ggplot, you could clip the plotting region to above the threshold (2.2), then plot the segments in your preferred colour, and subsequently clip to the region below the threshold, and plot again in red. While the first clip is strictly unnecessary, it prevents overplotting different colours, which can look a bit dud.
threshold <- 2.2
set.seed(123)
stackOne=data.frame(id=rep(c(1,2,3),each=3),
y=rnorm(9,2,1),
x=rep(c(1,2,3),3))
# create a second df to hold segment data
d <- stackOne
d$y2 <- c(d$y[-1], NA)
d$x2 <- c(d$x[-1], NA)
d <- d[-findInterval(unique(d$id), d$id), ] # remove last row for each group
plot(stackOne[, 3:2], pch=20)
# clip to region above the threshold
clip(min(stackOne$x), max(stackOne$x), threshold, max(stackOne$y))
segments(d$x, d$y, d$x2, d$y2, lwd=2)
# clip to region below the threshold
clip(min(stackOne$x), max(stackOne$x), min(stackOne$y), threshold)
segments(d$x, d$y, d$x2, d$y2, lwd=2, col='red')
points(stackOne[, 3:2], pch=20) # plot points again so they lie over lines
ggplot:
If you want or need to use ggplot, you can consider the following...
One solution is to use geom_line(aes(group=id, color = y < 2.2)), however this will assign colours based on the y-value of the point at the beginning of each segment. I believe you want to have the colour change not just at the nodes, but wherever a line crosses your given threshold of 2.2. I'm not all that familiar with ggplot, but one way to achieve this is to make a higher-resolution version of your data by creating new points along the lines that connect your existing points, and then use the color = y < 2.2 argument to achieve the desired effect.
For example:
threshold <- 2.2 # set colour-transition threshold
yres <- 0.01 # y-resolution (accuracy of colour change location)
d <- stackOne # for code simplification
# new cols for point coordinates of line end
d$y2 <- c(d$y[-1], NA)
d$x2 <- c(d$x[-1], NA)
d <- d[-findInterval(unique(d$id), d$id), ] # remove last row for each group
# new high-resolution y coordinates between each pair within each group
y.new <- apply(d, 1, function(x) {
seq(x['y'], x['y2'], yres*sign(x['y2'] - x['y']))
})
d$len <- sapply(y.new, length) # length of each series of points
# new high-resolution x coordinates corresponding with new y-coords
x.new <- apply(d, 1, function(x) {
seq(x['x'], x['x2'], length.out=x['len'])
})
id <- rep(seq_along(y.new), d$len) # new group id vector
y.new <- unlist(y.new)
x.new <- unlist(x.new)
d.new <- data.frame(id=id, x=x.new, y=y.new)
library(ggplot2)
p <- ggplot(d.new, aes(x = x, y = y)) +
  # refer to columns by name inside aes() rather than via d.new$
  geom_line(aes(group = id, color = y < threshold)) +
  geom_point(data = stackOne) +
  scale_color_discrete(sprintf('Below %s', threshold))
p
There may well be a way to do this through ggplot functions, but in the meantime I hope this helps. I couldn't work out how to draw a ggplotGrob into a clipped viewport (rather it seems to just scale the plot). If you want colour to be conditional on some x-value threshold instead, this would obviously need some tweaking.
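For straight-line segments, the crossing point can also be computed exactly by linear interpolation, which avoids the resolution parameter. The following is a sketch under the assumption that the rows are already ordered by id and then x, as in the example data; split_at_threshold is a hypothetical helper name, and the plotting step only runs if ggplot2 is installed.

```r
# Hypothetical helper: split each line segment where it crosses `threshold`.
# Returns one row per piece, with a logical `below` flag.
split_at_threshold <- function(x, y, xend, yend, threshold) {
  pieces <- lapply(seq_along(x), function(i) {
    crosses <- (y[i] - threshold) * (yend[i] - threshold) < 0
    if (!crosses) {
      return(data.frame(x = x[i], y = y[i], xend = xend[i], yend = yend[i]))
    }
    # x-coordinate where the segment meets y = threshold
    xc <- x[i] + (threshold - y[i]) * (xend[i] - x[i]) / (yend[i] - y[i])
    data.frame(x = c(x[i], xc), y = c(y[i], threshold),
               xend = c(xc, xend[i]), yend = c(threshold, yend[i]))
  })
  out <- do.call(rbind, pieces)
  out$below <- (out$y + out$yend) / 2 < threshold  # classify by midpoint
  out
}

set.seed(123)
stackOne <- data.frame(id = rep(1:3, each = 3),
                       y = rnorm(9, 2, 1),
                       x = rep(1:3, 3))
# Pair each point with the next point in the same group
first <- which(head(stackOne$id, -1) == tail(stackOne$id, -1))
segs <- split_at_threshold(stackOne$x[first], stackOne$y[first],
                           stackOne$x[first + 1], stackOne$y[first + 1],
                           threshold = 2.2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(segs) +
    geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = below)) +
    geom_point(data = stackOne, aes(x = x, y = y)) +
    scale_colour_manual(values = c(`FALSE` = "black", `TRUE` = "red"))
  print(p)
}
```

Because every piece lies entirely on one side of the threshold, the colour changes exactly at the crossing. For curved lines the clipping or interpolation approaches above would still be needed.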
Encouraged by people in my answer to a newer but related question, I'll also share an easier-to-use approximation to the problem here.
Instead of interpolating the correct values exactly, one can use ggforce::geom_link2() to interpolate lines and use after_stat() to assign the correct colours after interpolation. If you want more precision you can increase the n of that function.
library(ggplot2)
library(ggforce)
set.seed(123)
stackOne = data.frame(id = rep(c(1, 2, 3), each = 3),
y = rnorm(9, 2, 1),
x = rep(c(1, 2, 3), 3))
ggplot(stackOne, aes(x = x, y = y)) +
geom_point() +
geom_link2(
aes(group = id,
colour = after_stat(y < 2.2))
) +
scale_colour_manual(
values = c("black", "red")
)
Created on 2021-03-26 by the reprex package (v1.0.0)
I know it was already answered here, but only for ggplot2 histogram.
Let's say I have the following code to generate a histogram with red bars and blue bars, same number of each (six red and six blue):
set.seed(69)
hist(rnorm(500), col = c(rep("red", 6), rep("blue", 7)), breaks = 10)
I have the following image as output:
I would like to automate the entire process. How can I use values from the x-axis to set a condition that colors the histogram bars (with two or more colors) using the hist() function, without having to specify the number of repetitions of each color?
Assistance most appreciated.
The hist function uses the pretty function to determine break points, so you can do this:
set.seed(69)
x <- rnorm(500)
breaks <- pretty(x,10)
col <- ifelse(1:length(breaks) <= length(breaks)/2, "red", "blue")
hist(x, col = col, breaks = breaks)
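If the colour should depend on the x-axis value of each bar rather than on its position in the sequence, one option is to compute the histogram first without plotting and test the bin midpoints. A sketch, assuming the condition is "bin midpoint below zero":

```r
set.seed(69)
x <- rnorm(500)
h <- hist(x, breaks = 10, plot = FALSE)    # compute the bins without drawing
cols <- ifelse(h$mids < 0, "red", "blue")  # one colour per bar
plot(h, col = cols, main = "Histogram of x")
```

Since cols is built from h$mids, it always has exactly one entry per bar, whatever break points hist() ends up choosing.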
When I want to do this, I actually tabulate the data and make a barplot as follows (a bar plot of data tabulated into equal-width bins is effectively a histogram):
set.seed(69)
dat <- rnorm(500, 0, 1)
tab <- table(round(dat, 1)) # round the rnorm values, which are more precise than most real data
bools <- as.numeric(names(tab)) >= 0 # your condition here
cols <- c("grey", "dodgerblue4")[bools + 1] # note that FALSE + 1 = 1 and TRUE + 1 = 2
barplot(tab, border = "white", col = cols, main = "Histogram with barplot")
The output:
I want to plot a time series, excluding in the plot a stretch of time in the middle. If I plot the series alone, with only an index on the x-axis, that is what I get. The interior set of excluded points do not appear.
x <- rnorm(50)
Dates <- seq(as.Date("2008-1-1"), by = "day", length.out = length(x))
dummy <- c(rep(1, 25), rep(0, 10), rep(1, length(x) - 35))
plot(x[dummy == 1])
Once the dates are on the x-axis, however, R dutifully presents an accurate true time scale, including the excluded dates. This produces a blank region on the plot.
plot(Dates[dummy == 1], x[dummy == 1])
How can I get dates on the x-axis, but not show the blank region of the excluded dates?
Three alternatives:
1. ggplot2 Apparently, ggplot2 would not allow for a discontinuous axis but you could use facet_wrap to get a similar effect.
# get the data
x = rnorm(50)
df <- data.frame( x = x,
Dates = seq(as.Date("2008-1-1"), by = "day", length.out = length(x)) ,
dummy = c(rep(1, 25), rep(0, 10), rep(1, length(x) - 35)))
df$f <- ifelse(df$Dates <= "2008-01-25", c("A"), c("B"))
# plot
library(ggplot2)
ggplot(subset(df, dummy == 1)) +
  geom_point(aes(x = Dates, y = x)) +
  facet_wrap(~f, scales = "free_x")
2. base R
kept <- subset(df, dummy == 1) # keep only the rows to plot, so points and colours line up
plot(x ~ Dates, data = kept, col = ifelse(kept$f == "A", "blue", "red"))
3. plotrix Another alternative would be to use gap.plot{plotrix}. The code would be something like the one below. However, I couldn't figure out how to make a break in an axis with date values. Perhaps this would be an additional question.
library(plotrix)
gap.plot(Dates[dummy == 1], x[dummy == 1], gap=c(24,35), gap.axis="x")
I think I figured it out. Along the lines I proposed above, I had to fiddle with the axis command for a long time to get it to put the date labels in the right place. Here's what I used:
Dates1 <- Dates[dummy == 1] # assumed definition: the dates of the plotted points
plot(x[dummy == 1], xaxt = "n", xlab = "") # plot with no x-axis title or tick labels
Dates1_index <- seq(1, length(Dates1), by = 5) # set the tick positions
axis(1, at = Dates1_index, labels = format(Dates1[Dates1_index], "%b %d"), las = 2)
Having succeeded, I now agree with @alistaire that it looks pretty misleading. Maybe if I put a vertical dashed line at the break...
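The vertical dashed line could be placed automatically by looking for jumps of more than one day between consecutive plotted dates. A self-contained sketch, assuming Dates1 is simply the dates of the retained points (the seed is arbitrary and only makes the example reproducible):

```r
set.seed(1)  # arbitrary seed, not from the original post
x <- rnorm(50)
Dates <- seq(as.Date("2008-1-1"), by = "day", length.out = length(x))
dummy <- c(rep(1, 25), rep(0, 10), rep(1, length(x) - 35))

keep <- dummy == 1
Dates1 <- Dates[keep]                 # dates of the points that are plotted
plot(x[keep], xaxt = "n", xlab = "")
ticks <- seq(1, length(Dates1), by = 5)
axis(1, at = ticks, labels = format(Dates1[ticks], "%b %d"), las = 2)

# Positions after which more than one day elapses before the next point
gaps <- which(diff(Dates1) > 1)
abline(v = gaps + 0.5, lty = 2)       # dashed line halfway between the two points
```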
I currently generate the following plot using ggplot in R:
The data is stored in a single dataframe with three columns: PDF (y-axis in the plot above), mids(x) and dataset name. This is created from histograms.
What I want to do is to plot a color-coded vertical line for each dataset representing the 95th quantile, like I manually painted below as an example:
I tried to use + geom_line(stat="vline", xintercept="mean") but of course I'm looking for the quantiles, not for the mean, and AFAIK ggplot does not allow that. Colors are fine.
I also tried + stat_quantile(quantiles = 0.95) but I'm not sure what it does exactly. Documentation is very scarce. Colors, again, are fine.
Please note that density values are very low, down to 1e-8. I don't know if the quantile() function likes that.
I understand that calculating the quantile of a histogram is not quite the same as calculating that of a list of numbers. I don't know how it would help, but the HistogramTools package contains an ApproxQuantile() function for histogram quantiles.
Minimum working example is included below. As you can see I obtain a data frame from each histogram, then bind the dataframes together and plot that.
library(ggplot2)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks=c(0:100))
df1 <- data.frame(h$mids,h$density,rep("dataset1", 100))
colnames(df1) <- c('Bin','Pdf','Dataset')
df2 <- data.frame(h$mids*2,h$density*2,rep("dataset2", 100))
colnames(df2) <- c('Bin','Pdf','Dataset')
df_tot <- rbind(df1, df2)
ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)
Precomputing these values and plotting them separately seems like the simplest option. Doing so with dplyr requires minimal effort:
library(dplyr)
q.95 <- df_tot %>%
group_by(Dataset) %>%
summarise(Bin_q.95 = quantile(Bin, 0.95))
ggplot(data=df_tot[which(df_tot$Pdf>0),],
aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) +
geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
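One caveat: quantile(Bin, 0.95) treats every bin midpoint as a single observation and ignores the densities, so it returns roughly the 95% point of the bin range rather than of the distribution. A sketch of a density-weighted alternative in base R follows, assuming equal-width bins within each dataset so that the Pdf values can be normalised directly; hist_quantile is a hypothetical helper name.

```r
# Hypothetical helper: approximate quantile from histogram midpoints and
# densities, assuming equal-width bins. Returns the first midpoint at which
# the cumulative share of density reaches p.
hist_quantile <- function(mids, dens, p) {
  ord <- order(mids)
  cum <- cumsum(dens[ord]) / sum(dens)
  mids[ord][which(cum >= p)[1]]
}

v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df_tot <- rbind(
  data.frame(Bin = h$mids,     Pdf = h$density,     Dataset = "dataset1"),
  data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
)

# One weighted quantile per dataset
q.95 <- do.call(rbind, lapply(split(df_tot, df_tot$Dataset), function(d) {
  data.frame(Dataset = d$Dataset[1],
             Bin_q.95 = hist_quantile(d$Bin, d$Pdf, 0.95))
}))
q.95
```

The resulting q.95 has the same column names as the dplyr version, so it can be passed to geom_vline(data = q.95, ...) unchanged. The answer is only accurate to the bin width, since it snaps to a midpoint.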