I'm attempting to display a grid figure of summarized weekly data for several variables. The two most pertinent components of this graph are a distributional summary graph (so a box plot or violin plot) of the values that a certain variable took over a given week, and a cumulative count graph of an integer variable accumulating over weeks (so a step plot). I would like to plot these two graphs on an aligned x-axis using grid. I'll be using ggplot2 to make the individual graphs, because I've got a crush on Hadley Wickham (j/k, ggplot is just really, really nice).
The problem is that geom_boxplot only takes factors for the x-axis and geom_step only takes continuous data for the x-axis. These don't necessarily align even if you force similar x-limits with coord_cartesian or scale_x_....
I've cobbled together a hack using geom_rect that will work for this specific application, but that will be a pain to adapt if, for example, I have some other factor that results in multiple boxes for a single week.
The obligatory reproducible:
library(ggplot2)
library(grid)
var1 <- data.frame(val = rnorm(300),
week = c(rep(25, 100),
rep(26, 100),
rep(27, 100))
)
var2 <- data.frame(cumul = cumsum(c(0, rpois(2, 15))),
week = c(25, 26, 27)
)
g1 <- ggplot(var1, aes(x = factor(week), y = val)) +
geom_boxplot()
g2 <- ggplot(var2, aes(x = week, y = cumul)) +
geom_step() + scale_x_continuous(breaks = 25:27)
grid.newpage()
grid.draw(rbind(ggplotGrob(g1),
ggplotGrob(g2),
size = "last"))
And the kludge:
library(dplyr)
chiggity_check <- var1 %>%
group_by(week) %>%
summarise(week.avg = mean(val),
# Request each percentile explicitly; quantile(val)[1] and [5] are the min and max
week.25 = quantile(val, 0.25),
week.75 = quantile(val, 0.75),
week.05 = quantile(val, 0.05),
week.95 = quantile(val, 0.95))
riggity_rect <- ggplot(chiggity_check) +
geom_rect(aes(xmin = week - 0.25, xmax = week + 0.25,
ymin = week.25,
ymax = week.75)) +
geom_segment(aes(x = week - 0.25, xend = week + 0.25,
y = week.avg, yend=week.avg),
color = "white") +
geom_segment(aes(x = week, xend = week ,
y = week.25, yend=week.05)) +
geom_segment(aes(x = week, xend = week ,
y = week.75, yend=week.95)) +
coord_cartesian(c(24.5,27.5)) +
scale_x_continuous(breaks = 25:27)
grid.newpage()
grid.draw(rbind(ggplotGrob(riggity_rect),
ggplotGrob(g2 + coord_cartesian(c(24.5,27.5))),
size = "last"))
So the question(s) is/are: is there a way to force geom_boxplot to a continuous axis or geom_step to a factor axis? Or is there some other implementation, perhaps stat_summary that will be a bit more flexible so that I can align axes and also potentially easily add in things like grouping color variables?
One approach is to put both charts on a discrete x-axis built from factor(week). In g2 (the step plot), map factor(week) in geom_blank() so that the discrete scale is set up first, then draw geom_step() on the numeric positions of the factor levels: as.numeric(factor(week)).
library(ggplot2)
library(grid)
# Your data
var1 <- data.frame(val = rnorm(300),
week = c(rep(25, 100),
rep(26, 100),
rep(27, 100))
)
var2 <- data.frame(cumul = cumsum(c(0, rpois(2, 15))),
week = c(25, 26, 27)
)
# Your g1
g1 <- ggplot(var1, aes(x = factor(week), y = val)) +
geom_boxplot()
# Modified g2
g2 <- ggplot(var2) + geom_blank(aes(x = factor(week), y = cumul)) +
geom_step(aes(x = as.numeric(as.factor(week)), y = cumul))
grid.newpage()
# rbind() dispatches to the gtable method, as in the question's own code
grid.draw(rbind(ggplotGrob(g1),
ggplotGrob(g2),
size = "last"))
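Another option, offered as a sketch rather than a definitive fix: geom_boxplot() will accept a continuous x if you also map group, so both plots can share the same numeric week axis. The x window (24.5 to 27.5), the seed, and the names g1c, g2c, x_breaks and x_window are assumptions chosen for this example data.

```r
library(ggplot2)
library(grid)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
var1 <- data.frame(val = rnorm(300), week = rep(25:27, each = 100))
var2 <- data.frame(cumul = cumsum(c(0, rpois(2, 15))), week = 25:27)

# Shared x-axis settings, added to both plots
x_breaks <- scale_x_continuous(breaks = 25:27)
x_window <- coord_cartesian(xlim = c(24.5, 27.5))

# group = week gives one box per week while x stays numeric
g1c <- ggplot(var1, aes(x = week, y = val, group = week)) +
  geom_boxplot() + x_breaks + x_window
g2c <- ggplot(var2, aes(x = week, y = cumul)) +
  geom_step() + x_breaks + x_window

grid.newpage()
grid.draw(rbind(ggplotGrob(g1c), ggplotGrob(g2c), size = "last"))
```

With a second grouping variable, mapping group to interaction(week, that_variable) and fill to that variable is the usual way to get several boxes per week, though that part is untested against your real data.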
I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(); I'm wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points just above and below
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the threshold as separate layers with manually specified shapes, like this. The pch argument sets the point shape (ggplot2 treats it as an alias for shape).
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")
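The Above/Below/In Range labelling used in the first example above can also be written in one step with cut(). This is only a sketch: cut() closes intervals on the right by default, so a value exactly equal to -thresh lands in "Below" here, unlike the strict < comparison above. The spread values are made up for illustration.

```r
# Made-up values for illustration
spread <- c(-7.5, -1, 0, 3.9, 6.2)
thresh <- 4

# Three bins: (-Inf, -thresh], (-thresh, thresh], (thresh, Inf]
outlier <- cut(spread,
               breaks = c(-Inf, -thresh, thresh, Inf),
               labels = c("Below", "In Range", "Above"))

as.character(outlier)
```

Because the result is a factor with levels in that order, scale_shape_manual(values = c(16, 15, 17)) would give circles below, squares in range and triangles above.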
Here is my sample data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
data$Episode <- as.character(data$Episode)
I would like to make a line graph to show the performance of each singer.
I tried to use ggplot2 like below:
ggplot(data,aes(x=Episode,y=Votes,group = Singer)) + geom_line()
I have two questions:
How can I format the y-axis as percentage?
How can I label each dot in this line graph as the values of "Rank", which allows me to show rank and votes in the same graph?
To label each point use:
geom_label(aes(label = Rank))
# or
geom_text(aes(label = Rank), nudge_y = .01, nudge_x = 0)
To format the axis labels use:
scale_y_continuous(labels = scales::percent_format())
# or without the scales package:
scale_y_continuous(breaks = seq(0, .4, .2), labels = sprintf("%.0f%%", 100 * seq(0, .4, .2)), limits = c(0,.4))
Complete code:
library(ggplot2)
library(scales)
ggplot(data, aes(x = factor(Episode), y = Votes, group = Singer)) +
geom_line() +
geom_label(aes(label = Rank)) +
scale_y_continuous(labels = scales::percent_format())
Data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
# no need to transform to character bc we use factor(Episode) in aes(x=..)
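If you would rather not hard-code the breaks in the non-scales version, labels also accepts a function of the break values. A minimal sketch (pct_label is a name made up for this example):

```r
# Hypothetical helper: turn proportions into whole-percent labels
pct_label <- function(x) paste0(round(100 * x), "%")

pct_label(c(0, 0.2, 0.4))
```

It can then be passed as scale_y_continuous(labels = pct_label), leaving ggplot2 to pick the breaks.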
I am trying to generate a ternary plot using ggtern.
My data ranges from 0 - 1000 for x, y,and z variables. I wondered if it is possible to extend the axis length above 100 to represent my data.
@Nevrome is on the right path. Your points will still be plotted as 'compositions', i.e., concentrations that sum to unity, but you can change the axis labels to indicate a range from 0 to 1000.
library(ggtern)
set.seed(1)
df = data.frame(x = runif(10)*1000,
y = runif(10)*1000,
z = runif(10)*1000)
breaks = seq(0,1,by=0.2)
ggtern(data = df, aes(x, y, z)) +
geom_point() +
limit_tern(breaks=breaks,labels=1000*breaks)
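To see why raw 0 to 1000 values still land inside the triangle: only each row's proportions matter once the points are treated as compositions. The sketch below does that normalisation by hand in base R; comp is a name introduced just for illustration.

```r
set.seed(1)
df <- data.frame(x = runif(10) * 1000,
                 y = runif(10) * 1000,
                 z = runif(10) * 1000)

# Divide each row by its total so that x + y + z == 1 for every point
comp <- df / rowSums(df)

range(rowSums(comp))
```

So a point at (500, 250, 250) and one at (1000, 500, 500) end up in the same place; relabelling the axes changes only what the tick marks say.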
I think there is no direct solution to do this with ggtern. But an easy workaround could look like this:
library(ggtern)
df = data.frame(x = runif(50)*1000,
y = runif(50)*1000,
z = runif(50)*1000,
Group = as.factor(round(runif(50,1,2))))
ggtern() +
geom_point(data = df, aes(x/10, y/10, z/10, color = Group)) +
labs(x="X", y="Y", z="Z", title="Title") +
scale_T_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_L_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2)) +
scale_R_continuous(breaks = seq(0,1,0.2), labels = 1000*seq(0,1,0.2))
The length of the x-axis is important for my plot because it allows comparison between facets, so I want the facets to have different x-axis sizes. Here is my example data:
group1 <- seq(1, 10, 2)
group2 <- seq(1, 20, 3)
x = c(group1, group2)
mydf <- data.frame (X =x , Y = rnorm (length (x),5,1),
groups = c(rep(1, length (group1)), rep(2, length(group2))))
And my code:
library(ggplot2)
library(scales) # provides comma()
p1 = ggplot(data=mydf,aes(x=X,y=Y,color=factor(groups)) )+
geom_point(size=2)+
scale_x_continuous(labels=comma)+
theme_bw()
p1+facet_grid(groups ~ .,scales = "fixed",space="free_x")
And the resulting figure:
Panel 1 has x-axis values less than 10, whereas panel 2 has x-axis values extending to 20. Still, both panels have the same size on the x-axis. Is there any way to make the x-axis panel size different for different panels, so that they correspond to their (x-axis) values?
I found an example from some different package that shows what I am trying to do, here is the figure:
Maybe something like this can get you started. There's still some formatting to do, though.
library(grid)
library(gridExtra)
library(dplyr)
library(ggplot2)
p1 <- ggplot(data=mydf[mydf$groups==1,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
p2 <- ggplot(data=mydf[mydf$groups==2,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
summ <- mydf %>% group_by(groups) %>% summarize(len=diff(range(X)))
summ$p <- summ$len/max(summ$len)
summ$q <- 1-summ$p
ng <- nullGrob()
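# unlist() turns each one-row selection into a plain numeric vector of widths
grid.arrange(arrangeGrob(p1,ng,widths=unlist(summ[1,c("p","q")])),
arrangeGrob(p2,ng,widths=unlist(summ[2,c("p","q")])))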
I'm sure there's a way to make this more general, and the axes don't line up perfectly yet, but it's a beginning.
Here is a solution following OP's clarifying comment ("I guess axis will be same but the boxes will be of variable size. Is it possible by plotting them separately and aligning in grid?").
library(plyr); library(ggplot2)
buffer <- 0.5 # Extra space around the box
#Calculate box parameters
mydf.box <- ddply(mydf, .(groups), summarise,
max.X = max(X) + buffer,
min.X = 0,
max.Y = max(Y) + buffer,
min.Y = 0,
X = mean(X), Y = mean(Y)) #Dummy values for X and Y needed for geom_rect
p2 <- ggplot(data=mydf,aes(x=X, y=Y) )+
geom_rect(data = mydf.box, aes( xmax = max.X, xmin = min.X,
ymax = max.Y, ymin = min.Y),
fill = "white", colour = "black") +
geom_point(size=2) + facet_grid(groups ~ .,scales = "free_y") +
theme_classic() +
#Extra formatting to make your plot like the example
theme(panel.background = element_rect(fill = "grey85"),
strip.text.y = element_text(angle = 0),
strip.background = element_rect(colour = NA, fill = "grey65"))
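A side note on why space = "free_x" had no visible effect in the question: as far as I know it only varies panel widths across facet columns, and groups ~ . stacks the panels in a single column. If a side-by-side layout is acceptable, the following sketch lets ggplot2 size the panels itself (the seed and the name p_side are assumptions added for this example).

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for repeatability
group1 <- seq(1, 10, 2)
group2 <- seq(1, 20, 3)
x <- c(group1, group2)
mydf <- data.frame(X = x, Y = rnorm(length(x), 5, 1),
                   groups = c(rep(1, length(group1)), rep(2, length(group2))))

# Facets as columns: free x scales plus free_x space make each panel's
# width proportional to the range of its own x scale
p_side <- ggplot(mydf, aes(x = X, y = Y, color = factor(groups))) +
  geom_point(size = 2) +
  facet_grid(. ~ groups, scales = "free_x", space = "free_x") +
  theme_bw()
print(p_side)
```

This does not give the stacked layout from the example figure, so treat it as a fallback rather than a replacement for the geom_rect approach.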
I have a dataframe in R like this:
dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))
And I would like to plot it such that the x-axis is the position and the y-axis is the number of samples at that position, with each sample in a different colour. So in the above example you would have some positions with height 1, some with height 2 and one area with height 3. The aim being to find regions where there are a large number of samples and what samples are in that region.
i.e. something like:
&
---
********- -- **
where * = Sample 1, - = Sample 2 and & = Sample 3
My first try:
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) +
geom_segment(size = 2) +
geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), size = 2, alpha = 0.2, color = "black")
I combine two segment geometries here. The first draws the colored horizontal bars, which show where each Sample has been measured. The second draws the grey bar along the bottom, where overlapping semi-transparent segments show the density of the samples. Any comments to improve on this quick hack?
This hack may be what you're looking for, however I've greatly increased the size of the dataframe in order to take advantage of stacking by geom_histogram.
library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3),
Start = c(100,300,150,200,160),
Stop = c(180,320,190,220,170))
# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))
for (i in seq(nrow(dat))) {
Position = seq(dat[i, "Start"], dat[i, "Stop"])
Sample = rep(dat[i, "Sample"], length(Position))
dat2 = rbind(dat2, cbind(Sample, Position))
}
dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
theme_bw() +
# theme() and element_*() replace the removed opts(), theme_blank() and theme_text()
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
geom_hline(yintercept=seq(0, 20), colour="grey80", size=0.15) +
geom_hline(yintercept=3, linetype=2) +
geom_histogram(binwidth=1) +
ylim(c(0, 20)) +
ylab("Count") +
theme(axis.title.x=element_text(size=11, vjust=0.5)) +
theme(axis.title.y=element_text(size=11, angle=90)) +
ggtitle("Segment Plot")
png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()
Note that the way I've reformatted the dataframe is a bit ugly, and will not scale well (e.g. if you have millions of segments and/or large start and stop positions).
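If the expanded data frame gets too large, one alternative is to count coverage from the interval endpoints instead of enumerating every position. The sketch below gives only the total depth (it drops the per-sample fill), and it assumes integer positions with inclusive Stop values; events, delta and depth are names introduced for this example.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# +1 where a segment starts, -1 just past where it stops (Stop is inclusive)
events <- data.frame(pos   = c(dat$Start, dat$Stop + 1),
                     delta = rep(c(1, -1), each = nrow(dat)))

# Net change at each distinct position, in increasing order of position
events <- aggregate(delta ~ pos, data = events, FUN = sum)
events <- events[order(events$pos), ]

# Running total = number of segments covering each stretch
events$depth <- cumsum(events$delta)
events
```

The result has one row per change point rather than one per position, and can be drawn with geom_step(aes(x = pos, y = depth)).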