Plotting overlapping positions in R


I have a dataframe in R like this:
dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))
And I would like to plot it such that the x-axis is the position and the y-axis is the number of samples at that position, with each sample in a different colour. So in the above example you would have some positions with height 1, some with height 2 and one area with height 3. The aim being to find regions where there are a large number of samples and what samples are in that region.
i.e. something like:
&
---
********- -- **
where * = Sample 1, - = Sample 2 and & = Sample 3

My first try:
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) +
geom_segment(size = 2) +
geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), size = 2, alpha = 0.2, color = "black")
I combine two segment geometries here. The first draws the colored horizontal bars, which show where each Sample has been measured. The second draws the grey bar along the bottom; because it is semi-transparent, regions where more samples overlap appear darker, which indicates the density of the samples. Any comments to improve on this quick hack?

This hack may be what you're looking for. Note, however, that I've greatly increased the size of the dataframe in order to take advantage of stacking by geom_histogram.
library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3),
Start = c(100,300,150,200,160),
Stop = c(180,320,190,220,170))
# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))
for (i in seq(nrow(dat))) {
Position = seq(dat[i, "Start"], dat[i, "Stop"])
Sample = rep(dat[i, "Sample"], length(Position))
dat2 = rbind(dat2, cbind(Sample, Position))
}
dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
theme_bw() +
# opts(), theme_blank() and theme_text() have been removed from ggplot2;
# theme(), element_blank(), element_text() and ggtitle() replace them.
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
geom_hline(yintercept=seq(0, 20), colour="grey80", size=0.15) +
geom_hline(yintercept=3, linetype=2) +
geom_histogram(binwidth=1) +
ylim(c(0, 20)) +
ylab("Count") +
theme(axis.title.x=element_text(size=11, vjust=0.5)) +
theme(axis.title.y=element_text(size=11, angle=90)) +
ggtitle("Segment Plot")
png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()
Note that the way I've reformatted the dataframe is a bit ugly, and will not scale well (e.g. if you have millions of segments and/or large start and stop positions).
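One possible way around the scaling problem, sketched under assumptions rather than taken from the answer above: instead of expanding every position into its own row, record +1 at each Start and -1 just past each Stop, then take a running sum. The events data frame and its column names are made up for this illustration, and the result gives only the total depth per region, not the per-sample colouring.

```r
dat <- data.frame(Sample = c(1, 1, 2, 2, 3),
                  Start  = c(100, 300, 150, 200, 160),
                  Stop   = c(180, 320, 190, 220, 170))

# +1 where a segment starts, -1 one position after it stops
# (Stop is treated as inclusive, as in the histogram version).
events <- data.frame(pos   = c(dat$Start, dat$Stop + 1),
                     delta = c(rep(1, nrow(dat)), rep(-1, nrow(dat))))

# Merge events that share a position, then accumulate in position order.
events <- aggregate(delta ~ pos, data = events, FUN = sum)
events <- events[order(events$pos), ]
events$depth <- cumsum(events$delta)

events
```

The depth is constant from each pos up to the next one, so a step plot such as geom_step(aes(x = pos, y = depth), data = events) would be one way to draw it. The number of rows grows with the number of segments, not with their length.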

Related

How do I make geom_bar colors change with gganimate when x and y are constant?

I'm new to gganimate and was having difficulty figuring out how to do this.
I'd like to show the spread in two different levels of a variable by animating colour transitions. I want to show this by having the narrow level transition through a smaller range of colours than the wider level in the same amount of time. Is this possible?
Here's the reproducible example I have so far.
library(ggplot2)
df <- data.frame(x = rep(c("narrow", "wide"), each = 1000),
y = c(sort(runif(n = 1000, min = 6, max = 7)),
sort(runif(n = 1000, min = 3, 10))),
animate.time = c(1:1000))
ggplot(data = df, aes(x = x, fill = y)) +
geom_bar()
Created on 2021-06-12 by the reprex package (v2.0.0)
As part of this, the color scale that each bar uses should be the same, but the narrow x should occupy a narrower range of the color range than the wide x.
I used geom_bar because I don't want the shape of the visualisation to change, however I'm not sure how to proceed. I tried adding animate.time so that gganimate would have something to step through, but then I don't see how I'm supposed to make it change colors based on y.
I'd appreciate any pointers.

There is an easier way to do this based on this.
library(colorspace)
library(gganimate)
library(ggplot2)
# Narrow Visualization
narrow<-data.frame(x = rep(10,10),
state_cont=rep(1:2, each = 5))
ggplot(data = narrow, aes(x = x, fill = state_cont)) +
geom_bar() +
colorspace::scale_fill_continuous_sequential(palette = "Reds 3", begin = 0.4, end = .6) +
transition_states(states = state_cont) + theme_void() +
theme(legend.position = "none")
# Wide Visualization
wide <- data.frame(x=rep(10,10),
state_cont=rep(1:2, each = 5))
ggplot(data = wide, aes(x = x, fill = state_cont)) +
geom_bar() +
colorspace::scale_fill_continuous_sequential(palette = "Reds 3", begin = 0, end = 1) +
transition_states(states = state_cont) + theme_void() + theme(legend.position = "none")
The explanation is that, for this particular question, you don't need to set every single value the fill should pass through. It is much easier to set the boundaries in scale_fill_continuous_sequential() via the begin and end arguments; gganimate will then cycle through the colours based on state_cont.
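To make the idea of restricting a scale to part of its range more concrete, here is a conceptual sketch in base R. It is not how colorspace implements begin and end; the white-to-dark-red ramp merely stands in for the "Reds 3" palette, and sub_range() is a hypothetical helper written for this illustration.

```r
# Stand-in palette: 101 colours at positions 0.00, 0.01, ..., 1.00
ramp <- grDevices::colorRampPalette(c("white", "darkred"))
full_scale <- ramp(101)

# Hypothetical helper: the colours lying between two positions on the scale
sub_range <- function(begin, end) {
  full_scale[seq(round(begin * 100) + 1, round(end * 100) + 1)]
}

narrow_cols <- sub_range(0.4, 0.6)  # middle fifth of the scale
wide_cols   <- sub_range(0, 1)      # the whole scale

length(narrow_cols)
length(wide_cols)
```

Both bars draw from the same ramp, but the narrow one can only reach the colours in the middle of it, which is the effect the question asks for.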

ggplot make default point size larger when size is already determined by another variable

I am trying to display data that includes non-detects. For the ND I want to have a circular outline at different sizes so that the lines do not overlap each other. I pretty much have what I want, but for the parameter cis-DCE the circular outline just makes the point look bigger instead of being a distinct outline. How do I attribute size to the parameter and also make the starting size larger?
I will include all of the code I am using for the graphing, but I am specifically working on this bit right now.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
I also know that I could use facet_wrap(), and I have done that previously. Historically, though, this data has been shown in a single graph without identifying the NDs, and I do not want to drastically alter the display of the data and confuse anyone.
{
#graphing
# folder where you want the graphs to be saved:
results <- 'C:/Users/cbuckley/OneDrive - DOI/Documents/Projects/New Haven/Data/Graphs/'
{
VOC.graph <- function(df, na.rm = TRUE, ...){
df$parm_nmShort <- factor(df$parm_nm, levels = c("cis.1.2.Dichloroethene_77093",
"Trichloroethene_34485",
"Tetrachloroethene_34475"),
labels = c("cis-DCE", "TCE", "PCE"))
# create list of sites in data to loop over
site_list <- unique(df$site_nm)
# create for loop to produce ggplot2 graphs
for (i in seq_along(site_list)) {
# create plot for each site in df
plot <-
ggplot(subset(df, df$site_nm==site_list[i]),
aes(x = date, y = result,
group = parm_nmShort,
color = parm_nmShort)) +
geom_point() + #add data point plot
geom_line() + #add line plot
#geom_point(aes(y = lrl, group = parm_nmShort, shape = parm_nmShort)) +
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
#scale_shape_manual(values = c("23","24","25")) + #create outlier shapes
#facet_wrap(~parm_nmShort) +
ggtitle(site_list[i]) + #title each graph with its site (well) name
# theme(legend.position="none") + #removed legend
labs(x = "Year", y = expression(paste("Value, ug/L"))) + #add x and y label titles
theme_article() + #remove grey boxes, outline graph in black
theme(legend.title = element_blank()) + #removes legend title
scale_x_date(labels = date_format("%y"),
limits = as.Date(c("2000-01-01","2021-01-01"))) #+ # set x axis for all graphs
# geom_hline(yintercept = 5) #+ #add 5ug/L contaminant limit horizontal line
# theme(axis.text.x = element_text(angle = 45, size = 12, vjust = 1)) + #angles x axis titles 45 deg
# theme(aspect.ratio = 1) +
# scale_color_hue(labels = c("cic-DCE", "PCE", "TCE")) + #change label names
# scale_fill_discrete(breaks = c("PCE", "TCE", "cic-DCE"))
# Code below will let you block out below the resolution limit
# geom_ribbon(aes(ymin = 0, ymax = ###LRL###), fill ="white", color ="grey3") +
# geom_line(color ="black", lwd = 1)
#ggsave(plot,
# file=paste(results, "", site_list[i], ".png", sep=''),
# scale=1)
# print plots to screen
print(plot)
}
}
#run graphing function with long data set
VOC.graph(data)
}}

Well after a lot of playing around, I figured out the answer to my own question. I figured I'd leave the question up because none of the solutions I found online worked for me but this code did.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, shape = parm_nmShort, size = parm_nmShort)) + #identify non detects
scale_shape_manual(values = c(1,1,1)) +
scale_size_manual(values = c(3,5,7)) +
I'm not very good at R, but for some reason, when I didn't include group and shape in the aes as parm_nmShort, I couldn't manually change the values. I don't know whether that's because I have more than one geom_point in my script, so maybe it didn't know which one to change.
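A minimal sketch with made-up data, for anyone who wants to experiment with the size scale in isolation. It assumes that mapping size inside aes() is what scale_size_manual() acts on, and it sets shape = 1 as a constant instead of mapping it, which differs from the code above. The nd data frame and its values are invented for illustration.

```r
library(ggplot2)

# Made-up data: three parameters, each with a reporting limit on two dates
nd <- data.frame(
  date = rep(as.Date(c("2010-01-01", "2015-01-01")), times = 3),
  lrl  = 1,
  parm_nmShort = factor(rep(c("cis-DCE", "TCE", "PCE"), each = 2),
                        levels = c("cis-DCE", "TCE", "PCE"))
)

p <- ggplot(nd, aes(x = date, y = lrl)) +
  geom_point(aes(size = parm_nmShort), shape = 1) +  # shape 1 = hollow circle
  scale_size_manual(values = c(3, 5, 7))             # one size per factor level, in level order

# Sizes that ggplot2 assigned to the points of the first layer
sort(unique(layer_data(p, 1)$size))
```

Because the circles are concentric and of different sizes, the outlines for the three parameters stay distinguishable even when they share the same date and limit.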

Is there an equivalent to points() on ggplot2

I'm working with stock prices and trying to plot the price difference.
I created a plot using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(); I am wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()

We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
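The same three-way label can also be built in a single step with nested ifelse() calls. This is only a sketch of an equivalent formulation, using the same made-up data as above; the seed is arbitrary and is there only to make the example reproducible.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
data <- data.frame(date = 2001:2020,
                   spread = runif(20, -10, 10))
thresh <- 4

# Every row receives exactly one label, so no NA fill-in step is needed.
data$outlier <- ifelse(data$spread >  thresh, "Above",
                ifelse(data$spread < -thresh, "Below", "In Range"))

table(data$outlier)
```

The resulting column can be passed to aes(shape = outlier) exactly as before.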
# If you want points just above and below
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the threshold as individual layers with manually specified shapes, like this. The pch argument sets the shape type.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")
# No scale_shape_manual() is needed here, because no shape aesthetic is mapped.

Transforming the y-axis without changing raw data in ggplot2

I have a question about how to transform the y-axis in ggplot2. My plot now has two lines and a scatter plot. For the scatter plot, I am very interested in the area around zero. Is there a possible way to enlarge the space between 0% and 5% and narrow the space between 20% and 30%?
I have tried to use coord_trans(y = "log10") to transform into a log form. But in this case, I have a lot of negative values, so if I want to use sqrt or log, the negative values will be removed. Do you have any suggestions?
Example of data points:
df1 = data.frame(y = runif(200,min = -1, max = 1))
df1 = data.frame( x= seq(1:200), y = df1[order(abs(df1$y)),])
ggplot(df1) +
geom_point(colour = "black",aes(x,y) ,size = 0.1)
I want to have more space between 0% and 5 % and less space between 5% and 30%.
I have tried to use trans_new() to transform the axes.
eps <- 1e-8
tn <- trans_new("logpeps",
function(x) (x+eps)^(3),
function(y) ((y)^(1/3) ),
domain=c(- Inf, Inf)
)
ggplot(df1)+ geom_point(colour = "black",aes(x,y) ,size = 0.1) +
# xlab("Observations sorted by PD in v3.1") + ylab("Absolute PD difference ") +
# ggtitle("Absolute PD for RiskCalc v4.0 relative to v3.1") +
scale_x_continuous(breaks = seq(0, round(rownum/1000)*1000, by = round(rownum/100)*10)) +
scale_y_continuous(limits = c(-yrange,yrange),breaks = c(-breaksY,breaksY),
sec.axis = sec_axis(~.,breaks = c(-breaksY[2:length(breaksY)],breaksY), labels = scales:: percent
)) +
# geom_line(data = df, aes(x,y[,3], colour = "blue"),size = 1) +
# geom_line(data = ds,aes(xval, yval,colour = "red"),size = 1) +
coord_trans(y = tn) +
scale_color_discrete(name = element_blank())
But it compresses the plot toward the center, which is the opposite of what I want. I then tried y = y^3, but it fails with:
ERROR: zero_range(range)

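Try a cube root transform on the y values. Because R returns NaN when a negative number is raised to the power 1/3, take the root of the absolute value and restore the sign:
aes(y = sign(yVariable) * abs(yVariable)^(1/3))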
or use trans_new() to define a new transformation (such as cube root, with pleasing breaks and labels).
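One way the trans_new() suggestion might look, as a sketch under assumptions: the name signed_cuberoot, the seed and the chosen breaks are my own and are not from the question. The transformation keeps the sign so that negative values survive, and it stretches the region around zero.

```r
library(ggplot2)
library(scales)

# Sign-preserving cube root: expands values near 0, compresses large ones.
signed_cuberoot <- trans_new(
  name      = "signed_cuberoot",
  transform = function(x) sign(x) * abs(x)^(1/3),
  inverse   = function(y) y^3
)

set.seed(42)  # arbitrary seed, for reproducibility only
df1 <- data.frame(x = 1:200, y = sort(runif(200, min = -1, max = 1)))

p <- ggplot(df1, aes(x, y)) +
  geom_point(colour = "black", size = 0.1) +
  scale_y_continuous(breaks = c(-1, -0.3, -0.05, 0, 0.05, 0.3, 1),
                     labels = scales::percent) +
  coord_trans(y = signed_cuberoot)  # raw data are untouched; only the axis is warped
```

Printing p should show the 0% to 5% band taking up far more of the axis than it does on a linear scale, while the raw values in df1 stay unchanged. The question's version cubed the values in the forward direction, which is why it squeezed the plot toward the center.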

A couple thoughts:
You can remove the empty edges of the plot like so:
scale_y_continuous(expand = c(0,0))
If you want to try the log transformation, just do:
scale_y_log10()
If you want to focus the window:
scale_y_continuous(limits=c(-.15,.15), expand=c(0,0))
Also consider adding theme_bw() for a cleaner look

ggplot, facet, piechart: placing text in the middle of pie chart slices

I'm trying to produce a facetted pie-chart with ggplot and facing problems with placing text in the middle of each slice:
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1), y=Cnt, label=Cnt, ymax=Cnt),
position=position_fill(width=1))
The output:
What parameters of geom_text should be adjusted in order to place numerical labels in the middle of piechart slices?
A related question is "Pie plot getting its text on top of each other", but it doesn't handle the faceted case.
UPDATE: following Paul Hiemstra's advice and the approach in the question above, I changed the code as follows (changed lines are marked):
pie_text = dat$Cnt/2 + c(0,cumsum(dat$Cnt)[-length(dat$Cnt)]) # <--- new
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1),
y=pie_text, # <--- changed
label=Cnt, ymax=Cnt), position=position_fill(width=1))
As I expected, the adjusted text coordinates are absolute (computed over the whole data frame), but they need to be computed within each facet's data:

NEW ANSWER: With the introduction of ggplot2 v2.2.0, position_stack() can be used to position the labels without the need to calculate a position variable first. The following code will give you the same result as the old answer:
ggplot(data = dat, aes(x = "", y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
To remove the "hollow" center, adapt the code to:
ggplot(data = dat, aes(x = 0, y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
scale_x_continuous(expand = c(0,0)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
OLD ANSWER: The solution to this problem is creating a position variable, which can be done quite easily with base R or with the data.table, plyr or dplyr packages:
Step 1: Creating the position variable for each Channel
# with base R
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5*x))
# with the data.table package
library(data.table)
setDT(dat)
dat <- dat[, pos:=cumsum(Cnt)-0.5*Cnt, by="Channel"]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(Channel), transform, pos=cumsum(Cnt)-0.5*Cnt)
# with the dplyr package
library(dplyr)
dat <- dat %>% group_by(Channel) %>% mutate(pos=cumsum(Cnt)-0.5*Cnt)
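To see what the position variable contains, here is the base R variant run on the question's data, next to the ungrouped formula from the question's UPDATE. The column name pos_global is made up for this comparison.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Per-channel midpoint of each slice: running total minus half the slice
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5 * x))

# Ungrouped version for comparison: the running total carries over into KIOSK
dat$pos_global <- dat$Cnt / 2 + c(0, cumsum(dat$Cnt)[-nrow(dat)])

dat
# pos:        4172.0 11068.0 25703.5  9637.5 26052.0 51975.5
# pos_global: 4172.0 11068.0 25703.5 47252.5 63667.0 89590.5
```

The two columns agree for AGENT and diverge for KIOSK, which is why the labels have to be positioned within each facet's data.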
Step 2: Creating the facetted plot
library(ggplot2)
ggplot(data = dat) +
geom_bar(aes(x = "", y = Cnt, fill = Volume), stat = "identity") +
geom_text(aes(x = "", y = pos, label = Cnt)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
The result:

I would like to speak out against the conventional way of making pies in ggplot2, which is to draw a stacked barplot in polar coordinates. While I appreciate the mathematical elegance of that approach, it does cause all sorts of headaches when the plot doesn't look quite the way it's supposed to. In particular, precisely adjusting the size of the pie can be difficult. (If you don't know what I mean, try to make a pie chart that extends all the way to the edge of the plot panel.)
I prefer drawing pies in a normal cartesian coordinate system, using geom_arc_bar() from ggforce. It requires a little bit of extra work on the front end, because we have to calculate angles ourselves, but that's easy and the level of control we get as a result is more than worth it.
I've used this approach in previous answers here and here.
The data (from the question):
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
The pie-drawing code:
library(ggplot2)
library(ggforce)
library(dplyr)
# calculate the start and end angles for each pie
dat_pies <- left_join(dat,
dat %>%
group_by(Channel) %>%
summarize(Cnt_total = sum(Cnt))) %>%
group_by(Channel) %>%
mutate(end_angle = 2*pi*cumsum(Cnt)/Cnt_total, # ending angle for each pie slice
start_angle = lag(end_angle, default = 0), # starting angle for each pie slice
mid_angle = 0.5*(start_angle + end_angle)) # middle of each pie slice, for the text label
rpie = 1 # pie radius
rlabel = 0.6 * rpie # radius of the labels; a number slightly larger than 0.5 seems to work better,
# but 0.5 would place it exactly in the middle as the question asks for.
# draw the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt),
hjust = 0.5, vjust = 0.5) +
coord_fixed() +
scale_x_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
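The angle arithmetic can also be checked on its own in base R, without dplyr. This is a sketch for verification, not a replacement for the pipeline above; the column names label_x and label_y are introduced here for illustration.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Fraction of the full circle (2*pi) covered up to the end of each slice
total <- ave(dat$Cnt, dat$Channel, FUN = sum)
dat$end_angle <- 2 * pi * ave(dat$Cnt, dat$Channel, FUN = cumsum) / total

# Each slice starts where the previous one in the same channel ended
dat$start_angle <- ave(dat$end_angle, dat$Channel,
                       FUN = function(a) c(0, head(a, -1)))
dat$mid_angle <- 0.5 * (dat$start_angle + dat$end_angle)

# Label coordinates: sin() for x and cos() for y, as in the plotting code,
# so angle 0 points straight up and angles increase clockwise
rlabel <- 0.6
dat$label_x <- rlabel * sin(dat$mid_angle)
dat$label_y <- rlabel * cos(dat$mid_angle)

dat[, c("Channel", "Volume", "start_angle", "end_angle", "label_x", "label_y")]
```

Within each channel the last end_angle should equal 2*pi, and every label should sit at distance rlabel from the centre of its pie.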
To show why I think this approach is so much more powerful than the conventional (coord_polar()) approach, let's say we want the labels on the outside of the pie rather than inside. This creates a couple of problems: we will have to adjust hjust and vjust depending on the side of the pie a label falls on, and we will also have to make the plot panel wider than it is high to make space for the labels on the side without generating excessive space above and below. Solving these problems in the polar coordinate approach is not fun, but it's trivial in cartesian coordinates:
# generate hjust and vjust settings depending on the quadrant into which each
# label falls
dat_pies <- mutate(dat_pies,
hjust = ifelse(mid_angle>pi, 1, 0),
vjust = ifelse(mid_angle<pi/2 | mid_angle>3*pi/2, 0, 1))
rlabel = 1.05 * rpie # now we place labels outside of the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt,
hjust = hjust, vjust = vjust)) +
coord_fixed() +
scale_x_continuous(limits = c(-1.5, 1.4), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)

To tweak the position of the label text relative to the coordinate, you can use the vjust and hjust arguments of geom_text. This will determine the position of all labels simultaneously, so this might not be what you need.
Alternatively, you could tweak the coordinate of the label. Define a new data.frame where you average the Cnt coordinates (label_x[i] = (Cnt[i+1] + Cnt[i]) / 2) to position the label in the center of that particular slice. Then pass this new data.frame to geom_text in place of the original data.frame.
In addition, piecharts have some visual interpretation flaws. In general I would not use them, especially where good alternatives exist, e.g. a dotplot:
ggplot(dat, aes(x = Cnt, y = Volume)) +
geom_point() +
facet_wrap(~ Channel, ncol = 1)
For example, from this plot it is obvious that Cnt is higher for Kiosk than for Agent; this information is lost in the piechart.

The following answer is partial and clunky, and I won't accept it.
The hope is that it will solicit a better solution.
text_KIOSK = dat$Cnt
text_AGENT = dat$Cnt
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
text_KIOSK = text_KIOSK/1.7 + c(0,cumsum(text_KIOSK)[-length(dat$Cnt)])
text_AGENT = text_AGENT/1.7 + c(0,cumsum(text_AGENT)[-length(dat$Cnt)])
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
pie_text = text_KIOSK + text_AGENT
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position=position_fill(width=1)) +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(y=pie_text, label=format(Cnt,format="d",big.mark=','), ymax=Inf), position=position_fill(width=1))
It produces the following chart:
As you can see, I can't move the labels for green (low).