ggplot is not graphing a vertical line - r

I am trying to plot a graph in ggplot2 where the x-axis represents month-day combinations and the dots represent y-values for two different groups.
When graphing my original data set using this code,
ggplot(graphing.df, aes(MONTHDAY, y.var, color = GROUP)) +
geom_point() +
ylab(paste0(""))+
scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 15)])+
theme(legend.text = element_blank(),
legend.title = element_blank()) +
geom_vline(xintercept = which(graphing.df$MONTHDAY == "12-27")[1], col='red', lwd=2)
I get this graph where the vertical line is not showing.
When I tried to create a reproducible example using the following code...
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
#geom_vline(xintercept = which(df$MONTHDAY == verticle_line)[1], col='red', lwd=2)+
geom_vline(xintercept = which(df$MONTHDAY == verticle_line), col='blue', lwd=2)
The vertical line is showing, but now it's showing in the wrong place.
In my original data set I have two values for each month-day combination (one for each of the two groups). The month-day combination column is a character vector; it is not a factor and does not have levels.

Here is a way. It subsets the data keeping only the rows of interest and plots the vertical line defined by MONTHDAY.
library(ggplot2)
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(data = subset(df, MONTHDAY == verticle_line),
mapping = aes(xintercept = MONTHDAY), color = 'blue', size = 2)
Data
I will repost the data creation code, this time setting the RNG seed in order to make the example reproducible.
set.seed(2020)
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))

Your line is not showing up where you expect because you are setting xintercept= from the output of the which() function. which() returns the indices at which the condition is TRUE. So in the case of your reproducible example, you get the following:
> which(df$MONTHDAY == verticle_line)
[1] 3 4
It returns a vector indicating that elements 3 and 4 of df$MONTHDAY satisfy the condition. So your code below:
geom_vline(xintercept = which(df$MONTHDAY == verticle_line)...
Reduces down to this:
geom_vline(xintercept = c(3,4)...
Your MONTHDAY axis is not formatted as a date; it is treated as a discrete axis of character values. In this case xintercept=c(3,4) applied to a discrete axis draws two vertical lines at the 3rd and 4th discrete positions on that axis: in other words, at "01-03" and at a 4th position that does not exist within the axis limits.
How do you fix this? Just take out which():
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = verticle_line, col='blue', lwd=2)
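To see why the row index and the axis position differ, here is a small sketch using the example data. It assumes that a discrete axis built from a character column places its labels in sorted order; row_idx and axis_pos are names introduced only for this illustration.

```r
# Example data from the question (VALUE is omitted; it does not matter here)
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
                 stringsAsFactors = FALSE)
verticle_line <- "01-02"

# Row numbers where the condition holds: two rows per day, so two hits
row_idx <- which(df$MONTHDAY == verticle_line)

# Position of the label among the sorted unique values, which is
# (by assumption) where a discrete axis places it
axis_pos <- match(verticle_line, sort(unique(df$MONTHDAY)))

print(row_idx)
print(axis_pos)
```

With two rows per day, the row number grows roughly twice as fast as the axis position, which would presumably explain why the line in the original data lands beyond the last tick. Passing the label itself (or axis_pos) as xintercept avoids the mismatch.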

We can get the corresponding values of 'MONTHDAY' after subsetting:
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = df$MONTHDAY[df$MONTHDAY == verticle_line],
col='blue', lwd=2)
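One detail worth checking with this approach: because each day has two rows, the subset contains the label twice, so two identical lines would be drawn on top of each other. A minimal sketch, with hits as a name made up for this example:

```r
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
                 stringsAsFactors = FALSE)
verticle_line <- "01-02"

# Subsetting keeps one copy of the label per matching row
hits <- df$MONTHDAY[df$MONTHDAY == verticle_line]
print(hits)

# A single copy is enough to draw one line
print(unique(hits))
```

Passing unique(hits) to xintercept should give the same picture with a single line, which matters mainly if the line is drawn semi-transparent.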

Related

Is there an equivalent to points() on ggplot2

I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(); I am wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points just above and below
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
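As a variation on the labelling step, the three categories can also be assigned in one pass, and the shapes can be given as a named vector so that the mapping does not depend on alphabetical order. This is only a sketch on freshly generated sample data; shape_map is a name introduced here for illustration.

```r
set.seed(1)
data <- data.frame(date = 2001:2020,
                   spread = runif(20, -10, 10))
thresh <- 4

# Label every row in a single nested ifelse(), so no NA step is needed
data$outlier <- ifelse(data$spread > thresh, "Above",
                ifelse(data$spread < -thresh, "Below", "In Range"))

# Named shapes: triangle above, circle below, square in range
shape_map <- c("Above" = 17, "Below" = 16, "In Range" = 15)

print(table(data$outlier))
```

The named vector can then be passed as scale_shape_manual(values = shape_map), which matches shapes to categories by name.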
Alternatively, you can just add the points above and below the threshold as individual layers with manually specified shapes, like this. The pch argument sets the shape type.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")

Smoothen Heatmap in ggplot

I have a dataframe that looks as follows:
X = c(6,6.2,6.4,6.6,6.8,5.6,5.8,6,6.2,6.4,6.6,6.8,7,7.2,7.4,7.6,7.8,8,2.8,3,3.2,3.4,3.6,3.8,4,4.2,4.4,4.6,4.8,5)
Y = c(2.2,2.2,2.2,2.2,2.2,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8)
Value = c(0,0.00683254,0,0.007595654,0.015517884,0,0,0,0,0,0,0,0,0,0.005219395,0,0,0,0,0,0,0,0,0,0,0,0.002892342,0,0.002758141,0)
table = data.frame(X, Y, Value)
I have put together a heatmap in R, based on the following command:
ggplot(data = table, mapping = aes(x = X, y = Y)) +
geom_tile(aes(fill = Value), colour = 'black') +
theme_void() +
scale_fill_gradient2(low = "white", high = "black") + xlab(label = "X") + ylab(label = "Y")
Since there is not a value for every X and Y, it leads to plots that appear as follows.
I am attempting to smoothen the plot and have the following question:
As there are small white spaces between the plotted values, how could one color these white spaces with the median intensity? Said differently, how would I first create an initial layer with the non-zero median 'Value' before plotting the non-zero 'Value' on top (overlaid)?
A 'smoothed' sample is shown below; it looks closer to the desired output.
I'm not sure this will fully fit your need, but as I understand it, some combinations of X and Y are missing from your data.
So, you can use the complete function from tidyr to generate all combinations of X and Y (those without values are filled with NA), and then use the na.value argument of scale_fill_gradient2 to give these NA cells the same color as the midpoint value:
library(tidyr)
library(dplyr)
library(ggplot2)
table %>% complete(X,Y) %>%
ggplot(aes(x = X, y = Y))+
geom_raster(aes(fill = Value), interpolate = TRUE)+
scale_fill_gradient2(low = "white", mid = "grey",high = "black",
na.value = "grey")
Does this answer your question?
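To make the completion step more concrete, here is a rough base R analogue on a tiny made-up data set (toy, grid and full are hypothetical names, not part of the original data). It is not tidyr's implementation, just a sketch of the same idea: build every X/Y combination and left-join the known values onto it.

```r
# Toy data: the combination X = 2, Y = 2 is missing
toy <- data.frame(X = c(1, 2, 1),
                  Y = c(1, 1, 2),
                  Value = c(0.1, 0.2, 0.3))

# All combinations of the observed X and Y values
grid <- expand.grid(X = unique(toy$X), Y = unique(toy$Y))

# Left join: combinations without a Value get NA
full <- merge(grid, toy, by = c("X", "Y"), all.x = TRUE)
print(full)
```

The NA rows are the cells that na.value colors in the plot.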

Plot different parts of a vector with different colors on the same graph

As in the title, suppose this vector and plot:
plot(rnorm(200,5,2),type="l")
This returns this plot.
What I would like to know is whether there is a way to make the first half of it blue (col="blue") and the rest of it red (col="red").
A similar question exists, but for Matlab rather than R.
You could simply use lines for the second half:
dat <- rnorm(200, 5, 2)
plot(1:100, dat[1:100], col = "blue", type = "l", xlim = c(0, 200), ylim = c(min(dat), max(dat)))
lines(101:200, dat[101:200], col = "red")
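Note that with 1:100 and 101:200 nothing is drawn between points 100 and 101. If a continuous line is wanted, one option is to let the second half start at index 100, as in this sketch (first and second are names made up here):

```r
set.seed(42)
dat <- rnorm(200, 5, 2)

first  <- 1:100     # blue part
second <- 100:200   # red part, starting at the last blue point

plot(first, dat[first], col = "blue", type = "l",
     xlim = c(1, 200), ylim = range(dat),
     xlab = "Index", ylab = "Value")
# Sharing index 100 joins the red line to the end of the blue one
lines(second, dat[second], col = "red")
```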
Not a base R solution, but I think this is how to plot it using ggplot2. It is necessary to prepare a data frame to plot the data.
set.seed(1234)
vec <- rnorm(200,5,2)
dat <- data.frame(Value = vec)
dat$Group <- as.character(rep(c(1, 2), each = 100))
dat$Index <- 1:200
library(ggplot2)
ggplot(dat, aes(x = Index, y = Value)) +
geom_line(aes(color = Group)) +
scale_color_manual(values = c("blue", "red")) +
theme_classic()
We can also use the lattice package with the same data frame.
library(lattice)
xyplot(Value ~ Index, data = dat, type = 'l', groups = Group, col = c("blue", "red"))
Notice that the blue line and red line are disconnected. Not sure if this is important, but if you want to plot a continuous line, here is a workaround in ggplot2. The idea is to subset the data frame for the second half, plot the entire data frame with color as blue, and then plot the second data frame with color as red.
dat2 <- dat[dat$Index %in% 101:200, ]
ggplot(dat, aes(x = Index, y = Value)) +
geom_line(color = "blue") +
geom_line(data = dat2, aes(x = Index, y = Value), color = "red") +
theme_classic()
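Another possible way to get a continuous line, sketched here under the assumption that the line type is solid, is to keep the color mapping but force all points into one group. The data frame is rebuilt so that the block runs on its own; p is a name used only for this example.

```r
library(ggplot2)

set.seed(1234)
dat <- data.frame(Index = 1:200, Value = rnorm(200, 5, 2))
dat$Group <- rep(c("1", "2"), each = 100)

# group = 1 keeps all points in a single path; the color is then
# applied segment by segment instead of splitting the line in two
p <- ggplot(dat, aes(x = Index, y = Value)) +
  geom_line(aes(color = Group, group = 1)) +
  scale_color_manual(values = c("1" = "blue", "2" = "red")) +
  theme_classic()
print(p)
```

As far as I can tell, the segment joining points 100 and 101 takes the color of its starting point, so it is drawn in blue.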

Facet wrap radar plot with three apexes in R

I have created the following plot which gives the shape of the plot I desire. But when I facet wrap it, the shapes no longer remain triangular and become almost cellular. How can I keep the triangular shape after faceting?
Sample data:
lvls <- c("a","b","c","d","e","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15")
df <- data.frame(Product = factor(rep(lvls, 3)),
variable = c(rep("Ingredients", 20),
rep("Defence", 20),
rep("Benefit", 20)),
value = rnorm(60, mean = 5))
Now when I use this code, I get the shapes I desire.
ggplot(df,
aes(x = variable,
y = value,
color = Product,
group = Product)) +
geom_polygon(fill = NA) +
coord_polar()
However, the products are all on top of one another so ideally I would like to facet wrap.
ggplot(df,
aes(x = variable,
y = value,
color = Product,
group = Product)) +
geom_polygon(fill = NA) +
coord_polar() +
facet_wrap(~Product)
But when I facet wrap, the shapes become oddly cellular and not triangular (straight lines from point to point). Any ideas on how to alter this output?
Thanks.

ggplot2: how to add sample numbers to density plot?

I am trying to generate a (grouped) density plot labelled with sample sizes.
Sample data:
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))
The unlabelled density plot is generated and looks as follows:
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.
I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013
n_fun <- function(x){
return(data.frame(y = max(x), label = paste0("n = ",length(x))))
}
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4) +
stat_summary(geom = "text", fun.data = n_fun)
However, this fails with Error: stat_summary requires the following missing aesthetics: y.
I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.
I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.
Where am I going wrong?
The y in the return value of fun.data is not the y aesthetic. stat_summary complains that it cannot find y, which should be specified either in the global settings at ggplot(df, aes(x = val, group = ab.class, y = or in stat_summary(aes(y = if no global setting of y is available. fun.data computes where to display the point/text/... at each x, based on the y supplied by the data through aes. (I am not sure whether I have made this clear; I am not a native English speaker.)
Even if you specify y through aes, you won't get the desired result, because stat_summary computes a y at each x.
However, you can add text at the desired positions with geom_text or annotate:
# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]
# Column 'scaled' is the density divided by its maximum, so its largest value marks the peak;
# we extract that row for each group
p.text <- lapply(split(p.data, f = p.data$group), function(df){
df[which.max(df$scaled), ]
})
p.text <- do.call(rbind, p.text) # we can also get p.text with dplyr.
# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
label = sprintf('n = %d', p.text$n), vjust = 0)
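If the label color should follow the groups through aes(), one option is to compute the peaks yourself and feed them to geom_text. The sketch below uses base R density(), whose evaluation grid is not identical to the one geom_density uses, so the label positions are only approximately at the drawn peaks; peak_labels is a name introduced for this example.

```r
library(ggplot2)

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# One row per group: location and height of the density peak, plus group size
peak_labels <- do.call(rbind, lapply(split(df, df$ab.class), function(g) {
  d <- density(g$val)  # base R kernel density estimate
  data.frame(ab.class = g$ab.class[1],
             x = d$x[which.max(d$y)],
             y = max(d$y),
             n = nrow(g))
}))

p <- ggplot(df, aes(x = val)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  # Labels come from their own data frame; colour is mapped per group
  geom_text(data = peak_labels,
            aes(x = x, y = y, label = paste0("n = ", n), colour = ab.class),
            vjust = -0.3)
print(p)
```

Because the labels are an ordinary layer with their own data, this generalises to any number of groups without calling ggplot_build.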
