I am using ggplot2 and was able to get the plot I want.
But when I tried to add a legend, something went wrong: the legend shows the wrong shapes, sizes and linetypes; only the colours match.
Here is the code, with simulated data:
library(ggplot2)
set.seed(5703)
# DATA 1
x1 <- seq(1, 100)
y1 <- rnorm(mean = 50, sd = 10, length(x1))
df1 <- data.frame(x1, y1)
head(df1)
# DATA 2
x2 <- seq(1, 100, 5)
y2 <- rnorm(mean = 50, sd = 2, length(x2))
df2 <- data.frame(x2, y2)
head(df2)
# Plot: DATA 1 and DATA 2
p101 <- ggplot (df1, aes( x = x1, y = y1) ) +
geom_point(aes(color="Vals every 1sec - shape circle"), shape = 1, size = 4 ) +
geom_line (aes(color="Vals every 1sec - shape circle"), size = 0.5, linetype= "dotdash") +
geom_point(data= df2, aes(x = x2, y = y2, color="Vals every 5sec - shape: triangle & bigger, line: thicker"), shape= 2, size= 6 ) +
geom_line (data= df2, aes(x = x2, y = y2, color="Vals every 5sec - shape: triangle & bigger, line: thicker"), size = 1.25, linetype = "solid" ) +
scale_colour_manual("", values=c("Vals every 1sec - shape circle" = "#e66101",
"Vals every 5sec - shape: triangle & bigger, line: thicker" = "#5e3c99" ) )+
theme(legend.position = c(0.7,0.1) )+
labs (title = "Graph Nr. 101", x = "Time [s]", y = "Values")
p101
# legend is mixed up, it is not showing the correct shapes and sizes for each data
In the resulting plot, both legend items show a circle and a triangle with the same size and linetype.
Maybe the plot code is entirely wrong, so I am open to any suggestions and ready to learn :)
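You will still have to add your legend and theme changes, but this should get you where you want: put both data sets into one data frame with a group column, and map colour, linetype and shape to that column.
library(ggplot2)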
library(dplyr)
set.seed(5703)
# DATA 1
x1 <- seq(1, 100)
y1 <- rnorm(mean = 50, sd = 10, length(x1))
df1 <- data.frame(x = x1, y = y1, group = "A")
head(df1)
# DATA 2
x2 <- seq(1, 100, 5)
y2 <- rnorm(mean = 50, sd = 2, length(x2))
df2 <- data.frame(x = x2, y = y2, group = "B")
head(df2)
df <- bind_rows(df1, df2)
# Plot: DATA 1 and DATA 2
p101 <- ggplot (df, aes( x = x, y = y, color = group) ) +
geom_line(aes(linetype = group), size = 0.5) +
geom_point(aes(shape = group), size = 4 )
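Building on that, here is a sketch of how the legend and theme changes might be added. It assumes ggplot2 >= 3.4.0 (for the `linewidth` aesthetic and `scale_linewidth_manual()`), and the group labels and the named vectors `cols`, `ltys`, `lwds`, `shps`, `sizes` are illustrative choices, not part of the original code. Because every mapped aesthetic uses the same column, the same scale name and the same breaks, ggplot2 should merge them into a single legend where each key shows the matching colour, shape, size and linetype.

```r
library(ggplot2)  # linewidth aesthetic needs ggplot2 >= 3.4.0
library(dplyr)

set.seed(5703)
df1 <- data.frame(x = 1:100, y = rnorm(100, mean = 50, sd = 10),
                  group = "Vals every 1sec")
x2 <- seq(1, 100, 5)
df2 <- data.frame(x = x2, y = rnorm(length(x2), mean = 50, sd = 2),
                  group = "Vals every 5sec")
df <- bind_rows(df1, df2)

# Named values tie each setting to a group label explicitly (illustrative).
cols  <- c("Vals every 1sec" = "#e66101", "Vals every 5sec" = "#5e3c99")
ltys  <- c("Vals every 1sec" = "dotdash", "Vals every 5sec" = "solid")
lwds  <- c("Vals every 1sec" = 0.5,       "Vals every 5sec" = 1.25)
shps  <- c("Vals every 1sec" = 1,         "Vals every 5sec" = 2)
sizes <- c("Vals every 1sec" = 4,         "Vals every 5sec" = 6)

# All five scales share the same (empty) name and the same breaks,
# so ggplot2 merges them into one legend.
p101 <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_line(aes(linetype = group, linewidth = group)) +
  geom_point(aes(shape = group, size = group)) +
  scale_colour_manual("", values = cols) +
  scale_linetype_manual("", values = ltys) +
  scale_linewidth_manual("", values = lwds) +
  scale_shape_manual("", values = shps) +
  scale_size_manual("", values = sizes) +
  theme(legend.position = "bottom") +
  labs(title = "Graph Nr. 101", x = "Time [s]", y = "Values")
p101
```

If one of the scales gets a different name (or different labels), ggplot2 will draw it as a separate legend, which is a quick way to check that the merge is what you want.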
I have continuous data that I'd like to plot using R's plotly with a box or violin plot without the outliers and whiskers:
set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))
I know how to remove outliers in plotly:
plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)
But I'm still left with the whiskers.
I tried using ggplot2 for that (also limiting the y-axis to the largest per-group 75th percentile):
library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
theme_minimal() + theme(legend.position = "none",axis.title = element_blank())
But converting that to a plotly object does not preserve those settings:
plotly::ggplotly(gp)
Any idea?
This is a workaround.
I changed your plot a bit, first.
library(plotly)
library(dplyr)
# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box',
color = ~group, boxpoints = F, showlegend = F,
whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line
Then I add the medians back to the graph. This requires calculating the medians, matching the colors, and creating the shape lists for Plotly.
The colors are a bit odd: the first three default colors are used, but in the order g3, g2, g1.
# the medians
res = df %>% group_by(group) %>%
summarise(med = median(value))
# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1
# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # attributes every line shape needs
list(type = 'line',
line = list(color = col, width = 4),
xref = "paper", yref = "y")
}
# horizontal segments/ median
segs = lapply(1:nrow(res),
function(k){
x1 <- k/3 - .08 # if the domain is [0, 1]
x0 <- (k - 1)/3 + .08
y0 <- y1 <- res[k, ]$med
line = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1)
deets = details(col[k])
c(deets, line)
})
Finally, I added them back onto the plot.
p %>% layout(shapes = segs)
I made the lines obnoxiously wide, but you get the idea.
If you want the IQR outline back, you can add it in the same way. I used functions here as well; I assume the data you provided is not your actual data, so the functions should make this easier to reuse.
# include IQR outline
res2 = df %>% group_by(group) %>%
summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
q3 = setNames(quantile(value, type = 7, 3/4), NULL),
med = median(value))
# IQR segments
rects = lapply(1:nrow(res2), # if the domain is [0, 1]
function(k){
x1 <- k/3 - .08
x0 <- (k - 1)/3 + .08
y0 <- res2[k, ]$q1
y1 <- res2[k, ]$q3
line = list(color = col[k], width = 4)
rect = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1,
type = "rect", xref = "paper",
yref = "y", "line" = line)
rect
})
rects = append(segs, rects)
p %>% layout(shapes = rects)
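The `k/3` and `.08` arithmetic above hard-codes three groups. A sketch that generalizes it to any number of groups might look like this; `box_span()` and `median_segments()` are hypothetical helpers, and they keep the same assumptions as the answer (x-axis domain [0, 1], each category gets an equal 1/n slot, and an inset chosen so that n = 3 reproduces the `.08` margin). The inset is an empirical value, not something read from plotly's defaults.

```r
# Hypothetical helper: paper-coordinate [x0, x1] for the k-th of n boxes.
# Assumes the x-axis domain is [0, 1], every category gets an equal 1/n
# slot, and the box is inset on both sides by `inset` (a fraction of the
# slot width). inset = 0.24 reproduces the .08 used above for n = 3.
box_span <- function(k, n, inset = 0.24) {
  slot <- 1 / n
  c(x0 = (k - 1) * slot + inset * slot,
    x1 = k * slot - inset * slot)
}

# Hypothetical helper: one horizontal median line per row of `res`
# (res needs a `med` column), in the list format layout(shapes = ...) takes.
median_segments <- function(res, cols, inset = 0.24, width = 4) {
  n <- nrow(res)
  lapply(seq_len(n), function(k) {
    span <- box_span(k, n, inset)
    list(type = "line", xref = "paper", yref = "y",
         line = list(color = cols[k], width = width),
         x0 = unname(span["x0"]), x1 = unname(span["x1"]),
         y0 = res$med[k], y1 = res$med[k])
  })
}
```

With the objects from above, usage would be `p %>% layout(shapes = median_segments(res, col))`; the color order for more than three groups still needs to be checked against the actual plot.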
I'm not sure what the correct name for this type of plot would be, but let's say we have a list of names (or letters here): data <- data.frame(letters[1:10])
Let's also say that we want to illustrate which of these names are connected based on some empirical decision, so we have a list of observations we want to connect in a plot like the following (made in PowerPoint):
Can this be done in ggplot?
Yes, it can be done in ggplot.
Let's start by setting up a data frame of letters, with associated positions on the x and y axes of a plot. We'll make the x values 1 and 2 (though this is arbitrary), and the y values 1:10 (also arbitrary, as long as they are evenly spaced).
labels <- data.frame(x = c(rep(1, 10), rep(2, 10)),
y = rep(1:10, 2),
labs = rep(LETTERS[10:1], 2),
stringsAsFactors = FALSE)
Now we also need some way of deciding which letters will be joined. Let's do this by having a simple data frame of "left" and "right" values, where each row describes which two letters will be joined:
set.seed(69)
joins <- data.frame(left = sample(LETTERS[1:10], 6, TRUE),
right = sample(LETTERS[1:10], 6, TRUE),
stringsAsFactors = FALSE)
joins
#> left right
#> 1 A G
#> 2 B B
#> 3 H J
#> 4 G D
#> 5 G J
#> 6 F B
Now we can assign start and end x and y co-ordinates for the lines by matching the letters in these two columns to the columns in our labels data frame:
joins$x <- rep(1.05, nrow(joins))
joins$xend <- rep(1.9, nrow(joins))
joins$y <- labels$y[match(joins$left, labels$labs)]
joins$yend <- labels$y[match(joins$right, labels$labs)]
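To see what the `match()` lookup does, here is a small check. `y_of()` is a hypothetical convenience wrapper around the same expression. Because `labs` is `LETTERS[10:1]`, "A" sits at the top (y = 10) and "J" at the bottom (y = 1); `match()` returns the first matching row, which is enough here since both columns use the same y positions.

```r
labels <- data.frame(x = c(rep(1, 10), rep(2, 10)),
                     y = rep(1:10, 2),
                     labs = rep(LETTERS[10:1], 2),
                     stringsAsFactors = FALSE)

# Hypothetical wrapper: y position of each letter (first match in labels).
y_of <- function(letter) labels$y[match(letter, labels$labs)]

y_of(c("A", "E", "J"))  # 10 6 1
```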
This just leaves the plot. We want to get rid of all the axes, titles and legends so we use theme_void:
library(ggplot2)
ggplot(labels, aes(x, y)) +
geom_text(aes(label = labs), size = 8) +
geom_segment(data = joins, aes(xend = xend, yend = yend, color = left),
arrow = arrow(type = "closed", length = unit(0.02, "npc"))) +
coord_cartesian(xlim = c(0.5, 2.5)) +
theme_void() +
theme(legend.position = "none")
Created on 2020-07-10 by the reprex package (v0.3.0)
This solution could be tidied up, but it gives a start using geom_segment():
library(tidyverse)
tibble(x0 = 0, x1 = 1, y0 = sample(letters[1:10]), y1 = sample(letters[1:10])) %>%
mutate(y0 = factor(y0, levels = rev(letters[1:10])),
y1 = factor(y1, levels = rev(letters[1:10]))) %>%
ggplot(aes(x = x0, xend = x1, y = y0, yend = y1)) +
geom_segment(arrow = arrow(length = unit(0.03, "npc"))) +
geom_text(aes(x = x1, y = y1, label = y1), nudge_x = 0.01)
I'm attempting to draw tiles / rectangles to get the following result:
library(tidyverse)
library(plotly)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% tidyr::pivot_longer(cols = -case_id)
plot <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
group = case_id
)
) + geom_point()
plot_boxes_y <- seq(from = 0, to = 1, by = .2)
plot_boxes_x <- unique(df$name) %>% length()
for (x in 1:plot_boxes_x) {
for (y in plot_boxes_y) {
plot <- plot + geom_rect(
mapping = aes_(
xmin = x - .5,
xmax = x + .5,
ymin = y - .5,
ymax = y + .5
),
color = "red",
fill = NA
)
}
}
plotly::ggplotly(plot)
As you can see, I currently do this by looping through the coordinates and drawing each rectangle individually. The problem is that this creates one layer per rectangle, which makes plotly::ggplotly() very slow on large datasets.
Therefore, I'm looking for a more efficient way. Please note that I cannot use panel.grid, since I intend to visualize z-data by filling the rectangles later on.
My approach was to draw geom_tile() on top of the scatter plot:
# my attempt
df$z <- rep(0, nrow(df))
plot2 <- ggplot2::ggplot(
data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id
)
) + geom_point() + geom_tile()
I assume this fails because name is a discrete variable? How can I efficiently draw tiles in addition to my scatterplot?
Thanks
Here is a solution using geom_tile(). The key is creating a separate data frame that holds the grid coordinates, and then specifying the data and aesthetics individually in each geom call.
library(ggplot2)
library(tidyr)
set.seed(0)
df <- tibble(
a = runif(5),
b = runif(5),
c = runif(5),
d = runif(5),
case_id = 1:5
) %>% pivot_longer(cols = -case_id)
df$z <- rep(0, nrow(df))
# make a data frame for the grid coordinates
grid<-data.frame(x=factor( ordered( 1:4), labels = c("a", "b", "c", "d" )),
y=rep(seq(0, 1, .1), each=4))
#plot using geom_tile & geom_point
plot2 <- ggplot2::ggplot() + geom_tile(data=grid, aes(x=x, y=y), fill=NA, col="red") +
geom_point(data = df,
mapping = aes(
x = name,
y = value,
color = z,
group = case_id))
print(plot2)
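A variant of the same idea builds the grid from the data with `expand.grid()`, so the number of columns follows `unique(df$name)` instead of being typed in, and keeps a `z` column ready for filling later. The 0.1 step and the placeholder `z = 0` are assumptions for illustration. Everything is still drawn by a single geom_tile() layer, which is what should keep ggplotly() fast.

```r
library(ggplot2)
library(tidyr)

set.seed(0)
df <- pivot_longer(
  data.frame(a = runif(5), b = runif(5), c = runif(5), d = runif(5),
             case_id = 1:5),
  cols = -case_id)

# One row per tile: every category crossed with every y centre.
grid <- expand.grid(name = sort(unique(df$name)),
                    value = seq(0, 1, by = 0.1),
                    stringsAsFactors = FALSE)
grid$z <- 0  # placeholder; replace with real z-data and map fill = z later

p <- ggplot() +
  geom_tile(data = grid, aes(x = name, y = value),
            fill = NA, colour = "red") +        # one layer for all tiles
  geom_point(data = df, aes(x = name, y = value))
p
```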
If you don't mind the lines extending beyond the data, you can draw them as reference lines:
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_vline(xintercept=seq(0.5,4.5,by=1)) +
geom_hline(yintercept=seq(0,2,by=.2))
Otherwise, build the grid lines as segments:
#make a new data frame
GRIDS = rbind(
# the vertical lines
data.frame(x=seq(0.5,4.5,by=1),xend=seq(0.5,4.5,by=1),y=0,yend=2),
# the horizontal lines
data.frame(x=0.5,xend=4.5,y=seq(0,2,by=.2),yend=seq(0,2,by=.2))
)
ggplot(df,aes(x=name,y=value)) + geom_point() +
geom_segment(data=GRIDS,aes(x=x,y=y,xend=xend,yend=yend),col="red")
I would like to stack several geom_line() plots one above the other. However, they appear with changed data.
Here is an example:
library(dplyr)
library(ggplot2)
# make 3 data frames with some random data
x <- seq(5, 15, length = 1000)
data1 <- data.frame(x = x, y = dnorm(x, mean = 10, sd = 3), sample = "1")
data2 <- data.frame(x = x, y = dnorm(x, mean = 7.5, sd = 3), sample = "2")
data3 <- data.frame(x = x, y = dnorm(x, mean = 12.5, sd = 1), sample = "3")
# bind data
data <- bind_rows(data1, data2, data3)
# plot data without stacking
plot.data <- data %>% ggplot(mapping = aes(x = x, y = y, color = sample)) + geom_line()
# plot data with stacking
plot.data <- data %>% ggplot(mapping = aes(x = x, y = y, color = sample)) + geom_line(position = position_stack(vjust = 1, reverse = T))
The plot without stacking looks like this:
The plot with stacking looks like this:
So it seems that position_stack() sums the y values rather than shifting each line by a constant value, which is not the behaviour I expected for geom_line(). Could you suggest how to simply shift the plots one above the other?
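One possible approach is to skip position_stack() entirely and add a fixed, sample-specific offset to y before plotting. This is a sketch; the `step` value of 0.5 is an arbitrary assumption (it only needs to exceed the tallest peak, about 0.4 here), and the `offset`/`y_shifted` column names are illustrative.

```r
library(dplyr)
library(ggplot2)

x <- seq(5, 15, length = 1000)
data <- bind_rows(
  data.frame(x = x, y = dnorm(x, mean = 10,   sd = 3), sample = "1"),
  data.frame(x = x, y = dnorm(x, mean = 7.5,  sd = 3), sample = "2"),
  data.frame(x = x, y = dnorm(x, mean = 12.5, sd = 1), sample = "3"))

# Shift sample k up by (k - 1) * step instead of summing the curves.
step <- 0.5  # assumption: larger than the highest peak
shifted <- data %>%
  mutate(offset = (as.integer(factor(sample)) - 1) * step,
         y_shifted = y + offset)

plot.data <- ggplot(shifted, aes(x = x, y = y_shifted, colour = sample)) +
  geom_line()
plot.data
```

Each curve keeps its own shape; only its baseline moves. If the y-axis values matter, relabel or hide the axis, since it now shows shifted values.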
I'm doing a comparison chart of two different estimates of the same time series data. I'm filling the area between the two series in green if the original estimate is more than the latest estimate, and red otherwise.
I've got that part working, but I'd like to add a legend for the fill colors. I tried scale_fill_manual() towards the bottom of the code, but it doesn't seem to do anything.
Here's the code:
library(ggplot2)
library(scales)
library(colorspace)
# Return a polygon that only plots between yLower and yUpper when yLower is
# less than yUpper.
getLowerPolygon = function(x, yLower, yUpper) {
# Create the table of coordinates
poly = data.frame(
x = numeric(),
y = numeric())
lastReversed = (yUpper[1] < yLower[1])
for (r in 1:length(x)) {
reversed = (yUpper[r] < yLower[r])
if (reversed != lastReversed) {
# Between points r-1 and r, the series intersected, so we need to
# change the polygon from visible to invisible or v.v. In either
# case, just add the intersection between those two segments to the
# polygon. Algorithm from:
# https://en.wikipedia.org/wiki/Line-line_intersection
# First line: x1,y1 - x2,y2
x1 = x[r-1]
y1 = yLower[r-1]
x2 = x[r]
y2 = yLower[r]
# Second line: x3,y3 - x4,y4
x3 = x[r-1]
y3 = yUpper[r-1]
x4 = x[r]
y4 = yUpper[r]
# Calculate determinants
xy12 = det(matrix(c(x1, y1, x2, y2), ncol = 2))
xy34 = det(matrix(c(x3, y3, x4, y4), ncol = 2))
x12 = det(matrix(c(x1, 1, x2, 1), ncol = 2))
x34 = det(matrix(c(x3, 1, x4, 1), ncol = 2))
y12 = det(matrix(c(y1, 1, y2, 1), ncol = 2))
y34 = det(matrix(c(y3, 1, y4, 1), ncol = 2))
# Calculate fraction pieces
xn = det(matrix(c(xy12, x12, xy34, x34), ncol = 2))
yn = det(matrix(c(xy12, y12, xy34, y34), ncol = 2))
d = det(matrix(c(x12 , y12, x34, y34), ncol = 2))
# Calculate intersection
xi = xn / d
yi = yn / d
# Add the point
poly[nrow(poly)+1,] = c(xi, yi)
}
lastReversed = reversed
# http://stackoverflow.com/questions/2563824
poly[nrow(poly)+1,] = c(x[r], min(yLower[r], yUpper[r]))
}
poly = rbind(poly, data.frame(
x = rev(x),
y = rev(yUpper)))
return(poly)
}
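The nested `det()` calls in getLowerPolygon() expand to a closed-form line-line intersection. The sketch below writes that expansion out directly, following the same Wikipedia formula the code cites; `line_intersection()` is a hypothetical name, and returning `NA` for parallel lines is an added guard the original loop does not have (there, a sign change guarantees the lines are not parallel).

```r
# Intersection of the lines through (x1, y1)-(x2, y2) and (x3, y3)-(x4, y4).
# Same determinant formula as getLowerPolygon(), expanded by hand.
line_intersection <- function(x1, y1, x2, y2, x3, y3, x4, y4) {
  d <- (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
  if (d == 0) return(c(NA_real_, NA_real_))  # parallel or coincident lines
  a <- x1 * y2 - y1 * x2   # det of the first line's endpoints (xy12)
  b <- x3 * y4 - y3 * x4   # det of the second line's endpoints (xy34)
  c((a * (x3 - x4) - (x1 - x2) * b) / d,
    (a * (y3 - y4) - (y1 - y2) * b) / d)
}

# Two crossing segments sharing x positions, as in the polygon loop:
line_intersection(1, 0, 2, 2, 1, 2, 2, 0)  # 1.5 1.0
```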
getComparisonPlot = function(data, title, lower_name, upper_name,
x_label, y_label, legend_title = '') {
lightGreen = '#b0dd8d'
lightRed = '#fdba9a'
darkGray = RGB(.8, .8, .8)
midGray = RGB(.5, .5, .5)
plot = ggplot(data, aes(x = x))
plot = plot + geom_polygon(
aes(x = x, y = y),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
),
fill = lightRed)
coords = getLowerPolygon(data$x, data$yLower, data$yUpper)
plot = plot + geom_polygon(
aes(x = x, y = y),
data = coords,
fill = lightGreen)
plot = plot + geom_line(
aes(y = yUpper, color = 'upper'),
size = 0.5)
plot = plot + geom_line(
aes(y = yLower, color = 'lower'),
size = 0.5)
plot = plot +
ggtitle(paste(title, '\n', sep='')) +
xlab(x_label) +
ylab(y_label) +
scale_y_continuous(labels = comma)
# http://stackoverflow.com/a/10355844/106302
plot = plot + scale_color_manual(
name = legend_title,
breaks = c('upper' , 'lower'),
values = c('gray20', 'gray50'),
labels = c(upper_name, lower_name))
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'),
values = c(lightGreen, lightRed),
labels = c('Over', 'Under'))
return(plot)
}
print(getComparisonPlot(
data = data.frame(
x = 1:20,
yLower = 1:20 %% 5 + 2,
yUpper = 1:20 %% 7
),
title = 'Comparison Chart',
lower_name = 'Latest',
upper_name = 'Original',
x_label = 'X axis',
y_label = 'Y axis',
legend_title = 'Thing'
))
Here's an image of the chart; I think it is a cool technique:
I'm also open to any other suggestions for improving my ggplot code.
ggplot2 needs you to map the polygons' fill aesthetic to some variable, or, in this case, just to "label" the types of polygons (i.e. 'upper' and 'lower'). You do this by passing a string with the respective label as the fill aesthetic inside aes() of geom_polygon(). What you are doing is passing a fixed colour to each polygon outside aes(), which is not mapped to anything ggplot2 can build a legend from. It's a kind of "hard-coded colour" =P.
Well, here are the changes inside getComparisonPlot:
# full "lower" polygon first, so the "upper" polygon is drawn on top of it
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "lower"),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
))
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "upper"),
data = coords)
One more thing: the strings passed to the fill aesthetic must match the breaks passed to scale_fill_manual(). That is what makes the legend map things correctly.
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'), # <<< corresponds to fill aesthetic labels
values = c(lightGreen, lightRed),
labels = c('Over', 'Under'))
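The same mechanism in a minimal, self-contained form: two toy squares (hypothetical data) get constant labels mapped to `fill` inside `aes()`, and `scale_fill_manual()` turns those labels into colours and legend entries. Naming the `values` vector is an assumption on my part, not part of the answer above; it avoids relying on the order of `breaks` to pair labels with colours.

```r
library(ggplot2)

sq <- data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1))  # toy polygon

p <- ggplot() +
  geom_polygon(aes(x = x, y = y, fill = "lower"), data = sq) +
  geom_polygon(aes(x = x + 2, y = y, fill = "upper"), data = sq) +
  scale_fill_manual(
    name   = "Margin",
    values = c(upper = "#b0dd8d", lower = "#fdba9a"),  # named: order-independent
    breaks = c("upper", "lower"),
    labels = c("Over", "Under"))
p
```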
Result:
Hope it helps.