Different behavior between ggplot2 and plotly using ggplotly - r

I want to make a line chart in plotly whose color is not the same along its whole length. The color is given by a continuous scale. This is easy in ggplot2, but when I convert the plot to plotly with the ggplotly function, the variable determining the color behaves like a categorical variable.
library(dplyr)
library(ggplot2)
library(plotly)
df <- tibble(
x = 1:15,
group = rep(c(1,2,1), each = 5),
y = 1:15 + group
)
gg <- ggplot(df) +
aes(x, y, col = group) +
geom_line()
gg # ggplot2
ggplotly(gg) # plotly
[Images: ggplot2 output (desired) and plotly output]
I found one work-around that, on the other hand, behaves oddly in ggplot2.
df2 <- df %>%
tidyr::crossing(col = unique(.$group)) %>%
mutate(y = ifelse(group == col, y, NA)) %>%
arrange(col)
gg2 <- ggplot(df2) +
aes(x, y, col = col) +
geom_line()
gg2
ggplotly(gg2)
I also could not find a way to do this in plotly directly. Maybe there is no solution at all. Any ideas?

It looks like ggplotly is treating group as a factor, even though it's numeric. You could use geom_segment as a workaround to ensure that segments are drawn between each pair of points:
gg2 = ggplot(df, aes(x,y,colour=group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y)))
gg2
ggplotly(gg2)
Regarding @rawr's (now deleted) comment: I think it makes sense for group to be continuous if you want to map line color to a continuous variable. Below is an extension of the OP's example to a continuous group column, rather than one with just two discrete categories.
set.seed(49)
df3 <- tibble(
x = 1:50,
group = cumsum(rnorm(50)),
y = 1:50 + group
)
Plot gg3 below uses geom_line, but I've also included geom_point. You can see that ggplotly is plotting the points. However, there are no lines, because no two points have the same value of group. If we hadn't included geom_point, the graph would be blank.
gg3 <- ggplot(df3, aes(x, y, colour = group)) +
geom_point() + geom_line() +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
gg3
ggplotly(gg3)
Switching to geom_segment gives us the lines we want with ggplotly. Note, however, that line color will be based on the value of group at the first point in the segment (whether using geom_line or geom_segment), so there might be cases where you want to interpolate the value of group between each (x,y) pair in order to get smoother color gradations:
gg4 <- ggplot(df3, aes(x, y, colour = group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y))) +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
ggplotly(gg4)
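The interpolation mentioned above could look roughly like this. It is a minimal sketch, not the answerer's code: the helper densify() and its n argument (points per original interval) are hypothetical, it uses base R approx() for linear interpolation, and it builds the segment end points explicitly instead of calling lead().

```r
library(ggplot2)

# Hypothetical helper: linearly interpolate x, y and group between
# consecutive rows, inserting n - 1 extra points in every interval.
densify <- function(data, n = 10) {
  t_old <- seq_len(nrow(data))
  t_new <- seq(1, nrow(data), length.out = (nrow(data) - 1) * n + 1)
  data.frame(
    x     = approx(t_old, data$x, xout = t_new)$y,
    y     = approx(t_old, data$y, xout = t_new)$y,
    group = approx(t_old, data$group, xout = t_new)$y
  )
}

# Same data as df3 above, built with base R.
set.seed(49)
df3 <- data.frame(x = 1:50, group = cumsum(rnorm(50)))
df3$y <- df3$x + df3$group

dense <- densify(df3, n = 10)

# Explicit segments: each row is joined to the next one and coloured by
# the interpolated group value at its starting point.
m <- nrow(dense)
seg <- data.frame(
  x = dense$x[-m], xend = dense$x[-1],
  y = dense$y[-m], yend = dense$y[-1],
  group = dense$group[-m]
)

gg5 <- ggplot(seg, aes(x, y, xend = xend, yend = yend, colour = group)) +
  geom_segment() +
  scale_colour_gradient2(low = "red", mid = "yellow", high = "blue")
```

Since every piece is still a separate segment, ggplotly(gg5) should convert it the same way as gg4; a larger n gives smoother color gradations at the cost of more segments.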

Related

getting colour scale gradient to work with ggplot converted to ggplotly

I'm not able to correctly draw a colour aesthetic line in plotly, using a ggplot object. What am I missing?
library(ggplot2)
library(plotly)
df <- data.frame(val = as.numeric(LakeHuron), idx = 1:length(LakeHuron))
p <- ggplot(df, aes(x = idx, y = val, colour = val)) + geom_line()
p <- p + scale_color_gradient2(low="red", mid = "gold", high="green", midpoint = mean(df$val))
p
p2 <- ggplotly(p)
p2
p prints the expected output.
When I print the plotly object p2, the points are not joined into a line correctly.
I think the problem appears when I add the colour aesthetic.
Versions:
plotly 4.9, ggplot2 3.1.1
This is due to a limitation of, or difference in, how plotly works compared with ggplot. There is an open plotly issue, updated August 2018, suggesting that it's not possible within the same structure ggplot uses: a single series in plotly can't currently have varying color ("We don't allow per-segment coloring on line traces").
But fear not! We can construct the plot a little differently, using geom_segment to specify each part of the line as a separate segment. Each segment becomes a separate object, which converts to plotly fine:
library(dplyr) # for lead()
df <- data.frame(val = as.numeric(LakeHuron), idx = 1:length(LakeHuron))
p_seg <- ggplot(df, aes(x = idx, y = val,
xend = lead(idx), yend = lead(val),
colour = val)) +
geom_segment()
p_seg <- p_seg + scale_color_gradient2(low="red", mid = "gold", high="green", midpoint = mean(df$val))
p_seg
p2 <- ggplotly(p_seg)
p2
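If you would rather not rely on dplyr::lead(), which leaves an NA end point on the last row and typically triggers a warning about removed rows, the segment end points can be built explicitly in base R. A sketch under that assumption; seg, idxend and valend are illustrative names, not part of the original answer:

```r
library(ggplot2)

df <- data.frame(val = as.numeric(LakeHuron), idx = seq_along(LakeHuron))

# Pair every observation with its successor; the last point has none,
# so there is one segment fewer than there are points.
n <- nrow(df)
seg <- data.frame(
  idx = df$idx[-n], idxend = df$idx[-1],
  val = df$val[-n], valend = df$val[-1]
)

p_seg2 <- ggplot(seg, aes(x = idx, y = val, xend = idxend, yend = valend,
                          colour = val)) +
  geom_segment() +
  scale_color_gradient2(low = "red", mid = "gold", high = "green",
                        midpoint = mean(df$val))
```

ggplotly(p_seg2) should then behave like ggplotly(p_seg), because each segment is again its own object.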

How to keep colours correct when facetting and using variable column names?

I am trying to make a facetted plot like this:
example_data <- data.frame(x = rnorm(100, mean = 0, sd = 1),
y = rnorm(100, mean = 0, sd = 1),
facet=sample(c(0,1), replace=TRUE, size=100))
ggplot(example_data, aes(x=x, y=y, colour=sign(x)!=sign(y)))+
geom_point()+
geom_hline(yintercept=0)+
geom_vline(xintercept=0)+
facet_wrap(~facet)
However, I am doing this for multiple plots where the column names vary. For plotting x and y this works using aes_string, and without facetting it also works for the colour:
ggplot(example_data, aes_string(x='x', y='y', colour=sign(example_data[['x']])!=sign(example_data[['y']])))+
geom_point()+
geom_hline(yintercept=0)+
geom_vline(xintercept=0)+
guides(col=F)
But then when I facet, the colours are not correct anymore:
ggplot(example_data, aes_string(x='x', y='y', colour=sign(example_data[['x']])!=sign(example_data[['y']])))+
geom_point()+
geom_hline(yintercept=0)+
geom_vline(xintercept=0)+
guides(col=F)+
facet_wrap(~facet)
I'm guessing it is because the order of the points is dependent on which facet they are in. I can solve this by getting the colour per facet:
col_facet_0 <- sign(example_data[example_data$facet==0,][['x']])!=sign(example_data[example_data$facet==0,][['y']])
col_facet_1 <- sign(example_data[example_data$facet==1,][['x']])!=sign(example_data[example_data$facet==1,][['y']])
col <- c(col_facet_0, col_facet_1)
ggplot(example_data, aes_string(x='x', y='y', colour=col))+
geom_point()+
geom_hline(yintercept=0)+
geom_vline(xintercept=0)+
guides(col=F)+
facet_wrap(~facet)
The problem is that I need to know beforehand which facet's colours go at the start of the colour vector and which go last. E.g. in the code above, if I had used col <- c(col_facet_1, col_facet_0) instead, the colours would have been wrong.
My question: is there a way to do this within the ggplot call so that I don't need to know which facet comes first?
You can make the expression a string, like so:
ggplot(example_data, aes_string(x='x', y='y', colour='sign(x) != sign(y)'))+
geom_point()+
geom_hline(yintercept=0)+
geom_vline(xintercept=0)+
guides(col=F)
If you need flexible column names, one could do e.g.:
x_col <- 'x'
y_col <- 'y'
ggplot(
example_data,
aes_string(x_col, y_col, colour = sprintf('sign(%s) != sign(%s)', x_col, y_col))
) + ...
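Newer ggplot2 releases discourage aes_string() in favour of tidy evaluation. A sketch of the same idea using the .data pronoun (ggplot2 3.0.0 or later), where the colour expression is evaluated on the data row by row, so it stays attached to each point when the data is split into facets. The seed is an assumption added only for reproducibility, and x_col/y_col play the same role as above:

```r
library(ggplot2)

set.seed(1)  # assumption: only to make the sketch reproducible
example_data <- data.frame(x = rnorm(100), y = rnorm(100),
                           facet = sample(c(0, 1), replace = TRUE, size = 100))

x_col <- "x"
y_col <- "y"

# .data[[name]] looks the column up by its (variable) name inside aes().
p_facet <- ggplot(example_data,
                  aes(x = .data[[x_col]], y = .data[[y_col]],
                      colour = sign(.data[[x_col]]) != sign(.data[[y_col]]))) +
  geom_point() +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  guides(colour = "none") +
  facet_wrap(~facet)
```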

Weird behavior of ggplot combined with fill and scale_y_log10()

I'm trying to produce a histogram with ggplot's geom_histogram that colors the bars according to a gradient and puts the counts on a log10 scale.
Here's the code:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=FALSE)
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
df$cut <- cut(df$val,bins)
df$cut <- factor(df$cut,levels=unique(df$cut))
Then,
ggplot(data=df,aes_string(x="val",y="..count..+1",fill="cut"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
whereas dropping the fill from the aesthetics:
ggplot(data=df,aes_string(x="val",y="..count..+1"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
Any idea why the histogram bars differ between the two plots, and how to make the first one look like the second?
The OP is trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient...
The OP has already done the binning (with 10 bins) but then calls geom_histogram(), which does its own binning using 30 bins by default (see ?geom_histogram).
When geom_bar() is used instead, with cut rather than val on the x axis,
ggplot(data = df, aes_string(x = "cut", y = "..count..+1", fill = "cut")) +
geom_bar(show.legend = FALSE) +
scale_color_manual(values = cut.cols, labels = levels(df$cut)) +
scale_fill_manual(values = cut.cols, labels = levels(df$cut)) +
scale_y_log10()
the chart becomes:
Using geom_histogram() with filled bars is less straightforward, as can be seen in two answers to the question How to fill histogram with color gradient?
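Staying with geom_histogram(), one option is to give it explicit break points and map fill to the computed bin centre. A sketch assuming ggplot2 3.3.0 or later for after_stat(); brks is an illustrative name, and the equal-width breaks stand in for the intervals cut() picks (cut() also pads the range by 0.1%). Because fill is computed after the statistic, it does not split the data into groups, so the bars are not stacked per group:

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 for after_stat()

set.seed(1)
df <- data.frame(val = rnorm(1000))
bins <- 10

# One shared set of equal-width break points, 10 bins in total.
brks <- seq(min(df$val), max(df$val), length.out = bins + 1)

p_hist <- ggplot(df, aes(x = val)) +
  # after_stat(x) is the bin centre, mapped to a continuous fill gradient
  geom_histogram(aes(fill = after_stat(x)), breaks = brks,
                 show.legend = FALSE) +
  scale_fill_gradient(low = "darkblue", high = "darkred") +
  scale_y_log10()
```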

How do I color by factors of a categorical variable for faceted barplots?

My question relates to plots in ggplot. Running the code below each image should work if you load the "diamonds" dataset that comes with ggplot2.
I am trying to generate a graph like this:
library(ggplot2)
#First plot
p1 <- ggplot(diamonds, aes(color)) + geom_bar(aes(group = cut, y = ..density..))
p1 <- p1 + facet_wrap(~cut)
p1
but I want to color each bar in each facet by factor, like in this plot:
#Second plot
p2 <- ggplot(diamonds, aes(color)) + geom_bar(aes( y = ..density.., fill = color))
p2 <- p2 + facet_wrap(~cut)
p2
The problem is that "group =" and "fill=" appear to interfere with each other when I attempt to call them both; ggplot seems to ignore the "fill" command when "group" is also called.
The call to group is important because it forces the y-axis to scale for each facet, so that densities within each facet add up to 1. However, I'd like to be able to visually distinguish between groups easily using fill colors.
How can I work around this?
The problem is with ..density..: it is often a convenient shortcut, but in a more complicated situation like this one it's often easier just to calculate it yourself:
library(dplyr)
diam2 <- diamonds %>% group_by(cut) %>%
mutate(ncut = n()) %>%
group_by(cut, color) %>%
summarize(den = n() / first(ncut))
ggplot(diam2, aes(x = color, fill = color, y = den)) +
geom_bar(stat = "identity") +
facet_wrap(~ cut)
I should add, comparing my plot with your p1, the shapes are the same but the scale looks a little different (mine being a little lower overall). I'm not sure why.
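The same per-facet proportions can be computed in base R with prop.table(..., margin = 1), which also makes it easy to confirm that the bars in each facet add up to 1. A sketch; tab and diam3 are illustrative names, and ggplot2 is loaded only for the diamonds data and the plot:

```r
library(ggplot2)

# Proportion of each colour within each cut: every row of `tab` sums to 1.
tab <- prop.table(table(cut = diamonds$cut, color = diamonds$color), margin = 1)
diam3 <- as.data.frame(tab, responseName = "den")

p3 <- ggplot(diam3, aes(x = color, y = den, fill = color)) +
  geom_col() +
  facet_wrap(~ cut)
```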

altering the color of one value in a ggplot histogram

I have a simplified data frame:
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,1,2,1,1,1,3))
ggplot(df,aes(x=wins))+geom_histogram(binwidth=0.5,fill="red")
I would like the final value in the sequence, 3, to be shown with either a different fill or alpha. One way to identify its value is
tail(df,1)$wins
In addition, I would like the histogram bars to be centered over each number. I tried subtracting from the wins value, without success.
You can do this with a single geom_histogram() by using aes(fill = cond).
To choose different colours, use one of the scale_fill_*() functions, e.g. scale_fill_manual(values = c("red", "blue")).
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
df$cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins, fill = cond)) +
geom_histogram() +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins) +
scale_fill_manual(values = c("red", "blue"))
1) To draw bins in different colors you can use geom_histogram() for subsets.
2) To center bars along numbers on the x axis you can invoke scale_x_continuous(breaks=..., labels=...)
So, this code
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins)) +
geom_histogram(data=subset(df,cond==FALSE), binwidth=0.5, fill="red") +
geom_histogram(data=subset(df,cond==TRUE), binwidth=0.5, fill="blue") +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins)
produces the plot:
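Because wins only takes whole-number values, geom_bar() may be a simpler fit than geom_histogram(): it counts each distinct value and centres the bar on that value, so the breaks need no manual +0.25 offset. A sketch using the question's data; last_win and p_wins are illustrative names:

```r
library(ggplot2)

df <- data.frame(wins = c(1, 1, 3, 1, 1, 2, 1, 2, 1, 1, 1, 3))
last_win <- tail(df$wins, 1)

# fill is TRUE for bars matching the final value, FALSE otherwise;
# width = 0.5 mirrors the binwidth used in the question.
p_wins <- ggplot(df, aes(x = wins, fill = wins == last_win)) +
  geom_bar(width = 0.5) +
  scale_fill_manual(values = c("red", "blue"), guide = "none") +
  scale_x_continuous(breaks = sort(unique(df$wins)))
```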
