Plotly problem of specifying point symbol within the boxplot - r

I am trying to specify the point symbol (shape) based on a factor, so that the point shape within the boxplot can be different (which can be very useful for highlighting a group of points). However, it looks like instead of showing different shapes, the third boxplot got split into two boxes.
Can you please advise how to achieve that?
library(dplyr)
library(plotly)
data(iris)
iris=mutate(iris, Petal.Width_high=ifelse(Petal.Width>2,"High","Low"))
iris %>% plot_ly(x = ~ Species, y = ~ Petal.Width, color= ~ Species,
symbol = ~ Petal.Width_high,
type = "box", mode="markers",boxpoints="all",
jitter = 0.4, marker = list(size = 10),
pointpos = 0,hoverinfo='text',
text= ~paste('</br>Species: ', Species,
'</br>Petal.Width: ', Petal.Width))

Draw the box plot first and then add the points with add_markers afterwards. Something like:
p <- iris %>%
group_by(Species) %>%
plot_ly(x = ~ Species, y = ~ Petal.Width,
type = "box",
hoverinfo='text',
text= ~paste('</br>Species: ', Species,
'</br>Petal.Width: ', Petal.Width))
add_markers(p, symbol = ~ Petal.Width_high, marker = list(size = 10))

Related

Adding a power curve to scatterplot

I want to add a power curve with confidence intervals to my diameter-weight relationship, which clearly follows a y=a*x^b regression. So far I have used geom_smooth with method=loess, but that does not fit the expected functional form. Any suggestion on how to add a power regression line would be much appreciated. Below is the code I used:
p2<-ggplot(Data,aes(x=Diameter,y=Wet_weight,colour=Site))+
geom_point(size=3.5,alpha=0.3)+
geom_smooth(aes(group=Species),method=loess,colour="black")+
labs(x="\nUmbrella diameter (mm)",y="Wet weight (mg)\n")+theme_classic()+
scale_colour_manual(values=c("black","dark blue","blue","dark green","green"))+
theme(axis.title.x=element_text(size=20),
axis.text.x=element_text(size=18,colour="black"),
axis.title.y=element_text(size=20),
axis.text.y=element_text(size=18,colour="black"),
axis.ticks=element_line(colour="black",size=1),
axis.line=element_line(colour="black",size=1,linetype="solid"),
legend.position=c(0.18,0.75),
legend.text=element_text(colour="black",size=17),
legend.title=element_text(colour="black",size=18))
p2
Thank you!
I used this to get the equations, R2 values, and plots for many groups at once.
df= #change your data frame so it fits the current code
variables=c("group","year") #if you have multiple groups/seasons/years/elements add them here
df$y= #which variable will be your y
df$x= #which variable will be your x
#No changes get the equations
text=df %>%
group_by(across(all_of(variables))) %>% #your grouping variables
do(broom::tidy(lm(log(y) ~ log(x), data = .))) %>%
ungroup() %>%
mutate(y = round(ifelse(term=='(Intercept)',exp(estimate),estimate),digits = 2)) %>% #your equation values rounded to 2
select(-estimate,-std.error,-statistic ,-p.value) %>%
pivot_wider(names_from = term,values_from = y) %>%
rename(.,a=`(Intercept)`,b=`log(x)`)
#CHANGE before running!! add your grouping variables
rsq=df %>%
split(list(.$group,.$year)) %>% #---- HERE add the names after $
map(~lm(log(y) ~ log(x), data = .)) %>%
map(summary) %>%
map_dbl("r.squared") %>%
data.frame()
#Join the R2 and y results for the plot in a single data frame and write the equations
labels.df=mutate(rsq,groups=row.names(rsq)) %>%
separate(col = groups,into = c(variables),sep = "[.]",
convert = TRUE, remove = T, fill = "right") %>%
rename("R"='.') %>%
left_join(text,.) %>%
mutate(R=round(R,digits = 4), #round your R2 digits
eq= paste('y==',a,"~x^(",b,")", sep = ""),
rsql=paste("R^2==",R),
full= paste('y==',a,"~x^(",b,")","~~R^2==",R, sep = ""))
# plot
ggplot(df,aes(x = x,y = y)) +
geom_point(size=4,mapping = aes(
colour=factor(ifelse(is.na(get(variables[2])),"",(get(variables[2])))), #points colour
shape=get(variables[1]))) + # different shapes
facet_wrap(get(variables[1])~ifelse(is.na(get(variables[2])),"",get(variables[2])),
scales = "free",labeller = labeller(.multi_line = F))+ #for multiple groups; join text in one line
stat_smooth(mapping=aes(colour=get(variables[1])), #colours for our trend
method = 'nls', formula = 'y~a*x^b',
method.args = list(start=c(a=1,b=1)),se=FALSE) +
geom_text(labels.df,x = Inf, y = Inf,size=5, mapping = aes(label = (eq)), parse = T,vjust=1, hjust=1)+
geom_text(labels.df,x = Inf,y = Inf,size=5, mapping = aes(label = (rsql)), parse = T,vjust=2.5, hjust=1)+
#scale_y_log10() + #add this to avoid problems with big y values
labs(x="Your x label",y="your y label")+
theme_bw(base_size = 16) +
theme(legend.position = "none",
strip.background = element_rect(fill="#b2d6e2"))
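The grouped code above rests on one idea: a power law y = a*x^b is linear on the log-log scale, log(y) = log(a) + b*log(x), so lm can estimate it. A minimal sketch in base R on simulated data follows. The values a = 2 and b = 1.5, the noise level, and the variable names are assumptions made up for illustration, not taken from the question's data.

```r
# Simulated data following y = a * x^b with multiplicative noise.
# a_true, b_true and the noise level are illustrative assumptions.
set.seed(1)
a_true <- 2
b_true <- 1.5
x <- seq(1, 50, length.out = 50)
y <- a_true * x^b_true * exp(rnorm(length(x), sd = 0.05))

# Fit the linearised model log(y) = log(a) + b * log(x).
fit <- lm(log(y) ~ log(x))

# Back-transform the intercept to recover a; the slope is b directly.
a_hat <- unname(exp(coef(fit)[1]))
b_hat <- unname(coef(fit)[2])

# Confidence band on the original scale:
# predict on the log scale, then exponentiate.
new_x <- data.frame(x = seq(min(x), max(x), length.out = 100))
band <- exp(predict(fit, newdata = new_x, interval = "confidence"))
```

The matrix band has the columns fit, lwr and upr, which could be passed to geom_line and geom_ribbon to draw the curve with its confidence interval. Fitting on the log scale assumes multiplicative errors, so the estimates can differ somewhat from the nls fit used in stat_smooth above.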

R plotly histogram hover text

This is my code. Just a simple histogram. What I want to do is customize the hover text so that when I hover over a bar, it displays all species included in that histogram bar. Can you help me?
iris %>%
plot_ly(x=~Sepal.Length, color=~Sepal.Width, text=~Species) %>%
add_histogram()
Here's the output. But when I hover it seems the text is only displaying the first species in the table.
[Image: plotly_hist]
I'm not sure whether this is possible; you may be asking too much of plotly. After trying some options, I think there are two ways to go if you want the different Species to show up in the tooltip:
First option is to use a stacked histogram using hovermode = "unified" like so:
library(plotly)
fig <- plot_ly()
fig <- fig %>% add_trace(data = filter(iris, Species == "setosa"),
x = ~Sepal.Length,
color = ~Species,
text = ~Species,
type='histogram',
bingroup=1, showlegend = FALSE)
fig <- fig %>% add_trace(data = filter(iris, Species == "versicolor"),
x = ~Sepal.Length,
color = ~Species,
text = ~Species,
type='histogram',
bingroup=1, showlegend = FALSE)
fig <- fig %>% add_trace(data = filter(iris, Species == "virginica"),
x = ~Sepal.Length,
color = ~Species,
text = ~Species,
type='histogram',
bingroup=1, showlegend = FALSE)
fig <- fig %>% layout(
hovermode="unified",
barmode="stack",
bargap=0.1)
fig
The second option would be to make the computations yourself, i.e. binning and summarising and to make a bar chart of the counts.
iris %>%
mutate(Sepal.Length.Cut = cut(Sepal.Length, breaks = seq(4, 8, .5), right = FALSE)) %>%
group_by(Sepal.Length.Cut, Species) %>%
summarise(n = n(), Sepal.Width = sum(Sepal.Width)) %>%
tidyr::unite("text", Species, n, sep = ": ", remove = FALSE) %>%
summarise(n = sum(n), Sepal.Width = sum(Sepal.Width) / n, text = paste(unique(text), collapse = "\n")) %>%
plot_ly(x = ~Sepal.Length.Cut, y = ~n, text = ~text) %>%
add_bars(marker = list(colorscale = "Rainbow"), hovertemplate = "%{y}<br>%{text}")
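The binning and summarising step of the second option can also be sketched in base R, which makes it easier to see what ends up in the hover text. This is only an illustration of the idea; the names bins, counts, hover and bar_data are made up here and are not part of the answer's code.

```r
# Bin Sepal.Length into half-open intervals [4, 4.5), [4.5, 5), ...
bins <- cut(iris$Sepal.Length, breaks = seq(4, 8, 0.5), right = FALSE)

# Count observations per bin and species.
counts <- table(bin = bins, Species = iris$Species)

# Build one hover label per bin, listing only the species present in it.
hover <- apply(counts, 1, function(n) {
  present <- n[n > 0]
  paste(names(present), present, sep = ": ", collapse = "\n")
})

# One row per bin: total count plus the combined hover label.
bar_data <- data.frame(
  bin = rownames(counts),
  n = as.vector(rowSums(counts)),
  text = unname(hover)
)
```

A data frame like bar_data could then be handed to plot_ly(x = ~bin, y = ~n, text = ~text) and add_bars(), in the same way as the pipeline above.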
Edit: A third option would be to use ggplotly(). This makes it easy to add annotations displaying the total count per bin, because the stat layers in ggplot2 do all the computations. To the best of my knowledge, that can't be done as easily using "pure" plotly.
library(plotly)
ggplot(iris, aes(Sepal.Length, fill = Species)) +
stat_bin(breaks = seq(4, 8, .5), closed = "left") +
stat_bin(breaks = seq(4, 8, .5), closed = "left", geom = "text", mapping = aes(Sepal.Length, label = ..count..), inherit.aes = FALSE, vjust = -.5) +
theme_light()
ggplotly()

Grouping not respected when using ggplotly to group boxplots

I was trying the following code in order to get a graph of boxplots with ggplot2 which are grouped according to different categories:
category_1 <- rep(LETTERS[1:4], each = 20)
value <- rnorm(length(category_1), mean = 200, sd = 20)
category_2 <- rep(as.factor(c("Good", "Medium", "Bad")), length.out = length(category_1))
category_3 <- rep(as.factor(c("Bright", "Dark")), length.out = length(category_1))
df <- data.frame( category_1, value, category_2, category_3)
p <- ggplot(df, aes(x = category_1, y = value, color = category_2, shape = category_3)) +
geom_boxplot(alpha = 0.5) +
geom_point(position=position_jitterdodge(), alpha=0.7)
p
I'm still too noob in stackoverflow to post images, but this is the result I want.
However, when I try to convert it to plotly using
pp <- ggplotly(p)
pp
the last 2 grouping layers (shape and color) are "ignored" and all the boxplots are plotted on top of each other, only respecting the x-axis grouping specified in aes(x = category_1, ...) as you can see here.
How can I avoid this problem? Thanks for your time.
EDIT
I've tried using plotly syntax directly and I get a similar result using the following code:
pp <- plot_ly(df, x = ~category_1, y = ~value, color = ~category_2,
mode = "markers", symbol = ~category_3, type = "box", boxpoints = "all") %>%
layout(boxmode = "group")
pp
Here is the result. I say similar because plotly forces the dots to be next to, and not on top of, the boxplot, which is not exactly what I wanted.
I guess the question is "solved". Although, I'm still curious if there is an explanation for the problem above. Thanks again!
I think this will solve your issue.
p <- ggplot(df, aes(x = category_1, y = value, color = category_2, shape = category_3)) +
geom_boxplot(alpha = 0.5) +
geom_point(position=position_jitterdodge(), alpha=0.7)
p %>%
ggplotly() %>%
layout(boxmode = "group")
Cheers.

Stacked bar graphs in plotly: how to control the order of bars in each stack

I'm trying to order a stacked bar chart in plotly, but it is not respecting the order I pass it in the data frame.
It is best shown using some mock data:
library(dplyr)
library(plotly)
cars <- sapply(strsplit(rownames(mtcars), split = " "), "[", i = 1)
dat <- mtcars
dat <- cbind(dat, cars, stringsAsFactors = FALSE)
dat <- dat %>%
mutate(carb = factor(carb)) %>%
distinct(cars, carb) %>%
select(cars, carb, mpg) %>%
arrange(carb, desc(mpg))
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = carb, y = mpg, color = cars) %>%
layout(barmode = "stack")
The resulting plot doesn't respect the ordering. I want the cars with the largest mpg stacked at the bottom of each carb group (the x axis). Any ideas?
As already pointed out here, the issue is caused by having duplicate values in the column used for color grouping (in this example, cars). As indicated already, the ordering of the bars can be remedied by grouping your colors by a column of unique names. However, doing so will have a couple of undesired side-effects:
different model cars from the same manufacturer would be shown with different colors (not what you are after - you want to color by manufacturer)
the legend will have more entries in it than you want i.e. one per model of car rather than one per manufacturer.
We can hack our way around this by a) creating the legend from a dummy trace that never gets displayed (add_trace(type = "bar", x = 0, y = 0... in the code below), and b) setting the colors for each category manually using the colors= argument. I use a rainbow palette below to show the principle. You may like to select some more attractive colours yourself.
dat$unique.car <- make.unique(as.character(dat$cars))
dat2 <- data.frame(cars=levels(as.factor(dat$cars)),color=rainbow(nlevels(as.factor(dat$cars))))
dat2[] <- lapply(dat2, as.character)
dat$color <- dat2$color[match(dat$cars,dat2$cars)]
plot_ly() %>%
add_trace(data=dat2, type = "bar", x = 0, y = 0, color = cars, colors=color, showlegend=T) %>%
add_trace(data=dat, type = "bar", x = carb, y = mpg, color = unique.car, colors=color, showlegend=F, marker=list(line=list(color="black", width=1))) %>%
layout(barmode = "stack", xaxis = list(range=c(0.4,8.5)))
One way to address this is to give unique names to all models of car and use that in plotly, but it's going to make the legend messier and impact the color mapping. Here are a few options:
dat$carsID <- make.unique(as.character(dat$cars))
# dat$carsID <- apply(dat, 1, paste0, collapse = " ") # alternative
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = carb, y = mpg, color = carsID) %>%
layout(barmode = "stack")
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = carb, y = mpg, color = carsID,
colors = rainbow(length(unique(carsID)))) %>%
layout(barmode = "stack")
I'll look more tomorrow to see if I can improve the legend and color mapping.
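Both answers rely on make.unique to give every bar its own trace name, and the first one uses match to keep one colour per manufacturer. A small base R sketch of just those two steps is given below; the sample vector and the names unique_car, pal and bar_color are made up for illustration.

```r
# Manufacturer names with repeats, standing in for dat$cars (made-up sample).
cars <- c("Merc", "Mazda", "Merc", "Fiat", "Merc")

# One unique id per bar: repeated names get a numeric suffix.
unique_car <- make.unique(cars)

# One colour per manufacturer.
pal <- data.frame(
  cars = sort(unique(cars)),
  color = rainbow(length(unique(cars))),
  stringsAsFactors = FALSE
)

# Look up the manufacturer colour for every bar, so all "Merc" bars
# share a colour even though their ids differ.
bar_color <- pal$color[match(cars, pal$cars)]
```

Here unique_car holds "Merc", "Mazda", "Merc.1", "Fiat", "Merc.2", while bar_color repeats the same colour for the three Merc entries.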

plotly not creating linear trend line

In creating a trend line for a scatter plot, I am using add_trace to add a linear trend line.
When the data only has one "series" of data, i.e. there is only one group of coordinates, the code below works fine. However, when I introduce a number of series, the "trend line" looks like this:
Here is the relevant part of the code:
p <- plot_ly(filteredFull(), x=Relative.Time.Progress, y=cumul.ans.keystroke,
mode='markers', color=KeystrokeRate, size=KeystrokeRate,
marker=list(sizeref=100), type='scatter',
hoverinfo='text', text=paste("token: ",Token, "Keystrokes: ",
KeystrokeCount)) %>%
layout(
xaxis=list(range=c(0,1)),
yaxis=list(range=c(0,max(filteredFull()$cumul.ans.keystroke)))
)
lm.all <- lm(cumul.ans.keystroke ~ Relative.Time.Progress,
data=df)
observe(print(summary(lm.all)))
p <- add_trace(p, y=fitted(lm.all), x=Relative.Time.Progress,
mode='lines') %>%
layout(
xaxis= list(range = c(0,1))
)
p
I can add more code, or try to make a minimal working example, if necessary. However, I'm hoping that this is a familiar problem that is obvious from the code.
I think you'll need to specify the data = ... argument in add_trace(p, y=fitted(lm.all), x=Relative.Time.Progress, mode='lines').
The first trace seems to be a subset but the second trace uses the regression fitted values which are obtained by fitting a regression model to the entire dataset.
There might be a mismatch between Relative.Time.Progress in filteredFull() vs df.
Here's an example. Hopefully helps...
library(plotly)
df <- diamonds[sample(1:nrow(diamonds), size = 500),]
fit <- lm(price ~ carat, data = df)
df1 <- df %>% filter(cut == "Ideal")
plot_ly(df1, x = carat, y = price, mode = "markers") %>%
add_trace(x = carat, y = fitted(fit), mode = "lines")
plot_ly(df1, x = carat, y = price, mode = "markers") %>%
add_trace(data = df, x = carat, y = fitted(fit), mode = "lines")
The plotly syntax has changed a bit since then (variables are now passed as formulas with ~). The following code should work fine:
df <- diamonds[sample(1:nrow(diamonds), size = 500),]
fit <- lm(price ~ carat, data = df)
df1 <- df %>% filter(cut == "Ideal")
plot_ly() %>%
add_trace(data = df1, x = ~carat, y = ~price, mode = "markers") %>%
add_trace(data = df, x = ~carat, y = fitted(fit), mode = "lines")
You need to start with an empty plot_ly() and add the traces to it.
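The mismatch described in the first answer can be checked without plotting. The sketch below uses mtcars as a stand-in for the question's data (an assumption; the original df and filteredFull() are not available): fitted() returns one value per row of the data the model was fitted to, while predict() with newdata returns one value per row of the subset being plotted.

```r
# Fit on the full data set (mtcars stands in for df in the question).
fit <- lm(mpg ~ wt, data = mtcars)

# A subset plays the role of filteredFull().
sub <- mtcars[mtcars$cyl == 4, ]

# fitted() has one value per row of the fitting data,
# so it does not line up with the x values of the subset.
n_fitted <- length(fitted(fit))
n_subset <- nrow(sub)

# predict() with newdata gives one value per row of the subset instead.
trend <- predict(fit, newdata = sub)
```

Passing x and y of equal length from the same data frame to add_trace keeps the line consistent. Sorting the subset by the x variable before drawing may also help, since a line trace connects points in row order.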
