Show observations that are outliers in plot_ly - r

I am plotting jittered boxplots with plotly in R. Plotly boxplots let you interactively inspect the quartiles and the values of outliers. (Examples here: https://plot.ly/r/box-plots/)
I would like to see the names of the observations that are outliers, so I can analyze them later.
However, it seems that boxplots have no option for showing which observation a point belongs to, in contrast to scatter plots, where one can see it through the 'text' option.
Before implementing other approaches, I would like to confirm that there is no way to have this information plotted.

I didn't find this option either.
I tried to plot it but didn't succeed, so I located the outliers with the function boxplot.stats and wrote an annotation over them.
Look at this example:
library(plotly)
set.seed(1234)
a<-rnorm(50)
a2 <- rnorm(50, 1)
plot_ly(y = a, type = 'box') %>%
add_trace(y = a2) %>%
layout(title = 'Box Plot',xaxis = list(title = "cond", showgrid = F), yaxis = list(title = "rating"),
annotations = list(
x = -0.01,
y = boxplot.stats(a)$out,
text = "Outlier",
showarrow = FALSE,
xanchor = "right"
))
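A minimal sketch of the same idea that builds one annotation per outlier, in case `a` contains several. The helper name `outlier_annotations` is made up for illustration, and the x position is simply carried over from the example above as an assumption.

```r
set.seed(1234)
a <- rnorm(50)

# Hypothetical helper: build one plotly annotation (a plain list) per outlier.
outlier_annotations <- function(values, x = -0.01, label = "Outlier") {
  out <- boxplot.stats(values)$out
  lapply(out, function(v) {
    list(x = x, y = v, text = label, showarrow = FALSE, xanchor = "right")
  })
}

annots <- outlier_annotations(a)

# The plot is only built when plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(y = a, type = "box")
  p <- plotly::layout(p, title = "Box Plot", annotations = annots)
  # print(p) to display the figure
}
```

Passing a different `label` (for example the observation name) gives each outlier its own text.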

If you still want the outliers labeled by tooltips you can also identify them separately and pass the outliers dataset to add_markers(), overwriting the boxplot outliers. Try something like this:
library(plotly)
#Set seed
set.seed(9)
#Generate random dataset
x <- data.frame(values = rnorm(100,sd=2),labels = paste("point",as.character(1:100)))
#Get outlier data
vals <- boxplot(x[,"values"],plot = FALSE)
#Make outliers dataset (points beyond the whisker ends)
y <- x[x[,"values"] > vals$stats[5,1] | x[,"values"] < vals$stats[1,1],]
#Make plot
plot_ly(x,y = ~values,x = 1,type = "box") %>%
add_markers(data = y, text = y[,'labels'])
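If the goal is only to know which observations are outliers, the labelled rows can also be pulled out without any plot. This is a small sketch using base R's boxplot.stats(), which by default flags points more than 1.5 times the interquartile range beyond the hinges; the variable names `out_values` and `outliers` are illustrative.

```r
set.seed(9)
x <- data.frame(values = rnorm(100, sd = 2),
                labels = paste("point", 1:100))

# Values flagged as outliers by the default 1.5 * IQR rule
out_values <- boxplot.stats(x$values)$out

# Keep the full rows so the labels travel with the values
outliers <- x[x$values %in% out_values, ]
print(outliers)
```

The resulting data frame can then be passed to add_markers() as in the example above, or just inspected on its own.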

I know that this comes horrendously late.
Check this link. Plotly allows a few options for showing outliers.
Curiously, it does not allow any option to NOT plot outliers (that is what I was looking for).

Related

Problem with Choropleth map's legend created by plotly in R based on categorical-data and sf map

Please help! This question is very similar to this one, which was answered quite some time ago. However, I still cannot get my head around the solution:
How to create a chloropleth map in R Plotly based on a Categorical variable?
I'm trying to create an interactive choropleth map in R for my shiny app based on categorical data, using plotly and sf data obtained from GADM. Here is a reproducible example:
library(raster)
library(tidyverse)
library(plotly)
library(sf)
# Get the map data in sf format
map_data <- getData("GADM", country = "FRA", level = 2, type = "sf")
# Transform sf data to modern crs object to avoid further warning message
st_crs(map_data) <- st_crs(map_data)
# Generate some random data for each region
department <- map_data %>% as.data.frame() %>% .[, 13]
set.seed(10, sample.kind="Rounding")
data <- sample(x = 0:1200, size = length(department), replace = T)
map_dat <- data.frame(department = department,
data = data)
# Assign Class as categories
map_dat <- map_dat %>%
mutate(Class = cut(data,
breaks = c(-Inf, 50, 100, 200, 500, 1000, Inf),
labels = c("< 50", "50 - 100", "100 - 200",
"200 - 500", "500 - 1000", "> 1000")))
# Join data and plot
plot_dat <- map_data %>% as.data.frame() %>%
left_join(map_dat, by = c("HASC_2" = "department")) %>%
st_as_sf()
plot_ly(plot_dat) %>%
add_sf(type = "scatter",
stroke = I("transparent"),
span = I(1),
alpha = 1,
split = ~NAME_2,
color = ~Class,
colors = "Reds",
text = ~paste0(NAME_2, "\n", data),
hoveron = "fills",
hoverinfo = "text") %>%
config(displayModeBar = F)
The problem with the default legend is that it is too detailed and cumbersome: I just want to display a small, compact box showing the categorical classes that I assign, similar to the one in ggplot. I've tried splitting the map by the categorical Class, and the legend looks a little better, but my hover text no longer works. I also still have no idea how to restyle the legend so that it looks neat, for example changing the key symbols to circles or squares as in a typical map legend box.
plot_dat %>% plot_ly() %>%
add_sf(type = "scatter",
stroke = I("transparent"),
span = I(1),
alpha = 1,
split = ~Class,
color = ~Class,
colors = "Reds",
text = ~paste0(NAME_1, "\n", Count),
hoveron = "fills",
hoverinfo = "text") %>%
config(displayModeBar = F) %>%
layout(showlegend = F)
I've read through the documentation for plotly's choropleth maps in R and found no documentation for the categorical case (unlike Python). As I'm running out of options, my question is: is there any way to achieve my desired goal here? How can I create a proper legend box, or a bar, for categorical data?
Apologies if my question is not that clear. I'm happy to answer any questions anyone has. Thank you in advance.
After getting it to work, I can't help but think...there has to be a better way.
This uses the library spPlot. This isn't a CRAN package. Use the following to install it.
devtools::install_github("GegznaV/spPlot")
You'll likely run into the same dependency issue I ran into. You will need the package ChemometricsWithR to get spPlot. It will try to install that package, but it's another one that you need to get through alternative means.
devtools::install_github("rwehrens/ChemometricsWithR")
I'm using the first plot_ly call you made. I added a few things: legendgroup = ~Class, name = ~Class, and showlegend = F. Then I piped in the function plotly_modify_legend to add the grouped legend.
(plt <- plot_ly(plot_dat) %>%
add_sf(type = "scatter",
stroke = I("transparent"),
span = I(1),
alpha = 1,
split = ~NAME_2,
legendgroup = ~Class, # group legends together by class
name = ~Class, # so region names aren't shown in the legend
color = ~Class,
showlegend = F, # don't show a legend for each region
colors = "Reds",
text = ~paste0(NAME_2, "\n", data),
hoveron = "fills",
hoverinfo = "text") %>%
config(displayModeBar = F) %>%
plotly_modify_legend(showlegend = T, traceorder = "grouped")) # group legend visible
Plotly orders the legend in the order in which the traces appear, which here is alphabetical by region. That results in a really odd order in the legend.
The 96 separate traces (one for each region) needed to be reordered to fix this. I could have just found one of each class and changed the first 6 (one for each legend group), but I didn't think that would be easier.
Instead, I captured the traces, extracted their assigned group, and reordered them. After that, I replaced the traces in the plotly object.
# extract the legend groups
trOrder = map(1:length(plt$x$data),
~plt$x$data[[.x]]$legendgroup) %>%
unlist()
# assign an order to the groups and reorder the data
newOrder = data.frame(id = seq_along(trOrder), trOrder = trOrder) %>%
mutate(trOrder = factor(trOrder, levels(plot_dat$Class))) %>%
arrange(trOrder)
# reorder the traces using the indices
a = map(newOrder$id,
~plt$x$data[[.x]])
# check it
a[[1]]$legendgroup
# [1] "< 50"
# replace the traces
plt$x$data <- a
plt # mum--nikan-tez - ausgezeichnet - travail exceptionnel - Фантастический
Now the legend makes sense.
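The reordering step can be tried in isolation on a stand-in list, without building the map. This is only a sketch in base R: `traces` imitates the shape of plt$x$data, and the trace names are placeholders rather than real output.

```r
# Stand-in for plt$x$data: a list of traces, each carrying a legendgroup.
traces <- list(
  list(name = "region_a", legendgroup = "> 1000"),
  list(name = "region_b", legendgroup = "< 50"),
  list(name = "region_c", legendgroup = "100 - 200"),
  list(name = "region_d", legendgroup = "< 50")
)
class_levels <- c("< 50", "50 - 100", "100 - 200",
                  "200 - 500", "500 - 1000", "> 1000")

# Extract the group of every trace
groups <- vapply(traces, function(tr) tr$legendgroup, character(1))

# Sort by the position of each group in class_levels, not alphabetically
new_order <- order(factor(groups, levels = class_levels))
traces <- traces[new_order]

trace_names <- vapply(traces, function(tr) tr$name, character(1))
print(trace_names)
```

On a real plotly object the last step would be the assignment back into plt$x$data, as shown above.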

R Plotly linked subplot with percentage histogram and categories coloured

The Background
I am using the plotly API in R to create two linked plots. The first is a scatter plot and the second is a bar chart that should show the percentage of data belonging to each category, in the current selection. I can't make the percentages behave as expected.
The problem
The plots render correctly and the interactive selection works fine. When I select a set of data points in the top scatter plot, I would like to see the percentage of that selection that belongs to each category. Instead what I see is the percentage of points in that selection in that category that belong to that category, in other words always 100%. I guess this is because I set color = ~c which applies a grouping to the category.
The Example
Here is a reproducible example to follow. First create some dummy data.
library(plotly)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data = data.frame(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
Create a sharedData object and supply it to plot_ly() for the base plot.
shared_data = data %>%
highlight_key()
baseplot = plot_ly(shared_data)
Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_histogram(x = ~c, color = ~c, histnorm = "percent", showlegend = FALSE) %>%
layout(barmode = "group")
And put them together in a linked subplot with selection and highlighting.
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Here is a screenshot of this to illustrate the problem.
An Aside
Incidentally when I set histnorm = "" in add_histogram() then I get closer to the expected behaviour but I do want percentages and not counts. When I remove color = ~c then I get closer to the expected behaviour but I do want the consistent colour scheme.
What I have tried
I have tried manually supplying the colours but then some of the linked selection breaks. I have tried creating a separate summarised data set from the sharedData object first and then plotting that but again this breaks the linkage between the plots.
If anyone has any clues as to how to solve this I would be very grateful.
To me it seems the behaviour you are looking for isn't implemented in plotly.
Please see schema(): object ► traces ► histogram ► attributes ► histnorm ► description
However, here is the closest I was able to achieve via add_bars and preprocessing the data (sorry for adding data.table; you can do the same in base R, it's just personal preference):
library(plotly)
library(data.table)
n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
DT = data.table(
x = make_axis(n),
y = make_axis(n),
c = rep(c("A", "B"), each = n)
)
DT[, grp_percent := rep(100/.N, .N), by = "c"]
shared_data = DT %>%
highlight_key()
baseplot = plot_ly(shared_data)
# Make the individual panels.
points = baseplot %>%
add_markers(x = ~x, y = ~y, color = ~c)
bars = baseplot %>%
add_bars(x = ~c, y = ~grp_percent, color = ~c, showlegend = FALSE) %>%
layout(barmode = "group")
subplot(points, bars) %>%
layout(dragmode = "select") %>%
highlight("plotly_selected")
Unfortunately, the resulting hoverinfo isn't really desirable.
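For readers who prefer to avoid data.table, the grp_percent column can be computed in base R as well. The sketch below uses ave() for the per-category counts and then checks what the stacked bars add up to; the selection `x > 1` is a made-up stand-in for a brush selection in the scatter plot.

```r
set.seed(1)
n <- 1000
make_axis <- function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data <- data.frame(
  x = make_axis(n),
  y = make_axis(n),
  c = rep(c("A", "B"), each = n)
)

# Each row carries 100 / (number of rows in its category)
data$grp_percent <- 100 / ave(seq_len(nrow(data)), data$c, FUN = length)

# With nothing filtered out, every category's bar sums to 100
totals <- tapply(data$grp_percent, data$c, sum)
print(totals)

# Hypothetical selection: the bar heights become the share of each
# category that falls inside the selection
selected <- data[data$x > 1, ]
selected_share <- tapply(selected$grp_percent, selected$c, sum)
print(selected_share)
```

Note what this measures: the bar for a category is the percentage of that category's points that are selected, which is close to, but not the same as, the percentage of the selection that belongs to each category.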

R: Colors in dumbbell plot seem to mix in inappropriate ways

I am trying to produce a dumbbell plot in R. In this case, there are four rows, and each needs its own specific color. I define the colors as part of the dataset using colorRampPalette(). Then when I produce the plot, the colors get mixed in inappropriate ways. See the image below, and in particular the legend.
As you can see, the orange is supposed to be #7570B3 according to the legend. But this is not correct: the color #7570B3 is purple! So the colors that I defined in the dataset are mixed up in the plot. "Alt 2" should be in orange and "Alt 3" should be in purple.
Does anyone know how to fix this? Any help would be much appreciated.
Here is a simple version of the code:
library(plotly)
library(RColorBrewer)
table_stats_scores <- data.frame(alt=c("alt1","alt2","alt3","alt4"),
average=c(15,20,10,5),
dumb_colors= colorRampPalette(brewer.pal(4,"Dark2"))(4),
min=c(10,15,5,0),max=c(20,25,15,10)
)
table_stats_scores # This is the dataset
table_stats_scores <- table_stats_scores[order(-table_stats_scores$average),] # ordering
table_stats_scores$alt <- factor(table_stats_scores$alt,
levels = table_stats_scores$alt[order(table_stats_scores$average)])
# giving factor status to alternatives so that plot_ly() picks up on this
p <- plot_ly(table_stats_scores, x=table_stats_scores$average, color = ~dumb_colors,
y=table_stats_scores$alt,text=table_stats_scores$alt) %>%
add_segments(x = ~min, xend = ~max, y = ~alt, yend = ~alt,name = "Min-Max range",
showlegend = FALSE, line = list(width = 4)) %>%
add_markers(x = ~average, y = ~alt, name = "Mean",
marker=list(size=8.5),showlegend = FALSE) %>%
add_text(textposition = "top right") %>%
layout(title = "Scores of alternatives",
xaxis = list(title = "scores"),
yaxis = list(title = "Alternatives")
)
p
Yes, color can be an issue in plotly, because there are several ways to specify it, and the assignment order of the various elements from the dataframe can be hard to keep in sync.
The following changes were made:
added a list of brighter colors to your dataframe, because I couldn't easily visualize the brewer.pal colors. Better to debug with something obvious.
changed the color parameter to the alt column, because it is only used indirectly to set the color and mostly determines the text in the legend.
added the colors to the text parameter (instead of alt) so I could see whether it was assigning the colors correctly.
changed the sort order to the default "ascending" in the table_stats_scores sort, because otherwise it assigned the colors in the incorrect order (I don't completely understand this; there seems to be some mysterious sorting/re-ordering going on internally).
added a colors parameter to add_segments and add_markers so that they set the color in the same way using the same column.
I think this gets you what you want:
library(plotly)
library(RColorBrewer)
table_stats_scores <- data.frame(alt=c("alt1","alt2","alt3","alt4"),
average=c(15,20,10,5),
dumb_colors= colorRampPalette(brewer.pal(4,"Dark2"))(4),
min=c(10,15,5,0),max=c(20,25,15,10)
)
table_stats_scores # This is the dataset
table_stats_scores$bright_colors <- c("#FF0000","#00FF00","#0000FF","#FF00FF")
table_stats_scores <- table_stats_scores[order(table_stats_scores$average),] # ordering
table_stats_scores$alt <- factor(table_stats_scores$alt,
levels = table_stats_scores$alt[order(table_stats_scores$average)])
# giving factor status to alternatives so that plot_ly() picks up on this
p <- plot_ly(table_stats_scores, x=~average, color = ~alt, y=~alt,text=~bright_colors) %>%
add_segments(x = ~min, xend = ~max, y = ~alt, yend = ~alt,name = "Min-Max range",
colors=~bright_colors, showlegend = FALSE, line = list(width = 4)) %>%
add_markers(x = ~average, y = ~alt, name = "Mean",
marker=list(size=8.5,colors=~bright_colors),showlegend = FALSE) %>%
add_text(textposition = "top right") %>%
layout(title = "Scores of alternatives",
xaxis = list(title = "scores"),
yaxis = list(title = "Alternatives")
)
p
yielding this:
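A different way to keep colors and categories in sync is to tie them together by name rather than by row order. The sketch below is not the approach used in the answer above: it assumes that the colors argument of plot_ly() accepts a named character vector mapping factor levels to colors, and it hard-codes the first four Dark2 colors so that RColorBrewer is not needed. The names `scores` and `pal` are illustrative.

```r
scores <- data.frame(
  alt = c("alt1", "alt2", "alt3", "alt4"),
  average = c(15, 20, 10, 5),
  min = c(10, 15, 5, 0),
  max = c(20, 25, 15, 10),
  # first four Dark2 colours, hard-coded for this sketch
  dumb_colors = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A"),
  stringsAsFactors = FALSE
)

# Bind each colour to its alternative by name BEFORE any reordering
pal <- setNames(scores$dumb_colors, scores$alt)

# Order the rows and the factor levels by average score
scores <- scores[order(scores$average), ]
scores$alt <- factor(scores$alt, levels = scores$alt)

# The plot is only built when plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(scores, color = ~alt, colors = pal)
  p <- plotly::add_segments(p, x = ~min, xend = ~max, y = ~alt, yend = ~alt,
                            line = list(width = 4), showlegend = FALSE)
  p <- plotly::add_markers(p, x = ~average, y = ~alt,
                           marker = list(size = 8.5), showlegend = FALSE)
  # print(p) to display the figure
}
```

Because `pal` is looked up by name, sorting the data frame afterwards should not change which alternative gets which color.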

Grouped line plots in Plotly R: how to control line color?

I have a bunch of 'paired' observations from a study for the same subject, and I am trying to build a spaghetti plot to visualize these observations as follows:
library(plotly)
df <- data.frame(id = rep(1:10, 2),
type = c(rep('a', 10), rep('b', 10)),
state = rep(c(0, 1), 10),
values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]
plot_ly(df, x = type, y = values, group = id, type = 'line') %>%
layout(showlegend = FALSE)
It produces the plot I am after. But the code draws each grouped line in its own color, which is really annoying and distracting. I can't seem to find a way to get rid of the colors.
Bonus question: I actually want to use color = state and color the sloped lines by that variable instead.
Any approaches / thoughts?
You can set the lines to the same colour like this
plot_ly(df, x = type, y = values, group = id, type = 'scatter', mode = 'lines+markers',
line=list(color='#000000'), showlegend = FALSE)
For the 'bonus' two-for-the-price-of-one question 'how to color by a different variable to the one used for grouping':
If you were only plotting markers, and no lines, this would be simple, as you can simply provide a vector of colours to marker.color. Unfortunately, however, line.color only takes a single value, not a vector, so we need to work around this limitation.
Provided the data are not too numerous (in which case this method becomes slow, and a faster method is given below), you can set colours of each line individually by adding them as separate traces one by one in a loop (looping over id)
p <- plot_ly()
for (id in unique(df$id)) { # unique() so each subject is added once, not once per row
col <- c('#AA0000','#0000AA')[df[which(df$id==id),3][1]+1] # calculate color for this line based on the 3rd column of df (df$state).
p <- add_trace(data=df[which(df$id==id),], x=type, y=values, type='scatter', mode='markers+lines',
marker=list(color=col),
line=list(color=col),
showlegend = FALSE,
evaluate=T)
}
p
Although this one-trace-per-line approach is probably the simplest way conceptually, it does become very (impractically) slow if applied to hundreds or thousands of line segments. In this case there is a faster method, which is to plot only one line per colour, but to split this line up into multiple segments by inserting NA's between the separate segments and using the connectgaps=FALSE option to break the line into segments where there are missing data.
Begin by using dplyr to insert missing values between line segments (i.e. for each unique id we add a row containing NA in the columns that provide x and y coordinates).
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
df %<>% distinct(id, .keep_all = TRUE) %>% # .keep_all keeps the other columns in current dplyr
`[<-`(,c(2,4),NA) %>%
rbind(df) %>%
arrange (id)
and plot, using connectgaps=FALSE:
plot_ly(df, x = type, y = values, group = state, type = 'scatter', mode = 'lines+markers',
showlegend = FALSE,
connectgaps=FALSE)
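The NA-insertion step can also be written in base R, which makes it easier to check what the data looks like before plotting. This is a sketch under a few assumptions: the helper name `insert_gaps` is made up, the gap row is placed after each subject's rows rather than before, and the plotting call uses the formula syntax (`~`) of newer plotly versions instead of the older syntax used above.

```r
set.seed(42)
df <- data.frame(id = rep(1:10, 2),
                 type = c(rep("a", 10), rep("b", 10)),
                 state = rep(c(0, 1), 10),
                 values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)),
                 stringsAsFactors = FALSE)

# Hypothetical helper: after each subject's rows, append one row whose
# x and y columns are NA, so a single trace breaks into separate segments.
insert_gaps <- function(df, id_col = "id", na_cols = c("type", "values")) {
  pieces <- lapply(split(df, df[[id_col]]), function(piece) {
    gap <- piece[1, , drop = FALSE]
    gap[, na_cols] <- NA
    rbind(piece, gap)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

gapped <- insert_gaps(df)

# The plot is only built when plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(gapped, x = ~type, y = ~values,
                       color = ~factor(state),
                       colors = c("#AA0000", "#0000AA"),
                       type = "scatter", mode = "lines+markers",
                       connectgaps = FALSE, showlegend = FALSE)
  # print(p) to display the figure
}
```

The gap rows keep their id and state, so each one stays with its own subject and ends up in the trace of the right color.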

Subplots using Plotly in R (bug fixed)

How can I create a subplot grid with Plotly in R?
The official site has this nice Python example:
The Python code has the options rows=2 and cols=2, but in R the subplot function has just the parameter nrows, without ncols.
I tried this example in R, but nrows does not seem to work as expected:
# Basic subplot
library(plotly)
p <- plot_ly(economics, x = date, y = uempmed)
subplot(p,p,p,p,
margin = 0.05,
nrows=2
) %>% layout(showlegend = FALSE)
They are in a line instead of in a grid. See the result:
Here is the R subplots page for reference. Unfortunately, using ggplotly, like this, is not an option for me.
UPDATE
It was a bug. Plotly team is very fast, and it was fixed in just 3 days (check here)! Github version is already updated. Great job!
This seems to be a genuine bug in the way subplot() generates the y-axis domains for the two plots. Indeed, they overlap which can easily be seen if you execute
p <- plot_ly(economics, x = date, y = uempmed)
q <- plot_ly(economics, x = date, y = unemploy)
subplot(p,q, nrows = 2)
This will produce the following plot:
If you take a close look at the y-axis you see that they overlap. That hints at a problem in the way subplot() defines the domain of the y-axes of the subplot.
If we correct the domain specification of the y-axes manually (following the plotly documentation), we can solve the problem:
subplot(p,q, nrows = 2) %>% layout(yaxis = list(domain = c(0, 0.48)),
yaxis2 = list(domain = c(0.52, 1)))
This produces:
Now, if you want to reproduce the 2x2 subplot grid from the Python example, you probably need to manually adjust the x-axis domains in a similar way.
Since this is a bug and my solution is only a workaround, I suggest, however, that you file an issue with plotly on GitHub.
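The manual domains used in the workaround follow a simple pattern, which can be computed for any number of rows. A small sketch in base R; the helper `row_domains` and its `gap` parameter are made up for illustration and are not part of plotly.

```r
# Hypothetical helper: split the [0, 1] paper range into `nrows` equal,
# non-overlapping bands separated by `gap`, listed from bottom to top.
row_domains <- function(nrows, gap = 0.04) {
  height <- (1 - gap * (nrows - 1)) / nrows
  starts <- (seq_len(nrows) - 1) * (height + gap)
  lapply(starts, function(s) c(s, s + height))
}

doms <- row_domains(2)
print(doms)
```

With two rows and a gap of 0.04 this gives the same c(0, 0.48) and c(0.52, 1) domains as in the layout() call above; the same function applied to columns would give the x-axis domains.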
Based on this:
p <- economics %>%
tidyr::gather(variable, value, -date) %>%
transform(id = as.integer(factor(variable))) %>%
plot_ly(x = ~date, y = ~value, color = ~variable, colors = "Dark2",
yaxis = ~paste0("y", id)) %>%
add_lines() %>%
subplot(nrows = 5, shareX = TRUE)
