R Plotly jittered boxplot with NAs width

I am plotting a grouped boxplot with jittered points using the following function:
plot_boxplot <- function(dat) {
# taking one row of each joined_group so that every group can be plotted
allx <- dat %>%
mutate(y = median(y, na.rm = TRUE)) %>%
group_by(joined_group) %>%
sample_n(1) %>%
ungroup()
p <- dat %>%
plotly::plot_ly() %>%
# plotting all the groups 1:20
plotly::add_trace(data = allx,
x = ~as.numeric(joined_group),
y = ~y,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
color = NULL,
opacity = 0,
showlegend = FALSE) %>%
# plotting the boxes
plotly::add_trace(data = dat,
x = ~as.numeric(joined_group),
y = ~y,
color = ~group1,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
showlegend = FALSE) %>%
# adding ticktext
layout(xaxis = list(tickvals = 1:20,
ticktext = rep(levels(dat$group1), each = 4)))
p <- p %>%
# adding jittering
add_markers(data = dat,
x = ~jitter(as.numeric(joined_group), amount = 0.2),
y = ~y,
color = ~group1,
showlegend = FALSE)
p
}
The problem is that when some of the levels have NA as their y value, the width of the boxes in the jittered plot changes. Here is an example:
library(plotly)
library(dplyr)
set.seed(123)
dat <- data.frame(group1 = factor(sample(letters[1:5], 100, replace = TRUE)),
group2 = factor(sample(LETTERS[21:24], 100, replace = TRUE)),
y = runif(100)) %>%
dplyr::mutate(joined_group = factor(
paste0(group1, "-", group2)
))
# do the plot with all the levels
p1 <- plot_boxplot(dat)
# now level "e" of group1 has only NA as y values
dat$y[dat$group1 == "e"] <- NA
# create the plot with missing data
p2 <- plot_boxplot(dat)
# creating the subplot to see that the width has changed:
subplot(p1, p2, nrows = 2)
The problem is that the width of boxes in both plots is different:
I've realised that the boxes have the same size without jittering so I know that the jittering is "messing" with the width but I don't know how to fix that.
Does anyone know how to make the width in both jittered plots exactly the same?

I see two separate plot shifts:
due to jittering
due to NAs
The first can be solved by declaring a new jitter function with a fixed seed
fixed_jitter <- function (x, factor = 1, amount = NULL) {
set.seed(42)
jitter(x, factor, amount)
}
and using it instead of jitter in the add_markers call.
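A quick base R check of this idea, as a sketch only: because the seed is reset inside the wrapper, two calls on the same input should return identical offsets, so the markers land in the same place in both plots. One side effect to keep in mind is that calling set.seed inside the function also resets the global random number stream for whatever code runs afterwards. The vector x below is a made-up stand-in for as.numeric(joined_group).

```r
# Wrapper from the answer: reset the seed before every jitter call
fixed_jitter <- function(x, factor = 1, amount = NULL) {
  set.seed(42)
  jitter(x, factor, amount)
}

x <- rep(1:4, each = 3)            # stand-in for as.numeric(joined_group)
j1 <- fixed_jitter(x, amount = 0.2)
j2 <- fixed_jitter(x, amount = 0.2)

# Compare the two calls: the offsets should match exactly
same_offsets <- identical(j1, j2)

# The jittered values should stay within +/- amount of the original positions
within_amount <- all(abs(j1 - x) <= 0.2 + 1e-9)
```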
The second problem can be solved by assigning -1 instead of NA and setting
yaxis = list(range = c(0, ~max(1.1 * y)))
as an additional argument to layout.
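A minimal sketch of that second step on toy data (not the data from the question). It assumes a plain numeric range computed up front rather than the formula shown above, and it only builds the plot if plotly happens to be installed.

```r
# Toy data: group "e" has only missing y values, as in the question
set.seed(123)
dat <- data.frame(group1 = factor(rep(letters[1:5], each = 4)),
                  y = runif(20))
dat$y[dat$group1 == "e"] <- NA

# Compute the visible range from the observed values first
y_range <- c(0, 1.1 * max(dat$y, na.rm = TRUE))

# Replace NA with a placeholder that lies below the visible range
dat$y[is.na(dat$y)] <- -1

# Build the plot only if plotly is installed; the fixed range hides the -1 values
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(dat, x = ~group1, y = ~y, type = "box")
  p <- plotly::layout(p, yaxis = list(range = y_range))
  # print(p) would display it
}
```

The range is computed before the replacement so that the placeholder cannot influence it.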

Related

Changing the widths of plotly boxes and/or the space between them

I have grouped data which I want to plot as a group of box plots using R's plotly package, and control the width of the boxes and/or the space between them.
Here are the data:
set.seed(1)
df <- data.frame(type = c(rep("t1", 1000), rep("t2", 1000), rep("t3", 1000), rep("t4", 1000), rep("t5", 1000), rep("t6", 1000)),
age = rep(c(rep("y", 500),rep("o", 500)), 6),
value = rep(c(runif(500, 5, 10), runif(500, 7.5, 12.5)), 6),
stringsAsFactors = F)
df$age <- factor(df$age, levels = c("y", "o"), ordered = T)
Following plotly's tutorial this is how I'm plotting it:
library(plotly)
library(dplyr)
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type, type = 'box',showlegend = F) %>%
layout(yaxis=list(title="Diversity"),boxmode='group')
Which gives:
The boxes come out too narrow, and both the space between boxes of the same type and the space between the different types are too large.
Any idea how to change the box widths and/or the spaces?
According to this post, in Python the boxgap and boxgroupgap layout attributes control these aspects.

Analogous to the Python version, the layout parameters documented here can be passed as arguments to the layout function:
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type,
type = "box", showlegend = F) %>%
layout(yaxis = list(title = "Diversity"),
boxmode = "group", boxgap = 0, boxgroupgap = 0
)
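Setting both gaps to 0 makes the boxes touch. As far as I recall from the plotly reference, both attributes are fractions between 0 and 1, with boxgap acting between neighbouring x positions and boxgroupgap acting between boxes at the same x position, so intermediate values are possible too. A sketch with made-up values (0.2 and 0.05) on a smaller toy data set; df_small and gaps are names invented for this illustration.

```r
# Smaller data set than in the question, same structure
set.seed(1)
df_small <- data.frame(
  type  = rep(c("t1", "t2", "t3"), each = 40),
  age   = factor(rep(rep(c("y", "o"), each = 20), 3), levels = c("y", "o")),
  value = runif(120, 5, 12.5)
)

# Gap settings as fractions between 0 and 1 (0 means the boxes touch)
gaps <- list(boxmode = "group", boxgap = 0.2, boxgroupgap = 0.05)

# Build the plot only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(df_small, x = ~type, y = ~value, color = ~age, type = "box")
  # do.call passes the plot plus the named gap settings to layout()
  p <- do.call(plotly::layout, c(list(p), gaps))
}
```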

One alternative is to use a continuous x-axis. Here with ggplotly instead:
library(ggplot2)
library(plotly)
# convert factors to numbers
df$itype <- as.numeric(factor(df$type))
sc <- scale(unique(as.numeric(factor(df$age))))
df$iage <- sc[as.numeric(factor(df$age))] * .3
# plot
gg <-
ggplot(df, aes(x = itype + iage, y = value, color = type, group = itype + iage)) +
geom_boxplot() +
scale_x_continuous(labels = levels(factor(df$type)), breaks = seq_along(levels(factor(df$type)))) +
labs(x = "", y = "Diversity")
ggplotly(gg) %>%
layout(boxgroupgap = 0, boxgap = 0)

Remove whiskers and outliers in R plotly

I have continuous data that I'd like to plot using R's plotly with a box or violin plot without the outliers and whiskers:
set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))
I know how to remove outliers in plotly:
plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)
But I'm still left with the whiskers.
I tried using ggplot2 for that (also limiting the height of the y-axis to that of the 75th percentile):
library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
theme_minimal() + theme(legend.position = "none",axis.title = element_blank())
But converting that to a plotly object doesn't preserve these settings:
plotly::ggplotly(gp)
Any idea?

This is a workaround.
I changed your plot a bit, first.
# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box',
color = ~group, boxpoints = F, showlegend = F,
whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line
Then I add the medians back to the graph. This requires calculating the medians, matching the colors, and creating the shape lists for Plotly.
For the colors, it's odd, the first three default colors are used, but the order is g3, g2, g1...
# the medians
res = df %>% group_by(group) %>%
summarise(med = median(value))
# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1
# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # basics needed for every shape
list(type = 'line',
line = list(color = col, width = 4),
xref = "paper", yref = "y")
}
# horizontal segments/ median
segs = lapply(1:nrow(res),
function(k){
x1 <- k/3 - .08 # if the domain is [0, 1]
x0 <- (k - 1)/3 + .08
y0 <- y1 <- res[k, ]$med
line = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1)
deets = details(col[k])
c(deets, line)
})
Finally, I added them back onto the plot.
p %>% layout(shapes = segs)
I made the lines obnoxiously wide, but you get the idea.
If you wanted the IQR outline back, you could do this as well. I used functions here too, because I figured that the data you've provided is not the actual data, so the functions will serve a purpose.
# include IQR outline
res2 = df %>% group_by(group) %>%
summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
q3 = setNames(quantile(value, type = 7, 3/4), NULL),
med = median(value))
# IQR segments
rects = lapply(1:nrow(res2), # if the domain is [0, 1]
function(k){
x1 <- k/3 - .08
x0 <- (k - 1)/3 + .08
y0 <- res2[k, ]$q1
y1 <- res2[k, ]$q3
line = list(color = col[k], width = 4)
rect = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1,
type = "rect", xref = "paper",
yref = "y", "line" = line)
rect
})
rects = append(segs, rects)
p %>% layout(shapes = rects)
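The hard-coded k/3 and .08 above assume exactly three groups. A small helper, sketched here under the same assumptions (the x-axis domain is [0, 1] and every group gets an equal share of it), computes the same paper coordinates for any number of groups. The name paper_span and its margin argument are made up for this illustration.

```r
# Hypothetical helper: paper x-coordinates of the k-th of n equally wide slots
paper_span <- function(k, n, margin = 0.08) {
  c(x0 = (k - 1) / n + margin,  # left edge of the slot, moved inward
    x1 = k / n - margin)        # right edge of the slot, moved inward
}

# With n = 3 this gives the same values as the expressions used in the answer
spans <- t(sapply(1:3, paper_span, n = 3))
spans
```

Inside the lapply calls, x0 and x1 could then be taken from paper_span(k, nrow(res)) instead of being written out by hand.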

Plotly in R: Piechart subplot changing domains when linked to choropleth with crosstalk

Is there a way to keep a piechart subplot from changing its domain with Plotly in R using crosstalk?
The idea here is to have a choropleth map on the left side and a piechart on the right side.
When I click on one country on the map, the piechart shows data from that country.
I use a SharedData object and the link between both subplots works fine.
The problem is:
The piechart subplot stays where it should for the first location Code in my dataframe (AUS in this case), but when I click on another country, the piechart moves to the center of the plot.
Maybe this is a bug or it's not implemented yet?
Here is my code:
library(plotly)
library(crosstalk)
df <- data.frame(Code = rep(c("AUS", "BRA", "CAN", "USA"),each = 4),
Category = rep(c("A","B","C","D"),4),
Values = rep(c(10,15,5,20),each=4),
Perc = c(10, 20, 20, 50,
35, 5, 15, 45,
5, 75, 5, 15,
60, 30, 10, 0))
shared_data <- SharedData$new(df, key = ~Code)
p1 <- shared_data %>%
plot_geo(z = ~Values,
zmin=0,
zmax=20,
color = ~Values,
locations = ~Code,
visible=T)
p2 <- shared_data %>%
plot_ly(type = "pie",
visible = T,
showlegend = F,
values = ~Perc,
labels = ~Category,
domain = list(x = c(0.5, 1),
y = c(0,1)),
hole = 0.8,
sort = F) %>%
layout(autosize = T, geo = list(domain = list(x = c(0.5, 1),
y = c(0,1)
))
)
sp1 <- subplot(p1, p2) %>%
hide_legend() %>%
hide_colorbar() %>%
layout(xaxis = list(domain=c(0,0.5)), #adding this does not work either
xaxis2 = list(domain=c(0.5,1)))

I seem to have found a workaround that answers the question.
I removed the domain of the piechart.
I had to plot the piechart first in the subplot (the piechart goes to the left). An explanation follows after the code.
I defined the width of each subplot (smaller pie chart, larger choropleth).
Then when defining the layout of the subplot it was important to define the number of rows and columns of the subplot grid.
This by itself already solves the problem, as you can play with the number of columns.
For further tinkering with the pie chart size, one can then use the domain list.
Here is the code:
g <- list(
showframe = FALSE,
showcoastlines = TRUE,
coastlinecolor = toRGB("black"),
projection = list(type = 'Mercator')
)
p1 <- shared_data %>%
plot_geo(z = ~Values,
zmin=0,
zmax=20,
color = ~Values,
locations = ~Code,
visible=T) %>%
layout(autosize = T, geo = g)
p2 <- shared_data %>%
plot_ly(type = "pie",
visible = T,
showlegend = F,
values = ~Perc,
labels = ~Category,
#domain = list(x = c(0, 0.3),
# y = c(0,1)),
hole = 0,
sort = F) %>%
layout(autosize = T, geo = g) # I have to set up 'geo' here, otherwise I get a warning in the subplot. Probably to match 'geo' from the choropleth?
sp1 <- subplot(p2, p1, # notice I plot the piechart (p2) first
widths=c(0.25, 0.75)) %>% # here I set up the size of each subplot
hide_legend() %>%
hide_colorbar() %>%
layout(grid = list(rows = 1,
columns = 3#, #setting columns as 2 and changing the domain also works
#domain = list(x = c(0,1),
# y = c(0,1))
)
)
sp1
I could not plot the piechart on the right like I wanted because I could not find a way to set up the domain right.
On the reference to layout.grid (https://plotly.com/r/reference/layout/#layout-grid),
the 'roworder' entry refers to the way grid columns are enumerated:
"Note that columns are always enumerated from left to right."
I think that if they could be enumerated from right to left I would be able to put the piechart on the right and set up its domain properly.
So probably "layout.grid.columnorder" should be implemented?

R plotly simple boxplot highlighting the most recent value

I probably have a simple question but I can't find a way to achieve what I need. I have a simple boxplot as the following:
end_dt <- as.Date("2021-02-12")
start_dt <- end_dt - (nrow(iris) - 1)
dim(iris)
dates <- seq.Date(start_dt, end_dt, by="1 day")
df <- iris
df$LAST_VAL <- "N"
df[3, 'LAST_VAL'] <- "Y"
df1 <- df[,c("Sepal.Length","LAST_VAL")]
df1$DES <- 'Sepal.Length'
colnames(df1) <- c("VALUES","LAST_VAL","DES")
df2 <- df[,c("Sepal.Width","LAST_VAL")]
df2$DES <- 'Sepal.Width'
colnames(df2) <- c("VALUES","LAST_VAL","DES")
df <- rbind(df1, df2)
fig <- plot_ly(df, y = ~VALUES, color = ~DES, type = "box") %>% layout(showlegend = FALSE)
What I would like to do now is add a red marker to each box plot just for the value corresponding to LAST_VAL = "Y". This would allow me to see where the most recent value is located within the distribution of each plot.
I tried to use the info on https://plotly.com/r/box-plots/ but I can't figure out how to do this.
Thanks

The following solution ended up being a bit long code-wise. However, it should give you what you asked for. I think the boxplots should be added afterwards, like:
fig <- plot_ly(df[df$LAST_VAL=="Y",],
x=~DES, y = ~VALUES, color = ~DES, type = "scatter", colors='red') %>%
layout(showlegend = FALSE) %>%
add_boxplot(data = df[df$DES=="Sepal.Length",], x = ~DES, y = ~VALUES,
showlegend = F, color = ~DES,
boxpoints = F, fillcolor = 'white', line = list(color = c('blue'))) %>%
add_boxplot(data = df[df$DES=="Sepal.Width",], x = ~DES, y = ~VALUES,
showlegend = F, color = ~DES,
boxpoints = F, fillcolor = 'white', line = list(color = c('green')))
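An alternative sketch, not the approach above: draw the boxes first and then overlay only the flagged rows with add_markers. It assumes that boxes and markers sharing the same categorical x value line up, which should hold as long as the boxes are not split into offset groups. The data frame is rebuilt here in a shortened form, and last_vals is a name introduced for this example.

```r
# Rebuild a compact version of the long-format data from the question
df <- rbind(
  data.frame(VALUES = iris$Sepal.Length, DES = "Sepal.Length"),
  data.frame(VALUES = iris$Sepal.Width,  DES = "Sepal.Width")
)
df$LAST_VAL <- "N"
df$LAST_VAL[c(3, nrow(iris) + 3)] <- "Y"   # row 3 of each variable, as in the question

last_vals <- df[df$LAST_VAL == "Y", ]      # one flagged row per box

# Build the plot only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(df, x = ~DES, y = ~VALUES, type = "box", boxpoints = FALSE)
  # Overlay the flagged values as red markers on top of the boxes
  fig <- plotly::add_markers(fig, data = last_vals, x = ~DES, y = ~VALUES,
                             marker = list(color = "red", size = 10),
                             inherit = FALSE)
  fig <- plotly::layout(fig, showlegend = FALSE)
}
```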

R plotly show only labels where percentage value is above 10

I am making a pie-chart in plotly in R.
I want my labels to be on the chart, so I use textposition = "inside", and for the very small slices those values are not visible.
I am trying to find a way to exclude those labels.
Ideally, I would like not to print any labels on my plot that are below 10%.
Setting textposition = "auto" doesn't work well, since there are a lot of small slices, and it makes the graph look very messy.
Is there a way to do it?
For example these piecharts from plotly website (https://plot.ly/r/pie-charts/)
library(plotly)
library(dplyr)
cut <- diamonds %>%
group_by(cut) %>%
summarize(count = n())
color <- diamonds %>%
group_by(color) %>%
summarize(count = n())
clarity <- diamonds %>%
group_by(clarity) %>%
summarize(count = n())
plot_ly(cut, labels = cut, values = count, type = "pie", domain = list(x = c(0, 0.4), y = c(0.4, 1)),
name = "Cut", showlegend = F) %>%
add_trace(data = color, labels = color, values = count, type = "pie", domain = list(x = c(0.6, 1), y = c(0.4, 1)),
name = "Color", showlegend = F) %>%
add_trace(data = clarity, labels = clarity, values = count, type = "pie", domain = list(x = c(0.25, 0.75), y = c(0, 0.6)),
name = "Clarity", showlegend = F) %>%
layout(title = "Pie Charts with Subplots")
In the plot for Clarity 1.37% are outside of the plot, while I would like them not to show at all.

You'll have to specify sector labels manually like so:
# Sample data
df <- data.frame(category = LETTERS[1:10],
value = sample(1:50, size = 10))
# Create sector labels
pct <- round(df$value/sum(df$value), 2)
# Anything below 10% gets a blank label. Test the number, not the string:
# grep("0%", ...) would also blank "10%", "20%", "30%", ...
pct <- ifelse(pct < 0.1, "", paste0(pct * 100, "%"))
# Install devtools
install.packages("devtools")
# Install latest version of plotly from github
devtools::install_github("ropensci/plotly")
# Plot
library(plotly)
plot_ly(df,
labels = ~category, # Note formula since plotly 4.0
values = ~value, # Note formula since plotly 4.0
type = "pie",
text = pct, # Manually specify sector labels
textposition = "inside",
textinfo = "text" # Ensure plotly only shows our labels and nothing else
)
Check out https://plot.ly/r/reference/#pie for more information...
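The label step can also be wrapped in a small function so it can be reused for several pies. This is a sketch only; pct_labels and its threshold argument (in percent) are names made up for this example, and the comparison is done on the numeric share rather than on the formatted string.

```r
# Hypothetical helper: percentage labels, blank below a threshold (in percent)
pct_labels <- function(values, threshold = 10) {
  pct <- 100 * values / sum(values)
  ifelse(pct < threshold, "", paste0(round(pct), "%"))
}

pct_labels(c(50, 30, 15, 5))
# "50%" "30%" "15%" ""
```

The result would be passed as text = pct_labels(df$value), together with textinfo = "text", in the plot_ly call.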
