I have a data.frame with counts for several groups, each assigned to one of several loci and measured at two ages:
set.seed(1)
df <- data.frame(group = c(rep("G1", 2), rep("G2", 2), rep("G3", 2)),
                 locus = c(rep("A", 4), rep("B", 2)),
                 age = rep(c("2m", "24m"), 3),
                 n = as.integer(runif(6, 10, 100)),
                 stringsAsFactors = FALSE)
I want to plot these data in a bar chart using plotly in R, where the x-axis is group, the bars are split by age, and colored by the combination of locus and age.
I set group, locus, and age as factors to give them the order I want them to follow in the figure:
df$group <- factor(df$group, levels = c("G1","G2","G3"))
df$locus <- factor(df$locus, levels = c("A","B"))
df$age <- factor(df$age,levels = c("2m","24m"))
Next, I create a data.frame with the specific color I'd like each locus-age combination to have:
library(dplyr)
color.df <- data.frame(locus = c("A", "A", "B", "B"),
                       age = rep(c("2m", "24m"), 2),
                       color = c("#66C2A5", "#488A75", "#FC8D62", "#B46446"),
                       stringsAsFactors = FALSE) %>%
  dplyr::mutate(locus_age = paste0(locus, "_", age))
color.df$locus <- factor(color.df$locus, levels=c("A","B"))
color.df$age <- factor(color.df$age, levels=c("2m","24m"))
color.df$locus_age <- factor(color.df$locus_age,levels=color.df$locus_age)
Then I join df with color.df on their shared columns, locus and age:
df <- dplyr::left_join(df, color.df, by = c("locus", "age"))
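To double-check which color each row ends up with, the join can be mimicked in base R. This is a minimal re-creation, assuming the same locus/age values as above (the n column is left out): it builds a locus_age key in both tables, looks up each row's color with match(), and builds a named palette (the name pal is purely illustrative) so each level can be checked against its color.

```r
# Minimal re-creation of the two tables (n column omitted for brevity)
df <- data.frame(group = rep(c("G1", "G2", "G3"), each = 2),
                 locus = c(rep("A", 4), rep("B", 2)),
                 age   = rep(c("2m", "24m"), 3),
                 stringsAsFactors = FALSE)
color.df <- data.frame(locus = rep(c("A", "B"), each = 2),
                       age   = rep(c("2m", "24m"), 2),
                       color = c("#66C2A5", "#488A75", "#FC8D62", "#B46446"),
                       stringsAsFactors = FALSE)

# Same key in both tables
color.df$locus_age <- paste0(color.df$locus, "_", color.df$age)
df$locus_age       <- paste0(df$locus, "_", df$age)

# match() gives, for each row of df, the row of color.df with the same key
df$color <- color.df$color[match(df$locus_age, color.df$locus_age)]

# Keep the locus_age levels in the order defined by color.df
df$locus_age <- factor(df$locus_age, levels = color.df$locus_age)

# Hypothetical helper: a named palette, one color per locus_age level
pal <- setNames(color.df$color, color.df$locus_age)
df
pal
```

Note that this sketch redefines df, so it is best run in a fresh session rather than in the middle of the code above.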
And finally plotting:
library(plotly)
plot_ly(x = df$group, y = df$n, split = df$age, text = df$n, type = 'bar',
        color = df$locus_age, colors = color.df$color, showlegend = TRUE,
        textposition = "inside", textfont = list(size = 12, color = 'black')) %>%
  layout(yaxis = list(title = "N"))
Which gives:
My questions are:
Although I defined the order of df$age as c("2m","24m"), the "24m" bars appear before the "2m" bars, as if the split argument of plot_ly ignores this order. Any idea how to fix this?
The legend seems to label both age and locus_age. How can I make it label only locus_age?
After some experimentation, it seems to me that the split argument is causing both issues.
I have also streamlined the code so that there is a continuous pipe:
library(dplyr)
library(plotly)
data.frame(locus = c("A", "A", "B", "B"),
age = rep(c("2m", "24m"), 2),
color = c("#66C2A5", "#488A75", "#FC8D62", "#B46446"),
stringsAsFactors = FALSE) %>%
mutate(locus_age = paste0(locus, "_", age)) %>%
mutate(locus = factor(locus, levels = c("A", "B"))) %>%
mutate(age = factor(age, levels = c("2m", "24m"))) %>%
mutate(locus_age = forcats::fct_inorder(locus_age)) %>%
left_join(df, .) %>%
plot_ly(x = ~group, y = ~n,
# split = ~age,
text = ~n, type = 'bar',
color = ~locus_age, colors = ~color, showlegend = TRUE,
textposition = "inside", textfont = list(size = 12, color = 'black')) %>%
layout(yaxis = list(title = "N"))
Now, the factor levels seem to be in the expected order and the legend shows only locus_age.
Caveat
Factors can be tricky, as factor levels are not always created in the expected order.
If we create a character vector in the expected order
x <- outer(c("A", "B"), c("2m", "24m"), paste, sep = "_") %>%
t() %>%
as.vector()
x
[1] "A_2m" "A_24m" "B_2m" "B_24m"
base R's
factor(x)
(without specifying the factor levels explicitly)
creates a factor whose levels are sorted in the current locale by default:
[1] A_2m A_24m B_2m B_24m
Levels: A_24m A_2m B_24m B_2m
Alternatively,
forcats::as_factor(x)
creates factor levels in order of appearance (if x is character)
[1] A_2m A_24m B_2m B_24m
Levels: A_2m A_24m B_2m B_24m
or, even more explicitly,
forcats::fct_inorder(x)
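If forcats is not available, base R gives the same order-of-appearance levels when you pass levels = unique(x) explicitly. A small sketch comparing the two behaviours (the variable names are illustrative):

```r
# Same vector as above: "A_2m" "A_24m" "B_2m" "B_24m"
x <- as.vector(t(outer(c("A", "B"), c("2m", "24m"), paste, sep = "_")))

f_sorted  <- factor(x)                      # levels = sort(unique(x)), locale-dependent
f_inorder <- factor(x, levels = unique(x))  # levels in order of first appearance

levels(f_sorted)
levels(f_inorder)
```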
Related
I have grouped data which I want to plot as a group of box plots using R's plotly package, and I want to control the width of the boxes and/or the space between them.
Here are the data:
set.seed(1)
df <- data.frame(type = rep(paste0("t", 1:6), each = 1000),
                 age = rep(rep(c("y", "o"), each = 500), 6),
                 value = rep(c(runif(500, 5, 10), runif(500, 7.5, 12.5)), 6),
                 stringsAsFactors = FALSE)
df$age <- factor(df$age, levels = c("y", "o"), ordered = TRUE)
Following plotly's tutorial, this is how I'm plotting it:
library(plotly)
library(dplyr)
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type, type = 'box', showlegend = FALSE) %>%
  layout(yaxis = list(title = "Diversity"), boxmode = 'group')
Which gives:
The boxes come out too narrow, and both the space between boxes of the same type and the space between different types are large.
Any idea how to change the box widths and/or the spaces?
According to this post, in Python the layout attributes boxgap and boxgroupgap control these aspects.
As in the Python version, the layout attributes documented here can be passed as arguments to layout():
plot_ly(x = df$type, y = df$value, name = df$age, color = df$type,
        type = "box", showlegend = FALSE) %>%
  layout(yaxis = list(title = "Diversity"),
         boxmode = "group", boxgap = 0, boxgroupgap = 0)
One alternative is to use a continuous x-axis, here with ggplotly instead:
library(ggplot2)
library(plotly)
# convert factors to numbers
df$itype <- as.numeric(factor(df$type))
sc <- scale(unique(as.numeric(factor(df$age))))
df$iage <- sc[as.numeric(factor(df$age))] * .3
# plot
gg <-
  ggplot(df, aes(x = itype + iage, y = value, color = type, group = itype + iage)) +
  geom_boxplot() +
  scale_x_continuous(labels = levels(factor(df$type)),
                     breaks = seq_along(levels(factor(df$type)))) +
  labs(x = "", y = "Diversity")
ggplotly(gg) %>%
  layout(boxgroupgap = 0, boxgap = 0)
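The offset arithmetic in that answer can be checked on a small toy example. One caveat: sc is ordered by first appearance (via unique()) but indexed by level code, which only lines up because "y" happens to come first in the data. This sketch is an assumed variation rather than the answer's exact code: it computes the offsets from the level codes 1..nlevels directly, so level k always gets the k-th offset.

```r
# Toy data shaped like the question's: 3 types x 2 ages, 2 rows each
types <- rep(paste0("t", 1:3), each = 4)
age   <- factor(rep(rep(c("y", "o"), each = 2), 3), levels = c("y", "o"))

itype <- as.numeric(factor(types))                # one integer slot per type: 1, 2, 3
sc    <- as.vector(scale(seq_len(nlevels(age))))  # centred codes: -0.707, 0.707
iage  <- sc[as.integer(age)] * 0.3                # shrink to about +/- 0.21
xpos  <- itype + iage                             # continuous x position of each box

unique(round(xpos, 3))
```

For two age levels this places each pair of boxes about 0.21 either side of the integer tick for its type; changing the 0.3 factor widens or narrows the pair.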
I am trying to split the attached grouped bar chart by the variable spec. I can think of two ways to do this: adding facet_grid(), or applying a filter to the static output. Can either be done? Any advice appreciated.
A sample is below:
period <- c('201901', '201901', '201904', '201905')
spec <- c('alpha', 'bravo','bravo', 'charlie')
c <- c(5,6,3,8)
e <- c(1,2,4,5)
df <- data.frame(period, spec, c,e)
library(tidyverse)
library(plotly)
plot_ly(df, x = ~period, y = ~c, type = 'bar', name = "C 1", marker = list(color = 'lightsteelblue3')) %>%
add_trace(y = ~e, name = "E 1", marker = list(color = 'Gray')) %>%
layout(xaxis = list(title="", tickangle = -45),
yaxis = list(title=""),
margin= list(b=100),
barmode = 'group'
)
I am not sure this is exactly what you want to achieve, but my suggestion is to create your plot with standard ggplot and then convert it with ggplotly.
For this, you also need to reshape your data into a longer format.
library(tidyverse)
library(plotly)
period <- c('201901', '201901', '201904', '201905')
spec <- c('alpha', 'bravo','bravo', 'charlie')
c <- c(5,6,3,8)
e <- c(1,2,4,5)
df <- data.frame(period, spec, c,e) %>%
pivot_longer(cols = c(c,e), names_to = 'var', values_to = 'val')
p <- ggplot(df, aes(period, val, fill = var)) +
geom_col(position = position_dodge()) +
facet_grid(~spec)
ggplotly(p)
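To make the "longer" format concrete, here is what the reshaped data looks like, built with base R only. This is a sketch: pivot_longer() produces the same rows, but interleaved per input row rather than stacked by variable, and the names wide and long are illustrative.

```r
period <- c("201901", "201901", "201904", "201905")
spec   <- c("alpha", "bravo", "bravo", "charlie")
wide   <- data.frame(period, spec, c = c(5, 6, 3, 8), e = c(1, 2, 4, 5),
                     stringsAsFactors = FALSE)

# Stack the c and e columns: one row per (period, spec, variable)
long <- data.frame(period = rep(wide$period, times = 2),
                   spec   = rep(wide$spec, times = 2),
                   var    = rep(c("c", "e"), each = nrow(wide)),
                   val    = c(wide$c, wide$e),
                   stringsAsFactors = FALSE)
long
```

Each spec value now owns its own rows, which is what lets facet_grid(~spec) give every spec its own panel.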
It's probably easier to use facets here, but a more "interactive" option would be to use a filter transform, which gives you a drop-down menu in the top-left corner of your plot.
spec.val <- unique(df$spec)
plot_ly(
df %>% pivot_longer(-c(period, spec)),
x = ~period, y = ~value, color = ~name,
type = "bar",
transforms = list(
list(
type = "filter",
target = ~spec,
operation = "=",
value = spec.val[1]))) %>%
layout(
updatemenus = list(
list(
type = "dropdown",
active = 0,
buttons = map(spec.val, ~list(
method = "restyle",
args = list("transforms[0].value", .x),
label = .x)))))
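The buttons list passed to updatemenus is just a list of lists, one per spec value. If you prefer not to rely on purrr's map(), the same structure can be built with base R's lapply(); here is a sketch with the spec values from the sample data hard-coded:

```r
spec.val <- c("alpha", "bravo", "charlie")

# One "restyle" button per spec value; clicking it sets the filter value
# of the first transform (transforms[0]) to that spec
buttons <- lapply(spec.val, function(v) list(
  method = "restyle",
  args   = list("transforms[0].value", v),
  label  = v
))
str(buttons[[1]])
```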
I am plotting a grouped boxplot with jittered points using the following function:
plot_boxplot <- function(dat) {
# taking one of each joined_group to be able to plot it
allx <- dat %>%
mutate(y = median(y, na.rm = TRUE)) %>%
group_by(joined_group) %>%
sample_n(1) %>%
ungroup()
p <- dat %>%
plotly::plot_ly() %>%
# plotting all the groups 1:20
plotly::add_trace(data = allx,
x = ~as.numeric(joined_group),
y = ~y,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
color = NULL,
opacity = 0,
showlegend = FALSE) %>%
# plotting the boxes
plotly::add_trace(data = dat,
x = ~as.numeric(joined_group),
y = ~y,
color = ~group1,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
showlegend = FALSE) %>%
# adding ticktext
layout(xaxis = list(tickvals = 1:20,
ticktext = rep(levels(dat$group1), each = 4)))
p <- p %>%
# adding jittering
add_markers(data = dat,
x = ~jitter(as.numeric(joined_group), amount = 0.2),
y = ~y,
color = ~group1,
showlegend = FALSE)
p
}
The problem is that when some levels have NA as their y values, the width of the jittered boxes changes. Here is an example:
library(plotly)
library(dplyr)
set.seed(123)
dat <- data.frame(group1 = factor(sample(letters[1:5], 100, replace = TRUE)),
group2 = factor(sample(LETTERS[21:24], 100, replace = TRUE)),
y = runif(100)) %>%
dplyr::mutate(joined_group = factor(
paste0(group1, "-", group2)
))
# do the plot with all the levels
p1 <- plot_boxplot(dat)
# now the group1 e is having NAs as y values
dat$y[dat$group1 == "e"] <- NA
# create the plot with missing data
p2 <- plot_boxplot(dat)
# creating the subplot to see that the width has changed:
subplot(p1, p2, nrows = 2)
The problem is that the box widths differ between the two plots:
I've realised that the boxes have the same size without jittering, so I know that the jittering is "messing" with the width, but I don't know how to fix that.
Does anyone know how to make the width in both jittered plots exactly the same?
I see two separate plot shifts:
due to jittering
due to NAs
The first can be solved by declaring a new jitter function with a fixed seed
fixed_jitter <- function (x, factor = 1, amount = NULL) {
set.seed(42)
jitter(x, factor, amount)
}
and using it instead of jitter in the add_markers call.
The second problem can be solved by assigning -1 instead of NA and setting
yaxis = list(range = c(0, ~max(1.1 * y)))
as a second argument to layout.
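One side effect of calling set.seed() inside fixed_jitter is that it resets the global random number stream every time the function runs. A variant that restores the caller's RNG state on exit might look like this; the seed argument and the save/restore logic are my additions, not part of the answer's function.

```r
fixed_jitter <- function(x, factor = 1, amount = NULL, seed = 42) {
  # Save the caller's RNG state (if any) so it can be restored afterwards
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  jitter(x, factor, amount)
}

x  <- rep(1:4, each = 3)
j1 <- fixed_jitter(x, amount = 0.2)
j2 <- fixed_jitter(x, amount = 0.2)
identical(j1, j2)   # TRUE: same positions on every call
```

Because the jittered positions no longer depend on how many random numbers were drawn earlier, p1 and p2 get identical point offsets.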
I'm trying to order a stacked bar chart in plotly, but it is not respecting the order I pass it in the data frame.
It is best shown using some mock data:
library(dplyr)
library(plotly)
cars <- sapply(strsplit(rownames(mtcars), split = " "), "[", i = 1)
dat <- mtcars
dat <- cbind(dat, cars, stringsAsFactors = FALSE)
dat <- dat %>%
mutate(carb = factor(carb)) %>%
distinct(cars, carb, .keep_all = TRUE) %>%
select(cars, carb, mpg) %>%
arrange(carb, desc(mpg))
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = ~carb, y = ~mpg, color = ~cars) %>%
layout(barmode = "stack")
The resulting plot doesn't respect the ordering; I want the cars with the largest mpg stacked at the bottom of each carb group. Any ideas?
As already pointed out here, the issue is caused by duplicate values in the column used for color grouping (in this example, cars). The ordering of the bars can be remedied by grouping colors by a column of unique names. However, doing so has a couple of undesired side effects:
different models from the same manufacturer would be shown in different colors (not what you are after: you want to color by manufacturer)
the legend would have more entries than you want, i.e. one per car model rather than one per manufacturer.
We can hack our way around this by a) creating the legend from a dummy trace that never gets displayed (add_trace(type = "bar", x = 0, y = 0... in the code below), and b) setting the colors for each category manually using the colors= argument. I use a rainbow palette below to show the principle; you may like to select some more attractive colors yourself.
dat$unique.car <- make.unique(as.character(dat$cars))
dat2 <- data.frame(cars=levels(as.factor(dat$cars)),color=rainbow(nlevels(as.factor(dat$cars))))
dat2[] <- lapply(dat2, as.character)
dat$color <- dat2$color[match(dat$cars,dat2$cars)]
plot_ly() %>%
add_trace(data=dat2, type = "bar", x = 0, y = 0, color = cars, colors=color, showlegend=T) %>%
add_trace(data=dat, type = "bar", x = carb, y = mpg, color = unique.car, colors=color, showlegend=F, marker=list(line=list(color="black", width=1))) %>%
layout(barmode = "stack", xaxis = list(range=c(0.4,8.5)))
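The two ingredients of that workaround, unique IDs for the color grouping and one shared color per manufacturer, can be seen on a toy vector (the manufacturer names and the name pal are illustrative):

```r
cars <- c("Merc", "Merc", "Fiat", "Toyota", "Merc")

# Unique IDs: every model gets its own color group, which is what
# restores the stacking order in the answer's approach
unique.car <- make.unique(cars)

# One color per manufacturer, looked up by name, so models share a color
makers <- sort(unique(cars))
pal    <- setNames(rainbow(length(makers)), makers)
color  <- unname(pal[cars])

data.frame(cars, unique.car, color)
```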
One way to address this is to give every car model a unique name and use that in plotly, but this makes the legend messier and affects the color mapping. Here are a few options:
dat$carsID <- make.unique(as.character(dat$cars))
# dat$carsID <- apply(dat, 1, paste0, collapse = " ") # alternative
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = ~carb, y = ~mpg, color = ~carsID) %>%
layout(barmode = "stack")
plot_ly(dat) %>%
add_trace(data = dat, type = "bar", x = ~carb, y = ~mpg, color = ~carsID,
          colors = rainbow(length(unique(dat$carsID)))) %>%
layout(barmode = "stack")
I'll look more tomorrow to see if I can improve the legend and color mapping.
I'm trying to add legends with arbitrary text to a ggvis plot that uses data from different data frames. I have tried add_legend(), but I have no idea which parameters to use. With plot() this is very simple using the legend() function, but it has been very hard to find a way to do it with ggvis().
Here is a simple example of what I have using plot():
df1 = data.frame(x = sample(1:10), y = sample(1:10))
df2 = data.frame(x = 1:10, y = 1:10)
df3 = data.frame(x = 1:10, y = sqrt(1:10))
plot(df1)
lines(df2$x, df2$y, col = "red")
lines(df3$x, df3$y, col = "green")
legend("topleft", c("Data 2","Data 3"), lty = 1, col = c("red","green"))
Now, using ggvis() I can plot the points and the lines from different datasets, but I cannot find a way to add the legends using add_legend(). Here is the code using ggvis():
df1 %>% ggvis(x=~x,y=~y) %>% layer_points() %>%
layer_paths(x=~x,y=~y,data = df2, stroke := "red") %>%
layer_paths(x=~x,y=~y,data = df3, stroke := "green")
I would really appreciate any help.
Thank you.
Edited:
Here is sample code using only one data frame and plot():
df = data.frame(x = sample(1:10), y = sample(1:10), x2 = 1:10, y2 = 1:10, y3 = sqrt(1:10) )
plot(df[,c("x","y")])
lines(df$x2, df$y2, col = "red")
lines(df$x2, df$y3, col = "green")
legend("topleft", c("Data 2","Data 3"), lty = 1, col = c("red","green"))
So, what I came up with is the following, which works:
#add an id column for df2 and df3 and then rbind
df2$id <- 1
df3$id <- 2
df4 <- rbind(df2,df3)
#turn id into a factor
df4$id <- factor(df4$id)
#then plot df4 using the stroke=~id argument
#then plot the legend
#and finally add df1 with a separate data
df4 %>% ggvis(x=~x,y=~y,stroke=~id) %>% layer_lines() %>%
add_legend('stroke', orient="left") %>%
layer_points(x=~x,y=~y,data = df1,stroke:='black')
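Since the legend labels are taken from the values of id, giving id descriptive values instead of 1 and 2 should make the legend show arbitrary text such as "Data 2" and "Data 3". This is a sketch of the data preparation only; I have not verified the rendered ggvis output, and the commented ggvis call assumes scale_nominal() with a range argument to pick the red/green stroke colors.

```r
df2 <- data.frame(x = 1:10, y = 1:10)
df3 <- data.frame(x = 1:10, y = sqrt(1:10))

# Descriptive ids become the legend labels
df2$id <- "Data 2"
df3$id <- "Data 3"
df4 <- rbind(df2, df3)
df4$id <- factor(df4$id, levels = c("Data 2", "Data 3"))

# With ggvis loaded, plot as above, e.g. (untested assumption):
# df4 %>% ggvis(x = ~x, y = ~y, stroke = ~id) %>% layer_lines() %>%
#   scale_nominal("stroke", range = c("red", "green")) %>%
#   add_legend("stroke")
table(df4$id)
```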
And it works:
If you would like to move the legend to a position inside the plot, try this:
df4 %>% ggvis(x=~x,y=~y,stroke=~id) %>% layer_lines() %>%
#make sure you add relative scales
add_relative_scales() %>%
#values for x and y need to be between 0 and 1
#e.g. for the x-axis, 0 is the far-left point and 1 the far-right point
add_legend("stroke", title = "Cylinders",
properties = legend_props(
legend = list(
x = scaled_value("x_rel", 0.1),
y = scaled_value("y_rel", 1)
))) %>%
layer_points(x=~x,y=~y,data = df1,stroke:='black')
And the output: