Splitting a plotly violin plot by more than two groups

My question is about expanding R plotly's grouped violin plot to a case with more than two groups.
Taking the data that are used in the grouped violin plot example code and adding a third level to df$sex:
library(dplyr)
set.seed(1)
df <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
df <- df %>%
  rbind(df[sample(nrow(df), 100, replace = FALSE), ] %>%
          dplyr::mutate(sex = "undefined",
                        day = sample(df$day, 100, replace = FALSE)))
df$sex <- factor(df$sex)
Trying to plot this with:
plotly::plot_ly(x = df$day, y = df$total_bill, type = 'violin', split = df$sex, color = df$sex)
I get the violins of each of the sexes centered rather than split:
And this remains the case if I switch split = df$sex to name = df$sex.
But if I change type = 'violin' to type = 'bar' I do get df$sex split:
Any idea how to get this to work for the type = 'violin' case?
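One thing that may be worth trying: plotly's layout has a violinmode attribute, and setting it to "group" asks plotly to draw violins that share an x position next to one another instead of on top of each other. The sketch below uses simulated data (the column names mirror the question's day, total_bill and sex, but the values are made up) so that it runs without downloading anything. It has not been checked against every plotly version.

```r
library(plotly)

set.seed(1)
# Simulated stand-in for the question's data: three levels of `sex`
df_sim <- data.frame(
  day        = sample(c("Thur", "Fri", "Sat", "Sun"), 300, replace = TRUE),
  total_bill = rgamma(300, shape = 4, scale = 5),
  sex        = factor(rep(c("Female", "Male", "undefined"), each = 100))
)

p <- plot_ly(df_sim, x = ~day, y = ~total_bill,
             split = ~sex, color = ~sex, type = "violin") %>%
  layout(violinmode = "group")  # place the per-sex violins side by side

p
```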

Related

Converting basic r barplot to ggplot

I've currently got a barplot that has a few basic parameters. However, I'm looking to try and convert this into ggplot. The extra parameters don't matter too much; the main problem that I'm having is that I'm trying to plot the sum of various columns, but I'm unable to transpose it correctly as t(data) doesn't seem to work. Here's what I've got so far:
## Subset of indicators
indicators <- clean_data[c(8, 12, 14:23)]
## Get sum of columns
indicator_sums <- colSums(indicators, na.rm = TRUE)
### Transpose for ggplot
# (empty)
## Make bar plot
barplot(indicator_sums, ylim=range(pretty(c(0, indicator_sums))), cex.axis=0.75,cex.lab=0.8, cex.names=0.7, col='magenta', las=2, ylab = 'Offences Recorded Using Indicator')

You may try
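library(dplyr)
library(tibble)    # rownames_to_column()
library(reshape2)  # melt()
library(ggplot2)
dummy <- data.frame(
  A = c(1:20),
  B = rnorm(20, 10, 4),
  C = runif(20, 19, 30),
  D = sample(c(10:40), 20, replace = TRUE)
)
# Base graphics version
barplot(colSums(dummy))
# ggplot2 version: reshape the column sums to one row per column first
dummy %>%
  colSums %>%
  melt %>%
  rownames_to_column %>%
  ggplot(aes(x = rowname, y = value)) +
  geom_col()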
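
As a sketch of the same idea without reshape2: the named vector returned by colSums() can be turned into a two-column data frame directly, which is the long shape ggplot expects. The dummy data and the column names indicator and total are illustrative only and do not come from the question.

```r
library(ggplot2)

set.seed(42)
dummy <- data.frame(
  A = 1:20,
  B = rnorm(20, 10, 4),
  C = runif(20, 19, 30)
)

sums <- colSums(dummy, na.rm = TRUE)

# One row per original column: its name and its sum
sums_df <- data.frame(indicator = names(sums), total = unname(sums))

p <- ggplot(sums_df, aes(x = indicator, y = total)) +
  geom_col(fill = "magenta") +
  labs(x = NULL, y = "Sum of column")
p
```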

How to specify groups with colors in qqplot()?

I have created a qqplot (with quantiles of a beta distribution) from a dataset that includes two groups. To visualize which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
the result is shown here:
I have seen solutions that specify a "col" vector for qqnorm; however, this does not seem to work with qqplot: half of the points are simply colored in each color, regardless of group. Is there a way to fix this?

I simulated some data just to show how to add color in ggplot
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
  tibble(
    simulated_data = sort(x),
    quantile_data = qbeta(ppoints(length(x)), res$param[1], res$param[2])
  ) %>%
  # Creating a group variable using quartiles
  mutate(group = cut(x = simulated_data,
                     quantile(simulated_data, seq(0, 1, .25)),
                     include.lowest = T))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()

For those wondering how to work with pre-defined groups, this is the code that worked for me:
library(tidyverse)
library(Rfast)
# x: numeric vector of data values; g: group label for each element of x
res <- beta.mle(x)
# make sure groups are not numerical
# (else the color scale might turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
  tibble(
    my_data = sort(x),
    quantile_data = qbeta(ppoints(length(x)), res$param[1], res$param[2]),
    group = g[order(x)]
  )
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
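
The reason col = data$group misbehaves in qqplot() is that qqplot() sorts both inputs before plotting, so the i-th plotted point is no longer the i-th row of the data, while the color vector is still in the original row order. A base R sketch of the fix is to sort the values yourself and reorder the groups with the same permutation. The data and the fixed shape parameters below are simulated assumptions; with real data the shapes would come from a fit such as beta.mle().

```r
set.seed(1)
# Simulated stand-in: two groups drawn from different beta distributions
values <- c(rbeta(50, 2, 5), rbeta(50, 5, 2))
group  <- factor(rep(c("Group1", "Group2"), each = 50))

# Assumed shape parameters (placeholders for fitted values)
shape1 <- 1.5
shape2 <- 1.5

ord <- order(values)            # permutation that sorts the data
qq <- data.frame(
  theoretical = qbeta(ppoints(length(values)), shape1, shape2),
  observed    = values[ord],
  group       = group[ord]      # groups follow their own observations
)

plot(qq$theoretical, qq$observed, col = qq$group, pch = 19,
     xlab = "Quantiles of Beta Distribution", ylab = "Quantiles of data")
legend("topleft", legend = levels(qq$group),
       col = seq_along(levels(qq$group)), pch = 19)
```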

Legend by colors rather than by colors*groups in an R plotly violin plot

I have count data from two groups (G1 and G2), where the first group has counts from two types (T1 and T2) and the second group has counts from three types (T1, T2, and T3):
set.seed(1)
df <- data.frame(count = rpois(100, 100),
                 type = c(rep("T1", 20), rep("T2", 20), rep("T1", 20), rep("T2", 20), rep("T3", 20)),
                 group = c(rep("G1", 40), rep("G2", 60)),
                 stringsAsFactors = F)
I want to draw these data in a violin plot using R's plotly, where the colors are by type and the x-axis location is by group. Here's what I'm currently doing:
df$type <- factor(df$type)
library(plotly)
plot_ly(x = df$group, y = df$count, split = df$group, type = 'violin', box = list(visible = F), points = F,showlegend = T, color = df$type)
Which gives:
As you can see the legend shows the expansion of type * group, but I want it to only show type (i.e., follow the color argument in the plot_ly function).
Any idea how to do this?

Force plotly violin plot not to display a violin on zero values

I have measurements from several groups which I would like to plot as violin plots:
set.seed(1)
df <- data.frame(val = c(runif(100,1,5),runif(100,1,5),rep(0,100)),
group = c(rep("A",100),rep("B",100),rep("C",100)))
Using R's ggplot2:
library(ggplot2)
ggplot(data = df, aes(x = group, y = val, color = group)) + geom_violin()
I get:
But when I try to get the equivalent with R's plotly using:
library(plotly)
plot_ly(x = df$group, y = df$val, split = df$group, type = 'violin', box = list(visible = F), points = F, showlegend = T, color = df$group)
I get:
Where group "C" gets an inflated/artificial violin.
Any idea how to deal with this and not by using ggplotly?

I did not find a way to fix the behaviour of plotly (probably worth making a bug report for this). A workaround would be to filter your data to only draw violin plots on groups whose range is greater than zero. If you also need to show where the other groups are, you can use a boxplot for these.
To demonstrate, I use library(data.table) for the filtering stage. You could use dplyr or base versions of the same procedure if you prefer:
setDT(df)[, toplot := diff(range(val)) > 0, group]
Now we can plot the groups using different trace styles depending on whether they should have violins or not:
plot_ly() %>%
  add_trace(data = df[(toplot)], x = ~group, y = ~val, split = ~group,
            type = 'violin', box = list(visible = F), points = F) %>%
  add_boxplot(data = df[(!toplot)], x = ~group, y = ~val, split = ~group)
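
For the base R version of the filtering step mentioned above, one possible sketch uses ave() to compute the per-group flag. The data are regenerated exactly as in the question, and toplot is the same column name the data.table code uses; violin_df and box_df are names made up for this example.

```r
set.seed(1)
df <- data.frame(val   = c(runif(100, 1, 5), runif(100, 1, 5), rep(0, 100)),
                 group = c(rep("A", 100), rep("B", 100), rep("C", 100)))

# TRUE for every row whose group has a non-zero range of values.
# ave() returns numeric 0/1 here, hence the as.logical().
df$toplot <- as.logical(
  ave(df$val, df$group, FUN = function(v) diff(range(v)) > 0)
)

violin_df <- df[df$toplot, ]    # groups that get a violin
box_df    <- df[!df$toplot, ]   # constant groups, drawn as boxplots instead
```

The two subsets can then be passed to add_trace() and add_boxplot() in place of df[(toplot)] and df[(!toplot)].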

Plotting multiple box plots as a single graph in R

I am trying to plot multiple box plots in a single graph. These are the data on which I ran a Wilcoxon test. The plot should look like this:
I have four/five questions and I want to plot the respondent score for two sets as a box plot. This should be done for all questions (Two groups for each question).
I am thinking of using ggplot2. My data is like
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to arrange these box plots as needed, for example in one row or two.

This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = T),
  variable = sample(paste0("F", 5:11), 1000, replace = T))
# ggplot
library(tidyverse)
df %>%
  mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4));
df %>%
  gather(key, value, 1:4) %>%
  mutate(
    variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
Update 2
One way of visualising discrete data would be in a mosaicplot.
mosaicplot(table(df2));
The plot shows the count of value (as filled rectangles) per Variable per Label. See ?mosaicplot for details.
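
Note that df2 is not defined in the code above. A minimal sketch, assuming it is meant to be the long-format version of the survey data with one column each for variable, Label and value (that layout is an assumption), could be built as follows. Only the first ten values of each vector from the question are used, for brevity; wide, long and tab are names made up for this example.

```r
# First ten responses of each vector from the question
wide <- data.frame(
  q1o = c(4, 4, 5, 4, 4, 4, 4, 5, 4, 5),
  q1s = c(5, 4, 4, 5, 5, 5, 5, 5, 4, 5),
  q2o = c(3, 3, 3, 4, 3, 4, 4, 3, 3, 3),
  q2s = c(5, 4, 4, 5, 5, 5, 5, 5, 4, 5)
)

long <- stack(wide)               # columns: values, ind (original column name)
key  <- as.character(long$ind)

# Assumed layout of df2: question, group label, score
df2 <- data.frame(
  variable = ifelse(grepl("q1", key), "F1", "F2"),
  Label    = ifelse(grepl("o$", key), "Bad", "Good"),
  value    = long$values
)

tab <- table(df2)                 # 3-way contingency table of counts
mosaicplot(tab, main = "Scores by question and group")
```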
