How to make a stacked barplot with nested grouping variables? - r

I am trying to make a stacked barplot with two variables. My desired outcome looks like this:
This is the first part of my data. There are 220 more rows:
Type Week Stage
<chr> <dbl> <dbl>
1 Captured 1 2
2 Captured 1 1
3 Captured 1 1
4 Captured 1 2
5 Captured 1 1
6 Captured 1 3
7 Captured 1 NA
8 Captured 1 3
9 Captured 1 2
10 Captured 1 1
So far I'm not getting anywhere, this is my code so far
library(data.table)
dat.m <- melt(newrstudio2, id.vars="Type")
dat.m
library(ggplot2)
ggplot(dat.m, aes(x=Type, y=value, fill=variable)) +
geom_bar(stat="identity")
I guess I need to calculate the number of observations of each stage in each week of each type? I've tried both long and wide data, but I somehow need to combine week with type? I don't know, I'm at a loss.

Alternative way:
set.seed(123)
# sample data
my_data <- data.frame(Type = sample(c("W", "C"), 220, replace = TRUE),
Week = sample(paste0("Week ", 1:4), 220, replace = TRUE),
Stage = sample(paste0('S', 1:4), 220, replace = TRUE))
head(my_data)
library(ggplot2)
ggplot(my_data, aes(x = Type, fill = Stage)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = "fill") +
facet_grid(. ~ Week, switch="both") +
scale_y_continuous(labels = scales::percent) +
ylab("Stage [%]") +
theme(strip.background = element_blank(),
strip.placement = "outside",
panel.spacing = unit(0, "lines"))

Alternatively we could use base graphics. First, what you're probably most interested in, we should reshape the data.
For this we could split the data per week and run a dcast() over it.
L <- lapply(split(d, d$week), function(x)
data.table::dcast(x, type ~ stage, value.var="stage", fun=length))
d2 <- do.call(rbind, L) # transform back into a data frame
Now – with credits to #alemol – we want the proportions.
d2[-1] <- t(apply(d2[-1], 1, prop.table))
Then we are able to plot relatively simply. Note, that barplot() additionally gives us a vector of bar coordinates which we can use later for the axis() labels.
cols <- c("#ed1c24", "#ff7f27", "#00a2e8", "#fff200") # define stage colors
par(mar=c(5, 5, 3, 5) + .1, xpd=TRUE) # set plot margins
p <- barplot(t(d2[-1]), col=cols, border="white", space=rep(c(.2, 0), 5),
font.axis=2, xaxt="n", yaxt="n", xlab="Week")
axis(1, at=p, labels=rep(c("C", "W"), 5), tick=FALSE, line=0)
axis(1, at=apply(matrix(p, , 2, byrow=TRUE), 1, mean), labels=1:5, tick=FALSE, line=1)
axis(2, at=0:10/10, labels=paste0(seq(0, 100, 10), "%"), line=0, las=2)
legend(12, .5, legend=rev(names(d2[-1])), col=rev(cols), pch=15, title="Stage")
Result:
Data:
d <- structure(list(type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L), .Label = c("C", "W"), class = "factor"), week = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), stage = c(3L,
1L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 4L, 1L, 1L, 2L, 2L, 3L, 4L, 3L,
2L, 4L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 4L, 1L, 2L, 4L, 2L, 3L, 4L,
4L, 2L, 4L, 4L, 2L, 3L, 1L, 1L, 4L, 4L, 1L, 4L, 3L, 3L, 3L, 2L,
1L, 3L, 4L, 2L, 4L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 1L, 3L, 2L,
1L, 1L, 1L, 4L, 2L, 4L, 1L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 1L, 3L, 4L, 2L, 4L, 4L, 2L, 2L, 3L, 4L, 4L, 3L, 3L, 1L, 1L,
1L, 2L, 4L, 3L, 1L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 2L, 1L,
2L, 1L, 3L, 3L, 2L, 4L, 3L, 1L, 1L, 4L, 1L, 4L, 4L, 1L, 2L, 2L,
2L, 1L, 3L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 1L, 1L, 2L, 1L, 2L, 3L,
2L, 2L, 1L, 4L, 3L, 4L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 2L,
1L, 2L, 2L, 1L, 1L, 3L, 4L, 3L, 4L, 2L, 4L, 1L, 1L, 2L, 1L, 3L,
2L, 1L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 4L, 2L, 4L, 2L,
4L, 3L, 3L, 1L, 3L, 4L, 3L, 2L, 1L, 2L, 4L, 1L, 2L, 4L, 2L, 1L,
2L, 1L, 2L, 2L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 2L,
1L, 2L, 1L, 3L, 3L, 2L, 1L, 3L, 4L, 2L, 1L, 2L, 4L, 3L, 4L, 2L,
3L, 2L, 4L, 1L, 4L, 4L, 2L, 1L, 2L)), row.names = c(NA, -250L
), class = "data.frame")

Is this what you're looking for:
set.seed(123)
# sample data
my_data <- data.frame(Type = sample(paste0('T', 1:4), 220, replace = TRUE),
Week = sample(paste0('W', 1:4), 220, replace = TRUE),
Stage = sample(paste0('S', 1:4), 220, replace = TRUE))
ggplot(my_data, aes(x=Week:Type, fill = Stage)) + geom_bar()

Related

Order function in R cannot be reversed with - or rev() when column has alpha and numeric characters

First time I ever cared to sort because I need to report this in a specific way. Typically, smallest to largest sort in excel on column quarter returns YTD, 3,2,1. etc. However, I cannot sort using Order(-begin$Quarter) or rev(begin$Quarter). Other solutions with mix sort prevent me from then also sorting other columns, like here I want to sort by quarter then segment. I have dput examples.
Thanks,
begin<- structure(list(Quarter = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), .Label = c("1", "2", "3", "YTD"), class = "factor"),
Segment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Beverages", "Food"), class = "factor"), Person = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("Chris", "Jackie",
"Josh"), class = "factor"), Sales = c(4, 4, 3, 2, 3, 3, 7,
7, 1, 1, 2, 3, 7, 7, 8, 5, 7, 8, 9, 6, 6, 7, 5, 6)), class = "data.frame", row.names = c(NA,
-24L))
end<- structure(list(Quarter = structure(c(4L, 4L, 4L, 4L, 4L, 4L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "YTD"), class = "factor"),
Segment = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
), .Label = c("Beverages", "Food"), class = "factor"), Person = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), .Label = c("Chris", "Jackie",
"Josh"), class = "factor"), Sales = c(3, 6, 2, 5, 1, 7, 1,
6, 7, 6, 7, 9, 3, 8, 3, 7, 2, 5, 3, 8, 4, 7, 4, 7)), class = "data.frame", row.names = c(NA,
-24L))
library(dplyr)
begin %>%
arrange(desc(Quarter), Person, Segment)

Plot in 3D clusters using plotly package

I need to present 3 clusters in 3D using the plotly package in R. The clusters are generated using the k-means function included in R. I searched but I find only using ggplot package.
How can I do this, please?
This is a part of my data set to give reproducible example.
> dput(DATAFINALE[1:50,])
structure(list(YEAR_SALES = c(2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), CREATION_YEAR_SALES = c(2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
1L), TYPE_PEAU = c(2L, 3L, 4L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 4L, 4L, 1L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L,
3L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 4L, 2L, 2L, 3L,
3L, 2L, 2L, 2L, 2L, 2L, 4L), SENSIBILITE = c(3L, 3L, 3L, 2L,
1L, 3L, 3L, 2L, 2L, 2L, 3L, 1L, 3L, 1L, 2L, 3L, 3L, 2L, 3L, 3L,
3L, 3L, 3L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 1L, 3L,
3L, 3L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 1L, 2L, 3L), IMPERFECTIONS = c(2L,
3L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 2L,
1L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 1L,
2L), BRILLANCE = c(3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L,
3L, 1L, 3L, 3L, 1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L,
1L, 3L, 3L, 3L, 3L, 3L, 3L), GRAIN_PEAU = c(3L, 3L, 3L, 3L, 1L,
3L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L,
1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L), RIDES_VISAGE = c(1L,
1L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 3L,
1L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 1L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 1L,
1L), ALLERGIES = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), MAINS = c(2L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 1L, 3L, 3L, 2L, 3L, 3L, 2L, 2L,
2L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L,
2L, 3L, 1L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 3L), PEAU_CORPS = c(1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 3L, 2L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L), INTERET_ALIM_NATURELLE = c(1L, 3L, 3L, 1L, 3L, 1L, 1L, 1L,
3L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L,
3L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), INTERET_ORIGINE_GEO = c(1L,
2L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 1L,
1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L,
1L), INTERET_VACANCES = c(2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
3L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), INTERET_ENVIRONNEMENT = c(1L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), INTERET_COMPOSITION = c(1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 1L,
3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), MONTH_SALES = c(9, 9,
2, 9, 3, 3, 11, 12, 3, 6, 3, 3, 8, 9, 5, 1, 10, 5, 4, 9, 2, 3,
4, 5, 6, 7, 7, 9, 7, 7, 11, 6, 4, 4, 4, 8, 9, 8, 9, 12, 4, 4,
3, 11, 5, 12, 11, 2, 6, 3), DAY_SALES = c(13, 3, 10, 23, 12,
10, 26, 4, 18, 9, 9, 9, 4, 10, 17, 28, 22, 4, 14, 22, 2, 10,
1, 20, 7, 12, 1, 3, 13, 3, 9, 5, 13, 27, 1, 28, 18, 10, 3, 2,
15, 6, 25, 4, 8, 23, 16, 19, 21, 14), HOURS_INS = c(17, 14, 18,
16, 23, 18, 16, 12, 17, 16, 21, 18, 22, 14, 10, 15, 13, 13, 21,
16, 23, 22, 17, 12, 15, 23, 17, 14, 8, 10, 12, 14, 13, 10, 17,
3, 19, 22, 17, 18, 23, 18, 8, 16, 12, 19, 21, 14, 11, 22), CREATION_MONTH_SALES = c(9,
9, 2, 10, 12, 3, 11, 2, 3, 6, 10, 3, 3, 9, 7, 11, 11, 5, 4, 9,
2, 3, 4, 8, 6, 7, 10, 5, 7, 8, 11, 6, 4, 4, 11, 8, 9, 8, 12,
12, 4, 8, 2, 11, 11, 1, 11, 10, 8, 3), CREATION_DAY_SALES = c(13,
11, 15, 31, 5, 10, 27, 7, 18, 9, 8, 18, 6, 26, 4, 24, 16, 12,
15, 22, 10, 10, 25, 5, 28, 20, 10, 18, 14, 31, 9, 5, 22, 27,
6, 29, 18, 11, 6, 2, 16, 17, 1, 4, 23, 23, 16, 1, 25, 16), VALIDATION_YEAR_SALES = c(2,
1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2,
1, 1, 2, 1, 2, 1, 1), VALIDATION_MONTH_SALES = c(9, 9, 2, 11,
12, 3, 12, 2, 3, 6, 10, 3, 3, 10, 7, 11, 11, 5, 4, 9, 2, 3, 4,
8, 6, 7, 10, 5, 7, 9, 11, 6, 4, 4, 11, 8, 9, 8, 12, 12, 4, 8,
2, 11, 11, 1, 11, 10, 8, 3), VALIDATION_DAY_SALES = c(15, 14,
16, 3, 6, 19, 1, 8, 21, 10, 9, 21, 7, 1, 6, 25, 17, 13, 20, 29,
11, 20, 29, 6, 30, 22, 12, 20, 16, 1, 10, 7, 25, 28, 14, 30,
19, 13, 8, 4, 28, 24, 2, 7, 25, 25, 19, 3, 27, 21), AGE_CUSTUMER = c(32,
37, 24, 32, 44, 33, 29, 30, 56, 48, 44, 43, 37, 43, 35, 62, 60,
33, 51, 32, 35, 33, 28, 24, 32, 38, 33, 36, 54, 45, 39, 41, 55,
34, 54, 51, 45, 57, 24, 47, 35, 51, 45, 39, 31, 40, 42, 42, 39,
58), MEAN_Sales = c(0, 71.75, 50.7142857142857, 0, 0.666666666666667,
83.3333333333333, 0.333333333333333, 25.7777777777778, 23.3846153846154,
35.5294117647059, 21.6363636363636, 46.8461538461538, 18.4, 15.0666666666667,
110.25, 8.85714285714286, 0, 21.5, 18.5714285714286, 28.125,
101.333333333333, 69.1428571428571, 48.25, 20.5833333333333,
12, 20.3333333333333, 23, 15.1428571428571, 12.3913043478261,
30.3076923076923, 24.625, 23.375, 20.0833333333333, 32.75, 0,
1.5, 0, 50.6, 32.3846153846154, 33, 28.6818181818182, 19.8076923076923,
25.6666666666667, 9.83333333333333, 33, 55.3333333333333, 42.7,
0, 31.375, 11.625), NBR_GIFTS = c(1, 1, 1, 1, 1, 1, 1, 1, 4,
3, 4, 2, 1, 4, 1, 1, 1, 1, 3, 2, 1, 2, 2, 1, 3, 5, 4, 1, 9, 2,
5, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 3, 2, 1, 1, 4, 4),
OUTCOME = c(3, 4, 7, 3, 3, 6, 3, 9, 26, 17, 22, 13, 10, 30,
4, 7, 7, 6, 14, 16, 3, 7, 12, 12, 15, 24, 21, 7, 46, 13,
16, 8, 12, 8, 3, 8, 3, 10, 13, 13, 22, 26, 12, 6, 13, 6,
10, 4, 16, 24)), .Names = c("YEAR_SALES", "CREATION_YEAR_SALES",
"TYPE_PEAU", "SENSIBILITE", "IMPERFECTIONS", "BRILLANCE", "GRAIN_PEAU",
"RIDES_VISAGE", "ALLERGIES", "MAINS", "PEAU_CORPS", "INTERET_ALIM_NATURELLE",
"INTERET_ORIGINE_GEO", "INTERET_VACANCES", "INTERET_ENVIRONNEMENT",
"INTERET_COMPOSITION", "MONTH_SALES", "DAY_SALES", "HOURS_INS",
"CREATION_MONTH_SALES", "CREATION_DAY_SALES", "VALIDATION_YEAR_SALES",
"VALIDATION_MONTH_SALES", "VALIDATION_DAY_SALES", "AGE_CUSTUMER",
"MEAN_Sales", "NBR_GIFTS", "OUTCOME"), row.names = c(1L, 2L,
3L, 5L, 9L, 13L, 14L, 16L, 18L, 19L, 20L, 24L, 27L, 29L, 30L,
32L, 33L, 35L, 36L, 37L, 39L, 44L, 49L, 51L, 52L, 53L, 55L, 56L,
61L, 62L, 63L, 65L, 66L, 67L, 71L, 74L, 75L, 80L, 81L, 84L, 86L,
90L, 92L, 95L, 96L, 99L, 100L, 103L, 104L, 107L), class = "data.frame")
My model of clustering is given by this code:
Model<-kmeans(DATAFINALE,centers = 3,nstart=20)
Then I need to get a plot as given in this link https://plot.ly/r/3d-scatter-plots/ having as title Basic 3D Scatter Plot.
Thank you in advance
First of all, you have to add the cluster vector to the dataset.
# convert them as factor to plot them right
DATAFINALE$cluster <- as.factor(Model$cluster)
Then you have to decide which variables plot as x,y,and z (I've taken randomly three):
x <-'MONTH_SALES'
y <-'DAY_SALES'
z <- 'HOURS_INS'
Lastly you can plot it, using the cluster as colors:
library(plotly)
p <- plot_ly(DATAFINALE, x = ~MONTH_SALES, y = ~ DAY_SALES, z = ~HOURS_INS, color = ~cluster) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = x),
yaxis = list(title = y),
zaxis = list(title = z)))
p
Here the result:

Removing specific strips in a double-strip plot

I'm trying to remove the redundant "pro/retro" labels on the second row of panels on my plot. However, I still want to keep the top row of panel labels intact. I've tried for the past hour to selectively remove the 1st strip on the 2nd panel row and I was wondering if anyone here knows how to do this. See below for technical details.
I have the following plot:
It was generated from the following data:
absBtwnDat <- structure(list(setSize = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L), .Label = c("2", "3", "4", "5", "6", "7", "8"), class = "factor"),
Measure = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Actual", "Predicted"), class = "factor"),
Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("fix", "forced"), class = "factor"),
JudgementType = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("pro", "retro"), class = "factor"),
Accuracy = c(1.91388888888889, 2.95555555555556, 3.74861111111111,
4.37777777777778, 4.21527777777778, 3.0875, 2.85277777777778,
2, 2.99444444444444, 4, 4.77222222222222, 5.24444444444444,
5.18472222222222, 5.20277777777778, 1.98888888888889, 3,
3.97222222222222, 4.85972222222222, 5.70555555555556, 6.56944444444444,
7.27222222222222, 2, 3, 3.99444444444444, 4.99444444444444,
5.86944444444444, 6.75555555555556, 7.57777777777778, 1.96111111111111,
2.97777777777778, 3.78333333333333, 3.97222222222222, 4.22361111111111,
3.64722222222222, 3.68888888888889, 2, 3, 3.97222222222222,
4.67777777777778, 5.26944444444444, 5.4625, 5.8, 2, 3, 3.98333333333333,
4.87777777777778, 5.73055555555556, 6.48333333333333, 7.62916666666667,
2, 3, 3.98333333333333, 4.96666666666667, 5.96944444444444,
6.94444444444444, 7.93333333333333), LL = c(1.85, 2.87777777777778,
3.59861111111111, 4.15555555555556, 3.78888888888889, 2.73055555555556,
2.55555555555556, 2, 2.96111111111111, 4, 4.64444444444444,
5.01666666666667, 4.88333333333333, 4.88611111111111, 1.91111111111111,
3, 3.89444444444444, 4.73611111111111, 5.47777777777778,
6.20277777777778, 6.71666666666667, 2, 3, 3.96666666666667,
4.95555555555556, 5.65096686319131, 6.48333333333333, 7.17222222222222,
1.86637442123568, 2.92222222222222, 3.65, 3.61666666666667,
3.88333333333333, 3.17092476055122, 3.18888888888889, 2,
3, 3.92222222222222, 4.49444444444444, 5.0375, 5.09444444444444,
5.40555555555556, 2, 3, 3.92777777777778, 4.72222222222222,
5.52777777777778, 6.24444444444444, 7.37361111111111, 2,
3, 3.95, 4.88888888888889, 5.93333333333333, 6.88333333333333,
7.73065763697428), UL = c(1.95555555555556, 2.98333333333333,
3.84444444444444, 4.56666666666667, 4.6, 3.43611111111111,
3.17916666666667, 2, 3, 4, 4.86111111111111, 5.42777777777778,
5.48656054159421, 5.58611111111111, 2, 3, 4, 4.93888888888889,
5.83888888888889, 6.76944444444444, 7.6, 2, 3, 4, 5, 5.94166666666667,
6.88888888888889, 7.78888888888889, 1.98888888888889, 2.99444444444444,
3.87777777777778, 4.22777777777778, 4.53611111111111, 4.19722222222222,
4.20555555555556, 2, 3, 3.98888888888889, 4.78333333333333,
5.45555555555556, 5.79583333333333, 6.16666666666667, 2,
3, 3.99444444444444, 4.95, 5.85972222222222, 6.67222222222222,
7.80138888888889, 2, 3, 3.99444444444444, 4.98888888888889,
5.9875, 6.97222222222222, 7.98333333333333)), .Names = c("setSize",
"Measure", "Location", "JudgementType", "Accuracy", "LL", "UL"
), row.names = c(NA, -56L), class = "data.frame")
I visualized it using using the following code:
library(ggplot2)
p1 <- ggplot(data = absBtwnDat, aes(x = as.numeric(as.character(setSize)),
y = Accuracy, group = Measure,
colour = Measure))+
geom_point()+
geom_line(aes(linetype = Measure))+
scale_x_continuous("Trial Set Size", breaks = 2:8)+
scale_y_continuous("Accuracy (# Correct)", breaks = 0:8, limits = c(0, 8))+
geom_errorbar(aes(ymin = LL, ymax = UL), width = .1, size = .75)+
scale_colour_grey(start = .8, end = .4)+
facet_wrap(~JudgementType+Location, dir = "v")+
theme(legend.position = "top")
Just to be certain, I've highlighted unwanted strip in the following image:
With this you'll only have one row of labels per panel, but they still include both words.
p1 <- ggplot(data = absBtwnDat,
aes(x = as.numeric(as.character(setSize)), y = Accuracy,
group = Measure,
colour = Measure))+
geom_point()+
geom_line(aes(linetype = Measure))+
scale_x_continuous("Trial Set Size", breaks = 2:8)+
scale_y_continuous("Accuracy (# Correct)",
breaks = 0:8, limits = c(0, 8))+
geom_errorbar(aes(ymin = LL, ymax = UL),
width = .1, size = .75)+
scale_colour_grey(start = .8, end = .4)+
facet_wrap(~JudgementType + Location,
dir = "v",
labeller = label_wrap_gen(multi_line=FALSE)) +
theme(legend.position = "top")
p1
Here is a possible solution:
g1 <- ggplotGrob(p1)
k <- which(g1$layout$name=="strip-t-1-2")
g1$grobs[[k]]$grobs[[1]]$children[[2]]$children[[1]]$label <- ""
g1$grobs[[k]]$grobs[[1]]$children[[1]]$gp$fill <- NA
k <- which(g1$layout$name=="strip-t-2-2")
g1$grobs[[k]]$grobs[[1]]$children[[2]]$children[[1]]$label <- ""
g1$grobs[[k]]$grobs[[1]]$children[[1]]$gp$fill <- NA
library(grid)
grid.draw(g1)

Gantt chart simulation using ggplot

Is there a way to make the thinner lines in the plot (those without an y axis tick label) appear closer to the lines above (those with a label) so as to better simulate pairs of baseline / actual bars of the same activity in a gantt chart?
See gantt chart examples here and here.
mdfr <- structure(list(name = structure(c(8L, 8L, 8L, 8L, 6L, 6L, 6L,
6L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 7L, 5L, 5L, 5L,
5L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L, 6L, 6L, 6L,
6L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 7L, 5L, 5L, 5L,
5L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L), .Label = c("100 A", "100 B",
"101 A", "101 B", "102 A", "102 B", "103 A", "103 B"), class = "factor"),
stadio = c(2, 4, 5, 7, 2, 4, 5, 7, 2, 4, 5, 7, 2, 4, 5, 7,
1, 3, 6, 8, 1, 3, 6, 8, 1, 3, 6, 8, 1, 3, 6, 8, 2, 4, 5,
7, 2, 4, 5, 7, 2, 4, 5, 7, 2, 4, 5, 7, 1, 3, 6, 8, 1, 3,
6, 8, 1, 3, 6, 8, 1, 3, 6, 8), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("start_date", "end_date"), class = "factor"),
value = c("05/10/2012", "17/12/2012", "12/03/2012", "30/05/2013",
"10/01/2013", "14/10/2013", "24/10/2013", "10/01/2014", "30/09/2013",
"29/01/2014", "30/01/2014", "06/05/2014", "30/09/2013", "29/01/2014",
"30/01/2014", "06/05/2014", "05/10/2012", "17/12/2012", "12/03/2012",
"30/05/2013", "10/01/2013", "14/10/2013", "24/10/2013", "10/01/2014",
"30/09/2013", "29/01/2014", "30/01/2014", "05/06/2014", "30/09/2013",
"29/01/2014", "30/01/2014", "05/06/2014", "17/12/2012", "12/03/2012",
"30/05/2013", "30/05/2014", "14/10/2013", "24/10/2013", "10/01/2014",
"11/07/2014", "29/01/2014", "30/01/2014", "06/05/2014", "23/12/2014",
"29/01/2014", "30/01/2014", "06/05/2014", "23/12/2014", "17/12/2012",
"12/03/2012", "30/05/2013", "30/05/2014", "14/10/2013", "24/10/2013",
"10/01/2014", "11/07/2014", "29/01/2014", "30/01/2014", "05/06/2014",
"28/12/2014", "29/01/2014", "30/01/2014", "05/06/2014", "29/12/2014"
), rating = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("3",
"5"), class = "factor")), row.names = c(NA, -64L), .Names = c("name",
"stage", "variable", "value", "rating"), class = "data.frame")
names <- as.character(unique(mdfr$name))
names1 <- names[gsub("[^ B]","",names) == " B"]
names1 <- paste("No.",gsub("[ B]","",names1),sep="")
names2 <- rep("",length(names1))
new.names <- c(names1,names2)
ggplot(mdfr, aes(as.POSIXct(as.Date(value, "%d/%m/%Y")), name, colour = factor(stage))) +
geom_line(aes(size=rating)) +
labs(colour="(Baseline/Actual):", x = "", y = "") +
scale_colour_brewer(palette="RdYlGn",breaks = c("1", "3", "6","8"), guide = "none") +
scale_size_manual(breaks = levels(mdfr$rating), values = as.integer(levels(mdfr$rating)), guide = "none") +
scale_y_discrete(breaks=names, labels=new.names)
I would use facets to do this. Below you find a possible solution. This may not be the most elegant solution, but it lets you change the distance between thinner and thicker lines by changing the expand argument in scale_x_discrete.
# numbers to facet by (levels used for order of the facets)
mdfr$nr <- factor(paste0("No.", as.numeric(gsub("A|B", "", mdfr$name))),
levels=unique(paste0("No.", as.numeric(gsub("A|B", "", mdfr$name)))))
# recast your data
df <- dcast(mdfr, nr+stage+rating~variable)
# plot as before, switched x and y values
ggplot(df, aes(x=factor(rating),
ymin=as.POSIXct(as.Date(start_date, "%d/%m/%Y")),
ymax=as.POSIXct(as.Date(end_date, "%d/%m/%Y")),
color=factor(stage),
size=rating
)) +
geom_linerange() + # linerange instead of line
facet_grid(nr~., scales="free_x") + # faceting
coord_flip() + # flip coordinates back
scale_x_discrete(name="", breaks=NULL, expand=c(4,1)) + # use the expand variable to change the distances
scale_colour_brewer(palette="RdYlGn",breaks = c("1", "3", "6","8"), guide = "none") +
scale_size_manual(breaks = levels(mdfr$rating), values = as.integer(levels(mdfr$rating)), guide = "none")

R scatterplot3d: a custom axis step and ticks

Greeting to all.
I am striving with a scatterplot3d plot -- a graphical representation of a data.frame of three variables where one of them is a response variable, where I have a wrong representation of the axis steps. Here is the code ("temp" is a data.frame):
library(scatterplot3d)
temp[,1] <- as.numeric(levels(temp[,1]))[temp[,1]]
for (m in temp[,2]) m <- as.factor(as.numeric(m))
for (m in temp[,3]) m <- as.factor(as.numeric(m))
colnames(temp) = c("Values", "Factors", "AntiFactors") # "Values" is that responce variable
xtickmarks<-c("AntiFactor1","AntiFactor1", "AntiFactor3")
ytickmarks<-c("Factor1","Factor2")
plot3d <- scatterplot3d(temp[,3], temp[,2], temp[,1], color = "blue",
pch = 19, type = "h", box = T, xaxt = "n",
x.ticklabs=xtickmarks, y.ticklabs=ytickmarks,
zlab = "Time, min.")
dput(temp)
structure(list(Values = c(395, 310, 235, 290, 240, 490, 270,
225, 430, 385, 170, 55, 295, 320, 270, 130, 300, 285, 130, 200,
225, 90, 205, 340, 3, 8, 1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3,
200, 5, 5, 10, 5, 5, 5, 10, 10, 130, 5, 200, 80, 10, 360, 10,
5, 8, 30, 8, 10, 10, 10, 5, 240, 120, 3, 10, 25, 5, 5, 10, 190,
30, 115, 1, 1, 1, 2, 3, 5, 2, 5, 3, 3, 3, 2, 3, 2, 3, 0, 0, 195,
150, 2, 2, 0, 2, 1, 1, 2, 1, 2, 1, 1, 1, 3, 2, 2, 1, 2, 2, 1,
1, 2, 3, 2, 2, 1, 3, 1, 1), Factors = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Factor1", "Factor2"), class = "factor"),
AntiFactors = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("AntiFactor1", "AntiFactor2",
"AntiFactor3"), class = "factor")), .Names = c("Values",
"Factors", "AntiFactors"), row.names = c(NA, -120L), class = "data.frame")
Here is the picture of that plot I got:
The trouble is what I got twice more ticks at the x and y axis than it is needed. It is intended to have just one set of those Factor1..2 and AntiFactor1..3 ticks at each of those x, y axis. If I run that scatterplot3d without using x.ticklabs option, it gives "0, 0.5, 1, 1.5, 2.0, ...3.0" ticks etc at the axis. What is the way to set my step in x, y axis to be just a strong integer "1", so that all my discrete ticks to be displayed in their right place?
It seems that scatterplot3d coerces your discrete explanatory variables 'Factor' and 'AntiFactor' from factor to numeric. See e.g.:
levels(df$Factors)
# [1] "Factor1" "Factor2"
unique(as.numeric(df$Factors))
# [1] 1 2
levels(df$AntiFactors)
# [1] "AntiFactor1" "AntiFactor2" "AntiFactor3"
unique(as.numeric(df$AntiFactors))
# [1] 1 2 3
The labels you have created are recycled to get a label at each (default) tick mark. Also note your typo in 'xtickmarks' - I assume the second 'AntiFactor1' should be 'AntiFactor2'.
You may consider alternative ways to visualize your data, e.g. something like this:
library(ggplot2)
ggplot(data = temp, aes(x = AntiFactors, y = Values, fill = Factors)) +
geom_boxplot()

Resources