lmerTest::lmer - model is not identifiable - r

I would like to know whether a plant development score depends on a plant treatment. So I have the following experimental setup:
Treatment: "Control" or "Treated"
Plantpart: the part of the plant that was followed. Either "Root", "Stem" or "Leaf".
Score: the development score of the plant part. Explained variable, numeric (continuous).
I also have two factors treated as random effects:
Block: 4 blocks (places where plants were grown)
Biological_Replicate: each plant was used to gather the 3 plant parts (root, stem, leaf). Thus the scores of the plant parts of a given plant are not independent. There are 3 biological replicates per Block for treated and control plants.
I defined the variables then implemented the model:
library(lmerTest)
Score=Data$Score
Treatment=Data$Treatment
Biological_Replicate=as.factor(Data$Biological_Replicate)
Block=as.factor(Data$Block)
model<-lmer(Score~Treatment + (1|Biological_Replicate) + (1|Block), REML=FALSE)
Trying to retrieve the approximated p-value with coef(summary(model))
yielded the error:
Model is not identifiable...
summary from lme4 is returned
some computational error has occurred in lmerTest
The full data is below. The question is: what is wrong with the code, and/or the data?
Data<-structure(list(Treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Control", "Treated"), class = "factor"),
Plantpart = structure(c(1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L,
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L,
3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L,
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
2L, 2L, 2L), .Label = c("Leaf", "Root", "Stem"), class = "factor"),
Block = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4), Biological_Replicate = c(1,
2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 4, 5, 6, 4, 5, 6, 7, 8,
9, 7, 8, 9, 7, 8, 9, 10, 11, 12, 10, 11, 12, 10, 11, 12,
13, 14, 15, 13, 14, 15, 13, 14, 15, 16, 17, 18, 16, 17, 18,
16, 17, 18, 19, 20, 21, 19, 20, 21, 19, 20, 21, 22, 23, 24,
22, 23, 24, 22, 23, 24), Score = c(20628, 26610, 11410, 18755,
17366, 13228, 27011, 17558, 16512, 30945, 28606, 29092, 23262,
18306, 23034, 9627, 16193, 24391, 35197, 26092, 23789, 29900,
22649, 23548, 23868, 18495, 17204, 31750, 27496, 24687, 24115,
25911, 25076, 12472, 12267, 13120, 21580, 20697, 14854, 7190,
55734, 12194, 23853, 16762, 18322, 27582, 28056, 28497, 16156,
17680, 21789, 10137, 18122, 9786, 23866, 30878, 23101, 18104,
22276, 23694, 18534, 20743, 15460, 31997, 32559, 28969, 20408,
24503, 21395, 9925, 15407, 14717)), .Names = c("Treatment",
"Plantpart", "Block", "Biological_Replicate", "Score"), row.names = c(NA,
-72L), class = "data.frame")
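The grouping structure described above can be checked before fitting. The following is a minimal base-R sketch, assuming the design columns follow the regular pattern visible in the dput() output; the `design` data frame and the helper names are illustrative and not part of the question, and the scores are left out.

```r
# Rebuild only the design columns from the question (the scores are left out).
design <- data.frame(
  Treatment = rep(c("Control", "Treated"), each = 36),
  Block     = factor(rep(rep(1:4, each = 9), times = 2)),
  Plantpart = rep(rep(c("Leaf", "Stem", "Root"), each = 3), times = 8)
)

# Plants are numbered 1..24: three per Block x Treatment cell.
cell <- rep(0:7, each = 9)
design$Biological_Replicate <- factor(cell * 3 + rep(1:3, times = 24))

# How many rows, blocks and plant parts does each plant contribute?
obs_per_plant    <- table(design$Biological_Replicate)
blocks_per_plant <- tapply(design$Block, design$Biological_Replicate,
                           function(b) length(unique(b)))
parts_per_plant  <- tapply(design$Plantpart, design$Biological_Replicate,
                           function(p) length(unique(p)))

print(range(obs_per_plant))     # rows per plant
print(range(blocks_per_plant))  # blocks each plant appears in
print(range(parts_per_plant))   # distinct plant parts per plant
```

The same checks can be run on `Data` itself. Passing `data = Data` to lmer() also keeps the formula tied to the data frame instead of to free-standing vectors.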

Related

How to set a legend/key and colors independent of or directly linked to certain values in ggplot2?

I have a series of plots that I would like to make from survey data. Each plot depends on one column of a data frame, and each column holds a different range of numbers, from 1 to x, where x depends on the question the plot relates to (some questions are answered from 1 to 5, some from 1 to 7, and so on).
I would like to have a fixed legend/key for those questions/plots that share the same answer options, e.g. c("Strongly disagree", "Disagree", "Somewhat disagree", "Neither agree or disagree", "Somewhat agree", "Agree", "Strongly agree"). The first option, "Strongly disagree", is a 1 in the data, "Disagree" is a 2, and so on.
To make them easily comparable they should have the same legend/key with the same options and colours.
My problem is that, on a number of occasions, one or more of the answer options of a question was not chosen by any of the respondents. My current code looks something like this:
education_plot <- ggplot(Data) +
aes(Cluster, fill = as.character(Education)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(name = "Level of education", labels = c("No schooling completed", "Some high school, no diploma", "High school graduate, diploma or the equivalent", "College graduate", "Trade/technical/vocational training", "Bachelors degree", "Masters degree", "Doctorate degree"))
I have a number of these code blocks, one for each graph. Each graph should display how often each option was chosen (scaled to 100%) in each respondent cluster.
Example:
If no respondent chose "No schooling completed" (1), the legend/key still shows this term and assigns it a colour, but that colour is used for the answers "Some high school, no diploma" (2). The legend/key therefore pairs the wrong names with the values, and it does not show all of the answer options: the last n options are cut off, where n is the number of answer options that nobody chose.
Image of an example graph
Here the last answer option, "Doctorate degree", is cut off, although it is actually the first option, "No schooling completed", that nobody chose. That first option is nevertheless shown and coloured with the "wrong" data, when it should have no bar at all.
Can someone help me set a legend/key that is always printed in full and shows the correct values, including 0 for options not chosen by any respondent?
edit:
my test code looks like this:
color_mapping <- setNames(hue_pal() (8), 8)
education_plot <- ggplot(Data) +
aes(Cluster, fill = as.character(Education)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(name = "Level of education", values = color_mapping, drop = FALSE, labels = c("No schooling completed", "Some high school, no diploma", "High school graduate, diploma or the equivalent", "College graduate", "Trade/technical/vocational training", "Bachelors degree", "Masters degree", "Doctorate degree"))
resulting graph
The problem is that the last label ("Doctorate degree") is still not shown in the legend, and the data is coloured/matched wrongly, since in this example no respondent answered "No schooling completed". My code simply doesn't know how to match the right value (1-8 in this example) to the right category (label), so it finds 7 different values (2-8) and assigns them to the first 7 labels I defined. How do I tell my code how to match them, and shouldn't the legend at least show "Doctorate degree", since I set drop = FALSE?
Dataset produced by dput():
structure(list(Education = c(7, 4, 7, 7, 8, 6, 6, 8, 8, 6, 4,
5, 6, 7, 6, 8, 4, 4, 8, 7, 7, 3, 5, 7, 4, 4, 7, 7, 7, 5, 7, 3,
7, 8, 6, 8, 5, 7, 5, 6, 4, 6, 3, 6, 7, 7, 6, 4, 2, 7, 3, 6, 4,
4, 6, 6, 4, 4, 8, 7, 4, 4, 8, 6, 5, 7, 7, 7, 7, 4, 6, 4, 8, 8,
7, 8, 8, 6, 7, 4, 6, 6, 6, 5, 6, 7, 7, 4, 7, 6, 7, 7, 7, 4, 6,
7, 6, 3, 7, 7, 7, 6, 6, 4, 6, 4, 6, 4, 8, 7, 4, 5, 4, 6, 4, 7,
6, 6, 4, 7, 6, 6, 8, 7, 8, 5, 7, 7, 8, 7, 6, 6, 6, 4, 8, 7, 8,
6, 6, 4, 7, 6, 6, 6, 3, 7, 7, 4, 8, 8, 7, 8, 7, 4, 6, 4, 8, 6,
7, 7, 3, 7, 5, 8, 6, 3, 7, 7, 8, 4, 8, 6, 7, 7, 6, 6, 3, 6, 6,
8, 6, 6, 2, 4, 7, 6, 8, 8, 6, 3, 4, 8, 7, 6, 5, 7, 7, 8, 7, 3,
6, 4, 4, 4, 7, 4, 8, 7, 7, 6, 6, 6, 6, 6, 3, 4, 7, 6, 6, 6, 6,
6, 4, 6, 7, 7, 3, 6, 7, 6, 6, 6, 4, 7, 6, 6, 6, 7, 7, 4, 6, 3,
6, 6, 6, 6, 7, 6, 6, 4, 4, 6, 6, 4, 4, 4, 6, 4, 6, 6, 6, 6, 6,
6, 4, 6, 4, 4, 6, 6, 6, 8, 6, 6), Cluster = c(4L, 4L, 2L, 2L,
2L, 2L, 4L, 3L, 3L, 2L, 3L, 2L, 4L, 4L, 2L, 4L, 2L, 2L, 4L, 4L,
2L, 3L, 3L, 2L, 3L, 2L, 1L, 4L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L,
1L, 2L, 4L, 2L, 1L, 2L, 4L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 4L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 2L, 2L,
4L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 3L, 2L,
2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
4L, 1L, 2L, 4L, 4L, 2L, 3L, 2L, 2L, 2L, 4L, 4L, 1L, 2L, 4L, 4L,
4L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L,
4L, 2L, 2L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L, 2L,
2L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L,
2L, 2L, 2L, 2L, 3L, 4L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L,
3L, 4L, 2L, 2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 3L, 4L, 2L, 3L, 1L,
4L, 4L, 4L, 2L, 2L, 2L, 4L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 3L, 2L,
3L, 1L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 2L, 3L, 4L,
3L, 4L, 4L, 2L, 4L, 4L, 2L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 4L,
3L, 1L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 4L, 3L, 2L, 2L, 1L)), row.names = c(NA,
-274L), class = "data.frame")
Update responding to Update 2 from Dan Adams:
my Code:
education_plot <- ggplot(test1) +
aes(Cluster, fill = as.character(Education)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(name = "Level of education", values = color_mapping, drop = F)
education_plot
his code:
data1 %>%
ggplot(aes(x = Cluster)) +
geom_bar(aes(fill = Education), stat = "count", position = "fill") +
scale_fill_manual(values = color_mapping, drop = F) +
scale_y_continuous(labels = percent)
which is the result I wanted.
Update 3
You are using as.character(Education) which doesn't have levels so it will never be able to retain values not present in the actual data. This can only be accomplished with a factor. Also you need a factor to enforce the order. Otherwise the categories will sort alphabetically.
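The difference between a character vector and a factor can be seen without any plotting. This is a small base-R sketch with made-up answers (the `education` vector is hypothetical, not the survey data):

```r
# Hypothetical answers coded 1..8; nobody chose option 1.
education <- c(7, 4, 7, 7, 8, 6, 3, 5, 2)

as_chr <- as.character(education)
as_fct <- factor(education, levels = 1:8)

print(table(as_chr))   # only the values that actually occur
print(table(as_fct))   # all eight levels, with a zero count for "1"

# Character values sort as text, so "10" would come before "2".
print(sort(c("2", "10")))
```

Only the factor version carries the unused level, which is what drop = FALSE needs in order to keep it in the legend.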
Update 2
I'll leave my original answer below in case it's helpful to others down the line. However, with the data you shared, I think it's easiest to use fct_recode() to modify the original data with the labels you want.
# load libraries
library(tidyverse)
library(scales)
# import data
data <- structure(list(Education = c(7, 4, 7, 7, 8, 6, 6, 8, 8, 6, 4, 5, 6, 7, 6, 8, 4, 4, 8, 7, 7, 3, 5, 7, 4, 4, 7, 7, 7, 5, 7, 3, 7, 8, 6, 8, 5, 7, 5, 6, 4, 6, 3, 6, 7, 7, 6, 4, 2, 7, 3, 6, 4, 4, 6, 6, 4, 4, 8, 7, 4, 4, 8, 6, 5, 7, 7, 7, 7, 4, 6, 4, 8, 8, 7, 8, 8, 6, 7, 4, 6, 6, 6, 5, 6, 7, 7, 4, 7, 6, 7, 7, 7, 4, 6, 7, 6, 3, 7, 7, 7, 6, 6, 4, 6, 4, 6, 4, 8, 7, 4, 5, 4, 6, 4, 7, 6, 6, 4, 7, 6, 6, 8, 7, 8, 5, 7, 7, 8, 7, 6, 6, 6, 4, 8, 7, 8, 6, 6, 4, 7, 6, 6, 6, 3, 7, 7, 4, 8, 8, 7, 8, 7, 4, 6, 4, 8, 6, 7, 7, 3, 7, 5, 8, 6, 3, 7, 7, 8, 4, 8, 6, 7, 7, 6, 6, 3, 6, 6, 8, 6, 6, 2, 4, 7, 6, 8, 8, 6, 3, 4, 8, 7, 6, 5, 7, 7, 8, 7, 3, 6, 4, 4, 4, 7, 4, 8, 7, 7, 6, 6, 6, 6, 6, 3, 4, 7, 6, 6, 6, 6, 6, 4, 6, 7, 7, 3, 6, 7, 6, 6, 6, 4, 7, 6, 6, 6, 7, 7, 4, 6, 3, 6, 6, 6, 6, 7, 6, 6, 4, 4, 6, 6, 4, 4, 4, 6, 4, 6, 6, 6, 6, 6, 6, 4, 6, 4, 4, 6, 6, 6, 8, 6, 6), Cluster = c(4L, 4L, 2L, 2L, 2L, 2L, 4L, 3L, 3L, 2L, 3L, 2L, 4L, 4L, 2L, 4L, 2L, 2L, 4L, 4L, 2L, 3L, 3L, 2L, 3L, 2L, 1L, 4L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L, 1L, 2L, 4L, 2L, 1L, 2L, 4L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 2L, 2L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 3L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 4L, 1L, 2L, 4L, 4L, 2L, 3L, 2L, 2L, 2L, 4L, 4L, 1L, 2L, 4L, 4L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 4L, 2L, 2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 3L, 4L, 2L, 3L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 3L, 2L, 3L, 1L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 2L, 3L, 4L, 3L, 4L, 4L, 2L, 4L, 4L, 2L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 3L, 1L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 4L, 3L, 2L, 2L, 1L)), row.names = c(NA, -274L), 
class = "data.frame")
# create renaming key
ed_factor_naming <-
setNames(
object = as.character(1:8),
nm = c(
"No schooling completed",
"Some high school, no diploma",
"High school graduate, diploma or the equivalent",
"College graduate",
"Trade/technical/vocational training",
"Bachelors degree",
"Masters degree",
"Doctorate degree"
)
)
# recode data using key
data1 <- data %>%
mutate(Education = factor(Education, levels = 1:8)) %>%
mutate(Education = fct_recode(Education, !!!ed_factor_naming))
# set color mapping from levels
color_mapping <- setNames(hue_pal()(length(levels(data1$Education))), levels(data1$Education))
# plot with drop = FALSE to retain empty levels
data1 %>%
ggplot(aes(x = Cluster)) +
geom_bar(aes(fill = Education), stat = "count", position = "fill") +
scale_fill_manual(values = color_mapping, drop = F) +
scale_y_continuous(labels = percent)
Created on 2021-03-16 by the reprex package (v1.0.0)
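As a possible base-R alternative to the fct_recode() step, factor() accepts levels and labels together. The sketch below uses a short hypothetical `education` vector and grDevices::hcl.colors() in place of hue_pal(); both are stand-ins chosen for illustration, not what the answer above uses.

```r
# Labels in the order of the codes 1..8.
ed_labels <- c(
  "No schooling completed", "Some high school, no diploma",
  "High school graduate, diploma or the equivalent", "College graduate",
  "Trade/technical/vocational training", "Bachelors degree",
  "Masters degree", "Doctorate degree"
)

# Hypothetical answers; nobody chose option 1.
education <- c(7, 4, 7, 7, 8, 6, 3, 5, 2)

# Set levels and display labels in one step.
education_fct <- factor(education, levels = 1:8, labels = ed_labels)

# Named colour vector: one colour per level, matched by name.
color_mapping <- setNames(grDevices::hcl.colors(length(ed_labels)),
                          levels(education_fct))

print(table(education_fct))
print(color_mapping)
```

Because the colours are named by level, each category keeps its colour even when it has no observations.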
Update 1
You can still do this as I described, but you can set whatever labels you like in scale_fill_manual() to recode the levels in your data to what you want them to display as. Alternatively you can change them in your actual data with functions like mutate(var = case_when(...)) or fct_recode(). See updated example below:
Original Answer
Two keys to getting what you wanted here:
Use a named vector for colors to unambiguously assign them so that they will always map the same even if some are empty.
Add drop = FALSE to scale_fill_manual() to retain empty factor levels.
# load packages
library(tidyverse)
library(scales)
# make data reproducible
set.seed(1)
# simulate data
grp = 1:4
freq = LETTERS[1:5]
df <- expand_grid(grp, freq) %>%
mutate(across(everything(), as.factor)) %>%
bind_cols(count = sample(x = c(1:10, rep(NA, 4)),
size = length(grp)*length(freq),
replace = T)) %>%
mutate(count = ifelse(freq == "E", NA, count)
)
# set unambiguous color mapping for each category with named vector
color_mapping <- setNames(hue_pal()(length(freq)), freq)
# plot and use drop = FALSE in scale_fill_manual() to preserve empty factor levels
df %>%
ggplot(aes(x = grp, y = count)) +
geom_col(aes(fill = freq), position = "fill") +
scale_fill_manual(values = color_mapping, drop = F, labels = c("These", "Are", "Arbitrary", "Legend", "Labels"))
#> Warning: Removed 9 rows containing missing values (position_stack).
Created on 2021-03-16 by the reprex package (v1.0.0)

How to make a stacked barplot with nested grouping variables?

I am trying to make a stacked barplot with two variables. My desired outcome looks like this:
This is the first part of my data. There are 220 more rows:
Type Week Stage
<chr> <dbl> <dbl>
1 Captured 1 2
2 Captured 1 1
3 Captured 1 1
4 Captured 1 2
5 Captured 1 1
6 Captured 1 3
7 Captured 1 NA
8 Captured 1 3
9 Captured 1 2
10 Captured 1 1
So far I'm not getting anywhere. This is my code so far:
library(data.table)
dat.m <- melt(newrstudio2, id.vars="Type")
dat.m
library(ggplot2)
ggplot(dat.m, aes(x=Type, y=value, fill=variable)) +
geom_bar(stat="identity")
I guess I need to calculate the number of observations of each stage in each week of each type? I've tried both long and wide data, but I somehow need to combine week with type? I don't know, I'm at a loss.
Alternative way:
set.seed(123)
# sample data
my_data <- data.frame(Type = sample(c("W", "C"), 220, replace = TRUE),
Week = sample(paste0("Week ", 1:4), 220, replace = TRUE),
Stage = sample(paste0('S', 1:4), 220, replace = TRUE))
head(my_data)
library(ggplot2)
ggplot(my_data, aes(x = Type, fill = Stage)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = "fill") +
facet_grid(. ~ Week, switch="both") +
scale_y_continuous(labels = scales::percent) +
ylab("Stage [%]") +
theme(strip.background = element_blank(),
strip.placement = "outside",
panel.spacing = unit(0, "lines"))
Alternatively we could use base graphics. First, what you're probably most interested in, we should reshape the data.
For this we could split the data per week and run a dcast() over it.
L <- lapply(split(d, d$week), function(x)
data.table::dcast(x, type ~ stage, value.var="stage", fun=length))
d2 <- do.call(rbind, L) # transform back into a data frame
Now – with credits to @alemol – we want the proportions.
d2[-1] <- t(apply(d2[-1], 1, prop.table))
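The counting and proportion steps above can also be sketched in base R alone, using table() and prop.table(). The `toy` data frame below is a small hand-made example, not the data from the question, and the week/type combination is built with interaction() purely for illustration.

```r
# Small hand-made example: two types, two weeks, stages 1..4.
toy <- data.frame(
  type  = rep(c("C", "W"), each = 6),
  week  = rep(c(1, 1, 1, 2, 2, 2), times = 2),
  stage = c(1, 1, 2,  2, 3, 3,  1, 2, 3,  4, 4, 4)
)

# One row per week/type combination, one column per stage.
counts <- table(interaction(toy$week, toy$type, sep = "."), toy$stage)

# Convert counts to row-wise proportions so every bar sums to 1.
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 2))
```

Passing t(props) to barplot() would then give one stacked bar per week/type combination.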
Then we are able to plot relatively simply. Note, that barplot() additionally gives us a vector of bar coordinates which we can use later for the axis() labels.
cols <- c("#ed1c24", "#ff7f27", "#00a2e8", "#fff200") # define stage colors
par(mar=c(5, 5, 3, 5) + .1, xpd=TRUE) # set plot margins
p <- barplot(t(d2[-1]), col=cols, border="white", space=rep(c(.2, 0), 5),
font.axis=2, xaxt="n", yaxt="n", xlab="Week")
axis(1, at=p, labels=rep(c("C", "W"), 5), tick=FALSE, line=0)
axis(1, at=apply(matrix(p, , 2, byrow=TRUE), 1, mean), labels=1:5, tick=FALSE, line=1)
axis(2, at=0:10/10, labels=paste0(seq(0, 100, 10), "%"), line=0, las=2)
legend(12, .5, legend=rev(names(d2[-1])), col=rev(cols), pch=15, title="Stage")
Result:
Data:
d <- structure(list(type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L), .Label = c("C", "W"), class = "factor"), week = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), stage = c(3L,
1L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 4L, 1L, 1L, 2L, 2L, 3L, 4L, 3L,
2L, 4L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 4L, 1L, 2L, 4L, 2L, 3L, 4L,
4L, 2L, 4L, 4L, 2L, 3L, 1L, 1L, 4L, 4L, 1L, 4L, 3L, 3L, 3L, 2L,
1L, 3L, 4L, 2L, 4L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 1L, 3L, 2L,
1L, 1L, 1L, 4L, 2L, 4L, 1L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 1L, 3L, 4L, 2L, 4L, 4L, 2L, 2L, 3L, 4L, 4L, 3L, 3L, 1L, 1L,
1L, 2L, 4L, 3L, 1L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 2L, 1L,
2L, 1L, 3L, 3L, 2L, 4L, 3L, 1L, 1L, 4L, 1L, 4L, 4L, 1L, 2L, 2L,
2L, 1L, 3L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 1L, 1L, 2L, 1L, 2L, 3L,
2L, 2L, 1L, 4L, 3L, 4L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 2L,
1L, 2L, 2L, 1L, 1L, 3L, 4L, 3L, 4L, 2L, 4L, 1L, 1L, 2L, 1L, 3L,
2L, 1L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 4L, 2L, 4L, 2L,
4L, 3L, 3L, 1L, 3L, 4L, 3L, 2L, 1L, 2L, 4L, 1L, 2L, 4L, 2L, 1L,
2L, 1L, 2L, 2L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 2L,
1L, 2L, 1L, 3L, 3L, 2L, 1L, 3L, 4L, 2L, 1L, 2L, 4L, 3L, 4L, 2L,
3L, 2L, 4L, 1L, 4L, 4L, 2L, 1L, 2L)), row.names = c(NA, -250L
), class = "data.frame")
Is this what you're looking for:
set.seed(123)
# sample data
my_data <- data.frame(Type = sample(paste0('T', 1:4), 220, replace = TRUE),
Week = sample(paste0('W', 1:4), 220, replace = TRUE),
Stage = sample(paste0('S', 1:4), 220, replace = TRUE))
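library(ggplot2)
# Week:Type only works on factors; interaction() also accepts character columns
ggplot(my_data, aes(x = interaction(Week, Type, sep = ":", lex.order = TRUE), fill = Stage)) + geom_bar()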

Make Y-axis start at 1 instead of 0 within ggplot bar chart

I have some survey data where people rated how strongly they agree or disagree with different statements. Their responses could be any value (decimals included) between 1 and 4 (1 = strongly disagree, 2 = disagree, etc.).
I want to summarize this data by plotting the mean for each variable in a bar chart. I also want the Y-axis labels to show the anchor-point labels (1 = strongly disagree, 2 = disagree, etc.) instead of numeric values.
Given the data included below, I can accomplish this with the following code:
ggplot(data = data, aes(x=factor(key), y=value, fill=key)) +
stat_summary(fun.y="mean", geom="bar", width = 0.5) +
stat_summary(aes(label=round(..y..,1)), fun.y="mean", geom="text", vjust = -0.5) +
geom_hline(yintercept = 3, linetype="solid", color = "red", size=1.5, alpha=0.25) +
scale_y_discrete(limits=c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree"))
This is close to what I need, but I would really like to make the Y-axis start at 1 = Strongly Disagree instead of 0.
I was thinking that I could just subtract 1 from all of the numeric responses, but then my average score labels for each bar would be incorrect.
The only constraint that I have is that I would like to accomplish this task within ggplot, and hopefully not by reshaping the original data. I have another chart like this where I used facet_wrap() to create the same chart for each group (variable not included) within my dataset.
I've done much searching, but it seems that changing the starting point of the axis in ggplot is not typically advised. However, given this situation, I think it is acceptable.
data <- structure(list(key = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("Clarity", "Appropriateness", "Commitment"
), class = "factor"), value = c(NA, 3.33333333333333, 3.33333333333333,
4, 4, 3, 4, NA, 3, NA, 3, 4, NA, NaN, 3, 2.66666666666667, 3,
NA, 3.33333333333333, 3.66666666666667, 3.66666666666667, 4,
NA, 3, 4, 3.66666666666667, 3, 2.66666666666667, 3, 4, 4, 3,
3, NaN, 3, 4, 3, 4, 3, 4, 4, 2.33333333333333, 3, 4, 4, 3, 4,
3, 3, 3.33333333333333, 3, 4, 3, NA, 2.66666666666667, 3.33333333333333,
4, 2.33333333333333, 3.66666666666667, 4, 4, 3, NA, 3, 4, 3.2,
4, 3, 4, NA, 3.2, NA, 3, 4, NA, 4, 3, 3.4, 3, NA, 2.8, 3.6, 3.6,
3.8, NA, 3, 3.4, 3.2, 3, 3, 3.4, 3.8, 3.6, 3, 3, NaN, 2.4, 4,
3, 3.2, 3.2, 4, 4, 2.6, 3.8, 4, 4, 3.6, 3.2, 3, 3, 4, 2.8, 4,
3, NA, 3.4, 3.4, 4, 2.6, 3.8, 4, 3.4, 3, NA, 2.33333333333333,
4, 3.66666666666667, 4, 3, 4, NA, 3.33333333333333, NA, 4, 4,
NA, 4, 4, 2.33333333333333, 3.66666666666667, NA, 3, 4, 4, 4,
NA, 3.33333333333333, 3, 4, 3.33333333333333, 3.66666666666667,
3.33333333333333, 4, 4, 2.33333333333333, 3.66666666666667, NaN,
3, 4, 3, 3, 4, 3.66666666666667, 4, 3.33333333333333, 4, 3.66666666666667,
4, 4, 4, 3.66666666666667, 3, 3.33333333333333, 3.66666666666667,
3.66666666666667, 2.66666666666667, NA, 2.33333333333333, 3,
4, 3, 3.66666666666667, 4, 4, 4)), class = "data.frame", row.names = c(NA,
-186L))
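coord_cartesian() gets the job done: it zooms the plot to the given range while still retaining all of the data.
If you instead set limits = in scale_y_continuous(), your plot would break, because values outside the limits (including the bars' baseline at 0) are dropped before drawing.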
Code
ggplot(data = data, aes(x = key, y = value, fill = key)) +
stat_summary(fun.y = "mean", geom = "bar", width = 0.5) +
stat_summary(aes(label = round(..y.., 1)),
fun.y="mean", geom="text", vjust = -0.5) +
geom_hline(yintercept = 3, linetype = "solid",
color = "red", size = 1.5, alpha = 0.25) +
# limit the vertical space to 1 to 4, but keep the data
coord_cartesian(ylim = c(1, 4)) +
# set ticks at 1, 2, 3, 4
scale_y_continuous(breaks = c(1:4),
# label them with names
labels = c("Strongly Disagree", "Disagree",
"Agree", "Strongly Agree"))

Plot in 3D clusters using plotly package

I need to present 3 clusters in 3D using the plotly package in R. The clusters are generated using the k-means function included in R. I searched, but I only found examples that use the ggplot package.
How can I do this, please?
This is a part of my data set, as a reproducible example.
> dput(DATAFINALE[1:50,])
structure(list(YEAR_SALES = c(2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), CREATION_YEAR_SALES = c(2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
1L), TYPE_PEAU = c(2L, 3L, 4L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 4L, 4L, 1L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L,
3L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 4L, 2L, 2L, 3L,
3L, 2L, 2L, 2L, 2L, 2L, 4L), SENSIBILITE = c(3L, 3L, 3L, 2L,
1L, 3L, 3L, 2L, 2L, 2L, 3L, 1L, 3L, 1L, 2L, 3L, 3L, 2L, 3L, 3L,
3L, 3L, 3L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 1L, 3L,
3L, 3L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 1L, 2L, 3L), IMPERFECTIONS = c(2L,
3L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 3L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 2L,
1L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 1L,
2L), BRILLANCE = c(3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L,
3L, 1L, 3L, 3L, 1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L,
1L, 3L, 3L, 3L, 3L, 3L, 3L), GRAIN_PEAU = c(3L, 3L, 3L, 3L, 1L,
3L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L,
1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 2L), RIDES_VISAGE = c(1L,
1L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 3L,
1L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 1L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 2L, 3L, 3L, 1L,
1L), ALLERGIES = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), MAINS = c(2L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 1L, 3L, 3L, 2L, 3L, 3L, 2L, 2L,
2L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L,
2L, 3L, 1L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 3L), PEAU_CORPS = c(1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 3L, 2L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L), INTERET_ALIM_NATURELLE = c(1L, 3L, 3L, 1L, 3L, 1L, 1L, 1L,
3L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L,
3L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), INTERET_ORIGINE_GEO = c(1L,
2L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 1L,
1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L,
1L), INTERET_VACANCES = c(2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
3L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), INTERET_ENVIRONNEMENT = c(1L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), INTERET_COMPOSITION = c(1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 1L,
3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), MONTH_SALES = c(9, 9,
2, 9, 3, 3, 11, 12, 3, 6, 3, 3, 8, 9, 5, 1, 10, 5, 4, 9, 2, 3,
4, 5, 6, 7, 7, 9, 7, 7, 11, 6, 4, 4, 4, 8, 9, 8, 9, 12, 4, 4,
3, 11, 5, 12, 11, 2, 6, 3), DAY_SALES = c(13, 3, 10, 23, 12,
10, 26, 4, 18, 9, 9, 9, 4, 10, 17, 28, 22, 4, 14, 22, 2, 10,
1, 20, 7, 12, 1, 3, 13, 3, 9, 5, 13, 27, 1, 28, 18, 10, 3, 2,
15, 6, 25, 4, 8, 23, 16, 19, 21, 14), HOURS_INS = c(17, 14, 18,
16, 23, 18, 16, 12, 17, 16, 21, 18, 22, 14, 10, 15, 13, 13, 21,
16, 23, 22, 17, 12, 15, 23, 17, 14, 8, 10, 12, 14, 13, 10, 17,
3, 19, 22, 17, 18, 23, 18, 8, 16, 12, 19, 21, 14, 11, 22), CREATION_MONTH_SALES = c(9,
9, 2, 10, 12, 3, 11, 2, 3, 6, 10, 3, 3, 9, 7, 11, 11, 5, 4, 9,
2, 3, 4, 8, 6, 7, 10, 5, 7, 8, 11, 6, 4, 4, 11, 8, 9, 8, 12,
12, 4, 8, 2, 11, 11, 1, 11, 10, 8, 3), CREATION_DAY_SALES = c(13,
11, 15, 31, 5, 10, 27, 7, 18, 9, 8, 18, 6, 26, 4, 24, 16, 12,
15, 22, 10, 10, 25, 5, 28, 20, 10, 18, 14, 31, 9, 5, 22, 27,
6, 29, 18, 11, 6, 2, 16, 17, 1, 4, 23, 23, 16, 1, 25, 16), VALIDATION_YEAR_SALES = c(2,
1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2,
1, 1, 2, 1, 2, 1, 1), VALIDATION_MONTH_SALES = c(9, 9, 2, 11,
12, 3, 12, 2, 3, 6, 10, 3, 3, 10, 7, 11, 11, 5, 4, 9, 2, 3, 4,
8, 6, 7, 10, 5, 7, 9, 11, 6, 4, 4, 11, 8, 9, 8, 12, 12, 4, 8,
2, 11, 11, 1, 11, 10, 8, 3), VALIDATION_DAY_SALES = c(15, 14,
16, 3, 6, 19, 1, 8, 21, 10, 9, 21, 7, 1, 6, 25, 17, 13, 20, 29,
11, 20, 29, 6, 30, 22, 12, 20, 16, 1, 10, 7, 25, 28, 14, 30,
19, 13, 8, 4, 28, 24, 2, 7, 25, 25, 19, 3, 27, 21), AGE_CUSTUMER = c(32,
37, 24, 32, 44, 33, 29, 30, 56, 48, 44, 43, 37, 43, 35, 62, 60,
33, 51, 32, 35, 33, 28, 24, 32, 38, 33, 36, 54, 45, 39, 41, 55,
34, 54, 51, 45, 57, 24, 47, 35, 51, 45, 39, 31, 40, 42, 42, 39,
58), MEAN_Sales = c(0, 71.75, 50.7142857142857, 0, 0.666666666666667,
83.3333333333333, 0.333333333333333, 25.7777777777778, 23.3846153846154,
35.5294117647059, 21.6363636363636, 46.8461538461538, 18.4, 15.0666666666667,
110.25, 8.85714285714286, 0, 21.5, 18.5714285714286, 28.125,
101.333333333333, 69.1428571428571, 48.25, 20.5833333333333,
12, 20.3333333333333, 23, 15.1428571428571, 12.3913043478261,
30.3076923076923, 24.625, 23.375, 20.0833333333333, 32.75, 0,
1.5, 0, 50.6, 32.3846153846154, 33, 28.6818181818182, 19.8076923076923,
25.6666666666667, 9.83333333333333, 33, 55.3333333333333, 42.7,
0, 31.375, 11.625), NBR_GIFTS = c(1, 1, 1, 1, 1, 1, 1, 1, 4,
3, 4, 2, 1, 4, 1, 1, 1, 1, 3, 2, 1, 2, 2, 1, 3, 5, 4, 1, 9, 2,
5, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 3, 2, 1, 1, 4, 4),
OUTCOME = c(3, 4, 7, 3, 3, 6, 3, 9, 26, 17, 22, 13, 10, 30,
4, 7, 7, 6, 14, 16, 3, 7, 12, 12, 15, 24, 21, 7, 46, 13,
16, 8, 12, 8, 3, 8, 3, 10, 13, 13, 22, 26, 12, 6, 13, 6,
10, 4, 16, 24)), .Names = c("YEAR_SALES", "CREATION_YEAR_SALES",
"TYPE_PEAU", "SENSIBILITE", "IMPERFECTIONS", "BRILLANCE", "GRAIN_PEAU",
"RIDES_VISAGE", "ALLERGIES", "MAINS", "PEAU_CORPS", "INTERET_ALIM_NATURELLE",
"INTERET_ORIGINE_GEO", "INTERET_VACANCES", "INTERET_ENVIRONNEMENT",
"INTERET_COMPOSITION", "MONTH_SALES", "DAY_SALES", "HOURS_INS",
"CREATION_MONTH_SALES", "CREATION_DAY_SALES", "VALIDATION_YEAR_SALES",
"VALIDATION_MONTH_SALES", "VALIDATION_DAY_SALES", "AGE_CUSTUMER",
"MEAN_Sales", "NBR_GIFTS", "OUTCOME"), row.names = c(1L, 2L,
3L, 5L, 9L, 13L, 14L, 16L, 18L, 19L, 20L, 24L, 27L, 29L, 30L,
32L, 33L, 35L, 36L, 37L, 39L, 44L, 49L, 51L, 52L, 53L, 55L, 56L,
61L, 62L, 63L, 65L, 66L, 67L, 71L, 74L, 75L, 80L, 81L, 84L, 86L,
90L, 92L, 95L, 96L, 99L, 100L, 103L, 104L, 107L), class = "data.frame")
My clustering model is fitted with this code:
Model<-kmeans(DATAFINALE,centers = 3,nstart=20)
I then need a 3D scatter plot of the clusters, like the one titled "Basic 3D Scatter Plot" at https://plot.ly/r/3d-scatter-plots/.
Thank you in advance
First of all, you have to add the cluster vector to the dataset.
# convert the cluster labels to a factor so they are plotted as discrete groups
DATAFINALE$cluster <- as.factor(Model$cluster)
Then you have to decide which variables to plot as x, y and z (I picked three arbitrarily):
x <-'MONTH_SALES'
y <-'DAY_SALES'
z <- 'HOURS_INS'
Lastly, you can plot it, using the cluster for the colors:
library(plotly)
p <- plot_ly(DATAFINALE, x = ~MONTH_SALES, y = ~DAY_SALES, z = ~HOURS_INS, color = ~cluster) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = x),
                      yaxis = list(title = y),
                      zaxis = list(title = z)))
p
The result is a 3D scatter plot with the points colored by cluster.
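In the code above, the x, y and z strings only feed the axis titles, while the column names are typed by hand again inside plot_ly(). Here is a minimal sketch of driving both from the same strings. The toy data frame stands in for DATAFINALE with made-up values, and as_axis() is a hypothetical helper, not part of plotly:

```r
# Toy data standing in for DATAFINALE (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  MONTH_SALES = sample(1:12, 30, replace = TRUE),
  DAY_SALES   = sample(1:28, 30, replace = TRUE),
  HOURS_INS   = sample(0:23, 30, replace = TRUE)
)

# Fit first, then attach the labels, so the data passed to kmeans()
# contains only numeric columns
toy_model <- kmeans(toy, centers = 3, nstart = 20)
toy$cluster <- as.factor(toy_model$cluster)

# Variable names kept as strings, as in the answer
x <- "MONTH_SALES"
y <- "DAY_SALES"
z <- "HOURS_INS"

# Hypothetical helper: turn a column name into a one-sided formula (~name)
as_axis <- function(name) as.formula(paste0("~", name))

# Only draw the plot if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(toy, x = as_axis(x), y = as_axis(y), z = as_axis(z),
                       color = ~cluster)
  p <- plotly::add_markers(p)
  p <- plotly::layout(p, scene = list(xaxis = list(title = x),
                                      yaxis = list(title = y),
                                      zaxis = list(title = z)))
}
```

With this arrangement, changing one of the three strings changes both the plotted column and its axis title.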

legend for group of lines in ggplot

I am trying to plot a group of lines and assign one label to these lines in the legend. Here's an example of the data (df2) that I am using.
structure(list(
true = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("model1", "model2"), class = "factor"),
test = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
.Label = c("false.model1", "false.model2", "false.model3", "true model"), class = "factor"),
est.interval = c(0, 5, 5, 7, 7, 10, 10, 11, 11, 0, 2, 2, 3, 3, 0, 2, 2, 4, 4, 0, 2, 2, 3, 3, 4, 4, 0, 7, 7, 10, 10, 13, 13, 0, 4, 4, 5, 5, 0, 5, 5, 6, 6, 7, 7, 9, 9, 0, 2, 2, 3, 3, 4, 4),
sens = c(1, 1, 0.75, 0.75, 0.5, 0.5, 0.25, 0.25, 0, 1, 1, 0.25, 0.25, 0, 1, 1, 0.5, 0.5, 0, 1, 1, 0.5, 0.5, 0.25, 0.25, 0, 1, 1, 0.75, 0.75, 0.25, 0.25, 0, 1, 1, 0.5, 0.5, 0, 1, 1, 0.75, 0.75, 0.5, 0.5, 0.25, 0.25, 0, 1, 1, 0.5, 0.5, 0.25, 0.25, 0)),
.Names = c("true", "test", "est.interval", "sens"),
class = "data.frame")
And here I plot the curves using ggplot.
ggplot(df2, aes(x=est.interval, y=sens, color=test)) +
ylim(0,1) +
geom_line(size = 0.6) +
facet_wrap(~true, nrow=1) +
scale_colour_manual(values=c(rep('#CCCCCC', 3), "#000000"), name="Estimation Model") +
guides(color=guide_legend(keywidth = 3, keyheight = 1))
I'd like to have the three grey "false.model" curves separately visible on the plot, but in the legend I just want one "false models" entry beside one grey line. Any thoughts on how to do this?
I had a problem with your dput output: it was missing row names, which made it an invalid data.frame. Here is a corrected dump:
df2 <- structure(list(true = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("model1",
"model2"), class = "factor"), test = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L), .Label = c("false.model1", "false.model2", "false.model3",
"true model"), class = "factor"), est.interval = c(0, 5, 5, 7,
7, 10, 10, 11, 11, 0, 2, 2, 3, 3, 0, 2, 2, 4, 4, 0, 2, 2, 3,
3, 4, 4, 0, 7, 7, 10, 10, 13, 13, 0, 4, 4, 5, 5, 0, 5, 5, 6,
6, 7, 7, 9, 9, 0, 2, 2, 3, 3, 4, 4), sens = c(1, 1, 0.75, 0.75,
0.5, 0.5, 0.25, 0.25, 0, 1, 1, 0.25, 0.25, 0, 1, 1, 0.5, 0.5,
0, 1, 1, 0.5, 0.5, 0.25, 0.25, 0, 1, 1, 0.75, 0.75, 0.25, 0.25,
0, 1, 1, 0.5, 0.5, 0, 1, 1, 0.75, 0.75, 0.5, 0.5, 0.25, 0.25,
0, 1, 1, 0.5, 0.5, 0.25, 0.25, 0)), .Names = c("true", "test",
"est.interval", "sens"), row.names = c(NA, -54L), class = "data.frame")
The easiest thing to do would be to create a new factor that you can use for the coloring. For example:
df2$testcol<-factor(ifelse(df2$test=="true model",1,2),
levels=1:2,
labels=c("True Model","False Model"))
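As a quick check of this recoding, here is a small sketch on a hypothetical stand-in vector that uses the same four level names as df2$test (the vector itself is made up for illustration):

```r
# Hypothetical stand-in for df2$test, using the same four level names
test <- factor(c("false.model1", "false.model2", "false.model3", "true model"))

# Same recoding as in the answer: 1 for the true model, 2 for everything else
testcol <- factor(ifelse(test == "true model", 1, 2),
                  levels = 1:2,
                  labels = c("True Model", "False Model"))

# Cross-tabulate the original levels against the collapsed ones
print(table(test, testcol))
```

Each of the three false.model levels should fall in the "False Model" column, and "true model" in the "True Model" column.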
Now all the false models will share a common value. Then you can do
ggplot(df2, aes(x=est.interval, y=sens, group=test, colour=testcol)) +
ylim(0,1) +
geom_line(size = 0.6) +
facet_wrap(~true, nrow=1) +
scale_colour_manual(values=c("True Model"='#CC0000',
"False Model"="#999999"), name="Estimation Model") +
guides(color=guide_legend(keywidth = 3, keyheight = 1))
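to get a plot in which the three false-model curves are still drawn as separate lines but share a single "False Model" legend entry.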
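The key is that group and colour are mapped to different variables: group=test keeps one line per model, while colour=testcol decides the legend entries. A minimal sketch of the same idea on made-up miniature data, assuming the grey and black colors from the question instead of the red and grey used above:

```r
library(ggplot2)

# Made-up miniature of df2: a single panel with four curves
toy <- data.frame(
  test = factor(rep(c("false.model1", "false.model2", "false.model3", "true model"),
                    each = 3)),
  est.interval = rep(c(0, 2, 4), times = 4),
  sens = c(1, 0.75, 0,  1, 0.5, 0,  1, 0.25, 0,  1, 0.5, 0.25)
)

# Collapse the three false models into one colouring level
toy$testcol <- factor(ifelse(toy$test == "true model", "True Model", "False Model"),
                      levels = c("True Model", "False Model"))

# group keeps the four lines separate; colour gives two legend entries
p <- ggplot(toy, aes(x = est.interval, y = sens, group = test, colour = testcol)) +
  ylim(0, 1) +
  geom_line() +
  scale_colour_manual(values = c("True Model" = "#000000", "False Model" = "#CCCCCC"),
                      name = "Estimation Model")
```

Printing p draws the plot. Without group=test, all the false-model points would be joined into one line, because ggplot2 would then group by the colour variable alone.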
