What I'm doing
I'm using a library for R called ggplot2, which allows for a lot of different options for creating graphics and other things. I'm using that to display two different data sets on one graph with different colours for each set of data I want to display.
The Problem
I'm also trying to get a legend to show up in my graph that will tell the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set legend.position to top, bottom, left and right in turn, to make sure nothing was setting its position to "none" by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
Result
As pointed out by @Ben, you should pass the colour into aes() in order to get the legend displayed.
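A minimal sketch of that first suggestion, assuming stand-in vectors for `times` and `timesSec` and the legend labels "Set 1" / "Set 2" (all invented here for illustration). Mapping a constant string to fill and colour inside aes() is what makes ggplot2 build a legend; the manual scales then assign the colours used in the question.

```r
library(ggplot2)

# Stand-in data: the real `times` and `timesSec` are not shown in the question
set.seed(42)
times    <- sample(0:23, 200, replace = TRUE)
timesSec <- sample(0:23, 200, replace = TRUE)

p <- ggplot() +
  # A constant string inside aes() creates a legend entry for each layer
  geom_density(data = data.frame(Time = times),
               aes(x = Time, fill = "Set 1", colour = "Set 1"), alpha = 0.8) +
  geom_density(data = data.frame(Time = timesSec),
               aes(x = Time, fill = "Set 2", colour = "Set 2"), alpha = 0.8) +
  # Named vectors tie each legend entry to its colour explicitly
  scale_fill_manual(name = NULL,
                    values = c("Set 1" = "#1A3552", "Set 2" = "gray")) +
  scale_colour_manual(name = NULL,
                      values = c("Set 1" = "#4271AE", "Set 2" = "orange")) +
  theme(legend.position = "top")
```

Printing `p` should show both densities with a two-entry legend at the top.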
However, a better way to build the plot is to merge your two vectors ("Time" and "Timesec" here) into a single dataframe and reshape it into a longer format. To illustrate this, I created this dummy dataframe:
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
Time Timesec
1 22 23
2 21 9
3 19 9
4 10 6
5 7 24
6 15 9
... ... ...
So, the first step is to reshape your dataframe into a longer format. Here, I'm using the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
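As a quick sanity check on the reshape, here is a small sketch using a three-row stand-in for the dummy dataframe (the values are just the first rows printed above). After pivoting, every cell of the wide table becomes one row, so the long table should have nrow * ncol rows.

```r
library(tidyr)

# Three-row stand-in for the dummy dataframe
df_small <- data.frame(Time = c(22, 21, 19), Timesec = c(23, 9, 9))

long <- pivot_longer(df_small, everything(),
                     names_to = "var", values_to = "val")

# One row per original cell: 3 rows x 2 columns = 6 rows
nrow(long) == nrow(df_small) * ncol(df_small)
```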
To add geom_vline and geom_text based on the mean of your values, an easy way is to create a second dataframe that gathers the mean and the maximum density of each variable, which are the values needed for the annotations:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate your entire plot. Here, we pass the color and fill arguments into aes() and use scale_color_manual and scale_fill_manual to set the appropriate colors:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  ggplot(aes(x = val, fill = var, colour = var)) +
  geom_density(alpha = 0.8) +
  scale_color_manual(values = c("#4271AE", "orange")) +
  scale_fill_manual(values = c("#1A3552", "gray")) +
  geom_vline(inherit.aes = FALSE, data = df_lab,
             aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
             show.legend = FALSE) +
  geom_text(inherit.aes = FALSE, data = df_lab,
            aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
            show.legend = FALSE) +
  labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
  scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
  scale_y_continuous(name = "Activity") +
  theme(legend.position="top")
Does this answer your question?
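One more point that may matter for the pdf() / dev.off() wrapper in the question: inside a script, a function or a loop, a ggplot object is only drawn when it is printed explicitly. A small sketch, assuming a toy plot and temporary output paths (both made up here), showing ggsave() next to the explicit print() route:

```r
library(ggplot2)

# Toy plot standing in for the full density plot
p <- ggplot(data.frame(val = c(1, 5, 9, 12, 18, 22)), aes(x = val)) +
  geom_density()

# Option 1: let ggsave() open and close the device
out_ggsave <- file.path(tempdir(), "activity-plot-ggsave.pdf")
ggsave(out_ggsave, plot = p, width = 7, height = 5)

# Option 2: keep pdf()/dev.off(), but print the plot explicitly
out_device <- file.path(tempdir(), "activity-plot-device.pdf")
pdf(out_device, width = 7, height = 5)
print(p)
invisible(dev.off())
```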
Related
With my dataframe that looks like this (I have in total 1322 rows) :
I'd like to make a bar plot with the percentage of rating of the CFS score. It should look similar to this :
With this code, I can make a single bar plot for the column cfs_triage :
ggplot(data = df) +
geom_bar(mapping = aes(x = cfs_triage, y = (..count..)/sum(..count..)))
But I can't figure out how to make one with the three variables next to one another.
Thank you in advance to everyone who helps me make this bar plot with the percentage of ratings for these three variables! (I'm not sure my explanation is very clear, but I hope it is :))
Your best bet here is to pivot your data into long format. We don't have your data, but we can reproduce a similar data set like this:
set.seed(1)
df <- data.frame(cfs_triage = sample(10, 1322, TRUE, prob = 1:10),
cfs_silver = sample(10, 1322, TRUE),
cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x})
Now the dummy data set looks a lot like yours:
head(df)
#> cfs_triage cfs_silver cfs_student
#> 1 9 NA 1
#> 2 8 4 2
#> 3 NA 8 NA
#> 4 NA 10 9
#> 5 9 5 NA
#> 6 3 1 NA
If we pivot into long format, then we will end up with two columns: one containing the values, and one containing the column name that the value belonged to in the original data frame:
library(tidyverse)
df_long <- df %>%
pivot_longer(everything())
head(df_long)
#> # A tibble: 6 x 2
#> name value
#> <chr> <int>
#> 1 cfs_triage 9
#> 2 cfs_silver NA
#> 3 cfs_student 1
#> 4 cfs_triage 8
#> 5 cfs_silver 4
#> 6 cfs_student 2
This then allows us to plot with value on the x axis, and we can use name as a grouping / fill variable:
ggplot(df_long, aes(value, fill = name)) +
geom_bar(position = 'dodge') +
scale_fill_grey(name = NULL) +
theme_bw(base_size = 16) +
scale_x_continuous(breaks = 1:10)
#> Warning: Removed 900 rows containing non-finite values (`stat_count()`).
Created on 2022-11-25 with reprex v2.0.2
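The plot above shows counts. Since the question asks for percentages, here is a hedged sketch of one way to get them: count each rating per variable, then divide by the total within that variable. The smaller dummy data set and the names df_demo and df_pct are assumptions made for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Smaller dummy data in the same shape as the example above
set.seed(1)
df_demo <- data.frame(cfs_triage  = sample(10, 200, TRUE, prob = 1:10),
                      cfs_silver  = sample(10, 200, TRUE),
                      cfs_student = sample(10, 200, TRUE, prob = 10:1))

df_pct <- df_demo %>%
  pivot_longer(everything()) %>%
  filter(!is.na(value)) %>%          # drop missing ratings before counting
  count(name, value) %>%             # n = number of ratings per variable and score
  group_by(name) %>%
  mutate(percent = n / sum(n)) %>%   # share within each variable
  ungroup()

p <- ggplot(df_pct, aes(value, percent, fill = name)) +
  geom_col(position = "dodge") +
  scale_fill_grey(name = NULL) +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw(base_size = 16)
```

With this approach the bars of each variable sum to 100%, so the three variables stay comparable even if they have different numbers of missing values.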
Maybe you need something like this. The formatting was taken from @Allan Cameron (many thanks!):
library(tidyverse)
library(scales)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id) %>%
group_by(id) %>%
mutate(percent = value/sum(value, na.rm = TRUE)) %>%
mutate(percent = ifelse(is.na(percent), 0, percent)) %>%
mutate(my_label = str_trim(paste0(format(100 * percent, digits = 1), "%"))) %>%
ggplot(aes(x = factor(name), y = percent, fill = factor(name), label = my_label))+
geom_col(position = position_dodge())+
geom_text(aes(label = my_label), vjust=-1) +
facet_wrap(. ~ id, nrow=1, strip.position = "bottom")+
scale_fill_grey(name = NULL) +
scale_y_continuous(labels = scales::percent)+
theme_bw(base_size = 16)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
I have density plots for each shift and year. The means are plotted by grouping in a df called mu. I also add vertical reference lines which I can label without issue but I cannot seem to get the labels on the grouped vertical lines. You will see my latest attempt which throws an error "Aesthetics must be either length 1 or the same as the data (134): x"
My code
library(ggplot2)
library(dplyr)
df <- read.csv("f4_bna_no_cup.csv")
head(df)
ï..n yr s ys x
1 1 2021 1 2021-1 116.83
2 2 2021 1 2021-1 114.83
3 3 2021 1 2021-1 115.50
4 4 2021 1 2021-1 115.42
5 5 2021 1 2021-1 115.58
6 6 2021 1 2021-1 115.58
#summarize means by ys (year-shift)
mu <- df %>%
group_by(ys,s) %>%
summarise(grp.mean = mean(x))
mu
ys s grp.mean
<chr> <int> <dbl>
1 2021-1 1 116.
2 2021-2 2 117.
3 2022-1 1 114.
4 2022-2 2 115.
llab<-mu
shift <- c("Shift 1", "Shift 2")
#density charts on df
ggplot(data=df, aes(x=x,group =ys, fill = yr, color = yr)) +
geom_density(alpha = 0.4) +
scale_x_continuous(limits=c(112,120))+
geom_vline(aes(xintercept = grp.mean), data = mu, linetype = "dashed", size = 0.5) +
geom_text(aes(x=llab$grp.mean, y=.6), label = llab$ys) + #this throws the error
geom_vline(aes(xintercept=114.8), linetype="dashed", size=0.5, color = 'green3') +
geom_text(aes(x=114.8, y=.6), label = "Target", angle = 90, color="black",size=3) +
geom_vline(aes(xintercept=114.1), linetype="solid", size=0.5, color = 'limegreen') +
geom_text(aes(x=114.1, y=.55), label = "Potential", angle = 90, color="black",size=3 ) +
geom_vline(aes(xintercept=113.4), linetype="solid", size=0.5, color = 'firebrick3') +
geom_text(aes(x=113.4, y=.62), label = "Label wt", angle = 90,
color="black",size=3, family = "Times New Roman", vjust=0) +
facet_grid(
.~s,
labeller = labeller(
s = c(`1` = "Shift 1", `2` = "Shift 2")
))+
theme_light()+
theme(legend.position = "none")
Output so far...I'm so close.
Persistence pays off. I figured it out and thought I would share it in case someone else has a similar problem:
All code remains the same as in my question except a slight change to grouping for the mu df, AND replace the line that I noted as throwing the error as follows:
#small change to group_by, retaining yr
mu <- df %>%
group_by(yr,s,ys) %>%
summarise(grp.mean = mean(x))
Replace: geom_text(aes(x=llab$grp.mean, y=.6), label = llab$ys), with
geom_text(data = mu, aes(label = yr), x = mu$grp.mean, y = .60, color = "black", angle = 90, vjust = 0)
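For anyone hitting the same error: the original geom_text() layer inherited the plot data (134 rows) while its x aesthetic had only 4 values, hence the length mismatch. A sketch of an alternative that keeps everything inside aes(), assuming synthetic data in place of f4_bna_no_cup.csv (the numbers below are made up):

```r
library(ggplot2)
library(dplyr)

# Synthetic stand-in for the CSV: two years, two shifts
set.seed(1)
df <- data.frame(yr = rep(c(2021, 2022), each = 40),
                 s  = rep(rep(1:2, each = 20), times = 2),
                 x  = rnorm(80, mean = 115, sd = 1))
df$ys <- paste(df$yr, df$s, sep = "-")

mu <- df %>%
  group_by(yr, s, ys) %>%
  summarise(grp.mean = mean(x), .groups = "drop")

p <- ggplot(df, aes(x = x, group = ys, fill = factor(yr), colour = factor(yr))) +
  geom_density(alpha = 0.4) +
  geom_vline(data = mu, aes(xintercept = grp.mean), linetype = "dashed") +
  # The layer gets its own data and its own mapping, so nothing from df is inherited
  geom_text(data = mu, aes(x = grp.mean, y = 0.6, label = ys),
            inherit.aes = FALSE, angle = 90, vjust = 0, colour = "black") +
  facet_grid(. ~ s) +
  theme_light() +
  theme(legend.position = "none")
```

Because mu keeps the s column, each label is placed in the facet of its own shift.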
Happy new year! Consider this simple example:
> df <- tibble(type = c('0_10','0_9','0_8','0_10','0_9','0_8','1_10','1_9','1_8','1_10','1_9','1_8'),
+ time = c(1,1,1,2,2,2,1,1,1,2,2,2),
+ value = c(2,3,4,2,3,6,-2,-3,-4,-2,-3,-5))
> df
# A tibble: 12 x 3
type time value
<chr> <dbl> <dbl>
1 0_10 1 2
2 0_9 1 3
3 0_8 1 4
4 0_10 2 2
5 0_9 2 3
6 0_8 2 6
7 1_10 1 -2
8 1_9 1 -3
9 1_8 1 -4
10 1_10 2 -2
11 1_9 2 -3
12 1_8 2 -5
I am creating a plot that stacks the values of value over time, by type. I would like to obtain the same color for the type 0_10 and 1_10, another color for 0_9 and 1_9 and another color for 0_8 and 1_8.
Unfortunately, ggplot does not seem to use the factor ordering I am asking for. You can see below that 0_10 is purple while 1_10 is green... They should have the same color.
mylevels = c('0_10','0_9','0_8','1_10','1_9','1_8')
df %>%
ggplot(aes(x = time)) +
geom_area(
data = . %>% dplyr::filter(str_detect(type, '0_')),
aes(y = value,
fill = factor(type, levels = mylevels)),
position = 'stack', color = 'black')+
scale_fill_viridis_d() +
geom_area(
data = . %>% dplyr::filter(str_detect(type, '1_')),
aes(y = value, fill = factor(type, levels = mylevels)),
position = 'stack', color = 'black')
Any idea?
Thanks!
I'd create a new column for the fill from the trailing number in your type column, and group by type.
library(tidyverse)
df <- tibble(type = c('0_10','0_9','0_8','0_10','0_9','0_8','1_10','1_9','1_8','1_10','1_9','1_8'), time = c(1,1,1,2,2,2,1,1,1,2,2,2), value = c(2,3,4,2,3,6,-2,-3,-4,-2,-3,-5))
df %>%
mutate(colvar = sub("^.*_", "", type)) %>%
ggplot(aes(x = time)) +
geom_area(aes(y = value,
fill = colvar,
group = type),
position = 'stack', color = 'black') +
scale_fill_viridis_d()
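To see why this gives matching colours, here is a small base R sketch of the split. The names prefix and suffix are made up for this illustration; suffix plays the role of colvar above.

```r
type <- c("0_10", "0_9", "0_8", "1_10", "1_9", "1_8")

prefix <- sub("_.*$", "", type)   # part before the underscore: "0" or "1"
suffix <- sub("^.*_", "", type)   # part after the underscore: "10", "9" or "8"

data.frame(type, prefix, suffix)
```

Types 0_10 and 1_10 share the suffix "10", so they map to the same fill, while group = type keeps them as separate areas.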
If you want to visualise the information in the first number of your type, you need to map it to another aesthetic, for example alpha.
The code below also turns the new colvar variable into a factor with ordered levels:
df %>%
mutate(colvar = factor(sub("^.*_", "", type), levels = 8:10),
alphavar = as.integer(substr(type, 1, 1))) %>%
ggplot(aes(x = time)) +
geom_area(aes(y = value,
fill = colvar,
group = type,
alpha = alphavar),
position = 'stack', color = 'black') +
scale_fill_viridis_d() +
scale_alpha_continuous(breaks = 0:1, range = c(0.7,1) )
I want to build plot with double y-axes.
In the image you can see my dataframe and plot. It was done in Excel, and I need to do the same in R. I tried to use the latticeExtra library, but it doesn't show any lines or boxes:
library(latticeExtra)
obj1 <- xyplot(Q_TY_PAPER ~ PU, df, type = "h")
obj2 <- xyplot(COM_USD ~ PU, df, type = "l")
doubleYScale(obj1, obj2, text = c("obj1", "obj2"))
Can you please help me?
Here the capture of my dataset and the plot that I would like to get:
You need to split your dataframe in two: one for the bar chart, which needs to be reshaped, and one for the line, which needs to be rescaled.
Basically, the line will be plotted on the same y axis as the bar chart; however, we will add a secondary y axis whose tick marks correspond to the "real" values of the line.
So, first, we need to rescale the values plotted as a line. Since your example shows that a value of 8 on the bar chart matches a value of 500 for the line, we can rescale by applying a ratio of 8/500:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
> df_line
PU COM_USD COM_USD_2
1 Client1 464 7.424
2 Client2 237 3.792
3 Client3 179 2.864
4 Client4 87 1.392
5 Client5 42 0.672
6 Client6 27 0.432
7 Client7 10 0.160
For the bar chart, we need to pivot the data into a longer format in order to fit the grammar of ggplot2. To do that, we can use pivot_longer from the tidyr package (loaded with tidyverse):
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
# A tibble: 21 x 3
PU Variable Value
<fct> <chr> <dbl>
1 Client1 Q_TY_PAPER 7.1
2 Client1 Q_TY_ONLINE 7.1
3 Client1 CURR 6
4 Client2 Q_TY_PAPER 3.8
5 Client2 Q_TY_ONLINE 3.8
6 Client2 CURR 3.9
7 Client3 Q_TY_PAPER 4.4
8 Client3 Q_TY_ONLINE 4.4
9 Client3 CURR 2.3
10 Client4 Q_TY_PAPER 2.6
# … with 11 more rows
Now, you can plot both of them by doing:
library(tidyverse)
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())
As you can see, in scale_y_continuous we set a secondary axis whose tick values are multiplied by the inverse ratio (500/8), so that it matches the values of the plotted line.
Finally, you get the following plot:
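The arithmetic behind the two axes can be checked in isolation. In this sketch, bar_max and line_max are made-up names for the two reference values (8 and 500) read off the example plot. The line is drawn at value * ratio, and the secondary axis divides by the same ratio to recover the original value.

```r
COM_USD <- c(464, 237, 179, 87, 42, 27, 10)

bar_max  <- 8     # top of the primary (bar) axis
line_max <- 500   # value of the line that should sit at that height
ratio    <- bar_max / line_max

scaled <- COM_USD * ratio   # what geom_line() actually draws
back   <- scaled / ratio    # what the secondary axis displays

all.equal(back, COM_USD)
```

Keeping the ratio in one variable, and using it both for the rescaled column and inside sec_axis(), avoids the two transformations drifting apart if the limits change.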
DATA
PU = paste0("Client",1:7)
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
EDIT: Dealing with long names as x-axis labels
If the client names in your real data are too long, you can use this solution (Two lines of X axis labels in ggplot) to write them on two lines.
So, first, modify the PU variable:
PU = c("Jon Jon", "Bob Bob", "Andrew Andrew", "Henry Henry", "Alexander Alexander","Donald Donald", "Jack Jack")
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
Then, we apply the same code as described above:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
But for the plot, you can use scale_x_discrete and specify the labels, inserting \n to tell R to write the x-axis labels on multiple lines:
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())+
scale_x_discrete(labels = gsub(" ","\n",PU), breaks = PU)
And you get this:
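A variation on the same idea, offered as a sketch rather than a replacement: the labels argument of scale_x_discrete() also accepts a function, and converting PU to a factor keeps the clients in their original order instead of alphabetical order. The helper name wrap_label and the three-client data are assumptions for this illustration.

```r
library(ggplot2)

PU <- c("Jon Jon", "Bob Bob", "Andrew Andrew")
df_line <- data.frame(PU = factor(PU, levels = PU),   # keep the original order
                      COM_USD = c(464, 237, 179))

# Hypothetical helper: put each word of a label on its own line
wrap_label <- function(x) gsub(" ", "\n", x)

p <- ggplot(df_line, aes(x = PU, y = COM_USD, group = 1)) +
  geom_line() +
  scale_x_discrete(labels = wrap_label)
```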
Consider this simple example
library(dplyr)
library(forcats)
library(ggplot2)
mydata <- tibble(cat1 = c(1,1,2,2),
cat2 = c('a','b','a','b'),
value = c(10,20,-10,-20),
time = c(1,2,1,2))
mydata <- mydata %>% mutate(cat1 = factor(cat1),
cat2 = factor(cat2))
> mydata
# A tibble: 4 x 4
cat1 cat2 value time
<fct> <fct> <dbl> <dbl>
1 1 a 10.0 1.00
2 1 b 20.0 2.00
3 2 a -10.0 1.00
4 2 b -20.0 2.00
Now, I want to create a chart where I interact the two factor variables.
I know I can use interaction() in ggplot2 (see below).
My big problem is that I do not know how to automate the labeling (and the colouring) of the interactions so that I can avoid any manual error using scale_colour_manual.
For instance:
ggplot(mydata,
aes(x = time, y = value, col = interaction(cat1, cat2) )) +
geom_point(size=15) + theme(legend.position="bottom")+
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme(legend.position="bottom",
legend.text=element_text(size=12, face = "bold")) +
scale_colour_manual(name = ""
, values=c("red","red4","royalblue","royalblue4")
, labels=c("1-b","1-a"
,"2-a","2-b"))
shows:
which has the wrong labels because of a deliberate mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note how the labels are simply the concatenation of the factor levels). The idea is that with more factor levels, guessing the right order can be tricky.
Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
Thanks!
If the number of factor levels for cat1 / cat2 is not fixed (but could potentially be much larger than 2), I would try to calculate the appropriate colours with hsv(), rather than assign them manually.
The colour cheatsheet here summarises the HSV colour model rather nicely:
Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
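A few direct calls to the base R function hsv() make the three parameters concrete. The variable names are made up for this illustration; the last value matches the parameters that the code further down produces for cat1 = 1 and cat2 = "a".

```r
# Full saturation and value: pure colours on the hue wheel
pure_red  <- hsv(h = 1,   s = 1, v = 1)   # "#FF0000" (h = 0 gives the same red)
pure_cyan <- hsv(h = 0.5, s = 1, v = 1)   # "#00FFFF"

# Same hue as cyan, but less saturated and darker
muted_cyan <- hsv(h = 0.5, s = 0.65, v = 0.65)   # "#3AA6A6"

c(pure_red, pure_cyan, muted_cyan)
```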
Here's how I would adapt it for this use case:
mydata2 <- mydata %>%
# use "-" instead of the default "." since we are using that for the labels anyway
mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
# cat1: assign hue evenly across the whole wheel,
# cat2: restrict both saturation & value to the [0.3, 1], as it can look too
# faint / dark otherwise
mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
> colour.vector
1-a 1-b 2-a 2-b
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000"
With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:
ggplot(mydata2,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector)) +
theme(legend.position = "bottom")
An illustration with more factor levels (the code is the same except for the addition of guide_legend(byrow = TRUE) in the colour scale):
mydata3 <- data.frame(
cat1 = factor(rep(1:3, times = 5)),
cat2 = factor(rep(LETTERS[1:5], each = 3)), # factor needed for levels() / as.integer() below
value = 1:15,
time = 15:1
) %>%
mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
ggplot(mydata3,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector),
guide = guide_legend(byrow = TRUE)) +
theme(legend.position = "bottom")