How to change the legend of a ggplot - r

When I use ggplot and try to change the legend title from "value" to "Work schedules", it doesn't change. I also want the legend to show discrete labels (0 = did not work, 1 = worked) instead of a continuous scale. Do you know what could be wrong with my code?
plot <- ggplot(df3, aes(x = time, y = index, fill = value)) +
geom_raster() +
facet_grid(~ day) +
theme(panel.spacing = unit(1, "mm"),
axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x="Hours", y ="Identification Number") +
scale_x_continuous(breaks = c(9,17), name= "Time") +
scale_y_continuous()
plot + annotate("rect", fill = "red", alpha = 0.5, xmin = c(9), xmax = c(17), ymin = -Inf, ymax = Inf) +
ylab ("Identification number") +
theme_bw()

@Jordo82 has the right answer for naming the legend. As for changing the scale from continuous to discrete, take a look at your variable "value" and check the range of its values. If the values are indeed discrete, try dplyr::mutate(value = as.factor(value)). If the variable type is a double, you may need to use dplyr::mutate() to bin it into ranges, for example:
df3 <- df3 %>% dplyr::mutate(value = ifelse(value < 2, "Not Worked", "Worked"))
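To make both parts concrete, here is a minimal sketch on mock data. It assumes that "value" is coded 0/1 and that the legend title is set with labs(fill = ...), which is one common way to do it (the referenced answer is not reproduced here, so that is an assumption). The mock df3 and the label strings are illustrative only.

```r
library(ggplot2)

# Mock data standing in for df3 (assumption: value is coded 0/1)
set.seed(1)
df3 <- expand.grid(time = 0:23, index = 1:5, day = c("Mon", "Tue"))
df3$value <- rbinom(nrow(df3), 1, 0.5)

# Turn the 0/1 codes into a labelled factor so the fill legend is discrete
df3$value <- factor(df3$value, levels = c(0, 1),
                    labels = c("Did not work", "Worked"))

p <- ggplot(df3, aes(x = time, y = index, fill = value)) +
  geom_raster() +
  facet_grid(~ day) +
  # fill = ... sets the legend title because the legend belongs to the fill aesthetic
  labs(x = "Time", y = "Identification number", fill = "Work schedules") +
  theme_bw()
```

Because the legend is generated by the fill aesthetic, the title has to be set on fill; labs(x = ..., y = ...) alone only affects the axes.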

Related

Raincloud plot - histogram?

I would like to create a raincloud plot. I have successfully done it. But I would like to know if instead of the density curve, I can put a histogram (it's better for my dataset).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I think we could maybe use geom_histogram(), but I could not get it to work.
Thank you in advance for your help !
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not that easy. There are a few challenges.
geom_histogram is "horizontal by nature", while the custom geom_flat_violin is vertical, as are boxplots. Hence the final call to coord_flip in that tutorial. To combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to rescale them with the after_stat function.
Nudging the histograms to the right position above the boxplot and points is the last challenge. I convert the discrete scale to a continuous one by mapping a constant numeric to the global y aesthetic, and then use the facet labels as the discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
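The after_stat rescaling is the least obvious step above, so here it is in isolation as a small sketch on made-up data. Instead of dividing by a fixed constant such as 100, this variant divides by the largest bin count, so the tallest bar has height 1. That normalisation is my own choice for illustration, not something the answer prescribes.

```r
library(ggplot2)

set.seed(42)
d <- data.frame(v = rnorm(500))  # made-up data

# after_stat() lets the y aesthetic use values computed by the stat (here: count).
# Dividing by max(count) rescales the bars so that the tallest bin is exactly 1.
p <- ggplot(d, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count / max(count))), binwidth = 0.25)

# Bar heights as ggplot2 computed them
heights <- layer_data(p)$y
```

With the heights bounded by 1, it becomes easier to choose a y offset and boxplot width that sit neatly below the histogram.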

How to improve speed of ggplot bar chart when plotting >1000 points?

I'm producing a bar chart for 1200 observations using ggplot2. Each of these observations has an error bar. There's also an average shown (using geom_line) for these observations overall.
I'm finding the running time is very slow (2 seconds) compared with fewer observations (e.g. with about 500, it takes less than 1 second). Also, every observation must be a separate bar.
I realise it doesn't sound like much time, but this time adds up overall for what I need to do - producing over 100 of these plots and knitting them to rmd file.
Below is a piece of code I've created to replicate the issue - this is using ggplot2 inbuilt diamonds dataset.
library(dplyr)
library(ggplot2)
diamonds1 <- as.data.frame(mutate(diamonds, upper = x + 1.2, lower = x - 0.4))
diamonds2 <- diamonds1 %>%
group_by(cut) %>%
summarize(Mean = mean(x, na.rm=TRUE))
ChosenColorClarity <- "VVS28451"
diamonds3 <- left_join(diamonds1 ,diamonds2, by = c("cut" = "cut") ) %>%
filter(cut == "Very Good") %>%
mutate(ID = paste0(clarity,row_number() )) %>%
mutate(CutType = case_when(ID==ChosenColorClarity ~ ID,
color == "F" & ID != ChosenColorClarity ~ " Same Color",
TRUE ~ " Other Color"),
CutLabel = ifelse(ID == ChosenColorClarity, "Your Cut", ""))
diamonds4 <- diamonds3[order(-xtfrm(diamonds3$CutLabel)),]
diamonds4 <- diamonds4[1:1255,]
diamonds4$Xval <- as.numeric(reorder(diamonds4$ID, diamonds4$x))
DiamondCutChart = diamonds4 %>%
ggplot(aes(x = Xval,
y = x)) +
geom_bar(aes(fill=CutType), stat = "identity", width = 1) +
geom_errorbar(aes(ymin = lower, ymax = upper)) +
geom_text(aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(y = diamonds4$Mean), group = 1, linetype=2, colour = "#0000ff") +
scale_fill_manual(values = c("#32572C", "#41B1B1", "#db03fc")) +
annotate("text", x = 1, y = diamonds4$Mean, hjust =0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ",diamonds4$Mean)) +
theme_classic()+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
StartTime = Sys.time()
DiamondCutChart
EndTime = Sys.time()
EndTime - StartTime
When running this, it takes around 2 seconds. I need this to be less than 1 second to be able to produce multiple plots and rmarkdown outputs in less overall time.
How can I reduce the time it takes to plot the graph from the piece of code?
Any help or pointing in the right direction is greatly appreciated.
I'm assuming for now that you're aiming for raw speed, and visualization that depicts the desired data content. I'm not sure you need geom_bar() if only one bar is a different color. If your real world scenario has 7 different colors mixed randomly among the 1255 bars... this workaround won't work for you. :) Hopefully this will be helpful! :)
The geom_ribbon() is much faster to render than geom_bar(). With 1255 positions I didn't fiddle with its options, but I understand it has step functions to make it appear like bars when zoomed in. Ymmv.
It is so much faster, I decided to use it twice: once to render "bars" and once to render "error bars". In order for geom_ribbon() to work (for me) I created a numeric column for the x-axis values Xval, see below.
The geom_text() step is really only printing one label, and subsetting the data during this step saves a lot of rendering time. You can adjust as needed.
The same goes for the annotate() step: it prints and re-prints the same label 1255 times, which takes a lot of time. Obviously you don't need that. :)
Each of the three steps above saves about 0.6 to 0.7 seconds. Maybe you can mix and match with other geoms as needed.
The final result (on my system) was 0.2 seconds.
diamonds4$Xval <- as.numeric(reorder(diamonds4$ID, diamonds4$x))
DiamondCutChartNew <- diamonds4 %>%
ggplot(aes(x = Xval, y = x)) +
geom_ribbon(aes(ymin = 0, ymax = x), fill="#32572C") +
geom_col(data = subset(diamonds4, nchar(CutLabel) > 0),
aes(x = Xval, y = x),
fill = "#41B1B1") +
geom_ribbon(data = diamonds4,
aes(ymin = lower, ymax = upper), fill="#FF000077") +
geom_line(aes(y = x)) +
geom_text(data = subset(diamonds4, nchar(CutLabel) > 0),
aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(x = Xval, y = Mean), group = 1, linetype = 2, colour = "#0000ff") +
annotate("text", x = 1, y = head(diamonds4$Mean, 1), hjust = 0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ", head(diamonds4$Mean, 1))) +
theme_classic() +
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
{StartTime = Sys.time()
print(DiamondCutChartNew)
EndTime = Sys.time()
EndTime - StartTime}
Original result (for me):
Time difference of 2.05 secs
The new result:
Time difference of 0.229 secs
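To see why the annotate() change matters, the following sketch (mock data, hypothetical column names) counts how many text labels each variant actually draws. Passing a full column as y makes annotate() build one label per row, while passing a single value builds exactly one.

```r
library(ggplot2)

set.seed(1)
# Mock data: 1255 bars sharing one mean value (column names are illustrative)
df <- data.frame(x = 1:1255, y = runif(1255), m = 0.5)

# y is a vector of length 1255, so 1255 identical labels are drawn on top of each other
p_many <- ggplot(df, aes(x, y)) +
  annotate("text", x = 1, y = df$m, label = "Mean")

# y is a single value, so only one label is drawn
p_one <- ggplot(df, aes(x, y)) +
  annotate("text", x = 1, y = df$m[1], label = "Mean")

n_many <- nrow(layer_data(p_many, 1))
n_one  <- nrow(layer_data(p_one, 1))
```

The same reasoning applies to geom_text(): if only one row has a non-empty label, subsetting the data for that layer avoids drawing over a thousand empty strings.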
Pasting the ProfVis run for this question:
https://rstudio.github.io/profvis/
install.packages("profvis")
library(profvis)
profvis(expr = {
DiamondCutChart <- diamonds4 %>%
ggplot(aes(x = reorder(ID, x),
y = x)) +
geom_bar(aes(fill=CutType), stat = "identity", width = 1) +
geom_errorbar(aes(ymin = lower, ymax = upper)) +
geom_text(aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(y = Mean), group = 1, linetype=2, colour = "#0000ff") +
scale_fill_manual(values = c("#32572C", "#41B1B1")) +
annotate("text", x = 1, y = diamonds4$Mean, hjust =0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ",diamonds4$Mean)) +
theme_classic()+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
print(DiamondCutChart)
},
interval = 0.005
)

R Windrose percent label on figure

I am using the windrose function posted here: Wind rose with ggplot (R)?
I need to have the percents on the figure showing on the individual lines (rather than on the left side), but so far I have not been able to figure out how. (see figure below for depiction of goal)
Here is the code that makes the figure:
p.windrose <- ggplot(data = data,
aes(x = dir.binned,y = (..count..)/sum(..count..),
fill = spd.binned)) +
geom_bar()+
scale_y_continuous(breaks = ybreaks.prct,labels=percent)+
ylab("")+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica")
I marked up the figure I have so far with what I am trying to do! It'd be neat if the labels either auto-picked the location with the least wind in that direction, or if it had a tag for the placement so that it could be changed.
I tried using geom_text, but I get an error saying that "aesthetics must be valid data columns".
Thanks for your help!
One of the things you could do is to make an extra data.frame that you use for the labels. Since the data isn't available from your question, I'll illustrate with mock data below:
library(ggplot2)
# Mock data
df <- data.frame(
x = 1:360,
y = runif(360, 0, 0.20)
)
labels <- data.frame(
x = 90,
y = scales::extended_breaks()(range(df$y))
)
ggplot(data = df,
aes(x = as.factor(x), y = y)) +
geom_point() +
geom_text(data = labels,
aes(label = scales::percent(y, 1))) +
scale_x_discrete(breaks = seq(0, 1, length.out = 9) * 360) +
coord_polar() +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())
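The key point is that the label layer gets its own data frame, so geom_text() never looks for label columns in the main data (which is what typically triggers errors about invalid data columns). As a rough sketch of what that data frame contains, again on mock data and with a continuous x for simplicity (an assumption; the windrose uses a discrete x):

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = 1:360, y = runif(360, 0, 0.20))  # mock data

# One row per label: a fixed angular position and one y value per axis break
labels <- data.frame(
  x = 90,
  y = scales::extended_breaks()(range(df$y))
)
# Pre-format the text; accuracy = 1 rounds to whole percents
labels$text <- scales::percent(labels$y, accuracy = 1)

p <- ggplot() +
  geom_point(data = df, aes(x = x, y = y)) +
  geom_text(data = labels, aes(x = x, y = y, label = text)) +
  coord_polar()
```

Changing the single x value in the labels data frame moves the whole column of percentages to a different direction on the rose.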
@teunbrand's answer got me very close! I wanted to add the code I used to get everything just right, in case anyone has a similar problem in the future.
# Create the labels:
x_location <- pi # x location of the labels
# Get the percentage
T_data <- data %>%
dplyr::group_by(dir.binned) %>%
dplyr::summarise(count= n()) %>%
dplyr::mutate(y = count/sum(count))
labels <- data.frame(x = x_location,
y = scales::extended_breaks()(range(T_data$y)))
# Create figure
p.windrose <- ggplot() +
geom_bar(data = data,
aes(x = dir.binned, y = (..count..)/sum(..count..),
fill = spd.binned))+
geom_text(data = labels,
aes(x=x, y=y, label = scales::percent(y, 1))) +
scale_y_continuous(breaks = waiver(),labels=NULL)+
scale_x_discrete(drop = FALSE,
labels = waiver()) +
ylab("")+xlab("")+
coord_polar(start = -((dirres/2)/360) * 2*pi) +
scale_fill_manual(name = "Wind Speed (m/s)",
values = spd.colors,
drop = FALSE)+
theme_bw(base_size = 12, base_family = "Helvetica") +
theme(axis.ticks.y = element_blank(), # Disables default y-axis
axis.text.y = element_blank())

Plot standard deviation

I want to plot the standard deviation for one line (one flow series; the plot will have two) in a plot with lines or smooth areas. I've seen and applied some code from sd representation and other examples, but it's not working for me.
My original data has several flow values for the same day, from which I've calculated the daily mean and sd. I'm stuck here: I don't know whether it is possible to represent the daily sd with lines from the column I created (called "sd"), or whether I should use the original data.
The code below is a general example of what I'll apply to my data. The flow, flow1 and sd columns are examples of the result of calculating the daily mean and sd of the original data.
library(gridExtra)
library(ggplot2)
library(grid)
x <- data.frame(
date = seq(as.Date("2012-01-01"),as.Date("2012-12-31"), by="week"),
rain = sample(0:20,53,replace=T),
flow1 = sample(50:150,53,replace=T),
flow = sample(50:200,53,replace=T),
sd = sample (0:10,53, replace=T))
g.top <- ggplot(x, aes(x = date, y = rain, ymin=0, ymax=rain)) +
geom_linerange() +
scale_y_continuous(limits=c(22,0),expand=c(0,0), trans="reverse")+
theme_classic() +
theme(plot.margin = unit(c(5,5,-32,6),units="points"),
axis.title.y = element_text(vjust = 0.3))+
labs(y = "Rain (mm)")
g.bottom <- ggplot(x, aes(x = date)) +
geom_line(aes(y = flow, colour = "flow")) +
geom_line(aes(y = flow1, colour = "flow1")) +
stat_summary(geom="ribbon", fun.ymin="min", fun.ymax="max", aes(fill=sd), alpha=0.3) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
grid.arrange(g.top, g.bottom , heights = c(1/5, 4/5))
The above code gives Error: stat_summary requires the following missing aesthetics: y
Another option is geom_smooth, but as far as I understand it requires some line equation (I may be wrong, I'm new to R).
Something like this maybe?
library(dplyr)
library(tidyr)
g.bottom <- x %>%
select(date, flow1, flow, sd) %>%
gather(key, value, c(flow, flow1)) %>%
mutate(min = value - sd, max = value + sd) %>%
ggplot(aes(x = date)) +
geom_ribbon(aes(ymin = min, ymax = max, fill = key)) +
geom_line(aes(y = value, colour = key)) +
scale_fill_manual(values = c("grey", "grey")) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
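On the question of whether to start from the original data or the precomputed sd column: either works, because geom_ribbon() only needs ymin and ymax. The sketch below starts from hypothetical raw readings (several per day), computes the daily mean and sd, and draws mean ± sd as a band. The raw data frame and its column names are made up for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical raw data: four flow readings per day for 30 days
raw <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 30), each = 4),
  flow = runif(120, 50, 150)
)

# Daily mean and standard deviation
daily <- raw %>%
  group_by(date) %>%
  summarise(mean_flow = mean(flow), sd_flow = sd(flow), .groups = "drop")

# Band of mean +/- one sd, with the mean line drawn on top
p <- ggplot(daily, aes(x = date)) +
  geom_ribbon(aes(ymin = mean_flow - sd_flow, ymax = mean_flow + sd_flow),
              fill = "grey70", alpha = 0.5) +
  geom_line(aes(y = mean_flow)) +
  theme_classic() +
  labs(x = "Date", y = "River flow")
```

If only one of two series has an sd, the ribbon layer can be given just that series' columns while the second series is added as a plain geom_line().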

How to plot 95 percentile and 5 percentile on ggplot2 plot with already calculated values?

I have this dataset and use this R code:
library(reshape2)
library(ggplot2)
library(RGraphics)
library(gridExtra)
long <- read.csv("long.csv")
ix <- 1:14
ggp2 <- ggplot(long, aes(x = id, y = value, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3, angle = 0) +
scale_x_continuous("Nodes", breaks = ix) +
scale_y_continuous("Throughput (Mbps)", limits = c(0,1060)) +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)",
"Inside Firewall (Source)",
"Outside Firewall (Dest)",
"Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right") +
theme(legend.title = element_text(colour="black", size=14, face="bold")) +
theme(legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
plot(ggp2)
to get the following result:
Now I need to add the 95 percentile and 5 percentile to the plot. The numbers are calculated in this dataset (NFPnumbers (95 percentile) and FPnumbers (5 percentile) columns).
It seems boxplot() may work here but I am not sure how to use it with ggplot.
stat_quantile(quantiles = c(0.05,0.95)) could work as well, but the function calculates the numbers itself. Can I use my numbers here?
I also tried:
geom_line(aes(x = id, y = long$FPnumbers)) +
geom_line(aes(x = id, y = long$NFPnumbers))
but the result did not look good enough.
geom_boxplot() did not work either:
geom_boxplot(aes(x = id, y = long$FPnumbers)) +
geom_boxplot(aes(x = id, y = long$NFPnumbers))
When you want to set the parameters for a boxplot, you also need ymin and ymax values. As they are not in the dataset, I calculated them.
ggplot(long, aes(x = factor(id), y = value, fill = type)) +
geom_boxplot(aes(lower = FPnumbers, middle = value, upper = NFPnumbers, ymin = FPnumbers*0.5, ymax = NFPnumbers*1.2, fill = type), stat = "identity") +
xlab("Nodes") +
ylab("Throughput (Mbps)") +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)", "Inside Firewall (Source)",
"Outside Firewall (Dest)", "Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right",
legend.title = element_text(colour="black", size=14, face="bold"),
legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
The result:
In the dataset you provided, you gave the value, FPnumbers & NFPnumbers variables. As FPnumbers & NFPnumbers represent the 5 and 95 percentiles, I suppose that the mean is represented by value. For this solution to work, you'll need min and max values for each "Node". I guess you have them somewhere in your raw data.
However, as they are not provided in the dataset, I made them up by calculating them based on FPnumbers & NFPnumbers. The multiplication factors of 0.5 and 1.2 are arbitrary. It is just a way of creating fictitious min and max values.
There are several suitable geoms for that, geom_errorbar is one of them:
ggp2 + geom_errorbar(aes(ymax = NFPnumbers, ymin = FPnumbers), alpha = 0.5, width = 0.5)
I don't know if there's a way to get rid of the central line though.
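Since long.csv is not available here, the following is a self-contained sketch of the error-bar approach on mock data. The columns p05 and p95 are stand-ins for FPnumbers and NFPnumbers, and all numbers are invented. The precomputed percentiles are mapped directly to ymin and ymax, so ggplot2 does not recalculate anything.

```r
library(ggplot2)

# Mock data; p05 and p95 stand in for FPnumbers and NFPnumbers (values invented)
long <- data.frame(
  id    = rep(1:3, times = 2),
  type  = rep(c("inside", "outside"), each = 3),
  value = c(900, 850, 920, 400, 420, 380),
  p05   = c(800, 760, 850, 300, 350, 310),
  p95   = c(950, 900, 990, 480, 470, 450)
)

p <- ggplot(long, aes(x = factor(id), y = value, fill = type)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Use the same dodge width as the bars so each error bar sits on its bar
  geom_errorbar(aes(ymin = p05, ymax = p95),
                position = position_dodge(width = 0.9), width = 0.3)

eb <- layer_data(p, 2)  # data actually used by the error-bar layer
```

geom_linerange() takes the same ymin/ymax mapping and draws only the vertical segment, which may be preferable if the horizontal caps are not wanted.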
