How to plot the 95th and 5th percentiles on a ggplot2 plot with already calculated values? - r

I have this dataset and use this R code:
library(reshape2)
library(ggplot2)
library(RGraphics)
library(gridExtra)
long <- read.csv("long.csv")
ix <- 1:14
ggp2 <- ggplot(long, aes(x = id, y = value, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3, angle = 0) +
scale_x_continuous("Nodes", breaks = ix) +
scale_y_continuous("Throughput (Mbps)", limits = c(0,1060)) +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)",
"Inside Firewall (Source)",
"Outside Firewall (Dest)",
"Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right") +
theme(legend.title = element_text(colour="black", size=14, face="bold")) +
theme(legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
plot(ggp2)
to get the following result:
Now I need to add the 95th and 5th percentiles to the plot. They are already calculated in this dataset (the NFPnumbers column holds the 95th percentile and the FPnumbers column the 5th percentile).
It seems boxplot() may work here but I am not sure how to use it with ggplot.
stat_quantile(quantiles = c(0.05,0.95)) could work as well, but that function calculates the quantiles itself. Can I use my own numbers here?
I also tried:
geom_line(aes(x = id, y = long$FPnumbers)) +
geom_line(aes(x = id, y = long$NFPnumbers))
but the result did not look good enough.
geom_boxplot() did not work either:
geom_boxplot(aes(x = id, y = long$FPnumbers)) +
geom_boxplot(aes(x = id, y = long$NFPnumbers))

When you pass precomputed statistics to geom_boxplot() with stat = "identity", you need ymin and ymax values (the whisker ends) in addition to lower, middle and upper. As they are not in the dataset, I calculated them.
ggplot(long, aes(x = factor(id), y = value, fill = type)) +
geom_boxplot(aes(lower = FPnumbers, middle = value, upper = NFPnumbers, ymin = FPnumbers*0.5, ymax = NFPnumbers*1.2, fill = type), stat = "identity") +
xlab("Nodes") +
ylab("Throughput (Mbps)") +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)", "Inside Firewall (Source)",
"Outside Firewall (Dest)", "Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right",
legend.title = element_text(colour="black", size=14, face="bold"),
legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
The result:
In the dataset you provided, you gave the value, FPnumbers & NFPnumbers variables. As FPnumbers & NFPnumbers represent the 5 and 95 percentiles, I suppose that the mean is represented by value. For this solution to work, you'll need min and max values for each "Node". I guess you have them somewhere in your raw data.
However, as they are not provided in the dataset, I made them up by calculating them based on FPnumbers & NFPnumbers. The multiplication factors of 0.5 and 1.2 are arbitrary. It is just a way of creating fictitious min and max values.
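If the raw measurements are available, the five values that geom_boxplot(stat = "identity") expects can be computed per node instead of being made up. A minimal base-R sketch, assuming a hypothetical raw data frame `raw` with one row per measurement (columns `id` and `value`). Here the whiskers are the minimum and maximum, the box edges are the 5th and 95th percentiles, and the middle is the median rather than the mean:

```r
# Hypothetical raw data: 50 throughput measurements for each of three nodes
set.seed(1)
raw <- data.frame(id = rep(1:3, each = 50),
                  value = runif(150, min = 0, max = 1000))

# For one node, return the five statistics used by geom_boxplot(stat = "identity")
node_stats <- function(node) {
  v <- raw$value[raw$id == node]
  q <- quantile(v, probs = c(0, 0.05, 0.5, 0.95, 1), names = FALSE)
  data.frame(id = node, ymin = q[1], lower = q[2], middle = q[3],
             upper = q[4], ymax = q[5])
}

# One row per node
box_stats <- do.call(rbind, lapply(sort(unique(raw$id)), node_stats))
box_stats
```

The columns can then be mapped directly, e.g. aes(x = factor(id), ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax) together with stat = "identity".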

There are several suitable geoms for that; geom_errorbar is one of them:
ggp2 + geom_errorbar(aes(ymax = NFPnumbers, ymin = FPnumbers), alpha = 0.5, width = 0.5)
I don't know if there's a way to get rid of the central line though.
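A self-contained sketch of the geom_errorbar() approach, using a small hypothetical data frame `long_demo` whose columns mimic the question's (value, plus the precomputed FPnumbers and NFPnumbers). The ymin/ymax aesthetics simply read the existing columns, so ggplot2 does not compute any quantiles itself. geom_linerange() is a related geom that draws the same vertical range without the horizontal end caps.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `long` data
long_demo <- data.frame(
  id         = 1:4,
  type       = "Inside Firewall (Dest)",
  value      = c(620, 710, 540, 680),   # bar height
  FPnumbers  = c(480, 590, 400, 530),   # precomputed 5th percentile
  NFPnumbers = c(760, 820, 690, 800)    # precomputed 95th percentile
)

p <- ggplot(long_demo, aes(x = id, y = value)) +
  geom_col(fill = "grey70") +
  # Error bars spanning the precomputed 5th to 95th percentiles
  geom_errorbar(aes(ymin = FPnumbers, ymax = NFPnumbers), width = 0.5) +
  scale_x_continuous("Nodes", breaks = long_demo$id) +
  scale_y_continuous("Throughput (Mbps)") +
  theme_bw()
```

Call print(p) to display it; with the real data, the same geom_errorbar() layer can be added to ggp2.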

Related

Creating manually a legend in ggplot R [duplicate]

So this is the graph and the code I have so far:
library(fivethirtyeight)
library(tidyverse)
bad_drivers$num_drivers
bad_drivers$perc_speeding
mytable <- bad_drivers %>%
mutate(SpeedPerBilion = (num_drivers * perc_speeding)/100)
ggplot(data = mytable, aes(x = state, y = SpeedPerBilion, fill='red')) +
xlab("") +
ylab("") +
coord_flip() +
geom_bar(stat = "identity")+
geom_bar(data= mytable, aes(x = state, y=num_drivers), alpha=0.5,stat="identity") +
theme(plot.title = element_text(face = "bold"), legend.position = "none")+
scale_y_continuous(sec.axis = dup_axis())+
labs(title = "Drivers Involved In Fatal Collisions While Speeding",
subtitle = "As a share of the number of fatal collisions per billion miles, 2009")
So my questions are:
How do I add a legend to this graph?
How do I remove the lower y axis (so that only the upper one is shown)?
Thank you in advance! :)
Find the code below and what I believe is your desired plot. You will have to tweak the labels to match what you need, but I used placeholder names. The key is using scale_fill_manual() with a named vector of colors and referring to those names in the aes() of each layer that should use that color. Another neat trick is using alpha() to apply transparency as part of the color rather than as a separate scale. Finally, the y-axis change you were looking for is position = "right", so that the axis ends up on top after coord_flip().
library(fivethirtyeight) # for data
library(tidyverse)
bad_drivers %>%
mutate(SpeedPerBilion = (num_drivers * perc_speeding)/100) %>%
ggplot(aes(x = state, y = SpeedPerBilion)) +
xlab("") +
ylab("") +
coord_flip() +
geom_bar(stat = "identity", aes(fill = "Speeding")) +
geom_bar(aes(x = state, y = num_drivers, fill = "All"),
stat = "identity") +
theme(plot.title = element_text(face = "bold")) +
scale_y_continuous(position = "right") +
scale_fill_manual(name = "Speeding Involved",
values = c("Speeding" = alpha("red", 1), "All" = alpha("red", 0.5))) +
labs(title = "Drivers Involved In Fatal Collisions While Speeding",
subtitle = "As a share of the number of fatal collisions per billion miles, 2009") +
guides(fill = guide_legend(override.aes = list(color = "red", alpha = c(0.25, 1))))
Created on 2022-10-11 by the reprex package (v2.0.1)
Note: For some reason the transparency in the legend doesn't look the same as in the plot, so I manually set the legend to alpha = 0.25 so that, to my eye, it matches the plot. Please confirm the result on your own computer.
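The named-vector idea can be tried without the fivethirtyeight package. A minimal sketch with a hypothetical three-state data frame `drivers_demo`: each layer maps fill to a constant label, and scale_fill_manual() translates those labels into colors with different transparency. Here the "All" bars are drawn first so that the opaque "Speeding" bars sit on top of them; that layer order is a choice made for this sketch, not taken from the answer above.

```r
library(ggplot2)

# Hypothetical data: drivers involved in fatal collisions, total and speeding
drivers_demo <- data.frame(
  state    = c("State A", "State B", "State C"),
  all      = c(20, 15, 18),
  speeding = c(8, 3, 6)
)

# Named vector: legend label -> fill color (transparency baked into the color)
fill_values <- c("All" = alpha("red", 0.5), "Speeding" = alpha("red", 1))

p <- ggplot(drivers_demo, aes(x = state)) +
  # Constant strings inside aes() become legend entries
  geom_col(aes(y = all, fill = "All")) +
  geom_col(aes(y = speeding, fill = "Speeding")) +
  scale_fill_manual(name = "Speeding involved", values = fill_values) +
  # With coord_flip(), a y axis placed on the "right" is drawn along the top
  scale_y_continuous(position = "right") +
  coord_flip() +
  labs(x = NULL, y = NULL)
```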
Maybe this:
library(fivethirtyeight)
library(tidyverse)
mytable <- bad_drivers %>%
mutate(SpeedPerBilion = (num_drivers * perc_speeding)/100)
ggplot(data = mytable, aes(x = state, y = SpeedPerBilion, fill='red')) +
xlab("") +
ylab("") +
coord_flip() +
geom_bar(stat = "identity")+
geom_bar(data= mytable, aes(x = state, y=num_drivers), alpha=0.5,stat="identity") +
scale_y_continuous(sec.axis = dup_axis())+
theme(plot.title = element_text(face = "bold"),
axis.text.x.bottom = element_blank(),
axis.ticks.x.bottom = element_blank())+
labs(title = "Drivers Involved In Fatal Collisions While Speeding",
subtitle = "As a share of the number of fatal collisions per billion miles, 2009")
Output:

How do I represent percent of a variable in a filled barplot?

I have a data frame (t1) and I want to illustrate the shares of companies by size.
I added a dummy variable so that I get a single filled bar rather than three:
t1$row <- 1
Company size has three levels, medium, small and micro:
f_size <- factor(t1$size,
ordered = TRUE,
levels = c("medium", "small", "micro"))
The plot is built with theme_economist() from the ggthemes package:
library(ggplot2)
library(ggthemes)
ggplot(t1, aes(x = "Size", y = prop.table(row), fill = f_size)) +
geom_col() +
geom_text(aes(label = as.numeric(f_size)),
position = position_stack(vjust = 0.5)) +
theme_economist(base_size = 14) +
scale_fill_economist() +
theme(legend.position = "right",
legend.title = element_blank()) +
theme(axis.title.y = element_text(margin = margin(r = 20))) +
ylab("Percentage") +
xlab(NULL)
How can I modify my code to show the share of medium, small and micro companies in the middle of each of the three filled segments of the bar?
Thanks in advance!
Your question isn't quite clear to me, and I suggest you rephrase it. But I believe you're trying to get the annotations accurately aligned on the y-axis. For this, pre-calculate the labels and then use annotate():
library(data.table)
library(ggplot2)
set.seed(3432)
df <- data.table(
cat= sample(LETTERS[1:3], 1000, replace = TRUE)
, x= rpois(1000, lambda = 5)
)
tmp <- df[, .(pct= sum(x) / sum(df[,x])), cat][, cumsum := cumsum(pct)]
ggplot(tmp, aes(x= 'size', y= pct, fill= cat)) + geom_bar(stat='identity') +
annotate('text', y= tmp[,cumsum] - 0.15, x= 1, label= as.character(tmp[,pct]))
But this is a poor choice graphically. Stacked bar charts, by definition, sum to 100%. Rather than labeling the components with text, let the graphic do this for you via the axis labels:
ggplot(tmp, aes(x= cat, y= pct, fill= cat)) + geom_bar(stat='identity') + coord_flip() +
scale_y_continuous(breaks= seq(0,1,.05))
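Another option, closer to the question's own geom_text(position = position_stack(vjust = 0.5)) call, is to summarize the shares first (one row per size class instead of one row per company) and let position_stack() centre each label in its segment, which avoids hand-tuned offsets such as - 0.15. A sketch, assuming a hypothetical `t1_demo` with a `size` column like the question's t1:

```r
library(ggplot2)

# Hypothetical stand-in for t1: one row per company
t1_demo <- data.frame(size = rep(c("medium", "small", "micro"), times = c(5, 12, 23)))

# One row per size class with its share of all companies
size_f <- factor(t1_demo$size, levels = c("medium", "small", "micro"))
shares <- as.data.frame(prop.table(table(size = size_f)), responseName = "share")

p <- ggplot(shares, aes(x = "Size", y = share, fill = size)) +
  geom_col() +
  # vjust = 0.5 places each label halfway up its own segment
  geom_text(aes(label = scales::percent(share)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Percentage")
```

Because the labels come from the summarized shares, there is exactly one label per segment, and the stacking positions are computed by ggplot2 rather than by hand.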

Plot standard deviation

I want to plot the standard deviation for one line (one flow series; the plot will have two), either as lines or as a smooth shaded area. I've seen and applied code from other standard deviation examples, but it's not working for me.
My original data has several flow values for each day, from which I've calculated the daily mean and sd. I'm stuck here: I don't know whether I can represent the daily sd with lines from the column I created, called "sd", or whether I should use the original data.
The code below is a general example of what I'll apply to my data. flow, flow1 and sd stand in for the daily means and sd calculated from the original data.
library(gridExtra)
library(ggplot2)
library(grid)
x <- data.frame(
date = seq(as.Date("2012-01-01"),as.Date("2012-12-31"), by="week"),
rain = sample(0:20,53,replace=T),
flow1 = sample(50:150,53,replace=T),
flow = sample(50:200,53,replace=T),
sd = sample (0:10,53, replace=T))
g.top <- ggplot(x, aes(x = date, y = rain, ymin=0, ymax=rain)) +
geom_linerange() +
scale_y_continuous(limits=c(22,0),expand=c(0,0), trans="reverse")+
theme_classic() +
theme(plot.margin = unit(c(5,5,-32,6),units="points"),
axis.title.y = element_text(vjust = 0.3))+
labs(y = "Rain (mm)")
g.bottom <- ggplot(x, aes(x = date)) +
geom_line(aes(y = flow, colour = "flow")) +
geom_line(aes(y = flow1, colour = "flow1")) +
stat_summary(geom="ribbon", fun.ymin="min", fun.ymax="max", aes(fill=sd), alpha=0.3) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
grid.arrange(g.top, g.bottom , heights = c(1/5, 4/5))
The above code gives Error: stat_summary requires the following missing aesthetics: y
Another option is geom_smooth, but as far as I understand it requires some kind of line equation or model (I may be wrong, I'm new to R).
Something like this, maybe (this uses dplyr and tidyr):
library(dplyr)
library(tidyr)
g.bottom <- x %>%
select(date, flow1, flow, sd) %>%
gather(key, value, c(flow, flow1)) %>%
mutate(min = value - sd, max = value + sd) %>%
ggplot(aes(x = date)) +
geom_ribbon(aes(ymin = min, ymax = max, fill = key)) +
geom_line(aes(y = value, colour = key)) +
scale_fill_manual(values = c("grey", "grey")) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
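For the question's underlying situation (several readings per day), the daily mean and sd can be computed first and the ribbon drawn for a single series only. A sketch, assuming a hypothetical `raw_flow` data frame with one row per reading (columns `date` and `flow`); the band spans mean ± 1 sd:

```r
library(ggplot2)

# Hypothetical raw data: six flow readings per day for ten days
set.seed(42)
raw_flow <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 10), each = 6),
  flow = runif(60, min = 50, max = 150)
)

# Daily mean and standard deviation (aggregate() returns rows sorted by date)
daily <- aggregate(flow ~ date, data = raw_flow, FUN = mean)
daily$sd <- aggregate(flow ~ date, data = raw_flow, FUN = sd)$flow

p <- ggplot(daily, aes(x = date)) +
  # Shaded band from mean - sd to mean + sd, drawn for this series only
  geom_ribbon(aes(ymin = flow - sd, ymax = flow + sd), fill = "grey70", alpha = 0.5) +
  geom_line(aes(y = flow)) +
  theme_classic() +
  labs(x = "Date", y = "River flow")
```

A second series without a band can be added with another geom_line() layer, as in the question's g.bottom.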

ggplot fill variable to add to 100%

Here is a dataframe
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
Value = sample(c('Agree', 'Disagree', 'Strongly agree', 'Strongly disagree'), 50, replace = TRUE))
I have created this graph.
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/sum(..count..))) +
geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
Is there a way to make the bars for each school year add up to 100% without stacking them?
I know this question is similar to Create stacked barplot where each stack is scaled to sum to 100%, but I don't want the graph to be stacked, and I can't figure out how to apply that solution to this situation. I would also prefer not to summarize the data before plotting, as I have to make this graph many times with different data and would rather not summarize it each time.
I'm not sure how to create the plot that you want without transforming the data. But if you want to re-use the same code for multiple datasets, you can write a function to transform your data and generate the plot at the same time:
plot.fun <- function (original.data) {
newDF <- reshape2::melt(apply(table(original.data), 1, prop.table))
Plot <- ggplot(newDF, aes(x=Value, y=value)) +
geom_bar(aes(fill=SchoolYear), stat="identity", position="dodge") +
geom_text(aes(group=SchoolYear, label=scales::percent(value)), stat="identity", vjust=-0.25, size=2, position=position_dodge(width=0.9)) +
scale_y_continuous(labels=scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return (Plot)
}
plot.fun(DF)
Big Disclaimer: I would highly recommend you summarize your data beforehand and not try to do these calculations within ggplot. That is not what ggplot is meant to do. Furthermore, doing so not only complicates your code unnecessarily, but can easily introduce bugs or unintended results.
Given that, it appears that what you want is doable (without summarizing first). A very hacky way to get what you want by doing the calculations within ggplot would be:
#Store factor values
fac <- unique(DF$SchoolYear)
ggplot(DF, aes(x = Value, fill = SchoolYear)) +
geom_bar(position = 'dodge', aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))) +
geom_text(aes(y = (..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum), label = scales::percent((..count..)/stats::ave(..count.., get("fac", globalenv()), FUN = sum))),
stat = "count", vjust = -0.25, size = 2, position = position_dodge(width = 0.9)) +
scale_y_continuous(labels = scales::percent) +
ylab("Percent") + xlab("Response") +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
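This takes the ..count.. variable and divides it by the sum within its respective group using stats::ave(). It only works because fac, which has just two values, is recycled along ..count.., so the computed counts must alternate between the two school years in the same order as fac. This can break extremely easily.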
Finally, we check that the plot is in fact giving us what we want.
#Check to see we have the correct values
library(data.table)
# as.data.table() makes a copy, so DF itself is not converted by reference
d2 <- as.data.table(DF)[, .(count = .N), by = .(SchoolYear, Value)][, percent := count/sum(count), by = SchoolYear]
d2
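A possible alternative that avoids both pre-summarizing and the stats::ave() hack: stat_count() also computes a within-group proportion, prop (count divided by the group total). Setting group = SchoolYear makes each school year one group, so after_stat(prop) sums to 1 within each year while position_dodge() keeps the bars side by side. A sketch using the question's DF (after_stat() needs ggplot2 3.3.0 or later):

```r
library(ggplot2)

set.seed(1)
DF <- data.frame(SchoolYear = c("2015-2016", "2016-2017"),
                 Value = sample(c("Agree", "Disagree", "Strongly agree", "Strongly disagree"),
                                50, replace = TRUE))

p <- ggplot(DF, aes(x = Value, fill = SchoolYear, group = SchoolYear)) +
  # prop is computed per group, i.e. per school year here
  geom_bar(aes(y = after_stat(prop)), position = position_dodge(width = 0.9)) +
  geom_text(aes(y = after_stat(prop), label = after_stat(scales::percent(prop))),
            stat = "count", position = position_dodge(width = 0.9),
            vjust = -0.25, size = 2) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Response", y = "Percent") +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
```

Since the proportions come from the stat, the same code can be reused on new data without a separate summarizing step.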

R - ggplot2: geom_area loses its fill if limits are defined to max and min values from a data.frame

I am trying to reproduce a sparkline with ggplot2 like the one at the bottom of this image:
Using the following code I get the result displayed at the end of the code.
Note: My actual data.frame has only 2 rows. Therefore the result looks like a single line.
# Create sparkline for MM monthly
# sparkline(dailyMM2.aggregate.monthly$Count, type = 'line')
p <- ggplot(dailyMM2.aggregate.monthly, aes(x=seq(1:nrow(dailyMM2.aggregate.monthly)), y=Count)) +
geom_area(fill="#83CAF5") +
geom_line(color = "#2C85BB", size = 1.5) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0))
p + theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
However, since I only want to show trends with the sparkline, absolute values aren't relevant to me, so I want to limit the visible y range to roughly the min and max of my y values. I do it using the limits option:
# Create sparkline for MM monthly
# sparkline(dailyMM2.aggregate.monthly$Count, type = 'line')
p <- ggplot(dailyMM2.aggregate.monthly, aes(x=seq(1:nrow(dailyMM2.aggregate.monthly)), y=Count)) +
geom_area(fill="#83CAF5") +
geom_line(color = "#2C85BB", size = 1.5) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0), limits = c(min(dailyMM2.aggregate.monthly$Count)-100, max(dailyMM2.aggregate.monthly$Count)+100))
p + theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
However, the result is not as expected: the whole geom_area fill disappears, as shown in the following image:
Can anyone shed light why this behaviour is happening and maybe help me with a proper way to solve this problem?
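If you check ?geom_area you will note that the minimum (ymin) is fixed to 0. Limits set on a scale replace every value outside them with NA by default, so once the lower limit is above 0 the area loses its lower edge and is not drawn. It might be easier to use geom_ribbon, which has a ymin aesthetic. Set the maximum y value using limits or coord_cartesian (which zooms without discarding data).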
library(reshape2)
library(ggplot2)
# Some data
df=data.frame(year = rep(2010:2014, each = 4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
da=c(46,47,51,50,56.3,53.6,55.8,58.9,61.0,63,58.8,62.5,59.5,61.7,60.6,63.9,68.4,62.2,62,70.4))
df.m <- melt(data = df, id.vars = c("year", "quarter"))
ymin <- min(df.m$value)
ymax <- max(df.m$value)
ggplot(data = df.m, aes(x = interaction(quarter,year), ymax = value, group = variable)) +
geom_ribbon(aes(ymin = ymin), fill = "#83CAF5") +
geom_line(aes(y = value), size = 1.5, colour = "#2C85BB") +
coord_cartesian(ylim = c(ymin, ymax)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_discrete(expand = c(0,0)) +
theme_void()
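The difference between the two ways of restricting the y range can be checked directly. A sketch with a hypothetical `spark` data frame standing in for dailyMM2.aggregate.monthly: with scale limits, the area's ymin of 0 falls outside the limits and is replaced by NA, which is why the fill vanishes; with coord_cartesian(), the same geom_area() keeps its data and the plot is only zoomed.

```r
library(ggplot2)

# Hypothetical monthly counts standing in for dailyMM2.aggregate.monthly
spark <- data.frame(month = 1:6, Count = c(520, 610, 580, 700, 660, 740))
lims <- c(min(spark$Count) - 100, max(spark$Count) + 100)

base <- ggplot(spark, aes(x = month, y = Count)) +
  geom_area(fill = "#83CAF5") +
  geom_line(color = "#2C85BB") +
  scale_x_continuous(expand = c(0, 0)) +
  theme_void()

# Scale limits censor out-of-range values (the area's ymin of 0) to NA
p_scale <- base + scale_y_continuous(expand = c(0, 0), limits = lims)

# Coordinate limits zoom in without touching the data
p_coord <- base + scale_y_continuous(expand = c(0, 0)) +
  coord_cartesian(ylim = lims)
```

coord_cartesian() is the smallest change to the question's own code; the geom_ribbon() approach above instead puts the floor of the band at the data minimum.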
