I want to use ggplot2 to visualize the reference ranges of several liver enzymes (for example GOT and GPT) that I calculated with two programs, "kosmic" and "RLE".
I do not understand why the bars always start at 0, even when the lower limit is, for example, 16.02.
How do I need to change my code so that each bar spans from its minimum to its maximum value, like this:
[16.02,45.46] [9.16,60.52] [16.10,68.90] and [9.30,64.40].
Thank you in advance!
#install.packages("ggplot2")
library(ggplot2)
program <- c(rep("kosmic",4),rep("RLE",4))
value <- c(16.02,45.46,9.16,60.52,16.1,48.9,9.3,64.4)
parameter <- c(rep("GOT",2),rep("GPT",2),rep("GOT",2),rep("GPT",2))
table1 <- data.frame(program,value,parameter)
p <- ggplot(table1, aes(parameter,value, fill = program))+
geom_bar(position="dodge", stat="identity")
print(p)
I am looking for something like this:
Are you looking for something like this?
library(dplyr)
table1 %>%
group_by(parameter, program) %>%
summarize(min = min(value),
median = median(value),
max = max(value), .groups = "drop") %>%
ggplot(aes(interaction(parameter,program), fill = program))+
geom_tile(aes(y = median, height = max-min), width = 0.6)
Edit:
Okay this is hacky, but:
library(forcats) # provides fct_relevel()
table1 %>%
# example of reordering the parameters
mutate(parameter = fct_relevel(parameter, "GPT", after = 0)) %>%
# forcats offers a variety of fct_*** functions to change factors
# (factors are a data type that can separately store labels and ordering)
group_by(parameter, program) %>%
summarize(min = min(value),
median = median(value),
mean = mean(value),
max = max(value), .groups = "drop") %>%
ggplot(aes(parameter, mean, color = program))+
geom_errorbar(aes(ymin = min, ymax = max),
position = position_dodge(width = 0.3), size = 10,
width = 0) +
# control the legend so the key squares aren't gigantic to match the error bar widths
guides(colour = guide_legend(override.aes = list(size=8))) +
# example of assigning different colors.
# a variety of scale_color_* functions are available
scale_color_manual(values = c("kosmic" = "#cc5588", "RLE" = "#779988"))
A downside of this is that the width and spacing of the bars depend on the aspect ratio of your graphics output, so it might take some fiddling to get the look you want.
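If the dependence on aspect ratio is a concern, one possible alternative (a sketch, not part of the answers above) is geom_crossbar(), whose width is given in x-axis units rather than as a line thickness. It also makes the underlying point explicit: geom_bar(stat = "identity") always draws from 0 up to the value, so a floating bar needs separate ymin and ymax columns. The column names lower, upper and mid are my own, and the horizontal line that geom_crossbar() draws at mid is only the midpoint of the range, not a computed statistic.

```r
library(ggplot2)
library(dplyr)

table1 <- data.frame(
  program   = c(rep("kosmic", 4), rep("RLE", 4)),
  value     = c(16.02, 45.46, 9.16, 60.52, 16.1, 48.9, 9.3, 64.4),
  parameter = c(rep("GOT", 2), rep("GPT", 2), rep("GOT", 2), rep("GPT", 2))
)

# One row per parameter/program pair with explicit lower and upper limits
ranges <- table1 %>%
  group_by(parameter, program) %>%
  summarize(lower = min(value),
            upper = max(value),
            mid   = (lower + upper) / 2,
            .groups = "drop")

# Each box spans lower..upper; width is measured along the x axis
p <- ggplot(ranges, aes(parameter, mid, fill = program)) +
  geom_crossbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = 0.6),
                width = 0.5) +
  labs(y = "value")
print(p)
```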
Based on what you want, I'd suggest a box plot instead of a bar plot:
ggplot(table1, aes(x = parameter, y = value, fill = program, color = program)) +
geom_point(position = position_jitterdodge()) +
geom_boxplot(outlier.shape = NA, color = 'black')
So I have the following code which produces:
The issue here is twofold:
1. The grouped bar chart automatically places the highest value on top (i.e. for avenue 4, CTP is on top), whereas I always want FTP to be shown first and CTP after it (so always the blue bar, then the red bar).
2. I need all of the values to scale to 100 or 100% within their respective group (so for CTP, avenue 4 would have a huge bar but the other avenues should be extremely tiny).
I am new to R and Stack Overflow, so sorry if anything is wrong or you need more information; any help is greatly appreciated.
library(ggplot2)
library(tidyverse)
library(magrittr)
# function to specify decimals
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
# sample data
avenues <- c("Avenue1", "Avenue2", "Avenue3", "Avenue4")
flytip_amount <- c(1000, 2000, 1500, 250)
collection_amount <- c(5, 15, 10, 2000)
# create data frame from the sample data
df <- data.frame(avenues, flytip_amount, collection_amount)
# got it working - now to test
df3 <- df
SumFA <- sum(df3$flytip_amount)
df3$FTP <- (df3$flytip_amount/SumFA)*100
df3$FTP <- specify_decimal(df3$FTP, 1)
SumCA <- sum(df3$collection_amount)
df3$CTP <- (df3$collection_amount/SumCA)*100
df3$CTP <- specify_decimal(df3$CTP, 1)
# Now we have percentages remove whole values
df2 <- df3[,c(1,4,5)]
df2 <- df2 %>% pivot_longer(-avenues)
FTGraphPos <- df2$name
ggplot(df2, aes(x = avenues, fill = as.factor(name), y = value)) +
geom_col(position = "dodge", width = 0.75) + coord_flip() +
labs(title = "Flytipping & Collection %", x = "ward_name", y = "Percentageperward") +
geom_text(aes(x= avenues, label = value), vjust = -0.1, position = "identity", size = 5)
I have tried the above and looked at lots of tutorials, but nothing does exactly what I need: keeping the bars within each group in the same order regardless of their values, and scaling to 100/100%.
As Camille notes, to handle ordering of the categories in a plot, you need to set them as factors, and then use functions from the forcats package to handle the order. Here I am using fct_relevel() (note that it will automatically convert character variables to factors).
Your numeric values are in fact set to character, so they need to be set to numeric for the chart to make sense.
To cover point #2, I'm using group_by() to calculate percentages within each name.
I have also fixed the labels so that they are properly dodged along with the bars. Also, note that you don't need to load ggplot2 or magrittr separately if you are loading tidyverse: ggplot2 is attached as part of it, and the %>% pipe is re-exported by packages such as dplyr.
df_plot <- df2 |>
mutate(name = fct_relevel(name, "CTP"),
value = as.numeric(value)) |>
group_by(name) |>
mutate(perc = value / sum(value)) |>
ungroup()
ggplot(df_plot, aes(x = value, y = avenues, fill = name)) +
geom_col(position = "dodge", width = 0.75) +
geom_text(aes(label = value), position = position_dodge(width = 0.75), size = 5) +
labs(title = "Flytipping & Collection %", x = "Percentageperward", y = "ward_name") +
guides(fill = guide_legend(reverse = TRUE))
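As a small illustration of why the conversion to numeric matters: the specify_decimal() helper from the question returns character strings (format() always does), and character values are ordered alphabetically rather than by magnitude. The input values below are made up for the demonstration.

```r
# Helper copied from the question: rounds, then formats as text
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall = k))

pct <- specify_decimal(c(5.263, 21.053), 1)  # made-up example values
class(pct)   # character, not numeric
sort(pct)    # alphabetical order: "21.1" comes before "5.3"

# Converting back gives values that sort and plot by magnitude
pct_num <- as.numeric(pct)
class(pct_num)
sort(pct_num)
```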
I am new to R and I have a problem adding a text label to each point in the x-y plane. Assume that I have the data frame below:
library (dplyr)
library(ggplot2)
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Lunch","Lunch","Lunch","Dinner","Dinner","Dinner"), levels=c("Breakfast","Lunch","Dinner")),
total_bill_x = c(12.75,14.89,20.5,17.23,30.3,27.8,20.7,32.3,25.4), total_bill_y= c(20.75,15.29,18.52,19.23,27.3,23.6,19.75,27.3,21.48)
)
and here is my code:
dat %>%
group_by(time) %>%
summarise(
x = sum(total_bill_x),
y = sum(total_bill_y)
)%>%
ggplot(.,aes(x,y, col = time)) +
geom_point()
I know that we should use geom_text(), but I don't know which argument to pass so that it shows which point represents breakfast, lunch, or dinner.
Any help for this would be much appreciated.
You can use geom_text(aes(label = time), nudge_y = 0.5). nudge_y shifts the labels vertically; to shift them horizontally, use nudge_x.
dat %>%
group_by(time) %>% # group your data
summarise(
x = sum(total_bill_x),
y = sum(total_bill_y) # sum per group (note: this is a sum, not a median)
)%>%
ggplot(.,aes(x,y, col = time)) +
geom_point() +
geom_text(aes(label = time), nudge_y = 0.5)
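Because nudge_y is expressed in data units, 0.5 may be too small or too large on other scales. A possible variation (a sketch, assuming the same dat as in the question; the name totals is my own) is to use vjust, which offsets each label relative to its own height instead:

```r
library(dplyr)
library(ggplot2)

dat <- data.frame(
  time = factor(rep(c("Breakfast", "Lunch", "Dinner"), each = 3),
                levels = c("Breakfast", "Lunch", "Dinner")),
  total_bill_x = c(12.75, 14.89, 20.5, 17.23, 30.3, 27.8, 20.7, 32.3, 25.4),
  total_bill_y = c(20.75, 15.29, 18.52, 19.23, 27.3, 23.6, 19.75, 27.3, 21.48)
)

# One row per meal with the summed bills
totals <- dat %>%
  group_by(time) %>%
  summarise(x = sum(total_bill_x), y = sum(total_bill_y))

p <- ggplot(totals, aes(x, y, col = time)) +
  geom_point() +
  # vjust = -1 places each label just above its point, whatever the y scale
  geom_text(aes(label = time), vjust = -1, show.legend = FALSE)
print(p)
```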
I am trying to find the accumulated values for each year of variables A to Z in myData. I have tried a few things but didn't succeed. Once I do that, I would then need to compute the maximum, minimum, median, and upper and lower quartile average across all those years. Here is my laborious code so far, but I don't have any idea how to proceed further; in fact, the current code is not giving me what I am after either.
library(tidyverse)
mydate <- as.data.frame(seq(as.Date("2000-01-01"), to= as.Date("2019-12-31"), by="day"))
colnames(mydate) <- "Date"
Data <- data.frame(A = runif(7305,0,10),
J = runif(7305,0,8),
X = runif(7305,0,12),
Z = runif(7305,0,10))
DF <- data.frame(mydate, Data)
myData <- DF %>% separate(Date, into = c("Year","Month","Day")) %>%
sapply(as.numeric) %>%
as.data.frame() %>%
mutate(Date = DF$Date) %>%
filter(Month > 4 & Month < 11) %>%
mutate(DOY = format(Date, "%j")) %>%
group_by(Year) %>%
mutate(cumulativeSum = accumulate(DOY))
I am trying to get a figure like the one below for A, J, X, Z. Any help would be appreciated.
Update (EDIT)
My question is pretty confusing, so I decided to break it down into steps using Excel. Here I am using only one variable, A (note: in my question I have multiple variables). In step 1, I accumulate the data from May to October of each year, which is reflected in the cumulative sum column. In step 2, I rearrange the data by day of the year (May to October). In step 3, I take the statistics I mentioned earlier across all the years for every day of the year. I have tried to clarify as much as I could, but this is probably a somewhat strange question.
Ultimate Figure
Here is an example figure that I would like to produce as a result of this exercise.
So, if I understand correctly, you are trying to plot descriptive statistics of the cumulative values of each variable between May and October for the years 2000 to 2019.
Here is a possible solution that first calculates the descriptive statistics of each variable (using the dplyr, lubridate, and tidyr packages). I encourage you to break this code into several parts in order to understand each step.
Basically, I extract the month and year from the date, pivot the data frame into a longer format, filter to keep only the values in the period of interest (May to October), and calculate the cumulative sum of the values grouped by variable and year. Then, I create a fake date (by pasting a fixed year together with the real month and day) in order to calculate descriptive statistics as a function of this date and the variable.
Altogether, it gives something like this:
library(lubridate)
library(dplyr)
library(tidyr)
mydata <- DF %>% mutate(Year = year(Date), Month = month(Date)) %>%
pivot_longer(-c(Date,Year,Month), names_to = "variable", values_to = "values") %>%
filter(between(Month,5,10)) %>%
group_by(Year, variable) %>%
mutate(Cumulative = cumsum(values)) %>%
mutate(NewDate = ymd(paste("2020", Month,day(Date), sep = "-"))) %>%
ungroup() %>%
group_by(variable, NewDate) %>%
summarise(Median = median(Cumulative),
Maximum = max(Cumulative),
Minimum = min(Cumulative),
Upper = quantile(Cumulative,0.75),
Lower = quantile(Cumulative, 0.25))
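The "fake date" step can be looked at in isolation. The sketch below (with made-up dates, and new_date as an illustrative name) shows how days from different years collapse onto one common axis, which is what allows group_by(variable, NewDate) to pool the same calendar day across years. Using 2020 as the placeholder year has the side benefit that it is a leap year, so 29 February would still parse.

```r
library(lubridate)

# Made-up dates from different years
dates <- as.Date(c("2001-05-01", "2015-05-01", "2004-02-29"))

# Replace the real year with a fixed placeholder year
new_date <- ymd(paste("2020", month(dates), day(dates), sep = "-"))
new_date

# The first two dates now fall on the same day of the common year
new_date[1] == new_date[2]
```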
Then, you can get a similar plot to your example by doing:
library(ggplot2)
ggplot(mydata, aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper), color = "grey", alpha =0.5)+
geom_line(aes(y = Median), color = "darkblue")+
geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
ylab("Daily Cumulative Precipitation (mm)")
Does it look like what you are trying to achieve?
EDIT: Adding Legends
Adding a legend here is not easy, as you are using different geoms (ribbon, line) with different colors, shapes, and so on.
So, one way is to group the statistics that can be plotted with the same geom and do:
mydata %>% pivot_longer(cols = c(Median, Minimum,Maximum), names_to = "Statistic",values_to = "Value") %>%
ggplot(aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = "Upper / Lower"), alpha =0.5)+
geom_line(aes(y = Value, color = Statistic, linetype = Statistic, size = Statistic))+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
ylab("Daily Cumulative Precipitation (mm)")+
scale_size_manual(values = c(1.5,1,1.5))+
scale_linetype_manual(values = c("dashed","solid","dashed"))+
scale_color_manual(values = c("red","darkblue","red"))+
scale_fill_manual(values = "grey", name = "")
It looks good, but as you can see, it's a little bit odd because the Upper/Lower entry sits slightly apart from the main legend.
Another solution is to add the legend as labels at the last date. For that, you can create a second data frame by keeping only the last date of your first data frame:
mydata_label <- mydata %>% filter(NewDate == max(NewDate)) %>%
pivot_longer(cols = Median:Lower, names_to = "Stat",values_to = "val")
Then, without changing much of the plotting code, you can do:
ggplot(mydata, aes(x = NewDate))+
geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha =0.5)+
geom_line(aes(y = Median), color = "darkblue")+
geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
facet_wrap(~variable, scales = "free")+
scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month", limits = c(min(mydata$NewDate),max(mydata$NewDate)+25))+
ylab("Daily Cumulative Precipitation (mm)")+
geom_text(data = mydata_label,
aes(x = NewDate+5, y = val, label = Stat, color = Stat), size = 2, hjust = 0, show.legend = FALSE)+
scale_color_manual(values = c("Median" = "darkblue","Maximum" = "red","Minimum" = "red","Upper" = "black", "Lower" = "black"))
I deliberately reduced the size of the text labels because of space constraints, so that you can see all of them. Based on the figure you attached to your question, though, you should have plenty of space to make it work.
I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points (in each group). That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two more boxplots for the points between the 26th and the 50th, and so on. If they are not in a perfect grid (which I suppose would be both more challenging to obtain and uglier), even better: I would prefer them to "follow" their corresponding smooth line!
All of that without using facets (because I have to insert them in a plot which is already faceted :-))
I tried:
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups and even loses the colors (and adding a redundant call including color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that setting group overrides colour in geom_boxplot, so it doesn't respect either the inherited or repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive, and it might be worth raising it on GitHub, since usually we expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
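A possible variation on the same idea, offered as a sketch rather than a verified fix: interaction() can build the combined grouping directly inside aes(), which avoids the extra group_cut column. The set.seed() call is my own addition (only for reproducibility), and geom_smooth() is left out to keep the example short.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # only to make the random example reproducible
bar <- data.frame(
  time = 1:100,
  group = factor(sample(c("a", "b"), 100, replace = TRUE))
) %>%
  mutate(
    value = rnorm(n()) + 10 * as.integer(group),
    # same block definition as in the answer above: 25 time points per block
    cut = 12.5 * ((time - 1) %/% 25)
  )

p <- ggplot(bar, aes(x = time, y = value, colour = group)) +
  geom_point() +
  # one box per group/cut combination, built on the fly
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
print(p)
```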
About 18 months ago, this helpful exchange appeared, with code to show how to produce a plot of median along with interquartile ranges. Here's the code:
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymin = function(z) {quantile(z,0.25)},
fun.ymax = function(z) {quantile(z,0.75)},
fun.y = median)
Producing this plot:
What I'm wondering is how to add labels for the median and interquartile ranges, and how to format the bar (color, alpha, etc.). I tried saving the plot as an object to see if it contained elements I could then use to call formatting functions, but nothing was obvious when I inspected it in the RStudio IDE.
Is this even doable? I know I can do a boxplot, but that would have to include the min/max. I'd like to produce boxplots with just the mean/median and IQR.
You can change the formatting as you would for any ggplot layer; see the docs for "Vertical intervals: lines, crossbars & errorbars" in this case. An example of this is the following:
library(ggplot2)
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymin = function(z) {quantile(z,0.25)},
fun.ymax = function(z) {quantile(z,0.75)},
fun.y = median,
size = 4, # <- adjusts size
colour = "red", # <- adjusts colour
alpha = .3) # <- adjusts transparency
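Note that in ggplot2 3.3.0 and later the fun.y, fun.ymin and fun.ymax arguments are deprecated in favour of fun, fun.min and fun.max. A minimal sketch of the same plot with the newer argument names, written here with stat_summary() (which should be equivalent to the geom_pointrange(stat = "summary") form above):

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = cut, y = depth)) +
  stat_summary(
    geom    = "pointrange",
    fun     = median,                          # point: median
    fun.min = function(z) quantile(z, 0.25),   # lower end: first quartile
    fun.max = function(z) quantile(z, 0.75),   # upper end: third quartile
    colour  = "red",
    alpha   = 0.3
  )
print(p)
```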
If you want to control the formatting of the points and lines individually, you need to do as @camille suggests and pre-process your data, because geom_pointrange() draws a single graphical object, so the points and lines are one and the same.
I would suggest something like this:
library(dplyr)
library(ggplot2)
diamonds %>%
group_by(cut) %>%
summarise(median = median(depth),
lq = quantile(depth, 0.25),
uq = quantile(depth, 0.75)) %>%
ggplot(aes(cut, median)) +
geom_linerange(aes(ymin=lq, ymax=uq), size = 4, colour = "blue", alpha = .4) +
geom_point(size = 10, colour = "red", alpha = .8)
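The question also asked about labels. With the pre-summarised data this becomes an ordinary geom_text() layer. The sketch below labels only the median; the data frame name iqr and the offsets (nudge_x, hjust) are arbitrary choices of mine that would likely need tuning.

```r
library(dplyr)
library(ggplot2)

# Median and quartiles of depth for each cut
iqr <- diamonds %>%
  group_by(cut) %>%
  summarise(median = median(depth),
            lq = quantile(depth, 0.25),
            uq = quantile(depth, 0.75))

p <- ggplot(iqr, aes(cut, median)) +
  geom_linerange(aes(ymin = lq, ymax = uq), colour = "blue", alpha = 0.4) +
  geom_point(colour = "red", alpha = 0.8) +
  # print the median value to the right of each point
  geom_text(aes(label = median), nudge_x = 0.15, hjust = 0)
print(p)
```

Labels for the quartiles could be added the same way, with further geom_text() layers that map y and label to lq and uq.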