Reshape and plot 3 quantities with common Label (geom_text) - r

My data is as follows:
k2=structure(list(Hour = c("17:02:00", "17:04:00", "17:07:00", "17:13:00",
"17:14:00", "17:17:00", "17:19:00", "17:22:00", "17:28:00", "17:29:00"
), Cat1 = c(300L, 304L, 272L, 171L, 271L, 376L, 284L, 177L, 218L,
284L), Cat2 = c(15L, 45L, 36L, 31L, 36L, 26L, 26L, 32L, 46L,
32L), Cat3 = c(850L, 1073L, 612L, 537L, 709L, 929L, 870L, 452L,
474L, 696L), Label = c("BA", "EL", "BA", "CI", "MO",
"BA", "EL", "BA", "CI", "RO")), .Names = c("Hour",
"Cat1", "Cat2", "Cat3", "Label"), row.names = c("163", "164",
"165", "167", "168", "169", "170", "171", "173", "174"), class = "data.frame")
I'm stuck on a simple plotting question: X should show time, and Y should stack the quantities of Cat1, Cat2 and Cat3. For a given time, the three quantities share the same Label.
I reshaped my data as follows, but the result is not right: geom_text draws a Label for each category, even though all three have the same Label.
k2$Hour=format(k2$Hour, format='%H:%M:%S' )
meltk2 = melt(k2, id = c("Hour","Label"))
meltk2$Hour <- as.POSIXct(paste("2012-11-03", meltk2$Hour, "CEST"))
ggplot(meltk2, aes(x=Hour , y = value, group = Hour, colour = variable)) +
geom_bar(stat = "identity") +
scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M:%S")) +
geom_text(aes(label = as.character(Label)), position = position_dodge(width = 0.8), vjust = -0.6)
What is the cleanest way to do this?

Do you mean you want only 1 common label visible at each timestamp? Changing geom_text's y value to a number would set all 3 labels to the same location, effectively showing only one:
ggplot(meltk2, aes(x=Hour, y = value, group = Hour, fill = variable)) +
geom_bar(stat = "identity") +
scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M:%S")) +
geom_text(aes(label = as.character(Label)), y=0, vjust = 1)
(I set y=0 to position the labels at the bottom. You may want to pick some other height.)
If you don't want the bars to be stacked (as per PoGibas' answer):
ggplot(meltk2, aes(x=Hour, y = value, fill = variable)) +
geom_bar(stat = "identity", position="dodge") +
scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M:%S")) +
geom_text(aes(label = as.character(Label)), y=0, vjust = 1)
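An alternative to overplotting three identical labels is to give geom_text its own data frame with one row per timestamp. The following is only a minimal sketch under assumptions: the toy data frame `long` stands in for meltk2 (two timestamps taken from the question), and the names `long`, `labels` and `total` are made up for illustration.

```r
library(ggplot2)

# Toy data in the same long shape as meltk2 (first two timestamps of the question).
long <- data.frame(
  Hour = as.POSIXct(rep(c("2012-11-03 17:02:00", "2012-11-03 17:04:00"), each = 3),
                    tz = "UTC"),
  Label = rep(c("BA", "EL"), each = 3),
  variable = rep(c("Cat1", "Cat2", "Cat3"), times = 2),
  value = c(300, 15, 850, 304, 45, 1073)
)

# Height of each stacked bar, repeated on every row of that timestamp.
long$total <- ave(long$value, long$Hour, FUN = sum)

# Keep a single row per timestamp so each label is drawn exactly once.
labels <- unique(long[, c("Hour", "Label", "total")])

p <- ggplot(long, aes(x = Hour)) +
  geom_col(aes(y = value, fill = variable)) +
  geom_text(data = labels, aes(y = total, label = Label), vjust = -0.6)
```

Printing `p` should draw one label just above each stacked bar; using `y = 0, vjust = 1` in the text layer instead would place it below the bar, as in the answer above.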

Related

change line colour ggplot (geom_line)

I'm still very new to ggplot, but I've created the following graphic, in which I would like to switch the colors blue and red. It should be simple enough, but I cannot figure it out.
df <- structure(list(Sex = c("M", "M", "M", "M", "M", "M", "M", "W",
"W", "W", "W", "W", "W", "W"), age_cat = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("<40",
"41-50", "51-60", "61-70", "71-80", "81-90", "90+"), class = "factor"),
DD = c(42L, 88L, 289L, 558L, 527L, 174L, 22L, 27L, 36L, 206L,
347L, 321L, 160L, 29L), pop = c(36642L, 16327L, 20232L, 18068L,
14025L, 5555L, 1293L, 35887L, 16444L, 20178L, 17965L, 14437L,
7150L, 2300L), proportion = c(0.114622564270509, 0.538984504195504,
1.428430209569, 3.08833296435687, 3.75757575757576, 3.13231323132313,
1.7014694508894, 0.0752361579402012, 0.218924835806373, 1.02091386658737,
1.9315335374339, 2.22345362609961, 2.23776223776224, 1.26086956521739
), lower = c(0.082621962613957, 0.432499099174075, 1.26946115577044,
2.8408823120445, 3.44891013496043, 2.69000601596679, 1.06929480146528,
0.0495867729368698, 0.153377923598767, 0.886828947142727,
1.73530361873497, 1.98914206124244, 1.90751612365318, 0.846006018532107
), upper = c(0.154905173671422, 0.663628389658291, 1.6015714811397,
3.35102939015014, 4.08561035940466, 3.6247050150149, 2.56476800800746,
0.10944592449968, 0.302956684874059, 1.16937842460687, 2.14353263545661,
2.47728262910991, 2.60770261266057, 1.80583393021473)), row.names = c(NA,
-14L), class = "data.frame")
Below is the script I've used, in which I get (Sex = M) in red and (Sex = W) in blue.
ggplot(data = prevalence2021GGPLOT,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))
How do I make Sex = M blue and Sex = W red?
You can use scale_color_manual to manually change the colours in ggplot2. With an unnamed vector, the first colour corresponds to the first level of the variable mapped to colour (here Sex, so "M" comes before "W").
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+ geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex)))+
scale_color_manual(values=c("blue", "red"))
You can find documentation on ggplot2 graphics in R here: data_to_viz, scale_color_manual.
This works. I pointed the code at df to use your sample data, and edited it per the suggestion.
ggplot(data = df,
aes(x = age_cat, y = proportion, color = Sex))+
geom_point()+
labs(title="Prevalence 2021", y="Prevalence (%)", x="Age category") +
geom_errorbar(aes(ymin=(lower),
ymax=(upper)), width=.2) +
theme_bw()+
geom_line(aes(group = unlist(Sex))) +
scale_color_manual(values=c(M="darkblue", W="darkred"))
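The difference between the two answers is that a named vector ties each colour to a level explicitly, so the order in which the colours are written no longer matters. A small sketch of this, assuming a made-up four-row data frame `toy` and a hypothetical vector name `sex_colours`:

```r
library(ggplot2)

# Tiny stand-in for the prevalence data (values rounded from the question).
toy <- data.frame(
  Sex = c("M", "M", "W", "W"),
  age_cat = c("<40", "41-50", "<40", "41-50"),
  proportion = c(0.11, 0.54, 0.08, 0.22)
)

# Named vector: names are matched to the levels of Sex, so W can be listed first.
sex_colours <- c(W = "red", M = "blue")

p <- ggplot(toy, aes(x = age_cat, y = proportion, colour = Sex, group = Sex)) +
  geom_point() +
  geom_line() +
  scale_colour_manual(values = sex_colours)
```

With an unnamed `c("red", "blue")` the same call would colour M red, because M is the first level.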

How to implement stacked bar graph with a line chart in R

I have a dataset with Year as the x variable and A, B and C(%) as the y variables. I have included the dataset here.
dput(result)
structure(list(Year = 2008:2021, A = c(4L, 22L, 31L, 48L, 54L,
61L, 49L, 56L, 59L, 85L, 72L, 58L, 92L, 89L), B = c(1L, 2L, 6L,
7L, 14L, 21L, 15L, 27L, 27L, 46L, 41L, 26L, 51L, 62L), C... = c(25,
9.09, 19.35, 14.58, 25.93, 34.43, 30.61, 48.21, 45.76, 54.12,
56.94, 44.83, 55.43, 69.66)), class = "data.frame", row.names = c(NA,
-14L))
The variables A and B will be plotted as stacked bar graph and the C will be plotted as line chart in the same plot. I have generated the plot using excel like below:
How can I create the same plot in R?
You first need to reshape longer, for example with pivot_longer() from tidyr, and then you can use ggplot2 to plot the bars and the line in two separate layers. The fill = argument in the geom_bar(aes()) lets you stratify each bar according to a categorical variable - name is created automatically by pivot_longer().
library(ggplot2)
library(tidyr)
dat |>
pivot_longer(A:B) |>
ggplot(aes(x = Year)) +
geom_bar(stat = "identity", aes(y = value, fill = name)) +
geom_line(aes(y = `C(%)`), size = 2)
Created on 2022-06-09 by the reprex package (v2.0.1)
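To make the reshaping step concrete, here is a hedged sketch on the first three rows of the question's data. It assumes the percentage column is really named `C(%)` (the dput output shows `C...`, which is what R produces when it sanitises column names on import), and it draws the line from the unpivoted `dat` so that each year contributes a single point; both choices are mine, not the answerer's.

```r
library(ggplot2)
library(tidyr)

# First three rows of the question's data; check.names = FALSE keeps "C(%)" intact.
dat <- data.frame(
  Year = 2008:2010,
  A = c(4, 22, 31),
  B = c(1, 2, 6),
  `C(%)` = c(25, 9.09, 19.35),
  check.names = FALSE
)

# A and B are stacked into two columns: "name" (A or B) and "value".
long <- pivot_longer(dat, A:B)

p <- ggplot(long, aes(x = Year)) +
  geom_col(aes(y = value, fill = name)) +    # stacked bars for A and B
  geom_line(data = dat, aes(y = `C(%)`))     # line for C on the same axis
```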
You're asking for overlaid bars, in which case there's no need to pivot, and you can add separate layers. However, I would argue that this could confuse or mislead many people: in stacked plots the bars are usually stacked, not overlaid, so tread with caution!
library(ggplot2)
library(tidyr)
dat |>
ggplot(aes(x = Year)) +
geom_bar(stat = "identity", aes(y = A), fill = "lightgreen") +
geom_bar(stat = "identity", aes(y = B), fill = "red", alpha = 0.5) +
geom_line(aes(y = `C(%)`), size = 2) +
labs(y = "", caption = "NB: bars are overlaid, not stacked!")
I propose this:
library(data.table)
library(ggplot2)
library(ggthemes)
dt <- fread("dataset.csv")
dt.long <- melt(dt, id.vars = c("Year"))
dt.AB <- dt.long[variable %in% c("A", "B"), ]
dt.C <- copy(dt.long[variable == "C(%)", .(Year, variable, value = value * 3/2)])
ggplot(dt.AB, aes(x = Year, y = value, fill = variable)) +
geom_bar(stat = "identity") +
geom_line(data=dt.C, colour='red', aes(x = Year, y = value)) +
scale_x_continuous(breaks = pretty(dt.AB$Year,
n = length(unique(dt.AB$Year)))) +
scale_y_continuous(
name = "A&B",
breaks = seq (0, 150, 10),
sec.axis = sec_axis(~.*2/3, name="C(%)", breaks = seq (0, 100, 10))
) + theme_hc() +
scale_fill_manual(values=c("grey70", "grey50", "grey30")) +
theme(
axis.line.y = element_line(colour = 'black', size=0.5,
linetype='solid'))

How to use hjust to move the label on only one plot?

I am creating a multipanel graph using plot_grid of the relationship of discharge and surface area with species richness in Amazon Rivers. Here is my data:
riverssub<-structure(list(richness = c(127L, 110L, 89L, 74L, 62L, 18L, 22L,
71L, 38L, 91L, 56L, 39L, 90L, 37L, 147L, 53L, 92L, 207L, 52L,
126L, 79L, 32L, 100L, 181L, 83L, 690L), surface = c(33490, 4410,
770, 164.7, 288.5, 9.85, 33.1, 750, 46.9, 970, 85.2, 39.2, 780,
97.3, 3983.71, 220, 500, 11250, 115, 1350, 278, 23.05, 310, 2050,
560, 34570), disch = c(2640L, 687L, 170L, 353L, 384L, 16L, 31L,
513L, 32L, 392L, 50L, 32L, 206L, 81L, 1260L, 104L, 220L, 6100L,
308L, 2060L, 443L, 102L, 348L, 4758L, 913L, 40487L)), class = "data.frame", row.names = c(NA,
-26L))
Here is the code for my graphs and multiplot:
library(cowplot)
a <- ggplot(data = riverssub, aes(x = surface , y = richness)) +
geom_point() +
scale_y_log10() +
scale_x_log10() +
labs(x='Surface Area (100 km\u00b2)', y="Fish Species Richness") +
theme_bw()
b <- ggplot(data = riverssub, aes(x = disch , y = richness)) +
geom_point() +
scale_y_log10() +
scale_x_log10() +
labs(x=bquote('Mean Annual Discharge'~(m^3 * s^-1)), y=" ") +
theme_bw()
plot_grid(a, b + theme(axis.text.y = element_blank()),
nrow = 1, align = 'h', labels="AUTO", label_y=0.97, label_x=0.1)
I want the "A" label to be in the same position on the first plot as the "B" label is on the second plot. I know I can use hjust() within plot_grid() to achieve this, although I am unsure how to do it. Can anyone help? Thanks in advance.
Instead of fiddling around with hjust to place the labels, I would suggest adding the labels to the plots before aligning them via plot_grid, as already suggested by @Guillaume in his comment. One option to do so, and to ensure that the labels are placed at the same relative positions, is to make use of annotation_custom:
library(cowplot)
library(ggplot2)
library(magrittr)
a <- ggplot(data = riverssub, aes(x = surface, y = richness)) +
geom_point() +
scale_y_log10() +
scale_x_log10() +
labs(x = "Surface Area (100 km\u00b2)", y = "Fish Species Richness") +
theme_bw()
b <- ggplot(data = riverssub, aes(x = disch, y = richness)) +
geom_point() +
scale_y_log10() +
scale_x_log10() +
labs(x = bquote("Mean Annual Discharge" ~ (m^3 * s^-1)), y = " ") +
theme_bw() +
theme(axis.text.y = element_blank())
list(A = a, B = b) %>%
purrr::imap(function(x, y) x + annotation_custom(grid::textGrob(label = y, x = .05, y = .97, gp = grid::gpar(fontface = "bold")))) %>%
plot_grid(plotlist = ., nrow = 1, align = "h")

Adding a second y axis in R

I would like to use the following dataset to create a plot that combines a clustered bar chart and a line chart:
structure(list(X = 1:14, ORIGIN = c("AUS", "AUS", "DAL", "DAL",
"DFW", "DFW", "IAH", "IAH", "OKC", "OKC", "SAT", "SAT", "SHV",
"SHV"), DEST = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), .Label = c("ATL", "SEA"), class = "factor"),
flight19.x = c(293L, 93L, 284L, 93L, 558L, 284L, 441L, 175L,
171L, 31L, 262L, 31L, 175L, 0L), flight19.y = c(5526L, 5526L,
6106L, 6106L, 23808L, 23808L, 15550L, 15550L, 2055L, 2055L,
3621L, 3621L, 558L, 558L)), row.names = c(NA, -14L), class = "data.frame")
In Excel, the chart I am envisioning looks something like this:
I have already tried using the sec.axis argument to generate a second axis. However, the line plot still appears to use the first y-axis instead of the second:
p1 <- ggplot()+
geom_bar(data = flight19, aes(ORIGIN, flight19.x, fill = DEST),stat = "identity", position = "dodge" )+
scale_fill_viridis(name = "Destinations", discrete = TRUE)+
labs(y= "Operation Counts", x = "Airports")
p2 <- p1 + geom_line(data = flight19, aes(as.character(ORIGIN), flight19.y, group = 1))+
geom_point(data = flight19, aes(as.character(ORIGIN), flight19.y, group = 1))+
scale_y_continuous(limit = c(0,600),sec.axis = sec_axis(~.*75/10, name = "Total Monthly Operations"))
The plot shows the warning below:
Warning messages:
1: Removed 12 row(s) containing missing values (geom_path).
2: Removed 12 rows containing missing values (geom_point).
And the code produces the plot below:
Could someone show me how to make the line plot correspond to the second axis?
Thanks so much in advance.
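Find a suitable transformation factor; here I used 50 just to get nice y-axis labels. The line values are divided by this factor so they fit on the primary axis, and sec_axis multiplies by the same factor so the secondary axis shows the original units.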
#create x-axis
flight19$x_axis <- paste0(flight19$ORIGIN,'\n',flight19$DEST)
# The transformation factor
#transf_fact <- max(flight19$flight19.y)/max(flight19$flight19.x)
transf_fact <- 50
ggplot(flight19, aes(x = x_axis)) +
geom_bar(aes(y = flight19.x),stat = "identity", fill = "blue") +
geom_line(aes(y = flight19.y/transf_fact,group=1), color = "orange") +
scale_y_continuous(name = "Operation Counts",
limit = c(0,600),
breaks = seq(0,600,100),
sec.axis = sec_axis(~ (.*transf_fact),
breaks = function(limit)seq(0,limit[2],5000),
labels = scales::dollar_format(prefix = "$",suffix = " k",scale = .001),
name = "Total Monthly Operations")) +
xlab("Airports") +
theme_bw()
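The core of this approach is that the two transformations must be exact inverses of each other: whatever is done to the data going in has to be undone by sec_axis. A stripped-down sketch, assuming a three-row toy data frame and hypothetical helper names `to_primary` and `to_secondary`:

```r
library(ggplot2)

# Three airports from the question: bar values and the much larger line values.
toy <- data.frame(
  airport = c("AUS", "DAL", "DFW"),
  bars = c(293, 284, 558),
  line = c(5526, 6106, 23808)
)

k <- 50  # transformation factor, as in the answer above

to_primary <- function(v) v / k    # squeeze line values onto the bar axis
to_secondary <- function(v) v * k  # inverse, used to label the second axis

p <- ggplot(toy, aes(x = airport)) +
  geom_col(aes(y = bars)) +
  geom_line(aes(y = to_primary(line), group = 1)) +
  scale_y_continuous(
    name = "Operation Counts",
    limits = c(0, 600),
    sec.axis = sec_axis(~ . * k, name = "Total Monthly Operations")
  )
```

The original attempt plotted flight19.y untransformed, so most values fell outside the 0 to 600 limits and were dropped, which is what the "Removed 12 rows" warnings report.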

Unable to customize legend in ggplot

I am plotting yearly demand using ggplot (my code is below), but I am not able to add a colour legend to the plot. Each data.frame has only two columns, "Zone" and "TotalDemand", and I have three data.frames for three years ("sales12", "sales13" and "sales14").
ggplot() +
geom_point(data=sales12, aes(x=factor(Zone), y=TotalDemand/1000),
color='green',size=6, shape=17) +
geom_point(data=sales13, aes(x=factor(Zone), y=TotalDemand/1000),
color='red',size=6, shape=18)+
geom_point(data=sales14, aes(x=factor(Zone), y=TotalDemand/1000),
color='black',size=4, shape=19) +
labs(y='Demand (in 1000s)',x='Zones') +
scale_colour_manual(name = 'the colour',
values = c('green'='green', 'black'='black', 'red'='red'),
labels = c('12','13','14'))
Please help me to identify my mistake.
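ggplot only builds a legend for aesthetics that are mapped inside aes(); colours set as constants outside aes() do not produce one. With a very small example data frame, df, I melted it to format it for ggplot.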
dput(df)
structure(list(Zone = structure(1:4, .Label = c("Alpha", "Baker",
"Charlie", "Delta"), class = "factor"), TotalDemand = c(90L,
180L, 57L, 159L), sales12 = c(25L, 40L, 13L, 50L), sales13 = c(30L,
60L, 16L, 55L), sales14 = c(35L, 80L, 28L, 54L)), .Names = c("Zone",
"TotalDemand", "sales12", "sales13", "sales14"), class = "data.frame", row.names = c(NA,
-4L))
library(reshape2)  # provides melt()
library(ggplot2)
df.m <- melt(df, id.vars = "Zone", measure.vars = c("sales12", "sales13", "sales14"))
ggplot(df.m, aes(x=factor(Zone), y=value, color = variable )) +
geom_point(size=6, shape=17) +
labs(y='Demand (in 1000s)',x='Zones') +
scale_colour_manual(values = c('green', 'black', 'red'))
You can adjust the size, shape and colours of your points, add a title, etc. The legend can also be positioned at the bottom, for example.
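If the data really lives in three separate data frames, as in the question, one possible route is to stack them with an identifying column and map that column to colour and shape. This is only a sketch: the demand values below are invented, and the column name `Year` is an assumption introduced for illustration.

```r
library(ggplot2)

# Invented stand-ins for the three yearly data frames described in the question.
sales12 <- data.frame(Zone = c("Alpha", "Baker"), TotalDemand = c(25000, 40000))
sales13 <- data.frame(Zone = c("Alpha", "Baker"), TotalDemand = c(30000, 60000))
sales14 <- data.frame(Zone = c("Alpha", "Baker"), TotalDemand = c(35000, 80000))

# Stack them and record which year each row came from.
sales <- rbind(
  cbind(sales12, Year = "12"),
  cbind(sales13, Year = "13"),
  cbind(sales14, Year = "14")
)

# Mapping Year inside aes() is what makes ggplot draw the legend.
p <- ggplot(sales, aes(x = factor(Zone), y = TotalDemand / 1000,
                       colour = Year, shape = Year)) +
  geom_point(size = 5) +
  scale_colour_manual(values = c("12" = "green", "13" = "red", "14" = "black")) +
  scale_shape_manual(values = c("12" = 17, "13" = 18, "14" = 19)) +
  labs(y = "Demand (in 1000s)", x = "Zones")
```

Because colour and shape are mapped to the same variable, the two scales should be merged into a single legend.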
