So I am having trouble making a stacked bar chart showing proportion of cases vs deaths.
This is the data:
df <- structure(list(Date = structure(c(19108, 19108, 19108, 19108,
19108, 19108, 19108, 19108, 19108, 19108), class = "Date"), Country = c("US",
"India", "Brazil", "France", "Germany", "United Kingdom", "Russia",
"Korea, South", "Italy", "Turkey"), Confirmed = c(81100599L,
43065496L, 30378061L, 28605614L, 24337394L, 22168390L, 17887152L,
17086626L, 16191323L, 15023662L), Recovered = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Deaths = c(991940L, 523654L, 663108L,
146464L, 134489L, 174778L, 367692L, 22466L, 162927L, 98720L),
Active = c(80108659L, 42541842L, 29714953L, 28459150L, 24202905L,
21993612L, 17519460L, 17064160L, 16028396L, 14924942L)), row.names = c(163539L,
163431L, 163375L, 163414L, 163418L, 163537L, 163496L, 163444L,
163437L, 163533L), class = "data.frame")
I want to generate something that looks like this, except showing the proportions of deaths vs. cases.
This is a modification of @Allan Cameron's answer that adds percent labels and takes a slightly different approach:
library(tidyverse)
library(scales)
df %>%
rename_with(., ~str_replace_all(., 'top10.', '')) %>%
pivot_longer(
cols = -Country,
names_to = "Status",
values_to = "value",
values_transform = list(value = as.integer)
) %>%
mutate(Status = fct_rev(fct_infreq(Status))) %>%
group_by(Country) %>%
mutate(pct= prop.table(value) * 100) %>%
ggplot(aes(x= Country, y = pct, fill=Status)) +
geom_col(position = position_fill())+
scale_fill_manual(values = c("#ff34b3", "#4976ff")) +
scale_y_continuous(labels = scales::percent)+
ylab("Percentage") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct),"%")),
position=position_fill(vjust = 0.1)) +
ggtitle("Your Title")
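As a complement to the pipeline above, here is a minimal sketch that computes the shares explicitly (on a 0 to 1 scale) before plotting, so the bar heights and the labels come from the same column. It uses only two rows of the question's data for brevity; the names `covid`, `shares` and `p_shares` are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Two countries taken from the question's data, for brevity
covid <- data.frame(
  Country   = c("US", "India"),
  Confirmed = c(81100599, 43065496),
  Deaths    = c(991940, 523654)
)

shares <- covid %>%
  mutate(Survived = Confirmed - Deaths) %>%          # confirmed cases that did not die
  select(Country, Survived, Died = Deaths) %>%
  pivot_longer(-Country, names_to = "Outcome", values_to = "Count") %>%
  group_by(Country) %>%
  mutate(share = Count / sum(Count)) %>%             # proportion within each country
  ungroup()

p_shares <- ggplot(shares, aes(Country, share, fill = Outcome)) +
  geom_col() +                                       # shares already sum to 1 per country
  geom_text(aes(label = scales::percent(share, accuracy = 0.1)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
```

Because the shares are computed up front, they can be inspected in `shares` before anything is drawn.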
I had to use OCR to convert the image of your data into actual data I could use. It's far better to include your data as text for this reason.
The plot is not particularly informative because the percentages are low, and difficult to read, but in any case, you can do it like this:
library(tidyverse)
p <- df %>%
mutate(top10.Confirmed = top10.Confirmed - top10.Deaths,
top10.Country = factor(top10.Country, top10.Country)) %>%
rename(Country = top10.Country,
Survived = top10.Confirmed,
Died = top10.Deaths) %>%
pivot_longer(-Country, names_to = "Outcome", values_to = "Count") %>%
mutate(Outcome = factor(Outcome, c("Survived", "Died"))) %>%
ggplot(aes(Country, Count, fill = Outcome)) +
geom_col(position = "fill") +
scale_fill_manual(values = c("#4976ff", "#ff34b3")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Covid outcomes by country", y = "Percent")
p
To make it easier to read, you could zoom into the bottom:
p + coord_cartesian(ylim = c(0, 0.05))
Data in reproducible format
df <- structure(list(top10.Country = c("US", "India", "Brazil", "France",
"Germany", "United Kingdom", "Russia", "Korea, South", "Italy",
"Turkey"), top10.Confirmed = c(81100599L, 43065496L, 30378061L,
28605614L, 24337394L, 22168390L, 17887152L, 17086626L, 16191323L,
15023662L), top10.Deaths = c(991940L, 523654L, 663108L, 146464L,
134489L, 174778L, 367692L, 22466L, 162927L, 98720L)), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
df
#> top10.Country top10.Confirmed top10.Deaths
#> 1 US 81100599 991940
#> 2 India 43065496 523654
#> 3 Brazil 30378061 663108
#> 4 France 28605614 146464
#> 5 Germany 24337394 134489
#> 6 United Kingdom 22168390 174778
#> 7 Russia 17887152 367692
#> 8 Korea, South 17086626 22466
#> 9 Italy 16191323 162927
#> 10 Turkey 15023662 98720
Created on 2022-05-01 by the reprex package (v2.0.1)
This is my dataframe:
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
And this is my ggplot (my question is about the annotation for the Brazil label):
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 10,
nudge_x = -.029,
nudge_y = 35,
segment.size = .65,
segment.curvature = -1,
segment.angle = 178.975,
segment.ncp = 1)+
coord_flip()
So, I have a segment divided into two parts. Both parts have small breaks in them. How can I avoid them?
I already tried using segment.ncp and changing nudge_x or nudge_y, but it's not working.
Any help?
Not really sure what is going on here. This is the best I could generate by experimenting with variations to the input values for segment... arguments.
There is some guidance at: https://ggrepel.slowkow.com/articles/examples.html which has an example with shorter leader lines, maybe that's an approach you could use.
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
library(ggplot2)
library(ggrepel)
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 1,
nudge_x = -0.029,
nudge_y = 35,
segment.size = 0.5,
segment.curvature = -0.0000002,
segment.angle = 1,
segment.ncp = 1000)+
coord_flip()
Created on 2021-08-26 by the reprex package (v2.0.0)
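Another thing that may be worth trying is a gentler curve. The sketch below keeps the question's nudges but swaps in the moderate curvature settings used in the ggrepel examples article linked above (`segment.curvature = -0.1`, `segment.ncp = 3`, `segment.angle = 20`). Whether this removes the kinks for this particular data is an assumption that has not been verified; `p_repel` is just an illustrative name.

```r
library(ggplot2)
library(ggrepel)

# Only the columns the plot needs, taken from the question's data
df <- data.frame(
  year = c(1984, 1984),
  team = c("Australia", "Brazil"),
  n    = c(84L, 12L)
)

p_repel <- ggplot(df, aes(x = year, y = n, label = team)) +
  geom_point() +
  geom_text_repel(
    size = 3, color = "black", seed = 10,
    nudge_x = -0.029, nudge_y = 35,
    segment.size = 0.65,
    segment.curvature = -0.1,  # mild curve instead of -1
    segment.ncp = 3,           # a few control points
    segment.angle = 20         # moderate skew of the curve
  ) +
  coord_flip()
```

Dropping the three `segment.curvature`, `segment.ncp` and `segment.angle` arguments altogether gives plain straight leader lines, which avoids the issue entirely if curves are not essential.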
I am trying to make a plot with three different csvs. In 2 of them, the columns are the same i.e. Year, GMSL and GMSLerror.
In the Frederikse file the columns are Year, GMSL, GMSLerrorlow and GMSLerrorup. How can I tell R to plot the Frederikse error using the columns GMSLerrorlow and GMSLerrorup? I tried the following but it did not work. Thanks.
p1<-files <- c("Frederikse.csv", "ChurchandWhite.csv","Hay.csv")
map_dfr(files, ~ read_csv(.x) %>%
mutate(Author = .x)) %>%
ggplot(aes(x = Time, y = GMSL, color = Author,fill=Author)) +
geom_line(size=0.6)+
theme_bw(12)+
theme(panel.grid.major = element_blank())+
theme(panel.grid.minor = element_blank())+
labs(x = "Year", y = "GMSL (mm)",color="Author")+
geom_errorbar(aes(ymin=GMSL-GMSLerror, ymax =GMSL+GMSLerror,alpha=Author))+
geom_errorbar("Frederikse.csv",(aes(ymin=GMSL-GMSLerrorlow, ymax =GMSL+GMSLerrorup,alpha=Author)))
scale_alpha_manual(values = c(0.3, 0.3, 0.8))+
scale_colour_manual(values=c("#BAB3F0","#1D3E72","#201641"))
p1
structure(list(Year = 1900:1905, GMSLerrorlow = c(-203.5572666,
-201.0185091, -212.0740442, -202.6975639, -200.1670151, -192.1312551
), GMSL = c(-173.2614421, -168.8016753, -180.389967, -170.2678322,
-168.7200709, -160.9814287), GMSLerrorup = c(-141.002807, -135.8976091,
-148.213824, -138.9305182, -137.4501224, -130.3514508)), row.names = c(NA,
6L), class = "data.frame")
structure(list(Time = 1900:1905, GMSL = c(-131.15, -130.5, -129.77,
-128.85, -128.1, -127.56), GMSLerror = c(25.32, 25.17, 25.01,
24.86, 24.7, 24.55)), row.names = c(NA, 6L), class = "data.frame")
structure(list(Time = c(1880.0417, 1880.125, 1880.2083, 1880.2917,
1880.375, 1880.4583), GMSL = c(-183, -171.1, -164.3, -158.2,
-158.7, -159.6), GMSLerror = c(24.2, 24.2, 24.2, 24.2, 24.2,
24.2)), row.names = c(NA, 6L), class = "data.frame")
You can do this with mutate(), creating GMSLerrorlow and GMSLerrorup columns for all datasets:
library(tidyverse)
files <- c("Frederikse.csv", "ChurchandWhite.csv", "Hay.csv")
p1 <- set_names(files) %>% # give names - can use str_remove to drop `.csv` from names
  map_dfr(~ read_csv(.x), .id = "Author") %>% # use .id argument to record the source file
  mutate(
    # the Frederikse sample names its time column Year; the other two use Time
    Time = coalesce(Time, Year),
    # files with a single symmetric error column reuse it for both bounds
    GMSLerrorlow = if_else(Author != "Frederikse.csv", GMSLerror, GMSLerrorlow),
    GMSLerrorup = if_else(Author != "Frederikse.csv", GMSLerror, GMSLerrorup)
  ) %>%
  ggplot(aes(x = Time, y = GMSL, color = Author, fill = Author)) +
  geom_line(size = 0.6) +
  theme_bw(12) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank()) +
  labs(x = "Year", y = "GMSL (mm)", color = "Author") +
  geom_errorbar(aes(ymin = GMSL - GMSLerrorlow, ymax = GMSL + GMSLerrorup, alpha = Author)) +
  scale_alpha_manual(values = c(0.3, 0.3, 0.8)) +
  scale_colour_manual(values = c("#BAB3F0", "#1D3E72", "#201641"))
p1
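One thing to double-check: in the sample rows posted in the question, the Frederikse GMSLerrorlow and GMSLerrorup values (for example -203.6 and -141.0 around a GMSL of -173.3) look like absolute lower and upper bounds rather than offsets. If that reading is right (it is an assumption based only on those sample rows), the bounds should be used directly instead of being subtracted from or added to GMSL. A minimal sketch with in-memory copies of a few sample rows, using the illustrative names `gmsl`, `ymin` and `ymax`:

```r
library(dplyr)
library(ggplot2)

# First three sample rows from the question, instead of reading the CSV files
frederikse <- data.frame(
  Year = 1900:1902,
  GMSLerrorlow = c(-203.5572666, -201.0185091, -212.0740442),
  GMSL = c(-173.2614421, -168.8016753, -180.389967),
  GMSLerrorup = c(-141.002807, -135.8976091, -148.213824)
)
church <- data.frame(
  Time = 1900:1902,
  GMSL = c(-131.15, -130.5, -129.77),
  GMSLerror = c(25.32, 25.17, 25.01)
)

gmsl <- bind_rows(Frederikse = frederikse, ChurchandWhite = church, .id = "Author") %>%
  mutate(
    Time = coalesce(Time, Year),  # unify the two time column names
    # ASSUMPTION: Frederikse columns are absolute bounds; the others are symmetric offsets
    ymin = if_else(Author == "Frederikse", GMSLerrorlow, GMSL - GMSLerror),
    ymax = if_else(Author == "Frederikse", GMSLerrorup, GMSL + GMSLerror)
  )

p_gmsl <- ggplot(gmsl, aes(Time, GMSL, colour = Author)) +
  geom_line() +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), alpha = 0.5, width = 0.3)
```

Computing `ymin` and `ymax` once in the data keeps a single geom_errorbar() layer for all authors.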
I have been working on this for some time and am re-posting in the hope of simplifying the problem definition, taking into account feedback on my previous attempt. I am able to label each individual column value, but I cannot put together the code needed to label the summed total. The examples I have looked at never work the way I try to combine them, for example with group_by or summarize. I would like to show only the sum of the "Confirmed Cases" values and not the other column values, because with many c("x", "Y", ... "data") entries the labels become impossible to read.
Here is the data frame:
dput(COVID1[1:12, ])
structure(list(COUNTY = c("Antrim", "Antrim", "Antrim", "Charlevoix",
"Charlevoix", "Grand Traverse", "Grand Traverse", "Grand Traverse",
"Antrim", "Grand Traverse", "Grand Traverse", "Grand Traverse"
), Date = structure(c(18453, 18456, 18457, 18453, 18455, 18453,
18456, 18457, 18455, 18453, 18456, 18457), class = "Date"), CASE_STATUS = c("Confirmed",
"Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confirmed",
"Confirmed", "Confirmed", "Probable", "Probable", "Probable",
"Probable"), Cases = c(1L, 1L, 2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L,
1L, 1L)), row.names = c(NA, 12L), class = "data.frame")
Code:
ggplot(filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0)) +
geom_col(aes(x = Date, y = Cases, fill = CASE_STATUS), position = position_stack(reverse = TRUE), width = .88)+
geom_text(aes(x = Date, y = Cases, label = (Cases)), position = position_stack(reverse = TRUE), vjust = 1.5, size = 3, color = "white") +
scale_fill_manual(values = c('blue',"tomato"))+
scale_x_date(labels = date_format("%m/%d"), limits = as.Date(c('2020-07-09','today()')), breaks = "1 week")+
theme(axis.text.x = element_text(angle=0))+
labs(title = "Antrim - Grand Traverse - Charlevoix")
I'm not sure if I understood the question but I think you want to add the sum of the confirmed cases as labels. There might be a ggplot way of doing it but I think the most straightforward way is to make another dataset with your labels and feed it in.
date_labels <- filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0) %>%
  group_by(Date) %>%
  summarise(confirmed_cases = sum(Cases[CASE_STATUS == "Confirmed"]))
ggplot(filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0)) +
geom_col(aes(x = Date, y = Cases, fill = CASE_STATUS), position = position_stack(reverse = TRUE), width = .88)+
geom_text(data = date_labels, aes(x = Date, y = 1, label = confirmed_cases), position = position_stack(reverse = TRUE), vjust = 1.5, size = 3, color = "white") +
scale_fill_manual(values = c('blue',"tomato"))+
scale_x_date(labels = label_date("%m/%d"), limits = as.Date(c('2020-07-09','today()')), breaks = "1 week")+
theme(axis.text.x = element_text(angle=0))+
labs(title = "Antrim - Grand Traverse - Charlevoix")
Gives me this result:
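A variation on the same idea, as a sketch only: the summary table can also carry the full stack height, so the confirmed total can sit just above each bar instead of at y = 1. The data frame below is rebuilt from the dput() in the question; `stack_height` and `p_labels` are illustrative names that do not appear in the original code.

```r
library(dplyr)
library(ggplot2)

# The 12 sample rows from the question
COVID1 <- data.frame(
  COUNTY = c("Antrim", "Antrim", "Antrim", "Charlevoix", "Charlevoix",
             "Grand Traverse", "Grand Traverse", "Grand Traverse",
             "Antrim", "Grand Traverse", "Grand Traverse", "Grand Traverse"),
  Date = as.Date(c(18453, 18456, 18457, 18453, 18455, 18453,
                   18456, 18457, 18455, 18453, 18456, 18457),
                 origin = "1970-01-01"),
  CASE_STATUS = rep(c("Confirmed", "Probable"), times = c(8, 4)),
  Cases = c(1L, 1L, 2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L)
)

# One row per date: the confirmed total (label) and the bar height (label position)
date_labels <- COVID1 %>%
  group_by(Date) %>%
  summarise(
    confirmed_cases = sum(Cases[CASE_STATUS == "Confirmed"]),
    stack_height    = sum(Cases),
    .groups = "drop"
  )

p_labels <- ggplot(COVID1) +
  geom_col(aes(x = Date, y = Cases, fill = CASE_STATUS),
           position = position_stack(reverse = TRUE), width = 0.88) +
  geom_text(data = date_labels,
            aes(x = Date, y = stack_height, label = confirmed_cases),
            vjust = -0.5, size = 3) +
  scale_fill_manual(values = c("blue", "tomato"))
```

Putting the label above the stack avoids having to pick a text colour that contrasts with both fill colours.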
I have this code
ggplot() +
stat_density(kernel = "biweight",aes(x=fd, colour=id), data=foo1,position="identity",geom="line")+
coord_cartesian(xlim = c(0, 200))+
xlab("Flood Duration")+
ylab("Density")+
ggtitle("PDFs of Flood Duration")+
ggsave("pdf_fd_conus.png")
And I wrote this function
pdf.plot<-function(data,x,xl,yl,title,save){
ggplot() +
stat_density(data, kernel = "biweight",aes_string(x=x, colour='id'),
position="identity",geom="line")+
coord_cartesian(xlim = c(0, 200))+
xlab(xl)+
ylab(yl)+
ggtitle(title)+
ggsave(save)
}
Calling using this:
pdf.plot(data=foo1,x='fd',xl='b',
yl='a',title='a',save='y.png')
But I am getting this error:
Error: ggplot2 doesn't know how to deal with data of class uneval
Called from: eval(expr, envir, enclos)
This is dput(head(foo1,4))
structure(list(id = structure(c(1L, 1L, 1L, 1L), .Label = c("dfa",
"dfb", "cfa", "csb", "bsk"), class = "factor"), lon = c(-70.978611,
-70.978611, -70.945278, -70.945278), lat = c(42.220833, 42.220833,
42.190278, 42.190278), peakq = c(14.7531, 17.3865, 3.3414, 2.7751
), area = c(74.3327, 74.3327, 11.6549, 11.6549), fd = c(29, 54.75,
23, 1), tp = c(14.25, 19.75, 13.5, 0.5), rt = c(14.75, 35, 9.5,
0.5), bl = c(15485.3, 15485.3, 8242.64, 8242.64), el = c(0.643551,
0.643551, 0.474219, 0.474219), k = c(0.325279, 0.325279, 0.176624,
0.176624), r = c(81.947, 81.947, 38.7003, 38.7003), si = c(0.0037157,
0.0037157, -9999, -9999), rr = c(0.00529193, 0.00529193, 0.00469513,
0.00469513)), .Names = c("id", "lon", "lat", "peakq", "area",
"fd", "tp", "rt", "bl", "el", "k", "r", "si", "rr"), row.names = c(NA,
4L), class = "data.frame")
Could you please help?
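Your problem is that you didn't name the data argument in stat_density(). If you look at ?stat_density you'll see that the first positional argument is actually mapping=, so your data frame is taken as the mapping and the aes_string() call ends up in data=, which is why ggplot2 complains about data of class uneval. You need to change pdf.plot to: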
pdf.plot<-function(data,x,xl,yl,title,save){
ggplot() +
stat_density(data = data, kernel = "biweight",aes_string(x=x, colour='id'),
position="identity",geom="line")+
coord_cartesian(xlim = c(0, 200))+
xlab(xl)+
ylab(yl)+
ggtitle(title)+
ggsave(save)
}
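For more recent ggplot2 versions, a variant along the following lines may be more robust. This is a sketch under a few assumptions: aes_string() is deprecated in favour of the `.data` pronoun, and, as far as I know, adding ggsave() to a plot with `+` raises an error in newer releases, so the plot is built first and saved in a separate step. The function name `pdf_plot` and the optional `save = NULL` default are illustrative choices, not part of the original answer.

```r
library(ggplot2)

pdf_plot <- function(data, x, xl, yl, title, save = NULL) {
  p <- ggplot(data) +
    stat_density(aes(x = .data[[x]], colour = id),   # column chosen by its name in `x`
                 kernel = "biweight", position = "identity", geom = "line") +
    coord_cartesian(xlim = c(0, 200)) +
    labs(x = xl, y = yl, title = title)
  # Save only when a file name is supplied, then return the plot
  if (!is.null(save)) ggsave(save, plot = p)
  p
}

# Small stand-in for foo1, using the id and fd values from the question's dput()
foo1 <- data.frame(
  id = factor(c("dfa", "dfa", "dfa", "dfa")),
  fd = c(29, 54.75, 23, 1)
)

p <- pdf_plot(foo1, x = "fd", xl = "Flood Duration", yl = "Density",
              title = "PDFs of Flood Duration")
```

Returning the plot object also lets the caller add further layers or save it under a different name later.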