How to make Financial Times-style faceted charts with ggplot2 in R

The Financial Times has a nice faceted coronavirus chart: see "Daily death tolls" at https://www.ft.com/coronavirus-latest. Do you have an idea how to make it using R and ggplot2?
The facet_wrap() function is not useful on its own in this case: it splits every country's line into a separate mini-graph, so the other countries are no longer visible in gray in the background.
Should I prepare 20+ charts and join them using gridExtra::grid.arrange()?

I was wondering whether there's a way to plot the above without replicating the data.frame. First, I simulate some data:
set.seed(111)
# six groups (a-f) of 60 points each; y is a cumulative sum of random counts
data = data.frame(group = rep(letters[1:6], each = 60),
                  do.call(rbind,
                          replicate(6,
                                    data.frame(x = 1:60, y = cumsum(rnbinom(60, mu = 20, size = 0.1))),
                                    simplify = FALSE)))
Below I loop through each group and bind a copy of the whole data.frame with two extra columns: "facet", naming the panel, and "highlight", marking the group of interest:
library(purrr)
library(ggthemes)
library(ggplot2)
library(dplyr)
unique(data$group) %>%
  # one copy of the data per group; highlight is TRUE for that group's rows
  map_dfr(~cbind(data, facet = .x, highlight = data$group %in% .x)) %>%
  ggplot(aes(x = x, y = y, group = group)) +
  # FALSE -> light grey background lines, TRUE -> highlighted line
  geom_line(aes(col = highlight)) +
  facet_wrap(~facet, ncol = 3, scales = "free") +
  theme_tufte() + scale_color_manual(values = c("#e5dfdf", "#357376")) +
  theme(strip.text = element_text(size = 12, colour = "steelblue")) +
  guides(colour = "none")
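For comparison, the replication step on its own can be sketched in base R without purrr. This is a minimal sketch reusing the simulated data above; `stacked` is just an illustrative name. It makes the cost of the approach visible: the data grows to one full copy per panel, six times the original here.

```r
# Simulated data as in the answer: six groups (a-f) of 60 points each
set.seed(111)
data <- data.frame(
  group = rep(letters[1:6], each = 60),
  do.call(rbind, replicate(6,
    data.frame(x = 1:60, y = cumsum(rnbinom(60, mu = 20, size = 0.1))),
    simplify = FALSE))
)

# One full copy of the data per panel: 'facet' names the panel and
# 'highlight' is TRUE only for the rows of that panel's own group
stacked <- do.call(rbind, lapply(unique(data$group), function(g) {
  cbind(data, facet = g, highlight = data$group == g)
}))

nrow(data)     # 360
nrow(stacked)  # 2160
```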
You can, of course, create a list of ggplots instead, but then you are also replicating the data (each ggplot object stores its own copy of the data.frame):
library(gridExtra)  # for grid.arrange()
# one plot per group: all series are drawn, only `target` is highlighted
plotfun = function(data, target){
  data %>%
    mutate(highlight = group == target) %>%
    ggplot(aes(x = x, y = y, group = group)) +
    geom_line(aes(col = highlight)) +
    theme_tufte() + scale_color_manual(values = c("#e5dfdf", "#357376")) +
    ggtitle(target) +
    theme(plot.title = element_text(size = 12, colour = "steelblue")) +
    guides(colour = "none")
}
grid.arrange(grobs = unique(data$group) %>% map(~plotfun(data, .x)), ncol = 3)
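A different technique, not used in the answer above, avoids building the copies yourself: ggplot2 repeats any layer whose data lacks the faceting variable in every panel. The sketch below assumes the simulated data from above; `background` and `series` are illustrative names. Renaming `group` in the background copy removes the faceting column, so the grey lines appear in all six panels, while the second layer draws each panel's own series on top. Note that ggplot2 still duplicates the background layer internally when it builds the panels, so this saves code rather than memory.

```r
library(ggplot2)

# Simulated data as above: six groups (a-f) of 60 points each
set.seed(111)
data <- data.frame(
  group = rep(letters[1:6], each = 60),
  do.call(rbind, replicate(6,
    data.frame(x = 1:60, y = cumsum(rnbinom(60, mu = 20, size = 0.1))),
    simplify = FALSE))
)

# Background copy without a 'group' column, so facet_wrap(~group)
# cannot split it and draws it in every panel instead
background <- data.frame(x = data$x, y = data$y, series = data$group)

p <- ggplot(data, aes(x = x, y = y)) +
  # every series in light grey, one line per series
  geom_line(data = background, aes(group = series), colour = "#e5dfdf") +
  # the panel's own series, drawn afterwards so it sits on top
  geom_line(colour = "#357376") +
  facet_wrap(~group, ncol = 3) +
  theme_minimal()
p
```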

Related

What's the easiest way to draw significance lines and asterisks in R?

I have a plot in R consisting of 12 box plots. I need to run t-tests between them pair by pair and show the significance using the conventional bracket lines and asterisks. Is there an automated way to do this, rather than drawing every single line manually with geom_line()?
An example of what I would like to recreate is shown below, except that I'd like to do it for 12 box plots instead of just 2.
I'm not sure what your data look like, but you could try the ggsignif package.
For an example data frame df:
library(dplyr)
library(ggsignif)
library(ggplot2)
library(reshape2)
df <- data.frame(
  young = c(2:9),
  old = seq(1, by = .5, length.out = 8)
)
df %>%
  melt %>%  # wide -> long: columns 'variable' (young/old) and 'value'
  ggplot(aes(x = variable, y = value)) +
  geom_boxplot() +
  geom_signif(comparisons = list(c("young", "old")),
              test = "t.test",          # ggsignif's default test is wilcox.test
              map_signif_level = TRUE)  # show asterisks instead of p-values
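For many boxes, as in the question's 12, you don't need to write each pair by hand: base R's `combn()` can generate every pair of group labels, and `geom_signif()`'s `step_increase` argument staggers the brackets so they don't overlap. The data below are hypothetical (four groups, A to D) and `pairs` is an illustrative name. Note that 12 groups give 66 pairs, which is usually too many brackets to read, so in practice you may want to pass only the comparisons you care about.

```r
library(ggplot2)
library(ggsignif)

set.seed(1)
# Hypothetical long-format data: four groups A-D, 20 values each
df_long <- data.frame(
  group = rep(c("A", "B", "C", "D"), each = 20),
  value = rnorm(80, mean = rep(c(10, 11, 13, 13.5), each = 20))
)

# every pair of group labels, as a list of length-2 character vectors
pairs <- combn(levels(factor(df_long$group)), 2, simplify = FALSE)

p <- ggplot(df_long, aes(x = group, y = value)) +
  geom_boxplot() +
  geom_signif(comparisons = pairs,
              test = "t.test",
              map_signif_level = TRUE,
              step_increase = 0.1)  # raise each successive bracket so they don't overlap
p
```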

Creating density plots using ggplot2 and purrr; colour of density line based on group

I am using a combination of ggplot2 and purrr in RStudio to loop through a dataframe and generate density plots. Here is a mock dataframe with a structure similar to what I am working with:
#load relevant libraries
library(ggplot2)
library(dplyr)
library(purrr)
library(gridExtra)
library(data.table) #setDT() and := below
library(lubridate)  #isoweek()
#mock dataframe
set.seed(123)
#build the data.frame directly: cbind() would coerce DateTime to numeric
df <- data.frame(
  Duration = as.integer(floor(rnorm(1000, mean = 200, sd = 50))),
  DateTime = seq.POSIXt(from = as.POSIXct("2020-08-01 01:00:00", tz = Sys.timezone()),
                        length.out = 1000, by = "hours")
)
df$WeekNumber<-isoweek(df$DateTime)
#create a "period" column
setDT(df)[WeekNumber>=31 & WeekNumber <=32, Period:="Period 1"]
df[WeekNumber>=33 & WeekNumber <=35, Period:="Period 2"]
df[WeekNumber>=36 & WeekNumber <=37, Period:="Period 3"]
df$Period<-factor(df$Period, levels = c("Period 1", "Period 2", "Period 3"))
And here is the code that uses purrr to loop through the dataframe and generate a density plot for each week:
densplot<-df %>%
group_by(WeekNumber) %>%
summarise() %>%
pull() %>%
# run map() instead of for()
map(~{
df %>%
# filter for each value
filter(WeekNumber == .x) %>%
# run unique density plot
ggplot(aes(group=WeekNumber)) +
geom_density(aes(Duration))+
ggtitle(paste0("Week ",.x," duration"), subtitle = "Log10")+
scale_x_log10()
})
#call grid.arrange to create a faceted version of the plot
do.call(grid.arrange,densplot)
Which gives this:
What I am trying to do is colour the density lines by "Period" to aid interpretation. This would be easy enough with ggplot2 on its own, but I would like to keep it in my purrr pipeline. However, if I specify ggplot(aes(group=WeekNumber, colour=Period)) or geom_density(aes(Duration, colour=Period)), I get this:
Plus, each individual plot gets its own legend, which looks untidy. I would like to colour each Period differently and have a single legend showing the colours of all three Periods (perhaps placed on the right-hand side). Is there a way to do this?
It would be better to use facet_wrap(), which avoids the issues with colours and gives a single shared legend. Here is the code:
library(ggplot2)
library(dplyr)
#Code
df %>% mutate(WeekNumber=paste0("Week ",WeekNumber," duration")) %>%
ggplot(aes(x=Duration,group=WeekNumber,color=Period)) +
geom_density()+
scale_x_log10()+
facet_wrap(.~WeekNumber,scales='free')
Output:
Update: If you want to iterate, you can use a list strategy instead: split your df by week, build each plot with a function, and combine the plots with the patchwork package to get the expected layout. As an additional remark, if you want specific colours you can hack the pipeline by storing the colours in your dataframe before splitting. I did this in a quick, practical way, but you could use a colour palette if more periods are present. Here is the code:
library(patchwork)
#Add Colors to df
dfcol <- data.frame(Period=unique(df$Period),color=c('blue','red','green'),stringsAsFactors = FALSE)
#Add to df
df$Colors <- dfcol[match(df$Period,dfcol$Period),"color"]
#Approach 2
#Create a list
List <- split(df,df$WeekNumber)
#Plot function
myplot <- function(x)
{
#Extract color
mycol <- unique(x$Colors)
#Plots
p1 <- ggplot(x,aes(x=Duration,group=WeekNumber,color=Period)) +
geom_density()+
scale_x_log10()+
scale_color_manual(values = mycol)+
ggtitle(paste0("Week ",unique(x$WeekNumber)," duration"), subtitle = "Log10")+
theme(legend.title = element_blank())
return(p1)
}
#Apply
L1 <- lapply(List,myplot)
#Wrap plots
combined <- wrap_plots(L1,ncol = 3)
combined + plot_layout(guides = "collect")
Output:
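On the palette remark: one way to avoid the extra Colors column is a named colour vector passed to `scale_colour_manual()`, which matches colours to Period values by name, so each week's plot picks the right colour automatically. This sketch assumes a simplified stand-in for the question's df (random durations for weeks 31 to 36, periods defined as in the question) and uses base R's `hcl.colors()`, which scales to any number of periods; `pal`, `period_names` and `plots` are illustrative names.

```r
library(ggplot2)
library(patchwork)

set.seed(123)
# Hypothetical stand-in for the question's df: 100 durations for each of weeks 31-36
df <- data.frame(
  Duration = floor(rnorm(600, mean = 200, sd = 50)),
  WeekNumber = rep(31:36, each = 100)
)
# Periods as in the question: weeks 31-32, 33-35 and 36 onwards
period_names <- c("Period 1", "Period 2", "Period 3")
df$Period <- factor(period_names[findInterval(df$WeekNumber, c(31, 33, 36))],
                    levels = period_names)

# One colour per period, keyed by the period name
pal <- setNames(hcl.colors(nlevels(df$Period), palette = "Dark 3"), levels(df$Period))

myplot <- function(x) {
  ggplot(x, aes(x = Duration, colour = Period)) +
    geom_density() +
    scale_x_log10() +
    # named values: the colour is looked up by the Period value in this subset
    scale_colour_manual(values = pal) +
    ggtitle(paste0("Week ", unique(x$WeekNumber), " duration"), subtitle = "Log10") +
    theme(legend.title = element_blank())
}

plots <- lapply(split(df, df$WeekNumber), myplot)
combined <- wrap_plots(plots, ncol = 3) + plot_layout(guides = "collect")
combined
```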

How to make a stacked bar plot in R? (including the value of each variable)

I need your help.
I am trying to make a stacked bar plot in R, but I haven't succeeded so far. I have read several posts, but without success.
As I am a newbie, here is the chart I want (I made it in Excel):
And this is how my data look:
Thank you in advance.
I would use the ggplot2 package to create this plot, as it makes positioning text labels easier than the base graphics package does:
# First we create a dataframe using the data taken from your excel sheet:
myData <- data.frame(
Q_students = c(1000,1100),
Students_with_activity = c(950, 10000),
Average_debt_per_student = c(800, 850),
Week = c(1,2))
# The data in the dataframe above is in 'wide' format, to use ggplot
# we need to use the tidyr package to convert it to 'long' format.
library(tidyr)
myData <- gather(myData,
Condition,
Value,
Q_students:Average_debt_per_student)
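# To add the text labels we calculate the midpoint of each bar segment and
# add this as a column to our dataframe using the package dplyr.
# ggplot2 (version 2.2.0 and later) stacks the first factor level on top; with
# the default alphabetical levels that means reverse alphabetical order from the
# bottom up, so sort the rows that way before taking the cumulative sum.
library(dplyr)
myData <- myData %>%
  group_by(Week) %>%
  arrange(desc(Condition), .by_group = TRUE) %>%
  mutate(pos = cumsum(Value) - (0.5 * Value))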
#We pass the dataframe to ggplot2 and then add the text labels using the positions which
#we calculated above to place the labels correctly halfway down each
#column using geom_text.
library(ggplot2)
# plot bars and add text
p <- ggplot(myData, aes(x = Week, y = Value)) +
geom_bar(aes(fill = Condition),stat="identity") +
geom_text(aes(label = Value, y = pos), size = 3)
#Add title
p <- p + ggtitle("My Plot")
#Plot p
p
# Base-graphics alternative: one column per week, one row per variable
so <- data.frame(week1 = c(1000, 950, 800), week2 = c(1100, 10000, 850),
                 row.names = c("Q students", "students with Activity", "average debt per student"))
barplot(as.matrix(so))
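If you stay with base graphics, the value labels can be added with `text()` at the midpoint of each segment. A minimal sketch using the question's numbers (`m` and `mids` are illustrative names): `barplot()` stacks the rows of each column from the bottom up, so a segment's midpoint is its cumulative sum minus half its height, and `barplot()` returns the x positions of the bars.

```r
# Same numbers as in the question: one column per week, one row per variable
m <- matrix(c(1000, 950, 800,
              1100, 10000, 850),
            nrow = 3,
            dimnames = list(c("Q students", "students with Activity", "average debt per student"),
                            c("week1", "week2")))

# barplot() stacks rows bottom-up, so each segment's vertical midpoint is
# its cumulative sum minus half of its own height
mids <- apply(m, 2, function(v) cumsum(v) - v / 2)

bp <- barplot(m, legend.text = TRUE)  # bp holds the x position of each bar
text(x = rep(bp, each = nrow(m)), y = as.vector(mids),
     labels = as.vector(m), cex = 0.8)
```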

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot call, how can I reorder x = Age by the sum of Percent for the ranks "Extremely" and "Very" only?
I tried the following, without success:
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
A couple of notes:
The way you simulate your data does not rule out that, for some ages, not all categories are represented (which is fine), but also that, for some ages, some categories are duplicated. I am assuming that this does not happen in your real data, so I have left it as is. Note also that your simulation does not produce percentages that add up, although the category names indicate that they should.
The way I would do this is to compute the ordering of Age with your desired logic first, and then pass that order to the factor() call. This keeps the ordering logic separate from the plot and allows arbitrary ordering rules.
Here is then what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
Which code produces:
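As an alternative to building the level order separately, the `reorder()` attempt from the question can also be made to work directly: give `reorder()` a weight equal to Percent for the two ranks of interest and 0 otherwise, and sum it per Age. `reorder()` sorts levels by ascending score, so the weight is negated to put the largest sums first. A sketch on the question's simulated data (`w` and `age_f` are illustrative names); ages with no "Extremely" or "Very" rows simply get a score of 0.

```r
set.seed(145)
df <- data.frame(Age = sample(c(1:10), 20, replace = TRUE),
                 Rank = sample(c("Extremely", "Very", "Slightly", "Not At All"),
                               20, replace = TRUE),
                 Percent = runif(10, 0, .01))

# Weight: Percent for the two ranks of interest, 0 for every other rank
w <- df$Percent * (df$Rank %in% c("Extremely", "Very"))

# reorder() sorts levels by ascending FUN(score); negating puts the largest sums first
age_f <- reorder(factor(df$Age), -w, FUN = sum)
levels(age_f)
```

Inside the plot, the same expression can go straight into the aesthetic, e.g. aes(x = reorder(factor(Age), -Percent * (Rank %in% c("Extremely", "Very")), FUN = sum), y = Percent, fill = Rank).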

R- stacked charts

Hi, I'm having issues with a stacked bar chart.
The goal is to plot a bar chart that shows the sum of products sold, stacked on top of each other. I have done that, but the products are not grouped together: instead of one big block per product, each product is split into many small segments. I need some way to aggregate the counts so that they are summed, and then display the chart in some sort of order.
library(ggplot2)
library(plyr) #Is this automatically loaded with ggplot2?
library(dplyr)
salesMixData <- read.csv("SalesMix.csv", stringsAsFactors = FALSE, header = TRUE)
productMix <- salesMixData[,c(1,6,7)]
ggplot(productMix, aes(x=JoinMonthYear, y=Count,fill=Prod)) +
geom_bar(stat='identity') +
theme(axis.text.x = element_text(angle=60, hjust = 1),legend.position="bottom")
The output looks like the following:
You probably want to summarise the data first, calculating an aggregate sum for each combination of JoinMonthYear and Prod.
Here's an example with a dummy data set:
library(ggplot2)
library(dplyr)
d <- data.frame(x=sample(20, 1000, replace=TRUE),
count=rpois(1000, 10),
grp=sample(LETTERS[1:10], 1000, replace=TRUE))
This is equivalent to what you're seeing:
ggplot(d, aes(x=x, y=count, fill=grp)) +
geom_bar(stat='identity')
Grouping the observations (in your case by JoinMonthYear and Prod), and then summarising to the groups' sums, should get you what you're after:
d %>%
group_by(x, grp) %>%
summarise(sum_count=sum(count, na.rm=TRUE)) %>%
ggplot(aes(x=x, y=sum_count, fill=grp)) +
geom_bar(stat='identity')
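The same aggregation can be sketched in base R with `aggregate()`, plus `reorder()` to give the stacked products a consistent order by overall total, which the question also asks about. This reuses the dummy data above (with a seed added for reproducibility); `agg` is an illustrative name, and with your data the formula would be Count ~ JoinMonthYear + Prod.

```r
set.seed(1)
d <- data.frame(x = sample(20, 1000, replace = TRUE),
                count = rpois(1000, 10),
                grp = sample(LETTERS[1:10], 1000, replace = TRUE))

# one row per (x, grp) combination holding the summed count
agg <- aggregate(count ~ x + grp, data = d, FUN = sum)
names(agg)[names(agg) == "count"] <- "sum_count"

# order products by their overall total, smallest first
agg$grp <- reorder(factor(agg$grp), agg$sum_count, FUN = sum)

head(agg)
```

The result can then be plotted with ggplot(agg, aes(x = x, y = sum_count, fill = grp)) + geom_bar(stat = 'identity'). Since ggplot2 stacks the first factor level on top, the product with the largest total ends up at the bottom of each bar.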
