How do I make my R plot look like the one I have in Python?

I am trying to make my R plot look like the one I made in Python.
This is the data frame used in both Python and R.
All_Flights_Combined_Month

Year  Month  Delay_count  Total_count
2003  Jan    151238       552109
2003  Feb    158369       500206
2003  Mar    152156       559342
2003  Apr    125699       527303
2003  May    136551       533782
2003  Jun    163497       536496
2003  Jul    183491       558568
2003  Aug    178979       556984
2003  Sep    113916       527714
2003  Oct    131409       552370
2003  Nov    157157       528171
2003  Dec    206743       555495
2004  Jan    198818       583987
2004  Feb    183658       553876
2004  Mar    183273       601412
2004  Apr    170114       582970
2004  May    191604       594457
2004  Jun    238074       588792
2004  Jul    237670       614166
2004  Aug    215667       623107
2004  Sep    147508       585125
2004  Oct    193951       610037
2004  Nov    197560       584610
2004  Dec    254786       606731
2005  Jan    229809       594924
2005  Feb    184920       545332
2005  Mar    226883       617540
2005  Apr    169221       594492
2005  May    178327       614802
2005  Jun    236724       609195
2005  Jul    268988       627961
2005  Aug    240410       630904
2005  Sep    165541       574253
2005  Oct    186778       592712
2005  Nov    193399       566138
2005  Dec    256861       572343
And this is the Python code:
import matplotlib.pyplot as plt

# To plot the line graph
# Create separate data frames for each year
years = All_Flights_Combined_Month['Year'].unique()
data_frames_month = [All_Flights_Combined_Month[All_Flights_Combined_Month['Year'] == year] for year in years]
# Create subplots
fig, ax = plt.subplots(figsize=(10, 8))
# Plot Delay_count for each year
for i, year in enumerate(years):
    color = 'red' if str(year) == '2003' else 'green' if str(year) == '2004' else 'blue'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Delay_count'], label=f"{year} Delay Count", color=color)
# Plot Total_count for each year (column name as in the data frame above)
for i, year in enumerate(years):
    color = 'orange' if str(year) == '2003' else 'yellow' if str(year) == '2004' else 'purple'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Total_count'], label=f"{year} Total Count", color=color)
# Set title and labels
ax.set_title('Flight Count by Month')
ax.set_xlabel('Month')
ax.set_ylabel('Number of Flights')
# Add legend
ax.legend(title='Year')
# Save the plot as a pdf file
plt.savefig('Monthly Flight Comparison Python.pdf', format='pdf')
# Show the plot
plt.show()
While this is for R:
library(ggplot2)
# To plot the line graph
month_plot <- ggplot() +
  geom_line(data = All_Flights_Combined_Month, aes(x = Month, y = Delay_count, group = Year, color = Year)) +
  geom_line(data = All_Flights_Combined_Month, aes(x = Month, y = Total_count, group = Year, color = Year)) +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  xlab("Months") +
  ylab("Number of Flights") +
  ggtitle("Flight Count by Month")
# To save the plot as .pdf (pass the stored plot; last_plot() only knows plots that were displayed)
ggplot2::ggsave("Monthly Flight Comparison R.pdf", plot = month_plot, width = 8, height = 6)
I need the legend and the line colors to match the ones in the Python plot. I hope I have provided sufficient information. Please advise, thank you.
I tried adding scale_color_manual to each geom_line, but it produced a message stating that a colour scale was already present and that the new one would replace the existing one.

This type of problem generally has to do with reshaping the data. The data is in wide format, but it should be in long format. See this post on how to reshape the data from wide to long format.
Then replace the variable Year (or name) with the interaction between the two. That interaction is both the color and the grouping variable.
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
})
clrs <- c("2003 Delay Count" = "#e44b3b", "2003 Total Count" = "#edbe70",
          "2004 Delay Count" = "#0d720d", "2004 Total Count" = "#f8f867",
          "2005 Delay Count" = "#0000cb", "2005 Total Count" = "#6d0469")
All_Flights_Combined_Month %>%
  pivot_longer(ends_with("count")) %>%
  mutate(Month = factor(Month, levels = month.abb),
         Year = interaction(Year, name, sep = " "),
         Year = sub("_c", " C", Year)) %>%
  select(-name) %>%
  ggplot(aes(Month, value, colour = Year, group = Year)) +
  geom_line(linewidth = 1.25) +
  scale_color_manual(values = clrs) +
  theme_minimal()
Created on 2023-02-19 with reprex v2.0.2
Data
x <- "Year Month Delay_count Total_count
2003 Jan 151238 552109
2003 Feb 158369 500206
2003 Mar 152156 559342
2003 Apr 125699 527303
2003 May 136551 533782
2003 Jun 163497 536496
2003 Jul 183491 558568
2003 Aug 178979 556984
2003 Sep 113916 527714
2003 Oct 131409 552370
2003 Nov 157157 528171
2003 Dec 206743 555495
2004 Jan 198818 583987
2004 Feb 183658 553876
2004 Mar 183273 601412
2004 Apr 170114 582970
2004 May 191604 594457
2004 Jun 238074 588792
2004 Jul 237670 614166
2004 Aug 215667 623107
2004 Sep 147508 585125
2004 Oct 193951 610037
2004 Nov 197560 584610
2004 Dec 254786 606731
2005 Jan 229809 594924
2005 Feb 184920 545332
2005 Mar 226883 617540
2005 Apr 169221 594492
2005 May 178327 614802
2005 Jun 236724 609195
2005 Jul 268988 627961
2005 Aug 240410 630904
2005 Sep 165541 574253
2005 Oct 186778 592712
2005 Nov 193399 566138
2005 Dec 256861 572343"
All_Flights_Combined_Month <- read.table(text = x, header = TRUE)
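For readers without tidyr, the wide-to-long step described in this answer can also be sketched in base R. This is only an illustration on three rows of the question's data; the names wide, stacked, long, and series are made up for the example.

```r
# Three rows of the question's data, in the original wide layout
wide <- data.frame(
  Year        = c(2003, 2004, 2005),
  Month       = c("Jan", "Jan", "Jan"),
  Delay_count = c(151238, 198818, 229809),
  Total_count = c(552109, 583987, 594924)
)

# stack() puts both count columns into one 'values' column and
# records the source column name in 'ind'
stacked <- stack(wide[c("Delay_count", "Total_count")])

# Repeat the id columns once per stacked count column
long <- data.frame(
  Year  = rep(wide$Year, times = 2),
  Month = rep(wide$Month, times = 2),
  name  = as.character(stacked$ind),
  value = stacked$values
)

# One label per line in the plot, e.g. "2003 Delay count"
long$series <- paste(long$Year, gsub("_", " ", long$name))
print(long)
```

Each row of long now describes one point of one line, and series plays the role of the combined colour and grouping variable.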

Something like this:
library(tidyverse)
df %>%
  pivot_longer(-c(Year, Month)) %>%
  mutate(Year = paste(Year, name)) %>%
  ggplot(aes(x = Month, y = value, color = factor(Year))) +
  geom_line(aes(group = Year)) +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  # unnamed values follow the sorted factor levels:
  # 2003 Delay, 2003 Total, 2004 Delay, 2004 Total, 2005 Delay, 2005 Total
  scale_color_manual(values = c("red", "orange", "green", "yellow", "blue", "purple")) +
  xlab("Months") +
  ylab("Number of Flights") +
  ggtitle("Flight Count by Month") +
  theme_classic()
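When scale_color_manual receives an unnamed vector, the colours are paired with the factor levels in their sorted order, so the order of the values matters. The following base R sketch shows that order for the labels built by paste(Year, name) and pairs them with the colours from the Python code; series, lv, and cols are names invented for this illustration.

```r
# Labels as produced by paste(Year, name)
series <- c("2003 Delay_count", "2003 Total_count",
            "2004 Delay_count", "2004 Total_count",
            "2005 Delay_count", "2005 Total_count")

# factor() sorts its levels; unnamed colours are matched in this order
lv <- levels(factor(series))

# Colours from the Python code: Delay = red/green/blue,
# Total = orange/yellow/purple for 2003/2004/2005
cols <- setNames(c("red", "orange", "green", "yellow", "blue", "purple"), lv)
print(cols)
```

Passing a named vector such as cols as the values argument makes the pairing explicit and independent of the level order.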

You could transform your data to long format and combine Year with the name of the count column (Delay count or Total count) into one string using paste0 and gsub. To get the right colors, use scale_color_manual with named values, and set the legend order with breaks, like this:
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = Delay_count:Total_count) %>%
  mutate(Year2 = paste0(Year, " ", gsub("_", " ", name)),
         Month = factor(Month, levels = month.abb)) %>%
  ggplot(aes(x = Month, y = value, color = Year2, group = Year2)) +
  geom_line() +
  labs(color = "Year", x = "Month", y = "Number of Flights") +
  scale_color_manual(values = c("2003 Delay count" = "red",
                                "2004 Delay count" = "green",
                                "2005 Delay count" = "blue",
                                "2003 Total count" = "orange",
                                "2004 Total count" = "yellow",
                                "2005 Total count" = "purple"),
                     breaks = c("2003 Delay count",
                                "2004 Delay count",
                                "2005 Delay count",
                                "2003 Total count",
                                "2004 Total count",
                                "2005 Total count"))
Created on 2023-02-19 with reprex v2.0.2

Using just base R: first reshape into wide format, then use matplot and customize the axes and mtext a little. Here dat is the data frame from the question.
dat_w <- reshape(dat, idvar='Month', timevar='Year', direction='wide')
par(mar=c(5, 6, 4, 2))
matplot(dat_w[, -1], type='l', lty=1, col=2:7, axes=FALSE, ylab='', main='Flight Count By Month')
axis(side=1, at=1:12, labels=dat_w$Month, cex.axis=.8)
axis(2, axTicks(2), formatC(axTicks(2), format='f', digits=0), las=2, cex.axis=.8)
mtext('Month', side=1, line=2.5, cex=.8); mtext('Number of Flights', 2, 4, cex=.8)
# Build the legend labels from the wide column names (e.g. "Delay_count.2003")
# so that they follow the same order as the plotted columns
legend('right', sub('(.*)_(.*)\\.(.*)', '\\3 \\1 \\2', names(dat_w)[-1]),
       col=2:7, lty=1, title='Year', cex=.7)
box()

Related

How to build a Heatmap for each facet with its own respective scale instead of just one generic scale for all in r?

I am trying to create a heatmap that assigns colors based on % vaccinated within each month (each row).
For example, the colors in January should compare all states within January,
the colors in March should compare all states within March,
then April, June, and so on.
Issue: I would like each month to have its own high and low scale. I am trying to do that with facets, but ggplot assigns one common low-high scale to all the facets/months.
library(tidyverse)
library(lubridate)
library(scales)
# read.csv accepts the URL string directly; wrapping it in url() twice fails
file_url1 <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/df_vaccination.csv"
df_vaccination <- read.csv(file_url1)
df_vaccination <- df_vaccination %>%
  mutate(Updated.On = as.Date(Updated.On))
Code I have tried:
df_vaccination %>%
  filter(State != "India") %>%
  # summarise each month, state's vaccination
  mutate(month_abbr = month(Updated.On, label = TRUE, abbr = TRUE),
         State = fct_reorder(State, Population, max)) %>%
  group_by(month_abbr, State) %>%
  summarise(monthly_ind_vaccinated = sum(Total.Individuals.Vaccinated_Dailycalc,
                                         na.rm = TRUE),
            Population = first(Population), .groups = "drop") %>%
  # get % Vaccination to State population for each month
  group_by(State) %>%
  mutate(prc_vaccinated_per_pop = monthly_ind_vaccinated / Population) %>%
  na.omit() %>%
  ungroup() %>%
  filter(State %in% c("Delhi","Maharashtra")) %>%
  # group_by(month_abbr) %>%
  ggplot(aes(x = State, y = month_abbr, fill = prc_vaccinated_per_pop)) +
  geom_tile() +
  scale_fill_gradient2(low = "white", high = "darkblue", labels = percent) +
  facet_wrap(~as.factor(month_abbr), scales = "free_y", nrow = 6) +
  theme(axis.text.x = element_text(angle = 90, vjust = -.02),
        strip.text = element_blank()) +
  labs(title = "States with highest % Vaccination each month ?",
       caption = "created by ViSa",
       fill = "% Vaccinated each month",
       x = "", y = "")
output:
I think that, because the color is mapped through fill, the facets cannot each get their own scale.
Is there anything like scales = "free_fill" instead of scales = "free_y"?
data output:
# A tibble: 12 x 5
# Groups: month_abbr [6]
month_abbr State monthly_ind_vaccina~ Population prc_vaccinated_per_~
<ord> <fct> <int> <dbl> <dbl>
1 Jan Delhi 43948 18710922 0.00235
2 Jan Maharash~ 228424 123144223 0.00185
3 Feb Delhi 322859 18710922 0.0173
4 Feb Maharash~ 794370 123144223 0.00645
5 Mar Delhi 666628 18710922 0.0356
6 Mar Maharash~ 4590035 123144223 0.0373
7 Apr Delhi 1547324 18710922 0.0827
8 Apr Maharash~ 7942882 123144223 0.0645
9 May Delhi 1613335 18710922 0.0862
10 May Maharash~ 4455440 123144223 0.0362
11 Jun Delhi 250366 18710922 0.0134
12 Jun Maharash~ 1777873 123144223 0.0144
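One possible workaround, offered as a sketch rather than a definitive answer: as far as I know, the scales argument of facet_wrap only frees the x and y axes, so the fill value itself can be rescaled within each month before plotting. The base R example below uses the values from the data output above; heat, prc, rescale01, and prc_scaled are names invented for this illustration.

```r
# Values copied from the "data output" table in the question
heat <- data.frame(
  month_abbr = rep(c("Jan", "Feb", "Mar", "Apr", "May", "Jun"), each = 2),
  State      = rep(c("Delhi", "Maharashtra"), times = 6),
  prc        = c(0.00235, 0.00185, 0.0173, 0.00645, 0.0356, 0.0373,
                 0.0827, 0.0645, 0.0862, 0.0362, 0.0134, 0.0144)
)

# Hypothetical helper: rescale a vector to [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0.5, length(x)))  # constant group: avoid 0/0
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply the rescaling separately within each month
heat$prc_scaled <- ave(heat$prc, heat$month_abbr, FUN = rescale01)
print(heat)
```

Mapping fill to prc_scaled instead of prc_vaccinated_per_pop would then give every month the full low-to-high colour range. The trade-off is that the legend shows relative rank within a month rather than the actual percentage.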

How to prevent information on my ggplot2 timeline chart from being cut off

I am drawing a timeline chart with ggplot2 and it plots just fine; the problem is that the plot cuts off the names of the last organisations on my chart.
I changed the names of the organisations in my reproducible example, but I have tried to retain the number of characters.
I tried converting it to a plotly graph so that I could specify the margins, but the names are still cut off.
Any help is really appreciated.
An image of the graph is attached (timeline chart).
library(scales)
library(ggplot2)
library(lubridate)
library(readxl)
library(plotly)
mydata<- "Jurisdiction Organisations Years.Start Years.End
Pan-African hfgvdbxvbdxncvnbx 1998 2018
International AfrimenRis 2006 2018
International AVSG 1984 2018
Local BOSCOUGYTRtruhjhjhgpp 2007 2018
International CarhIntemmnatoponal 1998 2018
International Caropkg 1980 2018
Local ChrjslignCounselling 2002 2018
Local GWWD-GIO 2004 2018
Local Hmgngnfhfhjdhfvhg 1994 2018
International bsbbjsdvvsnvfncvsjvbsdvvnbvcndbcv 1998 2018
International gkhvhdvfjvbvccvnbdvjbv 2006 2018
Local jhfdjhfgjhseghdfhjsgdjhgfjb 1998 2018
International bjhdbfvjhbjhgdbfvjhvsd 1998 2018
International vdcxnbvndbxcvbnvnbx 2006 2018
Local ACNEVTsvdcxbnvdjxbvfn 2007 2018
International ghjbgjxbdfngvcbdjfhcgbv 1986 2016"
usedata <- read.table(text=mydata, header = TRUE)
usedata$date<-with(usedata, ymd(sprintf("%04d%02d%02d", Years.Start, 1, 1)))
usedata$date2<-with(usedata, ymd(sprintf("%04d%02d%02d", Years.End, 1, 1)))
usedata<-usedata[with(usedata, order(date)),]
jurisdiction_level<-c("International", "Local", "Pan-African")
jurisdiction_colors <- c("#0070C0", "#00B050", "#FFC000")
positions <- c(0.5, -0.5, 1.0, -1.0, 1.5, -1.5)
directions <- c(1, -1)
line_pos <- data.frame(
  "date"=unique(usedata$date),
  "position"=rep(positions, length.out=length(unique(usedata$date))),
  "direction"=rep(directions, length.out=length(unique(usedata$date)))
)
usedata<- merge(x=usedata, y=line_pos, by="date", all = TRUE)
usedata<-usedata[with(usedata, order(date, Jurisdiction)), ]
text_offset <- 0.2
usedata$year_count <- ave(usedata$date==usedata$date, usedata$date, FUN=cumsum)
usedata$text_position <- (usedata$year_count * text_offset * usedata$direction) + usedata$position
##############displaying all years
year_date_range <- as.Date(seq(min(usedata$date) , max(usedata$date) , by='year'), origin = "1970-01-01")
year_format <- format(year_date_range, '%Y')
year_df <- data.frame(year_date_range, year_format)
#png(file="timeline.png",width=1000,height=700,res=70)
####################################PLOT#####################################
timeline_plot<-ggplot(usedata,aes(x=date,y=0, col=Jurisdiction, label=Organisations))
timeline_plot<-timeline_plot+labs(col="Organisations")
timeline_plot<-timeline_plot+scale_color_manual(values=jurisdiction_colors, labels=jurisdiction_level, drop = FALSE)
timeline_plot<-timeline_plot+theme_classic()
########### Plot horizontal black line for timeline
timeline_plot<-timeline_plot+geom_hline(yintercept=0,
                                        color = "black", size=0.3)
# Plot vertical segment lines for milestones
timeline_plot<-timeline_plot+geom_segment(data=usedata[usedata$year_count == 1,], aes(y=position,yend=0,xend=date), color='black', size=0.2)
# Plot scatter points at zero and date
timeline_plot<-timeline_plot+geom_point(aes(y=0), size=3)
# Don't show axes, appropriately position legend
timeline_plot<-timeline_plot+theme(axis.line.y=element_blank(),
                                   axis.text.y=element_blank(),
                                   axis.title.x=element_blank(),
                                   axis.title.y=element_blank(),
                                   axis.ticks.y=element_blank(),
                                   axis.text.x =element_blank(),
                                   axis.ticks.x =element_blank(),
                                   axis.line.x =element_blank(),
                                   legend.position = "bottom"
)
# Show year text
timeline_plot<-timeline_plot+geom_text(data=year_df, aes(x=year_date_range,y=-0.2,label=year_format, fontface="bold"),size=2.5, color='black')
# scale_x_date(date_labels = "%Y", breaks=seq(as.Date("1979-12-01"),as.Date("2008-06-01") ,by= "1 year" ))
# Show text for each milestone
timeline_plot<-timeline_plot+geom_text(aes(y=text_position,label=Organisations),size=3)
print(timeline_plot)
#####Making it a plotly graph
timeline_plot1<-ggplotly(timeline_plot) %>% layout(showlegend = TRUE,margin = list(l = 120, b =90) )
print(timeline_plot1)
First, we need to change the limits:
timeline_plot <- timeline_plot +
  xlim(as.Date("1977-01-01"), as.Date("2010-01-01"))
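Rather than hard-coding them, limits like these could also be derived from the data. The base R sketch below pads a date range by a fixed number of years on each side; pad_dates is a hypothetical helper, and the two dates are 1 January of the earliest and latest start year in the question's data.

```r
# Earliest and latest start dates in the question's data
dates <- as.Date(c("1980-01-01", "2007-01-01"))

# Hypothetical helper: widen a date range by `years` on each side
pad_dates <- function(d, years = 3) {
  lo <- seq(min(d), by = paste0("-", years, " years"), length.out = 2)[2]
  hi <- seq(max(d), by = paste0(years, " years"), length.out = 2)[2]
  c(lo, hi)
}

lims <- pad_dates(dates)
print(lims)
```

The two elements could then be passed to xlim as lims[1] and lims[2]; the amount of padding needed depends on the length of the longest label, so three years is only a starting point.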
Next, since the horizontal line is now too long, remove the geom_hline call and instead use
timeline_plot <- timeline_plot +
  geom_segment(data=NULL,
               aes(y=0, yend=0,
                   x=as.Date("1979-01-01"), xend=as.Date("2008-01-01")),
               color="black", size=.3)
Result:

How plot timing graph with specific options

I have this data.table with 3 columns: the first is MonthlySalesMean, the second is the year, and the third is the month.
> data[,MonthlySalesMean:=mean(StoreMean),by=c("DateMonth","DateYear")][,c("MonthlySalesMean","DateYear","DateMonth")]
MonthlySalesMean DateYear DateMonth
1: 6839.340 2015 7
2: 6839.340 2015 7
3: 6839.340 2015 7
4: 6839.340 2015 7
5: 6839.340 2015 7
---
641938: 6852.171 2013 1
641939: 6852.171 2013 1
641940: 6852.171 2013 1
641941: 6852.171 2013 1
641942: 6852.171 2013 1
I need to plot a graph of three lines because I have 3 years:
> unique(data[,DateYear])
[1] 2015 2014 2013
>
For each year (that is, each line), the MonthlySalesMean values should be plotted across all months of the year. In other words, it should look like this graph:
How can I do this, please?
Thank you in advance!
Without a reproducible example, I can't test with your data, but here's the idea: you plot a path, with aesthetics of sales (y) against month (x), grouped by year (color).
library(tidyverse)
example_data <- tibble(
  MonthlySalesMean = rnorm(36, 100, 20),
  DateYear = c(rep(2013, 12), rep(2014, 12), rep(2015, 12)),
  DateMonth = c(1:12, 1:12, 1:12)
)
ggplot(example_data, aes(x = DateMonth, y = MonthlySalesMean, color = as.factor(DateYear))) +
  geom_path() +
  geom_point(size = 2) +
  geom_text(aes(label = DateYear),
            data = filter(example_data, DateMonth == 1),
            nudge_x = -0.5) + # plot year numbers
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_colour_manual(guide = "none", # hides legend
                      values = c("red", "green", "blue")) + # custom colors
  expand_limits(x = 0.5) + # adds a space before January
  labs(x = "Month", y = "Sales") +
  theme_bw() +
  theme(panel.grid = element_blank()) # removes gridlines
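The table in the question repeats the same monthly mean on every row of a given year and month, so it probably helps to collapse it to one row per year/month before plotting. A minimal base R sketch, using a toy stand-in for the question's data (sales and monthly are names made up for the example):

```r
# Toy stand-in: the monthly mean is repeated on every row
# of the same year/month, as in the question's output
sales <- data.frame(
  MonthlySalesMean = c(6839.340, 6839.340, 6852.171, 6852.171, 6852.171),
  DateYear         = c(2015, 2015, 2013, 2013, 2013),
  DateMonth        = c(7, 7, 1, 1, 1)
)

# Keep one row per year/month combination
monthly <- unique(sales[, c("MonthlySalesMean", "DateYear", "DateMonth")])

# Order by year and month so each line is drawn from left to right
monthly <- monthly[order(monthly$DateYear, monthly$DateMonth), ]
print(monthly)
```

The collapsed table has the same three columns as example_data, so it could be passed to the ggplot call in its place.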

R plot months for the first 2 years

I have a data frame with data covering a period of at most 2 years for different objects:
ISBN Date Quantity
3457 2004-06-15 10
3457 2004-08-16 6
3457 2004-08-19 10
3457 2005-04-19 7
3457 2005-04-20 12
9885 2013-01-15 10
9885 2013-03-16 6
9855 2013-08-19 10
9885 2014-09-19 7
9885 2014-09-20 12
How can I plot Jan to Dec for the 1st year, continued by Jan to Dec for the 2nd year?
I guess the idea is to normalize the years (to have 1st, 2nd), but not the months. (here's an example)
Number of Items Sold over 2 Years Period Since Release
I'd use the lubridate package for something like this. Note that I am calling the data frame df because you didn't give it a name.
So for example:
library(lubridate)
First format the date like so:
df$Date <- ymd(df$Date)
Then extract the month and the year:
df$Month <- month(df$Date, label=TRUE, abbr=TRUE)
df$Year <- year(df$Date)
From there you can plot your results with ggplot2:
library(ggplot2)
ggplot(df, aes(x=Month, y=Quantity, colour=Year)) +
geom_point()
Note that your question would be easier to answer if you had provided a reproducible example.
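The "normalize the years" idea from the question can be sketched in base R by numbering each ISBN's calendar years relative to its first one. The example uses a subset of the question's rows; sold and RelYear are names invented here, and treating "1st year" as the first calendar year with sales is an assumption.

```r
# A subset of the rows shown in the question
sold <- data.frame(
  ISBN     = c(3457, 3457, 3457, 3457, 3457, 9885, 9885, 9885, 9885),
  Date     = as.Date(c("2004-06-15", "2004-08-16", "2004-08-19",
                       "2005-04-19", "2005-04-20", "2013-01-15",
                       "2013-03-16", "2014-09-19", "2014-09-20")),
  Quantity = c(10, 6, 10, 7, 12, 10, 6, 7, 12)
)

# Calendar year of each sale
sold$Year <- as.integer(format(sold$Date, "%Y"))

# Year number since release: 1 for each ISBN's first calendar year, 2 for the next
sold$RelYear <- ave(sold$Year, sold$ISBN, FUN = function(y) y - min(y) + 1L)
print(sold)
```

Faceting or grouping by RelYear instead of Year would then line up the first and second years of different ISBNs.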
You could try:
library(dplyr)
library(lubridate)
library(ggplot2)
library(toOrdinal)
data <- df %>%
  group_by(ISBN) %>%
  arrange(Date) %>%
  mutate(Year = year(Date),
         Month = month(Date, label = TRUE),
         Rank = paste(sapply(cumsum(Year != lag(Year, default = 0)), toOrdinal), "Year")) %>%
  group_by(Rank, Month, .add = TRUE) %>%
  summarise(Sum = sum(Quantity))
ggplot(data = data, aes(x = Month, y = Sum,
                        group = factor(ISBN),
                        colour = factor(ISBN))) +
  geom_line(stat = "identity") +
  facet_grid(. ~ Rank) +
  scale_colour_discrete(name = "ISBN") +
  theme(panel.spacing = unit(0, "lines"),
        axis.text.x = element_text(angle = 90))
Assuming the following df:
df <- data.frame(
  ISBN = sample(c(3457, 9885), 1000, replace = TRUE),
  Date = sample(seq(as.Date('2004/01/01'),
                    as.Date('2011/12/31'), by = "month"),
                1000, replace = TRUE),
  Quantity = sample(1:12, 1000, replace = TRUE)
)
This would produce:

Reordering month results in the x axis (ggplot)

I'd like to produce a plot with reordered months on the x axis (instead of starting in Jan and ending in Dec, I'd like to start on Apr and end on Mar).
My data is something like:
Month An Fiscal.Year Month.Number Month.Name
1 2009-04-01 40488474 2009 4 Apr
2 2009-05-01 53071971 2009 5 May
3 2009-06-01 24063572 2009 6 Jun
...
44 2012-11-01 39457771 2012 11 Nov
45 2012-12-01 44045572 2012 12 Dec
46 2013-01-01 90734077 2012 1 Jan
My code for producing the plot is:
g <- ggplot(data = data, aes(x = Month.Number, y = An)) +
  geom_line(aes(group = Fiscal.Year, colour = factor(Fiscal.Year))) +
  scale_x_discrete(
    name = "Month",
    breaks = data$Month.Number,
    labels = data$Month.Name
  ) +
  scale_y_continuous();
but the result is a plot ordered by month from Jan to Dec, not from Apr to Mar as I want.
I've tried the limits option inside scale_x_discrete, but I think this just reorders the x axis labels, not the real data.
Could you please help me?
Thanks in advance for your answer!
You have to reorder the factor levels of Month.Name. Assuming df is your data.frame:
df$Month.Name <- factor( df$Month.Name, levels = c( "Apr", "May", ..., "Feb", "Mar" ) )
g <- ggplot(data = df, aes(x = Month.Name, y = An) ) +
  geom_line(aes(group = Fiscal.Year, colour = factor(Fiscal.Year))) +
  scale_x_discrete( name = "Month" ) +
  scale_y_continuous();
Alternatively, you can renumber Month.Number so that Apr is 1, May is 2, and so on.
Just run this before plotting:
data$Month.Number <- ((data$Month.Number+8) %% 12) + 1
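Both suggestions can be checked against each other with R's built-in month.abb. The sketch below builds the April-to-March level order and confirms that the arithmetic above assigns each calendar month the same position; fiscal_levels, calendar_no, and fiscal_no are names made up for the illustration.

```r
# Fiscal order: April first, March last
fiscal_levels <- month.abb[c(4:12, 1:3)]

# The renumbering formula applied to calendar months 1..12
calendar_no <- 1:12
fiscal_no <- ((calendar_no + 8) %% 12) + 1

# Position of each calendar month in the fiscal ordering
pos <- match(month.abb, fiscal_levels)

# Both approaches should agree for every month
stopifnot(all(pos == fiscal_no))
print(setNames(fiscal_no, month.abb))
```

A vector like fiscal_levels can be passed as the levels argument of factor() when reordering Month.Name.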
