How to build a heatmap where each facet has its own colour scale instead of one common scale for all facets in R?

I am trying to create a heatmap that assigns colors based on the % vaccinated within each month (each row of the plot).
For example, the colors should compare all states against each other within January,
then all states against each other within March,
then April, June, and so on.
Issue: I would like each month to have its own low-to-high color scale. I tried to do that with facets, but ggplot2 assigns one common low-high scale to all the facets/months.
library(tidyverse)
library(lubridate)
library(scales)
# read.csv() accepts a URL string directly; wrapping it in url() twice fails
file_url1 <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/df_vaccination.csv"
df_vaccination <- read.csv(file_url1)
df_vaccination <- df_vaccination %>%
mutate(Updated.On = as.Date(Updated.On))
Code: I have tried
df_vaccination %>%
filter(State != "India") %>%
# summarise each month, state's vaccination
mutate(month_abbr = month(Updated.On, label = TRUE, abbr = TRUE),
State = fct_reorder(State, Population, max)) %>%
group_by(month_abbr, State) %>%
summarise(monthly_ind_vaccinated = sum(Total.Individuals.Vaccinated_Dailycalc,
na.rm = TRUE),
Population = first(Population), .groups = "drop") %>%
# get % Vaccination to State population for each month
group_by(State) %>%
mutate(prc_vaccinated_per_pop = monthly_ind_vaccinated / Population) %>%
na.omit() %>%
ungroup() %>%
filter(State %in% c("Delhi","Maharashtra")) %>%
# group_by(month_abbr) %>%
ggplot(aes(x = State, y = month_abbr, fill = prc_vaccinated_per_pop)) +
geom_tile() +
scale_fill_gradient2(low = "white", high = "darkblue", labels = percent) +
facet_wrap(~as.factor(month_abbr), scales = "free_y", nrow = 6) +
theme(axis.text.x = element_text(angle = 90, vjust = -.02),
strip.text = element_blank()) +
labs(title = "States with highest % Vaccination each month ?",
caption = "created by ViSa",
fill = "% Vaccinated each month",
x = "", y = "")
output:
I think that because the color is mapped to fill, the scales argument of the facet cannot apply a different scale to each facet.
Is there anything like scales = "free_fill" instead of scales = "free_y"?
data output:
# A tibble: 12 x 5
# Groups: month_abbr [6]
month_abbr State monthly_ind_vaccina~ Population prc_vaccinated_per_~
<ord> <fct> <int> <dbl> <dbl>
1 Jan Delhi 43948 18710922 0.00235
2 Jan Maharash~ 228424 123144223 0.00185
3 Feb Delhi 322859 18710922 0.0173
4 Feb Maharash~ 794370 123144223 0.00645
5 Mar Delhi 666628 18710922 0.0356
6 Mar Maharash~ 4590035 123144223 0.0373
7 Apr Delhi 1547324 18710922 0.0827
8 Apr Maharash~ 7942882 123144223 0.0645
9 May Delhi 1613335 18710922 0.0862
10 May Maharash~ 4455440 123144223 0.0362
11 Jun Delhi 250366 18710922 0.0134
12 Jun Maharash~ 1777873 123144223 0.0144
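As far as I know, the `scales` argument of `facet_wrap()` only frees the position axes (x and y), and a plot has a single fill scale shared by every panel. One possible workaround, sketched below under assumptions, is to rescale the percentage within each month before plotting, so the fill shows the position between that month's lowest and highest state. The data frame `vacc` is a hand-copied subset of the table above, and the column name `fill_within_month` is made up for illustration. The legend then shows a relative value, so the absolute percentages are printed on the tiles.

```r
library(dplyr)
library(ggplot2)

# Subset of the summary table above (values copied from the question)
vacc <- data.frame(
  month_abbr = factor(rep(c("Jan", "Feb", "Mar"), each = 2),
                      levels = c("Jan", "Feb", "Mar")),
  State = rep(c("Delhi", "Maharashtra"), times = 3),
  prc_vaccinated_per_pop = c(0.00235, 0.00185, 0.0173, 0.00645, 0.0356, 0.0373)
)

# Rescale to 0-1 separately for each month:
# 0 = lowest state in that month, 1 = highest state in that month
vacc_scaled <- vacc %>%
  group_by(month_abbr) %>%
  mutate(fill_within_month = scales::rescale(prc_vaccinated_per_pop)) %>%
  ungroup()

p <- ggplot(vacc_scaled,
            aes(x = State, y = month_abbr, fill = fill_within_month)) +
  geom_tile() +
  # Print the absolute percentage, since the fill is now relative
  geom_text(aes(label = scales::percent(prc_vaccinated_per_pop, accuracy = 0.01)),
            colour = "grey40") +
  scale_fill_gradient(low = "white", high = "darkblue") +
  labs(fill = "Relative to month\n(0 = lowest, 1 = highest)", x = "", y = "")
p
```

With only two states every month gets exactly one white and one dark tile; with the full set of states the intermediate shades carry the within-month comparison.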

Related

Plot the means of multiple columns

I want to show different barplots for the years and gender with the mean values of the variables Q1 to Q5, which should look like a density.
I have data that looks like this:
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
gender = c(1,1,1,0,0,0,0,0,0),
Q1 = c(3,2,3,1,0,1,2,1,0),
Q2 = c(4,3,4,2,0,2,1,4,3),
Q3 = c(1,2,1,3,5,4,5,4,5),
Q4 = c(1,2,1,2,4,3,2,2,1),
Q5 = c(1,1,1,2,1,0,0,0,1))
My solution was to filter() for year and gender first and then use summarise(),
to get a vector of the means and put this into the barplot() function:
data %>% filter(gender==1,year==2013) %>% select(-userid,-gender,-year) %>% summarise_all(mean) %>%
as.numeric() %>%
barplot()
Instead of doing this for every combination of year and gender,
is there a more elegant way, using ggplot and facet_wrap()?
I may have misunderstood how you want the plot arranged, but if you want to show the mean score answer per year and gender group, you could do facets like this:
library(tidyverse)
data %>%
pivot_longer(starts_with("Q")) %>%
group_by(year, gender, name) %>%
summarize(value = mean(value)) %>%
ggplot(aes(name, value)) +
geom_col(fill = 'deepskyblue4') +
facet_grid(year ~ gender) +
labs(x = 'Question', y = 'Average score') +
theme_minimal(base_size = 16)
Maybe you want something like this, with facet_wrap and geom_col, where the mean is calculated with rowMeans:
library(dplyr)
library(ggplot2)
data %>%
mutate(mean = rowMeans(select(., starts_with("Q")), na.rm = TRUE)) %>%
ggplot(aes(x = year, y = mean, fill = factor(gender))) +
geom_col() +
labs(x = 'Year', y = 'Mean Q1 to Q5', fill = 'Gender') +
theme_bw() +
facet_wrap(~userid)
Created on 2022-10-28 with reprex v2.0.2
First, pivot your data from wide to long format, group by year, gender, and Q, and summarize to mean values.
library(tidyr)
library(dplyr)
library(ggplot2)
data_long <- data %>%
pivot_longer(Q1:Q5, names_to = "Q", values_to = "value") %>%
group_by(year, gender, Q) %>%
summarize(value = mean(value), .groups = "drop")
data_long
# A tibble: 30 × 4
year gender Q value
<dbl> <dbl> <chr> <dbl>
1 2013 0 Q1 1.5
2 2013 0 Q2 1.5
3 2013 0 Q3 4
4 2013 0 Q4 2
5 2013 0 Q5 1
6 2013 1 Q1 3
7 2013 1 Q2 4
8 2013 1 Q3 1
9 2013 1 Q4 1
10 2013 1 Q5 1
# … with 20 more rows
Then plot using ggplot2::facet_grid().
ggplot(data_long, aes(Q, value)) +
geom_col() +
facet_grid(year ~ gender)
A base R alternative (the \(x) and |> syntax requires R >= 4.1): aggregate, then barplot.
par(mfrow=c(1, 4))
sapply(unique(data$year), \(x) {
as.matrix(aggregate(cbind(Q1, Q2, Q3, Q4, Q5) ~ gender, data[data$year == x, ], FUN=mean)[-1]) |>
barplot(beside=TRUE, col=c(2, 4), main=x)
})
plot.new()
legend('left', legend=c('m', 'f'), col=c(2, 4), cex=1.2, pch=15, bty='n')
This approach does not require you to calculate the mean first; stat_summary() handles that when you specify fun = mean.
library(tidyverse)
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
gender = c(1,1,1,0,0,0,0,0,0),
Q1 = c(3,2,3,1,0,1,2,1,0),
Q2 = c(4,3,4,2,0,2,1,4,3),
Q3 = c(1,2,1,3,5,4,5,4,5),
Q4 = c(1,2,1,2,4,3,2,2,1),
Q5 = c(1,1,1,2,1,0,0,0,1))
data %>%
select(starts_with("Q")) %>% # Selects each column that starts with "Q"
pivot_longer(cols = everything()) %>% # Pivot to long format
ggplot(aes(x = name, y = value, fill = name)) +
stat_summary(geom = "bar", fun = "mean") + # Geom and function can be changed easily
theme_classic() +
labs(x = "Q", y = "Mean value", fill = NULL)
Created on 2022-10-28 by the reprex package (v2.0.1)
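The stat_summary() idea can probably be combined with facets to get one panel per year and gender, which is what the question asks for. The following is only a sketch: it keeps `year` and `gender` when pivoting and lets `stat_summary()` compute the mean inside each panel. The column name `Q` and the use of `label_both` are my own choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

p <- data %>%
  # Keep year and gender so they are available for faceting
  pivot_longer(Q1:Q5, names_to = "Q", values_to = "value") %>%
  ggplot(aes(x = Q, y = value)) +
  # The mean is computed per Q within each facet panel
  stat_summary(geom = "bar", fun = mean) +
  facet_grid(year ~ gender, labeller = label_both)
p
```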

How to create layered line plots looping over a variable

City Year Month Deaths Guns Shootings
Miami 2010 1 69 73800 701
Miami 2010 2 99 85050 738
Miami 2010 3 122 92650 784
Houston 2010 1 98 92100 789
Houston 2010 2 146 103900 799
Houston 2010 3 162 136100 772
For each city, I want to create a layered line plot with the Month and Year on the x-axis and colored lines corresponding to deaths, guns, and shootings. But I don't see how to do this.
So far I tried
df <- shootings %>%
select(city, date, sales, volume, median, listings, inventory) %>%
gather(key = "variable", value = "value", -c(city,date))
head(df)
df<-df[df$city=='Miami',]
ggplot(df, aes(x = date, y = value)) +
geom_line(aes(color = variable, linetype = variable))
And that can give me the layered line plot for one specific city, but is there a way to loop this through all the cities in my dataframe? Also how do I have this show the month and year labeled on the x-axis?
Lastly, when I run
library(ggplot2)
library(reshape2)
library(tidyverse)
for (city_name in df$city){
df %>% filter(city == city_name) %>%
pivot_longer(-c(city, year, month, date),
names_to = "Statistic") %>%
ggplot(aes(x = date, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic))
}
I get no output. Why is this?
Using facet_wrap():
library(tidyverse)
df <- read.table(text = "City Year Month Deaths Guns Shootings
Miami 2010 1 69 73800 701
Miami 2010 2 99 85050 738
Miami 2010 3 122 92650 784
Houston 2010 1 98 92100 789
Houston 2010 2 146 103900 799
Houston 2010 3 162 136100 772 ",
header = TRUE)
df %>%
mutate(Month = factor(Month, labels = month.name[1:3])) %>%
pivot_longer(-c(City, Year, Month),
names_to = "Statistic") %>%
ggplot(aes(x = Month, y = value)) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(City ~ Year)
# The difference between the number of guns and the number of
# deaths is pretty large; perhaps plot 'log(value)' instead?
df %>%
mutate(Month = factor(Month, labels = month.name[1:3])) %>%
pivot_longer(-c(City, Year, Month),
names_to = "Statistic") %>%
ggplot(aes(x = Month, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(~ City)
# For Month and Year on the x-axis, one option would be:
df %>%
mutate(Month = factor(Month, labels = month.abb[1:3])) %>%
mutate(Month_Year = paste(Month, Year)) %>%
mutate(Date = factor(Month_Year,
levels = paste(
month.abb,
2010:2010,
sep = " "
), ordered = TRUE)) %>%
select(-Month_Year) %>%
pivot_longer(-c(City, Year, Month, Date),
names_to = "Statistic") %>%
ggplot(aes(x = Date, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(~ City)
Created on 2022-07-03 by the reprex package (v2.0.1)
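As for why the for loop in the question shows nothing: ggplot objects are only drawn when they are printed, and automatic printing does not happen inside a loop, so an explicit print() is needed. The loop in the question also iterates over every row of the city column rather than over the distinct cities. A minimal sketch of a corrected loop, assuming the column names of the example data above (the list `plots` is just an illustrative way to keep the results):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- read.table(text = "City Year Month Deaths Guns Shootings
Miami 2010 1 69 73800 701
Miami 2010 2 99 85050 738
Miami 2010 3 122 92650 784
Houston 2010 1 98 92100 789
Houston 2010 2 146 103900 799
Houston 2010 3 162 136100 772",
header = TRUE)

plots <- list()
# Loop over each distinct city once, not over every row
for (city_name in unique(as.character(df$City))) {
  p <- df %>%
    filter(City == city_name) %>%
    pivot_longer(-c(City, Year, Month), names_to = "Statistic") %>%
    ggplot(aes(x = Month, y = log(value))) +
    geom_line(aes(color = Statistic, linetype = Statistic, group = Statistic)) +
    ggtitle(city_name)
  print(p)                  # explicit print: nothing is drawn without it
  plots[[city_name]] <- p   # keep the plot for later use, e.g. ggsave()
}
```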

How to set specific date as the beginning date of the year

I want to plot the average annual value of the stream flow data using
WATER YEAR which starts at October and ends at September (say 10/01/1983 to 09/30/1984, this is defined as 1984 water year)
I tried to find solutions elsewhere but I have failed.
Now I'm using the following script to plot the annual average flow
library(tidyverse)
library(lubridate)
library(ggplot2)
#df <- read_csv('dataframe.csv')
df <- df %>%
mutate(date = mdy(df$date))
df <- df %>%
mutate(year = floor_date(date, "year")) %>%
group_by(year) %>%
summarize(avg = mean(flow))
y <- df$avg
x <- as.Date(df$year, format = "Y")
d <- data.frame(x = x, y = y)
# interpolate values from zero to y and create corresponding number of x values
vals <- lapply(d$y, function(y) seq(0, y, by = 0.1))
y <- unlist(vals)
mid <- rep(d$x, lengths(vals))
d2 <- data.frame(x = mid - 100,
xend = mid + 100,
y = y,
yend = y)
ggplot(data = d2, aes(x = x, xend = xend, y = y, yend = yend, color = y)) +
geom_segment(size = 2) +
scale_color_gradient2(low = "midnightblue", mid = "deepskyblue", high = "aquamarine",
midpoint = max(d2$y)/2)+
scale_x_date(date_breaks = "1 year",date_labels = "%Y", expand = c(0,0)) +
theme(axis.text.x = element_text(angle=90, vjust=.5))+
labs(x = "Years", y = "Mean Annual Flow (cms)")+
ggtitle("Mean Annual Flow, Rancho River at ELdorado (1983-2020)")+
theme(plot.title = element_text(hjust = 0.5))
For this I got the following results using calendar year
If I used Water Year there will be no results for 1983
The data frame can be found in the following link
https://drive.google.com/file/d/11PVub9avzMFhUz02cHfceGh9DrlVQDbD/view?usp=sharing
Kindly assist.
If the date falls on or after 10/01 of its calendar year, it belongs to the next year (in water years):
df %>%
mutate(date=mdy(date), year=year(date), year = year + (date >= mdy(paste0("10/01/", year))))
# A tibble: 5,058 x 3
date flow year
<date> <dbl> <dbl>
1 1983-10-01 3.31 1984
2 1983-10-02 3.19 1984
3 1983-10-03 3.7 1984
4 1983-10-04 3.83 1984
5 1983-10-05 3.44 1984
6 1983-10-06 4.37 1984
7 1983-10-07 6.78 1984
8 1983-10-08 6.3 1984
9 1983-10-09 6.46 1984
10 1983-10-10 6.62 1984
# … with 5,048 more rows
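To get from there to the annual average, the water year can be used as the grouping variable. The sketch below uses a small made-up data frame (`daily` is hypothetical, not the real stream flow file) and the shortcut `month(date) >= 10`, which should be equivalent to comparing against October 1 of the same calendar year.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily flows around two water-year boundaries
daily <- data.frame(
  date = as.Date(c("1983-09-29", "1983-09-30", "1983-10-01",
                   "1984-09-30", "1984-10-01")),
  flow = c(2, 4, 3, 5, 6)
)

annual <- daily %>%
  # October to December belong to the following water year
  mutate(water_year = year(date) + (month(date) >= 10)) %>%
  group_by(water_year) %>%
  summarize(avg = mean(flow), .groups = "drop")

annual
```

Here 1983-10-01 and 1984-09-30 land in the same water year (1984), while the two September 1983 days fall in water year 1983.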

Barplot for four variables side by side for each month (January to December)

I am a beginner in R and would like to plot a bar chart of my rainfall and solar radiation data for two years, side by side, from January to December (attached data).
data to plot:
I am trying to plot the first row (January), but I am getting this error:
Error in -0.01 * height : non-numeric argument to binary operator
How do I deal with that, and which script should I use to get my data plotted?
Regards,
Here is an example
library(tidyverse)
set.seed(123456)
df <- data.frame(Month = month.abb,
R_2014 = runif(n = 12, min = 0, max = 195),
R_2015 = runif(n = 12, min = 0, max = 295),
S_2014 = runif(n = 12, min = 3, max = 10),
S_2015 = runif(n = 12, min = 4, max = 10))
df
#> Month R_2014 R_2015 S_2014 S_2015
#> 1 Jan 155.56794 267.06645 6.344445 9.714178
#> 2 Feb 146.94519 259.85035 7.903533 9.229704
#> 3 Mar 76.29486 293.18178 9.159223 8.272923
#> 4 Apr 66.60356 264.30712 9.144556 7.632427
#> 5 May 70.45235 259.19979 8.977157 5.352593
#> 6 Jun 38.67722 58.29370 4.161913 8.437571
#> 7 Jul 104.29730 98.82311 6.660781 9.373255
#> 8 Aug 18.82262 229.27586 9.083897 5.766779
#> 9 Sep 192.63015 47.08010 4.618097 7.092115
#> 10 Oct 32.67605 23.79035 3.833566 6.607897
#> 11 Nov 155.60788 39.13185 8.767659 7.450991
#> 12 Dec 115.78983 50.71209 3.561939 8.445736
# convert from wide to long format
# separate columns to get variable and year
df_long <- df %>%
gather(key, value, -Month) %>%
separate(key, into = c("variable", "Year"), "_") %>%
mutate(Month = factor(Month, levels = month.abb))
head(df_long)
#> Month variable Year value
#> 1 Jan R 2014 155.56794
#> 2 Feb R 2014 146.94519
#> 3 Mar R 2014 76.29486
#> 4 Apr R 2014 66.60356
#> 5 May R 2014 70.45235
#> 6 Jun R 2014 38.67722
# facet by year
plt1 <- ggplot(df_long, aes(x = Month, y = value, fill = variable)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap(~ Year)
plt1
# facet by variable
plt2 <- ggplot(df_long, aes(x = Month, y = value, fill = Year)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap( ~ variable, scales = "free_y")
plt2
Created on 2018-06-01 by the reprex package (v0.2.0).

R plot months for the first 2 years

I have a data frame with data for max 2 years period on different objects:
ISBN Date Quantity
3457 2004-06-15 10
3457 2004-08-16 6
3457 2004-08-19 10
3457 2005-04-19 7
3457 2005-04-20 12
9885 2013-01-15 10
9885 2013-03-16 6
9855 2013-08-19 10
9885 2014-09-19 7
9885 2014-09-20 12
How can I plot Jan to Dec for the 1st year, continued by Jan to Dec for the 2nd year?
I guess the idea is to normalize the years (to have 1st, 2nd), but not the months. (here's an example)
Number of Items Sold over 2 Years Period Since Release
I'd use the lubridate package for something like this. Note that I call the data frame df because you didn't give it a name.
So for example:
library(lubridate)
First format the date like so:
df$Date <- ymd(df$Date)
Then extract the month and the year:
df$Month <- month(df$Date, label=TRUE, abbr=TRUE)
df$Year <- year(df$Date)
From there you can plot your results with ggplot2:
library(ggplot2)
ggplot(df, aes(x=Month, y=Quantity, colour=Year)) +
geom_point()
Note your question could be asked better here as you haven't provided a reproducible example.
You could try:
library(dplyr)
library(lubridate)
library(ggplot2)
library(toOrdinal) # provides toOrdinal(), which turns 1 into "1st"
data <- df %>%
group_by(ISBN) %>%
arrange(Date) %>%
mutate(Year = year(Date),
Month = month(Date, label = TRUE),
Rank = paste(sapply(cumsum(Year != lag(Year,default=0)), toOrdinal), "Year")) %>%
group_by(Rank, Month, .add = TRUE) %>% # .add replaces the deprecated add argument
summarise(Sum = sum(Quantity))
ggplot(data = data, aes(x = Month, y = Sum,
group = factor(ISBN),
colour = factor(ISBN))) +
geom_line(stat = "identity") +
facet_grid(. ~ Rank) +
scale_colour_discrete(name = "ISBN") +
theme(panel.spacing = unit(0, "lines"), # panel.margin was renamed to panel.spacing
axis.text.x = element_text(angle = 90))
Assuming the following df:
df <- data.frame(
ISBN = sample(c(3457, 9885), 1000, replace = TRUE),
Date = sample(seq(as.Date('2004/01/01'),
as.Date('2011/12/31'), by = "month"),
1000, replace = TRUE),
Quantity = sample(1:12, 1000, replace = TRUE)
)
This would produce:
