How to set specific date as the beginning date of the year - r

I want to plot the average annual value of stream flow data using the
WATER YEAR, which starts in October and ends in September (for example, 10/01/1983 to 09/30/1984 is defined as the 1984 water year).
I tried to find solutions elsewhere but have not found one.
I am currently using the following script to plot the annual average flow:
library(tidyverse)
library(lubridate)
library(ggplot2)
#df <- read_csv('dataframe.csv')
df <- df %>%
mutate(date = mdy(df$date))
df <- df %>%
mutate(year = floor_date(date, "year")) %>%
group_by(year) %>%
summarize(avg = mean(flow))
y <- df$avg
x <- as.Date(df$year, format = "Y")
d <- data.frame(x = x, y = y)
# interpolate values from zero to y and create corresponding number of x values
vals <- lapply(d$y, function(y) seq(0, y, by = 0.1))
y <- unlist(vals)
mid <- rep(d$x, lengths(vals))
d2 <- data.frame(x = mid - 100,
xend = mid + 100,
y = y,
yend = y)
ggplot(data = d2, aes(x = x, xend = xend, y = y, yend = yend, color = y)) +
geom_segment(size = 2) +
scale_color_gradient2(low = "midnightblue", mid = "deepskyblue", high = "aquamarine",
midpoint = max(d2$y)/2)+
scale_x_date(date_breaks = "1 year",date_labels = "%Y", expand = c(0,0)) +
theme(axis.text.x = element_text(angle=90, vjust=.5))+
labs(x = "Years", y = "Mean Annual Flow (cms)")+
ggtitle("Mean Annual Flow, Rancho River at ELdorado (1983-2020)")+
theme(plot.title = element_text(hjust = 0.5))
With this I got the following results, which are based on the calendar year.
If I used the water year, there would be no results for 1983.
The data frame can be found in the following link
https://drive.google.com/file/d/11PVub9avzMFhUz02cHfceGh9DrlVQDbD/view?usp=sharing
Kindly assist.

If a date falls on or after 10/01 of its calendar year, it belongs to the next year (in water years):
df %>%
mutate(date=mdy(date), year=year(date), year = year + (date >= mdy(paste0("10/01/", year))))
# A tibble: 5,058 x 3
date flow year
<date> <dbl> <dbl>
1 1983-10-01 3.31 1984
2 1983-10-02 3.19 1984
3 1983-10-03 3.7 1984
4 1983-10-04 3.83 1984
5 1983-10-05 3.44 1984
6 1983-10-06 4.37 1984
7 1983-10-07 6.78 1984
8 1983-10-08 6.3 1984
9 1983-10-09 6.46 1984
10 1983-10-10 6.62 1984
# … with 5,048 more rows
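Building on the water-year column above, a minimal sketch of the remaining step (averaging by water year) might look like the following. The six-row `flows` table is made-up sample data, not the asker's file, and `water_year` is a name chosen for illustration. The test `month(date) >= 10` is an equivalent way of writing the comparison against 10/01.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data standing in for the asker's file
flows <- tibble(
  date = c("09/30/1983", "10/01/1983", "12/31/1983",
           "01/01/1984", "09/30/1984", "10/01/1984"),
  flow = c(1, 2, 4, 6, 8, 10)
)

water_year_means <- flows %>%
  mutate(date = mdy(date),
         # October through December count toward the following water year
         water_year = year(date) + (month(date) >= 10)) %>%
  group_by(water_year) %>%
  summarize(avg = mean(flow), .groups = "drop")

print(water_year_means)
```

Because `water_year` is a plain number rather than a Date, a plot built on it would use `scale_x_continuous()` instead of `scale_x_date()`.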

Related

How to build a Heatmap for each facet with its own respective scale instead of just one generic scale for all in r?

I am trying to create a heatmap that assigns colors based on % vaccinated within each month (each row),
for example a color comparison between all states for January, then
a color comparison between all states for March,
then April ... June, etc.
Issue: I would like each month to have its own high and low scale. I am trying to do that with facets, but ggplot assigns one common low-high scale to all the facets/months.
library(tidyverse)
library(lubridate)
library(scales)
file_url1 <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/df_vaccination.csv"
df_vaccination <- read.csv(file_url1)
df_vaccination <- df_vaccination %>%
mutate(Updated.On = as.Date(Updated.On))
Code: I have tried
df_vaccination %>%
filter(State != "India") %>%
# summarise each month, state's vaccination
mutate(month_abbr = month(Updated.On, label = TRUE, abbr = TRUE),
State = fct_reorder(State, Population, max)) %>%
group_by(month_abbr, State) %>%
summarise(monthly_ind_vaccinated = sum(Total.Individuals.Vaccinated_Dailycalc,
na.rm = TRUE),
Population = first(Population), .groups = "drop") %>%
# get % Vaccination to State population for each month
group_by(State) %>%
mutate(prc_vaccinated_per_pop = monthly_ind_vaccinated / Population) %>%
na.omit() %>%
ungroup() %>%
filter(State %in% c("Delhi","Maharashtra")) %>%
# group_by(month_abbr) %>%
ggplot(aes(x = State, y = month_abbr, fill = prc_vaccinated_per_pop)) +
geom_tile() +
scale_fill_gradient2(low = "white", high = "darkblue", labels = percent) +
facet_wrap(~as.factor(month_abbr), scales = "free_y", nrow = 6) +
theme(axis.text.x = element_text(angle = 90, vjust = -.02),
strip.text = element_blank()) +
labs(title = "States with highest % Vaccination each month ?",
caption = "created by ViSa",
fill = "% Vaccinated each month",
x = "", y = "")
output:
I think that because the color is mapped to fill, different scales cannot be applied to different facets.
Is there anything like (scales = free_fill) instead of (scales = free_y)?
data output:
# A tibble: 12 x 5
# Groups: month_abbr [6]
month_abbr State monthly_ind_vaccina~ Population prc_vaccinated_per_~
<ord> <fct> <int> <dbl> <dbl>
1 Jan Delhi 43948 18710922 0.00235
2 Jan Maharash~ 228424 123144223 0.00185
3 Feb Delhi 322859 18710922 0.0173
4 Feb Maharash~ 794370 123144223 0.00645
5 Mar Delhi 666628 18710922 0.0356
6 Mar Maharash~ 4590035 123144223 0.0373
7 Apr Delhi 1547324 18710922 0.0827
8 Apr Maharash~ 7942882 123144223 0.0645
9 May Delhi 1613335 18710922 0.0862
10 May Maharash~ 4455440 123144223 0.0362
11 Jun Delhi 250366 18710922 0.0134
12 Jun Maharash~ 1777873 123144223 0.0144
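All facets in a ggplot share one fill scale, and the `scales` argument of `facet_wrap()` only frees the positional axes ("fixed", "free", "free_x", "free_y"). One possible workaround, sketched here rather than offered as the definitive answer, is to rescale the value within each month before plotting, so that every month spans its own low-to-high range. The column `rel_fill` is a hypothetical name, and the numbers are the first three months from the data output above.

```r
library(dplyr)
library(ggplot2)

vacc <- tibble(
  month_abbr = rep(c("Jan", "Feb", "Mar"), each = 2),
  State = rep(c("Delhi", "Maharashtra"), times = 3),
  prc_vaccinated_per_pop = c(0.00235, 0.00185, 0.0173, 0.00645, 0.0356, 0.0373)
)

# Express each state's value relative to the best state of the same month
vacc_scaled <- vacc %>%
  group_by(month_abbr) %>%
  mutate(rel_fill = prc_vaccinated_per_pop / max(prc_vaccinated_per_pop)) %>%
  ungroup()

p <- ggplot(vacc_scaled, aes(x = State, y = month_abbr, fill = rel_fill)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkblue") +
  labs(fill = "Relative to month's maximum", x = "", y = "")
```

The trade-off is that colors are then comparable only within a month, not across months.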

R pivot_longer and ggplot errorbar with two name/key columns

Let's assume we have the following artificial data:
df <- data.frame(Year = c(2015,2016,2017,2018),
GPP_mean = c(1700,1800,1750,1850),
Reco_mean = c(-1700,-1800,-1750,-1850),
GPP_min = c(1600,1700,1650,1750),
GPP_max = c(1800,1900,1850,1950),
Reco_min = c(-1600,-1700,-1650,-1750),
Reco_max = c(-1800,-1900,-1850,-1950))
I'd like to plot bars for each mean value and use the min/max columns for the errorbar.
This is what I've achieved so far:
df %>%
pivot_longer(cols = -Year,
names_to = c("variable", "stats"),
names_sep = "_")
Which gives us:
# A tibble: 24 x 4
Year variable stats value
<dbl> <chr> <chr> <dbl>
1 2015 GPP mean 1700
2 2015 Reco mean -1700
3 2015 GPP min 1600
4 2015 GPP max 1800
5 2015 Reco min -1600
6 2015 Reco max -1800
7 2016 GPP mean 1800
8 2016 Reco mean -1800
9 2016 GPP min 1700
10 2016 GPP max 1900
# … with 14 more rows
So far, so good (I guess?).
From here on, I have no clue of how I can tell ggplot to plot the mean values as the bars and use min/max for the errorbars. Any help appreciated, thanks.
Additional solution using tidyverse:
library(tidyverse)
out <- df %>%
pivot_longer(-Year, names_sep = "_", names_to = c("index", ".value"))
ggplot(out, aes(Year, mean, fill = index)) +
geom_col() +
geom_errorbar(aes(ymin = min, ymax = max), width = 0.5)
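The `.value` entry in `names_to` is what makes this work: the part of each column name after the underscore (`mean`, `min`, `max`) becomes its own column instead of being stacked into a single `stats` column. A small sketch on the first two years of the data above shows the resulting shape (`df_small` and `out_small` are names used only for this illustration):

```r
library(tidyr)

df_small <- data.frame(Year = c(2015, 2016),
                       GPP_mean = c(1700, 1800),
                       Reco_mean = c(-1700, -1800),
                       GPP_min = c(1600, 1700),
                       GPP_max = c(1800, 1900),
                       Reco_min = c(-1600, -1700),
                       Reco_max = c(-1800, -1900))

out_small <- pivot_longer(df_small, -Year,
                          names_sep = "_",
                          names_to = c("index", ".value"))

# One row per Year/index pair, with mean, min and max side by side
print(out_small)
```

With the statistics in separate columns, `aes(ymin = min, ymax = max)` can refer to them directly.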
You should stick with your original data frame. There's no need to pivot longer for this:
ggplot(df, aes(Year, GPP_mean)) +
geom_col(fill = "forestgreen", colour = "black") +
geom_errorbar(aes(ymin = GPP_min, ymax = GPP_max), width = 0.5) +
geom_col(aes(y = Reco_mean), fill = "red", colour = "black", position = "dodge") +
geom_errorbar(aes(ymin = Reco_max, ymax = Reco_min), width = 0.5)

ggplot2 plotting two variables of two groups and with different scales

I have a dataframe of following form:
School_type Year fund rate
1 1998 8 0.1
0 1998 7 0.2
1 1999 9 0.11
0 1999 8 0.22
1 2000 10 0.12
0 2000 15 0.23
I am thinking about plotting the "fund" and "rate" for each school type and the x axis is year, so there are four lines--two higher lines and two lower lines, but I don't know how to implement this with two scales of y-axes. Thanks in advance.
I am not sure if this is what you are looking for, but here is my two cents on your question.
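library(dplyr)    # %>%, mutate(), filter()
library(reshape2) # melt()
library(ggplot2)
#create the dataframe (school_type ordered as in the question's table)
df = data.frame("school_type" = rep(c(1, 0), 3), "year" = c("1998","1998","1999","1999","2000","2000"),
"fund" = c("8","7","9","8","10","15"), "rate" = c("0.1","0.2","0.11","0.22","0.12","0.23"))
#Modify the variable type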
df$fund = as.numeric(as.character(df$fund))
df$rate = as.numeric(as.character(df$rate))
#plot the log of the variables
df %>%
mutate(log_fund = log(fund),
log_rate = log(rate)) %>%
melt(id.vars = c("school_type","year")) %>%
filter(variable %in% c("log_fund","log_rate")) %>%
ggplot(aes(x = year, y = value, group = variable, color = variable, shape = variable)) +
geom_line(size = 1) +
geom_point(size = 3) +
facet_wrap(~ school_type) +
theme_bw()
Result:
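If a true second y-axis is wanted rather than a log transform, ggplot2's `sec_axis()` can relabel a rescaled series. This is a different technique from the answer above, sketched under the assumption that multiplying `rate` by a constant is acceptable. `schools` and `k` are illustrative names, and `k` is chosen here only so that both series share a similar range.

```r
library(ggplot2)

schools <- data.frame(
  school_type = factor(c(1, 0, 1, 0, 1, 0)),
  year = c(1998, 1998, 1999, 1999, 2000, 2000),
  fund = c(8, 7, 9, 8, 10, 15),
  rate = c(0.1, 0.2, 0.11, 0.22, 0.12, 0.23)
)

# Assumed scaling factor: stretches rate so its maximum matches fund's maximum
k <- max(schools$fund) / max(schools$rate)

p <- ggplot(schools, aes(x = year, colour = school_type)) +
  geom_line(aes(y = fund, linetype = "fund")) +
  geom_line(aes(y = rate * k, linetype = "rate")) +
  # The secondary axis undoes the scaling so its labels are in rate units
  scale_y_continuous(name = "fund",
                     sec.axis = sec_axis(~ . / k, name = "rate")) +
  scale_x_continuous(breaks = unique(schools$year)) +
  theme_bw()
```

This gives four lines (two variables for each school type), with the left axis read for `fund` and the right axis for `rate`.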

Barplot for four variables side by side for each month (January to December)

I am a beginner in R and would like to plot a bar chart of my rainfall and solar radiation data for two years side by side from January to December (attached data).
data to plot:
I am trying to plot the first row (January) but I am getting this error
Error in -0.01 * height : non-numeric argument to binary operator
How can I deal with that, and which script should I use to get my data plotted?
Regards,
Here is an example
library(tidyverse)
set.seed(123456)
df <- data.frame(Month = month.abb,
R_2014 = runif(n = 12, min = 0, max = 195),
R_2015 = runif(n = 12, min = 0, max = 295),
S_2014 = runif(n = 12, min = 3, max = 10),
S_2015 = runif(n = 12, min = 4, max = 10))
df
#> Month R_2014 R_2015 S_2014 S_2015
#> 1 Jan 155.56794 267.06645 6.344445 9.714178
#> 2 Feb 146.94519 259.85035 7.903533 9.229704
#> 3 Mar 76.29486 293.18178 9.159223 8.272923
#> 4 Apr 66.60356 264.30712 9.144556 7.632427
#> 5 May 70.45235 259.19979 8.977157 5.352593
#> 6 Jun 38.67722 58.29370 4.161913 8.437571
#> 7 Jul 104.29730 98.82311 6.660781 9.373255
#> 8 Aug 18.82262 229.27586 9.083897 5.766779
#> 9 Sep 192.63015 47.08010 4.618097 7.092115
#> 10 Oct 32.67605 23.79035 3.833566 6.607897
#> 11 Nov 155.60788 39.13185 8.767659 7.450991
#> 12 Dec 115.78983 50.71209 3.561939 8.445736
# convert from wide to long format
# separate columns to get variable and year
df_long <- df %>%
gather(key, value, -Month) %>%
separate(key, into = c("variable", "Year"), "_") %>%
mutate(Month = factor(Month, levels = month.abb))
head(df_long)
#> Month variable Year value
#> 1 Jan R 2014 155.56794
#> 2 Feb R 2014 146.94519
#> 3 Mar R 2014 76.29486
#> 4 Apr R 2014 66.60356
#> 5 May R 2014 70.45235
#> 6 Jun R 2014 38.67722
# facet by year
plt1 <- ggplot(df_long, aes(x = Month, y = value, fill = variable)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap(~ Year)
plt1
# facet by variable
plt2 <- ggplot(df_long, aes(x = Month, y = value, fill = Year)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap( ~ variable, scales = "free_y")
plt2
Created on 2018-06-01 by the reprex package (v0.2.0).
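On the error itself: the question's code is not shown, so this is only a guess, but `non-numeric argument to binary operator` at `-0.01 * height` suggests that `barplot()` received a character `height`, which is what `as.matrix()` returns for a data frame that still contains the month column. A base R sketch, using two made-up rows in the same layout as the example above:

```r
rain <- data.frame(Month = c("Jan", "Feb"),
                   R_2014 = c(155.6, 146.9),
                   R_2015 = c(267.1, 259.9))

# Keep only the numeric columns and flatten the first row to a named numeric vector
jan <- unlist(rain[1, c("R_2014", "R_2015")])

barplot(jan, main = "January rainfall", ylab = "Rainfall")
```

Checking `is.numeric()` on whatever is passed as `height` is a quick way to confirm or rule out this cause.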

R plot months for the first 2 years

I have a data frame with data for max 2 years period on different objects:
ISBN Date Quantity
3457 2004-06-15 10
3457 2004-08-16 6
3457 2004-08-19 10
3457 2005-04-19 7
3457 2005-04-20 12
9885 2013-01-15 10
9885 2013-03-16 6
9855 2013-08-19 10
9885 2014-09-19 7
9885 2014-09-20 12
How can I plot Jan to Dec for the 1st year, continued by Jan to Dec for the 2nd year?
I guess the idea is to normalize the years (to have 1st, 2nd), but not the months. (Here's an example.)
[Figure: Number of Items Sold over 2 Years Period Since Release]
I'd use the lubridate package for something like this. Note that I am calling the data frame df because you didn't give it a name.
So for example:
library(lubridate)
First format the date like so:
df$Date <- ymd(df$Date)
Then extract the month and the year:
df$Month <- month(df$Date, label=TRUE, abbr=TRUE)
df$Year <- year(df$Date)
From there you can plot your results with ggplot2:
library(ggplot2)
ggplot(df, aes(x=Month, y=Quantity, colour=Year)) +
geom_point()
Note your question could be asked better here as you haven't provided a reproducible example.
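To get the 1st/2nd-year normalization the question asks about, one option is to rank the calendar years within each ISBN. The sketch below uses dplyr's `dense_rank()` on a few rows taken from the question's table. `sales` and `rel_year` are names chosen for illustration.

```r
library(dplyr)
library(lubridate)

sales <- tibble(
  ISBN = c(3457, 3457, 3457, 9885, 9885, 9885),
  Date = ymd(c("2004-06-15", "2004-08-16", "2005-04-19",
               "2013-01-15", "2013-03-16", "2014-09-19")),
  Quantity = c(10, 6, 7, 10, 6, 7)
)

sales_ranked <- sales %>%
  group_by(ISBN) %>%
  mutate(Month = month(Date, label = TRUE, abbr = TRUE),
         # 1 for the earliest calendar year of each ISBN, 2 for the next, ...
         rel_year = dense_rank(year(Date))) %>%
  ungroup()

print(sales_ranked)
```

Faceting or coloring by `rel_year` instead of `Year` would then line up the first and second years of different ISBNs.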
You could try:
library(dplyr)
library(lubridate)
library(ggplot2)
library(toOrdinal) # provides toOrdinal()
data <- df %>%
group_by(ISBN) %>%
arrange(Date) %>%
mutate(Year = year(Date),
Month = month(Date, label = TRUE),
Rank = paste(sapply(cumsum(Year != lag(Year,default=0)), toOrdinal), "Year")) %>%
group_by(Rank, Month, .add = TRUE) %>%
summarise(Sum = sum(Quantity))
ggplot(data = data, aes(x = Month, y = Sum,
group = factor(ISBN),
colour = factor(ISBN))) +
geom_line(stat = "identity") +
facet_grid(. ~ Rank) +
scale_colour_discrete(name = "ISBN") +
theme(panel.spacing = unit(0, "lines"),
axis.text.x = element_text(angle = 90))
Assuming the following df:
df <- data.frame(
ISBN = sample(c(3457, 9885), 1000, replace = TRUE),
Date = sample(seq(as.Date('2004/01/01'),
as.Date('2011/12/31'), by = "month"),
1000, replace = TRUE),
Quantity = sample(1:12, 1000, replace = TRUE)
)
This would produce:
