I'm using heatwaveR package in R to make a plot (event_line()) and visualize the heatwaves over the years. The first step is to run ts2clm(), but this command turn my temp colum into NA so I can't plot anything. Does anyone see any errors?
This is my data:
>>> Data
t temp
[Date] [num]
0 2020-05-14 6.9
1 2020-05-06 6.8
2 2020-04-23 5.5
3 2020-04-16 3.6
4 2020-03-31 2.5
5 2020-02-25 2.3
6 2020-01-30 2.8
7 2019-10-02 13.4
8 2022-09-02 19
9 2022-08-15 18.7
...
687 1974-05-06 4.2
This is my code:
#Load data
Data <- read_xlsx("seili_raw_temp.xlsx")
#Set t as class Date
Data$t <- as.Date(Data$t, format = "%Y-%m-%d")
#Constructs seasonal and threshold climatologies
ts <- ts2clm(Data, climatologyPeriod = c("1974-05-06", "2020-05-14"))
#This is the point where almost all temp values turn into NA, so you can ignore below.
#Detect_even
res <- detect_event(ts)
#Draw heatwave plot
event_line(res, min_duration = "3",metric = "int_cum",
start_date = c("1974-05-06"), end_date = c("2020-05-14"))
The data you posted isn't long enough to get the function to work, so I just made some up:
library(heatwaveR)
library(lubridate)
set.seed(1234)
Data <- data.frame(
t = seq(ymd("2015-01-01"), ymd("2023-01-01"), by="7 day"))
Data$temp <- runif(nrow(Data), 0,45)
Then, when I execute the function, I get the result below. The problem is that your data (like the ones I generated) have one observation every 7 days. The ts2clm() function pads out the dataset so that every day has an entry and if a temperature was not observed on that day, it fills in with a missing value.
ts <- ts2clm(Data, climatologyPeriod = c("2015-01-01", "2022-12-29"))
ts
#> # A tibble: 2,920 × 5
#> doy t temp seas thresh
#> <int> <date> <dbl> <dbl> <dbl>
#> 1 1 2015-01-01 5.12 22.5 38.6
#> 2 2 2015-01-02 NA 22.4 38.5
#> 3 3 2015-01-03 NA 22.2 38.2
#> 4 4 2015-01-04 NA 22.1 37.9
#> 5 5 2015-01-05 NA 21.9 37.3
#> 6 6 2015-01-06 NA 21.7 36.8
#> 7 7 2015-01-07 NA 21.5 36.5
#> 8 8 2015-01-08 28.0 21.3 36.1
#> 9 9 2015-01-09 NA 21.2 36.1
#> 10 10 2015-01-10 NA 21.0 35.8
#> # … with 2,910 more rows
Created on 2023-02-10 by the reprex package (v2.0.1)
I have a data frame in R (xlsx file imported) that the first column contain dates with character type:
> dat
# A tibble: 4,372 x 3
date `1` `2`
<chr> <dbl> <dbl>
1 40544 35.5 35.5
2 40545 35.6 35.8
3 40546 37.2 36.4
4 40547 36.7 35.4
5 40548 36.6 35.3
I want to convert the character type into date type in R.
I tried the below code but NA's occur:
> dat%>%
+ mutate(date2 = as.Date(date, format= "%m-%d-%Y"))
# A tibble: 4,372 x 4
date `1` `2` date2
<chr> <dbl> <dbl> <date>
1 40544 35.5 35.5 NA
2 40545 35.6 35.8 NA
3 40546 37.2 36.4 NA
4 40547 36.7 35.4 NA
5 40548 36.6 35.3 NA
How can i fix this ?
Any help ?
Is this what you're looking for?
dat %>%
mutate(date2 = as.numeric(date),
date2 = as.Date(date2, origin = "1900-01-01"))
date date2
1 40544 2011-01-03
2 40545 2011-01-04
3 40546 2011-01-05
4 40547 2011-01-06
I have daily discharge data from a local stream near me. I am trying to sum and take the average of the daily data into weekly or monthly chunks so I can plot discharge_m3d(discharge) and Qs_sum(depletion) by weekly and monthly timeframes. Does anyone know how I can do this? I attached a figure of how my data frame looks.
People often use floor_date() from lubridate for these purposes. You can floor to a unit of month or week and then group by the resulting date column. Then you can use summarize() to compute the monthly or weekly sums/averages. From there you can use your plotting library of choice to visualize the result (like ggplot2, not shown).
This works even if you have more than one year of data (i.e. where the month or week number might repeat).
library(dplyr)
library(lubridate)
set.seed(123)
df <- tibble(
date = seq(
from = as.Date("2014-03-01"),
to = as.Date("2016-12-31"),
by = 1
),
Qs_sum = runif(length(date)),
discharge_m3d = runif(length(date))
)
df
#> # A tibble: 1,037 × 3
#> date Qs_sum discharge_m3d
#> <date> <dbl> <dbl>
#> 1 2014-03-01 0.288 0.560
#> 2 2014-03-02 0.788 0.427
#> 3 2014-03-03 0.409 0.448
#> 4 2014-03-04 0.883 0.833
#> 5 2014-03-05 0.940 0.720
#> 6 2014-03-06 0.0456 0.457
#> 7 2014-03-07 0.528 0.521
#> 8 2014-03-08 0.892 0.242
#> 9 2014-03-09 0.551 0.0759
#> 10 2014-03-10 0.457 0.391
#> # … with 1,027 more rows
df %>%
mutate(date = floor_date(date, unit = "month")) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 34 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-03-01 31 18.1 0.585 15.3 0.494
#> 2 2014-04-01 30 12.9 0.429 15.2 0.507
#> 3 2014-05-01 31 15.5 0.500 15.3 0.493
#> 4 2014-06-01 30 15.8 0.525 16.3 0.542
#> 5 2014-07-01 31 15.1 0.487 13.9 0.449
#> 6 2014-08-01 31 14.8 0.478 16.2 0.522
#> 7 2014-09-01 30 15.3 0.511 13.1 0.436
#> 8 2014-10-01 31 15.6 0.504 14.7 0.475
#> 9 2014-11-01 30 16.0 0.532 15.1 0.502
#> 10 2014-12-01 31 14.2 0.458 15.5 0.502
#> # … with 24 more rows
# Assert that the "start of the week" is Sunday.
# So groups are made of data from [Sunday -> Monday]
sunday <- 7L
df %>%
mutate(date = floor_date(date, unit = "week", week_start = sunday)) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 149 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-02-23 1 0.288 0.288 0.560 0.560
#> 2 2014-03-02 7 4.49 0.641 3.65 0.521
#> 3 2014-03-09 7 3.77 0.539 3.88 0.554
#> 4 2014-03-16 7 4.05 0.579 3.45 0.493
#> 5 2014-03-23 7 4.43 0.632 3.08 0.440
#> 6 2014-03-30 7 4.00 0.572 4.74 0.677
#> 7 2014-04-06 7 2.50 0.357 3.15 0.449
#> 8 2014-04-13 7 2.48 0.355 2.44 0.349
#> 9 2014-04-20 7 2.30 0.329 2.45 0.349
#> 10 2014-04-27 7 3.44 0.492 4.40 0.629
#> # … with 139 more rows
Created on 2022-04-13 by the reprex package (v2.0.1)
One way to approach this is using the lubridate and dplyr packages in the tidyverse. I assume here that your dates are year-month-day which they appear to be and that you only have one calendar year or at least no repeated months/weeks across two years.
monthly_discharge <- discharge %>%
filter(variable == "discharge") # First select just the rows that represent discharge (not clear if that's necessary here)
mutate(date = ymd(date), # convert date to a lubridate date object
month = month(date), # extract the numbered month from the date
week = week(date)) %>% # extract the numbered week in a year from the date
group_by(month, stream) %>% # group your data by month and stream
summarize(discharge_summary = mean(discharge_m3d)) # summarize your data so that each month has a single row with a single (mean) discharge value
# you can include multiple summary variables within the summarize function
This should produce a data frame with one row per month for each stream and a summary value for discharge. You could summarize by week by changing the month label in group_by to week.
Make use of the functions week(), month() and year() from the package lubridate to get the corresponding values for your date column. Afterwards we can find the means per week, month or year. For illustration, I added a row with year 2015, since there was only year 2014 in your sample data. Furthermore, for plotting reasons, I added a column "Year_Month" that shows the abbreviated month followed by year (x axis of the plot).
library(dplyr)
library(lubridate)
data <- data %>% mutate(Week = week(date), Month = month(date), Year = year(date)) %>%
group_by(Year, Week) %>%
mutate(mean_Week_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year, Month) %>%
mutate(mean_Month_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year) %>%
mutate(mean_Year_Qs = mean(Qs_sum)) %>%
ungroup() %>%
mutate(Year_Month = paste0(lubridate::month(date, label = TRUE), " ", Year)) %>%
ungroup()
> data
# A tibble: 12 x 10
date discharge_m3d Qs_sum Week Month Year mean_Week_Qs mean_Month_Qs mean_Year_Qs Year_Month
<date> <dbl> <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
1 2014-03-01 797 0 9 3 2014 0.0409 0.629 0.629 Mar 2014
2 2014-03-02 826 0.00833 9 3 2014 0.0409 0.629 0.629 Mar 2014
3 2014-03-03 3760 0.114 9 3 2014 0.0409 0.629 0.629 Mar 2014
4 2014-03-04 4330 0.292 10 3 2014 0.785 0.629 0.629 Mar 2014
5 2014-03-05 2600 0.480 10 3 2014 0.785 0.629 0.629 Mar 2014
6 2014-03-06 4620 0.656 10 3 2014 0.785 0.629 0.629 Mar 2014
7 2014-03-07 2510 0.816 10 3 2014 0.785 0.629 0.629 Mar 2014
8 2014-03-08 1620 0.959 10 3 2014 0.785 0.629 0.629 Mar 2014
9 2014-03-09 2270 1.09 10 3 2014 0.785 0.629 0.629 Mar 2014
10 2014-03-10 5650 1.20 10 3 2014 0.785 0.629 0.629 Mar 2014
11 2014-03-11 2530 1.31 11 3 2014 1.31 0.629 0.629 Mar 2014
12 2015-03-06 1470 1.52 10 3 2015 1.52 1.52 1.52 Mar 2015
Now we can plot, for example Qs_sum per year and month, and add the mean as a red dot:
ggplot(data, aes(Year_Month, Qs_sum)) +
theme_classic() +
geom_point(size = 2) +
geom_point(aes(Year_Month, mean_Month_Qs), color = "red", size = 5, alpha = 0.6)
To summarize the results by weekly or monthly averages, you can do as follows, using distinct():
data %>% distinct(Year, Week, mean_Week_Qs)
# A tibble: 4 x 3
Week Year mean_Week_Qs
<int> <int> <dbl>
1 9 2014 0.0409
2 10 2014 0.785
3 11 2014 1.31
4 10 2015 1.52
data %>% distinct(Year, Month, mean_Month_Qs)
# A tibble: 2 x 3
Month Year mean_Month_Qs
<int> <int> <dbl>
1 3 2014 0.629
2 3 2015 1.52
This can only be done after the mutate() and mean() commands above. If you want to get directly to summarized results, you can use summarize() directly on the initial dataframe:
data %>% group_by(Year, Week) %>% summarise(Week_Avg = mean(Qs_sum))
# A tibble: 4 x 3
# Groups: Year [2]
Year Week Week_Avg
<int> <int> <dbl>
1 2014 9 0.0409
2 2014 10 0.785
3 2014 11 1.31
4 2015 10 1.52
data %>% group_by(Year, Month) %>% summarise(Month_Avg = mean(Qs_sum))
# A tibble: 2 x 3
# Groups: Year [2]
Year Month Month_Avg
<int> <int> <dbl>
1 2014 3 0.629
2 2015 3 1.52
Note that for plotting, mutate() is preferred, since it preserves the single weekly points (black in the plot above), if we used summarise() instead, we would be left with only the red points.
Data
data <- structure(list(date = structure(16130:16140, class = "Date"),
discharge_m3d = c(797, 826, 3760, 4330, 2600, 4620, 2510,
1620, 2270, 5650, 2530), Qs_sum = c(0, 0.00833424, 0.114224781,
0.291812109, 0.479780482, 0.656321971, 0.816140731, 0.959334606,
1.087579095, 1.20284046, 1.30695595), Week = c(9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L), Month = c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), row.names = c(NA, -11L
), class = c("tbl_df", "tbl", "data.frame"))
I am struggling with finding the correct way of achieving the relative return within a month using the last observation in the previous month. Data for reference:
set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)
Then, I would like to have a return column. For example: When calculating the returns for February the third, then it should be the returns for the respective dates: 2022-02-03 / 2022-01-31 - 1. And likewise for e.g March the third: 2022-03-03 / 2022-02-28 -1. So the question is, how can I keep the date returns within a month as the numerator while having the last observation in the previous month as the denominator using dplyr?
Using a tmp column to get the previous value from the last month (assuming sorted data) and then picking the first. Grouping is done on year-month in group_by.
mData %>%
mutate(tmp=lag(Returns)) %>%
group_by(dat=strftime(Date, format="%Y-%m")) %>%
mutate(tmp=first(tmp), result=Returns/tmp-1) %>%
ungroup() %>%
select(-c(tmp, dat))
# A tibble: 90 × 5 # before select:
Date Returns result # tmp dat
<date> <dbl> <dbl> # <dbl> <chr>
1 2021-12-31 -0.00850 NA # NA 2021-12
2 2022-01-01 0.0115 -2.36 # -0.00850 2022-01
3 2022-01-02 -0.00364 -0.571 # -0.00850 2022-01
4 2022-01-03 0.0153 -2.80 # -0.00850 2022-01
5 2022-01-04 0.0176 -3.07 # -0.00850 2022-01
6 2022-01-05 -0.0182 1.14 # -0.00850 2022-01
7 2022-01-06 0.00112 -1.13 # -0.00850 2022-01
8 2022-01-07 0.0157 -2.85 # -0.00850 2022-01
9 2022-01-08 0.00206 -1.24 # -0.00850 2022-01
10 2022-01-09 -0.00174 -0.796 # -0.00850 2022-01
# … with 80 more rows
library(tidyverse)
library(lubridate)
set.seed(123)
Date = seq(as.Date("2021/12/31"), by = "day", length.out = 90)
Returns = runif(90, min=-0.02, max = 0.02)
mData = data.frame(Date, Returns)
mData |>
group_by(month(Date)) |>
mutate(last_return = last(Returns)) |>
ungroup() |>
nest(data = c(Date, Returns)) |>
mutate(last_return_lag = lag(last_return)) |>
unnest(data) |>
mutate(x = Returns/last_return_lag)
#> # A tibble: 90 × 6
#> `month(Date)` last_return Date Returns last_return_lag x
#> <dbl> <dbl> <date> <dbl> <dbl> <dbl>
#> 1 12 -0.00850 2021-12-31 -0.00850 NA NA
#> 2 1 0.0161 2022-01-01 0.0115 -0.00850 -1.36
#> 3 1 0.0161 2022-01-02 -0.00364 -0.00850 0.429
#> 4 1 0.0161 2022-01-03 0.0153 -0.00850 -1.80
#> 5 1 0.0161 2022-01-04 0.0176 -0.00850 -2.07
#> 6 1 0.0161 2022-01-05 -0.0182 -0.00850 2.14
#> 7 1 0.0161 2022-01-06 0.00112 -0.00850 -0.132
#> 8 1 0.0161 2022-01-07 0.0157 -0.00850 -1.85
#> 9 1 0.0161 2022-01-08 0.00206 -0.00850 -0.242
#> 10 1 0.0161 2022-01-09 -0.00174 -0.00850 0.204
#> # … with 80 more rows
Created on 2022-02-03 by the reprex package (v2.0.1)
I have a dataframe df as follows (my real data there are many columns):
df <- read.table(text = "date hfgf lmo
2019-01-01 0.7 1.4
2019-02-01 0.11 2.3
2019-03-01 1.22 6.7
2020-04-01 0.44 5.2
2020-05-01 0.19 2.3
2021-06-01 3.97 9.5
,
header = TRUE, stringsAsFactors = FALSE)
I would like to replace the monthly values in the columns value 1 value 2 etc by the yearly mean.
Note that I can melt and use summarize function but I need to keep the columns as they are.
If we want to update the columns with the yearly mean, do a grouping by the year extracted 'date' and use mutate to update the columns with the mean of those columns by looping across
If it is to return a single mean row per 'year', use summarise
library(lubridate)
library(dplyr)
df %>%
group_by(year = year(date)) %>%
summarise(across(where(is.numeric), mean, na.rm = TRUE))
-output
# A tibble: 3 × 3
year hfgf lmo
<dbl> <dbl> <dbl>
1 2019 0.677 3.47
2 2020 0.315 3.75
3 2021 3.97 9.5
Here is a base R solution with aggregate.
res <- aggregate(cbind(hfgf, lmo) ~ format(df$date, "%Y"), df, mean)
names(res)[1] <- names(df)[1]
res
# date hfgf lmo
#1 2019 0.6766667 3.466667
#2 2020 0.3150000 3.750000
#3 2021 3.9700000 9.500000
I am not sure but maybe you mean this kind of solution. In essence it is same as akrun's solution: Here with mutate, alternatively you could use the outcommented summarise:
library(lubridate)
library(dplyr)
df %>%
group_by(year = year(date)) %>%
mutate(across(c(hfgf, lmo), mean, na.rm=TRUE, .names = "mean_{unique(year)}_{.col}"))
# summarise(across(c(hfgf, lmo), mean, na.rm=TRUE, .names = "mean_{unique(year)}_{.col}"))
date hfgf lmo year mean_2019_hfgf mean_2019_lmo mean_2020_hfgf mean_2020_lmo mean_2021_hfgf mean_2021_lmo
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-~ 0.7 1.4 2019 0.677 3.47 NA NA NA NA
2 2019-~ 0.11 2.3 2019 0.677 3.47 NA NA NA NA
3 2019-~ 1.22 6.7 2019 0.677 3.47 NA NA NA NA
4 2020-~ 0.44 5.2 2020 NA NA 0.315 3.75 NA NA
5 2020-~ 0.19 2.3 2020 NA NA 0.315 3.75 NA NA
6 2021-~ 3.97 9.5 2021 NA NA NA NA 3.97 9.5