Calculate length of night in data frame - r

I have a some test data with two columns. The column "hour" shows hourly values (p.m.). The column "day" indicates the corresponding day, i.e. on day 1 there are hourly values from 7 to 11 o'clock.
I now want to calculate how big the time span is for each day and store these values in a vector.
Something like:
timespan <- c(5,7,3)
How could I calculate this in a loop?
I thought about something like length(unique...)
Thanks in advance!
Here is the code:
day <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
hour <- c(7,7,8,10,11,5,6,6,7,11,9,10,10,11,11)
df <- data.frame(day,hour)

library(dplyr)
df %>%
group_by(day) %>%
summarise(time_span = max(hour) - min(hour) + 1)
## A tibble: 3 x 2
# day time_span
# <dbl> <dbl>
# 1 1 5
# 2 2 7
# 3 3 3

Related

How to take an arithmetic average over common variable, rather than whole data?

So I have a data frame which is daily data for stock prices, however, I have also a variable that indicates the week of year (1,2,3,4,...,51,52) this is repeated for 22 companies. I would like to create a new variable that takes an average of the daily prices but only across each week.
The above equation has d = day and t = week. My challenge is taking this average of days across each week. Therefore, I should have 52 values per stock that I observe.
Using ave().
dat <- transform(dat, avg_week_price=ave(price, week, company))
head(dat, 9)
# week company wday price avg_week_price
# 1 1 1 a 16.16528 15.47573
# 2 2 1 a 18.69307 15.13812
# 3 3 1 a 11.01956 12.99854
# 4 1 2 a 15.92029 14.56268
# 5 2 2 a 12.26731 13.64916
# 6 3 2 a 17.40726 17.27226
# 7 1 3 a 11.83037 13.02894
# 8 2 3 a 13.09144 12.95284
# 9 3 3 a 12.08950 15.81040
Data:
setseed(42)
dat <- expand.grid(week=1:3, company=1:5, wday=letters[1:7])
dat$price <- runif(nrow(dat), 10, 20)
An option with dplyr
library(dplyr)
dat %>%
group_by(week, company) %>%
mutate(avg_week_price = mean(price))

How could I form date interval with counts in R?

I have a date variable called DATE as follows:
DATE
2019-12-31
2020-01-01
2020-01-05
2020-01-09
2020-01-25
I am trying to return a result that counts the number of times the date occur in a week considering the Week variable starts from the minimum of DATE variable. So it would look something like this:
Week Count
1 3
2 1
3 0
4 1
Thanks in advance.
From base R
dates <- c('2019-12-31','2020-01-01','2020-01-05','2020-01-09','2020-01-25')
weeks <- strftime(dates, format = "%V")
table(weeks)
We subtract DATE values with minimum DATE value to get the difference in days between DATES. We divide the difference by 7 to get it in weeks and count it. We then use complete to fill the missing week information.
df %>%
dplyr::count(week = floor(as.integer(DATE - min(DATE))/7) + 1) %>%
tidyr::complete(week = min(week):max(week), fill = list(n = 0))
# week n
# <dbl> <dbl>
#1 1 3
#2 2 1
#3 3 0
#4 4 1
If your DATE column is not of date class, first run this :
df$DATE <- as.Date(df$DATE)

How to find the first occurrence of a negative value for each factor

I am working with weather data and trying to find the first time a temperature is negative for each winter season. I have a data frame with a column for the winter season (1,2,3,etc.), the temperature, and the ID.
I can get the first time the temperature is negative with this code:
FirstNegative <- min(which(df$temp<=0))
but it only returns the first value, and not one for each season.
I know I somehow need to group_by season, but how do I incorporate this?
For example,
season<-c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
temp<-c(2,-1,0,-1,3,-1,0,-1,2,-1,4,5,-1,-1,2)
ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
df <- cbind(season,temp,ID)
Ideally I want a table that looks like this from the above dummy code:
table
season id_firstnegative
[1,] 1 2
[2,] 2 4
[3,] 3 8
[4,] 4 10
[5,] 5 13
A base R option using subset and aggregate
aggregate(ID ~ season, subset(df, temp < 0), head, 1)
# season ID
#1 1 2
#2 2 4
#3 3 8
#4 4 10
#5 5 13
library(dplyr)
season<-c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
temp<-c(2,-1,0,-1,3,-1,0,-1,2,-1,4,5,-1,-1,2)
ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
df<-as.data.frame(cbind(season,temp,ID))
df %>%
dplyr::filter(temp < 0) %>%
group_by(season) %>%
dplyr::filter(row_number() == 1) %>%
ungroup()
As you said, I believe you could solve this by simply grouping season and examining the first index of IDs below zero within that grouping. However, the ordering of your data will be important, so ensure that each season has the correct ordering before using this possible solution.
library(dplyr)
library(tibble)
season<-c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
temp<-c(2,-1,0,-1,3,-1,0,-1,2,-1,4,5,-1,-1,2)
ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
df<- tibble(season,temp,ID)
df <- df %>%
group_by(season) %>%
mutate(firstNeg = ID[which(temp<0)][1]) %>%
distinct(season, firstNeg) # Combine only unique values of these columns for reduced output
This will provide output like:
# A tibble: 5 x 2
# Groups: season [5]
season firstNeg
<dbl> <dbl>
1 1 2
2 2 4
3 3 8
4 4 10
5 5 13

Add sequence of week count aligned to a date column with infrequent dates

I'm building a dataset and am looking to be able to add a week count to a dataset starting from the first date, ending on the last. I'm using it to summarize a much larger dataset, which I'd like summarized by week eventually.
Using this sample:
library(dplyr)
df <- tibble(Date = seq(as.Date("1944/06/1"), as.Date("1944/09/1"), "days"),
Week = nrow/7)
# A tibble: 93 x 2
Date Week
<date> <dbl>
1 1944-06-01 0.143
2 1944-06-02 0.286
3 1944-06-03 0.429
4 1944-06-04 0.571
5 1944-06-05 0.714
6 1944-06-06 0.857
7 1944-06-07 1
8 1944-06-08 1.14
9 1944-06-09 1.29
10 1944-06-10 1.43
# … with 83 more rows
Which definitely isn't right. Also, my real dataset isn't structured sequentially, there are many days missing between weeks so a straight sequential count won't work.
An ideal end result is an additional "week" column, based upon the actual dates (rather than hard-coded with a seq_along() type of result)
Similar solution to Ronak's but with lubridate:
library(lubridate)
(df <- tibble(Date = seq(as.Date("1944/06/1"), as.Date("1944/09/1"), "days"),
week = interval(min(Date), Date) %>%
as.duration() %>%
as.numeric("weeks") %>%
floor() + 1))
You could subtract all the Date values with the first Date and calculate the difference using difftime in "weeks", floor all the values and add 1 to start the counter from 1.
df$week <- floor(as.numeric(difftime(df$Date, df$Date[1], units = "weeks"))) + 1
df
# A tibble: 93 x 2
# Date week
# <date> <dbl>
# 1 1944-06-01 1
# 2 1944-06-02 1
# 3 1944-06-03 1
# 4 1944-06-04 1
# 5 1944-06-05 1
# 6 1944-06-06 1
# 7 1944-06-07 1
# 8 1944-06-08 2
# 9 1944-06-09 2
#10 1944-06-10 2
# … with 83 more rows
To use this in your dplyr pipe you could do
library(dplyr)
df %>%
mutate(week = floor(as.numeric(difftime(Date, first(Date), units = "weeks"))) + 1)
data
df <- tibble::tibble(Date = seq(as.Date("1944/06/1"), as.Date("1944/09/1"), "days"))

R daily multiyear mean

I need some help with R timeseries. I have daily values of temperature for a 30 year period = 365*30 days = 10950 days (if bisiest years are not considered) . I want to create a "daily climatology", that is, the average of
each (the 30 values) 1st of January, 2nd of January, etc.., to create a timesieres with 365 values. Could anyone help me with this topic?. Thanks in advance.
Something like this with dplyr + lubridate:
library(dplyr)
library(lubridate)
df %>%
group_by(month = month(date), day = day(date)) %>%
summarize(avg_value = mean(value)) %>%
pull(avg_value) %>%
ts() %>%
plot(ylab = "avg_value")
Result:
> df %>%
+ group_by(month = month(date), day = day(date)) %>%
+ summarize(avg_value = mean(value))
# A tibble: 366 x 3
# Groups: month [?]
month day avg_value
<dbl> <int> <dbl>
1 1 1 0.19750444
2 1 2 0.30492408
3 1 3 0.16760465
4 1 4 -0.09357058
5 1 5 0.10606383
6 1 6 -0.14456526
7 1 7 0.23384988
8 1 8 -0.11987095
9 1 9 -0.01166687
10 1 10 -0.08134161
# ... with 356 more rows
Data:
df = data.frame(date = seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days"),
value = rnorm(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days"))))
I had the same probleme to solve and found an answer here:
Daily average calculation from multiple year daily weather data?
It took some time for me to understand and reorder all the comments beacause there was no straight code.
So here I give an complete example based on the link above.
As an example 3 years of random precipitation and temperature data:
test_data <- data.frame("date"= seq(from = as.Date("1990/1/1"), to = as.Date("1992/12/31"), by = "day"),"prec" =runif(1096, 0, 10),"temp" = runif(1096, 0, 10))
Next step is to ad a new column with a variable on which base the average will be calculated. One Day in this example:
test_data$day <- format(test_data$date, format='%m-%d')
In this column everyday of a year appears 3 times because of the 3 years. So we can calculate the mean for every day:
test_data_daily_mean <- aggregate(cbind(prec, temp) ~ (day), data=test_data, FUN=mean)
Hint: For this solution the date column really has to have dates inside. Otherwise you have to format them to R dates like this:
as.Date(data$date, format='%d-%m-%Y')
This answer is a little late, but maybe it helps someone else!

Resources