R: assign months to day of the year

Here's my data, which has 10 years in one column and the 365 days of each year in a second column:
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years, i.e. they all have 365 days.
I want to create another column, month, giving the month of the year that each day belongs to.
library(dplyr)
dat %>%
  mutate(month = as.integer(ceiling(doy/31)))
However, this assigns the wrong month to many days, because months are not all 31 days long. I am looking for a dplyr solution, if possible.

We can convert it to a datetime class using the appropriate format (i.e. "%Y %j") and then extract the month with format:
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month, which is zero-based (0 = January), and add 1:
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
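The question asks for dplyr and assumes every year has 365 days. The approaches in this thread parse the actual year, so in the leap years 1980, 1984 and 1988 every day from doy 60 onward is shifted (doy 60 becomes 29 February rather than 1 March). A minimal dplyr sketch that honors the 365-day assumption parses doy against a fixed non-leap reference year; 2001 is an arbitrary choice, not something from the question:

```r
library(dplyr)

dat <- data.frame(year = rep(1980:1989, each = 365), doy = rep(1:365, times = 10))

# Parse every doy against the same non-leap year (2001, an arbitrary choice),
# so doy 60 is always 1 March, whatever the row's actual year is.
dat <- dat %>%
  mutate(month = as.integer(format(as.Date(paste(2001, doy), format = "%Y %j"), "%m")))

dat[dat$year == 1980 & dat$doy %in% 59:60, ]
```

If you do want real calendar months for leap years, use the row's own year in paste() instead of 2001.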

This should give you an integer value for the months, using month() from lubridate:
library(lubridate)
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
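Dividing the day of year by 31, as in the question's attempt, drifts because months are not all 31 days long: under the 365-day assumption, doy 60 is 1 March, but ceiling(60/31) is 2. Without any date parsing, the month can be looked up from the day of year on which each month starts in a non-leap year, for example with base R's findInterval() (a swapped-in technique, not one of the answers above):

```r
# Day of year on which each month starts in a non-leap year:
# 1, 32, 60, 91, ..., 335
month_start <- cumsum(c(1, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30))

doy <- c(1, 31, 32, 59, 60, 365)
# findInterval() returns, for each doy, how many month starts are <= doy,
# which is exactly the month number
month_num <- findInterval(doy, month_start)
month_num
month.name[month_num]
```

Applied to the question's data frame this would be dat$month <- findInterval(dat$doy, month_start), which gives the same month for a given doy in every year.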

Related

How do I replace a value in my dataframe with text?

I have a dataframe with dates from April 2020 to today. Right now they are labelled 1 to 492, with 1 being the first date I have data on. I also have a list of dates in the format I want. How can I tell R that date 1 is April 12, 2020, date 2 is April 13, 2020, and so on for each date? I'm OK with either replacing the values in the column or creating a new column called real_date next to it.
Update:
Sorry, I didn't describe this very well. I ended up making a lookup table with the date number and the real date, and I used inner_join to add the real date to my dataframe.
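A minimal sketch of the lookup-table approach from the update, assuming dplyr and a day-number column called date; the value column and the sample data are hypothetical placeholders:

```r
library(dplyr)

# Hypothetical data: 'date' holds the day numbers 1..492, 'value' is a placeholder
df <- data.frame(date = 1:492, value = seq_len(492))

# Lookup table mapping each day number to its real date (day 1 = 2020-04-12)
lookup <- data.frame(
  date = 1:492,
  real_date = seq(as.Date("2020-04-12"), by = "day", length.out = 492)
)

# Add real_date to df by matching on the shared 'date' column
df <- inner_join(df, lookup, by = "date")
head(df, 3)
```

Because the mapping is a fixed offset, the join gives the same result as adding the day number to 11 April 2020; the lookup table is mainly useful when the mapping is irregular.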
library(tidyverse)
library(lubridate)
# Creating a sample data.frame
df <-
  tibble(
    dates = seq.Date(dmy("01/04/20"), today(), by = "1 day")
  )
df %>%
  # Format date, where: %B = full month name, %d = day of month, %Y = four-digit year
  mutate(
    new_date = format(dates, "%B %d %Y")
  )
*%B gives month names in the current locale, so in a Portuguese locale, for example, April is shown as Abril.
If I have understood the question correctly, you have a dataframe with numbers from 1 to 492, and you want to change them to dates where 1 is 12 April 2020, 2 is 13 April 2020, and so on.
You can use as.Date to convert these numbers to dates, passing 11 April 2020 as the origin so that 1 maps to 12 April:
df <- data.frame(date = 1:492)
df$real_date <- as.Date(df$date, origin = '2020-04-11')
head(df)
# date real_date
#1 1 2020-04-12
#2 2 2020-04-13
#3 3 2020-04-14
#4 4 2020-04-15
#5 5 2020-04-16
#6 6 2020-04-17
Alternatively, just create a sequence of dates:
data.frame(date = seq(as.Date('2020-04-12'), length.out = 492,
by = '1 day'), code = 1:492)

R: date format with just year and month

I have a dataframe with monthly data, one column containing the year and one column containing the month. I'd like to combine them into one column with Date format, going from this:
Year Month Data
2020 1 54
2020 2 58
2020 3 78
2020 4 59
To this:
Date Data
2020-01 54
2020-02 58
2020-03 78
2020-04 59
I don't think you can represent a Date in R without the day. If you want a character column, like in your example, you can do:
> x <- data.frame(Year = c(2020,2020,2020,2020), Month = c(1,2,3,4), Data = c(54,58,78,59))
> x$Month <- ifelse(nchar(x$Month) == 1, paste0(0, x$Month), x$Month) # pad single-digit months with a leading 0
> x$Date <- paste(x$Year, x$Month, sep = '-')
> x
Year Month Data Date
1 2020 01 54 2020-01
2 2020 02 58 2020-02
3 2020 03 78 2020-03
4 2020 04 59 2020-04
> class(x$Date)
[1] "character"
If you want a Date column, you have to add a day:
x$Date <- paste0(x$Date, '-01')
x$Date <- as.Date(x$Date, format = '%Y-%m-%d')
x
class(x$Date)
The simplest way may be to arbitrarily set the same day (e.g. 01) for all your dates; that way the intervals between dates are preserved.
data <- data.frame(Year = c(2020, 2020, 2020, 2020), Month = c(1, 2, 3, 4), Data = c(54, 58, 78, 59))
data$Date <- paste0(data$Year, "-", data$Month, "-01")
data$Date <- as.Date(data$Date, format = "%Y-%m-%d")
You can use sprintf:
sprintf('%d-%02d', data$Year, data$Month)
#[1] "2020-01" "2020-02" "2020-03" "2020-04"
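As the answers note, a Date always carries a day. A sketch that keeps both a real Date column and a "YYYY-MM" label, assuming lubridate is available (make_date() is not used in the answers above; its day argument defaults to 1):

```r
library(lubridate)

x <- data.frame(Year = c(2020, 2020, 2020, 2020), Month = c(1, 2, 3, 4),
                Data = c(54, 58, 78, 59))

# make_date() builds a Date from its parts; day defaults to 1
x$Date <- make_date(x$Year, x$Month)
# Character "YYYY-MM" labels for display, derived from the Date
x$YearMonth <- format(x$Date, "%Y-%m")
x
```

Keeping the Date column lets you sort and compute intervals, while YearMonth is only for display.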

Date range without year in R

I have data from several years and each record has a date value (YYYY-MM-DD). I want to label each record with the season that it fell into. For example, I want to take all the records from December 15 to March 15, across all years, and put "Winter" in a season column. Is there a way in R to specify a sequence of dates using just the month and day, regardless of year?
lubridate's quarter() function doesn't work because I have custom dates to define the seasons and the seasons are not all of equal length, and I can't just do month(datevalue) %in% c(12,1,2,3) because I need to split some months in half (i.e. March 15 is winter and March 16 is spring).
I could manually enter the date range for each year in my dataset (e.g. Dec 15 2015 to March 15 2015 or Dec 15 2016 to Mar 15 2016, etc.), but is there a better way?
You can extract the month and day from the Date column and use case_when to assign Season based on those two values.
library(dplyr)
library(lubridate)
df %>%
  mutate(day = day(Date),
         month = month(Date),
         Season = case_when(
           # 15 December to 15 March as Winter
           month == 12 & day >= 15 |
             month %in% 1:2 | month == 3 & day <= 15 ~ "Winter",
           # Add conditions for the other seasons here
         )
  )
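A related base R sketch encodes each date as a single number, month * 100 + day, so that a season boundary is one comparison regardless of year. Only the winter range (15 December to 15 March) and "March 16 is spring" come from the question; the 15 June and 15 September cut-offs are hypothetical placeholders:

```r
# Season from month and day only, ignoring the year.
# Winter (15 Dec to 15 Mar) comes from the question; the spring/summer/fall
# cut-offs (15 Jun, 15 Sep) are placeholders to adjust.
season_mmdd <- function(dates) {
  dates <- as.Date(dates)
  # Encode each date as month * 100 + day, e.g. 15 March -> 315, 15 December -> 1215
  mmdd <- as.integer(format(dates, "%m")) * 100L + as.integer(format(dates, "%d"))
  ifelse(mmdd >= 1215L | mmdd <= 315L, "Winter",
         ifelse(mmdd <= 615L, "Spring",
                ifelse(mmdd <= 915L, "Summer", "Fall")))
}

d <- as.Date(c("2015-12-14", "2015-12-15", "2016-03-15",
               "2016-03-16", "2016-07-01", "2016-10-01"))
season_mmdd(d)
```

Inside a dplyr pipeline this would be mutate(Season = season_mmdd(Date)). Winter is the only range that wraps around the year end, which is why it uses | while the others only need an upper bound.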
We assume that when the question says winter is "Dec 15 2015 to March 15 2015 or Dec 15 2016 to Mar 15 2016", what is really meant is that winter is Dec 16, 2015 to Mar 15, 2016 or Dec 16, 2016 to Mar 15, 2017.
Also, it is not clear what the precise output is supposed to be, so in each case below the function takes a second argument, a vector giving the season names or numbers. By default winter is reported as 1, spring as 2, summer as 3 and fall as 4, but you could pass c("Winter", "Spring", "Summer", "Fall") as the second argument instead, or use other names if you wish.
1) yearmon/yearqtr. Convert to Date class and subtract 15 days. Then convert that to yearmon class, which represents dates internally as year + fraction where fraction = 0 for January, 1/12 for February, ..., 11/12 for December. Add 1/12 to get to the next month. Convert that to yearqtr class, which represents dates as year + fraction where fraction is 0, 1/4, 2/4 or 3/4 for the 4 quarters, and take cycle() of that, which gives the quarter number (1, 2, 3 or 4).
If we knew that the input x was a Date vector as opposed to a character vector, we could simplify this by replacing as.Date(x) with x in season.
library(zoo)
season <- function(x, s = 1:4)
s[cycle(as.yearqtr(as.yearmon(as.Date(x) - 15) + 1/12))]
# test
d <- c(as.Date("2020-12-15") + 0:1, as.Date("2021-03-15") + 0:1)
season(d)
## [1] 4 1 1 2
season(d, c("Winter", "Spring", "Summer", "Fall"))
## [1] "Fall" "Winter" "Winter" "Spring"
2) base. The above could be translated to base R using POSIXlt. Subtract 15 days as before and then add 1 to the month to get to the next month. Finally, extract the zero-based month and integer-divide it by 3 to get the quarter (1 to 4).
season.lt <- function(x, s = 1:4) {
  lt <- as.POSIXlt(as.Date(x) - 15)
  lt$mon <- lt$mon + 1
  s[as.POSIXlt(format(lt))$mon %/% 3 + 1]
}
# test - d defined in (1)
season.lt(d)
## [1] 4 1 1 2
3) lubridate. We can follow the same logic in lubridate like this:
library(lubridate)
season.lub <- function(x, s = 1:4)
  s[(month((as.Date(x) - 15) %m+% months(1)) - 1) %/% 3 + 1]
# test - d defined in (1)
season.lub(d)
## [1] 4 1 1 2

Second to last Wednesday of month in R

In R, how can I produce a list of the dates of all second-to-last Wednesdays of the month in a specified date range? I've tried a few things but have gotten inconsistent results for months with five Wednesdays.
To generate a regular sequence of dates you can use seq with dates as the from and to parameters; see the seq.Date documentation for more options.
Create a data frame with the date, the month and the weekday, then obtain the second-to-last Wednesday of each month with aggregate.
day_sequence = seq(as.Date("2020/1/1"), as.Date("2020/12/31"), "day")
df = data.frame(day = day_sequence,
month = months(day_sequence),
weekday = weekdays(day_sequence))
#Filter only wednesdays
df = df[df$weekday == "Wednesday",]
result = aggregate(day ~ month, df, function(x){head(tail(x,2),1)})
tail(x, 2) returns the last two Wednesdays of the month, and head(..., 1) takes the first of those two.
Result:
month day
1 April 2020-04-22
2 August 2020-08-19
3 December 2020-12-23
4 February 2020-02-19
5 January 2020-01-22
6 July 2020-07-22
7 June 2020-06-17
8 March 2020-03-18
9 May 2020-05-20
10 November 2020-11-18
11 October 2020-10-21
12 September 2020-09-23
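The aggregate approach groups only by month name, so a range spanning several years would merge, for example, January 2019 and January 2020, and weekdays() returns names in the current locale. A base R sketch that avoids both, grouping by "%Y-%m" and testing the numeric weekday from POSIXlt's $wday (where Wednesday is 3); the function name is hypothetical:

```r
# Second-to-last Wednesday of each calendar month between 'from' and 'to'
second_to_last_wed <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  # POSIXlt $wday: 0 = Sunday, ..., 3 = Wednesday; unlike weekdays(), not locale dependent
  wed <- days[as.POSIXlt(days)$wday == 3]
  # Group by year and month, so e.g. January 2019 and January 2020 stay separate
  groups <- split(wed, format(wed, "%Y-%m"))
  # Only months with at least two Wednesdays inside the range have an answer
  groups <- groups[lengths(groups) >= 2]
  # Take the second-to-last Wednesday of each group (as a day count)
  picks <- vapply(groups, function(x) as.numeric(x[length(x) - 1]), numeric(1))
  as.Date(picks, origin = "1970-01-01")
}

wed_dates <- second_to_last_wed("2019-12-01", "2020-03-31")
wed_dates
```

As with the answers here, a range that starts or ends mid-month only sees the Wednesdays inside the range for those partial months.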
There are probably simpler ways of doing this, but the function below does what the question asks for. It returns a named vector of dates that fall between from and to, that fall on weekday day (where 1 is Monday, as in format's "%u"), and that are the nth-to-last such weekday of their month, i.e. the nth one counting back from the end of the month.
whichWeekday <- function(from, to, day, n, format = "%Y-%m-%d"){
from <- as.Date(from, format = format)
to <- as.Date(to, format = format)
day <- as.character(day)
d <- seq(from, to, by = "days")
m <- format(d, "%Y-%m")
f <- c(TRUE, m[-1] != m[-length(m)])
f <- cumsum(f)
wed <- tapply(d, f, function(x){
i <- which(format(x, "%u") == day)
x[ tail(i, n)[1] ]
})
y <- as.Date(wed, origin = "1970-01-01")
setNames(y, format(y, "%Y-%m"))
}
whichWeekday("2019-01-01", "2020-03-31", 3, 2)
# 2019-01 2019-02 2019-03 2019-04 2019-05
#"2019-01-23" "2019-02-20" "2019-03-20" "2019-04-17" "2019-05-22"
# 2019-06 2019-07 2019-08 2019-09 2019-10
#"2019-06-19" "2019-07-24" "2019-08-21" "2019-09-18" "2019-10-23"
# 2019-11 2019-12 2020-01 2020-02 2020-03
#"2019-11-20" "2019-12-18" "2020-01-22" "2020-02-19" "2020-03-18"

Format a time series as a data frame with Julian dates

I have a time series tt.txt of daily data from 1 May 1998 to 31 October 2012 in one column, like this:
v1
296.172
303.24
303.891
304.603
304.207
303.22
303.137
303.343
304.203
305.029
305.099
304.681
304.32
304.471
305.022
304.938
304.298
304.120
Each number in the text file is the maximum temperature in kelvin for the corresponding day. I want to put the data in three columns (year, jday, and the value of the data), as follows:
year jday MAX_TEMP
1 1959 325 11.7
2 1959 326 15.6
3 1959 327 14.4
If you have a vector of dates, you can convert it to 'year' and 'jday' like this:
v1 <- c('May 1998 05', 'October 2012 10')
v2 <- format(as.Date(v1, '%b %Y %d'), '%Y %j')
df1 <- read.table(text=v2, header=FALSE, col.names=c('year', 'jday'))
df1
# year jday
#1 1998 125
#2 2012 284
To convert back from '%Y %j' to 'Date' class
df1$date <- as.Date(do.call(paste, df1[1:2]), '%Y %j')
Update
We can read the dataset with read.table. Since we know the start and end dates, create a sequence of dates with seq, convert it to 'year' and 'julian day' with format, and cbind the result with the original dataset.
dat <- read.table('tt.txt', header=TRUE)
date <- seq(as.Date('1998-05-01'), as.Date('2012-10-31'), by='day')
dat2 <- cbind(read.table(text=format(date, '%Y %j'),
                         col.names=c('year', 'jday')), MAX_TEMP=dat[[1]])
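You can also use the yday component of POSIXlt; note that it is zero-based (1 January is 0), so add 1 to get the usual Julian day:
as.POSIXlt("8 Jun 15", format = "%d %b %y")$yday + 1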
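Putting the date sequence and the POSIXlt components together, here is a self-contained sketch with placeholder values standing in for tt.txt (hypothetical data, since the file itself is not available). It also checks that there is exactly one value per date, which the cbind approach silently relies on:

```r
# Placeholder data standing in for tt.txt: one value per day of the record
dates <- seq(as.Date("1998-05-01"), as.Date("2012-10-31"), by = "day")
temps <- rep(300, length(dates))  # hypothetical kelvin values

# The file must have exactly one row per date for the columns to line up
stopifnot(length(temps) == length(dates))

lt <- as.POSIXlt(dates)
dat2 <- data.frame(year = lt$year + 1900,  # $year counts years since 1900
                   jday = lt$yday + 1,     # $yday is zero-based
                   MAX_TEMP = temps)
head(dat2, 3)
```

With the real file, temps would come from read.table('tt.txt', header=TRUE)$v1 instead of the placeholder.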
