Date objects in decimals spanning multiple leap years - r

I would like to convert a date into a decimal where January 1 is 0 and December 31 is 1. There is no time component, just days. I looked at a few existing solutions, but none of them seems to fit my problem. I also had hopes for the date_decimal function in lubridate. I have worked out a solution that converts the Date into a day number, joins a data frame that accounts for leap years, and then divides the day number by the total number of days in the year.
library(lubridate)
library(dplyr)
df <- data.frame(Date=seq(as.Date("2003/2/10"), as.Date("2007/2/10"), "years"),
var=seq(1,5, by=1))
lubridate function attempt:
date_decimal(df$Date)
Leap year dataframe
maxdaydf<-data.frame(Year=seq(2003,2007,by=1), maxdays=c(365,366,365,365,365))
A dplyr pipe to generate the daydecimal:
df %>%
mutate(Year=year(Date), daynum=yday(Date)) %>%
full_join(maxdaydf, by=c("Year")) %>%
mutate(daydecimal=daynum/maxdays)
But as I said, this is clunky and involves a second data frame, which is never ideal. Any suggestions on how I can convert Dates into decimals?

Instead of date_decimal() you could use decimal_date()
decimal_date(df$Date)
[1] 2003.110 2004.109 2005.110 2006.110 2007.110
Or you can use:
yday(df$Date)/yday(ISOdate(year(df$Date), 12,31))
[1] 0.1123288 0.1120219 0.1123288 0.1123288 0.1123288
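Note that the yday() ratio above maps January 1 to 1/365 rather than 0. If you need exactly 0 on January 1 and 1 on December 31, a minimal base R sketch could look like the following. The helper name year_fraction is hypothetical, and the scaling (zero-based day of year divided by days in the year minus one) is my reading of the question, not something lubridate provides.

```r
# Hypothetical helper: position of a date within its year,
# scaled so that January 1 is 0 and December 31 is 1.
year_fraction <- function(date) {
  lt <- as.POSIXlt(date)
  year <- lt$year + 1900
  # Gregorian leap year rule
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  days_in_year <- ifelse(is_leap, 366, 365)
  # lt$yday is zero-based: January 1 is 0
  lt$yday / (days_in_year - 1)
}

df <- data.frame(Date = seq(as.Date("2003/2/10"), as.Date("2007/2/10"), "years"),
                 var = seq(1, 5, by = 1))
df$daydecimal <- year_fraction(df$Date)
```

This avoids the second data frame because the number of days per year is computed from the date itself.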

Related

How can i obtain a tsibble from this tibble without using the as.Date function?

I need to convert several "tibble" into "tsibble".
Here a simple example:
library(tidyverse)
library(lubridate)
library(tsibble) # provides as_tsibble() and fill_gaps()
time_1 <- c(ymd_hms('20210101 000000'),
ymd_hms('20210101 080000'),
ymd_hms('20210101 160000'),
# ymd_hms('20210102 000000'),
ymd_hms('20210102 080000'),
ymd_hms('20210102 160000'))
df_1 <- tibble(time_1, y=rnorm(5))
df_1 %>%
as_tsibble(index=time_1)
This chunk of code works as expected.
But, if the dates are all midnights, this code throws an error:
time_2 <- c(ymd_hms('20210101 000000'),
ymd_hms('20210102 000000'),
ymd_hms('20210103 000000'),
# ymd_hms('20210104 000000'),
ymd_hms('20210105 000000'),
ymd_hms('20210106 000000'))
df_2 <- tibble(time_2, y=rnorm(5))
df_2 %>%
as_tsibble(index=time_2)
I don't want to solve this issue in this way because the as.Date function changes the column type.
df_2 %>%
mutate(time_2=as.Date(time_2)) %>%
as_tsibble(index=time_2)
I also don't want to fix the issue this way, because after converting the tibble into a tsibble I need to apply the fill_gaps function, which doesn't create the ymd_hms('20210104 000000') row in this second scenario.
df_2 %>%
as_tsibble(index=time_2, regular=FALSE)
Is this a bug?
Thanks.
This behaviour is explained in tsibble's FAQ.
Essentially, subdaily data (ymd_hms()) measured at midnight each day doesn't necessarily have an interval of 1 day (24 hours). Consider that some days have shifts due to daylight saving time in your time zone, so the number of hours between midnight and midnight the next day may be 23 or 25 hours.
If you're working with data measured at a daily interval, you should use a date with ymd() precision. You can convert it back to a date-time using as_datetime() if you like.
Personally I don't think this should produce an error, however it is much simpler if it does. Perhaps the appropriate interval here is 1 hour or 30 minutes (or whatever is appropriate for timezone shifts in the specified timezone).
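A minimal sketch of the round trip suggested above, assuming the timestamps are in UTC (the default for ymd_hms()) and that it is acceptable to drop the tsibble class at the end. The y values are made up for illustration.

```r
library(dplyr)
library(lubridate)
library(tsibble)

time_2 <- ymd_hms(c("20210101 000000", "20210102 000000", "20210103 000000",
                    "20210105 000000", "20210106 000000"))
df_2 <- tibble(time_2, y = 1:5)

filled <- df_2 %>%
  mutate(time_2 = as_date(time_2)) %>%  # daily index, so the interval is 1 day
  as_tsibble(index = time_2) %>%
  fill_gaps() %>%                       # inserts the missing 2021-01-04 with y = NA
  as_tibble() %>%
  mutate(time_2 = as_datetime(time_2))  # back to a date-time column
```

The column ends up as a date-time again, but only after the gap filling has been done on a Date index.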

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame that holds a time series of meteorological measurements at monthly resolution from 1961 to 2018. I am interested in the variable that measures the monthly average temperature, since I need the multi-annual average temperature for the summers.
To do this I must extract the fifth and sixth digits, which are the month, from the "DateVariable" column.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I started by coding this for a single month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the work, but I do not know how the matching operator works.
So I started with this code, which works.
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting code like this
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**" ...]
So that all entries with month digits from 06 to 09 are saved in the new data frame summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is converting the variable to a Date and extracting the month name as a character column.
Now I need to select the months from June (Juni) to September.
A horrible way to get the result I want is to take several subsets and rbind them afterwards.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, following the question "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to the Date class and extract the month:
library(lubridate) # provides month()
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether each row falls in a summer month:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))
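If you would rather stay close to the original idea of reading the fifth and sixth digits, a base R sketch with substr() might look like this. The example data frame and its temp column are made up, and it assumes every value has the form YYYYMMDD.

```r
# Made-up example data in the YYYYMMDD format described in the question
Df <- data.frame(
  DateVariable = c("19610501", "19610601", "19610701", "19611001", "19620901"),
  temp = c(12.1, 16.4, 18.9, 9.3, 14.2)
)

# Characters 5 and 6 hold the month; as.character() also covers factor columns
month_num <- as.integer(substr(as.character(Df$DateVariable), 5, 6))

# Keep June (6) through September (9)
summermonths <- Df[month_num %in% 6:9, ]
```

This skips the Date conversion entirely, at the cost of trusting that the string layout never varies.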

Converting monthly numerics to readable dates in R

How can I set R to count months instead of days when converting integers to dates?
After reading several threads on how to convert dates in R, it seems that nobody has asked how to convert numeric dates when the numbers form a monthly time series. E.g. 552 represents January 2006.
I have tried several things, such as using as.Date(dates,origin="1899-12-01"), but I recognize that R counts days instead of months. Thus, for the year-month number 552 above, the code yields "1901-06-06" instead of the correct 2006-01-01.
Side note: I also want the format to be YEARmonth, but does R allow displaying dates without days?
I think your starting date should be '1960-01-01'.
Anyway, you can solve this problem using the lubridate package.
In this case you can start from a date and add months.
library(lubridate)
as.Date('1960-01-01') %m+% months(552)
It gives you
[1] "2006-01-01"
You can display only the year and month of a date, but in that case R coerces the date into a character.
format(as.Date('2006-01-01'), "%Y-%m")
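The same conversion can be sketched in base R with integer arithmetic, assuming (as above) that month 0 corresponds to January 1960. The helper name month_index_to_date is hypothetical.

```r
# Hypothetical helper: turn a month counter into the first day of that month.
# Assumes n = 0 means January of origin_year.
month_index_to_date <- function(n, origin_year = 1960) {
  year <- origin_year + n %/% 12   # whole years elapsed
  month <- n %% 12 + 1             # remaining months, shifted to 1..12
  as.Date(sprintf("%d-%02d-01", year, month))
}

d <- month_index_to_date(c(0, 552, 563))
d
# [1] "1960-01-01" "2006-01-01" "2006-12-01"

# YEARmonth display; the result is a character vector, not a Date
format(d, "%Y%m")
# [1] "196001" "200601" "200612"
```

Because the arithmetic is vectorised, this works on a whole column of month counters at once.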

Can I lag daily data by months with lag ( )?

I have daily data over several years for several currencies. I would like to lag variables in the data set by exactly one month (i.e. June 15 to July 15, not necessarily 30 days). NA's are fine where that is not possible.
I have gotten by so far by writing something like this:
library(plyr) # provides ddply()
library(lubridate) # provides %m+% and months()
ddply(Data, .(Currency), function(x){ #First column is date, 2nd is Currency, rest data.
y=x[,-(3)] #this is all data I want lagged
y$date=as.Date(y$date) %m+% months(1) #This increases the dates by one month
x$date=as.Date(x$date) #what data I dont want lagged is x[, (1:3)] below
z=merge(x[,(1:3)], y, by=c("date", "Currency")) #merge by date,
#lagged stuff merges with non-lagged stuff 1 month later than original obs date
return(z)
})
I can include the data if necessary, but given that I have something that works already I don't want anyone to spend time on it.
I just want to check whether I can use the lubridate %m+% months(1) syntax within the lag function. I have tried the lag function from package "statar" that uses the along_by syntax but haven't been able to figure it out.
Thanks!
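For reference, the shift-and-merge idea described above can be sketched more compactly. This is only an illustration with made-up data and a hypothetical column name (rate). It uses date + months(1) instead of %m+%, because plain Period addition in lubridate returns NA when the target day does not exist (for example January 31 plus one month), which matches the "NA's are fine" requirement.

```r
library(lubridate)

# Made-up example data
Data <- data.frame(
  date = as.Date(c("2021-01-15", "2021-01-31", "2021-02-15", "2021-02-28")),
  Currency = "EUR",
  rate = c(1.21, 1.22, 1.20, 1.19)
)

# Move every observation one calendar month forward
shifted <- Data
shifted$date <- shifted$date + months(1)    # NA where the day does not exist
shifted <- shifted[!is.na(shifted$date), ]
names(shifted)[names(shifted) == "rate"] <- "rate_lag1m"

# Attach the value observed exactly one month earlier, NA where there is none
lagged <- merge(Data, shifted, by = c("date", "Currency"), all.x = TRUE)
```

In this toy data only 2021-02-15 has an observation exactly one month earlier, so it is the only row with a non-missing rate_lag1m.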

Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle Gregorian time frames. I haven't used it personally, but it looks straightforward. There is a shift argument in some methods that allows you to set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, month, day, hour, minute, and second as input and outputs a POSIXct object representing the time. You just have to split the third column into separate hour and minute columns before you can use the function.
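Another option that handles the day of year directly is the %j format code in base R. A minimal sketch, assuming the three columns are numeric, the time is stored as HHMM, and the timestamps are in UTC (the variable names are made up):

```r
# Made-up example values in the layout described in the question
year <- c(2012, 2012)
doy  <- c(261, 5)      # day of year
hhmm <- c(1610, 905)   # hours and minutes as one number

# Zero-pad so that day 5 becomes "005" and 905 becomes "0905"
stamp_text <- sprintf("%04d %03d %04d", year, doy, hhmm)

# %j parses the day of year, %H%M the packed time
timestamp <- as.POSIXct(stamp_text, format = "%Y %j %H%M", tz = "UTC")
format(timestamp, "%Y-%m-%d %H:%M")
# [1] "2012-09-17 16:10" "2012-01-05 09:05"
```

Change the tz argument if the logger recorded local time rather than UTC.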
