Are there any built-in functions that can be applied to a data frame to derive variables from a Date-class time series, such as day of the week, month, year, week of the year, etc., in R?
The weekdays, months, and quarters functions in the base package generate text output. I am looking for numerical output to denote that 3/5/2012 (3 May 2012), for example, is a Thursday, the 3rd day of the month, in the 1st week of the month, the 124th day of the year, etc.
You get a few of those just from POSIXlt, with its weird conventions: the year is stored as years since 1900 (so you need to add 1900), and
the month is on the 0 to 11 range -- but you do get weekday (0 to 6, with Sunday as 0) and day-of-the-year (starting at 0).
R> dd <- as.Date("2012-05-03")
R> as.POSIXlt(dd)
[1] "2012-05-03 UTC"
Then
R> unclass(as.POSIXlt(dd))
$sec
[1] 0
$min
[1] 0
$hour
[1] 0
$mday
[1] 3
$mon
[1] 4
$year
[1] 112
$wday
[1] 4
$yday
[1] 123
$isdst
[1] 0
attr(,"tzone")
[1] "UTC"
R>
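For purely numeric output, the POSIXlt fields shown above can be shifted to the usual calendar conventions. A minimal sketch in base R, using the same date; the variable names are only for illustration, and the week-of-month rule is an assumed simple definition (days 1 to 7 are week 1), not a standard one:

```r
dd <- as.Date("2012-05-03")
lt <- as.POSIXlt(dd)

year_num  <- lt$year + 1900   # stored as years since 1900
month_num <- lt$mon + 1       # stored as 0-11
mday_num  <- lt$mday          # day of the month, 1-31
wday_num  <- lt$wday          # 0-6 with Sunday = 0, so 4 is Thursday
yday_num  <- lt$yday + 1      # stored as 0-365

# format() returns character strings, so convert explicitly
iso_wday <- as.integer(format(dd, "%u"))   # 1-7 with Monday = 1

# assumed definition: days 1-7 are week 1, days 8-14 are week 2, ...
week_of_month <- (mday_num - 1) %/% 7 + 1
```

The same arithmetic is vectorised, so it works unchanged on a whole column of dates.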
You can use the lubridate package to do a lot with dates.
From the help file: Lubridate provides tools that make it easier to parse and manipulate dates.
For example:
> library(lubridate)
> d <- today()
> d
[1] "2014-04-29"
> day(d)
[1] 29
> month(d)
[1] 4
> year(d)
[1] 2014
> week(d)
[1] 18
> weekdays(d)
[1] "Tuesday"
> days_in_month(d)
Apr
30
I prefer it to the built-in functions because it has a lot of date splicing, casting and arithmetic functions.
There are a couple of options that I can think of.
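First, you could convert to the POSIXlt class with as.POSIXlt() so that you can extract components such as as.POSIXlt(df$date)$yday. A POSIXlt object stores the elements of each date as a list underneath, and they can be accessed with $. Note that data.frame() converts POSIXlt columns to POSIXct, so converting on the fly is safer than relying on df$date$yday.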
Also, the package lubridate has functions like
yday(x)
wday(x)
mday(x)
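The POSIXlt route can be applied to a whole data frame column at once. A small sketch in base R, assuming a data frame with a Date column called date (the data and column names are made up for illustration):

```r
# hypothetical example data
df <- data.frame(date = as.Date(c("2012-05-03", "2012-12-31", "2013-01-01")))

# convert once, then pull out the numeric components
lt <- as.POSIXlt(df$date)
df$year  <- lt$year + 1900   # years since 1900 -> calendar year
df$month <- lt$mon + 1       # 0-11 -> 1-12
df$mday  <- lt$mday          # day of the month
df$wday  <- lt$wday          # 0-6, Sunday = 0
df$yday  <- lt$yday + 1      # 0-based -> 1-based day of the year

print(df)
```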
I'm trying to build folders to store data pulls. I want to label each folder with the date of the data in that pull.
Ex.: if I pull data from 5 days ago from MySQL, I want to name the folder with the date from 5 days ago.
MySQL can easily handle date arithmetic. I'm not sure exactly how R does it. Should I just subtract the appropriate number of seconds in POSIXct and then convert to POSIXlt to name the folder MM_DD_YYYY?
Or is there a better way?
Just subtract a number:
> as.Date("2009-10-01")
[1] "2009-10-01"
> as.Date("2009-10-01")-5
[1] "2009-09-26"
Since the Date class only has days, you can just do basic arithmetic on it.
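To get from there to the folder name asked about, format() can render the shifted date as MM_DD_YYYY. A sketch, assuming a fixed date for reproducibility (in practice Sys.Date() would take its place) and using tempdir() as a stand-in for the real parent directory:

```r
# date of the data being pulled: 5 days before the reference date
pull_date <- as.Date("2009-10-01") - 5      # in practice: Sys.Date() - 5

# MM_DD_YYYY, as requested in the question
folder_name <- format(pull_date, "%m_%d_%Y")

# tempdir() is only a placeholder for the real parent directory
folder_path <- file.path(tempdir(), folder_name)
dir.create(folder_path, showWarnings = FALSE)
```

A YYYY_MM_DD pattern ("%Y_%m_%d") would make the folders sort chronologically, if the naming scheme is not fixed.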
If you want to use POSIXlt for some reason, then you can use its components:
> a <- as.POSIXlt("2009-10-04")
> names(unclass(as.POSIXlt("2009-10-04")))
[1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst"
> a$mday <- a$mday - 6
> a
[1] "2009-09-28 EDT"
The answer probably depends on what format your date is in, but here is an example using the Date class:
dt <- as.Date("2010/02/10")
new.dt <- dt - as.difftime(2, units="days")
You can even play with different units like weeks.
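As an illustration of other units, here is a short sketch with weeks and days on a Date object (the variable names are made up):

```r
dt <- as.Date("2010-02-10")

# difftime objects carry their own units; Date arithmetic converts them to days
two_weeks_earlier <- dt - as.difftime(2, units = "weeks")
three_days_later  <- dt + as.difftime(3, units = "days")

# the gap between two dates, expressed in a chosen unit
gap_days <- as.numeric(difftime(dt, two_weeks_earlier, units = "days"))
```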
There is of course a lubridate solution for this:
library(lubridate)
date <- "2009-10-01"
ymd(date) - 5
# [1] "2009-09-26"
is the same as
ymd(date) - days(5)
# [1] "2009-09-26"
Other time units work the same way:
ymd(date) - months(5)
# [1] "2009-05-01"
ymd(date) - years(5)
# [1] "2004-10-01"
ymd(date) - years(1) - months(2) - days(3)
# [1] "2008-07-29"
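If adding a package is not an option, month and year steps can also be done in base R with seq(), which accepts a signed step such as "-5 months". A minimal sketch, with illustrative variable names:

```r
d <- as.Date("2009-10-01")

# seq() returns the start date first, so the second element is the shifted date
five_months_back <- seq(d, by = "-5 months", length.out = 2)[2]
five_years_back  <- seq(d, by = "-5 years",  length.out = 2)[2]
```

End-of-month dates (for example the 31st) can roll over into the next month with this approach, so it is best checked on such inputs.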
I have some dates in a dataframe, and when I use as.Date() to convert them into dates, the years convert into 2020, which isn't really valid because the file only has data up to 2018.
What I have so far:
> fechadeinsc1[2]
[1] "2020-08-15"
> class(fechadeinsc1)
[1] "Date"
> fechainsc[2]
[1] "2017/99/99"
> class(fechainsc)
[1] "character"
As you can see, fechadeinsc1 was converted into a date, and fechainsc is the original data, whose elements are characters. "fechadeinsc1" should give the same year, shouldn't it? Even though the days and months aren't valid.
Another example:
> fechadenac1[2]
[1] "2020-12-31"
> class(fechadenac1)
[1] "Date"
> fechanac[2]
[1] "12/31/2016"
> class(fechanac)
[1] "character"
Again, the year changes.
My code:
fechanac <- dat$fecha_nac
fechainsc <- dat$fecha_insc
fechadeinsc1 <- as.Date(fechainsc,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
fechadenac1 <- as.Date(fechanac,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
"dat" is the original dataframe which contains information about newborns registered in 2016 and 2017 in Ecuador, if anyone wants the original .csv file please contact me.
According to the strptime documentation, which the as.Date help refers to, you should use an upper-case %Y for 4-digit years:
%y Year without century (00--99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 -- that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y Year with century. [...]
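A small sketch of the difference, using made-up sample values in the two layouts shown in the question. With %y only the first two digits of the year ("20") are read, which is where 2020 comes from. Invalid entries such as "2017/99/99" become NA when parsed, so the year is taken from the string here; that last step is an assumption about what is wanted, not part of the original answer.

```r
# sample values in the layouts shown in the question
fechanac  <- c("12/31/2016", "01/15/2017")
fechainsc <- c("2017/08/15", "2017/99/99")

# %y reads only "20" from "2016" and turns it into 2020
wrong <- as.Date(fechanac, format = "%m/%d/%y")

# %Y reads the full 4-digit year
right <- as.Date(fechanac, format = "%m/%d/%Y")

# month 99 / day 99 are not valid, so that element becomes NA
insc_date <- as.Date(fechainsc, format = "%Y/%m/%d")

# the year can still be recovered from the text itself
insc_year <- as.integer(substr(fechainsc, 1, 4))
```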
I am making a trivial error here but cannot figure out what the problem is.
I need to get the date of the Monday of the week of an arbitrary date. It seems I am getting something quite different:
mydate <- date("2013-11-05")
format(mydate, "%A") # this is Tuesday, right
#[1] "Tuesday"
month(mydate) # Month November, right
#[1] 11
myyr <- year(mydate); myyr # year is 2013, right
#[1] 2013
day(mydate) # day number is 5, right
#[1] 5
mywk <- isoweek(mydate);mywk # weeknumber is 45, right (yes, not US convention here)
#[1] 45
format(mydate, "%V") # weeknumber is 45, right as well
#[1] "45"
# Monday of week 45 is 2013-11-04 but strptime following gives something else...
strptime(paste0(myyr, "Monday", mywk), "%Y%A%V")
#[1] "2013-11-19 EET"
# and for checking
strptime("2013Monday45","%Y%A%V")
#[1] "2013-11-19 EET"
Thanks in advance
Gabor's comment is all you need, essentially. Here is a full function:
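mondayForGivenDate <- function(d) {
    # accept anything anydate() can parse if d is not already a Date
    if (!inherits(d, "Date")) d <- anytime::anydate(d)
    # wday is 0 (Sunday) to 6 (Saturday); step back to Sunday, then forward one day
    d - as.POSIXlt(d)$wday + 1
}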
Running this for today (a Saturday), next Monday and the previous Saturday gets us three different Mondays, as you would expect:
R> mondayForGivenDate(Sys.Date())
[1] "2016-11-14"
R> mondayForGivenDate(Sys.Date()+2)
[1] "2016-11-21"
R> mondayForGivenDate(Sys.Date()-7)
[1] "2016-11-07"
R>
The use of the anydate() function from the anytime package is optional but nice, because you now get to use many different input formats:
R> mondayForGivenDate(20161119)
[1] "2016-11-14"
R> mondayForGivenDate("20161119")
[1] "2016-11-14"
R> mondayForGivenDate("2016-11-19")
[1] "2016-11-14"
R> mondayForGivenDate("2016-Nov-19")
[1] "2016-11-14"
R>
The key point, once again, is to work with the proper Date and/or POSIXt classes in R, which almost always give you what is needed -- in this case the wday component for the day of the week, which is what we need to step back to the week's Monday.
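One caveat: because wday is 0 on Sundays, subtracting wday and adding 1 maps a Sunday to the following Monday. For ISO 8601 weeks (Monday to Sunday, as with isoweek and %V in the question), a variant could look like the sketch below. The name mondayOfIsoWeek is hypothetical and not part of the answer above. As a side note, ?strptime documents %V as accepted but ignored on input, which is why the strptime attempt in the question cannot work.

```r
# hypothetical ISO-week variant: Sunday belongs to the week that started
# on the preceding Monday
mondayOfIsoWeek <- function(d) {
    d <- as.Date(d)
    # (wday + 6) %% 7 is the number of days since Monday: Mon = 0, ..., Sun = 6
    d - (as.POSIXlt(d)$wday + 6) %% 7
}

mondayOfIsoWeek("2013-11-05")   # a Tuesday
mondayOfIsoWeek("2013-11-10")   # a Sunday
```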
I am working in R and I need to convert a column in the format
9/27/2011 3:33:00 PM
to a numeric value. In Excel I can use the function value(), but I do not know how to do it in R.
My data looks like this:
9/27/2011 15:33 a 1 5 9
9/27/2011 15:33 v 2 6 2
9/27/2011 15:34 c 3 7 1
To convert a string into R date format, use as.POSIXct, parsing the hour with %I (12-hour clock) so that %p (AM/PM) takes effect - then you can coerce it to a numeric value using as.numeric:
> x <- as.POSIXct("9/27/2011 3:33:00 PM", format="%m/%d/%Y %I:%M:%S %p")
> x
[1] "2011-09-27 15:33:00 BST"
> as.numeric(x)
[1] 1317133980
The value you get is the number of seconds since the Unix epoch, 1970-01-01 00:00:00 UTC. Note that this is different from Excel, where a date is stored as the number of days since an arbitrary date (1/1/1900 if my memory serves me well - I try not to use Excel any more.)
For more information, see ?DateTimeClasses
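If the goal is a number comparable to what Excel's value() returns, the seconds can be rescaled to days and shifted. A sketch under two assumptions: the workbook uses Excel's 1900 date system (for which ?as.Date suggests origin "1899-12-30"), and the timestamp is treated as UTC, since Excel serial numbers carry no time zone. The variable names are illustrative.

```r
x <- as.POSIXct("2011-09-27 15:33:00", tz = "UTC")

# 86400 seconds per day; 25569 is the Excel serial number of 1970-01-01
excel_serial <- as.numeric(x) / 86400 + 25569

# going the other way: the whole part is the date, the fraction is the time of day
back_date <- as.Date(floor(excel_serial), origin = "1899-12-30")
back_time <- as.POSIXct((excel_serial - 25569) * 86400,
                        origin = "1970-01-01", tz = "UTC")
```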
This was useful for me:
> test=as.POSIXlt("09/13/2006", format="%m/%d/%Y")
> test
[1] "2006-09-13"
> 1900+test$year
[1] 2006
> test$yday
[1] 255
> test$yday/365
[1] 0.6986301
> 1900+test$year+test$yday/365
[1] 2006.699
You can use similar approaches if you need day numbers like in Excel.
I have multiple lists of measurements. In each list I have the timestamp formatted as a string ("2009-12-24 21:00:07.0"), and I know that the measurements in each list are separated by 5 seconds.
I want to combine all data into a huge data.frame in R. Afterwards I want to be able to easily access the time difference of two measurements so I probably should convert data into something different than characters.
Which format should I use to store the times? Is there some time format in some package that I should use?
You want the (standard) POSIXt type from base R that can be had in 'compact form' as a POSIXct (which is essentially a double representing fractional seconds since the epoch) or as long form in POSIXlt (which contains sub-elements). The cool thing is that arithmetic etc are defined on this -- see help(DateTimeClasses)
Quick example:
R> now <- Sys.time()
R> now
[1] "2009-12-25 18:39:11 CST"
R> as.numeric(now)
[1] 1.262e+09
R> now + 10 # adds 10 seconds
[1] "2009-12-25 18:39:21 CST"
R> as.POSIXlt(now)
[1] "2009-12-25 18:39:11 CST"
R> str(as.POSIXlt(now))
POSIXlt[1:9], format: "2009-12-25 18:39:11"
R> unclass(as.POSIXlt(now))
$sec
[1] 11.79
$min
[1] 39
$hour
[1] 18
$mday
[1] 25
$mon
[1] 11
$year
[1] 109
$wday
[1] 5
$yday
[1] 358
$isdst
[1] 0
attr(,"tzone")
[1] "America/Chicago" "CST" "CDT"
R>
As for reading them in, see help(strptime)
As for difference, easy too:
R> Jan1 <- strptime("2009-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
R> difftime(now, Jan1, units="weeks")
Time difference of 51.25 weeks
R>
Lastly, the zoo package is an extremely versatile and well-documented container for matrices with associated date/time indices.
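Applied to the timestamps in the question, a minimal base R sketch could look like this. The sample values and the data frame layout are made up; %OS is used so that the fractional seconds in the strings are parsed, and the time zone is fixed to UTC as an assumption.

```r
# sample timestamps in the format from the question, 5 seconds apart
stamps <- c("2009-12-24 21:00:07.0",
            "2009-12-24 21:00:12.0",
            "2009-12-24 21:00:17.0")

# %OS parses seconds including a fractional part
times <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# POSIXct fits directly into a data frame column
measurements <- data.frame(time = times, value = c(1.2, 1.5, 1.1))

# time differences between consecutive measurements, in seconds
gap_secs <- as.numeric(diff(measurements$time), units = "secs")

# if only the first timestamp is known, the rest can be generated
expected <- seq(times[1], by = 5, length.out = 3)
```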