Sometimes I work with data like this:
sep-2018
derived from a date like this:
Sys.Date()
[1] "2018-09-21"
To get this result, I generally use:
format(Sys.Date(),'%b-%Y')
But the result is not of class Date:
class(format(Sys.Date(),'%b-%Y'))
[1] "character"
Why isn't it a date? Is it possible to have class() return Date for it, and how?
An external library such as zoo behaves the same way:
library(zoo)
class(format(as.yearmon(format(Sys.Date()), "%Y-%m-%d"), "%b.%Y"))
[1] "character"
Using "%m.%Y" also produces a character string, although it does not create (for example) an ordering issue.
The format command takes the date and outputs a printable string based on the format you provide. To quote the documentation:
An object of similar structure to x containing character representations of the
elements of the first argument x in a common format, and in the current
locale's encoding.
Also, a Date variable is stored internally as a number (the number of days since 1970-01-01):
dput(Sys.Date())
#structure(17795, class = "Date")
structure(0, class = "Date")
#[1] "1970-01-01"
So to pinpoint a date, you need the day, month and year fields. If you don't have all three, parsing will probably return NA or an error. The same holds for date-time classes. If you don't have the data, you can fill in dummy values and use format to print only the fields you want.
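As a minimal sketch of the dummy-value idea, the snippet below pastes a placeholder day onto a month-year string before parsing. The variable name month_year and the numeric-month input are assumptions made here so that the example does not depend on the locale; a string like "sep-2018" would need "%b" and an English locale.

```r
# Assumption: the input holds a numeric month and a year, e.g. "09-2018".
month_year <- "09-2018"

# Add a dummy day (the 1st) so that all three fields are present when parsing.
d <- as.Date(paste0("01-", month_year), format = "%d-%m-%Y")

class(d)            # the object is a real Date
format(d, "%m-%Y")  # print only the fields of interest
```

The Date keeps its full information for sorting and arithmetic, and format() is applied only at the point of display.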
As Rohit says, format doesn't output a Date object, but a string in the format of your choice.
To get a Date object from a string like "sep-2018" you could use readr::parse_date().
(my_date <- readr::parse_date("sep-2018", format = '%b-%Y'))
#[1] "2018-09-01"
class(my_date)
#[1] "Date"
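On the ordering point raised in the question, here is a small sketch (the sample dates are made up for illustration) of why it can help to keep the Date objects around and format them only for display: formatted strings sort as text, not chronologically, once more than one year is involved.

```r
# Hypothetical sample dates spanning two years.
dates <- as.Date(c("2018-09-21", "2018-04-02", "2019-01-15"))

# Sorting the formatted strings compares them as plain text.
labels <- format(dates, "%m.%Y")
sorted_as_text <- sort(labels)

# Sorting the Dates first and formatting afterwards keeps chronological order.
sorted_as_dates <- format(sort(dates), "%m.%Y")

sorted_as_text
sorted_as_dates
```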
Related
I have a simple question.
How does the function as.Date work?
Can anyone explain why a simple "" changes everything in the code below?
as.Date("03/04/2019")
#[1] "0003-04-20"
as.Date("03/04/2019", "")
#[1] "2019-07-16"
This is directly from the help page which you view with ?as.Date:
## S3 method for class 'character'
as.Date(x, format, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"),
optional = FALSE, ...)
x: an object to be converted.
format: character string. If not specified, it will try tryFormats one by one on the first non-NA element, and give an error if none works. Otherwise, the processing is via strptime.
So, in the first case we do not provide format. In this case as.Date tries the following formats: "year-month-day" and "year/month/day". The second one "works" in this case so we get the following return:
as.Date("03/04/2019")
[1] "0003-04-20"
"03" is interpreted as the year, "04" as the month and "2019" as the day. Since the day can only have two digits, only the first two digits ("20") are used.
In the second case we provide format ourselves:
as.Date("03/04/2019", format = "")
[1] "2019-07-16"
If format is provided, as.Date does not try any formats. Instead, the string is processed via strptime("03/04/2019", format = "") and the result is converted to a Date.
In the help page of strptime you can find the following:
For strptime the input string need not specify the date completely: it
is assumed that unspecified seconds, minutes or hours are zero, and an
unspecified year, month or day is the current one.
So you provide a format, but it does not contain anything, so the current date is returned.
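The fill-in behaviour quoted from the help page can be seen with a format that specifies only part of a date-time. This is just a sketch; the variable names are made up for illustration.

```r
# Only the time is specified, so strptime takes the date fields from today.
partial <- strptime("14:13", format = "%H:%M")
format(partial, "%H:%M")     # the time that was parsed
format(partial, "%Y-%m-%d")  # today's date, filled in by strptime

# With an empty format nothing is taken from the string at all,
# so the result is still a valid (current) date rather than NA.
empty <- as.Date("03/04/2019", format = "")
is.na(empty)
```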
In any case, you can simply specify the format yourself:
as.Date("03/04/2019", format = "%d/%m/%Y")
[1] "2019-04-03"
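To make the difference concrete, here is a short sketch that parses the same string with different explicit formats. An explicit format also means that impossible dates come back as NA instead of being silently reinterpreted.

```r
x <- "03/04/2019"

# No format: as.Date falls back to tryFormats and reads this as year/month/day.
guessed <- as.Date(x)

# Explicit formats: the same string read as day/month/year or month/day/year.
dmy <- as.Date(x, format = "%d/%m/%Y")   # 3 April 2019
mdy <- as.Date(x, format = "%m/%d/%Y")   # 4 March 2019

# April has only 30 days, so this cannot be parsed and yields NA.
bad <- as.Date("31/04/2019", format = "%d/%m/%Y")

format(dmy)
format(mdy)
is.na(bad)
```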
I'm having some trouble working with the as.Date function in R. I have a vector of dates that I'm reading in from a .csv file that are coming in as a factor of integers or as character (depending on how I read in the file, but this doesn't seem to have anything to do with the issue), formatted as %m/%d/%Y.
I'm going through the file row by row, pulling out the date field and trying to convert it for use elsewhere using the following code:
tmpDtm <- as.Date(as.character(tempDF$myDate), "%m/%d/%Y")
This seems to give me what I want. For example, if I do this to a starting value of 12/30/2014, I get the value "2014-12-30" returned. However, if I examine this value using typeof(), R tells me that its data type is 'double'. Additionally, if I try to bind this to other values and store it in a data frame using c() or cbind(), it winds up being stored in the data frame as 16434, which looks to me like some sort of internal storage value of a date. I'm pretty sure that's what it is, too, because if I try to convert that value again using as.Date(), it throws an error asking for an origin.
So, two questions: Is this as expected? If so, is there a more appropriate way to convert a date so that I actually end up with a date-typed object?
Thank you
Dates are internally represented as double, as you can see in the following example:
> typeof(as.Date("09/12/16", "%m/%d/%y"))
[1] "double"
However, it still carries the class Date, as in
> class(as.Date("09/12/16", "%m/%d/%y"))
[1] "Date"
and because it is a double, you can do computations with it. But because it is of class Date, these computations lead to Dates:
> as.Date("09/12/16", "%m/%d/%y") + 1
[1] "2016-09-13"
> as.Date("09/12/16", "%m/%d/%y") + 31
[1] "2016-10-13"
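A few more computations of the same kind, as a sketch: subtracting two Dates gives a difference in days, and seq() can step through Dates by calendar month.

```r
d <- as.Date("09/12/16", format = "%m/%d/%y")

# Subtracting two Dates gives a difftime; as.numeric() extracts the day count.
days_between <- as.numeric(as.Date("2016-10-13") - d)

# seq() understands calendar units when given a Date.
monthly <- seq(d, by = "month", length.out = 3)

days_between
format(monthly)
```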
EDIT
I asked about c() and cbind() because they can be associated with some strange behaviour. See the following example, where switching the order within c() changes not the type but the class of the result:
> c(as.Date("09/12/16", "%m/%d/%y"), 1)
[1] "2016-09-12" "1970-01-02"
> c(1, as.Date("09/12/16", "%m/%d/%y"))
[1] 1 17056
> class(c(as.Date("09/12/16", "%m/%d/%y"), 1))
[1] "Date"
> class(c(1, as.Date("09/12/16", "%m/%d/%y")))
[1] "numeric"
EDIT 2 - c() and cbind force objects to be of one type. The first edit shows an anomaly of coercion, but generally, the vector must be of one shared type. cbind shares this behavior because it coerces to matrix, which in turn coerces to a single type.
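For the data frame case from the question, a sketch along these lines may help: cbind() on plain vectors produces a numeric matrix and loses the class, whereas data.frame() keeps each column's class. A bare day count can be converted back by supplying the origin. The column names id and date are made up for illustration.

```r
d <- as.Date("12/30/2014", format = "%m/%d/%Y")

# cbind() builds a matrix, so the Date class is dropped and only the day count remains.
m <- cbind(id = 1, date = d)
m[1, "date"]

# data.frame() stores each column with its own class, so the Date survives.
df <- data.frame(id = 1, date = d)
class(df$date)

# A bare day count becomes a Date again once the origin is given explicitly.
restored <- as.Date(16434, origin = "1970-01-01")
format(restored)
```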
For more help on typeof and class see this link
This is as expected. You used typeof(); you probably should have used class():
R> Sys.Date()
[1] "2016-09-12"
R> typeof(Sys.Date()) # this more or less gives you how it is stored
[1] "double"
R> class(Sys.Date()) # whereas this gives you _behaviour_
[1] "Date"
R>
Minor advertisement: I have a new package anytime, currently in incoming at CRAN, which deals with this as it converts "anything" to POSIXct (via anytime()) or Date (via anydate()).
E.g.:
R> anydate("12/30/2014") # no format needed
[1] "2014-12-30"
R> anydate(as.factor("12/30/2014")) # converts from factor too
[1] "2014-12-30"
R>
I'm using strptime to extract the date, and the result has the wrong year.
Where is the error in the code below?
strptime('8/29/2013 14:13', "%m/%d/%y")
[1] "2020-08-29 PDT"
What are other ways to extract the date and time as separate columns?
The data I have is in this format - 8/29/2013 14:13
I want to split this into two columns, one is 8/29/2013 and the other is 14:13.
You have a four-digit year, so you need to use %Y (%y reads only the first two digits, "20", which is why you got 2020):
strptime('8/29/2013 14:13', "%m/%d/%Y" )
[1] "2013-08-29 CEST"
Do you really want date and time in separate columns? It is usually much easier to deal with a single date-time object.
Here's one possibility to separate time and date from the string.
For convenience, we could first convert the string into a POSIX object:
datetime <- '8/29/2013 14:13'
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
Then we can use as.Date() to extract the date from this object and use format() to display it in the desired format:
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013"
To store the time separately we can use, e.g., the strftime() function:
strftime(datetime.P, '%H:%M')
#[1] "14:13"
Both format() and strftime() are vectorized, so the same calls work unchanged when datetime is a vector of several character strings with date and time in the format described in the OP; no sapply() loop is needed.
Example
datetime <- c('8/29/2013 14:13', '9/15/2014 12:03')
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013" "09/15/2014"
strftime(datetime.P, '%H:%M')
#[1] "14:13" "12:03"
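If the two pieces really are needed as separate columns, one possible sketch is to build them from the same parsed object. The time zone is fixed to "UTC" here as an assumption, so that the date and the time are both derived under the same time zone; the names parsed and out are made up for illustration.

```r
datetime <- c("8/29/2013 14:13", "9/15/2014 12:03")

# Parse once, with an explicit time zone (an assumption made for this sketch).
parsed <- as.POSIXct(datetime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Derive both columns from the same object so they cannot disagree.
out <- data.frame(
  date = format(parsed, "%m/%d/%Y"),
  time = format(parsed, "%H:%M"),
  stringsAsFactors = FALSE
)
out
```

Keeping parsed as an extra column is worth considering too, since the single date-time object is the easiest one to sort and compute with.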
Hope this helps.
The following vector of Dates is given in form of a string sequence:
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
As an example, I would like to lag this vector by 1 month, that is, to produce the following structure:
d <- c("01/08/1991","01/09/1991","01/10/1991","01/11/1991")
My data is much larger and I must impose higher lags as well, but this seems to be the basis I need to know.
After doing this, I would like to end up with the same format again ("%d/%m/%Y"). How can this be done in R? I found a couple of packages (e.g. lubridate), but I always have to convert between formats (strings, dates and more), so it's a bit messy and seems error-prone.
edit: some more info on why I want to do this: I am using this vector as rownames of a matrix, so I would prefer a solution where the final outcome is a string vector again.
This does not use any packages. We convert to "POSIXlt" class, subtract one from the month component and convert back:
fmt <- "%d/%m/%Y"
lt <- as.POSIXlt(d, format = fmt)
lt$mon <- lt$mon - 1
format(lt, format = fmt)
## [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
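Since higher lags were mentioned, the same idea could be wrapped in a small helper. This is only a sketch: lag_months is a made-up name, the time zone is fixed to "UTC" as an assumption to keep daylight-saving changes out of the picture, and the round trip through as.POSIXct() is there to normalize month values that fall outside 0 to 11.

```r
# Hypothetical helper: shift date strings back by k months, keeping the format.
lag_months <- function(x, k, fmt = "%d/%m/%Y") {
  lt <- as.POSIXlt(x, format = fmt, tz = "UTC")
  lt$mon <- lt$mon - k                      # may leave the 0..11 range
  # as.POSIXct() normalizes out-of-range fields (e.g. month -1 becomes December).
  format(as.POSIXct(lt), format = fmt, tz = "UTC")
}

d <- c("01/09/1991", "01/10/1991", "01/11/1991", "01/12/1991")
lag_months(d, 1)
lag_months(d, 12)
```

For days that do not exist in the target month (say the 31st), the normalization rolls over into the following month, so this approach is safest for first-of-month data like the vector above.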
My solution uses lubridate, but it does return what you want in the specified format:
require(lubridate)
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
format(as.Date(d,format="%d/%m/%Y")-months(1),'%d/%m/%Y')
[1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
You can then change the lag and (if you want) the output (which is this part : '%d/%m/%Y') by specifying what you want.
My attempt to format date like this results in NA NA. Neither the date nor the time is getting converted. What am I doing wrong?
x <- strptime(c("2013-12-12", "08:43:24.967"),"%Y-%m-%d %H:%M:%OS")
With the format string that you have supplied, strptime expects a vector of date-time strings. You have a vector containing date and time as separate vector elements. This is incorrect.
Instead of passing c("2013-12-12", "08:43:24.967") (two elements, date then time), you need to pass "2013-12-12 08:43:24.967" (one element, date-time).
The data you have can be put in the proper format with paste:
strptime(paste("2013-12-12", "08:43:24.967"),format="%Y-%m-%d %H:%M:%OS")
[1] "2013-12-12 08:43:24"
The fractional seconds aren't printed above because, with the default options(digits.secs=NULL), they are not shown. The expression does capture them, though. They are printed if you use an output format string that includes them, or if you specify the number of digits to print (e.g. options(digits.secs=3)).
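A short sketch of both points, assuming the dates and times arrive as two parallel vectors (the second pair of values is made up for illustration): paste() combines them element by element, and an output format such as "%OS3" shows the fractional part that was captured.

```r
dates <- c("2013-12-12", "2013-12-13")
times <- c("08:43:24.967", "17:05:01.250")

# paste() is vectorized, so each date is joined with its matching time.
x <- strptime(paste(dates, times), format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# The seconds component holds the fractional part even though print() hides it.
x$sec

# "%OS3" asks for three decimal places on output.
format(x, "%Y-%m-%d %H:%M:%OS3")
```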
You need to pass a single date-time string as the first argument to strptime, followed by the date format. Note that the O in %OS is the letter O, not a zero; it is the specifier for fractional seconds. If you do not need the milliseconds, remove them from the date string and use %S instead.
You can use statement like this:
strptime("2013-12-12 08:43:24", "%Y-%m-%d %H:%M:%S")