I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
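One quick way to check the claim above is to run the same conversion on a character vector and on a factor built from it. This is a minimal sketch using the values from the question. The names x_chr, x_fac, d_chr and d_fac are only illustrative.

```r
# Values from the question, once as character and once as a factor
x_chr <- c("", "1992-02-13", "2011-03-10", "", "1998-11-30")
x_fac <- factor(x_chr)

# as.Date() on a factor converts it to character first, so both calls agree
d_chr <- as.Date(x_chr, format = "%Y-%m-%d")
d_fac <- as.Date(x_fac, format = "%Y-%m-%d")

identical(d_chr, d_fac)  # TRUE; the empty strings become NA in both
```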
When you read in the data with read.csv (or similar functions), you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
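Here is a minimal sketch of the na.strings approach. It reads a small CSV from an inline string through read.csv's text argument instead of from a real file. The column names id and when are made up for illustration.

```r
# Inline CSV standing in for a real file; two rows have an empty date field
csv_text <- "id,when
1,
2,1992-02-13
3,2011-03-10
4,
5,1998-11-30"

# na.strings = "" turns the empty fields into NA while reading
df <- read.csv(text = csv_text, na.strings = c(""), stringsAsFactors = FALSE)

# No empty strings are left, so the conversion needs no special handling
df$when <- as.Date(df$when, format = "%Y-%m-%d")
str(df)
```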
I usually convert factors to a POSIX* type class using the function strptime. The first argument is your vector and the second is the "pattern" by which the date/time is constructed (a % sign followed by a specific letter). You basically tell R that first you have a year, then a -, then a month and so on. See ?strptime for a full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
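The same format-string idea also works for layouts that are not ISO 8601. This sketch uses made-up day-first dates separated by dots. Only the format argument changes, and any string that does not match the pattern (such as the empty string) comes back as NA.

```r
# Made-up day-first dates stored as a factor, including an empty entry
x <- factor(c("13.02.1992", "", "10.03.2011"))

# %d = day, %m = month, %Y = four-digit year; the dots are matched literally
xd <- as.Date(x, format = "%d.%m.%Y")
xd
```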
I had a dataset that looked like this:
#df
id date
1 2016-08-30 10:46:46.810
I tried to remove the time part and keep only the date. This worked:
df$date <- format(as.POSIXct(strptime(df$date,"%Y-%m-%d %H:%M:%S")) ,format = "%Y-%m-%d")
and the dates now look like this:
id date
1 2016-08-30
This is what I was looking for. The problem is that I want to do some calculations on this data, so I have to convert it to an integer:
temp <- as.numeric(df$date )
It gives me the following warning:
Warning message:
NAs introduced by coercion
and results in
NA
Does anyone know where the issue is?
It's pretty easy as you have a standard format (see ISO 8601) which inter alia the anytime package supports (and it supports other, somewhat regular formats):
R> library(anytime)
R> at <- anytime("2016-08-30 10:46:46.810")
R> at
[1] "2016-08-30 10:46:46.80 CDT"
R> ad <- anydate("2016-08-30 10:46:46.810")
R> ad
[1] "2016-08-30"
R>
The key, though, is to understand the relationship between the underlying date formats. You will have to read and try a bit more on that. Here, in essence we just have
R> as.Date(anytime("2016-08-30 10:46:46.810"))
[1] "2016-08-30"
R>
The anytime package has a few other tricks such as automagic conversion from integer, character, factor, ordered, ...
As for the second part of your question, you were so close, and then you spoiled it again by using format(), which creates a character representation.
You almost always want Date representation instead:
R> ad <- as.Date(anytime("2016-08-30 10:46:46.810"))
R> as.integer(ad)
[1] 17043
R> as.numeric(ad)
[1] 17043
R> ad + 1:3
[1] "2016-08-31" "2016-09-01" "2016-09-02"
R>
Not format(). format gives you a character vector (string), and this confuses as.numeric because there are weird non-numeric characters in there. As far as the parser is concerned, you might as well have asked as.numeric("ripe red tomatoes").
Use as.Date() instead, e.g.:
as.Date(as.POSIXct(df$date, format="%Y-%m-%d %H:%M:%S"))
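Applied to the data frame from the question, this gives a base-R sketch (no anytime needed) that keeps the column as Date, so as.numeric() and date arithmetic both work. The second row and the tz = "UTC" choice are assumptions added for illustration. Use the time zone your timestamps were actually recorded in.

```r
# Made-up data frame shaped like the one in the question
df <- data.frame(id = 1:2,
                 date = c("2016-08-30 10:46:46.810", "2016-09-02 08:15:00.000"),
                 stringsAsFactors = FALSE)

# Parse the timestamps (strptime ignores the trailing ".810" after %S),
# then drop the time of day by converting to Date
df$date <- as.Date(as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

class(df$date)       # "Date"
as.numeric(df$date)  # 17043 17046: days since 1970-01-01, no NA warning
diff(df$date)        # Time difference of 3 days
```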
I'm having some trouble working with the as.Date function in R. I have a vector of dates that I'm reading in from a .csv file that are coming in as a factor of integers or as character (depending on how I read in the file, but this doesn't seem to have anything to do with the issue), formatted as %m/%d/%Y.
I'm going through the file row by row, pulling out the date field and trying to convert it for use elsewhere using the following code:
tmpDtm <- as.Date(as.character(tempDF$myDate), "%m/%d/%Y")
This seems to give me what I want: for example, if I do this to a starting value of 12/30/2014, I get the value "2014-12-30" returned. However, if I examine this value using typeof(), R tells me that its data type is 'double'. Additionally, if I try to bind this to other values and store it in a data frame using c() or cbind(), it ends up stored in the data frame as 16434, which looks to me like some sort of internal storage value for a date. I'm pretty sure that's what it is, because if I try to convert that value again using as.Date(), it throws an error asking for an origin.
So, two questions: Is this as expected? If so, is there a more appropriate way to convert a date so that I actually end up with a date-typed object?
Thank you
Dates are internally represented as double, as you can see in the following example:
> typeof(as.Date("09/12/16", "%m/%d/%y"))
[1] "double"
but it is still marked as class Date, as in
> class(as.Date("09/12/16", "%m/%d/%y"))
[1] "Date"
and because it is a double, you can do computations with it. But because it is of class Date, these computations lead to Dates:
> as.Date("09/12/16", "%m/%d/%y") + 1
[1] "2016-09-13"
> as.Date("09/12/16", "%m/%d/%y") + 31
[1] "2016-10-13"
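You can show the round trip between a Date and its underlying number directly. This also explains the origin error in the question: once the class is stripped, R sees only a count of days, and as.Date() (at least in older R versions) must be told what that count is relative to. A minimal sketch using the 12/30/2014 value from the question:

```r
# The date from the question, parsed with an explicit format
d <- as.Date("12/30/2014", format = "%m/%d/%Y")

typeof(d)  # "double": how the value is stored
class(d)   # "Date": what controls printing and arithmetic

# Stripping the class leaves the bare day count since 1970-01-01
n <- unclass(d)
n          # 16434

# Turning the number back into a Date; origin states what the count is relative to
as.Date(n, origin = "1970-01-01")
```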
EDIT
I asked about c() and cbind() because they can be associated with some strange behaviour. See the following example, where switching the order within c() changes not the type but the class of the result:
> c(as.Date("09/12/16", "%m/%d/%y"), 1)
[1] "2016-09-12" "1970-01-02"
> c(1, as.Date("09/12/16", "%m/%d/%y"))
[1] 1 17056
> class(c(as.Date("09/12/16", "%m/%d/%y"), 1))
[1] "Date"
> class(c(1, as.Date("09/12/16", "%m/%d/%y")))
[1] "numeric"
EDIT 2: c() and cbind() force objects to be of one type. The first edit shows an anomaly of coercion (c() dispatches on the class of its first argument, so the Date method is used only when the Date comes first), but in general a vector must have one shared type. cbind() behaves the same way because it coerces to a matrix, which in turn holds a single type.
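To keep the Date class when combining it with other columns, one option is to build a data frame instead of a matrix, because each data frame column keeps its own class. A minimal sketch with made-up dates and an illustrative id column:

```r
dates <- as.Date(c("12/30/2014", "01/02/2015"), format = "%m/%d/%Y")

# cbind() of plain vectors builds a single-type numeric matrix: the class is lost
m <- cbind(id = 1:2, when = dates)
m[, "when"]      # 16434 16437

# data.frame() stores each column separately, so the Date class survives
df <- data.frame(id = 1:2, when = dates)
class(df$when)   # "Date"
```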
For more help on typeof and class see this link
This is as expected. You used typeof(); you probably should have used class():
R> Sys.Date()
[1] "2016-09-12"
R> typeof(Sys.Date()) # this more or less gives you how it is stored
[1] "double"
R> class(Sys.Date()) # whereas this gives you _behaviour_
[1] "Date"
R>
Minor advertisement: I have a new package, anytime, currently in incoming at CRAN, which deals with this by converting "anything" to POSIXct (via anytime()) or Date (via anydate()).
E.g.:
R> anydate("12/30/2014") # no format needed
[1] "2014-12-30"
R> anydate(as.factor("12/30/2014")) # converts from factor too
[1] "2014-12-30"
R>
The following vector of dates is given as a sequence of strings:
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
As an example, I would like to lag this vector by one month, i.e. produce the following structure:
d <- c("01/08/1991","01/09/1991","01/10/1991","01/11/1991")
My data is much larger and I must impose higher lags as well, but this seems to be the basis I need to know.
After doing this, I would like to end up with the same format again ("%d/%m/%Y"). How can this be done in R? I found a couple of packages (e.g. lubridate), but I always have to convert between formats (strings, dates and more), so it's a bit messy and seems error-prone.
edit: some more info on why I want to do this: I am using this vector as rownames of a matrix, so I would prefer a solution where the final outcome is a string vector again.
This does not use any packages. We convert to "POSIXlt" class, subtract one from the month component and convert back:
fmt <- "%d/%m/%Y"
lt <- as.POSIXlt(d, format = fmt)
lt$mon <- lt$mon - 1
format(lt, format = fmt)
## [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
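For larger or repeated lags, the same POSIXlt idea can be wrapped in a small helper. lag_months() below is a hypothetical name, not part of base R. Rather than relying on R to normalise an out-of-range month, it recomputes year and month explicitly, so lags that cross a year boundary come out right. It assumes the day of month exists in the target month, as with the 01/... dates here; a day such as 31 shifted into a shorter month would still need separate handling. The tz = "UTC" choice is also an assumption, made to avoid daylight-saving shifts.

```r
# Hypothetical helper: shift "%d/%m/%Y" date strings back by k months
lag_months <- function(d, k = 1, fmt = "%d/%m/%Y") {
  lt <- as.POSIXlt(d, format = fmt, tz = "UTC")
  # Count months from year 1900 so that year and month can be recomputed together
  total <- lt$year * 12 + lt$mon - k
  lt$year <- total %/% 12
  lt$mon  <- total %% 12
  format(lt, format = fmt)
}

d <- c("01/09/1991", "01/10/1991", "01/11/1991", "01/12/1991")
lag_months(d, 1)   # "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
lag_months(d, 12)  # "01/09/1990" "01/10/1990" "01/11/1990" "01/12/1990"
```

Because the result is a character vector again, it can be used directly as row names.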
My solution uses lubridate, but it does return what you want in the specified format:
library(lubridate)
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
format(as.Date(d,format="%d/%m/%Y")-months(1),'%d/%m/%Y')
[1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
You can then change the lag and, if you want, the output format (the '%d/%m/%Y' part) as needed.
I have a vector made up of timestamps as POSIXlt, format "2015-01-05 15:00:00", which I extracted from a timeframe.
I want to reassign the vector, dropping all elements where the minutes are not 00.
I've tried
vector <- vector[format(vector, "%M") == 00,]
which produces the following "missing argument" error:
Error in lapply(X = x, FUN = "[", ..., drop = drop) :
argument is missing, with no default
Also tried
vector <- vector["%M""== 00]
which seems to leave the command unterminated (the console keeps waiting for more input).
Since POSIX time is stored as the number of seconds elapsed since 1 Jan 1970, I guess I could do this by excluding from my vector all elements that are not multiples of 3600. I'd rather not use this approach, though. Thank you in advance; I'm new to R.
format() returns a character vector, not a number, so you should compare it to "00" (the number 00 is coerced to the string "0", which never matches). Also, the comma is not needed, as there is only one dimension.
vector <- vector[format(vector, "%M") == "00"]
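You can check directly why the original comparison silently fails: the number 00 is coerced to the string "0", which never equals "00". A small sketch with made-up timestamps. The tz = "UTC" setting is an assumption that keeps the example independent of the local time zone.

```r
"00" == 00  # FALSE: 00 is the number 0, which becomes the string "0"

v <- strptime(c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 16:00:00"),
              "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Compare the formatted minutes with the string "00", and index without a comma
on_the_hour <- v[format(v, "%M") == "00"]
format(on_the_hour, "%H:%M:%S")  # "15:00:00" "16:00:00"
```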
You could try
v2[!v2$min]
#[1] "2015-01-05 15:00:00 EST" "2015-01-05 15:00:30 EST"
Or your own command should also work once you remove the comma and compare with the string "00" instead of the number 00.
data
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
v2 <- strptime(v1, '%Y-%m-%d %H:%M:%S')
Using:
v2 <- v2[v2$min == 0]
I reassign v2, excluding all elements whose minutes are not 0.
This was suggested by @akrun.
It does the selection while keeping the POSIXlt class.
There were two issues with the first option in my initial code:
1. format() returns a character vector, so it has to be compared with "00" rather than 00;
2. there was a "," before the final "]", which meant that [ was expecting another (column) index, which does not make sense for a vector, as explained by @balint.
The second option I initially tried had several syntax mistakes. The correct syntax is the one in this answer, as suggested by @akrun.
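If the timestamps live in a data frame, a POSIXct column is usually easier to handle than a POSIXlt one, and you can still read the minute component on the fly through as.POSIXlt(). A minimal sketch. The column names ts and value and the data are made up, and tz = "UTC" is an assumption.

```r
# Made-up data frame with a POSIXct column
df <- data.frame(
  ts = as.POSIXct(c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 16:00:00"),
                  tz = "UTC"),
  value = c(1, 2, 3)
)

# as.POSIXlt() exposes the minute component without storing a POSIXlt column
on_the_hour <- df[as.POSIXlt(df$ts)$min == 0, ]
on_the_hour
```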
Imagine I have a data frame in which some columns represent dates or times. When working with these columns, it is convenient to have them formatted as POSIXlt objects (or another explicitly date/time-oriented class).
However, when I display these columns on screen or write them out to a .csv, I get the full ISO 8601 formatted time. I realize I can turn the times into a character vector formatted however I like using format(col, format="%m-%Y") or similar, but I'm not keen on changing the class of my object just to print it. Other objects in R have print methods associated with them, so we don't have to coerce them explicitly. Is there some way to do that with any of R's date-time classes that I've overlooked?
EDIT:
Here's a minimal example of what I'd hope to achieve:
a.datetime = Sys.time()
a.datetime
Displays:
2014-06-23 09:32:12
which is also the format I get in the CSV when I run:
write.csv(data.frame(a.datetime), "example.csv")
As I describe above, I realize I can coerce this to a character with the desired format manually, e.g.:
format(a.datetime, format="%y-%m")
write.csv(data.frame(format(a.datetime, format="%y-%m")), "example.csv")
This is not what I want to have to do. I am looking for a way for the object to know how it should be printed, without the user having to both apply that formatting and coerce to a character vector as shown above. (Hopefully this clarifies what I mean by changing type: I am referring to the class of the output, not the class of the argument.)
I can try to define such a class as below, e.g. using S3 classes, but it still does not print to csv using the format specified.
class(a.datetime) <- c("myclass", class(a.datetime))
attr(a.datetime, 'fmt') <- "%y-%m"
print.myclass <- function(x, ...) { print(format(x, format = attr(x, "fmt"))); invisible(x) }
write.csv(data.frame(a.datetime), "temp.csv")
This still writes a CSV with the full ISO 8601 format.
It's pretty annoying that the base R functions for writing data don't have an argument to let the user easily adjust the datetime format.
There are ways around it, though. Here's what I've done sometimes when I want to specify a format quickly and I don't need to worry about side effects:
# In bash
Rscript -e "x <- readRDS('foo.rds'); " \
  -e "as.character.POSIXct <- function(x, ...) format(x, format='%Y-%m-%d %H:%M:%S%z'); " \
  -e "write.csv(x, 'foo.csv', row.names=FALSE)"
(I'm showing that in a shell command just to emphasize that you'll want the new as.character.POSIXct method to disappear after using it.)
The essence is overriding the as.character method for the POSIXct class (for arcane reasons, overriding for the parent POSIXt class won't work):
as.character.POSIXct <- function(x, ...)
  format(x, format='%Y-%m-%d %H:%M:%S%z')
It's not something that should be done in a larger codebase where the global effects might spill into code that's not expecting it, though!
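A lower-risk variation on the same idea formats the datetime columns only in a copy that is written out. This way no method is overridden and the original object keeps its class. write_csv_fmt() below is a hypothetical helper, not a base R function. It assumes the datetime columns are POSIXct and writes to a temporary file for demonstration.

```r
# Hypothetical helper: format POSIXct columns as text in a copy, then write it
write_csv_fmt <- function(df, file, fmt = "%Y-%m-%d %H:%M:%S%z", ...) {
  is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_time] <- lapply(df[is_time], format, format = fmt)
  utils::write.csv(df, file, ...)
}

x <- data.frame(id = 1:2,
                when = as.POSIXct(c("2014-06-23 09:32:12", "2014-06-24 10:00:00"), tz = "UTC"))
tmp <- tempfile(fileext = ".csv")
write_csv_fmt(x, tmp, fmt = "%y-%m", row.names = FALSE)
readLines(tmp)    # the "when" column is written as "14-06"
class(x$when)     # x itself still holds POSIXct
```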
Some code to expand on my comment. R is a functional language, so operations on a vector (and lists are actually vectors) do not change the vector; they return a processed result, which for datetime objects is usually a character vector. Here are a few views of a POSIXlt object:
x <- as.POSIXlt("2000-01-01")
x
#[1] "2000-01-01 PST"
x <- as.POSIXlt("2000-01-01 12:00:00")
x
#[1] "2000-01-01 12:00:00 PST"
str(x)
# POSIXlt[1:1], format: "2000-01-01 12:00:00"
mode(x)
#[1] "list"
x[[1]]
#[1] 0
x[[2]]
#[1] 0
x[[3]]
#[1] 12
x[[4]]
#[1] 1
unlist(x)
# sec min hour mday mon year wday yday isdst zone gmtoff
# "0" "0" "12" "1" "0" "100" "6" "0" "0" "PST" NA
mode(x[[3]])
#[1] "numeric"
# x[[10]]; mode(x[[10]])
#[1] "PST"
#[1] "character"
Notice that unlist() converted the list to a character vector. In R only lists can hold mixed modes, so the single character element in a POSIXlt object (the time-zone abbreviation) coerces all of the elements that were stored as numbers to character. As noted above, POSIXlt objects are tricky to use. Data frame functions generally do not behave well with them, because most (well-behaved) data frame columns are atomic vectors rather than lists.
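The list-versus-atomic point is easy to see by comparing the two datetime classes side by side. A small sketch. It uses tz = "UTC" instead of the PST zone shown above, only so that the numbers do not depend on the machine's time zone.

```r
x <- as.POSIXlt("2000-01-01 12:00:00", tz = "UTC")
is.list(unclass(x))   # TRUE: POSIXlt is a list of components
unclass(x)$hour       # 12

# POSIXct holds the same instant as one double: seconds since 1970-01-01 UTC
ct <- as.POSIXct(x)
typeof(ct)            # "double"
is.atomic(ct)         # TRUE, like a typical data frame column
as.numeric(ct)        # 946728000
```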