How can I append dates to a vector in R?

I created a vector using the vector() function:
actual_dates_vector <- vector()
I then extract the Julian date (e.g. 2008201) from a text string:
julian_date<-substr(files[r],10,16)
I then convert the Julian date into YYYY-MM-DD format:
actual_date<-strptime(julian_date, "%Y %j")
This gives me a value like "2009-07-28". I then need to append this to the vector I created initially, which I do like this:
actual_dates_vector<-c(actual_dates_vector,actual_date)
But this gives me:
$sec
[1] 0
$min
[1] 0
$hour
[1] 0
$mday
[1] 28
$mon
[1] 6
$year
[1] 109
$wday
[1] 2
$yday
[1] 208
$isdst
[1] 1
I don't understand what's going on. This code actually runs in a loop over multiple dates, so I want the date to be extracted from each date string, converted to YYYY-MM-DD format and appended to the vector. Is there a way to do this?
Thanks.

If you prefer a "loop & append" approach, you can do the following:
# random data to emulate your files
files <- c("2008281","2009128","2010040")
n_files <- length(files)
# loop & append
actual_dates_vector <- vector()
for(r in 1:n_files){
  dts <- as.POSIXct(files[r],format="%Y%j")
  # convert dts (POSIXct class objects) to character with the desired format
  dts <- format(dts,format="%Y-%m-%d")
  actual_dates_vector <- c(actual_dates_vector,dts)
}
Date-time objects are actually something else under the hood. As you have seen, POSIXlt objects are lists of the date components, while POSIXct objects are basically doubles, so they are not what you see when you print them (the printed format also depends on the local settings, so you can get different results on different machines).
For this reason, since you stated that you want a specific representation of the dates (namely YYYY-MM-DD), I suggest you follow the approach described above and store the result in a character vector with the desired format.
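If the values need to stay real dates rather than strings, the same loop can be sketched with a Date accumulator. This is only an illustration: it assumes the strings are already in YYYYDDD form, and it relies on starting from an empty Date vector so that c() uses the Date method instead of falling back to the default one.

```r
# same sample data as above
files <- c("2008281", "2009128", "2010040")

# start from an empty Date vector (not vector()), so c() keeps the Date class
actual_dates <- as.Date(character(0))
for (f in files) {
  actual_dates <- c(actual_dates, as.Date(f, format = "%Y%j"))
}

class(actual_dates)
# [1] "Date"
format(actual_dates, "%Y-%m-%d")
# [1] "2008-10-07" "2009-05-08" "2010-02-09"
```

Formatting to character can then be postponed until the dates are printed or written out.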

strptime returns a POSIXlt object which is actually a list like you're seeing. If you use as.POSIXct instead of strptime you'll get the result you want.
Also, all the functions you're calling are vectorized, so you don't need this append strategy. Instead, you should be able to do:
strptime(substr(files, 10 ,16), '%Y %j')
Or something along those lines.
As pointed out in the comments, as.POSIXct calls strptime under the hood.
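A slightly fuller sketch of the vectorized route, for illustration only. The file names below are made up (the question does not show them); they are chosen so that characters 10 to 16 hold the YYYYDDD string, as in the question's substr() call.

```r
# hypothetical file names; only positions 10-16 matter here
files <- c("MOD13Q1.A2008201.hdf", "MOD13Q1.A2009209.hdf")

julian <- substr(files, 10, 16)            # "2008201" "2009209"
dates  <- as.Date(julian, format = "%Y%j") # one call converts the whole vector

format(dates, "%Y-%m-%d")
# [1] "2008-07-19" "2009-07-28"
```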

Related

Why does wrapping as.Date() in sapply return a numeric data type, whereas lapply returns a date data type?

Where I work, the data set that we receive is formatted in character and so needs to be changed to their appropriate data types for any analysis in R.
But a weird thing I have noticed is that converting the columns containing dates from character to dates using as.Date within sapply converts the columns to numbers, whereas lapply converts them into the required date format.
I was just curious as to why such behaviour takes place.
Welcome to StackOverflow, and excellent question.
It is due to the result type. sapply returns a vector and the as.vector() step strips the class attribute. This is unfortunate, but documented:
R> dates <- Sys.Date() + 0:2
R> dates
[1] "2020-04-25" "2020-04-26" "2020-04-27"
R> as.vector(dates)
[1] 18377 18378 18379
R>
(And the 'number' is how dates are represented internally: number of days since the epoch aka 1970-01-01. You get the same when you do as.numeric() or as.integer() on them.)
Lists have richer semantics, and lapply(), which returns a list, does not incur the side effect seen above:
R> as.list(dates)
[[1]]
[1] "2020-04-25"
[[2]]
[1] "2020-04-26"
[[3]]
[1] "2020-04-27"
R>
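To make the difference concrete, here is a small sketch with made-up character data. It also shows two common ways to end up with Date objects again: combining the lapply() result with c(), and assigning the lapply() result back into a data frame. The column names a and b are placeholders.

```r
chr <- c("2020-04-25", "2020-04-26", "2020-04-27")

s <- sapply(chr, as.Date)   # simplified to a plain numeric vector
l <- lapply(chr, as.Date)   # list of Date objects

class(s)
# [1] "numeric"
class(l[[1]])
# [1] "Date"

# combine the list elements with c(), which keeps the Date class
restored <- do.call(c, l)

# for data frame columns, assign the lapply() result back with df[] <-
df <- data.frame(a = chr, b = chr, stringsAsFactors = FALSE)
df[] <- lapply(df, as.Date)
sapply(df, class)
```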

How to split filename strings and convert to a datetime in R

In R I'd like to split file names in the format "a_b_c_d.jpg"
For example:
20190104_080314_2048_1700.jpg
The Date: 2019.01.04 and time 08:03:14 is important to me. The other numbers (2048= pixel, 1700= filter) are not.
So I need the a and b value.
If I use strsplit I get: [1]"a" "b" "c" "d.jpg", but I want [1] a [2] b only.
And in the end I want to take the [1] date and [2] time and put them together into one value: 2019-01-04T08:03:14
Does anyone have an idea how to do this?
Thanks for helping me with programming for my astrological research about the sun activity :)
You can use a regular expression to get the pieces of the string you need.
library(stringr)
x <- '20190104_080314_2048_1700.jpg'
str_replace(x, '(^.{4})(.{2})(.{2})_(.{2})(.{2})(.{2}).*', '\\1-\\2-\\3T\\4:\\5:\\6')
#[1] "2019-01-04T08:03:14"
The expression is anchored to the start of the string, then gets the first four characters, then the next two characters, and so on. The first pair of parentheses is capture group 1 (i.e. \1), the second is group 2, etc.
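If adding a package is not desired, the same idea can be sketched with base R's sub(). This variant is an illustration rather than part of the answer above; it matches digits explicitly instead of any character, which is a small tightening of the pattern.

```r
x <- "20190104_080314_2048_1700.jpg"

# six capture groups: year, month, day, hour, minute, second
pattern <- "^([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2}).*"
iso <- sub(pattern, "\\1-\\2-\\3T\\4:\\5:\\6", x)
iso
# [1] "2019-01-04T08:03:14"
```

Note that the result is still a character string, not a date-time object.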
There are two steps here. First is to split the string, as you suggest, and second to convert those outputs to a datetime object.
Step 1:
strsplit produces a list object. To access individual parts of that list, you need to unlist() it and then call the specific elements you're after.
t <- "20190104_080314_2048_1700.jpg"
t.split <- unlist(strsplit(t, "_"))[c(1,2)]
# [1] "20190104" "080314"
Step 2:
Now you can convert these two strings to a datetime object of your choice. Using lubridate makes it pretty easy:
library(lubridate)
ymd_hms(paste(t.split[1], t.split[2]))
# [1] "2019-01-04 08:03:14 UTC"
or you can use the base R function strptime:
strptime(paste(t.split[1], t.split[2]), format="%Y%m%d %H%M%S")
# [1] "2019-01-04 08:03:14 PST"
Note the difference in the default timezones, and be sure to specify the right one (both functions take a tz= argument).
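As a sketch of the tz= point, the conversion can also be done in one vectorized step with an explicit time zone. UTC is an assumption here (the question does not say which zone the camera uses), and the second file name is invented for the example.

```r
files <- c("20190104_080314_2048_1700.jpg",
           "20190105_091500_2048_1700.jpg")  # second name is made up

stamp <- substr(files, 1, 15)                # "20190104_080314", ...
dt <- as.POSIXct(stamp, format = "%Y%m%d_%H%M%S", tz = "UTC")

# render in the requested layout only when a string is needed
format(dt, "%Y-%m-%dT%H:%M:%S")
# [1] "2019-01-04T08:03:14" "2019-01-05T09:15:00"
```

Keeping dt as POSIXct means differences and sorting work on real times, and the formatted string is produced only for display.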

Why is Date being returned as type 'double'?

I'm having some trouble working with the as.Date function in R. I have a vector of dates that I'm reading in from a .csv file that are coming in as a factor of integers or as character (depending on how I read in the file, but this doesn't seem to have anything to do with the issue), formatted as %m/%d/%Y.
I'm going through the file row by row, pulling out the date field and trying to convert it for use elsewhere using the following code:
tmpDtm <- as.Date(as.character(tempDF$myDate), "%m/%d/%Y")
This seems to give me what I want, for example, if I do this to a starting value of 12/30/2014, I get the value "2014-12-30" returned. However, if I examine this value using typeof(), R tells me that its data type is 'double'. Additionally, if I try to bind this to other values and store it in a data frame using c() or cbind(), in the data frame, it winds up being stored as 16434, which looks to me like some sort of different internal storage value of a date. I'm pretty sure that's what it is too because if I try to convert that value again using as.Date(), it throws an error asking for an origin.
So, two questions: Is this as expected? If so, is there a more appropriate way to convert a date so that I actually end up with a date-typed object?
Thank you
Dates are internally represented as double, as you can see in the following example:
> typeof(as.Date("09/12/16", "%m/%d/%y"))
[1] "double"
It is still marked as class Date, as in
> class(as.Date("09/12/16", "%m/%d/%y"))
[1] "Date"
and because it is a double, you can do computations with it. But because it is of class Date, these computations lead to Dates:
> as.Date("09/12/16", "%m/%d/%y") + 1
[1] "2016-09-13"
> as.Date("09/12/16", "%m/%d/%y") + 31
[1] "2016-10-13"
EDIT
I asked about c() and cbind() because they can be associated with some strange behaviour. See the following example, where switching the order within c() changes not the type but the class of the result:
> c(as.Date("09/12/16", "%m/%d/%y"), 1)
[1] "2016-09-12" "1970-01-02"
> c(1, as.Date("09/12/16", "%m/%d/%y"))
[1] 1 17056
> class(c(as.Date("09/12/16", "%m/%d/%y"), 1))
[1] "Date"
> class(c(1, as.Date("09/12/16", "%m/%d/%y")))
[1] "numeric"
EDIT 2 - c() and cbind force objects to be of one type. The first edit shows an anomaly of coercion, but generally, the vector must be of one shared type. cbind shares this behavior because it coerces to matrix, which in turn coerces to a single type.
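A short sketch of the practical consequence, using the date from the question. It contrasts cbind() on bare vectors, which yields a numeric matrix, with data.frame(), which keeps the column as a Date, and shows how a stripped value can be turned back into a date by supplying the origin. The column names date and value are placeholders.

```r
d <- as.Date("12/30/2014", format = "%m/%d/%Y")

# cbind() on plain vectors builds a matrix, so the Date class is lost
m <- cbind(d, 1)
is.matrix(m)
# [1] TRUE
unname(m[1, 1])
# [1] 16434

# data.frame() stores each column separately and keeps the class
df <- data.frame(date = d, value = 1)
class(df$date)
# [1] "Date"

# a bare day count can be converted back by naming the epoch
as.Date(16434, origin = "1970-01-01")
# [1] "2014-12-30"
```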
For more help on typeof and class see this link
This is as expected. You used typeof(); you probably should have used class():
R> Sys.Date()
[1] "2016-09-12"
R> typeof(Sys.Date()) # this more or less gives you how it is stored
[1] "double"
R> class(Sys.Date()) # whereas this gives you _behaviour_
[1] "Date"
R>
Minor advertisement: I have a new package anytime, currently in incoming at CRAN, which deals with this as it converts "anything" to POSIXct (via anytime()) or Date (via anydate()).
E.g.:
R> anydate("12/30/2014") # no format needed
[1] "2014-12-30"
R> anydate(as.factor("12/30/2014")) # converts from factor too
[1] "2014-12-30"
R>

Controlling how a date-time object is printed without coercing to a character?

Imagine I have a data frame in which some columns represent dates or times. When working with these columns, it is convenient to have them formatted as POSIXlt objects (or other explicitly date/time oriented class).
However, when I display these columns to the screen or print them out to a .csv, I get the full ISO8601 formatted time. I realize I can turn the times into a character vector formatted however I desire using format(col, format="%m-%Y") or whatever I have in mind, but I'm not keen on changing the class of my object just to print. Other objects in R have print methods associated with them, we don't have to explicitly coerce them. Is there some way to do that with any of the date time classes of R objects that I've overlooked?
EDIT:
Here's a minimal example of what I'd hope to achieve:
a.datetime = Sys.time()
a.datetime
Displays:
2014-06-23 09:32:12
which is the format I get out in the CSV
write.csv(data.frame(a.datetime), "example.csv")
As I describe above, I realize I can coerce this to a character with the desired format manually, e.g.:
format(a.datetime, format="%y-%m")
write.csv(data.frame(format(a.datetime, format="%y-%m")), "example.csv")
Which is not what I want to have to do; I am looking for a way for the object to know how it should be printed without the user having to both apply that formatting and coerce to a character vector as shown above. (Hopefully this clarifies what I mean by changing type: I am referring to the class of the output, not the class of the argument.)
I can try to define such a class as below, e.g. using S3 classes, but it still does not print to csv using the format specified.
class(a.datetime) <- c("myclass", class(a.datetime))
attr(a.datetime, 'fmt') <- "%y-%m"
print.myclass <- function(x) print(format(x, format=attr(x,"fmt")))
write.csv(data.frame(a.datetime), "temp.csv")
Still prints a csv with the full ISO 8601 format.
It's pretty annoying that the base R functions for writing data don't have an argument to let the user easily adjust the datetime format.
There are ways around it, though. Here's what I've done sometimes when I want to specify a format quickly and I don't need to worry about side effects:
# In bash
Rscript -e "x <- readRDS('foo.rds'); "\
-e "as.character.POSIXct <- function(x) format(x, format='%Y-%m-%d %H:%M:%S%z'); " \
-e "write.csv(x, 'foo.csv', row.names=FALSE)"
(I'm showing that in a shell command just to emphasize that you'll want the new as.character.POSIXct method to disappear after using it.)
The essence is overriding the as.character method for the POSIXct class (for arcane reasons, overriding for the parent POSIXt class won't work):
as.character.POSIXct <- function(x)
format(x, format='%Y-%m-%d %H:%M:%S%z')
It's not something that should be done in a larger codebase where the global effects might spill into code that's not expecting it, though!
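A less invasive option, sketched here as an assumption rather than a built-in feature, is a small wrapper that formats date and date-time columns on a copy of the data frame just before writing. The helper name write_csv_fmt and its fmt argument are hypothetical; the original data frame keeps its classes.

```r
# hypothetical helper: format Date/POSIXt columns only in the written copy
write_csv_fmt <- function(df, file, fmt = "%Y-%m", ...) {
  is_dt <- vapply(df, inherits, logical(1), what = c("POSIXt", "Date"))
  if (any(is_dt)) {
    df[is_dt] <- lapply(df[is_dt], format, format = fmt)
  }
  write.csv(df, file, row.names = FALSE, ...)
}

df <- data.frame(when = as.POSIXct("2014-06-23 09:32:12", tz = "UTC"), n = 1)
out <- tempfile(fileext = ".csv")
write_csv_fmt(df, out)

readLines(out)
class(df$when)   # still a date-time class in the session
```

This still coerces to character, but only inside the function, so nothing is overridden globally.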
Some code to expand on my comment. R is a functional language, so operations on a vector (and lists are actually vectors) will not change the vector but will return a processed result, and in the case of datetime objects that is usually a character vector. Here are a few views of a POSIXlt object:
x <- as.POSIXlt("2000-01-01")
x
#[1] "2000-01-01 PST"
x <- as.POSIXlt("2000-01-01 12:00:00")
x
#[1] "2000-01-01 12:00:00 PST"
str(x)
# POSIXlt[1:1], format: "2000-01-01 12:00:00"
mode(x)
#[1] "list"
x[[1]]
#[1] 0
x[[2]]
#[1] 0
x[[3]]
#[1] 12
x[[4]]
#[1] 1
unlist(x)
# sec min hour mday mon year wday yday isdst zone gmtoff
# "0" "0" "12" "1" "0" "100" "6" "0" "0" "PST" NA
mode(x[[3]])
#[1] "numeric"
x[[10]]; mode(x[[10]])
#[1] "PST"
#[1] "character"
Notice that the unlist() process converted the list to a character vector. In R only lists can have mixed modes, so the single character element in a POSIXlt object ends up coercing all of the elements that were stored as numeric values to character. As noted above, POSIXlt objects are tricky to use, and the dataframe functions generally do not behave well with them because most (well-behaved) dataframe columns are atomic vectors rather than lists.
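Because positional indexing into a POSIXlt is easy to misread, the components can also be pulled out by name. The sketch below fixes the time zone to UTC (an assumption, so the result does not depend on the machine) and converts to POSIXct for storage in a data frame.

```r
x <- as.POSIXlt("2000-01-01 12:00:00", tz = "UTC")

x$hour          # hours, 0-23
# [1] 12
x$year + 1900   # years are stored as an offset from 1900
# [1] 2000
x$mon + 1       # months are stored 0-11
# [1] 1

# an atomic representation that behaves well as a data frame column
ct <- as.POSIXct(x)
typeof(ct)
# [1] "double"
```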

convert factor to date with empty cells

I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
When you pull in the data with read.csv, or others, you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
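A self-contained sketch of that suggestion, using inline text in place of a real file. The column names id and when are invented for the example.

```r
# inline CSV standing in for a file; the first row has an empty date field
csv_text <- "id,when\n1,\n2,1992-02-13\n3,2011-03-10\n"

df <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)
df$when <- as.Date(df$when, format = "%Y-%m-%d")

df$when
# [1] NA           "1992-02-13" "2011-03-10"
```

The empty field arrives as NA, so the conversion needs no special handling.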
I usually convert factors to a POSIX* type class using the function strptime. First argument is your vector and the second argument is the "pattern" by which the date/time is constructed (a % sign + a specific letter). You basically tell R that first you have a year, then you have a -, then a month and so on. See ?strptime for a full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"

Resources