select rows by element components of timestamp - r

I have a vector made up of timestamps as POSIXlt, format: "2015-01-05 15:00:00", which I extracted from a timeframe.
I want to reassign the vector by losing all elements where Minutes != 00.
I've tried
vector <- vector[format(vector, "%M") == 00,]
which creates the following error of missing argument
Error in lapply(X = x, FUN = "[", ..., drop = drop) :
argument is missing, with no default
Also tried
vector <- vector["%M""== 00]
which seems to leave the command open (incomplete).
Since POSIX time is stored as the number of elapsed seconds since 1 Jan 1970, I guess that I could do this by excluding from my vector all elements which are not a multiple of 3600. I'd rather not use this approach though. Thank you in advance, I'm new to R.

format() returns a character vector, not a numeric one, so you should compare it to "00". Also, the comma is not needed, as there is only one dimension.
vector <- vector[format(vector, "%M") == "00"]
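
A minimal sketch of why the quoted "00" matters, using made-up timestamps. The name stamps is an assumption for illustration and does not come from the question:

```r
# Made-up sample data; `stamps` is a hypothetical name.
stamps <- as.POSIXlt(c("2015-01-05 15:00:00", "2015-01-05 15:45:00",
                       "2015-01-05 16:00:00"), tz = "UTC")

mins <- format(stamps, "%M")   # character vector of two-digit minutes
mins == 0                      # 0 is coerced to "0", so nothing matches
on_the_hour <- stamps[mins == "00"]
on_the_hour
```

The subset keeps its POSIXlt class, because only the index is built from character data.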

You could try
v2[!v2$min]
#[1] "2015-01-05 15:00:00 EST" "2015-01-05 15:00:30 EST"
Or your command should also work without the comma, once the minutes are compared to "00" rather than 00.
Data:
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
v2 <- strptime(v1, '%Y-%m-%d %H:%M:%S')

Using:
v2 <- v2[v2$min == 0]
I reassign the vector (v2), excluding all elements where the minutes are not 0.
This was suggested by @akrun.
It does the selection while keeping the data type as POSIX.
There were two issues with the first option in my initial code:
1. The function format() returns a character vector;
2. There was a "," before the last "]", which meant that the function was expecting another argument, which does not make sense for a vector, as explained by @balint.
The second option I initially submitted had a few syntax mistakes. The correct syntax is the one in this answer, as suggested by @akrun.
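
Note that $min only exists on POSIXlt objects. If the timestamps were stored as POSIXct instead (an assumption; the question uses POSIXlt), one possible sketch is to convert only for the test and index the original vector:

```r
# Assumed POSIXct input; the sample values mirror the data in the answer above.
ct <- as.POSIXct(c("2015-01-05 15:00:00", "2015-01-05 15:45:00",
                   "2015-01-05 15:00:30"), tz = "UTC")

keep <- as.POSIXlt(ct)$min == 0   # convert only to read the minute field
ct_minute_zero <- ct[keep]        # the result is still POSIXct
class(ct_minute_zero)
```

As in the answer above, this keeps 15:00:30, because only the minute field is tested, not the seconds.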

Related

error in getting the correct date using strptime in R

I'm using strptime to extract date and the result is a wrong year
Where is the error in the below code:
strptime('8/29/2013 14:13', "%m/%d/%y")
[1] "2020-08-29 PDT"
What are the other ways to extract date and time as separate columns?
The data I have is in this format - 8/29/2013 14:13
I want to split this into two columns, one is 8/29/2013 and the other is 14:13.
You have a four digit year so you need to use %Y
strptime('8/29/2013 14:13', "%m/%d/%Y" )
[1] "2013-08-29 CEST"
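
A small side-by-side sketch of the two conversion specifications. The time zone is fixed to UTC here only to make the result reproducible, which is an assumption and not part of the question:

```r
x <- "8/29/2013 14:13"

# %y consumes only two digits ("20"), so the year becomes 2020
two_digit  <- strptime(x, "%m/%d/%y", tz = "UTC")
# %Y reads the full four-digit year; %H:%M also picks up the time
four_digit <- strptime(x, "%m/%d/%Y %H:%M", tz = "UTC")

format(two_digit,  "%Y-%m-%d")
format(four_digit, "%Y-%m-%d %H:%M")
```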
Do you really want date and time in separate columns? It is usually much easier to deal with a single date-time object.
Here's one possibility to separate time and date from the string.
For convenience, we could first convert the string into a POSIX object:
datetime <- '8/29/2013 14:13'
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
Then we can use as.Date() to extract the date from this object and use format() to display it in the desired format:
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013"
To store the time separately we can use, e.g., the strftime() function:
strftime(datetime.P, '%H:%M')
#[1] "14:13"
Like format(), strftime() is vectorized, so if datetime is a vector containing several character strings with date and time in the format described in the OP, the same calls work on the whole vector without a loop.
Example
datetime <- c('8/29/2013 14:13', '9/15/2014 12:03')
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013" "09/15/2014"
strftime(datetime.P, '%H:%M')
#[1] "14:13" "12:03"
Hope this helps.
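
For the two-column layout asked about in the question, a possible sketch follows. The names raw, parsed and out are made up for illustration, and the time zone is pinned to UTC as an assumption so that the date cannot shift:

```r
raw <- c("8/29/2013 14:13", "9/15/2014 12:03")
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Two character columns derived from the same parsed date-time
out <- data.frame(
  date = format(parsed, "%m/%d/%Y"),
  time = format(parsed, "%H:%M"),
  stringsAsFactors = FALSE
)
out
```

Both columns are plain character vectors, so keeping parsed around is probably wise if any date arithmetic is needed later.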

How can I append dates to a vector in R?

I created a vector using the vector() function:
actual_dates_vector <- vector()
I then extract the Julian date (eg: 2008201) from a text string:
julian_date<-substr(files[r],10,16)
I then convert the Julian date into YYYY-MM-DD format:
actual_date<-strptime(julian_date, "%Y %j")
This gives me a value like "2009-07-28". I then need to append this to the vector initially created. For which I do this:
actual_dates_vector<-c(actual_dates_vector,actual_date)
But this gives me:
$sec
[1] 0
$min
[1] 0
$hour
[1] 0
$mday
[1] 28
$mon
[1] 6
$year
[1] 109
$wday
[1] 2
$yday
[1] 208
$isdst
[1] 1
I don't understand what's going on. This code actually runs in a loop over multiple dates, so I want the date to be extracted from each date string, converted to YYYY-MM-DD format and appended to the vector. Is there a way to do this?
Thanks.
If you prefer a "loop & append" approach, you can do as follows:
# random data to emulate your files
files <- c("2008281","2009128","2010040")
n_files <- length(files)
# loop & append
actual_dates_vector <- vector()
for(r in 1:n_files){
  dts <- as.POSIXct(files[r],format="%Y%j")
  # convert dts (POSIXct class objects) to character with the desired format
  dts <- format(dts,format="%Y-%m-%d")
  actual_dates_vector <- c(actual_dates_vector,dts)
}
Date-time objects are actually something else under the hood. As you have seen, POSIXlt objects are lists of the date components, while POSIXct objects are basically doubles, so they are not what you see when you print them (the printed format also depends on the local settings, so you can get different results on different machines).
For this reason, since you stated you want a specific representation of the dates (namely YYYY-MM-DD), I suggest you follow the described approach and store the result in a character vector with the desired format.
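
To make the "under the hood" point concrete, here is a short sketch that strips the classes and looks at the raw storage. The date is the one from the question; UTC is an assumption to keep the numbers reproducible:

```r
lt <- as.POSIXlt("2009-07-28", tz = "UTC")
ct <- as.POSIXct(lt)

is.list(unclass(lt))      # POSIXlt is a list of components
names(unclass(lt))[1:6]   # sec, min, hour, mday, mon, year
typeof(ct)                # POSIXct is stored as a double
as.numeric(ct)            # seconds since 1970-01-01 00:00:00 UTC
```

This is also why c(vector(), actual_date) printed $sec, $min and so on: c() dispatches on its first argument, an empty logical vector, so the POSIXlt object is combined as a plain list.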
strptime returns a POSIXlt object which is actually a list like you're seeing. If you use as.POSIXct instead of strptime you'll get the result you want.
Also, all the functions you're calling are vectorized so you don't need to do this append strategy, instead you should be able to:
strptime(substr(files, 10 ,16), '%Y %j')
Or something along those lines.
As pointed out in the comments, as.POSIXct calls strptime under the hood.
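
A vectorized sketch without any loop, assuming the year and day-of-year strings have already been cut out of the file names (the files vector below is the made-up sample from the answer above). as.Date() is used here instead of as.POSIXct() as a swapped-in alternative, since only the calendar date is needed:

```r
files <- c("2008281", "2009128", "2010040")   # year followed by day of year

dates <- as.Date(files, format = "%Y%j")      # one call converts the whole vector
actual_dates_vector <- format(dates, "%Y-%m-%d")
actual_dates_vector
```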

Controlling how a date-time object is printed without coercing to a character?

Imagine I have a data frame in which some columns represent dates or times. When working with these columns, it is convenient to have them formatted as POSIXlt objects (or other explicitly date/time oriented class).
However, when I display these columns to the screen or print them out to a .csv, I get the full ISO8601 formatted time. I realize I can turn the times into a character vector formatted however I desire using format(col, format="%m-%Y") or whatever I have in mind, but I'm not keen on changing the class of my object just to print. Other objects in R have print methods associated with them, we don't have to explicitly coerce them. Is there some way to do that with any of the date time classes of R objects that I've overlooked?
EDIT:
Here's a minimal example of what I'd hope to achieve:
a.datetime = Sys.time()
a.datetime
Displays:
2014-06-23 09:32:12
which is the format I get out in the CSV
write.csv(data.frame(a.datetime), "example.csv")
As I describe above, I realize I can coerce this to a character with the desired format manually, e.g.:
format(a.datetime, format="%y-%m")
write.csv(data.frame(format(a.datetime, format="%y-%m")), "example.csv")
Which is not what I want to have to do; I am looking for a way for the object to know how it should be printed without the user having to both apply that formatting and coerce to a character vector as shown above. (Hopefully this clarifies what I mean by changing type, I am referring the class of the output, not the class of the argument).
I can try to define such a class as below, e.g. using S3 classes, but it still does not print to csv using the format specified.
class(a.datetime) <- c("myclass", class(a.datetime))
attr(a.datetime, 'fmt') <- "%y-%m"
print.myclass <- function(x) print(format(x, format=attr(x,"fmt")))
write.csv(data.frame(a.datetime), "temp.csv")
Still prints a csv with the full ISO 8601 format.
It's pretty annoying that the base R functions for writing data don't have an argument to let the user easily adjust the datetime format.
There are ways around it, though. Here's what I've done sometimes when I want to specify a format quickly and I don't need to worry about side effects:
# In bash
Rscript -e "x <- readRDS('foo.rds'); "\
-e "as.character.POSIXct <- function(x) format(x, format='%Y-%m-%d %H:%M:%S%z'); " \
-e "write.csv(x, 'foo.csv', row.names=FALSE)"
(I'm showing that in a shell command just to emphasize that you'll want the new as.character.POSIXct method to disappear after using it.)
The essence is overriding the as.character method for the POSIXct class (for arcane reasons, overriding for the parent POSIXt class won't work):
as.character.POSIXct <- function(x)
format(x, format='%Y-%m-%d %H:%M:%S%z')
It's not something that should be done in a larger codebase where the global effects might spill into code that's not expecting it, though!
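
A less invasive variant of the same idea, sketched under the assumption that formatting a copy right before writing is acceptable. write_csv_formatted is a hypothetical helper, not a base R function, and the column names are made up:

```r
# Hypothetical helper: format date-time columns in a local copy, then write it.
write_csv_formatted <- function(df, file, fmt = "%Y-%m", ...) {
  is_dt <- vapply(df, inherits, logical(1), what = "POSIXt")
  df[is_dt] <- lapply(df[is_dt], format, format = fmt)
  write.csv(df, file, row.names = FALSE, ...)
}

df <- data.frame(when = as.POSIXct("2014-06-23 09:32:12", tz = "UTC"), value = 1)
out <- tempfile(fileext = ".csv")
write_csv_formatted(df, out)

readLines(out)    # the file holds the short format
class(df$when)    # the original column is untouched
```

Because the modification happens on the function's own copy of the data frame, nothing is overridden globally and the caller's object keeps its date-time class.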
Some code to expand on my comment. R is a functional language, so operations on a vector (and lists are actually vectors) will not change the vector but will return a processed result, and in the case of datetime objects that is usually a character vector. Here are a few views of a POSIXlt object:
x <- as.POSIXlt("2000-01-01")
x
#[1] "2000-01-01 PST"
x <- as.POSIXlt("2000-01-01 12:00:00")
x
#[1] "2000-01-01 12:00:00 PST"
str(x)
# POSIXlt[1:1], format: "2000-01-01 12:00:00"
mode(x)
#[1] "list"
x[[1]]
#[1] 0
x[[2]]
#[1] 0
x[[3]]
#[1] 12
x[[4]]
#[1] 1
unlist(x)
# sec min hour mday mon year wday yday isdst zone gmtoff
# "0" "0" "12" "1" "0" "100" "6" "0" "0" "PST" NA
mode(x[[3]])
#[1] "numeric"
x[[10]]; mode(x[[10]])
#[1] "PST"
#[1] "character"
Notice that the unlist() process converted the list to a character vector. In R only lists can have mixed modes, so the single character element in a POSIXlt object will end up coercing all of the elements that were stored as numeric values to character elements. As noted above, POSIXlt objects are tricky to use, and the dataframe functions generally do not behave well with them because most (well-behaved) dataframe columns are atomic vectors rather than lists.
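
If individual components are needed, accessing them by name avoids the coercion that unlist() causes. A small sketch, with UTC assumed in place of the PST shown above:

```r
x <- as.POSIXlt("2000-01-01 12:00:00", tz = "UTC")

x$hour          # hour of the day
x$year + 1900   # year is stored as years since 1900
x$mon + 1       # months are stored as 0-11

# The components keep their own types as long as they stay in the list
sapply(unclass(x)[c("sec", "min", "hour", "zone")], class)
```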

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it was a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
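
A sketch of what trunc() does here, using a stand-in timestamp instead of a live getQuote() call (which needs a network connection). The value and the UTC time zone are assumptions:

```r
x <- as.POSIXct("2013-01-16 02:52:00", tz = "UTC")   # stand-in for tt[, 1]

d <- trunc(x, units = "days")   # drops the time of day, returns POSIXlt
class(d)
format(as.POSIXct(d), "%Y-%m-%d %H:%M:%S")
```

The result is still a date-time at midnight rather than a pure date, which may or may not be what is wanted.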
Using strptime (see ?strptime):
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
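
The time zone caveat can be illustrated with as.Date() itself. In this sketch the timestamp is made up, and passing tz explicitly is a suggestion rather than part of the answer above:

```r
# Half past midnight in Central European Time (UTC+1 in January)
x <- as.POSIXct("2013-01-16 00:30:00", tz = "CET")

as.Date(x, tz = "CET")   # calendar date as seen in CET
as.Date(x, tz = "UTC")   # the same instant is still the previous day in UTC
```

Stating tz explicitly keeps the result independent of the machine the script happens to run on.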

convert factor to date with empty cells

I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
When you pull in the data with read.csv, or others, you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
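
A self-contained sketch of the na.strings approach, reading from an inline string instead of a file. The column names id and date are made up for illustration:

```r
csv_text <- "id,date\n1,\n2,1992-02-13\n3,2011-03-10\n"

# Empty fields become NA on import instead of ""
df <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)
df$date <- as.Date(df$date, format = "%Y-%m-%d")
df
```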
I usually convert factors to a POSIX* type class using the function strptime. First argument is your vector and the second argument is the "pattern" by which the date/time is constructed (a % sign + a specific letter). You basically tell R that first you have a year, then you have a -, then a month and so on. See ?strptime for a full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
