R: date-time variable loses its format after ifelse [duplicate]

This question already has answers here:
How to prevent ifelse() from turning Date objects into numeric objects
I have a variable in proper POSIXct format, converted with ymd_hms(DateTime) from lubridate.
However, after the following transformation the variable loses its POSIXct format:
daily$DateTime <- ifelse(daily$ID %in% "r1_1" | daily$ID %in% "r1_2",
                         NA, daily$DateTime)
I tried to convert the variable back to POSIXct with lubridate, but it does not seem to like the NAs. In addition, DateTime is now numeric (e.g. 1377419400), which lubridate does not recognise as a date-time.
How can I set DateTime to NA when ID is r1_1 or r1_2 without losing the POSIXct format?
Thanks

The following should work:
daily <- data.frame(DateTime = seq(Sys.time(), length.out=10, by=1000), ID=rep(1:2,5))
daily$DateTime[daily$ID%in%2]<-NA
(The solution with is.na<- is fine too. I just find the idea of "setting" is.na a little counterintuitive, but that is no problem as long as the code doesn't get too complicated.)
ifelse() does implicit conversions: its result takes its attributes, including the class, from the logical test rather than from yes/no, so a Date or POSIXct class is not preserved and only the underlying numbers remain.
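A minimal sketch of that behaviour, using made-up timestamps and IDs rather than the asker's data (the names ts, ids and to_drop are illustrative only): ifelse() hands back bare numbers, whereas base R's replace(), which assigns through `[<-`, keeps the POSIXct class.

```r
# Hypothetical example data: three POSIXct timestamps and matching IDs
ts  <- as.POSIXct(c("2013-08-25 08:30:00", "2013-08-25 09:00:00",
                    "2013-08-25 09:30:00"), tz = "UTC")
ids <- c("r1_1", "r2_1", "r1_2")
to_drop <- ids %in% c("r1_1", "r1_2")

# ifelse() takes its attributes from the logical test, so the class is lost
bad <- ifelse(to_drop, NA, ts)
class(bad)
# [1] "numeric"

# replace() assigns through `[<-`, which keeps the POSIXct class
good <- replace(ts, to_drop, NA)
class(good)
# [1] "POSIXct" "POSIXt"
```

replace() is just one option; the subset assignment above and the is.na<- approach rely on the same `[<-` method for POSIXct.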

The idiomatic way to set NA values is to use is.na<-; most classes (including Dates) will be handled appropriately:
is.na(daily$DateTime) <- daily$ID %in% c('r1_1', 'r1_2')
This should do the trick.
Using the example from ?as.POSIXct:
## SPSS dates (R-help 2006-02-16)
z <- c(10485849600, 10477641600, 10561104000, 10562745600)
zz <- as.POSIXct(z, origin = "1582-10-14", tz = "GMT")
is.na(zz) <- c(FALSE, TRUE, FALSE, FALSE)
zz
# [1] "1915-01-26 GMT" NA "1917-06-15 GMT" "1917-07-04 GMT"
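Applied to a small hypothetical version of the question's daily data frame (the IDs and times below are invented for illustration), the is.na<- approach looks like this:

```r
# Hypothetical stand-in for the question's `daily` data frame
daily <- data.frame(
  ID = c("r1_1", "r2_1", "r1_2", "r2_2"),
  DateTime = as.POSIXct(c("2013-08-25 08:30:00", "2013-08-25 09:00:00",
                          "2013-08-25 09:30:00", "2013-08-25 10:00:00"),
                        tz = "UTC")
)

# Set DateTime to NA for the two IDs; the column stays POSIXct
is.na(daily$DateTime) <- daily$ID %in% c("r1_1", "r1_2")

class(daily$DateTime)
# [1] "POSIXct" "POSIXt"
daily$DateTime
```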

Related

Convert YYYYMM factor format to YYYY-MM format

I have data in YYYYMM format and I wish to convert it to YYYY-MM format.
Example: 201805 should become 2018-05.
How can I do this?
We can use as.yearmon from zoo to convert it to a yearmon object and then format it:
library(zoo)
format(as.yearmon(as.character(v1), "%Y%m"), "%Y-%m")
#[1] "2018-05"
data
v1 <- 201805
I like the idea of using actual dates here. If the day component does not matter to you, you can arbitrarily set each of your dates to the first of the month. Then we can leverage R's date functions to do the heavy lifting.
x <- "201805"
x <- paste0(x, "01")
x
# [1] "20180501"
y <- format(as.Date(x, format = "%Y%m%d"), "%Y-%m-%d")
substr(y, 1, 7)
# [1] "2018-05"
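A vectorised sketch of the same date-based idea, assuming the input may be a factor (as in the question title) and using invented values. A side benefit of parsing into a real date is that an impossible month such as 13 becomes NA instead of passing through silently.

```r
# Hypothetical input: a factor of YYYYMM codes, one of them invalid (month 13)
v <- factor(c("201805", "201912", "201813"))

# as.character() makes the use of the factor labels explicit;
# "01" supplies an arbitrary day so each string parses as a full date
d <- as.Date(paste0(as.character(v), "01"), format = "%Y%m%d")

res <- format(d, "%Y-%m")
res
# [1] "2018-05" "2019-12" NA
```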
You could use regular expressions:
data <- "201805"
sub("(\\d{4})", "\\1-", data)
[1] "2018-05"
Another variant, using only lookarounds:
sub("(?<=\\d{4})(?=\\d{2})", "-", data, perl=TRUE)
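If some values might not be plain six-digit codes, anchoring the pattern is one option. This sketch uses invented input; note that, unlike the date-based approach, it only rearranges characters and would turn "201813" into "2018-13" without complaint.

```r
# Hypothetical input; sub() coerces a factor to its character labels
v <- factor(c("201805", "201912", "2018-05"))

# Only strings that are exactly six digits are rewritten; anything else
# (here the already formatted "2018-05") is returned unchanged
res <- sub("^(\\d{4})(\\d{2})$", "\\1-\\2", v)
res
# [1] "2018-05" "2019-12" "2018-05"
```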
How about the following (assuming the OP does not need to validate the value first):
val="201805"
sub("(..$)","-\\1",val)
Or, to substitute only when the last two characters are digits:
val="201805"
sub("(\\d{2}$)","-\\1",val)
[1] "2018-05"
Very similar to some of the others, but because I find the package useful I will mention it:
library(lubridate)
date <- "201805"
format(ymd(paste0(date,"01")), "%Y-%m")
Lubridate can make life easy if the formats start to vary.
Here is another option, albeit a longer one:
somestring <- "201805"
stringi::stri_sub(somestring, 1, 4) <- "-"    # replaces "2018" with "-", giving "-05"
somestring1 <- "201805"
somestring2 <- substring(somestring1, 1, 4)   # "2018"
paste0(somestring2, somestring)
Result:
"2018-05"

How to lag dates in form of strings in R

I have the following vector of dates stored as strings:
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
As an example, I would like to lag this vector by one month, producing:
d <- c("01/08/1991","01/09/1991","01/10/1991","01/11/1991")
My data is much larger and I must impose higher lags as well, but this seems to be the basis I need to know.
I would like the result to be in the same format again ("%d/%m/%Y"). How can this be done in R? I found a couple of packages (e.g. lubridate), but I always have to convert between formats (strings, dates and more), so it's a bit messy and seems error-prone.
Edit: some more context on why I want to do this: I use this vector as the row names of a matrix, so I would prefer a solution where the final result is a string vector again.
This does not use any packages. We convert to "POSIXlt" class, subtract one from the month component and convert back:
fmt <- "%d/%m/%Y"
lt <- as.POSIXlt(d, format = fmt)
lt$mon <- lt$mon - 1
format(lt, format = fmt)
## [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
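Since the asker also needs larger lags, the same POSIXlt idea can be wrapped in a small function. lag_months is a hypothetical helper, not part of base R; it assumes every input uses the "%d/%m/%Y" format and parses in UTC to sidestep daylight-saving issues.

```r
# Hypothetical helper: shift "dd/mm/yyyy" strings by k months (negative = back)
lag_months <- function(x, k, fmt = "%d/%m/%Y") {
  lt <- as.POSIXlt(x, format = fmt, tz = "UTC")
  lt$mon <- lt$mon + k   # out-of-range months are normalised when formatting
  format(lt, format = fmt)
}

d <- c("01/09/1991", "01/10/1991", "01/11/1991", "01/12/1991")
lag_months(d, -1)
# [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
lag_months(d, -12)
# [1] "01/09/1990" "01/10/1990" "01/11/1990" "01/12/1990"
```

Because out-of-range fields are normalised, a day that does not exist in the target month rolls forward (for example, 31/03/1991 lagged by one month becomes 03/03/1991). That is harmless for first-of-month dates like these but worth checking for other data.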
My solution uses lubridate, but it returns what you want in the specified format:
library(lubridate)
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
format(as.Date(d,format="%d/%m/%Y")-months(1),'%d/%m/%Y')
[1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
You can then change the lag and, if you want, the output format (the '%d/%m/%Y' part) as needed.

select rows by element components of timestamp

I have a vector of POSIXlt timestamps in the format "2015-01-05 15:00:00", which I extracted from a timeframe.
I want to reassign the vector, dropping all elements whose minutes are not 00.
I've tried
vector <- vector[format(vector, "%M") == 00,]
which gives the following "argument is missing" error:
Error in lapply(X = x, FUN = "[", ..., drop = drop) :
argument is missing, with no default
I also tried
vector <- vector["%M""== 00]
which seems to leave the command open (R waits for more input).
Since POSIX time is stored as the number of seconds elapsed since 1 Jan 1970, I guess I could do this by excluding all elements that are not a multiple of 3600. I'd rather not use this approach, though. Thank you in advance; I'm new to R.
format() returns a character vector, not a number, so you should compare it with "00". Also, the comma is not needed, as a vector has only one dimension.
vector <- vector[format(vector, "%M") == "00"]
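A self-contained sketch of this answer with invented timestamps (the asker's full vector isn't shown). It uses POSIXct; the same indexing works for POSIXlt. As the 15:00:30 entry shows, testing only "%M" also keeps times with non-zero seconds, so a stricter test on "%M:%S" is included as an option.

```r
# Invented timestamps; the third one has minutes 00 but seconds 30
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
ct <- as.POSIXct(v1, tz = "UTC")

# format() returns character, so compare with the string "00";
# a vector has one dimension, so no comma inside the brackets
on_the_hour <- ct[format(ct, "%M") == "00"]
length(on_the_hour)
# [1] 2
class(on_the_hour)
# [1] "POSIXct" "POSIXt"

# Stricter option: require both minutes and seconds to be zero
exact <- ct[format(ct, "%M:%S") == "00:00"]
length(exact)
# [1] 1
```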
You could try
v2[!v2$min]
#[1] "2015-01-05 15:00:00 EST" "2015-01-05 15:00:30 EST"
Or your command should also work without the comma.
data
v1 <- c("2015-01-05 15:00:00", "2015-01-05 15:45:00", "2015-01-05 15:00:30")
v2 <- strptime(v1, '%Y-%m-%d %H:%M:%S')
Using:
v2 <- v2[v2$min == 0]
I reassign v2, excluding all elements whose minutes are not 0.
This was suggested by @akrun.
It does the selection while keeping the POSIX data type.
There were two issues with the first option in my initial code:
1. format() returns character;
2. there was a "," before the last "]", which meant the function was expecting another argument, which makes no sense for a vector, as explained by @balint.
The second option I initially submitted had a few syntax mistakes. The correct syntax is the one in this answer, as suggested by @akrun.

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want "YYYY-MM-DD" in the first column. I know that if it were a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
Using strptime (see ?strptime):
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
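Note that month() and day() are not zero-padded, so this gives e.g. "2013-1-16" rather than "2013-01-16". There might be a more elegant way of printing the date, but this is what came to me first.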
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt <- getQuote("AAPL"), as in the other answers)
The blank before the comma means "all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which is presumably what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a time zone, and that is going to bite you the first time you run your script in a different time zone. (Aside: I got some regressions in a unit test suite when I ran it in the U.K. over Christmas, due to a subtle time zone assumption in my test code.)
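Because getQuote() needs a network connection, this sketch uses a hypothetical data frame with a "Trade Time" column (and invented prices) in place of its result. Passing tz to as.Date() explicitly is one way to address the time zone concern above: with tz = "UTC" instead, the 00:30 Paris entry would come out as 2013-01-15.

```r
# Hypothetical stand-in for getQuote("AAPL"): a POSIXct "Trade Time" column
tt <- data.frame(
  `Trade Time` = as.POSIXct(c("2013-01-16 02:52:00", "2013-01-16 00:30:00"),
                            tz = "Europe/Paris"),
  Last = c(500.1, 501.2),   # invented prices, for illustration only
  check.names = FALSE
)

# Convert to Date using the quote's own time zone rather than relying on
# the default tz of as.Date(); the data frame structure is unchanged
tt[["Trade Time"]] <- as.Date(tt[["Trade Time"]], tz = "Europe/Paris")

class(tt[["Trade Time"]])
# [1] "Date"
tt
```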

convert factor to date with empty cells

I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
When you read the data in with read.csv (or similar functions), you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
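A minimal sketch of the read.csv route, reading invented CSV text through the text argument instead of a file; the column name when is made up for illustration.

```r
# Hypothetical CSV content mirroring the question's values (empty cells included)
csv <- "id,when
1,
2,1992-02-13
3,2011-03-10
4,
5,1998-11-30"

# na.strings = "" turns the empty cells into NA while reading
df <- read.csv(text = csv, na.strings = "")

# as.Date() skips leading NAs when guessing the format
df$when <- as.Date(df$when)
df$when
df$when[3] - df$when[2]
```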
I usually convert factors to a POSIX* class using the function strptime. The first argument is your vector and the second is the "pattern" by which the date/time is constructed (a % sign plus a specific letter). You basically tell R that first you have a year, then a -, then a month and so on. See ?strptime for the full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
