The following vector of Dates is given in form of a string sequence:
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
I would like to exemplary lag this vector by 1 month, that means to produce the following structure:
d <- c("01/08/1991","01/09/1991","01/10/1991","01/11/1991")
My data is much larger and I must impose higher lags as well, but this seems to be the basis I need to know.
By doing this, I would like to have the same format in the end again:("%d/%m/%Y). How can this be done in R? I found a couple of packages (e.g. lubridate), but I always have to convert between formats (strings, dates and more) so it's a bit messy and seems prone to mistake.
edit: some more info on why I want to do this: I am using this vector as rownames of a matrix, so I would prefer a solution where the final outcome is a string vector again.
This does not use any packages. We convert to "POSIXlt" class, subtract one from the month component and convert back:
fmt <- "%d/%m/%Y"
lt <- as.POSIXlt(d, format = fmt)
lt$mon <- lt$mon - 1
format(lt, format = fmt)
## [1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
My solution uses lubridatebut it does return what you want in the specified format:
require(lubridate)
d <- c("01/09/1991","01/10/1991","01/11/1991","01/12/1991")
format(as.Date(d,format="%d/%m/%Y")-months(1),'%d/%m/%Y')
[1] "01/08/1991" "01/09/1991" "01/10/1991" "01/11/1991"
You can then change the lag and (if you want) the output (which is this part : '%d/%m/%Y') by specifying what you want.
Related
If I have multiple spreadsheets with date column that have different formats, is it possible to convert the dates if some of the rows have the numeric format?
If I import the columns as character, I would still need to convert to date. I believe parse_date_time will fix anything accept the number format.
The following will convert the first two but not the numeric version. I don't think this function has a numeric function.
Is there a function that can process both Text and Numeric dates?
x<- c("2019-12-05","8-Dec-19","43787")
lubridate::parse_date_time(x, c("ymd", "d-b-y"))
It's a bit clunky, but you can take a second pass through the data with janitor::excel_numeric_to_date() for any values that are numeric and failed to parse via parse_date_time ... (if you're going to use this often, you can write a wrapper function - you might also want to suppress the warning messages from parse_date_time and as.numeric ...)
x <- c("2019-12-05","8-Dec-19","43787")
y <- lubridate::parse_date_time(x, c("ymd", "d-b-y"))
exceld <- is.na(y) & !is.na(as.numeric(x))
y[exceld] <- janitor::excel_numeric_to_date(as.numeric(x[exceld]))
Perhaps there is a better approach, but I was able to create a function that handles both the numeric and text versions of the date format with the tryFormats.
dfix(c("2019-12-05","8-Dec-19","43787"))
[1] "2019-12-05" "2019-12-08" "2019-11-18"
dfix<-function(x1){
dout<-c()
for (x in x1){
if(grepl('\\d{5}',x)){ #Check for numeric date (5 digits)
n<-as.numeric(x)
d<-as.Date(n, origin = "1899-12-30")
}
else if (grepl('-',x)){ # TryFormats for dates with "-"
d<-as.Date(x,tryFormats = c("%Y-%m-%d","%d-%b-%y"))
}
dout<-c(dout,d)
}
return (as.Date(dout,origin = '1970-01-01'))
}
I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:
"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX" (ie, a five-digit Excel-style date)
Let's say I treat the column as a vector.
dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
"1981-02-01", "30621", "01-05-1986")
I can see that I probably need a solution that uses both parse_date_time and as.Date.
If I use parse_date_time:
dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))
This fixes them all, except the five-digit one, which returns NA.
I can fix the five-digit one, by using as.integer and as.Date:
dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")
Ideally I would run one and then the other, but because each returns NA on the strings that don't work I can't do that.
Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!
We create a logical index after the first run based on the NA values and use that to index for the second run
i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")
I have a question which I think is solved by regex use in R.
I have a set of dates (as chr) which I would like in a different format (as chr).
I have tried to fool around with the below examples where the first (new_dates) gives the right format for months 1-9 and wrong for 10-12 and (new_dates2) gives the right format for 10-12 but nothing for 1-9.
I see that the code in the first case matches a single digit twice for 10-12, but don't really know how to tell it to match only single digit.
The final vector of correct dates shows the result I would like.
dates <- c("1/2016", "2/2016", "3/2016", "4/2016", "5/2016", "6/2016", "7/2016", "8/2016", "9/2016", "10/2016", "11/2016", "12/2016", "1/2017")
new_dates <- sub("(\\d)[:/:](\\d{4})","\\2M0\\1", dates)
new_dates2 <- sub("(\\d{2})[:/:](\\d{4})","\\2M\\1", dates)
correctdates <- c("2016M01", "2016M02", "2016M03", "2016M04", "2016M05", "2016M06", "2016M07", "2016M08", "2016M09", "2016M10", "2016M11", "2016M12", "2017M1")
Here's a base R method that will return the desired format:
format(as.Date(paste0("1/",dates), "%d/%m/%Y"), "%YM%m")
[1] "2016M01" "2016M02" "2016M03" "2016M04" "2016M05" "2016M06" "2016M07" "2016M08" "2016M09"
[10] "2016M10" "2016M11" "2016M12" "2017M01"
The idea is to first convert to a Date object and then use the format function to create the desired character representation. I pasted on 1/ so that a day is present in each element.
As #a p o m said it might be better to look for another solution if you are manipulating dates but if you want to stick with regular expressions you can try this one.
([02-9]|1[0-2]?)[:\/](\d{4}) example
new_dates <- sub("(\\d{1,2})\\/(\\d{4})","\\2M0\\1", dates)
It's fine.
thanks for your help in advance. i am working with the getQuote function in the quantmod package, which returns the following data frame:
is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? i just want the "YYYY-MM-DD" in the first column. i know that if it was a vector of dates, i would use substr(df[,1],1,10). i have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
using ?strptime
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of POSIXct, but you need to convert the tt[,1] to character before.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer give you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
When you pull in the data with read.csv, or others, you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
I usually convert factors to a POSIX* type class using the function strptime. First argument is your vector and the second argument is the "pattern" by which the date/time is constructed (a % sign + a specific letter). You basically tell R that first you have a year, then you have a -, then a month and so on. See ?strptime for a full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"