How to convert a character column to date in R
My column is in the format "20110314" (without the quotes), where the first 4 characters refer to the year, the next two to the month and the last two to the day.
You can use lubridate: https://github.com/tidyverse/lubridate
Date-time data can be frustrating to work with in R. R commands for date-times are generally unintuitive and change depending on the type of date-time object being used. Moreover, the methods we use with date-times must be robust to time zones, leap days, daylight savings times, and other time related quirks, and R lacks these capabilities in some situations. Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not.
Installation:
# The easiest way to get lubridate is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just lubridate:
install.packages("lubridate")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/lubridate")
Usage:
ymd(20110314)
#> [1] "2011-03-14"
strptime(as.character(20110314),"%Y%m%d")
[1] "2011-03-14 PDT"
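If adding a package is not an option, base R can parse the same layout with as.Date and an explicit format. A minimal sketch, assuming the column is a character vector (the name x and the second value are made up for illustration):

```r
# Hypothetical character column in YYYYMMDD layout
x <- c("20110314", "20121225")

# %Y = 4-digit year, %m = 2-digit month, %d = 2-digit day
d <- as.Date(x, format = "%Y%m%d")

class(d)   # the result is of class Date, not character
format(d)  # ISO layout, e.g. "2011-03-14"
```

If the column was read in as numeric, wrapping it in as.character() first, as the strptime call above does, keeps the format string applicable.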
Related
I tried using the ymd() function on the column reign_start, but it gives this warning:
Warning: All formats failed to parse. No formats found.
[1] NA
The years are represented in B.C.
ymd() is for converting character representations of dates into variables of class Date with format YYYY-MM-DD. Your dates are already of class Date.
The lubridate function you want is year(). I'm not sure how that handles BC dates but I assume you would like 0026-01-16 to return 26.
So try year(reign_start). Here are some examples which do require ymd() first (your data doesn't).
library(lubridate)
year(ymd("0026-01-16"))
[1] 26
# do we set year negative for BC? also works
year(ymd("-0026-01-16"))
[1] 26
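For a column that is already of class Date, the year can also be extracted in base R without lubridate. A minimal sketch, where reign_start is stand-in data rather than the asker's actual column; it does not address how BC years should be represented:

```r
# Stand-in for a column that is already of class Date
reign_start <- as.Date(c("0026-01-16", "0014-09-18"))

# POSIXlt stores the year as an offset from 1900
yr <- as.POSIXlt(reign_start)$year + 1900
yr
```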
How can I set R to count months instead of days when converting integers to dates?
After reading several threads on how to convert dates in R, it seems that nobody has asked how to convert numeric dates when the numbers come from a monthly time series. E.g. 552 represents January 2006.
I have tried several things, such as as.Date(dates,origin="1899-12-01"), but I recognize that R counts days instead of months. Thus, for year-month number 552 above, the code yields "1901-06-06" instead of the correct 2006-01-01.
Sidenote: I also want the format to be YEARmonth. Does R allow displaying dates without days?
I think your starting date should be '1960-01-01'.
In any case, you can solve this problem using the lubridate package:
start from a date and add months to it.
library(lubridate)
as.Date('1960-01-01') %m+% months(552)
This gives you:
[1] "2006-01-01"
You can display only the year and month of a date, but in that case R coerces the date to a character string:
format(as.Date('2006-01-01'), "%Y-%m")
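The same month arithmetic can be done in base R for a whole vector at once. A rough sketch, assuming (as the answer above does) that month 0 corresponds to January 1960; months_to_date is a hypothetical helper name, not a function from any package:

```r
# Convert a count of months since January of origin_year
# to a Date on the first day of that month.
months_to_date <- function(n, origin_year = 1960L) {
  n <- as.integer(n)
  year  <- origin_year + n %/% 12L  # whole years elapsed
  month <- n %% 12L + 1L            # remaining months, 1-based
  as.Date(sprintf("%04d-%02d-01", year, month))
}

d <- months_to_date(c(0, 552, 563))
d

# YEARmonth display; the result is character, not Date
ym <- format(d, "%Y%m")
ym
```

Keeping the Date column for computation and calling format() only for display avoids losing the Date class.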
I have a string like 1/1/-2150. How can I make a Date from that in R?
lubridate returns:
library(lubridate)
dmy("1/1/-2150")
[1] "2150-01-01"
as.Date("1/1/-2150",format="%d/%m/%Y")
[1] NA
Right now 1/1/-2150 has class character. I need the same value but with class Date.
Thanks
UPDATE
Something like this, but using lubridate if possible:
minus=as.numeric(dmy("1/1/-2150"))
x<-as.numeric(ymd("0000-1-1"))
dt=as.Date(x*2-minus,origin="1970-01-01")+days(1)
str(dt)
Date[1:1], format: "-2150-01-01"
We need to specify the - in the format
as.Date("1/1/-2150",format="%d/%m/-%Y")
#[1] "2150-01-01"
Unfortunately, there aren't any really large R packages tackling this issue (maybe no one has asked). The gregorian package should, however, be able to handle your BCE needs.
gregorian::as_gregorian("-2150-1-1")
[1] "Tuesday January 1, 2151 BCE"
I have 2 Date variables in a .csv file with formats of "07-JUL-16 06.05.54.000000 AM". I want to use these in a regression model. Should I be reading these into a data frame as factors or characters? How can I take a difference of the 2 dates in each case?
Read them in as characters (e.g. stringsAsFactors=FALSE or tidyverse functions), then use as.POSIXct, e.g.
as.POSIXct("07-JUL-16 06.05.54.000000 AM",format="%d-%b-%y %I.%M.%OS %p")
## [1] "2016-07-07 06:05:54 EDT"
(I'm assuming that you are intending a day-month-year format rather than a month-day-year format -- but actually I don't have any evidence to support that thought!)
Once you've done this, subtracting the values should just work (giving you an object of class difftime) -- but be careful with units when converting to numeric!
For what it's worth, lubridate::ymd_hms thinks it can guess the format, but guesses wrong (?? assuming I guessed right above: with a two-digit year, and without any year values greater than 31, there's really nothing to distinguish years and days ...)
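To make the point about units concrete, here is a small sketch of taking the difference. The second timestamp is made up, tz = "UTC" is an assumption to keep the result independent of the local time zone, and matching "JUL" with %b assumes an English locale:

```r
fmt <- "%d-%b-%y %I.%M.%OS %p"

# First value is from the question; the second is invented for illustration
start <- as.POSIXct("07-JUL-16 06.05.54.000000 AM", format = fmt, tz = "UTC")
end   <- as.POSIXct("08-JUL-16 06.35.54.000000 PM", format = fmt, tz = "UTC")

# Ask for the units explicitly rather than relying on the automatic choice
elapsed <- difftime(end, start, units = "hours")

# Plain number for use as a regression variable, with units stated again
hours <- as.numeric(elapsed, units = "hours")
hours
```

A bare end - start picks its units automatically, so a column of differences converted with as.numeric() alone may not mean what you expect.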
I am using the new version of data.table and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise) looking like 01APR2008:09:00:00.
I need to sort the data.table on those datetimes and, for the sort to be efficient, to cast them to the IDateTime format (or anything else that I don't know about yet).
> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"
> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
idate itime
1: 2008-04-01 09:00:00
> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) :
character string is not in a standard unambiguous format
It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].
My questions are then:
Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
If not, what is the most efficient way to go, knowing that I would like to be able to sort DT by this datetime column?
Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still does it via POSIXlt. More info here:
http://stackoverflow.com/a/12788992/403310
Looking to base functions such as as.Date, it uses strptime too, creating an integer offset from epoch (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. Suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at (usually one off) conversion.
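The storage difference described here can be checked in base R. A small sketch with made-up dates; it uses plain integer vectors rather than data.table's IDate class, so it only illustrates the idea:

```r
# Made-up dates, deliberately out of order
d <- as.Date(c("2008-04-03", "2008-04-01", "2008-04-02"))

typeof(d)              # base Date: day offsets stored as double

days <- as.integer(d)  # the same offsets from 1970-01-01, stored as integer
typeof(days)

# Radix ordering on the integer offsets
ord <- sort.list(days, method = "radix")
d[ord]
```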
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24" I'd lean towards: as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be HHMMSS as an integer. Two separate columns, date and time, can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway, because rarely do you want to add calendar days, rather weekdays or business days. So that's a lookup to your business day vector anyway.
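As a rough illustration of the YYYYMMDD integer idea, using base R only and made-up values (col stands in for the character date column):

```r
# Made-up ISO date strings standing in for a character column
col <- c("2012-12-24", "2012-11-05", "2013-01-02")

# Drop the separators and keep the digits as one integer: YYYYMMDD
date_int <- as.integer(gsub("-", "", col, fixed = TRUE))
date_int

# Integer division by 100 drops the day, leaving a YYYYMM grouping key
month_key <- date_int %/% 100L
month_key

# Integer order is chronological order for this layout
ord <- order(date_int)
col[ord]
```

In a data.table the same expressions would go inside `:=` and `by =`, as described above.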
In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as %Y-%m-%d, or %Y%m%d.
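The substring-plus-match approach could look something like the sketch below. It assumes every value has exactly the fixed-width layout DDMONYYYY:HH:MM:SS with upper-case English month abbreviations; the second sample value is invented, and plain match is used where fmatch (from the fastmatch package) could be swapped in:

```r
x <- c("01APR2008:09:00:00", "15JAN2008:17:30:05")

day  <- as.integer(substr(x, 1, 2))
# month.abb is base R's c("Jan", ..., "Dec"); match() returns the position 1..12
mon  <- match(substr(x, 3, 5), toupper(month.abb))
year <- as.integer(substr(x, 6, 9))

# YYYYMMDD and HHMMSS as plain integers
date_int <- year * 10000L + mon * 100L + day
time_int <- as.integer(substr(x, 11, 12)) * 10000L +
            as.integer(substr(x, 14, 15)) * 100L +
            as.integer(substr(x, 17, 18))

# Chronological order: by date first, then time within the day
ord <- order(date_int, time_int)
x[ord]
```

An unrecognised month abbreviation yields NA from match(), so checking anyNA(mon) after the conversion is a cheap sanity test.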
I haven't yet got to how best to do this in fread, so dates/times are currently left as character because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect that my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch-based dates too. Dates don't always have to be epoch-based, is all I'm suggesting.
What do you think? Btw, thanks for encouragement on fread; was nice to see.
I don't know how your file is structured, but from your comment you want to use the date field as a key. Why not read it as a time series and format the dates while reading?
Here I use zoo to do it. (I assume that the date column is the first one; otherwise see the index.column argument.)
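library(zoo)  # provides read.zoo() and index()
# %b matches month names in the current locale (French here, hence "avril")
ff <- function(x) as.POSIXct(strptime(x,"%d%b%Y:%H:%M:%S"))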
h <- read.zoo(text = "03avril2008:09:00:00 125
02avril2008:09:30:00 126
05avril2008:09:10:00 127
04avril2008:09:20:00 128
01avril2008:09:00:00 128"
,FUN=ff)
You get your dates in the right format and sorted.
The conversion from POSIXct to IDateTime is then natural:
IDateTime(index(h))
idate itime
1: 2008-04-01 09:00:00
2: 2008-04-02 09:30:00
3: 2008-04-03 09:00:00
4: 2008-04-04 09:20:00
5: 2008-04-05 09:10:00
Sure, you still do 2 conversions here, but the first happens while reading the data, and the second involves no format handling at all.