Convert a string into dates using R - r

I have a column of dates written as monthyear in the format:
11960 - this would be Jan 1960
121960 - this would be Dec 1960
I would like to convert this column into a day-month-year format assuming the first of the month as each date.
I have tried (using one number as an example as opposed to dt$dob)
x <- sprintf("%08d", 11960)
and then x <- as.Date(x, format = "%d%m%Y")
but this gives me NAs as I assume it doesn't like the 00 at the start
So I tried pasting 01 onto each value, but this pastes it onto the end (R noob here). I was thinking that pasting 01 onto the start and then using the sprintf function might still work:
paste 01 to start of 11960 = 011960
sprintf("%08d%", 011960) to maybe give 0101960?
Then use as.Date to convert?
Many thanks for your help

I used paste0() instead of sprintf, and it seems to work.
> x<-paste0("010",11960)
> x
[1] "01011960"
> as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01"
EDIT: for two-digit months I use ifelse() and nchar():
y<-c(11960,11970,11980, 111960,111970,111980)
x<-ifelse(nchar(y) == 5,paste0("010",y),paste0("01",y))
> x
[1] "01011960" "01011970" "01011980" "01111960" "01111970" "01111980"
as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01" "1970-01-01" "1980-01-01" "1960-11-01" "1970-11-01" "1980-11-01"
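A shorter variant of the same idea, sketched under the assumption that every value is a whole number made of a one- or two-digit month followed by a four-digit year: sprintf() can zero-pad to six digits, so no ifelse() is needed. The name to_first_of_month is made up for illustration.

```r
# Hypothetical helper: month-year numbers such as 11960 or 121960 -> Date (first of the month)
to_first_of_month <- function(y) {
  padded <- sprintf("%06d", as.integer(y))   # 11960 -> "011960", 121960 -> "121960"
  as.Date(paste0("01", padded), format = "%d%m%Y")
}

y <- c(11960, 11970, 111960, 121960)
dates <- to_first_of_month(y)
format(dates, "%d-%m-%Y")   # day-month-year text, if that display format is needed
```

The result is a Date vector, so it can be assigned straight back to the column (dt$dob in the question).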

Related

What does calling as.numeric() do to a lubridate Date object?

I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of -719143 and want to get as close as possible to as.Date("2021-01-01") as the output, then:
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
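Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it represents the second, it should be obvious at this point how to proceed. Also note that the result is 2020 rather than 2021: the faulty parse kept only the first two digits of the year (read as day 20), so the last two digits cannot be recovered from -719143.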
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 days is about 1970 years, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as an arithmetic expression equal to -2021, so that we're looking at perhaps 2021 seconds/days/[?] before year zero, which would be about 1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-01") + as.numeric(lubridt)
[1] "2021-01-01"
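The same idea turns a bare number back into a Date, which is what the question asks for. A minimal sketch, assuming the number really is a count of days since 1970-01-01 (that is, the original parse was correct); the variable names are illustrative:

```r
n <- as.numeric(as.Date("2021-01-01"))     # the kind of number an external package might hand back
back <- as.Date(n, origin = "1970-01-01")  # passing the origin explicitly works across R versions
format(back, "%d-%m-%Y")                   # display as day-month-year text
```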
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE

Converting character string to numeric value in user defined function R

I am trying to create a function that will take a character string "mm/dd/yyyy" as input and return a numeric vector containing the month, day and year. I essentially want to combine this new function with the weekday() function below to ultimately return the day of the week that the input string corresponds to.
weekday <- function(q, r, s) {
  # q = month, r = day, s = four-digit year
  # Shift months so that March is month 1 and January/February are months 11/12
  m <- if (q >= 3) q - 2 else q + 10
  k <- r
  c <- floor(s / 100)
  # January and February count as part of the previous year
  y <- if (q >= 3) s %% 100 else (s %% 100) - 1
  f <- (floor((2.6 * m) - 0.2) + k + y + floor(y / 4) + floor(c / 4) - (2 * c)) %% 7
  # f is 0 for Sunday through 6 for Saturday
  c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")[f + 1]
}
I tried using something along the lines of type.convert but this isn't producing the desired output. Any help would be great thanks!
dateconvert<-function("q/r/s")
{
type.convert(dateconvert(), na.strings = )
weekday(convertedanswer)
Return (weekday)
}
Have you tried lubridate package?
input <- "12/30/2017"
# parse into a date-time (POSIXlt) object
inputdate <- strptime(input, "%m/%d/%Y")
library("lubridate")
day(inputdate)
# [1] 30
month(inputdate)
# [1] 12
year(inputdate)
# [1] 2017
It seems like a roundabout way to get to the day of week though. You should try using wday() that comes with lubridate package.
wday(inputdate, label = TRUE)
# [1] Sat
# Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
# (an ordered factor; Sunday is first)
wday(inputdate)
# [1] 7
# wday returns the day of the week as a decimal number (1-7, Sunday is 1), or as an ordered factor when label = TRUE.
Alternatively without package:
date = "06/10/2017"
POSIXdate = as.POSIXlt(date, format = "%d/%m/%Y")
strftime(POSIXdate, "%A")
#Friday
# Or if you like one-liner
strftime(as.POSIXlt("dd/mm/yyyy", format= "%d/%m/%Y"), "%A")
The lubridate package is great for this! You can use lubridate::mdy() to convert a date in the "mm/dd/yyyy" format you mentioned, and then lubridate::wday() to get the day of the week as a number (Sunday is 1).
lubridate::wday(lubridate::mdy("10/05/2017"))
#> [1] 5
If you want the day itself, rather than the numeric output, you can use
lubridate::wday(lubridate::mdy("10/05/2017"),
label = TRUE)
#> [1] Thurs
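If the goal is still to feed the question's own weekday() function, the string only needs to be split on "/" and converted to numbers. A minimal base R sketch, assuming the input is always a single "mm/dd/yyyy" string; the body of dateconvert below is an illustration, not the asker's code:

```r
# Hypothetical helper: "mm/dd/yyyy" -> named numeric vector c(month, day, year)
dateconvert <- function(x) {
  parts <- as.numeric(strsplit(x, "/", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3)   # fail early on malformed input
  setNames(parts, c("month", "day", "year"))
}

p <- dateconvert("12/30/2017")
p
# With the question's weekday() defined, the pieces can be passed straight in:
# weekday(p[["month"]], p[["day"]], p[["year"]])
```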

Converting dates before January 1, 1970 in R

I am trying to convert a column of dates into Date objects in R, but I can't seem to get the desired results. These individuals have birth dates before January 1, 1970, so when I use as.Date R converts a date like 1/12/54, for example, to 2054-01-12. How can I work around this? Thanks so much.
No need for add-on packages, base R is fine. But you need to specify the century:
R> as.Date("1954-01-12")
[1] "1954-01-12"
R>
If you need non-default formats, just specify them:
R> as.Date("19540112", "%Y%m%d")
[1] "1954-01-12"
R>
Edit: In case your data really comes in using the %y format, and you happen to make the policy decision that the 20th century (the 1900s) is needed, here is one base R way of doing it:
R> d <- as.Date("540112", "%y%m%d")
R> dlt <- as.POSIXlt(d)
R> dlt$year <- dlt$year - 100
R> as.Date(dlt)
[1] "1954-01-12"
R>
If everything is in the 1900s, it's a one-liner: parse the date, format it with a two-digit year, slap a 19 on the front, and convert to a date again. This would look cool with some %>% stuff:
s = c("1/12/54","1/12/74")
as.Date(format(as.Date(s,format="%d/%m/%y"), "19%y%m%d"), "%Y%m%d")
# [1] "1954-12-01" "1974-12-01"
If years from "69" to "99" are 1800s, then here's another one-liner:
library(dplyr) # for pipe operator:
s %>% as.Date(format="%d/%m/%y") %>%
format("%y%m%d") %>%
(function(d){
paste0(ifelse(d>700101,"18","19"),d)
}) %>%
as.Date("%Y%m%d")
## [1] "1954-12-01" "1874-12-01"
Note: not thoroughly tested, so there might be some off-by-one errors, or I may have mixed up months and days (another reason to be ISO 8601 compliant).
I would do:
library(lubridate)
x <- as.Date("1/12/54", format = "%m/%d/%y")
year(x) <- 1900 + year(x) %% 100
> x
[1] "1954-01-12"
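For a whole column of birth dates, the same year shift can be applied only to values that landed in the future. This is a sketch under the assumption that no birth date can be later than today; fix_century and its today argument are made-up names:

```r
# Hypothetical helper: move dates that fall after `today` back by 100 years
fix_century <- function(d, today = Sys.Date()) {
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > today          # only these were given the wrong century
  lt$year[future] <- lt$year[future] - 100
  as.Date(lt)
}

dob <- as.Date(c("1/12/54", "3/7/74"), format = "%m/%d/%y")  # parsed as 2054 and 1974
fixed <- fix_century(dob, today = as.Date("2024-01-01"))     # fixed reference date for reproducibility
fixed
```

A two-digit year that is genuinely ambiguous (someone born in 1910 versus 2010) cannot be resolved this way and needs extra information.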

Extract Hour From "yyyymmddHH"

This has probably been asked many times, but I couldn't find a related resource and just can't get it right. I have a data frame with an HourStamp column in yyyymmddHH format and I need to extract the HH from it. How can I do it?
As an example:
HourStamp Hour
2013050100 00
2013050101 01
2013050102 02
...
I need that Hour column added. Thanks!
As @Klaus already commented, in this case a simple substr would do the trick, i.e. substr('2013050100', 9, 10). Remember that substr is vectorized, so you can simply do:
df$Hour = substr(df$HourStamp, 9, 10)
A more flexible and powerful way of dealing with dates/times is to simply convert HourStamp into a real R date object:
d = strptime('2013050100', format = '%Y%m%d%H')
strftime(d, '%H')
[1] "00"
In this case the strptime solution is a bit cumbersome, but it allows for stuff like:
> strftime(d, '%A %d of %B in the year %Y')
[1] "Wednesday 01 of May in the year 2013"
or:
strftime(d, 'file%Y%d.csv')
[1] "file201301.csv"
or in vectorized form for your example:
df$time = strptime(df$HourStamp, format = '%Y%m%d%H')
df$Hour = strftime(df$time, '%H')
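If HourStamp is stored as a number rather than as text, the hour is simply the last two digits, so integer arithmetic also works. A small sketch with made-up sample data, assuming the stamps are whole numbers in yyyymmddHH form:

```r
# Sample data mirroring the question; HourStamp is numeric here
df <- data.frame(HourStamp = c(2013050100, 2013050101, 2013050102))

# The remainder after dividing by 100 is the HH part; sprintf restores the leading zero
df$Hour <- sprintf("%02d", as.integer(df$HourStamp %% 100))
df
```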

Add correct century to dates with year provided as "Year without century", %y

I have a file with birthdays in %d%b%y format. Some examples:
# "01DEC71" "01AUG54" "01APR81" "01MAY81" "01SEP83" "01FEB59"
I tried to reformat the date as
o108$fmtbirth <- format(as.Date(o108$birth, "%d%b%y"), "%Y/%m/%d")
and this is the result
# "1971/12/01" "2054/08/01" "1981/04/01" "1981/05/01" "1983/09/01" "2059/02/01"
These are birthdays, yet I see 2054. From this page I see that year values between 00 and 68 are given 20 as the century. Is there a way to change this? In my case I want only 00 to 12 to be coded as 20.
1) chron. chron uses a cutoff of 30 by default, so this will convert them by first converting to Date (since chron can't read those sorts of dates), then reformatting to character with two-digit years in a format that chron can understand, and finally converting back to Date.
library(chron)
xx <- c("01AUG11", "01AUG12", "01AUG13") # sample data
as.Date(chron(format(as.Date(xx, "%d%b%y"), "%m/%d/%y")))
That gives a cutoff of 30 but we can get a cutoff of 13 using chron's chron.year.expand option:
library(chron)
options(chron.year.expand =
function (y, cut.off = 12, century = c(1900, 2000), ...) {
chron:::year.expand(y, cut.off = cut.off, century = century, ...)
}
)
and then repeating the original conversion. For example assuming we had run this options statement already we would get the following with our xx :
> as.Date(chron(format(as.Date(xx, "%d%b%y"), "%m/%d/%y")))
[1] "2011-08-01" "2012-08-01" "1913-08-01"
2) Date only. Here is an alternative that does not use chron. You might want to replace "2012-12-31" with Sys.Date() if the idea is that otherwise future dates are really to be set 100 years back:
d <- as.Date(xx, "%d%b%y")
as.Date(ifelse(d > "2012-12-31", format(d, "19%y-%m-%d"), format(d)))
EDIT: added Date only solution.
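A base R variant that makes the cutoff an explicit parameter, sketched under the assumption that two-digit years up to and including the cutoff belong to the 2000s and all others to the 1900s. The name expand_year and its arguments are invented for illustration, and %b month names depend on the locale:

```r
# Hypothetical helper: choose the century from the two-digit year and a cutoff
expand_year <- function(x, cutoff = 12, fmt = "%d%b%y") {
  d <- as.Date(x, fmt)
  yy <- as.integer(format(d, "%y"))          # two-digit year as a number
  century <- ifelse(yy > cutoff, "19", "20")
  as.Date(paste0(century, format(d, "%y-%m-%d")), format = "%Y-%m-%d")
}

xx <- c("01AUG11", "01AUG12", "01AUG13")     # parses only where month abbreviations are English
expand_year(xx)

# Locale-independent check with numeric months
num <- expand_year(c("010811", "010812", "010813"), fmt = "%d%m%y")
num
```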
See response from related thread:
format(as.Date("65-05-14", "%y-%m-%d"), "19%y-%m-%d")
birth <- as.Date(o108$birth, "%d%b%y")  # keep the parsed values as Date so the comparison and format() below work
o108$fmtbirth <- as.Date(ifelse(birth > Sys.Date(),
format(birth, "19%y-%m-%d"),
format(birth)))
