Convert year and month into date format - R

I have dates in the format
192607 192608
and want to transform them into the following format so that they can be used for an xts object:
1926-07-01 1926-08-01
I have tried working with as.Date and paste() but couldn't make it work.
Help is very much appreciated. Thank you!

You need to paste a day onto each value first and then parse the result with an explicit date format. Something like this:
dates <- c("192607", "192608")
dates <- paste0(dates,"01")
dates <- as.Date(dates, format ="%Y%m%d")
dates
The result is
[1] "1926-07-01" "1926-08-01"
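If the values arrive as numbers, a string-free variant is to split yyyymm with integer arithmetic and build ISO dates from the parts. This is a minimal base-R sketch, assuming the inputs are whole numbers in yyyymm form; the names year and month are illustrative. The result is a Date vector, which is one of the index classes xts accepts.

```r
# Year-month values as integers in yyyymm form, as in the question
ym <- c(192607L, 192608L)

year  <- ym %/% 100L  # integer division drops the month: 1926, 1926
month <- ym %% 100L   # remainder keeps the month: 7, 8

# Build "yyyy-mm-01" strings, which as.Date() parses with its default format
dates <- as.Date(sprintf("%04d-%02d-01", year, month))
print(dates)
# [1] "1926-07-01" "1926-08-01"
```

Working on the numbers directly also makes it easy to validate the month part (for example, checking that month is between 1 and 12) before any parsing happens.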

If every date should map to the first of the month, this lubridate solution works:
library(lubridate)
dates <- c(192607, 192608)
dates <- paste0(dates, '01') # add 01 for day of month
# output: "19260701" "19260801"
dates <- ymd(dates)
# output: "1926-07-01" "1926-08-01"

Related

Convert character to date format and then compute difference in days

I know this question has probably been answered in different ways, but I'm still struggling with it. I am working with a dataset where the dates in date1 look like '2/1/2000', '5/12/2000', '6/30/2015' and class() is "character". The second date column, date2, looks like '2015-07-06', '2015-08-01', '2017-10-09' and class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this:
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 into the same class using strptime, lubridate, etc. What's the best way to standardize both columns so that I can compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
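The fractional 5633.958 above is most likely a one-hour daylight-saving shift: both values are POSIXct in the local time zone, and 5634 days minus one hour is about 5633.958. A sketch that avoids this by converting both columns to Date, assuming (unlike the question) that y is created in UTC so the conversion keeps the same calendar day:

```r
# Sample data from the question; y is created in UTC here (an assumption)
# so that converting it to Date cannot shift the calendar day
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'), tz = "UTC")

# Date objects count whole calendar days, so DST changes cannot add fractions
x_date <- as.Date(x, format = "%m/%d/%Y")
y_date <- as.Date(y, tz = "UTC")

day_diff <- abs(difftime(y_date, x_date, units = "days"))
print(day_diff)
# Time differences in days
# [1] 5634 5559  832
```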

Format POSIX scenario in Dates

Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
I need to code this in R.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but are missing a small detail. The second argument of as.Date specifies exactly what format your dates come in. Right now, "%d/%m/%y" says your date looks like 15/08/47. Three things are wrong with this: your date has no slashes, the month is an abbreviated month name rather than a number (%b, not %m), and the year has four digits (%Y, not the two-digit %y). The correct way to parse this date would be
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
You can use a package to parse dates automatically, such as lubridate.
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
# tz = "UTC" keeps daylight-saving changes from adding fractional days
dt <- as.POSIXct(c("15Aug1947", "15Aug2018"), format = "%d%b%Y", tz = "UTC")
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days
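Note that %b matches month abbreviations in the current LC_TIME locale, so "Aug" may fail to parse (returning NA) on a machine with, for example, a German or French locale. A minimal base-R sketch that parses in the "C" locale (English month names) and restores the previous setting afterwards; switching locales this way is an assumption about what suits your session:

```r
# Remember the current time locale, then switch to "C" for English month names
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

independence <- as.Date("15Aug1947", format = "%d%b%Y")
anniversary  <- as.Date("15Aug2018", format = "%d%b%Y")

# Restore the original locale so the rest of the session is unaffected
invisible(Sys.setlocale("LC_TIME", old_locale))

elapsed <- difftime(anniversary, independence, units = "days")
print(elapsed)
# Time difference of 25933 days
```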

Convert multiple date format factor to date type in R

I have a variable in a data frame that holds dates in different formats (month-year), for example Jan-62, 98-Apr, March-1987.
The variable is currently a factor. I need help converting it to Date or POSIXct. I tried the function parse_date_time from the lubridate package, and it helped a little, but Jan-62 is parsed as 01/01/2062 when it should be 01/01/1962. I tried the cutoff_2000 argument, but I'm not getting the desired output.
I would appreciate your help.
Regards,
Aravindan S
Use parse_date_time from lubridate and then subtract 100 years from every element whose parsed year is later than 2019:
library(lubridate)
x <- factor( c("Jan-62", "98-Apr", "March-1987") ) # input
p <- parse_date_time(x, c("my", "ym", "mY"))
year(p) <- year(p) - 100 * (year(p) > 2019)
p
## [1] "1962-01-01 UTC" "1998-04-01 UTC" "1987-03-01 UTC"
You can use the function as.Date, provided every value follows one single format:
yourvariable <- as.Date(yourvariable, "%m/%d/%Y")
(%m is the month number)
(%d is the day)
(%Y is the four-digit year)

R intersect of dates by year and month

I would like to find the intersection of two dataframes based on the date column.
Previously, I have been using this command to find the intersection of a yearly date column (where the date only contained the year):
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
But now my date column for df1 is of class Date and looks like this:
1985-01-01
1985-04-01
1985-07-01
1985-10-01
My date column for df2 is also of class Date and looks like this (notice that the days differ):
1985-01-05
1985-04-03
1985-07-07
1985-10-01
The above command works fine when I keep the full format (i.e. year, month and day), but since my days differ and I am interested in the monthly intersection, I dropped the days like this, which produces an error when I look for the intersection:
df1$Date <- format(as.Date(df1$Date), "%Y-%m")
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there a way to find the intersection of the two datasets, based on the year and month, while ignoring the day?
The problem is the as.Date() call wrapping your final output: a Date needs a day as well as a year and month, so as.Date() cannot parse a string such as "1985-01". If plain strings are fine for you, format both columns with "%Y-%m" and use common_rows <- intersect(df1$Date, df2$Date). Otherwise, paste a day back on before converting:
common_rows <- as.Date(paste0(intersect(df1$Date, df2$Date), "-01"))
Try this:
date1 <- c('1985-01-01','1985-04-01','1985-07-01','1985-10-01')
date2 <- c('1985-01-05','1985-04-03','1985-07-07','1985-10-01')
# keep only the year-month part, i.e. the first 7 characters such as "1985-01"
date1 <- substr(date1, 1, 7)
date2 <- substr(date2, 1, 7)
print(intersect(date1, date2))
[1] "1985-01" "1985-04" "1985-07" "1985-10"
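To keep whole rows rather than just the matching months, one option is to build a year-month key column in each data frame and then filter or merge on it. A base-R sketch where df1 and df2 are rebuilt from the dates in the question and the key column name YearMonth is hypothetical:

```r
# Data frames rebuilt from the dates shown in the question
df1 <- data.frame(Date = as.Date(c("1985-01-01", "1985-04-01", "1985-07-01", "1985-10-01")))
df2 <- data.frame(Date = as.Date(c("1985-01-05", "1985-04-03", "1985-07-07", "1985-10-01")))

# Year-month keys as strings; the Date columns themselves stay untouched
df1$YearMonth <- format(df1$Date, "%Y-%m")
df2$YearMonth <- format(df2$Date, "%Y-%m")

common_months <- intersect(df1$YearMonth, df2$YearMonth)

# Rows of df1 whose month also occurs in df2
common_rows <- df1[df1$YearMonth %in% common_months, ]

# Or pair up rows of both data frames that share a year-month;
# the two Date columns become Date.df1 and Date.df2
matched <- merge(df1, df2, by = "YearMonth", suffixes = c(".df1", ".df2"))
print(matched)
```

Because the original Date columns are never overwritten, there is no need to convert a year-month string back into a Date afterwards.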

Create date column from datetime in R

I am new to R; I am an avid SAS programmer and am having a difficult time wrapping my head around R.
Within a data frame I have a date-time column of class POSIXct, with values such as "2013-01-01 00:53:00". I would like to create a date column using a function that extracts the date, and another column that holds the hour. Ideally I would like to extract the date, year, month, day, time and hour into additional columns within the data frame.
It is wise always to be careful with as.Date(as.POSIXct(...)):
E.g., for me in Australia:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00"))
df
# dt
#1 2013-01-01 00:53:00
as.Date(df$dt)
#[1] "2012-12-31"
You'll see that this is problematic because the dates don't match. You'll hit problems if your POSIXct object is not in the UTC time zone, because as.Date defaults to tz="UTC" for this class. For more information, see the question "as.Date(as.POSIXct()) gives the wrong date?"
To be safe you probably need to match your timezones:
as.Date(df$dt,tz=Sys.timezone()) #assuming you've just created df in the same session:
#[1] "2013-01-01"
Or safer option #1:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00",tz="UTC"))
as.Date(df$dt)
#[1] "2013-01-01"
Or safer option #2:
as.Date(df$dt,tz=attr(df$dt,"tzone"))
#[1] "2013-01-01"
Alternatively, use format() to extract parts of the POSIXct object:
as.Date(format(df$dt,"%Y-%m-%d"))
#[1] "2013-01-01"
as.numeric(format(df$dt,"%Y"))
#[1] 2013
as.numeric(format(df$dt,"%m"))
#[1] 1
as.numeric(format(df$dt,"%d"))
#[1] 1
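Another base-R route to several components at once is as.POSIXlt(), which exposes them as list fields. Note that year counts from 1900 and mon is 0-based, hence the offsets. A minimal sketch, assuming dt was created in UTC as in safer option #1; the new column names are illustrative:

```r
# dt created explicitly in UTC (an assumption, as in safer option #1)
df <- data.frame(dt = as.POSIXct("2013-01-01 00:53:00", tz = "UTC"))

# POSIXlt breaks the timestamp into fields, using the tz stored in df$dt
lt <- as.POSIXlt(df$dt)

df$date  <- as.Date(format(df$dt, "%Y-%m-%d"))
df$year  <- lt$year + 1900L  # years are counted from 1900
df$month <- lt$mon + 1L      # months are 0-based (January = 0)
df$day   <- lt$mday
df$hour  <- lt$hour
df$time  <- format(df$dt, "%H:%M:%S")
print(df)
```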
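Use the lubridate package. For example, if df is a data.frame with a column dt of type POSIXct, then you could:
library(lubridate)
# as.Date() on a UTC copy gives the UTC calendar date; this assumes dt is stored in UTC
df$date = as.Date(as.POSIXct(df$dt, tz="UTC"))
df$year = year(df$dt)
df$month = month(df$dt)
df$day = day(df$dt)
df$hour = hour(df$dt)
# and so on...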
If you can store your data in a data.table (for example by converting df with setDT(df)), then this is even easier:
df[, `:=`(date = as.Date(as.POSIXct(dt, tz="UTC")), year = year(dt), ...)]

Resources