R: intersect of dates by year and month

I would like to find the intersection of two dataframes based on the date column.
Previously, I have been using this command to find the intersection of a yearly date column (where the date only contained the year):
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
But now my date column for df1 is of type date and looks like this:
1985-01-01
1985-04-01
1985-07-01
1985-10-01
My date column for df2 is also of type Date and looks like this (notice that the days are different):
1985-01-05
1985-04-03
1985-07-07
1985-10-01
The above command works when I keep the full format (i.e. year, month and day). However, since the days differ and I am only interested in the monthly intersection, I dropped the days as shown below, and that produces an error when I look for the intersection:
df1$Date <- format(as.Date(df1$Date), "%Y-%m")
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there a way to find the intersection of the two datasets, based on the year and month, while ignoring the day?

The problem is the as.Date() call wrapping your final output: as.Date() cannot parse an incomplete date such as "1985-01" because it has no day component, and that is what raises the charToDate() error. If you are fine with plain strings, then use common_rows <- intersect(df1$Date, df2$Date). Otherwise, append a day (the first of the month) before converting:
common_rows <- as.Date(paste0(intersect(df1$Date, df2$Date), "-01"))
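A minimal, self-contained sketch of this approach, using the dates from the question. The vector names d1 and d2 are hypothetical stand-ins for df1$Date and df2$Date. Note that the day has to be dropped from both columns before intersecting (the code in the question only reformats df1$Date):

```r
# Hypothetical stand-ins for df1$Date and df2$Date from the question
d1 <- as.Date(c("1985-01-01", "1985-04-01", "1985-07-01", "1985-10-01"))
d2 <- as.Date(c("1985-01-05", "1985-04-03", "1985-07-07", "1985-10-01"))

# Drop the day on BOTH sides so the comparison is by year and month only
ym1 <- format(d1, "%Y-%m")
ym2 <- format(d2, "%Y-%m")

# Year-months present in both vectors, as character strings
common_ym <- intersect(ym1, ym2)

# Append "-01" so as.Date() receives a complete date (first day of the month)
common_rows <- as.Date(paste0(common_ym, "-01"))
print(common_rows)
```

Choosing the first of the month is arbitrary; any fixed day works, as long as the result is only used to identify the month.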

Try this:
date1 <- c('1985-01-01','1985-04-01','1985-07-01','1985-10-01')
date2 <- c('1985-01-05','1985-04-03','1985-07-07','1985-10-01')
# keep only the year-month part (drop the day); substr() is already vectorised
date1 <- substr(date1, 1, 7)
date2 <- substr(date2, 1, 7)
print(intersect(date1, date2))
[1] "1985-01" "1985-04" "1985-07" "1985-10"
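To get from the common year-months back to the rows of the data frames, one option is to build a year-month key for each row and test it with %in%. This is a sketch only: df1 and df2 below are small hypothetical data frames with a made-up value column, and one df2 date is moved to August so that the filtering actually drops a row:

```r
# Hypothetical data frames with a Date column, as in the question
df1 <- data.frame(Date = as.Date(c("1985-01-01", "1985-04-01", "1985-07-01", "1985-10-01")),
                  value = 1:4)
df2 <- data.frame(Date = as.Date(c("1985-01-05", "1985-04-03", "1985-08-07", "1985-10-01")),
                  value = 5:8)

# Year-month key for every row, ignoring the day
ym1 <- format(df1$Date, "%Y-%m")
ym2 <- format(df2$Date, "%Y-%m")
common_ym <- intersect(ym1, ym2)

# Rows of each data frame whose year-month occurs in both
df1_common <- df1[ym1 %in% common_ym, ]
df2_common <- df2[ym2 %in% common_ym, ]
print(df1_common)
print(df2_common)
```

If the goal is a single joined table rather than two filtered ones, the same key could instead be passed to merge() as the by column.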

Related

Convert character to date format and then compute difference in days

I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the dates in date1 have the format '2/1/2000', '5/12/2000', '6/30/2015' and class() is "character", while the second date column, date2, has the format '2015-07-06', '2015-08-01', '2017-10-09' and class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so that I can compute the difference in days between them, using something like this:
abs(difftime(date1, date2, units = "days"))
I have tried numerous ways of converting date1 into the same class, using strptime, lubridate, etc. What is the best way to standardize both columns and compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
# make both POSIXct (parse the month/day/year strings)
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
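The fractional 5633.958 appears to come from the conversions being done in the local time zone: when a daylight-saving change falls between the two midnights, the gap is 23 hours short of a whole number of days. A sketch that avoids this, assuming the dates carry no meaningful time of day, parses both columns in UTC and converts the difftime to a plain numeric vector:

```r
x <- c('2/1/2000', '5/12/2000', '6/30/2015')

# Parse both columns in UTC, where every day is exactly 24 hours long
y  <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'), tz = "UTC")
x2 <- as.POSIXct(x, format = "%m/%d/%Y", tz = "UTC")

# difftime() with explicit units, then drop the class to get plain numbers
days <- abs(as.numeric(difftime(x2, y, units = "days")))
print(days)
# [1] 5634 5559  832
```

If date2 is already POSIXct in a local time zone, the same idea applies after converting both columns to Date objects with as.Date(), since Date arithmetic is always in whole days.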

How to fix character dates and add zeros to it in R?

I have a dataset in which all of the date variables are messed up. All of the columns are characters. They look like this:
name <- c("Ana", "Maria", "Rachel", "Julia")
date_of_birth <- c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019")
data <- data.frame(name, date_of_birth)
I need to turn those dates into dd/mm/yyyy format. They are already in this order, but I need to add zero when dd or mm has only one digit.
For example, “9/8/1997” should be “09/08/1997”.
We can try this:
> format(as.Date(date_of_birth, format = "%d/%m/%Y"), "%d/%m/%Y")
[1] "09/08/1997" "22/03/1966" "24/10/1969" "25/06/2019"
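Applied to the data frame from the question, the same call can overwrite the column in place. A minimal sketch, assuming the column holds character strings (stringsAsFactors = FALSE makes that explicit for older R versions). Any string that cannot be parsed as day/month/year comes back as NA rather than being padded, which also flags malformed entries:

```r
name <- c("Ana", "Maria", "Rachel", "Julia")
date_of_birth <- c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019")
data <- data.frame(name, date_of_birth, stringsAsFactors = FALSE)

# Parse as day/month/year, then write back with zero-padded %d and %m
data$date_of_birth <- format(as.Date(data$date_of_birth, format = "%d/%m/%Y"),
                             "%d/%m/%Y")
print(data)
```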

Format POSIX scenario in Dates

Create a variable with the values 15Aug1947 and 15Aug2018 in POSIX date format.
Find the number of days elapsed since Independence as of 15 August 2018.
This needs to be coded in R.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but you are missing a small detail. The second argument of as.Date() must describe exactly the format your dates are in. Right now, "%d/%m/%y" says your date looks like 15/08/47: day and month as numbers separated by slashes, followed by a two-digit year. That does not match your data: your date has no slashes, the month is not a number but an abbreviated month name, and the year has four digits (%Y rather than %y). The correct way to parse this date would be:
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
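A small sketch of the point above: when the format string does not match, as.Date() returns NA rather than raising an error, so the original code silently produces a difference of NA. The sketch also sets LC_TIME to "C" first, because %b matches month abbreviations in the current locale; this assumes you are willing to change that setting for the session, and it is only needed if your locale does not use English month names:

```r
# %b is locale-dependent; the "C" locale guarantees that "Aug" is recognised
invisible(Sys.setlocale("LC_TIME", "C"))

wrong <- as.Date("15Aug1947", "%d/%m/%y")  # format does not match: NA, no error
right <- as.Date("15Aug1947", "%d%b%Y")    # day, abbreviated month, 4-digit year

# Subtracting Dates gives a difftime; as.numeric() turns it into a plain number
elapsed <- as.numeric(as.Date("15Aug2018", "%d%b%Y") - right, units = "days")
print(elapsed)
```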
Alternatively, you can use a package such as lubridate to parse dates automatically.
The following code may help!
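# Create a variable with the values 15Aug1947 and 15Aug2018 in POSIX date format
# (tz = "UTC" keeps daylight-saving shifts from producing fractional days)
dt <- c(as.POSIXct("15Aug1947", format = "%d%b%Y", tz = "UTC"),
        as.POSIXct("15Aug2018", format = "%d%b%Y", tz = "UTC"))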
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days

How to convert Year-Month 'Character Value' into 'Date Value' in R

I am trying to convert a character value into a Date value so that the data reacts to dateRangeInput in R Shiny. I believe the dates need to be Date values, not character strings.
I am having trouble converting them. Below is the code:
CompletedProject$EndDate <- ymd(substring(CompletedProject$EndDate, first = 1, last = 10))
CompletedProject$EndDate <- format(as.Date(CompletedProject$EndDate), "%Y-%m")
CompletedProject_monthly <- subset(CompletedProject_monthly, CompletedProject_monthly$EndDate >= input$IMdaterange[1] & CompletedProject_monthly$EndDate <= input$IMdaterange[2])
> class(CompletedProject$EndDate)
[1] "character"
I have tried as.Date function, but it didn't work.
I can do this easily with a year-month-day format, but the year-month format is harder.
I need the date in year-month format.
I would appreciate any help!
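R's Date class always represents a full day, so there is no separate year-month Date type, and format(..., "%Y-%m") necessarily returns character. One common workaround, sketched here with hypothetical vectors rather than the CompletedProject data or the actual Shiny input, is to keep the column as a Date set to the first day of its month, filter on that, and call format(..., "%Y-%m") only for display:

```r
# Hypothetical end dates (stand-in for CompletedProject$EndDate)
end_date <- as.Date(c("2019-01-17", "2019-02-03", "2019-03-28", "2019-05-09"))

# Floor each date to the first of its month; the result is still class Date
end_month <- as.Date(format(end_date, "%Y-%m-01"))

# Hypothetical range, standing in for something like input$IMdaterange
date_range <- as.Date(c("2019-02-01", "2019-03-31"))

# Comparisons are chronological because both sides are Date objects
in_range <- end_month >= date_range[1] & end_month <= date_range[2]
print(format(end_month[in_range], "%Y-%m"))
```

This reuses the "append a day" idea from the first answer above, only in the opposite direction.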

R: aggregate quarter-hourly data to hourly data - different behaviour with same date fields

I am trying to understand why R's aggregate() function behaves differently depending on how I pass the grouping dates. I wanted to average 15-minute data to hourly data. For this, I passed the 15-minute data to aggregate() together with a pre-built "hour" array (the same timestamp four times per hour, derived from the original POSIXct array).
After some time, I realized that the function was behaving oddly (probably the data was odd, but why?) when I passed the date array with
strftime(data.15min$posix, format="%m/%d/%y %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3
bla <- read.table("test_daten.dat",header=TRUE,sep=",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates,tz="UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format="%Y-%m-%d %H"),":00:00",sep=""),tz="Universal"))
# xx: parameter that determines the aggregation: list(xx), e.g. hour etc.
# yy: parameter that will be aggregated
agg.mean <- function(xx, yy, rm.na = TRUE) {
  out.mean <- aggregate(yy, list(xx), FUN = mean, na.rm = rm.na)
  out.mean[, 2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format="%m/%d/%y %H"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Correct
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector of month/day/year strings, which sort alphabetically rather than chronologically. So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
returns a factor whose levels are in chronological order, so the groups sort properly. So the first row corresponds to the date 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case, you would have gotten the same result as using cut(...).
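The sorting effect can be reproduced without the original data file. The sketch below uses hypothetical 15-minute timestamps around a year boundary and made-up o3 values, and aggregates by three kinds of key: month/day/year strings put the 1996 hour ahead of the 1995 hour, while year-first strings and cut() both come out in chronological order:

```r
# Hypothetical 15-minute timestamps spanning New Year 1995/1996 (UTC)
posix <- seq(as.POSIXct("1995-12-31 23:00", tz = "UTC"), by = "15 min", length.out = 8)
o3 <- c(10, 12, 14, 16, 30, 32, 34, 36)

# Month/day/year keys: alphabetical order puts "01/01/96 00" before "12/31/95 23"
bad <- aggregate(o3, list(hour = strftime(posix, format = "%m/%d/%y %H", tz = "UTC")),
                 FUN = mean)

# Year-first keys: alphabetical order matches chronological order
good <- aggregate(o3, list(hour = strftime(posix, format = "%Y-%m-%d %H", tz = "UTC")),
                  FUN = mean)

# cut() returns a factor whose levels are in time order
via_cut <- aggregate(o3, list(hour = cut(posix, "1 hour")), FUN = mean)

print(bad)
print(good)
print(via_cut)
```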
