I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the dates in date1 look like '2/1/2000', '5/12/2000', '6/30/2015' and class() is "character". The second column, date2, holds dates in the format '2015-07-06', '2015-08-01', '2017-10-09' and class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this:
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 into the same class, using strptime, lubridate, etc. What is the best way to standardize both columns and compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
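The fractional first value (5633.958) appears because POSIXct differences measure elapsed time, and in a time zone with daylight saving the two dates can be an hour out of step. If only whole calendar days matter, one option is to convert both columns to Date first. This is a minimal sketch, assuming the times in y are midnight in the session's local time zone; x_date, y_date and day_diff are illustrative names.

```r
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))

# Parse the character column straight to Date (no time of day, no time zone)
x_date <- as.Date(x, format = "%m/%d/%Y")

# Drop the time part of the POSIXct column; format() renders it in the
# time zone it was created in, so the calendar date is preserved
y_date <- as.Date(format(y, "%Y-%m-%d"))

# Whole-day differences as a plain numeric vector
day_diff <- abs(as.numeric(difftime(y_date, x_date, units = "days")))
day_diff
# [1] 5634 5559  832
```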
I have a dataset in which all of the date variables are messed up. All of the columns are characters. They look like this:
name <- c("Ana", "Maria", "Rachel", "Julia")
date_of_birth <- c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019")
data <- as.data.frame(cbind(name, date_of_birth))
I need to turn those dates into dd/mm/yyyy format. They are already in this order, but I need to add a leading zero when dd or mm has only one digit.
For example, "9/8/1997" should be "09/08/1997".
We can try this
> format(as.Date(date_of_birth, format = "%d/%m/%Y"), "%d/%m/%Y")
[1] "09/08/1997" "22/03/1966" "24/10/1969" "25/06/2019"
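The same round trip can be applied to the column inside the data frame. The sketch below is illustrative only: people and parsed are made-up names, and the check that nothing failed to parse is an added assumption about what you would want, not part of the original answer.

```r
people <- data.frame(
  name = c("Ana", "Maria", "Rachel", "Julia"),
  date_of_birth = c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019"),
  stringsAsFactors = FALSE
)

# Parse as day/month/four-digit-year; unparseable strings would become NA
parsed <- as.Date(people$date_of_birth, format = "%d/%m/%Y")
stopifnot(!anyNA(parsed))

# Write the dates back as zero-padded dd/mm/yyyy strings
people$date_of_birth <- format(parsed, "%d/%m/%Y")
people$date_of_birth
# [1] "09/08/1997" "22/03/1966" "24/10/1969" "25/06/2019"
```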
Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
Need to code in R language.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but you are missing a small detail. The second argument of as.Date must describe exactly the format your dates are in. Right now, you are telling R that your date looks like 15/08/47. Three things are wrong with this: your date has no slashes, the month is an abbreviated month name rather than a number, and the year has four digits (%Y), not two (%y). The correct way to parse this date would be
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
You can use a package to parse dates automatically, such as lubridate.
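To make the format codes concrete, here is a small base R sketch. It assumes an English locale, since %b matches the locale's abbreviated month names; d1, d2 and elapsed are illustrative names.

```r
# This pattern expects slashes, a numeric month and a two-digit year,
# so it cannot match "15Aug1947" and the result is NA
as.Date("15Aug1947", format = "%d/%m/%y")

# %d = day of month, %b = abbreviated month name, %Y = four-digit year
d1 <- as.Date("15Aug1947", format = "%d%b%Y")
d2 <- as.Date("15Aug2018", format = "%d%b%Y")

# Later date minus earlier date gives a positive number of elapsed days
elapsed <- as.numeric(d2 - d1)
elapsed
# [1] 25933
```

Subtracting in the order later minus earlier avoids the negative sign in the result.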
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
dt <- c(as.POSIXct("15Aug1947", format = "%d%b%Y"), as.POSIXct("15Aug2018", format = "%d%b%Y"))
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days
I am trying to convert a character column into Date values so that the data reacts to a dateRangeInput in R Shiny. I believe the dates need to be of class Date, not character.
I am having trouble converting them. Below is the code:
CompletedProject$EndDate <- ymd(substring(CompletedProject$EndDate, first = 1, last = 10))
CompletedProject$EndDate <- format(as.Date(CompletedProject$EndDate), "%Y-%m")
CompletedProject_monthly <- subset(CompletedProject_monthly, CompletedProject_monthly$EndDate >= input$IMdaterange[1] & CompletedProject_monthly$EndDate <= input$IMdaterange[2])
> class(CompletedProject$EndDate)
[1] "character"
I have tried the as.Date function, but it didn't work.
I can do it easily with a year-month-day format, but the year-month format is harder.
I need the date format to be year-month.
I would appreciate any help!
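One thing worth noting about the code above: format() always returns character, which is why class() reports "character" after the second assignment. Base R has no Date class that stores only year and month, so one possible approach is to keep a real Date pinned to the first day of each month for filtering, and format it as year-month only for display. A minimal sketch with made-up values, where range_start and range_end are hypothetical stand-ins for input$IMdaterange[1] and input$IMdaterange[2]:

```r
end_date <- as.Date(c("2019-03-17", "2019-04-02", "2019-06-30"))

# Floor each date to the first of its month; the result is still class Date
end_month <- as.Date(format(end_date, "%Y-%m-01"))
class(end_month)
# [1] "Date"

# Date-to-Date comparison, as a date range filter needs
range_start <- as.Date("2019-03-01")
range_end <- as.Date("2019-04-30")
keep <- end_month >= range_start & end_month <= range_end

# Convert to a year-month label only at display time
month_label <- format(end_month[keep], "%Y-%m")
month_label
# [1] "2019-03" "2019-04"
```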
I am trying to understand why R behaves differently with the "aggregate" function. I wanted to average 15m-data to hourly data. For this, I passed the 15m-data together with a pre-designed "hour" array (4 times the same date per hour, taking the original POSIXct array) to the aggregate function.
After some time, I realized that the function was behaving oddly (well, probably the data was odd, but why?) when I passed the date array as
strftime(data.15min$posix, format="%Y-%m-%d %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3
bla <- read.table("test_daten.dat",header=TRUE,sep=",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates,tz="UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format="%Y-%m-%d %H"),":00:00",sep=""),tz="Universal"))
agg.mean <- function(xx, yy, rm.na = TRUE) {
  # xx: vector that determines the aggregation groups (e.g. hour); wrapped in list(xx)
  # yy: values that will be aggregated
  out.mean <- aggregate(yy, list(xx), FUN = mean, na.rm = rm.na)
  # return only the column of group means
  out.mean[, 2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format="%m/%d/%y %H"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Correct
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector of dates in month/day/year order, and such strings do not sort chronologically. So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
generates a factor whose levels are in chronological order, so the groups sort properly. So the first row corresponds to the date: 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case, you would have gotten the same result as using cut(...), because year-month-day strings sort chronologically.
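The sorting effect can be reproduced on a small made-up series. The sketch below uses a hypothetical 15-minute sequence that crosses a year boundary in place of the real data; posix, o3 and the agg_* names are illustrative only.

```r
# Hypothetical 15-minute series: 1995-12-31 22:00 to 1996-01-01 01:45 (UTC)
posix <- seq(as.POSIXct("1995-12-31 22:00:00", tz = "UTC"),
             by = "15 min", length.out = 16)
o3 <- seq_along(posix)

# Two character keys for the same hours, differing only in field order
key_mdy <- strftime(posix, format = "%m/%d/%y %H", tz = "UTC")
key_ymd <- strftime(posix, format = "%Y-%m-%d %H", tz = "UTC")

agg_mdy <- aggregate(o3, list(hour = key_mdy), FUN = mean)
agg_ymd <- aggregate(o3, list(hour = key_ymd), FUN = mean)
agg_cut <- aggregate(o3, list(hour = cut(posix, "1 hour")), FUN = mean)

# Month-first keys put January 1996 ahead of December 1995
as.character(agg_mdy$hour[1])
# [1] "01/01/96 00"
agg_mdy$x
# [1] 10.5 14.5  2.5  6.5

# Year-first keys and cut() both keep chronological order
agg_ymd$x
# [1]  2.5  6.5 10.5 14.5
```

The hourly means are identical in all three cases; only the row order differs, which is what makes a plot against a chronologically ordered time axis look wrong.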