Convert character to date format and then compute difference in days in R

I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the dates in date1 look like '2/1/2000', '5/12/2000', '6/30/2015' and their class() is "character". The second date column, date2, looks like '2015-07-06', '2015-08-01', '2017-10-09' and its class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them, using something like this:
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 into the same class, using strptime, lubridate, etc. What is the best way to standardize both columns and compute the difference in days?

sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
# Make both POSIXct (each value is parsed at midnight in the local time zone)
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
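The fractional 5633.958 in the first element most likely comes from daylight saving time. Both vectors are parsed at midnight in the local time zone, so when the two dates fall on opposite sides of a DST change, the gap is an hour short of a whole number of days. A minimal sketch, assuming only whole calendar days matter, converts both columns to Date before subtracting:

```r
# Sample data from the question
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))

# Parse the character column directly as Date (month/day/four-digit year)
x_date <- as.Date(x, format = "%m/%d/%Y")

# Format the POSIXct column in its own (local) time zone before converting,
# so the calendar date cannot shift when that zone differs from UTC
y_date <- as.Date(format(y, "%Y-%m-%d"))

# Dates carry no clock time, so DST cannot introduce fractional days
day_diff <- abs(as.numeric(difftime(x_date, y_date, units = "days")))
day_diff  # 5634 5559 832
```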

Related

Format POSIX scenario in Dates

Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
I need to do this in R.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but you are missing a small detail. The second argument of as.Date must describe exactly the format your dates come in. Right now, "%d/%m/%y" says your date looks like 15/08/47. That is wrong in three ways: your date has no slashes, the month is not a number but an abbreviated month name (%b), and the year has four digits (%Y, not %y). The correct way to parse this date would be:
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
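Note that %b (and %B) match month names in the current LC_TIME locale, so "Aug" may fail to parse (giving NA) on a machine with a non-English locale. The following sketch temporarily switches to the "C" locale and returns the elapsed days as a plain number. Restoring the previous locale afterwards is a design choice, so the rest of the session is unaffected:

```r
# %b is locale-dependent, so switch LC_TIME to "C" (English month names)
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

ps <- "%d%b%Y"
independence <- as.Date("15Aug1947", ps)
as_of <- as.Date("15Aug2018", ps)

# Later date first, so the difference is positive
elapsed <- as.numeric(difftime(as_of, independence, units = "days"))

# Restore the user's original locale
invisible(Sys.setlocale("LC_TIME", old_locale))

elapsed  # 25933
```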
You can also use a package such as lubridate, which parses dates automatically.
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
dt <- c(as.POSIXct("15Aug1947", format = "%d%b%Y", tz = "UTC"), as.POSIXct("15Aug2018", format = "%d%b%Y", tz = "UTC"))
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days

Identify Week in Column of Dates and Generate Automatic Dataframe Subsets per Week

I want to automate code that calculates transport times. I would like the code to take four months, chosen from a big year-long readout, split the last of those months into its four weeks, and just describe the data subsets (describing them is not the problem).
Generating subsets from a dataset for chosen months is not the problem because I can define the months.
Where I struggle is the three or four weeks of the last month: I need to identify them automatically and then generate the subsets. (I hope generating the subsets will be easier once the weeks are identified.)
I can give you a little mock-up of my data.
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
"2019-01-15", "2019-01-21"))
number <- c(12,13,14,15,20)
df <- data.frame(number, dates)
The original df consists of 60 variables, but I believe this simple mock-up provides enough information for the task.
I am pretty new to R and have no idea how to solve this. Here is how I solved it for the months, but as I said, in that case they are defined:
function(data = df, m1 = "01", m2 = "02") {
  Monat1 <- subset(data, format.Date(dates, "%m") == m1)
  # (the other months are subset the same way)
}
Thank you for helping me out a bit.
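You can use the function strftime; "%W" gives the week of the year (00-53), with Monday as the first day of the week:
strftime(df$dates, format = "%W")
In RStudio (or any R console), run
?strftime
to see all the different values you can extract from a Date or POSIXct object.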
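Building on %W, here is a minimal sketch that takes the last chosen month, tags each row with its week of the year, and splits the rows into one data frame per week. The helper name split_month_by_week and its argument m are hypothetical. The sketch assumes the data covers a single year, because only the month is compared. Days before the first Monday of the year fall into week "00":

```r
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
                   "2019-01-15", "2019-01-21"))
number <- c(12, 13, 14, 15, 20)
df <- data.frame(number, dates)

# Hypothetical helper: subset one month, then split it by calendar week
split_month_by_week <- function(data, m) {
  # Keep only the rows of the requested month ("01" = January)
  month_rows <- subset(data, format(dates, "%m") == m)
  # Week of the year, Monday as the first day of the week
  week <- format(month_rows$dates, "%W")
  # Named list with one data frame per week, e.g. "01", "02", ...
  split(month_rows, week)
}

weekly <- split_month_by_week(df, "01")
names(weekly)  # "01" "02" "03"
```

Each element of weekly can then be passed to whatever describing code is already used for the monthly subsets.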
You can do it using base R and lubridate
Data
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
"2019-01-15", "2019-01-21"))
number <- c(12,13,14,15,20)
df <- data.frame(number, dates)
str(df)
Answer
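library(lubridate)
# Month number of the previous calendar month (also correct in January)
prev_month <- month(floor_date(Sys.Date(), "month") - 1)
df$condition <- ifelse(month(df$dates) == prev_month, week(df$dates), "-")
condition checks whether each date falls in the previous calendar month (the year is not compared) and, if so, gives its week number; all other rows get "-".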

Convert multiple date format factor to date type in R

I have a variable in a data frame that holds dates in different formats (month-year), for example Jan-62, 98-Apr, March-1987.
The variable is currently a factor. I need help converting it to Date or POSIXct. I tried the function parse_date_time from the lubridate package, which helped a little, but Jan-62 is parsed as 01/01/2062 when it should be 01/01/1962. I tried cutoff_2000, but I'm not getting the desired output.
I would appreciate your help.
Regards,
Aravindan S
Use parse_date_time and then subtract 100 years from those dates whose year is beyond 2019:
library(lubridate)
x <- factor( c("Jan-62", "98-Apr", "March-1987") ) # input
p <- parse_date_time(x, c("my", "ym", "mY"))
year(p) <- year(p) - 100 * (year(p) > 2019)
p
## [1] "1962-01-01 UTC" "1998-04-01 UTC" "1987-03-01 UTC"
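If you would rather not depend on lubridate for the correction step, the same century fix can be sketched in base R using POSIXlt. Its year field counts years since 1900. The function name fix_century and the 2019 cutoff are illustrative assumptions that mirror the answer above, and the input is assumed to be already-parsed dates:

```r
# Hypothetical helper: move dates later than max_year back by 100 years
fix_century <- function(d, max_year = 2019) {
  lt <- as.POSIXlt(d)                   # list-like date with a $year field
  too_late <- lt$year + 1900 > max_year # TRUE where the century is wrong
  lt$year <- lt$year - 100L * too_late  # subtract 100 years only there
  as.Date(lt)
}

parsed <- as.Date(c("2062-01-01", "1998-04-01", "1987-03-01"))
fixed <- fix_century(parsed)
fixed  # "1962-01-01" "1998-04-01" "1987-03-01"
```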
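If all values share a single format, you can use the function as.Date:
yourvariable<- as.Date(yourvariable, "%m/%d/%Y")
(%m is the month)
(%d is the day)
(%Y is the four-digit year)
Note that this only parses values like 01/31/1962; mixed month-year strings such as Jan-62 or 98-Apr need a different format for each pattern.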

R: Summing POSIXct-objects using tapply

I'm trying to sum a set of POSIXct objects by a factor variable, but am getting an error that sum is not defined for POSIXt objects. However, it works fine if I just calculate the mean. But how can I get the summed times by group using tapply?
Example:
data <- data.frame(time = c("2:50:04", "1:24:10", "3:10:43", "1:44:26", "2:10:19", "3:01:04"),
group = c("A","A","A","B","B","B"))
data$group <- as.factor(data$group)
data$time <- as.POSIXct(paste("1970-01-01", data$time), format="%Y-%m-%d %H:%M:%S", tz="GMT")
# works
tapply(data$time, data$group, mean)
# doesn't work
tapply(data$time, data$group, sum)
Date-time objects cannot be summed: it does not make sense semantically, and the + operator is not defined between two POSIXct objects.
You probably want to model the values as time differences and sum those instead.
Try:
times <- as.difftime(c("2:50:04", "1:24:10", "3:10:43",
"1:44:26", "2:10:19", "3:01:04"), "%H:%M:%S")
sum(times)
A difftime object is also what you get when you subtract two date objects (which is semantically reasonable).
EDIT:
A complete solution for the OP's problem, in a semantically neater way (tapply seems to strip the difftime class, so use group_by from the dplyr package instead):
library(dplyr)
times <- as.difftime(c("2:50:04", "1:24:10", "3:10:43",
"1:44:26", "2:10:19", "3:01:04"), format="%H:%M:%S")
data <- data.frame(time = times, group = c("A","A","A","B","B","B"))
summarise(group_by(data, group), sum(time))
This gives the following output:
Source: local data frame [2 x 2]
group sum(time)
1 A 7.415833 hours
2 B 6.930278 hours
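If you want to stay with tapply, here is a minimal sketch based on the question's setup. Every time was pasted onto 1970-01-01 in GMT, so as.numeric() on the POSIXct column returns exactly the clock time in seconds. Those plain numbers can be summed per group and then wrapped back into a difftime:

```r
data <- data.frame(time = c("2:50:04", "1:24:10", "3:10:43", "1:44:26", "2:10:19", "3:01:04"),
                   group = factor(c("A", "A", "A", "B", "B", "B")))
data$time <- as.POSIXct(paste("1970-01-01", data$time),
                        format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

# Seconds since 1970-01-01 00:00:00 GMT, i.e. the clock time in seconds
secs <- as.numeric(data$time)

# Plain numbers can be summed by tapply without any class being lost
totals <- tapply(secs, data$group, sum)
total_secs <- setNames(as.vector(totals), names(totals))

# Back to a difftime, expressed in hours
total_time <- as.difftime(total_secs, units = "secs")
units(total_time) <- "hours"
total_time  # A = 7.415833, B = 6.930278 hours
```

This trick relies on the 1970-01-01 GMT anchor. With any other date or time zone, you would need to subtract the midnight of that day first.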

using date to calculate how many days since first date in R?

I have several dates of measurement for the same specimen. I'm trying to figure out the first day I have and the 2nd, 3rd, 4th...up to 6th day.
Here is the data.
First I took data$start and split it:
#split timestamp into separate date and time vars
temp<-strsplit(as.character(data$start), " ")
mat<-matrix(unlist(temp), ncol=2, byrow=TRUE)
df<-as.data.frame(mat)
colnames(df)<-c("date", "time")
data<-cbind(df, data)
then
data$date<-as.Date(data$date, "%Y-%m-%d")
data$dob <- ave(as.numeric(data$date), data$mcode, FUN = min)
data$dob <- data$dob - 1
data$pnday <- as.numeric(data$date) - data$dob
Both the pnday and dob columns contain NAs. Sorry if this is silly; any ideas?
I'm new to working with dates/times in R
If you format your data as an xts, it will automatically be ordered by date from first to last. To do this, you'll need to make your timestamp readings into POSIXct instances and pass them to the order.by parameter of its constructor. I hope this helps you. If you need further help, please leave a comment, and I'll dig into your data.
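The question's data is not shown, so the following is a sketch on made-up rows (the mcode and start values are hypothetical). One likely cause of the NAs is that ave() with FUN = min returns NA for a whole specimen as soon as one of its dates is NA. Passing na.rm = TRUE avoids that. Parsing the timestamp with an explicit format also makes the strsplit step unnecessary, because strptime ignores the trailing time:

```r
# Hypothetical measurements: two specimens, one missing timestamp
data <- data.frame(
  mcode = c("a", "a", "a", "b", "b"),
  start = c("2019-03-01 10:00:00", "2019-03-02 09:30:00", "2019-03-04 11:15:00",
            "2019-03-10 08:00:00", NA)
)

# The trailing clock time is ignored, so no strsplit() is needed
data$date <- as.Date(data$start, format = "%Y-%m-%d")

# First date per specimen; na.rm = TRUE stops one NA from spoiling the group
first_date <- ave(as.numeric(data$date), data$mcode,
                  FUN = function(d) min(d, na.rm = TRUE))

# Day 1 is the first measurement day, day 2 the next calendar day, and so on
data$pnday <- as.numeric(data$date) - first_date + 1
data$pnday  # 1 2 4 1 NA
```

If a specimen has no valid dates at all, min() returns Inf with a warning, so such groups may need to be dropped first.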
