Difference between two dates in years days hours minutes format [duplicate] - datetime

I am trying to express the difference between two dates in "xx years xx days xx hours xx minutes" format. My start date is in column A and my end date is in column B. I calculate the difference in column C.
I am using the formula B1-A1 in column C with the custom format yy" years "dd" days "hh" hours "mm" minutes". It works fine if the start and end years are different. But if they are the same, column C reports incorrect years and days (it shows 99 years and 30 days).
How can I fix it?
Sample google sheet link

Your B2 - A2 formula is actually getting the correct result: it is a date serial value that represents the difference between the two moments in units of days.
The problem is with the custom number format you are using. It refers to calendar time, while you would need elapsed time. Unfortunately, Google Sheets apparently only supports elapsed time in units of hours, minutes and seconds.
If a text string result is OK with you, use this:
=datedif(A2, B2, "y") & " years " & datedif(A2, B2, "d") - 365 * datedif(A2, B2, "y") & " days " & text(B2 - A2 - int(B2 - A2), "HH ""hours"" mm ""minutes""")
The formula assumes that all years are 365 days long, which will not be correct when the date span crosses a leap day.
To get the average elapsed time, evaluate =B2 - A2 for all rows with an array formula, then calculate the average of those results, and finally format the result to your liking, like this:
=arrayformula( average( if( B2:B + A2:A, B2:B - A2:A, iferror(1/0) ) ) )
=datedif(0, G2, "y") & " years " & datedif(0, G2, "d") - 365 * datedif(0, G2, "y") & " days " & text(G2 - 0 - int(G2 - 0), "H ""hours"" m ""minutes""")
See cells F2:G2 in your sample spreadsheet.
See this answer for an explanation of how date and time values work in spreadsheets.
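The arithmetic behind the text formula can also be sketched outside Sheets. The following is a minimal R illustration, not part of the original answer: the two timestamps are made up, and it keeps the same simplifying assumption that every year has 365 days.

```r
# Hypothetical start and end moments (assumed values for illustration)
start <- as.POSIXct("2021-03-10 08:15:00", tz = "UTC")
end   <- as.POSIXct("2021-04-09 10:45:00", tz = "UTC")

# Elapsed time in seconds, then split into units
secs <- as.numeric(difftime(end, start, units = "secs"))
year_secs <- 365 * 86400            # assumes 365-day years, like the formula

years   <- as.integer(secs %/% year_secs)
rem     <- secs %% year_secs
days    <- as.integer(rem %/% 86400)
hours   <- as.integer((rem %% 86400) %/% 3600)
minutes <- as.integer((rem %% 3600) %/% 60)

result <- sprintf("%d years %d days %d hours %d minutes",
                  years, days, hours, minutes)
print(result)
```

Because the units are derived from the elapsed seconds rather than from a calendar date, a span inside a single year yields 0 years instead of a calendar-style year number.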

A bit long, but it works. I would have liked to use TEXT, but the day count needs extra conditions: months have different lengths, so the formula has to look at the actual number of days in the month before the end date.
Formula:
=datedif(A2, B2, "y")&" years "
&if(day(B2) >= day(A2),
if(index(split(B2, " "), ,2) >= index(split(A2, " "), ,2),
day(B2) - day(A2),
day(B2) - day(A2) - 1),
if(index(split(B2, " "), ,2) >= index(split(A2, " "), ,2),
day(eomonth(B2, -1)) + day(B2) - min(day(A2), day(eomonth(B2, -1))),
day(eomonth(B2, -1)) + day(B2) - min(day(A2), day(eomonth(B2, -1))) -1))&" days "
&hour(B2 - A2)&" hours "
&minute(B2 - A2)&" minutes"
Note:
The formula takes care of the formatting.
The day calculation is complicated because months differ in length, but it should give a more accurate answer.
index(split(B2, " "), ,2) >= index(split(A2, " "), ,2) was added to the original formula to handle a start time that is later in the day than the end time; in that case the day count is reduced by 1.
Reference:
https://stackoverflow.com/a/55222439/14606045

try:
=INDEX(IFERROR(TRIM(REGEXREPLACE(
TEXT( DATEDIF(A2:A, B2:B, "Y"), "0#")&CHOOSE(MATCH( DATEDIF(A2:A, B2:B, "Y"), {0,1,2})," years ", " year ", " years ")&
TEXT(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), "0#")&CHOOSE(MATCH(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), {0,1,2})," days ", " day ", " days ")&
TEXT( TEXT(B2:B-A2:A, "H"), "0#")& CHOOSE(MATCH(TEXT(B2:B-A2:A, "H") *1, {0,1,2})," hours ", " hour ", " hours ")&
TEXT(INT(TEXT(B2:B-A2:A, "M.S")), "0#")& CHOOSE(MATCH(TEXT(B2:B-A2:A, "M.S")*1, {0,1,2})," minutes"," minute"," minutes"),
"\b0 (year(s)|day(s)|hour(s)|minute(s))", ))))
works with leap years
works with singular
works with plural
works with arrays
works with seconds
works with dates only
works with time only
works with less than a day
works with blank rows
excludes all null values
to remove only leading null values:
=INDEX(IFERROR(TRIM(REGEXREPLACE(
TEXT( DATEDIF(A2:A, B2:B, "Y"), "00")&CHOOSE(MATCH( DATEDIF(A2:A, B2:B, "Y"), {0,1,2})," years ", " year ", " years ")&
TEXT(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), "00")&CHOOSE(MATCH(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), {0,1,2})," days ", " day ", " days ")&
TEXT( TEXT(B2:B-A2:A, "H"), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "H") *1, {0,1,2})," hours ", " hour ", " hours ")&
TEXT(INT(TEXT(B2:B-A2:A, "M.S")), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "M.S")*1, {0,1,2})," minutes"," minute"," minutes"),
"^\b(?:00 years )?(?:00 days )?(?:00 hours )?(?:00 minutes)?", ))))
to NOT remove any null values:
=INDEX(IFERROR(REGEXREPLACE(
TEXT( DATEDIF(A2:A, B2:B, "Y"), "00")&CHOOSE(MATCH( DATEDIF(A2:A, B2:B, "Y"), {0,1,2})," years ", " year ", " years ")&
TEXT(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), "00")&CHOOSE(MATCH(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), {0,1,2})," days ", " day ", " days ")&
TEXT( TEXT(B2:B-A2:A, "H"), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "H") *1, {0,1,2})," hours ", " hour ", " hours ")&
TEXT(INT(TEXT(B2:B-A2:A, "M.S")), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "M.S")*1, {0,1,2})," minutes"," minute"," minutes"),
"^\b00 years 00 days 00 hours 00 minutes", )))
to have it aligned with Mars, Jupiter and Saturn:
=INDEX(IFERROR(REGEXREPLACE(REGEXREPLACE(
TEXT( DATEDIF(A2:A, B2:B, "Y"), "00")&CHOOSE(MATCH( DATEDIF(A2:A, B2:B, "Y"), {0,1,2})," years ", " year ", " years ")&
TEXT(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), "000")&CHOOSE(MATCH(IF(B2:B-A2:A<1,,DATEDIF(A2:A, B2:B, "YD")), {0,1,2})," days ", " day ", " days ")&
TEXT( TEXT(B2:B-A2:A, "H"), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "H") *1, {0,1,2})," hours ", " hour ", " hours ")&
TEXT(INT(TEXT(B2:B-A2:A, "M.S")), "00")&CHOOSE(MATCH(TEXT(B2:B-A2:A, "M.S")*1, {0,1,2})," minutes"," minute "," minutes"),
"^\b00 years 000 days 00 hours 00 minutes", ), "(0)(\d{2})", " $2")))
update:
for average use either:
=INDEX(AVERAGE(IF(B2:B+A2:A, B2:B-A2:A, )))
or:
=INDEX(AVERAGE(IFERROR(1/(1/(B2:B-A2:A)))))
depends on the sensitivity you need

Add an IF condition: if both values are the same, return 0; otherwise subtract:
=IF(A1=B1,"0",B1-A1)

This is because Google Sheets treats zero as the first day of its calendar, i.e. 1899-12-30. You can get a correct result with the following formula:
=TEXT(DATEDIF(A2,B2,"Y"),"00 \y\ear\s ") &
TEXT(TRUNC(EDATE(B2,-12*(DATEDIF(A2,B2,"Y")))-A2), "00 \da\y\s ") &
TEXT(ABS((B2-TRUNC(B2,0))-(A2-TRUNC(A2,0))+1),"hh \hour\s mm \minut\e\s")
or using arrayformula:
=ArrayFormula(IF(LEN(A2:A),
TEXT(DATEDIF(A2:A,B2:B,"Y"),"00 \y\ear\s ") &
TEXT(TRUNC(EDATE(B2:B,-12*(DATEDIF(A2:A,B2:B,"Y")))-A2:A), "00 \da\y\s ") &
TEXT(ABS((B2:B-TRUNC(B2:B,0))-(A2:A-TRUNC(A2:A,0))+1),"hh \hour\s mm \minut\e\s"),))
The formula takes into account the leap years and calculates elapsed days.

Related

What is wrong with this ifelse-command? (rows get excluded that don´t match the if-statement)

I wanted to exclude rows with participants who show error rates above 15%
When I look at the error rate of participant 2, it is for example 2,97%
semdata[2,"error_rate"]
[1] "2,97"
But if I run this ifelse statement, many participants get excluded whose error rates are not above 15% (e.g., this participant 2), while others are correctly kept.
for(i in 1:NROW(semdata)){
#single trial blocks
ifelse((semdata[i,"error_rate"] >= 15),print(paste(i, "exclusion: error rate ST too high",semdata[i,"dt_tswp.err.prop_st"])),0)
ifelse((semdata[i,"error_rate"] >= 15),semdata[i,6:NCOL(semdata)]<-NA,0)
#dual-task blocks
# ifelse((semdata[i,"error_rate"] >= 15),print(paste(i, "exclusion: error rate DT too high")),0)
# ifelse((semdata[i,"error_rate"] >= 15),semdata[i,6:NCOL(semdata)]<-NA,0)
}
[1] "1 exclusion: error rate ST too high 6,72"
[1] "2 exclusion: error rate ST too high 2,97"
[1] "7 exclusion: error rate ST too high 2,87"
[1] "9 exclusion: error rate ST too high 5,28"
...
What am I doing wrong here?
You are comparing strings here: error_rate is a character column, so R coerces 15 to "15" and compares the two values as text.
"6,72" > 15
#[1] TRUE
Convert the data to numeric before comparing. Use sub to replace the decimal comma with a point, then as.numeric:
as.numeric(sub(",", ".", "6,72"))
#[1] 6.72
This can be compared with 15.
as.numeric(sub(",", ".", "6,72")) > 15
#[1] FALSE
For the entire column you can do -
semdata$error_rate <- as.numeric(sub(",", ".", semdata$error_rate))
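Once the column is numeric, the row-by-row loop is not needed. Below is a minimal sketch on a made-up miniature of semdata; the columns id and rt_mean are hypothetical stand-ins for the real measurement columns.

```r
# Hypothetical miniature of semdata; only error_rate comes from the question
semdata <- data.frame(
  id = 1:4,
  error_rate = c("6,72", "2,97", "15,00", "21,40"),
  rt_mean = c(512, 498, 530, 601),
  stringsAsFactors = FALSE
)

# Replace the decimal comma with a point, then convert to numeric
semdata$error_rate <- as.numeric(sub(",", ".", semdata$error_rate, fixed = TRUE))

# Logical vector marking rows at or above the 15% threshold
exclude <- semdata$error_rate >= 15

# Set the measurement columns of excluded rows to NA (here only rt_mean)
semdata[exclude, "rt_mean"] <- NA

print(semdata)
```

With the real data, the column selection would be 6:NCOL(semdata) as in the question instead of "rt_mean".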

R: pasting (or combining) a variable amount of rows together as one

I have a text file I am trying to parse and put the information into a data frame. In each one of the 'events' there may or may not be some notes with it. However the notes can span various amounts of rows. I need to concatenate the notes for each event into one string to store in a column of the data frame.
ID: 20470
Version: 1
notes:
ID: 01040
Version: 2
notes:
The customer was late.
Project took 20 min. longer than anticipated
Work was successfully completed
ID: 00000
Version: 1
notes:
Customer was not at home.
ID: 00000
Version: 7
notes:
Fax at 2:30 pm
Called but no answer
Visit home no answer
Left note on door with call back number
Made a final attempt on 12/5/2013
closed case on 12/10 with nothing resolved
So for example for the second event the notes should be one long string: "The customer was late. Project took 20 min. longer than anticipated Work was successfully completed", which would then be stored in the notes column of the data frame.
For each event I know how many rows the notes span.
Something like this (actually, you would be happier and learn more figuring it out yourself, I was just procrastinating between two tasks):
x <- readLines("R/xample.txt") # you'll probably read it from a file
ids <- grep("^ID:", x) # detecting lines starting with ID:
versions <- grep("^Version:", x)
notes <- grep("^notes:", x)
nStart <- notes + 1 # lines where the notes start
nEnd <- c(ids[-1]-1, length(x)) # notes end one line before the next ID: line
ids <- sapply(strsplit(x[ids], ": "), "[[", 2)
versions <- sapply(strsplit(x[versions], ": "), "[[", 2)
notes <- mapply(function(i,j) paste(x[i:j], collapse=" "), nStart, nEnd)
df <- data.frame(ID=ids, ver=versions, note=notes, stringsAsFactors=FALSE)
dput of data
> dput(x)
c("ID: 20470", "Version: 1", "notes: ", " ", " ", "ID: 01040",
"Version: 2", "notes: ", " The customer was late.", "Project took 20 min. longer than anticipated",
"Work was successfully completed", "", "ID: 00000", "Version: 1",
"notes: ", " Customer was not at home.", "", "ID: 00000", "Version: 7",
"notes: ", " Fax at 2:30 pm", "Called but no answer", "Visit home no answer",
"Left note on door with call back number", "Made a final attempt on 12/5/2013",
"closed case on 12/10 with nothing resolved ")
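A variation on the same idea, offered as a sketch rather than a replacement: label every line with a running count of "ID:" lines, split by that label, and parse each event separately. It assumes each event starts with an ID line followed by a Version line and contains a notes: line; the helper name parse_event is made up, and the input is a shortened copy of the dput data.

```r
# Shortened copy of the sample lines (first two events only)
x <- c("ID: 20470", "Version: 1", "notes: ", " ", " ",
       "ID: 01040", "Version: 2", "notes: ", " The customer was late.",
       "Project took 20 min. longer than anticipated",
       "Work was successfully completed", "")

# Each "ID:" line starts a new event, so a running count labels every line
event <- cumsum(grepl("^ID:", x))

# Hypothetical helper: turn the lines of one event into a one-row data frame
parse_event <- function(lines) {
  note_start <- grep("^notes:", lines) + 1
  note_lines <- if (note_start <= length(lines)) {
    lines[note_start:length(lines)]
  } else {
    character(0)
  }
  note_lines <- trimws(note_lines)
  data.frame(
    ID = sub("^ID: *", "", lines[1]),
    ver = sub("^Version: *", "", lines[2]),
    # drop empty lines, then join the rest with single spaces
    note = paste(note_lines[nzchar(note_lines)], collapse = " "),
    stringsAsFactors = FALSE
  )
}

df <- do.call(rbind, lapply(split(x, event), parse_event))
print(df)
```

Trimming each note line and dropping the empty ones avoids the stray leading and trailing spaces that a plain paste over the line range would keep.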

R weatherData Detroit station error

I keep getting errors in my attempt to pull weather data for Detroit airport. I am able to manually go to wunderground.com to get the historical hourly data, so it does exist there for the Detroit location. But the R package keeps giving me errors. I used "KDTW" for the airport code but it did not work. I tried "72537" for the station ID, which I got by using getStationCode("Detroit"). I would appreciate any help with getting hourly historical data for any station close to Detroit for the time interval January 1, 2017 through March 28, 2017.
Here is what I tried:
install.packages("weatherData")
library ('weatherData')
getStationCode("Detroit")
checkDataAvailabilityForDateRange(station_type ="KARB", start_date="2017-01-01", end_date="2017-03-28")
checkDataAvailabilityForDateRange(station_id ="KDTW", start_date="2017-01-01", end_date="2017-03-28")
Thank you!
I was able to get the weather data for KDTW.
temp <- getWeatherForDate(station_id = "KDTW", start_date = "2017-01-01", end_date="2017-01-10")
Here is my output for the date range
structure(c("Min. :2017-01-01 00:00:00 ", "1st Qu.:2017-01-03 06:00:00 ",
"Median :2017-01-05 12:00:00 ", "Mean :2017-01-05 12:00:00 ",
"3rd Qu.:2017-01-07 18:00:00 ", "Max. :2017-01-10 00:00:00 ",
"Min. :17.0 ", "1st Qu.:18.5 ", "Median :34.5 ", "Mean :31.9 ",
"3rd Qu.:42.5 ", "Max. :49.0 ", "Min. :11.00 ", "1st Qu.:13.75 ",
"Median :26.50 ", "Mean :25.20 ", "3rd Qu.:34.25 ", "Max. :41.00 ",
"Min. : 5.0 ", "1st Qu.: 9.0 ", "Median :17.5 ", "Mean :18.2 ",
"3rd Qu.:26.0 ", "Max. :37.0 "), .Dim = c(6L, 4L), .Dimnames = list(
c("", "", "", "", "", ""), c(" Date", "Max_TemperatureF",
"Mean_TemperatureF", "Min_TemperatureF")), class = "table")
I just got temperature data, but if you'd like, you can also get other information (like humidity, cloud cover, etc) by changing the opt_all_columns flag to TRUE.
Hope that Helps!
Edit: Looking at your code you have station_type ="KARB" Did you mean that to be station_id?

how to obtain days between two dates in R

I have a data set like below:
money date1 date2
"300" "10/30 " " 11/1"
"400" "10/28 " " 10/31"
"360" "10/28 " " 10/30"
"440" "10/25 " " 10/28"
"620" "10/21 " " 10/28"
I want to extract the days between two dates such as 10/30,10/31,and 11/1 for the first line. In addition, my code should assign a number to each extracted day. This number should be money/(# of days). As an example I would like to obtain 10/30,10/31,and 11/1 and 300/3 (i.e.=100),300/3,300/3 for each one. Does anyone have any idea about this?
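This will give the total for every day during the time period (the year 2012 is an assumption, since the data has none):
# trim the stray spaces and append a year so the strings parse as dates
data$date1 <- as.Date(paste0(trimws(data$date1), "/2012"), "%m/%d/%Y")
data$date2 <- as.Date(paste0(trimws(data$date2), "/2012"), "%m/%d/%Y")
# money is stored as text; both end dates count, so 10/30 to 11/1 is 3 days
data$perday <- with(data, as.numeric(money) / (as.numeric(date2 - date1) + 1))
period <- as.Date(min(data$date1):max(data$date2), origin = "1970-01-01")
sum <- sapply(period, function(x) sum(data[x >= data$date1 & x <= data$date2, 'perday']))
sumperday <- data.frame(period, sum)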
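To list the individual days of each row together with that row's per-day amount, as the question asks, one option is to expand every row with seq. This is a minimal sketch on two made-up rows with dates that are already parsed; the year 2012 and the result name per_day are assumptions.

```r
# Two sample rows with already-parsed dates (year assumed)
data <- data.frame(
  money = c(300, 400),
  date1 = as.Date(c("2012-10-30", "2012-10-28")),
  date2 = as.Date(c("2012-11-01", "2012-10-31"))
)

# For each row, list every day from date1 to date2 (inclusive)
# and split the money evenly across those days
per_day <- do.call(rbind, lapply(seq_len(nrow(data)), function(i) {
  days <- seq(data$date1[i], data$date2[i], by = "day")
  data.frame(day = days, amount = data$money[i] / length(days))
}))

print(per_day)
```

The first row expands to 10/30, 10/31 and 11/1 with 300/3 each, matching the example in the question.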

From timespan (for example "15 min" or "2 sec") to "00:15:00" or "00:00:02"

I have searched the help everywhere for an R function that would convert a timespan, for example "15 min" or "1 hour" or "6 sec" or "1 day", into a datetime object like "00:15:00" or "01:00:00" or "00:00:06" or "1960-01-02 00:00:00" (not sure about this last one). I am sure a function like this exists or there is a neat way to avoid programming it...
To be more specific I would like to do something like this (using made up function name transform.span.to.time):
library(chron)
times(transform.span.to.time("15 min"))
which should yield the same result as
times("00:15:00")
Does a function like transform.span.to.time("15 min"), which returns something like "00:15:00", exist, or is there a trick for doing that?
We will assume a single space separating the numbers and units, and also no trailing space after "secs" unit. This will handle mixed units:
test <- "0 hours 15 min 0 secs"
transform.span <- function(test){
testh <- if(!grepl( " hour | hours ", test)){
# First consequent if no hours
sub("^", "0:", test)} else {
sub(" hour | hours ", ":", test)}
testm <- if(!grepl( " min | minutes ", testh)) {
# first consequent if no minutes
sub(" min | minutes ", "0:", testh)} else{
sub(" min | minutes ", ":", testh) }
test.s <- if(!grepl( " sec| secs| seconds", testm)) {
# first consequent if no seconds
sub(" sec| secs| seconds", "0", testm)} else{
sub(" sec| secs| seconds", "", testm)}
return(times(test.s)) }
### Use
> transform.span(test)
[1] 00:15:00
> test2 <- "21 hours 15 min 38 secs"
> transform.span(test2)
[1] 21:15:38
The first solution uses strapply in the gsubfn package and transforms to days, e.g. 1 hour is 1/24th of a day. The second solution transforms to an R expression which calculates the number of days and then evaluates it.
library(gsubfn)
library(chron)
unit2days <- function(d, u)
as.numeric(d) * switch(tolower(u), s = 1, m = 60, h = 3600)/(24 * 3600)
transform.span.to.time <- function(x)
sapply(strapply(x, "(\\d+) *(\\w)", unit2days), sum)
Here is a second solution:
library(chron)
transform.span.to.time2 <- function(x) {
x <- paste(x, 0)
x <- sub("h\\w*", "*3600+", x, ignore.case = TRUE)
x <- sub("m\\w*", "*60+", x, ignore.case = TRUE)
x <- sub("s\\w*", "+", x, ignore.case = TRUE)
unname(sapply(x, function(x) eval(parse(text = x)))/(24*3600))
}
Tests:
> x <- c("12 hours 3 min 1 sec", "22h", "18 MINUTES 23 SECONDS")
>
> times(transform.span.to.time(x))
[1] 12:03:01 22:00:00 00:18:23
>
> times(transform.span.to.time2(x))
[1] 12:03:01 22:00:00 00:18:23
The base function ?cut.POSIXt does this work for a specified set of values for breaks:
breaks: a vector of cut points _or_ number giving the number of
intervals which ‘x’ is to be cut into *_or_ an interval
specification, one of ‘"sec"’, ‘"min"’, ‘"hour"’, ‘"day"’,
‘"DSTday"’, ‘"week"’, ‘"month"’, ‘"quarter"’ or ‘"year"’,
optionally preceded by an integer and a space, or followed by
‘"s"’. For ‘"Date"’ objects only ‘"day"’, ‘"week"’,
‘"month"’, ‘"quarter"’ and ‘"year"’ are allowed.*
See the source code by typing in cut.POSIXt, the relevant section starts with this:
else if (is.character(breaks) && length(breaks) == 1L) {
You could adapt the code in this section to work for your needs.
You can define the time span with difftime:
span2time <- function(span, units = c('mins', 'secs', 'hours')) {
span.dt <- as.difftime(span, units = match.arg(units))
format(as.POSIXct("1970-01-01") + span.dt, "%H:%M:%S")
}
For example:
> span2time(15)
[1] "00:15:00"
EDIT: modified to produce character string acceptable to chron's times.
@DWin: thank you.
Based on DWin example I rearranged a bit and here is the result:
transform.span<-function(timeSpan) {
timeSpanH <- if(!grepl(" hour | hours | hour| hours|hour |hours |hour|hours", timeSpan)) {
# First consequent if no hours
sub("^", "00:", timeSpan)
} else {
sub(" hour | hours | hour| hours|hour |hours |hour|hours", ":", timeSpan)
}
timeSpanM <- if(!grepl( " min | minutes | min| minutes|min |minutes |min|minutes", timeSpanH)) {
# first consequent if no minutes
paste("00:", timeSpanH, sep="")
} else{
sub(" min | minutes | min| minutes|min |minutes |min|minutes", ":", timeSpanH)
}
timeSpanS <- if(!grepl( " sec| secs| seconds|sec|secs|seconds", timeSpanM)) {
# first consequent if no seconds
paste(timeSpanM, "00", sep="")
} else{
sub(" sec| secs| seconds|sec|secs|seconds", "", timeSpanM)
}
return(timeSpanS)
}
### Use
test <- "1 hour 2 min 1 sec"
times(transform.span(test))
test1hour <- "1 hour"
times(transform.span(test1hour))
test15min <- "15 min"
times(transform.span(test15min))
test4sec <- "4 sec"
times(transform.span(test4sec))
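A shorter alternative, offered only as a sketch: instead of rewriting the string step by step, extract the number in front of each unit and build the "HH:MM:SS" string from the total seconds. The function name span_to_hms is made up; it handles a single string, recognises units by their first letter (h, m, s), and does not handle days.

```r
span_to_hms <- function(span) {
  # Number that directly precedes a unit starting with `prefix`;
  # 0 when that unit does not occur in the string
  get_unit <- function(prefix) {
    pattern <- paste0("[0-9]+(?= *", prefix, ")")
    m <- regmatches(span, regexpr(pattern, span, perl = TRUE, ignore.case = TRUE))
    if (length(m) == 0) 0 else as.numeric(m)
  }
  total <- get_unit("h") * 3600 + get_unit("m") * 60 + get_unit("s")
  sprintf("%02d:%02d:%02d",
          as.integer(total %/% 3600),
          as.integer((total %% 3600) %/% 60),
          as.integer(total %% 60))
}

print(span_to_hms("1 hour 2 min 1 sec"))
print(span_to_hms("15 min"))
```

The returned string has the "HH:MM:SS" shape used in the question, so it can be passed on to times() from chron.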
