I imported date variables as strings from SQL (date1) into Stata and then created a new date variable (date2) like this:
gen double date2 = clock(date1, "YMDhms")
format date2 %tc
However, now I want to calculate the number of days between two dates (date3-date2), formatted as above, but I can't seem to do it.
I don't care about the hms, so perhaps I should strip that out first? And then deconstruct the date into YYYY MM DD as separate variables? Nothing I do seems to be working right now.
It sounds like by dates you actually mean timestamp (aka datetime) variables. In my experience, there's usually no need to cast dates/timestamps as strings since ODBC and Stata will handle the conversion to SIF td/tc formats nicely.
But perhaps you exported to a text file and then read in the data instead. Here are a couple of solutions.
tc timestamps are in milliseconds since 01jan1960 00:00:00.000, assuming 1000*60*60*24 = 86,400,000 milliseconds per day (that is, ignoring leap seconds). This means that you need to divide your difference by that number to get elapsed days.
For example, 2016 was a leap year:
. display (tc(01jan2017 00:00:00) - tc(01jan2016 00:00:00))/(1000*60*60*24)
366
You can also use the dofc() function to make dates out of timestamps and omit the division:
. display (dofc(tc(01jan2018 00:00:00)) - dofc(tc(01jan2016 00:00:00)))
731
2017 is not a leap year, so 366 + 365 = 731 days.
You can use generate with all these functions, though display is often easier for debugging initial attempts.
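If it helps to see the arithmetic outside Stata, here is a minimal sketch in Python (not Stata) of the same two calculations. The helper names to_tc, dofc and days_between are my own for illustration and only mimic what the Stata functions do; leap seconds are ignored, as above.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 1000 * 60 * 60 * 24      # 86,400,000 milliseconds in one day
STATA_EPOCH = datetime(1960, 1, 1)    # 01jan1960 00:00:00.000

def to_tc(dt):
    """Milliseconds since the 1960 epoch (hypothetical stand-in for a %tc value)."""
    return (dt - STATA_EPOCH) // timedelta(milliseconds=1)

def dofc(tc):
    """Whole days since the 1960 epoch (hypothetical stand-in for Stata's dofc())."""
    return tc // MS_PER_DAY

def days_between(tc_start, tc_end):
    """Elapsed days between two millisecond timestamps, fractions included."""
    return (tc_end - tc_start) / MS_PER_DAY

start = to_tc(datetime(2016, 1, 1))
print(days_between(start, to_tc(datetime(2017, 1, 1))))   # division approach
print(dofc(to_tc(datetime(2018, 1, 1))) - dofc(start))    # date approach
```

The division approach keeps fractional days, while the dofc-style approach drops the time of day first, so pick whichever matches what "number of days" should mean for your data.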
I am lost with the following problem:
I have a dataframe in which one column (STARTED) contains the starting time of a survey, and several others contain the survey schedule of that participant (D5 to D10: only the planned survey dates, D17 to D50: planned send-out times of each measurement per day). I'd like to create two columns that indicate which survey day (1-6) and which measurement per day (1-6) this survey corresponds to.
First problem is the format (!)...
STARTED has the format %Y-%m-%d %H:%M:%S, D5 to D10 %d.%m.%Y and D17 to D50 %d.%m.%Y %H:%M.
I tried dmy_hms() from lubridate, parse_date_time(), and simply as.POSIXct(), but I always fail to get STARTED and the D17 to D50 section into a comparable format. Any solutions on this one?
After just separating STARTED into date & time columns, I was able to compare using ifelse() with D5 to D10 and to create the column of day running from 1 to 6.
This might be already more elegant with something like which(), but I was not able to create a vectorized version of this, as which(<<D5:D10>> == STARTED) would need to compare that per row. Does anyone have a solution for this?
And lastly, how on earth can I set up the second column indicating the measurement time? The first and last survey of the day are easy, as they are also uniquely labelled, but for the other four I would need to check, per day, whether the starting time is before the planned time of the following survey. I could imagine just checking whether STARTED falls between two adjacent planned survey times. With POSIXct objects that might work, if I can parse the different formats.
Help is greatly appreciated, thanks!
A screenshot from the beginning of the data:
Screenshot from R data using View()
For these first few rows, the intended variable day would need to be c(1,2,1,1,1,2,2) and measurement c(3,2,4,2,1,2,3).
Your other columns are not formatted with %d.%m.%Y; they use either %d.%m.%y (date only) or %d.%m.%y %H:%M. Note the change from %Y (four-digit year) to %y (two-digit year).
Try:
as.Date("20.05.22", format = "%d.%m.%y")
# [1] "2022-05-20"
as.POSIXct("20.05.22 06:00", format = "%d.%m.%y %H:%M")
# [1] "2022-05-20 06:00:00 EDT"
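Once all three formats parse, the day and measurement columns come down to comparisons. Below is a rough sketch of that logic in Python rather than R (the strptime codes happen to be the same). The function locate and the rule "measurement = number of planned send-out times on that day at or before STARTED, with a minimum of 1" are my assumptions about what you want, not something taken from your data.

```python
from bisect import bisect_right
from datetime import datetime

def parse_started(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def parse_planned_day(text):
    return datetime.strptime(text, "%d.%m.%y").date()

def parse_planned_time(text):
    return datetime.strptime(text, "%d.%m.%y %H:%M")

def locate(started, planned_days, planned_times):
    """Return (day, measurement), both 1-based, for one row (hypothetical helper)."""
    start = parse_started(started)
    days = [parse_planned_day(d) for d in planned_days]
    day = days.index(start.date()) + 1          # raises ValueError if no day matches
    same_day = sorted(t for t in map(parse_planned_time, planned_times)
                      if t.date() == start.date())
    # Count the planned times that have already passed; never report less than 1.
    measurement = max(1, bisect_right(same_day, start))
    return day, measurement

print(locate("2022-05-21 10:30:00",
             ["20.05.22", "21.05.22"],
             ["20.05.22 06:00", "20.05.22 09:00",
              "21.05.22 06:00", "21.05.22 09:00", "21.05.22 12:00"]))
```

In R the same idea would be a row-wise comparison of the parsed POSIXct values, but I have not tested that against your data.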
I have a dataset in .csv, and I have added a column of my own to the csv that holds the total time taken for a task to be completed. Two other columns contain the start time and the end time, and the total time taken column is calculated from those. The start time and end time columns are in the datetime format 5/7/2018 16:13, while the total time taken column is in the format 0:08:20 (H:MM:SS).
I understand that for datetime, it is possible to use the functions as.Date or as.POSIXlt to change the variable type from a factor to that of date. Is there a function that I can convert my total time taken column to (from that of factor) so that I can use it to plot scatterplots/plots in general? I tried as.numeric but the numbers that come out are gibberish and do not correspond to the original time.
If you want to plot the total time taken for each row, then I would suggest just plotting that difference as seconds. Here is a code snippet which shows how you can convert your start or end date into a numerical value:
start <- "5/7/2018 16:13"
start_date <- as.POSIXct(start, format="%d/%m/%Y %H:%M")
as.numeric(start_date)
[1] 1530799980
The above is a UNIX timestamp, which is the number of seconds since the epoch (January 1, 1970); the exact value depends on your session's time zone. But since you want a difference between start and end times, this detail does not really matter for you, and the difference you get should be valid.
If you want to use minutes, hours, or some other time unit, then you can easily convert.
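For what it's worth, here is the same idea sketched in Python instead of R, including a direct conversion of the H:MM:SS column. The function names are made up for illustration, and I am assuming the day comes first in 5/7/2018, as in the snippet above.

```python
from datetime import datetime

def hms_to_seconds(text):
    """Convert a duration such as '0:08:20' (H:MM:SS) to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def elapsed_seconds(start, end, fmt="%d/%m/%Y %H:%M"):
    """Seconds between two datetime strings that share the same format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

print(hms_to_seconds("0:08:20"))                             # duration column
print(elapsed_seconds("5/7/2018 16:13", "5/7/2018 16:21"))   # end minus start
print(hms_to_seconds("0:08:20") / 60)                        # same duration in minutes
```

Either route gives a plain number that any plotting function will accept.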
As the title says, I was wondering whether IDL is able to add or subtract days / months / years to or from a given date.
For example:
given_date = anytim('01-jan-2000')
print, given_date
1-Jan-2000 00:00:00.000
When I would add 2 weeks to the given_date, then this date should appear:
15-Jan-2000 00:00:00.000
I was already looking for a solution for this problem, but I unfortunately couldn't find any solution.
Note:
I am using a normal calendar date, not the julian date.
Are you only concerned with dates after 1582? Is accuracy to the second important?
The ANYTIM routine is not part of the IDL distribution. Possibly there are third party routines to handle time increments, but I don't know of any builtin to the IDL library.
By default, which is what you are using, ANYTIM returns seconds from Jan 1, 1979. So to add or subtract some number of days, weeks, or years, you could calculate the number of seconds in the time interval. Of course, this does not take into account leap seconds or leap years (leap years are fairly easy to account for, but leap seconds require a database of when they were added). Adding months also requires knowing which month you are in, in order to determine the number of days in it.
IDL can convert to and from Julian dates using JULDAY and CALDAT.
You may also read and write Julian dates (which are doubles or long integers) to and from strings using the format keyword to PRINT, STRING, and READS.
You'll want to use the (C()) calendar date format code.
format='(c(cdi0,"-",cMoa,"-",cyi04," ",cHi02,":",cmi02,":",csf06.3))'
date = julday(1, 1, 2000)
print, date, format=format
; 1-Jan-2000 00:00:00.000
date = date + 14
print, date, format=format
; 15-Jan-2000 00:00:00.000
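To make the calendar logic behind the answers above concrete, here is a small sketch in Python rather than IDL. Adding days or weeks is plain arithmetic, while adding months needs the length of the target month. The add_months helper and its rule of clamping to the last day of a shorter month are my own assumptions, one convention among several.

```python
import calendar
from datetime import datetime, timedelta

def add_months(dt, months):
    """Shift dt by a number of calendar months (hypothetical helper).

    If the target month is shorter, the day is clamped to its last day,
    so 31 January plus one month gives the last day of February.
    """
    year, month_index = divmod(dt.year * 12 + (dt.month - 1) + months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]   # number of days in that month
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))

given_date = datetime(2000, 1, 1)
print(given_date + timedelta(weeks=2))       # two weeks later
print(add_months(datetime(2000, 1, 31), 1))  # clamped to end of February
print(add_months(given_date, -1))            # one month earlier
```

Adding years can be done the same way with months = 12 * years, which also takes care of 29 February.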
I noticed some strange xts behaviour when trying to split an object that goes back a long way. The behaviour of split changes at the epoch.
#Create some data
dates <- seq(as.Date("1960-01-01"),as.Date("1980-01-01"),"days")
x <- rnorm(length(dates))
data <- xts(x, order.by=dates)
If we split the xts object by week, it defines the last day of the week as Monday prior to 1970. Post-1970, it defines it as Sunday (expected behaviour).
#Split the data, keep the last day of the week
lastdayofweek <- do.call(rbind, lapply(split(data, "weeks"), last))
head(lastdayofweek)
tail(lastdayofweek)
1960 Calendar
1979 Calendar
This seems to only be a problem for weeks, not months or years.
#Split the data, keep the last day of the month
lastdayofmonth <- do.call(rbind, lapply(split(data, "months"), last))
head(lastdayofmonth)
tail(lastdayofmonth)
The behaviour seems likely to be related to the following, though I am not sure why it would apply to weeks only. From the xts documentation on CRAN:
For dates prior to the epoch (1970-01-01) the ending time is aligned to the 59.0000 second. This is
due to a bug/feature in the R implementation of asPOSIXct and mktime0 at the C-source level. This
limits the precision of ranges prior to 1970 to 1 minute granularity with the current xts workaround.
My workaround has been to shift the dates before splitting the objects for data prior to 1970, if I am splitting on weeks. I expect someone else has a more elegant solution (or a way to avoid the error).
EDIT: To be clear as to what the question is, I am looking for an answer that
a) specifies why this happens (so I can understand the nature of the error better, and therefore avoid it) and/or
b) the best workaround to deal with it.
One "workaround" would be to check out Rev. 743 or earlier, as it appears to me that this broke in Rev. 744.
svn checkout -r 743 svn://svn.r-forge.r-project.org/svnroot/xts/
But, a much better idea is to file a bug report so that you don't have to use an old version forever. (also, of course, other bugs may have been patched and/or new features added since Rev. 743)
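On part (b) of the question, one way to avoid depending on how weeks are counted relative to the epoch is to group by the calendar weekday directly. The sketch below shows that logic in Python, not R or xts. The function names are hypothetical, and it does not explain or reproduce the xts behaviour; it only picks the last available date in each Monday-to-Sunday week.

```python
from datetime import date, timedelta

def week_end(day):
    """The Sunday that closes the Monday-to-Sunday week containing day."""
    return day + timedelta(days=6 - day.weekday())   # weekday(): Monday=0 ... Sunday=6

def last_day_of_each_week(days):
    """Last available date in each week, in chronological order."""
    last_seen = {}
    for day in sorted(days):
        last_seen[week_end(day)] = day   # later dates overwrite earlier ones
    return [last_seen[key] for key in sorted(last_seen)]

early = [date(1960, 1, 1) + timedelta(days=i) for i in range(12)]
late = [date(1979, 12, 24) + timedelta(days=i) for i in range(9)]
print(last_day_of_each_week(early))
print(last_day_of_each_week(late))
```

Because the grouping key comes from the weekday, dates before and after 1970 are treated identically.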
How can I accurately convert the results of the difftime calls below (in days) to years, months and days?
difftime(Sys.time(),"1931-04-10")
difftime(Sys.time(),"2012-04-10")
This does years and days but how could I include months?
yd.conv<-function(days, print=TRUE){
x<-days*0.00273790700698851
x2<-floor(x)
x3<-x-x2
x4<-floor(x3*365.25)
if (print) cat(x2,"years &",x4,"days\n")
invisible(c(x2, x4))
}
yd.conv(difftime(Sys.time(),"1931-04-10"))
yd.conv(difftime(Sys.time(),"2012-04-10"))
I'm not even sure how to define months. Would 4 weeks be considered a month, or the passing of the same day of the month? Under the latter definition, if the initial date was 2012-01-10 and the current date 2012-05-31, then we'd have 0 years, 4 months and 21 days. This works well, but what if the original date was on the 31st of the month and the end date was Feb 28? Would that be considered a month?
As I wrote this question the question itself evolved so I'd better clarify:
What would be the best (most logical approach) to defining months and then how to find diff time in years, months and days?
If you're doing something like
difftime(Sys.time(), someDate)
it is implied that you know what someDate is. In that case, you can convert the difference to a POSIXct object, which lets you extract temporal information directly (the chron package offers more methods, too). For instance
as.POSIXct(c(difftime(Sys.time(), someDate, units = "sec")), origin = someDate)
This will return your desired date object. If you have a timezone tz to feed into difftime, you can also pass that directly to the tz parameter in as.POSIXct.
Now that you have your date object, you can run things like months(.) and if you have chron you can do years(.) and days(.) (returns ordered factor).
From here, you could do simple arithmetic on the differences in years, months, and days separately (after converting them to appropriate numeric representations). Of course, someDate will need to be converted to POSIXct as well.
EDIT: On second thought, months(.) returns a character representation of the month, so that may not be efficient. At least, it'll require a little processing (not too difficult) to give a numeric representation.
R's omission of these features is not an oversight. A difftime is a bare duration with no start date attached, so a 700-day difference can span a different number of calendar years depending on whether the period includes a leap day. The same goes for months, which last between 28 and 31 days.
For research purposes, we use these units a lot (months and years) and pragmatically, we define a year as 365.25 days and a month as 365.25/12 = 30.4375 days.
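To do arithmetic on a given difftime, convert it to numeric with as.numeric(difftime.obj, units = "days"), which drops the units label so R stops printing it. Without the units argument you get the value in whatever units the difftime object currently carries, which is not always days.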
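As a quick illustration of the pragmatic convention above (a year is 365.25 days, a month is 30.4375 days), here is a sketch in Python rather than R. The function name split_days is made up, and the result is only an approximation by construction, since it knows nothing about the actual start date.

```python
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12    # 30.4375

def split_days(days):
    """Split a day count into (years, months, leftover days) using average lengths."""
    years, rest = divmod(days, DAYS_PER_YEAR)
    months, rest = divmod(rest, DAYS_PER_MONTH)
    return int(years), int(months), rest

print(split_days(700))
print(split_days(365.25))
```

The leftover days are fractional because the units themselves are averages.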
You can not simply convert a difftime to month, since the definition of months depends on the absolute time at which the difftime has started.
You'll need to know the start date or the end date to accurately tell the number of months.
You could then, e.g., calculate the number of months in the first year of your timespan and the number of months in the last year of the timespan, and add 12 times the number of full years in between.
Hmm. I think the most sensible would be to look at the various units themselves. So compare the day of the month first, then compare the month of the year, then compare the year. At each point, you can introduce a carry to avoid negative values.
In other words, don't work with the product of difftime, but recode your own difftime.
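Here is one way such a hand-rolled difference could look, sketched in Python rather than R. It counts whole calendar months from the start date first and then takes the leftover days, which has the same effect as comparing the units with a carry. The names add_months and ymd_diff are hypothetical, and clamping to the end of a shorter month (so 31 January to 28 February in a non-leap year counts as one month) is an assumption you may want to change.

```python
import calendar
from datetime import date

def add_months(day, months):
    """Shift a date by calendar months, clamping to the last day of short months."""
    year, month_index = divmod(day.year * 12 + (day.month - 1) + months, 12)
    month = month_index + 1
    return date(year, month, min(day.day, calendar.monthrange(year, month)[1]))

def ymd_diff(start, end):
    """Difference between two dates as (years, months, days); requires end >= start."""
    if end < start:
        raise ValueError("end must not be before start")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:   # day-of-month not reached yet: carry
        months -= 1
    anchor = add_months(start, months)    # start plus the whole months counted
    years, months = divmod(months, 12)
    return years, months, (end - anchor).days

print(ymd_diff(date(2012, 1, 10), date(2012, 5, 31)))
print(ymd_diff(date(2012, 1, 31), date(2012, 2, 28)))
print(ymd_diff(date(1931, 4, 10), date(2012, 4, 9)))
```

Counting months forward from the start date avoids the negative day counts that a naive borrow can produce when the previous month is shorter than the start day.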