I have data pulled from a database and stored in Stata .dta files. But when I read it into R using the foreign package, I get a date format unlike any I've seen. All of the other dates are "%m/%d/%Y" and import correctly.
I have searched the database's documentation, but there's no explanation for the odd date format for "DealActiveDate". The "facilitystartdate" date should be close to the "DealActiveDate", but not necessarily the same. Here are a few rows of these two columns.
facilitystartdate DealActiveDate
1 09/12/1987 874022400000
2 09/12/1987 874022400000
3 09/12/1987 874022400000
4 09/01/1987 873072000000
5 09/08/1987 873676800000
6 10/01/1987 875664000000
7 08/01/1987 870393600000
8 08/01/1987 870393600000
9 10/01/1987 875664000000
10 09/01/1987 873072000000
Please let me know if you have any idea how to convert "DealActiveDate" to a more conventional date. Thanks! (I'm not sure SO is the best venue, but I couldn't think of any other options!)
Looks like milliseconds since 1960-01-01:
as.POSIXct(874022400000/1000, origin="1960-01-01")
# [1] "1987-09-12 01:00:00 CDT"
When reading a column of dates from an Excel file, the following problem occurs.
Any help would be greatly appreciated.
test <- read_excel('test.xlsx')
Data to read
2017-03-03
2017-03-04
2017-03-05
2017-03-06
2017-03-07
2017-03-08
2017-03-09
2017-03-10
1010-01-01
After loading into R
test
A tibble: 9 x 1
test1
1 42797
2 42798
3 42799
4 42800
5 42801
6 42802
7 42803
8 42804
9 1010-01-01
Try defining the column type in the function call:
read_excel("test.xlsx", col_types = "date")
It looks like some cells are formatted as dates in Excel and others probably as text. If you fix that column in Excel by setting the correct format for it, the import should work as well.
EDIT: There was a screenshot in the question hinting that the data wasn't in the same format in all cells (the content was aligned differently). It has since been deleted.
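If re-reading or reformatting the file isn't an option, the numeric serials can also be converted after import (a sketch, assuming the Windows-Excel day zero of 1899-12-30; cells that were genuinely text, like "1010-01-01", become NA here and need separate handling):
serial <- suppressWarnings(as.numeric(test$test1))  # text cells -> NA
test$date <- as.Date(serial, origin = "1899-12-30")
head(test$date, 3)
# [1] "2017-03-03" "2017-03-04" "2017-03-05"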
I am doing some date/time manipulation and experiencing explicable, but unpleasant, round-tripping problems when converting date -> time -> date. I have temporarily overcome this problem by rounding at appropriate points, but I wonder if there are best practices for date handling that would be cleaner. I'm using a mix of base-R and lubridate functions.
tl;dr is there a good, simple way to convert from decimal date (YYYY.fff) to the Date class (and back) without going through POSIXt and incurring round-off (and potentially time-zone) complications??
Start with a few days from 1918, as separate year/month/day columns (not a critical part of my problem, but it's where my pipeline happens to start):
library(lubridate)
dd <- data.frame(year=1918,month=9,day=1:12)
Convert year/month/day -> date -> time:
dd <- transform(dd,
time=decimal_date(make_date(year, month, day)))
The successive differences in the resulting time vector are not exactly 1 because of roundoff: this is understandable but leads to problems down the road.
table(diff(dd$time)*365)
## 0.999999999985448 1.00000000006844
## 9 2
Now suppose I convert back to a date: the dates are slightly before or after midnight (off by <1 second in either direction):
d2 <- lubridate::date_decimal(dd$time)
# [1] "1918-09-01 00:00:00 UTC" "1918-09-02 00:00:00 UTC"
# [3] "1918-09-03 00:00:00 UTC" "1918-09-03 23:59:59 UTC"
# [5] "1918-09-04 23:59:59 UTC" "1918-09-05 23:59:59 UTC"
# [7] "1918-09-07 00:00:00 UTC" "1918-09-08 00:00:00 UTC"
# [9] "1918-09-09 00:00:00 UTC" "1918-09-09 23:59:59 UTC"
# [11] "1918-09-10 23:59:59 UTC" "1918-09-12 00:00:00 UTC"
If I now want dates (rather than POSIXct objects) I can use as.Date(), but to my dismay as.Date() truncates rather than rounding ...
tt <- as.Date(d2)
## [1] "1918-09-01" "1918-09-02" "1918-09-03" "1918-09-03" "1918-09-04"
## [6] "1918-09-05" "1918-09-07" "1918-09-08" "1918-09-09" "1918-09-09"
##[11] "1918-09-10" "1918-09-12"
So the differences are now 0/1/2 days:
table(diff(tt))
# 0 1 2
# 2 7 2
I can fix this by rounding first:
table(diff(as.Date(round(d2))))
## 1
## 11
but I wonder if there is a better way (e.g. keeping POSIXct out of my pipeline and staying with dates throughout) ...
As suggested by this R-help desk article from 2004 by Grothendieck and Petzoldt:
When considering which class to use, always choose the least complex class that will support the application. That is, use Date if possible, otherwise use chron and otherwise use the POSIX classes. Such a strategy will greatly reduce the potential for error and increase the reliability of your application.
The extensive table in this article shows how to translate among Date, chron, and POSIXct, but doesn't include decimal time as one of the candidates ...
It seems like it would be best to avoid converting back from decimal time if at all possible.
When converting from date to decimal date, one also needs to account for time. Since Date does not have a specific time associated with it, decimal_date inherently assumes it to be 00:00:00.
However, if we are concerned only with the date (and not the time), we could assume the time to be anything. Arguably, middle of the day (12:00:00) is as good as the beginning of the day (00:00:00). This would make the conversion back to Date more reliable as we are not at the midnight mark and a few seconds off does not affect the output. One of the ways to do this would be to add 12*60*60/(365*24*60*60) to dd$time
dd$time2 = dd$time + 12*60*60/(365*24*60*60)
data.frame(dd[1:3],
"00:00:00" = as.Date(date_decimal(dd$time)),
"12:00:00" = as.Date(date_decimal(dd$time2)),
check.names = FALSE)
# year month day 00:00:00 12:00:00
#1 1918 9 1 1918-09-01 1918-09-01
#2 1918 9 2 1918-09-02 1918-09-02
#3 1918 9 3 1918-09-03 1918-09-03
#4 1918 9 4 1918-09-03 1918-09-04
#5 1918 9 5 1918-09-04 1918-09-05
#6 1918 9 6 1918-09-05 1918-09-06
#7 1918 9 7 1918-09-07 1918-09-07
#8 1918 9 8 1918-09-08 1918-09-08
#9 1918 9 9 1918-09-09 1918-09-09
#10 1918 9 10 1918-09-09 1918-09-10
#11 1918 9 11 1918-09-10 1918-09-11
#12 1918 9 12 1918-09-12 1918-09-12
It should be noted, however, that the value of decimal time obtained in this way will be different.
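If downstream code needs the original midnight-based decimals back, the same constant offset can simply be subtracted again (a sketch):
dd$time_back <- dd$time2 - 12*60*60/(365*24*60*60)
all.equal(dd$time_back, dd$time)
# [1] TRUE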
lubridate::decimal_date() is returning a numeric. If I understand you correctly, the question is how to convert that numeric into Date and have it round appropriately without bouncing through POSIXct.
as.Date(1L, origin = '1970-01-01') shows us that we can provide as.Date with days since some specified origin and convert immediately to the Date type. Knowing this, we can skip the year part entirely and set it as origin. Then we can convert our decimal dates to days:
as.Date((dd$time - trunc(dd$time)) * 365, origin = "1918-01-01")
So, a function like this might do the trick (at least for years without leap days):
date_decimal2 <- function(decimal_date) {
years <- trunc(decimal_date)
origins <- paste0(years, "-01-01")
# c.f. https://stackoverflow.com/questions/14449166/dates-with-lapply-and-sapply
do.call(c, mapply(as.Date.numeric, x = (decimal_date-years) * 365, origin = origins, SIMPLIFY = FALSE))
}
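Because as.Date() keeps fractional days, the day count can still land just below an integer and truncate, which is the same problem as in the question. A safer variant rounds the day count first (a sketch; date_decimal2r is a hypothetical name):
date_decimal2r <- function(decimal_date) {
  years <- trunc(decimal_date)
  origins <- paste0(years, "-01-01")
  days <- round((decimal_date - years) * 365)  # round rather than truncate
  do.call(c, mapply(as.Date.numeric, x = days, origin = origins, SIMPLIFY = FALSE))
}
table(diff(date_decimal2r(dd$time)))
## 1
## 11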
Side note: I admit I went down a bit of a rabbit hole trying to move origin around to deal with the pre-1970 dates. I found that the further origin shifted from the target date, the weirder the results got (and not in ways that seemed to be easily explained by leap days). Since origin is flexible, I decided to target it right on top of the target values. For leap days, seconds, and whatever other weirdness time has in store for us, on your own head be it. =)
I'll get straight to the point: I have been given some data sets in .csv format containing regularly logged sensor data from a machine. However, this data set also contains measurements taken when the machine is turned off, which I would like to separate from the data logged from when it is turned on. To subset the relevant data I also have a file containing start and end times of these shutdowns. This file is several hundred rows long.
Examples of the relevant files for this problem:
file: sensor_data.csv
sens_name,time,measurement
sens_A,17/12/11 06:45,32.3321
sens_A,17/12/11 08:01,36.1290
sens_B,17/12/11 05:32,17.1122
sens_B,18/12/11 03:43,12.3189
##################################################
file: shutdowns.csv
shutdown_start,shutdown_end
17/12/11 07:46,17/12/11 08:23
17/12/11 08:23,17/12/11 09:00
17/12/11 09:00,17/12/11 13:30
18/12/11 01:42,18/12/11 07:43
To subset data in R, I have previously used the subset() function with simple conditions which has worked fine, but I don't know how to go about subsetting sensor data which fall outside multiple shutdown date ranges. I've already formatted the date and time data using as.POSIXlt().
I'm suspecting some scripting may be involved to come up with a good solution, but I'm afraid I am not yet experienced enough to handle this type of data.
Any help, advice, or solutions will be greatly appreciated. Let me know if there's anything else needed for a solution.
I prefer the POSIXct format for ranges within data frames. We create a logical index that is TRUE when a measurement falls outside every shutdown window, i.e. t < shutdown_start OR t > shutdown_end for every shutdown row. With this index we can then subset the data as necessary:
posixct <- function(x) as.POSIXct(x, format="%d/%m/%y %H:%M")
sensor_data$time <- posixct(sensor_data$time)
shutdowns[] <- lapply(shutdowns, posixct)
ind1 <- sapply(sensor_data$time, function(t) {
  # TRUE if t falls outside every shutdown window
  sum(t < shutdowns[,1] | t > shutdowns[,2]) == nrow(shutdowns)})
#Measurements taken while the machine was running
sensor_data[ind1,]
# sens_name time measurement
# 1 sens_A 2011-12-17 06:45:00 32.3321
# 3 sens_B 2011-12-17 05:32:00 17.1122
#Measurements taken during shutdowns
sensor_data[!ind1,]
# sens_name time measurement
# 2 sens_A 2011-12-17 08:01:00 36.1290
# 4 sens_B 2011-12-18 03:43:00 12.3189
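An equivalent vectorized formulation using base outer() avoids the per-element sapply (a sketch; during is TRUE where a measurement falls inside at least one shutdown window):
during <- rowSums(outer(sensor_data$time, shutdowns$shutdown_start, ">=") &
                  outer(sensor_data$time, shutdowns$shutdown_end, "<=")) > 0
sensor_data[!during, ]  # while running (rows 1 and 3 above)
sensor_data[during, ]   # during shutdowns (rows 2 and 4 above)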
I thought I would post here since I have spent hours trying to figure this out. I'm working with a CSV file with a date column and closing return prices. However, I can't get the data to be "timeBased" (the timeBased function is from the xts package). For example:
timeBased(dfx)
[1] FALSE
Here is what I have:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE))
and here's what the first 10 rows look like of the file:
DATE AUS
1 12/1/1988 -0.0031599720
2 12/2/1988 -0.0015724670
3 12/5/1988 -0.0000897619
4 12/6/1988 -0.0022670620
5 12/7/1988 0.0052895550
6 12/8/1988 -0.0048259860
7 12/9/1988 0.0106990910
8 12/12/1988 0.0033538810
9 12/13/1988 0.0118568700
10 12/14/1988 -0.0050105200
If anyone can help, I would appreciate it! I tried multiple approaches using zoo and other tweaks, but nothing worked. Thank you!
As Joshua Ulrich points out, using the timeBased function with an xts object should be expected to return FALSE. In addition to that, there may be another problem with your code. Assuming that your example displays the contents of aus, then aus$DATE is actually a factor or character data, not a Date object. To properly convert to an xts object, you'll have to specify the date format of the aus$DATE data. To convert and then test whether dfx is an xts object, you could use the following code:
dfx = xts(aus$AUS, order.by=as.Date(aus$DATE, "%m/%d/%Y"))
dfx
[,1]
1988-12-01 -0.0031599720
1988-12-02 -0.0015724670
1988-12-05 -0.0000897619
1988-12-06 -0.0022670620
timeBased(dfx)
[1] FALSE
is.xts(dfx)
[1] TRUE
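Note that timeBased() checks whether an object itself is of a time-based class, so it returns FALSE for any xts object; the dates live in the index. To confirm the index is time-based (a sketch):
timeBased(index(dfx))
# [1] TRUE
class(index(dfx))
# [1] "Date"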
I have looked all over the internet to find an answer to my problem and failed.
I am using R with the Rmetrics package.
I tried reading my own dataset.csv via the readSeries function but sadly the dates I entered are not imported correctly, now every row has the current date.
I tried using their sample data sets, exported them to CSV and re-imported them, and it creates the same problem.
You can test it using this code:
data <- head(SWX.RET[,1:3])
write.csv(data, file="myData.csv")
data2 <- readSeries(file="myData.csv",header=T,sep=",")
If you now check the data2 time series, you will notice that every row's date is the current date.
I am confused why this is and what to do to fix it.
Your help is much appreciated!
This can be fixed by passing the extra option row.names=FALSE to the write.csv() function; see its help page for details. Otherwise write.csv writes the row names out as an extra unnamed first column, which readSeries cannot parse as dates, so it falls back to the current date. Here is a worked example:
R> fakeData <- data.frame(date=Sys.Date()+seq(-7,-1), value=runif(7))
R> fakeData
date value
1 2011-02-14 0.261088
2 2011-02-15 0.514413
3 2011-02-16 0.675607
4 2011-02-17 0.982817
5 2011-02-18 0.759544
6 2011-02-19 0.566488
7 2011-02-20 0.849690
R> write.csv(fakeData, "/tmp/fakeDate.csv", row.names=FALSE, quote=FALSE)
R> readSeries("/tmp/fakeDate.csv", header=TRUE, sep=",")
GMT
value
2011-02-14 0.261088
2011-02-15 0.514413
2011-02-16 0.675607
2011-02-17 0.982817
2011-02-18 0.759544
2011-02-19 0.566488
2011-02-20 0.849690
R>