I'm reading a date-time column from Excel into R, yet converting it to POSIXct with origin "1970-01-01" only produces date-times in 1970. The same happens with the lubridate package using lubridate::origin. Any thoughts on how to remedy this?
The xlsx package works fine, but for some reason openxlsx does not.
test2 <- read.xlsx(FN, sheet = 3, startRow = 35, cols = 1:77)
test2$dt <- as.POSIXct(test2$DateTime, origin="1970-01-01")
DateTime reads in from Excel as numeric, e.g. 43306.29.
It should be 7-25-2018 7:00:00 after conversion to POSIXct format, but instead comes out as 1970-01-01 05:01:46.
One needs to know the key difference between the two time systems here.
In Excel, 43306.29 means 43306 days since Jan 1, 1900 (day 1), and the 0.29 is the fraction of the day (about 7 hours here).
R uses the Unix timekeeping standard, so it tracks the number of seconds since Jan 1, 1970 (the beginning of time for a Unix programmer).
So in order to convert from Excel to R, you need to convert the number of days since the origin into a number of seconds (60 sec * 60 min * 24 hours = 86,400 seconds per day).
as.POSIXct(43306.29 * 3600 * 24, origin = "1899-12-30")
#"2018-07-25 02:57:36 EDT"
as.POSIXct(43306.29 * 3600 * 24, origin = "1899-12-30", tz = "UTC")
#"2018-07-25 06:57:36 UTC"
Note: Excel on Windows assumes there was a leap year in 1900 (there wasn't), so the origin needs a correction to Dec 30, 1899.
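Since the question mentions openxlsx: that package also ships a converter for exactly this. A minimal sketch, assuming the column really did come in as the raw serial number (check ?convertToDateTime in your installed version for the exact arguments and origin handling):
library(openxlsx)
# openxlsx's helper for Excel date-time serials; it applies the Excel
# origin and the day-to-seconds scaling in one step (verify the result
# against a few known cells in your sheet)
test2$dt <- convertToDateTime(test2$DateTime)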
Related
I have a column of dates in Excel with the following format: MM/DD/YY AM or MM/DD/YY PM, and I was able to parse this date after importing with readxl::read_excel.
parse_date_time(x, '%m/%d/%y %p', tz = "UTC")
Now, if I instead bring in MM/DD/YY HH:MM PM, the import comes in as a number. For example,
"3/16/20 3:00 PM" becomes 43906.625 after import.
One solution would be to import the date columns as strings; however, I have 50 columns in the file and don't want to hard-code each column type. Is there a way to get the date and time from this numeric value instead (i.e. 43906.625)?
Excel uses a "day-integer" format. R uses "seconds-integer" for date-times and "day-integer" for Date, so depending on which class you are converting to, you need to account for a day of seconds (86,400). It is also worth knowing that Excel uses an origin in the year 1899.
as.POSIXct(43906.625 * 86400, origin = "1899-12-30", tz = "UTC")
# [1] "2020-03-16 15:00:00 UTC"
As a bit of history: the reason that it's "1899-12-30" and not, say "1899-12-31" (end of the day?) or something else is mentioned in a blog post from 2013:
For Excel on Windows, the origin date is December 30, 1899 for dates after 1900. (Excel’s designer thought 1900 was a leap year, but it was not.) For Excel on Mac, the origin date is January 1, 1904.
https://www.r-bloggers.com/date-formats-in-r/
I don't know the canonical reference for this, and the website from which R-Bloggers borrowed/scraped that article is no longer responsive. I would much prefer still-active and more canonical references for this assertion (that the engineers mis-identified the leap year).
I have timestamps formatted like so: 1.475534e+15, which https://www.epochconverter.com/ converts to Monday, October 3, 2016 10:33:20 PM.
However, I cannot replicate this in R.
For example:
library(anytime)
anytime(1.475534e+15)
Yields "46759781-01-30 14:33:20 EST"
The same is true if I do something like
as.POSIXct(1.475534e+15 / 1000, origin="1970-01-01")
The epochconverter site suggests that the time is in microseconds, but I haven't figured out how to convert from microseconds to a human-readable date.
a <- 1.475534e+15
as.POSIXct(a/1000000, origin="1970-01-01")
#[1] "2016-10-03 15:33:20 PDT" # interpreted in my local tz
With 7 significant digits in the scientific notation, the last digit represents 1e9 microseconds, which gets us roughly 17 minutes of time resolution. If you need more than that, you'll need to get the data in a different format upstream.
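To make the scaling explicit and reusable (a sketch, not from the original answer; the helper name is made up):
# Hypothetical helper: epoch in microseconds -> POSIXct
us_to_posixct <- function(us, tz = "UTC") {
  as.POSIXct(us / 1e6, origin = "1970-01-01", tz = tz)
}
us_to_posixct(1.475534e+15)
# [1] "2016-10-03 22:33:20 UTC"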
I have a 1 GB CSV file with dates and corresponding values. The dates are in an "undefined format", so they are displayed as numbers in Excel, like this:
DATE FXVol.DKK.EUR,0.75,4
38719 0.21825
I cannot open the CSV file and change it to the date format I like, since I would lose data that way.
If I now import the data to R and convert the Dates:
as.Date( workingfilereturns[,1], format = "%Y-%m-%d")
It always yields dates that are about 70 years too late, so 2076 instead of 2006. I really have no idea what goes wrong or how to fix this issue.
(Note: I have added a note about some quirks in R when dealing with Excel data. You may want to skip directly to that at the bottom; what follows first is the original answer.)
Going by your sample data, 38719 appears to be the number of days which have elapsed since January 1, 1900. So you can just add this number of days to January 1, 1900 to arrive at the correct Date object which you want:
as.Date("1900-01-01") + workingfilereturns[,1]
or
as.Date("1900-01-01") + workingfilereturns$DATE
Example:
> as.Date("1900-01-01") + 38719
[1] "2006-01-04"
Update:
As @Roland correctly pointed out, you could also use as.Date.numeric while specifying an origin of January 1, 1900:
> as.Date.numeric(38719, origin="1900-01-01")
[1] "2006-01-04"
Bug warning:
As the asker @Methamortix pointed out, my solution, namely using January 1, 1900 as the origin, yields a date which is two days too late in R. There are two reasons for this:
In R, the origin is indexed with 0: as.Date.numeric(0, origin="1900-01-01") is January 1, 1900. Excel, however, starts counting at 1, so formatting the number 1 in Excel as a date yields January 1, 1900. This explains why R is one day ahead of Excel.
(Hold your breath.) It appears that Excel has a bug in the year 1900: specifically, Excel thinks that February 29, 1900 actually happened, even though 1900 was not a leap year (http://www.miniwebtool.com/leap-years-list/?start_year=1850&end_year=2020). As a result, when dealing with dates after February 28, 1900, R is a further day ahead of Excel.
As evidence of this, consider the following code:
> as.Date.numeric(57, origin="1900-01-01")
[1] "1900-02-27"
> as.Date.numeric(58, origin="1900-01-01")
[1] "1900-02-28"
> as.Date.numeric(59, origin="1900-01-01")
[1] "1900-03-01"
In other words, R's as.Date() correctly skipped over February 29th. But type the number 60 into a cell in Excel, format it as a date, and it comes back as February 29, 1900. My guess is that this has been reported somewhere, possibly on Stack Overflow or elsewhere, but let this serve as another reference point.
So, going back to the original question, the origin needs to be offset by 2 days when dealing with Excel dates in R where the date is after February 28, 1900 (which is the case in the original problem). So the asker should use the date column in the following way:
as.Date.numeric(workingfilereturns$DATE - 2, origin="1900-01-01")
where the date column has been rolled back by two days to sync up with the values in Excel.
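Equivalently (not part of the original answer, but consistent with the 1899-12-30 origin used earlier on this page), fold the two-day correction into the origin itself instead of mutating the data:
# Same result, with the correction baked into the origin
as.Date.numeric(workingfilereturns$DATE, origin = "1899-12-30")
as.Date.numeric(38719, origin = "1899-12-30")
# [1] "2006-01-02"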
I read data from an xls file. Apparently, the time is not in the right format. It is as follows (for example)
0.3840277777777778
0.3847222222222222
0.3854166666666667
Indeed, they should be
09:12
09:13
09:13
I don't know how to convert it to the right format. I searched several threads and all of them are about converting the date (with/without time) to the right format.
Can somebody give me any clues?
You can use as.POSIXct after multiplying your number by the number of seconds in a day (60 * 60 * 24 = 86,400):
nTime <- c(0.3840277777777778, 0.3847222222222222, 0.3854166666666667)
format(as.POSIXct((nTime) * 86400, origin = "1970-01-01", tz = "UTC"), "%H:%M")
## [1] "09:13" "09:14" "09:15"
Another option is times from chron
library(chron)
times(nTime)
#[1] 09:13:00 09:14:00 09:15:00
To strip off the seconds,
substr(times(nTime),1,5)
#[1] "09:13" "09:14" "09:15"
For people who want the opposite direction: given "09:13:00", get 0.3840278:
as.numeric(chron::times("09:13:00"))
Essentially, the idea is that one whole day is 1, so noon (12 pm) is 0.5.
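A minimal base-R equivalent of that round trip, if you'd rather not depend on chron (a sketch; the helper name is made up):
# Hypothetical helper: "HH:MM:SS" -> fraction of a day
to_day_fraction <- function(hms) {
  p <- as.numeric(strsplit(hms, ":")[[1]])
  (p[1] * 3600 + p[2] * 60 + p[3]) / 86400
}
to_day_fraction("09:13:00")
# [1] 0.3840278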
I'm trying to get acquainted with weatherData in R.
Having downloaded a set of temperature data I've then exported it to CSV.
Opening the CSV in LibreOffice Calc shows the date and time for each temperature reading as a string of ten digits. In spite of some Googling, I have not found a way of successfully converting the string into the format in which it appears in R.
For example, 1357084200 should, I believe, translate to 2013-01-01 23:50:00.
Any help in getting the correct date in the same date format to appear in Calc via the CSV greatly appreciated.
Here is the direct way:
as.POSIXct(1357084200, origin="1970-01-01", tz="GMT")
#[1] "2013-01-01 23:50:00 GMT"
If it's really a character:
as.POSIXct(as.numeric("1357084200"), origin="1970-01-01", tz="GMT")
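To get a readable date back into Calc via the CSV (a sketch; the data frame and column names here are hypothetical), format the POSIXct values as text before writing:
# Hypothetical data frame `weather` with a numeric `Time` column of Unix stamps
weather$Time <- format(
  as.POSIXct(weather$Time, origin = "1970-01-01", tz = "GMT"),
  "%Y-%m-%d %H:%M:%S"
)
write.csv(weather, "temperatures_readable.csv", row.names = FALSE)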
I'm not aware of a direct way of doing this, but I believe I've figured out a workaround.
For starters, your example is correct. The long number (a Unix timestamp) is the number of seconds elapsed since 1970-01-01 00:00:00. Knowing this, you can calculate the exact date and time from the timestamp. It's a bit complicated because leap years need to be taken into account.
What comes in handy is the ability to supply an arbitrary number of days/months/years to the LibreOffice function DATE. So in essence you can find the number of days represented in the timestamp by dividing it by 60*60*24 (seconds in a minute, minutes in an hour, hours in a day) and then supply that number to the DATE function.
timestamp = 1357084200
days = FLOOR(timestamp / (60*60*24); 1) // comes out at 15706
actualdate = DATE(1970; 1; 1 + days) // comes out at 2013-01-01
seconds = timestamp - days * 60 * 60 * 24 // comes out at 85800
actualtime = TIME(0; 0; seconds) // comes out at 23:50:00
Then you can concatenate these or whatever else you want to do.