I have integers like the following:
41764 41764 42634 42634 42445 42445 41792 41807 41813 41842 41838 41848 41849 41837
These need to be converted to dates; the time of day doesn't matter.
I'm told that when converted they should fall in the year 2014, but the conversions I've tried so far have given the year as either 1984 or 2084.
Thanks!
Sam Firke's janitor package includes a function for cleaning up this Excel mess:
x <- c(41764L, 41764L, 42634L, 42634L, 42445L, 42445L, 41792L, 41807L,
41813L, 41842L, 41838L, 41848L, 41849L, 41837L)
janitor::excel_numeric_to_date(x)
## [1] "2014-05-05" "2014-05-05" "2016-09-21" "2016-09-21" "2016-03-16" "2016-03-16" "2014-06-02"
## [8] "2014-06-17" "2014-06-23" "2014-07-22" "2014-07-18" "2014-07-28" "2014-07-29" "2014-07-17"
Excel reader functions likely take care of this for you, which would be the best approach.
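For instance (a minimal sketch; "my_file.xlsx" is a placeholder file name), readxl returns date-formatted cells already parsed, so no serial-number arithmetic is needed:
library(readxl)
df <- read_excel("my_file.xlsx")  # date columns arrive as POSIXct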
I assume you have Excel date integers here. Microsoft Office Excel stores dates as sequential numbers called serial values. For example, in Microsoft Office Excel for Windows, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,448 days after January 1, 1900.
Please note that Excel incorrectly assumes the year 1900 is a leap year. This is not a problem when working with present-day dates.
Microsoft Excel correctly handles all other leap years, including century years that are not leap years (for example, 2100). Only the year 1900 is handled incorrectly.
See the Microsoft Knowledge Base for further information.
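As a quick sanity check of those serial numbers in R (using the origin "1899-12-30", which the help snippet below recommends for Windows Excel because it compensates for the phantom 1900 leap day):
as.Date(39448, origin = "1899-12-30")
## [1] "2008-01-01"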
There is an offset of two days between the R script proposed by @loki and the calculation in Excel.
Please read the following date-conversion help documentation (snippet below):
## date given as number of days since 1900-01-01 (a date in 1989)
as.Date(32768, origin = "1900-01-01")
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## So for dates (post-1901) from Windows Excel
as.Date(35981, origin = "1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin = "1904-01-01") # 1998-07-05
## (these values come from http://support.microsoft.com/kb/214330)
Use as.Date() as @MFR pointed out. However, use the origin 1900-01-01:
x <- c(41764, 41764, 42634, 42634, 42445, 42445, 41792, 41807,
41813, 41842, 41838, 41848, 41849, 41837)
as.Date(x, origin = "1900-01-01")
# [1] "2014-05-07" "2014-05-07" "2016-09-23" "2016-09-23" "2016-03-18"
# [6] "2016-03-18" "2014-06-04" "2014-06-19" "2014-06-25" "2014-07-24"
# [11] "2014-07-20" "2014-07-30" "2014-07-31" "2014-07-19"
I have a column full of dates from 2003 and 2004. Currently, the column is a string that looks like the following (see column 6):
I want to convert that column to a date, but I'm not sure how because of the abbreviated month in the middle of the string. I've tried mutating it a number of ways, but can't seem to get it to work.
Ultimately, my goal is to use difftime() to show how many days from January 1st, 2003 each entry is. For instance, January 3rd, 2003 would be displayed as 3, January 5th, 2004 would be displayed as 370, and so on. Here is what I'm working with at the moment:
data$contact_date1 <- as.Date(data$contact_date1, format="%d-%m-%y")
data$contact_date1 <- difftime(
as.POSIXct("2003-01-01 12:00:00"),
data$contact_date1,
unit = "days")
Any help would be very appreciated! Thanks so much.
As stated in the comments, you should use %b to decode your dates.
Have a look at the ?strptime help page for this.
first.date <- as.POSIXct("2003-01-01 12:00:00")
weird.dates <- c("17NOV03","06May02")
?strptime
times <- as.Date(weird.dates, format = "%d%b%y")
difftime(time1 = first.date, time2 = times)
#> Time differences in days
#> [1] -319.5417 240.4583
Created on 2021-09-19 by the reprex package (v2.0.1)
Use the package lubridate; its function dmy converts your date format into the yyyy-mm-dd format:
library(lubridate)
round(difftime(dmy(data$contact_date1),"2003-01-01", units = "days"), 0)
Time differences in days
[1] 9 45
To add the days to the dataframe:
data$days <- round(difftime(dmy(data$contact_date1),"2003-01-01", units = "days"), 0)
Toy data:
data <- data.frame(
contact_date1 = c("10JAN2003", "15FEB2003")
)
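For reference, base R can parse the same toy data with the %b specifier mentioned above (note this is locale-dependent; it assumes English month abbreviations):
as.Date(data$contact_date1, format = "%d%b%Y")
# [1] "2003-01-10" "2003-02-15"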
I have dates formatted like so: 1.475534e+15, which when converted via https://www.epochconverter.com/ converts to Monday, October 3, 2016 10:33:20 PM.
However, I cannot replicate this in R.
For example:
library(anytime)
anytime(1.475534e+15)
Yields "46759781-01-30 14:33:20 EST"
The same is true if I do something like
as.POSIXct(1.475534e+15 / 1000, origin="1970-01-01")
The epochconverter site suggests that the time is in microseconds, but I haven't figured out how to convert from microseconds to a human-readable date.
a <- 1.475534e+15
as.POSIXct(a/1000000, origin="1970-01-01")
#[1] "2016-10-03 15:33:20 PDT" # interpreted in my local tz
With 7 significant digits in the scientific notation, that gets us around 20 minutes of time resolution. If you need more than that, you'll need to get the data in different format upstream.
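To see where that estimate comes from (plain arithmetic, no packages): the last significant digit of 1.475534e+15 represents a step of 1e9 microseconds:
1e9 / 1e6   # = 1000 seconds per step, once converted to seconds
1000 / 60   # ~ 16.7 minutes, i.e. roughly 20 minutes of resolution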
I have a 1 GB csv file with dates and corresponding values. The dates are in an undefined format, so they are displayed as numbers in Excel, like this:
DATE FXVol.DKK.EUR,0.75,4
38719 0.21825
I cannot open the csv file and change the column to the date format I want, since I would lose data that way.
If I now import the data into R and convert the dates:
as.Date( workingfilereturns[,1], format = "%Y-%m-%d")
It always yields dates that are off by about 70 years, e.g. 2076 instead of 2006. I really have no idea what goes wrong or how to fix this issue.
(Note: I have added a note about some quirks in R when dealing with Excel data. You may want to skip directly to that at the bottom; what follows first is the original answer.)
Going by your sample data, 38719 appears to be the number of days which have elapsed since January 1, 1900. So you can just add this number of days to January 1, 1900 to arrive at the correct Date object which you want:
as.Date("1900-01-01") + workingfilereturns[,1]
or
as.Date("1900-01-01") + workingfilereturns$DATE
Example:
> as.Date("1900-01-01") + 38719
[1] "2006-01-04"
Update:
As @Roland correctly pointed out, you could also use as.Date.numeric while specifying an origin of January 1, 1900:
> as.Date.numeric(38719, origin="1900-01-01")
[1] "2006-01-04"
Bug warning:
As the asker @Methamortix pointed out, my solution, namely using January 1, 1900 as the origin, yields a date which is two days too late in R. There are two reasons for this:
In R, the origin is indexed with 0, meaning that as.Date.numeric(0, origin="1900-01-01") is January 1, 1900, in R, but Excel starts counting at 1, meaning that formatting the number 1 in Excel as a Date yields January 1, 1900. This explains why R is one day ahead of Excel.
(Hold your breath) It appears that Excel has a bug in the year 1900: specifically, Excel thinks that February 29, 1900 actually happened, even though 1900 was not a leap year (http://www.miniwebtool.com/leap-years-list/?start_year=1850&end_year=2020). As a result, when dealing with dates later than February 28, 1900, R is a second day ahead of Excel.
As evidence of this, consider the following code:
> as.Date.numeric(57, origin="1900-01-01")
[1] "1900-02-27"
> as.Date.numeric(58, origin="1900-01-01")
[1] "1900-02-28"
> as.Date.numeric(59, origin="1900-01-01")
[1] "1900-03-01"
In other words, R's as.Date() correctly skipped over February 29th. But type the number 60 into a cell in Excel, format as date, and it will come back as February 29, 1900. My guess is that this has been reported somewhere, possibly on Stack Overflow or elsewhere, but let this serve as another reference point.
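The off-by-one from reason 1 can also be seen directly:
> as.Date.numeric(0, origin="1900-01-01")
[1] "1900-01-01"
> as.Date.numeric(1, origin="1900-01-01")
[1] "1900-01-02"
whereas Excel displays serial number 1 as January 1, 1900.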
So, going back to the original question, the origin needs to be offset by 2 days when dealing with Excel dates in R where the date is later than February 28, 1900 (which is the case in the original problem). So the date column should be used in the following way:
as.Date.numeric(workingfilereturns$DATE - 2, origin="1900-01-01")
where the date column has been rolled back by two days to sync up with the values in Excel.
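Equivalently, the two-day shift can be folded into the origin itself by using "1899-12-30", the origin that as.Date's own documentation recommends for Windows Excel dates:
> as.Date.numeric(38719, origin="1899-12-30")
[1] "2006-01-02"
which agrees with subtracting 2 days under the 1900-01-01 origin.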
The question is quite simple: I have txt data imported into R. However, I forgot to change the date format to dd/mm/yyyy beforehand. For example: instead of having 30/09/2015 I have 42277.
Of course I could go back to my Excel file and change the column format from number to date to get the dd/mm/yyyy format easily. But I was wondering whether there is a way of doing that inside R. I have several packages here, such as XLConnect, but there is nothing there.
Here's how to convert Excel-style dates:
as.Date(42277, origin="1899-12-30")
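which gives the date the asker expected:
[1] "2015-09-30"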
The help file for as.Date discusses the vagaries of conversion from other time systems and includes a discussion and example for Excel.
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## So for dates (post-1901) from Windows Excel
as.Date(35981, origin = "1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin = "1904-01-01") # 1998-07-05
## (these values come from http://support.microsoft.com/kb/214330)
I have a netCDF file and I am attempting to identify the first date in the file and the 'base date'. The file contains monthly data. My notes, which are fairly old, indicate the first date is January 1, 1948.
The following R code gives the first date as 17067072:
library(ncdf)
library(chron)
my.data <- open.ncdf('my.netCDF.nc')
x = get.var.ncdf(my.data, "lon" )
y = get.var.ncdf(my.data, "lat" )
z = get.var.ncdf(my.data, "time")
z[1:5]
# [1] 17067072 17067816 17068512 17069256 17069976
I downloaded an application called ncdump.exe and after typing the following line in the Windows command window:
C:\Users\Mark W Miller\ncdump>ncdump -h my.netCDF.nc
I learned that the base date is:
time:units = "hours since 1-1-1 00:00:0.0" ;
The same base date is obtained in R using:
att.get.ncdf(my.data,"time","units")$value
[1] "hours since 1-1-1 00:00:0.0"
I tried to verify that with the following R code:
date1 <- as.Date("01/01/0001", "%m/%d/%Y")
date1
# [1] "0001-01-01"
date2 <- as.Date("01/01/1948", "%m/%d/%Y")
date2
# [1] "1948-01-01"
period <- as.Date(date1:date2, origin = "00-01-01")
hours <- 24 * (length(period)-1)
hours
# [1] 17067024
There is a difference of 48 hours between the number in z[1] and the number returned by the R code immediately above:
17067072 - 17067024
[1] 48
Where is my error? Since the netCDF file contains monthly data, I doubt the first date is January 3, 1948. The website from which I downloaded the data does not offer the option of selecting the day within the month.
The application ncdump.exe can be downloaded from here:
http://www.narccap.ucar.edu/data/ascii-howto.html
If I can figure out how to subset the netCDF file I might upload the smaller file somewhere.
Thank you for any advice.
Have you looked at your period vector? When I looked at the first few and last few values, the year comes out as something that does not make sense. Possibly something is messed up in one of the conversions.
Also note that some computer programs treat 1900 as a leap year even though it was not; a difference between two programs on that factor could account for 24 of the hours in your difference.
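For reference, the arithmetic can be done directly in R (a sketch using R's own proleptic calendar, consistent with the period computation in the question): convert the first time value from hours to days and add it to the stated base date:
as.Date("0001-01-01") + 17067072 / 24
# [1] "1948-01-03"  # i.e. 48 hours past 1948-01-01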