Convert excel date code (text style) in R program - r

The question is quite simple: I have text data imported into R. However, I forgot to change the date format to dd/mm/yyyy first. For example, instead of 30/09/2015 I have 42277.
Of course I could go back to Excel, change the column format from number to date and get the dd/mm/yyyy format easily. But I was wondering whether there is a way of doing that inside R. I have several packages installed, such as XLConnect, but none of them seems to offer this.

Here's how to convert Excel-style dates:
as.Date(42277, origin="1899-12-30")
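Since the column came in as text and the goal is dd/mm/yyyy, the one-liner above can be wrapped into a small helper. This is a sketch only: `excel_to_date` is a hypothetical name, not a function from base R or any package, and it assumes the Windows 1900 date system unless `date1904 = TRUE`. With origin 1899-12-30 it is only correct for serials of 61 (1900-03-01) and above.

```r
# Hypothetical helper (not part of base R): convert Excel serial day numbers,
# stored as numbers or as text such as "42277", to Date.
# date1904 = FALSE assumes the Windows 1900 date system; TRUE the Mac 1904 system.
excel_to_date <- function(serial, date1904 = FALSE) {
  serial <- as.numeric(as.character(serial))  # handles numeric, character and factor input
  origin <- if (date1904) "1904-01-01" else "1899-12-30"
  # floor() drops any fractional part (time of day) before conversion
  as.Date(floor(serial), origin = origin)
}

d <- excel_to_date(c("42277", "35981"))
d                                      # "2015-09-30" "1998-07-05"
format(d, "%d/%m/%Y")                  # "30/09/2015" "05/07/1998"
excel_to_date(34519, date1904 = TRUE)  # "1998-07-05"
```

Note that `format()` returns character strings; keep the Date column for calculations and only format it for display or export.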

The help file for as.Date discusses the vagaries of conversion from other time systems and includes a discussion and example for Excel.
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## So for dates (post-1901) from Windows Excel
as.Date(35981, origin = "1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin = "1904-01-01") # 1998-07-05
## (these values come from http://support.microsoft.com/kb/214330)
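The arithmetic behind the two origins in these comments can be checked directly. The sketch below only restates what the help file says (day 1 = 1900-01-01 plus a phantom 1900-02-29 on Windows, day 0 = 1904-01-01 on Mac) and confirms that the two example serials differ by the gap between the origins.

```r
# Excel's 1900 system calls 1900-01-01 serial 1 and also counts a
# non-existent 1900-02-29 (serial 60). Using 1899-12-30 as origin absorbs
# both offsets, so it is right for serial 61 (1900-03-01) and later.
as.Date(61, origin = "1899-12-30")   # "1900-03-01"

# The gap between the two systems' origins, in days:
gap <- as.numeric(as.Date("1904-01-01") - as.Date("1899-12-30"))
gap              # 1462
35981 - 34519    # 1462, the same gap seen in the two help-file examples
```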

Related

How do i fix date error when loading excel files in R

x1 <- read_excel("path",sheet = 1,skip=1,col_names =TRUE, col_types = c("date","date","date","date","date","date","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess","guess"))
View(x1)
I was trying to load an Excel sheet with multiple columns into R, and for some reason every date throughout the dataset comes out as 1899-12-31 instead of advancing. The first four columns are supposed to be in "date" format, with values 2018-01-01, 2018-01-02 and so on. How do I fix this?
For this issue with R and Excel, you can use the following (the answer will vary depending on whether the file uses the Windows or the Mac Excel date system):
On Windows, for dates (post-1901):
as.Date(43101, origin = "1899-12-30") # 2018-01-01
On Mac, for dates (post-1904):
as.Date(41639, origin = "1904-01-01") # 2018-01-01
A bit of pertinent info, taken from https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.Date:
## date given as number of days since 1900-01-01 (a date in 1989)
as.Date(32768, origin = "1900-01-01")
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## incorrectly treating 1900 as a leap year.
## (these values come from http://support.microsoft.com/kb/214330)
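As a cross-check, assuming the workbook uses the default 1900 date system, Excel stores 2018-01-01 as serial 43101. The sketch below shows that origin 1899-12-30 recovers that date, while origin 1900-01-01 lands two days late, and that the Mac serial is 1462 lower.

```r
serial <- 43101  # Excel serial for 2018-01-01 (1900 date system)

as.Date(serial, origin = "1899-12-30")          # "2018-01-01"
as.Date(serial, origin = "1900-01-01")          # "2018-01-03": two days late
as.Date(serial - 1462, origin = "1904-01-01")   # "2018-01-01": Mac 1904 serial 41639
```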

as.Date giving me NAs

I've tried everything in the thread "as.Date returning NA while converting from 'ddmmmyyyy'" to try and solve my problem.
I'm using these commands to turn a factor into a date:
cohort$doi <- as.Date(cohort$doi, format= "%Y/%m/%d")
All my dates are currently in the format YYYY-MM-DD, so as far as I'm aware the above should work.
I used this code yesterday to convert all my date variables from factors to dates. It worked yesterday and everything was fine. Today I opened my script, imported my data, ran this command and viewed my data, but all of the dates now say NA.
I've tried everything from previous threads (I looked at a few more than just the one I linked above), but nothing has worked so far. I'm not sure what to do now.
Example of what doi column looks like:
1970-01-01
1970-02-02
1970-03-03
1970-04-04
The column is currently classed as a factor. When I run the code above, the column is defined as a date, but all the dates now say NA.
Other than closing R and opening it up again for today, I've done nothing else.
If you read the documentation for as.Date, you will note that the default formats are %Y-%m-%d and %Y/%m/%d:
The default formats follow the rules of the ISO 8601 international standard which expresses a day as "2001-02-03".
In your code you have specified that your dates are separated by slashes, but your sample data shows they are separated by hyphens, which is the default format used by as.Date:
doi <- as.factor(c("1970-01-01",
"1970-02-02",
"1970-03-03",
"1970-04-04"))
as.Date(doi) # default format %Y-%m-%d
[1] "1970-01-01" "1970-02-02" "1970-03-03" "1970-04-04"
as.Date(doi, format = "%Y/%m/%d") # incorrect specification of your date format
[1] NA NA NA NA
as.Date("1970/01/01") # also a default format
[1] "1970-01-01"
Note: as.Date accepts character strings, factors, logical NA and objects of classes "POSIXlt" and "POSIXct".
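A short sketch of the fix plus a quick diagnostic, reusing the sample `doi` values from the question. The `tryFormats` argument is available in recent versions of R; it tries each candidate format on the first non-NA element and applies the first one that works to the whole vector.

```r
doi <- factor(c("1970-01-01", "1970-02-02", "1970-03-03", "1970-04-04"))

# Spell out the format with the separator the data really uses ("-")
d <- as.Date(doi, format = "%Y-%m-%d")

# Diagnostic: count entries that were present but failed to parse
sum(is.na(d) & !is.na(doi))   # 0

# tryFormats: the first format that parses the first non-NA element
# is applied to the whole vector
as.Date(c("1970/01/01", "1970/02/02"),
        tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))   # "1970-01-01" "1970-02-02"
```

If the diagnostic count is not zero, printing `doi[is.na(d) & !is.na(doi)]` shows exactly which strings did not match the format.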

convert SAS date format to R

I read in a sas7bdat file using the haven package.
One column has dates in the YYQ6. format.
In R these are converted to numbers like -5844, 0, 7121, ...
How can I convert this to a year format? I have no access to SAS but these values should be birth dates.
A bit of research first. SAS uses 1 January 1960 as day zero (see http://support.sas.com/publishing/pubcat/chaps/59411.pdf), so if you want the year of the date represented by your number, it should be
format(as.Date(-5844, origin="1960-01-01"),"%Y")
and you get in this case
1944
Is that correct? Is it what you are expecting?
To learn more about the YYQ6. format, check this support article from SAS:
http://support.sas.com/documentation/cdl/en/leforinforref/64790/HTML/default/viewer.htm#n02xxe6d9bgflsn18ee7qaachzaw.htm
Let me know if it works.
Umberto
I think SAS dates are counted in days from 1960-01-01, so you can do
sasq <- c(-5844, 0, 7121)
as.Date(sasq, '1960-01-01')
[1] "1944-01-01" "1960-01-01" "1979-07-01"
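Building on that, a sketch that extracts the year as a number (the asker's goal) and also builds a year-plus-quarter label. The label is assumed to resemble what SAS displays for YYQ6. (for example `1944Q1`); `quarters()` is the base R function returning "Q1" to "Q4".

```r
sasq <- c(-5844, 0, 7121)                  # SAS date values (days since 1960-01-01)
d <- as.Date(sasq, origin = "1960-01-01")

as.integer(format(d, "%Y"))                 # 1944 1960 1979
paste0(format(d, "%Y"), quarters(d))        # "1944Q1" "1960Q1" "1979Q3"
```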

Using R for a Date format of 07-JUL-16 06.05.54.000000 AM

I have 2 Date variables in a .csv file with formats of "07-JUL-16 06.05.54.000000 AM". I want to use these in a regression model. Should I be reading these into a data frame as factors or characters? How can I take a difference of the 2 dates in each case?
Read them in as characters (e.g. stringsAsFactors=FALSE or tidyverse functions), then use as.POSIXct, e.g.
as.POSIXct("07-JUL-16 06.05.54.000000 AM",format="%d-%b-%y %I.%M.%OS %p")
## [1] "2016-07-07 06:05:54 EDT"
(I'm assuming that you are intending a day-month-year format rather than a month-day-year format -- but actually I don't have any evidence to support that thought!)
Once you've done this, subtracting the values should just work (giving you a difftime object), but be careful with units when converting to numeric!
For what it's worth, lubridate::ymd_hms thinks it can guess the format, but guesses wrong (assuming I guessed right above: with a two-digit year, and without any year values greater than 31, there's really nothing to distinguish years from days...)
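A sketch of the subtraction with the units pinned down. The second timestamp is hypothetical, and the code assumes an English (or C) time locale so that "JUL" and "AM"/"PM" parse; `tz = "UTC"` is used only to avoid daylight-saving surprises.

```r
# Assumes an English (or C) LC_TIME locale so "JUL" and "AM"/"PM" parse,
# and UTC to sidestep daylight-saving surprises.
fmt <- "%d-%b-%y %I.%M.%OS %p"

start <- as.POSIXct("07-JUL-16 06.05.54.000000 AM", format = fmt, tz = "UTC")
end   <- as.POSIXct("07-JUL-16 07.35.54.000000 PM", format = fmt, tz = "UTC")  # hypothetical second value

elapsed <- difftime(end, start, units = "hours")  # fix the units explicitly
as.numeric(elapsed)                                # 13.5
as.numeric(end - start, units = "mins")           # 810
```

Calling `difftime()` with explicit `units` avoids the "auto" unit selection of plain subtraction, which can silently switch between seconds, minutes, hours or days from one row to the next.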

Time series data format

I'm a beginner in working with R. In general I have CSV files, which I read with read.csv.
The files have 2 columns:
1st is date: "2013-01-01 22:20:00"
2nd is value: 0
So far I just used var$2nd for the analysis, but I need the date as well. Is it possible to read this date? And to ask for the values between two dates? Or to exclude values that fall between two times every day?
What is the right data format, how do I convert to it, and which format is the default if I just use read.csv?
Thank you!
Say your csv file is called "foo.csv" and contains:
date,value
"2013-01-01 22:20:00",3
"2013-01-02 12:20:00",5
You need to tell R what kind of data each column holds. Before R 4.0.0, read.csv turned strings into factors by default, and even now an unconverted date column is just a character string, which is not what you want, so:
f <- read.csv ("foo.csv", colClasses=c("POSIXct", "integer"))
should do the trick.
Learn how read.csv works by doing:
?read.csv
and read carefully. If you do:
str (f)
you'll see that your date is POSIXct, as you asked. Do
?POSIXct
to learn how to do comparisons.
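The two remaining parts of the question (values between two dates, and excluding a time-of-day window every day) can be sketched with ordinary comparisons on the POSIXct column. The inline CSV mirrors foo.csv with one hypothetical extra row, and the 12:00 to 13:00 window is just an example.

```r
# Inline stand-in for foo.csv (third row is hypothetical)
csv <- 'date,value
"2013-01-01 22:20:00",3
"2013-01-02 12:20:00",5
"2013-01-03 08:00:00",7'
f <- read.csv(text = csv, colClasses = c("POSIXct", "integer"))

# Values between two date-times (inclusive); all times use the session time zone
from <- as.POSIXct("2013-01-01 00:00:00")
to   <- as.POSIXct("2013-01-02 23:59:59")
in_range <- f[f$date >= from & f$date <= to, ]
in_range$value    # 3 5

# Drop rows whose clock time falls in 12:00-13:00 on any day;
# zero-padded "HH:MM:SS" strings compare correctly as text
tod <- format(f$date, "%H:%M:%S")
kept <- f[!(tod >= "12:00:00" & tod < "13:00:00"), ]
kept$value        # 3 7
```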
