Date column produces unknown numbers in R

I wrote a data frame to a CSV file with R and opened it in Excel. I then saved it as an Excel workbook and made some edits to it.
When I imported that Excel file into R again, the date column looked like this (some values are numbers and some are dates):
Date
39387
39417
15/01/2007
16/01/2007
I tried to change the format in Excel but failed. The General and Number format options in Excel produce numbers like those above, which seem in no way related to the dates.

It seems that all four of your examples are dates in January (the 11th and 12th for the first two), that the Excel involved has been configured to expect MDY (rather than DMY, as is usual in the UK for example) with its date system set to ‘1900’, and that the CSV contains the dates as 'conventional' date text rather than date index numbers.
So what Excel saw first was:
11/01/2007
12/01/2007
15/01/2007
16/01/2007
These were intended to represent the 11th, 12th, 15th and 16th of January. However, because it was expecting MDY, Excel interpreted the first two entries as November 1 and December 1 and coerced the text to date index numbers (39387 and 39417). For ‘months’ 15 and 16 no such interpretation was possible, so it left the text exactly as in the CSV.
The coercion is automatic, with no option to turn it off, and it cannot be reversed by changing the cell format. Various ways to address this (common) issue are mentioned in the link kindly provided by @Gerard Wilkinson. I am not providing a full solution here since (a) some things are under user control (e.g. the ‘1904’ system is an option, as is the choice between MDY and DMY) and (b) the preferred route will depend to some extent on how often it is required and what the user is most familiar with.
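As one possible repair on the R side, here is a minimal sketch. It assumes the column was read back as text, that Excel used the ‘1900’ date system (so serial numbers count days from 1899-12-30 in R terms), and that every serial number is a DMY date that was misread as MDY. The vector `raw` simply reuses the values from the question.

```r
# Mixed column as read back into R (values taken from the question).
raw <- c("39387", "39417", "15/01/2007", "16/01/2007")

# Entries made only of digits are Excel serial numbers; the rest are text dates.
is_serial <- grepl("^[0-9]+$", raw)

fixed <- rep(as.Date(NA), length(raw))

# Text entries were never coerced by Excel: parse them as day/month/year.
fixed[!is_serial] <- as.Date(raw[!is_serial], format = "%d/%m/%Y")

# Serial entries: convert from the assumed 1900 date system, then swap
# month and day to undo the MDY misreading.
misread <- as.Date(as.numeric(raw[is_serial]), origin = "1899-12-30")
fixed[is_serial] <- as.Date(format(misread, "%Y-%d-%m"))

print(fixed)
# [1] "2007-01-11" "2007-01-12" "2007-01-15" "2007-01-16"
```

If the workbook uses the ‘1904’ system instead, the origin would need to change accordingly, so check that setting before relying on this.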

Related

Excel date format issue from multiple files

I have data coming in from different versions of a machine in CSV format but each machine (due to their age) provides the date/time in a different format.
2020-10-30T13:24:26.874Z
2020-10-30T14:37:27.052Z
10/30/2020 09:29:06
10/30/2020 11:42:47
2020-10-30T14:10:35.422Z
11/02/2020
I've used the following formulas to split the date and time into different columns:
=LEFT(O1399,10)
=IF(RIGHT(O1399,1)="Z",LEFT(RIGHT(O1399,13),12),RIGHT(O1399,8))
However some of my date results are coming up as follows:
43872.3807
43872.3829
We've also had issues where the date is in an American format, 10/6/20 (6 October 2020), but the result is read as a UK date, 10/6/2020 (10 June 2020).
Can anyone help with a simple solution of how I can get all of the dates and times into a UK format?
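Since the rest of this page works in R, here is a minimal R sketch of one way to normalise such a column outside Excel. It assumes the values are read as text, that the slash-separated values are always month/day/year, and that all timestamps can be treated as UTC (the machines may actually log local time, in which case the time zone handling needs adjusting). The vector `raw` holds sample values from the question.

```r
# Sample values from the question, read as plain text.
raw <- c("2020-10-30T13:24:26.874Z", "10/30/2020 09:29:06", "11/02/2020")

# Try each known layout in turn; strptime() returns NA where a layout does not fit.
parsed <- as.POSIXct(raw, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")  # ISO 8601
us_datetime <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
us_date <- as.POSIXct(raw, format = "%m/%d/%Y", tz = "UTC")

# Fill the gaps left by the first layout with the next ones.
parsed[is.na(parsed)] <- us_datetime[is.na(parsed)]
parsed[is.na(parsed)] <- us_date[is.na(parsed)]

uk_date <- format(parsed, "%d/%m/%Y")
uk_time <- format(parsed, "%H:%M:%S")

print(uk_date)  # "30/10/2020" "30/10/2020" "02/11/2020"
print(uk_time)  # "13:24:26" "09:29:06" "00:00:00"
```

A value such as 10/6/20 is genuinely ambiguous, so no parser can tell the two readings apart; the layout has to be known per machine.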

Trying to correctly format all the dates in RStudio imported from Excel

I imported some data from Excel into RStudio (as a CSV file). The data contains date information. The date format I want is month-day-year (e.g. 2-10-16 means February 10th 2016). The problem is that Excel auto-fills 2-10-16 to 2002-10-16, and the problem persists when I import the data into R. So my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted dates (e.g. 2002-10-16). Because I have a lot of dates, it is impractical to change everything manually. I have tried this code:
as.Date(data[,1], format="%m-%d-%y")
but it gives me NA for the incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to make all the dates correctly formatted?
Thank you very much in advance!
Would you consider making the date format consistent in Excel before importing the data into R?
The best approach is likely to change how the data is captured in Excel, even if it means storing the dates as strings. What you're asking for is string manipulation followed by conversion to a date, which could potentially create incorrect data.
This will remove the first two digits and then allow conversion to a date.
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
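To apply the same idea to a whole column, a slightly stricter pattern may be safer, because `'^\\d{2}'` would also strip a genuine two-digit month such as the `12` in `12-5-16`. The sketch below only drops the leading two digits when they are followed by two more digits and a hyphen, i.e. when the string starts with a four-digit "year". The vector `dates` is made-up sample data.

```r
# Made-up sample: one correct value, one mangled by Excel, one with a two-digit month.
dates <- c("2-10-16", "2002-10-16", "12-5-16")

# Remove the first two digits only when the value starts with four digits and a hyphen.
cleaned <- sub("^\\d{2}(\\d{2}-)", "\\1", dates)

result <- as.Date(cleaned, format = "%m-%d-%y")
print(result)
# [1] "2016-02-10" "2016-02-10" "2016-12-05"
```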

Values in column change when reading in excel file

I am trying to read in an Excel file with a column of timestamps using the readxl package. For this particular file the times are irregularly spaced, so they may look like 00:01, 00:03, 00:04, 00:08, 00:10, etc., and I need these timestamps to be read into R correctly. Instead, the timestamps turn into seemingly random decimals.
I looked in the Excel file (which is output by a different program) and the column type within Excel appears to be "custom". When I convert that "custom" column to "text", it shows me the decimals that are actually stored and that are being read into R. Is there a way to load the timestamps instead of the decimals?
I have already tried using col_types to make the column text or integer, but it still reads the values in as decimals and not as timestamps.
library(readxl)  # provides read_xlsx() and cell_cols()

df <- read_xlsx(
  "./Data/LAFLAC_60sec_9999Y1_2019-06-28.xlsx",
  range = cell_cols("J:CE"),  # read columns J through CE only
  col_names = TRUE
)
The decimals represent the fraction of a day elapsed since midnight, so 1:00 am is 0.041667, or 1/24.
In R, you can convert those numbers back into timestamps in a variety of ways.
See this answer for more info:
https://stackoverflow.com/a/14484019/6912825
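As a minimal base-R sketch of one such conversion, the fraction can be scaled to seconds and split into hours and minutes. The vector `frac` is made-up sample data standing in for the values read from the file, and the rounding to whole seconds is an assumption about the precision needed.

```r
# Made-up day fractions: one minute, three minutes and one hour after midnight.
frac <- c(1 / 1440, 3 / 1440, 1 / 24)

# A day has 86400 seconds; round to absorb floating-point noise.
secs <- round(frac * 86400)

# Build HH:MM labels from the whole hours and the remaining whole minutes.
stamps <- sprintf("%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60)
print(stamps)
# [1] "00:01" "00:03" "01:00"
```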

Converting dates in SPSS

I have data from Survey Monkey and the dates that the responses were provided are transferring over in a strange format. They look like:
"1036914:34:45.00"
I have altered the variable type to every available option that SPSS provides and none of them are giving me the right data.
The number that I pasted above should be a date/time between the end of March and the beginning of April of 2018.
Any thoughts?
This should be the number of
"hours:minutes:seconds.centiseconds"
that have passed since 01/01/1900. I looked it up and your example would be
Monday 16 April 2018 at 18:34:45
This is probably an MS Excel format?
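The arithmetic can be checked with a short sketch, written in R here to match the rest of the page rather than in SPSS syntax. It treats the value as elapsed hours, minutes and seconds and adds it to an origin. The first origin is the 01/01/1900 suggested above; the second, 1899-12-30, is an assumption based on the origin that Excel's 1900 date system effectively uses.

```r
stamp <- "1036914:34:45.00"

# Split into hours, minutes and seconds, then convert everything to seconds.
parts <- as.numeric(strsplit(stamp, ":", fixed = TRUE)[[1]])
secs <- parts[1] * 3600 + parts[2] * 60 + parts[3]

# Elapsed time counted from 1 January 1900.
since_1900 <- format(as.POSIXct(secs, origin = "1900-01-01", tz = "UTC"),
                     "%Y-%m-%d %H:%M:%S")

# Elapsed time counted from the assumed Excel-style origin.
since_excel <- format(as.POSIXct(secs, origin = "1899-12-30", tz = "UTC"),
                      "%Y-%m-%d %H:%M:%S")

print(since_1900)   # "2018-04-16 18:34:45"
print(since_excel)  # "2018-04-14 18:34:45"
```

Neither result falls between the end of March and the beginning of April, so the true origin or unit may differ from both assumptions.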

Import data in R from Access

I'm trying to import a table from Microsoft Access (.accdb) to R.
The code that I use is:
library(RODBC)
testdb <- file.path("modelEAU Database V.2.accdb")
channel <- odbcConnectAccess2007(testdb)
WQ_data <- sqlFetch(channel, "WaterQuality")
It seems to work, but the problem is importing the date and time data. In the Access file there are two columns, one with a date field (dd/mm/yyyy) and another with a time field (hh:mm:ss). When I import them into R, the date column appears in yyyy-mm-dd format and the time column appears as 1899-12-30 hh:mm:ss. Also, R doesn't recognise these formats as a variable, and I can't work with them.
I also tried the mdb.get function, but it didn't work either.
Does somebody know how to import the data into R from Access while defining the date and time format? Any idea how to import the Access file as a text file?
Note: I'm working with Office 2010 and R version 2.14.1
Thanks a lot in advance.
Look at the result of running str on your data frame. That will tell you more about how the data is actually stored. Generally, dates and times are stored as a number counted from an origin date (Access uses 30 Dec. 1899 because MS thought that 1900 was a leap year). Sometimes the value is the number of days since the origin, with the time represented as a fraction of the day; other times it is the number of seconds (or milliseconds) since the origin.
You will need to see how the data was sent (whether Access and ODBC converted it to strings first, or sent days or seconds); then you will have a better feel for how to work with these values (possibly converting them) in R.
There is an article in the June 2004 edition of R News (the predecessor to the R Journal) that details the common ways to deal with dates and times in R and could be very useful to you.
You should decide what you want to end up with, a single column of DateTimes, 2 columns with numbers, 2 columns with characters, etc.
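If a single date-time column is the goal, one possible sketch is below. The data frame is a hypothetical stand-in for what `sqlFetch` returned, with assumed column names `Date` and `Time` and assumed classes (`Date` and `POSIXct` in UTC); check the real ones with `str(WQ_data)` first.

```r
# Hypothetical stand-in for the imported table; names and classes are assumptions.
WQ_data <- data.frame(
  Date = as.Date(c("2011-03-01", "2011-03-02")),
  Time = as.POSIXct(c("1899-12-30 08:15:00", "1899-12-30 14:30:30"), tz = "UTC")
)

# Keep only the clock part of Time (dropping the 1899-12-30 placeholder date)
# and paste it onto the real date before parsing the combined string.
WQ_data$DateTime <- as.POSIXct(
  paste(format(WQ_data$Date, "%Y-%m-%d"), format(WQ_data$Time, "%H:%M:%S")),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

combined <- format(WQ_data$DateTime, "%Y-%m-%d %H:%M:%S")
print(combined)
# [1] "2011-03-01 08:15:00" "2011-03-02 14:30:30"
```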

Resources