Trying to correctly format all the dates in RStudio imported from Excel

I imported some data from Excel into RStudio (as a CSV file). The data contains dates, and the format I want is month-day-year (e.g. 2-10-16 means February 10th, 2016). The problem is that Excel auto-fills 2-10-16 to 2002-10-16, and the problem persists after I import the data into R. So my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted ones (e.g. 2002-10-16). Because I have a lot of dates, changing them all manually is impossible. I have tried this code:
as.Date(data[,1], format="%m-%d-%y")
but it gives me NA for the incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to make all the dates correctly formatted?
Thank you very much in advance!

Would you consider making the date format consistent in Excel before importing the data into R?

The best approach is likely to change how the data is captured in Excel, even if it means storing the dates as strings. Otherwise, what you're looking for is string manipulation followed by conversion to a date, which could potentially create incorrect data.
This will remove the first two digits and then allow conversion to a date.
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
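The one-liner above handles a single string. For a whole column that mixes both forms, a minimal sketch is to strip the leading "20" only from entries that start with a four-digit block, then parse everything with the same format. The helper name `fix_dates` is hypothetical, and it assumes every mangled entry was turned into 20MM-DD-YY by Excel; if that guess does not hold for some rows, this is exactly where incorrect data could creep in.

```r
# Hypothetical helper for a column mixing "2-10-16" and Excel-mangled "2002-10-16"
fix_dates <- function(x) {
  x <- as.character(x)  # also handles columns read in as factors
  # Mangled entries start with a four-digit "year" such as 2002
  mangled <- grepl("^\\d{4}-", x)
  # Drop the "20" Excel prepended, so "2002-10-16" becomes "02-10-16"
  x[mangled] <- sub("^\\d{2}", "", x[mangled])
  # Leading zeros are optional on input, so "2-10-16" parses too
  as.Date(x, format = "%m-%d-%y")
}

fix_dates(c("2-10-16", "2002-10-16"))
# "2016-02-10" "2016-02-10"
```

Applied to the question's data this would be `data[,1] <- fix_dates(data[,1])`.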

Related

How can I convert a column of date/time data from numeric to character in R?

I have a column of data compiled from Excel files. Some of the values in the date column changed upon binding and are now in numeric date format (despite starting out as character), whilst others remain as they were (yyyy-mm-dd hh:mm). How can I change the entire column to the same date format (yyyy-mm-dd hh:mm)?
Thanks in advance.
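Try strptime, wrapped in as.POSIXct so the column is stored as POSIXct rather than the list-based POSIXlt:
df$column <- as.POSIXct(strptime(df$column, format = '%Y-%m-%d %H:%M'))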
OK, so I finally cracked it. This is probably obvious to everyone, but in case a newbie like me has the same issue, this is what solved it for me.
I had two sets of data that I'd bound into the same table. One set came from XLSX files and the other from CSV files. Both displayed fine in R, but when combined, the CSV-derived data lost its formatting and reverted to numeric dates. I discovered that the 'date' columns in the XLSX-derived tables were 'character', whilst the 'date' columns in the CSV-derived tables were 'factor with 1 level'. When combined, the character data kept its format (i.e. looked like a date, yyyy-mm-dd hh:mm) and the factor data turned into numeric dates.
So to rectify this, I used the following on the .csv (factor) tables before binding:
myfile$Date <- as.character(myfile$Date)
This changed the columns to character to match the others and the bind was successful and all date formatting was preserved. Thank you for your help!
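A small reproduction of that fix, using two hypothetical one-row data frames as stand-ins for the XLSX- and CSV-derived tables, shows why the factor column misbehaves and how converting it first keeps the bind clean. The final `as.POSIXct()` step is an optional extra and assumes every timestamp follows yyyy-mm-dd hh:mm.

```r
# Hypothetical stand-ins for the two sources described above
xlsx_part <- data.frame(Date = "2019-06-28 10:15", stringsAsFactors = FALSE)
csv_part  <- data.frame(Date = "2019-06-29 08:30", stringsAsFactors = TRUE)

# A factor stores integer codes; coercing it to a number exposes the code, not the date
as.integer(csv_part$Date)  # 1

# Convert the factor column to character before binding
csv_part$Date <- as.character(csv_part$Date)
combined <- rbind(xlsx_part, csv_part)

# Optionally parse into a proper date-time once the column is consistent
combined$Date <- as.POSIXct(combined$Date, format = "%Y-%m-%d %H:%M", tz = "UTC")
str(combined)
```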

Values in column change when reading in excel file

I am trying to read in an Excel file using the readxl package; it has a column of timestamps. For this particular file the times are irregularly spaced, so they may look like 00:01, 00:03, 00:04, 00:08, 00:10, etc., and I need these timestamps to be read into R correctly. Instead, the timestamps turn into seemingly random decimals.
I looked in the Excel file (which is output by a different program), and the column type within Excel appears to be "Custom". When I convert that "Custom" column to "Text", it shows me the decimals that are actually stored and that are being read into R. Is there a way to load the timestamps instead of the decimals?
I have already tried using col_types to read the column as text or integers, but it still reads the values as decimals rather than timestamps.
df <- readxl::read_xlsx(
  "./Data/LAFLAC_60sec_9999Y1_2019-06-28.xlsx",
  range = readxl::cell_cols("J:CE"),
  col_names = TRUE
)
The decimals represent the fraction of a day elapsed since midnight, so 1:00 am is 0.041667, or 1/24.
In R, you can convert those numbers back into timestamps in a variety of ways.
Try this page for more info
https://stackoverflow.com/a/14484019/6912825
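As one of those ways, here is a minimal base-R sketch that turns the day fractions back into "HH:MM:SS" strings. The helper name `fraction_to_hms` is hypothetical (it is not part of readxl or any package), and it assumes the values are pure times of day (0 <= x < 1). Rounding to whole seconds first keeps a value such as 59.999... seconds from being truncated to the wrong minute.

```r
# Hypothetical helper: convert Excel day fractions (0 <= x < 1) to "HH:MM:SS"
fraction_to_hms <- function(x) {
  # One day = 86400 seconds; round to whole seconds to absorb floating-point error
  secs <- as.integer(round(x * 86400))
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

fraction_to_hms(c(1/1440, 3/1440, 1/24))
# "00:01:00" "00:03:00" "01:00:00"
```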

Date column produces unknown numbers in r

I wrote a data frame in CSV format with R and opened it in Excel. Then I converted it to an Excel workbook and made some edits to it.
When I imported that Excel file into R again, the date column looked like this (some values are numbers and some are dates):
Date
39387
39417
15/01/2007
16/01/2007
I tried to change the format in Excel but failed. Choosing the General or Number format in Excel produces numbers like those above, which seem to have no relation to the dates.
It seems all four of your examples refer to dates in January (the 11th and 12th for the first two), that the Excel involved is configured to expect MDY (rather than DMY, as is popular in the UK for example) with its date system set to '1900', and that the CSV contains the dates written as 'conventional' dates rather than as date serial numbers.
So what Excel saw first was:
11/01/2007
12/01/2007
15/01/2007
16/01/2007
intended to represent the 11th, 12th, 15th and 16th of January. However, because it expects MDY, Excel interpreted the first two entries as November 1 and December 1 (coercing the text to date serial numbers). For 'months' 15 and 16 it did no interpretation and simply kept the text as it was in the CSV.
The coercion is automatic; there is no option to turn it off, and it cannot be 'reversed' with a change of format. Various ways to address this (common) issue are mentioned in the link kindly provided by @Gerard Wilkinson. I am not providing a full solution here since (a) some things are under user control (e.g. the '1904' system is an option, as is the choice between MDY and DMY) and (b) the preferred route will depend to some extent on how often it is required and what the user is most familiar with.
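If the values have already reached R, the explanation above suggests a repair. This sketch assumes the Excel '1900' date system, whose serial numbers line up with an origin of 1899-12-30 for modern dates, and assumes every numeric entry is a DMY date that Excel misread as MDY; the helper name `fix_mixed_dates` is hypothetical. It converts each serial number to the date Excel stored, writes that back out as MM/DD/YYYY to recover the original text, and re-reads the text as DD/MM/YYYY.

```r
# Hypothetical helper for a column mixing Excel serial numbers and DD/MM/YYYY text
fix_mixed_dates <- function(x) {
  x <- as.character(x)
  is_serial <- grepl("^[0-9]+$", x)
  # Text entries are genuine DMY dates; serial numbers give NA here
  out <- as.Date(x, format = "%d/%m/%Y")
  # Date that Excel stored for each serial number (1900 date system)
  stored <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Excel read the text as MDY, so rebuild that text and re-read it as DMY
  out[is_serial] <- as.Date(format(stored, "%m/%d/%Y"), format = "%d/%m/%Y")
  out
}

fix_mixed_dates(c("39387", "39417", "15/01/2007", "16/01/2007"))
# "2007-01-11" "2007-01-12" "2007-01-15" "2007-01-16"
```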

Read Excel file in R with multiple date format

I'm trying to read an Excel file into R; it has two columns containing dates. Here is my problem: when I view the data in R, most of the dates are in the correct format, but some were transformed into numbers that don't make sense at all. I attached images to show the different outputs from R and Excel.
(Only pay attention to the columns "ArrivalDate" and "ActlFlightDate".)
Output seen from R
Output seen from Excel
My question is: how, in R, can I turn those numbers into the dates they are supposed to be? Especially since those columns are of class character.
Thank you in advance!
The dates in your Excel file are in different formats. That is why some are aligned to the left side of the column and some to the right side in the spreadsheet. They are probably a mix of actual Excel dates and cells formatted as Text or General. Copy the columns to new columns, select the new columns, right-click, and change the format until all your dates are uniform. Then you can import into R and have consistent date data.
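If fixing the spreadsheet is not an option, a similar repair can be attempted in R. This is a minimal sketch that assumes the stray numbers are Excel serial dates from the default 1900 date system (origin 1899-12-30) and that the remaining text entries use yyyy-mm-dd; the helper name `parse_mixed_excel_dates` and its `text_format` argument are hypothetical, so adjust the format to whatever your columns actually contain.

```r
# Hypothetical helper for a character column mixing Excel serial numbers and date text
parse_mixed_excel_dates <- function(x, text_format = "%Y-%m-%d") {
  x <- as.character(x)
  # Parse the text entries; serial numbers such as "43101" fail this format and give NA
  out <- as.Date(x, format = text_format)
  # Serial numbers (optionally with a fractional time part) count days from 1899-12-30
  is_serial <- grepl("^[0-9]+(\\.[0-9]+)?$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  out
}

parse_mixed_excel_dates(c("43101", "2018-01-05"))
# "2018-01-01" "2018-01-05"
```

With a hypothetical data frame `data`, this would be applied as `data$ArrivalDate <- parse_mixed_excel_dates(data$ArrivalDate)`, and likewise for `ActlFlightDate`.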

Import data in R from Access

I'm trying to import a table from Microsoft Access (.accdb) to R.
The code that I use is:
library(RODBC)
testdb <- file.path("modelEAU Database V.2.accdb")
channel <- odbcConnectAccess2007(testdb)
WQ_data <- sqlFetch(channel, "WaterQuality")
It seems to work, but the problem is importing the date and time data. In the Access file there are two columns, one a date field (dd/mm/yyyy) and the other a time field (hh:mm:ss). When I import them into R, the date column shows the date in yyyy-mm-dd format and the time column shows 1899-12-30 hh:mm:ss. Also, R can't recognise these formats, so I can't work with them.
I also tried the mdb.get function, but it didn't work either.
Does anybody know how to import the data from Access into R while defining the date and time format? Or any idea how to import the Access file as a text file?
Note: I'm working with Office 2010 and R version 2.14.1.
Thanks a lot in advance.
Look at the result of running str on your data frame. That will tell you more about how the data is actually stored. Generally, dates and times are stored as a number counted from an origin date (Access uses 30 Dec. 1899 because MS thought that 1900 was a leap year). Sometimes it is the number of days since the origin, with the time represented as a fraction of a day; other times it is the number of seconds (or milliseconds) since the origin.
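For the days-since-origin case, a minimal sketch: the values below are hypothetical, and the 1899-12-30 origin is the one described above. Multiplying by 86400 turns days (with the time as a fraction) into seconds, and rounding guards against floating-point error; if the driver sent seconds instead, drop the multiplication.

```r
# Hypothetical Access-style values: whole days since 1899-12-30 plus a day fraction
days <- c(40969.34375, 40970.5)

# Convert days to seconds and count them from the Access origin, in UTC
res <- as.POSIXct(round(days * 86400), origin = "1899-12-30", tz = "UTC")
format(res, "%Y-%m-%d %H:%M:%S")
# "2012-03-01 08:15:00" "2012-03-02 12:00:00"
```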
You will need to see how the data was sent (whether Access and ODBC converted it to strings first, or sent days or seconds); then you will have a better feel for how to work with these values (possibly converting them) in R.
There is an article in the June 2004 edition of R News (the predecessor of the R Journal) that details the common ways to deal with dates and times in R and could be very useful to you.
You should decide what you want to end up with: a single column of date-times, two numeric columns, two character columns, etc.
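If the goal is a single date-time column, one approach, assuming the columns arrive as described in the question (the time carrying the dummy 1899-12-30 date), is to paste the date part of one column onto the clock part of the other and parse the result once. The data frame below and its column names `Date` and `Time` are hypothetical stand-ins for what `sqlFetch()` returned; using UTC throughout avoids daylight-saving shifts.

```r
# Hypothetical stand-in for WQ_data as returned by sqlFetch()
WQ_data <- data.frame(
  Date = as.Date(c("2012-03-01", "2012-03-02")),
  Time = as.POSIXct(c("1899-12-30 08:15:00", "1899-12-30 17:45:30"), tz = "UTC")
)

# Keep the calendar date from Date and the clock time from Time, then parse once
WQ_data$DateTime <- as.POSIXct(
  paste(format(WQ_data$Date, "%Y-%m-%d"), format(WQ_data$Time, "%H:%M:%S")),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

str(WQ_data)
```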
