I'm trying to import a table from Microsoft Access (.accdb) into R.
The code that I use is:
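library(RODBC)
# Path to the Access database (relative to the working directory)
testdb <- file.path("modelEAU Database V.2.accdb")
channel <- odbcConnectAccess2007(testdb)
WQ_data <- sqlFetch(channel, "WaterQuality")
odbcClose(channel)  # release the ODBC connection when done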
It seems to work, but the problem is with the date and time data. In the Access file there are two columns, one a date field (dd/mm/yyyy) and the other a time field (hh:mm:ss). When I import them into R, the date column appears in yyyy-mm-dd format and the time column appears as 1899-12-30 hh:mm:ss. Also, R doesn't recognise these columns as usable date/time variables, so I can't work with them.
I also tried the mdb.get function (from the Hmisc package), but it didn't work either.
Does anybody know how to import the data from Access into R while specifying the date and time format? Or any idea how to import the Access file as a text file?
Note: I'm working with Office 2010 and R version 2.14.1.
Thanks a lot in advance.
Look at the result of running str() on your data frame. That will tell you more about how the data is actually stored. Generally, dates and times are stored as a number counted from an origin date (Access uses 30 December 1899 because Microsoft treated 1900 as a leap year). Sometimes the value is the number of days since the origin, with the time represented as a fraction of a day; other times it is the number of seconds (or milliseconds) since the origin.
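As a minimal sketch, assuming str() shows that the driver handed back raw Access serial values (days since 30 December 1899, with the time of day as a fractional day), they can be turned into R date-times as below. access_to_posixct is a hypothetical helper name, and UTC is assumed because Access date/time values carry no time zone. Serial 25569 is 1 January 1970, so 25569.5 comes out as noon on that day.

```r
# Hypothetical helper: Access/OLE serial day numbers -> POSIXct.
# Whole part = days since 1899-12-30, fractional part = time of day.
access_to_posixct <- function(x, tz = "UTC") {
  # as.POSIXct() on a number counts seconds from the origin,
  # so days are scaled by 86400 seconds per day first
  as.POSIXct(x * 86400, origin = "1899-12-30", tz = tz)
}

serials <- c(25569, 25569.5, 25569.75)
access_to_posixct(serials)
```

If the values turn out to be seconds rather than days, drop the "* 86400".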
You will need to see how the data was sent (whether Access and ODBC converted it to strings first, or sent days or seconds); then you will have a better feel for how to work with these values (possibly converting them) in R.
There is an article in the June 2004 issue of R News (the predecessor to the R Journal) that details the common ways of dealing with dates and times in R and could be very useful to you.
You should decide what you want to end up with: a single column of date-times, two numeric columns, two character columns, etc.
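If the goal is a single date-time column, one option is to take only the clock part of the time column and paste it onto the real date. This is a sketch under assumptions: the import gives a Date column plus a time column whose date part is the dummy 1899-12-30 (as described in the question), and the column names Date and Time and the sample values are invented for illustration.

```r
# Invented sample shaped like the imported columns described in the question
WQ_data <- data.frame(
  Date = as.Date(c("2012-03-01", "2012-03-02")),
  Time = as.POSIXct(c("1899-12-30 08:15:00", "1899-12-30 17:45:30"), tz = "UTC")
)

# Keep only HH:MM:SS from the time column, glue it to the real date, and parse
WQ_data$DateTime <- as.POSIXct(
  paste(format(WQ_data$Date, "%Y-%m-%d"), format(WQ_data$Time, "%H:%M:%S")),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

str(WQ_data)
```

Formatting the time column as text sidesteps the 1899 dummy date entirely; just make sure the tz used when parsing matches the time zone the times were recorded in.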
Related
I imported some data from Excel into RStudio (as a CSV file). The data contains dates, and the format I want is month-day-year (e.g. 2-10-16 means February 10th, 2016). The problem is that Excel auto-converts 2-10-16 to 2002-10-16, and the problem persists when I import the data into R. So my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted ones (e.g. 2002-10-16). Because I have a lot of dates, it is impossible to change them all manually. I have tried this code:
as.Date(data[,1], format="%m-%d-%y")
but it gives me NA for the incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to make all the dates correctly formatted?
Thank you very much in advance!
Would you consider making the date format consistent in Excel before importing the data into R?
The best approach is likely to change how the data is captured in Excel, even if that means storing the dates as strings. What you're describing is string manipulation followed by conversion to a date, which could potentially produce incorrect data.
This removes the first two digits and then allows conversion to a date:
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
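The one-liner above handles a single string. A sketch for a whole column, assuming every Excel-mangled entry looks like yyyy-mm-dd with a year starting with "20", so that only those entries have their first two digits stripped; the vector x is invented sample data.

```r
# Invented sample mixing correct m-d-y strings and Excel-mangled yyyy-mm-dd ones
x <- c("2-10-16", "2002-10-16", "12-1-15")

# Strip the leading two digits only where the value starts with four digits and a dash
fixed <- ifelse(grepl("^\\d{4}-", x), sub("^\\d{2}", "", x), x)

dates <- as.Date(fixed, format = "%m-%d-%y")
dates
```

Here "2002-10-16" becomes "02-10-16", which parses as 10 February 2016, the same as "2-10-16".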
Referring to the above screenshot, I'm trying to crawl data from the Singapore Stock Exchange, whose web content is loaded dynamically from an API call that returns JSON (example here).
I'm having some problems with the dates, which the JSON gives as numbers. For example, 1575491760000 is supposed to be 2019-12-04 20:36:00 GMT.
After some trial and error, I've figured out a solution using R:
as.POSIXct(1575491760000/1000, origin="1970-01-01", tz = 'GMT')
# not sure why the number needs to be divided by 1000 here, but I guess this is what makes it work
and the above code does return "2019-12-04 20:36:00 GMT" in R.
However, my question is: is there a way to do this conversion in Excel? I've tried a few different approaches, but none of them can handle such a long number that encodes both date and time. I'd appreciate it if anyone could provide a specific solution!
Here's the Excel equivalent.
=DATE(1970,1,1) + 1575491760000/(1000*60*60*24)
# 12/4/19 20:36:00 with cell formatting set to m/d/yy h:mm:ss
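This timestamp counts milliseconds since 1/1/1970 (standard Unix time counts seconds, which is why the R code divides by 1000). Excel datetimes increase by one for every day, counting from 1/1/1900.
So to convert the timestamp to an Excel datetime, divide it by the number of milliseconds in a day (1000*60*60*24) and add the result to the date 1/1/70 (serial number 25569 under the hood in Excel). The result is in GMT, just like the timestamp.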
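The same arithmetic can be checked in R with the value from the question. This is a small sketch; the variable names are illustrative only, and it assumes Excel's default 1900 date system, in which 1/1/1970 is serial 25569 (that is, serials count days since 1899-12-30).

```r
# Millisecond timestamp from the JSON (value from the question)
ms <- 1575491760000

# Excel serial in the 1900 date system: 1970-01-01 is serial 25569
excel_serial <- 25569 + ms / (1000 * 60 * 60 * 24)
print(excel_serial, digits = 12)

# Sanity check: turn the serial back into a GMT date-time,
# rounding to whole seconds to absorb floating-point error
back <- as.POSIXct(round(excel_serial * 86400), origin = "1899-12-30", tz = "GMT")
format(back, "%Y-%m-%d %H:%M:%S %Z")
```

The round trip lands back on 2019-12-04 20:36:00 GMT, which confirms that the Excel formula and the R one-liner agree.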
Tearing my hair out on this one. It took me hours just to get rJava up and running (because Mac OS X El Capitan did not want to play nice with Java) in order to load Excel-specific data-importing packages, etc. But in the end this hasn't solved my problem, and I'm just about at my wits' end. Please help.
Basic situation is this:
I have simple Excel data of time durations spanning a couple of years, so the two columns I'm importing are the time (duration) and the year (2016, 2017, etc.).
In Excel the data is formatted as [h]:mm:ss so that it displays correctly (the data is the number of hours worked in a month, so typically something like 80:xx:xx to 120:xx:xx). I'm aware that in Excel, even though the cells are formatted as above and show only the relevant number of hours, Excel has in reality attached an (irrelevant, arbitrary) date to this hours data. I have searched and searched and found no way around this limitation in how Excel handles dates, times and durations.
I import this data into R via the "import data -> import from excel data set" menu item in the R Commander GUI, not the console.
However, when the data is imported into R, it displays as a single number: e.g. approx. 110 hrs becomes 4.xxxxx rather than hh:mm:ss. So when running analyses and generating plots, instead of the meaningful 110:xx:xx-type values, a meaningless 4.xxxxxx is displayed.
If I change the formatting of the Excel cells to display the date as well as the time, rather than using the [h]:mm:ss format, R erroneously interprets the data as something equally useless, like 1901/02/04 05:23 am.
I have installed and loaded a variety of packages such as xlsx, XLConnect, lubridate, etc., but none of them has made any difference to how R interprets the Excel data on import, at least from the GUI.
Please tell me how I can either:
a) edit the raw data to a format that R will understand as a time duration (and nothing but a time duration) in hh:mm:ss format, or
b) format the current data from within R after importing, so that it displays the data in the correct way rather than a useless number or arbitrary date/time?
[Please note: I can use the console when given the commands to execute. But I need to find a solution that ultimately allows the data to be imported and/or manipulated from within the GUI, not by typing a bunch of commands into the console, as the end user (not me) has zero programming ability, cannot use a console, and will only ever use R via the GUI.]
Is your code importing the data from Excel as seconds?
library(lubridate)
duration <- lubridate::as.duration(400000)
as.numeric(duration, "hours")
# [1] 111.1111
as.numeric(duration, "days")
# [1] 4.62963
seconds_to_period(400000)
# [1] "4d 15H 6M 40S"
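Excel stores a duration as a number of days, so the 4.xxxxx in the question is most likely days rather than seconds (110 hours / 24 is about 4.58). A minimal base-R sketch, assuming the imported column holds those day fractions; days_to_hms is a hypothetical helper, and the sample values are invented.

```r
# Assumption: imported values are Excel durations in days (110 h -> 110/24)
excel_days <- c(4.583333333, 3.5)

# Hypothetical helper: format fractional days as [h]:mm:ss, like Excel's display
days_to_hms <- function(d) {
  secs <- round(d * 86400)          # total whole seconds
  h <- secs %/% 3600                # hours are allowed to exceed 24
  m <- (secs %% 3600) %/% 60
  s <- secs %% 60
  sprintf("%d:%02d:%02d", h, m, s)
}

days_to_hms(excel_days)

# For analyses and plots, plain numeric hours are often easier to use
hours <- excel_days * 24
```

The [h]:mm:ss strings are for display only; the numeric hours column is what most plotting and modelling functions will want.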
I wrote a data frame to CSV format with R and opened it in Excel. Then I converted it to an Excel workbook and made some edits to it.
When I imported that Excel file into R again, the date column looked like this (some values are numbers and some are dates):
Date
39387
39417
15/01/2007
16/01/2007
I tried to change the format in Excel but failed. Choosing the General or Number format in Excel produces numbers like the ones above, which seem in no way related to the dates.
It seems all four of your examples are dates in January (the 11th and 12th for the first two), that the Excel involved has been configured to expect MDY (rather than DMY, as is popular in the UK, for example) with its date system set to ‘1900’, and that the CSV has written the dates as 'conventional' dates rather than date index numbers.
So what Excel saw first was:
11/01/2007
12/01/2007
15/01/2007
16/01/2007
intended to represent the 11th, 12th, 15th and 16th of January. However, because it expects MDY, Excel has interpreted the first two entries as November 1 and December 1 (coercing the text to a date index). For ‘months’ 15 and 16 it did no interpretation and just reported the text as it appeared in the CSV.
The coercion is automatic, with no option to turn it off (nor can it be 'reversed' by changing the format). Various ways to address this (common) issue are mentioned in the link kindly provided by @Gerard Wilkinson. I am not providing a full solution here since (a) some things are under user control (e.g. the ‘1904’ system is an option, as is the choice between MDY and DMY) and (b) the preferred route will depend to some extent on how often it is required and what the user is most familiar with.
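If re-importing is not an option, the mixed column can also be repaired in R. A minimal sketch, assuming the column arrives as text (as with the values in the question), that the numeric entries are Excel 1900-system serials (days since 1899-12-30), and, following the explanation above, that those serials are DMY dates Excel misread as MDY, so month and day must be swapped back.

```r
# Mixed column as it came back into R (values from the question), read as text
raw <- c("39387", "39417", "15/01/2007", "16/01/2007")
is_serial <- grepl("^[0-9]+$", raw)

# Excel serials: days since 1899-12-30 (39387 -> 2007-11-01)
misread <- as.Date(as.numeric(raw[is_serial]), origin = "1899-12-30")
# Excel read dd/mm as mm/dd, so swap month and day back
swapped <- as.Date(format(misread, "%Y-%d-%m"), format = "%Y-%m-%d")

# Entries Excel left as text are still in DMY order
text_dates <- as.Date(raw[!is_serial], format = "%d/%m/%Y")

dates <- rep(as.Date(NA), length(raw))
dates[is_serial] <- swapped
dates[!is_serial] <- text_dates
dates
```

The swap is only safe because a misread entry always has a day of 12 or less; entries that stayed as text need no swap.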
I have not worked with SPSS (.sav) files before and am trying to work with some data files provided to me by importing them into R. I did not receive any explanation of the files, and because communication is difficult I am trying to figure out as much as I can on my own.
Here's my first question. This is what the Date field looks like in an R data frame after import:
> dataset2$Date[1:4]
[1] 13608172800 13608259200 13608345600 13608345600
I don't know what dates the data is supposed to be for, but I found that if I divide the above numbers by 10, that seems to give a reasonable date (in February 2013). Can anyone confirm this is indeed what the above represents?
My second question is regarding another column called Begin_time. Here's what that looks like:
> dataset2$Begin_time[1:4]
[1] 29520 61800 21480 55080
Any idea what this represents? I'd like to believe it is some representation of time of day, because the records are wildlife observations, but I don't have any more information to go on. I noticed that if I take the difference between End_Time and Begin_time I get numbers like 120 and 180, which look like minutes to me (3 hours seems a reasonable time to observe a wild animal), but the absolute numbers are far greater than the number of minutes in a day (1440), which leaves me puzzled. Is this some timekeeping format from SPSS? If so, what's the logic?
Unfortunately, I don't have access to SPSS, so any help would be much appreciated.
I had the same problem and this function is a good solution:
spss2date <- function(x) as.Date(x/86400, origin = "1582-10-14")
This is where I found the answer:
http://scs.math.yorku.ca/index.php/R:_Importing_dates_from_SPSS
Dates in SPSS Statistics are represented as floating-point doubles holding the number of seconds since Oct 14, 1582. If you use the SPSS R plugin APIs, they can be automatically converted to R dates, but any proper converter should be able to do this for you.
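Building on those answers, here is a sketch that converts both columns from the question. Date uses the same 1582-10-14 origin as the function above. Begin_time is converted on the assumption (not confirmed by the source) that it is an SPSS TIME value, i.e. seconds since midnight. If so, 29520 would be 08:12:00, and End_Time minus Begin_time differences of 120 and 180 would be 2 and 3 minutes.

```r
# Raw values from the question
date_secs  <- c(13608172800, 13608259200, 13608345600, 13608345600)
begin_secs <- c(29520, 61800, 21480, 55080)

# Same conversion as the answer above: seconds since 1582-10-14 -> Date
obs_date <- as.Date(date_secs / 86400, origin = "1582-10-14")

# Assumption: Begin_time is seconds since midnight (SPSS TIME format)
begin_clock <- sprintf("%02d:%02d:%02d",
                       begin_secs %/% 3600,
                       (begin_secs %% 3600) %/% 60,
                       begin_secs %% 60)

# Date + time of day as one POSIXct; SPSS stores no time zone, so UTC is assumed
begin_dt <- as.POSIXct(date_secs + begin_secs, origin = "1582-10-14", tz = "UTC")

data.frame(obs_date, begin_clock, begin_dt)
```

Note that with this origin the dates fall in early January 2014, not February 2013 as the divide-by-10 guess in the question suggested.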