R date conversion to POSIX format in Tableau - r

I am trying to connect R, Impala, and Tableau.
I have R code in a Tableau calculated field that runs on a date field.
The R code is:
script_str('date<- as.POSIXct(strptime(.arg1,"%m%d%y"))',attr([date_val]))
I want this code to run on the date values extracted from Impala.
The date_val field here is extracted from Impala and converted to the date type in Tableau.
However, the above code returns NULL. I also tried setting the date_val field to the string, date_time, and numeric data types.
I am doing this to perform a cut operation on the dates obtained (e.g. intervals<-cut(date,"1 day")). To do this, I need the field date in numeric format.
I even tried script_str('date<- as.POSIXct(strptime(as.character(.arg1),"%m%d%y"))',attr([date_val]))
Has anybody faced a similar issue?
Any help in this regard will be appreciated.
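For reference, the parse-then-cut step can be tried in plain R, outside Tableau. This is only a sketch: the input strings and their "%Y-%m-%d" layout are assumptions about what Tableau passes in .arg1, not something confirmed in the question. Since SCRIPT_STR expects a character result, the sketch also converts the factor returned by cut() to character, which may be relevant to the NULL.

```r
# Hypothetical input: the kind of strings Tableau might pass in .arg1 (an assumption).
arg1 <- c("2015-08-20", "2015-08-20", "2015-08-22")

# The format string must match the incoming text exactly;
# with these strings, "%m%d%y" would not match and would give NA.
parsed <- as.POSIXct(strptime(arg1, "%Y-%m-%d", tz = "UTC"))

# cut() on POSIXct values returns a factor whose levels label each 1-day interval.
intervals <- cut(parsed, "1 day")

# Convert the factor to character so a string-returning script call can use it.
result <- as.character(intervals)
```

Inside the calculated field, the last expression of the R script would need to be the character vector itself rather than an assignment.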

Related

Trying to correctly format all the dates in RStudio imported from Excel

I imported some data from Excel into RStudio (as a csv file). The data contains date information. The date format I want is month-day-year (e.g. 2-10-16 means February 10th 2016). The problem is that Excel auto-fills 2-10-16 to 2002-10-16, and the problem persists when I import the data into R. So my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted dates (e.g. 2002-10-16). Because I have a lot of dates, it is impractical to change everything manually. I have tried this code:
as.Date(data[,1], format="%m-%d-%y") but it gives me NA for the incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to get all the dates correctly formatted?
Thank you very much in advance!
Would you consider making the date format consistent in Excel before importing the data into R?
The best approach is likely to change how the data is captured in Excel even if it means storing the dates as strings. What you're looking for is string manipulation to then convert into a date which could potentially create incorrect data.
This will remove the first two digits and then allow conversion to a date.
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
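The same idea can be applied to a whole column that mixes both forms. The sketch below assumes a made-up vector named dates and strips the leading two digits only from entries that start with a four-digit year, leaving the already-correct entries untouched.

```r
# Hypothetical column mixing correct (m-d-yy) and auto-filled (20mm-dd-yy) values.
dates <- c("2-10-16", "2002-10-16", "2003-05-16")

# Only entries that begin with four digits need the leading "20" removed.
needs_fix <- grepl("^\\d{4}-", dates)
fixed <- ifelse(needs_fix, sub("^\\d{2}", "", dates), dates)

# Every entry is now month-day-year with a two-digit year.
parsed <- as.Date(fixed, format = "%m-%d-%y")
```

This still relies on every four-digit value being an auto-filled month; a genuine year-first date in the column would be misread.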

Inconsistency date value when read.xlsx in R

I am using the read.xlsx function in R to read Excel sheets. All the values of a date column 'A' are of the form dd/mm/yyyy. However, when using the read.xlsx function, the parsed date values range from an integer, e.g. 42283, to a string, e.g. 20/08/2015. This problem persists even when I use read.xlsx2.
I guess the inconsistent format across rows makes it hard to convert the column to a single standard format. Also, it is hard to specify the column classes in read.xlsx since I have more than 100 variables.
Are there ways around this problem, and is this an Excel-specific problem?
Thank you!
This problem with date formats is pervasive and it seems like every R package out there deals with it differently. My experience with read.xlsx has been that it sometimes saves the date as a character string of numbers, e.g. "42438" as character data that I then have to convert to numeric and then to POSIXct. Then other times, it seems to save it as numeric and sometimes as character and once in a while, actually as POSIXct! If you're consistently getting character data in the form "20/08/2015", try the lubridate package:
library(lubridate)
dmy("20/08/2015")
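If the column mixes both forms, a base R sketch along these lines may help. It assumes the values arrive as character, that all-digit entries are Excel day serials counted from 1899-12-30, and that everything else is dd/mm/yyyy; the vector name raw is made up for illustration.

```r
# Hypothetical column as read from the sheet: a serial number and a date string.
raw <- c("42283", "20/08/2015")

# Entries made only of digits are treated as Excel day serials (an assumption).
is_serial <- grepl("^[0-9]+$", raw)

# Start with an all-NA Date vector and fill each kind separately.
out <- rep(as.Date(NA), length(raw))
out[is_serial] <- as.Date(as.numeric(raw[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(raw[!is_serial], format = "%d/%m/%Y")
```

Anything that matches neither pattern stays NA, which makes leftover problem rows easy to find with is.na(out).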

POSIXt and POSIXct Date Comparison Count in R

I created a data table called ltfs from a table named ltfs that I loaded through an ODBC connection:
ltfs<-data.table(ltfs)
In this table there is a column with a date field in it. When I check the class of that column, it returns "POSIXct" "POSIXt". The dates look like 2014-01-01. I am trying to count the number of entries whose date in this field is on or before a certain date. I tried using the .N feature but am getting an error related to the date formats. My code is below; any help would be much appreciated.
ltfs<-data.table(ltfs)
datecount<-ltfs[Eligibility_Date<=as.Date("2011-01-01"),.N]
I know it has to do with the date format (either the criteria or the format of the dates in the table) just not sure how to handle them.
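One possible way to handle this is to compare the column against a POSIXct cutoff instead of a Date, so that both sides of the comparison have the same class. The sketch below uses a plain data frame with made-up rows and assumes the column is in UTC; the time zone of the real ODBC data is not stated in the question.

```r
# Hypothetical stand-in for the ODBC table, with a POSIXct column.
ltfs <- data.frame(
  Eligibility_Date = as.POSIXct(c("2010-06-15", "2011-01-01", "2014-01-01"), tz = "UTC")
)

# Build the cutoff with the same class and time zone as the column.
cutoff <- as.POSIXct("2011-01-01", tz = "UTC")

# Count rows on or before the cutoff.
# With data.table the equivalent would be: ltfs[Eligibility_Date <= cutoff, .N]
datecount <- sum(ltfs$Eligibility_Date <= cutoff)
```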

Incrementing date column in sqlite3

I have a sqlite3 table with dates stored as text not null with the format MM-DD-YY, and have been doing most of my manipulation in Java by iterating through the cursor, however, it's proving to be a bit ugly/cumbersome, and I was wondering if there's any way to do an update statement in sqlite3 to increment/decrement all records by a certain number of days. I'm mostly having trouble figuring out how to convert my text column into a format that I can use with something along the lines of datetime(myDate, '+1 Day');. Can anybody point me in the right direction? Thanks guys!!
The documentation of the built-in date functions documents the supported date format:
YYYY-MM-DD
You should store your date values in this format.
If you cannot do this, you can use substr() to extract the fields of the date and convert it into the supported format.

Import data in R from Access

I'm trying to import a table from Microsoft Access (.accdb) to R.
The code that I use is:
library(RODBC)
testdb <- file.path("modelEAU Database V.2.accdb")
channel <- odbcConnectAccess2007(testdb)
WQ_data <- sqlFetch(channel, "WaterQuality")
It seems to work, but the problem is importing the date and time data. The Access file has two columns, one with a date field (dd/mm/yyyy) and another with a time field (hh:mm:ss). When I import them into R, the date column shows the date in yyyy-mm-dd format and the time column shows 1899-12-30 hh:mm:ss. Also, R doesn't recognise these formats as variables, so I can't work with them.
Also, I tried the mdb.get function but it didn't work as well.
Does somebody know how to import the data into R from Access while defining the date and time format? Any idea how to import the Access file as a text file?
Note: I'm working with Office 2010 and R version 2.14.1
Thanks a lot in advance.
Look at the result of running str on your data frame. That will tell you more about how the data is actually stored. Generally dates and times are stored as a number from an origin date (Access uses 30 Dec. 1899 because MS thought that 1900 was a leap year). Sometimes it is stored as the number of days since the origin with time being represented as a fraction of the day, other times it is the number of seconds (or milliseconds) since the origin.
You will need to see how the data was sent (whether Access and ODBC converted to strings first, or sent days or seconds), then you will have a better feel for how to work with these (possibly converting) in R.
There is an article in the June 2004 edition of R News (the predecessor to the R Journal) that details the common ways to deal with dates and times in R and could be very useful to you.
You should decide what you want to end up with, a single column of DateTimes, 2 columns with numbers, 2 columns with characters, etc.
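If the goal is a single column of DateTimes, one way to get there is sketched below. It assumes both columns arrive as POSIXct, with the time column carrying the 1899-12-30 placeholder date described in the question; the column names Date and Time and the sample rows are made up for illustration.

```r
# Hypothetical stand-in for the imported table.
WQ_data <- data.frame(
  Date = as.POSIXct(c("2012-03-01", "2012-03-02"), tz = "UTC"),
  Time = as.POSIXct(c("1899-12-30 08:15:00", "1899-12-30 14:30:00"), tz = "UTC")
)

# Keep the calendar part of Date and the clock part of Time, then re-parse.
stamp <- paste(format(WQ_data$Date, "%Y-%m-%d"), format(WQ_data$Time, "%H:%M:%S"))
WQ_data$DateTime <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Running str(WQ_data) afterwards shows whether the new column has the POSIXct class as intended.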
