Dates when exporting to CSV and reading into R - r

I'm using the R[1] package RGoogleDocs[2] to connect to my Google Docs, obtain a list of spreadsheets and import a specific worksheet from a specified spreadsheet. This works without problems when I follow the example given at https://github.com/hammer/google-spreadsheets-to-r-dataframe
The problem I have encountered is with date columns. Under Google Docs I've selected to format these as YYYY-MM-DD and they display fine within Google Docs.
However, the exported CSV that gets imported into R has these as numeric fields, for example:
Displayed in Google Docs > As imported to R
2013-02-15 > 41320
2013-02-19 > 41324
2013-02-26 > 41331
2013-03-22 > 41355
This isn't necessarily a problem, as these appear to be elapsed days, but I don't know the origin they are counted from. Once I know the origin, I can convert them inside R, since as.Date(date, origin="") allows the origin to be specified.
To try to get round this I set the formatting of the date columns to plain text, but despite typing the dates in with leading zeros for days/months < 10, they are exported without them, so the as.Date() function complains about them being in a non-standard format.
I therefore have two options/questions...
1) What is the origin that Google Docs uses internally for representing dates? (I've searched for this through the Google Help but can't find it, and wider web-searches have been fruitless)
2) Is there a method to export dates as strings to CSV? (I have tried this, but when they're set to "plain text" in Google Docs, the leading zeros ('0') that are typed in when entering the dates are not present in the export, meaning that R complains about the date being in a non-standard format*).
Thanks in advance for your time,
slackline
[1] http://www.r-project.org/
[2] http://www.omegahat.org/RGoogleDocs/
* I could write a function to pull out the day/month/year as individual elements and derive the date from those, but I figured there is a more direct method.

Concerning your question number 1): Apparently, Google Docs uses 1899-12-30 as the date origin:
as.Date(41320, origin="1899-12-30")
# [1] "2013-02-15"
# etc
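As a minimal sketch of both routes from the question: the first part converts the whole column of serial numbers using the origin above (the four values are the ones listed in the question). The second part assumes, hypothetically, that the plain-text export looks like "2013-2-15"; the question does not show the exact exported strings, so the format string would need adjusting to match the real data.

```r
# Serial day numbers as they arrived in R (values from the question)
serials <- c(41320, 41324, 41331, 41355)

# as.Date() is vectorised, so the whole column converts in one call
dates <- as.Date(serials, origin = "1899-12-30")
format(dates)
# [1] "2013-02-15" "2013-02-19" "2013-02-26" "2013-03-22"

# Hypothetical text export without leading zeros (assumed layout, not from the question)
text_dates <- c("2013-2-15", "2013-3-5")

# Giving an explicit format avoids relying on as.Date() guessing the layout
parsed <- as.Date(text_dates, format = "%Y-%m-%d")
format(parsed)
# [1] "2013-02-15" "2013-03-05"
```

If the exported strings use a different layout (for example day first, or slashes), only the format argument changes.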

Related

Converting character to POSIXct in R Markdown is Making Me Sad

I'm learning to use R as part of a certificate program. I've been trying to compile a report from a practice project in R Markdown, and getting my date-times back to POSIXct rather than character has been very difficult. As far as I know, write_csv() saves all columns in a data frame as the character type, and for things like num and Date, I have corrected them using something like:
is.factor(all_trips_v2_rmd$date) # this checks to see if the variable is factor-type
all_trips_v2_rmd$date <- as.Date(as.character(all_trips_v2_rmd$date)) # this converts the columns into Date-type
is.Date(all_trips_v2_rmd$date) # this checks to make sure this column is now Date-type
which does the job just fine. However, when I run this:
is.factor(all_trips_v2_rmd$started_at)
all_trips_v2_rmd$started_at <- as.POSIXct(as.Date(as.character(all_trips_v2_rmd$started_at)), format="%Y/%m/%d/%H/%M/%S")
is.POSIXct(all_trips_v2_rmd$started_at)
the first date-time value comes out as "2021-04-11 19:00:00". The %Y-%m-%d part of the information is generally correct, but the %H:%M:%S part of every value is exactly 19:00:00. Looking at the character-only version of the .csv file, the first date-time value is "2021-04-12T18:25:36Z". I've been looking everywhere for a fix, and none of the solutions seem to work.
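A likely explanation, offered as a sketch rather than a confirmed diagnosis: as.Date() keeps only the calendar date, so the time of day is already gone before as.POSIXct() runs, and the resulting midnight-UTC instant is then shown in the local time zone (19:00 on the previous day would fit a zone five hours behind UTC). Parsing the character string directly, with a format that matches the "2021-04-12T18:25:36Z" layout from the file, keeps the time. The second value below is made up for illustration.

```r
# First value is from the question; the second is a made-up example
started_at_chr <- c("2021-04-12T18:25:36Z", "2021-04-12T18:30:01Z")

# Going through as.Date() first discards the time of day
date_only <- as.Date(started_at_chr)
format(date_only)
# [1] "2021-04-12" "2021-04-12"

# Parse the full string instead; "T" and "Z" are matched literally,
# and tz = "UTC" reflects the trailing Z (UTC) in the data
started_at <- as.POSIXct(started_at_chr,
                         format = "%Y-%m-%dT%H:%M:%SZ",
                         tz = "UTC")
format(started_at, "%Y-%m-%d %H:%M:%S")
# [1] "2021-04-12 18:25:36" "2021-04-12 18:30:01"
```

On the real data the same call would be applied to all_trips_v2_rmd$started_at, provided the column is character and every value follows this layout.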

R: Evaluation Error - character string is not in a standard unambiguous format [duplicate]

I am working with data in R that I imported from Excel format.
I have a column (isas1_3b) in the dataframe (LPAv1.1.1) which is in the character format. Upon importing, the dates have changed from the format dd/mm/yy, to days (e.g., 41268).
I have tried to convert this as below:
as.Date(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
However, I get the following error:
Error in charToDate(x) : character string is not in a standard unambiguous format
4. stop("character string is not in a standard unambiguous format")
3. charToDate(x)
2. as.Date.character(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
1. as.Date(LPAv1.1.1$isas1_3b, origin = "1899-12-30")
I'm not sure what I am doing wrong. After numerous searches, the above conversion is what was recommended.
I should also add that there are two other date columns in the original Excel document, but they have both been read in as 'POSIXct' 'POSIXt'.
Other info that may be relevant:
macOS 13.13.3 R 3.3.3 RStudio 1.1.419
Can someone please help resolve this issue... I am assuming it is something that I am doing. Please let me know if you need any more info.
As thelatemail rightly pointed out, the column with the days information must be in numeric format.
d <- 41268
as.Date(d, origin = "1899-12-30")
#[1] "2012-12-25"
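On your dataset, this will fix it (the result is assigned back, because mutate() returns a new data frame rather than modifying the original):
library(dplyr)
LPAv1.1.1 <- mutate(LPAv1.1.1,
                    isas1_3b = as.Date(as.numeric(isas1_3b), origin = "1899-12-30"))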
The variable class was not consistent within the column/vector. There was a mix of dates, strings, and four digit numbers. Once I corrected these, it worked as expected. Thank you all for your help.
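For a column that mixes serial numbers with other strings, as described above, one possible approach is to convert what is numeric and handle the rest separately. This is only a sketch: the sample values and the dd/mm/yy layout of the text dates are assumptions for illustration, not taken from the actual data.

```r
# Hypothetical mixed column: serial numbers, a text date, and a stray string
raw <- c("41268", "25/12/12", "41270", "not recorded")

# Non-numeric entries become NA here (warnings suppressed on purpose)
serial <- suppressWarnings(as.numeric(raw))

# Convert the serial numbers; NA entries stay NA
converted <- as.Date(serial, origin = "1899-12-30")

# Try the assumed dd/mm/yy layout for everything that was not numeric
is_text <- is.na(serial)
converted[is_text] <- as.Date(raw[is_text], format = "%d/%m/%y")

format(converted)
# [1] "2012-12-25" "2012-12-25" "2012-12-27" NA
```

Anything still NA at the end needs a manual look, which is essentially what the asker ended up doing.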
How did you import it from Excel? Is it possible to adjust the import method so that it imports the column the way you expect?
Importing the data from Excel with the package "xlsx" gives you two options: read.xlsx, which will detect/guess the class type based on the data in that column, or read.xlsx2, where you set the class types manually with the colClasses option. (More info here: http://www.sthda.com/english/wiki/r-xlsx-package-a-quick-start-guide-to-manipulate-excel-files-in-r)
Another useful option is XLConnect.
A possible downside of both of these packages is they rely on Java to do the work of importing, so you must have Java installed for these to work.
I solved the issue by changing the cell format for the dates in Excel to the Date type "2014-03-24" and saving the file as CSV (MS-DOS) (*.csv) before loading it into R.
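Once the CSV holds dates in the yyyy-mm-dd layout, base R can read them straight into a Date column. The sketch below uses an inline string in place of a file, and the column names id and visit_date are hypothetical.

```r
# Stand-in for a CSV file saved from Excel with ISO-formatted dates
csv_text <- "id,visit_date
1,2014-03-24
2,2014-04-02"

# colClasses = "Date" makes read.csv() convert the column with as.Date()
visits <- read.csv(text = csv_text,
                   colClasses = c(id = "integer", visit_date = "Date"))

class(visits$visit_date)
# [1] "Date"
```

With a real file, text = csv_text would be replaced by the file path.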

R script in Power BI returns date as Microsoft.OleDb.Date

The essence:
Why does Power BI show data of the form 2017-01-04 (yyyy-mm-dd) like this?
The details
I'm trying to transform a table in Power BI using the Run R Script functionality in Edit Query.
The source of the table is a csv file with a column with dates of the format 2017-01-04 (yyyy-mm-dd):
2017-01-04
2017-01-03
2017-01-02
2017-01-01
2016-12-31
2016-12-30
2016-12-29
2016-12-28
2016-12-27
2016-12-26
2016-12-25
2016-12-24
2016-12-23
2016-12-22
Using Get Data, Power BI shows the same date column like this:
And after opening the Edit Query window, the very same date column still looks like this:
However, when trying to run an R script with the same data, the column only consists of the "values" Microsoft.OleDb.Date like this:
The R script I'm running is simply:
# 'dataset' holds the input data for this script
output <- head(dataset)
If I try to change the data type, an error is returned:
It all seems very strange to me, and I haven't found a reasonable explanation using Google.
Any suggestions?
I already provided a solution in the comments, but I'll add a detailed suggestion here as well.
The applied steps in Power BI and the resulting date column should look like this:
Here are the details:
1) After loading the data from the csv file, go to Edit Queries and change the data type to text.
2) Run the R script.
3) Change the data type back to date once the script has provided an output.
I wandered upon this answer last week and found that it didn't work for me. I'd receive an error from the R script that "character string is not in a standard unambiguous format." I'm assuming this is due to the many updates that have happened with Power BI in the years since the original answer, because as far as I could tell, all dates were in the exact same format (this error did not occur if I ran the data separately in R/RStudio). I figured I'd leave my solution for those who happen upon this like I did.
I did the exact same thing as vestland's solution, except instead of changing the data type to text, I had to change it to a whole number:
1) Edit query.
2) Convert all date columns to "Whole Number."
3) Run R Script, and convert date columns from numbers to date:
In R, use as.Date() or lubridate::as_date() with origin = "1899-12-30" to get the correct date when you convert from whole number back to date. (The origin is "1899-12-30" rather than "1900-01-01" because Excel-style serial numbers count 1900-01-01 as day 1 rather than day 0 and also include a nonexistent 29 February 1900; together these put day 0 two days before 1900-01-01.)
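Step 3 could look roughly like the sketch below. Inside Power BI the data frame dataset is supplied by the Run R Script step; here a small stand-in is built so the snippet runs on its own, and the column names date and value are hypothetical.

```r
# Stand-in for the 'dataset' that Power BI passes to the script,
# with the date column already converted to whole numbers
dataset <- data.frame(date = c(42739L, 42738L),
                      value = c(1.5, 2.5))

output <- dataset

# Convert the serial day numbers back to Date
output$date <- as.Date(output$date, origin = "1899-12-30")

format(output$date)
# [1] "2017-01-04" "2017-01-03"
```

How Power BI types the returned column afterwards is not covered here; the answer above only describes the conversion inside the script.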

issues with formatting a column with date in r

I'm having an issue trying to format dates in R. I tried the following code:
rdate<-as.Date(dusted$time2,"%d/%m/%y") and also the recommendations in the Stack Overflow question "Changing date format in R", but still couldn't get it to work.
geov<-dusted
geov$newdate <- strptime(as.character(geov$time2), "%d/%m/%Y")
All I'm getting is NA for the whole date column. These are daily values, and I would love for R to be able to read them. Data available here https://www.dropbox.com/s/awstha04muoz66y/dusted.txt?dl=0
To convert to date, assuming you have already imported the data into a data frame such as dusted or geov, and time2 holds dates as strings resembling 10-27-06, try:
geov$time2 = as.Date(geov$time2, "%m-%d-%y")
The equal sign = is used just to save on typing. For assignment here it is equivalent to <-, so you can still use <- if you prefer.
This stores the converted dates right back into geov$time2, overwriting it, instead of creating a new variable geov$newdate as you did in your original question, because a new variable is not required for the conversion. If for some reason you really need a new variable, feel free to use geov$newdate.
Similarly, you didn't need to copy dusted to a new geov data frame just to convert. It does save time when testing, though: if the conversion doesn't work, you can start again by copying dusted to geov instead of re-importing the data from a file into dusted.
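To illustrate why the original attempts returned NA, here is a small sketch with two made-up values in the 10-27-06 layout. The format string has to match both the order of the fields and the separator.

```r
# Made-up sample in the month-day-year layout described above
time2 <- c("10-27-06", "10-28-06")

# Wrong separator and field order: nothing matches, so every value is NA
as.Date(time2, "%d/%m/%y")
# [1] NA NA

# Format matching the data: month, day, two-digit year, separated by hyphens
converted <- as.Date(time2, "%m-%d-%y")
format(converted)
# [1] "2006-10-27" "2006-10-28"
```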
Additional resources
help(strptime) for looking up date code references such as %y. On Linux, man date can reveal the date codes
