Issues with formatting a date column in R

I'm having an issue trying to format dates in R. I tried the following code:
rdate <- as.Date(dusted$time2, "%d/%m/%y")
and also the recommendations in the Stack Overflow question Changing date format in R, but still couldn't get it to work:
geov <- dusted
geov$newdate <- strptime(as.character(geov$time2), "%d/%m/%Y")
All I'm getting is NA for the whole date column. These are daily values, and I would love it if R could read them. Data available here: https://www.dropbox.com/s/awstha04muoz66y/dusted.txt?dl=0

To convert to Date, as long as you have already imported the data successfully into a data frame such as dusted or geov, and time2 holds the dates as strings resembling 10-27-06, try:
geov$time2 = as.Date(geov$time2, "%m-%d-%y")
The equal sign = is used here just to save on typing; it is equivalent to <-, so you can still use <- if you prefer.
This stores the converted dates right back into geov$time2, overwriting it, instead of creating a new variable such as geov$newdate as in your original question; a new variable is not required for the conversion. But if for some reason you really need a new variable, feel free to use geov$newdate.
Similarly, you also didn't need to copy dusted to a new geov data frame just to convert. It does save time for testing purposes, though: if the conversion doesn't work, you can restart by copying dusted to geov again instead of having to re-import the data from a file into dusted.
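As a quick check before converting, it helps to look at a few raw values so the format string matches what is actually in the column. A minimal sketch, assuming the linked dusted.txt is a whitespace-separated file with a header row (the file layout here is an assumption):
dusted <- read.table("dusted.txt", header = TRUE, stringsAsFactors = FALSE)
head(dusted$time2)                         # inspect the raw strings, e.g. "10-27-06"
dusted$time2 <- as.Date(dusted$time2, "%m-%d-%y")
str(dusted$time2)                          # should now report class "Date"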
Additional resources
help(strptime) for looking up date code references such as %y. On Linux, man date can reveal the date codes

Related

Converting character to POSIXct in R Markdown is Making Me Sad

I'm learning to use R as part of a certificate program, and I've been trying to compile a report from a practice project in R Markdown. Getting my date-times back to POSIXct rather than leaving them as characters has been very difficult. I know write_csv() saves all columns in a data frame as the character type, and for things like num and Date columns I have corrected them using something like:
library(lubridate)                         # provides is.Date() and is.POSIXct() used below
is.factor(all_trips_v2_rmd$date)           # check whether the variable is factor-type
all_trips_v2_rmd$date <- as.Date(as.character(all_trips_v2_rmd$date)) # convert the column to Date-type
is.Date(all_trips_v2_rmd$date)             # confirm the column is now Date-type
which does the job just fine. However, when I run this:
is.factor(all_trips_v2_rmd$started_at)
all_trips_v2_rmd$started_at <- as.POSIXct(as.Date(as.character(all_trips_v2_rmd$started_at)), format="%Y/%m/%d/%H/%M/%S")
is.POSIXct(all_trips_v2_rmd$started_at)
the output of the first date-time value is "2021-04-11 19:00:00". The date part (%Y-%m-%d) is generally correct, but the time part (%H:%M:%S) of every value is exactly 19:00:00. Looking at the character-only version of the .csv file, the first date-time value is "2021-04-12T18:25:36Z". I've been looking everywhere for a fix, and none of the solutions seem to work.
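The symptom above is consistent with wrapping as.Date() inside as.POSIXct(): as.Date() drops the time of day, and the resulting midnight UTC then gets shifted into the session's time zone (a zone five hours behind UTC would give exactly 19:00:00 of the previous day). A minimal sketch, not from the original thread, of parsing the ISO 8601 strings directly (column and data frame names taken from the question):
# Parse the ISO 8601 strings directly instead of going through as.Date(),
# which discards the time component.
x <- "2021-04-12T18:25:36Z"                # example value from the question
as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
# [1] "2021-04-12 18:25:36 UTC"
all_trips_v2_rmd$started_at <- as.POSIXct(all_trips_v2_rmd$started_at,
                                          format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")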

Importing date from csv in R

I want to import an Excel file into R, and the file contains a column with date and time in this form:
20.08.2018 16:32:20
If I change it to the standard format in the csv file itself, it looks like this:
43332,68912
If I read the file in using read_excel() in R, this date looks like this:
43332.689120370371
How can I turn the current format into a date format in R?
It is good practice not to edit anything in a .csv (or Excel) file, treating it as read only, and to make all changes in a script (so in R).
Let's call your data frame "my_df" and your datetime variable "date".
library(readr)
library(magrittr)
my_df$date %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Edit: Trying to piece together information from your comments, I created an Excel file with one column called STARTED, holding date and time in the form 20.08.2018 16:32:20 as you indicate in the question. Since you seem to like readxl:
library(readxl)
library(readr)      # parse_datetime() comes from readr
library(magrittr)
myData <- read_excel("myData.xlsx")
myData$STARTED %<>% parse_datetime("%d.%m.%Y %H:%M:%S")
Which is the same code I already wrote above. This gives:
# A tibble: 1 x 1
STARTED
<dttm>
1 2018-08-20 16:32:20
If you only get NA, your data is not in the format given by your example 20.08.2018 16:32:20.
Following your discussion with @prosoitos, it looks like the import function cannot make sense of your date column:
Your line of example data in the comments contains no quotes around the date string. That implies that you copied that data either after opening it in Excel (or similar), or that your survey tool does not qualify dates as strings. Did you open your .csv in Excel, save it as .xlsx and try to import the result into R? That would explain the mess you get, as Excel may try to interpret the date strings and convert them to some funny Microsoft format nobody else uses.
Please don't do that, use a raw csv-file that was never touched with excel and import it directly into R.
Your read function obviously does not understand the content of your date variable and apparently replaces it with some unix standard time, which are seconds since 1970. However, it looks like those time stamps are invalid (43332 is something like noon on 1970/01/01), else you could easily transform them to human readable dates.
I suggest you try importing your csv with:
read.csv("your_data.csv", header=TRUE, stringsAsFactors=FALSE)
You may have to specify your separator, e.g. sep = "\t" for a tab-separated file, if it is not a comma, which is the default for read.csv. After that, the dates in your data frame are simple text strings and you can follow up with what @prosoitos said.
(Sorry for adding an additional answer. I would have commented on @prosoitos's answer, but I have insufficient reputation points.)
# Read CSV into R
MyData <- read.csv(file = "TheDataIWantToReadIn.csv", header = TRUE, sep = ",")
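Not part of the original answers, just a sketch: if the column does arrive as an Excel serial number such as 43332.689120370371, it can also be converted directly, since Excel counts days since 1899-12-30 and the fractional part is the time of day.
# Convert an Excel serial day number to POSIXct (rounding to whole seconds).
serial <- 43332.689120370371
as.POSIXct(round(serial * 86400), origin = "1899-12-30", tz = "UTC")
# [1] "2018-08-20 16:32:20 UTC"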

Date format when written in Excel comes as general not date

I am using the openxlsx package to write back a file. I have already used as.Date() and the format() function to make my dates look like dd-mmm-yyyy.
However, when I open the Excel file, even though the date shows up as, say, "12-may-2018", I cannot filter the values like Excel dates. It shows that the type of the data is General.
Even if I convert it to the date format in Excel, it still doesn't let me filter by year, month, and day, which works for Excel dates. I can convert cells to the date type by manually placing my cursor in the middle of a cell and pressing the return key.
Doing that for the whole data set would be too much manual effort, which I want to reduce. Is there any way to make this happen? Thanks for any suggestions.
Here is my code:
data$datecolumn <- as.Date(as.numeric(data$datecolumn), origin = origin - somenumberforcalibration, format = "%d")
data$datecolumn <- format(data$datecolumn, format = "%d-%b-%Y")
write.xlsx(data, filename)
Here, datecolumn is being read in Excel numeric format.
I just saw a code snippet where a date read from a CSV as a string was converted to POSIXct and written back out to CSV, and that file is read as a date in Excel. I haven't found anything for xlsx yet.
The format() function turns the date back into a string, which was causing the whole issue; remove the format() call and things work fine. @Tjebo and @Roman Lustrik helped me with this.
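A minimal sketch of that fix, assuming datecolumn holds Excel serial day numbers (the 1899-12-30 origin is the usual Excel origin; the variable names follow the question):
library(openxlsx)
# Keep the column as class Date and skip format(): openxlsx then writes a real
# Excel date cell that can be filtered by year, month and day.
data$datecolumn <- as.Date(as.numeric(data$datecolumn), origin = "1899-12-30")
write.xlsx(data, filename)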

R script in Power BI returns date as Microsoft.OleDb.Date

The essence:
Why does Power BI show dates of the form 2017-01-04 (yyyy-mm-dd) as Microsoft.OleDb.Date?
The details
I'm trying to transform a table in Power BI using the Run R Script functionality in Edit Query.
The source of the table is a csv file with a column with dates of the format 2017-01-04 (yyyy-mm-dd):
2017-01-04
2017-01-03
2017-01-02
2017-01-01
2016-12-31
2016-12-30
2016-12-29
2016-12-28
2016-12-27
2016-12-26
2016-12-25
2016-12-24
2016-12-23
2016-12-22
Using Get Data, Power BI shows the same date column as proper dates.
After opening the Edit Query window, the very same date column still looks fine.
However, when trying to run an R script with the same data, the column only consists of the "value" Microsoft.OleDb.Date in every row.
The R script I'm running is simply:
# 'dataset' holds the input data for this script
output <- head(dataset)
If I try to change the data type, an error is returned.
It all seems very strange to me, and I haven't found a reasonable explanation using Google.
Any suggestions?
I already provided a solution in the comments, but I'll add a detailed suggestion here as well.
In the end, the applied steps in Power BI should give back a proper date column. Here are the details:
1) After loading the data from the csv file, go to Edit Queries and change the data type to Text.
2) Run the R script.
3) Change the data type back to Date once the script has provided an output.
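A minimal sketch of the R script side under this approach (the column name date_col is an assumption; dataset and output are the names Power BI uses for the script input and result, as in the question):
# 'dataset' holds the input data for this script.
# With the column passed in as text, it can be parsed to Date inside the script
# if the script itself needs real dates for computation.
dataset$date_col <- as.Date(dataset$date_col, format = "%Y-%m-%d")
output <- head(dataset)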
I stumbled upon this answer last week and found that it didn't work for me: I'd receive an error from the R script that "character string is not in a standard unambiguous format." I'm assuming this is due to the many updates to Power BI in the years since the original answer, because as far as I could tell all dates were in the exact same format (the error did not occur if I ran the data separately in R/RStudio). I figured I'd leave my solution for those who happen upon this like I did.
I did the exact same thing as vestland's solution, except instead of changing the data type to text, I had to change it to a whole number:
1) Edit query.
2) Convert all date columns to "Whole Number."
3) Run R Script, and convert date columns from numbers to date:
In R, this requires that you use as.Date() or lubridate::as_date() with the origin argument, origin = "1899-12-30", to get the correct date when you convert from whole number back to date. (The origin is "1899-12-30" rather than "1900-01-01" because the Excel/Power BI serial scheme starts counting at 1 and treats 1900 as a leap year even though it wasn't, I've heard.)
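A minimal sketch of that conversion inside the R script (the column name date_col is an assumption; dataset is the input Power BI provides, as in the question):
# Convert the whole numbers Power BI hands to the script back into dates.
dataset$date_col <- as.Date(dataset$date_col, origin = "1899-12-30")
output <- dataset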

Dates when exporting to CSV and reading into R

I'm using the R [1] package RGoogleDocs [2] to connect to my Google Docs, obtain a list of spreadsheets and import a specific worksheet from a specified spreadsheet. I can achieve this with no problem by following the example given at https://github.com/hammer/google-spreadsheets-to-r-dataframe
The problem I have encountered is with date columns. Under Google Docs I've selected to format these as YYYY-MM-DD and they display fine within Google Docs.
However, the exported CSV which gets imported to R has these as numeric fields, so for example....
Displayed in Google Docs > As imported to R
2013-02-15 > 41320
2013-02-19 > 41324
2013-02-26 > 41331
2013-03-22 > 41355
This isn't necessarily a problem, since these appear to be elapsed dates, but I don't know the origin from which they are being counted. Once I know the origin, R has a function for converting dates/times that allows it to be specified, so I can then reformat internally in R (using the as.Date(date, origin = "") function).
To try to get around this, I set the formatting to plain text for the date columns, but despite typing the dates with leading zeros for days/months < 10, they are exported without them, so the as.Date() function complains about them being in a non-standard format.
I therefore have two options/questions...
1) What is the origin that Google Docs uses internally for representing dates? (I've searched for this through the Google Help but can't find it, and wider web-searches have been fruitless)
2) Is there a method to export dates as strings to CSV? (I have tried this, but when they're set to "plain text" in Google Docs, the leading zeros ('0') typed in when entering the dates are not present in the export, meaning that R complains about the date being in a non-standard format*.)
Thanks in advance for your time,
slackline
[1] http://www.r-project.org/
[2] http://www.omegahat.org/RGoogleDocs/
* I could write a function to pull out the day/month/year as individual elements and derive this, but figured there is a more direct method.
Concerning your question number 1): Apparently, Google Docs uses 1899-12-30 as the date origin:
as.Date(41320, origin="1899-12-30")
# [1] "2013-02-15"
# etc
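A minimal sketch of applying that to a whole imported column (the data frame and column names are assumptions):
# Convert the numeric day counts exported from Google Docs to Date in one step.
sheet$date <- as.Date(sheet$date, origin = "1899-12-30")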
