Converting numbers back to Date object in R [duplicate]

This question already has answers here:
How to convert Excel date format to proper date in R
I am reading an Excel file using the function readxl::read_excel(), but it appears that dates are not being read properly.
In the original file, one such date is 2020-JUL-13, but it is getting read as 44025.
Is there any way to get back the original date variable as in the original file?
Any pointer is much appreciated.
Thanks,

Basically, you could try to use:
as.Date(44025)
However, you will get an error saying Error in as.Date.numeric(44025) : 'origin' must be supplied. That means all you need to know is the origin, i.e. the starting date from which to count. When you check the help page for the convertToDate function mentioned by Bappa Das, you will see that it is just a wrapper around as.Date() and that the default for its origin parameter is "1900-01-01" (internally it also subtracts two days, compensating for Excel counting 1900-01-01 as day 1 and for Excel's nonexistent leap day 1900-02-29).
Next, you can check why this is by reading about date systems in Excel; here is a page for this:
Date systems in Excel
There you will find that on Windows (for Mac there are some exceptions) the starting date is indeed 1900-01-01.
And now, finally, if you want to use base R, you have to apply that two-day shift yourself by moving the origin back:
as.Date(44025, origin = "1899-12-30")
This function is vectorized, so you can pass a whole column as well.
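For instance, a minimal sketch that converts a whole numeric column at once (the data frame and column names here are made up for illustration):
df <- data.frame(date_serial = c(44025, 44026, 44027))
df$date <- as.Date(df$date_serial, origin = "1899-12-30")  # Excel-on-Windows origin
df$date
#> [1] "2020-07-13" "2020-07-14" "2020-07-15"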

You can use the openxlsx package to convert the number to a date, like:
library(openxlsx)
convertToDate("44025")
Or, to convert the whole column, you can use:
convertToDate(df$date)
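For the record, convertToDate() accepts the character input above because it coerces it to numeric internally, and it applies the two-day Excel shift for you, so with the value from the question:
library(openxlsx)
convertToDate("44025")
#> [1] "2020-07-13"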

Related

Importing dates from excel (some formatted as dates some as numbers)

I am working on an uploaded document, originally from Google Docs, downloaded to an xlsx file. The data has been hand entered and formatted as DD-MM-YY; however, it has uploaded inconsistently (see example below). I've tried a few different things (kicking myself for not saving the code), and it left me with just removing the incorrectly formatted dates.
Any suggestions for fixing this in Excel or (preferably) in R? This is longitudinal data, so it would be frustrating to have to go back into every Excel sheet to update. Thanks!
data <- read_excel("DescriptiveStats.xlsx")
ex:
22/04/13
43168.0
43168.0
is a correct date value.
22/04/13
is not a valid date; it is a text string. To convert it into a date, you will need to change it into 04/13/2022.
There are a few options. One is to change the locale so that 22/04/13 would be valid. See more over here: locale differences in google sheets (documentation missing pages)
The second option is to use regex to convert it. See these examples:
https://stackoverflow.com/a/72410817/5632629
https://stackoverflow.com/a/73722854/5632629
However, it is very likely that 43168 is also not the correct date. If your date before import was 01/02/2022, then after import it could be 44563, which is actually 02/01/2022, so be careful. You can check the number with:
=TO_DATE(43168)
and a date can be checked with:
=ISDATE("22/04/13")
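Since the question asks about fixing this preferably in R, here is a minimal sketch, assuming the dd/mm/yy reading from the question (if the text dates are in fact yy/mm/dd, as suggested above, use format = "%y/%m/%d" instead):
# Split a column that mixes Excel serial numbers with text dates
mixed <- c("22/04/13", "43168.0", "43168.0")
is_serial <- grepl("^[0-9.]+$", mixed)  # only digits and a dot -> Excel serial
parsed <- rep(as.Date(NA_character_), length(mixed))
parsed[is_serial] <- as.Date(as.numeric(mixed[is_serial]), origin = "1899-12-30")
parsed[!is_serial] <- as.Date(mixed[!is_serial], format = "%d/%m/%y")
parsed
#> [1] "2013-04-22" "2018-03-09" "2018-03-09"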

How to handle date time formats using VAEX?

I am new to VAEX. Also, I couldn't find any solution to my specific question on Google, so I am asking here hoping someone can solve my issue :).
I am using VAEX to import data from a CSV file in my Dash Plotly app and then want to convert the Date column to datetime format within VAEX. It successfully imports the data from the csv file. Here is how I imported it:
vaex_df=vaex.from_csv(link,convert=True,chunk_size=5_000)
After importing, VAEX treats the Date column as string type. Then, when I try to change the data type of the Date column with the code below, it gives an error:
vaex_df['Date']=vaex_df['Date'].astype('datetime64[ns]')
I don't know how to handle this issue, so I need your help. What am I doing wrong here?
Thanks in advance
vaex.from_csv is basically an alias for pandas.read_csv. In pandas.read_csv there is an argument, parse_dates, that you can use to specify which columns should be parsed as datetimes. Just pass that very same argument to vaex.from_csv and you should be good to go!

Using data.table::fread in R to read in date column with non-ISO format

I have a file that I am reading in. Everything is fine, except for one detail. In the file, dates are stored in the format "mm/dd/yyyy". When I try to read this in with fread, I'm using
fread(..., select = c(var = "Date"))
It appears fread assumes it's in the ISO format, so January 9, 2019, stored as 1/9/2019, is read in as the date "0001-09-20", i.e. September 20 of year 1. Is there any way to specify a format to tell fread how to read this? It could be in select or colClasses, though select is my preference, as I've already selected around 80 columns and specified their data types.
I know I could read it in as character and change it afterward. I'm trying to do as much as possible while reading in the data. If I have to change it after the fact, I will do that.
You have two options:
1. Read the column as character and convert it in an extra step.
2. File a feature request in the data.table GitHub repo, providing your minimal example file, and wait for it to be implemented.
Personally, I would go with the first one. The good thing is that you can do both.
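A minimal sketch of the first option ("myfile.csv" and the column name "Date" are assumed here):
library(data.table)
# Force the column to character, then parse the "mm/dd/yyyy" format explicitly
dt <- fread("myfile.csv", select = "Date", colClasses = list(character = "Date"))
dt[, Date := as.Date(Date, format = "%m/%d/%Y")]  # "1/9/2019" -> 2019-01-09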

Is there a way to read in a large document as a data.frame in R?

I'm trying to use ggplot2 on a large data set stored in a csv file. I used to read it with Excel.
I don't know how to convert this data into a data.frame. In particular, I have a date column in the following format: "2020/04/12:12:00". How can I get R to understand this format?
If it's a csv, you can use:
- the fread function from data.table. This will be the fastest way to read your csv.
- read_csv or read_csv2 (for ;-delimited documents) from the readr package.
If it's a .xls (or .xlsx) document, have a look at the readxl package.
All these functions import your data as data.frames (with additional classes, like data.table for fread or tibble for read_csv).
Edit
Given your comment, it looks like your file is not an Excel file but a csv. If you want to convert a column to dates, assuming your data.table is called df:
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]
Note that you don't need to use cbind or even reassign the data.table, because the := operator updates it by reference.
As the error message is telling you, you don't need the extra precision of POSIXlt.
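As a quick sanity check of the format string against the example value from the question:
as.POSIXct("2020/04/12:12:00", format = "%Y/%m/%d:%H:%M")
#> [1] "2020-04-12 12:00:00" (printed in your local time zone)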
Going by the question alone, I would suggest the openxlsx package; it has helped me reduce the time significantly when reading large datasets. Three points you may find helpful, based on your question and the comments:
- The read command stays the same as in the xlsx package, but I would suggest you use openxlsx::read.xlsx(file_path).
- The arguments are again the same, but in place of sheetIndex it is sheet (which accepts either the sheet's number or its name).
- If the existing columns are converted to character, then a simple as.Date would work.
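A minimal sketch of those points together (the file name, sheet number, and column name are assumptions):
library(openxlsx)
df <- openxlsx::read.xlsx("DescriptiveStats.xlsx", sheet = 1)
df$date <- as.Date(df$date, format = "%Y/%m/%d")  # any trailing time part is ignored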

How do I control the format of columns when I use the write.csv function?

I've created a dataframe in R, and one of my columns converts a date such as 01/08/2018 (dd/mm/yyyy) into the text form Aug-18 (mmm-yy). However, when I write this to csv using the write.csv function, Excel automatically converts it back to a date.
Is there a way I can specify the column type to be "Text" so that Excel doesn't change it to date format?
One simple trick that IMHO gets far too little attention is to pad your date columns with whitespace, e.g. df$mydate <- paste(' ', df$mydate, sep=''). This stops Excel from interpreting the text as dates.
I have started routinely doing that for all kinds of risky columns when doing R<->Excel transformations.
Taken from here: https://support.office.com/en-us/article/stop-automatically-changing-numbers-to-dates-452bd2db-cc96-47d1-81e4-72cec11c4ed8
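A minimal end-to-end sketch of the trick (the data frame and file name are made up):
df <- data.frame(period = c("Aug-18", "Sep-18"))
df$period <- paste0(" ", df$period)  # the leading space stops Excel's date coercion
write.csv(df, "out.csv", row.names = FALSE)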
