How to handle date time formats using VAEX? - datetime

I am new to VAEX and couldn't find a solution to my specific question on Google, so I am asking here in the hope that someone can solve my issue :).
I am using VAEX to import data from a CSV file in my DASH Plotly app, and I then want to convert the Date column to datetime format within VAEX. The data is imported from the CSV file successfully. Here is how I imported it into VAEX:
vaex_df=vaex.from_csv(link,convert=True,chunk_size=5_000)
Below it shows the type of the Date column after importing into VAEX. As you can see, it takes the Date column as string type.
Then, when I try to change the data type of the Date column with the code below, it gives an error:
vaex_df['Date']=vaex_df['Date'].astype('datetime64[ns]')
I don't know how to handle this issue, so I need your help. What am I doing wrong here?
Thanks in advance

vaex.from_csv is basically an alias for pandas.read_csv. pandas.read_csv has an argument (parse_dates) that you can use to specify which columns should be parsed as datetime. Just pass that very same argument to vaex.from_csv and you should be good to go!
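A minimal sketch of what that looks like on the pandas side, assuming the column is called Date as in the question. The CSV text here is made-up sample data, and the vaex call in the last comment is only what the answer above suggests, not something run here.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV file behind `link`.
csv_text = "Date,Value\n2021-01-05,10\n2021-01-06,12\n"

# parse_dates tells pandas which columns to convert to datetime while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)

# Per the answer above, the same keyword would be passed through vaex (not run here):
# vaex_df = vaex.from_csv(link, convert=True, chunk_size=5_000, parse_dates=["Date"])
```

If the Date column then shows up as a datetime type rather than a string, the explicit astype call from the question should no longer be needed.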

Related

Importing dates from excel (some formatted as dates some as numbers)

I am working with a document that was originally in Google Docs and was downloaded as an xlsx file. The data was entered by hand and formatted as DD-MM-YY, but it has been imported inconsistently (see example below). I've tried a few different things (kicking myself for not saving the code), and all I managed was to remove the incorrectly formatted dates.
Any suggestions for fixing this in excel or (preferably) in R? This is longitudinal data so it would be frustrating to have to go back into every excel sheet to update. Thanks!
data <- read_excel("DescriptiveStats.xlsx")
ex:
22/04/13
43168.0
43168.0 is a correct date value.
22/04/13 is not a valid date; it is a text string. To convert it into a date, you will need to change it into 04/13/2022.
There are a few options. One is to change the locale so that 22/04/13 becomes valid. See more here: locale differences in google sheets (documentation missing pages)
The second option is to use regex to convert it. See these examples:
https://stackoverflow.com/a/72410817/5632629
https://stackoverflow.com/a/73722854/5632629
However, it is very likely that 43168 is also not the correct date. If your date before import was 01/02/2022, then after import it could be 44563, which is actually 02/01/2022, so be careful. You can check it with:
=TO_DATE(43168)
and whether a value is a date can be checked with:
=ISDATE("22/04/13")
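Outside of Sheets, the same clean-up can be sketched in a few lines of Python. This assumes the text values really are DD/MM/YY, as the question says, and that the numbers are serials from Excel's 1900 date system, which count plain days from 1899-12-30. The function name parse_mixed_date is made up for illustration.

```python
from datetime import date, datetime, timedelta

# Day zero of Excel's 1900 date system when counting plain days.
EXCEL_EPOCH = date(1899, 12, 30)


def parse_mixed_date(raw: str) -> date:
    """Parse either an Excel serial number or a DD/MM/YY text date."""
    text = raw.strip()
    try:
        serial = float(text)
    except ValueError:
        # Not a number, so treat it as day/month/two-digit-year text.
        return datetime.strptime(text, "%d/%m/%y").date()
    # Drop any fractional (time-of-day) part and count whole days.
    return EXCEL_EPOCH + timedelta(days=int(serial))


for raw in ["22/04/13", "43168.0"]:
    print(raw, "->", parse_mixed_date(raw).isoformat())
```

This only converts the values as stored. It cannot tell whether day and month were already swapped during import, so the serial numbers still need checking against the source sheet.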

Converting numbers back to Date object in R [duplicate]

This question already has answers here: How to convert Excel date format to proper date in R
I am reading an Excel file using the function readxl::read_excel(), but it appears that dates are not being read properly.
In the original file, one such date is 2020-JUL-13, but it is read as 44025.
Is there any way to get back the original date as it appears in the file?
Any pointer is much appreciated.
Thanks,
Basically, you could try to use:
as.Date(44025)
However, you will get an error saying Error in as.Date.numeric(44025) : 'origin' must be supplied. That means all you need is the origin, i.e. the starting date from which the days are counted. If you check the help page for the convertToDate function mentioned by Bappa Das, you will see that it is a wrapper around the as.Date() function and that the default argument for its origin parameter is "1900-01-01".
Next, you can check why this is by looking up the date systems in Excel. Here is a page on this:
Date systems in Excel
It says that on Windows (for Mac there are some exceptions) the starting date is indeed 1900-01-01. Note, however, that Excel counts 1900-01-01 as day 1 rather than day 0, and it also counts a 29 February 1900 that never existed, so a plain day count has to start two days earlier, at "1899-12-30".
And now, finally, if you want to use base R, you can do:
as.Date(44025, origin = "1899-12-30")
This returns 2020-07-13, the date in the original file. The function is vectorized, so you can pass a whole column as well.
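A quick cross-check of the origin, sketched in Python rather than R and using only the serial number from the question. Counting the days from 1899-12-30 lands on the date in the original file, while counting from 1900-01-01 lands two days later.

```python
from datetime import date, timedelta

serial = 44025  # value read from the Excel file in the question

# Plain day counts from the two candidate origins.
from_1899 = date(1899, 12, 30) + timedelta(days=serial)
from_1900 = date(1900, 1, 1) + timedelta(days=serial)

print(from_1899)  # 2020-07-13
print(from_1900)  # 2020-07-15
```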
You can use the openxlsx package to convert the number to a date like this:
library(openxlsx)
convertToDate("44025")
Or, to convert a whole column, you can use
convertToDate(df$date)

Get a list from R string that contains a csv

For one of my projects I need to import the dataset (a csv file) outside of R and then assign it to a variable in R from the Ruby side of the project (this is done with rinruby and already works).
In my R script I now need to create a list out of that csv data.
The variable contains an escaped string that holds the original csv.
data <- "\"\",\"futime\",\"fustat\",\"age\",\"resid.ds\",\"rx\",\"ecog.ps\"\n\"1\",59,1,72.3315,2,1,1\n\"2\",115,1,74.4932,2,1,1\n\"3\",156,1,66.4658,2,1,2\n\"4\",421,0,53.3644,2,2,1\n\"5\",431,1,50.3397,2,1,1\n\"6\",448,0,56.4301,1,1,2\n\"7\",464,1,56.937,2,2,2\n\"8\",475,1,59.8548,2,2,2\n\"9\",477,0,64.1753,2,1,1\n\"10\",563,1,55.1781,1,2,2\n\"11\",638,1,56.7562,1,1,2\n\"12\",744,0,50.1096,1,2,1\n\"13\",769,0,59.6301,2,2,2\n\"14\",770,0,57.0521,2,2,1\n\"15\",803,0,39.2712,1,1,1\n\"16\",855,0,43.1233,1,1,2\n\"17\",1040,0,38.8932,2,1,2\n\"18\",1106,0,44.6,1,1,1\n\"19\",1129,0,53.9068,1,2,1\n\"20\",1206,0,44.2055,2,2,1\n\"21\",1227,0,59.589,1,2,2\n\"22\",268,1,74.5041,2,1,2\n\"23\",329,1,43.137,2,1,1\n\"24\",353,1,63.2192,1,2,2\n\"25\",365,1,64.4247,2,2,1\n\"26\",377,0,58.3096,1,2,1"
And I would like to convert this to a R-List.
So my approach is basically to call read.csv(data_as_string) but unfortunately the signature is read.csv(file_where_data_lies).
How can this be done?
Thanks so much!
As Therkel mentioned above, myfunc(file = textConnection(data)) did exactly what I was about to do. Thanks!
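For comparison, the same idea of wrapping a string in a file-like object looks like this in Python, where io.StringIO plays the role of textConnection(). This is only an analogy to the R solution, and the data string is a shortened, made-up stand-in for the one in the question.

```python
import csv
import io

# Shortened, hypothetical stand-in for the escaped CSV string in the question.
data = '"","futime","fustat"\n"1",59,1\n"2",115,1\n'

# io.StringIO wraps the string in a file-like object that a CSV reader accepts.
rows = list(csv.DictReader(io.StringIO(data)))
print(rows)
```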

export 'datetime.date' to excel via xlwings

I have a pandas dataframe with a datetime.date column.
I try to export the dataframe to excel via xlwings.
I get the following error message:
AttributeError: 'datetime.date' object has no attribute 'microsecond'
I am quite confident the error takes place in the translation between the datetime.date type column into the excel equivalent.
The obvious solution would be convert the column into datetime which should map to the excel timestamp (16.02.2015 00:00:00 -> 42051).
Are there alternatives to that? I find it quite odd that there isn't a Date type in Excel. Are there workarounds? Adding a dummy time of day to the date, just to turn the column into datetime for the sake of exporting it to Excel, is not the most type-safe solution.
This is a bug, as logged here, and admittedly it's a shame it hasn't been resolved yet.
However, in the case of a Pandas DataFrame, you can work around the issue for now by converting the column into a Pandas datetime column:
df.DateColumn = pandas.to_datetime(df.DateColumn)
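A small sketch of that workaround on a made-up DataFrame (the column name DateColumn is taken from the line above; the values are hypothetical). It also shows that the converted value corresponds to the Excel serial mentioned in the question, assuming Excel's 1900 date system counted from 1899-12-30. The xlwings export itself is not run here.

```python
import datetime

import pandas as pd

# Hypothetical frame with a column of plain datetime.date objects.
df = pd.DataFrame(
    {"DateColumn": [datetime.date(2015, 2, 16), datetime.date(2015, 2, 17)]}
)

# Convert to a pandas datetime column (timestamps at midnight).
df["DateColumn"] = pd.to_datetime(df["DateColumn"])
print(df["DateColumn"].dtype)

# Excel serial for the first row, counting days from 1899-12-30.
serial = (df["DateColumn"].iloc[0] - pd.Timestamp("1899-12-30")).days
print(serial)  # 42051
```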

Teradata tpump utility

I am trying to load some data using the tpump utility from a Unix console.
The data has various datatypes, viz. text, number, decimal, date.
Now I am stuck on which FORMAT type I need to specify in the tpump script.
I went through the tpump manual, but could not work out which FORMAT type to use.
The data/columns are delimited by "|" symbol.
Any info/hint in using the appropriate FORMAT type would be of great help.
If this is a duplicate question, please help me with the actual question link.
Thanks a lot in advance.
