I am working with an S3 bucket and one of the fields is a datetime group and the values appear to be 128-bit integers. An example: 45376204963002810833065984. Has anyone seen this before? If so, how should this be parsed? This should work out to a date in 2022.
This is giving me problems when I try to get the data into a pandas (or polars) dataframe. The data outputs as a JSON string, and then when I try to put it into a pandas dataframe, I get an error: Value too big!. So I'm hoping to use SQL to cast this integer as a datetime before outputting as a JSON file.
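One possible reading, offered as an assumption rather than a confirmed answer: the value looks like a Parquet INT96 timestamp read as a single integer, with the Julian day number in the bits above the low 64 and the nanoseconds since midnight in the low 64 bits. Under that assumption the example decodes to a date in 2022, which matches the expectation above. A minimal Python sketch (the function name int96_to_datetime is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(value: int) -> datetime:
    """Decode an integer ASSUMED to be laid out as (julian_day << 64) | nanos_of_day."""
    julian_day = value >> 64                  # high bits: Julian day number
    nanos_of_day = value & ((1 << 64) - 1)    # low 64 bits: nanoseconds since midnight
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return epoch + timedelta(days=days_since_epoch, microseconds=nanos_of_day // 1000)


print(int96_to_datetime(45376204963002810833065984))  # 2022-09-26 09:00:44+00:00
```

If that layout holds for the whole column, the values could be kept as strings in the JSON output and mapped through a function like this after loading, which avoids the integer overflow in the dataframe.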
I have a code snippet that Scala does not accept, and I would appreciate help fixing it. Thanks.
train_no_header is an RDD generated from a CSV file. Its first line is shown below:
scala> train_no_header.first
res4: String = 87540,12,1,13,497,2017-11-07 09:30:38,,0
Now I want to generate another RDD that parses and transforms the records, including those with a null or empty value in the 6th field. That field should be a DateTime (in the sample above it is empty). Some records have it and some do not; when it is present, its format is the same as the 5th field, which is a UTC DateTime.
I need to calculate the delta between the two DateTime values, and I plan to convert them to Unix time. The final RDD should therefore have both date fields converted to Unix time.
So my questions are:
With the sample data and format, how do I create the RDD with the needed result?
For records with an empty value in the 6th field, how should I handle it so that no exception is raised by later queries on the data frame (which is what I intend to work with)?
Thank you very much in advance; any clue is appreciated.
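The per-record logic can be sketched as a plain function. This is an illustration in Python rather than Scala, and it rests on assumptions read off the sample line: the two date fields sit at indices 5 and 6 when counting from zero, and the format is yyyy-MM-dd HH:mm:ss in UTC. The names to_unix_time and parse_record are hypothetical. Empty or malformed dates become None, so a later data frame sees a null instead of raising an exception.

```python
from datetime import datetime, timezone
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the sample record


def to_unix_time(text: str) -> Optional[int]:
    """Return seconds since the Unix epoch, or None for an empty or malformed value."""
    text = text.strip()
    if not text:
        return None
    try:
        parsed = datetime.strptime(text, TIME_FORMAT)
    except ValueError:
        return None
    # The source says the values are UTC, so attach that zone explicitly.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def parse_record(line: str, first: int = 5, second: int = 6) -> list:
    """Replace both date fields with Unix times and append their delta (or None)."""
    fields = line.split(",")
    start = to_unix_time(fields[first])
    end = to_unix_time(fields[second])
    delta = end - start if start is not None and end is not None else None
    return fields[:first] + [start, end] + fields[second + 1:] + [delta]


print(parse_record("87540,12,1,13,497,2017-11-07 09:30:38,,0"))
```

In a Spark job, a function of this shape would typically be passed to the RDD's map; the same structure translates to Scala with Option in place of None.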
I insert my date/time data into a CHAR column in the format: '6/4/2015 2:08:00 PM'.
I want this to be converted automatically to the format
'2015-06-04 14:08:00' so that it can be used in a query, because the format of DATETIME is YYYY-MM-DD hh:mm:ss.fffff.
How do I convert it?
Given that you've stored the data in a string format (CHAR or VARCHAR), you have to decide how to make it work as a DATETIME YEAR TO SECOND value. For computational efficiency, and for storage efficiency, it would be better to store the value as a DATETIME YEAR TO SECOND value, converting it on input and (if necessary) reconverting on output. However, if you will frequently display the value without doing computations (including comparisons or sorting) on it, then maybe a rococo locale-dependent string notation is OK.
The key function for converting the string to a DATETIME value is TO_DATE. You also need to look at the TO_CHAR function because that documents the format codes that you need to use, and because you'll use that to convert a DATETIME value to your original format.
Assuming the column name is time_string, then you need to use:
TO_DATE(time_string, '%m/%d/%Y %I:%M:%S %x') -- What goes in place of x?
to convert to a DATETIME YEAR TO SECOND (or maybe DATETIME YEAR TO MINUTE) value, which will be further manipulated as if by EXTEND as necessary. The %S is there because the stored strings include seconds.
I would personally almost certainly convert the database column to DATETIME YEAR TO SECOND and, when necessary, convert to the string format on output with TO_CHAR. The column name would now be time_value (for sake of concreteness):
TO_CHAR(time_value, '%m/%d/%Y %I:%M:%S %x') -- What goes in place of x?
The manual pages referenced do not immediately lead to a complete specification of the format strings. I think a relevant reference is GL_DATETIME environment variable, but finding that requires more knowledge of the arcana of the Informix product set than is desirable (it is not the first thing that should spring to anyone's mind — not even mine!). If that's correct (it probably is), then one of %p and %r should be used in place of %x in my examples. I have to get Informix (re)configured on my machine to be able to test it.
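For comparison only, here is the same round trip outside the database, as a minimal Python sketch. It uses Python's own strptime format codes, where %p is the AM/PM marker; that does not confirm which code Informix expects. The function name normalize is hypothetical.

```python
from datetime import datetime


def normalize(time_string: str) -> str:
    """Convert '6/4/2015 2:08:00 PM' style text to 'YYYY-MM-DD hh:mm:ss'."""
    # %I is the 12-hour clock hour and %p the AM/PM marker (English locale assumed).
    parsed = datetime.strptime(time_string, "%m/%d/%Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(normalize("6/4/2015 2:08:00 PM"))  # 2015-06-04 14:08:00
```

Converting once on input like this, and storing the result in a real DATETIME column, matches the recommendation above.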
I am using the PHPExcel library to read spreadsheet data. This library is giving me trouble reading the 'DATE' values.
Question:
How do I instruct PHPExcel to read the 'DATE' values properly even if 'setReadDataOnly' is set to 'false'?
Description:
PHPExcel version: 2.1, Environment: Ubuntu 12
Here is the code block:
$objReader = \PhpExcel\PHPExcel_IOFactory::createReaderForFile($spreadSheetFullFilePathString);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($spreadSheetFullFilePathString);
$objWorksheetObject = $objPHPExcel->getActiveSheet();
I am getting 'integer' values for the 'DATE' column by default.
I found that the line '$objReader->setReadDataOnly(true)' is causing the trouble.
So I changed it to '$objReader->setReadDataOnly(false)'.
I get the date column values properly after this change, but now the reader is reading ALL the ROWS + COLUMNS found in the Excel file.
Any help would be greatly appreciated.
If $objReader->setReadDataOnly(true); is set, then the only reader that can identify dates is the Gnumeric Reader. A date value can only be identified by the number format mask, and setReadDataOnly(true) tells the Reader not to read the formatting, so dates cannot subsequently be identified. (Gnumeric is the exception because it was written after the problem was identified, but none of the other Readers have yet been modified to read this information regardless of the setReadDataOnly value.) If you need to identify dates, then the only option at this point is $objReader->setReadDataOnly(false);
However, for date cells, you shouldn't get an integer value; you should get a float: it's the decimal that identifies the time part of the Excel datetime serialized value.
If you know which cells contain dates, then you can convert them to unix timestamps or PHPExcel DateTime objects using the helper functions defined in the PHPExcel_Shared_DateTime class (and can then use all the standard PHP date handling functions).
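To show what that conversion involves, here is a minimal Python sketch of the arithmetic. It is not the PHPExcel helper itself. It assumes the workbook uses Excel's 1900 date system, where the serial value counts days from 1899-12-30 for dates from March 1900 onward, and the fractional part is the time of day. The function name excel_serial_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

# Day zero of the 1900 date system for serials from March 1900 onward (assumption:
# the workbook is not using the 1904 date system).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Whole part = days since the epoch, fractional part = fraction of a day."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(42159.5))  # 2015-06-04 12:00:00
```

This is also why a date cell read as raw data shows up as a number: without the format mask there is nothing to distinguish 42159.5 from any other float.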
You say that the reader is now reading ALL the ROWS + COLUMNS found in the excel: it should, irrespective of the setReadDataOnly value. If you don't want to read all the rows and columns, then you need to set a Read Filter.
I am using SQLite for a project and < symbol is not working in the query.
There is a table named Holidays which has a field of datatype datetime.
Suppose the table contains some dates of current year in the column HolidayDate.
SELECT HolidayDate
FROM Holidays
WHERE (HolidayDate >= '1/1/2011')
AND (HolidayDate <= '1/1/2012')
The < symbol in the above query is not working. > symbol in the above query is working well.
Please help me.
Try:
SELECT HolidayDate
FROM Holidays
WHERE HolidayDate >= date('2011-01-01')
AND HolidayDate <= date('2012-01-01')
(date format must be YYYY-MM-DD)
There is no datetime datatype in sqlite.
Apart from NULL, sqlite only has 4 types:
integral number
floating-point number
string (stored either as utf-8 or utf-16 and automatically converted)
blob
Moreover, sqlite is manifest-typed, which means any column can hold value of any type. The declared type is used for two things only:
inserted values are converted to the specified type if they seem to be convertible (and it does not seem to apply to values bound with sqlite3_bind_* methods at all)
it hints the indexer or optimizer somehow (I just know it has trouble using indices when the column is not typed)
Even worse, sqlite will silently accept anything as a type. It will interpret it as integral type if it contains "int", as string if it contains "char", "clob" or "text", as blob if it contains "blob" or if no type is given, and as floating-point number if it contains "real", "floa" or "doub". In all other cases (including "datetime" and "number") the column gets numeric affinity: values that look like numbers are stored as numbers and everything else is stored as given, which poses no problem to sqlite given how little the typing means.
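A quick way to see how little the declared type means is to ask sqlite what it actually stored. This is a minimal sketch using Python's built-in sqlite3 module; the table t and its columns are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column e has no declared type at all; "datetime" is accepted without complaint.
conn.execute("CREATE TABLE t (a int, b text, c real, d datetime, e)")
conn.execute("INSERT INTO t VALUES ('42', 42, '1.5', '1/1/2011', '42')")

stored_types = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d), typeof(e) FROM t"
).fetchone()
print(stored_types)  # ('integer', 'text', 'real', 'text', 'text')
```

The string '42' becomes an integer in the int column and the number 42 becomes text in the text column, while the date in the datetime column stays a plain string.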
That means '1/1/2011' is simply a string and neither dates in format 'mm/dd/yyyy' nor dates in format 'dd/mm/yyyy' sort by date when sorted asciibetically (unicodebetically really).
If you stored the dates in ISO format ('yyyy-mm-dd'), the asciibetical sort would be compatible with date sort and you would have no problem.
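The difference is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with an in-memory Holidays table; the three sample dates are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Holidays (HolidayDate datetime)")
conn.executemany(
    "INSERT INTO Holidays (HolidayDate) VALUES (?)",
    [("2011-07-04",), ("2011-12-25",), ("2012-05-01",)],
)

# ISO strings compare in date order, so a range filter behaves as expected.
iso_rows = conn.execute(
    "SELECT HolidayDate FROM Holidays "
    "WHERE HolidayDate >= '2011-01-01' AND HolidayDate <= '2012-01-01' "
    "ORDER BY HolidayDate"
).fetchall()
print(iso_rows)  # [('2011-07-04',), ('2011-12-25',)]

# m/d/yyyy strings are compared character by character: '2' sorts after '/',
# so 12/25/2011 is NOT considered <= 1/1/2012.
us_style = conn.execute("SELECT '12/25/2011' <= '1/1/2012'").fetchone()[0]
print(us_style)  # 0
```

The second query is the reason the upper bound in the original question appears not to work.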
I'm trying to do a query like this on a table with a DATETIME column.
SELECT * FROM table WHERE the_date =
2011-03-06T15:53:34.890-05:00
I have the following as a string input from an external source:
2011-03-06T15:53:34.890-05:00
I need to perform a query on my database table and extract the row which contains this same date. In my database it gets stored as a DATETIME and looks like the following:
2011-03-06 15:53:34.89
I can probably manipulate the outside input slightly (like stripping off the -05:00), but I can't figure out how to do a simple select with the datetime column.
I found the convert function, and style 123 seems to match my needs, but I can't get it to work. Here is the reference link for style 123:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.blocks/html/blocks/blocks125.htm
I think that convert is slightly wrongly documented in that version of the docs.
Because this format always includes the century, I think you only need to use 23. Normally the 100 range for convert adds the century to the year format.
What's more, that format only goes down to seconds.
If you want more, you'll need to paste together two converts. That is, paste a ymd part onto a convert(varchar, datetime-column, 14) and compare with your trimmed string. Comparing milliseconds is likely to be a problem, though, depending on where you got your big time string: the Sybase binary stored form has a granularity of 1/300 of a second (about 3.33 ms), so if your source string comes from somewhere else it is not likely to compare equal. In other words, strip the milliseconds and compare as strings.
So maybe:
SELECT * FROM table WHERE convert(varchar,the_date,23) =
'2011-03-06T15:53:34'
But the convert on the column would prevent the use of an index, if that's a problem.
If you compare as datetimes, then the convert is on the right-hand side, but you have to know what your milliseconds are in the_date. Then an index can be used.
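Trimming the external string on the client side before it reaches the query can be sketched as follows. This is a minimal Python illustration; the function name to_style_23 is hypothetical, and it assumes the stored value keeps the same wall-clock time as the input, as in the example from the question.

```python
from datetime import datetime


def to_style_23(iso_text: str) -> str:
    """Drop the fractional seconds and the UTC offset, keeping the wall-clock time."""
    parsed = datetime.fromisoformat(iso_text)
    # tzinfo=None discards the offset without shifting the time.
    return parsed.replace(microsecond=0, tzinfo=None).isoformat()


print(to_style_23("2011-03-06T15:53:34.890-05:00"))  # 2011-03-06T15:53:34
```

The result has the same shape as the string literal in the query above, so it can be passed as the comparison value.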