Output converter for converting Decimal('100.000') in pyodbc

I have a numeric column in one of my SQL Server tables. When I retrieve data from this table, the numeric value 100.00 comes into Python as Decimal('100.000'), which is preventing me from passing this value as a parameter to one of the stored procedures.
This is the raw byte string I am getting for 100.000: b'\x13\x05\x01\x00\xe8vH\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Its length is 19 bytes.
def decimal_converter(raw_bytes):
    value = struct.unpack(?, raw_bytes)  # which format string goes here?
    return value
cn.add_output_converter(pyodbc.SQL_DECIMAL, decimal_converter)
Any help in unpacking this byte string would be greatly appreciated. Thanks
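The 19-byte length matches ODBC's SQL_NUMERIC_STRUCT layout (1 byte precision, 1 byte scale, 1 byte sign, then a 16-byte little-endian integer holding the digits). Assuming that is indeed what pyodbc hands the converter, here is a minimal sketch (cn is the connection object from the snippet above):
import struct
from decimal import Decimal
import pyodbc

def decimal_converter(raw_bytes):
    # Assumed layout: precision (1 byte), scale (1 signed byte),
    # sign (1 byte, 1 = positive), then 16 bytes of little-endian digits.
    precision, scale, sign = struct.unpack('<BbB', raw_bytes[:3])
    digits = int.from_bytes(raw_bytes[3:], byteorder='little', signed=False)
    value = Decimal(digits).scaleb(-scale)
    return value if sign else -value

cn.add_output_converter(pyodbc.SQL_DECIMAL, decimal_converter)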

Related

duckdb_read_csv struggling with auto-detecting column data types in R

I have some very large CSV files (~183 million rows by 8 columns) that I want to load into a database using R. I use duckdb for this and its built-in function duckdb_read_csv, which is supposed to auto-detect data types for each column. If I enter the following code:
con = dbConnect(duckdb::duckdb(), dbdir="testdata.duckdb", read_only = FALSE)
duckdb_read_csv(con, "d15072021","mydata.csv",
header = TRUE)
It produces this error:
Error: rapi_execute: Failed to run query
Error: Invalid Input Error: Could not convert string '2' to BOOL between line 12492801 and 12493825 in column 9. Parser options: DELIMITER=',', QUOTE='"', ESCAPE='"' (default), HEADER=1, SAMPLE_SIZE=10240, IGNORE_ERRORS=0, ALL_VARCHAR=0
I've looked at the rows in question and I can't find any irregularities in column 9. Unfortunately, I cannot post the dataset because it's confidential. But the entire column is filled with either FALSE or TRUE.
If I set the parameter nrow.check to something larger than 12493825 it doesn't produce the same error but takes very long and simply converts the column to VARCHAR instead of a logical. Setting nrow.check to -1 (meaning it checks every row for a pattern) crashes R and my PC completely.
The weird thing: this isn't consistent. Earlier I imported the dataset while keeping the default value for nrow.check at 500 and it read the file with no issue (though it still converted column 9 to VARCHAR). I have to read a lot of files that follow the same pattern, so I need a reliable way of reading them. Does anyone know how duckdb_read_csv actually works and why I might get this error?
Note that reading the files into memory and then into a database isn't an option because I run out of memory instantly.
The sniffer works by sampling nrow.check rows to figure out each column's data type, so the result can differ between runs if you get unlucky. Increasing nrow.check reduces the chance of failure, mainly because the sniffer looks at more rows.
If increasing the number of rows is not possible due to performance issues, you can of course first define the schema of the CSV file. But then you must know the schema beforehand.
As an example of how you can define the schema and turn off the sniffer:
SELECT * FROM read_csv('test.csv', COLUMNS=STRUCT_PACK(a := 'INTEGER', b := 'INTEGER'), auto_detect='false')
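If it helps, the same idea driven from DuckDB's Python client looks like the sketch below (the column names and types are placeholders, not the real schema of mydata.csv); from R you could run the equivalent statement through DBI::dbExecute instead of duckdb_read_csv.
import duckdb

# Placeholder schema: swap in the real column names/types for mydata.csv.
con = duckdb.connect("testdata.duckdb")
con.execute("""
    CREATE TABLE d15072021 AS
    SELECT * FROM read_csv('mydata.csv',
        COLUMNS = STRUCT_PACK(id := 'BIGINT', flag := 'BOOLEAN'),
        header = true,
        auto_detect = false)
""")
con.close()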

Has anyone used a 128-bit integer for a datetime?

I am working with an S3 bucket, and one of the fields is a datetime group whose values appear to be 128-bit integers. An example: 45376204963002810833065984. Has anyone seen this before? If so, how should this be parsed? It should work out to a date in 2022.
This is giving me problems when I try to get the data into a pandas (or polars) dataframe. The data is output as a JSON string, and when I try to put it into a pandas dataframe I get an error: Value too big! So I'm hoping to use SQL to cast this integer as a datetime before outputting it as a JSON file.
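One pattern worth checking: the sample value splits cleanly into a Julian day number above bit 64 and nanoseconds-since-midnight below it, which is what you get if a Parquet-style INT96 timestamp is read as a single little-endian integer. That is only a guess about your data, but a quick Python check of the value does land in 2022:
from datetime import datetime, timedelta, timezone

raw = 45376204963002810833065984

# Guessed layout: Julian day number in the bits above 2**64,
# nanoseconds since midnight in the low 64 bits.
julian_day, nanos = divmod(raw, 1 << 64)

# Julian day 2440588 corresponds to the Unix epoch (1970-01-01).
dt = (datetime(1970, 1, 1, tzinfo=timezone.utc)
      + timedelta(days=julian_day - 2440588, microseconds=nanos // 1000))
print(dt)  # 2022-09-26 09:00:44+00:00, if the guessed layout is right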

How to handle and convert a null datetime field into a Unix timestamp in Scala

I have a code snippet below that is not accepted by Scala; I would appreciate it if someone could help me fix it, thanks.
train_no_header is an RDD generated from a CSV file; its first line is shown below:
scala> train_no_header.first
res4: String = 87540,12,1,13,497,2017-11-07 09:30:38,,0
Now I want to generate another RDD that parses and transforms the records. The 7th field should be a DateTime but is null or empty in some records (in the sample above it is empty); when it is present, its format is the same as that of the 6th field, which is a UTC DateTime.
I need to calculate the delta between the two DateTimes, so I plan to convert them to Unix time; the final RDD should have both date fields converted into Unix time.
So my question is:
with the sample data and format, how do I create the RDD with the needed result?
for records with an empty value in the 7th field, how should I handle it so that no exception is thrown by later queries on the DataFrame (which is what I intend to work with)?
Thank you very much in advance, any clue is appreciated.
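Not Scala, but for what it's worth, here is the per-record logic sketched in Python (field positions are taken from the sample line above: the 6th field is the populated DateTime and the 7th is the possibly-empty one); the same split/parse/guard approach maps directly onto an RDD map in Scala.
from datetime import datetime, timezone

def parse_line(line):
    fields = line.split(",")
    # 0-based indexes 5 and 6 are the two DateTime fields from the sample.
    def to_unix(ts):
        # An empty/blank value becomes None instead of raising an exception.
        if not ts.strip():
            return None
        return int(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
                   .replace(tzinfo=timezone.utc).timestamp())
    first = to_unix(fields[5])
    second = to_unix(fields[6])
    delta = second - first if first is not None and second is not None else None
    return fields[:5] + [first, second, delta, fields[7]]

print(parse_line("87540,12,1,13,497,2017-11-07 09:30:38,,0"))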

Teradata: how to CAST using the length of a column

I need to use the CAST function with the length of a column in Teradata.
Say I have a table with the following data:
id | name
1  | dhawal
2  | bhaskar
I need to use a cast operation, something like
select cast(name as CHAR(<length of column>)) from table
How can I do that?
thanks
Dhawal
You have to find the length by looking at the table definition - either manually (show table) or by writing dynamic SQL that queries dbc.ColumnsV.
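A sketch of the dynamic-SQL route, driven here from Python with pyodbc purely for illustration (the DSN and the database/table/column names are placeholders; note that dbc.ColumnsV reports ColumnLength in bytes, so Unicode columns come back at twice their character count):
import pyodbc

# Placeholder DSN and object names.
cn = pyodbc.connect("DSN=my_teradata_dsn")
cur = cn.cursor()

cur.execute("""
    SELECT ColumnLength
    FROM dbc.ColumnsV
    WHERE DatabaseName = ? AND TableName = ? AND ColumnName = ?
""", "mydb", "mytable", "name")
length = cur.fetchone()[0]

# Build the CAST with the length found above.
for row in cur.execute(f"SELECT CAST(name AS CHAR({length})) FROM mydb.mytable"):
    print(row)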
Update:
You can find the maximum length of the actual data using
select max(length(cast(... as varchar(<large enough value>)))) from TABLE
But if this is for FastExport, I think casting as varchar(large-enough-value) and post-processing to remove the 2-byte length info FastExport includes is a better solution (since exporting a CHAR() will result in a fixed-length output file with lots of spaces in it).
You may know this already, but just in case: Teradata usually recommends switching to TPT instead of the legacy fexp.

Maximum Length of Value in R Data Frame, RODBC

I am trying to do a simple query of a DB2 database using the RODBC package in R (myQuery <- sqlQuery(channel, paste0("..."))). One of the columns is a VARCHAR of length 3000. The resulting data frame shows "NA" in that column when there should be text, and exporting it to CSV also only shows "NA". A query in Access shows an odd character encoding (only after clicking on the cell). Is there a maximum length of a value in an R data frame, or a maximum length of a field that can be pulled using RODBC? Or is it the encoding of the field that causes the "NA" to appear?
I did an end-to-end test on DB2 (LUW 9.7) and R (3.2.2, Windows) and it worked fine for me.
SQL code:
create table test (foo varchar(3000));
--actual insert is 3000 chars
insert into test values ('aaaaaa .... a');
--this select worked fine in my normal SQL client
select * from test
R code:
long = sqlQuery(connection, "select * from test");
#Displays the 3000 character value.
long;
My guess is that the problem has some cause other than simply the size of the field:
Character encoding issues. If you are seeing something funny in Access, perhaps the content of the field is something not acceptable in the character encoding R is using, so it is being discarded. (I'm not familiar with character encoding in R in particular, but it is in general a thorny issue for software development).
Overall size of the results. Maybe the problem is due to the overall length of a row rather than the length of a single field. Is the query also returning lots of other stuff? Have you tried a simple test of just this field?
Problem in another version. Maybe you are using a different version than I was, and there is indeed a problem with your version. If you think so, update your question with more information.
