Teradata: BYTEINT format, COMPRESS and Numeric Overflow

I am working on an existing script where the format of one of the columns is set as:
column A BYTEINT Compress (0,1,2,3,4,5,6,7,8,9,NULL)
The possible values for this field lie between 1 and 899, or null. However, when I try to insert the data into the table, I get error 2616 'Numeric Overflow'.
As a possible solution, I changed the format to INT and it seems to work. However, I am not sure what the impact of changing it to INT would be. In particular, I am not sure about:
What COMPRESS does to the data
Why BYTEINT would have been set in the first place
Thanks
Note - It is a pre-existing script of over 5000 lines of code.
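For reference, a rough sketch of what the change described above looks like as DDL; the table name is a placeholder, and only the column definition mirrors the script:
CREATE TABLE my_table (                                       -- placeholder table name
    column_A INTEGER COMPRESS (0,1,2,3,4,5,6,7,8,9,NULL)      -- was BYTEINT, which only holds -128 to 127
);
INTEGER comfortably covers the 1-899 range, which is presumably why the 2616 overflow goes away.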

Related

Duckdb_read_csv struggling with auto-detecting column data types in R

I have some very large CSV files (~183 million rows by 8 columns) that I want to load into a database using R. I use duckdb for this and its built-in function duckdb_read_csv, which is supposed to auto-detect the data type of each column. If I enter the following code:
con = dbConnect(duckdb::duckdb(), dbdir="testdata.duckdb", read_only = FALSE)
duckdb_read_csv(con, "d15072021", "mydata.csv",
                header = TRUE)
It produces this error:
Error: rapi_execute: Failed to run query
Error: Invalid Input Error: Could not convert string '2' to BOOL between line 12492801 and 12493825 in column 9. Parser options: DELIMITER=',', QUOTE='"', ESCAPE='"' (default), HEADER=1, SAMPLE_SIZE=10240, IGNORE_ERRORS=0, ALL_VARCHAR=0
I've looked at the rows in question and I can't find any irregularities in column 9. Unfortunately, I cannot post the dataset because it's confidential. But the entire column is filled with either FALSE or TRUE.
If I set the parameter nrow.check to something larger than 12493825 it doesn't produce the same error but takes very long and simply converts the column to VARCHAR instead of a logical. Setting nrow.check to -1 (meaning it checks every row for a pattern) crashes R and my PC completely.
The weird thing: this isn't consistent. Earlier I imported the dataset whilst keeping the default value for nrow.check at 500 and it read the file with no issue (though still converting column 9 to VARCHAR). I have to read a lot of files that follow the same pattern, so I need a reliable way of reading them. Does anyone know how duckdb_read_csv actually works and why I might get this error?
Note that reading the files into memory and then into a database isn't an option because I run out of memory instantly.
The way the sniffer works is by sampling nrow.check rows to figure out the data types, so the result can differ between runs if you get unlucky. Increasing nrow.check reduces the chance of failure, mainly because the sniffer looks at more rows.
If increasing the number of rows is not possible due to performance issues, you can of course first define the schema of the CSV file. But then you must know the schema beforehand.
As an example of how you can define the schema and turn off the sniffer:
SELECT * FROM read_csv('test.csv', COLUMNS=STRUCT_PACK(a := 'INTEGER', b := 'INTEGER'), auto_detect='false')
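Applied to the setup in the question, a hedged sketch of loading straight into a table this way; the column names and types are placeholders for the eight real columns, and the exact COLUMNS/auto_detect spelling may differ between DuckDB versions:
CREATE TABLE d15072021 AS
SELECT * FROM read_csv('mydata.csv',
    COLUMNS=STRUCT_PACK(id := 'BIGINT', flag := 'BOOLEAN'),   -- placeholder names/types; list all columns in practice
    header=true,
    auto_detect='false');
With the sniffer off, the TRUE/FALSE column is read with whatever type you declare instead of whatever the sample happened to suggest.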

Add leading zeros to a character variable in Progress 4GL

I am trying to import a .csv file to match the records in the database. However, the database records have leading zeros. This is a character field, and the amount of data is on the higher side.
Here the length of the field in the database is x(15).
The problem I am facing is that the .csv file contains data like example AB123456789 wherein the database field has "00000AB123456789" .
I am importing the .csv to a character variable.
Could someone please let me know what I should do to add the prefix zeros using a Progress query?
Thank you.
You need to FILL() the input string with "0" in order to pad it to a specific length. You can do that with code similar to this:
define variable inputText as character no-undo format "x(15)".
define variable n as integer no-undo.
input from "input.csv".
repeat:
    import inputText.
    n = 15 - length( inputText ).
    if n > 0 then
        inputText = fill( "0", n ) + inputText.
    display inputText.
end.
input close.
Substitute your actual field name for inputText and use whatever mechanism you are actually using for importing the CSV data.
FYI - the "length of the field in the database" is NOT "x(15)". That is a display formatting string. The data dictionary has a default format string that was created when the schema was defined, but it has absolutely no impact on what is actually stored in the database. ALL Progress data is stored as variable length. It is not padded to fit the display format and, in fact, it can be "overstuffed"; it is very, very common for applications to do so. This is a source of great frustration to SQL reporting tools that think the display format is some sort of length limit. It is not.

Teradata: how to cast using the length of a column

I need to use the cast function with the length of a column in Teradata.
Say I have a table with the following data:
id | name
1  | dhawal
2  | bhaskar
I need to use a cast operation something like:
select cast(name as CHAR(<length of column>)) from table
How can I do that?
Thanks,
Dhawal
You have to find the length by looking at the table definition - either manually (show table) or by writing dynamic SQL that queries dbc.ColumnsV.
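For the manual or dynamic-SQL route, a minimal sketch of looking the length up in the data dictionary; the database and table names are placeholders, and for character columns ColumnLength is reported in bytes:
SELECT ColumnLength
FROM dbc.ColumnsV
WHERE DatabaseName = 'mydb'        -- placeholder database name
  AND TableName    = 'mytable'     -- placeholder table name
  AND ColumnName   = 'name';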
update
You can find the maximum length of the actual data using
select max(length(cast(... as varchar(<large enough value>)))) from TABLE
But if this is for FastExport, I think casting as varchar(large-enough-value) and post-processing to remove the 2-byte length info FastExport includes is a better solution (since exporting a CHAR() will result in a fixed-length output file with lots of spaces in it).
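A hedged sketch of that export query; the table name and the 'large enough' 500 are placeholders:
SELECT CAST(name AS VARCHAR(500))  -- 500 = arbitrary large-enough length
FROM my_table;                     -- placeholder table name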
You may know this already, but just in case: Teradata usually recommends switching to TPT instead of the legacy fexp.

Maximum Length of Value in R Data Frame, RODBC

I am trying to do a simple query of a DB2 database using the RODBC package in R (myQuery<-sqlQuery(channel,paste0("..."))). One of the columns is a VARCHAR of length 3000. The resulting data frame shows "NA" in that column when there should be text, and exporting it to csv also only shows "NA". A query in Access shows an odd character encoding (only after clicking on the cell). Is there a maximum length of a value in an R data frame, or a maximum length of a field that can be pulled using RODBC? Or is it the encoding of the field that causes the "NA" to appear?
I did an end-to-end test on DB2 (LUW 9.7) and R (3.2.2, Windows) and it worked fine for me.
SQL code:
create table test (foo varchar(3000));
--actual insert is 3000 chars
insert into test values ('aaaaaa .... a');
--this select worked fine in my normal SQL client
select * from test
R code:
long = sqlQuery(connection, "select * from test");
#Displays the 3000 character value.
long;
My guess is the problem is for some other reason than simply the size of the field:
Character encoding issues. If you are seeing something funny in Access, perhaps the content of the field is something not acceptable in the character encoding R is using, so it is being discarded. (I'm not familiar with character encoding in R in particular, but it is in general a thorny issue for software development).
Overall size of the results. Maybe the problem is due to the overall length of a row rather than the length of a single field. Is the query also returning lots of other stuff? Have you tried a simple test of just this field (see the sketch after this list)?
Problem in another version. Maybe you are using a different version than I was, and there is indeed a problem with your version. If you think so, update your question with more information.
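A hedged sketch of the 'just this field' test mentioned above, on the DB2 side (it could equally be passed to sqlQuery); the table and column names follow the earlier test example, and LENGTH returns the size in bytes:
-- pull only the wide column, a handful of rows, to rule out row-size effects
SELECT LENGTH(foo) AS byte_len, foo
FROM test
FETCH FIRST 5 ROWS ONLY;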

PHPExcel - Reading Date values

I am using PHPExcel library to read spread sheet data. This library is giving me trouble reading the 'DATE' values.
Question:
How can I instruct PHPExcel to read the 'DATE' values properly without having to set 'setReadDataOnly' to 'false'?
Description:
PHPExcel version: 2.1, Environment: Ubuntu 12
Here is the code block:
$objReader = \PhpExcel\PHPExcel_IOFactory::createReaderForFile($spreadSheetFullFilePathString);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($spreadSheetFullFilePathString);
$objWorksheetObject = $objPHPExcel->getActiveSheet();
I am getting 'integer' values for the 'DATE' column values by default.
I found that the line '$objReader->setReadDataOnly(true)' is causing the trouble.
So I changed it to '$objReader->setReadDataOnly(false)'.
I am getting the Date column values properly after this change, but now the reader is reading ALL the ROWS + COLUMNS found in the Excel file.
Any help would be greatly appreciated.
If $objReader->setReadDataOnly(true); is used, then the only Reader that can identify dates is the Gnumeric Reader. A date value can only be identified by its number format mask, and setReadDataOnly(true) tells the Reader not to read the formatting, so dates cannot subsequently be identified. (Gnumeric is the exception because its Reader was written after the problem was identified; none of the other Readers have yet been modified to read this information regardless of the setReadDataOnly value.) If you need to identify dates, then the only option at this point is $objReader->setReadDataOnly(false);.
However, for date cells, you shouldn't get an integer value; you should get a float: it's the decimal that identifies the time part of the Excel datetime serialized value.
If you know which cells contain dates, then you can convert them to Unix timestamps or PHP DateTime objects using the helper functions defined in the PHPExcel_Shared_Date class (and can then use all the standard PHP date handling functions).
You say that the reader is now reading ALL the ROWS + COLUMNS found in the Excel file: it should, irrespective of the setReadDataOnly value. If you don't want to read all the rows and columns, then you need to set a Read Filter.
