DataStage job fails loading data from Netezza to Greenplum using the ODBC Greenplum Wire Protocol driver

Greenplum_Connector_0,0: The following SQL statement failed: INSERT INTO GPCC_TT_20211121154035261_15420_0_XXXXX_TABLE_NAME (COLUMN1,COLUMN2,...) SELECT COLUMN1,COLUMN2,... FROM GPCC_ET_20211121154035417_15420_0. The statement reported the following reason: [SQLCODE=HY000][Native=3,484,948] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: missing data for column "xyz_id" (seg2 slice1 192.168.0.0:00 pid=30826)(Where External table gpcc_et_20211121154035417_15420_0, line 91 of gpfdist://ABCD:123/DDCETLMIG_15420_gpw_3_3_20211121154035261: "AG?199645?ABCD EFGH. - HELLOU - JSF RT ADF?MMM?+1?A?DAD. SDA?0082323209?N?N..."; File copy.c; Line 5211; Routine NextCopyFromX; )

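The trick here is to read the error message carefully. Greenplum reports that line 91 of the external table's data ended before a value for column xyz_id was supplied, so that row has fewer fields than the table has columns. Somehow your job has managed not to provide a value for xyz_id. Check your job design thoroughly.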
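One way to narrow this down is to inspect the delimited data outside the job. The following is a minimal Python sketch, not anything DataStage or Greenplum provides: it counts the fields on each line and reports the lines with fewer fields than expected. The function name, the "|" delimiter, the column count, and the sample rows are all assumptions for illustration; take the real values from your external table definition.

```python
def find_short_rows(lines, expected_columns, delimiter="|"):
    """Return (line_number, field_count) for every line with too few fields."""
    short = []
    for number, line in enumerate(lines, start=1):
        # Drop the line terminator before splitting so it is not counted as data.
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) < expected_columns:
            short.append((number, len(fields)))
    return short


# Hypothetical sample: three columns expected, the second row is one field short.
sample = ["AG|199645|N\n", "AB|199646\n", "AC|199647|Y\n"]
print(find_short_rows(sample, expected_columns=3))
```

If a check like this flags rows, one possible cause is a newline or delimiter character embedded in a text value, which splits or shifts the fields of a single source row.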

Related

Create table in multiple databases with flyway

I'm trying to connect to multiple databases and create tables, but Flyway reports a syntax error when migrating.
This is the migration file I'm trying to run:
\c testdatabase;
CREATE TABLE testtable1;
\c testdatabase2;
CREATE TABLE testtable2;
Flyway gives this output:
Error Code : 0
Message : ERROR: syntax error at or near "\"
Position: 1
Line : 1
Statement : \c testdatabase
It seems that Flyway does not support meta-commands like "\c" for connecting to a database. Is there any other way to connect to the databases and create a table?
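The error comes (as the error output indicates) from the lines preceding your two SQL statements in the script: \c testdatabase; and \c testdatabase2;. These are psql meta-commands, so they are neither valid SQL statements nor valid SQL comments.
If they were only meant as labels, you could write them as comments, like the following: -- testdatabase. Keep in mind that a comment does not switch databases, so both tables would then be created in the database Flyway is connected to. Generally, the error output already hints at where the problem lies.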
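If the tables really do have to live in different databases, one option is to run Flyway once per database, each run with its own JDBC URL and its own migration folder. The sketch below only builds the command lines and prints them. It assumes PostgreSQL (because of the \c meta-command), a hypothetical host, user, and sql/<database> folder layout, and the standard Flyway command-line options -url, -user, and -locations.

```python
def flyway_commands(host, databases, user):
    """Build one `flyway migrate` command line per target database."""
    commands = []
    for database in databases:
        commands.append([
            "flyway",
            f"-url=jdbc:postgresql://{host}/{database}",
            f"-user={user}",
            # Assumed layout: each database has its own folder of migrations.
            f"-locations=filesystem:sql/{database}",
            "migrate",
        ])
    return commands


for command in flyway_commands("localhost:5432", ["testdatabase", "testdatabase2"], "postgres"):
    # Printed only; to execute, pass the list to subprocess.run(command, check=True).
    print(" ".join(command))
```

Each migration file then contains only plain SQL for its own database, with no \c lines.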

Create a stored procedure using RMySQL

Background: I am developing an R script that pulls data from a MySQL database, performs a logistic regression, and then inserts the predictions back into the database. I want the entire system to be self-contained in the script in case of database failure. This includes all MySQL stored procedures that the script depends on to aggregate the data on the backend, since these would be deleted in such a database failure.
Question: I'm having trouble creating a stored procedure from an R script. I am running the following:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
DROP PROCEDURE IF EXISTS Test.Tester;
DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END //
DELIMITER ;
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
However, I get the following error, which seems to involve the DELIMITER change.
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
EN' at line 2
What I've Done: I have spent quite a bit of time searching for the answer, but have come up with nothing. What am I missing?
Just wanted to follow up on this thread of comments. Thank you for your thoughts on this issue. I have a couple of Python scripts that need this functionality, so I began researching the same topic for Python. I found a question whose answer also applies here. It states:
"The DELIMITER command is a MySQL shell client builtin, and it's recognized only by that program (and MySQL Query Browser). It's not necessary to use DELIMITER if you execute SQL statements directly through an API.
The purpose of DELIMITER is to help you avoid ambiguity about the termination of the CREATE FUNCTION statement, when the statement itself can contain semicolon characters. This is important in the shell client, where by default a semicolon terminates an SQL statement. You need to set the statement terminator to some other character in order to submit the body of a function (or trigger or procedure)."
Hence the following code, with the DELIMITER lines removed, will run in R:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
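Since I needed the same thing in Python, here is a rough sketch of how a script written for the mysql shell could be split into individual statements before sending them through an API. The function name is made up, and the parsing is deliberately simple: it assumes each statement ends with the current delimiter at the end of a line, and it does not handle delimiters inside string literals or comments.

```python
def split_mysql_script(script):
    """Split a mysql-client script into statements, honoring DELIMITER lines."""
    delimiter = ";"
    statements = []
    buffer = []
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("DELIMITER "):
            # Client-only directive: remember the new delimiter, send nothing.
            delimiter = stripped.split(None, 1)[1]
            continue
        if not stripped:
            continue
        if stripped.endswith(delimiter):
            # End of a statement: drop the delimiter and flush the buffer.
            buffer.append(stripped[: -len(delimiter)].rstrip())
            statement = "\n".join(part for part in buffer if part)
            if statement:
                statements.append(statement)
            buffer = []
        else:
            buffer.append(stripped)
    return statements


script = """
DROP PROCEDURE IF EXISTS Test.Tester;
DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
SELECT 1;
END //
DELIMITER ;
"""
for statement in split_mysql_script(script):
    print(statement)
    print("---")
```

Each returned statement can then be passed to the database API on its own, which is what the R code above does by hand for the CREATE PROCEDURE statement.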

RODBC Error for CURRENT_TIMESTAMP() - is there a list of acceptable keywords?

I submitted a simple (so I thought) query via RODBC :
ch <- odbcConnect(dsn.name, believeNRows=FALSE, rows_at_time=1)
sqlQuery(ch, "CURRENT_TIMESTAMP()")
And it threw the following error:
[1] "42000? -1 Malformed SQL Statement: Unrecognized keyword: CURRENT_TIMESTAMP\r\nStatement:CURRENT_TIMESTAMP()"
[2] "[RODBC] ERROR: Could not SQLExecDirect 'CURRENT_TIMESTAMP()'"
I thought CURRENT_TIMESTAMP() is a common SQL command and didn't expect this to not run. I had checked that the ODBC connection (RSSBus DynamicsCRM Source x64) supports CURRENT_TIMESTAMP(). My connection is OK, I was able to perform some other SQL queries.
So is there a problem with my syntax above? Or is there a list of keywords that RODBC doesn't recognise?
In the code above, the first line:
ch <- odbcConnect(dsn.name, believeNRows=FALSE, rows_at_time=1)
creates a connection to your ODBC data source name (dsn.name), so ch stores the connection handle. The second line:
sqlQuery(ch, "CURRENT_TIMESTAMP()")
executes the SQL query on the connection ch and returns the result in a data frame. A bare CURRENT_TIMESTAMP() is a function call, not a complete SQL statement, so use a complete query instead:
sqlQuery(ch, "SELECT CURRENT_TIMESTAMP()")
I hope this will help.
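The same behaviour can be reproduced without the ODBC source. This is a small Python sketch using the standard-library sqlite3 module as a stand-in database, which is an assumption for illustration only (SQLite spells the function CURRENT_TIMESTAMP without parentheses). The bare expression is rejected as a syntax error, while the SELECT form returns a row.

```python
import sqlite3

# In-memory SQLite database standing in for the real ODBC data source.
conn = sqlite3.connect(":memory:")


def run(sql):
    """Execute sql and return the first row, or the error text on failure."""
    try:
        return conn.execute(sql).fetchone()
    except sqlite3.Error as exc:
        return f"error: {exc}"


bare = run("CURRENT_TIMESTAMP")            # expression only, not a statement
wrapped = run("SELECT CURRENT_TIMESTAMP")  # complete query
print(bare)
print(wrapped)
```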

RODBC connection- limited rows

I set up an ODBC connection to Netezza (a SQL database). The connection is fine. However, R only pulls 256 rows by default, which restricts the number of rows I can retrieve.
If I run the query in Netezza, it returns the full number of rows (300k). I expect the same number of rows in R, but it returns only 256.
The driver I am using is NetezzaSQL version 7.00.02 (NSQLODBC.DLL).
I tried changing the pre-fetch count to zero in the "Drivers Option" from
Control Panel > Administrative Tools > Data Sources (ODBC) > System DSN
It didn't work. Any ideas?
I think RODBC acts poorly with Netezza. A solution is described at http://datamining.togaware.com/survivor/Database_Connection.html:
just add believeNRows=FALSE to either your sqlQuery or odbcConnect call (use the latter if you also use sqlFetch).
You can also try using JDBC driver:
library(RJDBC)
drv <- JDBC("org.netezza.Driver", "nzjdbc.jar", "'")
conn <- dbConnect(drv, "jdbc:netezza://host:5480/database", "user", "password")
res <- dbSendQuery(conn, "select * from mytable")
That way you don't have to deal with DSNs, etc.
I know this is kind of outdated, but the problem is not with the RODBC package. The problem lies in how the ODBC connection is set up. If you configure the connection in Windows, the last tab of the settings lets you specify the number of rows it fetches, and the default is 256.
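As far as I understand it, the idea behind believeNRows=FALSE is that the client stops trusting the row count the driver reports and simply keeps fetching until nothing more comes back. The Python sketch below illustrates that pattern with the standard-library sqlite3 module. The in-memory table, the function name, and the batch size of 256 are assumptions for illustration, not Netezza or RODBC behaviour.

```python
import sqlite3


def fetch_all_in_batches(cursor, batch_size=256):
    """Collect every row by fetching batches until one comes back empty."""
    rows = []
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # no more data, regardless of any reported row count
        rows.extend(batch)
    return rows


# Hypothetical table with more rows than a single batch holds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1000)])

cursor = conn.execute("SELECT id FROM mytable")
rows = fetch_all_in_batches(cursor)
print(len(rows))
```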

SQL query error with ODBC connection in R using Informix driver

With functionality from the RODBC package, I have successfully created an ODBC connection but receive error messages when I try to query the database. I am using the INFORMIX 3.31 32 bit driver (version 3.31.00.10287).
channel <- odbcConnect("exampleDSN")
unclass(channel)
[1] 3
attr(,"connection.string")
[1] "DSN=exampleDSN;UID=user;PWD=****;DB=exampleDB;HOST=exampleHOST;SRVR=exampleSRVR;SERV=exampleSERV;PRO=onsoctcp ... (more parameters)"
attr(,"handle_ptr")
<pointer: 0x0264c098>
attr(,"case")
[1] "nochange"
attr(,"id")
[1] 4182
attr(,"believeNRows")
[1] TRUE
attr(,"colQuote")
[1] "\""
attr(,"tabQuote")
[1] "\""
attr(,"interpretDot")
[1] TRUE
attr(,"encoding")
[1] ""
attr(,"rows_at_time")
[1] 100
attr(,"isMySQL")
[1] FALSE
attr(,"call")
odbcDriverConnect(connection = "DSN=exampleDSN")
When I try to query and investigate the structure of the returned object, I receive an error message 'chr [1:2] "42000 -201 [Informix][Informix ODBC Driver][Informix]A syntax error has occurred." ...'
Specifically, I wrote an expression to loop through all tables in the database, retrieve 10 rows, and investigate the structure of the returned object.
for (i in 1:153){res <- sqlFetch(channel, sqlTables(channel, tableType="TABLE")$TABLE_NAME[i], max=10); str(res)}
Each iteration returns the same error message. Any ideas where to start?
ADDITIONAL INFO: When I return the object 'res', I receive the following -
> res
[1] "42000 -201 [Informix][Informix ODBC Driver][Informix]A syntax error has occurred."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT * FROM \"exampleTABLE\"'"
The error message you quote is:
"[RODBC] ERROR: Could not SQLExecDirect 'SELECT * FROM \"exampleTABLE\"'"
Informix only recognizes table names enclosed in double quotes if the environment variable DELIMIDENT is set, in the environment of either the server or the client (or both). It doesn't much matter what it is set to; I use DELIMIDENT=1 when I want delimited identifiers.
How did you create the table in the Informix database? Unless you created the table with DELIMIDENT set, the table name will not be case sensitive; you do not need the quotes around the table name.
The fact that you're getting error -201 means you've got through the connection process; that is a good start, and simplifies what follows.
I'm not sure whether you're on a Unix machine or a Windows machine - it often helps to indicate that. On Windows, you might have to set the environment with SETNET32 (an Informix program), or there may be a way to specify the DELIMIDENT in the connect string. On Unix, you probably set it in your environment and the R software picks it up. However, there might be problems if you launch R via some sort of menu button or option in a GUI environment; the chances are that the profile is not executed before the R program is.
You can try using the sqlQuery() function in RODBC to retrieve your results. This is the function I use at work, and I have never had a problem with it. Note that Informix limits rows with FIRST rather than TOP, and that the table name is left unquoted:
sqlQuery(channel, "select first 10 * from exampleTABLE")
You should be able to put all of your queries into a list and iterate through them as you were before:
dat <- lapply(queries, function(x) sqlQuery(channel, x))
where queries is your list of queries and channel is your open ODBC connection. I should also encourage you to close the connection with odbcCloseAll() when you're done.
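To build that list of queries without quoted table names, something along these lines could work. It is a Python sketch rather than R, the function name is hypothetical, and it assumes plain table names that need no quoting. It uses Informix's FIRST clause to limit the rows and refuses any name that would need delimited identifiers.

```python
import re


def build_sample_queries(table_names, limit=10):
    """Build one row-limited SELECT per table, leaving the names unquoted."""
    plain_identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
    queries = []
    for name in table_names:
        if not plain_identifier.fullmatch(name):
            # Such names would need double quotes, and therefore DELIMIDENT.
            raise ValueError(f"table name needs quoting: {name!r}")
        queries.append(f"SELECT FIRST {int(limit)} * FROM {name}")
    return queries


for query in build_sample_queries(["exampleTABLE", "other_table"]):
    print(query)
```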
