Redshift encryption: could not insert decrypted data back into Redshift

I encrypted data in a Redshift table using an AES encryption UDF. Now I am trying to revert the data using the aes_decrypt function, but I cannot update the data or insert it into another table.
It returns an error like "error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 4: ordinal not in range(128). Please look at svl_udf_log for more information". How can I fix this?
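This error usually comes from the Python 2 runtime that Redshift Python UDFs run on, not from the AES step itself: when the UDF returns (or str()-converts) a unicode value containing a non-ASCII character such as u'\xed', Python falls back to the 'ascii' codec and raises. A minimal sketch of the usual fix, assuming the decryption itself already works and only the value handling is at fault (the names below are illustrative, not the original UDF):
# -*- coding: utf-8 -*-
# Python 2 sketch (Redshift Python UDFs run on Python 2.7): str()-ing or
# returning a unicode object with non-ASCII characters raises
# "UnicodeEncodeError: 'ascii' codec can't encode character".
def to_varchar(value):
    # Encode explicitly instead of relying on the implicit 'ascii' default.
    if isinstance(value, unicode):
        return value.encode('utf-8')
    return value

decrypted = u'Mar\xeda'        # hypothetical decrypted value containing u'\xed'
print(to_varchar(decrypted))   # str(decrypted) would raise UnicodeEncodeError
If the error is raised elsewhere inside the UDF, svl_udf_log shows the failing line; the fix is the same explicit .encode('utf-8')/.decode('utf-8') at that point.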

Related

AES256-encrypted data cannot be copied and pasted

I am using OpenSSL to encrypt my data. Assume I have 3 rows of data (for simplicity):
0123456789
987654321
121212121
After encrypting, I get
Salted__èøm¬è!^¬ü
?‘¡ñ1•yÈ}, .◊¬ó≤|Úx$mø©
However, when I copy it using my Mac's CMD+C and paste it into another file to be decrypted, I get this error:
bad decrypt
0076160502000000:error:1C80006B:Provider routines:ossl_cipher_generic_block_final:wrong final block length:providers/implementations/ciphers/ciphercommon.c:429:
However, if I do not copy and paste the encrypted data, it can be decrypted properly. I believe this is because the spacing changed. Is it that we cannot copy the data to another file to be decrypted and MUST use the exact file that was encrypted?
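Raw AES ciphertext is arbitrary binary, so pushing it through a text editor and the clipboard will usually alter bytes (encoding conversions, line endings, dropped control characters), after which OpenSSL reports a wrong final block length. The usual workaround is to armor the ciphertext as base64 text first (openssl enc has a -a/-base64 flag for exactly this). A small sketch of the idea, with hypothetical file names, assuming the binary ciphertext already exists in data.enc:
import base64

# Read the binary ciphertext produced by openssl enc and armor it as ASCII
# so it survives copy/paste between files and editors.
with open('data.enc', 'rb') as f:
    ciphertext = f.read()

armored = base64.b64encode(ciphertext)
with open('data.enc.b64', 'wb') as f:
    f.write(armored)

# Before decrypting a pasted copy, reverse the armoring first.
restored = base64.b64decode(armored)
assert restored == ciphertext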

DataStage job failed: Netezza to Greenplum data load using the ODBC Greenplum Wire Protocol driver

Greenplum_Connector_0,0: The following SQL statement failed: INSERT INTO GPCC_TT_20211121154035261_15420_0_XXXXX_TABLE_NAME (COLUMN1,COLUMN2,...) SELECT COLUMN1,COLUMN2,... FROM GPCC_ET_20211121154035417_15420_0. The statement reported the following reason: [SQLCODE=HY000][Native=3,484,948] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: missing data for column "xyz_id" (seg2 slice1 192.168.0.0:00 pid=30826)(Where External table gpcc_et_20211121154035417_15420_0, line 91 of gpfdist://ABCD:123/DDCETLMIG_15420_gpw_3_3_20211121154035261: "AG?199645?ABCD EFGH. - HELLOU - JSF RT ADF?MMM?+1?A?DAD. SDA?0082323209?N?N..."; File copy.c; Line 5211; Routine NextCopyFromX; )
The trick here is to read the error message carefully. Somehow your job has managed not to provide a value for column xyz_id. Check your job design thoroughly.
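If it is not obvious from the job design where the value is lost, the error points at a specific line (91) of the gpfdist data file, and "missing data for column" typically means that line has fewer delimited fields than the external table expects. A quick diagnostic sketch, assuming a local copy of the data file and that the sample in the error is '?'-delimited; the field count and delimiter below are placeholders to adapt, not values taken from the job:
# Locate rows whose field count does not match the external table definition.
EXPECTED_FIELDS = 25    # hypothetical column count of the target table
DELIMITER = '?'

with open('DDCETLMIG_15420_gpw_3_3_20211121154035261') as f:
    for lineno, line in enumerate(f, start=1):
        fields = line.rstrip('\n').split(DELIMITER)
        if len(fields) != EXPECTED_FIELDS:
            print(lineno, len(fields))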

Can we use JDBC to write data from PostgreSQL to Spark?

I am trying to load my tables on PostgreSQL to Spark.
I have successfully read the table from PostgreSQL to Spark using jdbc.
I have code written in R which I want to use on the table, but I cannot access the data in R.
I am using the following code to connect:
val pgDF_table = spark.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://10.128.0.4:5432/sparkDB")
  .option("dbtable", "survey_results")
  .option("user", "prashant")
  .option("password", "pandey")
  .load()

pgDF_table.show
Is there any option like spark.write?
In SparkR, you can read data from JDBC using the following code:
read.jdbc(url, tableName, partitionColumn = NULL, lowerBound = NULL,
upperBound = NULL, numPartitions = 0L, predicates = list(), ...)
Arguments
`url`: JDBC database url of the form 'jdbc:subprotocol:subname'
`tableName`: the name of the table in the external database
`partitionColumn`: the name of a column of integral type that will be used for partitioning
`lowerBound`: the minimum value of `partitionColumn` used to decide partition stride
`upperBound`: the maximum value of `partitionColumn` used to decide partition stride
`numPartitions`: the number of partitions. This, along with `lowerBound` (inclusive) and `upperBound` (exclusive), forms partition strides for the generated WHERE clause expressions used to split the column `partitionColumn` evenly. It defaults to SparkContext.defaultParallelism when unset.
`predicates`: a list of conditions in the WHERE clause; each one defines one partition
Data can be written to JDBC using the following code:
write.jdbc(x, url, tableName, mode = "error", ...)
Arguments
`x`: a SparkDataFrame.
`url`: JDBC database url of the form jdbc:subprotocol:subname.
`tableName`: the name of the table in the external database.
`mode`: one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default).
`...`: additional JDBC database connection properties.
The JDBC driver must be on the Spark classpath.
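To answer the follow-up directly: yes, the DataFrame writer has a JDBC sink as well (df.write in Scala, write.jdbc in SparkR). A minimal PySpark sketch of the same pattern, reusing the connection details from the question and writing to a hypothetical target table:
# Read from PostgreSQL over JDBC and write the DataFrame back to another table.
# URL, credentials and the target table name are placeholders from the question.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pg-jdbc-write").getOrCreate()

df = (spark.read.format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://10.128.0.4:5432/sparkDB")
      .option("dbtable", "survey_results")
      .option("user", "prashant")
      .option("password", "pandey")
      .load())

(df.write.format("jdbc")
   .option("driver", "org.postgresql.Driver")
   .option("url", "jdbc:postgresql://10.128.0.4:5432/sparkDB")
   .option("dbtable", "survey_results_copy")   # hypothetical target table
   .option("user", "prashant")
   .option("password", "pandey")
   .mode("append")
   .save())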

Insert blob into a VARBINARY(MAX) column in a column-encrypted table on SQL Server using pyodbc

I am currently investigating the use of the Always Encrypted feature of Microsoft SQL Server. I am trying to store a blob object in a column-encrypted table ('randomised') using pyodbc. While the code works perfectly fine for inserting arbitrary binary objects into non-encrypted columns, it fails when running the same code against an encrypted column. Stranger still, it works for non-image files, but whenever I try to upload a PDF, JPEG, PNG or similar, it fails.
The code looks like this:
import pyodbc

server = 'tcp:XXXXX-XXXXXX-XXXXX-XXXXX-XXXXX.windows.net,1433'
database = 'db-encryption'
username = 'XXXXXX#dbs-always-encrypted'
password = 'XXXXXXXXX'

connection_string = [
    'DRIVER={ODBC Driver 17 for SQL Server}',
    'Server={}'.format(server),
    'Database={}'.format(database),
    'UID={}'.format(username),
    'PWD={}'.format(password),
    'Encrypt=yes',
    'TrustedConnection=yes',
    'ColumnEncryption=Enabled',
    'KeyStoreAuthentication=KeyVaultClientSecret',
    'KeyStorePrincipalId=XXXXX-XXXXXX-XXXXX-XXXXX-XXXXX',
    'KeyStoreSecret=XXXXX-XXXXXX-XXXXX-XXXXX-XXXXX'
]

cnxn = pyodbc.connect(';'.join(connection_string))
cursor = cnxn.cursor()

insert = 'insert into Blob (Data) values (?)'
files = ['Text.txt', 'SimplePDF.pdf']

for file in files:
    # without hex encode
    bindata = None
    with open(file, 'rb') as f:
        bindata = pyodbc.Binary(f.read())
    # insert binary
    cursor.execute(insert, bindata)
    cnxn.commit()
The error message I receive when trying to run the code on the encrypted 'Data' column (VARBINARY(MAX)) is the following
pyodbc.DataError: ('22018', "[22018] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Operand type clash: image is incompatible with varbinary(max) encrypted with (encryption_type = 'RANDOMIZED', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'CEK_Auto1', column_encryption_key_database_name = 'db-encryption') (206) (SQLExecDirectW)")
It seems like the driver reads the bytes, sees that it is a 'known type', and treats the data as 'image'.
Is there any way I can prevent this from happening? I simply want to store an arbitrary byte object in said column.
It might be late, but the issue is with your driver. Install the ODBC Driver 17 for SQL Server, or use {ODBC Driver 13 for SQL Server}, or you can also try {SQL Server}.
Download the driver from here
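If it is unclear which ODBC drivers are actually installed and visible to pyodbc on the machine, they can be listed programmatically before changing the connection string; this small check is not part of the original answer, just a way to verify the driver names:
import pyodbc

# Print the driver names pyodbc can see; any of them is a valid
# DRIVER={...} value for the connection string.
for name in pyodbc.drivers():
    print(name)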

Create a stored procedure using RMySQL

Background: I am developing an R script that pulls data from a MySQL database, performs a logistic regression, and then inserts the predictions back into the database. I want the entire system to be self-contained in the script in case of database failure. This includes all MySQL stored procedures that the script depends on to aggregate the data on the back end, since these would be deleted in such a database failure.
Question: I'm having trouble creating a stored procedure from an R script. I am running the following:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
DROP PROCEDURE IF EXISTS Test.Tester;
DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END //
DELIMITER ;
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
However, I get the following error, which seems to involve the DELIMITER change:
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
EN' at line 2
What I've Done: I have spent quite a bit of time searching for the answer, but have come up with nothing. What am I missing?
Just wanted to follow up on this string of comments. Thank you for your thoughts on this issue. I have a couple Python scripts that need to have this functionality and I began researching the same topic for Python. I found this question that indicates the answer. The question states:
"The DELIMITER command is a MySQL shell client builtin, and it's recognized only by that program (and MySQL Query Browser). It's not necessary to use DELIMITER if you execute SQL statements directly through an API.
The purpose of DELIMITER is to help you avoid ambiguity about the termination of the CREATE FUNCTION statement, when the statement itself can contain semicolon characters. This is important in the shell client, where by default a semicolon terminates an SQL statement. You need to set the statement terminator to some other character in order to submit the body of a function (or trigger or procedure)."
Hence the following code will run in R:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
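Since the follow-up mentions needing the same thing from Python: the same rule applies there, because DELIMITER is a client builtin, not SQL. A sketch of the equivalent in Python, assuming the mysql-connector-python package and placeholder connection details (the procedure body is the same placeholder as above):
import mysql.connector

# Placeholder credentials; the point is that the whole CREATE PROCEDURE text is
# sent as one statement through the API, so no DELIMITER command is needed.
cnx = mysql.connector.connect(host='localhost', user='user',
                              password='secret', database='Test')
cur = cnx.cursor()

cur.execute("DROP PROCEDURE IF EXISTS Test.Tester")
cur.execute("""
CREATE PROCEDURE Test.Tester()
BEGIN
    /***DO DATA AGGREGATION***/
END
""")

cur.close()
cnx.close()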
