Transferring encrypted data using Sqoop

Is this use case possible:
extract data, encrypt it, transfer it over the network, then decrypt it and load it into Hive or HDFS using Sqoop?

You can achieve this by following the steps below (a sketch of step 2 follows the list):
1. Use the sqoop codegen tool to generate the mapper code that handles deserialization of table data.
2. Modify this code to encrypt the data read from the table. Each instance represents one row.
3. Run the sqoop import command, which will use the modified mapper code to produce encrypted data and write it to HDFS.
4. Apply your decryption logic to the output files in HDFS to recover the content.
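
As an illustration of step 2, the class generated by sqoop codegen has a setter per column, and you can route each string column through a small AES helper before it is written out. The helper below is a sketch, not part of the Sqoop API; the hard-coded key and ECB mode are simplifications for brevity:

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper (not part of Sqoop): call encrypt() from the setters of the
// class generated by sqoop codegen so each column value is stored encrypted.
public final class FieldEncryptor {
    // Hard-coded AES-128 key for the sketch only; load from a key store in practice.
    private static final byte[] KEY = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    public static String encrypt(String plaintext) {
        try {
            // ECB keeps the sketch short; prefer an authenticated mode such as AES/GCM.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Base64 keeps the ciphertext printable in text output formats.
            return Base64.getEncoder().encodeToString(ct);
        } catch (Exception e) {
            throw new RuntimeException("encryption failed", e);
        }
    }
}

In the generated class you would then change an assignment such as this.name = name; to this.name = FieldEncryptor.encrypt(name); (the field name is hypothetical). The same helper with Cipher.DECRYPT_MODE implements step 4 over the files pulled back from HDFS.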

Related

Consuming Client-Side Encrypted Data from Snowflake

I am trying to ingest client-side encrypted data files from S3 into Snowflake, and want to query the data in Snowflake in readable format using Snowflake SQL.
I have encrypted the data file using AES-256 and placed it in S3. I also followed the prerequisites of setting up my external stage with MASTER_KEY (AES-256, base64 encoded). However, when I read the data, it is not shown in readable format.
I would like to know if client-side encrypted data can be read in the clear in Snowflake with the right authentication and authorization (without having to unload it back to S3).
Thanks in advance.

Decrypt in Snowflake using Unix command

I am facing an issue where I have to decrypt a database column in Snowflake. The transformation to decrypt the column is a unix command. How do I achieve this decryption in Snowflake?
If you have a row with normal data and one column that is encrypted, then presumably:
1. you are not prepared to decrypt the column prior to loading the data into Snowflake, and
2. you are also not prepared to decrypt the column after returning result rows from Snowflake via a query.
Point 2 would imply that you either cannot decrypt client side, or that you need the results for some form of JOIN/filtering, in which case it would make sense to store the data unencrypted.
Referring to the decryption transformation as a command-line tool also implies you are encrypting the whole file/pipe-stream, which does not match your column reference.
But if you have to decrypt in Snowflake, you will need to implement a JavaScript UDF to do that. You might find the Using Binary Data docs helpful.
You can't run unix commands in the Snowflake environment.
If you can't do client-side decryption on the way in or out, you have to figure out what the unix command actually does, and hopefully you will be able to recreate it using the Cryptographic/Checksum functions.
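
If decrypting after the query (point 2 above) is in fact an option, the pattern is to select the encrypted column as-is and decrypt it in the client. A minimal Java/JDBC sketch, assuming the column holds Base64-encoded AES ciphertext; the connection details, table and column names are placeholders, and the cipher settings must match whatever the unix command actually does:

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Base64;

// Sketch only: fetch rows from Snowflake over JDBC and decrypt one column locally.
public class DecryptClientSide {
    public static void main(String[] args) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        // doFinal() resets the cipher to its initialized state, so one init suffices.
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec("0123456789abcdef".getBytes(), "AES"));

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://<account>.snowflakecomputing.com/", "<user>", "<password>");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, secret_col FROM my_table")) {
            while (rs.next()) {
                byte[] ct = Base64.getDecoder().decode(rs.getString("secret_col"));
                System.out.println(rs.getLong("id") + " -> " + new String(cipher.doFinal(ct)));
            }
        }
    }
}

This requires the Snowflake JDBC driver on the classpath, and it only helps when the decrypted values are not needed inside Snowflake for joins or filtering.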

Smooth migration to an encrypted database (SQLCipher)

I've been using an unencrypted database in my app so far, and now I want to migrate it to an encrypted database using SQLCipher. Here is my situation.
For the next release I'll bump the database version, which will cause the upgrade script to run. For onUpgrade() to be called, I have to call getReadableDatabase() or getWritableDatabase(). So the first database operation will execute my script, which does the following:
1. Create an encrypted database.
2. Export data from the old (unencrypted) database to the encrypted database.
3. Delete the old database.
When the migration runs I'd like to halt all other operations until it is complete, and then have the halted operations performed against the encrypted database.
I'm not really sure how I can achieve that, so what approach should I use?
I think what you are searching for is the sqlcipher_export() convenience function.
It easily facilitates copying an unencrypted DB to an encrypted one. On Android you could probably do something like:
// Attach a new encrypted database; KEY sets its SQLCipher passphrase.
unencryptedDb.rawExecSQL("ATTACH DATABASE 'secure_db_name.db' AS encrypted KEY 'testkey'");
// Copy the schema and data of the main (unencrypted) database into the attached one.
// Note: sqlcipher_export() takes the attached database's alias, not its file name.
unencryptedDb.rawExecSQL("SELECT sqlcipher_export('encrypted')");
unencryptedDb.rawExecSQL("DETACH DATABASE encrypted");
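
For the "halt all other operations" part of the question, one common approach is to funnel every database access through a single gatekeeper that blocks until the migration has finished. A minimal sketch; the DbProvider class is hypothetical, not part of the SQLCipher API:

import net.sqlcipher.database.SQLiteDatabase;
import net.sqlcipher.database.SQLiteOpenHelper;
import java.util.concurrent.CountDownLatch;

// Hypothetical gatekeeper: route every database access through get(), so
// callers automatically block while the one-time migration in onUpgrade()
// is still running.
public final class DbProvider {
    private static final CountDownLatch MIGRATED = new CountDownLatch(1);
    private static volatile SQLiteDatabase db;

    // Call once at app start, on a background thread.
    public static void init(SQLiteOpenHelper helper, String passphrase) {
        // getWritableDatabase() is what triggers onUpgrade(), i.e. the migration.
        db = helper.getWritableDatabase(passphrase);
        MIGRATED.countDown(); // release any callers waiting in get()
    }

    // Every other operation obtains the database here and waits for the migration.
    public static SQLiteDatabase get() throws InterruptedException {
        MIGRATED.await();
        return db;
    }
}

With this in place, the halted operations simply resume against the encrypted database as soon as init() completes.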

Decryption issue when running Presto queries in EMR for data encrypted using an AWS client-side master key

I have used your latest script that successfully installs Presto server (version 0.99) and Java 8 on an Amazon EMR instance. My data files are located in an S3 bucket and are encrypted with a client-side, customer-managed key. When I create a Hive table that references those encrypted data files in S3, Hive can successfully decrypt the records and display them in the console. However, when viewing the same external table from the Presto command-line interface, the data is displayed in its encrypted form. I have looked at the link at
https://prestodb.io/docs/current/release/release-0.57.html and added those properties to my hive.properties file, which looks like this:
hive.s3.connect-timeout=2m
hive.s3.max-backoff-time=10m
hive.s3.max-error-retries=50
hive.metastore-refresh-interval=1m
hive.s3.max-connections=500
hive.s3.max-client-retries=50
connector.name=hive-hadoop2
hive.s3.socket-timeout=2m
hive.s3.aws-access-key=***
hive.s3.aws-secret-key=**
hive.metastore.uri=thrift://localhost:9083
hive.metastore-cache-ttl=20m
hive.s3.staging-directory=/mnt/tmp/
hive.s3.use-instance-credentials=true
Any help on how to decrypt the files using the Presto CLI will be much appreciated.
We will follow up in the issue you filed: https://github.com/facebook/presto/issues/2945
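
For reference, later Presto releases let the Hive connector decrypt client-side encrypted S3 data itself via the hive.s3.encryption-materials-provider property, which names a class implementing the AWS SDK's com.amazonaws.services.s3.model.EncryptionMaterialsProvider interface. A sketch, assuming a Presto version that supports the property and a symmetric client-side master key:

import com.amazonaws.services.s3.model.EncryptionMaterials;
import com.amazonaws.services.s3.model.EncryptionMaterialsProvider;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;
import java.util.Map;

// Sketch of an encryption materials provider for the Presto Hive connector.
// Key sourcing is simplified: the master key is read Base64-encoded from an
// environment variable (the variable name is an assumption).
public class MyKeyProvider implements EncryptionMaterialsProvider {
    private final EncryptionMaterials materials = new EncryptionMaterials(
            new SecretKeySpec(Base64.getDecoder().decode(System.getenv("S3_MASTER_KEY_B64")), "AES"));

    @Override
    public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) {
        return materials; // same master key regardless of the materials description
    }

    @Override
    public EncryptionMaterials getEncryptionMaterials() {
        return materials;
    }

    @Override
    public void refresh() {
        // nothing to refresh for a static key
    }
}

The compiled class would go on the Hive plugin classpath and be referenced as hive.s3.encryption-materials-provider=MyKeyProvider in hive.properties (check that your Presto version supports this property before relying on it).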

Using the WCF-SQL adapter

I need to poll the data in XML format and map it to the EDI 834.
I have written the stored procedure using FOR XML AUTO, ELEMENTS.
When I consume it using Add Adapter Metadata, I get an XML message, but I need to use this XML message to map it to the EDI 834. How do I get the structure of the XML so that I can use it in a map?
I also followed the thread at http://social.msdn.microsoft.com/Forums/en-US/biztalkgeneral/thread/6a7e0093-0692-4ba5-9e14-0d2090c2cf54 and generated the schemas using XML polling, and mapped that to the EDI 834.
But when I use the map as an outbound map, it doesn't map the polled data to the EDI 834.
The WCF-SQL adapter removes the need for the FOR XML AUTO, ELEMENTS syntax; that is a legacy leftover from the old SQL adapter.
Just write your stored procedure so that it returns a consistent result set, then generate metadata against the stored procedure. The adapter framework will create an appropriate schema based on the metadata returned by your stored procedure.
Then simply map the data from your WCF-SQL schema to your EDI 834 schema.
Alternatively, if you want to keep returning XML:
- Create the stored procedure so that it returns XML (or an XML part) using the FOR XML PATH syntax.
- Set up a receive location using WCF-SQL. Select XmlPolling, and choose a root name and namespace for the adapter to wrap around the XML returned from SQL (mandatory).
- Set the polling statement to: exec [SPNAME]
- Set PollDataAvailableStatement to something appropriate that returns a count > 0 if there are rows/XML to be polled.
- Use a PassThruReceive pipeline for the receive location.
- Set up a FILE send port that subscribes to everything coming from the receive port used by the receive location.
- Start the application and examine the XML returned by the adapter.
- In Visual Studio, generate a schema from well-formed XML (Add -> Add Generated Items -> Generate Schemas). (Note: you may have to run InstallWFX.vbs, found under BizTalk SDK/Utilities/Schema Generator, if you have not already done this on the machine.)
- Choose the XML file generated by the adapter (give the file a name representing the schema you are trying to create).
- Now you should have a schema representing the XML returned by the adapter; you may have to go through it manually and change data types to something more appropriate than what the wizard chose.
