Hadoop GPG SerDe - encryption

I am currently working on a Hadoop project that requires data encryption (because the data will be stored in S3). While I primarily expect to access the data through Hive, it would be nice to be able to access it via Pig and any other MapReduce methods.
I know Hadoop has built-in support for compression codecs like gzip, Snappy, etc. Is there any support for encryption codecs as well (specifically, GPG)? Has anyone written a GPG SerDe (or anything similar) that is publicly available?

Last I knew, Hadoop had no internal support for encryption whatsoever. It seems like you could overload the CompressionCodec with your GPG code, à la http://www.mail-archive.com/common-user@hadoop.apache.org/msg06229.html
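To make that concrete, here is a minimal, untested sketch of what such a codec could look like in Java. It uses AES via javax.crypto as a stand-in for GPG (a real GPG codec would pull in an OpenPGP library such as BouncyCastle), and the class name, hardcoded key, and file extension are all invented for the example:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.crypto.Cipher;
    import javax.crypto.CipherInputStream;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.hadoop.io.compress.*;

    // Hypothetical "encryption codec" plugged into Hadoop's CompressionCodec
    // extension point. Not GPG: javax.crypto AES is used as a stand-in.
    public class AesCodec implements CompressionCodec {
        private static final byte[] KEY = "0123456789abcdef".getBytes(); // demo only

        private Cipher cipher(int mode) throws IOException {
            try {
                Cipher c = Cipher.getInstance("AES");
                c.init(mode, new SecretKeySpec(KEY, "AES"));
                return c;
            } catch (Exception e) {
                throw new IOException("cipher init failed", e);
            }
        }

        @Override
        public CompressionOutputStream createOutputStream(OutputStream out) throws IOException {
            final CipherOutputStream cos = new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE));
            return new CompressionOutputStream(cos) {
                @Override public void write(int b) throws IOException { cos.write(b); }
                @Override public void write(byte[] b, int off, int len) throws IOException { cos.write(b, off, len); }
                @Override public void finish() throws IOException { cos.flush(); } // padding is finalized on close()
                @Override public void resetState() { }
            };
        }

        @Override
        public CompressionInputStream createInputStream(InputStream in) throws IOException {
            final CipherInputStream cis = new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE));
            return new CompressionInputStream(cis) {
                @Override public int read() throws IOException { return cis.read(); }
                @Override public int read(byte[] b, int off, int len) throws IOException { return cis.read(b, off, len); }
                @Override public void resetState() { }
            };
        }

        // Returning null for the Compressor/Decompressor pooling hooks keeps the
        // sketch short; a production codec would supply real implementations.
        @Override public Class<? extends Compressor> getCompressorType() { return null; }
        @Override public Compressor createCompressor() { return null; }
        @Override public CompressionOutputStream createOutputStream(OutputStream out, Compressor c) throws IOException { return createOutputStream(out); }
        @Override public Class<? extends Decompressor> getDecompressorType() { return null; }
        @Override public Decompressor createDecompressor() { return null; }
        @Override public CompressionInputStream createInputStream(InputStream in, Decompressor d) throws IOException { return createInputStream(in); }
        @Override public String getDefaultExtension() { return ".aes"; }
    }

Registered via the io.compression.codecs configuration property, files with the .aes extension would then be decrypted transparently by Hive, Pig, or plain MapReduce, which is exactly the property the question is after.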
Happy Hacking & let us know if you find a solution!

Related

FTPS for transferring file from unix to mainframe

I am looking for JCL scripts/procedures on the mainframe that can facilitate file transfer from a Unix server to the mainframe. I am required to use FTPS for the outbound jobs (pulling files from the UNIX server to the mainframe host).
Rather than JCL, just do it in a shell script. Here is a good site on using such commands:
https://blog.eduonix.com/shell-scripting/how-to-automate-ftp-transfers-in-linux-shell-scripting/
Once you have that working as a shell script under USS, you should be able to call the shell script from JCL so you can execute it as a scheduled batch job if you need to.
Kenny's suggestion is fairly reasonable. IBM's documentation on how to write JCL for FTP(S)-related tasks is available in their "z/OS Communications Server: IP User's Guide and Commands" publication, IBM Publication No. SC27-3662. The current revision appears to be SC27-3662-30, but later revisions are possible. You can easily find this publication online; don't skip the section titled "Submitting FTP requests in batch," and make sure you set the security options correctly (of course).
Please note that you're asking about FTPS, i.e. TLS encryption applied to either or both (preferably both) of the FTP channels (control and data). SFTP is another file transfer protocol based on SSH that z/OS also supports.
Another possible approach that you'll fairly often find available on z/OS installations is to use IBM MQ Advanced for z/OS's Managed File Transfer (MFT) feature to retrieve the file(s) using FTPS. As the name suggests, this'll be managed and have at least some error handling capabilities.
Yet another possible approach, if you prefer the HTTPS protocol, is to use the z/OS Client Web Enablement Toolkit's HTTPS protocol enabler to fetch the file. That's a built-in, standard feature in all currently supported z/OS releases, and you can use it from a relatively simple REXX script, for example. Details are available here (z/OS 2.3 variant of the documentation):
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.ieac100/ieac1-cwe-http.htm
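(For comparison outside the z/OS-specific tooling, here is a hedged Java sketch of such an FTPS pull using the Apache Commons Net library, with host, credentials, and paths as placeholders. The execPROT("P") call is what extends TLS protection to the data channel, i.e. the "preferably both" above.)

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.commons.net.ftp.FTP;
    import org.apache.commons.net.ftp.FTPSClient;

    public class FtpsPull {
        public static void main(String[] args) throws Exception {
            FTPSClient ftps = new FTPSClient(); // explicit FTPS: upgrades the session via AUTH TLS
            ftps.connect("unix.example.com");   // placeholder host
            ftps.login("user", "secret");       // placeholder credentials

            ftps.execPBSZ(0);                   // required prelude to PROT
            ftps.execPROT("P");                 // "P" = protect the data channel as well

            ftps.setFileType(FTP.BINARY_FILE_TYPE); // no codepage conversion in transit
            ftps.enterLocalPassiveMode();

            try (OutputStream out = new FileOutputStream("outbound.dat")) {
                ftps.retrieveFile("/outbound/file.dat", out); // placeholder remote path
            }
            ftps.logout();
            ftps.disconnect();
        }
    }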

Connect to z/OS Mainframe with SFTP

We have an IBM System z host sitting in our cellar. Now the issue is that I have no clue about mainframes! (It's not USS, btw.)
The Problem: How can I transfer a file from the host system to a Windows machine?
Usually on UNIX systems I would just install an SSH daemon and connect to it via a program called WinSCP, then transfer the file in binary mode so that nothing gets converted (UltraEdit and other editors can handle this).
With the host system it seems to be a bit more difficult, as the native IBM format is EBCDIC, and I have no idea whether there is a state-of-the-art SFTP server program for the host. Could anybody be so kind as to enlighten me? From my experience with IT, there must be a state-of-the-art SFTP connection to that system. I appreciate any help/hints/solutions.
Thank you,
O.S
If the mainframe "sitting in [your] cellar" is running z/OS then it has Unix System Services installed. You can't have z/OS without it.
There is an SFTP package available (for free) for z/OS.
You can test for Unix System Services by firing up a 3270 emulator, going to ISPF option 3.17, putting a forward slash (/) in the Pathname field, and pressing the mainframe Enter key. Another way would be to key OMVS at a TSO READY prompt, which will start up a 3270-based Unix shell.
It is possible that USS is simply not made available to you, even though it is present in any supported release of z/OS; there could be concerns about supporting something outside a particular group.
Or, depending on what OS you have running on your System z, it's possible you don't have z/OS. You could have z/VM, zLinux, or TPF. However, if you're running zLinux, you have Linux, which has sftp installed and uses ASCII, not EBCDIC.
As cschneid says, however, if you have z/OS, you have USS. TCP/IP, among other things, won't run without it. Also note that z/OS TCP/IP has an FTP server, so you can connect that way if the FTP server is set up. If security is an issue, FTPS is supported, although it's painful to set up. With the native FTP server, you can convert from EBCDIC to ASCII when you're doing the transfer. There's also an NFS server available. And SMB as well, I believe.
And there's an FTP client available as well, so you could FTP from z/OS to your system, if you wanted to.
Maybe a better thing to do would be to explain what you're trying to do with the data, and what the data is in general. You can edit files directly on the mainframe using TSO, ISPF, or OMVS editors. There are a lot of data types that the mainframe supports that you're not going to be able to handle on a non-z system unless you go through an export process. I'm not really clear on whether you want to convert the file to ASCII when you transfer it or not.
While the others are correct that all recent releases of z/OS have USS built-in, there's quite a bit of setup work that needs to be done in order for individual users to have access to USS capabilities like SFTP. Out of the box, you get USS "minimal mode" that just has enough of USS to support the TCP/IP stack and so forth. USS "full function mode" requires setup:
HFS filesystems need to be allocated
Your security package needs to be configured to manage UIDs/GIDs for your users
and so on
Still, once these details are sorted, with nothing more than the software you're entitled to as part of your z/OS license, you can certainly run SFTP and all the other UNIX-style network services you're used to.
A good place to start is the UNIX Services Planning guide: http://publibz.boulder.ibm.com/epubs/pdf/bpxzb2c0.pdf
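Once that setup is in place, the transfer itself is ordinary SFTP from the client's point of view. As a rough illustration, here is a Java sketch using the JSch library, with host, credentials, and paths invented for the example; remember that SFTP moves raw bytes, so EBCDIC text has to be converted separately:

    import com.jcraft.jsch.ChannelSftp;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class ZosSftpPull {
        public static void main(String[] args) throws Exception {
            JSch jsch = new JSch();
            Session session = jsch.getSession("user", "zos.example.com", 22); // placeholders
            session.setPassword("secret");
            session.setConfig("StrictHostKeyChecking", "no"); // demo only; verify host keys in real use
            session.connect();

            ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
            sftp.connect();

            // SFTP is a binary transfer: an EBCDIC text file arrives as EBCDIC bytes.
            // Convert afterwards (e.g. decode with an EBCDIC charset such as IBM-1047),
            // or have the mainframe side produce ASCII/UTF-8 before the transfer.
            sftp.get("/u/user/report.txt", "report.ebcdic"); // placeholder paths

            sftp.disconnect();
            session.disconnect();
        }
    }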

Decrypt packet in lua dissector

I am working on a custom dissector for Wireshark in Lua.
Certain PDUs in the protocol are encrypted using AES, and I would like to decrypt them so that I can show the clear content in Wireshark. Is this possible with a Lua dissector, and which APIs can I use to do the decryption?
Or do I need to write a C/C++ dissector to decrypt the data?
Personally I use lua-crypto, but it requires OpenSSL.
You can check the Lua wiki.
Recently I created a wrapper, called bgcrypto, for an existing AES implementation.
It has no external dependencies, but I have not yet used it in real work.
At the moment, Wireshark (2.0) does not expose a crypto API to Lua dissectors, so you have to bring the decryption implementation into the Lua dissector yourself.
For a pure Lua solution you can use lua-lockbox (as mentioned on the Lua wiki). This is not recommended if you need performance, but might be useful for prototyping.
Faster AES decryption implementations typically use a native library, for example:
LuaCrypto - uses OpenSSL, though it does not seem maintained
lcrypt - uses libtomcrypt, but there seems to be no development either
Since none of these libraries satisfied my needs, I developed a new one based on Libgcrypt for these reasons:
Wireshark already links to Libgcrypt for things like SSL decryption.
The Libgcrypt library supports sufficiently many ciphers and hashes.
Libgcrypt is widely available and has an active development team.
The Luagcrypt API is simple enough and documented.
The result is luagcrypt, which works on the platforms supported by Wireshark (Linux, OS X, Windows). It is used in the KDNET dissector; this commit shows the transformation from lua-lockbox to luagcrypt.

Transmit the binary diff (delta) of a file over HTTP and merge the diff on the server

Basically, I need to compute the binary diff (delta) of a file between client and server, transmit the delta over HTTP, and then merge the delta into the file on the other side.
Is there any tool for this?
One more requirement is that it should work in all environments.
Thanks a lot
I think this is what you might be looking for.
Options:
[1] zsync sends data using a delta-transfer algorithm (the same one used in the rsync tool), but over HTTP. Read more about zsync and try it straight away; everything about zsync is here: http://zsync.moria.org.uk/
[2] Syncrify, a known tool for rsync over HTTP. I have not used it, though, and I doubt it is a free tool: http://web.synametrics.com/Syncrify.htm
[3] Also, check csync: http://www.csync.org/ . I haven't used this tool either, but I got its reference in this post: https://stackoverflow.com/a/8578192/1617067
[EDIT]
You can find librsync, the library implementing the remote delta-transfer algorithm used in rsync, here: http://librsync.sourceforge.net/
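If none of these fit and you end up rolling your own, the core idea behind all of them is block-based delta transfer. Here is a toy Java sketch of the fixed-block variant, with all names invented for illustration; real rsync/zsync additionally use rolling checksums so an insertion near the start doesn't invalidate every following block:

    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Base64;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class BlockDelta {
        static final int BLOCK = 4096;

        // Hash every fixed-size block of the file contents.
        static List<String> blockHashes(byte[] data) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            List<String> hashes = new ArrayList<>();
            for (int off = 0; off < data.length; off += BLOCK) {
                md5.reset();
                md5.update(data, off, Math.min(BLOCK, data.length - off));
                hashes.add(Base64.getEncoder().encodeToString(md5.digest()));
            }
            return hashes;
        }

        // Client side: the delta is every block of `newer` whose hash differs
        // from `older`'s block at the same index, plus any trailing growth.
        static Map<Integer, byte[]> delta(byte[] older, byte[] newer) throws Exception {
            List<String> oldH = blockHashes(older), newH = blockHashes(newer);
            Map<Integer, byte[]> changed = new LinkedHashMap<>();
            for (int i = 0; i < newH.size(); i++) {
                if (i >= oldH.size() || !newH.get(i).equals(oldH.get(i))) {
                    int off = i * BLOCK;
                    changed.put(i, Arrays.copyOfRange(newer, off, Math.min(off + BLOCK, newer.length)));
                }
            }
            return changed;
        }

        // Server side: splice the changed blocks into its own copy.
        static byte[] patch(byte[] older, Map<Integer, byte[]> delta, int newLength) {
            byte[] result = Arrays.copyOf(older, newLength);
            for (Map.Entry<Integer, byte[]> e : delta.entrySet())
                System.arraycopy(e.getValue(), 0, result, e.getKey() * BLOCK, e.getValue().length);
            return result;
        }
    }

The map of changed blocks plus the new file length is what you would serialize and POST over HTTP; the server then applies patch() to its copy. Running on a JVM also covers the "all environments" requirement reasonably well.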

Using encryption with Hadoop

The Cloudera documentation says that Hadoop does not support on-disk encryption. Would it be possible to use hardware-encrypted hard drives with Hadoop?
eCryptfs can be used to do per-file encryption on each individual Hadoop node. It's rather tedious to set up, but it certainly can be done.
Gazzang offers a turnkey commercial solution built on top of eCryptfs to secure "big data" through encryption, and partners with several of the Hadoop and NoSQL vendors.
Gazzang's cloud-based Encryption Platform for Big Data helps organizations transparently encrypt data stored in the cloud or on premises, using advanced key management and process-based access control lists, and helping meet security and compliance requirements.
Full disclosure: I am one of the authors and current maintainers of eCryptfs. I am also Gazzang's Chief Architect and a lead developer.
If you have mounted a file system on the drive then Hadoop can use the drive. HDFS stores its data in the normal OS file system. Hadoop will not know whether the drive is encrypted or not and it will not care.
Hadoop doesn't directly support encryption, though a compression codec can be used for encryption/decryption. Here are more details about encryption and HDFS.
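To sketch what that codec route looks like in practice (the codec class name and paths are placeholders; the factory picks the codec by file extension):

    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadEncrypted {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Register a custom codec alongside the built-in ones.
            conf.set("io.compression.codecs",
                     "org.apache.hadoop.io.compress.GzipCodec,com.example.AesCodec");

            Path path = new Path("/data/part-00000.aes"); // placeholder path
            FileSystem fs = path.getFileSystem(conf);

            // The factory matches the ".aes" extension to the registered codec,
            // so readers get transparent decryption, just like decompression.
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec = factory.getCodec(path);

            try (InputStream in = codec.createInputStream(fs.open(path))) {
                // ... consume plaintext bytes as if the file were unencrypted ...
            }
        }
    }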
Regarding hardware-based encryption, I think Hadoop should be able to work with it. As Spike mentioned, HDFS is like any other Java application and stores its data in the normal OS file system. FYI, MapR uses direct I/O for better HDFS performance.
See also Intel's Rhino. Not open source yet...
https://github.com/intel-hadoop/project-rhino/
https://hadoop.intel.com/pdfs/IntelEncryptionforHadoopSolutionBrief.pdf
