How to encrypt files with AES256-GCM in golang? - encryption

AES256-GCM could be implemented in go as https://gist.github.com/cannium/c167a19030f2a3c6adbb5a5174bea3ff
However, Seal method of interface cipher.AEAD has signature:
Seal(dst, nonce, plaintext, additionalData []byte) []byte
So for very large files, one must read all file contents into memory, which is unacceptable.
A possible way is to implement Reader/Writer interfaces on Seal and Open, but shouldn't that be solved by those block cipher "modes" of AEAD? So I wonder if this is a design mistake of golang cipher lib, or I missed something important with GCM?

AEADs should not be used to encrypt large amounts of data in one go. The API is designed to discourage this.
Encrypting large amounts of data in a single operation means that a) either all the data has to be held in memory or b) the API has to operate in a streaming fashion, by returning unauthenticated plaintext.
Returning unauthenticated data is dangerous: it's not
hard to find people on the internet suggesting things like gpg -d your_archive.tgz.gpg | tar xz because the gpg command also provides a streaming interface.
With constructions like AES-GCM it's, of course, very easy to
manipulate the plaintext at will if the application doesn't
authenticate it before processing. Even if the application is careful
not to "release" plaintext to the UI until the authenticity has been
established, a streaming design exposes more program attack surface.
By normalising large ciphertexts and thus streaming APIs, the next
protocol that comes along is more likely to use them without realising
the issues and thus the problem persists.
Preferably, plaintext inputs would be chunked into reasonably large
parts (say 16KiB) and encrypted separately. The chunks only need to be
large enough that the overhead from the additional authenticators is
negligible. With such a design, large messages can be incrementally
processed without having to deal with unauthenticated plaintext, and
AEAD APIs can be safer. (Not to mention that larger messages can be
processed since AES-GCM, for one, has a 64GiB limit for a single
plaintext.)
Some thought is needed to ensure that the chunks are in the correct
order, i.e. by counting nonces, that the first chunk should be first, i.e. by starting the nonce at zero, and that the last chunk should be
last, i.e. by appending an empty, terminator chunk with special
additional data. But that's not hard.
For an example, see the chunking used in miniLock.
Even with such a design it's still the case that an attacker can cause
the message to be detectably truncated. If you want to aim higher, an
all-or-nothing transform can be used, although that requires two
passes over the input and isn't always viable.

It's not a design mistake. It's just that the API is incomplete in that regard.
GCM is a streaming mode of operation and therefore able to handle encryption and decryption on demand without stopping the stream. It seems that you cannot reuse the same AEAD instance with the previous MAC state, so you cannot directly use this API for GCM encryption.
You could implement your own GCM on top of cipher.NewCTR (from crypto/cipher) and your own implementation of GHASH.

Related

SonarQube: Make sure that encrypting data is safe here. AES/GCM/NoPadding, RSA/ECB/PKCS1Padding

I'm using:
1. RSA/ECB/PKCS1Padding
2. AES/GCM/NoPadding
To encrypt my data in my Android (Java) application. At the documentation of SonarQube it states that:
The Advanced Encryption Standard (AES) encryption algorithm can be used with various modes. Galois/Counter Mode (GCM) with no padding should be preferred to the following combinations which are not secured:
Electronic Codebook (ECB) mode: Under a given key, any given
plaintext block always gets encrypted to the same ciphertext block.
Thus, it does not hide data patterns well. In some senses, it doesn't
provide serious message confidentiality, and it is not recommended
for use in cryptographic protocols at all.
Cipher Block Chaining (CBC) with PKCS#5 padding (or PKCS#7) is
susceptible to padding oracle attacks.
So, as it is recommended, I use AES/GCM/NoPadding as :
Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
But, it still gives me the warning Make sure that encrypting data is safe here.
The same for:
Cipher c = Cipher.getInstance("RSA/ECB/PKCS1Padding");
Why does SonarQube throw that warning?
Aren't these uses safe any more?
AES in GCM mode is secure as a block cipher algorithm. But that doesn't guarantee that the code that encrypts data using AES (in GCM mode) is secure. Several things can go wrong, leaving the code vulnerable to attacks. It is the developers' responsibility to code it in the right way to get the desired level of security. Some examples where things can go wrong are:
The IV repeats for a given key
The key or the raw data are stored in String data type which keeps lingering in the heap
The secret key is stored in clear text in a property file that goes in the code repository
and so on.
Now, SonarQube cannot identify all these vulnerabilities and hence they've come up with a new concept called Hotspot which is described here as:
Unlike Vulnerabilities, Security Hotspots aren't necessarily issues that are open to attack. Instead, Security Hotspots highlight security-sensitive pieces of code that need to be manually reviewed. Upon review, you'll either find a Vulnerability that needs to be fixed or that there is no threat.
Hotspots have a separate life cycle which is explained in the link given above.
P.S. This answer explains how to encrypt a string in Java with AES in GCM mode in a secured way: https://stackoverflow.com/a/53015144/1235935
Seems like it's a general warning about encrypting any data. There shouldn't be an issue with "AES/GCM/NoPadding", as shown in their test code.

Is HMAC still needed if encrypted data is always saved and retrieved locally

My understanding of HMAC is that it can help to verify the integrity of encrypted data before the data is processed i.e. it can be used to determine whether or not the data being sent to a decryption routine has been modified in any way.
That being the case, is there any advantage in incorporating it into an encryption scheme if the data is never transmitted outside of the application generating it? My use case is quite simple - a user submits data (in plaintext) to the scripts I've written to store customer details. My scripts then encrypt this data and save it to the database, and my scripts then provide a way for the user to retrieve the data and decrypt it based on the record ID they supply. There is no way for my users to send encrypted data directly to the decryption routine and I don't need to provide an external API.
Therefore, is it reasonable to assume that there is a chain of trust in the application by default because the same application is responsible for writing and retrieving the data? If I add HMAC to this scheme, is it redundant in this context or is it best practice to always implement HMAC regardless of the context? I'm intending to use the Defuse library but I'd like to understand what the benefit of HMAC is to my project.
Thanks in advance for any advice or input :)
First, you should understand that there are attacks that allow an attacker to modify encrypted data without decrypting it. See Is there an attack that can modify ciphertext while still allowing it to be decrypted? on Security.SE and Malleability attacks against encryption without authentication on Crypto.SE. If an attacker gets write access to the encrypted data -- even without any decryption keys -- they could cause significant havoc.
You say that the encrypted data is "never transmitted outside of the application generating it" but in the next two sentences you say that you "save it to the database" which appears (to me) to be something of a contradiction. Trusting the processing of encrypted data in memory is one thing, but trusting its serialization to disk is another, especially if it is done by another program (such as a database system) and/or on a separate physical machine (now or in the future, as the system evolves).
The significant question here is: would it ever be possible for an attacker to modify or replace the encrypted data with alternate encrypted data, without access to the application and keys? If the attacker is an insider and runs the program as a normal user, then it's not generally possible to defend your data: anything the program allows the attacker to do is on the table. However, HMAC is relevant when write access to the data is possible for a non-user (or for a user in excess of their normal permissions). If the database is compromised, an attacker could possibly modify data with impunity, even without access to the application itself. Using HMAC verification severely limits the attacker's ability to modify the data usefully, even if they get write access.
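As a rough illustration of that check, here is a minimal Go sketch of verifying a stored record before it is decrypted. It is not how the Defuse library works internally; the function names, the MAC key and the idea of including the record ID in the MAC input are assumptions made for this example.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// recordTag computes HMAC-SHA256 over the record ID and the stored ciphertext,
// so a ciphertext copied from another row does not verify either.
func recordTag(macKey []byte, recordID uint64, ciphertext []byte) []byte {
	mac := hmac.New(sha256.New, macKey)
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], recordID)
	mac.Write(id[:])
	mac.Write(ciphertext)
	return mac.Sum(nil)
}

// verifyRecord compares tags in constant time before any decryption happens.
func verifyRecord(macKey []byte, recordID uint64, ciphertext, storedTag []byte) bool {
	return hmac.Equal(recordTag(macKey, recordID, ciphertext), storedTag)
}

func main() {
	macKey := []byte("hypothetical MAC key kept outside the database")
	ciphertext := []byte{0x8a, 0x11, 0x42, 0x07} // stands in for an encrypted column
	tag := recordTag(macKey, 42, ciphertext)

	fmt.Println(verifyRecord(macKey, 42, ciphertext, tag)) // true: untouched record
	fmt.Println(verifyRecord(macKey, 43, ciphertext, tag)) // false: moved to another row
	ciphertext[0] ^= 0x01                                  // attacker with DB write access flips a bit
	fmt.Println(verifyRecord(macKey, 42, ciphertext, tag)) // false: modified ciphertext
}
```

The MAC key has to live somewhere the database attacker cannot write to, otherwise they can simply recompute the tag.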
My OCD usually dictates that implementing HMAC is always good practice, if for no other reason, to remove the warning from logs.
In your case I do not believe there is a defined upside to implementing HMAC other than ensuring the integrity of the plain text submission. Your script may encrypt the data but it would not be useful in the unlikely event that bad data is passed to it.

Encrypting SQLite

I am going to write my own encryption, but would like to discuss some internals. Should be employed on several mobile platforms - iOS, Android, WP7 with desktop serving more or less as a test platform.
Let's start first with brief characteristics of existing solutions:
SQLite standard (commercial) SEE extension - I have no idea how it works internally and how it co-operates with mentioned mobile platforms.
System.data.sqlite (Windows only): RC4 encryption of the complete DB, ECB mode. They encrypt also DB header, which occasionally (0.01% chance) leads to DB corruption.*) Additional advantage: They use SQLite amalgamation distribution.
SqlCipher (openssl, i.e. several platforms): Selectable encryption scheme. They encrypt whole DB. CBC mode (I think), random IV vector. Because of this, they must modify page parameters (size + reserved space to store IV). They realized the problems related to unencrypted reading of the DB header and tried to introduce workarounds, yet the solution is unsatisfactory. Additional disadvantage: They use SQLite3 source tree. (Which - on the other hand - enables additional features, i.e. fine tuning of the encryption parameters using special pragmas.)
Based on my own analysis I think the following could be a good solution that would not suffer above mentioned problems:
Encrypting whole DB except the DB header.
ECB mode: Sounds risky, but after briefly looking at the DB format I cannot imagine how this could be exploited for an attack.
AES128?
Implementation on top of the SQLite amalgamation (similarly as system.data.sqlite)
I'd like to discuss possible problems of this encryption scheme.
*) Due to SQLite reading DB header without decryption. Due to RC4 (a stream cipher) this problem will manifest at the very first use only. AES would be a lot more dangerous as every "live" DB would sooner or later face this problem.
EDITED - case of VFS-based encryption
Above mentioned methods use codec-based methodology endorsed by sqlite.org. It is a set of 3 callbacks, the most important being this one:
void *(*xCodec)(void *iCtx, void *data, Pgno pgno, int mode)
This callback is used at SQLite's discretion for encrypting/decrypting data read from/written to the disk. The data is exchanged page by page. (A page is a multiple of 512 bytes.)
Alternative option is to use VFS. VFS is a set of callbacks used for low-level OS-services. Among them there are several file-related services, e.g. xOpen/xSeek/xRead/xWrite/xClose. In particular, here are the methods used for data exchange
int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
Data size in these calls ranges from 4 bytes (frequent case) to the DB page size. If you want to use a block cipher (what else to use?), then you need to organize an underlying block cache. I cannot imagine an implementation that would be as safe and as efficient as SQLite built-in transactions.
Second problem: VFS implementation is platform-dependent. Android/iOS/WP7/desktop all use different sources, i.e. VFS-based encryption would have to be implemented platform-by-platform.
The next problem is more subtle: the platform may use VFS calls to realize file locks. These uses must not be encrypted. Moreover, shared locks must not be buffered. In other words, encryption at the VFS level might compromise locking functionality.
EDITED - plaintext attack on VFS-based encryption
I realized this later: DB header starts with fixed string "SQLite format 3" and the header contains a lot of other fixed byte values. This opens the door for known plaintext attacks (KPA).
This is mainly the problem of VFS-based encryption as it does not have the info that the DB header is being encrypted.
System.data.sqlite has also this problem as it encrypts (RC4) also the DB header.
SqlCipher overwrites hdr string with salt used to convert password to the key. Moreover, it uses by default AES, hence KPA attack presents no danger.
You don't need to hack db format or sqlite source code. SQLite exposes virtual file-system (vfs) API, which can be used to wrap file system (or another vfs) with encryption layer which encrypts/decrypts pages on the fly. When I did that it turned out to be very simple task, just hundred lines of code or so. This way whole DB will be encrypted, including journal file, and it is completely transparent to any client code. With typical page size of 1024, almost any known block cipher can be used. From what I can conclude from their docs, this is exactly what SQLCipher does.
Regarding the 'problems' you see:
You don't need to reimplement file system support, you can wrap around the default VFS. So no problems with locks or platform-dependence.
SQLite's default OS backend is also a VFS; there is no overhead for using a VFS except what you add.
You don't need block cache. Of course you will have to read whole block when it asks for just 4 bytes, but don't cache it, it will never be read again. SQLite has its own cache to prevent that (Pager module).
Didn't get much response, so here is my decision:
Own encryption (AES128), CBC mode
Codec interface (same as used by SqlCipher or system.data.sqlite)
DB header unencrypted
Page headers unencrypted as well and used for IV generation
Using amalgamation SQLite distribution
AFAIK this solution should be better than either SqlCipher or system.data.sqlite.

How to safely de-duplicate files encrypted at the client's side?

Bitcasa claims to provide infinite storage for a fixed fee.
According to a TechCrunch interview, Bitcasa uses client-side convergent encryption. Thus no unencrypted data ever reaches the server. Using convergent encryption, the encryption key gets derived from the source data to be encrypted.
Basically, Bitcasa uses a hash function to identify identical files uploaded by different users to store them only once on their servers.
I wonder, how the provider is able to ensure, that no two different files get mapped to the same encrypted file or the same encrypted data stream, since hash functions aren't bijective.
Technical question: What do I have to implement, so that such a collision may never happen.
Most deduplication schemes make the assumption that hash collisions are so unlikely to happen that they can be ignored. This allows clients to skip reuploading already-present data. It does break down when you have two files with the same hash, but that's unlikely to happen by chance (and you did pick a secure hash function to prevent people from doing it intentionally, right?)
If you insist on being absolutely sure, all clients must reupload their data (even if it's already on the server), and once this data is reuploaded, you must check that it's identical to the currently-present data. If it's not, you need to pick a new ID rather than using the hash (and sound the alarm that a collision has been found in SHA1!)
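The "absolutely sure" variant can be sketched in Go as below. The in-memory store type, its put method and the use of SHA-256 as the ID are assumptions for illustration only; nothing here reflects how Bitcasa actually works. With convergent encryption, the uploaded blob would be the ciphertext.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var errCollision = errors.New("hash collision: same ID, different content")

// store is a hypothetical in-memory blob store keyed by content hash.
type store map[string][]byte

// put saves blob under its SHA-256 ID. If the ID already exists, the uploaded
// bytes are compared with the stored ones instead of trusting the hash alone.
func (s store) put(blob []byte) (id string, deduped bool, err error) {
	sum := sha256.Sum256(blob)
	id = hex.EncodeToString(sum[:])
	if existing, ok := s[id]; ok {
		if !bytes.Equal(existing, blob) {
			return id, false, errCollision // the caller must pick a new ID
		}
		return id, true, nil
	}
	s[id] = append([]byte(nil), blob...)
	return id, false, nil
}

func main() {
	s := store{}
	id1, dup1, _ := s.put([]byte("encrypted chunk A"))
	id2, dup2, _ := s.put([]byte("encrypted chunk A"))
	fmt.Println(id1 == id2, dup1, dup2) // prints: true false true
}
```

The price of this certainty is the one named above: every client has to upload the full data so the server can compare it byte for byte.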

Is it insecure to pass initialization vector and salt along with ciphertext?

I'm new to implementing encryption and am still learning basics, it seems.
I have need for symmetric encryption capabilities in my open source codebase. There are three components to this system:
A server that stores some user data, and information about whether or not it is encrypted, and how
A C# client that lets a user encrypt their data with a simple password when sending to the server, and decrypt with the same password when receiving
A JavaScript client that does the same and therefore must be compatible with the C# client's encryption method
Looking at various JavaScript libraries, I came across SJCL, which has a lovely demo page here: http://bitwiseshiftleft.github.com/sjcl/demo/
From this, it seems that what a client needs to know (besides the password used) in order to decrypt the ciphertext is:
The initialization vector
Any salt used on the password
The key size
Authentication strength (I'm not totally sure what this is)
Is it relatively safe to keep all of this data with the ciphertext? Keep in mind that this is an open source codebase, and there is no way I can reasonably hide these variables unless I ask the user to remember them (yeah, right).
Any advice appreciated.
Initialization vectors and salts are called such, and not keys, precisely because they need not be kept secret. It is safe, and customary, to encode such data along with the encrypted/hashed element.
What an IV or salt needs is to be used only once with a given key or password. For some algorithms (e.g. CBC encryption) there may be some additional requirements, fulfilled by choosing the IV randomly, with uniform probability and a cryptographically strong random number generator. However, confidentiality is not a needed property for an IV or salt.
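As a small illustration of storing the IV alongside the ciphertext, here is a Go sketch that generates a fresh random nonce for every message and prepends it in the clear. It uses AES-GCM from the Go standard library rather than the CCM or OCB2 modes SJCL uses, and the function names seal, open and newGCM are made up for this example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24- or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal returns nonce || ciphertext. The nonce is random and not secret.
func seal(key, plaintext []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits the nonce off the front of msg, then authenticates and decrypts.
func open(key, msg []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(msg) < aead.NonceSize() {
		return nil, errors.New("message too short")
	}
	nonce, ciphertext := msg[:aead.NonceSize()], msg[aead.NonceSize():]
	return aead.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	msg, err := seal(key, []byte("user data"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, msg)
	fmt.Println(string(plain), err)
}
```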
Symmetric encryption is rarely enough to provide security; by itself, encryption protects against passive attacks, where the attacker observes but does not interfere. To protect against active attacks, you also need some kind of authentication. SJCL uses CCM or OCB2 encryption modes which combine encryption and authentication, so that's fine. The "authentication strength" is the length (in bits) of a field dedicated to authentication within the encrypted text; a strength of "64 bits" means that an attacker trying to alter a message has a maximum probability of 2^-64 to succeed in doing so without being detected by the authentication mechanism (and he cannot know whether he has succeeded without trying, i.e. having the altered message sent to someone who knows the key/password). That's enough for most purposes. A larger authentication strength implies a larger ciphertext, by (roughly) the same amount.
I have not looked at the implementation, but from the documentation it seems that the SJCL authors know their trade, and did things properly. I recommend using it.
Remember the usual caveats of passwords and Javascript:
Javascript is code which runs on the client side but is downloaded from the server. This requires that the download be integrity-protected in some way; otherwise, an attacker could inject some of his own code, for instance a simple patch which also logs a copy of the password entered by the user somewhere. In practice, this means that the SJCL code should be served across a SSL/TLS session (i.e. HTTPS).
Users are human beings and human beings are bad at choosing passwords. It is a limitation of the human brain. Moreover, computers keep getting more and more powerful while human brains keep getting more or less unchanged. This makes passwords increasingly weak towards dictionary attacks, i.e. exhaustive searches on passwords (the attacker tries to guess the user's password by trying "probable" passwords). A ciphertext produced by SJCL can be used in an offline dictionary attack: the attacker can "try" passwords on his own computers, without having to check them against your server, and he is limited only by his own computing abilities. SJCL includes some features to make offline dictionary attacks more difficult:
SJCL uses a salt, which prevents cost sharing (usually known as "precomputed tables", in particular "rainbow tables" which are a special kind of precomputed tables). At least the attacker will have to pay the full price of dictionary search for each attacked password.
SJCL uses the salt repeatedly, by hashing it with the password over and over in order to produce the key. This is what SJCL calls the "password strengthening factor". This makes the password-to-key transformation more expensive for the client, but also for the attacker, which is the point. Making the key transformation 1000 times longer means that the user will have to wait, maybe, half a second; but it also multiplies by 1000 the cost for the attacker.
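The two features above (a salt plus repeated hashing) can be sketched in Go as follows. This is a hand-written version of the single-block PBKDF2 construction with HMAC-SHA256, shown only to illustrate the idea; it is not SJCL's code, the name deriveKey and the iteration count are assumptions, and a maintained PBKDF2 implementation is preferable in real code.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// deriveKey stretches a password into a 32-byte key. It follows the
// single-block PBKDF2 construction: U1 = HMAC(password, salt || 1),
// Ui = HMAC(password, Ui-1), and the key is the XOR of all Ui.
func deriveKey(password, salt []byte, iterations int) []byte {
	mac := hmac.New(sha256.New, password)
	mac.Write(salt)
	mac.Write([]byte{0, 0, 0, 1}) // block index 1, big-endian
	u := mac.Sum(nil)
	key := append([]byte(nil), u...)
	for i := 1; i < iterations; i++ {
		mac.Reset()
		mac.Write(u)
		u = mac.Sum(nil)
		for j := range key {
			key[j] ^= u[j]
		}
	}
	return key
}

func main() {
	salt := make([]byte, 16) // random, stored in the clear next to the ciphertext
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}
	// 1000 iterations mirrors the factor mentioned above; every guess an
	// attacker makes costs the same 1000 HMAC computations.
	key := deriveKey([]byte("correct horse battery staple"), salt, 1000)
	fmt.Printf("%x\n", key)
}
```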

Resources