Encfs on top of ZFS - encryption

I want to switch to ZFS, but still want to encrypt my data. Native encryption for ZFS was added in ZFS pool version 30, but I'm using ZFS on FreeBSD with version 28. My question is: how would encfs (FUSE encryption) affect ZFS-specific features like data integrity and deduplication?

encfs has a limit on file/directory name length which is shorter than that of ZFS. I don't remember the exact value, but it is probably less than 255 characters per object. If you happen to have any files/directories with names exceeding that limit, you'll get an I/O error while copying to a mounted encfs resource and the offending file/directory will not be created; that's all.
I do not use deduplication (too little RAM, unfortunately), but since encfs uses ECB mode for the encryption of file names, similar file names remain recognizable as such on the encrypted side (file attributes are unchanged too), which is fortunate for tools like rsync. Unfortunately for deduplication, encfs derives per-file initialization vectors from an HMAC of the data, which renders copies of the same content completely different once encrypted. It is probably feasible to find a way to block this behaviour, but data integrity checking depends on it, so I wouldn't recommend that.
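To make the deduplication point concrete, here is a minimal sketch (Python with the pyca/cryptography package; an illustration of the principle, not encfs's actual algorithm) showing that the same plaintext encrypted under two different IVs shares no ciphertext blocks, so block-level deduplication finds nothing to merge:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
plaintext = b"identical file content.." * 64   # a multiple of the 16-byte AES block size

def encrypt(data, iv):
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(data) + enc.finalize()

c1 = encrypt(plaintext, os.urandom(16))
c2 = encrypt(plaintext, os.urandom(16))
print(c1 == c2)   # False: to a deduplicator these are two unrelated blobs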
If you need device-level encryption, take a look at cryptsetup. For this you would need to migrate your pool from /dev/disk/by-id/ata-* to /dev/disk/by-id/dm-name-* devices. That would not prohibit deduplication, only incur a slight performance penalty. You would, however, have to operate on decrypted data for backup, which may not be desirable.
Currently I am using both methods (that is, cryptsetup and encfs). It may seem a little redundant, but I find it necessary to avoid both decrypting data for backup and storing encryption parameters in plain text in the .encfs.xml file, which bothers my paranoid sense of security ;)

Opinion on borg backup for long-term archiving

Lately I started using borg backup for backing up data from different servers and am amazed by its features (deduplication, encryption, etc.).
Currently I am looking for a good method of archiving larger amounts of data (multiple GB of mixed data, partially containing duplicates) over a longer time frame. The data should be stored as efficiently as possible regarding disk space usage, and encryption is desired. However, it should be easily accessible, e.g. via mounts (read-only is sufficient). This question is not about backups, as the archived data will be stored on a machine that is itself regularly backed up.
One option would be tar/bzip2 as an "old school" way to create archives, encrypting them with gpg. However, no deduplication is possible, and AFAIK there is no way to mount such an encrypted archive without decrypting the whole thing first and using archivemount afterward. This is what I would like to avoid.
LUKS-encrypted container files are more flexible, but I am not sure how to easily implement deduplication there (probably using a suitable FS/hard links?). Also, root privileges are needed for mounting them, which I would likewise prefer to avoid.
I thought about borg being useful for this purpose (see the command sketch after this list), as it allows for creating repos/archives that are
encrypted
deduplicated
compressed
mountable with user privileges
easily transferable
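For concreteness, a sketch of that workflow driven from Python via subprocess (the repository path, archive name, and mount point are made up for illustration; borg's command-line interface is the stable way to script it):

import subprocess

repo = "/archive/borg-repo"   # hypothetical repository path

# one-time: create an encrypted, deduplicating repository
subprocess.run(["borg", "init", "--encryption=repokey", repo], check=True)

# archive a directory; chunks shared with earlier archives are stored only once
subprocess.run(["borg", "create", "--compression", "lz4",
                repo + "::data-2024-01", "/data"], check=True)

# read-only access with user privileges via FUSE
subprocess.run(["borg", "mount", repo + "::data-2024-01", "/mnt/archive"], check=True)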
Still, right now I am a bit hesitant:
in my understanding the original purpose of borg is running regular backups
borg creates local caches on client machines which are maintained but not essential for accessing repos/archives (unnecessary overhead for this use case)
I usually prefer the most "fundamental" ways of structuring data for long-term storage to ensure accessibility (backward compatibility in newer versions of the archiving tools)
how susceptible are borg repos to data corruption?
Are there any opinions on this use case from more experienced borg users (or any suggestions regarding alternatives)?

Adding Encryption to Solr/lucene indexes

I am currently using Solr to perform search services over some sensitive records.
Since Solr/Lucene provides fast searching by storing inverted indexes of the sensitive information in plain text on disk, there is a requirement to encrypt these index files so that unauthorized people can't access them by bypassing the system's security.
I found similar patches open on the Apache JIRA: "AES encrypted directory" and "Codec for index-level encryption".
"AES encrypted directory" looks promising, but that patch was implemented for Lucene 3.1; as I am using a newer version, I am not sure whether it can be used with Lucene 5 or higher.
I was wondering if there is a way to implement a security measure that encrypts the indexes, or if it is possible to write a custom plugin which can encrypt/decrypt the indexes at the I/O level (i.e. FSDirectory)?
The discussion in the comment section of LUCENE-6966 you have shared is really interesting. Judging by this quote from Robert Muir, I would say there is nothing baked into Solr and probably never will be:
More importantly, with file-level encryption, data would reside in an unencrypted form in memory which is not acceptable to our security team and, therefore, a non-starter for us.
This speaks volumes. You should fire your security team! You are wasting your time worrying about this: if you are using lucene, your data will be in memory, in plaintext, in ways you cannot control, and there is nothing you can do about that!
Trying to guarantee anything better than "at rest" is serious business, sounds like your team is over their head.
So you should consider encrypting the storage Solr uses at the OS level. This should be transparent to Solr, but if someone gets into your system, they should not be able to copy the Solr data.
This is also the conclusion that Erick Erickson of Lucidworks draws at the end of the article "Encrypting Solr/Lucene indexes":
The short form is that this is one of those ideas that doesn't stand up to scrutiny. If you're concerned about security at this level, it's probably best to consider other options, from securing your communications channels to using an encrypting file system to physically divorcing your system from public networks. Of course, you should never, ever, let your working Solr installation be accessible directly from the outside world, just consider the following: http://server:port/solr/update?stream.body=<delete><query>*:*</query></delete>!

How to obfuscate key for encryption function?

If an encryption function requires a key, how do you obfuscate the key in your source so that decompilation will not reveal the key and thereby enable decryption?
The answer to a large extent depends on the platform and development tool, but in general there's no reliable solution. The encryption function is the point at which the key must be present in its "natural" form. So all the attacker needs to do is put a breakpoint there and dump the key; there's no need to even decompile anything. Consequently, any obfuscation is only good against newbies, or when debugging is not possible for whatever reason. Using a text string that already exists in the application as the key is one such variant.
But the best approach is not to have the key inside at all, of course. Depending on your usage scenario, you can sometimes use a piece of system information (e.g. the smartphone's IMEI) as the key. In other cases you can generate the key when the application is installed and store it as an integral part of your application data (e.g. use the column names of your DB as the key, or something similar).
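A sketch of that idea in Python (the identifier source here is a stand-in; use whatever your platform exposes, such as an IMEI or an install-time secret):

import hashlib
import uuid

def derive_key(install_secret):
    # a machine identifier as a stand-in for an IMEI etc.; not secret by itself
    device_id = uuid.getnode().to_bytes(6, "big")
    # stretch the combination so the key never appears literally in the binary
    return hashlib.pbkdf2_hmac("sha256", install_secret, device_id, 200000)

# install_secret would be generated once at install time and stored
# among ordinary application data
key = derive_key(b"per-install secret")

This only avoids shipping a literal key in the binary; the derived key still exists in memory at the point of use.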
Still, as said, all of this is defeated relatively easily when one can run a debugger.
There's one thing that counteracts debugging: offloading decryption to a third party. This can be done by employing an external cryptodevice (a USB cryptotoken or smartcard) or by calling a web service to decrypt certain parts of the information. Of course, these methods are also suitable only for a limited set of scenarios.
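A hypothetical sketch of the web-service variant (the endpoint and API are made up; the point is that the key never exists inside the client process):

import requests

def decrypt_remotely(ciphertext, auth_token):
    resp = requests.post(
        "https://keys.example.com/decrypt",   # hypothetical endpoint
        headers={"Authorization": "Bearer " + auth_token},
        data=ciphertext,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.content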
Encryption is built into the .NET configuration system. You can encrypt chunks of your app/web.config file, including where you store your private key.
http://www.dotnetprofessional.com/blog/post/2008/03/03/Encrypt-sections-of-WebConfig-or-AppConfig.aspx

Encrypting SQLite

I am going to write my own encryption, but would like to discuss some internals. It should be deployed on several mobile platforms - iOS, Android, WP7 - with desktop serving more or less as a test platform.
Let's start first with brief characteristics of existing solutions:
SQLite standard (commercial) SEE extension: I have no idea how it works internally or how it co-operates with the mentioned mobile platforms.
System.data.sqlite (Windows only): RC4 encryption of the complete DB, ECB mode. They also encrypt the DB header, which occasionally (0.01% chance) leads to DB corruption.*) Additional advantage: they use the SQLite amalgamation distribution.
SqlCipher (OpenSSL, i.e. several platforms): selectable encryption scheme. They encrypt the whole DB, CBC mode (I think), with a random IV. Because of this, they must modify the page parameters (size + reserved space to store the IV). They realized the problems related to unencrypted reading of the DB header and tried to introduce workarounds, yet the solution is unsatisfactory. Additional disadvantage: they use the SQLite3 source tree. (Which, on the other hand, enables additional features, i.e. fine-tuning of the encryption parameters using special pragmas.)
Based on my own analysis, I think the following could be a good solution that would not suffer from the above-mentioned problems:
Encrypting the whole DB except the DB header.
ECB mode: sounds risky, but after briefly looking at the DB format I cannot imagine how this could be exploited for an attack.
AES128?
Implementation on top of the SQLite amalgamation (similarly to system.data.sqlite)
I'd like to discuss possible problems of this encryption scheme.
*) This is due to SQLite reading the DB header without decryption. With RC4 (a stream cipher) this problem manifests at the very first use only. AES would be a lot more dangerous, as every "live" DB would sooner or later face this problem.
EDITED - case of VFS-based encryption
The above-mentioned methods use the codec-based methodology endorsed by sqlite.org. It is a set of 3 callbacks, the most important being this one:
void *(*xCodec)(void *iCtx, void *data, Pgno pgno, int mode)
This callback is used at SQLite's discretion for encrypting/decrypting data read from/written to the disk. The data is exchanged page by page. (A page is a multiple of 512 bytes.)
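As a toy model of what such a codec does (written in Python rather than C, and not SQLite's actual API): each page is transformed independently, keyed by its page number, so random page access keeps working.

import hashlib
import hmac
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = os.urandom(32)

def page_iv(pgno):
    # deterministic per-page IV derived from the page number; a real scheme
    # would also mix in something per-write to avoid IV reuse across rewrites
    return hmac.new(KEY, pgno.to_bytes(4, "big"), hashlib.sha256).digest()[:16]

def encrypt_page(data, pgno):
    enc = Cipher(algorithms.AES(KEY), modes.CBC(page_iv(pgno))).encryptor()
    return enc.update(data) + enc.finalize()

def decrypt_page(data, pgno):
    dec = Cipher(algorithms.AES(KEY), modes.CBC(page_iv(pgno))).decryptor()
    return dec.update(data) + dec.finalize()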
An alternative option is to use a VFS. A VFS is a set of callbacks used for low-level OS services. Among them there are several file-related services, e.g. xOpen/xSeek/xRead/xWrite/xClose. In particular, here are the methods used for data exchange:
int (*xRead)(sqlite3_file*, void*, int iAmt, sqlite3_int64 iOfst);
int (*xWrite)(sqlite3_file*, const void*, int iAmt, sqlite3_int64 iOfst);
The data size in these calls ranges from 4 bytes (a frequent case) to the DB page size. If you want to use a block cipher (what else would you use?), then you need to organize an underlying block cache. I cannot imagine an implementation that would be as safe and as efficient as SQLite's built-in transactions.
The second problem: the VFS implementation is platform-dependent. Android/iOS/WP7/desktop all use different sources, i.e. VFS-based encryption would have to be implemented platform by platform.
The next problem is more subtle: the platform may use VFS calls to realize file locks. These uses must not be encrypted. Moreover, shared locks must not be buffered. In other words, encryption at the VFS level might compromise the locking functionality.
EDITED - plaintext attack on VFS-based encryption
I realized this later: the DB header starts with the fixed string "SQLite format 3" and contains a lot of other fixed byte values. This opens the door to known-plaintext attacks (KPA).
This is mainly a problem for VFS-based encryption, as it has no way of knowing that the DB header is being encrypted.
System.data.sqlite also has this problem, as it encrypts (with RC4) the DB header too.
SqlCipher overwrites the header string with the salt used to derive the key from the password. Moreover, it uses AES by default, hence a KPA presents no danger.
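To illustrate why the fixed header matters: "SQLite format 3" plus the terminating NUL is exactly 16 bytes, i.e. one AES block, so an attacker holding the ciphertext can test candidate (e.g. password-derived) keys offline against that single block. A sketch in Python:

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KNOWN_HEADER = b"SQLite format 3\x00"   # exactly one 16-byte AES block

def key_matches(candidate_key, first_ciphertext_block):
    # under ECB, a correct guess reproduces the observed first block
    enc = Cipher(algorithms.AES(candidate_key), modes.ECB()).encryptor()
    return enc.update(KNOWN_HEADER) + enc.finalize() == first_ciphertext_block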
You don't need to hack the DB format or the SQLite source code. SQLite exposes a virtual file system (VFS) API, which can be used to wrap the file system (or another VFS) with an encryption layer that encrypts/decrypts pages on the fly. When I did that, it turned out to be a very simple task, just a hundred lines of code or so. This way the whole DB is encrypted, including the journal file, and it is completely transparent to any client code. With the typical page size of 1024 bytes, almost any known block cipher can be used. From what I can conclude from their docs, this is exactly what SQLCipher does.
Regarding the 'problems' you see:
You don't need to reimplement file system support; you can wrap around the default VFS. So there are no problems with locks or platform dependence.
SQLite's default OS backend is also a VFS, so there is no overhead for using a VFS except what you add yourself.
You don't need a block cache. Of course you will have to read a whole block when SQLite asks for just 4 bytes, but don't cache it; it will never be read again. SQLite has its own cache to prevent that (the Pager module).
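A compact sketch of the wrapping idea (a hypothetical helper in Python, not a real SQLite VFS; the 1024-byte block size and the CTR scheme are assumptions, and reusing the block-number nonce across rewrites would be unsafe in production):

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 1024   # assumed SQLite page size

class EncryptedFile:
    def __init__(self, path, key):
        self.f = open(path, "r+b")
        self.key = key

    def _cipher(self, blkno):
        nonce = blkno.to_bytes(16, "big")   # toy scheme: nonce = block number
        return Cipher(algorithms.AES(self.key), modes.CTR(nonce))

    def read(self, amount, offset):
        # an xRead-style call: decrypt the whole blocks covering the range,
        # then slice out the requested bytes
        first, last = offset // BLOCK, (offset + amount - 1) // BLOCK
        buf = b""
        for blkno in range(first, last + 1):
            self.f.seek(blkno * BLOCK)
            dec = self._cipher(blkno).decryptor()
            buf += dec.update(self.f.read(BLOCK)) + dec.finalize()
        skip = offset - first * BLOCK
        return buf[skip:skip + amount]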
I didn't get much response, so here is my decision:
Own encryption (AES128), CBC mode
Codec interface (same as used by SqlCipher or system.data.sqlite)
DB header unencrypted
Page headers unencrypted as well and used for IV generation
Using amalgamation SQLite distribution
AFAIK this solution should be better than either SqlCipher or system.data.sqlite.

How to durably rename a file in POSIX?

What's the correct way to durably rename a file in a POSIX file system? Specifically wondering about fsyncs on the directories. (If this depends on the OS/FS, I'm asking about Linux and ext3/ext4).
Note: there are other questions on StackOverflow about durable renames, but AFAICT they don't address fsync-ing the directories (which is what matters to me - I'm not even modifying file data).
I currently have (in Python):
import os

dstdirfd = os.open(dstdirpath, os.O_DIRECTORY | os.O_RDONLY)
os.rename(os.path.join(srcdirpath, filename), os.path.join(dstdirpath, filename))
os.fsync(dstdirfd)
Specific questions:
Does this also implicitly fsync the source directory? Or might I end up with the file showing up in both directories after a power cycle (meaning I'd have to check the hard-link count and manually perform recovery), i.e. is it impossible to guarantee a durably atomic move operation?
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
Thanks in advance.
Unfortunately Dave’s answer is wrong.
Not all POSIX systems even have durable storage, and if they do, it is still "allowed" to be hosed after a system crash. For those systems a no-op fsync() makes sense, and such an fsync() is explicitly allowed under POSIX. It is also legal for the file to be recoverable in the old directory, the new directory, both, or any other location; POSIX makes no guarantees for system crashes or file system recoveries.
The real question should be:
How to do a durable rename on systems which support that through the POSIX API?
You need to do an fsync() on both the source and destination directories, because the minimum those fsync()s are supposed to do is persist how the source and destination directories should look.
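In Python, that conservative both-directories variant might look like this (a sketch of the advice above; the O_DIRECTORY flag is Linux-specific):

import os

def durable_rename(srcdir, dstdir, name):
    os.rename(os.path.join(srcdir, name), os.path.join(dstdir, name))
    # persist both directory entries: the new link and the removed one
    for d in (dstdir, srcdir):
        fd = os.open(d, os.O_DIRECTORY | os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)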
Does a fsync(destdirfd) also implicitly fsync the source directory?
POSIX in general: no, nothing implies that
ext3/4: I'm not sure whether the changes to the source and destination directories end up in the same transaction in the journal. If they do, they both get committed together.
Or might I end up with the file showing up in both directories after a power cycle (“crash”), i.e. it's impossible to guarantee a durably atomic move operation?
POSIX in general: no guarantees, but you’re supposed to fsync() both directories, which might not be atomic-durable
ext3/4: how many fsync()s you minimally need depends on the mount options. E.g. if mounted with "dirsync" you don't need either of those two fsync()s. At most you need both, but I'm almost sure one is enough (and atomic-durable then).
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
POSIX: no
ext3/4: I really believe both end up in the same transaction, so it doesn’t matter which of them you fsync()
older ext3 kernels: (if they aren't in the same transaction) some not-so-optimal implementations did way too much syncing on fsync(); I bet they committed every transaction that came before. And yes, a normal implementation would first link the file to the destination and then remove it from the source, so the fsync(srcdirfd) would trigger the fsync() of the destination as well.
ext4/latest ext3: if they aren't in the same transaction, you might be able to sync them completely independently (so do both)
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
For a real crash, no. By the way, a real crash goes beyond the viewpoint of the kernel: the hardware might reorder writes (and fail to write everything), corrupting the filesystem. Ext4 is better prepared against this, because it enables write barriers (a mount option) by default (ext3 does not) and can detect corruption with journal checksums (also a mount option).
And for learning: find out if both changes are somehow linked in the journal! :-P
POSIX defines that the rename function must be atomic.
So if you rename(A, B), under no circumstances should you ever see a state with the file in both directories or neither directory. There will always be exactly one, no matter what you do with fsync() or whether the system crashes.
But that doesn't solve the problem of making sure the rename() operation is durable. POSIX answers this question:
If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion.
So if you fsync() a directory, pending rename operations must be transferred to disk by the time it returns. An fsync() of either directory should be sufficient, because the atomicity of the rename() operation requires that both directories' changes be synced atomically.
Finally, in contrast to the claim in the blog post mentioned in another answer, the rationale for this explains the following:
The fsync() function is intended to force a physical write of data from the buffer cache, and to assure that after a system crash or other failure that all data up to the time of the fsync() call is recorded on the disk. Since the concepts of "buffer cache", "system crash", "physical write", and "non-volatile storage" are not defined here, the wording has to be more abstract.
A system that claimed to be POSIX-compliant and that considered it correct behavior (i.e. not a bug or hardware failure) to complete an fsync() and then fail to persist those changes across a system crash would be deliberately misrepresenting itself with respect to the spec.
(updated with additional info re: Linux-specific vs. portable behavior)
The answer to your question is going to depend a lot on the specific OS being used, the type of filesystem being used, and whether the source and destination are on the same device or not.
I'd start by reading the rename(2) man page on the platform you're using.
It sounds to me like you're trying to do the filesystem's job. If you move a file, the kernel and filesystem are responsible for atomic operation and fault recovery, not your code.
Anyway, this article seems to address your questions regarding fsync:
http://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/
