Adding Encryption to Solr/lucene indexes

Adding Encryption to Solr/lucene indexes - encryption

I am currently using Solr to perform search services over some sensitive records.
As Solr/lucene provides fast searching by storing inverted indexes of the sensitive information in plain text on a disk there is a requirement to encrypt these index files so that unauthorized people can't have access to them by bypassing the system's security.
I found there are similar patches open on Apache JIRA AES encrypted directory and Codec for index-level encryption.
AES encrypted directory looks promising but this patch has been implemented for lucene 3.1 as I am using the newer version, I am not sure if this patch can be used with lucene version 5 or higher.
I was wondering if there is a way to implement a security measure that encrypts the indexes or if it is possible to write some custom plugin which can encrypt/decrypt the indexes on I/O level(i.e FsDirectory)?

The discussion in the comment section of LUCENE-6966 you have shared is really interesting. I would reason with this quote of Robert Muir that there is nothing baked into Solr and probably will never be.
More importantly, with file-level encryption, data would reside in an unencrypted form in memory which is not acceptable to our security team and, therefore, a non-starter for us.
This speaks volumes. You should fire your security team! You are wasting your time worrying about this: if you are using lucene, your data will be in memory, in plaintext, in ways you cannot control, and there is nothing you can do about that!
Trying to guarantee anything better than "at rest" is serious business, sounds like your team is over their head.
So you should consider to encrypt the storage Solr is using on OS level. This should be transparent for Solr. But if someone comes into your system, he should not be able to copy the Solr data.
This is also the conclusion the article Encrypting Solr/Lucene indexes from Erick Erickson of Lucidwors draws in the end
The short form is that this is one of those ideas that doesn't stand up to scrutiny. If you're concerned about security at this level, it's probably best to consider other options, from securing your communications channels to using an encrypting file system to physically divorcing your system from public networks. Of course, you should never, ever, let your working Solr installation be accessible directly from the outside world, just consider the following: http://server:port/solr/update?stream.body=<delete><query>*:*</query></delete>!

Related

Cipher/Encrypt and uncrypt passwords in .properties files using Talend Data Integration

One suggested way to run jobs is to save context parameters in properties files.
Like this one:
#
#Wed Dec 16 18:23:03 CET 2015
MySQL_AdditionalParams=noDatetimeStringSync\=true
MySQL_Port=3306
MySQL_Login=root
MySQL_Password=secret_password_to_cipher
MySQL_Database=talend MySQL_Server=localhost
This is really easy and useful, but the issue with this is that passwords are saved in clear.
So I'm looking for ways to do easily ciphering.
Here are 2 very insteresting questions already discussed in Stack overflow about password ciphering technics:
Encrypt passwords in configuration files
Securing passwords in properties file
But they are Java native and I'm searching for a better Talend integration. I've already tried different ways in my Talend jobs:
Simple obfuscation using base64 encoding of passwords
Using tEncrypt and tDecrypt components from the forge
Using Jasypt ot JavaXCrypto librairies
Using pwdstore routine from the forge
All these technics are described in a tutorial (in french, sorry) explaining how to crypt passwords in Talend
But another issue is encountered: keys used to cipher/uncipher are always in clear, so if you know good ways to address this point I'll be glad to experiment it.

Fundamentally, anything an application can reach can be reached by somebody breaking in into the system/taking over control of the application.
Even if you use obfuscation (such as base64 or more advanced), or real encryption where the keys are available (even if they too might be obfuscated).
So essentially there is no good enough way to do what you seek to do and worse: it simply cannot exist.
So what do you do instead ?
1. Limit the rights
MySQL_Login=root is a big problem ... a compromise of the application will lead to an immediate compromise of the database (and its data).
So, limit the rights to what is absolutely needed for the application.
This should really be done and is quite easy to achieve.
2. Separate user and admin level access
If certain things are only needed after user interaction, you can use secrets provided by the user (e.g. a password of the user can give a hash and that can be xor-ed with and get you a key that's not always present in the application nor configuration files).
You can use this e.g. to separate out permissions in two levels: the normal user level which only has the bare minimal rights to make the application work for the average user, (but e.g. not the application management rights that allow managing the application itself), and use the secrets kept by the user to keep (pert of) the key outside of the application while there's no admin logged in into the administrative part of the application.
This is rarely done to be honest, nor all that easy.
But even with all that you essentially have to consider the access to e.g. the database to be compromised if the application is compromised.
That's also why data such as application user password should not (must not) be stored in the database without proper precautions.

Scheduled process - providing key for encrypted config

I have developed a tool that loads in an configuration file at runtime. Some of the values are encrypted with an AES key.
The tool will be scheduled to run on a regular basis from a remote machine. What is an acceptable way to provide the decryption key to the program. It has a command line interface which I can pass it through. I can currently see three options
Provide the full key via CLI, meaning the key is available in the clear at OS config level (i.e. CronJob)
Hardcode the key into the binary via source code. Not a good idea for a number of reasons. (Decompiling and less portable)
Use a combination of 1 and 2 i.e. Have a base key in exe and then accept partial key via CLI. This way I can use the same build for multiple machines, but it doesn't solve the problem of decompiling the exe.
It is worth noting that I am not too worried about decompiling the exe to get key. If i'm sure there are ways I could address via obfuscation etc.
Ultimately if I was really conscious I wouldn't be storing the password anywhere.
I'd like to hear what is considered best practice. Thanks.
I have added the Go tag because the tool is written in Go, just in case there is a magical Go package that might help, other than that, this question is not specific to a technology really.
UPDATE:: I am trying to protect the key from external attackers. Not the regular physical user of the machine.

Best practice for this kind of system is one of two things:
A sysadmin authenticates during startup, providing a password at the console. This is often extremely inconvenient, but is pretty easy to implement.
A hardware device is used to hold the credential. The most common and effective are called HSMs (Hardware Security Modules). They come in all kinds of formats, from USB keys to plug-in boards to external rack-mounted devices. HSMs come with their own API that you would need to interface with. The main feature of an HSM is that it never divulges its key, and it has physical safeguards to protect against it being extracted. Your app sends it some data and it signs the data and returns it. That proves that that the hardware module was connected to this machine.
For specific OSes, you can make use of the local secure credential storage, which can provide some reasonable protection. Windows and OS X in particular have these, generally keyed to some credential the admin is required to type at startup. I'm not aware of a particularly effective one for Linux, and in general this is pretty inconvenient in a server setting (because of manual sysadmin intervention).
In every case that I've worked on, an HSM was the best solution in the end. For simple uses (like starting an application), you can get them for a few hundred bucks. For a little more "roll-your-own," I've seen them as cheap as $50. (I'm not reviewing these particularly. I've mostly worked with a bit more expensive ones, but the basic idea is the same.)

Protecting hard-coded data that cannot be available to the user, such as a pass phrase

My program needs to decrypt an encrypted file after it starts up to load data it requires to function. This data cannot be available to the user.
I'm not a cryptography expert, so what is the best way to protect hardcoded passphrases and other tidbits of data from users, debugging software and disassembling software?
I understand that this is probably bad practice but it's essential for me (at least for now).
If there are other ways to protect my data from the above 3, could you let me know what those are?

Short answer: you can't. Once the software is on the user's disk, a sufficiently smart and determined user will be able to extract the secret data from it.
For a longer answer, see "Storing secrets in software" on the security.SE blog.

what is the best way to protect hardcoded passphrases and other
tidbits of data from users, debugging software and disassembling
software?
Request the password from the user and don't hardcode the passphrase. This is the ONLY way to be safe.
If you can't do that and must be hardcoded in the app then all bets are off.
The simplest thing you can do (if you don't have the luxury to do something elaborate which will only delay the inevidable) is to delegate the responsibility to the user of the system.
I mean explicitely state that you software is as secure as the "machine" it runs.
If the attacker has access to start pocking around the file system then your app would be the user's least of concerns

In my experience this type of questions are often motivated by either of four reasons:
Your application is connecting to a restricted remote service, such as a database server.
You do not want your users to mess with configuration settings, which in turn do not really have to be kept confidential as long as they are unmodified.
Copy protection of your own software.
Copy protection of data.
Like Illmari Karonen wrote in his answer, you can't do exactly what you are asking for, and this means in particular that 3 & 4 cannot be solved by cryptography alone.
However, if your reason for asking is either 1 or 2, you have ended up asking the questions you do, because you have made some bad decisions earlier in your design process. For instance, in case of 1, you should not make a restricted service accessible from systems you do not trust completely. The typical safe solution is to introduce a middle tier that is the only client to your restricted resource, and which you can make public.
In case of 2, the best solution is often to use exactly the same logic for checking your configuration files (or registry settings or what ever) when they are loaded at start up, as you use for checking consistency when the user enters them using your preferred configuration user interface. If you spot an inconsistency, just bring up your configuration UI and highlight the problem.

Is the filesystem for Raven DB encrypted?

I'm just trying to determine if the files on the filesystem used by Raven DB are encrypted or not? Can someone just open the files on the filesystem and convert them from binary to ASCII directly, or are they encrypted?
I am trying to convince our management to give RavenDB a shot, but they have concerns about security. They gave the example that you can't just open up an MS SQL db file, convert it from binary to ASCII, and read it. So I am trying to verify if RavenDB prevented that kind of thing as well?

Well, personally I think that your management sucks if they come up with such straw-man arguments.
To answer your question: No, you can't just open any file inside ravens data folder with Notepad and expect to see something meaningful. So, for the ones that don't know how to program, yes they are encrypted.
To convice your management you can tell them that raven uses the same encryption algorithm as Microsofts Exchange Server does. If they want to dig deeper - it's called Esent.

RavenDb storage is not encrypted. You can open it with notepad and see some pieces of data. At the same time I do not think that MS SQL encrypts files by default either.

RavenDB added encryption in mid-2012. Get RavenDB's “bundle:encryption” and then make sure your key is properly encrypted in the .NET config file or whatever.
http://ravendb.net/docs/article-page/3.0/csharp/server/bundles/encryption
http://ayende.com/blog/157473/awesome-ravendb-feature-of-the-day-encryption

SQL Server 2008 does have encryption, but you need to prepare the DB instance beforehand to enable it, then create the DB with encryption enabled and then store data.
If you haven't, you could just copy the DB off the machine and open it in a tool that does have access to it.
With RavenDB, you can tick the box and off you go! (although I do not know the intricacies of moving backups to another machine and restoring them).
In relation to the point your management made, this is a relatively pointless argument.
If you had access directly to the file of a DB, it's game over. Encryption is your very last line of defence.
[I don't think hackers are going to be opening a 40GB file in Notepad .. thats just silly :-)]
So instead of ending up at the worst case, you have to look at the controls you can implement to even get to that level of concern.
You need to work out how would someone even get to that file (and the costs associated with all of the mitigation techniques):
What if they steal the server, or the disk inside it?
What if they can get to the DB via a file share?
What if they can log onto the DB server?
What if an legitimate employee syphons off the data?
Physical Access
Restricting direct access to a server mitigates stealing it. You have to think about all of the preventative controls (door locks, ID cards, iris scanners), detective controls (alarm systems, CCTV) and how much you want to spend on that.
Hence why cloud computing is so attractive!
Access Controls
You then have to get onto the machine via RDP or connect remotely to its file system via Active Directory, so that only a select few could access it - probably IT support and database administrators. Being administrators, they should be vetted and trusted within the organisation (through an Information Security Governance Framework).
If you also wanted to reduce the risk even further, maybe implement 2 Factor Authentication like banks do, so that even knowing the username and password doesn't get you to the server!
Then there's the risk of employees of your company accessing it - legitimately and illegitimately. I mean why go to all of the trouble of buying security guards, dogs and a giant fence when users can query it anyway! You would only allow certain operations on certain parts of the data.
In summary ... 'defence in depth' is how you respond to it. There is always a risk that can be identified, but you need to consider the number of controls in place, add more if the risk is too high. But adding more controls to your organisation in general makes the system less user friendly.

Encrypted, Compressible, Cross Platform, File system in a file

We wish to make a desktop application that searches a locally packaged text database that will be a few GB in size. We are thinking of using lucene.
So basically the user will search for a few words and the local lucene database will give back a result. However, we want to prevent the user from taking a full text dump of the lucene index as the text database is valuable and proprietary. A web application is not the solution here as the Customer would like for this desktop application to work in areas where the internet is not available.
How do we encrypt lucene's database so that only the client application can access lucene's index and a prying user can't take a full text dump of the index?
One way of doing this, we thought, was if the lucene index could be stored on an encrypted file system within a file (something like truecrypt). So the desktop application would "mount" the file containing the lucene indexes.
And this needs to be cross platform (Linux, Windows)...We would be using Qt or Java to write the desktop application.
Is there an easier/better way to do this?
[This is for a client. Yes, yes, conceptually this is bad thing :-) but this is how they want it. Basically the point is that only the Desktop application should be able to access the lucene index and no one else. Someone pointed that this is essentially DRM. Yeah, it resembles DRM]

How do we encrypt lucene's database so
that only the client application can
access lucene's index and a prying
user can't take a full text dump of
the index?
You don't. The user would have the key and the encrypted data, so they could access everything. You can bury the key in an obfuscated file, but that only adds a slight delay. It certainly will not keep out prying users. You need to rethink.

The problem here is that you're trying to both provide the user with data and deny it from em, at the same time. This is basically the DRM problem under a different name - the attacker (user) is in full control of the application's environment (hardware and OS). No security is possible in such situation, only obfuscation and illusion of security.
While you can make it harder for the user to get to the unencrypted data, you can never prevent it - because that would mean breaking your app. Probably the closest thing is to provide a sealed hardware box, but IMHO that would make it unusable.
Note that making a half-assed illusion of security might be sufficient from a legal standpoint (e.g. DMCA's anti-circumvention clauses) - but that's outside SO's scope.

Technically, there is little you can do. Lucene is written in Java and Java code can always be decompiled or run in a debugger to get the key which you need to store somewhere (probably in the license key which you sell the user).
Your only option is the law (or the contract with the user). The text data is copyrighted, so you can sue the user if they use it in any way that is outside the scope of the license agreement.
Or you can write your own text indexing system.
Or buy a commercial one which meets your needs.
[EDIT] If you want to use an encrypted index, just implement your own FSDirectory. Check the source for SimpleFSDirectory for an example.

Why not building an index that contains only the data that user can access and ship that index with the desktop app?

True-crypt sounds like a solid plan to me. You can mount volumes and encrypt them in all sorts of crazy overkill ways, and access them just as any other file.
No, it isn't entirely secure, but it should work well enough.

One-way hash function.
You don't store the plaintext, you store hashes. When you want to search for a term, you push the term through the function and then search for the hash. If there's a match in the database, return thumbs up.
Are you willing to entertain false positives in order to save space? Bloom filter.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex