I have a SolrCloud cluster running on different machines. The indexed data needs to be encrypted so that it is stored in encrypted form on the hard disk. When Solr needs the data for a query or for indexing, it should be able to decrypt it, perform the operation, and store the result in encrypted form again. I am OK with using Amazon S3 if it solves the problem. I have searched and researched a lot but found no relevant articles. If someone can give me a nudge in some direction or explain how I can accomplish this, it would be a really great help.
There is no built-in support for encrypting indexes, but there are a few open issues with possible patches.
The most promising one is probably LUCENE-6966. If you want to implement it yourself, looking at writing a custom codec is probably the way to go.
If you don't want to deal with manual patching or writing code yourself, Hitachi has a ready-to-deploy solution for Solr and Lucene named Credeon.
To prevent the risk of a data breach from the search indices, Credeon SFS delivers searchable encryption technology which allows the search process to be carried out directly on encrypted data. Specifically, Credeon:
Encrypts the search indices
Uses a unique randomization process for encrypting each plaintext index
Searches through encrypted indices in real time
Returns search results without decrypting the indices
My C++ application needs to support caching of files downloaded from the network. I started to write a native LRU implementation when someone suggested I look at using SQLite to store an ID, a file blob (typically audio files) and the add/modify datetimes for each entry.
I have a proof of concept working well for the simple case where one client is accessing the local SQLite database file.
However, I also need to support multiple access by different processes in my application as well as support multiple instances of the application - all reading/writing to/from the same database.
I have found a bunch of posts to investigate but I wanted to ask the experts here too - is this a reasonable use case for SQLite and if so, what features/settings should I dig deeper into in order to support my multiple access case.
Thank you.
M.
Most filesystems are in effect databases too, and most store two or more timestamps for each file, i.e. related to the last modification and last access time allowing implementation of an LRU cache. Using the filesystem directly will make just as efficient use of storage as any DB, and perhaps more so. The filesystem is also already geared toward efficient and relatively safe access by multiple processes (assuming you follow the rules and algorithms for safe concurrent access in a filesystem).
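The filesystem approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than C++, and the function names `evict_lru` and `touch_on_read` are hypothetical. It assumes one cache directory with one file per entry. Because some mounts do not update access times on every read (for example with `noatime` or `relatime`), the sketch bumps the access time explicitly when an entry is read.

```python
import os
import time
from pathlib import Path


def evict_lru(cache_dir: str, max_bytes: int) -> list[str]:
    """Delete least-recently-used files until the cache fits in max_bytes.

    Returns the names of the files removed, oldest access time first.
    """
    entries = []
    for path in Path(cache_dir).iterdir():
        try:
            if path.is_file():
                st = path.stat()
                entries.append((st.st_atime, st.st_size, path))
        except FileNotFoundError:
            continue  # another process removed it while we were scanning

    total = sum(size for _, size, _ in entries)
    removed = []
    # Walk entries from the oldest access time to the newest.
    for _, size, path in sorted(entries, key=lambda e: e[0]):
        if total <= max_bytes:
            break
        try:
            path.unlink()
        except FileNotFoundError:
            pass  # already evicted by another process
        else:
            removed.append(path.name)
        total -= size
    return removed


def touch_on_read(path: str) -> bytes:
    """Read a cached file and mark it as just used, keeping its mtime."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    os.utime(path, (time.time(), st.st_mtime))
    return data
```

Writers in a multi-process setup would typically write to a temporary name and rename into place so that readers never see a partial file. That part is left out of the sketch.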
The main advantage of SQLite may be a slightly simpler support for sorting the list of records, though at the cost of using a separate query API. Of course a DB also offers the future ability of storing additional descriptive attributes without having to encode those in the filename or in some additional file(s).
So I’m trying to understand the process of encrypting a program I wrote. How does it work? When you encrypt something, can that executable be run without a key? Is there a key that is used?
If you can explain this or add some links that would be great.
There are many different approaches to protecting code. They all fall under the category of DRM (Digital Rights Management).
These are what come to mind for me:
Encryption, actually modifying the byte codes in such a way that they can only be executed if a key or password is provided.
Obfuscation, rearranging code in a way that is still fully executable as is, but reversing by hand is tedious because the code is purposely arranged in a non-standard/confusing order.
Shield, protecting active code that has been loaded into runtime memory. This can be done either with another process that performs real-time memory checking with checksums, or via in-memory code encryption with the key stored somewhere in memory where only the application knows to find it.
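To make the checksum idea from the last item a bit more concrete, here is a toy sketch in Python. Real shields work on native code pages and usually run in a separate process; this only shows the principle of recording a digest of the loaded code and re-checking it later. The function `check_license` and the names `code_checksum`, `EXPECTED` and `is_untampered` are made up for the illustration.

```python
import hashlib


def code_checksum(func) -> str:
    """Return a SHA-256 digest over a function's bytecode and constants."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_license(key: str) -> bool:
    # Hypothetical function that an attacker might want to patch.
    return key == "secret"


# Digest recorded once, at a point that is assumed to be trusted.
EXPECTED = code_checksum(check_license)


def is_untampered() -> bool:
    """Re-hash the live code object and compare with the recorded digest."""
    return code_checksum(check_license) == EXPECTED
```

If something replaces the body of `check_license` at runtime, `is_untampered()` starts returning `False`. The obvious weakness, as with any such scheme, is that the check itself can also be patched.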
There are so many options for DRM, that I'd have trouble picking any implementations that stand out to list here. A simple google search should help point you in the direction of actual implementations.
I'm using an Instant Messaging software, and I suspect that the software is retaining a lot of information about my machine (such as my MAC address) and possibly leaks it. I decided I want to check the local DBs of the software and see what it saves locally.
I have been able to locate, using the software's own log dump and Procmon, the interesting DBs. However, they are SQLite DBs that are key-protected.
Do I have any way to know what will be the format and size of the key? Will it be hex?
How can I efficiently continue my research? Using Procmon, I was able to detect the first time the software opens a key-protected DB. However, I couldn't detect any 'interesting' local file that the software uses and that could hint at the key's location, apart from several Windows Registry values that are being read, and I'm not sure how to approach those.
Sorry if I have mistakes in English, and thanks in advance.
Do I have any way to know what will be the format and size of the key? Will it be hex?
The key is just in plaintext (just like normal passwords) and the size is (also like passwords) defined by the creator of the database.
How can I efficiently continue my research?
I would recommend reverse engineering the application and looking for the part where the connection to the database is initiated. For that, you can use dynamic analysis (with a debugger) or static analysis (analysing the binary with a disassembler).
I am currently using Solr to perform search services over some sensitive records.
Solr/Lucene provides fast searching by storing inverted indexes of the sensitive information in plain text on disk, so there is a requirement to encrypt these index files so that unauthorized people can't get access to them by bypassing the system's security.
I found there are similar patches open on Apache JIRA: AES encrypted directory and Codec for index-level encryption.
AES encrypted directory looks promising, but this patch was implemented for Lucene 3.1. As I am using a newer version, I am not sure if this patch can be used with Lucene version 5 or higher.
I was wondering if there is a way to implement a security measure that encrypts the indexes or if it is possible to write some custom plugin which can encrypt/decrypt the indexes on I/O level(i.e FsDirectory)?
The discussion in the comment section of LUCENE-6966 you have shared is really interesting. Based on this quote from Robert Muir, I would conclude that there is nothing baked into Solr and there probably never will be.
More importantly, with file-level encryption, data would reside in an unencrypted form in memory which is not acceptable to our security team and, therefore, a non-starter for us.
This speaks volumes. You should fire your security team! You are wasting your time worrying about this: if you are using lucene, your data will be in memory, in plaintext, in ways you cannot control, and there is nothing you can do about that!
Trying to guarantee anything better than "at rest" is serious business, sounds like your team is over their head.
So you should consider encrypting the storage Solr is using at the OS level. This should be transparent to Solr, and if someone gets into your system, they should not be able to copy the Solr data.
This is also the conclusion that the article Encrypting Solr/Lucene indexes by Erick Erickson of Lucidworks draws in the end:
The short form is that this is one of those ideas that doesn't stand up to scrutiny. If you're concerned about security at this level, it's probably best to consider other options, from securing your communications channels to using an encrypting file system to physically divorcing your system from public networks. Of course, you should never, ever, let your working Solr installation be accessible directly from the outside world, just consider the following: http://server:port/solr/update?stream.body=<delete><query>*:*</query></delete>!
I am using an AES encryption/decryption class that needs a key value and a vector value to encrypt and decrypt data in an MVC3 application.
On saving the record I am encrypting the data and then storing it in a database. When I retrieve the record I am decrypting it in the controller and passing the unencrypted value to the view.
The concern is not protecting data as it traverses the network but to protect the database should it be compromised.
I have read many posts that say don't put the keys for encryption in your code.
Ok so where should they be kept? File system? Another Database?
Looking for some direction.
Common sense says, if an intruder gets access to your database, they will most likely also have access to your file system. It really comes down to you. For one, you can try to hide it. In configuration files, in plain files somewhere in file system, encrypt it with another key that is within the application ... and so on and so forth.
Configuration files are a logical answer, but why take a chance - mix it. Feel free to mix keys with multi-level encryptions - one requiring something from the record itself and being unique to every record, other one requiring a configuration value, third one requiring an application-specific value, and perhaps a fourth one from a library hidden well within your application's references? This way, even if one layer somehow gets compromised, you will have several others protecting it.
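One way to picture the "something from the record, something from configuration, something from the application" idea is to fold all three into a single per-record key. Note that this is a different technique from encrypting several times: it derives one key from several inputs with HMAC-SHA256 from the standard library. The function name `derive_record_key` and its parameters are hypothetical, and this is only a sketch, not a vetted design (a standard construction such as HKDF would be the usual choice in production).

```python
import hashlib
import hmac


def derive_record_key(config_secret: bytes, app_secret: bytes, record_id: str) -> bytes:
    """Derive a 32-byte key that depends on all three inputs.

    config_secret: assumed to come from a configuration file or key store
    app_secret:    assumed to be an application-specific value
    record_id:     something unique to the record being protected
    """
    # Step 1: combine the two secrets into one master key.
    master = hmac.new(config_secret, app_secret, hashlib.sha256).digest()
    # Step 2: bind the master key to this particular record.
    return hmac.new(master, record_id.encode("utf-8"), hashlib.sha256).digest()
```

The resulting 32 bytes could be handed to the AES class as a 256-bit key. An attacker who only has the database still lacks both secrets, and a leaked key for one record does not reveal the key for another.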
Yes, it adds overhead. Yes, it is relatively expensive. But is it worth it if you have sensitive data like user credit card details? You bet it is.
I'm using similar encryption and hashing techniques in one of my personal pet projects that is highly security focused and carefully controlled. It depends on how much data you need to display at any one time. For example, mine will only ever fetch 10 records at a time, most likely even fewer.
... To specify what I mean by mixing: Encrypt once. Then encrypt that data again with a different key and, preferably, a different algorithm.
I would use Registry Keys protected by ACL, so only the account under which your app pool is running can read them.