I have sensitive data A (floating point), and nobody should be able to see the raw data.
A user submits some data B which is added to data A. Data B can be seen by anybody. The result of the computation is sent back to the user.
The computation should be done somewhere permanent, i.e. not on a private server. The data could be stored in IPFS.
E.g. data A could be encrypted and stored somewhere on Ethereum where the data persists. I thought about Paillier homomorphic encryption for the computation (encrypt data A and data B and perform the addition), but it is too slow.
Any suggestions guys?
I'm trying to implement simple AES encryption for small data packets (8 bytes, fixed length) being sent from nodes with very little computational resources.
Ideally the encrypted data would be a single 16-byte packet. The data being sent is very repetitive (for example, a wind-velocity packet "S011D023", where S and D are always in the same positions and the length is always 8). The nodes may not be able to hold runtime state (e.g. a frame counter, nonce, etc.), but this could be added if essential.
How do I make the solution "reasonably" secure against replay, brute force, and other well-known attacks? Does adding fresh random padding bytes to each new packet help in any way?
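For what it's worth, here is a sketch of the packet layout the question describes: the 8-byte payload plus a small counter and random padding, encrypted as a single 16-byte AES block. The counter is what gives the receiver something to check against replays; note that nothing here authenticates the packet, so treat this as an illustration of the idea only, not a vetted design.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Sketch: 8-byte payload + 4-byte counter + 4 random bytes, encrypted as one AES block.
public class SinglePacketDemo {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);                    // pre-shared in practice

        byte[] payload = "S011D023".getBytes("US-ASCII");     // the fixed 8-byte reading
        int counter = 42;                                     // persisted per node
        byte[] random = new byte[4];
        new SecureRandom().nextBytes(random);

        byte[] block = new byte[16];
        System.arraycopy(payload, 0, block, 0, 8);
        block[8]  = (byte) (counter >>> 24);
        block[9]  = (byte) (counter >>> 16);
        block[10] = (byte) (counter >>> 8);
        block[11] = (byte) counter;
        System.arraycopy(random, 0, block, 12, 4);

        // One raw AES block; "ECB" applied to exactly one block is just the block cipher.
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] packet = aes.doFinal(block);
        System.out.println("packet length: " + packet.length); // 16

        // Receiver: decrypt, check the payload format, and require counter > last seen counter.
    }
}
```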
This is a question I have. I am not an expert in this subject, so please be kind in your answer.
I understood that homomorphic encryption allows a message to be read as if it had been decrypted, but without removing the protective layer that the encryption placed on it.
Let's suppose the word "TESTE" is encrypted, and a homomorphic operation is performed on that encrypted word.
My question is:
Will the homomorphic operation understand the "meaning" of the encrypted text? Will it know that the encrypted word is also "TESTE"?
Thank you.
Let me give you a different example. I'm not sure the example can be implemented with today's systems, but it illustrates the idea anyway:
Party A has 10 numbers and encrypts them.
The encrypted numbers are given to party B.
Party B calculates the sum of the 10 numbers. The result is encrypted data.
Party A is given the encrypted result.
Party A decrypts the result.
The main feature is that party B does not need to decrypt the 10 numbers. Furthermore, the encryption is maintained throughout the entire sum calculation. Therefore, party B has neither knowledge of the input numbers nor of the calculated sum, because all operations are carried out on the encrypted data.
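Here is a toy sketch of that flow in Java, using the Paillier scheme (one well-known additively homomorphic scheme; the example above doesn't prescribe a particular one). The key size and the hard-coded numbers are for illustration only.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier demo: party A encrypts numbers, party B multiplies the
// ciphertexts (which adds the plaintexts), party A decrypts the sum.
public class PaillierSumDemo {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        BigInteger p = BigInteger.probablePrime(64, rnd);     // far too small for real use
        BigInteger q = BigInteger.probablePrime(64, rnd);
        BigInteger n = p.multiply(q);
        BigInteger nSquared = n.multiply(n);
        BigInteger g = n.add(BigInteger.ONE);                 // g = n + 1
        BigInteger pMinus1 = p.subtract(BigInteger.ONE);
        BigInteger qMinus1 = q.subtract(BigInteger.ONE);
        BigInteger lambda = pMinus1.multiply(qMinus1).divide(pMinus1.gcd(qMinus1)); // lcm(p-1, q-1)
        BigInteger mu = lambda.modInverse(n);                 // valid because g = n + 1

        // Party A encrypts 10 numbers.
        long[] values = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        BigInteger[] ciphertexts = new BigInteger[values.length];
        for (int i = 0; i < values.length; i++) {
            BigInteger m = BigInteger.valueOf(values[i]);
            BigInteger r = new BigInteger(n.bitLength() - 1, rnd)
                    .mod(n.subtract(BigInteger.ONE)).add(BigInteger.ONE); // r in [1, n-1]
            ciphertexts[i] = g.modPow(m, nSquared)
                    .multiply(r.modPow(n, nSquared))
                    .mod(nSquared);                           // c = g^m * r^n mod n^2
        }

        // Party B: multiply the ciphertexts -> encrypted sum, without seeing any plaintext.
        BigInteger encryptedSum = BigInteger.ONE;
        for (BigInteger c : ciphertexts) {
            encryptedSum = encryptedSum.multiply(c).mod(nSquared);
        }

        // Party A decrypts: m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n.
        BigInteger l = encryptedSum.modPow(lambda, nSquared).subtract(BigInteger.ONE).divide(n);
        BigInteger sum = l.multiply(mu).mod(n);
        System.out.println("Decrypted sum: " + sum);          // prints 39
    }
}
```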
I understood that homomorphic encryption allows a message to be read as if it had been decrypted.
No. Homomorphic encryption is public-key encryption that allows someone to evaluate functions (technically, circuits) on the encrypted data without accessing the data itself. The upside is that the client can hand heavy processing to the cloud without worrying that its data is compromised, as long as the scheme is not broken.
To understand FHE, we can look at textbook RSA, which has no padding. Textbook RSA is multiplicatively homomorphic: if you multiply two ciphertexts and then decrypt the result, you get the product of the plaintexts. So if you want the cloud to multiply your data, just send your data encrypted with textbook RSA.
RSA supports only multiplication and no other operation; such a scheme is called partially homomorphic.
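As a concrete illustration of that multiplicative property, here is a small Java sketch with the classic toy parameters p = 61, q = 53; real RSA keys are of course far larger, and textbook RSA itself is not secure.

```java
import java.math.BigInteger;

// Textbook (unpadded) RSA is multiplicatively homomorphic:
// Enc(m1) * Enc(m2) mod n decrypts to m1 * m2 mod n.
public class RsaMultiplyDemo {
    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q);                       // 3233
        BigInteger e = BigInteger.valueOf(17);              // public exponent
        BigInteger phi = BigInteger.valueOf(60 * 52);       // (p-1)(q-1)
        BigInteger d = e.modInverse(phi);                   // private exponent

        BigInteger m1 = BigInteger.valueOf(7);
        BigInteger m2 = BigInteger.valueOf(9);
        BigInteger c1 = m1.modPow(e, n);
        BigInteger c2 = m2.modPow(e, n);

        // The "cloud" multiplies the ciphertexts without seeing m1 or m2.
        BigInteger cProduct = c1.multiply(c2).mod(n);

        // The holder of d decrypts and obtains m1 * m2.
        BigInteger decrypted = cProduct.modPow(d, n);
        System.out.println(decrypted);                      // prints 63
    }
}
```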
There are other public-key cryptosystems that support only one operation, for example XOR. This can be used to verify fingerprints in the cloud without revealing the data to the cloud.
If a scheme supports two operations (addition and multiplication), it is called fully homomorphic, and in theory we can build an arbitrary circuit from it.
The main idea: you semantically encrypt your data (the input for the calculation), then send it to the cloud together with the circuit (the operation you want performed) and your public key. The cloud evaluates the circuit and returns the encrypted result. Only you can decrypt the returned value with your private key to get the result.
The main point is that the calculation is done without accessing the data. As long as the cryptographic primitives are not broken, no one will access your data.
Note: the breakthrough of Gentry's seminal work was finding a novel way to deal with the noise, which grows rapidly with each multiplication.
I am using the PBEWithMD5AndDES algorithm to encrypt a password. I need to store the encrypted password in a db column. Based on the password length, I have to decide the length of the db column. I want to know the maximum length of a password that can be encrypted using the mentioned algorithm.
If this is a password for access to something(s) on your system, you shouldn't store it encrypted at all; you should instead store an irreversible and slow hash, sometimes disguised as a PBKDF (password-based key derivation function). This provides much better security, which is offtopic for SO but has been discussed thousands of times at great length on security.SX and to a somewhat lesser extent on crypto.SX. See https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords for the canonical answer, and many more links.
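As a minimal sketch of that advice, assuming PBKDF2 with HMAC-SHA256 as the slow hash (the answer doesn't mandate a specific PBKDF), something like this stores a salted, non-reversible hash instead of an encrypted password; the iteration count is illustrative and should be tuned for your environment.

```java
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Store salt + iteration count + hash in the database instead of a reversible ciphertext.
public class PasswordHashDemo {
    public static void main(String[] args) throws Exception {
        char[] password = "correct horse battery staple".toCharArray();

        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        PBEKeySpec spec = new PBEKeySpec(password, salt, 210_000, 256);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] hash = factory.generateSecret(spec).getEncoded();

        System.out.printf("salt=%s hash=%s%n",
                java.util.Base64.getEncoder().encodeToString(salt),
                java.util.Base64.getEncoder().encodeToString(hash));
    }
}
```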
If you actually do need to store it encrypted, PBEWithMD5AndDES is not secure. Any value you store encrypted with this algorithm can be decrypted in hours at most, and probably seconds, by a competent attacker, so it's hardly worth the effort of encrypting.
That said, to answer the only question you asked: there is no inherent limit on the size of data that can be (insecurely) encrypted with this algorithm. Like all Java Cipher instances it uses an init, update, doFinal structure, and although the Java language limits the size of the byte[] arguments passed or returned on one call to slightly less than 2^31 (about 2,100,000,000) bytes, you can make any number of calls. (For secure ciphers there are some limits on data size, depending on algorithm and mode, to remain secure, but since this one is not secure to start with, it doesn't lose anything by exceeding the data sizes that could reduce security.)
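For illustration, here is a rough sketch of that init / update / doFinal pattern with PBEWithMD5AndDES; the passphrase, salt handling and chunk sizes are placeholders, and (as stressed above) the algorithm itself should not be used for anything that matters.

```java
import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

// Shows that the Cipher API accepts as many update() calls as you like
// before doFinal(); only each individual call is bounded by array sizes.
public class PbeUpdateDemo {
    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[8];
        new SecureRandom().nextBytes(salt);

        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBEWithMD5AndDES");
        SecretKey key = factory.generateSecret(new PBEKeySpec("passphrase".toCharArray()));

        Cipher cipher = Cipher.getInstance("PBEWithMD5AndDES");
        cipher.init(Cipher.ENCRYPT_MODE, key, new PBEParameterSpec(salt, 10_000));

        ByteArrayOutputStream ciphertext = new ByteArrayOutputStream();
        byte[] chunk = "some chunk of plaintext".getBytes("UTF-8");
        for (int i = 0; i < 3; i++) {
            byte[] part = cipher.update(chunk);   // feed data incrementally
            if (part != null) ciphertext.write(part);
        }
        ciphertext.write(cipher.doFinal());       // finishes CBC + padding

        System.out.println("ciphertext bytes: " + ciphertext.toByteArray().length);
    }
}
```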
It will take increasing amounts of time to process more data, because this cipher uses CBC mode which cannot be parallelized or pipelined on encrypt. I haven't measured and anyway it will vary depending on your hardware and to some extent your Java version, but if for example you want to encrypt 1,000,000,000,000,000,000 bytes it will probably take somewhere in the range of 300,000 years. Your computer may not last that long.
OTOH, since you want to store the result in a database, there will almost certainly be some limit there. Even if there is no architectural limit, or no effective one, every database is ultimately limited by the size of the disk or equivalent storage it can use, and the total of all storage in data centers (admittedly not all of it in databases) is estimated as approaching 2,000,000,000,000,000,000 bytes.
This question could go in a Bitcoin forum, but I am trying to understand it from a programming point of view.
There are technologies used for distributed storage, like distributed hash tables (say, Kademlia or similar). How is the Bitcoin blockchain different from distributed hash tables? Or is distributed hash table technology perhaps underpinning the Bitcoin blockchain? Or why is the Bitcoin blockchain hailed as such a breakthrough compared to a DHT?
Distributed Hash Table
A DHT is simply a key-value store distributed across a number of nodes in a network. The keys are distributed among the nodes with a deterministic algorithm, and each node is responsible for a portion of the hash table. A routing algorithm makes it possible to perform lookups in the hash table without knowing every node of the network.
For example, in the Chord DHT (a relatively simple DHT implementation), each node is assigned an identifier and is responsible for the keys that are closest to its identifier.
Imagine there are 4 nodes with the identifiers 2a6c, 7811, a20f, e9c3.
The data with the identifier 2c92 will be stored on the node 2a6c.
Imagine now that you only know the node 7811 and you are looking for the data with the identifier eabc.
You ask node 7811 for the data eabc. 7811 doesn't have it, so it asks node e9c3, which sends it to node 7811, which sends it back to you.
A clever routing algorithm allows data to be found in O(log(N)) hops, without any node storing the entire routing table of the network (the addresses of every node). Basically, you ask the node you know that is closest to the data identifier, which itself asks the closest node it knows, and so on, shrinking the distance at each step.
A DHT is very scalable because the data is uniformly distributed among the nodes and lookup time generally grows as O(log(N)).
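Here is a small sketch of the deterministic key-to-node assignment used in the example above (the "store on the node with the numerically closest identifier" rule). Real Chord or Kademlia implementations use ring or XOR distance plus a routing table, so treat this only as an illustration of the assignment idea.

```java
import java.util.List;

// Each key is stored on the node whose identifier is numerically closest to it.
public class DhtLookupDemo {
    static long distance(long a, long b) {
        return Math.abs(a - b);
    }

    static long responsibleNode(List<Long> nodeIds, long keyId) {
        long best = nodeIds.get(0);
        for (long node : nodeIds) {
            if (distance(node, keyId) < distance(best, keyId)) {
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> nodes = List.of(0x2a6cL, 0x7811L, 0xa20fL, 0xe9c3L);
        System.out.printf("key 2c92 -> node %x%n", responsibleNode(nodes, 0x2c92)); // 2a6c
        System.out.printf("key eabc -> node %x%n", responsibleNode(nodes, 0xeabc)); // e9c3
    }
}
```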
Blockchain
A blockchain is also a distributed data structure, but its purpose is completely different.
Think of it as a history, or a ledger. The purpose is to store a continuously growing list of records without the possibility of tampering and revision.
It is mainly used in the Bitcoin currency system for keeping track of transactions. Its tamper-proof property lets everybody know the exact balance of an account from its history of transactions.
In a blockchain, each node of the network stores the full data, so it is absolutely not the same idea as a DHT, in which the data is divided among the nodes. Every new entry in the blockchain must be validated by a process called mining, whose details are out of the scope of this answer, but which ensures consensus on the data.
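To make the "tamper-proof, continuously growing list of records" idea concrete, here is a minimal hash-chain sketch. It deliberately leaves out mining and consensus and is not how Bitcoin structures its blocks in detail; it only shows why editing an old record breaks every later link.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Every block stores the hash of the previous block, so tampering is detectable.
public class HashChainDemo {
    record Block(String previousHash, String data, String hash) {}

    static String sha256(String s) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(s.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        List<Block> chain = new ArrayList<>();
        String previous = "0";
        for (String tx : new String[]{"alice->bob 5", "bob->carol 2", "carol->dave 1"}) {
            String hash = sha256(previous + tx);
            chain.add(new Block(previous, tx, hash));
            previous = hash;
        }

        // Verification: recompute each hash and check that the links line up.
        String expectedPrevious = "0";
        for (Block block : chain) {
            boolean valid = block.previousHash().equals(expectedPrevious)
                    && block.hash().equals(sha256(block.previousHash() + block.data()));
            System.out.println(block.data() + " valid=" + valid);
            expectedPrevious = block.hash();
        }
    }
}
```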
Both are distributed data structures, but they serve different purposes. A DHT aims to provide an efficient structure (in terms of lookup time and storage footprint) for dividing data across a network, while a blockchain aims to provide a tamper-proof data structure.
In computing, a hash table (hash map) is a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
but a blockchain is
a digital ledger in which transactions made in bitcoin or another cryptocurrency are recorded chronologically and publicly.
I have an idea in mind but no idea what the magic words are to use in Google - I'm hoping to describe the idea here and maybe someone will know what I'm looking for.
Imagine you have a database. Lots of data. It's encrypted. What I'm looking for is an encryption scheme whereby, to decrypt, a variable N must at a given time hold the value M (obtained from a third party, like a hardware token), or decryption fails.
So imagine AES - well, AES is just a single key. If you have the key, you're in. Now imagine AES modified in such a way that the algorithm itself requires an extra fact, above and beyond the key - an extra datum from an external source, where that datum varies over time.
Does this exist? Does it have a name?
This is easy to do with the help of a trusted third party. Yeah, I know, you probably want a solution that doesn't need one, but bear with me — we'll get to that, or at least close to that.
Anyway, if you have a suitable trusted third party, this is easy: after encrypting your file with AES, you just send your AES key to the third party, ask them to encrypt it with their own key, to send the result back to you, and to publish their key at some specific time in the future. At that point (but no sooner), anyone who has the encrypted AES key can now decrypt it and use it to decrypt the file.
Of course, the third party may need a lot of key-encryption keys, each to be published at a different time. Rather than storing them all on a disk or something, an easier way is for them to generate each key-encryption key from a secret master key and the designated release time, e.g. by applying a suitable key-derivation function to them. That way, a distinct and (apparently) independent key can be generated for any desired release date or time.
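As a rough sketch of that per-release-date derivation, here is one way it could look, assuming HMAC-SHA256 as the key-derivation function (the answer only asks for "a suitable key-derivation function"); the third party stores just the master key and publishes the derived key for a given date once that date has passed.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Derive one key-encryption key per release date from a single master key.
public class ReleaseDateKdfDemo {
    static byte[] keyForReleaseDate(byte[] masterKey, String releaseDate) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        return mac.doFinal(("release:" + releaseDate).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] masterKey = "demo-master-key-keep-this-secret".getBytes(StandardCharsets.UTF_8);
        byte[] kek = keyForReleaseDate(masterKey, "2030-01-01");
        System.out.println("derived key bytes: " + kek.length);   // 32
    }
}
```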
In some cases, this solution might actually be practical. For example, the "trusted third party" might be a tamper-resistant hardware security module with a built-in real time clock and a secure external interface that allows keys to be encrypted for any release date, but to be decrypted only for dates that have passed.
However, if the trusted third party is a remote entity providing a global service, sending each AES key to them for encryption may be impractical, not to mention a potential security risk. In that case, public-key cryptography can provide a solution: instead of using symmetric encryption to encrypt the file encryption keys (which would require them either to know the file encryption key or to release the key-encryption key), the trusted third party can instead generate a public/private key pair for each release date and publish the public half of the key pair immediately, but refuse to disclose the private half until the specified release date. Anyone else holding the public key may encrypt their own keys with it, but nobody can decrypt them until the corresponding private key has been disclosed.
(Another partial solution would be to use secret sharing to split the AES key into shares and to send only one share to the third party for encryption. Like the public-key solution described above, this would avoid disclosing the AES key to the third party, but unlike the public-key solution, it would still require two-way communication between the encryptor and the trusted third party.)
The obvious problem with both of the solutions above is that you (and everyone else involved) do need to trust the third party generating the keys: if the third party is dishonest or compromised by an attacker, they can easily disclose the private keys ahead of time.
There is, however, a clever method published in 2006 by Michael Rabin and Christopher Thorpe (and mentioned in this answer on crypto.SE by one of the authors) that gets at least partially around the problem. The trick is to distribute the key generation among a network of several more or less trustworthy third parties in such a way that, even if a limited number of the parties are dishonest or compromised, none of them can learn the private keys until a sufficient majority of the parties agree that it is indeed time to release them.
The Rabin & Thorpe protocol also protects against a variety of other possible attacks by compromised parties, such as attempts to prevent the disclosure of private keys at the designated time or to cause the generated private or public keys not to match. I don't claim to understand their protocol entirely, but, given that it's based on a combination of existing and well studies cryptographic techniques, I see no reason why it shouldn't meet its stated security specifications.
Of course, the major difficulty here is that, for those security specifications to actually amount to anything useful, you do need a distributed network of key generators large enough that no single attacker can plausibly compromise a sufficient majority of them. Establishing and maintaining such a network is not a trivial exercise.
Yes, the kind of encryption you are looking for exists. It is called timed-release encryption, abbreviated TRE. Here is a paper about it: http://cs.brown.edu/~foteini/papers/MathTRE.pdf
The following is an excerpt from the abstract of the above paper:
There are nowadays various e-business applications, such as sealed-bid auctions and electronic voting, that require time-delayed decryption of encrypted data. The literature offers at least three main categories of protocols that provide such timed-release encryption (TRE).
They rely either on forcing the recipient of a message to solve some time-consuming, non-parallelizable problem before being able to decrypt, or on the use of a trusted entity responsible for providing a piece of information which is necessary for decryption.
I personally like another name, which is "time capsule cryptography", probably coined at crypto.stackoverflow.com: Time Capsule cryptography?.
A quick answer is no: the key used to decrypt the data cannot change over time, unless you decrypt and re-encrypt the whole database periodically (which I suppose is not feasible).
The solution suggested by Ilmari Karonen is the only feasible one, but it needs a trusted third party; furthermore, once the master AES key is obtained, it is reusable in the future: you cannot use 'one-time pads' with that solution.
If you want your token to be time-based, you can use the TOTP algorithm.
TOTP can help you generate a value for the variable N (the token) at a given time M. The service requesting access to your database would attach a token generated using TOTP. During validation at the access provider's end, you check whether the token holds the correct value based on the current time. You need a shared key at both ends to generate the same TOTP.
The advantage of TOTP is that the value changes with time and one token cannot be reused.
I have implemented a similar thing for two-factor authentication.
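For reference, here is a minimal TOTP sketch along the lines of RFC 6238, using the common defaults of a 30-second step and 6-digit codes; real deployments also accept a small window of adjacent steps for clock drift and use a securely provisioned shared key.

```java
import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Both sides share a key, derive the current 30-second time step,
// and compute a 6-digit code from an HMAC over that step.
public class TotpDemo {
    static int totp(byte[] sharedKey, long unixSeconds) throws Exception {
        long timeStep = unixSeconds / 30;                          // 30-second window
        byte[] counter = ByteBuffer.allocate(8).putLong(timeStep).array();

        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedKey, "HmacSHA1"));
        byte[] hmac = mac.doFinal(counter);

        // Dynamic truncation from RFC 4226.
        int offset = hmac[hmac.length - 1] & 0x0f;
        int binary = ((hmac[offset] & 0x7f) << 24)
                | ((hmac[offset + 1] & 0xff) << 16)
                | ((hmac[offset + 2] & 0xff) << 8)
                | (hmac[offset + 3] & 0xff);
        return binary % 1_000_000;                                 // 6-digit code
    }

    public static void main(String[] args) throws Exception {
        byte[] sharedKey = "12345678901234567890".getBytes("UTF-8"); // demo key only
        long now = System.currentTimeMillis() / 1000;
        System.out.printf("code: %06d%n", totp(sharedKey, now));
    }
}
```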
"One time Password" could be your google words.
I believe what you are looking for is called Public Key Cryptography or Public Key Encryption.
Another good word to google is "asymmetric key encryption scheme".
Google that and I'm quite sure you'll find what you're looking for.
For more information, see Wikipedia's article.
An example of this is Diffie–Hellman key exchange.
Edit (putting things into perspective)
The second key can be determined by an algorithm that uses a specific time (for example, the time the data was inserted) to generate it, and that second key can be stored in another location.
As others pointed out, a one-time password (OTP) may be a good solution for the scenario you proposed.
There's an OTP implementation in C# that you might take a look at: https://code.google.com/p/otpnet/.
Ideally, we want a generator that depends on the time, but I don't know of any algorithm that can do that today.
More generally, if Alice wants to let Bob know about something at a specific point in time, you can consider this setup:
Assume we have a public algorithm that has two parameters: a very large random seed number and the expected number of seconds the algorithm will take to find the unique solution of the problem.
Alice generates a large seed.
Alice runs it first on her computer and computes the solution to the problem. It is the key. She encrypts the message with this key and sends it to Bob along with the seed.
As soon as Bob receives the message, Bob runs the algorithm with the correct seed and finds the solution. He then decrypts the message with this key.
Three flaws exist with this approach:
Some computers can be faster than others, so the algorithm has to be made in such a way as to minimize the discrepancies between two different computers.
It requires a proof of work which may be OK in most scenarios (hello Bitcoin!).
If Bob has some delay, then it will take him more time to see this message.
However, if the algorithm is independent of the machine it runs on, and the seed is large enough, it is guaranteed that Bob will not see the content of the message before the deadline.
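As a rough sketch of such a setup, here is the idea with iterated hashing standing in for the time-consuming, inherently sequential problem (the answer doesn't name a specific algorithm, and calibrating the iteration count to a wall-clock delay is exactly the first flaw listed above).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// The key is the result of a long, inherently sequential computation
// (repeated SHA-256), so Bob cannot obtain it much faster than the intended
// delay, no matter how many machines he has.
public class SequentialDelayDemo {
    static byte[] slowKey(byte[] seed, long iterations) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] value = seed;
        for (long i = 0; i < iterations; i++) {
            value = sha256.digest(value);     // each step depends on the previous one
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        byte[] seed = "large random seed goes here".getBytes(StandardCharsets.UTF_8);
        long iterations = 5_000_000L;         // tune so this takes the desired time

        byte[] aliceKey = slowKey(seed, iterations);  // Alice computes the key first
        byte[] bobKey = slowKey(seed, iterations);    // Bob must redo the same work
        System.out.println("keys match: " + Arrays.equals(aliceKey, bobKey));
    }
}
```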