I am pondering creating a hash function (like md5 or sha1) using the RSA crypto algorithm. I am wondering if there are any obvious reasons that this algorithm wouldn't work:
Generate RSA public/private keys.
Discard private key, never store it at all.
Begin with a hash with a length of the block size for the RSA encryption.
Encrypt message using public key, one block at a time.
For each encrypted block of the message, accumulate it to the hash using a specified algorithm (probably a combination of +, xor, etc.)
To verify a message has the same hash as a stored hash, use the saved public key and repeat the process.
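For concreteness, a minimal Python sketch of the scheme (toy RSA parameters, plain XOR as the accumulator, purely illustrative and not meant to be secure):

# Toy sketch of the steps above: deterministic "textbook" RSA applied to each
# block, with the results accumulated via XOR. Toy parameters, NOT secure.

N, E = 3233, 17            # tiny RSA public key (p=61, q=53); private key discarded
BLOCK = 1                  # 1-byte blocks so each block value stays below the modulus

def rsa_hash(message: bytes) -> int:
    acc = 0xAB             # step 3: arbitrary starting "hash" value
    for i in range(0, len(message), BLOCK):
        m = int.from_bytes(message[i:i + BLOCK], "big")
        acc ^= pow(m, E, N)        # steps 4-5: encrypt block with public key, accumulate
    return acc

stored = rsa_hash(b"hello world")
print(stored == rsa_hash(b"hello world"))   # step 6: verify by recomputing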
Is this possible, secure, and practical?
Thanks for any comments.
RSA encryption is not deterministic: if you follow the RSA standard, you will see that some random bytes are injected by the padding. Therefore, if you encrypt the same message twice with RSA, you will almost certainly not get the same output twice.
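As a quick illustration of that point, here is a sketch assuming the third-party Python cryptography package: encrypting the same message twice under standard RSA-OAEP padding gives two different ciphertexts.

# Encrypting the same message twice under RSA-OAEP: the random padding bytes
# make the two ciphertexts differ (sketch using the "cryptography" package).
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

c1 = public_key.encrypt(b"same message", oaep)
c2 = public_key.encrypt(b"same message", oaep)
print(c1 == c2)   # False: the outputs differ even though the input is identical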
Also, your "unspecified step 5" is likely to be weak. For instance, if you define a way to hash a block, and then just XOR the blocks together, then A||B and B||A (for block-sized values A and B) will hash to the same value; that's collision bonanza.
Academically, building hash functions out of number-theoretic structures (i.e. not a raw RSA, but reusing the same kind of mathematical element) has been tried; see this presentation from Lars Knudsen for some details. Similarly, the ECOH hash function was submitted for the SHA-3 competition, using elliptic curves at its core (but it was "broken"). The underlying hope is that hash function security could somehow be linked to the underlying number-theoretic hard problem, thus providing provable security. However, in practice, such hash functions are either slow, weak, or both.
There are already hashes that do essentially this, except perhaps not with the RSA algorithm in particular. They're called cryptographic hashes, and their salient point is that they're cryptographically secure - meaning that the same strength and security-oriented thought that goes into public key cryptographic functions has gone into them as well.
The only difference is, they've been designed from the ground-up as hashes, so they also meet the individual requirements of hash functions, which can be considered as additional strong points that cryptographic functions need not have.
Moreover, some requirements are completely at odds between the two. For instance, you want hash functions to be as fast as possible without compromising security, whereas slowness is often seen as a feature in other cryptographic functions, since it limits brute-force attacks considerably.
SHA-512 is a great cryptographic hash and probably worthy of your attention. Whirlpool, Tiger, and RIPEMD are also excellent choices. You can't go wrong with any of these.
One more thing: if you actually want it to be slow, then you definitely DON'T want a hash function and are going about this completely wrong. If, as I'm assuming, what you want is a very, very secure hash function, then like I said, there are numerous options out there better suited than your example, while being just as or even more cryptographically secure.
BTW, I'm not absolutely convinced that there is no weakness in your mixing algorithm. While the output of each RSA block is intended to already be uniform, with strong avalanche behaviour and so on, I remain concerned that this could pose a problem for chosen-plaintext or comparative analysis of similar messages.
Typically, it is best to use an algorithm that is publicly available and has gone through a review process. Even though there might be known weaknesses with such algorithms, that is probably better than the unknown weaknesses in a home-grown algorithm. Note that I'm not saying the proposed algorithm has flaws; it's just that even if a large number of answers are given here saying that it seems good, that doesn't guarantee that it has none. Of course, the same thing can be said about algorithms such as MD5, SHA, etc. But at least with those, a large number of people have put them through rigorous analysis.
Aside from the previous "boilerplate" warnings against designing one's own cryptographic functions, it seems the proposed solution might be somewhat expensive in terms of processing time. RSA encryption on a large document could be prohibitive.
Without thinking too much about it, it seems like that would be cryptographically secure.
However, you'd have to be careful of chosen plaintext attacks, and if your input is large you may run into speed issues (as asymmetric crypto is significantly slower than cryptographic hashes).
So, in short: yes, this seems like it could be possible and secure… But unless there is a really compelling reason, I would use a standard HMAC if you want a keyed hash.
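For reference, a standard HMAC in Python looks like this (standard-library hmac/hashlib; the key and message below are placeholders):

# Keyed hash with HMAC-SHA256 from the standard library.
import hmac, hashlib

key = b"my-secret-key"                 # placeholder key
msg = b"The quick brown fox"

tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verification: recompute with the same key and compare in constant time.
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
print(tag, ok)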
As mentioned above, step 4 has to be done deterministically, i.e. with the modulus and public exponent only.
If the starting hash value in step 3 is kept private, the concept appears secure to me.
Concerning step 5: in the well-known CBC mode of block ciphers, the mix with the previous result is done before encryption, i.e. before step 4, which might be better for avoiding collisions, e.g. with a lazy mixing step; XOR is fine.
I will apply this, as available implementations of known hash functions might have backdoors :)
Deterministic Java RSA is here.
EDIT
One should also mention that RSA is scalable without any limit on key size. Such a hash function could immediately serve as a mask generation function.
The IV used in schemes such as CBC has to be random and unpredictable. But at the same time it does not have to be kept secret.
If the IV does not have to be secret, why does it have to be random then? I fail to make sense of these seemingly contradictory requirements.
I have seen descriptions of attacks which exploit the non-randomness, so I understand why randomness is needed. However, things get confusing when the requirements specify that the IV does not have to be secret! This seems to defeat the whole purpose of randomness.
Somebody help clarify this please.
I think you are reversing the roles.
When a cryptographic protocol is designed, it is designed with certain assumptions in mind. The more assumptions you use, the less useful the protocol is, as you are less likely to find scenarios in which the assumptions hold.
In the case of CBC, the IV was designed to not need to be secret. You can keep it a secret, if you like. The algorithm is definitely not less secure this way. It is not, however, a requirement.
Having a non-random IV, on the other hand, causes the entire protocol to be unsuitable for certain applications. When choosing between adding a requirement to the protocol and adding a requirement to its data, the right choice is obvious.
In other words, the IV does not need to be secret, merely because it can be non-secret.
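A rough sketch of the usual practice (assuming a recent version of the third-party Python cryptography package): the IV is freshly random for every message but is simply sent in the clear, typically prepended to the ciphertext.

# AES-CBC with a random, non-secret IV that travels with the ciphertext.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)          # the secret key
iv = os.urandom(16)           # random per message, but NOT secret

plaintext = b"sixteen byte msg"                 # already block-aligned; real code would pad
enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = iv + enc.update(plaintext) + enc.finalize()    # IV sent in the clear

recv_iv, body = ciphertext[:16], ciphertext[16:]
dec = Cipher(algorithms.AES(key), modes.CBC(recv_iv)).decryptor()
print(dec.update(body) + dec.finalize())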
I am developing a large application and I need encryption when data is traveling between two machines on different continents. I have never worked with encryption. I want a simple encryption scheme which can be handled in PHP / Ruby / Python without any dependencies.
So I decided to use HMAC-SHA1.
$pad=hash_hmac("sha1","The quick brown....","mykey");
This is what I found out after some research on the internet.
How hard is it to decrypt if someone doesn't know the key? Also, are there any alternatives to this?
UPDATE - thanks for all the responses. Problem solved.
It's impossible to decrypt it, even if you know the key. HMAC SHA1 is a keyed hash algorithm, not encryption.
A hash is a cryptographic one-way function that always generates a value of the same length (SHA-1 produces a 160-bit digest) regardless of the length of the input. The point of a hash is that, given the output value, it's computationally infeasible to find an input value that produces that output. A keyed hash is used to prevent rainbow table attacks. Even if you know the key, you can't reverse the hash process.
For encryption you want to look at AES.
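For example, a sketch of reversible, authenticated encryption with AES-GCM via the third-party Python cryptography package (the key would have to be shared securely between the two machines beforehand):

# AES-GCM: encryption that can be decrypted with the same key, with built-in
# integrity protection.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # shared secret
nonce = os.urandom(12)                      # must be unique per message

ct = AESGCM(key).encrypt(nonce, b"The quick brown fox", None)
pt = AESGCM(key).decrypt(nonce, ct, None)   # raises if the ciphertext was tampered with
print(pt)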
SHA-1 is a one-way hash function; by definition it is not decryptable by anyone. The question becomes: if you have a plaintext T that hashes to H, how hard is it to find another T' which also hashes to H?
According to Wikipedia, the best known attack on SHA-1 would take about 2^51 evaluations to find a plaintext that matches.
If you need actual encryption where you can reverse the process, you should take a look at AES256.
See http://en.wikipedia.org/wiki/Cryptographic_hash_function for a general discussion of this.
Like Andrew said, SHA-1 is a hash algorithm and cannot be used for encryption (since you cannot get back the original value). The digest it produces can be used to validate the integrity of the data.
An HMAC is a construct on top of a hash algorithm that accepts a key. However, it's not meant for encryption (again, it can't be decrypted), but it allows you to authenticate the data: with the same key you'll be able to verify that the data was not tampered with during its transfer.
For encryption you should look at using AES or, if applicable to your application, HTTPS (which will deal with more issues than you want to know about ;-)
SHA-1 and MD5 are one-way hashing algorithms.
They just generate a digest. Any string fed to these functions yields a fixed-length digest from which the original cannot be recovered.
They are far from being encryption.
If you are looking for encryption algorithms, go for AES (Advanced Encryption Standard) or DES (Data Encryption Standard).
As I say, this is a hash, so not an encryption/decryption problem. If you want to implement a straightforward encryption algorithm, I would recommend looking into XOR encryption. If the key is truly random, at least as long as the message, never reused, and your key-sharing policy is suitably secure, this is a one-time pad; otherwise, it can potentially be broken using statistical analysis.
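A minimal Python sketch of that idea (the pad is as long as the message, truly random, and must never be reused):

# XOR encryption: with a random, single-use pad of the same length as the
# message this is a one-time pad; with a shorter or reused key it is weak.
import secrets

msg = b"attack at dawn"
pad = secrets.token_bytes(len(msg))

cipher = bytes(m ^ k for m, k in zip(msg, pad))
plain  = bytes(c ^ k for c, k in zip(cipher, pad))   # XOR again with the same pad
print(plain)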
I am reading about cryptography and was thinking about these properties of AES (which I use):
same message = same output
no message length secrecy
possible insecurity if you know the messages (does this actually apply to AES?)
I hear that AES is secure, but what if I want to theoretically improve these properties?
I was thinking I could do this:
apply encryption algorithm A
XOR with random data D (making sure the output looks random in case of any cipher)
generate random data that are longer than the original message
use hashing function F to allocate slots in the random data (this scrambles the order of the bytes)
Inputs: encryption algorithm A, data D to XOR with, and a hashing function F
My questions are
does the proposed solution theoretically help with my concerns?
is this approach used somewhere?
Possible enhancements to this approach
I could also say that the next position chosen by the hashing function will be altered using a checksum of the last decoded byte after the XOR step (that way the message has to be decoded from beginning to end)
If I were to use this to have a conversation with someone, the data to XOR with could be the last message from the other person, but that's probably a vulnerability.
I am looking forward to your thoughts!
(This is only theoretical, I am not in need of more secure encryption, just trying to learn from you guys.)
Yeah.
Look. If you want to learn about cryptography, I suggest you read Applied Cryptography. Really, just do it. You will get some nice definitive learnings, and get an understanding of what is appropriate and what is not. It specifically talks about implementation, which is what you are after.
Some rules of thumb:
Don't make up your own scheme. This is almost universally true. There may be exceptions, but it's fair to say that you should only invent your own scheme if you've thoroughly reviewed all existing schemes and have specific quantifiable reasons for them not being good enough.
Model your attacker. Find out what scenarios you are intending to protect against, and structure your system so that it works to mitigate the potential attacks.
Complexity is your enemy. Don't make your system more complex than it needs to be.
Stay up to date. You can find a few mailing lists related to cryptography (and hashing) and join them. From there you will learn interesting implementation details and be aware of the latest attacks.
As for specifically addressing your question, well, it's confusing. I don't understand your goal, nor do I understand steps 3 and 4. You might like to take a quick look here to gain an understanding of the different ways you can use a given encryption algorithm.
Hope this helps.
Your assumptions are incorrect.
same message != same output
The output will not be the same if you encrypt the same message twice.
This is because you are supposed to use different IVs.
Message length can be hidden by adding random data to the plaintext.
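A hedged sketch of that idea in Python (the bucket size is an arbitrary choice for illustration): pad every plaintext up to a fixed bucket boundary before encrypting, and keep the true length so the padding can be stripped after decryption.

# Length hiding by padding plaintexts to a fixed bucket size before encryption.
import os, struct

BUCKET = 256   # hypothetical bucket size

def pad(msg: bytes) -> bytes:
    body = struct.pack(">I", len(msg)) + msg     # 4-byte true-length prefix
    filler = (-len(body)) % BUCKET               # random filler up to the bucket boundary
    return body + os.urandom(filler)

def unpad(blob: bytes) -> bytes:
    (n,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + n]

print(len(pad(b"hi")), unpad(pad(b"hi")))        # 256 b'hi'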
Attacks have been demonstrated against AES with a reduced number of rounds.
Full-round AES has not been compromised in any way.
Other than that, I suggest you follow Noon Silk's recommendation and read Applied Cryptography.
What's the point of the random data XOR? If it's truly random, how will you ever decrypt it? If you're saying the random data is part of the key, you might as well drop AES and use only the truly random key, as long as it's the same length as (or longer than) the data and is never used more than once to encrypt. It's called a one-time pad, the only theoretically unbreakable encryption algorithm I know about.
If the random bits are pseudo-randomly generated, it's highly unlikely that your efforts will yield added security. Consider how many talented mathematicians were involved in designing AES...
EDIT: And I too highly recommend Applied Cryptography, it's an actually very readable and interesting book, not as dry as it may sound.
The iPhone supports the following encryption algorithms
enum {
    kCCAlgorithmAES128 = 0,
    kCCAlgorithmDES,
    kCCAlgorithm3DES,
    kCCAlgorithmCAST,
    kCCAlgorithmRC4,
    kCCAlgorithmRC2
};
I want to use only a symmetric algorithm, since asymmetric encryption requires more computational overhead.
So I want to know which of the listed algorithms is best, and also what key length to use in order to avoid excessive computational overhead.
Key length
Bruce Schneier wrote back in 1999:
Longer key lengths are better, but only up to a point. AES will have 128-bit, 192-bit, and 256-bit key lengths. This is far longer than needed for the foreseeable future. In fact, we cannot even imagine a world where 256-bit brute force searches are possible. It requires some fundamental breakthroughs in physics and our understanding of the universe. For public-key cryptography, 2048-bit keys have the same sort of property; longer is meaningless.
Block ciphers
AES
It's the current standard encryption algorithm. It's considered to be safe by most people. That's what you should be using if you don't have very deep knowledge of cryptography.
DES
DES is the predecessor of AES and is considered broken because of its short key length.
3DES
3DES is a variation of DES with a longer effective key length. It's still in use, and although there are some known attacks, it is not yet broken.
RC2
It's considered to be weak.
Stream ciphers
RC4
It has some known vulnerabilities but is still used today, for example in SSL. I recommend not to use it in new products.
Conclusion
Use either RC4 or AES, depending if you need a stream or a block cipher.
Of those algorithms you list, I believe RC4 is the fastest. In addition, the speed of RC4 does not depend on the key length once it has been initialized. So you should be able to use the maximum key size for that one without worrying about runtime cost.
RC4 is probably the fastest, but it has some security issues.
If security is an important factor, I would recommend going for AES-128. AES is the standard solution, and on top of its excellent security you can expect implementations to get faster over time, as people are still actively working on them. Future CPUs may also include hardware support for it, just as new Intel desktop processors will.
I realize this question might not be that programming-related, and that to many it will sound like a silly question due to the intuitive logical fault in this idea.
My question is: is it provably impossible to construct a cryptographic scheme (implementable in a Turing-complete programming language) where the encrypted data can be decrypted without exposing a decryption key to the decrypting party?
Of course, I can see the intuitive logical fault in such a scheme, but as so often with formal logic and math, a formal proof has to be constructed before assuming such a statement. Does such a proof exist, or can it easily be constructed?
Thank you for advice on this one!
Edit: Thank you all for valuable input to this discussion!
YES!!! This already exists; they are called zero-knowledge protocols and zero-knowledge proofs.
See http://en.wikipedia.org/wiki/Zero-knowledge_proof
However, you have to have quite a good background in mathematics and crypto to understand how it works and why it works.
One example of a zero knowledge protocol is Schnorr's ZK protocol
No; but I'm not sure you're asking what you want to be asking.
Any person who is decrypting something (i.e. using a decryption key) must, obviously, have the key, otherwise they aren't decrypting it.
Are you asking about RSA, which has different keys for decrypting and encrypting? Or are you asking about a system where you may get a different (valid) result, based on the key you use?
If by "decrypted" you just mean arrive at the clear text in some way, then it is certainly possible to create such a cryptographic scheme. In fact it already exists:
Take an asymmetric encryption scheme, eg: RSA where you have the public key but not the private key. Now we get a message that's been encrypted with the public key (and therefore needs the private key to decrypt it). We can get the original message by "brute force" (yes, this'll take an enormously long time given a reasonable key/block size) going through all possible candidates and encrypting them ourselves until we get the same encrypted text. Once we get the same encrypted text we know what the decrypted text would be without ever having discovered the private key.
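A toy illustration of that brute-force idea (textbook RSA with tiny made-up parameters and no random padding, so re-encryption is deterministic; real key sizes and randomized padding make this utterly infeasible):

# Recover a plaintext using only the public key, by re-encrypting candidates.
n, e = 3233, 17                      # toy RSA public key (p=61, q=53)

def encrypt(m: int) -> int:
    return pow(m, e, n)

ciphertext = encrypt(42)             # the attacker sees only this

recovered = next(m for m in range(n) if encrypt(m) == ciphertext)
print(recovered)                     # 42, found without ever touching the private key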
Yes.
Proof: Encryption can be considered as a black box, so you get an input and an output and you have no idea how the black box transforms the input to get the output.
To reverse engineer the black box, you "simply" need to enumerate all possible Turing machines until one of them does produce the same result as the one you seek.
The same applies when you want to reverse the encryption.
Granted, this will take much more time than the universe will probably live, but it's not impossible that the algorithm will find a match before time runs out.
In practice, the question is how to efficiently find the key that will decode the output. This is a much smaller problem (since you already know the algorithm).
It's called encoding.
But anyone with the encoding algorithm can "decrypt" the message. This is the only way to have keyless encryption.