What's the benefit of using a well-known encryption algorithm?

I can easily handcraft my own encryption algorithm like the following:
// make sure the private key is long enough
byte key[] = {0x3e, 0x33, 0x7e, 0x02, 0x48, 0x2a, 0x4e, ...};
byte data[] = "a string to be encrypted".getBytes("utf-8");
// XOR each data byte with the next key byte, wrapping back to the
// start of the key once it has been used up (a repeating-key XOR)
for (int i = 0, j = 0; i < data.length; ++i, ++j) {
    if (j == key.length)
        j = 0;
    data[i] ^= key[j];
}
With the above algorithm, as long as I don't give away the private key, I see no easy way of breaking the encryption (or am I being naive?). If an encryption algorithm can be created this easily, what's the point of creating the standards? What's the benefit of using those well-known algorithms?

Speaking directly to your proposed algorithm, there are a couple of major problems with it:
If your data is longer than your key, then it becomes very easy to recover your key, because the scheme degenerates into a Vigenère cipher.
You can only use a key to encrypt a single message. Encrypting two or more messages with the same key causes the same problems that reusing the key within a message does, but even worse.
Because your key has to be as long as the message to actually be secure, you have a key management nightmare. How do you store a key that is as long as the message? How do you communicate it securely to whoever needs to read it? What if you want to encrypt a 1 GB file? You now need 2 GB of storage for your encrypted version: 1 GB of ciphertext plus 1 GB of key.
Your key has to be perfectly random. Any flaws in your random number generator will expose information about your message and key. Admittedly, this is an issue with algorithms like AES too, but while a bad RNG will reduce the effort needed to break an AES message/key, it is not nearly as bad as it would be for your scheme.
Well-known algorithms like AES address these sorts of issues, using short keys to get more security. A single 256-bit key can be used to encrypt a huge number of messages before it becomes unsafe to use, and managing and storing a 256-bit key is much easier than managing a 1 GB one.
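To see what that buys you in practice, here is a minimal Java sketch (my own illustration, not from the original answer) of encrypting the same string with the JDK's built-in AES support; the class name and the choice of GCM mode with a random 96-bit IV are just assumptions for the example:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

public class AesGcmSketch {
    public static void main(String[] args) throws Exception {
        // A random 256-bit AES key -- short, and reusable for many messages
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // A fresh, random 12-byte IV for every message
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("a string to be encrypted".getBytes("utf-8"));
        // send iv + ciphertext; the receiver needs the same 32-byte key to decrypt
    }
}

The key stays 32 bytes no matter how long the data is, and a fresh IV per message means identical plaintexts do not produce identical ciphertexts.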

You use a well-known algorithm in order to not become Dave.
Schneier's Law states that "any person can invent a security system so clever that she or he can't think of how to break it."
Which basically means: a security system (encryption algorithm, authentication system, ...) can't be assumed to be secure just because its creator says it's secure. Other experts on the topic must review it (and usually review it for a long time) before it can be considered secure.

The main benefit of using a well-known algorithm is that it will have been reviewed and analyzed for defects and weaknesses. If you roll your own, you don't get the benefit of community review. Moreover, depending on your platform, it's probably just as easy to use the built-in encryption.
I'm no expert in cryptanalysis, but something this simple probably isn't that secure.
Besides, if it were this easy, wouldn't everyone do it?

Related

OTP encryption with Caesar encryption

Why not use an OTP to encrypt more than one message, but after the XOR apply something like a substitution/Caesar cipher to the ciphertext?
Reusing a one-time-pad is bad because it gives you information about the key.
p: a plaintext message to be encrypted: p_1 p_2 ... p_n
e_i: encryption of p_i with key k_i
otp: e_i = p_i^k_i for i in 1..n
If you encrypt multiple messages and you XOR them together, you get something like
e1_1^e2_1 = p1_1^k_1^k_1^p2_1
and since k_1^k_1 cancels that becomes
e1_1^e2_1 = p1_1^p2_1
So you instantly learn information about the messages, and if you happen to know something about one of the inputs, you also learn something about the key.
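Here is a small self-contained Java sketch of that cancellation (the messages and class name are made up purely for illustration):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class PadReuseDemo {
    public static void main(String[] args) {
        byte[] p1 = "attack at dawn".getBytes(StandardCharsets.UTF_8);
        byte[] p2 = "attack at dusk".getBytes(StandardCharsets.UTF_8);

        byte[] key = new byte[p1.length];          // a pad as long as the messages
        new SecureRandom().nextBytes(key);

        byte[] e1 = xor(p1, key);                  // first "one-time" pad encryption
        byte[] e2 = xor(p2, key);                  // same pad reused -- the mistake

        byte[] leak = xor(e1, e2);                 // the key cancels: leak == p1 XOR p2
        // knowing (or guessing) p1 now reveals p2 without ever touching the key
        System.out.println(new String(xor(leak, p1), StandardCharsets.UTF_8)); // "attack at dusk"
    }

    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) out[i] = (byte) (a[i] ^ b[i]);
        return out;
    }
}

An attacker who guesses or knows one plaintext immediately recovers the other, and no key recovery is needed at all.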
By something like Caesar cipher you might mean
e2_1 = p2_1^(k_1+13)
That's assuming a 26-letter alphabet for your key and message space.
Unfortunately after 2 messages, your key wraps again, and you're back to the same problem you had before.
(there are other big problems too)
More generally, whatever simple thing you do, you give away information about the messages and, typically, about the key. The attacker can often set up a big system of equations and use linear algebra to solve for the key once you give them enough information.
However if you take the simple thing you're doing and make it more and more complex and eventually get to a point where
kn: the key for the nth message
kn = f(k,n) for some function f
such that an attacker cannot learn significant information about f(k,n) given f(k,m) for n != m, you've invented a stream cipher.
People do use stream ciphers all the time; they are not as secure as an OTP, but they are a core part of internet security.
The trick of course is figuring out a good function f; describing how to do that is beyond the margin of this question. (And besides, I don't actually have that skill.)
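For a sense of what using a real stream cipher looks like, here is a hedged Java sketch with the JDK's built-in ChaCha20 cipher (available since Java 11); the per-message 96-bit nonce plays the role of n above, and the class name is just illustrative:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.ChaCha20ParameterSpec;
import java.security.SecureRandom;

public class StreamCipherSketch {
    public static void main(String[] args) throws Exception {
        // One short (256-bit) key, reused across many messages
        SecretKey key = KeyGenerator.getInstance("ChaCha20").generateKey();

        // A unique 96-bit nonce per message -- the "n" that makes each keystream different
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);

        Cipher cipher = Cipher.getInstance("ChaCha20");
        cipher.init(Cipher.ENCRYPT_MODE, key, new ChaCha20ParameterSpec(nonce, 1));
        byte[] ciphertext = cipher.doFinal("message one".getBytes("utf-8"));
        // the receiver needs the key and the (non-secret) nonce to decrypt
    }
}

One short key plus a unique nonce per message gives each message its own keystream, which is exactly the f(k,n) property described above.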

Wanted: encryption scheme for copy protection purposes

I am tasked with implementing a dongle-based copy protection scheme for an application. I realize that no matter what I do, someone will crack it, but I want to at least make it a little more difficult than an if-statement checking whether a dongle is present.
My approach is to encrypt critical data that the application needs for proper execution. During runtime, the decryption key is retrieved from the dongle (our chosen model has some suitable API functions for that), the data is decrypted and the application is happy.
Of course, a determined attacker can intercept that decryption key and also get ahold of the decrypted data. That's ok. But what should be hard is to substitute their own data. So I'm looking for an encryption scheme where knowing the decryption key doesn't enable someone to encrypt their own data.
That's obviously asymmetric encryption. But for every such algorithm I found so far, the encryption (or public) key can be generated from the decryption (or private) key, which is exactly what I'm trying to avoid.
Note: simply signing the data won't help much, since (unless I'm totally misunderstanding such signatures) verifying the signature will just be another if-statement, which is easily circumvented.
So... any ideas?
The moment the private key is known to the attacker you won't have any secret information to differentiate yourself from the others.
To make it harder for the attacker: you might want to expire each (public key, private key) pair after an application-specific time T and generate a new pair based on the previous pair, both on the dongle and on your own machine, independently. This way the attacker needs constant access to the dongle to be able to encrypt his data with the new private key, or has to re-run his private_key_detection algorithm as often as every T.
You probably want to run the decryption on the dongle. There are a few pieces of hardware that help with this (I just googled this one, for example). There are likely many others. Dallas Semiconductor used to have a Java-powered iButton that would allow you to run code on a small dongle-like device, but I don't think they have it anymore.
Some of these allow you to execute code in the dongle. So maybe a critical function that is hard to recreate yet doesn't require high performance might work? Perhaps a license key validation algorithm.
Maybe you could include code in the dongle that has to be put into memory in order for the program to run. This would be a little harder to break, but might be hard to implement depending on what tools you are using to make your program.
You probably also want to study up on some anti-debugging subjects. I remember seeing a few publications a while back, but here is at least one. This is another layer that will make it harder to crack.
Dependency on an Internet connection may also be an option. You have to be careful here to not piss off your customers if they can't get your code to run without an Internet connection.
You can also check out FlexLM (or whatever it is called these days). It works, but it is a beast. They also try to negotiate a percentage of your company's gross profit for the license fee if I recall correctly (it's been years....I think we told them to stuff it when they asked for that.)
Good luck!
To answer my own question (somewhat): it is possible to do this with RSA, but most APIs (including the one in OpenSSL's crypto library) need to be "tricked" into doing it. The reason you can normally generate the public key, given the private key, is that
It is common practice for implementations of RSA to save p and q (those big prime numbers) in the private key data structure.
Since the public key (which consists of the modulus N and some exponent e) is public anyway, there's (usually) no point in choosing an obscure e. Thus, there are a handful of standard values that are used commonly, like 3 or 65537. So even if p and q are unknown, you might be able to "guess" the public exponent.
However, RSA is symmetrical in the sense that anything you encrypt with the public key can be decrypted with the private key and vice versa. So what I've done (I'm a monster) is to let the crypto library generate an RSA key. You can choose your own public exponent there, which will later be used to decrypt (contrary to the normal way). Then, I switch around the public and private exponent in the key data structure.
Some tips for anyone trying to do something similar with the crypto library:
In the RSA data structure, clear out everything but n and e / d, depending on whether you want to encrypt or decrypt with that particular key.
Turn off blinding with RSA_blinding_off(). Blinding requires the encryption exponent even when decrypting, which is not what we want here. Note that turning it off might open you up to some attacks.
If someone needs more help, leave a comment and I'll edit this post with more information.
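For readers not using OpenSSL, the same exponent-swapping idea can be sketched with the JDK's standard RSA classes. This is only an illustrative analogue (my assumption, not the answerer's code), and it is essentially the same operation a signature performs under the hood; padding choice and key size are arbitrary:

import javax.crypto.Cipher;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class SwappedRsaSketch {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // "Encrypt" with the private key -- this half never leaves your build machine
        Cipher enc = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        enc.init(Cipher.ENCRYPT_MODE, pair.getPrivate());
        byte[] protectedData = enc.doFinal("critical data".getBytes("utf-8"));

        // "Decrypt" with the public key -- this is all the shipped application/dongle holds,
        // and it cannot be used to forge new protectedData
        Cipher dec = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        dec.init(Cipher.DECRYPT_MODE, pair.getPublic());
        byte[] original = dec.doFinal(protectedData);
    }
}

The shipped side only ever sees the public half, from which the secret encryption exponent cannot be derived, so it can decrypt the critical data but cannot produce a substitute.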

Been advised to use same IV in AES implementation

We've had to extend our website to communicate user credentials to a supplier's website (in the query string) using AES with a 256-bit key; however, they are using a static IV when decrypting the information.
I've advised that the IV should not be static and that it is not in our standards to do that, but if they change it at their end we would incur the [big] costs, so we have agreed to accept this as a security risk and use the same IV (much to my extreme frustration).
What I wanted to know is: how much of a security threat is this? I need to be able to communicate this effectively to management so that they know exactly what they are agreeing to.
UPDATE: We are also using the same key throughout.
Thanks
Using a static IV is always a bad idea, but the exact consequences depend on the mode of operation in use. In all of them, the same plaintext will produce the same ciphertext, but there may be additional vulnerabilities: for example, in CFB mode, given a static key, the attacker can extract the keystream from a known plaintext and use it to decrypt subsequent strings!
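To illustrate that CFB point, here is a small Java sketch (key, IV, and credential strings are all invented for the demo) in which one known plaintext lets an eavesdropper read the first block of any other message sent with the same key and IV:

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CfbStaticIvDemo {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[32];   // static 256-bit key (all zeros, demo only)
        byte[] iv  = new byte[16];   // static IV

        byte[] known  = "user=alice&pw=aa".getBytes("utf-8");  // 16 bytes the attacker knows
        byte[] secret = "user=bob&pw=hunt".getBytes("utf-8");  // 16 bytes they want

        byte[] c1 = encrypt(key, iv, known);
        byte[] c2 = encrypt(key, iv, secret);

        // c1 XOR known = the first-block keystream, identical for every message
        byte[] recovered = new byte[16];
        for (int i = 0; i < 16; i++) {
            recovered[i] = (byte) (c2[i] ^ c1[i] ^ known[i]);
        }
        System.out.println(new String(recovered, "utf-8"));  // prints the secret credentials
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher c = Cipher.getInstance("AES/CFB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(plaintext);
    }
}

With CFB the keystream for the first 16-byte block depends only on the key and IV, so it repeats for every message once both are fixed; later blocks depend on the ciphertext and therefore differ.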
Using a static IV is always a bad idea. Using a static key is always a bad idea. I bet that your supplier had compiled the static key into their binaries.
Sadly, I've seen this before. Your supplier has a requirement that they implement encryption and they are attempting to implement the encryption in a manner that's as transparent as possible---or as "checkbox" as possible. That is, they aren't really using encryption to provide security, they are using it to satisfy a checkbox requirement.
My suggestion is that you see if the supplier would be willing to forsake this home-brewed encryption approach and instead run their system over SSL. Then you get the advantage of using a quality standard security protocol with known properties. It's clear from your question that neither your supplier nor you should be attempting to design a security protocol. You should, instead, use one that is free and available on every platform.
As far as I know (and I hope others will correct me if I'm wrong / the user will verify this), you lose a significant amount of security by keeping a static key and IV. The most significant effect you should notice is that when you encrypt a specific plaintext (say usernameA+passwordB), you get the same ciphertext every time.
This is great for pattern analysis by attackers, and seems like a password-equivalent that would give attackers the keys to the kingdom:
Pattern analysis: The attacker can see that the encrypted user+password combination "gobbbledygook" is used every night just before the CEO leaves work. The attacker can then leverage that information into the future to remotely detect when the CEO leaves.
Password equivalent: You are passing this username+password in the URL. Why can't someone else pass exactly the same value and get the same results you do? If they can, the encrypted data is a plaintext equivalent for the purposes of gaining access, defeating the purpose of encrypting the data.
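A concrete way to show management the first point: with a fixed key and IV, encrypting the same credentials twice gives byte-for-byte identical output. A minimal Java sketch (all values invented, CBC mode assumed):

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class StaticIvDemo {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[32];   // fixed 256-bit key (all zeros, demo only)
        byte[] iv  = new byte[16];   // fixed IV

        byte[] c1 = encrypt(key, iv, "usernameA+passwordB");
        byte[] c2 = encrypt(key, iv, "usernameA+passwordB");

        System.out.println(Arrays.equals(c1, c2));  // true: identical ciphertext every time
    }

    static byte[] encrypt(byte[] key, byte[] iv, String plaintext) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(plaintext.getBytes("utf-8"));
    }
}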
What I wanted to know is, how much of a security threat is this? I need to be able to communicate this effectively to management so that they know exactly what they are agreeing to.
A good example of re-using the same nonce is Sony vs. Geohot (on a different algorithm, though); you can see how that turned out for Sony :) To the point: using the same IV might have mild or catastrophic consequences depending on the AES mode of operation you use. If you use CTR mode, then everything you encrypted is as good as plaintext. In CBC mode, messages that begin with the same plaintext will produce the same initial ciphertext blocks, so an observer can tell when the same data is being sent.

Would CSPRNG + XOR be a secure encryption method?

Similarly to RC4 (RC4 PRNG + XOR), would it be secure to use another CSPRNG (cryptographically secure pseudorandom number generator), such as ISAAC or Blum Blum Shub, instead of RC4's, and XOR the data with the resulting keystream?
Essentially this is just using Blum Blum Shub (or whatever PRNG) as a stream cipher. That isn't how they're designed to be used, and they might be vulnerable to attacks that make sense in a stream-cipher context but not in a CSPRNG context (e.g. related-key attacks).
If this is what you want, you're better off just using a modern stream cipher. For example, DJB's Salsa20 is well-regarded.
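For reference, the construction being asked about looks roughly like this in Java, using a seeded SecureRandom purely to illustrate the keystream idea (as the answer above says, use a real stream cipher instead; the deterministic seeding behaviour shown here is provider-specific, and the strings and class name are made up):

import java.security.SecureRandom;

public class PrngXorSketch {
    public static void main(String[] args) throws Exception {
        byte[] key  = "a shared secret seed".getBytes("utf-8");
        byte[] data = "a string to be encrypted".getBytes("utf-8");

        // Seed a deterministic PRNG with the key and draw a keystream from it.
        // SHA1PRNG only behaves deterministically like this with the SUN provider,
        // and only when setSeed() is called before any output is requested.
        SecureRandom prng = SecureRandom.getInstance("SHA1PRNG");
        prng.setSeed(key);
        byte[] keystream = new byte[data.length];
        prng.nextBytes(keystream);

        // XOR the keystream into the data; running the same code again decrypts
        for (int i = 0; i < data.length; i++) {
            data[i] ^= keystream[i];
        }
    }
}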
Well, it depends.
Most encryption algorithms do significantly more than XOR. But that's because the key is shorter than the plaintext. If the key is as large as the plaintext, and truly random, then it is impossible to crack it (it's called a One Time Pad).
So, you need to explain more.
But I'm going to guess that your key length is not the same as your input length, and that even if it were, the random number service you are using is almost certainly not truly secure, so I'd advise against your approach (furthermore, it goes without saying (maybe) that the big problem with an OTP is key exchange).
Swapping out the CSPRNG in this scheme would probably be just as secure, and have the exact same set of assumptions, weaknesses and practical issues.

What is the difference between Obfuscation, Hashing, and Encryption?

What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
Hashing is a technique of creating semi-unique keys based on larger pieces of data. In a given hash you will eventually have "collisions" (i.e. two different pieces of data hashing to the same value), and when you do, you typically move to a larger hash size.
Obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security like "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private-key (symmetric) encryption, where both parties share the same secret key. Public-key encryption instead uses a freely shared key to encrypt and a private recipient key to decrypt; with public-key encryption, only the recipient needs to keep a secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle: for the same input X, you always receive the same output Y, which is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get is one-way functions: H(X) = Y where inverting H to recover X from Y is so difficult that you're better off trying to brute-force a Z such that H(Z) = Y.
Obfuscation (my opinion) - any function f such that f(a) = b, where you rely on f being kept secret. f may be a hash function, but the "obfuscation" part implies security through obscurity. If you had never seen ROT13 before, it would be obfuscation to you.
Encryption - E_k(X) = Y and D_l(Y) = X, where E and D are known to everyone. k and l are keys; they may be the same (in symmetric encryption they are the same). Y is the ciphertext, X is the plaintext.
A hash is a one way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords, and you can also find it on your receipt if you shop using a credit card. There you will find your credit card number with some digits hidden; this way you can prove with high probability that your card was used to buy the stuff, while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "the first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function obviously cannot be reversed, which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash; this is called a collision. So again: a hash only proves with a certain probability that the two compared values are the same.
Another very naive and simple hash is a number modulo 5; here you will see that 6, 11, 16, etc. will all have the same hash: 1.
Modern hash algorithms are designed to keep the number of collisions as low as possible, but they can never be completely avoided. A rule of thumb is: the longer your hash is, the fewer collisions it has.
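The toy examples above translate directly into code. A short Java sketch contrasting the naive "first 3 letters" hash with a real one (the class and method names are mine, just for illustration):

import java.security.MessageDigest;

public class NaiveHashDemo {
    // The naive "first 3 letters of a string" hash from the answer above
    static String first3(String s) {
        return s.substring(0, 3);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(first3("abcdefg"));  // "abc"
        System.out.println(first3("abcxyz"));   // "abc" -- a collision

        // A modern hash: 32 bytes of output, one-way, collisions astronomically unlikely
        byte[] digest = MessageDigest.getInstance("SHA-256").digest("abcdefg".getBytes("utf-8"));
    }
}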
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from another, using a set algorithm. Depending on the algorithm used, this may be one way, may not be.
Obfuscating is making something harder to read by symbol replacement.
Encryption is like hashing, except the value is dependent on another value you provide to the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one-way function, and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA-256, etc.
Obfuscation - modifying your data/code to confuse anyone else (no real protection). This may or may not lose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except that obfuscation is not really similar to encryption; sometimes it doesn't even involve a cipher at all, not even one as simple as ROT13.
Hashing is one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
Obfuscation is making something unreadable without changing its semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way, so it's impossible to get the starting value back.
Encryption is two-way, and there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can serve as a reference to the input. It is not necessarily unique; a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by introducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing is a one-way transformation with no key involved, while encryption uses a formula requiring key(s) to encrypt and decrypt. For example, anyone can decode Base64-encoded data (it's just an encoding, neither hashing nor encryption), no one can recover the original input from an MD5 hash, and no one can decrypt AES-encrypted data without the key.
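To make that distinction concrete, a short Java sketch (class name and sample string are mine): Base64 can be reversed by anyone, a hash cannot be reversed at all, and encryption can only be reversed with the key:

import java.security.MessageDigest;
import java.util.Base64;

public class EncodingVsHashing {
    public static void main(String[] args) throws Exception {
        byte[] input = "some secret".getBytes("utf-8");

        // Base64 is just an encoding: anyone can reverse it, no key involved
        String encoded = Base64.getEncoder().encodeToString(input);
        byte[] decoded = Base64.getDecoder().decode(encoded);  // original bytes back

        // A hash is one-way: there is nothing to "decode", with or without a key
        byte[] digest = MessageDigest.getInstance("MD5").digest(input);

        // Encryption (e.g. the AES examples earlier on this page) is reversible,
        // but only for someone who holds the key
    }
}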

Resources