I've been reading up on file encryption lately, and in many places I've seen warnings that encrypted files are susceptible to decryption by anyone so inclined, regardless of the strength of the encryption algorithm.
However, I can't get my head around how someone would go about attempting to decrypt an encrypted file.
For example, let's say you've got an encrypted file and you'd like to know its contents. You have no idea what key was used to encrypt the file, nor which encryption algorithm was used. What do you do? (Assume for this example that the encryption algorithm is a symmetric-key algorithm such as AES-256, i.e. a file encrypted with a key which requires that same key to decrypt it.)
Additionally, how would your approach change if you knew the encryption algorithm used? (Assume in this case that the encryption algorithm used is AES-256, with a random key + salt).
There are two ways to answer this question: in the literal sense of how a perfect cryptosystem is attacked, and in terms of how real-world systems are attacked. One of the biggest problems you'll find as you begin to learn more about cryptography is that selecting algorithms is the easy part. It's how you manage the keys that becomes impossibly difficult.
The way in which you attack the basic primitives depends on the type of algorithm. In the case of data encrypted by symmetric ciphers like AES, you use brute-force attacks. That is, you try every possible key until you find the right one. Unfortunately for the attacker, barring changes in the laws of physics, trying every possible 256-bit key can't be done. From Wikipedia: "A device that could check a billion billion (10^18) AES keys per second would in theory require about 3×10^51 years to exhaust the 256-bit key space"
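The Wikipedia figure is easy to check with a few lines of arithmetic. This is only a sketch: it assumes the quoted rate of 10^18 keys per second and a 365.25-day year, and the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

def years_to_exhaust(key_bits, keys_per_second):
    """Years needed to try every key of the given size at a fixed rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR

# Compare a DES-sized key with AES-128 and AES-256 at the same rate.
for bits in (56, 128, 256):
    print(f"{bits:>3}-bit key: {years_to_exhaust(bits, 10**18):.2e} years")
```

On average the right key turns up after searching half the space, which does not change the conclusion.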
The problem with your question about coming across a seemingly encrypted file, with no knowledge of the methods used, is that it's a bit of a hard problem known as a Distinguishing Attack. One of the requirements of all modern algorithms is that their output should be indistinguishable from random data. If I encrypt something under both AES and Twofish, and then give you some random data, absent any other information like headers, there's no way for you to tell them apart. That being said....
You asked how knowledge of the algorithm changes the approach. One assumption cryptographers usually make is that knowledge of the algorithm shouldn't affect security at all; it should all depend on the secret key. Usually whatever protocol you're working with will tell you the algorithm specifications. If this wasn't public, interoperability would be a nightmare. Cipher suites, for example, are sets of algorithms that protocols like SSL support. NIST FIPS and the NSA Suite B are sets of algorithms that have been standardized by the US Federal Government and that almost everyone follows.
In practice though, most crypto-systems have much larger problems.
Bad random number generation: Cryptography requires very good, unpredictable random number generators. Bad random number generators can completely collapse security, as in the case of Netscape's SSL implementation. You also have examples like the Debian OpenSSL RNG bug, where a developer changed code to silence a warning about uninitialized memory, which ultimately left affected Debian systems generating keys from only a tiny set of possible values.
Timing Attacks: Certain operations take longer to execute on a computer than others. Sometimes, attackers can observe this latency and deduce secret values. This has been demonstrated by remotely recovering a server's private key over a local network.
Attacks against the host: One way to attack a cryptosystem is to attack the host. If you cool the memory chips, their contents are preserved long enough to be inspected in a machine you control (a cold boot attack).
Rubber hose cryptanalysis: Maybe one of the easiest attacks: you threaten the party with physical harm or incarceration unless they reveal the key. There has been a lot of interesting case law on whether or not courts can force you to reveal crypto keys.
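To make the random number generation point concrete, here is a toy sketch of what happens when a key generator is seeded from a tiny space. It is not the Debian or Netscape code; the 15-bit seed (standing in for something like a process ID) and both helper names are assumptions for illustration.

```python
import random

SEED_SPACE = 2 ** 15  # assumption: the only entropy is a 15-bit value

def weak_keygen(seed):
    """Derive a 256-bit 'key' from a predictable, low-entropy seed."""
    rng = random.Random(seed)
    return rng.getrandbits(256).to_bytes(32, "big")

def recover_seed(key):
    """Enumerate every possible seed until the key is reproduced."""
    for seed in range(SEED_SPACE):
        if weak_keygen(seed) == key:
            return seed
    return None

victim_key = weak_keygen(12345)
print(recover_seed(victim_key))
```

The key is 256 bits long, but the attacker only has to search 32,768 seeds. For real keys, Python's `secrets.token_bytes(32)` draws from the operating system's CSPRNG instead.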
AES256 is effectively unbreakable.
From http://www.wilderssecurity.com/showthread.php?t=212324:
I don't think there's any credible speculation that any agency can break a properly implemented AES. There are no known cryptanalytic attacks, and actually bruteforcing AES-256 is probably beyond human capabilities within any of our lifetimes. Let's assume that 56 bit DES can be bruteforced in 1 sec, which is a ridiculous assumption to begin with. Then AES-256 would take 2^200 seconds, which is 5 x 10^52 years. So, you can see that without any known weakness in AES, it would be a total impossibility within any of our lifetimes, even with quantum computing. Our sun will explode, approximately 5 billion years from now, before we obtain enough computing power to bruteforce AES-256 without a known weakness. IF a weakness in AES is never found, there is absolutely no reason to ever look for another cipher besides AES. It will suffice for as long as humans occupy the planet.
With a basic brute-force attack, for example. You ask a piece of software to try every single combination from 1 to 15 characters drawn from a-z, A-Z and 0-9, and wait.
The software will start with 0 through 9, then 0a, 0b, 0c and so on until it finds the password. Wikipedia will give you more detail.
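A rough sketch of that enumeration, using only the standard library. The function names are hypothetical, and the order in which candidates are tried depends on the tool; this one simply goes shortest first in alphabet order.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def candidates(max_len, alphabet=ALPHABET):
    """Yield every string of length 1..max_len over the alphabet, shortest first."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)

def search_space(max_len, alphabet_size=len(ALPHABET)):
    """Number of candidates of length 1..max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))

print(list(itertools.islice(candidates(2), 3)))
print(f"{search_space(15):.2e} candidates up to 15 characters")
```

The count is the point: the generator is trivial to write, but the "and wait" part grows by a factor of 62 with every extra character.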
I partially agree with Andrew and partially with Jeremy.
If the encryption key is generated correctly (randomly generated, or derived from a complex password with a good key derivation function and a random salt), then AES-256 is effectively unbreakable (as Andrew said).
On the other hand, if the key isn't generated correctly, for example as a straight hash of a 4-digit PIN, brute force can be very efficient.
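Here is a small sketch of the PIN case. Python's standard library has no AES, so the cipher below is a toy XOR stream built from SHA-256 and is only a stand-in; the PIN, the file contents, and all function names are assumptions. The attack itself does not care which cipher is used, only that there are 10,000 possible keys and a recognizable plaintext.

```python
import hashlib

def toy_encrypt(key, data):
    """Toy XOR stream cipher (SHA-256 in counter mode). NOT AES; illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself

def key_from_pin(pin):
    """The weak scheme from the text: key = plain hash of a 4-digit PIN."""
    return hashlib.sha256(pin.encode()).digest()

def crack(ciphertext, known_prefix):
    """Try all 10,000 PINs; accept the one whose plaintext starts as expected."""
    for n in range(10_000):
        pin = f"{n:04d}"
        if toy_decrypt(key_from_pin(pin), ciphertext).startswith(known_prefix):
            return pin
    return None

secret = toy_encrypt(key_from_pin("4271"), b"%PDF-1.4 confidential report")
print(crack(secret, b"%PDF-1."))
```

A salted, deliberately slow derivation such as `hashlib.pbkdf2_hmac` raises the cost of each guess, but it cannot rescue a secret that only has 10,000 possible values.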
Regarding "You have no idea what the key used to encrypt the file is, nor the encryption algorithm used. "
In most cases, encrypted files have a header or a footer which specifies something (the application used to encrypt the file, the encryption algorithm, or something else).
You can also try to figure out the algorithm from the padding: for example, 3DES pads to 8-byte blocks and AES to 16-byte blocks, so the ciphertext length can hint at the block size.
I don't know much about the heavy math behind cryptosystems. I get stuck when it gets into Z/nZ algebra, and sometimes with all those exponents of exponents. It's not that I don't like it, it's just that the information you find on the web is not easy to follow blindly.
I was wondering: how reliable can an algorithm be when it encodes a message into plain binary? If my algorithm is arbitrary and known only to me, how can a cryptanalyst study an encrypted file and decrypt it, with or without having the decoded file?
I'm thinking about not using ASCII text to code my message, and I have some ideas to make this algorithm/program.
Attacking an AES or Blowfish encrypted file is more trivial for a cryptanalyst than if the algorithm the file is encrypted with is unknown to him, but how does he proceed then?
I don't know if I understood correctly, but a CS teacher once told me that codes are harder to crack than encrypted ciphers.
What do you think?
Attacking an AES or Blowfish encrypted file is more trivial for a cryptanalyst than if the algorithm the file is encrypted with is unknown to him...
What about:
Attacking an untested, self-written algorithm with no real research behind it is more trivial for a cryptanalyst than if the algorithm the file is encrypted with is a well-known and proven one that has been used correctly...
In short, DO NOT roll your own cryptography unless you're an expert; no, unless you're part of an expert group in that field.
Nintendo failed when they implemented RSA on their own in the Wii, and Sony failed too with ECDSA in the PS3 (they pretty much used XKCD's random number function for M...)
And you really think you can win by using security by obscurity?
PS: That doesn't mean that you should take the Wikipedia entry on RSA and roll your own implementation from that (that's exactly where Sony and Big-N failed); no, use a tested, open source implementation.
You seem to be using two words interchangeably but remember that Encoding is Not Encryption
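A quick illustration of the difference, using Base64 as the example encoding (my choice, not something from the question): there is no key anywhere, so anyone who recognizes the format can reverse it.

```python
import base64

message = b"attack at dawn"
encoded = base64.b64encode(message)

# No key is involved: anyone who recognizes Base64 can reverse it.
decoded = base64.b64decode(encoded)
print(encoded, decoded == message)
```

The same goes for any home-made binary layout used instead of ASCII: it changes the representation, not the secrecy.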
When the attacker has no idea which algorithm you used, and the algorithm is safe, the cryptanalyst has a hard job. So it doesn't matter whether you use AES or your own cipher, as long as yours is as strong and safe as AES. Here is the but: cryptography is demanding, and there are many ways to shoot yourself in the foot without knowing it. I would suggest using standard algorithms, maybe with some safe variations.
Common wisdom is that you should not build your own algorithms, and especially not rely on these algorithms remaining secret.
The conceptual reason is that good encryption is about quantified confidentiality. We do not want our secrets to get cracked, but in a more precise way we want to be able to tell how much it would cost to crack our secrets (and hopefully show that the cost is way too high to be envisioned by any entity on Earth). This is the real advance which occurred a few years after World War II: to understand the distinction between key and algorithm. The key concentrates the secret. The algorithm becomes the implementation.
Since the implementation is, well, implemented, it exists as some code or a device, which is tangible and stored even when it is not used. Keeping an implementation secret requires keeping track of the hard disk on which the code resides at all times. If the attacker sees the binary code, he may be able to reverse-engineer it, something which depends on his wits and patience. The point here is that it is very difficult to be able to say: "it costs X dollars to recover a description of the algorithm".
On the other hand, the key is short. It can be stored safely much more easily; e.g. you could memorize it, and avoid committing it to any permanent storage device. You then have to worry about your key only at times when you use it (and not when you do not, e.g. in the middle of the night, when you sleep). The number of possible keys is a simple mathematical problem. You can easily and accurately estimate the average cost of enumerating the possible keys until your key is found. The key is a sturdy foundation for quantified security.
So you should not roll your own algorithms because then you do not know how much security you get.
Also, most people who rolled their own algorithms found out, usually the hard way, that they did not get much security at all. Designing a good encryption algorithm is hard, because it cannot be automatically tested. Your code may run, and properly decrypt data that it encrypted, but it tells you nothing about how secure the algorithm is. The design of the AES was the result of a process which took several years and involved hundreds of skilled cryptographers (most of whom had a PhD and years of experience in academic research on symmetric encryption). That a lone developer could do as well, let alone better, in the secrecy of his own workshop, looks kind of... implausible.
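The "it runs and decrypts correctly" trap can be shown with a deliberately naive example. The cipher, key, and message below are invented for illustration: a single-byte XOR passes its own round-trip test and still falls to a search of 256 keys plus one guessed word.

```python
def homemade_encrypt(key, data):
    """A 'secret' homemade cipher: XOR every byte with one key byte."""
    return bytes(b ^ key for b in data)

homemade_decrypt = homemade_encrypt  # XOR is its own inverse

def break_it(ciphertext, crib):
    """Try all 256 keys and keep those whose output contains a likely word."""
    return [k for k in range(256) if crib in homemade_decrypt(k, ciphertext)]

plaintext = b"Dear Alice, the meeting is at noon."
ciphertext = homemade_encrypt(0x5A, plaintext)

# The functional test passes, which says nothing about security.
assert homemade_decrypt(0x5A, ciphertext) == plaintext
print(break_it(ciphertext, b"the "))
```

Real homemade designs are usually more elaborate than this, but the lesson carries over: a passing round-trip test measures correctness, not strength.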
The biggest part of your strategy is called "security through obscurity." You're making the gamble that, since nobody knows the precise details of your little variation on an idea, they won't be able to figure it out.
I'm not a security expert, but I can tell you that you probably won't come up with something incredibly new. Cryptography has been studied by people for millennia and your idea is highly unlikely to be original. Even if you're a relatively good programmer and code something really tricky, the question will come down to who you're up against. If you're just trying to protect your data from your kid sister, then it will probably be fine. On the other hand, if you're using it to send credit card numbers across the internet, then you're doomed to fail. It will be analysed in ways you didn't think of or don't know, and ultimately cracked.
Another way to think of it: algorithms like AES have been extensively studied by professionals in the field and their level of security is pretty well understood. Anything you come up with by yourself will not have the benefit of having been attacked by the best and brightest minds out there. You will have almost no idea of how good it actually is until people start reporting identity theft.
I'm currently investigating the use of curve25519 for signing. Original distribution and a C implementation (and a second C implementation).
Bernstein suggests using ECDSA for this, but I could not find any code.
ECDSA is specified by ANSI X9.62. That standard defines the kind of curves on which ECDSA is defined, including detailed curve equations, key representations and so on. These do not match Curve25519: part of the optimizations which make Curve25519 faster than standard curves of the same size rely on the special curve equation, which does not enter in X9.62 formalism. Correspondingly, there cannot be any implementation of ECDSA which both conforms to ANSI X9.62, and uses Curve25519. In practice, I know of no implementation of an ECDSA-like algorithm on Curve25519.
To be brief, you are on your own. You may want to implement ECDSA over the Curve25519 implementation by following X9.62 (there is a draft from 1998 which can be downloaded from several places, e.g. there, or you can spend a hundred bucks and get the genuine 2005 version from Techstreet). But be warned that you are walking outside of the carefully trodden paths of analyzed cryptography; in other words I explicitly deny any kind of guarantee on how secure that kind-of-ECDSA would be.
My advice would be to stick to standard curves (such as NIST P-256). Note that while Curve25519 is faster than most curves of the same size, smaller standard curves will be faster, and yet provide adequate security for most purposes. NIST P-192, for instance, provides "96-bit security", somewhat similar to 1536-bit RSA. Also, standard curves already provide performance on the order of several thousand signatures per second on a small PC, and I have trouble imagining a scenario where more performance is needed.
To use Curve25519 for this, you'd have to implement a lot of functions that AFAIK aren't currently implemented anywhere for this curve, which would mean getting very substantially into the mathematics of elliptic curve cryptography. The reason is that the existing functions throw away the "y" coordinate of the point and work only with the "x" coordinate. Without the "y" coordinate, the points P and -P look the same. That's fine for ECDH which Curve25519 is designed for, because |x(yG)| = |x(-yG)|. But for ECDSA you need to calculate aG + bP, and |aG + bP| does not in general equal |aG - bP|. I've looked into what would be involved in extending curve25519-donna to support such calculations; it's doable, but far from trivial.
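The x-only problem can be seen on a toy curve. Everything in this sketch is an assumption chosen so the numbers stay small: a short Weierstrass curve y^2 = x^3 + 2x + 3 over GF(97) with base point (3, 6). It is not Curve25519 and not the Montgomery form that curve uses; it only shows that P and -P share an x coordinate while G + P and G - P do not.

```python
# Toy curve y^2 = x^3 + 2x + 3 over GF(97); NOT Curve25519, illustration only.
P_MOD, A = 97, 2
G = (3, 6)  # on the curve: 6^2 = 36 = 27 + 6 + 3

def neg(pt):
    """-P has the same x and the negated y."""
    return None if pt is None else (pt[0], -pt[1] % P_MOD)

def add(p, q):
    """Affine point addition; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p)
    if p == q:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    return (x3, (slope * (x1 - x3) - y1) % P_MOD)

P = add(G, G)
print(P[0] == neg(P)[0])                # x alone cannot tell P from -P
print(add(G, P)[0], add(G, neg(P))[0])  # x of G + P versus x of G - P
```

With only x coordinates in hand, a verifier cannot know which of the two sums it is supposed to compute, which is why ECDH gets away with x-only arithmetic and an ECDSA-style check does not. (`pow(x, -1, m)` needs Python 3.8 or later.)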
Since what you need most of all is fast verification, I recommend Bernstein's Rabin-Williams scheme.
I recently shared the curve25519 library that I developed awhile back. It is hosted at https://github.com/msotoodeh and provides more functionality, higher security as well as higher performance than any other portable-C library I have tested with. It outperforms curve25519-donna by a factor of almost 2 on 64-bit platforms and a factor of almost 4 on 32-bit targets.
Today, many years after this question was asked, the correct answer is the signature scheme Ed25519.
Imagine you have a channel of communication that is inherently lossy and one-way. That is, there is some inherent noise that is impossible to remove that causes, say, random bits to be toggled. Also imagine that it is one way - you cannot request retransmission.
But you need to send data over it regardless. What techniques can you use to send numbers and text over that channel?
Is it possible to encode numbers so that even with random bit twiddling they can still be interpreted as values close to the original (lossy transmission)?
Is there a way to send a string of characters (ASCII, say) in a lossless fashion?
This is just for fun. I know you can use morse code or any very low frequency binary communication. I know about parity bits and checksums to detect errors and retrying. I know that you might as well use an analog signal. I'm just curious if there are any interesting computer-sciency techniques to send this stuff over a lossy channel.
Depending on some details that you don't supply about your lossy channel, I would recommend, first using a Gray code to ensure that single-bit errors result in small differences (to cover your desire for loss mitigation in lossy transmission), and then possibly also encoding the resulting stream with some "lossless" (==tries to be loss-less;-) encoding.
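For reference, the binary-reflected Gray code is only a couple of lines. This sketch shows the conversion in both directions; the function names are mine.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for value in range(8):
    print(value, format(to_gray(value), "03b"))
```

Note that the guarantee runs one way: neighbouring values always differ in exactly one bit, but a flipped high-order bit can still move the decoded value a long way, so how much this helps depends on which bits the channel tends to hit.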
Reed-Solomon and variants thereof are particularly good if your noise episodes are prone to occur in small bursts (several bit mistakes within, say, a single byte), which should interoperate well with Gray coding (since multi-bit mistakes are the killers for the "loss mitigation" aspect of Gray, designed to degrade gracefully for single-bit errors on the wire). That's because R-S is intrinsically a block scheme, and multiple errors within one block are basically the same as a single error in it, from R-S's point of view;-).
R-S is particularly awesome if many of the errors are erasures -- to put it simply, an erasure is a symbol that has most probably been mangled in transmission, BUT for which you DO know the crucial fact that it HAS been mangled. The physical layer, depending on how it's designed, can often have hints about that fact, and if there's a way for it to inform the higher layers, that can be of crucial help. Let me explain erasures a bit...:
Say for a simplified example that a 0 is sent as a level of -1 volt and a 1 is sent as a level of +1 volt (wrt some reference wave), but there's noise (physical noise can often be well-modeled, ask any competent communication engineer;-); depending on the noise model the decoding might be that anything -0.7 V and down is considered a 0 bit, anything +0.7 V and up is considered a 1 bit, anything in-between is considered an erasure, i.e., the higher layer is told that the bit in question was probably mangled in transmission and should therefore be disregarded. (I sometimes give this as one example of my thesis that sometimes abstractions SHOULD "leak" -- in a controlled and architected way: the Martelli corollary to Spolsky's Law of Leaky Abstractions!-).
An R-S code with any given redundancy ratio can be about twice as effective at correcting erasures (errors the decoder is told about) as it can be at correcting otherwise-unknown errors -- it's also possible to mix both aspects, correcting both some erasures AND some otherwise-unknown errors.
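The thresholding in that example comes down to a three-way decision. A minimal sketch, using the 0.7 V figure from the paragraph above; the function name and sample voltages are made up.

```python
def slice_bit(voltage, threshold=0.7):
    """Map a received voltage to 0, 1, or None (erasure) using the +/-0.7 V rule."""
    if voltage <= -threshold:
        return 0
    if voltage >= threshold:
        return 1
    return None  # too close to call: flag it for the decoder

received = [-1.02, 0.95, 0.31, -0.88, -0.12, 1.10]
print([slice_bit(v) for v in received])
```

The `None` entries are exactly the hints a Reed-Solomon decoder can exploit, because it then knows where the damage is and only has to work out the value.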
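The "twice as effective" remark corresponds to the usual Reed-Solomon budget: an (n, k) code can handle e unknown errors and s erasures as long as 2e + s <= n - k. A small sketch of that check, with RS(255, 223) used purely as a familiar example parameter set:

```python
def rs_can_correct(n, k, errors, erasures):
    """True if an (n, k) Reed-Solomon code can fix this mix: 2*errors + erasures <= n - k."""
    return 2 * errors + erasures <= n - k

# RS(255, 223) has 32 parity symbols per block.
print(rs_can_correct(255, 223, errors=16, erasures=0))
print(rs_can_correct(255, 223, errors=0, erasures=32))
print(rs_can_correct(255, 223, errors=10, erasures=12))
print(rs_can_correct(255, 223, errors=17, erasures=0))
```

Each unknown error costs two parity symbols (one to locate it, one to fix it), while an erasure costs only one.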
As the cherry on top, custom R-S codes can be (reasonably easily) designed and tailored to reduce the probability of uncorrected errors to below any required threshold θ given a precise model of the physical channel's characteristics in terms of both erasures and undetected errors (including both probability and burstiness).
I wouldn't call this whole area a "computer-sciency" one, actually: back when I graduated (MSEE, 30 years ago), I was mostly trying to avoid "CS" stuff in favor of chip design, system design, advanced radio systems, &c -- yet I was taught this stuff (well, the subset that was already within the realm of practical engineering use;-) pretty well.
And, just to confirm that things haven't changed all that much in one generation: my daughter just got her MS in telecom engineering (strictly focusing on advanced radio systems) -- she can't design just about any serious program, algorithm, or data structure (though she did just fine in the mandatory courses on C and Java, there was absolutely no CS depth in those courses, nor elsewhere in her curriculum -- her daily working language is matlab...!-) -- yet she knows more about information and coding theory than I ever learned, and that's before any PhD level study (she's staying for her PhD, but that hasn't yet begun).
So, I claim these fields are more EE-y than CS-y (though of course the boundaries are ever fuzzy -- witness the fact that after a few years designing chips I ended up as a SW guy more or less by accident, and so did a lot of my contemporaries;-).
This question is the subject of coding theory.
Probably one of the better-known methods is to use a Hamming code. It might not be the best way of correcting errors on large scales, but it's incredibly simple to understand.
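To show how simple it is, here is a sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. The bit layout (parity at positions 1, 2 and 4) is the textbook one; the function names are mine.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1                     # the channel flips one bit
print(hamming74_decode(codeword))
```

Two flipped bits in the same 7-bit block will be miscorrected, which is why burstier channels call for something stronger.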
There is the redundant encoding used in optical media that can recover bit-loss.
ECC is also used in hard-disks and RAM
The TCP protocol can handle quite a lot of data loss with retransmissions.
Either turbo codes or low-density parity-check (LDPC) codes for general data, because these come closest to approaching the Shannon limit - see Wikipedia.
You can use Reed-Solomon codes.
See also the Sliding Window Protocol (which is used by TCP).
Although this includes dealing with packets being re-ordered or lost altogether, which was not part of your problem definition.
As Alex Martelli says, there's lots of coding theory in the world, but Reed-Solomon codes are definitely a sweet spot. If you actually want to build something, Jim Plank has written a nice tutorial on Reed-Solomon coding. Plank has a professional interest in coding with a lot of practical expertise to back it up.
I would go for some of these suggestions, followed by multiple sendings of the same data. So that way you can hope for different errors to be introduced at different points in the stream, and you may be able to infer the desired number a lot easier.
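The simplest version of "send it several times" is a three-way majority vote on every bit. A minimal sketch, assuming three equal-length copies arrive; the sample data and error positions are invented.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three equal-length byte strings."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))

original = b"42"
copy1 = bytes([original[0] ^ 0b00000100, original[1]])  # bit error in byte 0
copy2 = bytes([original[0], original[1] ^ 0b01000000])  # bit error in byte 1
copy3 = original                                         # arrived intact

print(majority_vote(copy1, copy2, copy3))
```

This recovers the data as long as no single bit position is damaged in two of the three copies, at the cost of tripling the transmission.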
Triple DES or RC4?
I have the choice to employ either one.
As a high level view the following comments on both should be useful.
It is extremely easy to create a protocol based on RC4 (such as WEP) that is of extremely low strength (breakable with commodity hardware in minutes counts as extremely weak).
Triple DES is not great in that its strength comes through excessive CPU effort, but it is of considerably greater strength (both theoretically and in real-world attacks) than RC4, so it should be the default choice.
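One of the simplest ways an RC4-based protocol goes wrong is keystream reuse, which WEP's short IVs made likely (it had other problems too). The sketch below is a plain textbook RC4 written out for illustration, not something to deploy; the key and messages are made up.

```python
def rc4(key, data):
    """Plain RC4: key scheduling, then XOR data with the generated keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                          # pseudo-random generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def xor(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))

key = b"shared-key"
p1 = b"transfer 100 to alice"
p2 = b"transfer 999 to mallo"
c1, c2 = rc4(key, p1), rc4(key, p2)

# With a repeated keystream the key cancels out of c1 XOR c2.
print(xor(c1, c2) == xor(p1, p2))
```

An eavesdropper who knows or guesses one plaintext gets the other for free, without ever touching the key. Block ciphers have their own misuse modes, but this particular one is especially easy to hit with a stream cipher.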
Going somewhat deeper:
In the field of encryption, with no well-defined target application, the definition of 'best' is inherently hard to pin down since the 'fitness' of an algorithm is multivariate.
Ease of implementation
Can you run it on commodity hardware?
Are implementations subject to accidental flaws that significantly reduce security while still allowing 'correctness' of behaviour?
Cost of implementation
Power/silicon/time to encode/decode.
Effort to break
Brute Force resilience. Pretty quantifiable
Resistance to cryptanalysis. Less quantifiable: you might think an algorithm is resistant, but perhaps the right person hasn't had a try yet :)
Flexibility
Can you trade off one of the above for another?
What's the maximum key size (and thus the upper limit of brute-force effort)?
What sort of input size is required to get decent encryption? Does it require salting?
Actually working out the effort to break itself requires a lot of time and effort, which is why you (as a non cryptographer) go with something already done rather than roll your own. It is also subject to change over time, hopefully solely as a result of improvements in the hardware available rather than fundamental flaws in the algorithm being discovered.
The core overriding concern is of course just that: is it secure? It should be noted that many older algorithms previously considered secure are no longer in that category. Some are so effectively broken that their use is simply pointless: you have no security whatsoever, simply obscurity (useful but in no way comparable to real security).
Currently neither of the core algorithms of RC4 and TDES is in that category but the naive implementation of RC4 is considered extremely flawed in protocols where the message data can be forced to repeat. RC4 has several more significant theoretical flaws than TDES.
That said TDES is NOT better than RC4 in all the areas listed above. It is significantly more expensive to compute (and that expensiveness is not justified, other less costly crypto systems exist with comparable security to TDES)
Once you have a real world application you normally get one or both of the following:
Constraints on your hardware to do the task
Constraints imposed by the data you are encrypting (this is used to transmit data which needs to be kept secret only for X days... for example)
Then you can state, with tolerances and assumptions, what can achieve this (or if you simply can't) and go with that.
In the absence of any such constraints we can only give you the following:
Ease of implementation
Both have publicly available secure free implementations for almost any architecture and platform available.
RC4 implementations may not be as secure as you think if the message can be forced to repeat (see the WEP issues). Judicious use of salting may reduce this risk, but this will NOT have been subject to the rigorous analysis that the raw implementations have been, and as such should be viewed with suspicion.
Cost of implementation
I have no useful benchmarks for RC4 (it is OLD). http://www.cryptopp.com/benchmarks.html has some useful guides to put TDES in context with RC5, which is slower than RC4 (TDES is at least an order of magnitude slower than RC4). For comparison, RC4 can encrypt a stream at approximately 7 cycles per byte in a fast implementation on modern x86 processors.
Effort to break
Brute Force resilience of TDES is currently believed to be high, even in the presence of many encryption outputs.
RC4 brute force resilience is orders of magnitude lower than TDES and further is extremely low in certain modes of operation (failure to discard initial bits of stream)
Resistance to cryptanalysis: there are publicly known flaws for Triple DES, but they do not reduce its effectiveness against realistic attack in the next decade or two. The same is not true for RC4, where several flaws are known, and combined they have produced reliable attacks on several protocols based on it.
Flexibility
TDES has very little flexibility (and your library may not expose what options there are anyway).
RC4 has a lot more flexibility (the key used to initialize it can be arbitrarily long in theory, though the library may limit this).
Based on this and your statement that you must use one or the other you should consider the RC4 implementation only if the CPU cost of TripleDES makes it unrealistic to implement in your environment or the low level of security provided by RC4 is still considerably higher than your requirements specify.
I should also point out that systems exist which are empirically better in all areas than RC4 and TDES.
The eSTREAM project is evaluating various stream cyphers on the order of 5 or fewer cycles per byte, though the cryptanalysis work on them is not really complete.
Many faster, stronger block cyphers exist to compete with TDES. AES is probably the best known, and would be a candidate since it is of comparable (if not better) security but is much faster.
Sorry - triple DES is no longer considered best practices. AES is simply a better algorithm so if you can use it then you should. For an easy implementation, go here.
I strongly suggest that you learn more by reading up on TDES on Wikipedia. The money quote is:
"TDES is slowly disappearing from use, largely replaced by the Advanced Encryption Standard (AES)."
RC4 is, honestly, just not an acceptable option for any application where security is important.
Agreed -- DES is largely outdated, so unless there is a good reason to use it, go with AES. If that's not an option, TDES would be the better choice, unless you're dealing with streaming data (i.e., data which cannot be broken into blocks), in which case RC4 is the way to go (out of the given options).
Of course, I feel like I should mention... Cryptography is really, really hard to get right, and even the strongest algorithm can be broken easily if you get something even a little wrong (see, e.g., older Kerberos or WEP).
This might not be the most informative answer, but during my 4 year employment term with a very large telco, Triple DES was the encryption standard for all sensitive applications, others were simply not allowed. It was Triple DES or the application does not go live. Hope that helps.
Both are secure, well... enough. RC4 is faster so if that's important to you...
After reading other people's answers (which are all correct), it's clear that it really depends on your context. There are so many other questions that could influence your decision. If it just needs to be fool proof, if it's not really something sensitive and you have a lot of data and speed is the factor, go for RC4.
Otherwise, if you need something a bit more secure and easier to implement or as you say "tougher to screw up" :) then go for 3DES, which is, as far as I remember, secure enough (!) till 2020-2030, or something like that.
Are those your only two options? If you can use AES (also known as Rijndael) then use it instead. DES is slow, and now considered obsolete (AES is the replacement for it).
RC4 sucks; don't use it. It's a stream cipher, but you can use a block cipher instead: just pad the final block of data (Google the PKCS#5 padding scheme).
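The padding scheme itself is tiny. PKCS#5 describes it for 8-byte blocks and PKCS#7 is the same rule for any block size, so the sketch below takes the block size as a parameter; the function names are mine.

```python
def pkcs7_pad(data, block_size=16):
    """Append N bytes of value N so the length is a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n

def pkcs7_unpad(padded, block_size=16):
    """Validate and strip the padding added by pkcs7_pad."""
    if not padded or len(padded) % block_size:
        raise ValueError("length is not a positive multiple of the block size")
    n = padded[-1]
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

print(pkcs7_pad(b"YELLOW SUBMARINE!"))
```

Input that is already block-aligned still gets a full extra block of padding, which is what makes the unpadding unambiguous. In practice a crypto library applies this for you when you pick a padded mode.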
Lately I've only seen DES being used in embedded devices (firmware), because the implementation is simple and it uses very little memory. Even in JavaME you can use AES.
One factor in deciding between 3DES and RC4 is language support. Java doesn't natively support RC4 and you would need to grab an open source library such as BouncyCastle to implement it. MS doesn't have this same challenge.