Can hashes or keys be generated (either intentionally or accidentally) that would trigger an injection attack? For example, if the hash or key was generated as something like SELECT%0d*%0dFROM%0dWHEREVER, could this cause an injection attack? I am aware that with current technologies and standards, any decent protection will guard against all input, hashes and keys included, so it almost surely wouldn't affect any systems in reality.
Yes, I have been informed this is the wrong location for this type of question. Yes, I am now aware of where to put it next time.
In theory, I suppose it's possible that the result of a hash function would be a specific sequence of bytes that happens to be SQL syntax, either when used as raw binary bytes or if encoded in the range of printable ASCII characters (values 0x20 through 0x7E).
But it would be a hard task to come up with an input string that produced that exact result when hashed.
The result of a hash function is always of fixed length, depending on the hash algorithm and options. So you would need to have an attack query in mind that fit in that fixed length exactly, and then you would need to find the input that hashed to that string exactly.
Also, the method for defending against such an attack is the same as for any other SQL injection attack: use query parameters. Any unsafe content, whether or not it is the result of a hash function, is unable to effect an SQL injection if it is kept separate from the SQL syntax.
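As a minimal sketch of what "use query parameters" means, here is a Python example using the standard sqlite3 module. The table, column names, and the hostile-looking value are made up for illustration; the point is only that the value travels separately from the SQL text.

```python
import sqlite3

def find_user_by_token(conn, token):
    # The ? placeholder keeps the value separate from the SQL text,
    # so the database never parses `token` as SQL syntax.
    cur = conn.execute("SELECT name FROM users WHERE token = ?", (token,))
    return cur.fetchall()

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, token TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "abc123"))

# A hash or key that happens to look like SQL is treated as plain data.
hostile = "x' OR '1'='1"
rows_hostile = find_user_by_token(conn, hostile)
rows_ok = find_user_by_token(conn, "abc123")
print(rows_hostile)
print(rows_ok)
```

If the same value were concatenated into the query string instead, the quote characters would become part of the SQL, which is exactly the situation parameters avoid.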
I think there are other means of attack that would be easier and more effective. Social hacking is still the most general-purpose means of attack, and can get around just about any security defense.
I need to encrypt user names that I receive from an external partner's SSO. This needs to be done because the user names are assigned to school children. But we still need to be able to track each individual to prevent abuse of our systems, so we have decided to encrypt the user names in our logs etc.
This way, a breach of our systems will not compromise the identity of the children.
Here's my predicament. I have very limited knowledge in this area, so I am looking for advice on which algorithm to use.
I was thinking of using an asymmetrical algorithm, like PGP, and throwing away one of the keys so that we will not be able to decrypt the user name.
My questions:
Does PGP encryption always provide the same output given the same input?
Is PGP a good choice for this, or should we use another algorithm?
Does anyone have a better suggestion for achieving the same thing, namely anonymization of the user?
If you want a one-way function, you don't want encryption. You want hashing. The easiest thing to do is to use a hash like SHA-256. I recommend salting the username before hashing. In this case, I would probably pick a static salt like edu.myschoolname: and put that in front of the username. Then run that through SHA-256. Convert the result to Base-64 or hex encoding, and use the resulting string as the "username."
From a unix command line, this would look like:
$ echo -n "edu.myschoolname:robnapier#myschoolname.edu" | shasum -a 256
09356cf6df6aea20717a346668a1aad986966b192ff2d54244802ecc78f964e3 -
That output is unique to that input string (technically it's not "unique" but you will never find a collision, by accident or by searching). And that output is stable, in that it will always be the same for the given input. (I believe that PGP includes some randomization; if it doesn't, it should.)
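The same salt-then-hash step can be sketched in Python with the standard hashlib module. The function name pseudonymize is hypothetical; the salt prefix and sample username are taken from the shell example above.

```python
import hashlib

STATIC_SALT = "edu.myschoolname:"  # same static prefix as the shell example

def pseudonymize(username: str) -> str:
    # Prepend the static salt, hash with SHA-256, and return hex text
    # that can be stored in logs in place of the real username.
    data = (STATIC_SALT + username).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

a = pseudonymize("robnapier#myschoolname.edu")
b = pseudonymize("robnapier#myschoolname.edu")
c = pseudonymize("someone.else#myschoolname.edu")
print(a == b)  # the same input always gives the same output
print(a == c)  # a different username gives a different output
```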
(Regarding comments below)
Cryptographic hash algorithms are extremely secure for their purposes. Non-cryptographic hash algorithms are not secure (but also aren't meant to be). There are no major attacks I know of against SHA-2 (which includes SHA-256 and SHA-512).
You're correct that your system needs to be robust against someone with access to the code. If they know what userid they're looking for, however, no system will be resistant to them discovering the masked version of that id. If you encrypt, an attacker with access to the key can just encrypt the value themselves to figure out what it is.
But if you're protecting against the reverse: preventing attackers from determining the id when they do not already know the id they're looking for, the correct solution is a cryptographic hash, specifically SHA-256 or SHA-512. Using PGP to create a one-way function is using a cryptographic primitive for something it is not built to do, and that's always a mistake. If you want a one-way function, you want a hash.
I think that PGP is a good idea, but it risks making usernames hard to memorize. Why not simply make a list of usernames composed of user + OrderedNumbers, where user can be whichever word you want and the ordered number is a 4-5 digit number ordered by the children's birth dates? Once you have done this, you only have to keep a list where the usernames are linked with the corresponding child, and then you can encrypt this "nice to have" list with a key only you know.
I found a "lua aes" solution on the web a while ago. And have some concern about its safety.
It states that:
-- Do not use for real encryption, because the password is easily viewable while encrypting.
It says this at its "file encryption test" script.
My questions are:
Why is that, how is it any different from encrypting a string and writing it to a file?
How could it be viewable during encryption? Is it viewable after encryption too?
Basically, Is it safe to use or not?
Is there anyone who can confirm this who has used it? I mailed the original developer but the email address was invalid.
Should I be using it at all?
I assume there are two reasons why that recommendation was made:
1. Strings are immutable in Lua, so there is no way to overwrite a string with different data once it's created.
2. In Lua, objects are garbage collected. The garbage collector runs only at certain points in the program, and the application has no way of telling when the garbage collector will run after there are no more references to the object. Until then, the password string will remain in memory by point 1.
See Java's case, which is similar to Lua:
Why is char[] preferred over String for passwords?
As you can see there, using char arrays instead of strings is a better way to store passwords, since arrays are mutable and can be reinitialized to zero when done.
The closest Lua equivalent to a char array is a table filled with numbers. Here the password is stored as a table, rather than a string, where each element in the table consists of the integer representation of each character. For example, "pass" becomes {0x70,0x61,0x73,0x73}. After the table containing the password is used to encrypt or decrypt, it is filled with zeros before it's unreachable by the program and eventually gets garbage collected.
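The same idea can be sketched in Python rather than Lua, purely as an analogy: a bytearray is mutable, so it can be overwritten with zeros once it is no longer needed, whereas a str cannot. The helper name wipe is hypothetical, and this does not guarantee that no other copy of the password exists elsewhere in memory.

```python
def wipe(buf: bytearray) -> None:
    # Overwrite every byte in place so the secret no longer sits in this buffer.
    for i in range(len(buf)):
        buf[i] = 0

password = bytearray(b"pass")   # bytes 0x70, 0x61, 0x73, 0x73
first_bytes = list(password)    # kept here only to show the encoding

# ... use `password` for encryption or decryption here ...

wipe(password)
print(first_bytes)
print(password)
```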
According to your comment, I may have misunderstood. Maybe the "file encryption test" stores the password in plain text along with the encrypted file, allowing anyone with access to the file, even attackers, the ability to trivially decrypt it. The points above still apply, though. This is still only a guess, however; I can't know exactly what you mean unless you provide a link to the encryption library you mention.
I've taken a look at the AES library and the concern about the password being "easily viewable" occurs because the user types the password in plain text, through the command line or terminal, in order to start the Lua program, even though the output of the program contains only cipher text. A slightly more secure way of providing the password would be not to show the input (as is done in sudo) or to mask the input with dots or stars (as is done in many Web pages).
Either that or the points given above are perhaps the only logical explanation.
You may also try out alternate methods, like LuaCrypto, which is a binding to OpenSSL and is able to encrypt data using the AES standard.
I'm processing human-written text documents and I do a dictionary based string matching to find specific strings in the document.
For security reasons, I can not input the document in unencrypted text format, but rather in a strong encrypted format. I can not allow developers working on the unit access the unencrypted input string, but they can access the matched strings.
To make it clearer:
Dictionary = {"Apple", "Apple pie", "World War II"}
Document1 = "apple is my favorite fruit." -> Should match "apple"
Document2 = "apple pie was invented during world war II" -> Should match "apple pie" and "world war II"
So the string matching is case-insensitive and only matches longest occurrence (I'm using Aho-Corasick).
The options I see are:
Find an encryption function F where F("ABCD") = F("A")+F("B")+F("C")+F("D") = F("AB")+F("CD").
Chunk the document by whitespace, hash both the chunks and the dictionary and then look for similarities. (complicated)
Make a separate unit responsible for encryption and string matching with obfuscated code. (most obvious way)
As I'm not good at cryptography, I might be missing something here. Can anyone see a better way of achieving this?
Firstly, any encryption function that satisfies your condition:
F("ABCD") = F("A")+F("B")+F("C")+F("D")
is inherently not strong encryption (assuming + here means concatenation). The problem is that this condition implies that F("A") is invariant, which means that the encryption is equivalent to a simple substitution cipher, vulnerable to frequency analysis.
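To make the weakness concrete, here is a small Python sketch. toy_f is a made-up stand-in for any function with the concatenation property (it transforms each character independently); it is not a real cipher. Because equal characters always map to equal tokens, letter frequencies survive the transformation.

```python
import hashlib
from collections import Counter

def toy_f(text: str, key: bytes = b"demo-key") -> list:
    # Hypothetical per-character transform: each character maps to a fixed
    # token, so toy_f(a + b) == toy_f(a) + toy_f(b) by construction.
    return [hashlib.sha256(key + ch.encode("utf-8")).hexdigest() for ch in text]

same = toy_f("ABCD") == toy_f("AB") + toy_f("CD")

# Frequency analysis: the most common token corresponds to the most common letter.
counts = Counter(toy_f("hello world"))
top_token, top_count = counts.most_common(1)[0]
print(same, top_count)
```

In "hello world" the letter l occurs three times, and the most frequent token occurs three times as well, which is the kind of leak that frequency analysis exploits.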
A bigger problem however is that any solution is going to be vulnerable to a dictionary attack. If you can determine that a word in the unknown document is a particular word in your limited dictionary, then you can also search for it in a complete dictionary - in this way, you can quickly discover the entire plaintext.
If I understand correctly, the goal is to prevent someone who has physical access to the machine and access to the processes running on it from being able to determine the contents of the document. I don't think that is possible if the "bad guy" is extremely dedicated. He will be able to extract key information necessary to decrypt the document from the process space. As a general rule, if the attacker has physical access, then there is not a lot that can be done.
If the program can match parts of text of a document to known text, then the attacker will be able to observe that and extract the information. Obfuscation of the code may make it harder, but if the information is valuable enough, then the attacker will just work harder.
It seems that it would be better if the server can be run in a secure fashion and limiting physical access as much as possible. There are, of course, still a lot of issues involved (code would need to be audited for malicious code for example since the developers are apparently not trusted) but that at least gets you to a position that has a chance of being defended.
Edit: A couple of thoughts about encryption in the context of what you are trying to do. If you are using, for example, AES encryption in CBC (cipher block chaining) mode, then it is not possible to decrypt a single word from the document (assuming the document is encrypted as a whole). Each block of cipher text depends on the preceding block. Thus, it would be necessary to decrypt the entire document up to the point of interest. In other words, you would have to decrypt the entire document to search it.
Another encryption possibility would be to use AES in CTR mode. CTR mode generates a keystream (based on the key, a nonce or initialization vector, and a block counter) and XORs that against the plain text to produce the cipher text. In this mode, it is possible to decrypt a portion in the middle of the document without decrypting the previous section, because the keystream block for any position can be computed directly from its counter value. But that does not help here. Generating any part of the keystream requires the key, and from an attacker's standpoint, having the keystream is the same as decrypting the document, since the attacker would have access to the encrypted text (presumably in the situation you describe) and XORing it with the generated keystream would yield the plain text.
Your proposed solution #1 is a very very difficult problem - known to be solvable, but almost certainly not worth your while to solve.
The technique you would want for it is Homomorphic Encryption. It was first demonstrated in 2009 by Craig Gentry of IBM that arbitrary computation can be performed without revealing the plaintext.
The state-of-the-art is probably too inefficient for almost all applications - while exponential security can be obtained with "polynomial" computation (which is all the theorists really care about), the polynomial is enormous enough to be not valuable. This might change in the near future.
With that said, I don't see any reason why you can't:
hash each entry in the dictionary
(split each entry on whitespace, multiword entries are tuples of hashes)
split document on whitespace, hash each word
do the matching with the hashes
Essentially, you're matching arbitrary items, not inherently words. The client can produce the words-items map, and pass the items to the server. The server doesn't need to know anything about the items, just that an item from the dictionary appears in the text.
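The steps above can be sketched in Python as follows. This is a rough illustration under a few assumptions: tokens are lowercased and split on whitespace only, plain unsalted SHA-256 is used, and a simple longest-match scan stands in for Aho-Corasick. The function names are hypothetical. As noted in another answer, hashing individual words remains open to a dictionary attack.

```python
import hashlib

def hash_word(word: str) -> str:
    # Lowercase first so matching is case-insensitive.
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()

def hash_tokens(text: str) -> list:
    return [hash_word(w) for w in text.split()]

def find_matches(dictionary, doc_hashes):
    # Multiword entries become tuples of hashes.
    entries = {tuple(hash_tokens(e)): e for e in dictionary}
    max_len = max(len(k) for k in entries)
    matches, i = [], 0
    while i < len(doc_hashes):
        # Try the longest candidate first so "apple pie" wins over "apple".
        for n in range(min(max_len, len(doc_hashes) - i), 0, -1):
            key = tuple(doc_hashes[i:i + n])
            if key in entries:
                matches.append(entries[key])
                i += n
                break
        else:
            i += 1
    return matches

dictionary = ["Apple", "Apple pie", "World War II"]
# The client would send only the hashes; the server never sees the text.
doc_hashes = hash_tokens("apple pie was invented during world war II")
result = find_matches(dictionary, doc_hashes)
print(result)
```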
There are many articles and quotes on the web saying that a 'salt' must be kept secret. Even the Wikipedia entry on Salt:
For best security, the salt value is kept secret, separate from the password database. This provides an advantage when a database is stolen, but the salt is not. To determine a password from a stolen hash, an attacker cannot simply try common passwords (such as English language words or names). Rather, they must calculate the hashes of random characters (at least for the portion of the input they know is the salt), which is much slower.
Since I happen to know for a fact that an encryption salt (or initialization vector) is OK to store in clear text along with the encrypted text, I want to ask why this misconception is perpetuated.
My opinion is that the origin of the problem is a common confusion between the encryption salt (the block cipher's initialization vector) and the hashing 'salt'. In storing hashed passwords it is a common practice to add a nonce, or a 'salt', and it is (marginally) true that this 'salt' is better kept secret. That in turn makes it not a salt at all, but a key, similar to the much more clearly named secret in HMAC. If you look at the article Storing Passwords - done right!, which is linked from the Wikipedia 'Salt' entry, you'll see that it is talking about this kind of 'salt', the password hash salt. I happen to disagree with most of these schemes because I believe that a password storage scheme should also allow for HTTP Digest authentication, in which case the only possible storage is the HA1 digest of the username:realm:password, see Storing password in tables and Digest authentication.
If you have an opinion on this issue, please post here as a response.
Do you think that the salt for block cipher encryption should be hidden? Explain why and how.
Do you agree that the blanket statement 'salts should be hidden' originates from salted hashing and does not apply to encryption?
Should we include stream ciphers in the discussion (RC4)?
If you are talking about IV in block cipher, it definitely should be in clear. Most people make their cipher weaker by using secret IV.
IV should be random, different for each encryption. It's very difficult to manage a random IV so some people simply use a fixed IV, defeating the purpose of IV.
I used to work with a database with passwords encrypted using a secret fixed IV. The same password is always encrypted to the same ciphertext. This is very prone to a rainbow table attack.
Do you think that the salt for block cipher encryption should be hidden? Explain why and how
No it shouldn't. The strength of a block cipher relies on the key. IMO you should not increase the strength of your encryption by adding extra secrets. If the cipher and key are not strong enough then you need to change the cipher or key length, not start keeping other bits of data secret. Security is hard enough so keep it simple.
Like LFSR Consulting says:
There are people that are much smarter than you and I that have spent more time thinking about this topic than you or I ever will.
Which is a loaded answer, to say the least. There are folks who, marginally in the honest category, will overlook some restraints when money is available. There are plenty of people who have no skin in the game and will lower the boundaries for that type. Then, not too far away, there is a type of risk that comes from social factors, which is almost impossible to program away. For that person, setting up a device solely to "break the locks" can be an exercise of pure pleasure for no gain or measurable reason. That said, you asked that those who have an opinion please respond, so here goes:
Do you think that the salt for block cipher encryption should be hidden? Explain why and how.
Think of it this way: it adds to the computational strength needed. It's just one more thing to hide if it has to be hidden. In and of itself, being forced to hide (salt, IV, or anything) places the entity doing the security in the position of being forced to do something. Anytime the opposition can tell you what to do, they can manipulate you. If it leaks, that should have been caught by cross-controls that would have detected the leak, with replacement salts available. There is no perfect cipher, save the one-time pad, and even that can be compromised somehow, as the greatest risk comes from within.
In my opinion, the only solution is to be selective about whom you do any security for - the issue of protecting salts leads to issues that are relevant to the threat model. Obviously, keys have to be protected. If you have to protect the salt, you probably need to review your burger flippin resume and question the overall security approach of those for whom you are working.
There is no answer, actually.
Do you agree that the blanket statement 'salts should be hidden' originates from salted hashing and does not apply to encryption?
Who said this, where, and what basis was given?
Should we include stream ciphers in discussion (RC4)?
A cipher is a cipher - what difference would it make?
Each encrypted block is the next block IV. So by definition, the IV cannot be secret. Each block is an IV.
The first block is not very different. An attacker who knows the length of the plain text could have a hint that the first block is the IV.
BLOCK1 could be IV or Encrypted with well known IV
BLOCK2 is encrypted with BLOCK#1 as an IV
...
BLOCK N is encrypted with BLOCK#N-1 as an IV
Still, whenever possible, I generate a random (non-null) IV and give it to each party out-of-band. But the security gain is probably not that important.
The purpose of a per record salt is to make the task of reversing the hashes much harder. So if a password database is exposed the effort required to break the passwords is increased. So assuming that the attacker knows exactly how you perform the hash, rather than constructing a single rainbow table for the entire database they need to do this for every entry in the database.
The per record salt is usually some combination of fields in the record that vary greatly between records. Transaction time, Account Number, transaction Number are all good examples of fields that can be used in a per record salt. A record salt should come from other fields in the record. So yes it is not secret, but you should avoid publicising the method of calculation.
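A minimal Python sketch of why a per-record salt forces the attacker to work on each entry separately. Two things here are my substitutions rather than anything the answer specifies: the salt is random (from os.urandom) instead of being derived from record fields, and PBKDF2 from hashlib is used as the hash. The password and iteration count are illustrative only.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 with a per-record salt (iteration count is illustrative).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# Two records holding the same password but different salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_password("hunter2", salt_a)
hash_b = hash_password("hunter2", salt_b)

print(hash_a == hash_b)                            # same password, different stored hashes
print(hash_password("hunter2", salt_a) == hash_a)  # verification still works per record
```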
There is a separate issue with a database wide salt. This is a sort of key, and protects against the attacker using existing rainbow tables to crack the passwords. The database wide salt should be stored separately so that if the database is compromised then it is unlikely that the attacker will get this value as well.
A database wide salt should be treated as though it was a key and access to the salt value should be moderately protected. One way of doing this is to split the salt into components that are managed in different domains. One component in the code, one in a configuration file, one in the database. Only the running code should be able to read all of these and combine them together using a bitwise XOR.
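A sketch of the combining step in Python, assuming all components have the same length. The function name combine_salt and the three hex values are invented for illustration; in practice each component would be read from its own location.

```python
def combine_salt(*components: bytes) -> bytes:
    # XOR equal-length components byte by byte into one salt value.
    length = len(components[0])
    if any(len(c) != length for c in components):
        raise ValueError("all salt components must have the same length")
    out = bytearray(length)
    for component in components:
        for i, b in enumerate(component):
            out[i] ^= b
    return bytes(out)

# Hypothetical components: one from code, one from config, one from the database.
code_part = bytes.fromhex("0f0f0f0f")
config_part = bytes.fromhex("f0f0f0f0")
db_part = bytes.fromhex("12345678")

salt = combine_salt(code_part, config_part, db_part)
print(salt.hex())  # edcba987
```

With XOR, no single component reveals anything about the combined value unless all the others are also known.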
The last area is where many fail. There must be a way to change these salt values and/or the algorithm. If a security incident occurs we may want to be able to change the salt values easily. The database should have a salt version field and the code will use the version to identify which salts to use and in what combination. The encryption or hash creation always uses the latest salt algorithm, but the decode/verify function always uses the algorithm specified in the record. This way a low priority thread can read through the database decrypting and re-encrypting the entries.
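One way the version field might look in Python. Everything here is hypothetical (the salt table, field names, and the choice of PBKDF2); it only illustrates "create with the latest version, verify with the version stored in the record".

```python
import hashlib
import hmac

# Hypothetical salt table keyed by version; real values would be protected.
SALTS = {1: b"old-db-salt", 2: b"new-db-salt"}
CURRENT_VERSION = 2

def _hash(password: str, version: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), SALTS[version], 100_000)

def make_record(password: str) -> dict:
    # New records always use the latest salt version.
    return {"salt_version": CURRENT_VERSION, "hash": _hash(password, CURRENT_VERSION)}

def verify(record: dict, password: str) -> bool:
    # Verification uses whatever version the record was created with.
    expected = _hash(password, record["salt_version"])
    return hmac.compare_digest(expected, record["hash"])

old_record = {"salt_version": 1, "hash": _hash("s3cret", 1)}
new_record = make_record("s3cret")
print(verify(old_record, "s3cret"), verify(new_record, "s3cret"))
```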
What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
Hashing is a technique of creating semi-unique keys based on larger pieces of data. In a given hash you will eventually have "collisions" (e.g. two different pieces of data calculating to the same hash value) and when you do, you typically create a larger hash key size.
Obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security like "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private key (symmetric) encryption, where both parties share a secret key. Public key encryption uses a shared public key to encrypt and a private recipient key to decrypt. With public key encryption, only the recipient needs to have the secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle. For the same input X, you always receive the same output Y, that is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get is one-way functions: H(X) = Y, where computing H^-1(Y) = X is so difficult that you're better off trying to brute force a Z such that H(Z) = Y
Obfuscation (my opinion) - Any function f, such that f(a) = b where you rely on f being secret. F may be a hash function, but the "obfuscation" part implies security through obscurity. If you never saw ROT13 before, it'd be obfuscation
Encryption - Ek(X) = Y, Dl(Y) = X where E is known to everyone. k and l are keys, they may be the same (in symmetric, they are the same). Y is the ciphertext, X is the plaintext.
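ROT13, mentioned above as obfuscation, is easy to try in Python via the standard codecs module. There is no key: anyone who knows the method can undo it by applying it a second time.

```python
import codecs

original = "Hello, World"
encoded = codecs.encode(original, "rot_13")   # shift each letter by 13 places
decoded = codecs.encode(encoded, "rot_13")    # applying it again restores the text

print(encoded)  # Uryyb, Jbeyq
print(decoded)  # Hello, World
```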
A hash is a one way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords and you can also find it on your receipt if you shop using a credit card. There you will find your credit card number with some digits hidden; this way you can prove with high probability that your card was used to buy the stuff, while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "The first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function can obviously not be reversed which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash, this is called a collision. So again: a hash only proves with a certain probability that the two compared values are the same.
Another very naive and simple hash is the 5-modulus of a number, here you will see that 6,11,16 etc.. will all have the same hash: 1.
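Both naive hashes are short enough to write out in Python; the function names first3 and mod5 are just labels for this sketch.

```python
def first3(s: str) -> str:
    # "The first 3 letters of a string"
    return s[:3]

def mod5(n: int) -> int:
    # "The 5-modulus of a number"
    return n % 5

print(first3("abcdefg"), first3("abcxyz"))   # abc abc  (a collision)
print([mod5(n) for n in (6, 11, 16)])        # [1, 1, 1]
```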
Modern hash algorithms are designed to keep the number of collisions as low as possible but they can never be completely avoided. A rule of thumb is: the longer your hash is, the fewer collisions it has.
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from another, using a set algorithm. Depending on the algo used, this may be one way, may not be.
Obfuscating is making something harder to read by symbol replacement.
Encryption is like hashing, except the value is dependent on another value you provide the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one way function and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA256 etc.
Obfuscation - modify your data/code to confuse anyone else (no real protection). This may or may not lose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except obfuscation is not really similar to encryption - sometimes it doesn't even involve ciphers as simple as ROT13.
Hashing is one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
Obfuscation is making something unreadable without changing semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way, so it's impossible to get the starting value.
Encryption is two-way, and there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can be a reference to the input. It is not necessarily unique, a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by introducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing generally just employs a formula to translate the data into another form, whereas encryption uses a formula requiring key(s) to encrypt/decrypt. Two commonly confused examples: Base64 is an encoding rather than a hash, so anyone can decode Base64 data without a key. MD5 is a hash algorithm rather than an encryption algorithm, so there is no key involved and its output cannot be decrypted back into the input.
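For reference, Python's standard library treats Base64 as a reversible encoding (no key involved) and MD5 as a keyless one-way digest. A short sketch:

```python
import base64
import hashlib

data = b"hello"

# Base64: anyone can reverse it; there is no key and no secret.
encoded = base64.b64encode(data)
decoded = base64.b64decode(encoded)

# MD5: a fixed-size digest with no key and no decode operation.
digest = hashlib.md5(data).hexdigest()

print(encoded)   # b'aGVsbG8='
print(decoded)   # b'hello'
print(len(digest))
```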