How to handle cracking passwords of different lengths with rainbow tables?

I'm doing a rainbow-table attack for homework and I'm having some trouble cracking passwords of different lengths. I can crack every password of a fixed length, 8 for example, in about 2 minutes. However, I don't know how to handle passwords of lengths 5 to 8 without losing much time.
Supposing that it's impossible to know the length of the password just from its hash, I've already tried cracking the hash by trying every length one by one. That means I spend 4 × 2 minutes to crack a single password.
Should I reduce every possible password to the maximum password length and then check only the first characters, or is that a bad idea?
I'm using a lowercase alphanumeric rainbow table, the SHA-256 algorithm, and 50,000 different reduction (R) functions. I'd like to find a way to speed this operation up. Thanks to anyone who can help.

I suspect you're on the wrong road for improving performance. As you seem to suspect, shorter passwords are not related in any useful way to longer passwords: there's no exploitable relationship among the hashes of passwords that start with the same characters (assuming the hash function is cryptographic).
The important point is that the 7-character space is 36 times smaller than the 8-character space (lowercase alphanumeric), and the 6-character space is 36 times smaller than that. So checking the entire 7-character space costs under 3% of the 8-character space, the 6-character space costs under 0.1%, and the smaller spaces are essentially free.
So your performance work should be focused on the per-hash cost. You won't get much benefit from trying to shortcut the shorter password lengths, because they represent such a tiny part of the search space.
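A quick sanity check on those fractions (a minimal Python sketch; the 36-character alphabet matches the lowercase alphanumeric table described above):

```python
# Cost of exhausting each password length over a 36-character
# (lowercase alphanumeric) alphabet, relative to length 8.
ALPHABET = 36
full = ALPHABET ** 8

for length in range(5, 9):
    space = ALPHABET ** length
    print(f"length {length}: {space:17,d} passwords"
          f" ({100 * space / full:.4f}% of the length-8 space)")
# Lengths 5-7 together add under 3% on top of the length-8 work.
```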

Related

Encoding DNA strand in Binary

Hey guys, I have the following question:
Suppose we are working with strands of DNA, each strand consisting of
a sequence of 10 nucleotides. Each nucleotide can be any one of four
different types: A, G, T or C. How many bits does it take to encode a
DNA strand?
Here is my approach to it and I want to know if that is correct.
We have 10 spots, and each spot can hold any of 4 different symbols. This means there are 4^10 combinations to represent with our binary digits.
4^10 = 1,048,576.
We will then take the log base 2 of that. What do you guys think of my approach?
Each nucleotide (aka base-pair) takes two bits (one of four states -> 2 bits of information). 10 base-pairs thus take 20 bits. Reasoning that way is easier than doing the log2(4^10), but gives the same answer.
It would be fewer bits of information if there were any combinations that couldn't appear. e.g. some codons (sequence of three base-pairs) that never appear. But ten independent 2-bit pieces of information sum to 20 bits.
If some sequences appear more frequently than others, and a variable-length representation is viable, then Huffman coding or other compression schemes could save bits most of the time. This might be good in a file-format, but unlikely to be good in-memory when you're working with them.
Densely packing your data into an array of 2-bit fields makes it slower to access a single base-pair, but comparing a whole chunk for equality with another chunk is still efficient (memcmp).
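A minimal Python sketch of that 2-bit packing (the A=0, C=1, G=2, T=3 mapping is an arbitrary choice for illustration; comparing packed values then reduces to a single integer equality, the analogue of memcmp):

```python
# Pack a DNA strand into 2 bits per nucleotide.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = {v: k for k, v in CODE.items()}

def pack(strand: str) -> int:
    """Pack a strand into an int, 2 bits per nucleotide."""
    bits = 0
    for nucleotide in strand:
        bits = (bits << 2) | CODE[nucleotide]
    return bits

def unpack(bits: int, length: int) -> str:
    """Recover the strand from its packed form."""
    return "".join(BASE[(bits >> 2 * i) & 0b11] for i in reversed(range(length)))

packed = pack("ACGTACGTAC")        # 10 nucleotides -> one 20-bit value
assert unpack(packed, 10) == "ACGTACGTAC"
print(f"{packed:020b}")            # 20 bits, matching the answer above
```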
20 bits is unfortunately just slightly too large for a 16-bit integer (which computers are good at). Storing an array of 32-bit zero-extended values wastes a lot of space. On hardware with good unaligned support, storing 24-bit zero-extended values is OK: do a 32-bit load and mask off the high 8 bits. Storing is even less convenient, though: probably a 16-bit store plus an 8-bit store, or else load the old value, merge in the high 8 bits, and do a 32-bit store (but that's not atomic).
This is a similar problem to storing codons (groups of three base-pairs that code for an amino acid): 6 bits of information doesn't fill a byte. Only wasting 2 of every 8 bits isn't that bad, though.
Amino-acid sequences (where you don't care about mutations between different codons that still code for the same AA) have about 20 symbols per position, which means a symbol doesn't quite fit into a 4-bit nibble.
I used to work for the phylogenetics research group at Dalhousie, so I've sometimes thought about having a look at DNA-sequence software to see if I could improve on how they internally store sequence data. I never got around to it, though. The real CPU intensive work happens in finding a maximum-likelihood evolutionary tree after you've already calculated a matrix of the evolutionary distance between every pair of input sequences. So actual sequence comparison isn't the bottleneck.
Do the maths:
4^10 = (2^2)^10 = 2^20
Answer: 20 bits

Reverse Engineering hash/encryption function

I have 5 numeric codes. They vary in length (8-10 digits). For each numeric code I have a corresponding alphanumeric code. The alphanumeric codes are always 8 characters in length.
Now the problem: I know that by some process each numeric code is converted into its corresponding 8-character alphanumeric code, but I do not know the process used. At first I thought that the alphanumeric codes might be randomly generated using a seed taken from the numeric code, but that did not seem to work. Now I am thinking that some sort of hashing algorithm is being used to convert the numerics to the alphanumerics.
My question is:
1) Can I brute-force solve this?
2) If yes, what algorithms should I look into that can convert a numeric code to an 8-character alphanumeric code?
3) Is there some other way to solve this?
Notes: The alphanumeric codes are not case sensitive. I do not mind if a brute-force search returns a few false positives, because I will be able to narrow them down myself.
Clarification: I think the first answerer misunderstood something. I know the exact values of these numeric and alphanumeric codes; I simply am not sharing them on the site. I'm not trying to randomly map codes to codes, I'm trying to find the algorithm that maps my specific codes to their outputs.
No, you cannot brute force this.
There are an unlimited number of functions that will map 5 inputs to 5 outputs. How would you know whether you found the right function? For example, you can use these 5 pairs as constraints for a polynomial of degree n. There are an infinite number of possible polynomial solutions.
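To make that concrete, here's a minimal Python sketch with made-up pairs: Lagrange interpolation produces one degree-4 polynomial through any 5 points, and adding any multiple of (x-1)(x-2)(x-3)(x-4)(x-5) yields infinitely many more functions that agree on the same pairs:

```python
from fractions import Fraction

# Five made-up (numeric code, output) pairs -- placeholders, not real data.
pairs = [(1, 7), (2, 3), (3, 9), (4, 1), (5, 5)]

def lagrange(x, pts):
    """Evaluate the unique degree-(len(pts)-1) interpolating polynomial."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(pts):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(pts):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# The polynomial reproduces all five pairs exactly.
assert all(lagrange(x, pairs) == y for x, y in pairs)
```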
If you can narrow the functions down, then there are additional constraints on the problem.
If you assume a hash function is used, you can try guessing that there is no salting, and search over well-known hash functions. If there is salting, you are stuck brute-forcing all possible salts over all possible hash functions; with just the salts, you are probably looking at > 2^128 values. A brute-force attack is not going to be useful.
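If you do want to try the no-salt guess, a sketch along these lines might look like the following (the code pairs are hypothetical placeholders, and the guess that the alphanumeric code is a prefix of a standard hex digest is just one of many possible mappings):

```python
import hashlib

# Placeholder pairs -- substitute the real numeric/alphanumeric codes.
pairs = [("12345678", "ab12cd34"), ("9876543210", "ef56ab78")]

for name in sorted(hashlib.algorithms_guaranteed):
    for numeric, alpha in pairs:
        h = hashlib.new(name, numeric.encode())
        # SHAKE digests need an explicit output length.
        hexdigest = h.hexdigest(32) if name.startswith("shake_") else h.hexdigest()
        if hexdigest.lower().startswith(alpha.lower()):
            print(f"candidate: {name}({numeric}) starts with {alpha}")
```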
If a symmetric cipher is used, you have an instance of the known-plaintext problem (you hold matching input/output pairs). Modern ciphers are intentionally designed with this attack in mind and use 128 bits or more of key space. Brute-forcing all keys is not going to work.
You do not state anything about the function. Is it reversible? Is it randomized?

Finding similar hashes

I'm trying to find 2 different plaintext words that create very similar hashes.
I'm using the 'whirlpool' hashing method, but my question doesn't have to be answered in the case of Whirlpool; if you can use MD5 or something easier, that's OK.
The similarity I'm looking for is that the two hashes contain exactly the same characters in the same quantities (it doesn't matter how much they're jumbled up),
i.e.
plaintext 'test'
hash 1: abbb5 has 1 a, 3 b's, one 5
plaintext 'blahblah'
hash 2: b5bab must have the same characters, but the order doesn't matter.
I'm sure I could read up on how they're created, break one down, and reverse it, but I am just wondering whether what I'm describing actually occurs.
I'm asking because I haven't found a match of the kind I'm explaining (I created a PoC to run through random words/letters until it recreated a similar match), but then again it would take forever the way I was doing it, and I was wondering if anyone with real knowledge of hashes/encryption could help me out.
So you can do it like this:
1. Create an empty sorted map.
2. Create a 64-bit counter (you don't need more than 2^63 inputs, in all probability, since you would be dead before they were all calculated, unless quantum crypto really takes off).
3. Use the counter as input; it's probably easiest to encode it in 8 bytes.
4. Use this as the input for your hash function.
5. Encode the output of the hash in hex (use ASCII bytes, for speed).
6. Sort the hex digits numerically/alphabetically (same thing, really).
7. Check whether the sorted hex result is a key in the map:
   - if it is, show the hex result, the old counter from the map, and the current counter (and stop);
   - if it isn't, put the sorted hex result in the map, with the counter as the value.
8. Increase the counter and go to step 3.
That's all folks. Results for SHA-1:
011122344667788899999aaaabbbcccddeeeefff for both 320324 and 429678
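A minimal Python sketch of the loop above (the 8-byte big-endian counter encoding is an assumption, since the steps don't specify one, so the exact counters found may differ from those quoted):

```python
import hashlib

def find_anagram_collision():
    """Search for two counters whose SHA-1 digests are hex anagrams
    (same characters, any order), per the steps above."""
    seen = {}  # sorted hex digest -> counter that first produced it
    counter = 0
    while True:
        # Encode the counter in 8 bytes (big-endian here; an assumption).
        digest = hashlib.sha1(counter.to_bytes(8, "big")).hexdigest()
        key = "".join(sorted(digest))
        if key in seen:
            return key, seen[key], counter
        seen[key] = counter
        counter += 1

key, first, second = find_anagram_collision()
print(f"{key} for both {first} and {second}")
```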
I don't know why you want to do this for hex; the hashes will be so large that they won't look much alike anyway. If your alphabet is smaller, your code will run (even) quicker. If you use whole output bytes (i.e. 00 to FF instead of 0 to F) instead of hex, it will take much more time: a quick (non-optimized) test on my machine doesn't finish in minutes and then runs out of memory.

repetition in encrypted data -- red flag?

I have some base-64 encoded encrypted data and noticed a fair amount of repetition. In an (approx.) 200-character string, a certain base-64 character is repeated up to 7 times in several separate runs.
Is this a red flag that there is a problem in the encryption? According to my understanding, encrypted data should never show significant repetition, even if the plaintext is entirely uniform (i.e. even if I encrypt 2 GB of nothing but the letter A, there should be no significant repetition in the encrypted version).
According to the binomial distribution, there is about a 2.5% chance that you'd see one character from a set of 64 appear seven times in a series of 200 random characters. That's a small chance, but not negligible. With more information, you might raise your confidence from 97.5% to something very close to 100% … or find that the cipher text really is uniformly distributed.
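A quick check of that 2.5% figure, interpreting it as the binomial probability of exactly 7 occurrences:

```python
from math import comb

# Probability that a specific character appears exactly 7 times in
# 200 uniformly random base-64 characters.
n, k, p = 200, 7, 1 / 64
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(exactly {k} hits) = {prob:.3f}")  # ~0.025, i.e. about 2.5%
```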
You say that the "character is repeated up to 7 times" in several separate repeated runs. That's not enough information to say whether the cipher text has a bias. Instead, tell us the total number of times the character appeared, and the total number of cipher text characters. For example, "it appeared a total of 3125 times in 1000 runs of 200 characters each."
Also, you need to be sure that you are talking about the raw output of a cipher. Cipher text is often encapsulated in an "envelope" like that defined by the Cryptographic Message Syntax. Of course, this enclosing structure will have predictable patterns.
Well, I guess it depends. Repetition in general is a bad thing if it represents the same data.
Considering you are encoding the data, have you looked at it to see if something in the plaintext repeats in those counts?
To understand this better, you have to know what kind of encryption is used.
It could be just coincidence that the characters are repeating.
But if the repetition comes from the same data, then it can be a red flag, because frequency counts can then be used to decode it.
What kind of encryption are you using? Home-made or some industry standard?
It depends on how you are encrypting your data.
Base64 encoding a string may count as light obfuscation, but it is NOT encryption. The purpose of Base64 encoding is to allow any sort of binary data to be encoded as a safe ASCII string.
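To see why it's not encryption, note that Base64 is trivially reversible with no key at all:

```python
import base64

encoded = base64.b64encode(b"secret data").decode()
print(encoded)                    # c2VjcmV0IGRhdGE=
print(base64.b64decode(encoded))  # b'secret data' -- fully reversible, no key
```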

Could AES-256 be less secure if a portion of the key is known to an attacker?

Disclaimer: this is just out of curiosity; I'm no expert at all when it comes to cryptography.
Suppose a 256-bit key is composed of the following (UTF-16) characters:
aaaaaaaabbbbcccc
Further suppose that an attacker knows that the last 4 characters of the key are cccc.
Does this knowledge make it easier for an attacker?
My guess is that it makes it easier for the attacker to brute-force the encrypted text, but my understanding is that brute-forcing AES-256 is a very difficult problem. Then again, there might be something I don't understand about AES itself that makes this type of knowledge more valuable for an attacker.
I would say it's a bigger problem that your key's bytes come from UTF-16 characters that are all in the ASCII range (meaning you could have had a key 32 ASCII characters long). As such, 16 of the 32 bytes of your key are known to be 0x00. Knowing that the last 4 characters are c means that 4 more bytes have been compromised.
As such, you've really only got 12 bytes => 96 bits of your AES key unknown.
If your attacker assumes the alphanumeric character space, that cuts each remaining byte down to about a quarter of its range as well (62/256).
With what you're working with, your key is pretty compromised (and not just because 4 characters of it are known).
A 256-bit key should give someone a 1 in 1.16 × 10^77 chance of guessing right.
With your situation, it's about a 1 in 3.23 × 10^21 chance (basically 62^12), because the key space is a LOT smaller.
UPDATE:
I was a nerd and had to do the math. 12 alpha-numeric characters (upper and lower case) is roughly a 71-bit encryption strength. ( Math check = log(62^12)/log(2) )
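Reproducing that arithmetic in Python:

```python
from math import log2

# 12 unknown bytes, each drawn from a 62-character
# (upper/lower alphanumeric) alphabet.
keyspace = 62 ** 12
print(f"{keyspace:.2e} possible keys")        # ~3.23e21
print(f"{log2(keyspace):.1f} bits strength")  # ~71.4, vs 256 for a random key
```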
Theoretically, it is less secure now.
Practically, as long as there are still at least 80 bits unknown to the attacker, you are good to go.
I don't think so. The hacker could guess the last 4 bytes [run through them by brute force] even without knowing them, so knowing the last 4 bytes just reduces the key space.
[BTW, I'm not an "expert" either.]
