Recover email address from special application of MD5 hash function - encryption

First, we segment the email address into 2-character strings.
Then, for every segment s, we compute the following hash J:
md5(md5(s) + s + md5(s)) [where + is the string concatenation operator].
Finally, we concatenate all hash strings J to form the long hash below.
For example: for an input of helloworld#company.com, we would compute:
md5(md5('he') + 'he' + md5('he')) +
md5(md5('ll') + 'll' + md5('ll')) +
md5(md5('ow') + 'ow' + md5('ow')) +
...
Long Hash:
f894e71e1551d1833a977df952d0cc9de44a1f9669fbf97d51309a2c6574d5eaa746cdeb9ee1a5df
c771d280d33e5672bf024973657c99bf80cb242d493d5bacc771b3b0b422d5c13595cf3e73cfb1df
91caedee7a6c5f3ce2c283564a39c52d3306d60cbc0e3e33d7ed01e780acb1ccd9174cfea4704eb2
33b0f06e52f6d5aba5a5a89e6122dd55f8efcf024961c1003d116007775d60a0d5781d2e35d747b5
dece2e0e3d79d272e40c8c66555f5525
How can I recover the email address from the hash? As I understand it, a "Hash" is a One Way Function. I can only compare it to another hash to see if they match or generate a Hash of the original text.

While it may be true in general that it is impractical to extract the original message from a hash, this clearly looks like an exercise with conditions carefully crafted to make it possible to break the "encryption".
Consider that the email address is broken up into two-character segments. If you limit yourself to lowercase letters plus two symbols (26 letters + # and .), there are only 28 * 28 = 784 possible two-character combinations. Even if the addresses use lowercase and uppercase letters, digits, # and ., there are only 64 * 64 = 4096 combinations, which is well within computational limits.
The thing to do is to pre-compute a lookup table (often loosely called a rainbow table, although a true rainbow table uses hash chains) of all possible hash values in your search space. You could do this with a matrix:
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
|     | a                                 | b                                 | c                                 | ... |
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
| a   | md5(md5('aa') + 'aa' + md5('aa')) | md5(md5('ba') + 'ba' + md5('ba')) | md5(md5('ca') + 'ca' + md5('ca')) | ... |
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
| b   | md5(md5('ab') + 'ab' + md5('ab')) | md5(md5('bb') + 'bb' + md5('bb')) | md5(md5('cb') + 'cb' + md5('cb')) | ... |
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
| c   | md5(md5('ac') + 'ac' + md5('ac')) | md5(md5('bc') + 'bc' + md5('bc')) | md5(md5('cc') + 'cc' + md5('cc')) | ... |
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
| ... | ...                               | ...                               | ...                               | ... |
+-----+-----------------------------------+-----------------------------------+-----------------------------------+-----+
But then you would have to scan the whole matrix for every lookup, which is slow.
An alternative is to use a dictionary whose keys are the hashes and whose values are the 'decoded' characters:
{
md5(md5('aa') + 'aa' + md5('aa')): 'aa',
md5(md5('ab') + 'ab' + md5('ab')): 'ab',
md5(md5('ac') + 'ac' + md5('ac')): 'ac',
...
}
Either way, you now have the hashes for all possible two-character combinations. Next, process the input string. Since an MD5 hex digest is always 32 characters long, break the input into 32-character strings and look each one up in your table:
'f894e71e1551d1833a977df952d0cc9d' => 'he'
'e44a1f9669fbf97d51309a2c6574d5ea' => 'll'
...
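A minimal Python sketch of the lookup-table approach that splits the long hash programmatically instead of by hand. It assumes the address only uses ASCII letters, digits and the symbols `_+.#-` (that character set is a guess; widen it if a lookup fails), and it also stores single characters in case an odd-length address ends with a 1-character segment. The names `segment_hash`, `encode` and `decode` are illustrative only:

```python
import hashlib
import string

def md5_hex(text):
    """Hex MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(text.encode()).hexdigest()

def segment_hash(s):
    """The per-segment hash J = md5(md5(s) + s + md5(s))."""
    return md5_hex(md5_hex(s) + s + md5_hex(s))

def encode(address):
    """Split into 2-character segments and concatenate every J."""
    return "".join(segment_hash(address[i:i + 2]) for i in range(0, len(address), 2))

# Assumed character set; widen it if a lookup fails
ALPHABET = string.ascii_letters + string.digits + "_+.#-"

# Lookup table: hash -> plaintext segment. Single characters are included too,
# assuming an odd-length address ends with a 1-character segment.
TABLE = {segment_hash(a + b): a + b for a in ALPHABET for b in ALPHABET}
TABLE.update({segment_hash(a): a for a in ALPHABET})

def decode(long_hash):
    """Split the long hash into 32-character digests and look each one up."""
    long_hash = "".join(long_hash.split())  # ignore line breaks and spaces
    if len(long_hash) % 32:
        raise ValueError("length is not a multiple of 32")
    chunks = [long_hash[i:i + 32] for i in range(0, len(long_hash), 32)]
    # A KeyError here means the segment uses a character outside ALPHABET
    return "".join(TABLE[chunk] for chunk in chunks)

print(decode(encode("helloworld#company.com")))  # helloworld#company.com
```

Because the table is keyed by hash, each segment costs one dictionary lookup, so decoding is linear in the length of the long hash once the table (a few thousand entries) is built.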

Here is an implementation of your question in Python.
My Code:
import hashlib, string
# MD5 hex digest of a string
def md5hashFunction(data):
    return hashlib.md5(data.encode()).hexdigest()
# md5(md5(data) + data + md5(data))
def finalHash(data):
    return md5hashFunction(md5hashFunction(data) + data + md5hashFunction(data))
# Every MD5 hex digest is 32 characters long, so the long hash splits into fixed 32-character parts
hashes = [
"f894e71e1551d1833a977df952d0cc9d",
"e44a1f9669fbf97d51309a2c6574d5ea",
"a746cdeb9ee1a5dfc771d280d33e5672",
"bf024973657c99bf80cb242d493d5bac",
"c771b3b0b422d5c13595cf3e73cfb1df",
"91caedee7a6c5f3ce2c283564a39c52d",
"3306d60cbc0e3e33d7ed01e780acb1cc",
"d9174cfea4704eb233b0f06e52f6d5ab",
"a5a5a89e6122dd55f8efcf024961c100",
"3d116007775d60a0d5781d2e35d747b5",
"dece2e0e3d79d272e40c8c66555f5525",
]
# Enumerate all letters and digits plus the extra characters "_+.#" used for decryption
alphabet = list(
string.ascii_lowercase + string.ascii_uppercase + string.digits + "_+.#"
)
# Python dictionary mapping each hash to its 2-character string
rainbowTable = {finalHash(x + y): x + y for x in alphabet for y in alphabet}
"""
rainbowTable
'31453dd786a8c6f6c7c8860d5fcea4be': 'aa',
'857dce5bcf6b6b32bec281207b2dba80': 'ab',
'e90d94b4b65ac19188fdae82acf7fbbc': 'ac',
'67299b8cedc5eafea7dda1daf9356b54': 'ad',
'40fca4e80bfc6e1faa2c4e2b7e0929f0': 'ae',
'de48fc1bd98f5508c513f9947a514ce8': 'af',
'4852089b1b43b45204907df0066c0edf': 'ag',
'e1b82a5fe4fdcf73d034a0d5063ffe3f': 'ah',
...... Continues....
"""
# Look up each 32-character part and join the decoded pairs into a single string
print("".join(rainbowTable[h] for h in hashes))
# Output ==> secret_jobs#anvato.com

Here is what you can do:
Step 1: Divide the hash string into 32-character blocks (each block is one 128-bit MD5 digest written in hex).
Step 2: Generate all possible 2-character strings from your candidate characters (letters, digits and any special characters).
Step 3: For each candidate segment, compute its MD5 hash, concatenate it with the plain-text segment and the same hash again, and compute the MD5 hash of the result.
Step 4: Compare the generated hash with the block. If it matches, append the segment to a string buffer. Repeat this process until all the blocks are decoded; the buffer then holds your answer.
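The four steps can be sketched in Python as a brute-force search per block. This recomputes every candidate hash for each block, so it is slower than building a dictionary once, but it follows the steps literally. The candidate character set and the names `crack_block` and `crack` are assumptions for illustration, and this version only handles 2-character segments:

```python
import hashlib
import itertools
import string

def md5_hex(text):
    return hashlib.md5(text.encode()).hexdigest()

def segment_hash(s):
    # Step 3: md5(md5(s) + s + md5(s))
    return md5_hex(md5_hex(s) + s + md5_hex(s))

# Step 2: assumed candidate characters; adjust as needed
CHARS = string.ascii_letters + string.digits + "_+.#-"

def crack_block(block):
    """Try every 2-character candidate against one 32-character block."""
    for a, b in itertools.product(CHARS, repeat=2):
        if segment_hash(a + b) == block:  # Step 4: compare
            return a + b
    return None  # no candidate matched

def crack(long_hash):
    # Step 1: 32-character blocks
    blocks = [long_hash[i:i + 32] for i in range(0, len(long_hash), 32)]
    buffer = []
    for block in blocks:
        match = crack_block(block)
        if match is None:
            raise ValueError(f"no match for block {block}")
        buffer.append(match)  # Step 4: save the match
    return "".join(buffer)
```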


jinja2 variable inside a variable

I am passing a dictionary to a template.
from jinja2 import Environment, FileSystemLoader
road_len = {"US_NEWYORK_MAIN":24,
"US_BOSTON_WALL":18,
"FRANCE_PARIS_RUE":9,
"MEXICO_CABOS_STILL":8}
env = Environment(loader=FileSystemLoader("."))
env.globals.update(country = "MEXICO")
env.globals.update(city = "CABOS")
env.globals.update(street = "STILL")
with open("output.txt", "w") as file_handle:
    file_handle.write(env.get_template("template.txt").render(road_len=road_len))
template.txt
This is a road length is: {{road_len["{{country}}_{{city}}_{{street}}"]}}
Expected output.txt
This is a road length is: 8
But this does not work, because nested variable substitution is not allowed.
You never nest Jinja {{..}} markers. You're already inside a template context, so if you want to use the value of a variable you just use the variable. It helps if you're familiar with Python, because you can use most of Python's string formatting constructs.
So you could write:
This is a road length is: {{road_len["%s_%s_%s" % (country, city, street)]}}
Or:
This is a road length is: {{road_len[country + "_" + city + "_" + street]}}
Or:
This is a road length is: {{road_len["{}_{}_{}".format(country, city, street)]}}
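A quick way to check the three forms is to render them from strings with an `Environment` whose globals match the question's. This sketch needs the third-party `jinja2` package and uses `Environment.from_string` instead of loading `template.txt` from disk; the fourth template adds Jinja's own `~` string-concatenation operator for comparison:

```python
from jinja2 import Environment

road_len = {"US_NEWYORK_MAIN": 24,
            "US_BOSTON_WALL": 18,
            "FRANCE_PARIS_RUE": 9,
            "MEXICO_CABOS_STILL": 8}

env = Environment()
env.globals.update(country="MEXICO", city="CABOS", street="STILL")

# Each template builds the key "MEXICO_CABOS_STILL" from the globals
templates = [
    '{{ road_len["%s_%s_%s" % (country, city, street)] }}',
    '{{ road_len[country + "_" + city + "_" + street] }}',
    '{{ road_len["{}_{}_{}".format(country, city, street)] }}',
    '{{ road_len[country ~ "_" ~ city ~ "_" ~ street] }}',
]
results = [env.from_string(src).render(road_len=road_len) for src in templates]
for value in results:
    print("This is a road length is: " + value)
```

`~` converts both operands to strings before joining them, so it also works when a part is a number, whereas `+` would fail on mixed types.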

The encryption won't decrypt

I was given an encrypted copy of the study guide here, but how do I decrypt and read it?
In a file called pa11.py write a method called decode(inputfile,outputfile). Decode should take two parameters - both of which are strings. The first should be the name of an encoded file (either helloworld.txt or superdupertopsecretstudyguide.txt or yet another file that I might use to test your code). The second should be the name of a file that you will use as an output file.
Your method should read in the contents of the inputfile and, using the scheme described in the hints.txt file above, decode the hidden message, writing to the outputfile as it goes (or all at once when it is done depending on what you decide to use).
The penny math lecture is here.
"""
Program: pennyMath.py
Author: CS 1510
Description: Calculates the penny math value of a string.
"""
# Get the input string
original = input("Enter a string to get its cost in penny math: ")
cost = 0
# Go through each character in the input string
for char in original:
    value = ord(char)  # ord() gives us the encoded number!
    if char >= "a" and char <= "z":
        cost = cost + (value - 96)  # offset the value of ord by 96
    elif char >= "A" and char <= "Z":
        cost = cost + (value - 64)  # offset the value of ord by 64
print("The cost of", original, "is", cost)
Another hint: Don't forget about while loops...
Another hint:
After letters - skip ahead by their penny math value + 2 positions
After numbers - skip ahead by their number + 7 positions
After anything else - just skip ahead by 1 position
The issue I'm having is that I can't seem to get the code right: when I run it to decrypt the file, the message comes out looking the same. This is the code I have been using:
def pennycost(c):
    if c >= "a" and c <= "z":
        return ord(c) - 96
    elif c >= "A" and c <= "Z":
        return ord(c) - 64

def decryption(inputfile, outputfile):
    with open(inputfile) as f:
        fo = open(outputfile, "w")
        count = 0
        while True:
            c = f.read(1)
            if not c:
                break
            if count > 0:
                count = count - 1
                continue
            elif c.isalpha():
                count = pennycost(c)
                fo.write(c)
            elif c.isdigit():
                count = int(c)
                fo.write(c)
            else:
                count = 6
                fo.write(c)
        fo.close()

inputfile = input("Please enter the input file name: ")
outputfile = input("Please enter the output file name (EXISTING FILE WILL BE OVERWRITTEN!): ")
decryption(inputfile, outputfile)
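A minimal sketch of `decode(inputfile, outputfile)` that follows the hints with a while loop over an index. It assumes "skip ahead by N positions" means the next character to keep sits N positions after the current one (so "anything else" simply moves on to the next character), and that every kept character, including spaces and newlines, is written out; both are interpretations, since hints.txt is not shown. Unlike the code in the question, it adds the +2 and +7 offsets from the hints and uses the required name `decode`. The helpers `penny_value` and `decode_text` are illustrative names:

```python
def penny_value(ch):
    """Penny math value: a/A = 1 ... z/Z = 26, None for anything else."""
    if "a" <= ch <= "z":
        return ord(ch) - 96
    if "A" <= ch <= "Z":
        return ord(ch) - 64
    return None

def decode_text(text):
    """Keep the character at the current position, then jump ahead per the hints."""
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        result.append(ch)
        value = penny_value(ch)
        if value is not None:      # letters: penny math value + 2 positions
            i += value + 2
        elif "0" <= ch <= "9":     # digits: number + 7 positions
            i += int(ch) + 7
        else:                      # anything else: 1 position
            i += 1
    return "".join(result)

def decode(inputfile, outputfile):
    with open(inputfile) as fin:
        text = fin.read()
    with open(outputfile, "w") as fout:
        fout.write(decode_text(text))
```

Keeping the decoding logic in `decode_text` makes it easy to test on short strings before running it on the real files.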

sqlite timecode calculations

I have two tables which I export from my video editing suite, one ("MediaPool") containing a row for each media file imported into the project, another ("Montage") for the portions of that file used in a specific edit. The fields that are associated between the two are MediaPool.FileName and Montage.Name, which are very similar (Filename only adds the file extension).
# MediaPool
Filename | Take
---------------------------------
somefile.mp4 | Getty
file2.mov | Associated Press
file3.mov | Associated Press
and
# Montage
Name | RecordIn | RecordOut
------------------------------------------
somefile | 01:01:01:01 | 01:01:20:19
somefile | 01:05:15:23 | 01:05:16:10
somefile | 01:25:19:10 | 01:30:16:04
file2 | 01:30:11:10 | 01:31:18:12
file2 | 01:40:15:22 | 01:42:21:17
The tables contain many more columns of course, but only the above is relevant.
Only the "MediaPool" table contains the field called "Take" which designates the file's copyright holder (long story). It can't be included in the "Montage" export. I needed to calculate the total duration of footage used from each source, by subtracting the RecordIn timecode from RecordOut and adding each result. This turned out to be more complicated than I expected, as I have some notions of programming but almost none when it comes to SQL (sqlite in my case).
I managed to come up with the following, which works fine and runs in under 4 seconds. However, from the little programming I've done, it seems overlong and very inelegant. Is there a shorter way to achieve this?
BTW, I'm using 25 fps timecode and I can't use LPAD in sqlite.
SELECT
Source,
SUBSTR('00' || CAST(DurationFrames/(60*60*25) AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)/(60*25) AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)%(60*25)/25 AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)%(60*25)%25 AS TEXT), -2, 2)
AS DurationTC
FROM
(
SELECT
MediaPool.Take AS Source,
Montage.RecordIn,
Montage.RecordOut,
SUM(CAST(SUBSTR(Montage.RecordOut, 1, 2) AS INT)*3600*25 +
CAST(SUBSTR(Montage.RecordOut, 4, 2) AS INT)*60*25 +
CAST(SUBSTR(Montage.RecordOut, 7, 2) AS INT)*25 +
CAST(SUBSTR(Montage.RecordOut, 10, 2) AS INT) -
CAST(SUBSTR(Montage.RecordIn, 1, 2) AS INT)*3600*25 -
CAST(SUBSTR(Montage.RecordIn, 4, 2) AS INT)*60*25 -
CAST(SUBSTR(Montage.RecordIn, 7, 2) AS INT)*25 -
CAST(SUBSTR(Montage.RecordIn, 10, 2) AS INT))
AS DurationFrames
FROM
MediaPool
JOIN
Montage ON MediaPool.FileName LIKE '%' || Montage.Name || '%'
GROUP BY
Take
ORDER BY
Take
)
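Here's a simplified query that produces the same results as yours on your test data. Mostly it uses printf() instead of a bunch of string concatenation and substr()s, and it uses strftime('%s', ...) to convert the hours:minutes:seconds part of the timecode to seconds. For a time-only string SQLite assumes the date 2000-01-01, so strftime() actually returns a Unix timestamp rather than seconds since midnight, but that constant offset cancels out when RecordIn is subtracted from RecordOut: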
WITH frames AS
(SELECT Take, sum((strftime('%s', substr(RecordOut,1,8))*25 + substr(RecordOut,10))
- (strftime('%s', substr(RecordIn,1,8))*25 + substr(RecordIn,10)))
AS DurationFrames
FROM MediaPool
JOIN Montage ON MediaPool.Filename LIKE Montage.Name || '.%'
GROUP BY Take)
SELECT Take AS Source
, printf('%02d:%02d:%02d:%02d', DurationFrames/(60*60*25),
DurationFrames%(60*60*25)/(60*25),
DurationFrames%(60*60*25)%(60*25)/25,
DurationFrames%(60*60*25)%(60*25)%25)
AS DurationTC
FROM frames
ORDER BY Take;
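To check the simplified query, this sketch loads the sample rows from the question into an in-memory database with Python's built-in sqlite3 module and runs it. The table definitions are an assumption (only the columns shown in the question, all as TEXT), and the printf() format string is single-quoted, since double quotes denote identifiers in SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MediaPool (Filename TEXT, Take TEXT);
CREATE TABLE Montage (Name TEXT, RecordIn TEXT, RecordOut TEXT);
INSERT INTO MediaPool VALUES
  ('somefile.mp4', 'Getty'),
  ('file2.mov', 'Associated Press'),
  ('file3.mov', 'Associated Press');
INSERT INTO Montage VALUES
  ('somefile', '01:01:01:01', '01:01:20:19'),
  ('somefile', '01:05:15:23', '01:05:16:10'),
  ('somefile', '01:25:19:10', '01:30:16:04'),
  ('file2', '01:30:11:10', '01:31:18:12'),
  ('file2', '01:40:15:22', '01:42:21:17');
""")

QUERY = """
WITH frames AS
  (SELECT Take,
          sum((strftime('%s', substr(RecordOut,1,8))*25 + substr(RecordOut,10))
            - (strftime('%s', substr(RecordIn,1,8))*25 + substr(RecordIn,10)))
            AS DurationFrames
   FROM MediaPool
   JOIN Montage ON MediaPool.Filename LIKE Montage.Name || '.%'
   GROUP BY Take)
SELECT Take AS Source,
       printf('%02d:%02d:%02d:%02d', DurationFrames/(60*60*25),
              DurationFrames%(60*60*25)/(60*25),
              DurationFrames%(60*60*25)%(60*25)/25,
              DurationFrames%(60*60*25)%(60*25)%25) AS DurationTC
FROM frames
ORDER BY Take;
"""
rows = conn.execute(QUERY).fetchall()
for source, timecode in rows:
    print(source, timecode)
```

On the sample data this gives 00:03:12:22 for Associated Press and 00:05:16:24 for Getty; file3.mov has no matching Montage rows, so it drops out of the inner join.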

pyparsing recursive grammar space separated list inside a comma separated list

I have the following string that I'd like to parse:
((K00134,K00150) K00927,K11389) (K00234,K00235)
Each step is separated by a space, and alternation is represented by a comma. I'm stuck on the first part of the string, where there is a space inside the brackets. The desired output is:
[[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
What I've got so far is a basic setup for recursive parsing, but I'm stumped on how to code a space-separated list into the bracket expression:
from pyparsing import Word, Literal, Combine, nums, \
Suppress, delimitedList, Group, Forward, ZeroOrMore
ortholog = Combine(Literal('K') + Word(nums, exact=5))
exp = Forward()
ortholog_group = Suppress('(') + Group(delimitedList(ortholog)) + Suppress(')')
atom = ortholog | ortholog_group | Group(Suppress('(') + exp + Suppress(')'))
exp <<= atom + ZeroOrMore(exp)
You are on the right track, but I think you only need one place where you include grouping with ()'s, not two.
import pyparsing as pp
LPAR,RPAR = map(pp.Suppress, "()")
ortholog = pp.Combine('K' + pp.Word(pp.nums, exact=5))
ortholog_group = pp.Forward()
ortholog_group <<= pp.Group(LPAR + pp.OneOrMore(ortholog_group | pp.delimitedList(ortholog)) + RPAR)
expr = pp.OneOrMore(ortholog_group)
tests = """\
((K00134,K00150) K00927,K11389) (K00234,K00235)
"""
expr.runTests(tests)
gives:
((K00134,K00150) K00927,K11389) (K00234,K00235)
[[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
[0]:
[['K00134', 'K00150'], 'K00927', 'K11389']
[0]:
['K00134', 'K00150']
[1]:
K00927
[2]:
K11389
[1]:
['K00234', 'K00235']
This is not exactly what you said you were looking for:
you wanted: [[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
I output : [[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
I'm not sure why there is grouping in your desired output around the space-separated part (K00134,K00150) K00927. Is this your intention or a typo? If intentional, you'll need to rework the definition of ortholog_group, something that will do a delimited list of space-delimited groups in addition to the grouping at parens. The closest I could get was this:
[[[[['K00134', 'K00150']], 'K00927'], ['K11389']], [['K00234', 'K00235']]]
which required some shenanigans to group on spaces, but not group bare orthologs when grouped with other groups. Here is what it looked like:
ortholog_group <<= pp.Group(LPAR + pp.delimitedList(pp.Group(ortholog_group*(1,) & ortholog*(0,))) + RPAR) | pp.delimitedList(ortholog)
The & operator in combination with the repetition operators gives the space-delimited grouping (*(1,) is equivalent to OneOrMore and *(0,) to ZeroOrMore, but the notation also supports *(10,) for "10 or more", or *(3,5) for "at least 3 and no more than 5"). This too is not exactly what you asked for, but it may get you closer if you really need to group the space-delimited bits.
But I must say that grouping on spaces is ambiguous - or at least confusing. Should "(A,B) C D" be [[A,B],C,D] or [[A,B],C],[D] or [[A,B],[C,D]]? I think, if possible, you should permit comma-delimited lists, and perhaps space-delimited also, but require the ()'s when items should be grouped.
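For comparison, here is a minimal hand-written recursive-descent parser in plain Python (using `re`, not pyparsing) for the flat interpretation produced by the first grammar above: parentheses create groups, and commas and spaces are both treated simply as separators. It is a sketch of the grammar, not a replacement for the pyparsing version, and the names `tokenize`, `parse_items` and `parse` are illustrative:

```python
import re

# One token per match; separators are skipped, anything unexpected is an error
TOKEN_RE = re.compile(r"(?P<LPAR>\()|(?P<RPAR>\))|(?P<ID>K\d{5})|(?P<SEP>[,\s]+)|(?P<BAD>.)")

def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "BAD":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        if m.lastgroup != "SEP":
            tokens.append(m.group())
    return tokens

def parse_items(tokens, i, closing):
    """Collect items until ')' (inside a group) or end of input; return (items, next index)."""
    items = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            group, i = parse_items(tokens, i + 1, closing=True)  # recurse into the group
            items.append(group)
        elif tok == ")":
            if not closing:
                raise ValueError("unbalanced ')'")
            return items, i + 1
        else:
            items.append(tok)  # an ortholog ID such as K00134
            i += 1
    if closing:
        raise ValueError("missing ')'")
    return items, i

def parse(text):
    items, _ = parse_items(tokenize(text), 0, closing=False)
    return items

print(parse("((K00134,K00150) K00927,K11389) (K00234,K00235)"))
```

Because separators carry no meaning here, "(A,B) C D" always becomes one flat group, which sidesteps the ambiguity described above at the cost of not distinguishing alternation from sequence.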

decoding an ASCII character message

I have no idea what I'm doing. I need to decode mmZ\dxZmx]Zpgy. I have an example, but I'm not sure what to do. Please help!
If (EncryptedChar - Key < 32) then
DecryptedChar = ((EncryptedChar - Key) + 127) - 32
Else
DecryptedChar = (EncryptedChar - Key)
The key is unknown (somewhere from 1 to 100).
Use ord and chr to convert between the ordinal ASCII values (ord) and the characters (chr) they represent.
Then you can loop through your string and apply your algorithm to each character.
For example:
import sys
SecretMessage = r"mmZ\dxZmx]Zpgy"  # raw string, so "\d" stays a backslash followed by "d"
Key = 88
for Letter in SecretMessage:
    EncryptedChar = ord(Letter)
    if (EncryptedChar - Key) < 32:
        DecryptedChar = ((EncryptedChar - Key) + 127) - 32
    else:
        DecryptedChar = (EncryptedChar - Key)
    sys.stdout.write(chr(DecryptedChar))
Run this to see the output. I'll leave the exercise of finding the key value 88 up to you (hint: it involves iterations). You also appear to be missing the first letter from SecretMessage (probably a :).
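Following that hint, a minimal sketch that wraps the rule in a function and iterates over every key from 1 to 100, printing each candidate so the readable one can be spotted by eye. The name `decrypt` is illustrative, and the message is written as a raw string so the backslash is kept:

```python
def decrypt(message, key):
    """Apply the question's decryption rule to every character."""
    out = []
    for ch in message:
        value = ord(ch) - key
        if value < 32:
            value = value + 127 - 32  # wrap back into the printable range
        out.append(chr(value))
    return "".join(out)

SECRET = r"mmZ\dxZmx]Zpgy"

# The key is only known to lie in 1..100, so try every one
for key in range(1, 101):
    print(f"{key:3d}: {decrypt(SECRET, key)!r}")
```

Printing with `!r` makes stray spaces at the ends of a candidate visible, which helps when scanning the list.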
