Hello everyone,
In one of projects I've found that data encrypts with algorithm I have never meet before. It converts input string to 10-symbol numeric string. It's definitely not one of popular hashes (md5, sha1, etc) or check sums (crc16, crc32, etc), I've checked all I known.
Example:
Input string: "davidc"; Output string: "2172453193".
Length of output string is fixed - 10 symbols. It contains only digits ("0123456789"). This is all details i can add =/
Maybe someone met something similar to that or maybe know algorithm - You'll economy lot of my time.
With love <3
Related
Im new so if this question was already Asked (i didnt find it scrolling through the list of results though) please send me the link.
I got a math quiz and im to lazy to go through all the possibilities so i thought i can find a program instead. I know a bit about programming but not much.
Is it possible (and in what programming language, and how) to read only one digit, e.g at the 3rd Position, in a integer?
And how is an integer actually saved, in a kind of array?
Thanks!
You can get rid of any lower valued digit (the ones and tens if you only want the hundreds) by dividing with rounding/truncation. 1234/100 is 12 in most languages if you are doing integer division.
You can get rid of any higher valued digits by using taking the modulus. 12 % 10 is 2 in many languages; just find out how the modulus is done in yours. I use "modulus" meaning "divide and keep the rest only", i.e. it is the opposite of "divide with rounding"; that which is lost by rounding is the final result of the modulus.
The alternative is however to actually NOT see the input as a number and treat it as text. That way it is often easier to ignore the last 2 characters and all leading characters.
I am working on a system that makes heavy use of pseudonyms to make privacy-critical data available to researchers. These pseudonyms should have the following properties:
They should not contain any information (e.g. time of creation, relation to other pseudonyms, encoded data, …).
It should be easy to create unique pseudonyms.
They should be human readable. That means they should be easy for humans to compare, copy, and understand when read out aloud.
My first idea was to use UUID4. They are quite good on (1) and (2), but not so much on (3).
An variant is to encode UUIDs with a wider alphabet, resulting in shorter strings (see for example shortuuid). But I am not sure whether this actually improves readability.
Another approach I am currently looking into is a paper from 2005 titled "An optimal code for patient identifiers" which aims to tackle exactly my problem. The algorithm described there creates 8-character pseudonyms with 30 bits of entropy. I would prefer to use a more widely reviewed standard though.
Then there is also the git approach: only display the first few characters of the actual pseudonym. But this would mean that a pseudonym could lose its uniqueness after some time.
So my question is: Is there any widely-used standard for human-readable unique ids?
Not aware of any widely-used standard for this. Here’s a non-widely-used one:
Proquints
https://arxiv.org/html/0901.4016
https://github.com/dsw/proquint
A UUID4 (128 bit) would be converted into 8 proquints. If that’s too much, you can take the last 64 bits of the UUID4 (= just take 64 random bits). This doesn’t make it magically lose uniqueness; only increases the likelihood of collisions, which was non-zero to begin with, and which you can estimate mathematically to decide if it’s still OK for your purposes.
Here you go UUID Readable
Generate Easy to Remember, Readable UUIDs, that are Shakespearean and Grammatically Correct Sentences
This article suggests to use the first few characters from a SHA-256 hash, similarly to what git does. UUIDs are typically based on SHA-1, so this is not all that different. The tradeoff between property (2) and (3) is in the number of characters.
With d being the number of digits, you get 2 ** (4 * d) identifiers in total, but the first collision is expected to happen after 2 ** (2 * d).
The big question is really not about the kind of identifier you use, it is how you handle collisions.
I have recently been looking into Bitcoin and the proof of work system.
In this system, when mining, a user has a "challenge string" that they need to concatenate with the CORRECT "proof string"(nonce) and hash, the outcome of that hash starts with a prefix of leading zeros and that's how they verify the block.
My question is, when combining that "challenge string" and the CORRECT "proof string"(nonce), why does the corresponding hash of those values start with the prefix of zeros? How does that work?
The combination of "Challenge string" and "proof of string" is sent to a hashing function, which is a one way function and results in a "random string" (The condition on "random string" is that it should have x number of zeroes in the beginning. The difficulty of guessing increases day to day, which is nothing but x increases).
The job of a Miner is to guess the "proof of string" until the condition on the "random string" is met.
So, it is a pure guessing game. GPUs are very good at generating random numbers very quickly. That is why miners around the world are using top class GPUs to mine the bitcoin transactions.
Not specific to Bitcoin but Bitcoin uses the same mechanism, see https://en.wikipedia.org/wiki/Hashcash for a description of how Proof Of Work "works" and why it works that way.
The short answer is that you are creating a hash collision (https://en.wikipedia.org/wiki/Collision_resistance) and the more leading zeros there are the more difficult it is to create the collision.
In Bitcoin, the difficulty is algorithmically chosen based on the compute on the bitcoin network. This typically only goes up, but should the compute go down the dificulty will also go down. The algorithm adjusts dificulty to keep transaction verification time around 10 minutes.
This is a Vigenere cipher-text
EORLL TQFDI HOEZF CHBQN IFGGQ MBVXM SIMGK NCCSV
WSXYD VTLQS BVBMJ YRTXO JCNXH THWOD FTDCC RMHEH
SNXVY FLSXT ICNXM GUMET HMTUR PENSU TZHMV LODGN
MINKA DTLOG HEVNI DXQUG AZGRM YDEXR TUYRM LYXNZ
ZGJ
The index of coincidence gave a shift of six (6): I know this is right (I used an online Java applet to decrypt the whole thing using the key 'QUARTZ').
However, in this question we are only told the first and last two letters of the Key - 'Q' and 'TZ.'
So far I have split the ciphertext into slices using this awesome applet. So the first slice is 0, k, 2k, 3k, 4k; the second is 1, k + 1, 2k + 1, 3k + 1; et cetera.
KeyPos=0: EQEQQSCXQJJHDEYIUTSVMTVUMTYJ
KeyPos=1: OFZNMICYSYCWCHFCMUULILNGYUX
KeyPos=2: RDFIBMSDBRNOCSLNERTONOIADYN
KeyPos=3: LICFVGVVVTXDRNSXTPZDKGDZERZ
KeyPos=4: LHHGXKWTBXHFMXXMHEHGAHXGXMZ
KeyPos=5: TOBGMNSLMOTTHVTGMNMNDEQRRLG
My idea was to calculate the highest-frequency letter in each block, hoping that the most frequent letter would give me some clue as to how to find 'U,' 'A' and 'R.' However, the most frequent letters in these blocks are:
KeyPos=0: Q,4 T,3 E,3, J,3
KeyPos=1: C,4 U,3 Y,3
KeyPos=2: N,4 O,3 R,3 D,3 B,2
KeyPos=3: V,4 D,3 Z,3
KeyPos=4: H,6 X,6 M,3 G,3
KeyPos=5: M,4 T,4 N,3 G,3
Which yields QCNVHM, or QUNVHM (being generous), neither of which are that close to QUARTZ. There are online applets that can crack this no problem, so it mustn't be too short a text to yield decent frequency counts from the blocks.
I guess I must be approaching this the wrong way. I just hoped one of you might be able to offer some clue as to where I am going wrong.
p.s. This is for a digital crypto class.
Interesting question...
I don't have a programmatic solution for cracking the original ciphertext, but I was able to solve it with a little mind power and some helpful JavaScript.
I started by using this page and the information you supplied. Provide the ciphertext, a key length of 6 and hit initialize. What's nice about the approach here is that unknowns in either the plaintext or key are left as hyphens.
Update the key, adding only what you know Q---TZ and click 'update plaintext'. At this point we know:
o---sua---opo---oca---nha---enc---rom---dth---ama---int---ept---our---mun---tio---ewi---eus---the---ond---loc---onf---now---hed---off---ere---nsw---esd---tmi---ght
Here's where I applied a bit of brain power. You start recognizing bits of the plaintext. the, now and off make an appearance. At the end, there's ght - this made me think the prior letter is likely a vowel. For example light or thought. I replaced the corresponding hyphen with u and clicked update keyword to find what letter would have produced that combination. The matching letter turns out to be F. I think updated the plaintext to see the results. They didn't look promising. So I tried i instead which resulted in:
o--usua--ropo--loca--onha--eenc--prom--edth--eama--eint--cept--gour--mmun--atio--wewi--beus--gthe--cond--yloc--ionf--mnow--thed--poff--mere--insw--nesd--atmi--ight
Now we're getting somewhere. At the start I see something that might be usual, and further in I see int--cept and near the end w--nesd-- at mi--ight. Voila. Filling in the letters for wednesday and updating the keyword yielded QUARTZ.
... So, how to port this approach to code? Not sure about the best way to do that just yet. The idea of using the known characters in the key, partially decrypting the ciphertext and brute forcing the rest is appealing. But without a dictionary handy, I'm not sure what the best brute-forcing method would be...
To be continued (maybe)...
An algorithm wouldn't just consider the most frequent letters but the frequency pattern of the whole alphabet. Technically you compute the index of coincidence for each possible shift and consider the maximal ones.
Well I have been working on an assigment and it states:
A program has to be developed, and coded in C language, to decipher a document written
in Italian that is encoded using a secret key. The secret key is obtained as random
permutation of all the uppercase letters, lowercase letters, numbers and blank space. As
an example, let us consider the following two strings:
Plain: “ABCDEFGHIJKLMNOPQRSTUVXWYZabcdefghijklmnopqrstuvwxyz0123456789 ”
Code: “BZJ9y0KePWopxYkQlRjhzsaNTFAtM7H6S24fC5mcIgXbnLOq8Uid 3EDv1ruVGw”
The secret key modifies only letters, numbers, and spaces of the original document, while
the remaining characters are left unchanged. The document is stored in a text file whose
length is unknown.
The program has to read the document, find the secret key (which by definition is
unknown; the above table is just an example and it is not the key used for preparing the
sample files available on the web course) using a suitable decoding algorithm, and write
the decoded document to a new text file.
And I know that I have to upload an English dictionary into the program but I don't why it has been asked (may be not in that statement but I have to do THAT). My question is, while I can do that program using simple encryption/decryption algorithm then what's the use of uploading the English dictionary in our program? So is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm? Or can somebody tell me what approach or algorithm should I use to solve that problem???
An early reply (and also authentic one) will be highly appreciated from you.
Thank you guys.
This is a simple substitution cipher. It can be broken using frequency analysis. The Wikipedia articles explain both concepts thoroughly. What you need to do is:
Find the statistical frequency of characters in Italian texts. If you can't find this published anywhere, you can build it yourself by analyzing a large corpus of Italian texts.
Analyze the frequency of characters in the cipher text, and match it to the statistical data.
The first Wikipedia article links to a set of tools that implement all of the above. You just need to use and possibly adapt it to your use case.
Your cipher is a substitution cipher. That is it substitutes one letter for another.
consider the cipher text
"yjr,1drv2ry1od1q1..."
We can use a dictionary to find the plaintext.
Find punctuation, since a space always follows a comma, you can find the substitution rule for spaces.
which gives you.
"yjr, drv2ry od q..."
Notice the word lengths. Since there only two 1 letter words in the english language the q is probably i or a. "yjr" is probably "why", "the", "how" etc.
We try why with the result
"why, dyv2yw od q..."
There are no english words with two y's, and end in w.
So we try "the" and get
"the, dev2et od q..."
We conclude that the is a likely answer.
Now we search our dictionary for words that start look like ?e??et.
rinse repeat.
That is, find some set of words which fit into the lengths available and do not break each others substitution rules.
Personally I just do the frequency analysis suggested above.
Frequency analysis, as both other respondents said, is the way to go, and you can use digrams and trigrams to make it much stronger. Just grab tons of Italian text from the web and churn ahead! It's really pretty simple programming.