Difference between encoding and encryption - encryption

What is the difference between encoding and encryption?

Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed.
Encryption transforms data into another format in such a way that only specific individual(s) can reverse the transformation.
For Summary -
Encoding is for maintaining data usability and uses schemes that are publicly available.
Encryption is for maintaining data confidentiality and thus the ability to reverse the transformation (keys) are limited to certain people.
More details in SOURCE

Encoding:
Purpose: The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system.
Used for: Maintaining data usability i.e., to ensure that it is able to be properly consumed.
Data Retrieval Mechanism: No key and can be easily reversed provided we know what algorithm was used in encoding.
Algorithms Used: ASCII, Unicode, URL Encoding, Base64.
Example: Binary data being sent over email, or viewing special characters on a web page.
Encryption:
Purpose: The purpose of encryption is to transform data in order to keep it secret from others.
Used for: Maintaining data confidentiality i.e., to ensure the data cannot be consumed by anyone other than the intended recipient(s).
Data Retrieval Mechanism: Original data can be obtained if we know the key and encryption algorithm used.
Algorithms Used: AES, Blowfish, RSA.
Example: Sending someone a secret letter that only they should be able to read, or securely sending a password over the Internet.
Reference URL: http://danielmiessler.com/study/encoding_vs_encryption/

Encoding is the process of transforming data so that it may be transmitted without danger over a communication channel or stored without danger on a storage medium. For instance, computer hardware does not manipulate text, it merely manipulates bytes, so a text encoding is a description of how text should be transformed into bytes. Similarly, HTTP does not allow all characters to be transmitted safely, so it may be necessary to encode data using base64 (uses only letters, numbers and two safe characters).
When encoding or decoding, the emphasis is placed on everyone having the same algorithm, and that algorithm is usually well-documented, widely distributed and fairly easily implemented. Anyone is eventually able to decode encoded data.
Encryption, on the other hand, applies a transformation to a piece of data that can only be reversed with specific (and secret) knowledge of how to decrypt it. The emphasis is on making it hard for anyone but the intended recipient to read the original data. An encoding algorithm that is kept secret is a form of encryption, but quite vulnerable (it takes skill and time to devise any kind of encryption, and by definition you can't have someone else create such an encoding algorithm for you - or you would have to kill them). Instead, the most used encryption method uses secret keys : the algorithm is well-known, but the encryption and decryption process requires having the same key for both operations, and the key is then kept secret. Decrypting encrypted data is only possible with the corresponding key.

Encoding is the process of putting a sequence of characters into a special format for transmission or storage purposes
Encryption is the process of translation of data into a secret code. Encryption is the most effective way to achieve data security. To read an encrypted file, you must have access to a secret key or password that enables you to decrypt it. Unencrypted data is called plain text ; encrypted data is referred to as cipher text

Encoding is for maintaining data usability and can be reversed by employing the same algorithm that encoded the content, i.e. no key is used.
Encryption is for maintaining data confidentiality and requires the use of a key (kept secret) in order to return to plaintext.
Also there are two major terms that brings confusion in the world of security Hashing and Obfuscation
Hashing is for validating the integrity of content by detecting all modification thereof via obvious changes to the hash output.
Obfuscation is used to prevent people from understanding the meaning of something, and is often used with computer code to help prevent successful reverse engineering and/or theft of a product’s functionality.
Read more # Danielmiessler article

See encoding as a way to store or communicate data between different systems. For example, if you want to store text on a hard drive, you're going to have to find a way to convert your characters to bits. Alternatively, if all you have is a flash light, you might want to encode your text using Morse. The result is always "readable", provided you know how it's stored.
Encryption means you want to make your data unreadable, by encrypting it using an algorithm. For example, Caesar did this by substituting each letter by another. The result here is unreadable, unless you know the secret "key" with which is was encrypted.

I'd say that both operations transform information from one form to another, the difference being:
Encoding means transforming information from one form to another, in most cases it is easily reversible
Encryption means that the original information is obscured and involves encryption keys which must be supplied to the encryption / decryption process to do the transformation.
So, if it involves (symmetric or asymmetric) keys (aka a "secret"), it's encryption, otherwise it's encoding.

Encoding -》 example data is 16
Then encoding is 10000 means it's binary format or ASCII or UNCODED etc
Which can be read by any system eassily and eassy to understand it's real meaning
Encryption -》 example data is 16
Then encryprion is 3t57 or may be anything depend upon which algo is used to encryption
Which can be read by any system eassily BUT ony who can understand it's real meaning who has it's decryption key

These are little bit different from each other. The encoding used when we want to convert text in a specific computer coding technique and in the encryption we hide data between a specific key or text.

Encoding is process of transforming given set of characters in relevant accepted format, take this question's URL,
This is what we see -->
hhttps://stackoverflow.com/questions/4657416/difference-between-encoding-and-encryption
Over transmission this will be transformed to -->
https%3A%2F%2Fstackoverflow.com%2Fquestions%2F4657416%2Fdifference-between-encoding-and-encryption
^ is example of URL encoding using ASCII char set where,
: = %3A
/ = %2F
The reverse of Encoding is Decoding to original form and with given ASCII standard.
Encryption is process of converting plane text to cipher text so only authorized party can decipher it.
For example a simple HELLO is encrypted into KHOOR if just 3 characters are shifted.
p.s. Encoding (to code in some form) is form of encryption. :)
what-is-encryption

Encryption converts data to non-readable format (Possibly containing special non-readable characters).
Encoding helps to convert that data to readable format (characters) so that it can be stored for future use i.e. possibly during decryption.

Related

File Delimiters on AES 256 Encrypted fields

I have a requirement for one of my projects in which I am expecting a few of the incoming fields encrypted as AES-256 when sent to us by upstream. The incoming file is comma delimited. Is there a possibility that the AES encrypted fields may contain "," throwing off the values to different fields? What about if it is pipe delimited or some other delimiter?
Also, how what should be the datatype of these encrypted fields in order to read these encrypted fields using an ETL tool?
Thanks in Advance
AES as a block cipher is a family of permutations selected by the key. The output is expected to be random ( more precisely we believe that AES is a Pseudo-Random-Permutation)
AES ( like any block cipher) outputs binary data, usually as a byte array and bytes can take any value between 0 and 256 with equal probability.
You are not alone;
Transmitting binary data can create problems, especially in protocols that are designed to deal with textual data. To avoid it altogether, we don't transmit binary data. Many of the programming errors related to encryption on Stack Overflow are due to sending binary data over text-based protocols. Most of the time this works, but occasionally it fails and the coders wonder about the problem. The binary data corrupts the network protocol.
Therefore hex, base64, or similar encodings are necessary to mitigate this. Base64 is not totally URL safe and one can make it URL safe with a little work.
And note that has nothing to do with security; it is about visibility and interoperability.

Terminology : encoding without decoding?

I am having trouble understanding such basic concept.
I did some research about cryptography and manipulated few concepts (RSA key pair, AES/DES/whatever secret key, hash functions ...). But I would like to understand more deeply one basic thing :
Encoding is transforming a message into an other form.
Decoding is giving a message its original form.
Well, for me it sounds like encryption is encoding. And I think (please correct me) that encrytion is a way of encoding (for a very particular purpose : increasing the confidence of having a known list of person who can decode).
But what about hash function ? Since there is no decoding function, when we hash a message, can we say :
"this text is this message encoded with SHA-1 algorithm",
as we can surely say :
"this digest is this message hashed with SHA-1 algorithm" ?
Thank you !
Encoding, and its reverse decoding, are mere transformations of the data into some alternative form. Each form expresses the exact same data, just written differently. The transformation is well known and can be carried out by anyone.
Encryption, and its reverse decryption, is the encoding of data using a secret. The ciphertext (the encrypted data), is for all intents and purposes random noise. The ciphertext does not express the plaintext in some alternative format, the plaintext is hidden inside the ciphertext. The transformation is not well known, as it requires a secret key which, supposedly, only specific entities are in possession of.
In that way, yes, encryption is a specialised form of encoding, but in usage "encoding" typically means a transformation that can be carried out by anyone, while "encryption" specifically involves preventing unauthorised parties from carrying out the transformation.
Hashing is a one-way operation (there's no dehashing) and thereby entirely distinct from the other two operations.

keydata and IV for aes in tcl

I have a tcl/tk based tool, which uses network password for authentication. Issue is that, it is saving password in the logs/history. So objective is to encrypt the password.
I tried to use aes package. But at the very beginning aes::init asks for keydata and initialization vector (16 byte). So how to generate IV and keydata. Is is some Random number? I am a novice in encryption algorithms.
If you have the password in the logs/history, why not fix the bug of logging/storing it in the first place?
Otherwise there are distinct things you might want:
A password hashing scheme like PBKDF2, bcrypt, argon2 etc. to store a password in a safe way and compare some user input to it. This is typically the case when you need to implement some kind of authentication with passwords on the server side.
A password encryption and protection scheme like AES. You need a password to authenticate to some service automatically, and it requires some form of cleartext password.
You have some secret data and need to securly store it to in non cleartext form.
If you have case 1, don't use the aespackage, it is the wrong tool for the job. If you have case 2, the aes package might help you, but you just exchanged the problem of keeping the password secret with the other problem of keeping the key secret (not a huge win). So the only viable case where aes is an option might be 3.
Lets assume you need to store some secret data in a reversible way, e.g. case 3 from above.
AES has a few possible modes of operation, common ones you might see are ECB, CBC, OFB, GCM, CTR. The Tcllib package just supports ECB and CBC, and only CBC (which is the default) is really an option to use.
Visit Wikipedia for an example why you should never use ECB mode.
Now back to your actual question:
Initialization Vector (IV)
This is a random value you pick for each encryption, it is not secret, you can just publish it together with the encrypted data. Picking a random IV helps to make two encrypted blocks differ, even if you use the same key and cleartext.
Secret Key
This is also a random value, but you must keep it secret, as it can be used for encryption and decryption. You often have the same key for multiple encryptions.
Where to get good randomness?
If you are on Linux, BSD or other unixoid systems just read bytes from /dev/urandom or use a wrapper for getrandom(). Do NOT use Tcls expr {rand()} or similar pseudorandom number generators (PRNG). On Windows TWAPI and the CryptGenRandom function would be the best idea, but sadly there is no Tcl high level wrapper included.
Is that enough?
Depends. If you just want to hide a bit of plaintext from cursory looks, maybe. If you have attackers manipulating your data or actively trying to hack your system, less so. Plain AES-CBC has a lot of things you can do wrong, and even experts did wrong (read about SSL/TLS 1.0 problems with AES-CBC).
Final words: If you are a novice in encryption algorithms, be sure you understand what you want and need to protect, there are a lot of pitfalls.
If I read the Tcler's Wiki page on aes, I see that I encrypt by doing this:
package require aes
set plaintext "Some super-secret bytes!"
set key "abcd1234dcba4321"; # 16 bytes
set encrypted [aes::aes -dir encrypt -key $key $plaintext]
and I decrypt by doing:
# Assuming the code above was run...
set decrypted [aes::aes -dir decrypt -key $key $encrypted]
Note that the decrypted text has NUL (zero) bytes added on the end (8 of them in this example) because the encryption algorithm always works on blocks of 16 bytes, and if you're working with non-ASCII text then encoding convertto and encoding convertfrom might be necessary.
You don't need to use aes::init directly unless you are doing large-scale streaming encryption. Your use case doesn't sound like it needs that sort of thing. (The key data is your “secret”, and the initialisation vector is something standardised that usually you don't need to set.)

Is Base64 an encryption or encoding algorithm?

I have to use an encryption algorithm using Base64 but when I researched online I find forums state it is an encoding algorithm. This has me confused. :(
Is Base64 an encryption or encoding algorithm? How do we differentiate between the two except for the fact that one is publicly decipherable while the other needs a key for that?
It's an encoding algorithm (hence "Base64 encoding") to allow people to move data in an ASCII friendly environment (i.e. no control characters or anything non-printable). It should give you good portability with XML and JSON etc.
The encoding is entirely well known, the algorithm is simple and as it has not "mutability" of the algorithm or concept of keys etc. it is not considered as "encryption".
In summary, anybody can Base64 decode your content, so it's not encryption. At least not useful as encryption. It may keep a four year old stumped, but that's it.
An encoding algorithm merely presents data in an alternative format. It does not in any way attempt to hide data, it merely expresses the same data in an alternative syntax. Base64 is such an encoding algorithm. It merely encodes arbitrary data using only ASCII characters, which is useful in many situations in which non-ASCII characters may not be handled correctly. You can encode and decode Base64 back and forth all day long; there's no secret, no protection, no encryption.
The difference between encoding and encrypting is in whether you need to know a secret in order to get back the original form. base64 is an encoding because all you need to know is the algorithm to encode/decode.
When something is encrypted, there's a secret key that's used, and you need to know the key in order to decrypt it. There's two general types of encryption:
symmetric encryption = the same key is used to encrypt and decrypt. The correspondents using this encryption both need to know this key.
asymmetric encryption = different keys are used to encrypt and decrypt. This is also called public key encryption because you can make one of the keys well known (public), while keeping the other one secret (private). This allows anyone to encrypt a message that using the public key, while only the person who knows the private key can decrypt it, or vice versa.
One can certainly see Base64 as a substitution cipher with a pre-set/fixed key which also blows up the ciphertext by roughly 4/3, but this is not a very useful thought process. The main property of it is that it transforms some data into another format without some additional information. So it is an encoding algorithm.
Note that there are different variants of Base64 with different alphabets such as the one that is URL-safe (table 2 of the RFC4648). If you can set the alphabet with positions, then it will be an encryption algorithm, but it shouldn't be called Base64 anymore.

How to distribute init. vector with data encrypted via a symmetric cipher?

I want to provide for the user a service of encrypting some data via symmetric cipher to a file. The user simply provide a key and he/she may provide an initialize vector for the cipher.
Is there a standard how the file should look like? It makes sense to fill the file with the encrypted data and show the corresponding initialize vector in a dialog window. It may seem reasonable to someone else that the initialize vector should be stored in the file with the encrypted data.
The important thing for me is that the result is useful for a user and he/she won't need to bother with adjustment of the result.
Thank for a comment!
It is common practice to provide the IV as the first block of the cyphertext file. That way the receiver just treats the first 8 bytes (DES) or 16 bytes (AES) as the IV and the rest of the file as the actual cyphertext.
Use the same format for the IV as you are using for the cyphertext: Base64, hex, byte data or whatever.
In principle, you can use any format you want, as long as the decrypting part of the program knows how to read it. For efficiency, having the initialization vector before the data seems a good idea.
If you want to encrypt files, a good idea would be to not create your own format (which leads to you having to do decisions like the one here), but use an existing file format (which then also is a cryptographic protocol).
I recommend the OpenPGP message format, as defined in RFC 4880 (or some subset thereof, if you don't need all features). This also has the advantage that your clients then can decrypt your files using any OpenPGP implementation (like pgp or gpg), if your program somehow ceases to work (of course, only if they have the key/password).
you should be fine if you store the IV together with the encrypted data in the file ...

Resources