base64 decoded string gives weird chars - encryption

eyJpc3MiOiJodHRwczpcL1wvYXV0aC5zbmFwY2hhdC5jb21cL3NuYXBfdG9rZW5cL3Rva2VuIiwidHlwIjoiSldUIiwiZW5jIjoiQTEyOENCQy1IUzI1NiIsImFsZyI6ImRpciIsImtpZCI6InNuYXAtYWNjZXNzLXRva2VuLWExMjhjYmMtaHMyNTYuMCJ9..mpqjrn8IPdzqrQC0VhJwMA.JJN9Rrc1k_qh1Iq-jGS-1754-iI_L5mISH7mHix5WCIXx4wqkQz3z8o9nDcBRUJijioV_EMFYW9OayWGHaFR5NlG0ROKHfkJPPWSz4Y47jyZwQxKjEQDMCdPi9HcpNJM_ao6umAQj3gdfFqGK8M9e2_oYy-q6bR6UzeqFvQVLt599KLwl2yJhevgLRFBs7kLd5NG8ZsKGNhTwWs7zYPPZFutyhOmPY13zt1hJsSwek1UXRRZm8qZEEQZsmSbuSQ0sAMvyIh9uZyMCEwdMfo6pU31cnya29Pi_vHJP_TLHH0PNgddOPzpp911Yp4c1lfEY99C3dknQ5DJFtkfdaA3MAUrqKj8NAsIcrX8qPrxpVhDgZ2tqqrkgQb6EMoxEIdRGssIRdR5_jL-F8_8xfhNxIM3mv1NEPkSPIBfOsbSRbBGPecCUwmaB-yP9OmPEyUWv0ieQkGKp5B1J6cFykrMlpmmGkB7H9WIwuDNM4IPLBBBaLgGegIBdwrTU22Yv7Qn2RXKpDObPRuSghUmIvLpr_LwGZ78N4YW-G-nTw_EOjlD58UDHOuth_EcKszBeLs0_EIe9JZzykjulg3ffROHI-
This is a token. When base64 decoded it gives some valid output, but then it starts printing weird characters. Is this really only base64, or is there even a way to tell? I stripped some of the characters out for obvious reasons.

It's not base 64 encoded, it is base 64 URL encoded. Replace the - with a + and the _ (underscore) with the / character, then pad with = characters until you have a multiple of 4 base 64 characters (not counting whitespace). Then decode and the result should be correct. If not, the base 64 URL code was probably damaged.
I presume you have stripped off some characters and replaced them with dots, because dots should not be present in URL-safe base 64.
Of course finding and using a base64url decoder would be more efficient than the generic find/replace/append scheme mentioned here.
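As a sketch of that find/replace/pad scheme in C# (any language with a standard Base64 API works the same way; the class and method names here are mine):
using System;

static class Base64Url
{
    public static byte[] Decode(string input)
    {
        // Map the URL-safe alphabet back to the standard one.
        string s = input.Replace('-', '+').Replace('_', '/');
        // Re-append '=' padding until the length is a multiple of 4.
        s = s.PadRight(s.Length + (4 - s.Length % 4) % 4, '=');
        return Convert.FromBase64String(s);
    }
}
For a JWT-style token, apply this to each dot-separated segment individually; the dots themselves are separators, not base64url data. Many frameworks also ship a ready-made decoder (ASP.NET Core's WebEncoders.Base64UrlDecode, for instance), which is the more efficient route mentioned above.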

Related

Original Base64 value is edited but it still decrypts to the same string

I am encrypting plain text using RSA and converting the result to a Base64 string. But before decrypting, I altered the Base64 string and then tried to decrypt it... and it gave me the same original text back.
Is there anything wrong?
Original Plain Text: 007189562312
Output Base64 string : VfZN7WXwVz7Rrxb+W08u9F0N9Yt52DUnfCOrF6eltK3tzUUYw7KgvY3C8c+XER5nk6yfQFI9qChAes/czWOjKzIRMUTgGPjPPBfAwUjCv4Acodg7F0+EwPkdnV7Pu7jmQtp4IMgGaNpZpt33DgV5AJYj3Uze0A3w7wSQ6/tIgL4=
Altered Base64 String : VfZN7WXwVz7Rrxb+W08u9F0N9Yt52DUnfCOrF6eltK3tzUUYw7KgvY3C8c+XER5nk6yfQFI9qChAes/czWOjKzIRMUTgGPjPPBfAwUjCv4Acodg7F0+EwPkdnV7Pu7jmQtp4IMgGaNpZpt33DgV5AJYj3Uze0A3w7wSQ6/tIgL4=55
Please explain. Thank you.
I'm assuming you're asking whether the altered ciphertext should have thrown an error when decrypting. It looks like the altered string only adds two characters to the end and is otherwise the same string.
Your Base 64 library probably makes some reasonable assumptions when parsing Base 64 data. Base 64 works by encoding 3 bytes into 4 characters. If the data length at the end is not a multiple of 3 it must be padded, which is signalled by the = at the end of the encoded string.
This also means that during parsing the library knows the padding characters mark the end, and stops parsing there. Since your alteration appears after the padding, the encoded ciphertext didn't effectively change.
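Whether an error is raised therefore depends entirely on how strict the parser is. For comparison (this is .NET, not necessarily the library you used), a strict RFC 4648 parser rejects the trailing characters outright:
using System;

class StrictDecodeDemo
{
    static void Main()
    {
        // "VGVzdA==" is the Base64 encoding of "Test".
        byte[] ok = Convert.FromBase64String("VGVzdA==");
        try
        {
            // Characters after the '=' padding are invalid input
            // for a strict parser.
            Convert.FromBase64String("VGVzdA==55");
        }
        catch (FormatException)
        {
            Console.WriteLine("trailing characters rejected");
        }
    }
}
A lenient parser, like the one your library evidently uses, simply stops at the padding and returns the same bytes as for the unaltered string.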

How to represent acute accents in ASCII?

I'm having an encoding problem related to cookies on one of my websites.
A user is inputting Usuário, which has an acute accent, and that's being put in a cookie. The raw hex for the cookie response is (for the Usuário string):
55 73 75 C3 A1 72 69 6F
When I see it in the browser it shows up as UsuÃ¡rio, which is really messy. I need to fix this up.
Then I went to this website: http://www.rapidtables.com/convert/number/hex-to-ascii.htm and converted the hex value to see how it would look. I got the same garbled output.
Right. This means the hex code is wrong. Then I tried converting Usuário to ASCII to see what it should be, using this website: http://www.asciitohex.com/
To my surprise, the hex it produced is exactly the one that shows up messy. Why???
And how do I represent Usuário in ASCII so I can put it in a cookie? Should I manually encode it?
PS: I'm using ASP.NET, just in case it matters.
As of 2015 the standard for storing character data on the web is UTF-8, not ASCII. ASCII only contains the first 128 characters of the codepage and does not include any kind of accented characters. To add accented characters beyond those 128 there were many legacy solutions: codepages. Each codepage added a different set of 128 characters on top of the default ASCII list, thereby allowing 256 different characters to be represented.
The problem was that this didn't properly solve the issue: ASCII-based codepages were more or less incompatible with each other (except for the first 128 characters), and there was usually no way of programmatically knowing which codepage was in use.
One of the solutions was UTF-8, which is a way to encode the Unicode character set (containing most of the characters used around the world, and more) while trying to remain compatible with ASCII. The first 128 characters are the same in both cases, but beyond that UTF-8 characters become multi-byte: one character is encoded as a series of bytes (usually 2-3, depending on the character).
The problem arises if you are using an ASCII-based single-byte codepage (like ISO-8859-1), which encodes every supported character as a single byte, but your input is actually UTF-8, which encodes accented characters as multiple bytes (you can see this in your hex example: á is encoded as C3 A1, two bytes). If you try to read those two bytes in an ASCII-based codepage that uses a single byte for every character (in Western Europe this is usually ISO-8859-1), each of the two bytes will be rendered as a separate character.
In the web world the default encoding is UTF-8, so your clients will usually send their requests using UTF-8. ASP.NET is Unicode aware, so it can handle these requests. However, somewhere in your code this UTF-8 is accidentally converted into ISO-8859-1 and then back into UTF-8. This might happen on various layers; given your symptoms it probably happens at the cookie layer, which is sometimes problematic (here is how it worked in 2009). You should also double-check that your application uses UTF-8 everywhere else (views, database, etc.) if you want to properly support accented characters.
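As a quick C# illustration of the mix-up described above, reproducing the hex dump from the question:
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // UTF-8 encodes 'á' as the two bytes C3 A1:
        byte[] utf8 = Encoding.UTF8.GetBytes("Usuário");
        Console.WriteLine(BitConverter.ToString(utf8));
        // 55-73-75-C3-A1-72-69-6F, matching the cookie dump.

        // Reading those same bytes as ISO-8859-1 turns the pair
        // into two separate characters:
        string garbled = Encoding.GetEncoding("ISO-8859-1").GetString(utf8);
        Console.WriteLine(garbled);   // UsuÃ¡rio
    }
}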

CR/LF generated by PBEWithMD5AndDES encryption?

Can the encrypted string produced by PBEWithMD5AndDES and then Base64 encoded contain CR and/or LF characters?
Base64 consists of printable characters only. However, when it is used as a transfer encoding for e-mail (MIME), it is split into lines separated by CR-LF.
PBEWithMD5AndDES returns binary data. PBE encryption is defined within the PKCS#5 standard, and this standard does not have a dedicated base 64 encoding scheme. So the question becomes for which system you need to Base 64 encode the binary data. Wikipedia has a nice section within the Base 64 article that explains the various forms.
You may encounter a PBE implementation that returns Base 64 encoded data without mentioning which of the above schemes is used. In that case you need to figure out the scheme somehow: search for it, ask the community, look at the source or, if all else fails, run a set of tests on the output.
Fortunately you are pretty safe if you are decoding base 64 and ignoring all the whitespace. Note that some implementations disregard padding, so add it before decoding, if applicable.
If you perform the base 64 encoding yourself, I would strongly suggest not outputting any whitespace, using only the default alphabet (with '+' and '/' signs) and always padding when required. After that you can always split the result and replace any non-standard character (especially the '+' and '/' signs of course), or remove the padding.
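To illustrate the difference with .NET's encoder (the same distinction exists in most libraries, including the Android one shown below):
using System;

class WrapDemo
{
    static void Main()
    {
        byte[] data = new byte[100];   // stand-in for the ciphertext

        // Plain Base64: one unbroken line, printable characters only.
        string plain = Convert.ToBase64String(data);

        // MIME-style Base64: CR-LF inserted every 76 characters.
        string mime = Convert.ToBase64String(
            data, Base64FormattingOptions.InsertLineBreaks);

        Console.WriteLine(plain.Contains("\r\n"));   // False
        Console.WriteLine(mime.Contains("\r\n"));    // True
    }
}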
I was using Java with the Android SDK. I found that the command:
String s = Base64.encodeToString(enc, Base64.DEFAULT);
did line wrapping. It put LF chars into the output string.
I found that:
String s = Base64.encodeToString(enc, Base64.NO_WRAP);
did not put the LF characters into the output string.

Why do the encodings of a URL and the query string part differ?

I was researching why my query parameters contain plus + signs instead of %20, and why they contain strings like %C3%BC instead of a ü (UTF-8), unlike an encoded URL.
After 2 hours of thinking my webapp was not compliant with the URL encoding standard, I found that the encoding scheme of a query string is not the same as the encoding of the rest of the URL (I mean the part without the query string).
Examples:
URL:
whitespace encodes to %20
UTF-8 chars stay UTF-8 chars
Query params:
whitespace encodes to +
UTF-8 chars encode to their percent-encoded hex representation
So can someone tell me why the encoding schemes differ, since the query parameters are part of the URL?
See:
Wikipedia: Percent-encoding
Wikipedia: Query string
URIs originated in RFC 1630, with percent-encoding as a method to allow "unsafe" characters to be represented. This original version actually mentioned the ISO Latin 1 character set as the encoding for non-ASCII characters. RFC 1738 later that year removed this reference to Latin-1 in defining URLs.
The query string format is actually a different but related encoding, application/x-www-form-urlencoded, defined in RFC 1866 along with HTML 2.0. It was based on RFC 1738, but specified that spaces (not all whitespace, just the character with ASCII code 0x20) are replaced by '+' and that line breaks are to be encoded as CRLF (i.e. %0D%0A). The former is likely because that saves 2 bytes for a very common character in form submissions at the expense of using an extra 2 bytes for a much less common character, and the latter is to avoid problems when transferring between systems using different end-of-line codings. Non-ASCII characters were left unconsidered.
UTF-8 coding in URIs came over a decade later, in RFC 3986, although individual protocols may have specified this or another encoding of non-ASCII characters earlier. To maintain backwards compatibility, all UTF-8 octets must be percent-encoded. The companion RFC 3987 defines "Internationalized Resource Identifiers" (IRIs) which are basically "URIs with most codepoints 160 and above allowed to appear unencoded", but many protocols still require URIs. Note that your statement above is incorrect, as a URL may not contain an unencoded ü or any other non-ASCII character.
application/x-www-form-urlencoded has been internationalized in a different manner. The HTML5 specification of application/x-www-form-urlencoded explicitly allows that any ASCII-compatible character set may be used for characters in the query string, and in fact different fields may use different character sets, but all non-ASCII octets must still be percent-encoded. When used in the query part of an IRI, it is possible that these characters could be represented unencoded if properly-normalized UTF-8 is being used as the character set, since conversion back to a URI would result in correct application/x-www-form-urlencoded data.
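The practical upshot in .NET terms: HttpUtility.UrlEncode implements the application/x-www-form-urlencoded rules, while Uri.EscapeDataString follows RFC 3986, so the same input comes out differently:
using System;
using System.Web;

class QueryEncodingDemo
{
    static void Main()
    {
        string input = "ü a";

        // Form-urlencoded: space -> '+', ü -> percent-encoded UTF-8 bytes.
        Console.WriteLine(HttpUtility.UrlEncode(input));   // %c3%bc+a

        // RFC 3986 component escaping: space -> %20.
        Console.WriteLine(Uri.EscapeDataString(input));    // %C3%BC%20a
    }
}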
They don't necessarily have to differ; a + is a valid path character and a ü is a valid search character (per RFC 3987). You're probably seeing browsers or some other preconceived encoding scheme making assumptions that are either outdated or overly cautious.
There is no difference between + and %20 when it comes to query string parameters:
"SPACE is encoded as '+' or '%20'" (quoting the reference)

HttpUtility.HtmlDecode cannot decode ASCII greater than 127

I have a list of characters that display fine in the browser in the form of HTML-encoded characters such as &#8364; ...
But when posting these characters to the server I realized that HttpUtility.HtmlDecode cannot convert them to characters the way the browser did; they all become spaces.
text = System.Web.HttpUtility.HtmlDecode("&#8364;");
I expect it to return € but it returns a space instead. The same thing happens with some other characters as well.
Does anyone know how to fix this or any workaround?
This is commonly the result of using literal values and mixing UTF-8 and ASCII. In UTF-8 the euro sign is encoded as 3 bytes, so there is no single-byte ASCII counterpart for it.
Update
Your code cannot work if the data is treated as ASCII, since ASCII only covers the first 128 characters and UTF-8 encodes everything beyond them as multiple bytes. What arrives at the server here is the percent-encoded form, so decode it as a URL:
// !!! NOT HtmlDecode!!!
text = System.Web.HttpUtility.UrlDecode("%E2%82%AC");
UPDATE
OK, I have left the code as it was but added the comment that HtmlDecode does not work here. It does not work because this is not an encoding that concerns HTML at all; %E2%82%AC is URL encoding, and as such you need to use UrlDecode instead.
ASCII is 7-bit; there are no characters 128 through 255. The MSDN article you linked follows the long tradition of pretending ASCII is 8-bit; what the article actually shows is code page 437.
I'm not sure why you're not simply writing € (compatibility?), but &#8364; or &#x20AC; should do, too.
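For what it's worth, an isolated test suggests the decoder itself handles code points above 127 just fine; if you get a space back, something else in the pipeline (response encoding, the cookie layer) is mangling the string first:
using System;
using System.Web;   // reference System.Web

class HtmlDecodeDemo
{
    static void Main()
    {
        string text = HttpUtility.HtmlDecode("&#8364;");
        Console.WriteLine(text == "\u20AC");   // True: the euro sign
    }
}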
You typically want to do something like this (the input value here is an assumed example, since the round trip only works on mojibake; see the note after the code):
using System.Net;
using System.Text;
// Assumed example input: numeric references to the UTF-8 bytes of '€'
// (E2 82 AC) as they look when mis-read as Windows-1252: â ‚ ¬.
string html = "&#226;&#8218;&#172;";
string trash = WebUtility.HtmlDecode(html);        // "â‚¬"
// Re-encode with the default codepage (the ANSI codepage on .NET
// Framework/Windows), then reinterpret those bytes as UTF-8.
byte[] bytes = Encoding.Default.GetBytes(trash);
string proper = Encoding.UTF8.GetString(bytes);    // "€"
Note that this only repairs strings whose characters map one-to-one onto bytes in the default codepage, i.e. text that became garbage by reading UTF-8 bytes in that codepage.
