CR/LF generated by PBEWithMD5AndDES encryption?

Can the ciphertext produced by PBEWithMD5AndDES, once Base64 encoded, contain CR and/or LF characters?

Base64 output consists only of printable characters. However, when it is used as a MIME transfer encoding for email, it is split into lines separated by CR-LF.

PBEWithMD5AndDES returns binary data. PBE encryption is defined in the PKCS#5 standard, and that standard does not specify a dedicated Base64 encoding scheme. So the question becomes: for which system do you need to Base64 encode the binary data? Wikipedia has a nice section in the Base64 article that explains the various variants.
You may encounter a PBE implementation that returns Base64 without mentioning which of those schemes is used. In that case you need to figure out somehow which scheme is used. I would suggest searching for it, asking the community, looking at the source or, if all else fails, running a set of tests on the output.
Fortunately, you are pretty safe if you decode Base64 while ignoring all whitespace. Note that some implementations disregard padding, so add it before decoding, if applicable.
If you perform the Base64 encoding yourself, I would strongly suggest not emitting any whitespace, using only the default alphabet (with the '+' and '/' characters) and always adding padding when required. After that you can still split the result, replace any non-standard characters (especially the '+' and '/' characters, of course), or remove the padding.
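For illustration, here is a minimal Java sketch of those two recommendations (this assumes the standard java.util.Base64 API; it is not part of PKCS#5 or of any particular PBE implementation):

import java.util.Base64;

public class B64Util {
    // Encode without any whitespace: the basic encoder never emits CR/LF.
    static String encode(byte[] cipherText) {
        return Base64.getEncoder().encodeToString(cipherText);
    }

    // Decode defensively: strip all whitespace and restore '=' padding first.
    static byte[] decode(String s) {
        String clean = s.replaceAll("\\s", "");
        int rem = clean.length() % 4;
        if (rem != 0) {
            clean += "====".substring(rem); // add the missing padding characters
        }
        return Base64.getDecoder().decode(clean);
    }
}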

I was using Java with the Android SDK. I found that this call:
String s = Base64.encodeToString(enc, Base64.DEFAULT);
performed line wrapping: it put LF characters into the output string.
I found that:
String s = Base64.encodeToString(enc, Base64.NO_WRAP);
did not insert LF characters into the output string.
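For completeness, the matching decode as a sketch (android.util.Base64.decode with Base64.DEFAULT skips characters outside the Base64 alphabet, so both encoded forms above should decode to the same bytes):

byte[] roundTrip = Base64.decode(s, Base64.DEFAULT); // tolerates the LFs inserted by DEFAULT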


How to represent acute accents in ASCII?

I'm having an encoding problem related to cookies on one of my websites.
A user is inputting Usuário, which has an acute accent, and that's being put into a cookie. The raw HEX for the cookie response is (for the Usuário string):
55 73 75 C3 A1 72 69 6F
When I view it in the browser it comes out garbled, which is really messy. I need to fix this up.
Then I went to this website: http://www.rapidtables.com/convert/number/hex-to-ascii.htm and converted the HEX value to see what it would look like, and I got the same garbled output.
Right, so the HEX code must be wrong. Then I tried converting Usuário to ASCII to see what it should be, using this website: http://www.asciitohex.com/. To my surprise, it gave exactly the HEX that is showing up garbled. Why?
And how do I represent Usuário in ASCII so I can put it in a cookie? Should I manually encode it?
PS: I'm using ASP.NET, just in case it matters.
As of 2015, the standard encoding of the web for character data is UTF-8, not ASCII. ASCII only defines the first 128 characters of a codepage and does not include any kind of accented character. To add accented characters to these 128 there were many legacy solutions: codepages. Each added 128 further characters on top of the default ASCII list, thereby allowing 256 different characters to be represented.
The problem was that this didn't properly solve the issue: ASCII-based codepages were more or less incompatible with each other (except for the first 128 characters), and there was usually no way of programmatically knowing which codepage was in use.
One of the solutions was UTF-8, which encodes the Unicode character set (containing most of the characters used around the world, and more) while trying to remain compatible with ASCII. The first 128 characters are the same in both cases, but beyond those UTF-8 characters become multi-byte: one character is encoded as a series of bytes (usually 2-3, depending on which character needs to be encoded).
The problem arises if you are using some ASCII-based single-byte codepage (like ISO-8859-1), which encodes its supported characters in single bytes, but your input is actually UTF-8, which encodes accented characters in multiple bytes (you can see this in your HEX example: á is encoded as C3 A1, two bytes). If you try to read these two bytes in an ASCII-based codepage, which uses a single byte for every character (in Western Europe this codepage is usually ISO-8859-1), then each of these two bytes will be represented by a different character.
In the web world the default encoding is UTF-8, so your clients will usually send their requests using UTF-8. ASP.NET is Unicode-aware, so it can handle these requests. However, somewhere in your code this UTF-8 is accidentally converted into ISO-8859-1 and then back into UTF-8. This might happen on various layers. As you are having issues, it probably happens at the cookie layer, which is sometimes problematic (here is how it worked in 2009). You should also double-check that your application uses UTF-8 everywhere else (views, database, etc.) if you want to support accented characters properly.
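A minimal Java sketch of the mis-decoding described above (plain Java rather than the asker's ASP.NET stack, purely to show the byte-level effect):

import java.nio.charset.StandardCharsets;

public class Mojibake {
    public static void main(String[] args) {
        byte[] utf8 = "Usuário".getBytes(StandardCharsets.UTF_8); // á -> C3 A1
        // Misread the UTF-8 bytes as ISO-8859-1: each byte becomes its own character.
        String broken = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(broken); // prints UsuÃ¡rio
    }
}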

What is the encoding of this data?

Can anyone tell me the encoding of this data? It doesn't seem to be Base64.
/zZ/u00GIaP9HW010G000G01003/sm1302WS7YCU6IWZ8ICjAoWmF6H1F3StF7jONKbaaO2Pbe+
0Z8gWjER3eAhQhOgCoF/Bskxr////cy7////w/+Rz//Z/sm130IijBJmrF7P1GNRufOob+FZu+F
Zu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZu+FZ/m00H200X07430
I800X410n41/yG07m000GK10G410G400000000000420mG51WS82GeB/yG0jH000W430m840mK5
10G0005z0G8300GH1H8XCK464r5X1o9n53A1aQ488qAnmHLIqV0aCs9oWWaA5XSO6Heb9YSeAIe
qDJOtE3awGqH5HaT8IKfJL5LMLrXPMcDaPMPdQ6bgStHrTdTuUNg3X8M6XuY9YfAJb9MMbvYPcg
AZfAMcfwYfghApjBMsjxYvkiB3nCN6nyZ9ojBJrDNMrzZPsk7Yu+JbvkVewUhnylFqzVRt+Fdw/
yG07m400m410G410G410G00000000420mG51WS82GeB/yG0jH400W4210G310S510G00G9t0042
0n441I4n1X91KGTXSHCYCe4854AHeR712ICpKl0LOdBH2XOaDE4byHSO6Hec9oWfAZKsDpWvEaD
4HKP7I4bAKrHLLbTOMLfZP6LcPsXfQdDqTNPtU7bwWeE4XOQ7Y8cAafEKbPQNc9cQegEafQQdgA
cgihEqjRQtkBcwmiF4nSR7oCdAqjFKrTRNsDdQukFavURdwEdgylFqzVRt+Fdw/ze030C1008H0
n40Fm2vQMjkzf0pGHVSKab1ad5I/PBPkbl5Z/S7D5dyrb1dfvQ/ZnIotCAIB6x7B38LLB5lm7Qc
G9zajcwMyMFzmSr18sd82pnn1586H5a4aP7EFIfblRUMRoVCsj/TO5IVph7kBAUH1TnkMJPl18s
bU/JFuvxydwWpLYZi9s8YItR7OAkV/m1LFIsjPLLrjujX08FbWPhdxUEIvlnvESbzsuZE1dgSd+
jT7DF77jyni1ZXG0IMFq52JUmCRzajcwMyMFy0S7D7sIsRfRnO/m1mSqXlRSo17VPaP0TIkVpxK
iy+C95jQHssgfFV6Sds0fyhMuYbBFPBUHsyTh4+M2imK31pzAjImsKKPbaWYMDUfyiS/fM8xgcg
ix7v1EIJrutLSdqoUxccdSX2I0e7Ex0ndhmE/Vy0ngQIjOQBqCTZSblAXYPKE2H6C4/N5IVPBPk
bl5Z/071pMHRwQ3V97vmEq1sKZ3O/0d+VlMpBSHHZzv8gZhZkVmzAWFGRzajcwMyMFzmSqVPBPk
bl5Z/S7DBzfYPrGahk+xkKZT+TJVV/0Dt+T0Qd6qKKKYZgxFvhA3FJor/7Yi+rbaRKBif5vpbzf
F2XL1nr/BZlYj2p+QoWpqyjVnugl5ljRYLNHtXcSoAw8JrwWu/3wqo1VEZakaIxjbZaF+hSuODW
yOFzAfHtKgj1O+KZgGkL2bICW7a+eF9EEsQkp1hwvX2HbOeM3cHq8px3FwrFBPmN4WaTCaSWXYC
dru/3ds50p1byrBpVRAoC+S8WzEm0wZZkFK7a6j4oksgkVACiYe0g361m2VcFrFrpLo6pWZVUY3
e02UJZ6C0+d+UcAYn91TFAoj97C536DUX0nqzEl+UkbDsk9wYpJ8P45vRg49mid3BdwaSVvxKv5
bCiapnAt92yu9K7W3sxzidZfKTtklWi4KR1IGnaT201xPxrU+//0Blyw9Q90SvgCPMyas/SPYm9
mESPFFy08VJ7NdRUQIApCio1olB2FE2Czi+t+SK2pWPjs7FkP6EUdlx3yXKWXZC8Y0Fb3W0a/m0
wKfNIGRb3Hcy/xHCxPTt+PSzktuSdyglI97l+q6CiLN0mCa/GVv/AajxQA5I8a037B7+yVyAYbb
dIut6DfBSZ0239pKeQrUX3VJ6TOeoZHHkGJ98E1MZz/m3tVvrHkNUyZycE1nd1tEk0Ajn9jYHCv
LL0p/UflOgMoHo5555G0KKKK0555501HHHG0KKKK0555501HHHG0KKKK0555507/za
(Line endings inserted for readability)
It appears to be Base64. If you add a single equals sign ('=') for padding at the end, your decoder should be happy (see https://en.wikipedia.org/wiki/Base64#Padding).
Decoded, it's 1568 bytes, which mod 16 is zero. The histogram of byte-value occurrences is flat. So I'd guess something encrypted with a 128-bit block cipher like AES.
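A quick Java sketch of those checks (it assumes the blob is in a string named data with the line breaks already stripped; the padding fix-up mirrors the advice above):

import java.util.Base64;

public class Probe {
    public static void main(String[] args) {
        String data = "..."; // the blob above, line breaks removed
        int rem = data.length() % 4;
        if (rem != 0) {
            data += "====".substring(rem); // restores the single missing '='
        }
        byte[] decoded = Base64.getDecoder().decode(data);
        System.out.println(decoded.length + " bytes, length mod 16 = " + (decoded.length % 16));

        // A flat histogram of byte values is consistent with ciphertext.
        int[] hist = new int[256];
        for (byte b : decoded) hist[b & 0xFF]++;
    }
}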
It does look like Base64 to me. Most variants of Base64 include the following characters:
A-Z a-z 0-9 + / (and = for padding)
However, if it were proper Base64 it would end with a single = as padding, as the 2091 characters don't work out to a whole number of bytes otherwise.
Your data doesn't seem to decode to anything readable, so it might be binary data, or encrypted (or both). Only with thorough knowledge of cryptographic systems, and a lot of hints and luck, might some expert be able to figure out the encryption used (if any), but that's beyond the scope of this site.
Without more information as to the source of the data, we can only guess.

Why do the encodings of a URL and the query string part differ?

I was researching why my query parameters contain plus (+) signs instead of %20, and why they contain strings like %C3%BC instead of a ü (UTF-8), as an encoded URL does.
After two hours of thinking my webapp was not compliant with the URL-encoding standard, I found that the encoding scheme of a query string is not the same as the encoding of a URL (here I mean the part without the query string).
Examples:
URL:
whitespace encodes to %20
UTF-8 chars stay UTF-8 chars
Query params:
whitespace encodes to +
UTF-8 chars encode to their percent-encoded hex representation
So can someone tell me why the encoding schemes differ, since the query parameters are part of the URL?
See:
Wikipedia: Percent-encoding
Wikipedia: Query string
URIs originated in RFC 1630, with percent-encoding as a method to allow "unsafe" characters to be represented. This original version actually mentioned the ISO Latin 1 character set as the encoding for non-ASCII characters. RFC 1738 later that year removed this reference to Latin-1 in defining URLs.
The query string format is actually a different but related encoding, application/x-www-form-urlencoded, defined in RFC 1866 along with HTML 2.0. It was based on RFC 1738, but specified that spaces (not all whitespace, just the character with ASCII code 0x20) are replaced by '+' and that line breaks are to be encoded as CRLF (i.e. %0D%0A). The former is likely because that saves 2 bytes for a very common character in form submissions at the expense of using an extra 2 bytes for a much less common character, and the latter is to avoid problems when transferring between systems using different end-of-line codings. Non-ASCII characters were left unconsidered.
UTF-8 coding in URIs came over a decade later, in RFC 3986, although individual protocols may have specified this or another encoding of non-ASCII characters earlier. To maintain backwards compatibility, all UTF-8 octets must be percent-encoded. The companion RFC 3987 defines "Internationalized Resource Identifiers" (IRIs) which are basically "URIs with most codepoints 160 and above allowed to appear unencoded", but many protocols still require URIs. Note that your statement above is incorrect, as a URL may not contain an unencoded ü or any other non-ASCII character.
application/x-www-form-urlencoded has been internationalized in a different manner. The HTML5 specification of application/x-www-form-urlencoded explicitly allows that any ASCII-compatible character set may be used for characters in the query string, and in fact different fields may use different character sets, but all non-ASCII octets must still be percent-encoded. When used in the query part of an IRI, it is possible that these characters could be represented unencoded if properly-normalized UTF-8 is being used as the character set, since conversion back to a URI would result in correct application/x-www-form-urlencoded data.
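A small Java sketch of the two schemes side by side (java.net.URLEncoder implements application/x-www-form-urlencoded, while the multi-argument java.net.URI constructor percent-encodes a path per RFC 3986):

import java.net.URI;
import java.net.URLEncoder;

public class EncodingDemo {
    public static void main(String[] args) throws Exception {
        // Form/query encoding: space -> '+', non-ASCII -> percent-encoded UTF-8.
        System.out.println(URLEncoder.encode("ü ü", "UTF-8"));
        // prints: %C3%BC+%C3%BC

        // Path encoding: space -> %20 in the ASCII form of the URI.
        URI uri = new URI("http", "example.com", "/ü ü", null);
        System.out.println(uri.toASCIIString());
        // prints: http://example.com/%C3%BC%20%C3%BC
    }
}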
They don't necessarily have to differ: a + is a valid path character, and a ü is a valid search character (per RFC 3987). You're probably seeing browsers, or some other preconceived encoding scheme, making assumptions that are either outdated or overly cautious.
There is no difference between + and %20 when it comes to query-string parameters:
SPACE is encoded as '+' or '%20'

Please help identify multi-byte character encoding scheme on ASP Classic page

I'm working with a 3rd party (Commidea.com) payment processing system and one of the parameters being sent along with the processing result is a "signature" field. This is used to provide a SHA1 hash of the result message wrapped in an RSA encrypted envelope to provide both integrity and authenticity control. I have the API from Commidea but it doesn't give details of encoding and uses artificially created signatures derived from Base64 strings to illustrate the examples.
I'm struggling to work out what encoding is being used on this parameter and hoped someone might recognise the quite distinctive pattern. I initially thought it was UTF8 but having looked at the individual characters I am less sure.
Here is a short sample of the content, created by the following code, where I loop through each "byte" in the string:
sig = Request.Form("signature")
For x = 1 To LenB(sig)
    s = s & AscB(MidB(sig, x, 1)) & ","
Next
' Print s to a debug log file
When I look in the log I get something like this:
129,0,144,0,187,0,67,0,234,0,71,0,197,0,208,0,191,0,9,0,43,0,230,0,19,32,195,0,248,0,102,0,183,0,73,0,192,0,73,0,175,0,34,0,163,0,174,0,218,0,230,0,157,0,229,0,234,0,182,0,26,32,42,0,123,0,217,0,143,0,65,0,42,0,239,0,90,0,92,0,57,0,111,0,218,0,31,0,216,0,57,32,117,0,160,0,244,0,29,0,58,32,56,0,36,0,48,0,160,0,233,0,173,0,2,0,34,32,204,0,221,0,246,0,68,0,238,0,28,0,4,0,92,0,29,32,5,0,102,0,98,0,33,0,5,0,53,0,192,0,64,0,212,0,111,0,31,0,219,0,48,32,29,32,89,0,187,0,48,0,28,0,57,32,213,0,206,0,45,0,46,0,88,0,96,0,34,0,235,0,184,0,16,0,187,0,122,0,33,32,50,0,69,0,160,0,11,0,39,0,172,0,176,0,113,0,39,0,218,0,13,0,239,0,30,32,96,0,41,0,233,0,214,0,34,0,191,0,173,0,235,0,126,0,62,0,249,0,87,0,24,0,119,0,82,0
Note that every other value is a zero, except occasionally where it is 32 (0x20). I'm familiar with UTF-8, where characters above 127 are represented using two bytes, but if this were UTF-8 encoding I would expect the "32" value to be more like 194 (0xC2) or 195 (0xC3), and the other value to be greater than 0x80.
Ultimately what I'm trying to do is convert this signature parameter into a hex-encoded string (e.g. "12ab0528...") which is then used by the RSA/SHA1 function to verify that the message is intact. This part is already working, but I can't for the life of me figure out how to get the signature parameter decoded.
For historical reasons we are having to use classic ASP and the SHA1/RSA functions are javascript based.
Any help would be much appreciated.
Regards,
Craig.
Update: I tried looking into UTF-16 encoding on Wikipedia and other sites. I can't find anything to explain why I am seeing only 0x20 or 0x00 in the (assumed) high-order byte positions. I don't think this is relevant any more, as the example below shows other values in those high-order positions.
I tried adding some code to log the values using Asc instead of AscB (and Len/Mid instead of LenB/MidB). I got some surprising results. Here is a new stream of byte-wise characters, followed by the equivalent stream of word-wise (if you know what I mean) characters.
21,0,83,1,214,0,201,0,88,0,172,0,98,0,182,0,43,0,103,0,88,0,103,0,34,33,88,0,254,0,173,0,188,0,44,0,66,0,120,1,246,0,64,0,47,0,110,0,160,0,84,0,4,0,201,0,176,0,251,0,166,0,211,0,67,0,115,0,209,0,53,0,12,0,243,0,6,0,78,0,106,0,250,0,19,0,204,0,235,0,28,0,243,0,165,0,94,0,60,0,82,0,82,0,172,32,248,0,220,2,176,0,141,0,239,0,34,33,47,0,61,0,72,0,248,0,230,0,191,0,219,0,61,0,105,0,246,0,3,0,57,32,54,0,34,33,127,0,224,0,17,0,224,0,76,0,51,0,91,0,210,0,35,0,89,0,178,0,235,0,161,0,114,0,195,0,119,0,69,0,32,32,188,0,82,0,237,0,183,0,220,0,83,1,10,0,94,0,239,0,187,0,178,0,19,0,168,0,211,0,110,0,101,0,233,0,83,0,75,0,218,0,4,0,241,0,58,0,170,0,168,0,82,0,61,0,35,0,184,0,240,0,117,0,76,0,32,0,247,0,74,0,64,0,163,0
And now the word-wise data stream:
21,156,214,201,88,172,98,182,43,103,88,103,153,88,254,173,188,44,66,159,246,64,47,110,160,84,4,201,176,251,166,211,67,115,209,53,12,243,6,78,106,250,19,204,235,28,243,165,94,60,82,82,128,248,152,176,141,239,153,47,61,72,248,230,191,219,61,105,246,3,139,54,153,127,224,17,224,76,51,91,210,35,89,178,235,161,114,195,119,69,134,188,82,237,183,220,156,10,94,239,187,178,19,168,211,110,101,233,83,75,218,4,241,58,170,168,82,61,35,184,240,117,76,32,247,74,64,163
Note how the second pair of byte-wise characters (83,1) seems to be interpreted as 156 in the word-wise stream. We also see (34,33) as 153, (120,1) as 159 and (220,2) as 152. Does this give any clues as to the encoding? Why are these 15[2369] values apparently being treated differently from the other values?
What I'm trying to figure out is whether I should use the byte-wise data and carry out some post-processing to get back to the intended values, or whether I should trust the word-wise data with whatever implicit decoding it is apparently performing. At the moment neither seems to give me a match between data content and signature, so I need to change something.
Thanks.
Quick observation tells me that you are likely dealing with UTF-16. Start from there.
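One way to sanity-check that guess, as a Java sketch (the Windows-1252 mapping is my assumption about what the word-wise Asc values represent): the byte-wise pairs read as little-endian UTF-16 code units, and the odd word-wise values 152, 153, 156 and 159 are exactly where Windows-1252 places code points such as U+02DC, U+2122, U+0153 and U+0178:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Probe {
    public static void main(String[] args) {
        byte[] pair = { (byte) 83, (byte) 1 };                   // byte-wise values 83,1
        String ch = new String(pair, StandardCharsets.UTF_16LE); // -> U+0153 'œ'
        byte[] cp1252 = ch.getBytes(Charset.forName("windows-1252"));
        System.out.println(ch + " -> " + (cp1252[0] & 0xFF));    // prints: œ -> 156
    }
}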

How do I encode the ugly string?

I have a string that is:
!"#$%&'()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª« ®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
I post that to a service and use HtmlEncode, and then I get this result:
!#$%&'()* ,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~����������� ���������•������������������������������������
It isn't the result that I need. How do I get the original string back? Thanks!
Your string is not ASCII, so you are either using a string to represent binary data, or you're not maintaining awareness of multi-byte encodings. In any case, the simplest way to deal with any Internet-based technology (HTTP, SMTP, POP, IMAP) is to keep the data 7-bit clean. One common way is to Base64-encode your data, send it across the wire, and Base64-decode it before trying to process it.
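A minimal Java sketch of that round trip (the UTF-8 step is an assumption about how the string becomes bytes; any Base64 codec behaves the same way):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SevenBitClean {
    public static void main(String[] args) {
        String original = "¡¢£ÀÁÂ";                   // non-ASCII payload
        String wire = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));
        // 'wire' contains only A-Z a-z 0-9 + / = and is safe for 7-bit transports.
        String back = new String(Base64.getDecoder().decode(wire), StandardCharsets.UTF_8);
        System.out.println(original.equals(back));    // prints: true
    }
}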
I believe this is what you're looking for:
!"#$%&&apos;()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
You just need to use a better HTML entity encoding library or tool. The one I used to generate this is from Ruby: the HTMLEntities library. The code I wrote to do this follows. I had to put your text in input.txt to preserve the Unicode (there was an EOF character in the string), but it worked great.
require 'rubygems'
require 'htmlentities'

# Read the raw text from a file so the Unicode bytes survive intact.
str = File.read('input.txt')
coder = HTMLEntities.new
# Encode non-ASCII characters as named HTML entities (&aacute;, &copy;, ...).
puts coder.encode(str, :named)
