I'm trying to decode a JSON payload and have no idea where to start.
I would like to know how to figure out which encoding this JSON, or any other string, is using. How can I do that? Is there a list of the most commonly used encodings?
I only know a few formats:
Base64, hex, UTF-8, URL encoding, MD5
Sorry for my ignorance; I'm new to the subject and want to learn more.
{
"event":2,
"data":{
"YEAbwsMHIW-f":"A6iuU6B7AQAAUeDxaDmLyNOisr_dYVFqppbIY-iIPEOIvfBKifp72ZzGne6GAclFzmSuctWowH8AADQwAAAAAA==",
"YEAbwsMHIW-b":"53sf37",
"YEAbwsMHIW-c":"AGB8UaB7AQAAvAKgqnFEgnYq1px145WH3ZX9OzlLT0yTIb2g0jS6CZVAueBb",
"YEAbwsMHIW-d":"ABaChIjBDKGNgUGAQZIQhISi0eIApJmBDgA0ugmVQLngWwAAAABVAgdIAPyLJ_H11hCM1VUTI6_DwmE",
"YEAbwsMHIW-z":"q",
"YEAbwsMHIW-a":"UYD3v6BpIocdXxHjH0gD8ng2eU2Nx_KWHUmYHEG3XxSo3Eewva8p9vuvaoJrOH7lhji_L10el5qvehZuMJ1sGse2QrF7JhP=KUg7XJ37ivsgUzImVIztGpZsZkDvT=Qji8=ds0sn_m4wojH6ElNQDD0ZhhgBulkbpUq0uAw7-zLzG8YDcVCsHbcpPiiUTMwrAie32C-TzEbD9EsBt=GVYJwErhx5TrxSfGcQU_WMEBBds6kE3Nm_hte7L3zvDSpqkbKDlTNtAtYZOWgMg4huh-_nAhsMx0zzQ5MNMls2-o5AbZm3RIljmTs5IO4U0bwmWzJScqhg1YCDu3GAqo96YX5NglFSoUsMx8W=TuvEngXO7eROuMQdvcWtVMWY5gcRhgXkUACdWNtrkr=J-6q8PSnNc8PdwpU2Hzw3_uJgIJ8SF7F=wS1ixSgwY-sKhlKYWPQiOGCsDjKUphWOhG6eGkBFKbi5gHmffzqXtKrn413At3A0MxvP=Mbn3FLxI9HGQ-b4A8dIMh0Z_3MsxfD3sruw1bDg_gxMoI4tmKBO-MvQrmtGBPYF1_I=RUknIwfnVQFWDS7So6gmfT4n56OWxrr2smutwE4PdmnOwczVsxRZhEKBpWEFOvp-zYjHte4fYW9gJt7Hg7tJIuTQg95jYGbabLktnX44XCOc7=dn5HKRS8BtMAb2=W4xlEDxo2cUA=pNn7OCG4lGuiUADCmULnVRCQ66L9cZwoxJEwVUdQvrwmRtpfIi_vrrwQS0l-TMknTmHItCJ3Ul68WctnvCqOcok=0_SwBVXed0KEWP7W=pR3xV92eYKRt3ou98NKCIukems6KjpMMFG-kIhn5acGUL_c7SzE=v2qNE5WvHmjvfEx8da3LSlzku8EGMKf8T_q6JZzJpPWl_KUZNz6hfATDLJTQMPI_5v9mwqstTFIzpsqYvpKMZ5WwJr9iUFWZMpV3lQKeISeHRDHF-O3g=dc6AgRIKGqlB-iY_FluUQb6JoJGU7MJvVwaaIDtGGULZ8fKG01djvNtTzv9Sf=sMnZmAeiD0ER86hKo=FxwW6ShjVKFwf=IEIpD=xELrZA0GLfxxvo22fzTLrNDnRGMU1oPeZsEEb_Ecuk_FxWVXdbLWktmPNIR3wuxegXSLwaeVI-0TaXdCEIejR1e_u0afWYaVpiFkRooM3mRituqc1GBH8dWSA9tU2o3cb0_JEv2pru2Q0rxZuEQvJiBuBscJaPGF=X3vK1mZEBlQVaeVThowDPLKIv3bpHwqCk0DScx6eq97k5JCnhAG--BUSIEUNdmbsCmG6Wu5MaVmu37MGUZtM3wsECwnAJafXDdw4YDd41tnkAUQ9sPIbNSgKfA2wCDDSAIhgfLVsYCNLnC4XVHE8CfsdBODU2tF3nKJcmT-3YwsHreXE323oD-dbWJtjBEUWGqW7nZLvQ5EnQzRhUW7gj88XN0fTtH_8H7cFSuDoMl2C=Q5VBpAHo-FrPzGHN9Rm=MA4_oJBNvPNT_SGlhIQ8M3veZn4Z3r7tba_6H3mr1ZRY4nfrove9AF67th-vLVvqsRexE8KIYTNHHvzIjTMC2eXrw9BFJvlwkaLAWj=CAO-Pq8k60Nxq_eUKAVt7r2aTpT6nbiAPS2XlhJePr2eL61WNNwZHzvwvVGC7MvpoEPIN7tO3K_mTnaKRNWD_Ee3NPnZbpl6cI=XrATTsugAXI5S3c1LNpEHrXmN3oMjB6ZhGshO4j6L0MBbR_24=ItKXzo4-QI=E-Av0g_6Z7=qwkpPQ4vlMaFvpNEvSxXU3h88Jh=AzZ3BuphHhzVdNN4D2Jih5uAarXU2eNYsCktfUKJgWfbCkK4qbMF0qonILBdArOWRf7xbg2dFor8jlZBfbhiDjmMPaLJRQwojJtT27DE_0X2indDo0MQ4WWCXbrnTaaddDQgr6684xfxnBLCbqkcdXZQW2LTESS6c-smg8Uxsw7dJn2OsrYaubQU
nxVizhHeCpTxdx34tohh-m_OQcaKsMOQ6I7nr1u5vr_1_w8TKMY0jf1po9km1_QYZVrehx-UP_bGnHLiMiQFVz9VNeed_0xsHGJZN3WxZ344WATtVmCpMWURgmx0x4hnlrf4xLnQEhxA5NYGNxeAsx9jhoDkU23f=KT8hpWJzFOk1LsxGLCZCAFNRM4ILWsNtc=sbMh2ECKHUmV5gGFKmIsBSgLxUSE0=988IGdV_wiqmjSRHNZqGVam0JYg3Yw6YTtC5WYrLRSewwjSiSEkeg7gcOQsp9N1YO2uGKm4-wMC_u3GgPwYwe5Wj2J6nxBMl2w5o9a16kKW_1WLZEdP0MKReBWtX8kP_GSDmt2KoZ6MkM3ktg5fSQdMmkHJPCFIU5ipC05QX=DBO5wsnMbFPl91MU=ZGV9jvZGZJZfS9aVf0k0QaMv9XARILI=Pmlhg8vELbPGbLtJf-eDbAGmNulE_nUi6-Yrq0R78W9fOcNsiBZRhGOK3EVsG205_=0VTDiiHUkEXgD8OmA-kcBs_R0u5mnxvPZse-JMZFPf4O_WFOzLhC1_8qP=Oo9wKlXI=Yamcc_Nr6FWswWZd91XlYo9QBIVxtbcXrRRL-zs=rpYcXMNM_j50X=BhD3sNqrFwWvmzSAchUl4JnhvjN0XJRQPp5_vxJsBKbNZXZtZWhOnruqW5o0LrQCKXK4ww2vOb_5hY3srdcaDliK1liQ8gq4T1AMGcV6sHUz-S01Qz1aD5jZtEz7YBMOG6G6P_50Nu85zYv8sQ3LCUfgFbR3sSg3dRDsP=WweqTvJLg5ki95It1I8vWIHJW5H9pcNTt8AB1wV_4chtZEhiTEmWMhYS6pwb1CC-cTj50o3niWEk9BqzfV5VrX1MAaj1YYlcvZmgVtL-0jECQCjbpgDn3DfiqTX-0GcXNZLa0haEpRG4DXTu3p7Fw1MUhs5dNiRWMU2_AlhtHbcTA4iARrerobSqs_Qn9U3R_bVYTgfU60NM5n1xLxspqKDprxctmpjhL1oGLS3vRb2Vkh1TPpicWo3BY6vjihM_gs6ER6bZeDfW0Rza8ARiEfHKA23dlio0r-e6RWZj8J6-sRPYcRjOgpEPIskW5zUWfnpeOW4516DfZ1RgTRzE2vdnajGum6dzQkAeucis_eMFKvVdpkNtl7IRYGhBRcle6qjetZVkd=N6p1gZ132G-RHdSlXm0g0E24bMXkcKgx21jaK1iTDxuBUZLWvALRJUNq3uuXKL-3Nw1R6Xu6VHkow5XH4jD3gzr7ve4nOhugTD-8qBSDxdNt3IFSNOYbS9iIGc995EDkWEw8a82QwOEsq7Un-Es5Lnv4noYD7HjCX0HkxnLSMiqmdSdBePZ8kVtlBH0H8olCYJlQ-_652F_DGAIO2rl_6fY-dG6-v9iqImjRg-1wFzR6q_tk2RjkCXTFmnO_q6PBKQ0Eb4SFT77h7ri3NQ_qGUGOkjNrcI2HTnmGLlgFsuXmtbm9phahUNJG-9-TiIObSJK5zxQJDJn2tbbEserXp4acFBGYc103aO1NauerDt4ZNn1sHSQhHHTXYa7UuOERcr_DLAt1HOt_D7_ioRTozmW7vJdV3OIRm4S2SM83AMgSYw=Tzzqzc4reatgd4H_Lo=8Yx5ksBhgQFBNgEIndkek7N3U6NExsafQGx4J9JA-0HS1LD3cd4wi=QmLfLb"
}
}
or
{"event":2,"data":{"YEAbwsMHIW-f":"A6iuU6B7AQAAUeDxaDmLyNOisr_dYVFqppbIY-iIPEOIvfBKifp72ZzGne6GAclFzmSuctWowH8AADQwAAAAAA==","YEAbwsMHIW-b":"53sf37","YEAbwsMHIW-c":"AGB8UaB7AQAAvAKgqnFEgnYq1px145WH3ZX9OzlLT0yTIb2g0jS6CZVAueBb","YEAbwsMHIW-d":"ABaChIjBDKGNgUGAQZIQhISi0eIApJmBDgA0ugmVQLngWwAAAABVAgdIAPyLJ_H11hCM1VUTI6_DwmE","YEAbwsMHIW-z":"q","YEAbwsMHIW-a":"UYD3v6BpIocdXxHjH0gD8ng2eU2Nx_KWHUmYHEG3XxSo3Eewva8p9vuvaoJrOH7lhji_L10el5qvehZuMJ1sGse2QrF7JhP=KUg7XJ37ivsgUzImVIztGpZsZkDvT=Qji8=ds0sn_m4wojH6ElNQDD0ZhhgBulkbpUq0uAw7-zLzG8YDcVCsHbcpPiiUTMwrAie32C-TzEbD9EsBt=GVYJwErhx5TrxSfGcQU_WMEBBds6kE3Nm_hte7L3zvDSpqkbKDlTNtAtYZOWgMg4huh-_nAhsMx0zzQ5MNMls2-o5AbZm3RIljmTs5IO4U0bwmWzJScqhg1YCDu3GAqo96YX5NglFSoUsMx8W=TuvEngXO7eROuMQdvcWtVMWY5gcRhgXkUACdWNtrkr=J-6q8PSnNc8PdwpU2Hzw3_uJgIJ8SF7F=wS1ixSgwY-sKhlKYWPQiOGCsDjKUphWOhG6eGkBFKbi5gHmffzqXtKrn413At3A0MxvP=Mbn3FLxI9HGQ-b4A8dIMh0Z_3MsxfD3sruw1bDg_gxMoI4tmKBO-MvQrmtGBPYF1_I=RUknIwfnVQFWDS7So6gmfT4n56OWxrr2smutwE4PdmnOwczVsxRZhEKBpWEFOvp-zYjHte4fYW9gJt7Hg7tJIuTQg95jYGbabLktnX44XCOc7=dn5HKRS8BtMAb2=W4xlEDxo2cUA=pNn7OCG4lGuiUADCmULnVRCQ66L9cZwoxJEwVUdQvrwmRtpfIi_vrrwQS0l-TMknTmHItCJ3Ul68WctnvCqOcok=0_SwBVXed0KEWP7W=pR3xV92eYKRt3ou98NKCIukems6KjpMMFG-kIhn5acGUL_c7SzE=v2qNE5WvHmjvfEx8da3LSlzku8EGMKf8T_q6JZzJpPWl_KUZNz6hfATDLJTQMPI_5v9mwqstTFIzpsqYvpKMZ5WwJr9iUFWZMpV3lQKeISeHRDHF-O3g=dc6AgRIKGqlB-iY_FluUQb6JoJGU7MJvVwaaIDtGGULZ8fKG01djvNtTzv9Sf=sMnZmAeiD0ER86hKo=FxwW6ShjVKFwf=IEIpD=xELrZA0GLfxxvo22fzTLrNDnRGMU1oPeZsEEb_Ecuk_FxWVXdbLWktmPNIR3wuxegXSLwaeVI-0TaXdCEIejR1e_u0afWYaVpiFkRooM3mRituqc1GBH8dWSA9tU2o3cb0_JEv2pru2Q0rxZuEQvJiBuBscJaPGF=X3vK1mZEBlQVaeVThowDPLKIv3bpHwqCk0DScx6eq97k5JCnhAG--BUSIEUNdmbsCmG6Wu5MaVmu37MGUZtM3wsECwnAJafXDdw4YDd41tnkAUQ9sPIbNSgKfA2wCDDSAIhgfLVsYCNLnC4XVHE8CfsdBODU2tF3nKJcmT-3YwsHreXE323oD-dbWJtjBEUWGqW7nZLvQ5EnQzRhUW7gj88XN0fTtH_8H7cFSuDoMl2C=Q5VBpAHo-FrPzGHN9Rm=MA4_oJBNvPNT_SGlhIQ8M3veZn4Z3r7tba_6H3mr1ZRY4nfrove9AF67th-vLVvqsRexE8KIYTNHHvzIjTMC2eXrw9BFJvlwkaLAWj=CAO-Pq8k60Nxq_eUKAVt7r2aTpT6nbiAPS2XlhJe
Pr2eL61WNNwZHzvwvVGC7MvpoEPIN7tO3K_mTnaKRNWD_Ee3NPnZbpl6cI=XrATTsugAXI5S3c1LNpEHrXmN3oMjB6ZhGshO4j6L0MBbR_24=ItKXzo4-QI=E-Av0g_6Z7=qwkpPQ4vlMaFvpNEvSxXU3h88Jh=AzZ3BuphHhzVdNN4D2Jih5uAarXU2eNYsCktfUKJgWfbCkK4qbMF0qonILBdArOWRf7xbg2dFor8jlZBfbhiDjmMPaLJRQwojJtT27DE_0X2indDo0MQ4WWCXbrnTaaddDQgr6684xfxnBLCbqkcdXZQW2LTESS6c-smg8Uxsw7dJn2OsrYaubQUnxVizhHeCpTxdx34tohh-m_OQcaKsMOQ6I7nr1u5vr_1_w8TKMY0jf1po9km1_QYZVrehx-UP_bGnHLiMiQFVz9VNeed_0xsHGJZN3WxZ344WATtVmCpMWURgmx0x4hnlrf4xLnQEhxA5NYGNxeAsx9jhoDkU23f=KT8hpWJzFOk1LsxGLCZCAFNRM4ILWsNtc=sbMh2ECKHUmV5gGFKmIsBSgLxUSE0=988IGdV_wiqmjSRHNZqGVam0JYg3Yw6YTtC5WYrLRSewwjSiSEkeg7gcOQsp9N1YO2uGKm4-wMC_u3GgPwYwe5Wj2J6nxBMl2w5o9a16kKW_1WLZEdP0MKReBWtX8kP_GSDmt2KoZ6MkM3ktg5fSQdMmkHJPCFIU5ipC05QX=DBO5wsnMbFPl91MU=ZGV9jvZGZJZfS9aVf0k0QaMv9XARILI=Pmlhg8vELbPGbLtJf-eDbAGmNulE_nUi6-Yrq0R78W9fOcNsiBZRhGOK3EVsG205_=0VTDiiHUkEXgD8OmA-kcBs_R0u5mnxvPZse-JMZFPf4O_WFOzLhC1_8qP=Oo9wKlXI=Yamcc_Nr6FWswWZd91XlYo9QBIVxtbcXrRRL-zs=rpYcXMNM_j50X=BhD3sNqrFwWvmzSAchUl4JnhvjN0XJRQPp5_vxJsBKbNZXZtZWhOnruqW5o0LrQCKXK4ww2vOb_5hY3srdcaDliK1liQ8gq4T1AMGcV6sHUz-S01Qz1aD5jZtEz7YBMOG6G6P_50Nu85zYv8sQ3LCUfgFbR3sSg3dRDsP=WweqTvJLg5ki95It1I8vWIHJW5H9pcNTt8AB1wV_4chtZEhiTEmWMhYS6pwb1CC-cTj50o3niWEk9BqzfV5VrX1MAaj1YYlcvZmgVtL-0jECQCjbpgDn3DfiqTX-0GcXNZLa0haEpRG4DXTu3p7Fw1MUhs5dNiRWMU2_AlhtHbcTA4iARrerobSqs_Qn9U3R_bVYTgfU60NM5n1xLxspqKDprxctmpjhL1oGLS3vRb2Vkh1TPpicWo3BY6vjihM_gs6ER6bZeDfW0Rza8ARiEfHKA23dlio0r-e6RWZj8J6-sRPYcRjOgpEPIskW5zUWfnpeOW4516DfZ1RgTRzE2vdnajGum6dzQkAeucis_eMFKvVdpkNtl7IRYGhBRcle6qjetZVkd=N6p1gZ132G-RHdSlXm0g0E24bMXkcKgx21jaK1iTDxuBUZLWvALRJUNq3uuXKL-3Nw1R6Xu6VHkow5XH4jD3gzr7ve4nOhugTD-8qBSDxdNt3IFSNOYbS9iIGc995EDkWEw8a82QwOEsq7Un-Es5Lnv4noYD7HjCX0HkxnLSMiqmdSdBePZ8kVtlBH0H8olCYJlQ-_652F_DGAIO2rl_6fY-dG6-v9iqImjRg-1wFzR6q_tk2RjkCXTFmnO_q6PBKQ0Eb4SFT77h7ri3NQ_qGUGOkjNrcI2HTnmGLlgFsuXmtbm9phahUNJG-9-TiIObSJK5zxQJDJn2tbbEserXp4acFBGYc103aO1NauerDt4ZNn1sHSQhHHTXYa7UuOERcr_DLAt1HOt_D7_ioRTozmW7vJdV3OIRm4S2SM83AMgSYw=Tzzqzc4reatgd4H_Lo=8Yx5ksBhgQFBNgEIn
dkek7N3U6NExsafQGx4J9JA-0HS1LD3cd4wi=QmLfLb"}}
I am trying to decode filenames sent over HTTP, but the strings the browser sends are different from what I expect.
In my test I named the file ç.jpg.
What I need is the name %C3%A7.jpg.
But the browser is sending %C3%83%C2%A7.jpg.
It's not UTF-8, UTF-16 or UTF-32.
As another example, I tested the file name €.jpg.
What I need is the name %E2%82%AC.jpg.
But I am receiving %C3%A2%E2%80%9A%C2%AC.jpg.
How can I convert these names to UTF-8?
Ok I played with this for about 30 minutes and I finally figured it out.
This is how the original string was encoded:
The string was in UTF-8
Some encoding mechanism thought it was CP1252, and based on that wrong assumption re-encoded it to UTF-8 again.
The resulting string was url-encoded.
To get back to a real UTF-8 string, this is what I did. (Note: I used PHP. I don't know what you are using, but it should be doable in other languages just the same.)
$input = '%C3%A2%E2%80%9A%C2%AC %C3%83%C2%A7';
$str1 = urldecode($input);
echo iconv('UTF-8', 'CP1252', $str1);
// output "€ ç"
So that conversion is counterintuitive: we're converting to CP1252, but still end up with a UTF-8 string. This only works because an existing UTF-8 string was falsely treated as CP1252, and that incorrect interpretation was then converted to UTF-8. So I'm just reversing this double encoding.
In other languages there might be a few more steps; this works in just one line with PHP because strings are bytes, not characters.
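For comparison, here is a minimal Python sketch of the same reversal. In Python 3 strings are characters rather than bytes, so the single iconv call becomes an explicit encode followed by a decode. The function name fix_double_encoded is hypothetical, and the sketch assumes the damage is exactly one UTF-8 to CP1252 to UTF-8 round trip.

```python
from urllib.parse import unquote


def fix_double_encoded(url_encoded: str) -> str:
    """Undo a UTF-8 string that was misread as CP1252 and re-encoded as UTF-8."""
    # Step 1: undo the URL encoding; unquote() decodes the %XX bytes as UTF-8.
    mojibake = unquote(url_encoded)
    # Step 2: map each character back to the CP1252 byte it was mistaken for,
    # then decode those bytes as the UTF-8 they originally were.
    return mojibake.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # Print the repaired names as UTF-8 bytes so the output is ASCII-safe.
    print(fix_double_encoded("%C3%A2%E2%80%9A%C2%AC.jpg").encode("utf-8"))  # b'\xe2\x82\xac.jpg'
    print(fix_double_encoded("%C3%83%C2%A7.jpg").encode("utf-8"))           # b'\xc3\xa7.jpg'
```

One caveat: Python's cp1252 codec leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so input whose original bytes included one of them may raise UnicodeEncodeError in step 2 instead of round-tripping.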
Hi guys, I encrypted a school project, but the text file holding my AES key has been deleted. I had taken a picture of it beforehand, so I typed its contents into a new file. But the new key file, typed in from the JPEG, is not equal to the original. I couldn't find which character is wrong. Could you please help me?
Pic : https://i.stack.imgur.com/pAXzl.jpg
Text file : http://textuploader.com/dfop6
If you directly convert arbitrary bytes to Unicode text, you may lose information: some bytes will not correspond to any Unicode character, and others will map to whitespace or other characters that cannot easily be distinguished in printed form.
Of course there may be ways to brute force your way out of this, but that could easily result in very complex code and possibly near-infinite running time. Better to start over, and if you want to use screenshots or similar printed text, Base64 or hex encode your results; those can easily be converted back.
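As a rough illustration of that advice, the following Python sketch stores key bytes as Base64 and as hex and then recovers them exactly. The random 32-byte key is only a stand-in for a real AES key, and the variable names are made up for the example.

```python
import base64
import os

# Stand-in for a real AES-256 key: 32 random bytes (assumption for illustration).
key = os.urandom(32)

# Both encodings use only printable ASCII, so they survive printing or a screenshot.
key_b64 = base64.b64encode(key).decode("ascii")
key_hex = key.hex()

# Decoding recovers the exact original bytes.
restored_from_b64 = base64.b64decode(key_b64)
restored_from_hex = bytes.fromhex(key_hex)

print(key_b64)
print(key_hex)
print(restored_from_b64 == key and restored_from_hex == key)  # True
```

Hex is longer but uses only sixteen symbols, which makes it the more forgiving choice when the text has to be retyped by hand from a picture.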
I am having a problem where a Hebrew string is being displayed in reverse. I use QTableWidget to display some info, and here the string appears correctly using:
CString hebrewStr; hebrewStr.ToUTF8();
QString s = QString::fromUtf8( hebrewStr );
In another part of my program this same string is displayed on the screen without using Qt, and there it is shown in reverse:
CString hebrewStr;
hebrewStr.ToUTF8();
I have debugged, and hebrewStr.ToUTF8() produces the exact same Unicode string in both cases, but the string is only displayed correctly in the QTableWidget. So I am wondering whether Qt automatically reverses a given Hebrew string (since Hebrew is a right-to-left language). Thanks!
Yes. In this case QString builds the full Unicode (wide-character) text from the UTF-8 encoded string. If you would like to do something similar in MFC, you should use CStringW and decode the string.
Use MultiByteToWideChar for the UTF-8 to CStringW conversion.
Connected question in StackOverflow.
I have the following piece of code:
public void ProcessRequest (HttpContext context)
{
    context.Response.ContentType = "text/rtf; charset=UTF-8";
    context.Response.Charset = "UTF-8";
    context.Response.ContentEncoding = System.Text.Encoding.UTF8;
    context.Response.AddHeader("Content-disposition", "attachment;filename=lista_obecnosci.csv");
    context.Response.Write("ąęćżźń󳥌ŻŹĆŃŁÓĘ");
}
When I try to open the generated CSV file, I get the following behavior:
In Notepad2 - everything is fine.
In Word - the conversion wizard opens and asks to convert the text. It suggests UTF-8, which is more or less OK.
In Excel - I get a real mess. None of those Polish characters can be displayed.
I wanted to write those special encoding-information characters in front of my string, i.e.
context.Response.Write((char)0xef);
context.Response.Write((char)0xbb);
context.Response.Write((char)0xbf);
but that won't do any good. The response stream is treating that as normal data and converts it to something different.
I'd appreciate help on this one.
I ran into the same problem, and this was my solution:
context.Response.BinaryWrite(System.Text.Encoding.UTF8.GetPreamble());
context.Response.Write("ąęćżźń󳥌ŻŹĆŃŁÓĘ");
What you call "encoding-information" is actually a BOM. I suspect each of those "characters" is getting encoded separately. To write the BOM manually, you have to write it as three bytes, not three characters. I'm not familiar with the .NET I/O classes, but there should be a method available to you that takes a byte or byte[] parameter and writes them directly to the file.
By the way, the UTF-8 BOM is optional; in fact, the Unicode Standard neither requires nor recommends it for UTF-8. If you don't have a specific reason for using it, save yourself some hassle and leave it out.
EDIT: I just remembered you can also write the actual BOM character, '\uFEFF', and let the encoder handle it:
context.Response.Write('\uFEFF');
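The difference between writing three bytes and writing three characters can be seen in a short Python sketch. This is not .NET code; it only illustrates the encoding behavior described above, and the variable names are made up for the example.

```python
import codecs

# The BOM is the single character U+FEFF; encoded as UTF-8 it becomes three bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes)                      # b'\xef\xbb\xbf'
print(bom_bytes == codecs.BOM_UTF8)   # True

# Writing 0xEF, 0xBB, 0xBF as three *characters* encodes each one separately,
# producing six bytes that no reader will recognise as a BOM.
wrong = "\xef\xbb\xbf".encode("utf-8")
print(wrong)                          # b'\xc3\xaf\xc2\xbb\xc2\xbf'

# The "utf-8-sig" codec prepends the BOM automatically when encoding.
payload = "ąęć".encode("utf-8-sig")
print(payload.startswith(codecs.BOM_UTF8))  # True
```

The six-byte result is what the question's three Write((char)...) calls would plausibly produce under a UTF-8 response encoding, which would explain why the stream appeared to "convert it to something different".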
I think the problem is with Excel, based on the question "Microsoft Excel mangles Diacritics in .csv files". To prove this, copy your sample output string of ąęćżźń󳥌ŻŹĆŃŁÓĘ, paste it into a test file using your favorite editor, and save it as a UTF-8 encoded .csv file. Open it in Excel and you will see the same issues.
The answer from Alan Moore, translated to VB:
Context.Response.Write(ChrW(&HFEFF))
I have a string that is:
!"#$%&'()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª« ®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
I posted that to a service and used HtmlEncode, and then I got this result:
!#$%&'()* ,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~����������� ���������•������������������������������������
This isn't the result I need. How can I get the original string back? Thanks!
Your string is not ASCII, so you are either using a string to represent binary data, or you're not maintaining awareness of multi-byte encoding. In any case, the simplest way to deal with any Internet-based technology (HTTP, SMTP, POP, IMAP) is to encode it as 7-bit clean. One common way is to base64-encode your data, send it across the wire, then base64-decode it before trying to process it.
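A minimal Python sketch of that round trip follows. The sample text is a short illustrative subset rather than the full string from the question, and UTF-8 is an assumed choice of character encoding that both sides would have to agree on.

```python
import base64

# Sample non-ASCII text (illustrative subset, not the full string).
text = "¡¢£ ÀÁÂ àáâ ÷"

# Sender: pick an explicit character encoding, then Base64-encode the bytes.
wire = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(wire.isascii())     # True: only 7-bit characters cross the wire

# Receiver: reverse both steps in the opposite order.
received = base64.b64decode(wire).decode("utf-8")
print(received == text)   # True
```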
I believe this is what you're looking for:
!"#$%&'()*+,-./0123456789:;?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅàáâäèçéêëìíîïôö÷òóõùúý
You just need to use a better HTML entity-encoding library or tool. I generated this with Ruby, using the HTML Entities library. The code I wrote to do this follows. I had to put your text in input.txt to preserve Unicode (there was an EOF character in the string), but it worked great.
require 'rubygems'
require 'htmlentities'
str = File.read('input.txt')
coder = HTMLEntities.new
puts coder.encode(str, :named)
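For readers without Ruby, a comparable sketch is possible with Python's standard library alone. Note the swap in technique: this emits numeric character references (such as &#161;) rather than the named entities that the :named option above produces. The function name encode_entities is hypothetical.

```python
import html


def encode_entities(text: str) -> str:
    # Escape the markup-significant characters (&, <, >, quotes) first ...
    escaped = html.escape(text)
    # ... then replace every non-ASCII character with a numeric reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "<¡¢£ & àé>"
encoded = encode_entities(sample)
print(encoded)                          # &lt;&#161;&#162;&#163; &amp; &#224;&#233;&gt;
print(html.unescape(encoded) == sample)  # True
```

Because the encoded form is pure ASCII, it survives a transport or service that mishandles multi-byte characters, and html.unescape restores the original string on the other side.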