Given final block not properly padded. Such issues can arise if a bad key is used during decryption/encryption

Hi guys, I encrypted a school project, but my saved AES key .txt file has been deleted. I had taken a picture of it earlier, so I typed the key into a new file. But the new AES key file is not equal to the one shown in the JPEG; some character must be wrong and I couldn't find which one. Could you please help me?
Pic : https://i.stack.imgur.com/pAXzl.jpg
Text file : http://textuploader.com/dfop6

If you directly convert arbitrary bytes to Unicode you may lose information, because some bytes will not correspond to any Unicode character at all, while others map to whitespace or other characters that cannot be easily distinguished in printed form.
Of course there may be ways to brute-force your way out of this, but that could easily require very complex code and possibly near-infinite running time. Better to start over; and if you want to use screenshots or similar printed text, Base64- or hex-encode your results first: those encodings can be converted back without ambiguity.
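For illustration, a minimal Python sketch of the Base64 round trip (the random 16-byte key here is just an example):

import base64, os

key = os.urandom(16)                               # e.g. a 16-byte AES-128 key
printable = base64.b64encode(key).decode('ascii')  # only unambiguous printable ASCII
print(printable)                                   # safe to print, photograph, retype
assert base64.b64decode(printable) == key          # the exact key bytes come back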

Related

Chinese/Japanese characters are loading as ???? to MariaDB database in AWS RDS using Pentaho [duplicate]

I tried to use UTF-8 and ran into trouble.
I have tried so many things; here are the results I have gotten:
???? instead of Asian characters. Even for European text, I got Se?or for Señor.
Strange gibberish (Mojibake?) such as SeÃ±or for Señor, or æ–°æµªæ–°é—» for 新浪新闻.
Black diamonds, such as Se�or.
Finally, I got into a situation where the data was lost, or at least truncated: Se for Señor.
Even when I got text to look right, it did not sort correctly.
What am I doing wrong? How can I fix the code? Can I recover the data, if so, how?
This problem plagues the participants of this site, and many others.
You have listed the five main cases of CHARACTER SET troubles.
Best Practice
Going forward, it is best to use CHARACTER SET utf8mb4 and COLLATION utf8mb4_unicode_520_ci. (There is a newer version of the Unicode collation in the pipeline.)
utf8mb4 is a superset of utf8 in that it handles 4-byte utf8 codes, which are needed by Emoji and some Chinese characters.
Outside of MySQL, "UTF-8" refers to the full encoding, including 4-byte sequences, hence effectively the same as MySQL's utf8mb4, not utf8.
I will try to use those spellings and capitalizations to distinguish inside versus outside MySQL in the following.
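A quick way to see the difference, if you have Python 3 at hand (a sketch outside MySQL, just counting UTF-8 bytes per character):

print(len('é'.encode('utf-8')))    # 2 -- fits in MySQL's old 3-byte utf8
print(len('新'.encode('utf-8')))   # 3 -- still fits
print(len('👽'.encode('utf-8')))   # 4 -- needs utf8mb4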
Overview of what you should do
Have your editor, etc. set to UTF-8.
HTML forms should start like <form accept-charset="UTF-8">.
Have your bytes encoded as UTF-8.
Establish UTF-8 as the encoding being used in the client.
Have the column/table declared CHARACTER SET utf8mb4 (Check with SHOW CREATE TABLE.)
<meta charset=UTF-8> at the beginning of HTML
Stored Routines acquire the current charset/collation. They may need rebuilding.
UTF-8 all the way through
More details for computer languages (and its following sections)
Test the data
Viewing the data with a tool or with SELECT cannot be trusted.
Too many such clients, especially browsers, try to compensate for incorrect encodings, and show you correct text even if the database is mangled.
So, pick a table and column that has some non-English text and do
SELECT col, HEX(col) FROM tbl WHERE ...
The HEX for correctly stored UTF-8 will be
For a blank space (in any language): 20
For English: 4x, 5x, 6x, or 7x
For most of Western Europe, accented letters should be Cxyy
Cyrillic, Hebrew, and Farsi/Arabic: Dxyy
Most of Asia: Exyyzz
Emoji and some of Chinese: F0yyzzww
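If you want to double-check those ranges outside MySQL, any language that exposes the UTF-8 bytes will do; a small Python sketch for comparison with the SELECT ... HEX(col) output:

print('Señor'.encode('utf-8').hex())    # 5365c3b16f72 -- the n-tilde is c3b1 (Cxyy)
print('新浪新闻'.encode('utf-8').hex())  # e696b0e6b5aae696b0e997bb (Exyyzz)
print('👽'.encode('utf-8').hex())        # f09f91bd (F0yyzzww)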
More details
Specific causes and fixes of the problems seen
Truncated text (Se for Señor):
The bytes to be stored are not encoded as utf8mb4. Fix this.
Also, check that the connection during reading is UTF-8.
Black Diamonds with question marks (Se�or for Señor);
one of these cases exists:
Case 1 (original bytes were not UTF-8):
The bytes to be stored are not encoded as utf8. Fix this.
The connection (or SET NAMES) for the INSERT and the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Case 2 (original bytes were UTF-8):
The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).
Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
Question Marks (regular ones, not black diamonds) (Se?or for Señor):
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8.
Mojibake (SeÃ±or for Señor):
(This discussion also applies to Double Encoding, which is not necessarily visible.)
The bytes to be stored need to be UTF-8-encoded. Fix this.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
HTML should start with <meta charset=UTF-8>.
If the data looks correct, but won't sort correctly, then
either you have picked the wrong collation,
or there is no collation that suits your need,
or you have Double Encoding.
Double Encoding can be confirmed by doing the SELECT .. HEX .. described above.
é should come back C3A9, but instead shows C383C2A9
The Emoji 👽 should come back F09F91BD, but comes back C3B0C5B8E28098C2BD
That is, the hex is about twice as long as it should be.
This is caused by converting from latin1 (or whatever) to utf8, then treating those bytes as if they were latin1 and repeating the conversion.
The sorting (and comparing) does not work correctly because it is, for example, sorting as if the string were SeÃ±or.
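Double Encoding is easy to reproduce; a small Python illustration of the latin1 round trip described above:

once = 'Señor'.encode('utf-8')                   # hex 5365c3b16f72
twice = once.decode('latin-1').encode('utf-8')   # treat those bytes as latin1, convert again
print(once.decode('latin-1'))                    # SeÃ±or -- the mangled string you see
print(twice.hex())                               # 5365c383c2b16f72 -- c3b1 doubled to c383c2b1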
Fixing the Data, where possible
For Truncation and Question Marks, the data is lost.
For Mojibake / Double Encoding, ...
For Black Diamonds, ...
The Fixes are listed here. (5 different fixes for 5 different situations; pick carefully): http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
I had similar issues with two of my projects after a server migration. After searching and trying a lot of solutions, I came across this one:
mysqli_set_charset($con,"utf8mb4");
After adding this line to my configuration file, everything works fine!
I found this solution for MySQLi ("PHP mysqli set_charset() Function") when I was looking to solve an insert from an HTML query.
I was also searching for the same issue. It took me nearly one month to find the appropriate solution.
First of all, you will have to update your database so that every CHARACTER SET and COLLATION is utf8mb4, or at least something that supports UTF-8 data.
For Java:
when making a JDBC connection, add useUnicode=yes&characterEncoding=UTF-8 as parameters to the connection URL, and it will work.
For Python:
Before querying the database, try enforcing this on the cursor:
cursor.execute('SET NAMES utf8mb4')
cursor.execute("SET CHARACTER SET utf8mb4")
cursor.execute("SET character_set_connection=utf8mb4")
If it does not work, happy hunting for the right solution.
Set your code IDE language to UTF-8
Add <meta charset="utf-8"> to the header of the web page where you collect the form data.
Check that your MySQL table definition looks like this:
CREATE TABLE your_table (
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8
If you are using PDO, make sure
$options = array(PDO::MYSQL_ATTR_INIT_COMMAND=>'SET NAMES utf8');
$dbL = new PDO($pdo, $user, $pass, $options);
If you already have a large database with the above problem, you can try SIDU to export it with the correct charset and import it back as UTF-8.
Depending on how the server is set up, you have to change the encoding accordingly. From what you said, utf8 should work best. However, if you're getting weird characters, it might help to change the webpage encoding to ANSI.
This helped me when I was setting up PHP with MySQLi. This might help you understand more: ANSI to UTF-8 in Notepad++

Reading text files in Ada: Get_Line "reads" the byte-order mark as well

I'm trying to read a file line by line in Ada; it's an XML text file. I'm following the instructions here:
http://rosettacode.org/wiki/Read_a_file_line_by_line#Ada
However there's a problem that annoys me: the Get_Line function seems to be unaware of byte-order marks and reads them as part of the text itself, which means that when I read the lines, the first one will always start with some extra bytes that should not be there.
While removing the extra bytes manually from the string is no big deal, it seems strange to me that a function dedicated to text input/output is unaware of BOMs. There must be a way to read a text file in Ada without having to worry about this... is there?
Ada.Text_IO is specified to handle ISO-8859-1 encoded text, so ignoring a UTF-8 feature is the proper thing to do.
If Ada.Wide_Text_IO and Ada.Wide_Wide_Text_IO also pass the byte-order mark through when asked to read UTF-8 encoded text, then you should consider reporting it as a bug to GCC - but as there are quite a lot of implementation-defined details in Ada's text I/O packages, you should be ready for a "won't fix" answer.
One possibility is using the stream attributes and making a UTF_8 file-type to handle the BOM reading-and-discarding.
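For comparison, languages whose I/O libraries are BOM-aware do the discarding for you; a Python sketch using the utf-8-sig codec (the file name is illustrative):

with open('demo.xml', 'w', encoding='utf-8-sig') as f:   # write a BOM, like some Windows tools
    f.write('<root/>\n')

with open('demo.xml', encoding='utf-8') as f:            # plain utf-8 keeps the BOM,
    print(repr(f.readline()))                            # '\ufeff<root/>\n' -- like Get_Line

with open('demo.xml', encoding='utf-8-sig') as f:        # utf-8-sig reads and discards it
    print(repr(f.readline()))                            # '<root/>\n'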

Handle UTF-8 characters in Unix

I was trying to find a solution for my problem, and after looking at the forums I couldn't find one, so I'll explain it here.
We receive a CSV file from a client with some special characters, encoded as unknown-8bit. We convert this CSV file to XML using an awk script. With the XML file we make an API call to our system using UTF-8 as the default encoding. The response is an error with the following information:
org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence
The content of the file is as below:
151215901579-109617744500,sandra,sandra,Coesfeld,,Coesfeld,48653,DE,1,2.30,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10,53.82,GB,,.80,3,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10MM 4CORE IGNITION HT LEADS WIRES MLR.CR,,sandra#online.de,parcel1,Invalid Request,,%004865315500320004648880276,INTL,%004865315500320004648880276,1,INTL,DPD,180380,INTL,2.30,Send A2B Ltd,4th Floor,200 Gray’s Inn Road,LONDON,,WC1X8XZ,GBR,
I think the problem is in the field "200 Gray’s Inn Road", because the "’" character comes through as a 0x92 byte, which is not valid UTF-8.
Does anybody know how can I handle this?
Thanks in advance,
Sandra
Find out the actual encoding first; the best way would be to ask the sender.
If you cannot do so, and also for sanity-checking, the Unix command file is very useful for that.
Next step, convert to UTF-8.
As it is obviously an ASCII-based encoding, you could just discard all non-ASCII bytes, or replace them during conversion, if that loss is acceptable.
As an alternative, open it in the editor of your choice and flip the encoding used for interpreting the data until you get something useful. My guess is you'll have either Latin-1 or Windows-1252, but check it for yourself.
Last step: do what you wanted to do, in the comforting knowledge that you now have valid UTF-8.
Obviously, don't pretend it's UTF-8 if it isn't. Find out what the encoding is, or replace all non-ASCII characters with the UTF-8 REPLACEMENT CHARACTER sequence 0xEF 0xBF 0xBD.
Since you are able to view this particular sample just fine, you apparently already know which encoding it is, even if you don't know that you know: it would be whatever your current set-up is using. I would guess Windows-1252, which uses 0x92 for a curly right single quote.
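Assuming it really is Windows-1252, the repair is a plain decode/re-encode round trip; a Python sketch (the shell equivalent would be something like iconv -f WINDOWS-1252 -t UTF-8 in.csv > out.csv):

raw = b'200 Gray\x92s Inn Road'    # the offending bytes from the CSV
text = raw.decode('cp1252')        # 0x92 -> U+2019 RIGHT SINGLE QUOTATION MARK
utf8 = text.encode('utf-8')        # now valid UTF-8: 0x92 has become e2 80 99
print(text)
print(utf8.hex())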

Native method in R to test if file is ASCII

Is there a native method in R to test if a file on disk is an ASCII text file, or a binary file? Similar to the file command in Linux, but a method that will work cross platform?
The file.info() function can distinguish a file from a dir, but it doesn't seem to go beyond that.
If all you care about is whether the file is ASCII or binary...
Well, first up definitions. All files are binary at some level:
is.binary <- function(file){
  if(system.type() != "quantum computer"){
    return(TRUE)
  }else{
    return(cat=alive&dead)
  }
}
ASCII is just an encoding system for characters. It is therefore impossible to tell if a file is ASCII or binary, because ASCII-ness is a matter of interpretation. If I save a file and decide that binary number 01001101 is Q and 01001110 is Z then you might decode this as ASCII but you'll get the wrong message. Luckily the Americans muscled in and said "Hey, everyone use ASCII to code their text! You get 128 characters and a parity bit! Woo! Go USA!". IBM tried to tell people to use EBCDIC but nobody listened. Which was A Good Thing.
So everyone was packing ASCII-coded text into their 8-bit bytes, and using the eighth bit for parity checking. But then people stopped doing parity checking because TCP/IP handled all that, which was also A Good Thing, and the eighth bit was expected to be zero. If not, there was trouble.
Because people (read "Microsoft") started abusing the eighth bit, and making up their own encoding schemes, and so unless you knew what encoding scheme the file was using, you were stuffed. And the file very rarely told you what encoding scheme it was. And now we have Unicode and even more encoding schemes. And that is a third Good Thing. But I digress.
Nowadays when people ask if a file is binary, what they are normally asking is "Does any byte in this file have its highest bit set?". Which you can do in R by reading a raw file connection as unsigned integers and testing the highest value. Something like:
is.binary <- function(filepath, n = 1000){
  # Read up to n bytes as unsigned integers (0-255), closing the connection on exit
  f <- file(filepath, "rb", raw = TRUE)
  on.exit(close(f))
  b <- readBin(f, "int", n, size = 1, signed = FALSE)
  # Any byte of 128 (0x80) or more has its highest bit set
  return(max(b) > 127)
}
This will, by default, test at most the first 1000 bytes. I think the file command does something similar.
You may want to change the test to check for printable character codes, and whitespace, and line feed, carriage return, and other codes you might want to consider plausible in your non-binary files...
Well, how would you do that? I guess you can't without reading (parts or all of) the file, which is why file extensions are used to signal content type.
I looked into that years ago, and as I recall, the file(1) app actually reads the first few header bytes of a file and compares them to what is stored in a lookup table. Sounds like a good candidate for an add-on package to me.
The example section of the manual for ?raw uses this:
isASCII <- function(txt) all(charToRaw(txt) <= as.raw(127))

Please help identify multi-byte character encoding scheme on ASP Classic page

I'm working with a 3rd party (Commidea.com) payment processing system and one of the parameters being sent along with the processing result is a "signature" field. This is used to provide a SHA1 hash of the result message wrapped in an RSA encrypted envelope to provide both integrity and authenticity control. I have the API from Commidea but it doesn't give details of encoding and uses artificially created signatures derived from Base64 strings to illustrate the examples.
I'm struggling to work out what encoding is being used on this parameter and hoped someone might recognise the quite distinctive pattern. I initially thought it was UTF8 but having looked at the individual characters I am less sure.
Here is a short sample of the content which was created by the following code where I am looping through each "byte" in the string:
sig = Request.Form("signature")
For x = 1 To LenB(sig)
    s = s & AscB(MidB(sig, x, 1)) & ","
Next
' Print s to a debug log file
When I look in the log I get something like this:
129,0,144,0,187,0,67,0,234,0,71,0,197,0,208,0,191,0,9,0,43,0,230,0,19,32,195,0,248,0,102,0,183,0,73,0,192,0,73,0,175,0,34,0,163,0,174,0,218,0,230,0,157,0,229,0,234,0,182,0,26,32,42,0,123,0,217,0,143,0,65,0,42,0,239,0,90,0,92,0,57,0,111,0,218,0,31,0,216,0,57,32,117,0,160,0,244,0,29,0,58,32,56,0,36,0,48,0,160,0,233,0,173,0,2,0,34,32,204,0,221,0,246,0,68,0,238,0,28,0,4,0,92,0,29,32,5,0,102,0,98,0,33,0,5,0,53,0,192,0,64,0,212,0,111,0,31,0,219,0,48,32,29,32,89,0,187,0,48,0,28,0,57,32,213,0,206,0,45,0,46,0,88,0,96,0,34,0,235,0,184,0,16,0,187,0,122,0,33,32,50,0,69,0,160,0,11,0,39,0,172,0,176,0,113,0,39,0,218,0,13,0,239,0,30,32,96,0,41,0,233,0,214,0,34,0,191,0,173,0,235,0,126,0,62,0,249,0,87,0,24,0,119,0,82,0
Note that every other value is a zero, except occasionally where it is 32 (0x20). I'm familiar with UTF-8, which represents characters above 127 using two or more bytes, but if this were UTF-8 encoding then I would expect the "32" value to be more like 194 (0xC2) or 195 (0xC3), and the other value would be greater than 0x80.
Ultimately what I'm trying to do is convert this signature parameter into a hex encoded string (eg. "12ab0528...") which is then used by the RSA/SHA1 function to verify the message is intact. This part is already working but I can't for the life of me figure out how to get the signature parameter decoded.
For historical reasons we are having to use classic ASP and the SHA1/RSA functions are javascript based.
Any help would be much appreciated.
Regards,
Craig.
Update: Tried looking into UTF-16 encoding on Wikipedia and other sites. Can't find anything to explain why I am seeing only 0x20 or 0x00 in the (assumed) high order byte positions. I don't think this is relevant any more as the example below shows other values in this high order position.
Tried adding some code to log the values using Asc instead of AscB (Len,Mid instead of LenB,MidB too). Got some surprising results. Here is a new stream of byte-wise characters followed by the equivalent stream of word-wise (if you know what I mean) characters.
21,0,83,1,214,0,201,0,88,0,172,0,98,0,182,0,43,0,103,0,88,0,103,0,34,33,88,0,254,0,173,0,188,0,44,0,66,0,120,1,246,0,64,0,47,0,110,0,160,0,84,0,4,0,201,0,176,0,251,0,166,0,211,0,67,0,115,0,209,0,53,0,12,0,243,0,6,0,78,0,106,0,250,0,19,0,204,0,235,0,28,0,243,0,165,0,94,0,60,0,82,0,82,0,172,32,248,0,220,2,176,0,141,0,239,0,34,33,47,0,61,0,72,0,248,0,230,0,191,0,219,0,61,0,105,0,246,0,3,0,57,32,54,0,34,33,127,0,224,0,17,0,224,0,76,0,51,0,91,0,210,0,35,0,89,0,178,0,235,0,161,0,114,0,195,0,119,0,69,0,32,32,188,0,82,0,237,0,183,0,220,0,83,1,10,0,94,0,239,0,187,0,178,0,19,0,168,0,211,0,110,0,101,0,233,0,83,0,75,0,218,0,4,0,241,0,58,0,170,0,168,0,82,0,61,0,35,0,184,0,240,0,117,0,76,0,32,0,247,0,74,0,64,0,163,0
And now the word-wise data stream:
21,156,214,201,88,172,98,182,43,103,88,103,153,88,254,173,188,44,66,159,246,64,47,110,160,84,4,201,176,251,166,211,67,115,209,53,12,243,6,78,106,250,19,204,235,28,243,165,94,60,82,82,128,248,152,176,141,239,153,47,61,72,248,230,191,219,61,105,246,3,139,54,153,127,224,17,224,76,51,91,210,35,89,178,235,161,114,195,119,69,134,188,82,237,183,220,156,10,94,239,187,178,19,168,211,110,101,233,83,75,218,4,241,58,170,168,82,61,35,184,240,117,76,32,247,74,64,163
Note that the second pair of byte-wise characters (83,1) seems to be interpreted as 156 in the word-wise stream. We also see (34,33) as 153, (120,1) as 159, and (220,2) as 152. Does this give any clues as to the encoding? Why are these 15[2369] values apparently being treated differently from other values?
What I'm trying to figure out is whether I should use the byte-wise data and carry out some post-processing to get back to the intended values or if I should trust the word-wise data with whatever implicit decoding it is apparently performing. At the moment, neither seem to give me a match between data content and signature so I need to change something.
Thanks.
Quick observation tells me that you are likely dealing with UTF-16. Start from there.
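To see why UTF-16 fits, note that the byte-wise pairs are little-endian UTF-16 code units, and the surprising word-wise values are those characters mapped back through Windows-1252 by Asc(). A Python sketch of that round trip (the classic-ASP behaviour is inferred from the logs above, not tested):

# (low byte, high byte) pairs taken from the byte-wise log above:
for lo, hi in [(83, 1), (34, 33), (120, 1), (220, 2)]:
    ch = bytes([lo, hi]).decode('utf-16-le')          # little-endian UTF-16 code unit
    print((lo, hi), '->', repr(ch), '->', ch.encode('cp1252')[0])
# (83, 1)   -> 'œ' (U+0153) -> 156
# (34, 33)  -> '™' (U+2122) -> 153
# (120, 1)  -> 'Ÿ' (U+0178) -> 159
# (220, 2)  -> '˜' (U+02DC) -> 152

If that holds, the word-wise values are already the original signature bytes (Asc() has converted each UTF-16 character back to the Windows-1252 code page), so the word-wise stream, or equivalently the byte-wise pairs mapped through Windows-1252 as above, is probably what the RSA/SHA1 check expects.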
