Can somebody tell me what the problem is?
I want to print a hex file on an Epson TM-T88II printer, but all my umlauts (ä/ö/ü/ß) come out wrong; they print as black dots and similar garbage.
My hex code for "ü" is "FC". Is that wrong?
0xFC is "ü" in many encodings, but not in all of them. If your printer does not know which encoding you're using, it may well not understand what you want to print.
Exactly what are you sending to the printer? And what application are you printing from?
The following may also be of interest.
http://www.joelonsoftware.com/articles/Unicode.html
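For what it's worth, the TM-T88II is an ESC/POS receipt printer, and those typically default to an IBM code page such as PC437 rather than Latin-1. A small sketch in R of the mismatch (the code page is an assumption; check which character table your printer is actually configured for, and note that iconv encoding-name support varies by platform):

text  <- "äöüß"
bytes <- iconv(text, from = "UTF-8", to = "CP437", toRaw = TRUE)[[1]]
bytes  # 84 94 81 e1 -- in PC437, "ü" is 0x81, not 0xFC

So a 0xFC byte, which is "ü" in Latin-1 or Windows-1252, prints as something else entirely if the printer is interpreting bytes as PC437.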
In 2001 the German demoscene group Farbrausch released a demo called "fuenf" (in your face) on pouet.net. It contains a 5-byte executable which could be considered more of a troll than a demo. If you run it, you hear a weird sound and it may crash your computer. At least it produces a sound. Whatever.
The hexadecimal content is:
95cd 21eb fc
And the binary representation is:
10010101 11001101 00100001 11101011 11111100
Using xxd I also get the printable chars from the content, which are:
..!..
And that makes me a little confused. Looking up the values in the ASCII table (e.g. here), I get this as a result:
•Í!ëü
At least the exclamation mark is correct.
But how does 95cd21ebfc translate into ..!..?
Side note:
file -bi fuenf.com says the encoding is not known:
charset=unknown-8bit
And iconv -f ISO-8859-1 -t UTF-8 fuenf.com returns
Í!ëü
Which leads to the assumption that xxd simply cannot decode the content and therefore just falls back to a default character, like the dot?
First of all, this is not a text file, so interpreting it as text makes little sense; those five bytes are machine instructions.
Secondly, even if it could be interpreted as text, you would need to know the encoding. It's definitely not ASCII, because ASCII only defines symbols in the range 0-127, and the third byte (0x21, which maps to '!') is the only one here in that range. That also answers your xxd question: xxd prints a dot for every byte outside the printable ASCII range, so only the '!' shows through. The "extended ASCII" table you link to is just one of many possible code pages that assign meanings to the values 128-255. Calling it "extended ASCII" is misleading, because it suggests an updated ASCII standard defines those characters, which is not the case. For a while, computer vendors simply did whatever they wanted with those additional characters, and some of their choices became quasi-standards by virtue of being included in DOS, Windows, etc. Others were standardized by ISO (you tried ISO-8859-1, which is one such standard).
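You can watch the same five bytes turn into different "text" depending on which code page you assume. A small illustration in R (encoding-name support depends on your platform's iconv):

b <- as.raw(c(0x95, 0xcd, 0x21, 0xeb, 0xfc))
x <- rawToChar(b)                    # just the raw bytes, no encoding assumed yet
iconv(x, "WINDOWS-1252", "UTF-8")    # "•Í!ëü" -- matches your table lookup
iconv(x, "ISO-8859-1", "UTF-8")      # 0x95 becomes an invisible C1 control character
iconv(x, "CP437", "UTF-8")           # different characters again

The table you linked to was evidently Windows-1252, since that is the code page in which 0x95 is the bullet "•".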
Hi guys, I encrypted a school project, but the text file where I saved my AES key was deleted. I had taken a picture of it earlier, so I typed the key into a new file, but the new AES key file does not match what is in the JPEG, and I can't find which character is wrong. Could you please help me?
Pic : https://i.stack.imgur.com/pAXzl.jpg
Text file : http://textuploader.com/dfop6
If you directly convert arbitrary bytes to Unicode text you may lose information, because some byte values do not correspond to any Unicode character, and others map to whitespace or characters that cannot be reliably distinguished in printed form.
Of course there may be ways to brute-force your way out of this, but that could easily result in very complex code and possibly near-infinite running time. Better to start over, and if you want to use screenshots or similar printed text, base64- or hex-encode your results first; those can easily be converted back. A sketch of that follows.
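A minimal sketch in R of the hex-encoding idea (the key here is randomly generated as a stand-in; your actual key bytes would take its place):

key <- as.raw(sample(0:255, 16, replace = TRUE))   # stand-in for a real AES-128 key
hex <- paste(sprintf("%02x", as.integer(key)), collapse = "")
hex                                  # every character is printable and unambiguous
back <- as.raw(strtoi(substring(hex, seq(1, 31, 2), seq(2, 32, 2)), 16L))
identical(back, key)                 # TRUE -- the encoding round-trips losslessly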
I was trying to find a solution to my problem, and after looking through the forums I couldn't find one, so I'll explain it here.
We receive a csv file from a client with some special characters and encoded as unknown-8bit. We convert this csv file to xml using an awk script. With the xml file we make an API call to our system using utf-8 as default encoding. The response is an error with following information:
org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence
The content of the file is as below:
151215901579-109617744500,sandra,sandra,Coesfeld,,Coesfeld,48653,DE,1,2.30,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10,53.82,GB,,.80,3,ASTRA 16V CAVALIER CALIBRA TURBO BLUE 10MM 4CORE IGNITION HT LEADS WIRES MLR.CR,,sandra#online.de,parcel1,Invalid Request,,%004865315500320004648880276,INTL,%004865315500320004648880276,1,INTL,DPD,180380,INTL,2.30,Send A2B Ltd,4th Floor,200 Gray’s Inn Road,LONDON,,WC1X8XZ,GBR,
I think the problem is in the field "200 Gray’s Inn Road": the "’" character corresponds to the byte 0x92, which is not a valid UTF-8 sequence.
Does anybody know how can I handle this?
Thanks in advance,
Sandra
Find out the actual encoding first; the best way is to ask the sender.
If you cannot do so, and also for sanity-checking, the Unix command file is very useful for that (the linked page shows more options).
Next step: convert to UTF-8.
As it is obviously an ASCII-based encoding, you could also just discard all non-ASCII bytes, or replace them during conversion, if that loss is acceptable.
As an alternative, open it in the editor of your choice and flip the encoding used for interpreting the data until you get something readable. My guess is you'll have either Latin-1 or Windows-1252, but check it for yourself.
Last step: do what you wanted to do, in the comforting knowledge that you now have valid UTF-8.
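A minimal sketch of that conversion in R, assuming the file turns out to be Windows-1252 (the file names are illustrative):

lines <- readLines("client.csv", warn = FALSE)              # read the lines as-is
utf8  <- iconv(lines, from = "WINDOWS-1252", to = "UTF-8")  # reinterpret and convert
writeLines(utf8, "client-utf8.csv", useBytes = TRUE)        # write without re-encoding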
Obviously, don't pretend it's UTF-8 if it isn't. Find out what the encoding is, or replace all non-ASCII characters with the UTF-8 REPLACEMENT CHARACTER sequence 0xEF 0xBF 0xBD.
Since you are able to view this particular sample just fine, you apparently already know which encoding it is (even if you don't know that you know -- it would be whatever your current set-up is using) -- I would guess Windows-1252 which uses 0x92 for a curvy right single quote.
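You can check that guess directly with a one-liner in R:

iconv(rawToChar(as.raw(0x92)), from = "WINDOWS-1252", to = "UTF-8")
# [1] "’" -- the curvy right single quote from the address field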
I need to create DataMatrix barcodes which may contain non-Latin characters. I have code which creates the barcodes correctly when they only consist of Latin characters; when I run the same code with non-Latin (Hebrew or Russian) characters, however, although the code runs to completion and the barcode is created, the non-Latin characters are not deciphered by the barcode reader.
Any assistance or ideas would be greatly appreciated!
Your issue is related to the character encoding used prior to generating the barcode. The encoding used by the generator to encode must match the encoding used by the reader to decode.
Possible encodings are:
Extended Channel Interpretations (ECI) is supported by DataMatrix and other 2D barcode standards. The generator places an ECI identifying code inside the barcode data, so the reader knows to use ECI to correctly convert the data back to text.
UTF-8 encodes pretty much any language.
Code page is an older style of encoding, but if your generator uses it, you can use code page 1255 for Hebrew or 1251 for Russian. See this SO answer for more info.
To test your encoding, try Inlite's Online Barcode Reader (OBR), which should read the correct text for ECI- and UTF-8-encoded barcodes. If it does, the problem is with your barcode reader, which is not decoding correctly.
If OBR returns binary data, your generator either uses a code page or does not encode correctly at all. Try another generator that supports ECI or UTF-8.
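If you control what goes into the generator, the safest route is to hand it explicit bytes rather than a platform-encoded string. A hedged sketch in R (the sample text and the target code page are illustrative, and iconv support for "CP1255" depends on your platform):

text <- enc2utf8("שלום")   # sample Hebrew text, forced to UTF-8
utf8 <- charToRaw(text)    # the exact UTF-8 bytes to encode in the barcode
cp   <- iconv(text, from = "UTF-8", to = "CP1255", toRaw = TRUE)[[1]]  # code-page variant

Whichever byte sequence you feed the generator, the reader must be configured to decode the same way.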
Is there a native method in R to test if a file on disk is an ASCII text file, or a binary file? Similar to the file command in Linux, but a method that will work cross platform?
The file.info() function can distinguish a file from a dir, but it doesn't seem to go beyond that.
If all you care about is whether the file is ASCII or binary...
Well, first up definitions. All files are binary at some level:
is.binary <- function(file) {
  if (system.type() != "quantum computer") {
    return(TRUE)
  } else {
    return(cat = alive & dead)
  }
}
ASCII is just an encoding system for characters. It is therefore impossible to tell if a file is ASCII or binary, because ASCII-ness is a matter of interpretation. If I save a file and decide that binary number 01001101 is Q and 01001110 is Z then you might decode this as ASCII but you'll get the wrong message. Luckily the Americans muscled in and said "Hey, everyone use ASCII to code their text! You get 128 characters and a parity bit! Woo! Go USA!". IBM tried to tell people to use EBCDIC but nobody listened. Which was A Good Thing.
So everyone was packing ASCII-coded text into their 8-bit bytes, and using the eighth bit for parity checking. But then people stopped doing parity checking because TCP/IP handled all that, which was also A Good Thing, and the eighth bit was expected to be zero. If not, there was trouble.
The trouble came about because people (read "Microsoft") started abusing the eighth bit and making up their own encoding schemes, so unless you knew which encoding scheme a file was using, you were stuffed. And the file very rarely told you which one it used. And now we have Unicode and even more encoding schemes. And that is a third Good Thing. But I digress.
Nowadays when people ask whether a file is binary, what they are normally asking is "does any byte in this file have its highest bit set?". You can test that in R by reading the file over a raw connection as unsigned integers and checking the values. Something like:
is.binary <- function(filepath, max = 1000) {
  f <- file(filepath, "rb", raw = TRUE)
  on.exit(close(f))                      # don't leak the connection
  b <- readBin(f, "int", max, size = 1, signed = FALSE)
  return(max(b) > 127)                   # any byte >= 128 has its highest bit set
}
By default this tests at most the first 1000 bytes. I think the file command does something similar.
You may want to change the test to check for printable character codes, whitespace, line feed, carriage return, and any other codes you consider plausible in your non-binary files; a sketch of that variant follows.
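A sketch of that stricter variant, assuming "text" means printable ASCII plus tab, line feed, and carriage return (adjust the whitelist to taste):

is.binary.strict <- function(filepath, max = 1000) {
  f <- file(filepath, "rb", raw = TRUE)
  on.exit(close(f))
  b <- readBin(f, "int", max, size = 1, signed = FALSE)
  text_codes <- c(9L, 10L, 13L, 32:126)  # tab, LF, CR, printable ASCII
  any(!(b %in% text_codes))              # TRUE if any byte falls outside the whitelist
}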
Well, how would you do that? I guess you can't without reading (part or all of) the file, which is why file extensions are used to signal content type.
I looked into that years ago, and as I recall, the file(1) app actually reads the first few header bytes of a file and compares them to what is stored in a lookup table. Sounds like a good candidate for an add-on package to me...
The example section of the manual for ?raw uses this:
isASCII <- function(txt) all(charToRaw(txt) <= as.raw(127))