NSData to NSString losing data - nsstring

I'm attempting to convert a binary file into text. The problem is that a large portion of the file is not encoded in ASCII and ends up as special characters. I'm using
[[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
but I am only getting a few characters back from a 20,000-byte data block. What I would like to see is all of the text (even if most of it is nonsense), which is what I get when I open the file in a binary editor.

It's a binary file. To read it, you find the documentation for the file format, then you parse it. Trying to throw it all into an NSString* seems absolutely pointless.
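If the goal is only to eyeball the bytes the way a binary editor does, one option is to map every byte to a visible character instead of decoding the whole file as ASCII. A minimal sketch, written in Python purely as a neutral illustration; `printable_view` and the sample bytes are hypothetical and not part of Foundation:

```python
def printable_view(data: bytes, placeholder: str = ".") -> str:
    """Return one character per byte, like the text column of a hex editor."""
    # Printable ASCII (0x20..0x7E) is shown as-is; every other byte becomes a placeholder.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else placeholder for b in data)


# Hypothetical sample: a few header-like bytes, some text, and two high bytes.
sample = b"\x7fELF\x02\x01\x01\x00hello\xff\xfe"
view = printable_view(sample)
print(view)

# Latin-1 assigns a code point to every byte value, so this decode cannot fail or drop data.
lossless = sample.decode("latin-1")
```

The same idea should carry over to an encoding that defines all 256 byte values (Latin-1 here), since strict ASCII has nothing to say about bytes above 127.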

Related

How to display the content of an ELF file in QTextEdit?

I need to display all the bytes from an ELF file in a QTextEdit and I did not find any reasonable way to do this. At most I could print "?ELF??" and then nothing. The content of the ELF is read into a char* array (this is a requirement, I can't change that) and yes, the content is definitely read.
I am guessing that your code looks something like this:
char *elf = ReadElfFile();
QString str(elf); // Constructs a string initialized with the 8-bit string str.
QTextEdit edit(str);
The problem is that QString constructor will stop on first NUL character, and the ELF file is full of them.
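If you want to make a QString that contains NULs, pass the length explicitly, for example:
// Before Qt 6, QString(const QByteArray &) also stops at the first NUL, so convert with an explicit size.
QString str = QString::fromLatin1(elf, length_of_elf);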
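The difference between "stop at the first NUL" and "use the known length" can be reproduced outside Qt. A small sketch in Python using `ctypes`, whose buffers expose both views; the byte string is a made-up stand-in for the start of an ELF file:

```python
import ctypes

# Hypothetical stand-in for the first bytes of an ELF file; real files contain many NULs.
elf = b"\x7fELF\x02\x01\x01\x00\x00\x00rest of file"

# A C char buffer holding exactly these bytes.
buf = ctypes.create_string_buffer(elf, len(elf))

# .value treats the buffer as a C string and stops at the first NUL byte.
as_c_string = buf.value
# .raw uses the known length and returns every byte.
with_length = buf.raw

print(len(as_c_string), len(with_length))
```

Any API that only receives a `char*` has to guess the length from the terminator, which is why the length must travel with the pointer for binary data.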
This just nearly broke me too, so I'll post my solution to anyone interested.
Let's say I have a QByteArray data that is filled like so
data += file.readAll();
I'll then invoke an update of the QTextEdit where I'll do
QByteArray copy = data;
QString text = copy.replace((char)0x00, "\\0");
textEdit.setPlainText(text);
This way, all null bytes in the data will be displayed as the printable string \0.
Since I want changes of the textEdit to be reflected in my data, I have to parse this back using
QByteArray hex = textEdit.toPlainText().toUtf8().toHex().toUpper();
hex.replace("5C30", "00");
hex.replace("5C00", "5C30"); // oops, was escaped
data = QByteArray::fromHex(hex);
I'm using the hex format because I just could not get the replace to work with null byte characters. The code above first replaces all occurrences of the string \0 with null bytes in the data. Then it replaces any \ followed by a null byte back with \0 - which essentially means \\0 becomes \0.
It's not very elegant, but maybe it helps anyone ending up here to move on in the right direction. If you have improvements, please comment.
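One possible alternative to the hex detour is to escape backslashes as well as NUL bytes on the way out, and to undo both by walking the text byte by byte on the way back. A sketch in Python rather than Qt; `escape_nuls` and `unescape_nuls` are hypothetical helper names, and the scheme differs from the answer above in that a literal backslash is always shown doubled:

```python
def escape_nuls(data: bytes) -> bytes:
    """Make NUL bytes visible as the two characters backslash + '0'."""
    # Escape existing backslashes first so a literal "\0" in the data stays distinguishable.
    return data.replace(b"\\", b"\\\\").replace(b"\x00", b"\\0")


def unescape_nuls(text: bytes) -> bytes:
    """Inverse of escape_nuls; scans left to right so escapes are never matched twice."""
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == b"\\0":        # escaped NUL
            out.append(0)
            i += 2
        elif pair == b"\\\\":     # escaped backslash
            out += b"\\"
            i += 2
        else:                     # ordinary byte, copied unchanged
            out.append(text[i])
            i += 1
    return bytes(out)


# Data containing a real NUL, a literal backslash-zero, and a trailing backslash.
original = b"a\x00b\\0c\\"
escaped = escape_nuls(original)
restored = unescape_nuls(escaped)
```

Because backslashes are escaped too, text that already contains the characters `\0` survives the round trip instead of turning into a NUL byte.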

How to read Hebrew text from a file

I want to read Hebrew text from a file into an NSString object. I know this line of code:
NSString *text=[NSString stringWithContentsOfFile:path encoding:encoding error:NULL];
I don't know what kind of file it has to be (RTF, XML, or something else)
or which encoding I have to use for Hebrew.
UTF-8 (NSUTF8StringEncoding) is the encoding you want, provided the file itself was saved as UTF-8.
http://www.alanwood.net/unicode/hebrew.html
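The point that the encoding matters more than the file type can be checked with a plain text file. A minimal sketch in Python (not Cocoa); the file name and sample sentence are made up for illustration:

```python
import os
import tempfile

text = "שלום עולם"  # "Hello world" in Hebrew: 8 letters and one space

# Write a plain .txt file as UTF-8; no special container format is required.
path = os.path.join(tempfile.mkdtemp(), "hebrew.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading it back with the same encoding restores the original string.
with open(path, encoding="utf-8") as f:
    loaded = f.read()

# Each Hebrew letter takes two bytes in UTF-8; the space takes one.
raw_size = os.path.getsize(path)
```

Reading the same bytes with a different encoding would not fail loudly in every case, so the reader and the writer have to agree on UTF-8 up front.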

HttpUtility.HtmlDecode cannot decode ASCII greater than 127

I have a list of characters that display fine in a WebBrowser in the form of encoded characters such as &#128; ...
But when posting these characters to the server, I realized that HttpUtility.HtmlDecode cannot convert them to characters as the browser did; they all become spaces.
text = System.Web.HttpUtility.HtmlDecode("&#128;");
I expect it to return € but it returns a space instead. The same thing happens for some other characters as well.
Does anyone know how to fix this or any workaround?
This is commonly the result of using literal values and mixing UTF-8 and ASCII. In UTF-8 the euro sign is encoded as 3 bytes, so there is no ASCII counterpart for it.
Update
Your code is invalid if you are using UTF-8, since only the first 128 characters are encoded as single bytes and the rest are encoded as multiple bytes. You need to use the percent-encoded form:
// !!! NOT HtmlDecode!!!
text = System.Web.HttpUtility.UrlDecode("%E2%82%AC");
UPDATE
OK, I have left the code as it was but added a comment that HtmlDecode does not work. It does not work because this is not an encoding that concerns HTML; the string is not HTML. It is a URL encoding, so you need to use UrlDecode instead.
ASCII is 7-Bit; there are no characters 128 through 255. The MSDN article you linked is following the long tradition of pretending ASCII is 8-Bit; the article actually shows code page 437.
I'm not sure why you're not simply writing € (compatibility?), but &#8364; or &euro; should do, too.
You typically want to do something like:
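string html = "&#128;";
string trash = WebUtility.HtmlDecode(html); // yields U+0080, a control character
// Reinterpret that code point as a Windows-1252 byte, which is how browsers read it
byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(trash);
string proper = Encoding.GetEncoding(1252).GetString(bytes); // "€"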
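The different spellings of the euro sign discussed in this thread can be compared side by side. A short sketch in Python rather than .NET, using only the standard `html` and `urllib.parse` modules; it shows the behaviour of Python's decoder, which follows the HTML5 rules, and is not a statement about what `HttpUtility.HtmlDecode` does:

```python
import html
from urllib.parse import unquote

# HTML5 parsing rules map the numeric reference 128 to the euro sign (its Windows-1252 meaning).
from_legacy_ref = html.unescape("&#128;")

# The unambiguous references use the real code point U+20AC or the named entity.
from_codepoint = html.unescape("&#8364;")
from_named = html.unescape("&euro;")

# Percent-encoding is a URL mechanism and carries the three UTF-8 bytes of the euro sign.
from_url = unquote("%E2%82%AC")

# If a decoder returns U+0080 instead, it can be repaired by
# reinterpreting that single byte as Windows-1252.
repaired = "\u0080".encode("latin-1").decode("cp1252")

print(from_legacy_ref, from_codepoint, from_named, from_url, repaired)
```

A decoder that takes `&#128;` literally produces the invisible control character U+0080, which matches the "space" the question describes.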

Fix Special Characters in String

I've got a program that in a nutshell reads values from a SQL database and writes them to a tab-delimited text file.
The issue is that some of the values in the database have special characters (TM, dash, ellipsis, etc.). When written to the text file, the formatting is lost and they come across as junk such as "â„¢" or "â€“".
When the value is viewed in the immediate window, before it is written to the txt file, everything looks fine. My guess is that this is an issue of encoding. But, I'm not real sure how to proceed, where to look, or what to look for.
Is this ASCII or UTF-8? If it's one of those, how do I correct it before it's written to the text file?
Here's how I build the text file (where feedStr is a StringBuilder)
objReader = New StreamWriter(filePath)
objReader.Write(feedStr)
objReader.Close()
The default encoding for StreamWriter is UTF-8 (with no byte order mark). Your result file is fine; the question is what you open it in afterwards. If you open it in a UTF-8 capable text editor, the characters should look the way you want.
You can also write the text file in another encoding, for example iso-8859-1 (latin1):
objReader = New StreamWriter(filePath, false, Encoding.GetEncoding("iso-8859-1"))
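The junk characters are what correct UTF-8 bytes look like when a viewer assumes Windows-1252. A sketch in Python (not VB.NET) that reproduces the effect; the sample string is an assumption based on the characters named in the question:

```python
original = "™ – …"  # trademark sign, en dash, ellipsis

# What a UTF-8 writer puts in the file.
utf8_bytes = original.encode("utf-8")

# What a viewer shows if it assumes Windows-1252 instead of UTF-8.
misread = utf8_bytes.decode("cp1252")

# What a viewer shows if it decodes the same bytes as UTF-8.
correct = utf8_bytes.decode("utf-8")

# ISO-8859-1 has none of these three characters, so they degrade to '?'.
as_latin1 = original.encode("iso-8859-1", errors="replace")

# Windows-1252 does define them, each as a single byte.
as_cp1252 = original.encode("cp1252")

print(misread)
```

One caveat with switching to iso-8859-1: it only helps for characters that encoding actually contains, and the trademark sign, en dash, and ellipsis are not among them, whereas Windows-1252 includes all three.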

How to add encoding information to the response stream in ASP.NET?

I have following piece of code:
public void ProcessRequest (HttpContext context)
{
context.Response.ContentType = "text/rtf; charset=UTF-8";
context.Response.Charset = "UTF-8";
context.Response.ContentEncoding = System.Text.Encoding.UTF8;
context.Response.AddHeader("Content-disposition", "attachment;filename=lista_obecnosci.csv");
context.Response.Write("ąęćżźńółĄŚŻŹĆŃŁÓĘ");
}
When I try to open generated csv file, I get following behavior:
In Notepad2 - everything is fine.
In Word - the conversion wizard opens and asks to convert the text. It suggests UTF-8, which is somewhat OK.
In Excel - I get a real mess. None of those Polish characters can be displayed.
I wanted to write those special encoding-information characters in front of my string, i.e.
context.Response.Write((char)0xef);
context.Response.Write((char)0xbb);
context.Response.Write((char)0xbf);
but that won't do any good. The response stream is treating that as normal data and converts it to something different.
I'd appreciate help on this one.
I ran into the same problem, and this was my solution:
context.Response.BinaryWrite(System.Text.Encoding.UTF8.GetPreamble());
context.Response.Write("ąęćżźńółĄŚŻŹĆŃŁÓĘ");
What you call "encoding-information" is actually a BOM. I suspect each of those "characters" is getting encoded separately. To write the BOM manually, you have to write it as three bytes, not three characters. I'm not familiar with the .NET I/O classes, but there should be a method available to you that takes a byte or byte[] parameter and writes them directly to the file.
By the way, the UTF-8 BOM is optional; in fact, its use is discouraged by the Unicode Consortium. If you don't have a specific reason for using it, save yourself some hassle and leave it out.
EDIT: I just remembered you can also write the actual BOM character, '\uFEFF', and let the encoder handle it:
context.Response.Write('\uFEFF');
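The difference between writing the BOM as three bytes, as one character, or as three separate characters can be verified directly. A sketch in Python (not ASP.NET); the Polish sample text is an arbitrary stand-in:

```python
import codecs

# The UTF-8 BOM as raw bytes.
bom_bytes = codecs.BOM_UTF8

# Encoding the single character U+FEFF as UTF-8 yields exactly those three bytes.
from_char = "\ufeff".encode("utf-8")

# Writing 0xEF, 0xBB, 0xBF as three *characters* encodes each one separately,
# producing six bytes that are no longer a BOM.
as_three_chars = "".join(chr(b) for b in (0xEF, 0xBB, 0xBF)).encode("utf-8")

# Python's "utf-8-sig" codec prepends the BOM on encode and strips it on decode.
sample = "zażółć gęślą jaźń"  # hypothetical sample text
payload = sample.encode("utf-8-sig")
round_trip = payload.decode("utf-8-sig")
```

This matches the behaviour described in the question: the response stream encodes each cast `(char)` value on its own, so the bytes that reach the client are not the BOM.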
I think the problem is with Excel, based on the question "Microsoft Excel mangles Diacritics in .csv files". To prove this, copy your sample output string of ąęćżźńółĄŚŻŹĆŃŁÓĘ and paste it into a test file using your favorite editor, and save it as a UTF-8 encoded .csv file. Open it in Excel and see the same issues.
The answer from Alan Moore, translated to VB:
Context.Response.Write(ChrW(&HFEFF))

Resources