Backspace delimited flat files - flat-file

Has anyone ever seen a backspace-delimited flat file? I need to parse such a file, but I have not been able to put a backspace character into a file to check whether I can detect it.

Splitting shouldn't be any harder than using any other delimiter. It's just another character, after all. In Python, for instance:
>>> x = "apples\bbanana\bcoconut\bthese are delicious!"
>>> x.split('\b')
['apples', 'banana', 'coconut', 'these are delicious!']
Most languages use \b as the escape sequence for a backspace. If yours doesn't, you can also use the ASCII control code for backspace directly, which is \x08.

I have never seen one, but some editors allow you to put a backspace character in by pressing e.g. Ctrl-V first.

You could write a script that appends the ASCII backspace character (code 0x08, usually written \x08 or \b) to a file.
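A minimal sketch of such a script in Python, reusing the sample data from the other answers. The helper names write_sample and read_rows and the temporary file name are made up for illustration; the file is read back right away to confirm that the delimiter is detected.

```python
import os
import tempfile

BACKSPACE = "\x08"  # same character as "\b" in a Python string literal


def write_sample(path):
    """Write two rows whose fields are separated by backspace characters."""
    rows = [["this", "is"], ["backspace", "delimited"]]
    with open(path, "w", newline="\n") as f:
        for row in rows:
            f.write(BACKSPACE.join(row) + "\n")


def read_rows(path):
    """Read the file back and split every line on the backspace character."""
    with open(path, newline="\n") as f:
        return [line.rstrip("\n").split(BACKSPACE) for line in f]


if __name__ == "__main__":
    # Hypothetical file name in the system temp directory.
    sample = os.path.join(tempfile.gettempdir(), "backspace_sample.txt")
    write_sample(sample)
    print(read_rows(sample))
```

Opening the file with newline="\n" keeps Python from translating line endings, so the bytes on disk are exactly the ones written.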

Here is a C program that generates a backspace-delimited file for testing (with newlines separating rows). Pass it a filename, or it will write to stdout. (I chose C because you didn't mention a platform; most people have a C compiler available.)
#include <stdio.h>

int main(int argc, char **argv) {
    /* Write to the named file if one is given, otherwise to stdout. */
    FILE *outfile = (argc < 2) ? stdout : fopen(argv[1], "w");
    if (outfile == NULL) {
        perror("fopen");
        return 1;
    }
    /* \b separates fields, \n separates rows. */
    fprintf(outfile, "this\bis\nbackspace\bdelimited\n");
    if (outfile != stdout)
        fclose(outfile);
    return 0;
}
The same string literal syntax should work in Java; I'll let you write the rest of the program:
"this\bis\nbackspace\bdelimited\n"

If using Windows, you can insert a backspace into notepad by using Ctrl+Backspace.

I would also recommend getting a hex editor like 0xED (for Mac). It's pretty useful for viewing and editing files containing unusual characters. With it, you can just type "08" to insert a backspace character into a file.
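If no hex editor is at hand, a few lines of Python give a similar view. This is only a rough sketch of a hex dump (hex_dump is a made-up helper, not part of any tool mentioned here); backspace bytes show up as 08 in the hex column and as a dot in the text column.

```python
def hex_dump(data, width=16):
    """Return a classic offset / hex / ASCII dump of a bytes object."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_dump(b"this\x08is\nbackspace\x08delimited\n"))
```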

Related

What do these characters mean? (ANSI Code)

It's the code I'm printing with node:
const m = `[38;5;1;48;5;16m TEST`
console.log(m)
output:
It changes the text color.
As you can see, the character before the [ (ESC) is a special character that I don't understand; the browser does not display it. How does it work?
Is there any alternative to ESC?
As @puucee already mentions, these are terminal control characters. I find it surprising that the code says ESC[, as that would not be treated as an escape in normal Node. I suspect that your IDE is displaying the "true" escape character as ESC. Octal escapes such as \033 are not allowed in template literals or strict-mode code, but hexadecimal escapes are. That is, your string should usually look like this:
console.log('\x1b[38;5;1;48;5;16m TEST \x1b[0m')
These are terminal control characters. They are often used, for example, for coloring the output. Some are non-printable. The backticks ` in your JavaScript example delimit a template literal.
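To make the structure of the sequence more visible, here is a small sketch in Python that builds the same string piece by piece. The helper name colorize is invented for this example; the sequence itself is the one from the question, where 38;5;N selects a 256-color foreground and 48;5;N a 256-color background.

```python
ESC = "\x1b"  # the escape character, ASCII 27; "\033" is the same character in Python


def colorize(text, fg, bg):
    """Wrap text in SGR sequences selecting 256-color foreground and background."""
    start = f"{ESC}[38;5;{fg};48;5;{bg}m"  # 38;5;N = foreground, 48;5;N = background
    reset = f"{ESC}[0m"                    # 0 = reset all attributes
    return f"{start}{text}{reset}"


if __name__ == "__main__":
    print(colorize(" TEST ", 1, 16))        # colored on a terminal that supports it
    print(repr(colorize(" TEST ", 1, 16)))  # shows the raw escape sequences
```

Printing the repr is a handy way to see the escape character, which terminals and browsers otherwise swallow.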

How to display the content of an ELF file in QTextEdit?

I need to display all the bytes from an ELF file in a QTextEdit, and I did not find any reasonable way to do this. The most I could print was "?ELF??", then nothing. The content of the ELF file is read into a char* array (this is a requirement, I can't change that), and yes, the content is definitely being read.
I am guessing that your code looks something like this:
char *elf = ReadElfFile();
QString str(elf); // Constructs a string initialized with the 8-bit string str.
QTextEdit edit(str);
The problem is that the QString constructor stops at the first NUL character, and an ELF file is full of them.
If you want to make a QString that contains NULs, do something like this:
QString str(QByteArray(elf, length_of_elf));
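The truncation is ordinary C-string behaviour and not specific to Qt. As a rough illustration outside Qt, the Python sketch below uses ctypes to read the same buffer once as a NUL-terminated string and once with an explicit length. The header bytes are those of a typical 64-bit little-endian ELF file; the payload after them is made up.

```python
import ctypes

# First bytes of a typical 64-bit little-endian ELF header, followed by
# made-up payload bytes for illustration.
elf = b"\x7fELF\x02\x01\x01\x00" + b"more data"

buf = ctypes.create_string_buffer(elf)

as_c_string = buf.value            # read as a NUL-terminated C string
with_length = buf.raw[:len(elf)]   # read with an explicit length

if __name__ == "__main__":
    print(as_c_string)   # stops right before the first NUL byte
    print(with_length)   # contains every byte, NULs included
```

The first result ends after the seven bytes that a text widget would show as "?ELF???", which matches what the question describes.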
This just nearly broke me too, so I'll post my solution to anyone interested.
Let's say I have a QByteArray data that is filled like so
data += file.readAll();
I'll then invoke an update of the QTextEdit where I'll do
QByteArray copy = data;
QString text = copy.replace((char)0x00, "\\0");
textEdit.setPlainText(text);
This way, all null bytes in the data will be displayed as the printable string \0.
Since I want changes of the textEdit to be reflected in my data, I have to parse this back using
QByteArray hex = textEdit.toPlainText().toUtf8().toHex().toUpper();
hex.replace("5C30", "00");
hex.replace("5C00", "5C30"); // oops, was escaped
data = QByteArray::fromHex(hex);
I'm using the hex format because I just could not get the replace to work with null byte characters. The code above first replaces all occurrences of the string \0 with null bytes in the data. Then it replaces any \ followed by a null byte back with \0 - which essentially means \\0 becomes \0.
It's not very elegant, but maybe it helps someone who ends up here to move in the right direction. If you have improvements, please comment.
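One possible improvement, sketched here in Python rather than Qt so the logic is easy to check: escape backslashes before NUL bytes, and undo both by scanning the bytes from left to right. The function names are invented for this sketch. Working on bytes instead of a hex string also avoids a pitfall of the hex approach, where a pattern such as 5C30 can match at an odd offset, that is, across two neighbouring bytes.

```python
def escape_nuls(data: bytes) -> bytes:
    """Make NUL bytes printable; escape backslashes first so the result is reversible."""
    return data.replace(b"\\", b"\\\\").replace(b"\x00", b"\\0")


def unescape_nuls(text: bytes) -> bytes:
    """Invert escape_nuls by scanning left to right, one escape at a time."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i:i + 2] == b"\\0":
            out.append(0)        # "\0" stands for a NUL byte
            i += 2
        elif text[i:i + 2] == b"\\\\":
            out.append(0x5C)     # "\\" stands for one literal backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    original = b"\x7fELF\x00\x00 and a literal \\0"
    shown = escape_nuls(original)
    print(shown)
    print(unescape_nuls(shown) == original)
```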

Specifying pronunciations with user dictionaries (Nuance Vocalizer Expressive TTS 5.4)

I am currently trying to correct the pronunciation of a word using a dictionary called userdct_eng.dct, which will later be converted to a .dat file using Python.
My problem is that I don't know how to modify the pronunciation of an input word that is enclosed in double quotes (").
This is the example content of the dictionary:
[Header]
Name=userdct_eng.dct
Description=userdct_eng
Language=ENG
Content=EDCT_CONTENT_BROAD_NARROWS
Representation=EDCT_REPR_SZZ_STRING
[Data]
you // #'jEs#
"you" // #'jEs#
I am trying to make the word you be pronounced as yes. This works: the entry ( you // #'jEs# ) has the intended effect.
With the second entry I am trying to make "you" (including the double quotes) be pronounced as yes, but this entry ( "you" // #'jEs# ) doesn't work; the voice still pronounces it as you.
My question is: how do I handle a word enclosed in double quotation marks?
Thanks.
SOLVED by putting a backslash (\) before each double quote (").
Example:
\"you\" // #'jEs#

Preserve non-ascii characters between std::string and QString

In my program the user can either provide a filename on the command line or using a QFileDialog. In the first case, I have a char* without any encoding information, in the second I have a QString.
To store the filename for later use (Recent Files), I need it as a QString. But to open the file with std::ifstream, I need a std::string.
Now the fun starts. I can do:
filename = QString::fromLocal8Bit(argv[1]);
later on, I can do:
std::string fn = filename.toLocal8Bit().constData();
This works for most characters, but not all. For example, the word Раи́са will look the same after going through this conversion but will, in fact, contain different characters.
So while I can have a file named Раи́са.txt, and it will display as Раи́са.txt, the program will not find the file in the filesystem. Most letters work, but и́ doesn't.
(Note that it does work correctly when the file was chosen in the QFileDialog. It does not when it originated from the command line.)
Is there any better way to preserve the filename? Right now I obtain it in whatever native encoding, and can pass-on in the same encoding, without knowing it. At least so I thought.
'и́' is not an ASCII character; that is to say, it has no single-byte ASCII representation. How it is represented in argv[1] is therefore OS-dependent, but it is not represented in just one char.
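To see why one char cannot hold it, the short Python sketch below inspects the letter. It assumes the accented letter is the Cyrillic small letter и followed by a combining acute accent, which is a guess since the question does not spell out the code points, and it uses UTF-8 only as one example of a multi-byte encoding.

```python
import unicodedata

# Assumption: the accented letter is CYRILLIC SMALL LETTER I followed by
# COMBINING ACUTE ACCENT; the original post does not spell out the code points.
accented_i = "\u0438\u0301"

code_points = [f"U+{ord(ch):04X}" for ch in accented_i]
names = [unicodedata.name(ch) for ch in accented_i]
utf8_bytes = accented_i.encode("utf-8")

if __name__ == "__main__":
    print(code_points)          # two code points for one visible letter
    print(names)
    print(utf8_bytes.hex(" "))  # four bytes in UTF-8
```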
fromLocal8Bit uses the same QTextCodec::codecForLocale as toLocal8Bit, and as you say, your std::string will hold "Раи́са.txt", so that's not the problem.
Depending on how your OS implements std::ifstream, though, std::ifstream may treat each char as its own character and not go through the OS's translation. I expect that you are on Windows, since you are seeing this problem. In that case you should use the wide-string overloads of std::fstream, which are Microsoft-specific: http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx
You can get a std::wstring from a QString by using toStdWString.
See here for more info: fstream::open() Unicode or Non-Ascii characters don't work (with std::ios::out) on Windows
EDIT:
A good cross-platform option for projects with access to it is Boost::Filesystem. ypnos mentions File-Streams as specifically pertinent.

Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.
Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?
Each QChar is a UTF-16 code unit, not a complete Unicode code point. Therefore, each non-BMP character consists of two QChars that form a surrogate pair.
The solution appears to lie in functions that are documented but not seen much on the Web. Start with the code point as a number. Then apply QChar::requiresSurrogate to determine whether a single QChar is large enough. In this case it is not, so you need to create two QChars.
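uint32_t cp = 155222; // U+25E56, a 4-byte Japanese character
QString str;
if (QChar::requiresSurrogate(cp))
{
    // Code points above U+FFFF need a high and a low surrogate.
    QChar charArray[2];
    charArray[0] = QChar::highSurrogate(cp);
    charArray[1] = QChar::lowSurrogate(cp);
    str = QString(charArray, 2);
}
else
{
    // A BMP code point fits into a single QChar.
    str = QString(QChar(cp));
}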
The resulting QString will contain the correct information to display your supplementary (non-BMP) character.
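For the reverse direction the question asks about (from two QChars back to a code point), the arithmetic is fixed by UTF-16 itself. Here is a language-neutral sketch in Python with invented function names; in Qt, the static function QChar::surrogateToUcs4 should give the same result as from_surrogates below.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Combine a surrogate pair back into the code point it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(155222)
    print(hex(high), hex(low))         # 0xd857 0xde56
    print(from_surrogates(high, low))  # 155222
```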
Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by concatenating two QChars (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process characters beyond U+FFFF. As an example, a QString instance which contains the single Unicode character U+131F6 will return a size of 2, not 1.
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was finally closed as "out of scope" without any resolution.
Solution
You can, however, download and use these Unicode Qt string wrapper classes which have been attached to the Qt bug report. Licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity please see the API documentation included within the download.
