Related
QtWebEngine uses a IPC mechanism to communicate between the C+ Qt world and the JavaScript work. This mechanism is used for QWebChannel, and it appears to be based on WebSockets. Is there a way to use the underlying IPC or WebSockets without using QWebChannel, as the latter seems restricted to strings or JSON-encoded data?
Background: I wrote an application QtDomTerm which is a JavaScript-based terminal emulator that uses QWebChannel to connect input/output from a PTY to QtWebEngine. This works fairly well, but there is a glitch relating to utf8/string conversion. Ideally, I'd like to send raw bytes from the PTY, and do byte-to-text conversion in JavaScript. But QWebChannel is too high-level and only handles strings or JSON-encoded data. It does not handle QByteArray.
Of course there are multiple ways to solve my problem. One is to manually create a WebSocket server, and have the JavaScript running in the QtWebEngine connect to it. But it appears that is what is going on behind the scene anyway, using qt.webChannelTransport. It seems like it would be most efficient and elegant if I could access the underlying transport (the class WebChannelIPCTransportHost seems to be relevant).
Anyone tried something like this? I.e. I would like to not use QWebChannel - unless there is an efficient way for it to pass a QByteArray.
(I rephrased the question. There was a comment about missing research, but I've browsed heavily though the Qt docuemntation, source code, and here, without finding a clear answer.)
What prevents you from sending QString::fromLatin1(data.toHex()), where data is of the QByteArray type? That's all you need, really. Use a reverse conversion on the javascript side, see e.g. this question.
Since there doesn't seem to be a way to go "below" the QWebChannel (without hacking Qt itself) there seemed to be three options:
Create a separate WebSockets connection. That would require bigger non-local changes, and possibly creating extra processes.
Do the byte-to-text decoding on the C++ side, maybe using QTextDecoder. That would probably be simplest. However, there are architectural reasons for preferring to do text decoding close to where we do escape sequence processing. That is certainly more traditional, and may better handle corner cases. Plus DomTerm needs to work with other front-ends than the QtDomTerm front-end.
Encode the bytestream to a sequence of strings, transmit the latter using QWebChannel, convert back to bytes on the JavaScript side, and then handle text-decoding in JavaScript.
I chose to implement the third option. There is some overhead in therms of converting back-end-forth. However, I designed and implemented an encoding that is quite efficient:
/** Encode an arbitrary sequence of bytes as an ASCII string.
* This is used because QWebChannel doesn't have a way to transmit
* data except as strings or JSON-encoded strings.
* We restrict the encoding to ASCII (i.e. codes less then 128)
* to avoid excess bytes if the result is UTF-8-encoded.
*
* The encoding optimizes UTF-8 data, with the following byte values:
* 0-3: 1st byte of a 2-byte sequence encoding an arbitrary 8-bit byte.
* 4-7: 1st byte of a 2-byte sequence encoding a 2-byte UTF8 Latin-1 character.
* 8-13: mean the same ASCII control character
* 14: special case for ESC
* 15: followed by 2 more bytes encodes a 2-byte UTF8 sequence.
* bytes 16-31: 1st byte of a 3-byte sequence encoding a 3-byte UTF8 sequence.
* 32-127: mean the same ASCII printable character
* The only times we generate extra bytes for a valid UTF8 sequence
* if for code-points 0-7, 14-26, 28-31, 0x100-0x7ff.
* A byte that is not part of a valid UTF9 sequence may need 2 bytes.
* (A character whose encoding is partial, may also need extra bytes.)
*/
static QString encodeAsAscii(const char * buf, int len)
{
QString str;
const unsigned char *ptr = (const unsigned char *) buf;
const unsigned char *end = ptr + len;
while (ptr < end) {
unsigned char ch = *ptr++;
if (ch >= 32 || (ch >= 8 && ch <= 13)) {
// Characters in the printable ascii range plus "standard C"
// control characters are encoded as-is
str.append(QChar(ch));
} else if (ch == 27) {
// Special case for ESC, encoded as '\016'
str.append(QChar(14));
} else if ((ch & 0xD0) == 0xC0 && end - ptr >= 1
&& (ptr[0] & 0xC0) == 0x80) {
// Optimization of 2-byte UTF-8 sequence
if ((ch & 0x1C) == 0) {
// If Latin-1 encode 110000aa,10bbbbbb as 1aa,0BBBBBBB
// where BBBBBBB=48+bbbbbb
str.append(4 + QChar(ch & 3));
} else {
// Else encode 110aaaaa,10bbbbbb as '\017',00AAAAA,0BBBBBBB
// where AAAAAA=48+aaaaa;BBBBBBB=48+bbbbbb
str.append(QChar(15));
str.append(QChar(48 + (ch & 0x3F)));
}
str.append(QChar(48 + (*ptr++ & 0x3F)));
} else if ((ch & 0xF0) == 0xE0 && end - ptr >= 2
&& (ptr[0] & 0xC0) == 0x80 && (ptr[1] & 0xC0) == 0x80) {
// Optimization of 3-byte UTF-8 sequence
// encode 1110aaaa,10bbbbbb,10cccccc as AAAA,0BBBBBBB,0CCCCCCC
// where AAAA=16+aaaa;BBBBBBB=48+bbbbbb;CCCCCCC=48+cccccc
str.append(QChar(16 + (ch & 0xF)));
str.append(QChar(48 + (*ptr++ & 0x3F)));
str.append(QChar(48 + (*ptr++ & 0x3F)));
} else {
// The fall-back case - use 2 bytes for 1:
// encode aabbbbbb as 000000aa,0BBBBBBB, where BBBBBBB=48+bbbbbb
str.append(QChar((ch >> 6) & 3));
str.append(QChar(48 + (ch & 0x3F)));
}
}
return str;
}
Yes, this is overkill and almost certainly premature optimization, but I can't help myself :-)
I'm trying to store binary data in a QR code. Apparently QR codes do support storing raw binary data (or ISO-8859-1 / Latin1). Here is what I want to encode (hex):
d1 50 01 00 00 00 f6 5f 05 2d 8f 0b 40 e2 01
I've tried the following encoders:
qr.js
Google Charts
qrcode.js
Decoding with zxing.org produces various incorrect results. The two javascript ones produce this (it's wrong; the first text character should be Ñ.
Whereas Google Charts produces this...
What is going on? Are any of these correct? What's really weird is that if I encode this sequence (with the JS ones at least) then it works fine - I would have thought the issue was non-ASCII characters but Ñ (0xd1) is non-ASCII.
d1 50 01 00 00 00 01 02 03 04 05 06 40 e2 01
Does anyone know what is going on?
Update
It occurred to me to try scanning them with a ZBar-based scanner app I found. It scans both JS versions ok (at least they start with ÑP). The Google Charts one is just wrong. So it seems like the issue is with ZXing (which is surprisingly shit - I wouldn't recommend it to anyone).
Update 2
ZBar can't handle null bytes. :-(
"What is going on? Are any of these correct?"
Except for the google chart (which is just empty), your QR codes are correct.
You can see the binary data from zxing is what you would expect:
4: Byte mode indicator
0f: length of 15 byte
d15001...: your 15 bytes of data
ec11 is just padding
The problem comes from the decoding. Because most decoders will try to interpret it as text. But since it's binary data, you should not try to handle it as text. Even if you think you can convert it from text to binary, as you saw this may cause issues with values which are not valid text.
So the solution is to use a decoder that will output you the binary data, and not text data.
Now about interpreting the QR code binary data as text, you said the first character should be 'Ñ' which is true if interpreted it as "ISO-8859-1",
which according to the QR code standard, is what should be done when there is no ECI mode defined.
But in practice, most smartphone QR code reader will interpret it as UTF-8 in this case (or at least try to auto-detect the encoding).
Even though this is not the standard, this had become common practice:
binary mode with no ECI, UTF-8 encoded text.
Maybe the reason behind it is that no one wants to waste these precious bytes adding an ECI mode specifying UTF-8. And actually, not all decoders support ECI.
There are two issues that you have to overcome to store binary data in QR codes.
ISO-8859-1 does not allow bytes in ranges of 00-1F and 7F-9F. If you
need to encode these bytes anyway, quote or encode them, i.e. use
quoted-printable or Base-64 encoding to avoid these ranges.
Since you are trying to store binary data in QR codes, you have to
rely only on your own scanner that will handle this binary data. You
don’t have to display text from your QR codes by other software,
like web application at zxing.org, because most QR decoders,
including that of zxing.org use heuristics to detect the character
set used. These heuristics may detect a character set other than
ISO-8859-1 and thus fail to properly display your binary data. Some
scanners use heuristics to detect a character set even if the
character set is explicitly given by ECI. This is why providing ECI
may not help much – scanners still use heuristics even with ECI.
So, using US-ASCII printable characters only (e.g., binary data encoded in Base64 before passing it to a QR Code generator) is the safest choice for QR code against the heuristics. This will also overcome another complication: that ISO-8859-1 was not the default encoding in earlier QR code standard published in 2000 (ISO/IEC 18004:2000). That standard did specify 8-bit Latin/Kana character set in accordance with JIS X 0201 (JIS8 also known as ISO-2022-JP) as default encoding for 8-bit mode, while the updated standard published in 2005 did change the default to ISO-8859-1.
As an alternative to Base-64, you can encode each byte with two hexadecimal characters (0-9, A-F), so, in the QR code your data will be encoded in the alphanumeric mode, not in 8-bit mode. This will disable all heuristics for sure and should not produce larger QR Code than with Base-64, because each character in the alphanumeric mode takes only 6 bits in the QR code stream.
Update:
I recently went back and published the referenced code as a project on GitHub for anyone who wants to use it.
https://github.com/yurelle/Base45Encoder
This is a bit necro, but I just hit this problem, and figured out a solution.
The problem with reading QR Codes with ZXING is that it assumes all QR Payloads are Strings. If you're willing to generate the QR Code in java with ZXING, I developed a solution which enables storing a binary payload in ZXING QR Codes with a storage efficiently loss of only -8%; better than the 33% inflation from Base64.
It exploits an internal compression optimization of the ZXING library based around pure Alphanum Strings. If you want a full explanation, with math and Unit Tests, check out my other answer.
But the short answer is this:
Solution
I implemented it as a self-contained static utility class, so all you have to do is call:
//Encode
final byte[] myBinaryData = ...;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(myBinaryData);
//Decode
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);
Alternatively, you can also do it via InputStreams:
//Encode
final InputStream in_1 = ... ;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(in_1);
//Decode
final InputStream in_2 = ... ;
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(in_2);
Here's the implementation
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
/**
* For some reason none of the Java QR Code libraries support binary payloads. At least, none that
* I could find anyway. The commonly suggested workaround for this is to use Base64 encoding.
* However, this results in a 33% payload size inflation. If your payload is already near the size
* limit of QR codes, this is a lot.
*
* This class implements an encoder which takes advantage of a built-in compression optimization
* of the ZXING QR Code library, to enable the storage of Binary data into a QR Code, with a
* storage efficiency loss of only -8%.
*
* The built-in optimization is this: ZXING will automatically detect if your String payload is
* purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2
* AlphaNumeric characters into 11 bits.
*
*
* ----------------------
*
*
* The included ALPHANUMERIC_TABLE is the conversion table used by the ZXING library as a reverse
* index for determining if a given input data should be classified as alphanumeric.
*
* See:
*
* com.google.zxing.qrcode.encoder.Encoder.chooseMode(String content, String encoding)
*
* which scans through the input string one character at a time and passes them to:
*
* getAlphanumericCode(int code)
*
* in the same class, which uses that character as a numeric index into the the
* ALPHANUMERIC_TABLE.
*
* If you examine the values, you'll notice that it ignores / disqualifies certain values, and
* effectively converts the input into base 45 (0 -> 44; -1 is interpreted by the calling code
* to mean a failure). This is confirmed in the function:
*
* appendAlphanumericBytes(CharSequence content, BitArray bits)
*
* where they pack 2 of these base 45 digits into 11 bits. This presents us with an opportunity.
* If we can take our data, and convert it into a compatible base 45 alphanumeric representation,
* then the QR Encoder will automatically pack that data into sub-byte chunks.
*
* 2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048
* possible states. This is only a loss of 1.1% in storage efficiency behind raw binary.
*
* 45 ^ 2 = 2,025
* 2 ^ 11 = 2,048
* 2,048 - 2,025 = 23
* 23 / 2,048 = 0.01123046875 = 1.123%
*
* However, this is the ideal / theoretical efficiency. This implementation processes data in
* chunks, using a Long as a computational buffer. However, since Java Long's are singed, we
* can only use the lower 7 bytes. The conversion code requires continuously positive values;
* using the highest 8th byte would contaminate the sign bit and randomly produce negative
* values.
*
*
* Real-World Test:
*
* Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
*
* Raw Binary Size: 2,048
* Encoded String Size: 3,218
* QR Code Alphanum Size: 2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
*
* This is a real-world storage efficiency loss of only 8%.
*
* 2,213 - 2,048 = 165
* 165 / 2,048 = 0.08056640625 = 8.0566%
*/
public class BinaryToBase45Encoder {
public final static int[] ALPHANUMERIC_TABLE;
/*
* You could probably just copy & paste the array literal from the ZXING source code; it's only
* an array definition. But I was unsure of the licensing issues with posting it on the internet,
* so I did it this way.
*/
static {
final Field SOURCE_ALPHANUMERIC_TABLE;
int[] tmp;
//Copy lookup table from ZXING Encoder class
try {
SOURCE_ALPHANUMERIC_TABLE = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredField("ALPHANUMERIC_TABLE");
SOURCE_ALPHANUMERIC_TABLE.setAccessible(true);
tmp = (int[]) SOURCE_ALPHANUMERIC_TABLE.get(null);
} catch (NoSuchFieldException e) {
e.printStackTrace();//Shouldn't happen
tmp = null;
} catch (IllegalAccessException e) {
e.printStackTrace();//Shouldn't happen
tmp = null;
}
//Store
ALPHANUMERIC_TABLE = tmp;
}
public static final int NUM_DISTINCT_ALPHANUM_VALUES = 45;
public static final char[] alphaNumReverseIndex = new char[NUM_DISTINCT_ALPHANUM_VALUES];
static {
//Build AlphaNum Index
final int len = ALPHANUMERIC_TABLE.length;
for (int x = 0; x < len; x++) {
// The base45 result which the alphanum lookup table produces.
// i.e. the base45 digit value which String characters are
// converted into.
//
// We use this value to build a reverse lookup table to find
// the String character we have to send to the encoder, to
// make it produce the given base45 digit value.
final int base45DigitValue = ALPHANUMERIC_TABLE[x];
//Ignore the -1 records
if (base45DigitValue > -1) {
//The index into the lookup table which produces the given base45 digit value.
//
//i.e. to produce a base45 digit with the numeric value in base45DigitValue, we need
//to send the Encoder a String character with the numeric value in x.
alphaNumReverseIndex[base45DigitValue] = (char) x;
}
}
}
/*
* The storage capacity of one digit in the number system; i.e. the maximum
* possible number of distinct values which can be stored in 1 logical digit
*/
public static final int QR_PAYLOAD_NUMERIC_BASE = NUM_DISTINCT_ALPHANUM_VALUES;
/*
* We can't use all 8 bytes, because the Long is signed, and the conversion math
* requires consistently positive values. If we populated all 8 bytes, then the
* last byte has the potential to contaminate the sign bit, and break the
* conversion math. So, we only use the lower 7 bytes, and avoid this problem.
*/
public static final int LONG_USABLE_BYTES = Long.BYTES - 1;
//The following mapping was determined by brute-forcing -1 Long (all bits 1), and compressing to base45 until it hit zero.
public static final int[] BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION = new int[] {0,2,3,5,6,8,9,11,12};
public static final int NUM_BASE45_DIGITS_PER_LONG = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[LONG_USABLE_BYTES];
public static final Map<Integer, Integer> BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION = new HashMap<>();
static {
//Build Reverse Lookup
int len = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION.length;
for (int x=0; x<len; x++) {
int numB45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[x];
BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.put(numB45Digits, x);
}
}
public static String encodeToBase45QrPayload(final byte[] inputData) throws IOException {
return encodeToBase45QrPayload(new ByteArrayInputStream(inputData));
}
public static String encodeToBase45QrPayload(final InputStream in) throws IOException {
//Init conversion state vars
final StringBuilder strOut = new StringBuilder();
int data;
long buf = 0;
// Process all input data in chunks of size LONG.BYTES, this allows for economies of scale
// so we can process more digits of arbitrary size before we hit the wall of the binary
// chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
// left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
// and the powers of 2 don't quite line up.
while(in.available() > 0) {
//Fill buffer
int numBytesStored = 0;
while (numBytesStored < LONG_USABLE_BYTES && in.available() > 0) {
//Read next byte
data = in.read();
//Push byte into buffer
buf = (buf << 8) | data; //8 bits per byte
//Increment
numBytesStored++;
}
//Write out in lower base
final StringBuilder outputChunkBuffer = new StringBuilder();
final int numBase45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[numBytesStored];
int numB45DigitsProcessed = 0;
while(numB45DigitsProcessed < numBase45Digits) {
//Chunk out a digit
final byte digit = (byte) (buf % QR_PAYLOAD_NUMERIC_BASE);
//Drop digit data from buffer
buf = buf / QR_PAYLOAD_NUMERIC_BASE;
//Write Digit
outputChunkBuffer.append(alphaNumReverseIndex[(int) digit]);
//Track output digits
numB45DigitsProcessed++;
}
/*
* The way this code works, the processing output results in a First-In-Last-Out digit
* reversal. So, we need to buffer the chunk output, and feed it to the OutputStream
* backwards to correct this.
*
* We could probably get away with writing the bytes out in inverted order, and then
* flipping them back on the decode side, but just to be safe, I'm always keeping
* them in the proper order.
*/
strOut.append(outputChunkBuffer.reverse().toString());
}
//Return
return strOut.toString();
}
public static byte[] decodeBase45QrPayload(final String inputStr) throws IOException {
//Prep for InputStream
final byte[] buf = inputStr.getBytes();//Use the default encoding (the same encoding that the 'char' primitive uses)
return decodeBase45QrPayload(new ByteArrayInputStream(buf));
}
public static byte[] decodeBase45QrPayload(final InputStream in) throws IOException {
//Init conversion state vars
final ByteArrayOutputStream out = new ByteArrayOutputStream();
int data;
long buf = 0;
int x=0;
// Process all input data in chunks of size LONG.BYTES, this allows for economies of scale
// so we can process more digits of arbitrary size before we hit the wall of the binary
// chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
// left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
// and the powers of 2 don't quite line up.
while(in.available() > 0) {
//Convert & Fill Buffer
int numB45Digits = 0;
while (numB45Digits < NUM_BASE45_DIGITS_PER_LONG && in.available() > 0) {
//Read in next char
char c = (char) in.read();
//Translate back through lookup table
int digit = ALPHANUMERIC_TABLE[(int) c];
//Shift buffer up one digit to make room
buf *= QR_PAYLOAD_NUMERIC_BASE;
//Append next digit
buf += digit;
//Increment
numB45Digits++;
}
//Write out in higher base
final LinkedList<Byte> outputChunkBuffer = new LinkedList<>();
final int numBytes = BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.get(numB45Digits);
int numBytesProcessed = 0;
while(numBytesProcessed < numBytes) {
//Chunk out 1 byte
final byte chunk = (byte) buf;
//Shift buffer to next byte
buf = buf >> 8; //8 bits per byte
//Write byte to output
//
//Again, we need to invert the order of the bytes, so as we chunk them off, push
//them onto a FILO stack; inverting their order.
outputChunkBuffer.push(chunk);
//Increment
numBytesProcessed++;
}
//Write chunk buffer to output stream (in reverse order)
while (outputChunkBuffer.size() > 0) {
out.write(outputChunkBuffer.pop());
}
}
//Return
out.flush();
out.close();
return out.toByteArray();
}
}
Just at a glance, the qr formats are different. I'd compare the qr formats to see if it's a problem of error correction or encoding or something else.
It turned out that ZXing is just crap, and ZBar does some weird stuff with the data (converting it to UTF-8 for example). I managed to get it to output the raw data including null bytes though. Here is a patch for the best Android ZBar library I found, that has now been merged.
I used System.Convert.ToBase64String to convert the supplied sample byte array into a Base64-encoded string, then I used ZXing to create a QRCode image.
Next I called ZXing to read the string back from the generated QRCode, and then called System.Convert.FromBase64String to convert the string back into a byte array.
I confirm that the data completed the round trip successfully.
The informational RFC 9285 - The Base45 Data Encoding document describing the optimal scheme for storing binary data within the constraints of QR Alphanumeric Mode was recently published by the IETF.
(one positive side-effect of ongoing standardization work surrounding Health Certificate QR-codes)
I used
sqlite3_open16(v_st,&m_db)//SQLITE_OK
I want to insert unicode string.
Example:
sqlite3_exec(m_db,"UPDATE t1 SET a1='text'",0,0,0); //is good
but
sqlite3_exec(m_db,"UPDATE t1 SET a1='текст'",0,0,0); //is bad
When you use the sqlite3_open16 function you open the DB with the UTF-16 encoding.
And when you use the sqlite3_exec(m_db,"UPDATE t1 SET a1='текст'",0,0,0) function you use the encoding of your compiler (UTF-8 by default or cp1251).
You should use the same encoding in both cases. You may do one of theese things:
1) use the sqlite3_open function to open it with UTF-8 encoding and enshure your compiler uses UTF-8 as well;
or
2) change your compiler encoding to UTF-16;
or
3) use the sql statement PRAGMA encoding = "UTF-8"
string ANSItoUTF16le(const char* v_str)
{
string t_res;
unsigned char* t_buf=(unsigned char*)v_str;
auto t_size=strlen(v_str);
unsigned char t_s='А',t_f='п',t_E='Ё',t_e='Ё',t2_s='р',t2_f='я';
unsigned char t_c;
for(auto i=0;i<t_size;i++)
{
if(t_buf[i]>=t_s&&t_buf[i]<=t_f)
{
t_res+='\xD0';
t_c='\x90'+t_buf[i]-t_s;
t_res+=t_c;
continue;
}
if(t_buf[i]>=t2_s&&t_buf[i]<=t2_f)
{
t_res+='\xD1';
t_c='\x80'+t_buf[i]-t2_s;
t_res+=t_c;
continue;
}
if(t_buf[i]==t_E)
{
t_res+="\xD0\x01";
continue;
}
if(t_buf[i]==t_e)
{
t_res+="\xD1\x91";
continue;
}
t_res+=v_str[i];
}
return t_res;
}
and
sqlite3_exec(m_db,ANSItoUTF16le("UPDATE t1 SET a1='текст'").c_str(),0,0,0);
You can insert unicode string into sqllite in c++ by using a wrapper class on top of sqlite3.
CppSqlite3U is a unicode wrapper around the SQLite database. It allows using UNICODE types, such as wchar_t (instead of char), or It can be downloaded from here. also described in this article. For example: a query executed by execQuery can be of a LPCTSTR type, meaning that it can be either an array of wide chars or chars depending on whether UNICODE is defined in your project.
CppSQLite3Query execQuery(LPCTSTR szSQL);
I'm working with Qt on an existing project. I'm trying to send a string over a serial cable to a thermostat to send it a command. I need to make sure the string only contains 0-9, a-f, and is no more or less than 6 characters long. I was trying to use QString.contains, but I'm am currently stuck. Any help would be appreciated.
You have two options:
Use QRegExp
Use the QRegExp class to create a regular expression that finds what you're looking for. In your case, something like the following might do the trick:
QRegExp hexMatcher("^[0-9A-F]{6}$", Qt::CaseInsensitive);
if (hexMatcher.exactMatch(someString))
{
// Found hex string of length 6.
}
Update
Qt 5 users should consider using QRegularExpression instead of QRegExp:
QRegularExpression hexMatcher("^[0-9A-F]{6}$",
QRegularExpression::CaseInsensitiveOption);
QRegularExpressionMatch match = hexMatcher.match(someString);
if (match.hasMatch())
{
// Found hex string of length 6.
}
Use QString Only
Check the length of the string and then check to see that you can convert it to an integer successfully (using a base 16 conversion):
bool conversionOk = false;
int value = myString.toInt(&conversionOk, 16);
if (conversionOk && myString.length() == 6)
{
// Found hex string of length 6.
}
Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
I personally think the answer is open addressing with linear probing, because it doesn't need any additional storage space in case of collisions. Is this correct?
Answering the question: Which hashmap collision handling scheme is better when the load factor is close to 1 to ensure minimum memory wastage?
Open addressing/probing that allows a high fill. Because as you said so yourself, there is no extra space required for collisions (just, well, possibly time -- of course this is also assuming the hash function isn't perfect).
If you did not specify "load factor close to 1" or included "cost" metrics in the question then it would be entirely different.
Happy coding.
A hashmap that is that full will degrade into a linear search, so you will want to keep them under 90% full.
You are right about open addressing using less memory, chaining will need a pointer or offset field in each node.
I have created a hasharray data structure for when I need very lightweight hashtables that will not have alot of inserts. To keep memory usage low all data is embedded in the same block of memory, with the HashArray structure at the start, then two arrays for hashs & values. Hasharray can only be used with the lookup key is stored in the value.
typedef uint16_t HashType; /* this can be 32bits if needed. */
typedef uint16_t HashSize; /* this can be made 32bits if large hasharrays are needed. */
struct HashArray {
HashSize length; /* hasharray length. */
HashSize count; /* number of hash/values pairs contained in the hasharray. */
uint16_t value_size; /* size of each value. (maximum size of value 64Kbytes) */
/* these last two fields are just for show, they are not defined in the HashArray struct. */
uint16_t hashs[length]; /* array of hashs for each value, this helps with resolving bucket collision */
uint8_t values[length * value_size]; /* array holding all values. */
};
#define hasharray_get_hashs(array) (HashType *)(((uint8_t *)(array)) + sizeof(HashArray))
#define hasharray_get_values(array) ((uint8_t *)(array)) + sizeof(HashArray) + \
((array)->length * sizeof(HashType))
#define hasharray_get_value(array, idx) (hasharray_get_values(array) + ((idx) * (array)->value_size))
The macros hasharray_get_hashs & hasharray_get_values are used to get the 'hashs' & 'values' arrays.
I have used this for adding fast lookup of complex objects that are already stored in an array. The objects have a string 'name' field which is used for the lookup. The names are hashed and inserted into the hasharray with the objects index. The values stored in the hasharray can be indexes/pointers/whole objects (I only use small 16bit index values).
If you want to pack the hasharray till it is almost full, then you will want to use full 32bit Hashs instead of the 16bit ones defined above. Larger 32bit hashs will help keep searchs fast when the hasharray is more then 90% full.
The hasharray as defined above can only hold a maximum of 65535, which is fine since I never use it on anything that would have more the a few hundred values. Anything that needs more that that should just use an normal hashtable. But if memory is really an issue, the HashSize type could be changed to 32bits. Also I use power-of-2 lengths to keep the hash lookup fast. Some people prefer to use prime bucket lengths, but that is only needed if the hash function has bad distribution.
#define hasharray_empty_hash 0xFFFF /* hash value to mark empty slots. */
void *hasharray_search(HashArray *array, HashType hash, uint32_t *next) {
HashType *hashs = hasharray_get_hashs(array);
uint32_t mask = array->length - 1;
uint32_t start_idx;
uint32_t idx;
hash = (hash == hasharray_empty_hash) ? 0 : hash; /* need one hash value to mark empty slots. */
start_hash_idx = (hash & mask);
if(*next == 0) {
idx = start_idx; /* new search. */
} else {
idx = *next & mask; /* continuing search to next slot. */
}
/* find hash in hash array. */
do {
/* check for hash match. */
if(hashs[idx] == hash) goto found_hash;
/* check for end of chain. */
if(hashs[idx] == hasharray_empty_hash) break;
idx++;
idx &= mask;
} while(idx != start_idx);
/* maximum tries reached (i.e. did a linear search of whole array) or end of chain. */
return NULL;
found_hash:
*next = idx + 1; /* where to continue search at, if this is not the right value. */
return hasharray_get_values(array) + (idx * array->value_size);
}
hash collisions will happen so the code that calls hasharray_search() needs to compare the search key with the one stored in the value object. If they don't match then hasharray_search() is called again. Also non-unique keys can exist, since searching can continue until 'NULL' is returned to find all values that match one key. The search function uses linear probing to be cache freindly.
typedef struct {
char *name; /* this is the lookup key. */
char *type;
/* other field info... */
} Field;
typedef struct {
Field *list; /* array of Field objects. */
HashArray *lookup; /* hasharray for fast lookup of Field objects by name. The values stored in this hasharray are 16bit indices. */
uint32_t field_count; /* number of Field objects in 'list'. */
} Fields;
extern Fields *fields_new(uint16_t count) {
Fields *fields;
fields = calloc(1, sizeof(Fields));
fields->list = calloc(count, sizeof(Field));
/* allocate hasharray to hold at most 'count' uint16_t values.
* The hasharray will round 'count' up to the next power-of-2.
* That power-of-2 length must be atleast (count+1), so that there will always be one empty slot.
*/
fields->lookup = hasharray_new(count, sizeof(uint16_t));
fields->field_count = count;
}
extern Field *fields_lookup_by_name(Fields *fields, const char *name) {
HashType hash = str_to_hash(name);
Field *field;
uint32_t next = 0;
uint16_t *rc;
uint16_t idx;
do {
rc = hasharray_search(fields->lookup, hash, &next);
if(rc == NULL) break; /* field not found. */
/* found a possible match. */
idx = *rc;
assert(idx < fields->field_count);
field = &(fields->list[idx]);
/* compare lookup name with field's name. */
if(strcmp(name, field->name) == 0) {
/* found match. */
return field;
}
/* field didn't match continue search to next field. */
} while(1);
return NULL;
}
The worst case searching will degrade to a linear search of the whole array if it is 99% full and the key doesn't exist. If the keys are integers, then a linear search shouldn't be to bad, also only keys with the same hash value will need to be compared. I try to keep the hasharrays sized so they are only about 70-80% full, the space wasted on empty slots isn't much if the values are only 16bit values. With this design you only waste 4bytes per empty slot when using 16bit hashs & 16bit index values. The array of objects (Field structs in the above example) has no empty spots.
Also most hashtable implementations that I have seen don't store the computed hashs and require full key compares to resolve bucket collisions. Comparing the hashs helps a lot since only a small part of the hash value is used to lookup the bucket.
As the others said, in linear probing, when load factor near to 1, the time complexity near to linear search. (When it's full, its infinite.) There is a memory-efficiency trade off here. While segregate chaining always give us theoretically constant time.
Normally, under linear probing, it's recommended to keep the load factor between 1/8 and 1/2. when the array is 1/2 full, we resize it to double the size of original array. (Reference: Algorithms. by Robert Sedgewick. Kevin Wayne. ). When delete, we resize the array to 1/2 of original size as well. If you are really interested, it's good for you to begin with the book I mentioned above.
In practical, it's said that 0.72 is an empirical value we usually use.