Why is the MZ DOS Header Signature 0x54AD in PE files? - hex

I have recently started with the PE(Portable Executable) file format, more specifically PE/COFF. I was reading a tutorial by Randy Kath here.
When I was reading through the structure of MZ DOS header, I found that MZ DOS header was verified using the signature in e_magic field. The structure is as follows:
typedef struct _IMAGE_DOS_HEADER { // DOS .EXE header
USHORT e_magic; // Magic number
USHORT e_cblp; // Bytes on last page of file
USHORT e_cp; // Pages in file
USHORT e_crlc; // Relocations
USHORT e_cparhdr; // Size of header in paragraphs
USHORT e_minalloc; // Minimum extra paragraphs needed
USHORT e_maxalloc; // Maximum extra paragraphs needed
USHORT e_ss; // Initial (relative) SS value
USHORT e_sp; // Initial SP value
USHORT e_csum; // Checksum
USHORT e_ip; // Initial IP value
USHORT e_cs; // Initial (relative) CS value
USHORT e_lfarlc; // File address of relocation table
USHORT e_ovno; // Overlay number
USHORT e_res[4]; // Reserved words
USHORT e_oemid; // OEM identifier (for e_oeminfo)
USHORT e_oeminfo; // OEM information; e_oemid specific
USHORT e_res2[10]; // Reserved words
LONG e_lfanew; // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
and the tutorial says :
All MS-DOS-compatible executable files set this value to 0x54AD,
which represents the ASCII characters MZ.
My question is the same.
The ascii value of M and Z is 77 and 90 respectively, which translates to 4D and 5A in hexadecimal. How does 0x54AD represent MZ?
It may be a silly question. But do help me understand if its too silly.
Thanks.

First off, the source that claims the signature is 0x54AD is wrong; MZ in hex is actually 0x5A4D (for little-endian architectures), as evidenced by the output of this program:
#include <Windows.h> // for the USHORT type
#include <stdio.h>
int main()
{
USHORT MZ = ('M' | 'Z' << 8);
printf("0x%.4hX\n", MZ);
return 0;
}
Output:
0x5A4D
You might still have the question, why does the byte for 'Z' (5A) come first when the signature is actually 'MZ'?
This is to do with endianness, which is the order in which bytes are stored in individual halfwords, words, doublewords, and so on.
Big-endian stores bytes with the most significant byte in the highest memory address, and little-endian is the opposite, storing the most significant byte in the least significant memory address.
The x86 and x64 architectures are little-endian, so the most significant byte in MZ (i.e. the Z) comes first.

Related

How can i determine duration of wav file

I'm working with .wav files and I need to get their duration in seconds.
So far I've been determining it with:
File size / byte_rate
Byte_rate being (Sample Rate * BitsPerSample * Channels) / 8.
And it works, with smaller files, when I try to parse bigger files, I get more seconds than the actual duration.
Example:
Size(bytes): 45207622 Byte_rate: 176400 Duration: 256
(45207622 / 176400)
but the actual duration is 250...
FYI: I've double checked the size and byte_rate, they are correct.
Without a sample RIFF header or your code, it would be difficult to answer the specifics in your question. (i.e. Why your math isn't coming to your expected result.)
However, since you've specified that you're working in C in the comments, might I suggest using the sox library instead of parsing the headers with newly written code? In addition to catching a fair number of edge cases, this allows you to support any format sox supports reading without having to write any of the reading code yourself. (Though anyone inclined to do so should probably take a look at Can someone explain .wav(WAVE) file headers? and RIFF WAVE format specifications. The process should be roughly the method described in the question, at least in most cases. [Edit: That is chunk data length divided by the header's byte rate.])
Example code:
#include <sox.h>
#include <stdio.h>
int main(int argc, char **argv) {
sox_format_t *fmt;
if(argc < 2) {
printf("Please provide audio file.\n");
return 1;
}
fmt = sox_open_read(argv[1], NULL, NULL, NULL);
__uint64_t ws = fmt->signal.length / fmt->signal.channels;
if(fmt->signal.length) {
printf("%0.2f seconds long\n", (double)ws / fmt->signal.rate);
} else {
printf("Cannot determine duration from header.\n");
}
}
For anyone curious, I largely derived this from the sox command line tool's source code.
Thank you EPR for giving me the fix to timing in my program. I'm not using libsox, I've set up a struct trying to match the original at http://www.lightlink.com/tjweber/StripWav/Canon.html This is NOT the correct way to do it but it works for simple files. Another useful reference is at http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
Anyway I assume the header is 44 bytes and read() it into memory at the location of the struct. Then I can access fields of the struct, malloc space for the pcm data and read() it into the pcm space from where the file pointer was left. I'm just writing an audiogram program so it needs to be close to correct for the WAV files I generate with arecord, sox, Audacity. Always 2 channels, 44100 sample rate. My struct:
struct wavhdr { // defined by Microsoft, needs to match
char riff[4]; // should be "RIFF"
uint32_t len8; // file length - 8
char wave[4]; // should be "WAVE"
char fmt[4]; // should be "fmt "
uint32_t fdatalen; // should be 16 (0x10)
uint16_t ftag; // format tag, 1 = pcm
uint16_t channels; // 2 for stereo
uint32_t sps; // samples/sec
uint32_t srate; // sample rate in bytes/sec (block align)
uint16_t chan8; // channels * bits/sample / 8
uint16_t bps; // bits/sample
char data[4]; // should be "data"
uint32_t datlen; // length of data block
// pcm data follows this
} hdr;
I was trying to use the measured file size - header length / samples/sec, that didn't work, I was off by a factor of 6.

QString to unicode std::string

I know there is plenty of information about converting QString to char*, but I still need some clarification in this question.
Qt provides QTextCodecs to convert QString (which internally stores characters in unicode) to QByteArray, allowing me to retrieve char* which represents the string in some non-unicode encoding. But what should I do when I want to get a unicode QByteArray?
QTextCodec* codec = QTextCodec::codecForName("UTF-8");
QString qstr = codec->toUnicode("Юникод");
std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 ); // * 2 since unicode character is twice longer than char
qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same
The above code prints "Юникод" as I've expected. But I'd like to know if that is the right way to get to the unicode char* of the QString. In particular, reinterpret_casts and size arithmetics in this technique looks pretty ugly.
The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.
You need to choose:
Whether you want the 8-bit wide std::string or 16-bit wide std::wstring, or some other type.
What encoding is desired in your target string?
Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
Common cases:
Locally encoded 8-bit std::string (as in: system locale):
std::string(str.toLocal8Bit().constData())
UTF-8 encoded 8-bit std::string:
str.toStdString()
This is equivalent to:
std::string(str.toUtf8().constData())
UTF-16 or UCS-4 encoded std::wstring, 16- or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t.
str.toStdWString()
U16 or U32 strings of C++11 - from Qt 5.5 onwards:
str.toStdU16String()
str.toStdU32String()
UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:
std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
This encoding does not include byte order marks (BOMs).
It's easy to prepend BOMs to the QString itself before converting it:
QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
auto dst = std::u16string{reinterpret_cast<const char16_t*>(src.constData()),
src.size()};
#else
auto dst = src.toStdU16String();
If you expect the strings to be large, you can skip one copy:
const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 2); // BOM + termination
dst.append(char16_t(QChar::ByteOrderMark));
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
src.size()+1);
In both cases, dst is now portable to systems with either endianness.
Use this:
QString Widen(const std::string &stdStr)
{
return QString::fromUtf8(stdStr.data(), stdStr.size());
}
std::string Narrow(const QString &qtStr)
{
QByteArray utf8 = qtStr.toUtf8();
return std::string(utf8.data(), utf8.size());
}
In all cases you should have utf8 in std::string.
You can get the QByteArray from a UTF-16 encoded QString using this:
QTextCodec *codec = QTextCodec::codecForName("UTF-16");
QTextEncoder *encoderWithoutBom = codec->makeEncoder( QTextCodec::IgnoreHeader );
QByteArray array = encoderWithoutBom->fromUnicode( str );
This way you ignore the unicode byte order mark (BOM) at the beginning.
You can convert it to char * like:
int dataSize=array.size();
char * data= new char[dataSize];
for(int i=0;i<dataSize;i++)
{
data[i]=array[i];
}
Or simply:
char *data = array.data();

Onewire temperatures to MQTT broker server

I am trying to modify the code from the example included in the onewire library for arduino so that no matter how many onewire devices I have plugged it will always find them and publish it to a MQTT using the device ID and the current temperature. I have gotten it to publish the temperature, but am having trouble adding the device ID or ROM which is in HEX to my topic.
So for example i want it to appear like this. Note the topic and msg for MQTT need to be Char* (more info here: http://knolleary.net/arduino-client-for-mqtt/api/#publish1)
topic = Celsius eg 12.09
payload (or msg) = \home[ROM]\temperature\current eg. \home\2894AA6220025\temperature\current
(just an example of the output you normally get when you run the code without my additions, this is the serial output!! notice the ROM and celsius that I want to use)
Have put my full code here, it is just a modification of the included onewire example with the pubsub MQTT part added on.
(see line 155 onwards) https://gist.github.com/matbor/5931466
//publish the temp now to mqtt topic
String strTopic = "/house/285A9282300F1/temperature/current"; // need to replace the 285A9282300F1 with the ROM ID on each LOOP!
char charMsg[10];
String strMsg = dtostrf(celsius, 4, 2, charMsg); //convert celsius to char
char charTopic[strTopic.length() + 1];
//char charMsg[strMsg.length() + 1];
strTopic.toCharArray(charTopic, sizeof(charTopic));
strMsg.toCharArray(charMsg, sizeof(charMsg));
client.publish(charTopic,charMsg);
Add this to the top of your sketch, outside of the loop function:
char hexChars[] = "0123456789ABCDEF";
#define HEX_MSB(v) hexChars[(v & 0xf0) >> 4]
#define HEX_LSB(v) hexChars[v & 0x0f]
This defines a pair of macros that return the most-significant and least-significant bytes of an int as the appropriate HEX character. (There may be more appropriate built-in's for this, but this is what I use out of habit).
The following code will insert the ROM, as a HEX string, into the topic. Note you can create the topic as a char[] directly - you don't need to go via a String object.
char charTopic[] = "/house/XXXXXXXXXXXXXXXX/temperature/current";
for (i = 0; i < 8; i++) {
charTopic[7+i*2] = HEX_MSB(addr[i]);
charTopic[8+i*2] = HEX_LSB(addr[i]);
}
For the payload, I'm not sure if it is 100% necessary, but I always explicitly initialise any char[] to all 0's when using as a buffer. This ensures whatever is written into the buffer will definitely be null-terminated. Again, you don't need to go via String types:
char charMsg[10];
memset(charMsg,'\0',10);
dtostrf(celsius, 4, 2, charMsg);
Finally, publish the message:
client.publish(charTopic,charMsg);

Struct Stuffing Incorrectly

I have the following struct:
typedef union
{
struct
{
unsigned char ID;
unsigned short Vdd;
unsigned char B1State;
unsigned short B1FloatV;
unsigned short B1ChargeV;
unsigned short B1Current;
unsigned short B1TempC;
unsigned short B1StateTimer;
unsigned short B1DutyMod;
unsigned char B2State;
unsigned short B2FloatV;
unsigned short B2ChargeV;
unsigned short B2Current;
unsigned short B2TempC;
unsigned short B2StateTimer;
unsigned short B2DutyMod;
} bat_values;
unsigned char buf[64];
} BATTERY_CHARGE_STATUS;
and I am stuffing it from an array as follows:
for(unsigned char ii = 0; ii < 64; ii++) usb_debug_data.buf[ii]=inBuffer[ii];
I can see that the array has the following (arbitrary) values:
inBuffer[0] = 80;
inBuffer[1] = 128;
inBuffer[2] = 12;
inBuffer[3] = 0;
inBuffer[4] = 23;
...
now I want display these values by changing the text of a QEditLine:
str=QString::number((int)usb_debug_data.bat_values.ID);
ui->batID->setText(str);
str=QString::number((int)usb_debug_data.bat_values.Vdd)
ui->Vdd->setText(str);
str=QString::number((int)usb_debug_data.bat_values.B1State)
ui->B1State->setText(str);
...
however, the QEditLine text values are not turning up as expected. I see the following:
usb_debug_data.bat_values.ID = 80 (correct)
usb_debug_data.bat_values.Vdd = 12 (incorrect)
usb_debug_data.bat_values.B1State = 23 (incorrect)
seems like 'usb_debug_data.bat_values.Vdd', which is a short, is not taking its value from inBuffer[1] and inBuffer[2]. Likewise, 'usb_debug_data.bat_values.B1State' should get its value from inBuffer[3] but for some reason is picking up its value from inBuffer[4].
Any idea why this is happening?
C and C++ are free to insert padding between elements of a structure, and beyond the last element, for whatever purposes it desires (usually efficiency but sometimes because the underlying architecture does not allow unaligned access at all).
So you'll probably find that items of two-bytes length are aligned to two-byte boundaries, so you'll end up with something like:
unsigned char ID; // 1 byte
// 1 byte filler, aligns following short
unsigned short Vdd; // 2 bytes
unsigned char B1State; // 1 byte
// 3 bytes filler, aligns following int
unsigned int myVar; // 4 bytes
Many compilers will allow you to specific how to pack structures, such as with:
#pragma pack(1)
or the gcc:
__attribute__((packed))
attribute.
If you don't want to (or can't) pack your structures, you can revert to field-by-filed copying (probably best in a function):
void copyData (BATTERY_CHARGE_STATUS *bsc, unsigned char *debugData) {
memcpy (&(bsc->ID), debugData, sizeof (bsc->ID));
debugData += sizeof (bsc->ID);
memcpy (&(bsc->Vdd), debugData, sizeof (bsc->Vdd));
debugData += sizeof (bsc->Vdd);
: : :
memcpy (&(bsc->B2DutyMod), debugData, sizeof (bsc->B2DutyMod));
debugData += sizeof (bsc->B2DutyMod); // Not really needed
}
It's a pain that you have to keep the structure and function synchronised but hopefully it won't be changing that much.
Structs are not packed by default so the compiler is free to insert padding between members. The most common reason is to ensure some machine dependent alignment. The wikipedia entry on data structure alignment is a pretty good place to start. You essentially have two choices:
insert compiler specific pragmas to force alignment (e.g, #pragma packed or __attribute__((packed))__.
write explicit serialization and deserialization functions to transform your structures into and from byte arrays
I usually prefer the latter since it doesn't make my code ugly with little compiler specific adornments everywhere.
The next thing that you are likely to discover is that the byte order for multi-byte integers is also platform specific. Look up endianness for more details

Is there a way to receive data as unsigned char over UDP on Qt?

I need to send floating point numbers using a UDP connection to a Qt application. Now in Qt the only function available is
qint64 readDatagram ( char * data, qint64 maxSize, QHostAddress * address = 0, quint16 * port = 0 )
which accepts data in the form of signed character buffer. I can convert my float into a string and send it but it will obviously not be very efficient converting a 4 byte float into a much longer sized character buffer.
I got hold of these 2 functions to convert a 4 byte float into an unsinged 32 bit integer to transfer over network which works fine for a simple C++ UDP program but for Qt I need to receive the data as unsigned char.
Is it possible to avoid converting the floatinf point data into a string and then sending it?
uint32_t htonf(float f)
{
uint32_t p;
uint32_t sign;
if (f < 0) { sign = 1; f = -f; }
else { sign = 0; }
p = ((((uint32_t)f)&0x7fff)<<16) | (sign<<31); // Whole part and sign.
p |= (uint32_t)(((f - (int)f) * 65536.0f))&0xffff; // Fraction.
return p;
}
float ntohf(uint32_t p)
{
float f = ((p>>16)&0x7fff); // Whole part.
f += (p&0xffff) / 65536.0f; // Fraction.
if (((p>>31)&0x1) == 0x1) { f = -f; } // Sign bit set.
return f;
}
Have you tried using readDatagram? Or converting the data to a QByteArray after reading? In many cases a char* is really just a byte array. This is one of those cases. Note that the writeDatagram can take a QByteArray.
Generally every thing sent across sockets is in bytes not strings, layers on either end do the conversions. Take a look here, especially the Broadcaster examples. They show how to create a QByteArray for broadcast and receive.
Not sure why the downvote, since the question is vague in requirements.
A 4-byte float is simply a 4 character buffer, if cast as one. If the systems are homogenous, the float can be sent as a signed char *, and bit for bit it'll be the same read into the signed char * on the receiver directly, no conversion needed. If the systems are heterogenous, then this won't work and you need to convert it to a portable format, anyway. IEEE format is often used, but my question is still, what are the requirements, is the float format the same between systems?
If I read it correctly, your primary question seems to be how to receive data of type unsigned char with QT's readDatagram function which uses a pointer to a buffer of type char.
The short answer is use a cast along these lines:
const size_t MAXSIZE = 1024;
unsigned char* data = malloc(MAXSIZE);
readDatagram ( (unsigned char *)data, MAXSIZE, address, port )
I'm going to assume you have multiple machines which use the same IEEE floating point format but some of which are big endian and some of which are little endian. See this SO post for a good discussion of this issue.
In that case you could do something a bit simpler like this:
const size_t FCOUNT = 256;
float* data = malloc(FCOUNT * sizeof(*data));
readDatagram ( (char *)data, FCOUNT * sizeof(*data), address, port )
for (int i = 0; i != FCOUNT; ++i)
data[i] = ntohf(*((uint32_t*)&data[i]));
The thing to remember is that as far as networking functions like readDatagram are concerned, the data is just a bunch of bits and it doesn't care what type those bits are interpreted as.
If both ends of your UDP connection use Qt, I would suggest looking at QDataStream. You can create this from a QByteArray each time you read a datagram, and then read whatever values you require - floats, maps, lists, QVariants, and of course string.
Similarly, on the sending side, you'd create a data stream, push data into it, then send the resulting QByteArray over writeDatagram.
Obviously this only works if both ends use Qt - the data encoding is well-defined, but non-trivial to generate by hand.
(If you want stream orientated behaviour, you could use the fact that QUDPSocket is a QIODevice with a data-stream, but it sounds as if you want per-datagram behaviour)

Resources