How does dlmalloc coalesce chunks? - unix

Here is a detailed description of the dlmalloc algorithm: http://g.oswego.edu/dl/html/malloc.html
A dlmalloc chunk is bookended by some metadata, which includes information about the amount of space in the chunk. Two contiguous free chunks might look like
[metadata | X bytes free space | metadata][metadata | X bytes free space | metadata]
                 Block A                                   Block B
In that case we want to coalesce block B into block A. Now how many bytes of free space should block A report?
I think it should be 2X + 2 size(metadata) bytes, since now the coalesced block looks like:
[metadata | X bytes free space metadata metadata X bytes free space | metadata]
But I'm wondering if this is correct, because I have a textbook that says the metadata will report 2X bytes without including the extra space we get from being able to write over the metadata.

You can see the answer yourself by looking at the source. Begin with line 1876 to verify your diagram. The metadata is just two size_t unsigned integers, accessed by aliasing a struct malloc_chunk (line 1847). Field prev_size is the size of the previous chunk, and size is the size of this one. Both include the size of the struct malloc_chunk itself. This will be 8 or 16 bytes on nearly all machines depending on whether the code is compiled for 32- or 64-bit addressing.
The "normal case" coalescing code starts at line 3766. You can see that the size variable it's using to track coalescing is chunk size.
So - yeah - in the code blocks marked /* consolidate backward */ and /* consolidate forward */, when he adds the size of the preceding and succeeding chunks, he's implicitly adding the size of the struct malloc_chunk as you suspected.
This shows that your interpretation is correct. My expectation is that the textbook author just got sloppy about the difference between chunk size (which includes metadata) and the size of the memory block allocated to the user. Incidentally, malloc takes care of this difference at line 3397.
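To make the arithmetic concrete, here is a small sketch of the simplified model from the question's diagram (one metadata block at each end of a chunk). It is not dlmalloc's actual layout or code; `META` and the function names are made up for illustration.

```python
# Simplified model of the question's diagram, NOT dlmalloc's real layout.
# META is a hypothetical size for one metadata block at either end of a chunk.
META = 8


def chunk_size(free_bytes: int) -> int:
    """Size recorded for a chunk: the free space plus both metadata blocks."""
    return META + free_bytes + META


def coalesce(size_a: int, size_b: int) -> int:
    """Merging two adjacent chunks simply adds their recorded chunk sizes."""
    return size_a + size_b


def free_space(size: int) -> int:
    """Free bytes inside a chunk once its own two metadata blocks are subtracted."""
    return size - 2 * META


X = 100
merged = coalesce(chunk_size(X), chunk_size(X))
# merged is 2X + 4*META; the free space inside it is 2X + 2*META,
# because the two inner metadata blocks are now ordinary free bytes.
print(merged, free_space(merged))  # prints "232 216"
```

Whether a text reports 2X or 2X + 2 size(metadata) then comes down to which of these quantities it is talking about.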
Perhaps the bigger lesson here is that - when you're trying to learn anything - you should never skip an opportunity to go straight to the first-hand source and figure stuff out for yourself.

Related

Finding similar hashes

I'm trying to find 2 different plain text words that create very similar hashes.
I'm using the 'whirlpool' hashing method, but the answer doesn't need to be specific to whirlpool; if you can answer using md5 or something easier, that's ok.
The similarity I'm looking for is that both hashes contain the same number of each character (it doesn't matter how jumbled the order is),
e.g.
plaintext 'test'
hash 1: abbb5 (one a, three b's, one 5)
plaintext 'blahblah'
hash 2: b5bab (must contain the same characters, in any order)
I'm sure I can read up on how they're created and break it down and reverse it, but I am just wondering if what I'm talking about occurs.
I'm wondering because I haven't found a match of the kind I'm describing (I created a PoC that runs through random words / letters until it produces a similar match), but then again it would take forever doing it the way I was doing it, and I was wondering if anyone with real knowledge of hashes / encryption could help me out.
So you can do it like this:
1. create an empty sorted map
2. create a 64 bit counter (you don't need more than 2^63 inputs, in all probability, since you would be dead before they would be calculated - unless quantum crypto really takes off)
3. use the counter as input, probably easiest to encode it in 8 bytes;
4. use this as input for your hash function;
5. encode output of hash in hex (use ASCII bytes, for speed);
6. sort the hex characters numerically / alphabetically (same thing really)
7. check if sorted hex result is a key in the map
8. if it is, show hex result, the old counter from the map & the current counter (and stop)
9. if it isn't, put the sorted hex result in the map, with the counter as value
10. increase counter, go to step 3
That's all folks. Results for SHA-1:
011122344667788899999aaaabbbcccddeeeefff for both 320324 and 429678
I don't know why you want to do this for hex, the hashes will be so large that they won't look too much alike. If your alphabet is smaller, your code will run (even) quicker. If you use whole output bytes (i.e. 00 to FF instead of 0 to F) instead of hex, it will take much more time - a quick (non-optimized) test on my machine shows it doesn't finish in minutes and then runs out of memory.
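The steps above can be sketched in a few lines of Python. This is a minimal sketch with assumptions of my own: the counter is encoded as 8 big-endian bytes (the encoding behind the SHA-1 result above is not stated, so this may not reproduce those exact counters), a plain dict stands in for the sorted map, and `find_anagram_hashes`, `hex_chars` and `limit` are hypothetical names.

```python
import hashlib


def find_anagram_hashes(hash_name="sha1", hex_chars=None, limit=10_000_000):
    """Search for two counters whose hex digests are permutations of each other.

    hex_chars optionally truncates the digest, which makes a match far more likely.
    Returns (sorted_hex, first_counter, second_counter), or None if limit is reached.
    """
    seen = {}  # sorted hex string -> counter that produced it
    for counter in range(limit):
        data = counter.to_bytes(8, "big")  # assumed encoding: 8 bytes, big-endian
        digest = hashlib.new(hash_name, data).hexdigest()
        if hex_chars is not None:
            digest = digest[:hex_chars]
        key = "".join(sorted(digest))  # same characters in any order -> same key
        if key in seen:
            return key, seen[key], counter
        seen[key] = counter
    return None
```

For example, `find_anagram_hashes("sha1", hex_chars=6)` returns quickly because there are only a few tens of thousands of distinct 6-character multisets; the full 40-character SHA-1 digest needs far more iterations and memory.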

OpenCL - Are work-group axes exchangeable?

I was trying to find the best work-group size for a problem and I figured out something that I couldn't justify for myself.
These are my results :
GlobalWorkSize {6400 6400 1}, WorkGroupSize {64 4 1}, Time(Milliseconds) = 44.18
GlobalWorkSize {6400 6400 1}, WorkGroupSize {4 64 1}, Time(Milliseconds) = 24.39
Swapping the axes made execution almost twice as fast. Why?
By the way, I was using an AMD GPU.
Thanks :-)
EDIT :
This is the kernel (a Simple Matrix Transposition):
__kernel void transpose(__global float *input, __global float *output, const int size){
    int i = get_global_id(0);
    int j = get_global_id(1);
    output[i*size + j] = input[j*size + i];
}
I agree with @Thomas: it most probably depends on your kernel. Most probably, in the second case you access memory in a coalesced way and/or make full use of each memory transaction.
Coalescence: when threads need to access elements in memory, the hardware tries to serve these accesses in as few transactions as possible, i.e. if thread 0 and thread 1 have to access contiguous elements, there will be only one transaction.
Full use of a memory transaction: let's say you have a GPU that fetches 32 bytes in one transaction. If you have 4 threads that each need to fetch one int, you are using only half of the data fetched by the transaction and you waste the rest (assuming an int is 4 bytes).
To illustrate this, let's say that you have an n by n matrix to access. Your matrix is stored in row-major order, and you use n threads organized in one dimension. You have two possibilities:
1. Each workitem takes care of one column, looping through the column's elements one at a time.
2. Each workitem takes care of one row, looping through the row's elements one at a time.
It might be counter-intuitive, but the first solution allows coalesced access while the second does not. The reason is that when the first workitem accesses the first element of the first column, the second workitem accesses the first element of the second column, and so on. These elements are contiguous in memory. This is not the case for the second solution.
Now take the same example and apply solution 1, but this time with 4 workitems instead of n, on the same GPU as before: you'll most probably increase the time by a factor of 2, since you will waste half of each memory transaction.
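As a rough illustration of the two possibilities, the sketch below counts how many 32-byte transactions one simultaneous access by all workitems would need. It is a toy model only: the 32-byte transaction size is the hypothetical GPU from this answer, and `transactions` and `per_step` are names made up here. Real hardware (wavefronts, caches, alignment) is more complicated.

```python
TRANSACTION_BYTES = 32  # hypothetical GPU from the example above
ELEM_BYTES = 4          # one int or float


def transactions(indices):
    """Number of distinct 32-byte segments touched by a set of element indices."""
    return len({(i * ELEM_BYTES) // TRANSACTION_BYTES for i in indices})


def per_step(n, one_item_per_column, step=0):
    """Transactions needed when n workitems each access one element at loop step `step`."""
    if one_item_per_column:
        # workitem t reads row `step`, column t: neighbours in row-major memory
        indices = [step * n + t for t in range(n)]
    else:
        # workitem t reads row t, column `step`: addresses are n elements apart
        indices = [t * n + step for t in range(n)]
    return transactions(indices)


print(per_step(64, one_item_per_column=True))   # 8  (64 floats packed into 8 segments)
print(per_step(64, one_item_per_column=False))  # 64 (every access hits its own segment)
```

In this model 4 workitems reading 4 neighbouring ints still cost one full transaction, of which only 16 of the 32 bytes are used.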
EDIT: Now that you posted your kernel I see that I forgot to mention something else.
With your kernel, choosing a local size of (1, 256) or (256, 1) looks like a bad choice either way. In the first case, 256 transactions are needed to read a column of input (each fetching 32 bytes, of which only 4 are used, assuming the same GPU as in my previous examples), while 32 transactions are needed to write to output: one transaction can write 8 floats, hence 32 transactions for the 256 elements.
The same problem occurs with a workgroup size of (256, 1), but this time with 32 transactions to read and 256 to write.
So why does the first size work better? Because there is a cache system that can mitigate the bad access pattern of the read part. The size (1, 256) is therefore good for the write part, and the cache handles the not-so-good read part, decreasing the number of read transactions needed.
Note that the number of transactions decreases overall (taking into consideration all the workgroups within the NDRange). For example, the first workgroup issues 256 transactions to read the first 256 elements of the first column. The second workgroup might just hit the cache to retrieve the elements of the second column, because they were already fetched by the (32-byte) transactions issued by the first workgroup.
Now, I'm almost sure that you can do better than (1, 256): try (8, 32).
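The transaction counts quoted for the transpose kernel can be checked with a small model. This is only a sketch under the same assumptions as above (32-byte transactions, 4-byte floats, no cache), looking at the workgroup located at the origin of the NDRange; `workgroup_cost` is a hypothetical helper, not an OpenCL API.

```python
TRANSACTION_BYTES = 32  # hypothetical GPU from the examples above
FLOAT_BYTES = 4


def segments(indices):
    """Distinct 32-byte segments touched by a set of float indices."""
    return len({(k * FLOAT_BYTES) // TRANSACTION_BYTES for k in indices})


def workgroup_cost(local_x, local_y, size):
    """(read, write) transactions for one workgroup of the transpose kernel.

    Mirrors output[i*size + j] = input[j*size + i] with i along axis 0
    and j along axis 1, ignoring any cache.
    """
    items = [(i, j) for i in range(local_x) for j in range(local_y)]
    reads = segments(j * size + i for i, j in items)
    writes = segments(i * size + j for i, j in items)
    return reads, writes


for shape in [(1, 256), (256, 1), (8, 32)]:
    print(shape, workgroup_cost(*shape, size=6400))
# (1, 256) -> (256, 32)
# (256, 1) -> (32, 256)
# (8, 32)  -> (32, 32)
```

In this cache-free model the two sizes from the question, (64, 4) and (4, 64), come out symmetric (96 transactions in total each), which is consistent with the measured difference coming from the cache rather than from the raw transaction count.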

Storing a BMP image in a QR code

I'm trying to create (or, if I've somehow missed it in my research, find) an algorithm to encode/decode a bmp image into/from a QR code format. I've been using a guide (Thonky) to try to understand the basics of QR codes and I'm still not sure how to go about this problem, specifically:
Should I encode the data as binary or would numeric be more reasonable (assuming each pixel will have a max. value of 255)?
I've searched for information on the structured append capabilities of QR codes but haven't found much detail beyond the fact that it's supported by QR codes -- how could I implement/utilize this functionality?
And, of course, if there are any tips/suggestions to better store an image as binary data, I'm very open to suggestions!
Thanks for your time,
Sean
I'm not sure you'll be able to achieve that, as the amount of information a QR Code can hold is quite limited.
First of all, you'll probably want to store your image as raw bytes, as the other formats (numeric and alphanumeric) are designed to hold text/numbers and would provide less space to store your image. Let's assume you choose the biggest possible QR Code (version 40), with the lowest level of error correction, which can hold up to 2953 bytes of binary information.
First option, as you suggest, you store the image as a bitmap. This format allows no compression at all and requires (in the case of an RGB image without alpha channel) 3 bytes per pixel. If we take into account the file header size (14 to 54 bytes), and ignore the padding (each row of image data must be padded to a length being a multiple of 4), that allows you to store roughly 2900/3 = 966 pixels. If we consider a square image, this represents a 31x31 bitmap, which is small even for a thumbnail image (for example, my avatar at the end of this post is 32x32 pixels).
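The estimate above can be reproduced with a few lines of arithmetic. This is just a sketch: it assumes a 54-byte header (the upper end of the range quoted above) and 24-bit RGB pixels, and `max_square_bmp_side` is a name made up for this example.

```python
QR_V40_L_BYTES = 2953   # byte-mode capacity quoted above
BMP_HEADER_BYTES = 54   # assumed: the larger header size from the range above
BYTES_PER_PIXEL = 3     # RGB without alpha channel


def max_square_bmp_side(capacity, pad_rows):
    """Largest n such that an n x n 24-bit bitmap fits into `capacity` bytes."""
    side = 0
    while True:
        n = side + 1
        row_bytes = n * BYTES_PER_PIXEL
        if pad_rows:
            row_bytes = (row_bytes + 3) // 4 * 4  # rows padded to a multiple of 4 bytes
        if BMP_HEADER_BYTES + row_bytes * n > capacity:
            return side
        side = n


print(max_square_bmp_side(QR_V40_L_BYTES, pad_rows=False))  # 31
print(max_square_bmp_side(QR_V40_L_BYTES, pad_rows=True))   # 30
```

So once row padding is taken into account, the square image shrinks from 31x31 to 30x30.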
Second option, you use JPEG to encode your image. This format has the advantage of using a compression algorithm that can reduce the file size. This time there is no exact formula to get the size of an image fitting in 2.9kB, but I tried using a few square images and downsizing them until they fit in this size, keeping a good (93) quality factor: this gives an average of about 60x60 pixel images. (On such small images, it's normal not to see an incredible compression factor between jpeg and bmp, as the file header in a jpeg file is far larger than in a bmp file: about 500 bytes). This is better than bitmap, but remains quite small.
Finally, even if you succeed in encoding your image in this QR Code, you will encounter another problem: a QR Code this big is very, very hard to scan successfully. As a matter of fact, this QR Code will have a size of 177x177 modules (a "module" being a small white or black square). Assuming you scan it using a smartphone providing so-called "HD" frames (1280x720 pixels), each module will have a maximum size on the frame of about 4 pixels. If you take into account the camera noise, the aliasing and the blur due to the fact that the user is never perfectly still when scanning, the quality of the input frames will make it very hard for any QR Code decoding algorithm to successfully read the QR Code (don't forget we set its error correction level to low at the beginning of this!).
Even though it's not very good news, I hope this helps you!
There is indeed a way to spread information over several (up to 16) QR Codes, using a special header in your QR Codes called "Structured append". The best source of information you can use is the QR Code standard (ISO 18004:2006); it's possible (but not necessarily easy) to find it for free on the web.
The relevant part (section 9) of this standard says:
"Up to 16 QR Code symbols may be appended in a structured format. If a symbol is part of a Structured Append message, it is indicated by a header block in the first three symbol character positions.
The Structured Append Mode Indicator 0011 is placed in the four most significant bit positions in the first symbol character.
This is immediately followed by two Structured Append codewords, spread over the four least significant bits of the first symbol character, the second symbol character and the four most significant bits of the third symbol character. The first codeword is the symbol sequence indicator. The second codeword is the parity data and is identical in all symbols in the message, enabling it to be verified that all symbols read form part of the same Structured Append message. This header is immediately followed by the data codewords for the symbol commencing with the first Mode Indicator."
Nevertheless, I'm not sure most QR Code scanners can handle this, as it's quite an advanced feature.
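To give an idea of what that header looks like, here is a hedged sketch that builds the 20 header bits (4-bit mode indicator plus two 8-bit codewords). The quoted text does not spell out the inside of the two codewords, so the following are assumptions to check against the standard: the sequence indicator is taken to be 4 bits of zero-based position followed by 4 bits of (total symbols - 1), and the parity byte is taken to be the XOR of all bytes of the original message. All function names are made up.

```python
from functools import reduce

STRUCTURED_APPEND_MODE = 0b0011  # mode indicator from the quoted text


def parity_byte(message: bytes) -> int:
    """Assumed parity rule: XOR of every byte of the whole, unsplit message."""
    return reduce(lambda a, b: a ^ b, message, 0)


def structured_append_header(index: int, total: int, parity: int) -> str:
    """Return the 20 header bits as a string of '0'/'1' characters.

    Assumed layout: 4-bit mode, 4-bit zero-based position, 4-bit (total - 1), 8-bit parity.
    """
    if not (1 <= total <= 16 and 0 <= index < total and 0 <= parity <= 0xFF):
        raise ValueError("index/total/parity out of range")
    sequence = (index << 4) | (total - 1)
    return f"{STRUCTURED_APPEND_MODE:04b}{sequence:08b}{parity:08b}"


def split_message(message: bytes, chunk_size: int):
    """Split a message into (header_bits, chunk) pairs, one pair per QR symbol."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    parity = parity_byte(message)
    return [(structured_append_header(i, len(chunks), parity), chunk)
            for i, chunk in enumerate(chunks)]
```

Each chunk would still have to be encoded as a normal QR symbol (mode indicator, length, data, error correction) after this header; the sketch only covers the header itself.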
You can define a fixed image size and strip the jpg header down to just the vital information, which saves up to 480 bytes of a normal ~500-byte header.
I used this method to store people's photos for a small club's ID cards; images of about 64x64 pixels are enough.

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to ensure that there is only one correct decoding (which, at first blush, your table appears to do), and accept the increase in size of the output (in the case above, the output is double the size of the input, bit for bit).
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the decrypted value using the checksum, throw away the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.
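For illustration only, here is one way the scheme could be sketched with the Python standard library. Everything specific is my own assumption, not part of the answer: the "cheap cipher" is stood in for by a 4-round Feistel network over a 64-bit block with HMAC-SHA256 as round function, the random nonsense is 32 random bits, the checksum is a truncated HMAC, and `SECRET_KEY`, `encode_id` and `decode_id` are hypothetical names. Treat it as a toy, not vetted cryptography.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"hypothetical-server-side-secret"  # assumption: constant key kept on the server
ROUNDS = 4
MASK32 = 0xFFFFFFFF


def _prf(label: bytes, data: bytes) -> bytes:
    """Keyed pseudo-random function shared by the cipher rounds and the checksum."""
    return hmac.new(SECRET_KEY, label + data, hashlib.sha256).digest()


def _feistel(block: int, round_order) -> int:
    """Toy Feistel network on a 64-bit block; reversing round_order decrypts."""
    left, right = block >> 32, block & MASK32
    for i in round_order:
        f = int.from_bytes(_prf(bytes([i]), right.to_bytes(4, "big"))[:4], "big")
        left, right = right, left ^ f
    return (right << 32) | left  # final swap of the two halves


def encode_id(ident: int) -> str:
    """Encrypt <integer>+<random nonsense>, then append a checksum over the same value."""
    if not 0 <= ident <= MASK32:
        raise ValueError("id must fit in 32 bits")
    block = (ident << 32) | secrets.randbits(32)
    cipher = _feistel(block, range(ROUNDS))
    tag = _prf(b"tag", block.to_bytes(8, "big"))[:4]
    return (cipher.to_bytes(8, "big") + tag).hex()


def decode_id(token: str):
    """Return the integer id, or None if the token is malformed or the checksum fails."""
    try:
        raw = bytes.fromhex(token)
    except ValueError:
        return None
    if len(raw) != 12:
        return None
    block = _feistel(int.from_bytes(raw[:8], "big"), reversed(range(ROUNDS)))
    expected = _prf(b"tag", block.to_bytes(8, "big"))[:4]
    if not hmac.compare_digest(expected, raw[8:]):
        return None
    return block >> 32  # drop the random nonsense, keep the integer
```

Because of the random padding, calling `encode_id(42)` twice gives two different tokens that both decode to 42, which is what keeps a client from recognising an already visited URL.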

Program to mimic scanf() using system calls

As the title says, I am trying out a problem from last year that asks me to write a program that works the same as scanf().
Ubuntu:
Here is my code:
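#include <unistd.h>
#include <stdio.h>
int main(void)
{
    int fd = 0;      /* standard input */
    char buf[21];    /* 20 characters plus a terminating NUL */
    ssize_t n = read(fd, buf, 20);
    if (n < 0)
        return 1;    /* read failed */
    buf[n] = '\0';   /* read() does not NUL-terminate; printf("%s") needs it */
    printf("%s", buf);
    return 0;
}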
Now my program does not work exactly the same as scanf().
How do I store both integer and character values, since my code only reads character strings?
Also, how do I make it accept input of any length (it is limited to 20 characters in this case)?
Doing this job thoroughly is a non-trivial exercise.
What you show does not emulate scanf("%s", buffer); very well. There are at least two problems:
You limit the input to 20 characters.
You do not stop reading at the first white space character, leaving it and other characters behind to be read next time.
Note that the system calls cannot provide an 'unget' functionality; that has to be provided by the FILE * type. With file streams, you are guaranteed one character of pushback. I recently did some empirical research on the limits and found that the number of characters that can be pushed back ranged from 1 (AIX, HP-UX) to 4 (Solaris) to 'big', meaning up to 4 KiB, possibly more, on Linux and MacOS X (BSD). Fortunately, scanf() only requires one character of pushback. (Well, that's the usual claim; I'm not sure whether that's feasible when distinguishing between "1.23e+f" and "1.23e+1"; the first needs three characters of lookahead, it seems to me, before it can tell that the e+f is not part of the number.)
If you are writing a plug-in replacement for scanf(), you are going to need to use the <stdarg.h> mechanism. Fortunately, all the arguments to scanf() after the format string are data pointers, and all data pointers are the same size. This simplifies some aspects of the code. However, you will be parsing the scan format string (a non-trivial exercise in its own right; see the recent discussion of print format string parsing) and then arranging to make the appropriate conversions and assignments.
Unless you have unusually stringent conditions imposed upon you, assume that you will use the character-level Standard I/O library functions such as getchar(), getc() and ungetc(). If you can't even use them, then write your own variants of them. Be aware that full integration with the rest of the I/O functions is tricky - things like fseek() complicate matters, and ensuring that pushed-back characters are properly consumed is also not entirely trivial.
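To show the shape of the "write your own getc()/ungetc()" approach, here is a minimal sketch. It is written in Python for brevity, with `os.read` standing in for the read() system call; a C version would follow the same structure. `FdReader`, `scan_word` and `scan_int` are hypothetical names, and only rough analogues of %s and %d are covered (no format string parsing, no <stdarg.h> equivalent).

```python
import os


class FdReader:
    """Byte-at-a-time reader over a raw file descriptor with one byte of pushback."""

    def __init__(self, fd):
        self.fd = fd
        self.pushed = None

    def getc(self):
        if self.pushed is not None:
            c, self.pushed = self.pushed, None
            return c
        data = os.read(self.fd, 1)  # one system call per byte: simple, not fast
        return data if data else None  # None signals end of input

    def ungetc(self, c):
        self.pushed = c


def _skip_space(reader):
    """Consume leading whitespace and return the first non-space byte (or None)."""
    c = reader.getc()
    while c is not None and c.isspace():
        c = reader.getc()
    return c


def scan_word(reader):
    """Rough analogue of %s: skip whitespace, then read up to the next whitespace."""
    c = _skip_space(reader)
    if c is None:
        return None
    word = bytearray()
    while c is not None and not c.isspace():
        word += c
        c = reader.getc()
    if c is not None:
        reader.ungetc(c)  # leave the delimiter for the next read
    return word.decode()


def scan_int(reader):
    """Rough analogue of %d: optional sign followed by decimal digits."""
    c = _skip_space(reader)
    text = bytearray()
    if c in (b"+", b"-"):
        text += c
        c = reader.getc()
    while c is not None and c.isdigit():
        text += c
        c = reader.getc()
    if c is not None:
        reader.ungetc(c)
    try:
        return int(text.decode())
    except ValueError:
        return None  # conversion failed (no digits)
```

Reading from standard input would be `FdReader(0)`. There is no fixed 20-character limit because the token grows as bytes arrive; a real implementation would also buffer its reads instead of issuing one system call per byte.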
