Memory (sbrk) 16-byte aligned shifting on pointer access - pointers

I wrote a reasonably basic memory allocator using sbrk. I ask for a chunk of memory, say 65k, and carve it up as needed for variables requesting dynamic memory. I free memory by adding it back to the 65k block. The 65k block is derived from a union whose sizeof is 16 bytes, and I align the block on a 16-byte boundary. But I'm getting unusual behavior.
Accessing the memory appears fine as I allocate and begin to populate my data structures, except that on one of my function calls I pass a pointer to a member variable in a global structure, but the address in the pointer argument doesn't map directly to the address of that member.
For example, the real address of this particular member happens to be: 0x100313d50 but when executing a particular function (nothing special) the address of the member is being represented as 0x100313d70. Inside the debugger I can query the real address and it appears correct when inside the function where this manifests. This isn't the first member being accessed either, it's the third so two prior memory accesses are fine, but during the third access I'm seeing this unusual shifting.
Is it possible that I'm accessing this memory via a misaligned block? It's possible, but I'd expect to get a SIGBUS (SPARC chip). I'm compiling with -xmemalign=16s, so a misaligned access ought to raise SIGBUS instead of being trapped and fixed up.
All of my structures are padded to a multiple of 16 bytes: sizeof(structure) % 16 == 0. Has anyone had experience with this type of behavior? Generally speaking, what kinds of things might cause a pointer to misrepresent a memory address?
Cheers,
Tracy.
Solaris 10, SunStudio-12, C language on modern SPARC processor (in case this helps).

I figure I should answer my own question in the event someone else out there has a similar problem.
The memory address was shifting because a prior call to a utility function accidentally overwrote the meta-address of the global structure's block. Lookups on that block were therefore shifted, even though the actual data still resided in the original block.
In simple words, I wrote past my buffer. Since I hand out memory from the tail, overwriting would blow away my much needed meta-address for my global structure (or whatever). Now I know what undefined behavior looks like.


Rust Global.dealloc vs ptr::drop_in_place vs ManuallyDrop

I'm relatively new to Rust. I was working on some lock-free algorithms, and started playing around with manually managing memory, something similar to C++ new/delete. I noticed a couple different ways that do this throughout the standard library components, but I want to really understand the differences and use cases of each. Here's what it seems like to me:
ManuallyDrop<Box<T>> will prevent Box's destructor from running. I can save a raw pointer to the ManuallyDrop element, and have the actual element go out of scope (what would normally be dropped in Rust) without being dropped. I can later call ManuallyDrop::drop(&mut *ptr) to drop this value manually.
I can also dereference the ManuallyDrop<Box<T>> element, save a raw pointer to just the Box<T>, and later call std::ptr::drop_in_place(box_ptr). This is supposed to destroy the Box itself and drop the heap-allocated T.
Looking at the ManuallyDrop::drop implementation, it looks like those are literally doing the exact same thing. Since ManuallyDrop is zero cost and just stores a value in its struct, is there any difference between the above two approaches?
I can also call std::alloc::Global.dealloc(...), which looks like it will deallocate the memory block without calling drop. So if I call this on a pointer to Box<T>, it'll deallocate the heap pointer, but won't call drop, so T will still be lying around on the heap. I could call it on a pointer to T itself, which will remove T.
From exploring the standard library, it looks like Global.dealloc gets called in the raw_vec implementation to actually remove the heap-allocated array that Vec points to. This makes sense, since it's literally trying to remove a block of memory.
Rc has a drop implementation that looks roughly like this:
// destroy the contained object
ptr::drop_in_place(self.ptr.as_mut());
// remove the implicit "strong weak" pointer now that we've
// destroyed the contents.
self.dec_weak();
if self.weak() == 0 {
Global.dealloc(self.ptr.cast(), Layout::for_value(self.ptr.as_ref()));
}
I don't really understand why it needs both the dealloc and the drop_in_place. What does the dealloc add that the drop_in_place doesn't do?
Also, if I just save a raw pointer to a heap-allocated value by doing something like Box::into_raw(Box::new(5)), does my pointer now control that memory allocation? As in, will it remain alive until I explicitly call ptr::drop_in_place()?
Finally, when I was playing with all this, I ran into a strange issue. After running ManuallyDrop::drop or ptr::drop_in_place on my raw pointer, I then tried running println! on the pointer's dereferenced value. Sometimes I get a scary heap error and my test fails, which is what I would expect. Other times, it just prints the same value, as if no drops happened. I also tried running ManuallyDrop::drop multiple times on the exact same value, and same thing. Sometimes a heap error, sometimes totally fine, and the same value prints out.
What is happening here?
If you come from C++, you can think of drop_in_place as calling the destructor manually, and dealloc as calling old C free.
They serve different purposes:
drop_in_place just runs the destructor (Drop::drop, if the type implements it, followed by the destructors of its fields), which releases the resources held by your type.
dealloc frees the memory pointed to by a pointer, previously allocated with alloc.
You seem to think that drop_in_place also frees the memory, but that is not the case. I think your confusion arises because Box<T> contains a dynamically allocated object, so its Box::drop implementation does release the memory used by that object, after calling its drop_in_place, of course.
That is what you see in the Rc implementation: first it calls the drop_in_place (destructor) of the inner object, then it releases the memory.
About what happens if you call drop_in_place several times in a row... well, the function is unsafe for a reason: you most likely get Undefined Behavior. From the docs:
...if T is not Copy, using the pointed-to value after calling drop_in_place can cause undefined behavior.
Note the can cause. I think it is perfectly possible to write a type that allows calling drop several times, but it doesn't sound like such a good idea.

How can the processor discern a far return from a near return?

Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after the pointer popping).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are near so that when they're returned from, the system knows how many bytes to pop and where to pop them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether or not a call should be far or near; the compiler decides how to encode the function call and return using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you don't even think about segmentation, so you don't actually call it NEAR either. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old x86 processors, which used a segmented memory model. Programs at that time needed to choose which memory model they wished to support, ranging from "tiny" to "large". If your program was small enough to fit in a single segment, then it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you could choose, function by function, whether a function needed to be callable from (and able to return to) code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference between the two kinds of call/return is the opcode used; the processor obeys the command given to it. If you make a mistake (say, give it a FAR return opcode when in flat mode), it'll fail.

What is the most likely reason a variable, passed as reference to a program, would have its value changed past its length?

My coworker came to me with a problem yesterday.
He has a CL with two variables defined, 10 characters each. He then calls another CL with the first variable as a parameter, and after the CL comes back (which is a long string of programs which will likely have to be manually combed through for the offending code), his first variable is unchanged, but the first 5 characters of his second variable are blanked out.
Because the parameters are passed by reference there is obviously something that is able to affect more than the 10 bytes allotted and it is overflowing to the variable defined immediately after it in memory, but I'm wondering what common examples of this would be (that would not result in some kind of explicit error). Another program that is getting this address passed to it that has its parameters defined with 15 bytes? Using a pointer to the address and then dereferencing and assigning a 15 character string? This is V7R1.
In the meantime he stuck another variable between the two to act as a buffer lol. Interestingly enough, variables that are not assigned a value are never initialized so they are not given any space in memory. Interesting thing to discover when playing around with this.
All it takes is for a program further down in the call stack to have a parameter mis-defined:
calling program
/*pgm a*/
pgm
dcl &parm1 char(10) value('Hello')
dcl &parm2 char(10) value('Charles')
call pgmb parm(&parm1 &parm2)
endpgm
called program
/*pgm b*/
pgm parm(&parmA &parmB)
dcl &parmA char(15)
dcl &parmB char(10)
chgvar &parmA value('Bye')
endpgm
Note: It's not guaranteed to break, it might work fine, till some PTF is applied that changes the internals of the OS or just the compiler.
I personally saw an RPG III program that had been working "fine" for years break when converted to RPG IV, due to differences in how the compiler laid out memory. The RPG III program was corrupting unused memory, but under RPG IV the corrupted memory was important.
Bottom line, count yourself lucky that the corruption is showing up visibly... trace down through the call stack to find the mismatch.
Defining another variable between the two variables won't necessarily make a difference. There is no guarantee that variables are laid out in storage in the same order they are defined in the program.
To ensure there is additional storage following a variable, the extra storage must be defined explicitly.
In CL, do it like this. This will add a guaranteed extra 90 bytes following &MYVAR.
dcl &myvar_stg type(*char) len(100)
dcl &myvar type(*char) stg(*defined) len(10) defvar(&myvar_stg 1)
Assuming all the programs involved are debuggable, you can find out exactly where the storage is getting corrupted by putting a watch on the second variable in debug.
In the debugger, use the "watch" command and then let the program run.
===> watch &myvar2
You'll get a watch breakpoint on the next debuggable statement following the statement that changed the storage. Usually, the next debuggable statement is in the same program that changed the storage.

Disassemble 68xx code without entry point vector

I am trying to disassemble code from an old radio containing a 68xx (68HC12-like) microcontroller. The problem is that I don't have access to the interrupt vectors at the top of the ROM, so I don't know where to start looking. I only have the code below the top. Is there any suggestion of where or how I can find meaningful routines in the code data?
You can't really disassemble reliably without knowing where the reset vector points. What you can do, however, is try to narrow down the possible reset addresses by eliminating all those other addresses that cannot possibly be a starting point.
So, given that any address in the memory map that contains a valid opcode is a potential reset point, you need to either eliminate it, or keep it for further analysis.
For the 68HC11 case, you could try to guess the entry point by looking for LDS instructions with a legitimate operand value (i.e., pointing at or near the top of available RAM -- if there are multiple RAM banks, then to any of them).
It may help a bit if you know the device's full memory map, i.e., if external memory is used, its mapping and possible mapped peripherals (e.g., LCD). Do you also know CONFIG register contents?
The LDS instruction is usually either the very first instruction, or close thereafter (so look back a few instructions when you feel you have finally singled out your reset address). The problem here is some data may, by chance, appear as LDS instructions so you could end up with multiple potentially valid entry points. Only one of them is valid, of course.
You can eliminate further by disassembling a few instructions starting from each of these LDS instructions until you either hit an illegal opcode (i.e. obviously not a valid code sequence but an accidental data arrangement that looks like opcodes), or you see a series of instructions that are commonly used in 68HC11 initialization. These involve (usually) initialization of any one or more of the registers BPROT, OPTION, SCI, INIT ($103D in most parts, but for some $3D), etc.
You could write a relatively small script (e.g., in Lua) to do the basic scanning of the memory map and produce a (hopefully small) set of potential reset points to be examined further with a true disassembler for hints like the ones I mentioned.
Now, once you have the reset vector figured out the job becomes somewhat easier but you still need to figure out where any interrupt handlers are located. For this your hint is an RTI instruction and whatever preceding code that normally should acknowledge the specific interrupt it handles.
Hope this helps.

How do pointers increase execution speed?

How does using a pointer in a program increase the execution speed?
When I use a pointer to access a variable, the running program first has to go to the pointer's address to find the address of the variable, and then go to the variable to use it (that's what I know).
It is obvious that using a variable is faster here.
So how does a pointer increase the speed?
Passing a pointer to 4KB of data is faster (and uses less memory) than copying that 4KB to pass it "by value".
You are correct that, for a simple 'integer', passing it directly is faster than passing a pointer to it & de-referencing (looking up) the pointer.
Pointers are typically used for larger data-structures than that, however.
The other use of pointers is to enable modifiability -- the function can modify the original data or data structure via the pointer it receives, rather than just having an independent copy whose changes the caller would never see.
For example a FILE * -- a pointer to a file-handle. I/O functions take this & update internal pointers to keep track of where you are, in the file.
