Valid and Invalid Windows in SPARC V8

What criteria does a register window need to hold in order to be considered valid or invalid?
My understanding is that if a window contains information belonging to some function, say, in a chain of functions, then it contains valid information. A window is considered invalid if its out registers would overlap the register contents of a valid window (or if the OS reserved that window as invalid for trap handling).
However, this is not explained in great detail by the SPARC V8 manual or the System V ABI for SPARC, in my opinion.
So, again, my question is: When is a window considered to be valid or contain valid information and when is it considered to be invalid?
Thanks

The WIM (Window Invalid Mask) register holds the information about which windows are valid and which are invalid, while the CWP (Current Window Pointer) field of the PSR identifies the currently used register window.
For each register window the WIM holds one validity bit.
If a window's WIM bit is set to 1, the window is considered invalid and causes a trap when the CWP would move into it during a SAVE or RESTORE instruction, or when a trap is taken.
I found this in the SPARC v8 Architecture Manual on p. 27 and p. 30


Finding calling site of instrumented system calls

In my pintool, I check NtReadFile() and NtCreateFile() system calls using the PIN API:
PIN_AddSyscallEntryFunction()
PIN_AddSyscallExitFunction()
But the output seems to be polluted with many unexpected additional interceptions that I would like to filter out.
The problem is that the SYSCALL_ENTRY_CALLBACK functions do not give you access to the information needed to deduce where the system call was spawned from (the calling site), even at entry. Checking the value of REG_EIP (the instruction pointer) just before the system call is executed, I see I am way off the calling site (out of the address range of the image I am instrumenting, although the system call is made within this image).
I also tried to instrument instructions with INS_IsSyscall() at IPOINT_BEFORE and check its address, but it seems that is also too late (out of range of the image's low and high addresses).
What would be the correct way to instrument only system calls originating from the image I am instrumenting?

Vulkan: trouble understanding cycling of framebuffers

In Vulkan,
A semaphore(A) and a fence(X) can be passed to vkAcquireNextImageKHR. That semaphore(A) is subsequently passed to vkQueueSubmit, to wait until the image is released by the Presentation Engine (PE). A fence(Y) can also be passed to vkQueueSubmit. Client code can check when the submission has completed by checking fence(Y).
When fence(Y) signals, this means the PE can display the image.
My question:
How do I know when the PE has finished using the image after a call to vkQueuePresentKHR? To me, it doesn't seem that it would be by checking fence(X), because that is for client code to know when the image can be written to by vkQueueSubmit, isn't it? After the image is sent to vkQueueSubmit, it seems the usefulness of fence(X) is done. Or, can the same fence(X) be used to query the image availability after the call to vkQueuePresentKHR?
I don't know when the image is available again after a call to vkQueuePresentKHR, without having to call vkAcquireNextImageKHR.
The reason this is causing trouble for me is that in an asynchronous, 60fps, triple-buffered app (throwaway learning code), things get out of whack like this:
1. Send an initial framebuffer to the PE. This framebuffer is now unavailable for 16 milliseconds.
2. Within the 16 ms, acquire a second image/framebuffer, submit commands, but don't present.
3. Do the same as #2, for a third image. We submit it before 16 ms.
4. 16 ms have gone by, so we vkQueuePresentKHR the second image.
Now, if I call vkAcquireNextImageKHR, the whole thing can fail if image #1 is not yet done being used, because I have acquired three images at this point.
How to know if image #1 is available again without calling vkAcquireNextImageKHR?
How do I know when the PE has finished using the image after a call to vkQueuePresentKHR?
You usually do not need to know.
Either you need to acquire a new VkImage, or you don't. Whether the PE has finished or not does not even enter into that decision.
The only reason to want to know is if you want to measure presentation times, and there's a special extension for that: VK_GOOGLE_display_timing.
After the image is sent to vkQueueSubmit, it seems the usefulness of fence(X) is done.
Well, you can reuse the fence. But the implementation stopped using it as soon as it was signaled and won't change its state to anything anymore, if that's what you are asking (so you are free to vkDestroy it or do other things with it).
I don't know when the image is available again after a call to vkQueuePresentKHR, without having to call vkAcquireNextImageKHR.
Hopefully I cover it below, but I am not precisely sure what the problem here is. I don't know how to eat soup without a spoon either. Simply use a spoon; I mean, vkAcquireNextImageKHR.
Now, if I call vkAcquireNextImageKHR, the whole thing can fail if image #1 is not yet done being used, because I have acquired 3 images at this point.
How to know if image #1 is available again without calling vkAcquireNextImageKHR?
How is it any different than image #1 and #2?
Yes, you may have already acquired all the images the swapchain has to offer, or the PE is "not ready" to give away an image even if it has two.
In the first case the spec advises against calling vkAcquireNextImageKHR with a timeout of UINT64_MAX. It is a simple matter of counting the successful vkAcquireNextImageKHR calls versus the vkQueuePresentKHR calls. One way is to simply do one vkAcquireNextImageKHR and then one vkQueuePresentKHR.
In the second case you can simply call vkAcquireNextImageKHR and you will eventually get the image.
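The counting idea above can be sketched as plain bookkeeping, with no Vulkan calls at all. The struct and function names here are made up for illustration, and the guarantee condition is a simplification of the spec's rule that acquiring cannot be guaranteed non-blocking once you hold more than imageCount minus minImageCount images:

```c
#include <assert.h>

/* Hypothetical bookkeeping, not Vulkan API: count images currently
   held (successfully acquired and not yet presented). */
typedef struct {
    int image_count;      /* images actually in the swapchain, e.g. 3 */
    int min_image_count;  /* minImageCount requested at creation */
    int outstanding;      /* acquires minus presents */
} AcquireTracker;

/* Roughly: acquiring is guaranteed to succeed in finite time only
   while outstanding <= image_count - min_image_count. */
static int acquire_guaranteed(const AcquireTracker *t) {
    return t->outstanding <= t->image_count - t->min_image_count;
}

static int note_acquire(AcquireTracker *t) {
    if (!acquire_guaranteed(t))
        return 0;          /* would risk blocking or VK_NOT_READY */
    t->outstanding++;
    return 1;
}

static void note_present(AcquireTracker *t) {
    t->outstanding--;      /* the image goes back to the PE */
}
```

For a 3-image swapchain created with minImageCount 2, this predicts exactly the asker's scenario: the first two acquires are safe, the third is not until something has been presented.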
In order to use a swapchain image, you need to acquire it. After that, the actual availability of the image for rendering purposes is signaled by the semaphore (A) or the fence (X). You can either use the semaphore (A) during the submission as a wait semaphore, or wait on the CPU for the fence (X) and submit after that. For performance reasons, the semaphore is the preferred way.
Now when you present an image, you give it back to the Presentation Engine. From then on you cannot use that image for any purpose. There is no way to check when that image is available again so you can render into it once more. If you want to render into a swapchain image again, you need to acquire another image, and during this operation you once again provide a semaphore or a fence (probably different from those provided when you previously acquired a swapchain image). There is no other way to check when an image is available again than by calling the vkAcquireNextImageKHR() function.
And when you want to implement triple buffering, you should just select the appropriate presentation mode (mailbox mode is the closest match). You shouldn't wait a specific amount of time before presenting an image; just present it when you are done rendering into it. Your synchronization should be based entirely on the acquire and present commands and the semaphores or fences provided during these operations and during submission. The appropriate present mode should do the rest. A detailed explanation of the different present modes is available in Intel's tutorial.

How to avoid detecting uninitialized variables when using the impact analysis of Frama-C

I find that if there is an uninitialized lvalue (a variable X, for example) in the program, Frama-C asserts that X has been initialized, but the assertion then gets the final status invalid. It seems that Frama-C stops the analysis after detecting the invalid final status, so the actual result of the impact analysis (the impacted statements) is only part of the ideal result. I want Frama-C to proceed with the impact analysis regardless of those uninitialized variables, but I haven't found any related options yet. How can I deal with this problem?
You're invoking an undefined behavior as indicated in annex J.2 of ISO C standard "The value of an object with automatic storage duration is used while it is indeterminate" (Note to language lawyers: said annex is informative, and I've been unable to trace that claim back to the normative sections of the standard, at least for C11). The EVA plug-in, which is used internally by Impact analysis, restricts itself to execution paths that have a well-defined meaning according to the standard (the proverbial nasal demons are not part of the abstract domains of EVA). If there are no such paths, abstract execution will indeed stop. The appropriate way to deal with this problem is to ensure the local variables of the program under analysis are properly initialized before being accessed.
Update
I forgot to mention that in the next version (16 - Sulfur), whose beta version is available at https://github.com/Frama-C/Frama-C-snapshot/wiki/downloads/frama-c-Sulfur-20171101-beta.tar.gz, EVA has an option -val-initialized-locals, whose help specifies:
Local variables enter in scope fully initialized. Only useful for the analysis of programs buggy w.r.t. initialization.

How can the processor discern a far return from a near return?

Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after popping the pointer).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are near, so that when they return, the system knows how many bytes to pop and where to put them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether a call was far or near; the compiler decides how to encode each function call and return, using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you just don't even think about segmentation, so you don't call it anything. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old 8086 processors, which used a segmented memory model. Programs at that time needed to choose which memory model they wished to support, ranging from "tiny" to "large". If your program was small enough to fit in a single segment, it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you could choose whether particular functions needed to be callable/returnable from code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference between the call/return variants is the opcode used; the processor obeys the instruction given to it. If you make a mistake (say, execute a FAR return where a NEAR call was made), it will fail.

Z80 flags - how to generate?

I'm designing a Z80 compatible project. I'm up to designing the flags register.
I originally thought that the flags were generated straight from the ALU depending on the inputs and type of ALU operation.
But after looking at the instructions and the flags result it doesn't seem that the flags are always consistent with this logic.
As a result I'm then assuming I also have to feed the ALU the op-code as well to generate the correct flags each time. But this would seem to make the design over-complicated. And before making this huge design step I wanted to check with the Internet.
Am I correct? OR just really confused, and it is as simple as I originally thought?
Of course, the type of the operation is important. Consider overflow when doing addition and subtraction. Say, you're adding or subtracting 8-bit bytes:
1+5=6 - no overflow
255+7=6 - overflow
1-5=252 - overflow
200-100=100 - no overflow
200+100=44 - overflow
100-56=44 - no overflow
Clearly, the carry flag's state here depends not only on the input bytes or the resultant byte value, but also on the operation. And it indicates unsigned overflow.
The logic is very consistent. If it seems not to be, it's time to read the documentation to learn the official logic.
You might be interested in this question.
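The worked examples above can be checked in a few lines of C. The point they make is that the same operand bytes, and even the same result byte, produce different carry states depending on the operation, so the flag logic must see the operation type:

```c
#include <assert.h>
#include <stdint.h>

/* Carry flag after an 8-bit ADD: set on unsigned overflow out of
   bit 7 (result > 255 before truncation). */
static int carry_after_add(uint8_t a, uint8_t b) {
    return (unsigned)a + b > 0xFF;
}

/* Carry flag after an 8-bit SUB: set when a borrow is needed. */
static int carry_after_sub(uint8_t a, uint8_t b) {
    return a < b;
}
```

Note that 200+100 and 100-56 both leave 44 in the result byte, yet one sets carry and the other clears it, which is exactly why feeding the ALU the operation type is not over-complication but a necessity.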
Your code is written for the CP/M operating system. I/O is done through the BDOS (Basic Disk Operating System) interface. Basically, you load a function code into the C register and any additional parameters into other registers, and call location 0x5. Function code C=2 writes the character in the E register to the console (= screen). You can see this in action at line 1200:
        ld   e,a
        ld   c,2
        call bdos
        pop  hl
        pop  de
        pop  bc
        pop  af
        ret

bdos:   push af
        push bc
        push de
        push hl
        call 5
        pop  hl
        pop  de
        pop  bc
        pop  af
        ret
For a reference to BDOS calls, try here.
To emulate this you need to trap calls to address 5 and implement them using whatever facilities you have available to you.
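A minimal sketch of such a trap in a C emulator, assuming a made-up machine struct (field names and layout are illustrative, not any real emulator's API): before executing each instruction, check whether the program counter has reached 0x0005 and, if so, service the BDOS call on the host and simulate the RET.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical emulator state; real emulators differ. */
typedef struct {
    uint16_t pc, sp;
    uint8_t  c, e;          /* guest C and E registers */
    uint8_t  mem[0x10000];  /* 64 KiB address space */
} Z80;

static uint16_t pop16(Z80 *z) {
    uint16_t lo = z->mem[z->sp++];  /* sp wraps naturally at 0xFFFF */
    uint16_t hi = z->mem[z->sp++];
    return (uint16_t)(lo | (hi << 8));
}

/* Call before executing each instruction.
   Returns 1 if the BDOS call was serviced on the host. */
static int bdos_trap(Z80 *z) {
    if (z->pc != 0x0005)
        return 0;
    if (z->c == 2)             /* C=2: write character in E to console */
        putchar(z->e);
    z->pc = pop16(z);          /* emulate the RET back to the caller */
    return 1;
}
```

Other function codes (C=9 "print $-terminated string", C=1 "console input", and so on) can be added as further cases before falling back to an error for anything unimplemented.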
