Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after the pointer popping).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are close so that when they're returned from, the system knows how many bytes to pop and where to pop them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether or not a call should be far or near; the compiler decides how to encode the function call and return using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you just don't even think about segmentation so you don't actually call it either. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old 86 processors, which used a segmented memory model. Programs at that time needed to choose what kind of architecture they wished to support, ranging from "tiny" to "large". If your program was small enough to fit in a single segment, then it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you had power to choose whether local functions needed to be able to be either callable/returnable from code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference in the call/return is the opcode used; the processor obeys the command given to it. If you mistake (say, give it a FAR return opcode when in flat mode), it'll fail.
What is a segmentation fault? Is it different in C and C++? How are segmentation faults and dangling pointers related?
Segmentation fault is a specific kind of error caused by accessing memory that “does not belong to you.” It’s a helper mechanism that keeps you from corrupting the memory and introducing hard-to-debug memory bugs. Whenever you get a segfault you know you are doing something wrong with memory – accessing a variable that has already been freed, writing to a read-only portion of the memory, etc. Segmentation fault is essentially the same in most languages that let you mess with memory management, there is no principal difference between segfaults in C and C++.
There are many ways to get a segfault, at least in the lower-level languages such as C(++). A common way to get a segfault is to dereference a null pointer:
int *p = NULL;
*p = 1;
Another segfault happens when you try to write to a portion of memory that was marked as read-only:
char *str = "Foo"; // Compiler marks the constant string as read-only
*str = 'b'; // Which means this is illegal and results in a segfault
Dangling pointer points to a thing that does not exist anymore, like here:
char *p = NULL;
{
char c;
p = &c;
}
// Now p is dangling
The pointer p dangles because it points to the character variable c that ceased to exist after the block ended. And when you try to dereference dangling pointer (like *p='A'), you would probably get a segfault.
It would be worth noting that segmentation fault isn't caused by directly accessing another process memory (this is what I'm hearing sometimes), as it is simply not possible. With virtual memory every process has its own virtual address space and there is no way to access another one using any value of pointer. Exception to this can be shared libraries which are same physical address space mapped to (possibly) different virtual addresses and kernel memory which is even mapped in the same way in every process (to avoid TLB flushing on syscall, I think). And things like shmat ;) - these are what I count as 'indirect' access. One can, however, check that they are usually located long way from process code and we are usually able to access them (this is why they are there, nevertheless accessing them in a improper way will produce segmentation fault).
Still, segmentation fault can occur in case of accessing our own (process) memory in improper way (for instance trying to write to non-writable space). But the most common reason for it is the access to the part of the virtual address space that is not mapped to physical one at all.
And all of this with respect to virtual memory systems.
A segmentation fault is caused by a request for a page that the process does not have listed in its descriptor table, or an invalid request for a page that it does have listed (e.g. a write request on a read-only page).
A dangling pointer is a pointer that may or may not point to a valid page, but does point to an "unexpected" segment of memory.
To be honest, as other posters have mentioned, Wikipedia has a very good article on this so have a look there. This type of error is very common and often called other things such as Access Violation or General Protection Fault.
They are no different in C, C++ or any other language that allows pointers. These kinds of errors are usually caused by pointers that are
Used before being properly initialised
Used after the memory they point to has been realloced or deleted.
Used in an indexed array where the index is outside of the array bounds. This is generally only when you're doing pointer math on traditional arrays or c-strings, not STL / Boost based collections (in C++.)
According to Wikipedia:
A segmentation fault occurs when a
program attempts to access a memory
location that it is not allowed to
access, or attempts to access a memory
location in a way that is not allowed
(for example, attempting to write to a
read-only location, or to overwrite
part of the operating system).
Segmentation fault is also caused by hardware failures, in this case the RAM memories. This is the less common cause, but if you don't find an error in your code, maybe a memtest could help you.
The solution in this case, change the RAM.
edit:
Here there is a reference: Segmentation fault by hardware
Wikipedia's Segmentation_fault page has a very nice description about it, just pointing out the causes and reasons. Have a look into the wiki for a detailed description.
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation.
The following are some typical causes of a segmentation fault:
Dereferencing NULL pointers – this is special-cased by memory management hardware
Attempting to access a nonexistent memory address (outside process's address space)
Attempting to access memory the program does not have rights to (such as kernel structures in process context)
Attempting to write read-only memory (such as code segment)
These in turn are often caused by programming errors that result in invalid memory access:
Dereferencing or assigning to an uninitialized pointer (wild pointer, which points to a random memory address)
Dereferencing or assigning to a freed pointer (dangling pointer, which points to memory that has been freed/deallocated/deleted)
A buffer overflow.
A stack overflow.
Attempting to execute a program that does not compile correctly. (Some compilers will output an executable file despite the presence of compile-time errors.)
Segmentation fault occurs when a process (running instance of a program) is trying to access read-only memory address or memory range which is being used by other process or access the non-existent (invalid) memory address.
Dangling Reference (pointer) problem means that trying to access an object or variable whose contents have already been deleted from memory, e.g:
int *arr = new int[20];
delete arr;
cout<<arr[1]; //dangling problem occurs here
In simple words: segmentation fault is the operating system sending a signal to the program
saying that it has detected an illegal memory access and is prematurely terminating the program to prevent
memory from being corrupted.
There are several good explanations of "Segmentation fault" in the answers, but since with segmentation fault often there's a dump of the memory content, I wanted to share where the relationship between the "core dumped" part in Segmentation fault (core dumped) and memory comes from:
From about 1955 to 1975 - before semiconductor memory - the dominant technology in computer memory used tiny magnetic doughnuts strung on copper wires. The doughnuts were known as "ferrite cores" and main memory thus known as "core memory" or "core".
Taken from here.
"Segmentation fault" means that you tried to access memory that you do not have access to.
The first problem is with your arguments of main. The main function should be int main(int argc, char *argv[]), and you should check that argc is at least 2 before accessing argv[1].
Also, since you're passing in a float to printf (which, by the way, gets converted to a double when passing to printf), you should use the %f format specifier. The %s format specifier is for strings ('\0'-terminated character arrays).
Simple meaning of Segmentation fault is that you are trying to access some memory which doesn't belong to you. Segmentation fault occurs when we attempt to read and/or write tasks in a read only memory location or try to freed memory. In other words, we can explain this as some sort of memory corruption.
Below I mention common mistakes done by programmers that lead to Segmentation fault.
Use scanf() in wrong way(forgot to put &).
int num;
scanf("%d", num);// must use &num instead of num
Use pointers in wrong way.
int *num;
printf("%d",*num); //*num should be correct as num only
//Unless You can use *num but you have to point this pointer to valid memory address before accessing it.
Modifying a string literal(pointer try to write or modify a read only memory.)
char *str;
//Stored in read only part of data segment
str = "GfG";
//Problem: trying to modify read only memory
*(str+1) = 'n';
Try to reach through an address which is already freed.
// allocating memory to num
int* num = malloc(8);
*num = 100;
// de-allocated the space allocated to num
free(num);
// num is already freed there for it cause segmentation fault
*num = 110;
Stack Overflow -: Running out of memory on the stack
Accessing an array out of bounds'
Use wrong format specifiers when using printf() and scanf()'
Consider the following snippets of Code,
SNIPPET 1
int *number = NULL;
*number = 1;
SNIPPET 2
int *number = malloc(sizeof(int));
*number = 1;
I'd assume you know the meaning of the functions: malloc() and sizeof() if you are asking this question.
Now that that is settled,
SNIPPET 1 would throw a Segmentation Fault Error.
while SNIPPET 2 would not.
Here's why.
The first line of snippet one is creating a variable(*number) to store the address of some other variable but in this case it is initialized to NULL.
on the other hand,
The second line of snippet two is creating the same variable(*number) to store the address of some other and in this case it is given a memory address(because malloc() is a function in C/C++ that returns a memory address of the computer)
The point is you cannot put water inside a bowl that has not been bought OR a bowl that has been bought but has not been authorized for use by you.
When you try to do that, the computer is alerted and it throws a SegFault error.
You should only face this errors with languages that are close to low-level like C/C++. There is an abstraction in other High Level Languages that ensure you do not make this error.
It is also paramount to understand that Segmentation Fault is not language-specific.
There are enough definitions of segmentation fault, I would like to quote few examples which I came across while programming, which might seem like silly mistakes, but will waste a lot of time.
You can get a segmentation fault in below case while argument type mismatch in printf:
#include <stdio.h>
int main(){
int a = 5;
printf("%s",a);
return 0;
}
output : Segmentation Fault (SIGSEGV)
When you forgot to allocate memory to a pointer, but try to use it.
#include <stdio.h>
typedef struct{
int a;
} myStruct;
int main(){
myStruct *s;
/* few lines of code */
s->a = 5;
return 0;
}
output : Segmentation Fault (SIGSEGV)
In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection,
notifying an operating system the software has attempted to access a
restricted area of memory. -WIKIPEDIA
You might be accessing the computer memory with the wrong data type. Your case might be like the code below:
#include <stdio.h>
int main(int argc, char *argv[]) {
char A = 'asd';
puts(A);
return 0;
}
'asd' -> is a character chain rather than a single character char data type. So, storing it as a char causes the segmentation fault. Stocking some data at the wrong position.
Storing this string or character chain as a single char is trying to fit a square peg in a round hole.
Terminated due to signal: SEGMENTATION FAULT (11)
Segm. Fault is the same as trying to breath in under water, your lungs were not made for that. Reserving memory for an integer and then trying to operate it as another data type won't work at all.
Segmentation fault occurs when a process (running instance of a program) is trying to access a read-only memory address or memory range which is being used by another process or access the non-existent memory address.
seg fault,when type gets mismatched
A segmentation fault or access violation occurs when a program attempts to access a memory location that is not exist, or attempts to access a memory location in a way that is not allowed.
/* "Array out of bounds" error
valid indices for array foo
are 0, 1, ... 999 */
int foo[1000];
for (int i = 0; i <= 1000 ; i++)
foo[i] = i;
Here i[1000] not exist, so segfault occurs.
Causes of segmentation fault:
it arise primarily due to errors in use of pointers for virtual memory addressing, particularly illegal access.
De-referencing NULL pointers – this is special-cased by memory management hardware.
Attempting to access a nonexistent memory address (outside process’s address space).
Attempting to access memory the program does not have rights to (such as kernel structures in process context).
Attempting to write read-only memory (such as code segment).
I would like to understand how to correctly use the async_work_group_copy() call in OpenCL. Let's have a look on a simplified example:
__kernel void test(__global float *x) {
__local xcopy[GROUP_SIZE];
int globalid = get_global_id(0);
int localid = get_local_id(0);
event_t e = async_work_group_copy(xcopy, x+globalid-localid, GROUP_SIZE, 0);
wait_group_events(1, &e);
}
The reference http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/async_work_group_copy.html says "Perform an async copy of num_elements gentype elements from src to dst. The async copy is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a workgroup executing the kernel with the same argument values; otherwise the results are undefined."
But that doesn't clarify my questions...
I would like to know, if the following assumptions are correct:
The call to async_work_group_copy() must be executed by all work-items in the group.
The call should be in a way, that the source address is identical for all work-items and points to the first element of the memory area to be copied.
As my source address is relative based on the global work-item id of the first work-item in the work-group. So I have to subtract the local id to have the address identical for all work-items...
Is the third parameter really the number of elements (not the size in bytes)?
Bonus questions:
a. Can I just use barrier(CLK_LOCAL_MEM_FENCE) instead of wait_group_events() and ignore the return value? If so, would that be probably faster?
b. Does a local copy also make sense for processing on CPUs or is that overhead as they share a cache anyway?
Regards,
Stefan
One of the main reasons for this function existing is to allow the driver/kernel compiler to efficiently copy the memory without the developer having to make assumptions about the hardware.
You describe what memory you need copied as if it were a single-threaded copy, and async_work_group_copy gets it done for you using the parallel hardware.
For your specific questions:
I have never seen async_work_group_copy used by only some of the work items in a group. I always assumed this is because it it required. I think the blocking nature of wait_group_events forces all work items to be part of the copy.
Yes. Source (and destination) addresses need to be the same for all work items.
You could subtract your local id to get the correct address, but I find that basing the address on groupId solves this problem as well. (get_group_id)
Yes. The last param is the number of elements, not the size in bytes.
a. No. The event-based you will find that your barrier is hit almost immediately by the work items, and the data won't necessarily be copied. This makes sense because some opencl hardware might not even use the compute units at all to do the actual copy operation.
b. I think that cpu opencl implementations might guarantee L1 cache usage when you use local memory. The only way to know for sure if this performs better is to benchmark your application with various settings.
In C/C++ we're used to checking for null pointers before dereferencing them, e.g.
int *p = malloc(sizeof(int));
if (p != 0)
{
/* Do something with the pointer */
}
Hence the memory manager can never return a pointer to the first memory address (where p == 0) as the calling program will assume that the memory could not be allocated.
Does that mean that the first byte or word (for alignment purposes) is always unused, both in the entire system memory space and the process' memory space? Or is this memory used by the system or kernel, which knows which null pointers it can dereference safely?
First of all, to make it clear, malloc returning 0 means signaling an error.
In most modern operating systems the virtual address space (addresses used in a program) is not the same as the physical address space (the real addresses that the memory understands). Most modern operating systems use paging. So the addresses used in a program (the address returned by malloc for example) aren't the same as the physical ones. The OS has some mechanism to make a correspondence between them.
The OS must simply never map anything at the physical address 0 for a regular process and that address will always be invalid if the process tries to access it. The OS itself, for its own benefit can access the memory at address 0 if it so desires.
Yes, sort of. Technically you could store something at address zero, but Windows doesn't allow - you get access violation and Linux doesn't allow - you get segmentation fault. This is done assist in having a designated special value that would mean "a null pointer - a pointer that clearly isn't pointing to any live object". Maybe there're systems where storing data at address zero is outright allowed, but still they would need some special value for a "null pointer".
In most systems, there is something already at physical address 0. Some older processors simply start execution there when they come out of reset. Others may expect a vector table there (addresses for reset and interrupts). Often a ROM will be mapped there to contain these special things the processor is expecting, so it doesn't make sense to ever get a pointer there.
With virtual memory, your app and all its allocations live in a virtual memory space which is mapped to physical memory. My guess would be that the program is mapped in at virtual address 0 and so you'd still never expect malloc to return 0. This is more of a guess on my part, as I'm not intimately familiar with the details of virtual memory layout.
BTW, I've often thought processors should return zero when reading from a NULL pointer without page faulting. This would allow speculative prefetch of data even when there isn't any.
Sometimes, on various Unix architectures, recompiling a program while it is running causes the program to crash with a "Bus error". Can anyone explain under which conditions this happens? First, how can updating the binary on disk do anything to the code in memory? The only thing I can imagine is that some systems mmap the code into memory and when the compiler rewrites the disk image, this causes the mmap to become invalid. What would the advantages be of such a method? It seems very suboptimal to be able to crash running codes by changing the executable.
On local filesystems, all major Unix-like systems support solving this problem by removing the file. The old vnode stays open and, even after the directory entry is gone and then reused for the new image, the old file is still there, unchanged, and now unnamed, until the last reference to it (in this case the kernel) goes away.
But if you just start rewriting it, then yes, it is mmap(3)'ed. When the block is rewritten one of two things can happen depending on which mmap(3) options the dynamic linker uses:
the kernel will invalidate the corresponding page, or
the disk image will change but existing memory pages will not
Either way, the running program is possibly in trouble. In the first case, it is essentially guaranteed to blow up, and in the second case it will get clobbered unless all of the pages have been referenced, paged in, and are never dropped.
There were two mmap flags intended to fix this. One was MAP_DENYWRITE (prevent writes) and the other was MAP_COPY, which kept a pure version of the original and prevented writers from changing the mapped image.
But DENYWRITE has been disabled for security reasons, and COPY is not implemented in any major Unix-like system.
Well this is a bit complex scenario that might be happening in your case. The reason of this error is normally the Memory Alignment issue. The Bus Error is more common to FreeBSD based system. Consider a scenario that you have a structure something like,
struct MyStruct {
char ch[29]; // 29 bytes
int32 i; // 4 bytes
}
So the total size of this structure would be 33 bytes. Now consider a system where you have 32 byte cache lines. This structure cannot be loaded in a single cache line. Now consider following statements
Struct MyStruct abc;
char *cptr = &abc; // char point points at start of structure
int32 *iptr = (cptr + 1) // iptr points at 2nd byte of structure.
Now total structure size is 33 bytes your int pointer points at 2nd byte so you can 32 byte read data from int pointer (because total size of allocated memory is 33 bytes). But when you try to read it and if the structure is allocated at the border of a cache line then it is not possible for OS to read 32 bytes in a single call. Because current cache line only contains 31 bytes data and remaining 1 bytes is on next cache line. This will result into an invalid address and will give "Buss Error". Most operating systems handle this scenario by generating two memory read calls internally but some Unix systems don't handle this scenario. To avoid this, it is recommended take care of Memory Alignment. Mostly this scenario happen when you try to type cast a structure into another datatype and try reading the memory of that structure.
The scenario is bit complex, so I am not sure if I can explain it in simpler way. I hope you understand the scenario.