Do ioctl Functions Use 32-bit Pointers? [duplicate]

Does C guarantee that sizeof(long) == sizeof(void*)? If not, why is it used so extensively in Linux kernel source code?
I looked at sizeof (int) == sizeof (void*)? but that talks about sizeof(int) vs sizeof(void *).

No, the C standard does not guarantee that sizeof(long) == sizeof(void *).
In practice, on Windows 64-bit systems, the values are 4 for sizeof(long) and 8 for sizeof(void *). This design conforms to the C standard. See also What is the bit-size of long on 64-bit Windows?
Those implementing the Linux kernel have presumably decided that they'll never port the code to a system that follows the Windows 64-bit LLP64 (long long and pointers are 64-bit quantities) system, and therefore don't need to concern themselves with whether the sizes are different. Both the 32-bit systems (ILP32) and the 64-bit systems (LP64) do have sizeof(long) == sizeof(void *). But the C standard does not guarantee it.
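As a quick illustration (a minimal sketch, not part of the original question), printing both sizes makes the ILP32/LP64 vs. LLP64 difference visible; on 64-bit Linux this typically prints 8 and 8, while on 64-bit Windows it prints 4 and 8:

#include <stdio.h>

int main(void)
{
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}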

The only guarantees are:
void * and char * have the same size and alignment;
Pointers to qualified types have the same size and alignment as pointers to their unqualified equivalents (i.e., sizeof (const int *) == sizeof (int *));
All struct pointer types have the same size and alignment;
All union pointer types have the same size and alignment;
That's it.
If Linux kernel developers are writing code that assumes sizeof (long) == sizeof (void *), then they've decided to limit which platforms they're going to support. Which is absolutely fine - you don't have to support every oddball architecture out there.
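If your code does rely on that assumption, you can turn the implicit platform restriction into an explicit compile-time check. A minimal C11 sketch:

#include <assert.h>

/* Fail the build on LLP64-style platforms instead of silently miscompiling. */
static_assert(sizeof(long) == sizeof(void *),
              "this code assumes sizeof(long) == sizeof(void *)");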

Related

What memory addresses can never point to a valid object

I would like to have a set of dummy addresses as flag values that can never be valid pointers.
For example, if I knew that pointers 0xffff0000 through 0xffffffff were always invalid, I could do something like this in C:
#include <stdlib.h>

enum {
    SIZE_TOO_SMALL = 0xffff0001,
    SIZE_TOO_LARGE = 0xffff0002,
    SIZE_EVEN      = 0xffff0003,
};

char *allocate_odd_arry(int size) {
    if (size % 2 == 0)
        return (char *)SIZE_EVEN;      /* cast needed: integer used as a flag pointer */
    if (size < 100)
        return (char *)SIZE_TOO_SMALL;
    if (size > 1000)
        return (char *)SIZE_TOO_LARGE;
    return malloc(size);
}
A silly example, but potentially powerful since it removes the need to send an extra flag variable. One way I could do this is to allocate a few bytes myself and use those addresses as flags, but that comes with a small memory cost for each unique flag I use.
I don't expect a portable solution, but is there any guarantee on Windows, Linux, or macOS that the addressable space will not include certain values?
For Windows I have found this article, which says that on 32-bit systems the virtual address space is 0x00000000 to 0x7fffffff, and for 64-bit systems it is 0x0000000000000000 to 0x00007fffffffffff. I am not sure if other addresses have any reserved meaning, but they ought to be safe for this use case.
Looking at Linux, the answer seems a bit more complicated because (like everything else in Linux) it is configurable. This answer on Unix SE shows how memory is divided between the kernel and user space. 0000_8000_0000_0000 to ffff_7fff_ffff_ffff is listed as non-canonical, which I think means it should never be used. The kernel space (ffff_8000_0000_0000 to ffff_ffff_ffff_ffff) seems like it ought to be safe to use as well, but I'm less sure whether a system function could ever return such a pointer.
On Mac OS I've found this article, which puts the virtual memory range at 0 to 0x0007_FFFF_FFFF_F000 (64-bit) or 0 to 0xFFFF_F000 (32-bit), so anything outside of these ranges should be fine.
There seems to be some overlap between the unused regions on all three platforms, so if you wanted to target them all with the same address it would be possible. I'm still not 100% confident that these addresses are truly safe to use on the respective OSes, so I'm still holding out for someone more knowledgeable to chime in.
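One fully portable variant of the "allocate a few bytes myself" idea from the question is to use the addresses of static dummy objects as the sentinels: each object is guaranteed to have a unique address that malloc can never return, at the cost of one byte per flag. A minimal sketch (the names are made up for illustration):

#include <stdlib.h>

/* One static byte per flag; their addresses can never alias a malloc result. */
static char size_too_small_flag;
static char size_too_large_flag;
static char size_even_flag;

#define SIZE_TOO_SMALL (&size_too_small_flag)
#define SIZE_TOO_LARGE (&size_too_large_flag)
#define SIZE_EVEN      (&size_even_flag)

char *allocate_odd_arry(int size) {
    if (size % 2 == 0)
        return SIZE_EVEN;
    if (size < 100)
        return SIZE_TOO_SMALL;
    if (size > 1000)
        return SIZE_TOO_LARGE;
    return malloc(size);
}

The caller just has to compare the returned pointer against the three sentinels before using or freeing it.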

Why does my memory have 18-digit addresses? D:

I was following a tutorial, and when I reached the pointer section I noticed that the output of this code is much larger than normal (it is ptr = 0x000000cd9d1cf504) :/ Why?
#include <iostream>

int main()
{
    int pointerTest = 6;
    void* ptr = 0;
    ptr = &pointerTest;
    std::cout << ptr << std::endl;
    std::cin.get();
}
It's not an 18-digit address - it only consists of 16 digits. The prefix 0x merely indicates that what comes after it is going to be in hexadecimal form. The other commonly used notation for hexadecimal integers is h (or sometimes x, such as in VHDL) either prefixed or postfixed (for example hCD9D1CF504, h'CD9D1CF504 or CD9D1CF504h - note that this is quite unclear unless the hexadecimal digits A-F are capitalized).
One hexadecimal digit represents 4 bits, so the pointer is 4 * 16 = 64 bits in size. In other words, the binary executable produced by your compiler is 64-bit, while the tutorial binary likely was 32-bit, as pointed out by #Hawky in the comments.
To fully understand the difference between 32-bit and 64-bit code, you'll have to study computer architecture, the x86-64 in particular. Be warned, though - if you choose to go down that route, prepare for a lifetime of pain and suffering (the worst bit being that you might just enjoy it).
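If you want to confirm what your own build is doing, checking the pointer width is enough; a minimal sketch in C (the same expressions work in C++):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* 8 bytes / 64 bits for a 64-bit build, 4 bytes / 32 bits for a 32-bit build. */
    printf("pointers are %zu bytes (%zu bits) wide\n",
           sizeof(void *), sizeof(void *) * CHAR_BIT);
    return 0;
}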

How do those bit-wise operations work, and why wouldn't it use little/small endian instead?

I found these in the Arduino.h library and was confused about the lowByte macro:
#define lowByte(w) ((uint8_t) ((w) & 0xff))
#define highByte(w) ((uint8_t) ((w) >> 8))
At lowByte: wouldn't the conversion from WORD to uint8_t just take the low byte anyway? I know they do w & 0x00ff to get the low byte, but wouldn't the cast just take the low byte?
At both the low/high: why wouldn't they use little endians, and read with size/offset?
i.e. if w is 0x12345678, the high part is 0x1234 and the low part is 0x5678; they write it to memory as 78 56 34 12 at, say, offset x
to read w, you read a word-sized value at location x
to read the high, you read a byte/uint8_t at location x
to read the low, you read a byte/uint8_t at location x + 2
At lowByte: wouldn't the conversion from WORD to uint8_t just take the low byte anyway? I know they do w & 0x00ff to get the low byte, but wouldn't the cast just take the low byte?
Yes. Some people like to be extra explicit in their code anyway, but you are right.
At both the low/high: why wouldn't they use little endians, and read with size/offset?
I don't know what that means, "use little endians".
But simply aliasing a WORD as a uint8_t and using pointer arithmetic to "move around" the original object generally has undefined behaviour. You can't alias objects like that. I know your teacher probably said you can because it's all just bits in memory, but your teacher was wrong; C and C++ are abstractions over computer code, and have rules of their own.
Bit-shifting is the conventional way to achieve this.
In the case of lowByte, yes, the cast to uint8_t is equivalent to (w) & 0xff.
Regarding "using little endians", you don't want to access individual bytes of the value because you don't necessarily know whether your system is using big endian or little endian.
For example:
uint16_t n = 0x1234;
unsigned char *p = (unsigned char *)&n;
printf("0x%02x 0x%02x\n", p[0], p[1]);
If you ran this code on a little endian machine it would output:
0x34 0x12
But if you ran it on a big endian machine you would instead get:
0x12 0x34
By using shifts and bitwise operators you operate on the value which must be the same on all implementations instead of the representation of the value which may differ.
So don't operate on individual bytes unless you have a very specific reason to.
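To make that concrete, here is a minimal sketch that splits and recombines a 16-bit word with shifts; it prints the same thing on little- and big-endian machines because it operates on the value, not on the bytes in memory:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t w = 0x1234;

    uint8_t lo = (uint8_t)(w & 0xff);  /* always 0x34 */
    uint8_t hi = (uint8_t)(w >> 8);    /* always 0x12 */

    /* Reassembling the word also needs no knowledge of the byte order. */
    uint16_t back = (uint16_t)((hi << 8) | lo);

    printf("lo=0x%02x hi=0x%02x back=0x%04x\n", lo, hi, back);
    return 0;
}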

OpenCL and Tesla M1060

I'm using the Tesla m1060 for GPGPU computation. It has the following specs:
# of Tesla GPUs 1
# of Streaming Processor Cores (XXX per processor) 240
Memory Interface (512-bit per GPU) 512-bit
When I use OpenCL, I can display the following board information:
available platform OpenCL 1.1 CUDA 6.5.14
device Tesla M1060 type:CL_DEVICE_TYPE_GPU
max compute units:30
max work item dimensions:3
max work item sizes (dim:0):512
max work item sizes (dim:1):512
max work item sizes (dim:2):64
global mem size(bytes):4294770688 local mem size:16383
How can I relate the GPU card informations to the OpenCL memory informations ?
For example:
What does "Memory Interace" means ? Is it linked the a Work Item ?
How can I relate the "240 cores" of the GPU to Work Groups/Items ?
How can I map the work-groups to it (what would be the number of Work groups to use) ?
Thanks
EDIT:
After the following answers, there is a thing that is still unclear to me:
The CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE value is 32 for the kernel I use.
However, my device has a CL_DEVICE_MAX_COMPUTE_UNITS value of 30.
In the OpenCL 1.1 Api, it is written (p. 15):
Compute Unit: An OpenCL device has one or more compute units. A work-group executes on a single compute unit
It seems that either something is incoherent here, or that I didn't fully understand the difference between Work-Groups and Compute Units.
As previously stated, when I set BLOCK_SIZE to 32 (a 32x32 work-group), the program fails with the following error:
Entry function uses too much shared data (0x4020 bytes, 0x4000 max).
The value 16 works.
Addendum
Here is my Kernel signature:
// enable double precision (not enabled by default)
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#else
#error "IEEE-754 double precision not supported by OpenCL implementation."
#endif
#define BLOCK_SIZE 16 // --> this is what defines the WG size to me
__kernel __attribute__((reqd_work_group_size(BLOCK_SIZE, BLOCK_SIZE, 1)))
void mmult(__global double * A, __global double * B, __global double * C, const unsigned int q)
{
    __local double A_sub[BLOCK_SIZE][BLOCK_SIZE];
    __local double B_sub[BLOCK_SIZE][BLOCK_SIZE];
    // stuff that does matrix multiplication with __local
}
In the host code part:
#define BLOCK_SIZE 16
...
const size_t local_work_size[2] = {BLOCK_SIZE, BLOCK_SIZE};
...
status = clEnqueueNDRangeKernel(command_queue, kernel, 2, NULL, global_work_size, local_work_size, 0, NULL, NULL);
The memory interface doesn't mean anything to an OpenCL application. It is the number of bits the memory controller has for reading from and writing to memory (the GDDR5 part in modern GPUs). The maximum global memory bandwidth is approximately pipelineWidth * memoryClockSpeed, but since OpenCL is meant to be cross-platform, you won't really need to know this value unless you are trying to figure out an upper bound for memory performance. Knowing about the 512-bit interface is somewhat useful when you're dealing with memory coalescing. wiki: Coalescing (computer science)
The max work item sizes have to do with 1) how the hardware schedules computations, and 2) the amount of low-level memory on the device -- eg. private memory and local memory.
The 240 figure doesn't matter much to OpenCL either. You can determine that each of the 30 compute units is made up of 8 streaming processor cores on this GPU architecture (because 240/30 = 8). If you query CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, it will very likely be a multiple of 8 for this device. See: clGetKernelWorkGroupInfo
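For reference, this is roughly how that query looks in host code (a sketch against the standard OpenCL API, assuming kernel and device have already been created):

size_t preferred_multiple = 0;
cl_int err = clGetKernelWorkGroupInfo(kernel, device,
        CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
        sizeof(preferred_multiple), &preferred_multiple, NULL);
/* Choose local_work_size so that its product (x*y*z) is a multiple of this value. */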
I have answered similar questions about work-group sizing; see here and here.
Ultimately, you need to tune your application and kernels based on your own benchmarking results. I find it worth the time to write many tests with various work-group sizes and eventually hard-code the optimal size.
Adding another answer to address your local memory issue.
Entry function uses too much shared data (0x4020 bytes, 0x4000 max)
Since you are allocating A_sub and B_sub, each of size 32*32*sizeof(double) bytes, you run out of local memory. The device should allow you to allocate 16 KB, or 0x4000 bytes, of local memory without an issue.
0x4020 is 32 bytes or 4 doubles more than what your device allows. There are only two things I can think of that may cause the error: 1) there could be a bug with your device or drivers preventing you from allocating the full 16kb, or 2) you are allocating the memory somewhere else in your kernel.
You will have to use a BLOCK_SIZE value less than 32 to work around this for now.
There's good news though. If you only want to hit a multiple of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE as a work-group size, BLOCK_SIZE=16 already does this for you (16*16 = 256 = 32*8). To make better use of local memory, try BLOCK_SIZE=24 (24*24 = 576 = 32*18).
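To spell out the local-memory arithmetic (assuming the two tiles above are the only __local allocations): the kernel needs 2 * BLOCK_SIZE * BLOCK_SIZE * sizeof(double) bytes, so BLOCK_SIZE=16 uses 2*256*8 = 4096 bytes (0x1000), BLOCK_SIZE=24 uses 2*576*8 = 9216 bytes (0x2400), and BLOCK_SIZE=32 uses 2*1024*8 = 16384 bytes (0x4000), which already sits exactly at the 16 KB limit, so any extra bytes (the 0x20 in the error message) push it over.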

Will a 64-bit Unsigned Integer wrap around on 32-bit system?

Simple question, need an answer quickly please!
Take this situation on a 32-bit machine:
unsigned long long n = 1;
n -= 2;
I know on a 64-bit machine, this would wrap around to the highest unsigned long long. But what would happen on a 32-bit machine, since the long long is stored as two separate words?
Thank you!
If the implementation is conforming, then the same thing happens: it will correctly wrap around. I assume this is C; the C standard requires this behavior independently of the implementation details.
A 64-bit integer type behaves the same on all architectures, including 32-bit ones. On a 32-bit machine the compiler simply emits multi-word arithmetic (for example, subtract-with-borrow across the two 32-bit halves), so the value still wraps modulo 2^64. If it didn't, programming would be quite hard, wouldn't it?
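A quick way to convince yourself (a minimal sketch; build it as 32-bit or 64-bit, the output is the same):

#include <stdio.h>

int main(void)
{
    unsigned long long n = 1;
    n -= 2;   /* unsigned arithmetic wraps modulo 2^64 regardless of the word size */
    printf("%llu\n", n);   /* prints 18446744073709551615, i.e. ULLONG_MAX */
    return 0;
}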
