Allocate Memory on the heap of Client Process using Pin - intel-pin

I am a newbie to Pin, and basically, I would like to use a Pintool to initialize a memory region, which can be read/write by the user process later during run time.
I would like to make the memory region on heap, as the data structure I want to initialize is relatively large. After reading the Pin manual, I am aware of Pin safecopy which can be used for memory copy.
However, I don't know how to allocate memory on the heap, which, should be accessible by the attached client process.
Am I clear enough? Could anyone give me some help on it? Thank you!

Related

Booting from SRAM

I was reading the specification of the microcontroller. In Booting, they mentioned three option.
1.Main Flash memory
2.System Memory
3.Embedded SRAM Memory.
First two memory is non-volatile memory, so you put your code and start booting. but SRAM is a volatile memory, when the power goes off code will be erased. so what is use of SRAM for booting? In many blogs, All advise to use SRAM for booting.
what is the use of using non-volatile memory in booting?
Disclaimer: Since you didn't tell us which microcontroller you are using, this answer has to be quite general.
Not every system start follows a power-down. The SRAM can be filled with some decent program before the reset. This can be done by hardware, or by software. In the latter case another (or the same) program ran in non-volatile memory (that is non-RAM) and filled the volatile memory (that is RAM).
SRAM keeps its contents during reset.
Many microcontrollers allow to change the selection where to boot from during run-time.

Using PIC18F K40 microcontroller flash memory as storage

I'm using PIC18F67K40 microcontroller in my project.
It has 1kB EEPROM memory and 128kB program memory (flash).
For now I'm using EEPROM to store my settings.
Application is "growing" and I realized that at some point 1kB will be not enough. Some of settings are arrays of pretty big structures.
I realize, that flash memory has 100k 10k write cycles and that I can buy external EEPROM, but I don't want to change anything in hardware and memory in this product will never reach 2k writes for sure.
My quesion is:
How can I switch from EEPROM storage to flash storage?
Do I have to recalculate some CRC after program memory changes?
Do I have to define somewhere in project settings, that I'm using some flash memory for storage?
Is there anything what I have to do in order to use flash memory like this?
100k writes is only the endurance of the data EEPROM not of the flash memory (only 10k writes). You could expand the endurance with a EEPROM emulation.
There is a really nice library from Microchip for EEPROM emulation in flash memory.
Have a look here: EEPROM emulation
I did this for a client a couple of years ago. I can't post the code for NDA and copyright reasons, but the basic trick was to use something called RTSP (Run Time Self Programming). RTSP may be going obsolete now, but whatever replaces it may work in a similar way.
Essentially the flash looks like a series of pages which can be written word at a time, but erased page at a time. What you will need to do is write some code that can unlock and erase a page then write to it. Once you have done this the page can be read as ordinary memory.
You don't need to change the settings. However make sure that the page you use is well clear of the program code.
If you want a CRC (usually a good move) you'll have to calculate it yourself.

OpenCL Buffer Creation

I am fairly new to OpenCL and though I have understood everything up until now, but I am having trouble understanding how buffer objects work.
I haven't understood where a buffer object is stored. In this StackOverflow question it is stated that:
If you have one device only, probably (99.99%) is going to be in the device. (In rare cases it may be in the host if the device does not have enough memory for the time being)
To me, this means that buffer objects are stored in device memory. However, as is stated in this StackOverflow question, if the flag CL_MEM_ALLOC_HOST_PTR is used in clCreateBuffer, the memory used will most likely be pinned memory. My understanding is that, when memory is pinned it will not be swapped out. This means that pinned memory MUST be located in RAM, not in device memory.
So what is actually happening?
What I would like to know what do the flags:
CL_MEM_USE_HOST_PTR
CL_MEM_COPY_HOST_PTR
CL_MEM_ALLOC_HOST_PTR
imply about the location of buffer.
Thank you
Let's first have a look at the signature of clCreateBuffer:
cl_mem clCreateBuffer(
cl_context context,
cl_mem_flags flags,
size_t size,
void *host_ptr,
cl_int *errcode_ret)
There is no argument here that would provide the OpenCL runtime with an exact device to whose memory the buffer shall be put, as a context can have multiple devices. The runtime only knows as soon as we use a buffer object, e.g. read/write from/to it, as those operations need a command queue that is connected to a specific device.
Every memory object an reside in either the host memory or one of the context's device's memories, and the runtime might migrate it as needed. So in general, every memory object, might have a piece of internal host memory within the OpenCL runtime. What the runtime actually does is implementation dependent, so we cannot not make too many assumptions and get no portable guarantees. That means everything about pinning etc. is implementation-dependent, and you can only hope for the best, but avoid patterns that will definitely prevent the use of pinned memory.
Why do we want pinned memory?
Pinned memory means, that the virtual address of our memory page in our process' address space has a fixed translation into a physical memory address of the RAM. This enables DMA (Direct Memory Access) transfers (which operate on physical addresses) between the device memory of a GPU and the CPU memory using PCIe. DMA lowers the CPU load and possibly increases copy speed. So we want the internal host storage of our OpenCL memory objects to be pinned, to increase the performance of data transfers between the internal host storage and the device memory of an OpenCL memory object.
As a basic rule of thumb: if your runtime allocates the host memory, it might be pinned. If you allocate it in your application code, the runtime will pessimistically assume it is not pinned - which usually is a correct assumption.
CL_MEM_USE_HOST_PTR
Allows us to provide memory to the OpenCL implementation for internal host-storage of the object. It does not mean that the memory object will not be migrated into device memory if we call a kernel. As that memory is user-provided, the runtime cannot assume it to be pinned. This might lead to an additional copy between the un-pinned internal host storage and a pinned buffer prior to device transfer, to enable DMA for host-device-transfers.
CL_MEM_ALLOC_HOST_PTR
We tell the runtime to allocate host memory for the object. It could be pinned.
CL_MEM_COPY_HOST_PTR
We provide host memory to copy-initialise our buffer from, not to use it internally. We can also combine it with CL_MEM_ALLOC_HOST_PTR. The runtime will allocate memory for internal host storage. It could be pinned.
Hope that helps.
The specification is (deliberately?) vague on the topic, leaving a lot of freedom to implementors. So unless an OpenCL implementation you are targeting makes explicit guarantees for the flags, you should treat them as advisory.
First off, CL_MEM_COPY_HOST_PTR actually has nothing to do with allocation, it just means that you would like clCreateBuffer to pre-fill the allocated memory with the contents of the memory at the host_ptr you passed to the call. This is as if you called clCreateBuffer with host_ptr = NULL and without this flag, and then made a blocking clEnqueueWriteBuffer call to write the entire buffer.
Regarding allocation modes:
CL_MEM_USE_HOST_PTR - this means you've pre-allocated some memory, correctly aligned, and would like to use this as backing memory for the buffer. The implementation can still allocate device memory and copy back and forth between your buffer and the allocated memory, if the device does not support directly accessing host memory, or if the driver decides that a shadow copy to VRAM will be more efficient than directly accessing system memory. On implementations that can read directly from system memory though, this is one option for zero-copy buffers.
CL_MEM_ALLOC_HOST_PTR - This is a hint to tell the OpenCL implementation that you're planning to access the buffer from the host side by mapping it into host address space, but unlike CL_MEM_USE_HOST_PTR, you are leaving the allocation itself to the OpenCL implementation. For implementations that support it, this is another option for zero copy buffers: create the buffer, map it to the host, get a host algorithm or I/O to write to the mapped memory, then unmap it and use it in a GPU kernel. Unlike CL_MEM_USE_HOST_PTR, this leaves the door open for using VRAM that can be mapped directly to the CPU's address space (e.g. PCIe BARs).
Default (neither of the above 2): Allocate wherever most convenient for the device. Typically VRAM, and if memory-mapping into host memory is not supported by the device, this typically means that if you map it into host address space, you end up with 2 copies of the buffer, one in VRAM and one in system memory, while the OpenCL implementation internally copies back and forth between the 2.
Note that the implementation may also use any access flags provided ( CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, and CL_MEM_READ_WRITE) to influence the decision where to allocate memory.
Finally, regarding "pinned" memory: many modern systems have an IOMMU, and when this is active, system memory access from devices can cause IOMMU page faults, so the host memory technically doesn't even need to be resident. In any case, the OpenCL implementation is typically deeply integrated with a kernel-level device driver, which can typically pin system memory ranges (exclude them from paging) on demand. So if using CL_MEM_USE_HOST_PTR you just need to make sure you provide appropriately aligned memory, and the implementation will take care of pinning for you.

How to execute large code in less ram?

I have a doubt that , in all micro controllers the flash memory much more that ram( Example: atmega16 it is 16k, However the RAM is just 1 Kb).
.
So , how exactly that code is executed , does the CPU execute directly from the Flash itself , if yes then whats the use of that small RAM given.
The flash memory is for storing the programs that you want to execute. They change seldom, so flash memory is appropriate.
The RAM is for the memory required during execution of the program: stack (local variables), heap (malloc), etc.
AVRs using a Harvard Architecture that strictly separates Program and Data Memory.
In difference to PC that laods the Programm to RAM first to execute it from RAM, the code is directly executed from Programm Memory and only runtime data is stored in the RAM.
Be aware that setting a variable as const does not necessarily create the variable and put it in flash. Although it may or may not be best off in flash, the compiler does not automatically do this.
For an example check out the following link for avr-gcc.
http://www.nongnu.org/avr-libc/user-manual/pgmspace.html

Pointers between OpenCL buffers

Consider the following. In a context there exist two buffers allocated in device memory, buffer A and buffer B. One buffer contains a pointer to something in another buffer. Assuming the host will propery keep the buffers alive between kernel invocations, is it safe to have this setup? Particularly is it guaranted that the implementation will not move buffers around thus invalidating the pointers?
It seems not, atleast if the context has more than one device.
Сlause 5.4.4 Migrating Memory Objects of the specification among other things states:
Typically, memory objects are implicitly migrated to a device for
which enqueued commands, using the memory object, are targeted.
And there seems to be no way to prohibit this migration, and no information on what happens if there is only one device in a context.
Alas it appears that the only way to keep addressing consistent is to allocate one huge buffer and do manual memory-management of it's contents storing all addresses as offsets from the beginning of the buffer.
OpenCL 1.2 does not support pointers to pointers in buffers, but it seems that OpenCL 2.0 will allow this. See the slide titled "SVM: Shared Virtual Memory" in this presentation.

Resources