What's a sub-buffer object in OpenCL for? - opencl

The align requirement seems to render at least the region part of this functionality almost completely useless.
Could anyone give me an example of when to create a sub-buffer from a region of a buffer?
And am I right that I can create a readonly or writeonly sub-buffer from a read-write buffer? If I can, will I benefit from this read/writeonly reference to a actually read-write buffer?

The purpose is to allow different parts of a buffer to be independently updated. One example would be if you want different devices to update different parts of your data structure. Rather than copying regions into new buffers, passing to devices, getting data back and remerging you can create sub-buffers and pass those to the devices.
You can create a read-write sub-buffer though. clCreateSubBuffer allows CL_MEM_READ_WRITE.

Related

ESP32 Partition and Data Storage

I am trying to write firmware code for RFID device which will have config data storage as well as the temporary storage that maybe can be read and then if convenient be removed.
I am using Arduino IDE to program this on an ESP32 Wroom32. I have tried to understand how the storage actually works, finding various resources. One being datasheet of the same, that says that there could be 4 MB of program code storage possible, and that sounds fantastic, my question is if for example I take EEPROM library and save about 214 bytes to config which will rarely be touched, where is it exactly being stored? Is it simply in NVS? I can see that the default settings show me about 1310720 Bytes of storage and I know that I can utilise other partitions as well to store more in case I ever try to have more sketch storage than 1310720 Bytes.
My question is if I am trying to store data such as config and real time data, how much would I possibly be able to store? Is there a limit? Would it cause any kind of problems if I try to use the other such partitions to write the code? Will it be only NVS that is storing that data or can I utilise the other app0, app1, spiffs etc to store extra Bytes? A lot of the resources are confusing me, here are the data that I am referring to from online 1 and 2. Any idea would help me proceed very further.
P.S. I am aware that the EEPROM library has been deprecated and I shall use either Preferences or littlefs for better management but if I am aware correctly I can still utilise them, and without much issue that will work since there is still compatibility for that. I am also curious about using inbuilt SRAM of RTC with the RTC attribute RTC_DATA_ATTR, since I hope to also utilise deep sleep mode incorporated.
My question is if I am trying to store data such as config and real time data, how much would I possibly be able to store? Is there a limit?
It depends. First on the module; there is ESP32-WROOM with 4MB flash but you could also order different flash sizes.
Then the question is: how big is your application (code)? Obviously this needs to be saved on the flash as well, reducing the total usable amount for data storage (by the size of the application). Also there is a bootloader which needs some small space as well.
Next, ESP32 is using a partition scheme. One partition is reserved for the bootloader. The rest can be divided between one or more application partitions, NVS partitions, and possibly other utility partitions (i.e. OTAData).
If you are using the OTA functions, there will be at least 3 application partitions of equal size, further reducing the total usable amount for data storage.
So the absolute upper limit of what you can store using NVS functions is the size of your NVS partition. However since it's a key-value storage, you must take into account the size of the key, which can be considerably larger than the data you store (up to 12 times for a 12 character key and a uint8 value).
So there is no way to say exactly how much data you can put into the system without knowing exactly how you're going to use it. For example, you could store one very large "blob" value that could take "up to 97.6%" of the partition size. But you could not store 10 "blob" values of 1/10 (9.76%) the size since you must take into account the keys and some flash metadata used internally.
Would it cause any kind of problems if I try to use the other such partitions to write the code?
That depends on what these partitions are used for. If you override the partition table, or bootloader, or your application code, yes there will be problems. If there is "free space" then it won't be a problem, but then you should redefine this free space as NVS space. It's nice of Espressif to provide this NVS library, dont work around it, work with it.
Using Espressif's esptool you can create custom partition tables where you could minimize the size of the application partition to just barely fit your application, and maximize the NVS partition size. This way you will get the most storage out of your device without manually implementing a filesystem. If you are using OTA, you should leave some empty room in your application partition, in case your application code grows, as it usually does.
Will it be only NVS that is storing that data or can I utilise the other app0, app1, spiffs etc to store extra Bytes?
You absolutely can, but you will destroy whatever data is on that partition. And you will have lots of work to do, because you'll have to implement all of this yourself (basically roll your own flash driver).
If you don't need OTA, you dont need app0/app1 partitions at all.
Note that SPIFFS is also a way to store data, except it's not key-value but file-based. If you dont need it, remove that partition, and fill the space with your NVS partition.
On the other hand, SPIFFS is probably a better alternative if you are really tight on flash space, since you can omit the key and do your own referencing.

How are you supposed to update a texture per frame in Vulkan?

I'm trying to work with 2D in vulkan along with 3D. So right now testing out updating a texture for every frame as whatever 2D is going on. I've gotten something of a texture updater working, the problem is that it's very slow and probably not the way it's supposed to be done. Is there any better way of getting this done? The code is based on the https://vulkan-tutorial.com/ code.
https://vulkan-tutorial.com/code/26_depth_buffering.cpp
void UpdateTexture()
{
vkDeviceWaitIdle(device);
vkFreeMemory(device, textureImageMemory, nullptr);
VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;
createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);
void* data;
vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
memcpy(data, pixel2.data(), static_cast<size_t>(imageSize));
vkUnmapMemory(device, stagingBufferMemory);
createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
vkDestroyBuffer(device, stagingBuffer, nullptr);
vkFreeMemory(device, stagingBufferMemory, nullptr);
createTextureImageView();
createDescriptorPool();
createDescriptorSets();
createCommandBuffers();
}
This code looks like a direct translation of some OpenGL code, and not particularly good/modern OpenGL code at that.
There's a lot wrong in this code, but most of it boils down to over-synchronization.
First, you should always view any call to vkDeviceWaitIdle as the wrong thing to do. The only exception would be when you are preparing to destroy the VkDevice itself. There is no other reason to do a full CPU/GPU sync like that.
Presumably, this synchronization exists so that you can be sure the GPU is finished using the image before modifying it. This is the wrong thing to do. You should instead employ multiple-buffering. That is, you should have two images that you use. One is currently being used in a rendering process, while the other is being transferred into.
Instead of doing a full device sync, you instead synchronize with the batch you sent two frames ago. That is, if you're wanting to transfer data for use by frame 10, then you must first do a fence-sync operation with the batch you sent in frame 8. Frame 9 is still being processed, but frame 8 is probably done by now. So the synchronization shouldn't hurt too much.
Second, never allocate memory in the middle of an operation like this. Memory gets allocated early in your application, and you leave it allocated until it's time to destroy your application. If you need a staging buffer, then keep it around and reuse it in subsequent frames. Make sure to allocate sufficient storage up-front.
Whatever your createBuffer call is doing, it seems very much like a bad idea. Vulkan is not OpenGL; Vulkan separated memory from buffers/textures that use it for a reason. Creating APIs that hide this separation basically throws all of that away.
Similarly, never unmap memory, unless you're about to destroy that memory object. There's no problem in Vulkan (or OpenGL) with leaving a piece of memory mapped indefinitely. Just map the entire memory's range and leave it mapped. Indeed, you could just pass the mapped pointer directly to your image loader, depending on how the memory get written by the image loading code (if it tries to read data from this pointer, they could be trouble).
Lastly, the commands doing the transfer need to be synchronized with the commands that consume the image. How this happens depends on which queues are being used to do the transfer.
And of course, if you want optimal performance, you may want to check to see if your implementation can read from linear images in your shader. If it can, then you may not need staging at all; you can just write the data directly to the memory in Vulkan's image format, and use it directly.
Employing all of the above is going to add a lot of complexity to your application. But that's how it's supposed to work.
A naive way consists in using the CPU to define the update depending on the time or data and then update the data for the shader, such as a MVP transformation matrix. But this is inefficient with lots of syncing and too low refresh rates, and also overloading the cpu in a loop.
So people recommend using many buffers sometimes mentioning old drivers. If someone can clarify it, that would be nice. I have a naive and probably wrong guess. If they know exactly the frame rate, then they can calculate the time for each frame and dispatch several frames in advance. But it confuses me because the frame rate is dynamic, especially for new screens with the FreeSync functionality that have dynamic refresh rates.
I have thought of a third possibility. One can use the clock directly in the shader. GL_EXT_shader_realtime_clock provides clockRealtimeEXT. It has no defined unit, and will wrap when exceeding the maximum value. But it is said "globally coherent by all invocations on the GPU". During initialization, you can measure its rate using a uniform buffer, and then assume the rate will be constant. And also manage the wrapping.
Then if you can write your shaders as a function of time, for example in a translation, that would be efficient. You just need the initial data. Remember that one must avoid if conditions in shaders.

OpenCL: Page flipping / ping-pong buffer with image3D?

I want to implement an algorithm in openCL which needs to apply a certain transformation on a 3D grayscale image several times. I have an input and an output image for my kernel. Now I would like to simply swap the input and output image and apply the kernel again. However, one image was created with read_only and the other one with write_only. Does this mean I have to use conventional buffers, or is there some trick, how to flip the two images, without first copying them from the device back to the host and back to the device again?
You say: "However, one image was created with read_only and the other one with write_only". The obvious answer is: don't do that, and you'll be fine.
The less obvious subtext is: There's a difference between creating an image with writeonly/readonly flags (which is done on the host-side via clCreateImage(...,CL_MEM_WRITE_ONLY/CL_MEM_READ_ONLY)) and the access-type inside a particular kernel (which is specified with the __read_only/__write_only qualifiers in the kernel's arguments definition).
Unless I'm totally mistaken, you can safely create your image with no restrictions (i.e. CL_MEM_READ_WRITE), then use it as a kernel's input parameter, and for the next kernel run, use it as the output parameter. You just can't mix read/write accesses during a single kernel run.

OutOfMemoryException

I have an application that is pretty memory hungry. It holds a large amount of data in some big arrays.
I have recently been noticing the occasional OutOfMemoryException. These OutOfMemoryExceptions are occurring long before my application (ASP.Net) has used up the 800mb available to it.
I have track the issue down to the area of code where the array is resized. The array contains a structure that is 74bytes in size. (I know that you shouldn't create struct's that are bigger than 16bytes), but this application is a port from a Vb6 application). I have tried changing the struct to a class and this appears to have fixed the problem for now.
I think the reason that changing to a class solves the problem has to do with the fact that when using a struct and the array is resized, a segment of memory that is large enough to store the new array needs to be reserved (e.g. (currentArraySize + increaseBySize)*74) cannot be found. This leads to the OutOfMemoryException.
This isn't the case with a class as each element of the array only needs 8bytes to store a pointer to the new object.
Is my thinking correct here?
Your assumptions regarding how arrays are stored are correct. Changing from struct to class will add a bit of overhead to each instance and you'll loose the advantages of locality as all data must be collected via a reference, but as you have observed it may solve your memory problem for now.
When you resize an array it will create a new one to hold the new data, then copy over the data, and you will have two copies of the same data in memory at the same time. Just as you expected.
When using structs the array will occupy the struct size * number of elements. When using a class it will only contain the pointer.
The same scenario is also true for List which increase in size over time, thus it's smart to initialize it with the expected number of items to avoid resizing and copying.
On 32bit systems you will hit outofmem around ~800mb, as you are aware of. One solution you can try is to but your structs on disk and read them when needed. Since they are a fixed size you can easily jump into the correct position at the file.
I have a project on Codeplex for handling large amounts of data. It has a type of Array with possibility for autogrowing, which might help your scenario if you run into problems with keeping it all in memory again.
The issue you are experiencing might be caused by fragmentation of the Large Object Heap rather than a normal out of memory condition where all memory really is used up.
See http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
The solution might be as simple as growing the array by large fixed increments rather than smaller random increments so that as arrays are freed up the blocks of LOH memory can be reused for a new large array.
This may also explain the struct->class issue as the struct is likely stored in the array itself while the class will be a small object on the small object heap.
The .NET Framework 4.5.1, has the ability to explicitly compact the large object heap (LOH) during garbage collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
See more info in: GCSettings.LargeObjectHeapCompactionMode
And a question about it: Large Object Heap Compaction, when is it good?

Why are weak pointers useful?

I've been reading up on garbage collection looking for features to include in my programming language and I came across "weak pointers". From here:
Weak pointers are like pointers,
except that references from weak
pointers do not prevent garbage
collection, and weak pointers must
have their validity checked before
they are used.
Weak pointers interact with the
garbage collector because the memory
to which they refer may in fact still
be valid, but containing a different
object than it did when the weak
pointer was created. Thus, whenever a
garbage collector recycles memory, it
must check to see if there are any
weak pointers referring to it, and
mark them as invalid (this need not be
implemented in such a naive way).
I've never heard of weak pointers before. I would like to support many features in my language, but in this case I cannot for the life of me think of a case where this would be useful. For what would one use weak pointer?
A really big one is caching. Let's think through how a cache would work:
The idea behind a cache is to store objects in memory until memory pressure becomes so great that some of the objects need to be pushed out (or are explicitly invalidated of course). So your cache repository object must hold on to these objects somehow. By holding onto them via weak reference, when the garbage collector goes looking for things to consume because memory is low, the items referred to only by weak reference will appear as candidates for garbage collection. Items in the cache that are currently being used by other code will have hard references still active, so those items will be protected from garbage collection.
In most situations you won't be rolling your own caching mechanism, but it is common to use a cache. Let's suppose you want to have a property which refers to an object in cache, and that property stays in scope for a long time. You would prefer to fetch the object from cache, but if it's not available, you can get it from persisted storage. You also don't want to force that particular object to stay in memory if pressure gets too high. So you can use a weak reference to that object, which will allow you to fetch it if it is available but also allow it to fall out of cache.
A typical use case is storage of additional object attributes. Suppose you have a class with a fixed set of members, and, from the outside, you want to add more members. So you create a dictionary object -> attributes, where the keys are weak references. Then, the dictionary doesn't prevent the keys from being garbage collected; removal of the object should also trigger removal of the values in the WeakKeyDictionary (e.g. by means of a callback).
If your language's garbage collector is incapable of collecting circular data structures, then you can use weak references to enable it to do so. Normally, if you have two objects which have references to each other, but no other outside object has a reference to those two, they would be candidates for garbage collection. But, a naïve garbage collector wouldn't collect them, since they contain references to each other.
To fix this, you make it so one object has a strong reference to the second, but the second has a weak reference to the first. Then, when the last outside reference to the first object goes away, the first object becomes a candidate for garbage collection, followed shortly thereafter by the second, since now its only reference is weak.
Another example... not quite caching, but similar: Suppose an I/O library provides an object which wraps a file descriptor and permits access to the file. When the object is collected, the file descriptor is closed. It is desired to be able to list all currently opened files. If you use strong pointers for this list, then files are never closed.
Use them when you wanted to keep a cached list of objects but not prevent those objects from getting garbage collected if the "real" owner of the object is done with it.
A web browser might have a history object that keeps references to image objects that the browser loaded elsewhere and saved in the history/disk cache. The web browser might expire one of those images (user cleared the cache, the cache timeout elapsed, etc) but the page would still have the reference/pointer. If the page used a weak reference/pointer the object would go away as expected and the memory would be garbage collected.
One important reason for having weak references is to deal with the possibility that an object may serve as a pipeline to connect a source of information or events to one or more listeners. If there aren't any listeners, there's no reason to keep sending information to the pipeline.
Consider, for example, an enumerable collection which allows updates during enumeration. The collection may need notify any active enumerators that it has been changed, so those enumerators can adjust themselves accordingly. If some enumerators get abandoned by their creators, but the collection holds strong references to them, those enumerators will continue to exist (and process update notifications) as long as the collection exists. If the collection itself will exist for the lifetime of the application, those enumerators will effectively become a permanent memory leak.
If the collection holds weak references to the enumerators, this problem can be largely solved. If an enumerator is abandoned, it will be eligible for garbage collection, even though the collection still holds a weak reference to it. The next time the collection is changed, it can look through its list of weak references, send updates to the ones that are still valid, and remove from its list the ones that are not.
It would be possible to achieve many of the effects of weak references using finalizers along with some extra objects, and it's possible to make such implementations more efficient than those using weak references, but there are many pitfalls and it's hard to avoid bugs. It's much easier to make a correct approach using WeakReference. The approach may not be optimally efficient, but it won't fail badly.
Weak Pointers keep whatever holds them from becoming a form of "life support" for the object the pointer points to.
Say you had a Viewport class, and 2 UI classes, and a buch of Widget classes. You want your UI to control the lifespan of the Widgets it creates, so your UI keeps SharedPtrs to all the Widgets it controls. For as long as your UI object is alive, none of the Widgets it refrences will be garbage collected (thanks to SharedPtr).
However, the Viewport is your class that actually does the drawing, so your UI needs to pass the Viewport a pointer to the Widgets so that it can draw them. For whatever reason, you want to change your active UI class to the other one. Lets consider two scenarios, one where the UI passed the Viewport WeakPtrs and one where it passed SharedPtrs (pointing to the Widgets).
If you had passed the Viewport all the Widgets as WeakPointers, as soon as the UI class was deleted there would be no more SharedPointers to the Widgets, so they would be garbage collected, the Viewport's references to the objects wouldn't keep them on "life support", which is exactly what you want because you aren't even using that UI anymore, much less the Widgets it created.
Now, consider you had passed the Viewport a SharedPointer, you delete the UI, and the Widgets are NOT garbage collected! Why? because the Viewport, which is still alive has an array (vector or list, whatever) full of SharedPtrs to the Widgets. The Viewport has in effect became a form of "life support" for them, even though you had deleted the UI that was controlling the widgets for another UI object.
Generally, a language/system/framework will garbage collect anything unless there is a "strong" reference to it somewhere in memory. Imagine if everything had a strong reference to everything, nothing would ever get garbage collected! Sometimes you want that behavior sometimes you don't. If you use a WeakPtr, and there are no Shared/StrongPtrs left pointing at the object (only WeakPtrs), then the objects will be garbage collected despite the WeakPtr references, and the WeakPtrs (should be) set to NULL (or deleted, or something).
Again, when you use a WeakPtr you're basically allowing the object you're giving it too to be able to access the data, but the WeakPtr won't prevent garbage collection of the object it points to like a SharedPtr would. When you think SharedPtr, think "life support", WeakPtr, NO "life support." Garbage collection won't (generally) occur until the object has zero life support.
Weak references can for example be used in caching scenarios - you can access data through weak references, but if you don't access the data for a long time or there is high memory pressure, the GC can free it.
The reason for garbage collection at all is that in a language like C where memory management is totally under explicit control of the programmer, when object ownership is passed around, especially between threads or, even harder, between processes sharing memory, avoiding memory leaks and dangling pointers can become very hard. If that weren't hard enough, you also have to deal with the need to have access to more objects than will fit in memory at one time—you need to have a way to have free up some objects for a while so that other objects can be in memory.
So, some languages (e.g., Perl, Lisp, Java) provide a mechanism where you can just stop "using" an object and the garbage collector will eventually discover this and free up memory used for the object. It does this correctly without the programmer worrying about all the ways they can get it wrong (albeit there are lots of ways programmers can screw this up).
If you conceptually multiply the number of times you access an object by the time that it takes to compute the value of an object, and possibly multiply again by the cost of not having the object readily available or by the size of an object since keeping a large object around in memory can prevent keeping several smaller objects around, you could classify objects into three categories.
Some objects are so important that you want to explicitly manage their existence—they will not be managed by the garbage collector or they must never be collected until explicitly freed. Some objects are cheap to compute, are small, are not accessed frequently or have similar characteristics that allow them to be garbage collected at any time.
The third class, objects which are expensive to be recomputed but could be recomputed, are accessed somewhat frequently (perhaps for a short burst of time), are of large size, and so on are a third class. You'd like to keep them in memory as long as possible because they might be reused again, but you don't want to run out of memory needed for critical objects. These are candidates for weak references.
You want these objects kept around as long as possible if they aren't conflicting with critical resources, but they should be dropped if memory is needed for a critical resource because it can be recomputed again when needed. These are hat weak pointers are for.
An example of this might be pictures. Say you have a photo web page with thousands of pictures to display. You need to know how many pictures to lay out and maybe you have to do a database query to get the list. The memory to hold a list of a few thousand items is probably very small. You want to do the query once and keep it around.
You can only physically show perhaps a few dozen pictures at a time, though, in a pane of a web page. You don't need to fetch the bits for the pictures that the user can't be looking at. When the user scrolls the page, you'll gather the actual bits for the pictures visible. Those pictures could require many megabytes to show them. If the user scrolls back and forth between a few scroll positions, you'd like not to have to refetch those megabytes over and over again. But you can't keep all the pictures in memory all the time. So you use weak pointers.
If the user just looks at a few pictures over and over again, they may stay in cache and you don't have to refetch them. But if they scroll enough, you need to free up some memory so the visible pictures can be fetched. With a weak reference, you check the reference just before you use it. If its still valid, you use it. If its not, you make the expensive calculation (fetch) to get it.

Resources