I was studying the OpenCL release functions ( clRelease<ObjectName>() ) and found it interesting that there is no function to release platform objects (more specifically, cl_platform_id).
Does anybody know the reason?
It's because you allocate the memory that holds the platform IDs yourself, with a regular malloc, rather than getting the objects from a clCreate<ObjectName>() call. So you release that memory with a regular free. I guess it's like that because platforms are host resources.
Note that the same applies to device objects.
EDIT: To clarify a bit, thanks to @chippies' comment: clGetPlatformIDs() has two uses. First, to query the number of platforms available on the system; second, to fill the memory you allocated with the actual platform(s) you decided to use. You store these platform IDs in memory that you malloc yourself, so when you are done with the platforms you simply free the memory that you malloc-ed.
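A minimal sketch of that two-call pattern (error checking omitted):

```cpp
#include <CL/cl.h>
#include <stdlib.h>

int main(void)
{
    cl_uint numPlatforms = 0;

    // First call: ask only for the number of available platforms.
    clGetPlatformIDs(0, NULL, &numPlatforms);

    // The host allocates (and therefore owns) the storage for the IDs.
    cl_platform_id *platforms =
        (cl_platform_id *)malloc(numPlatforms * sizeof(cl_platform_id));

    // Second call: fill the host-allocated array with the platform IDs.
    clGetPlatformIDs(numPlatforms, platforms, NULL);

    /* ... pick a platform, create contexts, etc. ... */

    // No clReleasePlatform(): just free the host memory you allocated.
    free(platforms);
    return 0;
}
```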
I am writing an N-body physics simulation. I would like to ask whether there is an alternative to the OpenCL clGetGLContextInfoKHR() function. I need to find out at runtime which GPU is used for OpenGL rendering so that I can use OpenCL for vertex manipulation on that same GPU (for performance reasons).
I have searched OpenCL.dll for clGetGLContextInfoKHR() using Dependency Walker, but it seems that the implementation installed on my computer does not support it, since this function is missing from the DLL. I have also tried glGetString(GL_RENDERER), but the name string it returns differs from the one returned by clGetDeviceInfo(..., CL_DEVICE_NAME, ...) (not by much, but enough to make it difficult, for example, to distinguish two GPUs from the same manufacturer). Is there any other way besides manually choosing the correct OpenCL device?
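For what it's worth, clGetGLContextInfoKHR is an extension entry point, so implementations are not required to export it from OpenCL.dll; it is normally looked up at run time through the platform. A rough sketch, assuming an OpenCL 1.2+ platform and a properties list built from the live GL context (CL_GL_CONTEXT_KHR, CL_WGL_HDC_KHR, CL_CONTEXT_PLATFORM):

```cpp
#include <CL/cl.h>
#include <CL/cl_gl.h>   // cl_gl_context_info, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR

// Locally defined pointer type for the extension function.
typedef cl_int (CL_API_CALL *PFN_clGetGLContextInfoKHR)(
    const cl_context_properties *, cl_gl_context_info, size_t, void *, size_t *);

// props is assumed to hold CL_GL_CONTEXT_KHR, CL_WGL_HDC_KHR and
// CL_CONTEXT_PLATFORM entries describing the current OpenGL context.
cl_device_id currentGLDevice(cl_platform_id platform,
                             const cl_context_properties *props)
{
    PFN_clGetGLContextInfoKHR pGetGLContextInfo =
        (PFN_clGetGLContextInfoKHR)clGetExtensionFunctionAddressForPlatform(
            platform, "clGetGLContextInfoKHR");

    cl_device_id device = NULL;
    if (pGetGLContextInfo)
        pGetGLContextInfo(props, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
                          sizeof(device), &device, NULL);
    return device;
}
```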
Thanks for the help!
I can't find any information about a requirement to release the buffers within an IMFSample before releasing the IMFSample itself. If I just release the IMFSample, does that automatically release its buffers? I'm writing a video player app and I'm receiving samples from IMFSourceReader::ReadSample. While the code runs I see a small increase in memory usage in VS2017, and I'm not sure yet whether it's a leak. The code I'm using is based on the sample code in this post:
Media Foundation webcam video H264 encode/decode produces artifacts when played back
I found the IMFSample::RemoveAllBuffers method, which may or may not release the buffers; the documentation doesn't specify. Maybe this needs to be called before releasing the IMFSample? https://msdn.microsoft.com/en-us/library/windows/desktop/ms703108(v=vs.85).aspx
(I also ran across another related post in my research, but I don't think it applies to my question:)
Should I release the returned IMFSample of an internally allocated MFT output buffer?
Thanks!
Regular COM interface pointer management rules apply: you just release the IMFSample pointer and that's it.
In certain cases releasing a sample pointer does not lead to actual freeing of memory, because samples might be pooled: the released sample object is returned to its parent pool and is ready for reuse.
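A minimal sketch of that pattern with a synchronous IMFSourceReader (hypothetical helper; error handling trimmed):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

// pReader is an already-configured IMFSourceReader*.
void ReadOneSample(IMFSourceReader *pReader)
{
    DWORD streamIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    IMFSample *pSample = nullptr;

    HRESULT hr = pReader->ReadSample(
        MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
        &streamIndex, &flags, &timestamp, &pSample);

    if (SUCCEEDED(hr) && pSample)
    {
        // ... use the sample (e.g. ConvertToContiguousBuffer, Lock/Unlock) ...

        // Releasing the sample drops its reference; no explicit
        // RemoveAllBuffers call is made here.
        pSample->Release();
        pSample = nullptr;
    }
}
```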
Just to echo what Roman said: if you only consume an IMFSample, you just need to release that IMFSample. If you call AddBuffer on the IMFSample yourself, you have to remove the buffer again (I agree this is not clear, but if the buffers were removed automatically at that point, it would create problems for the component that owns the IMFSample, because it would no longer find its buffers afterwards). That said, this design can cause problems if nobody calls RemoveAllBuffers at the right time.
Different components are not supposed to call AddBuffer on the same IMFSample, but it is possible to do so.
This is bad design for memory management; you have to deal with it.
Last but not least.
According to Microsoft you have to release the buffers as well (whether before or after releasing the sample doesn't matter).
See: Decode the audio
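For reference, the buffer handling in that example looks roughly like the following sketch (hypothetical helper; pSample would come from ReadSample, error handling trimmed):

```cpp
#include <mfapi.h>
#include <mfidl.h>

// Get a contiguous buffer from the sample, use it, release the buffer,
// then release the sample itself.
void ConsumeAndRelease(IMFSample *pSample)
{
    IMFMediaBuffer *pBuffer = nullptr;
    if (SUCCEEDED(pSample->ConvertToContiguousBuffer(&pBuffer)))
    {
        BYTE *pData = nullptr;
        DWORD cbData = 0;
        if (SUCCEEDED(pBuffer->Lock(&pData, nullptr, &cbData)))
        {
            // ... consume cbData bytes at pData (e.g. write to the output) ...
            pBuffer->Unlock();
        }
        pBuffer->Release();   // release the buffer reference we obtained
    }
    pSample->Release();       // then release the sample
}
```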
I am preparing my OpenCL-accelerated toolkit for release. Currently, I compile my OpenCL kernels into binaries targeted at a particular card.
Are there other ways of discouraging reverse engineering? Right now, I have many OpenCL binaries in my release folder, one for each kernel. Would it be better to splice these binaries into one single file, or even embed them in the host binary and somehow read them in using a special offset?
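One way to do the "embed them in the host binary" variant is to link each kernel binary in as a byte array and hand it to clCreateProgramWithBinary. A rough sketch, where g_kernel_bin and g_kernel_bin_len are hypothetical symbols you would generate at build time (e.g. with xxd -i):

```cpp
#include <CL/cl.h>

// Hypothetical: kernel binary embedded in the host executable as a byte array
// instead of shipping separate .bin files.
extern const unsigned char g_kernel_bin[];
extern const size_t        g_kernel_bin_len;

cl_program loadEmbeddedProgram(cl_context ctx, cl_device_id dev)
{
    const unsigned char *bins[]    = { g_kernel_bin };
    size_t               lengths[] = { g_kernel_bin_len };
    cl_int binStatus = CL_SUCCESS, err = CL_SUCCESS;

    cl_program prog = clCreateProgramWithBinary(
        ctx, 1, &dev, lengths, bins, &binStatus, &err);

    // The binary still has to be finalized ("built") for the device.
    if (err == CL_SUCCESS)
        err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

    return prog;
}
```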
OpenCL 2.0 and SPIR-V can be used for this, but they are not available on all platforms yet.
Encrypt the binaries. Keep the keys on a server and have clients request them at the time of use. Of course the keys should be encoded too (using a variable value such as the server's time, maybe). Then decode on the client and use the result as the binary kernel.
I'm no encryption pro, but I would use multiple algorithms applied multiple times to make it harder: individually they might be crackable within several months (roughly the time until the next version of your GPGPU software anyway), so layering helps. Even a simple homemade scheme, such as reversing the bit order of all the data (the 1st bit goes to the nth position, the nth goes to the 1st), should look hard to entry-level attackers.
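A toy sketch of that bit-reversal idea, only meant to slow down casual inspection rather than provide real security; applying it twice restores the original bytes, so the same routine can decode the blob right before clCreateProgramWithBinary:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i of the input becomes bit (N-1-i) of the output, where N is the
// total number of bits. The mapping is an involution, so the function is
// its own inverse.
std::vector<uint8_t> reverseBits(const std::vector<uint8_t> &in)
{
    const size_t nbits = in.size() * 8;
    std::vector<uint8_t> out(in.size(), 0);
    for (size_t i = 0; i < nbits; ++i) {
        const int    bit = (in[i / 8] >> (i % 8)) & 1;
        const size_t j   = nbits - 1 - i;
        out[j / 8] |= bit << (j % 8);
    }
    return out;
}
```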
Warning: some profiling tools can capture the kernel code at run time, so you might add many (maybe hundreds of) trivial kernels, at no performance penalty, to hide the real one in a crowded timeline; or disable profiling through kernel options; or inject a deliberate error (maybe some broken events in the queues) and then restart, so the profiler cannot attach.
Maybe you could obfuscate the final C99 kernel code so it becomes unreadable to humans. If someone can still read it, they didn't need to hack anything in the first place.
Maybe most effectively: don't do anything technical at all; just secure the copyright on your genuine algorithm and state it in a txt file, so people can look but cannot dare to copy it for money.
If the kernel can be rewritten as an "interpreted" version without a performance penalty, you can send bytecodes from the server to the client. When someone inspects it with a profiler, they see only the interpreter code, not the real working algorithm, since that depends on the bytecodes delivered from the server as "data" (a buffer). For example, c=a+b becomes an if(..) / else if(...) / switch(case ...) dispatcher that has no meaning without the data feed.
On top of all this, you can buy time against someone trying to reverse-engineer it by picking variable names that trigger their "selective perception" so they can't focus for a while, and develop a meaner version in the meantime. For example, c=a+b becomes bulletsToDevilsEar=breakDevilsLeg+goodGame.
I want to ship OpenCL code that should work on all OpenCL 1.1 compatible GPUs. Rather than buying a bunch of GPUs and testing on them, are there any tools that can help ensure reliability?
If anyone has experience shipping OpenCL applications to a wide hardware base, I'd be interested in knowing about any other methods for testing reliability.
I have a bit of knowledge on this. Unfortunately, the answer is: it depends on what the kernel is doing.
My biggest gripe is with NVIDIA and OpenCL, since they don't seem to support vector types (float2, float4, etc.) or global offsets well. Kind of obnoxious. Intel and ATI are both solid, but even then vector sizes can differ. None of this really matters if you are doing image convolution.
It matters if you want to run the AMD FFT on an NVIDIA card, are doing matrix math, etc. To address the vector issue, you can write multiple kernels that each use a different vector size and call the right one, e.g. MatrixMult_float4(...).
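A sketch of how that per-device dispatch could look, using the device's preferred vector width as a heuristic (the kernel names are the hypothetical variants mentioned above):

```cpp
#include <CL/cl.h>
#include <string>

// Pick a kernel variant per device instead of assuming float4 support everywhere.
std::string pickMatrixMultKernel(cl_device_id dev)
{
    cl_uint width = 1;
    clGetDeviceInfo(dev, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                    sizeof(width), &width, NULL);

    if (width >= 4) return "MatrixMult_float4";
    if (width >= 2) return "MatrixMult_float2";
    return "MatrixMult_float";
}
```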
You can check whether your code compiles using the AMD KernelAnalyzer2, although it needs some component of the Catalyst drivers, so it only works for me on PCs with AMD GPUs. There is also the Intel Kernel Builder, which works for devices with Intel OpenCL SDK support. Nvidia's implementation has bugs in it, especially on newer GPUs in my experience, so there the best option is to test one GPU from each generation.
To avoid extensions and validate CL language versions, one could try to test-compile the code using LLVM, or just obtain the grammar for validation, e.g. as a BNF.
There's a promising open source project, which probably contains useful stuff: http://bazaar.launchpad.net/~pocl/pocl/master/files/head:/lib/CL/
However, the problems I encountered were:
Newline characters (CR, LF, CRLF) in OpenCL source files caused build breakage on certain implementations. Specifying one of these as the only valid line ending would be silly, and if you edit source files on different platforms in conjunction with an SCM it quickly gets inconvenient. So I remove comments and normalize line breaks before compilation (a naive sketch of such a clean-up pass follows below).
Performance: feeding the GPU efficiently using multithreading; different hardware constellations have different bottlenecks. Here I needed a client-side pipeline with multiple dispatcher threads. Of course, how much work remains for the CPU depends on the task and on the capabilities, number, and resources of the compute devices. Things that required serialized execution or dynamic loop counts were typical candidates.
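Returning to the first point, here is a naive sketch of such a pre-compilation clean-up pass (it ignores corner cases such as comment markers inside string literals):

```cpp
#include <string>

// Strip // and /* */ comments and normalize CR / LF / CRLF to '\n' before
// handing the source to clCreateProgramWithSource.
std::string normalizeKernelSource(const std::string &src)
{
    std::string out;
    out.reserve(src.size());
    for (size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\r') {                      // CR or CRLF -> LF
            out += '\n';
            if (i + 1 < src.size() && src[i + 1] == '\n') ++i;
        } else if (src.compare(i, 2, "//") == 0) { // line comment
            while (i < src.size() && src[i] != '\n' && src[i] != '\r') ++i;
            --i;                                   // reprocess the line break
        } else if (src.compare(i, 2, "/*") == 0) { // block comment
            i = src.find("*/", i + 2);
            if (i == std::string::npos) break;
            ++i;                                   // skip the closing '/'
        } else {
            out += src[i];
        }
    }
    return out;
}
```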
I'm looking for comparisons between OpenCL and DirectCompute, but I haven't found anything. OpenCL's advantages of being cross-platform and having a wider range of supported GPUs don't matter to me. I'm fine with coding on Windows against DX11 GPUs only. Assuming that, what are the pros and cons of each API?
I know this question was raised before, but I'm looking for more details.
I'm not interested in CUDA, since I don't want to restrict myself to only Nvidia hardware.
Probably the biggest difference for a coder is that DirectCompute is programmed in a language similar to HLSL, while OpenCL is programmed in a C-like language.
Another difference to consider is that, generally, for commodity level GPUs, the DirectX support is better (faster and less buggy) than OpenGL support on Windows. This may translate to more stable support for DirectCompute, but really, this is just speculation.
Well, the major advantage of OpenCL is that it is not limited to graphics cards. You can make use of your multicore CPU, graphics card, and potentially any number of other hardware acceleration devices (DSPs, etc.) all from the same program.
I'm not sure whether DirectCompute allows that freedom.
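As an illustration of that point, a short sketch that enumerates every OpenCL device of every type (CPU, GPU, accelerator) across all platforms; error checking omitted:

```cpp
#include <CL/cl.h>
#include <cstdio>

int main()
{
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, NULL, &numPlatforms);
    if (numPlatforms == 0) return 0;

    cl_platform_id platforms[16];
    clGetPlatformIDs(numPlatforms < 16 ? numPlatforms : 16, platforms, NULL);

    for (cl_uint p = 0; p < numPlatforms && p < 16; ++p) {
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
        if (numDevices == 0) continue;

        cl_device_id devices[16];
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                       numDevices < 16 ? numDevices : 16, devices, NULL);

        for (cl_uint d = 0; d < numDevices && d < 16; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            std::printf("platform %u device %u: %s\n", p, d, name);
        }
    }
    return 0;
}
```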
The OpenCL cross-platform-ness is not just a detail, as the host code (the one calling the OpenCL API and submitting kernels) can itself be cross-platform (see link text, link text...).
Write once, run on any GPGPU, anywhere.
Otherwise the OpenCL tooling is really getting better, with an ATI Stream plugin for Visual Studio, the NVidia & ATI SDKs that contain tons of samples, etc.
Another option now is C++ AMP, which gives you modern C++ syntax without the need for a separate compiler while still preserving hardware portability. Please follow the links from here for more info, and feel free to post questions as you have them: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/09/13/c-amp-in-a-nutshell.aspx
I use OpenCL because I can easily port my app to Linux, but with DirectCompute this is not possible.
I also think that the performance of OpenCL implementations will improve with time (so that it reaches the same level as CUDA for NVidia cards), and that the (driver) bugs will (hopefully ;) ) be eliminated over time.