any one help i execute opencl kernel program for image processing and want to know the process faster when the kerenl execute on cpu or gpu
i using tool "GPU caps viewer" take picture for specification cpu and gpu and want to know the best for run opencl kernel code to the cpu or gpu . why??
the gpu info
and for cpu
amd info
intel
and i want to know why three option different for cpu (intel , amd) info
constant buffer amd 64KB intel 128KB
max sampler amd 16 intel 480
opencl extention amd 16 intel 14
work group size amd 16 intel 14
any help
thanks
Related
I'm starting OpenCL. As I've understood, a platform is a vendor-specific OpenCL implementation, and a device is a processing unit that can be used by a platform.
I've made a simple C++ code that prints the platform name and for each of its devices prints the device name, and its output is
Platform 0: Intel(R) OpenCL HD Graphics
Device 0: Intel(R) Gen9 HD Graphics NEO
Platform 1: Intel(R) CPU Runtime for OpenCL(TM) Applications
Device 0: Intel(R) Core(TM) i5-6200U CPU # 2.3GHz
My question is, shouldn't I expect the two devices to be under the same platform? Given I have a laptop, and the GPU is integrated together with the processor. Also, will this then forbid me for assigning both GPU and CPU devices to the same context? (which I've read has some memory sharing advantages)
shouldn't I expect the two devices to be under the same platform
Only if the vendor provides a platform with drivers for both those devices. I'm not sure if Intel's "NEO" platform has also CPU driver, but i'm pretty sure the "CPU runtime" only has driver for the CPU, not the iGPU. You'll have to list the devices of each platform to find out.
will this then forbid me for assigning both GPU and CPU devices to the same context
You have to list the devices - if NEO has both devices then you can use that. But you can't have devices from different platforms in a single context.
I can view the Intel HD Graphics Command Queue with VTune, but I cannot the CPU Command Queue. Why? It is the expected behavior, to only capture GPU "events" but not those from the CPU that are independent of the GPU?
The same OpenCL program (a simple vector addition) running in the GPU shows the events (NDRange, etc) but in the CPU not (you only see clWrite,Read Buffer and clBuildProgram). Also, you cannot see any info in the region where CPU is working with OpenCL (clWaitForEvents).
CPU:
GPU:
I have two basic questions about getting started with GPGPU programming:
(1) If I do GPGPU on my Mac, will it affect the image on the monitor? How do I know the windowing system or other programs output is not competing for the GPU?
(2) Is there a way to try out AMD GPU programming somewhere without buying a high-end graphics card? The rental cloud places I have seen all use Nvidia. My computation would be logical integer (bit-twiddling) compute-bound, and I have read that AMD GPU is better for these applications.
1) It won't affect the image on the monitor. And to check if another process is using the GPU you'll need something like AMD System Monitor for mac (this application only works on Windows)
2) Any radeon HD 4xxx and above supports OpenCL (previous card might support this, but I'm not sure). This mean any new AMD card you can buy, including the cheapest ones support OpenCL.
The difference between the expensive cards and the cheap ones is the number of stream processors. For example
Radeon HD 4350: 80 stream processors
Radeon x290: 2560 stream processors
My laptop had
- one CPU core i5: Intel(R) Core(TM) i5-3210M CPU # 2.50GHz
- one Graphic card: Intel(R) HD Graphics 4000
- one Nvidia card ( external card ): GeForce GT 630M
But When I tried to use JavaCL.createBestContext(), it looks like just use one card Intel HD Graphics. So I tried to combine 3 : CPU and 2 GPUs by using:
List<CLDevice> devices = new ArrayList<CLDevice>();
// try to list all platform and devices
for(CLPlatform platform : JavaCL.listPlatforms()) {
//System.out.println(platform.getName());
for (CLDevice device : platform.listAllDevices(true)) {
System.out.println(device.getName().trim());
devices.add(device);
}
}
CLDevice device1 = (CLDevice)devices.get(0);
CLDevice device2 = (CLDevice)devices.get(1);
CLDevice device3 = (CLDevice)devices.get(2);
CLContext context = JavaCL.createContext(null, device1, device2, device3);
But I got error when try to use 3 at the same. So How can compile CPU and GPUs in JavaCL ? Because I read that OpenCL is standard to support parallel programming by using CPU and GPU. So If I miss something, please let me know. Any idea or answers will be appreciated.
Thanks,
Duy.
Sadly, its not that easy. When creating a single context across multiple devices, the devices all have to come from the same platform. Creating a context containing the Intel CPU and GPU should work, but the Nvidia GPU has to be in its own context (different platform, Nvidia not Intel).
Here's how I handle this scenario: I create a context for each device and a thread for each context. Each thread takes a portion of the data I'm working on and dispatches it to its assigned OpenCL device. This way, you can mix, CPUs, GPUs from both AMD and Nvidia, and any other hardware that comes along.
Its important to do load balancing across the threads so that you don't have faster devices sitting idle waiting for a slower device to catch up.
The definition of a platform in Khronos' OpenCL 1.0 and 1.1 specification:
Platform: The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform.
The OpenCL function clGetPlatformIDs creates an array of platforms, implying that multiple platforms are possible. Is it safe to assume that a given OpenCL host has only one platform?
In other words, will I lose anything on any host by doing this:
cl_platform_id platform_id;
cl_uint num_platforms;
errcode = clGetPlatformIDs(1, &platform_id, &num_platforms);
I wouldn't rely on there being only one Platform. When you have multiple OpenCL implementations on one system (which should be possible with the OpenCL ICD, although I'm not sure if that is only planned or already finished), you should get multiple platforms, one for each opencl implementation. One example where there could be multiple opencl implementations would be an nvidia implementation to run opencl on gpu and an amd implementation to run on cpu, so that it not that far fetched either.
edit: look at http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=71 for (better) desciption of this
To complement the answer of Tim Child with an example (Thinkpad X201 with both AMD and Intel SDK's installed):
$ python /usr/share/doc/python-pyopencl/examples/benchmark-all.py
Execution time of test without OpenCL: 10.9563219547 s
===============================================================
Platform name: AMD Accelerated Parallel Processing
Platform profile: FULL_PROFILE
Platform vendor: Advanced Micro Devices, Inc.
Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
---------------------------------------------------------------
Device name: Intel(R) Core(TM) i5 CPU M 520 # 2.40GHz
Device type: CPU
Device memory: 7799 MB
Device max clock speed: 2399 MHz
Device compute units: 2
Execution time of test: 0.00842799 s
Results OK
===============================================================
Platform name: Intel(R) OpenCL
Platform profile: FULL_PROFILE
Platform vendor: Intel(R) Corporation
Platform version: OpenCL 1.1 LINUX
---------------------------------------------------------------
Device name: Intel(R) Core(TM) i5 CPU M 520 # 2.40GHz
Device type: CPU
Device memory: 7799 MB
Device max clock speed: 2400 MHz
Device compute units: 2
Execution time of test: 0.00260659 s
Results OK
Yes, there is one Platform Id for each vendors OpenCL installation. So if you install AMD's and Intel's OpenCL SDK's you will get one Platform Id for each.
Even if you assume that a host has only one platform, you would have to figure out what the Id of that platform is, before calling clGetPlatformInfo. So its better if you call clGetPlatformIDs, pick up a default or user supplied platform and then call clGetPlatformInfo.