I want to run some threads on my CPU (the local host) and others on a connected portable device (such as a USB device).
I know that OpenCL supports parallelization, but how do I distribute work onto portable devices using OpenCL?
Any approach other than OpenCL would also help.
Any device that might run an OpenCL task must have an Installable Client Driver (ICD) associated with it, which the OpenCL ICD loader on the computer in question can pick up. Graphics cards (especially ones no older than about five years) are nearly guaranteed to have a valid ICD, provided their drivers are up to date, and many consumer-level CPUs have ICDs provided by their drivers.
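To make the ICD idea concrete: on Linux, the Khronos ICD loader (libOpenCL.so) finds installed drivers by reading the *.icd files in /etc/OpenCL/vendors, each of which names a vendor's driver library. The sketch below simply lists those files, which is a quick way to check whether a device's vendor has registered an ICD at all. It assumes Linux and the default vendors directory; Windows registers ICDs in the registry instead.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Returns (icd file name, library named inside it) for every *.icd file in dir.
// On Linux the Khronos ICD loader reads /etc/OpenCL/vendors by default.
std::vector<std::pair<std::string, std::string>> list_icds(const fs::path& dir) {
    std::vector<std::pair<std::string, std::string>> icds;
    std::error_code ec;
    if (!fs::is_directory(dir, ec)) {
        return icds;  // no vendors directory: no ICDs registered
    }
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        if (entry.path().extension().string() != ".icd") {
            continue;  // the loader ignores files without the .icd extension
        }
        std::ifstream in(entry.path());
        std::string library;
        std::getline(in, library);  // first line: vendor library name or path
        icds.emplace_back(entry.path().filename().string(), library);
    }
    return icds;
}
```

Calling `list_icds("/etc/OpenCL/vendors")` on a typical Linux machine returns one entry per installed OpenCL driver; a USB device with no entry there will not show up as an OpenCL platform.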
However, other devices, such as a network device or a USB device, are considerably less likely to have a valid ICD unless they've been specifically designed for use in a heterogeneous compute system. If they do have a valid ICD, it's merely a matter of querying for their platform at runtime, choosing it when constructing your OpenCL context, and then using it the same way you'd normally use OpenCL:
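//C++ OpenCL API (Khronos C++ bindings)
#include <CL/opencl.hpp>  // CL/cl2.hpp with older SDKs
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: use the platform name your device's ICD actually reports
const std::string wanted_name = "Example USB Accelerator Platform";

cl::Platform target_platform;
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
for(cl::Platform & platform : platforms) {
    std::string name = platform.getInfo<CL_PLATFORM_NAME>();
    if(name == wanted_name) {
        target_platform = platform;
        break;
    }
}
if(target_platform() == nullptr) {
    throw std::runtime_error("no ICD for the target device was found");
}
std::vector<cl::Device> devices;
target_platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
cl::Device target_device;
for(cl::Device & device : devices) {
    // Example criterion: the first device reporting itself as an accelerator
    if(device.getInfo<CL_DEVICE_TYPE>() & CL_DEVICE_TYPE_ACCELERATOR) {
        target_device = device;
        break;
    }
}
if(target_device() == nullptr) {
    throw std::runtime_error("no matching device on the target platform");
}
// From here on, OpenCL is used exactly as with any other device
cl::Context context(target_device);
cl::CommandQueue queue(context, target_device);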
I'm just starting with OpenCL. As I understand it, a platform is a vendor-specific OpenCL implementation, and a device is a processing unit that can be used by a platform.
I've written a simple C++ program that prints each platform's name and, for each of its devices, the device name. Its output is:
Platform 0: Intel(R) OpenCL HD Graphics
Device 0: Intel(R) Gen9 HD Graphics NEO
Platform 1: Intel(R) CPU Runtime for OpenCL(TM) Applications
Device 0: Intel(R) Core(TM) i5-6200U CPU @ 2.3GHz
My question is: shouldn't I expect the two devices to be under the same platform, given that I have a laptop and the GPU is integrated with the processor? Also, will this prevent me from assigning both the GPU and CPU devices to the same context (which I've read has some memory-sharing advantages)?
shouldn't I expect the two devices to be under the same platform
Only if the vendor provides a platform with drivers for both of those devices. I'm not sure whether Intel's "NEO" platform also has a CPU driver, but I'm pretty sure the "CPU runtime" only has a driver for the CPU, not the iGPU. You'll have to list the devices of each platform to find out.
will this then forbid me for assigning both GPU and CPU devices to the same context
You have to list the devices: if NEO exposes both devices, you can use that platform. But you can't have devices from different platforms in a single context.
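The one-platform-per-context rule can be made concrete with a small sketch. The DeviceInfo struct here is a hypothetical stand-in for what cl::Platform::getDevices and getInfo return; grouping devices by platform shows which of them may share a context.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of what enumeration returned for one device
struct DeviceInfo {
    std::string platform;
    std::string device;
};

// Group devices by the platform that exposes them. An OpenCL context may only
// contain devices from one platform, so each group is a candidate context.
std::map<std::string, std::vector<std::string>>
group_by_platform(const std::vector<DeviceInfo>& devices) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const DeviceInfo& d : devices) {
        groups[d.platform].push_back(d.device);
    }
    return groups;
}
```

With the enumeration output from the question, this produces two groups of one device each, so the iGPU and the CPU would need separate contexts.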
I'm working on a project that involves connecting a single-board computer (either a BeagleBone Black, or a BeagleBoard X15) to a Mac via USB OTG, and then delivering basic mouse/touch input (pointer coordinates and left/right-click events).
This process should be technically very similar to connecting a mouse (or, more accurately, a touchscreen-style device that receives precise mouse coordinates) and passing ordinary HID input to macOS. So I don't need most of the complexity of IOKit: I don't think I need to create a kernel extension; I should just be able to create an instance of a HID device for which macOS already has generic kernel extensions.
So I'm delving into IOKit to figure out how to create the device instance and provide input. However, nearly everything I'm reading about IOKit involves creating and registering new kernel extensions, services, etc. - none of which is germane to my project.
So far, the only relevant leads I've got are the I/O Registry Explorer and the contents of /System/Library/Extensions. Several items in there look promising, such as AppleDHIDMouse.kext. However, I cannot find any examples of code to bridge the gap: how my USB-connected device can connect with the kernel extension, create an instance for itself, and send commands.
Any help? Thanks in advance.
Either your device is fully USB HID compliant, in which case you shouldn't need any code at all on the Mac side, or you'll need to create a kernel extension.
How far have you got? What does your device look like in ioreg/IORegistryExplorer? (The latter is from the "Additional Tools for Xcode", downloadable from https://developer.apple.com/download/more/ )
Does your USB device's interface report as HID (bInterfaceClass 3)? The device itself normally reports as a composite device (bDeviceClass 0). bInterfaceProtocol and bInterfaceSubClass also have defined meanings in the context of HID, and should probably both be 0 for a "tablet" style device. With that in place, macOS should pick up your device as a HID device and try to drive it with one of its built-in HID device drivers.
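For reference, these are the fields of the standard 9-byte USB interface descriptor with the values described above. The interface number and endpoint count are assumptions for illustration; if the BeagleBone uses the Linux USB gadget HID function, the gadget framework generates this descriptor from its configuration, so you would not normally write these bytes by hand.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Standard USB interface descriptor for a HID interface without boot
// protocol support, as suggested for a "tablet" style device.
constexpr std::array<std::uint8_t, 9> kHidInterfaceDescriptor = {
    9,     // bLength: 9 bytes
    0x04,  // bDescriptorType: INTERFACE
    0,     // bInterfaceNumber (assumed: first interface)
    0,     // bAlternateSetting
    1,     // bNumEndpoints: one interrupt IN endpoint (assumed)
    0x03,  // bInterfaceClass: HID
    0x00,  // bInterfaceSubClass: 0 = no boot interface (1 = boot interface)
    0x00,  // bInterfaceProtocol: 0 = none (1 = keyboard, 2 = mouse, boot only)
    0,     // iInterface: no string descriptor
};
```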
The way HID devices work is through "reports" - event data structures with a flexible format/layout, which is defined via the device's "report descriptor". What buttons, input axes, etc. your device has is defined there.
For an example of a USB "tablet" device (absolute coordinate pointing device) that works with macOS, check out the code for the USB Tablet device that Qemu emulates. That might be a good starting point for the report descriptor of your own device.
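As a rough sketch of what such a report descriptor and its reports look like, here is a minimal absolute pointer with two buttons and 16-bit X/Y axes in the range 0 to 32767. It follows the general shape of tablet descriptors like QEMU's but is not copied from it, and the axis range is an assumption. make_report (a hypothetical helper) packs the 5-byte input report the device would send for each pointer event.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Absolute pointer: 2 buttons, 6 padding bits, 16-bit X and Y (0..32767).
constexpr std::uint8_t kTabletReportDescriptor[] = {
    0x05, 0x01,        // Usage Page (Generic Desktop)
    0x09, 0x02,        // Usage (Mouse)
    0xA1, 0x01,        // Collection (Application)
    0x09, 0x01,        //   Usage (Pointer)
    0xA1, 0x00,        //   Collection (Physical)
    0x05, 0x09,        //     Usage Page (Button)
    0x19, 0x01,        //     Usage Minimum (1): primary (left)
    0x29, 0x02,        //     Usage Maximum (2): secondary (right)
    0x15, 0x00,        //     Logical Minimum (0)
    0x25, 0x01,        //     Logical Maximum (1)
    0x95, 0x02,        //     Report Count (2)
    0x75, 0x01,        //     Report Size (1)
    0x81, 0x02,        //     Input (Data, Variable, Absolute)
    0x95, 0x01,        //     Report Count (1)
    0x75, 0x06,        //     Report Size (6)
    0x81, 0x01,        //     Input (Constant): pad buttons to one byte
    0x05, 0x01,        //     Usage Page (Generic Desktop)
    0x09, 0x30,        //     Usage (X)
    0x09, 0x31,        //     Usage (Y)
    0x15, 0x00,        //     Logical Minimum (0)
    0x26, 0xFF, 0x7F,  //     Logical Maximum (32767)
    0x75, 0x10,        //     Report Size (16)
    0x95, 0x02,        //     Report Count (2)
    0x81, 0x02,        //     Input (Data, Variable, Absolute)
    0xC0,              //   End Collection
    0xC0,              // End Collection
};

// Packs one input report matching the descriptor above:
// byte 0 = button bits, bytes 1-2 = X, bytes 3-4 = Y (little-endian).
std::array<std::uint8_t, 5> make_report(int x, int y, bool left, bool right) {
    // Clamp to the logical range declared in the descriptor
    const auto to_axis = [](int v) {
        return static_cast<std::uint16_t>(std::clamp(v, 0, 32767));
    };
    const std::uint16_t cx = to_axis(x);
    const std::uint16_t cy = to_axis(y);
    const auto buttons = static_cast<std::uint8_t>((left ? 0x01 : 0x00) | (right ? 0x02 : 0x00));
    return {buttons,
            static_cast<std::uint8_t>(cx & 0xFF), static_cast<std::uint8_t>(cx >> 8),
            static_cast<std::uint8_t>(cy & 0xFF), static_cast<std::uint8_t>(cy >> 8)};
}
```

Because this descriptor declares no report ID, each report is exactly these 5 bytes; adding a report ID would prepend one byte to every report.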
If your device doesn't conform to general USB HID conventions and uses some custom protocol, you'll need a custom kext (up to macOS 10.14) or dext (from macOS 10.15 onwards), which will most likely implement an IOHIDDevice subclass. An example of such a driver is the open source Mac driver for the Xbox 360's game controller, which doesn't behave like a standard USB HID device.
I'm developing in C++ on Linux (using the Eclipse SDK).
I'm a novice at OpenCL (GPU programming).
I want to execute some of my code on the GPU (rewrite some functions with OpenCL and run them on the GPU).
I'm a little confused: if I write some code in .cl files, how can I call it from my C++ application?
I haven't found any examples of this.
Using OpenCL involves two parts of code.
A. The kernel code
One or more kernel functions that perform your calculations on a device.
B. The host code
Normal C/C++ code. This is what happens there:
Pick a device for the kernel to run on (GPU/CPU/iGPU/Xeon Phi/...).
In OpenCL you have a set of platforms, each of which can contain several different devices, so you pick a platform AND a device.
Example:
platform: Intel CPU+GPU OpenCL 1.2
device: CPU OR iGPU
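Build your kernel:
const char * code = load_program_source("kernel.cl"); // your own helper that reads the file into memory
cl_program program = clCreateProgramWithSource(context, 1, &code, NULL, &err); errWrapper("clCreateProgramWithSource", err);
errWrapper("clBuildProgram", clBuildProgram(program, 1, &device_id, NULL, NULL, NULL)); // device_id: the cl_device_id you picked
cl_kernel kernel_function1 = clCreateKernel(program, "kernel_function1", &err); errWrapper("clCreateKernel", err); // name must match the __kernel function in kernel.cl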
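load_program_source is not an OpenCL function, so you have to write it yourself. A minimal sketch of such a helper, returning a std::string instead of a raw pointer so the memory is managed for you:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the OpenCL API): reads a .cl file into a
// string so its contents can be passed to clCreateProgramWithSource.
std::string load_program_source(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open kernel source: " + path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();  // copy the whole file
    return contents.str();
}
```

To use it with the code above, keep the string alive while the program is created: `std::string source = load_program_source("kernel.cl"); const char * code = source.c_str();`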
Create a buffer for memory transfers to the device:
cl_mem devInput1 = clCreateBuffer(context, CL_MEM_READ_ONLY, variable1 * sizeof(int), NULL, &err);
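Transfer the input to the device and pass the buffer to the kernel as an argument:
errWrapper("clEnqueueWriteBuffer", clEnqueueWriteBuffer(command_queue, devInput1, CL_TRUE, 0, variable1 * sizeof(int), hostInput1, 0, NULL, NULL)); // hostInput1: your host-side int array
errWrapper("setKernel", clSetKernelArg(kernel_function1, 0, sizeof (cl_mem), &devInput1));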
Launch the kernel:
errWrapper("clEnqueueNDRangeKernel", clEnqueueNDRangeKernel(command_queue, kernel_function1, 1, NULL, &tasksize, NULL, 0, NULL, NULL));
Wait for it to finish:
clFinish(command_queue);
Fetch your result from the device using clEnqueueReadBuffer.
Proceed with your C++ code using the result of the OpenCL calculations.
That's the basic idea of using OpenCL in your code.
You'd be better off working through a full OpenCL tutorial (just google it; you will drown in OpenCL tutorials).
Concepts you should be familiar with:
the OpenCL host API
command queues
kernel arguments
work groups
local size
local memory
global memory
cl_mem objects
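One place where work groups, local size and global size meet in host code: in OpenCL 1.x, if you pass a local size to clEnqueueNDRangeKernel, the global size must be a multiple of it. A common pattern, sketched below with hypothetical helper names, is to round the item count up and have the kernel skip the extra work-items.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of local_size, as required for the global
// size in OpenCL 1.x when a local size is given. The kernel must then ignore
// work-items whose get_global_id(0) >= n.
std::size_t round_up_global_size(std::size_t n, std::size_t local_size) {
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

// Number of work-groups such a launch creates.
std::size_t work_group_count(std::size_t n, std::size_t local_size) {
    return round_up_global_size(n, local_size) / local_size;
}
```

For example, 1000 items with a local size of 64 give a global size of 1024 in 16 work-groups, and 24 work-items do nothing.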
Debugging OpenCL is possible but painful. I would suggest debugging the algorithm as NORMAL C code first and porting it to OpenCL once it works.
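One way to follow that advice is to write the kernel body as an ordinary C++ function of the global ID and call it in a loop that stands in for the NDRange. The vector-addition kernel here is a hypothetical example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel body as plain C++: one call handles one work-item,
// mirroring what a __kernel would do with gid = get_global_id(0).
void vec_add_item(std::size_t gid, std::size_t n,
                  const float* a, const float* b, float* c) {
    if (gid >= n) return;  // same bounds check the OpenCL kernel needs
    c[gid] = a[gid] + b[gid];
}

// Emulates a 1-D NDRange on the CPU by calling the body for every global ID,
// so bugs can be found with an ordinary debugger before porting to OpenCL.
std::vector<float> run_on_cpu(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t global_size) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        vec_add_item(gid, a.size(), a.data(), b.data(), c.data());
    }
    return c;
}
```

This sequential loop does not model local memory or barriers, so kernels that rely on work-group synchronization still need testing on a real device.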
The main reference for all commands is the official API documentation, which can be found here: OpenCL 1.2 API
Edit: you do not need a special IDE to write OpenCL code.
I'm not sure whether each hardware type (display, USB, printer, etc.) has to follow a unified standard in order to communicate with the CPU. For example, the bits transmitted back and forth between a display interface and the CPU are interpreted by the CPU as a specific command, and the same bits would be interpreted the same way even if a display from another manufacturer were used.
If this is not true, how is the BIOS supposed to communicate with hundreds of different hardware devices, each with its own way of interpreting the bits going back and forth between the device interface and the CPU?
I find the standardization notion to be much more practical.
The BIOS itself actually only needs to understand a limited set of hardware required to boot the CPU. It does not need to understand "hundreds" of devices. For example, the BIOS has no idea what a USB printer is.
In general, the BIOS only understands the following devices:
The CPU/Chipset "core" hardware - e.g. the DDR3 memory controller
Basic PCI/PCI Express initialization - nothing device-specific
The video controller - just enough code for basic initialization, typically provided by an Option ROM
The SATA controller - as long as it is IDE/AHCI compatible.
The USB controller - possibly just USB 2.0
Standard USB storage devices
Standard USB keyboard/mouse devices
Ethernet controller - typically provided by an Option ROM
Any other device is ignored by the BIOS, unless the vendor included an Option ROM on the board. (You typically see this on SAS/SCSI controllers or Ethernet cards.)
Note that most of the devices listed above conform to a standard specification, so they are software compatible regardless of who made them. For example, a USB 2.0 controller should comply with the EHCI spec, so it would be compatible across all BIOSes. SATA controllers should follow the AHCI spec.
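Generic software recognizes these standard controllers by their PCI class code (the base class, subclass and programming-interface bytes in each function's configuration space), not by who made them. The lookup below sketches that idea for a handful of well-known codes; it is far from a complete table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a PCI class code (base class, subclass, programming interface) to the
// standard interface a generic driver could rely on. Only a few entries.
std::string standard_interface(std::uint8_t base, std::uint8_t sub, std::uint8_t prog_if) {
    if (base == 0x0C && sub == 0x03) {  // serial bus controller: USB
        switch (prog_if) {
            case 0x00: return "UHCI (USB 1.x)";
            case 0x10: return "OHCI (USB 1.x)";
            case 0x20: return "EHCI (USB 2.0)";
            case 0x30: return "xHCI (USB 3.x)";
        }
    }
    if (base == 0x01 && sub == 0x06 && prog_if == 0x01) return "AHCI (SATA)";
    if (base == 0x01 && sub == 0x01) return "IDE";
    if (base == 0x03 && sub == 0x00 && prog_if == 0x00) return "VGA-compatible";
    // e.g. Ethernet (0x02/0x00) has no common register interface
    return "vendor-specific (needs its own driver or Option ROM)";
}
```

The fallback case is why the list above relies on Option ROMs for Ethernet controllers: the class code says what the device is, but not how to program it.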
Once the Operating System loads, it takes over from the BIOS and loads its own drivers to interface with the hardware.
There is a specific way (i.e. a protocol) for each kind of hardware to communicate with the CPU; we can regard it as a "device specification". To communicate with hundreds of different hardware devices, the BIOS has to implement the corresponding protocols. Thus we can say the BIOS is actually a "collection" of specifications.
Whenever a new spec is announced, the BIOS must be modified to support it; otherwise the BIOS cannot identify the corresponding device, let alone configure it!
My laptop has:
- one Core i5 CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
- one integrated graphics card: Intel(R) HD Graphics 4000
- one Nvidia card (discrete card): GeForce GT 630M
But when I used JavaCL.createBestContext(), it seemed to use only one card, the Intel HD Graphics. So I tried to combine all three (the CPU and both GPUs) using:
List<CLDevice> devices = new ArrayList<CLDevice>();
// try to list all platform and devices
for(CLPlatform platform : JavaCL.listPlatforms()) {
//System.out.println(platform.getName());
for (CLDevice device : platform.listAllDevices(true)) {
System.out.println(device.getName().trim());
devices.add(device);
}
}
CLDevice device1 = (CLDevice)devices.get(0);
CLDevice device2 = (CLDevice)devices.get(1);
CLDevice device3 = (CLDevice)devices.get(2);
CLContext context = JavaCL.createContext(null, device1, device2, device3);
But I got an error when trying to use all three at the same time. How can I combine the CPU and GPUs in JavaCL? I read that OpenCL is a standard that supports parallel programming on both CPUs and GPUs, so if I'm missing something, please let me know. Any ideas or answers will be appreciated.
Thanks,
Duy.
Sadly, it's not that easy. When creating a single context across multiple devices, the devices all have to come from the same platform. Creating a context containing the Intel CPU and GPU should work, but the Nvidia GPU has to be in its own context (different platform, Nvidia not Intel).
Here's how I handle this scenario: I create a context for each device and a thread for each context. Each thread takes a portion of the data I'm working on and dispatches it to its assigned OpenCL device. This way you can mix CPUs, GPUs from both AMD and Nvidia, and any other hardware that comes along.
It's important to balance the load across the threads so that faster devices don't sit idle waiting for a slower device to catch up.
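A minimal sketch of that scheme, assuming each worker thread already owns its device's context and queue (hidden behind a hypothetical process_chunk callback). Instead of giving each device a fixed share, workers pull fixed-size chunks from a shared atomic counter, so faster devices automatically take more chunks:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// process_chunk(device, begin, end) stands in for "enqueue items [begin, end)
// on this device's own context/queue and wait for the result".
using ChunkFn = std::function<void(std::size_t device, std::size_t begin, std::size_t end)>;

void run_balanced(std::size_t num_devices, std::size_t total_items,
                  std::size_t chunk_size, const ChunkFn& process_chunk) {
    assert(num_devices > 0 && chunk_size > 0);
    std::atomic<std::size_t> next{0};  // first item not yet claimed
    std::vector<std::thread> workers;
    for (std::size_t dev = 0; dev < num_devices; ++dev) {
        workers.emplace_back([&, dev] {
            while (true) {
                // Claim the next chunk; a faster device returns here sooner
                std::size_t begin = next.fetch_add(chunk_size);
                if (begin >= total_items) break;
                std::size_t end = std::min(begin + chunk_size, total_items);
                process_chunk(dev, begin, end);
            }
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Chunk size is a trade-off: smaller chunks balance better, but each one costs a kernel launch and a data transfer on its device.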