OpenCL on isolated CPU cores - opencl

I have an 8 core linux machine on which I've altered my grub.conf with
isolcpus=4,5,6,7
so that the last four cores are not used by the OS process scheduler. Running the command clinfo shows for the CPU: MAX_COMPUTE_UNITS: 4.
Removing the isolcpus line from my grub.conf file and running clinfo shows for the CPU: MAX_COMPUTE_UNITS: 8. I guess this means that any OpenCL kernel will not use the isolated CPUs. Does anyone know how to force an OpenCL kernel to use the isolated CPUs? More info on my specific OpenCL implementation from clinfo:
NAME: Intel(R) Xeon(R) CPU E5-2603 v2 # 1.80GHz
VENDOR: Intel(R) Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 1.2 (Build 8)
EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_fp64
DRIVER_VERSION: 1.2.0.8

Related

Unable to paralleled running VG bioinformatic tool in parallel computing (if possible with GUI), when working on server?

I am running a server at my workplace with specs as follows:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2683 v3 # 2.00GHz
Stepping: 2
CPU MHz: 2000.000
BogoMIPS: 4004.58
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
With the OS information:
Transient hostname: some-server-gs
Icon name: computer-server
Chassis: server
Operating System: CentOS Linux 7 (Core)
Kernel: Linux 3.10.0-514.el7.x86_64
Architecture: x86-64
I'm starting with the "Parallel" package but is there any GUI based way to use parallel computing?
I just tried running the command "parallel" but it did not work!

Intel OpenCL Intel CPU not detected

I'm running the following code in order to retrieve device information:
#include <CL/cl.h>
#include <vector>
int main(int argc, char *argv[])
{
//find and print all available opencl devices using clGetdeviceinfo
cl_int err;
cl_uint num_platforms;
cl_platform_id *platforms;
cl_uint num_devices;
cl_device_id *devices;
err = clGetPlatformIDs(0, NULL, &num_platforms);
platforms = (cl_platform_id *)malloc(num_platforms * sizeof(cl_platform_id));
err = clGetPlatformIDs(num_platforms, platforms, NULL);
for (int i = 0; i < num_platforms; i++)
{
cl_platform_id platform = platforms[i];
char platform_name[100];
err = clGetPlatformInfo(platform, CL_PLATFORM_NAME, 100, platform_name, NULL);
std::cout << "Platform " << i << ": " << platform_name << std::endl;
cl_uint num_devices;
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_devices, devices, NULL);
for (int j = 0; j < num_devices; j++)
{
cl_device_id device = devices[j];
char device_name[100];
err = clGetDeviceInfo(device, CL_DEVICE_NAME, 100, device_name, NULL);
std::cout << "Device " << j << ": " << device_name << std::endl;
}
}
}
Which results in the following output:
Platform 0: Intel(R) OpenCL HD Graphics
Device 0: Intel(R) Iris(R) Plus Graphics [0x8a52]
Platform 1: Clover
With a segfault occuring at the clGetDeviceInfo for the second platform.
I'm running Ubuntu 20.04.4 LTS on a MS Surface Pro 7 with a
10th Gen Intel® Core™ i7-1065G7 CPU.
clinfo output:
Number of platforms 2
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 21.2.6
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Iris(R) Plus Graphics [0x8a52]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Driver Version 22.17.23034
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 64
Max clock frequency 1100MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 13084180480 (12.19GiB)
Error Correction support No
Max memory allocation 4294959104 (4GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 4294959104 (4GiB)
Global Memory cache type Read/Write
Global Memory cache size 1048576 (1024KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268434944 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 4294959104 (4GiB)
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Queue properties (on device)
Out-of-order execution No
Profiling No
Preferred size 0
Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 52ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption Yes
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io
Platform Name Clover
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Iris(R) Plus Graphics [0x8a52]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Iris(R) Plus Graphics [0x8a52]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Iris(R) Plus Graphics [0x8a52]
NOTE: your OpenCL library only supports OpenCL 2.2,
but some installed platforms support OpenCL 3.0.
Programs using 3.0 features may crash
or behave unexpectedly
I have tried to install Intel-SDK OpenCL CPU-runtime with no success.
How do I configure the CPU for OpenCL?
The intended use for the CPU is to be able to debug OpenCL-kernels.
The latest official OpenCL 18.1 CPU Runtime release from Intel is a bugged mess and/or not available.
Intel have lately integrated the OpenCL CPU Runtime into their oneAPI DPC++ Compiler.
One version that I know of that works (I tested it on 10980XE) is this one:
oclcpuexp-2020.10.4.0.15_rel.tar.gz
You may also try the latest version, oclcpuexp-2021.13.11.0.23_rel.tar.gz
I originally found the hint here.

/usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available

When I am running computecpp_info
$ /usr/local/computecpp/bin/computecpp_info
/usr/local/computecpp/bin/computecpp_info: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/local/computecpp/bin/computecpp_info)
/usr/local/computecpp/bin/computecpp_info: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/local/computecpp/bin/computecpp_info)
********************************************************************************
ComputeCpp Info (CE 0.7.0)
********************************************************************************
Toolchain information:
GLIBC version: 2.19
GLIBCXX: 20150426
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 3 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : GeForce GTX 750 Ti
CL_DEVICE_VENDOR : NVIDIA Corporation
CL_DRIVER_VERSION : 384.111
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : UNTESTED - Device not tested on this OS
CL_DEVICE_NAME : Intel(R) HD Graphics
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : r5.0.63503
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : YES - Tested internally by Codeplay Software Ltd.
CL_DEVICE_NAME : Intel(R) Core(TM) i7-4790 CPU # 3.60GHz
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 1.2.0.475
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.7.0/platform-support-notes
********************************************************************************
and when running clinfo
$ clinfo
clinfo: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms 2
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 9.0.282
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Extensions function suffix NV
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Platform Extensions function suffix INTEL
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 750 Ti
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 384.111
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 01:00.0
Max compute units 5
Max clock frequency 1110MHz
NVIDIA Compute Capability 5.0
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 2094530560 (1.951GiB)
Error Correction support No
Max memory allocation 523632640 (499.4MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 81920
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 1
Prefer user sync for interop No
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Name Intel(R) OpenCL
Number of devices 2
Device Name Intel(R) HD Graphics
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2
Driver Version r5.0.63503
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 20
Max clock frequency 1200MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 0 / 0 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 1709598311 (1.592GiB)
Error Correction support No
Max memory allocation 854799155 (815.2MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 524288
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 53424947 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Local
Local memory size 65536 (64KiB)
Max constant buffer size 854799155 (815.2MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
SPIR versions 1.2
Prefer user sync for interop Yes
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_driver_diagnostics cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Device Name Intel(R) Core(TM) i7-4790 CPU # 3.60GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 475)
Driver Version 1.2.0.475
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 8
Max clock frequency 3600MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 8260567040 (7.693GiB)
Error Correction support No
Max memory allocation 2065141760 (1.923GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 262144
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 129071360 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 131072 (128KiB)
Max number of constant args 480
Max size of kernel argument 3840 (3.75KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
Prefer user sync for interop No
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Here, I tried to find the reason of /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available and tried to fix this issue, but I didn't get any fruitful help. Please help by explaining or referring something to get it clear.

Using all cores with Microsoft R Open and Google Compute Engine

I'm using Microsoft R Open on a GCE instance that has two vCPUs. Here are its specs.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU # 2.30GHz
Stepping: 0
CPU MHz: 2300.000
BogoMIPS: 4600.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc
eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp
ervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms xsaveopt
Even though I have two cores, Microsoft R Open seems to recognize only one of them, so I'm not taking full advantage of my computing capacity. I can't set the numbers of threads manually either.
Microsoft R Open 3.3.2
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2016 Microsoft Corporation
Using the Intel MKL for parallel mathematical computing(using 1 cores).
Default CRAN mirror snapshot taken on 2016-11-01.
See: https://mran.microsoft.com/.
> getMKLthreads()
[1] 1
> setMKLthreads(2)
Number of threads at maximum: no change has been made.
Here's a graph showing CPU usage. It never uses more than 50% of CPU power.
So, what should I do so I can use all my cores with MRO?
you are running Xeon which is hyper threaded . You have 1 cpu with hyper threading,the os treats it as 2 cpus but there is only one physical cpu. MRO uses the physical cores only(without hyper threading)
You can use this:
library(doParallel)
no_cores <- detectCores() - 1
registerDoParallel(cores=no_cores)
It will one core less than the actual cores you have. It leaves one core for OS operations. Try it out.

AMD OpenCL driver: 1) detects Intel CPU 2) lists same type of GPU with different OpenCL version

My workstation configuration:
Intel(R) Xeon(R) CPU E5-2609 v2 # 2.50GHz (x2)
AMD FirePro W9100 (x2)
Operating system:
Windows Server 2012 R2 Standard
I use lwjgl's OpenCL demo (link) to list the platforms and devices.
Problem 1)
After installing AMD drivers on my workstation AMD Accelerated Parallel Processing platform list Intel(R) Xeon(R) CPU E5-2609 v2 # 2.50GHz as a device. (3rd device listed for AMD platform)
I tested the same code for both AMD and Intel OpenCL driver and Intel's own implementation was much faster for it's own hardware. (no surprise there)
Anyway I don't want an Intel device to be listen under an AMD platform.
Problem 2)
My 2 identical AMD FirePro W9100 (device 1 and 2 under AMD platform) is listed with different level of OpenCL support.
What could cause this problem, and more importantly how can I make my 2nd card use the 2.0 OpenCL driver?
OpenCL demo results:
NEW PLATFORM: [0x7FFE51B57B60]
CL_PLATFORM_PROFILE = FULL_PROFILE
CL_PLATFORM_VERSION = OpenCL 2.0 AMD-APP (1642.5)
CL_PLATFORM_NAME = AMD Accelerated Parallel Processing
CL_PLATFORM_VENDOR = Advanced Micro Devices, Inc.
CL_PLATFORM_EXTENSIONS = cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
CL_PLATFORM_ICD_SUFFIX_KHR = AMD
** NEW DEVICE: [0x1087BD0]
OpenCL 2.0 - Extensions: cl_amd_device_attribute_query cl_amd_fp64 cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_printf cl_amd_vec3 cl_ext_atomic_counters_32 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_gl_event cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
CL_DEVICE_TYPE = 4
CL_DEVICE_VENDOR_ID = 4098
CL_DEVICE_MAX_COMPUTE_UNITS = 44
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_GROUP_SIZE = 256
CL_DEVICE_MAX_CLOCK_FREQUENCY = 930
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_NAME = Hawaii
CL_DEVICE_VENDOR = Advanced Micro Devices, Inc.
CL_DRIVER_VERSION = 1642.5 (VM)
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 2.0 AMD-APP (1642.5)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 2.0
Sub Buffer destructed: 17348816
Buffer destructed (2): 16864864
Buffer destructed (1): 16864864
** NEW DEVICE: [0x10851E0]
OpenCL 1.2 - Extensions: cl_amd_device_attribute_query cl_amd_fp64 cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_printf cl_amd_vec3 cl_ext_atomic_counters_32 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_gl_event cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
CL_DEVICE_TYPE = 4
CL_DEVICE_VENDOR_ID = 4098
CL_DEVICE_MAX_COMPUTE_UNITS = 44
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_GROUP_SIZE = 256
CL_DEVICE_MAX_CLOCK_FREQUENCY = 930
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_NAME = Hawaii
CL_DEVICE_VENDOR = Advanced Micro Devices, Inc.
CL_DRIVER_VERSION = 1642.5 (VM)
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 1.2 AMD-APP (1642.5)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
Sub Buffer destructed: 17348816
Buffer destructed (2): 16864864
Buffer destructed (1): 16864864
** NEW DEVICE: [0x5DDFE790]
OpenCL 1.2 - Extensions: cl_amd_device_attribute_query cl_amd_fp64 cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_printf cl_amd_vec3 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_gl_event cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
CL_DEVICE_TYPE = 2
CL_DEVICE_VENDOR_ID = 4098
CL_DEVICE_MAX_COMPUTE_UNITS = 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY = 2500
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_NAME = Intel(R) Xeon(R) CPU E5-2609 v2 # 2.50GHz
CL_DEVICE_VENDOR = GenuineIntel
CL_DRIVER_VERSION = 1642.5 (sse2,avx)
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 1.2 AMD-APP (1642.5)
CL_DEVICE_EXTENSIONS = cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_spir cl_khr_gl_event
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
-TRYING TO EXEC NATIVE KERNEL-
KERNEL EXEC argument: 1337, should be 1337
Event callback status: CL_COMPLETE
EMPTY NATIVE KERNEL AVG EXEC TIME: 28.8072us
Sub Buffer destructed: 17348816
Buffer destructed (2): 16864864
Buffer destructed (1): 16864864
-------------------------
NEW PLATFORM: [0x5DB27010]
CL_PLATFORM_PROFILE = FULL_PROFILE
CL_PLATFORM_VERSION = OpenCL 1.2
CL_PLATFORM_NAME = Intel(R) OpenCL
CL_PLATFORM_VENDOR = Intel(R) Corporation
CL_PLATFORM_EXTENSIONS = cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
CL_PLATFORM_ICD_SUFFIX_KHR = INTEL
** NEW DEVICE: [0x5DB1E8F0]
OpenCL 1.2 - Extensions: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
CL_DEVICE_TYPE = 2
CL_DEVICE_VENDOR_ID = 32902
CL_DEVICE_MAX_COMPUTE_UNITS = 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_GROUP_SIZE = 8192
CL_DEVICE_MAX_CLOCK_FREQUENCY = 2500
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_NAME = Intel(R) Xeon(R) CPU E5-2609 v2 # 2.50GHz
CL_DEVICE_VENDOR = Intel(R) Corporation
CL_DRIVER_VERSION = 5.0.0.57
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VERSION = OpenCL 1.2 (Build 57)
CL_DEVICE_EXTENSIONS = cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
-TRYING TO EXEC NATIVE KERNEL-
KERNEL EXEC argument: 1337, should be 1337
Event callback status: CL_COMPLETE
EMPTY NATIVE KERNEL AVG EXEC TIME: 9.5031us
Sub Buffer destructed: 1574991568
Buffer destructed (2): 1563032880
Buffer destructed (1): 1563032880
1) Look in your regkeys (under local_machine/Software/Krono/vendors or software/AMD). These things are usually controlled by regkeys. I'm not sure what AMD uses though.
2) Make sure both cards have the same driver version. Otherwise it could be that AMD driver only allows one 2.0 device on any machine due to potential coherence issues between devices. 2.0 devices are supposed to have SVM support with the cores, but I'm not sure what happens with coherence across devices.
Problem 1)
It seems that AMD's OpenCL implementation works for all x86 CPUs. I don't have hard facts on this, only another similiar discussion on Khronos forums.
A solution could be to filter out devices based on the name reported by different platforms, but I don't know if it is guaranteed that different platforms will always report the same name for the same device.
Problem 2)
I have no idea on your 2nd problem, it looks bizarre...

Resources