Is there a way to exclusively reserve the GPU for an OpenCL host program?
No other process shall have access to this device via OpenCL or OpenGL.
The background is that my software computes real-time data on the GPU, so performance suffers if the GPU is doing other work at the same time.
I'm using an Intel integrated GPU for my OpenCL implementation. I am implementing a program with zero copy, where I'm not copying the data to the GPU; instead it shares the common memory (RAM).
I have a 64-bit CPU, but the GPU specs show it only has a 32-bit addressing mode.
I'm sharing a malloc'd heap region between the GPU and CPU, and when I print the address I see the following.
In the GPU kernel:
if (id == 0) {
    printf("Mem address: %p\n", A);
    // Outputs: Mem address: 0x1010000
}
On the CPU it prints:
printf("Outside Mem address: %p\n",cpuA);
Device: Intel(R) HD Graphics IvyBridge M GT2
Outside Mem address: 0x7fcd529d9000
I don't understand how it is mapped on the GPU side, and I wonder whether 2^28/2^32 is the maximum address the GPU can access.
The memory address you are printing on the host is a virtual address that only makes sense in the context of your program's process. In the CPU, this is transparently translated to a physical RAM page, the address of which is unrelated to the virtual address but stored in a lookup table (page table) maintained by the operating system. Note that "64-bit CPU" typically refers to the number of bits in a virtual address. (Although many 64-bit CPUs actually ignore 8-16 bits.) The number of bits for physical addresses (for addressing physical RAM cells and mapped device memory) is often a lot less, as little as 40 bits.
Devices attached to the system and able to perform direct memory accesses (DMA) most commonly deal with physical memory addresses. If your Intel GPU does not have an internal memory mapping scheme (and there is no IOMMU active, see below) then the address you are seeing in your OpenCL kernel code is probably a physical memory address. If the device can only address 32 bits, this means it can only access the first 4GiB of physical memory in your system. By assigning memory above 4GiB to devices and user space processes that aren't affected by a 32-bit restriction, or by using "bounce buffers", the operating system can arrange for any buffers used by the restricted device to be located in that memory area, regardless of virtual address.
Recently, IOMMUs have become common. These introduce a virtual memory like mapping system for devices as well - so the memory addresses a device sees are again unrelated to the physical addresses of system memory they correspond to. This is primarily a security feature - ideally, each device gets its own address space, so devices can't accidentally or deliberately access system memory they should not be accessing. It also means that the 32-bit limitation becomes completely irrelevant, as each device gets its own 32-bit address space which can be mapped to physical memory beyond the 4GiB boundary.
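As a side note on the zero-copy part of the question: rather than handing the kernel a raw malloc'd pointer, the usual approach on integrated GPUs is to let the OpenCL runtime allocate the shared buffer and map it on the host. Below is a minimal sketch, assuming an existing context ctx, command queue queue and element count N (these names are placeholders; error handling omitted):
cl_int err;
size_t bytes = N * sizeof(float);

// Let the runtime pick memory that both CPU and GPU can reach without a copy.
cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                            bytes, NULL, &err);

// Map it to obtain a host virtual address backed by the same physical pages
// the GPU will see through its own (possibly 32-bit) address space.
float *host_ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                              0, bytes, 0, NULL, NULL, &err);
/* ... fill host_ptr ... */
clEnqueueUnmapMemObject(queue, buf, host_ptr, 0, NULL, NULL);
// Pass buf to the kernel with clSetKernelArg(); on Intel integrated GPUs this
// typically avoids any copy.
Either way, the two different printed addresses are expected: both refer to the same physical RAM, just through different mappings.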
I have two basic questions about getting started with GPGPU programming:
(1) If I do GPGPU on my Mac, will it affect the image on the monitor? How do I know the windowing system's or other programs' output is not competing for the GPU?
(2) Is there a way to try out AMD GPU programming somewhere without buying a high-end graphics card? The rental cloud providers I have seen all use Nvidia. My computation would be compute-bound, logical integer (bit-twiddling) work, and I have read that AMD GPUs are better suited for these applications.
1) It won't affect the image on the monitor. To check whether another process is using the GPU you'd need something like AMD System Monitor, but that application only works on Windows, so you'd have to find an equivalent tool for the Mac.
2) Any Radeon HD 4xxx and above supports OpenCL (earlier cards might support it, but I'm not sure). This means any new AMD card you can buy, including the cheapest ones, supports OpenCL.
The difference between the expensive cards and the cheap ones is the number of stream processors. For example:
Radeon HD 4350: 80 stream processors
Radeon R9 290: 2560 stream processors
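Note that OpenCL reports compute units rather than stream processors (on AMD hardware each compute unit groups a number of stream processors, e.g. 64 on GCN parts). A small, hedged sketch for checking what any card you have reports:
#include <stdio.h>
#include <CL/cl.h>   /* on macOS: #include <OpenCL/opencl.h> */

int main(void) {
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint num_devices = 0;

    // Take the first platform and list its GPU devices.
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &num_devices);

    for (cl_uint i = 0; i < num_devices; ++i) {
        char name[256];
        cl_uint cus;
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(cus), &cus, NULL);
        printf("%s: %u compute units\n", name, cus);
    }
    return 0;
}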
I am trying to measure PCIe bandwidth on an ATI FirePro 8750. The AMD APP sample PCIeBandwidth in the SDK measures the bandwidth of transfers from:
Host to device, using clEnqueueWriteBuffer().
Device to host, using clEnqueueReadBuffer().
On my system (Windows 7, Intel Core 2 Duo, 32-bit) the output looks like this:
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : ATI RV770
Host to device : 0.412435 GB/s
Device to host : 0.792844 GB/s
This particular card has 2 GB of DRAM and a maximum clock frequency of 750 MHz.
1. Why is the bandwidth different in each direction?
2. Why is the bandwidth so small?
Also, I understand that this communication takes place through DMA, so the bandwidth should not be limited by the CPU.
This paper from Microsoft Research gives some insight into why there is asymmetric PCIe data-transfer bandwidth between the GPU and CPU. The paper describes performance metrics for FPGA-GPU data transfers over PCIe, and it also includes metrics for CPU-GPU data transfers over PCIe.
To quote the relevant section:
'It should also be noted that the GPU-CPU transfers themselves also show some degree of asymmetric behavior. In the case of a GPU to CPU transfer, where the GPU is initiating bus master writes, the GPU reaches a maximum of 6.18 GByte/Sec. In the opposite direction from CPU to GPU, the GPU is initiating bus master reads and the resulting bandwidth falls to 5.61 GByte/Sec. In our observations it is typically the case that bus master writes are more efficient than bus master reads for any PCIe implementation due to protocol overhead and the relative complexity of implementation. While a possible solution to this asymmetry would be to handle the CPU to GPU direction by using CPU initiated bus master writes, that hardware facility is not available in the PC architecture in general.'
The answer to the second question, about the low bandwidth, is likely down to the size of each data transfer.
See figs. 2, 3, 4 and 5 of the paper. I have also seen graphs like this at the 1st AMD Fusion Conference. The explanation is that PCIe data transfers have overheads due to the protocol and the device latency. The overheads are more significant for small transfer sizes and become less significant for larger ones.
What levers do you have to control or improve performance?
Getting the right combination of chip/motherboard and GPU is the hardware lever. Chips with the maximum number of PCIe lanes are better, as is a higher-spec PCIe protocol: PCIe 3.0 is better than PCIe 2.0, and all components need to support the higher standard.
As a programmer, controlling the data transfer size is a very important lever.
Transfer sizes of 128K - 256K bytes get approx 50% of the max bandwidth. Transfers of 1M - 2M bytes get over 90% of max bandwidth.
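If you want to reproduce the transfer-size effect on your own card, here is a hedged sketch of the usual measurement loop, assuming an existing context ctx and a queue queue created with CL_QUEUE_PROFILING_ENABLE (placeholder names; error checking omitted):
/* requires <stdio.h>, <stdlib.h> and <CL/cl.h> */
// Time host-to-device copies of increasing size; the effective GB/s should
// rise towards the PCIe limit as the transfer size grows.
size_t sizes[] = { 32 * 1024, 256 * 1024, 1024 * 1024, 16 * 1024 * 1024 };
for (int i = 0; i < 4; ++i) {
    void  *src = calloc(1, sizes[i]);
    cl_mem dst = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizes[i], NULL, NULL);

    cl_event ev;
    clEnqueueWriteBuffer(queue, dst, CL_TRUE, 0, sizes[i], src, 0, NULL, &ev);

    cl_ulong t0, t1;                       /* profiling timestamps are in ns */
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(t1), &t1, NULL);

    printf("%9zu bytes: %.3f GB/s\n",
           sizes[i], (double)sizes[i] / ((t1 - t0) * 1e-9) / 1e9);

    clReleaseEvent(ev);
    clReleaseMemObject(dst);
    free(src);
}
Running the same loop with clEnqueueReadBuffer() also exposes the write/read asymmetry discussed above.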
In the AMD APP programming guide (p. 4-15) it is written:
For transfers <= 32 kB: for transfers from the host to the device, the data is copied by the CPU to a runtime pinned host memory buffer, and the DMA engine transfers the data to device memory. The opposite is done for transfers from the device to the host.
Is the DMA engine mentioned above the CPU's DMA engine or the GPU's DMA engine?
I believe it is the GPU's DMA engine, since on some cards (e.g., NVIDIA) you can do simultaneous reads and writes (so this is a GPU capability, not a CPU capability).
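Whichever engine does the copy, the intermediate CPU copy for small transfers can usually be avoided by staging data in pre-pinned host memory that the runtime allocates. A hedged sketch, again assuming an existing ctx, queue and buffer size in bytes (placeholder names):
// Allocate page-locked (pinned) host memory through the runtime, so the DMA
// engine can read it directly instead of going through an intermediate copy.
cl_mem pinned = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR, size, NULL, NULL);
void *staging = clEnqueueMapBuffer(queue, pinned, CL_TRUE, CL_MAP_WRITE,
                                   0, size, 0, NULL, NULL, NULL);

/* ... fill `staging` with the data to upload ... */

cl_mem device_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, NULL);
clEnqueueWriteBuffer(queue, device_buf, CL_TRUE, 0, size, staging, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue, pinned, staging, 0, NULL, NULL);
Whether the runtime actually pins a CL_MEM_ALLOC_HOST_PTR allocation is implementation-specific, so treat this as a pattern to benchmark rather than a guarantee.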
One difference is that microcontrollers are usually designed to perform a small set of specific functions, whereas microprocessors are designed for large, general-purpose tasks.
Anything else??
A microcontroller is a microprocessor (a.k.a. CPU core or cores) with additional peripherals on-chip. The terms come from the 1970s, where a microprocessor (e.g. Motorola 6800 or Intel 8086) would have an address bus, a data bus, and control lines, and a microcontroller (e.g. Motorola 6801 or Intel 8051) would have peripheral I/O pins (serial ports, parallel I/O, timer I/O, etc.) but no external memory bus (you were stuck with what was on the chip).
Additionally, microprocessors executed their programs from external ROM, while microcontrollers used internal masked (as in "programmed at the factory by changing the IC photo mask") ROM. The only practical erasable ROMs were UV-erased EPROMs; electrically erasable PROMs (EEPROMs) were expensive, slow, and not very dense, and "flash" meant the bits of plastic sticking out of the mold seam lines on the chip.
Honestly, the line between them is fading away. Modern microcontrollers such as the Motorola 6812 series have an external memory bus and peripheral I/O pins at the same time, and can be used as either a microprocessor or microcontroller.
From
http://wiki.answers.com/Q/What_is_the_difference_between_a_microprocessor_and_a_microcontroller
A microcontroller is a specialized form of microprocessor that is designed to be self-sufficient and cost-effective, where a microprocessor is typically designed to be general purpose (the kind used in a PC). Microcontrollers are frequently found in automobiles, office machines, toys, and appliances.
The microcontroller is the integration of a number of useful functions into a single IC package. These functions are:
The ability to execute a stored set of instructions to carry out user defined tasks.
The ability to be able to access external memory chips to both read and write data from and to the memory.
Basically, a microcontroller is a device which integrates a number of the components of a microprocessor system onto a single microchip.
So a microcontroller combines onto the same microchip:
The CPU core (microprocessor)
Memory (both ROM and RAM)
Some parallel digital I/O
Also, a microcontroller is part of an embedded system, which is essentially the whole circuit board. Look up "embedded system" on Wikipedia.
The difference is that a microcontroller incorporates the features of a microprocessor (CPU, ALU, registers) along with added features such as RAM, ROM, I/O ports, counters, etc. A microcontroller controls the operation of a machine using a fixed program stored in ROM that doesn't change over its lifetime.
Another difference is that microcontrollers usually have to handle real-time tasks, whereas the microprocessor in a computer system may not have to handle real-time tasks at all.
A microcontroller is much more of a complete computer system. A microprocessor is just that -- a processor. A microcontroller will normally include memory (often both RAM and some sort of ROM) as well as peripherals such as serial ports and timers, and (in some cases) more specialized hardware. For example, a microcontroller intended for motor control will typically include some PWM ports, while one intended for communication use might include encryption hardware.
In short:
Microprocessor = CPU
Microcontroller = CPU + peripherals + memory
This link was useful too.
A microprocessor is a general-purpose processor, typically in a 40-pin package. It is used as the CPU in a computer. It uses external memory devices such as RAM or ROM.
A microcontroller is a processor designed with memory internally. On its own it may serve as a complete computer.
General use
Microprocessor - generally used in computers as a general-purpose programmable device.
Microcontroller - generally used in a robotic system or a traffic-signal control system.
Ref - Difference between Microprocessor and Microcontroller
In short, the microprocessor is one part of a microcontroller.