Memory copy speed comparison CPU<->GPU - opencl

I am learning boost::compute, the OpenCL wrapper library.
Copying data is very slow for me.
If CPU-to-CPU copy speed is scaled to 1, how fast are GPU-to-CPU, GPU-to-GPU, and CPU-to-GPU copies?
I don't need precise numbers; a general idea would be a great help. For example: CPU-to-CPU is at least 10 times faster than GPU-to-GPU.

Nobody answered my question, so I wrote a program to measure the copy speed.
#include <vector>
#include <chrono>
#include <algorithm>
#include <iostream>
#include <boost/compute.hpp>

namespace compute = boost::compute;
using namespace std::chrono;
using namespace std;

int main()
{
    int sz = 10000000;
    std::vector<float> v1(sz, 2.3f), v2(sz);   // host (CPU) buffers
    compute::vector<float> v3(sz), v4(sz);     // device (GPU) buffers

    // Each copy is timed once. count() is in system_clock ticks,
    // whose unit is implementation-defined.
    auto s = system_clock::now();
    std::copy(v1.begin(), v1.end(), v2.begin());        // host -> host
    auto e = system_clock::now();
    cout << "cpu2cpu cp " << (e - s).count() << endl;

    s = system_clock::now();
    compute::copy(v1.begin(), v1.end(), v3.begin());    // host -> device
    e = system_clock::now();
    cout << "cpu2gpu cp " << (e - s).count() << endl;

    s = system_clock::now();
    compute::copy(v3.begin(), v3.end(), v4.begin());    // device -> device
    e = system_clock::now();
    cout << "gpu2gpu cp " << (e - s).count() << endl;

    s = system_clock::now();
    compute::copy(v3.begin(), v3.end(), v1.begin());    // device -> host
    e = system_clock::now();
    cout << "gpu2cpu cp " << (e - s).count() << endl;

    return 0;
}
I expected the gpu2gpu copy to be fast.
On the contrary, cpu2cpu was the fastest and gpu2gpu was very slow in my case.
(My system is an Intel i3 with Intel(R) HD Graphics Skylake ULT GT2.)
Maybe parallel processing is one thing and copy speed is another.
Output (in system_clock ticks):
cpu2cpu cp 7549776
cpu2gpu cp 18707268
gpu2gpu cp 65841100
gpu2cpu cp 65803119
I hope someone can benefit from this test program.
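One caveat about the program above: each copy is timed exactly once with system_clock, so one-time costs and clock adjustments can end up inside a measurement. Below is a minimal sketch of a slightly more careful timing harness, using only the standard library. The names medianMillis and gigabytesPerSecond are made up for this example and are not part of boost::compute.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once untimed as a warm-up, then `runs` more times, and
// returns the median wall-clock time of the timed runs in milliseconds.
template <typename Work>
double medianMillis(Work work, int runs)
{
    assert(runs > 0);
    work(); // warm-up run, not measured
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Converts a byte count and a duration into GB/s (1 GB = 1e9 bytes).
double gigabytesPerSecond(std::size_t bytes, double millis)
{
    return (static_cast<double>(bytes) / 1e9) / (millis / 1e3);
}
```

Each copy would be wrapped in a lambda, for example medianMillis([&] { std::copy(v1.begin(), v1.end(), v2.begin()); }, 5). For the device copies it is probably safest to also call finish() on the command queue inside the lambda, since I have not verified which compute::copy overloads block until the transfer is complete.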

Related

NPP function returns nppiFilterRow_8u_C1R CUDA KERNEL execution error

I am using the NVIDIA Nsight application to rotate and blur images, with the NPP libraries.
oDeviceDst is filled with the output of the rotate function, which works fine. The code is shown below. Note that it runs inside a for loop over several rotation angles.
npp::ImageNPP_8u_C1 oDeviceDst(768 , 768);
Npp32s masksize = (Npp32s)KERNEL_LENGTH;
Npp32s anchor = (Npp32s)KERNEL_RADIUS;
NppiSize SzROI ={(int)oDeviceDst_gauss.width(),(int)oDeviceDst_gauss.height()};
std::cout << " anchor " << anchor << std::endl;
std::cout << " masksize " << masksize << std::endl;
NppStatus status2 = nppiFilterRow_8u_C1R(oDeviceDst.data(), oDeviceDst.pitch(),
                                         oDeviceDst_gauss.data(), oDeviceDst_gauss.pitch(),
                                         SzROI, (Npp32s*)h_Kernel, masksize, anchor, int(sumh));
std::cerr << "status of blur is" << status2 << std::endl;
But as soon as I run it (remotely, on the target), I get a status of -1000:
status of blur is -1000.
logout
Has anyone faced a similar issue when using the NPP libraries?

Tower of Hanoi recursion solution explanation

I have struggled with the Tower of Hanoi problem for 3 months. Here's the simplest code I could find:
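#include <iostream>
using namespace std;

// Moves the top n disks from peg 'from' to peg 'to', using 'buffer' as the spare peg.
void moveDisk(int n, char from, char to, char buffer)
{
    if (n == 1)
    {
        cout << "Move disk 1 from " << from << " to " << to << endl;
        return;
    }
    moveDisk(n-1, from, buffer, to);   // move the n-1 smaller disks onto the spare peg
    cout << "Move disk " << n << " from " << from << " to " << to << endl;
    moveDisk(n-1, buffer, to, from);   // move the n-1 smaller disks on top of disk n
}

int main()
{
    moveDisk(3, 'A', 'C', 'B');
}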
And it works perfectly well:
Move disk 1 from A to C
Move disk 2 from A to B
Move disk 1 from C to B
Move disk 3 from A to C
Move disk 1 from B to A
Move disk 2 from B to C
Move disk 1 from A to C
My biggest question is: why does it work?
I have done some research, and every explanation says: "Move the first n-1 disks to the 'buffer' stack, then move disk n to the 'to' stack, and finally move the first n-1 disks to the 'to' stack." I understand the idea, but my question is: why does writing the code that way work? Also, why do we need to print:
cout << "Move disk 1 from " << from << " to " << to << endl;
in the base case for it to work? Following the idea above, why don't we just return when we hit the base case?
I tried to trace the recursion by hand, and the variables "from", "to", and "buffer" keep changing constantly, but somehow it works!
Please explain why.
Thank you!
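One way to look at the base-case question: with n == 1 as the base case, moving disk 1 is itself a real move, so returning without printing would silently drop every move of the smallest disk. If the base case is n == 0 instead, there is nothing to move and a plain return is enough. The sketch below is a variant of the code above under that assumption. It records the moves rather than printing them, and then replays them on three stacks to check that no larger disk ever lands on a smaller one. The names solveHanoi, replayIsLegal, and Move are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One move: take the top disk of peg `first` and put it on peg `second`.
using Move = std::pair<char, char>;

// Appends the moves that transfer the top n disks from `from` to `to`.
// Base case n == 0: moving zero disks needs no moves, so it just returns.
void solveHanoi(int n, char from, char to, char buffer, std::vector<Move>& moves)
{
    assert(n >= 0);
    if (n == 0)
        return;
    solveHanoi(n - 1, from, buffer, to, moves); // get the n-1 smaller disks out of the way
    moves.push_back(Move(from, to));            // disk n is now free to move
    solveHanoi(n - 1, buffer, to, from, moves); // put the n-1 smaller disks on top of disk n
}

// Replays the moves on three stacks and checks the rules: never move from
// an empty peg, never place a larger disk on a smaller one.
bool replayIsLegal(int n, const std::vector<Move>& moves)
{
    std::vector<int> pegs[3];          // indices 0, 1, 2 stand for pegs 'A', 'B', 'C'
    for (int d = n; d >= 1; --d)
        pegs[0].push_back(d);          // largest disk at the bottom of peg A
    for (const Move& m : moves) {
        std::vector<int>& src = pegs[m.first - 'A'];
        std::vector<int>& dst = pegs[m.second - 'A'];
        if (src.empty()) return false;
        if (!dst.empty() && dst.back() < src.back()) return false;
        dst.push_back(src.back());
        src.pop_back();
    }
    return pegs[2].size() == static_cast<std::size_t>(n); // all disks ended on peg C
}
```

The roles of from, to, and buffer change in every call because each recursive call solves a smaller puzzle with its own source, target, and spare peg. While the n-1 smaller disks are being moved, disk n and everything below it never move, and since they are all larger they can be treated as part of the floor. That is why each sub-call can assume it has three usable pegs.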

How to get camera intrinsics and extrinsics in openni2?

I have a PrimeSense Carmine 1.08 and a Carmine 1.09. I need the intrinsic parameters for the RGB and IR cameras and the extrinsics between the two. I use PCL with OpenNI2 support, so I need to know the sensor parameters used by OpenNI2/PCL.
Is there a way to find the intrinsics and the extrinsics using OpenNI2/PCL? Libfreenect2 has an option to get the IR and color camera intrinsics, but are these parameters the same as those in OpenNI? Are all these parameters extracted from the sensor at runtime?
I tried to get them via PCL, but I get NaN for the focal length and the principal point:
#include <iostream>
#include <string>
#include <pcl/io/openni2_grabber.h>

using namespace std;

int main (int argc, char** argv)
{
    std::string device_id ("");
    pcl::io::OpenNI2Grabber::Mode depth_mode =
        pcl::io::OpenNI2Grabber::OpenNI_Default_Mode;
    pcl::io::OpenNI2Grabber::Mode image_mode =
        pcl::io::OpenNI2Grabber::OpenNI_Default_Mode;
    pcl::io::OpenNI2Grabber grabber (device_id, depth_mode, image_mode);
    grabber.start();

    // Focal lengths (fx, fy) and principal point (px, py) of the depth camera
    double fx, fy, px, py;
    grabber.getDepthCameraIntrinsics(fx, fy, px, py);
    cout << "fx=" << fx << endl;
    cout << "fy=" << fy << endl;
    cout << "px=" << px << endl;
    cout << "py=" << py << endl;
    return (0);
}
A similar question has been asked here: https://stackoverflow.com/questions/41110791/openni-intrinsic-and-extrinsic-calibration. However, it hasn't received any answers.

Qt or OpenCV: print out the codec of a video file

I'd like to know how I can print out the codec of a video file after opening it with VideoCapture (on OSX or Ubuntu).
The file is correctly loaded and displayed by OpenCV inside a Qt application.
QString filename = QFileDialog::getOpenFileName(...);
cout << filename.size() << endl; // number of characters in the file name
VideoCapture cap = VideoCapture(filename.toStdString());
cout << cap.get(CV_CAP_PROP_FRAME_HEIGHT) << endl; // print the height
cout << cap.get(CV_CAP_PROP_FPS) << endl; // print the fps
// codec ??
Try
cap.get(CV_CAP_PROP_FOURCC);
to get the codec.
Edit:
I am not a C++ programmer, but this is what I found searching around to change it to a char array:
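int ex = static_cast<int>(cap.get(CV_CAP_PROP_FOURCC));
// Unpack the four characters, lowest byte first; the trailing 0 terminates the string.
char EXT[] = {(char)(ex & 0XFF), (char)((ex & 0XFF00) >> 8), (char)((ex & 0XFF0000) >> 16), (char)((ex & 0XFF000000) >> 24), 0};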
See:
http://docs.opencv.org/doc/tutorials/highgui/video-write/video-write.html
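The same unpacking can be wrapped in a small function that returns a std::string, which is easier to print. This is only a sketch: fourccToString is a made-up name, not an OpenCV function, and it assumes the FOURCC value stores its first character in the lowest byte, as in the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unpacks a FOURCC code into its four characters, lowest byte first.
std::string fourccToString(int fourcc)
{
    // Work on an unsigned copy so shifting the top byte is well defined.
    const std::uint32_t code = static_cast<std::uint32_t>(fourcc);
    std::string out(4, ' ');
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<char>((code >> (8 * i)) & 0xFF);
    return out;
}
```

With the question's variable names, usage would look like cout << fourccToString(static_cast<int>(cap.get(CV_CAP_PROP_FOURCC))) << endl;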

QList memory deallocation

I'm trying to free memory after using QList, but it doesn't seem to work properly.
Here's my code:
QList<double> * myList;
myList = new QList<double>;
double myNumber;
cout << "CP1" << endl;
getchar(); // checkpoint 1
for (int i=0; i<1000000; i++)
{
    myNumber = i;
    myList->append(myNumber);
    cout << myList->size() << endl;
}
cout << "CP2!" << endl;
getchar(); // checkpoint 2
for (int i=999999; i>0; i--)
{
    myList->removeLast();
    cout << myList->size() << endl;
}
cout << "CP3!" << endl;
getchar(); // checkpoint 3
delete myList;
cout << "CP4!" << endl;
getchar(); // checkpoint 4
Memory usage:
CP1: 460k
CP2: 19996k
CP3: 19996k
CP4: 16088k
So it looks like a large part of the memory is still in use even after removing the elements and deleting myList. I believe there is a way to handle this, but I can't find it.
Thanks in advance for any help.
Pawel
The memory manager is not required to return memory to the operating system as soon as your program frees it. There is nothing wrong with your deallocation.
QList is an array-based list. The array expands automatically but does not shrink automatically, so removing elements from the list does not affect the size of the array.
To trim the array down to the actual size, create a new QList and add the contents to it. Then delete the original list.
Unfortunately, it looks like there is no convenience method for this, like List.TrimExcess() in .NET.
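The copy-into-a-new-container idea can be tried without Qt. The sketch below uses std::vector as a stand-in for an array-based list; that substitution is an assumption made for illustration and says nothing about QList's internals. trimByCopy and growThenShrink are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rebuilds `values` in a freshly allocated buffer sized for its current
// contents, then drops the old, oversized buffer.
void trimByCopy(std::vector<double>& values)
{
    std::vector<double> trimmed(values.begin(), values.end()); // new copy, typically sized to fit
    values.swap(trimmed);  // the old buffer is freed when `trimmed` goes out of scope
}

// Grows a vector to `peak` elements, then removes all but `keep` of them.
std::vector<double> growThenShrink(std::size_t peak, std::size_t keep)
{
    assert(keep <= peak);
    std::vector<double> values;
    for (std::size_t i = 0; i < peak; ++i)
        values.push_back(static_cast<double>(i));
    while (values.size() > keep)
        values.pop_back();   // size drops, capacity does not
    return values;
}
```

After growThenShrink(1000000, 1) the vector holds one element but still reserves room for at least a million, and trimByCopy brings the capacity back down. Even then, the process memory reported by the operating system may not drop, for the reason given in the first paragraph of the answer.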
