I implemented an encoder in 2 ways.
1) based on the SDK Transcoder example, which uses topology and transcoding profile
2) based on IMFSourceReader and IMFSinkWriter, where the Source Reader delivers the samples to the Sink Writer for transcoding
I tested both implementations on Windows 8.1 with Nvidia (Quadro K2200) and Intel GPU (P4600/P4700)
But bizarrely, only the topology implementation uses the GPU (on both).
In 2) I set MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, which I guess should not be required, because 1) uses the GPU both with and without this flag set for the container type.
Is there a trick to enable the GPU with IMFSinkWriter, or is this a bug in Media Foundation?
I initially ran into the same issue. You don't mention how you configured the output type of the source reader (and the input type of the sink), but I found that if you let the system handle it (by selecting RGB32 as the output type of the reader), the performance will be horrible and entirely CPU bound. (Error checking omitted for brevity.)
hr = videoMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
hr = videoMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
hr = reader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, videoMediaType);
reader->SetStreamSelection((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, true);
And the documentation agrees, indicating that this configuration is only really useful for getting a single snapshot from a video. If you instead configure the reader to deliver its native media type, performance is excellent, but you now have to transform the format yourself.
reader->GetNativeMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, videoMode->GetIndex(), videoMediaType);
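For reference, a minimal sketch of the second step (assuming the first native type is acceptable): after querying the native type, you still have to make it the reader's current type so samples are delivered unconverted.
Microsoft::WRL::ComPtr<IMFMediaType> nativeType;
hr = reader->GetNativeMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0u, &nativeType);
// deliver samples in the native format, with no system conversion
hr = reader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, nativeType.Get());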
From here, if you are dealing with simple color conversion (like YUY2 or YUV from a webcam) then there are a few options. I originally tried writing my own converter, and pushing that off to the GPU using HLSL with DirectCompute. This works very well, but in your case, the format isn't as trivial.
Ultimately, creating and configuring an instance of the color converter (as an IMFTransform) works perfectly.
Microsoft::WRL::ComPtr<IMFTransform> mediaTransform;
hr = ::CoCreateInstance(CLSID_CColorConvertDMO, nullptr, CLSCTX_INPROC_SERVER, __uuidof(IMFTransform), reinterpret_cast<void**>(mediaTransform.GetAddressOf()));
// set the input type of the transform to the NATIVE output type of the reader
hr = mediaTransform->SetInputType(0u, videoMediaType.Get(), 0u);
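You also need to set the transform's output type to whatever you feed the sink. A sketch, assuming RGB32 output (the outputMediaType variable here is hypothetical; the color converter may also want frame size and frame rate attributes copied over from the input type):
Microsoft::WRL::ComPtr<IMFMediaType> outputMediaType;
hr = ::MFCreateMediaType(&outputMediaType);
hr = outputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
hr = outputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
// copy MF_MT_FRAME_SIZE / MF_MT_FRAME_RATE from the input type if required
hr = mediaTransform->SetOutputType(0u, outputMediaType.Get(), 0u);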
Create and configure a separate sample and buffer.
IMFSample* transformSample = nullptr;
IMFMediaBuffer* transformBuffer = nullptr;
hr = ::MFCreateSample(&transformSample);
// RGB_MFT_OUTPUT_BUFFER_SIZE: big enough for one output frame (width * height * 4 for RGB32)
hr = ::MFCreateMemoryBuffer(RGB_MFT_OUTPUT_BUFFER_SIZE, &transformBuffer);
hr = transformSample->AddBuffer(transformBuffer);
MFT_OUTPUT_DATA_BUFFER* transformDataBuffer = new MFT_OUTPUT_DATA_BUFFER();
transformDataBuffer->pSample = transformSample;
transformDataBuffer->dwStreamID = 0u;
transformDataBuffer->dwStatus = 0u;
transformDataBuffer->pEvents = nullptr;
When receiving samples from the source, hand them off to the transform to be converted.
hr = mediaTransform->ProcessInput(0u, sample, 0u);
hr = mediaTransform->ProcessOutput(0u, 1u, transformDataBuffer, &outStatus);
hr = transformDataBuffer->pSample->GetBufferByIndex(0, &mediaBuffer);
Then of course, finally hand off the transformed sample to the sink just as you do today. I am confident that this will work, and you will be a very happy person. For me, I went from 20% CPU utilization (original implementation) down to 2% (while concurrently displaying the video). Good luck. I hope you enjoy your project.
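For completeness, that hand-off is a single call; a sketch, where sinkWriter and videoStreamIndex (the index returned by IMFSinkWriter::AddStream) are assumptions standing in for your existing code:
hr = sinkWriter->WriteSample(videoStreamIndex, transformDataBuffer->pSample);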
Related
I am trying to create a PostScript file using the Apache xml-graphics library. When I run under Java, it works fine.
But when I build for .NET using IKVM, a call to GlyphVector.getOutline(x, y) never returns.
This happens when I do the following:
AttributedString as = new AttributedString(buf.toString());
// call as.addAttribute() setting various TextAttribute for various ranges
FontRenderContext frc = graphics.getFontRenderContext();
LineBreakMeasurer lineMeasurer = new LineBreakMeasurer(as.getIterator(), frc);
TextLayout layout = lineMeasurer.nextLayout(Integer.MAX_VALUE);
layout.draw(graphics, left, line.getBaseline());
Any idea what can be going wrong?
thanks - dave
I can successfully save video captured from C++ OpenCV; there is no problem.
But similar code is not capturing the video: it just creates out.avi, and it's only 6 KB.
I put the code in the show_frame function; there is no resizing, FYI.
Does anybody have experience with the OpenCV VideoWriter on Qt?
void Widget::show_frame(Mat &image)
{
    Mat resized_image = image.clone();
    video.write(image);  // write the frame while it is still BGR
    int width_of_label = ui->label_camera->width();
    int height_of_label = ui->label_camera->height();
    Size size(width_of_label, height_of_label);
    // cv::resize(image, resized_image, size);
    cvtColor(image, image, CV_BGR2RGB);  // convert for Qt display
    cvtColor(resized_image, resized_image, CV_BGR2RGB);
}
PS: Platform is Mac OS X.
I encountered the same problem as you and tried many solutions. I think you can make the fifth parameter of VideoWriter() false, that is: VideoWriter out = VideoWriter(video_name, CV_FOURCC('D', 'I', 'V', 'X'), frame_fps, Size(frame_width, frame_height), false). This works for me!
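(If you go this route, note that with the color flag set to false the writer expects single-channel frames, as far as I know; a sketch, where frame stands in for your captured BGR image:)
Mat gray;
cvtColor(frame, gray, CV_BGR2GRAY);  // writer opened with isColor=false wants 1 channel
out.write(gray);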
Make sure that your application has access to opencv_ffmpeg*.dll, for example by placing it in the working directory or in a directory on the PATH variable.
Try different codecs, too. AFAIK, MJPG has worked on all machines/systems tested so far.
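For what it's worth, a minimal sketch of opening the writer defensively, so a silent failure shows up immediately (the file name, FPS, and frame size here are assumptions; the frame size must match the frames you pass to write()):
VideoWriter video;
video.open("out.avi", CV_FOURCC('M', 'J', 'P', 'G'), 30.0,
           Size(frame_width, frame_height), true);
if (!video.isOpened()) {
    // typically a missing codec or an unavailable ffmpeg backend
    qWarning("VideoWriter failed to open");
}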
I wrote a barebones program template in XC8 (1.37) that I use to develop and test new GLCD functions for the 18F family. Programming is done via a PICkit 3. Since I need to quickly reprogram several times, it is really important that programming is as fast as possible.
Typically, the code size is around 2K and it takes less than 10 seconds to program.
Everything is fine until I must use a font table, defined as:
const char font8[] = {....
Now, with just 0x400 bytes added, the compiler places the table at the end of ROM, and programming the whole 64K of memory takes more than a minute.
Is there any way to avoid this?
I tried manually limiting the memory range in the MPLAB X options, but this is annoying and a little unsafe (sometimes part of the code gets truncated).
A while back I had to write some code for emissions testing, where I needed to copy data between extreme ends of RAM, and to do that I had to specify exact memory addresses. You can use the C extension __at() construct for this; see the XC8 user's guide: http://ww1.microchip.com/downloads/en/DeviceDoc/50002053F.pdf#page=27
int scanMode __at(0x200);
const char keys[] __at(123) = { 'r', 's', 'u', 'd' };
int modify(int x) __at(0x1000) {
    return x * 2 + 3;
}
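Applied to the font table, a hedged sketch: pinning it just above the existing program code keeps the used flash contiguous, so the programmer never has to touch the far end of ROM (the 0x0800 address is an assumption; pick one just past your ~2K of code):
// assumption: 0x0800 lies just past the ~2K program image
const char font8[] __at(0x0800) = { /* ... font data ... */ };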
I'm trying to create a VolumeSlider widget that changes the volume of my audio output.
log.debug("Starting audio player (%s)..." % url)
mediaSource = Phonon.MediaSource(url)
mediaSource.setAutoDelete(True)
self.player = Phonon.createPlayer(Phonon.VideoCategory, mediaSource)
self.player.setTickInterval(100)
self.player.tick.connect(self.updatePlayerLength)
self.mediaSlider.setMediaObject(self.player)
audioOutput = Phonon.AudioOutput(Phonon.MusicCategory)
Phonon.createPath(self.player, audioOutput)
self.mediaVolumeSlider.setAudioOutput(audioOutput)
self.player.play()
However even though I can move the volume slider, the actual volume doesn't change. What did I miss?
I have never used Phonon.createPlayer, because the API seems totally baffling. Apparently, it is supposed to be a "convenience" function that creates a path between a media object and an audio output. But it only returns a reference to the media object. There appears to be no access to the audio output object, which would seem to make it completely useless (but I may well be missing something).
Anyway, I think it is much more convenient to create the paths explicitly, so that it is clear how all the parts are connected together.
The following code works for me on Linux and WinXP:
self.media = Phonon.MediaObject(self)
self.video = Phonon.VideoWidget(self)
self.audio = Phonon.AudioOutput(Phonon.VideoCategory, self)
Phonon.createPath(self.media, self.audio)
Phonon.createPath(self.media, self.video)
self.slider = Phonon.VolumeSlider(self)
self.slider.setAudioOutput(self.audio)
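To actually play something through this graph, you load and start a source on the media object as usual; a minimal usage sketch (the url variable is an assumption):
self.media.setCurrentSource(Phonon.MediaSource(url))
self.media.play()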
Problem: Attempting to share a Renderbuffer (or a Texture), bound to a Framebuffer, fails when clSetKernelArg() is called. Thorough error checking reports no problems until that call.
My program generates frames for a video projector that runs at 60fps (16.7ms frames).
My kernel runs in (typically) 24ms, but it's taking 50ms between frames. I assume some of the extra cost comes from using the GPU to calculate the pixels, then enqueuing a buffer read to pull the data off the GPU, then using glDrawPixels to put it back onto the GPU for display. A perfect situation to try OpenGL/OpenCL interoperation, right? It would avoid the two extra copy operations.
There are many examples, and I have succeeded in sharing a VBO with OpenCL, and can write to it, but that doesn't help me. I don't want to write vertex data, just a 2-D image that's been calculated.
There are examples of two different ways to do this, and they both involve Framebuffer objects.
You can attach a Renderbuffer to a Framebuffer, or you can attach a Texture to a Framebuffer.
Then you should be able to write to that buffer in opencl and display it with opengl, no extra copies.
I have found a few examples of this in code, and I think I'm doing everything exactly the way the examples say to do it, but maybe it is broken on OS X, because it doesn't work. The FBO is "complete" and there are no errors along the way, until I try the clSetKernelArg. That call returns error -38, CL_INVALID_MEM_OBJECT.
*note: I would rather use a Renderbuffer than a Texture, since all I'm doing is making a 2-D RGB image that I want to display. But I tried a Texture out of desperation. Still no help.
I do these steps, in this order, with some other stuff in between:
kCGLContext = CGLGetCurrentContext();
kCGLShareGroup = CGLGetShareGroup( kCGLContext );
glGenFramebuffers( 1, &fboid );
glBindFramebuffer( GL_FRAMEBUFFER, fboid );
glGenRenderbuffers( 1, &rboid );
glBindRenderbuffer( GL_RENDERBUFFER, rboid );
glRenderbufferStorage( GL_RENDERBUFFER, GL_RGBA, rb_wid, rb_hgt );
glboid = rboid;
glFramebufferRenderbuffer( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, rboid );
then:
cl_context_properties ourprops[] = { CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE, (cl_context_properties)kCGLShareGroup, 0 };
contextZ = clCreateContext( ourprops, 1, &dev_idZ[0], clLogMessagesToStdoutAPPLE, NULL, &err );
clbo = clCreateFromGLRenderbuffer( contextZ, CL_MEM_WRITE_ONLY, glboid, &err );
then clCreateCommandQueue, clCreateProgramWithBinary, clBuildProgram, clCreateKernel, all no errors
then later:
glFinish();
clEnqueueAcquireGLObjects( queueZ, 1, &clbo, 0,0,0 );
err = clSetKernelArg( kernelZ, 1, sizeof(cl_mem), &clbo );
... which fails with error -38, CL_INVALID_MEM_OBJECT.
clbo is a static cl_mem, just like the buffer object that's used when interop is not on. The difference is that it was created using clCreateFromGLRenderbuffer instead of clCreateBuffer, and it lives in a context created in association with the GL sharegroup.
(I've tried adding a second Renderbuffer and attaching it to a Depth Attachment Point, in case that was needed. No help.)
(I've tried the same thing with a Texture bound to the FBO, and I get the same error in the same place.)
... does anybody have any ideas at all?
Ok; got it!
The problem was that the kernel declared its output argument as uint* instead of image2d_t. I didn't think this would matter, at least for the clSetKernelArg call, since both are represented as cl_mem on the host side. However, once you call clCreateFromGLTexture2D or clCreateFromGLRenderbuffer, that object acquires properties that OpenCL knows about. When the kernel was changed to declare image2d_t, more useful error messages appeared, and it now works.
Bonus fact: you can't write to a UNORM_INT8 image with write_imageui; you have to use write_imagef.
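To illustrate both points, a minimal kernel sketch (names are assumptions): the output argument is declared as an image, and because the shared renderbuffer maps to a UNORM_INT8 image it is written with write_imagef.
__kernel void render(__write_only image2d_t out)
{
    int2 p = (int2)(get_global_id(0), get_global_id(1));
    // write_imagef, not write_imageui, for UNORM_INT8 images
    write_imagef(out, p, (float4)(1.0f, 0.0f, 0.0f, 1.0f));
}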