Compare Buffers in the Tosca testing tool

How do you compare two buffers in Tosca?
I want to compare two buffers in one test step. Can anyone tell me how to compare two buffers in Tosca?

Related

How does the parallel GADriver support distributed memory?

When I set run_parallel = True for the SimpleGADriver, how is the memory handled? Does it do anything with distributed memory? Does it send each point in the generation to a single memory space (in case I have a setup that connects multiple nodes, each with its own memory)?
I am not sure I completely understand your question, but I can give an overview of how it works.
When "run_parallel" is True, and you are running under MPI with n processors, the SimpleGADriver will use those procs to evaluate the newly generated population design values. To start, the GA runs on each processor with local values in local memory. When a new set of points is generated, the values from rank 0 are broadcast to all ranks and placed into a list. Then those points are evaluated based on the processor rank, so that each proc is evaluating a different point. When completed, all of the values are allgathered, after which, every processor has all of the objective values for the new generation. This process continues until termination criteria are reached.
So essentially, we are just using multiple processors to speed up objective function evaluation (i.e., running the model), which can be significant for slower models.
One caveat is that the total population size needs to be divisible by the number of processors or an exception will be raised.
The choice to broadcast the population from rank 0 (rather than any other rank) is arbitrary, but those values come from a process that includes random crossover and tournament selection, so each processor does generate a new valid unique population and we just choose one.
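As a rough illustration, here is a minimal Python sketch of how that mode is typically enabled (assuming a recent OpenMDAO version, a toy single-variable model, and a script launched with something like "mpirun -n 4 python script.py"; the exact option names and defaults may differ between versions):

import openmdao.api as om

# Minimal sketch: a toy one-variable problem driven by SimpleGADriver.
prob = om.Problem()
prob.model.add_subsystem('comp', om.ExecComp('f = (x - 3.0)**2'), promotes=['*'])
prob.model.add_design_var('x', lower=-10.0, upper=10.0)
prob.model.add_objective('f')

prob.driver = om.SimpleGADriver()
prob.driver.options['run_parallel'] = True   # spread objective evaluations over the MPI ranks
prob.driver.options['pop_size'] = 8          # should be divisible by the number of processors
prob.driver.options['max_gen'] = 20
prob.driver.options['bits'] = {'x': 16}      # bit-encoding resolution for the design variable

prob.setup()
prob.run_driver()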

Stream processors with multiple inputs and outputs as arrows

Reading John Hughes's Generalising monads to arrows, I understand that arrows can be used to represent and combine stream processors with a single input and a single output. It is also possible to represent multiple inputs and outputs using pairs, or using ArrowChoice.
However, using a pair means the input is a stream of pairs, which isn't enough to express processing streams that arrive at different rates. ArrowChoice is able to express that, but it "multiplexes" the two streams into a single one.
I'm looking for a way to combine streams with multiple inputs and multiple outputs, while still being able to distinguish between the case where the streams are multiplexed and the case of separate streams.
Is that possible?
Maybe you could use the These type (from here), which is defined as:
data These a b = This a | That b | These a b
This way you could express that you are receiving one stream, or the other, or both.

Passing multiple variables in MPI

I am trying an implementation in MPI where I am invoking multiple slaves (up to 4) on the same machine (localhost) and distributing the computations of my for loop amongst the slaves. MPI is suited for my current application and I cannot take the OpenMP route.
There are about 50 variables involved, and all of them are one-dimensional arrays.
What would be the best way to send the 50 variables to the master process? Should I send and receive all variables or should I pack them in one 2D array and send this array across to the master?
I am looking for an efficient and computationally inexpensive approach.
Thanks
As so often: it depends. If your individual arrays are sufficiently large, such that latency becomes insignificant, it would be fine to send each one individually. Otherwise, it will be better to increase the size of your message by collecting all those arrays into a single one.
If your variables are of different type you could make use of MPI datatypes to describe the layout of your data.
Additionally, if you need to collect this data from multiple processes it might be a good idea to use MPI_Gather or one of its variants.
It might also be that a viable option in your scenario would be to make use of the one-sided communication facilities offered by MPI.
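For illustration, a minimal mpi4py sketch of the "pack everything into one buffer and gather it" approach (assuming all 50 arrays hold doubles and have the same length on every rank; the names and sizes here are made up):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

NUM_VARS = 50    # assumed: 50 one-dimensional arrays per process
ARR_LEN = 1000   # assumed: every array has the same length on every rank

# Pack the 50 local arrays into one contiguous 2D buffer so that a single
# message is sent per process instead of 50 small ones.
local_vars = [np.random.rand(ARR_LEN) for _ in range(NUM_VARS)]
send_buf = np.ascontiguousarray(np.vstack(local_vars))

# The master (rank 0) receives one (NUM_VARS, ARR_LEN) block from every rank.
recv_buf = None
if rank == 0:
    recv_buf = np.empty((comm.Get_size(), NUM_VARS, ARR_LEN), dtype='d')

comm.Gather(send_buf, recv_buf, root=0)

if rank == 0:
    # recv_buf[r, v, :] is variable v as computed on rank r.
    print(recv_buf.shape)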

Using OpenCL for multiple devices (multiple GPU)

Hello fellow StackOverflow Users,
I have this problem: I have one very big image which I want to work on. My first idea is to divide the big image into a couple of sub-images and then send these sub-images to different GPUs. I don't use the image object, because I don't work with the RGB values; I'm only using the brightness value to manipulate the image.
My questions are:
Can I use one context with many command queues, one for every device? Or should I use one context with one command queue for each device?
Can anyone give me an example or ideas of how I can dynamically change the input memory data (the sub-image data) when setting up the kernel arguments to send to each device? (I only know how to send the same input data.)
For example, if I have more sub-images than GPUs, how can I distribute the sub-images to the GPUs?
Or is there maybe another, smarter approach?
I'd appreciate any help and ideas.
Thank you very much.
Use one context, and many queues. The simple method is one queue per device.
Create one program, and a kernel for each device (created from the same program). Then create different buffers (one per device) and set each kernel with its own buffer. Now you have different kernels, and you can queue them in parallel with different arguments.
To distribute the jobs, simply use the event system: check whether a GPU is idle and queue the next job there.
I can provide a more detailed example with code, but as a general sketch that should be the way to follow.
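As one possible Python/pyopencl sketch of that scheme (one context over all GPUs, one queue per device, a separate buffer and kernel launch per sub-image; the brightness kernel and the image sizes are invented for illustration):

import numpy as np
import pyopencl as cl

# One context spanning all GPUs of the first platform, one queue per device.
platform = cl.get_platforms()[0]
devices = platform.get_devices(device_type=cl.device_type.GPU)
ctx = cl.Context(devices)
queues = [cl.CommandQueue(ctx, device=dev) for dev in devices]

# Assumed toy kernel: scale the brightness value of every pixel.
src = """
__kernel void scale_brightness(__global float *img, const float factor) {
    int i = get_global_id(0);
    img[i] *= factor;
}
"""
prg = cl.Program(ctx, src).build()

# Split the big image (here just random brightness values) into one
# sub-image per device and give each device its own buffer.
big_image = np.random.rand(1 << 20).astype(np.float32)
sub_images = np.array_split(big_image, len(devices))

results, events = [], []
for queue, sub in zip(queues, sub_images):
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=sub)
    evt = prg.scale_brightness(queue, (sub.size,), None, buf, np.float32(1.5))
    out = np.empty_like(sub)
    events.append(cl.enqueue_copy(queue, out, buf, wait_for=[evt],
                                  is_blocking=False))
    results.append(out)

cl.wait_for_events(events)          # all devices have finished their sub-image
processed = np.concatenate(results)

If there are more sub-images than GPUs, the same loop structure can be fed from a work list, enqueuing the next sub-image on whichever queue's previous event has completed, as the answer above suggests.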
The AMD APP SDK has a few samples on multi-GPU handling. You should look at these two samples:
SimpleMultiDevice: shows how to create multiple command queues on a single context, along with some performance results.
BinomialOptionMultiGPU: look at the loadBalancing method. It divides the buffer based on the compute units and max clock frequency of the available GPUs.

Can OpenCL chain multiple passes without returning to the CPU?

I want to auto scale some data. So, I want to pass through all the data and find the maximum extents of the data. Then I want to go through the data, do calculations, and send the results to OpenGL for rendering. Is this type of multipass approach possible in OpenCL? Or does the CPU have to direct the "find extents" calculation, get the results, and then direct the other calculation with them?
It sounds like you would need two OpenCL kernels, one for calculating the min and max and the other to actually scale the data. Using OpenCL command queues and events you can queue up these two kernels in order and store the results from the first in global memory, reading those results in the second kernel. The semantics of OpenCL command queues and events (assuming you don't have out-of-order execution enabled) will ensure that one completes before the other without any interaction from your host application (see clEnqueueNDRangeKernel).
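As a small pyopencl illustration of that pattern (an intentionally naive single-work-item "find extents" kernel followed by a scaling kernel, both enqueued on the same in-order queue so the intermediate result never goes back to the host; the kernels and sizes are made up for this sketch):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)   # in-order by default

src = """
// Naive pass 1: a single work-item scans the data and stores {min, max}.
__kernel void find_extents(__global const float *data, const int n,
                           __global float *extents) {
    float lo = data[0], hi = data[0];
    for (int i = 1; i < n; ++i) {
        lo = fmin(lo, data[i]);
        hi = fmax(hi, data[i]);
    }
    extents[0] = lo;
    extents[1] = hi;
}

// Pass 2: rescale every value into [0, 1] using the extents from pass 1.
__kernel void rescale(__global float *data, __global const float *extents) {
    int i = get_global_id(0);
    data[i] = (data[i] - extents[0]) / (extents[1] - extents[0]);
}
"""
prg = cl.Program(ctx, src).build()

data = np.random.rand(4096).astype(np.float32) * 100.0
mf = cl.mem_flags
data_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=data)
ext_buf = cl.Buffer(ctx, mf.READ_WRITE, size=2 * np.dtype(np.float32).itemsize)

# Both kernels go into the same in-order queue; the second automatically
# sees the extents written by the first, with no host round-trip.
prg.find_extents(queue, (1,), None, data_buf, np.int32(data.size), ext_buf)
prg.rescale(queue, (data.size,), None, data_buf, ext_buf)

out = np.empty_like(data)
cl.enqueue_copy(queue, out, data_buf)   # blocking read of the scaled data

In the actual use case the scaled buffer could be handed to OpenGL (for example via CL/GL interop) instead of being copied back to the host at the end.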
