CS50, Pset 4, filter-less, reflection issues - pointers

So I'm currently trying to do the less comfortable Pset 4 and have just got to the reflection part.
My code seems to work with the examples provided but fails the CS50 check. I believe it's something to do with my logic that I just can't get my head around, maybe to do with the fact that I've not actually done anything regarding whether the width is even or odd.
Also, I originally tried to put the * pointer on the "image[i][j]" parts, but that didn't work, so I decided to do it on the "temp" part and discovered I had to use "malloc". I'm not sure why this is the case, as in the lecture it's the variables we want to change that are given a pointer and not the temporary variable, so any explanation of that would also be appreciated.
I've also never used this site before, so apologies if this doesn't make sense.
PG x
edit: I've also just realised it 'works' on the images provided even if there are no pointers.
edit: I saw on another question that doing "image[i][width - 1 - j]" works, so now it passes the check, but I'm not sure why it's a "-1". Is it because arrays start at [0]?
void reflect(int height, int width, RGBTRIPLE image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        // width / 2 as we swap half the picture with the other half (unless it's an uneven width?! :O)
        for (int j = 0; j < width / 2; j++)
        {
            // first pixel = last pixel
            // the * means that we are going to the LOCATION of temp in memory rather than just the value itself
            RGBTRIPLE *temp = malloc(sizeof(RGBTRIPLE));
            *temp = image[i][j];
            // this is the version that fails the check: when j == 0 it touches
            // image[i][width], which is one past the last valid index (width - 1)
            image[i][j] = image[i][width - j];
            image[i][width - j] = *temp;
            free(temp);
        }
    }
}
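The "-1" and the odd-width worry can both be checked on a single row. The following is only a minimal sketch, written in Java on a plain int array rather than the pset's C RGBTRIPLE grid; the class name MirrorRow and the method reflectRow are made up for illustration. It assumes nothing beyond zero-based indexing: the last valid index is width - 1, so the mirror of index j is width - 1 - j, and with an odd width the integer division width / 2 simply leaves the middle element where it is.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the CS50 distribution code.
class MirrorRow {
    // Reverses one row in place by swapping index j with its mirror, width - 1 - j.
    static void reflectRow(int[] row) {
        int width = row.length;
        // Integer division: for an odd width the middle element is never visited.
        for (int j = 0; j < width / 2; j++) {
            int temp = row[j];                // a plain local copy is enough, no allocation needed
            row[j] = row[width - 1 - j];      // the last valid index is width - 1
            row[width - 1 - j] = temp;
        }
    }

    public static void main(String[] args) {
        int[] odd = {1, 2, 3, 4, 5};
        int[] even = {1, 2, 3, 4};
        reflectRow(odd);
        reflectRow(even);
        System.out.println(Arrays.toString(odd));   // [5, 4, 3, 2, 1]
        System.out.println(Arrays.toString(even));  // [4, 3, 2, 1]
    }
}
```

The same swap in C works with an ordinary RGBTRIPLE temp variable, which would be consistent with the observation in the edit that the pointer version and the non-pointer version behave the same on the sample images.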


OpenCL Atomic add for vector types?

I'm updating a single element in a buffer from two lanes and need an atomic for float4 types. (More specifically, I launch twice as many threads as there are buffer elements, and each successive pair of threads updates the same element.)
For instance (this pseudocode does nothing useful, but hopefully illustrates my issue):
int idx = get_global_id(0);
int mapIdx = idx / 2; // integer division: threads 2k and 2k+1 share element k
float4 toAdd;
// ...
if (idx % 2)
{
toAdd = (float4)(0,1,0,1);
}
else
{
toAdd = (float4)(1,0,1,0);
}
// avoid race condition here?
// I'd like to atomic_add(map[mapIdx],toAdd);
map[mapIdx] += toAdd;
In this example, map[0] should be incremented by (1,1,1,1). (0,1,0,1) from thread 0, and (1,0,1,0) from thread 1.
Suggestions? I haven't found any reference to vector atomics in the CL documents. I suppose I could do this on each individual vector component separately:
atomic_add(map[mapIdx].x, toAdd.x);
atomic_add(map[mapIdx].y, toAdd.y);
atomic_add(map[mapIdx].z, toAdd.z);
atomic_add(map[mapIdx].w, toAdd.w);
... but that just feels like a bad idea. (And it requires a cmpxchg hack, since there are no float atomics.)
Suggestions?
Alternatively, you could try using local memory like this:
__local float4 local_map[LOCAL_SIZE/2];
if(idx < LOCAL_SIZE/2) // Splitting the work items into two halves is better than taking every second one (idx%2): neighbouring work items run together in a warp/wavefront anyway, so alternating may hurt performance
local_map[mapIdx] = toAdd;
barrier(CLK_LOCAL_MEM_FENCE);
if(idx >= LOCAL_SIZE/2)
local_map[mapIdx - LOCAL_SIZE/2] += toAdd;
barrier(CLK_LOCAL_MEM_FENCE);
Which will be faster, atomics or local memory, or whether local memory is possible at all (the required local memory may be too large), depends on the actual kernel, so you will need to benchmark and choose the right solution.
Update:
Answering your question from the comments: to write the result back to the global buffer later, do:
if(idx < LOCAL_SIZE/2)
map[mapIdx] = local_map[mapIdx];
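Or you can try without introducing a local buffer and write directly into the global buffer (keep in mind that barrier only synchronizes the work items within one work-group, so both writers of an element must be in the same group):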
if(idx < LOCAL_SIZE/2)
map[mapIdx] = toAdd;
barrier(CLK_GLOBAL_MEM_FENCE); // <- notice that now we use barrier related to global memory
if(idx >= LOCAL_SIZE/2)
map[mapIdx - LOCAL_SIZE/2] += toAdd;
barrier(CLK_GLOBAL_MEM_FENCE);
Aside from that, I can now see a problem with the indexes. To use the code from my answer, the previous code should look like:
if(idx < LOCAL_SIZE/2)
{
toAdd = (float4)(0,1,0,1);
}
else
{
toAdd = (float4)(1,0,1,0);
}
If you need to use idx%2, though, then all the code must follow that, or you will have to do some index arithmetic so that the values go into the right places in map.
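If I understand the issue correctly, I would do the following.
Get rid of the ifs by making an array with the offsets
float4 offsets[2] = {(float4)(1,0,1,0), (float4)(0,1,0,1)};
and use idx % 2 as the index into it.
Move map into local memory and use barrier(CLK_LOCAL_MEM_FENCE) to make sure all threads in the group are synced (mem_fence on its own only orders the memory operations of a single work item).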

how to rewrite the recursive solution to iterative one

The problem is taken from an online judge (OJ).
The description is:
We are playing the Guess Game. The game is as follows:
I pick a number from 1 to n. You have to guess which number I picked.
Every time you guess wrong, I'll tell you whether the number I picked is higher or lower.
However, when you guess a particular number x, and you guess wrong, you pay $x. You win the game when you guess the number I picked.
Given a particular n ≥ 1, find out how much money you need to have to guarantee a win.
I wrote a small recursive snippet for this minimax problem, but it is slow and I want to rewrite it in an iterative way. Could anyone help with that and give me an idea of how to convert the recursive solution to an iterative one? Any idea is appreciated. The code is shown below:
public int getMoneyAmount(int n) {
int[][] dp = new int[n + 1][n + 1];
for(int i = 0; i < dp.length; i++)
Arrays.fill(dp[i], -1);
return solve(dp, 1, n);
}
private int solve(int[][] dp, int left, int right){
if(left >= right){
return 0;
}
if(dp[left][right] != -1){
return dp[left][right];
}
dp[left][right] = Integer.MAX_VALUE;
for(int i = left; i <= right; i++){
dp[left][right] = Math.min(dp[left][right], i + Math.max(solve(dp, left, i - 1),solve(dp, i + 1, right)));
}
return dp[left][right];
}
In general, you convert using some focused concepts:
Replace the recursion with a while loop -- or a for loop, if you can pre-determine how many iterations you need (which you can do in this case).
Within the loop, check for the recursion's termination conditions; when you hit one of those, skip the rest of the loop.
Maintain local variables to replace the parameters and return value.
The loop termination is completion of the entire problem. In your case, this would be filling out the entire dp array.
The loop body consists of the computations that are currently in your recursion step: preparing the arguments for the recursive call.
Your general approach is to step through a nested (2-D) loop to fill out your array, starting from the simplest cases (left = right) and working your way to the far corner (left = 1, right = n). Note that your main diagonal is 0 (initialize that before you get into the loop), and your lower triangle is unused (don't even bother to initialize it).
For the loop body, you should be able to derive how to fill in each succeeding diagonal (one element shorter in each iteration) from the one you just did. That assignment statement is the body. In this case, you don't need the recursion termination conditions: the one that returns 0 is what you cover in initialization; the other you never hit, controlling left and right with your loop indices.
Are these enough hints to get you moving?
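For reference, the hints above can be sketched as a bottom-up loop. This is only one possible reading of them, not necessarily the layout the answer had in mind: it fills the table by increasing range length (one diagonal at a time) and reuses the recurrence from the question unchanged. The class name GuessGame and the extra row and column (size n + 2) are assumptions made here so that dp[i + 1][right] stays in bounds when i == right; Java zero-initializes int arrays, which covers the main diagonal and the empty ranges.

```java
// Hypothetical wrapper class; the method name follows the question's code.
class GuessGame {
    // dp[left][right] = minimum money that guarantees a win on the range [left, right].
    static int getMoneyAmount(int n) {
        // Size n + 2 so that dp[right + 1][right] (an empty range) is a valid, zero-valued cell.
        int[][] dp = new int[n + 2][n + 2];
        // Ranges of length 1 cost 0 (already zero), so start with length 2.
        for (int len = 2; len <= n; len++) {
            for (int left = 1; left + len - 1 <= n; left++) {
                int right = left + len - 1;
                int best = Integer.MAX_VALUE;
                // Same recurrence as the recursive solve(): guess i, pay i, then the worse half.
                for (int i = left; i <= right; i++) {
                    int cost = i + Math.max(dp[left][i - 1], dp[i + 1][right]);
                    best = Math.min(best, cost);
                }
                dp[left][right] = best;
            }
        }
        return dp[1][n];
    }

    public static void main(String[] args) {
        System.out.println(getMoneyAmount(1));   // 0
        System.out.println(getMoneyAmount(2));   // 1
        System.out.println(getMoneyAmount(3));   // 2
    }
}
```

Every cell read inside the innermost loop belongs to a strictly shorter range, so it has already been filled by an earlier pass of the outer loop; that ordering is what replaces the memoization check in the recursive version.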

Getting a part of a QMap as a QVector

I have some elements in a QMap<double, double> a-element. Now I want to get a vector of some values of a. The easiest approach would be (for me):
int length = x1-x0;
QVector<double> retVec;
for(int i = x0; i < x1; i++) // x1 is the stop position; length is only the element count
{
retVec.push_back(a.values(i));
}
with x1 and x0 as the stop and start positions of the elements to be copied. But is there a faster way than using this for-loop?
Edit: By "faster" I mean both faster to type and (not possible, as pointed out) faster to execute. As has been pointed out, values(i) does not work as expected, so I will leave it here as pseudo-code until I have found a better-working replacement.
Maybe this works:
QVector<double>::fromList(a.values().mid(x0, length));
The idea is to get all the values as a list of doubles, extract the sublist you are interested in, and then create a vector from that list by means of an existing static method of QVector.
EDIT
As suggested in the comments and in the updated question, here is a solution that is slower to type but faster to run:
QVector<double> v;
v.reserve(length); // reserve capacity up front; QVector<double> v{length} would not do that
auto it = a.cbegin()+x0;
for(auto last = it+length; it != last; it++) {
v.push_back(it.value());
}
I assume that x0 and length respect the actual number of keys in the map, so a.cbegin()+x0 is valid and it isn't worth adding the guard it != a.cend() as well.
Try this. It should work, but I haven't tested it:
int length = x1-x0;
QVector<double> retVec;
retVec.reserve(length); // reserve to avoid reallocations
QMap<double, double>::const_iterator i = a.constBegin(); // a is the map from the question
i += x0; // increment to range start
while (length--) retVec << i++.value(); // add value to vector and advance iterator
This assumes the map has actually enough elements, thus the iterator is not tested before use.

Usage of Map and Translate Functions in Processing

I'm new to Processing and am working on understanding this code:
import com.onformative.leap.LeapMotionP5;
import java.util.*;
LeapMotionP5 leap;
LinkedList<Integer> values;
public void setup() {
size(800, 300);
frameRate(120); //Specifies the number of frames to be displayed every second
leap = new LeapMotionP5(this);
values = new LinkedList<Integer>();
stroke(255);
}
int lastY = 0;
public void draw() {
translate(0, 180); // (x, y)
background(0);
if (values.size() >= width) {
values.removeFirst();
}
values.add((int) leap.getVelocity(leap.getHand(0)).y);
System.out.println((int) leap.getVelocity(leap.getHand(0)).y);
int counter = 0;
for (Integer val : values) {
val = (int) map(val, 0, 1500, 0, height);
line(counter, val, counter - 1, lastY);
point(counter, val);
lastY = val;
counter++;
}
line(0, map(1300, 0, 1500, 0, height), width, map(1300, 0, 1500, 0, height)); // (x1, y1, x2, y2)
}
It basically draws a graph of the movement detected on the y axis using the Leap Motion sensor.
I eventually need to do something similar to this that would detect amplitude instead of velocity, simultaneously on all 3 axes instead of just the y.
The use of map() and translate() is what's really confusing me. I've read the definitions of these functions on the Processing website, so I know what they are and their syntax, but what I don't understand is the why (which is arguably the most important part).
I am asking if someone can provide simple examples that explain the WHY behind using these 2 functions. For instance, given a program that needs to do A, B, and C, with data foo, y, and x, you would use Map or Translate because A, B, and C.
I think programming guides often overlook this important fact but to me it is very important to truly understanding a function.
Bonus points for explaining:
for (Integer val : values) and LinkedList<Integer> values; (I can't find any documentation on the Processing website for these)
Thanks!
First, we'll do the easiest one. LinkedList is a data structure similar to ArrayList, which you may be more familiar with. If not, then it's just a list of values (of the type between the angle brackets, in this case Integer) that you can insert into and remove from. It's a bit complicated on the inside, but if it doesn't appear in the Processing documentation, it's a safe bet that it's built into Java itself (Java documentation).
This line:
for (Integer val : values)
is called a "for-each" or "foreach" loop, which has plenty of very good explanation on the internet, but I'll give a brief explanation here. If you have some list (perhaps a LinkedList, perhaps an ArrayList, whatever) and want to do something with all the elements, you might do something like this:
for(int i = 0; i < values.size(); i++){
println(values.get(i)); //or whatever
println(values.get(i) * 2);
println(pow(values.get(i),3) - 2*pow(values.get(i),2) + values.get(i));
}
If you're doing a lot of manipulation with each element, it quickly gets tedious to write out values.get(i) each time. The solution would be to capture values.get(i) in some variable at the start of the loop and use that everywhere instead. However, this is not 100% elegant, so Java has a built-in way to do this, which is the for-each loop. The code
for (Integer val : values){
//use val
}
is equivalent to
for(int i = 0; i < values.size(); i++){
int val = values.get(i);
//use val
}
Hopefully that makes sense.
map() takes a number in one linear system and maps it onto another linear system. Imagine if I were an evil professor and wanted to give students random grades from 0 to 100. I have a function that returns a random decimal between 0 and 1, so I can now do map(rand(),0,1,0,100); and it will convert the number for me! In this example, you could also just multiply by 100 and get the same result, but it is usually not so trivial. In this case, you have a sensor reading between 0 and 1500, but if you just plotted that value directly, sometimes it would go off the screen! So you have to scale it to an appropriate scale, which is what that does. 1500 is the max that the reading can be, and presumably we want the maximum graphing height to be at the edge of the screen.
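To make the scaling concrete, here is a small stand-alone sketch of the arithmetic behind such a linear mapping. It is written in plain Java rather than as a Processing sketch; the class name LinearMap and the method name remap are hypothetical stand-ins, and the screen height of 300 is taken from size(800, 300) in the question. The idea is to work out how far along the input range the value sits and then place it the same fraction of the way along the output range.

```java
// Hypothetical stand-in for illustration; it is not Processing's own source code.
class LinearMap {
    // Re-expresses value, given on the range [start1, stop1], on the range [start2, stop2].
    static float remap(float value, float start1, float stop1, float start2, float stop2) {
        float fraction = (value - start1) / (stop1 - start1);  // position within the input range
        return start2 + (stop2 - start2) * fraction;           // same position within the output range
    }

    public static void main(String[] args) {
        // The "evil professor" case: a number in [0, 1] becomes a grade in [0, 100].
        System.out.println(remap(0.5f, 0f, 1f, 0f, 100f));     // 50.0
        // The sensor case: a reading in [0, 1500] becomes a y coordinate in [0, 300].
        System.out.println(remap(1300f, 0f, 1500f, 0f, 300f));
        // A negative reading lands above the top of the window, which is where translate() comes in.
        System.out.println(remap(-750f, 0f, 1500f, 0f, 300f));
    }
}
```

A value outside the input range is not clamped by this formula, so negative readings come out as negative coordinates.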
I'm not familiar with your setup, but it looks like the readings can be negative, which means that they might get graphed off the screen, too. The better solution would be to map the readings from -1500,1500 to 0,height, but it looks like they chose to do it a different way. Whenever you call a drawing function in Processing (e.g. point(x,y)), it draws the pixels at (x,y) offset from (0,0). Sometimes you don't want it to draw relative to (0,0), so the translate() function allows you to change the point that things are drawn relative to. In this case, translating allows you to plot some point (x,0) somewhere in the middle of the screen, rather than on the edge.
Hope that helps!

How to get a QVector<T> from a QVector<QVector<T>>?

I've got a QVector of QVector. And I want to collect all elements in all QVectors to form a new QVector.
Currently I use the code like this
QVector<QVector<T> > vectors;
// ...
QVector<T> collected;
for (int i = 0; i < vectors.size(); ++i) {
collected += vectors[i];
}
But it seems that operator+= actually appends the elements one by one. So is there a more time-efficient way to use QVector, or a more suitable type to replace QVector?
If you really need to, then I would do something like:
QVector< QVector<T> > vectors = QVector< QVector<T> >();
int totalSize = 0;
for (int i = 0; i < vectors.size(); ++i)
totalSize += vectors.at(i).size();
QVector<T> collected;
collected.reserve(totalSize);
for (int i = 0; i < vectors.size(); ++i)
collected << vectors[i];
But please take note that this sounds a bit like premature optimisation. As the documentation points out:
QVector tries to reduce the number of reallocations by preallocating up to twice as much memory as the actual data needs.
So don't do this kind of thing unless you're really sure it will improve your performance. Keep it simple (like your current way of doing it).
Edit in response to your additional requirement of O(1):
Well, if you're inserting at random positions, a linked list is the better fit, but if you're just appending (as that's all you've mentioned), you already get amortized O(1) with QVector. Take a look at the documentation for Qt containers.
for (int i = 0; i < vectors.size(); ++i) {
for(int k=0;k<vectors[i].size();k++){
collected.push_back(vectors[i][k]);
}
}
outer loop: take out each vector from vectors
inner loop: take out each element in the i'th vector and push into collected
You could use Boost Multi-Array, this provides a multi-dimensional array.
It is also a 'header only' library, so you don't need to separately compile a library, just drop the headers into a folder in your project and include them.
See the link for the tutorial and example.
