Why is iteration so much more time-consuming than recursion?

Today, while solving a Fibonacci problem, I ran into something very strange: recursion takes only 16 ms, but iteration takes 80 ms. I have tried to optimize my iteration (for example, by backing my stack with a vector container), but it is still much slower than recursion. That doesn't make sense to me, because recursion still builds a call stack at the OS level, which should be more time-consuming than iteration.
Here is my iteration code:
#include <stack>
#include <vector>

class Solution {
public:
    int fib(int n) {
        std::stack<int, std::vector<int>> st;
        st.push(n);
        int result = 0;
        int temp = 0;
        while (!st.empty()) {
            temp = st.top(); st.pop();
            if (temp == 1) result++;
            else if (temp == 0) continue;
            else {
                st.push(temp - 1);
                st.push(temp - 2);
            }
        }
        return result;
    }
};
Here is my recursion code:
class Solution {
public:
    int fib(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }
};
Well, I have searched for the reason. According to "Is recursion ever faster than looping?", recursion is more time-consuming than iteration in an imperative language. But C++ is an imperative language, so that explanation is not convincing here.
I think I have found the reason. Could you help me check whether there is anything incorrect in my analysis?
The reason recursion is faster than iteration here is that if you use an STL container as a stack, it is allocated on the heap.
When the program accesses that stack, cache misses can occur, which is quite expensive for a small-scale problem.
For the Fibonacci solution, however, the code is not very long, so the program counter can easily jump to the beginning of the function. If you use a fixed-size int array as the stack instead, the result is satisfying.
Here is the code:
class Solution {
public:
    int fib(int n) {
        int arr[1000];
        arr[0] = n;
        int s = 1;
        int result = 0;
        int temp;
        while (s) {
            temp = arr[s - 1];
            s--;
            switch (temp) {
            case 1:
                result++;
                break;
            case 0:
                break;
            default:
                arr[s++] = temp - 1;
                arr[s++] = temp - 2;
            }
        }
        return result;
    }
};
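For reference, the usual linear-time iterative version avoids the explicit stack entirely; a minimal sketch with the same fib(n) interface:
class Solution {
public:
    // Linear-time iteration with two rolling variables, no stack needed.
    int fib(int n) {
        if (n == 0) return 0;
        int prev = 0, curr = 1;
        for (int i = 2; i <= n; ++i) {
            int next = prev + curr;   // fib(i) = fib(i-1) + fib(i-2)
            prev = curr;
            curr = next;
        }
        return curr;
    }
};
This runs in O(n) time and O(1) space, so both versions above are exponential-time formulations by comparison.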

Related

How to make a flowchart of this code? The hardest part for me is recursion in a for loop

static void Main(string[] args)
{
    string str = "ABCDE";
    char[] charArry = str.ToCharArray();
    permute(charArry, 0, str.Length - 1);
    Console.ReadKey();
}

static void permute(char[] arry, int i, int n)
{
    int j;
    if (i == n)
        Console.WriteLine(arry);
    else
    {
        for (j = i; j <= n; j++)
        {
            swap(ref arry[i], ref arry[j]);
            permute(arry, i + 1, n);
            swap(ref arry[i], ref arry[j]); // backtrack
        }
    }
}
I don't understand how to draw recursion in a block diagram, when there is already a for loop...
There is no standard flowcharting style; I think we pretty much wore out that concept 50 years ago. However, the way I would do this, and recommend to students, is to flowchart the recursion call as you would any other call. It's a "simple" process step. You do not diagram the control flow of the recursion itself.

How do I convert my depth-first traversal code from recursion to one using a stack?

My data structures and stack operations are defined like this.
typedef struct AdjListNode {
    int dest;
    struct AdjListNode* next;
    int visited;
} AdjListNode;

typedef struct AdjList {
    struct AdjListNode* head;
} AdjList;

typedef struct Graph {
    int V;
    struct AdjList* array;
} Graph;

typedef struct Stack {
    int top;
    int items[MAX];   /* the stack holds vertex indices */
} Stack;

void Push(struct Stack *s, int val) {
    if (s->top == MAX - 1) {
        printf("Error: Stack overflow.");
        exit(1);
    } else {
        s->items[++(s->top)] = val;
    }
}

int Pop(struct Stack *s) {
    if (s->top == -1) {
        printf("Error: Stack underflow");
        exit(1);
    } else {
        return s->items[(s->top)--];
    }
}

int isFull(struct Stack* s) {
    return s->top == MAX - 1;
}

int isEmpty(struct Stack* s) {
    return s->top == -1;
}
And this function checks if there is a path from p to q
void FindPath_DepthFirstSearch(Graph* graph, int p, int q) {
    AdjListNode* node = graph->array[p].head;
    node->visited = 1;
    // printf("%d\n", p);
    if (p == q) {
        printf("Path found!\n");
        exit(1);
    }
    while (node) {
        if (!graph->array[node->dest].head->visited)
            FindPath_DepthFirstSearch(graph, node->dest, q);
        node = node->next;
    }
    printf("Path not found!\n");
    exit(1);
}
I'm fairly new to learning data structures. The code works perfectly when I use recursion, but I got stuck for a long time when I tried to implement this using a stack. Can anyone help me with this? Thanks!
The basic idea when converting from a recursive definition to a loop-based one is that you store the data that you normally store in local variables on a stack (LIFO) container. Instead of recursing, you push additional elements on the stack. The code then looks like this:
function algorithm(args):
    stack = new LIFO
    # push initial state (e.g. root node for DFS)
    stack.push(args)
    # process all elements
    while stack.has_elements():
        # retrieve topmost element from stack
        e = stack.pop()
        # do something with the current element
        e.frobnicate()
        # push additional elements onto the stack
        if e.condition1():
            stack.push(e.derived1())
        if e.condition2():
            stack.push(e.derived2())
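Applied to the function above, a minimal sketch of the stack-based search might look like this (it reuses the Graph, AdjListNode and Stack types from the question, assumes the stack stores int vertex indices, and the function name is just illustrative):
// Iterative depth-first search using an explicit stack of vertex indices.
void FindPath_DepthFirstSearch_Iterative(Graph* graph, int p, int q) {
    Stack s;
    s.top = -1;                  // empty stack
    Push(&s, p);
    while (!isEmpty(&s)) {
        int v = Pop(&s);
        if (v == q) {
            printf("Path found!\n");
            return;
        }
        AdjListNode* head = graph->array[v].head;
        if (head == NULL || head->visited)
            continue;            // no outgoing edges, or already expanded
        head->visited = 1;       // mark the vertex on first expansion
        for (AdjListNode* node = head; node != NULL; node = node->next)
            Push(&s, node->dest);   // revisits are filtered out when popped
    }
    printf("Path not found!\n");
}
Note that a vertex is marked visited when it is popped and expanded rather than when it is pushed; that keeps the bookkeeping simple at the cost of occasionally pushing the same vertex more than once.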

simple testing of semaphores

I am trying to create a simple program to test semaphores. I fork the process and increment the value of the variable c in the critical section of each process, but the value of c I get is still 1, not 2, even with the mmap() call uncommented. Can anyone please explain what I am doing wrong? Any help would be appreciated. I am a total newbie at this. Thank you very much for your time.
#include <iostream>
#include <semaphore.h>
#include <sys/mman.h>
#include <unistd.h>
using namespace std;

int main()
{
    int c = 0;
    sem_t mutex;
    sem_t mutex1;
    // sem_t *mutex = (sem_t*)mmap(NULL, sizeof(sem_t*), PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
    sem_init(&mutex, 0, 1);
    sem_init(&mutex1, 0, 1);
    pid_t i;
    int id = fork();
    if(id == -1) {}
    else if(id == 0)
    {
        sem_wait(&mutex);
        c++;
        sem_post(&mutex);
    }
    else
    {
        sem_wait(&mutex);
        c++;
        sem_post(&mutex);
    }
    cout << c << endl;
    //system("pause");
    return 0;
}
I tried it another way, making the pshared argument 1, but it still does not work.
I have also tried it with semop(), but it still does not work.
#include <iostream>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
using namespace std;

int main()
{
    int c = 0;
    int sid = semget(1105, 2, 0666 | IPC_CREAT);
    pid_t i;
    int id = fork();
    if(id == -1)
    {
    }
    else if(id == 0)
    {
        struct sembuf sb;
        sb.sem_num = 0;
        sb.sem_op = -1;
        sb.sem_flg = 0;
        if((semop(sid, &sb, 1)) == -1)
            cout << "error" << endl;
        c++;
        sb.sem_num = 0;
        sb.sem_op = -1;
        sb.sem_flg = 0;
        if((semop(sid, &sb, 1)) == -1)
            cout << "error" << endl;
    }
    else if(id == 1)
    {
        struct sembuf sb;
        if((semop(sid, &sb, 1)) == -1)
            cout << "error" << endl;
        c++;
        sb.sem_num = 0;
        sb.sem_op = -1;
        sb.sem_flg = 0;
        if((semop(sid, &sb, 1)) == -1)
            cout << "error" << endl;
    }
    cout << c << endl;
    return 0;
}
If you use fork(), you have to share the semaphores between the forked processes. See the sem_init() manual for more details.
Alternatively, you can use a named semaphore; see sem_open() for details. There are also good articles on the subject.
Your primary misstep is that the variable c is not itself shared — each process operates on its own copy of the variable. You want something like this:
int *c;
c = mmap(NULL, sizeof(*c), PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
*c = 0;
// ... later ...
++*c;
Additionally, with respect to your sem_init() example, you should:
Allocate shared memory of the correct size: sizeof(sem_t) and not sizeof(sem_t*)
Set the pshared flag during sem_init()
You likely don't need conditional logic differentiating parent from child after the fork(). After all, you want them to do the same thing.
(Separately, please do not name a POSIX semaphore "mutex." That name will mislead hurried, POSIXly-minded folk who will think you are referring to a different kind of synchronization primitive.)
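Putting those points together, a minimal sketch of the sem_init() variant (assuming Linux, where MAP_ANONYMOUS is available; error checks omitted for brevity) might look like this:
// Sketch only: the counter and an unnamed semaphore both live in shared
// memory, so parent and child operate on the same objects after fork().
#include <iostream>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct Shared {
    sem_t lock;   // deliberately not called "mutex"
    int   c;
};

int main()
{
    Shared *sh = static_cast<Shared*>(mmap(NULL, sizeof(Shared),
                                           PROT_READ | PROT_WRITE,
                                           MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    sh->c = 0;
    sem_init(&sh->lock, 1, 1);   // pshared = 1: shared across fork()

    pid_t id = fork();
    sem_wait(&sh->lock);         // parent and child run the same code
    ++sh->c;
    sem_post(&sh->lock);

    if (id != 0) {               // parent: wait for the child, then report
        wait(NULL);
        std::cout << sh->c << std::endl;   // prints 2
    }
    return 0;
}
The child exits without printing, so the single line of output comes from the parent after the child's increment has completed.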
With respect to your semget() example, you appear to be waiting on the semaphore twice (sb.sem_op = -1) in the child process. The post-fork() check for the parent is incorrect — you check if the returned PID is 1 (which it will never be on a typical UNIX system) rather than if the returned PID is > 0. (Again, however, you likely don't need to have parent and child do different things here.)

How to assign a pointer a 2D square array of unknown size? What's wrong in the following function?

Here is the code I tried; a segmentation fault was the result.
void allocate(int ** universe, int n) // to assign pointer "universe" an n x n matrix
{
    universe = (int **) calloc(n, sizeof(int *));
    int l;
    for(l = 0; l < n; l++)
        universe[l] = (int *) calloc(n, sizeof(int));
    int u;
    for(l = 0; l < n; l++)
        for(u = 0; u < n; u++) // making all entries 0
            universe[l][u] = 0;
}
What's wrong with this function?
Since arguments are passed by value, your function works on a copy of the passed-in pointer and doesn't modify the pointer in the caller. That pointer remains uninitialised (or still points wherever it pointed before), so you can't access the allocated memory from the caller, and trying to access it via universe[i][j] is likely to cause a segmentation fault. As a further consequence, when allocate() returns, you have lost the only pointers to the allocated memory, so it is also a leak.
The correct way to do it is to
return the pointer,
int ** allocate(int n)
{
    int **universe = calloc(n, sizeof(int *));
    if (!universe) {
        fputs("Allocation of universe failed.\n", stderr);
        exit(EXIT_FAILURE);
    }
    int l;
    for(l = 0; l < n; l++) {
        universe[l] = calloc(n, sizeof(int));
        if (!universe[l]) {
            fprintf(stderr, "Failed to allocate row %d.\n", l);
            exit(EXIT_FAILURE);
        }
    }
    return universe;
}
and call that like int **universe = allocate(124);, or
pass in the address of the pointer you want to allocate memory to,
void allocate(int *** universe_addr, int n) // to assign pointer "universe" an n x n matrix
{
    int ** universe = calloc(n, sizeof(int *));
    if (!universe) {
        /* repair or exit */
    }
    int l;
    for(l = 0; l < n; l++) {
        universe[l] = calloc(n, sizeof(int));
        if (!universe[l]) {
            /* repair or exit */
        }
    }
    /* Now set the pointer in the caller */
    *universe_addr = universe;
}
and call it like allocate(&universe, 123);.
Note: I have removed the initialisation loop, since calloc already zeroes the allocated memory, so it is unnecessary to set it to 0 again.
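Whichever variant you choose, the caller ends up owning the memory, so it is worth keeping a matching deallocation routine alongside it; here is a small sketch mirroring the allocation above (the name deallocate is just illustrative):
void deallocate(int ** universe, int n)
{
    int l;
    for(l = 0; l < n; l++)
        free(universe[l]);   /* free each row */
    free(universe);          /* then free the array of row pointers */
}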

OpenCL autocorrelation kernel

I have written a simple program that does autocorrelation as follows. I've used PGI accelerator directives to move the computation to the GPU.
// autocorrelation
void autocorr(float *restrict A, float *restrict C, int N)
{
    int i, j;
    float sum;
    #pragma acc region
    {
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                if ((i+j) < N)
                    sum += A[j] * A[i+j];
                else
                    continue;
            }
            C[i] = sum;
        }
    }
}
I wrote a similar program in OpenCL, but I am not getting correct results. The program is as follows. I am new to GPU programming, so apart from hints that could fix my error, any other advice is welcome.
__kernel void autocorrel1D(__global double *Vol_IN, __global double *Vol_AUTOCORR, int size)
{
    int j, gid = get_global_id(0);
    double sum = 0.0;
    for (j = 0; j < size; j++) {
        if ((gid+j) < size)
        {
            sum += Vol_IN[j] * Vol_IN[gid+j];
        }
        else
            continue;
    }
    barrier(CLK_GLOBAL_MEM_FENCE);
    Vol_AUTOCORR[gid] = sum;
}
Since I launch the kernel with one work dimension, I assume my get_global_id(0) call gives me the index of the current work-item, which is used to index the input 1D array.
Thanks,
Sayan
The code is correct. As far as I know, it should run fine and give correct results.
barrier(CLK_GLOBAL_MEM_FENCE); is not needed, though. You'll get more speed without that statement.
Your problem is probably outside the kernel: check that you are passing the input correctly and that you are reading the correct data back from the GPU.
BTW, I suppose you are using a GPU that supports double precision, since you are doing double-precision calculations.
Also check that you are passing double values from the host. Remember you CAN'T point a float pointer at a double value, or vice versa. That will give you wrong results.
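To illustrate that last point, the host-side buffers have to be allocated and filled as double to match the kernel signature. A rough sketch of the buffer setup and launch, assuming a context ctx, queue queue and built kernel kernel already exist (those names are illustrative, and error checks are omitted):
#include <CL/cl.h>
#include <vector>

// Sketch: ctx, queue and kernel are assumed to have been created already.
void run_autocorrel1D(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                      const std::vector<double>& in, std::vector<double>& out)
{
    int size = static_cast<int>(in.size());
    out.resize(size);

    // Buffers are sized for double to match the __global double* kernel args.
    cl_mem d_in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  size * sizeof(double),
                                  const_cast<double*>(in.data()), NULL);
    cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                  size * sizeof(double), NULL, NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);
    clSetKernelArg(kernel, 2, sizeof(int), &size);

    size_t global = size;    // one work-item per output element
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0, size * sizeof(double),
                        out.data(), 0, NULL, NULL);

    clReleaseMemObject(d_in);
    clReleaseMemObject(d_out);
}
If the host arrays were float instead, the kernel would reinterpret their bits as double and produce garbage, which matches the symptom described in the question.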

Resources