Hadoop MapReduce recursion with several outputs? - recursion

Hi, I have a MapReduce program that takes the reducer's output as its input at every recursion step, but I also need to write a second result at every step.
input1--->Map1-->Reduce1--> output1 and output11
output1--->Map2-->Reduce2--> output2 and output22
output2--->Map3-->Reduce3--> output3 and output33
output3--->Map4-->Reduce4--> output4 and output44
As the final output I need output11, output22, output33, output44 and output4.
In other words, each step produces two output files: one goes to the next iteration and the other goes to the final output.
I am using SequenceFileAsTextInputFormat.
Any help is appreciated, thank you.

You can run a separate MapReduce job for each recursive step, and use the output file of one step as the input of the next MapReduce job.
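The chaining idea can be sketched in plain Python, without any Hadoop API. This is only a toy model: the hypothetical `run_step` function stands in for one Map/Reduce job and returns two results, one fed to the next iteration and one kept as a side output. The names `run_step` and `run_chain` and the arithmetic inside them are assumptions made for illustration.

```python
def run_step(records, step):
    """Stand-in for one MapReduce job (hypothetical): returns (main, side)."""
    main = [r + 1 for r in records]          # fed to the next iteration
    side = f"side-{step}: sum={sum(main)}"   # kept as part of the final output
    return main, side


def run_chain(initial, steps):
    """Run `steps` jobs in sequence, feeding each main output to the next job."""
    current = initial
    side_outputs = []
    for step in range(1, steps + 1):
        current, side = run_step(current, step)
        side_outputs.append(side)
    return current, side_outputs


final_main, sides = run_chain([1, 2, 3], 4)
print(final_main)  # plays the role of output4
print(sides)       # plays the role of output11 .. output44
```

In a real driver, each step would presumably be its own job whose input path is the previous job's main output directory. Hadoop's `MultipleOutputs` class is one possible way to write the second, named output from a reducer.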

Related

Why is the FOR loop in my program producing empty matrices?

I am having a problem running a spiking-neuron simulator. I keep getting the error message, "operation +: Warning adding a matrix with the empty matrix will give an empty matrix result." I'm writing this program in Scilab, but I'm hoping the problem will be clear to the educated eye regardless. What I am doing is converting an existing MATLAB program to Scilab. The original MATLAB program and an explanation can be found here: https://www.izhikevich.org/publications/spikes.pdf
What happens in my Scilab version is that the first pass through the loop produces all the expected values. I know this because I hit pause at the end of the first run, right before "end," and check all the values and matrix elements. However, if I run the program proper, which includes a loop of 20 iterations, I get the error message above, and all of the matrix values are empty! I cannot figure out what the problem is. I am fairly new to programming, so the answer may be very simple for all I know. Here is the Scilab version of the program:
Ne=8; Ni=2;
re=rand(Ne,1); ri=rand(Ni,1);
a=[0.02*ones(Ne,1); 0.02+0.08*ri];
b=[0.2*ones(Ne,1); 0.25-0.05*ri];
c=[-65+15*re.^2; -65*ones(Ni,1)];
d=[8-6*re.^2; 2*ones(Ni,1)];
S=[0.5*rand(Ne+Ni,Ne), -rand(Ne+Ni,Ni)];
v=60*rand(10,1)
v2=v
u=b.*v;
firings=[];
for t=1:20
I=[5*rand(Ne,1,"normal");2*rand(Ni,1,"normal")];
fired=find(v>=30);
j = length(fired);
h = t*ones(j,1);
k=[h,fired'];
firings=[firings;k];
v(fired)=c(fired);
u(fired)=u(fired)+d(fired);
I=I+sum(S(:,fired),"c");
v=v+0.5*(0.04*v.^2+5*v+140-u+I);
v=v+0.5*(0.04*v.^2+5*v+140-u+I);
u=u+a.*(b.*v-u);
end
plot(firings(:,1), firings(:,2),".");
I tried everything to no avail. The program should run through 20 iterations and produce a "raster plot" of dots representing the fired neurons at each of the 20 time steps.
You can add the following line
oldEmptyBehaviour("on")
at the beginning of your script in order to override the default Scilab rule (any algebraic operation with an empty matrix yields an empty matrix). However, you will still get some warnings (although the result will be correct). As a definitive fix I recommend testing whether fired is empty in your code, like this:
Ne=8; Ni=2;
re=rand(Ne,1); ri=rand(Ni,1);
a=[0.02*ones(Ne,1); 0.02+0.08*ri];
b=[0.2*ones(Ne,1); 0.25-0.05*ri];
c=[-65+15*re.^2; -65*ones(Ni,1)];
d=[8-6*re.^2; 2*ones(Ni,1)];
S=[0.5*rand(Ne+Ni,Ne), -rand(Ne+Ni,Ni)];
v=60*rand(10,1)
v2=v
u=b.*v;
firings=[];
for t=1:20
I=[5*rand(Ne,1,"normal");2*rand(Ni,1,"normal")];
fired=find(v>=30);
if ~isempty(fired)
j = length(fired);
h = t*ones(j,1);
k=[h,fired'];
firings=[firings;k];
v(fired)=c(fired);
u(fired)=u(fired)+d(fired);
I=I+sum(S(:,fired),"c");
end
v=v+0.5*(0.04*v.^2+5*v+140-u+I);
v=v+0.5*(0.04*v.^2+5*v+140-u+I);
u=u+a.*(b.*v-u);
end
plot(firings(:,1), firings(:,2),".");
The operation [] + 1 is not really defined in a mathematical sense. It might fail or produce different results depending on the software you use. For example:
Scilab 5 [] + 1 produces 1
Scilab 6 [] + 1 produces [] and a warning
Julia 1.8 [] .+ 1 produces [] but [] + 1 an error.
Python+Numpy 1.23 np.zeros((0,0)) + 1 produces [].
I suggest checking with size() or comparing against the empty matrix before the operation to avoid such surprising behaviour.
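The same guard can be illustrated in plain Python, using lists rather than a matrix library. This is a sketch of the pattern only, not a port of the simulator: the helper names `find_fired` and `record_firings` and the sample values are made up for the example.

```python
def find_fired(v, threshold=30.0):
    """Return the indices of entries at or above threshold (may be empty)."""
    return [i for i, x in enumerate(v) if x >= threshold]


def record_firings(firings, t, fired):
    """Append (t, index) rows only when at least one neuron fired."""
    if not fired:          # explicit emptiness check, like ~isempty(fired)
        return firings
    return firings + [(t, i) for i in fired]


firings = []
firings = record_firings(firings, 1, find_fired([10.0, 35.0, 31.0]))
firings = record_firings(firings, 2, find_fired([1.0, 2.0, 3.0]))  # nothing fired
print(firings)  # [(1, 1), (1, 2)]
```

Skipping the update when nothing fired means the code never relies on how a particular tool treats arithmetic with an empty operand.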

R: Enriched debugging for linear code chains

I am trying to figure out if it is possible, with a sane amount of programming, to create a certain debugging function by using R's metaprogramming features.
Suppose I have a block of code in which each line uses, as all or part of its input, the output of the line before it: the sort of code you might build with pipes (though no pipe is used here).
{
f1(args1) -> out1
f2(out1, args2) -> out2
f3(out2, args3) -> out3
...
fn(out<n-1>, args<n>) -> out<n>
}
Where for example it might be that:
f1 <- function(first_arg, second_arg, ...){my_body_code},
and you call f1 in the block as:
f1(second_arg = 1:5, list(a1 ="A", a2 =1), abc = letters[1:3], fav = foo_foo)
where foo_foo is an object defined in the calling environment of f1.
I would like a function I could wrap around my block that would, for each line of code, create an entry in a list. Each entry would be named (line1, line2, ...) and each line entry would have a sub-entry for each argument and for the function output. The argument entries would consist, first, of the name of the formal to which the actual argument is matched; second, the expression or name supplied to that argument, if there is one (and a placeholder if the argument is just a constant); and third, the value of that expression as if it were immediately forced on entry into the function. (I'd rather have the value as of the moment the promise is first kept, but that seems to me like a much harder problem, and the two values will most often be the same.)
All the arguments assigned to the ... (if any) would go in a dots = list() sublist, with entries named if they have names and appropriately labeled (..1, ..2, etc.) if they are assigned positionally. The last element of each line sublist would be the name of the output and its value.
The point of this is to create a fairly complete record of the operation of the block of code. I think of this as analogous to an elaborated version of purrr::safely that is not confined to iteration and keeps a more detailed record of each step. Indeed, if a function exits with an error, you would want the error message in the list entry, along with as many of the matched arguments as could be captured before the error was produced.
It seems to me like this would be very useful in debugging linear code like this. This lets you do things that are difficult using just the RStudio debugger. For instance, it lets you trace code backwards. I may not know that the value in out2 is incorrect until after I have seen some later output. Single-stepping does not keep intermediate values unless you insert a bunch of extra code to do so. In addition, this keeps the information you need to track down matching errors that occur before promises are even created. By the time you see output that results from such errors via single-stepping, the matching information has likely evaporated.
I have actually written code that takes a piped function and eliminates the pipes to put it in this format, just using text manipulation. (Indeed, it was John Mount's "Bizarro pipe" that got me thinking of this.) And if I, or we, or you, can figure out how to do this, I would hope to make a serious run at a second version where each function calls the next, supplying it with arguments internally rather than externally, like a traceback where you get the passed argument values as well as the function name and formals. Other languages have debugging environments like that (e.g. GDB), and I've been wishing for one for R for at least five years, maybe 10, and this seems like a step toward it.
Just issue the trace shown for each function that you want to trace.
f <- function(x, y) {
z <- x + y
z
}
trace(f, exit = quote(print(returnValue())))
f(1,2)
giving the following which shows the function name, the input and output. (The last 3 is from the function itself.)
Tracing f(1, 2) on exit
[1] 3
[1] 3
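For comparison, the kind of per-call record the question asks for can be sketched in Python with a decorator. This is an analogue under stated assumptions, not an R solution: Python evaluates arguments eagerly, so there is no promise to force, and the names `record`, `LOG`, `f1` and `f2` are hypothetical. `inspect.signature(...).bind` matches the supplied arguments to the formals, and extra positional arguments end up under the variadic parameter, much like the requested dots sublist.

```python
import functools
import inspect

LOG = []  # one entry per traced call, in call order


def record(func):
    """Decorator (hypothetical helper): log matched arguments and the result."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"function": func.__name__}
        try:
            bound = sig.bind(*args, **kwargs)   # match actuals to formals
            bound.apply_defaults()
            entry["args"] = dict(bound.arguments)
            entry["output"] = func(*args, **kwargs)
        except Exception as exc:                # keep the error in the record
            entry["error"] = repr(exc)
            raise
        finally:
            LOG.append(entry)                   # logged on success and on failure
        return entry["output"]

    return wrapper


@record
def f1(first_arg, second_arg, *dots):
    return first_arg + second_arg + sum(dots)


@record
def f2(x, scale=2):
    return x * scale


out1 = f1(1, 2, 3, 4)
out2 = f2(out1, scale=10)
for line in LOG:
    print(line)
```

Because every call appends to `LOG` before returning or re-raising, the intermediate values remain available for inspection after a later step turns out to be wrong.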

Recursion with code after the recursive call

I am trying to understand how recursion works, and there is just one more thing that I do not quite understand: how a recursive function works when there is code after the recursive call within the recursive function itself. Please see the example pseudocode below to see what I mean. My exact question is in what order (meaning when) the code after the recursive call will be executed. Will the machine note the recursive call, execute the remaining bit of code after the call (print "done"), then go back and actually execute the entire recursive call? Or will the machine execute the recursive call as soon as it gets to that line and only execute that last bit of code (print "done") after the recursion bottoms out? When and how many times will "done" be printed?
void recurse()
{
print "hello world";
for i = 0 up to 2
recurse();
print "done";
}
The recursive call runs BEFORE any code below it. Once it returns, it will go back and finish the rest of the code. So what happens is
"hello world"
i = 0
"hello world"
i = 0
"hello world"
...
forever. Because you don't pass the value of i to the next recursive function, your code will run forever, restarting each time with i=0.
Let's assume though that you did pass i to the recursive function properly:
void recurse(i) {
    // is i still < 2?
    if (i < 2) {
        print "hello world";
        recurse(i+1);
        print "done";
    }
}
recurse(0);
In this case, you would get:
i = 0
"hello world"
i = 1
"hello world"
i = 2
"done"
"done"
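A runnable Python version of the corrected function makes the order easy to check. This is a minimal sketch: the `out` list parameter is an addition for the example, so that the sequence of messages can be inspected instead of only printed.

```python
def recurse(i, out):
    """Append messages to `out` in the order they would be printed."""
    if i < 2:
        out.append("hello world")
        recurse(i + 1, out)   # runs to completion before the next line
        out.append("done")


lines = []
recurse(0, lines)
print(lines)  # ['hello world', 'hello world', 'done', 'done']
```

Each "done" is appended only after the nested call has returned, so the two "done" entries come from the calls with i = 1 and i = 0, in that order.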
A good way to visualize recursion is using the depth/height of the stack. As you may know, whenever a new function is called, it's pushed onto the stack like a pancake, increasing the depth/height by 1. If you code it up and print your "start" and "end" notes with an indentation to visualize the depth, it should be easy to see what is executed when. In case it isn't clear, time is on the Y-axis (things printed above have executed before things below) and recursion depth is on the X-axis.
Here's the code in Python:
def recurse(depth=0):
    if depth < 4:
        print(" " * depth + f"starting at depth {depth}")
        for _ in range(2):
            recurse(depth + 1)
        print(" " * depth + f"ending at depth {depth}")
recurse()
Output:
starting at depth 0
 starting at depth 1
  starting at depth 2
   starting at depth 3
   ending at depth 3
   starting at depth 3
   ending at depth 3
  ending at depth 2
  starting at depth 2
   starting at depth 3
   ending at depth 3
   starting at depth 3
   ending at depth 3
  ending at depth 2
 ending at depth 1
 starting at depth 1
  starting at depth 2
   starting at depth 3
   ending at depth 3
   starting at depth 3
   ending at depth 3
  ending at depth 2
  starting at depth 2
   starting at depth 3
   ending at depth 3
   starting at depth 3
   ending at depth 3
  ending at depth 2
 ending at depth 1
ending at depth 0
As can be seen, there are two identical recursive calls that are spawned in the loop. The first trip through the loop completes its entire recursive execution before the second one begins. After both recursive calls complete, then the entire call ends.
Also note that the depth check acts as the base case: a call at depth 4 is a terminal/leaf node that prints nothing and has no children. Your original algorithm has no base case, so it will recurse infinitely and blow the stack.

How to stop R code execution if a data frame has 0 observations

I have lots of exam data that I am cleaning, but some of the data has zero observations after being processed. I have written this code in a function and am calling the function in a loop. If any one file ends up with zero observations in the check7 data frame, how can I stop the ongoing code execution? And how can I directly display only the roll.number of the current student with score = 0, without getting the error 'Error in aggregate.data.frame that no rows to aggregate'? Any help is appreciated.
it should display
Roll.number Score
602200166 0
You should be able to use the stop function. Add the following beneath check7:
if(nrow(check7) == 0L) stop('Error in aggregate.data.frame that no rows to aggregate')
Since you say that you are running a loop, you could also add the iteration to the message using paste. Let's say your loop iterator variable is i. You could then include the iteration as follows:
if(nrow(check7) == 0L) stop(paste("Error in aggregate.data.frame", i, "no rows to aggregate"))
If the error is occurring at the final line, i.e., the creation of the total object, you could put the following to print out which iteration is creating the problem:
cat("we have reached test number", i, "at this point")
This will print every iteration until the error occurs, which will let you figure out what test is causing the problem.
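If the goal is to report a score of 0 for the empty case rather than halt, the same emptiness check can return early instead of raising. The idea is sketched here in Python as an illustration only: the function name `score_for`, the second roll number and the sample scores are made up, and summing stands in for the aggregation step.

```python
def score_for(roll_number, rows):
    """Return (roll_number, score); an empty `rows` yields a score of 0."""
    if len(rows) == 0:            # same test as nrow(check7) == 0L
        return roll_number, 0     # report 0 instead of aggregating nothing
    return roll_number, sum(rows)


# Hypothetical input: one student with no rows left, one with two rows.
students = [(602200166, []), (602200167, [3, 4])]
results = [score_for(rn, rows) for rn, rows in students]
for roll_number, score in results:
    print(roll_number, score)
```

The loop keeps going for the remaining students, and the empty one still appears in the results with a score of 0.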

Run a function without executing its print() statements

I wrote a function that outputs a single-element numeric after a 300-cycle for loop. I make it print about 10 lines in each cycle, to know where it's at. Now I want to run this function inside a 1000-cycle for loop (and place the resulting numbers in a matrix). But it prints way too much stuff and I don't know where it's at in the execution of the outer (1000-cycle) for loop. The output from the inner for loop overwhelms a print statement executed at each of the outer loop's cycles. Here's how it looks:
for(i in 1:1000){
function(...){...} #prints 10 lines 300 times before outputting a single element numeric
cat("Outer loop step "); print(i)}
Now I don't want to remove the print statements from my function, but I want to mute them when I call the function in that for loop. How can I run my function without executing its print() statements?
Modify your function so you can pass in a "debug" true/false parameter to control the print statements.
Don't use print or cat. Use message instead. You can then use suppressMessages to suppress the message output.
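The idea of muting output at the call site, without editing the function, can be illustrated in Python with `contextlib.redirect_stdout`. This is an analogue of the suppression approach, not R code, and `noisy` and `run_quietly` are hypothetical names for the example.

```python
import contextlib
import io


def noisy(n):
    """Stand-in for the chatty inner function (hypothetical)."""
    for step in range(n):
        print("inner step", step)
    return n * 2


def run_quietly(func, *args):
    """Call func with anything it prints to stdout discarded."""
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args)


results = [run_quietly(noisy, i) for i in range(3)]
print(results)  # [0, 2, 4]
```

Only the outer loop's own output reaches the console, while the function keeps its print statements for the times it is called directly.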

Resources