I can't understand the flow of multiple recursion - recursion

I can't predict the output of this code, and I generally don't understand recursion. Can you write out the flow of this code or explain it, please? Thanks in advance.
public class Main {
    static void m(int n) {
        if (n <= 0) {
            // base case: nothing to do
        } else {
            m(n - 1);
            m(n - 2);
            System.out.println(n);
        }
    }

    public static void main(String[] args) {
        m(5);
    }
}

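Code always executes top to bottom (unless there are jumps such as calls, loops or branches), and a recursive call has to finish before its caller moves on to the next line.
For a better understanding, add trace output to your code like this: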
public class Main {
    private static int callNumber = 0;

    // Hands out a new, increasing ID for every invocation of m.
    private static int getCallNumber() {
        return ++callNumber;
    }

    private static void m(int n) {
        int id = getCallNumber(); // ID of this particular call
        System.out.println(id + ": static void m(" + n + ")");
        System.out.println(id + ": if (" + n + " <= 0)");
        if (n <= 0) {
        } else {
            System.out.println(id + ": else");
            System.out.println(id + ": m(" + n + " - 1)");
            m(n - 1);
            System.out.println(id + ": m(" + n + " - 2)");
            m(n - 2);
            System.out.println(id + ": System.out.println(" + n + ")");
            System.out.println(n);
        }
        System.out.println(id + ": //exit");
    }

    public static void main(String[] args) {
        m(5);
    }
}
This prints the following trace; the number before the colon identifies the call that produced the line:
1: static void m(5)
1: if (5 <= 0)
1: else
1: m(5 - 1)
2: static void m(4)
2: if (4 <= 0)
2: else
2: m(4 - 1)
3: static void m(3)
3: if (3 <= 0)
3: else
3: m(3 - 1)
4: static void m(2)
4: if (2 <= 0)
4: else
4: m(2 - 1)
5: static void m(1)
5: if (1 <= 0)
5: else
5: m(1 - 1)
6: static void m(0)
6: if (0 <= 0)
6: //exit
5: m(1 - 2)
7: static void m(-1)
7: if (-1 <= 0)
7: //exit
5: System.out.println(1)
1
5: //exit
4: m(2 - 2)
8: static void m(0)
8: if (0 <= 0)
8: //exit
4: System.out.println(2)
2
4: //exit
3: m(3 - 2)
9: static void m(1)
9: if (1 <= 0)
9: else
9: m(1 - 1)
10: static void m(0)
10: if (0 <= 0)
10: //exit
9: m(1 - 2)
11: static void m(-1)
11: if (-1 <= 0)
11: //exit
9: System.out.println(1)
1
9: //exit
3: System.out.println(3)
3
3: //exit
2: m(4 - 2)
12: static void m(2)
12: if (2 <= 0)
12: else
12: m(2 - 1)
13: static void m(1)
13: if (1 <= 0)
13: else
13: m(1 - 1)
14: static void m(0)
14: if (0 <= 0)
14: //exit
13: m(1 - 2)
15: static void m(-1)
15: if (-1 <= 0)
15: //exit
13: System.out.println(1)
1
13: //exit
12: m(2 - 2)
16: static void m(0)
16: if (0 <= 0)
16: //exit
12: System.out.println(2)
2
12: //exit
2: System.out.println(4)
4
2: //exit
1: m(5 - 2)
17: static void m(3)
17: if (3 <= 0)
17: else
17: m(3 - 1)
18: static void m(2)
18: if (2 <= 0)
18: else
18: m(2 - 1)
19: static void m(1)
19: if (1 <= 0)
19: else
19: m(1 - 1)
20: static void m(0)
20: if (0 <= 0)
20: //exit
19: m(1 - 2)
21: static void m(-1)
21: if (-1 <= 0)
21: //exit
19: System.out.println(1)
1
19: //exit
18: m(2 - 2)
22: static void m(0)
22: if (0 <= 0)
22: //exit
18: System.out.println(2)
2
18: //exit
17: m(3 - 2)
23: static void m(1)
23: if (1 <= 0)
23: else
23: m(1 - 1)
24: static void m(0)
24: if (0 <= 0)
24: //exit
23: m(1 - 2)
25: static void m(-1)
25: if (-1 <= 0)
25: //exit
23: System.out.println(1)
1
23: //exit
17: System.out.println(3)
3
17: //exit
1: System.out.println(5)
5
1: //exit
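The same flow can be checked without reading the full trace. The following is a minimal Python sketch, not the original Java: it transcribes m so that it records values in a list instead of printing them, and counts invocations. The names `printed` and `counter` are introduced here for illustration only.

```python
def m(n, printed, counter):
    """Transcription of m(n) that records values instead of printing them."""
    counter[0] += 1                # count every invocation, base cases included
    if n <= 0:
        return                     # base case: nothing happens
    m(n - 1, printed, counter)     # this whole subtree finishes first
    m(n - 2, printed, counter)     # only then does the second call start
    printed.append(n)              # n is recorded after both sub-calls return


printed, counter = [], [0]
m(5, printed, counter)
print(printed)      # [1, 2, 1, 3, 1, 2, 4, 1, 2, 1, 3, 5]
print(counter[0])   # 25
```

The recorded values match the bare numbers in the trace above, and the 25 invocations match the highest call number in it.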

Trying to move numbers in NumericMatrix

I created a double for loop in Rcpp to move every 1 up one cell when the next available cell in its column holds a 5. When I compile the code I don't get any error, but the code does not move the 1's in the matrix; it just returns the same matrix. Let's take an original matrix, say named t:
5 1 1 1 1
1 5 5 5 1
5 5 1 5 5
5 0 0 5 1
5 5 0 1 1
after running the code up_rcpp(t,5,5), I should get the following results
1 5 1 1 1
5 5 1 5 1
5 5 5 5 1
5 0 0 5 5
5 1 0 1 1
Below is my rcpp code:
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
  Rcpp::NumericMatrix t = clone(main);
  for (int j=0; j <= c-1; ++j) {
    for (int i=0; i <= r-2; ++i){
      if ((t(i,j) == 5) & (t(i+1, j) == 1))
      {
        main(i, j) = 1;
        main(i + 1, j) = 5;
      }
    }
    for (int i= r-1; i == r-1; ++i){
      if ((t(i, j) == 5) & (t(1, j) == 1))
      {
        main(i, j) = 1;
        main(1, j) = 5;
      }
    }
  }
  return main;
}
Maybe I'm a bit paranoid when I pass values to Rcpp, but I never let my function change what I pass in, so the result is written to a second clone, ans. The clone(main) for t is still necessary here, so that writing the result cannot change t.
The last piece was to change the index 1 to 0 for the top row, because C++ indices start at 0.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
  Rcpp::NumericMatrix ans = clone(main);  // result, so main is never modified
  Rcpp::NumericMatrix t = clone(main);    // untouched snapshot used for all tests
  for (int j=0; j <= c-1; ++j) {
    // rows 0 .. r-2: swap a 5 with the 1 directly below it
    for (int i=0; i <= r-2; ++i){
      if ((t(i,j) == 5) && (t(i+1, j) == 1))
      {
        ans(i, j) = 1;
        ans(i + 1, j) = 5;
      }
    }
    // last row wraps around to the top row (index 0)
    for (int i= r-1; i <= r-1; ++i){
      if ((t(i, j) == 5) && (t(0, j) == 1))
      {
        ans(i, j) = 1;
        ans(0, j) = 5;
      }
    }
  }
  return ans;
}
Which gives:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 1 1
[2,] 5 5 1 5 1
[3,] 5 5 5 5 1
[4,] 5 0 0 1 5
[5,] 5 1 0 5 1
This is different than your solution in column 4, but the way I understand the logic, this is correct.
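To make the rule easy to check by hand, here is a minimal Python sketch of the same logic on a list of lists. It is not Rcpp and is only meant as an illustration: the function name `up` is made up, and the wrap-around from the last row to the top row is written with a modulo instead of a separate loop.

```python
import copy


def up(matrix):
    """Swap every 5 with a 1 directly below it, wrapping bottom row to top row."""
    rows, cols = len(matrix), len(matrix[0])
    ans = copy.deepcopy(matrix)          # all tests read the untouched input
    for j in range(cols):
        for i in range(rows):
            below = (i + 1) % rows       # row after the last one is row 0
            if matrix[i][j] == 5 and matrix[below][j] == 1:
                ans[i][j] = 1
                ans[below][j] = 5
    return ans


t = [
    [5, 1, 1, 1, 1],
    [1, 5, 5, 5, 1],
    [5, 5, 1, 5, 5],
    [5, 0, 0, 5, 1],
    [5, 5, 0, 1, 1],
]
for row in up(t):
    print(row)
```

On the matrix from the question this reproduces the output shown above, including the 1 and 5 in rows 4 and 5 of column 4.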

Along runs of certain values, fill with first value from a second column

I have sample data:
testSample <- data.table(a = rnorm(n = 20, mean = 2, sd = 1),
b = sample(c(0,1), replace=TRUE, size=20))
testSample
a b
1: 3.1731458 0
2: 1.0687438 1
3: 2.9078655 1
4: 1.5675078 0
5: 2.7825992 0
6: 1.3672285 1
7: 3.6178619 0
8: 2.9067640 1
9: 2.5021129 0
10: 2.7672849 1
11: 2.3501007 1
12: -0.2923344 0
13: 0.3920071 1
14: 2.5113855 0
15: 2.2192234 1
16: 0.5913632 0
17: 0.8864734 1
18: 1.9187394 0
19: 1.1238824 1
20: 1.5001240 1
In column 'b' there are runs of alternating 0 and 1. Along each consecutive run of 1, I want a new column 'c' to be filled with the number from the column "a" at the index of the first 1 in each run.
When 'b' is 0, 'c' should be NA
Desired output where I filled in the new 'c' column manually:
a b c
1: 3.1731458 0 NA
2: 1.0687438 1 1.0687438 # <- run of 1.
3: 2.9078655 1 1.0687438 # <- All rows filled with the first 'a' value in the run
4: 1.5675078 0 NA
5: 2.7825992 0 NA
6: 1.3672285 1 1.3672285 # <-
7: 3.6178619 0 NA
8: 2.9067640 1 2.9067640 # <-
9: 2.5021129 0 NA
10: 2.7672849 1 2.7672849 # <- run of 1
11: 2.3501007 1 2.7672849 # <- All rows filled with the first 'a' value in the run
12: -0.2923344 0 NA
13: 0.3920071 1 0.3920071
14: 2.5113855 0 NA
15: 2.2192234 1 2.2192234
16: 0.5913632 0 NA
17: 0.8864734 1 0.8864734
18: 1.9187394 0 NA
19: 1.1238824 1 1.1238824
20: 1.5001240 1 1.1238824
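You can number the runs with rleid() and then fill within each run of 1s:
library(data.table)
set.seed(47)
testSample <- data.table(a = rnorm(n = 20, mean = 2, sd = 1),
                         b = sample(c(0,1), replace=TRUE, size=20))
# grouper numbers each run of equal b values; c gets the first a of every run where b == 1
testSample[, grouper := rleid(b)][b == 1, c := a[1], by = .(grouper)]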
testSample
# a b grouper c
# 1: 3.9946963 0 1 NA
# 2: 2.7111425 0 1 NA
# 3: 2.1854053 1 2 2.1854053
# 4: 1.7182350 0 3 NA
# 5: 2.1087755 0 3 NA
# 6: 0.9142625 1 4 0.9142625
# 7: 1.0145178 1 4 0.9142625
# 8: 2.0151309 1 4 0.9142625
# 9: 1.7479541 1 4 0.9142625
# 10: 0.5342497 1 4 0.9142625
# 11: 1.0775438 0 5 NA
# 12: 2.0396024 0 5 NA
# 13: 2.4938202 0 5 NA
# 14: 0.1717708 1 6 0.1717708
# 15: 2.0914729 0 7 NA
# 16: 2.6707792 1 8 2.6707792
# 17: 1.9189219 0 9 NA
# 18: 3.2642411 0 9 NA
# 19: 1.2966118 1 10 1.2966118
# 20: 1.9594218 0 11 NA
You can, of course, drop the grouper column when you're done with it.
Using Gregor's data, similar answer:
testSample[, c := head(a, 1), by = rleid(b) ][ b == 0, c := NA ]
testSample
# a b c
# 1: 3.9946963 0 NA
# 2: 2.7111425 0 NA
# 3: 2.1854053 1 2.1854053
# 4: 1.7182350 0 NA
# 5: 2.1087755 0 NA
# 6: 0.9142625 1 0.9142625
# 7: 1.0145178 1 0.9142625
# 8: 2.0151309 1 0.9142625
# 9: 1.7479541 1 0.9142625
# 10: 0.5342497 1 0.9142625
# 11: 1.0775438 0 NA
# 12: 2.0396024 0 NA
# 13: 2.4938202 0 NA
# 14: 0.1717708 1 0.1717708
# 15: 2.0914729 0 NA
# 16: 2.6707792 1 2.6707792
# 17: 1.9189219 0 NA
# 18: 3.2642411 0 NA
# 19: 1.2966118 1 1.2966118
# 20: 1.9594218 0 NA
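The idea behind both answers (label consecutive runs of b, then take the first a of each run of 1s) can be sketched outside data.table as well. The following Python version is an illustration under that reading, using `itertools.groupby` in the role of rleid(); the function name `fill_runs` and the short sample vectors are made up.

```python
from itertools import groupby


def fill_runs(a, b):
    """For each run of 1s in b, repeat the a value found at the start of the run."""
    c = []
    pos = 0                                   # index where the current run starts
    for value, run in groupby(b):             # one group per run of equal values
        length = len(list(run))
        first = a[pos] if value == 1 else None  # None plays the role of NA
        c.extend([first] * length)
        pos += length
    return c


a = [3.17, 1.07, 2.91, 1.57, 2.78, 1.37]
b = [0, 1, 1, 0, 0, 1]
print(fill_runs(a, b))  # [None, 1.07, 1.07, None, None, 1.37]
```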
Here is another option:
mtd2 = DT2[, cc := {
ri <- rowid(rleid(b))
bool <- ri>1L
v <- replace(a, bool, NA_real_)
v <- nafill(v, "locf")
replace(v, b==0L, NA_real_)
}]
timing code:
microbenchmark::microbenchmark(times=3L,
mtd0 = DT0[, grouper := rleid(b)][b == 1L, cc := a[1L], by = .(grouper)],
mtd1 = DT1[, cc := head(a, 1L), by = rleid(b) ][ b == 0L, cc := NA_real_ ],
mtd2 = DT2[, cc := {
ri <- rowid(rleid(b))
bool <- ri>1L
v <- replace(a, bool, NA_real_)
v <- nafill(v, "locf")
replace(v, b==0L, NA_real_)
}],
mtd3 = DT3[, cc := {
cs = cumsum(b)
nafill(a * NA_real_^(cs - cummax((!b) * cs) > 1), "locf") * NA_real_^(b == 0L)
}]
)
all.equal(DT0$cc, DT1$cc)
#[1] TRUE
all.equal(DT0$cc, DT2$cc)
#[1] TRUE
all.equal(DT0$cc, DT3$cc)
#[1] TRUE
timings for nr <- 1e6L:
Unit: milliseconds
expr min lq mean median uq max neval
mtd0 198.31551 202.87683 211.27864 207.43815 217.76021 228.08227 3
mtd1 3559.34608 3575.83858 3648.31707 3592.33108 3692.80257 3793.27405 3
mtd2 62.99026 63.58249 64.05060 64.17471 64.58078 64.98684 3
mtd3 48.19877 49.60878 51.08868 51.01879 52.53364 54.04849 3
timings for nr <- 1e7L:
Unit: milliseconds
expr min lq mean median uq max neval
mtd0 1912.1486 2019.0890 2069.4102 2126.0294 2148.0410 2170.0527 3
mtd2 712.6978 774.7994 806.8337 836.9009 853.9016 870.9023 3
mtd3 515.7079 525.7400 531.4712 535.7722 539.3529 542.9336 3
data:
library(data.table)
set.seed(47L)
nr <- 1e6L
testSample <- data.table(a = rnorm(n = nr, mean = 2, sd = 1),
b = sample(c(0,1), replace=TRUE, size=nr))
DT0 <- copy(testSample)
DT1 <- copy(testSample)
DT2 <- copy(testSample)
DT3 <- copy(testSample)

R data.table grepl column on another column in i

Can I subset for when a string in column A is in column B?
Example:
x <- data.table(a=letters, y=paste0(letters,"x"))
x[grepl(a, y)]
x[like(y, a)]
Both return a one-row data.table containing only the first row, along with the following warning:
Warning message:
In grepl(pattern, vector) :
argument 'pattern' has length > 1 and only the first element will be used
I would expect this to return all rows.
The following code applies grepl to each row, using that row's a and y as a pair. Basically, the first argument of grepl cannot be a vector of length greater than 1, so a loop or an apply-style approach (here mapply) is needed.
x[mapply(grepl, a, y), ]
# a y
# 1: a ax
# 2: b bx
# 3: c cx
# 4: d dx
# 5: e ex
# 6: f fx
# 7: g gx
# 8: h hx
# 9: i ix
# 10: j jx
# 11: k kx
# 12: l lx
# 13: m mx
# 14: n nx
# 15: o ox
# 16: p px
# 17: q qx
# 18: r rx
# 19: s sx
# 20: t tx
# 21: u ux
# 22: v vx
# 23: w wx
# 24: x xx
# 25: y yx
# 26: z zx
# a y
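The difference between "first pattern against every row" and "each row's own pattern" can be sketched in a few lines of Python. This is only an analogy: it uses plain substring tests rather than regular expressions, and the variable names are made up.

```python
import string

a = list(string.ascii_lowercase)
y = [letter + "x" for letter in a]

# What the warning describes: only the first pattern is used for every row.
first_only = [a[0] in target for target in y]

# Pairing each pattern with its own row, as mapply(grepl, a, y) does.
row_wise = [pattern in target for pattern, target in zip(a, y)]

print(sum(first_only))  # 1
print(sum(row_wise))    # 26
```

Only the row "ax" contains the first pattern "a", which is why the unpaired call returns a single row.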
One more possibility is to use dplyr, something like:
library(dplyr)
x <- data.table(a=letters, y=paste0(letters,"x"))
x %>% rowwise() %>%
  filter(grepl(a,y)) %>% as.data.frame()
a y
1: a ax
2: b bx
3: c cx
4: d dx
5: e ex
6: f fx
7: g gx
8: h hx
9: i ix
... and so on

Better conditional insertion of new data entries (rows) in a data.table with R

The following data table contains returns (RET) of 5 portfolios (portfolio numbers are factors NOT integers) for two dates.
set.seed(123)
DT <- data.table(date = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5), portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")), RET = rnorm(n = 10))
date portfolio RET
1: 2005-05-02 1 -0.56047565
2: 2005-05-02 2 -0.23017749
3: 2005-05-02 3 1.55870831
4: 2005-05-02 4 0.07050839
5: 2005-05-02 5 0.12928774
6: 2005-05-03 1 1.71506499
7: 2005-05-03 2 0.46091621
8: 2005-05-03 3 -1.26506123
9: 2005-05-03 4 -0.68685285
10: 2005-05-03 5 -0.44566197
For each date, I want to add to the data table the return of the difference portfolio, i.e. the difference between the return of the 5th portfolio and the return of the 1st portfolio, and the return of the average portfolio, i.e. the average return of the five portfolios. In particular, I want to create the following data.table
date portfolio RET
1: 2005-05-02 1 -0.56047565
2: 2005-05-02 2 -0.23017749
3: 2005-05-02 3 1.55870831
4: 2005-05-02 4 0.07050839
5: 2005-05-02 5 0.12928774
6: 2005-05-02 avg 0.19357026
7: 2005-05-02 diff 0.68976338
8: 2005-05-03 1 1.71506499
9: 2005-05-03 2 0.46091621
10: 2005-05-03 3 -1.26506123
11: 2005-05-03 4 -0.68685285
12: 2005-05-03 5 -0.44566197
13: 2005-05-03 avg -0.04431897
14: 2005-05-03 diff -2.16072696
One way to do this (based on this post) is
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "avg"), RET = replace(RET, is.na(portfolio), mean(RET[!is.na(RET)]) ) ), date]
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "diff"), RET = replace(RET, is.na(portfolio), RET[portfolio == "5"] - RET[portfolio == "1"]) ), date]
Another way would be to create new data tables for the difference and average portfolio and then rbindlist them all.
DT = rbindlist(
l = list(DT,
DT[, .(portfolio = "diff", RET = RET[portfolio == "5"] - RET[portfolio == "1"]), by = date],
DT[, .(portfolio = "avg", RET = mean(RET)), by = date]
))
DT[order(date, portfolio)]
Is there a better way?
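For reference, the per-date computation described in the question (average of the five returns, and return of portfolio 5 minus return of portfolio 1) can be written out in plain Python to check the target numbers. This is a sketch for verification only, not a data.table solution; the names `rows`, `ORDER` and `result` are made up.

```python
from collections import defaultdict

rows = [
    ("2005-05-02", "1", -0.56047565), ("2005-05-02", "2", -0.23017749),
    ("2005-05-02", "3", 1.55870831), ("2005-05-02", "4", 0.07050839),
    ("2005-05-02", "5", 0.12928774),
    ("2005-05-03", "1", 1.71506499), ("2005-05-03", "2", 0.46091621),
    ("2005-05-03", "3", -1.26506123), ("2005-05-03", "4", -0.68685285),
    ("2005-05-03", "5", -0.44566197),
]

# Group the returns by date, keyed by portfolio label.
by_date = defaultdict(dict)
for date, portfolio, ret in rows:
    by_date[date][portfolio] = ret

# One "avg" and one "diff" row per date.
extra = []
for date, rets in by_date.items():
    extra.append((date, "avg", sum(rets.values()) / len(rets)))
    extra.append((date, "diff", rets["5"] - rets["1"]))

# Sort by date, then by the portfolio order used in the desired output.
ORDER = ["1", "2", "3", "4", "5", "avg", "diff"]
result = sorted(rows + extra, key=lambda r: (r[0], ORDER.index(r[1])))
for row in result:
    print(row)
```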

The running time of my program in big O time

Can someone help me figure out the running time of this loop? I believe it is O(5nlogn).
for(int f = 0; f < Array.length; f++) {
    F = Array[f];
    for(int e = 0; e <= f; e++) {
        E = Array[e];
        for(int d = 0; d <= e; d++) {
            D = Array[d];
            for(int c = 0; c <= d; c++) {
                C = Array[c];
                for(int b = 0; b <= c; b++) {
                    B = Array[b];
                    for(int a = 0; a <= b; a++) {
                        A = Array[a];
                    }
                }
            }
        }
    }
}
Thanks
The answer is Θ(n^6). I wrote a program to simulate the inner loop and record how many times a series of n executions occurs:
static void Main(string[] args)
{
    int arrLength = 20;
    int[] arr = new int[arrLength];
    for (int f = 0; f < arrLength; f++)
    {
        for (int e = 0; e <= f; e++)
        {
            for (int d = 0; d <= e; d++)
            {
                for (int c = 0; c <= d; c++)
                {
                    for (int b = 0; b <= c; b++)
                    {
                        //for (int a = 0; a <= b; a++)
                        arr[b] = arr[b] + 1;
                    }
                }
            }
        }
    }
    for (int i = 0; i < arr.Length; i++)
    {
        Debug.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
        Console.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
    }
    Console.ReadLine();
}
Running this with an arrLength of 1 gives:
1 execution: 1 time(s).
Running this with an arrLength of 2 gives:
1 execution: 5 time(s).
2 execution: 1 time(s).
Running this with an arrLength of 3 gives:
1 execution: 15 time(s).
2 execution: 5 time(s).
3 execution: 1 time(s).
As it turns out, the execution times always follow the same equation. At arrLength of 20, we get:
1 execution: 8855 time(s).
2 execution: 7315 time(s).
3 execution: 5985 time(s).
4 execution: 4845 time(s).
5 execution: 3876 time(s).
6 execution: 3060 time(s).
7 execution: 2380 time(s).
8 execution: 1820 time(s).
9 execution: 1365 time(s).
10 execution: 1001 time(s).
11 execution: 715 time(s).
12 execution: 495 time(s).
13 execution: 330 time(s).
14 execution: 210 time(s).
15 execution: 126 time(s).
16 execution: 70 time(s).
17 execution: 35 time(s).
18 execution: 15 time(s).
19 execution: 5 time(s).
20 execution: 1 time(s).
Plugging this into the awesome Online Encyclopedia of Integer Sequences, we get the Binomial coefficient binomial(n,4), which is this (the sequence starts at an offset of 4):
binomial(n,4)
n*(n-1)*(n-2)*(n-3)/24
0 = 0
1 = 0
2 = 0
3 = 0
4 = 1
5 = 5
6 = 15
7 = 35
...
If we look at the execution patterns output by my program above, we can rewrite it using a summation and this binomial sequence. For each integer i between 1 and n inclusive, we have the (n - i + 4)th number in the binomial(n,4) sequence, then multiplied by i, as the total number of executions. This is expressed as the following:
$$\sum_{i=1}^{n} i \binom{n-i+4}{4}$$
Substituting j = n - i + 1, and realizing that j goes from n down to 1, we can rewrite this equation as:
$$\sum_{j=1}^{n} (n-j+1) \binom{j+3}{4} = \sum_{j=1}^{n} \frac{(n-j+1)(j+3)(j+2)(j+1)j}{24}$$
Relying on Wolfram Alpha to figure out this equation, I plugged in sum (n-j+1)(j+3)(j+2)(j+1)*j/24, j = 1 to n, and it came up with:
$$\frac{n(n+1)(n+2)(n+3)(n+4)(n+5)}{720}$$
This is very obviously Θ(n^6), so that is our answer.
The final equation is actually a binomial coefficient, binomial(n+5, 6) (the binomial(n,6) sequence at an offset), so for m loops, the number of executions of the innermost loop is probably binomial(n+m-1, m). For a given number m of loops, we have:
$$\binom{n+m-1}{m} = \frac{n(n+1)\cdots(n+m-1)}{m!} = \Theta(n^m)$$
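The counts in this answer are easy to cross-check numerically. The following Python sketch (function names are made up) counts the innermost executions by brute force for small n and compares them with the summation over the binomial(n,4) sequence described above and with the closed form binomial(n+5, 6).

```python
from math import comb


def brute_force(n):
    """Count executions of the innermost statement of the six nested loops."""
    count = 0
    for f in range(n):
        for e in range(f + 1):
            for d in range(e + 1):
                for c in range(d + 1):
                    for b in range(c + 1):
                        for a in range(b + 1):
                            count += 1
    return count


def summation(n):
    """Sum over i of i times the (n - i + 4)th term of the binomial(n, 4) sequence."""
    return sum(i * comb(n - i + 4, 4) for i in range(1, n + 1))


for n in range(1, 9):
    assert brute_force(n) == summation(n) == comb(n + 5, 6)

print(brute_force(8))  # 1716
```

For the sizes tried here all three agree, which supports the closed form without relying on Wolfram Alpha.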
A good way to do this is to think about the space you're iterating over. If you think about it, the loops will iterate over nonnegative integral values of (a, b, c, d, e, f) where
n > f ≥ e ≥ d ≥ c ≥ b ≥ a ≥ 0
Each of these iterations does O(1) work (all loops just assign a variable, which takes O(1) work), so the question is how many possible values there are that satisfy the above formula. I'm going to claim it's Θ(n^6), and will try to justify this with the rest of my answer.
First, note that the value certainly isn't any more than O(n^6). All of a, b, c, d, e, and f range between 0 and n-1, so there's at most n different values for each. Therefore, the maximum possible number of values they can have is n^6. This is not a tight bound, but it's certainly an upper bound. That gives us that the runtime is at most O(n^6).
If we want to get a tighter bound, we have to work harder. To do this, I'm going to use the following fact:
1^k + 2^k + 3^k + ... + n^k = Θ(n^(k+1))
This is a sum of k-th powers (not a geometric series); the bound follows from comparing the sum with the integral of x^k from 0 to n, which is n^(k+1) / (k+1).
This means that
sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
sum (b from 0 to c)
sum (a from 0 to b)
1
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
sum (b from 0 to c)
Theta(b)
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
Theta(c^2)
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
Theta(d^3)
= sum(f from 0 to n-1)
sum (e from 0 to f)
Theta(e^4)
= sum(f from 0 to n-1)
Theta(f^5)
= Theta(n^6)
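The chain of sums above can also be evaluated exactly for large n without running six nested loops, because each level is just a running (prefix) sum of the level below it. The following Python sketch does that; the function name `innermost_executions` is made up, and the printed ratio is only meant to show that the count grows like a constant times n^6.

```python
from itertools import accumulate


def innermost_executions(n, depth=6):
    """Exact iteration count of `depth` nested loops with bounds n > f >= e >= ... >= 0."""
    ways = [1] * n                       # innermost body: one execution per loop value
    for _ in range(depth - 1):
        ways = list(accumulate(ways))    # sum over the next-outer loop variable
    return sum(ways)                     # outermost loop runs over 0 .. n-1


for n in (10, 100, 1000):
    total = innermost_executions(n)
    print(n, total, total / n**6)        # the ratio settles near 1/720
```

The ratio approaching a nonzero constant is exactly what Θ(n^6) means here.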
Hope this helps!
