Nested While-Loop - R

So I have a table m, consisting of a random number of rows and columns. (can be any size)...
I want to do this calculation against each rows/columns totals:
r[i] * c[j] / n;
Where r <- rowSums(m);, c <- colSums(m); and n <- sum(m);
I can do it with a double for-loop but I'm hoping to implement it now using while loops.
I wasn't going to use while loops, but since the table size can differ, I figured it was wise to.
I'm storing each value as it's found in a test vector.
This is my attempt, but I'm messing up the indices:
while(i < nrow(m)){
while(j < ncol(m)){
test[i] <- r[i]*c[j] / n;
j=j+1;
i=i+1;
}
j=j+1;
i=i+1;
}
Any guidance to help me sort out my loops would be much appreciated. Thanks in advance.
update
See below for an example and expected result:
m <- t(matrix(c(28,48,10,114), nrow=2, ncol=2));
r <- rowSums(m); #76 124 (sum of rows)
c <- colSums(m); #38 162 (sum of cols)
n <- sum(m); #200 (sum of all cells)
test <- rep(0, times = length(m)); #vector of zeros, one slot per cell
#inside while loops, calc for each r and c index:
test[1] <- 76 *38 /200 #first calc to test[i] where i=1
test[2] <- 124*38 /200
test[3] <- 76*162 /200
test[4] <- 124*162/200 #last calc to test[i] where i==length(m)

I would avoid using a for or while loop and do something like this instead:
> as.vector(outer(r,c, function(x,y) x*y/n))
[1] 14.44 23.56 61.56 100.44
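If the exercise really does call for while loops, the indices in the question's attempt can be untangled along these lines. This is only a sketch using the example data from the question: the inner counter is reset every time the outer loop advances, and the position k in test is an assumption chosen so that the order matches the expected result (rows vary fastest).

```r
# Example data from the question
m <- t(matrix(c(28, 48, 10, 114), nrow = 2, ncol = 2))
r <- rowSums(m)   # 76 124
c <- colSums(m)   # 38 162
n <- sum(m)       # 200

test <- numeric(length(m))  # one slot per cell of m
j <- 1
while (j <= ncol(m)) {
  i <- 1                           # reset the row counter for every column
  while (i <= nrow(m)) {
    k <- (j - 1) * nrow(m) + i     # position in test (hypothetical helper index)
    test[k] <- r[i] * c[j] / n
    i <- i + 1                     # advance only the inner counter here
  }
  j <- j + 1                       # advance the outer counter once per column
}
test
```

The two mistakes in the original attempt were incrementing both counters inside both loops and using <, which stops one step short with 1-based indices.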

No need to use a while loop. It is usually best to use vectorized operations in R (and in any other array-based language), since they make for clearer and faster code.
nrows<-sample(1:100,1) # a random number of rows
ncols<-sample(1:100,1) # a random number of columns
#create a matrix of random numbers with our random dimensions
m<-matrix(runif(nrows*ncols), nrow=nrows)
n<-sum(m)
#read up on outer, it works over the cartesian product of your vectors
#so you will have every r[i] multiplied with every c[j]...i.e. what your loop is doing
r<-outer(rowSums(m),colSums(m),function(x,y) x*y/n)
Hope this helps, let me know if you have any questions.

A more R-like solution would be to use expand.grid instead of a nested while loop:
Set-up:
> m <- matrix(1:12, 3, 4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> n <- sum(m)
> r <- rowSums(m)
> c <- colSums(m)
Now:
> test <- expand.grid(r,c)
> test
Var1 Var2
1 22 6
2 26 6
3 30 6
4 22 15
5 26 15
6 30 15
7 22 24
8 26 24
9 30 24
10 22 33
11 26 33
12 30 33
> test <- test[,1] * test[,2] / n
> test
[1] 1.692308 2.000000 2.307692 4.230769 5.000000 5.769231 6.769231
[8] 8.000000 9.230769 9.307692 11.000000 12.692308

Related

How do I write a function that tests if a positive integer n can be written as a sum of two squared integers, i.e. n = i^2 + j^2 with i, j ∈ N>0

I've been cracking my head trying to solve this problem but so far I haven't been able to do it. How could I write this code in R?

Sadly I can't add a comment, but I think this Wikipedia article can really help you:
https://en.wikipedia.org/wiki/Sum_of_two_squares_theorem
Basically, you should decompose n into its prime factors and then check:
An integer greater than one can be written as a sum of two squares if and only if its prime decomposition contains no term p^k, where prime p ≡ 3 (mod 4) and k is odd.

Actually you can build a user function f like below
f <- function(n) {
p <- seq_len(floor(sqrt(n - 1))) # candidates for i; seq_len() stays empty when n = 1
q <- sqrt(n - p^2) # the matching j, if it is an integer
idx <- q == floor(q)
data.frame(A = p[idx], B = q[idx])
}
such that
> f(9)
[1] A B
<0 rows> (or 0-length row.names)
> f(100)
A B
1 6 8
2 8 6
> f(500)
A B
1 4 22
2 10 20
3 20 10
4 22 4
> f(1000)
A B
1 10 30
2 18 26
3 26 18
4 30 10
where f(9) giving an empty data frame means that 9 cannot be decomposed as the sum of two squared integers.
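Since the question asks for a test rather than the decompositions themselves, the same brute-force idea can be wrapped so that it returns a single TRUE/FALSE. This is a minimal sketch; the function name is made up for illustration, and it assumes n is small enough for sqrt() to be exact on perfect squares.

```r
# Hypothetical helper: TRUE if n = i^2 + j^2 for some integers i, j > 0
is_sum_of_two_squares <- function(n) {
  p <- seq_len(floor(sqrt(n - 1)))  # candidates for i, so that n - i^2 >= 1
  q <- sqrt(n - p^2)                # the matching j, if it is an integer
  any(q == floor(q))
}

is_sum_of_two_squares(100)  # TRUE  (6^2 + 8^2)
is_sum_of_two_squares(9)    # FALSE
is_sum_of_two_squares(1)    # FALSE (both i and j must be positive)
```

Note that this requires both i and j to be strictly positive, as in the question, so it differs from the classical theorem for perfect squares such as 9 (which is 3^2 + 0^2).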

Applying the theorem mentioned by Blop
library(gmp)
library(data.table)
is_sum_two_squared_ints <- function(x){
factor <- gmp::factorize(x)
factor_mod4 <- gmp::mod.bigz(factor, 4)
# converting from Big Integer to integer, assuming your number isn't too big
dt <- data.table(factor = as.integer(factor), factor_mod4 = as.integer(factor_mod4))
n_meets_criteria <-
dt[, .N, .(factor, factor_mod4)][factor_mod4 == 3 & N %% 2 == 1, .N]
return(n_meets_criteria == 0)
}
is_sum_two_squared_ints(25)
# [1] TRUE
is_sum_two_squared_ints(3)
# [1] FALSE
is_sum_two_squared_ints(220)
# [1] FALSE
is_sum_two_squared_ints(116)
# [1] TRUE

Circumvent aggregation in for-loop R

I want to find the first index k of an array where the aggregate up to that k is bigger than a given cutoff. In code, this looks as follows:
k <- 0
agg <- 0
while (agg < cutoff) {
k <- k +1
agg <- sum(array[1:k])
}
I was told there is a way to rewrite this without the loop, and that which() would be helpful. I'm new to R and couldn't find the way. Any thoughts on this?

First we find array of partial sums:
x <- 1:10
partial_sums <- Reduce('+', x, accumulate = TRUE)
partial_sums
[1] 1 3 6 10 15 21 28 36 45 55
Next we find the indices of all the elements of the partial_sums array which are bigger than cutoff:
cutoff <- 17
indices <- which(partial_sums > cutoff)
indices[1]
[1] 6
Please note that indices can be empty if no partial sum exceeds the cutoff, in which case indices[1] is NA.
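The same two steps can be written more compactly with base R's cumsum(), which computes the running totals directly. The sketch below uses a hypothetical helper name and uses >= so that it stops at the same index as the while loop in the question (which runs while agg < cutoff); it returns NA when the cutoff is never reached.

```r
# Hypothetical helper: first index k at which sum(x[1:k]) reaches the cutoff
first_index_reaching <- function(x, cutoff) {
  idx <- which(cumsum(x) >= cutoff)   # all positions where the running total is large enough
  if (length(idx) > 0) idx[1] else NA_integer_
}

x <- 1:10
first_index_reaching(x, 17)   # 6, since cumsum(x)[5] = 15 and cumsum(x)[6] = 21
first_index_reaching(x, 100)  # NA, the total is only 55
```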

You can use the following:
set.seed(123)#in order to have reproducible "random" numbers
m1 <- matrix(sample(10),nrow = 5,ncol = 2)# create a matrix
m1
[,1] [,2]
[1,] 7 5
[2,] 4 2
[3,] 9 8
[4,] 1 6
[5,] 3 10
cutoff <- 5 #applying cutoff value
apply(m1,2,function(x){x<cutoff})#checking each column using apply instead of loop
OR:
which(m1 < cutoff) #results in the indices of m1 that comply to the condition <cutoff
[1] 2 4 5 7
EDIT
cutoff<-30# a new cutoff
v1<-sapply(seq_along(m1),function(x){sum(m1[1:x])})#running total over the cells, in column order
which(v1>=cutoff)[1]#find the first occurrence

R: Need to populate a matrix with counts of coordinates (from data frame with list of coordinates, some of which are duplicates)

I have a list of events with (x,y) coordinates, where x ranges from 1 to 100 and y from 1 to 86. There are (often many) duplicates of each coordinate.
I want to populate a matrix (effectively a number grid) with the counts of each coordinate. How do I do this?
Right now, my best attempt is:
s=matrix(data=NA,nrow=n,ncol=k)
for(i in 1:n){
for(j in 1:k){
s[i,j]=nrow(subset(data,x_column==i & y_column==j))
}
}
This works for small (~10,000 rows) data frames, but I'd like to run it for a data frame with nearly 3 million rows, and my method is far too slow.
Edit (data):
n=86;k=100;
x_column y_column
54 30
51 32
65 34
19 46
51 27
45 60
62 31
64 45
16 69
31 33
Thanks guys!
Edit: well, it turns out the program was fast enough for my needs -- my workspace was just bogged down with tons of data, and it was slowing everything I tried to do down. So my method works, but it's good to know alternate ways of populating a matrix. I uploaded the first 10 rows; could someone do a speed test?

Here's one approach, using the data.table and Matrix packages:
library(data.table)
library(Matrix)
f <- function(df, nx, ny) {
## Tally up the frequencies
dt <- data.table(df, key=c("x", "y"))
xyN <- dt[, .N, by=key(dt)]
## Place counts in matrix in their respective i/j x/y row/column
as.matrix(with(xyN, sparseMatrix(i=x,j=y,x=N,dims=c(nx,ny))))
}
## Check that it works:
df <- data.frame(x=c(2,2,2,3,3,3), y=c(1,1,1,1,2,2))
f(df, nx=4, ny=4)
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 3 0 0 0
# [3,] 1 2 0 0
# [4,] 0 0 0 0
## Speed test with 3 million coordinates
df <- data.frame(x=sample(1:100, 3e6,replace=T), y=sample(1:86, 3e6, replace=T))
system.time(res <- f(df, nx=100, ny=86))
# user system elapsed
# 0.16 0.03 0.19
sum(res)
# [1] 3e+06
If you can guarantee that you'll have at least some coordinates in each possible row and column, you can just use base R's table() (though it isn't nearly as fast):
df <- data.frame(x=sample(1:100, 3e6,replace=T), y=sample(1:86, 3e6, replace=T))
system.time(res2 <- as.matrix(table(df)))
# user system elapsed
# 2.67 0.07 2.74
sum(res2)
# [1] 3000000
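If that guarantee does not hold, table() can still be made to return the full grid by converting the coordinates to factors with explicit levels, so that empty rows and columns are kept as zero counts. A small sketch, with a made-up function name and the same toy data as above:

```r
# Hypothetical helper: count (x, y) pairs on a fixed nx-by-ny grid using base R only
count_grid <- function(df, nx, ny) {
  tab <- table(factor(df$x, levels = seq_len(nx)),
               factor(df$y, levels = seq_len(ny)))
  # Drop the table attributes and return a plain integer matrix
  matrix(as.vector(tab), nrow = nx, ncol = ny)
}

df <- data.frame(x = c(2, 2, 2, 3, 3, 3), y = c(1, 1, 1, 1, 2, 2))
res3 <- count_grid(df, nx = 4, ny = 4)
res3
```

This keeps the dimensions fixed regardless of which coordinates actually occur; it has not been timed here against the data.table version.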

Multiplying two different dataframes only for completely matching row and column names

Let's make some dummy data first
a=data.frame(average=c(5,6), row.names=c("Q", "R"))
> a
average
Q 5
R 6
b=data.frame(c(5,5,7), c(8,9,10), c(11,12,14))
> colnames(b)<-c("Q","QQ","R")
> b
Q QQ R
1 5 8 11
2 5 9 12
3 7 10 14
I want to multiply the columns of 'b' whose names completely match the row names of 'a' (here, Q and R completely match). But when I make a simple loop it gives
n = row.names(a)
> lapply(1:length(n), function(i)
+ a[grep(n[i], row.names(a)),]*b[,grep(n[i], colnames(b))])
[[1]]
Q QQ
1 25 40
2 25 45
3 35 50
[[2]]
[1] 66 72 84
which means it also multiplies QQ since Q and QQ have one letter in common! How can I get the following?
[[1]]
[1] 25 25 35
[[2]]
[1] 66 72 84

When you want column "Q" of a data.frame, you can just do b[, "Q"] or b[["Q"]]. Whereas doing b[, grep("Q", colnames(b))] will return all columns whose name contains a Q. With that in mind, your code should be:
n <- rownames(a)
lapply(1:length(n), function(i) a[n[i], ] * b[, n[i]])
or
lapply(rownames(a), function(i) a[i, ] * b[, i])
Maybe a more elegant approach would be to do:
i <- intersect(rownames(a), colnames(b))
Map(`*`, a[i, ], b[, i])
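To see why the grep() version picks up QQ, it may help to compare pattern matching with exact matching on the column names alone. A small illustration using the names from the question (the variable names here are only for the example):

```r
cols <- c("Q", "QQ", "R")

pattern_hits  <- grep("Q", cols)     # 1 2: "Q" is treated as a pattern, so "QQ" matches too
exact_hits    <- which(cols == "Q")  # 1: exact string comparison
anchored_hits <- grep("^Q$", cols)   # 1: a pattern anchored at both ends also works
```

Indexing by name, as in b[, "Q"], behaves like the exact comparison, which is why the code above no longer returns the QQ column.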

Pairwise matrix correlations in R - how to iterate through all pairs?

I have 13 matrices of various dimensions that I'd like to use in pairwise matrix correlations with a custom function (that calculates the Rv coefficient). The function takes two arguments (matrix1, matrix2) and produces a scalar (basically a multivariate r value). I'd like to run the function on all possible pairs of matrices (so 78 correlations in total) and produce a 13 by 13 matrix of the resulting Rv values with the names of the 13 matrices in the rows and columns. I thought of trying to do this by putting the matrices inside a list and using a double for loop to go through the elements of the list, but that seems very complex. I've given a stripped-down example with dummy data below. Does anyone have any suggestions for how this might be approached? Thanks in advance.
# Rv function
Rv <- function(M1, M2) {
tr <- function(x) sum( diag(x) )
psd <- function(x) x %*% t(x)
AA <- psd(M1)
BB <- psd(M2)
num <- tr(AA %*% BB)
den <- sqrt( tr(AA %*% AA) * tr(BB %*% BB) )
Rv <- num / den
list(Rv=Rv, "Rv^2"=Rv^2)
}
# data in separate matrices
matrix1 <- matrix(rnorm(100), 10, 10)
matrix2 <- matrix(rnorm(100), 10, 10)
# ... etc. up to matrix 13
# or, in a list
matrices <- rep(list(matrix(rnorm(100), 10, 10)), 13) # note, the matrices are identical in this example
# call Rv function
Rv1 <- Rv(matrix1, matrix2)
Rv1$Rv^2
# loop through all 78 combinations?
# store results in 13 by 13 matrix with matrix rownames and colnames?

What I used in the past is expand.grid() followed by apply(). Here is a simpler example using just 1:3 rather than 1:13.
R> work <- expand.grid(1:3,1:3)
R> work
Var1 Var2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
R> apply(work, 1, function(z) prod(z))
[1] 1 2 3 2 4 6 3 6 9
R>
You obviously want a different worker function.
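For the Rv case, the worker function would look up the two matrices by index and the results would then be reshaped into a named square matrix. Here is a rough sketch with three dummy matrices instead of 13; rv_coef is a trimmed copy of the question's function that returns only the scalar, and the list names A, B, C are placeholders. It assumes all matrices have the same number of rows, which the Rv calculation needs.

```r
# Scalar-only version of the Rv function from the question
rv_coef <- function(M1, M2) {
  tr <- function(x) sum(diag(x))
  AA <- M1 %*% t(M1)
  BB <- M2 %*% t(M2)
  tr(AA %*% BB) / sqrt(tr(AA %*% AA) * tr(BB %*% BB))
}

set.seed(1)
mats <- list(A = matrix(rnorm(100), 10, 10),
             B = matrix(rnorm(50), 10, 5),
             C = matrix(rnorm(30), 10, 3))

# Every (i, j) pair of list positions; i varies fastest
work <- expand.grid(i = seq_along(mats), j = seq_along(mats))
vals <- apply(work, 1, function(z) rv_coef(mats[[z[["i"]]]], mats[[z[["j"]]]]))

# Reshape into a square matrix labelled with the list names
rv_mat <- matrix(vals, nrow = length(mats),
                 dimnames = list(names(mats), names(mats)))
rv_mat
```

This computes every pair twice (the result is symmetric with ones on the diagonal), which is harmless at this size; combn() over the list indices would avoid the duplicate work if it ever matters.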
