I have a table m with an arbitrary number of rows and columns (it can be any size).
I want to do this calculation for each combination of row and column totals:
r[i] * c[j] / n;
Where r <- rowSums(m);, c <- colSums(m); and n <- sum(m);
I can do it with a double for-loop but I'm hoping to implement it now using while loops.
I wasn't going to use while loops, but since the table size can differ, I figured it was wise to.
I'm storing each value as it's found in a test vector.
This is my attempt, but I'm messing up the indices:
while(i < nrow(m)){
while(j < ncol(m)){
test[i] <- r[i]*c[j] / n;
j=j+1;
i=i+1;
}
j=j+1;
i=i+1;
}
Any guidance to help me sort out my loops would be much appreciated. Thanks in advance.
update
See below for an example and expected result:
m <- t(matrix(c(28,48,10,114), nrow=2, ncol=2));
r <- rowSums(m); #76 124 (sum of rows)
c <- colSums(m); #38 162 (sum of cols)
n <- sum(m); #200 (sum of all cells)
test <- rep(0, times = length(m)); #empty vector to hold the results
#inside while loops, calc for each r and c index:
test[1] <- 76 *38 /200 #first calc to test[i] where i=1
test[2] <- 124*38 /200
test[3] <- 76*162 /200
test[4] <- 124*162/200 #last calc to test[i] where i==length(m)
I would avoid using a for or while loop and do something like this instead:
> as.vector(outer(r,c, function(x,y) x*y/n))
[1] 14.44 23.56 61.56 100.44
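If the loops are still wanted for comparison, here is a minimal sketch of how the nested while loops from the question could be repaired. It assumes the results should be stored in the order shown in the expected output (rows varying fastest), and it introduces a separate counter k for the position in test, which is not in the original attempt.

```r
m <- t(matrix(c(28, 48, 10, 114), nrow = 2, ncol = 2))
r <- rowSums(m)
c <- colSums(m)
n <- sum(m)

test <- numeric(length(m))
k <- 1                      # position in the result vector (assumed helper)
j <- 1
while (j <= ncol(m)) {      # columns in the outer loop
  i <- 1                    # restart the row index for every column
  while (i <= nrow(m)) {
    test[k] <- r[i] * c[j] / n
    i <- i + 1
    k <- k + 1
  }
  j <- j + 1
}
test
# [1]  14.44  23.56  61.56 100.44
```

The original attempt incremented both i and j inside the inner loop and never reset the inner index, which is why the indices went wrong.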
No need to use a while loop. It is usually best to use vectorized operations in R (and in any other array-based language); they tend to make for clearer and faster code.
nrows<-sample(1:100,1) # a random number of rows
ncols<-sample(1:100,1) # a random number of columns
#create a matrix of random numbers with our random dimensions
m<-matrix(runif(nrows*ncols), nrow=nrows)
n<-sum(m)
#read into outer, it creates a cartesian product of your vectors
#so you will have every r[i] multiplied with every c[j]...ie what your loop is doing
r<-outer(rowSums(m),colSums(m),function(x,y) x*y/n)
Hope this helps, let me know if you have any questions.
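A slightly shorter variant, sketched on the 2 x 2 example from the question: since outer() multiplies by default, the custom function can be dropped and the division applied afterwards. The quantity r[i] * c[j] / n is also what chisq.test() reports as the expected counts of a contingency table, so that can serve as a cross-check (the name expected is only illustrative).

```r
m <- t(matrix(c(28, 48, 10, 114), nrow = 2, ncol = 2))

# outer() uses "*" as its default function, so only the division is left
expected <- outer(rowSums(m), colSums(m)) / sum(m)
expected

# cross-check against the expected counts computed by chisq.test()
all.equal(as.vector(expected), as.vector(chisq.test(m)$expected))
```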
A more R-like solution would be to use expand.grid instead of a nested while loop:
Set-up:
> m <- matrix(1:12, 3, 4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> n <- sum(m)
> r <- rowSums(m)
> c <- colSums(m)
Now:
> test <- expand.grid(r,c)
> test
Var1 Var2
1 22 6
2 26 6
3 30 6
4 22 15
5 26 15
6 30 15
7 22 24
8 26 24
9 30 24
10 22 33
11 26 33
12 30 33
> test <- test[,1] * test[,2] / n
> test
[1] 1.692308 2.000000 2.307692 4.230769 5.000000 5.769231 6.769231
[8] 8.000000 9.230769 9.307692 11.000000 12.692308
I've been cracking my head trying to solve this problem but so far I haven't been able to do it. How could I write this code in R?
Sadly I can't add a comment, but I think this Wikipedia article can really help you:
https://en.wikipedia.org/wiki/Sum_of_two_squares_theorem
Basically, you should decompose n into its prime factors and then check:
An integer greater than one can be written as a sum of two squares if and only if its prime decomposition contains no term p^k, where prime p ≡ 3 (mod 4) and k is odd.
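As a rough illustration of that check, here is a base R sketch that finds the prime decomposition by trial division. The function name can_be_sum_of_two_squares is hypothetical, and it assumes n is a whole number greater than one that is small enough for trial division to be practical. Note that the theorem allows zero as one of the two integers.

```r
# Hypothetical helper: TRUE if n can be written as a sum of two squares
can_be_sum_of_two_squares <- function(n) {
  p <- 2
  while (p * p <= n) {
    k <- 0
    while (n %% p == 0) {   # count how often p divides n
      n <- n %/% p
      k <- k + 1
    }
    # a prime p = 3 mod 4 with an odd exponent rules n out
    if (p %% 4 == 3 && k %% 2 == 1) return(FALSE)
    p <- p + 1              # composite p never divide n at this point
  }
  # whatever remains is 1 or a single prime with exponent 1
  !(n > 1 && n %% 4 == 3)
}

sapply(c(25, 3, 220, 116), can_be_sum_of_two_squares)
# [1]  TRUE FALSE FALSE  TRUE
```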
Actually you can build a user function f like below
f <- function(n) {
p <- seq(floor(sqrt(n - 1)))
q <- sqrt(n - p**2)
idx <- q == floor(q)
data.frame(A = p[idx], B = q[idx])
}
such that
> f(9)
[1] A B
<0 rows> (or 0-length row.names)
> f(100)
A B
1 6 8
2 8 6
> f(500)
A B
1 4 22
2 10 20
3 20 10
4 22 4
> f(1000)
A B
1 10 30
2 18 26
3 26 18
4 30 10
where f(9) giving an empty data frame means that 9 cannot be decomposed as the sum of two squared positive integers (the search starts at 1, so 9 = 0^2 + 3^2 is not reported).
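If zero should count as one of the two integers, a small variation of f could start the search at 0. This is only a sketch; the name f0 is made up for illustration.

```r
# Variant of f that also allows 0 as one of the two integers
f0 <- function(n) {
  p <- 0:floor(sqrt(n))
  q <- sqrt(n - p^2)
  idx <- q == floor(q)      # keep only the pairs where q is a whole number
  data.frame(A = p[idx], B = q[idx])
}

f0(9)
#   A B
# 1 0 3
# 2 3 0
```

With this variant an empty result agrees with the theorem quoted above, which also permits a zero term.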
Applying the theorem mentioned by Blop
library(gmp)
library(data.table)
is_sum_two_squared_ints <- function(x){
factor <- gmp::factorize(x)
factor_mod4 <- gmp::mod.bigz(factor, 4)
# converting from Big Integer to integer, assuming your number isn't too big
dt <- data.table(factor = as.integer(factor), factor_mod4 = as.integer(factor_mod4))
n_meets_criteria <-
dt[, .N, .(factor, factor_mod4)][factor_mod4 == 3 & N %% 2 == 1, .N]
return(n_meets_criteria == 0)
}
is_sum_two_squared_ints(25)
# [1] TRUE
is_sum_two_squared_ints(3)
# [1] FALSE
is_sum_two_squared_ints(220)
# [1] FALSE
is_sum_two_squared_ints(116)
# [1] TRUE
I want to find the first index k of an array where the aggregate up to that k is bigger than a given cutoff. In code, this looks as follows:
k <- 0
agg <- 0
while (agg < cutoff) {
k <- k +1
agg <- sum(array[1:k])
}
I was told there is a way to rewrite this without the loop, and that the which function would be helpful. I'm new to R and couldn't find the way. Any thoughts on this?
First we find the array of partial sums:
x <- 1:10
partial_sums <- Reduce('+', x, accumulate = TRUE)
partial_sums
[1] 1 3 6 10 15 21 28 36 45 55
Next we find the indices of all the elements of the partial_sums array which are bigger than cutoff:
cutoff <- 17
indices <- which(partial_sums > cutoff)
indices[1]
[1] 6
Please note that indices could be empty, in which case indices[1] is NA.
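The same two steps can be sketched more compactly with cumsum(), which computes the partial sums directly. The comparison uses > as in this answer; the loop in the question stops as soon as the sum reaches the cutoff, so >= may be the closer match there.

```r
x <- 1:10
cutoff <- 17

k <- which(cumsum(x) > cutoff)[1]   # first index whose running total exceeds cutoff
k
# [1] 6

# if the cutoff is never exceeded, which() returns integer(0) and [1] gives NA
which(cumsum(x) > 100)[1]
# [1] NA
```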
You can use the following:
set.seed(123)#in order to have reproducible "random" numbers
m1 <- matrix(sample(10),nrow = 5,ncol = 2)# create a matrix
m1
[,1] [,2]
[1,] 7 5
[2,] 4 2
[3,] 9 8
[4,] 1 6
[5,] 3 10
cutoff <- 5 #applying cutoff value
apply(m1,2,function(x){x<cutoff})#checking each column using apply instead of loop
OR:
which(m1 < cutoff) #results in the indices of m1 that comply to the condition <cutoff
[1] 2 4 5 7
EDIT
cutoff<-30# a new cutoff
v1<-sapply(seq_along(m1),function(x){sum(m1[1:x])})#running total of the cells, taken in column-major order
which(v1>=cutoff)[1]#find the first occurrence
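As a sketch of the same idea without lapply: cumsum() also accepts a matrix and walks through it in column-major order, and arrayInd() can turn the resulting position back into a row and column. The matrix below is typed in from the output shown above rather than regenerated with sample().

```r
m1 <- matrix(c(7, 4, 9, 1, 3, 5, 2, 8, 6, 10), nrow = 5, ncol = 2)
cutoff <- 30

k <- which(cumsum(m1) >= cutoff)[1]   # first cell where the running total reaches cutoff
k
# [1] 7

arrayInd(k, dim(m1))                  # row 2, column 2 of m1
```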
I have a list of events with (x,y) coordinates, where x ranges from 1 to 100 and y from 1 to 86. There are (often many) duplicates of each coordinate.
I want to populate a matrix (effectively a number grid) with the counts of each coordinate. How do I do this?
Right now, my best attempt is:
s=matrix(data=NA,nrow=n,ncol=k)
for(i in 1:n){
for(j in 1:k){
s[i,j]=nrow(subset(data,x_column==i & y_column==j))
}
}
This works for small (~10,000 rows) data frames, but I'd like to run it for a data frame with nearly 3 million rows, and my method is far too slow.
Edit (data):
n=86;k=100;
x_column y_column
54 30
51 32
65 34
19 46
51 27
45 60
62 31
64 45
16 69
31 33
Thanks guys!
Edit: well, it turns out the program was fast enough for my needs -- my workspace was just bogged down with tons of data, and it was slowing everything I tried to do down. So my method works, but it's good to know alternate ways of populating a matrix. I uploaded the first 10 rows; could someone do a speed test?
Here's one approach, using the data.table and Matrix packages:
library(data.table)
library(Matrix)
f <- function(df, nx, ny) {
## Tally up the frequencies
dt <- data.table(df, key=c("x", "y"))
xyN <- dt[, .N, by=key(dt)]
## Place counts in matrix in their respective i/j x/y row/column
as.matrix(with(xyN, sparseMatrix(i=x,j=y,x=N,dims=c(nx,ny))))
}
## Check that it works:
df <- data.frame(x=c(2,2,2,3,3,3), y=c(1,1,1,1,2,2))
f(df, nx=4, ny=4)
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 3 0 0 0
# [3,] 1 2 0 0
# [4,] 0 0 0 0
## Speed test with 3 million coordinates
df <- data.frame(x=sample(1:100, 3e6,replace=T), y=sample(1:86, 3e6, replace=T))
system.time(res <- f(df, nx=100, ny=86))
# user system elapsed
# 0.16 0.03 0.19
sum(res)
# [1] 3e+06
If you can guarantee that you'll have at least some coordinates in each possible row and column, you can just use base R's table() (though it isn't nearly as fast):
df <- data.frame(x=sample(1:100, 3e6,replace=T), y=sample(1:86, 3e6, replace=T))
system.time(res2 <- as.matrix(table(df)))
# user system elapsed
# 2.67 0.07 2.74
sum(res2)
# [1] 3000000
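One possible way to drop that guarantee in base R, sketched here with a made-up helper name count_grid: converting both columns to factors with explicit levels makes table() keep rows and columns that have no events. It assumes the columns are called x and y, as in the test data above.

```r
# Hypothetical helper: counts per (x, y) cell, including empty rows and columns
count_grid <- function(df, nx, ny) {
  tab <- table(factor(df$x, levels = seq_len(nx)),
               factor(df$y, levels = seq_len(ny)))
  matrix(tab, nrow = nx, ncol = ny)   # plain matrix without dimnames
}

df <- data.frame(x = c(2, 2, 2, 3, 3, 3), y = c(1, 1, 1, 1, 2, 2))
count_grid(df, nx = 4, ny = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    0    0
# [2,]    3    0    0    0
# [3,]    1    2    0    0
# [4,]    0    0    0    0
```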
Let's make some dummy data first
a=data.frame(average=c(5,6), row.names=c("Q", "R"))
> a
average
Q 5
R 6
b=data.frame(c(5,5,7), c(8,9,10), c(11,12,14))
> colnames(b)<-c("Q","QQ","R")
> b
Q QQ R
1 5 8 11
2 5 9 12
3 7 10 14
I want to multiply the columns of 'b' whose names exactly match the row names of 'a' (here, Q and R match exactly) by the corresponding values in 'a'. But when I make a simple loop it gives
n = row.names(a)
> lapply(1:length(n), function(i)
+ a[grep(n[i], row.names(a)),]*b[,grep(n[i], colnames(b))])
[[1]]
Q QQ
1 25 40
2 25 45
3 35 50
[[2]]
[1] 66 72 84
which means it also multiplies QQ since Q and QQ have one letter in common! How can I get the following?
[[1]]
[1] 25 25 35
[[2]]
[1] 66 72 84
When you want column "Q" of a data.frame, you can just do b[, "Q"] or b[["Q"]], whereas b[, grep("Q", colnames(b))] returns all columns whose name contains a Q. With that in mind, your code should be:
n <- rownames(a)
lapply(1:length(n), function(i) a[n[i], ] * b[, n[i]])
or
lapply(rownames(a), function(i) a[i, ] * b[, i])
Maybe a more elegant approach would be to do:
i <- intersect(rownames(a), colnames(b))
Map(`*`, a[i, ], b[, i])
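To make the difference concrete, here is a small sketch on the dummy data from the question. It contrasts pattern matching with exact matching and then builds a named result list; the names common and res are only illustrative.

```r
a <- data.frame(average = c(5, 6), row.names = c("Q", "R"))
b <- data.frame(Q = c(5, 5, 7), QQ = c(8, 9, 10), R = c(11, 12, 14))

grep("Q", colnames(b))      # 1 2: "Q" is treated as a pattern, so it also hits "QQ"
grep("^Q$", colnames(b))    # 1: an anchored pattern matches the whole name only
match("Q", colnames(b))     # 1: exact comparison, no regular expression involved

# exact name lookup for every name shared by a's rows and b's columns
common <- intersect(rownames(a), colnames(b))
res <- lapply(setNames(common, common), function(nm) a[nm, "average"] * b[[nm]])
res
# $Q
# [1] 25 25 35
#
# $R
# [1] 66 72 84
```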
I have 13 matrices of various dimensions that I'd like to use in pairwise matrix correlations with a custom function (that calculates the Rv coefficient). The function takes two arguments (matrix1, matrix2) and produces a scalar (basically a multivariate r value). I'd like to run the function on all possible pairs of matrices (so 78 correlations in total) and produce a 13 by 13 matrix of the resulting Rv values, with the names of the 13 matrices in the rows and columns. I thought of trying to do this by putting the matrices inside a list and using a double for loop to go through the elements of the list, but that seems very complex. I've given a stripped-down example with dummy data below. Does anyone have any suggestions for how this might be approached? Thanks in advance.
# Rv function
Rv <- function(M1, M2) {
tr <- function(x) sum( diag(x) )
psd <- function(x) x %*% t(x)
AA <- psd(M1)
BB <- psd(M2)
num <- tr(AA %*% BB)
den <- sqrt( tr(AA %*% AA) * tr(BB %*% BB) )
Rv <- num / den
list(Rv=Rv, "Rv^2"=Rv^2)
}
# data in separate matrices
matrix1 <- matrix(rnorm(100), 10, 10)
matrix2 <- matrix(rnorm(100), 10, 10)
# ... etc. up to matrix 13
# or, in a list
matrices <- rep(list(matrix(rnorm(100), 10, 10)), 13) # note, the matrices are identical in this example
# call Rv function
Rv1 <- Rv(matrix1, matrix2)
Rv1$Rv^2
# loop through all 78 combinations?
# store results in 13 by 13 matrix with matrix rownames and colnames?
What I used in the past is expand.grid() followed by apply(). Here is a simpler example using just 1:3 rather than 1:13.
R> work <- expand.grid(1:3,1:3)
R> work
Var1 Var2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
R> apply(work, 1, function(z) prod(z))
[1] 1 2 3 2 4 6 3 6 9
R>
You obviously want a different worker function.
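A sketch of how this could look with the Rv function as the worker, using three matrices instead of thirteen. The names rv_scalar and mats and the random data are assumptions for illustration; the sketch also assumes all matrices share the same number of rows, since the Rv formula multiplies M1 %*% t(M1) by M2 %*% t(M2).

```r
# Rv coefficient as defined in the question, returning only the scalar
rv_scalar <- function(M1, M2) {
  tr <- function(x) sum(diag(x))
  AA <- M1 %*% t(M1)
  BB <- M2 %*% t(M2)
  tr(AA %*% BB) / sqrt(tr(AA %*% AA) * tr(BB %*% BB))
}

set.seed(1)
# hypothetical data: a named list of matrices with 10 rows each
mats <- list(A = matrix(rnorm(50), 10, 5),
             B = matrix(rnorm(30), 10, 3),
             C = matrix(rnorm(40), 10, 4))

# every pair of list positions; i varies fastest
work <- expand.grid(i = seq_along(mats), j = seq_along(mats))
vals <- apply(work, 1, function(z) rv_scalar(mats[[z[["i"]]]], mats[[z[["j"]]]]))

# column-major fill puts the value for pair (i, j) at rv_mat[i, j]
rv_mat <- matrix(vals, nrow = length(mats),
                 dimnames = list(names(mats), names(mats)))
rv_mat
```

This computes every pair twice, plus the diagonal. Because the result is symmetric with ones on the diagonal, looping over combn(length(mats), 2) and mirroring the values would be enough if the function is expensive.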