R Calculate big NOR matrix - r

I have a big square matrix in R:
norMat <- matrix(NA, nrow=1024, ncol=1024)
This empty matrix needs to be filled with the number of equal bits for every pair of matrix indices.
So for each i (row index) and j (column index) I need to compare the two bit patterns (strictly speaking this is an XNOR rather than a NOR) and sum the result, e.g.:
sum(intToBits(2)==intToBits(3))
Currently, I have this function, which fills the matrix:
norMatrix <- function()
{
  matDim=1024
  norMat <<- matrix(NA, nrow=matDim, ncol=matDim)
  for(i in 0:(matDim-1)) {
    for(j in 0:(matDim-1)) {
      norMat[i+1,j+1] = norsum(i,j)
    }
  }
  return(norMat)
}
And here's the norsum function:
norsum <- function(bucket1, bucket2)
{
  res = sum(intToBits(bucket1)==intToBits(bucket2))
  return(res)
}
Is this an efficient solution to fill the matrix?
I'm in doubt since on my machine this takes over 5 minutes.

I suggest this is a great opportunity for the *apply functions. Here's one solution that's a bit faster than 5 minutes.
First, proof of concept, non-square solely for clarity of dimensions.
nc <- 5
nr <- 6
mtxi <- sapply(seq_len(nc), intToBits)
mtxj <- sapply(seq_len(nr), intToBits)
sapply(1:nc, function(i) sapply(1:nr, function(j) sum(mtxi[,i] == mtxj[,j])))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   32   30   31   30   31
# [2,]   30   32   31   30   29
# [3,]   31   31   32   29   30
# [4,]   30   30   29   32   31
# [5,]   31   29   30   31   32
# [6,]   29   31   30   31   30
Assuming that these are correct, the full meal deal:
n <- 1024
mtx <- sapply(seq_len(n), intToBits)
system.time(
  ret <- sapply(1:n, function(i) sapply(1:n, function(j) sum(mtx[,i] == mtx[,j])))
)
#    user  system elapsed
#    3.25    0.00    3.36
You don't technically need to pre-calculate mtxi and mtxj. Though intToBits does not introduce much overhead, I think it's silly to recalculate every time.
My system is reasonable (i7 6600U CPU @ 2.60GHz), win10_64, R-3.3.2 ... nothing too fancy.
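If the nested sapply is still too slow, the same counts can probably be obtained with two matrix products. This is only a sketch under my own assumptions: the names vals, bits and eqBits are made up for illustration, and it uses the values 0 to 1023 as in the question (the sapply version above uses 1 to 1024). It relies on intToBits() being vectorised and returning 32 raw values per integer.

```r
n <- 1024
vals <- 0:(n - 1)   # 0-based values as in the question; use seq_len(n) to match the sapply version

# One column of 32 zeros/ones per value.
bits <- matrix(as.integer(intToBits(vals)), nrow = 32)

# Equal bits = positions where both are 1 + positions where both are 0.
eqBits <- crossprod(bits) + crossprod(1 - bits)

eqBits[3, 4]   # values 2 and 3, same as sum(intToBits(2) == intToBits(3))
```

The result is an n x n matrix of doubles; wrap it in storage.mode(eqBits) <- "integer" if integers are needed.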

Twin primes less than 87 in R

I am trying to list the twin primes below 87. I'm using the Eratosthenes approach. Here is what I've worked on so far:
Eratosthenes <- function(n) {
  # Return all prime numbers up to n (based on the sieve of Eratosthenes)
  if (n >= 2) {
    sieve <- seq(2, n) # initialize sieve
    primes <- c() # initialize primes vector
    for (i in seq(2, n)) {
      if (any(sieve == i)) { # check if i is in the sieve
        primes <- c(primes, i) # if so, add i to primes
        sieve <- sieve[(sieve %% i) != 0] # remove multiples of i from sieve
      }
    }
    return(primes)
  } else {
    stop("Input value of n should be at least 2.")
  }
}
Era <- c(Eratosthenes(87))
i <- 2:86
for (i in Era){
  if (Era[i]+2 == Era[i+1]){
    print(c(Era[i], Era[i+1]))
  }
}
The first thing I don't understand is this error:
Error in if (Era[i] + 2 == Era[i + 1]) { :
missing value where TRUE/FALSE needed
The second thing is that some twin primes are missing from the output, for example (29,31).
Within your for loop, i is no longer an index but an element of Era. In this case, you can use (i+2) %in% Era to check whether i+2 is the twin of i:
for (i in Era){
  if ((i+2) %in% Era){
    print(c(i,i+2))
  }
}
which gives
[1] 3 5
[1] 5 7
[1] 11 13
[1] 17 19
[1] 29 31
[1] 41 43
[1] 59 61
[1] 71 73
A simpler way might be using diff, e.g.,
i <- Era[c(diff(Era)==2,FALSE)]
print(cbind(i,j = i+2))
which gives
> print(cbind(i,j = i+2))
      i  j
[1,]  3  5
[2,]  5  7
[3,] 11 13
[4,] 17 19
[5,] 29 31
[6,] 41 43
[7,] 59 61
[8,] 71 73
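Firstly, (23,29) is not a twin prime pair.
Secondly, your answer may be found here.
Edit: I tried your code and found that the length of Era is 23.
Because for (i in Era) uses the primes themselves as indices, i eventually becomes 23, so Era[i+1] asks for element 24, which does not exist and is NA. The comparison Era[i] + 2 == Era[i+1] then yields NA, and if() raises the error.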
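To make the failure concrete, here is a small sketch. The primes up to 87 are typed in by hand so the snippet runs without the Eratosthenes function, and cond is just an illustrative name.

```r
# The 23 primes up to 87, i.e. what Eratosthenes(87) returns.
Era <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
         41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83)

length(Era)    # 23

i <- 23        # for (i in Era) eventually sets i to the prime 23
Era[i + 1]     # NA, because index 24 is past the end of the vector

cond <- Era[i] + 2 == Era[i + 1]
is.na(cond)    # TRUE, and if (NA) is what triggers the error message
```

The same indexing mix-up explains the missing pairs: (29,31) sits at positions 10 and 11, and 10 is not a prime, so the loop never looks at that position.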
for (i in Era) sets i to 2, then 3, then 5, and so on, which is not what you intended. Loop over the indices instead, using for (i in seq_len(length(Era) - 1)):
for (i in seq_len(length(Era) - 1)){
  if (Era[i] + 2 == Era[i + 1]){
    print(c(Era[i], Era[i + 1]))
  }
}
#> [1] 3 5
#> [1] 5 7
#> [1] 11 13
#> [1] 17 19
#> [1] 29 31
#> [1] 41 43
#> [1] 59 61
#> [1] 71 73

Loop over matrix using n consecutive rows in R

I have a matrix that consists of two columns and a number (n) of rows, while each row represents a point with the coordinates x and y (the two columns).
This is what it looks like (LINK):
V1 V2
146 17
151 19
153 24
156 30
158 36
163 39
168 42
173 44
...
Now I would like to take a subset of three consecutive points, starting from row 1, do some fitting, save the values from this fit in another list, and then go on to the next three points, and the next three, and so on until the list is finished. Something like this:
Data_Fit_Kasa_1 <- CircleFitByKasa(Data[1:3,])
Data_Fit_Kasa_2 <- CircleFitByKasa(Data[4:6,])
....
Data_Fit_Kasa_n <- CircleFitByKasa(Data[i:(i+2),])
I have tried to construct a loop, but I can't make it work. R either tells me that there's an "unexpected '}' in "}" " or that the "subscript is out of bounds". This is what I've tried:
Minimal runnable code:
install.packages("conicfit")
library(conicfit)
CFKasa <- NULL
Data.Fit <- NULL
for (i in 1:length(Data)) {
  row <- Data[i:(i+2),]
  CFKasa <- CircleFitByKasa(row)
  Data.Fit[i] <- CFKasa[3]
}
RStudio Version 0.99.902 – © 2009-2016 RStudio, Inc.; Win10 Edu.
The third element of the fitted circle (CFKasa[3]) represents the radius, which is what I am really interested in. I am really stuck here, please help.
Many thanks in advance!
Best, David
Turn your data into a 3D array and use apply:
DF <- read.table(text = "V1 V2
146 17
151 19
153 24
156 30
158 36
163 39", header = TRUE)
a <- t(DF)
dim(a) <-c(nrow(a), 3, ncol(a) / 3)
a <- aperm(a, c(2, 1, 3))
# , , 1
#
#      [,1] [,2]
# [1,]  146   17
# [2,]  151   19
# [3,]  153   24
#
# , , 2
#
#      [,1] [,2]
# [1,]  156   30
# [2,]  158   36
# [3,]  163   39
center <- function(m) c(mean(m[,1]), mean(m[,2]))
t(apply(a, 3, center))
#      [,1] [,2]
# [1,]  150   20
# [2,]  159   35
center(DF[1:3,])
#[1] 150 20
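If you would rather keep an explicit loop over blocks of three rows, something along these lines should work. It is a sketch only: fit_fun is a stand-in (the center function from above) so the snippet runs without conicfit, and starts and res are names I made up. With the real fit you would call CircleFitByKasa(block)[3] to get the radius, as in the question.

```r
DF <- data.frame(V1 = c(146, 151, 153, 156, 158, 163),
                 V2 = c(17, 19, 24, 30, 36, 39))

# Stand-in for the real fit; swap in CircleFitByKasa(m)[3] for the radius.
fit_fun <- function(m) c(mean(m[, 1]), mean(m[, 2]))

# First row of each block of three: 1, 4, 7, ...
# Stopping at nrow(DF) - 2 drops an incomplete trailing block instead of
# running past the last row (the "subscript out of bounds" problem).
starts <- seq(1, nrow(DF) - 2, by = 3)

# Note the parentheses: i:(i + 2), because i:i + 2 is evaluated as (i:i) + 2.
res <- t(sapply(starts, function(i) fit_fun(as.matrix(DF[i:(i + 2), ]))))
res
```

Each row of res holds the result for one block. Iterating over 1:length(Data) in the original loop goes too far because length() of a matrix counts all cells, not rows.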

Output vector of loop function r

I'm trying to create an output vector from a loop, containing one result from each iteration.
out=NULL
for (i in 1:5) {
  out<-cbind(out,sample(1:100, 1)) #placeholderfunction
  for (i in 1:5) {out[i]<- i+1}
}
The good side: my result contains the correct values. The bad side: it comes out as a matrix and I don't know why.
> out
     out            
[1,]   2 71 14 46 96
[2,]   3 71 14 46 96
[3,]   4 71 14 46 96
[4,]   5 71 14 46 96
[5,]   6 71 14 46 96
What I want would be something like:
> out
     out            
[1,]   2 71 14 46 96
It is probably just a small step from where I stand, but I just can't figure it out. Maybe someone could help?
(And yes, I could just remove the extra rows, but I would like my code clean.)
Thanks!
OK,
looking at the problem again at this scale, I found it: a superfluous line.
> out=NULL
> for (i in 1:5) {
+   out<-cbind(out,sample(1:100, 1))
+ }
> out
     [,1] [,2] [,3] [,4] [,5]
[1,]   63   98   78   43   19
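Note that the result above is still a 1 x 5 matrix, because cbind() always returns a matrix. If a plain vector is wanted, a small variation with c() should do it. This is a sketch; set.seed(1) is only there to make the run repeatable and is not part of the original code.

```r
set.seed(1)   # arbitrary seed, for repeatability only

out <- NULL
for (i in 1:5) {
  out <- c(out, sample(1:100, 1))   # c() appends to a vector, no dimensions
}

is.matrix(out)   # FALSE
length(out)      # 5
```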
What about this?
out <- sample(100,5)
Update
I see why I got a -1: the OP wants to construct a vector with a for loop. As a word of caution, creating a vector in this manner is usually not a good idea. For example, my code above is both simpler and faster than the OP's code. That notwithstanding, if you want to generate a vector of random numbers with a for loop, use this approach:
my.loop <- function(l){
  out_1 <- numeric(l)
  for (i in 1:l) {
    out_1[i] <- sample(1:100, 1)
  }
  out_1
}
This will be much better than the OP's approach below, because we are preallocating memory.
op.loop <- function(l){
  out_2 = NULL
  for (i in 1:l) {
    out_2 <- cbind(out_2, sample(1:100, 1))
  }
  out_2
}
For fun I timed the two approaches:
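The timings themselves are not shown here. As a rough sketch of how such a comparison could be run with base R's system.time(): the length of 10000 and the names t_prealloc and t_grow are my own choices for illustration, and the numbers will differ from machine to machine.

```r
my.loop <- function(l) {
  out_1 <- numeric(l)                  # preallocated result vector
  for (i in seq_len(l)) out_1[i] <- sample(1:100, 1)
  out_1
}

op.loop <- function(l) {
  out_2 <- NULL                        # grown by cbind() on every iteration
  for (i in seq_len(l)) out_2 <- cbind(out_2, sample(1:100, 1))
  out_2
}

l <- 10000                             # arbitrary size, for illustration only
t_prealloc <- system.time(my.loop(l))
t_grow     <- system.time(op.loop(l))

# Elapsed seconds for each approach
c(prealloc = t_prealloc[["elapsed"]], grow = t_grow[["elapsed"]])
```

For more stable numbers, repeat each call several times or use a benchmarking package.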

How to write a function that uses the single output from another function as the starting point for a new analysis?

I'm having trouble writing a function that calls another function and uses the output as the basis for running new analysis in a loop (or equivalent). For example, let's say function 1 creates this output: 10. The second function would take that as a starting point to run new analysis. The single data point from the second output would then be the basis for the next round of analysis, and so on.
Here's a simple example. The question is how to create a for loop for this. Or perhaps there's a more efficient way using lapply. In any case, the first function might be as follows:
f.1 <-function(x) {
  x
  a <-seq(x,by=1,length.out=5)
  a.1 <-tail(a,1)
}
The second function, which calls the first function, could run as follows:
f.2 <-function(x) {
  f.1 <-function(x) {
    a <-seq(x,by=1,length.out=5)
    a.1 <-tail(a,1)
  }
  z <-f.1(x)
  y=z+1
  seq(y,by=1,length.out=5)
}
How can I modify f.2() so that it re-runs that computation using the previous output as the basis for the next round of analysis? To be precise, f.1(10) outputs:
[1] 14
In turn, f.2(10) results in:
[1] 15 16 17 18 19
How can I rewrite f.2() so that it automatically computes f.2(19) on the next iteration, and continues to do so for several loops? In the process, I'd like to collect the outputs in a separate file for review. Thanks much!
The magrittr library (which is used most notably by dplyr) makes this type of chaining somewhat simple. First, define the functions,
f.1 <-function(x) {
  x
  a <- seq(x, by=1, length.out=5)
  a.1 <- tail(a,1)
}
f.2 <-function(x) {
  y <- x+1
  seq(y, by=1, length.out=5)
}
then
library(magrittr)
f.1(10) %>% f.2
# [1] 15 16 17 18 19
As @BondedDust mentioned, you could use Reduce, although normally it expects to apply the same function over and over, so you just need to flip the most common use case:
Reduce(function(x,f) f(x), list(f.1, f.2), init=10)
# [1] 15 16 17 18 19
You can try this with two arguments for f.2. The first argument is the x value that you need to initialize x with and n is the number of iterations that you want to do. The output of the function will be a matrix containing n rows and 5 columns.
f.2 <-function(x, n) {
  c <- matrix(nrow=n, ncol=5)
  for (i in 1:nrow(c))
  {
    ## f.1(x) is already defined beforehand, so there is no need to
    ## define it again inside f.2; simply call it as done here
    z <-f.1(x)
    y=z+1
    c[i,] = seq(y, by=1, length.out=5)
    x = c[i,5]
  }
  return(c)
}
The output of
f <- f.2(10, 10) ##initialising x with 10 and running 10 loops
f
      [,1] [,2] [,3] [,4] [,5]
 [1,]   15   16   17   18   19
 [2,]   24   25   26   27   28
 [3,]   33   34   35   36   37
 [4,]   42   43   44   45   46
 [5,]   51   52   53   54   55
 [6,]   60   61   62   63   64
 [7,]   69   70   71   72   73
 [8,]   78   79   80   81   82
 [9,]   87   88   89   90   91
[10,]   96   97   98   99  100
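The same iteration can probably also be written without an explicit loop, using Reduce with accumulate = TRUE so that every intermediate round is kept. This is a sketch: next_round, rounds and res are hypothetical names, and f.1 is condensed to a one-liner that returns the same value as the version above.

```r
f.1 <- function(x) tail(seq(x, by = 1, length.out = 5), 1)

# One round: start from the last value of the previous round.
# The second argument (the round counter) is ignored.
next_round <- function(prev, k) {
  seq(f.1(tail(prev, 1)) + 1, by = 1, length.out = 5)
}

# init = 10 is the starting value; three rounds for illustration.
rounds <- Reduce(next_round, seq_len(3), accumulate = TRUE, init = 10)

# The first list element is the init value itself, so drop it.
res <- do.call(rbind, rounds[-1])
res
```

To collect the outputs in a file for review, something like write.csv(res, "rounds.csv") would do, where the file name is just an example.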

vectorize this for loop (current row is dependent on row above)

Suppose I want to create n=3 random walk paths (pathlength = 100) given a pre-generated matrix (100x3) of plus/minus ones. The first path will start at 10, the second at 20, the third at 30:
set.seed(123)
given.rand.matrix <- replicate(3,sign(rnorm(100)))
path <- matrix(NA,101,3)
path[1,] = c(10,20,30)
for (j in 2:101) {
  path[j,]<-path[j-1,]+given.rand.matrix[j-1,]
}
The end values (given the seed and rand matrix) are 14, 6, 34... which is the desired result... but...
Question: Is there a way to vectorize the for loop? The problem is that the path matrix is not yet fully populated when calculating. Thus, replacing the loop with
path[2:101,]<-path[1:100,]+given.rand.matrix
returns mostly NAs. I just want to know if this type of for loop is avoidable in R.
Thank you very much in advance.
Definitely vectorizable: Skip the initialization of path, and use cumsum over the matrix:
path <- apply( rbind(c(10,20,30),given.rand.matrix), 2, cumsum)
> head(path)
     [,1] [,2] [,3]
[1,]   10   20   30
[2,]    9   19   31
[3,]    8   20   32
[4,]    9   19   31
[5,]   10   18   32
[6,]   11   17   31
> tail(path)
       [,1] [,2] [,3]
 [96,]   15    7   31
 [97,]   14    8   32
 [98,]   15    9   33
 [99,]   16    8   32
[100,]   15    7   33
[101,]   14    6   34
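If you want to convince yourself that the cumsum version reproduces the loop exactly, a quick check along these lines should work. It is a sketch; loop_path and vec_path are names introduced here to keep the two results apart.

```r
set.seed(123)
given.rand.matrix <- replicate(3, sign(rnorm(100)))

# Original loop: each row is the row above plus the next step.
loop_path <- matrix(NA, 101, 3)
loop_path[1, ] <- c(10, 20, 30)
for (j in 2:101) {
  loop_path[j, ] <- loop_path[j - 1, ] + given.rand.matrix[j - 1, ]
}

# Vectorized: prepend the start values, then take a running sum per column.
vec_path <- apply(rbind(c(10, 20, 30), given.rand.matrix), 2, cumsum)

isTRUE(all.equal(loop_path, vec_path))   # TRUE
```

This works because each path is just its start value plus the running total of its steps, which is exactly what cumsum computes column by column.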
