Finding denominators of Farey tree fractions in constant time - r

Hello r masters.
I need to calculate Denominators of Farey tree fractions up to 2**30.
I came up with this C++ solution using this approach:
struct FareySB {
int num, den;
FareySB() : den(0) {}
int sum() {
return num + den;
}
};
const int LGMAX = 30;
const int MAX = 1 << LGMAX;
FareySB FTF[MAX];
void get_FTF() {
FTF[0].num = 0; FTF[0].den = 1;
FTF[1].num = 1; FTF[1].den = 1;
FTF[2].num = 1; FTF[2].den = 2;
int k = 3;
for (int i = 1; i < LGMAX; i++) {
int len = 1 << i;
int hlen = len >> 1;
for (int j=0; j<hlen; j++) {
FTF[k].num = FTF[k-hlen].num;
FTF[k].den = FTF[k-hlen].sum();
k++;
}
for (int j=0; j<hlen; j++) {
FTF[k].num = FTF[k-len].den;
FTF[k].den = FTF[k-1-(j<<1)].den;
k++;
}
}
}
To know the n-th term I need to know all [0..n-1] terms. Ok so far.
This has a problem: memory just explodes after about 2**27.
The denominators of Farey Tree Fractions are the OEIS-A007306:
1, 1, 2, 3, 3, 4, 5, 5, 4, 5, 7, 8, 7, 7, 8, 7, ...
On that OEIS page I found code that seems to return the n-th term of the sequence in constant time. If that's true, it would solve my Memory Limit Exceeded issue.
But the code is in R language:
(R)
# Given n, compute directly a(n)
# by taking into account the binary representation of n-1
aa <- function(n){
b <- as.numeric(intToBits(n))
l <- sum(b)
m <- which(b == 1)-1
d <- 1
if(l > 1) for(j in 1:(l-1)) d[j] <- m[j+1]-m[j]+1
f <- c(1, m[1]+2) # In A002487: f <- c(0, 1)
if(l > 1) for(j in 3:(l+1)) f[j] <- d[j-2]*f[j-1]-f[j-2]
return(f[l+1])
}
# a(0) = 1, a(1) = 1, a(n) = aa(n-1) n > 1
It may be really simple to you, but I don't know the R language and can't understand the above code.
Is it really a constant-time function? How does that function work?
If you could show me what's happening inside this function for a given n, I would be able to code it in C++ myself.
Thanks in advance.

I'm not sure quite how it works, but here is what the R code is doing. Assume n=100.
b <- as.numeric(intToBits(n)) this produces a 32-element vector of a (reversed) binary representation of n. For n=100, b is 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l <- sum(b) is the sum of the elements of b (i.e. the number of 1s). In this case l=3
m <- which(b == 1)-1 is a vector of the indices of the elements of b that are equal to 1, each reduced by 1. So for n=100, m= 2 5 6
d <- 1 just setting d equal to 1
if(l > 1) for(j in 1:(l-1)) d[j] <- m[j+1]-m[j]+1. If l is bigger than one, then d becomes a vector of length l-1, where each element is the difference between successive values of m, plus one. So for n=100, d= 4 2
f <- c(1, m[1]+2) sets f as a vector with the first value 1, second value the first element of m, plus 2. Here f is 1 4
if(l > 1) for(j in 3:(l+1)) f[j] <- d[j-2]*f[j-1]-f[j-2]. If l is bigger than one, this adds elements onto the end of f, according to that formula - e.g. f[3] is d[1]*f[2]-f[1] or 4*4-1=15. For n=100, f is 1 4 15 26.
return(f[l+1]) This returns the last element of f as the result.
I'm not sure whether it is constant, but it looks pretty quick as n increases. Good luck!
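To check the walkthrough by running it, here is a minimal Python sketch of the same steps (Python rather than the C++ the question asks for). The names aa and a mirror the R code and the OEIS comment; the rest is my own choice, not part of the OEIS entry. The loop runs once per set bit of n, so the work grows with the number of bits (about 30 steps at most for n up to 2**30) rather than being strictly constant, and no earlier terms need to be stored.

```python
def aa(n):
    """Port of the R function aa for n >= 1 (illustrative sketch)."""
    # m: zero-based positions of the set bits of n, least significant first
    m = [i for i in range(n.bit_length()) if (n >> i) & 1]
    # f starts as (1, m[1] + 2) in the R code; only the last two values are kept
    prev, cur = 1, m[0] + 2
    for lo, hi in zip(m, m[1:]):
        d = hi - lo + 1                 # gap between successive set bits, plus one
        prev, cur = cur, d * cur - prev  # f[j] = d[j-2] * f[j-1] - f[j-2]
    return cur


def a(n):
    """a(0) = a(1) = 1, a(n) = aa(n - 1) for n > 1, as in the OEIS comment."""
    return 1 if n < 2 else aa(n - 1)


print([a(n) for n in range(16)])
print(aa(100))
```

The first print reproduces the opening terms of the sequence quoted in the question, and aa(100) gives the 26 worked out above.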

Related

Expressing Natural Number by sum of Triangular numbers

Triangular numbers count the objects that can be arranged in a triangular shape.
For example, 1, 3, 6, 10, 15, ... are triangular numbers.
o
o o
o o o
o o o o
is the shape of the triangular number for n=4 (10 objects).
Given a natural number N, I have to print every way to express N as a sum of triangular numbers.
if N = 4
output should be
1 1 1 1
1 3
3 1
else if N = 6
output should be
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
I have searched for a few hours and couldn't find an answer.
Please help.
(I am not sure whether this helps, but I found that
if T(k) is the number of such sums for N = k, then
T(k) = T(k-1) + T(k-3) + T(k-6) + ... + T(k-p) while (k-p) >= 0,
where p is a triangular number and T(0) = 1.)
Here's Code for k=-1(Read comments below)
#include <iostream>
#include <vector>
using namespace std;
long TriangleNumber(int index);
void PrintTriangles(int index);
vector<long> triangleNumList(450); // 450 squared is about 200,000
vector<long> storage(100001);
int main() {
int n, p;
for (int i = 0; i < 450; i++) {
triangleNumList[i] = i * (i + 1) / 2;
}
cin >> n >> p;
cout << TriangleNumber(n);
if (p == 1) {
//PrintTriangles();
}
return 0;
}
long TriangleNumber(int index) {
int iter = 1, out = 0;
if (index == 1 || index == 0) {
return 1;
}
else {
if (storage[index] != 0) {
return storage[index];
}
else {
while (triangleNumList[iter] <= index) {
storage[index] = ( storage[index] + TriangleNumber(index - triangleNumList[iter]) ) % 1000000;
iter++;
}
}
}
return storage[index];
}
void PrintTriangles(int index) {
// What Algorithm?
}
Here is some recursive Python 3.6 code that prints the sums of triangular numbers that total the inputted target. I prioritized simplicity of code in this version. You may want to add error-checking on the input value, counting the sums, storing the lists rather than just printing them, and wrapping the entire routine into a function. Setting up the list of triangular numbers could also be done in fewer lines of code.
Your code saved time but worsened memory usage by "memoizing" the triangular numbers (storing and reusing them rather than always calculating them when needed). You could do the same to the sum lists, if you like. It is also possible to make this more in the dynamic programming style: find the sum lists for n=1 then for n=2 etc. I'll leave all that to you.
""" Given a positive integer n, print all the ways n can be expressed as
the sum of triangular numbers.
"""
def print_sums_of_triangular_numbers(prefix, target):
    """Print sums totalling to target, each after printing the prefix."""
    if target == 0:
        print(*prefix)
        return
    for tri in triangle_num_list:
        if tri > target:
            return
        print_sums_of_triangular_numbers(prefix + [tri], target - tri)
n = int(input('Value of n ? '))
# Set up list of triangular numbers not greater than n
triangle_num_list = []
index = 1
tri_sum = 1
while tri_sum <= n:
    triangle_num_list.append(tri_sum)
    index += 1
    tri_sum += index
# Print the sums totalling to n
print_sums_of_triangular_numbers([], n)
Here are the printouts of two runs of this code:
Value of n ? 4
1 1 1 1
1 3
3 1
Value of n ? 6
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
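If only the number of sums is needed, the dynamic programming style mentioned above can be sketched as follows. This is an illustration under my own naming (count_triangular_sums is hypothetical, not from the question), and it counts ordered sums, as the printouts do.

```python
def count_triangular_sums(n):
    """Count the ordered sums of triangular numbers that total n."""
    # Triangular numbers not greater than n: 1, 3, 6, 10, ...
    tris = []
    index, tri = 1, 1
    while tri <= n:
        tris.append(tri)
        index += 1
        tri += index
    # ways[t] = number of ordered sums totalling t; the empty sum gives ways[0] = 1
    ways = [0] * (n + 1)
    ways[0] = 1
    for total in range(1, n + 1):
        for tri in tris:
            if tri > total:
                break
            ways[total] += ways[total - tri]
    return ways[n]


print(count_triangular_sums(4))
print(count_triangular_sums(6))
```

The two calls give 3 and 7, matching the number of lines in the two printouts above. The table is filled from small totals upward, so no recursion is involved.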

R - Finding bitwise binary neighbors (flipping one bit at a time)

Is there a more effective way to match matrix rows when using large matrices?
I have a vector with values that correspond to a matrix of 2^N rows. N is typically large e.g., >20. Each row is a unique combination of N={0,1} values and represents a 'position' on a decision space. I.e., for N=3 the rows would be
0 0 0,
0 0 1,
0 1 0,
1 0 0,
...,
1 1 1
I need to determine whether a position is a local maximum, i.e., whether the N neighboring positions are of lower values. For example, to the position 0 0 0, the neighboring positions are 1 0 0, 0 1 0, and 0 0 1, accordingly.
I have coded the following solution that does the job but very slowly for large N.
library(prodlim) #for row.match command
set.seed(1234)
N=10
space = as.matrix(expand.grid(rep(list(0:1), N))) #creates all combinations of 0-1 along N-dimensions
performance = replicate(2^N, runif(1, min=0, max=1)) #corresponding values for each space-row (position)
#determine whether a space position is a local maxima, that is, the N neighboring positions are smaller in performance value
system.time({
local_peaks_pos = matrix(NA,nrow=2^N, ncol=1)
for(v in 1:2^N)
{
for(q in 1:N)
{
temp_local_pos = space[v,1:N]
temp_local_pos[q] = abs(temp_local_pos[q]-1)
if(performance[row.match(temp_local_pos[1:N], space[,1:N])] > performance[v])
{
local_peaks_pos[v,1] = 0
break
}
}
}
local_peaks_pos[is.na(local_peaks_pos)] = 1
})
user system elapsed
9.94 0.05 10.06
As Gabe mentioned in his comment,
you can exploit the fact that your decision space can be interpreted as single integers:
set.seed(1234L)
N <- 10L
performance <- runif(2^N)
powers_of_two <- as.integer(rev(2L ^ (0L:(N - 1L))))
is_local_max <- sapply(0L:(2^N - 1), function(i) {
multipliers <- as.integer(rev(intToBits(i)[1L:N])) * -1L
multipliers[multipliers == 0L] <- 1L
neighbors <- i + powers_of_two * multipliers
# compensate that R vectors are 1-indexed
!any(performance[neighbors + 1L] > performance[i + 1L])
})
# compensate again
local_peaks_int <- which(is_local_max) - 1L
local_peaks_binary <- t(sapply(local_peaks_int, function(int) {
as.integer(rev(intToBits(int)[1L:N]))
}))
> head(local_peaks_binary)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 1 0 0
[2,] 0 0 0 0 1 0 0 1 1 0
[3,] 0 0 0 0 1 1 1 1 0 0
[4,] 0 0 0 1 0 0 0 1 1 1
[5,] 0 0 0 1 0 1 0 1 0 1
[6,] 0 0 0 1 1 0 1 1 1 0
In decimal,
multipliers contains the sign applied to each entry of powers_of_two so that,
when added to the current integer,
it represents a bit flip in binary.
For example,
if the original binary was 0 0 and we flip one bit to get 1 0,
it's as if we added 2^1 in decimal,
but if it was originally 1 0 and we flip one bit to get 0 0,
then we subtracted 2^1 in decimal.
Each row in local_peaks_binary is a binary from your decision space,
where the least significant bit is on the right.
So, for example, the first local peak is a decimal 4.
See this question for the mapping of integers to binary.
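The add-or-subtract rule described above is the same as an XOR with a power of two, which flips exactly one bit. Here is a minimal Python sketch of that idea. The function names are my own, and positions are numbered from 0 rather than from 1 as in R.

```python
def neighbors(position, n_bits):
    """All positions that differ from `position` in exactly one of n_bits bits."""
    return [position ^ (1 << q) for q in range(n_bits)]


def local_peaks(performance, n_bits):
    """Positions for which no one-bit neighbor has a strictly larger value."""
    return [
        i
        for i, value in enumerate(performance)
        if not any(performance[j] > value for j in neighbors(i, n_bits))
    ]


print(neighbors(0b000, 3))                     # neighbors of 0 0 0
print(local_peaks([0.1, 0.5, 0.3, 0.2], 2))   # a tiny 2-bit decision space
```

For 0 0 0 this lists the integers 1, 2 and 4, that is 0 0 1, 0 1 0 and 1 0 0.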
EDIT: and if you want to do it in parallel:
library(doParallel)
set.seed(1234L)
N <- 20L
performance <- runif(2^N)
powers_of_two <- as.integer(rev(2 ^ (0:(N - 1))))
num_cores <- detectCores()
workers <- makeCluster(num_cores)
registerDoParallel(workers)
chunks <- splitIndices(length(performance), num_cores)
chunks <- lapply(chunks, "-", 1L)
local_peaks_int <- foreach(chunk=chunks, .combine=c, .multicombine=TRUE) %dopar% {
is_local_max <- sapply(chunk, function(i) {
multipliers <- as.integer(rev(intToBits(i)[1L:N])) * -1L
multipliers[multipliers == 0L] <- 1L
neighbors <- i + powers_of_two * multipliers
# compensate that R vectors are 1-indexed
!any(performance[neighbors + 1L] > performance[i + 1L])
})
# return
chunk[is_local_max]
}
local_peaks_binary <- t(sapply(local_peaks_int, function(int) {
as.integer(rev(intToBits(int)[1L:N]))
}))
stopCluster(workers); registerDoSEQ()
The above completes in ~2.5 seconds in my system,
which has 4 CPU cores.
Here is a C++ version that uses multi-threading but,
at least in my system with 4 threads,
it doesn't seem faster than Gabe's Fortran version.
However, when I try to run Gabe's Fortran code in a new session,
I get the following error with N <- 29L:
cannot allocate vector of size 4.0 Gb.
EDIT: Apparently I changed something important along the way,
because after testing again,
the C++ version actually seems faster.
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppParallel)]]
#include <cstddef> // size_t
#include <vector>
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
class PeakFinder : public Worker
{
public:
PeakFinder(const NumericVector& performance, vector<int>& peaks, const int N)
: performance_(performance)
, peaks_(peaks)
, N_(N)
{ }
void operator()(size_t begin, size_t end) {
vector<int> peaks;
for (size_t i = begin; i < end; i++) {
bool is_local_peak = true;
unsigned int mask = 1;
for (int exponent = 0; exponent < N_; exponent++) {
unsigned int neighbor = static_cast<unsigned int>(i) ^ mask; // bitwise XOR
if (performance_[i] < performance_[neighbor]) {
is_local_peak = false;
break;
}
mask <<= 1;
}
if (is_local_peak)
peaks.push_back(static_cast<int>(i));
}
mutex_.lock();
peaks_.insert(peaks_.end(), peaks.begin(), peaks.end());
mutex_.unlock();
}
private:
const RVector<double> performance_;
vector<int>& peaks_;
const int N_;
tthread::mutex mutex_;
};
// [[Rcpp::export]]
IntegerVector local_peaks(const NumericVector& performance, const int N) {
vector<int> peaks;
PeakFinder peak_finder(performance, peaks, N);
// each thread call will check at least 1024 values
parallelFor(0, performance.length(), peak_finder, 1024);
IntegerVector result(peaks.size());
int i = 0;
for (int peak : peaks) {
result[i++] = peak;
}
return result;
}
After saving the C++ code in local-peaks.cpp,
this code:
library(Rcpp)
library(RcppParallel)
sourceCpp("local-peaks.cpp")
set.seed(1234L)
N <- 29L
performance <- runif(2^N)
system.time({
local_peaks_int <- local_peaks(performance, N)
})
finished in ~2 seconds
(without considering the time required to allocate performance).
If you do need the binary representation,
you can change local_peaks like this
(see this question):
// [[Rcpp::export]]
IntegerMatrix local_peaks(const NumericVector& performance, const int N) {
vector<int> peaks;
PeakFinder peak_finder(performance, peaks, N);
// each thread call will check at least 1024 values
parallelFor(0, performance.length(), peak_finder, 1024);
// in case you want the same order every time, #include <algorithm> and uncomment next line
// sort(peaks.begin(), peaks.end());
IntegerMatrix result(peaks.size(), N);
int i = 0;
for (int peak : peaks) {
for (int j = 0; j < N; j++) {
result(i, N - j - 1) = peak & 1;
peak >>= 1;
}
i++;
}
return result;
}
Here is one solution that follows the same general structure as your example code. intToBits and packBits map to and from the binary representation for each integer (subtracting one to start at zero). The inner loop flips each of the N bits to get the neighbors. On my laptop, this runs in a fraction of a second for N=10 and around a minute for N=20. The commented code stores some information from neighbors already tested so as to not redo the calculation. Uncommenting those lines makes it run in about 35 seconds for N=20.
loc_max <- rep(1, 2^N)
for (v in 1:2^N){
## if (loc_max[v] == 0) next
vbits <- intToBits(v-1)
for (q in 1:N){
tmp <- vbits
tmp[q] <- !vbits[q]
pos <- packBits(tmp, type = "integer") + 1
if (performance[pos] > performance[v]){
loc_max[v] <- 0
break
## } else {
## loc_max[pos] <- 0
}
}
}
identical(loc_max, local_peaks_pos[, 1])
## [1] TRUE
EDIT:
It sounds like you need every bit of speed possible, so here's another suggestion that relies on compiled code to run significantly faster than my first example. A fraction of a second for N=20 and a bit under 20 seconds for N=29 (the largest example I could fit in my laptop's RAM).
This is using a single core; you could either parallelize this, or alternatively run it in a single core and parallelize your Monte Carlo simulations instead.
library(inline)
loopcode <-
" integer v, q, pos
do v = 0, (2**N)-1
do q = 0, N-1
if ( btest(v,q) ) then
pos = ibclr(v, q)
else
pos = ibset(v, q)
end if
if (performance(pos) > performance(v)) then
loc_max(v) = 0
exit
end if
end do
end do
"
loopfun <- cfunction(sig = signature(performance="numeric", loc_max="integer", n="integer"),
dim=c("(0:(2**n-1))", "(0:(2**n-1))", ""),
loopcode,
language="F95")
N <- 20
performance = runif(2^N, min=0, max=1)
system.time({
floop <- loopfun(performance, rep(1, 2^N), N)
})
## user system elapsed
## 0.049 0.003 0.052
N <- 29
performance = runif(2^N, min=0, max=1)
system.time({
floop <- loopfun(performance, rep(1, 2^N), N)
})
## user system elapsed
## 17.892 1.848 19.741
I don't think pre-computing the neighbors would help much here since I'd guess the comparisons accessing different sections of such a large array are the most time consuming part.

R: Summing up neighboring matrix elements. How to speed up?

I'm working with large matrices of about 2500x2500x50 (lonxlatxtime). The matrix contains only 1 and 0. I need to know for each timestep the sum of the 24 surrounding elements. So far I did it about this way:
xdim <- 2500
ydim <- 2500
tdim <- 50
a <- array(0:1,dim=c(xdim,ydim,tdim))
res <- array(0:1,dim=c(xdim,ydim,tdim))
for (t in 1:tdim){
for (x in 3:(xdim-2)){
for (y in 3:(ydim-2)){
res[x,y,t] <- sum(a[(x-2):(x+2),(y-2):(y+2),t])
}
}
}
This works, but it is much too slow for my needs. Does anybody have advice on how to speed this up?
Intro
I have to say, there are many hidden things behind just the setup of the arrays. The remainder of the problem is trivial, though. As a result, there are really two ways to go about it:
Brute force, as suggested by @Alex (written in C++)
Observing replication patterns
Bruteforce with OpenMP
If we want to 'brute force' it, then we can use the suggestion given by @Alex to employ OpenMP with Armadillo:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// Add a flag to enable OpenMP at compile time
// [[Rcpp::plugins(openmp)]]
// Protect against compilers without OpenMP
#ifdef _OPENMP
#include <omp.h>
#endif
// [[Rcpp::export]]
arma::cube cube_parallel(arma::cube a, arma::cube res, int cores = 1) {
// Extract the different dimensions
unsigned int tdim = res.n_slices;
unsigned int xdim = res.n_rows;
unsigned int ydim = res.n_cols;
// Same calculation loop
#pragma omp parallel for num_threads(cores)
for (unsigned int t = 0; t < tdim; t++){
// pop the T
arma::mat temp_mat = a.slice(t);
// Subset the rows
for (unsigned int x = 2; x < xdim-2; x++){
arma::mat temp_row_sub = temp_mat.rows(x-2, x+2);
// Iterate over the columns with unit accumulative sum
for (unsigned int y = 2; y < ydim-2; y++){
res(x,y,t) = accu(temp_row_sub.cols(y-2,y+2));
}
}
}
return res;
}
Replication Patterns
However, the smarter approach is to understand how array(0:1, dims) is constructed.
Most notably:
Case 1: If xdim is even, then only the rows of a matrix alternate, and every matrix is identical.
Case 2: If xdim is odd and ydim is odd, then the rows and columns alternate (a checkerboard), and successive matrices alternate as well.
Case 3: If xdim is odd and ydim is even, then the rows and columns alternate (a checkerboard), but every matrix is identical.
Examples
Let's see the cases in action to observe the patterns.
Case 1:
xdim <- 2
ydim <- 3
tdim <- 2
a <- array(0:1,dim=c(xdim,ydim,tdim))
Output:
, , 1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 1 1
, , 2
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 1 1
Case 2:
xdim <- 3
ydim <- 3
tdim <- 3
a <- array(0:1,dim=c(xdim,ydim,tdim))
Output:
, , 1
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 1
[3,] 0 1 0
, , 2
[,1] [,2] [,3]
[1,] 1 0 1
[2,] 0 1 0
[3,] 1 0 1
, , 3
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 1
[3,] 0 1 0
Case 3:
xdim <- 3
ydim <- 4
tdim <- 2
a <- array(0:1,dim=c(xdim,ydim,tdim))
Output:
, , 1
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 1 0
[3,] 0 1 0 1
, , 2
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 1 0
[3,] 0 1 0 1
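The three cases follow from R filling arrays in column-major order while recycling 0:1, so the entry at zero-based position (x, y, t) is just the parity of its linear index. A small Python sketch of that observation (array01 is a name I made up for illustration):

```python
def array01(xdim, ydim, tdim):
    """Mimic R's array(0:1, dim=c(xdim, ydim, tdim)) as nested lists [t][x][y]."""
    return [
        [
            # column-major linear index of (x, y, t), reduced modulo 2
            [(x + xdim * y + xdim * ydim * t) % 2 for y in range(ydim)]
            for x in range(xdim)
        ]
        for t in range(tdim)
    ]


for slice_ in array01(3, 3, 2):
    for row in slice_:
        print(*row)
    print()
```

When xdim is even, the xdim * y term never changes the parity, so only rows alternate. When xdim * ydim is odd, each new slice starts on the opposite parity, which gives the alternating matrices of Case 2.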
Pattern Hacking
Alrighty, based on the above discussion, we opt to make a bit of code that exploits this unique pattern.
Creating Alternating Vectors
An alternating vector in this case switches between two different values.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// ------- Make Alternating Vectors
arma::vec odd_vec(unsigned int xdim){
// make a temporary vector to create alternating 0-1 effect by row.
arma::vec temp_vec(xdim);
// Alternating vector (anyone have a better solution? )
for (unsigned int i = 0; i < xdim; i++) {
temp_vec(i) = (i % 2 ? 0 : 1);
}
return temp_vec;
}
arma::vec even_vec(unsigned int xdim){
// make a temporary vector to create alternating 0-1 effect by row.
arma::vec temp_vec(xdim);
// Alternating vector (anyone have a better solution? )
for (unsigned int i = 0; i < xdim; i++) {
temp_vec(i) = (i % 2 ? 1 : 0); // changed
}
return temp_vec;
}
Creating the three cases of matrix
As mentioned above, there are three cases of matrix. The even, first odd, and second odd cases.
// --- Handle the different cases
// [[Rcpp::export]]
arma::mat make_even_matrix(unsigned int xdim, unsigned int ydim){
arma::mat temp_mat(xdim,ydim);
temp_mat.each_col() = even_vec(xdim);
return temp_mat;
}
// xdim is odd and ydim is even
// [[Rcpp::export]]
arma::mat make_odd_matrix_case1(unsigned int xdim, unsigned int ydim){
arma::mat temp_mat(xdim,ydim);
arma::vec e_vec = even_vec(xdim);
arma::vec o_vec = odd_vec(xdim);
// Alternating column
for (unsigned int i = 0; i < ydim; i++) {
temp_mat.col(i) = (i % 2 ? o_vec : e_vec);
}
return temp_mat;
}
// xdim is odd and ydim is odd
// [[Rcpp::export]]
arma::mat make_odd_matrix_case2(unsigned int xdim, unsigned int ydim){
arma::mat temp_mat(xdim,ydim);
arma::vec e_vec = even_vec(xdim);
arma::vec o_vec = odd_vec(xdim);
// Alternating column
for (unsigned int i = 0; i < ydim; i++) {
temp_mat.col(i) = (i % 2 ? e_vec : o_vec); // slight change
}
return temp_mat;
}
Calculation Engine
Same as the previous solution, just without the t as we no longer need to repeat calculations.
// --- Calculation engine
// [[Rcpp::export]]
arma::mat calc_matrix(arma::mat temp_mat){
unsigned int xdim = temp_mat.n_rows;
unsigned int ydim = temp_mat.n_cols;
arma::mat res = temp_mat;
// Subset the rows
for (unsigned int x = 2; x < xdim-2; x++){
arma::mat temp_row_sub = temp_mat.rows(x-2, x+2);
// Iterate over the columns with unit accumulative sum
for (unsigned int y = 2; y < ydim-2; y++){
res(x,y) = accu(temp_row_sub.cols(y-2,y+2));
}
}
return res;
}
Call Main Function
Here is the core function that pieces everything together. This gives us the desired distance arrays.
// --- Main Engine
// Create the desired cube information
// [[Rcpp::export]]
arma::cube dim_to_cube(unsigned int xdim = 4, unsigned int ydim = 4, unsigned int tdim = 3) {
// Initialize values in A
arma::cube res(xdim,ydim,tdim);
if(xdim % 2 == 0){
res.each_slice() = calc_matrix(make_even_matrix(xdim, ydim));
}else{
if(ydim % 2 == 0){
res.each_slice() = calc_matrix(make_odd_matrix_case1(xdim, ydim));
}else{
arma::mat first_odd_mat = calc_matrix(make_odd_matrix_case1(xdim, ydim));
arma::mat sec_odd_mat = calc_matrix(make_odd_matrix_case2(xdim, ydim));
for(unsigned int t = 0; t < tdim; t++){
res.slice(t) = (t % 2 ? sec_odd_mat : first_odd_mat);
}
}
}
return res;
}
Timing
Now, the real test is how well this performs:
Unit: microseconds
expr min lq mean median uq max neval
r_1core 3538.022 3825.8105 4301.84107 3957.3765 4043.0085 16856.865 100
alex_1core 2790.515 2984.7180 3461.11021 3076.9265 3189.7890 15371.406 100
cpp_1core 174.508 180.7190 197.29728 194.1480 204.8875 338.510 100
cpp_2core 111.960 116.0040 126.34508 122.7375 136.2285 162.279 100
cpp_3core 81.619 88.4485 104.54602 94.8735 108.5515 204.979 100
cpp_cache 40.637 44.3440 55.08915 52.1030 60.2290 302.306 100
Script used for timing:
cpp_parallel = cube_parallel(a,res, 1)
alex_1core = alex(a,res,xdim,ydim,tdim)
cpp_cache = dim_to_cube(xdim,ydim,tdim)
op_answer = cube_r(a,res,xdim,ydim,tdim)
all.equal(cpp_parallel, op_answer)
all.equal(cpp_cache, op_answer)
all.equal(alex_1core, op_answer)
xdim <- 20
ydim <- 20
tdim <- 5
a <- array(0:1,dim=c(xdim,ydim,tdim))
res <- array(0:1,dim=c(xdim,ydim,tdim))
ga = microbenchmark::microbenchmark(r_1core = cube_r(a,res,xdim,ydim,tdim),
alex_1core = alex(a,res,xdim,ydim,tdim),
cpp_1core = cube_parallel(a,res, 1),
cpp_2core = cube_parallel(a,res, 2),
cpp_3core = cube_parallel(a,res, 3),
cpp_cache = dim_to_cube(xdim,ydim,tdim))
Here's one solution that's fast for a large array:
res <- apply(a, 3, function(a) t(filter(t(filter(a, rep(1, 5), circular=TRUE)), rep(1, 5), circular=TRUE)))
dim(res) <- c(xdim, ydim, tdim)
I filtered the array using rep(1,5) as the weights (i.e. sum values within a neighborhood of 2) along each dimension. I then modified the dim attribute since it initially comes out as a matrix.
Note that this wraps the sum around at the edges of the array (which might make sense since you're looking at latitude and longitude; if not, I can modify my answer).
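The reason two one-dimensional filters suffice is that a 5x5 box sum is separable: summing windows of 5 along one dimension and then windows of 5 along the other gives the same totals. A minimal pure-Python sketch of that idea with the same wrap-around behavior (the function names are mine and not part of R's filter):

```python
def circular_window_sum(values, radius=2):
    """Sum of each element and its `radius` neighbors on both sides, wrapping around."""
    n = len(values)
    return [
        sum(values[(i + k) % n] for k in range(-radius, radius + 1))
        for i in range(n)
    ]


def box_sum_2d(grid, radius=2):
    """Circular (2*radius+1) x (2*radius+1) box sum computed as two 1-D passes."""
    # First pass: window sums along each row
    row_pass = [circular_window_sum(row, radius) for row in grid]
    # Second pass: window sums along each column of the first pass
    col_pass = [circular_window_sum(list(col), radius) for col in zip(*row_pass)]
    # col_pass holds columns, so transpose back to rows
    return [list(row) for row in zip(*col_pass)]


ones = [[1] * 6 for _ in range(6)]
print(box_sum_2d(ones)[0])
```

Each pass touches 5 values per cell instead of 25, which is part of why the filtered version is cheaper than summing every neighborhood directly.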
For a concrete example:
xdim <- 500
ydim <- 500
tdim <- 15
a <- array(0:1,dim=c(xdim,ydim,tdim))
and here's what you're currently using (with NAs at the edges) and how long this example takes on my laptop:
f1 <- function(a, xdim, ydim, tdim){
res <- array(NA_integer_,dim=c(xdim,ydim,tdim))
for (t in 1:tdim){
for (x in 3:(xdim-2)){
for (y in 3:(ydim-2)){
res[x,y,t] <- sum(a[(x-2):(x+2),(y-2):(y+2),t])
}
}
}
return(res)
}
system.time(res1 <- f1(a, xdim, ydim, tdim))
# user system elapsed
# 14.813 0.005 14.819
And here's a comparison with the version I described:
f2 <- function(a, xdim, ydim, tdim){
res <- apply(a, 3, function(a) t(filter(t(filter(a, rep(1, 5), circular=TRUE)), rep(1, 5), circular=TRUE)))
dim(res) <- c(xdim, ydim, tdim)
return(res)
}
system.time(res2 <- f2(a, xdim, ydim, tdim))
# user system elapsed
# 1.188 0.047 1.236
You can see there's a significant speed boost (for large arrays). And to check that it's giving the correct solution (note that I'm adding NAs so both results match, since the one I gave filters in a circular manner):
## Match NAs
res2NA <- ifelse(is.na(res1), NA, res2)
all.equal(res2NA, res1)
# [1] TRUE
I'll add that your full array (2500x2500x50) took just under a minute (about 55 seconds), although it did use a lot of memory in the process, FYI.
Your current code has a lot of overhead from redundant subsetting and calculation. Clean this up if you want better speed.
At xdim <- ydim <- 20; tdim <- 5, I see a 23% speedup on my machine.
At xdim <- ydim <- 200; tdim <- 10, I see a 25% speedup.
This comes at small cost of additional memory, which is obvious by examining the code below.
xdim <- ydim <- 20; tdim <- 5
a <- array(0:1,dim=c(xdim,ydim,tdim))
res <- array(0:1,dim=c(xdim,ydim,tdim))
microbenchmark(op= {
for (t in 1:tdim){
for (x in 3:(xdim-2)){
for (y in 3:(ydim-2)){
res[x,y,t] <- sum(a[(x-2):(x+2),(y-2):(y+2),t])
}
}
}
},
alex= {
for (t in 1:tdim){
temp <- a[,,t]
for (x in 3:(xdim-2)){
temp2 <- temp[(x-2):(x+2),]
for (y in 3:(ydim-2)){
res[x,y,t] <- sum(temp2[,(y-2):(y+2)])
}
}
}
}, times = 50)
Unit: milliseconds
expr min lq mean median uq max neval cld
op 4.855827 5.134845 5.474327 5.321681 5.626738 7.463923 50 b
alex 3.720368 3.915756 4.213355 4.012120 4.348729 6.320481 50 a
Further improvements:
If you write this in C++, my guess is that recognizing res[x,y,t] = res[x,y-1,t] - sum(a[...,y-3,...]) + sum(a[...,y+2,...]) will save you additional time (the window moves one column to the right, so column y-3 drops out and column y+2 comes in). In R, it did not in my timing tests.
This problem is also embarrassingly parallel. There's no reason you couldn't split the t dimension to make more use of a multi-core architecture.
Both of these are left to the reader / OP.
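As a rough illustration of the running-sum idea, here is a Python sketch for one band of rows. It assumes col_sums[y] already holds the sum of the five rows x-2..x+2 in column y; that name and the function name are hypothetical, and edge positions are left as None, like the untouched border in the original loop.

```python
def sliding_window_sums(col_sums, radius=2):
    """Window sums over col_sums[y-radius .. y+radius] using a running total."""
    n = len(col_sums)
    width = 2 * radius + 1
    out = [None] * n
    if n < width:
        return out
    running = sum(col_sums[:width])   # window centered at y = radius
    out[radius] = running
    for y in range(radius + 1, n - radius):
        # column y+radius enters the window, column y-radius-1 leaves it
        running += col_sums[y + radius] - col_sums[y - radius - 1]
        out[y] = running
    return out


print(sliding_window_sums([1, 2, 3, 4, 5, 6, 7]))
```

Each step costs one addition and one subtraction regardless of the window width.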

Can this be expressed using Integer Programming or Constraint Programming?

Consider a fixed m by n matrix M, all of whose entries are 0 or 1. The question is whether there exists a non zero vector v, all of whose entries are -1, 0 or 1 for which Mv = 0. For example,
[0 1 1 1]
M_1 = [1 0 1 1]
[1 1 0 1]
In this example, there is no such vector v.
[1 0 0 0]
M_2 = [0 1 0 0]
[0 0 1 0]
In this example, the vector (0,0,0,1) gives M_2v = 0.
I am currently solving this problem by trying all different vectors v.
However, is it possible to express the problem as an integer
programming problem or constraint programming problem so I can use an
existing software package, such as SCIP instead which might be more
efficient.
It would help a little if you also give a positive example, not just a negative.
I might have missed something in the requirement/definitions, but here is a way of doing it in the Constraint Programming (CP) system MiniZinc (http://minizinc.org/). It doesn't use any constraints unique to CP systems, except perhaps for the function syntax, so it should be possible to translate it to other CP or IP systems.
% dimensions
int: rows = 3;
int: cols = 4;
% the matrix
array[1..rows, 1..cols] of int: M = array2d(1..rows,1..cols,
[0, 1, 1, 1,
1, 0, 1, 1,
1, 1, 0, 1,
] );
% decision variables: the entries of v are -1, 0 or 1
array[1..cols] of var -1..1: v;
% function for matrix multiplication: res = matrix x vec
function array[int] of var int: matrix_mult(array[int,int] of var int: m,
array[int] of var int: v) =
let {
array[index_set_1of2(m)] of var int: res; % result, one entry per row
constraint
forall(i in index_set_1of2(m)) (
res[i] = sum(j in index_set_2of2(m)) (
m[i,j]*v[j]
)
)
;
} in res; % return value
solve satisfy;
constraint
% M x v = 0
matrix_mult(M, v) = [0 | i in 1..rows] /\
sum(i in 1..cols) (abs(v[i])) != 0 % non-zero vector
;
output
[
"v: ", show(v), "\n",
"M: ",
]
++
[
if j = 1 then "\n" else " " endif ++
show(M[i,j])
| i in 1..rows, j in 1..cols
];
By changing the definition of "M" to use decision variables with the domain 0..1 instead of constants:
array[1..rows, 1..cols] of var 0..1: M;
then this model yields 18066 different solutions, for example these two:
v: [-1, 1, 1, 1]
M:
1 0 0 1
1 1 0 0
1 0 0 1
----------
v: [-1, 1, 1, 1]
M:
0 0 0 0
1 0 1 0
1 0 0 1
Note: Generating all solutions is probably more common in CP systems than in traditional MIP systems (this is a feature that I really appreciate).
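For small matrices, a model like this can be cross-checked against plain enumeration. The following Python sketch does that for the two matrices in the question; find_kernel_vector is a name of my own, and the approach is exponential in the number of columns, so it is only a sanity check and not a replacement for a solver.

```python
from itertools import product


def find_kernel_vector(M):
    """Return a non-zero v with entries in {-1, 0, 1} and M v = 0, or None."""
    cols = len(M[0])
    for v in product((-1, 0, 1), repeat=cols):
        if any(v) and all(sum(m * x for m, x in zip(row, v)) == 0 for row in M):
            return v
    return None


M_1 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1]]
M_2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(find_kernel_vector(M_1))
print(find_kernel_vector(M_2))
```

The first call finds nothing, in line with the question's claim about M_1. The second returns a vector whose only non-zero entry is the last one.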

how to compute the original vector from a distance matrix?

I have a small question about vector and matrix.
Suppose a vector V = {v1, v2, ..., vn}. I generate a n-by-n distance matrix M defined as:
M_ij = | v_i - v_j | such that i,j belong to [1, n].
That is, each element M_ij in the square matrix is the absolute distance of two elements in V.
For example, I have a vector V = {1, 3, 3, 5}, the distance matrix will be
M=[
0 2 2 4;
2 0 0 2;
2 0 0 2;
4 2 2 0; ]
It seems pretty simple. Now comes the question: given such a matrix M, how can I obtain the initial V?
Thank you.
Based on some answers to this question, it seems that the answer is not unique. So, now suppose that the initial vector has been normalized to 0 mean and 1 variance. The question is: given such a symmetric distance matrix M, how can I determine the initial normalized vector?
You can't. To give you an idea of why, consider these two cases:
V1 = {1,2,3}
M1 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]
V2 = {3,4,5}
M2 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]
As you can see, a single M could be the result of more than one V. Therefore, you can't map backwards.
There is no way to determine the answer uniquely, since the distance matrix is invariant to adding a constant to all elements and to multiplying all the values by -1. Assuming that element 1 is equal to 0, and that the first nonzero element is positive, however, you can find an answer. Here is the pseudocode:
# Assume v[1] is 0
v[1] = 0
# e is value of first non-zero vector element
e = 0
# ei is index of first non-zero vector element
ei = 0
for i = 2...n:
    # if all vector elements have been 0 so far
    if e == 0:
        # get the current distance from element 1 and its index
        # this new element may still be 0
        e = d[1,i]
        ei = i
        v[i] = e
    elseif d[ei,i] == d[1,i] + v[ei]: # v[i] <= v[1]
        # v[i] is to the left of v[1], so v[1] lies between v[i] and v[ei]
        v[i] = -d[1,i]
    else:
        # some other case; v[i] is to the right of v[1]
        v[i] = d[1,i]
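A runnable Python sketch of this reconstruction, under the same two assumptions (the first element is 0 and the first non-zero element is positive). The function names are my own, and indices start at 0 here. An element is placed on the negative side when its distance to the anchor equals its distance to element 0 plus the anchor's value.

```python
def distance_matrix(v):
    """Absolute pairwise distances, as defined in the question."""
    return [[abs(a - b) for b in v] for a in v]


def recover_vector(d):
    """Rebuild a vector from d, up to translation and sign."""
    n = len(d)
    v = [0] * n
    anchor = None  # index of the first non-zero element, taken as positive
    for i in range(1, n):
        if anchor is None:
            v[i] = d[0][i]
            if v[i] != 0:
                anchor = i
        elif d[anchor][i] == d[0][i] + v[anchor]:
            v[i] = -d[0][i]  # element 0 lies between v[i] and the anchor
        else:
            v[i] = d[0][i]
    return v


print(recover_vector(distance_matrix([1, 3, 3, 5])))
print(recover_vector(distance_matrix([3, 1, 5, 0])))
```

For the question's example this yields the original vector shifted so that its first element is 0. In general the result only reproduces the distance matrix; the original offset and sign cannot be recovered.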
I don't think it is possible to find the original vector, but you can find a translation of the vector by taking the first row of the matrix.
If you let M_ij = | v_i - v_j | and you translate all v_k for k in [1,n] by the same constant (say 1), you will get
M_ij = | (v_i + 1) - (v_j + 1) |
= | v_i - v_j |
Hence, just take the first row as the vector and find one initial point to translate the vector to.
Correction:
Let v_1 = 0, and let l_k = | v_k | for k in [2,n] and p_k the sign (parity) of v_k
Let p_1 = 1
for(int i = 2; i < n; i++)
if( | l_i - l_(i+1) | != M_i(i+1) )
p_(i+1) = - p_i
else
p_(i+1) = p_i
Doing this for all v_k for k in [2,n] in order will show the sign (parity) of each v_k with respect to the others.
Then you can find a translation of the original vector with the same or the opposite direction.
Update (For Normalized vector):
Let d = Sqrt(v_1^2 + v_2^2 + ... + v_n^2)
Vector = {0, v_1 / d, v_2 / d, ... , v_n / d}
or
{0, -v_1 / d, -v_2 / d, ... , -v_n / d}
